This is to certify that the dissertation entitled CRITERIA FOR ASSESSING KINDERGARTEN TO TWELFTH-GRADE ENGLISH LANGUAGE ARTS CURRICULA, TEACHING PRACTICES, AND STUDENT PERFORMANCE, presented by Ellen Henson Brinkley, has been accepted towards fulfillment of the requirements for the Ph.D. degree in English.

CRITERIA FOR ASSESSING KINDERGARTEN TO TWELFTH-GRADE ENGLISH LANGUAGE ARTS CURRICULA, TEACHING PRACTICES, AND STUDENT PERFORMANCE

By

Ellen Henson Brinkley

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of English

1991

ABSTRACT

CRITERIA FOR ASSESSING KINDERGARTEN TO TWELFTH-GRADE ENGLISH LANGUAGE ARTS CURRICULA, TEACHING PRACTICES, AND STUDENT PERFORMANCE

By Ellen Henson Brinkley

Previous historical research has described the teaching of English language arts. Other research has examined criteria that can be used to evaluate educational programs as a prelude to reform. The purpose of this study was to investigate (1) by what criteria English language arts programs have been assessed in the United States in the past, (2) what criteria are being used and recommended for English language arts assessment today, (3) what lessons can be learned from the use of past and present criteria and contexts for assessment, and (4) how English language arts program assessments might provide data that decision-makers can consider in order to make needed reforms.

Data collection procedures included reviewing English language arts professional publications, especially the publications of the National Council of Teachers of English (NCTE), and designing a questionnaire and compiling questionnaire results. Respondents were members of NCTE committees charged with addressing issues of English language arts curricula, teaching practices, and student performance, as well as representatives from selected NCTE Centers of Excellence award-winning programs.

Results show that from the beginning of this country's history, evaluation of English language arts curricula and teaching practices has been linked to evaluation of student performance. Major findings of the questionnaire study are that teachers, students, and test results are perceived as the factors most significant in shaping English language arts curricula today; that school administrators are perceived as most significant in evaluating English language arts teaching practices; that exchange of ideas among teachers within their own school building is perceived as most significant in influencing change in teaching practices; and that objective tests are perceived as the most frequent means by which English language arts student performance assessment occurs. These results suggest the need for multi-dimensional program assessment, with student test results serving as only one criterion by which English language arts curricula, teaching practices, and
student performance are evaluated.

ACKNOWLEDGEMENTS

This study grows out of nagging questions that I, like all other English language arts professionals, have struggled with. Not only must we teach, but we must evaluate as well--and be evaluated. Several persons have made this project interesting, challenging, and possible. The members of my guidance committee have shown both the patience and the impatience I needed to get the project done. Each of the members of my committee--Stephen Tchudi, Marilyn Wilson, Sheila Fitzgerald, and Diane Brunner--has taught me more than I thought I might learn during the course of the study. Colleagues in the English Department at Western Michigan University have listened to my dissertation stories and have believed in me when I needed that most. My husband and children have put up with a wife and mother who at times has tried to do it all. To Max, Matthew and Sarah I offer my apologies, my love, and my thanks.

TABLE OF CONTENTS

Chapter 1  The Need for Research and the Research Process
    Research Questions
    Rationale and Methodology for this Study

Chapter 2  Early English Language Arts Evaluation: 1607-1924
    Very Early English Language Arts
    New Measures for Changing Times
    Composition Scales
    Evaluating Reading and Other Language Arts
    Indirect Evaluation of English Language Arts Teaching Practices and Curricula

Chapter 3  The Impact of Standardized Testing: 1925-1940
    Testing Enthusiasm
    Evaluating English Language Arts
    Evaluating English Language Arts Teaching Practices and Curricula

Chapter 4  Challenging the Tests: 1941-1957
    English Language Arts Test Abuse and Criticism
    Evaluating English Language Arts
    Evaluating English Language Arts Teaching Practices
    Major Studies Evaluating English Language Arts Curricula and Programs

Chapter 5  Reconsidering English Language Arts Evaluation: 1958-1969
    Evaluating English Language Arts
    Evaluating English Language Arts Teaching Practices and Curricula

Chapter 6  Expanding Testing and Alternatives: 1970-1987
    From Testing as Measurement to Testing as Management
    Evaluating English Language Arts
    Evaluating Reading
    Evaluating Literature
    Evaluating Writing
    Evaluating Oral Language Arts
    National English Language Arts Assessment
    Evaluating English Language Arts Teaching Practices
    Evaluating English Language Arts Curricula
    Toward a Theory of English Language Arts Assessment

Chapter 7  Current Conditions
    The Power and Impact of Testing
    Evaluating English Language Arts
    Integrating English Language Arts Assessment
    Evaluating English Language Arts Teaching Practices
    Evaluating English Language Arts Curricula

Chapter 8  Reports from English Language Arts Professionals
    English Language Arts Curricula
    English Language Arts Teaching Practices
    English Language Arts Student Assessment
    Optional Final Essay Items

Chapter 9  Conclusions, Speculations, and Recommendations
    Evaluation of English Language Arts
    Evaluation of English Language Arts Teaching Practices
    Evaluation of English Language Arts Curricula and Programs

LIST OF TABLES
LIST OF FIGURES
APPENDIX A
APPENDIX B
BIBLIOGRAPHY

LIST OF TABLES

1. Sample for Questionnaire
2. Factors that Shape Curricula
3. Curriculum Evaluation
4. Designing and Revising Curriculum Guides
5. Use of Curriculum Guides
6. Value of Curriculum Guides
7. Persons Who Determine Teaching Practices
8. Evaluation of Teaching Practices
9. Influences on Changing Teaching Practices
10. School District Means of Assessment
11. Classroom Means of Assessment
12. Factors Determining Means of Assessment
13. Factors in Program Assessment Process
14. Suggested Improvements for Program Evaluation

LIST OF FIGURES

1. Sample Unit of the Burgess Silent-Reading Test
2. Ayres Handwriting Scale
3. Participation in Discussion Flow Chart
4. Observation Checklist to Assess Reading Abilities
5. Estes Attitude Scale
6. Bay Village Reading Scale
7. Evaluating Writing as a Process

CHAPTER ONE
THE NEED FOR RESEARCH AND THE RESEARCH PROCESS

Driven by pressures from the general public and by national attention to testing and assessment,¹ kindergarten to twelfth-grade (K-12) school district personnel find themselves self-consciously wondering how effective their programs are, how they compare with others like or different from themselves, and how well their students and faculties measure up. My own experience confirms this fact, for I was recently hired as an external consultant to conduct a review of a K-12 English language arts program in a nearby public school district. My charge was to describe from an outsider's perspective the program as it currently existed and to recommend changes based on my knowledge and experience. It is this program evaluation experience that provided the primary spark of interest in conducting this study.

English language arts teachers and administrators are aware that current theory and practice related to the teaching of K-12 English language arts have changed dramatically during the last fifteen years. For example, in the past, beginning reading has been taught as a set of letter and sound identification skills on the premise that such skills led to comprehension of ideas being transmitted by a writer. More often now, even in the early grades, reading is taught as a transaction of meaning between reader and writer, a transaction that occurs as the reader uses graphophonemic, syntactic, and semantic cues from the text to construct meaning. In the past, likewise, literature has often been studied as fixed texts to be analyzed, with great emphasis on understanding the prevailing interpretations of literary critics.
Today, however, more attention is given to the interaction of student readers with texts, to the meanings that readers bring to texts, and to the meanings that readers construct as they read.

The teaching of writing has undergone change that is just as dramatic. In the past it was assumed that writers decided their meaning ahead of time and planned their writing according to a preconceived thesis and outline. The finished papers were submitted to a teacher--usually the sole reader--who then evaluated the finished products. Today writing is taught not only as a means of communicating predetermined thoughts but also as a way to explore thoughts and to discover a focus. Writing evolves, aided by response from teacher and peer readers, eventually being shaped into polished pieces to be shared with audiences as authentic as possible.

In the past, listening often has been taught by having students take notes as teachers lectured, and speaking has often been taught by assigning formal speeches and by requiring oral recitation of answers to study questions. More often today, however, both listening and speaking occur naturally in the interaction of small classroom groups engaged in collaborative learning activities, with students actively thinking, questioning, and articulating for themselves the concepts being studied and their responses to them.

Such teaching and learning strategies have been discussed and recommended by theorists and researchers in English language arts professional publications and conference sessions. In response, school districts have often sought to implement such reforms. Sometimes English language arts teachers and curriculum coordinators rewrite curriculum guides to reflect new theories and new methods and then simply hope for the best. Sometimes extensive inservice training and follow-up seem almost to guarantee successful implementation of reforms. Sometimes teachers' intuitions tell them that the new programs work, and sometimes research confirms the value and validity of the new theory and new methods. However, not every new method or program is successful, and sometimes there is disagreement among faculty, parents, administrators, and students as to how effective the new ways are or how much value still inheres in the old ways.

Unfortunately, as old ways are discarded, new ways are often still measured by old evaluation tools. In fact, today's English educators tell us that old evaluation methods need to be changed as well. Although state departments of education sometimes publish program assessment guidelines (available through the ERIC system), these frequently consist of simple checklists. School district teachers and administrators want and need better ways to assess their students' progress and better ways to assess their own English language arts programs.

School districts and communities must, of course, be concerned not only with assessment within the district but also with the array of state and national tests and assessments. Students from the early elementary grades on are given state and national standardized tests whose results are frequently analyzed and used to draw conclusions about the comparative performance not only of individual students but also of teachers, administrators, school districts, and even states and nations.
Anyone who helps pay for education--parents, communities, state and national governmental bodies, and the general public--feels a right, and often a responsibility as well, to ask about its effectiveness and efficiency.

Research Questions

The control of today's curricula and teaching practices, then, rests in bits and pieces all the way from the classroom to the Congress. Assessment of English language arts curricula, teaching practices, and student performance must therefore be considered within the context of complex political, economic, and social issues. English educators must consider profound context questions--questions that will be considered as a part of this study:

1. What is the purpose of evaluation and assessment?
2. Who will evaluate, and who will set the standards?
3. Who will be served by assessment, and who stands to lose?

While English language arts teaching and learning have become encumbered with multiple evaluators and multiple evaluations, the solution is not simply to reduce the participation and control of those outside the classroom. Accountability is a given--assessment does and will and should occur. The purpose of this study, then, is not to argue against assessment per se, which seems an unreasonable and futile endeavor, but rather to focus primarily on criteria for assessment:

1. By what criteria have English language arts programs been assessed in the United States during its brief history?
2. What criteria are being used and being suggested for English language arts assessment today?
3. What lessons can be learned from the use of past and present criteria and contexts for assessment?
4. How might English language arts program assessments provide data that is accurate, appropriate, and useful so that decision-makers can consider needed reforms?

Rationale and Methodology for this Study

A study of the materials from English language arts professional books and journal articles yields invaluable data about assessment practices of the past, since such publications provide the primary record of what English language arts professional leaders have observed and recommended. Especially during the earlier years, such information exists primarily embedded in broader discussions of how English language arts was being taught or of how educators thought it should be taught. My task in part, then, has been to glean the fragments of discussion about evaluation of English language arts curricula, teaching practices, and student performance from professional publications in order to determine the shifts in thinking and practice over the years--always recognizing the impact that overall English language arts trends and broader educational and national events had on evaluation of student performance and programs.

The historical research should be illuminating and instructive for English educators today as they act and react in an educational environment that currently seems to give its greatest priority to issues of testing and evaluation. Historical research is recommended to provide "a perspective for decision-making about educational problems," to assist in "understanding why things are as they are," to predict "future trends," and to avoid the plight of those who ignore the old adage, "those who are unfamiliar with the mistakes of history are doomed to repeat them" (Wiersma 184). Historical studies, such as Arthur N. Applebee's Tradition and Reform in the Teaching of English, have provided just such
helpful foundations for English educators as they consider broader English issues and reforms.

The earliest historical data can be most readily drawn from secondary sources, i.e., historical accounts of teaching practices in the United States, while later data can be drawn from a variety of primarily English language arts professional publications, which frequently record first-person testimony of English language arts educators and classroom teachers. Thus, information focused specifically on English language arts is easier to access after 1912, when the English Journal was first published.

Because of the broad potential scope of this study, limits have been imposed on the materials to be reviewed. Not included are discussions of assessment of K-12 English language arts programs outside the United States or discussions of the special assessment needs of ethnic minority groups and special education students. Also not included are discussions of textbooks or teacher education programs, though admittedly all of these factors are important considerations in their own right and have a bearing on how and why English language arts assessment occurs as it does.

As I consulted English language arts professional publications, I was aware of the problems that can cloud the value and validity of the information recorded there. While the editors of such publications have provided over the years a forum in which English language arts theories and practices could be discussed, there has always existed the possibility for distortion as professional bandwagons emerge and then disappear. Writers of professional articles usually present themselves as spokespersons who believe they possess insight about the truth and who hope others will seek, and benefit from, their shared insights. Indeed, writers of professional publications--especially those writing for a specific professional organization such as the National Council of Teachers of English (NCTE)--are usually influenced themselves by the articles and books previously published by that organization. Nor is it difficult to imagine readers who assume that, because particular English language arts practices have been discussed at length in professional publications, those practices have been adopted in a general way, whether or not that might actually be the case.

Additional data can be solicited, however, from those who are most apt to be especially attuned to the realities of English language arts assessment practices as well as aware of the professional lore included in professional publications. Unlike writers of professional publication materials, questionnaire respondents offer their insights upon request and thus may provide a somewhat different, perhaps more realistic perspective, since they can remain anonymous if they choose to do so. Thus, responses to a questionnaire could serve as a supplement and as a reality check to historical research and to current professional publication information.
Using principles of questionnaire construction suggested by Robert Slavin (Research Methods in Education: A Practical Guide) and Likert-scale information provided by William Wiersma (Research Methods in Education: An Introduction), and aided by conversations with my dissertation director and with a data collection and analysis consultant, I designed two versions of a questionnaire (Appendix A) for 300 persons who could be identified as likely to have knowledge and experience regarding the assessment of K-12 English language arts curricula, teaching practices, and student performance. By virtue of my position as vice president of the Michigan Council of Teachers of English, I have recent copies of the NCTE Directory, which allowed me to identify English educators (specific groups will be described in chapter 8) serving on NCTE committees charged with addressing issues of English language arts curricula, teaching practices, and student performance. In addition, I selected contact persons identified in lists provided by NCTE of 1985, 1987, and 1989 NCTE Centers of Excellence award winners, choosing carefully those whose awards appear, from the program titles listed, to have been given for an entire English language arts program or for a substantial part of one (e.g., a middle school writing program). The questionnaires were designed to elicit information about actual procedures and programs and to elicit opinions as well, and thus to yield both quantitative and qualitative information (detailed in chapter 8).

Even questionnaire responses, of course, must be interpreted with the understanding that reality as these respondents know it may also be shaped in part by the influence of professional publications. This limitation, however, is outweighed by the actual professional English language arts situations these respondents experience from day to day and year to year. Perhaps a more personal limitation exists because I myself have also inevitably been influenced by professional publications, though this limitation is also offset by my own professional experience as a K-12 classroom English teacher, as a teacher of inservice and preservice English teachers, and as an English language arts consultant.

The issue of distinguishing perceptions from reality does seem to be a central one, however. I realized, for example, as I prepared for the K-12 program review conducted recently, that I had to use limited time and money resources wisely to yield the most accurate data. Without a staff of researchers who could conduct extensive classroom observations, I had no choice but to rely to a great extent on what school district staff and persons in the community expressed as opinions, albeit relatively informed ones, about the operation and effectiveness of the English language arts program. On that occasion, what was ultimately helpful was John Goodlad's distinction among different types of curricula: the "ideal" curriculum, "what scholars . . . believe should be taught"; the "formal" curriculum, "what some controlling agency (like the state or the local district) has prescribed"; the "perceived" curriculum, "what teachers believe they are teaching in response to the needs of the pupils"; the "operational" curriculum, "what an observer would actually see being taught in the classroom"; and the "experiential" curriculum, "what the students believe they are learning" (qtd. in Glatthorn 18).
Gathering information from these multiple evaluative perspectives seemed an effective way to approach a K-12 English language arts program evaluation. As I approached the present study, it seemed likely that Goodlad's categories might again be useful as I analyzed both present and past English language arts evaluation criteria and contexts.

There are several assumptions, then, that underlie this study: that assessment and evaluation are inherently valuable; that many groups and individuals have a stake in assessment and believe it is important; that evaluation tools should closely match current theory and practice; that it is possible to assess for the wrong reasons; that assessment has the potential to do harm as well as good; that assessment is affected by external conditions; and that, in spite of the quantity of student test results available to administrators today, school districts need more information about program effectiveness than most have now.

With these assumptions in mind, I have tried to discover what has already been tried but did not work in the past, and possibly what has already been tried but might work now. I have analyzed the effectiveness of various methods and measurements used in the past; questionnaire data about actual evaluation criteria and contexts today; and informed opinions from questionnaire respondents as to how best to assess English language arts programs. From this study, I draw conclusions and make recommendations about the kinds of information school districts need to gather and how they might best acquire it in order to decide on needed reforms. This study and its results should have relevance for most public school districts and for consultants who anticipate conducting evaluative studies of English language arts programs in the future.

¹Today the terms "evaluation" and "assessment" are frequently interchanged, with both suggesting formal or informal means by which a judgment can be made. "Testing" is considered one means by which assessment or evaluation can occur. The usage of these terms has changed, however, over the years, especially the use of "assessment." In NCTE's 1975 booklet, Common Sense and Testing in English, for example, "assessment" was defined as "a term used for local, state, and national projects that seek to describe how well students are doing in various fields," while "evaluation" was defined as "the process of determining the value or worth of something in schooling at any level" (4). Interestingly, the Oxford American Dictionary (1980) defines "assess" as "to decide or fix the amount of value of, to estimate the worth or quality or likelihood," while it defines "evaluate" very similarly as "to find out or state the value of, to assess." The verb "test," however, is defined as "to subject to a critical evaluation of the qualities or the attributes of," seeming to emphasize both the power of the test-maker and the unpleasantness of the test-taking task.

CHAPTER TWO
EARLY ENGLISH LANGUAGE ARTS EVALUATION: 1607-1924

Very Early English Language Arts

A review of the literature from the past reveals that educational evaluation is as old, or nearly so, as are teaching and learning themselves, for any valuing of achievement or learning involves evaluation. In the earliest days of this country's history any education that occurred took place in the home, with parents as teachers or with a private tutor hired by the family.
It is easy to imagine that even there the language and literacy learning and teaching that took place were subjected to evaluation--the teacher evaluating the progress of the young scholars and his or her own work as a teacher, the students evaluating their own understanding and their teacher's effectiveness, the parents evaluating both the work of their children and that of the tutor.

In the American colonies, then, any evaluation of student performance was based on observable behaviors--reading the Bible aloud (N. Smith 35) during family devotions or reading and writing letters. Parents probably made mental notes as to how well a child's literacy compared to that of his or her siblings or even to the memory of the parents' own early achievements. Ultimately, of course, the results could be observed and evaluated further as the young students took on adult responsibilities that demanded the use of their knowledge and skills.

Clifton Johnson's 1904 Old-Time Schools and School-books, a study of colonial teaching practices, explains that as settlements grew larger, many communities started a dame school, for "[t]here was always some woman in every neighborhood who, for a small amount of money, was willing to take charge of the children and teach them the rudiments of knowledge" (25). It is difficult, however, to know how greatly the communities valued education or the devotion of teachers to their duties. Johnson explains that while the "dame" listened to the students' recitation, "she busied her fingers with knitting and sewing, and in the intervals between lessons sometimes worked at the spinning-wheel" (25). According to Johnson, sometimes in the South even convicts served as teachers during these earliest years. Apparently new settlers who had been convicted of small crimes occasionally paid for their ocean voyage, if they could read or write, by becoming indentured to teach for a length of time (32).

Nila Smith in her historical study of American Reading Instruction refers to the colonial period (1607-1776) as "the period of religious emphasis" (10). She explains that with Protestantism came the doctrine that individuals were responsible for their own salvation and thus had to learn to read and interpret scriptures for themselves (11). In such a society reading instruction was valued. Evaluation of reading skill occurred in the form of oral reading of the Bible or the New England Primer, as well as by saying aloud the letters of the alphabet and syllables as listed in the primer. Johnson explained that the minister also played an important part in evaluation. As a town officer he "examined the children in the catechism and in their knowledge of the Bible" and carried out what must have been one of the country's first evaluations of listening skills by questioning students "on the sermon of the preceding Sunday" (24).

Sidney Cohen supplies additional evaluation details in his description of the educational law enacted in Massachusetts in 1642. "Selectmen" in each town were charged with determining "whether or not parents and masters were following their obligations," that is, determining if the children were being taught "to read and understand the principles of religion and the capital laws of the country" (44). The stakes, at least as set by the law itself, were fairly high.
Fines could be assessed against parents who refused to have their children examined, and if a court or magistrate agreed with the selectmen that particular parents were remiss in educating their child, the child could be apprenticed, in which case the master of the "deficient child" would be required to fulfill the provisions of the law. In 1690 Connecticut passed a similar law which made it "incumbent upon local jurymen to examine the reading ability of all the town's children" and to fine negligent parents (81). Cohen points out, however, that in actual practice parents and towns often found ways around the penalties, and that sometimes the student readers' only test was to recite a memorized catechism, which did not actually measure reading skill at all (81).

Given the communities' relatively low expectations of teachers, there seemed little attention paid to evaluation of teaching practices or of curriculum. In 1654, however, a Massachusetts law recommended that the selectmen "exercise some supervision over the quality of the teachers employed by the community" (Cohen 56). Teachers may also have felt that both they and their curricula were being examined indirectly on the occasions when visiting officials examined students.

By the mid-1700s prospective students of Benjamin Franklin's English School ("English" in this case used to distinguish the school from those emphasizing Latin and Greek) had to meet the following entrance requirements: "It is expected that every Scholar to be admitted into this School, be at least able to pronounce and divide the syllables in Reading, and to write a legible Hand . . . ." (qtd. in W. Smith 177). Franklin had definite ideas about student reading performance that might or might not meet his standards. In describing the second of six classes to be taught, he complained that the boys

. . . often read as Parrots speak, knowing little or nothing of Meaning. And it is impossible a Reader should give the due Modulation to his Voice, and pronounce properly, unless his Understanding goes before his Tongue, and makes him Master of the Sentiment. (W. Smith 179)

Writing lessons focused throughout the early years primarily on penmanship and spelling, and evaluation was probably dependent on what could be demonstrated for all to see. The emphasis on good penmanship is indicated by "exhibition pieces" which were passed around for visitors to admire on the last day of the school term (Johnson 112). Franklin, however, again made it clear that meaning should be emphasized:

The boys should be put on Writing Letters to each other on any common Occurrences, and on various Subjects, imaginary Business &c. containing little Stories, accounts of their late Reading, what Parts of Authors please them, and why. . . . (W. Smith 181)

Evaluation of such work was a certainty, for Franklin added, "All their letters to pass through the Master's Hand, who is to point out the Faults, advise the Corrections, and commend what he finds right" (W. Smith 181).

During these early years when many did not read or write and when in many homes the only book was the Bible, oral language was considered especially important. Although relatively few went to college, those who did found that the colleges focused great attention on rhetoric and oratory, following the "oral-based eighteenth-century model of education" (Lunsford 3). The ability to speak correctly and persuasively in public was easily evaluated by student performance.
Oratory made demands on listeners as well, though early educators seemed less concerned about evaluating listening.

There was, then, during these early years great emphasis on receiving knowledge, on following rules, on learning to do things the right way. To the extent that students could remember and reproduce transmitted information, they were judged successful students. To the extent that teachers produced students who were functionally literate, teaching practices and curricula were met with approval.

New Measures for Changing Times

As the country's attention shifted in 1776 to revolution and independence, the explicitly religious emphasis in classrooms was replaced by a nationalistic and moralistic emphasis (N. Smith 37). It was hoped that reading would foster loyalty in the new nation as well as "high ideals of virtue and moral behavior" (N. Smith 37). Noah Webster's The American Spelling Book provided an American standard by which young students could be measured, and Lindley Murray's English Grammar was designed to "promote in some degree, the cause of virtue, as well as of learning" (qtd. in Tchudi and Mitchell 7). The emphasis in Murray's Grammar on the rigid procedure of "parsing" text also provided almost countless grammatical labels on which students could be tested.

Literature was studied during these years primarily as a subject upon which composition assignments and examinations could be based. Arthur N. Applebee's discussion of the colleges' attitude toward literature is instructive for understanding the view of pre-college educators as well. Essentially, literature was perceived as something to be enjoyed outside the classroom rather than as a subject to be taught. Literature was used inside the classroom, however, as a model for writing compositions, and secondary English teachers routinely found themselves teaching literature from the college reading lists to prepare their students for college entrance examinations (A. Applebee, Tradition 30).

By the mid-1800s Horace Mann had advocated the use of written rather than oral examinations because they were thought to be more objective:

. . . written exams provide all students the same question in the same setting. Oral examiners necessarily had to ask different questions during testing because all students were in the room awaiting their turns. Oral examiners also could phrase their questions so that some answers were more obvious than others. As a result some students received easy questions, while other students were stuck with difficult ones. (qtd. in Moore 958)

The academy movement that Franklin had promoted spread rapidly, with one report of 6,185 academies in the United States by 1855 (Spring 22). These "town schools" were frequently supported by the communities but controlled by private boards of trustees (Spring 22). Especially once settlers moved westward into more isolated locations, however, classroom teachers themselves frequently had little, if any, schooling beyond what their own one-room school had offered. My own great-grandmother was one of these young teachers. Family records describe her having attended the "country school" for a short time in the mid-1800s before enrolling in the Gallia Academy across the Ohio River from her Virginia home. Her final evaluation served as a form of teacher certification:

When she was only 14 and a half years old . . . her father took her on the back of him on a horse.
Took her to the three school trustees, and each one of them examined her in one or two subjects. She got high grades in all, and they gave her a school to teach when she was just fourteen and a half years old. (qtd. in Brinkley 2)

One of the reasons why such young teachers were recruited was the rapid growth of school populations that had begun to take place. Indeed, by the end of the 1800s it was no longer even possible for evaluation of student performance to be an individual matter between teacher and student, for no individual student could claim very much of the teacher's attention. At this time the issue of student evaluation, in the form of college admissions tests, occupied the attention of high school and college English faculties and students alike. In 1871 the University of Michigan organized a Commission of Examiners to visit high schools to evaluate faculty, students, and curricula. Thus an accreditation system was developed, so that students from approved schools could be admitted to college on the basis of their school's merit (Mason 41). Fred N. Scott, of the University of Michigan, simply stipulated that high schools should send them "young men and young women who respect their mother tongue and know how to use it" (qtd. in Hook, Long 12). Harvard, Yale, and other eastern schools, however, established rigid written examinations for all applicants (Lunsford 2). The high schools struggled to match their own curricula to the college reading lists so that their students would not be at a disadvantage.

High school teachers became exasperated, however, when faced with the need to teach so many works of literature listed by so many colleges and when they realized how little control they had over their own curricula. Eventually their complaints reached the National Education Association (NEA), which referred the matter to its English Round Table, which in turn formed a committee to study the matter, report its findings, and make recommendations. Questionnaire responses led this committee to recommend the establishment of a national organization of teachers of English (Hook, Long 14).

The formation of NCTE in 1911 and the advent in 1912 of the English Journal provided a broader forum in which English language arts teaching practices, curricula, and student performance could be described, evaluated, and improved. The journal's first editor explained that it aspired "to provide a means of expression and a general clearing house of experience and opinion for the English teachers of the country" and to be "a bearer of helpful messages to all who are interested in the teaching of the Mother tongue" (qtd. in Hook, Long 23). Hook observes that early NCTE leaders shared a belief that council publications "should not follow a party line but should be open to informed, independent expression of even highly divergent opinions" (23). Ten years later the second editor would likewise assert, "We desire to make the magazine an open forum for all, conservative and radical alike, who have important ideas and can state them well" (Hook, Long 83), though he admitted his own progressive bias, which he predicted would "result in a preponderance of the new methods in the magazine, but this on the whole seems to be desirable, since those are the less known" (qtd. in Hook, Long 83).

Because of the college entrance examination controversy, evaluation of student performance was an issue of importance from the first issue of the English Journal.
Attention to the college entrance issue was soon diverted, however, to the "new-type" tests and new theories about how student evaluation could and should be handled. Such a direction made sense to those of this time who were placing less faith in God and religion and more in scientific "truths." At about the same time intelligence tests were being developed. The Binet scales (1905-8) and the Stanford revision (1916) were used during World War I in what today would be labeled a "high-stakes" assessment situation, for these tests were used to classify recruits to determine who would serve in leadership positions and who would be sent to the front lines. Real-life testing during World War I revealed other results--for example, that thousands of soldiers could not read well enough to follow printed military instructions (N. Smith 158). Educators soon realized the potential for such tests in the schools, supposedly to group similar students in order to provide instruction to match students' abilities (A. Applebee, Tradition 82). Soon Edward Thorndike was calling for accurate measures of what he referred to as educational "products" and working untiringly to develop objective tests for a variety of subjects (N. Smith 127).

Turn-of-the-century educators had begun to experiment with what they called "scientific" methods of teaching reading. Smith reported that in 1902 the "scientific alphabet" had been introduced to reduce the number of characters that represent the sounds of the English language in order to facilitate reading instruction and learning (127). The "sentence method" and the "story method" as well as elaborate "phonetic methods" were also introduced (N. Smith 128). Edmund Huey's classic, The Psychology and Pedagogy of Reading, published in 1908, provided a "scientific treatment" of reading, according to N. Smith (123).

As if the country and its schools were not already eager enough for standardized and objective tests, early issues of the English Journal included articles which highlighted the need for more accurate and especially more expeditious ways to evaluate student performance, teaching practices, and curricula. Such topics were right on target for school teachers and administrators who had seen elementary and secondary student populations jump from 6,871,000 in 1870 to 17,813,000 in 1910. They also had seen the number of high schools increase from 500 in 1870 to an amazing 10,000 just forty years later (Kirschenbaum et al. 51). Teachers eventually were faced with classes as large as 50 students and more, with the result that--especially for high school teachers who met a new group each hour--it was extremely difficult to know students' individual interests and abilities. Vincil Coulter's 1912 article complained about the difficult teaching conditions for English teachers, especially when compared to the favored status of science. The data he presented served as an evaluation of English language arts teaching conditions. For example, whereas science teachers each taught an average of 75 students, English teachers taught 136. Schools which spent $1.42 per pupil for science materials spent 17 cents per pupil for English materials (25-26). Later in that same year Ernest Noyes optimistically called in his article for a "clear-cut, concrete standard of measurement which will mean the same thing to all people in all places and is not dependent upon the opinion of any individual" (534).
Other publications added fuel to the fire by demonstrating the unreliability of grades as measures of student accomplishment. For example, Kirschenbaum et al. cited 1912 studies by Starch and Elliott in which papers graded by teachers in 142 schools revealed that one particular paper was scored anywhere from 64 percent to 98 percent while another was scored from 50 percent to 97 percent. Another student's paper got failing marks from 15 percent of the teachers while 12 percent of the teachers gave it a score of 90 percent or above. Kirschenbaum et al. explain that, "with more than 30 different scores for a single paper and a range of over 40 points, there is little reason to wonder why the reporting of these results caused a 'slight' stir among educators" (54-55). Given these conditions and concerns, it is not surprising that "scientific," i.e., standardized and objective, tests soon captured the attention of English and language arts educators at all levels.

Composition Scales

How to test composition seemed to pose the problem commanding the greatest attention among English educators, and it led to the development of a number of composition scales. One of the first and most popular was the Hillegas scale. A 1912 English Journal article explained how the scale had been developed: a large number of student compositions had been sent to several hundred judges, who were asked to arrange the papers in order of merit. From these rankings, a scale of ten samples "ranging in value by equal steps from 0 to 937 units" was derived (Noyes 535). (Actually the zero point was established on the basis of an "artificial sample produced by an adult who tried to write very poor English" (Noyes 535), an understandable cause for later criticism of this particular scale.) The ten sample papers and their percentage scores were copied and distributed to serve as what today would be called "range finders" by teachers, who could compare their own students' writing to the samples.

It is interesting to notice the many benefits that were projected for such measures. Supervisors, for example, were told they could use the samples to "compare classes of the same grade in different schools, in different cities, or under different teachers" (Noyes 536). These suggestions emphasized the external uses that could be made of test scores and at least implied the possibility of linking teacher evaluation to student performance on the basis of what were thought to be objective measures.

Thorndike in a later English Journal article explained the mathematical procedure used with the Hillegas scale. The rankings of all the judges were averaged for a particular sample, and all the samples were then arranged in order of merit. The value of 1.0 was assigned as the amount of difference that existed when 75 out of 100 experts ranked a sample correctly, that is, when no more than 25 judges put the "worse" sample ahead of the "better." Once a zero score had been established, samples could be selected which were 1.0 better than zero, 1.0 better than 1, 1.0 better than 2, etc. (Thorndike 551).
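Thorndike's unit is easier to see when the procedure is restated in modern notation. The restatement below is my own reconstruction for illustration only, not notation that appears in Thorndike's article:

\[
d(A, B) = 1.0 \quad\Longleftrightarrow\quad \Pr(\text{a judge ranks } A \text{ above } B) = 0.75
\]

\[
d(S_k, S_{k-1}) = 1.0 \qquad \text{for } k = 1, 2, 3, \ldots
\]

Here \(S_0\) stands for the artificial zero sample, and each \(S_k\) is a sample judged one unit better than the sample below it. On this reading, a 50-50 split among the judges marks no measurable difference between two samples, while 75-percent agreement marks exactly one unit, so the unit can be chained upward from the zero point across the whole range of student writing.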
Although the Hillegas scale seemed the most commonly used and discussed composition scale, the Harvard-Newton Scale was also frequently referred to. Eighth-grade compositions were marked by the percentile method by 24 teachers in much the same way they were with the Hillegas scale. However, the Harvard-Newton Scale went further by offering writing samples for each of the four modes--narration, exposition, argumentation, and description (Starch 146).

Not everyone bought the notion of writing scales. C. H. Ward in a 1917 English Journal article, "The Scale Illusion," attacked the practice of ranking themes (221). This writer argued that "[a]ny measure of literary value is impressionistic: any measure of literary value and mechanical value at the same time is a phantom" (223-4), and further insisted that, "A system that shows [the student] only his height above an absolute zero can no more produce a harvest than a thermometer can bring forth figs" (230). What Ward offered instead of the Hillegas scale, however, would not be well received by modern readers, for it was a system based on the principle of subtracting errors from a perfect score. In 1920 Roger Hatch tried a somewhat similar attack on composition scales, complaining that they were "mostly based upon that most deceptive theory, that law of averages" (338). Citing his 20 years of work trying to find out what colleges wanted in composition, he offered a system of penalties, which he used as a pedagogical tool by presenting it to his composition students to refer to as they prepared assignments. Again, modern readers might cringe to hear his comment on the efficiency of his system: "with the composition marked in red ink or pencil . . . it is the labor of a moment only to run the eye over the marks and add up the total, subtracting from 100 percent to get the basic grade" (342-3).

Flora Parker recommended confining Hillegas' procedures to subjects which were definitely measurable, such as spelling, rules of grammar, etc. She insisted that composition is, in addition to being a demonstration of correctly applied rules, also "an art with all the intangible graces and beauties which reside in that realm" (204). Immediately following Parker's article is a reply by S. A. Courtis, who rather smugly labeled the scale a "valuable measuring device" (208) and asserted that "everything worth while in education is also measurable" (208). Again, evaluation of teachers and curricula was linked to student performance, for, according to Courtis, the Hillegas scale would be especially beneficial for supervisors who must decide "questions of general policy" (213) and could be used to determine "the efficiency of different methods of teaching." Further, he suggested that teachers could use the scale to judge their own work so that they could change their methods and bring their own work "up to standard" (215). Ultimately, any teacher who proved "incapable of profiting by training . . . need[ed] to be eliminated as a teacher of English composition" (216). It is not difficult to imagine the reactions of classroom teachers who might read such warnings: teachers who valued their jobs would work hard to see that their students' test scores were as high as possible.

The Breed and Frostic Scale (Klapper 190) for sixth graders was similar to the Hillegas Scale, but all student writing was typed before being submitted to judges, to eliminate handwriting influences on scoring. Somewhat like the Harvard-Newton scale, Van Wagenen's Composition Scale offered distinct scales for description, narration, and exposition, but included separate values for "thought content, mechanics, and structure" as well (Klapper 200). Klapper's Teaching English in Elementary and Junior High Schools echoed the beliefs of others already cited in that he specifically advocated the composition scales as especially useful as measures of teacher achievement (230) and of the value of teaching methods (231).
Wirt Faust describes yet another of these obviously time-consuming and expensive projects to design a new and better way to evaluate composition. Working with four high schools, he asked seniors to write to a prompt, as did most of the other plans as well, in this case assigning themes describing "fields, lakes, seas, streets, and the like and to contain no mention of human characters" (258). Of these he chose 30, which were typed to eliminate handwriting, and sent to 40 judges (primarily NCTE members). The judges were asked to arrange the papers in order of merit and to write for each paper an opinion about its merits and defects and the reason it was better than the one below it and poorer than the one above. Each was given a numerical score on content and on form. Interestingly, the article published just the 12 best of the 30, along with the data about merits, defects, and comparisons of each, and offered the samples as "standards in descriptive theme writing for the Senior year of the high school" (260). The puzzle in this case is that there was no further mention of those samples numbered 13 to 30 or of the level of student writing they represented. It seems at least possible that readers might mistakenly have thought of the top 12 samples as representative of the entire range of responses against which to measure their own students' work.

Although many questioned the validity and reliability of composition scores and standardized tests throughout this period, the experts still seemed to side with Daniel Starch, who had argued in 1916, "[a]ny quality or ability of human nature that is detectable is also measurable" (2). Finally, NCTE, which had aired so many of the pros and cons in the English Journal, spoke out on the testing issue. The June 1923 issue included a report from the NCTE Committee on Examinations, whose first sentence made it clear where the professional organization stood--"The Committee on Examinations desires to stimulate an interest in a more widespread use of standard tests in English" (Certain 365). The primary reason for this recommendation seemed to be that such tests could provide school districts the opportunity to compare their standing with other districts, an external function of testing. Sterling Leonard's article later that year reminded readers why these so-called "scientific" tests were preferable by describing a study of teacher corrections on student papers. He found a great number of "wrong" corrections as well as "a multitudinous array of puristic or wholly captious excisions, restatements, rearrangements, and additions which make over the pupils' own expressions into such as fit the corrector's way of thinking and writing" ("How" 528). Almost no one, it seemed, wanted to go back to the old ways.

Evaluating Reading and Other Language Arts

Although the English Journal was NCTE's only professional journal during the early years, it focused much of its evaluation attention on discussion of composition, especially that of secondary students. Elementary educators, however, were devoting more of their attention to the evaluation of reading, although N. Smith explained that in fact standardized reading tests were slower to develop--probably because both silent and oral reading were difficult to measure and to analyze into testable elements (161). Starch's 1916 book included a reading test that he designed that may have served as an early model for later tests.
It was intended to measure the "chief elements" of reading, perceived by Starch as comprehension, speed, and correctness of pronunciation (20). He offered several reading passages at increasingly difficult reading levels, which students were asked to read silently for thirty seconds. Following the reading they were asked to mark the spot where they stopped reading and to write down as much as they could remember from their reading. Interestingly enough, the written retelling was scored by crossing out the words which reproduced the text and by counting those remaining--seeing what percentage of words should be discarded as not related to the text (31).
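Because Starch's description of the scoring is compressed, a small worked example may help. The figures below are my own illustration of one plausible reading of the procedure, not numbers taken from Starch:

\[
60 \text{ words written} \;-\; 48 \text{ words reproducing the text} \;=\; 12 \text{ words remaining},
\qquad
\frac{12}{60} = 20\%
\]

That is, if a student's written retelling ran to 60 words, of which 48 could be crossed out as reproducing the passage, the 12 remaining words--20 percent of the retelling--would be discarded as not related to the text.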
    As soon as this happened, he ran home to his mother, laughing as hard as he could. (241)

Such items might trouble modern test-makers, who would question them as too subject to a variety of interpretations. Gray's test, however, was perhaps more typical of oral reading tests. As the child read aloud, the tester was to record on a copy of the paragraph the errors made by the student, with the idea that the better students would read faster with fewer errors (Stone 263).

    I. This naughty dog likes to steal bones. When he steals one he hides it where no other dog can find it. He has just stolen two bones, and you must take your pencil and make two short, straight lines, to show where they are lying on the ground near the dog. Draw them as quickly as you can, and then go on.

    Figure 1 - Sample Unit of the Burgess Silent-Reading Test

Tests to measure other dimensions of the language arts were discussed and developed during this same era of test fascination and explosion--some of which look very unrealistic to readers today. For example, the Ayres handwriting scale, as described by Starch, was intended to evaluate students' handwriting skill. Incredibly, the test was constructed by taking samples of 1,578 children's handwriting, separating the individual words, and then measuring the speed with which readers could read these words. Eventually eight degrees of legibility were determined and presented to be used as guides, with three samples of each--slant, medium, and vertical (Figure 2 shows a portion of the scale).

Literature posed unique evaluation problems, which were discussed on occasion but were difficult to address effectively. Efforts were made, however, to create standardized tests of appreciation of literature. Leonard described the process used in such a test that was to measure the literary appreciation of both teachers and students. The test presented a number of poems "ranging in quality from Mother Goose to Bridges or Masefield." Each poem was accompanied by three "spoiled" versions, and test-takers were asked to determine which in each case was the "best" (Essential 59), thus demonstrating their ability to discern and appreciate the best literature.

Throughout this period some wished for even more standardized tests. For example, Klapper in 1915 sought a scale for oral composition, since he believed it was much more important than written composition, which he termed "incidental" in the life of the average person (221). In fact, by 1925 early issues of The Elementary English Review reported projects in Detroit and Chicago to develop oral composition standards (Hosic; Beverly). Some educators, however, reminded readers that "desirable language habits" were best observed in everyday oral and written expression. Without using the expression "reflective practice," they encouraged teachers to think of their classrooms as "testing laboratories" in which they could use their students'
skill to revise 'teaching' practices (Savitz, [tes, and Starry 2). Several authors of this period did point the way to raluation recommendations that eventually became .despread. For example, a 1918 English Journal article gued against "old-time memory tests" that asked students . parrot back information provided by texts and teachers. e author did not offer standardized or objective tests in eir place but rather open-book "thought examinations" stead (Wiley 327). Although Certain, speaking in 1923 for TE, advocated standardized tests for diagnostic purposes, also made recommendations which sound more current today, ch as gathering work in individual student folders (466). direct Evaluation of English Language ts Teaching Practices and Curricula From the previous citations, it is clear that andardized tests of student performance were from the ginning promoted in part as a way to evaluate teaching actices and curricula. High student scores, according to is theory, were seen as evidence of good teaching actices and of teaching what needed to be taught. There very little in the professional literature about direct aluation of English language arts teaching practices, at iSt. during these years, probably Ibecause teacher iluation was seen as the province of school inistrators. There needed by ‘ article pro "natural en< "quick mind, temperament, 38). Spec: that they 14 “ability to knowledge a: "provocativ expression, was made evaluation, (or those c The 04 Place duri national : In 1913, English ap] the NBA w national s schools as numbers C interestin Was llh0W 40 There is, however, discussion about the qualities eeded by the English teacher. Franklin Baker’s 1913 rticle provides an example. He listed the following natural endowments" needed by teachers-—a "clear mind," a quick mind," "retentiveness and fulness of memory," "social emperament," and a "keen, intuitive sense of languge" (336- 8). Specificially for English teachers, he recommended hat they know and love their subject, that they have the ability to stimulate and guide other minds in acquiring nowledge and love of subject," and that they be skillful in provocative talk that leads to clear, vigorous thought and xpression, oral and written" (338). Although no mention as made of using these qualities for formal teacher valuation, readers could compare their own qualifications or those of others) against the lists provided. The other English language arts evaluation that took lace during the early NCTE years was district-wide and ational studies of curricula and teaching practices. n 1913, for example, a national study of high school nglish appeared, sponsored by committees from both NCTE and he NEA which had been charged ultimately to prepare a ational syllabus. A questionnaire had been sent to high :hools asking for information about teaching conditions, meers of students, and curriculum. An especially mteresting part of this study is that one of the questions is "how do you test the efficiency of your English >urses?" ("Types" 582). Responses included expected answers such school exams were also : effective e appreciate l conclusion there seem satisfactory served seve individual I to compare might decid found happ, using data improve Wor The g involved 11 Cremin repc POPular mac. Opinionated talk with interview I in 11% "public a] incompetem needed 8cm 4]. nswers such as college entrance exams, college courses, and chool exams. 
On the other hand, less quantifiable measures were also mentioned, such as "the power of clear and effective expression," "interest in the work," "power to appreciate literature," and "voluntary reading." It was the conclusion of this committee in 1913 that "on the whole there seemed to be no tests generally regarded as satisfactory" (582). Publishing the results of such studies served several purposes. It is easy to imagine that individual English teachers might study the reports in order to compare their own teaching practices. English faculties might decide to revise their curriculum based on what they found happening in other places. School administrators, using data from the reports, might persuade school boards to improve working conditions or buy new materials.

The general public did, however, occasionally get involved in the evaluation of school programs. Lawrence Cremin reports that back in the late 1800s The Forum, a popular magazine of the day, had sent Joseph Mayer Rice, an opinionated pediatrician, around the country to observe, talk with teachers, attend school board meetings, and interview parents (4). When Rice's findings were published in The Forum, they included sensational descriptions of "public apathy, political interference, corruption, and incompetence" (4), sparking considerable discussion about needed school reforms for several years to come. By 1924 an NCTE study was conducted which involved 7,752 persons in 42 states. In this case persons in a variety of occupations were asked their opinions about language skills needed "for ordinary success in life" (Searson 102). The results in this case were intended to be used to help design "a program big enough to challenge the imagination and cooperation of the leaders of the country" (Searson 99). This same study included a questionnaire distributed among 8,799 English and language arts teachers. One of the questions asked teachers to choose among 21 items the "most urgent thing needed to improve the teaching of English." Given the climate of 1924, it is not surprising that one of the top choices was "definite standards of English work for each grade, or year" (Searson 105). Such teachers undoubtedly provided a substantial market for Franklin Bobbitt's How to Make a Curriculum published that same year and including hundreds of ready-made educational objectives.

This brief overview makes clear the fact that, from the earliest colonial examinations, individual teachers' judgments have been viewed as inadequate measures of student performance. Society's representatives from outside the classroom have from the beginning been involved--whether they were ministers, school trustees, or college admissions officials. Still, such evaluations were subject to varying personal opinions. It is easy to understand the promise that standardized scales and tests held to turn students' reading and writing into numerical scores not tainted by human attitudes and impressions.
The same impersonal efficiency that worked on the factory assembly lines seemed to promise both higher productivity and quality control in English language arts evaluation of student performance as well. Still, it is striking how much faith English language arts educators placed in the tests. One wonders if any took the time to study the content of individual test questions in order to determine whether the questions, either individually or collectively, reflected the learning they sought for their students.

At any rate, caught up in this spirit, by 1924 they expressed the view in the professional journals that numerical scores were the primary means by which to set standards for English language arts student performance, teaching practices, and curricula. Few at this point questioned whether the tests themselves could accurately and adequately evaluate the merits of complex English language arts skills or processes. Few seemed, in print at least, to question the effects the numerical labels might have as student sorting devices.

CHAPTER THREE

THE IMPACT OF STANDARDIZED TESTING: 1925-1940

Testing Enthusiasm

Nationally these were years of good times economically followed by great hardship, years when students began to stay in school longer. By 1930 high school students represented over half of all the population of high school age--attributable in part to "the shrinkage of opportunity for employment" (Koos 305). Even the advent of school buses made a difference, for school consolidation could then become a reality (Cremin 274-5), promising greater efficiency--undoubtedly an important consideration during the hard times.

If the first half of the 1920s had been dominated by the development and description of various standardized and objective tests of English and language arts, the years from 1925-1940 brought widespread use of, experimentation with, and discussion about tests and their potential uses.

In such a climate many English educators who had earlier praised standardized and objective tests enthusiastically recommended that such tests be given an important place in the curriculum. For example, the editor of Elementary English Review, C. C. Certain, outlined in September 1926 a testing program for the new school year. Seemingly unaware of any problems with the tests, he advocated using all of the following standardized tests during the year: Briggs Form Tests--Alpha and Beta; Wisconsin Test of Sentence Recognition--V and VI; Wisconsin Test of Grammatical Correctness--A and B; and Clapp's Correct English Test--A and B (211). Moreover, he recommended that these tests be supplemented by "parallel dictation tests" and "controlled composition tests" (211).
For secondary English teachers Charles Thomas recommended the "new tests" in The Teaching of English in the Secondary School (1926), insisting that "[t]he most significant movement in education during the twentieth century centers in the attempt to provide scales that will objectively measure ability and achievement in school work" (433). His recommendations came as a result of his belief that teachers should be "true to the ideals of our profession," by employing "every agency that will correct our unstable personal judgments" (439). Perhaps the most striking and instructive information he included was a long chart listing scales and measures that secondary English educators could choose from: 21 tests were listed for composition, 20 for grammar, 18 for language, 11 for literature, 6 for punctuation, 30 for reading, 26 for spelling, and 17 for vocabulary (475-82).

The standardized and objective tests were most highly praised as a corrective of teachers' subjective judgments, which were thought to be too often determined "by mood, prejudice, or gross misconception of factors and conditions" (Thomas et al. 209). Beyond that, English teachers might defensively expect other benefits as well. Leonard, for example, explained that such tests were recommended (1) to find out how many "pupils in a hundred in various grades make certain censured grammatical errors of the purely conventional sort like 'them kind' and 'learn us,'" and (2) to "enable schools to find out the same thing specifically for themselves" (430). Again, students' responses puzzled the test-makers. When asked to complete the sentence, "I have finished my work and ____ home," some students filled in the blank with "chores" or "mother," words which shared an association with "home" but which did not fit syntactically with the rest of the sentence (such responses sound like those that second-language users might have given). Included in Leonard's article was an elaborate tabulation of 31 errors in order of frequency so that teachers could see, for instance, whether "laying" for "lying" was apt to be a more common problem than "ain't" for "aren't" (440).

Another effort to rank errors appeared in a 1929 article. In this case a group of New York University students had given teachers copies of student texts and asked them to find the errors in the papers. Rather than focus on the quality of students' usage, the university
In another case, a Massachusetts esting program confirmed that students in lower elementary 'rades made the same errors as students in the upper econdary grades, though the older students made fewer of he same errors (G. Wilson 117). The improved test scores or older students were attributed in this case not to ffective teaching or learning but to the likelihood that pupils doing less well in school work tend to drop out, eaving in school those who are proficient" (Wilson 117). Typically, the oral language arts—-speaking and istening--received less attention among test developers uring these years. Some unwieldy efforts were made, owever, to apply the methods used for composition scales to ral language. Sydney Harring, for instance, reported in 1928 a so: children compositior (71). Lat study whic records of various su terms of s. of ideas; measure v. well. The The stumbl with such Classroom as these, Procedures there seen for correc Effor apply Star designers infornath to find V Such attex which Drc Deetry. With One 52 1928 a scale devised using stenographic records made as children presented oral compositions. The written compositions were then judged for "composition quality only" ( 71). Later Mildred Dawson described a somewhat similar study which also used stenographic records, in this case records of students’ "conversation and discussion in the various subjects" (195). This data was then analyzed in terms of sentence structure, correct usage, and organization of ideas; and eventually a rating scale was constructed to measure voice, posture, articulation, and vocabulary as well. The results for each child were then charted (195). The stumbling block was, of course, the stenography needed with such plans. It seems questionable whether any classroom teachers were able to immflement such suggestions as 'these, though tape recordings would later' make such procedures somewhat more practical. Again, in this case there seemed a special effort to focus attention on the need for correctness. Efforts continued during these years to find ways to pply standardized methods to the study of literature. Test esigners realized the need to measure more than factual 'nformation about literature and struggled especially hard 0 find ways to measure appreciation of literature. One uch attempt was reported in the March 1926 English_gggrg§;, hich. provided. a test designed to test appreciation of oetry. One section of the test included lines from poems ith one version unchanged and the other two versions reworded ": Students we best to the And ma Find c And me Find c Come, Seek s In order t understand. test asked you ever shepherd t that stude apt to app Wonders wj answered q "Have you pedagogica knowing t Such (Tiles Sec°nd la respondim for this that the 23 Percent after Cla 53 reworded "in order to destroy the rhythm" (Ruhlen 203) . Students were asked to indicate which of the choices sounded best to them: And may my old, lingering age Find out some peaceful hermitage, And may at last my weary age Find out the peaceful hermitage, Come, let my old, dull, white age Seek some quiet and restful hermitage, In order to determine students’ background experiences and understanding of poetic language, another section of the test asked for yes-no responses to such questions as "Have you ever seen . . . a dappled dawn? ebon shades? a shepherd telling tales under a hawthorne?" The intent was that students unfamiliar with the vocabulary would be less apt to appreciate and understand poetic style. 
However, one wonders what students might have thought about as they answered questions such as "Have you ever tasted ale?" and "Have you ever been lulled to sleep by the wind?" And what pedagogical conclusions might teachers have drawn from knowing the students' negative or affirmative answers to such questions? Again, students for whom English was a second language seem to have been at a disadvantage in responding to such questions. At any rate, the responses for this particular test were carefully calculated to show that the class as a whole had a background for appreciating 23 percent of the imagery in "L'Allegro." A post-test given after class discussion of the poem revealed that students then understood about 50 percent of the images (208), leading the author to conclude that perhaps this poem should not be taught, since so many students still seemed unable to appreciate the imagery even after having received instruction about it.

Perhaps more useful were teacher-designed tests that asked for personal engagement of students with their literary texts. For example, Olga Achtenhagen's 1926 article suggested using "thought questions" intended to engage students personally in their reading and to build on the material already familiar to them for response to such questions as "do you agree with Bacon when he says that there are certain things which ought to be privileged from jest? What are your reasons?" (288). In 1928 Ruth Moscrip similarly suggested that students be asked to read a selection, then write responses to questions such as "Who would you rather have been, Maggie or Tom? Why?" Using these written responses prior to class discussion, a strategy often recommended for classrooms today, the author reported that subsequent discussion had been especially lively and led her to conclude that questions asking for personal opinion and engagement aided literary appreciation better than the factual tests of knowledge she had created in the past (140).

In 1931 Poley's "Learning by Testing" suggested a surprising innovation which offered a hint of the student-centered focus for English language arts that was beginning to develop during the 1930s. Poley reported that he had asked his ninth graders to devise their own tests and explained that students were enthusiastic about coming up with the best possible questions (135).

During the early 1930s English came to be viewed as a part of "life experience"; that is, English and language arts were considered important not just as a way to acquire knowledge and skills but also as a way to understand and interpret life's experiences. In 1932 W. Hatfield, Chairman of NCTE's Curriculum Commission, spoke of "an ideal life as the ideal curriculum" (179). Such a reconceptualization of purpose brought with it a new consideration of evaluation as well. Hatfield explained that just as "the backbone of the ideal curriculum . . .
is a sequence of experiences constantly increasing in complexity and subtlety," so also "the appraisals must be in terms of growth of power in the life experiences rather than formal tests" (191).

Later an article entitled "Power-Testing in Literature" asserted that the primary purpose for testing was to develop control over books students would encounter in the future rather than to display knowledge of books they had already read (Jonas 800). A "power" test was created by giving students three unfamiliar passages written by authors the class had studied and asking them to identify the writer of each and to explain the similarities to previous works studied. Such a test "represent[ed] not crammed facts and parroted views but unaided pupil power to deal with fresh material" (804).

Following the focus on the child-centered pedagogy of Rousseau, Froebel, Pestalozzi, Dewey, and Kilpatrick (N. Smith 243), the "activity curriculum" for elementary students emerged during the 1930s with its emphasis on English as experience. The activity curriculum was said to begin with "something which an individual or group has already experienced" and to continue under the following circumstances:

    . . . through the desire of the individual or group to further interpret the experience, difficulties arise and through the efforts of each individual or group to overcome these difficulties, new interests are created and new problems appear, and so on. (N. Smith 244 [citing 33rd yearbook, part II, National Society for the Study of Education])

Sounding remarkably current today, this curriculum was described as "a never-ending process . . . each experience leads on to further experiencing, thus forming an intricate network, which involves investigating, questioning, planning, performing, evaluating, appreciating, achieving, and enjoying" (N. Smith 244).

As reading had become more and more important to and popular with society, evidenced by increasingly widespread publication of newspapers and books (Gray 10-11), efforts to understand the reading process likewise expanded. William Gray reported, for example, that during the years 1925-37 the number of research studies in reading had been more than twice the number reported during the entire preceding century (15). As might be expected, efforts to evaluate reading expanded as well. In 1931 Paul Sangren pointed out how many test-makers had scrambled to design reading tests:

    While there are only three or four standardized oral reading tests that have been used to any considerable extent, there are approximately one hundred fairly well standardized silent reading tests that have found frequent use. (53)

His long list of such tests includes stated purposes ranging from purely decoding skills to measuring word recognition to interpreting texts and testing power to comprehend total meaning of paragraphs (88-93). Perhaps Sangren's book itself, Improvement of Reading Through the Use of Tests, is testimony to the importance placed on reading tests.
Intended as a textbook, presumably for a college course for teachers or administrators, Sangren's book ended some chapters with problems for study and discussion. Possibly just as useful for potential reading specialists were a variety of graphs, scales, charts, and diagrams--all of which served as examples of the kind of data that could be generated from test scores.

Indeed, reading theorists of the day seemed particularly eager to test. Arthur Gates, a professor of education at Teachers College who also served as a consultant for a major publisher of basal materials (Goodman et al., Report 22), affirmed, as might be expected, the "soundness of the policy of making systematic use of standardized tests at intervals during the year" (Gates 359). However, reading educators led the way in suggesting other means of evaluation as well, probably as a reaction to the inclusion of more student-centered classroom activities that were a part of the experience curriculum. Gates, for example, called for wider use in elementary classrooms of "certain observational methods, ratings, questionnaires, combination teach-and-test materials, subjective appraisals of study habits, and other less widely used techniques" (359). He insisted that superintendents, principals, curriculum departments, book committees, supervisors, and teachers all needed evaluative data (360) and that for each student the following information should be available at all times: age, intelligence, language abilities, previous reading experiences and interests, vocabulary, basal silent-reading skills, word mastery skills, basal oral-reading skills, general reading habits, reading interests, and advanced reading and study skills (360-61).

For high school students Gates recommended that schools follow a number of testing procedures. In addition to administering standardized reading tests at the beginning and at the end of the year, teachers should periodically gather and evaluate records of independent reading done by students. They also should test students' ability to find information in libraries and in books; should check students' study habits by observation, individual conference, and self-inventory; and should measure students' ability to read in the various subject-matter fields "by informal tests in connection with their class work." Teachers should also seek "expert diagnosis and remedial treatment for students whose reading ability falls below the norm for grade VII" (386). Gates especially praised the use of student workbooks and other "printed booklets of teach-and-test materials," because they provided an effective way for the teacher "to keep in almost daily contact with each individual's progress and difficulties" and because they "increas[ed] efficiency by teaching and testing at the same time" (375).

It is difficult not to attribute part of Gates's interest in having published materials used to the fact of his own employment by a publisher. However, Gates's position was not considered extreme.
Indeed, many other educators in the years ahead presented similarly long lists of evaluation suggestions in an effort, one assumes, to be genuinely helpful. It seems likely, however, that in actual classrooms teachers responded to such long lists by dismissing them as impossible within the context of their own busy classroom schedules. How much simpler to depend primarily on objective tests which produced what they perceived to be a scientific certainty that students, parents, and administrators would find difficult to question.

Evaluating English Language Arts Teaching Practices and Curricula

Evaluation of teaching practices among English language arts teachers continued through the late twenties to be handled indirectly, at least as such evaluation was treated by the professional English and language arts publications. Sidney Cox's 1928 The Teaching of English, for instance, rather quaintly discussed virtues needed by all teachers--an "antipathy to deception," nerve, energy, and health (80-83)--followed by a discussion of qualities unique to English teachers--"a fundamental and imperious desire to communicate," "an urge to establish reciprocal relationships" with students, and the ability to cope with "many sorts of actuality all the time" (83-84). Other desirable traits on Cox's list included a general interest in people, taste, an "acquaintance" with good books, the "desire" to write, and possession of knowledge (84-88). Most important, Cox asserted, was that the English teacher be a "real person" (89). Eleven years later the teacher self-appraisal criteria for unit teaching devised by Angela Broening et al. sounded considerably more practical and professional, asking questions about conferences, such as, "Did the conferences help pupils to see the immediate and remote values of the subject-matter?" and questions about records, such as, "At intervals of different lengths did the teacher check with the pupils their efforts and successes as recorded in their notebooks and evidenced in their special individual and group projects?" (Conducting 284). These authors affirmed the classroom teacher's central role in evaluation and decision-making based on reflection:
Such recognition of individual differences led to a variety of teaching practices and programs designed to meet individual needs, though as Leonard Koos pointed out in 1933, such provisions were "not yet as generally practiced as seem[ed] desirable" (308). Instead, the easy solution that emerged was "homogeneous grouping, special classes for ‘the ‘very bright or gifted and for the slow," more often than not using the intelligence quotient as the basis for grouping (308). In order to consider English language arts evaluation of curricula, it is helpful to consider Ronald Doll’s iscussion (1970) of curriculum improvement, which he termed "a very recent field of inquirY" (20). Apparently, early urriculum decisions were made on the basis of administra consideral A clc reveal cc however, « reported : national practices Programs. districts enormousl scope of admittedl and langr almost 2( in state thils, 1 0Ver 800 this stuc' Penulatic curricuh the heme: of the 1 peri0d: to haVe before 1 62 administrative and teacher recommendations with little consideration of official curriculum evaluation (21). A close look at English language arts publications does reveal considerable curriculum evaluation taking place, however, during the 19205 and 19305, though often what was reported most noticeably were results of state-wide or even national studies of English language arts programs and practices rather than school- or district-wide evaluation programs. Often such studies involved hundreds of school districts and thousands of teachers--and undoubtedly were enormously expensive to conduct. Koos described in 1933 the scope of the National Survey of Secondary Education-- admittedly a broader study than those involving just English and language arts. Still, it is difficult to imagine that almost 200,000 forms were sent to "administrative officers in state departments and local school systems, teachers, pupils, former pupils, parents, and employers" (Koos 305). Over 800 visits were made to over 500 schools as a part of this study in an effort to study school organization, school population, problems of administration and supervision, the urriculum, and extra-curricular activities (304). One of he benefits listed for this study seems applicable to many f the very large research reports conducted during this eried: "those in charge of the schools and teachers like 0 have the records and descriptions of the innovations efore them and to be permitted. to exercise their own judgment ‘ adopt or . On a on a stud teachers the cont] economic, were also establish as to tin units b1 Organizat covered - classes, 0f achiei 0f readi Slight sv Commute. The % evElluati. by Broer "Washer: out how d‘hnq ab was app, contribu 63 judgment with respect to which of them they will themselves adopt or adapt in the different local situations" (312). On a considerably smaller scale, NCTE reported in 1936 on a study of the "correlated curriculum" that involved 73 teachers and nearly 2,000 students. Care was taken to match the control and experimental groups in regard to social, economic, intellectual, and achievement status. Teachers were also "rated" before the study began. Three groups were established--one which integrated units with no restrictions as to time schedule and activities, another which integrated units but followed a fixed time schedule and fixed organization of materials, and a final control group which covered the same "general ground" but in separate subject classes. 
When comparisons were made based on the criteria of achievement tests, mental age, information tests, amount of reading done, and attitudes, the results indicated a slight superiority for both of the experimental groups (NCTE Committee on Correlation 237). The stated purpose of the 1939 publication by NCTE of Conducting EXQeriences in English was to report on evaluation of the experience curriculum. This text, written by Broening et a1. and mentioned earlier, explained that "teachers and supervisors over the country wished to find out how the experience idea was working--what others were doing about it" (vi). Therefore, a committee of 5 persons was appointed. by' NCTE, and. the committee in turn used contributions describing classroom practices from 274 English ta research 6 strategies thoroughly arts class Perha the repo articulat placement since on curriculu gather d analyze ' based on Eve] PUp: uni' ObSl ora wri acc lis thr 8B pla owe the lik prc Put This pr< data! S In this 64 English teachers as well as results of questionnaires and research experiments conducted to test particular classroom strategies. What evolved was a comprehensive report thoroughly researched, in many cases by English language arts classroom teachers themselves. Perhaps one of the more useful parts of the text was the report of efforts in Baltimore to work out an articulation program to determine the best grade-level placement of particular units of study--especia11y helpful since units were an important part of the experience curriculum. The teachers engaged in the study were asked to gather’ data from their students and. then in groups to analyze the data to design the sequence of the curriculum based on their analysis: Every teacher kept careful records of what individual pupils and groups were able to accomplish during the unit . Standardi zed and teacher-made tests and observational data concerning pupils’ emotional and oral responses were studied . Specimens of pupi l s ’ writing prepared under known conditions were accumulated and analyzed. From all these, a tentative list of attainments, grade by grade was set up. Then through conference of 73 teachers with 6A, 7A with 7B, 83 with 7A, etc., on up to 12A with college and with placement counselors, the grade lists were anlyazed for overlapping. Items in the list for a given grade were then starred to show what teachers of that grade should like entering pupils to have mastered and what the promoting teacher reported as attainments of her pupils. (268) This process, though heavily dependent on standardized test data, still provided a centrol role for English teachers. 131 this case, the teachers themselves ‘were involved in classroom- determini As ; provided be evalua it useful or other those whi planning supervisc A. adul The lanQUage 1925‘194 that toc 65 classroom—based research—-collecting and analyzing data and determining where to set standards. As an appendix to the 1939 text, Broening et a1. provided criteria by which experience-centered courses might be evaluated with the hope that "[l]oca1 committees may find it useful to apply these standards in appraising their own or other courses" (349). Among the criteria included were those which outlined a democratic procedure by which course planning could become a "creative experience to teachers, supervisors, and pupils": A. Survey . . . the needs in English of pupils and of adults. B. State objectives in terms of pupils’ present and immediate future needs and of social sanctions. C. 
C. Build units based upon experience.
D. Try out units in test-controlled situations in actual classrooms.
E. Examine all available instructional equipment.
F. Utilize expert advice and scientific research findings where relevant.
G. Outline the procedures used in building the course so that new teachers may understand its philosophy and contribute to its continuous adaptation to changing conditions.
H. Prepare instructional tests and cumulative records for measuring pupil growth in terms of adopted objectives. (350)

There is one other form of evaluation of K-12 English language arts teaching practices and curricula during the 1925-1940 period, and that is the very unofficial measuring that took place by college professors who drew conclusions about their students' pre-college learning experiences based on their college performance. It is clear from professional publications of the period that many secondary English teachers were both fascinated and intimidated by the colleges and felt obliged to try to discern what the colleges wanted them to do. In 1931, a year when just a fraction of the college-age population attended college, the College Entrance Examination Board (CEEB) published a book about the college entrance exams, Examining the Examination in English. The book seemed intended primarily for college educators but had significant implications for secondary teachers as well. In addition to discussing the history of the entrance exams and to providing elaborate comparisons among the various forms that had been given over the years, the book also reported information and opinions gathered from college professors, headmasters, and English teachers in public and private schools and correlated students' college grades to their earlier exam scores (Thomas, Examining 140-67). Apparently the CEEB committee members felt that having conducted their study somehow qualified them to give advice about K-12 programs. Thus, included in the chapter summarizing findings and discussing recommendations were statements intended to tell the K-12 schools what they should do, such as, "This process of outlining or 'blue-printing' should start very early in the pupil's school career and should be consistently practiced through the twelfth grade" (200). Even more self-righteously, they asserted that any future entrance exam "should embody in its questions a clearly conceived idea of standards in English which the colleges have a right to expect from secondary-school graduates who seek admission to their institutions" (212).

Some did feel that secondary schools should strive "to discard preparation for college as a goal and to put in its place an educational program that looks toward complete living" (Koos 312). One of the most extensive evaluation programs of the 1930s--the Eight-Year Study conducted by the Progressive Education Association (PEA)--was expected to accomplish this very thing. This elaborate study, begun in 1932 and concluded in 1940, was similar to the earlier CEEB study in that it studied the effect of high school curricula on college-bound students, in this case directly testing "success in college" of several thousand graduates of the schools.
According to the terms of the study the schools involved had been freed from the constraints of the usual college-preparatory course and freed from college entrance examinations (Eberhart 261). Each of the 30 schools chosen to participate had developed a "distinctive education program" (261) which was "in essence an educational hypothesis which it was necessary to test in actual experience" (262), so that evaluating the new programs was viewed as important from the very beginning. Kirschenbaum et al. later would call the Eight-Year Study "the most profoundly important research study ever undertaken in the history of American education" (182), apparently so labeled because many of the experimental programs produced students whose achievements, without the motivation of grades, exceeded those of control groups (183). Kirschenbaum et al. pointed out, however, that--perhaps because of the 1942 World War II publication date--the results of the study "seem to have been lost on most educators" (182). Thus, a carefully planned and documented research project which involved classroom teachers in a central role seemed to have little impact as a model for later planners of innovative programs: college admission tests continued to be given and to influence secondary curricula, and grades continued to be used in an attempt to motivate student performance.

The optimism that surrounded standardized tests felt right, then, to most English educators during this period as a way to combat the subjectivity that they were told had always permeated their evaluation measures. It is easy with hindsight to say they should have proceeded more cautiously in offering their personal and professional endorsements to such testing practices. Instead, they operated on the assumption that the tests were valid, and they projected their own enthusiasm onto their students. They seemed unconcerned that students became statistics to be manipulated and plotted on graphs but surely had no way to anticipate the damaging effects such tests and their results might have.
In fact, the primary use of such study reports 5 to have been to provide English language arts teachers that could be used to persuade district administrators eeded reforms. Sch especial of the 1 percent Commissi stay in school makeup . 160). Bu Broenin life ex English unanaly Prevent 719-201 youth applyii proble; activi P and fi INHIBIT! CHAPTER FOUR CHALLENGING THE TESTS: 1941-1957 School populations during this time continued to grow, ially in the high schools, which had served 50 percent e high school age population in 1930 but served 75-85 pnt of that population twenty years later (NCTE ssion 440). Because less successful students tended to in school longer, the average "leaving age" of high 1 students kept getting higher, thus changing the p of high school English classes (D. Smith, Evaluating Building on the theory of English as life experience, ing saw connections between world events and students’ experience. She insisted in 1941 that the role of the sh teacher was to "immunize youth against hate, against lyzed prejudices and unfounded conceptions which t the realization of the poet’s vision" ("The Role" 0). Teachers should, according to Broening, "help to rediscover and to reaffirm American ideals, ing them to present local, national, and international ms and realizing them through individual and group 'ties" (720). ooley with hindsight would later refer to the 19305 'rst half of the 19405 as a time in which nglish apparently fell heir to everything which ducation felt that children should have and which did ot fall naturally into any other area of the urriculum . . . the period in which the newspaper, the magazi motion electr became English Lar Tstmbise The e years was seemed to educators objective had refine Professior Walte movement, the Engli goals whi for their as vocabu isolation reSuited Teaching by testir the inst reve‘ISe 4. (197). chthSed "faulty 71 magazine, the popular book, detective fiction, silent motion pictures, talking motion pictures, radios, electrified phonographs, and, finally, television became a part of the English teacher’s job. (498) ish Language Arts Abuse and Criticism The enthusiasm for testing that had existed in earlier '5 was soon tempered by considerable criticism that led to grow more intense. By the 1940s many English :ators had become disillusioned by standardized and active tests or, if they had always been test—resistant, refined their arguments enough to finally be heard in fessional publications. Walter Cook, an articulate critic of the measurement ement, insisted in 1944 that it had negatively affected English curriculum by focusing attention upon "limited is which [could] be objectively checked without regard itheir relative importance" and by measuring such things wocabulary, spelling, capitalization, and punctuation in aation, a practice especially undesirable because it %lted in "a tendency to teach them in isolation" (197). thing practices had likewise been negatively influenced $esting: "Since the evaluation program was not keyed into 4‘ arse the procedure and to fit instruction to evaluation" % T). Worse for students, test norms were "too frequently fused with grade standards," which sometimes resulted in i alty classification and promotion practices" (197). instuctional program, the only alternative was to D . importar Regents W practice passed which w high sc economi Smith p PE tests < seek we text 1 illust; intent I—JHI'I'UIH'MQ’Q’MCTSH 72 D. 
D. Smith found this last problem of particular importance when she conducted her study of the New York Regents Exams, as described in Evaluating Instruction in Secondary English (1941). She was especially upset by the practice of using test scores to decide which eighth graders passed "from the rural school to the town high school" and which were retained in eighth grade. Recognizing that the high school provided "opportunities for shop work, home economics, agriculture, commercial training, and the like," Smith pointed out how unsatisfactory the testing system was:

    . . . the very pupils who cannot measure up when called upon to comprehend poetry and name the parts of speech are in greatest need of the type of training available at the secondary school level; yet they are the ones held back one, two, and sometimes three years because they cannot pass the eighth grade examination. (164)

Perhaps it was teachers' awareness of how damaging the tests could be for their students that in part led them to seek ways to circumvent the effects of the tests. One 1944 text included a much earlier comment by a publisher that illustrates the extent to which teachers, perhaps well intentioned, abused the tests:

    I could give you the names of several school systems in which cumulative files are kept of all forms of our tests. We have standing orders from these systems to supply them with each new form as it appears. Our agents tell us that in these systems the tests are available to all teachers who, if not encouraged to do so, are certainly not prevented from duplicating these tests and drilling their pupils in taking them. Then some form or other of these tests is used at the end of the year to measure achievement and to make comparisons between classes within the same system. (qtd. in Cook 196)

Evaluating English Language Arts

For elementary language arts students, the evaluation of reading seemed a major focus during this period, whereas for secondary English students, the evaluation of literature got the most attention. Although all the language arts were included whenever objective evaluative criteria were presented, what seemed essentially missing during these years for both elementary and secondary levels was professional discussion of the evaluation of writing and of oral language.

In regard to reading, there was considerable recognition that tests in reading reflected particular theoretical definitions of what reading was. Frederick Davis in 1944 argued that reading tests needed to "formulate a definition of reading that can be accepted as adequate and accurate by authorities in the field" (181) and expressed regret that most current reading tests were "almost entirely tests of word knowledge and of the ability to comprehend the literal meaning of the separate statements in what is read" (187). Ten years later, however, a Reading Teacher article described an only slightly expanded list of the "main areas" of reading--"word recognition, vocabulary meanings, comprehension, rate of reading, study skills, special silent reading skills, oral reading, and interests and tastes" (Tinker 36).

Educators who depended on step-by-step methods for the teaching and testing of reading packaged by national publishers probably welcomed the scientific sound of "readability" formulas that emerged during the forties and were designed to determine by word counts and sentence length which reading texts were best for particular age levels.
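One readability formula of precisely this kind--offered here only as an illustration, since it is not among the tests and formulas discussed in the sources cited above--is Rudolf Flesch's 1948 "reading ease" score, which combined the two factors named, sentence length and word length:

    \[
    \text{Reading Ease} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}
    \]

Higher scores marked a passage as easier, so that texts could be matched to age or grade levels by computation alone.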
which reading texts were best for particular age Nels. Appearing even more impressively scientific were i la "mechanical aids" that emerged at about the same time—- hices with technical names like Etelebinocular,“ @thalmagraph," "metronoscope," and “tachistoscope” (N. i fith 303). With each machine, "words, phrases, or htences of content were flashed for recognition under ntrolled time allotments which could be decreased as the udent gained in ability to grasp them more quickly" (N. ith 303), thereby offering a whole range of new testing ssibilities and new reading measurements that would yield merical scores. The interest in applying technology to glish language arts evaluation continued into the fifties an teaching machines provided programmed instruction :used on "small bits of learning units" (N. Smith 402). Not all reading educators were swayed by the scientific aching and testing materials of the day, however. :hleen Hester’s Teaching Every Child to Read (1948, 2d. in 1955) offered an array of alternative reading lluation procedures. Teacher observation was considered >ecially important and included noticing students’ reading )its and tastes and recording the kinds of books read, the lunt of time spent reading, attitudes towards books, and extent to which students turned to books for information (385) "the obser luncl chec] appra Teaci so info Hest [emp near resi scie teat SUb: rear Cri Eng ato Hen ref Sit 75 385). Teachers could, according to Hester, also record the child’s social and emotional development" and tservations of "pupil behavior in the classroom, in the hnchroom, and on the playground" (386). They could use hecklists, pupil file folders, anecdotal records, self— ppraisal records, and interest inventories (386-87). eachers reading such a list might have felt overwhelmed by P comprehensive a plan, but perhaps some used these hformal measures and noticed a couple of pages later tster’s bold assertion that "standardized tests supplement emphasis added] the information obtained by the informal aans that have been suggested" (388). Secondary English teachers as a group seemed even more asistant than elementary teachers to applying so—called :ientific measurement to their curricula--especially to the aaching and learning of literature. Whether they lbscribed to Rosenblatt’s emphasis on the responses of :aders (Literature as Exploration, 1938) or to New 'iticism’s emphasis on close readings, many secondary (glish teachers seemed more aware of the dangers of omizing texts in order to make them testable. George nry's 1954 article, "Only Spirit Can Measure Spirit," flects such an attitude. Henry described a hypothetical tuation in which a fifteen-year—old student . . . comes to your desk and says, "You know, that captain in the Sea Wolf is the kind of fellow you wouldn’t like to meet in real life, to have as a friend, I mean. You’d no doubt hate to have him around you. Yet, when you read this book, you see inside him. Henr this scie Such as C sitt sine fine durj meae "Fur Pr0< Con: its and fon (l8 adv. gro 76 You know what makes him be the person he is. You begin to look at things his way, and you start to understand him. You sense his point of view, knowing the kind of fellow he is. You don’t agree with him, yet you appreciate his type." (181) nry posed the question for readers, "Why should we give is boy an objective test in English? In the name of ience?" 
Students, Henry suggested,

    have been so conditioned by our formalism in teaching that they themselves don't know when they are getting educated. . . . Hence their notion of what is 'good for them' lies in the complacency of routine, of text, of the number right and wrong, and of the final examination. (181)

Such statements ring true for many English teachers today, as did Henry's insistence that "the teacher must devise more situations soliciting ease and confidence and dignity and sincerity to reveal what is going on inside the student, not find better objective tests!" (181).

Those few who discussed evaluation of composition during this period felt less constrained by objective measures. Harry Greene and William Gray in 1946 mentioned "Functional Objectives . . . of Written Expression" but produced items which reflected merely the practical courtesies prescribed by the life experience curriculum, items such as "to use correct form and content in all social and business correspondence" (180), to "fill in certain forms and items of information as evidence of understanding" (181), "to write a telegram, notice, announcement, or advertisement" (182), and "to keep records and minutes of group meetings" (183). These objectives were then turned into classroom assignments which produced papers easy to grade for form and correctness, or they were turned into tests which were likely to be dominated by short-answer or fill-in-the-blank questions. Indeed, Greene and Gray suggested questions such as the following:

    Which of the . . . salutations
        Dear Sir:
        Dear Bob,
        Dear Mr. Snow:
        Gentlemen:
    would be suitable for use in writing a friendly letter to Mr. Robert Snow, an older business man? (181)

Students were given questions about writing rather than opportunities to write:

    Show for each of the following the proper material to be used in writing. Write the correct number on the line before each exercise:
    __ (a) A note to the grocer boy to put the meat in the icebox on the porch
    __ (b) An application for a position in an office
    __ (c) A note of thanks to your hostess for dinner
    __ (d) A formal note of regret
    __ (e) A cordial letter to a friend
        (1) Written on plain paper
        (2) Written in longhand with a pencil
        (3) Written in longhand with blue/black ink on plain note paper
        (4) Written in longhand with blue/black ink on tinted note paper
        (5) Written in longhand with brown or green ink on tinted paper
    (Greene and Gray 184)

College Board data cited in 1957 might provide insight about why relatively little discussion of composition evaluation appears. When high school English teachers were asked about the writing assignments they gave, their responses revealed the effects on writing assignments of such factors as "over-enrollment, competing activities, and even administrative pressure" (French 201). It seems likely that overloaded teachers of the day compensated by limiting the writing they asked students to do and by focusing instruction and evaluation on relatively trivial, but easily testable items.

Evaluation of oral language likewise received little attention, although the experience curriculum provided for many listening and speaking activities.
Greene and Gray suggested several objectives of oral expression that paralleled objectives for written composition--objectives such as "to greet others easily and courteously and efficiently" (177), "to give clear directions, explanations, or announcements" (179), and "to participate in conversation, group discussion, and meetings" (179). The authors pointed out that in many cases evaluation of such objectives involved simply noting whether or how well students behaved in social situations or asking how they would respond in a specific situation, e.g.,

    When answering a ringing telephone, which of these actions are to be preferred?
    (a) Lift the receiver and wait for the person to speak.
    (b) Say, "Hello."
    (c) Say, "Hello, this is Bill Smith."
    (d) Say, "The Smiths' house, Bill speaking."
    (e) Say, "Guess who this is." (177-78)

An NCTE study of language arts curriculum devoted a section of Language Arts for Today's Children (1954) to a discussion of the part that parents and even "other members of the community" might play in evaluation of student performance (393):

    If parents are to realize that growth in reading . . . progresses slowly through many and successive steps, that the child's desire to communicate is normally far ahead of his ability to spell the words he needs, that discriminating vocabulary grows out of real experiences and opportunity to talk about them, they must have some share in the establishing of such values for the school and in determining what are appropriate means of appraisal. (393)

Unfortunately, in the "Techniques of Appraisal" section that follows, the focus is entirely on appraisal that happens at school rather than in the home or community.

Whether parents were a part of the evaluation process or not, they have always been recipients of their children's progress reports. Rather than using simple numerical or letter grades, Hester suggested a more descriptive report, one that "considers each child as an individual and describes his work in terms of his own aptitudes and abilities" (391). Hester's progress report would describe the child's "personality and character development, his work and health habits, his attitudes toward parents, teachers, and classmates, his attainment in subject areas" (391). Furthermore, it would allow the child to help evaluate his or her own progress during a "cooperative planning period" and would inform parents of the child's "growth and development and [provide] information to assist the home in guiding the child in his future growth" (391).

Self-evaluation by students continued to be suggested occasionally. Robert Pooley and Robert Williams noticed with regret, however, in their Wisconsin study (1948) described below that

    In only 9 percent of the classes was there evidence that the class had in mind a set of standards by which the papers were to be judged and by which they could be checked before submission, and in only two classes had these standards been evolved from general discussion and hence become the standards of the class. (166)

One English Journal article put the whole issue of English language arts student performance into a perspective that went beyond the formal or informal measures and evaluation strategies so often suggested either prior to this time or after.
Henry, drawing on his experience as a high school principal and English teacher, demonstrated in his 1946 article, "An Attempt to Measure Ideals," what we today might refer to as reflective practice at its best--or at its worst. Clearly Henry was willing to rethink evaluation, to strip away all that he had been taught about how evaluation was supposed to be done and to depend on personal reflections of his own teaching experience. Although he spoke primarily about the teaching of literature, his thinking seemed applicable to all language arts and to other subjects as well. He confessed the desperation he had felt through his years of teaching, that "after what I considered to be a good moment of teaching literature I had no way of measuring it, or proving even to myself that it was good" (487). He wanted the reassurance sought by English teachers from every era: "as one teacher in that experiment in which nearly all Americans have placed their faith I wanted to be sure that I had no useless part" (487).

Henry sought affirmation of his teaching practices by revising the process by which he evaluated his students. First, he stopped giving formal tests in literature and devised "a scale of values to replace fixed quartiles" (488). He wanted "a measure that would be, above all, clear to the pupils, something that . . . measured the 'real,' not a trait isolated from the pupil's total humanity" (489). What emerged from his thinking and experimenting was a 20-item list, which he dittoed for his students and spent three hours of class time discussing. Each item was designed in two parts, the first expressing some truth about the "educated man" or setting forth a hypothetical situation and the second raising an evaluative question. His intention was to nudge students to think beyond obvious measures of performance and to encourage them to adopt his own ethical standards, as demonstrated in one item in which he asked students whether a poor performance on a task that had already been formally evaluated several months earlier might better reveal a student's actual performance than the original performance that had closely followed instructions (489). Similarly, in another item he asked essentially whether more weight should be given for an insightful book report than for one mechanically correct. Ultimately, he revealed in yet another item his belief that, as students were confronted with the "eternal questions of mankind" in the books they read, their own character should be improved and the character improvement should be considered when their teacher decided their grade in literature (490).

Henry's success in using these criteria surely depended on his ability to model the characteristics and habits he sought for his students, though he might be criticized for being unfairly judgmental. For example, he explained that

    The brightest pupil I ever taught was a rank egoist. The young Lord Byron protested at his "B's" as an affront to his "talent." I said: "Art is a social tool, not an outlet for exhibitionism. Until you learn that fact you will never be an 'A' pupil--or a writer." (492)

In the end, Henry admitted that the new measures he had designed had "added both liveliness and reality to the course [but] I cannot prove it, and now I am dissatisfied that I have yet no way of measuring the effectiveness of the measure! What good is it simply to say to a layman or to a fellow-teacher--minus chart, graph, or table--that now my classes have more 'spirit'?" (493).
Henry's candid statements reveal the inadequacy many English language arts teachers must have felt in the midst of a society that accepted only hard statistics as proof.

Henry's willingness to open up the evaluation and grading issue apparently went unrewarded by follow-up interest on the part of his colleagues. There was much in this unorthodox article that might have provoked a variety of responses--e.g., consideration of ways to measure students' affective responses--but unfortunately, it seemed to go largely unnoticed.

One other unusual test of student performance should be mentioned from this time period, for it involved testing students not only against their current peers but also against same-aged students from 20 years before. The February 1956 Elementary English reported a study conducted in Evanston, Illinois, in which the reading performance of students during 1952-4 was compared with the performance of those during 1932-4. Although it seems difficult to believe today, the researchers conducting the study were convinced that the community 20 years later was "comparatively stable" and that the pupils were similar "in most respects" (Miller and Lanton 91). Care was taken to give the tests on the same day of the month as had been done 20 years before, to give exactly the same test, and to reproduce the same testing conditions. Undoubtedly, school personnel and the community felt reassured to find that "present-day pupils attending Evanston schools at the primary, intermediate, and junior high-school levels read with more comprehension and understand the meaning of words better than did children who were enrolled in the same grades and schools more than two decades ago" (96)--though it seems unclear what use might be made of such knowledge, at least at the district level, other than as public relations material.

Evaluating English Language Arts Teaching Practices

English language arts teachers continued to be evaluated by the results of their students on standardized tests. As D. Smith discovered in her study of the New York Regents Tests, "Teachers believe, whether rightly or wrongly, that they may be dropped or retained at the end of the year according to the number of their pupils who pass or fail. Not uncommonly, when applying for a new position they furnish a so-called 'Regents' record' as one form of testimonial" (158).

Perhaps, then, English language arts teachers actually believed their students' scores were proof of their own success or failure. Or perhaps they pragmatically used the test data when it was to their advantage to do so. At any rate, teachers took the testing seriously and realized the personal stake they had in helping their students produce high scores. As in the case of the teachers mentioned earlier who were said to have used copies of standardized tests to teach from, teachers too often seemed willing to focus on high test scores as their most important goal. D. Smith similarly found that "teachers of New York state teach what they expect these examinations will test" and also that "[w]here materials are lacking, as they are in large numbers of schools, Regents' drill books and back copies of the examination become a major factor in the daily program at both the elementary and the secondary school level" (165).
Especially disturbed by these practices, Smith wondered whether money spent on testing--which produced nothing more than numerical scores--might not be better spent in "assisting schools in developing techniques for determining success or failure in reading and expression which go far beyond those of the average group examinations" (190-91). Interestingly enough, a few years later in 1948 Pooley and Williams found in their study of Wisconsin English language arts programs that, "although testing in language skills is by no means lacking, the results of the tests and the significance of the evaluations have not contributed greatly to the modification of content, methods, or materials in English teaching" (93-4). Still, they added a caution that formal testing not be used as "a form of supervision," since such practices tended to "promote the wrong kind of instruction" (96).

Unfortunately, those who conducted a state-wide study in Tennessee seemed unaware of such dangers. As many before them had done, they looked to the colleges to determine the success of their secondary curricula and teaching practices. The procedure used was to ask the colleges in the state to pool the results of their English placement tests and then to prepare a "helpful yearly report of the efficiency of the high-school training" for individual teachers and administrators (Hodges 72-73). However, this procedure contained a number of glaring inconsistencies that invalidated the results. For example, each college used its own placement test, which meant varying measures had been used. Although each student was originally "ranked only in relation to the other students in the particular college in which he [was] enrolled" (73), the school districts received reports showing the percentage rankings of each of their graduates--as if the percentages had the same base. The researchers rationalized by saying that "[a] student who stands first in one college would probably be among the best in another, the average student in one college would probably be among the average in another, and so on" (73).

Even more disturbing, however, is the degree to which the student scores were used as a measure of the effectiveness of teachers and curriculum. Although the results were prepared so as to make it impossible to compare colleges, each secondary teacher received a list of her or his own former students' scores and each principal received a copy of the list sent to each teacher, "showing both the average for each teacher and the general average for the school" (74). Superintendents and state officials received reports as well. Incredibly, the Tennessee Council of Teachers of English approved of this ranking and reporting and even made up a "yearly honor list" of schools. Hodges, a University of Tennessee English professor, noted that "the very fact that a school is not on one of the honor lists serves notice that the school is at best only average and gives it something to answer for in the community" (74). He explained that this procedure was designed to "improve English teaching" (75) and cited one school official as urging teachers whose students got low reports "to review critically their entire teaching procedures" (75).
Based entirely on students' test scores (actually, only on the scores of students who attended Tennessee colleges and universities) with no consideration given for varying teaching conditions or student populations, Hodges rather proudly concluded that "the yearly testing brings into clear relief the most effective English teachers. Two years of pooling placement tests from twenty-six colleges have made a score of teachers stand out like mountain peaks over the state" (75). Perhaps what is most striking in hindsight is that no one seemed able to see the flaws in these procedures.

Major Studies Evaluating English Language Arts Curricula and Programs

Extensive state-wide studies of English language arts resulted in the publication of books which may have served as guides to school administrators and teachers seeking to evaluate their own curricula and programs. D. Smith, the indefatigable leader of English language arts curricula during the 1940s, was commissioned to direct a study, mentioned earlier, of the New York Regents testing program, a study that resulted in the publication in 1941 of Evaluating Instruction in Secondary School English. Smith and her assistants sought to isolate "major issues" and to stress "chief problems discovered" (v). They visited schools, consulted with local officials and teachers, analyzed students' test results, and studied "reading diaries of boys and girls, together with the records of their attendance at motion pictures and their habits of listening to the radio" (2). Syllabi were studied, as were classroom and library equipment and book supplies. They analyzed the "continuity" of the program, the nature of instruction in the classroom, the "organization and supervision of the program as a whole, and especially the relationships of the offerings to community and individual needs" (2). Almost nothing seemed overlooked as they recorded and reported the amount of money each school spent for textbooks and the library, number of books borrowed from the state traveling library, class size, schedules, extracurricular duties, professional reading, and participation in professional organizations (6). Appropriately, they offered the caution still worth heeding today, that "no general inquiry can tap all the resources of the individual community. It can present only major problems which will repay further investigation by the localities themselves" (9). Smith expressed the hope that if their study could

    . . . stimulate careful consideration of areas of strength and weakness as revealed by the Inquiry on the part of local authorities, who are in a position to make much more intelligent and detailed study of conditions within the individual community, it will have been abundantly worth while. (9)

A somewhat similar state-wide study was described by Pooley and Williams in The Teaching of English in Wisconsin, published in 1948. In this case, Pooley and Williams surveyed the teaching methods and instructional materials for English language arts in the elementary and secondary schools (144-45). Rather bravely, they began the report of their study by listing complaints commonly made against English and its teaching:

    1. The results of English instruction fail to justify the amount of time allotted to the subject in most elementary and high school courses of study.
    2. English instruction often fails to turn out pupils who can speak effectively.
    3. English teachers do not succeed in interesting their pupils or challenging them to make satisfactory progress.
    4. English courses of study are traditional and dull, and unrelated to the actual needs of the pupil.
    5. English instruction is ineffective (a) because it includes too much grammar; (b) because it does not include enough formal grammar.
    6. The English curriculum is in chaos; no one knows what to teach or in what grade to teach it. (3)

Pooley and Williams's plan was to determine "how much truth there [was] in each charge, and the extent to which English instruction suffer[ed] specifically from defects in curriculum organization, instructional materials, and instructional methods" and to discover "how far the criticisms are refuted by the evidence of competent teaching and positive results" (3). The answers to these questions were sought by preparing a detailed analysis of the problem and planning questionnaires and personal visits involving a variety of persons and materials, such as teachers, courses of study, textbooks and reference books, basic elements of instruction, and classroom methods (4-6). After the researchers studied questionnaires and made over 900 classroom observations, the results were compiled in a book filled with statistical tables calculating everything from salaries of high school English teachers (122) to "Grade Placement of Items of Capitalization in Courses of Study in Elementary Schools of Six Cities" (44). The fact that the Wisconsin study did not test students led the authors to feel that teacher-participants in the study were less defensive and that school visits created good will (93).

Another extensive experiment and study was conducted by the Stanford School of Education (financed by the General Education Board). This massive project involved 10,000 students and 150 teachers and administrators in secondary schools and led to the eventual publication of three books, one, focused on English, titled English for Social Living. During the three-year experiment teachers of English and foreign languages were charged to find ways to improve student growth in their classes and were encouraged along with the students to "create and to grow according to their own best thought" and to "exercise their freedom" ("Stanford" 119). Furthermore, provision was made for summer programs on Stanford's campus in order for teachers to work cooperatively with those from other school districts, always seeking to "observe the results of centering work in English and foreign languages upon the personal and social welfare of young people, conceived within the democratic framework of a creative Americanism" (119). The criteria by which each of the 50 programs studied would be measured are as follows: "What is the effect of this material on the young people with whom it is used? Does it help them develop confident, vigorous ability in all aspects of communication? Does its use promote the mental health of each boy and girl and of society?" (120).

Perhaps as interesting as the project itself was the evaluation of the project that took place in the pages of the English Journal, which published a series of reviews of the books published, especially the book focused on English.
One reviewer, Max Herzberg, a high school principal and former NCTE president, devised and answered his own loaded questions, such as, "Have pupils been given not merely the artistic ('creative') point of view but also that of the scientist?" and "Has the whole range of American education been covered, including that impasse, the academic college?" ("Stanford" 121-22). Herzberg essentially praised the project, pointing out among other reasons that the students "as a result of the novel procedures . . . employed, came much closer to a balanced equation of ability and achievement than in the traditional classroom" ("Stanford" 122). On the other hand, another reviewer, an English education professor named Pendleton, dismissed the books as "radical pronouncements by college professors of education, bolstered by classroom suggestions written by controlled teachers" ("Stanford" 125). Further, he complained that the books reflected "the New Deal of present politics and . . . the views of the group now dominating the National Council" ("Stanford" 126), views he would reject because they neglected "all subject matter, all conformity to the environing world, and all careful study of masterpieces and of history" (126). Regardless of how the readers of the reviews responded, the inclusion of such differing viewpoints suggests a healthy determination on the part of the editor to provide the open forum that earlier editorial policies had outlined.

Although frequently recommendations for evaluating English language arts grew out of large research projects, a number of educators offered suggestions which seemed to grow out of their own professional experience and theoretical contemplations, especially suggestions by which schools could evaluate their overall language arts programs. For example, in 1954 John DeBoer, a University of Illinois Professor of Education, published a list of characteristics of "modern" programs which could serve as criteria for the evaluation of elementary language arts instruction (485). According to DeBoer, the modern school does all of the following:

    o expects the child to read only when he is ready to read
    o provides in the classroom many attractive books, magazines, and other reading materials suited to many interests and levels of ability
    o has an attractive, well stocked central library
    o systematically undertakes to cultivate wide and varied reading interests in children
    o makes clinical facilities available to disabled readers
    o provides an abundance and variety of direct experiences
    o makes effective use of many kinds of audio-visual aids
    o takes account of modern media of mass communications
    o undertakes to cultivate the child's love of poetry
    o undertakes to cultivate the child's gift for creative expression
    o provides abundant opportunity for the oral sharing of ideas and experiences
    o develops skill in written communication through well-motivated experiences in actual communication (485-92)

Occasionally, during these years there were discussions of how evaluation programs themselves should be evaluated, and again, criteria for evaluation seemed to be offered on the basis of the writer's personal and professional opinions.
In 1944 Walter Cook, for example, listed criteria for an "adequate" evaluation program, including such expected items as "the evaluation instruments should tend to reveal to the learner clearly and in detail the inadequacies of his performance" but also items that sound more current today, such as "the program should be based on the fact that the most effective evaluation . . . is that which is carried on by the learner" and "evaluation instruments should be available to the teacher and learner whenever the learning situation requires them and not according to the calendar" (198).

Pooley would later observe that the English situation in 1950 was a time of "plenty of theory," and such articles as the ones described above attest to that fact (498). Pooley also observed, however, that 1950 was a time of great "need of practical common sense" (498). Clearly the years 1941-1957 were a time of stretching beyond standardized and objective tests and a time of looking beyond local circumstances. By 1957, however, English educators were jolted by Flesch's Why Johnny Can't Read and by Sputnik and found themselves and their curricula facing considerable criticism--especially charges of anti-intellectualism (A. Applebee, Tradition 188) leveled against the life experience curriculum. These circumstances would soon lead to reconceptualizing and re-evaluating on many levels.

This was, then, a time of questioning, of weighing the merits of testing, though even those who sensed an inadequacy about tests and test scores felt helpless to know very clearly what might be better. It was a time when test scores were commonly used to make important decisions about placement and promotion that sometimes worked against students' best interests. It was a time when educators admitted openly that testing drove curriculum and teaching practices and a time when teachers' evaluations were explicitly linked to student test scores. There seemed a growing understanding of the control, especially the external control, that testing could have on the lives of students and teachers, who had the most to win and the most to lose.

CHAPTER FIVE

RECONSIDERING ENGLISH LANGUAGE ARTS EVALUATION: 1958-1969

English educators in the "Sputnik Age" predicted a time which would "undoubtedly bring rigorous examination of the school program" (D. Smith, "Re-establishing" 317) and which would find English teachers seeking to define "what constitutes growth in the various aspects of English" (Smith 326). The year 1958 saw English educators at the Basic Issues Conference advocate a greater focus on content (A. Applebee, Tradition 193-94), while that same year B. F. Skinner described teaching machines and programmed learning designed to break content down into tiny bits of information to be sequentially presented (Science, October 1958). These twelve years became, then, a time of self-conscious taking stock among English language arts educators, a time of evaluating previously-held basic assumptions and proposing innovations to be tested.

It is no surprise that articles appeared with titles like "The Teaching of English in the Soviet Middle School" (English Journal 1959), "Why Ivan Can Read" (Elementary English 1962), and "How Russian Children Learn to Read" (Reading Teacher 1959), which self-consciously asked, "Are there features of the Russian system that might be adopted in our own schools, or methods that would be definitely disadvantageous?"
Such questions suggested a global comparison, at least in this country, as we wondered how we measured up against another power that challenged our sense of technological superiority.

There was considerable discussion and questioning about testing itself, such as Ralph Tyler's explanation that tests in the past had emphasized measurement and "reflected the content of teaching materials," but that they were more recently thought of as "a series of situations which call forth from the student the kind of behavior defined in the objective and permit a record to be made of the student's actual behavior" (6). Somewhat similarly, Warren Findley in a 1963 article contrasted disenchantment with standardized tests of the past with newer efforts to "promote and measure a balanced set of educational objectives, including ability to use or apply knowledge" (1). By 1969 the emphasis was clearly on using behavioral objectives both to shape spiral curricula and to evaluate them. Accordingly, professional English language arts publications generated articles with titles such as "Objectives for Language Arts in Nongraded Schools" (Elementary English 1969) and "Selected Objectives in the English Language Arts (Pre-K through 12)" (Elementary English 1969).

Phillip Jackson called attention to the danger that the general public, and even most English language arts teachers as well, fell into when interpreting test scores, especially in the case of norm-referenced tests: "Rather than being viewed as convenient symbols which summarize an individual's performance in a most crude fashion, test scores come to be seen as something the individual 'has'" (28). Thus, test scores, which were widely used during these years to categorize students for purposes of ability grouping, had the effect of labeling students in ways that often created lifelong scars. Unfortunately, few English language arts teachers or parents seemed to heed Eleanor McKey's metaphorical admonition: "Let us use the standard test as we use our watches, always mindful of the fact that they may be a little fast or a little slow, but they are, nevertheless, more reliable and accurate than a glance at the sun" (611).

Evaluating English Language Arts

In comparison to some earlier periods, evaluation of individual student performance received less attention in professional publications of this period. There were occasional articles about evaluating understanding of literature, but they seemed to repeat what had been said before. Dwight Burton, for example, in his 1959 text, offered evaluative criteria that reached beyond the classroom, such as determining how much voluntary reading students did, but also suggested using objective tests to measure literary knowledge and ability to comprehend literary material (251-52). He further recommended tests of literary "taste" as well as informal classroom evaluation methods, such as teacher observation, interest inventories, and attitude scales (256-57).

Similarly, discussions of reading evaluation offered more of the same long listing of activities that had been suggested earlier.
Mary Austin reported in 1958 that many schools were using the following means of evaluation for reading--standardized reading achievement tests; informal reading surveys; diagnostic procedures; observations; individual conferences; inventories of reading skills, interests, and study habits; teacher-constructed tests; tests of pupils' ability to locate reference materials; records of students' independent reading; and year-end testing (36). In spite of these suggestions, however, surely most evaluation of reading during this period occurred essentially within the context of published basal reading materials, which were used almost universally in American schools by the 1960s (Goodman et al., Report 24). Within the basal system both the instructional materials and the tests were developed by the same publisher, creating a circularity that would go relatively unchallenged for several years to come.

As reading tests were subjected to a variety of tests themselves through research, their shortcomings continued to be aired in professional publications. Roger Lennon, for instance, in 1962 reported that

    Studies agree that most of the measurable variance in tests of reading competence, however varied the tests entering into the determination, can be accounted for in terms of a fairly small number of factors. . . . It seems entirely clear that numerous superficially discrete reading skills to which separate names or titles have been attached are, in fact, so closely related, as far as any test results reveal, that we must consider them virtually identical. (333)

Articles in professional publications paid relatively more attention during these years to discussion of evaluation of writing. What is striking about some of the publications is that their authors seemed willing to notice strengths in student writing, rather than simply assuming a deficit attitude when evaluating student writers. For example, Ruth Strickland's 1960 article evaluating elementary children's writing spoke in terms of focusing on the "growth of an individual child from day to day and from year to year" (322). Evaluation data could be gathered by means of anecdotal records, self-evaluation (323), student folders (329), and sentence analysis of writing samples. Beyond evaluation of individual student progress, Strickland also recommended the evaluation of "growth of the class as a whole with comparisons among classes . . . quality of composition within an entire school . . . methods of teaching writing . . . [and] periodic evaluation of the total curriculum in writing within grade levels" (322).

The task of evaluating the writing of secondary students was addressed by the Association of English Teachers of Western Pennsylvania, which published two undated pamphlets (though bibliographic entries indicate publication after 1958). Both the junior high and senior high booklets included several student themes--as had articles about composition scales back in the 1920s. Each composition had been corrected and evaluated, and each was reproduced along with handwritten marginal and in-text comments, as well as a one-to-three-paragraph concluding comment to the student and a somewhat longer note to the student's teacher from the evaluators. The evaluators' comments ranged from direct advice to suggestions phrased as questions to descriptive praise and criticism.
The pamphlets also included discussion of practical evaluation methods which acknowledged the need for manageable paper loads for English teachers. The writers of the senior high booklet conceded, for example, that

    Teachers in the classroom will certainly be aware that the comments on the papers that follow are generally more extensive than they can afford to make on the hundreds of papers they must grade. . . . The sooner the general public, together with school boards and school administrators, realize the time and effort that a good composition program requires, the nearer we will be to a genuinely realistic understanding of the demands made upon the English teacher. (Suggestions . . . Senior High 3)

While composition scales 25 years earlier had been used to measure individual students' writing, the editors of the Pennsylvania pamphlets expressed the hope that their texts would be used as a "focal point for discussion rather than as an arbitrary set of standards" and especially recommended that English teachers meet within their own building to discuss the materials in the pamphlets (Suggestions . . . 2).

When A Guide for Evaluating Student Composition (edited by Sister M. Judine) was published by NCTE in 1965, it may have seemed like a landmark work, for it pulled together 25 articles related to evaluating composition, many of which had originally appeared in state and regional publications during the 1950s and early 1960s. It included an excerpt from the Pennsylvania pamphlets mentioned above as well as recently designed rating scales, a defense of praise of student work, and practical articles about "managing" student writing. Even though the articles in this book seemed not to reflect any particular approach to the teaching of composition, they served to nudge readers to consider composition evaluation from a variety of perspectives. The fact that the book was still being printed ten years later indicates the extent of its perceived usefulness.

In spite of evaluation approaches that seemed to reflect a more current attitude toward the teaching of writing, Hook's 1961 report provided evidence that revealed how far removed current theory was from prevailing composition classroom practices. Citing responses from over 700 secondary English department heads, Hook explained that more respondents reported spending the most time on "study of functional grammar, with exercises intended more to teach application than to teach identification" than on "writing by students and discussion of what they write, along with discussion of professional authors' techniques" ("Characteristics" 12).

Evaluation of oral language also got some attention during these years as well, with one 1958 article by Marian Zollinger and Mildred Dawson suggesting that teachers plot flow charts of their class discussions (Figure 3).

[Figure 3 - Participation in Discussion]

Sara Lundsteen, however, sought more of a "scientific, systematic, and developmental approach" to the teaching and testing of listening and seemed primarily interested in fitting listening into a scheme of behavioral objectives (747). Eighteen critical listening lessons were developed and used, with students being given both pre- and post-tests. Given orally, the 79-item test measured "detection of the speaker's purpose, analysis and judgment of propaganda and arguments" (745).
D. W. Kopp's 1967 Elementary English article cited a "dearth of standardized tests of oral communication skills and abilities," and recommended that rating scales, tape recordings, observations, and even "teacher-pupil-made tests" could be used to emphasize the improvement by each individual child (Kopp 120).

When there was discussion of informal classroom evaluation of English language arts, it seemed to focus primarily on the individual instructional needs of students. Paul Burns, for example, published a book called Diagnostic Teaching of Language Arts (1967) which described in enough detail how anecdotal records might be used that readers might actually have been able to follow his suggestions:

    There are many ways to maintain such records: one possibility is the use of loose-leaf notebook or ring binder that accommodates full-sized sheets, each child's name being put on a tab so his pages can be found quickly. The purpose of the record is mainly to note learnings the child has achieved and those he has yet to acquire. (10)

Another 1967 text advocated the use of teacher-pupil conferences for the teaching of reading, providing a range of questions that could focus on the appropriateness of a book (e.g., "Why did you choose this particular book?"), on appreciation of a book (e.g., "What was it about this book that made it good?"), and on values gained from a book (e.g., "Did something happen in the book which you would like to have happen to you?") (Hunt 111).

For more traditional test-makers, NCTE published in 1963 a booklet entitled Building Better English Tests, which was intended to serve as a corrective to existing faulty testing practices and to help new English teachers avoid common testing mistakes. It led teachers through the process of planning a test, selecting questions, and building short-answer and essay questions--so that teachers could become more skilled practitioners of "the art--as well as the science--of testing" (Carruthers 5). Ultimately, however, external testing seemed to grow more important during these years than either traditional or alternative classroom evaluation measures. Aided by the guidance program initiated as a part of the National Defense Education Act (NDEA) of 1958, high schools gained a staff person whose job description usually included primary attention to administration and interpretation of standardized tests (Findley 2).

The topic of external evaluation emerged as a controversial issue at the 1966 Dartmouth Conference, as discussed in Herbert Muller's book describing the workings of the conference. According to Muller, when Alan Purves raised the possibility of "national assessment examinations," the British contingent at the conference "expressed shock," since they had "thought that America was in this respect an Eden, untouched by the curse of external examinations" (158).
When the British called for an emphatic statement condemning external examinations and expressed the opinion that "no issue was more vital, no recommendation more urgently needed" (158), Purves and colleagues apparently responded by expressing sympathy with the denunciation of such a rigid examination system as the British contended with but also "pointed out that the situation in this country is quite different and does not call for such a manifesto." Purves asserted that even if they wanted to do so, "there is no one constituted authority in America to address a ringing denunciation to" and persuasively insisted that such an action was probably unnecessary since "the proposal of national assessment examinations is meeting strong opposition even though they would not affect the standing of students in the schools" (159). Purves's words sound especially ironic now in light of national and state-wide testing programs that have developed in recent years. Although the Dartmouth Conference participants did call for a systematic review of examinations and grading, it is difficult today not to believe that a more strongly worded statement--and warning--was needed.

By the end of the 1960s there was in fact considerable discussion of and reaction to proposals for more testing and national assessments. Willard Congreve's article of 1968, for example, insisted that "lack of appropriate evaluation is undoubtedly one of the greatest weaknesses in the entire field of education today" (307). Although Clarence Derrick acknowledged that many saw as a problem the "paucity of 'national' essay tests," believing that "if someone doesn't test it, teachers won't teach it, and students won't learn it" (496), nevertheless, he called on readers to "renounce the hope of any kind of testing of writing on large-scale national tests" (499), contending that such tests would not yield reliable scores or be economically feasible (496). When a 1969 Journal of Reading article raised the question "What Can We Expect from a National Assessment in Reading?" the discussion about large-scale testing seemed to be settled, for the first sentence unequivocally stated, "A national assessment of education has begun" (Shafer 3). Rather than argue the merits or disadvantages of national assessment, Robert Shafer offered cautions about the reporting of assessment results:

    1. No score is to be derived for an individual since each individual will receive only a portion of exercises in the various fields being assessed in his age group.
    2. Individuals are not to be ranked in the reporting of results since the assessment is to describe groups and not individuals.
    3. Each exercise must stand alone in the assessment; it would not be submerged as part of a test. Therefore, each item must be independently defensible in terms of the objectives and capable of being reported on as to the percentage of people answering it correctly. (8)

Although these were important cautions which have been heeded in National Assessment of Educational Progress (NAEP) testing, they have often gone unheeded with the development of state-wide testing, as later discussion will indicate. Shafer's prophetic warning about the potential greatest danger of national reading assessment seems even now applicable to all English language arts testing:
    Perhaps the greatest danger . . . may be found in the pleas of many who, after the results become public, will wish to restrict the curriculum to those objectives and specific areas which were included in the assessment and which they feel can successfully be measured. A further danger will be that what is considered difficult to assess will not be considered as worth having. (54)

Evaluating English Language Arts Teaching Practices and Curricula

Teaching practices and curricula seemed to be considered more directly measurable during this period, as identifiable behaviors were being suggested as proof of teacher and curriculum effectiveness. Another dimension of the evaluation of teaching practices was revealed, however, in discussions of who would bear what responsibility for evaluation. For example, an NCTE Commission on the Curriculum raised the provocative question, "Do the English teachers themselves establish the criteria and standards for judging each other's performance?" (282). Elizabeth Howard proposed that not only should teachers' performance be used to judge a reading program but that administrators' effectiveness should be an important criterion as well. She outlined program responsibilities that could be handled by various administrators, including superintendents, principals, and "supervisors of appraisal." In each case, she suggested that administrators play a supporting role--e.g., interpreting test results for the community and working with staff in planning and evaluating (170-73).

The issue of evaluation of English language arts teaching practices was also addressed during these years in connection with discussion of overall program evaluation. Although James Squire and Roger Applebee's 1966 study of successful English programs did not include direct teacher evaluation criteria, the report did present a variety of significant and insignificant facts about English teachers, comparing questionnaire responses of those whom project observers had identified as outstanding with those of other teachers (348). As might be expected, the outstanding teachers had more experience, spent more time reading and writing, and were more involved in professionally related activities than were the "general" group. Curiously, Squire and Applebee provided such puzzling details as the fact that this same group of outstanding teachers spent less time listening to music but more time in part-time employment than did the general group (348-51).

Another kind of evaluation of English language arts teaching practices and curricula took place through the opinions expressed in the professional journals. The English Journal continued to publish articles, for example, which reflected college judgments about how English language arts programs should be taught at all levels. Articles with titles such as "What the Colleges Expect" (A Report of the NCTE Committee on High School College Articulation, 1961) sometimes had the effect of negatively evaluating, or at least patronizing, secondary teaching practices by advising high school English teachers, for instance, not to weaken college-preparatory courses "by including units on social conversation, telephone manners, senior problems, or any other matters related only vaguely to . . . teaching language, composition, and literature" (403). If turnabout is fair play, a later article (1962) reported another study by the same NCTE committee, in this case a study of college freshman English courses.
Perhaps secondary English teachers felt some small comfort when, having suffered college criticism for so long, they were told that "there are quite as many things wrong with freshman English in college as with English in the high school" (178).

Clearly during 1958-1969 the English language arts evaluation issue that received the most attention and effort, as reflected in professional publications, was that of large-scale evaluation of English language arts programs and practices. Beginning in 1960 NCTE had become involved in program evaluation through the formation of its Committee to Review Curriculum Guides. The Committee provided a review service, reported trends in "curriculum and guide-making" to the profession, and selected guides for display at NCTE Conventions (NCTE Committee to Review 891). In "Trends in Curriculum Guides," they published a checklist used by the Committee as they evaluated curriculum guides that could serve as a check for local school districts as well (NCTE Committee to Review 895-97). Another NCTE group, the Commission on the Curriculum, also focused attention on curriculum evaluation when they published "A Check List for Evaluating the English Program in the Junior and Senior High School" (1962). They too provided questions that could "lead local school faculties to the thorough examination of their programs from which all improvement ultimately must stem" (273).

Several researchers of this period and of future periods sought to analyze the characteristics of successful English language arts programs that had been identified in a variety of fairly unsystematic ways. Often these studies were conducted nationwide with the hope of deriving descriptions of--and prescriptions for--success which could be imitated or adapted by programs and districts throughout the country.

Arno Jewett's 1959 English Language Arts in American High Schools is one such study. In this study, published by the U.S. Department of Health, Education, and Welfare, researchers analyzed 285 courses of study from every part of the country and reported on "promising practices in language arts" gleaned from the courses of study. In addition, a survey was sent to school district administrators and instructional leaders, teacher educators, and selected members of NCTE--all of whom were asked about the processes used to develop English language arts curricula (5). Though not working with a representative sample, the researchers eventually compiled a "list of principles" that seemed to be effective in the development and revision of courses of study, along with techniques used by administrators and curriculum directors that seemed to produce the desired results. Included were the following process recommendations that may have served as guidelines for school districts interested in reform:

    1. Through a schoolwide survey, discover the curricular problems teachers are concerned about; then, focus attention on a few major problems that they have in common.
    2. If necessary, use the broken-front approach--that is, first involve those persons who are most interested in studying and changing the curricular program; then, as they move ahead, encourage others to join them. Avoid high-pressure methods.
    3. Focus attention on a few major problems rather than many minor ones.
    4. Provide necessary books, instructional resources, consultants, clerical help, etc., and an adequate
       budget to enable the curriculum committee to do its job.
    5. Help teachers and others to see the total role of all participants and to understand their own job in the entire undertaking.
    6. Keep the attention of the working group focused on: a. what is being accomplished, and b. what remains to be done.
    7. Have a long-range program.
    8. Involve in the curriculum work the persons to be affected by the changes recommended. (16)

Hook reported in 1961 information gathered from a questionnaire to approximately 800 secondary schools which had been winners and runners-up of NCTE Achievement Awards in an effort to discover the distinguishing characteristics of these schools ("Characteristics" 9). Hook's hope was that any notable similarities "might provide useful hints concerning curricular and other practices that apparently contribute to the development of especially able students" ("Characteristics" 9). Questions were asked about class size, amount of time spent on extra-curricular activities, amount of writing assigned, degrees of teachers, time spent on literature, etc. In an attempt to make his report more useful to teachers, he also provided a checklist by which readers could compare their own schools' responses with the characteristics of the award-winning schools. Hook then explicitly encouraged readers to discuss their findings in a department meeting and present significant findings and recommendations to district administrators ("Characteristics" 13).

Such reports sometimes had further dramatic impact in that they created awareness of prevailing national strengths and weaknesses. Squire's 1962 reflections on the influence of The National Interest and the Teaching of English (1961) report indicate that it raised "disturbing questions about teacher preparation in English, teaching conditions, and existing elementary and secondary programs" and called for "vigorous professional leadership at the local, regional, and national levels to improve the total profession" (Squire 381). The result, Squire insisted, was "energetic reappraisal" within the profession. Even more significant, however, was the impact this particular report had outside the profession. Described by A. Applebee as "a direct and shrewd presentation of the importance of English to the national welfare, coupled with a startling documentation of instructional inadequacies," the report was eventually distributed to all members of Congress and to "other influential government figures" (A. Applebee, Tradition 199-200). This report, coupled with the subsequent publication of The National Interest and the Continuing Education of Teachers of English in 1964, with its startling evaluation of the profession, provided documentation needed to help convince Congress to broaden the NDEA to include funds for English (A. Applebee, Tradition 201).

Squire and Applebee's 1966 report, A Study of English Programs in Selected High Schools, again examined successful programs and built on the results of previous studies as it did so. Sponsored by the U.S. Department of Education, this study sought to discover how "stronger schools" were achieving important results in English and to identify the characteristics of what were deemed to be superior English programs which might be emulated in other schools (1).
Studying a total of 158 schools which consistently produced Achievement Award winners along with "comparable schools with highly regarded programs in English" (3-4), Squire and Applebee used in their study classroom observation, individual and departmental interviews, group meetings with teachers and students, questionnaires and checklists (4). However, they also drew upon the criteria developed and the results attained from earlier studies, including the award-winning characteristics devised by Hook, the checklist of characteristics of junior and senior high English programs developed by the NCTE Commission on the English Curriculum, and reports and recommendations from other committees, commissions, and publications of NCTE and other groups (4-5). Using fifteen separate instruments designed for the study (23-25), they studied everything from "Type of Final Examination and Relative Percentages of Content Therein" (322) to "Ways in Which English Departments Would Most Likely Spend Supplementary Funds" (138), eventually compiling an exhaustive 601-page report.

Eventually English educators came to realize the need not just to evaluate existing programs but to plan for program evaluation from the time new programs were designed. The need to make evaluation a part of curriculum reform movements and the need to handle the evaluation process carefully were pointed out by Michael Shugrue in his reflections on the Project English centers that were developed with NDEA funds. Apparently the materials that had been designed by the Project English centers were published just a few at a time rather than all in a group, and perhaps more significantly, "some professional journals . . . published premature reviews of the Center Curriculums" (43), with the effect that judgments were made quickly about the entire project based on the small sample of materials published early on. Shugrue explained that, "Disturbingly, some teachers . . . dismissed the work of the Centers as too content-oriented or not sufficiently novel before they had seen more than a small fraction of the rich variety of units being produced in more than twenty Centers" (43).

A. Applebee pointed to a need for careful empirical evaluation of results of new projects and programs. As he reviewed what he saw as the unsystematic evaluation of the work of the Project English sites, he issued the reminder that teachers' impressions about curriculum reform are "almost inevitably highly positive" because they are based on "the excitement and stimulation inherent in the process of change itself" (214). He conceded, however, that

The kind of careful documentation of long-term results that had marked the Eight-Year Study was simply beyond the ken of most of the staff involved in these efforts. The result was a mountain of essentially untested materials which no one really knew what to do with. Very few of the centers admitted to any failures, but very few carried on the kind of studies that would have told them if they had failed. (Tradition 214)

Although readers today could point to greater acceptance of and respect for classroom research, Applebee perhaps offers an insightful caution to include more timeless assessment measures which go beyond the personal testimony of new program participants.
One of the more significant features of this era of English language arts evaluation is that, to a great degree, the evaluation that mattered most was essentially taken out of the hands of classroom teachers. This was especially true for elementary language arts teachers whose curriculum and teaching practices depended on the basal reader materials used in the district. Published test materials were purchased along with the readers, and teachers were expected to use them and indeed to rely on them as accurate measures of student performance. For both elementary and secondary English language arts classroom teachers, nationally-normed, standardized tests were a routine part of their school year, to be followed soon by state and national tests as well. Although a variety of alternative classroom evaluation ideas were suggested during these years, any use made of them in the classroom involved an extra commitment on the teacher's part that seldom seemed justified as long as the testing stakes rested on the standardized tests and their scores.

CHAPTER SIX

EXPANDED TESTING AND ALTERNATIVES: 1970-1987

From Testing as Measurement to Testing as Management

During the early 1970s the money that had flowed rather freely during the 1960s for English language arts experimentation and expansion began to dry up or possibly be diverted to the Vietnam War. Concurrently, the country's fascination with behaviorist psychology had resulted in a new focus on accountability, as reflected in emerging systems approaches that promised cost-efficient solutions to a variety of bureaucratic problems. The demand for accountability "changed the role of measurement and made it more and more central in the management of education" (NCTE, Common 2). So pervasive was this movement that many English teachers and professional leaders found themselves caught up in it one way or another.

The systems approach to the teaching of English language arts was intended to include pre-testing, programmed individualized instruction, and post-testing (Maxwell and Tovatt 11). Such activities were to be based on a list of predetermined objectives derived from observable student behaviors. As teachers and administrators realized that any given classroom activity might involve many objectives, the task of writing objectives to cover the entire English language arts curriculum became overwhelming. Indeed, in one project reading specialists worked a year and a half to develop "an initial list of more than 1,200 behavior objectives," a task which was described as "only the beginning" (Evans 269). Eventually professional and commercial groups assumed the objective-writing task and established "banks" of objectives which could be distributed on order to schools and school districts (Brett 43).

Convinced that "if English experts do not do the work, it may be done less well by others" (Hook, "Tri-University" 76), those who were a part of the Tri-University Behavioral Objectives for English project explained in 1970 their intention to develop a preliminary catalog of objectives, to field test it, then to publish "for the information and use of the profession" a catalog of English objectives for grades 9-12 (Hook 86). That same year another group, the Instructional Objectives Exchange, published English Skills 2:2, which contained 76 objectives and related evaluation items organized into categories for speech, composition, diction and tone, etc.
For example, Objective 8, listed in the "major category" of composition and the "sub-category" of paragraph, is as follows:

OBJECTIVE: The student will write a paragraph using identification as the method of exposition. The paragraph will conform to pre-specified criteria. These criteria are:

1. The paragraph will have a topic sentence to which the other sentences in the paragraph are related.

2. It will be free from gross spelling, mechanical, or structural errors.

3. It will use identification as the process for the development of the subject.

4. It will be as long as the teacher specifies. (10)

The following year Hook et al. published, as promised, Representative Performance Objectives for High School English, which provided objectives--now sporting the new label "performance" rather than "behavioral"--and rationales (5).

Those who welcomed the use of behavioral objectives to develop curricula predicted an orderliness and precision that school administrators and board members must have found appealing. They might, for example, be provided data such as the following on which to base curriculum decisions:

The voice and diction improvement program, using electronic laboratory tape cartridges, has cost $10,000 this year. It has served 800 students, 80 percent of whom have shown marked improvement in their voice quality and diction. Data sheets supporting this conclusion are attached. (Brett 45)

Such accounting would require special expertise, as Jewett explained in a 1971 English Education article. After baseline data had been gathered and pupil performance objectives had been written, an "independent educational auditor" could be called to assist:

He looks at the base-line data to see whether they are reliable, valid, and comprehensive in nature. He looks at the pretests and other pre-evaluation procedures to determine whether they measure what they are intended to measure. Later in the term he looks at interim and posttests and other evaluation procedures to determine whether they measure progress toward attainment of the performance objectives. (10)

In response to those who might question the power such a person might wield over an English program, Jewett was quick to point out that the auditor would not set the standards in English, and he would "not select tests or set the objectives for the course, although he might point out to the teacher or the project director that certain objectives are vague, nebulous, or unmeasurable or that other tests are needed" (11).

In 1969 NCTE had passed a resolution urging caution in the use of behavioral objectives in the teaching of English, and Maxwell and Tovatt's book published in 1970 indeed reflected caution. Although the early chapters provided a fictional scenario depicting both sides of the debate, even more persuasive may have been the later chapters written by individuals with varying attitudes toward behavioral objectives. Some contributors denounced the use of behavioral objectives and the pseudo-scientific approach to teaching and learning. Purves, for example, reminded readers that describing behavior involved "only a small part of what is going on when people read and respond to literature, when they generate utterances, and when they compose their conceptualizations" ("Measure" 96).
James Moffett saw broader concerns as well, including the "unintended mischief that will almost surely result from publishing behavioral goals, and the bad precedent set for future relations between government and education" ("Misbehaviorist" 111). Readers who respected Moffett must have been startled and influenced when at the end of the chapter he publicly withdrew from the Tri-University Project (116). The issue, nevertheless, continued to be hotly debated in most professional journals in articles carrying such titles as "Behavioral Objectives?--No" (Ferguson 52).

Evaluating English Language Arts

The whole matter of behavioral objectives was an issue that consumed an enormous amount of attention and energy among English educators of the time, but other issues related to the testing of students emerged or reappeared as worrisome concerns as well. While the country's attention was focused on the need for equality and for individual rights, for example, English educators considered anew what was fair to students and the extent to which their tests might be considered objective. Purves provided the simple explanation that "[t]he objective nature of the test is that a machine or clerk can be programmed to see if the test taker has chosen the answer that the testmaker selected as correct. The testmaker's judgment, of course, is subjective" ("Evaluating" 235). Similarly, Kirschenbaum et al. insisted that when classroom teachers create objective tests, every question is "selected in a subjective fashion by a teacher with certain pet interests" (197). The publication of their Wad-ja-get? (1971) raised further questions about the accuracy and legitimacy of student evaluation, especially grading, that led many English teachers to consciously monitor their own practices. In addition to cataloging the problems associated with grades and testing, these authors offered a range of alternatives to traditional grading, such as self-evaluation, pass/fail grading, and contract systems. Such alternatives were subsequently tried and discussed in a number of professional journal articles and may have influenced an NCTE Statement of Policy on grading which advocated among other things that only passing grades be recorded on a student's permanent record (Burton et al. 302).

As statewide testing received increasing attention and became more prolific, the terms "competency-based education" and "minimal competency" were used to describe the tests designed to measure whether students could perform the least that could be expected. Allan Glatthorn reported that as of 1978 "thirty-six states had taken some type of action in support of competency-based education" (17), and by 1981 Charles Cooper reported that every state in the country had "adopted or is seriously considering" minimal competency testing as a way to establish standards for grade-to-grade promotion and for high school graduation (vii). Calling this issue "the most explosive" one on the educational scene in 1981, Cooper's The Nature and Measurement of Competency in English represented NCTE's effort to respond first to a 1976 call for an ad hoc group to "explore this new development and to suggest various responses NCTE might make" and second to a 1977 resolution which opposed legislatively-mandated competency-based testing "until such time as it is determined to be socially and educationally beneficial" (18).
Designed explicitly as devices to sort individual students, competency tests were intended "to identify those who fail to meet a certain standard" (5). Aware of other potential uses of such tests, Miles Myers explained that in a cost-cutting era "professional statements of minimum competencies can be used as a rationale for cutting costs. Everything beyond the minimum becomes by definition a frill which public funds are not obligated to support" (166).

The problems associated with all tests, but especially with standardized tests, continued during these years to be discussed in professional publications. The use and abuse of test results drew perhaps the greatest condemnation. Dolores Durkin, for example, reported how test results were used to decide which preschoolers should stay at home for another year (767), and NCTE insisted that too often test results were misused "to place individual students in particular kinds of classes, to evaluate the effectiveness of a new curriculum, or to assess the strengths and weaknesses of a school system" (NCTE, Common 6). Several authors perceived widespread ignorance about tests and about their results, even among those in positions to determine policy: "In some school districts groups have called for a mandate that 95 percent of the children achieve at or above grade level," a mandate Purves pointed out was "a statistical contradiction in terms" ("Testing" 7). A writer of an article for a principals' journal reported in 1987 that "a school board candidate in my local district vowed that, if elected, he would see to it that all students were reading 'above grade level'" (Burrill 61). Roger Farr expressed similar impatience with those in positions of power who should know better: "It often seems that busy legislators can't be bothered with really understanding what's going on in the schools; all they want to know is whether the scores have gone up or down" ("New Trends" 22).

Paralleling the complaints and criticism were a number of positive efforts, such as the 1973 NCTE Research Instruments Project meant to find or develop "innovative ways to measure such things as growth in literary appreciation, reading, writing, listening, and speaking, and new means of assessing attitude change, climate for learning, and creativity" (Burton et al. 304-5). Some suggested computerized testing. Marvin Glock, for example, saw computers as perfect for a mastery curriculum: "It would be possible to have an automatic assignment of curriculum packages for an individual pupil based on tests given by a computer individuating degree of mastery along with diagnostic information" (63). Similarly, Brett predicted individualized instruction using behavioral objectives programmed into a computer, "which will track and guide the student, task by task. The individual pacing will be made possible by an abundance of instructional materials" (45). Henry Slotnick and John Knapp even discussed essay grading by computer (1971).

Many English educators, however, focused on informal classroom assessment. Dorothy Strickland, for example, advocated teacher observation (vi). Burton et al. suggested monitoring lists of books read, student journals, and attitude surveys, among other items (311).
Contributors to Dorothy Watson's Ideas and Insights (1987) encouraged videotaping literature discussion groups, self-evaluations, and anecdotal record sheets. Offering the least intrusive suggestion of all perhaps were John Mayher and Rita Brause, who explained that "[t]here is no reason why children can't be evaluated on the basis of the work they are actually doing during the year" (394). Lori Clarke, on the other hand, suggested that testing should be viewed positively as a "culmination" rather than an examination--a "culmination of his learning, of his intellectual excitement aroused by the interaction of fellow students and teacher upon each other" (43). English language arts teachers of this era, then, were confronted with widely disparate approaches to assessment of student performance--as indicated by the contrast between tests designed to identify students who fail and tests conceived as the culmination of intellectual excitement.

Evaluating Reading

The teaching of reading had by 1970 come to be regarded by many as so significant as to be a separate discipline, prompted at least partly by the influence of the International Reading Association and its affiliates. Not only was the broad issue of reading performance discussed in professional journals but smaller subtopics were treated as well. One 1974 list, a "Concise Guide to Standardized Secondary and College Reading Tests" (Mavrogenes et al.), abstracted and discussed 58 tests but cautioned that almost 2,000 reading tests had been included in the most recent Buros index. Perhaps the widespread use of the standardized tests could be explained by the promises made in the publishers' descriptions. Roger Lennon suggested that if the best publishers were to be believed, standardized reading tests could measure any and all of the following:

paragraph comprehension, word meaning, word discrimination, word recognition, word analysis skills, ability to draw inferences from what is read, retention of details, ability to locate specific information, rate of reading, speed of comprehension, visual perception of words and letters, ability to determine the intent of a writer, ability to grasp the general idea, ability to deduce the meaning of words from context, ability to read with understanding in the natural sciences, in the social sciences, in the humanities, ability to perceive relationships in written material, ability to sense an author's mood or intent, ability to appreciate poetry, ability to grasp the organization of ideas, ability to read maps, charts, and tables (19)

After bombarding his reader with such a list, Lennon asserted that reliably only the following components of reading ability could be recognized and measured by using standardized reading tests: a general verbal factor, comprehension of explicitly stated material, comprehension of implicit or latent meaning, and an element that might be termed "appreciation" (29).

Other writers resisted the use of reading rate as a criterion (McDonald) and the use of readability scales to determine passages for reading tests (Rankin), while still others pointed out flaws in the test items themselves. Virginia Allen's 1978 article, for example, explained that a reading subtest had asked students to "indicate which of these choices is the first syllable, or first part of the word printed in front of the choices":

Item 7. riddle: ri rid ridd
Item 9. after: a af aft
Item 18. have: h ha have
Item 20. here: h he here (89)

Beyond the question of whether "first syllable" and "first part" might mean the same thing lies the more important question of whether correct answers to such items might provide any helpful indication of a student's ability to read. A 1987 Language Arts article reported that in one oral reading inventory such short passages were used that some of the children responded by trying to link the passages into a single story line, with "patently wrong answers" resulting (Bussis and Chittenden 307). The children were in this case penalized for knowing too much--or for using what they knew--about how stories are constructed. Similarly, children were penalized when they "substituted an idea that seemed more logical to them than the particular idea expressed in the test" (Bussis and Chittenden 307). For example, one short reading passage--"Bud had run. He fed the pup. The pup ate a bun"--was followed by the question, "What did the pup eat?" Some children answered that the pup had eaten a "bone" while others said "some food." The authors explained that in a follow-up discussion the children had said that they were unsure what a bun meant in this context or "they couldn't imagine such a thing being fed to a dog." Neither response qualified as correct in the test manual (Bussis and Chittenden 307), though again the "incorrect" answers indicated nothing about ability to read but instead indicated perhaps only that these children were not savvy test-takers.

There seemed general agreement that many of the reading subtests were inappropriate measures of reading performance. Occasionally, however, an article appeared advocating the use of ever smaller particles of text to test students on. Albert Marcus, for example, encouraged word recognition tests which would measure "each discrete skill involved in decoding," and he insisted that this measurement "should be developed by skill for each phonic element that has to be taught" to the point that "[r]ather than use a sample of one of the initial consonant blends with r, the test should include all the blends with r, so if the student knows some of them, the teacher will know which ones are known and which ones are not known" (734).

Some seemed aware that better assessments of reading depended on discovering better definitions of the reading process. As psycholinguistic and socio-psycholinguistic definitions of reading emerged, so did a number of new informal classroom reading assessment strategies and eventually even new large-scale reading assessments as well. One of the more unusual alternatives was miscue analysis. Working individually with a reader, the teacher was advised to ask the child to read aloud an unfamiliar passage and to record any miscues, i.e., deviations from the original text, which could later be analyzed to determine the quality of each response. The reader was not penalized for minor deviations which did not negatively affect the reader's comprehension. For example, if the text "The boys ran through the dark forest" was read as "The boys went through the dark forest," the child's misreading indicated that in fact the passage had been understood. On the other hand, a reading of "The boys ran through the dark frest" would indicate a lack of comprehension (Goodman and Burke 4).
By considering as many as 28 questions about each miscue and by tallying the kinds of miscues in a selected passage, the evaluator could acquire information about the student's reading strategies to be used in designing future instruction. (Although miscue analysis is based on reading aloud--a practice considered relatively unnatural by those who think of themselves primarily as silent readers--it yields assessment information that seems impossible to get any other way.)

As they came to regard reading as a process that required an active reader, teachers and researchers began to offer a variety of measures by which attitude toward reading might be assessed. Thomas Estes, for example, devised a scale to measure reading attitudes (Figure 5), and Betty Heathington and Estill Alexander proposed in 1978 a "child-based" observation checklist:

Figure 4 - Observation Checklist to Assess Reading Attitudes

In the two-week period, has the child:
1. Seemed happy when engaged in reading activities?
2. Volunteered to read aloud in class?
3. Read a book during free time?
4. Mentioned reading a book at home?
5. Chosen reading over other activities (playing games, coloring, talking, etc.)?
6. Made requests to go to the library?
7. Checked out books at the library?
8. Talked about books he/she has read?
9. Finished most of the books she/he has started?
10. Mentioned books she/he has at home? (770)

Attitude Scale
A = strongly agree, B = agree, C = undecided, D = disagree, E = strongly disagree
1. Reading is for learning but not for enjoyment.
2. Money spent on books is well-spent.
3. There is nothing to be gained from reading books.
4. Books are a bore.
5. Reading is a good way to spend spare time.
6. Sharing books in class is a waste of time.
7. Reading turns me on.
8. Reading is only for grade grubbers.
9. Books aren't usually good enough to finish.
10. Reading is rewarding to me.
11. Reading becomes boring after about an hour.
12. Most books are too long and dull.
13. Free reading doesn't teach anything.
14. There should be more time for free reading during the school day.
15. There are many books which I hope to read.
16. Books should not be read except for class requirements.
17. Reading is something I can do without.
18. A certain amount of summer vacation should be set aside for reading.
19. Books make good presents.
20. Reading is dull.

Figure 5 - Estes Attitude Scale

Readers' prior knowledge also came to be considered an important criterion by which to evaluate reading comprehension. By 1987 Betty Holmes and Nancy Roser offered "Five Ways to Assess Readers' Prior Knowledge," and Merlin Wittrock suggested "Process Oriented Measures of Comprehension," which involved asking students to summarize, reread, question, and infer (736). Sheila Valencia and P. David Pearson described "the best possible" reading assessment as teacher observation and interaction with students "as they read authentic texts for genuine purposes" (728). Building on Vygotsky's notion of the zone of proximal development, the classroom teacher then could intervene with support or suggestions as needed (728).

By the mid-eighties efforts to redesign statewide reading tests were well under way.
By 1987 "at least 40 statewide competency testing programs were in place" (Valencia and Pearson 727), and reading theorists were determined to reconceptualize state assessment to more closely reflect the new reading definition. Farr explained that strategies used by better readers--i.e., constructing meaning from background knowledge, visualizing story events and sequences, and hypothesizing about facts and events in texts--might be used as criteria for the new tests ("New Trends" 23). The description by Karen Wixson et al. of Michigan's latest reading assessment parallels some of the strategies Farr mentioned:

First, good readers must be able to integrate their knowledge and skills as they construct meaning for different texts under a variety of reading conditions. Second, good readers must have knowledge about the various purposes for reading, about how different reader, text, and contextual factors can influence their reading, and about the skills and strategies they can use in their reading. Third, good readers are those who have developed positive attitudes about reading and positive perceptions about themselves as readers. (750)

Using full-length stories and subject area texts, the Michigan test was designed with the following balance of responses--50 percent constructing meaning, 30 percent knowledge about reading, and 20 percent attitudes and self-perceptions (751).

Illinois' experimentation with new statewide reading evaluation was described by Valencia and Pearson (1987). It involved summary writing, metacognitive judgments, question selection, multiple acceptable responses, and prior knowledge (730). In each case students were given multiple-choice questions to answer, each with a problem-solving format. For example, in order to evaluate summary writing, students read three or four summaries written by other students and selected the one they thought best. These authors acknowledged the benefits of classroom assessment but also warned that "[u]nless we can influence large scale assessment, we may not be able to refocus assessment at all" (730).

Evaluating Literature

When compared to the array of reading tests and assessment tools available and discussed, assessments of literature seemed almost non-existent during these years. Indeed, Walter Moore and Larry Kennedy pointed out in 1971 that Buros had listed "no standardized tests for literature at the elementary school level" (443). There was still the occasional article suggesting an expansion of the criteria used to evaluate students' understanding of and experience with literature, such as Sarah Snider's 1978 article, "Developing Non-Essay Tests to Measure Affective Response to Poetry." For the most part, however, discussion focused during these years on the possibility of using behavioral objectives to teach and to test literature. Purves, for example, attempted to use Bloom's taxonomy to create measurable behavioral objectives for literary works, contextual information, literary theory, and cultural information. Multiple-choice items, Purves insisted, could be designed to yield the needed responses, as in the following item intended to measure the extent to which a student could "accept the importance" of a literary text:

Most poetry seems like a meaningless jumble of words.
1. Strongly agree
2. Agree
3. Disagree
4.
Strongly disagree ("Evaluation" 750)

A Reading Teacher article presented both the "yes" and "no" sides of the question of "Behavioral Objectives for Children's Literature?" Gordon Peterson's "yes" response, while conceding that research evidence was limited and inconclusive, included the listing of 47 objectives, such as, "After reading a selection with an implied theme, in one sentence state the implied theme and briefly defend the choice of the implied theme" (657). Other writers seemed less convinced that elementary children's experiences with literature would be enriched by using such objectives or evaluation questions. Patrick Groff in his "no" response, for example, argued that children's literature did not lend itself to any kind of "pre-determinism or prescription about how child readers should respond to it" (662). Instead, he insisted, imposing behavioral objectives would narrow the number and variety of responses (662). Somewhat similarly, Bill Ferguson argued that "[e]vidently, the behaviorist thinks that an examination of the atoms of poetry will allow the student to assemble them in his mind, add them up, so to speak, and arrive at a total effect" (54).

Evaluating Writing

During the 1970s the evaluation of writing also focused on the controversy about the appropriateness of using behavioral objectives. Joseph Foley, for example, used essentially the same method that Purves had suggested in constructing a matrix by which content (in this case, ideas, organization, style, mechanics, and choice of words) could be measured by means of cognitive and affective behaviors, as demonstrated in responses to multiple-choice questions about writing (770). For the most part, however, there seemed general agreement among writers of professional publications that writing could only be evaluated using actual texts composed by the students themselves. As had been true in earlier periods, English teachers continued to suggest ways to respond to student writing and to address the sometimes conflicting issues of response to student writing and grading of student writing.

By the late 1980s Calkins, Atwell, Romano and others had suggested a variety of classroom evaluation procedures. All encouraged student writers to self-evaluate their own writing during the writing process and to confer with peers and teacher about needed revisions (e.g., Calkins 159), and all emphasized the need to temper response and evaluation so that students emerged from the experience eager to write again. Romano, for example, described his system, which involved an initial sorting into stacks on the basis of overall impression and then writing individual comments and a grade for each student (114-15). Atwell explained the use of periodic evaluation conferences with students, in which student and teacher discussed and evaluated writing pieces gathered over time in a writing folder (114). As criterion-referenced tests of writing were being recommended instead of norm-referenced ones (Squire, "Behavioral" 146), authors of professional articles and textbooks proposed checklists as evaluation guides. Using these checklists in the classrooms of the '70s and '80s may have served as an early effort to share the evaluation criteria with the students, to let them in on the evaluation secrets, so to speak, that have too often seemed unfathomable to students.
Beyond being used for evaluation of student performance in individual classrooms, criterion-referencing became one of the highly recommended methods for scoring student writing samples on school-wide and district-wide writing assessments. Roger McCaig described the criteria created for use with writing samples collected during a school-wide assessment--criteria syntactically based on what he referred to as M-units, which were similar to the better known T-units created earlier by Hunt (Hillocks 64) but different in that for young children's writing, allowance was made for writing which could be "reconstructed into a sentence in accordance with a judgement about the child's intention" (McCaig, "The Writing" 7).

The publication in 1974 of Paul Diederich's Measuring Growth in English seemed to refocus attention on writing assessment both inside individual classrooms and beyond. In describing the Educational Testing Service (ETS) model of holistic and analytic scoring of student writing, Diederich promised ways of "improving the reliability of grades on essays" (1) and a system of evaluation that could eliminate "more than 90 percent of the grading that goes on day after day in almost every classroom" (4). Six years later when Myers published A Procedure for Writing Assessment and Holistic Scoring, he spoke of Diederich as the person "who has done more than anyone else to develop holistic scoring procedures for use in the schools" (2). For those who had not already experienced holistic scoring, Myers' book provided enough how-to information to persuade many of them that they could do it and that it was the best way to assess writing (4). By teaching the scoring procedures to students, classroom teachers could share more of the evaluation secrets with their students. In Chris Paulis's middle school classroom, for example, holistic scoring was used as a revision strategy (128).

By 1985 Lester Faigley et al. had moved beyond holistic scoring of written products to consider the writing process as well:

. . . if instruction is to focus on processes of composition as well as products, then evaluation efforts must accommodate this shift in focus. Evaluations must provide useful descriptions of the ways students compose in order to identify and assess changes in these processes that result from instruction. (161)

These authors suggested a variety of strategies for gathering process information, especially advocating the use of process logs, self-evaluation questionnaires, and pre- and post-term interviews (173-77).

Many states during these years began to reconsider the use of multiple-choice items about writing to indirectly assess student writing performance. In Florida, however, the multiple-choice format "was predetermined by the bureaucracy," though teachers and the Florida Council of Teachers of English later had "input" (Simmons 27). Pointing out that the ETS team of test writers under contract was "wholly from the ranks of psychometricians," Simmons regretted that "[n]o verbal expression is tested or measured; aside from filling in their personal data at the top of the page, Florida students don't produce a word, either orally or in writing, throughout" (27). In most states, however, test designers followed Diederich's lead and developed tests involving written work composed by students, which could be holistically scored.
Colorado, for example, developed a Writing Assessment Program which asked students in seventh, ninth, and eleventh grades, as well as college and university freshmen, to write on the same topic so that papers could be scored in a group across age levels (Distefano and Killion 208). The California Direct Writing Assessment, using matrix sampling, included writing prompts which required students to produce a variety of writing types for a variety of purposes and audiences (Peckham 31). In the case of California, and in a number of other states as well, one of the criteria used to judge this program a success was the important part played by inservice workshops which gave teachers the "opportunity to talk about what criteria are important in a specific type of writing" (32). In descriptions of the New York State Writing Test for fifth graders, similar benefits were pointed out. Charles Chew, for example, explained that inservice programs throughout the state "train local educators not only to rate the tests but to develop instructional strategies to meet students' needs in writing" (56). This particular test was praised as coming "very close to approximating the composing process" (Chew 50), since students wrote two different pieces and in addition were given time over two days for prewriting and revision (Chew 50). When Maine instituted its statewide writing assessment, teachers who had participated in training for scoring "decided they should teach the scoring procedures to their students," which led to a writing exchange program with another school in the state (Takacs 34), thus again sharing the evaluation criteria with students as a nudge to internalize them and apply them to their own and to others' writing.

Evaluating Oral Language Arts

As earlier, very little attention was given to assessment of students' oral performance, at least among journals and books read by an audience of English language arts teachers--who seem less likely to have read professional journals for speech teachers. Still, the "language arts" were thought of as including not only reading and writing but speaking and listening as well, and the evidence seems clear that oral language was overlooked as an evaluation issue. Indeed, Walter Moore and Larry Kennedy reported in 1971 that there were at that time "no standardized tests which measured speaking" (442), and only occasionally did English language arts publications discuss classroom-based oral evaluation strategies. One 1974 text did include a separate chapter for both listening and oral communication, though the focus seemed to be on listening to formal presentations (Burns 52) and on discovering oral "deficiencies" (Burns 64), most noticeably non-standard usage (76). Similarly, John Melear's 1974 article, "An Informal Language Inventory," described a procedure used primarily for "children who display a lack of standard English" (510), mostly those who were bilingual and from low socio-economic groups. The data in this case were gathered by inferring the quality of language use from the "percentage of grammatically correct sentences" used by pupils as they told about pictures they had drawn (510). Along the same lines, an article by Margaret Brown recommended asking children to tell a story from a book of pictures while the teacher taped the session for later analysis (507).
Janet Black's article, "There's More to Language Than Meets the Ear," asserted that oral language assessment must include observing, that is, the "seeing and watching of children in various social and interactive contexts" (527). Perry Gilmore, however, encouraged classroom teachers to broaden their thinking about language assessment further by suggesting that "peer-owned language such as occurs on the playground should be included in a comprehensive language assessment" (584). Through his analysis of "steps" performed by the girls at recess, Gilmore discovered that "[a]lthough most of the students in the observed classes were identified as skill deficient, observations, to the contrary, indicated that the students were skill proficient" (365). Recognizing that "steps" did not count officially as literacy skills, Gilmore insisted that "[a]ssessment has too often meant closing doors rather than opening them. . . . Instead, teacher expectations should be raised through an awareness that students are capable of doing more with language when they are given the room and respect to do so" (390).

When distinguished from attention given to speech and hearing evaluation in regard to physical impairment, speaking and listening as language arts seemed for the most part to have escaped the attention of state test makers. By 1980, however, an article appeared which described a Massachusetts project intended "to assist in developing assessment procedures in listening and speaking for the state's elementary and secondary students" (Backlund et al. 621). Being careful to explain the differences between oral and written language, they argued that "assessing reading and writing skills should not be used to indicate achievement of speaking and listening" and explained that competence in oral communication is dependent on the interaction of several factors including

the speaker's and listener's purpose or task in communication; the topic or subject being talked about; the attitudes, experiences, maturity, skills, and knowledge background of both the speaker and listener; and the time, place, and preceding events of the communication setting. (623)

National English Language Arts Assessment

Although the first National Assessment of Educational Progress (NAEP) was administered in 1969, it was first reported on during the early 1970s. English educators continued to have mixed feelings about such an assessment, as suggested by Shafer's article title, "A National Assessment in English: A Double Edged Sword." Finally in 1975 John Mellon published for NCTE National Assessment and the Teaching of English, a book summarizing "in detail the findings of the initial writing, reading, and literature assessments" and interpreting the factual data "from a number of perspectives" (1). In writing, students were given imaginary situations and directed to write short passages in response, which were scored either "acceptable" or "unacceptable" (16); they were asked yes/no questions regarding out-of-school writing; and they were asked to compose essays, which were scored holistically (20). Some of the reading exercises involved isolated phrases and sentences of text but others involved short passages of prose and poetry (42). Results were reported as percentages of respondents able to read and comprehend each item.
The literature assessment, Mellon explained, seemed "a first step only in its intended direction--cautious, conservative, almost tentative, and frankly experimental in places" (76). This portion of the assessment used multiple-choice questions with reason follow-ups, orally composed and tape-recorded answers to open-ended questions about works of literature, and essays about a given work (85).

In spite of English educators' mixed feelings about the test itself, Purves praised the presentation of results in terms "intelligible to the layman" using "percentages of people performing satisfactorily on clearly understood tasks" ("Evaluating" 238). Rexford Brown was another proponent of NAEP whose articles appeared frequently in English language arts publications. He seemed convinced that "in many states and districts I am familiar with, the NAEP legitimized activities that were close to happening but needed a push or some kind of 'authoritative' support before they could be put into action" ("The Examiner" 221-22). As an employee of NAEP, Brown was not an unbiased reporter in this case, though later materials will show that others shared his impression that large-scale tests have the potential to legitimize specific classroom activities.

Evaluating English Language Arts Teaching Practices

Issues related to evaluation of students' performance clearly spilled over into the realm of evaluation of English language arts teaching practices during these years. Much of the discussion appeared within the context of students' performance--to praise or blame teachers based upon their students' performance, to disavow any such link, or to argue the possibilities on both sides. McCaig, for example, seemed to believe that students' performance directly reflected teaching practices:

Not until the annual "writing test" was initiated as part of the spring achievement battery were differences between classrooms documented and the debate resolved. The first year of testing demonstrated in a systematic way what some people already knew; namely that some first grade teachers were teaching writing and some were not. ("What Research" 49)

Lanny Morreau insisted that teachers should welcome having their own performance linked to that of their students:

. . . an evaluation based on the learned repertoire of students is both less capricious and far more educationally relevant than the half-hour visit by a superintendent, an inventory of teacher attendance at staff and PTA functions, or the particular mode of dress which a teacher selects. (37)

English teachers should, Morreau insisted, "demand" that their own evaluations be "based on their effect upon the behavior of their students" (37).

Other writers of professional articles tended to agree that administrative evaluation of teachers was inadequate. Burton et al., for example, pointed out that "[g]enerally, if the teacher seems well prepared, the students appear reasonably content and profitably occupied, the class is orderly, supervisors will assume that all is well" (294). What was offered by English educators in the place of administrative evaluation, however, seldom linked teacher performance to student test scores.
John Hassett, for example, explained that even as "[w]e do not judge a doctor's competence by the blood pressure readings of his patients," we also "should not judge a teacher's competence on the basis of the test scores of the pupils unless there is additional evidence that the teacher is failing to do his or her job in the classroom" (31). John Maxwell also asserted that "[t]est score results do not evaluate teacher effectiveness . . . achievement test scores represent only a fraction of the effect of English 'teaching' and learning" (27). Further, Maxwell insisted, such tests measure "a fraction of an English teacher's goals" and therefore student scores "are too unreliable an index to be used for personnel decisions" (27). Mayher and Brause perceived a trap that innovative teachers often find themselves in, "caught between their own professional convictions about the best approaches to promoting pupil learning and the outside systems of assessment used to measure their success" (391).

What seemed most often suggested as a remedy for relying on students' performance on tests or on administrators' evaluations was allowing English language arts teachers to become involved in their own evaluation. Velma Elliott, for example, suggested that teachers should become peer evaluators and proposed a system whereby teachers could submit names of teachers to their principals, who would choose from among the names a team of evaluators (727). Burton et al. went further, however, and suggested teacher self-evaluation instead. Teachers could, for example, analyze their own behavior by means of a journal or personal checklist (295), tape-recorded sessions, observation by colleagues (296), student evaluations (297), interviews with students, or questionnaires (299).

These years were ones in which many English language arts teachers sensed a public disappointment in their performance. Allan Glatthorn, for instance, seemed unsympathetic with English teachers, whom he believed had "largely ignored the exhortations to make radical changes in the way in which they teach" (13). Citing Goodlad's study of schools, he theorized that "the scholar's recommendations are largely ignored by the classroom teacher, who finds them either too recondite or too unrealistic" (18). Further, it was Glatthorn's opinion that

The formal curriculum is often quietly subverted, the mimeographed curriculum guide filed in the bottom cabinet until evaluation time . . . projects which try to develop teacher-proof curricula fail because they fall into the hands of curriculum-proof teachers. (19)

Many teachers' responses to the situation were predictably and understandably defensive, as expressed by a teacher educator who observed, "Anyone who comes into my room to watch me teach and has a rating scale in hand is my enemy" (Small 176). Maxwell blamed teachers for some of the public fascination with tests, saying "the testing fraud is in major part something that is done by the consumers to themselves" (iv). Citing the work of an NCTE Task Force on Measurement and Evaluation, Maxwell reported "evidence of widespread ignorance about tests among teachers, administrators, members of school boards, the media, and the public" (iv).
When Sheila Fitzgerald surveyed teachers several years later, she found that many teachers still seemed unaware of the criticisms being leveled against standardized tests and that "especially the teachers of elementary students" believed that tests accurately reflected what students had learned (39). Even later Mayher and Brause observed that ironically "as teachers, we have primarily communicated with parents through grades and test scores whether or not we believe that they actually measured student learning. We have, in effect, taught parents to have a high regard for such scores and are, therefore, caught in a trap of our own making" (391). Those who believed themselves enlightened about tests and testing were still caught up short by Valencia and Pearson's assertion that teachers "take secret pride that [their] pet instructional technique produces greater gains than other techniques" on one of the very tests that have been criticized (726).

English language arts teachers also found themselves criticized when they did--and when they did not--teach to the tests. Erickson theorized that "[p]ossibly many of our 'best readers' and their classmates show depressed scores on standardized reading comprehension tests because of an acute deficiency in test taking ability" (140). Such a statement was a telling one because of the implied criticism it included. It implied, first, that such tests were in fact accurate measures and important. Further, it can be interpreted as a criticism both of teachers, who failed to teach what their students needed, and of the students who, according to the prevailing attitude at the time, were themselves considered at fault or "deficient." Faced with students possessing such a "deficiency," teachers had to decide how many test-taking strategies to teach and whether to use such materials as an NEA book published in 1986 with the title How to Prepare Students for Writing Tests (Tuttle).

Given the criticism focused on tests, teachers during this period were increasingly encouraged in professional publications to become informed about issues, to think of themselves as the best evaluators of their students' work, and to evaluate their own teaching practices as well. In response to a 1973 resolution, NCTE published the practical booklet, Common Sense and Testing in English (1975), which provided teachers with enough technical knowledge to analyze their current testing situations and the need for change. Clearly written text and diagrams provided guides to help English teachers understand and help others understand the range of testing problems and possibilities, including, for example, "Legitimate Uses and Users of the Results of Measurement" and a "Citizen's Edition" of the report as well.

Even as teachers began to seek alternative measures by which to evaluate their students, they were also encouraged to ask questions about their own teaching and to systematically collect data and test hypotheses about it (Judy, The ABCs 164). Citing advantages of self-reporting, S. M. Koziol and Patricia Burns explained that as teachers completed self-report inventories, they "became aware of instructional choices they had not been aware of or had not considered for some time . . . they were examining what they did as teachers and they were thinking seriously about alternatives for their classrooms" (116).
This tongue-in—cheek narrative recounted a search for the best assessment tools. As the buyer entered "Ernie’s Evaluation Emporium," he/she was shown a variety of models by the manager, "Norm Reference." Each model seemed to promise more than the one before, e.g., This sleek model here provides grade equivalent scores out to 6 decimal places, is renormed every other Tuesday, and the publisher guarantees that every child scoring over the 90th percentile will receive a simulated gold bracelet with his or her score etched on the back. (Fredericks 790) Eventually the latest model was described, along with the promise that "they last forever and consistently give accurate diagnostic and evaluative information on students" and in addition, "the price is right" (791). Teachers, of course, were the latest model. In that same issue of Thg Reading Teacher, however, a more serious article echoed the same position. "Teachers as Evaluation Experts" highlighted the value of the evaluation expertise teachers provide as they detect patterns, know classroom procedures, listen, and evaluate to serve instruction (Johnston 740). Such curriculum materials, 153 Evaluating English Language Arts Curricula Just as evaluation of teaching practices was linked to evaluation of student performance, so also was evaluation of English language arts curricula linked. Although English educators insisted that standardized tests did not provide enough information that would make them useful in deciding about changes in the English curricula (Maxwell 2), during these years as curricula were designed and developed and evaluated, student performance on tests was the dominant driving force. How difficult it must have been to watch testing overpower the curriculum and to have the resulting weakened curriculum shape evaluation. George Madaus cited evidence of the teaching and testing circularity when he spoke of a public school principal who testified that . . . reading instruction has come to closely resemble the practice of taking reading tests. In reading students using commercial materials read dozens class, of little paragraphs about which they then answer questions. The materials they use are more and more designed to look exactly like the tests they will take in the spring. (8) Mayher and Brause pointed out, "enable kids to practice for the tests--which thereby demonstrate that the schools are achieving their objectives" (392). curricular materials pointed out, houses . . . Perhaps the ultimate link was between publishers of and tests. As Betty Jane Wagner "Editors at the el—hi desks of major publishing know they can appeal to potential buyers by 154 citing better test performance by students who have piloted their textbooks" (55). Indeed, a booklet called Imprgying SAT Scores accompanied the 1985 edition of the Ginn Literature Series, its introduction stating, The booklet provides an overview and explanation of the verbal SAT, a description of the relationship between the instructional program of the Ginn Literature Series and the SAT verbal skill areas, and test practice masters in the four SAT areas. . . . (1) Although student performance issues dominated curriculum discussions, there were during these years those who addressed the need to evaluate English language arts curricula as a precursor to considering new options. Three books published in 1980, in fact, were entirely focused on the English language arts curriculum. 
Glatthorn's Guide for Developing an English Curriculum for the Eighties included Mandel's Foreword citing a shortage of recent NCTE books on curriculum reform: so much professional energy had been expended in responding to "proponents of competency-based teaching, minimal competencies, and state-mandated testing" (ix) that there had been little energy left to consider curriculum reform. What Glatthorn proposed, however, was a "mastery curriculum" characterized by careful sequencing (i.e., "learning of objective 3 depends on mastery of objectives 1 and 2") facilitated through careful planning (i.e., "teaching objective 3 requires deliberate analysis of its component skills"), which should result in measurable outcomes (i.e., "a test can easily determine whether objective 3 has been mastered") and was "best mastered when its content is clearly delineated into discrete units or lessons" (28).

Barrett Mandel's Three Language-Arts Curriculum Models was another NCTE publication of 1980, intended as a response to a 1977 NCTE sense-of-the-house motion calling for national guidelines for curricula in English (1). Mandel admitted that his book might seem to readers a "far cry from the intention of the motion" and explained that what he offered instead was a description of three curriculum models from which readers could choose: the competencies model; the heritage or traditional model; and the process or student-centered model (3).

Without the constraints imposed by Glatthorn's mastery curriculum and without the indecisiveness suggested by Mandel's three models, Judy (The ABCs of Literacy) proposed "ten global priorities" for literacy education, though he suggested they be simply a "starting point for school and community discussions" (82). Among them are the following assertions that serve as a rejection of the teaching-testing circular trap: that literacy programs should be "based on reading and writing experiences, not principally on the study of literacy-related skills" (82); that they "must lead to continuous growth rather than offering isolated experiences or training" (85); that they "be developed by the people who will conduct them--the teachers" (92); and that "teachers must be willing to offer instruction in reading and writing skills whenever and wherever those skills are needed" (95). (While none of the three 1980 books seemed to have had a particularly striking effect on later discussions of curriculum evaluation, it is interesting to consider what impact Judy's text might have had on English teachers if it too had been published under the auspices of NCTE, since it seemed to fulfill more closely the spirit of NCTE's 1977 resolution.)

Officially NCTE continued to evaluate school curriculum guides and to offer a review service which districts or schools could use to obtain a critique from NCTE Curriculum Committee members free of charge. Its published models and guidelines provided periodically updated criteria by which curricula could be evaluated. In 1973 NCTE created a Task Force on Measurement and Evaluation.

Some school-wide curriculum evaluation projects were reported involving a variety of curriculum assessment criteria. APEX Evaluated and Revised (1975) was one such effort. In this case, the school's "nongraded phase-elective English curriculum" was evaluated.
Language arts teachers volunteered to meet after school to develop the evaluation process, which included using in-depth questionnaires (7) and semantic differential scales to measure student responses and attitudes (6). More predictably, other criteria suggested--though apparently not used in this project--were scores on reading, SAT, and ACT tests (8).

In contrast to the APEX study is the example of a small private school's efforts to gather the data needed to make curricular decisions. The Prospect School was described as a K-9 demonstration school in Vermont with a staff of five teachers and three researchers. Teachers and researchers gathered the following descriptive records for student and curriculum evaluation purposes: drawings, photos, journals, written work, teachers' weekly records, teachers' reports to parents, curriculum trees, sociograms, and more (Carini 45). Such a project, though admirable and perhaps a model for some (those who could provide three researchers for every five teachers), would be beyond the reach of even the most ambitious teacher-researcher working alone.

A survey of curriculum evaluation materials during these years reveals that essentially missing from this period were large-scale studies of English language arts programs, such as Squire and Applebee had conducted earlier. Arthur Applebee's Writing in the Secondary School is one of the few exceptions. Using classroom observation of writing assignments and related instruction in two midwestern high schools over a full academic year and a national questionnaire survey of teachers in six major subject areas, he studied teachers' actions and attitudes. His research results charted such things as "Mean Percent of Lesson Time Involving Writing Activities" (31) and "Types of Writing Reported by Students" (33). Given the overwhelming emphasis on student performance and on teacher accountability during these years, broad curriculum evaluation projects were apparently seldom funded, and even school-wide projects, especially realistic ones, seemed rare.

Toward a Theory of English Language Arts Assessment

In 1978 English Journal editor Judy observed that English teachers had been accused of being "anti-test" or "anti-evaluation" ("Standardized" 6). Indeed, some seemed to condemn all tests, and sometimes the reasons cited seemed valid, as in the case of a 1974 editorial which theorized that student performance would improve if, instead of testing, the money used for tests were diverted to "building better reading programs, supplying teachers with more and better books, and training teachers in the use of more effective approaches" (qtd. in Farr and Roser 594). Following the editorial, Farr and Roser responded by insisting that, even if such a plan could be put into practice, "a future editorial would be demanding evidence that the additional funds were being wisely spent" (594).

The accountability emphasis at the beginning of this period eclipsed other English language arts evaluation concerns. Classroom teachers, told by administrators and legislators to follow and sometimes produce countless behavioral objectives, had little time or energy left to devise alternatives. Perhaps, however, it was their intuitive aversion to the behaviorist approach that prompted their determination to discover what it was they found so offensive and to devise alternatives.
The fact that so many of the alternatives they suggested were classroom-based and informal measures may indicate their readiness to rethink assessment completely. English educators seemed convinced that assessment should match current theory and practice, and they directed their assessment reform efforts toward achieving a better match. While such an approach seems logical, it raises the question of how close the match should be. For instance, in the days when telephone courtesy was emphasized in English language arts classrooms, should tests have followed suit (as in fact they did) and given a similar emphasis to such skills? It seems possible that to do so is very much like the teaching-testing trap of the basal reading skills programs that have been so criticized in recent years. An important question for future consideration, then, is the extent to which English language arts evaluation should or should not follow current ideas about curriculum and teaching practices.

It is interesting that the accountability system was promoted during a time when the nation was supposedly focused on equal opportunities for all, yet this system of evaluation was clearly a human sorting process focused on finding deficiencies in the skills that groups outside the dominant culture were apt to possess. It was, then, a political and economic issue that in some ways parallels the emphasis on grammatical correctness during the years of mass immigration in the 1920s and '30s. Who was served by the particular ways that English language arts was evaluated and by the focus of the evaluation? Those who felt their own positions threatened?

If there was an emerging attitude at the end of this period, it was that assessment efforts should demystify testing--that students should share in the evaluation process and understand it. Rather than thinking of assessment as a means of separating or excluding those who do not measure up, assessment should, at least in part, be thought of as a way to acknowledge strengths. Another important direction has grown from the holistic scoring of writing, which, by being a reliable measure, has influenced teachers to rethink their own ability to evaluate effectively and to rethink how evaluation should be done, i.e., in the context of a whole work.

Perhaps by the end of the eighties the English language arts world was ready for a theory of English language arts assessment--whether it be assessment of student performance, assessment of teaching practices, or assessment of curricula. If so, several tenets of such a theory began to emerge during the '70s and '80s--that process forms of measurement be included (M. Wilson 12), that error be valued as necessary to the developmental process (M. Wilson 12), that emphasis be placed on "possibility rather than on actuality" (Chaplin 216), that assessment programs and procedures foster sound teaching practices, and that the secrets of evaluation be shared so that evaluative criteria can be internalized.

CHAPTER SEVEN
CURRENT CONDITIONS

The Power and Impact of Testing

Since 1987 assessment has become, in education circles and beyond, an often-discussed topic. Even writers for a Time cover story reported their perception that "young people want constant feedback from supervisors . . . people in their 20s crave grades, performance evaluations and reviews. They want a quantification of their achievement" (Gross and Scott 59).
If there is any truth to such perceptions, schooling has no doubt played a central role in creating such circumstances. In fact, the impact of testing, as reported in professional publications in English language arts and other disciplines as well, has become so great that every facet of education is being rethought through the filter of assessment, and some theorists have concluded that school itself is becoming a "test-like activity" (Langer, qtd. by Edelsky and Harman 160).

At the same time test-bashing is common in almost all professional quarters, especially in English language arts. Standardized tests are criticized as "synthetic, contrived, confining, and controlling, out of touch with modern theory and research" (K. Goodman et al., Whole xi). Almost everyone in the language arts field seems to believe that tests have come to wield too much power over curriculum and over the lives of students and teachers alike. Edith Aronson and Roger Farr speak of the "growing empowerment of tests," and Madaus insists that "[t]esting is fast usurping the role of the curriculum as the mechanism of defining what school is about in this country" (63). There is "no test worth teaching to," according to Madaus, who calls measurement-driven instruction "psychometric imperialism" (84). Yet, we sometimes invest high-stakes testing with so much power, Madaus warns, that "society tends to treat test results as the major goal of schooling rather than as a useful but fallible indicator of achievement" (97).

If tests carry such high stakes, they have made everyone who might be affected anxious, sometimes so anxious that tests and their results have been grossly misused. One former classroom teacher explained, "I have direct evidence that the answer sheets of the students who scored lowest on our district tests are being removed before they are sent to the central office, where district averages are computed" (Richards 66). Not all abuses are so overtly dishonest, but many lie suspiciously close to that line. Susan Harman highlights a variety of actions--some of which school administrators rationalize for reasons that have nothing to do with testing--that in fact raise test scores:

Use old norms; exempt the children with limited English by sending them to bilingual classes, modify the testing conditions for black children by sending them to special education, leave other low-scoring children back in hopes that their age will give them an edge, teach to the tests, or simply teach the tests (that is, cheat). (50)

Given the frenzy of attention being given today to assessment, English language arts teachers must focus in a profound way on what assessment will mean for generations of future students.

Evaluating English Language Arts

By the late eighties the reading profession, which had been controlled for decades by standardized testing and by test-driven curricula, had begun to explore a range of alternative means of assessment. Frequently, however, the discussion of the prevalence and problems of standardized tests has overshadowed discussion of the specific criteria by which students' reading performance could be evaluated. Carole Edelsky and Susan Harman have compiled the evidence against standardized tests for reading.
Such tests, they argue, are based on a faulty conception of what reading is: "Test makers ignore the interconnections and interdependence among the various language sub-systems such as reader's purpose, text, genre, and the social relations among reader, teacher, and author" (158). Further, they insist that the tests are not valid since they do not measure what they say they measure. Citing a study by Altwerger and Resta, they report that "1,000 children showed no particular relationship between their actual reading and their scores on the California Tests of Basic Skills. Some children scored high but read poorly; others scored high and read well; some low scorers read well and others didn't." A test score does relate, Edelsky and Harman insist, to "how well that person does on test-like tasks in school [though] . . . there is no evidence whatsoever that the tasks on tests (like being able to identify short and long vowels) are used in real reading" (159). There are additional problems with the way reading tests are interpreted, especially by very young students:

. . . when Jesse was six, he told his mother he thought the way to take a test was to pick the answer he liked, so he read them all and then found the ones that sounded nicest. Mishi, a second grader, thought the idea was never to read the questions first because that would be cheating. Nicky, on the other hand, thought it would be cheating to look back at the passage because that would make answering the question too easy, so he covered it up. (Edelsky and Harman 160)

Both NCTE and IRA have denounced the overuse and abuse of inappropriate reading tests and called for development of improved means of assessment. IRA, for example, passed a 1988 resolution which "opposes the proliferation of school by school, district by district, state by state, and province by province comparison assessments" and withdrew from any involvement with the development of an improved NAEP reading test (Farstrup 1). NCTE in 1989 urged NAEP "not to incorporate the testing of discrete word identification skills into assessments conducted by the NAEP" and to "intensify efforts to inform educators, policymakers, and the public about the problems inherent in the testing of discrete skills" (Jan. 1990 Language Arts 93).

NAEP itself published a report (Langer et al.) on the results of its 1988 reading test which attempts to describe "who reads best" and "how well do students read." Included among the criteria used to answer such questions are data on independent reading, availability of reading materials, value of reading, time spent on reading instruction, emphasis on reading skills, emphasis on testing, and recent changes in teaching practices. Using matrix sampling, NAEP provides such information regarding reading tests as that 22 percent of fourth graders were tested in reading at least once a week and that "higher-achieving students were likely to be tested the least, while lower-achieving students were more likely to be tested at least weekly" (55).

A number of newer directions for reading assessment have recently been suggested. Constance Weaver's chapter title, "How Can We Assess Readers' Strengths and Begin to Determine Their Instructional Needs?" reflects the movement away from a deficiency model of assessment and away from testing for the purpose of sorting students.
Reading performance, according to Weaver, should be evaluated using measures that meet several criteria, among them, recognition that "no two readers . . . will ever read or understand the same selection in exactly the same way" and provision of "insight into a reader's strategies" (327). Such a whole language perspective on reading assessment has led to "greater emphasis on semantic cues" and to "greater reliance on miscue analysis and the development of checklists for process rather than for skills" (Aronson and Farr 161). Perhaps as an alternative to miscue analysis, Lesley Morrow has designed a "Story Retelling Analysis" suitable for classroom use (112). The Bay Village (Ohio) City Schools have developed a "District Holistic Assessment of Reading Scale" (Figure 6) which provides a four-point scale that can be used to evaluate "the accumulation of several reading observations." A less technical reading assessment is described from a principal's perspective: "Whenever possible, hear each child, in grades 1-4 at least, read a few pages to you. Give each child an on-the-spot encouraging written analysis" (Corbett 53).

DISTRICT HOLISTIC ASSESSMENT OF READING SCALE

This evaluation should be the accumulation of several reading observations. This evaluation should be adjusted according to grade level and child.

4  Loves to read
   Reads from many different genres
   Reads stories and information that require sophisticated background information
   Reads orally with fluency
   Reads with great concentration
   Shares information about stories spontaneously
   Identifies themes and makes links with other similar materials automatically
   Uses information from printed material in conversation
   Uses reading to get necessary information
   Has unusual insights or perspectives on what has been read
   Has an extensive vocabulary

3  Enjoys reading
   Shows competence in understanding story or book
   Has some difficulty with unusual vocabulary or sentence constructions
   Constructs image of the story that is very consistent with text
   Monitors and self-corrects in oral reading
   Makes good predictions regarding text
   Uses text to find information competently
   Has a strong vocabulary

2  Responds to reading assignments rather than a personal drive to read
   Needs help in finding selections of interest
   Needs push to read different or challenging materials
   Lacks confidence as a reader
   Constructs partial image of text often focusing on details of lesser importance
   Tends to pick easy selections or familiar stories
   Has weak vocabulary

1  Experiences great difficulty in constructing meaning
   Lacks confidence in making predictions
   Demonstrates little or no use of strategies
   Gets hung up on "sounding out"
   Shows frustration in reading task
   Has little self-motivation
   Displays weak study skills
   Has short attention span - easily distracted
   Demonstrates little connection between thinking, saying, reading, writing
   Has a meager vocabulary

Figure 6 - Bay Village Reading Scale

Although literature has frequently been replacing basal readers in elementary classrooms, discussions in professional publications have not included evaluating students' understanding of literature or the quality of literary experiences. (Perhaps those who believe that only what is especially valued is tested might interpret the lack of such articles as support for the concern that literature in elementary classrooms is being used solely as a way to teach reading in the same way that basal readers were used in the past.)

Although he was probably thinking of secondary or post-secondary students and of standardized testing, Purves reported at the 1989 NCTE convention that when literature is tested, "often test questions reduce literature to the level of textbook where knowledge is factual." Further, there are "no questions on evaluation of the work as aesthetic object with attitudes, beliefs, or interests and no questions dealing with the nature of the aesthetical transaction." Concerned that the imaginative power of literature "remains unexplored" in most assessments, he cited low-level comprehension questions, such as "Who murdered Macbeth?" and true-false items, such as "Huckleberry Finn is a good boy" and "Hamlet is mad" (to which Purves responded, "I might have gotten that one wrong"). A text is read, then, "in order to take a multiple-choice test on it," a task Purves observes is best prepared for by reading a commercial plot summary. He seems aware that what is valued is tested and seems to regret that

Only two states have a humanities assessment and thus include literature as an aspect of general cultural and intellectual history. Fewer than a quarter of the states . . . measure student knowledge of specific authors and titles, literary terminology, or general cultural information, and only two of the states report that these particular measures are used to help determine promotion or graduation. ("Today's" 5)

Perhaps Robert Probst shares Purves' concern about the assessment of literary knowledge and experiences. However, Probst recommends that evaluation "should grow logically out of our concern for students' responses to the literature and their analysis of them" (225). He offers self-evaluation questions that focus on the personal engagement of the student with the literature, that is, on the transaction between reader and text--e.g., "Did you enjoy reading the work?" and "Did the literary work offer any new insight or point of view?" (225). Further, Probst suggests checklist questions to evaluate students' reading performance and process, including such items as "Does the student distinguish between the thoughts and feelings she brings to a literary work and those that can be reasonably attributed to the text?" and "Does the student accept the responsibility for making meaning out of the literature and the discussions?
Or does she depend on others to tell her what works mean?" (226-27). Such questions call for concentrated classroom observation but seem to echo--and provide a way to respond to--some of the same assessment concerns Henry described 30 years earlier.

The assessment of writing has moved since the late eighties toward a focus on teaching students how to assess writing. The expectation is that once they recognize what constitutes good writing they will be able to produce it themselves. For example, Steven Zemelman and Harvey Daniels suggest that students help establish criteria for grading and for evaluating papers, in an effort to take the mystery out of writing assessment. Dan Kirby and Tom Liner agree and offer "checkpoint scales" which could be used for student self-evaluation and could also reduce grading time for beleaguered secondary English teachers. Similarly, Iris Tiedt's 1989 text suggests training students to use evaluation rubrics (190).

Vicki Spandel and Richard Stiggins (Creating Writers, 1990) explicitly link writing and assessment and instruction. These authors insist that assessment should not be intrusive and should not be taken out of the writing classroom (x). They explain that revising and editing, peer reviews, and even sharing writing by reading aloud are all forms of assessment, and they offer criteria for a system of classroom writing assessment:

• reflects specific, well-defined, consistently applied criteria
• provides student writers and teachers with better insights about what makes a piece of writing work
• reveals the strengths, as well as the weaknesses, in writing
• gives teachers some welcome clues about what (specifically) they can do to help students write better
• provides students (as well as teachers, parents, and others) a working vocabulary that they can use to talk about writing (14)

Based on research and teaching, Spandel and Stiggins include instructions for training students in holistic and analytic scoring and provide classroom-tested suggestions regarding how to respond to students' papers, emphasizing the power of positive comments.

Donald Graves recommends a portfolio approach and suggests that even young students could be asked to spread out their work and to make judgments about the various pieces of writing, finding and marking items--those that indicate "good" writing, that were "hard" to write, that show the writer is getting the "hang of it," that show the writer has "learned" something, etc. (NCTE Convention's "Whole Day of Whole Language," 1990). Ray Levi extends the kind of suggestions Graves provides to include parental evaluation of student portfolios. He describes an end-of-the-year questionnaire for parents as they respond to their children's first-grade writing folders: "I invited the parents, after reviewing portfolios, to share comments about their children's growth and to articulate hopes for the following year" (270).

When the State of Michigan considered portfolio assessment in the design of a statewide writing assessment, difficulties emerged that call into question even the possibility of such an approach for large-scale assessment. During a pilot study, which I was involved in, student-selected pieces of writing were included for scoring along with a timed (50-minute) writing sample and an essentially untimed sample (composed, as much as possible, in normal classroom conditions over as much as a five-day period).
While the writing done in response to provided prompts was relatively easy to evaluate, the self-selected pieces represented such a wide range of genres and such a mix of student-edited and teacher-corrected pieces that it seemed impossible, at least to the pilot study committee, to make comparisons or to determine the standards that might be used to evaluate such pieces. Some do, however, suggest the use of portfolios for large-scale assessment, as in the case of New Hampshire (Simmons), though a Michigan Department of Education employee has observed that the city of Detroit alone has more students than the entire state of New Hampshire, a comment which puts the use of large-scale portfolio assessment into a different perspective.

NAEP published Learning to Write in Our Nation's Schools (Applebee et al. 1990) to report results of its 1988 writing tests. It presents data reflecting several criteria by which writing performance and attitude can be measured, asking questions about planning, revising, and editing strategies; about liking writing; about time devoted to writing; about length of writing assignments; etc. Such criteria go well beyond the evaluation of written products and reflect continued interest in finding ways to evaluate writing processes. Zemelman and Daniels link teachers' roles with simultaneous writing and evaluation processes, suggesting a number of possible relationships worth further research:

Stage:               PREWRITING             DRAFTING                  REVISION                    PUBLICATION
Writer's focus:      Ideas                  Fluency                   Clarity                     Correctness
Kind of assessment:  Observing              Responding                Evaluating                  Grading
Teacher roles:       Listener; encourager   Encourager; coach         Coach; expert               Expert; editor
Goal of feedback:    Probing for interests  Encouraging; suggesting   Questioning; challenging;   Judging; grading;
                                            processes                 evaluating                  motivating

Figure 7 - Evaluating Writing as a Process

Some oral language specialists seem to share with literature specialists the concern that unless something is tested, it will continue to be undervalued in the classroom. John Stewig insists, for example, "Without significant, numerous, and to-some-degree standardized assessment measures, oral language will not be included systematically in the curriculum" (173). He suggests that classroom teachers monitor students' oral growth by taping and analyzing oral language, one advantage being that "tapes are both cheap and small enough to store easily" and that they allow for language samples to be gathered at "various times and in varying contexts" (173). Although such procedures are more feasible than the stenographic records suggested in earlier years, they would require considerable time and effort to implement. Perhaps, however, Stewig's point is that sufficient time and effort focused on oral language is, in his mind at least, exactly what is needed.

Integrating English Language Arts Assessment

One of the most significant recent influences on changing means of evaluation of student performance in English language arts has occurred because of the impact of the whole language movement. Whether or not one agrees with the philosophy, whole language theorists and practitioners believe that traditional tests are not consistent with a whole language philosophy and have, therefore, experimented with a variety of classroom assessment alternatives. They have in many cases been careful to explain the theoretical and philosophical implications of the practices and procedures suggested. For example, the Whole Language Evaluation Book (K. Goodman et al.
1989) devotes its preface to explaining the theoretical basis for forms of evaluation that are consistent with the principles of whole language, emphasizing a "positive view" of teaching and learning (xi).

The ethnographic research that Denny Taylor and a group of classroom teachers and administrators are conducting extends the theoretical discussions of whole language evaluation. As Taylor explains, "one cannot approach assessment in a new way without also altering what passes for teaching and learning in a school setting" (3). She and her colleagues are gathering and studying every scrap of evidence of selected students' use of language to determine their "literacy configuration," that is, the way they use print to produce "a unique pattern of . . . literacy behaviors" (8). From a practical standpoint, however, Taylor's research methods may be only marginally applicable to classroom use, since the time involved makes such case studies prohibitive for most classroom teachers (one teacher, for example, admitted "it took two hours each time I wrote about a child") (271).

As whole language practitioners design classroom procedures for implementing a whole language theory of assessment, they have emphasized self-evaluation and the use of portfolios. Both allow for integration of language arts across other content areas as well, allow for individual choices, focus on students' strengths as well as weaknesses, and count on students' ability to take charge of much of their own learning.

Many of the alternative assessments that whole language advocates suggest to replace standardized tests and objective classroom tests are items that have been suggested many times in the past--items such as teacher observation, interviews, self-evaluation, and portfolios. The professional literature of the past, however, does not seem to indicate that many of these alternative suggestions ever materialized in significant numbers in classrooms around the country--which might lead one to wonder whether the situation will be any different today, though we constantly encounter articles and conference sessions focused on these alternatives. Perhaps such alternative assessments are more likely to appear in classrooms in 1991 because the professional journal articles are more often written by actual classroom teachers who can testify to the success of the new assessment procedures and, more importantly, can describe in detail how the procedures work and what cautions are in order. Too often in the past, suggestions for alternative assessments were offered by teacher educators who could offer few, if any, classroom examples and little evidence to support their suggestions. The teacher-as-researcher movement, however, seems to have been welcomed by editors of professional journals, who frequently include articles such as "Adapting the Portfolio to Meet Student Needs" (Krest), in which a high school teacher explains her classroom portfolio system, and "Finding the Value in Evaluation: Self-Assessment in a Middle School Classroom" (Rief), written by a classroom teacher explaining in detail how and why her own practices work.

Evaluating English Language Arts Teaching Practices

Even as it has grown more difficult to separate different strands of the English language arts from each other as a result of recent holistic approaches, it has likewise become more difficult to separate the issue of evaluating teaching practices from evaluation of student performance.
Controversy rages as to whether students' performance should be a criterion by which to evaluate teaching practices and teachers' performance. In Georgia, as Samuel Meisels reports, administrators and teachers in local school districts have been told that "their performance will be evaluated based on the gains made by their students on the CAT in succeeding years," a practice which has led to teachers making changes in their programs and teaching styles (20). In Kentucky a new law provides for schools with steady improvement in student performance to receive "cash awards to be used as the majority of faculty in each school determine." The faculty and administrators at schools which fail to improve "will be subject to transfer or dismissal" (Foster 36). Marc Tucker and the National Center on Education and the Economy recommend that teachers be held accountable for student performance and that "real awards" be given school professionals who help students meet the standards and "consequences" for those who do not (Gursky 55).

Madaus points out that one reason tests continue to be so popular is that "the public and policymakers have come to mistrust teachers' judgments and want to replace them with external examinations" (114). Given the history of English language arts evaluation, it seems clear the public has frequently mistrusted teachers' judgments before. On the other hand, the report from the English Coalition Conference insists that "English teachers are the professionals most qualified to specify what is important in English studies: what are the understandings--and more important, the ways of knowing and doing--that our students should achieve" (Lloyd-Jones and Lunsford 41). Apparently, President Bush and those attending the 1989 education summit hope politically to please both camps, for they have called both for greater authority and for greater accountability for teachers and principals (Oct. 4, 1989, Education Week).

Recognizing the heuristic dimension of teaching, English language arts teachers, like their colleagues in other fields, are being encouraged to seek official evaluation which acknowledges "the process of discovery and growth" and which allows for a measure of "unpredictability and uncertainty" (Bryant 38). The title of an article by Yetta Goodman, "Evaluation of Students: Evaluation of Teachers," is printed so that the letters in the second phrase are reversed, visually depicting the fact that student assessment is a reflection of teacher evaluation. Goodman's intent in expressing this relationship, however, is not to hold English language arts teachers accountable so much as to nudge teachers to recognize their own status as learners as well as teachers:

Seeing ourselves reflected in our classrooms and in the responses of our students helps us to understand the nature of language learning and at the same time helps us become aware of our influences on that learning and on the relationships between teaching and learning. The dynamic transaction between teachers and students results in change in all the actors and actions involved in the teaching/learning experience. (3)

Graves's Build a Literate Classroom (1991) is an entire book designed so that English language arts teachers can evaluate their own teaching and their own beliefs and attitudes about learning--building on his belief that changed classrooms are the result of changed teachers.
Some of the chapter titles suggest the process of change that Graves believes English language arts teachers might set for themselves: "Make Your Own Decisions--With the Children," "Rethink Learning and the Use of Time," "Structure a Literate Classroom," and "Evaluate Your Own Classroom." It seems likely that at least some English language arts teachers who choose to follow Goodman's and Graves's advice might find themselves professionally at the mercy of a testing system that may or may not be philosophically compatible with their professional belief system. Such situations will present a significant challenge to English language arts teachers in the 1990s.

Evaluating English Language Arts Curricula

The English language arts teachers Goodman and Graves had in mind are capable professionals who can design and evaluate English language arts curricula. Whether or not they are ever given the opportunity to do so, some English language arts teachers seem eager to be involved in such decisions, and some publications are beginning to address such possibilities. One book, for example, encourages each elementary school faculty to "examine its own programs and values before embarking on a course of curriculum reform" (Goodlad xi). This report advocates involving local school faculty, who develop their own plan "designed specifically to fit [their] needs and desires" (Klein 6). The author suggests surveying parents, teachers, administrators, and students to discover what such groups believe ought to be, and what is being, emphasized in the curriculum (Klein 27). In addition to considering such groups' opinions as reform suggestions are made, teachers and administrators could use the data to determine the kind of information such groups need to know to better understand the English language arts curriculum and program.

Robert Donmoyer's article, "Curriculum Evaluation and Negotiation of Meaning," goes further by describing a "deliberative approach" which involves actually gathering together a group of "teachers, administrators, parents, community members and where appropriate students," and asking them to discuss and debate such questions as (1) what issues should be focused on in the evaluation, (2) what sorts of data ought to be collected and what methods should be employed to do the collection, and (3) what recommendations should be made to improve the program (275). Donmoyer points out that under such a plan the emphasis is on "fostering communication and resolving disagreements among participants who ideally will start the evaluation process with different views of education in general and the program being evaluated in particular" (275). He insists, however, that questions of meaning should always be explicitly addressed--e.g., how is reading defined?--and resolved through discussion. Such a plan calls for considerable "group process skills" (277) on the part of the "evaluation leader" but can yield both recommendations for change and closer consensus as to the meaning attached to curriculum and teaching practices (278). Donmoyer would admit that program evaluation involving groups of persons with varying perspectives and agendas is, at best, an intricate undertaking. The benefit of initially involving groups in decisions about what should be evaluated, for example, may be a greater sense of ownership in the evaluation efforts on the part of the school community as a whole (278).
Any discussion of curriculum design and evaluation focuses on what it is that students need to learn and know and do. English language arts professional publications frequently focus on these more theoretical and classroom issues. Seldom, however, have English language arts publications given recent attention to important political issues, such as standards, that are being debated elsewhere. In broader education journals, for example, the high school diploma is being described as an indication merely of "credit accrual and seat time" (Wiggins 42). Norm-referenced tests are being described as offering only a "floating standard, which in a sense, makes it no standard at all" (O'Neil 6). The idiosyncratic nature of teacher grading is again being challenged (Canady and Hotchkiss).

As might be expected, tied to discussions of national standards are discussions of national tests. Parents and the general public seem to favor the idea (a 1989 Gallup/Phi Delta Kappa poll, for example, reported that 73 percent support a common national exam for graduation) (O'Neil 7). One member of the President's Education Policy Advisory Committee has called a national exam "a foregone conclusion now" ("Specter" 6). An almost $2.5 million grant has been awarded a group working in Rochester, New York, and Pittsburgh to produce a "broad national examination system," "national educational goals," a "national syllabus," and "new ways of measuring students' mastery of knowledge and skills" (Gursky 52). Where such plans might leave English language arts leaders and classroom teachers is unclear. As happened during the behavioral objectives era, English educators may, it seems, find themselves swept along with little, if any, chance to protest.

The stakes, however, are very high. Enormous amounts of money are involved--in grants such as that just described and in awards to ETS for NAEP--money that is thereby not available for other educational purposes. A relatively small group of persons seems to be making decisions that will have significant impact on teaching and learning at all levels. Whether or not the leaders of English language arts professional organizations believe they should take particular political stands on these broader issues, past experiences and issues have demonstrated that English language arts leaders and classroom teachers need at the very least the chance--via journal articles and conference sessions--to hear the issues debated.

CHAPTER EIGHT
REPORTS FROM ENGLISH LANGUAGE ARTS PROFESSIONALS

In many of the English language arts program evaluations conducted in the past and cited in this study, English educators have recognized the value of gathering questionnaire data. In fact, questionnaires have been a common feature of such studies because they allow for a variety of perspectives from which to view other data. Researchers such as Pooley and Williams in their Wisconsin study and J. N. Hook in his report on award-winning high schools have recognized the need for the reality check that questionnaires make possible.

My own questionnaire was designed to elicit information from English language arts professionals who had (1) special interest and/or experience in assessment of English language arts curricula, teaching practices, or student performance and (2) frequent, if not daily, contact with actual English language arts teachers and classrooms.
Questionnaire respondents do, of course, sometimes respond to an anonymous questionnaire as if what they wished for were true, and Slavin cautions that "questionnaire-scales attempting to measure hard-to-quantify variables" often have low reliability (78). Still, questionnaire respondents can provide a wealth of information--both fact and opinion--in their responses to both prompted and open questions and in marginal and supplementary comments.

My questions were designed primarily to yield information about criteria actually being used to evaluate current English language arts programs and also about the contexts in which that evaluation takes place. Two final optional items asked for brief descriptions of district English language arts program evaluation processes and for suggestions that might help decision-makers improve programs. The questionnaire, designed in consultation with my dissertation adviser and with a data collection and analysis consultant, included a six-point Likert scale for several of the questions. For all questions, respondents were asked to check as many items as applied and to feel free to add comments, thus allowing for some of the elaboration that is possible when gathering interview data.

Before mailing the questionnaires, I pretested them with a group of graduate students who were also English language arts teachers in a variety of school settings. Their responses led me to make minor refinements in phrasing and in the items included. For example, they pointed out that in the "Teaching Practices" section, question #1, I had asked about the significance of "factors" which were actually "persons." Their questions about how to respond to the "Student Assessment" section, question #1, led me to revise the question by removing "for English/Language Arts," which seemed to be confusing and unnecessarily restrictive. Their responses to the first open-ended question led me to rephrase it and clarify that when I asked about the process of evaluating an English language arts "program," I hoped they would consider curriculum, teaching practices, and student performance.

Beyond these safeguards, I felt reasonably hopeful that the sample I chose to work with would provide honest and substantive information because they were all English language arts professionals who served on NCTE committees or served as contact persons in award-winning school districts. Clearly such a group was not a sample representative of all English language arts teachers and educators across the country. However, I specifically valued the responses of this group in part because they were knowledgeable about circumstances beyond their own personal situations. (By specifically asking respondents to consider their school district or the districts they knew best, I sought responses not limited to single classroom experiences, though such responses necessarily increased the reporting of hearsay evidence.) Even offering specific prompts for respondents to consider does unavoidably bias responses to some degree. The compensation for that bias, I believe, exists in the indication from respondents that the questions provoked new reflections and insights about their own and others' circumstances. For example, one respondent observed that "Filling out your survey helped me realize how much control rests with the classroom teacher." Another respondent commented on the questionnaire items used, saying "they will open doors for honest comments."
The questionnaires were sent to members of the NCTE groups indicated in Table 1.

Table 1 - Sample for Questionnaire

Centers of Excellence Winners - drawn from
  lists of 1985, 1987, 1989 ......................... 62
Standing Committee on Testing and Evaluation ........ 12
Committee on Curriculum ............................. 13
Elementary Practices/Programs Committee ............. 23
National Certification & Assessment Committee ....... 14
Conference on English Education Committee -
  Supervision & Curriculum Development in
  English Language Arts Consultants ................. 143
*Classroom Practices in Teaching English ............ 5
*Centers of Excellence Committee .................... 5
*Evaluation Curriculum Guides Committee ............. 20
                                            Total     297

*Chosen from the 1989 rather than the 1990 NCTE Directory, since these committees were being reconstituted in 1990.

Of the total group, 209 seemed (from their institutional affiliation listed or from their mailing address) to be employed within a school district; 88 seemed (for the same reasons) to be employed by universities or state departments of education. Questionnaires were sent to persons in 39 states, with the heaviest concentrations predictably in California, New York, and Texas. (Questionnaires were not sent to those with no institutional affiliation or address given, to ex officio members, or to NCTE staff liaison members of committees.)

Each questionnaire was accompanied by a cover letter (Appendix B) and a stamped, self-addressed envelope, but no follow-up mailings were sent. (The Michigan State University Committee on Research Involving Human Subjects granted an exemption to the University Policy on Research with Human Subjects for this study.)

The total number of responses received--102 from school districts (49 percent) and 39 from universities/state departments of education (44 percent), for a total of 141 (47 percent)--was not disappointing, given the length of the questionnaire (3 pages, single-spaced, asking for responses to 77 items to be checked plus 2 optional essay items) and the May mailing date, normally an especially busy time for classroom teachers. Six questionnaires were returned by the post office, and two were returned blank with the explanation that the respondents' current work did not involve enough direct contact with school districts to enable them to respond knowledgeably. One respondent did not fill in the checked responses but indicated that he felt uncomfortable with the questions asked, since so much variety can exist from situation to situation. There were, then, 132 usable responses.

The responses tended to be rich with information. Only 11 percent of those who responded provided only checkmarks, while 77 percent wrote out responses to the optional essay questions. Thirty-five percent added marginal comments (several wrote lengthy notes or even separate letters as well), and 46 percent included their name and address so that results could be mailed to them. Several added personal notes (though I knew almost none of them personally) inviting me to visit their districts and wishing me luck with the study.
Instead, the value of the responses lies primarily in the trends they suggest.

English Language Arts Curricula

Recognizing the important link between curriculum design and curriculum evaluation, my first questions focused on the influences on English language arts curricula.

Table 2 - Factors That Shape Curriculum

1. In your school district (or the school districts you know best) how significant are the following factors in shaping the English Language Arts curriculum?

                                Significance = Very   Moderately   Not
a. accreditation bodies (n=91)                 34.0%     34.1%    31.9%
b. college expectations (n=126)                49.9      36.5     13.5
c. administrator/board interests (n=130)       44.5      39.1     16.1
d. professional literature (n=129)             44.9      40.2     14.7
e. community influences (n=130)                30.7      50.5     19.1
f. faculty skills/interests (n=132)            46.9      44.6      8.2
g. school schedule (n=128)                     28.0      42.0     29.6
h. student needs/interests (n=132)             48.4      39.0     12.0
i. test results (n=123)                        44.6      44.6     10.5
j. other - tradition, state curricula/tests, supervisors

Marginal comments about factors that shape English language arts curricula sometimes explain district-wide curriculum situations, e.g., "In this large school district there are great variations by sub-district and by school. Although there is a district-wide 'standardized curriculum' in all subject matter areas, K-12, it is mainly a listing of topics, and not descriptive of subject matters." Others offer a particular definition of curriculum, e.g., "I will answer all these questions from the vantage point of curriculum as subject matter selected and organized on the basis of a curriculum design."

Although several influences outside the classroom are perceived as significant, "college expectations" is most often ranked as very significant in the shaping of curriculum, slightly higher than even "student needs and interests." K-12 English language arts professionals still, apparently, see themselves like their early twentieth-century counterparts--strongly influenced by post-secondary forces beyond their control. Other highly significant factors are "administrator and board interests," "professional literature" (no doubt more important to this group of professional leaders than to a general population of educators), "faculty skills and interests," "student needs and interests," and "test results." (The low number of responses for "accreditation bodies" can perhaps be explained by the placement of that item on the questionnaire, which may have led some respondents not to notice that it was actually the first item. Otherwise, almost all respondents checked each item.)

Fewer than half rank "test results" as very significant, which seems a little surprising in light of all the rhetoric in professional publications about testing. However, almost 90 percent rank test results as at least moderately significant. Overall, when "very" and "moderate" figures are combined, the three most significant factors in shaping English language arts curricula are teachers, students, and tests. These results indicate what most English educators perceive as appropriate central roles for teachers and learners in the classroom. The fact that test results rank high is consistent with opinions expressed in professional publications that tests already exert considerable control over curriculum.

Table 3 - Curriculum Evaluation

2. In your school district (or the districts you know best) how significant are the following groups in evaluating the English/Language Arts curriculum?

                                Significance = Very   Moderately   Not
a. teaching faculty (n=132)                    67.4%     19.7%    12.8%
b. curriculum coordinator/principal (n=131)    59.5      29.0     11.4
c. external consultant (n=127)                  8.6      33.9     57.5
d. students (n=129)                            15.5      39.6     45.0
e. accreditation bodies (n=128)                28.9      43.8     27.3
f. community/board (n=129)                     30.3      41.8     28.0
g. other - state, parents

English language arts teachers clearly are perceived as most influential in evaluating English language arts curricula, a reassuring result for most English educators, though the fact that more than 1 in 10 rank faculty as not significant in evaluating curriculum is disturbing. Curriculum coordinators and principals are also ranked as very significant. These persons are all part of school-based or district-based staff; least significant are external consultants. According to respondents, students play a noticeably insignificant role in evaluating English language arts curricula, despite current professional discussions about learning communities and negotiated curricula. Thus, the reality of English language arts classrooms appears to be a fairly traditional one, with classroom teachers as the primary evaluators of the curriculum.

The following three questions about curriculum guides may have implied that guides are more important as a curriculum factor than others believe them to be. Although my historical study revealed that relatively little is said about curriculum guides in English language arts publications, my own experience has led me to believe that curriculum guides are, officially at least, considered important.

Table 4 - Designing and Revising Curriculum Guides

3. In your school district (or the districts you know best) how significant are the following persons in designing and revising curriculum guides?

                                Significance = Very   Moderately   Not
a. teaching faculty (n=132)                    81.0%     13.6%     5.3%
b. curriculum coordinator (n=126)              67.5      28.6      4.0
c. principal/administrator (n=130)             21.6      43.9     34.6
d. other - state, external consultant, students, curriculum committee, parents

Question 3 responses make clear that curriculum guides are considered a matter primarily for classroom teachers and curriculum coordinators to produce. One respondent explained that usually the curriculum coordinator chaired the curriculum guide committee, serving as a facilitator "but does not impose change." The range of other persons added by respondents as significant in designing and revising curriculum guides suggests that guide writing can be a process that takes into consideration a variety of perspectives.

Table 5 - Use of Curriculum Guides

4. What use is made of curriculum guides in your school district (or districts you know best)?

a. teachers are expected to follow them explicitly ... 26.3%
b. they are intended to be followed loosely .......... 52.6
c. they are often ignored ............................ 29.3
d. other - "non-existent," intended to be followed but modified/added to, designed to "model" plans and choices

The inclusion of the "often ignored" item seemed to have the effect of a signal to respondents that I did not necessarily expect them to respond in party-line terms and that I was aware that official policy and actual experience do not always match. Responses about the use made of curriculum guides suggest that a fair amount of curriculum guide writing may be a matter of going through the motions, since almost 1 in 3 respondents perceive curriculum guides to be "often ignored."
Many respondents seemed eager to express a positive or negative judgment about curriculum guides, for their comments often seem either skeptical or defensive. For example, one school-district respondent said, "District guides are often ignored. Our own program is adhered to." A university or state department respondent said, "One has no way of knowing . . . teachers too often say one thing and do another." On the other hand, several English language arts professionals within school districts defend the use of their curriculum guide. One called it a "working document." Another said it is "highly useful . . . because of the model integrated plans, the flexibility, and choices." Another especially enthusiastic respondent explained that the "guide is designed to encourage teachers and kids to discover together how to make learning happen."

Table 6 - Value of Curriculum Guide

5. What value does an English/Language Arts curriculum guide have?

a. its development/revision provides occasion for faculty to discuss pedagogical issues and seek consensus ......... 78.9%
b. provides information for new teachers in the district ... 8.8
c. provides an official source for reference by administrators/teachers in discussion with parents/board members ..... 73.7
d. other - assures equity and prevents overlap across district, identifies skills for mandated tests, articulates K-12 program, improves instruction, used for evaluation

The range of "other" responses suggests that the value of curriculum guides may be a more complex issue than my prompts indicated. Some respondents see the guide as providing a "scaffold" for the curriculum which, another respondent reports, "prevents overlapping" and, according to another respondent, prevents "undue repetition"--all responses which seem to imply a traditional curriculum structure. Some see curriculum guides as linked to assessment: "They are a check and balance for teachers - an assessment resource to see if their yearly curriculum addresses district expectations." Another respondent explains that the guide "identifies skills . . . on the graduation test." Again, the actual value of curriculum guides is perceived as different, in some cases, from what is officially stated: "Although #1 is quoted, in actual practice final decisions are made by curriculum coordinator and administrator, who may know nothing about field." Another respondent said that they "make it appear state framework is being followed," and another agreed that "they're for show." Apparently, situations differ to a great extent, and perceptions of those situations differ as well.

English Language Arts Teaching Practices

It is interesting to notice the shifts in influence that various persons have when issues regarding English language arts teaching practices are considered.

Table 7 - Persons Who Determine Teaching Practices

1. In your school district (or the districts you know best) how significant are the following persons in determining teaching practices?

                                 Significance =  Very   Moderately   Not
a. classroom teacher (n=127)                     84.9%    12.5%      2.3%
b. curriculum coordinator (n=120)                35.0     47.5      17.4
c. principal (n=123)                             29.2     50.3      20.2
d. other - department head, superintendent, state, community/board, students

Again, classroom teachers are perceived as most significant in determining teaching practices, with fewer than 3 percent of respondents reporting teachers as not significant in determining teaching practices.
These figures may be somewhat surprising in light of general impressions sometimes expressed that external forces are wresting control from teachers. English language arts professionals seem to think that tests exert more control over the content of the curriculum than over teaching practices. Some respondents do, of course, point out that their districts do not have a curriculum coordinator. Principals seem not to be considered instructional leaders among these respondents.

Table 8 - Evaluation of Teaching Practices

2. In your school district (or the districts you know best) how significant are the following factors in evaluating teaching practices?

                                 Significance =  Very   Moderately   Not
a. principal (n=129)                             77.6%    21.7%      0.8%
b. curriculum coordinator (n=122)                26.3     40.2      33.6
c. peer teachers (n=124)                         20.9     34.7      44.3
d. parents/board members (n=122)                 10.7     35.3      54.1
e. students (n=125)                              11.2     32.8      56.0
f. teacher self-evaluation (n=121)               34.7     41.4      15.7
g. other - department chair, test results

If teachers are perceived as most significant in determining teaching practices, clearly principals are perceived as most significant in evaluating teaching practices. One respondent commented that the principal as evaluator "is governed by the contract," a situation that may be true almost universally. Again, students are cited as especially not significant, as are parents and board members, again implying a traditional school and classroom structure.

Table 9 - Influences on Changing Teaching Practices

3. In your school district (or the districts you know best) how significant are the following influences in changing teaching practices?

                                 Significance =  Very   Moderately   Not
a. professional literature (n=131)               35.0%    46.5%     18.3%
b. inservice/staff development
   training (n=103)                              38.7     56.3       4.8
c. exchange of ideas among teachers
   in building (n=131)                           59.5     33.5       6.7
d. exchange of ideas among teachers
   in district (n=127)                           35.3     44.0      20.4
e. exchange of ideas within wider
   group (n=127)                                 44.0     35.3      20.4
f. administrators/board (n=129)                  17.0     44.1      38.7
g. constraints of facilities/
   school schedules (n=127)                      24.3     49.5      25.9
h. student needs/interests (n=129)               42.5     40.2      17.0
i. test results (n=129)                          35.6     47.9      16.2
j. community (n=107)                             10.2     44.7      44.7
k. other - writing project, state, department chair/curriculum coordinator

When teaching practices are changed, respondents indicate that teachers lead the way, especially when they exchange ideas and information among colleagues in their own school building. Somewhat surprising is the strong showing of "exchange of ideas within a wider group (e.g., writing project support group)." Both these responses are consistent with the optional essay responses, which underscore the need teachers feel for time to grow professionally. They also suggest a sense of confidence in their own ability to be decision-makers. Interestingly, whereas principals are perceived as especially significant in evaluating teaching practices, administrators are cited as almost the least significant influence on changing teaching practices. Although principals have the official authority, English language arts professionals seem not to depend on administrators' advice as much as they do on other resources. Only 1 in 10, moreover, perceive community influences to be very significant.
If "very" and "moderate" responses are added, "professional literature" and "inservice/staff development training" are cited as significant by over 80 percent of respondents (again, perhaps reflecting the professionalism of this sample of respondents). One respondent, for example, has observed that professional literature is significant now "much more so than ten years ago." Classroom influences are also cited as at least moderately significant by 80 percent or more of respondents, for English language arts professionals perceive both "student needs and interests" and "test results" as significant in changing teaching practices.

English Language Arts Student Assessment

In considering criteria for assessing English language arts student performance, I focused first on district-wide measures and then on classroom measures.

Table 10 - School District Means of Assessment

1. In your school district (or the districts you know best) which of the following tests or means of assessment are used?

a. SAT, ACT, CAT, other nationally-normed tests .......... 90.2%
b. state-mandated tests .................................. 84.2
      objective .......................................... 65.4
      writing sample ..................................... 66.9
c. holistic scoring of writing samples ................... 78.2
d. student portfolios .................................... 44.4
e. other - district tests, criterion-referenced tests, end of level/book/basal tests

Clearly, percentages are high for all tests and measurements mentioned. Nationally-normed tests and state-mandated tests are common throughout the country's school districts. Beyond these tests, however, holistically scored writing samples seem also to have found a significant place in district English language arts assessments.

The marginal comments make it clear that respondents consider portfolios the current assessment "cutting edge" and are eager to try them (several indicate portfolios are currently being phased in). It seems remarkable that 54 percent of those employed by school districts report using portfolios in their districts, whereas only 19 percent of those from universities or state departments of education seem aware of portfolio use in school districts they know best. Although portfolio assessment can mean different things to different people, and although these responses might reflect respondents' eagerness to appear up to date, it seems especially important to me--and perhaps enlightening to university and state respondents as well--that so many districts seem to be using some kind of portfolios. I say this because, as the historical review indicates, alternative assessment forms have often been recommended without always being put into general practice.

Table 11 - Classroom Means of Assessment

2. In the classrooms of your district (or the districts you know best), what means of assessment are currently being used for English/Language Arts?

a. objective tests ....................................... 91.7%
b. essay tests ........................................... 88.0
c. observation of students ............................... 74.4
d. interaction with students ............................. 58.6
e. student compositions .................................. 92.5
f. student performances .................................. 67.7
g. oral reading .......................................... 43.6
h. student portfolios .................................... 63.2
i. contractual grading ................................... 36.1
j. student self-evaluation ............................... 40.6
k. other - group evaluations/collaborative projects, peer evaluations

More traditional forms of classroom assessment, such as objective tests and student compositions, are apparently used in 9 out of 10 classrooms, while essay tests follow close behind. Observations of students make a strong showing as well, though observation can be interpreted broadly to mean a variety of things, from carefully planned formal and informal classroom observations to vague "participation" grades given on the basis of random impressions. "Student performances" are included by 2 out of 3 respondents, with one even commenting that "Our policy on examinations requires 'a final culminating experience.'" Portfolios are cited by almost 2 out of 3 respondents, again an indication that this form of assessment is already a part of classroom practice. "Student self-evaluation," however, receives a relatively low rank, even though so much is being written in professional publications about the importance of self-evaluation and even though it seems a relatively easy assessment form to implement in the classroom.

A variety of factors, some of which have appeared in earlier prompts, influence decisions about how English language arts student performance might be assessed.

Table 12 - Factors Determining Means of Assessment

3. In your school district (or the districts you know best) how significant are the following factors in determining means of assessment of English/Language Arts?

                                 Significance =  Very   Moderately   Not
a. professional literature (n=130)               34.5%    43.9%     21.5%
b. inservice/staff development (n=128)           45.3     41.4      13.3
c. exchange of ideas among teachers
   in building (n=123)                           47.2     43.1       9.7
d. exchange of ideas among teachers
   in district (n=121)                           30.6     47.1      22.3
e. exchange of ideas within wider
   group (n=121)                                 24.0     44.7      31.7
f. administrators/board                          30.9     48.4      20.8
g. time constraints                              32.7     47.1      20.2
h. other - state, lack of money and time, district grading policy

Again, exchange of ideas among teachers in the building is most significant (90 percent of respondents rank it as at least moderately significant), though, inexplicably, exchange of ideas within a wider group is least influential in determining the means of English language arts assessment. Apparently, while Writing Project and similar support groups have influenced curriculum decisions, they have had less impact on assessment. "Inservice and staff development" is perceived as significant (86 percent saw this item as at least moderately significant), as are "administrators/board" and "time constraints."

Several respondents acknowledge and affirm "time constraints" as an important factor--perhaps one they had not considered before. One respondent, for example, observes that "we put tremendous burdens on ourselves by ignoring [time constraints]," and another commented that "handling the paperload continues to be a problem. Despite staff development on alternatives to 'red-penciling,' e.g., conferencing, teachers/parents/administrators continue to legitimize only edited versions of writing completed by the teacher." Though money issues are not included as a prompt, one respondent points out that budget constraints are another important factor.

Optional Final Essay Items

The optional essay items allowed respondents to some degree to reconsider items and issues raised in the previous questions.
1. Briefly describe, to the extent you are aware of it, the process in your district (or in the districts you know best) for evaluating the English/Language Arts program--curricula, teaching practices, and student performance.

In describing program evaluation processes, respondents were on their own without prompted items to respond to, though they had just had a variety of items suggested to them as they responded to the first part of the questionnaire. From the items mentioned by respondents, the following results have been compiled.

Table 13 - Factors in Program Assessment Process (N = 97)

1. Student test score results ............................ 49.5%
2. State involvement (direct or indirect) ................ 23.7
3. School district committee (teachers, administrators,
   and sometimes parents in multiple-year process) ....... 19.6
4. No process ............................................  9.3
5. Surveys (parents, students, teachers) .................  7.2
6. Observation of teachers ...............................  7.2
7. Alternative assessment information re: students
   (self-evaluations, portfolios) ........................  6.2
8. Professional publications/conferences
   (as indicator of current practice) ....................  4.1
9. External consultants ..................................  4.1

By far, the factor most often mentioned regarding program evaluation is the analysis and influence of standardized test data. Respondents seem to agree with recent professional publications in this regard. One of every two respondents explicitly mentions test results, and undoubtedly others would have done so if they had described their process in detail rather than with a brief note, such as "six-year evaluation process." Some respondents who mention student test scores as a factor use what may or may not be neutral language ("Student performance on standardized tests, i.e., CAT, DRP, is the greatest determinant of how effective teaching practices are evaluated"). Others more openly express their unhappiness with what is perceived as over-use and abuse of tests ("Test scores are judged not to reflect what is being taught and are disregarded on one hand while used as evidence/proof/reason why change cannot occur on the other!").

One of the most striking features of the responses received for this item is the fact that almost 1 in 5 responses mention only the part student test scores play. Because the item follows the student assessment part of the questionnaire, it is possible that some respondents misread this item as related only to student assessment rather than to program assessment. However, most of these respondents make it clear that, while they understood the question was about overall program evaluation, test scores are essentially the only measure by which English language arts program evaluation occurs. For example, one respondent explains, "All programs in this district are judged by test results only . . . the board and superintendent have an obsession with test scores; all teaching behaviors, curriculum, schedules, etc., are controlled by [tests]. Teachers essentially teach to tests most of the year." Perhaps one of the more insightful responses regarding the use of test scores is the following: "Most evaluation is in the hands of the teacher and will probably stay there as long as test scores remain high."

Almost 1 in 4 respondents mention the influence of state guidelines or state-mandated curricula and/or assessments.
Since some of the respondents are in fact state employees, it is not surprising that comments about state involvement are often positive ("Districts are presently trying to meet new state guidelines in building curriculum that includes student needs, reflect the state-of-the-art practice, and community needs.").

Committees seem to provide the district-level vehicle by which English language arts program evaluation most often occurs, with committee members drawn sometimes strictly from faculty, sometimes from faculty and administrators, and sometimes from parents, board members, and community as well. Surveys, interviews, and questionnaires are sometimes used with all of these groups and sometimes also with students. A few respondents express dissatisfaction with committee procedures, however, such as the respondent who described a cycle of curriculum adjustments that seemed entirely driven by the curriculum coordinator, who apparently promotes a particular program or procedure until "two years later that dies and he begins some process for another idea." Another respondent explains that, "Our committee teachers end up being 'yes men' or get disgusted and give up."

Any mention of evaluation of teaching practices seems almost entirely limited to observations by administrators. Still, if "superintendents' contracts may depend on the scores," the result is said to be "a great deal of top-down influence on the English language arts curriculum." Other responses regarding teaching practices range from "Evaluation of teaching practices receives very little attention. Teachers may teach the way they think is best" to "We are told what must be done and taught. We are also told to enforce, almost to the page, the guides given to us to use."

Almost 10 percent of respondents report that they have no program evaluation process. One called the process "haphazard" with "no formal evaluation procedures." Another reported that there is "no process--the district administration reviews exam results and tells me to get better results." One respondent added that, "Curriculum is constantly being written--NEVER implemented or evaluated."

Once respondents had vented some of their assessment frustrations in describing "what is," most were ready to think more positively as they considered "what could be."

2. If this process of evaluating the English Language Arts program could be altered, what changes might help decision-makers be in a better position to suggest improvements?

Again, respondents needed to generate their own answers rather than respond to items provided. Although one response in regard to suggestions for change is simply a sarcastic, "Only if God were there to give them the 'right answers,'" most respondents seemed to offer serious suggestions. As they did so, some may have drawn on the prompts suggested by the items listed in the earlier sections of the questionnaire. From the items mentioned by respondents, the following results have been compiled:

Table 14 - Suggested Improvements for Program Evaluation (N = 78)

1. Greater knowledge re: assessment, research ............ 16.5%
2. Time to reflect, read, share ideas, foster
   collegiality .......................................... 15.5
3. Teachers sharing in decision-making ................... 11.3
4. Staff development/inservice ........................... 10.3
5. Use of portfolios .....................................  9.3
6. Money .................................................  6.2
7. Using observation, interviews, ongoing assessment .....  5.2
8. Fewer tests/better assessments ........................  4.1

The responses mentioned in this item generally have more to do with the context of, or conditions for, program assessment than with evaluation processes per se. Respondents' first concerns are not about new procedures but about professional responsibility and opportunity. One respondent, for example, sees the need for "research information in brief form to give to administrators, parents so they understand and support changes and improvements that will reflect current, research-based practices." Another respondent expresses the desire for "real hard evidence that shows the lack of accuracy of standardized tests and their destructive influence," while another seeks "better understanding of assessment as a means to improve instruction vs. testing for accountability."

Respondents seem to recognize that program evaluation takes time, but interestingly, respondents report a need for time to share ideas with colleagues, to read and reflect and become knowledgeable, or, as one respondent put it, "time to share and discuss and read about what works and be allowed and supported to make change." If English language arts teachers had the knowledge and time, some respondents reason, they should share more significantly in the decision-making done in regard to English language arts. Some respondents, however, do express satisfaction with their own present systems and boast, "Our school faculty has been given great autonomy in developing our own philosophy with regard to all the curriculum areas." What they seek for themselves, they seek for others as well, listing staff development and inservice as a high priority--although some of the responses express rather smug and superior attitudes toward their colleagues, complaining that they "do not read professional literature" and that "if we didn't change books every seven years, nothing would change."

Although classroom teachers are infrequently involved directly with specific money issues, they live with the consequences of limited or plentiful funds every day. A few respondents seemed to realize an important connection between money and programs and therefore mention money as an item important to improved program assessment. When they mention money, however, it is not money to pay for testing that they want but rather money to develop the professionalism to be capable curriculum evaluators themselves ("If teachers are to be truly involved in curriculum maintenance district wide, they need time and pay to do it").

In regard to changes that might affect classroom assessment, respondents mention portfolios most often, though some caution that they need to know how to use portfolios effectively--as "more than just 'holding bins.'" Another respondent reports that a portfolio system would be desirable but "NOT where portfolios are holistically scored but where teachers sit together and discuss student work periodically--first with students, then with other teachers and administrators." In addition to mentioning portfolios, respondents also occasionally mention a variety of other informal classroom assessments (e.g., observation, interviews, ongoing assessment). A number of other suggestions are offered, such as the need for an English language specialist rather than generalists in the district office, since sometimes "everyone, regardless of background, sees themselves as knowledgeable about English."
Others mention the need for better coordination and communication among K-12 language arts teachers.

Although relatively few mention tests as a part of their suggestions for improved program evaluation, those who do have strong feelings about the issue. One respondent reports positively, "We need to stop fearing evaluation and begin to look at it as an informative process. Then we need to engage in evaluation frequently--formative, diagnostic, and summative." More often, however, respondents continue to rail against the power of tests, with one conceding, "We hardly rely on teachers' statements about students at all," and another explaining, "The results of state mandated testing are having PROFOUND impact upon the curriculum now--much more so than any previous source of information. We are feeling intense pressure to modify curriculum to address weaknesses of our students' testing. The goal now is to improve objective test scores."

While one respondent concludes that, "I feel, and fear, that each school district will have to devise their own means of assessment to better accommodate their needs," another mentioned the need for greater involvement in professional organizations, such as NCTE, and expresses the desire for specific help from such organizations (since the questionnaires were sent specifically to NCTE members, such responses are not surprising). Another respondent expresses the wish that NCTE would provide "more exposure . . . to programs of English across the state and nation" by including program descriptions in journals, so that "interested readers could write directly to the schools for detailed information." On the other hand, another respondent mentions the helpfulness of the NCTE Recommended Curriculum Guides, while another exclaims, "Thank God for support of professional organizations like NCTE and a few enlightened leaders and teachers."

Although attitudes are hard to judge in some cases, it is my impression from studying the questionnaire responses that the respondents might be fairly evenly divided between those who seem relatively content with English language arts program development and evaluation as they know and experience it and those who seem primarily frustrated, impatient, or disappointed. One respondent said, "We're pretty well-off, frankly," and another said, "No need to change." Most, however, had specific suggestions and opinions about what might improve evaluation processes. If there is a recurring theme among their suggestions, it is the belief (a belief reflected in current professional publications as well) that English language arts professionals can and should see themselves as change-agents in district English language arts programs. If curricula and student assessment are to change, these respondents believe, English language arts teachers themselves hold the key to such changes.

CHAPTER NINE

CONCLUSIONS, SPECULATIONS AND RECOMMENDATIONS

A review of the history of the criteria by which English language arts programs have been assessed, along with consideration of the contexts in which such assessment has occurred, can provide valuable insights to help address the issues and needs of today. The historical review combined with an analysis of data regarding current thinking about these issues can lead further to conclusions about the past and present, speculations about their significance, and recommendations of how best to address the issues and needs of the future.
Both the context and the criteria questions raised in chapter one are complex indeed. There is not one single purpose for evaluation and assessment, nor one single person or group who should evaluate and set standards. There are a variety of groups that are served by assessment and that might ultimately stand to lose. There are countless criteria that might serve to help evaluate English language arts programs. Given what has been learned about the past and the present, the following conclusions, speculations, and recommendations have emerged. Because my study has raised as many questions as it has answered, I will not attempt to offer resolutions in every case but will hope that in some cases new questions might offer as much or more illumination of the issues than would efforts to force closure.

Evaluation of English Language Arts

1. History and present circumstances reveal that there is both an internal function for English language arts assessment and an external function as well. By internal, I mean all evaluation that goes on within the classroom community between students and the classroom teacher. By external, I mean all assessment that is initiated by someone other than students and their classroom teacher. This study has shown that there is now and always has been external evaluation as well as internal classroom evaluation. From the colonial days when the selectmen and ministers visited the schools to hear students recite, there have been external evaluations of literacy learning in this country. In a perfect world there would be only classroom evaluation, or perhaps only self-evaluation. Teachers would understand exactly what each learner needed and provide it. In an imperfect world, however, both internal and external assessment are realities that cannot be denied.

2. When standardized and objective tests first appeared, they were hailed as a correction to teachers' subjective impressions (e.g., p. 26 and p. 32). Today there is renewed criticism of the subjective and idiosyncratic nature of teachers' grades (e.g., Canady and Hotchkiss), which could be cited by those advocating national tests and standards. In spite of the recent loss of faith in standardized tests expressed by many, there seems to be little thought (by anyone except perhaps classroom teachers) of going back to depending solely on teachers' grades.

3. The primary purpose for English language arts evaluation should be to improve students' literacy learning. There are many other worthwhile purposes, but students' needs should be the first priority. Some of the tests discussed earlier in this study have, in fact, clearly resulted in harm to students and their learning. Therefore, it is important that English language arts assessments have clear purposes that, as much as humanly possible, will not do harm.

4. English language arts professionals believe that testing drives curriculum. Historical and current data reveal that testing, to a greater or lesser degree, has almost always driven curriculum (e.g., the effects of early twentieth-century college entrance exams on secondary curricula, p. 20). This may mean today simply that a classroom teacher adjusts a lesson after grading a test, or it may mean that a school district promotes or dismisses teachers on the basis of the district's student test scores. Although we can regret this situation, it seems that we are now at a point where any new test--or means of assessment--must be worth teaching toward.
5. Historically it has been true that some parts of English language arts seem more difficult to assess than others. Perhaps the most difficult to evaluate has been students' understanding and experience of literature (pp. 134-36)--which sometimes yields few observable behaviors. Or perhaps the most difficult has been the spontaneous language arts--speaking and listening (p. 52). At any rate, as this study has shown, each of the language arts has its own special evaluation characteristics and difficulties. Whole language teachers have sought ways to observe students' language in the context of classroom activities. However, holistic efforts to integrate learning and assessment should not ignore the distinctive evaluative characteristics of the individual language arts.

6. Practicality has always been a consideration when decisions have been made about English language arts assessment. However, practicality should not be the primary consideration. When standardized tests were introduced, English language arts teachers as a rule embraced them (e.g., p. 45). If some teachers questioned their validity, those teachers also knew that the alternative was to go back to reading stacks and stacks of student essays. Under the circumstances, the practicality argument must have seemed very seductive. More recently, writing skills have been easily and economically "tested" with multiple-choice questions, but such tests do not, of course, actually test writing. Today scoring sessions for writing assessment are significantly more expensive than the old tests, but they are considered worth the investment. Though practicality is a significant issue, for assessments suggested in the future practicality must not be the first consideration.

7. History has shown that self-evaluation is not a new recommendation for English language arts students but one that has been encouraged for many years. Since self-evaluation promotes reflection and learning, today's English language arts students should be encouraged to self-evaluate. Self-evaluation ideally would involve not only simple checklists like some designed in the past but significant thought and discussion as well.

8. Evaluation of English language arts need not be bound by the constraints of old tests and assessments. English educators should continue to explore assessments that acknowledge strengths as well as weaknesses, that consider processes and strategies used, that view error as a necessary part of the risk-taking needed for learning, and that allow for possible responses not anticipated by the test makers.

9. Informal classroom assessments are being advocated now, as they have been in the past (e.g., p. 58). Whole language promotes observation, or "kidwatching," usually citing as examples observations being used with beginning readers and writers. It is uncertain, however, how appropriate such methods are for older students. For example, what effect might observation have on students who are aware that they are being watched--that their comments and even facial expressions might be noted in anecdotal records? What might such practices promote? It seems likely that informal classroom assessments will eventually also be criticized as too subjective and idiosyncratic, in the same way that teachers' grades have been in the past. At some point some students will get shortchanged because of the way a teacher interprets the content of a portfolio or observes a student's participation (or lack of participation) in a collaborative project.
10. Portfolios present potential problems as well. Questionnaire respondents and professional publications report that portfolios are quickly becoming a part of classroom, district, and even large-scale means of assessment. They seem to work especially well in classrooms as teachers and students gather material over time as a record of students' literacy growth and achievement. Beyond the classroom context, however, it is difficult to decide how useful portfolios can be. What criteria for evaluation, for example, allow a poem from one portfolio to be compared with a letter from another? Are portfolios best used for assessment purposes, then, as a way to monitor or document rather than to evaluate? There are also troubling issues in regard to portfolio preparation. How much time will teachers allow or expect students to spend in compiling materials for the portfolio? Will such activities always be an unobtrusive part of classroom routines? If parents help choose what to include, will the child whose parents neglect to save samples of their child's literacy be at a disadvantage? Just what to do with portfolios seems unresolved at this point. (One conference presenter, for example, recently cautioned against portfolio presentations for the school board like those she had witnessed, occasions that became "cutest kids" shows.)

11. As this study has shown, several English language arts teachers have experimented with teaching students the secrets of evaluation (e.g., p. 55). That is, they have removed the evaluation mysteries by sharing the evaluative criteria with students and training them to evaluate their own work and the work of others. This form of self-evaluation seems to merit continued consideration, for it can lead to internalization of evaluative criteria and is consistent with efforts to promote critical thinking.

12. The question of who will evaluate and set the standards is a central issue, one that raises countless questions that need to be considered carefully by English language arts professionals. The question of who will evaluate seems partly settled, in light of this study, if we consider past and present recommendations of English educators: English language arts teachers consider themselves the primary evaluation experts. External evaluation is the issue that generates more of the unanswered questions. Even if we assume there is merit in state and national assessment (an assumption many are not willing to make), the issue of standards is not resolved. Should there be standards? What purpose do standards serve, other than to sort those who fail from those who do not? To what extent should English language arts teachers function as gatekeepers? What do failing scores reveal? What effects do failing scores have on students? On the teacher? On the salaries of administrators?

13. Current recommendations are that evaluation tools should match current theory and practice. A look at history, however, reveals some examples that raise questions about this issue. In the midst of the experience curriculum of the 1930s, for instance, the tests matched the most trivial parts of the curriculum, as seen in the test item cited earlier about how to answer the telephone. The basal reader test-teach-test system has also created a situation in which the tests and the teaching materials match--based on the same theory of learning and published and sold as a package.
Today, as English educators call for assessment to be consistent with current theory, we need to consider carefully the extent to which the curriculum and assessment should match--and to ponder what the alternatives might be.

Evaluation of English Language Arts Teaching Practices

1. As this study has shown, evaluation of English language arts teaching practices has been and currently often is inextricably linked to evaluation of student performance (p. 30, p. 84). Therefore, many of the same issues are of concern. For example, both internal and external evaluation seem to be fixtures of the evaluation of teaching practices. Classroom teachers are encouraged to self-evaluate their teaching practices--and to extend self-evaluation to reflective practice and to classroom research. Self-evaluation is not enough, however.

2. As this study has shown, some educators have insisted that students' test scores provide an objective measure for evaluating teaching practices and therefore should be welcomed by classroom teachers. They have pointed to the subjectivity of administrators' evaluations and offered student test scores as a corrective. Most English language arts teachers realize, however, that student test scores do not supply sufficient evidence of the success or failure of their own teaching practices. Student test scores reflect more than teaching practices--they reflect, for example, students' family and social experiences and needs, the learning theories and biases of the test-makers, and the curriculum and teaching practices of prior school experience. Student test scores should, then, serve as only one criterion by which English language arts teaching practices might be measured.

3. As with evaluation of student performance, evaluation of English language arts teaching practices should acknowledge teachers' strengths as well as weaknesses, consider processes and strategies used, view error as a necessary part of the risk-taking needed for learning (teachers are learners too), and allow for unanticipated classroom possibilities that have the potential to enrich literacy learning.

4. Similar cautions about portfolios seem in order for their use in evaluating teaching practices as for evaluating student performance. English language arts teachers can gather data to document their own exemplary performance, but again there are unresolved issues that need to be considered. How much time and effort, for example, might a teacher spend showcasing her work? Might professionally produced videotapes eventually become the norm? Where might this trend end?

Evaluation of English Language Arts Curricula and Programs

1. Criteria for English language arts assessment--whether for curricula, teaching practices, or student performance--seem to be inseparable from societal and educational contexts. In the early twentieth century, for example, the methodologies of science and business were thought to be directly applicable to education and thus dramatically influenced the criteria by which English language arts were evaluated. Likewise, the behaviorist psychology of the 1970s shaped the forms of assessment of that period. Today's emphasis on holism and integration appears to be having a similar impact in shaping English language arts assessment, though thoughts of national tests and standards may be in direct conflict with this emphasis.

2. Curriculum guides are potentially powerful, for they represent a prescribed curriculum and sometimes prescribed teaching practices and assessment
measures as well. The individual English language arts teacher who wants to consider changes and innovations needs a way to know how much of what kind of deviation will be permitted. Although little is written about curriculum guides in the professional publications, the work of the NCTE Curriculum Guide committee has served a valuable purpose by publishing model curricula and curricular assessment criteria.

3. Criteria for English language arts assessment cannot be separated from the conditions in which assessment takes place. For example, widespread use of objective and standardized tests occurred at least partly because of changed conditions, i.e., growing numbers of students (p. 25). If school administrators had had the money to do so, they might have opted to change the conditions rather than the means of evaluation, but instead they chose the less costly alternative. Conditions such as class size have so much impact on teaching practices and curriculum, at least in part because they are money issues, that they in fact become criteria by which English language arts programs are evaluated.

4. Some English language arts teachers in the past have been the primary decision-makers regarding curriculum and evaluation (p. 65). Current professional publications and questionnaire data show that some English language arts teachers actively involved in the profession seek for themselves and their colleagues time, knowledge, and opportunity--in order to become better classroom evaluation experts and in order to make decisions affecting the design and assessment of curricula, teaching practices, and student performance. These educators realize that they and their colleagues cannot be expected to switch from using published multiple-choice tests one day to using portfolios and observation the next and therefore seek the time and knowledge to become better informed, self-confidently believing in their own ability to become curriculum designers and evaluators.

5. Historically, English language arts professional organizations have helped English educators and district administrators determine how to assess curricula, teaching practices, and student performance. NCTE's resolutions have sometimes served at least as indirect criteria by which English language arts programs and practices could be assessed. As NCTE's past actions regarding testing and evaluation have been traced, it has sometimes been difficult to know whether those actions were helpful or not. As this study shows, NCTE has from time to time jumped on testing bandwagons, e.g., by passing the 1923 resolution encouraging English language arts teachers to use more standardized tests and by publishing books and journal articles extolling the benefits of standardized and objective tests. Without a crystal ball, it must be difficult for NCTE leaders to chart the best course. Because of such difficulties, it seems clear that the professional publications should maintain their policy of providing a forum in which a range of opinions can be heard.

6. A look at the large-scale studies of English language arts programs in the past is instructive today. When we consider the quantities of information gathered in the past (sometimes involving hundreds of interviews and classroom observations and many thousands of documents), it is easy to wonder what use may have been made of the findings.
As this study has shown, often it has been recommended that school district administrators and English language arts teachers use study results to compare their own circumstances to those described in the studies. Their comparisons might then lead teachers and administrators to use the data to highlight their own needs and to argue for improved conditions in their own situations. Such studies rarely occur today, at least in English language arts circles, surely at least partly for financial reasons. If government and private money was used for such projects in the past, it seems likely that some of the money once spent on such studies is today being spent on efforts to design new ways to test students. Perhaps the link is not a strong one, but large-scale studies of the past allowed teachers to evaluate and perhaps reform their own circumstances in light of what was happening elsewhere. Today's or tomorrow's tests of student performance may not yield information that is as useful. Whether large-scale studies are feasible today is uncertain, but a suggestion offered by a questionnaire respondent seems worth considering--that NCTE should supplement conference sessions with journal descriptions of specific programs, providing enough detail so that teachers could understand both the programs and the conditions that make the programs possible. Both the lessons from history and the current questionnaire responses suggest that English language arts journals should also include more discussion of the political and money issues that are seldom included in current English language arts professional publications but that have significant impact.

The ideal evaluation of English language arts programs would be multi-dimensional, incorporating the perspectives of a variety of experts and evaluation stakeholders, as suggested by Goodlad's descriptions of curriculum cited in chapter one. This study has presented a variety of additional useful English language arts evaluation criteria and processes from the past and the present--everything from teachers' grades and standardized test scores to library circulation figures and the music habits of English teachers. While it is impossible to dictate which criteria should be used in every English language arts program, it seems especially important that students' English language arts test scores should serve as just one criterion by which student performance is measured, as just one criterion by which teaching practices are measured, and as just one criterion by which curriculum is evaluated. No single English language arts criterion should by itself decide major curricular, teaching practice, or student performance issues.

APPENDIX A

Criteria for Assessing English Language Arts Curricula, Teaching Practices, and Students' Performance
Ellen H. Brinkley, 1990

For each question please check as many items as apply FOR THE SCHOOL DISTRICT(S) YOU KNOW BEST. Feel free to add explanation and/or comment in the margins or on the back.

ENGLISH LANGUAGE ARTS CURRICULA:

1. How significant are the following factors in shaping the English/Language Arts curriculum? (Each item was followed by a rating scale running from "very significant" to "not significant.")
     accreditation bodies (e.g., North Central, state)
     college expectations
     administrator and board interests
     professional literature
     community influences
     faculty skills and interests
     school schedule
     student needs and interests
     test results
     other: ________
2. How significant are the following groups in evaluating the English/Language Arts curriculum? (same rating scale)
     teaching faculty
     curriculum coordinator/principal
     external consultant
     students
     accreditation bodies
     community/board
     other: ________

3. How significant are the following persons in designing and revising curriculum guides? (same rating scale)
     teaching faculty
     curriculum coordinator
     principal or other administrator
     other: ________

4. What use is made of curriculum guides?
     teachers are expected to follow them explicitly
     they are intended to be followed loosely
     they are often ignored
     other: ________

5. What value does an English/Language Arts curriculum guide have?
     its development/revision provides occasion for faculty to discuss pedagogical issues and seek consensus
     provides information for new teachers in the district
     provides an official source for reference by administrators and teachers in discussion with parents and board members
     other: ________

ENGLISH LANGUAGE ARTS TEACHING PRACTICES:

1. How significant are the following persons in determining teaching practices? (same rating scale)
     classroom teacher
     curriculum coordinator
     principal
     other: ________

2. How significant are the following factors in evaluating teaching practices? (same rating scale)
     principal
     curriculum coordinator
     peer teachers
     parents/board members
     students
     teacher self-evaluation
     other: ________

3. How significant are the following influences in changing teaching practices? (same rating scale)
     professional literature
     inservice/staff development training
     exchange of ideas among teachers in the building
     exchange of ideas among teachers in the district
     exchange of ideas within a wider group (e.g., writing project support group)
     administrators and board
     constraints of facilities and school schedule
     student needs and interests
     test results
     community
     other: ________

ENGLISH LANGUAGE ARTS STUDENT ASSESSMENT:

1. Which of the following tests or means of assessment are used?
     SAT, ACT, CAT, other nationally-normed tests
     state-mandated tests: objective / writing sample
     holistic scoring of writing samples
     student portfolios
     other: ________

2. In the classrooms of your district, what means of assessment are currently being used for English/Language Arts?
     objective tests
     essay tests
     observation of students
     interaction with students
     student compositions
     student performances
     oral reading (e.g., miscue analysis)
     student portfolios
     contractual grading
     student self-evaluation
     other: ________

3. In your school district how significant are the following factors in determining means of assessment of English/Language Arts? (same rating scale)
     professional literature
     inservice/staff development
     exchange of ideas among teachers in the building
     exchange of ideas among teachers in the district
     exchange of ideas within a wider group
     administrators and board
     time constraints
     other: ________

IF YOU HAVE THE TIME AND PATIENCE . . .
1. Briefly describe, to the extent you are aware of it, the process in your district for evaluating the English/Language Arts program--curricula, teaching practices, and student performance.

2. If this process of evaluating the English Language Arts program could be altered, what changes might help decision-makers be in a better position to suggest improvements?

APPENDIX B

WESTERN MICHIGAN UNIVERSITY
Department of English
Kalamazoo, Michigan 49008-5092

May 10, 1990

Dear NCTE Classroom Practices in Teaching English Committee Member:

As English educators we have witnessed dramatic changes both in theory and in classroom practice. Given today's emphasis on test scores and student assessment, we often find school districts self-consciously wondering how effective their programs are, how they compare with others like or different from themselves, and how well their students and faculties measure up.

I am conducting research to discover techniques and criteria that school districts, and perhaps external consultants, can use to assess the effectiveness of K-12 English Language Arts (1) curricula, (2) teaching practices, and (3) student performance. Your knowledge, experience, and opinions can provide invaluable information that will be useful to this study and to the profession.

Please take a few minutes to complete the enclosed survey and return it to me by ________, 1990, if at all possible. You need not identify yourself, but if you do, I'll be happy to send you the results of the survey once the study is completed.

Sincerely,

Ellen H. Brinkley

EHB:ct

BIBLIOGRAPHY

Achtenhagen, Olga. "Why Is an Examination--And What of It?" English Journal 15 (1926): 285-89.

Allen, Virginia F. "Riddle: What Does a Reading Test Test?" Learning, Nov. 1978: 87-89.

APEX Evaluated and Revised. Trenton, MI: Trenton Public Schools, 1975.

Applebee, Arthur N. Tradition and Reform in the Teaching of English. Urbana: NCTE, 1974.

---. Writing in the Secondary School. Urbana: NCTE, 1981.

Applebee, Arthur N., et al. Learning to Write in Our Nation's Schools. U.S. Dept. of Education, 1990.

Aronson, Edith, and Roger Farr. "Issues in Assessment." Journal of Reading 32 (Nov. 1988): 175-77.

Association of English Teachers of Western Pennsylvania. Suggestions for Evaluating Junior High Writing. Champaign, IL: NCTE, n.d.

---. Suggestions for Evaluating Senior High Writing. Champaign, IL: NCTE, n.d.

Atwell, Nancie. In the Middle. Upper Montclair, NJ: Boynton/Cook, 1987.

Austin, Mary C. "Evaluating Status and Needs in Reading." Evaluation in Reading. Ed. Helen M. Robinson. Supplementary Educational Monographs No. 88. Chicago: U of Chicago, 1958. 36-41.

Backlund, Phil, et al. "Evaluating Speaking and Listening Skill Assessment Instruments." Language Arts 57 (1980): 621-27.

Baker, Franklin T. "The Teacher of English." English Journal 2 (1913): 335-43.

Barnes, Walter, et al. "Judging Teachers' Judgments in Grammar Errors." English Journal 18 (1929): 120-25+.

Bay Village (Ohio) City Schools. "District Holistic Assessment of Reading Scale."

Beverly, Clara. "Standards in Oral Composition: Grade One." Elementary English Review 2 (1925): 360-61.

Black, Janet K. "There's More to Language Than Meets the Ear." Language Arts 56 (1979): 516-33.

Bobbitt, Franklin. How to Make a Curriculum. Boston: Houghton, 1924.

Brett, Sue M. "The Federal View of Behavioral Objectives." On Writing Behavioral Objectives for English. Eds. John Maxwell and Anthony Tovatt. Champaign, IL: NCTE, 1970. 43-47.
Brinkley, Ellen Henson. "A Gift of the Past." Goldenseal 10.4 (1984): 4-8.

Broening, Angela M. "English as Experience in Secondary Schools." Essays on the Teaching of English. NY: Russell and Russell, 1940. 58-78.

---. "The Role of the Teacher of English in a Democracy." English Journal 30 (1941): 718-29.

Broening, Angela M., et al. Conducting Experiences in English. NY: Appleton-Century-Crofts, 1939.

Brown, Margaret E. "A Practical Approach to Analyzing Children's Talk in the Classroom." Language Arts 54 (1977): 506-10.

Brown, Rexford. "The Examiner Is Us." English Education 16 (1984): 220-25.

Burns, Paul C. Diagnostic Teaching of the Language Arts. Itasca, IL: F. E. Peacock, 1974.

Buros, Oscar Krisen, ed. English Tests and Reviews. Highland Park, NJ: Gryphon Press, 1975.

Burrill, Lois E. "How Well Should a High School Graduate Read?" NASSP Bulletin, Mar. 1987: 61-71.

Burton, Dwight L. Literature Study in the High Schools. NY: Henry Holt, 1959.

Burton, Dwight L., et al. Teaching English Today. Boston: Houghton Mifflin, 1975.

Bussis, Anne M., and Edward A. Chittenden. "Research Currents: What the Reading Tests Neglect." Language Arts 64 (1987): 302-08.

Calkins, Lucy McCormick. The Art of Teaching Writing. Portsmouth, NH: Heinemann, 1986.

Camenisch, Sophia Catherine. "Some Recent Tendencies in the Minimum-Essentials Movement in English." English Journal 15 (1926): 181-90.

Canady, Robert Lynn, and Phyllis Riley Hotchkiss. "It's a Good Score! Just a Bad Grade." Phi Delta Kappan (Sept. 1989): 68-71.

Carini, Patricia F. "The Prospect School: Taking Account of Process." Testing and Evaluation: New Views. Washington, DC: Association for Childhood Education International, 1975.

Carruthers, Robert B. Building Better English Tests. Champaign, IL: NCTE, 1963.

Certain, C. C. "Are Your Pupils Up to Standard in Composition?" English Journal 12 (1923): 365-77.

---. "A Testing Program for the New School Year." Elementary English Review 3 (Sept. 1926): 211-21.

Chaplain, Miriam. "Pushing Minimum Standards Toward the Maximum in English." English Education 9 (1978): 212-17.

Chew, Charles R. "Large Scale Writing Assessment: An Instructional Message." Testing in the English Language Arts. Eds. John Beard and Scott McNabb. Rochester, MI: Michigan Council of Teachers of English, 1985. 49-51.

Clapp, Frank L. "A Test for Habits in English." Elementary English Review 3 (Jan. 1926): 42-46.

Clapp, Frank L., and Robert V. Young. "A Self-Marking English Form Test." Elementary English Review 5 (Dec. 1928): 304-06.

Clarke, Lori. "Creative Teaching--Why Not Creative Testing?" English Education 4 (1972): 43-47.

Cohen, Sheldon S. A History of Colonial Education, 1607-1776. NY: John Wiley, 1974.

Congreve, Willard. "Implementing and Evaluating the Use of Innovations." Innovations and Change in Reading Instruction. 67th Yearbook, Part II. National Society for the Study of Education. Chicago: U of Chicago, 1968. 291-319.

Cook, Walter W. "Evaluation in the Language-Arts Program." Teaching Language in the Elementary School. 43rd Yearbook, Part II. National Society for the Study of Education. Chicago: U of Chicago, 1944. 194-214.

Cooper, Charles R., ed. The Nature and Measurement of Competency in English. Urbana, IL: NCTE, 1981.

Corbett, William D. "Let's Tell the Good News About Reading and Writing." Educational Leadership 46 (Apr. 1989): 53.

Coulter, Vincil Carey. "Financial Support of English Teaching." English Journal 1 (1912): 24-29.

Courtis, S. A. "The Value of Measurements: II. The Uses of the Hillegas Scale." English Journal 8 (1919): 208-17.
English Journal 8 (1919): 208-17.

Cox, Sidney. The Teaching of English. NY: Harper and Brothers, 1928.

Cremin, Lawrence A. The Transformation of the School. NY: Alfred A. Knopf, 1961.

Davis, Frederick B. "What Do Reading Tests Really Measure?" English Journal 33 (1944): 180-87.

Dawson, Mildred A. "Building a Language-Composition Curriculum in the Elementary School." Elementary English Review 8 (Oct. 1931): 194-96.

DeBoer, John D. "Earmarks of a Modern Language Arts Program in the Elementary School." Elementary English 31 (1954): 485-93.

Derrick, Clarence. "Tests of Writing." English Journal 53 (1967): 496-99.

Diederich, Paul B. Measuring Growth in English. Urbana: NCTE, 1974.

Distefano, Phillip and Joellen Killion. "Assessing Writing Skills Through a Process Approach." English Education 16 (1984): 203-07.

Doll, Ronald C. Curriculum Improvement. 2d ed. Boston: Allyn, 1970.

Donmoyer, Robert. "Curriculum Evaluation and the Negotiation of Meaning." Language Arts 67 (1990): 274-86.

Durkin, Dolores. "Testing in the Kindergarten." The Reading Teacher 40 (1987): 766-70.

Eberhart, Wilfred. "Evaluation in English in the Eight-Year Study." English Journal 28 (1939): 261-70.

Edelsky, Carole and Susan Harman. "One More Critique of Reading Tests--With Two Differences." English Education 20 (Oct. 1988): 157-71.

Elliott, Velma L. "Peer Evaluation for Teachers? Why Not?" Elementary English 51 (1974): 727-30.

Estes, Thomas H. "A Scale to Measure Attitudes Toward Reading." Journal of Reading 15 (1971): 135-38.

Evans, David N. "Standards Are Needed for CRT!" Educational Leadership 32 (1975): 268-70.

Evans, M. Eleanor. "Objective Tests in Eighth Grade Literature." Elementary English Review 4 (Jan. 1928): 13-22.

Faigley, Lester et al. Assessing Writers' Knowledge and Processes of Composing. Norwood, NJ: Ablex, 1985.

Farr, Roger. "New Trends in Reading Assessment." Curriculum Review, Sept./Oct. 1987: 21-23.

Farr, Roger and Nancy L. Roser. "Reading Assessment: A Look at Problems and Issues." Journal of Reading 17 (1974): 592-99.

Farstrup, Alan E. "Point/Counterpoint: State by State Comparisons on National Assessments." Reading Today 7 (Dec. 1989/Jan. 1990): 1.

Faust, Wirt G. "An Effort to Standardize Descriptive Theme-Writing for the Senior Year of the High School." English Journal 5 (1916): 257-71.

Ferguson, Bill L. "Behavioral Objectives--No!" English Education 3 (1971): 52-55.

Findley, Warren G. "Purposes of School Testing Programs and Their Efficient Development." The Impact and Improvement of School Testing Programs. 62nd Yearbook. National Society for the Study of Education. Chicago: U of Chicago, 1963. 1-27.

Fitzgerald, Sheila. "Implications of Parent, Teacher, and Student Perspectives on the Value of School Tests." Testing in the English Language Arts. Eds. John Beard and Scott McNabb. Rochester, MI: Michigan Council of Teachers of English, 1985. 28-43.

Foley, Joseph J. "Evaluation of Learning in Writing." Handbook on Formative and Summative Evaluation of Student Learning. Ed. Benjamin S. Bloom et al. NY: McGraw-Hill, 1971. 767-813.

Foster, Jack D. "The Role of Accountability in Kentucky's Education Reform Act of 1990." Educational Leadership 48 (Feb. 1991): 34-36.

Fredericks, Anthony D. "Latest Model." Reading Teacher 40 (1987): 790-91.

French, John W. "What English Teachers Think of Essay Testing." English Journal 46 (1957): 196-201.

Gates, Arthur I. "The Measurement and Evaluation of Achievement in Reading." The Teaching of Reading: A Second Report. 36th Yearbook - Part I.
National Society for the Study of Education. Bloomington, IL: Public School Publishing, 1937. 359-88.

Gilmore, Perry. "Research Currents: Assessing Sub-rosa Skills in Children's Language." Language Arts 61 (1984): 384-91.

Glatthorn, Allan A. A Guide for Developing an English Curriculum for the Eighties. Urbana: NCTE, 1980.

Glock, Marvin D. "Reading Tests: Past, Present, and Future." Reading Diagnosis and Evaluation. Ed. Dorothy L. DeBoer. Newark, DE: IRA, 1970. 55-64.

Goodlad, John I. Foreword. Curriculum Reform in the Elementary School. By M. Frances Klein. NY: Teachers College, 1989.

Goodman, Kenneth S. et al. Report Card on Basal Readers. Katonah, NY: Richard C. Owen, 1988.

Goodman, Kenneth S., Yetta M. Goodman, and Wendy J. Hood, eds. The Whole Language Evaluation Book. Portsmouth, NH: Heinemann, 1989.

Goodman, Yetta. "Evaluation of Students: Evaluation of Teachers." The Whole Language Evaluation Book. Ed. Kenneth S. Goodman et al. Portsmouth, NH: Heinemann, 1989. 3-14.

Goodman, Yetta M. and Carolyn L. Burke. Reading Miscue Inventory. London: Macmillan, 1972.

Graves, Donald H. Build a Literate Classroom. Portsmouth, NH: Heinemann, 1991.

---. NCTE Convention. Atlanta, 1990.

Gray, William S. "A Decade of Progress." The Teaching of Reading: A Second Report. 36th Yearbook - Part I. National Society for the Study of Education. Bloomington, IL: Public School Publishing, 1937. 5-21.

---. "Nature and Scope of a Sound Reading Program." Reading in the High School and College. 47th Yearbook - Part II. National Society for the Study of Education. Chicago: U of Chicago, 1948. 46-48.

Greene, Harry A. and William S. Gray. "The Measurement of Understanding in the Language Arts." The Measurement of Understanding. 45th Yearbook - Part I. National Society for the Study of Education. Chicago: U of Chicago, 1946. 175-200.

Groff, Patrick. "Behavioral Objectives for Children's Literature?--No!" Reading Teacher 30 (1977): 653-63.

Gross, David M. and Sophronia Scott. Time 16 July 1990: 56+.

Gursky, Daniel. "Ambitious Measures." Teacher 2 (Apr. 1991): 50-56.

Harman, Susan. "National Tests, National Standards, National Curriculum." Language Arts 68 (1991): 49-50.

Harring, Sydney. "A Scale for Judging Oral Compositions." Elementary English Review 5 (Mar. 1928): 71-73+.

Hassett, John J. "Checking the Accuracy of Pupil Scores in Standardized Tests." English Journal 67 (1978): 30-31.

Hatch, Roger Conant. "A Standard of Measurement in English Composition." English Journal 9 (1920): 338-44.

Hatfield, W. Wilbur. "The Ideal Curriculum." Elementary English Review 9 (Sept. 1932): 179-81+.

Heathington, Betty S. and J. Estill Alexander. "A Child-Based Observation Checklist to Assess Attitudes Toward Reading." Reading Teacher 31 (1978): 769-71.

Henry, George H. "An Attempt to Measure Ideals." English Journal 35 (1946): 487-93.

---. "Only Spirit Can Measure Spirit." English Journal 43 (1954): 177-82.

Hester, Kathleen B. Teaching Every Child to Read. 2d ed. NY: Harper, 1955.

Hillocks, George, Jr. Research on Written Composition. Urbana: NCTE, 1986.

Hodges, John C. "The State-Wide English Program in Tennessee." English Journal 34 (1945): 71-76.

Holmes, Betty C. and Nancy L. Roser. "Five Ways to Assess Readers' Prior Knowledge." Reading Teacher (1987): 646-49.

Hook, J. N. "Characteristics of Award-Winning High Schools." English Journal 50 (1961): 9-15.

---. A Long Way Together. Urbana: NCTE, 1979.

---. "The Tri-University BOE Project: A Project Report." On Writing Behavioral Objectives for English. Eds.
John Maxwell and Anthony Tovatt. Champaign, IL: NCTE, 1970. 75-86.

Hook, J. N. et al. Representative Behavioral Objectives for High School English. NY: Ronald, 1971.

Hosic, James F. "The Chicago Standards in Oral Composition." Elementary English Review 2 (1925): 170-71.

Howard, Elizabeth Zimmerman. "Appraising Strengths and Weaknesses of the Total Reading Program." Evaluation of Reading. Ed. Helen M. Robinson. Supplementary Educational Monographs No. 88. Chicago: U of Chicago, 1958. 169-73.

Huey, Edmund Burke. The Psychology and Pedagogy of Reading. NY: Macmillan, 1908.

Hunt, Lyman C., Jr. "Evaluation Through Teacher-Pupil Conferences." The Evaluation of Children's Reading Achievement. Ed. Thomas C. Barrett. Newark, DE: IRA, 1967. 111-25.

Improving SAT Scores. Lexington, MA: Ginn, 1985.

Instructional Objectives Exchange. English Skills, 7-9. Los Angeles: Instructional Objectives Exchange, 1970.

Jackson, Phillip W. "In Grades Seven through Nine." Evaluation of Reading. Ed. Helen M. Robinson. Supplementary Educational Monographs No. 88. Chicago: U of Chicago, 1958. 28-31.

Jewett, Arno. "Accountability in English." English Education 3 (1971): 5-15.

---. English Language Arts in American High Schools. Washington, DC: U.S. Dept. of H.E.W. Bulletin 1958, No. 13. 1959.

Johnson, Clifton. Old-Time Schools and School-books. 1904. Intro. Carl Withers. NY: Dover, 1963.

Johnston, Peter. "Teachers as Evaluation Experts." Reading Teacher 40 (1987): 744-54.

Jonas, Leah. "Power-Testing in Literature." English Journal 29 (1940): 799-805.

Judine, Sister M. A Guide for Evaluating Student Composition. Urbana: NCTE, 1965.

Judy, Stephen N. ABCs of Literacy. NY: Oxford, 1980.

---. "Standardized Tests and Their Alternatives." English Journal 67 (1978): 5-6.

Kirby, Dan and Tom Liner with Ruth Vinz. Inside Out. 2d ed. Portsmouth, NH: Heinemann, 1988.

Kirschenbaum, Howard, Rodney Napier, and Sidney B. Simon. Wad-ja-Get? NY: Hart, 1971.

Klapper, Paul. Teaching English in Elementary and Junior High Schools. NY: D. Appleton-Century, 1915.

Klein, M. Frances. Curriculum Reform in the Elementary School. NY: Teachers College, 1989.

Kopp, O. W. "The Evaluation of Oral Language Activities: Teaching and Learning." Elementary English 44 (1967): 114-23.

Koos, Leonard V. "The National Survey of Secondary Education: Its Implications for Teachers of English." English Journal 22 (1933): 303-13.

Koziol, S. M., Jr. and Patricia Burns. "Using Self-Reports for Monitoring English Instruction." English Education 16 (1985): 113-21.

Krest, Margie. "Adapting the Portfolio to Meet Student Needs." English Journal 79 (Feb. 1990): 29-34.

Langer, Judith A. et al. Learning to Read in Our Nation's Schools. U.S. Dept. of Education, 1990.

Lennon, Roger T. "What Can Be Measured?" Reading Teacher 15 (1962): 326-37. Rpt. in Measurement and Evaluation of Reading. Ed. Roger Farr. NY: Harcourt, 1970. 18-34.

Leonard, S. A. "The Wisconsin Tests of Grammatical Correctness." English Journal 15 (1926): 430-42.

Leonard, Sterling Andrus. Essential Principles of Teaching Reading and Literature. Philadelphia: J.B. Lippincott, 1922.

---. "How English Teachers Correct Papers." English Journal 12 (1923): 517-32.

Levi, Ray. "Assessment and Educational Vision: Engaging Parents and Learners." Language Arts 67 (1990): 269-73.

Lloyd-Jones, Richard and Andrea A. Lunsford, eds. The English Coalition Conference: Democracy Through Language. Urbana: NCTE; NY: MLA, 1989.

Lundsteen, Sara W. "Teaching and Testing Critical Listening in the Fifth and Sixth Grades."
Elementary English 41 (1964): 743-52.

Lunsford, Andrea A. "The Past--and Future--of Writing Assessment." Writing Assessment. Eds. Karen Greenberg et al. NY: Longman, 1986. 1-12.

Madaus, George F. "The Influence of Testing in the Curriculum." Critical Issues in Curriculum. 87th Yearbook - Part I. National Society for the Study of Education. Chicago: U of Chicago, 1988. 83-121.

---. "What Do Test Scores 'Really' Mean in Educational Policy?" Testing in the English Language Arts. Eds. John Beard and Scott McNabb. Rochester, MI: Michigan Council of Teachers of English, 1985. 1-11.

Mandel, Barrett J., ed. Three Language Arts Curriculum Models. Urbana: NCTE, 1980.

Marcus, Albert. "Diagnosis and Accountability." Elementary English 51 (1974): 731-35.

Mason, James Hocker. "The Educational Milieu, 1874-1911." English Journal 68.4 (1979): 40-45.

Mavrogenes, Nancy A. et al. "Concise Guide to Standardized Secondary and College Reading Tests." Journal of Reading 18 (1974): 12-22.

Maxwell, John C. Introduction. Common Sense and Testing in English. Urbana: NCTE, 1975. iv-v.

Maxwell, John C. and Anthony Tovatt, eds. On Writing Behavioral Objectives for English. Champaign, IL: NCTE, 1970.

Mayher, John S. and Rita S. Brause. "Learning Through Teaching: Is Testing Crippling Integrated Language Education?" Language Arts 63 (1986): 390-96.

McCaig, Roger A. "What Research and Evaluation Tells Us About Teaching Written Expression in the Elementary School." The Language Arts Teacher in Action. Kalamazoo, MI: Western Michigan University, 1977. 46-56.

---. "The Writing of Elementary School Children." Grosse Pointe, MI: Grosse Pointe Public School System, 1972.

McDonald, Arthur S. "Measuring Reading Performance." Measurement and Evaluation of Reading. Ed. Roger Farr. NY: Harcourt, 1970. 10-17.

McKey, Eleanor F. "Do Standardized Tests Do What They Claim to Do?" English Journal (1961): 607-11.

Meisels, Samuel J. "High-Stakes Testing in Kindergarten." Educational Leadership 46 (Apr. 1989): 16-22.

Melear, John D. "An Informal Language Inventory." Elementary English 51 (1974): 508-11.

Mellon, John C. National Assessment and the Teaching of English. Urbana: NCTE, 1975.

Miller, Vera V. and Wendell C. Lanton. "Reading Achievement of School Children--Then and Now." Elementary English 33 (1956): 91-97.

Moffett, James. "Misbehaviorist English: A Position Paper." On Writing Behavioral Objectives for English. Eds. John Maxwell and Anthony Tovatt. Champaign, IL: NCTE, 1970. 111-16.

Moore, David W. "A Case for Naturalistic Assessment of Reading Comprehension." Language Arts 60 (1983): 957-69.

Moore, Walter J. and Larry D. Kennedy. "Evaluation of Learning in the Language Arts." Handbook on Formative and Summative Evaluation of Student Learning. Ed. Benjamin S. Bloom et al. NY: McGraw-Hill, 1971. 399-445.

Morreau, Lanny E. "Behavioral Objectives: Analysis and Appreciation." Accountability and the Teaching of English. Ed. Henry B. Maloney. Urbana: NCTE, 1972. 35-52.

Morrow, Lesley Mandel. "Assessing Children's Understanding of Story Through Their Construction and Reconstruction of Narrative." Assessment for Instruction in Early Literacy. Eds. Lesley Mandel Morrow and Jeffrey K. Smith. Englewood Cliffs, NJ: Prentice-Hall, 1990. 110-34.

Moscrip, Ruth. "Shall We Test in Literature?" Elementary English Review 5 (1928): 140-41+.

Muller, Herbert J. The Uses of English. NY: Holt, 1967.

Myers, Miles. "The Politics of Minimum Competency." The Nature and Measurement of Competency in English. Ed. Charles R. Cooper. Urbana: NCTE, 1980.

---.
A Procedure for Writing Assessment and Holistic Scoring. Urbana: NCTE, 1980.

NCTE. Common Sense and Testing in English. Urbana: NCTE, 1975.

NCTE Commission on the Curriculum. "A Check List for Evaluating the English Program in the Junior and Senior High School." English Journal 51 (1962): 273-82.

NCTE Commission on the English Curriculum. The English Language Arts in the Secondary School. NY: Appleton-Century-Crofts, 1956.

---. Language Arts for Today's Children. NY: Appleton-Century-Crofts, 1954.

NCTE Committee on Correlation. A Correlated Curriculum. NY: D. Appleton-Century, 1936.

NCTE Committee on High School-College Articulation. "But What Are We Articulating With?" English Journal 51 (1962): 967-79.

---. "What the Colleges Expect." English Journal 50 (1961): 402-12.

NCTE Committee to Review Curriculum Guides. "Trends in Curriculum Guides." Elementary English 45 (1968): 891-97.

Noyes, Ernest. "Progress in Standardizing the Measurement of Composition." English Journal 1 (1912): 532-36.

O'Neil, John. "Drive for National Standards Picking Up Steam." Educational Leadership 48 (Feb. 1991): 4-8.

Parker, Flora E. "The Value of Measurements: I. The Measurement of Composition in English Classes." English Journal 8 (1919): 203-08.

Paulis, Chris. "Holistic Scoring." The Clearing House (Oct. 1958): 57-60.

Peckham, Irvin. "Statewide Direct Writing Assessment." English Journal 76 (Dec. 1987): 30-33.

Peterson, Gordon. "Behavioral Objectives for Children's Literature? Yes!" Reading Teacher 30 (1977): 652-60.

Poley, Irvin C. "Learning by Testing." English Journal 20 (1931): 128-36.

Pooley, Robert C. "Where Are We At?" English Journal 39 (1950): 497-504.

Pooley, Robert C. and Robert C. Williams. The Teaching of English in Wisconsin. Madison: U of Wisconsin, 1948.

Probst, Robert E. Response and Analysis. Portsmouth, NH: Boynton/Cook, 1988.

Purves, Alan C. "Evaluating Growth in English." The Teaching of English. Ed. James R. Squire. Chicago: U of Chicago, 1977. 230-59.

---. "Evaluation of Learning in Literature." Handbook on Formative and Summative Evaluation of Student Learning. Ed. Benjamin S. Bloom et al. NY: McGraw-Hill, 1971. 697-766.

---. "'Measure what men are doing. Plan what man might become.'" On Writing Behavioral Objectives for English. Eds. John Maxwell and Anthony Tovatt. Champaign, IL: NCTE, 1970. 87-96.

---. NCTE Convention. Baltimore, 1989.

Rankin, Earl F., Jr. "The Cloze Procedure--Its Validity and Utility." Measurement and Evaluation of Reading. Ed. Roger Farr. NY: Harcourt, 1970. 237-53.

Reif, Linda. "Finding the Value in Evaluation: Self-Assessment in a Middle School Classroom." Educational Leadership 47 (Mar. 1990): 24-29.

Richards, T. S. [pseudonym] "Testmania: The School Under Siege." Learning 17.7 (Mar. 1989): 64-66.

Romano, Tom. Clearing the Way. Portsmouth, NH: Heinemann, 1987.

Ruhlen, Helen V. "Experiment in Testing Appreciation." English Journal 15 (1926): 202-09.

Sangren, Paul V. Improvement in Reading Through the Use of Tests. Kalamazoo, MI: Western State Teachers College, 1931.

Satterfield, Mabel S. and Salibelle Royster. "The New-Type Test in English." English Journal 20 (1931): 490-95.

Savitz, Jerohn J., Myrtle Garrison Bates, and D. Ralph Starry. Composition Standards. NY: Hinds, Hayden & Eldredge, 1923.

Searson, J. W. "Determining a Language Program." English Journal 13 (1924): 99-115.

Shafer, Robert E. "A National Assessment in English: A Double Edged Sword." Elementary English 48 (1971): 188-95.

---. "What Can We Expect from a National Assessment in Reading?"
Journal of Reading 13 (1969): 3-8+.

Shugrue, Michael F. English in a Decade of Change. NY: Pegasus, 1968.

Simmons, Jay. "Portfolios as Large-scale Assessment." Language Arts 67 (Mar. 1990): 262-68.

Simmons, John S. "Testing on Both Sides: A Comparison." English Journal 76 (Dec. 1987): 27-29.

Slavin, Robert E. Research Methods in Education. Englewood Cliffs, NJ: Prentice-Hall, 1984.

Slotnick, Henry B. and John V. Knapp. "Essay Grading by Computer." English Journal 60 (1971): 75-87.

Small, Robert C., Jr. "The English Teacher and the Supervisor." English Education 7 (1976): 169-76.

Smith, Dora V. Evaluating Instruction in Secondary School English. English Monograph 11. Chicago: NCTE, 1941.

---. "Re-establishing Guidelines for the English Curriculum." English Journal (1958): 317-43.

Smith, Nila Banton. American Reading Instruction. 1934. Newark, DE: IRA, 1965.

Smith, Wilson, ed. Theories of Education in Early America, 1655-1819. Indianapolis: Bobbs-Merrill, 1973.

Snider, Sarah J. "Developing Non-Essay Tests to Measure Affective Responses to Poetry." English Journal 67 (1978): 38-40.

Spandel, Vicki and Richard J. Stiggins. Creating Writers. NY: Longman, 1990.

"Specter of National Test Rears Its Head." ASCD (Association for Supervision and Curriculum Development) Update 32 (Nov. 1990): 1, 6.

Spring, Joel. The American School 1642-1990. 2d ed. NY: Longman, 1990.

Squire, James R. "Behavioral Objectives and Accountability." Goal Making for English Teaching. Ed. Henry B. Maloney. Urbana: NCTE, 1973.

---. "English at the Crossroads: The National Interest Report Plus Eighteen." English Journal 51 (1962): 381-92.

Squire, James R. and Roger K. Applebee. A Study of English Programs in Selected High Schools Which Consistently Educate Outstanding Students in English. Urbana: U of Illinois, 1966.

"The Stanford Language Arts Investigation: A Symposium." English Journal 33 (1944): 119-29.

Starch, Daniel. Educational Measurements. NY: Macmillan, 1916.

Stewig, John Warren. "Oral Language: A Place in the Curriculum." Clearing House 61.4 (Dec. 1988): 171-74.

Stone, Clarence R. Silent and Oral Reading. Boston: Houghton Mifflin, 1926.

Strickland, Dorothy S. Foreword. Observing the Language Learner. Eds. Angela Jaggar and M. Trika Smith-Burke. Newark, DE: IRA; Urbana: NCTE, 1985. v-vi.

Strickland, Ruth G. "Evaluating Children's Composition." Elementary English 37 (May 1960): 321-30.

Tackacs, Claudia. "AWE: Classroom Use of a State Testing Program." English Journal 76 (Dec. 1987): 34-36.

Taylor, Denny. "Teaching Without Testing: Assessing the Complexity of Children's Literacy Learning." English Education 22 (Feb. 1990): 4-74.

Tchudi, Stephen, and Diana Mitchell. Explorations in the Teaching of English. 3rd ed. NY: Harper, 1989.

Thomas, Charles Swain. The Teaching of English in the Secondary School. Rev. ed. Boston: Houghton Mifflin, 1927.

Thomas, Charles Swain et al. Examining the Examination in English. Cambridge: Harvard U Press, 1931.

Thorndike, Edward L. "Notes on the Significance and Use of the Hillegas Scale for Measuring the Quality of English Composition." English Journal 2 (1913): 551-61.

Tiedt, Iris McClellan. Writing From Topic to Evaluation. Boston: Allyn, 1989.

Tinker, Miles A. "Appraisal of Growth in Reading." Reading Teacher 8 (1954): 35-38.

Tuttle, Frederick B., Jr. How to Prepare for Writing Tests. Washington, DC: NEA, 1986.

Tyler, Ralph W. "What Is Evaluation?" Evaluation of Reading. Ed. Helen M. Robinson. Supplementary Educational Monographs No. 88. Chicago: U of Chicago, 1958. 4-9.
"Types of Organization of High-School English: Report of a Committee of the National Council of Teachers of English." English Journal 2 (1913): 575—95. Valencia, Sheila and P. David Pearson. "Reading Assessment: Time for a Change." Reading Teacher 40 (1987): 726-32. Wagner, Betty Jane. "A Valid Way to Assess the Effects of a Writing Project." English in the Eighties. Ed. Robert D. Eagleson. Australian Association for the Teaching of English, 1982. 51-60. 251 Ward, C. H. "The Scale Illusion." English Journal 6 (1917): 221-30. Watson, Dorothy J., ed. Ideas and Insights. Urbana: NCTE, 1987. Weaver, Constance. Reading Process and Practice. Portsmouth, NH: Heinemann, 1988. Wiersma, William. Research Methods in Education. 3rd ed. Boston: Allyn and Bacon, 1985. Wiggins, Grant. "Teaching to the (Authentic) Test." Educational Leadership 46 (Apr. 1989): 41-47. Wiley, Mary Callum. "The English Examination." English Journal 7 (1918): 327-30. Wilson, G. M. "New Standards in Written English." Elementary English Review 6 (1929): 117-19+. Wilson, Marilyn. "Testing and Literacy: A Contradiction in Terms?" Testing in the English Language Arts. Ed. John Beard and Scott McNabb. Rochester, MI: Michigan Council of Teachers of English, 1985. 12—16. Wittrock, Merlin C. "Process Oriented Measures of Comprehension." Reading Teacher 40 (1987): 734-37. Wixon, Karen K. et al. "New Directions in Statewide Reading Assessment." Reading Teacher 40 (1987): 749-54. Zemelman, Steven and Harvey Daniels. A Community of Writers. Portsmouth, NH: Heinemann, 1988. Zollinger, Marian, and Mildred A. Dawson. "Evaluation of Oral Communication. English Journal 47 (1958): 500-04.