This is to certify that the dissertation entitled CRITERIA FOR ASSESSING KINDERGARTEN TO TWELFTH-GRADE ENGLISH LANGUAGE ARTS CURRICULA, TEACHING PRACTICES, AND STUDENT PERFORMANCE, presented by Ellen Henson Brinkley, has been accepted towards fulfillment of the requirements for the Ph.D. degree in English.

CRITERIA FOR ASSESSING KINDERGARTEN TO TWELFTH-GRADE ENGLISH LANGUAGE ARTS CURRICULA, TEACHING PRACTICES, AND STUDENT PERFORMANCE

By

Ellen Henson Brinkley

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of English

1991

ABSTRACT

CRITERIA FOR ASSESSING KINDERGARTEN TO TWELFTH-GRADE ENGLISH LANGUAGE ARTS CURRICULA, TEACHING PRACTICES, AND STUDENT PERFORMANCE

By Ellen Henson Brinkley

Previous historical research has described the teaching of English language arts. Other research has examined criteria that can be used to evaluate educational programs as a prelude to reform. The purpose of this study was to investigate (1) by what criteria English language arts programs have been assessed in the United States in the past, (2) what criteria are being used and recommended for English language arts assessment today, (3) what lessons can be learned from the use of past and present criteria and contexts for assessment, and (4) how English language arts program assessments might provide data that decision-makers can consider in order to make needed reforms.

Data collection procedures included reviewing English language arts professional publications, especially the publications of the National Council of Teachers of English (NCTE), and designing a questionnaire and compiling questionnaire results. Respondents were members of NCTE committees charged with addressing issues of English language arts curricula, teaching practices, and student performance, as well as representatives from selected NCTE Centers of Excellence award-winning programs.

Results show that from the beginning of this country's history, evaluation of English language arts curricula and teaching practices has been linked to evaluation of student performance. Major findings of the questionnaire study are that teachers, students, and test results are perceived as the factors most significant in shaping English language arts curricula today; that school administrators are perceived as most significant in evaluating English language arts teaching practices; that exchange of ideas among teachers within their own school building is perceived as most significant in influencing change in teaching practices; and that objective tests are perceived as the most frequent means by which English language arts student performance assessment occurs. These results suggest the need for multi-dimensional program assessment, with student test results serving as only one criterion by which English language arts curricula, teaching practices, and
student performance are evaluated.

ACKNOWLEDGEMENTS

This study grows out of nagging questions that I, like all other English language arts professionals, have struggled with. Not only must we teach, but we must evaluate as well--and be evaluated. Several persons have made this project interesting, challenging, and possible. The members of my guidance committee have shown both the patience and the impatience I needed to get the project done. Each of the members of my committee--Stephen Tchudi, Marilyn Wilson, Sheila Fitzgerald, and Diane Brunner--has taught me more than I thought I might learn during the course of the study. Colleagues in the English Department at Western Michigan University have listened to my dissertation stories and have believed in me when I needed that most. My husband and children have put up with a wife and mother who at times has tried to do it all. To Max, Matthew and Sarah I offer my apologies, my love, and my thanks.

TABLE OF CONTENTS

Chapter 1  The Need for Research and the Research Process
    Research Questions
    Rationale and Methodology for this Study

Chapter 2  Early English Language Arts Evaluation: 1607-1924
    Very Early English Language Arts
    New Measures for Changing Times
    Composition Scales
    Evaluating Reading and Other Language Arts
    Indirect Evaluation of English Language Arts Teaching Practices and Curricula

Chapter 3  The Impact of Standardized Testing: 1925-1940
    Testing Enthusiasm
    Evaluating English Language Arts
    Evaluating English Language Arts Teaching Practices and Curricula

Chapter 4  Challenging the Tests: 1941-1957
    English Language Arts Test Abuse and Criticism
    Evaluating English Language Arts
    Evaluating English Language Arts Teaching Practices
    Major Studies Evaluating English Language Arts Curricula and Programs

Chapter 5  Reconsidering English Language Arts Evaluation: 1958-1969
    Evaluating English Language Arts
    Evaluating English Language Arts Teaching Practices and Curricula

Chapter 6  Expanding Testing and Alternatives: 1970-1987
    From Testing as Measurement to Testing as Management
    Evaluating English Language Arts
    Evaluating Reading
    Evaluating Literature
    Evaluating Writing
    Evaluating Oral Language Arts
    National English Language Arts Assessment
    Evaluating English Language Arts Teaching Practices
    Evaluating English Language Arts Curricula
    Toward a Theory of English Language Arts Assessment

Chapter 7  Current Conditions
    The Power and Impact of Testing
    Evaluating English Language Arts
    Integrating English Language Arts Assessment
    Evaluating English Language Arts Teaching Practices
    Evaluating English Language Arts Curricula

Chapter 8  Reports from English Language Arts Professionals
    English Language Arts Curricula
    English Language Arts Teaching Practices
    English Language Arts Student Assessment
    Optional Final Essay Items

Chapter 9  Conclusions, Speculations, and Recommendations
    Evaluation of English Language Arts
    Evaluation of English Language Arts Teaching Practices
    Evaluation of English Language Arts Curricula and Programs

LIST OF TABLES
LIST OF FIGURES
APPENDIX A
APPENDIX B
BIBLIOGRAPHY

LIST OF TABLES

1. Sample for Questionnaire
2. Factors that Shape Curricula
3. Curriculum Evaluation
4. Designing and Revising Curriculum Guides
5. Use of Curriculum Guides
6. Value of Curriculum Guides
7. Persons Who Determine Teaching Practices
8. Evaluation of Teaching Practices
9. Influences on Changing Teaching Practices
10. School District Means of Assessment
11. Classroom Means of Assessment
12. Factors Determining Means of Assessment
13. Factors in Program Assessment Process
14. Suggested Improvements for Program Evaluation

LIST OF FIGURES

1. Sample Unit of the Burgess Silent-Reading Test
2. Ayres Handwriting Scale
3. Participation in Discussion Flow Chart
4. Observation Checklist to Assess Reading Abilities
5. Estes Attitude Scale
6. Bay Village Reading Scale
7. Evaluating Writing as a Process

CHAPTER ONE
THE NEED FOR RESEARCH AND THE RESEARCH PROCESS

Driven by pressures from the general public and by national attention to testing and assessment,¹ kindergarten to twelfth-grade (K-12) school district personnel find themselves self-consciously wondering how effective their programs are, how they compare with others like or different from themselves, and how well their students and faculties measure up. My own experience confirms this fact, for I was recently hired as an external consultant to conduct a review of a K-12 English language arts program in a nearby public school district. My charge was to describe from an outsider's perspective the program as it currently existed and to recommend changes based on my knowledge and experience. It is this program evaluation experience that provided the primary spark of interest in conducting this study.

English language arts teachers and administrators are aware that current theory and practice related to the teaching of K-12 English language arts have changed dramatically during the last fifteen years. For example, in the past, beginning reading has been taught as a set of letter and sound identification skills on the premise that such skills led to comprehension of ideas being transmitted by a writer. More often now, even in the early grades, reading is taught as a transaction of meaning between reader and writer, a transaction that occurs as the reader uses graphophonemic, syntactic, and semantic cues from the text to construct meaning. In the past, likewise, literature has often been studied as fixed texts to be analyzed, with great emphasis on understanding the prevailing interpretations of literary critics.
Today, however, more attention is given to the interaction of student readers with texts, to the meanings that readers bring to texts, and to the meanings that readers construct as they read.

The teaching of writing has undergone change that is just as dramatic. In the past it was assumed that writers decided their meaning ahead of time and planned their writing according to a preconceived thesis and outline. The finished papers were submitted to a teacher--usually the sole reader--who then evaluated the finished products. Today writing is taught not only as a means of communicating predetermined thoughts but also as a way to explore thoughts and to discover a focus. Writing evolves, aided by response from teacher and peer readers, eventually being shaped into polished pieces to be shared with audiences as authentic as possible.

In the past, listening often has been taught by having students take notes as teachers lectured, and speaking has often been taught by assigning formal speeches and by requiring oral recitation of answers to study questions. More often today, however, both listening and speaking occur naturally in the interaction of small classroom groups engaged in collaborative learning activities, with students actively thinking, questioning, and articulating for themselves the concepts being studied and their responses to them.

Such teaching and learning strategies have been discussed and recommended by theorists and researchers in English language arts professional publications and conference sessions. In response, school districts have often sought to implement such reforms. Sometimes English language arts teachers and curriculum coordinators rewrite curriculum guides to reflect new theories and new methods and then simply hope for the best. Sometimes extensive inservice training and follow-up seem almost to guarantee successful implementation of reforms. Sometimes teachers' intuitions tell them that the new programs work, and sometimes research confirms the value and validity of the new theory and new methods. However, not every new method or program is successful, and sometimes there is disagreement among faculty, parents, administrators, and students as to how effective the new ways are or how much value still inheres in the old ways.

Unfortunately, as old ways are discarded, new ways are often still measured by old evaluation tools. In fact, today's English educators tell us that old evaluation methods need to be changed as well. Although state departments of education sometimes publish program assessment guidelines (available through the ERIC system), these frequently consist of simple checklists. School district teachers and administrators want and need better ways to assess their students' progress and better ways to assess their own English language arts programs.

School districts and communities must, of course, be concerned not only with assessment within the district but also with the array of state and national tests and assessments. Students from the early elementary grades on are given state and national standardized tests whose results are frequently analyzed and used to draw conclusions about the comparative performance not only of individual students but also of teachers, administrators, school districts, and even states and nations.
Anyone who helps pay for education--parents, communities, state and national governmental bodies, and the general public--feels a right, and often a responsibility as well, to ask about its effectiveness and efficiency.

Research Questions

The control of today's curricula and teaching practices, then, rests in bits and pieces all the way from the classroom to the Congress. Assessment of English language arts curricula, teaching practices, and student performance must therefore be considered within the context of complex political, economic, and social issues. English educators must consider profound context questions--questions that will be considered as a part of this study:

1. What is the purpose of evaluation and assessment?
2. Who will evaluate, and who will set the standards?
3. Who will be served by assessment, and who stands to lose?

While English language arts teaching and learning have become encumbered with multiple evaluators and multiple evaluations, the solution is not simply to reduce the participation and control of those outside the classroom. Accountability is a given--assessment does and will and should occur. The purpose of this study, then, is not to argue against assessment per se, which seems an unreasonable and futile endeavor, but rather to focus primarily on criteria for assessment:

1. By what criteria have English language arts programs been assessed in the United States during its brief history?
2. What criteria are being used and being suggested for English language arts assessment today?
3. What lessons can be learned from the use of past and present criteria and contexts for assessment?
4. How might English language arts program assessments provide data that is accurate, appropriate, and useful so that decision-makers can consider needed reforms?

Rationale and Methodology for this Study

A study of the materials from English language arts professional books and journal articles yields invaluable data about assessment practices of the past, since such publications provide the primary record of what English language arts professional leaders have observed and recommended. Especially during the earlier years, such information exists primarily embedded in broader discussions of how English language arts was being taught or of how educators thought it should be taught. My task in part, then, has been to glean the fragments of discussion about evaluation of English language arts curricula, teaching practices, and student performance from professional publications in order to determine the shifts in thinking and practice over the years--always recognizing the impact that overall English language arts trends and broader educational and national events had on evaluation of student performance and programs.

The historical research should be illuminating and instructive for English educators today as they act and react in an educational environment that currently seems to give its greatest priority to issues of testing and evaluation. Historical research is recommended to provide "a perspective for decision-making about educational problems," to assist in "understanding why things are as they are," to predict "future trends," and to avoid the plight of those who ignore the old adage, "those who are unfamiliar with the mistakes of history are doomed to repeat them" (Wiersma 184). Historical studies, such as Arthur N. Applebee's Tradition and Reform in the Teaching of English, have provided just such
helpful foundations for English educators as they consider broader English issues and reforms.

The earliest historical data can be most readily drawn from secondary sources, i.e., historical accounts of teaching practices in the United States, while later data can be drawn from a variety of primarily English language arts professional publications, which frequently record first-person testimony of English language arts educators and classroom teachers. Thus, information focused specifically on English language arts is easier to access after 1912, when the English Journal was first published.

Because of the broad potential scope of this study, limits have been imposed on the materials to be reviewed. Not included are discussions of assessment of K-12 English language arts programs outside the United States or discussions of the special assessment needs of ethnic minority groups and special education students. Also not included are discussions of textbooks or teacher education programs, though admittedly all of these factors are important considerations in their own right and have a bearing on how and why English language arts assessment occurs as it does.

As I consulted English language arts professional publications, I was aware of the problems that can cloud the value and validity of the information recorded there. While the editors of such publications have provided over the years a forum in which English language arts theories and practices could be discussed, there has always existed the possibility for distortion as professional bandwagons emerge and then disappear. Writers of professional articles usually present themselves as spokespersons who believe they possess insight about the truth and who hope others will seek, and benefit from, their shared insights. Indeed, writers of professional publications--especially those writing for a specific professional organization such as the National Council of Teachers of English (NCTE)--are usually influenced themselves by the articles and books previously published by that organization. Nor is it difficult to imagine readers who assume that, because particular English language arts practices have been discussed at length in professional publications, those practices have been adopted in a general way, whether or not that might actually be the case.

Additional data can be solicited, however, from those who are most apt to be especially attuned to the realities of English language arts assessment practices as well as aware of the professional lore included in professional publications. Unlike writers of professional publication materials, questionnaire respondents offer their insights upon request and thus may provide a somewhat different, perhaps more realistic perspective, since they can remain anonymous if they choose to do so. Thus, responses to a questionnaire could serve as a supplement and as a reality check to historical research and to current professional publication information.
Using principles of questionnaire construction suggested by Robert Slavin (Research Methods in Education: A Practical Guide) and Likert-scale information provided by William Wiersma (Research Methods in Education: An Introduction), and aided by conversations with my dissertation director and with a data collection and analysis consultant, I designed two versions of a questionnaire (Appendix A) for 300 persons who could be identified as likely to have knowledge and experience regarding the assessment of K-12 English language arts curricula, teaching practices, and student performance. By virtue of my position as vice president of the Michigan Council of Teachers of English, I have recent copies of the NCTE Directory, which allowed me to identify English educators (specific groups will be described in chapter 8) serving on NCTE committees charged with addressing issues of English language arts curricula, teaching practices, and student performance. In addition, I selected contact persons identified in lists provided by NCTE of 1985, 1987, and 1989 NCTE Centers of Excellence award winners, choosing carefully those whose awards appear, from the program titles listed, to have been given for an entire English language arts program or for a substantial part of one (e.g., a middle school writing program). The questionnaires were designed to elicit information about actual procedures and programs and to elicit opinions as well, and thus to yield both quantitative and qualitative information (detailed in chapter 8).

Even questionnaire responses, of course, must be interpreted with the understanding that reality as these respondents know it may also be shaped in part by the influence of professional publications. This limitation, however, is outweighed by the actual professional English language arts situations these respondents experience from day to day and year to year. Perhaps a more personal limitation exists because I myself have also inevitably been influenced by professional publications, though this limitation is also offset by my own professional experience as a K-12 classroom English teacher, as a teacher of inservice and preservice English teachers, and as an English language arts consultant.

The issue of distinguishing perceptions from reality does seem to be a central one, however. I realized, for example, as I prepared for the K-12 program review conducted recently, that I had to use limited time and money resources wisely to yield the most accurate data. Without a staff of researchers who could conduct extensive classroom observations, I had no choice but to rely to a great extent on what school district staff and persons in the community expressed as opinions, albeit relatively informed ones, about the operation and effectiveness of the English language arts program. On that occasion, what was ultimately helpful was John Goodlad's distinction among different types of curricula: the "ideal" curriculum, "what scholars . . . believe should be taught"; the "formal" curriculum, "what some controlling agency (like the state or the local district) has prescribed"; the "perceived" curriculum, "what teachers believe they are teaching in response to the needs of the pupils"; the "operational" curriculum, "what an observer would actually see being taught in the classroom"; and the "experiential" curriculum, "what the students believe they are learning" (qtd. in Glatthorn 18).
Gathering information from these multiple evaluative perspectives seemed an effective way to approach a K-12 English language arts program evaluation. As I approached the present study, it seemed likely that Goodlad's categories might again be useful as I analyzed both present and past English language arts evaluation criteria and contexts.

There are several assumptions, then, that underlie this study: that assessment and evaluation are inherently valuable; that many groups and individuals have a stake in assessment and believe it is important; that evaluation tools should closely match current theory and practice; that it is possible to assess for the wrong reasons; that assessment has the potential to do harm as well as good; that assessment is affected by external conditions; and that, in spite of the quantity of student test results available to administrators today, school districts need more information about program effectiveness than most have now.

With these assumptions in mind, I have tried to discover what has already been tried but did not work in the past, and possibly what has already been tried but might work now. I have analyzed the effectiveness of various methods and measurements used in the past; questionnaire data about actual evaluation criteria and contexts today; and informed opinions from questionnaire respondents as to how best to assess English language arts programs. From this study, I draw conclusions and make recommendations about the kinds of information school districts need to gather and how they might best acquire it in order to decide on needed reforms. This study and its results should have relevance for most public school districts and for consultants who anticipate conducting evaluative studies of English language arts programs in the future.

¹Today the terms "evaluation" and "assessment" are frequently interchanged, with both suggesting formal or informal means by which a judgment can be made. "Testing" is considered one means by which assessment or evaluation can occur. The usage of these terms has changed, however, over the years, especially the use of "assessment." In NCTE's 1975 booklet, Common Sense and Testing in English, for example, "assessment" was defined as "a term used for local, state, and national projects that seek to describe how well students are doing in various fields," while "evaluation" was defined as "the process of determining the value or worth of something in schooling at any level" (4). Interestingly, the Oxford American Dictionary (1980) defines "assess" as "to decide or fix the amount of value of, to estimate the worth or quality or likelihood," while it defines "evaluate" very similarly as "to find out or state the value of, to assess." The verb "test," however, is defined as "to subject to a critical evaluation of the qualities or the attributes of," seeming to emphasize both the power of the test-maker and the unpleasantness of the test-taking task.

CHAPTER TWO
EARLY ENGLISH LANGUAGE ARTS EVALUATION: 1607-1924

Very Early English Language Arts

A review of the literature from the past reveals that educational evaluation is as old, or nearly so, as are teaching and learning themselves, for any valuing of achievement or learning involves evaluation. In the earliest days of this country's history any education that occurred took place in the home, with parents as teachers or with a private tutor hired by the family.
It is easy to imagine that even there the language and literacy learning and teaching that took place were subjected to evaluation--the teacher evaluating the progress of the young scholars and his or her own work as a teacher, the students evaluating their own understanding and their teacher's effectiveness, the parents evaluating both the work of their children and that of the tutor.

In the American colonies, then, any evaluation of student performance was based on observable behaviors--reading the Bible aloud (N. Smith 35) during family devotions or reading and writing letters. Parents probably made mental notes as to how well a child's literacy compared to that of his or her siblings or even to the memory of the parents' own early achievements. Ultimately, of course, the results could be observed and evaluated further as the young students took on adult responsibilities that demanded the use of their knowledge and skills.

Clifton Johnson's 1904 Old-Time Schools and School-books, a study of colonial teaching practices, explains that as settlements grew larger, many communities started a dame school, for "[t]here was always some woman in every neighborhood who, for a small amount of money, was willing to take charge of the children and teach them the rudiments of knowledge" (25). It is difficult, however, to know how greatly the communities valued education or the devotion of teachers to their duties. Johnson explains that while the "dame" listened to the students' recitation, "she busied her fingers with knitting and sewing, and in the intervals between lessons sometimes worked at the spinning-wheel" (25). According to Johnson, sometimes in the South even convicts served as teachers during these earliest years. Apparently new settlers who had been convicted of small crimes occasionally paid for their ocean voyage, if they could read or write, by becoming indentured to teach for a length of time (32).

Nila Smith in her historical study of American Reading Instruction refers to the colonial period (1607-1776) as "the period of religious emphasis" (10). She explains that with Protestantism came the doctrine that individuals were responsible for their own salvation and thus had to learn to read and interpret scriptures for themselves (11). In such a society reading instruction was valued. Evaluation of reading skill occurred in the form of oral reading of the Bible or the New England Primer, as well as by saying aloud the letters of the alphabet and syllables as listed in the primer. Johnson explained that the minister also played an important part in evaluation. As a town officer he "examined the children in the catechism and in their knowledge of the Bible" and carried out what must have been one of the country's first evaluations of listening skills by questioning students "on the sermon of the preceding Sunday" (24).

Sidney Cohen supplies additional evaluation details in his description of the educational law enacted in Massachusetts in 1642. "Selectmen" in each town were charged with determining "whether or not parents and masters were following their obligations," that is, determining if the children were being taught "to read and understand the principles of religion and the capital laws of the country" (44). The stakes, at least as set by the law itself, were fairly high.
Fines could be assessed against parents who refused to have their children examined, and if a court or magistrate agreed with the selectmen that particular parents were remiss in educating their child, the child could be apprenticed, in which case the master of the "deficient child" would be required to fulfill the provisions of the law. In 1690 Connecticut passed a similar law which made it "incumbent upon local jurymen to examine the reading ability of all the town's children" and to fine negligent parents (81). Cohen points out, however, that in actual practice parents and towns often found ways around the penalties, and that sometimes the student readers' only test was to recite a memorized catechism, which did not actually measure reading skill at all (81).

Given the communities' relatively low expectations of teachers, there seemed little attention paid to evaluation of teaching practices or of curriculum. In 1654, however, a Massachusetts law recommended that the selectmen "exercise some supervision over the quality of the teachers employed by the community" (Cohen 56). Teachers may also have felt that both they and their curricula were being examined indirectly on the occasions when visiting officials examined students.

By the mid-1700s prospective students of Benjamin Franklin's English School ("English" in this case used to distinguish the school from those emphasizing Latin and Greek) had to meet the following entrance requirements: "It is expected that every Scholar to be admitted into this School, be at least able to pronounce and divide the syllables in Reading, and to write a legible Hand . . . ." (qtd. in W. Smith 177). Franklin had definite ideas about student reading performance that might or might not meet his standards. In describing the second of six classes to be taught, he complained that the boys

. . . often read as Parrots speak, knowing little or nothing of Meaning. And it is impossible a Reader should give the due Modulation to his Voice, and pronounce properly, unless his Understanding goes before his Tongue, and makes him Master of the Sentiment. (W. Smith 179)

Writing lessons focused throughout the early years primarily on penmanship and spelling, and evaluation was probably dependent on what could be demonstrated for all to see. The emphasis on good penmanship is indicated by "exhibition pieces" which were passed around for visitors to admire on the last day of the school term (Johnson 112). Franklin, however, again made it clear that meaning should be emphasized:

The boys should be put on Writing Letters to each other on any common Occurrences, and on various Subjects, imaginary Business &c. containing little Stories, accounts of their late Reading, what Parts of Authors please them, and why. . . . (W. Smith 181)

Evaluation of such work was a certainty, for Franklin added, "All their letters to pass through the Master's Hand, who is to point out the Faults, advise the Corrections, and commend what he finds right" (W. Smith 181).

During these early years when many did not read or write and when in many homes the only book was the Bible, oral language was considered especially important. Although relatively few went to college, those who did found that the colleges focused great attention on rhetoric and oratory, following the "oral-based eighteenth-century model of education" (Lunsford 3). The ability to speak correctly and persuasively in public was easily evaluated by student performance.
Oratory made demands on listeners as well, though early educators seemed less concerned about evaluating listening.

There was, then, during these early years great emphasis on receiving knowledge, on following rules, on learning to do things the right way. To the extent that students could remember and reproduce transmitted information, they were judged successful students. To the extent that teachers produced students who were functionally literate, teaching practices and curricula were met with approval.

New Measures for Changing Times

As the country's attention shifted in 1776 to revolution and independence, the explicitly religious emphasis in classrooms was replaced by a nationalistic and moralistic emphasis (N. Smith 37). It was hoped that reading would foster loyalty in the new nation as well as "high ideals of virtue and moral behavior" (N. Smith 37). Noah Webster's The American Spelling Book provided an American standard by which young students could be measured, and Lindley Murray's English Grammar was designed to "promote in some degree, the cause of virtue, as well as of learning" (qtd. in Tchudi and Mitchell 7). The emphasis in Murray's Grammar on the rigid procedure of "parsing" text also provided almost countless grammatical labels on which students could be tested.

Literature was studied during these years primarily as a subject upon which composition assignments and examinations could be based. Arthur N. Applebee's discussion of the colleges' attitude toward literature is instructive for understanding the view of pre-college educators as well. Essentially, literature was perceived as something to be enjoyed outside the classroom rather than as a subject to be taught. Literature was used inside the classroom, however, as a model for writing compositions, and secondary English teachers routinely found themselves teaching literature from the college reading lists to prepare their students for college entrance examinations (A. Applebee, Tradition 30).

By the mid-1800s Horace Mann had advocated the use of written rather than oral examinations because they were thought to be more objective:

. . . written exams provide all students the same question in the same setting. Oral examiners necessarily had to ask different questions during testing because all students were in the room awaiting their turns. Oral examiners also could phrase their questions so that some answers were more obvious than others. As a result some students received easy questions, while other students were stuck with difficult ones. (qtd. in Moore 958)

The academy movement that Franklin had promoted spread rapidly, with one report of 6,185 academies in the United States by 1855 (Spring 22). These "town schools" were frequently supported by the communities but controlled by private boards of trustees (Spring 22). Especially once settlers moved westward into more isolated locations, however, classroom teachers themselves frequently had little, if any, schooling beyond what their own one-room school had offered. My own great-grandmother was one of these young teachers. Family records describe her having attended the "country school" for a short time in the mid-1800s before enrolling in the Gallia Academy across the Ohio River from her Virginia home. Her final evaluation served as a form of teacher certification:

When she was only 14 and a half years old . . . her father took her on the back of him on a horse.
Took her to the three school trustees, and each one of them examined her in one or two subjects. She got high grades in all, and they gave her a school to teach when she was just fourteen and a half years old. (qtd. in Brinkley 2)

One of the reasons why such young teachers were recruited was the rapid growth of school populations that had begun to take place. Indeed, by the end of the 1800s it was no longer even possible for evaluation of student performance to be an individual matter between teacher and student, for no individual student could claim very much of the teacher's attention. At this time the issue of student evaluation, in the form of college admissions tests, occupied the attention of high school and college English faculties and students alike. In 1871 the University of Michigan organized a Commission of Examiners to visit high schools to evaluate faculty, students, and curricula. Thus an accreditation system was developed, so that students from approved schools could be admitted to college on the basis of their school's merit (Mason 41). Fred N. Scott, of the University of Michigan, simply stipulated that high schools should send them "young men and young women who respect their mother tongue and know how to use it" (qtd. in Hook, Long 12). Harvard, Yale, and other eastern schools, however, established rigid written examinations for all applicants (Lunsford 2). The high schools struggled to match their own curricula to the college reading lists so that their students would not be at a disadvantage.

High school teachers became exasperated, however, when faced with the need to teach so many works of literature listed by so many colleges and when they realized how little control they had over their own curricula. Eventually their complaints reached the National Education Association (NEA), which referred the matter to its English Round Table, which in turn formed a committee to study the matter, report its findings, and make recommendations. Questionnaire responses led this committee to recommend the establishment of a national organization of teachers of English (Hook, Long 14).

The formation of NCTE in 1911 and the advent in 1912 of the English Journal provided a broader forum in which English language arts teaching practices, curricula, and student performance could be described, evaluated, and improved. The journal's first editor explained that it aspired "to provide a means of expression and a general clearing house of experience and opinion for the English teachers of the country" and to be "a bearer of helpful messages to all who are interested in the teaching of the Mother tongue" (qtd. in Hook, Long 23). Hook observes that early NCTE leaders shared a belief that council publications "should not follow a party line but should be open to informed, independent expression of even highly divergent opinions" (23). Ten years later the second editor would likewise assert, "We desire to make the magazine an open forum for all, conservative and radical alike, who have important ideas and can state them well" (Hook, Long 83), though he admitted his own progressive bias, which he predicted would "result in a preponderance of the new methods in the magazine, but this on the whole seems to be desirable, since those are the less known" (qtd. in Hook, Long 83).

Because of the college entrance examination controversy, evaluation of student performance was an issue of importance from the first issue of the English Journal.
Attention to the college entrance issue was soon diverted, however, to the "new-type" tests and new theories about how student evaluation could and should be handled. Such a direction made sense to those of this time who were placing less faith in God and religion and more in scientific "truths." At about the same time intelligence tests were being developed. The Binet scales (1905-8) and the Stanford revision (1916) were used during World War I in what today would be labeled a "high-stakes" assessment situation, for these tests were used to classify recruits to determine who would serve in leadership positions and who would be sent to the front lines. Real-life testing during World War I revealed other results--for example, that thousands of soldiers could not read well enough to follow printed military instructions (N. Smith 158). Educators soon realized the potential for such tests in the schools, supposedly to group similar students in order to provide instruction to match students' abilities (A. Applebee, Tradition 82). Soon Edward Thorndike was calling for accurate measures of what he referred to as educational "products" and working untiringly to develop objective tests for a variety of subjects (N. Smith 127).

Turn-of-the-century educators had begun to experiment with what they called "scientific" methods of teaching reading. Smith reported that in 1902 the "scientific alphabet" had been introduced to reduce the number of characters that represent the sounds of the English language in order to facilitate reading instruction and learning (127). The "sentence method" and the "story method" as well as elaborate "phonetic methods" were also introduced (N. Smith 128). Edmund Huey's classic, The Psychology and Pedagogy of Reading, published in 1908, provided a "scientific treatment" of reading, according to N. Smith (123).

As if the country and its schools were not already eager enough for standardized and objective tests, early issues of the English Journal included articles which highlighted the need for more accurate and especially more expeditious ways to evaluate student performance, teaching practices, and curricula. Such topics were right on target for school teachers and administrators who had seen elementary and secondary student populations jump from 6,871,000 in 1870 to 17,813,000 in 1910. They also had seen the number of high schools increase from 500 in 1870 to an amazing 10,000 just forty years later (Kirschenbaum et al. 51). Teachers eventually were faced with classes as large as 50 students and more, with the result that--especially for high school teachers who met a new group each hour--it was extremely difficult to know students' individual interests and abilities. Vincil Coulter's 1912 article complained about the difficult teaching conditions for English teachers, especially when compared to the favored status of science. The data he presented served as an evaluation of English language arts teaching conditions. For example, whereas science teachers each taught an average of 75 students, English teachers taught 136. Schools which spent $1.42 per pupil for science materials spent 17 cents per pupil for English materials (25-26). Later in that same year Ernest Noyes optimistically called in his article for a "clear-cut, concrete standard of measurement which will mean the same thing to all people in all places and is not dependent upon the opinion of any individual" (534).
Other publications added fuel to the fire by demonstrating the unreliability of grades as measures of student accomplishment. For example, Kirschenbaum et al. cited 1912 studies by Starch and Elliott in which papers graded by teachers in 142 schools revealed that one particular paper was scored anywhere from 64 percent to 98 percent while another was scored from 50 percent to 97 percent. Another student's paper got failing marks from 15 percent of the teachers while 12 percent of the teachers gave it a score of 90 percent or above. Kirschenbaum et al. explain that, "with more than 30 different scores for a single paper and a range of over 40 points, there is little reason to wonder why the reporting of these results caused a 'slight' stir among educators" (54-55). Given these conditions and concerns, it is not surprising that "scientific," i.e., standardized and objective, tests soon captured the attention of English and language arts educators at all levels.

Composition Scales

How to test composition seemed to pose the problem commanding the greatest attention among English educators, and it led to the development of a number of composition scales. One of the first and most popular was the Hillegas scale. A 1912 English Journal article explained how the scale had been developed: a large number of student compositions had been sent to several hundred judges, who were asked to arrange the papers in order of merit. From these rankings, a scale of ten samples "ranging in value by equal steps from 0 to 937 units" was derived (Noyes 535). (Actually the zero point was established on the basis of an "artificial sample produced by an adult who tried to write very poor English" (Noyes 535), an understandable cause for later criticism of this particular scale.) The ten sample papers and their percentage scores were copied and distributed to serve as what today would be called "range finders" by teachers, who could compare their own students' writing to the samples.

It is interesting to notice the many benefits that were projected for such measures. Supervisors, for example, were told they could use the samples to "compare classes of the same grade in different schools, in different cities, or under different teachers" (Noyes 536). These suggestions emphasized the external uses that could be made of test scores and at least implied the possibility of linking teacher evaluation to student performance on the basis of what were thought to be objective measures.

Thorndike in a later English Journal article explained the mathematical procedure used with the Hillegas scale. The rankings of all the judges were averaged for a particular sample, and all the samples were then arranged in order of merit. The value of 1.0 was assigned as the amount of difference that existed when 75 out of 100 experts ranked a sample correctly, that is, when no more than 25 judges put the "worse" sample ahead of the "better." Once a zero score had been established, samples could be selected which were 1.0 better than zero, 1.0 better than 1, 1.0 better than 2, etc. (Thorndike 551).
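Thorndike's unit is easier to see when the procedure is restated in modern notation. The restatement below is my own reconstruction for illustration only, not notation that appears in Thorndike's article:

\[
d(A, B) = 1.0 \quad\Longleftrightarrow\quad \Pr(\text{a judge ranks } A \text{ above } B) = 0.75
\]

\[
d(S_k, S_{k-1}) = 1.0 \qquad \text{for } k = 1, 2, 3, \ldots
\]

Here \(S_0\) stands for the artificial zero sample, and each \(S_k\) is a sample judged one unit better than the sample below it. On this reading, a 50-50 split among the judges marks no measurable difference between two samples, while 75-percent agreement marks exactly one unit, so the unit can be chained upward from the zero point across the whole range of student writing.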
Although the Hillegas scale seemed the most commonly used and discussed composition scale, the Harvard-Newton Scale was also frequently referred to. Eighth-grade compositions were marked by the percentile method by 24 teachers in much the same way they were with the Hillegas scale. However, the Harvard-Newton Scale went further by offering writing samples for each of the four modes--narration, exposition, argumentation, and description (Starch 146).

Not everyone bought the notion of writing scales. C. H. Ward in a 1917 English Journal article, "The Scale Illusion," attacked the practice of ranking themes (221). This writer argued that "[a]ny measure of literary value is impressionistic: any measure of literary value and mechanical value at the same time is a phantom" (223-4), and further insisted that, "A system that shows [the student] only his height above an absolute zero can no more produce a harvest than a thermometer can bring forth figs" (230). What Ward offered instead of the Hillegas scale, however, would not be well received by modern readers, for it was a system based on the principle of subtracting errors from a perfect score. In 1920 Roger Hatch tried a somewhat similar attack on composition scales, complaining that they were "mostly based upon that most deceptive theory, that law of averages" (338). Citing his 20 years of work trying to find out what colleges wanted in composition, he offered a system of penalties, which he used as a pedagogical tool by presenting it to his composition students to refer to as they prepared assignments. Again, modern readers might cringe to hear his comment on the efficiency of his system: "with the composition marked in red ink or pencil . . . it is the labor of a moment only to run the eye over the marks and add up the total, subtracting from 100 percent to get the basic grade" (342-3).

Flora Parker recommended confining Hillegas' procedures to subjects which were definitely measurable, such as spelling, rules of grammar, etc. She insisted that composition is, in addition to being a demonstration of correctly applied rules, also "an art with all the intangible graces and beauties which reside in that realm" (204). Immediately following Parker's article is a reply by S. A. Courtis, who rather smugly labeled the scale a "valuable measuring device" (208) and asserted that "everything worth while in education is also measurable" (208). Again, evaluation of teachers and curricula was linked to student performance, for, according to Courtis, the Hillegas scale would be especially beneficial for supervisors who must decide "questions of general policy" (213) and could be used to determine "the efficiency of different methods of teaching." Further, he suggested that teachers could use the scale to judge their own work so that they could change their methods and bring their own work "up to standard" (215). Ultimately, any teacher who proved "incapable of profiting by training . . . need[ed] to be eliminated as a teacher of English composition" (216). It is not difficult to imagine the reactions of classroom teachers who might read such warnings: teachers who valued their jobs would work hard to see that their students' test scores were as high as possible.

The Breed and Frostic Scale (Klapper 190) for sixth graders was similar to the Hillegas Scale, but all student writing was typed before being submitted to judges, to eliminate handwriting influences on scoring. Somewhat like the Harvard-Newton scale, Van Wagenen's Composition Scale offered distinct scales for description, narration, and exposition, but included separate values for "thought content, mechanics, and structure" as well (Klapper 200). Klapper's Teaching English in Elementary and Junior High Schools echoed the beliefs of others already cited in that he specifically advocated the composition scales as especially useful as measures of teacher achievement (230) and of the value of teaching methods (231).
Wirt Faust describes yet another of these obviously time-consuming and expensive projects to design a new and better way to evaluate composition. Working with four high schools, he asked seniors to write to a prompt, as did most of the other plans as well, in this case assigning themes describing "fields, lakes, seas, streets, and the like and to contain no mention of human characters" (258). Of these he chose 30, which were typed to eliminate handwriting, and sent to 40 judges (primarily NCTE members). The judges were asked to arrange the papers in order of merit and to write for each paper an opinion about its merits and defects and the reason it was better than the one below it and poorer than the one above. Each was given a numerical score on content and on form. Interestingly, the article published just the 12 best of the 30, along with the data about merits, defects, and comparisons of each, and offered the samples as "standards in descriptive theme writing for the Senior year of the high school" (260). The puzzle in this case is that there was no further mention of those samples numbered 13 to 30 or of the level of student writing they represented. It seems at least possible that readers might mistakenly have thought of the top 12 samples as representative of the entire range of responses against which to measure their own students' work.

Although many questioned the validity and reliability of composition scores and standardized tests throughout this period, the experts still seemed to side with Daniel Starch, who had argued in 1916, "[a]ny quality or ability of human nature that is detectable is also measurable" (2). Finally, NCTE, which had aired so many of the pros and cons in the English Journal, spoke out on the testing issue. The June 1923 issue included a report from the NCTE Committee on Examinations, whose first sentence made it clear where the professional organization stood--"The Committee on Examinations desires to stimulate an interest in a more widespread use of standard tests in English" (Certain 365). The primary reason for this recommendation seemed to be that such tests could provide school districts the opportunity to compare their standing with other districts, an external function of testing. Sterling Leonard's article later that year reminded readers why these so-called "scientific" tests were preferable by describing a study of teacher corrections on student papers. He found a great number of "wrong" corrections as well as "a multitudinous array of puristic or wholly captious excisions, restatements, rearrangements, and additions which make over the pupils' own expressions into such as fit the corrector's way of thinking and writing" ("How" 528). Almost no one, it seemed, wanted to go back to the old ways.

Evaluating Reading and Other Language Arts

Although the English Journal was NCTE's only professional journal during the early years, it focused much of its evaluation attention on discussion of composition, especially that of secondary students. Elementary educators, however, were devoting more of their attention to the evaluation of reading, although N. Smith explained that in fact standardized reading tests were slower to develop--probably because both silent and oral reading were difficult to measure and to analyze into testable elements (161). Starch's 1916 book included a reading test that he designed that may have served as an early model for later tests.
It was intended to measure the "chief elements" of reading, perceived by Starch as comprehension, speed, and correctness of pronunciation (20). He offered several reading passages at increasingly difficult reading levels, which students were asked to read silently for thirty seconds. Following the reading they were asked to mark the spot where they stopped reading and to write down as much as they could remember from their reading. Interestingly enough, the written retelling was scored by crossing out the words which reproduced the text and by counting those remaining--seeing what percentage of words should be discarded as not related to the text (31).
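Because Starch's description of the scoring is compressed, a small worked example may help. The figures below are my own illustration of one plausible reading of the procedure, not numbers taken from Starch:

\[
60 \text{ words written} \;-\; 48 \text{ words reproducing the text} \;=\; 12 \text{ words remaining},
\qquad
\frac{12}{60} = 20\%
\]

That is, if a student's written retelling ran to 60 words, of which 48 could be crossed out as reproducing the passage, the 12 remaining words--20 percent of the retelling--would be discarded as not related to the text.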
    As soon as this happened, he ran home to his mother, laughing as hard as he could. (241)

Such items might trouble modern test-makers, who would question them as too subject to a variety of interpretations. Gray's test, however, was perhaps more typical of oral reading tests. As the child read aloud, the tester was to record on a copy of the paragraph the errors made by the student, with the idea that the better students would read faster with fewer errors (Stone 263).

    I. This naughty dog likes to steal bones. When he steals one he hides it where no other dog can find it. He has just stolen two bones, and you must take your pencil and make two short, straight lines, to show where they are lying on the ground near the dog. Draw them as quickly as you can, and then go on.

    Figure 1 - Sample Unit of the Burgess Silent-Reading Test

Tests to measure other dimensions of the language arts were discussed and developed during this same era of test fascination and explosion--some of which look very unrealistic to readers today. For example, the Ayres handwriting scale, as described by Starch, was intended to evaluate students' handwriting skill. Incredibly, the test was constructed by taking samples of 1,578 children's handwriting, separating the individual words, and then measuring the speed with which readers could read these words. Eventually eight degrees of legibility were determined and presented to be used as guides, with three samples of each--slant, medium, and vertical (Figure 2 shows a portion of the scale).

Literature posed unique evaluation problems, which were discussed on occasion but were difficult to address effectively. Efforts were made, however, to create standardized tests of appreciation of literature. Leonard described the process used in such a test that was to measure the literary appreciation of both teachers and students. The test presented a number of poems "ranging in quality from Mother Goose to Bridges or Masefield." Each poem was accompanied by three "spoiled" versions, and test-takers were asked to determine which in each case was the "best" (Essential 59), thus demonstrating their ability to discern and appreciate the best literature.

Throughout this period some wished for even more standardized tests. For example, Klapper in 1915 sought a scale for oral composition, since he believed it was much more important than written composition, which he termed "incidental" in the life of the average person (221). In fact, by 1925 early issues of The Elementary English Review reported projects in Detroit and Chicago to develop oral composition standards (Hosic; Beverly). Some educators, however, reminded readers that "desirable language habits" were best observed in everyday oral and written expression. Without using the expression "reflective practice," they encouraged teachers to think of their classrooms as "testing laboratories" in which they could use their students'
skill to revise 'teaching' practices (Savitz, [tes, and Starry 2). Several authors of this period did point the way to raluation recommendations that eventually became .despread. For example, a 1918 English Journal article gued against "old-time memory tests" that asked students . parrot back information provided by texts and teachers. e author did not offer standardized or objective tests in eir place but rather open-book "thought examinations" stead (Wiley 327). Although Certain, speaking in 1923 for TE, advocated standardized tests for diagnostic purposes, also made recommendations which sound more current today, ch as gathering work in individual student folders (466). direct Evaluation of English Language ts Teaching Practices and Curricula From the previous citations, it is clear that andardized tests of student performance were from the ginning promoted in part as a way to evaluate teaching actices and curricula. High student scores, according to is theory, were seen as evidence of good teaching actices and of teaching what needed to be taught. There very little in the professional literature about direct aluation of English language arts teaching practices, at iSt. during these years, probably Ibecause teacher iluation was seen as the province of school inistrators. There needed by ‘ article pro "natural en< "quick mind, temperament, 38). Spec: that they 14 “ability to knowledge a: "provocativ expression, was made evaluation, (or those c The 04 Place duri national : In 1913, English ap] the NBA w national s schools as numbers C interestin Was llh0W 40 There is, however, discussion about the qualities eeded by the English teacher. Franklin Baker’s 1913 rticle provides an example. He listed the following natural endowments" needed by teachers-—a "clear mind," a quick mind," "retentiveness and fulness of memory," "social emperament," and a "keen, intuitive sense of languge" (336- 8). Specificially for English teachers, he recommended hat they know and love their subject, that they have the ability to stimulate and guide other minds in acquiring nowledge and love of subject," and that they be skillful in provocative talk that leads to clear, vigorous thought and xpression, oral and written" (338). Although no mention as made of using these qualities for formal teacher valuation, readers could compare their own qualifications or those of others) against the lists provided. The other English language arts evaluation that took lace during the early NCTE years was district-wide and ational studies of curricula and teaching practices. n 1913, for example, a national study of high school nglish appeared, sponsored by committees from both NCTE and he NEA which had been charged ultimately to prepare a ational syllabus. A questionnaire had been sent to high :hools asking for information about teaching conditions, meers of students, and curriculum. An especially mteresting part of this study is that one of the questions is "how do you test the efficiency of your English >urses?" ("Types" 582). Responses included expected answers such school exams were also : effective e appreciate l conclusion there seem satisfactory served seve individual I to compare might decid found happ, using data improve Wor The g involved 11 Cremin repc POPular mac. Opinionated talk with interview I in 11% "public a] incompetem needed 8cm 4]. nswers such as college entrance exams, college courses, and chool exams. 
On the other hand, less quantifiable measures were also mentioned, such as "the power of clear and effective expression," "interest in the work," "power to appreciate literature," and "voluntary reading." It was the conclusion of this committee in 1913 that "on the whole there seemed to be no tests generally regarded as satisfactory" (582). Publishing the results of such studies served several purposes. It is easy to imagine that individual English teachers might study the reports in order to compare their own teaching practices. English faculties might decide to revise their curriculum based on what they found happening in other places. School administrators, using data from the reports, might persuade school boards to improve working conditions or buy new materials.

The general public did, however, occasionally get involved in the evaluation of school programs. Lawrence Cremin reports that back in the late 1800s The Forum, a popular magazine of the day, had sent Joseph Mayer Rice, an opinionated pediatrician, around the country to observe, talk with teachers, attend school board meetings, and interview parents (4). When Rice's findings were published in The Forum, they included sensational descriptions of "public apathy, political interference, corruption, and incompetence" (4), sparking considerable discussion about needed school reforms for several years to come. By 1924 an NCTE study was conducted which involved 7,752 persons in 42 states. In this case persons in a variety of occupations were asked their opinions about language skills needed "for ordinary success in life" (Searson 102). The results in this case were intended to be used to help design "a program big enough to challenge the imagination and cooperation of the leaders of the country" (Searson 99). This same study included a questionnaire distributed among 8,799 English and language arts teachers. One of the questions asked teachers to choose among 21 items the "most urgent thing needed to improve the teaching of English." Given the climate of 1924, it is not surprising that one of the top choices was "definite standards of English work for each grade, or year" (Searson 105). Such teachers undoubtedly provided a substantial market for Franklin Bobbitt's How to Make a Curriculum published that same year and including hundreds of ready-made educational objectives.

This brief overview makes clear the fact that, from the earliest colonial examinations, individual teachers' judgments have been viewed as inadequate measures of student performance. Society's representatives from outside the classroom have from the beginning been involved--whether they were ministers, school trustees, or college admissions officials. Still, such evaluations were subject to varying personal opinions. It is easy to understand the promise that standardized scales and tests held to turn students' reading and writing into numerical scores not tainted by human attitudes and impressions.
The same impersonal efficiency that worked on the factory assembly lines seemed to promise both higher productivity and quality control in English language arts evaluation of student performance as well. Still, it is striking how much faith English language arts educators placed in the tests. One wonders if any took the time to study the content of individual test questions in order to determine whether the questions, either individually or collectively, reflected the learning they sought for their students.

At any rate, caught up in this spirit, by 1924 they expressed the view in the professional journals that numerical scores were the primary means by which to set standards for English language arts student performance, teaching practices, and curricula. Few at this point questioned whether the tests themselves could accurately and adequately evaluate the merits of complex English language arts skills or processes. Few seemed, in print at least, to question the effects the numerical labels might have as student sorting devices.

CHAPTER THREE

THE IMPACT OF STANDARDIZED TESTING: 1925-1940

Testing Enthusiasm

Nationally these were years of good times economically followed by great hardship, years when students began to stay in school longer. By 1930 high school students represented over half of all the population of high school age--attributable in part to "the shrinkage of opportunity for employment" (Koos 305). Even the advent of school buses made a difference, for school consolidation could then become a reality (Cremin 274-5), promising greater efficiency--undoubtedly an important consideration during the hard times.

If the first half of the 1920s had been dominated by the development and description of various standardized and objective tests of English and language arts, the years from 1925-1940 brought widespread use of, experimentation with, and discussion about tests and their potential uses.

In such a climate many English educators who had earlier praised standardized and objective tests enthusiastically recommended that such tests be given an important place in the curriculum. For example, the editor of Elementary English Review, C. C. Certain, outlined in September 1926 a testing program for the new school year. Seemingly unaware of any problems with the tests, he advocated using all of the following standardized tests during the year: Briggs Form Tests--Alpha and Beta; Wisconsin Test of Sentence Recognition--V and VI; Wisconsin Test of Grammatical Correctness--A and B; and Clapp's Correct English Test--A and B (211). Moreover, he recommended that these tests be supplemented by "parallel dictation tests" and "controlled composition tests" (211).
For secondary English teachers Charles Thomas recommended the "new tests" in The Teaching of English in the Secondary School (1926), insisting that "[t]he most significant movement in education during the twentieth century centers in the attempt to provide scales that will objectively measure ability and achievement in school work" (433). His recommendations came as a result of his belief that teachers should be "true to the ideals of our profession," by employing "every agency that will correct our unstable personal judgments" (439). Perhaps the most striking and instructive information he included was a long chart listing scales and measures that secondary English educators could choose from: 21 tests were listed for composition, 20 for grammar, 18 for language, 11 for literature, 6 for punctuation, 30 for reading, 26 for spelling, and 17 for vocabulary (475-82).

The standardized and objective tests were most highly praised as a corrective of teachers' subjective judgments, which were thought to be too often determined "by mood, prejudice, or gross misconception of factors and conditions" (Thomas et al. 209). Beyond that, English teachers might defensively expect other benefits as well. Leonard, for example, explained that such tests were recommended (1) to find out how many "pupils in a hundred in various grades make certain censured grammatical errors of the purely conventional sort like 'them kind' and 'learn us,'" and (2) to "enable schools to find out the same thing specifically for themselves" (430). Again, students' responses puzzled the test-makers. When asked to complete the sentence, "I have finished my work and ____ home," some students filled in the blank with "chores" or "mother," words which shared an association with "home" but which did not fit syntactically with the rest of the sentence (such responses sound like those that second-language users might have given). Included in Leonard's article was an elaborate tabulation of 31 errors in order of frequency so that teachers could see, for instance, whether "laying" for "lying" was apt to be a more common problem than "ain't" for "aren't" (440).

Another effort to rank errors appeared in a 1929 article. In this case a group of New York University students had given teachers copies of student texts and asked them to find the errors in the papers. Rather than focus on the quality of students' usage, the university
In another case, a Massachusetts esting program confirmed that students in lower elementary 'rades made the same errors as students in the upper econdary grades, though the older students made fewer of he same errors (G. Wilson 117). The improved test scores or older students were attributed in this case not to ffective teaching or learning but to the likelihood that pupils doing less well in school work tend to drop out, eaving in school those who are proficient" (Wilson 117). Typically, the oral language arts—-speaking and istening--received less attention among test developers uring these years. Some unwieldy efforts were made, owever, to apply the methods used for composition scales to ral language. Sydney Harring, for instance, reported in 1928 a so: children compositior (71). Lat study whic records of various su terms of s. of ideas; measure v. well. The The stumbl with such Classroom as these, Procedures there seen for correc Effor apply Star designers infornath to find V Such attex which Drc Deetry. With One 52 1928 a scale devised using stenographic records made as children presented oral compositions. The written compositions were then judged for "composition quality only" ( 71). Later Mildred Dawson described a somewhat similar study which also used stenographic records, in this case records of students’ "conversation and discussion in the various subjects" (195). This data was then analyzed in terms of sentence structure, correct usage, and organization of ideas; and eventually a rating scale was constructed to measure voice, posture, articulation, and vocabulary as well. The results for each child were then charted (195). The stumbling block was, of course, the stenography needed with such plans. It seems questionable whether any classroom teachers were able to immflement such suggestions as 'these, though tape recordings would later' make such procedures somewhat more practical. Again, in this case there seemed a special effort to focus attention on the need for correctness. Efforts continued during these years to find ways to pply standardized methods to the study of literature. Test esigners realized the need to measure more than factual 'nformation about literature and struggled especially hard 0 find ways to measure appreciation of literature. One uch attempt was reported in the March 1926 English_gggrg§;, hich. provided. a test designed to test appreciation of oetry. One section of the test included lines from poems ith one version unchanged and the other two versions reworded ": Students we best to the And ma Find c And me Find c Come, Seek s In order t understand. test asked you ever shepherd t that stude apt to app Wonders wj answered q "Have you pedagogica knowing t Such (Tiles Sec°nd la respondim for this that the 23 Percent after Cla 53 reworded "in order to destroy the rhythm" (Ruhlen 203) . Students were asked to indicate which of the choices sounded best to them: And may my old, lingering age Find out some peaceful hermitage, And may at last my weary age Find out the peaceful hermitage, Come, let my old, dull, white age Seek some quiet and restful hermitage, In order to determine students’ background experiences and understanding of poetic language, another section of the test asked for yes-no responses to such questions as "Have you ever seen . . . a dappled dawn? ebon shades? a shepherd telling tales under a hawthorne?" The intent was that students unfamiliar with the vocabulary would be less apt to appreciate and understand poetic style. 
However, one wonders what students might have thought about as they answered questions such as "Have you ever tasted ale?" and "Have you ever been lulled to sleep by the wind?" And what pedagogical conclusions might teachers have drawn from knowing the students' negative or affirmative answers to such questions? Again, students for whom English was a second language seem to have been at a disadvantage in responding to such questions. At any rate, the responses for this particular test were carefully calculated to show that the class as a whole had a background for appreciating 23 percent of the imagery in "L'Allegro." A post-test given after class discussion of the poem revealed that students then understood about 50 percent of the images (208), leading the author to conclude that perhaps this poem should not be taught, since so many students still seemed unable to appreciate the imagery even after having received instruction about it.

Perhaps more useful were teacher-designed tests that asked for personal engagement of students with their literary texts. For example, Olga Achtenhagen's 1926 article suggested using "thought questions" intended to engage students personally in their reading and to build on the material already familiar to them for response to such questions as "do you agree with Bacon when he says that there are certain things which ought to be privileged from jest? What are your reasons?" (288). In 1928 Ruth Moscrip similarly suggested that students be asked to read a selection, then write responses to questions such as "Who would you rather have been, Maggie or Tom? Why?" Using these written responses prior to class discussion, a strategy often recommended for classrooms today, the author reported that subsequent discussion had been especially lively and led her to conclude that questions asking for personal opinion and engagement aided literary appreciation better than the factual tests of knowledge she had created in the past (140).

In 1931 Poley's "Learning by Testing" suggested a surprising innovation which offered a hint of the student-centered focus for English language arts that was beginning to develop during the 1930s. Poley reported that he had asked his ninth graders to devise their own tests and explained that students were enthusiastic about coming up with the best possible questions (135).

During the early 1930s English came to be viewed as a part of "life experience"; that is, English and language arts were considered important not just as a way to acquire knowledge and skills but also as a way to understand and interpret life's experiences. In 1932 W. Hatfield, Chairman of NCTE's Curriculum Commission, spoke of "an ideal life as the ideal curriculum" (179). Such a reconceptualization of purpose brought with it a new consideration of evaluation as well. Hatfield explained that just as "the backbone of the ideal curriculum . . .
is a sequence of experiences constantly increasing in complexity and subtlety," so also "the appraisals must be in terms of growth of power in the life experiences rather than formal tests" (191).

Later an article entitled "Power-Testing in Literature" asserted that the primary purpose for testing was to develop control over books students would encounter in the future rather than to display knowledge of books they had already read (Jonas 800). A "power" test was created by giving students three unfamiliar passages written by authors the class had studied and asking them to identify the writer of each and to explain the similarities to previous works studied. Such a test "represent[ed] not crammed facts and parroted views but unaided pupil power to deal with fresh material" (804).

Following the focus on the child-centered pedagogy of Rousseau, Froebel, Pestalozzi, Dewey, and Kilpatrick (N. Smith 243), the "activity curriculum" for elementary students emerged during the 1930s with its emphasis on English as experience. The activity curriculum was said to begin with "something which an individual or group has already experienced" and to continue under the following circumstances:

    . . . through the desire of the individual or group to further interpret the experience, difficulties arise and through the efforts of each individual or group to overcome these difficulties, new interests are created and new problems appear, and so on. (N. Smith 244 [citing 33rd yearbook, part II, National Society for the Study of Education])

Sounding remarkably current today, this curriculum was described as "a never-ending process . . . each experience leads on to further experiencing, thus forming an intricate network, which involves investigating, questioning, planning, performing, evaluating, appreciating, achieving, and enjoying" (N. Smith 244).

As reading had become more and more important to and popular with society, evidenced by increasingly widespread publication of newspapers and books (Gray 10-11), efforts to understand the reading process likewise expanded. William Gray reported, for example, that during the years 1925-37 the number of research studies in reading had been more than twice the number reported during the entire preceding century (15). As might be expected, efforts to evaluate reading expanded as well. In 1931 Paul Sangren pointed out how many test-makers had scrambled to design reading tests:

    While there are only three or four standardized oral reading tests that have been used to any considerable extent, there are approximately one hundred fairly well standardized silent reading tests that have found frequent use. (53)

His long list of such tests includes stated purposes ranging from purely decoding skills to measuring word recognition to interpreting texts and testing power to comprehend total meaning of paragraphs (88-93). Perhaps Sangren's book itself, Improvement of Reading Through the Use of Tests, is testimony to the importance placed on reading tests.
Intended as a textbook, presumably for a college course for teachers or administrators, Sangren's book ended some chapters with problems for study and discussion. Possibly just as useful for potential reading specialists were a variety of graphs, scales, charts, and diagrams--all of which served as examples of the kind of data that could be generated from test scores.

Indeed, reading theorists of the day seemed particularly eager to test. Arthur Gates, a professor of education at Teachers College who also served as a consultant for a major publisher of basal materials (Goodman et al., Report 22), affirmed, as might be expected, the "soundness of the policy of making systematic use of standardized tests at intervals during the year" (Gates 359). However, reading educators led the way in suggesting other means of evaluation as well, probably as a reaction to the inclusion of more student-centered classroom activities that were a part of the experience curriculum. Gates, for example, called for wider use in elementary classrooms of "certain observational methods, ratings, questionnaires, combination teach-and-test materials, subjective appraisals of study habits, and other less widely used techniques" (359). He insisted that superintendents, principals, curriculum departments, book committees, supervisors, and teachers all needed evaluative data (360) and that for each student the following information should be available at all times: age, intelligence, language abilities, previous reading experiences and interests, vocabulary, basal silent-reading skills, word mastery skills, basal oral-reading skills, general reading habits, reading interests, and advanced reading and study skills (360-61).

For high school students Gates recommended that schools follow a number of testing procedures. In addition to administering standardized reading tests at the beginning and at the end of the year, teachers should periodically gather and evaluate records of independent reading done by students. They also should test students' ability to find information in libraries and in books; should check students' study habits by observation, individual conference, and self-inventory; and should measure students' ability to read in the various subject-matter fields "by informal tests in connection with their class work." Teachers should also seek "expert diagnosis and remedial treatment for students whose reading ability falls below the norm for grade VII" (386). Gates especially praised the use of student workbooks and other "printed booklets of teach-and-test materials," because they provided an effective way for the teacher "to keep in almost daily contact with each individual's progress and difficulties" and because they "increas[ed] efficiency by teaching and testing at the same time" (375).

It is difficult not to attribute part of Gates's interest in having published materials used to the fact of his own employment by a publisher. However, Gates's position was not considered extreme.
Indeed, many other educators in the years ahead presented similarly long lists of evaluation suggestions in an effort, one assumes, to be genuinely helpful. It seems likely, however, that in actual classrooms teachers responded to such long lists by dismissing them as impossible within the context of their own busy classroom schedules. How much simpler to depend primarily on objective tests which produced what they perceived to be a scientific certainty that students, parents, and administrators would find difficult to question.

Evaluating English Language Arts Teaching Practices and Curricula

Evaluation of teaching practices among English language arts teachers continued through the late twenties to be handled indirectly, at least as such evaluation was treated by the professional English and language arts publications. Sidney Cox's 1928 The Teaching of English, for instance, rather quaintly discussed virtues needed by all teachers--an "antipathy to deception," nerve, energy, and health (80-83)--followed by a discussion of qualities unique to English teachers--"a fundamental and imperious desire to communicate," "an urge to establish reciprocal relationships" with students, and the ability to cope with "many sorts of actuality all the time" (83-84). Other desirable traits on Cox's list included a general interest in people, taste, an "acquaintance" with good books, the "desire" to write, and possession of knowledge (84-88). Most important, Cox asserted, was that the English teacher be a "real person" (89). Eleven years later the teacher self-appraisal criteria for unit teaching devised by Angela Broening et al. sounded considerably more practical and professional, asking questions about conferences, such as, "Did the conferences help pupils to see the immediate and remote values of the subject-matter?" and questions about records, such as, "At intervals of different lengths did the teacher check with the pupils their efforts and successes as recorded in their notebooks and evidenced in their special individual and group projects?" (Conducting 284). These authors affirmed the classroom teacher's central role in evaluation and decision-making based on reflection:
Such recognition of individual differences led to a variety of teaching practices and programs designed to meet individual needs, though as Leonard Koos pointed out in 1933, such provisions were "not yet as generally practiced as seem[ed] desirable" (308). Instead, the easy solution that emerged was "homogeneous grouping, special classes for ‘the ‘very bright or gifted and for the slow," more often than not using the intelligence quotient as the basis for grouping (308). In order to consider English language arts evaluation of curricula, it is helpful to consider Ronald Doll’s iscussion (1970) of curriculum improvement, which he termed "a very recent field of inquirY" (20). Apparently, early urriculum decisions were made on the basis of administra consideral A clc reveal cc however, « reported : national practices Programs. districts enormousl scope of admittedl and langr almost 2( in state thils, 1 0Ver 800 this stuc' Penulatic curricuh the heme: of the 1 peri0d: to haVe before 1 62 administrative and teacher recommendations with little consideration of official curriculum evaluation (21). A close look at English language arts publications does reveal considerable curriculum evaluation taking place, however, during the 19205 and 19305, though often what was reported most noticeably were results of state-wide or even national studies of English language arts programs and practices rather than school- or district-wide evaluation programs. Often such studies involved hundreds of school districts and thousands of teachers--and undoubtedly were enormously expensive to conduct. Koos described in 1933 the scope of the National Survey of Secondary Education-- admittedly a broader study than those involving just English and language arts. Still, it is difficult to imagine that almost 200,000 forms were sent to "administrative officers in state departments and local school systems, teachers, pupils, former pupils, parents, and employers" (Koos 305). Over 800 visits were made to over 500 schools as a part of this study in an effort to study school organization, school population, problems of administration and supervision, the urriculum, and extra-curricular activities (304). One of he benefits listed for this study seems applicable to many f the very large research reports conducted during this eried: "those in charge of the schools and teachers like 0 have the records and descriptions of the innovations efore them and to be permitted. to exercise their own judgment ‘ adopt or . On a on a stud teachers the cont] economic, were also establish as to tin units b1 Organizat covered - classes, 0f achiei 0f readi Slight sv Commute. The % evElluati. by Broer "Washer: out how d‘hnq ab was app, contribu 63 judgment with respect to which of them they will themselves adopt or adapt in the different local situations" (312). On a considerably smaller scale, NCTE reported in 1936 on a study of the "correlated curriculum" that involved 73 teachers and nearly 2,000 students. Care was taken to match the control and experimental groups in regard to social, economic, intellectual, and achievement status. Teachers were also "rated" before the study began. Three groups were established--one which integrated units with no restrictions as to time schedule and activities, another which integrated units but followed a fixed time schedule and fixed organization of materials, and a final control group which covered the same "general ground" but in separate subject classes. 
When comparisons were made based on the criteria of achievement tests, mental age, information tests, amount of reading done, and attitudes, the results indicated a slight superiority for both of the experimental groups (NCTE Committee on Correlation 237). The stated purpose of the 1939 publication by NCTE of Conducting EXQeriences in English was to report on evaluation of the experience curriculum. This text, written by Broening et a1. and mentioned earlier, explained that "teachers and supervisors over the country wished to find out how the experience idea was working--what others were doing about it" (vi). Therefore, a committee of 5 persons was appointed. by' NCTE, and. the committee in turn used contributions describing classroom practices from 274 English ta research 6 strategies thoroughly arts class Perha the repo articulat placement since on curriculu gather d analyze ' based on Eve] PUp: uni' ObSl ora wri acc lis thr 8B pla owe the lik prc Put This pr< data! S In this 64 English teachers as well as results of questionnaires and research experiments conducted to test particular classroom strategies. What evolved was a comprehensive report thoroughly researched, in many cases by English language arts classroom teachers themselves. Perhaps one of the more useful parts of the text was the report of efforts in Baltimore to work out an articulation program to determine the best grade-level placement of particular units of study--especia11y helpful since units were an important part of the experience curriculum. The teachers engaged in the study were asked to gather’ data from their students and. then in groups to analyze the data to design the sequence of the curriculum based on their analysis: Every teacher kept careful records of what individual pupils and groups were able to accomplish during the unit . Standardi zed and teacher-made tests and observational data concerning pupils’ emotional and oral responses were studied . Specimens of pupi l s ’ writing prepared under known conditions were accumulated and analyzed. From all these, a tentative list of attainments, grade by grade was set up. Then through conference of 73 teachers with 6A, 7A with 7B, 83 with 7A, etc., on up to 12A with college and with placement counselors, the grade lists were anlyazed for overlapping. Items in the list for a given grade were then starred to show what teachers of that grade should like entering pupils to have mastered and what the promoting teacher reported as attainments of her pupils. (268) This process, though heavily dependent on standardized test data, still provided a centrol role for English teachers. 131 this case, the teachers themselves ‘were involved in classroom- determini As ; provided be evalua it useful or other those whi planning supervisc A. adul The lanQUage 1925‘194 that toc 65 classroom—based research—-collecting and analyzing data and determining where to set standards. As an appendix to the 1939 text, Broening et a1. provided criteria by which experience-centered courses might be evaluated with the hope that "[l]oca1 committees may find it useful to apply these standards in appraising their own or other courses" (349). Among the criteria included were those which outlined a democratic procedure by which course planning could become a "creative experience to teachers, supervisors, and pupils": A. Survey . . . the needs in English of pupils and of adults. B. State objectives in terms of pupils’ present and immediate future needs and of social sanctions. C. 
C. Build units based upon experience.
D. Try out units in test-controlled situations in actual classrooms.
E. Examine all available instructional equipment.
F. Utilize expert advice and scientific research findings where relevant.
G. Outline the procedures used in building the course so that new teachers may understand its philosophy and contribute to its continuous adaptation to changing conditions.
H. Prepare instructional tests and cumulative records for measuring pupil growth in terms of adopted objectives. (350)

There is one other form of evaluation of K-12 English language arts teaching practices and curricula during the 1925-1940 period, and that is the very unofficial measuring that took place by college professors who drew conclusions about their students' pre-college learning experiences based on their college performance. It is clear from professional publications of the period that many secondary English teachers were both fascinated and intimidated by the colleges and felt obliged to try to discern what the colleges wanted them to do. In 1931, a year when just a fraction of the college-age population attended college, the College Entrance Examination Board (CEEB) published a book about the college entrance exams, Examining the Examination in English. The book seemed intended primarily for college educators but had significant implications for secondary teachers as well. In addition to discussing the history of the entrance exams and to providing elaborate comparisons among the various forms that had been given over the years, the book also reported information and opinions gathered from college professors, headmasters, and English teachers in public and private schools and correlated students' college grades to their earlier exam scores (Thomas, Examining 140-67). Apparently the CEEB committee members felt that having conducted their study somehow qualified them to give advice about K-12 programs. Thus, included in the chapter summarizing findings and discussing recommendations were statements intended to tell the K-12 schools what they should do, such as, "This process of outlining or 'blue-printing' should start very early in the pupil's school career and should be consistently practiced through the twelfth grade" (200). Even more self-righteously, they asserted that any future entrance exam "should embody in its questions a clearly conceived idea of standards in English which the colleges have a right to expect from secondary-school graduates who seek admission to their institutions" (212).

Some did feel that secondary schools should strive "to discard preparation for college as a goal and to put in its place an educational program that looks toward complete living" (Koos 312). One of the most extensive evaluation programs of the 1930s--the Eight-Year Study conducted by the Progressive Education Association (PEA)--was expected to accomplish this very thing. This elaborate study, begun in 1932 and concluded in 1940, was similar to the earlier CEEB study in that it studied the effect of high school curricula on college-bound students, in this case directly testing "success in college" of several thousand graduates of the schools.
According to the terms of the study the schools involved had been freed from the constraints of the usual college-preparatory course and freed from college entrance examinations (Eberhart 261). Each of the 30 schools chosen to participate had developed a "distinctive education program" (261) which was "in essence an educational hypothesis which it was necessary to test in actual experience" (262), so that evaluating the new programs was viewed as important from the very beginning. Kirschenbaum et al. later would call the Eight-Year Study "the most profoundly important research study ever undertaken in the history of American education" (182), apparently so labeled because many of the experimental programs produced students whose achievements, without the motivation of grades, exceeded those of control groups (183). Kirschenbaum et al. pointed out, however, that--perhaps because of the 1942 World War II publication date--the results of the study "seem to have been lost on most educators" (182). Thus, a carefully planned and documented research project which involved classroom teachers in a central role seemed to have little impact as a model for later planners of innovative programs: college admission tests continued to be given and to influence secondary curricula, and grades continued to be used in an attempt to motivate student performance.

The optimism that surrounded standardized tests felt right, then, to most English educators during this period as a way to combat the subjectivity that they were told had always permeated their evaluation measures. It is easy with hindsight to say they should have proceeded more cautiously in offering their personal and professional endorsements to such testing practices. Instead, they operated on the assumption that the tests were valid, and they projected their own enthusiasm onto their students. They seemed unconcerned that students became statistics to be manipulated and plotted on graphs but surely had no way to anticipate the damaging effects such tests and their results might have.
In fact, the primary use of such study reports 5 to have been to provide English language arts teachers that could be used to persuade district administrators eeded reforms. Sch especial of the 1 percent Commissi stay in school makeup . 160). Bu Broenin life ex English unanaly Prevent 719-201 youth applyii proble; activi P and fi INHIBIT! CHAPTER FOUR CHALLENGING THE TESTS: 1941-1957 School populations during this time continued to grow, ially in the high schools, which had served 50 percent e high school age population in 1930 but served 75-85 pnt of that population twenty years later (NCTE ssion 440). Because less successful students tended to in school longer, the average "leaving age" of high 1 students kept getting higher, thus changing the p of high school English classes (D. Smith, Evaluating Building on the theory of English as life experience, ing saw connections between world events and students’ experience. She insisted in 1941 that the role of the sh teacher was to "immunize youth against hate, against lyzed prejudices and unfounded conceptions which t the realization of the poet’s vision" ("The Role" 0). Teachers should, according to Broening, "help to rediscover and to reaffirm American ideals, ing them to present local, national, and international ms and realizing them through individual and group 'ties" (720). ooley with hindsight would later refer to the 19305 'rst half of the 19405 as a time in which nglish apparently fell heir to everything which ducation felt that children should have and which did ot fall naturally into any other area of the urriculum . . . the period in which the newspaper, the magazi motion electr became English Lar Tstmbise The e years was seemed to educators objective had refine Professior Walte movement, the Engli goals whi for their as vocabu isolation reSuited Teaching by testir the inst reve‘ISe 4. (197). chthSed "faulty 71 magazine, the popular book, detective fiction, silent motion pictures, talking motion pictures, radios, electrified phonographs, and, finally, television became a part of the English teacher’s job. (498) ish Language Arts Abuse and Criticism The enthusiasm for testing that had existed in earlier '5 was soon tempered by considerable criticism that led to grow more intense. By the 1940s many English :ators had become disillusioned by standardized and active tests or, if they had always been test—resistant, refined their arguments enough to finally be heard in fessional publications. Walter Cook, an articulate critic of the measurement ement, insisted in 1944 that it had negatively affected English curriculum by focusing attention upon "limited is which [could] be objectively checked without regard itheir relative importance" and by measuring such things wocabulary, spelling, capitalization, and punctuation in aation, a practice especially undesirable because it %lted in "a tendency to teach them in isolation" (197). thing practices had likewise been negatively influenced $esting: "Since the evaluation program was not keyed into 4‘ arse the procedure and to fit instruction to evaluation" % T). Worse for students, test norms were "too frequently fused with grade standards," which sometimes resulted in i alty classification and promotion practices" (197). instuctional program, the only alternative was to D . importar Regents W practice passed which w high sc economi Smith p PE tests < seek we text 1 illust; intent I—JHI'I'UIH'MQ’Q’MCTSH 72 D. 
D. Smith found this last problem of particular importance when she conducted her study of the New York Regents Exams, as described in Evaluating Instruction in Secondary English (1941). She was especially upset by the practice of using test scores to decide which eighth graders passed "from the rural school to the town high school" and which were retained in eighth grade. Recognizing that the high school provided "opportunities for shop work, home economics, agriculture, commercial training, and the like," Smith pointed out how unsatisfactory the testing system was:

    . . . the very pupils who cannot measure up when called upon to comprehend poetry and name the parts of speech are in greatest need of the type of training available at the secondary school level; yet they are the ones held back one, two, and sometimes three years because they cannot pass the eighth grade examination. (164)

Perhaps it was teachers' awareness of how damaging the tests could be for their students that in part led them to seek ways to circumvent the effects of the tests. One 1944 text included a much earlier comment by a publisher that illustrates the extent to which teachers, perhaps well intentioned, abused the tests:

    I could give you the names of several school systems in which cumulative files are kept of all forms of our tests. We have standing orders from these systems to supply them with each new form as it appears. Our agents tell us that in these systems the tests are available to all teachers who, if not encouraged to do so, are certainly not prevented from duplicating these tests and drilling their pupils in taking them. Then some form or other of these tests is used at the end of the year to measure achievement and to make comparisons between classes within the same system. (qtd. in Cook 196)

Evaluating English Language Arts

For elementary language arts students, the evaluation of reading seemed a major focus during this period, whereas for secondary English students, the evaluation of literature got the most attention. Although all the language arts were included whenever objective evaluative criteria were presented, what seemed essentially missing during these years for both elementary and secondary levels was professional discussion of the evaluation of writing and of oral language.

In regard to reading, there was considerable recognition that tests in reading reflected particular theoretical definitions of what reading was. Frederick Davis in 1944 argued that reading tests needed to "formulate a definition of reading that can be accepted as adequate and accurate by authorities in the field" (181) and expressed regret that most current reading tests were "almost entirely tests of word knowledge and of the ability to comprehend the literal meaning of the separate statements in what is read" (187). Ten years later, however, a Reading Teacher article described an only slightly expanded list of the "main areas" of reading--"word recognition, vocabulary meanings, comprehension, rate of reading, study skills, special silent reading skills, oral reading, and interests and tastes" (Tinker 36).

Educators who depended on step-by-step methods for the teaching and testing of reading packaged by national publishers probably welcomed the scientific sound of "readability" formulas that emerged during the forties and were designed to determine by word counts and sentence length which reading texts were best for particular age levels.
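One readability formula of precisely this kind--offered here only as an illustration, since it is not among the tests and formulas discussed in the sources cited above--is Rudolf Flesch's 1948 "reading ease" score, which combined the two factors named, sentence length and word length:

    \[
    \text{Reading Ease} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}
    \]

Higher scores marked a passage as easier, so that texts could be matched to age or grade levels by computation alone.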
which reading texts were best for particular age Nels. Appearing even more impressively scientific were i la "mechanical aids" that emerged at about the same time—- hices with technical names like Etelebinocular,“ @thalmagraph," "metronoscope," and “tachistoscope” (N. i fith 303). With each machine, "words, phrases, or htences of content were flashed for recognition under ntrolled time allotments which could be decreased as the udent gained in ability to grasp them more quickly" (N. ith 303), thereby offering a whole range of new testing ssibilities and new reading measurements that would yield merical scores. The interest in applying technology to glish language arts evaluation continued into the fifties an teaching machines provided programmed instruction :used on "small bits of learning units" (N. Smith 402). Not all reading educators were swayed by the scientific aching and testing materials of the day, however. :hleen Hester’s Teaching Every Child to Read (1948, 2d. in 1955) offered an array of alternative reading lluation procedures. Teacher observation was considered >ecially important and included noticing students’ reading )its and tastes and recording the kinds of books read, the lunt of time spent reading, attitudes towards books, and extent to which students turned to books for information (385) "the obser luncl chec] appra Teaci so info Hest [emp near resi scie teat SUb: rear Cri Eng ato Hen ref Sit 75 385). Teachers could, according to Hester, also record the child’s social and emotional development" and tservations of "pupil behavior in the classroom, in the hnchroom, and on the playground" (386). They could use hecklists, pupil file folders, anecdotal records, self— ppraisal records, and interest inventories (386-87). eachers reading such a list might have felt overwhelmed by P comprehensive a plan, but perhaps some used these hformal measures and noticed a couple of pages later tster’s bold assertion that "standardized tests supplement emphasis added] the information obtained by the informal aans that have been suggested" (388). Secondary English teachers as a group seemed even more asistant than elementary teachers to applying so—called :ientific measurement to their curricula--especially to the aaching and learning of literature. Whether they lbscribed to Rosenblatt’s emphasis on the responses of :aders (Literature as Exploration, 1938) or to New 'iticism’s emphasis on close readings, many secondary (glish teachers seemed more aware of the dangers of omizing texts in order to make them testable. George nry's 1954 article, "Only Spirit Can Measure Spirit," flects such an attitude. Henry described a hypothetical tuation in which a fifteen-year—old student . . . comes to your desk and says, "You know, that captain in the Sea Wolf is the kind of fellow you wouldn’t like to meet in real life, to have as a friend, I mean. You’d no doubt hate to have him around you. Yet, when you read this book, you see inside him. Henr this scie Such as C sitt sine fine durj meae "Fur Pr0< Con: its and fon (l8 adv. gro 76 You know what makes him be the person he is. You begin to look at things his way, and you start to understand him. You sense his point of view, knowing the kind of fellow he is. You don’t agree with him, yet you appreciate his type." (181) nry posed the question for readers, "Why should we give is boy an objective test in English? In the name of ience?" 
Students, Henry suggested,

    have been so conditioned by our formalism in teaching that they themselves don't know when they are getting educated. . . . Hence their notion of what is 'good for them' lies in the complacency of routine, of text, of the number right and wrong, and of the final examination. (181)

Such statements ring true for many English teachers today, as did Henry's insistence that "the teacher must devise more situations soliciting ease and confidence and dignity and sincerity to reveal what is going on inside the student, not find better objective tests!" (181).

Those few who discussed evaluation of composition during this period felt less constrained by objective measures. Harry Greene and William Gray in 1946 mentioned "Functional Objectives . . . of Written Expression" but produced items which reflected merely the practical courtesies prescribed by the life experience curriculum, items such as "to use correct form and content in all social and business correspondence" (180), to "fill in certain forms and items of information as evidence of understanding" (181), "to write a telegram, notice, announcement, or advertisement" (182), and "to keep records and minutes of group meetings" (183). These objectives were then turned into classroom assignments which produced papers easy to grade for form and correctness, or they were turned into tests which were likely to be dominated by short-answer or fill-in-the-blank questions. Indeed, Greene and Gray suggested questions such as the following:

    Which of the . . . salutations
        Dear Sir:
        Dear Bob,
        Dear Mr. Snow:
        Gentlemen:
    would be suitable for use in writing a friendly letter to Mr. Robert Snow, an older business man? (181)

Students were given questions about writing rather than opportunities to write:

    Show for each of the following the proper material to be used in writing. Write the correct number on the line before each exercise:
    __ (a) A note to the grocer boy to put the meat in the icebox on the porch
    __ (b) An application for a position in an office
    __ (c) A note of thanks to your hostess for dinner
    __ (d) A formal note of regret
    __ (e) A cordial letter to a friend
        (1) Written on plain paper
        (2) Written in longhand with a pencil
        (3) Written in longhand with blue/black ink on plain note paper
        (4) Written in longhand with blue/black ink on tinted note paper
        (5) Written in longhand with brown or green ink on tinted paper
    (Greene and Gray 184)

College Board data cited in 1957 might provide insight about why relatively little discussion of composition evaluation appears. When high school English teachers were asked about the writing assignments they gave, their responses revealed the effects on writing assignments of such factors as "over-enrollment, competing activities, and even administrative pressure" (French 201). It seems likely that overloaded teachers of the day compensated by limiting the writing they asked students to do and by focusing instruction and evaluation on relatively trivial, but easily testable items.

Evaluation of oral language likewise received little attention, although the experience curriculum provided for many listening and speaking activities.
Greene and Gray suggested several objectives of oral expression that paralleled objectives for written composition--objectives such as "to greet others easily and courteously and efficiently" (177), "to give clear directions, explanations, or announcements" (179), and "to participate in conversation, group discussion, and meetings" (179). The authors pointed out that in many cases evaluation of such objectives involved simply noting whether or how well students behaved in social situations or asking how they would respond in a specific situation, e.g.,

    When answering a ringing telephone, which of these actions are to be preferred?
    (a) Lift the receiver and wait for the person to speak.
    (b) Say, "Hello."
    (c) Say, "Hello, this is Bill Smith."
    (d) Say, "The Smiths' house, Bill speaking."
    (e) Say, "Guess who this is." (177-78)

An NCTE study of language arts curriculum devoted a section of Language Arts for Today's Children (1954) to a discussion of the part that parents and even "other members of the community" might play in evaluation of student performance (393):

    If parents are to realize that growth in reading . . . progresses slowly through many and successive steps, that the child's desire to communicate is normally far ahead of his ability to spell the words he needs, that discriminating vocabulary grows out of real experiences and opportunity to talk about them, they must have some share in the establishing of such values for the school and in determining what are appropriate means of appraisal. (393)

Unfortunately, in the "Techniques of Appraisal" section that follows, the focus is entirely on appraisal that happens at school rather than in the home or community.

Whether parents were a part of the evaluation process or not, they have always been recipients of their children's progress reports. Rather than using simple numerical or letter grades, Hester suggested a more descriptive report, one that "considers each child as an individual and describes his work in terms of his own aptitudes and abilities" (391). Hester's progress report would describe the child's "personality and character development, his work and health habits, his attitudes toward parents, teachers, and classmates, his attainment in subject areas" (391). Furthermore, it would allow the child to help evaluate his or her own progress during a "cooperative planning period" and would inform parents of the child's "growth and development and [provide] information to assist the home in guiding the child in his future growth" (391).

Self-evaluation by students continued to be suggested occasionally. Robert Pooley and Robert Williams noticed with regret, however, in their Wisconsin study (1948) described below that

    In only 9 percent of the classes was there evidence that the class had in mind a set of standards by which the papers were to be judged and by which they could be checked before submission, and in only two classes had these standards been evolved from general discussion and hence become the standards of the class. (166)

One English Journal article put the whole issue of English language arts student performance into a perspective that went beyond the formal or informal measures and evaluation strategies so often suggested either prior to this time or after.
Henry, drawing on his experience as a high school principal and English teacher, demonstrated in his 1946 article, "An Attempt to Measure Ideals," what we today might refer to as reflective practice at its best--or at its worst. Clearly Henry was willing to rethink evaluation, to strip away all that he had been taught about how evaluation was supposed to be done and to depend on personal reflections of his own teaching experience. Although he spoke primarily about the teaching of literature, his thinking seemed applicable to all language arts and to other subjects as well. He confessed the desperation he had felt through his years of teaching, that "after what I considered to be a good moment of teaching literature I had no way of measuring it, or proving even to myself that it was good" (487). He wanted the reassurance sought by English teachers from every era: "as one teacher in that experiment in which nearly all Americans have placed their faith I wanted to be sure that I had no useless part" (487).

Henry sought affirmation of his teaching practices by revising the process by which he evaluated his students. First, he stopped giving formal tests in literature and devised "a scale of values to replace fixed quartiles" (488). He wanted "a measure that would be, above all, clear to the pupils, something that . . . measured the 'real,' not a trait isolated from the pupil's total humanity" (489). What emerged from his thinking and experimenting was a 20-item list, which he dittoed for his students and spent three hours of class time discussing. Each item was designed in two parts, the first expressing some truth about the "educated man" or setting forth a hypothetical situation and the second raising an evaluative question. His intention was to nudge students to think beyond obvious measures of performance and to encourage them to adopt his own ethical standards, as demonstrated in one item in which he asked students whether a poor performance on a task that had already been formally evaluated several months earlier might better reveal a student's actual performance than the original performance that had closely followed instructions (489). Similarly, in another item he asked essentially whether more weight should be given for an insightful book report than for one mechanically correct. Ultimately, he revealed in yet another item his belief that, as students were confronted with the "eternal questions of mankind" in the books they read, their own character should be improved and the character improvement should be considered when their teacher decided their grade in literature (490).

Henry's success in using these criteria surely depended on his ability to model the characteristics and habits he sought for his students, though he might be criticized for being unfairly judgmental. For example, he explained that

    The brightest pupil I ever taught was a rank egoist. The young Lord Byron protested at his "B's" as an affront to his "talent." I said: "Art is a social tool, not an outlet for exhibitionism. Until you learn that fact you will never be an 'A' pupil--or a writer." (492)

In the end, Henry admitted that the new measures he had designed had "added both liveliness and reality to the course [but] I cannot prove it, and now I am dissatisfied that I have yet no way of measuring the effectiveness of the measure! What good is it simply to say to a layman or to a fellow-teacher--minus chart, graph, or table--that now my classes have more 'spirit'?" (493).
Henry's candid statements reveal the inadequacy many English language arts teachers must have felt in the midst of a society that accepted only hard statistics as proof.

Henry's willingness to open up the evaluation and grading issue apparently went unrewarded by follow-up interest on the part of his colleagues. There was much in this unorthodox article that might have provoked a variety of responses--e.g., consideration of ways to measure students' affective responses--but unfortunately, it seemed to go largely unnoticed.

One other unusual test of student performance should be mentioned from this time period, for it involved testing students not only against their current peers but also against same-aged students from 20 years before. The February 1956 Elementary English reported a study conducted in Evanston, Illinois, in which the reading performance of students during 1952-4 was compared with the performance of those during 1932-4. Although it seems difficult to believe today, the researchers conducting the study were convinced that the community 20 years later was "comparatively stable" and that the pupils were similar "in most respects" (Miller and Lanton 91). Care was taken to give the tests on the same day of the month as had been done 20 years before, to give exactly the same test, and to reproduce the same testing conditions. Undoubtedly, school personnel and the community felt reassured to find that "present-day pupils attending Evanston schools at the primary, intermediate, and junior high-school levels read with more comprehension and understand the meaning of words better than did children who were enrolled in the same grades and schools more than two decades ago" (96)--though it seems unclear what use might be made of such knowledge, at least at the district level, other than as public relations material.

Evaluating English Language Arts Teaching Practices

English language arts teachers continued to be evaluated by the results of their students on standardized tests. As D. Smith discovered in her study of the New York Regents Tests, "Teachers believe, whether rightly or wrongly, that they may be dropped or retained at the end of the year according to the number of their pupils who pass or fail. Not uncommonly, when applying for a new position they furnish a so-called 'Regents' record' as one form of testimonial" (158).

Perhaps, then, English language arts teachers actually believed their students' scores were proof of their own success or failure. Or perhaps they pragmatically used the test data when it was to their advantage to do so. At any rate, teachers took the testing seriously and realized the personal stake they had in helping their students produce high scores. As in the case of the teachers mentioned earlier who were said to have used copies of standardized tests to teach from, teachers too often seemed willing to focus on high test scores as their most important goal. D. Smith similarly found that "teachers of New York state teach what they expect these examinations will test" and also that "[w]here materials are lacking, as they are in large numbers of schools, Regents' drill books and back copies of the examination become a major factor in the daily program at both the elementary and the secondary school level" (165).
Especially disturbed by these practices, Smith wondered whether money spent on testing--which produced nothing more than numerical scores--might not be better spent in "assisting schools in developing techniques for determining success or failure in reading and expression which go far beyond those of the average group examinations" (190-91). Interestingly enough, a few years later in 1948 Pooley and Williams found in their study of Wisconsin English language arts programs that, "although testing in language skills is by no means lacking, the results of the tests and the significance of the evaluations have not contributed greatly to the modification of content, methods, or materials in English teaching" (93-4). Still, they added a caution that formal testing not be used as "a form of supervision," since such practices tended to "promote the wrong kind of instruction" (96).

Unfortunately, those who conducted a state-wide study in Tennessee seemed unaware of such dangers. As many before them had done, they looked to the colleges to determine the success of their secondary curricula and teaching practices. The procedure used was to ask the colleges in the state to pool the results of their English placement tests and then to prepare a "helpful yearly report of the efficiency of the high-school training" for individual teachers and administrators (Hodges 72-73). However, this procedure contained a number of glaring inconsistencies that invalidated the results. For example, each college used its own placement test, which meant varying measures had been used. Although each student was originally "ranked only in relation to the other students in the particular college in which he [was] enrolled" (73), the school districts received reports showing the percentage rankings of each of their graduates--as if the percentages had the same base. The researchers rationalized by saying that "[a] student who stands first in one college would probably be among the best in another, the average student in one college would probably be among the average in another, and so on" (73).

Even more disturbing, however, is the degree to which the student scores were used as a measure of the effectiveness of teachers and curriculum. Although the results were prepared so as to make it impossible to compare colleges, each secondary teacher received a list of her or his own former students' scores and each principal received a copy of the list sent to each teacher, "showing both the average for each teacher and the general average for the school" (74). Superintendents and state officials received reports as well. Incredibly, the Tennessee Council of Teachers of English approved of this ranking and reporting and even made up a "yearly honor list" of schools. Hodges, a University of Tennessee English professor, noted that "the very fact that a school is not on one of the honor lists serves notice that the school is at best only average and gives it something to answer for in the community" (74). He explained that this procedure was designed to "improve English teaching" (75) and cited one school official as urging teachers whose students got low reports "to review critically their entire teaching procedures" (75).
Based entirely on students' test scores (actually, only on the scores of students who attended Tennessee colleges and universities) with no consideration given for varying teaching conditions or student populations, Hodges rather proudly concluded that "the yearly testing brings into clear relief the most effective English teachers. Two years of pooling placement tests from twenty-six colleges have made a score of teachers stand out like mountain peaks over the state" (75). Perhaps what is most striking in hindsight is that no one seemed able to see the flaws in these procedures.

Major Studies Evaluating English Language Arts Curricula and Programs

Extensive state-wide studies of English language arts resulted in the publication of books which may have served as guides to school administrators and teachers seeking to evaluate their own curricula and programs. D. Smith, the indefatigable leader of English language arts curricula during the 1940s, was commissioned to direct a study, mentioned earlier, of the New York Regents testing program, a study that resulted in the publication in 1941 of Evaluating Instruction in Secondary School English. Smith and her assistants sought to isolate "major issues" and to stress "chief problems discovered" (v). They visited schools, consulted with local officials and teachers, analyzed students' test results, and studied "reading diaries of boys and girls, together with the records of their attendance at motion pictures and their habits of listening to the radio" (2). Syllabi were studied, as were classroom and library equipment and book supplies. They analyzed the "continuity" of the program, the nature of instruction in the classroom, the "organization and supervision of the program as a whole, and especially the relationships of the offerings to community and individual needs" (2). Almost nothing seemed overlooked as they recorded and reported the amount of money each school spent for textbooks and the library, number of books borrowed from the state traveling library, class size, schedules, extracurricular duties, professional reading, and participation in professional organizations (6). Appropriately, they offered the caution still worth heeding today, that "no general inquiry can tap all the resources of the individual community. It can present only major problems which will repay further investigation by the localities themselves" (9). Smith expressed the hope that if their study could

    . . . stimulate careful consideration of areas of strength and weakness as revealed by the Inquiry on the part of local authorities, who are in a position to make much more intelligent and detailed study of conditions within the individual community, it will have been abundantly worth while. (9)

A somewhat similar state-wide study was described by Pooley and Williams in The Teaching of English in Wisconsin, published in 1948. In this case, Pooley and Williams surveyed the teaching methods and instructional materials for English language arts in the elementary and secondary schools (144-45). Rather bravely, they began the report of their study by listing complaints commonly made against English and its teaching:

    1. The results of English instruction fail to justify the amount of time allotted to the subject in most elementary and high school courses of study.
    2. English instruction often fails to turn out pupils who can speak effectively.
    3. English teachers do not succeed in interesting their pupils or challenging them to make satisfactory progress.
    4. English courses of study are traditional and dull, and unrelated to the actual needs of the pupil.
    5. English instruction is ineffective (a) because it includes too much grammar; (b) because it does not include enough formal grammar.
    6. The English curriculum is in chaos; no one knows what to teach or in what grade to teach it. (3)

Pooley and Williams's plan was to determine "how much truth there [was] in each charge, and the extent to which English instruction suffer[ed] specifically from defects in curriculum organization, instructional materials, and instructional methods" and to discover "how far the criticisms are refuted by the evidence of competent teaching and positive results" (3). The answers to these questions were sought by preparing a detailed analysis of the problem and planning questionnaires and personal visits involving a variety of persons and materials, such as teachers, courses of study, textbooks and reference books, basic elements of instruction, and classroom methods (4-6). After the researchers studied questionnaires and made over 900 classroom observations, the results were compiled in a book filled with statistical tables calculating everything from salaries of high school English teachers (122) to "Grade Placement of Items of Capitalization in Courses of Study in Elementary Schools of Six Cities" (44). The fact that the Wisconsin study did not test students led the authors to feel that teacher-participants in the study were less defensive and that school visits created good will (93).

Another extensive experiment and study was conducted by the Stanford School of Education (financed by the General Education Board). This massive project involved 10,000 students and 150 teachers and administrators in secondary schools and led to the eventual publication of three books, one, focused on English, titled English for Social Living. During the three-year experiment teachers of English and foreign languages were charged to find ways to improve student growth in their classes and were encouraged along with the students to "create and to grow according to their own best thought" and to "exercise their freedom" ("Stanford" 119). Furthermore, provision was made for summer programs on Stanford's campus in order for teachers to work cooperatively with those from other school districts, always seeking to "observe the results of centering work in English and foreign languages upon the personal and social welfare of young people, conceived within the democratic framework of a creative Americanism" (119). The criteria by which each of the 50 programs studied would be measured are as follows: "What is the effect of this material on the young people with whom it is used? Does it help them develop confident, vigorous ability in all aspects of communication? Does its use promote the mental health of each boy and girl and of society?" (120).

Perhaps as interesting as the project itself was the evaluation of the project that took place in the pages of the English Journal, which published a series of reviews of the books published, especially the book focused on English.
One reviewer, Max Herzberg, a high school principal and former NCTE president, devised and answered his own loaded questions, such as, "Have pupils been given not merely the artistic ('creative') point of view but also that of the scientist?" and "Has the whole range of American education been covered, including that impasse, the academic college?" ("Stanford" 121-22). Herzberg essentially praised the project, pointing out among other reasons that the students "as a result of the novel procedures . . . employed, came much closer to a balanced equation of ability and achievement than in the traditional classroom" ("Stanford" 122). On the other hand, another reviewer, an English education professor named Pendleton, dismissed the books as "radical pronouncements by college professors of education, bolstered by classroom suggestions written by controlled teachers" ("Stanford" 125). Further, he complained that the books reflected "the New Deal of present politics and . . . the views of the group now dominating the National Council" ("Stanford" 126), views he would reject because they neglected "all subject matter, all conformity to the environing world, and all careful study of masterpieces and of history" (126). Regardless of how the readers of the reviews responded, the inclusion of such differing viewpoints suggests a healthy determination on the part of the editor to provide the open forum that earlier editorial policies had outlined.

Although frequently recommendations for evaluating English language arts grew out of large research projects, a number of educators offered suggestions which seemed to grow out of their own professional experience and theoretical contemplations, especially suggestions by which schools could evaluate their overall language arts programs. For example, in 1954 John DeBoer, a University of Illinois Professor of Education, published a list of characteristics of "modern" programs which could serve as criteria for the evaluation of elementary language arts instruction (485). According to DeBoer, the modern school does all of the following:

    o expects the child to read only when he is ready to read
    o provides in the classroom many attractive books, magazines, and other reading materials suited to many interests and levels of ability
    o has an attractive, well stocked central library
    o systematically undertakes to cultivate wide and varied reading interests in children
    o makes clinical facilities available to disabled readers
    o provides an abundance and variety of direct experiences
    o makes effective use of many kinds of audio-visual aids
    o takes account of modern media of mass communications
    o undertakes to cultivate the child's love of poetry
    o undertakes to cultivate the child's gift for creative expression
    o provides abundant opportunity for the oral sharing of ideas and experiences
    o develops skill in written communication through well-motivated experiences in actual communication (485-92)

Occasionally, during these years there were discussions of how evaluation programs themselves should be evaluated, and again, criteria for evaluation seemed to be offered on the basis of the writer's personal and professional opinions.
In 1944 Walter Cook, for example, listed criteria for an "adequate" evaluation program, including such expected items as "the evaluation instruments should tend to reveal to the learner clearly and in detail the inadequacies of his performance" but also items that sound more current today, such as "the program should be based on the fact that the most effective evaluation . . . is that which is carried on by the learner" and "evaluation instruments should be available to the teacher and learner whenever the learning situation requires them and not according to the calendar" (198).

Pooley would later observe that the English situation in 1950 was a time of "plenty of theory," and such articles as the ones described above attest to that fact (498). Pooley also observed, however, that 1950 was a time of great "need of practical common sense" (498). Clearly the years 1941-1957 were a time of stretching beyond standardized and objective tests and a time of looking beyond local circumstances. By 1957, however, English educators were jolted by Flesch's Why Johnny Can't Read and by Sputnik and found themselves and their curricula facing considerable criticism--especially charges of anti-intellectualism (A. Applebee, Tradition 188) leveled against the life experience curriculum. These circumstances would soon lead to reconceptualizing and re-evaluating on many levels.

This was, then, a time of questioning, of weighing the merits of testing, though even those who sensed an inadequacy about tests and test scores felt helpless to know very clearly what might be better. It was a time when test scores were commonly used to make important decisions about placement and promotion that sometimes worked against students' best interests. It was a time when educators admitted openly that testing drove curriculum and teaching practices and a time when teachers' evaluations were explicitly linked to student test scores. There seemed a growing understanding of the control, especially the external control, that testing could have on the lives of students and teachers, who had the most to win and the most to lose.

CHAPTER FIVE

RECONSIDERING ENGLISH LANGUAGE ARTS EVALUATION: 1958-1969

English educators in the "Sputnik Age" predicted a time which would "undoubtedly bring rigorous examination of the school program" (D. Smith, "Re-establishing" 317) and which would find English teachers seeking to define "what constitutes growth in the various aspects of English" (Smith 326). The year 1958 saw English educators at the Basic Issues Conference advocate a greater focus on content (A. Applebee, Tradition 193-94), while that same year B. F. Skinner described teaching machines and programmed learning designed to break content down into tiny bits of information to be sequentially presented (Science, October 1958). These twelve years became, then, a time of self-conscious taking stock among English language arts educators, a time of evaluating previously-held basic assumptions and proposing innovations to be tested.

It is no surprise that articles appeared with titles like "The Teaching of English in the Soviet Middle School" (English Journal 1959), "Why Ivan Can Read" (Elementary English 1962), and "How Russian Children Learn to Read" (Reading Teacher 1959), which self-consciously asked, "Are there features of the Russian system that might be adopted in our own schools, or methods that would be definitely disadvantageous?"
Such questions suggested a global comparison, at least in this country, as we wondered how we measured up against another power that challenged our sense of technological superiority.

There was considerable discussion and questioning about testing itself, such as Ralph Tyler's explanation that tests in the past had emphasized measurement and "reflected the content of teaching materials," but that they were more recently thought of as "a series of situations which call forth from the student the kind of behavior defined in the objective and permit a record to be made of the student's actual behavior" (6). Somewhat similarly, Warren Findley in a 1963 article contrasted disenchantment with standardized tests of the past with newer efforts to "promote and measure a balanced set of educational objectives, including ability to use or apply knowledge" (1). By 1969 the emphasis was clearly on using behavioral objectives both to shape spiral curricula and to evaluate them. Accordingly, professional English language arts publications generated articles with titles such as "Objectives for Language Arts in Nongraded Schools" (Elementary English 1969) and "Selected Objectives in the English Language Arts (Pre-K through 12)" (Elementary English 1969).

Phillip Jackson called attention to the danger that the general public, and even most English language arts teachers as well, fell into when interpreting test scores, especially in the case of norm-referenced tests: "Rather than being viewed as convenient symbols which summarize an individual's performance in a most crude fashion, test scores come to be seen as something the individual 'has'" (28). Thus, test scores, which were widely used during these years to categorize students for purposes of ability grouping, had the effect of labeling students in ways that often created lifelong scars. Unfortunately, few English language arts teachers or parents seemed to heed Eleanor McKey's metaphorical admonition: "Let us use the standard test as we use our watches, always mindful of the fact that they may be a little fast or a little slow, but they are, nevertheless, more reliable and accurate than a glance at the sun" (611).

Evaluating English Language Arts

In comparison to some earlier periods, evaluation of individual student performance received less attention in professional publications of this period. There were occasional articles about evaluating understanding of literature, but they seemed to repeat what had been said before. Dwight Burton, for example, in his 1959 text, offered evaluative criteria that reached beyond the classroom, such as determining how much voluntary reading students did, but also suggested using objective tests to measure literary knowledge and ability to comprehend literary material (251-52). He further recommended tests of literary "taste" as well as informal classroom evaluation methods, such as teacher observation, interest inventories, and attitude scales (256-57).

Similarly, discussions of reading evaluation offered more of the same long listing of activities that had been suggested earlier.
Mary Austin reported in 1958 that many schools were using the following means of evaluation for reading--standardized reading achievement tests; informal reading surveys; diagnostic procedures; observations; individual conferences; inventories of reading skills, interests, and study habits; teacher-constructed tests; tests of pupils' ability to locate reference materials; records of students' independent reading; and year-end testing (36). In spite of these suggestions, however, surely most evaluation of reading during this period occurred essentially within the context of published basal reading materials, which were used almost universally in American schools by the 1960s (Goodman et al., Report 24). Within the basal system both the instructional materials and the tests were developed by the same publisher, creating a circularity that would go relatively unchallenged for several years to come.

As reading tests were subjected to a variety of tests themselves through research, their shortcomings continued to be aired in professional publications. Roger Lennon, for instance, in 1962 reported that

    Studies agree that most of the measurable variance in tests of reading competence, however varied the tests entering into the determination, can be accounted for in terms of a fairly small number of factors. . . . It seems entirely clear that numerous superficially discrete reading skills to which separate names or titles have been attached are, in fact, so closely related, as far as any test results reveal, that we must consider them virtually identical. (333)

Articles in professional publications paid relatively more attention during these years to discussion of evaluation of writing. What is striking about some of the publications is that their authors seemed willing to notice strengths in student writing, rather than simply assuming a deficit attitude when evaluating student writers. For example, Ruth Strickland's 1960 article evaluating elementary children's writing spoke in terms of focusing on the "growth of an individual child from day to day and from year to year" (322). Evaluation data could be gathered by means of anecdotal records, self-evaluation (323), student folders (329), and sentence analysis of writing samples. Beyond evaluation of individual student progress, Strickland also recommended the evaluation of "growth of the class as a whole with comparisons among classes . . . quality of composition within an entire school . . . methods of teaching writing . . . [and] periodic evaluation of the total curriculum in writing within grade levels" (322).

The task of evaluating the writing of secondary students was addressed by the Association of English Teachers of Western Pennsylvania, which published two undated pamphlets (though bibliographic entries indicate publication after 1958). Both the junior high and senior high booklets included several student themes--as had articles about composition scales back in the 1920s. Each composition had been corrected and evaluated, and each was reproduced along with handwritten marginal and in-text comments, as well as a one-to-three-paragraph concluding comment to the student and a somewhat longer note to the student's teacher from the evaluators. The evaluators' comments ranged from direct advice to suggestions phrased as questions to descriptive praise and criticism.
The pamphlets also included discussion of practical evaluation methods which acknowledged the need for manageable paper loads for English teachers. The writers of the senior high booklet conceded, for example, that

    Teachers in the classroom will certainly be aware that the comments on the papers that follow are generally more extensive than they can afford to make on the hundreds of papers they must grade. . . . The sooner the general public, together with school boards and school administrators, realize the time and effort that a good composition program requires, the nearer we will be to a genuinely realistic understanding of the demands made upon the English teacher. (Suggestions . . . Senior High 3)

While composition scales 25 years earlier had been used to measure individual students' writing, the editors of the Pennsylvania pamphlets expressed the hope that their texts would be used as a "focal point for discussion rather than as an arbitrary set of standards" and especially recommended that English teachers meet within their own building to discuss the materials in the pamphlets (Suggestions . . . 2).

When A Guide for Evaluating Student Composition (edited by Sister M. Judine) was published by NCTE in 1965, it may have seemed like a landmark work, for it pulled together 25 articles related to evaluating composition, many of which had originally appeared in state and regional publications during the 1950s and early 1960s. It included an excerpt from the Pennsylvania pamphlets mentioned above as well as recently designed rating scales, a defense of praise of student work, and practical articles about "managing" student writing. Even though the articles in this book seemed not to reflect any particular approach to the teaching of composition, they served to nudge readers to consider composition evaluation from a variety of perspectives. The fact that the book was still being printed ten years later indicates the extent of its perceived usefulness.

In spite of evaluation approaches that seemed to reflect a more current attitude toward the teaching of writing, Hook's 1961 report provided evidence that revealed how far removed current theory was from prevailing composition classroom practices. Citing responses from over 700 secondary English department heads, Hook explained that more respondents reported spending the most time on "study of functional grammar, with exercises intended more to teach application than to teach identification" than on "writing by students and discussion of what they write, along with discussion of professional authors' techniques" ("Characteristics" 12).

Evaluation of oral language also got some attention during these years as well, with one 1958 article by Marian Zollinger and Mildred Dawson suggesting that teachers plot flow charts of their class discussions (Figure 3).

[Figure 3 - Participation in Discussion]

Sara Lundsteen, however, sought more of a "scientific, systematic, and developmental approach" to the teaching and testing of listening and seemed primarily interested in fitting listening into a scheme of behavioral objectives (747). Eighteen critical listening lessons were developed and used, with students being given both pre- and post-tests. Given orally, the 79-item test measured "detection of the speaker's purpose, analysis and judgment of propaganda and arguments" (745).
D. W. Kopp's 1967 Elementary English article cited a "dearth of standardized tests of oral communication skills and abilities," and recommended that rating scales, tape recordings, observations, and even "teacher-pupil-made tests" could be used to emphasize the improvement by each individual child (Kopp 120).

When there was discussion of informal classroom evaluation of English language arts, it seemed to focus primarily on the individual instructional needs of students. Paul Burns, for example, published a book called Diagnostic Teaching of Language Arts (1967) which described in enough detail how anecdotal records might be used that readers might actually have been able to follow his suggestions:

    There are many ways to maintain such records: one possibility is the use of loose-leaf notebook or ring binder that accommodates full-sized sheets, each child's name being put on a tab so his pages can be found quickly. The purpose of the record is mainly to note learnings the child has achieved and those he has yet to acquire. (10)

Another 1967 text advocated the use of teacher-pupil conferences for the teaching of reading, providing a range of questions that could focus on the appropriateness of a book (e.g., "Why did you choose this particular book?"), on appreciation of a book (e.g., "What was it about this book that made it good?"), and on values gained from a book (e.g., "Did something happen in the book which you would like to have happen to you?") (Hunt 111).

For more traditional test-makers, NCTE published in 1963 a booklet entitled Building Better English Tests, which was intended to serve as a corrective to existing faulty testing practices and to help new English teachers avoid common testing mistakes. It led teachers through the process of planning a test, selecting questions, and building short-answer and essay questions--so that teachers could become more skilled practitioners of "the art--as well as the science--of testing" (Carruthers 5). Ultimately, however, external testing seemed to grow more important during these years than either traditional or alternative classroom evaluation measures. Aided by the guidance program initiated as a part of the National Defense Education Act (NDEA) of 1958, high schools gained a staff person whose job description usually included primary attention to administration and interpretation of standardized tests (Findley 2).

The topic of external evaluation emerged as a controversial issue at the 1966 Dartmouth Conference, as discussed in Herbert Muller's book describing the workings of the conference. According to Muller, when Alan Purves raised the possibility of "national assessment examinations," the British contingent at the conference "expressed shock," since they had "thought that America was in this respect an Eden, untouched by the curse of external examinations" (158).
When the British called for an emphatic statement condemning external examinations and expressed the opinion that "no issue was more vital, no recommendation more urgently needed" (158), Purves and colleagues apparently responded by expressing sympathy with the denunciation of such a rigid examination system as the British contended with but also "pointed out that the situation in this country is quite different and does not call for such a manifesto." Purves asserted that even if they wanted to do so, "there is no one constituted authority in America to address a ringing denunciation to" and persuasively insisted that such an action was probably unnecessary since "the proposal of national assessment examinations is meeting strong opposition even though they would not affect the standing of students in the schools" (159). Purves's words sound especially ironic now in light of national and state-wide testing programs that have developed in recent years. Although the Dartmouth Conference participants did call for a systematic review of examinations and grading, it is difficult today not to believe that a more strongly worded statement--and warning--was needed.

By the end of the 1960s there was in fact considerable discussion of and reaction to proposals for more testing and national assessments. Willard Congreve's article of 1968, for example, insisted that "lack of appropriate evaluation is undoubtedly one of the greatest weaknesses in the entire field of education today" (307). Although Clarence Derrick acknowledged that many saw as a problem the "paucity of 'national' essay tests," believing that "if someone doesn't test it, teachers won't teach it, and students won't learn it" (496), nevertheless, he called on readers to "renounce the hope of any kind of testing of writing on large-scale national tests" (499), contending that such tests would not yield reliable scores or be economically feasible (496). When a 1969 Journal of Reading article raised the question "What Can We Expect from a National Assessment in Reading?" the discussion about large-scale testing seemed to be settled, for the first sentence unequivocally stated, "A national assessment of education has begun" (Shafer 3). Rather than argue the merits or disadvantages of national assessment, Robert Shafer offered cautions about the reporting of assessment results:

    1. No score is to be derived for an individual since each individual will receive only a portion of exercises in the various fields being assessed in his age group.
    2. Individuals are not to be ranked in the reporting of results since the assessment is to describe groups and not individuals.
    3. Each exercise must stand alone in the assessment; it would not be submerged as part of a test. Therefore, each item must be independently defensible in terms of the objectives and capable of being reported on as to the percentage of people answering it correctly. (8)

Although these were important cautions which have been heeded in National Assessment of Educational Progress (NAEP) testing, they have often gone unheeded with the development of state-wide testing, as later discussion will indicate. Shafer's prophetic warning about the potential greatest danger of national reading assessment seems even now applicable to all English language arts testing:
    Perhaps the greatest danger . . . may be found in the pleas of many who, after the results become public, will wish to restrict the curriculum to those objectives and specific areas which were included in the assessment and which they feel can successfully be measured. A further danger will be that what is considered difficult to assess will not be considered as worth having. (54)

Evaluating English Language Arts Teaching Practices and Curricula

Teaching practices and curricula seemed to be considered more directly measurable during this period, as identifiable behaviors were being suggested as proof of teacher and curriculum effectiveness. Another dimension of the evaluation of teaching practices was revealed, however, in discussions of who would bear what responsibility for evaluation. For example, an NCTE Commission on the Curriculum raised the provocative question, "Do the English teachers themselves establish the criteria and standards for judging each other's performance?" (282). Elizabeth Howard proposed that not only should teachers' performance be used to judge a reading program but that administrators' effectiveness should be an important criterion as well. She outlined program responsibilities that could be handled by various administrators, including superintendents, principals, and "supervisors of appraisal." In each case, she suggested that administrators play a supporting role--e.g., interpreting test results for the community and working with staff in planning and evaluating (170-73).

The issue of evaluation of English language arts teaching practices was also addressed during these years in connection with discussion of overall program evaluation. Although James Squire and Roger Applebee's 1966 study of successful English programs did not include direct teacher evaluation criteria, the report did present a variety of significant and insignificant facts about English teachers, comparing questionnaire responses of those whom project observers had identified as outstanding with those of other teachers (348). As might be expected, the outstanding teachers had more experience, spent more time reading and writing, and were more involved in professionally related activities than were the "general" group. Curiously, Squire and Applebee provided such puzzling details as the fact that this same group of outstanding teachers spent less time listening to music but more time in part-time employment than did the general group (348-51).

Another kind of evaluation of English language arts teaching practices and curricula took place through the opinions expressed in the professional journals. The English Journal continued to publish articles, for example, which reflected college judgments about how English language arts programs should be taught at all levels. Articles with titles such as "What the Colleges Expect" (A Report of the NCTE Committee on High School College Articulation, 1961) sometimes had the effect of negatively evaluating, or at least patronizing, secondary teaching practices by advising high school English teachers, for instance, not to weaken college-preparatory courses "by including units on social conversation, telephone manners, senior problems, or any other matters related only vaguely to . . . teaching language, composition, and literature" (403). If turnabout is fair play, a later article (1962) reported another study by the same NCTE committee, in this case a study of college freshman English courses.
Perhaps secondary English teachers felt some small comfort when, having suffered college criticism for so long, they were told that "there are quite as many things wrong with freshman English in college as with English in the high school" (178).

Clearly during 1958-1969 the English language arts evaluation issue that received the most attention and effort, as reflected in professional publications, was that of large-scale evaluation of English language arts programs and practices. Beginning in 1960 NCTE had become involved in program evaluation through the formation of its Committee to Review Curriculum Guides. The Committee provided a review service, reported trends in "curriculum and guide-making" to the profession, and selected guides for display at NCTE Conventions (NCTE Committee to Review 891). In "Trends in Curriculum Guides," they published a checklist used by the Committee as they evaluated curriculum guides that could serve as a check for local school districts as well (NCTE Committee to Review 895-97). Another NCTE group, the Commission on the Curriculum, also focused attention on curriculum evaluation when they published "A Check List for Evaluating the English Program in the Junior and Senior High School" (1962). They too provided questions that could "lead local school faculties to the thorough examination of their programs from which all improvement ultimately must stem" (273).

Several researchers of this period and of future periods sought to analyze the characteristics of successful English language arts programs that had been identified in a variety of fairly unsystematic ways. Often these studies were conducted nationwide with the hope of deriving descriptions of--and prescriptions for--success which could be imitated or adapted by programs and districts throughout the country.

Arno Jewett's 1959 English Language Arts in American High Schools is one such study. In this study, published by the U.S. Department of Health, Education, and Welfare, researchers analyzed 285 courses of study from every part of the country and reported on "promising practices in language arts" gleaned from the courses of study. In addition, a survey was sent to school district administrators and instructional leaders, teacher educators, and selected members of NCTE--all of whom were asked about the processes used to develop English language arts curricula (5). Though not working with a representative sample, the researchers eventually compiled a "list of principles" that seemed to be effective in the development and revision of courses of study, along with techniques used by administrators and curriculum directors that seemed to produce the desired results. Included were the following process recommendations that may have served as guidelines for school districts interested in reform:

    1. Through a schoolwide survey, discover the curricular problems teachers are concerned about; then, focus attention on a few major problems that they have in common.
    2. If necessary, use the broken-front approach--that is, first involve those persons who are most interested in studying and changing the curricular program; then, as they move ahead, encourage others to join them. Avoid high-pressure methods.
    3. Focus attention on a few major problems rather than many minor ones.
    4. Provide necessary books, instructional resources, consultants, clerical help, etc., and an adequate
       budget to enable the curriculum committee to do its job.
    5. Help teachers and others to see the total role of all participants and to understand their own job in the entire undertaking.
    6. Keep the attention of the working group focused on: a. what is being accomplished, and b. what remains to be done.
    7. Have a long-range program.
    8. Involve in the curriculum work the persons to be affected by the changes recommended. (16)

Hook reported in 1961 information gathered from a questionnaire to approximately 800 secondary schools which had been winners and runners-up of NCTE Achievement Awards in an effort to discover the distinguishing characteristics of these schools ("Characteristics" 9). Hook's hope was that any notable similarities "might provide useful hints concerning curricular and other practices that apparently contribute to the development of especially able students" ("Characteristics" 9). Questions were asked about class size, amount of time spent on extra-curricular activities, amount of writing assigned, degrees of teachers, time spent on literature, etc. In an attempt to make his report more useful to teachers, he also provided a checklist by which readers could compare their own schools' responses with the characteristics of the award-winning schools. Hook then explicitly encouraged readers to discuss their findings in a department meeting and present significant findings and recommendations to district administrators ("Characteristics" 13).

Such reports sometimes had further dramatic impact in that they created awareness of prevailing national strengths and weaknesses. Squire's 1962 reflections on the influence of The National Interest and the Teaching of English (1961) report indicate that it raised "disturbing questions about teacher preparation in English, teaching conditions, and existing elementary and secondary programs" and called for "vigorous professional leadership at the local, regional, and national levels to improve the total profession" (Squire 381). The result, Squire insisted, was "energetic reappraisal" within the profession. Even more significant, however, was the impact this particular report had outside the profession. Described by A. Applebee as "a direct and shrewd presentation of the importance of English to the national welfare, coupled with a startling documentation of instructional inadequacies," the report was eventually distributed to all members of Congress and to "other influential government figures" (A. Applebee, Tradition 199-200). This report, coupled with the subsequent publication of The National Interest and the Continuing Education of Teachers of English in 1964, with its startling evaluation of the profession, provided documentation needed to help convince Congress to broaden the NDEA to include funds for English (A. Applebee, Tradition 201).

Squire and Applebee's 1966 report, A Study of English Programs in Selected High Schools, again examined successful programs and built on the results of previous studies as it did so. Sponsored by the U.S. Department of Education, this study sought to discover how "stronger schools" were achieving important results in English and to identify the characteristics of what were deemed to be superior English programs which might be emulated in other schools (1).
Studying a total of 158 schools which consistently produced Achievement Award winners along with "comparable schools with highly regarded programs in English" (3-4), Squire and Applebee used in their study classroom observation, individual and departmental interviews, group meetings with teachers and students, questionnaires and checklists (4). However, they also drew upon the criteria developed and the results attained from earlier studies, including the award-winning characteristics devised by Hook, the checklist of characteristics of junior and senior high English programs developed by the NCTE Commission on the English Curriculum, and reports and recommendations from other committees, commissions, and publications of NCTE and other groups (4-5). Using fifteen separate instruments designed for the study (23-25), they studied everything from "Type of Final Examination and Relative Percentages of Content Therein" (322) to "Ways in Which English Departments Would Most Likely Spend Supplementary Funds" (138), eventually compiling an exhaustive 601-page report.

Eventually English educators came to realize the need not just to evaluate existing programs but to plan for program evaluation from the time new programs were designed. The need to make evaluation a part of curriculum reform movements and the need to handle the evaluation process carefully were pointed out by Michael Shugrue in his reflections on the Project English centers that were developed with NDEA funds. Apparently the materials that had been designed by the Project English centers were published just a few at a time rather than all in a group, and perhaps more significantly, "some professional journals . . . published premature reviews of the Center Curriculums" (43), with the effect that judgments were made quickly about the entire project based on the small sample of materials published early on. Shugrue explained that, "Disturbingly, some teachers . . . dismissed the work of the Centers as too content-oriented or not sufficiently novel before they had seen more than a small fraction of the rich variety of units being produced in more than twenty Centers" (43).

A. Applebee pointed to a need for careful empirical evaluation of results of new projects and programs. As he reviewed what he saw as the unsystematic evaluation of the work of the Project English sites, he issued the reminder that teachers' impressions about curriculum reform are "almost inevitably highly positive" because they are based on "the excitement and stimulation inherent in the process of change itself" (214). He conceded, however, that

The kind of careful documentation of long-term results that had marked the Eight-Year Study was simply beyond the ken of most of the staff involved in these efforts. The result was a mountain of essentially untested materials which no one really knew what to do with. Very few of the centers admitted to any failures, but very few carried on the kind of studies that would have told them if they had failed. (Tradition 214)

Although readers today could point to greater acceptance of and respect for classroom research, Applebee perhaps offers an insightful caution to include more timeless assessment measures which go beyond the personal testimony of new program participants.
One of the more significant features of this era of English language arts evaluation is that, to a great degree, the evaluation that mattered most was essentially taken out of the hands of classroom teachers. This was especially true for elementary language arts teachers whose curriculum and teaching practices depended on the basal reader materials used in the district. Published test materials were purchased along with the readers, and teachers were expected to use them and indeed to rely on them as accurate measures of student performance. For both elementary and secondary English language arts classroom teachers, nationally-normed, standardized tests were a routine part of their school year, to be followed soon by state and national tests as well. Although a variety of alternative classroom evaluation ideas were suggested during these years, any use made of them in the classroom involved an extra commitment on the teacher's part that seldom seemed justified as long as the testing stakes rested on the standardized tests and their scores.

CHAPTER SIX

EXPANDED TESTING AND ALTERNATIVES: 1970-1987

From Testing as Measurement to Testing as Management

During the early 1970s the money that had flowed rather freely during the 1960s for English language arts experimentation and expansion began to dry up or possibly be diverted to the Vietnam War. Concurrently, the country's fascination with behaviorist psychology had resulted in a new focus on accountability, as reflected in emerging systems approaches that promised cost-efficient solutions to a variety of bureaucratic problems. The demand for accountability "changed the role of measurement and made it more and more central in the management of education" (NCTE, Common 2). So pervasive was this movement that many English teachers and professional leaders found themselves caught up in it one way or another.

The systems approach to the teaching of English language arts was intended to include pre-testing, programmed individualized instruction, and post-testing (Maxwell and Tovatt 11). Such activities were to be based on a list of predetermined objectives derived from observable student behaviors. As teachers and administrators realized that any given classroom activity might involve many objectives, the task of writing objectives to cover the entire English language arts curriculum became overwhelming. Indeed, in one project reading specialists worked a year and a half to develop "an initial list of more than 1,200 behavior objectives," a task which was described as "only the beginning" (Evans 269). Eventually professional and commercial groups assumed the objective-writing task and established "banks" of objectives which could be distributed on order to schools and school districts (Brett 43).

Convinced that "if English experts do not do the work, it may be done less well by others" (Hook, "Tri-University" 76), those who were a part of the Tri-University Behavioral Objectives for English project explained in 1970 their intention to develop a preliminary catalog of objectives, to field test it, then to publish "for the information and use of the profession" a catalog of English objectives for grades 9-12 (Hook 86). That same year another group, the Instructional Objectives Exchange, published English Skills 2:2, which contained 76 objectives and related evaluation items organized into categories for speech, composition, diction and tone, etc.
For example, Objective 8, listed in the "major category" of composition and the "sub-category" of paragraph, is as follows:

OBJECTIVE: The student will write a paragraph using identification as the method of exposition. The paragraph will conform to pre-specified criteria. These criteria are:

1. The paragraph will have a topic sentence to which the other sentences in the paragraph are related.

2. It will be free from gross spelling, mechanical, or structural errors.

3. It will use identification as the process for the development of the subject.

4. It will be as long as the teacher specifies. (10)

The following year Hook et al. published, as promised, Representative Performance Objectives for High School English, which provided objectives--now sporting the new label "performance" rather than "behavioral"--and rationales (5).

Those who welcomed the use of behavioral objectives to develop curricula predicted an orderliness and precision that school administrators and board members must have found appealing. They might, for example, be provided data such as the following on which to base curriculum decisions:

The voice and diction improvement program, using electronic laboratory tape cartridges, has cost $10,000 this year. It has served 800 students, 80 percent of whom have shown marked improvement in their voice quality and diction. Data sheets supporting this conclusion are attached. (Brett 45)

Such accounting would require special expertise, as Jewett explained in a 1971 English Education article. After baseline data had been gathered and pupil performance objectives had been written, an "independent educational auditor" could be called to assist:

He looks at the base-line data to see whether they are reliable, valid, and comprehensive in nature. He looks at the pretests and other pre-evaluation procedures to determine whether they measure what they are intended to measure. Later in the term he looks at interim and posttests and other evaluation procedures to determine whether they measure progress toward attainment of the performance objectives. (10)

In response to those who might question the power such a person might wield over an English program, Jewett was quick to point out that the auditor would not set the standards in English, and he would "not select tests or set the objectives for the course, although he might point out to the teacher or the project director that certain objectives are vague, nebulous, or unmeasurable or that other tests are needed" (11).

In 1969 NCTE had passed a resolution urging caution in the use of behavioral objectives in the teaching of English, and Maxwell and Tovatt's book published in 1970 indeed reflected caution. Although the early chapters provided a fictional scenario depicting both sides of the debate, even more persuasive may have been the later chapters written by individuals with varying attitudes toward behavioral objectives. Some contributors denounced the use of behavioral objectives and the pseudo-scientific approach to teaching and learning. Purves, for example, reminded readers that describing behavior involved "only a small part of what is going on when people read and respond to literature, when they generate utterances, and when they compose their conceptualizations" ("Measure" 96).
James Moffett saw broader concerns as well, including the "unintended mischief that will almost surely result from publishing behavioral goals, and the bad precedent set for future relations between government and education" ("Misbehaviorist" 111). Readers who respected Moffett must have been startled and influenced when at the end of the chapter he publicly withdrew from the Tri-University Project (116). The issue, nevertheless, continued to be hotly debated in most professional journals in articles carrying such titles as "Behavioral Objectives?--No" (Ferguson 52).

Evaluating English Language Arts

The whole matter of behavioral objectives was an issue that consumed an enormous amount of attention and energy among English educators of the time, but other issues related to the testing of students emerged or reappeared as worrisome concerns as well. While the country's attention was focused on the need for equality and for individual rights, for example, English educators considered anew what was fair to students and the extent to which their tests might be considered objective. Purves provided the simple explanation that "[t]he objective nature of the test is that a machine or clerk can be programmed to see if the test taker has chosen the answer that the testmaker selected as correct. The testmaker's judgment, of course, is subjective" ("Evaluating" 235). Similarly, Kirschenbaum et al. insisted that when classroom teachers create objective tests, every question is "selected in a subjective fashion by a teacher with certain pet interests" (197). The publication of their Wad-ja-get? (1971) raised further questions about the accuracy and legitimacy of student evaluation, especially grading, that led many English teachers to consciously monitor their own practices. In addition to cataloging the problems associated with grades and testing, these authors offered a range of alternatives to traditional grading, such as self-evaluation, pass/fail grading, and contract systems. Such alternatives were subsequently tried and discussed in a number of professional journal articles and may have influenced an NCTE Statement of Policy on grading which advocated among other things that only passing grades be recorded on a student's permanent record (Burton et al. 302).

As statewide testing received increasing attention and became more prolific, the terms "competency-based education" and "minimal competency" were used to describe the tests designed to measure whether students could perform the least that could be expected. Allan Glatthorn reported that as of 1978 "thirty-six states had taken some type of action in support of competency-based education" (17), and by 1981 Charles Cooper reported that every state in the country had "adopted or is seriously considering" minimal competency testing as a way to establish standards for grade-to-grade promotion and for high school graduation (vii). Calling this issue "the most explosive" one on the educational scene in 1981, Cooper's The Nature and Measurement of Competency in English represented NCTE's effort to respond first to a 1976 call for an ad hoc group to "explore this new development and to suggest various responses NCTE might make" and second to a 1977 resolution which opposed legislatively-mandated competency-based testing "until such time as it is determined to be socially and educationally beneficial" (18).
Designed explicitly as devices to sort individual students, competency tests were intended "to identify those who fail to meet a certain standard" (5). Aware of other potential uses of such tests, Miles Myers explained that in a cost-cutting era "professional statements of minimum competencies can be used as a rationale for cutting costs. Everything beyond the minimum becomes by definition a frill which public funds are not obligated to support" (166).

The problems associated with all tests, but especially with standardized tests, continued during these years to be discussed in professional publications. The use and abuse of test results drew perhaps the greatest condemnation. Dolores Durkin, for example, reported how test results were used to decide which preschoolers should stay at home for another year (767), and NCTE insisted that too often test results were misused "to place individual students in particular kinds of classes, to evaluate the effectiveness of a new curriculum, or to assess the strengths and weaknesses of a school system" (NCTE, Common 6). Several authors perceived widespread ignorance about tests and about their results, even among those in positions to determine policy: "In some school districts groups have called for a mandate that 95 percent of the children achieve at or above grade level," a mandate Purves pointed out was "a statistical contradiction in terms" ("Testing" 7). A writer of an article for a principals' journal reported in 1987 that "a school board candidate in my local district vowed that, if elected, he would see to it that all students were reading 'above grade level'" (Burrill 61). Roger Farr expressed similar impatience with those in positions of power who should know better: "It often seems that busy legislators can't be bothered with really understanding what's going on in the schools; all they want to know is whether the scores have gone up or down" ("New Trends" 22).

Paralleling the complaints and criticism were a number of positive efforts, such as the 1973 NCTE Research Instruments Project meant to find or develop "innovative ways to measure such things as growth in literary appreciation, reading, writing, listening, and speaking, and new means of assessing attitude change, climate for learning, and creativity" (Burton et al. 304-5). Some suggested computerized testing. Marvin Glock, for example, saw computers as perfect for a mastery curriculum: "It would be possible to have an automatic assignment of curriculum packages for an individual pupil based on tests given by a computer individuating degree of mastery along with diagnostic information" (63). Similarly, Brett predicted individualized instruction using behavioral objectives programmed into a computer, "which will track and guide the student, task by task. The individual pacing will be made possible by an abundance of instructional materials" (45). Henry Slotnick and John Knapp even discussed essay grading by computer (1971).

Many English educators, however, focused on informal classroom assessment. Dorothy Strickland, for example, advocated teacher observation (vi). Burton et al. suggested monitoring lists of books read, student journals, and attitude surveys, among other items (311).
Contributors to Dorothy Watson's Ideas and Insights (1987) encouraged videotaping literature discussion groups, self-evaluations, and anecdotal record sheets. Offering the least intrusive suggestion of all perhaps were John Mayher and Rita Brause, who explained that "[t]here is no reason why children can't be evaluated on the basis of the work they are actually doing during the year" (394). Lori Clarke, on the other hand, suggested that testing should be viewed positively as a "culmination" rather than an examination--a "culmination of his learning, of his intellectual excitement aroused by the interaction of fellow students and teacher upon each other" (43). English language arts teachers of this era, then, were confronted with widely disparate approaches to assessment of student performance--as indicated by the contrast between tests designed to identify students who fail and tests conceived as the culmination of intellectual excitement.

Evaluating Reading

The teaching of reading had by 1970 come to be regarded by many as so significant as to be a separate discipline, prompted at least partly by the influence of the International Reading Association and its affiliates. Not only was the broad issue of reading performance discussed in professional journals but smaller subtopics were treated as well. One 1974 list, a "Concise Guide to Standardized Secondary and College Reading Tests" (Mavrogenes et al.), abstracted and discussed 58 tests but cautioned that almost 2,000 reading tests had been included in the most recent Buros index. Perhaps the widespread use of the standardized tests could be explained by the promises made in the publishers' descriptions. Roger Lennon suggested that if the best publishers were to be believed, standardized reading tests could measure any and all of the following:

paragraph comprehension, word meaning, word discrimination, word recognition, word analysis skills, ability to draw inferences from what is read, retention of details, ability to locate specific information, rate of reading, speed of comprehension, visual perception of words and letters, ability to determine the intent of a writer, ability to grasp the general idea, ability to deduce the meaning of words from context, ability to read with understanding in the natural sciences, in the social sciences, in the humanities, ability to perceive relationships in written material, ability to sense an author's mood or intent, ability to appreciate poetry, ability to grasp the organization of ideas, ability to read maps, charts, and tables (19)

After bombarding his reader with such a list, Lennon asserted that reliably only the following components of reading ability could be recognized and measured by using standardized reading tests: a general verbal factor, comprehension of explicitly stated material, comprehension of implicit or latent meaning, and an element that might be termed "appreciation" (29).

Other writers resisted the use of reading rate as a criterion (McDonald) and the use of readability scales to determine passages for reading tests (Rankin), while still others pointed out flaws in the test items themselves. Virginia Allen's 1978 article, for example, explained that a reading subtest had asked students to "indicate which of these choices is the first syllable, or first part of the word printed in front of the choices":

Item 7. riddle: ri rid ridd
Item 9. after: a af aft
Item 18. have: h ha have
Item 20. here: h he here (89)

Beyond the question of whether "first syllable" and "first part" might mean the same thing lies the more important question of whether correct answers to such items might provide any helpful indication of a student's ability to read. A 1987 Language Arts article reported that in one oral reading inventory such short passages were used that some of the children responded by trying to link the passages into a single story line, with "patently wrong answers" resulting (Bussis and Chittenden 307). The children were in this case penalized for knowing too much--or for using what they knew--about how stories are constructed. Similarly, children were penalized when they "substituted an idea that seemed more logical to them than the particular idea expressed in the test" (Bussis and Chittenden 307). For example, one short reading passage--"Bud had run. He fed the pup. The pup ate a bun"--was followed by the question, "What did the pup eat?" Some children answered that the pup had eaten a "bone" while others said "some food." The authors explained that in a follow-up discussion the children had said that they were unsure what a bun meant in this context or "they couldn't imagine such a thing being fed to a dog." Neither response qualified as correct in the test manual (Bussis and Chittenden 307), though again the "incorrect" answers indicated nothing about ability to read but instead indicated perhaps only that these children were not savvy test-takers.

There seemed general agreement that many of the reading subtests were inappropriate measures of reading performance. Occasionally, however, an article appeared advocating the use of ever smaller particles of text to test students on. Albert Marcus, for example, encouraged word recognition tests which would measure "each discrete skill involved in decoding," and he insisted that this measurement "should be developed by skill for each phonic element that has to be taught" to the point that "[r]ather than use a sample of one of the initial consonant blends with r, the test should include all the blends with r, so if the student knows some of them, the teacher will know which ones are known and which ones are not known" (734).

Some seemed aware that better assessments of reading depended on discovering better definitions of the reading process. As psycholinguistic and socio-psycholinguistic definitions of reading emerged, so did a number of new informal classroom reading assessment strategies and eventually even new large-scale reading assessments as well. One of the more unusual alternatives was miscue analysis. Working individually with a reader, the teacher was advised to ask the child to read aloud an unfamiliar passage and to record any miscues, i.e., deviations from the original text, which could later be analyzed to determine the quality of each response. The reader was not penalized for minor deviations which did not negatively affect the reader's comprehension. For example, if the text "The boys ran through the dark forest" was read as "The boys went through the dark forest," the child's misreading indicated that in fact the passage had been understood. On the other hand, a reading of "The boys ran through the dark frest" would indicate a lack of comprehension (Goodman and Burke 4).
By considering as many as 28 questions about each miscue and by tallying the kinds of miscues in a selected passage, the evaluator could acquire information about the student's reading strategies to be used in designing future instruction. (Although miscue analysis is based on reading aloud--a practice considered relatively unnatural by those who think of themselves primarily as silent readers--it yields assessment information that seems impossible to get any other way.)

As they came to regard reading as a process that required an active reader, teachers and researchers began to offer a variety of measures by which attitude toward reading might be assessed. Thomas Estes, for example, devised a scale to measure reading attitudes (Figure 5), and Betty Heathington and Estill Alexander proposed in 1978 a "child-based" observation checklist:

Figure 4 - Observation Checklist to Assess Reading Attitudes

In the two-week period, has the child:
1. Seemed happy when engaged in reading activities?
2. Volunteered to read aloud in class?
3. Read a book during free time?
4. Mentioned reading a book at home?
5. Chosen reading over other activities (playing games, coloring, talking, etc.)?
6. Made requests to go to the library?
7. Checked out books at the library?
8. Talked about books he/she has read?
9. Finished most of the books she/he has started?
10. Mentioned books she/he has at home? (770)

Attitude Scale
A = strongly agree, B = agree, C = undecided, D = disagree, E = strongly disagree
1. Reading is for learning but not for enjoyment.
2. Money spent on books is well-spent.
3. There is nothing to be gained from reading books.
4. Books are a bore.
5. Reading is a good way to spend spare time.
6. Sharing books in class is a waste of time.
7. Reading turns me on.
8. Reading is only for grade grubbers.
9. Books aren't usually good enough to finish.
10. Reading is rewarding to me.
11. Reading becomes boring after about an hour.
12. Most books are too long and dull.
13. Free reading doesn't teach anything.
14. There should be more time for free reading during the school day.
15. There are many books which I hope to read.
16. Books should not be read except for class requirements.
17. Reading is something I can do without.
18. A certain amount of summer vacation should be set aside for reading.
19. Books make good presents.
20. Reading is dull.

Figure 5 - Estes Attitude Scale

Readers' prior knowledge also came to be considered an important criterion by which to evaluate reading comprehension. By 1987 Betty Holmes and Nancy Roser offered "Five Ways to Assess Readers' Prior Knowledge," and Merlin Wittrock suggested "Process Oriented Measures of Comprehension," which involved asking students to summarize, reread, question, and infer (736). Sheila Valencia and P. David Pearson described "the best possible" reading assessment as teacher observation and interaction with students "as they read authentic texts for genuine purposes" (728). Building on Vygotsky's notion of the zone of proximal development, the classroom teacher then could intervene with support or suggestions as needed (728).

By the mid-eighties efforts to redesign statewide reading tests were well under way.
By 1987 "at least 40 statewide competency testing programs were in place" (Valencia and Pearson 727), and reading theorists were determined to reconceptualize state assessment to more closely reflect the new reading definition. Farr explained that strategies used by better readers--i.e., constructing meaning from background knowledge, visualizing story events and sequences, and hypothesizing about facts and events in texts--might be used as criteria for the new tests ("New Trends" 23). The description by Karen Wixson et al. of Michigan's latest reading assessment parallels some of the strategies Farr mentioned:

First, good readers must be able to integrate their knowledge and skills as they construct meaning for different texts under a variety of reading conditions. Second, good readers must have knowledge about the various purposes for reading, about how different reader, text, and contextual factors can influence their reading, and about the skills and strategies they can use in their reading. Third, good readers are those who have developed positive attitudes about reading and positive perceptions about themselves as readers. (750)

Using full-length stories and subject area texts, the Michigan test was designed with the following balance of responses--50 percent constructing meaning, 30 percent knowledge about reading, and 20 percent attitudes and self-perceptions (751).

Illinois' experimentation with new statewide reading evaluation was described by Valencia and Pearson (1987). It involved summary writing, metacognitive judgments, question selection, multiple acceptable responses, and prior knowledge (730). In each case students were given multiple-choice questions to answer, each with a problem-solving format. For example, in order to evaluate summary writing, students read three or four summaries written by other students and selected the one they thought best. These authors acknowledged the benefits of classroom assessment but also warned that "[u]nless we can influence large scale assessment, we may not be able to refocus assessment at all" (730).

Evaluating Literature

When compared to the array of reading tests and assessment tools available and discussed, assessments of literature seemed almost non-existent during these years. Indeed, Walter Moore and Larry Kennedy pointed out in 1971 that Buros had listed "no standardized tests for literature at the elementary school level" (443). There was still the occasional article suggesting an expansion of the criteria used to evaluate students' understanding of and experience with literature, such as Sarah Snider's 1978 article, "Developing Non-Essay Tests to Measure Affective Response to Poetry." For the most part, however, discussion focused during these years on the possibility of using behavioral objectives to teach and to test literature. Purves, for example, attempted to use Bloom's taxonomy to create measurable behavioral objectives for literary works, contextual information, literary theory, and cultural information. Multiple-choice items, Purves insisted, could be designed to yield the needed responses, as in the following item intended to measure the extent to which a student could "accept the importance" of a literary text:

Most poetry seems like a meaningless jumble of words.
1. Strongly agree
2. Agree
3. Disagree
4.
Strongly disagree ("Evaluation" 750)

A Reading Teacher article presented both the "yes" and "no" sides of the question of "Behavioral Objectives for Children's Literature?" Gordon Peterson's "yes" response, while conceding that research evidence was limited and inconclusive, included the listing of 47 objectives, such as, "After reading a selection with an implied theme, in one sentence state the implied theme and briefly defend the choice of the implied theme" (657). Other writers seemed less convinced that elementary children's experiences with literature would be enriched by using such objectives or evaluation questions. Patrick Groff in his "no" response, for example, argued that children's literature did not lend itself to any kind of "pre-determinism or prescription about how child readers should respond to it" (662). Instead, he insisted, imposing behavioral objectives would narrow the number and variety of responses (662). Somewhat similarly, Bill Ferguson argued that "[e]vidently, the behaviorist thinks that an examination of the atoms of poetry will allow the student to assemble them in his mind, add them up, so to speak, and arrive at a total effect" (54).

Evaluating Writing

During the 1970s the evaluation of writing also focused on the controversy about the appropriateness of using behavioral objectives. Joseph Foley, for example, used essentially the same method that Purves had suggested in constructing a matrix by which content (in this case, ideas, organization, style, mechanics, and choice of words) could be measured by means of cognitive and affective behaviors, as demonstrated in responses to multiple-choice questions about writing (770). For the most part, however, there seemed general agreement among writers of professional publications that writing could only be evaluated using actual texts composed by the students themselves. As had been true in earlier periods, English teachers continued to suggest ways to respond to student writing and to address the sometimes conflicting issues of response to student writing and grading of student writing.

By the late 1980s Calkins, Atwell, Romano and others had suggested a variety of classroom evaluation procedures. All encouraged student writers to self-evaluate their own writing during the writing process and to confer with peers and teacher about needed revisions (e.g., Calkins 159), and all emphasized the need to temper response and evaluation so that students emerged from the experience eager to write again. Romano, for example, described his system, which involved an initial sorting into stacks on the basis of overall impression and then writing individual comments and a grade for each student (114-15). Atwell explained the use of periodic evaluation conferences with students, in which student and teacher discussed and evaluated writing pieces gathered over time in a writing folder (114). As criterion-referenced tests of writing were being recommended instead of norm-referenced ones (Squire, "Behavioral" 146), authors of professional articles and textbooks proposed checklists as evaluation guides. Using these checklists in the classrooms of the '70s and '80s may have served as an early effort to share the evaluation criteria with the students, to let them in on the evaluation secrets, so to speak, that have too often seemed unfathomable to students.
Beyond being used for evaluation of student performance in individual classrooms, criterion-referencing became one of the highly recommended methods for scoring student writing samples on school-wide and district-wide writing assessments. Roger McCaig described the criteria created for use with writing samples collected during a school-wide assessment--criteria syntactically based on what he referred to as M-units, which were similar to the better known T-units created earlier by Hunt (Hillocks 64) but different in that for young children's writing, allowance was made for writing which could be "reconstructed into a sentence in accordance with a judgement about the child's intention" (McCaig, "The Writing" 7).

The publication in 1974 of Paul Diederich's Measuring Growth in English seemed to refocus attention on writing assessment both inside individual classrooms and beyond. In describing the Educational Testing Service (ETS) model of holistic and analytic scoring of student writing, Diederich promised ways of "improving the reliability of grades on essays" (1) and a system of evaluation that could eliminate "more than 90 percent of the grading that goes on day after day in almost every classroom" (4). Six years later when Myers published A Procedure for Writing Assessment and Holistic Scoring, he spoke of Diederich as the person "who has done more than anyone else to develop holistic scoring procedures for use in the schools" (2). For those who had not already experienced holistic scoring, Myers' book provided enough how-to information to persuade many of them that they could do it and that it was the best way to assess writing (4). By teaching the scoring procedures to students, classroom teachers could share more of the evaluation secrets with their students. In Chris Paulis's middle school classroom, for example, holistic scoring was used as a revision strategy (128).

By 1985 Lester Faigley et al. had moved beyond holistic scoring of written products to consider the writing process as well:

. . . if instruction is to focus on processes of composition as well as products, then evaluation efforts must accommodate this shift in focus. Evaluations must provide useful descriptions of the ways students compose in order to identify and assess changes in these processes that result from instruction. (161)

These authors suggested a variety of strategies for gathering process information, especially advocating the use of process logs, self-evaluation questionnaires, and pre- and post-term interviews (173-77).

Many states during these years began to reconsider the use of multiple-choice items about writing to indirectly assess student writing performance. In Florida, however, the multiple-choice format "was predetermined by the bureaucracy," though teachers and the Florida Council of Teachers of English later had "input" (Simmons 27). Pointing out that the ETS team of test writers under contract was "wholly from the ranks of psychometricians," Simmons regretted that "[n]o verbal expression is tested or measured; aside from filling in their personal data at the top of the page, Florida students don't produce a word, either orally or in writing, throughout" (27). In most states, however, test designers followed Diederich's lead and developed tests involving written work composed by students, which could be holistically scored.
Colorado, for example, developed a Writing Assessment Program which asked students in seventh, ninth, and eleventh grades, as well as college and university freshmen, to write on the same topic so that papers could be scored in a group across age levels (Distefano and Killion 208). The California Direct Writing Assessment, using matrix sampling, included writing prompts which required students to produce a variety of writing types for a variety of purposes and audiences (Peckham 31). In the case of California, and in a number of other states as well, one of the criteria used to judge this program a success was the important part played by inservice workshops which gave teachers the "opportunity to talk about what criteria are important in a specific type of writing" (32). In descriptions of the New York State Writing Test for fifth graders, similar benefits were pointed out. Charles Chew, for example, explained that inservice programs throughout the state "train local educators not only to rate the tests but to develop instructional strategies to meet students' needs in writing" (56). This particular test was praised as coming "very close to approximating the composing process" (Chew 50), since students wrote two different pieces and in addition were given time over two days for prewriting and revision (Chew 50). When Maine instituted its statewide writing assessment, teachers who had participated in training for scoring "decided they should teach the scoring procedures to their students," which led to a writing exchange program with another school in the state (Takacs 34), thus again sharing the evaluation criteria with students as a nudge to internalize them and apply them to their own and to others' writing.

Evaluating Oral Language Arts

As earlier, very little attention was given to assessment of students' oral performance, at least among journals and books read by an audience of English language arts teachers--who seem less likely to have read professional journals for speech teachers. Still, the "language arts" were thought of as including not only reading and writing but speaking and listening as well, and the evidence seems clear that oral language was overlooked as an evaluation issue. Indeed, Walter Moore and Larry Kennedy reported in 1971 that there were at that time "no standardized tests which measured speaking" (442), and only occasionally did English language arts publications discuss classroom-based oral evaluation strategies. One 1974 text did include a separate chapter for both listening and oral communication, though the focus seemed to be on listening to formal presentations (Burns 52) and on discovering oral "deficiencies" (Burns 64), most noticeably non-standard usage (76). Similarly, John Melear's 1974 article, "An Informal Language Inventory," described a procedure used primarily for "children who display a lack of standard English" (510), mostly those who were bilingual and from low socio-economic groups. The data in this case were gathered by inferring the quality of language use from the "percentage of grammatically correct sentences" used by pupils as they told about pictures they had drawn (510). Along the same lines, an article by Margaret Brown recommended asking children to tell a story from a book of pictures while the teacher taped the session for later analysis (507).
Janet Black's article, "There's More to Language Than Meets the Ear," asserted that oral language assessment must include observing, that is, the "seeing and watching of children in various social and interactive contexts" (527). Perry Gilmore, however, encouraged classroom teachers to broaden their thinking about language assessment further by suggesting that "peer-owned language such as occurs on the playground should be included in a comprehensive language assessment" (584). Through his analysis of "steps" performed by the girls at recess, Gilmore discovered that "[a]lthough most of the students in the observed classes were identified as skill deficient, observations, to the contrary, indicated that the students were skill proficient" (365). Recognizing that "steps" did not count officially as literacy skills, Gilmore insisted that "[a]ssessment has too often meant closing doors rather than opening them. . . . Instead, teacher expectations should be raised through an awareness that students are capable of doing more with language when they are given the room and respect to do so" (390).

When distinguished from attention given to speech and hearing evaluation in regard to physical impairment, speaking and listening as language arts seemed for the most part to have escaped the attention of state test makers. By 1980, however, an article appeared which described a Massachusetts project intended "to assist in developing assessment procedures in listening and speaking for the state's elementary and secondary students" (Backlund et al. 621). Being careful to explain the differences between oral and written language, they argued that "assessing reading and writing skills should not be used to indicate achievement of speaking and listening" and explained that competence in oral communication is dependent on the interaction of several factors including

the speaker's and listener's purpose or task in communication; the topic or subject being talked about; the attitudes, experiences, maturity, skills, and knowledge background of both the speaker and listener; and the time, place, and preceding events of the communication setting. (623)

National English Language Arts Assessment

Although the first National Assessment of Educational Progress (NAEP) was administered in 1969, it was first reported on during the early 1970s. English educators continued to have mixed feelings about such an assessment, as suggested by Shafer's article title, "A National Assessment in English: A Double Edged Sword." Finally in 1975 John Mellon published for NCTE National Assessment and the Teaching of English, a book summarizing "in detail the findings of the initial writing, reading, and literature assessments" and interpreting the factual data "from a number of perspectives" (1). In writing, students were given imaginary situations and directed to write short passages in response, which were scored either "acceptable" or "unacceptable" (16); they were asked yes/no questions regarding out-of-school writing; and they were asked to compose essays, which were scored holistically (20). Some of the reading exercises involved isolated phrases and sentences of text but others involved short passages of prose and poetry (42). Results were reported as percentages of respondents able to read and comprehend each item.
The literature assessment, Mellon explained, seemed "a first step only in its intended direction--cautious, conservative, almost tentative, and frankly experimental in places" (76). This portion of the assessment used multiple-choice questions with reason follow-ups, orally composed and tape-recorded answers to open-ended questions about works of literature, and essays about a given work (85).

In spite of English educators' mixed feelings about the test itself, Purves praised the presentation of results in terms "intelligible to the layman" using "percentages of people performing satisfactorily on clearly understood tasks" ("Evaluating" 238). Rexford Brown was another proponent of NAEP whose articles appeared frequently in English language arts publications. He seemed convinced that "in many states and districts I am familiar with, the NAEP legitimized activities that were close to happening but needed a push or some kind of 'authoritative' support before they could be put into action" ("The Examiner" 221-22). As an employee of NAEP, Brown was not an unbiased reporter in this case, though later materials will show that others shared his impression that large-scale tests have the potential to legitimize specific classroom activities.

Evaluating English Language Arts Teaching Practices

Issues related to evaluation of students' performance clearly spilled over into the realm of evaluation of English language arts teaching practices during these years. Much of the discussion appeared within the context of students' performance--to praise or blame teachers based upon their students' performance, to disavow any such link, or to argue the possibilities on both sides. McCaig, for example, seemed to believe that students' performance directly reflected teaching practices:

Not until the annual "writing test" was initiated as part of the spring achievement battery were differences between classrooms documented and the debate resolved. The first year of testing demonstrated in a systematic way what some people already knew; namely that some first grade teachers were teaching writing and some were not. ("What Research" 49)

Lanny Morreau insisted that teachers should welcome having their own performance linked to that of their students:

. . . an evaluation based on the learned repertoire of students is both less capricious and far more educationally relevant than the half-hour visit by a superintendent, an inventory of teacher attendance at staff and PTA functions, or the particular mode of dress which a teacher selects. (37)

English teachers should, Morreau insisted, "demand" that their own evaluations be "based on their effect upon the behavior of their students" (37).

Other writers of professional articles tended to agree that administrative evaluation of teachers was inadequate. Burton et al., for example, pointed out that "[g]enerally, if the teacher seems well prepared, the students appear reasonably content and profitably occupied, the class is orderly, supervisors will assume that all is well" (294). What was offered by English educators in the place of administrative evaluation, however, seldom linked teacher performance to student test scores.
John Hassett, for example, explained that even as "[w]e do not judge a doctor's competence by the blood pressure readings of his patients," we also "should not judge a teacher's competence on the basis of the test scores of the pupils unless there is additional evidence that the teacher is failing to do his or her job in the classroom" (31). John Maxwell also asserted that "[t]est score results do not evaluate teacher effectiveness . . . achievement test scores represent only a fraction of the effect of English 'teaching' and learning" (27). Further, Maxwell insisted, such tests measure "a fraction of an English teacher's goals" and therefore student scores "are too unreliable an index to be used for personnel decisions" (27). Mayher and Brause perceived a trap that innovative teachers often find themselves in, "caught between their own professional convictions about the best approaches to promoting pupil learning and the outside systems of assessment used to measure their success" (391).

What seemed most often suggested as a remedy for relying on students' performance on tests or on administrators' evaluations was allowing English language arts teachers to become involved in their own evaluation. Velma Elliott, for example, suggested that teachers should become peer evaluators and proposed a system whereby teachers could submit names of teachers to their principals, who would choose from among the names a team of evaluators (727). Burton et al. went further, however, and suggested teacher self-evaluation instead. Teachers could, for example, analyze their own behavior by means of a journal or personal checklist (295), tape-recorded sessions, observation by colleagues (296), student evaluations (297), interviews with students, or questionnaires (299).

These years were ones in which many English language arts teachers sensed a public disappointment in their performance. Allan Glatthorn, for instance, seemed unsympathetic with English teachers, whom he believed had "largely ignored the exhortations to make radical changes in the way in which they teach" (13). Citing Goodlad's study of schools, he theorized that "the scholar's recommendations are largely ignored by the classroom teacher, who finds them either too recondite or too unrealistic" (18). Further, it was Glatthorn's opinion that

The formal curriculum is often quietly subverted, the mimeographed curriculum guide filed in the bottom cabinet until evaluation time . . . projects which try to develop teacher-proof curricula fail because they fall into the hands of curriculum-proof teachers. (19)

Many teachers' responses to the situation were predictably and understandably defensive, as expressed by a teacher educator who observed, "Anyone who comes into my room to watch me teach and has a rating scale in hand is my enemy" (Small 176). Maxwell blamed teachers for some of the public fascination with tests, saying "the testing fraud is in major part something that is done by the consumers to themselves" (iv). Citing the work of an NCTE Task Force on Measurement and Evaluation, Maxwell reported "evidence of widespread ignorance about tests among teachers, administrators, members of school boards, the media, and the public" (iv).
When Sheila Fitzgerald surveyed teachers several years later, she found that many teachers still seemed unaware of the criticisms being leveled against standardized tests and that "especially the teachers of elementary students" believed that tests accurately reflected what students had learned (39). Even later Mayher and Brause observed that ironically "as teachers, we have primarily communicated with parents through grades and test scores whether or not we believe that they actually measured student learning. We have, in effect, taught parents to have a high regard for such scores and are, therefore, caught in a trap of our own making" (391). Those who believed themselves enlightened about tests and testing were still caught up short by Valencia and Pearson's assertion that teachers "take secret pride that [their] pet instructional technique produces greater gains than other techniques" on one of the very tests that have been criticized (726).

English language arts teachers also found themselves criticized when they did--and when they did not--teach to the tests. Erickson theorized that "[p]ossibly many of our 'best readers' and their classmates show depressed scores on standardized reading comprehension tests because of an acute deficiency in test taking ability" (140). Such a statement was a telling one because of the implied criticism it included. It implied, first, that such tests were in fact accurate measures and important. Further, it can be interpreted as a criticism both of teachers, who failed to teach what their students needed, and of the students who, according to the prevailing attitude at the time, were themselves considered at fault or "deficient." Faced with students possessing such a "deficiency," teachers had to decide how many test-taking strategies to teach and whether to use such materials as an NEA book published in 1986 with the title How to Prepare Students for Writing Tests (Tuttle).

Given the criticism focused on tests, teachers during this period were increasingly encouraged in professional publications to become informed about issues, to think of themselves as the best evaluators of their students' work, and to evaluate their own teaching practices as well. In response to a 1973 resolution, NCTE published the practical booklet, Common Sense and Testing in English (1975), which provided teachers with enough technical knowledge to analyze their current testing situations and the need for change. Clearly written text and diagrams provided guides to help English teachers understand and help others understand the range of testing problems and possibilities, including, for example, "Legitimate Uses and Users of the Results of Measurement" and a "Citizen's Edition" of the report as well.

Even as teachers began to seek alternative measures by which to evaluate their students, they were also encouraged to ask questions about their own teaching and to systematically collect data and test hypotheses about it (Judy, The ABCs 164). Citing advantages of self-reporting, S. M. Koziol and Patricia Burns explained that as teachers completed self-report inventories, they "became aware of instructional choices they had not been aware of or had not considered for some time . . . they were examining what they did as teachers and they were thinking seriously about alternatives for their classrooms" (116).
This tongue-in—cheek narrative recounted a search for the best assessment tools. As the buyer entered "Ernie’s Evaluation Emporium," he/she was shown a variety of models by the manager, "Norm Reference." Each model seemed to promise more than the one before, e.g., This sleek model here provides grade equivalent scores out to 6 decimal places, is renormed every other Tuesday, and the publisher guarantees that every child scoring over the 90th percentile will receive a simulated gold bracelet with his or her score etched on the back. (Fredericks 790) Eventually the latest model was described, along with the promise that "they last forever and consistently give accurate diagnostic and evaluative information on students" and in addition, "the price is right" (791). Teachers, of course, were the latest model. In that same issue of Thg Reading Teacher, however, a more serious article echoed the same position. "Teachers as Evaluation Experts" highlighted the value of the evaluation expertise teachers provide as they detect patterns, know classroom procedures, listen, and evaluate to serve instruction (Johnston 740). Such curriculum materials, 153 Evaluating English Language Arts Curricula Just as evaluation of teaching practices was linked to evaluation of student performance, so also was evaluation of English language arts curricula linked. Although English educators insisted that standardized tests did not provide enough information that would make them useful in deciding about changes in the English curricula (Maxwell 2), during these years as curricula were designed and developed and evaluated, student performance on tests was the dominant driving force. How difficult it must have been to watch testing overpower the curriculum and to have the resulting weakened curriculum shape evaluation. George Madaus cited evidence of the teaching and testing circularity when he spoke of a public school principal who testified that . . . reading instruction has come to closely resemble the practice of taking reading tests. In reading students using commercial materials read dozens class, of little paragraphs about which they then answer questions. The materials they use are more and more designed to look exactly like the tests they will take in the spring. (8) Mayher and Brause pointed out, "enable kids to practice for the tests--which thereby demonstrate that the schools are achieving their objectives" (392). curricular materials pointed out, houses . . . Perhaps the ultimate link was between publishers of and tests. As Betty Jane Wagner "Editors at the el—hi desks of major publishing know they can appeal to potential buyers by 154 citing better test performance by students who have piloted their textbooks" (55). Indeed, a booklet called Imprgying SAT Scores accompanied the 1985 edition of the Ginn Literature Series, its introduction stating, The booklet provides an overview and explanation of the verbal SAT, a description of the relationship between the instructional program of the Ginn Literature Series and the SAT verbal skill areas, and test practice masters in the four SAT areas. . . . (1) Although student performance issues dominated curriculum discussions, there were during these years those who addressed the need to evaluate English language arts curricula as a precursor to considering new options. Three books published in 1980, in fact, were entirely focused on the English language arts curriculum. 
Glatthorn's Guide for Developing an English Curriculum for the Eighties included Mandel's Foreword citing a shortage of recent NCTE books on curriculum reform: so much professional energy had been expended in responding to "proponents of competency-based teaching, minimal competencies, and state-mandated testing" (ix) that there had been little energy left to consider curriculum reform. What Glatthorn proposed, however, was a "mastery curriculum" characterized by careful sequencing (i.e., "learning of objective 3 depends on mastery of objectives 1 and 2") facilitated through careful planning (i.e., "teaching objective 3 requires deliberate analysis of its component skills"), which should result in measurable outcomes (i.e., "a test can easily determine whether objective 3 has been mastered") and was "best mastered when its content is clearly delineated into discrete units or lessons" (28).

Barrett Mandel's Three Language-Arts Curriculum Models was another NCTE publication of 1980, intended as a response to a 1977 NCTE sense-of-the-house motion calling for national guidelines for curricula in English (1). Mandel admitted that his book might seem to readers a "far cry from the intention of the motion" and explained that what he offered instead was a description of three curriculum models from which readers could choose: the competencies model; the heritage or traditional model; and the process or student-centered model (3).

Without the constraints imposed by Glatthorn's mastery curriculum and without the indecisiveness suggested by Mandel's three models, Judy (The ABCs of Literacy) proposed "ten global priorities" for literacy education, though he suggested they be simply a "starting point for school and community discussions" (82). Among them are the following assertions that serve as a rejection of the teaching-testing circular trap: that literacy programs should be "based on reading and writing experiences, not principally on the study of literacy-related skills" (82); that they "must lead to continuous growth rather than offering isolated experiences or training" (85); that they "be developed by the people who will conduct them--the teachers" (92); and that "teachers must be willing to offer instruction in reading and writing skills whenever and wherever those skills are needed" (95). (While none of the three 1980 books seemed to have had a particularly striking effect on later discussions of curriculum evaluation, it is interesting to consider what impact Judy's text might have had on English teachers if it too had been published under the auspices of NCTE, since it seemed to fulfill more closely the spirit of NCTE's 1977 resolution.)

Officially NCTE continued to evaluate school curriculum guides and to offer a review service which districts or schools could use to obtain a critique from NCTE Curriculum Committee members free of charge. Its published models and guidelines provided periodically updated criteria by which curricula could be evaluated. In 1973 NCTE created a Task Force on Measurement and Evaluation.

Some school-wide curriculum evaluation projects were reported involving a variety of curriculum assessment criteria. APEX Evaluated and Revised (1975) was one such effort. In this case, the school's "nongraded phase-elective English curriculum" was evaluated.
Language arts teachers volunteered to meet after school to develop the evaluation process, which included using in-depth questionnaires (7) and semantic differential scales to measure student responses and attitudes (6). More predictably, other criteria suggested--though apparently not used in this project--were scores on reading, SAT, and ACT tests (8).

In contrast to the APEX study is the example of a small private school's efforts to gather the data needed to make curricular decisions. The Prospect School was described as a K-9 demonstration school in Vermont with a staff of five teachers and three researchers. Teachers and researchers gathered the following descriptive records for student and curriculum evaluation purposes: drawings, photos, journals, written work, teachers' weekly records, teachers' reports to parents, curriculum trees, sociograms, and more (Carini 45). Such a project, though admirable and perhaps a model for some (those who could provide three researchers for every five teachers), would be beyond the reach of even the most ambitious teacher-researcher working alone.

A survey of curriculum evaluation materials during these years reveals that essentially missing from this period were large-scale studies of English language arts programs, such as Squire and Applebee had conducted earlier. Arthur Applebee's Writing in the Secondary School is one of the few exceptions. Using classroom observation of writing assignments and related instruction in two midwestern high schools over a full academic year and a national questionnaire survey of teachers in six major subject areas, he studied teachers' actions and attitudes. His research results charted such things as "Mean Percent of Lesson Time Involving Writing Activities" (31) and "Types of Writing Reported by Students" (33). Given the overwhelming emphasis on student performance and on teacher accountability during these years, broad curriculum evaluation projects were apparently seldom funded, and even school-wide projects, especially realistic ones, seemed rare.

Toward a Theory of English Language Arts Assessment

In 1978 English Journal editor Judy observed that English teachers had been accused of being "anti-test" or "anti-evaluation" ("Standardized" 6). Indeed, some seemed to condemn all tests, and sometimes the reasons cited seemed valid, as in the case of a 1974 editorial which theorized that student performance would improve if, instead of testing, the money used for tests were diverted to "building better reading programs, supplying teachers with more and better books, and training teachers in the use of more effective approaches" (qtd. in Farr and Roser 594). Following the editorial, Farr and Roser responded by insisting that, even if such a plan could be put into practice, "a future editorial would be demanding evidence that the additional funds were being wisely spent" (594).

The accountability emphasis at the beginning of this period eclipsed other English language arts evaluation concerns. Classroom teachers, told by administrators and legislators to follow and sometimes produce countless behavioral objectives, had little time or energy left to devise alternatives. Perhaps, however, it was their intuitive aversion to the behaviorist approach that prompted their determination to discover what it was they found so offensive and to devise alternatives.
The fact that so many of the alternatives they suggested were classroom-based and informal measures may indicate their readiness to rethink assessment completely. English educators seemed convinced that assessment should match current theory and practice, and they directed their assessment reform efforts toward achieving a better match. While such an approach seems logical, it raises the question of how close the match should be. For instance, in the days when telephone courtesy was emphasized in English language arts classrooms, should tests have followed suit (as in fact they did) and given a similar emphasis to such skills? It seems possible that to do so is very much like the teaching-testing trap of the basal reading skills programs that have been so criticized in recent years. An important question for future consideration, then, is the extent to which English language arts evaluation should or should not follow current ideas about curriculum and teaching practices.

It is interesting that the accountability system was promoted during a time when the nation was supposedly focused on equal opportunities for all, yet this system of evaluation was clearly a human sorting process focused on finding deficiencies in the skills that groups outside the dominant culture were apt to possess. It was, then, a political and economic issue that in some ways parallels the emphasis on grammatical correctness during the years of mass immigration in the 1920s and '30s. Who was served by the particular ways that English language arts was evaluated and by the focus of the evaluation? Those who felt their own positions threatened?

If there was an emerging attitude at the end of this period, it was that assessment efforts should demystify testing--that students should share in the evaluation process and understand it. Rather than thinking of assessment as a means of separating or excluding those who do not measure up, assessment should, at least in part, be thought of as a way to acknowledge strengths. Another important direction has grown from the holistic scoring of writing, which, by being a reliable measure, has influenced teachers to rethink their own ability to evaluate effectively and to rethink how evaluation should be done, i.e., in the context of a whole work.

Perhaps by the end of the eighties the English language arts world was ready for a theory of English language arts assessment--whether it be assessment of student performance, assessment of teaching practices, or assessment of curricula. If so, several tenets of such a theory began to emerge during the '70s and '80s--that process forms of measurement be included (M. Wilson 12), that error be valued as necessary to the developmental process (M. Wilson 12), that emphasis be placed on "possibility rather than on actuality" (Chaplin 216), that assessment programs and procedures foster sound teaching practices, and that the secrets of evaluation be shared so that evaluative criteria can be internalized.

CHAPTER SEVEN
CURRENT CONDITIONS

The Power and Impact of Testing

Since 1987 assessment has become, in education circles and beyond, an often-discussed topic. Even writers for a Time cover story reported their perception that "young people want constant feedback from supervisors . . . people in their 20s crave grades, performance evaluations and reviews. They want a quantification of their achievement" (Gross and Scott 59).
If there is any truth to such perceptions, schooling has no doubt played a central role in creating such circumstances. In fact, the impact of testing, as reported in professional publications in English language arts and other disciplines as well, has become so great that every facet of education is being rethought through the filter of assessment, and some theorists have concluded that school itself is becoming a "test-like activity" (Langer, qtd. by Edelsky and Harman 160).

At the same time test-bashing is common in almost all professional quarters, especially in English language arts. Standardized tests are criticized as "synthetic, contrived, confining, and controlling, out of touch with modern theory and research" (K. Goodman et al., Whole xi). Almost everyone in the language arts field seems to believe that tests have come to wield too much power over curriculum and over the lives of students and teachers alike. Edith Aronson and Roger Farr speak of the "growing empowerment of tests," and Madaus insists that "[t]esting is fast usurping the role of the curriculum as the mechanism of defining what school is about in this country" (63). There is "no test worth teaching to," according to Madaus, who calls measurement-driven instruction "psychometric imperialism" (84). Yet, we sometimes invest high-stakes testing with so much power, Madaus warns, that "society tends to treat test results as the major goal of schooling rather than as a useful but fallible indicator of achievement" (97).

If tests carry such high stakes, they have made everyone who might be affected anxious, sometimes so anxious that tests and their results have been grossly misused. One former classroom teacher explained, "I have direct evidence that the answer sheets of the students who scored lowest on our district tests are being removed before they are sent to the central office, where district averages are computed" (Richards 66). Not all abuses are so overtly dishonest, but many lie suspiciously close to that line. Susan Harman highlights a variety of actions--some of which school administrators rationalize for reasons that have nothing to do with testing--that in fact raise test scores:

Use old norms; exempt the children with limited English by sending them to bilingual classes, modify the testing conditions for black children by sending them to special education, leave other low-scoring children back in hopes that their age will give them an edge, teach to the tests, or simply teach the tests (that is, cheat). (50)

Given the frenzy of attention being given today to assessment, English language arts teachers must focus in a profound way on what assessment will mean for generations of future students.

Evaluating English Language Arts

By the late eighties the reading profession, which had been controlled for decades by standardized testing and by test-driven curricula, had begun to explore a range of alternative means of assessment. Frequently, however, the discussion of the prevalence and problems of standardized tests has overshadowed discussion of the specific criteria by which students' reading performance could be evaluated. Carole Edelsky and Susan Harman have compiled the evidence against standardized tests for reading.
Such tests, they argue, are based on a faulty conception of what reading is: "Test makers ignore the interconnections and interdependence among the various language sub-systems such as reader's purpose, text, genre, and the social relations among reader, teacher, and author" (158). Further, they insist that the tests are not valid since they do not measure what they say they measure. Citing a study by Altwerger and Resta, they report that "1,000 children showed no particular relationship between their actual reading and their scores on the California Tests of Basic Skills. Some children scored high but read poorly; others scored high and read well; some low scorers read well and others didn't." A test score does relate, Edelsky and Harman insist, to "how well that person does on test-like tasks in school [though] . . . there is no evidence whatsoever that the tasks on tests (like being able to identify short and long vowels) are used in real reading" (159). There are additional problems with the way reading tests are interpreted, especially by very young students:

. . . when Jesse was six, he told his mother he thought the way to take a test was to pick the answer he liked, so he read them all and then found the ones that sounded nicest. Mishi, a second grader, thought the idea was never to read the questions first because that would be cheating. Nicky, on the other hand, thought it would be cheating to look back at the passage because that would make answering the question too easy, so he covered it up. (Edelsky and Harman 160)

Both NCTE and IRA have denounced the overuse and abuse of inappropriate reading tests and called for development of improved means of assessment. IRA, for example, passed a 1988 resolution which "opposes the proliferation of school by school, district by district, state by state, and province by province comparison assessments" and withdrew from any involvement with the development of an improved NAEP reading test (Farstrup 1). NCTE in 1989 urged NAEP "not to incorporate the testing of discrete word identification skills into assessments conducted by the NAEP" and to "intensify efforts to inform educators, policymakers, and the public about the problems inherent in the testing of discrete skills" (Jan. 1990 Language Arts 93).

NAEP itself published a report (Langer et al.) on the results of its 1988 reading test which attempts to describe "who reads best" and "how well do students read." Included among the criteria used to answer such questions are data on independent reading, availability of reading materials, value of reading, time spent on reading instruction, emphasis on reading skills, emphasis on testing, and recent changes in teaching practices. Using matrix sampling, NAEP provides such information regarding reading tests as that 22 percent of fourth graders were tested in reading at least once a week and that "higher-achieving students were likely to be tested the least, while lower-achieving students were more likely to be tested at least weekly" (55).

A number of newer directions for reading assessment have recently been suggested. Constance Weaver's chapter title, "How Can We Assess Readers' Strengths and Begin to Determine Their Instructional Needs?" reflects the movement away from a deficiency model of assessment and away from testing for the purpose of sorting students.
Reading performance, according to Weaver, should be evaluated using measures that meet several criteria, among them, recognition that "no two readers . . . will ever read or understand the same selection in exactly the same way" and provision of "insight into a reader's strategies" (327). Such a whole language perspective on reading assessment has led to "greater emphasis on semantic cues" and to "greater reliance on miscue analysis and the development of checklists for process rather than for skills" (Aronson and Farr 161). Perhaps as an alternative to miscue analysis, Lesley Morrow has designed a "Story Retelling Analysis" suitable for classroom use (112). The Bay Village (Ohio) City Schools have developed a "District Holistic Assessment of Reading Scale" (Figure 6) which provides a four-point scale that can be used to evaluate "the accumulation of several reading observations." A less technical reading assessment is described from a principal's perspective: "Whenever possible, hear each child, in grades 1-4 at least, read a few pages to you. Give each child an on-the-spot encouraging written analysis" (Corbett 53).

DISTRICT HOLISTIC ASSESSMENT OF READING SCALE

This evaluation should be the accumulation of several reading observations. This evaluation should be adjusted according to grade level and child.

4  Loves to read
   Reads from many different genres
   Reads stories and information that require sophisticated background information
   Reads orally with fluency
   Reads with great concentration
   Shares information about stories spontaneously
   Identifies themes and makes links with other similar materials automatically
   Uses information from printed material in conversation
   Uses reading to get necessary information
   Has unusual insights or perspectives on what has been read
   Has an extensive vocabulary

3  Enjoys reading
   Shows competence in understanding story or book
   Has some difficulty with unusual vocabulary or sentence constructions
   Constructs image of the story that is very consistent with text
   Monitors and self-corrects in oral reading
   Makes good predictions regarding text
   Uses text to find information competently
   Has a strong vocabulary

2  Responds to reading assignments rather than a personal drive to read
   Needs help in finding selections of interest
   Needs push to read different or challenging materials
   Lacks confidence as a reader
   Constructs partial image of text often focusing on details of lesser importance
   Tends to pick easy selections or familiar stories
   Has weak vocabulary

1  Experiences great difficulty in constructing meaning
   Lacks confidence in making predictions
   Demonstrates little or no use of strategies
   Gets hung up on "sounding out"
   Shows frustration in reading task
   Has little self-motivation
   Displays weak study skills
   Has short attention span - easily distracted
   Demonstrates little connection between thinking, saying, reading, writing
   Has a meager vocabulary

Figure 6 - Bay Village Reading Scale

Although literature has frequently been replacing basal readers in elementary classrooms, discussions in professional publications have not included evaluating students' understanding of literature or the quality of literary experiences. (Perhaps those who believe that only what is especially valued is tested might interpret the lack of such articles as support for the concern that literature in elementary classrooms is being used solely as a way to teach reading in the same way that basal readers were used in the past.)

Although he was probably thinking of secondary or post-secondary students and of standardized testing, Purves reported at the 1989 NCTE convention that when literature is tested, "often test questions reduce literature to the level of textbook where knowledge is factual." Further, there are "no questions on evaluation of the work as aesthetic object with attitudes, beliefs, or interests and no questions dealing with the nature of the aesthetical transaction." Concerned that the imaginative power of literature "remains unexplored" in most assessments, he cited low-level comprehension questions, such as "Who murdered Macbeth?" and true-false items, such as "Huckleberry Finn is a good boy" and "Hamlet is mad" (to which Purves responded, "I might have gotten that one wrong"). A text is read, then, "in order to take a multiple-choice test on it," a task Purves observes is best prepared for by reading a commercial plot summary. He seems aware that what is valued is tested and seems to regret that

Only two states have a humanities assessment and thus include literature as an aspect of general cultural and intellectual history. Fewer than a quarter of the states . . . measure student knowledge of specific authors and titles, literary terminology, or general cultural information, and only two of the states report that these particular measures are used to help determine promotion or graduation. ("Today's" 5)

Perhaps Robert Probst shares Purves' concern about the assessment of literary knowledge and experiences. However, Probst recommends that evaluation "should grow logically out of our concern for students' responses to the literature and their analysis of them" (225). He offers self-evaluation questions that focus on the personal engagement of the student with the literature, that is, on the transaction between reader and text--e.g., "Did you enjoy reading the work?" and "Did the literary work offer any new insight or point of view?" (225). Further, Probst suggests checklist questions to evaluate students' reading performance and process, including such items as "Does the student distinguish between the thoughts and feelings she brings to a literary work and those that can be reasonably attributed to the text?" and "Does the student accept the responsibility for making meaning out of the literature and the discussions?
Or does she depend on others to tell her what works mean?" (226-27). Such questions call for concentrated classroom observation but seem to echo--and provide a way to respond to--some of the same assessment concerns Henry described 30 years earlier.

The assessment of writing has moved since the late eighties toward a focus on teaching students how to assess writing. The expectation is that once they recognize what constitutes good writing they will be able to produce it themselves. For example, Steven Zemelman and Harvey Daniels suggest that students help establish criteria for grading and for evaluating papers, in an effort to take the mystery out of writing assessment. Dan Kirby and Tom Liner agree and offer "checkpoint scales" which could be used for student self-evaluation and could also reduce grading time for beleaguered secondary English teachers. Similarly, Iris Tiedt's 1989 text suggests training students to use evaluation rubrics (190).

Vicki Spandel and Richard Stiggins (Creating Writers, 1990) explicitly link writing and assessment and instruction. These authors insist that assessment should not be intrusive and should not be taken out of the writing classroom (x). They explain that revising and editing, peer reviews, and even sharing writing by reading aloud are all forms of assessment, and they offer criteria for a system of classroom writing assessment:

• reflects specific, well-defined, consistently applied criteria
• provides student writers and teachers with better insights about what makes a piece of writing work
• reveals the strengths, as well as the weaknesses, in writing
• gives teachers some welcome clues about what (specifically) they can do to help students write better
• provides students (as well as teachers, parents, and others) a working vocabulary that they can use to talk about writing (14)

Based on research and teaching, Spandel and Stiggins include instructions for training students in holistic and analytic scoring and provide classroom-tested suggestions regarding how to respond to students' papers, emphasizing the power of positive comments.

Donald Graves recommends a portfolio approach and suggests that even young students could be asked to spread out their work and to make judgments about the various pieces of writing, finding and marking items--those that indicate "good" writing, that were "hard" to write, that show the writer is getting the "hang of it," that show the writer has "learned" something, etc. (NCTE Convention's "Whole Day of Whole Language," 1990). Ray Levi extends the kind of suggestions Graves provides to include parental evaluation of student portfolios. He describes an end-of-the-year questionnaire for parents as they respond to their children's first-grade writing folders: "I invited the parents, after reviewing portfolios, to share comments about their children's growth and to articulate hopes for the following year" (270).

When the State of Michigan considered portfolio assessment in the design of a statewide writing assessment, difficulties emerged that call into question even the possibility of such an approach for large-scale assessment. During a pilot study, which I was involved in, student-selected pieces of writing were included for scoring along with a timed (50-minute) writing sample and an essentially untimed sample (composed, as much as possible, in normal classroom conditions over as much as a five-day period).
While the writing done in response to provided prompts was relatively easy to evaluate, the self-selected pieces represented such a wide range of genres and such a mix of student-edited and teacher-corrected pieces that it seemed impossible, at least to the pilot study committee, to make comparisons or to determine the standards that might be used to evaluate such pieces. Some do, however, suggest the use of portfolios for large-scale assessment, as in the case of New Hampshire (Simmons), though a Michigan Department of Education employee has observed that the city of Detroit alone has more students than the entire state of New Hampshire, a comment which puts the use of large-scale portfolio assessment into a different perspective.

NAEP published Learning to Write in Our Nation's Schools (Applebee et al. 1990) to report results of its 1988 writing tests. It presents data reflecting several criteria by which writing performance and attitude can be measured, asking questions about planning, revising, and editing strategies; about liking writing; about time devoted to writing; about length of writing assignments; etc. Such criteria go well beyond the evaluation of written products and reflect continued interest in finding ways to evaluate writing processes. Zemelman and Daniels link teachers' roles with simultaneous writing and evaluation processes, suggesting a number of possible relationships worth further research:

Stage:               PREWRITING             DRAFTING                  REVISION                    PUBLICATION
Writer's focus:      Ideas                  Fluency                   Clarity                     Correctness
Kind of assessment:  Observing              Responding                Evaluating                  Grading
Teacher roles:       Listener; encourager   Encourager; coach         Coach; expert               Expert; editor
Goal of feedback:    Probing for interests  Encouraging; suggesting   Questioning; challenging;   Judging; grading;
                                            processes                 evaluating                  motivating

Figure 7 - Evaluating Writing as a Process

Some oral language specialists seem to share with literature specialists the concern that unless something is tested, it will continue to be undervalued in the classroom. John Stewig insists, for example, "Without significant, numerous, and to-some-degree standardized assessment measures, oral language will not be included systematically in the curriculum" (173). He suggests that classroom teachers monitor students' oral growth by taping and analyzing oral language, one advantage being that "tapes are both cheap and small enough to store easily" and that they allow for language samples to be gathered at "various times and in varying contexts" (173). Although such procedures are more feasible than the stenographic records suggested in earlier years, they would require considerable time and effort to implement. Perhaps, however, Stewig's point is that sufficient time and effort focused on oral language is, in his mind at least, exactly what is needed.

Integrating English Language Arts Assessment

One of the most significant recent influences on changing means of evaluation of student performance in English language arts has occurred because of the impact of the whole language movement. Whether or not one agrees with the philosophy, whole language theorists and practitioners believe that traditional tests are not consistent with a whole language philosophy and have, therefore, experimented with a variety of classroom assessment alternatives. They have in many cases been careful to explain the theoretical and philosophical implications of the practices and procedures suggested. For example, the Whole Language Evaluation Book (K. Goodman et al.
1989) devotes its preface to explaining the theoretical basis for forms of evaluation that are consistent with the principles of whole language, emphasizing a "positive view" of teaching and learning (xi).

The ethnographic research that Denny Taylor and a group of classroom teachers and administrators are conducting extends the theoretical discussions of whole language evaluation. As Taylor explains, "one cannot approach assessment in a new way without also altering what passes for teaching and learning in a school setting" (3). She and her colleagues are gathering and studying every scrap of evidence of selected students' use of language to determine their "literacy configuration," that is, the way they use print to produce "a unique pattern of . . . literacy behaviors" (8). From a practical standpoint, however, Taylor's research methods may be only marginally applicable to classroom use, since the time involved makes such case studies prohibitive for most classroom teachers (one teacher, for example, admitted "it took two hours each time I wrote about a child") (271).

As whole language practitioners design classroom procedures for implementing a whole language theory of assessment, they have emphasized self-evaluation and the use of portfolios. Both allow for integration of language arts across other content areas as well, allow for individual choices, focus on students' strengths as well as weaknesses, and count on students' ability to take charge of much of their own learning.

Many of the alternative assessments that whole language advocates suggest to replace standardized tests and objective classroom tests are items that have been suggested many times in the past--items such as teacher observation, interviews, self-evaluation, and portfolios. The professional literature of the past, however, does not seem to indicate that many of these alternative suggestions ever materialized in significant numbers in classrooms around the country--which might lead one to wonder whether the situation will be any different today, though we constantly encounter articles and conference sessions focused on these alternatives. Perhaps such alternative assessments are more likely to appear in classrooms in 1991 because the professional journal articles are more often written by actual classroom teachers who can testify to the success of the new assessment procedures and, more importantly, can describe in detail how the procedures work and what cautions are in order. Too often in the past, suggestions for alternative assessments were offered by teacher educators who could offer few, if any, classroom examples and little evidence to support their suggestions. The teacher-as-researcher movement, however, seems to have been welcomed by editors of professional journals, who frequently include articles such as "Adapting the Portfolio to Meet Student Needs" (Krest), in which a high school teacher explains her classroom portfolio system, and "Finding the Value in Evaluation: Self-Assessment in a Middle School Classroom" (Rief), written by a classroom teacher explaining in detail how and why her own practices work.

Evaluating English Language Arts Teaching Practices

Even as it has grown more difficult to separate different strands of the English language arts from each other as a result of recent holistic approaches, it has likewise become more difficult to separate the issue of evaluating teaching practices from evaluation of student performance.
Controversy rages as to whether students' performance should be a criterion by which to evaluate teaching practices and teachers' performance. In Georgia, as Samuel Meisels reports, administrators and teachers in local school districts have been told that "their performance will be evaluated based on the gains made by their students on the CAT in succeeding years," a practice which has led to teachers making changes in their programs and teaching styles (20). In Kentucky a new law provides for schools with steady improvement in student performance to receive "cash awards to be used as the majority of faculty in each school determine." The faculty and administrators at schools which fail to improve "will be subject to transfer or dismissal" (Foster 36). Marc Tucker and the National Center on Education and the Economy recommend that teachers be held accountable for student performance and that "real awards" be given school professionals who help students meet the standards and "consequences" for those who do not (Gursky 55).

Madaus points out that one reason tests continue to be so popular is that "the public and policymakers have come to mistrust teachers' judgments and want to replace them with external examinations" (114). Given the history of English language arts evaluation, it seems clear the public has frequently mistrusted teachers' judgments before. On the other hand, the report from the English Coalition Conference insists that "English teachers are the professionals most qualified to specify what is important in English studies: what are the understandings--and more important, the ways of knowing and doing--that our students should achieve" (Lloyd-Jones and Lunsford 41). Apparently, President Bush and those attending the 1989 education summit hope politically to please both camps, for they have called both for greater authority and for greater accountability for teachers and principals (Oct. 4, 1989, Education Week).

Recognizing the heuristic dimension of teaching, English language arts teachers, like their colleagues in other fields, are being encouraged to seek official evaluation which acknowledges "the process of discovery and growth" and which allows for a measure of "unpredictability and uncertainty" (Bryant 38). The title of an article by Yetta Goodman, "Evaluation of Students: Evaluation of Teachers," is printed so that the letters in the second phrase are reversed, visually depicting the fact that student assessment is a reflection of teacher evaluation. Goodman's intent in expressing this relationship, however, is not to hold English language arts teachers accountable so much as to nudge teachers to recognize their own status as learners as well as teachers:

Seeing ourselves reflected in our classrooms and in the responses of our students helps us to understand the nature of language learning and at the same time helps us become aware of our influences on that learning and on the relationships between teaching and learning. The dynamic transaction between teachers and students results in change in all the actors and actions involved in the teaching/learning experience. (3)

Graves's Build a Literate Classroom (1991) is an entire book designed so that English language arts teachers can evaluate their own teaching and their own beliefs and attitudes about learning--building on his belief that changed classrooms are the result of changed teachers.
Some of the chapter titles suggest the process of change that Graves believes English language arts teachers might set for themselves: "Make Your Own Decisions--With the Children," "Rethink Learning and the Use of Time," "Structure a Literate Classroom," and "Evaluate Your Own Classroom." It seems likely that at least some English language arts teachers who choose to follow Goodman's and Graves's advice might find themselves professionally at the mercy of a testing system that may or may not be philosophically compatible with their professional belief system. Such situations will present a significant challenge to English language arts teachers in the 1990s.

Evaluating English Language Arts Curricula

The English language arts teachers Goodman and Graves had in mind are capable professionals who can design and evaluate English language arts curricula. Whether or not they are ever given the opportunity to do so, some English language arts teachers seem eager to be involved in such decisions, and some publications are beginning to address such possibilities. One book, for example, encourages each elementary school faculty to "examine its own programs and values before embarking on a course of curriculum reform" (Goodlad xi). This report advocates involving local school faculty, who develop their own plan "designed specifically to fit [their] needs and desires" (Klein 6). The author suggests surveying parents, teachers, administrators, and students to discover what such groups believe ought to be, and what is being, emphasized in the curriculum (Klein 27). In addition to considering such groups' opinions as reform suggestions are made, teachers and administrators could use the data to determine the kind of information such groups need to know to better understand the English language arts curriculum and program.

Robert Donmoyer's article, "Curriculum Evaluation and Negotiation of Meaning," goes further by describing a "deliberative approach" which involves actually gathering together a group of "teachers, administrators, parents, community members and where appropriate students," and asking them to discuss and debate such questions as (1) what issues should be focused on in the evaluation, (2) what sorts of data ought to be collected and what methods should be employed to do the collection, and (3) what recommendations should be made to improve the program (275). Donmoyer points out that under such a plan the emphasis is on "fostering communication and resolving disagreements among participants who ideally will start the evaluation process with different views of education in general and the program being evaluated in particular" (275). He insists, however, that questions of meaning should always be explicitly addressed--e.g., how is reading defined?--and resolved through discussion. Such a plan calls for considerable "group process skills" (277) on the part of the "evaluation leader" but can yield both recommendations for change and closer consensus as to the meaning attached to curriculum and teaching practices (278). Donmoyer would admit that program evaluation involving groups of persons with varying perspectives and agendas is, at best, an intricate undertaking. The benefit of initially involving groups in decisions about what should be evaluated, for example, may be a greater sense of ownership in the evaluation efforts on the part of the school community as a whole (278).
Any discussion of curriculum design and evaluation focuses on what it is that students need to learn and know and do. English language arts professional publications frequently focus on these more theoretical and classroom issues. Seldom, however, have English language arts publications given recent attention to important political issues, such as standards, that are being debated elsewhere. In broader education journals, for example, the high school diploma is being described as an indication merely of "credit accrual and seat time" (Wiggins 42). Norm-referenced tests are being described as offering only a "floating standard, which in a sense, makes it no standard at all" (O'Neil 6). The idiosyncratic nature of teacher grading is again being challenged (Canady and Hotchkiss).

As might be expected, tied to discussions of national standards are discussions of national tests. Parents and the general public seem to favor the idea (a 1989 Gallup/Phi Delta Kappa poll, for example, reported that 73 percent support a common national exam for graduation) (O'Neil 7). One member of the President's Education Policy Advisory Committee has called a national exam "a foregone conclusion now" ("Specter" 6). An almost $2.5 million grant has been awarded a group working in Rochester, New York, and Pittsburgh to produce a "broad national examination system," "national educational goals," a "national syllabus," and "new ways of measuring students' mastery of knowledge and skills" (Gursky 52). Where such plans might leave English language arts leaders and classroom teachers is unclear. As happened during the behavioral objectives era, English educators may, it seems, find themselves swept along with little, if any, chance to protest.

The stakes, however, are very high. Enormous amounts of money are involved--in grants such as that just described and in awards to ETS for NAEP--money that is thereby not available for other educational purposes. A relatively small group of persons seems to be making decisions that will have significant impact on teaching and learning at all levels. Whether or not the leaders of English language arts professional organizations believe they should take particular political stands on these broader issues, past experiences and issues have demonstrated that English language arts leaders and classroom teachers need at the very least the chance--via journal articles and conference sessions--to hear the issues debated.

CHAPTER EIGHT
REPORTS FROM ENGLISH LANGUAGE ARTS PROFESSIONALS

In many of the English language arts program evaluations conducted in the past and cited in this study, English educators have recognized the value of gathering questionnaire data. In fact, questionnaires have been a common feature of such studies because they allow for a variety of perspectives from which to view other data. Researchers such as Pooley and Williams in their Wisconsin study and J. N. Hook in his report on award-winning high schools have recognized the need for the reality check that questionnaires make possible.

My own questionnaire was designed to elicit information from English language arts professionals who had (1) special interest and/or experience in assessment of English language arts curricula, teaching practices, or student performance and (2) frequent, if not daily, contact with actual English language arts teachers and classrooms.
Questionnaire respondents do, of course, sometimes respond to an anonymous questionnaire as if what they wished for were true, and Slavin cautions that "questionnaire-scales attempting to measure hard-to-quantify variables" often have low reliability (78). Still, questionnaire respondents can provide a wealth of information--both fact and opinion--in their responses to both prompted and open questions and in marginal and supplementary comments.

My questions were designed primarily to yield information about criteria actually being used to evaluate current English language arts programs and also about the contexts in which that evaluation takes place. Two final optional items asked for brief descriptions of district English language arts program evaluation processes and for suggestions that might help decision-makers improve programs. The questionnaire, designed in consultation with my dissertation adviser and with a data collection and analysis consultant, included a six-point Likert scale for several of the questions. For all questions, respondents were asked to check as many items as applied and to feel free to add comments, thus allowing for some of the elaboration that is possible when gathering interview data.

Before mailing the questionnaires, I pretested them with a group of graduate students who were also English language arts teachers in a variety of school settings. Their responses led me to make minor refinements in phrasing and in the items included. For example, they pointed out that in the "Teaching Practices" section, question #1, I had asked about the significance of "factors" which were actually "persons." Their questions about how to respond to the "Student Assessment" section, question #1, led me to revise the question by removing "for English/Language Arts," which seemed to be confusing and unnecessarily restrictive. Their responses to the first open-ended question led me to rephrase it and clarify that when I asked about the process of evaluating an English language arts "program," I hoped they would consider curriculum, teaching practices, and student performance.

Beyond these safeguards, I felt reasonably hopeful that the sample I chose to work with would provide honest and substantive information because they were all English language arts professionals who served on NCTE committees or served as contact persons in award-winning school districts. Clearly such a group was not a sample representative of all English language arts teachers and educators across the country. However, I specifically valued the responses of this group in part because they were knowledgeable about circumstances beyond their own personal situations. (By specifically asking respondents to consider their school district or the districts they knew best, I sought responses not limited to single classroom experiences, though such responses necessarily increased the reporting of hearsay evidence.) Even offering specific prompts for respondents to consider does unavoidably bias responses to some degree. The compensation for that bias, I believe, exists in the indication from respondents that the questions provoked new reflections and insights about their own and others' circumstances. For example, one respondent observed that "Filling out your survey helped me realize how much control rests with the classroom teacher." Another respondent commented on the questionnaire items used, saying "they will open doors for honest comments."
The questionnaires were sent to members of the NCTE groups indicated in Table 1.

Table 1 - Sample for Questionnaire

Centers of Excellence Winners - drawn from
  lists of 1985, 1987, 1989 ......................... 62
Standing Committee on Testing and Evaluation ........ 12
Committee on Curriculum ............................. 13
Elementary Practices/Programs Committee ............. 23
National Certification & Assessment Committee ....... 14
Conference on English Education Committee -
  Supervision & Curriculum Development in
  English Language Arts Consultants ................. 143
*Classroom Practices in Teaching English ............ 5
*Centers of Excellence Committee .................... 5
*Evaluation Curriculum Guides Committee ............. 20
                                            Total     297

*Chosen from the 1989 rather than the 1990 NCTE Directory, since these committees were being reconstituted in 1990.

Of the total group, 209 seemed (from their institutional affiliation listed or from their mailing address) to be employed within a school district; 88 seemed (for the same reasons) to be employed by universities or state departments of education. Questionnaires were sent to persons in 39 states, with the heaviest concentrations predictably in California, New York, and Texas. (Questionnaires were not sent to those with no institutional affiliation or address given, to ex officio members, or to NCTE staff liaison members of committees.)

Each questionnaire was accompanied by a cover letter (Appendix B) and a stamped, self-addressed envelope, but no follow-up mailings were sent. (The Michigan State University Committee on Research Involving Human Subjects granted an exemption to the University Policy on Research with Human Subjects for this study.)

The total number of responses received--102 from school districts (49 percent) and 39 from universities/state departments of education (44 percent), for a total of 141 (47 percent)--was not disappointing, given the length of the questionnaire (3 pages, single-spaced, asking for responses to 77 items to be checked plus 2 optional essay items) and the May mailing date, normally an especially busy time for classroom teachers. Six questionnaires were returned by the post office, and two were returned blank with the explanation that the respondents' current work did not involve enough direct contact with school districts to enable them to respond knowledgeably. One respondent did not fill in the checked responses but indicated that he felt uncomfortable with the questions asked, since so much variety can exist from situation to situation. There were, then, 132 usable responses.

The responses tended to be rich with information. Only 11 percent of those who responded provided only checkmarks, while 77 percent wrote out responses to the optional essay questions. Thirty-five percent added marginal comments (several wrote lengthy notes or even separate letters as well), and 46 percent included their name and address so that results could be mailed to them. Several added personal notes (though I knew almost none of them personally) inviting me to visit their districts and wishing me luck with the study.
Instead, the value of the responses lies primarily in the trends they suggest.

English Language Arts Curricula

Recognizing the important link between curriculum design and curriculum evaluation, my first questions focused on the influences on English language arts curricula.

Table 2 - Factors That Shape Curriculum

1. In your school district (or the school districts you know best) how significant are the following factors in shaping the English Language Arts curriculum?

                                Significance = Very   Moderately   Not
a. accreditation bodies (n=91)                 34.0%     34.1%    31.9%
b. college expectations (n=126)                49.9      36.5     13.5
c. administrator/board interests (n=130)       44.5      39.1     16.1
d. professional literature (n=129)             44.9      40.2     14.7
e. community influences (n=130)                30.7      50.5     19.1
f. faculty skills/interests (n=132)            46.9      44.6      8.2
g. school schedule (n=128)                     28.0      42.0     29.6
h. student needs/interests (n=132)             48.4      39.0     12.0
i. test results (n=123)                        44.6      44.6     10.5
j. other - tradition, state curricula/tests, supervisors

Marginal comments about factors that shape English language arts curricula sometimes explain district-wide curriculum situations, e.g., "In this large school district there are great variations by sub-district and by school. Although there is a district-wide 'standardized curriculum' in all subject matter areas, K-12, it is mainly a listing of topics, and not descriptive of subject matters." Others offer a particular definition of curriculum, e.g., "I will answer all these questions from the vantage point of curriculum as subject matter selected and organized on the basis of a curriculum design."

Although several influences outside the classroom are perceived as significant, "college expectations" is most often ranked as very significant in the shaping of curriculum, slightly higher than even "student needs and interests." K-12 English language arts professionals still, apparently, see themselves like their early twentieth-century counterparts--strongly influenced by post-secondary forces beyond their control. Other highly significant factors are "administrator and board interests," "professional literature" (no doubt more important to this group of professional leaders than to a general population of educators), "faculty skills and interests," "student needs and interests," and "test results." (The low number of responses for "accreditation bodies" can perhaps be explained by the placement of that item on the questionnaire, which may have led some respondents not to notice that it was actually the first item. Otherwise, almost all respondents checked each item.)

Fewer than half rank "test results" as very significant, which seems a little surprising in light of all the rhetoric in professional publications about testing. However, almost 90 percent rank test results as at least moderately significant. Overall, when "very" and "moderate" figures are combined, the three most significant factors in shaping English language arts curricula are teachers, students, and tests. These results indicate what most English educators perceive as appropriate central roles for teachers and learners in the classroom. The fact that test results rank high is consistent with opinions expressed in professional publications that tests already exert considerable control over curriculum.

Table 3 - Curriculum Evaluation

2. In your school district (or the districts you know best) how significant are the following groups in evaluating the English/Language Arts curriculum?

                                Significance = Very   Moderately   Not
a. teaching faculty (n=132)                    67.4%     19.7%    12.8%
b. curriculum coordinator/principal (n=131)    59.5      29.0     11.4
c. external consultant (n=127)                  8.6      33.9     57.5
d. students (n=129)                            15.5      39.6     45.0
e. accreditation bodies (n=128)                28.9      43.8     27.3
f. community/board (n=129)                     30.3      41.8     28.0
g. other - state, parents

English language arts teachers clearly are perceived as most influential in evaluating English language arts curricula, a reassuring result for most English educators, though the fact that more than 1 in 10 rank faculty as not significant in evaluating curriculum is disturbing. Curriculum coordinators and principals are also ranked as very significant. These persons are all part of school-based or district-based staff; least significant are external consultants. According to respondents, students play a noticeably insignificant role in evaluating English language arts curricula, despite current professional discussions about learning communities and negotiated curricula. Thus, the reality of English language arts classrooms appears to be a fairly traditional one, with classroom teachers as the primary evaluators of the curriculum.

The following three questions about curriculum guides may have implied that guides are more important as a curriculum factor than others believe them to be. Although my historical study revealed that relatively little is said about curriculum guides in English language arts publications, my own experience has led me to believe that curriculum guides are, officially at least, considered important.

Table 4 - Designing and Revising Curriculum Guides

3. In your school district (or the districts you know best) how significant are the following persons in designing and revising curriculum guides?

                                Significance = Very   Moderately   Not
a. teaching faculty (n=132)                    81.0%     13.6%     5.3%
b. curriculum coordinator (n=126)              67.5      28.6      4.0
c. principal/administrator (n=130)             21.6      43.9     34.6
d. other - state, external consultant, students, curriculum committee, parents

Question 3 responses make clear that curriculum guides are considered a matter primarily for classroom teachers and curriculum coordinators to produce. One respondent explained that usually the curriculum coordinator chaired the curriculum guide committee, serving as a facilitator "but does not impose change." The range of other persons added by respondents as significant in designing and revising curriculum guides suggests that guide writing can be a process that takes into consideration a variety of perspectives.

Table 5 - Use of Curriculum Guides

4. What use is made of curriculum guides in your school district (or districts you know best)?

a. teachers are expected to follow them explicitly ... 26.3%
b. they are intended to be followed loosely .......... 52.6
c. they are often ignored ............................ 29.3
d. other - "non-existent," intended to be followed but modified/added to, designed to "model" plans and choices

The inclusion of the "often ignored" item seemed to have the effect of a signal to respondents that I did not necessarily expect them to respond in party-line terms and that I was aware that official policy and actual experience do not always match. Responses about the use made of curriculum guides suggest that a fair amount of curriculum guide writing may be a matter of going through the motions, since almost 1 in 3 respondents perceive curriculum guides to be "often ignored."
Many respondents seemed eager to express a positive or negative judgment about curriculum guides, for their comments often seem either skeptical or defensive. For example, one school-district respondent said, "District guides are often ignored. Our own program is adhered to." A university or state department respondent said, "One has no way of knowing . . . teachers too often say one thing and do another." On the other hand, several English language arts professionals within school districts defend the use of their curriculum guide. One called it a "working document." Another said it is "highly useful . . . because of the model integrated plans, the flexibility, and choices." Another especially enthusiastic respondent explained that the "guide is designed to encourage teachers and kids to discover together how to make learning happen."

Table 6 - Value of Curriculum Guide

5. What value does an English/Language Arts curriculum guide have?

a. its development/revision provides occasion for faculty to discuss pedagogical issues and seek consensus ......... 78.9%
b. provides information for new teachers in the district ... 8.8
c. provides an official source for reference by administrators/teachers in discussion with parents/board members ..... 73.7
d. other - assures equity and prevents overlap across district, identifies skills for mandated tests, articulates K-12 program, improves instruction, used for evaluation

The range of "other" responses suggests that the value of curriculum guides may be a more complex issue than my prompts indicated. Some respondents see the guide as providing a "scaffold" for the curriculum which, another respondent reports, "prevents overlapping" and, according to another respondent, prevents "undue repetition"--all responses which seem to imply a traditional curriculum structure. Some see curriculum guides as linked to assessment: "They are a check and balance for teachers - an assessment resource to see if their yearly curriculum addresses district expectations." Another respondent explains that the guide "identifies skills . . . on the graduation test." Again, the actual value of curriculum guides is perceived as different, in some cases, from what is officially stated: "Although #1 is quoted, in actual practice final decisions are made by curriculum coordinator and administrator, who may know nothing about field." Another respondent said that they "make it appear state framework is being followed," and another agreed that "they're for show." Apparently, situations differ to a great extent, and perceptions of those situations differ as well.

English Language Arts Teaching Practices

It is interesting to notice the shifts in influence that various persons have when issues regarding English language arts teaching practices are considered.

Table 7 - Persons Who Determine Teaching Practices

1. In your school district (or the districts you know best) how significant are the following persons in determining teaching practices?

                                 Significance =  Very   Moderately   Not
a. classroom teacher (n=127)                     84.9%    12.5%      2.3%
b. curriculum coordinator (n=120)                35.0     47.5      17.4
c. principal (n=123)                             29.2     50.3      20.2
d. other - department head, superintendent, state, community/board, students

Again, classroom teachers are perceived as most significant in determining teaching practices, with fewer than 3 percent of respondents reporting teachers as not significant in determining teaching practices.
These figures may be somewhat surprising in light of general impressions sometimes expressed that external forces are wresting control from teachers. English language arts professionals seem to think that tests exert more control over the content of the curriculum than over teaching practices. Some respondents do, of course, point out that their districts do not have a curriculum coordinator. Principals seem not to be considered instructional leaders among these respondents.

Table 8 - Evaluation of Teaching Practices

2. In your school district (or the districts you know best) how significant are the following factors in evaluating teaching practices?

                                 Significance =  Very   Moderately   Not
a. principal (n=129)                             77.6%    21.7%      0.8%
b. curriculum coordinator (n=122)                26.3     40.2      33.6
c. peer teachers (n=124)                         20.9     34.7      44.3
d. parents/board members (n=122)                 10.7     35.3      54.1
e. students (n=125)                              11.2     32.8      56.0
f. teacher self-evaluation (n=121)               34.7     41.4      15.7
g. other - department chair, test results

If teachers are perceived as most significant in determining teaching practices, clearly principals are perceived as most significant in evaluating teaching practices. One respondent commented that the principal as evaluator "is governed by the contract," a situation that may be true almost universally. Again, students are cited as especially not significant, as are parents and board members, again implying a traditional school and classroom structure.

Table 9 - Influences on Changing Teaching Practices

3. In your school district (or the districts you know best) how significant are the following influences in changing teaching practices?

                                 Significance =  Very   Moderately   Not
a. professional literature (n=131)               35.0%    46.5%     18.3%
b. inservice/staff development
   training (n=103)                              38.7     56.3       4.8
c. exchange of ideas among teachers
   in building (n=131)                           59.5     33.5       6.7
d. exchange of ideas among teachers
   in district (n=127)                           35.3     44.0      20.4
e. exchange of ideas within wider
   group (n=127)                                 44.0     35.3      20.4
f. administrators/board (n=129)                  17.0     44.1      38.7
g. constraints of facilities/
   school schedules (n=127)                      24.3     49.5      25.9
h. student needs/interests (n=129)               42.5     40.2      17.0
i. test results (n=129)                          35.6     47.9      16.2
j. community (n=107)                             10.2     44.7      44.7
k. other - writing project, state, department chair/curriculum coordinator

When teaching practices are changed, respondents indicate that teachers lead the way, especially when they exchange ideas and information among colleagues in their own school building. Somewhat surprising is the strong showing of "exchange of ideas within a wider group (e.g., writing project support group)." Both these responses are consistent with the optional essay responses, which underscore the need teachers feel for time to grow professionally. They also suggest a sense of confidence in their own ability to be decision-makers. Interestingly, whereas principals are perceived as especially significant in evaluating teaching practices, administrators are cited as almost the least significant influence on changing teaching practices. Although principals have the official authority, English language arts professionals seem not to depend on administrators' advice as much as they do on other resources. Only 1 in 10, moreover, perceive community influences to be very significant.
If "very" and "moderate" responses are added, "professional literature" and "inservice/staff development training" are cited as significant by over 80 percent of respondents (again, perhaps reflecting the professionalism of this sample of respondents). One respondent, for example, has observed that professional literature is significant now "much more so than ten years ago." Classroom influences are also cited as at least moderately significant by 80 percent or more of respondents, for English language arts professionals perceive both "student needs and interests" and "test results" as significant in changing teaching practices.

English Language Arts Student Assessment

In considering criteria for assessing English language arts student performance, I focused first on district-wide measures and then on classroom measures.

Table 10 - School District Means of Assessment

1. In your school district (or the districts you know best) which of the following tests or means of assessment are used?

a. SAT, ACT, CAT, other nationally-normed tests .......... 90.2%
b. state-mandated tests .................................. 84.2
      objective .......................................... 65.4
      writing sample ..................................... 66.9
c. holistic scoring of writing samples ................... 78.2
d. student portfolios .................................... 44.4
e. other - district tests, criterion-referenced tests, end of level/book/basal tests

Clearly, percentages are high for all tests and measurements mentioned. Nationally-normed tests and state-mandated tests are common throughout the country's school districts. Beyond these tests, however, holistically scored writing samples seem also to have found a significant place in district English language arts assessments.

The marginal comments make it clear that respondents consider portfolios the current assessment "cutting edge" and are eager to try them (several indicate portfolios are currently being phased in). It seems remarkable that 54 percent of those employed by school districts report using portfolios in their districts, whereas only 19 percent of those from universities or state departments of education seem aware of portfolio use in school districts they know best. Although portfolio assessment can mean different things to different people, and although these responses might reflect respondents' eagerness to appear up to date, it seems especially important to me--and perhaps enlightening to university and state respondents as well--that so many districts seem to be using some kind of portfolios. I say this because, as the historical review indicates, alternative assessment forms have often been recommended without always being put into general practice.

Table 11 - Classroom Means of Assessment

2. In the classrooms of your district (or the districts you know best), what means of assessment are currently being used for English/Language Arts?

a. objective tests ....................................... 91.7%
b. essay tests ........................................... 88.0
c. observation of students ............................... 74.4
d. interaction with students ............................. 58.6
e. student compositions .................................. 92.5
f. student performances .................................. 67.7
g. oral reading .......................................... 43.6
h. student portfolios .................................... 63.2
i. contractual grading ................................... 36.1
j. student self-evaluation ............................... 40.6
k. other - group evaluations/collaborative projects, peer evaluations

More traditional forms of classroom assessment, such as objective tests and student compositions, are apparently used in 9 out of 10 classrooms, while essay tests follow close behind. Observations of students make a strong showing as well, though observation can be interpreted broadly to mean a variety of things, from carefully planned formal and informal classroom observations to vague "participation" grades given on the basis of random impressions. "Student performances" are included by 2 out of 3 respondents, with one even commenting that "Our policy on examinations requires 'a final culminating experience.'" Portfolios are cited by almost 2 out of 3 respondents, again an indication that this form of assessment is already a part of classroom practice. "Student self-evaluation," however, receives a relatively low rank, even though so much is being written in professional publications about the importance of self-evaluation and even though it seems a relatively easy assessment form to implement in the classroom.

A variety of factors, some of which have appeared in earlier prompts, influence decisions about how English language arts student performance might be assessed.

Table 12 - Factors Determining Means of Assessment

3. In your school district (or the districts you know best) how significant are the following factors in determining means of assessment of English/Language Arts?

                                 Significance =  Very   Moderately   Not
a. professional literature (n=130)               34.5%    43.9%     21.5%
b. inservice/staff development (n=128)           45.3     41.4      13.3
c. exchange of ideas among teachers
   in building (n=123)                           47.2     43.1       9.7
d. exchange of ideas among teachers
   in district (n=121)                           30.6     47.1      22.3
e. exchange of ideas within wider
   group (n=121)                                 24.0     44.7      31.7
f. administrators/board                          30.9     48.4      20.8
g. time constraints                              32.7     47.1      20.2
h. other - state, lack of money and time, district grading policy

Again, exchange of ideas among teachers in the building is most significant (90 percent of respondents rank it as at least moderately significant), though, inexplicably, exchange of ideas within a wider group is least influential in determining the means of English language arts assessment. Apparently, while Writing Project and similar support groups have influenced curriculum decisions, they have had less impact on assessment. "Inservice and staff development" is perceived as significant (86 percent saw this item as at least moderately significant), as are "administrators/board" and "time constraints."

Several respondents acknowledge and affirm "time constraints" as an important factor--perhaps one they had not considered before. One respondent, for example, observes that "we put tremendous burdens on ourselves by ignoring [time constraints]," and another commented that "handling the paperload continues to be a problem. Despite staff development on alternatives to 'red-penciling,' e.g., conferencing, teachers/parents/administrators continue to legitimize only edited versions of writing completed by the teacher." Though money issues are not included as a prompt, one respondent points out that budget constraints are another important factor.

Optional Final Essay Items

The optional essay items allowed respondents to some degree to reconsider items and issues raised in the previous questions.
1. Briefly describe, to the extent you are aware of it, the process in your district (or in the districts you know best) for evaluating the English/Language Arts program--curricula, teaching practices, and student performance.

In describing program evaluation processes, respondents were on their own without prompted items to respond to, though they had just had a variety of items suggested to them as they responded to the first part of the questionnaire. From the items mentioned by respondents, the following results have been compiled.

Table 13 - Factors in Program Assessment Process (N = 97)

1. Student test score results ............................ 49.5%
2. State involvement (direct or indirect) ................ 23.7
3. School district committee (teachers, administrators,
   and sometimes parents in multiple-year process) ....... 19.6
4. No process ............................................  9.3
5. Surveys (parents, students, teachers) .................  7.2
6. Observation of teachers ...............................  7.2
7. Alternative assessment information re: students
   (self-evaluations, portfolios) ........................  6.2
8. Professional publications/conferences
   (as indicator of current practice) ....................  4.1
9. External consultants ..................................  4.1

By far, the factor most often mentioned regarding program evaluation is the analysis and influence of standardized test data. Respondents seem to agree with recent professional publications in this regard. One of every two respondents explicitly mentions test results, and undoubtedly others would have done so if they had described their process in detail rather than with a brief note, such as "six-year evaluation process." Some respondents who mention student test scores as a factor use what may or may not be neutral language ("Student performance on standardized tests, i.e., CAT, DRP, is the greatest determinant of how effective teaching practices are evaluated"). Others more openly express their unhappiness with what is perceived as over-use and abuse of tests ("Test scores are judged not to reflect what is being taught and are disregarded on one hand while used as evidence/proof/reason why change cannot occur on the other!").

One of the most striking features of the responses received for this item is the fact that almost 1 in 5 responses mention only the part student test scores play. Because the item follows the student assessment part of the questionnaire, it is possible that some respondents misread this item as related only to student assessment rather than to program assessment. However, most of these respondents make it clear that, while they understood the question was about overall program evaluation, test scores are essentially the only measure by which English language arts program evaluation occurs. For example, one respondent explains, "All programs in this district are judged by test results only . . . the board and superintendent have an obsession with test scores; all teaching behaviors, curriculum, schedules, etc., are controlled by [tests]. Teachers essentially teach to tests most of the year." Perhaps one of the more insightful responses regarding the use of test scores is the following: "Most evaluation is in the hands of the teacher and will probably stay there as long as test scores remain high."

Almost 1 in 4 respondents mention the influence of state guidelines or state-mandated curricula and/or assessments.
Since some of the respondents are in fact state employees, it is not surprising that comments about state involvement are often positive ("Districts are presently trying to meet new state guidelines in building curriculum that includes student needs, reflect the state-of-the-art practice, and community needs.").

Committees seem to provide the district-level vehicle by which English language arts program evaluation most often occurs, with committee members drawn sometimes strictly from faculty, sometimes from faculty and administrators, and sometimes from parents, board members, and community as well. Surveys, interviews, and questionnaires are sometimes used with all of these groups and sometimes also with students. A few respondents express dissatisfaction with committee procedures, however, such as the respondent who described a cycle of curriculum adjustments that seemed entirely driven by the curriculum coordinator, who apparently promotes a particular program or procedure until "two years later that dies and he begins some process for another idea." Another respondent explains that, "Our committee teachers end up being 'yes men' or get disgusted and give up."

Any mention of evaluation of teaching practices seems almost entirely limited to observations by administrators. Still, if "superintendents' contracts may depend on the scores," the result is said to be "a great deal of top-down influence on the English language arts curriculum." Other responses regarding teaching practices range from "Evaluation of teaching practices receives very little attention. Teachers may teach the way they think is best" to "We are told what must be done and taught. We are also told to enforce, almost to the page, the guides given to us to use."

Almost 10 percent of respondents report that they have no program evaluation process. One called the process "haphazard" with "no formal evaluation procedures." Another reported that there is "no process--the district administration reviews exam results and tells me to get better results." One respondent added that, "Curriculum is constantly being written--NEVER implemented or evaluated."

Once respondents had vented some of their assessment frustrations in describing "what is," most were ready to think more positively as they considered "what could be."

2. If this process of evaluating the English Language Arts program could be altered, what changes might help decision-makers be in a better position to suggest improvements?

Again, respondents needed to generate their own answers rather than respond to items provided. Although one response in regard to suggestions for change is simply a sarcastic, "Only if God were there to give them the 'right answers,'" most respondents seemed to offer serious suggestions. As they did so, some may have drawn on the prompts suggested by the items listed in the earlier sections of the questionnaire. From the items mentioned by respondents, the following results have been compiled:

Table 14 - Suggested Improvements for Program Evaluation (N = 78)

1. Greater knowledge re: assessment, research ............ 16.5%
2. Time to reflect, read, share ideas, foster
   collegiality .......................................... 15.5
3. Teachers sharing in decision-making ................... 11.3
4. Staff development/inservice ........................... 10.3
5. Use of portfolios .....................................  9.3
6. Money .................................................  6.2
7. Using observation, interviews, ongoing assessment .....  5.2
8. Fewer tests/better assessments ........................  4.1

The responses mentioned in this item generally have more to do with the context of, or conditions for, program assessment than with evaluation processes per se. Respondents' first concerns are not about new procedures but about professional responsibility and opportunity. One respondent, for example, sees the need for "research information in brief form to give to administrators, parents so they understand and support changes and improvements that will reflect current, research-based practices." Another respondent expresses the desire for "real hard evidence that shows the lack of accuracy of standardized tests and their destructive influence," while another seeks "better understanding of assessment as a means to improve instruction vs. testing for accountability."

Respondents seem to recognize that program evaluation takes time, but interestingly, respondents report a need for time to share ideas with colleagues, to read and reflect and become knowledgeable, or, as one respondent put it, "time to share and discuss and read about what works and be allowed and supported to make change." If English language arts teachers had the knowledge and time, some respondents reason, they should share more significantly in the decision-making done in regard to English language arts. Some respondents, however, do express satisfaction with their own present systems and boast, "Our school faculty has been given great autonomy in developing our own philosophy with regard to all the curriculum areas." What they seek for themselves, they seek for others as well, listing staff development and inservice as a high priority--although some of the responses express rather smug and superior attitudes toward their colleagues, complaining that they "do not read professional literature" and that "if we didn't change books every seven years, nothing would change."

Although classroom teachers are infrequently involved directly with specific money issues, they live with the consequences of limited or plentiful funds every day. A few respondents seemed to realize an important connection between money and programs and therefore mention money as an item important to improved program assessment. When they mention money, however, it is not money to pay for testing that they want but rather money to develop the professionalism to be capable curriculum evaluators themselves ("If teachers are to be truly involved in curriculum maintenance district wide, they need time and pay to do it").

In regard to changes that might affect classroom assessment, respondents mention portfolios most often, though some caution that they need to know how to use portfolios effectively--as "more than just 'holding bins.'" Another respondent reports that a portfolio system would be desirable but "NOT where portfolios are holistically scored but where teachers sit together and discuss student work periodically--first with students, then with other teachers and administrators." In addition to mentioning portfolios, respondents also occasionally mention a variety of other informal classroom assessments (e.g., observation, interviews, ongoing assessment). A number of other suggestions are offered, such as the need for an English language specialist rather than generalists in the district office, since sometimes "everyone, regardless of background, sees themselves as knowledgeable about English."
Others mention the need for better coordination and communication among K-12 language arts teachers.

Although relatively few mention tests as a part of their suggestions for improved program evaluation, those who do have strong feelings about the issue. One respondent reports positively, "We need to stop fearing evaluation and begin to look at it as an informative process. Then we need to engage in evaluation frequently--formative, diagnostic, and summative." More often, however, respondents continue to rail against the power of tests, with one conceding, "We hardly rely on teachers' statements about students at all," and another explaining, "The results of state mandated testing are having PROFOUND impact upon the curriculum now--much more so than any previous source of information. We are feeling intense pressure to modify curriculum to address weaknesses of our students' testing. The goal now is to improve objective test scores."

While one respondent concludes that, "I feel, and fear, that each school district will have to devise their own means of assessment to better accommodate their needs," another mentioned the need for greater involvement in professional organizations, such as NCTE, and expresses the desire for specific help from such organizations (since the questionnaires were sent specifically to NCTE members, such responses are not surprising). Another respondent expresses the wish that NCTE would provide "more exposure . . . to programs of English across the state and nation" by including program descriptions in journals, so that "interested readers could write directly to the schools for detailed information." On the other hand, another respondent mentions the helpfulness of the NCTE Recommended Curriculum Guides, while another exclaims, "Thank God for support of professional organizations like NCTE and a few enlightened leaders and teachers."

Although attitudes are hard to judge in some cases, it is my impression from studying the questionnaire responses that the respondents might be fairly evenly divided between those who seem relatively content with English language arts program development and evaluation as they know and experience it and those who seem primarily frustrated, impatient, or disappointed. One respondent said, "We're pretty well-off, frankly," and another said, "No need to change." Most, however, had specific suggestions and opinions about what might improve evaluation processes. If there is a recurring theme among their suggestions, it is the belief (a belief reflected in current professional publications as well) that English language arts professionals can and should see themselves as change-agents in district English language arts programs. If curricula and student assessment are to change, these respondents believe, English language arts teachers themselves hold the key to such changes.

CHAPTER NINE

CONCLUSIONS, SPECULATIONS AND RECOMMENDATIONS

A review of the history of the criteria by which English language arts programs have been assessed, along with consideration of the contexts in which such assessment has occurred, can provide valuable insights to help address the issues and needs of today. The historical review combined with an analysis of data regarding current thinking about these issues can lead further to conclusions about the past and present, speculations about their significance, and recommendations of how best to address the issues and needs of the future.
Both the context and the criteria questions raised in chapter one are complex indeed. There is not one single purpose for evaluation and assessment, nor one single person or group who should evaluate and set standards. There are a variety of groups that are served by assessment and that might ultimately stand to lose. There are countless criteria that might serve to help evaluate English language arts programs. Given what has been learned about the past and the present, the following conclusions, speculations, and recommendations have emerged. Because my study has raised as many questions as it has answered, I will not attempt to offer resolutions in every case but will hope that in some cases new questions might offer as much or more illumination of the issues than would efforts to force closure.

Evaluation of English Language Arts

1. History and present circumstances reveal that there is both an internal function for English language arts assessment and an external function as well. By internal, I mean all evaluation that goes on within the classroom community between students and the classroom teacher. By external, I mean all assessment that is initiated by someone other than students and their classroom teacher. This study has shown that there is now and always has been external evaluation as well as internal classroom evaluation. From the colonial days when the selectmen and ministers visited the schools to hear students recite, there have been external evaluations of literacy learning in this country. In a perfect world there would be only classroom evaluation, or perhaps only self-evaluation. Teachers would understand exactly what each learner needed and provide it. In an imperfect world, however, both internal and external assessment are realities that cannot be denied.

2. When standardized and objective tests first appeared, they were hailed as a correction to teachers' subjective impressions (e.g., p. 26 and p. 32). Today there is renewed criticism of the subjective and idiosyncratic nature of teachers' grades (e.g., Canady and Hotchkiss), which could be cited by those advocating national tests and standards. In spite of the recent loss of faith in standardized tests expressed by many, there seems to be little thought (by anyone except perhaps classroom teachers) of going back to depending solely on teachers' grades.

3. The primary purpose for English language arts evaluation should be to improve students' literacy learning. There are many other worthwhile purposes, but students' needs should be the first priority. Some of the tests discussed earlier in this study have, in fact, clearly resulted in harm to students and their learning. Therefore, it is important that English language arts assessments have clear purposes that, as much as humanly possible, will not do harm.

4. English language arts professionals believe that testing drives curriculum. Historical and current data reveal that testing, to a greater or lesser degree, has almost always driven curriculum (e.g., the effects of early twentieth-century college entrance exams on secondary curricula, p. 20). This may mean today simply that a classroom teacher adjusts a lesson after grading a test, or it may mean that a school district promotes or dismisses teachers on the basis of the district's student test scores. Although we can regret this situation, it seems that we are now at a point where any new test--or means of assessment--must be worth teaching toward.
5. Historically it has been true that some parts of English language arts seem more difficult to assess than others. Perhaps the most difficult to evaluate has been students' understanding and experience of literature (pp. 134-36)--which sometimes yields few observable behaviors. Or perhaps the most difficult has been the spontaneous language arts--speaking and listening (p. 52). At any rate, as this study has shown, each of the language arts has its own special evaluation characteristics and difficulties. Whole language teachers have sought ways to observe students' language in the context of classroom activities. However, holistic efforts to integrate learning and assessment should not ignore the distinctive evaluative characteristics of the individual language arts.

6. Practicality has always been a consideration when decisions have been made about English language arts assessment. However, practicality should not be the primary consideration. When standardized tests were introduced, English language arts teachers as a rule embraced them (e.g., p. 45). If some teachers questioned their validity, those teachers also knew that the alternative was to go back to reading stacks and stacks of student essays. Under the circumstances, the practicality argument must have seemed very seductive. More recently, writing skills have been easily and economically "tested" with multiple-choice questions, but such tests do not, of course, actually test writing. Today scoring sessions for writing assessment are significantly more expensive than the old tests, but they are considered worth the investment. Though practicality is a significant issue, for assessments suggested in the future practicality must not be the first consideration.

7. History has shown that self-evaluation is not a new recommendation for English language arts students but one that has been encouraged for many years. Since self-evaluation promotes reflection and learning, today's English language arts students should be encouraged to self-evaluate. Self-evaluation ideally would involve not only simple checklists like some designed in the past but significant thought and discussion as well.

8. Evaluation of English language arts need not be bound by the constraints of old tests and assessments. English educators should continue to explore assessments that acknowledge strengths as well as weaknesses, that consider processes and strategies used, that view error as a necessary part of the risk-taking needed for learning, and that allow for possible responses not anticipated by the test makers.

9. Informal classroom assessments are being advocated now, as they have been in the past (e.g., p. 58). Whole language promotes observation, or "kidwatching," usually citing as examples observations being used with beginning readers and writers. It is uncertain, however, how appropriate such methods are for older students. For example, what effect might observation have on students who are aware that they are being watched--that their comments and even facial expressions might be noted in anecdotal records? What might such practices promote? It seems likely that informal classroom assessments will eventually also be criticized as too subjective and idiosyncratic, in the same way that teachers' grades have been in the past. At some point some students will get shortchanged because of the way a teacher interprets the content of a portfolio or observes a student's participation (or lack of participation) in a collaborative project.
10. Portfolios present potential problems as well. Questionnaire respondents and professional publications report that portfolios are quickly becoming a part of classroom, district, and even large-scale means of assessment. They seem to work especially well in classrooms as teachers and students gather material over time as a record of students' literacy growth and achievement. Beyond the classroom context, however, it is difficult to decide how useful portfolios can be. What criteria for evaluation, for example, allow a poem from one portfolio to be compared with a letter from another? Are portfolios best used for assessment purposes, then, as a way to monitor or document rather than to evaluate? There are also troubling issues in regard to portfolio preparation. How much time will teachers allow or expect students to spend in compiling materials for the portfolio? Will such activities always be an unobtrusive part of classroom routines? If parents help choose what to include, will the child whose parents neglect to save samples of their child's literacy be at a disadvantage? Just what to do with portfolios seems unresolved at this point. (One conference presenter, for example, recently cautioned against portfolio presentations for the school board like those she had witnessed, occasions that became "cutest kids" shows.)

11. As this study has shown, several English language arts teachers have experimented with teaching students the secrets of evaluation (e.g., p. 55). That is, they have removed the evaluation mysteries by sharing the evaluative criteria with students and training them to evaluate their own work and the work of others. This form of self-evaluation seems to merit continued consideration, for it can lead to internalization of evaluative criteria and is consistent with efforts to promote critical thinking.

12. The question of who will evaluate and set the standards is a central issue, one that raises countless questions that need to be considered carefully by English language arts professionals. The question of who will evaluate seems partly settled, in light of this study, if we consider past and present recommendations of English educators: English language arts teachers consider themselves the primary evaluation experts. External evaluation is the issue that generates more of the unanswered questions. Even if we assume there is merit in state and national assessment (an assumption many are not willing to make), the issue of standards is not resolved. Should there be standards? What purpose do standards serve, other than to sort those who fail from those who do not? To what extent should English language arts teachers function as gatekeepers? What do failing scores reveal? What effects do failing scores have on students? On the teacher? On the salaries of administrators?

13. Current recommendations are that evaluation tools should match current theory and practice. A look at history, however, reveals some examples that raise questions about this issue. In the midst of the experience curriculum of the 1930s, for instance, the tests matched the most trivial parts of the curriculum, as seen in the test item cited earlier about how to answer the telephone. The basal reader test-teach-test system has also created a situation in which the tests and the teaching materials match--based on the same theory of learning and published and sold as a package.
Today, as English educators call for assessment to be consistent with current theory, we need to consider carefully the extent to which the curriculum and assessment should match--and to ponder what the alternatives might be.

Evaluation of English Language Arts Teaching Practices

1. As this study has shown, evaluation of English language arts teaching practices has been and currently often is inextricably linked to evaluation of student performance (p. 30, p. 84). Therefore, many of the same issues are of concern. For example, both internal and external evaluation seem to be fixtures of the evaluation of teaching practices. Classroom teachers are encouraged to self-evaluate their teaching practices--and to extend self-evaluation to reflective practice and to classroom research. Self-evaluation is not enough, however.

2. As this study has shown, some educators have insisted that students' test scores provide an objective measure for evaluating teaching practices and therefore should be welcomed by classroom teachers. They have pointed to the subjectivity of administrators' evaluations and offered student test scores as a corrective. Most English language arts teachers realize, however, that student test scores do not supply sufficient evidence of the success or failure of their own teaching practices. Student test scores reflect more than teaching practices--they reflect, for example, students' family and social experiences and needs, the learning theories and biases of the test-makers, and the curriculum and teaching practices of prior school experience. Student test scores should, then, serve as only one criterion by which English language arts teaching practices might be measured.

3. As with evaluation of student performance, evaluation of English language arts teaching practices should acknowledge teachers' strengths as well as weaknesses, consider processes and strategies used, view error as a necessary part of the risk-taking needed for learning (teachers are learners too), and allow for unanticipated classroom possibilities that have the potential to enrich literacy learning.

4. Similar cautions about portfolios seem in order for their use in evaluating teaching practices as for evaluating student performance. English language arts teachers can gather data to document their own exemplary performance, but again there are unresolved issues that need to be considered. How much time and effort, for example, might a teacher spend showcasing her work? Might professionally produced videotapes eventually become the norm? Where might this trend end?

Evaluation of English Language Arts Curricula and Programs

1. Criteria for English language arts assessment--whether for curricula, teaching practices, or student performance--seem to be inseparable from societal and educational contexts. In the early twentieth century, for example, the methodologies of science and business were thought to be directly applicable to education and thus dramatically influenced the criteria by which English language arts were evaluated. Likewise, the behaviorist psychology of the 1970s shaped the forms of assessment of that period. Today's emphasis on holism and integration appears to be having a similar impact in shaping English language arts assessment, though thoughts of national tests and standards may be in direct conflict with this emphasis.

2. Curriculum guides are potentially powerful, for they represent a prescribed curriculum and sometimes prescribed teaching practices and assessment
measures as well. The individual English language arts teacher who wants to consider changes and innovations needs a way to know how much of what kind of deviation will be permitted. Although little is written about curriculum guides in the professional publications, the work of the NCTE Curriculum Guide committee has served a valuable purpose by publishing model curricula and curricular assessment criteria.

3. Criteria for English language arts assessment cannot be separated from the conditions in which assessment takes place. For example, widespread use of objective and standardized tests occurred at least partly because of changed conditions, i.e., growing numbers of students (p. 25). If school administrators had had the money to do so, they might have opted to change the conditions rather than the means of evaluation, but instead they chose the less costly alternative. Conditions such as class size have so much impact on teaching practices and curriculum, at least in part because they are money issues, that they in fact become criteria by which English language arts programs are evaluated.

4. Some English language arts teachers in the past have been the primary decision-makers regarding curriculum and evaluation (p. 65). Current professional publications and questionnaire data show that some English language arts teachers actively involved in the profession seek for themselves and their colleagues time, knowledge, and opportunity--in order to become better classroom evaluation experts and in order to make decisions affecting the design and assessment of curricula, teaching practices, and student performance. These educators realize that they and their colleagues cannot be expected to switch from using published multiple-choice tests one day to using portfolios and observation the next and therefore seek the time and knowledge to become better informed, self-confidently believing in their own ability to become curriculum designers and evaluators.

5. Historically, English language arts professional organizations have helped English educators and district administrators determine how to assess curricula, teaching practices, and student performance. NCTE's resolutions have sometimes served at least as indirect criteria by which English language arts programs and practices could be assessed. As NCTE's past actions regarding testing and evaluation have been traced, it has sometimes been difficult to know whether those actions were helpful or not. As this study shows, NCTE has from time to time jumped on testing bandwagons, e.g., by passing the 1923 resolution encouraging English language arts teachers to use more standardized tests and by publishing books and journal articles extolling the benefits of standardized and objective tests. Without a crystal ball, it must be difficult for NCTE leaders to chart the best course. Because of such difficulties, it seems clear that the professional publications should maintain their policy of providing a forum in which a range of opinions can be heard.

6. A look at the large-scale studies of English language arts programs in the past is instructive today. When we consider the quantities of information gathered in the past (sometimes involving hundreds of interviews and classroom observations and many thousands of documents), it is easy to wonder what use may have been made of the findings.
As this study has shown, often it has been recommended that school district administrators and English language arts teachers use study results to compare their own circumstances to those described in the studies. Their comparisons might then lead teachers and administrators to use the data to highlight their own needs and to argue for improved conditions in their own situations. Such studies rarely occur today, at least in English language arts circles, surely at least partly for financial reasons. If government and private money was used for such projects in the past, it seems likely that some of the money once spent on such studies is today being spent on efforts to design new ways to test students. Perhaps the link is not a strong one, but large-scale studies of the past allowed teachers to evaluate and perhaps reform their own circumstances in light of what was happening elsewhere. Today's or tomorrow's tests of student performance may not yield information that is as useful. Whether large-scale studies are feasible today is uncertain, but a suggestion offered by a questionnaire respondent seems worth considering--that NCTE should supplement conference sessions with journal descriptions of specific programs, providing enough detail so that teachers could understand both the programs and the conditions that make the programs possible. Both the lessons from history and the current questionnaire responses suggest that English language arts journals should also include more discussion of the political and money issues that are seldom included in current English language arts professional publications but that have significant impact.

The ideal evaluation of English language arts programs would be multi-dimensional, incorporating the perspectives of a variety of experts and evaluation stakeholders, as suggested by Goodlad's descriptions of curriculum cited in chapter one. This study has presented a variety of additional useful English language arts evaluation criteria and processes from the past and the present--everything from teachers' grades and standardized test scores to library circulation figures and the music habits of English teachers. While it is impossible to dictate which criteria should be used in every English language arts program, it seems especially important that students' English language arts test scores should serve as just one criterion by which student performance is measured, as just one criterion by which teaching practices are measured, and as just one criterion by which curriculum is evaluated. No single English language arts criterion should by itself decide major curricular, teaching practice, or student performance issues.

APPENDIX A

Criteria for Assessing English Language Arts Curricula, Teaching Practices, and Students' Performance
Ellen H. Brinkley, 1990

For each question please check as many items as apply FOR THE SCHOOL DISTRICT(S) YOU KNOW BEST. Feel free to add explanation and/or comment in the margins or on the back.

ENGLISH LANGUAGE ARTS CURRICULA:

1. How significant are the following factors in shaping the English/Language Arts curriculum? (Each item was followed by a rating scale running from "very significant" to "not significant.")
     accreditation bodies (e.g., North Central, state)
     college expectations
     administrator and board interests
     professional literature
     community influences
     faculty skills and interests
     school schedule
     student needs and interests
     test results
     other: ________
2. How significant are the following groups in evaluating the English/Language Arts curriculum? (same rating scale)
     teaching faculty
     curriculum coordinator/principal
     external consultant
     students
     accreditation bodies
     community/board
     other: ________

3. How significant are the following persons in designing and revising curriculum guides? (same rating scale)
     teaching faculty
     curriculum coordinator
     principal or other administrator
     other: ________

4. What use is made of curriculum guides?
     teachers are expected to follow them explicitly
     they are intended to be followed loosely
     they are often ignored
     other: ________

5. What value does an English/Language Arts curriculum guide have?
     its development/revision provides occasion for faculty to discuss pedagogical issues and seek consensus
     provides information for new teachers in the district
     provides an official source for reference by administrators and teachers in discussion with parents and board members
     other: ________

ENGLISH LANGUAGE ARTS TEACHING PRACTICES:

1. How significant are the following persons in determining teaching practices? (same rating scale)
     classroom teacher
     curriculum coordinator
     principal
     other: ________

2. How significant are the following factors in evaluating teaching practices? (same rating scale)
     principal
     curriculum coordinator
     peer teachers
     parents/board members
     students
     teacher self-evaluation
     other: ________

3. How significant are the following influences in changing teaching practices? (same rating scale)
     professional literature
     inservice/staff development training
     exchange of ideas among teachers in the building
     exchange of ideas among teachers in the district
     exchange of ideas within a wider group (e.g., writing project support group)
     administrators and board
     constraints of facilities and school schedule
     student needs and interests
     test results
     community
     other: ________

ENGLISH LANGUAGE ARTS STUDENT ASSESSMENT:

1. Which of the following tests or means of assessment are used?
     SAT, ACT, CAT, other nationally-normed tests
     state-mandated tests: objective / writing sample
     holistic scoring of writing samples
     student portfolios
     other: ________

2. In the classrooms of your district, what means of assessment are currently being used for English/Language Arts?
     objective tests
     essay tests
     observation of students
     interaction with students
     student compositions
     student performances
     oral reading (e.g., miscue analysis)
     student portfolios
     contractual grading
     student self-evaluation
     other: ________

3. In your school district how significant are the following factors in determining means of assessment of English/Language Arts? (same rating scale)
     professional literature
     inservice/staff development
     exchange of ideas among teachers in the building
     exchange of ideas among teachers in the district
     exchange of ideas within a wider group
     administrators and board
     time constraints
     other: ________

IF YOU HAVE THE TIME AND PATIENCE . . .
1. Briefly describe, to the extent you are aware of it, the process in your district for evaluating the English/Language Arts program--curricula, teaching practices, and student performance.

2. If this process of evaluating the English Language Arts program could be altered, what changes might help decision-makers be in a better position to suggest improvements?

APPENDIX B

WESTERN MICHIGAN UNIVERSITY
Department of English
Kalamazoo, Michigan 49008-5092

May 10, 1990

Dear NCTE Classroom Practices in Teaching English Committee Member:

As English educators we have witnessed dramatic changes both in theory and in classroom practice. Given today's emphasis on test scores and student assessment, we often find school districts self-consciously wondering how effective their programs are, how they compare with others like or different from themselves, and how well their students and faculties measure up.

I am conducting research to discover techniques and criteria that school districts, and perhaps external consultants, can use to assess the effectiveness of K-12 English Language Arts (1) curricula, (2) teaching practices, and (3) student performance. Your knowledge, experience, and opinions can provide invaluable information that will be useful to this study and to the profession.

Please take a few minutes to complete the enclosed survey and return it to me by ________, 1990, if at all possible. You need not identify yourself, but if you do, I'll be happy to send you the results of the survey once the study is completed.

Sincerely,

Ellen H. Brinkley

EHB:ct

BIBLIOGRAPHY

Achtenhagen, Olga. "Why Is an Examination--And What of It?" English Journal 15 (1926): 285-89.

Allen, Virginia F. "Riddle: What Does a Reading Test Test?" Learning, Nov. 1978: 87-89.

APEX Evaluated and Revised. Trenton, MI: Trenton Public Schools, 1975.

Applebee, Arthur N. Tradition and Reform in the Teaching of English. Urbana: NCTE, 1974.

---. Writing in the Secondary School. Urbana: NCTE, 1981.

Applebee, Arthur N., et al. Learning to Write in Our Nation's Schools. U.S. Dept. of Education, 1990.

Aronson, Edith, and Roger Farr. "Issues in Assessment." Journal of Reading 32 (Nov. 1988): 175-77.

Association of English Teachers of Western Pennsylvania. Suggestions for Evaluating Junior High Writing. Champaign, IL: NCTE, n.d.

---. Suggestions for Evaluating Senior High Writing. Champaign, IL: NCTE, n.d.

Atwell, Nancie. In the Middle. Upper Montclair, NJ: Boynton/Cook, 1987.

Austin, Mary C. "Evaluating Status and Needs in Reading." Evaluation in Reading. Ed. Helen M. Robinson. Supplementary Educational Monographs No. 88. Chicago: U of Chicago, 1958. 36-41.

Backlund, Phil, et al. "Evaluating Speaking and Listening Skill Assessment Instruments." Language Arts 57 (1980): 621-27.

Baker, Franklin T. "The Teacher of English." English Journal 2 (1913): 335-43.

Barnes, Walter, et al. "Judging Teachers' Judgments in Grammar Errors." English Journal 18 (1929): 120-25+.

Bay Village (Ohio) City Schools. "District Holistic Assessment of Reading Scale."

Beverly, Clara. "Standards in Oral Composition: Grade One." Elementary English Review 2 (1925): 360-61.

Black, Janet K. "There's More to Language Than Meets the Ear." Language Arts 56 (1979): 516-33.

Bobbitt, Franklin. How to Make a Curriculum. Boston: Houghton, 1924.

Brett, Sue M. "The Federal View of Behavioral Objectives." On Writing Behavioral Objectives for English. Eds. John Maxwell and Anthony Tovatt. Champaign, IL: NCTE, 1970. 43-47.
Brinkley, Ellen Henson. "A Gift of the Past." Goldenseal 10.4 (1984): 4-8.

Broening, Angela M. "English as Experience in Secondary Schools." Essays on the Teaching of English. NY: Russell and Russell, 1940. 58-78.

---. "The Role of the Teacher of English in a Democracy." English Journal 30 (1941): 718-29.

Broening, Angela M., et al. Conducting Experiences in English. NY: Appleton-Century-Crofts, 1939.

Brown, Margaret E. "A Practical Approach to Analyzing Children's Talk in the Classroom." Language Arts 54 (1977): 506-10.

Brown, Rexford. "The Examiner Is Us." English Education 16 (1984): 220-25.

Burns, Paul C. Diagnostic Teaching of the Language Arts. Itasca, IL: F. E. Peacock, 1974.

Buros, Oscar Krisen, ed. English Tests and Reviews. Highland Park, NJ: Gryphon Press, 1975.

Burrill, Lois E. "How Well Should a High School Graduate Read?" NASSP Bulletin, Mar. 1987: 61-71.

Burton, Dwight L. Literature Study in the High Schools. NY: Henry Holt, 1959.

Burton, Dwight L., et al. Teaching English Today. Boston: Houghton Mifflin, 1975.

Bussis, Anne M., and Edward A. Chittenden. "Research Currents: What the Reading Tests Neglect." Language Arts 64 (1987): 302-08.

Calkins, Lucy McCormick. The Art of Teaching Writing. Portsmouth, NH: Heinemann, 1986.

Camenisch, Sophia Catherine. "Some Recent Tendencies in the Minimum-Essentials Movement in English." English Journal 15 (1926): 181-90.

Canady, Robert Lynn, and Phyllis Riley Hotchkiss. "It's a Good Score! Just a Bad Grade." Phi Delta Kappan (Sept. 1989): 68-71.

Carini, Patricia F. "The Prospect School: Taking Account of Process." Testing and Evaluation: New Views. Washington, DC: Association for Childhood Education International, 1975.

Carruthers, Robert B. Building Better English Tests. Champaign, IL: NCTE, 1963.

Certain, C. C. "Are Your Pupils Up to Standard in Composition?" English Journal 12 (1923): 365-77.

---. "A Testing Program for the New School Year." Elementary English Review 3 (Sept. 1926): 211-21.

Chaplain, Miriam. "Pushing Minimum Standards Toward the Maximum in English." English Education 9 (1978): 212-17.

Chew, Charles R. "Large Scale Writing Assessment: An Instructional Message." Testing in the English Language Arts. Eds. John Beard and Scott McNabb. Rochester, MI: Michigan Council of Teachers of English, 1985. 49-51.

Clapp, Frank L. "A Test for Habits in English." Elementary English Review 3 (Jan. 1926): 42-46.

Clapp, Frank L., and Robert V. Young. "A Self-Marking English Form Test." Elementary English Review 5 (Dec. 1928): 304-06.

Clarke, Lori. "Creative Teaching--Why Not Creative Testing?" English Education 4 (1972): 43-47.

Cohen, Sheldon S. A History of Colonial Education, 1607-1776. NY: John Wiley, 1974.

Congreve, Willard. "Implementing and Evaluating the Use of Innovations." Innovations and Change in Reading Instruction. 67th Yearbook, Part II. National Society for the Study of Education. Chicago: U of Chicago, 1968. 291-319.

Cook, Walter W. "Evaluation in the Language-Arts Program." Teaching Language in the Elementary School. 43rd Yearbook, Part II. National Society for the Study of Education. Chicago: U of Chicago, 1944. 194-214.

Cooper, Charles R., ed. The Nature and Measurement of Competency in English. Urbana, IL: NCTE, 1981.

Corbett, William D. "Let's Tell the Good News About Reading and Writing." Educational Leadership 46 (Apr. 1989): 53.

Coulter, Vincil Carey. "Financial Support of English Teaching." English Journal 1 (1912): 24-29.

Courtis, S. A. "The Value of Measurements: II. The Uses of the Hillegas Scale." English Journal 8 (1919): 208-17.
English Journal 8 (1919): 208-17.

Cox, Sidney. The Teaching of English. NY: Harper and Brothers, 1928.

Cremin, Lawrence A. The Transformation of the School. NY: Alfred A. Knopf, 1961.

Davis, Frederick B. "What Do Reading Tests Really Measure?" English Journal 33 (1944): 180-87.

Dawson, Mildred A. "Building a Language-Composition Curriculum in the Elementary School." Elementary English Review 8 (Oct. 1931): 194-96.

DeBoer, John D. "Earmarks of a Modern Language Arts Program in the Elementary School." Elementary English 31 (1954): 485-93.

Derrick, Clarence. "Tests of Writing." English Journal 53 (1967): 496-99.

Diederich, Paul B. Measuring Growth in English. Urbana: NCTE, 1974.

Distefano, Phillip and Joellen Killion. "Assessing Writing Skills Through a Process Approach." English Education 16 (1984): 203-07.

Doll, Ronald C. Curriculum Improvement. 2d ed. Boston: Allyn, 1970.

Donmoyer, Robert. "Curriculum Evaluation and the Negotiation of Meaning." Language Arts 67 (1990): 274-86.

Durkin, Dolores. "Testing in the Kindergarten." The Reading Teacher 40 (1987): 766-70.

Eberhart, Wilfred. "Evaluation in English in the Eight-Year Study." English Journal 28 (1939): 261-70.

Edelsky, Carole and Susan Harman. "One More Critique of Reading Tests--With Two Differences." English Education 20 (Oct. 1988): 157-71.

Elliott, Velma L. "Peer Evaluation for Teachers? Why Not?" Elementary English 51 (1974): 727-30.

Estes, Thomas H. "A Scale to Measure Attitudes Toward Reading." Journal of Reading 15 (1971): 135-38.

Evans, David N. "Standards Are Needed for CRT!" Educational Leadership 32 (1975): 268-70.

Evans, M. Eleanor. "Objective Tests in Eighth Grade Literature." Elementary English Review 4 (Jan. 1928): 13-22.

Faigley, Lester et al. Assessing Writers' Knowledge and Processes of Composing. Norwood, NJ: Ablex, 1985.

Farr, Roger. "New Trends in Reading Assessment." Curriculum Review, Sept./Oct. 1987: 21-23.

Farr, Roger and Nancy L. Roser. "Reading Assessment: A Look at Problems and Issues." Journal of Reading 17 (1974): 592-99.

Farstrup, Alan E. "Point/Counterpoint: State by State Comparisons on National Assessments." Reading Today 7 (Dec. 1989/Jan. 1990): 1.

Faust, Wirt G. "An Effort to Standardize Descriptive Theme-Writing for the Senior Year of the High School." English Journal 5 (1916): 257-71.

Ferguson, Bill L. "Behavioral Objectives--No!" English Education 3 (1971): 52-55.

Findley, Warren G. "Purposes of School Testing Programs and Their Efficient Development." The Impact and Improvement of School Testing Programs. 62nd Yearbook. National Society for the Study of Education. Chicago: U of Chicago, 1963. 1-27.

Fitzgerald, Sheila. "Implications of Parent, Teacher, and Student Perspectives on the Value of School Tests." Testing in the English Language Arts. Eds. John Beard and Scott McNabb. Rochester, MI: Michigan Council of Teachers of English, 1985. 28-43.

Foley, Joseph J. "Evaluation of Learning in Writing." Handbook on Formative and Summative Evaluation of Student Learning. Ed. Benjamin S. Bloom et al. NY: McGraw-Hill, 1971. 767-813.

Foster, Jack D. "The Role of Accountability in Kentucky's Education Reform Act of 1990." Educational Leadership 48 (Feb. 1991): 34-36.

Fredericks, Anthony D. "Latest Model." Reading Teacher 40 (1987): 790-91.

French, John W. "What English Teachers Think of Essay Testing." English Journal 46 (1957): 196-201.

Gates, Arthur I. "The Measurement and Evaluation of Achievement in Reading." The Teaching of Reading: A Second Report. 36th Yearbook - Part I.
National Society for the Study of Education. Bloomington, IL: Public School Publishing, 1937. 359-88.

Gilmore, Perry. "Research Currents: Assessing Sub-rosa Skills in Children's Language." Language Arts 61 (1984): 384-91.

Glatthorn, Allan A. A Guide for Developing an English Curriculum for the Eighties. Urbana: NCTE, 1980.

Glock, Marvin D. "Reading Tests: Past, Present, and Future." Reading Diagnosis and Evaluation. Ed. Dorothy L. DeBoer. Newark, DE: IRA, 1970. 55-64.

Goodlad, John I. Foreword. Curriculum Reform in the Elementary School. By M. Frances Klein. NY: Teachers College, 1989.

Goodman, Kenneth S. et al. Report Card on Basal Readers. Katonah, NY: Richard C. Owen, 1988.

Goodman, Kenneth S., Yetta M. Goodman, and Wendy J. Hood, eds. The Whole Language Evaluation Book. Portsmouth, NH: Heinemann, 1989.

Goodman, Yetta. "Evaluation of Students: Evaluation of Teachers." The Whole Language Evaluation Book. Ed. Kenneth S. Goodman et al. Portsmouth, NH: Heinemann, 1989. 3-14.

Goodman, Yetta M. and Carolyn L. Burke. Reading Miscue Inventory. London: Macmillan, 1972.

Graves, Donald H. Build a Literate Classroom. Portsmouth, NH: Heinemann, 1991.

---. NCTE Convention. Atlanta, 1990.

Gray, William S. "A Decade of Progress." The Teaching of Reading: A Second Report. 36th Yearbook - Part I. National Society for the Study of Education. Bloomington, IL: Public School Publishing, 1937. 5-21.

---. "Nature and Scope of a Sound Reading Program." Reading in the High School and College. 47th Yearbook - Part II. National Society for the Study of Education. Chicago: U of Chicago, 1948. 46-48.

Greene, Harry A. and William S. Gray. "The Measurement of Understanding in the Language Arts." The Measurement of Understanding. 45th Yearbook - Part I. National Society for the Study of Education. Chicago: U of Chicago, 1946. 175-200.

Groff, Patrick. "Behavioral Objectives for Children's Literature?--No!" Reading Teacher 30 (1977): 653-63.

Gross, David M. and Sophronia Scott. Time 16 July 1990: 56+.

Gursky, Daniel. "Ambitious Measures." Teacher 2 (Apr. 1991): 50-56.

Harman, Susan. "National Tests, National Standards, National Curriculum." Language Arts 68 (1991): 49-50.

Harring, Sydney. "A Scale for Judging Oral Compositions." Elementary English Review 5 (Mar. 1928): 71-73+.

Hassett, John J. "Checking the Accuracy of Pupil Scores in Standardized Tests." English Journal 67 (1978): 30-31.

Hatch, Roger Conant. "A Standard of Measurement in English Composition." English Journal 9 (1920): 338-44.

Hatfield, W. Wilbur. "The Ideal Curriculum." Elementary English Review 9 (Sept. 1932): 179-81+.

Heathington, Betty S. and J. Estill Alexander. "A Child-Based Observation Checklist to Assess Attitudes Toward Reading." Reading Teacher 31 (1978): 769-71.

Henry, George H. "An Attempt to Measure Ideals." English Journal 35 (1946): 487-93.

---. "Only Spirit Can Measure Spirit." English Journal 43 (1954): 177-82.

Hester, Kathleen B. Teaching Every Child to Read. 2d ed. NY: Harper, 1955.

Hillocks, George, Jr. Research on Written Composition. Urbana: NCTE, 1986.

Hodges, John C. "The State-Wide English Program in Tennessee." English Journal 34 (1945): 71-76.

Holmes, Betty C. and Nancy L. Roser. "Five Ways to Assess Readers' Prior Knowledge." Reading Teacher (1987): 646-49.

Hook, J. N. "Characteristics of Award-Winning High Schools." English Journal 50 (1961): 9-15.

---. A Long Way Together. Urbana: NCTE, 1979.

---. "The Tri-University BOE Project: A Project Report." On Writing Behavioral Objectives for English. Eds.
John Maxwell and Anthony Tovatt. Champaign, IL: NCTE, 1970. 75-86.

Hook, J. N. et al. Representative Behavioral Objectives for High School English. NY: Ronald, 1971.

Hosic, James F. "The Chicago Standards in Oral Composition." Elementary English Review 2 (1925): 170-71.

Howard, Elizabeth Zimmerman. "Appraising Strengths and Weaknesses of the Total Reading Program." Evaluation of Reading. Ed. Helen M. Robinson. Supplementary Educational Monographs No. 88. Chicago: U of Chicago, 1958. 169-73.

Huey, Edmund Burke. The Psychology and Pedagogy of Reading. NY: Macmillan, 1908.

Hunt, Lyman C., Jr. "Evaluation Through Teacher-Pupil Conferences." The Evaluation of Children's Reading Achievement. Ed. Thomas C. Barrett. Newark, DE: IRA, 1967. 111-25.

Improving SAT Scores. Lexington, MA: Ginn, 1985.

Instructional Objectives Exchange. English Skills, 7-9. Los Angeles: Instructional Objectives Exchange, 1970.

Jackson, Phillip W. "In Grades Seven through Nine." Evaluation of Reading. Ed. Helen M. Robinson. Supplementary Educational Monographs No. 88. Chicago: U of Chicago, 1958. 28-31.

Jewett, Arno. "Accountability in English." English Education 3 (1971): 5-15.

---. English Language Arts in American High Schools. Washington, DC: U.S. Dept. of H.E.W. Bulletin 1958, No. 13. 1959.

Johnson, Clifton. Old-Time Schools and School-books. 1904. Intro. Carl Withers. NY: Dover, 1963.

Johnston, Peter. "Teachers as Evaluation Experts." Reading Teacher 40 (1987): 744-54.

Jonas, Leah. "Power-Testing in Literature." English Journal 29 (1940): 799-805.

Judine, Sister M. A Guide for Evaluating Student Composition. Urbana: NCTE, 1965.

Judy, Stephen N. ABCs of Literacy. NY: Oxford, 1980.

---. "Standardized Tests and Their Alternatives." English Journal 67 (1978): 5-6.

Kirby, Dan and Tom Liner with Ruth Vinz. Inside Out. 2d ed. Portsmouth, NH: Heinemann, 1988.

Kirschenbaum, Howard, Rodney Napier, and Sidney B. Simon. Wad-ja-Get? NY: Hart, 1971.

Klapper, Paul. Teaching English in Elementary and Junior High Schools. NY: D. Appleton-Century, 1915.

Klein, M. Frances. Curriculum Reform in the Elementary School. NY: Teachers College, 1989.

Kopp, O. W. "The Evaluation of Oral Language Activities: Teaching and Learning." Elementary English 44 (1967): 114-23.

Koos, Leonard V. "The National Survey of Secondary Education: Its Implications for Teachers of English." English Journal 22 (1933): 303-13.

Koziol, S. M., Jr. and Patricia Burns. "Using Self-Reports for Monitoring English Instruction." English Education 16 (1985): 113-21.

Krest, Margie. "Adapting the Portfolio to Meet Student Needs." English Journal 79 (Feb. 1990): 29-34.

Langer, Judith A. et al. Learning to Read in Our Nation's Schools. U.S. Dept. of Education, 1990.

Lennon, Roger T. "What Can Be Measured?" Reading Teacher 15 (1962): 326-37. Rpt. in Measurement and Evaluation of Reading. Ed. Roger Farr. NY: Harcourt, 1970. 18-34.

Leonard, S. A. "The Wisconsin Tests of Grammatical Correctness." English Journal 15 (1926): 430-42.

Leonard, Sterling Andrus. Essential Principles of Teaching Reading and Literature. Philadelphia: J.B. Lippincott, 1922.

---. "How English Teachers Correct Papers." English Journal 12 (1923): 517-32.

Levi, Ray. "Assessment and Educational Vision: Engaging Parents and Learners." Language Arts 67 (1990): 269-73.

Lloyd-Jones, Richard and Andrea A. Lunsford, eds. The English Coalition Conference: Democracy Through Language. Urbana: NCTE; NY: MLA, 1989.

Lundsteen, Sara W. "Teaching and Testing Critical Listening in the Fifth and Sixth Grades."
Elementary English 41 (1964): 743-52.

Lunsford, Andrea A. "The Past--and Future--of Writing Assessment." Writing Assessment. Eds. Karen Greenberg et al. NY: Longman, 1986. 1-12.

Madaus, George F. "The Influence of Testing in the Curriculum." Critical Issues in Curriculum. 87th Yearbook - Part I. National Society for the Study of Education. Chicago: U of Chicago, 1988. 83-121.

---. "What Do Test Scores 'Really' Mean in Educational Policy?" Testing in the English Language Arts. Eds. John Beard and Scott McNabb. Rochester, MI: Michigan Council of Teachers of English, 1985. 1-11.

Mandel, Barrett J., ed. Three Language Arts Curriculum Models. Urbana: NCTE, 1980.

Marcus, Albert. "Diagnosis and Accountability." Elementary English 51 (1974): 731-35.

Mason, James Hocker. "The Educational Milieu, 1874-1911." English Journal 68.4 (1979): 40-45.

Mavrogenes, Nancy A. et al. "Concise Guide to Standardized Secondary and College Reading Tests." Journal of Reading 18 (1974): 12-22.

Maxwell, John C. Introduction. Common Sense and Testing in English. Urbana: NCTE, 1975. iv-v.

Maxwell, John C. and Anthony Tovatt, eds. On Writing Behavioral Objectives for English. Champaign, IL: NCTE, 1970.

Mayher, John S. and Rita S. Brause. "Learning Through Teaching: Is Testing Crippling Integrated Language Education?" Language Arts 63 (1986): 390-96.

McCaig, Roger A. "What Research and Evaluation Tells Us About Teaching Written Expression in the Elementary School." The Language Arts Teacher in Action. Kalamazoo, MI: Western Michigan University, 1977. 46-56.

---. "The Writing of Elementary School Children." Grosse Pointe, MI: Grosse Pointe Public School System, 1972.

McDonald, Arthur S. "Measuring Reading Performance." Measurement and Evaluation of Reading. Ed. Roger Farr. NY: Harcourt, 1970. 10-17.

McKey, Eleanor F. "Do Standardized Tests Do What They Claim to Do?" English Journal (1961): 607-11.

Meisels, Samuel J. "High-Stakes Testing in Kindergarten." Educational Leadership 46 (Apr. 1989): 16-22.

Melear, John D. "An Informal Language Inventory." Elementary English 51 (1974): 508-11.

Mellon, John C. National Assessment and the Teaching of English. Urbana: NCTE, 1975.

Miller, Vera V. and Wendell C. Lanton. "Reading Achievement of School Children--Then and Now." Elementary English 33 (1956): 91-97.

Moffett, James. "Misbehaviorist English: A Position Paper." On Writing Behavioral Objectives for English. Eds. John Maxwell and Anthony Tovatt. Champaign, IL: NCTE, 1970. 111-16.

Moore, David W. "A Case for Naturalistic Assessment of Reading Comprehension." Language Arts 60 (1983): 957-69.

Moore, Walter J. and Larry D. Kennedy. "Evaluation of Learning in the Language Arts." Handbook on Formative and Summative Evaluation of Student Learning. Ed. Benjamin S. Bloom et al. NY: McGraw-Hill, 1971. 399-445.

Morreau, Lanny E. "Behavioral Objectives: Analysis and Appreciation." Accountability and the Teaching of English. Ed. Henry B. Maloney. Urbana: NCTE, 1972. 35-52.

Morrow, Lesley Mandel. "Assessing Children's Understanding of Story Through Their Construction and Reconstruction of Narrative." Assessment for Instruction in Early Literacy. Eds. Lesley Mandel Morrow and Jeffrey K. Smith. Englewood Cliffs, NJ: Prentice-Hall, 1990. 110-34.

Moscrip, Ruth. "Shall We Test in Literature?" Elementary English Review 5 (1928): 140-41+.

Muller, Herbert J. The Uses of English. NY: Holt, 1967.

Myers, Miles. "The Politics of Minimum Competency." The Nature and Measurement of Competency in English. Ed. Charles R. Cooper. Urbana: NCTE, 1980.

---.
A Procedure for Writing Assessment and Holistic Scoring. Urbana: NCTE, 1980.

NCTE. Common Sense and Testing in English. Urbana: NCTE, 1975.

NCTE Commission on the Curriculum. "A Check List for Evaluating the English Program in the Junior and Senior High School." English Journal 51 (1962): 273-82.

NCTE Commission on the English Curriculum. The English Language Arts in the Secondary School. NY: Appleton-Century-Crofts, 1956.

---. Language Arts for Today's Children. NY: Appleton-Century-Crofts, 1954.

NCTE Committee on Correlation. A Correlated Curriculum. NY: D. Appleton-Century, 1936.

NCTE Committee on High School-College Articulation. "But What Are We Articulating With?" English Journal 51 (1962): 967-79.

---. "What the Colleges Expect." English Journal 50 (1961): 402-12.

NCTE Committee to Review Curriculum Guides. "Trends in Curriculum Guides." Elementary English 45 (1968): 891-97.

Noyes, Ernest. "Progress in Standardizing the Measurement of Composition." English Journal 1 (1912): 532-36.

O'Neil, John. "Drive for National Standards Picking Up Steam." Educational Leadership 48 (Feb. 1991): 4-8.

Parker, Flora E. "The Value of Measurements: I. The Measurement of Composition in English Classes." English Journal 8 (1919): 203-08.

Paulis, Chris. "Holistic Scoring." The Clearing House (Oct. 1958): 57-60.

Peckham, Irvin. "Statewide Direct Writing Assessment." English Journal 76 (Dec. 1987): 30-33.

Peterson, Gordon. "Behavioral Objectives for Children's Literature? Yes!" Reading Teacher 30 (1977): 652-60.

Poley, Irvin C. "Learning by Testing." English Journal 20 (1931): 128-36.

Pooley, Robert C. "Where Are We At?" English Journal 39 (1950): 497-504.

Pooley, Robert C. and Robert C. Williams. The Teaching of English in Wisconsin. Madison: U of Wisconsin, 1948.

Probst, Robert E. Response and Analysis. Portsmouth, NH: Boynton/Cook, 1988.

Purves, Alan C. "Evaluating Growth in English." The Teaching of English. Ed. James R. Squire. Chicago: U of Chicago, 1977. 230-59.

---. "Evaluation of Learning in Literature." Handbook on Formative and Summative Evaluation of Student Learning. Ed. Benjamin S. Bloom et al. NY: McGraw-Hill, 1971. 697-766.

---. "'Measure what men are doing. Plan what man might become.'" On Writing Behavioral Objectives for English. Eds. John Maxwell and Anthony Tovatt. Champaign, IL: NCTE, 1970. 87-96.

---. NCTE Convention. Baltimore, 1989.

Rankin, Earl F., Jr. "The Cloze Procedure--Its Validity and Utility." Measurement and Evaluation of Reading. Ed. Roger Farr. NY: Harcourt, 1970. 237-53.

Reif, Linda. "Finding the Value in Evaluation: Self-Assessment in a Middle School Classroom." Educational Leadership 47 (Mar. 1990): 24-29.

Richards, T. S. [pseudonym] "Testmania: The School Under Siege." Learning 17.7 (Mar. 1989): 64-66.

Romano, Tom. Clearing the Way. Portsmouth, NH: Heinemann, 1987.

Ruhlen, Helen V. "Experiment in Testing Appreciation." English Journal 15 (1926): 202-09.

Sangren, Paul V. Improvement in Reading Through the Use of Tests. Kalamazoo, MI: Western State Teachers College, 1931.

Satterfield, Mabel S. and Salibelle Royster. "The New-Type Test in English." English Journal 20 (1931): 490-95.

Savitz, Jerohn J., Myrtle Garrison Bates, and D. Ralph Starry. Composition Standards. NY: Hinds, Hayden & Eldredge, 1923.

Searson, J. W. "Determining a Language Program." English Journal 13 (1924): 99-115.

Shafer, Robert E. "A National Assessment in English: A Double Edged Sword." Elementary English 48 (1971): 188-95.

---. "What Can We Expect from a National Assessment in Reading?"
Journal of Reading 13 (1969): 3-8+.

Shugrue, Michael F. English in a Decade of Change. NY: Pegasus, 1968.

Simmons, Jay. "Portfolios as Large-scale Assessment." Language Arts 67 (Mar. 1990): 262-68.

Simmons, John S. "Testing on Both Sides: A Comparison." English Journal 76 (Dec. 1987): 27-29.

Slavin, Robert E. Research Methods in Education. Englewood Cliffs, NJ: Prentice-Hall, 1984.

Slotnick, Henry B. and John V. Knapp. "Essay Grading by Computer." English Journal 60 (1971): 75-87.

Small, Robert C., Jr. "The English Teacher and the Supervisor." English Education 7 (1976): 169-76.

Smith, Dora V. Evaluating Instruction in Secondary School English. English Monograph 11. Chicago: NCTE, 1941.

---. "Re-establishing Guidelines for the English Curriculum." English Journal (1958): 317-43.

Smith, Nila Banton. American Reading Instruction. 1934. Newark, DE: IRA, 1965.

Smith, Wilson, ed. Theories of Education in Early America, 1655-1819. Indianapolis: Bobbs-Merrill, 1973.

Snider, Sarah J. "Developing Non-Essay Tests to Measure Affective Responses to Poetry." English Journal 67 (1978): 38-40.

Spandel, Vicki and Richard J. Stiggins. Creating Writers. NY: Longman, 1990.

"Specter of National Test Rears Its Head." ASCD (Association for Supervision and Curriculum Development) Update 32 (Nov. 1990): 1, 6.

Spring, Joel. The American School 1642-1990. 2d ed. NY: Longman, 1990.

Squire, James R. "Behavioral Objectives and Accountability." Goal Making for English Teaching. Ed. Henry B. Maloney. Urbana: NCTE, 1973.

---. "English at the Crossroads: The National Interest Report Plus Eighteen." English Journal 51 (1962): 381-92.

Squire, James R. and Roger K. Applebee. A Study of English Programs in Selected High Schools Which Consistently Educate Outstanding Students in English. Urbana: U of Illinois, 1966.

"The Stanford Language Arts Investigation: A Symposium." English Journal 33 (1944): 119-29.

Starch, Daniel. Educational Measurements. NY: Macmillan, 1916.

Stewig, John Warren. "Oral Language: A Place in the Curriculum." Clearing House 61.4 (Dec. 1988): 171-74.

Stone, Clarence R. Silent and Oral Reading. Boston: Houghton Mifflin, 1926.

Strickland, Dorothy S. Foreword. Observing the Language Learner. Eds. Angela Jaggar and M. Trika Smith-Burke. Newark, DE: IRA; Urbana: NCTE, 1985. v-vi.

Strickland, Ruth G. "Evaluating Children's Composition." Elementary English 37 (May 1960): 321-30.

Tackacs, Claudia. "AWE: Classroom Use of a State Testing Program." English Journal 76 (Dec. 1987): 34-36.

Taylor, Denny. "Teaching Without Testing: Assessing the Complexity of Children's Literacy Learning." English Education 22 (Feb. 1990): 4-74.

Tchudi, Stephen, and Diana Mitchell. Explorations in the Teaching of English. 3rd ed. NY: Harper, 1989.

Thomas, Charles Swain. The Teaching of English in the Secondary School. Rev. ed. Boston: Houghton Mifflin, 1927.

Thomas, Charles Swain et al. Examining the Examination in English. Cambridge: Harvard U Press, 1931.

Thorndike, Edward L. "Notes on the Significance and Use of the Hillegas Scale for Measuring the Quality of English Composition." English Journal 2 (1913): 551-61.

Tiedt, Iris McClellan. Writing From Topic to Evaluation. Boston: Allyn, 1989.

Tinker, Miles A. "Appraisal of Growth in Reading." Reading Teacher 8 (1954): 35-38.

Tuttle, Frederick B., Jr. How to Prepare for Writing Tests. Washington, DC: NEA, 1986.

Tyler, Ralph W. "What Is Evaluation?" Evaluation of Reading. Ed. Helen M. Robinson. Supplementary Educational Monographs No. 88. Chicago: U of Chicago, 1958. 4-9.
"Types of Organization of High-School English: Report of a Committee of the National Council of Teachers of English." English Journal 2 (1913): 575—95. Valencia, Sheila and P. David Pearson. "Reading Assessment: Time for a Change." Reading Teacher 40 (1987): 726-32. Wagner, Betty Jane. "A Valid Way to Assess the Effects of a Writing Project." English in the Eighties. Ed. Robert D. Eagleson. Australian Association for the Teaching of English, 1982. 51-60. 251 Ward, C. H. "The Scale Illusion." English Journal 6 (1917): 221-30. Watson, Dorothy J., ed. Ideas and Insights. Urbana: NCTE, 1987. Weaver, Constance. Reading Process and Practice. Portsmouth, NH: Heinemann, 1988. Wiersma, William. Research Methods in Education. 3rd ed. Boston: Allyn and Bacon, 1985. Wiggins, Grant. "Teaching to the (Authentic) Test." Educational Leadership 46 (Apr. 1989): 41-47. Wiley, Mary Callum. "The English Examination." English Journal 7 (1918): 327-30. Wilson, G. M. "New Standards in Written English." Elementary English Review 6 (1929): 117-19+. Wilson, Marilyn. "Testing and Literacy: A Contradiction in Terms?" Testing in the English Language Arts. Ed. John Beard and Scott McNabb. Rochester, MI: Michigan Council of Teachers of English, 1985. 12—16. Wittrock, Merlin C. "Process Oriented Measures of Comprehension." Reading Teacher 40 (1987): 734-37. Wixon, Karen K. et al. "New Directions in Statewide Reading Assessment." Reading Teacher 40 (1987): 749-54. Zemelman, Steven and Harvey Daniels. A Community of Writers. Portsmouth, NH: Heinemann, 1988. Zollinger, Marian, and Mildred A. Dawson. "Evaluation of Oral Communication. English Journal 47 (1958): 500-04.