A DESCRIPTIVE MULTIMETHOD STUDY OF TEACHER JUDGMENT DURING THE MARKING PROCESS

By

Sylvia Pratt Whitmer

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Elementary and Special Education

1981

ABSTRACT

A DESCRIPTIVE MULTIMETHOD STUDY OF TEACHER JUDGMENT DURING THE MARKING PROCESS

By

Sylvia Pratt Whitmer

Persistent dissatisfaction with traditional teacher marks, A B C D E, prompted this study. The purpose was to generate a description of the judgment processes which allow a teacher to distill a single symbol, a mark, from the diverse activities of the classroom to represent pupil progress. The subjects were five experienced, upper elementary teachers from a typically achieving district in Michigan. The framework for the investigation came from the field of human judgment. Four established methods were applied: process tracing, policy capturing, attribution theory and utility theory. Field data included taped interviews, record books, marks and predicted marks across a year. Data were organized into a composite case and five teacher cases. Quantitative and qualitative analysis included multiple regression, Pearson and partial correlations, frequency counts and categorization and coding of verbatim interview protocols.

The findings revealed that teachers generally use procedural and contingency rules to guide a three-stage process of collecting data, weighting and assigning it to preordained categories, and choosing between categories when cumulated scores fall in zones of uncertainty or failure. The teachers' primary tool of inference at the procedural level is the record book, and the dominant judgment factor is task completion.
The completion factor has a variable weight in the judgment process dependent upon task difficulty and student ability. Judgment factors which impact teacher choice in zones of uncertainty include ability, effort, task difficulty, home support and classroom behavior/physical maturity. Of these, effort has the most weight. The contingency focus is upon those factors which promote individual and group task completion; the primary contingency tools of inference are checks, minuses and pluses. The marking process is restricted to classroom functions, a conclusion suggesting that previous marking studies have made inappropriate assumptions about teacher judgment processes. Formative marks serve a feedback function, but summative marks do not.

A marking judgment model was constructed from these findings. The potential of the model is as a heuristic to generate further deliberation and research in marking and as a tool for practitioners to refine their judgment policies.

© Copyright by Sylvia Pratt Whitmer 1981

Dedicated to my husband Hugh, children Anamaria, Gordon, Kristin, and my mother, Merle Pratt, who formed an incomparable support base, and to the memory of my mentor Marie I. Rasey

ACKNOWLEDGMENTS

I wish to express my gratitude to Dr. Perry Lanier, doctoral committee chairman, for his reassuring guidance, his indispensable prodding, his constructive attention to the details of the dissertation, his general wise counsel and patience; Dr. Lawrence Lezotte for his keen perceptions into the political and practical nature of the public school mission, his continuing optimism and good humor; Dr. Keith Anderson for his philosophical challenge to keep teacher accountability for pupil learning within a humanistic perspective; Dr. Ted Word for his insistence on breadth of background and his penetrating questions to assure disciplined inquiry during the study.
Special gratitude is expressed to the founders of the Institute for Research on Teaching, particularly Dr. Lee Shulman and Dr. Judith Lanier, who ventured to go beyond research on teacher behavior to examine teacher thoughts and intentions. Special gratitude also to the leaders and staff of the Birmingham School District who regularly impressed upon me an understanding of the discrepancy between ideal theories and the practical realities of the classroom. In addition, I wish to acknowledge with deepest gratitude the efforts of Sylvia Clemence and Janyce Tilmon, CPS, who devoted untold hours, energy and creative spirit to pulling this dissertation into official form. Finally, I wish to express special thanks to the five teachers who volunteered for this study, their supportive principals, and to Dr. Jerry Blanchard who encouraged the project and without whose help this research would not have been possible.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter

I. INTRODUCTION
    Background Research
    Problem Statement
    Purpose of the Study
    Research Questions
    Research Methodology
    Assumptions and Limitations
    Summary and Overview

II. REVIEW OF THE LITERATURE
    Introduction
    Marks and Marking
        Pre-1920
        1920-1970
            Accounting for Capacity
            Accounting for Character and Motivation
            Accounting for Goals
        1970 to the Present
        Critique of the Marking Literature
    Teacher Decision Making and Judgment
        Teacher Effectiveness Research
        Information Processing
        Teacher Decision Making
        Teacher Judgment
        Critique of Decision-Making Research
    Summary

III. RESEARCH SETTING, PROCEDURES AND METHODS
    Introduction
    Research Setting
    Procedures
        Data Collection
        Data Organization
        Data Analysis
    Methods
        Process Tracing
        Policy Capturing
        Utility Analysis
        Attribution Theory
            Stability
            Locus
            Control
    Summary

IV. FINDINGS
    Introduction
    Composite Case
        Rules
            Procedural Rules
            Contingency Rules
        Statistical Analysis
            Across the Year
            Within Marking Periods
        Verbal Analysis
        Composite Summary
    Case One - Teacher One
        Base Data
        Philosophy and Rules
            Procedural Rules
            Contingency Rules
        Statistical Analysis
        Verbal Analysis
        Summary
    Case Two - Teacher Two
        Base Data
        Philosophy and Rules
            Procedural Rules
            Contingency Rules
        Statistical Analysis
        Verbal Analysis
        Summary
    Case Three - Teacher Three
        Base Data
        Philosophy and Rules
            Procedural Rules
            Contingency Rules
        Statistical Analysis
        Verbal Analysis
        Summary
    Case Four - Teacher Four
        Base Data
        Philosophy and Rules
            Procedural Rules
            Contingency Rules
        Statistical Analysis
        Verbal Analysis
        Summary
    Case Five - Teacher Five
        Base Data
        Philosophy and Rules
            Procedural Rules
            Contingency Rules
        Statistical Analysis
        Verbal Analysis
        Summary

V. CONCLUSIONS AND IMPLICATIONS
    Introduction
    Summary of Findings by Research Questions
    Implications for Research
    Implications for Practice
    Summary

APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E
APPENDIX F
LIST OF REFERENCES

LIST OF TABLES

Table 4.1   Relationship between the final mark and predictor marks
Table 4.2   Correlations between marks and predicted marks across a year
Table 4.3   Controlled relationships between the final prediction and the final mark
Table 4.4   Composite pattern of marking averages across a year
Table 4.5   Distribution of marks across three marking periods--language and math
Table 4.6   Teacher attribution-utility categories
Table 4.7   Composite attribution-utility count and percentage
Table 4.8   Attribution-utility categories collapsed
Table 4.9   Pattern of average marks across a year--Teacher One
Table 4.10  Distribution of marks across three marking periods--Teacher One
Table 4.11  Attribution-utility percentage--Teacher One
Table 4.12  Cross tabulations of effort and ability--Teacher One
Table 4.13  Pattern of average marks across a year--Teacher Two
Table 4.14  Distribution of marks across three marking periods--Teacher Two
Table 4.15  Attribution-utility percentage--Teacher Two
Table 4.16  Cross tabulations of effort and ability--Teacher Two
Table 4.17  Pattern of average marks across a year--Teacher Three
Table 4.18  Distribution of marks across three marking periods--Teacher Three
Table 4.19  Attribution-utility percentage--Teacher Three
Table 4.20  Cross tabulations of effort and ability--Teacher Three
Table 4.21  Pattern of average marks across a year--Teacher Four
Table 4.22  Distribution of marks across three marking periods--Teacher Four
Table 4.23  Attribution-utility percentage--Teacher Four
Table 4.24  Cross tabulations of effort and ability--Teacher Four
Table 4.25  Pattern of average marks across a year--Teacher Five
Table 4.26  Distribution of marks across three marking periods--Teacher Five
Table 4.27  Attribution-utility percentage--Teacher Five
Table 4.28  Cross tabulations of effort and ability--Teacher Five

LIST OF FIGURES

Figure 4.1   Plotted relationship between the first and final marks
Figure 4.2   Composite marking policy (with predictions)
Figure 4.3   Sample record book account
Figure 4.4   Decision tree for marking judgment
Figure 4.5   Marking policy for Teacher One (with predictions)
Figure 4.6   Record book account--Teacher One
Figure 4.7   Marking policy for Teacher Two (with predictions)
Figure 4.8   Record book account--Teacher Two
Figure 4.9   Marking policy for Teacher Three (with predictions)
Figure 4.10  Record book account--Teacher Three
Figure 4.11  Marking policy for Teacher Four (with predictions)
Figure 4.12  Record book account--Teacher Four
Figure 4.13  Marking policy for Teacher Five (with predictions)
Figure 4.14  Record book account--Teacher Five
Figure 4.15  Framework for marking process
Figure 5.1   Framework for marking process

CHAPTER I

INTRODUCTION

Report cards and the pupil progress marks contained therein are among the most persistent phenomena of American educational history. Marks fulfill both a measurement and a gatekeeping function with societal consequences extending well beyond the school organization.
Despite continued criticism by educators, based on low reliability ratings in repeated research, and despite extensive professional efforts to establish more objective substitutes, traditional teacher marks continue to dominate official records and to be the most reliable source of information on student achievement (Bejar, 1981; Evans, 1976; Lavin, 1965). Furthermore, the societal need to identify and develop outstanding talent, to assure minimal competency for all graduates and to be fiscally accountable with public funds combines with this obvious lack of an acceptable substitute to assure the use of marks for some time into the future. The process by which teachers make marking judgments, therefore, merits careful investigation.

Marks are commonly used as (1) single, summary symbols, (2) indicating achievement in some substantial segment of a student's educational enterprise, (3) given by an instructor for (4) purposes of record and report. A mark represents the teacher's judgment of pupil achievement based on such a combination of evidence as the teacher elects to use. It involves a determination of weights in value for specific products such as recitation, homework, tests and essays, and for less tangible factors such as class participation, neatness in written work, mechanical correctness, industry and effort and personal agreeableness (Thorndike, 1969).

Teacher judgment is one component of general human cognitive processes. Cognitive processes (interchangeable with the terms thinking and mental processes) are those unseen phenomena of the brain which enable the human organism to adapt behavior to the information of the environment. Historically, cognitive processes have been depicted by two major strands: memory structures to represent information, and mental operations performed upon and between these memory structures (Posner, 1973). In this study, the more modern term decision-making processes is also treated as interchangeable with cognitive processes.
Judgment is defined as one type of decision or cognitive process. The study makes no attempt to distinguish between fundamental terms which are of lasting concern to the field of cognition, such as static-dynamic qualities and innate-learned qualities of mental operations. Instead, the study takes its lead from information processing, from the processes of selection (simplification) and inference which allow simple human memory structures to organize the complex information of a changing social environment such as the contemporary classroom.

Most histories of marks refer to the four functions of marks identified by Wrinkle (1947).

1. Administrative Functions. Marks indicate whether a student will be promoted, required to repeat or be graduated. They are used for transferring records, for judging candidates for college admission and for evaluating prospective employees.

2. Guidance Functions. Marks are used in diagnosis of special abilities or inabilities and for placement within the curriculum.

3. Information Functions. Marks are the chief means by which schools officially record pupil progress and communicate it to parents.

4. Motivation and Discipline. Marks are used to stimulate greater student effort, to determine eligibility for honors of various sorts and eligibility to play on teams.

• Feedback Function. In addition to the four functions identified by Wrinkle, a fifth function was identified. In 1967, the Yearbook of the Association for Supervision and Curriculum Development (A.S.C.D.) related this fifth function to the theories underlying behavioral objectives and accountability. Arthur Coombs and contributing authors discussed the need for teachers and programs to have a summary evaluation component which acted as a feedback mechanism for correcting or fine tuning system errors. They argued that marks ought to fulfill such an important function or substitutes should be found (Wilhelms, 1967).
These five general functions involve some of society's biggest value questions and include the assumptions that marks can:

• Accurately measure individual achievement against an absolute standard.
• Reflect individual achievement in relation to potential.
• Reflect individual achievement in relation to group (class, school, state or national norms).
• Predict future achievement in subsequent schooling, K-12.
• Predict future achievement in college.
• Predict future job success.
• Motivate sustained or increased academic production.
• Reflect teacher effectiveness as a feedback mechanism.

Background Research

Research in several fields of education has been guided by these functions and assumptions. This study reviews the literature in two related areas: research on marks and marking, and research on teacher decision making. The reviews reveal an abundance of studies on marks themselves, but an absence of systematic inquiry into the teacher judgments which underlie the marking process.

Research on marks and marking has been guided by these identified functions and assumptions for more than half a century. Four separate reviews of the marking literature reveal a continuing contradiction (Evans, 1976; Kirschenbaum, Simon & Napier, 1971; Smith & Dobbins, 1959; Thorndike, 1969). On the one hand, marks are repeatedly found unsatisfactory because they fail to carry out any one of the five functions reliably on objective measures, and because they rely on subjective teacher judgment. On the other hand, marks remain the most reliable indicator of student performance (Bejar, 1981; Lavin, 1965), and recommended substitutes still rely heavily on teacher judgment. This seeming contradiction points to a discrepancy between the number of functions ascribed to marks and the number which are actually accounted for by classroom teachers. The discrepancy may be understood by noting other themes and silences found within the marking literature.
The review in Chapter II indicates that the number of functions ascribed to marks, and the assumptions underlying them, have grown and developed with the expanding role of education in general. Research into these functions has been dominated by the measurement field, which has examined each assumption singly, found it unreliable, and argued logically that the function was not being carried out well and that substitutes should therefore be found for the whole process. Studies in this mode have relied heavily on outcome measures, usually paper and pencil tests, and have emphasized correlational strategies with limited variables under consideration. When studies have expanded to consider teacher variables, they have looked at teacher characteristics and behavior rather than teacher judgment processes. They have emphasized the static-linear nature rather than the dynamic interactive process of the classroom. The continued emphasis on studying single assumptions and limited, discrete variables has taken the focus off the larger, complex functions and multifaceted nature of the marking task. Yet it is from this multifaceted nature of the classroom environment that teachers select and organize the marking task.

There are notable silences within the teacher decision-making literature. Although teacher judgment is acknowledged as a basic factor in marking and in most substitutes, the judgment or organizing process has not been examined systematically. Even in the emerging literature which examines teacher planning and content choice, the marking process has not gained attention. Neither has the time and cost of the marking decision been examined in the literature, although all recommended substitutes for conventional marks make increased demands on teacher time and frequently are accompanied by pleas for revised reporting procedures, smaller classes and released time (Anderson, 1966; Thorndike, 1969).
The literature is also silent regarding the interaction effect between pupil level of compliance with assigned tasks and teacher judgment, although one theoretician describes the task structure of classrooms as ultimately an exchange of performance for grade (Becker, Geer, & Hughes, 1968). Finally, the emerging literature in the teacher effectiveness field, which supports the measurement of the classroom factors of immediacy and simultaneity, is silent as to how these are related to the marking process (Brophy, April 1980; Doyle, 1975).

In his review of research on marks, Robert Thorndike pointed out that marks involved conflicting values of society which made it difficult to arrive at a working consensus on marking practices. This being the case, Thorndike included an important section on marks related to the institutional culture pattern which impacts them, and he pointed out that a "modus vivendi" has typically been worked out between the tradition of marking and the rest of the institutional culture.

"One who would reform the marking system of an educational institution needs first to acquire a profound understanding of the culture of that institution.

In conclusion, much of the literature on marks and marking over the past 50 years seems to have missed the mark because it has operated in an unrealistic world. It has been unrealistic in two senses. It has been insensitive to the very real limits in time, precision of judgment and skill in assessment within which the typical teacher operates. It has been largely unaware of the complex cultural pressures bearing upon instructors within the society of an educational institution and defining the bases and limits of their grading practices much more than does any psychometric theory. It is within these two limiting structures that any reform or improvement of grading practices must operate" (Thorndike, 1969, p. 768).

The well-developed literature on human judgment and decision making is pertinent to the marking task.
In particular, Herbert Simon's contribution in Sciences of the Artificial (1969) bears directly on the discrepancy between theoretically ascribed functions and actual teacher practices. Simon contends that the human memory is really quite limited, and in order to make an extraordinarily complex environment manageable, people construct simplified or "bounded" models of reality (Newell & Simon, 1972) and attend only to strategic or salient factors in the situation. When the environment is a changing one rather than one of a fixed, mechanical nature, the rational model of decision making is idealistic. Reasonable persons cannot know all possible combinations of factors in a changing social environment such as a classroom containing 30 diverse pupils; hence, they cannot "optimize" one solution as the rational model suggests. Instead, one must plan, group, choose and simplify in order to satisfy the situation and proceed to the next problem. This becomes an exercise in information processing.

Perhaps nowhere in the teaching process is the information-processing paradigm more appropriate than in the task of marking. The process of simplifying the complex ingredients of classroom activities into symbols which have sufficient meaningfulness to be included in a summative mark is a process of design. Teachers are apparently highly selective because there are relatively few categories in a record book, and these are representative of an entire six to eight weeks of student activity and production. Distilling one mark from approximately 10 or 20 meaningful marks in a record book, which in turn were distilled from numerous activities over an extended period of weeks, is surely a remarkable act of simplification worthy of investigation. Because this distillation of classroom tasks is expected to be directly related to the functions which society ascribes to marks, it appeared important to investigate the relationship.
Problem Statement

The persistent dissatisfaction with teacher marks of student performance lies in a discrepancy between the functions ascribed to marks by society and the functions actually taken into account by teachers when judging pupil performance in the classroom context. Society has used marks as measures of academic achievement against an absolute standard (mastery), as predictors of future achievement in K-12 (diagnosis and placement), as predictors of college success (entry and credentialing), as predictors of future job success (job entry and training), as motivators for learning (reward and punishment) and as potential evaluators of teacher/program effectiveness (feedback and accountability). These functions have guided marking research. Despite repeated research findings of low reliability of marks with these functions (Evans, 1976; Kirschenbaum, Simon & Napier, 1971; Smith & Dobbins, 1959; Thorndike, 1969), marks remain the dominant system of assessing and recording pupil progress at all levels and the most influential predictor of college performance (Bejar, 1981).

The emerging research literature on teacher decision making suggests that the immediate demands of the classroom environment influence teacher decisions and planning more than theoretically based objectives or goals (Brophy, 1980; Clark & Yinger, 1979; Joyce, 1980; Shavelson, 1980). The marking judgment, the process of selection, organization and inference regarding evidence upon which the mark will be determined, is also heavily influenced by immediate classroom demands and student characteristics. That is, teacher selection of tasks to be included in a summary mark, and the heuristics and attributions used to reach final judgment, involve a more limited and immediate set of functions than those ascribed by society in general. Since the literature on both marking and teacher decision making is silent as to marking processes, a study to determine the nature of the discrepancy and the mental process was warranted.
Purpose of the Study

The major purpose of this study was to develop an understanding of the marking judgment which engages the teacher across a school year. The principal goal was to generate a description of the thoughts, judgments and decisions of five elementary teachers during the marking task. It was hoped (1) to identify strategies and cues which determined the marking judgment and perhaps to construct a model or framework of the process from these, (2) to compare the emerging judgment factors with the functions ascribed to marks by society and (3) to generate hypotheses about the marking process which would indicate fruitful areas for future research.

An investigation of the process of marking was justified by the number of highly involved constituencies who had previously commissioned their own studies of marks, but had not targeted the teacher judgment process: school districts and parents, teachers, students and the educational research community. (1) From the viewpoint of school districts and administrators, the report card remains the major communication device between schools and homes across the nation (ERS Report, 1977). It is relied upon by parents as a personal pupil progress report (Anderson, 1966). The marking process is considered so valuable that district policies and teacher contracts specify periodic reports and often set aside paid teacher record days. (2) From the teacher viewpoint, marking student work is a task which absorbs the most significant block of professional time outside the classroom (Hilsum & Cone, 1971; Yinger, 1977) and which results in a rational system (record book) capable of explaining or justifying student marks at any given time. (3) From the student viewpoint, marks are part of a permanent record which may track students into specific skill levels or classes and which continue to be the most reliable source of achievement information for determining eventual college or job entry (Bejar, 1981).
(4) Finally, from the educational research community viewpoint, the process of marking deserved to be studied in its own right as the potential site of greatest accountability where learning tasks, intentionally planned and assigned by teachers, are transformed into measurable student achievement, symbolized by a mark, on a daily, cumulative basis. In this respect, it is interesting to note that teacher education programs seldom have courses or texts pertaining to the marking process nor to its role within the larger teacher process. Research Questions A study of the teacher marking judgment represents a study of general human judgment. Judgment is well discussed by Johnson (I955) and Newell (I968) and summarized and reviewed by Shulman and Elstein (I975). Judgment as described by Johnson is distinguished by three thought processes: preparation, production and judgment. ID The third process, judgment, may be idenified as the evaluation or categorizing of an object of thought. This is logically differentiated from productive thought in that typically nothing is produced. The material is merely judged; i.e., put into one category or another. Many of the subjective analyses of thinking have included a concluding phase of hypothesis testing or verification during which the thoughts previously produced are judged. In experimental psychology, judgment is a well-developed topic, studied chiefly under the headings of psychophysics, aesthetics, attitudes and rating of personnel (Johnson, I955, p. 5l, cited in Newell, I968). Newell gives Johnson's definition more precision: l. The main inputs to the process are given and available; obtaining, discovering or formulating them is not part of judgment. 2. The output is simple and well defined prior to the judgment; the judgment itself is one of a set of admissible responses; where classes or categories are given, it is usually called selection, estimation or classification. 3. 
The process is not simple transduction of information; judgment goes beyond the information given, adding information to the output. 4. Judgment is not simply a calculation or the application of a given rule. 5. The process of judgment concludes or occurs at the end of a more extended process. 6. The process is immediate, not extended through time with subprocesses, in which case we would refer to preparation for judgment. 7. The process is distinguished from searching, discovering or creating, as well as from musing, browsing or idly observing (Newell, I968). In applying these insights to an inquiry on teacher mental processes while marking pupils, Lawrence Stenhouse adds a historical dimension, retrospective generalizations, rather than the more commonly used predictive generalizations, of well-known judgment studies. Stenhouse's concern is to "map the range of experience rather than to perceive within that range the operation of laws in the scientific sense. The utilitization of history . . . works through the refinement of judgment, not the refinement of prediction" (Stenhouse, I980). In applying the thoughts of Stenhouse, Johnson, Newell and Shulman to the marking task, the following research questions were generated to guide the study: I. Upon what information is the summative mark based (first, second and final mark)? 2. What cognitive process or processes make possible the formative stages (record book categoies) of marking? 3. Is there a judgmental rule which explains how the input information (formative) is transformed into the output (summative), a mark? 4. If the judgmental rule yields a zone of uncertainty between any two preordained categories of judgment, what cognitive processes enable the teacher to assign a mark up or down? 5. Do identified cognitive processes form a pattern, schema or model of the marking process? 6. Do these identified teacher cognitive processes account for the five functions ascribed to marks by society in general? 7. 
Of the four methods of investigation used, is one superior for illuminating the marking process?

Research Methodology

The judgment literature indicates the use of numerous methodologies. Due to the nature of cognitive data and the high reliance on self-report, the field has spent considerable time critiquing its own methods in order to assure scientific validity. For teacher marking, a combination of methods allowed a system of cross-checking or corroborating evidence gathered by any one method. Hence this study used four of the most common approaches within judgment theory: process tracing, policy capturing, attribution theory and utility theory. These approaches were grounded in the comprehensive work of Nisbett and Ross, Kahneman and Tversky, Heider and Weiner, Shulman and Elstein and Simon and Newell.

Within the decision-making literature, one term needs particular explanation for this study: policy capturing. Under ordinary circumstances, policies are understood to be the general guiding principles which determine institutional procedures and regulations. In the judgment literature, policies are private/personal principles which determine the weighting of various factors (cues) involved in a judgment. Policy capturing concentrates on identifying these cues, defining their relationship to each other, and estimating their relationship to a final judgment for the purposes of prediction and control. In this study, it results in both a composite teacher marking policy and an individual policy for each teacher depicted in Figures 4.2, 4.5, 4.7, 4.9, 4.11 and 4.13. Policy capturing is discussed in greater detail in Chapter III.

The study used classroom field data derived from a series of taped interviews (five experienced, upper elementary teachers, one suburban school district), record books, marks and predicted marks in math and language arts across one year.
Quantitative and qualitative analysis focused on the formative, summative and final marks of 152 students. The distinction between these levels of marks, previously made in the research questions, is critical to any interpretation of the analysis. Formative marks are those which represent daily and/or weekly tasks within the record book columns. Summative marks are those which result from the teacher judgment at the end of an officially designated marking period and which are sent home on report cards. Final marks, sometimes called the final summative marks, are those which result from the judgment process at the end of the school year and which remain on the permanent record. The distinction between these levels is important and can be confusing, because summative marks change roles, performing a formative function within the final mark. It is the summative marks and the final marks which serve as the major data base for quantitative, statistical procedures within this study. Formative marks and verbatim protocols form the primary data base for the qualitative methods.

The data were organized into a composite case and five teacher cases, each set within the schema of the teacher and accompanied by his/her captured policies and strategies. These schemas supported an emerging classroom model of the teacher marking judgment.

Assumptions and Limitations

The primary assumption of the study is that teachers' thoughts guide a significant portion of the teacher marking process and that these cognitive processes, despite being unobservable, can be identified and understood. The study does not seek to prove or disprove a specific hypothesis. Since no other systematic inquiry has been undertaken on teacher marking judgments, this study is intended to be a first step in generating meaningful hypotheses for future research.

The study has important limitations. It was limited to the elementary level in order to have comparable data from the teachers studied.
Although the marking research literature is heavily oriented to college and high school marks, there has been little attempt to hold grade level variables constant in state-of-the-art papers. Marking reviews jog back and forth across grade levels without much concern. From the early interviews, it became apparent that elementary teachers who have students in self-contained classrooms very likely framed the marking judgment differently than secondary teachers who have students for one period only. Although some implications of the study may be applicable to marking judgments at all levels, it is imperative to recognize that this study was limited to elementary teachers.

The teachers in this study are highly experienced. None has less than 14 years of teaching. This may be viewed as a limitation when generalizing to other teacher populations. However, it may also be viewed as an advantage when it is noted that the average U.S. teacher is comparable in age and experience. Teachers of long experience may have refined the judgment process to the point of having very specific policies and strategies to declare.

The focal point of the study was teachers' judgment related to formative and summative marks. There are numerous areas in education which touch on marks indirectly, e.g., reward and punishment literature, student mediating literature, etc. To review each of these would have made the study unwieldy. Future studies may combine some of these areas meaningfully.

Summary and Overview

Chapter I has established that a contradiction appears in the marking literature as to the reliability and usefulness of marks and marking. The contradiction may arise from a discrepancy between the functions ascribed to marks by society and those actually accounted for by classroom teachers during the marking task.
In order to create an understanding of the teacher judgment process across a year, this study of five teachers serving 152 students in a typically achieving, suburban community in Michigan was undertaken.

Chapter II reviews the background literature in two related areas: research on marks and research on teacher decision making. These reviews place the problem in historical perspective.

Chapter III reviews four research methods for investigating a judgment problem which involves gathering and analyzing both qualitative and quantitative data. Process tracing, policy capturing, attribution theory and utility theory are applied to five teacher cases and one composite report. The primary mode of data collection, the structured interview, is discussed in detail.

Chapter IV displays the data which teachers considered during the marking task, contains inferences made from the data and presents a model of the marking judgment process.

Chapter V summarizes the findings. The model of the teacher judgment process is compared to the functions of marking as delineated in Chapter I. Implications of the study are discussed. The conclusion is summarized under the research questions which guided the study.

CHAPTER II

REVIEW OF THE LITERATURE

Introduction

Teacher judgments during the marking process have not been empirically investigated and, therefore, do not provide a body of research to review. However, studies of teacher decision-making processes and studies of marks and marking have been combined to lay the groundwork for investigating teacher thinking during the marking task. This review is divided into two major sections: Research in Marks and Marking and Research in Teacher Decision Making. Each section is followed by a brief critique, and the chapter is concluded with a summary statement.

Chapter III, Methods and Procedures, contains an additional literature review.
It is limited to a review of four common methods of investigation used to examine cognitive processes, and was included because of the controversial nature of self-report data which forms the basic evidence in judgment studies. In Chapter III, the review is directly tied to the immediate application of specific methods within the marking study and, therefore, was not included as part of the general literature on marking judgments.

Marks and Marking

Research on marking parallels historical educational concerns. For the purpose of organizing this review, the literature has been divided into three phases: pre-1920, when the emphasis was on perfecting standards of measurement of pupil products, usually without concern for the pupil; 1920-1970, when the emphasis was on the learner and on increasing the functions and comprehensiveness of marks to account for pupil characteristics related to learning; and 1970 to the present, when the emphasis is on teacher behavior which causes learning, especially in relation to evaluation and accountability.

The years used to bound a time period are characterized by particular educational developments or movements at a national level, including: the Testing movement (Army Alpha & Beta tests), the Progressive Education movement (Developmental psychology), the Behavioral Objectives movement (Goals statements), and the Accountability movement (Coleman Report combined with national decline in SAT achievement scores). In no sense are these dates intended to be precise cut-offs, but rather general markers along trend lines. In fact, most movements were germinated by particular events in a preceding period. Despite national movements and expanding functions, marking practices reflect the peculiar local nature of U.S. education, which blends national concerns in a local formula, frequently tied to a local tax base. And, as this study illustrates, marking continues to reflect the processes of individual, human judgment as exercised by the teacher.
The central question for research on marks has been what the mark should represent. What aspects of student performance should the mark try to characterize and in relation to what reference group should the appraisal be made? These questions set the research agenda on marks. The search for answers reveals a clear parallel with historical educational developments: a trend toward adding functions, toward more sophisticated measuring tools, toward complexity of record format. Concurrently, despite increasing functions, the search reveals a strong underlying reliance on pupil achievement variables. The search also reveals that research on marks has had a limited conceptual framework which has manipulated and correlated product and learner variables in static designs, but which has not investigated the variables of teacher and classroom.

Five reviews are especially pertinent and form the basis of this section of the study, although the section is brought up to date by reference to single studies recently completed. A seminal review by Wrinkle (1947) set forth the basic functions of marks, and all other reviews refer to Wrinkle's contribution. A review by Smith and Dobbins (1959) organized the literature between 1910 and 1957 from a historical perspective. Another by Kirschenbaum, Simon and Napier (1971) took a humanistic point of view and organized the research literature to convey the message that traditional marks were dysfunctional for a significant segment of students. This point of view was supported and updated by Evans (1976), where the literature was specifically organized around Wrinkle's four functions of marks. Lastly, Thorndike (1969) organized the literature around a traditional measurement view where marks were compared to specified frames of reference. Thorndike parted from the traditional view, however, by arguing for a new research agenda where institutional and teacher variables would be investigated.
Each review was done in depth and offered an important perspective.

Pre-1920

Studies on marks have reflected the development of educational concepts. Early work reflected the long-held philosophy that knowledge existed apart from the individual. The job of teachers was to impart knowledge to students and to measure the extent to which it had been absorbed. Early research was concerned with the knowledge-product and standardizing measurement across teachers.

Starch and Elliott (1912) set the pace with three simply designed studies asking approximately 100 teachers in each of three subject areas to mark a paper on a scale of 100 points. Critics accepted the 39-point range found in the subjective area of English, but they were surprised at the range of 45 points found in geometry. The probable error in all three subjects was nearly the same: 5.4 English, 7.5 math, 7.7 history. Starch and Elliott concluded that variability was a function of the examiner and of the method of examination. With slight variations, these studies were repeated up through the 1950s with similar results. Studies by Rugg, Dearborn and Kelly between 1910 and 1915 led to the generalization that individual teachers set their own standards, with a resulting variability, unreliability and inconsistency of distribution of student marks (Smith & Dobbins). Bells (1930) demonstrated that teachers regraded papers with low reliability. Tieg (1952) reported that a single teacher given the same test paper to rescore assigned marks that differed 14 points (100-point scale) on the average from first marks.

However, studies concluding low reliability were accompanied by studies which were more optimistic. Greater reliability and less variability were achieved by merely changing from percentage points (100 categories) to letter grades (commonly five categories). Starch and Elliott supported this.
Wrinkle pointed to the adoption of five symbols as the greatest single change in marking practice since 1900, and by 1939, four-fifths of elementary and secondary schools used them (Wrinkle). Less variation also resulted when serious attention was given to marking and when teachers agreed upon general governing principles (Jaggard, 1919). Common standards were the pursuit of the College Entrance Examination Board as early as 1910 (Smith & Dobbins).

Studies continued to measure learning as a product to be accounted for by standards, frequencies and distribution measures. Answers to marking problems were sought by manipulating symbols and categories, and the major function of marks was to represent knowledge.

1920-1970

The years between 1920 and 1970 clearly reflect expanding functions beyond the measurement of subject matter achievement. Widespread use of the Army Alpha and Beta tests during World War I established the concept of measurable differences in capacity and went a long way toward destroying the idea that any child could learn as well as any other child if s/he tried hard enough. This concept brought an interest in individualization of instruction, increased attention to ability grouping, changes in promotion practices and wide use of standardized tests.

Accounting for Capacity. From World War I onward, the marking function was expected to take account of individual capacity. The use of standardized tests for individualization and ability grouping brought forth the problem of whether or not a full range of marks should be used in each ability group. Studies promoted various solutions, such as A, B and C for superior classes; B, C and D for average classes; C, D and F for slower classes. A Los Angeles committee (1926), however, recommended the whole range in each section, and a Wisconsin committee (1929) recommended a symbol to denote the level at which the work was done (Smith & Dobbins).
Research was inconclusive, and the problem of grouping or tracking continues into present day discussions of mainstreaming and gifted education.

As compulsory attendance laws took effect and the school population increased, the measurement of achievement and the prediction of performance became fascinating fields in their own right, separate from marking, but interestingly always reliant on marks as a point of validation (Bejar, 1981; Lavin, 1965). These fields have a voluminous research literature which is not reviewed within this study but which is referred to at a later point.

Accounting for Character and Motivation. The impact of testing on marking was strengthened by the impact of the Progressive Education movement, undergirded by developmental psychology. The movement was gathering momentum prior to World War I. The Progressive Education Association officially stated: "The aim of Progressive Education is the freest and fullest development of the individual, based upon the scientific study of his mental, physical, spiritual and social characteristics and needs" (Cremin, 1951, p. 240). Cremin characterized the movement as a many-sided effort to use the schools to improve the lives of individuals: first, by broadening the program and function of the school to include direct concern for health, vocation and quality of family and community life; second, by applying scientific, pedagogical research in the classroom; third, by tailoring instruction to the different kinds and classes of children who were being brought within the purview of the school. This broadening viewpoint grew in momentum and influenced most educational procedures, including pupil records and marks (Cremin).

Whereas the testing movement greatly influenced grouping and promotion practices related to marks, the Progressive Education movement emphasized pupil characteristics which were related to level of development and motivation.
Ethical character, citizenship, responsibility, industriousness, worthy use of leisure time: these factors were considered to be important and related to learning. Report card formats responded by carrying checklists which supplemented symbols. Studies showed that teachers considered effort, attitude and other factors besides achievement and that the term "achievement" had many component parts such as accuracy, mastery, regularity and application.

The use of rating scales and checklists in both academic (cognitive) and nonacademic (affective) areas offered a basis for marking which was behaviorally defined. Freyd (1923) found rating scales yielded more reliability between raters and between the same rater over time than did marks. By 1939, 87 percent of elementary report forms listed character traits (Wrinkle, 1947). Evaluation in this manner, however, was often difficult to explain to parents because of overlapping terms such as "reliability," "dependability" and "responsibility" (Wrinkle).

Gradually, parent conferences were instituted to supplement both symbols and checklists. Conferences were found especially useful when one teacher had a class of youngsters for the whole day, but the time commitment to this process was heavy. Evidence indicates that supplements to symbols developed steadily at the elementary level, with a clearly emerging relationship between wealth of the school district and extent of the supplements (Thorndike, 1969). The marking function was expected to account for pupil traits which enhanced or detracted from school learning.

The 1930s marking studies reflected a growing criticism of marks on the part of the Progressive Education movement. The major thrust of criticism was directed toward the competitive nature of marking, which was believed to seriously harm student motivation, especially that of students from poor home backgrounds (Wrinkle). Research studies were directed toward the relationship between motivation and grades.
Marks as major incentives were heavily criticized in the belief that they often led to undesirable behaviors such as cheating, superficial learning and complacent attitudes. Some educators thought they encouraged complacency and/or fostered fear (Smith & Dobbins). Tiegs (1931), however, found that with intermediate pupils, 90 percent tried harder because of good marks and 97 percent because of poor marks. Fay (1937) reported that A students do better when told their marks, B students somewhat better and C students only slightly better (Smith & Dobbins). Although feelings ran high on this topic, research was inconclusive. More recently, research indicates that usually the better students experience greater motivation from grades (McDavid, 1959; Phillips, 1962) and down-graded students often continue to fail (Chansky, 1962). A number of researchers have observed that teacher marks are often skewed to the high end of the curve, and that this skewing is to be expected in selected groups (Smith & Dobbins), but skewing was not examined as a motivational technique.

Research on marks and motivation, spanning the years between 1930 and 1970, can be summarized by saying that different students respond to grades differently. Wrinkle, however, concluded that of all the functions which marks were expected to carry out, the motivational function was actually the "only one they served with any considerable degree of effectiveness." Heavy criticism of marks continued through the reviews of Kirschenbaum and Evans with two major effects: first, it firmly established the motivational function of marks; second, it promoted the search for substitutes.

The 1940s movement to Behavioral Objectives evolved from the search of Progressive Education for a substitute for marks. It was also encouraged by the realization that no one could be sure what a single mark meant unless it represented the measurement of a single identified value.
If the change from percentages to five symbols was a hallmark in the history of marking, the movement to objectives was surely as important. However, despite its original promise as a substitute for marks, its eventual impact was indirect. Its profound effect was to reground the marking function in the curriculum rather than in the competing functions of selection, prediction, reporting or motivation. Aside from a few marking experiments, Behavioral Objectives changed the process of marking rather than marks themselves.

Accounting for Goals. Research on marks and marking since 1940 reflects the growing momentum of the Behavioral Objectives movement and its absorption by the Accountability movement. Marking studies indicated a growing practice of attempting to evaluate much more than subject-matter achievement (Traxler, 1957); however, the general terms of "citizenship" and "responsibility" were not satisfactory due to overlapping meanings. The improvement of written comments became a distinct goal and borrowed much from Traxler in "The Nature and Use of Anecdotal Records" (Traxler, 1939; Wrinkle, 1947).

The Ten-Year Study at Colorado State College of Education shows the emerging relationship between marking and Behavioral Objectives. In 1929 the Colorado Campus Research Laboratory announced it was going out of the A B C D E marking business:

Over the next ten years we made almost every mistake a school could make in our efforts to improve our marking and reporting practices. In rapid succession we developed and discarded innumerable detailed evaluation report forms, checklists and scale-type reports. We juggled symbols: S U, H S U, H M L, and others. We accumulated thick files of anecdotal records. We tried informal-letter reports. For a time we abandoned all forms of written reporting and substituted parent-teacher conferences. We constructed elaborate cumulative record forms. We emphasized student self-evaluation.
We developed still other detailed report forms. And in every direction we went we came out at the same spot. If it was good, it took too much time; it wasn't practical; it wouldn't work in the public schools. And our job as a research-laboratory school was to work out not only something we could use; it had to be equally useful for Yuman or Yampa or Teaneck or Tacoma (Wrinkle, p. 4).

In the 1940s, Wrinkle looked back and wrote:

From the beginning of our work we recognized the importance of outcomes other than those specifically associated with subject matter. We considered self-directiveness and social adjustment and others of such importance that we built checklists, scales, self-evaluation forms, and other devices to evaluate them. But it did not dawn on us until about 1938 that if the end product of a part of a youngster's experience could and should be measured in terms of what he does, the end product of all educative experience is the modification of behavior of the learner. That is why we have schools. If a learning experience does not result in a modification of the way the learner behaves, he has not really learned anything of real value (Wrinkle, 1947, p. 4).

Behavioral Objectives eventually became a powerful center of controversy. Bloom (1956) created a taxonomy for classifying objectives which fueled educational research for years. He went on to establish percentages of mastery within objectives and eventually to assert that aptitudes were alterable, that all students could learn most objectives given sufficient time and appropriate instruction (Bloom, 1976). The tendency of some parts of the measurement field to define smaller and smaller units of behavior, however, was viewed with growing alarm even by such advocates and seminal thinkers as Ralph Tyler (Shane, 1973) and James Popham (1972). Tyler repeatedly cautioned that human nature was better portrayed by higher or more general objectives than by the specific ones currently used (Tyler, 1950).
Popham noted that some important goals were unassessable and that in some classes the proportion of nonmeasurable goals might be smaller than in others.

By the 1960s, it was clear that Behavioral Objectives would not be an acceptable substitute for marks. The list of Behavioral Objectives grew so long that it had to be classified into the domains of cognitive, affective and psychomotor. Lists of objectives on report cards represented the height of the expansion of the format and function of report cards. Following this attempt at substitution for marks, no new solutions have been offered within the marking literature, although minimal competency tests have been offered as a product bottom line.

1970 to the Present

Prior to 1970, commissioned by the federal legislature to perform the first massive assessment of public education, Coleman published his correlational research Equality of Opportunity (Coleman, 1966). His conclusion was that despite massive compensatory spending programs sponsored by the Civil Rights Movement, the schools were not fulfilling the job of equalizing the outcomes of all students. In 1969, heeding Coleman's message, Leon Lessinger articulated the government's accountability theory in education, whereby schools were to be responsible for actual learning outcomes, not just for providing opportunities (Lessinger, 1969, 1975). Accountability for outcomes following 1970 was fueled by a swelling tide of public conservatism, fiscal and otherwise, and by the startling fact, emanating from the Scholastic Aptitude testing service, that the achievement scores of college bound students had declined steadily since 1963 (S.A.T. Panel Report, 1977).

The period of expansion of marking functions, paralleling the period of curriculum expansion, experienced the strong impact of evaluative measurements and publication of the results.
The lack of success of expensive compensatory education programs, the declining achievement scores and the general swelling of conservative sentiment caused educators to debate which of the many functions which had accrued to education over the years were truly their responsibility. Educational research since 1970 reflects an evaluative and reflective mode and a reemergence of educational historians searching the past for patterns of development and success across the years (Broudy, 1972; Cremin, 1961, 1965; Tyack, 1980). In particular, Cremin documents the formal demise of Progressive Education in the 1950s (Cremin, 1961), although others believe it reemerged briefly with the Civil Rights Movement (N.E.A., 1974).

The marking literature after 1970 is characterized by a drop off in research studies, by literature reviews which are forced to refer to studies at least twenty years old, and by very specific investigations into declining achievement scores related to a phenomenon called "grade inflation." The first two characteristics are self-evident in the literature reviews, but grade inflation is of particular interest. Evans (1976) reported that during the 1950s and 1960s, aptitude test scores increased, but grade distributions remained unchanged. Aiken (1963) found that average grades remained unchanged despite rising S.A.T. scores. In 1972 Baird and Feister analyzed data from large samples and concluded:

This study confirms the earlier research . . . which indicated that faculty members, at least collectively, prefer or are committed to a certain distribution of grades. Thus, faculties show an "adaption level" by awarding, on the average, about the same average and distribution of grades, whether their current students were brighter or duller than last year's (Baird & Feister, 1972, p. 440).

However, toward the end of the 1960s, this situation reversed. Aptitude scores started to drop and average grades awarded to students started increasing.
This phenomenon has been carefully reported in the S.A.T. Panel Report (1977). Termed "grade inflation," this trend has occurred at both the college and high school level (Ferguson & Maxey, 1975), but the rate of grade inflation has diminished since 1974 (Bejar, 1981). According to Evans, the resulting situation indicates that grades reflect different achievement levels at various time periods because different levels of competition prevail. It also indicates that within a limited range, teacher marks reflect changing educational values with allowance for lag time.

By 1974, a survey by N.E.A. indicated that the five-symbol system of reporting pupil progress was dominant in the majority of the nation's schools. At the fourth-grade level, approximately seventy percent of the schools also used a supplement such as parent conferences. Following the fourth grade, supplementary information declines in use (N.E.A., 1974). However, the use of grades for promotion and eventual college entry remains the dominant pattern through the 70s to the present, and pupil achievement is the primary basis for A B C symbols.

In summary, the literature on marks and marking has expanded from theoretically representing one function, a quantity of knowledge, to representing five very complex functions. Since the 1960s the expansion appears to have tapered off. The criticism by Progressive Education that marks did not encompass enough of the functions of learning has been replaced by the criticism of the conservatives that marks are attempting to reflect too much, hence, "grade inflation." Although there is no evidence of a contraction of functions, many evaluative studies have been suggested. The present study of teacher cognitive processes is representative of this trend.

Critique of the Marking Literature

The thoughts of Joseph Schwab on curriculum are especially pertinent to marking.
In 1969 he wrote of the field of curriculum that it was "moribund," unable by its present methods and principles to continue its work and desperately in need of new and more effective principles. Signs of such a crisis include a flight from the subject of the field, including a sign of "marked perseveration, a repetition of old and familiar knowledge in new languages which add little or nothing to the old meanings as embodied in the older and familiar language or repetition of old and familiar formulations by way of criticisms or minor additions and modifications" (Schwab, 1969, p. 4). Schwab could have been describing research on marks since the 1960s. He went on, however, to suggest that the problem resulted from an unexamined reliance on theory in an area where theory is partly inappropriate and partly inadequate to the task.

Schwab argued that curriculum was a practical rather than a theoretical art. Practical arts begin with the requirement that existing institutions and existing practices be preserved and altered piecemeal, not dismantled and replaced, because the functioning of the whole must remain current. That is, the practical is concerned with the effects of the proposed pattern of change through time in order that they retain coherence and relevance to one another (Schwab).

Research on marking has been limited by a narrow conceptual framework which has (1) concentrated on product and learner variables, (2) concentrated on static designs and (3) concentrated at an idealistic level of what ought to be rather than the realistic, practical classroom as it is. In this respect, research on marking reflects similar problems to curriculum, the problems of psychological research applied to education.

The conceptual framework of marking has been limited by a concentration on product and learner variables and by a neglect of the judgment processes of teachers.
Yet, beginning with the work of Starch and Elliott in 1912, teacher judgment has been acknowledged as the crucial element in variation. But Starch and Elliott did not study teacher judgment in relation to human judgment in general. They preferred to work with product standards, marking frequencies and distributions. By 1947, Wrinkle stated that the "assumption that anyone except the person who gives a mark can look at it and tell with any degree of accuracy what it means is the No. 1 fallacy involved in the use of the conventional marking system" (Wrinkle, p. 35). But Wrinkle did not study the cognitive processes which allow teachers to combine elements into a judgment. Instead he attempted to solve the problem by defining behavior in small, easily observable behavioral units. The marking literature contains several references to teachers' tendency to skew marks upward, but there are no systematic studies of teachers' awareness of or rationale for this phenomenon. Hence the conceptual framework has remained limited and repetitious.

Lavin, in a review of 300 empirical predictive studies of achievement, noted that research had not studied the teacher grading process. "A student's grade is more than something that characterizes him as does his score on a personality inventory or an intelligence test; . . . rather a grade should be viewed as a function of the interaction between student and teacher . . . . It is clear that if we want to predict a grade, we must know something not only about the student, but about the teacher as well." Lavin argued in 1965 for a study of the subjective factors in teachers' grading practices, even though they were harder to define and to measure reliably (Lavin, 1965). To date, that study has been neglected.

Cronbach argued in a similar fashion in 1975.
Noting that the original role of the scientist, to observe, had been neglected in favor of controlled experimentation, he pointed out the severe limitations of experimental or individual psychology in detecting interactions of the sort Lavin called for. He attributed this limitation to the need in physical science to have a fixed reality, a controlled variable or a limited range of situations. "Rarely is a social or behavioral phenomenon isolated enough to have this steady process property" (Cronbach, 1975, p. 682). Cronbach called for a return to a disciplined observation of uncontrolled conditions, of personal characteristics and of events that occurred during treatment and measurement. One reasonable aspiration of psychology is to assess local events accurately in order to improve short-run control, short-run empiricism. Cronbach's call for careful description has been answered by numerous ethnographic studies of classroom behavioral processes, but descriptions of cognitive processes have been slower to appear. Hence the mental processes which help teachers to simplify the complex environments of the classroom have essentially been neglected.

Thorndike's review of the marking literature supports both Lavin and Cronbach by concluding that any future improvement in marking practices had to be sensitive to the very real limits in time and precision of judgment within which the typical teacher operates. He stated further that research had been largely unaware of the complex cultural pressures which bear upon an instructor much more than any psychometric theory. In this respect, he pointed to the "modus vivendi" which has typically been worked out between the traditions of marking and the rest of the institutional culture (Thorndike, p. 765), a "modus vivendi" which is typical of the practical arts but often neglected by theoreticians.
The conceptual framework has been further limited by static research designs which have allowed the assessment of pupil progress to be dominated by one final (summative) mark, paper or test. Teachers, however, give several marks during a year, and usually these are based on numerous task assignments or tests. Few studies have examined the impact of performance patterns. In the few that have, a relatively consistent pattern of evaluation has been found: continuously high performance > ascending performance > descending (or random) performance > continuously low performance (Ryan & Levine, 1981). Other linear experiments have investigated whether expectancies reflect a primacy or a recency effect, with final performance being dominant. A current study by Ryan and Levine found that a modified recency model, which assumes that the weight assigned to final performance varies inversely with the difference between final and next-to-final performance, was most reflective of observers' evaluative processes. Predictions were different from evaluations, with subjects less willing to make a long-term bet on a "late bloomer" than on one who showed generally high or consistently improving performance. These sequential studies of patterns within marking show promise and need replication in natural settings.

The documented limitations of the conceptual framework in marking research are strikingly similar to the situation in curriculum described in 1969 by Joseph Schwab. Marking functions may lend themselves much more to practical arts than to theoretical and ideal circumstances. The ability of teachers to consider a practical combination of functions in a single symbol is best discussed through the literature on teacher decision making which follows.

Teacher Decision Making and Judgment

Teacher decision making is a relatively new field.
It emerged from the field of teacher effectiveness in response to a growing criticism that, despite extensive research, few consistent relationships between teacher variables and effectiveness criteria had been established (Doyle, 1975; Dunkin & Biddle, 1974; Getzels & Jackson, 1963; Medley & Mitzel, 1959). In addition, the field was characterized by a growing discrepancy between theoretically based prescriptions and actual classroom practice (Doyle, 1975). Teacher decision-making studies were an attempt on the part of researchers to get beyond the variables of observable behavior to the teacher mental processes which select tasks and guide classroom behavior. Such inquiry would shed light on the growing discrepancy between theory and practice.

Teacher Effectiveness Research

The literature on teacher decision making is limited, but its roots are well established in the teacher effectiveness literature. The bulk of effectiveness research concentrated on four classifications of variables: presage variables (teacher characteristics such as age, sex, social class and training), context variables (grade level, subject matter, size of class and community), process variables (teaching method and style, talking and questioning strategies) and product variables (achievement and tests). In contrast to the marking literature, which focused largely on product and learner variables (aptitude and intelligence), teacher effectiveness research concentrated on process variables in an attempt to isolate generic skills which would be taught in teacher training institutions. Empirical research, manipulating or counting these process variables in controlled settings, was extensive and has been well reviewed (Dunkin & Biddle, 1974; Gage, 1978; Medley, 1977; Rosenshine, 1976).
The teacher-effectiveness research contributed greatly to the identification, specificity and frequency of teacher actions in the classroom (questioning, reinforcing, wait-time, acknowledging students, task orientation, praise, absence of criticism; Gage, 1978; Medley, 1979; Rosenshine, 1979). Gage, sifting through several hundred variables in teacher behavior and classroom activity, summarized the implications of the research to say that teachers should organize and manage classes so as to optimize "academic learning time," the time in which pupils are actively engaged in their academic tasks (Gage, 1978, pp. 39-40). To this end, Gage listed seven "teacher-should" statements which he felt were easily inferred from research data. Hence the search for overt acts of teachers which were related to student achievement bore fruit, and recently Rosenshine (1979) described process-product research as "alive, well, and continuing."

Nevertheless, teacher-effectiveness research came under continued criticism, much of it from within its own research ranks. The process-product approach was limited conceptually (Brophy, 1980; Doyle, 1975; Shavelson, 1980), primarily because it had not taken account of teacher thoughts and intentions in organizing the classroom. Brophy, prominent in the field, stated that effectiveness research had been virtually silent on the topic of teachers' thoughts while engaged in the act of teaching. He attributed this silence to the pervasive influence of behaviorism on American social science research, which tended to look upon thoughts as "mere epiphenomena accompanying behavior" and which tended to focus on observable teacher skills rather than decision making. However, Brophy stated that this position had softened even among serious behaviorists like Bandura (1977) and Meichenbaum (1977), who had recently stressed the role of thinking (self-talk, verbal behavior) in directing behavior (Brophy, April 1980).
In his latest paper, Recent Research on Teaching, Brophy noted the limits of teacher research during the 70s and predicted a shift from studying teacher effects as measured on end-of-the-year achievement tests to studying immediate or at least short-term outcomes of instruction, with increasing attention to the performance demands that different teaching behaviors and decisions impose on students (Brophy, November 1980, p. 20).

Others support Brophy's point of view, finding many classroom and teacher problems which are not addressed by the process-product approach. Shavelson notes that recommendations about increasing the frequency of a teaching act say nothing about when to act (N.I.E., 1975; Shavelson, 1980). Hence a teacher may possess a full range of teaching skills but not have strategies to determine when they are appropriate. In a comprehensive paper on the shifting paradigms of research, Doyle had predicted a shift away from the conceptual limitations of the process-product approach as early as 1975, although he foresaw more emphasis on the environmental press of classroom demands (ecological) which establish limits to the range of response options available to the teacher (Doyle, 1977).

Information Processing

The basis for a study of teacher thoughts has been well laid at the Institute for Research on Teaching. The model for the institute was one of Clinical Information Processing, presented at the National Conference on Studies in Teaching by Lee Shulman (N.I.E., 1975). The focal point of the model and of the institute remains cognitive functioning, or the mental life of the teacher. How do teachers process the information of the classroom to create an environment which promotes achievement? The information of the classroom includes student characteristics, student records, subject matter, curriculum resources, organizational structure of the school, the accountability system, etc.
Each of these serves as a set of "cues" from which the teacher makes diagnostic judgments and plans prescriptive actions. The panel which supported the general clinical model was anxious that it not be a sterile focus, and they made a commitment to whatever forms of disciplined inquiry seemed appropriate to the research problem and educational setting under investigation. Hence the model has inspired research into a variety of teacher decisions, and it has encouraged the borrowing of conceptual research frameworks from related fields such as cognitive psychology and anthropology.

The theoretical grounding for the information-processing approach came primarily from cognitive psychology, with meaningful elaboration by learning psychology. In particular, information processing acknowledged the fact that the ability of people to process all of the information in their complex environment was limited by the capacity of short-term memory (Newell & Simon, 1972; Simon, 1969). Human decision making and judgment were able to overcome some of the limitations by using the identified processes of simplifying and inferring. Simplifying strategies, such as "chunking" information into more abstract units, increase the amount of information processed (Miller, 1956). Simon specified seven chunks plus or minus two, while Craik lowered this estimate to three chunks, plus or minus one (1971).[1] People also selectively perceive and interpret portions of the available information and construct a simplified model of reality (Newell & Simon, 1972). By "bounding" the system within which they are operating, decision-makers do not have to deal with the real world in its totality but rather with only those "strategic factors" in a situation, once again reducing the demands of the environment (Simon, 1957). In this context, Simon argued against the rational decision models which held that all possible alternatives and their outcomes should be examined before a decision is made.
Instead, Simon contended that administrative man, as distinguished from the ideal economic man, looks for a course of action that is satisfactory rather than optimal. Along with March, Simon elaborated the notions of "satisficing" and "bounded rationality" as depicting the reasonable thinking of man in a complex, confusing environment (March & Simon, 1957; Yinger, 1977).

A second mental strategy underlying information processing is inferring, or going beyond available data. As discussed in the cognitive literature, the most common tools of inference include the creation of knowledge structures (schemas, beliefs), the availability heuristic and the representativeness heuristic (Nisbett & Ross). Kahneman and Tversky (1973) noted that decisions often required people to judge the relative frequency of particular objects or the likelihood of particular events. In so judging, they may be influenced by the availability of the objects or events, that is, by their accessibility in memory processes. To the extent that this availability is associated with objective frequency, it is a useful judgment tool. However, cognitive psychologists have found it subject to typical biases. Kahneman and Tversky described a second tool which aids people in decisions and which they termed the representativeness heuristic. People assess the degree to which the salient features of the object are representative of, or similar to, the features presumed to be characteristic of the category. Although this heuristic is useful, indeed an absolutely essential tool, it too is subject to typical problems, the most common being the neglect of important information contained in relevant base rates (Nisbett & Ross).

[1] Chunking is pertinent to this study when considering that arguments within marking research frequently revolved around having three, five or seven symbols in the system (Smith & Dobbins, 1959).
Nisbett and Ross contend that people's understanding of the flow of social events probably depends less on specific use of tools than on such procedures as a rich store of general knowledge of objects, people, events and their relationships. This knowledge is represented as beliefs or theories or schemas. Such knowledge structures provide an interpretive framework which helps supplement information and resolve ambiguity. The concept is interesting to psychologists, and Nisbett and Ross refer to a growing list of terms including "frames," "scripts," "prototypes" and the more general term "schemas" (for a detailed discussion of these structures, one is referred to Nisbett & Ross, Human Inference).

Cognitive psychology has studied these mental processes in applied settings in order to understand people's actions in certain complex environments. Given the growing awareness of the complexity of the classroom environment, given the increasing functions of education and given the limited predictable relationships found between variables in previous educational research, it seems natural that researchers in teacher effectiveness have begun to join fields to explore the mental processes which guide applied behavior. To date, these studies within education are limited, informative and growing in number. Most have concentrated on teacher choice of alternative curriculum, teacher allocation of time, teacher interactive decisions, teacher conception of subject and teacher planning. It is notable that none have been undertaken on teacher judgment during the marking task.

Teacher Decision Making

Since 1970 there have been approximately forty studies of teacher decision making. Reviews of the teacher thinking literature are organized around the categories of planning, judgment, interactive decision making and others (Brophy, April 1980; Clark & Yinger, 1978; Joyce, 1980; Shavelson, 1980).
Since each review examines most of the same studies, this review will summarize the trends and conclusions, referring to specific studies only under the category of judgment.

Most of the planning studies have been curious about the extent to which the model of rational curriculum planning is used in teacher plans. This applied model was first proposed by Tyler (1950) and later elaborated by Taba (1962) and Popham and Baker (1970). It recommended four essential steps: (1) specify objectives, (2) select learning activities, (3) organize learning activities and (4) specify evaluation procedures (Clark & Yinger, 1978). The findings of Zahorik and Taylor, using questionnaires and rankings, were that teachers did not begin their planning with objectives, but rather with content (subject matter) or resources. Contrary to the theorists' ideas, procedures for evaluation were an issue of minor importance (Taylor, 1970; Zahorik, 1970).

Following 1975, research on planning focused on simulated planning situations rather than rankings (Peterson, Marx & Clark, 1978; Morine, 1976). Simulated cases in laboratory settings then gave way to classroom field studies (Morine, 1975; Smith and Sendelbach, 1979; Yinger, 1976). The latter relied heavily on self-report methods, especially verbal analysis of "think aloud" sessions or written plans and stimulated recall procedures. The general conclusions of the planning research were that teachers' plans are unique, that teachers focus first on content (subject matter), that they spend much of their planning time on activities or instructional tasks and that they seldom refer to behavioral objectives.

The planning literature joined the interactive decision-making literature on the importance of maintaining an activity flow during class periods. To this end, plans are routinized to minimize conscious decision making during interactive teaching (Clark & Yinger, 1979; Joyce, 1979; Mackay & Marland, 1978; Morine & Dershimer, 1978).
Conscious decision making, therefore, usually arises when the teaching routine is not going as planned.

Joyce's review of planning and interactive decision making supported the notion that the most powerful decisions were made early in the year. These decisions involved the selection of instructional materials and the development of a flow of activities which enabled children to approach tasks which were embedded in the material. This flow limited the potential stimuli to which students and teacher responded, establishing routines and parameters of on- and off-task behavior; this became the basis of information processing and fine tuning. Once this flow was set up, teachers rarely made decisions which changed directions. Instead, their concerns divided between pupil achievement (appropriate response to content of tasks) and involvement (maintenance of on-task behavior). In other words, said Joyce, teachers do not think as instructional designers do, continuously selecting new methods and ways of reaching children, but rather, they work within a general design set up early in the year (Joyce, 1979).

Teacher Judgment

Along with planning and interactive decision making, judgment has been investigated. Reviewing judgment studies is complicated by the lack of distinction between the terms judgment, planning, prediction and decision making. For example, planning and judgment are frequently lumped together in literature reviews, although planning is generally defined as a beginning or preactive act. Judgment, on the other hand, is located at the end of a sequence of operations, as a post-active operation, as the final "assignment of an object to a small number of specified categories" (Johnson, 1972). Prediction is more closely associated with planning, while evaluation is associated with judgment.
Within the teacher studies to date, these distinctions have not been clear, although Einhorn and Kleinmuntz recognized the problem and suggested "that the distinction between judgment and choice be maintained and sharpened" (Einhorn, 1979).

Therefore, when is judgment most important in teaching? Clark answered that "teacher judgment plays an important part in predicting student cognitive and affective achievement, predicting teacher's use of instructional moves, teacher planning and teacher selection of instructional activities" (Clark, 1978). Clark's definition is obviously broad and points to the lack of clear classification of judgment studies, yet it depicted the newness of this focus of research, for which Shavelson is now urging a taxonomy (Shavelson, 1980).

Teacher judgment studies, as classified in the literature reviews of Clark, Shavelson and Brophy, are limited to approximately ten studies. Anderson (1977) studied the judgment policies of 164 high school teachers in regard to 36 different hypothetical descriptions. Teachers were asked to rate the descriptions on a nine-point scale from very poor to outstanding. In addition, teachers were asked to rate each characteristic separately and to rank order them. The conclusions pointed to greater consistency in areas ranked least important, such as establishment of objectives and homework requirements, and to inconsistency in areas ranked most important, such as knowledge of subject and fairness in grading. The general conclusion was that teachers may base their decisions on a policy different from the one they report using.

In a frequently quoted study, Shavelson et al. (1977) investigated the sensitivity of teachers to the reliability of information received and their willingness to revise judgments when presented with new information. This experiment involved 164 graduate students in education with a hypothetical case study.
The major finding was that teachers are sensitive to reliability of information and will revise estimates. This finding contradicts the major findings of Kahneman and Tversky that people in general were neither sensitive to reliability of information nor willing to revise.

In a related study involving anchoring, Joyce et al. asked 10 teachers to perform pupil sorts at repeated times during the school year and to predict end-of-the-year reading achievement. In sorting, the major cues were student personality and involvement. Other cues included ability and achievement. The most obvious finding of the study was that teacher prediction of reading did not differ substantially between September and November, even though teachers had much more information. This stability of judgment contrasts with the findings of Shavelson et al.

A study by Clark et al. (1978) attempted to identify the cues within language arts activities which caused teachers to judge them useful. Clark asked 14 teachers to rate 26 language arts activities. He then asked them to reexamine each activity rated highly and list the attractive features. This was repeated for activities rated low. Student motivation and involvement were mentioned most frequently, followed by features thought to influence cognitive and affective outcomes. Level of estimated difficulty of the subject came third.

Marx (1978) studied the judgment cues teachers used in predicting cognitive and affective outcomes. Twelve teachers taught a series of three social studies lessons to groups of eight junior high students in a laboratory setting. After each session, the teachers made predictions of the rank order of students and described student behavior or other cues which caused the prediction. Marx found that participation was the most frequently used cue.

In a subsection on judgment in his review of teacher cognitive activities, Brophy (April 1980) referred to many of the same studies.
He referred back to expectancy studies which indicated that teachers can and sometimes will make predictions about student achievement on the basis of such factors as race, handwriting neatness or physical attractiveness (Brophy & Good, 1974). He was also intrigued by the tendency of first impressions to "anchor" later perceptions, as noted in Joyce's study. This anchoring appears in documented studies where students who do well early and then tail off tend to receive higher predictions of future achievement than students who ultimately earn the same average score (Brophy, April 1980). This last finding of "anchoring" is questioned by the conclusions of Ryan and Levine, mentioned earlier, which indicated that people use different sequencing cues for prediction than for evaluation, anchoring being more closely related to prediction and recency to evaluation or final judgment.

On the whole, Brophy reports that analyses indicated that teachers based their estimates of student achievement and classroom conduct on the cues most relevant . . . reading achievement on prior reading, math predictions on prior math, and behavior on previous classroom conduct. A study by Willis (1972) supported this. Willis asked first grade teachers to predict end-of-the-year achievement during the first week of class. Teachers' early rankings correlated highly with end-of-semester rankings. Interviews in regard to criteria of judgment revealed self-confidence, participation in class, general maturity and ability to work well independently. After a few weeks of experience with the students, even more weight was given to observed classroom performance in reading. This agreed with Shavelson's results, and similar findings were reported by Long and Henderson (1972). Brophy concluded that compared to the work on teacher planning, "teacher judgment has yet to 'jell' as a subfield with structure and direction."

Critique of Decision-Making Research

All reviews point out the newness of the field and the lack of studies to critique. Shavelson's review (1980) pointed out that the planning studies were not sufficient. Most of the research was descriptive. Little was known about how and why activities were constructed or under what conditions they were used. This might help explain the contradiction between simulated studies, which have indicated that teachers do use objectives (when probed), and naturalistic studies, which find they seldom use objectives. Brophy's critical comments also called for going beyond descriptive studies to identify aspects of quality in teacher planning and ways to conceptualize, measure and document differences in quality. The findings of the planning studies reveal remarkable agreement. The apparent lack of usefulness of objectives needs more investigation before prescriptions are made in teacher education. Most samples are small, most studies unreplicated.

Aside from the obvious newness of the field, the outstanding characteristic of the decision-making research is the general absence of outcome relationships. What are the teachers' thoughts and intentions about final achievement, given a year of instructional time from a student's life? How is the effectiveness of planned tasks evaluated? Is "maintaining activity flow" related to Gage's admonition to optimize "academic learning time"? Doyle's work is extremely pertinent to the lack of established relationship between planned tasks and outcome. He suggested that the formal task structure of the classroom is defined as an exchange of performance for grades. Becker, Geer and Hughes (1968) also contended that this "exchange of performance for grades is, formally and institutionally, what the class is about." Schellenberg (1965) described the task structure as an exchange of performance for status.
Student differential attentiveness is related to what the student perceives is important to the teacher and the exchange process (Doyle, 1975). This exchange is obviously related to the activity flow of the classroom. Nevertheless, the marking process is never mentioned in the teacher decision-making literature, although it provides a promising site from which to examine teacher thoughts about outcome.

A second characteristic of the literature is its concentration on identifying cues which account for static decisions. The research techniques concentrate on one-time products (achievement tests) or one-time decisions rather than on a series of decisions over time, a progression. Resnick describes this as a weakness of cognitive psychology in general. "Cognitive psychology has been largely concerned with describing 'moments' of performance or 'states' of understanding. It can be characterized as a 'transparent snapshot' psychology, in which mental processes are depicted at a given point in time . . . . (In contrast,) learning theorists record 'movies' rather than snapshots but continue to seek primarily overt behavior." Resnick seeks a blending of the strengths of each specialty for the purpose of building a psychology of instruction (Resnick, 1981). Resnick's concerns bear directly upon the decision-making literature, which has reflected much of cognitive psychology's development.

The decision-making literature also lacks clarity of definition. In particular, the term judgment is used loosely to cover all cases where "cues" are used in arriving at a decision. Yet some studies indicate that different strategies may be used for evaluating than for predicting, for marking than for planning. Shavelson's suggestion for a taxonomy is especially worthy in this context.
Work by Einhorn, Kleinmuntz and Kleinmuntz on linear regression and process tracing in judgment studies concluded a section by stating "in any event, more work is necessary to clearly distinguish judgment from choice and the processes that may be invoked by each" (Einhorn, 1979).

Given a similar discrepancy between increasing theoretical prescriptions for teacher effectiveness and for teacher marking, and given the limited application of these prescriptions in the classroom, an investigation into teacher mental processes during marking yielded useful insights.

Summary

This literature review has attempted to place the study of teacher judgment during the marking process into the framework of pertinent related research. The obvious conclusion from the review is that historically the functions of marks have increased over the years. However, despite research and new prescriptions, teacher responsiveness to this increase has been limited, causing a continuous dissatisfaction. A related conclusion, equally obvious, is that teacher judgment processes have been neglected despite their acknowledged role in determining marks. This study was a first step in connecting these areas and exploring this neglected site of research.

CHAPTER III

RESEARCH SETTING, PROCEDURES AND METHODS

Introduction

Conventional teacher marks, A B C D E, symbolizing pupil progress are likely to dominate official records for some time to come. This study was designed to capture the judgment processes of teachers during marking across one school year. It addressed the following research questions:

1. Upon what information is the summative mark (first, second and final) based?
2. What cognitive processes make possible the formative stages (record book categories) of marking?
3. Is there a judgmental rule which explains how the input information (formative) is transformed into the output (summative), a mark?
4.
If the judgmental rule yields a zone of uncertainty between any two preordained categories of judgment, A B C D E, what processes enable the teacher to assign a mark up or down? How and why do they work? 5. Do the identified cognitive processes farm a pattern, schema or model of the marking process? 6. Do the identified teacher cognitive processes account for the five functions ascribed to marks by society in general? 7. Of the four research methods used in this investigation, is one superior for illuminating the marking process? In addressing these questions, five elementary teachers from one Michigan school district served as research subjects. Data were gathered from each in the curriculum areas of language arts and mathematics following three marking periods across the school year. The methods of gathering and analyzing data 46 anon: 47 included: process tracing, policy capturing, attributional theory and utility theory. This chapter contains a description of the school setting along with the method of selecting teacher participants. The description is followed by an explanation of the procedures and methods used in data collection and analysis. The methods explanation is presented in the context of the judgment literature, with a rationale for the use of multimethod analysis of the self-report data which is basic to cognitive studies. Research Setting School District B, the site of this study is representative of the typical, surburban districts in Oakland County, Michigan. Its enrollment is declining, with a current pupil population of l4,500. These pupils are distributed across six secondary campuses and 2| elementary buildings. The pupils in District B come from a broad range of socioeconomic background, although ethnic mix is modest and racial mix minimal. 
Pupils in 10 of the 21 elementary buildings are served by Title I programs, indicating low socioeconomic status, while the majority of pupils in some buildings come from homes where the parents are professionals. Frequently these backgrounds are mixed in one building. Declining enrollment continues to cause mergings of these differing student populations.

Pupil performance in District B is average among Oakland County's 28 school districts. The district ranks in the middle of Oakland County's range on the Michigan Assessment Test. Performance on the California Basic Skills Test registers slightly above the national average. Pupil scores from the Differential Aptitude Test also support this average profile.

District B has a policy of building autonomy whereby principals and their staffs select their own pupil reporting system. Fourteen of the elementary schools report pupil progress, at the upper elementary levels, via traditional marks plus a checklist and comments. The remaining schools use checklists and written comments without marks. All schools have four marking periods and two parent/teacher conferences following the first and third markings.

Five principals from schools using traditional marks expressed an interest in the marking study and said they would consult their teachers regarding participation in the investigation. (Experience beyond five years in the upper elementary classrooms was the only criterion of the researcher.) The first two principals contacted by the researcher, following the initial expression of interest by the five, yielded five volunteer teachers, three men and two women, from grades four, five and six. These five teachers became the subjects of the study.

The subjects were representative of the mode of teacher tenure in the district; none had fewer than 14 years of teaching experience. It is important to note that these subjects were not selected for being the "best" teachers.
Instead they volunteered to give information to the investigator during free periods or when the principal substituted. Each teacher carried a typical class load ranging from 29 to 33 students.

Procedures

Data Collection

The structured interview was the primary source of data acquisition. Each of the five subjects was interviewed immediately following the first, second and final markings of the four periods. The interviews were taped and subsequently transcribed into protocols.

The in-depth interview is frequently criticized as a method of self-report which reflects known biases and justifications. In this study of marking, the strength of the interview is the potential to elicit personal opinions, including biases, knowledge and understandings related to one repetitive and official portion of the teacher's job. The potential is best realized by heeding expert advice on the interview.

The interview as a method is well discussed by Bussis and Chittenden (1976), Schatzman and Strauss (1973), and Pelto and Pelto (1970). Each of these authors notes the enormous literature on interviewing formats and techniques, each stresses the importance of grounding interviews in other theoretical work, and each finds the information supplied by a key informant of a given cultural environment an essential element in constructing the meaning behind behavior.

The form of interview questions is important. "For many situations, fieldworkers should devise questions concerning concrete events, behaviors and possessions, instead of asking questions involving vague generalizations," says Pelto. Bussis and Chittenden reported that "questions phrased at a high level of generalization turned out to be answerable only by abstractions and generalities too vague to be revealing of personal constructs . . . and the type of question that more readily brought out personal constructs was one posed with concrete reference to materials, to classroom practices or to children's behaviors."
Schatzman and Strauss state that the level of abstraction or concreteness is best found by becoming familiar with the culture; hence, "early interviews tend to prove less economical than later ones, mainly because the researcher has not yet fully determined precisely what information he needs . . . . Observation, brief questioning and casual conversation are so very important; they eventually provide a broad context for effective and economical interviewing."

Thus, the taped interviews in this study, conducted on-site, were based on previous insights into interview formats and focused on products of the teachers' own creation, such as record books. This allowed for prediction, reflection and open-ended responses. The interview is presented in Appendix A.

In addition to the interview, data were collected from official marks, record books and a pupil sort. Marks of all students in each class were collected in language arts and mathematics, although the teachers also marked in spelling, reading, social studies, science and art. The teachers were also asked to predict the marks for each student for the next marking period and give brief reasons why the mark was predicted to remain the same, go up or go down. The record book data were collected to cross-check teachers' verbal protocols.

Data Organization

The data collected were organized into six sections: a composite case and five individual teacher cases. The composite case includes a model of the teacher judgment processes during marking and subsections on rules, statistical analysis and protocol analysis. The five teacher cases, each of which also has subsections on rules, statistical analysis and protocol analysis, follow the composite case.

Data Analysis

The analysis of data (marks, predicted marks, record books and pupil sort) was both qualitative and quantitative. Specific analysis of marks and predicted marks involved multiple regression analysis, Pearson correlations and frequency distributions.
Transcribed interviews were coded verbatim and categorized in several ways: by the common attributional categories of ability, effort, task difficulty and home support; by elaboration of description; and by a decision tree.

Methods

Four research strategies from the field of judgment seemed especially congruent with the marking phases: process-tracing techniques to establish the validity of an overarching schema (taped interviews and content analysis of verbal protocols); policy-capturing techniques to analyze the record book system and combination rule throughout the year (multiple regression, Pearson and partial correlations and frequencies); utility analysis techniques to investigate teacher method of assessing risk related to classroom behavior (decision tree); and attributional techniques to investigate teacher method of assessing risk related to future student motivation to achieve (interview data related to record book analysis and prediction data).

Using a multimethod approach to explore teachers' grading processes allowed the broadest description of the task. An integration of approaches sought to use the strengths of each method while minimizing the weaknesses by carefully distinguishing findings which were corroborated by several methodological perspectives from those which emerged in only one field of reference. In this manner, the study attempted to recreate teachers' understandings surrounding the judgment task and to relate the task to achievement and management in each teacher's unique classroom.

Process Tracing

The emphasis in the "process-tracing" method is on building a model or schema of the cognitive rules and heuristics used by clinicians. Analysis of verbal data from taped interviews or think-aloud sessions is the basic method of identifying implicit rules and heuristics.
In contrast to the familiar linear regression technique of policy capturing which seeks to identify the judgment cues which carry the most weight, process tracing identifies many cues which may have been used in a task but which do not necessarily receive the most weight (Einhorn, 1979). In the initial search to discover the underlying processes in the marking judgment, the researcher was interested in all cues and possible patterns which may serve as an adaptive mechanism between a very complex learning environment and a simple, required response (a mark).

Judgment during the marking process suggested two subjudgment phases, one which allowed multiple categories (five marks, A B C D E, with or without pluses and minuses) and a potential second one which allowed only a choice between any two categories. The decision between multiple categories often necessitates a quantitative mode of analysis while a choice between two, especially if one is considered a reward and the other a punishment, may dictate more qualitative approaches (Einhorn, 1979).

For the purpose of inquiry into a new cognitive arena, teacher marking, it is important to note that process tracing was used to establish a path in the wilderness, a pattern or schema. In many previous judgment studies, process tracing was used to indicate that a linear model was too simple when cues were used concurrently or as trade-offs. In such studies, cues were already specified and the method sought to assess how they combined, balanced or traded off. It is in this multiplicative sense that Einhorn describes process tracing as a more detailed method of analysis than linear regression when he critiques both (Einhorn, 1979). In this study, however, process tracing was much more general than linear regression, and it attempted to pattern decisions made over a longer time frame. The intended outcome of process tracing here was to build a general model.
If process-tracing methods support a schema, then policy-capturing techniques and attributional-utility techniques could give a more detailed analysis of particular heuristics used at different phases of the task.

Verbal report data are frequently dismissed as unscientific and simply a form of introspection which is worthless for verification. Ericsson and Simon point out, however, that behaviorism and allied schools of thought have been schizophrenic about the status of verbalizations as data because the basic behavioral data in standard experimental paradigms are a signal "yes" or "no" or a choice between words which is essentially indistinguishable from verbal responses. But Ericsson and Simon also point out that researchers using verbal report data are remiss in reporting the details of how they collect and analyze verbal data and how it relates to even a rudimentary theory of cognitive processing. Ericsson and Simon are especially anxious to distinguish between concurrent verbalization, which gains information from subjects while they are attending to it, and retrospective verbalization, where the subject is asked about cognitive processes which occurred at an earlier point in time (Ericsson & Simon, 1980).

These concerns parallel the interest of researchers who have constructively criticized the in-depth interview. The strength of the interview is its potential to elicit personal opinions, knowledge and understandings. Within this study the interview has been detailed under Data Collection. In an attempt to strengthen insights gained from the interviews, multimethods of analysis have been applied and the interview process has been grounded in official records, teacher marks.

Policy Capturing

The most frequently used method of studying judgment processes is policy capturing (Shulman & Elstein, 1975; Slovic & Lichtenstein, 1971).
Beginning with a linear model, this method attempts to reproduce the inferential responses of a particular judge as he weighs and combines important factors (cues) in the task environment. Policy capturing concentrates on identifying these cues, defining their relationship to each other, and estimating their relationship to a final judgment for purposes of prediction and control.

Policy-capturing studies generally discuss alternative methods of cue specification. Five approaches are delineated and evaluated in an article by Clark, Yinger and Wildfang (1978): (1) logical specification, (2) expert opinion, (3) prespecification and narrowing of a large number of potential cues, (4) allowing cues to emerge during a judgment task and (5) participant observation in a naturalistic setting.

The expert opinion method consults experts to determine their nominations for important cues in a specific judgment task. This approach has been widely used in studies of pathologists, physicians and stockbrokers (Shulman & Elstein, 1975). But the approach is criticized because cues which may be nominated may not be the ones actually used in practice and/or because experts may weight cues very differently.

Policy capturing's reliance on linear regression has also been criticized as much too simple a depiction of human judgment with its highly contingent methods. Einhorn has repeatedly responded to this "simplistic" label by pointing out that the mathematical representation of a model should not be confused with the process which it represents. He elaborates that the essence of linear models is that they imply trade-offs thereby "not only acknowledging the existence of conflict but resolving it through compromise . . . linear models of judgment imply a sophisticated and complex cognitive strategy" (Einhorn, 1979).

Policy capturing lends itself to a study of the processes involved in teacher judgment during marking due to the specific nature of the task environment.
Cue specification in the marking task relies on the teacher as expert with continuous corroboration from official records in the grade book. Verification of judgments can come through retrospective generalizations which rely on past action that is evidenced in public record. Stenhouse proposes "the ideal that no qualitatively-based theorizing in education should be regarded as acceptable unless its argument stands or falls on the interpretation of accessible and well-cited sources so that the interpretation offered can be critically examined" (Stenhouse, 1978). A routine check of the record book quickly established how the cues were combined to arrive at the judgment (mark). A strategy of recording and predicting grades over a year's time (three marking periods) also provided statistical evidence of trends across time, correlations between marks and predictions and identification of anchoring or recency effects.

When policy-capturing studies attempt to infer causes for the choices of a judge or weights assigned to cues used in a judgment, they must guard against two common errors. Kahneman and Tversky (1973) have named these the "availability" and "representative" biases. The first infers cause from available evidence, apparent frequency or accessibility in present memory. Obviously biases in the judge's exposure, level of attention and storage can arise. The representative bias predicts future choices by reflecting on the degree to which the specified outcome represents its origins or resembles a like case. In drawing inferences, the decision heuristic of which class or category an event represents is an essential tool which must be understood to avoid biases such as the familiar "Gambler's Fallacy," which leads one to conclude, after observing a long run of red on a roulette wheel, that black is now due because it has been underrepresented in a chance process up to now.
Preliminary examination of teachers' record books indicates an intent on the part of teachers to "be fair" and to eliminate the common biases referred to by Kahneman and Tversky. (Sample record book in Chapter IV.) The grid of the record book becomes an inferential teacher tool to arrive at a baseline figure of performance within a class population (vertical column) balanced against a baseline figure for individual performance (horizontal). The baseline for whole class performance is related to the assignment, which is considered to be indicative of similar (representative, normed) grade level performances across the nation according to the textbook publishers, and to the collected performances (consensus) of that particular class, shown by collected samples of work graded on a point basis. The baseline data for a given student are the collection of numerous work samples of the same student over time. The record book, therefore, is used as a scientific data base.

The policy-capturing phase of this study analyzed this data base by multiple regression methods in order to capture some potential underlying policies of the teacher. The information obtained led to an understanding of "what" work samples a teacher accumulates and "how" the specific marking judgment is made.

Utility Analysis

Although policy capturing appears well suited to the first phase of marking, it does not shed sufficient light upon the process if the combination rule does not yield a perfect preordained category such as A B C D E. What process operates when a student's work falls into a zone of uncertainty midway between two marks? How does the teacher decide to go to the higher or lower mark?

Utility analysis is a systematic approach to decision making under conditions of uncertainty and risk.
It forces the decision-maker to separate the logical structure of the decision problem into its component parts to be analyzed individually and recombined systematically (Weinstein, Fineberg, Elstein, Frazier, Newhauser, & Neutra, 1980). Alternative actions must be clearly described, and the question of the need for additional information recurs at each branch of the decision tree. Alternative outcomes, assessment of values and possible trade-offs are crucial to decision analysis and involve utility decisions.

The decision tree is the fundamental analytic tool for utility analysis. It requires the decision-maker to identify alternative actions which might be taken at different times and to obtain information at these times. An optimal course of action is desired. The building blocks of the decision tree are:

1. Choice nodes, at which one or two alternative actions may be explored.
2. Chance nodes, at which the status of the student is revealed or test information or other outside information becomes available.
3. Outcomes, which describe what happens to a student along each path of events in terms of attributes held to be of value (i.e., classroom cooperation and concentration on task).

"By convention, a decision tree is built from left to right, with choice nodes represented by squares and chance nodes by circles and with outcomes specified at the right hand 'tips' of the tree" (Weinstein, Fineberg, et al., p. 16). A decision tree for the marking process may be found in Chapter IV, Figure 4.4.

The reasoning process behind utility analysis can be summarized in four major stages:

1. Data acquisition, in which information is obtained by the teacher using a variety of methods (tests, homework, oral participation in class, observed behavior during class sessions).
2. Hypothesis generation, in which alternative problem formulations are retrieved from memory.
3. Data interpretation, in which the data are interpreted in light of the alternative hypotheses under consideration.
4. Hypothesis evaluation, in which the data are used to determine if one of the diagnostic hypotheses already generated can be confirmed. If not, the problem must be recycled: new hypotheses are generated and additional data are collected until one of the hypotheses is verified (Weinstein, Fineberg, et al.).

Weinstein et al. point out that two modes of clinical inference are observed in the process; one is diagnostic and the other therapeutic. In the case of teacher marking judgments, both are concerned with sustaining or increasing a student's motivation and cooperation as indicated by work production, class participation and lack of problem behavior. The decision-analysis process appears well suited to investigate a teacher's choice of giving a higher or lower grade, and each marking period gives the teacher a new chance to assess the hypothesis by examining the record book and reflecting on class behavior.

Attribution Theory

Attribution theory is concerned with how a person infers or attributes cause to an event and what happens once he does. Attributional research goes beyond a description of what factors are in evidence during judgment to an explanation of why and how the factors are combined into a decision. Two major theoretical approaches to attribution are appropriate to the marking decision: the covariation principle of Kelley and the dimensional approach of Weiner.

Kelley (1971) proposed that people determine the cause of behavior by examining the covariation between the effects and possible causes of the behavior and then attributing the effect to the cause or causes with which it covaries. Covariation of an effect over time is called "consistency." The extent to which an effect is localized across situations is called "distinctiveness" and the extent to which an effect is found across people is called "consensus."
The application of Kelley's covariation principle to teachers' marking judgments emphasizes the extent to which a student's effort and ability are judged to be within his own responsibility and control. It offers specifications of data about a student's effort (work samples over time), ability (test results) and contingency factors (parental support/pressure, attendance/health) which can lead to the assignment of high or low marks. The effect of consistency is that the more often a student turns in work, the more likely he is to be judged responsible for learning the work. The effect of distinctiveness is that the more the student misses work assignments only in restricted circumstances, the less he may be held responsible for the failure (i.e., when ill or when evidence of parental problems is overwhelming, such as death, divorce or abuse). The effect of consensus is that the absence of work samples and reasonable test results, when associated with few students in a given class, is judged the responsibility of the student more than when associated with many students in the particular class, which may indicate that external factors such as task difficulty are involved.

The role of consensus or base-rate information is controversial in many fields. Its usefulness in attributing cause has been detailed in the research of Kahneman and Tversky (1973) and Nisbett and Ross (1980). In the marking process, it is important to note that the consensus of one classroom may differ from another (especially if there is a tracking system) and from one school environment to another (especially if there are large socioeconomic differences). This area is well discussed in the teacher expectation literature and will not be delineated further here except to note that for an individual teacher, the consensus of a given elementary, self-contained classroom is primarily limited to the boundary of that classroom system or culture.
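The three covariation effects described above can be read as a small decision rule. The following Python sketch is an illustration only, not a procedure used in the study: the record fields, the category labels and both cutoff values are hypothetical and were chosen simply to make the logic concrete.

```python
from dataclasses import dataclass


@dataclass
class MissingWorkRecord:
    """Hypothetical summary of one student's missing work in a marking period."""
    student_miss_rate: float             # consistency: share of this student's assignments missing
    only_in_special_circumstances: bool  # distinctiveness: misses confined to illness, family crisis, etc.
    class_miss_rate: float               # consensus: share of classmates missing the same work


def attribute_cause(record, consensus_cutoff=0.5, consistency_cutoff=0.3):
    """Return a rough causal label following the covariation effects in the text.

    Both cutoffs are assumptions made for illustration, not values from the study.
    """
    # Consensus: when many classmates also miss the work, an external
    # factor such as task difficulty is the more likely cause.
    if record.class_miss_rate >= consensus_cutoff:
        return "task"
    # Distinctiveness: misses confined to restricted circumstances
    # reduce the responsibility assigned to the student.
    if record.only_in_special_circumstances:
        return "circumstances"
    # Consistency: repeated misses over time, not shared by the class,
    # point to the student's own effort.
    if record.student_miss_rate >= consistency_cutoff:
        return "student"
    return "undetermined"


print(attribute_cause(MissingWorkRecord(0.6, False, 0.1)))
```

The ordering of the checks is itself an assumption: consensus is tested first so that a difficult task shared by the whole class is not charged to an individual student.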
Weiner's (1979) dimensional theoretical approach, developing from concepts of Heider (1958) and Kelley (1967), provides evidence that attribution judgments for success and failure are divisible into three dimensions: stability, locus and control. Stability considers whether the causes of behavior are stable or variable over time and contrasts Heider's notions of relatively fixed characteristics such as ability and typical effort with fluctuating factors such as immediate effort, attention and mood. Locus considers whether causes are within (internal) or outside (external) the person, and from the perspective of achievement, internal causes may include ability, effort, maturity, etc., while external causes may include classroom environment, task and family. Control considers the degree to which the individual is responsible for the present event, hence, the degree of control of future change.

Weiner proposes that these three dimensions create a meaningful grid on which to assess the myriad causes of achievement events. Furthermore, although numerous perceived causes of success and failure are discussed in the literature (Cooper & Burger, 1978), there is rather a small list from which the main causes are repeatedly selected. Within this limited list, ability and effort appear to be the most salient causes.

Weiner contends that each of the three dimensions of causality has a primary psychological function or a linkage. The primary relation of the stability dimension is to the magnitude of expectancy change following success or failure. The locus dimension has implications for self-esteem and the perceived control dimension relates to helping, evaluating and liking by others.

Stability. Attributional research in regard to stability has remained remarkably consistent since 1971. "Empirical evidence . . . has proven definitively that causal ascriptions for past performance are an important determinant of goal expectancies" (Weiner, 1979, p. 9).
Failure ascribed to low ability or to task difficulty decreases the expectation of future success more than failure ascribed to bad luck, mood or a lack of immediate effort (Weiner et al., 1976). In 1976 Weiner et al. carried out an investigation which clearly discriminated between the dimensions of stability and locus, and proved that expectancy changes are related to stability and are not associated with locus as was previously postulated in substantial competing literature. Weiner concluded that "the literature associating stability with expectancy change is unequivocal, and the findings generalize outside of the laboratory as well as beyond achievement domain" (Weiner, 1979, p. 10).

A comparison of Kelley's covariation principle and Weiner's dimensions points to considerable overlap in categories such as consistency and stability, distinctiveness and locus; however, Kelley's theory is primarily concerned with the historical record while Weiner's is primarily concerned with future control. Together they shed considerable light on the cognitive process used in the teacher's marking task.

Locus. The concept of locus is not as firmly established in research. Weiner himself has corrected an earlier position which presumed an invariant positive relation between internality and maximum emotional reaction (Weiner, 1971, 1977 and 1978). A series of studies was initiated to determine the relation between attribution and affect in which subjects responded to a series of scenarios depicting success or failure experiences. About 100 affects for success and 150 for failure were provided with a rating scale for intensity. The findings indicated a two-phase response: the immediate and intense affect was related to outcome regardless of "why." That is, success resulted in obvious happiness, and failure revealed displeasure and upset. But attribution linkages were also present.
Given success, the linkages were: ability - competence and confidence; typical effort - relaxation; immediate effort - activation; others - gratitude; personality - pride; and luck - surprise, relief and guilt. For failure, the attribution linkages generally were: ability - incompetence; effort - guilt and shame; personality - resignation; others - anger and aggression; and luck - surprise. In the long run, central self-esteem emotions that facilitate or impede subsequent achievement may be more affected by the attributional linkages than by the immediate affect of feeling good or bad.

Control. The control dimension concerns inferences about the intentions of others rather than causes. This dimension is perhaps more meaningfully related to the teacher's role of helping, evaluating and liking (sentiment). The majority of investigations into helping behavior concluded that help is more likely when the perceived cause of the need is an environmental barrier as opposed to being internal to the person desirous of aid (Berkowitz, 1969; Piliavin, Rodin, & Piliavin, 1969). Piliavin found that when a failure is perceived as controllable, then help is withheld under the assumption that the person should help himself. Hence a drunk person is less likely to receive aid than an ill person since illness is considered uncontrollable.

One experiment on helping concerned altruism in the classroom: lending class notes to an unknown classmate (Barnes, Ickes, & Kidd, 1977; Weiner, 1979). In this experiment, two causal themes for a student's lack of notes were contrasted in eight possible combinations. In one theme the student always (stable) or sometimes (unstable) did not take notes because of something about himself (internal) or something about the professor (external). The student was unable to take good notes (uncontrollable) or did not try (controllable).
Following each causal statement, subjects responded on a 10-point scale, ranging from "definitely would lend notes" to "definitely would not lend notes." The results indicated that help is reasonably extended in all combinations except when the cause is internal and controllable, such as if the student did not try to take notes or, in the second theme, if the student could have avoided being absent.

Data from investigations into evaluation "conclusively demonstrated that effort is of greater importance than ability in determining reward and punishment. High effort was rewarded more than high ability given success, and lack of effort was punished more than lack of ability given failure" (Weiner, 1979). Weiner explains this discrepancy between ability and effort by referring to moral feelings of "ought to do" and by the feeling that reward and punishment could actually change behavior in the future. "That is, there is a pervasive influence of perceived controllability or personal responsibility . . . in achievement related contexts, including how students are graded" (Weiner, 1979, p. 17).

Attribution analysis involves a schematic model of the attributional process which is clarified by further investigation of a specific set of cues and related by the person to a possible set of outcomes. "In a typical study, the subject is provided with a number of informational cues in all possible combinations and asked to rate why each event described by a particular cue combination may have occurred. ANOVA models have frequently been used for data analysis, and interaction effects have been equated with configurality in cue usage. Although these studies have used a wide variety of cues, situations, and attribution rating scales, results have been surprisingly consistent" (Carroll, Payne, Frieze, 1976).
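The "all possible combinations" design can be made concrete with the note-lending experiment, in which three two-level cues yield eight causal statements. The Python sketch below is a minimal illustration; the function names are hypothetical, and the helping rule encodes only the reported pattern (help extended except when the cause is internal and controllable), not the subjects' actual ratings.

```python
from itertools import product

# The three attributional dimensions, each with two levels, as described in the text.
STABILITY = ("stable", "unstable")
LOCUS = ("internal", "external")
CONTROL = ("controllable", "uncontrollable")


def cue_combinations():
    """Enumerate every stability x locus x control combination (2 x 2 x 2 = 8)."""
    return list(product(STABILITY, LOCUS, CONTROL))


def help_expected(locus, control):
    """Reported pattern: help is withheld only for internal, controllable causes."""
    return not (locus == "internal" and control == "controllable")


combos = cue_combinations()
helped = [c for c in combos if help_expected(c[1], c[2])]

for stability, locus, control in combos:
    verdict = "lend notes" if help_expected(locus, control) else "withhold"
    print(f"{stability:9} {locus:9} {control:15} {verdict}")
```

Note that stability plays no part in the rule; that is a simplification of the reported result, since the text describes the exception in terms of locus and control alone.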
The attribution and decision-making framework of the current study attempts to elaborate one phase of the overall marking process, the phase wherein the teacher assesses the risk of obtaining more effort (homework and tests) and more achievement (tests) by giving a higher or lower mark at a marking period. Through interviews which call for prediction of future marks, accompanied by the factors (cues) influential to such a prediction, and the eventual outcome of the prediction (the grade), this investigator attempted to flesh out a more complete schema of the marking process.

Summary

This investigation into teacher judgment during the marking process attempted to answer seven research questions. Data were gathered through in-depth interviews with five elementary teachers from one Michigan school district across one school year. Interview data were grounded in the teachers' record books and the official marks of 152 students. A multimethod approach was used in order to provide both quantitative and qualitative analysis. The four established judgment approaches (process tracing, policy capturing, utility analysis and attribution theory) allowed for cross-checking and corroboration of evidence, thereby lessening the risks of total reliance on self-report data. The evidence gathered was organized into five teacher cases and one composite model of the teacher marking judgment.

CHAPTER IV

FINDINGS

Introduction

The major purpose of this study was to create an understanding of the judgment processes which engage teachers during marking across a school year. Four established judgment methods were used to uncover the process: process tracing, policy capturing, attribution and utility analysis. The basic data were collected through taped and transcribed interviews. A unique aspect of this multimethod analysis was that the interviews were grounded in official public records: the teacher record book and summative marks on report cards.
All interviews were done with the record book at hand. The public aspect of marks added an extra dimension of objectivity to an otherwise subjective process.

In this chapter, the findings are organized to display the data which teachers considered during the marking task, to make inferences from the data and to ascertain whether or not (1) the data indicate a coherent and meaningful process which suggests a model and (2) the data answer the major research questions which guided the study (page 11). The findings are presented as six cases, beginning with a composite case which serves as an organizer, followed by five teacher cases. Each case is further divided into four sections: marking rules, statistical analysis, verbal analysis and a summary. An integrated summary concludes the chapter.

The composite case format was chosen to serve as an organizer, setting a pattern for the individual cases. Basic data are displayed in each teacher case and inferences are made. In order to avoid lengthy repetition of common discussion points within each teacher case, the inferences are more extensive within the composite case. Discussion within each teacher case will then refer back to the composite case, noting points of difference. This format results in a detailed composite case followed by more concise individual cases, although it must be kept in mind that the data were originally collected on an individual basis and later combined to create the composite model.

Composite Case

The composite case depicts factors which the five teachers have in common. These commonalities were derived through computation of the marking data of all five teachers using multiple regression, correlations, and frequencies (Nie, SPSS, 1980) and through content analysis which distilled common rules, categorized and coded attributes and utilities and identified key descriptions from the interviews.
Rules

The process-tracing phase identified two sets of rules which guide the marking judgment: procedural and contingency. The two types of rules dealt with different aspects of the judgment process.

From the point of view of information processing, procedural rules were concerned with selection and simplification. These rules set up a linear, routine record book system, selected tasks for inclusion and accounted for academic standards and precision measurement for marks on tasks. Procedural rules were product and time based and lent themselves to statistical analysis.

Contingency rules were those which determined judgment in uncertainty and exception. From the point of view of information processing, contingency rules were concerned with inferential processes which go beyond the data. These rules were essentially involved with factors which promoted (1) stable individual task completion over a year's time, and (2) a stable classroom environment for on-task behavior or class flow over a school year. Contingency rules involved teachers in an assessment of motivational factors for each student, including ability, effort, home support, classroom behavior and task difficulty. Hence these rules were motivation and behavior related and lent themselves to verbal analysis.

The rules, distilled from transcribed interviews, highlighted these two major aspects: one of routine judgment procedure and one of contingency judgment strategies.

Procedural rules:

1. Teachers assumed that completed tasks resulted in learning (implicit, not stated).
2. Teachers assigned tasks and gathered marking data regularly in a record book.
3. Teachers accounted for task completion at a given level of difficulty with a check system and for task completion at a given standard of mastery by a mark.
4. Teachers gathered marks from a sufficient variety of tasks (tests, written projects, exercises) to satisfy their criteria for validity. None had fewer than six formative marks. Four had more than ten.
5. Teachers had individual theories about weighting some tasks (tests vs. homework) more heavily than others.
6. Teachers had individual systems for transforming points representing standard criteria on a written paper into A B C marks.
7. Teachers had a combination rule for transforming formative marks into summary marks. They added all task marks across and divided by the total number of assigned tasks (arithmetic mean). This was corroborated by an analysis of each record book in math and language arts.

Contingency rules:

1. Teachers ranked effort as related to ability as a prime criterion for going up or down. Effort was judged by regular work and extra work (record books and attribution chart).
2. Teachers had strategies to apply if the work fell midway between two marks.
3. Teachers had individual strategies for marks which fell below C (frequencies and quotations).

Procedural rules resulted in a record book system which operated as a statistical tool to help overcome many of the common errors of human judgment which were discussed in the preceding chapter. An analysis of the record book showed the teachers' intent to account for a base rate of work for the nation (the assignments adjusted to grade level on nationally normed verbal information, i.e., textbooks), for the classroom (the vertical column of any given assignment) and for the individual student (the horizontal row). Hence teacher record books were inferential tools that depicted student achievement in comparison to individual ability, class (group) and nation. Initially the teacher used only the record book to compute the mark into a preordained category of A B C D E. However, when the work fell into a zone of uncertainty between two grades or when it fell into the D or E category, contingency rules were put into operation. A statistical analysis of the marks (152 students) which resulted from the procedural rules follows.
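The two-stage logic just described (an arithmetic-mean combination rule, followed by contingency rules only in a zone of uncertainty or below C) can be sketched as follows. This is a minimal illustration, not the teachers' actual procedure: it borrows the 13-point coding of marks that this chapter uses for its statistical analysis, and the category floors, the margin that defines a "zone of uncertainty" and the function name summative_mark are all assumptions introduced here.

```python
from statistics import mean

# Numeric coding of marks, as used for the statistical analysis in this chapter.
MARK_VALUES = {
    "A+": 13, "A": 12, "A-": 11,
    "B+": 10, "B": 9, "B-": 8,
    "C+": 7, "C": 6, "C-": 5,
    "D+": 4, "D": 3, "D-": 2,
    "E": 1, "Incomplete": 0,
}

# ASSUMPTION: lower bounds of the preordained categories, placed midway
# between adjacent letter bands on the numeric scale.
CATEGORY_FLOORS = (("A", 10.5), ("B", 7.5), ("C", 4.5), ("D", 1.5))


def summative_mark(formative_marks, margin=0.5):
    """Return (mean score, preordained category, contingency flag)."""
    # Procedural combination rule: arithmetic mean over all assigned tasks,
    # so an incomplete task counts as zero.
    score = mean(MARK_VALUES[m] for m in formative_marks)

    category = "E"
    for letter, floor in CATEGORY_FLOORS:
        if score >= floor:
            category = letter
            break

    # ASSUMPTION: a "zone of uncertainty" means within `margin` of a floor.
    uncertain = any(abs(score - floor) <= margin for _, floor in CATEGORY_FLOORS)
    # Contingency rules come into play in a zone of uncertainty or below C.
    needs_contingency = uncertain or category in ("D", "E")
    return score, category, needs_contingency


print(summative_mark(["A", "A", "A-", "A"]))     # clear case, procedural only
print(summative_mark(["B+", "A-", "B+", "A-"]))  # on the border between A and B
print(summative_mark(["D", "D", "D+"]))          # below C
```

In the sketch the flag only marks where teacher judgment would take over; what the teacher then decides (effort relative to ability, home support and so on) is deliberately left outside the code, since the study treats those factors as inferential rather than arithmetic.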
Statistical Analysis

The statistical methods involved multiple regression analysis, Pearson and partial correlations, frequency counts and cross tabulations. Marking data were entered in the computer under language arts and mathematics. The following symbols were used in all charts contained in this technical report: L1 representing the first mark in language, L2 the prediction of the second language mark, L3 the second mark in language, L4 the prediction of the final language mark, L5 the final mark in language; M1 representing the first mark in math, M2 the prediction of the second math mark, M3 the second mark in math, M4 the prediction of the final math mark, M5 the final mark in math.

For computation purposes, the summative marks, the predicted marks and the final marks in both language arts and mathematics were arithmetically valued and entered in the computer as follows:

A+ = 13    B+ = 10    C+ = 7    D+ = 4    E = 1
A  = 12    B  =  9    C  = 6    D  = 3    Incomplete = 0
A- = 11    B- =  8    C- = 5    D- = 2

These values were used to derive all statistical factors which are found within the figures and tables of this study. Their role is particularly evident in the composite teacher policy model, Figure 4.2.

Across the year. Multiple regression is a general statistical technique for analyzing the relationship between a dependent variable and a set of predictor variables. In this case the researcher was interested in the extent to which the final summative mark in a subject is a function of preceding marks within the same year. The most important use of the regression technique is to describe the best linear prediction equation. Using multiple regression analysis, composite teacher marking policies (equations) for language and mathematics were captured:

L5 (final language mark) = .42 L3 + .30 L4 + .16 L2 + .15 L1 - Constant
M5 (final math mark) = .39 M4 + .34 M3 + .23 M1 - Constant

There is some inconsistency between the policy equations.
In the language policy, the greatest predictor of the final mark is the second mark (Beta weight = .42). The second greatest predictor is the final prediction (Beta = .30) and the poorest predictor is the first mark (Beta = .15). In the math equation the final prediction is the greatest predictor of the final mark (Beta = .39) and the second mark (.34) is the second greatest predictor.

All composite regressions were plotted. The basic purpose of a plot is to give a pictorial representation of the relationship between any two variables and to indicate the confidence interval of the regression line. The relationship between the first and last marks in language and math is depicted here. All other regression plots are in Appendix D.

Table 4.1
Relationship Between the Final Mark and Predictor Marks
Teacher N = 5, Student N = 152

Dependent variable: L5, final language mark
Variable                   Beta    Sig.
Final prediction (L4)      .30     .015
Second marking (L3)        .42     [illegible]
Second prediction (L2)     .16     [illegible]
First marking (L1)         .15     .064
Standard deviation = 1.13

Dependent variable: M5, final math mark
Variable                   Beta    Sig.
Final prediction (M4)      .39     .000
Second marking (M3)        .34     .000
Second prediction (M2)     --      [illegible]
First marking (M1)         .23     [illegible]
Standard deviation = 1.27

[The F values for each variable, the overall F, multiple R and R square are not legible.]

Note. Multiple regression analysis was used with the marks of 152 upper-elementary students to describe the best linear prediction equation. It is important to notice that not all Beta weights are significant in the equation.

[Figure 4.1: scatterplots of the final mark against the first mark, with regression line and confidence interval.]

Note. The relationship between the first and final marks is graphically presented.
Of importance is the depiction of the confidence interval surrounding the regression line.

Figure 4.1. Plotted relationship between first and final marks.

The possible inference that the final mark is largely a function of the second mark and the final prediction is supported by the Pearson correlations. A Pearson correlation is a statistical technique which provides a single number to summarize the relationship between two variables. In the case of marks, the researcher was interested in the relationship between the final mark and preceding marks and predictions. The correlational charts included here indicate that the final composite language and math mark is more closely correlated to the actual second mark (.88) and the final predicted mark (.88) than to the first mark (.82).

Table 4.2
Correlations Between Marks and Predicted Marks Across a Year
Teacher N = 5, Student N = 152

[Two correlation matrices, one for the language arts variables L1 to L5 and one for the mathematics variables M1 to M5. The individual entries are not legible; the values cited in the text are .88 between the final mark and the second mark, .88 between the final mark and the final prediction, and .82 between the final mark and the first mark.]

Note. The Pearson correlations were derived from the summative marks (first, second and final) of elementary teachers across the school year.

Before drawing final inferences from the regression and correlational analysis, the policy equation must be adjusted by the evidence derived from partial correlations. Partial correlations allow the researcher to describe the relationship between two variables while adjusting for the effects of one or more additional variables. In this case the researcher was interested in double-checking the relationship between the predicted final mark (L4 and M4) and the actual final mark (L5 and M5) while controlling for other previous marks.
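A small simulation may help show why a high zero-order correlation between a prediction and a final mark can shrink sharply once an earlier mark is controlled. The sketch below uses synthetic data only, not the study's marks; the helper names pearson and partial_r, the seed and the noise levels are assumptions chosen for illustration, and the first-order partial correlation formula is the standard textbook one.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# SYNTHETIC DATA: the prediction and the final mark both depend only on
# the second mark, plus independent noise.
rng = random.Random(0)
second = [rng.gauss(9, 2) for _ in range(152)]
prediction = [s + rng.gauss(0, 0.8) for s in second]
final = [s + rng.gauss(0, 0.8) for s in second]

zero_order = pearson(prediction, final)
controlled = partial_r(prediction, final, second)
print(round(zero_order, 2), round(controlled, 2))
```

In this artificial case the prediction carries no information about the final mark beyond what the second mark already holds, so the controlled value falls toward zero while the zero-order value stays high. A partial correlation that remains clearly above zero would instead suggest that the prediction adds something of its own.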
Specifically, when controlling for L1, L2 and L3, the correlation between the final prediction and the final mark is reduced from .88 to .20, a reduction which is also present in mathematics. The relationship between the prediction of the second mark and the actual second mark (L2, L3), when controlling for the first mark (L1), equals .26, and in math it equals .31.

Table 4.3
Controlled Relationship Between the Final Prediction and the Final Mark
Teacher N = 5, Student N = 152

[Path diagrams for language arts and mathematics; the coefficients are not legible.]

Legend. L1 - First Actual Mark; L2 - Second Prediction; L3 - Second Actual Mark; L4 - Final Prediction; L5 - Final Mark. M1 - First Actual Mark; M2 - Second Prediction; M3 - Second Actual Mark; M4 - Final Prediction; M5 - Final Mark.

Note. Partial correlations were derived from the marks of 152 upper elementary students across the year.

Through statistical analysis, the following conclusions appear. All marks are highly correlated and significant (p = .001). The final mark is most highly correlated to the second mark, which is primarily correlated to the first. Correlations between actual marks are greater than those between predictions and marks, although it is obvious that predictions become more closely correlated with actual marks as the year progresses. The following judgment model emerges:

[Figure 4.2: path diagrams for language arts (L1 to L5, with predicted L2 and predicted L4) and mathematics (M1 to M5, with predicted M2 and predicted M4). One legible annotation reads r(L4 L5 controlling for L1 L3) = .24; the remaining path values are not legible.]

Note. This judgment policy or model was derived from the statistical analysis portion of the judgment study (multiple regression, Pearson correlation and partial correlations) and is based on the summative and predicted marks of 152 students across a school year.

Figure 4.2.
Composite marking policy (with predictions).

The judgment model is corroborated by the pattern of the bar graph (frequencies), which illustrates that the average marks across the year are generally slightly lower than predictions.

Table 4.4
Composite Pattern of Marking Averages Across a Year
Teacher N = 5, Student N = 152

[Bar graph of the teachers' average marks and predictions across the year.]

Note. Bar graph derived from teachers' average marks and predictions over a year's period.

At this point in the analysis, using statistical techniques, three competing hypotheses are possible: a recency effect, a primacy effect and an averaging effect. The original regression equation suggested that the marking policy was most heavily influenced by a recent mark or prediction (a recency effect). This hypothesis represents a phenomenon noticed in many theories of error in human judgment in which thinking is determined by the most recently available information in the memory structure. However, this conclusion is drawn into question by the partial correlations. There is evidence in the partial correlations for the hypothesis that the final mark is "anchored" in the first mark. The anchoring heuristic suggests that teachers determine a student's marks early in the year, and adjustments thereafter are biased toward the initial values. In the judgment literature, this anchoring phenomenon is a typical error of laymen (Nisbett & Ross, 1981; Kahneman & Tversky, 1974), and it was found to be typical of teacher expectancy/behavior patterns in repeated classroom interaction studies (Brophy & Good, 1970). Therefore, the inference that the last mark is anchored in the first mark is not only present in the data but established in the literature.

These two hypotheses offer explanations at both extremes of the spectrum. A third logical inference is possible and is not eliminated by the data.
The final mark may be neither anchored in the first mark nor determined by the most recent mark; in fact, all three may be grounded in the teachers' record books. This inference is not apparent in either the regression equation or the partial correlations. However, regression and correlational analysis rely on arithmetic abstractions which may miss some important subtleties of the marking judgment. A more detailed examination of the record book and the frequencies of marks within each marking period was undertaken.

Within marking periods. An analysis of the teacher record books indicated that each marking period stood on its own formative marks. Teachers regularly assigned tasks and indicated completion in the record book. They stated that they followed a combination rule which involves an arithmetic mean. That is, either by hand calculation or calculator, teachers added all formative task marks and divided by the total number of marks. Departures from this rule resulted under two circumstances: when the formative work fell between two preordained categories and when the work fell below C. Corroboration with records indicated that this rule was generally followed. A formal statistical analysis of the formative marks within the teacher record book was not undertaken. Instead, the data from the interview were corroborated by collecting the record books, photocopying all language and math marks, and cross-checking.

[Figure 4.3: sample page of a record book. The legible label reads "Class Consensus Record - Vertical."]

Note. Assignments are frequently taken from texts and workbooks written for a national grade level norm.

Figure 4.3. Sample record book account.

This corroboration process was combined with an analysis of the extent of pluses and minuses distributed throughout the year.

Table 4.5
Distribution of Marks Across Three Marking Periods - Language and Math

[Six composite panels: language marking periods first (L1), second (L3) and final (L5), and math marking periods first (M1), second (M3) and final (M5). Each panel reports the percentage of marks carrying minuses and pluses, the percentage of minuses, the percentage of pluses, the percentage of these related to A/B, the percentage of marks in preordained categories and the incompletes. The individual entries are not legible.]

An analysis of frequencies of marks, including pluses and minuses, indicated that there are patterns within marking periods. Before discussing inferences, it must be noted that two teachers did not give minuses or pluses on report cards; however, they did use them in the record book or they had a schema for extra effort (see Teacher Three and Teacher Four). The majority of marks within each marking period were assigned to the preordained categories, with only a minority invoking contingency rules. Preordained categories predominated heavily in the first marking. Contingency rules operated more regularly after the first marking, with minuses given more often than pluses. Minuses and pluses were primarily used as a contingency measure to qualify As and Bs. During the first language marking, only 2.7% were C or below; second marking only 3.7%; final marking only 2.1%. Only one D+ was given in language and math combined, with two Ds.
The inference may be drawn that there is more subtlety involved with marking at the upper level and that the rate of change in minuses and pluses does not show up clearly in the arithmetic abstractions of multiple regression and correlations. Contingency rules operate differently below C than above C, indicating that Ds and Es carry an altogether different connotation to teachers than marks which are average and above.

The greatest evidence for the influence of the formative tasks of a marking period in determining the summative mark was the official existence of the record book, the listing and marking of tasks completed, and the integrity of the math calculations. This was further supported by the fluctuations between marking periods. It became evident that each marking period was self-contained despite potential hypotheses to the contrary. From the policy-capturing perspective, the marking judgment in the majority of cases was procedural and linear, albeit representative of only one phase of the process. Although the record book analysis lends more detail to the overall understanding of teachers' judgment processes, neither the summative nor the formative level of analysis reveals the judgment factors operating within the contingency rules or answers the question of why teachers do what they do. Other methods of inquiry were necessary to expose these processes.

Verbal Analysis

Verbal analysis of protocols, like the statistical analysis, was put into perspective by the marking rules which emerged through process tracing. This part of the study was particularly concerned with identifying the judgment factors underlying the contingency rules. Contingency rules are related to uncertain zones midway between marks and to cases of failure or near failure. Exposing the judgment cues involved various methods of establishing and categorizing teacher concerns.
The interview process not only recorded marks of 152 students, but asked teachers to predict the next marking and to discuss the factors which influenced the prediction. The common attributional categories were discussed in some detail in Chapter III. They are ability, effort, task difficulty and luck. In coding verbatim responses, a category of miscellaneous existed for one-time events. Early in the process, luck was replaced by an emergent home support category, and miscellaneous was replaced by an emergent category on class behavior and physical maturity. The latter is more aligned with utility and maintenance of class flow or on-task behavior. Teacher statements were counted and turned into percentages. A copy of the coding device may be found in Appendix C. The categories follow in Table 4.6.

Table 4.6
Teacher Attribution-Utility Categories
Upper Elementary Students

[Only the legible entries of the coding table are listed; "[...]" marks entries that cannot be read.]

ABILITY (ACHIEVEMENT)
  Concept of average: [...], below grade level.
  Concept of bright: abnormally top-notch student, brightest kid in class, slow.
  Concept of achievement: overachiever, high/low.
  Concept of grades: A, B, C student, [...].
  Special education student.
  Total ability.

EFFORT (MOTIVATION)
  In class: attending, concentrating, wasting time, laziness, lack of discipline, [...].
  Out of class: conscientious, [...] study/work habits, does extra work, more than is asked for, makes up all assignments, works ahead. Total.
  General: overachieving, underachieving, unstable effort, competitive, [...], needs to be prodded constantly, disorganized.
  Total effort.

TASK DIFFICULTY
  Skills: multiplication not mastered, division is often difficult, [...], addition and subtraction not mastered, may dip as concepts become more difficult. Total.
  Textbook level: reading above grade level, reading at grade level, reading a couple of grades below level, social studies book is difficult, social studies tests are hard. Total.
  General: [...], trouble concentrating, better in language, better in math, [...].
  Total task difficulty.

HOME SUPPORT
  Supportive: parents very responsive, father especially responsive, mother especially responsive, [...], aunt and uncle who really care, [...]. Total supportive.
  Problematic (often leading to poor study habits): recent remarriage, language problems (second language), single parent seldom home, both parents working, too tired for discipline, elderly parents without much energy, father left the home, [...]. Total problematic.
  Unsupportive: [...], absence or tardiness excessive without illness or excuse, punitive, ridiculous penalties. Total unsupportive.
  Total home support.

CLASSROOM BEHAVIOR/PHYSICAL MATURITY
  Physical: growth spurt, growing rapidly, very large, big for age, small for age, [...], puberty, [...]. Total physical.
  Social: [...], likes to visit, flighty, [...]. Total social.
  Emotional: emotional problems, personal problems, very, very sensitive, constantly worries, [...], likes to please others, likes to please me (teacher), yells out answers, lacks control, nervous problems. Total emotional.
  Total behavior.
Table 4.7
Composite Attribution-Utility Count Percentage
All Teachers, Students = 152, First Marking

Comments per category over class size, one column per teacher (class sizes 29, 28, 31, 33 and 31):

ABILITY (Achievement)                     27/29   21/28   21/31   21/33   27/31
EFFORT (Motivation)                       26/29    9/28   24/31   10/33   21/31
HOME SUPPORT                              17/29    3/28   14/31    8/33    9/31
CLASSROOM BEHAVIOR + PHYSICAL MATURITY     7/29   15/28    6/31   12/33    9/31
TASK DIFFICULTY                             --      --    10/31     --     9/31

[Right side of the table: bar graph of the total percentage within each category for all teachers, on a scale of 10 to 100.]

Note. The left side of the table displays the actual count of comments made by each teacher within each category against the total class size and of all teachers against the total 152 students. The right side of the table displays the total percentage within each category of all teachers.

All teacher comments pertaining to individual pupils' marks were coded. Once the categories of home support and classroom behavior were identified, no additional miscellaneous category emerged despite the efforts of the researcher to locate unique concerns which did not fit a framework.

Task difficulty did not emerge as a major concern for most teachers and needs further explanation. Teacher problems with task difficulty were frequently solved by their use of individualized programs, grade level reading materials or special education support. Hence typical comments were:

She struggles to maintain Bs. She's sort of a C+, B- kid (coded under effort and ability).

She has learning problems, she's just plain slow, low I.Q. She just has to have everything taught on her level or she doesn't get it. If she were in a traditional classroom where the teacher was teaching all fourth-grade level, she would be in big trouble, but I individualize in basic skill subjects (coded under ability).
She has an A in reading, but I have starred that because she is reading at third-grade level which is two grades below, but she is doing excellent work at the third-grade level (coded for ability and effort).

I do not grade him in math; he is a learning disabled student (coded as ability).

Math may dip as concepts become more difficult. She is a hard worker, a good student, but I tend to feel that she is a low achiever (coded for task difficulty, effort and ability).

She has not mastered multiplication yet, hence, she has a D (coded for task difficulty).

These comments indicate great overlap between the categories of ability and task difficulty due to the teachers' choice of words. To clarify categories, task difficulties were coded closely with specific subject matter. This overlap problem is addressed at a later point by combining categories. However, it should not be concluded from this that teachers lack concern with task difficulty.

The categories of home support and classroom behavior also deserve some further explanation. Whereas effort, ability and task difficulty are common attributional categories, the home support level appeared to be an attempt on the part of teachers to qualify and perhaps control effort or lack of it. Not only is the category frequently mentioned by teachers, but it is invoked at both extremes: that of insufficient support or of sufficient support to gain leverage for more effort. Consider the following comments:

I predict the grade will go up because she has not mastered multiplication and received a D which hopefully woke the parent up and let him know we have a problem.

Mark will remain A. High expectations from home.

She might improve if the mother is interested. It all depends if the mother wants her to write better.

Well, I think mostly because parents start taking a very active role in their children's education when kids bring home bad grades.
In some cases it can damage a child's self-image but on the other hand there are kids who can be motivated by low marks. I find that I not only have to know the child but his parents too, to know if a low grade would be damaging or motivating. And it only takes the first conference to know.

When I got to working with his mother, he finally shaped up.

She is really a below average student, but she works extremely hard and has a very supportive father, and I think he is the driving force here.

C-, D+. I predict D and D, split family. Mother works nights, step-father emotional problems. Not a lot of energy left for school work.

C-D. Marks will stay about the same. She lacks self-confidence. Parental support is lacking or was. I hope it changes, but she is late a lot and misses study time.

Home motivation is strong.

Almost no motivation from home.

Comments such as these were coded under home support level and were clearly used by the teacher as a factor in predicting future academic success or failure, an attribution process.

Class behavior was also a category which emerged and needs delineation. This is a utility concept. That is, it was important to the teacher to maintain on-task behavior and to maintain the flow of classroom activities. This maintenance of flow was a goal in itself, separate from achievement but also related to it (Joyce, 1980). Activities were planned to accomplish academic tasks; completed tasks were equated with achievement. Therefore any disruption of class flow took time away from a task. For individual students who caused distraction, the loss of time was personal, but frequently, if the flow was disrupted, the loss of time was general. Where teachers perceived that sociability, excessive talking and lack of concentration were disruptive to task-oriented behavior, they mentioned these characteristics in relation to predicted marks, e.g., "Her mark will probably go up when she controls her talking."
Each teacher stated that they allowed some level of conversation during class; hence, comments on excessive talking, goofing off, teasing, etc., were interpreted as off-task behavior which the teacher was attempting to bring in line. Since the mark was based on the tasks completed, it was assumed that the teacher's comment regarding a low grade was a recognition that zeroes result. Tasks were incomplete, and therefore off-task behavior lowered a mark.

The category of classroom behavior lent itself to the decision-tree method of utility analysis.

[Decision tree diagram not recoverable from the scan. Legible branch labels: preordained category; combination; risk between any two marks; student productive; student cooperative; student bored; student disruptive; student uninterested; increased effort and cooperation; sustained effort and cooperation; decreased effort; marks up; marks down.]

Figure 4.4. Decision tree for marking judgment (adapted from Winstein, Fineberg, et al., p. 18).

The decision tree is the fundamental analytic tool for utility theory. It requires the decision-maker to identify alternative actions which might be taken and to obtain information at these times. An optimal course of action is desired. In the case of teacher marking, the teachers were concerned to have students complete tasks and to maintain class flow (on-task behavior). The decision tree indicates points of risk, and the following teacher comments indicate that teachers risked marks to maintain task behavior during class.

Basically a C student but has a behavior problem with another girl in the room, too much talking, but will probably do better now that I have moved her away.

The C will come up I'm sure because she is very bright, but there is too much talking; she is one of my worst talkers.

He's fairly bright, but he's been a discipline problem since he's been here. I am trying something new this week with him. Peer pressure.
I assigned a couple of kids to every time ____ goofs, to yell out "____ get to work." Yell out right in class. It only started today, but ____ knows he's being watched by someone other than me. He just needs that. I look for him to go up, but he's the most distractable kid I have ever seen and distracted mostly to bother other people.

She is my second worst talker, but as I get a handle on her behavior, I think the grades will pick up.

Teacher Four gives minuses and pluses in some areas:

. . . if they can stick with the task. If they are half way between an A and B, and they've got minuses in my book because of fooling around in class when they're supposed to be doing experiments or whatever we're doing at the time, then I would give them the lower of the two grades.

Teacher judgment factors, up to this point, have been established through content analysis by coding verbatim responses and counting the totals in each category. Ability emerged as the primary factor and effort secondary. Another means of comparing factor weights, however, is to combine categories which have great overlap and count again. By combining ability and task difficulty and comparing the new category to a combination of the highly overlapping categories of effort, home support and classroom behavior, a second conclusion appears. Teachers are more concerned with factors related to effort than ability.

Table 4.8
Attribution-Utility Categories Collapsed

Ability/Task Difficulty          Effort/Home/Behavior
117 Comments  Ability             90 Comments  Effort
 19 Comments  Task                51 Comments  Home
                                  [illegible]  Behavior
136 TOTAL                        190 TOTAL

Theoretically, this conclusion makes sense. Teachers, like people in general, perceive that they have more influence over effort than ability (Weiner, 1979). In addition, the category of effort covers two purposes of the teacher: individual task completion and classroom on-task behavior. Ability, on the other hand, serves primarily the individual level.
The first conclusion which appears from categorizing and coding verbal comments is that ability is the primary judgment cue for marks. The second conclusion, which appears after related categories have been collapsed, is that effort is the primary factor in marking judgments. The possible third conclusion from the coding data is that the categories of ability and effort are impossible to separate entirely, and therefore continuously present the teacher with a very complex judgment situation. It is difficult to clearly weight one category over another as a judgment cue. It is safe to say that high ability and high effort at the elementary level result in high marks, and low ability and low effort result in low marks. Any other combination of ability and effort presents complexity in isolating judgment cues if one remains within this counting mode of analysis.

There is, however, a different way to analyze the data. When research goes beyond counting comments, one notes that descriptions of unstable effort and distracting or disrupting classroom behavior are much more lengthy and elaborate than descriptions depicting productive students. The longest and most elaborate descriptions involved home support level as a rationale. Contrast the following key descriptions:

Descriptions of brightness with effort.

She is a very bright kid and a hard worker.

Again a bright girl, high expectations from home. She is an overachiever, a good student and very conscientious.

A, A. Just a very highly motivated student, a lot of internal motivation as well as from home.

B, A. He'll stay there. He just seems really motivated.

He is my brightest, hardest worker, and I predict he'll be an A all year. He does all the extras too.

A, A. Very bright, has had a discipline problem up until now, no more though.

B, A. Very bright but a little disorganized.

B, B. Same. Both of these kids are just kind of steady workers.

Description of uncertainty with unstable effort.

D in math.
D in language and I predict they will come up. He has a lot of home influences in his background. He is living with a grandfather, the mother has gone to court to achieve custody. I feel he is a much brighter student than his marks show. So if I can get him motivated myself, I think his marks will come up.

She has a D in math, C in language. This is an underachiever, immature student. I think she is capable of doing more, but she is much more interested in socializing than working. A loving little girl, capable of improving.

Has a D in math, a D in language. I referred for language disability because I believe he is much brighter. He is reading a couple of grade levels below where he should be and yet through social studies discussions, I feel he is quite frightened. He is being tested right now.

C, D. Language will stay, but the math might drop. Again because of the complexity of the math coming up for P. And P probably is pretty much of an average student but very immature, not especially motivated to do well. She is the youngest child in the family which, I think, plays some role.

Dropped from A to B in language because of incomplete assignments. I think he'll get it back up. He's a very bright boy. ____ has a lot of personal problems which--he's doing a pretty good job of keeping from interfering with classroom, but he really does have a lot. A single mother with--well they've had a lot of problem with people breaking in the home. He's seeing a psychiatrist too. That seems to be--at the beginning of the year he had a real nervous "tic" in his throat. I don't hear it anymore. He's getting that under control.

C, D. She is my second lowest child and expected to stay the same. Real problem I have with this one, because I've recommended her for testing and her parents refused. Her parents insisted that I put her in a different book because the book I gave her was too easy. So I've had a problem with the parents because they refused to recognize that she is slow.
I had two conferences in the same week with the parents. Finally I just let them have their way. I just put a note in her file to the effect that I will not accept the responsibility for the book that she is in. I requested testing, but they refused.

Elaboration of problems is, therefore, a powerful way to weight the concerns of teachers. Using elaboration, descriptions within a category indicated that teachers were much more concerned with ability accompanied by unstable effort than with ability accompanied by steady or extra effort. Effort in these cases was the key judgment factor, not ability. Once again, however, the descriptive data indicated a high interrelationship.

The role of predictions in the marking judgment deserves separate discussion. Does teacher expectancy (prediction) play an important judgment role during evaluation? Many studies reviewed by Brophy and Good (1975) concluded that expectancies became self-fulfilling prophecies in the classic model of Pygmalion in the Classroom, a well-publicized study by Rosenthal and Jacobson (1968). The expectancy model holds that "Early in the school year, using the school records and/or observations of students during classroom interaction, all teachers form differential expectations regarding the achievement potential and personal characteristics of the students in their classrooms. Some of these initial expectations are inappropriate, and some are relatively rigid and resistant to change even in the face of contradictory student behavior." These expectancies cause teachers to begin to treat students differently (Brophy & Good, p. 39).

The evidence contained within this study of teacher marking does not contradict this hypothesis, but it questions it. For example, as evidenced in the statistical analysis of the composite study, teacher predictions were higher than actual marks throughout the year. Teachers continued this pattern in the cue sort exercise which was given immediately after the final mark.
The cue sort (see teacher cases) was not statistically meaningful, but it emphasized the strong relationship between ability and effort and pointed to continuous high expectations for students. These expectations were borne out through the high composite B averages.

Looking at the verbal analysis, teachers discussed home support factors and classroom behavior primarily in relation to prediction at the beginning of the marking period. But when the actual summative marking judgment was made, teachers tended to routinely look at work completion factors. The majority of marks were determined by routine procedural rules. Predictions, therefore, did not appear to determine summative marks. Predictions appeared to be based on the teachers' perceptions of effort to be expended as estimated by knowledge of the attribution categories, whereas marks were based on actual effort expended as measured through task completion in the record book.

Optimistic predictions may reflect the teachers' sense of control to influence task completion positively, either by leverage through home support or by creating interesting and captivating academic tasks. Hence predictions may tell as much about teachers' professional self-confidence as about students' ability and effort. While the evidence in this study does not contradict the expectancy theories, it does suggest that prediction may be quite a separate type of activity from marking judgments and that it might be useful to distinguish carefully between the terms prediction and judgment. Prediction appears to be more closely related to planning and beginning activities than to marking judgments and concluding activities.

From alternative methods of analysis, judgment cues used by teachers in the marking process have been identified and weighted.
The inference can be made that although elementary teachers acknowledged ability as a major cause of success or failure, of higher or lower grades, they were more concerned with maintaining individual effort toward task completion and its counterpart, whole-class effort toward on-task behavior.

One last concern in this study was whether or not teachers used the summative marking process as a feedback mechanism for their teaching effectiveness. The following questions opened the way for responses:

(25) What do these marks (formative) represent? (Probe: Why were these specific assignments chosen?)
(27) Are there activities which occur which you don't mark?
(34) As you look at this whole group of grades, how would you say this group is progressing? Are you satisfied?
(3) You have said that you are generally (satisfied-dissatisfied). Will you change any of your plans for next year?
(10) Consider the following situation: A year-end marking in which more than three-quarters of the 30 students in Teacher X's class receive C or D. What would you conclude?

Answers to these questions led to the conclusion that summative marks were not a feedback mechanism; however, individually assigned tasks, especially tests, may serve as such mechanisms. All teachers were generally satisfied with their classes' progress. All teachers ended the year with a class average in language arts and math which hovered close to a B. All teachers were surprised that the average was so high and openly pondered the implications. All teachers rejected the occurrence of the simulated situation in question 10 as possible.

I never heard of a situation like that. I can't ever think of a situation where three-fourths would receive Cs and Ds. I would want to know the record of the preceding year, whether they made any progress at all. I'd want to know the reading level.
We've never had a situation like that here, and I've taught for long enough that I usually have at least some really good student and then you've got some that are not outstanding, but they're good. If they're reading below grade level, we write that accordingly. That's why I can't understand, can't imagine that situation.

The teacher is not doing his or her job. Three-quarters, did you say? I can't believe it. You know--that would be my first inclination. I would suspect that something was getting across to that top group of students. I'd want to know what type of unit the teacher happened to be teaching, if the teacher was giving individualized instruction versus group, whether there were any personality conflicts (laughs). I would suspect that right away. Was the teacher having any problems of his or her own, for one reason or another? Especially if I were the principal, I would check into that first.

That isn't the way the year should end up. There are too many good students in a classroom. Given the chance, they'll produce for you.

C or D? Well, one conclusion could be that the students were not up to relating well with the teacher and were not really motivated to please the teacher. Another factor could be that there was little home motivation and maybe school is not interesting. I can't really imagine it happening--not here.

In contrast to the rejection of the simulation situation, the teachers responded to question 27 by indicating that task failure was a form of feedback. They stated that they did not mark all activities. Those most frequently excluded were introductions to new material, pretests, class discussions and tests failed by a significant portion of the class. This, therefore, became a direct question in the second interview. Answers included:

For instance, a complete failure across the board? Maybe five people passing the subject. Then I feel that I failed and my grade is bad. So no grade goes in the book.
If it has been something disastrous in the social studies area, we just had a study on Canada, and they all did poorly with the exception of three children, we decided to throw that out. And I would reteach it from another approach, and we had a person come in and give us some additional information on Canada, and we are in the process of using a couple of movies. Then we'll go back and try again.

I would reteach if I felt the test was good. If I felt the test tested what I wanted it to test. Then if the kids were not achieving at the level I would expect, as an overall pattern, then I would go back and reteach and then give a similar test again. I would take it personally to mean that I didn't teach it very well. I didn't motivate them to learn.

My response would depend how important I felt the material was. If I felt it was very important, then I would reteach it. If it was something that was, just happened to be in the book, but I didn't consider it important very subjectively, then I would just let it go and sometimes I would give the kids a chance to redo the test or take it home and correct it for some credit.

(Laughs) Either I didn't teach the material or the questions were presented to them in such a way as to be too confusing. So if it happened, I would either strike out the test and not even put the mark in or I would lower my standards for that particular test and change the grading standard. I have a standard grading system, percentages, 90-100 is an A, 80-90 is a B, etc. The only difference I know is one time during the fall, I changed the grading standard for a social studies test because the highest score was eight out of ten. It was a hard test. In this case, I put eight was an A, and then I just went seven was a B, etc. So I did adjust that one. (This teacher weights tests as equal to homework tasks.)
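
The two grading standards described in the last comment can be illustrated with a short sketch. It is offered only as an illustration of the rule as the teacher stated it, not as a procedure used in the study. Several details are assumptions: the teacher said "etc." after the B range, so the C and D cutoffs below are guesses; a score of exactly 90 is treated as an A; the lowest mark is taken to be E; and the function names are hypothetical.

```python
# Illustrative sketch only. The C and D cutoffs are ASSUMED (the teacher
# said "etc." after B), and a score of exactly 90 is ASSUMED to be an A.
STANDARD_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def standard_mark(percent, cutoffs=STANDARD_CUTOFFS, lowest="E"):
    """Convert a percentage to a mark using the fixed percentage standard."""
    for minimum, mark in cutoffs:
        if percent >= minimum:
            return mark
    return lowest


def adjusted_mark(score, top_score, marks=("A", "B", "C", "D"), lowest="E"):
    """Hard-test adjustment: the highest score earned in the class becomes
    an A, one point below it a B, and so on (whole-point scores assumed)."""
    points_below_top = top_score - score
    if points_below_top < 0:
        raise ValueError("score cannot exceed the top score")
    if points_below_top < len(marks):
        return marks[points_below_top]
    return lowest


if __name__ == "__main__":
    # Eight out of ten is 80 percent: a B under the standard system,
    # but an A once the standard is reset to the highest score earned.
    print(standard_mark(80))
    print(adjusted_mark(8, top_score=8))
    print(adjusted_mark(7, top_score=8))
```

The sketch makes the contingency visible: the same paper moves up one mark category when the teacher judges the test, rather than the students, to have failed.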
The contrast between the responses to failure on a test and to the simulated situation, where the summative marks were three-fourths Cs and Ds with no outright failures, supported the idea that tasks were more important indicants of learning and effective teaching to teachers than summative marks, which were seen as a routine averaging of task marks.

Some anticipated findings did not emerge from this investigation of marking. This may be the result of the interview format or the lack of expertise on the part of the interviewer to probe at significant points. Specifically, the researcher had hoped that teachers would discuss the selection and quality of assigned tasks to a far greater extent and make distinctions between tasks which would indicate different values (weights). Instead, most teachers remained on a general level, discussing the significance of the number and variety of tasks rather than the quality. However, most weighted tests more heavily (double) than other tasks. Some assignments were simply checked in or out rather than being corrected, indicating a quality decision. There seemed to be a general assumption that most tasks were worthy, especially the individualized skill tasks which came from published materials and occupied many entries in record books.

Conclusions which could be drawn from this lack of detailed comment are likely to be spurious. However, the discussion is important because it appears to be related to task difficulty, which most teachers did not list as an important attributional category during the marking judgment. The conclusion which seems most probable is not that teachers do not make such distinctions, but rather that such distinctions are important during the planning process (beginning) instead of the marking process (concluding). Once tasks are selected, they become part of an assumption which teachers operate from, and which is questioned only at times of failure, as when the majority of the class failed a test.
This further supports the finding that during the marking judgment, teachers are not concerned with task quality so much as with standard of task completion.

Other anticipated findings did not emerge. Teachers seldom discussed the future placement or success of the student beyond the current year. They seldom discussed their own marks in relation to any outside criteria except the Michigan Educational Assessment Program (MEAP). They seldom discussed standardized testing outcomes as a measure of effective teaching. These would have indicated out-of-classroom measures. The findings which emerged, combined with those which did not, indicate that teachers' concerns during the marking judgment were largely bounded by classroom parameters.

Composite Summary

The purpose of the study of teacher judgment during marking was to create an understanding of the judgment processes which engage teachers across a year and to see if these processes indicate a pattern or model. The data gathered and analyzed have fulfilled the purpose. The findings indicate that teachers generally follow procedural and contingency rules which divide the process into three stages: collection of task completion information, computation of task information, and modification strategies to deal with uncertainty between categories and with failure. A marking judgment model, derived from the combined findings of statistical and verbal analysis, illustrates these stages and is included in the integrated summary following the teacher cases.

The findings indicate that the statistical techniques help determine outcomes; hence, they are a signal for caution. The statistical techniques of multiple regression, Pearson correlations and partial correlations revealed that each marking period made a contribution to the final mark. Regression analysis generally weighted the final prediction and the second mark as the most influential predictors, supporting a recency model of decision making.
The Pearson correlations revealed that all marks were more highly related and significant than the regression equation leads one to believe, although the Pearson correlations supported the general weighting patterns of the regression process. Partial correlations, which control for specified variables, questioned the recency effect and supported an "anchoring" or primacy effect. Both the recency and anchoring effects are well known as common judgment processes. The teachers, however, stated that they strictly averaged the separate summative marks across the year, thereby contradicting both hypotheses.

The findings of this study indicate that each marking period stands on its own tasks. Teachers do generally average formative marks at the end of a marking period, and they do generally average the summative marks to arrive at a final mark. However, an analysis of record books, of minuses and pluses and of verbal protocols reveals that they do not do this as "strictly" or as "cut and dried" as they perceive. Instead they have contingency rules which operate in zones of uncertainty and in exceptions. Contingency situations seem to increase as the year goes on.

Teachers have considerable commonality about the judgment cues which operate in contingency zones. The cues include ability, effort, home support level, classroom behavior/physical maturity and task difficulty. Effort constitutes the primary contingency cue with ability close behind.

The composite study reveals that teacher marking processes at the procedural level are related to task completion and at the contingency level are related to factors which promote task completion, especially effort. Interest in the home support level is basically related to gaining leverage to maintain or increase effort. Interest in classroom behavior is also related to maintaining on-task behavior of a significant group of students to assure task completion.
Taken together, these procedural and contingency judgment processes reveal that teachers' marks are task focused and classroom bound.

Case One--Teacher One

Base Data

Sex: Male.
Years of teaching: III.
In this district: 14.
Class Size: 29.
Grade and composition: 6 fifth/23 sixth; 13 boys, 16 girls.
Parents attending last conference: 25 of 29.

Philosophy and Rule

Teacher One believed the current system of marking was satisfactory. During an experience with a Pass, Satisfactory, Unsatisfactory system, he felt that teachers did not have "enough breakdown." He felt that "written comments to parents is the largest controversy in the school because of the time it consumes."

Teacher One described his philosophy that "children should be appraised weekly and know where they are on the parameters that I have set: how they compare to their peers, how they are comparing to the material and what they are accomplishing. I talk to them as a class and have them figure out their grade, and if it is a serious situation then I will have a conference." To this end, Teacher One set up operating rules:

Procedural rules:

1. I put everything in the book so I don't have to try and remember. I don't try to do any grades from memory.

2. Roughly thirty grades were considered in the first marking. I try to give roughly a grade per day in each subject if we cover the subject that day. Some grades reflect three days of work, for instance when we are working on a series of plays and writing a play.

The categories in the record book are planned weekly. We start out in the beginning of the year with everyone in the same area, same book, same pages. The ones who excel, the ones who get five 100s in a row then are selected out. They form a fast moving group called the "jet set" but this group fluctuates continually.

There are some things I don't mark like introductions to new units and pretests in new areas.

I mark in percentages and convert to the marks.
96 to 100 is an A, 90 to 95 an A-, 85 to 89 is B, 80 to 84 is B-, 75 to 79 is a C, and I only give one C and one D category. I weigh a test grade right against daily work. I don't give it extra weight. A lot of people do not test well, and so I would rather down play a test and up play daily work. If the daily work is being done by a parent for instance, then it gives the child an unfair advantage, but it is very very evident as soon as we do take a test. Then I adjust the weighting.

Contingency rules:

We also have group grading where six children may receive the same grade for a project. But if one person is fooling around and the rest are working, this person may receive a lower grade.

If I have a complete failure across the board, that is, maybe five people passing the test, then I don't record the grade. Then I feel I failed. We just had a study on Canada and they all did poorly, with the exception of three children, so we decided to throw that out, and I would reteach it from another approach. Also, I allow the child to throw out their worst grade. In daily grading I always give them the benefit of the doubt.

Some subjects are harder to mark. Math and spelling are easy and right out of the book. Social studies, reading and language arts are difficult to find out precisely where the child is. They involve a lot of correcting of papers and tests. I think they should know something pretty specific, not just generalizations. For instance, if they said there are many provinces in Canada and two territories, then I weigh that, okay, what do they mean by many, but if they go on and tell me something about the provinces and they have just forgotten to give me ten, then I have to make a judgment, does this person really know there are ten? The rest of the test may tell me.

5. I retain people. However, I don't keep them back unless they can benefit. These two children are capable of something more, but they need a year of stability to know rules and parameters.
I try to build these children up. It is devastating to go on and on and get more and more of things you don't understand. On the other hand, ____ will not stay, because I just don't think this student has the equipment or is capable of much more.

6. Occasionally, I give pluses for extra credit work. If a child completes extra problems or completes a challenge, I give them a plus. When I figure the final grades, if the grade was a 95, I would round it off to 96 because of the extra effort work. So a child that is maybe not the sharpest in the world, but is a very diligent worker receives credit for being a diligent worker. However, the child must score a certain score, and I do strictly take that off the top.

7. Kids must end up on their own merit, and I take strictly the four marking periods, tally those and divide them by four, and wherever that grade falls within the half point, I give them that grade. So the minuses and pluses come in there.

In line with these rules, Teacher One set up a record book system. It is the marks derived from this system that are analyzed statistically in the next section. A section analyzing the interview protocol precedes the case summary.

Statistical Analysis

Three statistical techniques (multiple regression, Pearson correlation and partial correlations) were combined to reveal the marking policies of Teacher One across a school year in language and mathematics. The tables used to modify the original regression equation and yield an adjusted marking model may be found in Appendix D.

[Marking policy diagrams for Language Arts and Mathematics not recoverable from the scan. Legend, Language Arts: L1 = First Actual Mark, L2 = Second Prediction, L3 = Second Actual Mark, L4 = Final Prediction, L5 = Final Mark. Legend, Mathematics: M1 = First Actual Mark, M2 = Second Prediction, M3 = Second Actual Mark, M4 = Final Prediction, M5 = Final Mark.]

Note.
These policies were captured through Pearson correlations adjusted by partial correlations. Summative marks and predicted marks were the base data.

Figure 4.5. Marking policy for Teacher One (with predictions) for 29 students.

The initial regression equation indicated that the final prediction in language (Beta = .60) and the second mark (Beta = .34) were the best predictors of the final language mark. In math, the final mark was best predicted by the second mark (Beta = .69) and no others were significant. The Pearson correlations supported this. Adjustment by the partial correlations reduced the effect of the final prediction of language. The frequency bar graph supported the impact of the actual marks strongly.

Table 4.9
Pattern of Average Marks Across a Year
Teacher One, Students = 29

[Graph of class average by marking period not recoverable from the scan.]

Note. These averages were derived from the marks and predicted marks of 29 students across a school year.

Hence Teacher One's policies resembled the composite model to a remarkable degree. All marks were highly correlated (p = .001). Actual marks predicted the final mark more accurately than predicted marks. Predicted marks were more highly correlated with immediate past marks than with future marks.

The frequency distribution of marks for Teacher One also resembled the composite distribution. Teacher One used minuses and pluses, thus depicting the contingency rules quantitatively. The pattern of minuses and pluses increased as the year continued: increasing from 0 in the first language mark to 20.6% the second and 37.9% the third; increasing from 6.9% in the first math marks to 34.4% in the second to 41.3% in the third. The pattern indicated that all minuses and pluses operate at the A/B level and that there are consistently more minuses than pluses.
Despite increasing contingency rules, Teacher One's predominant pattern remained procedurally based with more than 50% of the marks in preordained categories of A B C D E as shown in Table 4.10.

An analysis of Teacher One's record book reveals his task entries (see Figure 4.6). He has a total of 20 tasks which show a wide variety of content including skill dittos, letters, one test, compositions, penmanship assignments and special topics such as Halloween. Percentages are used and averaged into mark equivalencies at the end of the marking period. Teacher One does not use a check-in (√) system, but he does enter some assignments at 100% if they are completed (note columns full of 100s). Teacher One's marking policies are obviously focused on task completion.

Table 4.10
Distribution of Marks Across Three Marking Periods

[Six bar-graph panels not recoverable from the scan: language marking periods First (L1), Second (L3) and Final (L5), and mathematics marking periods First (M1), Second (M3) and Final (M5); Teacher One, Students = 29. Each panel reported the percentage of minuses and pluses, the percentage related to A/B, the percentage of minuses, the percentage of pluses, the percentage in preordained categories, and incompletes.]

Note.
Teacher One's judgment policies followed the general composite model.

Figure 4.6. Record book account Teacher One.

Verbal Analysis

An analysis of Teacher One's protocol indicates that his contingency rules are based on the attributional-utility factors of the composite model.

Table 4.11
Attribution-Utility Percentage
Teacher One, Students = 29
[Bar graph not recoverable from the source. Categories: Ability (Achievement), Effort (Motivation), Classroom Behavior, Task Difficulty, Home Support.]

The pattern of weighting factors is also representative. Ability (.93) and effort (.90) vie for first influence on judgment. Home support level (.59) is clearly his next concern. His descriptions of home situations are elaborate and closely related to effort, which increases the impact of effort as a category. Anecdotes illustrate the complexity of Teacher One's weighting of cues between categories.

Now I had a parent last year who did all the kid's work; it was very evident. In fact we wrote on the paper, I wasn't the first one, dear Mr. So and So you've done very well on the daily work and sent the work home. I don't know if it ever got there, but we graded the dad as he did the daily work and the kid didn't know anything, didn't know from applesauce as to what was going on. He had three children in the school and he did all the work; he was very busy. (Teacher graded down, tasks not completed by student.)

____ is a B- in math and a D in language arts and that is a severe drop. The reason was she just didn't complete a lot of subjects, a lot of things; her mother came in, she tried to make them up, she didn't make up three of them and then just gave up. She's a very moody child. They just told her she was going to move to Arizona and the family's so screwed up you can't believe it. The dad is a 25-year marine sergeant, so you know how that is. There is absolutely, I mean he does ridiculous things like penalizes the kid to the house for 90 days! That's like a court martial you know. That's not using your head in discipline.
The kid forgets what they're being disciplined for. Scrub the john for 120 days. Geez, dumb! I can't get it across to him, you know. (Teacher graded down. Lack of task completion overrode home problem.)

I'm going to pass him. It wouldn't do any good to fail him. He's got a dying grandmother living in the home, they just moved her out. His parents are older than I am, older than we are. They're in their late 60s and here's this little squirt coming along. He's got all kinds of physical problems but I think 90% of them are made up. The parents are heavy smokers. He does have chronic chest congestion and allergies to smoke and dogs and cats and all this stuff. He lives in that house you know and then he worries. He's constantly worried his dad's going to die, his mother has attacks, sugar attacks and he comes home and finds her out on a sugar attack, has to go and get her insulin and give her a shot and the grandmother, uh, it tears him up all the time and they do crazy things like call the school and get him ready to come home. No explanation. So then he's blubbering around the room wondering, my mother's dying, or my father's . . . he's always worried, wouldn't do any good to fail him. He could do no better, the poor kid's got all of the worries I've got on my back plus a dozen of his own. A little old man already. He had the best, I took him home with me last week and, I take all of the kids home during the year, three at a time. And I took him home last week and this little guy had the best time I think he probably ever had in his life. He was so, he didn't want to leave, he didn't want to come back home and he was so excited he almost wet his pants. He was just shaking he was so excited about going and being there, and I take him down to do a little wood shop project with him and we played games, he was so excited his hands were shaking. Poor little guy. (Teacher graded up. Problems override task completion.)
Despite considerable sympathy with student home problems, Teacher One remains loyal to a task completion criterion. Effort toward that goal must be in evidence or the odds must be overwhelming, as in the quoted case, before Teacher One relinquishes basic procedural rules.

The category of class behavior received very few comments (.24) from Teacher One. Evidently his task orientation does not make behavior a major problem. However, there are not sufficient data to draw conclusions.

Teacher One did not comment on the simulation question because the interview ran short of time. However, he commented emphatically that a failure of the majority of students on a unit test would provide an occasion to reteach. He gave an example of a recent occurrence which was quoted in the composite case. He saw such task failure as a feedback mechanism.

Teacher One resembled the composite case in his optimism. His predictions were higher than actual marks across the year. His class average was high. In addition, the cue sort technique given at the end of the year reinforced this conclusion.

Table 4.12
Cross Tabulations of Effort and Ability
Teacher One

                  EFFORT
ABILITY      High    Low    Total
High           21      4       25
Low             0      4        4
Total          21      8       29

Criteria from outside the classroom did not enter Teacher One's regular marking judgment. "The only time I ever do this (compare students to standardized tests) is if I am challenged on the grade I am giving. I never look at their past record when I give them. The only record I look at is a reading score card that comes with them . . . we try to put them into the proper book."

However, Teacher One does have curiosity about his students in seventh grade. "They send back the testing machine runoff and I compare it just to find out where my kids went, because I like to compare them against past years. They've always done well. Yes, I am finding out how well I'm doing."
These comments indicate a definite interest in the student's future, but there is no specific detail involved, perhaps because he perceives the situation as successful. These general sentiments, however, were never expressed in relation to any of Teacher One's marks or predicted marks.

Summary

Teacher One's judgment policies closely resembled the composite case. He marked the majority of students according to procedural rules and a minority according to contingency rules. His judgment cues related to contingency rules also mirror the composite, with more weight for effort and ability. Teacher One's policies have coherency. They are focused on task completion on a variety of tasks besides skills. His marking is classroom bound.

Case Two--Teacher Two

Base Data

Sex: Male.
Years of teaching: 19.
In this district: 17.
Class size: 28.
Grade and composition: 4/5 split.
Parents attending last conference: 26 of 28.

Philosophy and Rules

Teacher Two was very satisfied with the current grading system (see report card in Appendix B). Only as a child had he experienced a different system of excellent, satisfactory and unsatisfactory. He appreciated the fact that the school district did not have a specific philosophy of grading and felt "thank goodness, I hate to be dictated to. I like it just the way it is because we all have our own criteria for grading."

Teacher Two sees "grades as having two purposes: one to inform parents, one to motivate children. Not to inform children, because I think children in my class always know where they're at anyway. But sometimes I grade down in order to motivate them to try harder next time. But my primary thing is to report to parents."

To this end, Teacher Two has certain rules which he applies to marking. Some of these rules or strategies are embedded in philosophical statements.

Procedural rules:

Our report cards give us a chance to put down the grade level at which students operate as well as grade.
Since I teach all basic subjects on an individualized basis, I have kids reading as low as beginning third and as high as sixth. I could give a third grade reader an A if they were doing a lot of work and trying very hard. In other words, an effort grade. This also works in the reverse. I do have some children that are in the fourth grade in a sixth grade book and getting a C or a D on their report card because they did not read a lot and did not do their workbook properly.

I grade according to what we've done. We may have as few as four grades in a marking or as many as ten. If I have less than four, I probably wouldn't even give a grade. I would think the grade wouldn't even be valid.

I don't give many grades so they tend to be very important when I do give them, especially tests. Well, I think it's very unfortunate when you only have one test in a marking period to grade, which was the case this last marking period and that was my fault. I just let things go too long and it came to the end of the year, end of the marking period and I didn't have enough time. I think to be fair with kids and give them a chance you should give at least two or three other minor grades to average with those tests.

Generally I just follow the old standard: 90s is an A, 80s is a B, 70s is a C and 60s is a D, below 60 would be an E. I weight tests and written reports heavier and usually put them in a different color so they can be seen.

In reading, math or penmanship, they can sort of go at their own speed. For instance they know they are expected to do at least one penmanship paper a week, ten papers per ten weeks. Those that had 10 pass papers got a middle B. If they did considerably more they could get an A. Two children got A+s because they had done 35 or 40 papers when 10 were expected.

My records are free, they are on my desk all the time. Kids are always looking over my shoulder comparing even competing with each other to see who has the most math papers, etc.
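The percentage scale and the heavier weighting of tests that Teacher Two describes can be sketched as a small calculation. This is only an illustration under stated assumptions: the interview gives the 90/80/70/60 cutoffs and the four-grade minimum, but it does not say how much heavier tests count, so the double weight used here is hypothetical, as are the function names and the sample entries.

```python
def letter_for(percent):
    """Map a percentage to a letter on the 90/80/70/60 scale quoted above."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return letter
    return "E"


def period_mark(entries, min_entries=4):
    """Average (percent, weight) entries into a letter mark.

    Returns None when there are fewer than min_entries grades, following the
    teacher's remark that he would not give a grade on fewer than four.
    """
    if len(entries) < min_entries:
        return None
    total_weight = sum(weight for _, weight in entries)
    average = sum(percent * weight for percent, weight in entries) / total_weight
    return letter_for(average)


# Hypothetical record book row: three daily grades and one test.
# The weight of 2 for the test is an assumption, not a figure from the study.
entries = [(85, 1), (92, 1), (78, 1), (88, 2)]
print(period_mark(entries))
```

A sketch like this covers only the procedural step; the contingency rules that follow describe what the teacher does when the result falls near a boundary.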
Contingency rules:

Well, two things influence whether I'll go up or down. The child itself; if I feel they're trying their hardest, I tend to give them the higher grade. If I think that they are capable of even better than that highest grade, a higher grade above that, I would tend to give them the lower grade to sort of encourage them to try harder next time. And the other thing: earlier in the year I tend to give the lower grade, later in the year I tend to give the higher grade because I think grades can be used for motivation and you motivate a child more, I believe, by giving them the lower grade to begin with and give them something they can shoot for. That is, if we're talking C+s and B+s but if we're talking very low grades like the difference between an E and a D-, I would tend to go the higher grade if it's in the low range, say below C because it tends to help your self-image and some kids, once I have seen kids just blossom and bloom when I've given them the higher grade, the C instead of the D. And it improves their self-image and they feel so much better about themselves. But when we're talking above a C then I tend to give them the lower grade early in the year and the higher grade later in the year.

I don't usually grade daily discussions. If we read out of a book and have a discussion about something, and if the class is cooperative, I don't usually grade. But if they get so they are not listening or paying attention which has not happened this year yet, then I start giving oral reports which I record as pluses or minuses.

Marks are a motivating factor mostly because parents start taking a very active role in their children's education when the kids bring home bad grades. This is greatest during the first two markings, but along about the third marking period it's spring and everyone's attention is sort of turned away from school, including parents.
Well, in some cases it can damage a child's self-image but on the other hand, there are kids who can be motivated by low marks. I find that I not only have to know the child but his parent, too. To know if a low grade would be damaging or motivating to them. And it only takes after the first conference and even asking other teachers about certain kids to find out whether low grading would damage them or not.

Well, like all teachers I have my expectations of what a child should do in my room and I guess I feel that it's my responsibility to not only just give out work and put grades in the grade book, but see to it that a kid achieves up to my expectations. Push, push, push, I'm always pushing them. I'm always reminding them. I'm always trying if they fall below on a test as a class tending to give them an opportunity to do extra credit work to make up their grade. I don't like to send Cs and Ds and Es home. But I do. You looked at my report cards. You see that I do send Cs and Ds and Es. But I give them all the opportunity I can. I'm a mother hen. I really keep after them and try to give them every opportunity.

If the majority of the class failed a test, I would take that personally to mean that I didn't teach it very well. I didn't motivate them to learn. Sometimes I would give the kids a chance to redo the test, take it home and correct it or correct it in class and try to improve their grade or as I did this last test in social studies I had nine kids get an E on the big unit test and I told them I was sorry that the information was taken directly from the book and they were told what the test would be on, they were told that most of the test would be the vocabulary words that were in dark print and all they had to do was . . .
so I felt that the test was fair but so many did get poor grades on it and it was right before card marking and I had to grade their report cards the nine kids would have taken home Es and I didn't want to send Es home on report cards to parents. I ended up sending about three home. I gave the kids an option of doing some extra credit questions in their book, strictly on their own, and I told them where the questions were and how to do them and quite a few came in and I was able to raise their grade. It was sort of a band-aid approach to the situation. I sure don't like to send home Es.

In line with these rules, Teacher Two set up a record book system. It is the marks derived from this system that are analyzed statistically in the next section. A section analyzing the interview protocols precedes the summary.

Statistical Analysis

Three statistical techniques (multiple regression, Pearson correlations and partial correlations) were combined to reveal the marking policies of Teacher Two across a school year in language and mathematics (see Appendix D for background tables).

[Path diagrams for language arts and mathematics are not recoverable from the source; the coefficient values cannot be read reliably. Legend: L1 (M1) = first actual mark; L2 (M2) = second prediction; L3 (M3) = second actual mark; L4 (M4) = final prediction; L5 (M5) = final mark. Each panel reported a partial correlation of the final prediction with the final mark, controlling for the first and second actual marks.]
Note. These policies were captured through Pearson correlations adjusted by partial correlations. Summative marks and predicted marks were the base data.
Figure 4.7. Marking policy for Teacher Two (with predictions) for 28 students.

The initial regression equation (see Appendix D) indicated that the second language mark (Beta = .34) and the final prediction in math (Beta = .40) were the best predictors of the final mark.
This was supported by the Pearson correlations but adjusted somewhat by the partials. The frequency bar graph appears to be the deciding factor in this policy.

Table 4.13
Pattern of Average Marks Across a Year
Teacher Two, Students = 28
[Bar graph of class averages for actual and predicted marks; values not recoverable from the source.]
Note. These averages were derived from the marks and predicted marks of 28 students across a school year.

Teacher Two has a 9.5 class average and predicted average in language across the year until the last mark, when it increases to 10.2, the highest of all five teachers. In math, the average steadily increases across the year for both actual and predicted marks: from 8.2 to 8.8 to 9.3 to 9.4 to 10.0. Yet these marks result in the weakest correlations of any of the five subjects. This appears to be related to the fluctuating distribution of minus and plus marks. Nevertheless, the correlations are generally high and significant.

Hence Teacher Two's policies resemble the composite model in that the average remains stable despite fluctuations in minuses and pluses. Predictions are more highly correlated to the immediate past mark than those of other teachers.

The frequency distribution of marks for Teacher Two resembles the composite in having minus and plus categories. These categories also follow the composite pattern of having few in the first marking (7.1% in math, 14.3% in language), increasing in the final mark (46.5% in math, 42.9% in language). During the second language marking, however, Teacher Two gave 64.3% of his marks with plus or minus modifications, which did not follow the composite pattern. Obviously, Teacher Two used contingency rules regularly, as depicted in Table 4.14.

Table 4.14
Distribution of Marks Across Three Marking Periods
Teacher Two, Students = 28
[Six bar graphs (language and math; first, second and final marking periods) are not recoverable from the source. Each panel reported the percentage of minus and plus marks, the percentage related to the A/B level, and the percentage of marks in preordained categories.]

An analysis of Teacher Two's record book indicates the task entries (see Figure 4.8). Teacher Two had the fewest entries and relied on creative assignments (writing and stories) rather than skill areas, which may account for the use of contingency rules. Teacher Two also puts the greatest stress on literature and book reports. Students gain contingency points through the black dots which appear on the left side of the page.

[Record book page not recoverable from the source.]
Note. This record book page was extracted from data collected on Teacher Two's marks in language and mathematics. This depicts marks in language for two marking periods.
Figure 4.8. Record book account Teacher Two.

The conclusion appears to be that Teacher Two differs from the composite model by specifying fewer procedural rules, by entering fewer tasks and by allowing considerably more leeway for individual student initiative.
Teacher Two's class average resembles the model in being considerably above average.

Verbal Analysis

Teacher Two's attribution-utility chart reflected the same general categories of concern as the composite judging model. However, he put different weights on the cues: ability (.75) running significantly higher than effort (.32); classroom behavior (.54) higher than any other teacher in the study; home support comments running lower (.10). Teacher Two seldom commented on task difficulty, perhaps because he used an individualized approach in almost every area.

It is difficult to characterize Teacher Two's task orientation. He has fewer tasks in the record book, but he offers the greatest opportunity for students to do even more tasks on a contingency basis. Some students take advantage of this; hence his generally high class average. The procedural base is less clear. The highly individualized approach may explain the lack of discussion about a specific level of achievement for the class.

Table 4.15
Attribution-Utility Percentage
Teacher Two, Students = 28
[Bar graph not recoverable from the source. Categories: Ability, Effort, Home Support, Classroom Behavior, Task Difficulty.]

Teacher Two has some comments and anecdotes which highlight this situation.

I teach all the basic subjects on an individualized basis so I have kids as low as beginning third and as high as sixth. Usually my problem is not the letter grades I give out so much as the grade level in those subjects especially reading and maybe spelling. I have usually two to three parents every marking period or every conference period who will wonder how I can give their child an A in reading when they are a year behind in reading. So I have to explain that it is an effort grade. Once they understand that my grades in most subjects are effort grades, they are satisfied.

C. D. One of my lowest students and expected to stay the same.
Real problem I have with this one because I've recommended her for testing and her parents refused. Her parents insisted that I put her in a different book because the book I gave her was too easy so I've had a problem with parents there because they refuse to recognize that she is slow. Finally I just let them have their way. I just put a note in the files to the effect that I would not accept the responsibility for the book she was in . . . I suggested testing but they refused.

In fact I had some parents once that refused to have their boy in my room. I said, (principal) I want that boy in my room because they got their older girls through and they were never in my room. I always looked forward to having them, they were nice kids. I finally decided that I wanted this boy, I wanted one of these kids in my room and they were just adamant because they had heard rumors that he doesn't know what he is doing and so on and so on. So I got them in here for conference, this was like the spring because of coming fall and I showed them my records and showed them my way of doing things and they were very surprised and they said okay, well ____ can be in your room. Well, ____ was in my room the following fall in a fourth grade 4/5 split as a fourth grader and the next year I taught 4/5 again and they requested him to be in my room the second year. They were afraid because my system was so different from anything they had before and they thought they had heard rumors that I let kids do as they want to. One reason it happened was because--my kids are working in different pages in different books and they help each other out--there is a higher noise level in my room than some other rooms especially when we are doing math. Math is kind of a noisy time and it lasts about forty-five minutes a day.
At that time we had a lot of parents working in the library so they would walk by my room and see all this commotion and moving around, and occasionally I'm sure they would see kids goofing off too, because boys especially will throw a paper wad or misbehave, and of course, eyes tend to focus right there with that one kid doing something wrong, parents don't realize that when you keep kids in their seats and quiet supposedly, they are still goofing off but they are just doing it quietly. They are playing with their little cars behind their book and they are reading a magazine or a comic behind their book, but when you turn them loose to be free, the behavior becomes more overt and it's easier to spot and they spot it.

These extended comments by Teacher Two indicate a problem area in communication which stems from his procedural base. A conclusion present in this case is that Teacher Two differs from the composite model in the extent to which he functions on the contingency rule basis. Most teachers in the study, especially those emphasizing skills, operate predominantly on the procedural level for the majority of their marks, which is much easier to communicate. Teacher Two is likely to continue having some difficulties in explaining his system to parents.

Teacher Two resembles the model in his general willingness to discard marks if the whole class failed, seeing this as his own failure. In the cue sort, Teacher Two expressed an average optimism.

Table 4.16
Cross Tabulations of Effort and Ability
Teacher Two

                  EFFORT
ABILITY      High    Low    Total
High           13      2       15
Low             4      9       13
Total          17     11       28

Summary

The judgment process of Teacher Two differs substantially from the model. In a significant number of marks, he follows contingency rules rather than typical procedural rules. In effect, his students have more opportunity to move ahead, setting their own standard. On the other hand, he does not give as many As as some other subjects.
Teacher Two's judgment policy is very much influenced by attributional-utility factors although he weights them differently, a trait common across subjects except for effort. Lastly, Teacher Two defines his tasks differently and chooses fewer of them for the record book. In this manner, it is difficult to conclude that his marking judgment is geared toward task completions at a specified level or standard. Completion in and of itself appears to be a more dominant goal. Despite this question, Teacher Two's marking is classroom bound.

Case Three--Teacher Three

Base Data

Sex: Female.
Years of teaching: 16.
In this district: 15.
Class size: 32 (5 L.D.). One student moved.
Grade and composition: 5th grade; half boys/half girls.
Parents attending last conference: 29 of 32.

Philosophy and Rules

Teacher Three was basically satisfied with the marking system (see report card in Appendix B). She felt "the conference with the parents is the most important part of the program so to speak, because I do explain to the parent how I mark and why. It's a way for the student to know how he is doing in regards to expectation."

"I think marks are definitely important to the parents in this area, because we hear if somebody's mark is not where the parent thinks it's going to be. They usually contact us right away. Most of the parents in this area are successful people and they want their children to be. In turn, they feel that success in school leads to a successful life, and therefore that's their way of measuring how their child is doing."

Teacher Three believed that marks were a periodic summary of student learning and that parent knowledge of the situation brought motivational support. Hence she had rules to support this:

Procedural rules:

The record book is very valuable. The marking allows me to summarize my thoughts about the child and how he is progressing, so when I go into the conference I know exactly where the child is in regards to my expectation for him.
I have been caught in the spot where parents come in and say, "quick I want a conference about my kid," and I really haven't had time to look back over marks, and get my thoughts together.

I keep track of daily work, which I don't always record. I do check a paper daily and I do look at progress daily, but I don't always record it. It depends on the work itself. I may have as many as 20 marks in the book or as few as 10.

Marks in the record book represent homework, skill study sheets, workbook pages, projects, oral projects, tests, a great variety.

In math, I have a separate page where I list the skills and as students master the skill at 80% or better, I give them a mark. Since they also work in groups, they may change groups.

If the majority of the class failed a test, "I would reteach if I felt the test was good; if it tested what I wanted it to test."

The first marking and conference are especially important. Until then, the student really doesn't know how you're going to mark. Once he sees what your standards are, then he tends to work harder or less hard, depending on how he felt he worked in comparison to the mark.

I don't really look at permanent records until after I have marked the first marking period. I like to get my own feelings for the kid. I do look at the Michigan Assessment tests and they are useful. I pretty much use them to basically support my own opinion. I still size up my kids myself first.

Most categories of the record book are planned in advance. However, if we don't complete the task successfully, I may not record it until I have retaught it the next day.

I use one very useful grading device which helps keep my marks less biased. It is a chart which figures points into letter grades according to the standards 90-100 = A, etc. That way, I can have six items or thirteen. They don't have to be an even ten or twenty.

Contingency rules:

1. If I am in doubt, I look back at the marks to see the nature of the assignment.
If, for example, in reading the comprehension marks are all good, but they drop in skill papers, I would probably go with the higher mark. If comprehension marks are low, then I might stay with the lower mark . . . . In language I would look back, if the creative writing assignments are the high areas and the low areas are in the skills, I'd probably keep the mark at the low level. I think creativity is important, but I think the language skills have got to be there.

2. I give four points for A, three for B, two for C and one for D, and I show incomplete with circles. I add them up and divide by the total number of marks. Very cut and dried. If the mark is in the middle, I look at the record book and the kid. Is he really working or not putting forth much effort?

3. I sometimes give Ds to make a point with the student and parent. If I slide them through on a C, nobody gets too upset, but a D really brings the point home.

4. Tests are important. I guess I really feel, especially the ones I make up, do reflect what I'm teaching and whether or not the student is mastering it. (I double-check my results with those of the district's criterion reference measures as well.) I don't give semester tests as such. I test after I've taught so that they come throughout the marking period. It's the one way I can make sure that they are mastering the skill, not just parroting back a short-term learned kind of thing. And I give review tests in skill areas such as multiplication. And again, if the grade is coming out in the middle, if test scores are high, then I'll move the grade up. If low, I'll drop it back.

5. Skill tests are important. I feel fifth graders should have mastered multiplication. If they hadn't, they got a D, which hopefully woke the parents up and let them know the problem.

In line with these rules, Teacher Three set up a record book system. It is the marks derived from this system that are analyzed statistically in the next section.
A section analyzing the interview protocol precedes the summary.

Statistical Analysis

Three statistical techniques (multiple regression, Pearson correlations and partial correlations) were combined to reveal the marking policies of Teacher Three across a school year in language and mathematics. The tables used to modify the original regression equation and yield the marking model may be found in Appendix D.

[Path diagrams for language arts and mathematics are not recoverable from the source; the coefficient values cannot be read reliably. Legend: L1 (M1) = first actual mark; L2 (M2) = second prediction; L3 (M3) = second actual mark; L4 (M4) = final prediction; L5 (M5) = final mark. Each panel reported a partial correlation of the final prediction with the final mark, controlling for the first and second actual marks.]
Note. These policies were captured through Pearson correlations adjusted by partial correlations. Summative marks and predicted marks were the base data.
Figure 4.9. Marking policy for Teacher Three (with predictions) for 32 students.

The initial regression equation indicated that the first and second marks were the best predictors of the final language mark. The first and final predicted marks carried the greatest weight in the final mathematics mark. Adjustment by Pearson and partial correlations resulted in displacement of the final predicted mark in math by the second mark (see Appendix D for additional tables). This adjustment was supported by the frequency bar graph depicting Teacher Three's class average pattern. This showed that predictions did not correlate with marks as highly as marks correlated with each other.

Table 4.17
Pattern of Average Marks Across a Year
Teacher Three, Students = 31
[Bar graph of class averages for actual and predicted marks; values not recoverable from the source.]
Note. These averages were derived from the marks and predicted marks of 31 students across a school year.
Teacher Three's policies resembled the composite model. It appeared that her marks were more highly correlated with other actual marks than with predictions. Predictions were also more closely correlated with immediate past marks than with future marks. These conclusions supported the teacher's statement that she generally averaged the summative marks across the year, although not quite in the "cut and dried" manner she perceived. Teacher Three had the highest correlation of the individual cases between the first and last marks (language .92 and math .91), supporting a strong relative position on "cut and dried" averaging.

The frequency distribution pattern of marks across the year for Teacher Three indicated very little change of class average or mark distribution across the year and no use of minuses and pluses. This finding further supports the strength of the first marking as a stable unit in the individual regression equation, which differs from the composite regression pattern.

There is an interesting note, however. Teacher Three gave a disproportionate number of Ds in the first marking to make a point that the multiplication tables had not been learned. Presumably this lowered the class average to make a point to the parents. When followed up, this turned out to be the case. The math class average increased steadily through the year, 8.8 to 9.4 to 9.6, as shown in Table 4.18.

An analysis of Teacher Three's record book showed the task entries (see sample Figure 4.10). Of 17 language arts entries in one marking, 15 were corrected and two checked in (✓).

[Figure 4.10: sample page from the record book.]

Figure 4.10. Record book account, Teacher Three.

Of these same 17 entries, one was a test, 11 were parts of speech and four were short story assignments from the language book. Hence the predominant mode in assigned tasks for the marking was precise and did not require many contingencies. That is, the assignments for this period were skill oriented.
Table 4.18
Distribution of Marks Across Three Marking Periods, Teacher Three

[Frequency distributions of language and mathematics marks for the first, second and final marking periods, with the number of minuses and pluses and the percentage of marks falling in preordained categories for each period.]

Verbal Analysis

Teacher Three discussed skills and mastery more than any other teacher. It is therefore interesting to note her attributional-utility comments compared to the composite case. The category of ability comments was lower (.68) than effort (.77), but task difficulty comments were higher (.32) than those of most teachers. This verified her focus on skills as a criterion for marking.

I had one student who questioned his math mark and he felt he should have gotten an A instead of a B. He felt that he had passed two-digit division. The kids knew ahead of time the criteria for my marks in that they had to be a certain point to get an A, had to be at another point to get a B, another for a C. They knew this ahead of time, must have been a couple of weeks.
So a lot of them worked very hard to pass certain test levels, but he didn't pass it until the day of the conference which was too late. I was really sorry, but the mark was in.

I will stick with small group instruction. We just finished a geometry unit and I work with the whole class in that type thing. Then I go back into small group or specific skill development. I work with each group daily. I feel that immediate classroom contact is important to me, then I know where I need to reteach the next day, where I'm getting across. Usually it makes the kid feel very successful because I worked with him and he knows exactly what to do and can go back and do it. He feels like he's doing it, always making As in his math even though he is working at a lower level. He feels very good about himself and I think that's the whole key. I think if a kid feels good about himself and I can keep him motivated, then he will progress.

Teacher Three seldom discussed class behavior in the protocols, and the attribution-utility charts attest to this (.19). Evidently, the task orientation which she clearly spelled out to students kept this group of students occupied. Teacher Three gave more As than any other teacher, indicating that students were on task. Marks in this class were, therefore, directly tied to tasks, and the performance-grade exchange was carried out. Sixteen As were given on the first marking in her class.

Table 4.19
Attribution-Utility Percentage, Teacher Three, Students 31

[Bar graph of the percentage of attribution-utility comments by category: ability, effort, home support, classroom behavior and task difficulty.]

Teacher Three cooperated with Teacher Four. They regularly compared notes, and neither used minuses or pluses on report cards. Both stressed tasks, and both used a marking device (wheel) which allowed them to clearly turn number points corrected on a paper into A B C D E. This allowed them to use uneven numbers quickly.
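The wheel is described only as a device for turning the number of points correct on a paper into a letter. A rough sketch of such a conversion follows, assuming percentage bands like the ones Teacher Four reports using for papers (90-100 is an A, 80-90 is a B, etc.). The lower cutoffs (70 and 60), the treatment of scores that fall exactly on a boundary, and the function name `wheel_letter` are assumptions; the source does not give them, and the actual wheel may have worked differently.

```python
def wheel_letter(points_correct, points_possible):
    """Convert a raw score to a letter via assumed ten-point percentage bands."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    percent = 100 * points_correct / points_possible
    # Only the A and B bands are stated in the source; the C and D cutoffs
    # and the choice to give a boundary score the higher letter are assumed.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return letter
    return "E"


# An "uneven" total such as 17 items needs no hand arithmetic.
for correct in (16, 14, 13, 9):
    print(correct, "of 17:", wheel_letter(correct, 17))
```

A lookup of this kind is what lets a paper with an awkward number of items be marked as quickly as one scored out of ten.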
Teacher Three commented frequently about effort, and rated it above ability, but the idea of effort was directly translated into skill mastery at a stated level of difficulty. In the manner of behavioral objectives as discussed by Wrinkle in his report card study, Teacher Three's approach resulted in clear categories of marks within marking periods. Clear categories, without plus or minus modifications, resulted in generally high correlation across the year.

Teacher Three preferred not to use outside standards to assess students.

In comparing my marks with the student assessment scores or any other standardized tests, I don't really, I try not to let those influence me. In fact, I don't really look at permanent records until after I have marked the first marking period. I like to get my own feeling for the kid . . . . I do look at MEAP scores and they are useful. I pretty much use these tests to basically support my own opinion. I still size up my kids myself first.

No, I don't really look at these tests first. I don't pull out all the test scores and look through the record and say well this kid has always been a B student so I assume that is where he is going to fall, or this kid's been the pits all along. I really like to get my own rapport established with the kids, I don't like to categorize them right away. In fact, we get reading scores and math scores for reading groups and math groups from the previous teachers, and I don't even rely on them.

These brief responses are not sufficient for drawing specific inferences, but they add to the conclusion that Teacher Three's criteria of judgment about pupils' ability and effort were contained primarily within the classroom. When combined with the total lack of comment on any item related to future student work beyond the present fifth grade, this supported the general composite conclusion that the marking judgment was classroom bound.
Teacher Three put heavy emphasis on the home support level in protocol statements and in the attribution-utility chart. She used home support as leverage to maintain or increase effort on task completion. She did not use home knowledge to modify marks directly with a minus or plus.

Teacher Three resembled the composite case in her optimism about her students. Not only were her predictions high, but the cue sort given at the end of the year was also particularly high in light of having five special education students.

Table 4.20
Cross Tabulations of Effort and Ability, Teacher Three

                    EFFORT
ABILITY       High     Low     Total
High            15       5        20
Low              3       7        10
Total           18      12        30

Summary

Teacher Three's marking policies and judgment cues resembled the composite model. Statistical analysis supported high correlations between marks, with predictions having lower correlations. Marking distributions revealed clean categories with no minuses or pluses, which differed from the composite. In verbal analysis, Teacher Three considered the composite judgment cues of ability, effort, home support, class behavior and task difficulty. However, she weighted task difficulty more heavily than most other teachers did, which supported her emphasis on basic skill objectives. Teacher Three focused her marking on task completion at a given level of difficulty, and her marking policies were classroom bound.

Case Four - Teacher Four

Base Data

Sex: Female.
Years of teaching: 14. In this district: 14.
Class size: 33.
Class grade and composition: Fifth grade; 16 boys and 17 girls.
Parents at last conference: 29 of 33.

Philosophy and Rules

Teacher Four was satisfied with the current marking system (see report card in Appendix B). Having used others, she stated that "the one we are using now is the best one we have ever had." Teacher Four relied on the conference as an important two-way communication in which she also received input. She felt that "traditional A B C marks within the conference setting are the best.
Black and white. Probably this is because the parents grew up with it. However, it is very important to indicate the grade level at which the student is being marked."

In light of her philosophy, Teacher Four established procedural and contingency rules.

Procedural rules:

My record book contains marks for homework, special reports and tests. I don't count class participation or discussions because I expect them all to participate.

I use a point system where A is 4, B is 3, C is 2. I use a calculator across 10 to 20 items and find a strict average for the marking period.

In grading papers, I use a percentage system where 90-100 is an A, 80-90 is a B, etc. The only difference I know was a time I changed the scale because the highest scores were so low. I did adjust that one, and I would be tempted where the test is very difficult.

Homework and tests come out the same. I mean I don't weight one over the other.

In math, marks are based on a mastery level. If they were into 2-digit division, they got a B. If they are beyond 2-digit division, they got an A.

Achievement should have more weight than effort in a grade, but I try to help the child out in other ways. When it comes to grades, the only thing I take into consideration is total difficulty of the subject such as our social studies for fifth grade.

Contingency rules:

If a mark is half way between, I look back in my book to see minuses and pluses which have been given at work periods, i.e., experiments in science. If they have a lot of minuses because of fooling around in class, then I would give them the lower of two grades.

The time of year is also important because of motivation. At the beginning of the year, I would tend to mark down. I think that if they start off with a top grade the first marking period, they tend to go down because they don't really have anything to work for. At the semester break, however, I tend to mark up. I feel they ought to have credit for what they have done.
Marks are related to motivation, but they are limited by ability. If they get a B, I think that's as far as they can go. Even if they are motivated to try harder, it's pretty hard to bring it up to an A if you don't have that something extra.

Marks are especially related to motivation depending on the family situation. It all depends what the parents' expectations are. In most cases, parents are motivated by their kid's marks. Though I have some that really don't care. That is, they care, but it doesn't really make them or get them to force their kids to do a little bit more work or try a little harder. They tend to give up and say, well, that's all their child can do.

I have hardly any Es because I don't believe in giving Es. If a child hands in his work, I feel I can't give him an E. I usually push the kid hard to get the work done. Actually, I did give one E because I just couldn't budge him even keeping him in the classroom during recess.

6. No, I don't believe B is an average mark. I didn't realize my marks averaged B. I don't know how this happened because I would say that I have as many outstanding as low students and all the rest in the middle. I'm surprised.

In line with these rules, Teacher Four set up a record book system. She also set up a card system for mathematics where the students are moved from one group to another upon mastery; hence, the record book is too limited. It is the marks derived from this system that are analyzed statistically in the next section. The interview concerning the marks is analyzed following that.

Statistical Analysis

Three statistical techniques (multiple regression, Pearson correlations and partial correlations) were combined to reveal the marking policies of Teacher Four across a school year in language and mathematics. The tables used to modify the original regression equation and yield an adjusted marking model may be found in Appendix D.
[Figure 4.11 diagram: separate panels for Language and Mathematics relating the first actual mark, second actual mark, final prediction and final mark, with the partial correlation of the first and final marks controlling for the intervening marks.]

Note. These policies were captured through Pearson correlations adjusted by partial correlations. Summative marks and predicted marks were the base data.

Figure 4.11. Marking policy for Teacher Four (with predictions) for 33 students.

The initial regression equation indicated that the second language mark (Beta = .93) was the best predictor of the final language mark. In mathematics, the best predictor was also the second mark (Beta = .38 and .89 in stepwise regression) and the only one with any significance. This was supported by Pearson correlations and the partials. It was further confirmed by the frequency bargraph, which shows that the second marking average is closer to the final average than the prediction is; however, the distinction is minor when the mark averages are so close.

Table 4.21
Pattern of Average Marks Across a Year, Teacher Four, Students 33

[Bar graph of class average marks and predicted marks in language arts and mathematics for the first, second and final markings.]

Note. These averages were derived from the marks and predicted marks of 33 students across a school year.

Teacher Four's policies resemble the composite model. It appears that her marks in general are more highly correlated with other actual marks than with predictions. Predictions are more highly correlated with immediate past marks than with future marks. These correlations support the averaging of grades across the year.

A frequency distribution table depicts the marks. The pattern of marks across the year for Teacher Four indicates that she, like Teacher Three, gave a significant group of Ds in the first math mark. Those Ds were displaced by Cs in the second marking and by Bs in the last.
Like Teacher Three, Teacher Four did not use pluses or minuses on the report card. Her categories were clean. However, Teacher Four did use plus and minus concepts in the record book, indicating contingency factors at other levels, as shown in Table 4.22.

An analysis of Teacher Four's record book indicates the task focus (see Figure 4.12). Of nineteen entries within one marking, eighteen were corrected and one checked in (- or +). Of these entries two were tests, and the rest were concerned with writing mechanics and parts of speech. The assigned tasks were skill based and hence less contingency bound, which supports the clean categories of mark distributions.

Table 4.22
Distribution of Marks Across Three Marking Periods, Teacher Four

[Frequency distributions of language and mathematics marks for the first, second and final marking periods, with the number of minuses and pluses and the percentage of marks falling in preordained categories for each period.]

[Figure 4.12: sample page from the record book.]

Figure 4.12. Record book account, Teacher Four.

Verbal Analysis

Teacher Four was succinct but clear in her responses to the interview questions and in her commentary about her students. Hence,
although her comments indicated great care for her students, there was very little elaboration or anecdote from which to extract classroom concerns. In the attribution-utility comments, Teacher Four stressed ability (.64) followed by classroom behavior (.36). Effort ranked lower than classroom behavior (.30); however, Teacher Four commented elsewhere in the protocol quite extensively about effort, reflecting a limit on effort at both the top and bottom of the scale. Teacher Four clearly stated that effort without ability cannot get an A, and at the other end of the scale almost any effort will merit something above E.

Table 4.23
Attribution-Utility Percentage, Teacher Four, Students 33

[Bar graph of the percentage of attribution-utility comments by category: ability, effort, home support, classroom behavior and task difficulty.]

Teacher Five's attributional categories resembled the general pattern of the composite case, with ability leading effort and with significant comments on home support and classroom behavior. Teacher Five discussed home support problems in detail, expressing sympathy with many students' situations. However, his marks did not show extra generosity for those students. Students were given the benefit of the doubt only where some effort was shown.

Teacher Five also mentioned puberty as a cause of wasting class time and lack of work. It was not a cause for leniency, but neither did his comments reveal inordinate concern. He has the sixth grade and portrayed puberty problems of growth and "not getting his act together" as characteristic of some sixth graders yearly.

Although Teacher Five was concerned about a student's home support and classroom behavior, he modified the impact of these judgmental cues by mediating them through the cues of effort, i.e., checks, check minuses and check pluses in his record book. Effort adjustments were only a minus or plus away from the original calculated average.
Table 4.27
Attribution-Utility Percentage, Teacher Five, Students 31

[Bar graph of the percentage of attribution-utility comments by category: ability, effort, home support, classroom behavior and task difficulty.]

Teacher Five differed from the attributional-utilities composite in his interest in task difficulty. He was as concerned with that category as he was with home support and classroom behavior. He was especially interested in a solution for students with test blocks, which he categorized as a task difficulty.

Teacher Five was optimistic about his students, as shown in the cross tabulations of effort and ability, where 26 of 31 pupils were considered to have average to high ability and 22 had high effort. Although not statistically meaningful, the crosstabs gave a description of the class from the teacher's perspective.

Table 4.28
Cross Tabulations of Effort and Ability, Teacher Five

                    EFFORT
ABILITY       High     Low     Total
High            21       5        26
Low              1       4         5
Total           22       9        31

There was consistency between optimism, high predictions and Teacher Five's marks. He gave 19 marks of B and above. Consistent with his attitude toward effort, low marks and confidence, Teacher Five gave only one E in language and one in math for complete "goofing off" all year. He also gave a D- to avoid an E.

Summary

The judgment process of Teacher Five supports the model. In the majority of marks, he followed procedural rules. In a minority, he used contingency rules . . . which Teacher Five saw as highly related to self-confidence. Physical maturity was a significant part of the class behavior category, with six comments out of nine on puberty or size.

The marking process of Teacher Five was primarily bounded by the classroom. He was interested in parent influence to help with self-confidence and effort, and self-confidence was directly related back to effort. Teacher Five, however, was one of two teachers to mention the next grade and the need for two students to be followed up in junior high school.
He mentioned an interest in the California Basic test results to let him know if his class expectations were realistic. This interest in outside factors may be a function of the sixth grade, which leads to the next school.

In summary, Teacher Five evidenced a coherent marking policy which supported the model and which was focused on and bounded by classroom tasks.

Case Summary

The five teacher cases reveal marking policies and strategies which corroborate the composite case. For most marks, teachers follow the procedural rules. One teacher in this study differed significantly from the composite by using contingency rules more than procedural rules. The teachers agreed upon the five important cues operating in contingency situations: ability, effort, home support, classroom behavior and task difficulty. However, each teacher weighted these cues differently, as can be seen in the attributional coding tables.

At the level of teacher case analysis, some additional hypotheses emerged involving interrelationships between contingency judgment cues. These appear to involve trade-offs. Within this study, there was no systematic way to code for these hypotheses, but they need to be explored in future research.

• The pattern of ascendance and recession of certain cue categories appears to be related to the fact that effort must be sustained over 180 days during a year's period. Such intensity of social conditions may demand compromises between achievement factors and behavior factors; hence, the ascendance of class behavior and home support during the midyear.

• The influence of home support level appears to be related to the time of year, with home support being especially important between the first marking/conference and the next to the last marking/conference. Apparently, teachers use the first marking period as a time of assessment of factors in existence.
By the last marking period, they are not only aware of the categories but they are aware of the extent to which such categories may be combined to bring about task completion.

• The category of classroom behavior and physical maturity also appears to be related to time of year. It appears to have greatest impact on teacher judgment at the beginning of the year, descending slightly after the first half of the year when class routines have been established.

• The categories of ability and effort are dominant throughout the year; however, they too seem to be related to a time pattern, with almost overwhelming influence during the first and last marking periods.

• The importance of task difficulty as a cue category for marking appears related to the extent to which it is a consideration during planning and task selection. Where tasks are originally assigned at an appropriate level of challenge, task difficulty recedes as a marking judgment cue. Grade level reading materials, individualized programs and special education prescriptions relieve teachers of one complex judgment process. When tests were failed by a significant number of students, teachers tended to throw out results and reteach rather than use a "task difficulty" category.

These hypotheses are very tentative, but they have some support in the frequency distribution tables, which indicate patterned fluctuations across the year. They need consideration in future research.

The teacher cases, taken together, strengthen the composite model of the marking judgment process.

Integrated Summary

This chapter displayed and analyzed the judgment policies and cues of five teachers during the marking process across the school year. Evidence was presented in a composite case and five teacher cases. A model of the judgment process was constructed from the rules and cues which emerged during investigation.
[Figure 4.15 diagram, in three columns.
Information in Classroom: information about students (reading level and IEPC); teacher record book as a statistical tool; information about classroom management (participation, cooperation and attitude, work).
Information and Processes in Decision-Maker (Teacher): combination rule; preordained categories A B C D E; uncertainty zones between categories; subjudgment strategies; estimate (prediction) of the risk of assigning a higher or lower mark by attribution (cause of success or failure: ability, effort: stable/unstable, home support, maturity/physical development, task difficulty) and by utility (maintenance of on-task behavior: work production, class participation, cooperation and attitude).
Final Decision (Mark): assign mark.]

Figure 4.15. Framework for marking process (adapted from Carroll and Payne, 1976).

The model indicates that teachers generally follow procedural and contingency rules which divide the process into three stages: collection of task completion information, computation of task information, and modification strategies to deal with uncertainty between categories and with failure.

In the majority of 152 cases, teachers marked pupils routinely from record book information which was combined according to a preordained category of A B C D or E (F). This was a linear process operating across the top of the model according to procedural rules. The primary cue used in the marking at the procedural rule level was task completion at a given standard (formative mark) or completion as a check (✓) in or out.

In a minority of 152 cases, the combination rule resulted in uncertainty between two categories or in failure.
Then contingency rules were put into operation. Contingency rules involved attributional and utility strategies based on consideration of five factors or cues which emerged from verbal analysis: effort, ability, home support level, classroom behavior and physical maturity, and task difficulty. Prediction was not a judgment cue in itself. Under contingency rules, effort appeared as the primary cue, vying with ability.

Elementary teachers did not use summative marks as a feedback mechanism to improve teaching. Instead, they used intermittent tasks such as tests. If a significant group of students failed, the teachers judged themselves unsuccessful in teaching the unit and, therefore, retaught or discarded the grade. The expectation that summative marks should serve in a feedback capacity is misplaced.

The major conclusion of the study is that teachers have a coherent marking judgment process which operates across a school year. Within this process, task completion at a stated level of difficulty and at a given standard of mastery is the dominant cue in the marking judgment. Other cues operate in zones of uncertainty between two preordained marks or in exceptions such as failure. The judgment process is bounded by the classroom task environment.

CHAPTER V

CONCLUSIONS AND IMPLICATIONS

Introduction

The persistent dissatisfaction with traditional marks, A B C D E, which symbolize pupil progress, prompted this investigation of teacher marking processes. A review of the voluminous research literature on marks and the emerging literature on teacher decision making revealed little systematic inquiry into the judgment process underlying marks. The purpose of this study was to develop an understanding of the marking process which engages teachers across a school year by describing the heuristics, strategies and cues of five upper elementary teachers (152 students) from a typically achieving school district in Michigan.
The study posed seven research questions and investigated them through established research methods from the field of human judgment: process tracing, policy capturing, utility theory and attributional theory. Chapter V summarizes the findings under the question headings, compares them to the functions ascribed to marks by society (Chapter I), and discusses the implications for practice and future research.

Summary of Findings by Research Questions

Upon what information is the summative mark based?

The summative mark for each marking period is based upon the completion of a significant number and variety of assigned tasks at an appropriate level of difficulty and standard of mastery.

What cognitive processes make possible the formative stages (record book categories) of marking?

The cognitive processes of selection, simplification and inference operate through heuristics (rules), attributions of individual success and failure and perceived utilities of the classroom. Procedural rules emerged which guide and routinize the marking process. The record book is the key inferential tool of the process. Each teacher had variations on these rules, but all specified a significant number of tasks, a variety of tasks and an appropriate level of difficulty. The specification of tasks rested on the basic assumption that student learning results from completing meaningful tasks.

Is there a judgmental rule which explains how the input information (formative) is transformed into the output (summative) mark?

There is a linear arithmetic rule which averages across collected marks and which is directly related to standard of mastery and degree of task completion. Within a marking period, this rule is focused on completed tasks which carry weighted values and are assigned to the preordained categories of A B C D E. For example, ten math points earn an A, nine a B, and so on. In turn, each A is worth 4 points, each B is worth 3, each C is worth 2, each D is worth 1.
There is a great discrepancy as to whether an E equals 0 or something above 0. Across the year, the rule focuses on averaging the summative marks of each marking period. Hence the final mark is a derived arithmetic mean based on the weighted values of the completed tasks of each marking period.

If the judgmental rule yields a zone of uncertainty between any two preordained categories or yields a failure, what cognitive processes enable the teacher to mark up or down?

Whereas procedural rules emerged to organize the marking process, contingency rules emerged to help clarify choices in uncertainty. Contingency rules rested on attributions of individual student success or failure and perceived utilities for total classroom behavior. Attribution and perceived utility are inferential thinking processes which go beyond the collected data. In this study, they were encompassed within the categories of ability, effort, home support, classroom behavior/physical maturity and task difficulty. The most common tools for assessing these attributions or utilities were checks, minuses and pluses.

Other conditions influenced contingency judgments. These included (1) trade-offs between contingency categories, (2) time of the 180-day year and (3) extreme absence without cause. Systematic inquiry into these conditions was not within the scope of this study.

Do identified cognitive processes form a pattern, schema or model of the marking process?

A model has been proposed. This model is based on the procedural and contingency rules which divide the marking process into three phases: selection and collection of data, valuing and assigning of data to preordained categories of A B C D E, and contingency factors to facilitate choice under uncertainty or failure. The majority of marks are determined at the procedural level.
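The year-level part of the linear arithmetic rule described above, together with the noted disagreement over the value of an E, can be illustrated with a short sketch. It assumes the final mark is the plain mean of the period marks on the 4-3-2-1 scale; the function name `year_average` and the `e_value` parameter are hypothetical, and the marks shown are invented for illustration.

```python
# Point values stated in the text; the value of an E is left as a parameter
# because the teachers did not agree on whether it is 0 or something above 0.
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}


def year_average(period_marks, e_value=0.0):
    """Arithmetic mean of summative letter marks across marking periods."""
    scale = dict(POINTS, E=e_value)
    return sum(scale[mark] for mark in period_marks) / len(period_marks)


# Hypothetical pupil with one failing period out of three.
marks = ["C", "E", "C"]
print(year_average(marks))               # E counted as 0
print(year_average(marks, e_value=0.5))  # E counted as something above 0
```

Under these assumptions the choice of `e_value` alone can move an average from clearly nearer one category to the midpoint between two, which is where the contingency rules come into play.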
I I --- ' a a s i I I I Information I E I about students I I I (reading level I v; I and IEPC) '- l 13 I — . 0 Combination rule 0 ' Teacher ' 2 I record book I a Preordained categories Assign “ E asastatisti- " A a c o E —’-mrt I cal tool I I a Uncertainty zones I I between categories . I I I I 0 f cia I I a Subjudgment strategies Mark I l E I l \ IT I L '1 Estimate (prediction) risk ' Intonation of assigning higher or about lower mark by: classroom management . Attribution: cause of success or failure 0 Participation a ability >. e Cooperation _. a effort: stable/unstable g and attitude work 3: a home support .5 , . a maturity/physical develop- “ I I mat C I l 43 E — I a task difficulty I I Utility: maintenance of I on-task behavior I a work production I a class participation I a cooperation and attitude E I __________________ __ “.1 Figure S.I Framework for marking process (adapted from Carroll & Payne, I976). Do identified Mitive processes account for the five functions ascribed to marks by society in general? A review of the ascribed functions which were presented in Chapter I, indicates that the functions may be classified into two general groups: one involves assumptions about marks related to conditions outside the classroom such as future counseling placement within the K-l2 program, future marks, and future job success; the other involves conditions inside the classroom (ecology) structure such as motivation, achievement and a teaching feedback function. The findings of this study indicate clearly that the judgment I68 processes of teachers (rules, strategies and cues) are focused on task completion which is bounded by the particular classroom and its immediate participants. The marking judgment processes of teachers, therefore, are not concerned about the functions ascribed to marks which are outside the classroom. 
Marking judgments primarily relate to task completion at a given level of difficulty and standard of mastery, and to the factors which promote that completion. Hence teachers define their marking responsibility in terms of the practical demands of 30 pupils in a classroom for a whole year.

Of the four methods of investigation used, is one superior for illuminating the marking process?

The four methods of process tracing, policy capturing, attribution theory and utility theory shed light on different levels of the marking model. Process tracing provided the broadest description of the marking judgment and supplied some part of the answer for each research question. It provided the most rules and cues used in the marking process across the year. Consistent with the process tracing discussion by Einhorn in Chapter III, the distinction between two subjudgment phases emerged: one which dealt with choices between multiple categories, A B C D E, and one which dealt primarily with a choice between any two categories. These phases were labeled procedural and contingency, and they provided the major divisions of the model. A definite weakness of process tracing was its inability to distinguish the various weights of factors in the judgment.

Policy capturing dealt best with the procedural questions, with the summative marks across the year and with teacher choices between multiple categories of marks. It answered research questions pertaining to combination rules across the year, leading to the conclusion that each marking period functions separately. Within policy capturing, different statistical techniques led to different results. For example, multiple regression tended toward a recency effect unless adjusted. Pearson correlations made a repeatedly strong case for a primacy effect. Partial correlations tended to adjust both techniques and supply a modified policy which led to a neutral position on recency and primacy effects.
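The difference between a Pearson correlation and a partial correlation, as contrasted above, can be illustrated with a short Python sketch. It uses the standard first-order partial correlation formula; the three lists of marks are hypothetical values invented for the illustration, not data from the study, and the sketch is not claimed to reproduce the analysis reported in Appendix D.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical numerical equivalents of marks (A=4 ... E=0) for eight pupils.
first = [4, 3, 3, 2, 1, 4, 2, 3]    # first marking
second = [4, 3, 2, 2, 1, 4, 3, 3]   # second marking
final = [4, 3, 3, 2, 2, 4, 3, 3]    # final marking

r_first_final = pearson(first, final)
r_first_second = pearson(first, second)
r_second_final = pearson(second, final)

# Zero-order correlation of the first and final marks, then the same relation
# with the second marking held constant.
print(round(r_first_final, 3))
print(round(partial_corr(r_first_final, r_first_second, r_second_final), 3))
```

A high zero-order correlation between the first and final marks may shrink once an intermediate marking is held constant, which is the sense in which partial correlations can temper an apparent primacy effect.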
This neutrality forced attention back to the significance of formative marks within the record book.

Attribution theory dealt well with the research questions regarding zones of uncertainty between any two categories. Protocol comments were categorized and counted, illustrating the general weighting of the categories of ability, effort, home support and task difficulty. Adjusted attribution charts showed effort to predominate in the judgment but always vying with ability. This substantiated Weiner's findings discussed in Chapter III, page 63. Policy capturing with statistical analysis did not get at these factors, but attribution theory with verbal analysis did. The findings were further supported by frequency distributions of pluses, minuses and checks, which showed that contingency situations tended to increase as the year progressed. Attribution theory, however, is oriented toward an individual psychology, and it misses some aspects of cooperative class behavior.

Utility theory filled in the class behavior gap. It too was concerned with contingency factors, particularly on-task behavior. Some teachers gave pluses and minuses in separate columns specifically for cooperative behavior. These columns were only consulted when a mark was determined to be in a zone of uncertainty. The decision tree tool illustrates teacher risks and thoughts when deciding to give a higher or lower grade. Utility theory is very concerned with estimating future effort or behavior, but not concerned with attributing cause on an individual basis.

It can be concluded that the research question was inappropriately phrased when it asked for a superior method. Instead, each method had strengths and weaknesses. Together they illustrated the total, year-long marking process with its emphasis on task completion. The four methods together led to the identification of a model of the cognitive processes involved in marking judgments.
Together they answered the research question relating to the five functions of marking by indicating that the validity of past research on marks must be questioned in light of its general limitation to single phases of a much larger judgment process and its general focus on functions outside the classroom. Only with a multimethod approach was the total process illustrated.

Implications for Research

Four outcomes of the study have implications for research: the importance of task completion as the primary unit of the performance-grade exchange; the classroom bounds of the marking process; the value of the multimethod approach to marking judgments; and the heuristic value of the model. These outcomes relate to research in different fields of education.

Task completion at a given level of difficulty and a given standard of mastery emerged as the primary judgment cue of teachers during the marking process. The factor of completion, or the filling in of columns across the teacher record book, appears to carry a heavier weight than the quality of the completed work. Two features substantiate this assertion: any work handed in receives some credit above E; and students operating at a lower than class average level of task difficulty can receive the same amount of credit. However, it is also notable that above the level of C, teachers begin to create more categories of distinction by the use of minuses and pluses. Note the frequency distribution charts of marks across the year. Hence the criterion of completion has greater weight below C, and the criterion of quality vies with completion above C. The criterion of completion is greater with students operating below grade level on task difficulty.

This overwhelming emphasis on task completion at both an individual and class level calls into question a prevalent notion that teachers mark students according to racial or socioeconomic characteristics as implied in some expectancy research.
The marking task at the end of a given time period appears to be based on different factors than those used in the prediction process at the beginning of a time period, most notably the factor of completion. The distinction between prediction and judgment has not been clarified in previous studies. The marking judgment relies directly on student task completion and indirectly on the classroom behavior which produces task completion more than it relies on identified student characteristics.

This overwhelming emphasis on completion also draws attention to the quality and quantity of the original tasks and the expectations assigned during planning. The current debate about the perceived rigor of private schools (Coleman, 1981) or of the effective public schools (Brookover and Lezotte, 1976) goes to the heart of the issue of assigned and completed tasks. Do teachers assign more tasks at a greater level of difficulty in effective schools? What factors influence the number, variety and quality of assigned tasks? The implication for research is that the teacher expectation studies need to have a student evaluation (marking) dimension.

The second factor which has implications for research is the bounded nature of the classroom. This may explain some of the previous unreliability of marks. The review of the marking literature indicates that most studies compared marks to functions outside the classroom such as future placement and future success. Current studies in teacher decision making and planning are finding that the classroom culture has its own demands which must be considered. The work of Doyle, in particular, emphasizes the ecological nature of the classroom. The planning studies of Yinger and Clark specifically found that the chief unit of planning was the task rather than behavioral objectives. The implications of this marking study are that future studies of marking must account for the bounded nature of the process.
Teacher decision-making research needs to examine the relationship between tasks and marking, between planning and marking and between time-on-task and weighting of tasks. To date teacher decision-making studies have emphasized the preactive and interactive phases of decision making, neglecting the postactive.

The multimethod approach to marking studies is promising and has implications for research. When tasks have been investigated in the past, only one task such as a test or paper has been examined. For example, the Starch and Elliott model of research asks a significant number of experts (100+) to correct one essay or test and concludes that marks are unreliable. This current study suggests that the reliability of one task is discounted by most elementary teachers, who collect a great number and variety of task data in their record books. In the future, research on the number, variety and weighting of assignments promises greater insights than replications of one-time task research.

The past habit of examining single products and generalizing the results to the marking process points to the role which the marking judgment model could play. In effect, it provides a framework for past marking studies which shows that some studies were entirely involved with the procedural level of marking, others with the contingency level. Either one alone does not account for the total marking process. Hence, the model places the value of past studies into a meaningful framework.

Implications for Practice

The outcomes of the study have implications for practitioners which are primarily related to the heuristic value of the model. Recalling Stenhouse's earlier concern that the use of research was to map the range of experience rather than to perceive the operation of laws within it, and to work through the refinement of judgment rather than the refinement of prediction, this marking study contributes to his goal.
The model can be used as a practitioner tool for reflecting upon aspects of the marking task. Practitioners can ask themselves what data they do collect for a mark. They can examine the quality and variety of their tasks and the extent to which some tasks may represent trivia or depth. They can reflect upon the interrelationships between various contingency factors and upon the relationship between procedural and contingency rules.

The importance of the home support category is cause for reflection. To what extent do practitioners rely upon the home for leverage? To what extent do they communicate their procedural rules to the home rather than being satisfied with the oft repeated combination rule statement that 90 to 100 is an A, 80 to 89 is a B, etc., which is only a very small aspect of marking? In this regard, there are obviously implications for the home. The role of the family in task completion is important and often neglected in discussions of educational accountability. School districts may need to articulate this role to parents and to reexamine the role of homework, which many parents actually request.

The fact that many teachers do not use the summative mark at the end of a marking period as a feedback mechanism needs discussion and further exploration. If teachers feel that a variety of tasks is important to reflect a range of student capabilities, then why do they not look at the summative mark which reflects this range as an important source of assessment? Why do they emphasize formative task feedback to the exclusion of summative feedback? There may be important instructional reasons why this is so, but at this time, the problem has not been addressed by practitioners or researchers.

Lastly, there are implications for teacher educators. The model provides the opportunity to discuss the framework for marks and the importance of some consistency between class activities, assigned tasks and weighted marks in the record book.
Rather than leaving the marking process as a last thought after instruction, it needs to be integrated into the entire instructional process. In particular, the potential use of summative marks as an additional source of feedback needs exploration.

Summary

This study was intended to generate a description of the judgment processes of five elementary teachers (152 students) during marking across a school year. The findings support a model of the marking judgment constructed from the strategies and cues which emerged through analysis of marks, record books and interviews. The model presents a three-phase process which is guided by procedural and contingency rules. Findings indicate that task completion is the primary focus of the judgment, with the criterion of completion having a variable weight in the judgment. The marking judgment is bounded by the classroom, a conclusion which suggests that many past marking studies have made assumptions about marks which are inappropriate to the teacher judgment process. The study found that formative marks serve as a feedback mechanism but that summative and final marks do not. The study was limited to five experienced teachers, hence any specific conclusions are highly tentative. The model, however, is useful as a heuristic to generate further discussion, deliberation and research hypotheses.

APPENDICES

APPENDIX A

INTERVIEWS

The primary means of collecting data was the structured, in-depth interview. Five elementary teachers were interviewed following three different marking periods (November, February, June). The first interview was the longest, taking from one and one-half to two hours. The format of the second and third interviews was much shorter, as may be seen. However, all interviews asked teachers for each pupil's mark, the predicted mark for the next marking and the reasons for the mark remaining, going up or going down.
INTERVIEW #1 - NOVEMBER
INTERVIEW #2 - FEBRUARY
INTERVIEW #3 - JUNE

Questionnaire
The Grading Process
Interview #1 - November

Introduction

The following questionnaire contains a variety of questions. They attempt to find out what processes and strategies you use to organize all of the work your students do into a single mark in a subject. At the end of the questionnaire, you may have some thoughts to add, and I would welcome them. In the interest of time, I shall go right ahead with the questions, indicating periodically which one we are on for the sake of a quicker review of the tape. Please feel free to comment at the end of the session.

Questionnaire

Section I

1. How many years have you taught?
2. How many years have you taught in this district? this school?
3. Have you ever used a marking system other than A, B, C symbols? (Probe: -Did you prefer it? -Strengths and weaknesses?)
4. Has the marking system in this school been changed or examined recently?
5. Are you satisfied with the report card format at this school? (Probe: -Do you have specific suggestions for change?)
6. How often do you mark cards? Are these times satisfactory? More times? Less?
7. Does the district give a day off school to complete the marking process? (Probe: -Do you find this useful? -Do you do any of the marking in advance?)
8. When you were in elementary school, how did you feel about marks?
9. Do you have a working philosophy about marking and where it fits into the whole educational picture?
10. Does your district have a working philosophy about grading? (Probe: -Do you agree with the symbol system as explained on the report card or do you have some different ideas?)

The following questions relate to your current class and its marks.

11. Which grade do you have this year?
12. How many children are in the class? (Probe: -Boys -Girls)
13. On a scale of one to five, to what degree are the ethnic backgrounds of your students similar? One represents little difference and five is great difference. (1 2 3 4 5) Question discarded
14. On a scale of one to five, to what extent is the general behavior of the students similar? One represents little diversity and five is great. (1 2 3 4 5) Question discarded
15. On a scale of one to five, to what extent is the achievement of students similar? One represents little difference or a rather homogeneous class, five represents great difference. (1 2 3 4 5) Question discarded
16. On a scale of one to five, what is the degree of parental involvement in school activities? (1 2 3 4 5) Question discarded
17. On a scale of one to five, what is the degree of parental interest in pupil progress? (1 2 3 4 5) Question discarded
18. What was the number of your students' parents who attended the recent conference?
19. On a scale of one to five, what is the degree of parental stability in the community, i.e., how many one-parent families? (1 2 3 4 5) Question discarded
20. How does the achievement level of this class compare to other classes which you have had? (Probe: -comparable, greater, less)
21. Does this class operate at grade level? (Probe: -How many above? -How many below? -How do you determine this, i.e., textbooks, reading level, etc.?)
22. Can a mark reflect the situation where an individual pupil is progressing but is still below grade level? (Probe: -How?)
23. Can a mark reflect the situation where an individual pupil is achieving much above grade level, but is not working very hard? How?

The following questions are focused around the record book and they are the heart of this study because we really know very little about the ways in which teachers organize the marking task.

24. How many marks are considered in this marking period?
25. What do these marks represent? (Probe: -Subject, tasks -Why were these specific assignments chosen?)
26. Do you plan these categories well in advance or do you wait until the task is over to decide whether it should be in a record book category? Discuss your process.
27. Are there activities which occur which you don't mark? (Probe: -Examples)
28. Do you find some subjects easier to mark than others? (Probe: -Which ones? -Why? -How does Math contrast with English?)
29. Could we now look over each student's grade at the end of the marking, and could you explain to me how you came to the final mark in each case? I am not interested in student names but I am interested in cases where it was a problem deciding which grade to give, i.e., students who fell between two grades.
30. Could you please predict whether each pupil will improve next time, probably remain the same or lose ground? [Recording columns: Grade; Predicted Grade; Factors Influencing]
31. Do you think there would be much agreement amongst teachers about various criteria to be considered in marking?
32. Do you feel it is useful to compare the marks which you give with student assessment scores or other standardized tests? (Probe: a. Have you ever done so? b. Do you record the scores?)
33. Do you feel that marks are related to student motivation? (Probe: a. In your class? How? b. In another class? How? c. Are marks ever a reward?)
34. As you look over this whole group of grades, how would you say this group is progressing? (Probe: -Are you satisfied? -How do you feel about those receiving less than C? -Do you feel that you can help them improve? -Do some need additional resources? Are these available?)
35. You have said that you are generally (satisfied - dissatisfied). Will you change any of your plans for the year? How? (Probe: -Will you use or change groups? -Will you add resources or new activities?)

General/Etic validation

36. Have you ever been questioned by parents about a student's mark? (Probe: (1) How often? (2) About what concerns?) How do you handle this situation? (1) Have you ever changed a mark for a parent?
37. Do students question their report card marks? (Probe: (1) How often? (2) What concerns them? (3) How do you handle this situation? (4) Have you ever changed a mark for a pupil?)
38. Has the principal ever questioned report card marks? (Probe: (1) How often? (2) What concerns him? (3) How do you handle this situation?)
39. Since I approached you about this study of the grading process, have you had any particular thoughts about the subject in general?

It is possible that as we go through the year, you will have some insights of your own on this topic which would be very important to me. I would like to leave you a small notebook and pen, and if random thoughts occur to you, would you please write them down and I'll pick them up when I come in the new year. My telephone number is 626-6252, and I would love to discuss this in more detail if you have any questions.

Interview #2 - February

1. In our previous interview, you mentioned that the final grade at the end of a marking period was arrived at by an averaging of marks in the book and not really a difficult decision. When a mark is computed to be half way between two grades, what factors influence your decision to go up or down?
2. You mentioned that you weight tests more heavily than other work. Could you clarify how you weight tests and why they are worth more than daily work?
3. Looking at a particular test, how do you decide which one will merit an A? That is, how do you set the standard for a given test? Before the class takes it or after you have seen results?
4. How would you interpret a test if the majority of the class failed it?
5. Would you discuss your feelings about the relationship between a test score and how much you feel the student has actually learned?
6. What do you feel about the number and timing of tests?
7. You mentioned that marks were a motivating factor. Why do you think this is so?
8. Do marks have the same motivational power at each marking period? Do they motivate amount of work or actual material learned?
9. Do marks motivate parents as well, or what is the relationship between home and school in regard to marks?
10. Consider the following situation: a year-end marking in which more than three quarters of the 30 students in Teacher X's class received C or D. What would you conclude? What additional information would you need to come to a conclusion?

Interview #3 - June

1. May I please record your final marks?
2. Would you please do two quick sorts for me?
   A. Please sort the class into two categories according to effort: those who put forth average to above average on a regular basis and those who put forth average and below.
   B. Please sort the class into two categories according to ability: those whom you perceive to be generally average to above average ability, those whom you perceive to be generally average to below average ability.
3. Your class averages above a B in language arts and in math. Some people think that classes should average a C. Could you discuss some of your reasons why your class is higher?
4. You have no (one) E's so nobody has failed. Do you have a theory about marks below C?
5. If you had to make a quick judgment about your own marking, which would you say carries the most weight in deciding a mark, the effort or the outcome? Why?
6. In several studies of classroom achievement, four factors were repeatedly mentioned by teachers as being regularly influential: ability, effort, task difficulty, luck. You have frequently mentioned ability and effort, but you have added several distractions such as lack of ability to concentrate, carelessness, low level of home support or divorce, and interest in socializing. How do these affect marks?

Would you mind a follow-up question during the summer? Phone number.
APPENDIX B

REPORT CARD 1
REPORT CARD 2

Report Card Which was Used in the School of Teachers One and Two

[Reproduction of an elementary school report card form. Recoverable labels: a letter-mark key including "C - Average"; columns for marking periods 1 through 4; a comments section; items on work habits such as using time wisely and completing assignments on time.]

Report Card Which was Used in the School of Teachers Three, Four and Five

[Reproduction of a report card form, Oakland County, Michigan. Recoverable labels: achievement key 1 - Outstanding Achievement, 2 - Satisfactory Achievement; letter key A - Excellent, B - Above Average, C - Average, D - Below Average, E; subject headings Language, Mathematics, Reading, Social Studies, Spelling, Science, Handwriting; Work Habits; Attendance; Comments.]

APPENDIX C

Attribution-Utility Coding Device

This instrument was the result of the categorizing of teacher comments taken from the transcribed interviews.

[Reproduction of the handwritten coding form.]

APPENDIX D

CASE STUDY CHARTS

The data contained in these tables were used to modify the original regression equation.
Similar tables for the composite case were included in the body of the text, with interpretation, as explanation of the adjustment process. For each of Teachers One, Two, Three, Four and Five the charts give: Original Regression Equation, Regression Tables, Pearson Correlations, Partials.

TEACHER ONE

Variable Weights Within the Final Mark
Teacher One

Dependent Variable = L5 Final Language Mark

Language Arts Variables          B       F      Sign.
Final Prediction      L4        .60     5.35    .030
Second Marking        L3        .34     3.5     .072
Second Prediction     L2        .066    .057    .812
First Marking         L1        .045    .029    .865

Note: Overall F = 41.97; Multiple R = .94; R Square = .87; Standard Deviation = 1.42

Dependent Variable = M5 Final Math Mark

Math Variables                   B       F      Sign.
Final Prediction      M4        .096    .073    .788
Second Marking        M3        .69     5.28    .030
Second Prediction     M2        .11     .145    .707
First Marking         M1        .16     .436    .515

Note: Overall F = 43.; Multiple R = .94; R Square = .87; Standard Deviation = 1.40

Note. These weights are derived through regression analysis of the numerical equivalents of grades of an upper elementary class across a year.

Correlations Among Markings and Predictions
Teacher One

Language Arts Variables         L1      L2      L3      L4      L5
First Marking         L1      1.000    .95     .73     .77     .75
Second Prediction     L2              1.000    .70     .75     .74
Second Marking        L3                      1.000    .93     .91
Final Prediction      L4                              1.000    .92
Final Marking         L5                                      1.000
P = .001

Mathematics Variables           M1      M2      M3      M4      M5
First Marking         M1      1.000    .94     .85     .85     .85
Second Prediction     M2              1.000    .34     .88     .84
Second Marking        M3                      1.000    .96     .93
Final Prediction      M4                              1.000    .91
Final Marking         M5                                      1.000
P = .001

Regression Equations:
L5 = .60L4 + .35L3 - Constant   (Sign. = .030) (Sign. = .072)
M5 = .80M3 + .28M2 - Constant   (Sign. = .000) (Sign. = .083)

Note. These correlations are based on the language arts and mathematics marks and predictions of an upper elementary class across one year.

Partial Correlation Coefficients
Teacher One

Language arts   L1   .03   [.439]
Note. L1 = First Actual Mark; L2 = Second Prediction; L3 = Second Actual Mark; L4 = Final Prediction; L5 = Final Mark

Mathematics     M1   .21   [.136]
Note. M1 = First Actual Mark; M2 = Second Prediction; M3 = Second Actual Mark; M4 = Final Prediction; M5 = Final Mark

Note. Partial correlation coefficients were derived from the numerical equivalents of the marks of a class of upper elementary students across a year.

TEACHER TWO

Variable Weights Within the Final Mark
Teacher Two

Dependent Variable = L5 Final Language Mark

Language Arts Variables          B       F      Sign.
Final Prediction      L4        .32     2.08    .162
Second Marking        L3        .34     2.74    .111
Second Prediction     L2        .040    .120    .732
First Marking         L1        .13     1.28    .268

Note: Overall F = 12.94; Multiple R = .83; R Square = .69; Standard Deviation = .90

Dependent Variable = M5 Final Math Mark

Math Variables                   B       F      Sign.
Final Prediction      M4        .40     4.91    .037
Second Marking        M3        .009    .011    .916
Second Prediction     M2        .052    .026    .613
First Marking         M1        .18     3.88    .061

Note: Overall F = 18.; Multiple R = .87; R Square = .75; Standard Deviation = .82

Note. These weights are derived through regression analysis of the numerical equivalents of grades of an upper elementary class across a year.

Correlations Among Markings and Predictions
Teacher Two

Language Arts Variables         L2      L3      L4      L5
First Marking         L1       .62     .67     .65     .68
Second Prediction     L2           [illegible] .57     .33
Second Marking        L3                       .83     .78
Final Prediction      L4                               .78
Final Marking         L5
P = .001

Mathematics Variables           M4      M5
First Marking         M1      [illegible]
Second Prediction     M2       .71     .74
Second Marking        M3       .74     .50
Final Prediction      M4               .32
Final Marking         M5
P = .001

Regression Equations:
L5 = .40L2 + .34L3 + .32L4 - Constant   (Sign. = .732) (Sign. = .111) (Sign. = .162)
M5 = .42M4 + .20M1 - Constant           (Sign. = .003) (Sign. = .007)

Note. These correlations are based on the language arts and mathematics marks and predictions of an upper elementary class across one year.
I93 Partial Correlation Coefficients Teacher Two [rILlLaT I 11 L .12 I°27LI flgtg. Ll I First Actual Mark L I Second Prediction L: I Second Actual Mark L4 I Final Prediction L5 I Final Mark 1' 2 3 1 M1 1 .55 [.001J Note. M I First Actual Mark I Second Prediction I Second Actual Mark - Final Prediction I Final Mark t’I‘UNF‘ Note. Partial correlation coefficients were derived from the numerical equivalents of the marks of a class of upper elementary students across a year. APPENDIX D CASE STUDY CHARTS TEACHER THREE I94 Variable Weights Within the Final Mark Teacher Three Dependent Variable I L5 Final Language Mark Language Arts Variables I 8 F Sign. Final Prediction l H; .031 .011 .916 Second Marking L3 38 2.75 .111 Second Prediction L2 1 .014 .03 .957 First Marking L1 1 .69 6.72 .017 Mote: Overall F I 53. Multiple R I .95 R Square I .90 Standard Deviation I 1.22 Dependent Variable I Ms Final Math Mark Math Variables 8 F Sign. Final Prediction "4 ~35 1-9 ~173 Second Marking "3 .14 .35 .556 Second Prediction "2 -.16 .52 . .477 First Marking "1 .46 10.9 .003 . Note: Overall F I 38. Muitipie R - .93 R Square I .87 Standard Deviation I 1.16 . Note. These weights are derived through regression analysis of the numerical equivalents of grades of an upper elementary class across a year. I95 Correlations Among Markings and Predictions Teacher Three Language Arts Variables L1 L2 L3 L, L5 First Harkin9 L1 .94 .82 .89 92 Second Prediction L2 .33 .37 .89 Second Marking L3 .93 .39 Final Prediction L4 .91 Final Marking L5 PI.001 Mathematics Variables "1 M2 M3 M4 M5 First Marking "1 89 .88 .84 91 Second Prediction M2 .86 .90 .84 fiCOfld Marking "3 .93 .89 Final Prediction M4 .87 . Final Marking M5 PI.001 Regression Equations: L5 I .69L1 + .381.3 - Constant (Sign. I .017) (Sign. I .111) M5 I .45141 + .38144 - Constant \ (Sign. I .000) (Sign. I .011) Note. 
These correlations are based on the language arts and mathematics marks and predictions of an upper elementary class across one year.

Partial Correlation Coefficients
Teacher Three

r(L2 L3) | L1 | .23 [partly illegible]

Note: L1 = First Actual Mark
      L2 = Second Prediction
      L3 = Second Actual Mark
      L4 = Final Prediction
      L5 = Final Mark

r(M2 M3) | M1 | [illegible]

Note: M1 = First Actual Mark
      M2 = Second Prediction
      M3 = Second Actual Mark
      M4 = Final Prediction
      M5 = Final Mark

Note. Partial correlation coefficients were derived from the numerical equivalents of the marks of a class of upper elementary students across a year.

APPENDIX D
CASE STUDY CHARTS
TEACHER FOUR

Variable Weights Within the Final Mark
Teacher Four

Dependent Variable = L5, Final Language Mark

Language Arts Variables           B       F      Sign.
Final Prediction       L4       -.55     1.68    .206
Second Marking         L3        .93     6.19    .019
Second Prediction      L2        .43     3.78    .062
First Marking          L1        .030    .035    .854

Note: Overall F = 36.
      Multiple R = .91
      R Square = .84
      Standard Deviation = 1.04

Dependent Variable = M5, Final Math Mark

Math Variables                    B       F      Sign.
Final Prediction       M4        .15     .63     .434
Second Marking         M3        .38     3.08    .091
Second Prediction      M2        .21     1.5     .227
First Marking          M1        .18     1.4     .243

Note: Overall F = 57.
      Multiple R = .95
      R Square = .89
      Standard Deviation = 1.05

Note. These weights are derived through regression analysis of the numerical equivalents of grades of an upper elementary class across a year.

Correlations Among Markings and Predictions
Teacher Four

Language Arts Variables          L1     L2     L3     L4     L5
First Marking          L1               .88    .85    .84    .82
Second Prediction      L2                      .91    .93    .88
Second Marking         L3                             .98    .90
Final Prediction       L4                                    .88
Final Marking          L5
P = .001

Mathematics Variables            M1     M2     M3     M4     M5
(Entries are illegible in the source.)
P = .001

Regression Equations:

L5 = .93L3 + .43L2 - Constant
     (Sign. = .019) (Sign. = .062)

M5 = .59M3 + .31M1 - Constant
     (Sign. = .000) (Sign. = .009)

Note. These correlations are based on the language arts and mathematics marks and predictions of an upper elementary class across one year.

Partial Correlation Coefficients
Teacher Four

r(L2 L3) | L1 | [illegible]

Note. L1 = First Actual Mark
      L2 = Second Prediction
      L3 = Second Actual Mark
      L4 = Final Prediction
      L5 = Final Mark

r(M2 M3) | M1 | .43 [.008]

Note. M1 = First Actual Mark
      M2 = Second Prediction
      M3 = Second Actual Mark
      M4 = Final Prediction
      M5 = Final Mark

Note. Partial correlation coefficients were derived from the numerical equivalents of the marks of a class of upper elementary students across a year.

APPENDIX D
CASE STUDY CHARTS
TEACHER FIVE

Variable Weights Within the Final Mark
Teacher Five

Dependent Variable = L5, Final Language Mark

Language Arts Variables           B       F      Sign.
Final Prediction       L4        .52     3.64    .067
Second Marking         L3        .090    .119    .732
Second Prediction      L2        .42     8.1     .008
First Marking          L1       -.08     .016    .684

Note: Overall F = 43.
      Multiple R = .93
      R Square = .87
      Standard Deviation = 1.06

Dependent Variable = M5, Final Math Mark

Math Variables                    B       F      Sign.
Final Prediction       M4        .56     7.64    .010
Second Marking         M3        .45     4.3     .048
Second Prediction      M2       -.122    .49     .488
First Marking          M1        .048    .065    .799

Note: Overall F = 29.
      Multiple R = .90
      R Square = .81
      Standard Deviation = 1.20

Note. These weights are derived through regression analysis of the numerical equivalents of grades of an upper elementary class across a year.

Correlations Among Markings and Predictions
Teacher Five

Language Arts Variables          L1     L2     L3     L4     L5
First Marking          L1               .90    .92    .92    .87
Second Prediction      L2                      .85    .87    .90
Second Marking         L3                             .95    .88
Final Prediction       L4                                    .91
Final Marking          L5
P = .001

Mathematics Variables            M1     M2     M3     M4     M5
First Marking          M1               .78    .76    .69    .66
Second Prediction      M2                      .74    .71    .63
Second Marking         M3                             .90    .87
Final Prediction       M4                                    .88
Final Marking          M5
P = .001

Regression Equations:

L5 = .90L3 + .52L4 + .42L2 - Constant
     (Sign. = .732) (Sign. = .067) (Sign. = .008)

M5 = .53M4 + .42M3 - Constant
     (Sign. = .010) (Sign. = .035)

Note. These correlations are based on the language arts and mathematics marks and predictions of an upper elementary class across one year.

Partial Correlation Coefficients
Teacher Five

r(L2 L3) | L1 | .13 [illegible]

Note. L1 = First Actual Mark
      L2 = Second Prediction
      L3 = Second Actual Mark
      L4 = Final Prediction
      L5 = Final Mark

r(M2 M3) | M1 | .79 [.001]

Note. M1 = First Actual Mark
      M2 = Second Prediction
      M3 = Second Actual Mark
      M4 = Final Prediction
      M5 = Final Mark

Note. Partial correlation coefficients were derived from the numerical equivalents of the marks of a class of upper elementary students across a year.

APPENDIX E
COMPOSITE REGRESSION PLOTS
LANGUAGE
MATHEMATICS

Composite Regression Plots - Language
[Scatterplots of the language arts marks and predictions; not recoverable from the source text.]

Composite Regression Plots - Mathematics
[Scatterplots of the mathematics marks and predictions; not recoverable from the source text.]

APPENDIX F
COMPOSITE CROSS TABULATION

Composite Cross Tabulations
All Teachers

                      ABILITY
EFFORT          High     Low     Total
High              93      24       117
Low                9      24        33
Total            102      48       150