This is to certify that the dissertation entitled "Sources of Error in Measures of Time Allocation in the Classroom" presented by David J. Solomon has been accepted towards fulfillment of the requirements for the Ph.D. degree in Educational Psychology.

Major professor

Date: August 1, 1983


SOURCES OF ERROR IN MEASURING TIME ALLOCATION IN ELEMENTARY CLASSROOMS

By

David J. Solomon

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Educational Psychology

1983


ABSTRACT

SOURCES OF ERROR IN MEASURING TIME ALLOCATION IN ELEMENTARY CLASSROOMS

By

David J. Solomon

This dissertation investigates error in measuring classroom time allocation within the framework of the Harnischfeger and Wiley (1976) model of school learning. First, a generalizability study was done in which the facets were measures (teacher logs versus observer field notes), classes, days and students. Second, the reliability of a coding procedure was assessed. Third, the error in teacher logs was modeled. Both the teacher and an outside observer recorded the activities of each student in six classes for eight days over a three month period. Two separate individuals each coded the observer descriptions for two of the classes for four days.
The variance for each of the facets in the generalizability study, as well as for their interactions, was computed. These variance components were used to estimate the reliability of a number of data collection designs. Mean differences and mean absolute value differences were used to assess the consistency of the multiple codings. Error in teacher recorded pursuit length was partitioned into a fixed bias, a random bias and random error based on a measurement model developed by Schmidt (1981). The error in categorizing pursuits based on teacher logs was evaluated.

The results indicated that increasing the number of days observed was the most powerful approach for improving the reliability of time allocation studies. However, the use of observers rather than teachers as a data source also resulted in a substantial improvement. Increasing the number of students observed within a class resulted in little or no improvement. The multiple codings of the same observer notes were found to be reasonably consistent. The error in teacher estimates of pursuit length was mainly random. The pursuit records coded from observer and teacher descriptions were fairly consistent in terms of how subject matters were categorized, with the exception of teacher supervision.

The results suggest that teachers can collect reliable data for measuring differences among classrooms in time allocation. If the focus is on measuring differences among students within a class, or if the time categories are narrowly defined as in the Harnischfeger and Wiley model, it may not be possible to obtain acceptable levels of reliability without resorting to observers and/or extremely large research designs.


ACKNOWLEDGMENTS

I would like to thank my wife, Carolyn Solomon, for the many hours of editorial assistance she provided as well as patience beyond the call of duty prior to my orals. I would also like to thank my advisor, Dr. William Schmidt, and committee member, Dr. Robert Floden, for the help and constructive criticism they provided. In addition I would like to thank my employer, Touchstone Applied Science Associates, for the use of their data processing equipment.


TABLE OF CONTENTS

LIST OF TABLES . . . v

CHAPTER ONE: INTRODUCTION . . . 1
  Purpose . . . 2
  Measurement of Pupil Pursuits . . . 3

CHAPTER TWO: REVIEW OF THE LITERATURE . . . 7
  The Harnischfeger and Wiley Model . . . 8
  The Relationship Between Time and Learning . . . 11
  Engagement Rates . . . 17
  The Reliability of Teacher Recorded Data . . . 24

CHAPTER THREE: THE DESIGN OF THE STUDY . . . 28
  Research Questions . . . 29
  Sample . . . 31
  Data Collection Procedures . . . 31
  Estimation of Variance Components . . . 32
  Procedures with Time per Day per Student as Unit . . . 34
  Method Used to Assess Error at the Pursuit Level . . . 36
  Coding Procedures . . . 37
  The Analysis of Coded Pursuit Records . . . 40
  The Analysis of the Categorization of Pursuits . . . 42
  Analysis of Errors in Pursuit Length . . . 43
  Summary . . . 45

CHAPTER FOUR: THE RESULTS . . . 48
  Estimation of Variance Components . . . 49
  Reliability of the Coding Procedures . . . 58
  The Consistency of Log and Observation Estimates . . . 68
  Individual Pursuit Level Analysis . . . 83
  Pursuit Level Analyses . . . 85
  Discrepancies in Pursuit Coding . . . 86
  Discrepancies in the Length of Pursuits . . . 97

CHAPTER FIVE: CONCLUSIONS AND IMPLICATIONS FOR MEASURING TIME ON TASK . . . 116
  Summary of the Results . . . 116
  Conclusions . . . 124
  Coder Reliability . . . 125
  The Generalizability of Measures of Time Allocation . . . 126
  The Accuracy of Teacher Logs . . . 127

APPENDIX A . . . 132
APPENDIX B . . . 143
APPENDIX C . . . 154
BIBLIOGRAPHY . . . 157


LIST OF TABLES

1. Correlation of Time and Learning . . . 12
2. Lower Elementary (Grades 2 and 3) N=33 . . . 19
3. Upper Elementary (Grades 4 and 5) N=62 . . . 19
4. Pursuit Record Data Elements . . . 38
5. Variance Components of Measures of Five Subjects and Transitions . . . 50
6. Generalizability Coefficients for Measuring Time Allocation . . . 56
7. Comparison of Time Measures from Multiple Codings . . . 60
8. Activity Time Measure Differences . . . 70
9. Ratio of Time Measure Differences Over Sources of Variance . . . 72
10. Differences Among Classes in Pursuit Categorization . . . 87
11. Categorization of Activity Type from Logs and Observations . . . 88
12. Categorization of Activity Type from Logs and Observations by Class . . . 89
13. Categorization of Group Type from Logs and Observations . . . 92
14. Categorization of Group from Logs and Observations by Class . . . 93
15. Categorization of Supervision from Logs and Observations . . . 95
16. Categorization of Supervision from Logs and Observations . . . 96
17. Sources of Error in Pursuit Length Coded from Teacher Logs by Class . . . 99
18. Bias at Standard Pursuit Lengths by Class . . . 101
19. Bias and Error Components of Measurement Error Variance by Classroom . . . 104
20. Sources of Error in Pursuit Length Coded from Teacher Logs by Activity . . . 106
21. Bias at Standard Pursuit Lengths by Activity . . . 107
22. Bias and Error Components of Measurement Error Variance by Activity . . . 110


CHAPTER ONE

INTRODUCTION

One of the most critical roles a teacher commands is the determination of pupil activities. The decisions teachers make about how pupil time is allocated to various subject matters and in what types of settings greatly influence what students learn. With this in mind, student activities seem a logical focal point for educational research. A growing amount of educational research is in fact now focusing on student activities, especially one aspect of it called "time on task." Stallings (1980) has called time on task one of the most useful variables to emerge from research on teaching in the 1970's. Much of the research to date has shown a substantial relationship between the amount of time allocated to a subject matter and achievement in that area (Wiley, 1976; Fisher, 1976; Stallings, 1980; Schmidt, 1981).

There are also many other aspects of student activities that, in conjunction with subject matter, affect what children learn in school. Harnischfeger and Wiley (1976), in their model of school learning, focused on instructional grouping (whole group, subgroup and individual) and type of supervision (whether directly supervised by the teacher or not). Another dimension is the extent to which a single student activity integrates a number of different subject matters. Language arts instruction, for example, offers a unique potential for integrating two or more curricula. Reading and writing are skills that must be taught using some content. In order to read or write, one must read or write something.
That something can be another subject matter such as science or social studies allowing a student to simultaneously learn another subject matter area. Although integration has been talked about for over fifty years (Symonds, 1930), research is needed on the extent to which student pursuits are actually integrated in classrooms and if this approach is indeed effective. There are many other interesting questions in educa- tional research that the study of pupil activities can address. Allington (1980) found the time students spend in school reading was positively related to their reading ability. Research on the relationship between teachers' perceptions of students' abilities and the time students spend in different pursuits could help explain why the variation in achievement between students grows as they progress through school. PURPOSE -Given the interest and growing evidence of the importance of time allocation in the classroom, research is 3 needed on how best to measure this variable. Collecting data on how time is spent by students in school can be a complex and expensive process. The Language Arts Project of the Institute for Research on Teaching has developed a set of procedures for collecting time allocation data at the student level within the framework of a model of school learning developed by Harnischfeger and Wiley (1976). The purpose of this dissertation is to evaluate various aspects of this procedure with the hope of determining more accurate and efficient methods of collecting student time allocation data. Two basic methods were used to collect descriptions of student time allocation. The first were logs recorded by the teacher throughout the school day. The second method was structured field observations. MEASUREMENT OF PUPIL PURSUITS The procedures developed by the Language Arts Project to collect time allocation data consisted of two stages. First a description of student activities and the times they occur were to be obtained. Secondly, these descrip- tions were to be coded into pupil pursuit records with the beginning and ending time as well as codes indicating the subject matter, group type and supervision of the activity. Errors can occur in both describing the activities and coding them into pupil pursuit records. There are a number of ways descriptions of student 4 classroom activities can be obtained. The use of observer recorded structured field notes and teacher recorded logs are evaluated by this study. It is hypothesized that observers are more accurate than teachers in that they can focus all their efforts on data collection while the teacher's main job is teaching that at times can take all of his/her effort. A teacher is likely to be recording pupil pursuits and the time they occur after they happen especially when the classroom situation is hectic. The teacher in some cases would have to estimate starting and ending times and could forget to record activities. An observer focusing his/her whole attention on data collec- tion is less likely to miss activities or have to guess as to their beginning and ending tbmes. Using observer transcripts as descriptions of classroom activities generally would be more expensive than using teacher logs. The costs of using teachers or observers for obtaining student activity descriptions is roughly proportional to the number of days data is collected. The second stage of measuring the time students spend in different pursuits is coding descriptions of student activities into pupil pursuit records. 
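To make the second stage concrete, the sketch below shows one possible representation of a coded pupil pursuit record. The field names and category values are illustrative assumptions, not the project's actual coding scheme, which is presented in appendix B.

from dataclasses import dataclass

# Hypothetical representation of a coded pupil pursuit record (illustration only;
# the actual coding scheme used by the Language Arts Project is given in appendix B).
@dataclass
class PursuitRecord:
    student_id: str
    class_id: str
    date: str           # school day, e.g. "1978-04-12"
    start_time: str     # wall-clock time the activity began, e.g. "09:05"
    end_time: str       # wall-clock time the activity ended, e.g. "09:30"
    subject: str        # e.g. "reading", "math", "transition"
    group_type: str     # "whole_class", "subgroup", or "individual"
    supervised: bool    # whether the teacher directly supervised the activity

    def minutes(self) -> int:
        """Length of the pursuit in minutes, computed from the recorded clock times."""
        h1, m1 = map(int, self.start_time.split(":"))
        h2, m2 = map(int, self.end_time.split(":"))
        return (h2 * 60 + m2) - (h1 * 60 + m1)

# Example: a teacher-supervised, whole-class reading lesson from 9:05 to 9:30.
lesson = PursuitRecord("S07", "C2", "1978-04-12", "09:05", "09:30",
                       "reading", "whole_class", True)
assert lesson.minutes() == 25

Summing such records over a school day within each category yields the minutes-per-day-per-student measures analyzed in chapter three.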
As was stated above these records contain a beginning and ending time for the pursuit and codes categorizing the pursuit in terms of subject matter, group type and teacher supervision. It is hypothesized that coding error mainly consists of improperly categorizing pupil pursuits on one or more of 5 the dimensions. The beginning and ending times are contained in the descriptions and it is expected that it would be rare for coders to incorrectly c0py them. Also this type of error is easily caught via computer error checks since it results in gaps or overlaps in pupil pursuit records. Coder accuracy is probably to a large extent deter- mined by the categorization scheme and the coder's training in its use. The categorization of any phenomenon is always to some extent artificial and ambiguities usually exist. Harnischfeger and Wiley's model defines the categories for group type and type of supervision but leaves it up to individual researchers on how to categorize subject matter. It is hypothesized that the greater the number of subject matter categories and the more detailed they are, the more difficult it would be to have consistency across coders. Assuming reasonable care is used in developing a coding scheme and training coders, it is hypothesized that the coding process would introduce relatively small amounts of error compared to the error from.descriptions of pupil activities. The amount of funds available for data collection in research studies is always limited. In determining the most appropriate use of resources for data collection in time allocation studies it is necessary to balance the cost of increasing the accuracy of the data on each class- room with the cost of collecting data on as many classrooms 6 as possible which increases generalizability and reduces sampling error. To make prudent decisions as to the best data collection design for answering a given set of research questions, researchers using the Harnischfeger and Wiley model as a paradigm need as much information as possible as to the amount of error introduced by various approaches as well as which types of student activities are most difficult to record. CHAPTER TWO REVIEW OF THE LITERATURE Harnischfeger and Wiley's model of school learning (1976) has provided the framework for the data collection procedures this paper is assessing. The first section of this chapter presents their model. The model suggests that the amount of time allotted for students to learn different subject matters under different learning settings is the single most important determiner of school achieve- ment. The second section of this chapter reviews the literature relating time on task and achievement. As was stated in chapter one, most theorists believe it is only the allotted time in school that a student spends actively attempting to learn that promotes achievement. The third section of this chapter reviews the research on engagement rates and the relationship between allocated time and engaged time in the classroom. The major focus of this study is investigating the nature and extent of errors in teacher recorded data on student time allocation. The fourth section of this chapter will discuss the research on the accuracy of teacher recorded information on class- room activities. 8 THE HARNISCHFEGER AND WILEY MODEL Harnischfeger and Wiley's model is based on two assumptions. 
The first is that the most important factor determining student achievement on a topic is the total amount of time the student spends actively attempting to learn the topic. The second is that there are large differences in the total allocated learning time students receive in various curricula under different learning settings. Given these two assumptions, the model focuses on student classroom activities. Harnischfeger and Wiley strongly believe it is only by shaping pupil activities that factors such as curricula and teacher actions influence school learning. There are a number of factors that control the time students spend learning different topics. The state government usually sets the number of school days in a year while the school administration controls the length of the school day, breaks such as lunch and recess as well as to some extent the curriculum. Within these limits the classroom teacher allocates time to different subject matters. The teacher also controls the structure of the learning setting under which students receive instruction on different topics. The students of course decide the amount of effort that they put into mastering a topic although influenced by teacher's actions and classroom setting. The crucial variable in Harnischfeger and Wiley's 9 model is the time students spend in what they term pursuits. A pupil pursuit is defined as the intersection of three dimensions. These dimensions are subject matter, group type broken into whole class, subgroups and individual activities and whether or not the activity is directly supervised by the teacher. The model is focused on these dimensions because Harnischfeger and Wiley believe that these are the primary dimensions along which teachers organize classroom activities. The students in a class move from activity to activity throughout the school day, usually with a transition period in between activities. It is these individual student activities that form the basic unit of pupil pursuits. This model focuses on the teacher's role of allocating scarce resources. The resources are the limited amount of time in school children have to learn different subject matters and the teachers' limited time he/she must distribute among twenty or thirty students in the class. Harnischfeger and Wiley acknowledge that teachers perform many important roles in facilitating student learning other than allocating resources. They explain material, motivate students as well as provide them with feedback as to their performance. The model focuses on time allocation because Harnischfeger and Wiley feel that is the most powerful approach to improving student achieve- ment. The basic premise of this model is that pupil pursuits lO mediate the effects of such things as curricula, administra- tive policy and teacher actions in promoting student learning. Given this assertion, two broad types of research questions are suggested by the model. The first is what are the effects of different types of policy decisions and teacher practices on pupil pursuits. For example, how much academic learning time is gained by increasing the school day by an hour? Do open versus traditional classrooms result in a large increase in transition time? The second type of research question is what kinds of pupil pursuits seem to work best for which types of students. For example, is working in a small group on an experiment an effective way of teaching the scientific method to elementary school children? 
Is it necessary to have the teacher directly supervise and pro- vide immediate feedback to students learning beginning reading skills? In answering both types of research questions suggested by the model, it is necessary to measure the time students spend in different pursuits. The major focus of this dissertation is to determine whether teacher logs can provide a reasonably accurate though economical approach to collecting classroom time allocation data as compared with the use of classroom observers. 11 THE RELATIONSHIP BETWEEN TIME AND LEARNING It seems inherently obvious that exposure to a given subject matter is a precondition for mastery of the skills and content contained as part of that subject matter. There are probably many other aspects of both the learner and the instructional setting that affect the extent of what students learn in school, however exposure seems necessary. For years researchers have been struck by the tremendous differences in the time students receive in different topics (Borg, 1980). This section discusses some of the empirical research that has assessed the relationship between time and learning. The majority of studies investigating the relationship between time and achievement have found a positive relation- ship though the size of this relationship has varied considerably. These differences are likely due to both sampling error in the individual studies as well as the wide range of methods, conditions, subject and content areas used. This section will begin by discussing a review of the literature relating time and learning. This will be followed by descriptions of three large scale studies that control for background variables and measure achieve- ment at the student level while measuring time allocations at the school level. Fredrick and Walberg (1980) reviewed approximately fifty studies relating time and learning. They categorized 12 the studies into those measuring instructional time in years, days, hours, and minutes. They found a consistent though moderate relationship in all four categories of studies. The correlation ranges are shown in Table 1. Table 1 Correlation of Time and Learning Category Low High Years of instruction .26 .71 Days of instruction .36 .69 Hours of instruction .13 .59 Minutes of instruction .15 .53 As one might expect, controlling for social class depressed the correlations in a number of studies. The authors also make the point that a number of studies found the log of the instructional time tended to be a better predictor of achievement than the actual time. This suggests increases in achievement with increases in time spent on particular topics may drop off after some point. Wiley (1976) performed a hierarchial analysis relating quantity of schooling to achievement in math, reading, and verbal ability controlling for student background variables. The analysis was hierarchial in the sense that student achievement and background were measured at the student level while quantity of schooling was measured at the school level. Wiley used a portion of the Equality of 13 Educational Opportunity Report (EEOR) data consisting of 2,558 sixth grade students from 40 central city Detroit schools. The achievement measures for verbal ability consisted of sentence completion and synonym subtests from Educational Testing Services School and College Ability Test series. The math and reading ability measures were from Educational Testing Services Sequential Test of Educational Progress. 
The measure of quantity of schooling was the log of the triple product of the average daily attendance rate, length of school day and number of school days per year. The student background variables consisted of race, number of siblings and the number of certain types of possessions in the home. Wiley found quantity of schooling had a large effect on achievement in all three areas. From the path coefficients he obtained (4.88, 9.76, and 11.12 with standard errors of 1.62, 2.80, and 3.00 respectively for math, reading, and verbal), Wiley concluded a 24% increase in the quantity of schooling would result in a 34%, 65%, and 34% increase in verbal, reading and math scores respectively. Karweit (1976a, 1976b) performed a set of analyses similar to Wiley's. She used 30 sdhools from the suburban Detroit area. Using the same analysis procedures as Wiley, she found nonsignificant (p % .05) effects for quantity of schooling on all three measures (math, reading, and verbal ability). She also analyzed EEOR data from a number of 14 other cities (Philadelphia, Milwaukee, Washingtin, D.C., Cleveland, and Baltimore). The effects she found for quantity of schooling in the inner city schools were positive though smaller than their standard errors for all three dependent variables. The effect for quantity of schooling on achievement in the suburban schools was negative though nonsignificant (p k .05). Karweit (1976a) also performed similar analysis on a number of other sets of data without finding effects anywhere near as large as Wiley's. Schmidt (1981) assessed the relationship between achievement and the number of hours of high school instruc- tion in six curricular areas controlling for ability and student background. He used a national sample of 9,195 students in 725 schools from the graduating class of 1972. The data was collected as part of the National Longitudinal Study (NLS). The achievement measures were a vocabulary test of synonyms, a reading comprehension test, and a mathematics test. The ability measures consisted of a test of associative memory, a test of inductive reasoning and a test of perceptual speed. All six ability and achievement tests were developed by the NLS. The background variables consisted of sex, race (white, nonwhite) and a composite SES measure created from parent education, income, father's occupation and the possession of certain household items. The quantity of schooling in six subject areas (science, social studies, foreign language, English, math, 15 and fine arts) was computed for each subject matter using the number of semesters taken in each subject matter and the instructional time received during a semester. ACT test battery scores were available for a subsample of 1,421 of the students and the English, social studies, math and science subscales were used as additional measures of achievement for the subsample of students that had taken this battery. Schmidt found that quantity of schooling had a clear positive effect on achievement controlling for ability and student background. This was true for both NLS and ACT achievement measures. As one might expect the relationship was strongest for time spent in classes closely related to the test material. The one exception was a strong rela- tionship between foreign language exposure and all the achievement measures. 
Schmidt hypothesizes that ability was not adequately controlled for by the measures used, and time in foreign language classes was acting as a proxy for ability since college bound students tend to take more classes in foreign languages. The effects ranged from about two to four percentage points for every 100 hours of instruction (approximately one semester). The largest effects were for math achievement. The divergent findings of Karweit as opposed to Wiley and Schmidt are somewhat puzzling given their similar nmmhodology. Though Wiley's sample size of forty schools is relatively small, the size of the effects for quantity 16 of time were at least two and as much as three and a half times larger their standard errors. Karweit's Detroit sample of thirty schools was quite small and the absence of significant effects for quantity of schooling in that study could be explained by sampling error. This however was not the case for the study using a number of other large metropolitan areas. It is interesting that the same pattern of results was found for Detroit as the other metropolitan areas in terms of the sign of the relation- ship between quantity of schooling and learning in the inner city as opposed to the suburbs. Quantity of school- ing at least as it was measured by Karweit and Wiley tends to be positively related to learning in the relatively low SES inner city while negatively related to achievement in the relatively high SES suburbs. This suggests that increased time in school may be an important factor for children in the inner city schools who are less likely to get exposure to the skills and content contained in school curriculum at home than their suburban counterparts. Schmidt's findings on this question were quite different than Karweit's and Wiley's. Schmidt examined the impact of quantity of schooling on achievement for students from six types of schools. Schools with high percentage of minority and low income students versus other schools in three size categories, less than 300 students, 300 to 600 students and above 600 students. The pattern of results he found was inixed with a strong tendency for school size to interact n: F-l 17 with SES in terms of the impact of quantity of schooling on achievement. In some categories, for some types of achievement, quantity of schooling did seem to have a greater impact on low SES students. In other cases, the most striking being the impact of quantity of schooling in mathematics on mathematics achievement, the impact was much greater for students from schools with a high percen- tage of low SES students. Schmidt's study differed from.Wiley's and Karweit's among other things in the way time was measured. He used measures of the time spent in specific subject areas as opposed to the total time in school. This better match between content of exposure and the achievement measures used might explain Why he found significant effect for quantity of schooling while Karweit did not. ENGAGEMENT RATES This section reviews a number of studies on engagement rates or the proportion of allocated instructional time in school a student actually spends attempting to learn. Common sense suggests that it is only during the portion of allocated time a student spends on task that learning can occur. There is also evidence that there is a stronger relationship between engaged time and achievement than between allocated time and achievement (Borg, 1980). 
Despite this, the time allocated to various types of instruction in schools is an important variable in educational research for a number of reasons. First, it places a ceiling on the total amount of engaged time a student can receive. Second, allocated time can be directly affected by policy changes, while engaged time is more directly under student control and can only be indirectly affected by teacher actions. Third, as will be shown in this section, naturally occurring variations in allocated time seem to have at least as great an impact on engaged time as naturally occurring variations in engagement rates.

Karweit and Slavin (1981) collected data on the scheduled, instructional and engaged time in mathematics for the classrooms of twelve teachers. Six students in each classroom were observed for ten days. The students' activities were recorded every thirty seconds during math instruction. The mathematics computations, concepts and applications subscales of the Comprehensive Test of Basic Skills (CTBS) were used as pre and post measures of math ability. The means, standard deviations and correlations among the variables for the lower and upper elementary classes in the study are given in Table 2 and Table 3.

Table 2
Lower Elementary (Grades 2 and 3), N = 33

             Post Test   Pre Test    Scheduled   Instruct.   Eng. Min.   Eng. Rate
Post Test    63.7(17.9)
Pre Test     .91         62.3(17.6)
Scheduled    .30         .23         94.1(11.5)
Instruct.    .37         .29         .91         80.1(10.8)
Eng. Min.    .42         .30         .87         .90         73.8(11.8)
Eng. Rate    .37         .22         .19         .42         .64         .78(.07)

Table 3
Upper Elementary (Grades 4 and 5), N = 62

             Post Test   Pre Test    Scheduled   Instruct.   Eng. Min.   Eng. Rate
Post Test    56.6(19.5)
Pre Test     .89         51.2(21.3)
Scheduled    .41         .45         97.8(14.6)
Instruct.    .36         .39         .97         84.9(14.5)
Eng. Min.    .42         .43         .85         .89         75.9(14.6)
Eng. Rate    .19         .15         .15         .19         .62         .78(.08)

Note. Diagonal entries are means with standard deviations in parentheses; off-diagonal entries are correlations.

The high intercorrelation among the three measures of time (scheduled, actual, and engaged), as well as the equivalence of their relationships to the achievement measure, suggests that collecting data on engaged time or even allotted time may not be necessary in time on task research. Scheduled time, which is generally much cheaper and easier to obtain, seems to be an acceptable alternative, at least in this small scale study, for estimating the relationship between time and learning. It is true, however, that scheduled time is a substantial overestimate of the actual time a student spends engaged in learning a topic. If the difference between scheduled and engaged time were a fixed constant that was fairly stable across teachers, scheduled time could provide an acceptable substitute measure for the absolute amount of engaged time after applying a correction factor. The data from Karweit and Slavin suggest this is not the case. Across the twelve teachers the ratio of engaged time to scheduled time ranged from .42 to .81 with a mean of .67 and a standard deviation of .11.

Karweit and Slavin also found positive correlations between engagement rate and the three time measures in both upper and lower elementary classes. The Beginning Teacher Evaluation Study (BTES) also found similar results, as will be discussed below. This is encouraging in suggesting that increasing instructional time may not lead to lower engagement rates, at least within the range of variation found among the classrooms in this small scale study.
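The correction-factor point made above can be illustrated with a small numerical sketch. The per-teacher ratios of engaged time to scheduled time below are hypothetical values chosen only to span roughly the range Karweit and Slavin report (.42 to .81); neither the ratios nor the scheduled minutes are data from any study discussed here.

# Hypothetical illustration: why a single global correction factor is a rough
# substitute for engaged time when the engaged/scheduled ratio varies by teacher.
scheduled_minutes = 90  # scheduled math time for one day (assumed value)

# Assumed per-teacher ratios of engaged time to scheduled time (illustrative only).
ratios = [0.42, 0.55, 0.61, 0.67, 0.72, 0.81]
global_factor = sum(ratios) / len(ratios)          # one "average" correction factor

for r in ratios:
    true_engaged = scheduled_minutes * r           # what this class actually receives
    corrected = scheduled_minutes * global_factor  # scheduled time after the one correction
    print(f"ratio {r:.2f}: engaged {true_engaged:.0f} min, "
          f"corrected estimate {corrected:.0f} min, "
          f"error {corrected - true_engaged:+.0f} min")

With ratios this far apart, a single correction factor misses the engaged time of individual classrooms by twenty minutes or more per day in this sketch, which is the sense in which scheduled time cannot simply be rescaled into engaged time.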
Allocated time, engaged time, and engagement rates in reading and mathematics among second and fifth grade students were recorded in the Beginning Teacher Evaluation Study using a procedure similar to Karweit and Slavin's. Six students of average ability within each class were observed by a BTES staff member. The results were quite similar to those of Karweit and Slavin. These results suggest that within the naturally occurring variation in allocated time, there is a zero to slightly positive relationship between allocated time and engagement rates. As stated above, this suggests that increasing allocated time in math and reading does not result in lower engagement rates due to fatigue or boredom. The BTES study also found engagement rates ranging from about .70 to .75, which is consistent with the findings of Karweit and Slavin.

Borg (1980) reviewed a number of studies on engagement rates done in the 1920's and 1930's. The engagement rates observed then were somewhat higher than those observed in the studies discussed above, ranging from .80 to .98. Borg also discusses the question of how well an observer who can only assess outward signs of attention can judge whether a pupil is actually engaged in learning. He cites Bloom (1976), who made sound recordings of classroom activities and asked students, while they listened to the recordings, what their thoughts had been at that time. Approximately 65% of the students' thoughts during lecture were related to the lecture topic, while 55% of the students' thoughts during discussion were on the discussion. This suggests observers' reports of engaged time and engagement rates might be somewhat inflated. This is not surprising, given that a student could seem to an observer to be paying attention while he or she was actually daydreaming.

It was stated at the beginning of this section that naturally occurring variations in allocated time seem to have as great if not a greater impact on engaged time as engagement rates. In the studies discussed above, engagement rates have been found as low as 55% and as high as 98%. This implies that when the engagement rate is at the very low end of this range, it would take approximately twice as much allocated time in a given instructional area for students to receive a given amount of engaged time as when the engagement rate is at the high end of the range. Obviously, changes in engagement rates of this magnitude can have a large impact on the engaged time students experience in different topics. The variability of allocated time in various subject matters suggests it has an even greater impact on the engaged time students receive in those subject matters. An analysis of the time allocation data from the six classrooms this study is based upon found that the classes with the highest amount of allocated time in each of five major subject matter areas (language arts, reading, math, social studies, and science) spent at least twice as much time in that subject matter area as the classes with the lowest amount of time in that area. In the case of science, the class with the most allocated time spent approximately 50 times as much time in science as the class with the least allocated time in science. Although this study was done on a small number of classes, the results are consistent with what has been found by other researchers (Borg, 1980; Mann, 1928).
Even if the classes where students received the most instruction in a given subject matter area had the lowest engagement rates, those students would receive more time on task in that subject matter area than students in classes with the least instruction in that area, even if the latter were engaged nearly 100% of the time.

In summary, Karweit and Slavin's study suggests that there is a near perfect linear relationship between scheduled time, allocated time and engaged time, at least in mathematics instruction at the elementary school level. They also found, however, that scheduled time is a considerable overestimate of the allocated time in math instruction. In addition, the extent of overestimation is not consistent across classes. Karweit and Slavin as well as the BTES study found a zero to slightly positive correlation between engagement rates and allocated time, suggesting that it is possible to increase instructional time at least to some extent without lowering student attention. Engagement rates in the studies reviewed have ranged from 55% up to nearly 100%. The work of Bloom (1976) indicates that engagement rates may be somewhat overestimated. Although there is some evidence of a wide range of engagement rates across classes, the even wider range of allocated time in various curricular areas across classes suggests that allocated time has as large an effect, if not a larger one, on engaged time as engagement rates in a given subject matter area.

THE RELIABILITY OF TEACHER RECORDED DATA

This section reviews the research on the ability of teachers to provide accurate information for use in research. This topic is rarely the major focus of an investigation, and what research has been done has generally been a by-product of research on other topics (Hook and Rosenshine, 1979). One study and a review of 11 other studies were found that relate to this question. The review article will be discussed first.

Hook and Rosenshine reviewed 11 studies of the accuracy of teacher reports. Although none of the studies focused on the ability of teachers to record student time allocation, their findings seem relevant. They grouped the studies into those of teacher reports of specific behaviors, those of scales formed from items in teacher questionnaires, and those of teacher reports grouped into general traits such as open versus traditional. The information provided by the teacher was related to information from an observer or from students.

Of the six studies of specific behaviors, none found a clear relationship between teacher reports and the other source of information. Although this suggests that teachers cannot provide accurate information on specific classroom activities, other factors may in part explain this finding. Studies of the generalizability of teacher behavior have found that the variability among occasions and raters is so large for certain behaviors that they cannot be measured reliably without the use of large numbers of raters and occasions (Erlich and Shavelson, 1978; Shavelson and Dempsey-Atwood, 1976). The lack of congruence between teacher self reports and observations may in part be due to error in the observations as well as error in the teacher self reports. The correspondence between teacher reports and observations was better for scales and dimensions than for specific behaviors.
This is what one would expect assuming the error in teacher reports was random. In the two studies relating teacher reports of their general teaching style with observer ratings a strong relationship was found. The results of this review are not surprising. The more specific the information a teacher provides, the less accurate it is likely to be. Although these studies assessed the reliability of teacher self reports of their classroom behavior, the findings may well generalize to the accuracy of teacher recorded time allocation data. One would expect that the finer and more specific the activity categories used in a time allocation study, the less accurate teacher logs would be. A comparison of time allocation data collected using outside observers and teacher logs was done as part of the 26 Beginning Teacher Evaluation Study (BTES) (Fisher, Filby, Marliave, Cahen, Dishaw, Moore, and Berliner, 1978). The sample consisted of 25 second grade and 22 fifth grade classrooms. Three boys and three girls were selected in each class as target students. The teachers in each class kept daily logs of the time spent by each of the target students in specific content categories within the areas of reading and mathematics. On one day each week, a trained observer recorded each of the target students' activities and the times they occurred as well as error and engagement rates. Achievement data was also collected at four time points. The BTES Study found that although miscategorization from the teacher logs did occur, there was in general a good match between the observations and the logs. The correlations between observation time and log time were reasonably high. In second grade they ranged from .44 to .95 with a mean of .68 across the different activity categories. In fifth grade they ranged from .06 to .94 with a mean of .65. From their experience and the data they collected, the researchers felt that teachers tended on an individual basis to overestimate or underestimate the time in different categories. Using the results of comparing the mean observation and log time, correction factors were computed for each teacher's bias by forming ratios of the observation time over the log time for read- ing and math. These ranged from 0.717 to 1.643 for reading bias inves 27 and 0.422 to 2.727 for math. These results suggest that at least some teachers have a fair amount of bias in the allocated time they record. The results of the BTES Study suggests that teachers can in fact collect reasonably accurate data on the time their students spend in different activities, though some teachers tend to overestimate or underestimate allocated time. In the BTES Study data was collected on only six students per class and just in the areas of reading and math. In the present study the ability of teachers to keep track of the activities of all the students in the class on the full range of subjects taught was assessed. Unlike the BTES study, the major focus of this dissertation was to evaluate teachers as a source of time allocation data. For this reason, the nature and extent of the error and bias in teacher recorded logs of student activities was investigated in much greater detail. “filter: StUdent 4' transcrii of a 311233 reliabili. CHAPTER THREE THE DESIGN OF THE STUDY This chapter begins with a discussion of the questions this study attempts to answer. This is followed by a description of the six classrooms on which time allocation data were collected. 
Since the major focus is on evaluat- ing data collection procedures, these will be described in detail next. Two basic approaches were used to assess errors in collecting time allocation data in the classroom. The first approach assessed error at the level of total time per day for a given student in a given activity or pursuit. This is generally the lowest level at which time allocation research is done. The second approach assesses error at the level of the individual pupil pursuit as defined by Harnischfenger and Wiley (1976). This level was chosen to be consistent with Harnischfenger and Wiley's model. Part of the data collection procedure this study is evaluating consists of coding written descriptions of student activities in the form of teacher logs or observer transcripts into pupil pursuit records. Multiple codings of a subset of the observer notes were used to assess the reliability of coders. 28 1 ' l '6‘. (I) \ 29 RESEARCH QUESTIONS As was stated in chapter one, the major purpose of this dissertation is to evaluate two approaches to collect- ing classroom time allocation data on several dimensions. The first is the use of observers recording in the form of structured field notes the activities of individual students. The second is the use of classroom teachers recording the activities of their students in the form of written logs. There is probably no way to obtain perfectly accurate measures of the time students spend in different pursuits or activities. The use of a full time observer can probably provide as accurate a description as can be obtained of student classroom activities. An observer can focus all of his/her attention on the task of recording student activities while a teacher's main focus must be on teaching. The use of teachers to record student activities, however, is likely to be considerably less expensive in that it eliminates the cost of the salary of a full time observer. The use of teachers to collect these data is also likely to be less disruptive than the introduction of an outside observer to the classroom. The major focus of this dissertation is to what extent and under what circumstances can teacher logs be used as a substitute for outside observers as a data source for time allocation studies. In addition how might log keeping procedures and training methods be improved to increase the IECC O Sing very It is proce: what 1 and SC differ Sublet: cussed measuri O Q it's e? '4. among 3 tEachEr to be 11' 30 accuracy of the information provided by teachers. The second question this dissertation addresses is the reliability of a set of procedures for coding written descriptions of student activities into pupil pursuit records. These pupil pursuit records indicate for a single student his/her activity coded on the three major dimensions of the Harnischfenger and Wiley model (subject matter, grouping composition and.Whether or not the activity was supervised by the teacher). The records include the beginning and ending times of the activity, and a new record was started when the students' activity changed according to any one of the dimensions. The coding of the teacher logs and observations is very time consuming and tedious using this coding process. It is important to determine how reliable the coding process is and whether multiple coders are necessary. The final question this dissertation addresses is what is the best strategy for sampling classrooms, students and school days within classrooms. 
If there are extreme differences in the time students receive in different subjects, as is suggested by the previous research dis- cussed in chapter two, a great deal of precision in ‘measuring time allocation may not be necessary to estimate it's effects on learning. If there is little variability among students within a class, as would be the case if the teacher mainly used whole group instruction, there seems to be little use in observing all or most of the students 31 in a class. On the other hand, if the teacher used a large amount of individualized instruction resulting in large differences among students within the class in the time devoted to different subjects, observing a large number of individual students would be important. As with students, if there is substantial variation among days in the school year within classes in the time spent in different subjects, it would be necessary to obtain data from a large number of days to get accurate estimates. If there was little variation among days, it would be necessary to obtain data on only a few days in each class— room to obtain reasonably accurate information. SAMPLE Time allocation data were collected in six elementary classrooms from.the greater Lansing, Michigan area. The sample included classrooms from inner-city Lansing as well as rural and suburban districts around Lansing. There were two second grade classes, two third grades, a fifth grade and a team taught fourth and fifth grade double classroom. DATA COLLECTION PROCEDURES Classroom.time data were collected during a three month period in the spring of 1978. During this period each teacher in the study kept daily logs of the classroom 32 activities of each student in the class. The teachers were asked to record the logs while the activities were taking place whenever this was possible. The logs included the beginning and ending time of an activity, student group, lesson content and materials, instructional purpose, and instructional strategy. An example is provided in appendix A. On eight days in three of the classrooms and nine days in the other three classrooms, an observer recorded the classroom activities of each student in the classroom in the form of structured field notes. These included descriptions of the activities of each student group or individual, those students making up the group, and the beginning and ending time of the activity. An example of a transcribed version of a set of these notes describing the activities of a class for a day is provided in appendix A. Both the written logs and observations of the students' activities were then coded using the scheme presented in appendix B. For two classrooms on four days each, the observations were coded by two separate individuals to allow for estimating the reliability of the individual coders. ESTIMATION OF VARIANCE COMPONENTS In order to help give a perspective as to the importance of the error introduced by various methods of can, each 33 collecting classroom time allocation information, estimates of the variation among classrooms, measures (logs versus observations), students and days within classrooms and the existing interactions for five major subject matter areas, and the transitions between lessons were computed. The unit used was minutes per day, and the subject matters were language arts, reading, math, science and social science. To greatly simplify the estimation of these variance components, days and students were randomly dropped from each classroom to obtain a balanced design. 
This resulted in six classrooms, two measures (observations and logs), eight days and 13 students within each classroom. In this design, classes, students and days were conceptualized as random and measures as fixed. Students and days were nested in classrooms and measures was crossed with the other three dimensions in the design. A set of rules of thumb developed by Millman and Glass (1967) were used to develop computational formulas for computing the mean squares and their expected values. Unbiased estimates of the variance components were obtained from the mean squares via algebraic manipulation. These formulas were programmed in FORTRAN by the author in order to perform the computations. The reliability of measures of time allocation using different numbers of students, days and the two types of measures were computed using generalizability theory 'U and o diffe: contair day Wer the am StUdent Pursuit ject mat 80cial S subgrOuP 34 (Cronbach, Gleser, Nanda, and Rajaratnam, 1972). This information can help researchers designing future studies of time allocation. PROCEDURES WITH TIME PER DAY PER STUDENT AS UNIT As was stated above, the error introduced by different approaches to collecting time allocation data in the classroom was assessed at the level of time per day per student (marginals for pupil pursuits) and at the level of individual pursuit records. This section describes the procedures at the first level. At this level both the consistency between the logs and observations as well as the consistency among different coders coding the same observations was assessed. The methods used were generally parallel. The method will be described in terms of assessing the consistency of the logs and observations noting differences in the procedures for multiple coders where they exist. Parallel data sets coded from logs and observations containing each student's pursuits in each class on each day were formed. The data sets were aggregated to obtain the number of minutes in pursuit categories for each student in each class on each day. The time in twelve pursuit categories was computed. These included five sUb- ject matters, language arts, reading, math, science, and social studies; three grouping strategies, whole group, subgroup and individual instruction; teacher supervised 35 and unsupervised instruction, as well as the time in transition between lessons and mixed seatwork. The average difference (observation time minus log time) was computed for each of the twelve categories and averaged across students and days in each classroom” For the multiple codings this was done across the days and students each pair of coders coded. These mean deviations provided a measure of the averaged difference in the time estimates for the different data collection methods. For various students and or days, the logs may indicate more time in a given activity category than the observations, while for other students or days the observa- tions might indicate more time. This could also be true for multiple codings of the same observations. To the extent this happened, the deviations would tend to cancel out. To provide a measure of the average difference per day per student between logs and observation time measures in a category, absolute values of the difference between the logs and observations were computed and averaged across days and students. 
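The two summary statistics just described can be sketched in a few lines. The paired minutes below are invented example values for a single pursuit category, not data from this study.

# Illustrative computation of the two consistency statistics described above,
# for one pursuit category (e.g. minutes of reading per student per day).
# The paired values are invented example data, not results from this study.
obs_minutes = [30, 25, 40, 35, 20]   # observer-based minutes for five student-days
log_minutes = [25, 30, 35, 40, 15]   # teacher-log minutes for the same student-days

diffs = [o - l for o, l in zip(obs_minutes, log_minutes)]

mean_difference = sum(diffs) / len(diffs)                      # signed differences can cancel
mean_abs_difference = sum(abs(d) for d in diffs) / len(diffs)  # absolute differences do not

print(mean_difference)       # 1.0 -> little systematic difference between the two sources
print(mean_abs_difference)   # 5.0 -> but the typical per-student-day disagreement is larger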
A comparison of the mean difference and the mean absolute value difference provides a measure of the extent either the logs or observations consistently show more time in a pursuit category. If either the logs or the observations show more time in a category than the other across all days and students in a class, the mean absolute value of the difference will be equal in magni- tude to the mean difference in that category. To the 36 extent it varies as to whether the observations or logs show more time in a pursuit category, the magnitude of the mean absolute value of the difference will be higher than the mean difference. In order to provide a frame of reference for inter- preting the mean differences and mean absolute value differences in the categories, the mean observation time was also computed for each category. In the case of multiple codings of the same observations, the time indicated in a pursuit category for a student on a day was averaged and these values were averaged across days and students. METHOD USED TO ASSESS ERROR AT THE PURSUIT LEVEL An attempt was made to evaluate the teacher logs as a data collection procedure at the individual pursuit level. Since this is the level at which the data is collected, it was felt that a better understanding of the types of discrepancies between the logs and observations could be achieved at this level. Due to discrepancies in the logs and observations, there was not always a one to one match between pursuit records. A coding procedure was used to match log and observation pursuit records. ctivit of the represe This pr Tn observe extent do this Observe Created from th was add element In be a mo than th flinch Sm This re. bI‘Eakin! pieces 37 CODING PROCEDURES Given the number of pursuit records in the observation and log data sets (approximately 47,000 in the observations and 23,000 in the logs) a sample was selected for coding. Four days from each class were randomly sampled. Since there was a large amount of similarity among students in a class on a given day in their pursuit records due to group activities, three or four students in each class on each of the four sampled days were selected that tended to represent groups of children with similar pursuit records. This procedure resulted in 76 student-days being coded. The purpose of the coding procedure was to match the observation and log pursuit records so that the type and extent of differences between them could be studied. To do this, the log pursuit records were mapped onto the observation pursuit records. That is, a coded record was created for each observation record and the information from the apprOpriate log pursuit record or portion of it was added to the coded record. Table 4 gives the data elements on each coded record. In general, due to the fact the observations tended to be a more detailed description of the student activities than the logs, each observation pursuit record included a much smaller time interval than a log pursuit record. This resulted in the coding process to a large extent breaking up log pursuit records and assigning separate pieces to different observation pursuit records. The oqv \pé C02 Cla. Catt who 38 coded records contained an ID number for the log pursuit the log portion came from if it was a piece of a larger log pursuit. This was so that it could be tied back to the total pursuit from.Which it came. 
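The logic of this mapping step can be sketched as follows. The intervals and category labels are hypothetical; each pursuit is reduced to a start minute, an end minute and a subject code, and the sketch simply attaches to each observation pursuit the log pursuit that overlaps it most. The actual coding in this study was done by hand, with the log information key-entered under control of a FORTRAN program, so this is only an illustration of the idea, not the procedure itself.

def overlap(a_start, a_end, b_start, b_end):
    """Minutes of overlap between two clock-time intervals (expressed in minutes)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

# Hypothetical pursuits for one student, as (start minute, end minute, subject code).
observation_pursuits = [(540, 555, "reading"), (555, 560, "transition"), (560, 590, "math")]
log_pursuits = [(540, 560, "reading"), (560, 590, "math")]

coded_records = []
for o_start, o_end, o_subj in observation_pursuits:
    # Attach the log pursuit that overlaps this observation pursuit the most.
    # (When several log pursuits fell inside one observation pursuit, this study
    # kept the largest log pursuit; greatest overlap is used here for simplicity.)
    log_id = max(range(len(log_pursuits)),
                 key=lambda i: overlap(o_start, o_end, log_pursuits[i][0], log_pursuits[i][1]))
    l_start, l_end, l_subj = log_pursuits[log_id]
    coded_records.append({
        "obs_minutes": o_end - o_start,
        "obs_subject": o_subj,
        "log_minutes": l_end - l_start,   # length of the whole log pursuit it came from
        "log_subject": l_subj,
        "log_pursuit_id": log_id,         # lets split-off pieces be tied back together
    })

for record in coded_records:
    print(record)

In this small example the five-minute transition noted by the observer is absorbed into the reading log pursuit, which is exactly the kind of discrepancy the coded records are meant to expose.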
Table 4
Pursuit Record Data Elements

1. student ID code
2. class ID code
3. date
4. starting time of observation pursuit
5. total time of observation pursuit
6. subject matter code of observation pursuit
7. grouping code of observation pursuit
8. supervision code of observation pursuit
9. total time of log pursuit
10. subject matter code of log pursuit
11. grouping code of log pursuit
12. supervision code of log pursuit
13. log pursuit ID if log pursuit divided

The coding process was made up of the following steps. First, the days to be coded for each class were randomly selected. The pursuit records were aggregated to the level of the total time for the day for each student in the class in the subject matter, grouping and supervision categories. This information was used to group students who represented different time allocation patterns in the class. There were generally three or four such patterns in a class on a given day. A student was randomly selected from each group and the pursuit records for these students were coded.

Listings of the log and observation pursuits on the selected student-days were made. These were used to map the log pursuits onto the observation pursuits. The information from the observation pursuits was pulled off via a FORTRAN program while the matching log pursuit information was key-entered under control of the FORTRAN program.

In general the coding process of matching the log pursuits with the observation pursuits was quite straightforward. In most cases the pursuits matched in an obvious way one to one or, as stated above, the log pursuits were broken up and portions of a single log pursuit were matched with a number of observation pursuits and an ID number assigned to them so that they could be tied back together if so desired during analysis. There were a few instances where an observation pursuit record included a large block of time where the log pursuit file had the same block of time broken into several pursuits. Since the coding procedure mapped the log pursuits onto the observation pursuits, there was no simple solution to this situation. There were multiple log pursuit records describing different time portions of the same observation pursuit record. In general the largest log pursuit record was matched with the observation pursuit record. Each occurrence of this situation along with how it was resolved is documented in appendix C.

Since the coding procedure used to match log and observation pursuit records could introduce error that would be mistaken for real differences between the log and observation pursuits, two student days were randomly selected and coded by two individuals to provide a measure of the reliability of the coding procedure.

THE ANALYSIS OF CODED PURSUIT RECORDS

The pursuit records from logs and observations could differ in two ways. First, they could differ in their recording of the beginning and ending time of the pursuit. Secondly, they could differ in how the activity was categorized on one or more of the three dimensions: subject matter, grouping and teacher supervision. Although both types of discrepancies result in differences in time estimates between the log and observation data when aggregated, these two types of discrepancies will be analyzed separately. Both the teachers and the observers were instructed to record the beginning and ending times of activities using the wall clock in the classroom.
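The coded record described in Table 4 can be thought of as a small fixed-layout structure. One possible representation is sketched below; the field names, types, and the use of a Python dataclass are assumptions, since the original data were fixed-format records processed in FORTRAN.

```python
# Illustrative structure for one coded pursuit record (see Table 4).
from dataclasses import dataclass
from typing import Optional

@dataclass
class CodedPursuitRecord:
    student_id: str
    class_id: str
    date: str                        # e.g. "1982-10-14"
    obs_start: float                 # starting time of observation pursuit (minutes)
    obs_total_time: float            # total time of observation pursuit (minutes)
    obs_subject: str                 # subject matter code from observation
    obs_grouping: str                # whole group / subgroup / individual
    obs_supervision: str             # supervised / unsupervised
    log_total_time: Optional[float]  # total time of the matching log pursuit
    log_subject: Optional[str]
    log_grouping: Optional[str]
    log_supervision: Optional[str]
    parent_log_id: Optional[int] = None  # set when the log pursuit was divided
```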
The coding of the logs and observations was checked by multiple individuals and the coded pursuit records were verified as they were keypunched. A computer check was also done to assure that the starting time of the next pursuit record for a student was the same as the ending time of the last pursuit record. These checks made it unlikely that any substantial amount of error was introduced into the log or observation pursuit records in terms of beginning and ending times during coding and data entry. It seems reasonable to assume that discrepancies in the beginning and ending times of pupil pursuits or activities were mainly due to the teachers' inability to keep track of the exact times student activities changed while teaching. In many instances the teachers probably had to reconstruct or guess at the time that activities changed because they were too busy to look at the clock at that time. It is true that the observer could also record the times incorrectly, but this seems considerably less likely given that the observer's main focus was on recording student activities.

Differences in how a given student activity was categorized resulted from the unreliability of the coding process and from differences and ambiguities in the descriptions of the activities contained in the observer notes and teacher logs. As described above, multiple codings of the same observer notes were done for two classrooms on four days each. These provide data on the unreliability of the coding process. Examples of teacher logs and transcribed observer notes are contained in appendix A. The far greater detail of the observer notes as compared to the teacher logs would suggest that discrepancies in the categorization of activities would for the most part be due to ambiguities in the teacher logs.

THE ANALYSIS OF THE CATEGORIZATION OF PURSUITS

Discrepancies in the categorization of student activities were assessed as follows. A crosstabulation-like table was constructed with one of the dimensions being the log categorization and the other being the observation categorization. The following statistics were included for each cell of the table: the number of pursuits in that cell, the average number of minutes of the pursuits (from the observations), and the proportion of pursuits in that cell out of the total number of pursuits with that given observation coding. Tables were constructed for activity type (the five subject matters, transitions between lessons, and seatwork), group type and supervision.

The diagonal cells in such a table represent those pursuits where the categorization of the pursuit by the logs and observations was consistent. Each off-diagonal cell represents a certain type of discrepancy between the log and observation coding of the activity. It is possible to evaluate the extent and nature of the differences in the log and observation codings of the student activities from these tables.

A log-linear model (Bock, 1975) was fit to test whether there were statistically significant differences among classrooms in the proportion of pursuits in the cells of the table described above. The differences among classrooms were found to be significant beyond the p < .001 level for activity type as well as for group and supervision code. For this reason, tables for each classroom as well as a table of the combined results are presented and discussed.
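The cell statistics just described can be assembled directly from the coded pursuit records. The sketch below is an illustrative version (plain dictionaries and assumed field names, following the mapping sketch earlier, rather than the original FORTRAN tabulation) that accumulates counts, mean observation minutes, and row proportions for one dimension such as subject matter.

```python
# Illustrative crosstabulation of log vs. observation categorizations.
# For each (observation code, log code) cell: pursuit count, mean observation
# minutes, and the proportion of pursuits with that observation code (row %).
from collections import defaultdict

def crosstab(coded_records, dimension="subject"):
    cells = defaultdict(lambda: {"n": 0, "minutes": 0.0})
    row_totals = defaultdict(int)
    for r in coded_records:                    # records as produced by the mapping sketch
        obs_code = r["obs_codes"][dimension]
        log_code = r["log_codes"][dimension]
        cell = cells[(obs_code, log_code)]
        cell["n"] += 1
        cell["minutes"] += r["obs_minutes"]
        row_totals[obs_code] += 1
    table = {}
    for (obs_code, log_code), cell in cells.items():
        table[(obs_code, log_code)] = {
            "count": cell["n"],
            "mean_minutes": cell["minutes"] / cell["n"],
            "row_proportion": cell["n"] / row_totals[obs_code],
        }
    return table
```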
ANALYSIS OF ERRORS IN PURSUIT LENGTH

A measurement model described by Schmidt (1981) was used to assess the consistency of the length of log and observation pursuit records. As has been discussed above, the assumption has been made that the observation data are more accurate than the logs. The purpose of comparing the use of teacher logs and observer transcripts as methods of collecting classroom time allocation information is to see how well the information collected from logs approximates that from observations. For this reason, the observation time has been conceptualized as a true score in this model and the log time as an observed score.

The advantage of Schmidt's model over the classical true score model in this situation is that it allows for fixed bias and random bias correlated with the true score, as well as random error independent of the true score. Fixed bias in this situation would be a consistent tendency for a teacher to over- or underestimate the time in an activity category. Random bias correlated with the true score in this study would be a tendency for the bias in the teachers' estimates of the students' time in an activity to change with the magnitude of the true time in the activity. In addition, the model includes a random error term uncorrelated with the true score with an expected value of zero. Schmidt's model is expressed as follows:

X = A + Bξ + e     (1)

where X is the observed score (log time), ξ is the true score (observation time) and e is the random residual term in the model. A is the intercept and B is the slope of the linear relationship between X and ξ. It is also assumed that the covariance of ξ and e is zero. Under this model, measurement error, denoted here by E, is defined as X minus ξ. Given the model of (1) and the definition of measurement error as the difference between observed and true scores, measurement error can be expressed as follows:

E = A + (B - 1)ξ + e

As one can see, measurement error in this model is made up of three distinct components. The first, specified by A, is a fixed bias. The second, which is a function of both B and ξ, represents a random bias associated with the magnitude of the true score. The final component, e, is random error independent of the true score.

The model was fit for each class separately, as it was felt that both bias and random error in keeping track of pursuit length would vary among the teachers. The model was also fit for the five subject matter categories, transitions between lessons, and seatwork, as well as the grouping and supervision categories. For the pursuit categories, the model was fit only for those pursuits where there was agreement between the logs and observations as to the pursuit category. Both log and observation measures of pursuit length are random variables. For this reason maximum likelihood was used to estimate the parameters and provide correct asymptotic standard errors.

SUMMARY

This chapter described the methods used to assess teacher logs, as opposed to outside observers, as a method of collecting classroom time allocation information within the framework of the Harnischfeger and Wiley model of school learning. Teacher logs and outside observer notes on each student's activities were collected from six classrooms for eight (three classrooms) or nine (three classrooms) days. These descriptions were coded into pursuit records as defined by the Harnischfeger and Wiley model.
In the case of two classrooms for four days each, the descriptions provided by the observers were coded by two separate individuals. Five approaches were used to assess the consistency of the information provided by the logs and observations and to evaluate the data collection procedure as a whole.

The expected values of the variability among classrooms and among students and days within classrooms were computed for the time in minutes spent in language arts, reading, math, science, social studies and transitions between lessons. This was done both to provide some perspective on the discrepancies found between activity time computed from logs and observations and to help determine how many days and students a researcher would need to observe to obtain accurate activity time estimates for a classroom.

The consistency of multiple codings of the observations from the four days each in two classrooms was assessed by computing average differences and average absolute value differences across students and days for each pair of coders in the five subject matters listed above, transitions between lessons, group type, and the presence or absence of teacher supervision, after the pursuit records were aggregated to the time per category per student per day. A parallel analysis was done comparing the time coded from teacher logs and observer transcripts.

Since the descriptions of student activities were collected and coded at the level of individual pursuit records, it was felt that the discrepancies between the logs and observations could be best understood at that level. A coding procedure was used to match a sample of the log and observation pursuit records. Two types of discrepancies could exist between pursuit records coded from logs and observations. First, they could differ in length. Second, they could differ in how the activity was categorized. A measurement model developed by Schmidt (1981) was used to estimate random and fixed bias, as well as random error, in the log pursuit length as compared with the observation pursuit length. Crosstabulations of how the pursuit records were categorized from logs and observations, for all the classrooms combined as well as for the individual classrooms, were done to assess the discrepancies in pursuit categorization.

CHAPTER FOUR
THE RESULTS

The results of the data analysis are presented in this chapter. Five approaches were used to evaluate the reliability and efficiency of the data collection methods used by the Language Arts Project to collect time allocation data in classrooms. The first approach evaluated the reliability of coders coding written observations into pupil pursuit records. The second approach estimated the variance components for different subject matters: 1) among classes, 2) among days within classes, 3) among students within classes, and 4) among measures (logs and observations), as well as all the existing interactions. These variance components were then used to estimate the reliability of different data collection designs. The third approach assessed the consistency of measures of time in different pupil pursuits collected with teacher logs and observer notes at the level of student time per day. The fourth approach analyzed the discrepancies in the categorization of individual pupil pursuit records coded from logs and observations. The fifth approach estimated fixed and random bias as well as random error in teacher recorded pursuit length as compared with observer recorded pursuit length.
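As a concrete illustration of the fifth approach, the sketch below fits the linear relationship of Schmidt's model, regressing log pursuit length on observation pursuit length to separate the intercept (fixed bias), the slope (random bias), and the residual variance (random error). Ordinary least squares is used here for simplicity, whereas the dissertation used maximum likelihood with asymptotic standard errors; the function and field names are assumptions.

```python
# Illustrative fit of Schmidt's model X = A + B*xi + e, where xi is the
# observation pursuit length (treated as the true score) and X the log length.
# OLS is used for simplicity; the dissertation used maximum likelihood.
def fit_pursuit_length_model(pairs):
    """pairs: list of (obs_minutes, log_minutes) for matched pursuits."""
    n = len(pairs)
    xi = [p[0] for p in pairs]
    x = [p[1] for p in pairs]
    mean_xi = sum(xi) / n
    mean_x = sum(x) / n
    sxx = sum((v - mean_xi) ** 2 for v in xi)
    sxy = sum((xi[i] - mean_xi) * (x[i] - mean_x) for i in range(n))
    B = sxy / sxx                              # slope: B - 1 indexes random bias
    A = mean_x - B * mean_xi                   # intercept: fixed bias
    resid = [x[i] - (A + B * xi[i]) for i in range(n)]
    var_e = sum(r ** 2 for r in resid) / (n - 2)   # random error variance
    return {"fixed_bias_A": A, "slope_B": B, "random_error_var": var_e}
```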
ESTIMATION OF VARIANCE COMPONENTS In this section, estimates of the variance components among classes, measures (logs and observations), students, days and the existing interactions of these factors are presented. These variance components are then used to estimate the reliability of four data collection designs using generalizability theory (Cronbach, et al.). The variance components are presented in Table 5. They were estimated for language arts, reading, math, science, social studies and transitions between lessons. The sources of variance include measures (logs and observa- tions), classes, days, students, and the following inter- actions: measures by class, students by day, measures by day, measures by student and measures by student by day. No other interactions existed due to the nesting of students and days within classes. The numbers in the table are the variance estimates while the numbers in parentheses are the square roots which are probably more useful in that their metric is minutes. There were relatively small amounts of measures variance across the five subject matters and transitions between lessons. Measure variance is the variability between time measures from logs and observations averaged over students,classes and days. For math, science and transitions, there was no variation among measures. For 50 Table 5 variance Components of Measures of Five Subjects and Transitions Language Social Tran- Amts Reading Math Science Studies sitions Measures 3.17 7.19 0.00 0.00 9.36 0.00 (1.78) (2.68) (0.00) (0.00) (3.06) (0.00) Classes 173.28 665.12 96.46 302.21 239.59 137.83 (13.16) (25.79) (9.82) (17.38) (15.48) (11.73) Days 807.77 423.11 158.00 243.95 265.89 41.05 (28.42) (20.57) (12.56) (15.62) (16.31) (6.41) Students 29.26 40.10 1.60 0.00 0.00 0.81 (5.41) (6.33) (1.26) (0.00) (0.00) (0.90) M x C 13.47 0.00 11.46 1.49 0.00 165.91 (3.67) (0.00) (3.39) (1.22) (0.00) (12.88) s x D 15.51 36.80 4.40 7.38 8.80 0.90 (3.94) (6.07) (2.10) (2.72) (3.00) (0.95) M x D 247.56 252.86 82.95 92.84 250.33 85.61 (15.73) (15.74) (9.11) (9.64) (15.82) (9.25) M x S 0.00 22.62 0.00 0.00 0.56 0.42 (0.00) (4.76) (0.00) (0.00) (0.74) (0.65) M x D x 8 184.17 50.41 55.66 3.64 31.13 25.76 (13.57) (7.10) (7.46) (1.91) (5.58) (5.08) 51 language arts, reading and social studies, the square roots of the variance components were 1.78, 2.68 and 3.06 minutes respectively. These results suggest that when the design is collapsed across classes, days and students, there is little or no variability between log and observa- tion recorded time. The variability among classes was relatively large when compared to the other variance components. For reading and science it was the largest source of variance. For transitions it was the second largest, while for math and social studies it was the third, and in language arts it was the fourth. The square root of the variance among classes ranged from 11.73 minutes in transitions to 25.79 minutes for reading. As will be discussed in the next section, variation among classes would be conceptualized as "true score" variance when a research is interested in the differences in time spent in a type of activity among classrooms. The fact that this source of variance is large suggests that differences among classes in the time spent in various activities can be measured reliably. The variability among days within classes was also large when compared to the other variance components. 
It was the largest component for language arts, math and social studies, and the second largest for reading and science. Days was the fourth largest variance component for transitions. Apparently there are large day to day differences in the amount of time allocated in these five 52 subject matters within classes. Since day to day variation in allocated time to a subject matter would generally be considered error, increasing the number of days each class was observed would be an effective approach to increasing the reliability of time allocation measures. Compared with days and classes, the variability among students within a class in the time spent in each of the six activity categories was quite small. In science and social science, this component was zero. Reading had the largest amount of variability among students of the six activities where the square root of the variance component was 6.33. Apparently students within the same class tend to receive pretty much the same amount of instructional time per subject matter. The measure by class interaction was quite small for the five subject matters but relatively large for transi- tions. In reading and social studies it was zero, and for the other three subject matters the square root of this component was less than four minutes. It was however the largest component for transitions where the square root of the variance component was 12.88 minutes. The size of this component has important implications in terms of this study. It is the extent that there were differences among classrooms in the discrepancies between log and observa- tion recorded time. If one assumes that the observation measures were accurate as has been done in this study, the class by measure interaction is a measure of the extent 53 teachers differ in the magnitude and/or sign of the error in their recording pursuit length averaged over students and days. These results suggest that the differences among teachers in the magnitude and direction of their errors in recording of student activities was generally small with the exception of transitions. As will be presented in a following section, there were in fact large differences both in sign and magnitude among teachers in the differences between log and observation recorded time for transitions. The square root of the student by day interaction component ranged from just under a minute in transitions to just over six minutes in reading. It was the sixth largest of the nine components for all the activity categories with the exception of science, where it was the fourth largest. The measure by day interaction component was fairly large across all the activity categories. It's square root ranged from 15.84 minutes in social studies to 9.11 minutes in math. This suggests that there are large day to day differences in the discrepancies between logs and observations within a class. Whole class and subgroup activities make up a large portion of a school day in most classrooms. The activity description recorded by the teacher or observer would be the same for all the students in the class or subgroup when whole class or subgroup activities were taking place. Errors and ambiguities in a 54 teacher's description of a student activity or it's length would occur for all the children in the class or subgroup if the whole class or subgroup participated in the activity. This could result in large discrepancies on a given day between log and observation time recorded for a given activity type. 
This might explain the large measure by day interaction. If the errors in the teacher logs were basically random, they would cancel out across days resulting in little variance between the log and observa- tion measures averaged over days as was observed in this study. Unlike the measures by day interaction, the measures by students interaction was small or nonexistant. This component was zero for language arts, math and science. The square root was under a minute for the other categories with the exception of reading, where it was 4.76 minutes. Again the fact that much of the student day is spent in group activities could explain the small student by measure interaction. Discrepancies between log and observation recorded time would be consistent across students in the group or in other words no student by measure interaction. The measures by day by student interaction was moderate in size for the different activity categories. Since there was only one observation per cell in this design, this component includes the residual variance from all sources not included in the design. The square root of this component ranged in magnitude from 1.91 minutes in 55 science to 13.57 minutes in language arts. It is difficult to interpret this component, first because it is a three way interaction, and secondly because it also contains all other sources of variance not included in the design. The way the variance was distributed across the nine components was fairly consistent for the six activity categories. The largest sources of variance were classes, days and the measure by day interaction with the exception of transitions, where the measures by class interaction was the largest source. There were relatively small amounts and in the case of science, math and transitions no variance among measures. This suggests that the differences between teacher logs and observer field notes in the time recorded in these activity categories tends to cancel out when averaged over days, students and classes. Through the use of the variance components discussed above, the reliability or generalizability of time measures in a number of research designs was computed. For studies of differences among classes, the designs included all combinations of using logs or observations, measures on two and four days, and observing five and ten students. For studies of differences among students within a classroom, the designs included all combinations of using logs or observations and measures on two and four days. The results are presented in Table 6. The general- izability coefficients are to some unknown, though likely to some extent, underestimated for the designs using 56 Table 6 Generalizability Cbefficients for Measuring‘Time Allocation Lang. Art Reading Math Science Soc. Stud. TranB. Lang..Art Reading Math Science Soc. Stud. Trans. .23 .64 .41 .64 .47 .37 Observations Among’Classes Days Students .290: .TEn .Eiye .TBn .29 .44 .45 .75 .85 .86 .54 .69 '.70 .71 .83 .83 .64 .78 .78 .86 .92 .93 Logs .24 .37 .38 .65 .77 .78 .42 .56 , .57 .64 .78 .78 .47 .63 .63 .37 .41 .41 Among Students IMO .05 .00 .00 Days .EDHL .08 57 observers. The measure by day by student interaction was included in the estimate of the observed variance because this component contains the variance from all sources not included in the design. The actual variance of the three way interaction should not be in the observed variance due to the fact observers are considered accurate and differences between observations and logs are due to errors in the logs. 
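One way such generalizability coefficients can be assembled from the variance components of Table 5 is sketched below for the among-classes case with logs as the data source. The exact composition of the error term in Table 6 depends on design decisions noted above (for example, the treatment of the three-way interaction), so this sketch is an approximation rather than a reproduction of the dissertation's values; the numbers shown are the language arts components taken from Table 5.

```python
# Illustrative D-study coefficient for measuring differences among classes
# with teacher logs, using the language arts variance components of Table 5.
# The error composition is an approximation of the designs behind Table 6.
def g_coefficient_among_classes(var, n_days, n_students):
    universe = var["classes"]
    error = (var["days"] / n_days
             + var["students"] / n_students
             + var["m_x_c"]                       # class-specific log bias, not averaged away
             + var["s_x_d"] / (n_days * n_students)
             + var["m_x_d"] / n_days
             + var["m_x_s"] / n_students
             + var["m_x_d_x_s"] / (n_days * n_students))
    return universe / (universe + error)

lang_arts = {"classes": 173.28, "days": 807.77, "students": 29.26,
             "m_x_c": 13.47, "s_x_d": 15.51, "m_x_d": 247.56,
             "m_x_s": 0.00, "m_x_d_x_s": 184.17}

# Logs, two days, five students per class: prints 0.23 with these components.
print(round(g_coefficient_among_classes(lang_arts, n_days=2, n_students=5), 2))
```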
The results in Table 6 suggest that both increasing the number of days observed and using observers as opposed to teachers to collect data result in meaningful improve- ments in reliability. Increasing the number of students observed from five to ten when the focus is on measuring differences among classes made at most a hundredth of a point improvement in the reliability. The reliability for assessing differences among classes was in general reasonably good. Although it was somewhat low for language arts and math in the designs presented and for transitions when logs were used, acceptable levels of reliability, e.g. .80 and above, could probably be achieved by collecting data on more days and/or using observations. This was not the case when the focus was on assessing differences among students within class- rooms. For science and social science there was no variance among students within a class and hence differ- ences cannot be measured at all. The variability among students was small for the other four categories. It is 58 likely that reasonably reliable data could be obtained for reading and possibly language arts if data was collected on a large number of days particularly if observers were used. It is unlikely that acceptable levels of reliability could be achieved for measuring math and transition time with any sort of reasonable design. The use of observers as opposed to teachers had the largest impact for measuring transition time. For measur- ing differences among classes, it more than doubled the size of the coefficients. This was due mainly to the fact the class by measure interaction was the largest source of variance for transition time. For the other activities, there was generally an improvement of between .05 and .10 in the reliability when using observers as opposed to teachers. The improvement was roughly the same or slightly better when data was collected on four as opposed to two days. Given the costs of using outside observers, increas- ing the number of days observed and using teachers as opposed to observers may well be a more cost effective approach for achieving equally reliable data at least for these six activity categories. RELIABILITY OF THE CODING PROCEDURES The observer notes from classrooms one and three on four of the days observed were coded by two separate individuals. These coders were graduate students who for the most part were working towards their doctorate in 59 education. In classroom one, separate sets of coders coded two days each. In classroom three, one individual coded four days while two other individuals each coded two days. The double codings were used to assess the reliability of coders using the coding conventions developed by the Language Arts Project. These coding conventions are contained in appendix B. The pursuit records produced from each set of codings were aggregated to the level of time per day per student for each of the five subject matter areas, transitions between lessons, seatwork, supervision and grouping categories. The time in each of the above categories on each student-day from the two codings was averaged. In addition, the difference and the absolute value of the difference was computed. These three statistics were then averaged across students and days coded for each pair of coders. The results for the two classes are presented in Table 7. 
The average time for the two codings in each category was provided to give some perspective for interpreting the differences in the time in different categories coded by different coders. An average difference between coders of three minutes has a different meaning when the total time in the category was 10 minutes as opposed to 30 minutes. The ratio of the difference of the codings over the average of the two codings expressed as a percent is also provided in Table 7. The mean deviations provide a 60 Table 7 Comparison of Time Measures from Multiple Codings Category Lang. A... Reading Math Science 800. Studies Seatwork Transitions WholelGroup Subgroup Individual supervised Ave. 34.98 84.73 34.15 8.18 11.81 5.11 47.50 55.86 12.34 151.64 91.82 Coder 1 vs Coder 2 Dev. 1.96 2.81 7.56 1.52 -0.88 -2.15 -12 021 -ll.44 23.77 -1044 '3.02 Unsupervised 120.11 -l.90 Class 1 Abs. Dev. 2.54 8.24 9.69 2.31 0.88 6.06 12.67 . 19.98 24.27 9.85 21.48 6.60 % of Ave. 6 3 22 19 7 42 26 20 192 % of Abs. 77 34 78 66 100 35 96 57 98 15 14 29 Category Lang. Arts Reading Math Science Soc. Studies Seatwork Transitions Whole Group Subgroup Individual Supervised AVG. 41.17 53.63 22.50 0.00 0.21 21.29 34.19 78.92 4.92 129.47 94.37 Unsupervised 12 .15 Table 7 (cont'd.) Coder 3 vs Coder 4 Dev. Abs. Dev. -9.90 -3.40 0.50 0.00 -0.42 10.54 0.46 -11.17 -0.58 7.94 -7.77 4.13 61 Classl 11.23 4.98 0.92 0.00 0.42 10.54 5.17 5.08 8.85 11.31 9.46 % of Ave. 24 6 2 0 . 200 50 14 12 %of Abs. 88 68 54 0 100 100 100 11 69 44 Category Tang. Arts Reading Math Science Soc. Studies Seatwork Transitions Ave. 30.83 79.18 24.68 65.19 30.48 0.71 30.36 Whole Group 189 .98 Subgroup Individbal supervised 3.40 38.46 208.05 Unsupervised 23.70 62 Table 7 (cont'd.) Class 3 Coder 5 vs Coder 6 Dev. Abs. Dev. 3.90 3.94 -l.32 8.32 0.48 0.48 -0.42 0.42 -0.48 0.48 1.42 1.42 -0.20 0.44 4.06 4.78. -l.40 1.40 -0.44 4.16 5.38 5.54 -3.16 3.44 % of Ave. 13 2 % of Abs. 99 16 100 100 100 100 45 85 100 97 92 Category Lang. Arts Reading ' Math Science Soc. Studies Seatwork Transitions Ave. 32.34 73.85 27.91 50.27 53.85 2.03 31.06 Whole Group 118.58 Subgroup Individual SLpervised 5.03 31.98 236.94 Unsupervised 18.70 Table 7 (oont'd.) Coder5vsCoder7 DeV. -0.24 2.10 3.38 5.66 -2.46 0.14 -l.64 13.08 ~0.38 -3.80 8.72 0.08 63 C1ass3 Abs. Dev. 2.76 2.18 3.38 7.06 2.90 0.94 5.68 16.44 . 3.26 16.04 11.76 3.28 % of Ave. 1 3 12 oommqmuw 12 % of Abs. 9 96 100 80 85 15 29 80 12 24 74 64 measure of the difference between coders in the average amount of time in a category coded over two days and approximately 20 students. Since the deviation between coders in the time they code for a pursuit category on a day for a student can be positive or negative, they can cancel each other out when averaged over days and students. The average absolute value of the differences was computed to assess the extent this occurred. The extent the mean absolute difference between coders across days and students is greater than the mean difference between coders can be looked upon as a measure of the extent the differ- ences between coders is not systematic or in a sense random. For example, if one coder consistently codes more time in language arts than the other coder across days and students, the difference in the time in language arts between them will always be positive or negative and the mean difference and the mean absolute difference would have the same magnitude. 
To the extent it varied as to which coder coded more time in language arts across students and days, the magnitude of the mean absolute time difference would be larger than the mean time difference. The ratio of the mean difference over the mean absolute value difference expressed as a percent is provided in Table 7 . For the five subject matter areas, the reliability of the coding seems in general to be quite good. In most cases the mean difference between coders was less than 10% 65 of the average time coded in a category. The subject matters where the discrepancies exceeded 10% of the average time in the subject matter differed between sets of coders suggesting that there was no particular subject matter that was difficult to code. In some cases the mean differences and absolute mean difference were equal or nearly equal suggesting there were systematic differences between coders while in other cases the absolute mean difference was substantially higher than the average difference. With the exception on social studies where there were systematic differences between each pair of coders, there did not seem to be a tendency across coders for the differences between them to be systematic or not systematic. For three of the four pairs of coders, the discrepancy in the time categorized as mixed seatwork was relatively large compared to the amount of time coded as mixed seat- work. This suggests that it may be difficult to distinguish whether an activity is mixed seatwork or instruction in a given subject matter. There was also a large discrepancy between one set of coders in terms of the time spent in transitions between lessons. The fact the absolute value of the difference between coders and the difference between coders were almost equal means that there is a consistent tendency for one of the coders to code more time as transitions than the other. One other set of coders was almost in perfect agreement as to the time 66 spent in transitions. The other two sets of coders had fairly small (.46 and 1.64 minutes) average differences per student per day. The absolute mean differences were between five and six minutes for both, suggesting it varied as to which coder coded more time in subgroup activities across days and students. AS‘With the subject matter categories, the consistency of the coding of the group structure varied between sets of coders. The greatest inconsistencies were found in coding the subgroup category. Coders one and two differed by almost 24 minutes while the average of their two codings was just over 12 minutes suggesting one coded on the average almost 24 minutes while the other coded essentially no time in subgroup pursuits. The other three pairs of coders had somewhat smaller differences in subgroup time. The mean absolute value differences however were quite large when compared to the amount of time coded in sub- group activities. In general, the coding was more consistent for whole group and individual activities. Two of the sets of coders had mean differences and.mean absolute value differences greater than 10% of the average time coded for whole group or class activities. One set of coders had a mean difference and a mean absolute difference of greater than 10% of the average time coded for individual activities. The other pairs of coders coding whole group and individual time were quite consis- tent when compared to the amount of time coded for these 67 categories. 
The coders were also fairly consistent in terms of their coding of the time spent in activities supervised and unsupervised by the teacher. The one major exception was that one pair of coders had a mean absolute value difference for supervised time of 21.48 minutes where the average time per student per day supervised was 91.82 minutes. The deviation averaged across days and students was only 3.02 minutes suggesting that one coder did not consistently code more time supervised than the other. In summary, the coding in general was fairly reliable. In most instances the average difference between coders per day per student in an activity category was less than 10% of the average time per student per day in that activity category. In addition, there did not seem to be any activity category upon which all four sets of coders had large differences. There did seem to be some difference between pairs of coders in the extent of agreement between them, though this conclusion is tentative given the size of the data set. Coders 5 and 7 were the most consistent. Only in math and individual instruction was the average difference between them greater than 10% of the average time coded between them in the category and in those categories the average difference was only 12% of the average time coded between them. Coders l and 2 on the other hand had averaged differences greater than 10% of the average time 68 in six categories and these percentages ranged from 19 to 192. Given the fact that these results are based on a relatively small data set, they are difficult to interpret. Each set of coders only coded two days, and a single difference in the categorization of an activity to which the whole class participated in could result in a large systematic difference between the coders in the time coded in two categories. This probably to some extent makes the differences between coders seem more systematic than they in fact are. What appears clear from these results is that differences between coders are to some extent (and quite possibly a large extent) due to nonsystematic differences rather than systematic differences in how they categorize written descriptions of student activities. If the assumption is made that these nonsystematic differences between coders are due to random errors in each coding and these errors have an expected value of zero, the reliability of the coding process would improve with the size of the data set coded as these random errors cancel out. THE CONSISTENCY OF LOG AND OBSERVATION ESTIMATES In this section the results of comparing time on task estimated from teacher logs and observer notes at the level of time per student per day are presented. Pursuit time records coded from logs and observations were each aggregated to produce data records with the total time for 69 each student in the study on each day. Both log and observation data was available. This was done for time in language arts, reading, mathematics, social studies, science, seatwork, transitions between lessons, time in whole class activities, activities in subgroups within the class, individual activities, teacher supervised activities, and unsupervised activities. The average time per day per student in each category coded from the logs and observa- tions in each class are presented in Table 8. In addition, these tables contain the ratio of the difference between the log and observation time over the observation time expressed as a percentage to aid in the interpretation of the magnitude of the difference. 
The ratio of the difference between the observations and the logs over the absolute value of the average difference between the observations and the logs expressed as a percentage is also contained in Table 8. This statistic can be used to assess the extent the daily differences between time coded in a category from teacher logs was systematically higher or lower than the time coded from observations. These analyses are essentially the same as was done previously in the analyses of the differences among coders. Table 9 presents the average difference between log and observation time in each class for language arts, reading, math, science, social science and transitions as a percentage of the square root of the estimated variance among classes and students (see Table 5). The square roots of the Obs. Lang. Log Arts Dev. Abs. Obs. Reading Log Dev. Ems. Obs. Math Log Dev. Abs. Obs. Science Log Dev. Abs. Obs. Social Log Studies Dev. Abs. Obs. Seat- Log work Dev. Abs. Cbs. Tran- Log sitions rev. Ame. 70 Table 8 Activity Time Measure Differences Class 1 2 3 4 38.96 41.88 36.02 44.32 52.42 54.75 30.32 48.12 35% 31% 16% 9% 91% 44% 66% 22% 67.58 37.54 77.82 82.54 70.34 37.75 78.91 93.45 4% 1% 1% 13% 13% 3% 10% 39% 35.18 29.83 29.23 63.31 39.39 34.31 27.30 47.11 12% 20% 7% 26% 66% 91% 32% 93% 2.32 9.86 48.51 9.33 3.21 11.25 51.35 12.07 7% 14% 6% 29% 36% 100% 51% 100% 4.42 8.61 32.24 0.00 3.21 11.83 34.95 5.06 27% 37% 8% - 74% 100% 51% 100% 16.64 44.90 5.51 7.09 0.00 22.44 3.81 8.00 100% 50% 31% 12% 100% 59% 90% 6% 37.80 44.50 30.15 15.72 42.44 50.26 7.46 30.33 12% 13% 75% 93% 52% 37% 100% 86% 68.87 78.02 13% 31% 21.48 22.86 6% 8% . 34.02 27.53 19% 58% 1.69 6.02 256% 100% 39.07 42.91 10% 17% 32.57 40.70 25% 63% 19.68 16.52 16% 25% 67.92 65.22 15% 41,74 47.25 13% 35% 37.99 40.20 6% 21% 14.34 0.00 100% 100% 17.88 33.96 90% 69% 5.04 24.66 389% 92% 47.01 13.71 71% 100% Whole Group Group vidual Super- vised Super- vised Dev. Obs. 109 Dev. Abs. 1 82.75 95.07 15% 94% 8.72 0.00 100% 100% 134.42 133.94 0% 3% 103.48 207.86 101% 100% 116.44 21.14 82% 100% 71 Table 8 (Cont'd.) 75.07 90.78 21% 53% 8.79 9.26 5% 9% 121.62 103.33 15% 50% 106.31 140.44 32% 96% 82.90 23.61 28% 94% Class 3 4 186.70 101.58 179.63 108.54 4% 7% 24% 18% 6.29 19.94 4.50 29.95 28% 50% 30% 45% 53.66 152.61 66.99 104.52 19% 32% 47% 96% 216.57 122.92 234.20 203.46 8% 66% 72% 90% 26.51 142.21 11.15 98.51 43% 69% 85% 100% 92.61 123.80 34% 53% 63.50 15.05 76% 90% 92.53 73.77 20% 57% 151.50 202.93 34% 92% 84.44 58.14 69% 98% 6 103.82 119.33 15% 53% 17.15 16.71 3% 3% 113.98 122.02 7% 21% 140.17 193.24 38% 94% 89.94 22.46 25% 60% Lang. Arts Read. Math Sci. Social Stud. Trans. 72 Table 9 Ratio of Time Measure Differences Over Sources of Variance Class Student Class Student Class Student Class Student Class Student Class itions Student 1 133% 260% 13% 37% 34% 1% 7% 38% 374% 127% 249% 1% 47% 8% 20% 47% 465% 3 56% 110% 5% 15% 15% 17% 36% 186% 1830% Class 4 37% 74% 55% 148% 130% 16% 31% 120% 1178% 5 90% 177% 6% 19% 52% 26% 24% 26% 255% 6 27% 52% 25% 75% 99% 273% 2685% 73 estimated variances can be thought of as roughly the average difference between the mean of all classes (or students within a class) and each given class (or student within a class). Since these are the differences in which researchers studying time on task are generally interested, the relationship between the magnitude of the error in teacher logs as compared with observer notes with these differences should be very useful for interpreting the seriousness of these errors. 
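A short sketch of how these comparison ratios might be computed from the per-class summaries is given below; the single-category inputs and variable names are illustrative assumptions.

```python
# Illustrative computation of the comparison ratios used in Tables 8 and 9
# for one activity category in one class. Inputs are assumed per-class summaries.
import math

def comparison_ratios(mean_diff, mean_abs_diff, mean_obs_time,
                      var_classes, var_students):
    """mean_diff: average (log - observation) minutes per student per day;
    mean_abs_diff: average |log - observation|; var_*: Table 5 components."""
    return {
        "pct_of_obs_time": 100 * abs(mean_diff) / mean_obs_time,
        "pct_of_abs_diff": 100 * abs(mean_diff) / mean_abs_diff,   # how systematic
        "pct_of_class_sd": 100 * abs(mean_diff) / math.sqrt(var_classes),
        "pct_of_student_sd": (100 * abs(mean_diff) / math.sqrt(var_students)
                              if var_students > 0 else None),
    }
```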
The metric for all the above statistics is minutes per day per student. It should be remembered that the discrepancies between pursuit records produced from logs and observations are due both to coder unreliability and differences between the logs and observations. There was no good.way to separate these sources of differences between the log and observation pursuit records. If one compares Table 7 containing the differences between multiple codings of the same observations with Table 8 containing the differences between logs and observations, the differences between the logs and observations were in general considerably larger than the differences between coders. This was true despite the fact the observation versus log means were based on eight days, while the multiple codings were based on only two days, giving a much greater opportunity for nonsystema- tic differences between observers and teachers as Opposed to multiple coders to cancel each other out over days. This suggests that a major portion of the discrepancies 74 between the logs and observations are due to real differ- ences in the information they contain rather than to coder unreliability in translating the written descriptions into pursuit records. The discrepancies between logs and observations in the average time per day per student in language arts ranged from 4% to 35% of the average time in language arts computed from the observations across the six classrooms. There was also a substantial difference across classrooms in the extent the differences between the log and observa- tion time was systematic. The ratio of the difference over absolute value difference between logs and observa- tions ranged from 15% to 91%. As was stated above, one way to assess the seriousness of differences between the logs and observations is to compare their magnitude with the variability among classes and students within classes in language arts time. The ratio of the average difference between log and observation time in language arts over the square root of the estimated variance among classes and students within classes is presented in Table 9. These ratios ranged from 27% to 133% across classes for the square root of the variance among classes and 52% to 260% for the square root of the variance among students within classes. In two of the classes for class variability and four of the classes for student variability, the ratios were greater than 100%. It would seem reasonable to con- clude in such cases, substituting teachers as a source of 75 time on task information for observers would be question- able if a researcher is interested in investigating differences among classes or students within classes. The fact that in four of the classes the average difference between log and observation time measures was less than half as large as the average absolute value difference suggests that teacher logs may be an acceptable source of data in large scale studies where the error in the data they provide would cancel out. The consistency between the logs and observations in terms of the time spent in reading was substantially better than language arts. The difference between log and observation time measures ranged from.1% to 13% of the time in reading measured from observations. The differences also did not tend to be systematic. The differences between log and observation measures ranged from.3% to 39% of the absolute differences. 
This suggests the relatively small discrepancies in the log measures as compared with the observation measures would cancel out even further in larger data sets. The average difference between log and observation measures of reading time ranged from.l% to 55% as large as the square root of the estimated variance among classes in reading and 3% to 148% as large as the square root of the estimated variance among students within classes in reading. This suggests that one can obtain sufficiently accurate data from teacher logs of reading time for 76 studying differences among classes and in most cases differences among students within a class. Also the larger the data set the less likely there is to be a problem of measurement error. In mathematics the average difference between logs and observations ranged from 6% to 26% the average time in mathematics per student per day. The differences between log and observation measures were somewhat more systematic than reading or even language arts. The differences ranged from 21% to 93% of the average absolute value difference. In all but one class the average difference between the log and observation measures of mathematics time were less than approximately half the size of the square root of the estimated variance among classes in the time spent in mathematics instruction. The estimated variance among students within classes in mathematics instruction was zero. This suggests that teacher logs are in general an acceptable measure of student time in mathematics for studies investigating differences among classes. In two of the classrooms, the logs and observations differed substantially in the amount of time the recorded that student spending learning science. In one class the observations indicated that students spent on average 14.34 minutes a day in science while the logs indicated they spent no time in science. In the other class, the logs indicated the students averaged 6.02 minutes per day while 77 the observations indicated they averaged 1.69 minutes per day. In the other four classes the logs and observations were much closer in agreement regarding the time spent in science. The discrepancies in log and observation time ranged from 6% to 29% of the time in science recorded from the observations. The differences in the time recorded in science from logs and observations was much more systematic than in the other subject matters. In four of the six classes the difference and absolute value difference were equal indicating either the log or observation time was consistently higher than the other across all students on all days included. In the other two classes the ratio of the difference to the absolute difference was 36. With the exception of one class, the differences between log and observation measures of science time were considerably smaller than the variability among classes in the time per day per student spent on science instruction. The differences ranged from 1% to 26% of the square root of the estimated variance among classes for these five classes. In the sixth class the difference was 86% as large as the estimated variance among classes. Although the differences between logs and observation measures of the time students spent in science may be systematic and not cancel out in large studies, they seem.to be small in most cases as compared to the variability among classes in the time spent in science instruction. 
As in 78 mathematics, the estimated variance among students within a class in the time spent in science instruction was zero. Five of the six teachers recorded on average more time in social science than the classroom observers. In two of the classrooms the mean difference between the logs and observations was substantial. In one class there was an average of 5.06 minutes of social studies recorded in the logs per student per day while there was no time in social studies recorded at all in the observations. In the other there was an average of 33.96 minutes recorded in the logs while only 17.88 minutes recorded in the obser- vations. The other four classes had considerably smaller mean differences between the log and observation time in social studies. They ranged from 8% to 37% of the observa- tion time recorded in social studies. In two classes the average difference between log and observation measures of the time spent in social studies instruction was equal to the absolute value of the average time and only in one class was the difference less than half as large as the absolute value difference. This suggests that like science, the differences in the log and observation measures of social studies time tend to be more systematic than in the other subject matters. As with science, only in one class (the same class), was the average difference between log and observation time in social studies nearly as large as the square root of the variance among classes. This suggests that teacher 79 logs probably can in most cases provide a satisfactory source of time on task data for researching differences among classes for social studies. The estimated variance among students within a class was also zero for social studies. There were substantial differences in the amount of time coded from logs and observations for mixed seatwork in three classes. In one class, over 16 minutes of seat— work on average was recorded from the observations but none was recorded from the logs. In another class almost four times as much time in mixed seatwork was recorded from the logs as from the observations. There also seemed to be a tendency in five of the six classes for the logs or observations to be systematically higher than the other across classes and students. One should note however that in three of the classes the logs indicated more time in mixed seatwork while in the other three the observations indicated more time. Mixed seatwork, as can be seen in the coding procedures presented in appendix B in item 50, constituted a general category for when students were working at their seats on a number of different subject matters and it was not possible to tell upon which subject matters individual students were working. This might explain why there were some large discrepancies in the amount of time coded from logs and observations in this category. The average difference between the time coded from 80 logs and observations in transitions between lessons was over 50% as large as the total time coded as transitions between lessons from the observations in three of the six classes. In the three classes with large differences between the amount of time coded from logs and observa- tions, the differences also tended to be systematic, with the average difference per student per day equal to or nearly equal to the average absolute value difference per student per day. 
The average difference per day per student coded from logs and observations was greater than the square root of the estimated variance among classes in three of the classrooms, and substantially greater than the variance among students within classes for all six classrooms. This suggests that it may be necessary to use observers to record transition time in classrooms particularly if the researchers are interested in differences among students within a class. It does not seem surprising that teachers seem to have difficulty keeping track of transition time since this is a busy time for them. There also might be some internal pressure for teachers to minimize transition time since it reflects on their classroom management skills, though this might not be the case since three of the six teachers actually recorded more transition time than the observers. In all but one of the classes more time was recorded in individual activities and less in whole group activities 81 by the observers as compared to the teachers. The observers also recorded more time on the average than the teachers in subgroup activities for four of the six classrooms. With the exception of the subgroup category in which relatively small amounts of time were coded, the difference between the amount of time coded from logs and observations was small compared to the amount of time coded from the observations. There were large and consistent differences between the time recorded as teacher supervised and unsupervised activities in the logs and observations for all six class- rooms. There seems to be a strong tendency for more time to be recorded as teacher supervised by the teacher as compared to observers. In two classrooms the differences were approximately an hour and a half per day per student. In the other classroom the differences between the logs and observations were also substantial. In five of the classrooms the mean deviation for supervised time exceeded 30 minutes, while in the other classroom it was 17.63 minutes. The differences also were systematic across days and students within a class. The use of logs does not seem to be an acceptable method of data collecting for the time students spend directly supervised by the teacher. As with transition time, teachers might feel pressure to indicate they are supervising instruction more than they really are. Part of the problem.mdght also be in hOW'the teachers recorded 82 classroom activities on the logs. An example of the form that was used by the teachers to log student activities is contained in appendix A. As will be discussed in chapter five, it may be possible to improve the ability of teachers to keep track of the time they spend supervising individual students by changing this form. In summary, a comparison of time allocation data coded from teacher logs and observer notes of the same student activities analyzed at the level of time per day indicates there are more than trivial differences between the two data collection methods in the time they record students spend in different activities. The most drastic differ- ence found was for teacher supervision. There was a consistent tendency across all six classes for teachers to record substantially more student time being directly supervised by themselves than recorded by classroom observers. The differences between log and observation time estimates for the other activity categories, with the exception of reading, were not consistent across classrooms. 
In reading, although there was a consistent tendency across classrooms for teacher logs on the average to indicate more time in reading, these differences were relatively small. In reading for all classrooms and for most classrooms in the other categories with the exception of teacher supervision, the absolute mean differences were substan- tially larger than the mean differences. This indicates 83 that across students and days, neither logs nor observa- tions recorded more time in a category consistently across days and students. This suggests that while the differ- ences between the logs and observations time recorded in an activity category on a single day for a single student may be large, these differences would tend to cancel out to some extent over days and students in a large scale study. INDIVIDUAL PURSUIT LEVEL ANALYSIS When time in different activities measured using teacher logs and observer notes at the level of time per day per student was compared, large differences between the two approaches were found in many cases. In this section the differences between the log and observation measures will be examined at the level of individual pupil pursuits as defined by Harnischfenger and Wiley (1976). Since this is the level at which the activity data is both collected through logs and observations and coded into pursuit records, it should provide better insights as to how differences between log and observation measures of student activities result. As was described in chapter three, pupil pursuit records were coded from the logs and observations for analysis. They included the beginning and ending time of the pursuit or activity and codings categorizing it in terms of the activity type, student grouping and 84 supervision. If one or more of these categories changed, the pursuit was considered to have ended and a new one started. Due to discrepancies between the logs and observa- tions, there was not always a one to one match between the pursuit records coded from logs and observations of a given student's activities on a given day. For example, the log pursuit records might show a student spending from 9:00 to 10:00 studying science individually without teacher supervision, while the observation pursuit records might show the student in transition from 9:00 to 9:05, then studying science individually without supervision from 9:05 to 9:30 then doing seatwork with the whole class under supervision from 9:30 to 10:05. To allow analysis of the discrepancies between log and observation pursuit records, a coding procedure described in detail in chapter three was used. Although the coding procedure was quite straightforward, errors in the coding process would be confounded with true discrepancies between the log and observation pursuit records. For this reason, multiple codings of two student days were done to assess the extent of coder unreliability. The length of each pursuit record from the two codings was correlated for each of the two student days and found to be .99 and .96 respectively. In addition, the codings for activity type, group type and teacher supervision from the two codings were crosstabulated. For the first 85 student day that was double coded, of the thirty—four pursuits, there was disagreement for one pursuit each on the three dimensions. For the second student day there was complete agreement for group type, and discrepancies for two pursuits each for activity type and teacher supervision coding for the total of fifty-two pursuits. 
These results suggest that error in coding will have a trivial effect on the results of the analysis at the pursuit level.

PURSUIT LEVEL ANALYSES

As was discussed in chapter three, the hypothesis has been made that there are two major sources of the discrepancies between the logs and observations. First, there are differences in the recorded length of pursuits. A teacher may indicate in his or her log that an activity started at 9:05 and ended at 9:33, while the observer may indicate in his or her notes that the activity started at 9:08 and ended at 9:39. The second major source of discrepancies is differences and ambiguities in the descriptions of the student activities contained in the logs and observations, resulting in differences in the codes describing the activities in terms of subject matter, grouping and supervision. For example, the same activity may be coded from the logs as whole group social studies with teacher supervision while being coded as subgroup science without teacher supervision from the observations. Separate types of analyses were used to assess the extent and nature of these two types of errors. Discrepancies in the coding of the same student activity will be discussed first.

DISCREPANCIES IN PURSUIT CODING

The discrepancies in the coding of pupil pursuits across all classes are presented in Tables 11 through 16. Table 11 presents the coding of the activity type, which includes the major subject matters, transitions between lessons and mixed seatwork. Table 13 presents the coding of the three grouping categories, and Table 15 the two supervision categories. Each of the tables shows the coding from the logs horizontally and the coding from the observations vertically. The top number in each cell is the number of pursuits in that cell of the table. The second number is the average length of the pursuits in that cell in minutes. The third number is the proportion of pursuits with that code from the observations that fall in that cell, in other words the row proportion.

A log linear analysis was used to test the hypothesis that the discrepancies between the coding of the three dimensions from logs and observations differed significantly among the teachers. There was a significant lack of fit when the classroom dimension was left out of the model for activity type as well as for the group and supervision codes. The likelihood ratio and Pearson chi-squares and their associated degrees of freedom and probabilities for the tests of fit on each dimension are given below in Table 10.

Table 10
Differences Among Classes in Pursuit Categorization

                  DF    LR Chisq.   Pearson Chisq.   p <
Activity Type    240     819.43        907.57        .0000
Group Type        40     314.58        346.25        .0000
Teach. Sup.       15     104.85        102.96        .0000

Given that there are statistically significant differences among classrooms in the discrepancies between the logs and observations in how individual pursuit records are coded on the three dimensions, individual crosstabulations by class are provided in Tables 12, 14 and 16. In order to make these tables readable, only the percent of observation pursuits with that coding in the cell is provided. The results contained in Tables 11 and 12 will be discussed first.
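The comparison summarized in Table 10 can be illustrated with a short sketch. Assuming the matched pursuit records are available as (classroom, log code, observation code) tuples, a likelihood-ratio and Pearson chi-square for the hypothesis that the log-by-observation crosstabulation is the same in every classroom can be computed as below. This is only an illustration of the comparison implied by the degrees of freedom in Table 10 ((classes - 1) x (cells - 1)); it is not the log linear analysis code used in the study, and the data layout is assumed.

    # Illustrative only: homogeneity of the (log, observation) coding
    # distribution across classrooms; data layout is assumed.
    from collections import Counter
    from itertools import product
    import math

    def homogeneity_test(pursuits):
        """pursuits: list of (classroom, log_code, obs_code) tuples."""
        classes = sorted({c for c, _, _ in pursuits})
        log_codes = sorted({l for _, l, _ in pursuits})
        obs_codes = sorted({o for _, _, o in pursuits})
        n = len(pursuits)
        class_totals = Counter(c for c, _, _ in pursuits)
        pooled = Counter((l, o) for _, l, o in pursuits)     # cell counts ignoring class
        observed = Counter(pursuits)                          # cell counts within class
        g2 = x2 = 0.0
        for c, l, o in product(classes, log_codes, obs_codes):
            expected = class_totals[c] * pooled[(l, o)] / n   # same cell proportions in each class
            obs = observed[(c, l, o)]
            if obs > 0:
                g2 += 2.0 * obs * math.log(obs / expected)
            if expected > 0:
                x2 += (obs - expected) ** 2 / expected
        df = (len(classes) - 1) * (len(log_codes) * len(obs_codes) - 1)
        return g2, x2, df

    # For activity type, 6 classes and 7 x 7 cells give df = 5 * 48 = 240, as in Table 10.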
The consistency between the coding of activity type was reasonably good for those pursuits coded in the five major subject matter areas. The proportion of pursuits coded from the logs as being in the same subject as coded in the observations ranged from .73 for language arts to .92 for social studies across all classes. There was a tendency to confuse language arts and reading, especially in classes two, three and six. This is not surprising given the similarity and overlap of these subject matters. There was also a tendency for pursuits coded as language arts from the observations to be coded as mixed seatwork and vice versa. This was a particular problem in classes two and six.

Table 11
Categorization of Activity Type from Logs and Observations
(Each cell: number of pursuits; mean length in minutes; proportion of the observation row)

                                            Logs
Observations    Lang. Arts   Read.    Math   Science  Soc. Stud.  Transitions  Seatwork
Lang. Arts          244        35       3        0        0            7          44
                  11.21      7.40    6.00     0.00     0.00         3.71        9.11
                    .73       .11     .01      .00      .00          .02         .13
Read.                27       296       6       11        0           14           6
                  11.70     11.91   22.17    16.36     0.00         5.64       18.67
                    .08       .82     .02      .03      .00          .04         .02
Math                  8         6     231        0        0            5           4
                   5.38      8.50   11.22     0.00     0.00         2.60       28.50
                    .03       .02     .91      .00      .00          .02         .02
Science               0         0       0       62       16            0           1
                   0.00      0.00    0.00    14.11    38.38         0.00       11.00
                    .00       .00     .00      .78      .20          .00         .01
Soc. Stud.            6         0       0        0       79            0           1
                  24.50      0.00    0.00     0.00    11.63         0.00        3.00
                    .07       .00     .00      .00      .92          .00         .02
Transitions          90        46      39       32       27          243          27
                   3.04      2.87    2.72     2.31     4.33         4.03        4.37
                    .18       .09     .08      .06      .05          .48         .05
Seatwork             49        27       1        0        1           20          40
                   7.39     12.52   17.00     0.00    22.00         5.40       17.25
                    .36       .20     .01      .00      .01          .14         .29

Table 12
Categorization of Activity Type from Logs and Observations by Class
(The per-class percentages in this table are not legible in the scanned copy.)

The pursuits coded as math, science and social studies from the observations were for the most part coded the same from the logs. The proportions were .91, .78 and .92 respectively. The major exception in science was in class six, where all 16 pursuits coded as science from the observations were coded as social studies from the logs.

Large discrepancies were found for those pursuits coded as transitions from the observations. Only 48% of the pursuits coded from the observations as transitions were also coded as transitions from the logs. The inconsistencies in the coding of transition time occurred in all six classes. This suggests, as one might expect, that teachers have trouble keeping track of the time spent in transitions between lessons. In classes one, two and four, there were also a substantial number of pursuits coded from the logs as transitions while being coded as various subject matters and seatwork from the observations. This would explain why there was actually more time coded as transitions between lessons from the logs than from the observations in these classes, as can be seen in Table 8.

Only 29% of the pursuits coded as seatwork from the observations were also coded as seatwork from the logs. These inconsistencies occurred across all the classrooms, though they were a particular problem in classes one, two and four. In general, seatwork was confused with reading and language arts. There was also a strong tendency for pursuits coded as subject matters and transitions from the observations to be coded as seatwork from the logs. Of those pursuits coded as seatwork from the logs, 67% were coded as other than mixed seatwork from the observations.
As was stated earlier, the mixed seatwork category was used when students were working on multiple subject matters at their seats and it was not possible to determine which subject matter a particular student was working on at any given time. It is not surprising that there were a large number of pursuits where there was disagreement on this category.

The crosstabulation of group type coded from logs and observations at the pursuit level is presented in Tables 13 and 14. The major discrepancy was in the subgroup category. Of those pursuits coded as subgroup from the observations, over half were coded as individual pursuits from the logs in classes three through six. In class one, six of the eight pursuits coded as subgroup from the observations were coded as whole group from the logs. The whole group and individual categories were found to be more consistently coded from logs and observations: approximately 80% of the pursuits coded as whole group and individual activities from the observations were coded the same way from the logs.

Table 13
Categorization of Group Type from Logs and Observations
(Each cell: number of pursuits; mean length in minutes; proportion of the observation row)

                                  Logs
Observations     Whole Group   Subgroup   Individual
Whole Group          385          10           95
                   15.63        9.80         6.88
                     .79         .02          .19
Subgroup               8          42           59
                   25.13       13.88        10.58
                     .07         .39          .54
Individual            88          45          575
                   11.48        5.69        12.67
                     .12         .07          .81

Table 14
Categorization of Group Type from Logs and Observations by Class
(percent of observation pursuits)

                             Logs
Class    Obs.         Whole   Sub.   Ind.
One      Whole          68      0     32
         Sub.           75     12     12
         Ind.            7      0     93
Two      Whole          88      4      8
         Sub.            0     76     24
         Ind.           16      8     79
Three    Whole          83      0     17
         Sub.            6      6     88
         Ind.           18      0     82
Four     Whole          98      0      2
         Sub.            5     43     52
         Ind.           14     29     57
Five     Whole          53      7     40
         Sub.            0     34     66
         Ind.           15      0     85
Six      Whole          70      1     29
         Sub.            0     44     56
         Ind.            4      2     94

The crosstabulation of teacher supervision coded from logs and observations at the pursuit level is presented in Tables 15 and 16. There was a strong tendency for pursuits coded as not being supervised by the teacher from the observations to be coded as supervised from the logs. In all six classes over half the pursuits coded from the observations as unsupervised were coded as supervised from the logs. Among those pursuits coded as supervised from the observations, 88% were coded as supervised from the logs. Given these results at the pursuit level, it is not surprising that there was substantially more student activity time categorized as teacher supervised in all the classrooms when the data was aggregated to the level of student time per day.

Table 15
Categorization of Supervision from Logs and Observations
(Each cell: number of pursuits; mean length in minutes; proportion of the observation row)

                              Logs
Observations      Supervised   Unsupervised
Supervised            715            98
                    11.90          5.33
                      .88           .12
Unsupervised          302           152
                    15.12         15.38
                      .67           .33

Table 16
Categorization of Supervision from Logs and Observations by Class
(percent of observation pursuits)

                          Logs
Class    Obs.          Sup.   Unsup.
One      Sup.           90      10
         Unsup.         70      30
Two      Sup.           77      23
         Unsup.         52      48
Three    Sup.           97       3
         Unsup.         80      20
Four     Sup.           84      16
         Unsup.         68      32
Five     Sup.           96       4
         Unsup.         81      19
Six      Sup.           88      12
         Unsup.         63      37

The discrepancies in the categorization of pursuits from logs and observations at the pursuit level parallel quite closely the discrepancies in the average time in those categories coded from logs and observations at the level of total time per student day discussed previously. This suggests that miscategorization of pupil pursuits is a major source of the discrepancies between the time students spend in different activities measured using teacher logs and observer notes.
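The parallel between the pursuit-level crosstabulations and the earlier per-day comparisons can be made concrete by aggregating pursuit records into minutes per category for each student day, which is the unit used in the earlier log and observation comparisons. The sketch below is illustrative only; the record fields are assumed for the example and are not the project's actual coding form fields.

    # Illustrative aggregation of coded pursuit records to minutes per category
    # per student day; the field names are assumed, not the project's codes.
    from collections import defaultdict

    def minutes_per_day(pursuits, category_field):
        """pursuits: dicts with 'student', 'day', 'start', 'end' (minutes since
        midnight) and one or more category fields such as 'activity' or 'supervision'."""
        totals = defaultdict(float)
        for p in pursuits:
            key = (p["student"], p["day"], p[category_field])
            totals[key] += p["end"] - p["start"]
        return dict(totals)

    # Running this separately on log-coded and observation-coded pursuit records
    # and differencing the results gives the per-day discrepancies summarized earlier.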
The other potential source of discrepancies between the time students spend in different activities measured using logs and observations is differences in the recorded times at which pursuits start and end. The analysis of this source of error is discussed next.

DISCREPANCIES IN THE LENGTH OF PURSUITS

A measurement model discussed by Schmidt (1981) was used to assess how consistent teachers were with observers in recording the beginning and ending times of student activities. The model, which was described in chapter three, is essentially the linear regression of an observed score on its true score. In this instance, the pursuit length computed from the observer recorded beginning and ending times is defined as the true score, and the pursuit length computed from the teacher recorded beginning and ending times is defined as the observed score. Schmidt shows algebraically that the error, or the difference between the true and observed score, contains three distinct components. The first is a fixed bias independent of the true score and equal to the intercept of the observed score regressed on the true score. The second is a random bias associated with the magnitude of the true score, equal to the slope of the regression of the observed score on the true score minus one, multiplied by the true score. The third component is random error independent of the true score and equal to the standard error of estimate of the regression of the observed score on the true score. The pursuit lengths from both the observations and the logs (the true and observed scores in the model) are random variables. For this reason, maximum likelihood was used to estimate the parameters and provide correct asymptotic standard errors.

The model was fit separately for the pursuits in each class, since teachers were likely to differ in how accurately they could keep track of the beginning and ending times of the activities of the students in their classes. These results are presented in Table 17. The table contains the mean pursuit length in minutes from the logs and observations, the fixed bias, the random bias coefficient (the slope of the log time regressed on the observation time, minus one), the square root of the random error variance, and the number of pursuits. The table also contains the asymptotic standard errors for the fixed bias component and the random bias coefficient.

Table 17
Sources of Error in Pursuit Length Coded from Teacher Logs by Class
(minutes; asymptotic standard errors in parentheses)

          Obs.      Log       Fixed          Random Bias     Random Error
Class     Mean      Mean      Bias           Coefficient     Stand. Dev.     Count
1         10.78     10.29      0.38 (.20)     -.08 (.01)      2.91           399
2          7.02      6.94      0.05 (.10)     -.02 (.01)      2.02           616
3         10.96     10.65      0.33 (.25)     -.06 (.01)      4.21           445
4         14.40     13.32      1.75 (.38)     -.20 (.02)      5.43           350
5         14.24     14.05      0.54 (.28)     -.05 (.01)      3.12           251
6         11.87     11.56     -0.55 (.22)      .02 (.01)      3.80           510

The model was also fit separately for the pursuits in the five subject matters, transitions between lessons, seatwork, the three group types, and whether or not the pursuit was supervised by the teacher. Only those pursuits that were coded the same from logs and observations were included. These results are presented in Table 20, which contains the same statistics as Table 17. In order to give some perspective on the effect of fixed and random bias when using teachers as opposed to observers to record the beginning and ending times of pursuits, the fixed, random and total bias for pursuit lengths of 15, 30, 45 and 60 minutes for each class are presented in Table 18; the corresponding values for each of the activity categories are presented in Table 21.
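The bias components reported in Tables 17 and 20 can in principle be recovered from a regression of log pursuit length on observation pursuit length. The sketch below is illustrative only: it uses ordinary least squares rather than the maximum likelihood estimation described above, and the function and variable names are assumed rather than taken from the study.

    # Illustrative sketch (OLS, not the maximum likelihood fit used here);
    # inputs are paired pursuit lengths in minutes for matched pursuits.
    import numpy as np

    def schmidt_components(obs_length, log_length):
        """Return (fixed_bias, random_bias_coefficient, random_error_sd)."""
        x = np.asarray(obs_length, dtype=float)   # "true" score: observer pursuit length
        y = np.asarray(log_length, dtype=float)   # observed score: teacher log pursuit length
        slope, intercept = np.polyfit(x, y, 1)
        residuals = y - (intercept + slope * x)
        fixed_bias = intercept                    # A: bias independent of pursuit length
        random_bias_coef = slope - 1.0            # B - 1: bias proportional to pursuit length
        random_error_sd = residuals.std(ddof=2)   # standard error of estimate
        return fixed_bias, random_bias_coef, random_error_sd

    # The expected error for a pursuit of a given true length is then
    # A + (B - 1) * length, plus a random component with the returned SD.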
Shorter sample pursuit lengths (1, 2, 5 and 10 minutes) are used for transitions since, as can be seen from the mean log and observation times presented in Table 20, these pursuits tend to be shorter.

The error of measurement, in this case the difference between the teacher and observer measures of pursuit length, is defined under the model in equation 1:

    e = A + (B - 1)ξ + δ                                              (1)

where e is the error, A is the fixed bias, B is the slope of the regression of the log length on the observation length (so that (B - 1)ξ is the random bias), ξ is the true (observation) pursuit length, and δ is the random error. Since A is a fixed constant, the variance of e is given in equation 2:

    σ²(e) = (B - 1)² σ²(ξ) + σ²(δ)                                    (2)

Table 19 presents the total error variance, the random bias variance and the random error variance for each of the six classrooms, together with the proportion of the error variance due to random bias and to random error. The same statistics are presented in Table 22 for each of the activity categories.

The results of fitting the model across all types of pursuits in each classroom will be discussed first. The fixed bias among the teachers tended to be small and, with the exception of classroom six, positive. Only classroom four had a fixed bias exceeding one minute, and only in classrooms four and six did the fixed bias differ from zero by more than two standard errors. The random bias coefficient among the teachers was also small and, with the exception of classroom six, negative. The magnitude of the random bias coefficient was under .10 with the exception of classroom four, where it was .20. The random bias coefficients, though small, differed from zero by more than two standard errors in all but classrooms two and six. The standard deviation of the random error, which can be thought of as roughly the average amount of random error in a teacher recorded pursuit, ranged from 2.02 minutes in classroom two to 5.43 minutes in classroom four.

Table 18
Bias at Standard Pursuit Lengths by Class
(minutes)

Class   Pursuit Minutes   Total Bias   Fixed Bias   Random Bias
1            15              -.82         .38         -1.20
             30             -2.02         .38         -2.40
             45             -3.22         .38         -3.60
             60             -4.42         .38         -4.80
2            15              -.25         .05          -.30
             30              -.55         .05          -.60
             45              -.85         .05          -.90
             60             -1.15         .05         -1.20
3            15              -.57         .33          -.90
             30             -1.47         .33         -1.80
             45             -2.37         .33         -2.70
             60             -3.27         .33         -3.60
4            15             -1.25        1.75         -3.00
             30             -4.25        1.75         -6.00
             45             -7.25        1.75         -9.00
             60            -10.25        1.75        -12.00
5            15              -.21         .54          -.75
             30              -.96         .54         -1.50
             45             -1.71         .54         -2.25
             60             -2.46         .54         -3.00
6            15              -.25        -.55           .30
             30               .05        -.55           .60
             45               .35        -.55           .90
             60               .65        -.55          1.20

As stated above, with the exception of classroom six, the fixed bias was positive and the random bias negative; in classroom six the signs of the bias components were reversed. This suggests that in the first five classes the teachers tended to overestimate short pursuits and underestimate long pursuits, while in classroom six the teacher tended to underestimate short pursuits and overestimate long pursuits. The reason is that the random bias is a function of the true, or observation, pursuit length: the longer the pursuit, the greater the magnitude of the random bias, while the fixed bias is constant across pursuits of different lengths. As a result, the magnitude of the fixed bias exceeds the magnitude of the random bias up to a given pursuit length, while the magnitude of the random bias is greater for longer pursuits. Through simple algebra it can be shown that the magnitude of the fixed bias equals the magnitude of the random bias when the pursuit length is equal to the absolute value of A/(B - 1), where A is the fixed bias and B - 1 is the random bias coefficient.
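As a concrete check on equation (1), the classroom four estimates from Table 17 reproduce the classroom four entries of Table 18. The short sketch below carries out the arithmetic (it is illustrative only, not analysis code from the study) and also computes the pursuit length at which the two bias components are equal in magnitude.

    # Worked example using the classroom four estimates from Table 17
    # (fixed bias A = 1.75 minutes, random bias coefficient B - 1 = -.20).
    A, Bm1 = 1.75, -0.20

    def total_bias(length):
        return A + Bm1 * length        # equation (1) without the random component

    for minutes in (15, 30, 45, 60):
        print(minutes, round(total_bias(minutes), 2))   # -1.25, -4.25, -7.25, -10.25

    crossover = abs(A / Bm1)           # length at which fixed and random bias are
    print(crossover)                   # equal in magnitude: 8.75 minutes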
At this pursuit length, if the fixed and random bias differ in sign, they cancel out, resulting in no bias. If the fixed and random bias are of the same sign, they are of course additive.

Table 18 presents the total, fixed and random bias for 15, 30, 45 and 60 minute pursuits in each classroom. The total bias is negative across all four pursuit lengths in the first five classrooms, and is negative for 15 minute pursuits and positive for longer pursuits in classroom six. The bias for a 60 minute pursuit was -10.25 minutes in classroom four; in the other five classrooms the magnitude of the bias for a 60 minute pursuit was under five minutes. Only in classroom four did bias seem to be a major factor in teacher recorded pursuit length. This is also the only classroom where both the fixed bias component and the random bias coefficient differed from zero by more than two standard errors.

Table 19 presents the random bias, random error and total measurement error variances for each classroom. These results indicate that random error in recording pursuit length, as opposed to random bias, is the major contributing factor to measurement error variance. Only in classroom four did the variance from random bias account for more than 10% of the total error variance, and in that class it accounted for under a third of the variance.

Table 19
Bias and Error Components of Measurement Error Variance by Classroom
(percent of the total in parentheses)

         Measurement
Class    Error Variance     Random Bias     Random Error
1           9.35              .90 (10)        8.45 (90)
2           4.11              .03 ( 1)        4.08 (99)
3          18.44              .72 ( 4)       17.72 (96)
4          40.17            10.69 (27)       29.48 (73)
5          10.25              .52 ( 5)        9.73 (95)
6          14.51               -- ( 1)       14.51 (99)

Table 20 presents the results of fitting the model for different types of pursuits. This analysis was collapsed across classes to ensure reasonable sample sizes and to keep the sets of results to a manageable number. As when the model was fit across all types of pursuits, bias did not seem to be a significant factor in recording pursuit length in the five subject matters. The fixed bias was positive and under a minute in all five areas; only in math and language arts did it differ from zero by more than two standard errors. The random bias coefficient was negative and less than .1 in magnitude for all five subjects; only in reading and math did it differ from zero by more than two standard errors. These results suggest that for science and social studies, the classical true score model may be appropriate for modeling error in teacher recorded pursuit length. Since under the classical true score model error is random with an expected value of zero, it would tend to cancel out in large scale studies. This indicates that for those pursuit categories where the classical true score model fits, the error in teacher recorded pursuit length should not be a problem.

Table 20
Sources of Error in Pursuit Length Coded from Teacher Logs by Activity
(minutes; asymptotic standard errors in parentheses)

                 Obs.     Log      Fixed          Random Bias     Random Error
Activity         Mean     Mean     Bias           Coefficient     Stand. Dev.     Count
Lang. Arts       11.21    11.51     .52 (.19)      -.02 (.01)      2.08            244
Reading          11.91    11.82     .29 (.21)      -.03 (.01)      2.75            296
Math             11.17    11.26     .88 (.30)      -.07 (.02)      3.50            230
Science          14.11    14.40     .32 (.96)      -.01 (.05)      5.41             62
Soc. Stud.       11.63    12.17     .62 (.53)      -.01 (.03)      3.49             79
Transitions       4.03     4.49    1.75 (.23)      -.32 (.05)      1.91            241
Seatwork         17.25    16.87    2.77 (.82)      -.18 (.04)      4.19             38
Whole Group      15.63    15.79     .92 (.26)      -.05 (.01)      3.59            386
Subgroup         13.88    14.09    -.18 (.98)       .03 (.05)      4.28             42
Individual       12.67    12.68     .75 (.21)      -.06 (.01)      3.51            573
Supervised       11.90    11.97     .59 (.15)      -.04 (.01)      2.95            715
Unsupervised     15.38    15.01    2.08 (.69)      -.16 (.03)      5.40            152
Table 21 presents the fixed, random and total bias estimates for the five subject matters as well as for transitions and mixed seatwork.

Table 21
Bias at Standard Pursuit Lengths by Activity
(minutes)

Activity         Pursuit Minutes   Total Bias   Fixed Bias   Random Bias
Language Arts         15              .22          .52          -.30
                      30             -.08          .52          -.60
                      45             -.38          .52          -.90
                      60             -.68          .52         -1.20
Reading               15             -.16          .29          -.45
                      30             -.61          .29          -.90
                      45            -1.06          .29         -1.35
                      60            -1.51          .29         -1.80
Math                  15             -.17          .88         -1.05
                      30            -1.22          .88         -2.10
                      45            -2.27          .88         -3.15
                      60            -3.32          .88         -4.20
Science               15              .17          .32          -.15
                      30              .02          .32          -.30
                      45             -.13          .32          -.45
                      60             -.28          .32          -.60
Social Studies        15              .47          .62          -.15
                      30              .32          .62          -.30
                      45              .17          .62          -.45
                      60              .02          .62          -.60
Transitions            1             1.43         1.75          -.32
                       2             1.11         1.75          -.64
                       5              .15         1.75         -1.60
                      10            -1.45         1.75         -3.20
Mixed Seatwork        15              .07         2.77         -2.70
                      30            -2.63         2.77         -5.40
                      45            -5.33         2.77         -8.10
                      60            -8.03         2.77        -10.80
Whole Group           15              .17          .92          -.75
                      30             -.58          .92         -1.50
                      45            -1.33          .92         -2.25
                      60            -2.08          .92         -3.00
Subgroup              15              .27         -.18           .45
                      30              .72         -.18           .90
                      45             1.17         -.18          1.35
                      60             1.62         -.18          1.80
Individual            15             -.15          .75          -.90
                      30            -1.05          .75         -1.80
                      45            -1.95          .75         -2.70
                      60            -2.85          .75         -3.60
Supervised            15             -.01          .59          -.60
                      30             -.61          .59         -1.20
                      45            -1.21          .59         -1.80
                      60            -1.81          .59         -2.40
Unsupervised          15             -.32         2.08         -2.40
                      30            -2.72         2.08         -4.80
                      45            -5.12         2.08         -7.20
                      60            -7.52         2.08         -9.60

Of the five subject matters, only in reading and math does the total bias exceed a magnitude of one minute for even a 60 minute pursuit, and for both subjects the magnitude of the bias was under five minutes for a 60 minute pursuit. This suggests that even in the two subject matters where the bias differed from zero by more than two standard errors, the magnitude of the bias is relatively slight.

Table 22 presents the random bias and random error components of the total measurement error variance (the variance of log time minus observation time) for the 12 activity categories. In science and social studies, random bias accounted for less than one percent of the total measurement error, and in the other subject matter categories random bias also accounted for a very small portion of the total error variance. This, coupled with the results presented above, suggests that the major source of error in teacher recorded pursuit length is random error rather than bias.

Bias was a more important factor in recording the pursuit length of transitions between lessons. The fixed bias was 1.75 minutes and the random bias coefficient was -.32. The fact that the fixed bias and the random bias coefficient differed in sign indicates they would tend to cancel each other out, with no bias when the pursuit length was 5.46 minutes. Both the fixed bias and random bias coefficient differed from zero by well over two standard errors.
These results suggest that teachers have difficulty keeping track of the length of time their students spend in transitions. Teachers tend to overestimate short transitions (less than 5.46 minutes) and underestimate longer ones. This is not surprising given that teachers are generally quite busy managing the classroom during these periods and probably do not have time to look at the clock and record beginning and ending times. Table 21 presents the estimated fixed, random and total bias for transition pursuits of 1, 2, 5 and 10 minute lengths; shorter pursuit lengths were used for transitions because transitions tend to be shorter than other activities, as can be seen from the mean observation and log pursuit lengths given in Table 20. The estimated bias in teacher recorded pursuit length for a one minute transition is 1.43 minutes, greater than the actual pursuit length. For a 10 minute transition, the total bias has reversed sign and is -1.45 minutes. Random bias accounted for 15% of the error variance for transitions between lessons. This was far greater than was found for the five subject matter areas, though random error still seems to be the major contributor to error variance in teacher recorded pursuit length.

Table 22
Bias and Error Components of Measurement Error Variance by Activity
(percent of the total in parentheses)

                   Measurement
Activity           Error Variance    Random Bias     Random Error
Language Arts         4.39             .05 ( 1)        4.33 (99)
Reading               7.73             .17 ( 2)        7.56 (98)
Mathematics          13.08             .83 ( 6)       12.25 (94)
Science              29.29             .02 ( 0)       29.27 (100)
Social Studies       12.20             .02 ( 0)       12.18 (100)
Transitions           4.27             .62 (15)        3.65 (85)
Seatwork             27.96           10.40 (37)       17.56 (63)
Whole Group          13.47             .58 ( 4)       12.89 (96)
Subgroup             18.47             .15 ( 1)       18.32 (99)
Individual           12.91             .58 ( 4)       12.32 (96)
Supervised            8.99             .29 ( 3)        8.70 (97)
Unsupervised         33.54            4.38 (13)       29.16 (87)

Bias was also a factor in the teachers' recording of the pursuit length of mixed seatwork. The fixed bias was 2.77 minutes and the random bias coefficient was -.18, and both differed from zero by well over two standard errors. Random bias also made up a significant portion of the total error variance. Apparently teachers are more biased in keeping track of the time their students spend doing mixed seatwork than they are in keeping track of the time their students spend in the five subject matter areas. As can be seen in Table 21, teachers tend to slightly overestimate short mixed seatwork pursuits (.07 minutes for a 15 minute pursuit) and underestimate longer pursuits (-8.03 minutes for a 60 minute pursuit).
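The decomposition reported in Tables 19 and 22 follows directly from equation (2). The sketch below is illustrative only; the inputs are assumed to come from a fit like the one sketched earlier, and none of the names are taken from the original analysis code.

    # Illustrative decomposition of measurement error variance, following
    # equation (2): var(e) = (B - 1)^2 * var(true length) + var(random error).
    def error_variance_components(random_bias_coef, true_lengths_var, random_error_sd):
        random_bias_var = random_bias_coef ** 2 * true_lengths_var
        random_error_var = random_error_sd ** 2
        total = random_bias_var + random_error_var
        return {
            "total": total,
            "random_bias": random_bias_var,
            "random_bias_pct": 100 * random_bias_var / total,
            "random_error": random_error_var,
            "random_error_pct": 100 * random_error_var / total,
        }

    # Example with the seatwork estimates from Table 20 (B - 1 = -.18, random
    # error SD = 4.19) and an assumed variance of about 321 for the observed
    # pursuit lengths (back-computed, not reported in the text), which roughly
    # reproduces the 37% / 63% split shown for seatwork in Table 22.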
There was only a small amount of fixed and random bias in the recording of the pursuit length for all three grouping categories. Both the fixed bias and the random bias coefficient for the subgroup category differed from zero by less than one standard error. The fixed bias for whole group and individual pursuits was .92 and .75 respectively, and both were larger than three standard errors. The random bias coefficients were -.05 and -.06 for whole group and individual pursuits respectively, both differing from zero by more than three standard errors. Despite the fact that one can be quite confident that the random bias differed from zero for whole group and individual activities, it made up only a small portion of the total error variance: as can be seen in Table 22, random bias accounted for only 4% of the error variance for both individual and whole group pursuits. As can be seen in Table 21, the total bias for a 60 minute pursuit was less than three minutes for all the grouping categories. The square root of the random error variance for whole group, subgroup and individual pursuits was 3.59, 4.28 and 3.51 minutes respectively. As with the five subject matters, this was about 20% to 30% as large as the average pursuit length.

As one might expect, teachers had more difficulty keeping track of the beginning and ending times of unsupervised pursuits than of supervised pursuits. There was both more bias and more random error for unsupervised pursuits. The fixed bias for supervised and unsupervised pursuits was .59 and 2.08 minutes respectively, and in both cases the fixed bias estimates were larger than three standard errors. The random bias coefficients for supervised and unsupervised pursuits were -.04 and -.16 respectively, and both differed from zero by more than three standard errors. The standard deviation of the random error was also larger for unsupervised than for supervised pursuits, 5.40 to 2.95 minutes respectively. For both categories, random error as opposed to random bias tended to make up the major portion of the total measurement error variance.

When the pursuit data was aggregated to the level of total time per day per activity category, the teacher logs indicated considerably more time spent as supervised and less as unsupervised than the observer field notes. As was discussed in the previous section, this was at least partly explained by differences in the categorization of pursuits. The results in Table 20 suggest it is also partly explained by a tendency for teachers to overestimate on the average the length of supervised pursuits and underestimate the length of unsupervised pursuits. The mean length of a supervised pursuit from the observations is slightly smaller than from the logs (11.90 to 11.97), and the reverse is true for unsupervised pursuits to a greater extent (15.38 to 15.01). It is difficult to evaluate the impact of this bias at the level of time per day supervised and unsupervised, since there may be many supervised and unsupervised pursuits per student per day. What seems clear is that the tendency for teachers to underestimate the length of unsupervised pursuits and overestimate the length of supervised pursuits, along with miscategorization, results in considerably more student time being recorded as teacher supervised from teacher logs than from observer notes.

In summary, a model with a fixed and/or random bias component is necessary for some teachers, and for some activity or pursuit categories across teachers, while in others the classical true score model is adequate. Even in those categories, or for those teachers, where the random bias was statistically significant, random error made up the major portion of the total error variance. The random and fixed bias point estimates differed in sign for each teacher and each pursuit category. With the exception of teacher six and subgroup instruction, the fixed bias was positive and the random bias negative. This indicates that in most cases teachers tend to underestimate longer pursuits and overestimate shorter ones. The square root of the random error variance ranged from about 20% to 30% as large as the average pursuit length across teachers and pursuit categories.
This component can be thought of as roughly the average difference between teacher and observer recorded pursuit length due to random errors. Though substantial, this component which is random with an expected value of zero under either Schmidt's model or the classical true score model, would tend to cancel out when averaged over pursuits. CHAPTER FIVE CONCLUSIONS AND IMPLICATIONS FOR.MEASURING TIME ON TASK This chapter begins with a summary of the results of assessing the use of teachers to collect classroom time allocation data. The results will be discussed in the order they were presented in chapter four. This will be followed by a discussion of the implications of the results for future studies of time allocation and how data collection procedures might be improved. SUMMARY OF THE RESULTS For four days each in two of the classrooms two individuals coded the observer transcripts in order to assess the reliability of the coding procedures. The time in five subject matter areas coded by two different coders from the same observer transcripts was fairly consistent. In most cases, the average difference between coders as well as the average of the absolute value of the difference was less than 10 percent of the time coded for that subject matter. The coders had some difficulty coding mixed seatwork, suggesting there may be some con- fusion as to what is mixed seatwork and what is instruction in a given subject matter. One set of coders had large 116 117 differences in the amount of time they coded in transitions between lessons. The other coders were quite consistent in the time coded as transitions. There were large differences among some of the pairs of coders in the time students spent in the different grouping categories. The most substantial difference was between one pair of coders for the subgroup category where one coded approximately 24 minutes a day and the other coder coded essentially no time in this category. The fact the other three pairs of coders were fairly consistent in the time they coded in this category suggests this was probably an isolated problem. All 4 pairs of coders were fairly consistent in the time coded in the supervisory categories. There also did not seem to be a consistent tendency of one coder or the other to record more time as either supervised or unsupervised across days and students. The coders in general were quite consistent in the time they coded in the different activity categories. Although there were cases in which a pair of coders differed substantially in the time they coded for a specific category, there were no categories where a majority of the four pairs of coders differed substantially in the time they coded in the category. This suggests that differences between coders in the time they code is probably idiosyncratic, possibly having more to do with differences in how they coded a specific activity, rather than any general tendency. Only two days of a single 118 class were coded by each pair of coders due to the sub- stantial amount of time and effort it took to code observer notes. If a single activity that a large portion of the class participated in was coded differently by the coders, this single discrepancy could result in substantial differences in the time coded by the coders in the two categories. Generalizability theory was used to estimate the reliability of a number of sample data collection designs. This provided insights on how best to collect reliable data in a cost effective fashion. 
The facets were measures (observations and logs), classes, days and students. The time categories were language arts, reading, math, science, social studies and transitions between lessons. Estimates of the variance components were computed for each facet and all the existing interactions among the facets. Classes, days and the measure by day interaction tended to be the largest components. The major exception was the class by measure interaction which was the largest component for transitions. The reliability for measuring differences among classes was estimated for all possible combinations of using logs or observations, two or four days, and five or ten students. The reliability for measuring differences among students within a class was estimated for all com- binations of logs or observations and two or four days. The results suggest that increasing the number of days 119 observed is generally the most effective method of increas- ing reliability, though using observers as opposed to teachers results in a substantial improvement. Increasing the number of students observed had little or no effect on the reliability. It seems clear that reasonably reliable data, e.g. coefficients of .80, can be obtained with the use of either teachers or observers as the data source and a reasonable number of days observed, e.g. under fifteen, if the focus is on assessing differences among classes. If, however, a research is interested in measuring differences among students within classes, the task is much more difficult if not impossible. There was no variance among students for science and social studies and hence differences among students cannot be measured. In the other categories there were relatively small amounts of variance among students. Only in reading and possibly language arts could differences among students be measured reliably and only through the use of observers on a large number of days. For eight days in three of the classrooms and nine days in the other three classrooms observers as well as the classroom teachers recorded student activities and the time they occurred. The amount of time coded from the teacher logs and observer transcripts was compared for five subject matters, transitions between lessons, seat- work, group type and whether or not the activity was supervised by the teacher. Although the differences between 120 the time coded from the logs and observations probably contain differences due to coder error as well as differ- ences in the information recorded by the teacher and observer, the differences found in the time in different categories between the log and observation data far exceeded that between multiple codings of the same obser- vation discussed above. This indicates that there are real and substantial differences between the information collected by teachers and observers. The most striking and consistent difference across all the classrooms was the tendency for teachers to record more supervised time and less unsupervised time as com: pared with the observers. This was true for all teachers. The difference ranged from 11.51 minutes per day less time as unsupervised being recorded by one teacher to a high of 104.48 minutes per day more time being recorded as supervised by another teacher. If the assumption is made that the error is in the teacher logs rather than the observer notes, these results indicate the use of teachers to record the time their students spend under their direct supervision will not provide accurate data. 
Reading was the only other category where there were consistent differences between the logs and observations. In all the classrooms the teachers recorded more time in reading than the observers, though the differences were nowhere near as great as in the supervision categories. The fact the mean absolute value of the difference between 121 logs and observations was substantially larger than the mean difference in all the classrooms indicates that none of the teachers consistently recorded more time than the observers across all the days and students in his/her class in reading. In all the other activity categories as well as the three group type categories, there was no consistent tendency across classes for more time to be recorded by the teachers or the observers. The magnitude of the dis- crepancies between log and observation time varied widely from classroom to classroom. Also there was no classroom where the discrepancies were consistently large or small across the categories as compared with the other classrooms, suggesting that there were no teachers that were particular- ly accurate or inaccurate in recording student activities. In order to better understand the differences that existed between the teacher logs and observer notes, the discrepancies were also analyzed at the level of individual pursuit records. Given the nature of the pursuit records, a coding process was used to create a one to one match between the log and observation pursuits. Given the size of the data set, this was done only on a random sample of days in each classroom for a random sample of students within each group of students that had similar activities. Multiple codings of two student days were done to assess the reliability of this coding process and it was found to be quite reliable. 122 Two types of errors could exist between pursuit records coded from the logs and observations. First they could differ as to how the pursuit was categorized in terms of the type of activity and/or the group and super- vision type. Secondly, differences could exist in the recorded length of the pursuit. Separate procedures were used to assess each type of difference. A crosstabulation of the codings of each pursuit from the logs and observa— tions was done both across classes and.within classes for activity type as well as group and supervision type. A measurement model developed by Schmidt (1981) was used to assess the discrepancies in pursuit length recorded in the observations and logs. The consistency between the log and observation data was quite good for the categorization of the five subject matter areas. This was not the case for transitions between lessons and seatwork. The log and observation codings were reasonably consistent for whole group and individual instruction. This was not the case for subgroup activities where the majority of pursuits coded in this category from the observations were coded as individual pursuits from the logs. As one might expect from the results at the level of total time per day per student, the majority of pursuits coded as unsupervised from the observations were coded as supervised from the logs. Schmidt's model allowed the error or discrepancies in pursuit length between the logs and observations to be 123 partitioned into three distinct components. The first is a fixed bias in estimating pursuit length from the logs as compared with the observations. The second is a random bias associated with pursuit length. The third is a random error independent of pursuit length. 
The model was fit for each class across all pursuits and for individual pursuit categories across all teachers. With the exception of classroom six, there was a slight positive fixed bias and a slight negative random bias. In classroom six the signs were reversed. This indicates that in the first five classrooms the teachers tended to overestimate short pursuits and underestimate long pursuits. In classroom six this tendency was reversed. With the exception of classroom four, these tendencies were small. There was a substantial amount of random error in estimating pursuit length in all six classrooms. The standard deviation of the error variance ranged from about 20% to 30% of the average length of a pursuit or from about 2 to 5 minutes across the classes. As when the model was fit for the pursuits in each class, there was in general a slight positive fixed bias and slight negative random bias when the model was fit for different types of pursuits with the following exceptions. Both the random and fixed bias component were substantially higher for transitions between lessons, seatwork and unsupervised activities. This suggests teachers have greater difficulty keeping track of these types of student 124 activities, tending to overestimate the length of the short pursuits and underestimate the long pursuits. The sign of the bias components was reversed for subgroup activities though the magnitude of both components was small, indicating a slight tendency for teachers to under- estimate the length of short pursuits and overestimate the length of long pursuits for subgroup activities. There was a substantial amount of random error in estimating pursuit length for all the types of activities. The amount of random error ranged from about 20 percent of the average pursuit length in reading to almost 50 percent of the average pursuit length in unsupervised activities. CONCLUSIONS The major purpose of this study was to evaluate teachers as a source of classroom time allocation informa- tion. The first question this study addressed was how reliable was the coding procedure used by the Language Arts Project to transform written descriptions of student activities into pursuit records, that is categorizing the activities and recording their length. The second question was what were the major sources of variance in time allocation and how do they affect the reliability of various data collection designs. The third and major question of this study was to what extent can teachers provide accurate information on their student's activities by keeping daily logs. 125 CODER RELIABILITY As stated above, pursuit records from two codings of the same observer transcripts were found to be in general fairly consistent though there were some exceptions. In addition, in most cases one of the coders in each pair was not consistently higher than the other across students and days coded. This would suggest that the differences would to some extent cancel out when coding large data sets. Given these results, it is probably not necessary to use multiple coders in the future on a regular basis using the procedures developed by the Language Arts Project. This could potentially save a great deal of effort given that it takes about eight hours to code a class day if all the students in the class are observed. Since there were categories where some of the pairs of coders did disagree substantially, it would be wise to double code a few class days coded by each coder to locate problems if they exist. 
It would also be a good idea to rotate coders so that a class was coded by more than one individual. This would help balance out individual coder biases. The consistency of the multiple codings also suggests the pursuit records coded from the descriptions of classroom activities collected by the Language Arts Project are reasonably accurate and can provide useful information on how student time is allocated in elementary classrooms. 126 THE GENERALIZABILITY OF MEASURES OF TIME ALLOCATION The major sources of variance in measures of time allocation found in this study were classes, days and the interaction of measures and days. The one exception was for transition time where the largest source of variation was the class by measure interaction. The implications of generalizability study results in terms of the accuracy of teacher logs will be discussed in the next section. When the variance components were used to compute the reliability of a number of sample research designs, the results made it clear that both increasing the number of days observed and using observers as opposed to teachers resulted in substantial increases in reliability. Increas- ing the number of students observed within a class has little or no effect on the reliability. It also became clear that researchers should have no trouble obtaining reliable data when they are interested in comparing class- rooms in terms of the time their students spend on average in various activities. There was little, and in the case of science and social studies, no variance among students within a class. Obviously it is not possible nor necessary to measure differences in time allocation among students when the differences do not exist as the results of this study suggest for science and social studies instruction. There was however variation among students in language arts, reading, math and transitions. The reliability 127 coefficients from the sample data collection designs suggest that it would be possible to obtain reasonably reliable data for comparing time differences in reading and possibly language arts among students within a class. It would however require sampling a large number of school days, particularly if teachers as opposed to observers were used as the data source. This was not the case for math and transitions. There was so little variability among students in a class for these activities that it would require collecting data on a huge number of days to obtain reliable data even if observers were used. The differences among students in a class in the time spent studying math or in transition between lessons is so small however that it is probably not worth investigating anyway. THE ACCURACY OF TEACHER LOGS As was discussed above, it seems clear that teacher logs can provide time allocation data accurate enough to measure differences among classes in broad subject matter categories. Although there was roughly a .10 improvement in reliability when observers as Opposed to teachers were used as the data source, this difference could be made up by increasing the number of days data is collected. This may well turn out to be a more cost effective approach. Research on teaching often focuses on much more narrowly defined student activities than subject matter 128 categories such as math or reading. The researchers of the BTES study (Fisher, et al., 1978) for example, measured time allocation in many categories within math and reading such as word structure and computational transfer. 
The Harnischfenger and Wiley model defines pupil pursuits in terms of the intersection of subject matter grouping and supervision. It is not clear from this study whether or not teachers can provide accurate data when the time categories are more narrowly defined. Teachers grossly overestimated the amount of time they spend directly supervising student instruction. There was also error in time they recorded students spent in different grouping categories. When student activities or pursuits are defined in terms of the intersection of subject matter group and supervision, it is questionable as to whether teachers are of use. This study of course did not address the accuracy of teacher logs for recording student time in narrowly defined subject matter categories. In addition to the question of whether teacher logs are accurate enough to be used for collecting time alloca- tion data, this study provided insights about the nature of the error in teacher logs. The analysis at the pursuit level found the discrepancies between the observers and teachers were due to both differences in how individual pursuits were categorized as well as their length. It is difficult to assess the impact of each of the two types of error, though it seems likely that the errors in 129 categorization of pursuits had a greater effect for two reasons. First, the discrepancies in categorization that were found to a large extent seemed to explain the discrepancies between the log and observation time in many of the categories found in the analysis at the level of time per student per day. This was true even though the pursuit level analysis was done on only a small sample of the data as the analysis at the level of time per student per day. The most striking example of this is supervision time where there was a tendency for pursuits categorized as unsupervised from observations to be categorized as supervised from the logs. The extent this occurred in each classroom is roughly related to the size of the discrepancy found when the pursuits were aggregated to the time per student per day. Secondly, the results of fitting Schmidt's model suggests that there was little fixed or random.bias in most of the pursuit categories. Although there was a substantial amount of random error in pursuit length, these differences being random would tend to cancel out. There is probably little that can be done to improve teacher accuracy in recording the beginning and ending time of pursuits. Their prime responsibility is teaching and during busy periods they are likely to have to guess the exact time a pursuit started or ended. As stated above, the analysis at the pursuit level suggests that the discrepancies between the teachers and observers in recording beginning and ending times of the pursuits is 130 mainly random and would tend to cancel out. This suggests that although there is little that can be done to improve the accuracy of teachers in recording the times pursuits begin and end, nothing really needs to be done. Differences in the categorization of pursuits seemed to be the major source of discrepancies in the time recorded in different categories from log and observation descriptions. As can be seen in the examples of teacher logs and observer transcripts presented in appendix A, the descriptions of the pupil's activities in the observer transcripts provide far greater detail than the descrip- tions in the teacher logs. 
It seems reasonable to assume that the discrepancies in the categorization of pursuits is due mainly to errors in coding of the logs because the coders were not provided with enough information. Although it is unreasonable to expect teachers to provide the kind of detail contained in the observations, changes in the log forms the teachers use could possibly improve the accuracy of the information they provide. The greatest discrepancies in pursuit categorization between the logs and observations were in teacher super- vision. There was also a strong tendency for pursuits coded as subgroup from observer transcripts to be coded as individual from teacher logs in four of the six classes. One possible way of improving the accuracy of categorizing the grouping and supervision dimensions is to have the teacher code them directly on the logs. Columns could be 131 provided for each dimension into which the teacher could put a code indicating the category; i.e. l for supervised or 2 for unsupervised. This procedure might also be feasible for the activity type (subject matters, transi- tions, etc.) though the number of categories is greater and the distinctions between them are less clear. Direct coding of pupil pursuits by the teachers on some or all the dimensions the pursuits are categorized would also reduce the work or eliminate the need for a separate coding process. Field testing of this approach for teacher log keeping would be necessary to see if it is effective for categorizing along each specific dimension. Using teachers to directly code pursuits would also have the advantage of reducing the work or actually eliminating the need for the coding process. APPENDICES APPENDIX A @000 9: :05 :08 :ll :15 :16 :20 :21 35 APPENDIX A Observation Bell rings, S's start coming into room. T comes into room. T takes hot lunch and milk count and attendance. T dismisses S's and all but 23, 26, 19 (12 is absent) leave and go to reading and spelling in other rooms. S's from 3 other rooms come in LL room for reading and spelling. T chats with observer, gets books arranged for reading, puts file folders and work— books out on TD#1. P. 92, 92 -— Pink; p. 116 -- Blue; p. 108 -- Purple. The T tells each of the 3 groups that these are the pages to write their spelling words on. T starts giving dictation from spelling workbooks; he starts giving dictation to the Pink group lst, then dictates 2 words to the Blue group, and then dictates 2 words to the Purple group. He dictates 2 words at a time to each group and then says a sentence to each group which contains both words just dictated. T has moved TD#1 up to the NW corner of the room just in front of the CB. TD#1 is a waist high table with wheels on it. On top of TD#1, the T has TE of TB and workbooks arranged. While giving dictation, the T either walks over to the group or faces the group to which he dictates the words. In all 20(?) words are dictated to each group. HR S's in reading/spelling class: Jill - 23 Vickie MC - 26 Lee Ann M (absent) - 12 Tonia Y - 19 Another T came into the room and talked to LL for about 30 seconds. S's were writing words at this time. 132 133 9:39 Dictation stops; T asks S's to correct words, he waits for room to quiet down. 9:40 T goes to TD#1 and looks at some materials on the desk. 9:41 T -- "Complete last week's dittos. Leave your workbooks with me. We have to complete the skills unit. Finish vocabulary and then start on TB study.” This assignment is for the Pink group. They begin working independently. 
9:42 T dictates a sentence to 3'3 in the Blue group. 9:43 T asks S to repeat the sentence. 9:44 T dictates another sentence. He dictates each sentence only once. S's are expected to be able to write down correctly the whole sentence dictated. He calls it a "fluency program.” 9:46 T asks S to read the sentence. He then asks S to read what she has written. 9:47 T -- "... in Skills Reader, we're going thru the TB study to emphasize important parts. Turn to p. 243 in Skills Reader ... on p. 242." T reads directions out loud. T then allows time for S's to read the selection silently. This is the Blue group he is now working with. Pink (T sometimes calls them Orange group) group working independent- 1y. 9:50 T stands at TD#1 waiting for S's to read the selection from p. 242 of Skills Reader. 9:51 T walks around helping S's who are having trouble deciding on a title for the selection. 9:52 T goes to CB and writes: I. A. B. II. A. B. III. A. B. 9:53 T asks S what the title is. T writes on the CB what he says: Whythe South Has Busy Industry. T asks the group what the main topics of the 9 10: 10: 10: 10 10: 10: 10: 10 10: 10: :54 00 02 O3 :04 05 O7 08 :09 10 12 134 selection are. S's answer and T writes them on the CB. 1. People II. Water 111. Natural Resources T asks for 2 important details for each main tOpic. S's answer and T writes them on the CB. T helps S's narrow their responses to one or two words. I. A. Manpower B. Demand of products II. A. Mountains streams S's felt there B. Powerful rivers should be 3 impor- C. Electric power tant details under II. III. A. Mineral deposits B. Raw materials T -- "For you S's who had finished this assignment, is this how your outline looked?" T -- "How many have completed their reading papers?" T begins passing out papers that S's had completed earlier. T had corrected them. T asks S's to correct their errors on the papers. T completes handing out papers. Several S's bring up papers to him. He looks them over. T gives directions to Blue group S's for reviewing of paragraph skills and ... skills and Vocab. review. S's ask questions, T answers. T walks around Blue group checking to see if S's are getting started correctly. T stops and explains to the Blue group that on ditto, 43, question 20, only 1 S got this correct. T -- ”What does percolating mean?" T walks around the Pink group to see how S's in the Pink group are working. T goes back to TD#1. S comes up and asks for help. T helps the S. T goes to bookcase by windows and takes out some paperback books and gives them to a HR S to read. 10:13 10:14 10:29 10:30 10:31 10:34 11:02 11:07 11:09 11:12 11:17 135 The books are Readers Digest skill books. (This is the S who is on an individualized reading program. T calls her the HR group.) T goes back to TD#1, S comes up and asks for help. T helps the S. T walks around helping any S in the CR.who needs help. All S's are working independently in TB and workbooks. 1 S is working in a Readers Digest book. 19 is looking at a globe. Some S's are working on a packet of dittos, #38-41, and 43 from ditto master for "Reading to Learn." Jill-23 wanders around the room from 10:23-10:30, Jill is in both Purple and Blue spelling groups. T -- "Tomorrow we will work in Barnell-Loft and SRA. Those who have not finished work in the unit can have time to do so." T talks to the whole group. T -- "Alright, that will be all for today." S's start leaving room, HRS, #2 - 18, 20-22, and 24, 25 start coming back into the room. Transition. Recess. S's start coming into room. 
T —- "Today we are going to review the addition of unlike fraction." T passes out dittos to the whole group, #2-11, 13-26; T tells S's to begin working on them because he needs a few minutes to prepare metric materials for the class. S's begin working on ditto. T walks around room getting materials ready. S's have trouble on the ditto, so T interrupts his preparation. T goes to CB and writes: l/3=4/12 3,6,12,15... 1/2 l/4=3/12 4,8,12,16... 1/7 _/14 S's have trouble remembering how to find common denominator. (CD) T explains how to find CD, multiples of each denominator the common multiple (CM) will be the CD. T walks around helping S's find common denominator. T leaves room. 11:22 11:23 11:26 11:28 11:30 11:32 11:33 136 T returns, T -- "It has taken me longer than I had anticipated, so we'll work on that tomorrow." T -- "Those of you who have finished can look ahead in your metric book. Those having trouble, look up here." T does common denominator problem on the CB. It is problem #2 on the ditto. 2/3 = _/24 (written on board). 1/8 = _/24 whole number X8 = 16, 24 T walks around to see if S's are getting correct answers, then back to the CB to complete problem. 2/3 = 16/24 T asks S's to tell him what + 1/8 = 3/24 the answer is. 19/24 T -- "You should ask yourself, can this be reduced to lowest terms?" T asks 20 for the answer. T -- "Look at problem 3." T puts this problem on the CB. 2/5 = 8/20 + 1/4 = 5/20 12/20 T -- "Say to yourself, you multiply the denominator by a series of whole numbers." T -- "Can the problem be reduced? What do you say to yourself, 20?" S -- "13 is prime so you can't reduce it." T -- "Problem 4." T puts it on the CB. 3/4 = _/20 + 5/5 = /20 T -- ”25, what is the CD?" S —- "...”, S -— "20” T -- "We have time for l more." T puts this problem on the CB. 5/6 = 15/18 + 2/9 = 4/18 19/18 1 1/18 T -- "This fraction is > 1." ”How much greater?" He answers his own question. Transition. 11: 12: 12: 12: 12: 12: 12: 12: 12 12 35 10 13 17 20 21 26 30 :32 12: 34 :35 137 S's begin to leave for lunch. Bell rings and S's start coming into room. They take out books and read independently. USSR T comes into room, chats briefly with observer and then takes roll. Several S's chat briefly with him. T leaves room. 23 came over and sat down next to me. She does not want to read during USSR. T returns, picks up 16mm film and then leaves room again. He sets up film on projector in the hall. T returns, picks up TE of Language book and puts it on TD#1; hangs up his coat in the closet; puts extra math dittos away; answers some S's question; gets AV order blank out of his briefcase and then sits at a table and begins filling it out. S's continue to read independently. 7 leaves to go with reading specialist. T -— "Put away your work and go to science." Transition. T stops working on order form. He pulls down film screen and closes curtains. S's 2 - 26 leave for science. T brings in the film projector and sets it up in the SE corner of room. S's from another room come in. T talking to the whole group -- "The film is an introduction to U.S. geography and land forms." (The film is a Cornet Film of about 1953 vintage.) T starts 16mm film, then adjusts curtains, gets record book from TD#1 and then sits by the pro- jector working on record sheets and watching the film. Topics covered in the film: Mineral deposits -- oil, ore Great Seal of the US and motto -- E Pluribus Unum Belts -- Corn; wheat; cotton, tobacco, oats, soy- beans, hay; corn belt one of the most important regions. 
Corn for humans and for livestock. One of the richest regions in the U.S., importance of the products grown for the economy and jobs. Grazing and irrigated crops region -- mountains and high plateau; sheep and cattle in this region, need of water in this region. Dams built to store 12 12 12 12: 12 12: 12: :47 :48 12: 50 :53 54 :55 56 58 :01 :05 138 water allows for irrigation. People live in cities because of variety of jobs. 2/3 of 180,000,000 people live in cities. How are these people linked together? The 12 largest cities are near water. Great Lakes -- St. Lawrence Waterway. Ohio, Mississippi and Missouri Rivers used for transportation. Ore sent over the GL and St. Law. Waterway. Waterways tie people, natural resources and products together. Railroads do the same. Oil pipelines, air transportation tie people and regions together; transportation links people, resources and commerce together. It ties U.S. people together; highways are especially useful for people to be tied together. Film ends. T talks to the whole group -- "Did you find any new information in this film?" S -- "About St. Law. Seaway" ... "mining pits." T led discussion with whole group, T passes around a chunk of natural copper ore. Discussion was on ore deposits and ore ranges. T uses map of U.S. to point out where copper and iron ore ranges are located. In northern MI, WIS, and MINN. Points out how the ranges extend into Canada. Tells about his brother-in—law who works in a copper mine. 7 returns. T passes out taconite ore pellets for S's to handle and look at. T explains how taconite pellets are made. Use low grade ore; ore tumble in large magnetic drums. T passes out chunks of natural iron ore just as it came from the ground. Points out natural orange color. T shows different chunks of copper ore from different ranges in the U.P. Some are solid copper, others have impurities mixed in. T starts to collect the ore samples from the S's. T explains about the St. Lawrence Seaway. He points it out on the U.S. map. 8 says he was surprised to see that the Hawaiian Islands were so far north and he wanted to know why it wasn't cold there. T explains the difference in climate. S's leave -- transition. l 1: 1: :08 10 17 :21 :22 :25 :27 :28 :31 :33 :34 :35 :37 :38 139 T explains that the film is an introduction to the geography of the U.S. -- land forms, climate, transportation, crops. S's 20 and 3 run the film -- same film as was shown to the 12:30-1:00 group. Whole group watches; additional topic on the film: variety of climates -- helped to make the U.S. the leading agricultural nation of the world. T watches film. T stops film 7 minutes into it and asks S's to recall names of the great rivers of the U.S. -- Ohio, Missouri, Mississippi. He points them out on the map of the U.S. hanging on the front CB. T -- "How many dams on the Colorado River? . about 23." 6 asks a question on what would happen if one of the dams broke, T explained that it would depend on how much water was behind the dam. In other words, the time of year would help determine how much water. T talked briefly about the earthen dams that had recently broken. Several more S's question on dams and floods. (T had to justify morally the use of the word dam.) 22 comes into the room from science; the dams and floods discussion led a S to relate experience they had recently with severe weather conditions. 22 leaves the room. T points out major cities on the U.S. map. 
T starts film again and sits and watches; addi- tional topic on film: kinds of products carried by rail -- cattle, sand, gravel, lumber, oil. 22 returns. T writes 180,000,000 on the CB. Film ends; 6 leaves room. T asks S's to relate things they learned from the film -- S's responses -- St. Lawrence Seaway, dams. 3, 22 leave room. 6 returns. T talks about the Welland Canal. T told about seeing a Dannish ship at dock in Deluth. 3, 22 return. :39 :40 :42 :43 :44 :45 :46 :48 :49 :51 :52 :00 :01 140 T talked about the air quality in Gary, Ind. T wrote EPA on the CB and then explained what it meant. Gary receives raw materials by boat. 20 goes to the U.S. map and points out where he went on a train trip and saw a lot of air pollution near Chicago. He wondered why nothing had been done about it. 10 leaves the room. T asks S's if they could tell from the cars shown on the film when the film was made, also the tractors and harvesting equipment. 11 leaves the room. 10 returns, 11 returns. Principal comes into the room and talks briefly to T. S's decided the film was made in the '50's. S's related information they had about tractors and equipment used to build roads. S's talked about family and neighbors who had tractors and equip- ment they had seen on construction sites. T tells S's to take out English books and turn to chapter 6, p. 169. T -- "Here is listed what we are about to study." T -- "question 1: What is jargon?" T -- "You people who are still having trouble finding the page, turn to the table of contents and see what page chapter 6 starts on." T -- "question 2.” T opens curtains. Question 3) T continues to briefly introduce the tOpics listed on p. 169, he briefly explains what each topic is. He sometimes ties a tOpic in with what had been previously studied. Topics which tied in with previous study were 3); 4); 7). For sentence 12) T asked S's to relate a riddle. 8'3 6, l6 and 24 gave riddles. 20 said one could advertise kittens free in the State Journal. T asked why. 20 said maybe the Humane Society arranged it. :04 :05 :06 :07 :08 :10 :12 :13 :14 :15 :17 :18 :20 :21 :24 :26 :27 141 Secretary brings in notes and tells T they have to go home after school. T asks 23 to write S's names on notes to send home; she sits at a table on the side of the room putting names on notes. T -- "O.K., let's take a look at jargon (topic 1)." T asks 24 to read aloud to the class from p. 170. T tries to get S's to come up with the word caption. They don't so he tells them about caption. 24 reads the next question from p. 170. T dis- cusses jargon words, swabbing and bulkhead. 24 reads orally from p. 170, T explains meaning of "jargon" in terms of swabbing and bulkhead -- words used in the Navy. 24 reads orally from p. 170. T asks 11 to respond, 22 leaves the room. 24 reads orally from p. 170. 22 returns. T asks 6 and 22 if police have jargon, (their dads are policemen). They said they think so. T tells about visiting the FBI headquarters in Washington D.C. 2 reads orally from p. 170. T asks S's to discuss meaning of jargon in sentences 1) and 2) on p. 170. 23 finishes writ- ing names on notes. 24 reads orally last sentence on p. 170. 10 reads paragraph at top of p. 171; T and S's discuss meanings of jargon in sentences 1-4) on p. 171. 3 reads question 2. 23 begins writing student names on another set of notes to go home, S-23 decided to do this on her own. T tells jargon from golf that is now little used -- brassey, mashey, spoon. S's had never heard these words! 8 reads 3) out loud on p. 171. 
2:29  26 reads orally from "For Practice" on p. 171.
2:30  14 reads orally 1) under "Written" on p. 171. T asks S's if they watch "Quincy" on t.v. and briefly discussed "morgue" in this context. 6 looks up morgue in dictionary.
2:32  . reads 2) orally, p. 171; T was in a radio station once when it was cut off the air. It took about 3 minutes to build power back up in order to put the station back on the air.
2:33  23 stops writing names on second set of notes.
2:34  21 reads 3) orally, p. 171; T -- "Havlichek started playing BB in 1962. Yesterday he got a standing ovation."
2:35  6 stops looking for morgue in dict. T reads the definition out loud to class.
2:38  25 reads 4) orally, p. 171.
2:39  T asks 3 to review jargon.
2:40  T -- "Preview for tomorrow" p. 172. T asks 11 to comment on it.
2:41  P. 173 T asks 15 to comment on it. P. 174 T comments and so does S-19.
2:45  S's leave for recess; T goes outside to supervise recess.
3:10  S's begin coming in from recess because it's raining; S's get books, coats and misc. materials ready to go home and also "play" in the room.
3:15  T exchanges small talk with S and then gets safety patrols organized to have "rainy day bus line-up."
3:20  S's dismissed to go home.

APPENDIX B

Coding Procedures

General Procedures
1. Each student in each class is assigned a number (01, 10, 20) which remains constant for all coding procedures.
2. Class refers to a number assigned to each teacher in the study.
3. Day refers to the date of the data source which is coded.
4. Source indicates whether teacher logs, observations, or teacher's plans were used as the data source.
5. Beginning and ending times refer to the time activities started and stopped.
6. Subject areas include types of activities found in school days with provisions for major and minor areas.
7. Group refers to whole group, subgroup or individual.
8. Group size refers to the number of students in the group considered; check attendance to determine group size.
9. Supervisory code refers to teacher supervised, other supervised or nonsupervised.
10. Location refers to in own room, out of own room, or out of school.
11. Process variable refers to the amount of actual reading or writing done by students during a time interval. Writing refers to text or sentence compositions, not to penmanship.

Student Procedures
12. Use the same subject numbers throughout all of the coding for a given classroom. That is to say, subject 25 must refer to the same person in all of the coding.
13. If a child is absent, record on the code sheets his number, class, day, and source. For the beginning time, give the beginning time for all other students for that day, and for the ending time, use the ending time for that day. Be sure to check attendance and note those children that are absent on the code sheets.
14. If some pupils are not identified, ignore their actions in the coding; if they are identified but only as involved in momentary actions, ignore them in coding (anything 30 seconds or less or "brief" is defined as momentary).
15. If a beginning and an ending time cannot be found for children leaving the room, ignore their having left, i.e. treat them as if they never left the room.

Time Interval Procedures
16. Times for intervals must be continuous, e.g. 9:12 - 9:20; next interval 9:20 - 9:40; next 9:40 -

Subject Area Procedures - General
17. Always consider the large unit when classifying subject areas.
If a larger segment of time which is homogeneous with respect to content has embedded in it only a short comment by the teacher which would change the content specification, ignore this comment and code for the larger unit.
18. When the teacher gives directions or elaborates on an assignment, this should be coded in whatever subject area it occurs; it is part of the time interval for the subject area coded.
19. Announcement of due dates should never be coded separately.
    a. If due dates are announced during a regular lesson, then treat the announcement as part of the subject area in which it occurs.
    b. If due dates are announced during a transition, consider the announcement as part of the transition.
20. For the codes 0100, 0200 and 1500 no minor is usually coded.
21. When children leave for the library, code the content of what they will be doing in the library if you know it. If children leave during a period in which they were instructed to use the library as a resource, then code their subject area as the same as what the rest of the class is studying during that time interval; only code them as being out of their room by location code. If children leave during some other time when the content is not clear, or during the reading or language arts period, or during their free time, assume they have gone to pick out a library book for their free reading time; code these students as 0212 and 12 on the process variable.
22. There is no separate code for tests. All testing should simply be coded as to the subject matter which it covers. For the supervisory code, code it as 1 - teacher supervised. For the group designation code, code it as individual. For the process variable code, code it as 30.
23. Code movies or tests or field trips or educational assemblies in terms of the content involved for subject area.

Subject Areas Procedures - Language Arts
24. Code all sharing activities as 0110: Language Arts - oral communication.
25. If children spend time with speech and/or hearing therapy, code them as 0110.
26. Writing instruction under language arts includes instruction in the process and art of writing as well as structured practice in writing; it does not refer to penmanship. Sentence composition refers to composing sentences only - not to text composition. Sentence completion is sentence composition if it involves more than one word.
27. If as a part of the language arts lesson children are taught to read maps, tables, or graphs, or to develop map legends, tables or graphs, this should be classified as 0180 - information gathering skills.
28. The category "literary forms" under language arts is for content dealing with various literary forms such as poetry, autobiographies, biographies, fairy tales, folktales, and tall tales. If the reading lesson aims at reading literary forms then "literary forms" should be used as the minor designation.

Subject Areas Procedures - Language Arts and Reading
29. For reading and language arts use the teacher's specification (from the schedule or the blackboard or convention) as to whether the major code is reading or language arts.
30. For all reading and language arts lessons where the major specification is reading or language arts, code the content of the reading, writing, spelling, etc. lesson as the minor content specification. If the content does not fit one of the codes, such as science, social studies, etc., then and only then leave the minor code blank. Do not stretch the point in coding the minor area.
In a fairly straightforward way, it must be science, social studies, etc. before it is coded as such.
    a. Reading lessons can have a minor in language arts, and vice versa.

Subject Area Procedure - Reading
31. The reading categories are defined as follows:
    a. No explicit analysis - no overt attempt is made at analyzing what is read.
    b. Word analysis - includes phonetic analysis, structural analysis and sight words.
    c. Word meaning - vocabulary development.
    d. Text analysis - comprehension, sequencing, main events, main idea, setting, etc.
    e. Individual reading - child is reading by himself either silently or to the teacher.
    f. Group reading - the activity where a subgroup meets with the teacher and some or all of the children alternate in reading the text and sometimes answer questions about what they have read. Also - where questions are asked and the children then read silently to find the answers. To be coded here, children must be reading. If the children are reading paragraphs from their workbooks in class with the teacher and then discussing them, code this as group reading.
    g. Lecture or discussion - where the teacher lectures on, or the teacher and students have a discussion about, reading itself. Also - for situations where there is a discussion about the content of what has been read but there is no reading (either silently or out loud) during that lesson. Also - when the teacher lectures (talks) about reading, word analysis, or literary forms without actual reading by the students.
    h. Individual reading and doing exercises - where the child reads by himself/herself and does exercises based on the reading.
    i. Doing exercises (dittos, tapes) - where children are doing only the exercises. If the teacher is discussing their answers with them, this is coded as discussion. If an individual child reads with T and they discuss the text, this is coded as individual reading with T's supervision; don't code as discussion.
32. If more than one of the reading levels (on the third digit, e.g. word analysis, etc.) occurs during a lesson, code as follows:
    a. If the different areas are covered separately and are sequenced one after another and are of at least 2 minutes in duration, code the different parts separately. Create a new time interval for each part of the lesson.
    b. If the different levels are distinct and sequenced but short in duration (all but one less than 2 minutes), code the whole lesson as one time interval and code it hierarchically, giving the level with the highest code the greatest priority (e.g., if both word analysis and text analysis occur and word analysis is less than 2 minutes in length, code the whole interval as text analysis).
    c. If the different levels are intermixed in the lesson, code the whole lesson as one interval and use the level with the highest code.
33. If a child is reading with an aide, classify the subject matter as 0200 and code the process variable as 12.
34. If a child is doing a crossword puzzle and it is not clear from the context that the purpose of it is for word analysis, then code 3 (word meaning) in the third digit for reading.
35. In reading on the 4th digit (individual reading, group reading, etc.), make a new interval for each activity change and code it separately. Do not code the whole lesson or use the notion of a hierarchy.
36. When dealing with reading groups, code the children involved in that reading group 0900 from the moment the teacher calls them up for the reading group until the point at which the actual instruction begins. When the children finish with the reading group and are dismissed, code them 0900 from that point until the point at which it is recognized that they have actually begun work on some other matter. If this is not indicated, then do not code them as 0900 but simply code them as having returned to seatwork or whatever else it is that they are doing. This latter case will most likely be prevalent.

Subject Area Procedures - Social Studies and Science
37. The distinction between Social Studies and Science revolves around the focus of content. If the focus is technical, then it is science. If, on the other hand, the focus is on the effect that some scientific or technological field has on society or individuals, then it is coded as social studies.
38. If during a science or social studies lesson the teacher instructs the students in reading or some area of language arts, be sure to code reading or language arts as the appropriate minor. To be coded as a minor instead of as a process (see convention 60) there must be formal instruction or formal feedback in the area.
39. Social Studies includes history, geography, sociology, anthropology, government, political science and economics (all coded as 0800).
40. Lessons dealing with social behavior and affective goals and values should be coded as Social Studies (0810).
41. In terms of the code 0810 (subject area), only code lessons where there is formal instruction in the area of values or social attitudes. Do not code momentary interactions about values, behavior in the classroom or issues of discipline under the code 0810. "Child of the week" is coded 0810.

Subject Area Procedures - Breaks, Beginnings, and Endings
42. Codes 09-13 for Subject Areas indicate various breaks.
    a. 09 - for between instructional activities, including the passing out or collecting of materials. If a child spends time with a social worker, code him as 0900.
    b. 10 - only for recess or lunch.
    c. 11 - all activities at beginning or ending of day (or half of the day) including lunch money, clean-up.
    d. 12 - if children disappear for short periods of time from their room and it is not clear where they went, code them as 1200.
    e. 13 - any other break such as fire or tornado drills; other people enter the room, etc.
    f. If children come in late at the beginning of the day, code them as 0900 until they arrive; if they come in late after lunch, code them as 1000 until they arrive.
43. Whatever happens at the beginning of the day or at the beginning of the second half of the day (before the teacher formally begins the activities) is coded 0900, or transition. When the teacher begins, this could be coded as 1100 if it is a beginning or ending exercise, or as the regular subject matter if there are no beginning exercises.
44. For transitions or breaks do not code process variable, group, supervisory code, location, etc. Just code times and the break code.
45. Children leaving and returning during transitions or breaks or opening exercises need not be separately recorded, as long as they leave and return during the break.
46. For transitions to and from reading subgroups, code them for the children involved when the information is available. For the beginning of the group lesson, code the transition from the time the teacher announces the group to the class to the time at which the lesson begins with these children.
If there is confusion as to the beginning time of the transition, code the lesson as having begun immediately. The end of the subgroup comes when the teacher announces they are finished. If there is no further reference to these children returning to their seats or beginning other activities, code without transition.
47. Make a judgment about when the transition is over using the criterion of when most children have begun to work.
48. Code all passing out of materials as transitions.

Subject Area Procedures - Seatwork
49. If the child is doing seatwork during the reading lesson and is reading in his reader, and it is not clear to the rater whether the reading instruction is aimed at word meaning, text comprehension, or whatever, classify the subject as 0202. The third digit 0 means that it could not be ascertained what the nature of the reading task is, but it is known that the child is working in reading and he is also doing reading by himself. Likewise, if the child is working on some ditto or a workbook and it cannot be ascertained from the observation what the exact nature of the exercise is, classify as 0203. This indicates he is reading and working on exercises, without knowledge of the exact nature of the material. If you know whether it is word meaning or text analysis or whatever, then, of course, code this in the third digit. If the child is intermixing the two, that is, reading and also doing exercises, and it is not possible from the observations to know at what point the child stopped reading and began doing the exercises based on that reading, then use the code 6 in the fourth digit for reading. This indicates both reading and exercises are being done during that period.
50. If during an individual work period the teacher makes an announcement about the fact that the children ought to move on to task B when finished with task A, the fact that both A and B are now possible tasks must be accounted for. This will usually necessitate the use of mixed seatwork code 15, with the third digit indicating, if it is possible, which two subject matters are being included in the mixed seatwork. However, if the teacher does not change subject matters by her announcement, that is, both assignments are in reading, or both assignments are in language arts, then there is no need to move to the 1500 code.

Group Designation Procedures
51. Group designation refers to the nature of the instructional setting.
    a. For group designation, if more than one child is involved, but less than the whole class, code as subgroup.
    b. Movies and assemblies are whole group activities unless otherwise specified.
52. Code the group size variable for all intervals, but do not change it to reflect momentary changes in group size such as toilet, library, etc., breaks for individual children in the group.
53. a. For group size involving standard groups, just take the given number in the group minus those children that are absent for that day.
    b. For all non-standard groups, count the number involved.

Supervisory Code Procedures
54. For the supervisory code, it should be coded teacher or other supervised only if the teacher or aide is actively involved in educational supervision or monitoring of student activities.
    a. If a teacher is walking around the room and supervising seatwork by interacting with the children and all the interactions are momentary, code all children during this period as having been supervised.
    b. If the teacher is at his/her desk or is walking around and has a 30 second or longer interaction with a child, code the child as having been supervised during this interval and all other children during this interval as not having been supervised.
    c. If the teacher is at his/her desk or a table working on something by his/herself or watching the children, and children come up to the teacher for momentary interactions, code all children during this interval as non-supervised.
    d. All whole group or subgroup teacher instruction is coded teacher supervised for the children involved.
    e. Code the showing of movies, instructional use of tapes, records, etc., typically as supervised.
55. Only use the category "other supervised" when it is some individual other than the teacher, such as an aide or another student who is used as an aide in the classroom. If the children leave the room and receive their instruction from the music teacher, the P. E. teacher, or the art teacher, code them as having been teacher supervised. Also code children during the time they are in the library as teacher supervised (unless there is no person formally assigned as a librarian).
56. If a child is near the teacher, working by himself/herself, and the teacher is also working by himself/herself, the supervisory code is 3 - nonsupervised; close physical proximity to the teacher does not count as supervision.
57. If during an observation a child is recorded as having come up to the teacher for instruction and the next instance recorded is of a new child being called up to the front, at that point (unless otherwise specified in the observation) code the other child as having returned to his seat. Most observations should indicate both the time they came up and the time they returned to their seat, but if not, use the above convention.
58. Do not forget that when the supervisory code changes, i.e. the teacher starts or stops actively monitoring, a separate time interval has to be coded.
59. Ignore any individual discipline problems in the classroom, no matter the length of time involved, unless they interrupt the teacher while he/she is with some other children who are receiving instruction. The point is that the interaction must take teacher time or supervision away from other children.

Process Variable Procedures
60. For the process variable, code whether during the time interval in question the student himself/herself did any writing or reading. The student must have actually done the reading or writing. If both occur, code it 4. The process variable records the use of reading or creative writing, not formal instruction in reading or writing, which is recorded as a major or minor. The second digit records roughly what proportion of the interval was spent in reading or writing. Reading must involve more than reading directions or sentences on a ditto - it must involve the reading of text. Writing is also classified only when the child writes text - not merely filling in words on dittos or copying material from the board. In those instances where the child makes up a story but does not write it himself, this is not classified as writing. To be classified as writing, more than a sentence must be involved.
61. If instruction in writing is provided but the children do not actually write themselves, code the major as 0170, and the process variable as 30.
62. For the process variable, if the children leave the room to go to a reading class with another teacher, code them 12.
63. For the process variable, code children working in their workbooks as 30.
64. If there is no information about the process variable (reading and/or writing) which can be broken down to the individual level for time intervals, code the process variable 00.
For USSR, code 12 for the process variable (after transition, if applicable); USSR represents a structured opportunity to read.

APPENDIX C

ANOMALIES IN MATCHING OBSERVATION AND LOG PURSUIT RECORDS

12/17/81  Student 23 in class 3 on 4/27 was dropped from the pursuit analysis because he/she was coded absent by the observer and not by the teacher.

12/29/81  Student 12 in class 4 on 5/12 was dropped from the pursuit analysis because he/she was coded absent by the teacher and not by the observer.

1/18/82  For student 2 in class 6 on 5/10 the observer coded 10, 0, 0 from 11:35 to 1:15 while the teacher coded a variety of activities listed below:

    minutes  major  group  sup
       5       1      1     1
      25      15      3     1
       5       0      0     0
      35      10      0     0
       3       9      0     0
      22      15      3     1

This problem was handled by matching the observation pursuit with the 35 minute 10, 0, 0 log pursuit.

1/21/82  For student 19 in class 6 on 5/10 the observer coded 83 minutes of 5, 3, 2 from 1:52 to 3:15 while the teacher coded a variety of activities listed below:

    minutes  major  group  sup
       8      15      3     1
       3       9      0     0
      12       2      1     0
      20       7      1     1
       5       3      3     0
       5       9      0     0
      25       4      1     1

This problem was handled by matching the observation pursuit with the 25 minutes of 4, 1, 1 since it was the largest block of time.

1/21/82  For student 2 in class 6 on 5/25 the observer coded 86 minutes of 13, 0, 0 from 1:49 to 3:15 while the teacher coded a variety of activities listed below:

    minutes  major  group  sup
      11       2      3     1
      15      10      0     0
       5       9      0     0
      10       2      1     1
      40       7      1     1
       5       9      0     0

This problem was handled by matching the observation pursuit with the 40 minutes of 7, 1, 1 since it was the largest block of time.

1/27/82  For student 2 in class 6 on 6/06 the observer coded 70 minutes of 5, 3, 3 from 12:50 to 2:00 while the teacher coded 35 minutes of 5, 3, 1 and 30 minutes of 2, 3, 1 and 5 minutes of 9, 0, 0 during that time period. The observer pursuit record was matched with the 35 minutes of 5, 3, 1.

BIBLIOGRAPHY

Allington, R. L. Poor readers don't get to read much. Occasional Paper 31, The Institute for Research on Teaching, Michigan State University, East Lansing, MI, 1980.

Arlin, M., & Roth, G. Pupils' use of time while reading comics and books. American Educational Research Journal, 1978, 15 (2), 201-216.

Bennett, N. Teaching style and pupil progress. London: Open Books, 1976.

Bloom, B. S. Human Characteristics and School Learning. New York: McGraw-Hill, 1976.

Bock, R. D. Multivariate Statistical Methods in Behavioral Research. New York: McGraw-Hill, 1975, 507-559.

Borg, W. R. Time and school learning. In Denham, C., and Lieberman, A. (Eds.), Time To Learn. National Institute of Education, May 1980.

Brophy, J. E. Teacher behavior and its effects. Journal of Educational Psychology, 1979, 71 (6), 733-760.

Carroll, John B. A model of school learning. Teachers College Record, 1963, 64, 723-733.

Hook, C. M., & Rosenshine, B. V. Accuracy of teacher reports of their classroom behavior. Review of Educational Research, 1979, 49 (1), 1-12.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. The Dependability of Behavioral Measurements: The Theory of Generalizability for Scores and Profiles. New York: John Wiley and Sons, 1972.

Ebmeier, H. H., & Ziomek, R. L. Engagement Rates as a Function of Subject Area, Grade Level and Time of Day.
Paper presentation, American Educational Research Association Annual Meeting, New York, April 1982.

Fisher, Charles W. A study of instructional time in grade 2 reading. San Francisco, Ca.: Technical Report II-4, Beginning Teacher Evaluation Study, Far West Laboratory for Educational Research and Development, November 1976. ERIC Document Reproduction Service No. ED 145 414.

Fisher, C. W., Filby, N. N., Marliave, R., Cahen, L. S., Dishaw, M. M., Moore, J. E., & Berliner, D. C. Teaching behaviors, academic learning time and student achievement. Final report of Phase III-B, Beginning Teacher Evaluation Study (Technical Report V-1). Washington, D.C.: National Institute of Education, June 1978.

Fredrick, W. C., Walberg, H. J., & Rasher, S. P. Time, teacher comments, and achievement in urban high schools. The Journal of Educational Research, 1979, 16 (2), 63-65.

Fredrick, W. C., & Walberg, H. J. Learning as a function of time. The Journal of Educational Research, 1980, 76, 183-194.

Harnischfeger, A., & Wiley, D. E. The teaching-learning process in elementary schools: A synoptic view. Curriculum Inquiry, 1976, 6 (1), 5-43.

Karweit, N. A reanalysis of the effect of quantity of schooling on achievement. Sociology of Education, 1976, 62, July, 236-246.

Karweit, N. Quantity of schooling: A major educational factor? Educational Researcher, Feb. 1976, 15-17.

Karweit, N., & Slavin, R. E. Measurement and modeling choices in studies of time and learning. American Educational Research Journal, 1981, 18 (2), 157-171.

Millman, J., & Glass, G. Rules of thumb for the ANOVA table. Journal of Educational Measurement, 1967, 4, 41-51.

Schmidt, W. H. Measurement error: Should it include a specification for bias? Paper presentation, American Educational Research Association Annual Meeting, Los Angeles, April 1981.

Schmidt, W. H. The high school curriculum: It does make a difference. Curriculum Inquiry, in press.

Shavelson, R., & Dempsey-Atwood, N. Generalizability of measures of teaching behavior. Review of Educational Research, 1976, 46 (4), 553-611.

Stallings, J. Allocated academic learning time revisited, or beyond time on task. Educational Researcher, 1980, 9 (11), 11-16.

Symonds, P. The correlation of English with other subjects from the point of view of psychology. The Elementary English Review, 1930, 7 (1).

Wiley, D. E. Another hour, another day: Quantity of schooling, a potent path for policy. In Sewell, W. H., Hauser, R. M., & Featherman, D. L. (Eds.), Schooling and Achievement in American Society. New York: Academic Press, 1976.