This is to certify that the dissertation entitled THE EFFECT OF INCREASING TESTING TIME ON THE RESULTS OF THE READING COMPREHENSION AND REFERENCE MATERIALS SUBTESTS OF THE IOWA TEST OF BASIC SKILLS presented by Henry G. Dulmage has been accepted towards fulfillment of the requirements for the Ph.D. degree in Educational Administration.

Major professor
Date: Dec. 22, 1992

MSU is an Affirmative Action/Equal Opportunity Institution

THE EFFECT OF INCREASING TESTING TIME ON THE RESULTS OF THE READING COMPREHENSION AND REFERENCE MATERIALS SUBTESTS OF THE IOWA TEST OF BASIC SKILLS

By

Henry G. Dulmage

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Educational Administration

1992

ABSTRACT

THE EFFECT OF INCREASING TESTING TIME ON THE RESULTS OF THE READING COMPREHENSION AND REFERENCE MATERIALS SUBTESTS OF THE IOWA TEST OF BASIC SKILLS

By

Henry G. Dulmage

Speededness is a major concern in standardized achievement tests. The high-stakes use of test results is exploding as the United States looks for ways to improve education to compete better in a global economy. As high-stakes decisions based on test results increase, the optimal time limit of each subtest becomes more important.
This research was designed to test the effect of increasing time on the Reading Comprehension and Reference Materials subtests of the Iowa Test of Basic Skills. The time limits for these two subtests were extended in five-minute increments: five, ten, and fifteen minutes. The study was conducted using a balanced, randomized block design to reduce error. Twelve sixth-grade classrooms in a middle-sized school district were divided into three blocks by ability. The classrooms were randomly assigned to one of the treatments. The analysis also examined the effects of sex, membership in Chapter I, and family configuration. A regression procedure was used to determine whether the effect of excess time was significant.

The study found that increasing testing time by five, ten, or fifteen minutes on the Reading Comprehension and Reference Materials subtests of the Iowa Test of Basic Skills does not significantly increase the score. However, a block effect was found for both Reading Comprehension and Reference Materials, and a treatment-by-block interaction was found in Reading Comprehension. The study also determined that the focus and speededness of the Reading Comprehension subtests of the Stanford Achievement Test and the Iowa Test of Basic Skills are different; they may, in fact, measure different kinds of reading comprehension.

ACKNOWLEDGMENTS

I would like to express my appreciation to Dr. Herbert C. Rudman, chairman of my committee, for his contributions to this study. His patience and guidance with the specifics of writing a research paper were extremely valuable. He took the time necessary to help me understand how to write and research in a scholarly manner. In addition, he was interested in me as a person and really cared about the struggles I was having with the writing process. I could not have brought this dissertation to completion without his expert help. I would like to thank Dr.
Steven Raudenbush, committee member, for his help with the statistical analysis and the interpretation and presentation of results. He was patient and understanding as he helped me learn more statistical analysis. Dr. Raudenbush has a way of making difficult concepts easier to understand.

To Dr. Frederick Ignatovic, I express my gratitude for his help in reviewing the fine points of this paper. His expertise in research design and writing was extremely valuable in editing this paper. He maintains a good sense of humor, which helped me break the tension connected with this process.

Dr. Louis Romano's steady influence and ability to see the big picture are greatly appreciated. This helped me relate my study to practice and gave my research added practical meaning.

In addition, I would like to thank Mr. Rafa Kasam, graduate student in psychometrics, who helped with the statistical analysis of this study. He helped make the difficult seem easy.

Finally, I would like to thank my wife, Marge, and my family for their patience and understanding as I worked toward completion of this study. I could not have completed this research without their support.

I would like to dedicate this study to Mrs. Mildred Leman, now deceased, one of the world's truly great first grade teachers and one of my heroes. Her faith in me and her devotion to never saying that a child can't learn or won't learn have been a constant source of guidance for my professional career.

TABLE OF CONTENTS

LIST OF TABLES ..... viii
LIST OF FIGURES ..... xii

Chapter
I. PURPOSE ..... 1
  Research Questions ..... 5
  Methodology ..... 8
  Variables ..... 10
  Characteristics of the Iowa Test of Basic Skills ..... 11
  Time Alterations ..... 11
  Outcomes ..... 11
II. REVIEW OF THE LITERATURE ..... 13
  High Stakes Use of Test Results ..... 13
    Student Accountability ..... 15
    Development of State and Local Educational Policy ..... 19
    Comparing Schools or School Districts ..... 21
    Determining Program or Activity Eligibility ..... 22
    Teacher Certification or Advancement ..... 24
    Curriculum Change and Development ..... 26
  Speededness in Test Construction ..... 27
    Setting Time Limits ..... 27
    Speed Factors ..... 28
III. METHODOLOGY AND INSTRUMENTATION ..... 34
  Sample ..... 34
  Instrumentation ..... 36
    Characteristics of the Iowa Test of Basic Skills ..... 36
    Setting Time Limits ..... 37
  Analytical Design ..... 39
    Blocking ..... 41
    Best Covariate ..... 46
    Choosing the Appropriate F-Test ..... 48
    Determination of Variables ..... 48
  Directions for Administering the Test ..... 49
    Time Alterations ..... 50
    Time Assignments Related to Buildings ..... 50
  Summary ..... 53
IV. ANALYSIS AND FINDINGS ..... 56
  Outcome Data ..... 57
  Analysis ..... 59
    Reference Materials ..... 63
  Reading Comprehension ..... 64
  Reference Materials ..... 68
  Additional Research Questions ..... 71
    Objectives Used for Test Design ..... 77
    Internal Design Characteristics ..... 83
    Standardization ..... 85
    Summary ..... 91
  Summary ..... 93
V. SUMMARY ..... 94
  Purpose ..... 94
  High Stakes ..... 95
  Design of the Research ..... 96
  Conclusions ..... 98
  Discussion ..... 98
  Implications for Future Research ..... 101
    Content Match ..... 101
    Setting Time Limits ..... 103
    Ability vs. Time ..... 104
APPENDICES ..... 105
  A. Altered Directions for Experimental Groups ..... 106
  B. Demographics and Posttest Statistics by Homeroom ..... 122
LIST OF REFERENCES ..... 136

LIST OF TABLES

Table ..... Page
3.1 Description of Sample ..... 35
3.2 Speeded Considerations in Standardizing the Iowa Test of Basic Skills ..... 38
3.3 Experimental Group Descriptive Statistics by Block ..... 42
3.4 Descriptive Statistics by Treatment Group ..... 45
3.5 Pearson Correlation Coefficients ..... 47
3.6 Testing Time by Treatment and Subtest ..... 51
3.7 Experimental Groups by Building: Number of Classrooms per Site ..... 52
3.8 Demographic Variables Codes ..... 54
4.1 Outcome Data: Unadjusted Means, Raw Score ..... 58
4.2 Relationship Between Covariates and Achievement on the Reading Comprehension Subtest of the Iowa Test of Basic Skills, Raw Score Data ..... 61
4.3 Relationship Between Covariates and Achievement on the Reference Materials Subtest of the Iowa Test of Basic Skills, Adjusted Score Data ..... 62
4.4 Pooled Unadjusted Treatment Means (Raw Score) for Reading Comprehension Subtest, Iowa Test of Basic Skills ..... 65
4.5 Analysis of Variance Using Unique Sum of Squares, Raw Scores, Unadjusted, Reading Comprehension ..... 67
4.6 Adjusted Posttest Means, Reading Comprehension ..... 70
4.7 Pooled Unadjusted Treatment Means for Reference Skills, Iowa Test of Basic Skills, Raw Scores ..... 71
4.8 Analysis of Variance Using Unique Sum of Squares, Raw Scores, Adjusted, Reference Skills ..... 72
4.9 Unadjusted Means for the Reading Comprehension Subtest of the Iowa Test of Basic Skills by Block by Treatment ..... 74
4.10 Unadjusted Means for the Reference Skills Subtest of the Iowa Test of Basic Skills by Block by Treatment ..... 76
4.11 Number of Questions Per Objective, Iowa Test of Basic Skills ..... 80
4.12 Number of Questions Per Objective ..... 81
4.13 Comparison of the Content of the Reading Comprehension Subtests of Stanford Achievement Test and Iowa Test of Basic Skills ..... 84
4.14 Length and Number of Passages: Comparison of the Reading Comprehension Subtest of the Iowa Test of Basic Skills, Level 12 and the Reading Comprehension Subtest of the Stanford Achievement Test, Intermediate 1 ..... 86
4.15 Questions Per Passage: Comparison of the Reading Comprehension Subtest of the Iowa Test of Basic Skills, Level 12 and the Reading Comprehension Subtest of the Stanford Achievement Test, Intermediate 1 ..... 87
4.16 Comparison of the Reading Comprehension Subtest of the Iowa Test of Basic Skills, Level 12 and the Reading Comprehension Subtest of the Stanford Achievement Test, Intermediate 1 on Readability Factors Used in the Spache, Dale-Chall, Fry, Raygor, Flesch, and Gunning-Fog Readability Tests ..... 88
4.17 Comparison of the Reading Comprehension Subtest of the Iowa Test of Basic Skills, Level 12 and the Reading Comprehension Subtest of the Stanford Achievement Test, Intermediate 1 on Readability Factors Used in the Spache, Dale-Chall, Fry, Raygor, Flesch, and Gunning-Fog Readability Tests ..... 89
4.18 Comparison of the Reading Comprehension Subtest of the Iowa Test of Basic Skills, Level 12 and the Reading Comprehension Subtest of the Stanford Achievement Test, Intermediate 1 on Readability Factors Used in the Spache, Dale-Chall, Fry, Raygor, Flesch, and Gunning-Fog Readability Tests ..... 90
B-1 Demographic Variable Code ..... 123
B-2 Raw Data Group 1 ..... 124
B-3 Raw Data Group 2 ..... 125
B-4 Raw Data Group 3 ..... 126
B-5 Raw Data Group 4 ..... 127
B-6 Raw Data Group 5 ..... 128
B-7 Raw Data Group 6 ..... 129
B-8 Raw Data Group 7 ..... 130
B-9 Raw Data Group 8 ..... 131
B-10 Raw Data Group 9 ..... 132
B-11 Raw Data Group 10 ..... 133
B-12 Raw Data Group 11 ..... 134
B-13 Raw Data Group 12 ..... 135

LIST OF FIGURES

Figure ..... Page
4.1 Treatment Effect by Block ..... 69

CHAPTER I

PURPOSE

The concern for speededness in standardized achievement tests has always been an important consideration. Achievement tests that accurately reflect the academic condition of individual students and can be given in an efficient amount of time are crucial to educators. Educators need to be able to demonstrate that students are learning, and learning well, with minimal loss of instructional time.
Therefore, the time it takes to administer a battery of tests is an important consideration when choosing an achievement test. Time on task has been shown to be positively related to achievement by the research of Madeline Hunter and others (Hunter, 1978). The time limits established by test publishers and authors reflect the amount of time they have determined is appropriate for students to demonstrate accurately their knowledge of the material tested. The internal design of standardized achievement tests, then, must balance the need for timely administration of the test against the need for accurate results.

The uses of standardized achievement tests have changed from the original purpose of evaluating a pupil's achievement and/or the condition of the curriculum to include the high-stakes decisions of funding programs, granting of diplomas, the rating of teachers, bonuses, and the like. Testing, and the time limits prescribed by test publishers for achievement tests, have taken on additional importance with the recent movements toward "excellence" in education and accountability. New laws, such as Michigan's P.A. 25, call for student achievement to be measured annually and reported to the public, and for planning to be based on measured student outcomes. The promise of funding tied to performance looms in the near future. In addition, P.A. 25 calls for the establishment of achievement goals for students which, if not reached in a reasonable amount of time, will result in a school's being taken over by the state. The movement toward whole language instruction in reading is causing reading achievement subtests to use longer passages that more accurately assess the comprehension of the learner. Clearly, test results are becoming the basis of more and more decision making at the local district, state, and national policy-making level.
The need for accurate measures of student learning, both norm-referenced and criterion-referenced, is exploding. With this explosion in the number of tests and uses of tests, the need for efficiency is also paramount. The time limits set by publishers are important, and so is the accuracy of those time limits. It is valuable to know whether established time limits can be altered in relatively minor ways and have a significant impact on individual or group scores or means. If the inaccurate timing of achievement tests by teachers, or the inaccurate establishment of test times by publishers, has a differential effect on individual or collective outcomes, then more careful administrative practices and test development procedures may be necessary. Classrooms, teachers, or school districts may be eligible for rewards of various kinds due to test results that are unfairly achieved.

It is the purpose of this study to look at the effect of altering the time limits of selected subtests of the Iowa Test of Basic Skills on both individual and class outcomes. This is a replication of two studies done by Rudman and Raudenbush (1986; 1987) using the Stanford Achievement Test. They found a significant effect of testing time on the class mean in Reading Comprehension, but no significant effect on either the Word Study Skills or Mathematics Applications subtests when the standardized time limits were exceeded (1986, p. 8; 1987, p. 9). Their results also suggested that the optimal testing time may not necessarily be the time suggested by the test publisher. The time limits of the Word Study Skills and Mathematics Applications subtests of the Stanford Achievement Test were not shown to be time-sensitive. This could mean that these subtests and others are given more time than is optimally necessary. This may apply to other standardized test batteries as well.
At the least, it brings into question the 90% rule commonly used by test developers when setting time limits for subtests of standardized achievement tests. The 90% rule establishes the testing time as the amount of time it takes for 90% of the test takers to finish the test under power conditions (Nunnally, 1978). Further research needs to be done to determine whether these results can be confirmed for subtests of other test batteries and at other grade levels. The important research questions addressed by this dissertation follow:

1. Did the Rudman/Raudenbush research (1986; 1987) truly uncover a time-sensitive area in Reading Comprehension, or was it an artifact of the Stanford Achievement Test? Will the Iowa Test of Basic Skills reading subtest yield similar results?

2. Will this study show whether there is a quadratic effect of excess time on the two subtests of the Iowa Test of Basic Skills used in this study? The effect of added time may increase scores for some amounts of time, but that effect may not be sustained for all tested increases in time.

3. Are the reading comprehension subtests of the Iowa Test of Basic Skills and the Stanford Achievement Test equivalent; that is, do they appear to test the same things?

Research Questions

These research questions are directed at furthering our knowledge of the accuracy of current practices in setting time limits for subtests of achievement test batteries. The time limit alterations used in this study replicate those used in the Rudman/Raudenbush studies. The major research questions explored were the following:

1. Will increasing the time limits on the Reading Comprehension subtest of the Iowa Test of Basic Skills by five, ten, or fifteen minutes significantly increase student achievement?

2. Will increasing the time limits on the Reference Materials subtest of the Iowa Test of Basic Skills by five, ten, or fifteen minutes significantly increase student achievement?
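The quadratic effect named in research question 2 amounts to fitting both a linear and a squared term for excess time and inspecting the sign of the squared coefficient. The sketch below illustrates the idea in pure Python; the function name and the class-mean data are invented for illustration and are not the dissertation's actual data or analysis code.

```python
# Illustrative sketch (not the study's data or code): fitting a quadratic
# trend, score = b0 + b1*t + b2*t**2, to class means observed at excess
# testing times of 0, 5, 10, and 15 minutes.

def fit_quadratic(t, y):
    """Least-squares fit of y = b0 + b1*t + b2*t^2 via normal equations."""
    # Build X^T X and X^T y for the design matrix X = [1, t, t^2].
    s = [sum(ti**k for ti in t) for k in range(5)]      # sums of t^0..t^4
    A = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    b = [sum(yi * ti**k for ti, yi in zip(t, y)) for k in range(3)]
    # Solve A x = b by Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, 3))) / A[r][r]
    return x  # [b0, b1, b2]

# Hypothetical class means: a gain that levels off as excess time grows.
excess = [0, 5, 10, 15]
means = [42.0, 44.5, 45.2, 45.0]
b0, b1, b2 = fit_quadratic(excess, means)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
# A negative b2 is the signature of a gain that is not sustained.
```

With only four time points, a quadratic exhausts nearly all the degrees of freedom for the time trend, which is why the study's design also needed blocks and covariates to estimate error.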
Speededness

Tests can be categorized into speeded tests, power tests, time-power tests, and speed-difficulty tests (Nunnally, 1978, p. 632). Pure power tests are designed to measure knowledge in the absence of time limits, using items of ordered difficulty, easy to hard. Pure speeded tests measure knowledge under the constraints of time, and the test items are more trivial in difficulty (Nunnally, 1978, p. 631). Standardized tests of achievement are time-power tests due to the necessary administrative functions involved in testing. It is clear that time limits are imposed because of administrative constraints (Kendall, 1964).

The time limits for most tests are set by using some variation of the 90% rule; that is, the time is noted when 90% of the test takers are finished under power conditions. Nunnally refers to this as the "comfortable time limit" (1978). The time limits for the Stanford Achievement Test have been established using the 90% rule. The time limits for the Iowa Test of Basic Skills were set based upon a variation of the 90% rule: the authors looked at the percentage of students finishing 75% of the test, 80% of the test, and 100% of the test. The results of the Rudman/Raudenbush studies indicate that the optimal time limit may not have been found for the Reading Comprehension subtest of the Stanford Achievement Test by using the usual method of setting time limits for tests (1986; 1987). Their results showed a highly significant, positive, linear effect of excess time on Reading Comprehension test scores. Perhaps a longer testing time than that used by the publisher would actually be the optimal time limit for the Reading Comprehension subtest of the Stanford Achievement Test.
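The percentile logic of the 90% rule can be made concrete with a short sketch: under power (untimed) conditions, record each examinee's completion time, then set the limit at the smallest time by which 90% have finished. The function name and completion times below are invented for illustration; they are not from the ITBS or Stanford standardization.

```python
# Hypothetical sketch of the 90% rule: set the subtest time limit at the
# point where 90% of examinees, tested under power (untimed) conditions,
# have finished. Completion times below are invented for illustration.
import math

def comfortable_time_limit(finish_times_min, pct=0.90):
    """Smallest time by which at least `pct` of examinees have finished."""
    ordered = sorted(finish_times_min)
    # Index of the examinee whose finish marks the pct threshold.
    k = math.ceil(pct * len(ordered))
    return ordered[k - 1]

finish = [18, 21, 22, 24, 25, 25, 26, 27, 29, 33]  # minutes, 10 examinees
print(comfortable_time_limit(finish))  # prints 29: the 9th of 10 finishers
```

The ITBS variation described above would simply apply the same percentile bookkeeping to the times at which examinees complete 75%, 80%, or 100% of the items rather than the whole subtest.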
If the results of the Rudman/Raudenbush study can be verified by further research using different achievement tests, these ancillary questions and suggestions for additional study may be raised:

1. Are the time limits set for the administration of the subtests on standardized achievement tests really the "optimal time limit" (Kendall, 1964)?

2. Is there a differential effect of time limits on students of differing abilities (Durso & Rubin, 1982; Boag & Neild, 1962)?

3. Are there different kinds of speed for different kinds of tasks (Lord, 1956)?

4. Will students who are capable but work slowly score better than students who work slowly because they are less able (Daly & Stahmann, 1968)?

5. Is there a speed factor in intelligence (Lord, 1956)?

6. How closely should the administration of standardized tests be monitored (Rudman & Raudenbush, 1986)?

7. Is the 90% rule, and its variations, the best way to set test time limits (Rudman & Raudenbush, 1987)?

8. Are there in fact different maximal time limits for different subgroups in the population (Daly & Stahmann, 1968)?

This study is designed to be a replication study. One purpose of a replication study is to determine consistency of results, in this case to test the results of the Rudman/Raudenbush (1986; 1987) studies across a different population and a different achievement test. If the results are the same, additional confidence can be placed upon the results of the earlier studies; if the results are different, researchers may want to investigate the reasons (Borg, 1983, p. 383).

To judge whether the results are the same, it is important to determine whether the Reading Comprehension subtest of the Stanford Achievement Test and the Reading Comprehension subtest of the Iowa Test of Basic Skills, in fact, test the same thing. That is, are the two tests comparable?
To make that determination, it is necessary to look at the internal makeup of the tests, the content tested, and standardization procedures.

Methodology

The research milieu is a middle-sized Michigan public school system of approximately 7,844 students. The elementary schools are organized around "elementary centers" that have replaced the traditional neighborhood school concept. These centers or complexes each house approximately 1,300 students. This system currently has one large high school (1,486 pupils) and one large junior high school (1,591 pupils). The average enrollment per grade is 562 students. The sixth grade is fairly typical of the characteristics of a Midwest middle-sized school in all of the major demographics: sex, ethnicity, number of parents, and social class.

The school system used the 1986 edition of the Iowa Test of Basic Skills to monitor the progress of its programs. These tests were given in the fall of the year to provide the district with needed information concerning Chapter I eligibility and the status of the general curriculum, and to provide objective information for developing individual students' programs. Grades 3, 5, and 6 are tested with the Iowa Test of Basic Skills.

This study was based upon the test results from the 12 sixth-grade classrooms in the school system that volunteered to be part of the study. This study was not part of the regular testing program and took place in the spring of 1988. Within those classrooms, the students who had test results available from their fall 1987 Iowa Test of Basic Skills testing were used (see Table 3.1). The sixth graders included were housed in two locations: School 1 had 157 students in 6 sections, and School 2 had 154 students in 6 sections. They were similar regarding the other variables of interest (see Table 3.1).
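The balanced randomized block design described in the abstract, with 12 classrooms split into three ability blocks and each block contributing one classroom to every time condition, can be sketched as follows. The classroom labels, ability scores, and function names are invented for illustration; this is not the study's actual assignment procedure or data.

```python
# Hypothetical sketch of a balanced randomized block assignment:
# 12 classrooms are ranked by mean ability, split into 3 blocks of 4,
# and within each block the 4 classrooms are randomly assigned to the
# 4 time conditions (+0, +5, +10, +15 minutes). All data are invented.
import random

TREATMENTS = [0, 5, 10, 15]  # extra minutes beyond the published limit

def assign(classroom_ability, seed=None):
    """Map each classroom to an extra-time treatment, one per block."""
    rng = random.Random(seed)
    ranked = sorted(classroom_ability, key=classroom_ability.get, reverse=True)
    plan = {}
    for i in range(0, len(ranked), len(TREATMENTS)):   # one block at a time
        block = ranked[i:i + len(TREATMENTS)]
        shuffled = TREATMENTS[:]
        rng.shuffle(shuffled)                          # randomize within block
        plan.update(zip(block, shuffled))
    return plan

abilities = {f"Room {n}": score for n, score in
             zip(range(1, 13), [61, 55, 72, 48, 66, 59, 70, 52, 64, 57, 68, 50])}
for room, extra in sorted(assign(abilities, seed=1).items()):
    print(f"{room}: +{extra} min")
```

Because every block receives every treatment exactly once, ability differences between blocks cannot be confounded with the time conditions, which is the point of blocking in this design.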
Variables

The variables considered in the Rudman/Raudenbush study (1986; 1987) were replicated, to the extent possible, in this study. These variables--sex, ethnicity, and social class--have continued to be the focus of sociological research (Rehberg & Rosenthal, 1978, p. 4). In addition, it has been noted that children from families of low levels of education and different language and cultural styles do not do well on formal tests of ability and aptitude (Brookover & Erickson, 1975, p. 104).

The Rudman/Raudenbush research found the best covariate to be achievement on the previous year's Stanford Achievement Test. No significant relationship to achievement was found for the other demographic variables (Rudman & Raudenbush, 1986, p. 6). Rudman and Raudenbush used the number of students receiving AFDC (Aid to Families with Dependent Children) as a secondary blocking variable. This research used the same blocking variables.

This research found the best covariate to be previous achievement on the same subtest of the Iowa Test of Basic Skills. Membership in Chapter I also proved to be a significant covariate. No significant relationship was discovered for the other variables.

Characteristics of the Iowa Test of Basic Skills

The Iowa Test of Basic Skills, 1986 edition, Level 12, is made up of 13 subtests: Vocabulary, Reading Comprehension, Spelling, Capitalization, Punctuation, Usage and Expression, Visual Materials, Reference Materials, Math Concepts, Math Problems, Math Computation, Social Studies, and Science. The school system included in the study administers all the available subtests except Science and Social Studies. This research dealt with only the Reading Comprehension and Reference Materials subtests of the Iowa Test of Basic Skills.

Time Alterations

The time alterations used in this research replicated those used by Rudman/Raudenbush. The control group used the time limits established by the test publisher.
The experimental groups had five, ten, or fifteen minutes added to the testing time of each subtest.

Outcomes

The principal objective of this study was to determine whether additional time will affect the outcome on the Reading Comprehension and Reference Materials subtests of the Iowa Test of Basic Skills. Evidence was found to suggest that achievement on the Reading Comprehension subtest of the Iowa Test of Basic Skills is not affected by additional time. In addition, no significant effect was found for the Reference Materials subtest of the Iowa Test of Basic Skills when additional time was added. Other subtests, after additional study, may be found to be affected by additional time. If this is true, then test publishers should look at alternate ways to set time limits for subtests. The standard procedure of using the 90% rule to set time limits, or variations of this rule, is no longer sophisticated enough when tests are being used for "high stakes" decisions or to compare unequal subgroup populations. Additional research is needed to create new and better procedures to establish time limits for all subtests of all achievement tests.

CHAPTER II

REVIEW OF THE LITERATURE

High Stakes Use of Test Results

The accountability movement in education has brought with it a new reliance on testing to demonstrate the effectiveness of educational programming at all levels. Effective education means a positive change in performance or an increase in knowledge, a change that can be measured and reported. Schools are rewarded or criticized based upon test results. The stakes are high. The importance of administrative practices in the giving of standardized achievement tests has increased with the "high stakes" decisions that are attached to the outcome data. Three reports from the U.S.
Department of Education--A nation at risk (1983), First lessons (1986), and James Madison secondary school (1986)--all point to standardized test results as an indicator of slipping quality in America's schools and to their use as a quality indicator for the future. The use of standardized test results to reward excellence and monitor remedial activities designed to improve competence began before these reports, but it has increased dramatically since these reports were published.

The more recent Education 2000 National Goals for Education, developed at an historic governors' education summit in Charlottesville, Virginia, established six goals for education:

1. By the year 2000 we will bring all children to school ready to learn.

2. By the year 2000 we will raise the graduation rate to 90%.

3. By the year 2000 we will demonstrate student competency in all subjects and prepare students to be responsible citizens, productive workers, and lifelong learners.

4. By the year 2000 we will make the U.S. #1 in math and science.

5. By the year 2000 we will achieve 100% literacy.

6. By the year 2000 we will ensure drug-free, violence-free schools. (National Association of School Boards Bulletin)

The implications for testing are very clear. The measurement for the Goals 2000 will involve student testing at every level and create enormously high stakes for failure to perform. Madaus (1985) reports, although he disagrees, that testing has been seen as the universal cure for the ills of schooling. It is the source of information that will save the world from illiterate graduates. School reforms that rely on tests to certify students or to assess the quality of schools have proliferated in recent years (Salganik, 1985).

Several kinds of high-stakes decisions are based on test results. A review of the literature suggests that tests are used in several ways:

1. Student accountability.

2.
Development of state and local educational policies.

3. Comparing schools and school districts.

4. Determining program or activity eligibility.

5. Teacher certification or advancement.

6. Curriculum change and development.

Student Accountability

Have the students learned what the system has taught them? Should they be given a diploma or promoted? The notion that advancement should be based upon demonstrated competence as shown by test results is very common in education today. Indeed, students' test scores are beginning to be used as the data base for entrance into or exit from academic programs, as well as a major quality-control factor in determining readiness for graduation. A nation at risk (1983), while certainly not the initial catalyst for change, has focused attention upon education as perhaps never before and calls for the "judicious use of achievement tests at key transition points throughout the educational system" (Resnick & Resnick, 1983).

Sandifer reports that the South Carolina Basic Skills Assessment Program, established by legislation in 1978, requires that the state department of education:

1. Establish statewide educational objectives in the basic skills with minimum standards of student achievement for readiness and for grades 1, 2, 3, 6, 7, and 11;

2. Select a readiness test to be administered at the beginning of grade 1; and

3. Select or develop criterion-referenced tests in reading and mathematics to be administered at the end of grades 1, 2, 3, 6, 8, and 11 (Popham, Cruse, Rankin, Sandifer, & Williams, 1985).

Other states have joined the movement. Florida's Senate Bill 68 includes performance standards for each academic course in grades 9-12 (Pipho, 1986). Arkansas called for holding students at the eighth grade unless they could pass a competency test in the basic skills (Pipho, 1986). Texas banned social promotion and required a grade of 70% for passing from one grade to the next (Pipho, 1986).
Georgia's 1985 Quality Basic Education Act addressed the testing of students and stated that a readiness instrument be administered during kindergarten and early first grade, that competencies be established for each student K-12, and that the use of norm-referenced and criterion-referenced tests be increased at all levels (Pipho, 1986). Oklahoma's Reform Bill of 1985 called for the use of norm-referenced standardized testing for students in grades 3, 7, and 10 (Pipho, 1986). It is now common for states, as well as local districts, to base promotion upon test results. Social promotion is becoming a thing of the past. Students are also being confronted with the use of test results to determine whether or not they graduate from high school. Current educational change focuses on outcomes. Unlike earlier reforms that relied on process controls, the new reforms use test scores to determine whether students should receive diplomas or whether program quality is adequate, and rely on "output controls" (Salganik, 1985). Nineteen states are currently implementing tests for high school graduation (Anderson & Pipho, 1984). Rankin reports that the Detroit Public Schools have established requirements for high school graduation called the Detroit High School Proficiency Program. Students in Detroit need to demonstrate mastery of these requirements in order to receive a board-endorsed diploma (Popham et al., 1985). Maryland has used a reading test as a prerequisite for graduation since 1982 and planned to add mathematics and writing in 1987 and citizenship in 1988 (Popham et al., 1985). Remedial programming is another high-stakes decision that is being determined by test scores. Senate Bill 350 in Texas directs that the results of the assessment program are to be used to design and implement appropriate compensatory services in grades 3, 4, and 9.
In addition, each school is to receive an allotment for each disadvantaged student enrolled in the district (Popham et al., 1985). Pennsylvania has mandated a test for remediation purposes for grades 3, 5, and 8 (Anderson & Pipho, 1984). A related issue, tracking, also has received attention. In England, a complex system of national examinations permits schools to assign students to tracks or classes according to test performance (Resnick & Resnick, 1983). Tracking is also identified as one of the major elements affecting U.S. educational standards (Resnick & Resnick, 1983). These are growing trends. In 1975 not one state had established minimum competency testing programs; four years later, 37 states had done so (Doyle & Hartle, 1985). Now 40 states are actively pursuing some form of minimum competency testing (Anderson & Pipho, 1984). This high-stakes use of test results appears to be here to stay.

Development of State and Local Educational Policy

The accountability and reform movement in education is using test results to create policy decisions. In mandating tests, policymakers have created the illusion that test performance is synonymous with the quality of education (Madaus, 1985). Politicians and educational administrators have changed the definition of educational quality from process based to outcome based. Examples such as the MAP (Michigan Accreditation Program) focus on outcome measures and utilize standardized test results as a measure of quality. In addition, P.A. 25 in Michigan requires progress to be reported at least in part as standardized test scores and policy decisions to be made by analyzing standardized test scores. Standardized testing, says one critic, is regarded as the universal cure for educational ills because it is relatively inexpensive, well developed, readily available, and administratively simple. Moreover, the symbolic value of tests is attractive to policymakers (Madaus, 1985).
As public confidence in education has eroded, policymakers have looked for ways to revive that confidence. Testing has introduced a deceptive simplicity to the task of restoring both educational quality and public confidence in the schools. Few people are willing to argue with the use of tests as a means of ensuring quality control (Salganik, 1985). Testing has become the "authoritative" evidence that quality is rising (Salganik, 1985). As school systems across the United States began to react to public pressures in the 1960s and 1970s, they developed policies related to change and set policies based on evidence created through the use of test data. Today state legislatures, state departments of education, local school districts, and test publishers are all working together to bring about more comparative data. Testing is becoming the preferred means of trying to effect change in education (Anderson & Pipho, 1984). The use of testing to effect policy change has moved from the local to the state and national level. The use of tests has assumed a central role in establishing and implementing state and federal education policy. Initially, the results of local tests were used merely to inform state and federal policymakers about the condition of education. The shift away from local use was a logical outgrowth of the huge federal expenditures for curriculum development in the 1960s. Advocates for various racial and ethnic groups have begun to cite discrepancies in test results between minorities and the majority. The idea of national assessment was promoted during the 1960s so that, according to Ralph Tyler, the founder of the National Assessment of Educational Progress, "Necessary information might be periodically collected to furnish a basis for public discussion and broader understanding of educational progress and problems."
The Coleman report was based largely on the results of a standardized test of verbal ability. All of these examples show the trend toward state-level and national-level testing. They represent the use of test scores as an influence within the development of public policy rather than within the sphere of pedagogical practice (Madaus, 1985). Tests and examinations have traditionally served as a major means of setting and maintaining educational standards (Resnick & Resnick, 1983). These additional uses of tests demand that they become more and more reliable. If we are to base policy decisions upon test results, then we must be confident that tests measure what they say they measure and that practices in administering tests do not affect the results. Tests must be made as error free as possible.

Comparing Schools or School Districts

There is a tremendous amount of pressure to compare schools at all levels--local, state, and national. The identification of the best schools is viewed as important in the nation's quest for excellence. Terrel Bell, a past Secretary of Education, has gone on record as favoring a 50-state educational ranking system using the Scholastic Aptitude Test and the American College Testing Program (Anderson & Pipho, 1984). The political pressure caused by such comparison is thought to be an important element of successful programs (Odden & Anderson, 1986). Such a listing of state-by-state comparisons appears in an article by Stellman and Powell (1985). Comparisons are being considered by other groups and agencies as well. The Southern Regional Education Board is going ahead with a pilot program in Florida, Tennessee, and Virginia to make minimum competency test scores comparable across state lines and to publish state-by-state comparison data on tests (Anderson & Pipho, 1984).
In Michigan, school districts are compared in the newspapers based on the results of the Michigan Educational Assessment Program test. The Council of Chief State School Officers voted in November 1984 to endorse such state-by-state comparisons, but they expressed concern that comparisons be fair and established a group to study the issue (Salganik, 1985).

Determining Program or Activity Eligibility

It has already been noted that test results are used to provide information for tracking students in the secondary schools. Resnick has noted that one way to control educational standards is to use test results to monitor individual students' access to programs and diplomas (Resnick & Resnick, 1983). This movement is based on the idea that a common criterion applied equally to everyone is just, since it makes all individuals equal before the law (Popham et al., 1985). However, this does not take into account the administrative practices in giving tests, nor the differing talents of the various subgroups of the population. Required admission to remedial programming is yet another use of test results. As was noted earlier, Senate Bill 350 in Texas directs that students be funneled into remedial classes (Popham et al., 1985). In other states, such as Michigan, additional funds are given to some districts to establish remedial or compensatory programs when they have a significant number of students who score low on assessment tests. Madaus (1985) notes that the placing of children who fall below an artificial cutoff line on a statewide test into remedial programs is one of the problem areas in testing. The National Collegiate Athletic Association has a rule that a student who attends a Division I school must have a combined verbal and math score of 700 or higher on the Scholastic Aptitude Test to be eligible to play during his freshman year (Madaus, 1985).
Local school districts and some states, such as Texas, have taken similar stands and base eligibility for extracurricular activities on academic performance. In New Mexico, 1986 legislation established eligibility requirements for many student programs (Pipho, 1986).

Teacher Certification or Advancement

Student achievement is not alone as a high-stakes use of test results. Teachers are also being tested to ensure competency before being granted teaching credentials. Action in this area has been taken in the following states:

Oklahoma--a 1985 reform bill established an examination in the basic skills for initial teacher certification (Pipho, 1986).

Florida--a 1983 bill called for a statewide merit pay plan for teachers (Pipho, 1986).

Arkansas--Act 76 created a competency test for teachers that required all practicing teachers to pass a general test of academic skills before their certificates would be renewed (Pipho, 1986).

Tennessee--Senate Bill 1 in 1984 created a Career Ladder Law (Pipho, 1986).

California--Senate Bill 813 included a pilot program to reward high schools for improved student achievement (Pipho, 1986).

Michigan--P.A. 267 and P.A. 25 established accreditation programs; Michigan's MAP program uses standardized test results as an output measure in determining the success of a school. Schools must develop annual reports for the public that show current achievement test results and comparisons to past testing. In addition, a two-tier level of teacher competency tests must be passed for state licensure.

Anderson and Pipho (1984) have noted that the use of measured outputs is one of the notions that came out of the accountability movement. Test results mean continued successful operation in some school districts. "What do test results mean to a school board member, administrator, or teacher . . . in a school district in which certification is in jeopardy because of low results on a statewide exam?"
(Madaus, 1985). The superintendent of the Detroit Public Schools announced that she would close schools that do not measure up to set performance and improvement standards. It is clear that teachers, students, and school districts have high stakes in the measured outcomes of student achievement. Certification, accreditation, and increased revenues are strong motivators for districts and teachers to have students who perform well on tests.

Curriculum Change and Development

Testing has become a powerful tool for effecting change in education (Resnick & Resnick, 1983). Popham states, "a high stakes test of educational achievement then serves as a powerful curricular magnet" (Popham, 1987). Providing information to drive curriculum change is one of the oldest uses of testing in the educational community. It is a process tool used to determine the effectiveness of curriculum content and teaching methodology. School districts refer to standardized tests to determine what specific changes they should make in their curriculum. Test developers try to design tests that fit the general curriculum of the nation. Test results are important to the educational community. They are used to make decisions that affect the lives of children and the fate of teachers, schools, school districts, and even the educational well-being of states. These high stakes demand that professionals carefully scrutinize all aspects of test administration as well as test validity and curriculum match. Too much is dependent upon the outcome measures of norm- and criterion-referenced tests.

Speededness in Test Construction

Speed is a factor affecting almost all achievement and aptitude tests (Lord, 1956). Speed, therefore, is a major factor in test administration procedures and, in fact, is used to categorize tests. It is a possible factor in fairness, as speed may differentially affect various subgroups within the population.
Anastasi (1961) states, "It is important to know the extent to which speed and power enter into performance on any particular test." Tests can be categorized into speeded tests, power tests, time-power tests, and speed-difficulty tests (Nunnally, 1978, p. 632). Pure power tests are designed to measure knowledge in the absence of time limits, using items of ordered difficulty, easy to hard. Pure speeded tests measure knowledge under the constraints of time, and the test items are more trivial in difficulty (Nunnally, 1978, p. 631). Standardized tests of achievement are time-power tests due to the necessary administrative functions involved in testing. It is clear that time limits are imposed because of administrative constraints (Kendall, 1964).

Setting Time Limits

The setting of time limits is not clearly understood. Much remains to be learned about speed, in spite of the fact that it is commonly an element in tests and may impact upon test scores.

Is speed on cognitive tests a unitary trait? Or are there different kinds of speed for different kinds of tasks? If so, how highly correlated are these different kinds of speed? How highly correlated are speed and level on the same task? How do various criteria relate to speed, and how speeded should tests to predict these criteria be? (Lord, 1956, p. 31)

Some researchers, Thorndike and Rimoldi, have questioned whether there is a difference between test takers that can be defined by rate of completion (Daly & Stahmann, 1968). The time limits for most tests are set by using some variation of the 90% rule; that is, the time is noted when 90% of the test takers are finished under power conditions. Nunnally refers to this as the "comfortable time limit" (1978). In other situations, this comfortable limit is checked by using a preconceived time limit and then checking to see what percentage of the test takers are finished within that time.
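The 90% rule described above can be sketched computationally; this is a minimal illustration of the convention, not a procedure taken from any of the studies cited, and the finishing times used are hypothetical.

```python
# Minimal sketch of the "comfortable time limit" convention: the limit is
# the elapsed time at which 90% of examinees have finished the test under
# power (untimed) conditions. The finishing times below are hypothetical
# illustration data, not results from any cited study.

def comfortable_time_limit(finish_times_min, pct=0.90):
    """Return the time by which `pct` of examinees have finished."""
    ordered = sorted(finish_times_min)
    # Index of the examinee whose finish marks the pct-th completion.
    k = max(1, int(round(pct * len(ordered))))
    return ordered[k - 1]

times = [22, 25, 27, 28, 30, 31, 33, 34, 36, 45]  # minutes, hypothetical
print(comfortable_time_limit(times))  # -> 36: the 9th of 10 finishers
```

Checking a preconceived limit, as the text notes, simply inverts the computation: count what fraction of the finishing times fall at or below the proposed limit.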
There does not appear to be any empirical evidence for the use of the 90% figure; it appears to be convention (Kendall, 1964). Kendall (1964) contends that there is a "maximum time limit" for each test, the time beyond which "there is no further increase in validity."

Speed Factors

Earlier research has investigated several aspects of speed and possible relationships to covariates. Studies have found several speededness factors. All of these factors are thought to influence the results of achievement and aptitude tests. A 1951 study by French found that there is a perceptual-speed factor: quickly finding the correct answer in the midst of distracting material (Lord, 1956). He breaks this factor down into two additional factors: speed of symbol discrimination, recognizing familiar symbols, and form perception, making sense of unfamiliar symbols (Lord, 1956). Also in his 1951 study, French found these factors to be related to speed: finger dexterity, fluency of expression, ideational fluency, reaction time, speed of association, speed of judgment, tapping, and word fluency. In a 1953 study, French noted factors of speed of closure and speed of cognition (Lord, 1956). Rimoldi, in a 1951 study, found that speed of judgment, speed of cognition, and a personal tempo are factors related to speededness in test taking (Lord, 1953). Lord refers to a number-speed factor in his 1956 study. A study done in 1982 by Wild, Durso, and Rubin addressed the following research question relating to the verbal and quantitative experimental sections of the Graduate Record Examinations Aptitude Test (GRE): Does increasing the amount of time per question have a differential effect on the score (after controlling for initial ability) of examinee subgroups as defined by sex, by race (Black and White), and by the number of years that have elapsed since the baccalaureate degree was obtained?
These researchers discovered that although a larger proportion of examinees completed the experimental tests when given additional time, this extra time does not help any one of the subgroups more than any other subgroup studied. They suggest that further research is needed on the interaction of ability and test timing within various subgroups (Wild, Durso, & Rubin, 1982). Lord, in a 1956 study, looked at a total of 10 factors, four of which were factors of speed. The speed factors were a number-speed factor, a perceptual-speed factor, a verbal-speed factor, and a spatial-speed factor. He found that all correlations between course grades and the four speed factors were positive, but not large. He concluded that speed of various kinds plays some part in the course grades studied and that speededness in the admissions examinations is to this extent justified (Lord, 1956). A 1962 study by Boag and Neild focused on the research question, "Is there a relationship between timed and untimed test scores when vocabulary tests are given to secondary school students grouped according to ability?" These researchers found that:

1. "Speed and power scores cannot be used interchangeably," as supported by the result that high school students within each group made marked changes in their relative standings when they were given additional time.

2. "Changes in relative scores occurred with considerably greater frequency in the regents and nonregents groups (average groups) than in the scholarship (above average) group." The biggest change was in the regents group (average academic track students).

3. "There is a relationship in the results of these scores to the general academic ability of the student" (Boag & Neild, 1962).
They concluded that "under power conditions, the average student, that is the slow and accurate student, comes out nearer the top when given plenty of time but suffers when there is a time limit" (Boag & Neild, 1962). It appears from this study that setting time limits anywhere from a very speeded condition to a high-power condition will affect how at least some subgroups perform on a test. In 1968 Daly and Stahmann conducted a study that looked at the time limits of the Cooperative English Expression Test, which was used as part of the admissions procedure for the University of Utah. The results showed that if 41% of the students had been given additional time on the placement test, they would have achieved the same result they achieved after being placed in the remedial English course. This study identified a special subgroup of test takers, the "slow working." A 1972 study by Evans and Reilly was constructed to determine "(1) if the Reading Comprehension section of the Law School Admissions Test (LSAT) is more speeded for candidates from predominantly black colleges than for a typical candidate population, and (2) if reducing the amount of speededness has a differential effect on the two candidate populations." These researchers identified two ways to investigate speededness: (1) to administer the same test and vary the time limits, and (2) to administer tests with different numbers of questions in the same time limit. They chose the second alternative. Their findings were that the LSAT was somewhat more speeded for fee-free candidates, that reducing the speededness increases the test scores somewhat for both groups, and that decreasing speededness is not significantly beneficial to fee-free candidates (Evans & Reilly, 1972).
There appears to be some evidence from this study that decreasing speededness does have at least some effect on reading achievement scores, as this is what the subsection of the test used is intended to measure. Kendall has identified two kinds of time limits when considering the speededness of tests. The optimal time limit is that time for which the ratio of benefits to the costs associated with testing is maximum. The maximal time limit is that time beyond which validity does not show any reliable increase (Kendall, 1964). This evidence supports the idea that speed may differentially affect subgroups within a population. This effect may show up in the means of different groups taking standardized achievement tests. Therefore, the administration of standardized tests is an important issue. The search for the maximal time limit is crucial if results are to be as accurate as possible and fair across all subpopulations and between groups taking the same test.

CHAPTER III

METHODOLOGY AND INSTRUMENTATION

Sample

This study included the sixth-grade classrooms in a middle-sized school district that volunteered. The sixth graders included in the sample were housed in two locations: School 1 had 128 students in six sections, while School 2 had 147 students in six sections. They were similar regarding the other variables of interest (see Table 3.1). Twelve classrooms in two large elementary schools volunteered to be part of the study. A similar condition was true for the Rudman/Raudenbush research. Within the classrooms that volunteered, all students took the test, but only the scores for students who took the Fall 1987 Iowa Test of Basic Skills and had results for at least the Reading Comprehension subtest were used for the study. This accounts for the differences between the student counts per building, 157 and 154, respectively, and the sample counts per building, 128 and 147, respectively.
Table 3.1
Description of Sample

Demographic Variables                    Factors                     Relative Frequency
1. Percent of each sex in                Male                              50.55
   experimental sample                   Female                            49.45
2. Percent ethnicity in                  Black                             17.09
   experimental sample                   Other Minority                     6.91
                                         Total Minority                    24.00
                                         White                             76.00
3. Percent minority per school           Total Minority, School 1          26.56
   site in experimental sample           Total Minority, School 2          21.77
4. Chapter I population                  Chapter I                         22.55
                                         Regular program                   77.45
5. Family structure of                   2-parent families                 70.54
   experimental sample                   1-parent families                 29.45

Sampling Variable                                                          Count
1. Sixth graders included in sample      Enrollment                          275
                                         Buildings                             2
                                         Building 1                          128
                                         Building 2                          147

Instrumentation

Characteristics of the Iowa Test of Basic Skills

The Reading Comprehension subtest in Level 12 "consists of selections varying in length from a few sentences to a full page. The passages were chosen in an attempt to represent as completely as possible all of the types of materials encountered by the pupils in their everyday reading" (Hieronymus & Hoover, 1986a, pp. 78-79). The subject areas included are social studies, science, literature, and general information. The authors indicate that there is a gradual ranking of higher-level thinking skills as a student moves from level to level. The Reading Comprehension subtest of the Iowa Test of Basic Skills is included in this study to see if any relationship between time and achievement exists, such as that in the Rudman/Raudenbush study (1986; 1987) using the Reading Comprehension subtest of the Stanford Achievement Test. The Reference Materials subtest of the Iowa Test of Basic Skills measures the following skills: using a dictionary, using an encyclopedia, using general references, alphabetizing, using dictionary guide words, using key words, and using an index (Hieronymus & Hoover, 1986a, p. 85).
This subtest was chosen because it tests application skills that are based upon reading and the language arts. These skills are similar to those needed to score well on the Reading Comprehension subtest, and therefore, this subtest may also be sensitive to changes in time limits. It is a subtest that does not parallel content tested by Rudman/Raudenbush (1986; 1987) and may add to an understanding of the effect of increasing time on various subtests of standardized achievement tests.

Setting Time Limits

An important characteristic of achievement tests is that they should be "power" tests rather than "speeded" tests. Pupils should be given ample time to complete the test in order to provide for a true measure of their skill development. . . . The power characteristics of the Iowa Test of Basic Skills are demonstrated in three ways:
1. by the percents of pupils who complete the tests,
2. by the percents of pupils who complete 75% of the tests, and
3. by the percents of items completed by 80% of the pupils (Hieronymus & Hoover, 1986b, pp. 42-43).

The completion rates of the relevant subtests are given in Table 3.2 (Hieronymus & Hoover, 1986b, p. 50).

A Replicated Experimental Design

David Lykken (Borg, 1983, pp. 383-385) has distinguished three types of replication:

1. Literal replication, or an exact duplication of the sampling procedure, experimental conditions, measuring techniques, and methods of analysis of an earlier study.

2. Operational replication, which duplicates only the sampling and experimental procedures.

Table 3.2
Speeded Considerations in Standardizing the Iowa Test of Basic Skills
Completion Rates, Level 12, Spring*

                       Percent of Pupils     Percent of Pupils          Percent of Items
Subtest                Completing Test       Completing 75% of Items    Completed by 80% of Pupils
Reading                      91                      96                        100
Reference Skills             97                      99                        100

*Manual for School Administrators, Iowa Test of Basic Skills, Riverside Publishing Company, 1986, p. 136.

3. Constructive replication, which uses only a statement of the empirical fact the first investigator claimed to have established, and uses different methods of sampling, measurement, and data analysis.

Since this study was designed to duplicate the sampling procedures and measurement techniques of the Rudman/Raudenbush studies (1986; 1987), it is an operational replication.

Analytical Design

When considering an analytical design, an experimenter needs to consider the following points:

1. Will the actual data collected contain all the information needed to make inferences, and can the information be extracted from the data?

2. Can the important hypotheses be tested validly and separately?

3. Will the level of precision reached in estimation, and the power of the statistical tests, be satisfactory for the purpose? (Hayes, 1981, p. 404)

The data collected included demographic data on sex, race, ability, social status, and membership in Chapter I programming. The testing statistic used was the raw score. These data provided all the information necessary to make the inferences required by the hypotheses. Each hypothesis was represented by an experimental change in time limit. Each experimental time limit was assigned to each block. Therefore, the important hypotheses were accounted for in the design of the study. The researcher used the .05 level of confidence to assure precision in the study. This is the level of confidence usually used in social science research. The basic analytical design of this study involved pupils nested within block-by-treatment combinations.
The blocks were used to increase the statistical power of the experiment by reducing the "error." The error is reduced by partitioning it into the sum of squares for the blocks, the sum of squares for the block-by-treatment interaction, and the residual sum of squares.

One can often lower within-factor-level variability of the outcome measures by drawing the experimental units from a homogeneous subpopulation--e.g., persons all the same age, sex, IQ, and socioeconomic level. This may reduce the "error" considerably . . . (Glass & Stanley, 1970, p. 491).

In this study the subpopulations were created by partitioning the experimental groups within the population by ability, defined as the group mean on the Reading Comprehension subtest of the previous fall's Iowa Test of Basic Skills, and by the proportion of children receiving AFDC. The analysis issues involved were:

1. establishing blocks
2. choosing covariates
3. handling unequal sample sizes
4. choosing an appropriate error term for hypothesis testing

Blocking

The participating classrooms were divided into blocks using classroom means on the fall achievement test score on the Reading Comprehension subtest of the Iowa Test of Basic Skills, Form H, Level 12, as the primary blocking variable. The classrooms were then rank ordered from high to low. They were put into one of three blocks by simply dividing the ranked list. After this ordering was complete, the percentage of children receiving AFDC was reviewed for each classroom. The block assignments were also reviewed using this blocking variable. The blocks were established to be as different as possible from one another on the achievement blocking variable and on the proportion of children receiving AFDC.
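The rank-and-split blocking procedure just described, followed by random assignment of treatments within each block, can be sketched as follows. This is an illustrative reconstruction, not the study's actual assignment: the classroom means are rounded from the pretest values reported in Table 3.3, and because the within-block assignment is random, a given run will generally not reproduce the treatment allocation the study obtained from its published table of random numbers.

```python
import random

# Sketch of the blocking procedure: classrooms are rank ordered by fall
# Reading Comprehension mean, split into three ability blocks of equal
# size, and the classrooms in each block are randomly assigned to the
# four time treatments (0, 5, 10, or 15 extra minutes).

def assign_treatments(class_means, n_blocks=3, seed=None):
    """Map classroom -> (block number, extra minutes of testing time)."""
    rng = random.Random(seed)
    ranked = sorted(class_means, key=class_means.get, reverse=True)
    size = len(ranked) // n_blocks
    treatments = [0, 5, 10, 15]  # extra minutes, including the control
    plan = {}
    for b in range(n_blocks):
        block = ranked[b * size:(b + 1) * size]
        shuffled = treatments[:]
        rng.shuffle(shuffled)  # random assignment within the block
        for room, extra in zip(block, shuffled):
            plan[room] = (b + 1, extra)
    return plan

# Classroom pretest means, rounded from Table 3.3 for illustration.
means = dict(enumerate(
    [38.8, 34.8, 33.2, 34.2, 30.2, 27.8, 28.9, 27.0,
     25.4, 19.4, 26.7, 26.3], start=1))
plan = assign_treatments(means, seed=0)
```

With these means the split recovers the study's three ability blocks (classrooms 1-4, 5-8, and 9-12), and by construction every block receives all four treatments exactly once, which is what makes the design a balanced randomized block design.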
In this study there were three blocks: one contained the four highest-achieving classrooms, one contained the four middle-achieving classrooms, and one contained the four lowest-achieving classrooms. The classrooms within each block were then randomly assigned to treatments by assigning each classroom a number and then using a published list of random numbers for assignment. The resulting experimental procedure produced a balanced randomized block design with three blocks and four treatments within each block. The groups are described in Table 3.3.

Table 3.3
Experimental Group Descriptive Statistics by Block

Block  Class  Treat  Count  % Other    % Male  % AFDC  % Chap. I  % One-   Pre-Read   Pre-Read
              (min)         Non-White                             Parent   Mean Raw   Mean Adj.
  1      1      5      26      0.0      46.2     3.8      0.0       0.0     38.84      44.88
  1      2     15      27     29.6      70.4     7.4     11.1      22.2     34.78      42.74
  1      3      0      24     16.7      54.2    33.3      0.0       4.2     33.17      37.33
  1      4     10      29     20.7      44.8    20.7      0.0      17.2     34.17      39.48
 MEAN                26.5     16.7      53.9    16.3      2.8      10.9     35.29      41.15
  2      5     10      25     28.0      52.0    40.0     24.0      36.0     30.20      34.04
  2      6      0      23     30.4      43.5    56.5     17.4      52.2     27.83      27.87
  2      7     15      24     28.0      40.0    32.0     48.0      32.0     28.92      32.68
  2      8      5      23     27.3      59.1    40.9     18.2      40.9     27.04      34.00
 MEAN                23.8     28.4      48.7    42.2     26.9      40.3     28.54      32.18
  3      9     15      15     33.3      60.0    93.3      0.0      26.7     25.40      28.53
  3     10      5      14     35.7      71.4   100.0     78.6      28.0     19.43      19.26
  3     11     10      22     18.2      36.4    72.7     40.9      63.6     26.73      34.55
  3     12      0      23     21.7      39.1    60.9     34.8      43.5     26.30      37.26
 MEAN                18.5     27.2      51.7    81.7     38.6      40.7     24.95      31.26

The blocks are similar in percent of males. The differences are greater in number of students per block, percent of non-White students per group, percent of students on AFDC, percent of students in the Chapter I program, percent of students from one-parent families, and pretest mean. Group mean differences in the demographic variables of AFDC, membership in Chapter I programming, and students from one-parent families were especially large.
These differences suggest that the top ability classrooms contained markedly more advantaged children. The classrooms reflect an ordering of rooms by ability within the school system that is done to program for gifted and talented students. When this ordering is done, related social factors also are ordered from advantaged to less advantaged. This finding substantiates an observation by Brookover: "Since people from minority groups and lower-socioeconomic status families are more likely to be placed in lower educational level groups or tracks, the extensive grouping practice is clearly related to the racial, ethnic, and socioeconomic stratification of the society" (Brookover, 1975, p. 130). Class size differences reflected the opposite pattern; that is, the least advantaged classes had the fewest students. This was especially true of the two smallest classes (14 and 15 students). The smallest class also had a large difference in raw score pretest mean. The fact that fairly large differences in number of students per block and per treatment existed was taken into account in the statistical process used in analyzing the data. The blocks are arranged from advantaged and high achieving to less advantaged and lower achieving.

Were there any apparent differences between the treatment groups? Table 3.4 shows the same descriptive statistics arranged by treatment. The descriptive statistics arranged by experimental group showed somewhat smaller differences between treatment groups on the variables included in the study than the differences between blocks.
Class size varied by a maximum of 4.3 students and a minimum of two students; the proportion of non-White students varied from 30.3% to 16.6%, nearly a factor of two, a large difference; the proportion of males varied by a maximum of 14.5%, from 58.9% in the five-minute treatment group to 44.4% in the ten-minute treatment group; the proportion of AFDC children varied by a maximum of 6% and was highest in the control group (50.2% versus 44.2%); the proportion of Chapter I students nearly doubled between the control group and the five-minute experimental group, from 17.4% to 31.9%, with the other groups close to the five-minute group; and the proportion of children from one-parent families varied by 15.6%, from 38.9% to 23.3%.

Table 3.4
Descriptive Statistics by Treatment Group

Treatment   Class  Count  % Other    % Male  % AFDC  % Chap.  % One-Parent  Mean Pre-  Mean Pre-
                          Non-White                           Homes         Test Raw   Test Adj
Control     3      24     16.7       54.2    33.3    0.0      4.2           33.17      37.33
            6      23     30.4       43.5    56.5    17.4     52.2          27.83      27.87
            12     23     21.7       39.1    60.9    34.8     43.5          26.30      37.26
MEANS              23.3   22.9       45.6    50.2    17.4     33.3          29.16
Five        1      26     0.0        46.2    3.8     0.0      0.0           38.84      44.88
Minutes     8      23     27.3       59.1    40.9    18.2     40.9          27.04      34.00
            10     14     35.7       71.4    100.0   78.6     28.9          19.43      19.26
MEANS              21.0   21.0       58.9    48.2    31.9     23.3          30.30
Ten         4      29     20.7       44.8    20.7    0.0      17.2          34.17      39.48
Minutes     5      25     28.0       52.0    40.0    24.0     36.0          30.20      34.04
            11     22     18.2       36.4    72.7    40.9     63.6          26.73      34.55
MEANS              25.3   16.6       44.4    44.4    21.6     38.9          30.71
Fifteen     2      27     29.6       70.4    7.4     11.1     22.2          34.78      42.74
Minutes     7      24     28.0       40.0    32.0    48.0     32.0          28.92      32.68
            9      15     33.3       60.0    93.3    0.0      26.7          25.40      28.53
MEANS              22.0   30.3       56.8    44.2    19.7     26.9          30.52

The spread of differences on the achievement variable was small, a maximum of 1.55 raw score points. The differences in the other variables were spread among the experimental groups:

1. The control group was highest in the proportion of students on AFDC.
2. The five-minute experimental group was highest in the proportion of students in Chapter I.
3. The ten-minute experimental group was highest in children from one-parent families.
4. The fifteen-minute experimental group was highest in the proportion of non-White children and males.

The blocks were significantly different from one another; the experimental groups, while showing differences on most variables, were similar to one another on the achievement variable. The random assignment of classrooms to experimental groups was thereby verified. This completes the review of the demographic data concerning the establishment of blocks and treatment groups.

Best Covariate

For the Reading Comprehension subtest, the best single covariate proved to be the Reading Comprehension pretest, r = .80. For the Reference Materials subtest, the Reference Materials pretest proved to be the best covariate, r = .75 (see Table 3.5).

Table 3.5
Pearson Correlation Coefficients

                            Post Reading       Post Reference
                            Comprehension      Skills
Prereading Comprehension    .80                .66
                            (N=275, P=.000)    (N=275, P=.000)
Prereference Skills         .71                .75
                            (N=275, P=.000)    (N=275, P=.000)

The other covariates (sex, family configuration, ethnic group membership) did not prove to be significant. Membership in Chapter I was significant for Reading Comprehension. The analysis of the covariates is shown in Table 4.2. This result agreed with the findings of the Rudman and Raudenbush research: "for Reading Comprehension subtest, the best single covariate proved to be the Total Reading pretest . . . for the Word Study Skills subtest, the Word Study Skills pretest proved to be the best single covariate" (Rudman & Raudenbush, 1986, p. 6). An ANOVA using regression techniques was employed to reduce the error associated with unequal sample sizes.
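The Pearson coefficients in Table 3.5 come from the standard product-moment formula, which can be computed directly. The function below is a minimal sketch; the six pretest/posttest pairs are invented for illustration and are not the study's data (the real data, N = 275, produce the r = .80 and r = .75 values above).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical pretest/posttest raw-score pairs, for illustration only.
pre = [33, 39, 34, 35, 30, 28]
post = [37, 45, 39, 43, 28, 33]
r = pearson_r(pre, post)
```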
Choosing the Appropriate F-Test

In nested designs of this type, the appropriate F test for the treatment contrasts typically uses the unexplained variation between classes as the mean square error. In such an analysis, the residual effects of classrooms are viewed as random effects. When the null hypothesis of no residual variance between classrooms is retained, an alternative error term is available, and typically provides a more powerful F-test. The alternative is to pool the residual between-class variation with the residual within-class variation, yielding a dramatic increase in the degrees of freedom associated with the error (Hopkins, 1982) (Rudman & Raudenbush, 1987, p. 9).

Determination of Variables

The variables chosen for inclusion in this research were selected on the following criteria:

1. They replicated the variables in the Rudman/Raudenbush research.
2. They were variables that have been shown to be related to achievement in prior social science research and for which statistics were available to the researcher.

The variables--sex, ethnicity, family configuration, membership in Chapter I, and past academic achievement--were included in this study. Variables such as these have continued to be the focus of sociological research (Rehberg & Rosenthal, 1978, p. 4). In addition, it has been noted that children from families with low levels of education and different language and cultural styles do not do well on formal tests of ability and aptitude (Brookover & Erickson, 1975, p. 104).

The demographic variables are defined in the following manner:

Sex is male or female.

Ethnicity is White (non-Hispanic) and other (largely Black, some Hispanic, Asian, Japanese, and Native Americans).

Family configuration includes the categories of one-parent or two-parent family structures.

Membership in AFDC consists of students receiving Aid to Families with Dependent Children.
Membership in Chapter I means those children receiving supplemental help through the district Chapter I program. These students were those who achieved in the bottom quartile in reading and/or math on the fall Iowa Test of Basic Skills.

The coding for the variables is contained in Table 3.8 and again in the data in the Appendix.

Directions for Administering the Test

The teachers who volunteered to participate in the study all received researcher-prepared direction sheets. All of the instruction sheets were duplicates of the directions given in the manual for the Reading Comprehension and Reference Materials subtests of the Iowa Test of Basic Skills, except that the time limits were altered according to the experimental group to which the class belonged. These direction sheets are shown in Appendix A.

Time Alterations

It was felt reasonable to replicate the time limit extensions of the previous study (Rudman & Raudenbush, 1986) after comparing the Reading Comprehension subtests of the Stanford Achievement Test and the Iowa Test of Basic Skills. A comparison of the two tests appears in Chapter IV. The tests are comparable on the criteria selected. The time limits for the Reference Materials subtest of the Iowa Test of Basic Skills were extended as were the time limits in the original Rudman/Raudenbush study (1986; 1987). This is a different subtest and does not have parallel data from the earlier research. See Table 3.6 for testing times.

Table 3.6
Testing Time by Treatment and Subtest

Treatment   Reading Time   Reference Materials
            in Minutes     Time in Minutes
1           42*            25*
2           47             30
3           52             35
4           57             40

*Preliminary Technical Summary, Iowa Test of Basic Skills, Riverside Publishing Company, 1986, p. 4.

Time Assignments Related to Buildings

The classrooms within the blocks were randomly assigned to treatment groups. The treatments were divided between the schools as shown in Table 3.7. The blocking variables could be viewed as ordinal variables, ranking the classrooms on background variables related to the outcomes. The experimental assignments were as balanced as possible between buildings, considering that there were three blocks and two school sites (see Table 3.7).

Table 3.7
Experimental Groups by Building: Number of Classrooms per Site

Time          School A   School B
Control (0)   1          2
5 minutes     1          2
10 minutes    2          1
15 minutes    2          1

Summary

This research was designed to look at the effect of increasing the time limits on the Reading Comprehension and Reference Materials subtests of the Iowa Test of Basic Skills. The samples were drawn from students in the sixth grade classrooms that volunteered to be part of the study. A total of twelve classrooms were divided into three blocks using ability, defined as prior results on the Reading Comprehension subtest of the Iowa Test of Basic Skills, as the primary blocking factor.

The increases in time limits were set in five-minute increments: a control group (the time limits set by the publisher), five additional minutes, ten additional minutes, and fifteen additional minutes. These paralleled the prior research by Rudman and Raudenbush.

The analytical design employed standard regression procedures and HLM statistical procedures to analyze the data for the three balanced blocks of four groups each. The HLM model is thought to be more accurate when there are small sample sizes and unequal block sizes, as it automatically takes these statistical problems into account as it analyzes the data. The coding of the demographic variables is shown in Table 3.8.

Table 3.8
Demographic Variable Codes

Variable               Category              Code
Sex                    Male                  0
                       Female                1
Ethnicity              White                 0
                       Non-White             1 or 2
Family Configuration   One Parent            1
                       Two Parents           2
Chapter I              No Chapter I          0
                       Receiving Chapter I   1

The blocks were established to be as different as possible on achievement.
The three blocks were ordered in ability from high to low. Assignment to experimental groups within blocks was done randomly.

The best covariate for the Reading Comprehension subtest of the Iowa Test of Basic Skills proved to be prior results on the Reading Comprehension subtest of the Iowa Test of Basic Skills. The best covariate for the Reference Materials subtest of the Iowa Test of Basic Skills proved to be prior achievement on the Reference Materials subtest of the Iowa Test of Basic Skills. The standard F test at the .05 level of significance was used. This matches most of the social science research.

CHAPTER IV

ANALYSIS AND FINDINGS

This research study was designed to test the effect of increasing time on the achievement test results on the Reading Comprehension and Reference Materials subtests of the 1986 edition of the Iowa Test of Basic Skills. The possible relationship of selected covariates to the outcome was examined using a multiple-regression procedure. The best predictors of achievement on the Reading Comprehension and Reference Materials subtests of the Iowa Test of Basic Skills were determined to be past achievement on those same subtests. The results were analyzed using adjusted and unadjusted means for the effect of the best covariate.

Linear, quadratic, and cubic effects were analyzed to determine the nature of any relationship. A linear effect would establish a direct relationship between time and achievement scores; a quadratic effect would suggest that, if a relationship exists, it may diminish over time and give some indication of how much time is "optimal," or when the point of diminishing returns is reached. A cubic effect would suggest a mixed relationship between increased time and achievement.

Outcome Data

The experimental sample took their tests on mark-sense sheets that were machine scored.
The results were transferred to the mainframe computer for analysis via a tape transfer to minimize error. The data were analyzed using a statistical program, SPSS-X (Norusis, 1988). A summary of the outcome data appears in Table 4.1.

Visually, there appeared to be a moderate, scattered effect of increasing scores when test time was increased for most treatment groups on the Reading Comprehension subtest. The effect did not appear to be linear, as there was not an increasing raw score for each increase in time. The group in the five-minute treatment in Block 3 did not appear to fit the general pattern of the results, and its score was quite depressed. The effects for the Reference Materials subtest showed moderate scatter in scores when additional time was given to take the test. The different general outcome for the five-minute experimental time increase in Block 3 was also present in the results for this subtest. This class also had depressed scores on the pretest, 19.43 (see Table 3.3). The results did not appear to be linear.

Table 4.1
Outcome Data: Unadjusted Classroom Means and Sample Sizes (Raw Score)

Subtest/Block          None    Five      Ten       Fifteen   Pooled
                               Minutes   Minutes   Minutes

Reading Comprehension
Block 1  Pretest       33.17   38.84     34.17     34.78     35.29
         Posttest      37.33   44.88     39.48     42.74     41.11
Block 2  Pretest       27.83   27.04     30.20     28.92     28.49
         Posttest      27.87   32.68     34.04     28.53     30.78
Block 3  Pretest       26.30   19.43     26.73     25.40     24.95
         Posttest      38.23   19.14     34.55     34.00     31.48
Posttest Pooled        34.25   33.38     35.34     34.52     34.37

Reference Materials
Block 1                29.21   41.31     33.31     30.04     33.47
Block 2                25.26   30.96     32.12     25.40     28.44
Block 3                27.14   18.00     25.18     32.59     25.73
Pooled                 27.21   30.42     30.24     28.96     29.21

Sample Size "Counts"
Block 1                24      26        29        27        106
Block 2                23      22        25        25        95
Block 3                23      14        22        15        74
Pooled                 70      62        76        67        275

Analysis

Choosing the Best Covariate

The following questions were posed concerning covariate issues:

1. Is prior achievement, defined as a student's score on a prior test of the Reading Comprehension subtest of the Iowa Test of Basic Skills, a good predictor of expected achievement on the Reading Comprehension subtest of the Iowa Test of Basic Skills?

2. Is gender a good predictor of expected achievement on the Reading Comprehension subtest of the Iowa Test of Basic Skills?

3. Is family configuration a good predictor of expected achievement on the Reading Comprehension subtest of the Iowa Test of Basic Skills?

4. Is assignment to Chapter I programming a good predictor of expected achievement on the Reading Comprehension subtest of the Iowa Test of Basic Skills?

5. Is membership in a minority ethnic group a good predictor of expected achievement on the Reading Comprehension subtest of the Iowa Test of Basic Skills?

An analysis of variance procedure was used to determine the correlation between the discrete variables and achievement on the Reading Comprehension subtest and the Reference Materials subtest of the Iowa Test of Basic Skills. Correlations were tested using variables in isolation and in combination with other variables. Previous research has shown that the variables selected for this research are correlated with an increase in achievement. They, singly or in combination, should explain all or most of the variance in scores. Two- and three-way interactions were included. The statistical data appear in Tables 4.2 and 4.3.

The data confirmed that the best predictor of success on the Reading Comprehension subtest of the Iowa Test of Basic Skills is achievement, defined as prior performance on the Reading Comprehension subtest. This relationship showed significance at the .000 level of confidence. There is virtually no chance that the results on a prior subtest in Reading Comprehension would not be a good predictor of subsequent results.
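The screening of each discrete variable against achievement described above can be illustrated with a basic one-way ANOVA F statistic. This is a minimal sketch of the general technique, not the SPSS-X procedure actually run, and the toy score lists are invented.

```python
def one_way_anova_F(groups):
    """F statistic for a one-way ANOVA over lists of scores, one per group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group sizes times squared deviations
    # of group means from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations of each score from
    # its own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical screening of a binary variable (e.g., Chapter I membership)
# against raw scores: one list of scores per category.
F = one_way_anova_F([[34, 38, 31, 36], [26, 29, 24, 27]])
```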
Prior results on the Reference Materials subtest also appeared to be a good predictor of success on the Reading Comprehension subtest, though not as strong as prior results on the Reading Comprehension subtest itself. The significance level was .039, within the .05 guideline usually used in social science research and also embraced in this research. This variable, however, was not as good as the pretest in Reading Comprehension.

Table 4.2
Relationship Between Covariates and Achievement on the Reading Comprehension Subtest of the Iowa Test of Basic Skills, Adjusted Score Data

Source of Variation    Sum of      DF    Mean       F         P of F
                       Squares           Square
Covariates             16348.336   2     8174.168   88.945    .000
  Preread              4624.378    1     4624.378   126.794   .000
  Prereference         397.612     1     397.612    4.327     .039
Main Effects           474.306     4     118.576    1.290     .274
  Sex                  15.906      1     15.906     .173      .678
  Chapter I            414.813     1     414.813    4.514     .035
  Race                 14.904      1     14.904     .162      .687
  Family               92.918      1     92.918     1.011     .316
2-Way Interactions     339.390     6     56.565     .615      .718
  Sex/Chapter          46.251      1     46.251     .503      .479
  Sex/Race             9.122       1     9.122      .099      .753
  Sex/Family           222.005     1     222.005    2.416     .121
  Chapter/Race         12.248      1     12.248     .133      .715
  Chapter/Family       .015        1     .015       .000      .990
  Race/Family          28.013      1     28.013     .305      .581
3-Way Interactions     897.381     4     224.345    2.441     .047
  Sex/Chp/Race         2.998       1     2.998      .033      .857
  Sex/Chp/Fam          268.884     1     268.884    2.926     .088
  Sex/Race/Fam         46.209      1     46.209     .503      .479
  Chp/Race/Fam         229.074     1     229.074    2.493     .116
Explained              18059.413   16    1128.713   12.282    .000
Residual               23618.645   257   91.901
Total                  41678.058   273   152.667

Table 4.3
Relationship Between Covariates and Achievement on the Reference Materials Subtest of the Iowa Test of Basic Skills, Adjusted Score Data

Source of Variation    Sum of      DF    Mean       F         P of F
                       Squares           Square
Covariates             6664.956    2     3332.478   20.671    .000
  Preread              336.244     1     336.244    2.086     .150
  Prereference         1480.832    1     1480.832   9.186     .003
Main Effects           753.835     4     183.959    1.141     .338
  Sex                  145.481     1     145.481    .902      .343
  Chapter I            107.392     1     107.392    .666      .415
  Race                 84.828      1     84.828     .526      .469
  Family               248.353     1     248.353    1.541     .216
2-Way Interactions     332.765     6     55.461     .344      .913
  Sex/Chapter          54.492      1     54.492     .338      .561
  Sex/Race             14.381      1     14.381     .089      .765
  Sex/Family           22.449      1     22.449     .139      .709
  Chapter/Race         71.288      1     71.288     .442      .507
  Chapter/Family       .266        1     .266       .002      .968
  Race/Family          145.825     1     145.825    .905      .342
3-Way Interactions     1045.475    4     261.269    1.621     .169
  Sex/Chp/Race         65.415      1     65.415     .406      .525
  Sex/Chp/Fam          199.242     1     199.242    1.236     .267
  Sex/Race/Fam         81.458      1     81.458     .505      .478
  Chp/Race/Fam         104.030     1     104.030    .645      .423
Explained              8779.031    16    548.689    3.404     .000
Residual               41431.674   257   161.213
Total                  50210.704   273   183.922

One other variable turned out to be significant: membership in Chapter I. It was significant at the .035 level of confidence. The raw score data shown in Table 4.1 show that Block 3, the block containing the highest percentage of students in Chapter I (38% on average, versus 10.9% and 26.9% for Blocks 1 and 2, respectively), had a different achievement pattern over time than either Block 1 or Block 2. The group in the five-minute experimental time for Block 3, which had very different results as noted in Table 4.1, was entirely made up of Chapter I students. The rest of the covariates were not significant, either alone or in combination with other covariates. The only other covariate effect that was even close to significant was a three-way interaction among sex, Chapter I membership, and family configuration, at the .088 level of significance.
Reference Materials

The same analysis of covariance was performed on the results for the Reference Materials subtest of the Iowa Test of Basic Skills. The results for this subtest are shown in Table 4.3. The only good predictor of achievement on the Reference Materials subtest of the Iowa Test of Basic Skills proved to be the pretest of the Reference Materials subtest of the Iowa Test of Basic Skills, at the .003 level of significance. The rest of the covariates were not significant, either alone or in combination with other covariates. Membership in Chapter I and the three-way interaction among sex, membership in Chapter I, and family configuration were not significant, or even close to significant, as they were on the Reading Comprehension subtest.

After analyzing the data, it appeared that the best predictor of achievement on the Reading Comprehension subtest of the Iowa Test of Basic Skills is prior achievement on the Reading Comprehension subtest of the Iowa Test of Basic Skills; the best predictor of achievement on the Reference Materials subtest of the Iowa Test of Basic Skills is the pretest score on a previous Reference Materials subtest of the Iowa Test of Basic Skills. When controlled for the best predictor, the other covariates did not appear to be significant. The effects of these two covariates were used to adjust the scores in later analysis.

Reading Comprehension

The following research question was used when analyzing the Reading Comprehension subtest scores:

Will increasing the time limits on the Reading Comprehension subtest of the Iowa Test of Basic Skills by five, ten, or fifteen minutes significantly increase student achievement?

When the data for Reading Comprehension were analyzed by comparing the posttest unadjusted means for each experimental group, there appeared to be a mixed relationship between time and achievement (Table 4.1).
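The covariate adjustment referred to above is, in its textbook ANCOVA form, a shift of each group's posttest mean by the pooled regression slope times the group's distance from the grand pretest mean. A minimal sketch follows, with illustrative numbers only; the slope value is assumed, not estimated from the study's data.

```python
def adjusted_mean(post_mean, pre_mean, grand_pre_mean, slope):
    """Classic ANCOVA adjustment of a group mean for a covariate.

    A group whose pretest mean sits below the grand pretest mean has its
    posttest mean adjusted upward, and vice versa.
    """
    return post_mean - slope * (pre_mean - grand_pre_mean)

# Illustrative values: a group 2 raw-score points below the grand pretest
# mean, with an assumed pooled within-group slope of 1.2.
adj = adjusted_mean(post_mean=30.0, pre_mean=28.0,
                    grand_pre_mean=30.0, slope=1.2)
```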
Blocks 1 and 2, the two highest ability groups, showed a small increase across all treatments. There was no clear pattern of a greater effect for any one treatment. Block 3, the group of lowest ability, showed a decrease in mean across all treatments. When the means were pooled across all blocks, the effect appeared to be fairly flat, with no increase in average score (Table 4.4).

Table 4.4
Pooled Unadjusted Treatment Means (Raw Score) for the Reading Comprehension Subtest, Iowa Test of Basic Skills

Treatment Group             Treatment Mean
Control                     34.25416
Treatment 1 (5 minutes)     33.38303
Treatment 2 (10 minutes)    35.34277
Treatment 3 (15 minutes)    34.52455

To determine whether the moderate change in the raw score means on the Reading Comprehension subtest of the Iowa Test of Basic Skills, noted visually by comparing the treatment means, was significant, a standard ANOVA procedure was employed. The results of the ANOVA are shown in Table 4.5. Using adjusted raw scores, the relationship between increased time and Reading Comprehension subtest scores on the Iowa Test of Basic Skills did not appear to be significant. However, the treatment effect on the outcome measure does appear to be dependent upon block membership. Any effect of time on achievement, significant or not, may be dependent on ability (p = .05). Block membership in this experimental sample was determined by ability level. In addition, there was a significant treatment by block interaction effect (p = .00). The effect of increasing time on students' achievement results, although not significant, may be different for students of low ability than for students of average or better ability on the Reading Comprehension subtest of the Iowa Test of Basic Skills. The adjusted means (Table 4.6) for the results of the Reading Comprehension subtest were graphed by block to help explain the treatment by block interaction effect.
It is possible that what we are seeing is a teacher or a tracking effect and not directly attributable to increasing the time.

Table 4.5
Analysis of Variance Using Unique Sum of Squares, Raw Scores, Unadjusted, Reading Comprehension

Source                     SS         DF    MS        F        P of F
Within Cells               22414.39   262   85.55
Covariate (Pretest-read)   9192.37    1     9192.37   107.45   .000
5 Minutes                  25.31      1     25.31     .30      .587
10 Minutes                 .05        1     .05       .00      .982
15 Minutes                 102.40     1     102.40    1.20     .275
Block                      942.32     2     471.16    5.51     .005
Treat/Block                2571.40    6     428.57    5.01     .000

Figure 4.1 illustrates a treatment by block effect. The depressed score for the 5-minute treatment in Block 3 is markedly different from all other treatment scores. All members of this group were Chapter I students, and the pretest score was also depressed. This could be the result of prior placement of students by ability, low expectations of a teacher or teachers due to the tracking, or poor test administration practices.

Reference Materials

The next research question was concerned with adding time to the Reference Materials subtest of the Iowa Test of Basic Skills:

Will increasing the time limits on the Reference Materials subtest of the Iowa Test of Basic Skills by five, ten, or fifteen minutes significantly increase student achievement?

When the unadjusted treatment means for the Reference Materials subtest were viewed, it appeared that there was a mixed effect of time on achievement. Blocks 1 and 2 showed a small increase in mean across all treatments. Block 3 showed a decrease for treatment 1 (a five-minute increase in time) and treatment 2 (a ten-minute increase in time); however, the mean was slightly higher for treatment 3 (a fifteen-minute increase in time). When the means were pooled, the increase in means appeared fairly flat across all treatments (see Tables 4.1 and 4.7).

[Figure 4.1: Adjusted mean Reading Comprehension scores by treatment, plotted separately for Blocks 1, 2, and 3.]
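The linear, quadratic, and cubic effects referred to at the start of this chapter can be probed with textbook orthogonal polynomial contrasts for four equally spaced treatment levels. The sketch below applies them to the pooled unadjusted Reading Comprehension means of Table 4.4 (rounded); it illustrates the idea and is not the analysis actually run.

```python
# Orthogonal polynomial contrast coefficients for four equally spaced
# treatment levels (0, 5, 10, and 15 added minutes). Each set of
# coefficients sums to zero, and the three sets are mutually orthogonal.
linear    = [-3, -1,  1, 3]
quadratic = [ 1, -1, -1, 1]
cubic     = [-1,  3, -3, 1]

def contrast(coefficients, means):
    """Contrast estimate: the weighted sum of the treatment means."""
    return sum(c * m for c, m in zip(coefficients, means))

# Pooled unadjusted Reading Comprehension means (Table 4.4, rounded):
# control, +5, +10, and +15 minutes.
means = [34.25, 33.38, 35.34, 34.52]
linear_trend = contrast(linear, means)  # small relative to the means' scale
```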