PREDICTING OUTCOMES ON HIGH STAKES ASSESSMENTS FOR MIDDLE SCHOOL STUDENTS: A COMPARISON OF CURRICULUM-BASED MEASURES AND EXTANT DATASETS

By

Nathan A. Stevenson

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Special Education – Doctor of Philosophy

2015

ABSTRACT

PREDICTING FAILURE ON HIGH STAKES ASSESSMENTS FOR MIDDLE SCHOOL STUDENTS: A COMPARISON OF CURRICULUM-BASED MEASURES AND EXTANT DATASETS

By

Nathan A. Stevenson

As a school-wide framework, Multi-Tiered Systems of Support (MTSS) relies on the prevention and early identification of students at risk of academic failure (Sugai & Horner, 2009). Approaches to early identification of students in need of support include the administration of universal screening assessments and the analysis of existing student data such as attendance, grades, office discipline referrals, and prior performance on statewide assessments. However, there is little research that directly compares the accuracy and reliability of these approaches, particularly in the middle grades. This investigation provides a direct comparison of curriculum-based measures in reading and the examination of archival data at the middle school level for the identification of students at risk for academic failure. Data were collected for students in grades seven (n = 197) and eight (n = 237). Data were analyzed through hierarchical logistic and linear regressions using outcomes on reading subtests of the Michigan Education Assessment Program (MEAP) and ACT Explore® as the dependent variables. Results inform how data from universal screening assessments and existing sources can be used to accurately and efficiently identify students in need of academic support.

Copyright by NATHAN A. STEVENSON 2015

Dedicated to Jenny, Carter, and Lily. Thank you for everything you are and all that you do. I could not have done this without you.

ACKNOWLEDGEMENTS

I wish to thank the incredible group of teachers, administrators, and consultants I work with every day. Special thanks to Dr. Cynthia Okolo for her unwavering support, advisement, critique, and flexibility throughout this experience. Thank you to Dr. Sara Witmer and Dr. Mark Reckase for their superb advice and service on this committee. To Dr. Troy Mariage, who helps me see the bigger picture and keeps me excited about everything. To my teammates Dave Pike and Nate Jarvie for their tremendous contributions to the education of all students. And finally to my wife, Jennifer, whose support and encouragement made this possible.

TABLE OF CONTENTS

LIST OF TABLES
INTRODUCTION
CHAPTER 1: Rationale
    Universal Screening
    Screening in Secondary Schools
    Screening in Middle Schools
CHAPTER 2: Literature Review
    Universal Screening Measures in Reading
        Oral Reading Fluency
        Maze Reading Comprehension
        easyCBM Multiple Choice Reading Comprehension
    Early Warning Signs
        Attendance
        Behavior
        Course Failure
        Statewide Reading Assessments
    Recent Research in Middle School Screening
    Research Questions
CHAPTER 3: Methods
    Participants and Setting
    Data Collection
    Reading Screening Measures
        easyCBM Multiple Choice Reading Comprehension
        Reading-Curriculum Based Measure
        Maze Reading Comprehension
    Early Warning Signs
        Attendance
        Office Discipline Referrals
        Failing Grades
    Reading Outcome Measures
        Michigan Education Assessment Program
        ACT Explore® Reading
    Data Analysis
        Data Screening
        Binary Analysis
        Linear Analysis
CHAPTER 4: Results
    Data Screening
    Demographic Variables
    Binary Analysis
        Early Warning Signs
        Curriculum Based Measures
    Linear Analysis
        Early Warning Signs
        Curriculum Based Measures
CHAPTER 5: Discussion
    Summary
        Question 1
        Question 2
        Question 3
    Limitations
    Implications
    Suggestions for Future Research
APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table 1 Summary of Descriptive Statistics and Tests for Normality
Table 2 Model Fit for Seventh Grade EWS Variables on MEAP Reading
Table 3 Model Fit for Eighth Grade EWS Variables on MEAP Reading
Table 4 Logistic Regression Results for Seventh Grade EWS on MEAP Reading
Table 5 Logistic Regression Results for Eighth Grade EWS on MEAP Reading
Table 6 Model Fit for Eighth Grade EWS Variables on ACT Explore® Reading
Table 7 Logistic Regression Results for Eighth Grade EWS on ACT Explore® Reading
Table 8 Model Fit for Seventh Grade CBM Variables on MEAP Reading
Table 9 Model Fit for Eighth Grade CBM Variables on MEAP Reading
Table 10 Logistic Regression Results for Seventh Grade CBM on MEAP Reading
Table 11 Logistic Regression Results for Eighth Grade CBM on MEAP Reading
Table 12 Model Fit for Eighth Grade CBM Variables on ACT Explore® Reading
Table 13 Logistic Regression Results for Eighth Grade CBM on ACT Explore® Reading
Table 14 Linear Regression Results for Seventh Grade EWS on MEAP Reading
Table 15 Linear Regression Results for Eighth Grade EWS on MEAP Reading
Table 16 Linear Regression Results for Seventh Grade CBM on MEAP Reading
Table 17 Linear Regression Results for Eighth Grade CBM on MEAP Reading
Table 18 Linear Regression Results for Eighth Grade EWS on ACT Explore® Reading
Table 19 Linear Regression Results for Eighth Grade CBM on ACT Explore® Reading
Table 20 Comparison Data of Enrollment and Academic Achievement for 2012

INTRODUCTION

Multi-Tiered Systems of Support (MTSS) is a framework for the delivery of instruction and intervention to support positive academic and behavioral outcomes. MTSS incorporates both academic and behavior supports (Sugai & Horner, 2009). With the integration of Response-to-Intervention (RtI) and School-Wide Positive Behavior Supports (SW-PBIS), MTSS provides schools with a comprehensive structure of services including: (a) early and accurate identification of students in need of support, (b) delivery of research-based instruction and interventions, (c) systematic collection and analysis of student data, and (d) explicit methods of data-based decision making (Fuchs, 2003; Fuchs & Fuchs, 1998; Fuchs, Fuchs, & Speece, 2002; Fuchs, Mock, Morgan, & Young, 2003; Sugai & Horner, 2009; Vaughn, Linan-Thompson, & Hickman, 2003). While there is much debate on the impact of MTSS on student achievement, it is widely accepted that the early identification and prevention of academic failure is a reasonable and responsible approach to ensuring that all students meet or exceed the standards for career and college readiness (Fuchs, 2003; Fuchs, Fuchs, & Compton, 2013).

Early identification and treatment of students in need of support is a core function of any MTSS system (Batsche et al., 2005; Hosp, Hosp, & Dole, 2011; Salvia, Ysseldyke, & Bolt, 2009). At the elementary level, identification of learners at risk for not reaching grade level benchmarks is often done using Curriculum-Based Measures (CBM) such as oral reading fluency and Maze reading comprehension. CBM are a key source of data in (a) screening for risk of academic failure, (b) progress monitoring, (c) program evaluation, and (d) diagnosis of specific learning disabilities, and are (e) a recommended component of a balanced assessment system (Salvia, Ysseldyke, & Bolt, 2009). At the secondary level, existing data sets such as state achievement tests and course grades are used more often as screening tools than are CBM (Heppen & Therriault, 2008). Copious research has demonstrated the predictive power of extant data at the high school level, with such factors as attendance, behavior, grades, and results from past state assessments predicting high school dropout (Balfanz, 2009; Balfanz, Herzog, & Mac Iver, 2007; Baydar, Brooks-Gunn, & Furstenberg, 1993; Casillas et al., 2012; Heppen & Therriault, 2008; Jerald, 2006; Pinkus, 2008).
These data, commonly known as Early Warning Signs (EWS), are used in high schools throughout the United States to serve a screening function similar to that of CBM at the elementary level (Kennelly & Monrad, 2007). While screening with CBM reading assessments in elementary school is done to predict which students are at risk of not reaching grade level performance benchmarks, EWS at the high school level are typically used to predict which students have academic or behavior problems and are therefore at risk for dropout. However, despite the growth of research in MTSS, CBM, and EWS, there remain several unanswered questions with regard to the technical adequacy and usefulness of such predictive data at the middle school level. The following chapters discuss the purpose, rationale, and brief history behind the use of CBM and EWS as universal screening measures, a review of existing literature regarding the use of CBM and EWS in middle school, along with the methods, results, and discussion of an investigation comparing the use of CBM and EWS in predicting failure on high stakes tests.

CHAPTER 1: Rationale

Universal Screening

To provide effective intervention and supplemental services to students that struggle, schools must first identify those in need of support. Early and accurate identification of students at risk for developing persistent skill deficits provides schools the best opportunity to intervene early and put students back on track to success. One approach to identifying students in need of support is through the use of universal screeners. Universal screeners are brief assessments given to all students in a school as an initial means for determining which students may be in need of additional instructional supports and services (Hosp, Hosp, & Dole, 2011). Universal screeners in schools are analogous to measures of blood pressure or body temperature in a routine office visit to a physician. Such assessments are not diagnostic but are strong indicators that problems exist. Early identification through screening procedures aids in identifying those individuals who, without intervention, would "develop serious and chronic academic problems" (Fuchs, Fuchs, & Compton, 2013, p. 265). Though there is much debate on the efficacy of screening with regard to achieving better student outcomes for those at risk of academic problems, it is widely accepted that a focus on early identification and prevention of academic difficulties is a promising approach to enabling success in school (Fuchs, Fuchs, & Compton, 2013).

Many schools administer universal screeners in the form of CBM at regular intervals throughout the school year. Curriculum-based measures (CBM) are a class of assessments used to measure student progress in specific school-based skills (Deno, 1985; Deno, 2003). CBM are robust assessments of academic skills that have been standardized for difficulty within and across grade levels (Hosp, Hosp, & Dole, 2011). Measures such as oral reading fluency (ORF) provide an estimate of students' overall reading ability by measuring the number of words a student can read per minute from a grade level appropriate text. When used as universal screeners, CBM scores for individual students are compared to predetermined benchmark cut scores (Hasbrouck & Tindal, 2006; Tindal & Nese, 2009). This comparison aids in determining whether a student is at risk of failure and, therefore, requires intervention.
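As a concrete illustration of the screening logic just described, the short sketch below flags students whose fall ORF benchmark score falls below a predetermined grade-level cut score. It is a minimal sketch under stated assumptions: the cut-score values, field names, and student records are hypothetical placeholders, not values from this study or from any published benchmark table.

```python
# Minimal sketch of a benchmark screening decision: compare each student's
# median ORF score (words read correctly per minute) to a predetermined
# grade-level cut score. All values below are hypothetical placeholders.

ORF_FALL_CUT_SCORES = {7: 128, 8: 133}  # hypothetical cut scores by grade

def at_risk(grade: int, wrc_per_minute: float) -> bool:
    """Return True when a score falls below the grade-level cut score."""
    return wrc_per_minute < ORF_FALL_CUT_SCORES[grade]

students = [
    {"id": "S01", "grade": 7, "wrc": 142},
    {"id": "S02", "grade": 7, "wrc": 101},
    {"id": "S03", "grade": 8, "wrc": 127},
]

# Students flagged here would be considered for intervention or follow-up assessment.
flagged = [s["id"] for s in students if at_risk(s["grade"], s["wrc"])]
print(flagged)  # ['S02', 'S03']
```

In practice, a school would repeat this comparison at each benchmark period and weigh the flag alongside other evidence before assigning students to intervention.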
Currently, there are a variety of commercially and publicly available CBM for use as universal screening assessments in elementary and secondary schools. The Center on Response to Intervention at American Institutes for Research (www.rti4success.org) conducts a periodic review and maintains a comprehensive catalog of CBM that includes detailed information on the psychometric properties, extent of empirical evidence, and strength of existing evidence for each assessment. Examples include DIBELS Oral Reading Fluency (D-ORF), Reading-Curriculum Based Measure (R-CBM), Maze Reading Comprehension (Maze), Nonsense Word Fluency (NWF), and easyCBM™ Multiple Choice Reading Comprehension (MCRC; Hintze & Silberglitt, 2005; Hosp, Hosp, & Dole, 2011; Saez et al., 2010; Yeo, 2009).

Screening in Secondary Schools

Screening procedures at the secondary level are intended to identify students in need of additional supports and provide remediation services as early as possible. The earlier that students are identified for intervention, the more time there will be to provide interventions and potentially stem dropout. Unfortunately, the idea of early intervention is a bit of a misnomer in middle and high school. Once students are in middle and high school, there is comparatively little time left in their K-12 careers to get them back on track for graduation. In fact, as will be discussed in detail in Chapter Two, the relationship between early warning indicators and eventual dropout is so strong that it is highly unlikely that students off track in their first year of high school will graduate without early, intensive, and ongoing intervention. Because of this time crunch, there is great urgency in identifying students that need interventions or other supplemental services as early as possible.

Research demonstrating intervention effects that allow students with persistent reading deficits to make catch-up levels of growth is scarce. In 2010, Vaughn and colleagues explored the effects of a package of intensive reading interventions that incorporated decoding, vocabulary, fluency, and comprehension for students in sixth grade. Students identified for additional reading support received an average of 99.6 hours of intervention in a single school year. Though positive, effects were generally weak (median effect size .19, range .015-.19) and in most cases showed no statistically significant post-test gains over pre-test scores (Vaughn et al., 2010). A similar study conducted by Wanzek et al. (2011) tested the effects of a year-long intervention program for middle school students with identified learning disabilities in reading. Despite daily instruction of 50 minutes in groups of 10 to 15 students, effects over the control group were small (η = .000-.054), with significant effects limited only to word reading fluency. Small effect sizes in the treatment of reading problems for students in middle school are not unusual; similar results have been found in many other studies (Calhoon & Petscher, 2013; Edmonds et al., 2009; Roberts, Vaughn, Fletcher, Stuebing, & Barth, 2013; Vaughn et al., 2012). Even when reading interventions demonstrate large positive effects, the gains may not be enough to bring struggling readers to a level at which they no longer require additional interventions and supports.
For example, a study conducted by Vaughn and colleagues on intensive reading interventions for students in grades 6-8 found that, despite a treatment effect size of 1.20, most students in the treatment group improved significantly but did not reach grade level benchmarks of performance after three full years of intensive intervention (Vaughn et al., 2012). These studies are presented not to be discouraging but to illustrate the difficulty in helping older students with deficits in reading achieve accelerated rates of growth and eventually reach grade level standards. In order to maximize opportunities for intervention, efficient and accurate methods of early identification of at-risk individuals are necessary.

For decades, researchers and practitioners alike have explored the connections between school failure and predictors of failure such as early literacy skills, cognitive ability, and behavior characteristics (Baydar, Brooks-Gunn, & Furstenberg, 1994). Presumably, early identification of students at risk of dropping out will enable educators to provide timely, effective intervention that will keep students in school and put them back on the path to graduation (Balfanz, 2009; Balfanz, Herzog, & Mac Iver, 2007; Pinkus, 2008). "Many students who drop out send strong distress signals for years" (Neild, Balfanz, & Herzog, 2007, p. 28). Data such as attendance, grades, state assessment results, and office discipline referrals, commonly called Early Warning Signs (EWS), are consistently good predictors of eventual dropout (Kennelly & Monrad, 2007). Much like CBM at the elementary level, EWS is a method of universal screening for students in need of academic and behavioral support. However, instead of administering assessments of specific skills, EWS uses existing data as an indicator of risk for failure and dropout. EWS at the upper grades are often favored over CBM due in part to the logistical considerations of data collection and time. Students in early elementary grades, specifically kindergarten and first grade, lack the archival data that are often available for older students simply because they have yet to amass such data during their limited school careers. Furthermore, students in the early grades are also not required to take state achievement tests until third grade. Again, like CBM, EWS are intended to detect potential achievement and behavior problems but do not typically provide information on what to do about such problems. EWS can, however, provide insight into the nature of interventions required and help schools allocate resources toward systems and infrastructure that minimize risk factors (Pinkus, 2008). Knowing exactly which warning signs a student is exhibiting may give teachers insight into the specific services that may address the problem. For example, disaggregating early warning signs data by attendance, course failure, and office discipline referrals can help teachers determine if the best course of action is an intervention focused on developing school appropriate behaviors, providing academic support, or addressing poor attendance.

Screening in Middle Schools

While the research on CBM and the use of EWS data continues to grow, there remain many unanswered questions concerning the use of CBM and EWS as they relate to universal screening in middle school. Though this has begun to change in recent years, far fewer studies exist that explore the use of CBM with students in grades 6 through 8 than at the elementary level (Denton et al., 2011).
The vast majority of research on CBM as a screening tool has focused on students in elementary grades (Baker et al., 2014; Denton et al., 2011; Yeo, 2009). Similarly, most research on EWS has focused on students in grades 9 through 12. This leaves educators working in middle schools with comparatively less guidance concerning which methods of screening are most appropriate for quickly and accurately identifying students in need of support in grades 6 through 8. Furthermore, even amongst what is known about the reliability and predictive validity of screening measures, there are often "serious inefficiencies" (p. 265) in implementing screening procedures, resulting in "unacceptably high rates of false positives (or students who appear at risk but are not)" (Fuchs, Fuchs, & Compton, 2013, p. 266). Currently, it is not clear if either CBM or EWS are appropriately fit for students in middle school. It is also unclear which of these two methods provides the most accurate and reliable assessment of risk. In order for schools to function efficiently and effectively for all students, it is necessary to explore CBM and EWS approaches to screening for students in need of additional support. The following chapter explores extant literature on three commonly used CBM assessments and four common EWS predictors, their predictive validity, and how they have been used for universal screening in middle school. This study explores two different methods (EWS and CBM) for early identification of students at risk for reading problems and how data from these two methods might be used to create a more efficient and accurate approach to screening for students in middle school.

CHAPTER 2: Literature Review

There are many popular CBM now in use across various assessment platforms. Measures such as Phoneme Segmentation Fluency (PSF), Oral Reading Fluency (ORF), and Nonsense Word Fluency (NWF) have drawn much attention from researchers given the relationship between early identification and the prevention of further reading problems (Denton, 2012; Goffreda, DiPerna, & Pedersen, 2009; Jenkins & O'Connor, 2002). For at-risk students, early identification of reading problems may enable early intervention and therefore prevent further problems in later years (Compton, Fuchs, & Fuchs, 2013). There is now more than thirty years of research validating CBM as a predictor of reading proficiency (Baker et al., 2014; Deno, Mirkin, & Chiang, 1982; Deno, Mirkin, Chiang, & Lowry, 1980; Espin & Deno, 1995; Hintze, Keller-Margulis, & Shapiro, 2008; Helwig, Anderson, & Tindal, 2002; Hosp, Hosp, & Dole, 2011; McGlinchey & Hixson, 2004; Shapiro, Keller, Lutz, Santoro, & Hintze, 2006; Yeo, 2009). Likewise, numerous studies have shown that high school dropout can be predicted with some degree of certainty using extant data from EWS (Balfanz, 2009; Balfanz, Herzog, & Mac Iver, 2007; Baydar, Brooks-Gunn, & Furstenberg, 1993; Casillas et al., 2012; Heppen & Therriault, 2008; Jerald, 2006; Pinkus, 2008). The following section outlines several of the more commonly used CBM and EWS screening measures and examines their current application in middle school.

Universal Screening Measures in Reading

Oral Reading Fluency. Oral reading fluency (ORF) is the most ubiquitous of all reading CBM (Deboy, 2013). Several online assessment platforms include ORF as a key measure in their CBM assessment packages for students in first through eighth grade.
Oral reading fluency is a general measure of grade level reading speed and accuracy. Students read grade level passages that are timed for one minute. ORF assessments are given in a 1-on-1 setting. Students read a passage aloud while the test administrator marks errors. The number of errors is subtracted from the total number of words read to yield a score of Words Read Correctly per Minute (WRC). Students are assessed on three independent passages, and the median score is then recorded as the student's score. ORF is commonly used to assess students in first through eighth grade. The wide range of available testing levels and correlations ranging from .60 to .80 with standardized measures of reading comprehension make ORF a popular screening assessment (Shinn & Shinn, 2002). In an analysis of ORF across grade levels, Ardoin and colleagues found a very strong relationship between ORF and other common measures of reading comprehension such as the Woodcock-Johnson III (r = .76) and the Iowa Test of Basic Skills-TR (r = .72; Ardoin et al., 2004). More recently, Barth and colleagues (2012) reported that ORF correlations with external reading measures (e.g., Test of Sentence Reading Efficiency and Maze reading comprehension) were moderate to high (range r = .64-.68), with test-retest reliability of .89 or greater. The popularity of ORF among practitioners has drawn considerable scrutiny from researchers. When research for DIBELS, AIMSweb, and easyCBM is combined, there are more than 60 studies evaluating the validity, reliability, and accuracy of outcomes on ORF CBM (Goffreda & DiPerna, 2010; Yeo, 2009), nearly all of which confirm ORF as a "valid indicator of student performance on future comprehension assessments" (Deboy, 2013, p. 8).

Maze Reading Comprehension. Maze is a curriculum-based measure that assesses reading comprehension. Unlike ORF, which uses reading fluency as a correlate of comprehension and overall reading achievement, Maze more directly measures students' understanding of a text. Students read a grade level appropriate passage in which every seventh word has been replaced by three words in parentheses. Students must read the passage and circle the word in parentheses that best completes the sentence. Maze is timed at three minutes and may be group administered. Students receive a score indicating the number of words chosen correctly and the number of words marked incorrectly (Shinn & Shinn, 2002). There are several efforts among commercial and non-profit institutions to develop Maze assessments administered digitally online. Currently, all Maze assessments available for widespread use in schools must be hand scored. Like ORF, Maze is a common measure that is generally considered a moderately to highly reliable assessment of reading (Tolar et al., 2012). Moderate to high correlations have been reported with established measures of reading comprehension such as the Woodcock-Johnson Passage Comprehension subtest (r = .56) and the Iowa Test of Basic Skills Reading Comprehension (r = .70; Ardoin et al., 2004; Parker, Hasbrouck, & Tindal, 1992). Espin and colleagues confirmed these results for students in middle school, reporting validity coefficients of .70 or greater in predicting scaled scores on statewide assessments of reading achievement (Espin, Wallace, Lembke, Campbell, & Long, 2010). Maze provides a combined assessment of silent reading fluency and reading comprehension (Fuchs & Fuchs, 1992; Tolar et al., 2012).
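To make the scoring rules above concrete, the sketch below computes an ORF benchmark score as the median words read correctly across three one-minute passages and reports a Maze probe as counts of correct and incorrect selections, mirroring the descriptions above. The numbers are hypothetical, and publishers' exact scoring conventions may differ.

```python
from statistics import median

# Illustrative sketch of the ORF and Maze scoring rules described above.
# All numbers are hypothetical; vendor scoring conventions may differ.

def orf_benchmark_score(passages):
    """passages: list of (total_words_read, errors) from 1-minute readings.
    Words read correctly (WRC) = total words read minus errors; the
    benchmark score is the median WRC across the three passages."""
    wrc = [total - errors for total, errors in passages]
    return median(wrc)

def maze_score(correct: int, incorrect: int) -> dict:
    """Maze is reported as the number of correct and incorrect selections
    made within the 3-minute limit."""
    return {"correct": correct, "incorrect": incorrect}

print(orf_benchmark_score([(148, 6), (131, 4), (139, 9)]))  # 130
print(maze_score(correct=28, incorrect=3))                  # {'correct': 28, 'incorrect': 3}
```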
The measurement of comprehension as part of CBM screening assessments is appealing, particularly to teachers in upper elementary and middle school, as an alternative to oral reading fluency alone. As students move into upper grades, measuring fluency alone may be undesirable as the focus of instruction shifts from learning to read to reading to learn (Ardoin et al., 2004; Espin et al., 2010; Tolar et al., 2012). Prior research on Maze has focused largely on students at the elementary level. However, the evidence base has expanded in recent years to include students in middle school (Tolar et al., 2012). Most notably, a 2012 investigation by Tolar and colleagues explored the psychometric properties of Maze for students in grades 6 through 8, including test form variability and predictive validity. Investigators reported predictive validity coefficients consistent with previous research conducted at the elementary grades, ranging from .54 to .73. Results for test reliability were also consistent with previous research, with coefficients ranging from .73 to .91 (Tolar et al., 2012).

easyCBM Multiple Choice Reading Comprehension. easyCBM Multiple-Choice Reading Comprehension (MCRC) is an untimed assessment that measures reading comprehension on grade level appropriate text passages. After reading a passage, students answer factual, inferential, and analytical questions. For students in third through eighth grade, MCRC contains a total of 20 questions. Text passages remain available while students answer questions, and students may refer to the text as often as needed throughout the assessment. MCRC is designed to be group administered and may be given to an unlimited number of students simultaneously via online assessment modules or hardcopy formats (Riverside, 2012). Assessments completed using the online module are scored instantly; hardcopy assessments must be scored by hand. Scores are reported as the total number of questions answered correctly. Unlike other forms of curriculum-based measurement such as ORF and Maze, which assess basic skills that correlate highly with grade level content skills and knowledge, MCRC directly measures grade level reading comprehension skills. MCRC uses the most common format of reading comprehension assessment (Andreassen & Braten, 2010) and is similar to many other measures of reading comprehension, including many statewide assessments of reading throughout the country (Deboy, 2013). Reports on MCRC show technical adequacy in alternate form reliability, Rasch item analysis, and split half reliability (range .12-.63; Irvin, Alonzo, Lai, Park, & Tindal, 2012).

Early Warning Signs

Early Warning Signs (EWS) is the common name for a variety of data sources that have been found to correlate highly with high school dropout (Kennelly & Monrad, 2007). The National High School Center, Alliance for Excellent Education, National Middle School Association, and American Institutes for Research actively promote the use of EWS as a mechanism for identifying students for interventions and supplemental services to prevent dropout. Funded by the U.S. Department of Education's Office of Elementary and Secondary Education and Office of Special Education Programs, the National High School Center (NHSC; BetterHighSchools.org) produces free software (i.e., Early Warning Systems) designed to assist districts in compiling EWS data in order to identify students at risk of dropout.
Using course performance, attendance, and credit accrual, school personnel can use EWS to assess students' overall risk status and identify students with multiple at-risk factors (Heppen & Therriault, 2008). Though the majority of EWS attention has been devoted specifically to the high school level, identifying students in need of support as late as high school may not leave sufficient time to provide intervention services and get students on track for graduation. To give students and teachers more opportunities for remediation, many have recognized the need for identification of students at risk before high school (Balfanz, 2009; Balfanz, Herzog, & Mac Iver, 2007; Casillas et al., 2012; Neild, Balfanz, & Herzog, 2007; Pinkus, 2008).

In 2007, Neild, Balfanz, and Herzog examined commonly collected data from school districts in hopes of finding reliable predictors of which students would eventually drop out of school. Researchers examined grades, test scores, attendance, behavior reports, and special education status for a cohort of students from sixth grade through graduation. The team determined that over half of eventual dropouts were identifiable as middle school students. In one investigation, researchers found that any student in sixth grade who exhibited one or more of the following markers had a 75% chance of dropping out: (a) failure in mathematics, (b) failure in English, (c) an attendance rate below 80%, or (d) an "unsatisfactory mark in behavior" for at least one class. When researchers examined data from students in eighth grade, one course failure in English or mathematics or below 80% attendance once again signaled students with a 75% chance of dropout (Neild, Balfanz, & Herzog, 2007). In a follow-up study two years later, Balfanz and colleagues extended the exploration of early warning signs by replicating the research in five additional school districts. In all replications, the predictors used as early warning signs for students in eighth grade in the previous study now identified students with a 25% or lower chance of graduation. This follow-up study also added two points of consideration. First, students exhibiting warning signs in sixth grade who eventually dropped out of school typically did so during their junior year of high school. That means that students who showed warning signs in sixth grade remained in school for five additional years, which indicates that schools have a substantial window of opportunity to intervene and potentially prevent eventual dropout. Second, researchers explored the use of course failures in the form of Fs (grades below 60%) as well as near failures, or Ds (grades between 60% and 69.99%). Grades of F are generally considered failures for which students receive no credit, and passing courses is "key to earning the required credits to graduate" (Balfanz, 2009, p. 5). Grades of D are just above failure and thus considered passing. However, grades of D were predictive of Fs and are therefore a potential signal of dropout before failures actually develop. The inclusion of both Ds and Fs produced a model that more confidently captured all potential students at risk. However, the model also flagged a higher proportion of students for potential dropout than eventually dropped out. Many of these students would not have been incorrectly identified using Fs alone.
The authors contend that a higher risk of false positives is worth the additional security gained by reaching more students actually at risk for dropout (Balfanz, 2009).

Attendance. Attendance is an early warning sign with a direct connection to academic proficiency and successful graduation. Absences present a physical barrier to learning in the school environment. If students are not present for instruction, their access to content is limited to what they can manage independently, though this may be less of a concern in environments where online content is the primary mechanism of instruction. Attaining proficiency in course content can be challenging even when students are in attendance. Absence from school compounds existing learning difficulties or often causes learning difficulties due to lack of instruction. Allensworth and Easton (2005) cite absences as a strong predictor of failure for students in ninth grade, noting a strong negative relationship (r = -.51) between absences and grade point average. Even moderate rates of absence (5-10 days per semester) are associated with increased rates of dropout (Heppen & Therriault, 2008). Neild and Balfanz (2006) found the number of absences accrued in the first 30 days of high school to be one of the strongest predictors of eventual dropout when compared to other risk factors such as gender, race, age, and scores on standardized tests. Similarly, in a long-term study with Philadelphia Public Schools, researchers found that among students who eventually dropped out of school, 85% exhibited patterns of absences that developed in sixth grade and increased throughout the remainder of their careers in middle school (Neild, Balfanz, & Herzog, 2007).

Behavior. Like attendance, patterns of inappropriate behavior can be both a direct problem in and of themselves and a factor that exacerbates existing learning difficulties. Behaviors that detract from learning limit a student's ability to learn by focusing attention elsewhere and preventing engagement in instruction. Inappropriate behavior can also impede the learning of other students and the teacher's ability to teach effectively. Comorbidity of academic and behavior problems is well documented, spanning literature in dropout prevention, special education, behavior management, health, neuroscience, and other fields (Barry, Lyman, & Klinger, 2002; Hinshaw, 1992; McIntosh, Goodman, & Bohannon, 2010; Nigg, Hinshaw, Carte, & Treuting, 1998; Reinke, Herman, Petras, & Ialongo, 2008; Sullivan, Childs, & O'Connell, 2010; Tobin & Sugai, 2009). Given this connection between behavior and academic performance, schools can use behavior as a predictor of academic risk. This is typically done using data from school suspensions (Jerald, 2006). In an exploration of dropout predictors using grade point average, race, socioeconomic status, and suspensions, researchers showed that a single suspension increased the likelihood that a student would drop out by 77.5% (Suh & Suh, 2007). However, suspensions are only one way in which behavior data can serve as a predictor of academic risk. Office Discipline Referrals (ODRs) can also be used to identify potential academic problems. ODRs refer to the documentation of major and minor behavioral infractions such as fighting, harassment, and noncompliance. ODRs are a source of data that can be useful for identifying students with behavior difficulties and for informing the problem solving process (Sugai et al., 2000).
In 2009, Tobin and Sugai conducted a study that supports the use of discipline referrals as a method for predicting future academic and behavior problems. Researchers analyzed longitudinal data from the school records of 526 students to predict negative high school outcomes. Researchers created a model using data from sixth grade records to predict whether or not students would be on track for high school graduation in ninth grade. The model included discipline referrals for fighting, harassment, and nonviolent misbehavior, grade point average, contact with the juvenile justice system, and out-of-school suspensions. When controlling for variance related to gender, analysis showed that referrals were statistically significant predictors (r² = .65-.91, p < .001) of whether or not students were on track academically for graduation as early as ninth grade (Tobin & Sugai, 2009). In addition to the strong correlation between academics and behavior, ODRs have other properties that make these data useful for screening purposes. First, ODR is not synonymous with punishment. Tracking ODRs differs from suspensions in that ODRs are logged based on the occurrence of an event warranting documentation as opposed to an event that warrants temporary exclusion from the school environment. Students can also accumulate ODRs for minor infractions, meaning that schools can use ODRs to identify students that exhibit patterns of behavior not severe enough for suspension.

Course Failure. Failing grades are another key indicator of high school dropout. At the secondary level, course grades are the primary factor behind credit accrual, grade point average, and career and college readiness. Without passing grades, it is unlikely that students will accrue sufficient credits for graduation or reach the acceptable levels of academic proficiency required for post-secondary education or the workplace (Casillas et al., 2012). When compared to all other students, students that exhibit even a single course failure in ninth grade show increased rates of future course failure, decreasing test scores, and increases in behavior problems (Cohen & Smerdon, 2009). Even in middle school, course grades remain an important variable in monitoring students' long-term and short-term academic progress toward graduation. The Consortium on Chicago School Research reports that a single course failure is highly predictive of eventual dropout, noting that 83% of all students with no failed courses their freshman year will go on to graduate (Allensworth & Easton, 2005). When data on course grades are examined with other risk predictors, the reliability of correct prediction increases substantially. Casillas et al. (2012) found that course failure in eighth grade mathematics and English, along with attendance lower than 80%, identifies almost 80% of all future high school dropouts. Fortunately, schools that target students at risk for failure and provide additional academic supports are generally successful in lowering high school dropout (Cohen & Smerdon, 2009; Neild, Balfanz, & Herzog, 2007; Balfanz, 2009). When used as a source of data for screening students at risk of failure in reading, course failure is an important factor in identifying students in need of support and preventing high school dropout.
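The attendance, behavior, and course-grade indicators reviewed above are often combined into a single composite risk flag. The sketch below shows one way such an EWS flag might be computed from a student record. The attendance (below 80%) and course-failure thresholds echo the studies cited above, but the field names, the ODR threshold, and the rule of flagging on any single indicator are illustrative assumptions rather than a validated early warning system.

```python
from dataclasses import dataclass

# Illustrative composite Early Warning Signs (EWS) flag. Thresholds echo the
# research summarized above (attendance below 80%, any failing grade, with Ds
# optionally counted as failures); the ODR threshold and the "any indicator"
# rule are assumptions for illustration only.

@dataclass
class StudentRecord:
    attendance_rate: float   # proportion of enrolled days attended (0.0-1.0)
    final_grades: list       # e.g., ["A", "C", "D", "F"]
    odr_count: int           # office discipline referrals in the prior year

def ews_flag(record: StudentRecord, count_d_as_failure: bool = True,
             odr_threshold: int = 2) -> bool:
    """Flag a student when any single indicator crosses its threshold."""
    failing = {"D", "F"} if count_d_as_failure else {"F"}
    low_attendance = record.attendance_rate < 0.80
    course_failure = any(grade in failing for grade in record.final_grades)
    frequent_odrs = record.odr_count >= odr_threshold
    return low_attendance or course_failure or frequent_odrs

# Example: passing grades and no referrals, but chronic absence still triggers the flag.
print(ews_flag(StudentRecord(0.76, ["B", "C", "C", "A"], 0)))  # True
```

Consistent with the Balfanz (2009) findings discussed above, counting Ds as failures widens the pool of flagged students and raises the false positive rate, so a switch like count_d_as_failure represents exactly the trade-off a school team would need to weigh.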
Statewide Reading Assessments. Since 2002, schools have been required to meet federal school assessment and accountability standards under No Child Left Behind (NCLB), PL 107-110, or file for a federal waiver with a state and federally approved school improvement plan. States must assess students in third through eighth grade in math and reading every year. Schoolwide assessment results must be publicly available, with student level results made freely available to parents. Assessment results are then used to determine to what extent schools have been successful in teaching grade level content standards to all students. As the primary driver of school accountability, these results weigh heavily into a school's overall measure of effectiveness. Schools that do not reach Adequate Yearly Progress (AYP) may incur sanctions that include loss of state funding, state-imposed restructuring or reconfiguration, and building closure (Hunley, Davies, & Miller, 2013; McGlinchey & Hixson, 2004; Stage & Jacobsen, 2004; Yeo, 2009).

Research exploring the reliability and validity of CBM has a well-established history of using state assessment data as the dependent variable (Baker et al., 2014; Deboy, 2013; Denton et al., 2011; Tindal, Nese, & Alonzo; Yeo, 2009; Wiley & Deno, 2005). As of 2014, there were more than 30 studies in oral reading fluency alone that used state administered reading assessments from 15 different states as the dependent variable in analysis (Kilgus, Methe, Maggin, & Tomasula, 2014; Yeo, 2009). However, what is unique about statewide reading achievement tests is that they can be considered both a measure of outcomes and a measure of prediction. Statewide achievement tests are often considered high stakes given the potential sanctions levied on schools that fail to reach achievement goals. Many school districts now include statewide assessment results as part of the teacher evaluation process and the identification of students for intervention, and even as replacements for scientifically validated reading diagnostics (Casillas et al., 2012; Smartt & Reschly, 2007). It is also common to have annual IEP goals for students with disabilities tied to results on statewide tests (Katsiyannis, Zhang, Ryan, & Jones, 2007). In this sense, state mandated reading achievement tests are one of the outcomes on which schools are trying to effect positive change. Conversely, scores on state achievement tests can be a predictor of future academic performance. In fact, the psychometric procedures used in determining benchmark cut scores (i.e., logistic regression, signal detection theory, and equipercentile matching) attempt to define with some certainty how students are likely to score on future tests given their current performance. This essentially means cut scores are set such that current performance relative to the cut score predicts future performance. If, by design, state test scores predict future test scores, schools may be able to use scores on state assessments as a screening method for identifying students who are not likely to reach grade level standards without some type of intervention or supplemental service. Therefore, prior performance on state tests may serve the same function as CBM and EWS. Reed, Wexler, and Vaughn (2012) argue that state assessments alone may be an efficient screening tool for predicting which students are currently at risk of not reaching grade level benchmarks and of eventual dropout.
Since state assessments are mandated, schools are already required to allocate resources to comply with federal and state accountability law. Using state test data as a screening measure might then be a very efficient use of data from a test students are already required to take. State assessments of reading are administered annually and generally include criterion-referenced benchmarks of performance that may approximate the results of a dedicated screening assessment. Many state assessments also report a confidence interval or range of performance that can be considered in order to widen the potential pool of students that may need intervention (Reed, Wexler, & Vaughn, 2012). Prior research also indicates that past performance on state assessments is a good predictor of future performance on state assessments. In an analysis of ten different measures of reading, Denton and colleagues (2011) reported that the best predictor of 2007 Texas reading comprehension accountability test (TAKS) scores was 2006 TAKS scores (AUC = .82). Additionally, 44% of the variance on the TAKS was explained by TAKS scores from the previous year.

Recent Research in Middle School Screening

By comparison, far fewer studies have examined the use of CBM and other methods for risk identification (i.e., EWS) at the middle school level than exist at the elementary and high school levels. While CBM for the early grades were developed with specific attention paid to key skills in early literacy, they may not be appropriate for adolescent learners. As students move from learning basic skills to more complex grade level content, middle schools may require CBM that address this shift. That is not to say that reading CBM developed for use at the elementary level are inherently ill-suited for use at the secondary level (Baker et al., 2014; Yeo, 2009). On the contrary, skills that continue to develop throughout a student's scholastic career, such as oral reading fluency and basic reading comprehension, may in fact be useful measures for universal screening (Helwig, Anderson, & Tindal, 2002). Despite a popular belief that the relationship between fluency and grade level reading comprehension diminishes as students age (Baker et al., 2014), Yeo's 2009 meta-analysis of oral reading fluency CBM found no such relationship. At the time, however, there were only two published studies that included students beyond fifth grade; the overwhelming majority of data represented students in grades three and four. A more recent study by Denton et al. (2011) found that correlations between oral reading fluency and high stakes tests (r = .69-.76) for students in grades 6 through 8 were comparable to correlations typically found at the elementary level. Fortunately, research is now expanding to include more data from the secondary grades and address these questions more thoroughly. Baker and colleagues (2014) recently explored the diagnostic efficiency and criterion validity of CBM measures of oral reading fluency, word reading accuracy, and multiple choice reading comprehension. Researchers collected data from 2,943 students in grades 7 and 8. Using state assessment results in reading as the outcome measure, researchers determined that oral reading fluency and reading comprehension combined led to more accurate diagnosis of reading failure than either measure alone. The combination of ORF and reading comprehension also explained the greatest amount of variance in outcomes compared to all other tested combinations of predictor variables.
Word reading accuracy provided little additional value to the predictive model. Similarly, Stevenson (in press) examined the accuracy and reliability of oral reading fluency, Maze reading comprehension, and easyCBM multiple choice reading comprehension (MCRC) in predicting outcomes on state assessments. Using data from 494 students in grades 7 and 8, researchers examined which combination of measures most accurately predicted outcomes on state reading tests in Michigan. Results indicate that, for students in seventh grade, a combination of easyCBM MCRC and oral reading fluency provided the highest significant level of classification accuracy (77%) and accounted for greater variance (53.2%) than any other linear combination of measures tested. Results were similar for students in eighth grade, with easyCBM MCRC and oral reading fluency together producing a classification accuracy of 82.3% with variance explained at 60.7%. Results indicate the combination of MCRC and oral reading fluency provided the best overall predictor of reading proficiency among the combinations tested.

Research Questions

It is clear from the extant literature that there are a number of predictors in middle school, from both new assessment data (CBM) and extant records (EWS), that correlate with later success and/or failure in school. What is unclear, however, is which of these approaches, CBM or EWS (and the individual predictors therein), is most suitable for use in middle school. To determine the most effective screening strategy, there are many variables to consider, including cost, time devoted to collecting the data, the personnel and training required, logistical considerations, and the accuracy of screening procedures in correctly identifying which students do or do not require additional supports to reach academic standards and eventually graduate. The following investigation explores this question of accuracy in screening, specifically targeting reading proficiency and outcomes on high-stakes assessments. The current study is a direct comparison of CBM and EWS approaches for identification of students at risk of failure in reading in grades seven and eight. The primary research questions for this investigation are:

1. How do curriculum-based measures in reading compare to early warning signs data in predicting outcomes on high stakes tests of reading at the middle school level?

2. What is the value added in combining curriculum-based measures (CBM) of oral reading fluency, Maze reading comprehension, and easyCBM multiple choice reading comprehension as predictors of outcomes on high stakes reading assessments at the middle school level?

3. What is the value added in combining Early Warning Signs (EWS) data on attendance, office discipline referrals, failing grades, and prior performance on statewide assessments as predictors of outcomes on high stakes reading assessments at the middle school level?

CHAPTER 3: Methods

Participants and Setting

All data for this investigation were collected from a suburban middle school in the Midwest. The school serves 434 students in grades seven (n = 197) and eight (n = 237). Enrollment is 48.8% male and 51.2% female. By race/ethnicity, 4.7% of students identified as Asian, 24.1% as African-American, 36.8% as Caucasian, 19.6% as Hispanic, and 14.6% as two or more races. Less than 1% of students were categorized as other or unreported.
Additional demographic reporting includes 64.3% of students eligible for free or reduced lunch (FRL), 2.2% of students qualified as English Language Learners, and 15.7% of students enrolled in special education. The site for this investigation was chosen based on the capacity of the school and district to provide the appropriate data for the questions of interest. The researcher for this investigation formerly served as the Data Coach for the participating school district. Though the researcher participated in the administration of many of the assessments included in the dataset, the researcher was not directly involved in compiling the dataset used for the current investigation.

Data Collection

Data for this study were gathered exclusively from archival sources. School staff supplied the researcher with the dataset electronically. Prior to delivering the dataset to researchers, school staff removed all students' identifying information, including names, birth dates, local school identification numbers, and state identification codes. All assessment scores and demographic information included in this study were collected as part of the participating school's regular operating procedures. Assessment scores were collected as part of the typical battery of assessments given to all students in the fall of each academic year. Since all data were collected as part of regular educational practice and did not contain identifying information, the study posed no risk for students and did not fit the definition of human subjects research. Therefore, the current study was determined exempt from review by the Institutional Review Board.

Data supplied to the researcher included demographic information for grade level, gender, race, eligibility for free or reduced price lunch, disability status, disability classification, and status as English Language Learners. District staff also provided data for attendance, office discipline referrals, and course failures for the 2013-2014 school year. Assessment data were provided for three universal screening assessments administered in September 2013, including oral reading fluency, Maze reading comprehension, and easyCBM Multiple Choice Reading Comprehension. The participating school routinely administers oral reading fluency, Maze, and easyCBM assessments three times each year. Statewide assessment data for the Michigan Education Assessment Program (MEAP) reading subtest were provided for the fall 2012 and fall 2013 testing periods. MEAP is administered annually to all public and charter schools in the state of Michigan. Local district staff administer the MEAP in each school. Data from the MEAP reading subtest included raw score, scaled score, and proficiency level. Data from the ACT Explore® reading subtest included raw score and scaled score. ACT Explore® is administered to students in the spring of the eighth grade year and is included as part of the district's assessment of career and college readiness. The district's sequence of career and college readiness assessments includes ACT Explore® (eighth grade), ACT Plan® (tenth grade), and the ACT® (eleventh grade). Detailed purpose and rationale for the inclusion of each of these assessments is included below.

All students enrolled during the assessment periods participated in all assessments. Missing scores in the dataset were due to changes in school enrollment for individual students.
Any student with a missing score in the dataset either enrolled in the school after the conclusion of benchmark testing or was provided an alternative statewide reading assessment in compliance with state testing guidelines and the student's Individualized Education Plan (IEP). The participating school provides formal training to all staff involved in the direct administration of CBM and state assessments. All teachers participate in the administration of state assessments. All teachers also participate in the administration and scoring of at least one type of CBM; not all teachers participate in the administration of all CBM assessments, though everyone is required to administer at least one measure. The district also provides direct assistance to teachers during the testing periods as part of the training protocols. Though the participating school provided training for each individual responsible for the administration of CBM and state assessments, at the time of this investigation the school did not systematically collect fidelity of implementation data for adherence to testing procedures. Therefore, it was not possible to determine the fidelity of testing procedures.

Reading Screening Measures

The participating school conducts universal screening in reading and math three times each year, in September, January, and May. Each screening period (also known as a benchmark testing period) includes a two-week window of opportunity for initial testing and an additional week for make-up testing. All reading screening assessment data for the current investigation were from the fall screening period. For the purposes of this investigation, the fall administration of universal screening measures in reading was the period that immediately preceded administration of the state reading tests. Approximately three weeks separated the end of the benchmark testing period and the start of MEAP administration.

easyCBM Multiple-Choice Reading Comprehension. All assessments of Multiple Choice Reading Comprehension were conducted using the easyCBM MCRC assessment (Riverside, 2012). Group administration of MCRC was done electronically using iPads. Each teacher who participated in the administration of MCRC received training in testing protocols and procedures from the district-level easyCBM system administrator. Prior to each benchmark testing period, each participating teacher received a brief refresher training in the administration procedures. All students took the MCRC during the predetermined benchmark assessment period for fall 2013. easyCBM MCRC consists of 20 multiple choice reading comprehension questions. Each question was worth one point. Scores for MCRC were recorded in the dataset as raw scores.

Reading-Curriculum Based Measure. Reading-Curriculum Based Measure (R-CBM) is a measure of oral reading fluency. All R-CBM assessments were administered using the AIMSweb browser-based scoring system. R-CBM is given in a 1-on-1 setting. The participating school uses a team-based approach to administering benchmark assessments. A group of five teachers, paraprofessionals, and instructional coaches conducted all administrations of R-CBM benchmark testing. All school staff participating in benchmark assessments receive ongoing training in the administration and scoring of reading measures within the AIMSweb system. During benchmark administrations of R-CBM, each student reads three grade level passages timed at one minute each. The median score was then recorded.
Median scores were reported to the researcher in the form of (a) the number of words read correctly per minute, (b) the number of errors, and (c) the percentage of words read correctly.

Maze Reading Comprehension. Maze was group administered in hardcopy format during language arts classes. All language arts teachers previously received training in the administration and scoring of Maze assessments. All language arts teachers also received a brief refresher in training protocols no more than one week prior to the start of the benchmark testing period. All Maze assessments were scored by the same team responsible for the administration of R-CBM assessments. Data were provided to the researcher for the total words marked correctly, total errors, and percentage accuracy.

Early Warning Signs

Attendance. Attendance data for this investigation were gathered directly from the participating school's student information system. Absence data from the 2012-2013 school year were provided for all students. Per school policy, all teachers are required to record student absences and late arrivals at the beginning of each of seven periods throughout the day. According to school policy, late arrivals less than ten minutes late are coded as "T" for tardy. Late arrivals greater than 10 minutes late but less than 20 minutes late are coded as "Y," indicating a severe tardy. Any late arrival greater than 20 minutes is coded as an absence. Absence codes are assigned to each absence according to the circumstances behind the absence. Attendance codes included A = Absent-Unexcused, I = In-School Suspension, S = Out-of-School Suspension, U = Unexcused Absence, and V = School Activity. For the purposes of this investigation all absence codes were collapsed to a single code. All absence codes were aggregated to yield the total number of absences accumulated in the most recent school year. An absence of any type indicates that the student was not present for instruction. Absence from instruction is a significant predictor of academic failure (Balfanz, 2005).

Office Discipline Referrals. Office discipline referral (ODR) is the term associated with all major and minor disciplinary incidents that occur within the school environment. School staff track ODRs for all students using the School-Wide Information System (SWIS), a digital online behavior tracking system developed at Educational and Community Supports (ECS) at the University of Oregon (pbisapps.org). At the participating school, all instances of lunch detention, after-school detention, Saturday school, in-school suspension, out-of-school suspension, expulsion, and referral to the school's time-out room are logged as ODRs. Data on location, date, time of day, referring staff member, the behavior warranting the referral, and the likely function of the behavior are logged with each ODR. To preserve confidentiality, only the total number of ODRs for each student was provided to the researcher.

Failing Grades. Data for course failures were collected from the school's electronic student information system. All final course grades from the prior school year were provided to the researcher as letter grades (A, B, C, D, and F). The total number of course failures was calculated for each student. Though the participating school considered a grade of D as passing, the inclusion of D as a failure in screening procedures decreases the likelihood of misidentifying risk status for students who may be near failure but are technically passing courses.
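The aggregation of these extant variables into student-level totals is computationally simple. The sketch below shows one way such records might be collapsed, assuming a hypothetical long-format export from a student information system; the column names and the pandas-based approach are illustrative assumptions, not the participating district's actual data format:

```python
import pandas as pd

# Hypothetical long-format extract: one row per absence event, ODR, or final course grade.
absences = pd.DataFrame({"student_id": [1, 1, 2], "code": ["A", "S", "V"]})
odrs = pd.DataFrame({"student_id": [1, 2, 2, 2]})
grades = pd.DataFrame({"student_id": [1, 1, 2, 2],
                       "final_grade": ["B", "D", "F", "C"]})

# Collapse all absence codes to a single total per student.
total_absences = absences.groupby("student_id").size().rename("absences")

# Total ODRs per student (only totals were shared with the researcher).
total_odrs = odrs.groupby("student_id").size().rename("odrs")

# Course failures: both D and F grades are counted as failures for screening purposes.
failures = (grades.assign(fail=grades["final_grade"].isin(["D", "F"]))
                  .groupby("student_id")["fail"].sum().rename("course_failures"))

ews = pd.concat([total_absences, total_odrs, failures], axis=1).fillna(0).astype(int)
print(ews)
```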
Inclusion of D grades as failures is also consistent with research and policy recommendations for early warning signs (Balfanz, 2009; Balfanz, Herzog, & Mac Iver, 2007).

Reading Outcome Measures

Michigan Education Assessment Program. The Michigan Education Assessment Program (MEAP) was the statewide system of assessments for academic progress for the State of Michigan. MEAP annually assessed grade level content in reading, writing, math, science, and social studies in the fall of each academic year for third through ninth grades. Fall administration of MEAP occurred approximately two weeks after the close of the benchmark screening period for universal screening. The reading subtest of the MEAP assessment was selected as the primary dependent variable in this investigation for several reasons. First, the cut scores used to determine the difference between proficient and not proficient were revised in 2011 to become non-arbitrary points. Cut scores are backward mapped for each grade level beginning at a cut point for students at eleventh grade with a 50% probability of earning a B or better in their first college course. All other cut scores are determined by a 50% likelihood of reaching proficiency on the following year's MEAP test (State of Michigan). Second, as the assessment used for school accountability under federal requirements, MEAP is indeed a high stakes test. Third, as the statewide reading assessment, MEAP is aligned with the state adopted standards for reading at each grade level. Fourth, all students in third through ninth grade were required to take MEAP unless the student had an active IEP and the student's "Individualized Education Plan (IEP) teams determined it was not appropriate for them to participate in the state's general education assessments" (Michigan Department of Education [MDE], Statewide Assessment Selection, 2012, p. H-3). One student in the dataset was marked by the school as eligible for an alternative assessment and did not take MEAP Reading. School staff indicated that for all students with disabilities who qualified to take MEAP, all testing accommodations were provided in accordance with each student's IEP requirements. Data on which students received which accommodations were not provided to the researcher.

Results for MEAP Reading were provided to the researcher in the form of scaled scores and proficiency levels. Proficiency levels were determined by the Michigan Department of Education, Bureau of Assessment and Accountability (BAA). Scale scores for each student were organized according to predetermined performance criteria. Performance levels include: 1 = advanced, 2 = proficient, 3 = partially proficient, and 4 = not proficient. Only students scoring advanced or proficient are counted as proficient under Michigan state accountability criteria. Technical documentation for MEAP reports statewide classification accuracy in reading of 73.9% for seventh grade and 74.7% for eighth grade. Empirical Item Response Theory (IRT) statistics for reliability range from α = .80-.82 for seventh and eighth grade (Michigan Department of Education, 2012). For the 2014-2015 school year, MEAP was discontinued by the Michigan Department of Education. As of spring 2015 the statewide assessment of reading achievement and accountability is the Michigan Student Test of Educational Progress (M-STEP). M-STEP is administered in the spring of each school year.

ACT Explore® Reading. ACT Explore® is a nationally administered assessment of academic achievement.
ACT Explore® is the middle school companion of the ACT®, a commonly used college entrance exam. ACT Explore® was selected as a secondary outcome measure for this investigation for several reasons. First, the purpose of ACT® assessments is to assess students’ readiness for college. Scaled scores for ACT Explore® are aligned to the outcomes of ACT Plan given in tenth grade and the regular ACT® taken for college entrance requirements in the eleventh grade. That means that a score of 16 on the ACT Explore® is equivalent to a score of 16 on the college entrance ACT®. This is advantageous in that students’ current state of college readiness in eighth grade can be assessed in a way that is directly comparable to the test students take for their actual college entrance. This is a unique feature of ACT® that makes the assessment of actual college readiness salient as early as eighth grade. Inclusion of ACT 31 Explore® as a dependent variable in this investigation is therefore helpful in evaluating the EWS and CBM prediction as they relate specifically to college and career readiness. Second, MEAP is a test of proficiency on state adopted skills and standards. ACT Explore® is a nationally administered test that is not directly tied to a single state’s adoption of standards. Instead, ACT Explore® assesses literacy and math skills more broadly since it must be configured to cover standards across multiple states. This, analysis of ACT Explore® provides a unique opportunity to readily compare the results of the prediction models under investigation with both state level and national level data. ACT Explore® is administered to students in the spring of eighth grade. Scores for ACT Explore® were reported as raw scores and scaled scores. The proficiency cut points for career and college readiness used for scaled scores on ACT Explore® were determined by ACT® (ACT, 2013). As of June 2014 ACT Explore® is no longer available. ACT Explore® has been replaced by ACT Aspire™ beginning spring 2015. Data Analysis Data Screening. Prior to analysis, all assessment data were examined for outliers, normality in distribution, skewness and kurtosis for each dependent measure. All statistical analyses were conducted independently for each grade level, outcome measure, and method of analysis. The dependent variables were MEAP Reading and ACT Explore®. In all analyses, cases with missing data were excluded pairwise. Analyses were conducted using hierarchical logistic regression and hierarchical linear regression. Binary Analysis. The chief function of any universal screening assessment is to determine if a student is at-risk or not at-risk for academic problems. Therefore it is necessary to analyze how accurately screening procedures correctly predict categorization of student as at-risk 32 or not at-risk for not achieving grade level standards of performance. The current study uses EWS and CBM models to predict risk categorization in reading. Each assessment of risk predicted by each model is compared to the actual outcomes of students’ reading proficiency on high stakes reading tests. Since the primary outcome of consideration is the accuracy in predicting proficiency on high stakes assessments, the outcome of greatest interest is dichotomous (proficient or not proficient). For hypothesis testing of categorical and continuous predictors on dichotomous outcomes logistic regression is the most appropriate method of analysis (Peng, Lee, & Ingersoll, 2001). 
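For reference, the general form of such a model, written here with generic predictors rather than the specific variables of this study, expresses the log-odds of the dichotomous outcome as a linear combination of the predictors entered through each block:

$$\ln\left(\frac{P(Y = 1)}{1 - P(Y = 1)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

Each block entry adds terms to the right-hand side, and the contribution of a block can be tested against the model from the preceding block.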
Scores for each of the dependent variables (MEAP reading and ACT Explore® reading) were collapsed into binary outcomes. MEAP proficiency levels (1) advanced and (2) proficient were combined into a single category of (1) proficient. MEAP proficiency levels (3) partially proficient and (4) not proficient were combined into the category of (0) not proficient. ACT Explore® scaled scores were categorized based on the college readiness benchmark cut scores provided by ACT® (ACT, 2013). Scaled scores greater than or equal to 16 were recoded as (1) on-track for college readiness. Scaled scores less than 16 were recoded as (0) not on-track for college readiness.

Hierarchical logistic regression was used to assess the accuracy and value added of CBM and EWS variables, individually and in combination, in predicting outcomes on MEAP and ACT Explore®. The enter method was used to add predictor variables separately for each block of the prediction models. In order to control for variance due to demographic variables, ethnicity, gender, and qualification for free or reduced lunch (FRL) were entered in block one of each analysis. The inclusion of demographic variables in subsequent blocks of the regression models was contingent upon statistical significance. Significant (β-value, p < .05) demographic variables were retained in each subsequent block. Non-significant variables (β-value, p > .05) were removed from the model.

Order for block entry of CBM variables was based on the results of prior CBM research for students in middle school (Baker et al., 2014; Stevenson, in press). Variables were entered in order of the strength of their predictive relationship with the outcome measures, with the strongest predictors entered first and the weakest predictors entered last. Block two includes easyCBM MCRC. Block three includes R-CBM. Block four includes Maze reading comprehension. The CBM prediction model was run independently for MEAP and ACT Explore® at each grade level.

A separate prediction model was created and tested for EWS variables using logistic regression. Block entry of variables for early warning signs data is based on variance explained, the likelihood of correctly predicting school dropout from extant data (Balfanz, 2009; Balfanz, Herzog, & Mac Iver, 2007; Heppen & Therriault, 2008; Jerald, 2006; Pinkus, 2008), and the use of past scores on state assessments to predict future scores on state tests (Casillas et al., 2012; Denton et al., 2011, 2012; Reed, Wexler, & Vaughn, 2009). Demographic variables were entered into the first block of the prediction model as described above. Block two includes the most recent MEAP reading scale score (MEAP-Y1). Block three includes the total number of office discipline referrals for the prior school year (Tobin & Sugai, 1999). Block four includes attendance in the form of the total number of period absences from the prior school year. Block five includes the total number of course failures. Course failures include grades of both D and F, commensurate with the findings of Balfanz and colleagues (2007). The EWS prediction model was run independently for MEAP and ACT Explore® at each grade level.

Linear Analysis. Hierarchical linear regression was also used to examine the relationship between combinations of EWS and CBM predictor variables and outcomes on MEAP reading and ACT Explore® reading. Scaled scores for MEAP and ACT Explore® were used as the dependent measures. Analyses were conducted independently for combinations of EWS predictors and combinations of CBM predictors.
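A condensed sketch of the recoding and block-entry procedure described above is provided below. The data are simulated solely so the sketch runs end to end, and the column names and statsmodels-based code are illustrative assumptions rather than a record of the analyses actually conducted:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated, de-identified records used only so the sketch runs; column names are illustrative.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "frl": rng.integers(0, 2, n),             # free/reduced lunch indicator
    "meap_y1": rng.normal(720, 30, n),        # prior-year MEAP reading scale score
    "odrs": rng.poisson(1.0, n),              # office discipline referrals
    "absences": rng.poisson(8, n),            # total period absences
    "course_failures": rng.poisson(0.5, n),   # D and F grades combined
    "meap_level": rng.integers(1, 5, n),      # 1 = advanced ... 4 = not proficient
})

# Collapse the four MEAP performance levels into a binary outcome:
# advanced/proficient -> 1, partially proficient/not proficient -> 0.
df["meap_proficient"] = (df["meap_level"] <= 2).astype(int)

# Enter EWS predictors block by block and track model fit after each entry.
blocks = [["frl"], ["meap_y1"], ["odrs"], ["absences"], ["course_failures"]]
predictors, prev_llf = [], None
for i, block in enumerate(blocks, start=1):
    predictors += block
    X = sm.add_constant(df[predictors])
    fit = sm.Logit(df["meap_proficient"], X).fit(disp=0)
    # The change in -2 log-likelihood is the chi-squared test for the added block.
    block_chi2 = None if prev_llf is None else 2 * (fit.llf - prev_llf)
    prev_llf = fit.llf
    msg = f"Block {i}: McFadden pseudo-R2 = {fit.prsquared:.3f}"
    if block_chi2 is not None:
        msg += f", block chi-squared = {block_chi2:.2f}"
    print(msg)
```

Note that statsmodels reports McFadden's pseudo-R2 by default; the Nagelkerke estimate reported in this investigation is a different rescaling of the same likelihood-based quantity.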
Block entries of variables for each hierarchical linear regression model were entered in the same order as in the logistic models described above. Linear analyses were run separately for each grade level. 35 CHAPTER 4: Results Data Screening A summary of descriptive statistics and tests for normality are included in Table 1. Data for ACT Explore®, MEAP Reading, R-CBM, easyCBM, MCRC, and Maze were within ±1.5 for skewness and kurtosis (Tabachnick & Fidell, 2013). Examination of Q-Q plots and histograms indicate that the data were normally distributed, therefore no data transformations were performed (George & Mallery, 2010; Ghasemi & Zahediasl, 2012). However, seventh and eighth grade scores for easyCBM were significant on the Shapiro-Wilk test indicating a potential nonnormal distribution (eighth grade = .951, p < .001; seventh grade = .928, p < .001). MEAP scores for both grades also showed significant test statistics on Shapiro-Wilk (eighth grade = .955, p < .001; seventh = .986, p = .020). Some indication of non-normality was anticipated given that the school-wide percent proficient for MEAP reading was below the state average for both seventh (11.7%) and eighth (-6.5%) grade. However, as normality in distribution is not an assumption that must be met for logistic regression, no transformations of the data were necessary (Peng, Lee, & Ingersoll, 2002). All other measures of reading were within recommended tolerances for normality;thus it is unlikely that the potential for non-normal distribution of data would compromise the following statistical analysis. Demographic Variables In each of the screening models under investigation the demographic variables of gender, socio-economic status (SES; as measured by students qualifying under federal guidelines for Free or Reduce Lunch), and ethnicity were included in the first entry of variables into each model. Achievement gaps among these subgroups are well documented in the literature (Hernandez, 2011; O’Connor & Fernandez, 2006; O’Connor, Hill, & Robinson, 2009). In 36 particular, students from low-SES households, Hispanic students, and African-American students typically are disproportionately identified at-risk and have fewer students identified as gifted and talented (O’Connor & Fernandez, 2006; O’Connor, Hill, & Robinson, 2009; Hernandez, 2011). According to Hernandez (2011) 22% of all students that live in poverty at any time throughout their childhood do not graduate from high school. Data from the National Assessment of Educational Progress (NAEP) from 1992 to 2009 showed an average gap of 25 point gap in reading achievement between White and Hispanic students in eighth grade (Hemphill, Vanneman, & Rahman, 2011). Results are similar for the gap in reading achievement between White and Black students. NAEP scores from 1992 to 2007 showed a 7-point deference between White and Black students with no statistically significant change between 1998 and 2007 (Vanneman, Hamilton, Anderson, & Rahman, 2009). While this investigation is not specifically concerned with the associations between demographic subgroups and student achievement, it would be a mistake to ignore variables that are known to be associated with increased risk for failure and dropout. In order to appropriately evaluate the prediction models of interest, demographic variables were included in each prediction model in order to control for these effects. 
In subsequent entries of variables for each model, demographic variables that returned non-significant statistics for variance explained were removed from each model beginning in block two. Binary Analysis Early Warning Signs. Block entry of EWS variables in the prediction model were entered in the following order: demographic variables, prior scores on state reading assessments (MEAP-Y1), ODRs, attendance, and course failure. Model fit was assessed for each block entry of variables. Test statistics for model fit and variance explained for outcomes on MEAP reading 37 are summarized in Table 2 (seventh grade) and Table 3 (eighth grade). Chi-squared and HosmerLemeshow (H-L) tests were preformed for goodness of fit. For seventh grade EWS predicting MEAP, H-L tests for all five blocks were nonsignificant indicating that the models were not poorly fit to the data (see Table 2). Omnibus tests showed significance in block one (FRL), χ2 (1, N = 158) = 27.56, p < .001 and block two (MEAP-Y1) χ2 (1, N = 182) = 78.882, p < .001. While the overall model itself retained significance in each block, the addition of predictors in block three (ODRs) χ2 (1, N = 182) = 1.65, p = .198, block four (Attendance) χ2 (1, N = 158) = 3.00, p = .083, and block five (Course Failure) χ2 (1, N = 182) = .728, p < .001 were all non-significant. Using the Nagelkerke R estimate for R2 21.4% of variance was explained in block one. Block two showed 52.4% of variance explained. The addition of variables in blocks three, four, and five showed no significant increase in variance explained. Classification results for seventh grade EWS to predict MEAP reading showed an accuracy rate of 60.1% for FRL alone in block one (see Table 4). FRL was retained in subsequent block entries. The addition of MEAP-Y1 (prior performance on MEAP reading) in block two increased classification accuracy to 74.7%. Adding ODRs in block three increased classification accuracy to 75.9%. The addition of attendance in block four produced no increase or decrease in classification accuracy. In block five the addition of course failure produced a decrease in classification accuracy down to 74.1%. For eighth grade EWS predicting MEAP, H-L tests in all block entries were nonsignificant (see Table 3). Omnibus tests showed significance in blocks one, χ2 (1, N = 222) = 13.16, p < .001 and two χ2 (2, N = 221) = 123.25, p < .001. As with seventh grade, the addition of ODRs, attendance, and course failure into the model in subsequent blocks produced an overall model that was significant even though the addition of each variable was non-significant. 38 Variance explained in block one was 7.9%. The addition of MEAP-Y1 in block two showed a significant increase in variance explained to 58.5%. Additional variables in blocks three, four, and five produced no statistically significant increase in variance explained. Classification results for 8th grade EWS to predict MEAP reading, showed a larger than expected accuracy for FRL at 64.4% in block one alone (see Table 5). The addition of MEAP-Y1 in block two increased classification accuracy to 84.7%. The addition of ODRs, attendance, and course failure in subsequent blocks produced exactly 0% increase in classification accuracy. Logistic regression of EWS predictors for eighth grade was also done using ACT Explore® reading as the dependent variable. 
As a measure designed specifically to address career and college readiness, ACT Explore® provides external validity for the prediction model and provides a look at how the prediction models function in terms of college readiness as opposed to proficiency on state adopted standards. Results of EWS analysis with ACT Explore® as the dependent measure are consistent with the results of EWS analysis with MEAP reading as the dependent measure. Analyses show similarities in classification accuracy and variance explained as well as in the significance of each EWS predictor in each block entry. Model fit tests for EWS predicting ACT Explore® in all block entries were non-significant (see Table 6), indicating that the models in each block were not poorly fit to the data. Omnibus tests showed significance in blocks one, χ2 (1, N = 207) = 19.40, p = .007, and two, χ2 (2, N = 207) = 86.929, p < .001. The addition of ODRs, attendance, and course failure into the model in subsequent blocks produced an overall model that was significant though the addition of each variable was non-significant. Variance explained in block one was 12.8%. The addition of MEAP-Y1 in block two showed a significant increase in variance explained to 49.1%. Additional variables in blocks three, four, and five produced no statistically significant increase in variance explained. Classification results for eighth grade EWS to predict ACT Explore® showed accuracy for FRL of 70.9% in block one alone (see Table 7). The addition of MEAP-Y1 in block two increased classification accuracy to 79.6%. Adding ODRs, attendance, and course failure in subsequent blocks produced an increase in classification accuracy to 81.6%, though this increase was non-significant.

Curriculum-Based Measures. CBM variables were entered into the prediction model in the following order: demographic variables, easyCBM MCRC, R-CBM, and Maze. Model fit was assessed for each block entry of variables. Test statistics for model fit and variance explained for outcomes on MEAP reading are summarized in Table 8 (seventh grade) and Table 9 (eighth grade). Chi-squared and Hosmer-Lemeshow (H-L) tests were performed for goodness of fit.

For seventh grade CBM predicting MEAP reading, H-L tests for all blocks were non-significant, indicating that the models were not poorly fit (see Table 8). Omnibus tests showed significance for the overall model in block one (FRL + ethnicity), χ2 (1, N = 182) = 15.083, p = .002, block two (easyCBM), χ2 (2, N = 182) = 40.018, p < .001, block three (R-CBM), χ2 (3, N = 182) = 63.374, p < .001, and block four (Maze), χ2 (4, N = 182) = 64.528, p < .001. The addition of easyCBM in block two, χ2 (2, N = 182) = 25.314, p < .001, and R-CBM in block three, χ2 (3, N = 182) = 23.355, p < .001, were significant at the point of entry; Maze was added in block four (χ2 values at the point of entry ranged from 1.15 to 25.31).

Nagelkerke estimates of R2 showed that the increase in variance explained with the addition of each variable was significant in all four blocks. Block one showed 7.5% of variance explained. Block two showed 38.1% of variance explained. Block three showed 51.3% of variance explained. Block four showed 53.4% of variance explained. Significant increases in variance explained and the chi-squared omnibus tests indicate that each variable added significant value to the prediction model.
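Because classification accuracy and the Nagelkerke estimate are reported for every block in these analyses, a brief sketch of how the two statistics can be computed from a fitted logistic model is given below. The 0.5 classification threshold, the simulated data, and the statsmodels-based code are assumptions of the sketch, not a description of the software used for the reported results:

```python
import numpy as np
import statsmodels.api as sm

def nagelkerke_r2(fit):
    """Nagelkerke's rescaling of the Cox-Snell pseudo-R2 for a fitted Logit model."""
    n = fit.nobs
    cox_snell = 1 - np.exp((2 / n) * (fit.llnull - fit.llf))
    return cox_snell / (1 - np.exp((2 / n) * fit.llnull))

def classification_accuracy(fit, y, threshold=0.5):
    """Share of students whose predicted risk category matches the observed category."""
    predicted = (fit.predict() >= threshold).astype(int)
    return np.mean(predicted == np.asarray(y))

# Example with simulated data (for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))
fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(round(nagelkerke_r2(fit), 3), round(classification_accuracy(fit, y), 3))
```

The 0.5 cutoff mirrors the default classification tables produced by common statistical packages; other cut points can be substituted where a different balance of false positives and false negatives is desired.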
Results of classification accuracy for seventh grade CBM to predict MEAP reading showed a relatively high accuracy rate of 66.1% for FRL and ethnicity in block one (see Table 10). FRL and ethnicity were retained in subsequent block entries. The addition of easyCBM MCRC in block two increased classification accuracy to 67.8%. Adding R-CBM in block three increased classification accuracy to 73.9%. Adding Maze in block four resulted in a decrease in classification accuracy to 72.8%.

For eighth grade CBM predicting MEAP, H-L tests in blocks one and two were non-significant. Block three, which included easyCBM and R-CBM, was significant, H-L χ2 (8, N = 222) = 16.1, p = .041 (see Table 9). Omnibus tests showed significance in blocks one, χ2 (1, N = 222) = 12.525, p = .006, two, χ2 (2, N = 222) = 13.16, p < .001, three, χ2 (3, N = 222) = 104.681, p < .001, and four, χ2 (4, N = 222) = 110.119, p < .001. The additions of easyCBM MCRC, R-CBM, and Maze in subsequent blocks were significant at each point of entry. Variance explained in block one was 7.5%. In block two easyCBM MCRC showed a significant increase in variance explained to 38.1%. In block three variance explained increased to 51.3% with R-CBM added. Peak variance explained was 53.4% in block four with the addition of Maze. Classification results for eighth grade CBM to predict MEAP reading showed an accuracy of 65.9% for FRL in block one alone (see Table 11). The addition of easyCBM MCRC in block two increased classification accuracy to 79.6%. In block three R-CBM improved accuracy to 84.1%. In block four the addition of Maze to the model reduced overall classification accuracy to 82.7%.

When the analysis was repeated for eighth grade using ACT Explore® reading as the dependent variable, results of CBM analysis with ACT Explore® as the dependent measure were again consistent with the results of CBM analysis with MEAP reading as the dependent measure. Analyses show similarities in classification accuracy and variance explained as well as in the significance of each CBM predictor in each block entry. Model fit tests for CBM predicting ACT Explore® in all block entries were non-significant (see Table 12), indicating that the models in each block were not poorly fit to the data. Omnibus tests for the overall model were significant in each block (p < .005). The additions of easyCBM MCRC, R-CBM, and Maze were significant at each point of entry. Variance explained in block one for FRL and ethnicity was 9.3%. The addition of easyCBM MCRC in block two showed a significant increase in variance explained to 32.2%. In block three, R-CBM increased variance explained to 42.4%. In block four, Maze increased variance explained to 46.7%. Classification results for eighth grade CBM to predict ACT Explore® showed an accuracy for FRL and ethnicity of 71.1% in block one alone (see Table 13). Adding easyCBM MCRC in block two increased classification accuracy to 74.8%. Adding R-CBM increased accuracy to 80.7% in block three. In block four Maze increased overall accuracy to 83.5%.

Linear Analysis

To further explore the relationship between CBM, EWS, and outcomes on high stakes tests, hierarchical linear regression was conducted using MEAP reading scaled scores and ACT Explore® scaled scores as the dependent variables. Linear regression was selected in order to examine the linear relationship between each combination of predictors and dependent measures on a continuous (linear) scale.
Linear regression allows for a more accurate assessment of the correlation between each prediction model and linear outcomes. Linear regression also provides the opportunity to examine a direct R2 calculation (as opposed to the Nagelkerke R), and the standard error of each prediction model. Scaled scores for ACT Explore and MEAP are grade 42 level dependent, meaning that the range of scores from not proficient to proficient in one grade do not overlap with the range of scores from another grade level. To avoid a bi-modal distribution in outcome measures, analyses were conducted separately for each grade level and each dependent variable. Analysis was done using the enter method. Block entry of each predictor variable was conducted in the same order as in the logistic regression analyses described above. All linear regression models were examined for multicollinearity using Tolerance and Variance Inflation Factor (VIF). For the CBM models some multicollinearity was expected, given that all three CBM assessments (easyCBM MCRC, R-CBM, and Maze) and dependent variables (MEAP reading and ACT Explore® reading) are measures of reading performance. For EWS variables multicolinearity was also expected given that behavior problems, academic problems, and attendance problems are often comorbid. There is little evidence that multicollinearity within this investigation was at a level severe enough to compromise the interpretation of results. Tolerance and VIF were typically within acceptable limits (Tolerance > .4, VIF < 2.5) and in many cases less severe than anticipated. For seventh grade CBM assessments regressed to MEAP, reading collinearity was within acceptable limits (Tolerance > .4, VIF < 2.5) for all variables in the blocks one through three (range 1.02, 1.55). However in block four VIF exceeded 2.5 for the last two variables entered, R-CBM and Maze (range 1.06, 2.97). All other models produced test statistics for tolerance and VIF that were within the acceptable range. Early Warning Signs. Using EWS to predict outcomes for scaled scores on MEAP reading for seventh grade, FRL was included in model one to control for variance due to socioeconomic status. Model one with FRL alone was statistically significant R2 = .04, F(1,165) 43 = 6.95, p = .009 (see table 14). In model two the addition of MEAP-Y1 was significant R2 = .527, F(1,164) = 168.49, p < .001, indicating an increase of 48% variance explained over FRL alone. In models three through five, the addition of ODRs, attendance, and course failure were non-significant. Non-significant changes in R2 for blocks three through five range from .001 to .005 indicating that adding ODRs, attendance, and course failure to the linear combination of EWS predictors produced no significant improvement in the model. For eighth grade, FRL was included in model one and was significant R2 = .057, F(1,220) = 13.189, p < .001 (see Table 15). When MEAP-Y1 was added in model two, the proportion of variance explained by the model increased significantly R2 = .60, F(2,163) = 298.688, p < .001. No other predictors added to the linear combination in subsequent models yielded a significant change to the EWS model. Results of the linear analyses of the EWS model to predict outcomes on MEAP reading for seventh and eighth grade are consistent with results of the logistic analyses. The linear combination of demographic predictors and prior scores on MEAP reading (MEAP-Y1) accounts for the majority of variance explained in the model. 
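The hierarchical linear models summarized in this section follow the same block-entry logic, with the change in R2 evaluated at each step and Tolerance and VIF examined for the full model. A compact sketch of those computations is shown below; the simulated values and statsmodels-based code are illustrative assumptions only:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated analysis file (illustrative column names, not the study data).
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({"frl": rng.integers(0, 2, n),
                   "meap_y1": rng.normal(720, 30, n),
                   "odrs": rng.poisson(1.0, n),
                   "absences": rng.poisson(8, n)})
df["meap_scaled"] = 400 + 0.45 * df["meap_y1"] + rng.normal(0, 20, n)

# Block-entry linear models; the significance of each change in R2 is tested
# with an F test in the reported analyses.
blocks = [["frl"], ["meap_y1"], ["odrs"], ["absences"]]
predictors, prev = [], None
for i, block in enumerate(blocks, start=1):
    predictors += block
    X = sm.add_constant(df[predictors])
    fit = sm.OLS(df["meap_scaled"], X).fit()
    delta_r2 = fit.rsquared - (prev.rsquared if prev is not None else 0.0)
    prev = fit
    print(f"Model {i}: R2 = {fit.rsquared:.3f}, change in R2 = {delta_r2:.3f}")

# Collinearity diagnostics for the full model: VIF and its reciprocal, Tolerance.
X_full = sm.add_constant(df[predictors])
for j, name in enumerate(X_full.columns[1:], start=1):
    vif = variance_inflation_factor(X_full.values, j)
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")
```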
ODRs, attendance, and course failure failed to significantly strengthen the model. Curriculum Based Measures. Using CBM to predict outcomes for scaled scores on MEAP reading for seventh grade, model one included both ethnicity and FRL (see table 16). Model one was significant R2 = .10, F(1,180) = 10.25, p < .001. In model two easyCBM MCRC was added. The combination of demographic variables and easyCBM MCRC produce a significant increase in variance explained R2 = .32, F(1,179) = 55.47, p < .001. In model 3 the addition of R-CBM produced a significant increase in the percentage of variance explained by the model R2 = .483, F(1,178) = 58.12, p < .001. The maximum variance explained was achieved 44 in model 4 with the addition of Maze. However, the slight increase in R2 attributed to the addition of Maze was not significant R2 = .485, F(1,177) = .71, p = .40. These results indicate that easyCBM MCRC accounts for the greatest amount of variance attributable to a single predictor in the model. The addition of R-CBM improved the predictive model though comparatively less than easyCBM MCRC. Maze provided no added statistically significant value to the prediction model. For eighth grade, FRL and ethnicity were included in model one which was significant R2 = .07, F(2,226) = 8.10, p < .001 (see table 17). When easyCBM MCRC was added in model two, the proportion of variance explained by the model increased significantly, R2 = .37, F(1,225) = 110.098, p < .001. Block three added R-CBM into the model. Results in block three produce another significant increase in variance explained R2 = .495, F(1,224) = 53.65, p < .001. In contrast with seventh grade, the addition of Maze in model four was significant R2 = .52, F(1,223) = 9.98, p < .001. Results for eighth grade indicated a model in which the variance is more equitably distributed amongst predictors. In the CBM model for eighth grade, all four models indicate that the new predictor introduced with each entry attributed a statistically significant amount of variance explained to the model. As with the logistic regression analysis, linear regression analyses were repeated for eighth grade using ACT Explore® reading as the dependent variable. Linear regression showed that CBM assessments significantly predicted ACT Explore® scores similarly to previous analyses described above (see Table 18). In model one, race and ethnicity were included. Model one was significant R2 = .06, F(2,218) = 6.64, p = .002. With easyCBM in model two the model was significant R2 = .30, F(1,217) = 78.60, p < .001. In model three R-CBM was added. The results in model three were significant R2 = .39, F(1,216) = 27.920, p < .001. In model four, 45 Maze was added. Again, the additional variance attributed to the addition of Maze as a predictor was significant R2 = .45, F(1,215) = 23.829, p < .001. Results of analysis with ACT Explore® are consistent with the results of CBM analysis for outcomes on MEAP reading. It is worth noting though that in block four of CBM analysis for ACT Explore® the addition of Maze accounted for a larger amount of variance than it did in any other analysis run for this investigation. When repeated with ACT Explore®, analysis using the EWS model produced results under linear regression that were similar to analyses using MEAP reading as the dependent variable. Model one contained FRL to control for variance related to socioeconomic status. Model one showed a significant result R2 = .04, F(1,210) = 9.34, p = .003. In model two MEAPY1 was added. 
The resulting increase in variance explained was significant R2 = .48, F(1,209) = 173.344, p < .001. The addition of ODRs, attendance, and course failure in subsequent blocks was not significant. Analysis using ACT Explore® confirms the results of analysis done with MEAP reading as the dependent variable. Seeing results that are consistent between state and national assessments is encouraging. The use of a national assessment as another outcome measure to crosscheck results increases confidence that results are not subject to any substantive regional effects or undesirable psychometric properties of MEAP. When compared head-to-head, the combination of state assessment scores and EWS predictors are comparable, and in some cases more accurate than the CBM model. With MEAP reading as the dependent measure, CBM and EWS models returned comparable peak values for classification accuracy (seventh grade = CBM = 73.9%, EWS = 74.1%, eighth grade = CBM = 84.1%, EWS = 84.7%). However, EWS accounted for an overall greater amount of variance explained at seventh (55.1%) and eighth (58.1%) grade compared to CBM (seventh = 40.2%, 46 eighth = 53.4%). Additionally, in the EWS model fewer predictor variables were required to achieve the maximum variance explained, compared to the CBM model. For EWS, the demographic variables and prior MEAP scores accounted for the vast majority of variance explained. The addition of ODRs, attendance, and course failure added no statistically significant value to the predictive model. For CBM, the addition of easyCBM and R-CBM was significant at each point of entry and therefore contributed substantial value to the peak accuracy and variance explained. To put this in perspective, even with all of the CBM predictors added to the model, the CBM model did not achieve as high a value in variance explained and overall classification accuracy as did only the previous years’ MEAP scores from the EWS model. 47 CHAPTER 5: Discussion Summary This investigation sought to compare two different models for the identification of students in need of reading support; one based on extant data and the other based on additional reading specific screening tests. A thorough search of extant literature revealed a need for an investigation of screening methods at the middle school level. As discussed earlier, there is currently no consensus on the most appropriate way to identify middle school students at risk of underperformance in reading. This investigation sought to bridge this gap by comparing two disparate models, one commonly used at the elementary level (CBM) and one commonly used at the high school level (EWS). Specifically, this investigation sought to answer three questions: 1. How do curriculum-based measures in reading compare to early warning signs data in predicting outcomes on statewide assessments of reading at the middle school level? 2. What is the valued added in combining curriculum-based measures (CBM) in oral reading fluency, Maze reading comprehension, and easyCBM multiple choice reading comprehension as predictors of outcomes on high stakes reading assessments at the middle school level? 3. What is the valued added in combining Early Warning Signs (EWS) data in attendance, office discipline referrals, failing grades, and prior performance on statewide assessments function as predictors of outcomes on high stakes reading tests at the middle school level? 48 Question 1. 
How do curriculum-based measures in reading compare to early warning signs data in predicting outcomes on statewide assessments of reading at the middle school level? The primary analyses of this investigation examined two important factors in model comparisons. The first is the overall classification accuracy of each model. Classification accuracy is the percentage of students that were correctly classified as being at-risk or not at-risk by each of the prediction models. The results predicted by the model are compared to students' actual scores on the outcome measures (MEAP and ACT Explore®) to yield an overall percentage of students correctly categorized by the prediction model. Comparing classification between EWS and CBM models enables a side-by-side comparison of how accurately each of these models classifies students as at-risk in reading. The second important analysis is an examination of variance explained as estimated by the Nagelkerke R2 (logistic regression) and R2 (linear regression) statistics. Variance explained is an estimate of the amount of variability in a dependent variable that can be attributed to the prediction model. In the current investigation, it is helpful to examine variance explained in order to see how much of the accuracy in prediction outcomes can be attributed to each model. The greater the percentage of variance explained attributed to each model (with a p-value < .05), the more confident we can be that the model is contributing to the prediction outcomes rather than producing a chance result.

Results were consistent across seventh and eighth grade. When MEAP was used as the outcome measure the overall classification accuracy between EWS and CBM differed by less than 1% (seventh grade = .2%, eighth grade = .6%). When ACT Explore® was used as the dependent variable the overall classification accuracy of the EWS and CBM models was separated by 1.9% in favor of the CBM model.

The EWS model accounted for greater variance in outcomes on both MEAP and ACT Explore® than the CBM model. This is not surprising given that the EWS model included previous scores on state reading assessments. Since previous scores on statewide reading assessments are a predictor of future scores on state reading assessments (Denton et al., 2012), it makes sense that prior state test scores account for the largest proportion of variance in the EWS model. Findings of this investigation show that the variance explained by past state test scores on future state test scores in reading (seventh = 55.1%, eighth = 58.1%) is notably higher than results reported by Denton et al. In their analysis of scores for eighth grade students in Texas, researchers found that 44% of the variance in future state test scores was accounted for by the previous year's state test scores. Results of Denton et al. (2012) and the current investigation support the assertion made by Reed and colleagues that data from annual statewide reading assessments should be "an integral part of the universal screening system" for reading in secondary schools (Reed, Wexler, & Vaughn, 2012, p. 49).

Given that state reading assessments are a measure of reading, one might posit that state assessment scores are more aligned with the constructs assessed in the CBM model than those measured in the EWS model. Thus, state reading tests and CBM should be grouped together as a collective set of reading data. However, state reading tests do, in fact, assess a dimension of reading that is notably different from that assessed by CBM universal screeners.
State assessments of reading are a measure of overall grade level reading achievement aligned with state adopted standards. CBM measure more narrow skills such as oral reading fluency, which is then used as a proxy for reading achievement. Putting state reading assessments and CBM reading assessments into the prediction model may provide a more powerful overall prediction model with greater accuracy and reliability. Further analysis exploring the use of state assessment data and CBM reading data 50 to predict which students are in need of reading intervention is recommended. In addition, examination of state assessment data and each CBM measure independently provides information that teachers can use to pinpoint specific areas that warrant reading support. For example, comparing scores on R-CBM (oral reading fluency) and easyCBM MCRC (reading comprehension) may help teachers prioritize intervention services specific to students’ individual needs. However, it is important to keep in mind that the grouping of variables in each prediction model was based on the distinction between data that schools are already required to collect based on federal and state requirements (i.e. statewide achievement tests, attendance records, discipline records, and course grades) and assessments that must be administered in addition to state mandated testing requirements, formative, and summative content area tests. The impetus for this investigation comes from the underlying question, Can schools use the data they already have to identify which students need reading intervention services or must they do additional testing? In the case of the school participating in this study, results support using state test data as the best screening measure and the one that should be used at a true universal level (i.e., for all students), while reserving CBM screening when statewide test data are not available. The one exception to using state assessment data as a truly universal screener is that students exempt from MEAP testing based on their IEP qualifications will not have state reading test scores. Question 2. What is the valued added in combining curriculum-based measures (CBM) in oral reading fluency, Maze reading comprehension, and easyCBM multiple choice reading comprehension as predictors of outcomes on high stakes reading assessments at the middle school level? 51 Results of both the binary and linear CBM analysis indicate that easyCBM accounted for the largest proportion of variance attributable to the CBM measures. This is perhaps not surprising given that easyCBM, and the dependent measures used in this investigation (MEAP reading and ACT Explore®) are both multiple choice reading tests that assess factual, inferential, and evaluative questions. While the analysis in this investigation did not explore the effects of test format as a moderator of the relationship between dependent and independent variables there may be some advantages to using a multiple choice reading test to predict outcomes on another multiple choice reading test. Presumably, multiple choice reading comprehension assessments all assess similar skills and constructs and therefore would be expected to correlate with one another more strongly than those that assess different aspects of reading such as fluency or decoding. The addition of R-CBM provided an incremental increase that was small but consistently significant across grade levels, dependent measures, and analyses. 
As a measure of oral reading fluency R-CBM measures a different aspect of reading than does easyCBM. While easyCBM focuses on passage reading, recall, and application, R-CBM is a test of speed and accuracy. The fact that both assessments contributed significantly to the variance attributed to the CBM predictive model means that neither one is likely a comprehensive measure of reading ability. Instead, easyCBM and R-CBM measure specific aspects of reading that serve as a proxy for reading achievement as a whole. Including Maze in the model failed to produce consistent or statistically significant contributions to the CBM model. Though the lack of consistent positive results with Maze may initially seem problematic in the model, it is important to consider that Maze is a combined measure of silent reading fluency and reading comprehension. Since the two variables entered immediately prior to Maze in the model are measures of reading comprehension (easyCBM) and 52 fluency (R-CBM) it is entirely possible that variance potentially attributable to Maze was already accounted for by R-CBM and easyCBM. It is also possible that the lack of significant effect on the prediction model with the addition of Maze was a consequence of the strength of the relationship between Maze and the dependent measures. As noted in Chapter Two, prior research has shown that the relation between established measures of reading comprehension and Maze is not as strong it is between these same external measure and ORF (Ardoin et al., 2004; Parker, Hasbrouk, & Tindal, 1992; Tolar, et al., 2012). Additionally, data from the CBM model at seventh grade indicated a discrepancy between the consistency of classification accuracy and variance explained. For seventh grade CBM analysis using MEAP as the dependent variable, the variance explained by the full model is 40.2%, which is well below all other percentages of variance explained in this investigation across all models and grades (range 46.7%-58.7%). This is somewhat surprising given that the classification accuracy and between the EWS (74.1%) and CBM (72.8%) are within 2% while there is nearly a 15% discrepancy in variance explained between for EWS (55.1%) and CBM (40.2%). Staff from the participating school indicated that there are often anomalies in seventh grade CBM data trends that are attributed to the comparatively high cut scores established at seventh grade. Further investigation of seventh grade CBM data is necessary to more fully understand the differences in variance explained between the CBM and EWS models. Question 3. What is the valued added in combining Early Warning Signs (EWS) data in attendance, office discipline referrals, failing grades, and prior performance on statewide assessments function as predictors of outcomes on high stakes reading tests at the middle school level? 53 For the EWS models, it is clear that prior scores on state assessments in reading account for the vast majority of variance explained on future outcomes of state reading assessments. This pattern occurred at both seventh and eighth grade for outcomes on MEAP and ACT Explore®. These findings are consistent with similar analysis conducted by Denton and colleagues (2011) that found prior scores on state reading assessments in Texas to be the best overall predictor of future scores on state reading assessments compared to nine other commonly used reading assessments. 
The addition of ODRs, attendance, and course failure into the models produced only incremental non-significant increases in variance explained and classification accuracy. Typically, these results would indicate that ODRs, attendance, and course failure provide little to no added value for the prediction model. These results are somewhat surprising given the reported strength of behavior, attendance, and course failure as predictors of high school dropout (Balfanz, 2009; Balfanz, Herzog, & Mac Iver, 2007; Neild, Balfanz, & Herzog, 2007). The key to the weak value added by attendance, behavior, and course failure may be in part due to comorbidity of academic and behavior problems. That is, because variables included in early warning signs appear to be functionally related, students that struggle in one area are also likely to struggle in another. For example, students with poor attendance (>10% absences) are more likely to fail courses than those with adequate attendance. This means there is likely a considerable amount of overlap in the groups of students predicted at-risk if the EWS variables were each explored independently. Just as in the CBM results, the amount of variance predicted in the EWS model by prior state test scores could include some or all of the potential variance explained by ODRs, attendance, and course failure. Results of the investigation reported above 54 by Neild, Balfanz, & Herzog (2007) elude to this in reporting that the chances of a student dropping out of high school was 75% for students that exhibiting one or more warning signs. It is important to consider though the balance between the value added by including ODRs, attendance, and course failure in the model and the costs associated with collecting and managing such data. Since these three data come from extant sources, require no additional testing, are routinely collected, and are often required for state reporting purposes, it may be beneficial to keep ODRs, attendance, and course failure as a part of screening procedures. Therefore the incremental benefits in accuracy of prediction may outweigh the time, effort, and resources necessary to include these data. Furthermore, even non-significant increases in prediction accuracy leads to better overall efficiency in identification of students at-risk and reduces the potential for false positive and false negative predictions. For the participating school, consider that a 2% increase in overall accuracy translates into approximately seven more students correctly identified as needing or not needing reading intervention services. In sum, if schools use an EWS model for identification of students in need of reading support, and the data in ODRs, attendance, and course failure are readily available, school personnel should consider such data for screening purposes. Limitations Data for this investigation came from a single school serving two grade levels (seventh and eighth). The sample size (n = 434) for both grades combined is not atypical in the field of CBM research (Yeo, 2009), but nonetheless could benefit from a larger sample size. Because data were limited to a single site it was not possible to control for school-level, district level, or regional level effects in the analysis. However, as evidence in table 20, there are several areas in which the participating school is representative of the larger population. 
In school-wide enrollment, the participating school includes 14% students with disabilities as a percentage of total enrollment, which is comparable to both state (13%) and national (13%) public school enrollment. Likewise, the percentage of students qualifying for free or reduced lunch at the participating school (53%) is similar to state (47%) and federal (48%) levels. One important area in which the school, state, and federal profiles are quite different is reading achievement. The percentage of students proficient in reading as measured by MEAP is 65% for students in eighth grade. This is 8% below the statewide average (73%). When compared nationally, results of the National Assessment of Educational Progress (NAEP) show 36% of eighth grade students proficient in reading. Since the primary area of interest for this investigation is reading, it is important to recognize that results are generalizable to schools with similar demographic and achievement characteristics rather than to the population as a whole. Future research should include data with a larger sample size across multiple schools, districts, and regions.

Another important limitation in the current study is the use of MEAP and ACT Explore® as dependent variables. As mentioned above, MEAP and ACT Explore® have both been discontinued by their respective organizations beginning with the 2014-2015 school year. With the development of the Common Core State Standards (CCSS) and computer adaptive testing, ACT, the Michigan Department of Education, and many other state departments of education have transitioned to tests that are more closely aligned with CCSS and incorporate technology enhanced test items such as drag and drop items, hot spots (e.g., clicking on a target area or highlighted area of text), short answer, and extended response items available in online testing environments. MEAP has been replaced by M-STEP as the statewide test of academic achievement and school accountability in Michigan. ACT Explore® has been replaced by ACT Aspire®. While the results of the current investigation are informative, it will not be possible to replicate these results since MEAP and ACT Explore® are no longer available.

Another limitation is the exclusive use of secondary data. Data were provided to the researcher directly from the participating school, and it is not possible to assess the fidelity of testing procedures or the accuracy of data entry. A study on training and fidelity of test procedures conducted by Reed and Sturges found that 8% of all ORF administrations resulted in procedural errors that compromised the data (Reed & Sturges, 2012). Similarly, Cummings, Biancarosa, Schaper, and Reed (2014) found that up to 16% of variance in ORF scores can be attributed to variation between test proctors. While there is no evidence in the current study of poor implementation fidelity that would compromise the findings, ensuring fidelity of implementation is an essential component of scientific inquiry and cannot be assumed in this investigation.

Finally, it is important to recognize that this analysis of EWS and CBM focuses on only one of the many functions of these data sources: universal screening. CBM and EWS data can be useful for other aspects of assessment including progress monitoring, evaluation of instructional effectiveness, and as part of the referral process for special education, to name a few.
This investigation explored the use of EWS and CBM as methods for distinguishing between those students that do and do not require additional reading instruction and intervention. Though EWS and CBM appear to be comparable methods for identification purposes, there was no investigation of how these models inform what happens after students have been identified as needing additional services. Further investigation is needed to know if data used in the screening process actually enable educators to appropriately fit students with the targeted services they need. Between EWS and CBM there may be advantages in using one over the other when it comes to deciding what services are the best fit for a student, including the specific instructional program, frequency, duration, and group size. Beyond identification, it is important for schools to know how best to proceed with students that could benefit from further intervention.

Implications

The current investigation represents a departure from prior studies of EWS and CBM in two critical ways. First, the latency between the CBM data collected and the primary dependent measure is markedly short. Since MEAP testing was completed in the fall of each academic year, the fall benchmark cycle of CBM testing concluded less than three weeks before the start of MEAP administration. The reduced latency between testing periods effectively eliminates the intervention and maturation effects found in studies that compare fall CBM data to state test data from a spring administration. Additionally, the EWS data used in these analyses were collected from the prior year. One might imagine that the close proximity between CBM and MEAP, compared to the EWS data and MEAP, would give the CBM model a distinct advantage in its ability to accurately classify students. While the reduced latency may have been advantageous in predicting outcomes, the magnitude of such advantage was not such that it surpassed the overall classification accuracy of the EWS model.

Second, much of the work done in curriculum-based measurement as well as other areas of educational measurement assumes that assessment is required to gather the data needed for decision making, whether it be at the student, teacher, grade level, school, or district level. However, the data required to make sound educational decisions may already exist. If students can be accurately categorized as at-risk of failure in reading without the use of additional tests, there is potential to create systems of assessment, decision-making, and support that are more efficient and effective.

Though the results of the current investigation show that EWS and CBM categorize students as at-risk or not at-risk for failure on high stakes tests at similar rates of accuracy, it should be noted that the results do not support the elimination of CBM testing in middle schools in favor of EWS. It may be tempting to compare the classification accuracy of EWS versus CBM and conclude that CBM are no longer required for screening purposes. This may be particularly tempting when educators already question the number of tests administered to students each year. However, there are likely to be situations in which the use of CBM is necessary for the identification of students in need of support.
Though the results of the current investigation show that EWS and CBM categorize students as at risk or not at risk for failure on high stakes tests with similar accuracy, it should be noted that the results do not support the elimination of CBM testing in middle schools in favor of EWS. It may be tempting to compare the classification accuracy of EWS and CBM and conclude that CBM is no longer required for screening purposes. This conclusion may be particularly tempting when educators already question the number of tests administered to students each year. However, there are likely to be situations in which the use of CBM is necessary for the identification of students in need of support. These situations include data that are inaccurate or incomplete due to frequent absences, data that are inaccessible because of changes in building enrollment, and students whose archival data simply do not exist. In these cases, the use of CBM as a screening method is recommended. There are also many other functions of CBM that were not addressed in this investigation. CBM can be used to monitor students' progress weekly or monthly, to evaluate the performance of an individual or class relative to local, state, and national norms, and to evaluate an individual student's rate of growth and response to instruction (Salvia, Ysseldyke, & Bolt, 2012). Even if extant data are used for screening purposes, CBM should remain part of a well-balanced school-wide system of assessment.

It is also important to mention several barriers that may inhibit schools' ability to use extant data as a method for universal screening. Chief among these barriers is access to extant data. In many cases teachers, interventionists, and support staff may not have direct access to students' past records of state assessment scores, absences, ODRs, and course failures. Inaccessibility can occur for a variety of reasons, including delays in the transfer of records from previous schools and inadequate credentials for accessing such data in district student information systems. Likewise, schools may not have the data infrastructure and/or expertise necessary to compile EWS data in a format that is suitable for screening purposes. While aggregating ODRs, attendance, and course failures requires only rudimentary arithmetic, doing so class-wide or building-wide can be time consuming and logistically challenging. In some cases, it may be more efficient and cost effective to administer CBM than to manage and maintain EWS data. There are certainly technological options that make compiling EWS data less challenging; however, implementing such solutions requires additional time, training, infrastructure, and funding.

Finally, the use of extant and CBM data to predict outcomes on high stakes tests for students in middle schools need not be limited to an either/or scenario. Though the analyses suggest that EWS and CBM assess risk status in reading for students in seventh and eighth grade with a similar level of accuracy, the most accurate and efficient method of identifying students in need of support may in fact come from a combination of EWS and CBM assessment. As suggested above, a blend of state assessment data and CBM screening is one promising option. Another potential option is a tiered system of screening in which EWS data (including state assessment data) serve as the primary screening instrument; CBM assessments could then be used as a secondary screening tool for students whose EWS data are at or near predetermined benchmarks. Furthermore, adding measures of oral reading fluency and reading comprehension to EWS may assist school personnel in differentiating between students who need targeted support in a specific area and those who need comprehensive reading support.
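As a complement to the tiered approach just described, the following is a minimal, hypothetical sketch in which EWS data serve as the primary screen and CBM testing is reserved for students whose prior state test scores fall near a predetermined benchmark or are missing. The column names, cut score, and thresholds are placeholder values chosen for illustration; they are not benchmarks from this study.

import pandas as pd

# Hypothetical extant (EWS) data set with one row per student.
ews = pd.read_csv("ews_data.csv")

# Placeholder decision values; actual benchmarks would be set locally.
MEAP_CUT = 700       # proficiency cut score on the prior-year state test
NEAR_BAND = 15       # scaled-score band around the cut that triggers CBM follow-up
ABSENCE_FLAG = 20    # period absences considered excessive
ODR_FLAG = 3         # office discipline referrals considered excessive

# Primary screen: flag students showing any early warning indicator.
ews["ews_risk"] = (
    (ews["meap_y1"] < MEAP_CUT)
    | (ews["absences"] >= ABSENCE_FLAG)
    | (ews["odr"] >= ODR_FLAG)
    | (ews["courses_failed"] > 0)
)

# Secondary screen: students near the benchmark, or with missing prior
# scores, are routed to CBM testing before a final decision is made.
ews["needs_cbm_followup"] = (
    ews["meap_y1"].isna()
    | ews["meap_y1"].between(MEAP_CUT - NEAR_BAND, MEAP_CUT + NEAR_BAND)
)

print(ews.groupby(["ews_risk", "needs_cbm_followup"]).size())

A rule of this kind preserves the efficiency of extant data for the majority of students while reserving CBM administration for cases in which EWS data are ambiguous, incomplete, or missing.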
Suggestions for Future Research

Results of this investigation suggest the need for further research in several areas. Chief among these is the need for replication using results from Common Core State Standards (CCSS) assessments, including the Smarter Balanced Assessment Consortium (SBAC) and the Partnership for Assessment of Readiness for College and Careers (PARCC). As states transition to CCSS-aligned state assessments, it is necessary to understand whether EWS and CBM will function equally well in predicting performance on new high stakes tests, in particular those using online administration and technology enhanced assessment items.

Another area requiring further research is the exploration of potential bias in EWS and CBM data for disaggregated subgroups based on socioeconomic status, race/ethnicity, and disability classification. Prior research suggests that students in many of these subgroups are at higher risk of underperformance and high school dropout than the aggregated population (Hosp, Hosp, & Dole, 2011). It is therefore important to ensure that screening measures are at least as accurate at identifying students in need of support within at-risk populations as they are within the general population. Though some investigation of potential bias in CBM measures by subgroup has been done, there are still many unanswered questions, particularly for students in middle school (DeBoy, 2013; Hosp, Hosp, & Dole, 2011).

Results of this study suggest that extant data sources, including prior scores on statewide reading achievement tests, are a feasible alternative to the use of curriculum-based measures in reading for universal screening of students in middle school. However, it is presently unclear how the prediction models tested in this study will play out in practice for school personnel. Arming schools with accurate and efficient methods for identifying students in need of support is only the first step. The ways in which school personnel use CBM and EWS data to make decisions regarding services for students and the allocation of resources are equally important. Future research is needed that explores the practical implications and feasibility of using CBM and EWS as mechanisms for universal screening in middle schools.

Finally, it is essential to recognize that much of the current research on screening methods at the middle school level assumes that a valid and reliable method for risk identification already exists among the many assessments and systems available. However, it is entirely possible that the most accurate and reliable screening methods for middle school may not yet exist. The uniqueness of adolescents in their academic, behavioral, and social-emotional growth may warrant a set of screening tools that is demonstrably different from anything currently available. Further research is needed to explore creative screening methods that meet the specific needs of adolescent learners.

APPENDIX

Table 1
Summary of Descriptive Statistics and Tests for Normality

                                                                          Shapiro-Wilk
Measure                      Grade      M       Md      SD     Skewness  Kurtosis   Test    df      p
easyCBM MCRC                   7       11.82     12     3.73     -.483     -.556    .951    192   <.001
                               8       13.36     14     3.75     -1.00     1.032    .928    235   <.001
R-CBM                          7      137.06    136    37.84     -.190      .125    .990    194    .229
                               8      145.91    149    38.40      .108      .998    .987    233    .005
Maze                           7       24.41     25     8.85      .116     -.066    .992    192    .429
                               8       25.57     26     8.79      .040      .099    .992    232    .264
ACT Explore® Scaled Score      8       13.7      13     3.734     .003     -.417    .956    228    .001
MEAP Reading Scaled Score      7      721.61    720    33.15      .712     1.423    .955    196   <.001
                               8      822.23    824    24.20     -.001      .314    .986    242    .020

Note. Md = median. SD = standard deviation. MCRC=Multiple Choice Reading Comprehension.
R-CBM=Reading Curriculum Based Measure. 64 Table 2 Model Fit for Seventh Grade EWS Variables on MEAP Reading. Omnibus Test H-L Test χ2 df p χ2 df p -2 Log Likelihood Nagelkerke R Gender+FRL+Ethnicity Step Block Model 27.560 27.560 27.560 1 1 1 <.001 <.001 <.001 2.452 8 .964 191.246 .214 MEAP-Y1* Step Block Model 71.905 71.905 78.882 1 1 2 <.001 <.001 <.001 6.145 8 .631 139.924 .524 MEAP-Y1 + ODR* Step Block Model 1.654 1.654 80.536 1 1 3 .198 .198 <.001 6.964 8 .533 138.270 .533 MEAP-Y1 + ODR + Attendance* Step Block Model 3.003 3.003 135.267 1 1 4 .083 .083 <.001 13.506 8 .096 135.267 .548 MEAP-Y1 + ODR + Attendance + Course Failure* Step Block Model .729 .729 134.538 1 1 5 .393 .393 <.001 12.433 8 .133 134.538 .551 Note. Classification cutoff = 0.5 *Gender and Ethnicity were non-significant (p > .05) in block 1, FRL was retained in all subsequent block entry of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=HosmerLemeshow. MEAP-Y1 = Michigan Education Assessment Program reading subtest. ODR = Office Discipline Referral. Attendance = Total number of period absences. Course Failure = Total number of courses failed. n = 158. 65 Table 3 Model Fit for Eighth Grade EWS Variables on MEAP Reading. Omnibus Test H-L Test χ2 df p χ2 df p -2 Log Likelihood Nagelkerke R Gender+FRL+Ethnicity Step Block Model 13.160 13.160 13.160 1 1 1 <.001 <.001 <.001 .000 0 NA 275.883 .079 MEAP-Y1* Step Block Model 110.092 110.092 123.251 1 1 2 <.001 <.001 <.001 7.519 8 .452 165.791 .585 MEAP-Y1 + ODR* Step Block Model .160 .160 123.411 1 1 3 .689 .689 <.001 9.851 8 .276 165.631 .586 MEAP-Y1 + ODR + Attendance* Step Block Model .278 .278 123.689 1 1 4 .598 .598 <.001 8.515 8 .385 165.354 .587 MEAP-Y1 + ODR + Attendance + Course Failure* Step Block Model .060 .060 123.749 1 1 5 .806 .806 <.001 11.034 8 .200 165.293 .587 Note. Classification cutoff = 0.5. *Gender and ethnicity were non-significant (p > .05) in block 1, FRL was retained in all subsequent block entry of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=HosmerLemeshow. ODR = Office Discipline Referral. Attendance = Total number of period absences. Course Failure = Total number of courses failed. n = 229. 66 Table 4 Logistic Regression Results for Seventh Grade EWS on MEAP Reading. Proficient No Yes % Correct β SE β Wald’s χ2 df p eβ 25 44 67.1 53.7 60.1 .860 .330 6.800 1 .009 2.362 1.238 4.507 57 21 19 61 75.0 74.4 74.7 .087 .015 35.567 1 <.001 1.060 1.060 1.122 No Yes Overal % Correct 56 18 20 64 73.7 78.0 75.9 .087 -.109 .015 .087 34.843 1.542 1 1 <.001 .214 1.091 .897 1.060 .756 1.122 1.065 MEAP-Y1 ODR Attendance No Yes Overal % Correct 54 16 22 66 71.1 80.5 75.9 .093 -.116 .023 .016 .089 .013 34.467 1.718 3.071 1 1 1 <.001 .190 .080 1.098 .890 1.023 1.064 .748 .997 1.133 1.059 1.049 MEAP-Y1 No 53 23 69.7 .090 .016 30.427 1 <.001 1.094 1.060 1.129 ODR Yes 18 64 78.0 -.094 .093 1.036 1 .309 .910 .759 1.091 Attendance Course Failure Overal % Correct 74.1 .023 -.152 .013 .184 3.175 .682 1 1 .075 .409 1.023 .859 .998 .599 1.049 1.232 Predictors Observed FRL No Yes Overal % Correct 51 38 MEAP-Y1 No Yes Overal % Correct MEAP-Y1 ODR c for eβ (95%) Note. Classification Cutoff = 0.5, MEAP-Y1 = Michigan Education Assessment Program reading subtest, ODR = Total number of Office Discipline Referrals, Attendance = Total number of period absences, FRL = Free and Reduce Price Lunch. n = 158. 67 Table 5 Logistic Regression Results for Eighth Grade EWS on MEAP Reading. 
Proficient No Yes % Correct β SE β Wald’s χ2 df p eβ 79 .0 1.038 .292 12.62 1 <.001 2.822 1.592 5.003 7 143 100 64.4 No Yes Overal % Correct 60 15 19 128 75.9 89.5 84.7 .080 .011 51.922 1 <.001 1.083 1.060 1.107 MEAP-Y1 ODR No Yes Overal % Correct 59 15 20 128 74.7 89.5 84.2 .080 .016 .011 .041 51.882 .156 1 1 <.001 .693 1.084 1.016 1.060 .938 1.108 1.100 MEAP-Y1 ODR Attendance No Yes Overal % Correct 59 15 20 128 74.7 89.5 84.2 .080 .019 -.001 .011 .041 .002 51.333 .223 .274 1 1 1 <.001 .636 .601 1.084 1.020 .999 1.060 .941 .996 1.108 1.105 1.003 MEP-Y1 ODR Attendance Course Failure No Yes Overal % Correct 60 15 19 128 75.9 89.5 84.7 .080 .024 -.001 -.022 .011 .045 .002 .090 49.489 .283 .122 .060 1 1 1 1 <.001 .595 .726 .806 1.083 1.024 .999 .978 1.059 .937 .995 .820 1.108 1.120 1.003 1.167 Predictors Observed FRL No 0 Yes Overal % Correct MEAP-Y1 c for eβ (95%) Note. Classification Cutoff = 0.5, MEAP-Y1 = Michigan Education Assessment Program reading subtest, ODR = Total number of Office Discipline Referrals, Attendance = Total number of period absences, FRL = Free and Reduce Price Lunch. n = 229. 68 Table 6 Model Fit for Eighth Grade EWS Variables on ACT Explore® Reading. Omnibus Test H-L Test χ2 df p χ2 df p -2 Log Likelihood Nagelkerke R Gender+FRL+Ethnicity Step Block Model 19.396 19.396 19.396 1 1 1 .007 .007 .007 .000 0 NA 229.154 .128 MEAP-Y1* Step Block Model 78.169 78.169 86.929 1 1 2 <.001 <.001 <.001 7.277 8 .507 161.622 .491 MEAP-Y1 + ODR* Step Block Model 1.633 1.633 88.561 1 1 3 .201 .201 <.001 9.462 8 .305 159.989 .499 MEAP-Y1 + ODR + Attendance* Step Block Model 2.241 2.241 90.802 1 1 4 .138 .138 <.001 3.116 8 .927 157.748 .509 MEAP-Y1 + ODR + Attendance + Course Failure* Step Block Model 1.179 1.179 91.982 1 1 5 .278 .278 <.001 3.321 8 .913 159.569 .514 Note. Classification cutoff = 0.5 *Gender and Ethnicity were non-significant (p > .05) in block 1, FRL was retained in all subsequent block entry of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=HosmerLemeshow. ODR = Office Discipline Referral. Attendance = Total number of period absences. Course Failure = Total number of courses failed. n = 206. 69 Table 7 Logistic Regression Results for Eighth Grade EWS on ACT Explore® Reading. Predictors Observed Proficient No Yes FRL No 146 Yes Overal % Correct MEAP-Y1 % Correct β SE β Wald’s χ2 df p eβ c for eβ (95%) 0 100 .934 .323 8.337 1 .004 2.544 1.350 4.796 60 0 0.0 70.9 No Yes Overal % Correct 130 26 16 34 89.0 56.7 79.6 .071 .011 40.841 1 <.001 1.074 1.051 1.098 MEAP-Y1 ODR No Yes Overal % Correct 131 25 15 35 89.7 58.3 80.6 .071 -.076 .011 .062 40.253 1.489 1 1 <.001 .222 1.073 .927 1.050 .821 1.097 1.047 MEAP-Y1 ODR Attendance No Yes Overal % Correct 131 24 15 36 89.7 60.0 81.1 .074 -.095 .005 .012 .062 .003 39.947 2.321 2.333 1 1 1 <.001 .128 .127 1.077 .910 1.005 1.052 .806 .999 1.102 1.027 1.011 MEP-Y1 ODR Attendance Course Failure No Yes Overal % Correct 132 24 14 36 90.4 60.0 81.6 .073 -.077 .006 -.168 .012 .064 .003 .166 37.815 1.470 3.158 1.020 1 1 1 1 <.001 .225 .076 .312 1.075 .926 1.006 .845 1.051 .817 .999 .610 1.101 1.049 1.013 1.171 Note. Classification Cutoff = 0.5, MEAP-Y1 = Michigan Education Assessment Program Reading Subtest, ODR = Total number of Office Discipline Referrals, Attendance = Total number of period absences, FRL = Free and Reduce Price Lunch. n = 206. 70 Table 8 Model Fit for Seventh Grade CBM Variables on MEAP Reading. 
Omnibus Test H-L Test χ2 df p χ2 df p -2 Log Likelihood Nagelkerke R Gender+FRL+Ethnicity Step Block Model 15.083 15.083 15.083 1 1 1 .002 .002 .002 11.878 8 .157 234.361 .107 easyCBM MCRC Step Block Model 25.314 25.314 40.018 1 1 2 <.001 <.001 <.001 11.312 8 .185 209.426 .266 easyCBM MCRC + R-CBM Step Block Model 23.355 23.355 63.374 1 1 3 <.001 <.001 <.001 4.149 8 .843 186.070 .396 Step Block Model 1.154 1.154 64.528 1 1 4 .283 .283 <.001 5.283 8 .727 184.916 .402 easyCBM MCRC + R-CBM + Maze Note. Classification cutoff = 0.5 *Gender was non-significant (p > .05) in block 1, Ethnicity and FRL wer retained in all subsequent block entries of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=HosmerLemeshow. MCRC=Multiple Choice Reading Comprhension. R-CBM=Reading Curriculum Based Measure. n = 180. 71 Table 9 Model Fit for Eighth Grade CBM Variables on MEAP Reading. Omnibus Test H-L Test χ2 df p -2 Log Likelihood Nagelkerke R .006 .006 .006 1.766 8 .987 277.434 .075 1 1 2 <.001 <.001 <.001 3.854 8 .870 217.111 .381 31.832 31.832 104.681 1 1 3 <.001 <.001 <.001 16.100 8 .041 185.279 .513 5.438 5.438 110.119 1 1 4 .020 .020 <.001 7.213 8 .514 179.841 .534 χ2 df p Gender+FRL+Ethnicity Step Block Model 12.525 12.525 12.525 1 1 1 easyCBM MCRC Step Block Model 61.031 61.031 72.849 easyCBM MCRC + R-CBM Step Block Model Step Block Model easyCBM MCRC + R-CBM + Maze Note. Classification cutoff = 0.5 *Gender and ethnicity were non-significant (p > .05) in block 1, FRL was retained in all subsequent block entries of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=HosmerLemeshow. MCRC=Multiple Choice Reading Comprhension. R-CBM=Reading Curriculum Based Measure. n = 226. 72 Table 10 Logistic Regression Results for Seventh Grade CBM on MEAP Reading. Proficient No Yes % Correct β SE β Wald’s χ2 df p eβ 27 54 70.7 61.4 66.1 .676 .315 4.592 1 .032 1.965 1.059 3.646 65 31 27 57 70.7 64.8 67.8 .087 .015 35.567 1 <.001 1.060 1.060 1.122 No Yes Overal % Correct 69 24 23 64 75.0 72.7 73.9 .097 .032 .058 .007 2.830 19.036 1 1 .093 <.001 1.102 1.033 .984 1.018 1.235 1.048 No 69 23 75.0 .087 .059 2.212 1 .056 1.091 .973 1.225 Yes Overal % Correct 26 62 70.5 72.8 .026 .037 .009 .034 8.434 1.140 1 1 .004 .286 1.027 1.037 1.009 .970 1.045 1.110 Predictors Observed FRL Ethnicity No Yes Overal % Correct 65 34 easyCBM MCRC No Yes Overal % Correct easyCBM MCRC R-CBM easyCBM MCRC R-CBM Maze c for eβ (95%) Note. Classification cutoff = 0.5 *Gender was non-significant (p > .05) in block 1, Ethnicity and FRL were retained in all subsequent block entries of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=Hosmer-Lemeshow. MCRC = Multiple Choice Reading Comprehension R-CBM = Reading-Curriculum Based Measure. Maze = Maze Reading Comprehension. n = 180. 73 Table 11 Logistic Regression Results for Eighth Grade CBM on MEAP Reading. 
Predictors Observed Proficient No Yes % Correct β SE β Wald’s χ2 df p eβ c for eβ (95%) FRL No Yes Overal % Correct 0 0 77 149 0 100 65.9 .989 .294 11.300 1 .001 2.688 1.510 4.784 easyCBM MCRC No Yes Overal % Correct 47 16 30 133 61.0 89.3 79.6 .377 .059 41.384 1 <.001 1.458 1.300 1.635 easyCBM MCRC R-CBM No Yes Overal % Correct 54 13 23 136 70.1 91.3 84.1 .290 .033 .062 .007 22.158 24.232 1 1 <.001 <.001 1.337 1.033 1.185 1.020 1.509 1.047 easyCBM MCRC R-CBM Maze No 53 24 68.8 .268 .062 18.462 1 <.001 1.307 1.157 1.477 Yes Overal % Correct 15 134 89.9 82.7 .023 .075 .008 .033 9.061 5.243 1 1 .003 .022 1.023 1.078 1.008 1.011 1.039 1.150 Note. Classification cutoff = 0.5 *Gender and ethnicity were non-significant (p > .05) in block 1, FRL was retained in all subsequent block entries of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=Hosmer-Lemeshow. MCRC = Multiple Choice Reading Comprehension R-CBM = Reading-Curriculum Based Measure. Maze = Maze Reading Comprehension. n = 226. 74 Table 12 Model Fit for Eighth Grade CBM Variables on ACT Explore® Reading. Omnibus Test H-L Test χ2 df p -2 Log Likelihood Nagelkerke R .002 .002 .002 6.004 8 .647 247.403 .093 1 1 3 <.001 <.001 <.001 6.670 8 .573 206.485 .322 21.151 21.151 76.809 1 1 4 <.001 <.001 <.001 10.549 8 .229 185.334 .424 Step Block 9.532 9.532 1 1 .002 4.353 8 .824 175.802 .467 Model 86.341 5 χ2 df p Gender+FRL+Ethnicity Step Block Model 14.740 14.740 14.740 1 1 1 easyCBM MCRC* Step Block Model 42.051 42.051 55.658 easyCBM MCRC + R-CBM* Step Block Model easyCBM MCRC + R-CBM + Maze* .002 <.001 Note. Classification cutoff = 0.5 *Gender was non-significant (p > .05) in block 1, FRL and Ethnicity were retained in all subsequent block entry of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=HosmerLemeshow. MCRC = Multiple Choice Reading Comprehension R-CBM = Reading-Curriculum Based Measure. Maze = Maze Reading Comprehension. n = 207. 75 Table 13 Logistic Regression Results for Eighth Grade CBM on ACT Explore® Reading. Proficient No Yes % Correct β SE β Wald’s χ2 df p eβ 0 0 100 0 71.1 .862 -.171 .316 .083 6.850 4.190 1 1 .009 .041 2.284 .843 1.231 .719 4.241 .993 139 39 16 24 89.7 38.1 74.8 .409 .078 27.614 1 <.001 1.506 1.292 1.754 No Yes Overal % Correct 143 30 12 33 92.3 52.4 80.7 .342 .027 .084 .007 16.633 17.036 1 1 <.001 <.001 .850 1.027 .704 1.014 1.026 1.041 No 142 13 91.6 .311 .086 13.229 1 <.001 1.365 1.154 1.615 Yes Overal % Correct 23 40 63.5 83.5 .018 .093 .007 .031 6.174 8.725 1 1 .013 .003 1.018 1.097 1.004 1.032 1.032 1.166 Predictors Observed FRL Ethnicity No Yes Overal % Correct 155 63 EasyCBM MCRC No Yes Overal % Correct EasyCBM MCRC R-CBM EasyCBM MCRC R-CBM Maze c for eβ (95%) Note. Classification cutoff = 0.5 *Gender was non-significant (p > .05) in block 1, FRL and Ethnicity were retained in all subsequent block entry of variables. FRL = Students qualifying for free and reduced price school lunch. H-L=Hosmer-Lemeshow. MCRC = Multiple Choice Reading Comprehension R-CBM = Reading-Curriculum Based Measure. Maze = Maze Reading Comprehension. n = 218. 76 Table 14 Linear Regression Results for Seventh Grade EWS on MEAP Reading. 
Model 1 Variable FRL B SE B -13.493 5.118 MEAP-Y1 Model 2 β B -.201 -3.495 .862 SE B 3.687 .066 ODR Model 3 β B SE B -.052 -3.685 3.681 Model 4 β -.055 .713** .858 .066 .709** -1.117 .835 Total Absences Course Failure -.072 B SE B -4.118 3.745 .859 Model 5 β B SE B β -.061 -3.878 3.753 -.058 .066 .710** .841 .069 .695** -1.140 .837 -.073 -.883 .876 -.057 .042 .064 .036 .049 .064 .042 -1.313 1.322 -.059 R2 .040** .527** .532 .533 .536 F for change in R2 6.950** 168.489** 1.789 .437 .986 Note: MEAP-Y1 = Michigan Education Assessment Program reading subtest for previous year. FRL = Students qualifying for free or reduced price lunch. ODR = Total Office Discipline Referrals. Total Absences = Total number of period absences. Course Failure = Total number of courses failed *p < .05. **p < .01. n = 165. 77 Table 15 Linear Regression Results for Eighth Grade EWS on MEAP Reading. Model 1 Variable FRL B Model 2 SE B β B -15.637 4.306 .238* -3.329 * MEAP-Y1 .846 SE B Model 3 β B SE B Model 4 β SE B F for change in R2 SE B -.051 -3.440 2.930 β .049 .761* .837 * .050 .753** .838 .050 .754** .848 -.333 .320 -.045 -.343 .327 -.047 -.463 .372 -.063 .002 .015 .007 -.001 .016 -.004 .488 .039 Course Failure R2 B -.051 -3.285 2.896 Total Absences -3.339 2.922 β 2.896 ODR -.050 B Model 5 -.052 .052 .763** .720 .057** .601** .603 .603 .604 13.189** 298.688** 1.084 .025 .459 Note: MEAP-Y1 = Michigan Education Assessment Program reading subtest for previous year. FRL = Students qualifying for free or reduced price lunch. ODR = Total Office Discipline Referrals. Total Absences = Total number of period absences. Course Failure = Total number of courses failed *p < .05. **p < .01. n =221. 78 Table 16 Linear Regression Results for Seventh Grade CBM on MEAP Reading. Model 1 Variable B SE B FRL -11.367 4.778 Ethnicity -4.052 1.151 easyCBM MCRC Model 2 β B Model 3 SE B β B SE B β B SE B β -.169** -7.960 4.211 -.119 -4.023 3.703 -.060 -4.268 3.717 -.064 -.251** -2.950 1.019 -.182** -3.251 .888 -.201** -3.146 .898 -.195** 4.178 .561 .470** 1.572 .596 .177** 1.521 .600 .171** .447 .059 .510** .400 .082 .456** .277 .328 .074 R-CBM Maze R2 F for change in R2 Model 4 .102** .315** .483** .485 10.250** 55.468** 58.118** .712 Note: FRL = Students qualifying for free and reduced price school lunch. MCRC = Multiple Choice Reading Comprehension RCBM = Reading-Curriculum Based Measure. Maze = Maze Reading Comprehension. *p < .05. **p < .01. n = 162. 79 Table 17 Linear Regression Results for Eighth Grade CBM on MEAP Reading. Model 1 Variable FRL B SE B -5.637 4.239 easyCBM MCRC Model 2 β B Model 3 SE B β B SE B β B SE B β -.238** -7.597 3.578 -.116 -6.687 3.210 -.102 -5.598 3.158 -.085 4.965 .477 .566** 3.498 .471 .399** 3.151 .472 .359** .339 .045 .396** .232 .055 .271** .804 .243 .215** R-CBM Maze R2 F for change in R2 Model 4 .057** .362** .489** .513** 13.608** 108.167** 56.185** 10.981** Note: FRL = Students qualifying for free and reduced price school lunch. MCRC = Multiple Choice Reading Comprehension R-CBM = Reading-Curriculum Based Measure. Maze = Maze Reading Comprehension. *p < .05. **p < .01. n = 221. 80 Table 18 Linear Regression Results for Eighth Grade EWS on ACT Explore® Reading. 
Model 1 Variable FRL B SE B -1.546 .506 MEAP-Y1 Model 2 β B SE B -.206 -.294 .387 .086 .007 ODR Model 3 β B SE B -.039 -.290 .387 .680** .085 .007 -.032 .043 Total Absences Course Failure Model 4 Model 5 β B SE B β B SE B β -.039 -.279 .391 -.037 .673** .085 .007 .671** .083 -.038 -.030 .044 -.036 -.001 .050 -.001 .000 .002 -.013 .000 -.254 .391 -.034 .007 .652** .002 .012 -.120 .096 -.083 R2 .043** .477** .478 .478 .482 F for change in R2 9.342** 173.344** .563 .058 1.549 Note: MEAP-Y1 = Michigan Education Assessment Program reading subtest for previous year. FRL = Students qualifying for free or reduced price lunch. ODR = Total Office Discipline Referrals. Total Absences = Total number of period absences. Course Failure = Total number of courses failed *p < .05. **p < .01. n = 180. 81 Table 19 Linear Regression Results for Eighth Grade CBM on ACT Explore® Reading. Model 1 Variable Model 2 Model 3 Model 4 B SE B β B SE B β B SE B β B SE B β FRL -1.370 .502 -.183** -.533 .441 -.071 -.493 .416 -.066 -.335 .397 -.045 Ethnicity -.229 .123 -.124 -.238 .106 -.129** -.184 .100 -.100 -.130 .096 -.070 .512 .058 .513** .380 .060 .380** .314 .059 .315** .031 .006 .314** 0.11 .007 .116 .148 .030 .347** easyCBM MCRC R-CBM Maze R2 .057** .308** .387** .448** F for change in R2 6.644** 78.604** 27.920** 23.829** Note: FRL = Students qualifying for free and reduced price school lunch. MCRC = Multiple Choice Reading Comprehension RCBM = Reading-Curriculum Based Measure. Maze = Maze Reading Comprehension. *p < .05. **p < .01. n = 227. 82 Table 20 Comparison Data for Enrollment and Academic Achievement for 2012. ELL SpEd School 3% 14% State* 4% National** 9% FRL White Black Hispanic 2 or More Races Asian 53% 41% 25% 18% 12% 5% 65% 37% 13% 47% 69% 18% 6% 2% 3% 73% 35% 13% 48% 52% 16% 24% 3% 5% ***36% ***35 % Reading Math * State enrollment and achievement data for the State of Michigan provided by the Michigan Department of Education mischooldate.org; **National enrollment data provided by National Center for Education Statistics nces.ed.gov. School and state reading and math data percentage of students proficient for MEAP for 8th grade. National reading and math achievement data percentage proficient using National Assessment of Educational Progress, nces.ed.gov/nationsreportcard/naepdata 83 BIBLIOGRAPHY 84 BIBLIOGRAPHY ACT Research and Policy. (2013). What are the ACT college readiness benchmarks? Retrieved: http://www.act.org/research/policymakers/pdf/benchmarks.pdf Allensworth, E. M., & Easton, J. Q. (2005). The on-track indicator as a predictor of high school graduation. Chicago: Consortium on Chicago School Research. Ardoin, S. P., Witt, J. C., Suldo, S. M., Connell, J. E., Koenig, J. L., Resetar, J. L., ... & Williams, K. L. (2004). Examining the incremental benefits of administering a maze and three versus one curriculum-based measurement reading probes when conducting universal screening. School Psychology Review, 33, 218-233. Baker, D. L., Biancarosa, G., Park, B. J., Bousselot, T., Smith, J. L., Baker, S. K., ... & Tindal, G. (2014). Validity of CBM measures of oral reading fluency and reading comprehension on high-stakes reading assessments in Grades 7 and 8. Reading and Writing, 1-48. doi: 10.1007/s11145-014-9505-4 Balfanz, R. (2009). Putting middle grades students on the graduation path. National Middle School Association. Retrieved June, 1, 2010. Balfanz, R., Herzog, L., & Mac Iver, D. J. (2007). 
Preventing student disengagement and keeping students on the graduation path in urban middle-grades schools: Early identification and effective interventions. Educational Psychologist, 42(4), 223-235. doi:10.1080/00461520701621079 Barry, T. D., Lyman, R. D., & Klinger, L. G. (2002). Academic underachievement and attentiondeficit/hyperactivity disorder: The negative impact of symptom severity on school performance. Journal of School Psychology, 40(3), 259-283 doi:10.1016/s00224405(02)00100-0 Barth, A. E., Stuebing, K. K., Fletcher, J. M., Cirino, P. T., Romain, M., Francis, D., & Vaughn, S. (2012). Reliability and validity of oral reading fluency median and mean scores among middle grade readers when using equated texts. Reading psychology, 33(1-2), 133-161. doi:10.1080/02702711.2012.631863 Baydar, N., Brooks‐Gunn, J., & Furstenberg, F. F. (1993). Early warning signs of functional illiteracy: Predictors in childhood and adolescence. Child development, 64(3), 815-829. doi:10.2307/1131220 Calhoon, M. B., & Petscher, Y. (2013). Individual and group sensitivity to remedial reading program design: Examining reading gains across three middle school reading projects. Reading and Writing, 26(4), 565-592. doi:http://dx.doi.org/10.1007/s11145-0139426-7 85 Casillas, A., Robbins, S., Allen, J., Kuo, Y. L., Hanson, M. A., & Schmeiser, C. (2012). Predicting early academic failure in high school from prior academic achievement, psychosocial characteristics, and behavior. Journal of Educational Psychology, 104(2), 407. doi:10.1037/a0027180 Cohen, J. S., & Smerdon, B. A. (2009). Tightening the dropout tourniquet: Easing the transition from middle to high school. Preventing School Failure: Alternative Education for Children and Youth, 53(3), 177-184. doi:10.3200/psfl.53.3.177-184 Cummings, K. D., Biancarosa, G., Schaper, A., & Reed, D. K. (2014). Examiner error in curriculum-based measurement of oral reading. Journal of School Psychology, 52(4), 361–375. doi:10.1016/j.jsp.2014.05.007 Deboy, S. L. (2013). The Predictive Relationship Between Oral Reading Fluency and Comprehension As It Relates to Minority Students. Retrieved from: https://scholarsbank.uoregon.edu/xmlui/bitstream/handle/1794/13294/Deboy_oregon_01 71A_10726.pdf?sequence=1 Deno, S. L. (1985). Curriculum-based measurement: the emerging alternative. Exceptional children. 52(3), 219-232. Retrieved from: http://psycnet.apa.org/psycinfo/1986-10522001 Deno, S. L. (1992). The nature and development of curriculum-based measurement. Preventing School Failure: Alternative Education for Children and Youth, 36(2), 5-10. doi:10.1080/1045988x.1992.9944262 Deno, S. L. (2003). Developments in curriculum-based measurement. The Journal of Special Education, 37(3), 184-192. Retrieved from: http://files.eric.ed.gov/fulltext/EJ785942.pdf Deno, S. L., Fuchs, L. S., Marston, D., & Shin, J. (2001). Using curriculum-based measurement to establish growth standards for students with learning disabilities. School Psychology Review, 30(4), 507-524. Retrieved from: http://mdestream.mde.k12.ms.us/sped/toolkit/articles/Assessment/Deno%20Sch%20Psyc h%20Rev%202001%20CBM%20overview.pdf Denton, C. A. (2012). Response to Intervention for Reading Difficulties in the Primary Grades: Some Answers and Lingering Questions. Journal of Learning Disabilities, 45(3), 232– 243. doi:10.1177/0022219412442155 Denton, C. A., Barth, A. E., Fletcher, J. M., Wexler, J., Vaughn, S., Cirino, P. T., … Francis, D. J. (2011). 
The Relations among oral and silent reading fluency and comprehension in middle school: Implications for identification and instruction of students with reading difficulties. Scientific Studies of Reading : The Official Journal of the Society for the Scientific Study of Reading, 15(2), 109–135. doi:10.1080/10888431003623546 86 Edmonds, M. S., Vaughn, S., Wexler, J., Reutebuch, C., Cable, A., Tackett, K. K., & Schnakenberg, J. W. (2009). A Synthesis of Reading Interventions and Effects on Reading Comprehension Outcomes for Older Struggling Readers. Review of Educational Research, 79(1), 262–300. doi:10.3102/0034654308325998 Espin, C., Wallace, T., Lembke, E., Campbell, H. and Long, J. D. (2010), Creating a ProgressMonitoring System in Reading for Middle-School Students: Tracking Progress Toward Meeting High-Stakes Standards. Learning Disabilities Research & Practice, 25: 60–75. doi: 10.1111/j.1540-5826.2010.00304.x Fuchs, D., & Fuchs, L.S. (2006). Introduction to response to intervention: What, why, and how valid is it? Reading Research Quarterly, 41, 92-99. doi:10.1598/RRQ.41.1.4 Fuchs, D., Fuchs, L. S., & Compton, D. L. (2012). Smart RtI: A next-generation approach to multilevel prevention. Exceptional Children, 78, 263–279. doi:10.1177/001440291207800301 Fuchs, D., Fuchs, L. S., & Compton, D.L. (2004). Identifying reading disabilities by responsiveness to instruction: Specifying measures and criteria. Learning Disability Quarterly, 27(4), 216-228. doi:10.2307/1593674 Fuchs, D., Fuchs, L. S., & Stecker, P. M. (2010). The" blurring" of special education in a new continuum of general education placements and services. Exceptional Children, 76(3), 301-323. doi:10.1177/001440291007600304 Fuchs, L. S., Fuchs, D., & Speece, D. L. (2002). Treatment validity as a unifying construct for identifying learning disabilities. Learning Disability Quarterly, 25, 33-45. doi:10.2307/1511189 Fuchs, D., Mock, D., Morgan, P. L., & Young, C. L. (2003). Responsiveness-to-intervention for the learning disabilities construct. Learning Disabilities Research & Practice, 18(3), 157171. doi:10.1111/1540-5826.00072 Ghasemi, A., & Zahediasl, S. (2012). Normality Tests for Statistical Analysis: A Guide for NonStatisticians. International Journal of Endocrinology and Metabolism, 10(2), 486–489. doi:10.5812/ijem.3505 Goffreda, C. T., Diperna, J. C. and Pedersen, J. A. (2009), Preventive screening for early readers: Predictive validity of the Dynamic Indicators of Basic Early Literacy Skills (DIBELS). Psychol. Schs., 46: 539–552. doi: 10.1002/pits.20396 Hemphill, F. C., & Vanneman, A. (2011). Achievement Gaps: How Hispanic and White Students in Public Schools Perform in Mathematics and Reading on the National Assessment of Educational Progress. Statistical Analysis Report. NCES 2011-459. National Center for Education Statistics. doi:10.1037/e595292011-001 87 Hasbrouck, J., & Tindal, G. A. (2006). Oral reading fluency norms: A valuable assessment tool for reading teachers. The Reading Teacher, 59(7), 636-644. doi:10.1598/RT.59.7.3 Heppen, J. B., & Therriault, S. B. (2008). Developing Early Warning Systems to Identify Potential High School Dropouts. Issue Brief. National High School Center. Hernandez, D. J. (2011). Double Jeopardy: How Third-Grade Reading Skills and Poverty Influence High School Graduation. Annie E. Casey Foundation. Hinshaw, S. P. (1992). Externalizing behavior problems and academic underachievement in childhood and adolescence: causal relationships and underlying mechanisms. 
Psychological bulletin, 111(1), 127. doi:10.1037/0033-2909.111.1.127 Hosp, J. L., Hosp, M. A., & Dole, J. K. (2011). Potential bias in predictive validity of universal screening measures across disaggregation subgroups. School Psychology Review, 40(1), 108. IDEA (2004). Individuals with Disabilities Education Improvement Act. Public Law 108-446. doi:10.1007/springerreference_69963 Irvin, P. S., Alonzo, J., Lai, C. F., Park, B. J., & Tindal, G. (2012). Analyzing the Reliability of the easyCBM Reading Comprehension Measures: Grade 7. Technical Report# 1206. Behavioral Research and Teaching. Jenkins, J. R., & O’Connor, R. E. (2002). Early identification and intervention for young children with reading/learning disabilities. Identification of learning disabilities: Research to practice, 99-149 Jerald, C. D. (2006). Identifying Potential Dropouts: Key Lessons for Building an Early Warning Data System. A Dual Agenda of High Standards and High Graduation Rates. Achieve, Inc. Katsiyannis, A., Zhang, D., Ryan, J. B., & Jones, J. (2007). High-stakes testing and students with disabilities: Challenges and promises. Journal of Disability Policy Studies, 18(3), 160167. doi:10.1177/10442073070180030401 Kennelly, L., & Monrad, M. (2007). Approaches to Dropout Prevention: Heeding Early Warning Signs With Appropriate Interventions. doi:10.1037/e538292012-001 Kilgus, S. P., Methe, S. A., Maggin, D. M., & Tomasula, J. L. (2014). Curriculum-based measurement of oral reading (R-CBM): A diagnostic test accuracy meta-analysis of evidence supporting use in universal screening. Journal of school psychology, 52(4), 377405. doi:10.1016/j.jsp.2014.06.002 88 Kratochwill, T. R., Volpiansky, P., Clements, M., & Ball, C. (2007). Professional development in implementing and sustaining multitier prevention models: Implications for Response to Intervention. School Psychology Review, 36(4), 618-631. McDermott, R., Raley, J. D., & Seyer-Ochi, I. (2009). Race and class in a culture of risk. Review of Research in Education. 33, 101-116. Accessed; http://www.jstor.org/stable/40588119 McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193-203. McIntosh, K., Goodman, S., & Bohanon, H. (2010). Toward True Integration of Academic and Behavior Response to Intervention Systems: Part One--Tier 1 Support. Communiqué, 39(2), 1-14. Michigan Department of Education. (2012). Michigan Educational Assessment Program: Technical Report 2011-2012. Retrieved from http://www.michigan.gov/documents/mde/MEAP_20102011_Technical_Report_394693_7.pdf National Center on Response to Intervention (2012). RtI state database. Retrieved from http://state.rti4success.org/ National Research Center on Learning Disabilities (2006). A tiered service-delivery model. Retrieved from http://www.nrcld.org/rti_manual/pages/RTIManualSection3.pdf NCLB (2002). No Child Left Behind Act. Public Law 107-115. doi:10.1007/springerreference_223926 Neild, R. C., Balfanz, R., & Herzog, L. (2007). An early warning system. Educational leadership, 65(2), 28-33. Nigg, J. T., Hinshaw, S. P., Carte, E. T., & Treuting, J. J. (1998). Neuropsychological correlates of childhood attention-deficit/hyperactivity disorder: Explainable by comorbid disruptive behavior or reading problems? Journal of Abnormal Psychology, 107(3), 468–480. doi:10.1037/0021-843x.107.3.468 O’Connor, C., Hill, L. D., & Robinson, S. R. (2009). Who’s at risk in school and what’s race got to do with it?. 
Review of research in education, 33(1), 1-34. Osborne, J. W. (2010). Improving your data transformations: Applying the Box-Cox transformation. Practical Assessment, Research & Evaluation, 15(12), 1-9. Park, B. J., Alonzo, J., & Tindal, G. (2011). The Development and Technical Adequacy of Seventh-Grade Reading Comprehension Measures in a Progress Monitoring Assessment System. Technical Report# 1102. Behavioral Research and Teaching. 89 Peng, C. Y. J., & So, T. S. H. (2002). Logistic regression analysis and reporting: A primer. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 1(1), 31-70. doi:10.1207/s15328031us0101_04 Pinkus, L. (2008). Using early-warning data to improve graduation rates: Closing cracks in the education system. Washington, DC: Alliance for Excellent Education. Reed, D. K., & Sturges, K. M. (2012). An Examination of Assessment Fidelity in the Administration and Interpretation of Reading Tests. Remedial and Special Education, 34(5), 259–268. doi:10.1177/0741932512464580 Reed, D. K., Wexler, J., & Vaughn, S. (2012). RTI for reading at the secondary level: Recommended literacy practices and remaining questions. Guilford Press. Reinke, W. M., Herman, K. C., Petras, H., & Ialongo, N. S. (2008). Empirically derived subtypes of child academic and behavior problems: Co-occurrence and distal outcomes. Journal of Abnormal Child Psychology, 36(5), 759-770. doi:10.1007/s10802-007-9208-2 Roberts, G., Vaughn, S., Fletcher, J., Stuebing, K., & Barth, A. (2013). Effects of a Response‐ Based, Tiered Framework for Intervening With Struggling Readers in Middle School. Reading research quarterly, 48(3), 237-254. doi:10.1002/rrq.47 Saez, L., Park, B., Nese, J. F., Jamgochian, E., Lai, C. F., Anderson, D., ... & Tindal, G. (2010). Technical Adequacy of the easyCBM Reading Measures (Grades 3-7), 2009-2010 Version. Technical Report# 1005. Behavioral Research and Teaching. Salvia, J., Ysseldyke, J., & Bolt, S. (2012). Assessment: In special and inclusive education. Cengage Learning. Silberglitt, B., & Hintze, J. M. (2007). How Much Growth Can We Expect? A Conditional Analysis of R-CBM Growth Rates by Level of Performance. Exceptional Children, 74(1), 71-84. doi:10.1177/001440290707400104 Smartt, S., & Reschly, D. (2007). Barriers to the preparation of highly qualified teachers in reading. Washington, DC: National Comprehensive Center for Teacher Quality. Retrieved from http://www.ncctq.org/tqbrief.php Stage, S. A., Abbott, R. D., Jenkins, J. R., & Berninger, V. W. (2003). Predicting response to early reading intervention from verbal IQ, reading-related language abilities, attention ratings, and verbal IQ-word reading discrepancy: Failure to validate discrepancy method. Journal of Learning Disabilities, 36, 24-33. doi:10.1177/00222194030360010401 State of Michigan. Department of Education. (2011). Approval of Recommended New Cut Scores on the Michigan Educational Assessment Program (MEAP) and Michigan Merit 90 Examination (MME) Consistent with Career and College Readiness. by M. Flanagan. Lansing, MI (Memorandum). Stevenson, N. (2015). Predicting proficiency on statewide assessments: A Comparison of curriculum-based measures. Journal of Educational Research. doi: 10.1080/00220671.2014.910161 Sugai, G. & Horner, R. H. (2009). Responsiveness-to-intervention and school-wide positive behavior supports: Ingegration of multi-tiered systems approaches. Exceptionality, 17, 223-237. Sugai, G., Horner, R. H., Dunlap, G., Hieneman, M., Lewis, T. J., Nelson, C. 
M., ... & Ruef, M. (2000). Applying positive behavior support and functional behavioral assessment in schools. Journal of Positive Behavior Interventions,2(3), 131-143. doi:10.1080/09362830903235375 Suh, S., & Suh, J. (2007). Risk factors and levels of risk for high school dropouts. Professional School Counseling, 10(3), 297-306. Sullivan, C. J., Childs, K. K., & O’Connell, D. (2010). Adolescent risk behavior subgroups: An empirical assessment. Journal of youth and adolescence, 39(5), 541-562. doi:10.1007/s10964-009-9445-5 Tabachnick, B. G. & Fidell, L. S. (2013). Using multivariate statistics, 6/e. Pearson, Inc. Tindal, G., & Nese, J. F. (2011). Applications of curriculum-based measures in making decisions with multiple reference points. Advances in Learning and Behavioral Disabilities, 24, 3158. doi:10.1108/s0735-004x(2011)0000024004 Tindal, G., Nese, J. F., & Alonzo, J. (2009). Criterion-Related Evidence Using easyCBM [R] Reading Measures and Student Demographics to Predict State Test Performance in Grades 3-8. Technical Report# 0910. Behavioral Research and Teaching. Tobin, T. J., & Sugai, G. M. (1999). Using sixth-grade school records to predict school violence, chronic discipline problems, and high school outcomes. Journal of Emotional and Behavioral Disorders, 7(1), 40-53. doi:10.1177/106342669900700105 Tolar, T. D., Barth, A. E., Francis, D. J., Fletcher, J. M., Stuebing, K. K., & Vaughn, S. (2011). Psychometric properties of maze tasks in middle school students. Assessment for Effective Intervention, doi:10.1177/1534508411413913 Vanderheyden, A. M., & Tilly, W. D. (2010). Keeping RTI on track: How to identify, repair and prevent mistakes that derail implementation. LRP Publications. Vanneman, A., Hamilton, L., Anderson, J. B., & Rahman, T. (2009). Achievement Gaps: How Black and White Students in Public Schools Perform in Mathematics and Reading on the 91 National Assessment of Educational Progress. Statistical Analysis Report. NCES 2009455. National Center for Education Statistics. Vaughn, S., Cirino, P. T., Wanzek, J., Wexler, J., Fletcher, J. M., Denton, C. D., … Francis, D. J. (2010). Response to Intervention for Middle School Students With Reading Difficulties: Effects of a Primary and Secondary Intervention. School Psychology Review, 39(1), 3– 21. Vaughn, S., & Fuchs, L. S. (2003). Redefining learning disabilities as inadequate response to instruction: The promise and potential problems. Learning Disabilities Research & Practice, 18(3), 137-146. doi:10.1111/1540-5826.00070 Vaughn, S., Wexler, J., Leroux, A., Roberts, G., Denton, C., Barth, A., & Fletcher, J. (2012). Effects of intensive reading intervention for eighth-grade students with persistently inadequate response to intervention. Journal of Learning Disabilities, 45(6), 515-525. doi:10.1177/0022219411402692 Wanzek, J., Vaughn, S., Roberts, G., & Fletcher, J. M. (2011). Efficacy of a reading intervention for middle school students with learning disabilities. Exceptional Children, 78(1), 73–87. Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26(4), 207-214. doi:10.1177/07419325050260040301 Yeo, S. (2009). Predicting performance on state achievement tests using curriculum-based measurement in reading: A multilevel meta-analysis. Remedial and Special Education, 31(6), 412–422. doi:10.1177/0741932508327463 92