Table 6.3: Estimated Effect of High School-Level Teacher Retention on Average School-Level Student Mobility
Table 6.4: Sensitivity Analyses using Mantel-Haenszel Bounds for Effect on Student Mobility: Nearest Neighbor Matching
Table 6.5: Sensitivity Analyses using Mantel-Haenszel Bounds for Effect on Student Mobility: Stratification Method
Table 6.6: Comparison of School and Student Characteristics Between High- and Low-Retention Case Study Schools, and Between Case Study and Non-Case Study Schools
Table 7.1: OLS Regression Predicting Mathematics Teacher Undersupply
Table 7.2: Logistic Regression Predicting Significant Mathematics Teacher Undersupply
Table 7.3: OLS Regression Predicting ELA Teacher Undersupply
Table 7.4: Logistic Regression Predicting Significant ELA Teacher Undersupply
Table 7.5: OLS Regression Predicting Science Teacher Undersupply
Table 7.6: Logistic Regression Predicting Significant Science Teacher Undersupply
Table 7.7: Hierarchical Linear Model Predicting Student Mathematics Achievement as a Function of Mathematics Teacher Undersupply
Table 7.8: Hierarchical Linear Model Predicting Student English Language Arts Achievement as a Function of English Language Arts Teacher Undersupply
Table 7.9: Hierarchical Linear Model Predicting Student Science Achievement as a Function of Science Teacher Undersupply
Table 7.10: Comparison of Main Effects for Math Undersupply on Student Mathematics Achievement Using Different Demand Formula Assumptions
Table 7.11: Comparison of Main Effects for English Language Arts Undersupply on Student ELA Achievement Using Different Demand Formula Assumptions
Table 7.12: Comparison of Main Effects for Science Undersupply on Student Science Achievement Using Different Demand Formula Assumptions
Appendix A: Complete List of Variable Descriptions and Summary Statistics

LIST OF FIGURES

Figure 3.1: Comparison of Racial Composition of Michigan’s Teaching Force to Michigan and the USA
Figure 4.1: Comparison of Mathematics Demand with Varying Enrollment
Figure 4.2: Comparison of Mathematics Undersupply with Different Enrollment
Figure 4.3: Comparison of Demand Estimates Under Four Estimation Scenarios
Figure 4.4: Comparison of Undersupply Under Four Estimation Scenarios
Figure 5.1: Conceptual Model for the Relationship Between Teacher Retention, Student Mobility, and Student Achievement Outcomes
Figure 5.2: Distribution of School-Level Teacher Retention Rates
Figure 5.3: Timing of the Longitudinal Data for Establishing a Causal Path
Figure 6.1: Box and Whisker Plot to Demonstrate Distribution of Weights
CHAPTER 1: TEACHER SUPPLY, DEMAND AND UNDERSUPPLY: BACKGROUND TO THE PROBLEM

Ensuring an adequate supply of qualified teachers to meet instructional demands is an area of critical importance in education policy research. Teacher supply became a focus of research in the early 1980s. At that time, two key demographic trends, increasing student enrollments and increasing teacher retirements, were expected to cause a teacher shortage (Darling-Hammond, 1984; Ingersoll, 2001). Some of the early research on the topic reflected growing concern about rising enrollments and simultaneous teacher retirements. This work sought to establish methods and procedures for estimating supply and demand accurately (Boe & Gilford, 1992; Arnold, Choy, & Bobbitt, 1993). The decades since have been devoted to several tasks:
- quantifying the size of the teacher supply (Ingersoll & Perda, 2009);
- producing more rigorous methods for estimating supply (Santiago, 2002);
- addressing issues inherent in supply, such as teacher attrition and retention (Borman & Dowling, 2008; Dolton & van der Klaauw, 1999; Harris & Adams, 2007; Imazeki, 2005; Shen, 1997; Stinebrickner, 1998, 2002);
- and examining teacher labor market matching (Boyd, Lankford, Loeb, & Wyckoff, 2006; Stinebrickner, 2001).

Teacher demand has been influenced in part by the advent of merit curricula. The No Child Left Behind legislation (2001) increased the rigor expected of schools. To meet it, states have focused resources on ensuring that teachers teach in-field, and have raised standards for both teacher quality and curricular requirements. One example of such curricular change is expanded graduation requirements, such as increasing the number of mathematics courses required to graduate. Such requirements help equip students for the transition to postsecondary education and the labor market.
This focus continues in other nationwide efforts, such as the Common Core Standards and the American Diploma Project. Both seek to raise standards and expectations so that students finish high school with the rigorous preparation necessary for success in postsecondary education and the workforce. All of these changes in curricular requirements affect the demand for teachers. They change both the sheer number of teachers needed and the need for teachers trained in specific skill sets.

Finally, the current economic climate and its dramatic “shocks” to the system make predicting teacher supply and demand even more problematic. The American teaching force is not necessarily structured to respond quickly to sudden systemic shocks. Recent economic developments have brought significant changes in student population and enrollment, with some states losing students rapidly due to out-migration. There are also new issues on the demand side: schools may need more teachers but may be severely restricted by funding challenges.

Teacher labor supply can be understood in terms of three major components: demand, supply and retention, and the impact of undersupply. This dissertation addresses the overall question of teacher labor supply and its impact on student achievement and mobility by focusing on these three areas.

To estimate teacher demand, a previously developed and utilized demand formula is tested and refined. The refined formula is then used in undersupply analyses. The measure of teacher supply in the original formula accounted for the number of FTEs in each subject area in each school. However, it did not take into account teacher “churn rates.” For a school with undersupply, or even with an adequate supply, what are the teacher churn rates? Is there a revolving door of teacher entry and departure that directly affects student achievement and mobility, and that interacts with teacher undersupply?
Finally, using the revised demand formula and treating teacher retention as a component of teacher supply, what is the impact of school-level teacher undersupply on student achievement outcomes?

This study conceptualizes teacher demand, supply, and undersupply from an organizational perspective. Teacher labor supply at the school level is treated as an organizational characteristic of each high school. The study investigates the impact of this characteristic on the effectiveness of the organization itself, as measured by student achievement.

Much of the work on teacher demand, supply, and undersupply has been conducted at the state or national level, using nationally representative data sets. In fact, much of the seminal work on teacher supply used surveys such as the Schools and Staffing Survey (SASS), and this work suggested an upcoming national teacher shortage (Boe & Gilford, 1992; Darling-Hammond, 2000; Ingersoll, 2001, 2003). It is important to understand overall teacher labor supply and demand at the national and state levels. However, this approach can mask important differences between individual schools. By contrast, the studies presented here all begin by conceiving of teacher supply, demand, and undersupply as school organizational characteristics. They use Michigan state administrative data to account for the distinct organizational factors related to student achievement and to each component of teacher labor supply (demand, supply, and undersupply).

As an organizational characteristic of the school, teacher supply and demand can affect a school’s ability to build trust among its members and to develop a sense of community. This perspective draws on the sociology of education. That field has shown that a sense of community and cohesion among families, teachers, and students is important for the success of schools (e.g., Bryk & Schneider, 2002; Durkheim, 1961; Grant, 1988; Parsons, 1959; Rosenholtz, 1989; Waller, 1932).
A body of evidence suggests that the community of the school has important implications for school performance and effectiveness (Bryk, Lee, & Smith, 1990; Coleman & Hoffer, 1987; Rosenholtz, 1989). Specifically, the communal nature of certain types of schools, such as private schools, can create an environment that reinforces shared values and leads to higher levels of social capital (Coleman & Hoffer, 1987). This sense of community is created by shared values, trust, and reciprocity among individuals in the school (Bryk & Schneider, 2002). Schools with a stronger sense of community and cooperation between students and teachers have been shown to have higher achievement levels. They also show a more equitable distribution of achievement (Lee, Bryk, & Smith, 1993; Lee & Smith, 1997).

How does teacher labor supply relate to the organizational culture of the school? More specifically, how might teacher labor supply, as an organizational characteristic of a school, be expected to affect student achievement?

Prior research has shown that much of the variation in student achievement is due to within-school, rather than between-school, factors. Therefore, to estimate supply and demand and their impact on student achievement, the most appropriate level of analysis is the school. Teacher demand will vary by school and subject, particularly when it is understood in the context of statewide graduation requirements, and so will teacher supply.

An important corollary to teacher supply is teacher turnover, since turnover likely affects the relationship between student achievement and teacher supply. Schools with high rates of turnover may have a more disorganized culture. In turn, this disorganization may decrease student achievement. It may also increase student mobility, as students respond to the disorganized nature of the school and leave in greater numbers.
There may also be an indirect effect of teacher turnover on student achievement via student mobility. Teacher turnover may lead to increased student turnover, which is known to be linked to decreased achievement outcomes.

The third component of teacher labor supply is undersupply. School-level undersupply in each subject is hypothesized to have a negative impact on student achievement outcomes.

Addressing teacher supply and demand using state administrative data

Since 2005, the Institute of Education Sciences (IES) has awarded statewide longitudinal data systems (SLDS) grants to states under the Educational Technical Assistance Act of 2002. The purpose of these grants is to help states manage, analyze, and use education data. The awards so far are as follows:
- in 2005, fourteen states were awarded grants;
- in 2007, 12 additional states and the District of Columbia received grants;
- under the American Recovery and Reinvestment Act of 2009, 27 states were awarded an additional $265 million to further the work of their longitudinal data systems.

As a result of this substantial federal effort, many states now have rigorous, extensive, longitudinal data on students and teachers (Institute of Education Sciences, n.d.). Moreover, several federal initiatives all require that states, districts, and schools use the data they collect to inform policy and support decision making. These include the SLDS grants, NCLB, the new requirements under the American Recovery and Reinvestment Act, and the Race to the Top competition.

One potentially fruitful use of state administrative data is the study of teacher labor supply. These teacher data are comprehensive. It is possible to know who applies for a license, who is granted one, and in what areas those licenses are held, along with certain demographic information on those individuals.
This helps to address a recurring concern in teacher supply research. Data from SASS or other nationally representative data sets do not allow direct estimation of the absolute size of a teacher shortage (Murphy, DeArmond, & Guin, 2003). Ingersoll and Perda (2009) point out that one of the problems with estimating teacher supply is a lack of comprehensive data on “entry, licensing, and preemployment preparation” (p. 3).

The state of Michigan, via the Center for Educational Performance and Information (CEPI), collects and reports information on the entire teaching workforce. This information is stored in the Registry of Educational Personnel (REP). This database includes employment records for every individual hired by any educational entity in the state of Michigan. The data are longitudinal. They have been collected in the REP since 2002, but have become more rigorous and reliable within the last four years. This data source is discussed in greater detail in Chapter 2.

Michigan also has a wealth of student data available to the research community. Student demographic and achievement data are available from the 2003-2004 school year to the present, linked by a unique identifier code.1 The achievement data include student test scores and performance levels on two state-administered tests:
- the Michigan Educational Assessment Program (MEAP), for grades 3-8;
- the Michigan Merit Examination (MME), for high school.

These data are also discussed in greater detail in Chapter 2.

1 Student demographic and achievement data are available prior to 2003-2004. However, they are not linked by a unique identifier code, as this code was not in place before 2003-2004.

These data are well suited to analyzing this question for several reasons. First, they are universe data, and include all students, teachers, and schools in the state of Michigan.
This removes a great deal of uncertainty about generalizing results from a sample to a larger population.2 Second, the data include all teachers. Any observed attrition therefore indicates a teacher leaving the Michigan teaching workforce, rather than leaving a study sample. This makes it possible to obtain more accurate information about the movement of teachers between schools and districts within the state, although teachers moving out of state are still difficult to track.

Finally, Michigan, like many states, has invested significant resources in the construction, maintenance, and improvement of its longitudinal data systems. The state is vitally interested in using these data more frequently to address questions related to student achievement and statewide policy. Michigan’s recent Race to the Top Round 2 application focused heavily on making data available to researchers. It also focused on using data to drive instruction, target professional development, and inform policy (Michigan’s Race to the Top application, 2010). For this reason, these data are a timely and critical resource for academic research that can inform the state while also supporting its efforts in this area. Given the national interest in using state administrative data to address questions in educational policy, this study also contributes to the national discussion of these issues.

2 Although the data are universe data in one sense, they still represent a sample, albeit a very well-defined one. These data can be considered one of the possible populations of teachers, students, and schools that could represent Michigan. One would expect the sampling error to be very small to negligible. However, from a theoretical point of view, it is generally understood that one never has a true population, even when in possession of very complete data.

Foundations for the Dissertation: Is there an adequate supply of teachers to meet the Michigan Merit Curriculum?

In 2006, Michigan instituted the Michigan Merit Curriculum (MMC). For high school graduation, it requires four years of mathematics and English, three years of science and social studies, and two years of world language (Michigan Department of Education, 2006). The state needed a feasible, easily used formula for estimating where the demand might be located, both to inform policy and to target resources. The method was designed to help the state decide whether it has enough teachers to accommodate the increased graduation requirements. It needed to meet three requirements:
(1) it needed to be easily manipulated;
(2) it needed to be completed in a short period of time;
(3) it needed to use existing data.

The following method meets these three requirements.

To estimate the number of teachers needed to fill the FTEs prescribed by the demands of the Michigan Merit Curriculum, I developed the following formula:3

$$D_i = \frac{a\,x_i}{y\,z} \qquad (1)$$

where

$D_i$ = number of teachers needed to meet graduation requirements in a subject area in school $i$
$a$ = proportion of the student body that needs to be enrolled in the subject each year in order to meet graduation requirements
$x_i$ = total student enrollment in school $i$ (including special education students)
$y$ = class size (assumed to be 25)
$z$ = number of periods taught per FTE per day (assumed to be 5)4

3 See Keesler, Wyse, & Jones (2008b) on the IES website (http://ies.ed.gov/ncee/edlabs/regions/midwest/pdf/techbrief/tr_00508.pdf). This formula has been vetted by IES and is considered a promising tool for use by other states. The report was a collaborative effort, but the development of the formula was an individual contribution on my part. This was also part of a technical report under the REL-Midwest collaboration (Keesler, Wyse, & Jones, 2008a).

4 The formula and related assumptions are discussed in greater detail in Chapter 4, Estimating Teacher Demand.

A key benefit of this demand formula is that the practitioner can adjust it to meet the specific conditions of a state, district, or school. Several parameters can be changed:
- class size can be increased or decreased;
- the number of periods taught per FTE can be increased or decreased;
- enrollment can be adjusted;
- specific curricular requirements of the school (beyond those required by the state) can be factored in.

One of the main objectives of the dissertation is to produce a formula that is rigorous yet easily used by practitioners. Such a formula allows them to be more strategic and cost-efficient with resources, and to use available data in a real-time, meaningful manner.

Calculating Supply

To calculate supply, it is necessary to have access to data on teacher and building assignments. These are compliance data, i.e., information that each state is required to collect in order to receive federal funding. Supply is calculated by summing FTEs in each subject assignment within each school.

Subject assignments are recorded in FTEs (full-time equivalents), that is, the proportion of a full-time position assigned to a particular task. For example, a teacher may teach mathematics for half of the day and physics for the other half. In that case, her assignment would be reported as 0.5 FTE mathematics and 0.5 FTE science. Similarly, a teacher may have both a teaching and an administrative assignment. It is therefore important to calculate supply by the total FTE within each subject area within each school, not by the number of “teachers.”

Calculating Undersupply

Once demand has been estimated and supply calculated for each school, supply can be subtracted from demand. Positive values indicate an undersupply of FTEs in a given subject area.
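The demand, supply, and undersupply steps described above can be sketched in a few lines of Python. Here, teachers needed = a × enrollment / (class size × periods per FTE), supply is FTEs summed by school and subject, and undersupply is demand minus supply. This is a minimal illustration under stated assumptions, not the procedure used in the REL-Midwest reports. The function names, the school identifier "A", the enrollment figure, and the assignment records are all hypothetical. We also assume a = 1.0 for mathematics, reading the MMC's four-year requirement as every student taking mathematics every year.

```python
from collections import defaultdict

def demand_fte(a, enrollment, class_size=25, periods_per_fte=5):
    """Teachers (in FTEs) needed in one subject at one school: a * x_i / (y * z)."""
    sections_per_day = a * enrollment / class_size  # class sections the subject requires each day
    return sections_per_day / periods_per_fte       # each FTE teaches periods_per_fte sections

def supply_fte(assignments):
    """Sum assignment FTEs by (school, subject); records are (school, subject, fte) tuples."""
    totals = defaultdict(float)
    for school, subject, fte in assignments:
        totals[(school, subject)] += fte
    return dict(totals)

def undersupply(demand, supply):
    """Demand minus supply for each (school, subject); positive values mean undersupply."""
    return {key: need - supply.get(key, 0.0) for key, need in demand.items()}

# Hypothetical school "A": 1,000 students, default class size 25 and 5 periods per FTE.
demand = {("A", "math"): demand_fte(a=1.0, enrollment=1000)}  # 1000 / (25 * 5) = 8.0
# Five full-time math teachers plus one teacher split 0.5 math / 0.5 science.
records = [("A", "math", 1.0)] * 5 + [("A", "math", 0.5), ("A", "science", 0.5)]
supply = supply_fte(records)  # {("A", "math"): 5.5, ("A", "science"): 0.5}
print(undersupply(demand, supply))  # {('A', 'math'): 2.5}
```

The practitioner adjustments mentioned above would enter such a sketch through the keyword parameters. For example, `demand_fte(1.0, 1000, class_size=20)` returns 10.0 rather than 8.0.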
The researcher should then designate a cut point at which undersupply is considered significant. This is the point where the shortfall is likely affecting practice and cannot easily be covered by adjusting teaching assignments, reconfiguring classes, or other administrative tactics. Here, undersupply of more than one FTE is designated as significant undersupply. In the Michigan study, undersupply was calculated separately for each subject, to account for variation in the availability of qualified teachers across subjects. This method of estimating supply and demand is known as a behavioral model. Such a model can be used to answer questions about the effects of policy changes by linking demand and supply estimates to relevant conditions and policies (Boe & Gilford, 1992).

In Michigan in 2007, the following numbers of high schools were undersupplied (Keesler, Wyse, & Jones, 2008a, 2008b):
- 223 in mathematics;
- 64 in English language arts (ELA);
- 41 in science;
- 39 in social studies;
- 52 in both mathematics and ELA;
- 9 in all four core areas.

Undersupply appears to be more prevalent in mathematics and ELA. This is most likely a consequence of the MMC’s new requirement that students take four years of each.5

There appears to be an association between undersupply and failing to make adequate yearly progress (AYP). Many undersupplied schools do meet their AYP requirements, yet undersupply and failing AYP still appear to be related. However, failing to meet AYP targets could be more a function of a school’s demographic profile than of whether it is undersupplied. The analyses presented here cannot determine whether meeting AYP is a function of school demographics, of undersupply, or of both.
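The cut-point rule can be sketched as follows. So can the kind of counts reported in this section: schools significantly undersupplied in a subject or in several subjects, and the students enrolled in those schools. This is an illustrative sketch under stated assumptions. The function names and every number in it are hypothetical. "Significant" is implemented as a gap strictly greater than one FTE, following the cut point stated in the text.

```python
def significant_undersupply(gaps, cut_point=1.0):
    """(school, subject) pairs whose undersupply strictly exceeds cut_point FTEs."""
    return {key for key, gap in gaps.items() if gap > cut_point}

def schools_flagged_in_all(flagged, subjects):
    """Schools significantly undersupplied in every subject listed."""
    schools = {school for school, _ in flagged}
    return {s for s in schools if all((s, subj) in flagged for subj in subjects)}

def students_affected(schools, enrollment):
    """Total enrollment across a set of schools (each school counted once)."""
    return sum(enrollment[s] for s in set(schools))

# Hypothetical undersupply values (demand minus supply, in FTEs) and enrollments.
gaps = {("A", "math"): 2.5, ("A", "ela"): 1.2, ("B", "math"): 0.8, ("C", "ela"): 1.0}
enrollment = {"A": 1000, "B": 600, "C": 450}

flagged = significant_undersupply(gaps)  # C's 1.0 FTE gap is not "greater than one"
both = schools_flagged_in_all(flagged, ["math", "ela"])
print(sorted(both), students_affected(both, enrollment))  # ['A'] 1000
```

Because the cut point is a parameter, a state or district could check how these counts change under a stricter or looser definition of significant undersupply.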
Finally, the number of undersupplied schools is relatively small, but the number of students affected by undersupply is not insignificant. For example, 72,798 students attend the 61 schools that were undersupplied in mathematics and ELA in 2007.

5 We suspect that many of the schools are undersupplied in mathematics and ELA partly because of earlier requirements. According to multiple school websites, graduation requirements prior to the MMC often required students to take 2-3 years of mathematics and 2-3 years of ELA.

Investigating Teacher Demand, Supply and Turnover, and Undersupply Using State Administrative Data: Dissertation Outline

The method outlined above for calculating teacher demand, supply, and undersupply forms the foundation for this dissertation. This method is unique in several ways.

First, it uses state administrative data. As described above, these data offer unique advantages over nationally representative data sets, including full specification of an entire teaching population.

Second, supply and demand calculations are performed on a school-by-school basis. Both teacher demand and supply are local in nature. Calculating them at the school level therefore increases precision and allows for a fine-grained analysis of the issue. By contrast, much of the work on teacher supply and demand produces estimates at the state or national level. That level is far too aggregated to show the intricacies of supply and demand locally. See, for example, the most recent NCES publication using the SASS Teacher Follow-up Survey (Marvel, Lyter, Peltola, Strizek, Morton, & Rowland, 2007).

Third, following Ingersoll, I view supply, demand, undersupply, and turnover as organizational characteristics of schools. This view suggests that there are school-level differences and interactions between the organization itself and these factors. Those differences and interactions are critically important to understanding the import of these factors.
Finally, this method is administratively useful as well as policy-relevant. A state or district can change the assumptions to reflect local conditions more accurately. The analysis can also be run by non-technical staff who want to generate supply and demand calculations for their district or school.

This dissertation generally follows a three-essay model, although it focuses more on developing a coherent overall empirical narrative than on three standalone essays. Each analysis is necessary for the subsequent analyses. The three are therefore critical components of the overall story of school-level teacher labor supply and its relationship to student achievement in the state of Michigan. The remaining chapters are organized as follows:
- Chapter 2 presents, in greater detail, the data and analytic methods used in the dissertation.
- Chapter 3 provides descriptive analyses of three topics: the teaching workforce in Michigan at the individual level; the relationship between school-level teacher retention rates and school characteristics; and the relationship between school-level undersupply and school characteristics.
- Chapter 4 introduces the demand formula, presents a series of modifications, and then tests the original formula against the modified formula.
- Chapter 5 undertakes a multilevel analysis of the impact of school-level teacher retention on student achievement and student mobility.
- Chapter 6 extends this analysis into a quasi-experimental framework to estimate the effect of school-level teacher retention as a treatment.
- Chapter 7 estimates the impact of school-level teacher undersupply on student achievement and student mobility. It uses the findings from the demand formula re-estimation (Chapter 4) and integrates the school-level teacher retention rate (Chapter 5).
- Chapter 8 presents discussion and conclusions, as well as suggested policy implications and next steps.
CHAPTER 2: DATA, METHODS, AND MEASURES

This chapter presents an overview of the data and sample, methods, and variables used throughout this dissertation. Most of the detail on these elements is presented here and referenced in the appropriate chapters.

Data

This dissertation uses data from several Michigan administrative data sources:
(1) the Michigan Registry of Educational Personnel (REP);
(2) the Michigan Single Record Student Database (SRSD);
(3) Michigan assessment data;
(4) Michigan teacher data.

Each of these sources is discussed below.1

1 All data were obtained from Michigan’s Center for Educational Performance and Information, and from the Office of Educational Assessment and Accountability in the Michigan Department of Education. These data are available to researchers who submit appropriate applications and research plans and undergo human subjects review by a state review board.

These data represent an important, relatively new source of data for education research. Federal and state governments have spent vast sums of money developing and implementing these systems and ensuring that they contain quality data. The research community should begin making use of these data, both in partnership with the states themselves and in individual research.

These data are markedly different from other large-scale data sets. As they contain information on every student, teacher, and school in the state, they are universe data.2 While they contain an immense number of observations, they can be sparse in the number and types of variables. A researcher can be creative, as was done here, by constructing variables and linking to other data sets. However, unlike studies such as those produced by NCES, these data do not contain multiple measures of constructs such as social issues, expectations and plans, or family background.

2 Although the data are universe data in one sense, they still represent a sample, albeit a very well-defined one. See footnote 2 in Chapter 1 for a discussion of this issue.

Missing data is a different issue as well. In a nationally representative survey, missing data can be critical: it matters who is missing and to what extent that biases the findings. In these data sets, there is little to no missing data, because many of these fields are required collections for which missing values are not allowed. When data are identified as “missing,” it is necessary to revisit the variable. The researcher must determine whether a missing value truly means “no response,” or whether the variable would not be expected to have a value for that student, teacher, or school.

Finally, this work specifically uses Michigan administrative data. Nevertheless, both the findings and the lessons learned from working with these types of state data are expected to apply in other state contexts.

The Registry of Educational Personnel (REP)

The Registry of Educational Personnel is Michigan’s longitudinal administrative database containing information on all educational personnel in the state of Michigan. It includes demographic, teaching assignment, and licensure and endorsement information for all teachers. This study uses teacher data from 2005-2008.
These data can be used in several ways:
- to define the population of teachers in a given year and describe their demographic and school characteristics;
- to study the relationship between licenses, endorsements, and subjects taught, in order to test for out-of-field teaching;
- to examine the distribution of license types across types of teachers and types of schools.3

3 The REP is stored in a relational file database. Creating research files for analysis therefore requires “unstacking” the files to the appropriate level. The technical reports produced by REL-Midwest provide further detail on the mechanics of this process.

The demographic information on teachers includes the following variables: age, gender, race, highest education degree, major and minor, higher education institution attended, and whether or not a teacher is “highly qualified.”4,5 Teaching assignment data provide information on each assignment held by a given teacher, including the subject, the number of FTEs taught for that assignment, and the number of classes taught. The license and endorsement portion of the database provides information on the type of license a teacher holds (provisional, professional, vocational, administrative, etc.). It also records the date the license was issued and the specific endorsements the teacher holds.6 There are over 2,000 types of endorsements, covering every subject area. Teachers can hold up to seven endorsements, although most hold between one and three (Lynn & Keesler, 2008).

The database includes a field for teacher preparation institution, but approximately 40% of its values are missing. Older teachers did not submit this information when they entered the system, and the field reports only the most recent degree. There is also a highly qualified field. However, 98% of teachers are reported as highly qualified, so this field is not especially informative. The assignment information, by contrast, is extensive, covering the subject taught and the number of FTEs in each subject.

4 Not all of the fields are “compliance fields,” and districts are not required to submit information for non-compliance fields. Age, race, gender, and highest education are all compliance fields, and each is missing less than 2% of its values. Non-compliance fields such as major/minor, however, are approximately 40% missing. These variables should be used with caution, with an understanding of their limitations.

5 “Highly qualified” indicates that a teacher has at least a bachelor’s degree and has passed the Michigan teacher certification exams in both basic skills and the subject area taught. Alternatively, the teacher may have demonstrated subject-area-specific knowledge and training, such as graduate coursework in that subject or National Board certification (Watkins, 2003).

6 This is known as the License 2000, or L2K, database, which is now fully contained in the REP.

The Single Record Student Database (SRSD)7

The SRSD is Michigan’s student database. It contains information on student demographics, such as gender, date of birth, and racial/ethnic identification, along with personally identifying information. Michigan can link all student demographic and assessment data using a unique identifier code (UIC), which enables longitudinal analyses of the data. The SRSD is collected annually by all intermediate school districts and submitted to the state of Michigan to comply with federal reporting requirements.

7 Beginning in 2010, the SRSD will be replaced with the Michigan Student Data System (MSDS), with appropriate crosswalks between the two systems.

Student Assessment Data

Michigan, like most states, has a statewide assessment system, known as the MEAS (Michigan Educational Assessment System).
The system includes yearly testing in reading and mathematics in grades 3-8 and in one high school grade (grade 11), alternative assessments for students with moderate to severe cognitive impairments and for second-language learners, and separate assessments for elementary and middle schools and for high schools.8 The high school test is the Michigan Merit Examination (MME), a three-day test administered to all juniors that covers mathematics, English language arts, science, social studies, and reading, and that also includes the intact ACT and WorkKeys tests. When measuring the relationship between the main predictors of interest in this study (school-level teacher retention and undersupply) and student achievement and mobility outcomes, it is desirable to have a pretest measure of student achievement, as this is an important covariate in regression analyses. Cook, Shadish, and Wong (2008) find that “we can trust estimates from observational studies that match intact treatment and comparison groups on at least pretest measures of outcome.” For an observational study such as this one, including a pretest measure of student ability is critical to reducing the chance that the inferences reflect bias.

8 In compliance with NCLB, Michigan assesses mathematics and reading yearly in grades 3-8 and grade 11. Other subjects must be assessed once per level (elementary, middle, and high school). Writing is assessed in grades 4 and 7. Science is assessed in grades 5, 8, and 11. Social studies is assessed in grades 5, 6, and 9.

However, Michigan’s assessments are not vertically scaled. As a consequence, the score that a student receives on the 8th grade achievement test is not on the same scale as the 11th grade test, and methods such as gain scores or regression-based growth models are not well supported by the data.
When a prior score is used as a predictor, however, the assessment data can be used in this format: although the scores are not on a common scale, the earlier score can serve as a reasonable prior measure of ability as an independent variable in regression-based analyses. Therefore, this analysis uses student scores from one administration of the Michigan Merit Examination (Spring 2009), together with a pretest score from each student’s most recent prior assessment, the 8th grade MEAP.9

School Level Data
Although the teacher data file does not include school information, rich school-level data can be linked in using the Michigan school code. This school code has a corresponding NCES identification number, which allows data from the Common Core of Data and other NCES sources to be linked. For this study, the school data file is constructed from the state of Michigan’s Educational Entity Master, the master file that defines every educational entity in the state and its current status (i.e., open or closed). State-released files on free and reduced lunch, pupil headcount, AYP, and school accreditation grade are linked in, as is information from the Common Core of Data. Only high schools are included in the school file.

9 A better outcome measure for tracking student achievement growth is a measure developed by the state of Michigan that assigns each student one of five performance level change indicators based on their achievement scores from year to year: maintaining, increasing, significantly increasing, decreasing, or significantly decreasing. However, these data are not available for high school students, because of concerns about constructing this measure from the high school test (the MME) and the last test prior to high school (the 8th grade MEAP). These tests are administered three years apart, and their scales and constructs are very different. Middle and elementary school content is determined by the Grade Level Content Expectations (GLCEs), while high school content is determined by the Michigan Merit Curriculum. Therefore, calculating the performance level change categories seems inadvisable.

Sample
Although data are available on all teachers, students, and schools, the sample for this dissertation is restricted to high schools. The Michigan Merit Curriculum is a set of high school graduation requirements, so analyzing teacher supply, demand, and undersupply in relation to these requirements requires analyzing high schools only. Only high schools that are local education agency (LEA) schools or public school academy (PSA) schools are analyzed; in other words, standard public high schools and charter high schools. Of those schools, I analyze only those defined as “regular” schools, as opposed to alternative or other types of schools such as special education centers. This is due to concerns about generalizability: alternative schools consist of unique configurations of students and teachers and are fundamentally different from “regular” schools.10 Schools with a retention rate of zero are retained in the analysis only if they are not closed schools, with the exception of a handful of schools that were closed for restructuring for one year and reopened the following year. Schools with fewer than three teachers are also excluded, because a retention rate calculated from fewer than three teachers yields skewed results. This yields a sample of 580 high schools, which are used in these analyses.11
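The school-level restrictions above can be expressed as a single filter. This is a sketch under stated assumptions: the `School` fields and their codings (for example `entity_type` values "LEA"/"PSA") are hypothetical, and the zero-retention rule encodes one reading of the text, namely that closed schools with zero retention are dropped unless they reopened after a one-year restructuring closure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class School:
    """Hypothetical school record; field names and codes are illustrative."""
    code: str
    is_high_school: bool
    entity_type: str      # assumed coding: "LEA" or "PSA" for in-scope schools
    school_type: str      # e.g. "regular", "alternative", "special education"
    n_teachers: int
    retention_rate: float
    closed: bool
    reopened_after_restructuring: bool = False


def in_analytic_sample(s):
    """Apply the sample rules: regular LEA/PSA high schools with >= 3 teachers,
    dropping closed zero-retention schools unless they reopened."""
    if not s.is_high_school:
        return False
    if s.entity_type not in {"LEA", "PSA"}:
        return False
    if s.school_type != "regular":
        return False
    if s.n_teachers < 3:  # rates based on fewer than three teachers are skewed
        return False
    if s.retention_rate == 0 and s.closed and not s.reopened_after_restructuring:
        return False
    return True


# Usage with made-up schools.
schools = [
    School("A", True, "LEA", "regular", 40, 0.80, False),
    School("B", True, "PSA", "regular", 2, 0.50, False),
    School("C", True, "LEA", "alternative", 20, 0.70, False),
    School("D", True, "LEA", "regular", 15, 0.0, True),
    School("E", True, "LEA", "regular", 15, 0.0, True, True),
]
kept = [s.code for s in schools if in_analytic_sample(s)]
```

Keeping each rule as a separate early return makes it easy to count how many schools each restriction removes.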
10 In a separate analysis of teacher retention in alternative high schools only, I find that higher teacher retention rates are associated with lower student achievement, although this effect is not statistically different from zero.
11 The sample for both the school-level teacher retention and the school-level undersupply analyses is defined by the decisions about retention rates described above. It was necessary to analyze the same population of schools for both analyses, because the relationship between retention and undersupply was also investigated. Additionally, the restrictions placed on the sample for the retention analyses served primarily to limit the sample to traditional high schools, for which the demand formula is most neatly applied. However, it is possible to redefine the sample and conduct undersupply analyses on a different subpopulation of high schools.

Teachers were included in the population of instructional staff for a given year based on whether or not they had an “instructional” subject assignment, as defined by the state of Michigan. All instructional assignment records were tagged, and these were then flattened into a person-level file for each year. The population of teachers was 111,974 in 2005; 111,055 in 2006; 109,915 in 2007; and 107,793 in 2008. For the student population, only students with both a valid, non-zero MME score and a valid, non-zero MEAP score were retained; students who did not have a matching pretest score, had a score of zero, or took MI-Access (Michigan’s alternative assessment for students with moderate to severe cognitive difficulties) were excluded from the analysis. This yields a sample of 96,556 students.

Methods: Multilevel Modeling and Propensity Score Matching
This dissertation focuses on two relationships: (1) the relationship between school-level teacher retention and student achievement and mobility outcomes, and (2) the relationship between school-level teacher undersupply relative to the Michigan Merit Curriculum and student achievement, taking into account the school-level “churn” rate. To investigate these relationships, two methodological strategies are used: multilevel models and propensity score matching. Multilevel modeling, particularly with longitudinal data, allows for the estimation of the impact of school- and student-level characteristics on given outcomes. Multilevel models are used in tandem with propensity score matching techniques to provide additional evidence regarding the observed relationship and to test for a relationship in a quasi-experimental framework. Below, I present an overview of each strategy, including a brief discussion of why that strategy was selected; the methods are then referenced in their appropriate chapters. The analysis of teacher retention uses multilevel modeling and propensity score matching, each with sensitivity analyses, to examine the relationship among school-level teacher retention, student mobility, and student achievement, while the analysis of school-level undersupply uses multilevel modeling with sensitivity analyses to investigate the relationship between school-level subject-specific undersupply and student achievement.
The Multilevel Models to Be Estimated
To test the effects of the two main predictors of interest, school-level teacher retention and undersupply, on key outcomes, multilevel models (i.e., models with random effects) with high school students nested within schools are estimated. Unlike fixed-effects models, multilevel models do not simply control for the effects of organizational contexts (e.g., schools); they can simultaneously test effects at multiple levels (e.g., the effect of student prior ability at the individual level and of the teacher retention rate at the school level), as well as cross-level interactions, for example, an interaction between a student’s race or gender and the teacher retention rate of the school (see Raudenbush & Bryk, 2002). Two types of multilevel models are estimated. The first are hierarchical linear models (HLMs), which test the relationship between the key predictors (teacher retention and teacher undersupply) and the continuous student achievement outcomes. The second are hierarchical generalized linear models (HGLMs), used only in the teacher retention analysis to predict student mobility, a dichotomous outcome, as a function of teacher retention. The general structure of both types of models (specified below) is a means-as-outcomes model, in which the mean for each school is predicted by both student- and school-level characteristics. The theoretical framework of this study suggests that although the main predictors of interest, teacher retention and teacher undersupply, are school-level variables, they do not act separately from student-level characteristics, and that student performance is a function of both student- and school-level characteristics.
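To make the nesting concrete, the sketch below estimates the between-school variance $\tau_{00}$, the within-school variance $\sigma^2$, and their ratio, the intraclass correlation $\tau_{00}/(\tau_{00}+\sigma^2)$, for an unconditional random-intercept model $Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$. It uses the classical ANOVA (method-of-moments) estimator for unbalanced groups rather than the (restricted) maximum likelihood used by HLM software, so estimates will differ somewhat from HLM output; the simulated data are illustrative only. Conditional models with school-level predictors would instead be fit with a mixed-model routine, for example `statsmodels.formula.api.mixedlm` with `groups=` set to the school identifier.

```python
import numpy as np


def anova_icc(y, groups):
    """Method-of-moments estimates for Y_ij = gamma_00 + u_0j + r_ij.

    Returns (tau00, sigma2, icc). Negative tau00 estimates are truncated at 0.
    """
    y = np.asarray(y, dtype=float)
    labels, inverse, counts = np.unique(
        np.asarray(groups), return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    n_groups, n = len(labels), len(y)
    group_means = np.bincount(inverse, weights=y) / counts
    grand_mean = y.mean()
    ss_between = np.sum(counts * (group_means - grand_mean) ** 2)
    ss_within = np.sum((y - group_means[inverse]) ** 2)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n - n_groups)
    # Effective group size for unbalanced designs.
    n0 = (n - np.sum(counts ** 2) / n) / (n_groups - 1)
    tau00 = max((ms_between - ms_within) / n0, 0.0)
    sigma2 = ms_within
    return tau00, sigma2, tau00 / (tau00 + sigma2)


# Simulated example: 200 schools of 50 students, true tau00 = 4, sigma2 = 16,
# so the true ICC is 4 / 20 = 0.2.
rng = np.random.default_rng(0)
school = np.repeat(np.arange(200), 50)
u = rng.normal(0.0, 2.0, size=200)
y = 500.0 + u[school] + rng.normal(0.0, 4.0, size=school.size)
tau00, sigma2, icc = anova_icc(y, school)
```

The moments estimator is used here only because it is short and transparent; it targets the same variance components as the unconditional HLM.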
It is well known that much of the variation in student achievement is related to individual student background characteristics, but a growing body of literature suggests that school characteristics affect student performance as well (Rumberger & Thomas, 2000). In the teacher retention multilevel models, teacher retention predicts student achievement both directly and through a cross-level interaction with student mobility. The purpose of these models is to test whether teacher retention has a direct effect on student achievement, and whether student mobility interacts with teacher retention in this relationship. The general specification for the HLM models is as follows:

Level 1 model:
$$Y_{ij} = \beta_{0j} + \beta_{1j}(\text{student mobility})_{ij} + \boldsymbol{\beta}_j \mathbf{Z}'_{ij} + r_{ij} \qquad (1)$$
Level 2 model:
$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{teacher retention})_j + \boldsymbol{\gamma}_j \mathbf{Q}'_j + u_{0j}$$

where
$Y_{ij}$ = outcome (mathematics scale score) for student $i$ in school $j$
$\beta_{0j}$ = the mean for school $j$, represented as a function of the grand mean, student mobility, the matrix of student-level predictors, the school teacher retention rate, and the matrix of school-level predictors
$\beta_{1j}$ = coefficient for student mobility
$\boldsymbol{\beta}_j$ = vector of coefficients for school $j$
$\mathbf{Z}'$ = vector of student covariates for school $j$
$\gamma_{00}$ = grand mean (intercept)
$\gamma_{01}$ = effect of teacher retention on $\beta_{0j}$ (each school mean)
$\boldsymbol{\gamma}_j$ = vector of school-level coefficients
$\mathbf{Q}'$ = vector of school covariates
$u_{0j}$ = the residual error of $\beta_{0j}$, distributed iid $N(0, \tau_{00})$
$r_{ij}$ = level 1 (student) error term, distributed iid $N(0, \sigma^2)$

In the final teacher retention model, a cross-level interaction term is added. The specification for this model is the same as above, with the student mobility slope predicted by the teacher retention rate. The slope is not allowed to vary randomly across schools (i.e., it is fixed).
Level 1 model:
$$Y_{ij} = \beta_{0j} + \beta_{1j}(\text{student mobility})_{ij} + \boldsymbol{\beta}_j \mathbf{Z}'_{ij} + r_{ij} \qquad (2)$$
Level 2 model:
$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{teacher retention})_j + \boldsymbol{\gamma}_j \mathbf{Q}'_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11}(\text{teacher retention})_j$$

This model is modified slightly in the teacher undersupply analyses, as outlined below:

Level 1 model:
$$Y_{ij} = \beta_{0j} + \beta_{1j}(\text{student mobility})_{ij} + \boldsymbol{\beta}_j \mathbf{Z}'_{ij} + r_{ij} \qquad (3)$$
Level 2 model:
$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{undersupply})_j + \boldsymbol{\gamma}_j \mathbf{Q}'_j + u_{0j}$$

where
$Y_{ij}$ = outcome (mathematics, ELA, or science scale score) for student $i$ in school $j$
$\beta_{0j}$ = the mean for school $j$, represented as a function of the grand mean, student mobility, the matrix of student-level predictors, the school teacher undersupply rate, and the matrix of school-level predictors
$\beta_{1j}$ = coefficient for student mobility
$\boldsymbol{\beta}_j$ = vector of coefficients for school $j$
$\mathbf{Z}'$ = vector of student covariates for school $j$
$\gamma_{00}$ = grand mean (intercept)
$\gamma_{01}$ = effect of subject-specific undersupply on $\beta_{0j}$ (each school mean)
$\boldsymbol{\gamma}_j$ = vector of school-level coefficients
$\mathbf{Q}'$ = vector of school covariates
$u_{0j}$ = the residual error of $\beta_{0j}$, distributed iid $N(0, \tau_{00})$
$r_{ij}$ = level 1 (student) error term, distributed iid $N(0, \sigma^2)$

For the HGLMs in the teacher retention analysis, student mobility is a binary outcome variable; therefore, a standard level 1 multilevel model is inappropriate (Raudenbush & Bryk, 2002). Student mobility indicates whether or not a student remained in the same school from the fall of 2006, their freshman year, until the spring of 2009, when they took the high school achievement test (1 = changed schools). The general structure of the HGLM models follows below.
Level 1 structural model:12
$$\eta_{ij} = \beta_{0j} + \boldsymbol{\beta}_j \mathbf{Z}'_{ij} \qquad (4)$$
Level 2 model:
$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{teacher retention})_j + \gamma_{02}(\text{student mobility, 2007 cohort})_j + \boldsymbol{\gamma}_j \mathbf{Q}'_j + u_{0j}$$
$$\beta_{pj} = \gamma_{p0} \quad \text{for } p > 0$$

where
$\eta_{ij}$ = the log odds of remaining in the same school for student $i$ in school $j$
$\beta_{0j}$ = the mean for school $j$, represented as a function of the grand mean, the matrix of student-level predictors, the school teacher retention rate, and the matrix of school-level predictors
$\boldsymbol{\beta}_j$ = vector of coefficients for school $j$
$\mathbf{Z}'$ = vector of student covariates for school $j$
$\gamma_{00}$ = grand mean (intercept)
$\gamma_{01}$ = effect of teacher retention on $\beta_{0j}$ (each school mean)
$\gamma_{02}$ = effect of 2007 cohort student mobility on $\beta_{0j}$ (each school mean)
$\boldsymbol{\gamma}_j$ = vector of school-level coefficients
$\mathbf{Q}'$ = vector of school covariates
$u_{0j}$ = the residual error of $\beta_{0j}$, distributed iid $N(0, \tau_{00})$

12 The level 1 model in an HGLM consists of three parts: a sampling model, a link function, and a structural model. The sampling model assumes that $Y_{ij}$, given the predicted value $\mu_{ij}$, is distributed NID($\mu_{ij}$, $\sigma^2$). The level 1 predicted value, $\mu_{ij}$, can be transformed so that the predictions remain within the given interval, which produces the transformed predicted value $\eta_{ij}$. This transformed predicted value is then related to the predictors of the model through the linear structural model. Combining the sampling model, link function, and level 1 structural model reproduces the familiar level 1 HLM model (Raudenbush & Bryk, 2002). The level 1 variance is now heteroskedastic.

Analytic Strategy for Multilevel Models
A baseline for all multilevel models is established by estimating an unconditional random-effects ANOVA. This allows for the calculation of the intraclass correlation, the proportion of variance that lies between schools. Following the estimation of this baseline model, a series of multilevel models is estimated. For the teacher retention analysis, four different multilevel models are estimated (see Table 5.2).
The first, a bivariate model, estimates the bivariate relationship between school-level teacher retention and student achievement in mathematics. The second (Model 2) adds a second school-level predictor, percent free lunch, which preliminary analyses showed to be highly correlated with student achievement, in order to control for other key school-level factors. This model also introduces a pretest measure of mathematics achievement at the student level to account for student prior ability. The final two models are fully specified models with all predictors at level 1 and level 2; the final model contains a cross-level interaction between teacher retention and students who remain in the same school, to test the hypothesis that teacher retention and student retention have a multiplicative effect on student achievement.13

13 All level 1 predictors, with the exception of pretest score and same school, are grand mean centered, in order to control for their effect rather than partial out the impact attributable to student and school. A model with all level 1 predictors group mean centered was run (output not reported); the effects for gender and race are largely at the student level, not at the school-mean level. Therefore, the decision was made to grand mean center race and gender in the reported models.

A similar modeling scheme is used in the models predicting student mobility, beginning with a bivariate model in which teacher retention predicts student mobility (see Table 5.3). The second model adds only the school-level measure of prior student mobility for the 2007 cohort. The final model is a full model, including all level 1 predictors (gender, race, and program eligibility) as well as level 2 predictors.

In the teacher undersupply analysis, a very similar set of models was estimated. Following the estimation of a baseline model, a series of multilevel models was estimated for mathematics, English language arts, and science. Four different multilevel models are presented in Table 7.7 (mathematics), Table 7.8 (English language arts), and Table 7.9 (science). The first, a bivariate model, estimates the relationship between subject-specific school-level teacher undersupply and student achievement in that subject. The second (Model 2) adds a second predictor that preliminary analyses showed to be highly correlated with student achievement, free lunch eligibility, as well as a pretest measure of mathematics achievement and the student mobility indicator at the student level, to account for student prior ability. The third model adds the school-level teacher retention rate. The final model includes all predictors at level 1 and level 2.

Sensitivity Analyses
When conducting applied analytic work with observational data, there is a concern that an unobserved characteristic affects the outcome and might invalidate the inferences drawn from the study. This concern applies to state administrative data as well, which are rich in observations but do not include a large number of variables. Sensitivity analyses are conducted to test the robustness of the inferences to the influence of unobserved characteristics. Observational studies vary in the degree to which they are sensitive to hidden bias: some are very sensitive, while others are relatively insensitive even to substantial bias (Rosenbaum, 2005). The presence of unobserved confounding variables cannot be “solved,” but the sensitivity of the findings to this potential bias can be quantified as a way to understand and contextualize the rigor of the results. Therefore, I characterize the robustness of these inferences to the potential impact of confounding variables (Frank, 2000). Following Frank (2000) and Frank & Sykes et al. (2008), the sensitivity analysis techniques used in this study consider how unknown quantities could affect estimates. However, rather than reporting how violations of assumptions produce a range of estimates, the focus remains on the extent to which an assumption must be violated to invalidate an inference. As a result, the indices reported here quantify the robustness of the original inference (Frank & Sykes et al., 2008; Frank & Min, 2007; Frank, 2000).14

14 For more information regarding the development of these indices and their application to this study, see Technical Appendix D. For more information on these indices and their development and application across a variety of contexts, see Frank (2000), Frank & Min (2007), and Frank & Sykes et al. (2008).

Propensity Scores: Brief discussion of propensity scores as an analytic technique
Propensity score matching is one of a set of relatively new analytic techniques that have arisen from the need to estimate treatment effects from observational data. Built upon the work of statisticians such as Rosenbaum and Rubin (1983) and econometricians such as Heckman, propensity score analysis allows for the estimation of causal effects from non-experimental data. Because of their utility in isolating effects when randomized studies are not possible (as is often the case), these methods are becoming part of the analytic toolkit of a number of disciplines, including education (Guo & Fraser, 2010). In brief, the goal of propensity score matching is to approximate the counterfactual: how might the outcome have differed if a school had, or had not, received the treatment? Since the counterfactual can never be observed for a single unit, propensity score matching finds similar units with different treatment assignments and compares their outcomes. One way to do this is to match the observations on a set of variables.
However, this quickly becomes difficult as the number of covariates grows (the problem of high dimensionality; Guo & Fraser, 2010; Rosenbaum, 2002; Rosenbaum & Rubin, 1983). Instead, a propensity score can be used as a balancing measure that summarizes the information in the covariates and represents the estimated probability of receiving the treatment as a function of the variables that predict treatment assignment (Morgan & Harding, 2006; Rosenbaum, 2002; Rosenbaum & Rubin, 1983). In this case, the treatment is high teacher retention at the school level. Propensity scores are one of the main tools in matching strategies for estimating the counterfactual. The inherent difficulty with the counterfactual is that it can never be observed for a single person or unit: a person or unit cannot receive both the treatment and the control under the same conditions. Therefore, the counterfactual is usually estimated in the aggregate. Propensity scores are often presented as an improvement over ordinary multiple regression, as a quasi-experimental technique that can account for possible selection effects. Whereas OLS regression makes no claim to approximating an experiment, propensity score methods do, and they are often described as a way to approximate an experiment when true random assignment is impossible, as is so often the case in social science and education research (i.e., when it is unethical, impractical, and/or impossible to randomize people into treatment and control conditions). Although propensity score matching is presented in an entirely different framework, and so is conceptually more aligned with experimental design, in a statistical sense it is not different from regular OLS with predictors. One reason is that the same data are used in both types of analyses: the prediction equation in propensity score matching uses the same covariate information that could be included in a regular multiple regression framework.
That information is encoded in the propensity score, but it still yields the same kind of estimates for individuals in the data set. In other words, Individual A’s values on key covariates such as SES, race, gender, and family composition are the same in the prediction equation for the probability of receiving treatment as they are in a multiple regression predicting the outcome. What propensity score matching adds in applied contexts is an intuitive framework for isolating potential effects, as well as an efficient method for comparing treatment effects across groups that are matched on all available observed covariates. While including all predictors in a multiple regression framework can yield similar estimates (for example, Frank, 2008), it does not allow the researcher to explain impacts in a clear treatment/control context. Presenting findings in a treatment/control context can be more intuitive to audiences, particularly policymakers and practitioners. A regression equation with a large number of covariates can be confusing and overwhelming; a comparison of means between treatment and control groups (where assignment is predicted by those covariates) can be more digestible. This class of methods focuses on controlling for selection bias by using the observed covariates in ways that minimize this bias, and thus offers advantages over OLS regression (Guo & Fraser, 2010). More importantly, when analyses target a policymaking audience, these methods can help isolate the potential effect of implementing a new policy, as is the goal here. However, it is important not to overstate the “power” of propensity scores. They do not remove the problem of unobserved confounding variables (a key criticism of regular regression), because those variables remain unobserved, and assignment to treatment and control conditions based on covariates is only as good as the covariates.
While they can control for overt selection bias, they cannot control for hidden selection bias (Guo & Fraser, 2010). For this reason, propensity score matching is used in this analysis in conjunction with sensitivity analyses that quantify how robust the treatment effect is to hidden bias, as suggested by Rosenbaum (2002, 2005) and Rosenbaum and Rubin (1983).

Propensity Scores: Weighting Method
Propensity scores allow for the comparison of schools in a “treatment” group (schools with high teacher retention) and schools in a control group (schools with low teacher retention) that have similar propensities to receive the treatment. The treatment, high teacher retention, is defined as a school retaining over 85% of its teachers from year to year.15 The propensity score is the probability of a school receiving the treatment (high teacher retention), given a set of covariates. For this analysis, a weighting approach to propensity scores is used, so that all information is retained in the analysis rather than lost, as can happen with a case-by-case matching approach (Hirano & Imbens, 2001; Morgan & Harding, 2006; Robins, Hernán, & Brumback, 2000; Robins & Rotnitzky, 1995; Robins, Rotnitzky, & Scharfstein, 2000). The propensity score is calculated by estimating a logistic regression predicting treatment (i.e., high teacher retention).16 This yields a propensity score, the probability of receiving treatment given the set of covariates, defined as follows (see Rosenbaum & Rubin, 1983):

$$e(x) \equiv \Pr\{t = 1 \mid x\} \qquad (5)$$

where $t$ indicates whether or not the school received the treatment of high teacher retention, and $\Pr\{t = 1 \mid x\}$ is the probability of receiving the treatment given the set of covariates $x$.

15 Tests were done to identify the best threshold for defining the high teacher retention treatment level. Several thresholds were included in a series of logistic regressions to see which were significant. Additionally, propensity scores were estimated for several thresholds, and the distribution of treatment and control cases in each of the blocks was examined. An 85% threshold yielded the most reasonable results and therefore is used here as the treatment.
16 Propensity scores are estimated using the pscore program in the Stata statistical analysis software, created by Sascha Becker and Andrea Ichino (2002).

After estimating the propensity score, several weights are calculated. The first is the weight for the effect of treatment at the margin of indifference (EOTM) (Heckman, 2005). This weight focuses on the schools that might be most likely to respond to a change in policy on teacher retention. Schools that had low teacher retention but a high propensity for high teacher retention might be responsive to shifts in statewide policy regarding professional development or incentives for teachers to remain in schools. This weight is calculated as follows:

$$\omega(t, x) = \frac{t}{e(x)} + \frac{1 - t}{1 - e(x)} \qquad (6)$$

This weight is then included in the multilevel models specified previously, weighting the level-2 observations by their propensity to receive the treatment.17 This means that schools with high retention are weighted by $1/e(x)$ and schools with low retention are weighted by $1/(1 - e(x))$. Therefore, high-retention schools are weighted more when they have a lower propensity of having high retention, and low-retention schools are weighted more when they have a higher propensity of having high retention.
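A minimal sketch of equations (5) and (6) follows. It fits the logistic regression for $e(x)$ by Newton-Raphson in NumPy, standing in for the Stata pscore program actually used, and then forms the weight $t/e(x) + (1-t)/(1-e(x))$. The covariates are simulated placeholders, and reading “trimmed at 18” (footnote 17) as capping weights at 18 is an assumption.

```python
import numpy as np


def fit_propensity(X, t, max_iter=50, tol=1e-10):
    """Estimate e(x) = Pr{t = 1 | x} with a logistic regression (intercept added),
    fit by Newton-Raphson. Returns the fitted propensity scores."""
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    t = np.asarray(t, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        gradient = X1.T @ (t - p)
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-X1 @ beta))


def eotm_weights(t, e, cap=18.0):
    """Equation (6): t / e(x) + (1 - t) / (1 - e(x)), capped at `cap`
    (capping is an assumed reading of 'trimmed at 18')."""
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    w = t / e + (1.0 - t) / (1.0 - e)
    return np.minimum(w, cap)


# Simulated schools: two placeholder covariates and a high-retention indicator.
rng = np.random.default_rng(1)
X = rng.normal(size=(580, 2))
true_logit = -0.3 + 0.8 * X[:, 0] - 0.5 * X[:, 1]
t = (rng.random(580) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)
e = fit_propensity(X, t)
w = eotm_weights(t, e)
```

In the multilevel models, these weights would then be attached to the level-2 (school) records. A useful check is that, with an intercept in the model, the fitted scores average to the observed treatment rate.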
This weighting focuses attention on schools that either received the treatment despite a low propensity to do so, or did not receive the treatment despite a high propensity to do so. Weights were also constructed to focus attention on the treatment effect for the treated and the treatment effect for the control. The treatment effect for the treated asks, “How well did the treatment work for those who received it?” while the treatment effect for the control asks, “How well might this treatment have worked for those in the control group?” These weights are constructed as follows:

Treatment effect for the treated:
$$\omega(t, x) = t + \frac{(1 - t)\, e(x)}{1 - e(x)} \qquad (7)$$

Treatment effect for the control:
$$\omega(t, x) = \frac{t\,(1 - e(x))}{e(x)} + (1 - t) \qquad (8)$$

17 To preclude extreme observations from exerting undue influence on the models, weights were trimmed at 18.

Propensity Scores: Stratification Method
I also investigate this relationship by examining the average treatment effect within four strata.18 This estimates the average treatment effect on the treated (ATT) using stratification matching. The strata were defined during the estimation of the propensity score. By construction, the covariates in each block are balanced, and assignment to treatment can be considered random within a block. The ATT is computed only on the region of common support, as an average of the block-specific treatment effects weighted by the number of treated cases in each block. These block-specific effects, in turn, are computed as the difference in average outcomes between treated and control cases within the same block, in which all control variables are balanced (Becker & Ichino, 2002).19

Measures
Key predictors of interest: School-level teacher turnover and undersupply20
These analyses use two important predictors constructed from the data: school-level teacher “churn rates” and school-level teacher undersupply.
Because this is an organizational analysis of teacher supply and demand, characteristics that may be studied at the individual level (e.g., teacher retention) are treated as organizational characteristics of the schools. School-level teacher churn rates are calculated by identifying the number of teachers who remain in the workforce and in the same school from one year to the next. This is estimated over four years, which yields three retention time points. These counts are then aggregated to a churn rate: the proportion of the teaching force in a given school that remains the same from one year to the next. This rate is averaged over the three time points, and a “difference” indicator is included in analyses to control for the direction of change in average retention rates.

18 This is computed using Stata’s atts command within the pscore program. See Becker & Ichino (2002).
19 For comparison, I estimate these effects with nearest neighbor, kernel, and radius matching as well. This is described further in Chapter 6 and in Appendix C.
20 A complete list of all variables used throughout the dissertation is included in Appendix A.

School-level teacher undersupply is calculated using the method outlined above, with modifications. Demand is estimated using a modified demand formula (Chapter 4), supply is calculated by summing the FTEs in each subject for each school, and undersupply is calculated by subtracting supply from demand. This variable is then used as a predictor in multilevel analyses predicting student achievement. It is also used in conjunction with the school-level teacher retention rates calculated above, to take into account the impact of a “revolving door” of teachers on the relationship between undersupply and outcomes.
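The churn-rate and undersupply calculations can be sketched as follows. The roster structure (year mapped to `{teacher_id: school_code}` for the instructional workforce) is hypothetical, the “difference” indicator is approximated here as the last-period rate minus the first-period rate (its exact construction is not specified above), and subject demand is taken as given, since the modified demand formula is developed in Chapter 4.

```python
from collections import defaultdict


def school_retention(rosters):
    """rosters maps year -> {teacher_id: school_code} for instructional staff.
    Returns school -> list of year-to-year retention proportions."""
    years = sorted(rosters)
    rates = defaultdict(list)
    for y0, y1 in zip(years, years[1:]):
        total, stayed = defaultdict(int), defaultdict(int)
        for teacher, school in rosters[y0].items():
            total[school] += 1
            # Retained = still in the workforce AND in the same school next year.
            if rosters[y1].get(teacher) == school:
                stayed[school] += 1
        for school, n in total.items():
            rates[school].append(stayed[school] / n)
    return dict(rates)


def summarize_retention(rates):
    """Average rate across time points, plus a hypothetical 'difference'
    term (last rate minus first rate) capturing the direction of change."""
    return {s: (sum(r) / len(r), r[-1] - r[0]) for s, r in rates.items()}


def undersupply(demand_fte, supply_fte):
    """Undersupply = demand - supply in FTEs, by subject (positive = too few)."""
    return {subj: demand_fte[subj] - supply_fte.get(subj, 0.0) for subj in demand_fte}


# Toy data: four years of rosters give three retention time points for S1.
rosters = {
    2005: {"T1": "S1", "T2": "S1", "T3": "S1", "T4": "S1"},
    2006: {"T1": "S1", "T2": "S1", "T3": "S2", "T5": "S1"},
    2007: {"T1": "S1", "T2": "S1", "T5": "S1", "T6": "S1"},
    2008: {"T1": "S1", "T5": "S1", "T6": "S1", "T7": "S1"},
}
rates = school_retention(rosters)
summary = summarize_retention(rates)
gap = undersupply({"mathematics": 4.0, "science": 3.0}, {"mathematics": 3.5})
```

In practice, a school like S2, whose rate rests on a single teacher, would be dropped under the sample rules before averaging.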
School-level Covariates

The school-level predictors include variables shown by research to be related to both teacher retention and student outcomes: the percent minority students in the school, the percent free/reduced lunch students, locale (dummy variables comparing city, town, and rural schools to suburban schools), school size (dummy variables comparing small schools of under 300 students and large schools of over 1000 students to medium schools), and indicator variables for charter and magnet status. Additionally, I use a set of workforce composition variables, including the proportion of teachers with professional licenses, the proportion of minority teachers, and the proportion of highly qualified teachers (each computed by FTEs).21

For the student mobility analysis, a school-level covariate for 2007 cohort mobility is used. This variable indicates the proportion of students in each school who took the MME in the spring of 2007 and who were in the same school in the fall of 2005.22 It is used as a prior measure of school-level student mobility, in order to help establish a stronger causal ordering. This measure precedes the final year of the teacher retention measure (2008), as well as the 2009 student achievement outcome measures.

Student-Level Covariates

The following student predictors are used: the mathematics or English language arts pretest score from the 2005-2006 8th grade achievement test (the MEAP test),23 gender, race, free and reduced lunch eligibility, special program eligibility, and an indicator variable for whether or not students have been in the same high school since 9th grade. These student-level predictors are well known to be related to student achievement outcomes and are a standard set of student background covariates for such an analysis. Gender is a dichotomous variable (1=female).
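The covariate coding described above can be sketched minimally in Python, assuming a simple dict-based data layout with hypothetical field names. The school size and locale dummies use the cut points and reference categories stated in the text, with medium and suburban schools as references. The group-mean-centering helper illustrates the convention in footnote 21, where group-mean-centered student predictors are paired with their school means on the intercept.

```python
"""Sketch of school-level dummy coding and group-mean centering.

Assumptions: field names ("school", "meap") and the dict layout are
hypothetical; cut points and reference categories follow the text.
"""
from collections import defaultdict


def school_size_dummies(enrollment):
    # Medium schools (300-1000 students) are the reference category.
    return {"small": int(enrollment < 300), "large": int(enrollment > 1000)}


def locale_dummies(locale):
    # Suburban schools are the reference category (all dummies 0).
    return {name: int(locale == name) for name in ("city", "town", "rural")}


def group_mean_center(students, var):
    """Add var + '_gmc' to each student and return the school means of var.

    The centered value enters at the student level; the school mean is
    entered as a school-level predictor on the intercept.
    """
    values = defaultdict(list)
    for s in students:
        values[s["school"]].append(s[var])
    means = {school: sum(v) / len(v) for school, v in values.items()}
    for s in students:
        s[var + "_gmc"] = s[var] - means[s["school"]]
    return means
```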
Race is categorized into American Indian/Pacific Islander, Asian, Hispanic, Black, and multi-ethnic, with white as the reference category. Free and reduced lunch eligibility is a dichotomous variable (1=free/reduced lunch eligible). The same-school variable used as an outcome in the second set of models is used as a predictor in the student achievement models. Program eligibility indicates whether or not a student is eligible for Title 1, special education, Section 504 plan, Limited English Proficient, advanced/accelerated, or migrant services. This is recoded into a dichotomous variable that indicates whether or not a student is eligible for any special programs.24

21 School means of all student predictors are also included on the intercept when those student-level predictors are included in the model and are group-mean centered.
22 This variable indicates the proportion of 2007 juniors who were in the same school in the fall of 2005, i.e., as sophomores. Data were not available for the fall of 2004, when those students were freshmen.
23 By legislative mandate, Michigan administers a high school achievement test at only one time point in high school: in 11th grade. Therefore, the only available pretest measure of prior achievement is the 8th grade MEAP score. Only students with valid, non-zero MEAP and MME scores were retained in the analysis (n=96,556).
24 The number of students in this file who are eligible for these programs is very small (n=1300). This is partly because students who took MI-Access, Michigan’s alternative assessment for students with moderate to severe cognitive difficulties, are not included in this analysis, and thus much of the special education and Section 504 population is not included.

CHAPTER 3: DESCRIPTIVE ANALYSES OF MICHIGAN’S TEACHER SUPPLY, TURNOVER, AND UNDERSUPPLY

Teacher supply, retention, and undersupply are interrelated aspects of the teacher labor supply in the state of Michigan.
The analyses in this chapter present descriptive evidence to highlight these relationships and to provide an overall empirical picture of teacher supply, school-level teacher retention, and school-level undersupply. The descriptive analyses presented here address three questions:
1) What is the composition of Michigan’s teaching force (i.e., its supply) in terms of demographics, licenses, and distribution?
2) What is the distribution of school-level teacher retention rates over various types of schools?
3) What is the distribution of school-level undersupply over various types of schools?
This chapter begins with findings regarding the characteristics and composition of the teaching workforce in Michigan, focusing in particular on the distribution of teachers across various types of schools, license types, and subjects. The analysis then turns to Michigan’s overall labor supply, looking at the number of teachers entering, leaving, and staying in the profession and, among those who stayed, the proportion who changed schools and the proportion who did not. The final portion of the chapter focuses on the two key school-level characteristics of interest throughout this analysis, school-level teacher retention and school-level undersupply, in order to understand how these characteristics vary across types of schools. While descriptive analyses cannot provide evidence regarding causal relationships, they provide critically important information about the population of teachers and schools under analysis. Because these are universe data, they are even more informative than data from representative samples: they represent the complete population of teachers and schools, so observed differences reflect actual differences in educational practice in Michigan rather than sampling variation.
Investigating Michigan’s Teacher Labor Supply: Previous Analyses

In order to understand teacher demand, supply, turnover, and undersupply, it is necessary to first understand the state of Michigan’s teaching force: its demographic composition, the distribution of licenses across schools, and issues related to out-of-field teaching and a “reserve” supply of teachers. This background is needed to situate the findings in the remainder of the dissertation in the appropriate context. To provide this information, I rely on findings from the REL-Midwest collaboration between the Michigan Department of Education, the Center for Educational Performance and Information, and Michigan State University.1,2

What are the demographic characteristics of Michigan’s teaching force?3

The REL-Midwest research team investigated the characteristics of Michigan’s teaching force, paying particular attention to distributional issues as well as the relationship between teacher characteristics and placements and AYP (see Table 3.1). Overall, the state’s teaching workforce is primarily white (89%) and aging. Minority representation among the teaching force appears to increase with age, as younger teachers are predominantly white.

1 The REL-Midwest collaboration has produced three technical reports. I served as a graduate research associate on this project from its inception and have co-authored (as either first or second author) all three reports. The work of this project, and the findings in these reports, form the foundation for my dissertation.
2 Data from all of the reports mentioned here are from the 2007 REP, although analyses from the 2008 REP (the most recent teacher data available) do not show any substantial differences in distribution or findings.
3 Findings from Lynn & Keesler et al. (2007): Technical Report 1: Beyond Compliance: Descriptive Characteristics of Public School Teachers in Michigan.
The data show that the majority of minority teachers in Michigan are in urban schools, and that urban fringe, town, and rural schools employ teachers who are almost exclusively white. Figure 3.1 below compares the racial composition of Michigan’s teaching force to that of Michigan and of the United States. White teachers are overrepresented in the Michigan teaching force relative to the overall Michigan population (89% in the teaching force, compared with 78% in the population), while black and Hispanic teachers are underrepresented.

[Figure 3.1: Comparison of Racial Composition of Michigan’s Teaching Force to Michigan and the USA. Series: Michigan teachers, Michigan, USA; categories: White, Black, Hispanic, Asian, Other. Source: REP and Census 2008.]

One important federal reporting requirement under No Child Left Behind, as well as a focus of the current administration’s Race to the Top competition, is the equitable distribution of teachers across types of schools. Table 3.2 shows the distribution of full- and part-time teachers by teacher characteristics and across school types; Table 3.3 shows the distribution of minority and non-minority teachers by teacher characteristics and across school types; and Table 3.4 shows the distribution of teachers by highest education by teacher characteristics and across school types. Equitable distribution questions often relate to the distribution of teachers in terms of quality and effectiveness. Michigan does not currently calculate teacher effectiveness based on student test scores and results from annual educator evaluations, although this activity will begin in the 2010-2011 school year. Until this new system is available, equitable distribution calculations focus on issues such as race and education rather than student achievement growth. The distribution of teachers by school type shows no apparent pattern of part-time teaching in high-poverty or high-minority schools (see Table 3.2).
In fact, there appear to be more part-time teachers in urban fringe schools and in schools with low populations of minority and low-income students. In Table 3.3, we see that minority teachers tend to work in minority and poor schools, as well as city schools. Seventy-seven percent of minority teachers are in schools with over 50% minority enrollment; 67% are in schools where more than half of the students are free-lunch eligible; and 72% are in city schools. Minority teachers are more likely than non-minority teachers to work in larger schools. The distribution of highest degree attainment shows that teachers with doctoral and specialist degrees tend to be concentrated at the high school level and in urban fringe schools (see Table 3.4). Teachers with master’s and bachelor’s degrees are employed primarily in elementary and middle schools. There do not appear to be noteworthy inequities in the distribution of teachers over schools by highest degree. Finally, the distribution of teachers over locale types suggests that teachers who work in cities are more likely than teachers in rural, urban fringe, or town schools to teach in schools where over half of the students receive free and reduced lunch (61%) and in schools with enrollments over 1000 (see Table 3.5). Teachers in rural schools are also more likely to teach in schools with higher numbers of low-income students (30% are in schools with over half of students eligible for free and reduced lunch); however, they are also more likely to teach in small schools (under 300 students). Teachers in rural schools also differ with respect to educational attainment: they are less likely to have an advanced degree and more likely to have a bachelor’s degree (60%).
Licensure and Endorsement

Michigan’s teaching force is defined not only by its demographic characteristics and placements but, most importantly, by its training and skill, as represented here by license levels.4 The most commonly held licenses are provisional, professional, and 18/30 Hour Continuing licenses. The provisional certificate, Michigan’s initial teaching certificate, is issued following successful completion of an approved elementary or secondary teacher preparation program, including student teaching, and after a candidate has passed all components of the Michigan Test for Teacher Certification (MTTC). The certificate is valid for up to six years. During this time, the teacher is expected to complete a minimum of three years of successful teaching experience and to finish at least eighteen semester hours in a planned course of study as a prerequisite for the next level of certification.5 The professional education certificate is Michigan’s advanced teaching certificate, and requires completion of 18 semester hours in a planned course of study after the issuance of an approved initial teaching certificate (or an approved master’s degree earned at any time), and three years of successful teaching experience. This certificate is valid for up to five years; it must be renewed every five years by completing six semester hours at an approved teacher preparation institution or a state board-approved institution.

4 As of this writing, Michigan does not collect information regarding teacher quality and effectiveness. License and endorsement status serve as proxies for these qualities, along with the “highly qualified” indicator. Michigan will begin annual educator evaluations in 2011, and the results of these evaluations will become part of the REP. The state will also begin to calculate growth for all teachers when the data are available and applicable, and will make these measures available as well.
The 30 or 18 hour continuing licenses are no longer issued, although teachers with these certificates continue to provide instructional services. These certificates remain valid as long as the holder continues to serve in an educational capacity for 100 days in any given five-year period.6

Table 3.6 presents the distribution of licenses by teacher demographics in the state of Michigan. Age has a predictable relationship with license status: the two age groups most likely to hold provisional licenses are those under 25 and those between 25 and 29. The majority of teachers (76%) between the ages of 30 and 39 have professional licenses. Among teachers aged 40-54, 47% hold professional and 40% hold 18/30 hour continuing licenses. In the oldest group (those aged 55 and above), the majority (62%) hold 18/30 hour continuing licenses.

5 The provisional certificate can be renewed if all of the requirements for the Professional Education certificate have not been met.
6 Michigan also issues substitute licenses to those who stand in for other teachers. Substitutes must have completed 90 semester hours of satisfactory (minimum 2.0 grade point average) credit consolidated at one four-year regionally accredited college or university. The substitute permit is valid for teaching a maximum of 150 days during the school year in day-to-day substitute teaching assignments. This permit is not valid for any regular or extended assignment. However, substitute teachers are not part of the instructional universe for this analysis, and therefore are not described in any detail here.

There appear to be no significant differences by gender in the types of licenses held. Of the 19,718 teachers with provisional licenses, 16,180 (82%) hold a bachelor’s degree. Teachers with bachelor’s degrees are most likely to have professional (41%) or provisional (37%) licenses.
Teachers with master’s degrees are more likely to have professional (58%) or 18/30 hour continuing (35%) licenses. Teachers holding doctoral or specialist degrees are nearly evenly split between 18/30 hour continuing (49%) and professional (43%) licenses. The distribution of licenses by minority status suggests that non-minority teachers are more likely to hold professional (51%) than 18/30 hour continuing (28%) or provisional (19%) licenses. Minority teachers, like non-minority teachers, are most likely to hold professional licenses (38%). However, a larger share of minority teachers hold provisional licenses (26%, compared with 19% of non-minority teachers). There does not appear to be a relationship between full-time status and license type.

Teacher licensure and endorsement: Are teachers teaching out of field? Is there a reserve supply of teachers to meet the demands of the MMC?7

One of the critical questions with respect to teacher licensure was out-of-field teaching. Teachers in Michigan are generally not teaching out of field: only 2% of the total high school teaching population is teaching out of field. This includes 123 teachers in mathematics, 164 in English, 96 in science, 122 in social studies, and 64 in world language. The match between subject assignment and endorsement seems to be closest when teachers have only one subject assignment, but even with multiple assignments, there are very few mismatches at the high school level. Out-of-field teachers often have several assignments, only one of which is out of field. It is relatively rare to find a teacher who is teaching full-time in a subject without a valid endorsement. Table 3.7 presents those teachers who are assigned to a core subject but do not hold at least one endorsement in that subject area.

7 Findings from Lynn & Keesler (2008), Technical Report 3.1: Beyond Compliance: Teacher Licensure and Endorsement in the State of Michigan.
Endorsements and Subject Assignments: Potential Reserve

A second question is the extent to which teachers who hold multiple endorsements could be moved from one assignment (such as social studies) to another with greater need, in order to meet the increased demands of the MMC. However, there does not appear to be a reserve of teachers with “unused” endorsements who could be reassigned within a school to meet subject area shortages (see Table 3.8). In general, teachers with a math endorsement are the most likely to actually teach math (83%). Of the remaining 17%, half are teaching science. Math appears to be a “dominant” endorsement: those who hold a math endorsement and another endorsement are more likely to actually teach math. There is also a relationship between math and science endorsements. Of those with science as one of their endorsements, 55% teach science and another 30% teach math. These findings suggest that schools are not likely to have a large “reserve” of people who could teach math or world language. Schools may have a reserve of science teachers, although if reassigned they are likely to be pulled from math courses.

Michigan’s Overall Teacher Labor Supply

Many analyses of teacher labor supply focus at the state or national level on the number of teachers entering, leaving, and remaining in the profession. While this analysis moves to the school level and studies labor supply at that level, it is nonetheless important to understand Michigan’s overall teacher labor supply. Table 3.9 below presents the proportion of Michigan’s teaching force that is retained in the profession, enters the profession, and leaves the profession each year. Additionally, for those who are retained, Table 3.9 shows the proportion of teachers who moved to different schools (“movers”) and those who stayed in their previous schools (“stayers”). In the overall teaching workforce, Michigan retains between 92% and 94% of its teachers.
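The Table 3.9 categories can be illustrated with a small Python sketch that compares two annual rosters. It assumes each roster maps a teacher ID to a school ID. The definitions of entrants, leavers, movers, and stayers are simplified for illustration and may differ from the REL-Midwest team’s exact rules. The function name is hypothetical.

```python
"""Sketch of statewide teacher flows between two consecutive years.

Assumption: each year is a dict teacher_id -> school_id. A teacher absent
in year 2 is a leaver; a teacher absent in year 1 is an entrant.
"""


def labor_flows(year1, year2):
    ids1, ids2 = set(year1), set(year2)
    retained = ids1 & ids2
    # Movers stay in the profession but change schools.
    movers = {i for i in retained if year1[i] != year2[i]}
    mover_share = len(movers) / len(retained)
    return {
        "retained_rate": len(retained) / len(ids1),
        "leaver_rate": len(ids1 - ids2) / len(ids1),
        "entrants": len(ids2 - ids1),
        # Inter-school mobility ("churn") among retained teachers.
        "mover_share": mover_share,
        "stayer_share": 1 - mover_share,
    }
```

Applied to statewide rosters, "retained_rate" corresponds to the 92% to 94% retention figure and "mover_share" to the roughly 11% inter-school mobility rate discussed in this section.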
Note that the absolute size of the instructional workforce is declining over time; in 2008, it was three percent smaller than in 2006. Each year, a slightly smaller percentage of the overall population leaves the profession, while the size of the entering cohorts of teachers has steadily declined, down 7% in 2008 from the 2006 rate of entry. Among teachers who are retained in the profession, Michigan has a steady 11% inter-school mobility rate, which is the overall teacher “churn” rate for the state at an aggregate level. As will be seen later in this dissertation, this churn rate is not equally distributed among schools.

School-Level Teacher Retention by School Characteristics

This dissertation conceptualizes teacher retention as a school-level characteristic, and retention rates may be distributed unevenly among schools. As described in Chapter 2, average school-level teacher retention rates were calculated for each high school in the state of Michigan. Table 3.10 presents the mean school-level teacher retention rate by school characteristics, in order to understand the distribution of teacher retention rates across schools.8 Does teacher retention vary by key school characteristics, such as locale or student poverty? Recall that the teacher retention rate is an average over the period 2005-2008, so the mean teacher retention rate presented here is the mean of that average within each category of school characteristics. The mean school-level average teacher retention rate for all high schools in the state of Michigan is 86.20%; that is, averaged across Michigan high schools, 86.20% of a school’s teachers remained in that school from one year to the next over the 2005-2008 period. Teacher retention rates differ significantly by locale: the rate in city schools (79.73%) is significantly lower than the rates in suburban, town, and rural areas.
Suburban, rural, and town schools did not have significantly different teacher retention rates. Average teacher retention rates also differ significantly by the composition of the student body, both in terms of percent minority students and percent of students eligible for free and reduced lunch. High-minority schools have a teacher retention rate of 74.98%, which is significantly lower than that of other categories. Schools with over 50% of students eligible for free and reduced lunch have significantly lower teacher retention rates (80.27% for schools with 50-69% of students eligible, and 76.09% for schools with over 70% of students eligible). In terms of school size, there is again significant variation, with small schools having lower average teacher retention rates (81.81%) than medium (86.40%) and large schools (87.84%). Charter schools have significantly lower retention rates than non-charter schools (71.91%, compared with 87.06% for non-charter schools). However, only a small number of charter schools are available for this comparison. Magnet schools do not have significantly different teacher retention rates than non-magnet schools. Schools with a lower percentage of teachers holding professional licenses also have significantly lower teacher retention rates. Schools with less than 80% of teachers holding professional licenses have an average retention rate of 79.15%, compared with 87.69% for schools with 80-90% and 89.55% for schools with greater than 90% of teachers holding professional licenses. The percentage of minority teachers is also related to teacher retention rates.

8 This table presents the mean school-level teacher retention rate by school characteristics using a series of one-way ANOVAs. The reported F-test indicates significant between-group variation on each characteristic.
For schools with the highest percentages of minority teachers, the teacher retention rate was 76.51%, which was significantly lower than the rates for schools with fewer minority teachers.9 Teacher retention rates are not significantly different across schools with different percentages of highly qualified teachers. To summarize, the types of schools that have lower mean teacher retention rates are city schools, schools with high proportions of minority and low-income students, small schools, and charter schools. Schools with a lower percentage of teachers with professional licenses, and schools with a higher percentage of minority teachers, also appear to have lower mean teacher retention rates.

9 This variable is highly skewed, as many schools have zero or less than one percent of their population with minority teachers. Michigan’s instructional workforce is only 11% minority teachers. Therefore, this variable is categorized into less than 1%, 1-10%, and greater than 10%.

School-Level Undersupply by School Characteristics

Tables 3.11, 3.12, and 3.13 present mean mathematics, English language arts, and science undersupply, respectively, by school characteristics, to develop a profile of the types of schools that are undersupplied in each subject.10 The undersupply values range from approximately -5 to 5 (see Appendix A for descriptions of all variables used in the study), with positive numbers indicating undersupply.

Mathematics Undersupply

The mean school-level mathematics teacher undersupply is .48, which suggests that, on average, schools are not significantly undersupplied, although there is considerable variation across schools (standard deviation=1.20). This is further evidence of the importance of a school-level organizational analysis that captures these school-level differences and their relationship with student achievement outcomes.
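The group comparisons in Tables 3.10 through 3.13 rest on one-way ANOVA F-tests (footnotes 8 and 10). A minimal sketch of the F statistic follows, together with the minority-teacher categorization from footnote 9. How boundary values (exactly 1% or 10%) are assigned is an assumption, and the function names are hypothetical. In practice, scipy.stats.f_oneway returns the same F statistic along with its p-value.

```python
"""Sketch of a one-way ANOVA F statistic and the footnote 9 categories.

Assumption: a value of exactly 1% or 10% falls in the "1-10%" group.
"""


def minority_teacher_category(pct):
    # Cut points from footnote 9: less than 1%, 1-10%, greater than 10%.
    if pct < 1:
        return "<1%"
    if pct <= 10:
        return "1-10%"
    return ">10%"


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Each group here would hold the school-level values (retention rates or undersupply) for one category of a school characteristic, such as the four locales.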
Suburban schools had higher mean rates of mathematics teacher undersupply, with an average of .83 and a standard deviation of 1.37, while city schools had an average of .64 with a standard deviation of 1.68. These large standard deviations suggest that both city and suburban schools include some schools with much higher rates of undersupply and some with much lower rates. Rural schools had the lowest average rate of mathematics undersupply, with the smallest standard deviation (.18 and .84, respectively). The differences in average mathematics undersupply between locales are statistically significant at the .001 level. Schools with the lowest populations of minority students have the lowest average rate of mathematics teacher undersupply (.31). Interestingly, the distribution of average mathematics teacher undersupply across the free and reduced lunch categories is mixed. Schools with the lowest rates of free and reduced lunch have, by far, the lowest average rates of mathematics teacher undersupply (.04), but schools with 10-29% of their students free/reduced lunch eligible have average mathematics undersupply rates of .67, the highest of the five categories, followed by schools with 50-69% of their students free/reduced lunch eligible, at .62. However, schools with greater than 70% of their students eligible for free and reduced lunch have average rates of mathematics teacher undersupply of .29, the second lowest rate. This suggests that mathematics teacher undersupply may not be strongly or linearly related to school socioeconomic status, as one might expect. Mathematics teacher undersupply varies widely among school sizes, and this between-group variation is statistically significant.

10 This table presents the mean undersupply in each subject by school characteristics using a series of one-way ANOVAs. The reported F-test indicates significant between-group variation on each characteristic.
Small schools (those with fewer than 300 students) have the lowest average rates of mathematics teacher undersupply, with an average of -0.15, which means that small schools are, on average, slightly oversupplied. Large schools (those with more than 1000 students) have an average mathematics teacher undersupply of 1.06, which suggests that, on average, large schools are significantly undersupplied. Charter and magnet schools are both less likely to exhibit mathematics teacher undersupply than non-charter and non-magnet schools, which may be due in part to their smaller size, although these relationships are not statistically significant. Finally, with respect to workforce compositional characteristics, there is no clear relationship between the percent of teachers in a school with professional licenses and mathematics teacher undersupply. There is a significant relationship between the proportion of minority teachers in a school and average mathematics teacher undersupply, with schools with higher proportions of minority teachers having higher average mathematics undersupply. Schools with less than 75% of teachers highly qualified also have higher average rates of mathematics teacher undersupply.

English Language Arts and Science Undersupply

Table 3.12 presents average English language arts teacher undersupply rates by school characteristics, and Table 3.13 presents average science teacher undersupply rates by school characteristics. These tables are included but are not discussed in detail for two reasons. The first is that, on average, schools are not undersupplied in English language arts or science: the average English language arts teacher undersupply is -.78 and the average science teacher undersupply is -.77, which suggests that schools, on average, have an adequate supply of ELA and science teachers.
The caveat again is that there is important variation across schools, with some schools experiencing significant undersupply in these areas, but on average these are not undersupplied areas. This underscores the importance of investigating undersupply not at the state level but school by school. The second reason these tables are not discussed in detail is that the relationships mimic those outlined for mathematics teacher undersupply. I highlight key differences here.

Suburban schools have the highest average rates of English language arts teacher oversupply, with an average undersupply of -1.10. Using a full FTE of over- or undersupply as the threshold for significance, as described in Chapter 4, this represents a potentially significant oversupply.11 Recall that suburban schools also had the highest average rates of mathematics teacher undersupply. This suggests that suburban schools find it easier to adequately staff their English language arts classes than their mathematics classes. As in mathematics, rural schools have the lowest rates of ELA and science oversupply, which suggests that rural schools most closely match their teacher supply to their demand. These relationships are similar in science. Unlike in mathematics, in both English language arts and science the relationship between percent free and reduced lunch and undersupply is linear and positive.

11 Note that “oversupply” is used somewhat loosely; in this formula, a rate of 0 would indicate a school that is perfectly balanced in terms of supply and demand. Positive numbers indicate an undersupply; negative numbers indicate an “oversupply,” or a supply that is greater than estimated demand. This does not suggest that these schools should reduce their teaching force, but rather that, given the assumptions of the formula, they are not in danger of being undersupplied at this particular time.
As the percent of students who are free and reduced lunch eligible increases, the average oversupply of English language arts and science teachers decreases. However, even in the schools with the highest rates of free and reduced lunch, the average undersupply is negative (i.e., these schools are not undersupplied on average). Average undersupply rates in English language arts and science are related to school size in a pattern that parallels mathematics. Small schools have the smallest rates of oversupply, while large schools have the largest. This mirrors the mathematics relationship, in which small schools had the lowest undersupply and large schools the highest. As with rural schools, this suggests that small schools are best able to match their supply to their demand, while large schools are more likely to be significantly over- or undersupplied. The relationships between undersupply and the school compositional variables (percent of teachers with professional licenses, percent of teachers who are minority teachers, and percent of teachers who are highly qualified) follow the same pattern as in mathematics: schools with lower proportions of teachers with professional licenses, higher proportions of minority teachers, and lower proportions of highly qualified teachers have lower rates of oversupply (whereas in mathematics, these types of schools had higher rates of undersupply). Schools with these compositional characteristics trend toward undersupply, even if their average rate does not indicate undersupply.
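The sign convention in footnote 11 and the one-FTE threshold from Chapter 4 can be expressed as a small classifier. Treating a value of exactly plus or minus 1 FTE as significant is an assumption, as are the function name and the label strings. The example values in the usage note are the averages reported above.

```python
"""Sketch: classifying a school's undersupply value (in FTEs).

Assumptions: positive = undersupply, negative = "oversupply" (footnote 11);
a full FTE is the significance threshold, and the boundary is inclusive.
"""


def classify_supply(undersupply, threshold=1.0):
    if undersupply >= threshold:
        return "significant undersupply"
    if undersupply <= -threshold:
        return "significant oversupply"
    return "within one FTE of balance"
```

Under these assumptions, the average large school in mathematics (1.06) is classified as significantly undersupplied, the average suburban school in ELA (-1.10) as significantly oversupplied, and the statewide mathematics mean (.48) as within one FTE of balance.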
Table 3.1: Characteristics of Michigan public school teachers by full-time equivalency status

Characteristic | Full-time(a) N | % | Part-time(b) N | % | Total N | %
Gender
Male | 24,037 | 26 | 1,210 | 18 | 25,247 | 26
Female | 67,813 | 74 | 5,574 | 82 | 73,387 | 74
Missing | 35 | <1 | 13 | <1 | 48 | <1
Age
<25 | 1,425 | 2 | 197 | 3 | 1,622 | 2
25-29 | 11,881 | 13 | 821 | 12 | 12,702 | 13
30-39 | 26,893 | 29 | 2,448 | 36 | 29,341 | 30
40-54 | 32,659 | 36 | 2,137 | 31 | 34,796 | 35
>=55 | 18,992 | 21 | 1,181 | 17 | 20,173 | 20
Missing | 35 | <1 | 13 | <1 | 48 | <1
Race/ethnicity
White | 82,105 | 89 | 6,390 | 94 | 88,495 | 90
Black or African American | 8,160 | 9 | 289 | 4 | 8,449 | 9
Hispanic or Latino | 821 | 1 | 49 | 1 | 870 | 1
Asian American | 472 | 1 | 31 | <1 | 503 | <1
American Indian or Alaska Native | 213 | <1 | 15 | <1 | 228 | <1
Native Hawaiian or Other Pacific Islander | 55 | <1 | 4 | <1 | 59 | <1
Multiple | 24 | <1 | 6 | <1 | 30 | <1
Missing | 35 | <1 | 13 | <1 | 48 | <1
Highest educational level
Doctoral or specialist's degree(c) | 256 | 1 | 44 | 1 | 300 | 1
Master's degree | 48,333 | 53 | 3,471 | 51 | 51,804 | 53
Bachelor's degree | 40,422 | 44 | 2,966 | 43 | 43,388 | 44
High school | 1,496 | 2 | 180 | 2 | 1,676 | 2
Other | 761 | 1 | 98 | 1 | 859 | 1
Function assignment
Elementary school | 34,279 | 38 | 2,348 | 34 | 36,627 | 38
Middle/Junior high school | 15,094 | 16 | 777 | 12 | 15,871 | 16
High school | 20,316 | 22 | 1,518 | 25 | 21,834 | 22
Special education | 13,076 | 14 | 905 | 13 | 13,981 | 14
Career/technical education institutions | 1,753 | 2 | 126 | 2 | 1,879 | 2
Other(d) | 3,827 | 4 | 847 | 12 | 4,674 | 5
Multiple | 3,540 | 4 | 276 | 4 | 3,816 | 4

Table 3.1 (cont'd)

Characteristic | Full-time(a) N | % | Part-time(b) N | % | Total N | %
Subject assignment
English Language Arts | 6,736 | 7 | 628 | 9 | 7,364 | 8
Social Sciences | 4,004 | 4 | 264 | 4 | 4,268 | 4
Sciences | 4,455 | 5 | 265 | 4 | 4,720 | 5
Mathematics | 5,530 | 6 | 316 | 5 | 5,846 | 6
World Language | 1,807 | 2 | 306 | 4 | 2,114 | 2
Bilingual Education | 259 | <1 | 8 | <1 | 267 | <1
Business | 841 | 1 | 64 | 1 | 905 | 1
Technology | 1,332 | 1 | 136 | 1 | 1,468 | 1
The Arts | 4,852 | 5 | 603 | 9 | 5,455 | 6
Career and Technical | 1,530 | 2 | 178 | 3 | 1,708 | 2
Wellness | 3,356 | 4 | 313 | 5 | 3,669 | 4
Elementary Education | 31,539 | 35 | 1,666 | 24 | 33,205 | 34
Special Education | 12,585 | 14 | 887 | 13 | 13,472 | 14
Early Childhood | 974 | 1 | 216 | 3 | 1,190 | 1
Alternative | 222 | <1 | 32 | <1 | 254 | <1
Other(e) | 1,155 | 1 | 391 | 6 | 1,546 | 2
Multiple | 10,708 | 11 | 527 | 8 | 11,235 | 11
Total teachers | 91,885 | 93% | 6,797 | 7% | 98,682 | 100%

a Full-time teachers are those whose FTE>=.99.
b Part-time teachers are those whose FTE<1.
c Includes doctoral, education specialist, law, and medical degrees.
d Includes adult/continuing education, compensatory education, preschool, and summer school.
e Includes agricultural science and natural resources, family and consumer education, driver and safety education, and Jr. ROTC.

Table 3.2: Characteristics of Michigan schools by teacher full-time equivalency status

Characteristic | Full-time N | % | Part-time N | % | Total N | %
All public schools
Instructional level
Primary | 40,724 | 44 | 2,945 | 43 | 43,669 | 44
Middle/Junior high school | 18,123 | 20 | 971 | 14 | 19,094 | 19
High school | 24,674 | 27 | 1,700 | 25 | 26,374 | 27
Other | 6,465 | 7 | 648 | 10 | 7,113 | 7
Missing | 2,432 | 2 | 533 | 8 | 2,432 | 2
Enrollment size
Less than 300 | 14,105 | 15 | 1,311 | 19 | 15,416 | 16
300-999 | 59,059 | 64 | 3,869 | 57 | 62,928 | 64
1,000 or more | 16,208 | 17 | 1,029 | 15 | 17,234 | 17
Missing | 2,513 | 3 | 591 | 9 | 3,104 | 3
Locale
City | 23,869 | 26 | 1,338 | 19 | 25,207 | 26
Urban fringe | 37,815 | 41 | 3,022 | 45 | 40,837 | 41
Town | 18,425 | 20 | 1,268 | 19 | 19,693 | 20
Rural | 9,877 | 11 | 636 | 9 | 10,513 | 11
Missing | 1,899 | 2 | 501 | 7 | 2,432 | 2
Percent minority enrollment in school
Less than 5% | 19,785 | 22 | 1,439 | 21 | 21,224 | 22
5 to 19% | 34,935 | 38 | 2,843 | 42 | 37,778 | 39
20 to 49% | 15,634 | 17 | 1,094 | 16 | 16,728 | 17
50% or more | 18,341 | 20 | 711 | 10 | 19,052 | 19
Missing | 3,190 | 3 | 710 | 10 | 3,900 | 4
Percent of students in school eligible for free or reduced-price school lunch
Less than 10% | 16,267 | 18 | 1,548 | 23 | 17,815 | 18
10 to 29% | 27,045 | 30 | 2,252 | 33 | 27,045 | 30
30 to 49% | 20,044 | 22 | 1,199 | 18 | 21,243 | 22
50 to 69% | 11,887 | 13 | 618 | 9 | 12,505 | 13
70% or more | 13,439 | 15 | 470 | 7 | 13,909 | 14
Missing | 3,203 | 3 | 710 | 10 | 3,913 | 4
Total teachers | 89,994 | 93% | 6,929 | 7% | 96,923 | 100%

Note: The school characteristic categories employed in this table are those used in Table 2.1 of the National Center for Education Statistics (NCES) Statistical Analysis Report Teachers' Tools for the 21st Century: A Report on Teachers' Use of Technology, NCES 2000-102. U.S. Department of Education, Office of Educational Research and Improvement.

Table 3.3: Characteristics of Teachers by Minority and Non-Minority Status

Characteristic | Minority teacher n | % | Non-minority teacher n | %
Teacher characteristics
Gender
Male | 2,222 | 22 | 23,025 | 26
Female | 7,917 | 78 | 65,470 | 74
Age
Less than 25 | 94 | 1 | 1,528 | 2
25-29 | 874 | 9 | 11,828 | 13
30-39 | 2,874 | 28 | 26,467 | 30
40-54 | 3,647 | 36 | 31,149 | 35
Over 55 | 2,650 | 26 | 17,523 | 20
Highest education
Doctoral or Specialist | 111 | 1 | 844 | 1
Master's | 5,523 | 54 | 46,281 | 52
Bachelor's | 4,018 | 40 | 39,370 | 44
High School | 366 | 3 | 1,310 | 1
Other | 121 | 1 | 690 | 1
Multiple Subjects | 332 | 3 | 10,901 | 12
Multiple Schools | 173 | 2 | 3,642 | 4
Special Education | 1,836 | 18 | 11,608 | 13
School characteristics
Instructional level
Primary | 4,813 | 47 | 38,847 | 44
Middle/Junior high school | 1,441 | 15 | 17,644 | 20
High School | 2,659 | 26 | 23,705 | 27
Other | 745 | 7 | 6,354 | 7
Percent minority enrollment
Less than 5% | 182 | 2 | 21,035 | 24
5-19% | 606 | 6 | 37,155 | 42
20-49% | 912 | 9 | 15,802 | 18
50% or greater | 7,839 | 77 | 11,209 | 13
Percent free/reduced lunch eligible
Less than 10% | 750 | 7 | 17,047 | 19
10-29% | 878 | 9 | 28,414 | 32
30-49% | 1,139 | 11 | 20,098 | 23
50-69% | 1,904 | 19 | 10,591 | 12
70% or greater | 4,864 | 48 | 9,042 | 10
Locale
City | 7,295 | 72 | 17,902 | 20
Urban Fringe | 1,893 | 19 | 38,917 | 44
Town | 375 | 4 | 19,313 | 22
Rural | 95 | 1 | 10,418 | 12
Enrollment size
Less than 300 students | 1,429 | 14 | 13,974 | 16
300-999 students | 6,033 | 60 | 56,876 | 64
Greater than 1000 students | 2,148 | 21 | 15,076 | 17
Total number of teachers | 10,139 | 10 | 88,495 | 90

Missing data: race/ethnicity (48), instructional level and locale (2,423), minority enrollment (3,900), free lunch (3,913), size (3,104).

Table 3.4: Characteristics of Teachers by Education Degree Attainment

Characteristic | Doctoral/Specialist n | % | Master's n | % | Bachelor's n | % | High School n | % | Other n | %
Teacher characteristics
Gender
Male | 329 | 34 | 12,536 | 24 | 11,572 | 27 | 563 | 34 | 247 | 29
Female | 626 | 66 | 39,268 | 76 | 31,816 | 73 | 1,113 | 67 | 564 | 66
Age
Less than 25 | 1 | <1 | 22 | <1 | 1,509 | 3 | 67 | 4 | 23 | 3
25-29 | 18 | 2 | 2,304 | 4 | 10,064 | 23 | 228 | 14 | 88 | 10
30-39 | 143 | 15 | 15,684 | 30 | 12,818 | 30 | 503 | 30 | 193 | 22
40-54 | 346 | 36 | 20,855 | 40 | 12,635 | 29 | 612 | 37 | 348 | 41
Over 55 | 447 | 47 | 12,939 | 25 | 6,362 | 15 | 266 | 16 | 159 | 19
Race/ethnicity
White | 844 | 88 | 46,281 | 89 | 39,370 | 91 | 1,310 | 78 | 690 | 80
Black/African American | 84 | 9 | 4,693 | 9 | 3,256 | 8 | 321 | 19 | 95 | 11
Hispanic or Latino | 11 | 1 | 412 | 1 | 404 | 1 | 30 | 1 | 13 | 1
Asian American | 13 | 1 | 262 | 1 | 208 | <1 | 11 | 1 | 9 | 1
American Indian/Alaska Native | 3 | <1 | 116 | <1 | 102 | <1 | 4 | <1 | 3 | <1
Native Hawaiian or Other Pacific Islander | - | <1 | 30 | <1 | 28 | <1 | - | <1 | 1 | <1
Multiple | - | <1 | 10 | <1 | 20 | <1 | - | <1 | - | <1
Multiple Subjects | 109 | 11 | 5,723 | 11 | 5,217 | 12 | 140 | 9 | 46 | 6
Multiple Buildings | 29 | 3 | 1,635 | 3 | 2,096 | 5 | 37 | 2 | 19 | 2
Special Education | 147 | 16 | 7,353 | 14 | 5,556 | 13 | 237 | 16 | 179 | 25
School characteristics
Instructional level
Primary | 290 | 30 | 23,319 | 45 | 19,217 | 44 | 574 | 34 | 269 | 31
Middle/Junior high school | 210 | 22 | 10,661 | 21 | 7,845 | 18 | 271 | 16 | 107 | 12
High School | 364 | 38 | 13,974 | 27 | 11,292 | 26 | 540 | 32 | 204 | 24
Other | 62 | 6 | 2,811 | 5 | 3,872 | 9 | 179 | 11 | 189 | 22
Percent minority enrollment
Less than 5% | 133 | 14 | 10,299 | 20 | 10,378 | 24 | 312 | 19 | 102 | 12
5-19% | 380 | 40 | 21,027 | 41 | 15,944 | 37 | 255 | 15 | 172 | 20
20-49% | 216 | 23 | 9,072 | 18 | 6,900 | 16 | 282 | 17 | 258 | 30
50% or greater | 183 | 19 | 9,686 | 19 | 8,374 | 19 | 648 | 39 | 161 | 19
Percent free/reduced lunch eligible
Less than 10% | 214 | 22 | 10,204 | 20 | 7,051 | 16 | 215 | 13 | 131 | 15
10-29% | 322 | 34 | 16,455 | 32 | 12,128 | 28 | 283 | 17 | 109 | 13
30-49% | 167 | 17 | 10,651 | 21 | 9,981 | 23 | 337 | 20 | 107 | 12
50-69% | 92 | 10 | 5,787 | 11 | 6,037 | 14 | 386 | 23 | 203 | 24
70% or greater | 116 | 12 | 6,979 | 13 | 6,395 | 15 | 276 | 16 | 143 | 17

Table 3.4 (cont'd)

Characteristic | Doctoral n | % | Master's n | % | Bachelor's n | % | High School n | % | Other n | %
Locale
City | 244 | 26 | 13,966 | 27 | 10,046 | 23 | 643 | 38 | 308 | 36
Urban Fringe | 568 | 59 | 23,181 | 45 | 16,333 | 38 | 516 | 31 | 239 | 28
Town | 88 | 9 | 9,674 | 19 | 9,574 | 22 | 167 | 10 | 190 | 22
Rural | 26 | 3 | 3,944 | 8 | 6,273 | 14 | 238 | 14 | 32 | 4
Enrollment size
Less than 300 students | 101 | 11 | 6,887 | 13 | 8,043 | 19 | 229 | 14 | 156 | 18
300-999 students | 524 | 55 | 33,400 | 64 | 27,606 | 64 | 989 | 59 | 409 | 48
Greater than 1000 students | 296 | 31 | 10,197 | 20 | 6,310 | 15 | 291 | 17 | 140 | 16
Total number of teachers | 955 | 1 | 51,804 | 52 | 43,388 | 44 | 1,676 | 2 | 859 | 1

Note: Missing data = 48 on race/ethnicity, 2,423 on instructional level and locale, 3,900 on minority

Table 3.5: Characteristics of Teachers by Locale

Characteristic | City n | % | Urban Fringe n | % | Town n | % | Rural n | %
Teacher characteristics
Gender
Male | 5,978 | 24 | 10,157 | 25 | 5,414 | 27 | 3,127 | 30
Female | 19,219 | 76 | 30,653 | 75 | 14,274 | 73 | 7,386 | 70
Age
Less than 25 | 363 | 1 | 747 | 2 | 305 | 2 | 161 | 2
25-29 | 2,863 | 11 | 5,789 | 14 | 2,522 | 13 | 1,228 | 12
30-39 | 6,943 | 28 | 12,985 | 32 | 5,843 | 30 | 3,013 | 29
40-54 | 8,954 | 36 | 13,582 | 33 | 7,297 | 37 | 4,075 | 39
Over 55 | 6,074 | 24 | 7,707 | 19 | 3,721 | 19 | 2,036 | 19
Race/ethnicity
White | 17,902 | 71 | 38,917 | 95 | 19,313 | 98 | 10,418 | 99
Black/African American | 6,494 | 26 | 1,372 | 3 | 160 | 1 | 11 | <1
Hispanic or Latino | 450 | 2 | 247 | 1 | 105 | 1 | 26 | <1
Asian American | 234 | 1 | 196 | <1 | 48 | <1 | 11 | <1
American Indian/Alaska Native | 92 | <1 | 46 | <1 | 39 | <1 | 40 | <1
Native Hawaiian or Other Pacific Islander | 19 | <1 | 21 | <1 | 17 | <1 | 1 | <1
Multiple | 6 | <1 | 11 | <1 | 6 | <1 | 6 | <1
Highest education
Doctoral or Specialist | 244 | 1 | 568 | 1 | 88 | <1 | 26 | <1
Master's | 13,966 | 55 | 23,181 | 57 | 9,674 | 49 | 3,944 | 38
Bachelor's | 10,046 | 40 | 16,333 | 40 | 9,574 | 49 | 6,273 | 60
High School | 643 | 3 | 516 | 1 | 167 | 1 | 238 | 2
Other | 308 | 1 | 239 | <1 | 190 | 1 | 32 | <1
Multiple Subjects | 1,534 | 6 | 4,614 | 11 | 2,949 | 15 | 2,023 | 19
Multiple Schools | 527 | 2 | 1,130 | 2 | 1,081 | 5 | 1,033 | 10
Special Education | 3,817 | 15 | 5,368 | 13 | 2,558 | 13 | 1,178 | 11
School characteristics
Instructional level
Primary | 12,874 | 51 | 18,601 | 46 | 8,074 | 41 | 4,120 | 39
Middle/Junior high school | 4,105 | 16 | 8,790 | 22 | 4,249 | 22 | 1,950 | 19
High School | 6,399 | 25 | 11,124 | 30 | 5,843 | 30 | 3,008 | 29
Other | 1,829 | 7 | 2,322 | 6 | 1,527 | 8 | 1,435 | 14
Percent minority enrollment
Less than 5% | 684 | 3 | 6,818 | 17 | 7,635 | 39 | 6,087 | 58
5-19% | 4,347 | 17 | 21,188 | 52 | 8,951 | 45 | 3,292 | 31
20-49% | 5,106 | 20 | 8,540 | 21 | 2,224 | 11 | 858 | 8
50% or greater | 14,715 | 58 | 3,884 | 10 | 306 | 2 | 147 | 1

Table 3.5 (cont'd)

Characteristic | City n | % | Urban Fringe n | % | Town n | % | Rural n | %
Percent free/reduced lunch eligible
Less than 10% | 2,764 | 11 | 10,611 | 26 | 3,274 | 17 | 1,166 | 11
10-29% | 3,839 | 15 | 16,495 | 40 | 7,150 | 36 | 1,813 | 17
30-49% | 3,045 | 12 | 7,828 | 19 | 6,163 | 31 | 4,208 | 40
50-69% | 4,934 | 20 | 3,291 | 8 | 1,916 | 10 | 2,364 | 22
70% or greater | 10,270 | 41 | 2,198 | 5 | 610 | 3 | 831 | 8
Enrollment size
Less than 300 students | 3,955 | 16 | 4,414 | 11 | 3,623 | 18 | 3,424 | 33
300-999 students | 15,763 | 63 | 26,889 | 66 | 13,462 | 68 | 6,814 | 65
Greater than 1000 students | 5,337 | 21 | 9,417 | 23 | 2,276 | 12 | 204 | 2
Total number of teachers | 25,207 | 26 | 40,831 | 42 | 19,693 | 20 | 10,513 | 11

Note: There are 2,432 teachers with missing data on locale.

Table 3.6: License Demographics for Michigan's Instructional Workforce

License type* (N) | Provisional | Professional | 18/30 Cont. | Vocational | Non-Instr. | Unknown | Total
Age
<25 | 945 | 5 | 0 | 3 | 14 | 62 | 1,029
25-29 | 8,533 | 3,217 | 0 | 40 | 75 | 128 | 11,993
30-39 | 6,485 | 22,452 | 8 | 158 | 215 | 367 | 29,685
40-54 | 3,236 | 16,521 | 13,908 | 429 | 226 | 647 | 34,967
>=55 | 519 | 7,260 | 13,445 | 178 | 122 | 245 | 21,769
Unknown | 0 | 0 | 0 | 0 | 0 | 49 | 49
Gender
Male | 5,347 | 12,357 | 6,666 | 488 | 189 | 316 | 25,363
Female | 14,371 | 37,098 | 20,695 | 320 | 463 | 1,133 | 74,080
Unknown | 0 | 0 | 0 | 0 | 0 | 49 | 49
Highest educational level
Doctoral/Specialist | 44 | 414 | 473 | 8 | 8 | 11 | 958
Master's Degree | 2,836 | 30,267 | 18,304 | 155 | 56 | 251 | 51,869
Bachelor's Degree | 16,180 | 18,051 | 8,069 | 358 | 475 | 490 | 43,623
High School | 499 | 512 | 310 | 144 | 72 | 492 | 2,029
Other | 159 | 211 | 205 | 143 | 40 | 251 | 1,009
Unknown | 0 | 0 | 0 | 0 | 1 | 3 | 4
Minority status
Non-Minority Teacher | 16,983 | 45,477 | 24,782 | 685 | 210 | 971 | 89,108
Minority Teacher | 2,735 | 3,978 | 2,579 | 123 | 442 | 478 | 10,335
Unknown | 0 | 0 | 0 | 0 | 0 | 49 | 49
Full/part-time status
Part-Time | 1,595 | 3,300 | 1,579 | 90 | 127 | 450 | 7,141
Full-Time | 18,123 | 46,155 | 25,782 | 718 | 525 | 1,048 | 92,351
Total | 19,718 | 49,455 | 27,361 | 808 | 652 | 1,498 | 99,492

* For those with multiple licenses, license type refers to the teacher's "highest" license, where Professional > 18/30 Hour Continuing > Provisional > Vocational > Other > Missing. Non-Instructional includes Administrative, Support Staff, and Substitute licenses.

Table 3.6 (cont'd)

License type (%) | Provisional | Professional | 18/30 Cont. | Vocational | Non-Instr. | Unknown
Age
<25 | 92% | 0% | 0% | 0% | 1% | 6%
25-29 | 71% | 27% | 0% | 0% | 1% | 1%
30-39 | 22% | 76% | 0% | 1% | 1% | 1%
40-54 | 9% | 47% | 40% | 1% | 1% | 2%
>=55 | 2% | 33% | 62% | 1% | 1% | 1%
Unknown | 0% | 0% | 0% | 0% | 0% | 100%
Gender
Male | 21% | 49% | 26% | 2% | 1% | 1%
Female | 19% | 50% | 28% | 0% | 1% | 2%
Unknown | 0% | 0% | 0% | 0% | 0% | 100%
Highest educational level
Doctoral/Specialist | 5% | 43% | 49% | 1% | 1% | 1%
Master's Degree | 5% | 58% | 35% | 0% | 0% | 0%
Bachelor's Degree | 37% | 41% | 18% | 1% | 1% | 1%
High School | 25% | 25% | 15% | 7% | 4% | 24%
Other | 16% | 21% | 20% | 14% | 4% | 25%
Unknown | 0% | 0% | 0% | 0% | 25% | 75%
Minority status
Non-Minority Teacher | 19% | 51% | 28% | 1% | 0% | 1%
Minority Teacher | 26% | 38% | 25% | 1% | 4% | 5%
Unknown | 0% | 0% | 0% | 0% | 0% | 100%
Full/part-time status
Part-Time | 22% | 46% | 22% | 1% | 2% | 6%
Full-Time | 20% | 50% | 28% | 1% | 1% | 1%
Total | 20% | 50% | 28% | 1% | 1% | 2%

Table 3.7: Out of Field in Core Subjects*

*Includes teachers with at least one high school assignment. Cells contain row percentages and n's.

Math assignment | Not endorsed in subject | Endorsed in subject | Total
Yes | 3% (123) | 97% (3,959) | 100% (4,082)
No | 95% (19,396) | 5% (1,016) | 100% (20,412)
Total | 80% (19,519) | 20% (4,975) | 100% (24,494)

ELA assignment | Not endorsed in subject | Endorsed in subject | Total
Yes | 3% (164) | 97% (5,017) | 100% (5,181)
No | 87% (16,865) | 13% (2,448) | 100% (19,313)
Total | 70% (17,029) | 30% (7,465) | 100% (24,494)

Science assignment | Not endorsed in subject | Endorsed in subject | Total
Yes | 2% (96) | 98% (3,875) | 100% (3,971)
No | 89% (18,321) | 11% (2,202) | 100% (20,523)
Total | 75% (18,417) | 25% (6,077) | 100% (24,494)

Social studies assignment | Not endorsed in subject | Endorsed in subject | Total
Yes | 3% (122) | 97% (4,163) | 100% (4,285)
No | 75% (15,208) | 25% (5,001) | 100% (20,209)
Total | 63% (15,330) | 37% (9,164) | 100% (24,494)

World language assignment | Not endorsed in subject | Endorsed in subject | Total
Yes | 3% (64) | 97% (1,804) | 100% (1,868)
No | 97% (21,999) | 3% (627) | 100% (22,626)
Total | 90% (22,063) | 10% (2,431) | 100% (24,494)

Table 3.8: Teaching Assignment by Endorsement Pattern (Potential "Reserve" of Teachers)

Cells contain n's and row percentages.

Endorsement | Alt | Arts | Biling Ed | Business | Career Tech | Elem | ELA | Math
Math-X | 16 (1%) | 17 (1%) | 0 (0%) | 8 (0%) | 2 (0%) | 3 (0%) | 39 (1%) | 2,228 (83%)
Lang.Arts-X | 37 (1%) | 94 (3%) | 1 (0%) | 10 (0%) | 6 (0%) | 2 (0%) | 2,292 (68%) | 207 (6%)
Science-X | 24 (1%) | 25 (1%) | 0 (0%) | 4 (0%) | 2 (0%) | 6 (0%) | 95 (3%) | 821 (30%)
SocialSci-X | 37 (1%) | 124 (3%) | 2 (0%) | 18 (0%) | 2 (0%) | 6 (0%) | 1,164 (29%) | 540 (14%)
World Lang-X | 2 (0%) | 22 (2%) | 3 (0%) | 1 (0%) | 0 (0%) | 1 (0%) | 142 (13%) | 66 (6%)

Endorsement | Science | Social Sciences | Spec Ed | Tech | Wellness | World Lang | Misc | NA | Total
Math-X | 222 (8%) | 17 (1%) | 3 (0%) | 34 (1%) | 25 (1%) | 40 (1%) | 15 (1%) | 2 (0%) | 2,671 (100%)
Lang.Arts-X | 52 (2%) | 257 (8%) | 11 (0%) | 15 (0%) | 76 (2%) | 272 (8%) | 54 (2%) | 4 (0%) | 3,390 (100%)
Science-X | 1,521 (55%) | 75 (3%) | 3 (0%) | 18 (1%) | 104 (4%) | 29 (1%) | 29 (1%) | 1 (0%) | 2,757 (100%)
SocialSci-X | 138 (3%) | 1,352 (34%) | 14 (0%) | 35 (1%) | 231 (6%) | 239 (6%) | 66 (2%) | 6 (0%) | 3,974 (100%)
World Lang-X | 10 (1%) | 35 (3%) | 0 (0%) | 2 (0%) | 7 (1%) | 767 (72%) | 9 (1%) | 1 (0%) | 1,068 (100%)

Table 3.9: Teacher Labor Supply in Michigan, 2006-2008

Category | 2006 | % | 2007 | % | 2008 | %
Overall teaching workforce
Retained in the profession | 102,365 | 92% | 102,439 | 93% | 101,379 | 94%
Left the profession (attrition rate) | 9,609 | 8% | 8,616 | 7% | 8,536 | 6%
Entered the profession | 8,690 | | 7,476 | | 6,414 |
Total number of teachers in summer count | 111,055 | | 109,915 | | 107,793 |
Within-school mobility rate
Stayers (same school) | 91,234 | 89% | 91,471 | 89% | 90,247 | 89%
Movers (different school) | 11,131 | 11% | 10,968 | 11% | 11,132 | 11%
Total retained teachers | 102,365 | 100% | 102,439 | | 101,379 |

Table 3.10: Mean Teacher Retention Rate by High School Characteristics

Characteristic | Mean | SD | N | F-test
Locale
City | 79.73 | 14.91 | 96 | .000
Suburb | 87.10 | 6.57 | 161 |
Town | 88.60 | 4.65 | 78 |
Rural | 87.37 | 5.70 | 245 |
Percent minority
Less than 5% | 87.93 | 5.14 | 174 | .000
5-10% | 88.73 | 4.96 | 141 |
10-15% | 89.33 | 3.98 | 56 |
15-65% | 86.42 | 5.80 | 132 |
Greater than 65% | 74.98 | 15.21 | 77 |
Percent free/reduced lunch
Less than 10% | 88.30 | 6.35 | 43 | .000
10-29% | 89.14 | 4.27 | 202 |
30-49% | 87.36 | 5.31 | 204 |
50-69% | 80.27 | 11.87 | 96 |
70% or greater | 76.09 | 16.19 | 35 |
School size
Less than 300 students | 81.81 | 13.23 | 85 | .000
300-999 students | 86.40 | 8.34 | 307 |
Greater than 1000 students | 87.84 | 4.86 | 188 |
Charter schools
Charter | 71.91 | 17.06 | 33 | .000
Non-charter | 87.06 | 6.91 | 547 |
Magnet schools
Magnet | 87.26 | 6.86 | 65 | .139
Non-magnet | 86.06 | 8.76 | 515 |
Percent teachers with professional licenses
Less than 80% | 79.15 | 13.33 | 134 | .000
80-90% | 87.69 | 4.66 | 297 |
Greater than 90% | 89.55 | 4.85 | 149 |
Percent minority teachers
Less than 1% | 88.15 | 5.45 | 275 | .000
1-10% | 87.81 | 4.91 | 214 |
Greater than 10% | 76.51 | 14.58 | 91 |
Percent highly qualified teachers
Less than 75% | 85.97 | 7.48 | 197 |
75-85% | 86.45 | 7.33 | 144 |
Greater than 85% | 86.22 | 10.01 | 239 |
Total | 86.20 | 8.57 | 580 |

Note: Teacher retention rate is the average teacher retention rate for each school from 2005-2008.

Table 3.11: Mean Mathematics Teacher Undersupply by High School Characteristics

Characteristic | Mean | SD | N | F-test
Locale
City | .64 | 1.68 | 94 | .000
Suburb | .83 | 1.37 | 161 |
Town | .51 | .90 | 78 |
Rural | .18 | .84 | 245 |
Percent minority
Less than 5% | .31 | .81 | 174 | .000
5-10% | .46 | 1.12 | 141 |
10-15% | .64 | 1.13 | 56 |
15-65% | .58 | 1.42 | 132 |
Greater than 65% | .63 | 1.65 | 75 |
Percent free/reduced lunch
Less than 10% | .04 | 1.60 | 43 | .000
10-29% | .67 | 1.13 | 202 |
30-49% | .37 | 1.02 | 204 |
50-69% | .62 | 1.44 | 94 |
70% or greater | .29 | 1.18 | 35 |
School size
Less than 300 students | -.15 | .73 | 83 | .000
300-999 students | .30 | .91 | 307 |
Greater than 1000 students | 1.06 | 1.52 | 188 |
Charter schools
Charter | .06 | 1.08 | 33 | .425
Non-charter | .51 | 1.21 | 545 |
Magnet schools
Magnet | .22 | 1.38 | 65 | .080
Non-magnet | .52 | 1.12 | 513 |
Percent teachers with professional licenses
Less than 80% | .35 | 1.16 | 132 | .366
80-90% | .52 | 1.31 | 297 |
Greater than 90% | .42 | 1.00 | 149 |
Percent minority teachers
Less than 1% | .28 | .98 | 275 | .000
1-10% | .72 | 1.22 | 214 |
Greater than 10% | .54 | 1.63 | 89 |
Percent highly qualified teachers
Less than 75% | .74 | 1.34 | 196 | .000
75-85% | .46 | 1.23 | 144 |
Greater than 85% | .28 | 1.02 | 238 |
Total | .48 | 1.20 | 578 |

Note: Math undersupply is calculated for SY 2009 based on the 2008 REP and student enrollment from 2006-2008.

Table 3.12: Mean English Language Arts Teacher Undersupply by High School Characteristics

Characteristic | Mean | SD | N | F-test
Locale
City | -.81 | 2.56 | 94 | .003
Suburb | -1.10 | 1.81 | 161 |
Town | -.89 | 1.06 | 78 |
Rural | -.52 | .88 | 245 |
Percent minority
Less than 5% | -.61 | 1.01 | 174 | .062
5-10% | -.69 | 1.15 | 141 |
10-15% | -1.17 | 1.79 | 56 |
15-65% | -1.00 | 1.67 | 132 |
Greater than 65% | -.67 | 2.64 | 75 |
Percent free/reduced lunch
Less than 10% | -1.43 | 1.93 | 43 | .037
10-29% | -.78 | 1.38 | 202 |
30-49% | -.79 | 1.31 | 204 |
50-69% | -.58 | 2.10 | 94 |
70% or greater | -.46 | 1.87 | 35 |
School size
Less than 300 students | -.43 | .70 | 83 | .006
300-999 students | -.71 | 1.36 | 307 |
Greater than 1000 students | -1.05 | 2.09 | 188 |
Charter schools
Charter | -.86 | 2.81 | 33 | .752
Non-charter | -.77 | 1.48 | 545 |
Magnet schools
Magnet | -.86 | 2.27 | 65 | .659
Non-magnet | -.77 | 1.48 | 513 |
Percent teachers with professional licenses
Less than 80% | -.58 | 1.91 | 132 | .130
80-90% | -.90 | 1.59 | 297 |
Greater than 90% | -.72 | 1.18 | 149 |
Percent minority teachers
Less than 1% | -.58 | 1.04 | 275 | .008
1-10% | -1.03 | 1.57 | 214 |
Greater than 10% | -.79 | 2.62 | 89 |
Percent highly qualified teachers
Less than 75% | -.59 | 1.59 | 196 | .086
75-85% | -.79 | 1.41 | 144 |
Greater than 85% | -.93 | 1.67 | 238 |
Total | -.78 | 1.58 | 578 |

Note: ELA undersupply is calculated for SY 2009 based on the 2008 REP and student enrollment from 2006-2008.

Table 3.13: Mean Science Teacher Undersupply by High School Characteristics

Characteristic | Mean | SD | N | F-test
Locale
City | -1.04 | 1.82 | 94 | .000
Suburb | -1.07 | 1.36 | 161 |
Town | -.68 | .89 | 78 |
Rural | -.50 | .71 | 245 |
Percent minority
Less than 5% | -.47 | .75 | 174 | .000
5-10% | -.73 | .93 | 141 |
10-15% | -.99 | 1.02 | 56 |
15-65% | -1.21 | 1.59 | 132 |
Greater than 65% | -.58 | 1.54 | 75 |
Percent free/reduced lunch
Less than 10% | -1.96 | 1.91 | 43 | .000
10-29% | -.90 | 1.11 | 202 |
30-49% | -.56 | .89 | 204 |
50-69% | -.53 | 1.23 | 94 |
70% or greater | -.39 | 1.16 | 35 |
School size
Less than 300 students | -.46 | .64 | 83 | .000
300-999 students | -.50 | .88 | 307 |
Greater than 1000 students | -1.34 | 1.58 | 188 |
Charter schools
Charter | -.25 | 1.09 | 33 | .464
Non-charter | -.80 | 1.20 | 545 |
Magnet schools
Magnet | -.74 | 1.13 | 65 | .501
Non-magnet | -.77 | 1.21 | 513 |
Percent teachers with professional licenses
Less than 80% | -.50 | 1.13 | 132 | .028
80-90% | -.88 | 1.28 | 297 |
Greater than 90% | -.77 | 1.06 | 149 |
Percent minority teachers
Less than 1% | -.54 | .80 | 275 | .000
1-10% | -1.06 | 1.40 | 214 |
Greater than 10% | -.77 | 1.53 | 89 |
Percent highly qualified teachers
Less than 75% | -.80 | 1.34 | 196 | .000
75-85% | -.80 | 1.31 | 144 |
Greater than 85% | -.72 | .99 | 238 |
Total | -.77 | 1.19 | 578 |

Note: Science undersupply is calculated for SY 2009 based on the 2008 REP and student enrollment from 2006-2008.

CHAPTER 4: ESTIMATING TEACHER DEMAND USING STATE ADMINISTRATIVE DATA: CHALLENGES AND RESOLUTIONS

The demand formula presented briefly in Chapter 1 provides a powerful tool that practitioners and researchers can use to estimate teacher demand in the context of specific curricular requirements. While it is easily accessible to practitioners and researchers alike, the formula makes several assumptions that bear further investigation. The purpose of this essay is to revisit the demand formula and its underlying assumptions, test those assumptions where appropriate, and make an informed decision about possible modifications to the formula for future use.
The revised formula is then used in Chapter 7, Teacher Undersupply.

Background to the Problem: Estimating Teacher Demand

In order to provide students with more rigorous high school coursework and prepare them more fully for the demands of a global technological economy, many states have increased graduation requirements. Since 2004, 18 states plus the District of Columbia report having raised graduation requirements to meet the American Diploma Project's college- and career-ready curriculum, which includes four years of challenging math and English, and an additional 12 states plan to do so in the next few years (Achieve, Inc., 2008). Forty-two states have at least some statewide requirements for high school graduation (American Association of State Colleges and Universities, 2006), and 25 states now offer an optional college-preparatory diploma (Dounay, 2006). Michigan is among the states leading these reforms: in 2006 it adopted one of the most comprehensive sets of high school graduation requirements in the country, known as the Michigan Merit Curriculum. The Merit Curriculum is meant to improve high school content and better align the Michigan diploma with business and postsecondary requirements through more rigorous curriculum, standards, and assessment (Cherry Commission, 2004). By establishing a statewide set of required courses, the Merit Curriculum creates a minimum floor that might raise expectations and achievement for students. This reform draws on a small but growing body of research suggesting that certain core courses, especially those in math and science, can have significant effects on students' long-term labor market outcomes (Goodman, 2010; Levine & Zimmerman, 1995; Rose & Betts, 2004). The intervention might also positively influence students' educational aspirations by standardizing high school course-taking around the requirements for postsecondary education (Bryk, Lee, & Holland, 1993; Lee, 2002).
While other states are increasing graduation requirements, Michigan stands out for the rigor and specificity of the courses required under the new policy, including Algebra 1, geometry, Algebra 2, Biology 1, chemistry or physics, and at least two years of foreign language. While these courses are likely to benefit student achievement,1 the question remains: will Michigan, and other states implementing similar reforms, have an adequate supply of teachers to meet the increased instructional demand created by a curriculum such as this one?

1 It is important to note that, while increased graduation requirements are generally thought to improve student achievement and graduation rates (Achieve, 2009; Balfanz & West, 2009), there is some evidence that increased graduation requirements are associated with lower high school completion rates (Lillard & DeCicca, 2001), and that mandatory high school graduation exams, a closely related policy reform, increase dropout rates, particularly among low-income students (Dee & Jacob, 2007; Jacob, 2001; Warren, Jenkins, & Kulick, 2006), with little evidence that they improve student achievement (Grodsky, Warren, & Kalogrides, 2009; Dee & Jacob, 2007). Given this mixed literature and the importance of evaluating this statewide reform, the Michigan Merit Curriculum is the topic of an ongoing IES-funded study, conducted by the Michigan Consortium for Educational Research, to establish the effectiveness of the MMC in increasing student achievement and postsecondary transitions (Michigan Consortium for Educational Research, 2010).

Estimating demand to respond to changes in curricular requirements

Traditional economic demand formulas center on estimating demand as a function of price or cost, and in general assume that demand decreases as price increases (Ehrenberg & Smith, 1997). Economists are also interested in the point at which the supply and demand curves meet, or are in equilibrium (i.e.,
there is enough supply to meet the demand with no excess). Predicting the number of teachers required to meet the instructional needs arising from a specified set of curricular requirements calls for a different type of formula. The goal is not to predict the number of teachers needed based on general cost or population changes. Instead, the goal is to take a point-in-time change (an increase in curricular requirements) and estimate the corresponding increase in teacher time necessary to meet those requirements. Much current work calculates teacher demand based on how many positions are open at a given level of compensation (Arnold, Choy, & Bobbitt, 1993; Boe & Gilford, 1992; Ingersoll, 2001; Ingersoll & Perda, 2009; Guarino, Santibanez, & Daley, 2006). This approach focuses on funded positions rather than on the number of sections of a course that must be offered to meet curricular requirements. Looking only at job openings does not account for schools that may need teachers but cannot post positions because of budgetary constraints, increased class sizes, or other organizational issues. Demand can also be expressed as the change due to growth or decline in enrollment, the student/teacher ratio, or staffing requirements, together with the loss of teachers due to attrition (Arnold, Choy, & Bobbitt, 1993). There are two types of demand: constrained demand (the number of teachers employed) and unrestricted demand (the number of teachers who might be hired without constraints; Carroll, Reichardt, & Guarino, 2000). The method for estimating demand presented here is a version of unrestricted demand, although it represents a necessary level of unrestricted demand, in that the course offerings must be made available to students in order to meet the requirements of the Merit Curriculum. There are several methods available for calculating teacher demand. One demand calculation is provided by the OECD (Santiago, 2002).
In this calculation, the number of teachers needed is determined as (student population / average class size) × (average number of required learning hours for students / teachers' teaching load). The drawback of this formula, however, is that it ignores subject-area detail, something that is remedied in the formula presented below. Another common formula for demand, usually estimated at the state or national level, divides enrollment by the pupil/teacher ratio (Boe & Gilford, 1992). Both of these approaches are applied at the national or state level, which highlights one of the challenges with many of the available demand models (e.g., the one used by NCES and the OECD model): they produce national-level projections that cannot provide adequate and accurate information for policymakers at the school and district level (Carroll, Reichardt, & Guarino, 2000). While the demand formula presented below is similar in nature to other formulas, this work makes two unique contributions. First, the method provides estimates at the school level and uses local values for elements such as enrollment and class size. Second, this paper tests the validity and accuracy of the method. Although there are several widely used ways to estimate teacher demand, far more attention has been paid to teacher supply estimation methods. This paper provides an opportunity to gather and present evidence on the extent to which demand estimation methods produce reasonable estimates.
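To make the two aggregate approaches concrete, the sketch below computes the OECD-style product and the Boe and Gilford ratio (enrollment divided by the pupil/teacher ratio, equivalently enrollment multiplied by the teacher/pupil ratio). All inputs are hypothetical numbers chosen only for illustration, and the function names are my own. Neither function takes a subject area as input, which is the limitation the paragraph above identifies.

```python
def oecd_teacher_demand(students: float, avg_class_size: float,
                        student_learning_hours: float,
                        teacher_teaching_hours: float) -> float:
    """(student population / average class size) x (student hours / teacher load)."""
    number_of_classes = students / avg_class_size
    return number_of_classes * (student_learning_hours / teacher_teaching_hours)


def ratio_teacher_demand(enrollment: float, pupil_teacher_ratio: float) -> float:
    """Teachers implied by an enrollment at a given pupil/teacher ratio."""
    return enrollment / pupil_teacher_ratio


# Hypothetical inputs: 1,000 students in classes of 25, 1,000 required
# learning hours per student, and an 800-hour teaching load per teacher.
print(oecd_teacher_demand(1000, 25, 1000, 800))  # 40 classes x 1.25 = 50.0
print(ratio_teacher_demand(1000, 20))            # 50.0
```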
Demand Formula, Assumptions, and Alterations

The demand formula is as follows:

$$D_i = \frac{a\,x_i / y}{z} = \frac{a\,x_i}{y\,z} \qquad (1)$$

where
D_i = number of teachers (FTEs) needed in school i to meet graduation requirements in a subject area
a = proportion of the student body that needs to be enrolled in the subject each year in order to meet graduation requirements
x_i = total student enrollment in school i
y = class size
z = number of periods taught per FTE per day2

2 See Keesler, Wyse & Jones (2008) on the IES website (http://ies.ed.gov/ncee/edlabs/regions/midwest/pdf/techbriefltri00508.pdf). This formula has been vetted by IES and is considered a promising tool for use by other states. While the report was a collaborative effort, the development of the formula was my individual contribution.

Assumptions

Enrollment weight (a): The proportion of the student body expected to take a given subject each year in order to meet the graduation requirements. Because four years each of mathematics and English language arts are required, it is assumed that 100% of the student body takes a math course and an English course each year; thus a = 1.0 for mathematics and English language arts. For science and social studies, a = .75 because students are required to take three years of each. These weights represent a lower-bound estimate of the number of students in a school who are taking a particular subject. They do not account for students who take more courses in a subject than required or students who have to retake courses, so the actual number of students taking courses may be higher than this estimate implies.

Student enrollment (x_i): Student enrollment is calculated as the average enrollment over the previous three years. There is no differentiation by grade level; the enrollment is for the entire student body. The implied assumption is that any factors contributing to the enrollments of the three previous years (e.g., dropout rates, cohort sizes, student attrition) will remain constant in the future.

Class size (y): Class size is assumed to be 25, reflecting common practice in many high schools. However, this assumption can be changed to reflect local conditions or other assumptions (e.g., if a policymaker wanted to test the potential effect of decreasing class size on the necessary teaching workforce, a smaller class size could be used). This number also reflects an "ideal" class size, in that it is an agreed-upon size that is not too large but also not unreasonably small given the practices and realities of most schools.

Number of periods per FTE (z): The number of periods taught per FTE per day (z) is assumed to be five.

It is important to note that this formula produces conservative estimates of demand by estimating the minimum demand given a certain set of curricular requirements. Therefore, schools whose supply is inadequate to meet this demand can be considered significantly undersupplied; more schools might be identified if the demand calculation were more liberal. The assumptions reflect common practice in many high schools. However, they do not reflect all schools and, in particular, are not valid for schools with non-traditional organizational patterns; this should be taken into account when calculating demand for individual schools.3

Testing Each Assumption

The core purpose of this paper is to revisit the assumptions of the current demand formula, evaluating the formula for rigor and for its ability to produce reliable demand estimates. The next section details the steps taken to test each assumption.

Enrollment estimates. In the original formula, enrollment is estimated by averaging three years of enrollment data for each high school in the state. However, Michigan is a state with a declining population and net out-migration.
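Formula (1) translates directly into a few lines of code. The sketch below is a minimal illustration, assuming the original specification: x_i is the simple three-year average enrollment, a takes the weights given above, and the defaults y = 25 and z = 5 follow the stated assumptions. The function name `teacher_demand` and the enrollment history are hypothetical.

```python
from statistics import mean

# Enrollment weights (a) from the text: four required years of math and ELA,
# three required years of science and social studies.
ENROLLMENT_WEIGHT = {"math": 1.0, "ela": 1.0, "science": 0.75, "social_studies": 0.75}


def teacher_demand(prior_enrollments: list[float], subject: str,
                   class_size: float = 25, periods_per_fte: float = 5) -> float:
    """Formula (1): D_i = a * x_i / (y * z), returned in FTEs.

    x_i is the simple average of the previous years' total enrollment
    (the original specification), a is the subject's enrollment weight,
    y is class size, and z is periods taught per FTE per day.
    """
    x_i = mean(prior_enrollments)
    a = ENROLLMENT_WEIGHT[subject]
    sections_needed = a * x_i / class_size    # class sections the subject must offer
    return sections_needed / periods_per_fte  # FTEs needed to teach those sections


enrollments = [980, 1000, 1020]  # hypothetical three-year enrollment history
print(teacher_demand(enrollments, "math"))     # 8.0
print(teacher_demand(enrollments, "science"))  # 6.0
```

Computing the number of sections first and then dividing by z mirrors the nested fraction in formula (1) and makes it easy to report sections as well as FTEs.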
Additionally, it is not clear from past enrollments how "shocks" to the system, such as significant economic decline, major layoffs and factory closings, and home foreclosures and bankruptcies (all of which Michigan has experienced heavily), affect school enrollments. Finally, as mentioned previously, simply averaging the previous three years of enrollment data assumes that any trends in enrollment represented by those data (e.g., dropout rate, incoming class size) will remain relatively static. To address this, enrollment projections are calculated using single exponential smoothing with a 0.4 smoothing constant, following the method NCES uses to project state-level public school enrollments (Hussar & Bailey, 2008), as well as with a 0.7 smoothing constant for comparison. Exponential smoothing places more weight on recent observations than on earlier ones (Hussar & Bailey, 2008). By using three years of data and these smoothing constants, the enrollment projections reflect the realities of enrollment in Michigan more precisely than a simple average does.

3 The 580 high schools included in this study are schools with more "traditional" configurations, as alternative schools and schools that were not coded as "regular" for any reason were not included. Therefore, this formula applies reasonably well to the sample analyzed in this dissertation. See Chapter 2: Data and Methods, for more information on sample selection.
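The sketch below shows one common form of single exponential smoothing applied to a three-year enrollment history, using the two smoothing constants named above. It assumes the recursion S_t = alpha * y_t + (1 - alpha) * S_(t-1), initialized with the first observed year, and takes the last smoothed value as the next-year projection. The initialization choice and the function name are my assumptions, and the NCES implementation may differ in its details.

```python
def exponential_smoothing_forecast(history: list[float], alpha: float) -> float:
    """Single exponential smoothing; returns the next-period forecast.

    Assumed recursion: S_1 = y_1 and S_t = alpha * y_t + (1 - alpha) * S_{t-1}.
    The forecast for the next year is the last smoothed value. Initializing
    with the first observation is an assumption of this sketch.
    """
    if not history:
        raise ValueError("history must contain at least one year of enrollment")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = history[0]
    for observed in history[1:]:
        smoothed = alpha * observed + (1 - alpha) * smoothed
    return smoothed


history = [1100, 1050, 1000]  # hypothetical declining enrollment, oldest year first
print(sum(history) / len(history))                             # 1050.0 (simple average)
print(round(exponential_smoothing_forecast(history, 0.4), 1))  # 1048.0
print(round(exponential_smoothing_forecast(history, 0.7), 1))  # 1019.5
```

With only three years of data the result is sensitive to initialization: under alpha = 0.4 the oldest year still carries a weight of (1 - 0.4)^2 = 0.36, so the projection barely departs from the simple average, while alpha = 0.7 tracks the recent decline much more closely.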
Districts and states can use a variety of methods to predict overall cohort enrollment, such as the cohort survival model (California Department of Finance, 2006); the ratio model (Campbell, 1997); the extended demographic model, which uses enrollment data, birth records, student dropouts, and migration and grade retention records (Campbell, 1997); and a multiregional cohort enrollment model that allows for intradistrict mobility and school choice and requires a shorter time scale (Sweeney & Middleton, 2005). These strategies were all researched and considered for use in this analysis. However, they are most appropriate for predicting enrollment for larger units of analysis, such as districts, and they require data inputs that are not readily available at the school level, such as birth records. The weighted average with a smoothing constant allowed individual school enrollment projections to be calculated from available data.

Class size and FTEs taught per person. In the original application of the formula with Michigan data, class size was assumed to be 25, and the state was provided with alternate estimates under assumptions of 15, 20, and 30 students per class. The number of FTEs taught per teacher per day was assumed to be five. Because both of these numbers are in the denominator, even small changes to them can change the demand estimates considerably, so it is particularly important that they be accurate.

Class size is a critical and well-researched issue in education, although there is no consensus on its effect. Even studies of actual class size (not student-teacher ratio) produce mixed findings. Some meta-analyses show that reduced class sizes do not systematically lead to improved student achievement (Hanushek, 1997), while others (Hedges, Laine, & Greenwald, 1994; Krueger, 2000) suggest that the effect may be more positive.
Results from the Tennessee STAR experiment, one of the largest randomized field experiments of reduced class sizes, show that reduced class size has a significantly positive effect on test scores, particularly for children from disadvantaged backgrounds (Finn & Achilles, 1990; Krueger, 1999). However, others question these effects (Hanushek, 1999), arguing that bias, implementation issues, and design issues may reduce the magnitude of any treatment effect, and that the findings do not necessarily justify the expense of reducing class sizes. Twenty-five students was assumed to be a reasonable class size: small enough not to place unreasonable instructional burdens on the teacher, but large enough not to reflect the more costly "class size reduction" reforms. One important consideration, however, is that using an assumed class size allows a principal, district superintendent, or state policymaker to estimate demand based on an ideal class size, or on the class sizes that would result from a proposed change as part of an educational intervention. There may be situations in which it is not desirable to use the exact class size, because the demand projection may need to reflect desired conditions rather than current conditions. For example, if a school is already undersupplied and is using large mathematics classes to meet demand, then calculating the student-teacher ratio in mathematics and using it to estimate demand is not helpful: the school will appear adequately staffed when in fact it is undersupplied, which is why its classes are large.
For this reason, the class size will remain at an estimated level (25) for this analysis.4 Given the budgetary restrictions and severe economic climate facing schools today, reflecting actual class sizes is not optimal, as many school districts are dealing with teacher and funding shortages by demanding more of their current instructional staff. This is a situation in which reflecting reality is not desirable; reflecting a reasonable class size goal will produce more realistic estimates of which schools are likely struggling to meet their instructional needs given their current staff. FTEs per person are assumed to be five, based on common practice and the state's suggestion. This, too, will be left as an assumption, as it may be another area negatively affected by budgetary shortfalls. However, to test the possible implications of varying FTEs, an assumed distribution is generated and used for comparative purposes (see below).

4 In many traditional demand calculations, class size is approximated by the student-teacher ratio (Santiago, 2002), which precludes a subject-specific student/teacher ratio. It was possible to calculate a student/teacher ratio for each subject for this analysis, but this strategy was rejected because it has the same limitation as using the actual class size: it may reflect reality and therefore not allow areas of undersupply to be identified. Additionally, it does not address the fact that some classes may be small while others are large within the same subject.

Weighting the enrollment. Currently, the (a) value is defined as 1.0 for mathematics and English language arts and .75 for science and social studies. This is based on the assumption that, if students need to take four years of mathematics and English language arts, they will take both subjects in each year they are in school. Thus all students in a given school will be taking a mathematics class and an English language arts class each year, which means that courses must be available for 100% of the total enrollment. In reality, this is likely to be a lower bound estimate, as students may elect to take more than the minimum number of courses or may need to retake courses to obtain credit.5 Michigan does not currently have student transcript data available, making it virtually impossible to construct individual course-taking histories from the data system and thus to know the number of courses taken in each subject by each student in a given year. However, beginning in the winter of 2011, Michigan will begin collecting biannual student transcript data from every high school in the state via the Michigan eTranscript initiative. At that time, data will be available to examine course-taking patterns for Michigan students, and this weight on the enrollment will be adjusted accordingly. Given the lack of reasonable options for these data at the time of this writing, however, the previously defined weights will continue to be used. As stated above, they produce conservative estimates of demand, as it is unlikely that students take fewer courses per year than the weights reflect, and they may in fact take more, which would increase demand.

Challenges with ratios. The demand formula outlined above is a ratio of enrollment to class size and FTEs per teacher. Ratios can be very sensitive to small changes in the denominator (Rice, 2007).
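To make this sensitivity concrete, the sketch below holds a hypothetical weighted enrollment of 1,000 fixed and varies one assumed denominator value at a time, including the alternate class sizes of 15, 20, and 30 that were offered to the state. The function names and the enrollment figure are illustrative only.

```python
def demand_fte(weighted_enrollment: float, class_size: float,
               periods_per_fte: float) -> float:
    # Demand is enrollment over the product of the two assumed values,
    # so both assumptions sit in the denominator.
    return weighted_enrollment / (class_size * periods_per_fte)


def pct_change(new: float, base: float) -> float:
    return 100.0 * (new - base) / base


baseline = demand_fte(1000, 25, 5)  # 8.0 FTEs

for class_size in (15, 20, 24, 25, 30):
    d = demand_fte(1000, class_size, 5)
    print(f"class size {class_size:>2}: {d:5.2f} FTEs ({pct_change(d, baseline):+.1f}%)")

for periods in (4, 5, 6):
    d = demand_fte(1000, 25, periods)
    print(f"periods/FTE {periods}: {d:5.2f} FTEs ({pct_change(d, baseline):+.1f}%)")
```

Moving from 25 to 24 students raises demand by about 4%, and moving from five periods to four raises it by 25%, which illustrates why the two denominator assumptions deserve scrutiny.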
There are statistical methods for constructing confidence intervals around ratio estimates (the delta method or Fieller's method); however, these methods require variance estimates for both the numerator and the denominator. In the current equation, variance estimates could be constructed around enrollment, but not around class size or FTEs, because those are constants. I can, however, assume a sampling distribution around both class size and FTEs and use that assumed distribution to test the sensitivity of the ratio to changes in class size or FTEs. For example, I can assume that class size has a standard deviation of 0.5, which would mean that about 68% of all class sizes fall within half a student of 25 (from 24.5 to 25.5). I can then calculate the variance of the ratio, see how variable the demand calculation is under that assumed standard deviation, and repeat the procedure with other standard deviations (Rice, 2007).

5 Interestingly, in May 2010, the Michigan legislature passed a bill repealing the requirement that students pass Algebra II. This requirement was likely to have caused more students to retake mathematics courses (Detroit Free Press, May 7, 2010). Anecdotal evidence also suggests that students who failed Algebra II in school are obtaining that credit via credit recovery through programs like Michigan Virtual University (MCER internal meeting, May 7, 2010). These trends suggest that a weight of 1.0 may not be a lower bound if students take Algebra II only once in high school, or are not required to take it at all.

Testing and Validating the Formula

In the next section, the distributions of demand and undersupply estimates under the original formula are compared with those under the tested formula. The complicating aspect is that neither method is, by definition, "better" than the other, so the comparison is difficult.
It can be assumed that the second method is more rigorous because of its added computation, but this may be a tenuous assumption. The second method may produce estimates that are closer to actual conditions but less valuable for highlighting schools that could use additional staffing resources. Therefore, the key questions addressed when testing the formula are: Do the two formulas produce similar estimates? If not, how are they different? What is the range of demand under the original formula versus the new formula? Do the two formulas produce distributions with different qualities? More importantly, for which schools are the differences in estimates most pronounced? In other words, do the two methods work equally well for traditional schools, with one having an advantage in less traditional schools? Rather than applying an absolute test or standard, the distributional properties produced under each method are examined for patterns and evidence, and the strengths and weaknesses of each method are discussed, with evidence for using one over the other arrayed according to a variety of criteria. The end goal is to recommend which formula policymakers should use. Is there evidence that the more computationally rigorous formula produces "better" estimates? Is there evidence that one formula works more efficiently than the other for a certain type of school? These conclusions are necessary addenda to the method, giving end users better guidance on how to use the formula and how to understand its limitations.

Analytic Strategy

To implement and test the modifications described above, this chapter first tests various enrollment specifications and identifies the one that seems optimal by comparing the distributions of demand and undersupply under each specification.
Next, distributional assumptions are added to class size and classes taught per FTE, and these are used in conjunction with the enrollment estimate identified in the first step. Distributions of demand and undersupply are again produced and compared. The final stage investigates the differential functioning of the demand scenarios by examining changes in the distribution of undersupply by school characteristics, and changes in the classification of significant undersupply under different demand specifications. The chapter concludes by identifying the optimal demand formula based on all accumulated evidence; this formula is then used to conduct the undersupply analyses in Chapter 7. As stated above, no single test conclusively proves which demand formula is the "best." The approach taken, therefore, is to accumulate evidence to support or reject modifications to assumptions: iteratively implementing a modification, assessing the distributional changes, and making an appropriate decision based on the evidence.

Results

Part I: Choosing an optimal enrollment estimate

Enrollment was estimated using three methods: a simple three-year average; a projected enrollment using three years of data and a 0.7 smoothing constant; and a projected enrollment using three years of data and a 0.4 smoothing constant. The formula for projected enrollment with a smoothing constant is:

$$P = \alpha X_t + \alpha(1-\alpha)X_{t-1} + \alpha(1-\alpha)^2 X_{t-2}$$

where $P$ = projected value, $\alpha$ = smoothing constant (either 0.4 or 0.7), and $X_t$ = observation for time $t$.

Figure 4.1 presents histograms of mathematics demand under the three enrollment estimates, followed by the corresponding undersupply distributions. For the sake of parsimony, only mathematics is presented here, although similar calculations and comparisons were conducted for English language arts and science; those histograms are included in Technical Appendix B.
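The projection formula translates directly into a short function. This is a minimal sketch that assumes exactly three annual observations ordered oldest to newest; the function names and the example enrollments are hypothetical. One property of the formula as written is worth seeing in numbers: because it stops at three terms, its weights sum to 1 − (1 − α)³, which is 0.973 for α = 0.7 and 0.784 for α = 0.4, so even a school with flat enrollment receives a projection somewhat below its observed level, more so under α = 0.4.

```python
def projected_enrollment(history, alpha):
    """Three-term exponential smoothing projection, as in the formula above.

    history -- three annual enrollments ordered oldest to newest:
               [X_{t-2}, X_{t-1}, X_t]
    alpha   -- smoothing constant (0.4 or 0.7 in this chapter)
    """
    if len(history) != 3:
        raise ValueError("expects exactly three years of enrollment")
    x_t2, x_t1, x_t = history
    return (alpha * x_t
            + alpha * (1 - alpha) * x_t1
            + alpha * (1 - alpha) ** 2 * x_t2)


def three_year_average(history):
    """The original formula's unweighted enrollment estimate."""
    return sum(history) / len(history)


declining = [1100, 1050, 1000]  # hypothetical school losing 50 students a year
for alpha in (0.4, 0.7):
    weight_sum = 1 - (1 - alpha) ** 3  # total weight on the three observations
    print(alpha, round(projected_enrollment(declining, alpha), 1), round(weight_sum, 3))
print("average:", three_year_average(declining))
```

For this hypothetical declining school, α = 0.7 projects about 989.8 students, α = 0.4 about 810.4, and the simple average gives 1,050.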
Figure 4.1: Comparison of Mathematics Demand with Varying Enrollment Estimates [histograms of density by mathematics demand; panels: Figure 4.1.1: Original formula; Figure 4.1.2: Smoothing constant 0.4; Figure 4.1.3: Smoothing constant 0.7]

The histograms are shaped similarly, which suggests that the distributions of demand do not differ dramatically across enrollment estimation methods. The distribution is slightly more spread out under the original formula and the 0.7 smoothing constant, while the 0.4 smoothing constant produces a slightly more truncated distribution. The three undersupply distributions are presented below:

Figure 4.2: Comparison of Mathematics Undersupply with Different Enrollment Estimates [histograms of density by mathematics undersupply; panels: Figure 4.2.1: Original formula; Figure 4.2.2: Smoothing constant 0.4; Figure 4.2.3: Smoothing constant 0.7]

Here, there are clear differences in the distributions of undersupply. The original formula produces an undersupply distribution weighted toward positive numbers, where positive numbers indicate greater undersupply. The 0.7 smoothing constant produces a distribution shaped like that of the original enrollment estimates, but less spread out and more peaked. Finally, the 0.4 smoothing constant produces a markedly different distribution, more truncated and with more of its mass in negative numbers, which indicate adequate supply. This mirrors the demand estimates: when the demand distributions have longer tails, more schools are identified as undersupplied.

Which enrollment estimate is most appropriate for this analysis? Using a smoothing constant to generate a projected enrollment is conceptually an important modification, given Michigan's shifting enrollments and declining population.
Which smoothing constant is most appropriate? Given the need to identify undersupply appropriately while taking changes in enrollment into account, the 0.7 smoothing constant will be used for enrollment projections, because with this constant the weight of earlier observations decreases rapidly (Hussar & Bailey, 2008). This is more appropriate for Michigan, where enrollments can decline quickly in a given school. Additionally, NCES suggests using a higher smoothing constant when the data are population estimates rather than sample estimates. The distributions for ELA and science demand and undersupply are included in Technical Appendix B.

Part II: Distributional assumptions for class size and courses per FTE: Addressing the issues with ratios

The demand formula is a ratio: enrollment divided by class size, with that entire quantity then divided by periods taught per FTE. However, the denominators (class size and courses taught per FTE) are assumed values, so little is known about their possible range. Consider a ratio Z = Y/X in which Y and X are measured with some error (indicated by their standard deviations). The ratio Z will also have a standard deviation, and its size (and thus the precision with which the ratio is measured) depends on several factors, including (1) the precision with which X and Y are measured and (2) the correlation between X and Y. If X and Y are measured with great precision, then the difference between E(Z), the expectation of Z, and the actual Z is small. If X and Y are not measured precisely, however, the difference between the ratio and its expected value is large. Furthermore, ratios are more variable when the denominator is small, and a positive correlation between X and Y decreases the variance of the ratio (Rice, 2007). In this case, the numerator, Y, is the enrollment, and the denominator, X, is the product of class size and the number of courses taught per FTE.
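The first-order (delta-method) approximation behind this reasoning can be sketched as follows. The function names are hypothetical, the approximation assumes the denominator stays well away from zero, and the example treats enrollment as known while class size ~N(25, 1) and periods per FTE ~N(5, .25) are assumed independent; the standard deviation of their product is computed exactly and then passed to the ratio approximation.

```python
import math


def ratio_delta_sd(mu_y: float, sd_y: float, mu_x: float, sd_x: float,
                   rho: float = 0.0) -> float:
    """Delta-method approximation to SD(Y / X).

    rho is the correlation between Y and X; a positive rho shrinks the
    result, and a small mu_x inflates it through the cv_x term.
    """
    z = mu_y / mu_x
    cv_y, cv_x = sd_y / mu_y, sd_x / mu_x  # coefficients of variation
    rel_var = cv_y ** 2 + cv_x ** 2 - 2 * rho * cv_y * cv_x
    return abs(z) * math.sqrt(max(rel_var, 0.0))


def product_sd(mu_a: float, sd_a: float, mu_b: float, sd_b: float) -> float:
    """Exact SD of A * B when A and B are independent."""
    var = (mu_a * sd_b) ** 2 + (mu_b * sd_a) ** 2 + (sd_a * sd_b) ** 2
    return math.sqrt(var)


# Hypothetical: enrollment treated as known (SD 0); denominator = class size * periods.
sd_denominator = product_sd(25, 1.0, 5, 0.25)  # about 8.01 around a mean of 125
sd_demand = ratio_delta_sd(1000, 0.0, 125, sd_denominator)
print(sd_denominator, sd_demand)
```

Under these assumptions the approximate standard deviation of demand is about half an FTE around the fixed-value estimate of 8.0 FTEs.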
Both class size and courses per FTE are assumed, with class size set at 25 and courses per FTE at 5. As assumptions, they have no standard deviation. In practice, however, class sizes and numbers of courses taught vary. The wider this distribution (i.e., the less precisely these quantities are measured), the more variable the demand calculation will be. Given that these values are assumptions, designed to be adjusted by the end user to reflect changing conditions or to test demand under planned circumstances, assuming a distribution is somewhat of an academic exercise. However, when demand is estimated for every school in the state, it is important to quantify how variable the estimates of undersupply could be across a range of values for the assumptions. This is akin to a sensitivity analysis: quantifying how sensitive the demand formula is to changes in the assumptions, and how much those assumptions would have to change to invalidate inferences about undersupply. This is evaluated empirically, as described below.

To quantify the sensitivity of the ratio to changes in the denominator, I generated two new variables for class size: one with a mean of 25 and a standard deviation of 1 [~N(25, 1)], and one with a mean of 25 and a standard deviation of .5 [~N(25, .5)]. I also generated a variable for courses taught per FTE, with a mean of 5 and a standard deviation of .25 [~N(5, .25)]. This narrower spread reflects the fact that the number of courses taught is bounded by the realities of a teaching day: no teacher teaches eight courses in a day, or as few as two. With a standard deviation of .25, the generated variable has a lower bound of 4.28 courses and an upper bound of 5.9 courses.
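A simulation in the spirit of the generated variables described above can be sketched with Python's standard library. It assumes independent normal draws for class size and periods per FTE, a hypothetical weighted enrollment of 1,000, and an arbitrary seed; it is not the code used to produce the chapter's figures. Because 1/x is convex, Jensen's inequality implies that the mean of the simulated demand exceeds the fixed-value demand of 8.0 FTEs and that the draws are right-skewed, which may be one contributor to the elongated right tails seen when distributional assumptions are imposed. The last lines check the share of an N(25, .5) class-size distribution lying between 24.5 and 25.5.

```python
import random
import statistics
from statistics import NormalDist


def simulate_demand(weighted_enrollment: float, class_sd: float,
                    periods_sd: float, n_draws: int = 100_000,
                    seed: int = 2010) -> list:
    """Demand draws with class size ~N(25, class_sd), periods ~N(5, periods_sd)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    draws = []
    for _ in range(n_draws):
        class_size = rng.gauss(25, class_sd)
        periods = rng.gauss(5, periods_sd)
        draws.append(weighted_enrollment / (class_size * periods))
    return draws


fixed_demand = 1000 / (25 * 5)  # 8.0 FTEs with the assumptions held fixed

for class_sd in (1.0, 0.5):
    draws = simulate_demand(1000, class_sd, 0.25)
    cuts = statistics.quantiles(draws, n=20)  # 5%, 10%, ..., 95% cut points
    print(f"class size SD {class_sd}: mean {statistics.fmean(draws):.3f}, "
          f"5th pct {cuts[0]:.2f}, 95th pct {cuts[-1]:.2f}")

# Share of class sizes within 24.5-25.5 when SD = 0.5 (one SD either side)
class_dist = NormalDist(25, 0.5)
share = class_dist.cdf(25.5) - class_dist.cdf(24.5)
print(f"{share:.4f}")  # 0.6827
```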
For mathematics only (again for parsimony; ELA and science are included in Technical Appendix C), Figure 4.3 compares the distributions of demand under four scenarios: (1) the original demand formula; (2) enrollment with a .7 smoothing constant (selected as the optimal enrollment estimate in the analyses above); (3) enrollment with .7 smoothing, class size ~N(25, 1), and courses ~N(5, .25); and (4) enrollment with .7 smoothing, class size ~N(25, .5), and courses ~N(5, .25).

Figure 4.3: Comparison of Demand Estimates Under Four Estimation Scenarios [histograms of density by mathematics demand; panels: Figure 4.3.1: Scenario 1: Original formula; Figure 4.3.2: Scenario 2: Enrollment 0.7 smoothing constant; Figure 4.3.3: Scenario 3: 0.7 smooth, class size~N(25, 1), courses~N(5, .25); Figure 4.3.4: Scenario 4: 0.7 smooth, class size~N(25, .5), courses~N(5, .25)]

Changing the enrollment estimate from a simple average to a weighted average with a smoothing constant does not appear to change the distribution of demand greatly, but the distributions do change in scenarios (3) and (4). The tail of the distribution is more elongated and more positively skewed, which suggests a greater amount of estimated demand when distributional assumptions are placed on class size and the number of courses taught per FTE.

Figure 4.4: Comparison of Undersupply Under Four Estimation Scenarios [histograms of density by mathematics undersupply; panels: Figure 4.4.1: Scenario 1: Original formula; Figure 4.4.2: Scenario 2: Enrollment 0.7 smoothing constant; Figure 4.4.3: Scenario 3: 0.7 smooth, class size~N(25, 1), courses~N(5, .25); Figure 4.4.4: Scenario 4: 0.7 smooth, class size~N(25, .5), courses~N(5, .25)]

Figure 4.4 shows the distribution of undersupply under the four scenarios. As noted above, the undersupply distributions are more conservative than under the original formula when the 0.7 smoothing constant is used, which is why this enrollment estimate was selected for the final analyses. When distributional assumptions are placed on class size and FTE, in scenarios (3) and (4), the distributions become more peaked, but there are also more outliers. This reflects the fact that small changes in the denominator can yield estimates that are more varied than those generated using only assumed values.

Do different demand formula assumptions produce estimates that vary by school characteristics?

The analyses above demonstrate the shape and characteristics of the distributions of demand and undersupply under differing sets of assumptions. The second important question is whether the demand estimates lead to an identification of undersupplied schools that varies with school characteristics. In other words, is there differential formula functioning by school characteristics? To investigate this, mean mathematics, English language arts, and science undersupply were calculated by a set of school characteristics. This table was first introduced in Chapter 3 to demonstrate the descriptive relationship between undersupply and school characteristics.
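The check for differential functioning can be sketched in a few lines: compute mean undersupply within each category of a school characteristic, then compare the ordering of categories across demand specifications. The row layout and helper names are hypothetical; the locale means in the usage lines are the mathematics values reported in Table 4.1.

```python
from collections import defaultdict


def group_means(rows, group_key, value_key):
    """Mean of value_key within each level of group_key."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


def ranking(means):
    """Category labels ordered from highest to lowest mean undersupply."""
    return sorted(means, key=means.get, reverse=True)


# Mean mathematics undersupply by locale, from Table 4.1
by_spec = {
    "smoothing": {"City": .64, "Suburb": .83, "Town": .51, "Rural": .18},
    "original": {"City": .98, "Suburb": 1.07, "Town": .74, "Rural": .32},
    "smoothing + distributions": {"City": .69, "Suburb": .90, "Town": .52, "Rural": .19},
}
orders = {spec: ranking(means) for spec, means in by_spec.items()}
print(orders["original"])                             # ['Suburb', 'City', 'Town', 'Rural']
print(len({tuple(o) for o in orders.values()}) == 1)  # True: same ordering everywhere
```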
Tables 4.1, 4.2, and 4.3 present mean mathematics, English language arts, and science undersupply by school characteristics under three different demand formula assumptions: (1) the original demand formula, (2) a 0.7 smoothing constant on the enrollment, and (3) a 0.7 smoothing constant on the enrollment with distributional assumptions on class size [~N(25, .5)] and courses per FTE [~N(5, .25)]. If mean undersupply by school characteristics varies across demand specifications, this is evidence that certain formulas function differentially. The comparative analyses in Tables 4.1, 4.2, and 4.3 demonstrate that, under each demand formula specification, the pattern of relationships between undersupply and school characteristics is the same. The magnitudes differ, reflecting the fact that the original demand formula produced more generous estimates of undersupply, but the ordering is the same. For example, in Table 4.1, mean mathematics undersupply across the four locale types using a demand formula with a smoothing constant on enrollment is .64 for city schools, .81 for suburban schools, .51 for town schools, and .18 for rural schools. Using the original formula, it is .98 for city schools, 1.07 for suburban schools, .74 for town schools, and .32 for rural schools, the same ordering as under the first formula. Finally, under the most complex formula, with a smoothing constant on enrollment and distributional assumptions on class size and FTE, mean mathematics undersupply by locale is .69 for city schools, .90 for suburban schools, .52 for town schools, and .19 for rural schools. In all three specifications, rural schools have the lowest mean mathematics undersupply, and suburban schools have the highest.
This is true for each subject and for each set of characteristics, which provides strong evidence that, regardless of demand formula specification, the formulas work similarly for all school types.

How does the categorization of undersupply differ in each of the four situations?

A second critical question, and possibly a more policy-relevant one, is the extent to which determinations of significant undersupply (i.e., more than one FTE needed) change across estimation methods. Although the continuous undersupply calculations are helpful for understanding changes in teacher supply more broadly, schools, districts, and states will likely want to know, "Is my undersupply large enough that I may have a staffing problem?" Therefore, it is important to compare how schools are classified as undersupplied or not undersupplied under each formula. In Table 4.4, the categorizations of undersupply are compared across situations. The classification results show that all modifications to the original formula lead to fewer schools being classified as significantly undersupplied. This suggests that such modifications are important because they keep the demand formula conservative in the estimates it produces. One reason these modifications matter is that many schools have declining enrollments. With an unweighted three-year average, each year contributes equally to the final enrollment estimate; with a smoothing constant, older observations count for less than newer ones. Therefore, predicted enrollments under the smoothing constant are lower than three-year average enrollments, which in turn produces lower demand estimates. Estimates under the two distributional assumptions on the standard deviation of class size are very similar to each other. For this reason, Situation (2), the enrollment projection method alone, appears to be the optimal method.
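Classifying schools against the one-FTE threshold and cross-tabulating two specifications, as in Table 4.4, can be sketched as follows. Treating "more than one FTE" as a strict inequality is an assumption, the helper names are hypothetical, and the undersupply values are invented for illustration.

```python
from collections import Counter

SIGNIFICANT_FTE = 1.0  # "significant" undersupply: more than one FTE needed


def is_significant(undersupply: float) -> bool:
    return undersupply > SIGNIFICANT_FTE


def cross_classify(under_a, under_b):
    """Count schools by (significant under A, significant under B)."""
    if len(under_a) != len(under_b):
        raise ValueError("both sequences must cover the same schools")
    return Counter((is_significant(a), is_significant(b))
                   for a, b in zip(under_a, under_b))


# Hypothetical undersupply (FTEs) for five schools under two specifications
original = [1.40, 0.60, 2.10, 1.05, -0.30]
smoothed = [0.90, 0.40, 1.80, 0.70, -0.50]

table = cross_classify(original, smoothed)
switched = table[(True, False)] + table[(False, True)]  # off-diagonal cells
print(dict(table), "classified differently:", switched)
```

In this toy example, two schools flagged under the original values fall below the threshold once the smoothed values are used.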
Table 4.5 presents cross-classifications of undersupply between Scenario 2 and the other scenarios, to show how school classifications vary. There were 34 schools classified as undersupplied under the original formula that were not undersupplied when using the estimated enrollments in Situation 2 (see A in Table 4.5). Are those schools likely to experience an undersupply, or were they falsely tagged by the original estimate, which did not take potential declining enrollments into account? Similarly, 34 schools were not undersupplied under Situation (2) but were undersupplied under Situation (3), and 16 schools were undersupplied in (2) but not in (3) (see B in Table 4.5). This means that assuming a distribution of N(25, 1) around class size leads to 50 schools being classified differently. For Situations 2 and 4, the number of schools classified differently is 47, and they are arrayed similarly to those in Situations 2 and 3 (see C in Table 4.5). This suggests that if a distribution is going to be assumed around class size and FTEs, it is better to use the more conservative class size ~N(25, .5). Finally, for A and C, I identified the schools that were "misclassified" (i.e., classified as undersupplied in one analysis but not the other) and ran simple cross-tabulations with chi-square tests for significant differences on four key structural characteristics of schools: locale, percent minority, percent free and reduced lunch, and school size. The purpose was to ascertain whether misclassifications are more or less likely to happen to certain types of schools. Table 4.6 shows the cross-tabulations for A (Situation 1 versus Situation 2), and Table 4.7 shows them for C (Situation 4 versus Situation 2). The difference between Situations 1 and 2 is the inclusion of the 0.7 smoothing constant on the enrollment projections.
This led to more conservative estimates of demand and fewer schools being identified as significantly undersupplied. There are no statistically significant differences on locale, school minority composition, or free and reduced lunch between schools categorized differently by the two formulas and those categorized the same. Small schools were significantly less likely than large schools to be classified differently under the two demand formulas. This suggests that larger schools are more susceptible to differences in how the demand formula is specified. When enrollment was projected using the 0.7 smoothing constant (Situation 2) rather than averaged over three years as in the original formula (Situation 1), these larger schools had lower enrollments and thus less undersupply. This is likely a positive modification, as the previous enrollment estimates were overestimating the number of students in these schools, and thus overestimating demand.

Table 4.7 presents the cross-tabulations for C (from Table 4.5), examining misclassifications between Situation 2 (0.7 smoothing constant on enrollment) and Situation 4 (smoothing constant plus distributional assumptions on class size and FTE). The assumed distribution around class size and courses per FTE in Situation 4 is the more conservative of the two (compared with Situation 3). Again, as in Table 4.6, there are no statistically significant differences on locale, percent minority, or percent free or reduced lunch between schools classified the same way under each formula and those classified differently. Fewer small schools, and more large schools, were misclassified than expected.

To summarize, there is no evidence that any of these demand specifications functions differently based on school locale, percent minority students, or percent free and reduced lunch students.
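The chi-square tests of whether misclassification is independent of a school characteristic can be sketched without external libraries, using the closed-form chi-square tail probability for integer degrees of freedom. The function names are hypothetical, and the contingency table is invented for illustration (rows: classified differently or the same; columns: three school-size groups); in practice a statistics package would normally be used.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum over i < df/2 of y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus half-integer incomplete-gamma terms
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows = classified differently / the same;
# columns = small, medium, large schools.
counts = [[5, 20, 25],
          [95, 280, 175]]
stat, df, p = chi_square_independence(counts)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```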
As seen in the comparison of mean undersupply by school characteristics, the patterns are the same across specifications, although the absolute quantities in each category differ. Even when the "significant" threshold is used, the formula does not function differentially. The original formula, without the enrollment projection methodology, was overestimating undersupply for large schools. This provides firm evidence that using a better enrollment measure improved the formula.

Conclusions

For the enrollment estimates, using a weighted average appears critical to keeping the analysis on the conservative side and avoiding Type I errors (identifying schools as undersupplied when they are not). A district or school using this formula individually can decide whether to use an average, a weighted average, or its own prediction of future enrollment based on its own experience. The distribution of undersupply across types of schools does not differ regardless of the demand formula specification. Changes in the classification of significant undersupply are largely due to the shape of the distributions: imposing a 1.0 FTE cut point as the threshold for "significant" undersupply places that cut point at different locations on the different distributions. Under the original formula, more schools fell above that cut point, because the formula was more generous and because enrollment was likely overestimated by the simple average, particularly in large schools. Therefore, some schools were classified as undersupplied that may not actually have experienced undersupply.6

6 It is possible to verify the accuracy of this formula by using the next year's data (in this case, REP 2009) and assessing the extent to which the schools predicted to be undersupplied are actually undersupplied. This was not done for this analysis, as the focus was on improving the demand formula as an estimation strategy, but it will be done in the future as a final verification step, when the 2009 data are made available by MDE.

Assuming a distribution around class size and courses is problematic. It is instructive to know the degree to which these assumptions can change classifications. However, using the assumed distributions in the formula and analyses does not make sense, as the point of the formula is to estimate which schools MAY be undersupplied under certain conditions. The goal of the formula matters here. Changes in these distributional assumptions could change inferences, as schools are categorized differently under different situations. However, since the assumptions are not empirically based, I hesitate to continue using them in analysis. Given this evidence, future analyses will use the 0.7 smoothing constant with no distributional assumptions on class size and courses per FTE as the main analysis. For comparison, the final undersupply models will also be run using the smoothing constant and distributional assumptions of class size ~N(25, .5) and courses per FTE ~N(5, .25). This will show how the results may change when the assumptions are not fixed but have conservative distributions.

A final question related to this formula is the extent to which it is valid. The measurement literature outlines various dimensions of validity (content, criterion, construct), but the core concerns for a formula of this nature are construct validity, the extent to which the formula measures what it is intended to measure (teacher demand), and face validity, the "believability" of the formula. Even in the pure measurement sense, no single "test" establishes the validity of an instrument or construct. Rather, validity is demonstrated through the accumulation of evidence.
Evidence for the validity of this formula and method includes the computations and comparisons undertaken in the analysis above, particularly the comparisons of the distributions of estimates obtained under each method. Face validity is also established by the extent to which policymakers see the formula as viable and believable. Because the original formula has already been vetted by the state of Michigan, the Institute of Education Sciences, the Regional Educational Laboratory-Midwest, representatives of other regional educational laboratories, and a number of academic audiences, there is substantial evidence that it passes the face validity test. The rigorous testing to which the formula is subjected here suggests that it has construct validity as well. The purpose of this formula is to provide practitioners and researchers with a useful tool for demand calculations that is sensitive to school-specific variations in demand and supply, takes into account curricular requirements, and can serve as a planning tool in which practitioners adjust the assumptions to reflect current or future conditions in their school in order to inform decision making.
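The conclusions above turn on three mechanics: a smoothed (weighted) enrollment projection, optional distributions on class size and courses per FTE, and a 1.0 FTE cut point for “significant” undersupply. The sketch below illustrates how they interact; it is not the dissertation’s demand formula. Every modeling choice in it is a labeled assumption: the 0.7 smoothing constant is applied as simple exponential smoothing that weights the most recent year; demand in FTE is enrollment times courses per student, divided by class size and by courses per FTE; the second parameter of N(25, .5) and N(5, .25) is read as a standard deviation; and “significant” means undersupply of at least 1.0 FTE. The function names, enrollment history, and staffing level are hypothetical.

```python
import random
import statistics

ALPHA = 0.7            # smoothing constant named in the text
SIGNIFICANT_FTE = 1.0  # cut point for "significant" undersupply (assumed inclusive)


def smoothed_enrollment(history, alpha=ALPHA):
    """Exponentially smoothed enrollment projection (assumed form).

    `history` lists yearly enrollment counts, oldest first; alpha weights
    the most recent year, so recent trends dominate the projection.
    """
    level = history[0]
    for count in history[1:]:
        level = alpha * count + (1 - alpha) * level
    return level


def demand_fte(enrollment, class_size=25.0, courses_per_fte=5.0,
               courses_per_student=1.0):
    """Hypothetical demand formula: sections needed / sections per FTE."""
    sections = enrollment * courses_per_student / class_size
    return sections / courses_per_fte


def undersupply(enrollment, supply_fte, **assumptions):
    """Demand minus current FTE; positive values mean too few teachers."""
    return demand_fte(enrollment, **assumptions) - supply_fte


def simulated_undersupply(enrollment, supply_fte, draws=10_000, seed=1):
    """Mean undersupply with class size ~ N(25, 0.5) and courses per FTE
    ~ N(5, 0.25), reading the second parameter as a standard deviation."""
    rng = random.Random(seed)
    values = [
        undersupply(enrollment, supply_fte,
                    class_size=rng.gauss(25, 0.5),
                    courses_per_fte=rng.gauss(5, 0.25))
        for _ in range(draws)
    ]
    return statistics.fmean(values)


def is_significant(value, cut=SIGNIFICANT_FTE):
    """Flag undersupply at or above the cut point."""
    return value >= cut


# Invented example: declining enrollment and 7.0 math FTE on staff.
history = [1100, 1060, 1010, 980]
supply = 7.0
for label, enrollment in [("simple average", statistics.fmean(history)),
                          ("smoothed", smoothed_enrollment(history))]:
    gap = undersupply(enrollment, supply)
    print(f"{label}: enrollment={enrollment:.1f}, undersupply={gap:.2f}, "
          f"significant={is_significant(gap)}")
print(f"simulated (smoothed): {simulated_undersupply(smoothed_enrollment(history), supply):.2f}")
```

With the invented declining enrollment, the simple average (1,037.5 students) implies 8.3 FTE of demand and 1.3 FTE of undersupply, which crosses the cut point, while the smoothed projection (about 994.6 students) implies roughly 0.96 FTE of undersupply and does not. This mirrors the mechanism described above, in which a simple average can overstate enrollment and push a school over the threshold. Averaging over random draws of class size and courses per FTE tends to nudge estimated demand slightly upward, because demand depends on the reciprocals of those quantities.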
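The F-test column in Tables 4.1 through 4.3 and the cross-classifications in Table 4.5 can be checked against the published summaries. The sketch below is not the author’s code. It assumes that the F-test is a one-way ANOVA across the categories of each school characteristic, with the column reporting its p-value, and recomputes the F statistic approximately from each row’s mean, SD, and N. It also computes Cohen’s kappa, which the tables do not report, as a chance-corrected summary of agreement between two demand-formula situations. The function names are hypothetical.

```python
def anova_f_from_summary(groups):
    """One-way ANOVA F statistic from per-group (mean, sd, n) summaries.

    Between-group sum of squares comes from the group means; within-group
    sum of squares from (n - 1) * sd**2. Returns (F, df_between, df_within).
    """
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def agreement_and_kappa(both, row_only, col_only, neither):
    """Percent agreement and Cohen's kappa for a 2 x 2 cross-classification.

    both: undersupplied under both rules; row_only / col_only: undersupplied
    under only the row / column rule; neither: undersupplied under neither.
    """
    n = both + row_only + col_only + neither
    observed = (both + neither) / n
    p_row = (both + row_only) / n
    p_col = (both + col_only) / n
    expected = p_row * p_col + (1 - p_row) * (1 - p_col)
    return observed, (observed - expected) / (1 - expected)


# Locale rows of Table 4.1, enrollment-with-smoothing column: (mean, SD, N)
locale = [(0.64, 1.68, 94), (0.83, 1.37, 161), (0.51, 0.90, 78), (0.18, 0.84, 245)]
f_stat, df1, df2 = anova_f_from_summary(locale)
print(f"F({df1}, {df2}) = {f_stat:.2f}")

# Panel A of Table 4.5: Situation 1 (rows) vs. Situation 2 (columns)
observed, kappa = agreement_and_kappa(both=151, row_only=34, col_only=1, neither=392)
print(f"agreement = {observed:.3f}, kappa = {kappa:.3f}")
```

For the Locale rows of the smoothing column in Table 4.1, the recomputed F is about 10.6 on (3, 574) degrees of freedom, in line with the reported .000; small discrepancies are expected because the published means and SDs are rounded. For panel A of Table 4.5, the two situations agree on 543 of 578 schools (about 94%), and kappa is roughly 0.85.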
Table 4.1: Mean Mathematics Teacher Undersupply by School Characteristics Under Different Demand Formula Assumptions
Columns: (1) Enrollment with Smoothing(a); (2) Original Demand Formula; (3) Enrollment w/ Smoothing & Dist'n on Class Size and FTE

                                    (1)                  (2)                  (3)
                              Mean    SD F-test    Mean    SD F-test    Mean    SD F-test     N
Locale
  City                         .64  1.68   .000    0.98  1.87   .000     .69  1.80   .000    94
  Suburb                       .83  1.37           1.07  1.38            .90  1.46          161
  Town                         .51   .90           0.74  0.99            .52  1.01           78
  Rural                        .18   .84           0.32  0.85            .19   .81          245
Percent minority
  Less than 5%                 .31   .81   .168    0.46  0.87   .038     .29   .86   .039   174
  5-10%                        .46  1.12           0.69  1.17            .48  1.10          141
  10-15%                       .64  1.13           0.82  1.20            .71  1.20           56
  15-65%                       .58  1.42           0.82  1.49            .66  1.50          132
  Greater than 65%             .63  1.65           0.92  1.80            .70  1.84           75
Percent free/reduced lunch
  Less than 10%                .04  1.60   .006    0.24  1.61   .008   -.003  1.51   .003    43
  10-29%                       .67  1.13           0.88  1.20            .73  1.16          202
  30-49%                       .37  1.02           0.56  1.08            .40  1.09          204
  50-69%                       .62  1.44           0.83  1.62            .63  1.64           94
  70% or greater               .29  1.18           0.50  1.19            .30  1.25           35
School size
  Less than 300 students     -0.15   .73   .000   -0.08  0.70   .000    -.13   .75   .000    83
  300-999 students             .30   .91           0.45  0.92            .30   .95          307
  Greater than 1000 students  1.06  1.52           1.42  1.60           1.15  1.62          188
Charter schools
  Charter                      .06  1.08   .039    0.73  1.28   .003     .54  1.28   .054    33
  Non-charter                  .51  1.21           0.05  1.06            .10  1.13          545
Magnet schools
  Magnet                       .22  1.38   .061    0.72  1.27   .122     .57  1.26   .004    65
  Non-magnet                   .52  1.12           0.46  1.45            .09  1.28          513
Percent teachers with prof licenses
  Less than 80%                .35  1.16   .367    0.50  1.26   .135     .39  1.29   .468   132
  80-90%                       .52  1.31           0.76  1.38            .55  1.35          297
  Greater than 90%             .42  1.00           0.74  1.07            .55  1.08          149
Percent minority teachers
  Less than 1%                 .28   .98   .000    0.44  1.01   .000     .28   .98   .001   275
  1-10%                        .72  1.22           0.96  1.29            .77  1.29          214
  Greater than 10%             .54  1.63           0.83  1.79            .61  1.81           89
Percent highly qualified teachers
  Less than 75%                .74  1.34   .000    1.00  1.45   .000     .75  1.44   .003   196
  75-85%                       .46  1.23           0.68  1.26            .49  1.22          144
  Greater than 85%             .28  1.02           0.44  1.08            .33  1.13          238
Total                          .48  1.20           0.69  1.28            .51  1.27          578

Math undersupply is calculated for SY 2009 based on 2008 REP and student enrollment.
(a) This calculation is identified as the optimal calculation and is used primarily in the Chapter 7 undersupply analyses.

Table 4.2: Mean English Language Arts Teacher Undersupply by School Characteristics Under Different Demand Formula Assumptions
Columns: (1) Enrollment with Smoothing(a); (2) Original Formula; (3) Enrollment w/ Smoothing & Dist'n on Class Size/FTE

                                    (1)                  (2)                  (3)
                              Mean    SD F-test    Mean    SD F-test    Mean    SD F-test     N
Locale
  City                        -.81  2.56   .003   -0.48  2.69  0.028   -0.76  2.64  0.012    94
  Suburb                     -1.10  1.81          -0.86  1.83          -1.03  1.85          161
  Town                        -.89  1.06          -0.68  1.05          -0.90  1.04           78
  Rural                       -.52   .88          -0.38  0.89          -0.51  0.91          245
Percent minority
  Less than 5%                -.61  1.01   .062   -0.46  1.01  0.083   -0.63  1.07  0.181   174
  5-10%                       -.69  1.15          -0.46  1.12          -0.67  1.12          141
  10-15%                     -1.17  1.79          -0.99  1.86          -1.10  1.79           56
  15-65%                     -1.00  1.67          -0.77  1.66          -0.93  1.71          132
  Greater than 65%            -.67  2.64          -0.38  2.81          -0.60  2.74           75
Percent free/reduced lunch
  Less than 10%              -1.43  1.93   .037   -1.23  1.91  0.036   -1.47  1.96  0.024    43
  10-29%                      -.78  1.38          -0.56  1.38          -0.72  1.36          202
  30-49%                      -.79  1.31          -0.59  1.29          -0.76  1.33          204
  50-69%                      -.58  2.10          -0.36  2.29          -0.56  2.21           94
  70% or greater              -.46  1.87          -0.24  1.89          -0.45  1.92           35
School size
  Less than 300 students      -.43   .70   .006   -0.37  0.69  0.301   -0.41  0.70  0.027    83
  300-999 students            -.71  1.36          -0.55  1.38          -0.70  1.38          307
  Greater than 1000 students -1.05  2.09          -0.69  2.17          -0.97  2.15          188
Charter schools
  Charter                     -.86  2.81   .752   -0.55  1.52  0.268   -0.74  1.52  0.768   545
  Non-charter                 -.77  1.48          -0.87  2.81          -0.83  2.83           33
Magnet schools
  Magnet                      -.86  2.27   .658   -0.56  1.50  0.795   -0.72  1.52  0.198   513
  Non-magnet                  -.77  1.48          -0.62  2.33          -0.99  2.24           65
Percent teachers with prof licenses
  Less than 80%               -.58  1.91   .130   -0.43  2.00  0.312   -0.53  1.95  0.109   132
  80-90%                      -.90  1.59          -0.67  1.62          -0.88  1.63          297
  Greater than 90%            -.72  1.18          -0.50  1.17          -0.68  1.18          149
Percent minority teachers
  Less than 1%                -.58  1.04   .008   -0.42  1.03  0.042   -0.58  1.02  0.026   275
  1-10%                      -1.03  1.57          -0.79  1.57          -0.98  1.60          214
  Greater than 10%            -.79  2.62          -0.50  2.77          -0.72  2.74           89
Percent highly qualified teachers
  Less than 75%               -.59  1.59   .086   -0.33  1.68  0.020   -0.58  1.62  0.164   196
  75-85%                      -.79  1.41          -0.57  1.38          -0.75  1.38          144
  Greater than 85%            -.93  1.67          -0.77  1.68          -0.88  1.73          238
Total                         -.78  1.58          -0.60  1.62          -0.75  1.62          578

ELA undersupply is calculated for the 2009 school year based on 2008 REP and student enrollment.
(a) This calculation is identified as the optimal calculation and is used primarily in the Chapter 7 undersupply analyses.

Table 4.3: Mean Science Teacher Undersupply by School Characteristics Under Different Demand Formula Assumptions
Columns: (1) Enrollment with Smoothing(a); (2) Original Demand Formula; (3) Enrollment w/ Smoothing & Dist'n on Class Size/FTE

                                    (1)                  (2)                  (3)
                              Mean    SD F-test    Mean    SD F-test    Mean    SD F-test     N
Locale
  City                       -1.04  1.82   .000    -.79  1.86   .000   -1.00  1.86   .000    94
  Suburb                     -1.07  1.36           -.88  1.33          -1.01  1.41          161
  Town                        -.68   .89           -.52   .90           -.69   .93           78
  Rural                       -.50   .71           -.39   .72           -.49   .73          245
Percent minority
  Less than 5%                -.47   .75   .000    -.37   .76   .000    -.49   .76   .000   174
  5-10%                       -.73   .93           -.56   .91           -.71   .98          141
  10-15%                      -.99  1.02           -.85  1.05           -.94  1.20           56
  15-65%                     -1.21  1.59          -1.03  1.56          -1.15  1.59          132
  Greater than 65%            -.58  1.54           -.37  1.56           -.53  1.59           75
Percent free/reduced lunch
  Less than 10%              -1.96  1.91   .000   -1.81  1.87   .000   -1.99  1.95   .000    43
  10-29%                      -.90  1.11           -.74  1.09           -.86  1.13          202
  30-49%                      -.56   .89           -.41   .86           -.54   .90          204
  50-69%                      -.53  1.23           -.37  1.27           -.51  1.30           94
  70% or greater              -.39  1.16           -.23  1.19           -.38  1.14           35
School size
  Less than 300 students      -.46   .64   .000    -.41   .62   .000    -.45   .63   .000    83
  300-999 students            -.50   .88           -.39   .88           -.50   .92          307
  Greater than 1000 students -1.34  1.58          -1.06  1.63          -1.27  1.64          188
Charter schools
  Charter                     -.25  1.09   .010    -.62  1.20   .077    -.78  1.23   .012    33
  Non-charter                 -.80  1.20           -.25  1.07           -.22  1.09          545
Magnet schools
  Magnet                      -.74  1.13   .819    -.62  1.20   .645    -.73  1.23   .536    65
  Non-magnet                  -.77  1.21           -.55  1.12           -.83  1.18          513
Percent teachers with prof licenses
  Less than 80%               -.50  1.13   .009    -.39  1.12   .038    -.47  1.13   .008   132
  80-90%                      -.88  1.28           -.71  .127           -.87  1.32          297
  Greater than 90%            -.77  1.06           -.60  1.06           -.74  1.07          149
Percent minority teachers
  Less than 1%                -.54   .80   .000    -.42   .81   .000    -.54   .81   .001   275
  1-10%                      -1.06  1.40           -.88  1.38          -1.02  1.44          214
  Greater than 10%            -.77  1.53           -.56  1.56           -.72  1.58           89
Percent highly qualified teachers
  Less than 75%               -.80  1.34   .756    -.61  1.35   .971    -.79  1.39   .631   196
  75-85%                      -.80  1.31           -.63  1.27           -.77  1.30          144
  Greater than 85%            -.72   .99           -.60   .99           -.69  1.02          238
Total                         -.77  1.19           -.61  1.19           -.74  1.23          578

Science undersupply is calculated for SY 2009 based on 2008 REP and student enrollment.
(a) This calculation is identified as the optimal calculation and is used primarily in the Chapter 7 undersupply analyses.

Table 4.4: Categorization of Undersupply Under Four Sets of Assumptions

                     Situation 1  Situation 2  Situation 3  Situation 4
Not undersupplied            393          426          408          409
Undersupplied                185          152          170          169

Situation 1: Original demand formula
Situation 2: 0.7 smoothing constant only
Situation 3: 0.7 smoothing constant, class size~N(25, 1) and courses~N(5, .25)
Situation 4: 0.7 smoothing, class size~N(25, .5) and courses~N(5, .25)

Table 4.5: Comparison of Cross-Categorizations of Undersupply Under Four Sets of Assumptions

A
                                  Situation 2 US  Situation 2 Not US   Total
Situation 1: Undersupplied                   151                  34     185
Situation 1: Not US                            1                 392     393
Total                                        152                 426     578

B
                                  Situation 2 US  Situation 2 Not US   Total
Situation 3: Undersupplied                   136                  34     170
Situation 3: Not US                           16                 392     408
Total                                        152                 426     578

C
                                  Situation 2 US  Situation 2 Not US   Total
Situation 4: Undersupplied                   137                  32     169
Situation 4: Not US                           15                 394     409
Total                                        152                 426     578

US = undersupplied
Situation 1: Original demand formula
Situation 2: 0.7 smoothing constant only
Situation 3: 0.7 smoothing constant, class size~N(25, 1) and courses~N(5, .25)
Situation 4: 0.7 smoothing, class size~N(25, .5) and courses~N(5, .25)

Table 4.6: “Misclassification” Between Situation 1 and Situation 2 (“A” from Table 4.5)

                                        Same       Different  Expected    Chi-
                              Classification  Classification    Dist'n  Square
Locale
  City                                   17%             14%       17%   0.439
  Suburb                                 27%             37%       28%
  Town                                   13%             17%       13%
  Rural                                  43%             31%       42%
Percent minority
  Less than 5%                           30%             29%       30%    .415
  5-10%                                  24%             29%       24%
  10-15%                                 10%              3%       10%
  15-65%                                 22%             31%       23%
  Greater than 65%                       14%              9%       13%
Percent free/reduced lunch
  Less than 10%                           8%              3%        7%    .281
  10-29%                                 35%             37%       35%
  30-49%                                 34%             49%       35%
  50-69%                                 17%              9%       17%
  70% or greater                          6%              3%        6%
School size
  Less than 300 students                 15%              3%       15%    .008
  300-999 students                       54%             43%       53%
  Greater than 1000 students             31%             54%       32%

Situation 1: Original demand formula
Situation 2: 0.7 smoothing constant only

Table 4.7: “Misclassification” Between Situation 2 and Situation 4 (“C” from Table 4.5)

                                        Same       Different  Expected    Chi-
                              Classification  Classification    Dist'n  Square
Locale
  City                                   17%              9%       17%   0.173
  Suburb                                 27%             36%       28%
  Town                                   13%             19%       13%
  Rural                                  43%             36%       42%
Percent minority
  Less than 5%                           30%             28%       30%    .161
  5-10%                                  24%             30%       24%
  10-15%                                  9%             13%       10%
  15-65%                                 22%             28%       23%
  Greater than 65%                       14%              2%       13%
Percent free/reduced lunch
  Less than 10%                           7%             15%        7%    .149
  10-29%                                 35%             32%       35%
  30-49%                                 35%             40%       35%
  50-69%                                 17%             11%       17%
  70% or greater                          6%              2%        6%
School size
  Less than 300 students                 16%              0%       15%    .005
  300-999 students                       53%             53%       53%
  Greater than 1000 students             31%             47%       32%

Situation 2: 0.7 smoothing constant only
Situation 4: 0.7 smoothing, class size~N(25, .5)
and courses~N(5, .25)

CHAPTER 5: TEACHER CHURN RATE: THE RELATIONSHIP BETWEEN TEACHER RETENTION AND STUDENT ACHIEVEMENT

Introduction

High schools face the challenge of recruiting and retaining an adequate number of qualified and effective instructional staff. An elevated rate of teacher turnover within high schools may contribute to a disorganized school culture that lacks a sense of community, which in turn may have negative implications for student achievement and school effectiveness. Prior research has provided insights into the types of schools that are likely to have higher rates of teacher turnover, but there is little research on the impact of high levels of teacher turnover on student achievement outcomes (Carroll, Reichardt, & Guarino, 2000; Guarino et al., 2006; Hanushek, Kain, & Rivkin, 2004; Ingersoll, 2001; Lankford, Loeb, & Wyckoff, 2002; Shen, 1997; Smith & Ingersoll, 2004; Stockard & Lehman, 2004; Whitener et al., 1998). The purpose of this study is to investigate the relationship between high school teacher “churn rates,” as measured by teacher retention (the average proportion of teachers who remain in a given school over a four-year period), and student achievement outcomes, as measured by mathematics achievement test scores on the state assessments.1 This study also tests the relationship between teacher retention and student mobility, treating student mobility both as a covariate alongside teacher retention and as an intermediate outcome through which teacher retention affects student achievement.

1 This study is limited to high schools because it is part of a larger project that focuses on the impact of high school curricular changes and teacher supply and demand on high school student achievement.
However, preliminary analyses of elementary and middle school teacher retention rates were conducted, and there did not appear to be a significant relationship between teacher retention and achievement at the elementary and middle school levels. Therefore, this study focuses on high schools for both substantive and methodological reasons.

In this study, teacher turnover refers to teacher movement into and out of schools (which may or may not be movement into or out of the profession). In other literature, teacher attrition refers to teachers leaving the profession entirely. For the school as an organization, however, attrition and turnover have similar effects, in that both represent a loss of staff who must be replaced (Ingersoll & Perda, 2009). This study takes an organizational perspective, treating school-level teacher retention as an organizational characteristic of each high school, and investigates the impact of this characteristic on the effectiveness of the organization itself (i.e., student achievement). Teacher turnover has been studied primarily at the state or national level (Ingersoll, 2001; Ingersoll & Perda, 2009; Loeb, Darling-Hammond, & Luczak, 2005). This approach, while powerful, can mask important differences between individual schools. By contrast, this is a school-level analysis that uses state administrative data from Michigan to account for distinct organizational factors related to teacher retention and student achievement. Additionally, this analysis introduces student mobility into the relationship between school-level teacher retention and student achievement outcomes. Student mobility, like teacher retention, can be considered an organizational characteristic of schools and a potential indicator of school climate. However, the interaction between teacher retention and student mobility is not clear.
This study estimates the relationship between teacher retention and student mobility, treating student mobility both as a covariate alongside teacher retention in predicting student achievement outcomes and as an intermediate outcome predicted by teacher retention.

Background to the Problem

Teacher Turnover as an Organizational Characteristic of Schools

Ingersoll (2001) has argued that teacher turnover is an organizational feature of a school and, as such, contributes to the community and culture of the school. As a characteristic of the school, teacher turnover can affect the ability of a school to build trust among its members and to develop a sense of community. This argument draws on the sociology of education, which has shown that a sense of community and cohesion among families, teachers, and students is important for the success of schools (e.g., Bryk & Schneider, 2002; Durkheim, 1961; Grant, 1988; Parsons, 1959; Rosenholtz, 1989; Waller, 1932). A body of evidence suggests that the community of the school has important implications for school performance and effectiveness (Coleman & Hoffer, 1987; Rosenholtz, 1989; Bryk, Lee, & Smith, 1990). Specifically, the communal nature of certain schools, such as private schools, creates an environment that reinforces shared values and leads to higher levels of social capital (Coleman & Hoffer, 1987). This sense of community is created by shared values, trust, and reciprocity among individuals in the school (Bryk & Schneider, 2002). Schools with a stronger sense of community and cooperation between students and teachers have been shown to have higher achievement levels, as well as a more equitable distribution of achievement (Lee, Bryk, & Smith, 1993; Lee & Smith, 1997).

Prior research on teacher turnover has demonstrated that teacher turnover is not evenly distributed among schools.
Schools with higher proportions of minority students, low-income students, and low-performing students have higher attrition rates, and urban school districts are likely to have higher attrition rates (Guarino et al., 2006). Specifically, the types of schools that tend to have high turnover are those with high rates of student poverty (Hanushek et al., 2004; Shen, 1997; Smith & Ingersoll, 2004); small schools (Ingersoll, 2001; Stockard & Lehman, 2004); schools with high numbers of minority students (Carroll et al., 2000; Hanushek et al., 2004); charter schools (Smith & Ingersoll, 2004); schools with a high proportion of inexperienced teachers (Shen, 1997); private schools (Smith & Ingersoll, 2004; Ingersoll, 2001; Whitener et al., 1997; Arnold, Choy, & Bobbitt, 1993); and urban schools (Lankford et al., 2002). Several analyses found that working conditions, particularly large class size, facilities problems, multi-track schools, and a lack of textbooks, are key factors in predicting teacher attrition (Loeb, Darling-Hammond, & Luczak, 2005; Futernick, 2007; Hanushek et al., 2004; Guarino et al., 2006).2

2 The focus of this paper is on school-level teacher retention, and it does not address the predictors of and reasons for individual-level mobility decisions. However, other research addresses these individual reasons for leaving teaching, which include childbearing (Stinebrickner, 1998; 2002) and financial considerations (Dolton & van der Klaauw, 1999; Shen, 1997; Loeb, Darling-Hammond, & Luczak, 2005). For a more thorough review, see Guarino et al. (2006). There is debate about whether salary or organizational factors are more important: Stinebrickner (1998) finds that salary considerations are more important than organizational characteristics, while Hanushek, Kain, and Rivkin (2004) find that teacher attrition is related more to being in schools with lower-achieving and minority students than to salary considerations. One of the most important reasons for teacher turnover is job dissatisfaction (Ingersoll, 2001). This suggests that teachers who do not like their jobs are more likely to leave, but that this decision is not based solely on monetary considerations. Rather, teachers choose to leave because of issues related to the climate of the school, such as lack of support from the school administration or student discipline problems (Ingersoll, 2001).

High rates of teacher turnover may be disruptive to the quality of the school and to school performance and effectiveness (Ingersoll, 2001; Ingersoll & Perda, 2009), but the mechanism by which teacher turnover affects achievement or other outcomes is unclear. Some evidence suggests that when qualified teachers leave, they are replaced by less qualified teachers (Reichardt, 2008), which in turn is inferred to have a negative impact on school quality because younger, less experienced teaching staff rotate in to fill these positions (National Commission on Teaching and America’s Future, 2002). This does not necessarily address the possibility that turnover, as a “school climate” variable, could have an impact on student outcomes. The majority of the current research on teacher turnover makes this theoretical leap: teacher turnover means “new” teachers in the classroom, who are less likely to be experienced and effective. Given evidence from other research that teacher experience and effectiveness matter for student outcomes, teacher turnover is presumed to affect student outcomes via this introduction of “new” teachers into the classroom.3 This hypothesis, however, is largely untested. While there is substantial evidence regarding the characteristics of schools that are related to high teacher turnover, there is a lack of research that investigates teacher turnover as a factor in student achievement outcomes.
There is an underlying assumption that high levels of organizational turnover are related to decreased performance (Ingersoll, 2001). This assumption rests on organizational research showing that organizations with unclear or nonroutine processes, or with systems that require higher levels of interaction among members, are more likely to experience higher turnover and thus decreased performance (Burns & Stalker, 1961; Ingersoll, 2001; Kanter, 1977; Likert, 1967; Porter, Lawler, & Hackman, 1975; Turner & Lawrence, 1964; Walton, 1980). Schools are seen as organizations of this type (Bidwell, 1965; Ingersoll, 2001, 2003; Lortie, 1975).4 At this juncture, however, this is primarily a hypothesized link that has not been tested extensively. While this paper focuses on the organizational and communal implications of teacher turnover, turnover is also costly in terms of monetary outlays for schools, with the cost of hiring a new teacher estimated at $10,000 (Barnes, Crowe, & Schaefer, 2007; Milanowski & Odden, 2007; Reichardt, 2006). Some teacher supply research suggests that the teaching profession is plagued by abnormally high rates of turnover within schools, as well as high rates of attrition from the profession entirely (Ingersoll, 2001).

Student Retention: The Other Level of Retention

Student retention and mobility are another critical component of the organizational culture of a school. Mobility in high school has been shown to be related to diminished odds of high school graduation and decreased academic achievement, even after controlling for prior achievement, student and family background, and residential mobility (Haveman & Wolfe, 1995; Rumberger & Larson, 1998; Temple & Reynolds, 1997). Other studies have linked higher rates of student mobility in grades one through eight to an increased risk of dropping out in high school (Rumberger & Larson, 1998; Swanson & Schneider, 1999; Teachman, Paasch, & Carver, 1996).
Student mobility rates were higher than dropout rates among high schools, averaging 19% for the typical high school (Rumberger & Thomas, 2000). Student mobility disrupts the formation of positive relationships between students and teachers, which are important for success in school (Newman, Lohman, Newman, Myers, & Smith, 2000; Stanton-Salazar & Dornbusch, 1995). Studies have shown that students’ bonding with teachers in middle school is positively related to academic achievement (Johnson, Crosnoe, & Elder, 2001; Muller, 2001). When students move between schools, these critical social ties are broken (Midgley, Feldlaufer, & Eccles, 1989; Roeser, Eccles, & Sameroff, 1998). These student-teacher relationships are important because teachers, through their encouragement, can help students develop a significant attachment to learning (Croninger & Lee, 2001; Rosenfeld, Richman, & Bowen, 2000). These bonds provide students with access to an informal network of knowledge in their schools (Stanton-Salazar & Dornbusch, 1995).

However, others have found that when teachers are compared with comparable fields, such as nurses, social workers, and accountants, teacher turnover is not significantly higher (Harris & Adams, 2007). Stinebrickner (2002) found that exit rates are not lower in other professions, and that non-teachers change professions more often but also return to the workforce more quickly after an exit than teachers do. The crux of this argument appears to revolve primarily around pre-retirement attrition. Harris and Adams (2007) suggest that teacher attrition is lower, but that there are larger numbers of early retirements than in other professions. There is also some evidence that the distribution of teacher attrition is U-shaped, with high attrition among older and very young teachers (Grissmer & Kirby, 1997; Harris & Adams, 2007).
Importantly, however, student mobility is not merely a factor in predicting achievement or in the formation and severing of individual social ties. Although it is experienced by individuals, student mobility can also be considered a factor in the school’s organizational culture, much like teacher mobility: it affects not only the individual but also the organization itself. If both students and teachers leave a school in response to a disorganized school culture and climate, these two factors are likely interrelated, but the relationship between these two levels of “churn,” teacher and student, has not been studied. More importantly, the relationship among teacher retention, student mobility, and student achievement outcomes is unclear. Given evidence that student mobility has a negative effect on achievement outcomes, it is plausible that teacher retention either predicts or interacts with student mobility. As indicators of school organizational culture, student mobility and teacher retention may exist in a feedback loop, whereby higher churn at one level predicts higher churn at the other, and this disorganization affects student achievement.5

5 Michigan is also a school choice state, which means that parents and students can choose to attend districts and schools other than their home district, provided the school has sufficient room.

Teacher Turnover in High Schools and Its Relationship to Student Achievement

It has been nearly a decade since the initial passage of the No Child Left Behind Act of 2001. In that time, increasing and measuring student achievement has become an established element of every state’s educational system. In Michigan, one of the key responses to No Child Left Behind, as well as to demands for increased rigor at the high school level and stronger college and career preparedness for Michigan students, was the enactment of the Michigan Merit Curriculum.
Established in 2006, the Michigan Merit Curriculum increased high school graduation requirements to four years of mathematics, four years of English Language Arts, three years of science, three years of social studies, and two years of a world language.6 This reform put Michigan among 18 states that have increased their graduation requirements to include four years of challenging mathematics (Achieve, Inc., 2008).

This study focuses on mathematics achievement scores as the main outcome of student achievement, for several reasons. Mathematics consists of a set of sequential courses, in that certain skills are required before new skills can be learned; it is hierarchically structured, with specific courses required as prerequisites to other courses. Hierarchically or concurrently structured courses are likely to be implemented more successfully in schools with a more stable instructional workforce, as teachers can work together to ensure that all of the required skills are taught in the classes that students are likely to take and that there are no “gaps” in instruction at a given school. This sort of curriculum monitoring and alignment may be more difficult to attain in schools with higher rates of teacher turnover. Mathematics is also a gatekeeping subject for successful high school completion and college entry. Successful completion of math courses has been associated with more positive short-term academic and social outcomes (Frank et al., 2008) and with an increased likelihood of attending college (Adelman, 1999; Sadler & Tai, 2007; Sells, 1973; Simpkins, Davis-Kean, & Eccles, 2006), particularly four-year institutions (Schneider, Swanson, & Riegle-Crumb, 1998; Riegle-Crumb, 2006). Additionally, math has been considered the primary key to the social organization of the school, due to its role in defining academic tracks (Gamoran & Hannigan, 2000; Lucas & Good, 2001; Stevenson, Schiller, & Schneider, 1994).

6 The Michigan Merit Curriculum also specifies required courses in mathematics, science, and social studies, as well as credits in the arts, physical education, and an online learning experience. For more information, see http://www.michigan.gov/documents/mde/New_MMC_one_pager_11.15.06_183755_7.pdf.

This study also tests the relationship between teacher retention and student achievement in the context of student mobility. Therefore, student mobility is treated both as an outcome and as a covariate. Figure 5.1 below shows the hypothesized relationships among teacher retention, student mobility, and student achievement outcomes. The solid arrows indicate relationships that have been established or investigated by previous research, such as predictors of teacher retention and student mobility. However, how these factors relate to student achievement is unclear. One hypothesized mechanism, illustrated by the red arrows, is that teacher retention affects student achievement through an intermediate outcome, student mobility. A second is that teacher retention affects student achievement directly, and that student mobility interacts with teacher retention in predicting this relationship.

Figure 5.1: Conceptual Model for the Relationship Between Teacher Retention, Student Mobility and Student Achievement Outcomes
[Figure: boxes for key school characteristics (percentage free/reduced lunch, percentage minority students, location, size, sector, school level, proportion of teachers with professional licenses, proportion of minority teachers, proportion of highly qualified teachers), school-level teacher retention, student mobility, student characteristics (gender, race, program eligibility), and student achievement (measured by mathematics and ELA achievement test scores). Legend: known relationship; hypothesized relationship #1: teacher retention and student mobility interact; hypothesized relationship #2: teacher retention predicts student mobility, which predicts student outcomes.]

The Relationship Between School Characteristics, Teacher Turnover, and Student Outcomes: Hypotheses and Research Questions

School characteristics and the organizational nature and community of a school have been shown to be related to outcomes such as social capital. Additionally, organizational theory suggests that the strength of an organization is related to its performance. Teacher retention is a phenomenon experienced at the school level. Schools with higher rates of teacher retention may be able to increase organizational performance by increasing coordination between subjects and grades over time, aligning expectations across subjects and grades, and reinforcing instructional messages. These functions are likely to occur more frequently in schools where instructional staff have taught together over a period of time and developed a stable school culture that defines and reinforces these messages (Nonaka, 1994). Therefore, higher rates of teacher retention can be hypothesized to create a more positive and stable culture in high schools, which in turn can translate into increased student achievement in mathematics.

A related component of the school culture is student mobility. Increased student mobility has been shown to be detrimental both to individual students and to schools with high rates of mobility. One reason is that mobility at the student level can lead to a more disjointed, disorganized school culture, which in turn has negative impacts on achievement. Increased teacher retention may be able to buffer that impact by providing a stable teacher workforce, even if the student body is more mobile.
There are two levels of retention, teacher and student, and it is hypothesized that these interact, with higher levels of teacher retention buffering the negative effect of low student retention and amplifying the effect of high student retention. Alternatively, teacher retention may affect student achievement outcomes via the mechanism of student mobility. Low levels of teacher retention, by creating the type of disorganized school culture described above, may cause students to move between schools at higher-than-average rates, as students respond to the lack of community and continuity at the school. Parents may also choose to remove students from schools where the culture is less organized and supportive. Students who are more likely to change schools or drop out due to a lack of attachment to school may not receive the same kind of structured encouragement from teachers and from their school culture to stay in school when teacher retention rates are low and the culture is disjointed. It is hypothesized that teacher retention may act indirectly on student achievement through student mobility, with low teacher retention leading to higher rates of student mobility. This question can be tested by utilizing the longitudinal nature of the data, with predictors that are causally prior to the outcomes.

ANALYTIC METHODS

This study uses a multilevel modeling approach to estimate the contextual effects of the key predictors of interest, school-specific rates of teacher retention and student mobility. Although fixed effects models can be used to control for school factors without making parametric assumptions, I choose to express these models in a multilevel modeling framework, with students at level 1 and schools at level 2, in order to estimate effects simultaneously at both levels as well as to estimate a cross-level interaction.
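One way to write the random-intercept specification described in this section is sketched below; it illustrates the general form rather than reproducing the exact models estimated. The symbols are assumptions introduced for illustration: $Y_{ij}$ is the mathematics score of student $i$ in school $j$; $R_j$ is school-level teacher retention; $\mathbf{W}_j$ holds the other school covariates; $M_{ij}$ is a student characteristic chosen for a cross-level interaction (for example, an indicator that the student changed schools; whether mobility enters at the student or the school level is itself an assumption here); and $X_{pij}$ are the remaining student controls, such as prior achievement, gender, race, and lunch eligibility. Only the intercept carries a school-level random term, matching the statement that no random slopes are estimated.

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} M_{ij} + \sum_{p=2}^{P} \beta_{pj} X_{pij} + r_{ij},
  && r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} R_j + \mathbf{W}_j^{\top}\boldsymbol{\gamma}_W + u_{0j},
  && u_{0j} \sim N(0, \tau_{00}) \\
& \beta_{1j} = \gamma_{10} + \gamma_{11} R_j \\
& \beta_{pj} = \gamma_{p0}, \qquad p = 2, \dots, P \\
\text{Combined:}\quad & Y_{ij} = \gamma_{00} + \gamma_{01} R_j + \mathbf{W}_j^{\top}\boldsymbol{\gamma}_W
  + \gamma_{10} M_{ij} + \gamma_{11} R_j M_{ij} + \sum_{p=2}^{P} \gamma_{p0} X_{pij} + u_{0j} + r_{ij}
\end{aligned}
```

In this sketch, $\gamma_{01}$ is the contextual association between school-level retention and the school mean outcome, and $\gamma_{11}$ is the cross-level interaction: it describes how the achievement difference associated with $M_{ij}$ changes as school-level retention changes.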
To estimate the relationship between school-level teacher retention, student mobility, and other school contextual factors and student achievement outcomes, I control for student-level factors such as prior ability, gender, race, and free or reduced lunch eligibility. The models are random intercept models, which model each school mean as a function of the key predictor of interest, school-level teacher retention, and other school covariates. No random slopes are estimated; however, cross-level interaction terms are used to explore the relationship between school-level teacher retention and specific student characteristics. These models consist of students nested within schools. Although teacher data are available and are used to calculate school-level teacher retention and other characteristics of the instructional workforce, teachers are not currently linked to students in the Michigan data, so it is not possible to estimate the impact of a given teacher's retention or mobility on a given student's achievement outcomes. Moreover, this study takes an organizational approach, hypothesizing that the composition of the instructional workforce and the organizational culture of a school affect student achievement outcomes; modeling individual outcomes as a function of school-level predictors is therefore appropriate.

DATA AND MEASURES

Data Source

The data source used for this paper is one of its strengths. As Ingersoll (2001) points out, one of the main challenges to obtaining more precise estimates of both the causes and effects of teacher turnover has been a lack of data: many studies are single-city studies or rely on nationally representative data sets that are subject to issues of selection bias and generalizability.
This study uses rich longitudinal administrative data from the state of Michigan, covering all teachers, students, and schools in the state over a period of four years.[7][8] This allows for the study of teacher retention for all teachers in all schools. One common issue with teacher retention studies is that when teachers change schools, they may leave the sampling frame of the study. In these data, while it is not possible to account for teachers who leave the sampling frame by leaving the state, it is possible to account for all within-state migration for all teachers. The data sets are described in full in Chapter 2: Data and Methods and will not be repeated here.

[7] Although these can be considered universe data, with all observations accounted for, they still represent a sample of a larger theoretical population, albeit a highly reliable one. We would expect the error estimates to be smaller, given that this is essentially a well-specified, very large sample, but it is still treated as a sample, and inferential statistics are used for analytic purposes.

[8] Michigan has collected administrative data for more than four years; the current system began to be implemented in 2002. However, the older data are less reliable and less complete. Michigan has undertaken significant efforts with regard to data quality, data integrity, and data coordination across the many agencies that collect, report, and use administrative data, and has made great improvements in its data. For this reason, the last four years, all of which are highly reliable and complete, are used.

Measures

Main independent variable: The key predictor of interest is the school-level teacher churn rate.
Teacher churn is measured with the teacher retention rate: the proportion of teachers in a given school who remain in that school from one year to the next.[9] Teacher retention is calculated over four school years (2004-2005, 2005-2006, 2006-2007, and 2007-2008), which yield three retention time points. These rates are averaged to generate an average retention rate for each high school in the sample. Retention is based on a teacher remaining in the same school from one year to the next. If a teacher teaches in multiple schools, the teacher is counted as either a stayer or a mover in every school in which he or she teaches. This calculation is at the person level.[10] Figure 5.2 shows the distribution of retention rates at each of the three retention time points. This variable is dichotomized into two categories for analysis: schools with teacher retention rates of 85% and lower, and schools with retention rates greater than 85%.[11]

[9] This method is used rather than counting the number of “new” teachers in a school in a given year because of the difficulty of defining the denominator for that calculation.

[10] Like Ingersoll (2001), I analyze all turnovers or departures and do not distinguish between teacher attrition (from the profession) and teacher mobility (between schools). If a teacher is in the same school from one year to the next, the teacher is retained; if not, the teacher has “turned over.” This focuses on the organizational aspect of teacher retention, as the consequences to the organization are the same regardless of whether a teacher leaves the field entirely or simply moves to another school (Ingersoll, 2001).

[11] The models were also run with teacher retention rate as a continuous variable. However, the 85% cut represents an important threshold, and for policy purposes it is more useful to consider schools in two categories than on an incremental continuum.
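The retention calculation described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each school's staff for a year is available as a set of teacher identifiers (the IDs and rosters below are hypothetical), treats a teacher who appears in several schools as a member of each school's roster, and classifies a rate of exactly 85% as low retention, matching the "85% and lower" category.

```python
from statistics import mean


def retention_rate(roster_this_year, roster_next_year):
    """Percent of a school's teachers in one year who are in the same school the next year."""
    if not roster_this_year:
        raise ValueError("roster for the base year is empty")
    stayers = roster_this_year & roster_next_year
    return 100.0 * len(stayers) / len(roster_this_year)


def average_retention(rosters):
    """Average retention over consecutive yearly rosters (four years -> three time points)."""
    rates = [retention_rate(a, b) for a, b in zip(rosters, rosters[1:])]
    return mean(rates)


def high_retention(avg_rate, cutoff=85.0):
    """1 if the average retention rate is greater than the cutoff, else 0 (85% and lower = low)."""
    return int(avg_rate > cutoff)


# Hypothetical rosters for one school over four school years.
school = [
    set(range(1, 11)),                    # year 1: teachers 1-10
    set(range(1, 10)) | {11},             # year 2: teacher 10 left, 11 joined
    set(range(1, 9)) | {11, 12},          # year 3: teacher 9 left, 12 joined
    set(range(1, 7)) | {11, 13, 14, 15},  # year 4: teachers 7, 8, and 12 left
]

avg = average_retention(school)
print(round(avg, 2), high_retention(avg))  # 83.33 0
```

Because the base-year roster is the denominator, a school that adds teachers without losing any still shows 100% retention; newcomers only enter the denominator in the following year.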
It is important to note that declining teacher retention rates could be a function of economic factors, such as layoffs or early retirement initiatives. However, because this study treats the teacher churn rate as an organizational characteristic, the question of why churn rates are increasing is less relevant. The goal of the study is to examine the relationship between school-level teacher churn rates, whatever their cause, and student achievement outcomes.

[Figure 5.2: Distribution of School-Level Teacher Retention Rates. Three density panels: Retention Rate SY 2004-05 to SY 2005-06; Retention Rate SY 2005-06 to SY 2006-07; Retention Rate SY 2006-07 to SY 2007-08. Horizontal axis: retention rate (0 to 100); vertical axis: density.]

Using the longitudinal nature of the data, the retention variable is calculated over the years 2005-2008, and the outcome measures are from the 2009 achievement tests. The teacher retention measure therefore precedes the outcome. While this does not establish causality, it helps establish the temporal ordering of events.

Dependent variables: The outcome variable for the student achievement models is the scale score on the mathematics achievement test of the 2009 Michigan Merit Examination (MME). For the student mobility models, the variable “different school” is used. This variable is constructed for all students who took the MME in spring 2009 and indicates whether they were in a different school in fall 2006 (when they were entering freshmen) than when they took the MME in spring 2009.[12] School-level and student covariates were described in Chapter 2: Data and Methods and will not be repeated in detail here.
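A minimal sketch of the “different school” indicator, and of aggregating it to a school-level share similar in spirit to the prior-cohort mobility measure. The record layout, school codes, and function names are hypothetical, and attributing each student to the school where he or she took the MME is an assumption made here for illustration.

```python
from collections import defaultdict


def changed_school(fall_2006_school, spring_2009_school):
    """'Different school' indicator: 1 if the MME school differs from the entering-freshman school."""
    return int(fall_2006_school != spring_2009_school)


def mobility_rate_by_school(records):
    """Share of MME testers in each school who changed schools.

    records: iterable of (student_id, fall_2006_school, spring_2009_school).
    Assumption: each student is attributed to the school where the MME was taken.
    """
    counts = defaultdict(lambda: [0, 0])  # school -> [movers, testers]
    for _student_id, fall_school, spring_school in records:
        counts[spring_school][0] += changed_school(fall_school, spring_school)
        counts[spring_school][1] += 1
    return {school: movers / testers for school, (movers, testers) in counts.items()}


# Hypothetical student records.
records = [
    ("s1", "A", "A"),
    ("s2", "A", "A"),
    ("s3", "B", "A"),  # moved from B to A
    ("s4", "B", "B"),
    ("s5", "A", "B"),  # moved from A to B
]
print(mobility_rate_by_school(records))  # {'A': 0.3333333333333333, 'B': 0.5}
```

At the student level the indicator's mean is the overall mobility rate (0.14 in Table 5.1); the school-level share is what a level-2 mobility covariate would summarize.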
For reference, Table 5.1 describes the variables used only in this analysis and presents their summary statistics.

ANALYTIC APPROACH

This study posits school-level teacher retention as a critical factor in explaining variation in student achievement in mathematics, either directly or via student mobility. The first step in the analysis is therefore to calculate average teacher retention rates across four years (three retention time points). This was done by identifying the proportion of the instructional staff of a given school that remained the same from one year to the next, and then averaging those rates over the three time points. Simple descriptive statistics were then calculated for schools by average retention rate, and for all student- and school-level variables used in the multilevel models. Table 3.9 (in Chapter 3) presents the mean school-level teacher retention rates (averaged over four years) by school characteristics.

[12] Dropout rates present a potential concern. However, this file includes all students who took the MME in 2009 and who had a valid MEAP pretest from 2005; these students are therefore not dropouts. Students who moved between schools did not drop out but moved to another school. Dropouts are not included in this analysis.

The Multilevel Models to Be Estimated

To test the effects of teacher retention on both student mobility and student achievement outcomes, multilevel models (i.e., models with random effects) of student mobility and of student achievement in mathematics are estimated, with high school students nested within schools. Two sets of models are estimated. These models are discussed in greater detail in Chapter 2: Data and Methods, and only key elements are repeated here.
The first set consists of hierarchical linear models (HLMs) that test the effects of teacher retention on student achievement outcomes; the second consists of hierarchical generalized linear models (HGLMs) that predict student mobility as a function of teacher retention. The specifications are included below for reference.

In the first set of models, teacher retention predicts student achievement both directly and through a cross-level interaction with student mobility. The purpose of these models is to test whether teacher retention has a direct effect on student achievement, and whether student mobility interacts with teacher retention in this relationship. The general specification of the HLM models is as follows:

Level 1 model:

$$Y_{ij} = \beta_{0j} + \beta_{1j}(\text{student mobility})_{ij} + \mathbf{B}_j \mathbf{Z}'_{ij} + r_{ij} \qquad (1)$$

Level 2 model:

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{teacher retention})_j + \boldsymbol{\gamma}_j \mathbf{Q}'_j + u_{0j}$$

where

$Y_{ij}$ = outcome (mathematics scale score) for student i in school j
$\beta_{0j}$ = each school mean, represented as a function of the grand mean, student mobility, the matrix of student-level predictors, the school teacher retention rate, and the matrix of school-level predictors
$\beta_{1j}$ = coefficient for student mobility
$\mathbf{B}_j$ = vector of coefficients for school j
$\mathbf{Z}'$ = vector of student covariates for school j
$\gamma_{00}$ = grand mean (intercept)
$\gamma_{01}$ = effect of teacher retention on $\beta_{0j}$ (each school mean)
$\boldsymbol{\gamma}_j$ = vector of coefficients for the school-level predictors
$\mathbf{Q}'$ = vector of school covariates
$u_{0j}$ = residual error of $\beta_{0j}$, distributed iid $N(0, \tau_{00})$
$r_{ij}$ = level-1 residual (student error term), distributed iid $N(0, \sigma^2)$

In the final HLM model, a cross-level interaction term is added. The specification for this model is the same as above, with the “same school” slope predicted by the teacher retention rate. The slope is not allowed to vary (i.e., it is fixed).
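To make the cross-level term concrete: because no random slope is estimated, the mobility slope in the final model is a fixed linear function of the school's retention category, with no school-level error term. The sketch below evaluates that slope using the Model 4 point estimates reported in the Results (-3.14 for changing schools and -2.14 for the cross-level interaction). It assumes teacher retention enters as the dichotomous high-retention indicator (1 = above 85%) and holds all other terms constant, so it gives the predicted mover-minus-stayer gap within a school rather than a full predicted score; the function name is illustrative.

```python
def mobility_slope(gamma_10, gamma_interaction, high_retention):
    """Predicted mover-minus-stayer score gap in a school: gamma_10 + gamma_interaction * W_j.

    high_retention is the school-level indicator W_j (1 = retention above 85%, 0 otherwise).
    There is no school-level error term because the slope is fixed.
    """
    return gamma_10 + gamma_interaction * high_retention


# Model 4 point estimates as reported in the Results section.
GAMMA_MOVED = -3.14
GAMMA_INTERACTION = -2.14

for w in (0, 1):
    gap = mobility_slope(GAMMA_MOVED, GAMMA_INTERACTION, w)
    print(f"high_retention={w}: mover - stayer gap = {gap:.2f} points")
```

The two gaps are -3.14 and -5.28 points; their difference is exactly the interaction coefficient, which is why that coefficient describes how much larger the cost of changing schools is in high-retention schools.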
Level 1 model:

$$Y_{ij} = \beta_{0j} + \beta_{1j}(\text{student mobility})_{ij} + \mathbf{B}_j \mathbf{Z}'_{ij} + r_{ij} \qquad (2)$$

Level 2 model:

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{teacher retention})_j + \boldsymbol{\gamma}_j \mathbf{Q}'_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11}(\text{teacher retention})_j$$

Student mobility is a binary outcome variable; therefore a standard level-1 multilevel model is inappropriate (Raudenbush & Bryk, 2002). Student mobility is indicated by whether a student remained in the same school from fall 2006, the student's freshman year, until spring 2009, when the student took the high school achievement test. To help establish a causal ordering, the longitudinal nature of the data is used. Figure 5.3 shows the timing of the data.

One challenge to this analysis is that the causal path could run in the opposite direction: student mobility could predict teacher retention, or student achievement could predict both student mobility and teacher retention. To address that criticism, the longitudinal nature of the data is used. Prior individual academic achievement from the 8th-grade pretest, as well as prior cohort-level mean mathematics achievement from 8th grade, are included to control for both prior individual and group achievement. Teacher retention is calculated over the years 2005-2008, ending one year before the student mobility outcome is measured. A prior measure of school-level student mobility (the mobility rate of the 2007 cohort of students in each school) is also included. The general structure of the HGLM models follows.

Level 1 structural model:[13]

$$\eta_{ij} = \beta_{0j} + \mathbf{B}_j \mathbf{Z}'_{ij} \qquad (3)$$

Level 2 model:

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{teacher retention})_j + \gamma_{02}(\text{student mobility, 2007 cohort})_j + \boldsymbol{\gamma}_j \mathbf{Q}'_j + u_{0j}$$
$$\beta_{pj} = \gamma_{p0} \quad \text{for } p > 0$$

where

$\eta_{ij}$ = the log odds of changing schools (the “different school” outcome) for student i in school j
$\beta_{0j}$ = each school mean, represented as a function of the grand mean, the matrix of student-level predictors, the school teacher retention rate, and the matrix of school-level predictors
$\mathbf{B}_j$ = vector of coefficients for school j
$\mathbf{Z}'$ = vector of student covariates for school j
$\gamma_{00}$ = grand mean (intercept)
$\gamma_{01}$ = effect of teacher retention on $\beta_{0j}$ (each school mean)
$\gamma_{02}$ = effect of 2007 cohort student mobility on $\beta_{0j}$ (each school mean)
$\boldsymbol{\gamma}_j$ = vector of coefficients for the school-level predictors
$\mathbf{Q}'$ = vector of school covariates
$u_{0j}$ = residual error of $\beta_{0j}$, distributed iid $N(0, \tau_{00})$

[13] The level-1 model in an HGLM consists of three parts: a sampling model, a link function, and a structural model. In the familiar HLM case, the sampling model assumes that $Y_{ij}$, given the predicted value $\mu_{ij}$, is distributed NID($\mu_{ij}$, $\sigma^2$). The level-1 predicted value $\mu_{ij}$ can be transformed through the link function so that the predictions remain within the permissible interval, which produces the transformed predicted value $\eta_{ij}$. This transformed predicted value is then related to the predictors of the model through the linear structural model. Combining the sampling model, link function, and level-1 structural model reproduces the familiar level-1 HLM model (Raudenbush & Bryk, 2002). For the binary mobility outcome, the sampling model is instead Bernoulli with a logit link, so the level-1 variance, $\mu_{ij}(1 - \mu_{ij})$, is heteroskedastic.

A baseline for the HLM models with mathematics as the outcome is established by estimating an unconditional random-effects ANOVA (output not reported). This allows for the calculation of the intraclass correlation, the proportion of variance that lies between schools. Following the estimation of this baseline model, a series of multilevel models is estimated. Four multilevel models are presented in Table 5.2 (mathematics). The first, a bivariate model, estimates the relationship between school-level teacher retention and student achievement in mathematics. The second (Model 2) adds a second school-level predictor, percent free and reduced lunch, which preliminary analyses showed to be highly correlated with student achievement, in order to control for other key school-level factors. This model also introduces a pretest measure of mathematics achievement at the student level to account for student prior ability.
The final two models are fully specified, with all predictors at level 1 and level 2; the final model also contains a cross-level interaction between teacher retention and students who remain in the same school, to test the hypothesis that teacher retention and student retention have a multiplicative effect on student achievement.[14]

A similar modeling scheme is used for the models predicting student mobility, beginning with a bivariate model in which teacher retention predicts student mobility. The second model adds only the school-level measure of prior student mobility for the 2007 cohort. The final model is a full model, including all level-1 predictors (gender, race, and program eligibility) as well as the level-2 predictors.

Sensitivity Analyses

When using data from observational studies, there is a concern that an unobserved characteristic might affect the outcome and invalidate the inferences drawn from the study. This concern applies to state administrative data as well, because state data are rich in observations but do not include a large number of variables. Sensitivity analyses are therefore conducted to characterize the robustness of the inferences to the potential impact of unobserved confounding variables (Frank, 2000). Chapter 2: Data and Methods provides further detail regarding these analyses.

RESULTS

Descriptive Statistics of Key School Variables by Teacher Retention Rate

Table 3.9 (included in Chapter 3) presents mean school-level teacher retention rates by school characteristics, in order to develop a profile of the types of schools that have low teacher retention rates.
This table was described in detail in Chapter 3, so only key details are repeated here. In general, the types of schools with lower mean teacher retention rates are city schools, schools with high proportions of minority and low-income students, small schools, and charter schools. Schools with a lower percentage of teachers holding professional licenses, and schools with a higher percentage of minority teachers, also appear to have lower mean teacher retention rates.

[14] All level-1 predictors except pretest score and same school are grand-mean centered, in order to control for their effect rather than partial out the impact attributable to student and school. A model with all level-1 predictors group-mean centered was also run (output not reported); the effects for gender and race are largely at the student level, not at the school-mean level. Therefore, race and gender are grand-mean centered in the reported models.

MULTILEVEL MODEL RESULTS

Mathematics Achievement Outcomes[15]

To assess the relationship between student mathematics achievement and school-level teacher retention, the series of multilevel models outlined above is estimated. School-level teacher retention is positively and significantly associated with school mean mathematics achievement (see Table 5.2). In the bivariate model (Model 1), schools with high teacher retention rates have mean mathematics achievement scores that are 13.28 scale score points higher than schools with lower teacher retention rates (p < .001). Relative to the unconditional model, the bivariate model explains 21% of the variance. This effect of teacher retention remains in Model 2, which adds the measure of student prior ability (the mathematics pretest score) and another important school-level covariate, percent free and reduced lunch.
Schools with high retention rates have mean mathematics achievement scores that are 2.46 scale score points higher than schools with low retention rates (p < .001). Importantly, this model explains 88% of the variance at level 2 and 52% of the variance at level 1, which suggests that these two school measures, teacher retention and percent free and reduced lunch, account for much of the variation in school mean mathematics achievement.

[15] Results from the baseline model, a one-way ANOVA with random effects, yield an intraclass correlation of 18% for the mathematics achievement outcome.

It could be the case that these measures both serve as proxies: teacher retention for a set of “school climate” predictors, and percent free and reduced lunch for a set of structural characteristics.

Model 3 and Model 4 introduce additional school-level and individual predictors beyond those included in Model 2. In Model 3, the relationship between high teacher retention and mathematics achievement is no longer statistically significant ($\gamma_{01} = 0.27$, p = .543). However, other workforce composition variables are significant. Each one-percentage-point increase in the percentage of teachers with professional licenses is associated with a 0.08-point increase in the school's mean mathematics scale score ($\gamma_{02} = 0.08$, p = .010). Schools with a higher percentage of minority teachers have significantly lower mean mathematics achievement, with each one-percentage-point increase in the percentage of minority teachers associated with a 0.07-point decrease in mean mathematics scale score ($\gamma_{03} = -0.07$, p = .001). The percentage of highly qualified teachers is also related to mathematics achievement, with schools with a greater percentage of highly qualified teachers demonstrating higher mean mathematics achievement ($\gamma_{04} = 0.04$, p = .008).
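The variance summaries in this section can be reproduced from variance components under the usual definitions, which are assumed here: the intraclass correlation is the between-school share of total variance in the unconditional model, $\tau_{00}/(\tau_{00} + \sigma^2)$, and the proportion of variance explained at a level is the proportional reduction in that level's variance component relative to the unconditional model. The variance components below are made-up numbers, chosen only so the arithmetic lands on the reported 18%, 88%, and 52%; the study's estimated components are not reported here.

```python
def intraclass_correlation(tau_00, sigma2):
    """Between-school share of total variance from an unconditional (random-effects ANOVA) model."""
    return tau_00 / (tau_00 + sigma2)


def proportion_variance_explained(var_unconditional, var_conditional):
    """Proportional reduction in a variance component relative to the unconditional model."""
    return (var_unconditional - var_conditional) / var_unconditional


# Hypothetical variance components (not the study's estimates).
tau_unconditional, sigma2_unconditional = 180.0, 820.0
tau_model2, sigma2_model2 = 21.6, 393.6

print(round(intraclass_correlation(tau_unconditional, sigma2_unconditional), 2))     # 0.18
print(round(proportion_variance_explained(tau_unconditional, tau_model2), 2))        # 0.88
print(round(proportion_variance_explained(sigma2_unconditional, sigma2_model2), 2))  # 0.52
```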
Because all of these variables relate to the organizational characteristics of a school's instructional workforce, the evidence suggests that this composition matters for student achievement. From a school structural perspective, the percentage of students eligible for free and reduced lunch was again related to lower mean mathematics achievement, with each one-percentage-point increase in the percentage of students eligible for free or reduced lunch associated with a 0.05-point decrease in the school's mean mathematics achievement ($\gamma_{06} = -0.05$, p = .004).

With regard to the student-level predictors included in Model 3, the key predictor of interest is student mobility. In Model 3, students who are not in the same school in 11th grade as they were in 9th grade have significantly lower mathematics achievement than students who remain in the same school ($\gamma_{10} = -3.07$, p < .001). The group mean of the same-school variable, entered at level 2, is not significant; this suggests that the relationship between student mobility and mathematics achievement exists primarily at the individual level, that is, the negative effect of changing schools is experienced by the student. This effect is apparent even after controlling for student race and gender, as well as student prior ability.

Student mobility and teacher retention are both school organizational characteristics, and both are therefore of interest in this analysis. To estimate a potential interaction between these two predictors, a final model (Model 4) was estimated that includes a cross-level interaction between teacher retention and the same-school indicator. Again, students who change schools have lower mathematics achievement scores ($\gamma_{10} = -3.14$, p < .001).
Most importantly, the cross-level interaction term is significant ($\gamma_{12} = -2.14$, p = .019), which indicates that the mathematics score gap between students who change schools and students who stay is, on average, 2.14 points larger in schools with high teacher retention than in schools with lower teacher retention. In other words, teacher retention has a magnifying effect for students who remain in the same school. This suggests that students who do not stay in the same school and who are in schools with low teacher retention are at a serious disadvantage in terms of their mathematics achievement outcomes.

The final model explains 92% of the variance at the school level and 53% of the variance at the student level, indicating that this school-level model captures much of the variability in mean mathematics achievement across schools. However, given the restricted range of student predictors, considerable variation in student performance remains to be explained.[16]

The student achievement models provide two important pieces of evidence about how teacher retention interacts with student achievement and student mobility. First, in the final model, the teacher retention rate is not a significant predictor of mathematics outcomes. However, student mobility is highly significant, with students who remain in the same school performing much better on mathematics achievement tests. This suggests that student mobility is a stronger predictor of student achievement than teacher retention. The second hypothesis, however, still needs to be investigated: does teacher retention predict student mobility, and thus relate to student achievement through student mobility? The models below investigate that relationship.

Student Mobility as an Outcome

In the bivariate model, school-level teacher retention is strongly related to student mobility.
Students in schools with high levels of teacher retention (defined as above 85% of teachers retained) have odds of changing schools 0.35 times those of students in schools with low levels of teacher retention. Model 2 adds a prior measure of school student mobility, the aggregate mobility of the 2007 cohort. In this model, students in schools with high teacher retention rates have odds of changing schools 0.55 times those of students in schools with low teacher retention rates (p < .001). Prior school-level student mobility is also significantly associated with students changing schools: students in schools with higher prior student mobility have higher odds of changing schools (odds ratio = 1.06, p < .001).

[16] These models were also estimated with ELA scale scores as the outcome in order to test these findings. The results were nearly identical and are not presented here.

In the full model (Model 3), the relationship between school-level teacher retention and student mobility remains. Students in schools with high levels of teacher retention have odds of changing schools 0.62 times those of students in schools with low levels of teacher retention (p = .014), controlling for prior school-level cohort mobility and school- and student-level covariates. Given that the previous models demonstrated a strong relationship between student mobility and achievement outcomes, this suggests that low levels of teacher retention are related to high levels of student mobility, which in turn can lead to lower student achievement. Students in schools with higher levels of 2007 cohort student mobility also have higher odds of changing schools (odds ratio = 1.04, p = .008). This again suggests that these “churn” rates operate in a feedback loop.
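Because the mobility models are logistic, their coefficients are odds ratios, not ratios of probabilities. The sketch below converts an odds ratio into the probability it implies for a comparison group, given a reference probability. It assumes a reference probability of 0.14 (the overall mean of the student mobility indicator in Table 5.1) purely for illustration: the actual mobility rate in low-retention schools is not reported in this section, and the conversion ignores the covariates and random effects in the fitted model.

```python
import math


def prob_from_odds_ratio(p_reference, odds_ratio):
    """Probability implied by applying an odds ratio to a reference probability."""
    odds_reference = p_reference / (1 - p_reference)
    odds_new = odds_ratio * odds_reference
    return odds_new / (1 + odds_new)


def inverse_logit(eta):
    """Map a log-odds value (the eta_ij scale of the HGLM) to a probability."""
    return 1 / (1 + math.exp(-eta))


p_low = 0.14                                # assumed reference probability
p_high = prob_from_odds_ratio(p_low, 0.62)  # Model 3 odds ratio for high retention
print(round(p_high, 3), round(p_high / p_low, 2))  # 0.092 0.65
```

With this reference rate, an odds ratio of 0.62 corresponds to a probability ratio of about 0.65: close to, but not the same as, 0.62. The two diverge further as the outcome becomes more common.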
Students in schools with high levels of prior student mobility are more likely to leave those schools; students in schools with high rates of teacher retention are less likely to leave those schools; and when students remain in those schools, this creates a more stable school culture that leads both to a more stable population and to increased achievement. Turning to the other school-level covariates, the average mathematics pretest score for the school was negatively and significantly related to student mobility: students in schools with higher levels of prior mathematics ability had lower odds of changing schools (odds ratio = 0.97, p = .009). At the student level, a student's prior mathematics achievement was negatively related to the likelihood of changing schools (odds ratio = 0.99, p < .001). Students who are eligible for free and reduced lunch have odds of changing schools 1.66 times those of their non-eligible counterparts (p < .001). Black and multiethnic students were both significantly more likely to change schools than their white counterparts (odds ratios 3.30 and 3.41, respectively), as were Hispanic students (odds ratio 1.26). Interestingly, female students had higher odds of changing schools than male students (odds ratio = 1.09, p = .004). Students in special programs had odds of changing schools 2.02 times those of students not in these programs.

Sensitivity Analyses

The models presented above can account only for measurable confounding variables. One well-known issue in using state administrative data is that these data are rich in observations but often “poor” in variables (Figlio, 2010).
Therefore, robustness indices are used here as a complement, to address problems related to unmeasured confounds.[17] While the Impact Threshold for Confounding Variables (ITCV) does not control for the impact of these confounds, it does quantify how powerful they would need to be to negate the inferences drawn (Crosnoe, 2009; Frank, 2000).[18] The ITCV is the minimum product of the correlation between the predictor and the confound and the correlation between the outcome and the confound necessary to reduce an estimate below the threshold for statistical significance of the key association of interest, in this case the association between teacher retention and student mobility. If the actual, albeit unknown, product of these two correlations is greater than this threshold, then the inclusion of that unobserved covariate would invalidate the inference.

[17] As developed by Frank et al. (2010), there are three major “threats” to inference in observational studies: a biased sample, a small sample, and unobserved covariates. Given that this sample nearly approximates the population, as it covers the entire state of Michigan, and given its sheer size, I consider the greatest threat to inference here to be unmeasured confounds. This is likely true of most studies using state administrative data.

[18] Further details regarding sensitivity analyses are provided in Technical Appendix C.

I therefore calculate the ITCV for the teacher retention predictor. The impact of an unmeasured confound (recall that impact is the product of the confound's correlation with the predictor and its correlation with the outcome, $r_{x \cdot v} \times r_{y \cdot v}$; see Frank, 2000) would have to exceed .04 in magnitude to invalidate the inference. Thus, to invalidate the inference that low levels of teacher retention predict student mobility, a confounding variable would have to be correlated with teacher retention at 0.19 and with student mobility at 0.19 (0.19 × 0.19 ≈ .04), accounting for covariates (Frank, 2000).
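The threshold follows from Frank's (2000) algebra. If a confound is equally correlated with the predictor and the outcome, with impact k, the partial correlation of predictor and outcome becomes (r_xy − k)/(1 − k); setting that equal to the smallest correlation that is still significant, r# = t_crit/√(df + t_crit²), and solving gives k = (r_xy − r#)/(1 − r#), with each component correlation equal to √k. The sketch below implements that algebra. It substitutes the normal critical value for the t critical value (reasonable only for large df), takes r_xy and df as caller-supplied inputs because they are not reported here, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def r_threshold(df, alpha=0.05):
    """Smallest correlation significant at a two-sided alpha (normal approximation to t)."""
    t_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return t_crit / math.sqrt(df + t_crit ** 2)


def itcv(r_xy, df, alpha=0.05):
    """Impact threshold k = (|r_xy| - r#) / (1 - r#); 0 if the estimate is already not significant."""
    r_sharp = r_threshold(df, alpha)
    r = abs(r_xy)
    if r <= r_sharp:
        return 0.0
    return (r - r_sharp) / (1 - r_sharp)


def component_correlation(k):
    """Correlation with both predictor and outcome that yields impact k when the two are equal."""
    return math.sqrt(k)


# The text's figures: correlations of about 0.19 with each variable give an impact near .04.
print(round(0.19 * 0.19, 3), round(component_correlation(0.04), 2))  # 0.036 0.2
```

Using the absolute value lets the same function report the magnitude of the threshold for a negative association, which is how the .04 figure is stated in the text.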
These are moderate correlations.[19] It is important to note that this calculation assumes that the unmeasured confound is uncorrelated with the measured covariates (Frank, 2000). However, the relevant partial correlations that would create the impact of an unobserved confound would be smaller than these zero-order correlations, because such a confound would be correlated with the existing covariates (Frank, 2000; Frank, Sykes, et al., 2008).

It is also useful to compare the threshold for an unmeasured variable with the impacts of the measured covariates. At level 1, the strongest impact among the measured covariates, aside from the pretest of student ability, is that of free and reduced lunch eligibility.[20] The impact of free and reduced lunch eligibility on the coefficient for school-level teacher retention on student mobility is .011, which is the product of its correlation with teacher retention (-.15) and its correlation with student mobility (-.07).

[19] These calculations are obtained by regressing the treatment (school-level teacher retention) on all of the predictors and obtaining the R² value, and then regressing the outcome on all of the predictors except the treatment and obtaining the R² value. Because the outcome is a student-level dichotomous variable, which produces pseudo-R² values, the variables were aggregated to average school mobility, and the continuous teacher retention predictor was used. Seltzer, Kim, and Frank (2006) suggest that precision weights should be used when performing a cluster-level regression with unbalanced data. These weights are calculated as $1/(\tau + V_j)$, where $V_j = \sigma^2/n_j$ and $\sigma^2 = np(1-p)$. See Technical Appendix D for more detail.

[20] The impacts of the pretest measures of student ability and of school-level student mobility are not compared with the impact of an unobserved covariate because they are obviously strong predictors of the outcomes. The pretest is well known to be the most significant predictor and is thus not a relevant comparison.
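The benchmark comparison is simple arithmetic on the impact formula: a measured covariate's impact is the product of its correlation with the predictor and its correlation with the outcome. The sketch reproduces the free and reduced lunch figure from the correlations given above, and the level-2 school mathematics figure from the correlations given in the next paragraph, then compares both with the .04 threshold. It also includes the precision weight from footnote 19, $1/(\tau + V_j)$ with $V_j = \sigma^2/n_j$; $\tau$ and $\sigma^2$ are caller-supplied, and the example values for them are hypothetical.

```python
def impact(r_predictor_cv, r_outcome_cv):
    """Impact of a (measured or hypothetical) covariate: product of its two correlations."""
    return r_predictor_cv * r_outcome_cv


def precision_weight(tau, sigma2, n_j):
    """Weight 1 / (tau + V_j) for cluster j, with sampling variance V_j = sigma2 / n_j."""
    v_j = sigma2 / n_j
    return 1.0 / (tau + v_j)


ITCV_THRESHOLD = 0.04  # threshold reported in the text

free_lunch_impact = impact(-0.15, -0.07)  # level 1: correlations with retention and mobility
school_math_impact = impact(0.30, 0.20)   # level 2: correlations with retention and mobility
print(f"{free_lunch_impact:.4f}", free_lunch_impact > ITCV_THRESHOLD)    # 0.0105 False
print(f"{school_math_impact:.2f}", school_math_impact > ITCV_THRESHOLD)  # 0.06 True

# Hypothetical values: with tau and sigma2 fixed, larger schools receive larger weights.
print(precision_weight(tau=1.0, sigma2=4.0, n_j=4), precision_weight(tau=1.0, sigma2=4.0, n_j=400))
```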
At level 2, the strongest impact among the measured level-2 covariates, aside from the prior measure of school-level student mobility, is that of school average mathematics achievement. The impact of school mathematics achievement on student mobility is .06, which is the product of its correlation with teacher retention (.30) and its correlation with school mean mobility (.20). Thus, the impact of .04 that an unmeasured covariate would need in order to invalidate the inference is greater than the impact of the strongest level-1 covariate (.011), although less than the impact of the strongest level-2 covariate (.06). This suggests that an unmeasured covariate would need an impact comparable in size to those of the measured covariates.

Limitations

The most significant limitation of this study is that, as in many studies using state administrative data, only a limited number of covariates are available for the models, which raises the possibility of unobserved covariates that might invalidate the inferences made here. While I have attempted to identify and use all available data, to use effective modeling strategies, and to quantify the robustness of the inferences to the influence of unobserved covariates, this does not solve the fundamental problem. It is possible that teacher mobility is caused by factors such as school leadership or neighborhood characteristics, which are not available for the models. In the future, the relationship between teacher retention, student mobility, and student outcomes could be further investigated using fixed effects models to address these unobserved characteristics and their potential impact on the relationships studied here. Additionally, at this point in time, assessment data that match with a pretest are available for only one cohort, although more data will become available each year.
Finally, Michigan's assessment system allows for only one measurement of achievement at the high school level, which is not optimal from a measurement perspective.

Conclusions

Although teacher retention initially appears to have a direct effect on student achievement outcomes, this direct relationship can be explained by other covariates; its remaining effect operates through an interaction with student mobility. The results from the first analysis demonstrate that remaining in the same school has a positive impact on student achievement outcomes. The second analysis in turn demonstrates that students in schools with higher levels of teacher retention are more likely to stay in the same school, which has a positive impact on achievement. Therefore, teacher retention appears to affect student achievement via the mechanism of student mobility. This suggests the importance of organizational culture for student achievement outcomes. Although I do not directly measure or test "school culture," the most positive impacts on student achievement are evident when there is a stable teaching workforce and a stable student body. This stability allows the organization to function effectively as a unit, and allows cohesion, community, trust, and reciprocity to develop between teachers and students, and among the teachers themselves. Given the importance of schools maintaining a stable, qualified instructional workforce, it is important to consider what factors might lead to increased teacher mobility. These factors may include school leadership, changes in state policies, economic changes in the state as a whole, or inhospitable working conditions. It is critical to identify which factors may be increasing teacher mobility, as this analysis shows that when teacher mobility increases, there can be negative consequences for student achievement.
Table 5.1: Variable Description and Summary Statistics

Variable | Description | Mean | SD | Min | Max
Student Level (n = 96,556)
Math achievement score | Math scale score on 2009 Michigan Merit Examination, Michigan's high school assessment | 1096.7 | 31.1 | 950 | 1250
Math pretest achievement score | Math scale score on 2006 MEAP test, last test administered prior to high school | 813.59 | 24.1 | 470 | 952
Student mobility | Dummy, 1 = different school in 9th grade than 11th grade | 0.14 | 0.35 | 0 | 1
Gender | Dummy, 1 = female | 0.51 | 0.5 | 0 | 1
Racial/ethnic code:
  American Indian/Pacific Islander | Dummy, 1 = yes. Combined American Indian and Pacific Islander due to small sample sizes | 0.01 | 0.1 | 0 | 1
  Asian | Dummy, 1 = yes | 0.02 | 0.15 | 0 | 1
  Black | Dummy, 1 = yes | 0.15 | 0.35 | 0 | 1
  White | Reference | 0.79 | 0.41 | 0 | 1
  Hispanic | Dummy, 1 = yes | 0.03 | 0.17 | 0 | 1
  Multiple race/ethnicity | Dummy, 1 = yes | 0.01 | 0.08 | 0 | 1
Free/reduced lunch eligible | Dummy, 1 = yes. Indicates whether a student is eligible for free or reduced lunch | 0.27 | 0.44 | 0 | 1
Special programs | Dummy, 1 = eligible for Title 1, special education, Section 504, Limited English Proficient, or migrant programs | 0.01 | 0.11 | 0 | 1
School Level (n = 580 high schools)
Average teacher retention rate | Percentage of teachers who were retained in a school from one year to the next. Averaged over four years (2005-2008), three retention time points | 86.2 | 8.57 | 0 | 100
Percent of teachers w/professional licenses | Percentage of teachers who have professional licenses in a school. Averaged over four years (2005-2008) | 83.29 | 12.1 | 25.2 | 100
Percent minority teachers | Percentage of teachers in a school who are minority teachers. Averaged over four years | 7.97 | 17.5 | 0 | 100
Percent highly qualified teachers | Percentage of teachers in a school who are highly qualified. Averaged over four years (2005-2008) | 80.61 | 12.5 | 0 | 100
Percent free/reduced lunch | Percentage of student body eligible for free/reduced lunch | 35.88 | 19.9 | 0 | 99.63
Percent minority | Percentage of the student body that is minority students | 23.27 | 30 | 0 | 100
Locale:
  City | Dummy, 1 = yes | 0.17 | 0.37 | 0 | 1
  Suburb | Reference | 0.28 | 0.45 | 0 | 1
  Town | Dummy, 1 = yes | 0.13 | 0.34 | 0 | 1
  Rural | Dummy, 1 = yes | 0.42 | 0.49 | 0 | 1
Charter | Dummy, 1 = school is a charter school | 0.06 | 0.23 | 0 | 1
Magnet | Dummy, 1 = school is a magnet school | 0.11 | 0.32 | 0 | 1
School size:
  Small school | School has fewer than 300 students | 0.15 | 0.35 | 0 | 1
  Medium school | School has 300-999 students | 0.53 | 0.5 | 0 | 1
  Large school | School has over 1000 students | 0.32 | 0.47 | 0 | 1
School-level student mobility (2007 cohort) | School-aggregate of the proportion of the 2007 cohort of MME testers who changed schools since the fall of 2005 | 13.02 | 15.9 | 0 | 100

Table 5.2: Hierarchical Linear Model Predicting Student Mathematics Achievement as a Function of School-level Teacher Retention
(Cells show coefficient (se), p-value)

Variable | Bivariate | Model 2 | Model 3 | Model 4
Intercept, γ00 | 1093.46 (.51), .000 | 1093.39 (.22), .000 | 1093.47 (.19), .000 | 1093.47 (.19), .000
Level 2 Variables
High teacher retention rate (greater than 85%), γ01 (less than 85% retention rate = reference) | 13.28 (1.31), .000 | 2.46 (.53), .000 | 0.27 (.45), .544 | 0.27 (.45), .543
Instructional Workforce Characteristics
Proportion teachers w/prof licenses, γ02 | | | 0.08 (.03), .010 | 0.08 (.03), .010
Proportion minority teachers, γ03 | | | -0.07 (.02), .001 | -0.07 (.02), .001
Proportion HQ teachers, γ04 | | | 0.04 (.02), .008 | 0.04 (.02), .008
School Structural Characteristics
Percent minority students, γ05 | | | 0.03 (.02), .074 | 0.03 (.02), .077
Percent free/reduced lunch students, γ06 | | | -0.05 (.02), .004 | -0.05 (.02), .004
City, γ07 | | | 0.13 (.65), .841 | 0.13 (.65), .840
Town, γ08 | | | 0.79 (.62), .202 | 0.79 (.62), .201
Rural, γ09 (suburb = reference) | | | 1.48 (.54), .007 | 1.48 (.54), .007
Small school, γ010 | | | -0.86 (.74), .247 | -0.86 (.74), .248
Large school, γ011 (medium school = reference) | | | 0.36 (.45), .426 | 0.36 (.45), .426
Charter, γ012 | | | 1.18 (1.75), .500 | 1.18 (1.75), .501
Magnet, γ013 | | | 0.17 (.63), .787 | 0.17 (.63), .787
Retention difference (Y3ret - Y1ret), γ014 | | | 5.35 (1.73), .003 | 5.35 (1.73), .003
Group Means for Level 1 Variables
Mean math pretest, γ015 | | 1.20 (.03), .000 | 0.97 (.03), .000 | 0.97 (.03), .000
Mean student mobility, γ016 | | | -.01 (.007), .318 | -.01 (.007), .318
Student Characteristics
Student mobility, γ10 | | | -3.07 (.41), .000 | -3.14 (.41), .000
Math pretest, γ20 | | 0.91 (.008), .000 | 0.90 (.008), .000 | 0.90 (.01), .000
Free/reduced lunch eligible, γ30 (non-eligible = reference) | | -3.81 (.20), .000 | -3.16 (.19), .000 | -3.15 (.19), .000
Student race (white = reference)
  American Indian, γ40 | | | -2.18 (.74), .004 | -2.17 (.74), .004
  Asian, γ50 | | | 1.52 (.51), .004 | 1.54 (.51), .003
  Black, γ60 | | | -7.11 (.39), .000 | -7.07 (.39), .000
  Hispanic, γ70 | | | -2.10 (.51), .000 | -2.08 (.51), .000
  Multiethnic, γ80 | | | -2.84 (.96), .003 | -2.83 (.96), .004
Female, γ90 | | | -0.21 (.14), .135 | -0.21 (.14), .139
Program eligible, γ100 | | | -5.43 (.72), .000 | -5.42 (.72), .000
Cross-level Interactions (diff school × teacher retention)
Intercept (diff school), γ110 | | | | -3.14 (.41), .000
High school-level teacher retention rate, γ120 | | | | -2.14 (.92), .019
Random Effects
Level 2 variance component, τ00 (df) | 138.54 (578) | 21.47 (577) | 13.74 (563) | 13.74 (563)
Level 1 variance component, σ² | 805.56 | 381.94 | 378.04 | 377.99
Proportion variance explained, Level 1 | n/a | 52% | 53% | 53%
Proportion variance explained, Level 2 | 21% | 88% | 92% | 92%
Note: Pretest and student mobility are group mean centered, with group means at level 2; other level 1 predictors are grand mean centered. All level 2 variables are grand mean centered.

Table 5.3: Hierarchical Generalized Linear Model Predicting Student Mobility (1 = different school from freshman to junior year), Odds Ratios Reported
(Cells show odds ratio (se), p-value)

Variable | Bivariate | Model 2 | Model 3
Intercept, γ00 | 0.10 (.08), .000 | 0.10 (.07), .000 | 0.10 (.07), .000
Level 2 Variables
High teacher retention rate (greater than 85%), γ01 (low teacher retention rate = reference) | 0.35 (.17), .000 | 0.55 (.17), .001 | 0.62 (.20), .014
Prior school-level student mobility (2007 cohort), γ02 | | 1.06 (.01), .000 | 1.04 (.008), .000
Instructional Workforce Characteristics
Proportion teachers w/prof licenses, γ03 | | | 1.02 (.01), .041
Proportion minority teachers, γ04 | | | 1.00 (.01), .407
Proportion HQ teachers, γ05 | | | 0.99 (.01), .034
School Structural Characteristics
Percent minority students, γ06 | | | 1.00 (.01), .415
Percent free/reduced lunch students, γ07 | | | 0.99 (.01), .039
City, γ08 | | | 0.53 (.25), .013
Town, γ09 | | | 0.65 (.27), .110
Rural, γ010 (suburb = reference) | | | 0.61 (.20), .012
Small school, γ011 | | | 1.69 (.20), .010
Large school, γ012 (medium school = reference) | | | 1.22 (.21), .345
Charter, γ013 | | | 1.91 (.48), .182
Magnet, γ014 | | | 1.16 (.22), .483
Group Means for Level 1 Variables
Mean math pretest, γ015 | | | 0.97 (.01), .009
Student Characteristics
Math pretest, γ10 | | | 0.99 (.001), .000
Free/reduced lunch eligible (1 = yes), γ20 | | | 1.66 (.04), .000
American Indian, γ30 | | | 1.34 (.14), .064
Asian, γ40 | | | 1.00 (.14), .987
Black, γ50 | | | 3.30 (.07), .000
Hispanic, γ60 | | | 1.26 (.12), .052
Multiethnic, γ70 | | | 3.41 (.19), .000
Female, γ80 | | | 1.09 (.03), .004
Program eligible, γ90 | | | 2.02 (.17), .000
Random Effects
Level 2 variance component, τ00 (df) | 3.02 (578) | 2.58 (577) | 2.57 (563)
Proportion variance explained, Level 2 | | |
Note: All level 2 variables are grand mean centered. Pretest and same school are group mean centered, with group means included at level 2. All other level 1 predictors are grand mean centered.
[Figure 5.3: Timing of the Longitudinal Data for Establishing a Causal Path. Timeline, 2005 to 2009, showing: student pretest from 8th grade; average school-level teacher retention rate, 2005-2008; school-level mobility rate, 2007 cohort; student remained in same school, 2005-2009; student outcome: achievement.]

CHAPTER 6: PROPENSITY ANALYSES OF THE EFFECTS OF TEACHER TURNOVER

The evidence provided in Chapter 5 strongly suggests the importance of high school-level teacher retention in decreasing student mobility and increasing student mathematics achievement. However, the multilevel methods and sensitivity analyses presented in Chapter 5, while rigorous, do not offer a clear framework for understanding the potential effects of increasing school-level teacher retention on student mobility. Therefore, this part of the teacher retention analysis shifts focus to the effects of a school having high teacher retention. Using a quasi-experimental design, propensity score matching, I estimate the impact of high teacher retention on student mobility by comparing similar schools that have either high or low rates of teacher retention. Both the treatment (teacher retention) and the outcome (student mobility) are analyzed as school-level variables: the proportion of teachers retained and the proportion of students who leave a school. The analysis presented in the prior chapter uses longitudinal data and multilevel modeling to attempt to establish the relationship among school-level teacher retention, student achievement, and student mobility. However, the question remains: for schools that are similar on all observed characteristics, differing only in their teacher retention rate, what is the effect of school-level teacher retention on school-level student mobility? In other words, if a school has low teacher retention, what is the potential effect on student mobility of raising its teacher retention rate to a high level?
Estimating the Effect of High School-Level Teacher Retention1

To estimate the effect of school-level teacher retention on school-level student mobility, high school-level teacher retention is treated as a "treatment" experienced by some schools. This analysis uses a set of propensity-score weighting schemes to investigate the impact of high school-level teacher retention on student mobility for three groups: schools in the margin of indifference (i.e., those most likely to respond to a treatment offer), schools in the control group (i.e., those with low teacher retention), and schools in the treatment group (i.e., those with high teacher retention). As a fourth strategy, this analysis estimates the average treatment effect on the treated using a series of matching methods, including stratification, nearest neighbor, kernel, and radius matching. The purpose is to test whether increasing school-level teacher retention may be a useful "treatment" to help stabilize the student population and decrease student mobility. To investigate this question, a school-level propensity score is estimated, and a series of propensity score analyses then compare the average student mobility rates of schools based on their probability of having high school-level teacher retention.2

Propensity Scores: Weighting Method

Propensity scores allow for the comparison of those in a "treatment" group (schools with high teacher retention) and those in a control group (schools with low teacher retention) who have similar propensities to receive the treatment. The treatment, high teacher retention, is defined as a school with over 85% of its teachers retained from year to year.3 The propensity score is the probability of a school receiving the treatment (high teacher retention), given a set of covariates. The weighting approach retains all information in the analysis, rather than losing cases as can happen with a case-by-case matching approach to propensity scores (Hirano & Imbens, 2001; Morgan & Harding, 2006; Robins, Hernán, & Brumback, 2000; Robins & Rotnitzky, 1995; Robins, Rotnitzky, & Scharfstein, 2000). Complete details of the weighting method are provided in the data and methods chapter.4 After propensity weights are constructed, each school is weighted by its propensity to receive treatment in school-level regressions.

There are some critiques of this method. When the weighting procedure is used to estimate the average treatment effect, Freedman and Berk (2008) found that the weighting method was optimal if 1) study participants (here, schools) were independent and identically distributed; 2) selection was exogenous; and 3) the selection equation was properly specified with correct predictor variables and functional forms. When these conditions are not met, the weighting method may increase random error and may bias the standard errors downward. Kang and Schafer (2007) demonstrated that using inverse probabilities as weights is sensitive to misspecification of the propensity model.

1 Chapter 2: Data and Methods provides a detailed description of propensity score matching as an analytic technique and develops the methods utilized here. Technical Appendix D also provides additional information regarding model specification and other details.

2 Originally, this analysis was conceived as a multilevel analysis that used propensity scores at level 2 with the outcome at level 1. However, because this is not the general framework by which multilevel propensity score models are estimated, it raises methodological and analytic issues that are beyond the bounds of this analysis. Technical Appendix C outlines general questions and issues, and presents evidence for results. This will be an area for further study, with the goal of developing a clear method by which level 2 propensity scores can be used to estimate level 1 effects.
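As a rough illustration of the weighting approach, the Python sketch below fits a logistic propensity model by Newton-Raphson and forms the textbook inverse-probability weights for the treatment on the treated (treated schools weighted 1, controls weighted p/(1 - p)) and for the treatment on the control (the reverse). This is a sketch under stated assumptions, not the Stata implementation used in this study: the function names and the simulated data are hypothetical, the weight formulas are the standard ones and may differ in detail from those given in the data and methods chapter, and EOTM weights are omitted because their formula is not reproduced here.

```python
import numpy as np


def fit_propensity(X, treated, n_iter=25):
    """Logistic regression of treatment on covariates via Newton-Raphson.

    Returns the coefficients (intercept first) and the fitted propensity
    scores. Assumes the data are not perfectly separable.
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    d = np.asarray(treated, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (d - p)  # score of the log-likelihood
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta, 1.0 / (1.0 + np.exp(-X @ beta))


def tot_weights(treated, pscore):
    """Treatment on the treated: treated = 1, controls = p / (1 - p)."""
    t = np.asarray(treated, dtype=bool)
    p = np.asarray(pscore, dtype=float)
    return np.where(t, 1.0, p / (1.0 - p))


def toc_weights(treated, pscore):
    """Treatment on the control: treated = (1 - p) / p, controls = 1."""
    t = np.asarray(treated, dtype=bool)
    p = np.asarray(pscore, dtype=float)
    return np.where(t, (1.0 - p) / p, 1.0)


# Hypothetical school-level data: two covariates and a binary treatment.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_p = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - X[:, 1])))
treated = (rng.random(200) < true_p).astype(int)

beta, pscore = fit_propensity(X, treated)
w_tot = tot_weights(treated, pscore)
w_toc = toc_weights(treated, pscore)
```

In this study the weights were supplied to weighted school-level regressions in Stata; in the sketch, a weighted difference in mean outcomes under `w_tot` or `w_toc` would play the analogous role.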
For these reasons, the weighting method is used as one of several methods, and its results are checked against the others.

3 Tests were done to estimate the best threshold for defining the high teacher retention treatment level. Several thresholds were included in a series of logistic regressions to see which were significant. Additionally, propensity scores were estimated for several thresholds and the distribution of treatment and control cases in each of the blocks was examined. An 85% threshold yielded the most reasonable results, and therefore is used here as the treatment.

4 These weights constitute sampling weights, and are applied in this analysis as pweights in Stata.

Checking the Weights/Trimming the Weights

One concern with the weighting strategy is that extreme values may exert undue influence on the outcome (i.e., one school counting for too many schools). This can be addressed by examining a box plot (see Figure 6.1) and trimming the weights. In this analysis, rather than dropping the observations with extreme weights, observations with weights greater than 18 are assigned a value of 18.

[Figure 6.1: Box and Whisker Plot to Demonstrate Distribution of Weights. Axis: estimated propensity score, 0 to 1.]

Checking for Balance

After the propensity score is estimated, checks are performed to ensure that the balancing property is satisfied. If balance is achieved, the treatment assignment and the observed covariates are conditionally independent, given the propensity score. This satisfies the strongly ignorable treatment assignment assumption (Guo & Fraser, 2010). Balance is checked with a series of two-sample t-tests between the treatment and control groups within "blocks" defined by the propensity score. Results from these checks for balance are shown in Table 6.1.
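The trimming rule and the block-wise balance check described above can be sketched in Python as follows. The cap of 18 comes from the text; everything else is an assumption for illustration, including the hypothetical data, the block edges, and the use of Welch's unequal-variance t statistic (Stata's pscore program may compute its tests differently).

```python
import numpy as np


def trim_weights(weights, cap=18.0):
    """Cap extreme weights at `cap` instead of dropping the observations."""
    return np.minimum(np.asarray(weights, dtype=float), cap)


def welch_t(a, b):
    """Two-sample t statistic allowing unequal variances (Welch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


def balance_by_block(covariate, treated, pscore, edges):
    """Treated-versus-control t statistic for one covariate within each
    propensity-score block; blocks are defined by the interior `edges`.
    Blocks with fewer than two units in either group are skipped."""
    x = np.asarray(covariate, dtype=float)
    t = np.asarray(treated, dtype=bool)
    blocks = np.digitize(np.asarray(pscore, dtype=float), edges)
    results = {}
    for b in np.unique(blocks):
        in_b = blocks == b
        x_t, x_c = x[in_b & t], x[in_b & ~t]
        if x_t.size >= 2 and x_c.size >= 2:
            results[int(b)] = float(welch_t(x_t, x_c))
    return results


# Hypothetical example: two blocks split at a propensity score of 0.5.
trimmed = trim_weights([1.0, 20.0, 18.5, 3.0])
balance = balance_by_block(
    covariate=[1, 2, 1, 2, 5, 6, 5, 6],
    treated=[1, 1, 0, 0, 1, 1, 0, 0],
    pscore=[0.10, 0.15, 0.12, 0.11, 0.70, 0.80, 0.75, 0.72],
    edges=[0.5],
)
```

Each block's statistic would then be compared with a critical value; balance on a covariate means no block shows a significant difference.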
There are no significant differences in the means between treatment and control on any of the covariates.5

Propensity Scores: Stratification Method

I also investigate this relationship by looking at the average treatment effect within four strata.6 This estimates the average treatment effect on the treated (ATT) using stratification matching. The ATT is computed only on the region of common support, as a weighted average (weighted by the number of treated) of the block-specific treatment effects. In turn, these are computed as the difference in average outcomes of treated and controls within each block, within which all control variables are balanced (Becker & Ichino, 2002).

Propensity Scores: Nearest Neighbor, Kernel Matching and Radius Matching

As a final check on my results, I conduct propensity matching using three additional methods: nearest neighbor matching, kernel matching, and radius matching. In radius matching, the ATT is computed by averaging over the unit-level treatment effects of the treated, where the control(s) matched to a treated observation is/are those observations in the control group that lie within a radius of .25; if there are multiple best controls, the average outcome of those controls is used. Kernel matching calculates the ATT by averaging over the unit-level treatment effects of the treated, where the control outcome matched to a treated observation is a kernel-weighted average of control unit outcomes. Finally, nearest neighbor matching computes the ATT by averaging over the unit-level treatment effects of the treated, where the control(s) matched to a treated observation are those observations in the control group that have the closest propensity score; if there are multiple nearest neighbors, as here, where three neighbors were specified, the average outcome of those controls is used.7,8,9

The outcome variable is a count of the number of students in a given school who changed schools between their ninth and eleventh grade years. This variable is positively skewed, suggesting the use of a Poisson or negative binomial regression model, but these models do not fit the data well. Therefore, this school-level student mobility variable is dichotomized into high student mobility (greater than 10%) and non-high student mobility, and analyzed using a logistic model for the bivariate, multivariate, and weighted regressions. However, a logistic specification is not an option in the other propensity score programs; the program authors suggest simply treating the dichotomous outcome variable as if it were continuous (Becker, 2010; Angrist, 2001). Given that there are known weaknesses in both the weighting method and these matching methods, I present the coefficients and standard errors from all models, to provide as much evidence as possible regarding the potential effect of high school-level teacher retention on student mobility.

5 This is a function of the pscore estimation method in Stata. The propensity score cannot be estimated if the balancing property is not satisfied.

6 This is computed using Stata's atts command within the pscore program. See Becker & Ichino, 2002.

7 For a more extensive development of each of these three methods and their strengths and weaknesses, please see Technical Appendix C.

8 Nearest neighbor, radius and kernel matching were implemented using Stata's psmatch2 program.

9 We would expect all estimators to be very similar, as with growing sample size they come closer to comparing only exact matches (Smith, 2000). In this analysis, however, there are 580 schools; while this is a reasonable sample size, it is by no means very large, and in smaller samples the choice of matching estimator is important (Heckman, Ichimura, & Todd, 1997). To avoid bad matches, I imposed a caliper, which is a tolerance level on the maximum propensity score distance; I chose .25. It is important to note that these findings are very sensitive to the caliper size. This is a limitation of the caliper method (Smith & Todd, 2005). Various calipers were also tested, as suggested by Guo & Fraser (2010).

Results

The first step in a propensity score matching analysis is to estimate the propensity score. Only variables that simultaneously influence the participation decision and the outcome, and only variables that are unaffected by participation in treatment, should be included in the propensity score matching equation (Caliendo & Kopeinig, 2005). Therefore, only variables that are either fixed over time or measured before participation in treatment (i.e., high school-level teacher retention) are included. The teacher retention rate is a three-year average from 2006-2008, which precludes including some variables, such as the average percent of professionally licensed teachers in each building, in the propensity score estimation. The percent minority students, percent free and reduced lunch students, locale, percent female students, and charter and magnet status are considered to be time-invariant characteristics. School size category is taken from the 2007 Common Core of Data. To account for student prior performance, the average student pretest scores in math and English language arts are included.10 The results of the logistic regression to estimate the school-level propensity to have high teacher retention are presented in Table 6.2.11 Schools with a greater proportion of minority students were less likely to have high teacher retention. Large schools (those over 1000 students) were more likely to have high teacher retention. None of the other predictor variables meet the threshold for statistical significance.
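The stratification and nearest-neighbor ATT estimators described in this section can be sketched in Python as below. This is a simplified illustration, not a reimplementation of Stata's atts or psmatch2: the function names and data are hypothetical, the stratification version simply skips blocks that lack treated or control schools, and the nearest-neighbor version matches with replacement, breaks distance ties by data order, and applies an optional caliper (the text uses three neighbors and a caliper of .25).

```python
import numpy as np


def stratification_att(outcome, treated, block):
    """ATT by stratification: treated-minus-control mean outcome in each
    propensity-score block, averaged with weights equal to the number of
    treated units per block. Blocks without both groups are skipped."""
    y = np.asarray(outcome, dtype=float)
    t = np.asarray(treated, dtype=bool)
    block = np.asarray(block)
    effects, n_treated = [], []
    for b in np.unique(block):
        in_b = block == b
        y_t, y_c = y[in_b & t], y[in_b & ~t]
        if y_t.size == 0 or y_c.size == 0:
            continue
        effects.append(y_t.mean() - y_c.mean())
        n_treated.append(y_t.size)
    return float(np.average(effects, weights=n_treated))


def nearest_neighbor_att(outcome, treated, pscore, k=3, caliper=None):
    """ATT by k-nearest-neighbor matching on the propensity score, with
    replacement. Each treated unit is compared with the mean outcome of its
    k closest controls; controls farther than the caliper are discarded, and
    treated units left with no match are dropped (nan if none remain)."""
    y = np.asarray(outcome, dtype=float)
    t = np.asarray(treated, dtype=bool)
    p = np.asarray(pscore, dtype=float)
    p_c, y_c = p[~t], y[~t]
    effects = []
    for p_i, y_i in zip(p[t], y[t]):
        dist = np.abs(p_c - p_i)
        nearest = np.argsort(dist, kind="stable")[:k]
        if caliper is not None:
            nearest = nearest[dist[nearest] <= caliper]
        if nearest.size == 0:
            continue
        effects.append(y_i - y_c[nearest].mean())
    return float(np.mean(effects))


# Hypothetical data; outcome 1 = high student mobility.
att_strat = stratification_att([0, 0, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 0, 0],
                               [0, 0, 0, 0, 1, 1, 1])
att_nn = nearest_neighbor_att([0, 1, 1, 0, 0], [1, 0, 0, 0, 0],
                              [0.5, 0.45, 0.55, 0.9, 0.52], k=3, caliper=0.25)
```

Because the outcome is dichotomous and treated as continuous here, each ATT is a difference in proportions of schools with high mobility.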
Table 6.3 presents the results of the propensity score analyses under nine different conditions: bivariate, regression with all covariates, weighted by propensity (EOTM, treatment on the treated, treatment on the control), stratification, nearest neighbor, kernel matching, and radius matching.12 All of these methods estimate the effect of high teacher retention on school-level student mobility by comparing student mobility in schools with similar propensities to have the "treatment" (high teacher retention), where some schools actually have high teacher retention and others do not. When schools are weighted by their propensity to have high teacher retention using the EOTM, schools with high teacher retention have .46 times the odds of having high student mobility, and this effect meets the threshold for statistical significance. Similarly, the treatment effect for the treated shows that schools that received the treatment had 0.49 times the odds of high student mobility of schools with similar propensity that did not receive it. Finally, the strongest effect is where we might have expected it: treatment on the control, which suggests how strong the effect of this treatment might have been for those who did not receive it. The treatment effect for the control is 0.42, which implies that schools in the control condition might see a reduction in their odds of having high student mobility if they were to have high teacher retention instead of lower teacher retention. For comparison, the unweighted regression with covariates and the unweighted bivariate regression are included. The four additional matching methods all demonstrate the same relationship.

10 These are from the 2005 MEAP, which is prior to the assignment to treatment.

11 All propensity scores are calculated using the pscore program in Stata.
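Because the weighted estimates come from logistic models, they are odds ratios, and an odds ratio equals a ratio of probabilities only when the outcome is rare. The Python sketch below converts an odds ratio into the probability it implies, given a baseline probability for comparable low-retention schools. The 40% baseline is purely hypothetical and is not a figure from this study; the function name is illustrative.

```python
def implied_probability(baseline_p: float, odds_ratio: float) -> float:
    """Probability implied by applying an odds ratio to a baseline probability."""
    if not 0.0 < baseline_p < 1.0:
        raise ValueError("baseline_p must be strictly between 0 and 1")
    odds = baseline_p / (1.0 - baseline_p) * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical baseline: 40% of comparable low-retention schools have high mobility.
baseline = 0.40
for label, odds_ratio in [("EOTM", 0.46), ("TOT", 0.49), ("TOC", 0.42)]:
    p = implied_probability(baseline, odds_ratio)
    print(f"{label}: OR = {odds_ratio:.2f}, implied probability = {p:.3f}, "
          f"probability ratio = {p / baseline:.3f}")
```

With this assumed baseline, an odds ratio of .46 implies a probability of about .235, roughly .59 times the baseline rather than .46 times; the gap shrinks as the baseline probability falls.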
Sensitivity Analyses

If there are unobserved variables that affect both assignment to treatment and the outcome variable, there might be "hidden bias" that these methods do not account for. Given the relatively limited data available, this may be the case here. Would the inferences be altered by unobserved factors? The question of interest in sensitivity analyses for propensity scores is slightly different from that asked by sensitivity analyses like those reported in Chapter 5. Here the researcher investigates how sensitive the assignment to treatment is to hidden bias: is there an unobserved covariate that is correlated both with the likelihood of receiving treatment and with the likelihood of having an increased outcome? There are two types of bias: overt bias, which can be seen in the data at hand, and hidden bias, which cannot be seen because the required information is not available (Rosenbaum, 2002). Hidden bias is unobserved selection. The fact that this bias is unmeasured is crucial, because if it were measured, it could be accounted for and avoided (Kennedy, 2003). Overt and hidden biases that affect causal inference can in principle be controlled for in well-implemented randomized experiments, through the mechanism of randomization (Guo & Fraser, 2010). Although researchers control for selection bias through matching, they can adjust only on the observed or measured covariates; thus, selection bias due to unmeasured covariates remains a problem. Therefore, in most observational studies, what remains unknown is the extent to which matching or other adjustments adequately control for bias and yield trustworthy estimates of treatment effects.

12 The stratification, nearest neighbor, radius, and kernel matching methods all include bootstrapped standard errors on the average treatment effect.
To deal with this problem, Rosenbaum developed sensitivity analysis methods designed to gauge the sensitivity of study findings to hidden bias (Guo & Fraser, 2010). Rosenbaum & Rubin (1983) and Rosenbaum (2002, 2005) recommend that researchers routinely conduct sensitivity analyses in observational studies. The fundamental idea of sensitivity analysis in the propensity score matching context is to manipulate the estimated odds of receiving the treatment to investigate the extent to which the estimated treatment effects may vary. What the researcher hopes to find is that the estimated treatment effects are robust to a plausible range of selection biases. This idea is developed below.13

The participation probability is given by $P_i = P(x_i, u_i) = \Pr(D_i = 1 \mid x_i, u_i) = F(\beta x_i + \gamma u_i)$, where $x_i$ are the observed characteristics, $u_i$ is the unobserved variable, and $\gamma$ is the effect of the unobserved variable on the participation decision. If the study is free of hidden bias, $\gamma$ will be zero and the participation probability will be determined solely by the observed covariates. If there is hidden bias, two schools with the same observed covariates will have different chances of receiving the treatment. If both units have identical observed covariates, which is what the matching procedure implies, then the units differ in their odds of receiving treatment only by a factor that depends on the parameter $\gamma$ and the difference in their unobserved covariates $u$. If there are no differences in the unobserved variables between the units, or if the unobserved variables have no relationship with the participation decision, then the odds ratio will be one, which suggests that there is no hidden bias. These sensitivity analyses evaluate how inferences about the effect of treatment (i.e., high school-level teacher retention) are altered by changing the value of $\gamma$ (the effect of the unobserved variables on the treatment decision) and the differences in the values of the unobserved variables. This is tested using the mhbounds program for Stata (Becker & Caliendo, 2007). The mhbounds program computes Mantel-Haenszel bounds to check the sensitivity of the estimated average treatment effects on the treated. The Mantel-Haenszel test statistic compares the observed number of treated schools with a positive outcome against the number expected if the treatment effect were zero. To use this statistic, it is necessary to make the schools in the treatment and control groups as similar as possible, which is accomplished with the matching.

To review: I calculated the propensity score as described above, and then used that propensity score both as a weight and as a means of matching observations within strata and with each other, as demonstrated in Table 6.3. I then turned to a sensitivity analysis. The mhbounds method is appropriate for checking the sensitivity of nearest neighbor matches and the stratification method; an appropriate sensitivity analysis is not available for every method of matching.14 This test provides two test statistics. The first, $Q_{MH}^{+}$, is the Mantel-Haenszel statistic under the assumption that the treatment effect is overestimated. This would occur if there were positive unobserved selection: those schools most likely to have high teacher retention are also most likely to have high student mobility.

13 Throughout this section, I rely on the work of Guo & Fraser (2010), Becker & Caliendo (2007), and Caliendo & Kopeinig (2005). An application of this type of sensitivity analysis is found in Haviland, Nagin, & Rosenbaum (2007), and is used as a model for this analysis.
The $Q_{MH}^{-}$ statistic adjusts the Mantel-Haenszel statistic downward for negative (unobserved) selection, in which schools with high teacher retention are less likely to have high student mobility. Note that this second case, negative unobserved selection, is the direction in which we would expect unobserved selection to act: schools likely to have high teacher retention are also likely to have low student mobility. For this reason, this statistic is the one of key interest in the sensitivity analyses.

When checking the sensitivity of the estimated average treatment effect on the treated under the nearest neighbor matching strategy, there is a significant negative treatment effect on the treated, which indicates that schools that receive the treatment (i.e., have high teacher retention) are significantly less likely to have high student mobility. Under the assumption of no hidden bias (gamma = 1), the Mantel-Haenszel test statistic reproduces this significant result. However, looking at the bounds under the assumption of negative (unobserved) selection (i.e., $Q_{MH}^{-}$), the result becomes insignificant at gamma = 1.4, which means that the confidence interval for the effect would include zero if an unobserved variable caused the odds of treatment assignment to differ between the treatment and comparison groups by a factor of 1.4. This shows that the results are somewhat sensitive to deviations from the unconfoundedness assumption. The bounds under positive (unobserved) selection show that the effect is significant at gamma = 1, the assumption of no hidden bias, and becomes more significant for increasing values of gamma (see Table 6.4).

When evaluating the sensitivity of the average treatment effect on the treated yielded by the stratification method, there is a similar result. There is a significant negative treatment effect on the treated (see Table 6.5). The bounds under the assumption of negative (unobserved) selection show that the result becomes insignificant at a gamma of 1.2, which implies that the effect would likely disappear if an unobserved variable caused the odds of treatment assignment to differ between the treatment and comparison groups by a factor of 1.2. These results should therefore be interpreted with caution, as this effect is sensitive to deviations from the unconfoundedness assumption. It is important to note, however, that this does not prove that hidden bias exists, but rather that the results appear to be moderately sensitive to possible deviations from the identifying unconfoundedness assumption. Because each matching method yields very similar results (as demonstrated in Table 6.3), this provides evidence regarding the robustness of all the matching methods used here, and specifically for the nearest neighbor and stratification estimates that were tested directly. However, this does not address the weighted methods. To quantify the sensitivity of the inference in the weighted regressions, I return to the Frank (2000) method.

14 Propensity scores are estimated by the pscore command in Stata. Several matching programs are available for Stata, including the psmatch2 suite, as well as a series of average treatment on the treated estimators. In Table 6.3, the estimates are drawn from the average treatment on the treated estimators. However, those do not have sensitivity analysis programs associated with them, whereas psmatch2 does. I used psmatch2 to implement 1:1 matching using the propensity score from pscore, and also used the strata from the pscore program with psmatch2 to generate average treatment effects and then to assess them using the sensitivity analyses described here.
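To make the gamma values concrete: with $F$ logistic and a binary unobserved $u$, as in Becker & Caliendo's (2007) presentation, two matched schools with the same observed covariates have an odds ratio of treatment equal to $\exp\{\gamma(u_i - u_j)\}$, which lies between $1/\Gamma$ and $\Gamma$, where $\Gamma = e^{\gamma}$ is the gamma reported in the sensitivity tables. For a matched pair, this bounds the probability that a particular school is the treated one between $1/(1+\Gamma)$ and $\Gamma/(1+\Gamma)$. The Python sketch below evaluates that range at the critical values reported above; the function name is illustrative, not part of mhbounds.

```python
from typing import Tuple


def pair_assignment_range(gamma: float) -> Tuple[float, float]:
    """Bounds on the probability that a given member of a matched pair is the
    treated unit when hidden bias can change the odds of treatment by at most
    a factor of gamma (Rosenbaum's Gamma; gamma = 1 means no hidden bias)."""
    if gamma < 1.0:
        raise ValueError("gamma must be at least 1")
    return 1.0 / (1.0 + gamma), gamma / (1.0 + gamma)


for gamma in (1.0, 1.2, 1.4):
    low, high = pair_assignment_range(gamma)
    print(f"Gamma = {gamma:.1f}: probability of treatment in [{low:.3f}, {high:.3f}]")
```

At gamma = 1.2 the range is about .455 to .545, and at gamma = 1.4 about .417 to .583, which shows how modest a departure from random assignment within pairs is needed to render the estimates insignificant.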
This method asks: how large would the impact of an unmeasured confound need to be to invalidate the inference regarding high school-level teacher retention and school-level student mobility, when schools are weighted by their propensity to receive treatment? Rather than testing the sensitivity of the assignment to treatment, we return to focusing on the sensitivity of the effect and seek to quantify the robustness of that inference to the impact of an unmeasured confound. Using the school-level outcome and aggregated predictors used in the analysis, I regressed the treatment, high school-level teacher retention, on the predictors to obtain the R² from that relationship, and then regressed the outcome on the predictors, without the treatment. These R² values were then used to calculate the ITCV for each of the three weighting schemes: the estimated treatment on the margin of indifference (EOTM), the treatment on the treated (TOT), and the treatment on the control (TOC). As noted in Chapter 5, when conducting a cluster-level regression for sensitivity analyses, clusters should be weighted by precision, calculated as:

$$\text{Precision}_j = \text{Weight}_j = \frac{1}{\tau + V_j}, \qquad V_j = \frac{\sigma_j^2}{n_j}, \qquad \sigma_j^2 = p_j(1 - p_j)$$

where τ is the between-school variance and n_j is the number of students in school j. For comparison, I also calculated the ITCV using the product of the propensity weight and the precision weight. The ITCV estimates were the same, so only one set of estimates is reported here. The ITCV for the EOTM is .05. This implies that the correlation between the unmeasured confound and high teacher retention would have to be .23 or larger, and the correlation between the unmeasured confound and school-level student mobility would have to be .20 or larger, to invalidate the inference under the EOTM weighting strategy.
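A minimal Python sketch of the two quantities used in this step, under stated assumptions: the precision weight 1/(τ + V_j) with V_j = p_j(1 − p_j)/n_j, taking τ (the between-school variance) as already known, and the Frank (2000) impact of a confound, defined as the product of its correlation with the treatment and its correlation with the outcome. The function names and the school values in the usage are hypothetical; the correlation pairs are those reported in this section for the EOTM, TOT, and TOC regressions, and their products round to the reported ITCVs (.05, .04, .04).

```python
def precision_weight(p_j, n_j, tau):
    """Precision weight 1 / (tau + V_j) for school j.

    V_j = p_j * (1 - p_j) / n_j is the sampling variance of a school-level
    proportion; tau, the between-school variance, is taken as known here.
    """
    sigma2_j = p_j * (1.0 - p_j)
    v_j = sigma2_j / n_j
    return 1.0 / (tau + v_j)


def combined_weight(propensity_weight, p_j, n_j, tau):
    """Product of a school's propensity weight and its precision weight."""
    return propensity_weight * precision_weight(p_j, n_j, tau)


def confound_impact(r_treatment_cv, r_outcome_cv):
    """Frank (2000): impact of a confound = r(treatment, cv) * r(outcome, cv)."""
    return r_treatment_cv * r_outcome_cv


if __name__ == "__main__":
    # Hypothetical school: 30% mobile students, 200 students, tau = 0.01.
    print(precision_weight(0.30, 200, 0.01))
    print(combined_weight(1.5, 0.30, 200, 0.01))
    # Correlation pairs reported for the three weighting schemes.
    for scheme, r_x, r_y in (("EOTM", 0.23, 0.20), ("TOT", 0.20, 0.21), ("TOC", 0.22, 0.19)):
        print(scheme, round(confound_impact(r_x, r_y), 2))
```

The check only confirms that each reported correlation pair is consistent with its ITCV; computing the ITCV threshold itself also requires the estimated effect, its standard error, and the degrees of freedom, which are not reproduced here.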
In the TOT regression, the ITCV was .04: the correlation between the unmeasured confound and the treatment would have to be .20, and the correlation between the unmeasured confound and the outcome would have to be .21, to invalidate the inference under the TOT strategy. Finally, in the TOC regression, the ITCV was again .04, and the correlations would have to be .22 and .19, respectively. These are moderate correlations, especially when compared with the size of other measured confounds (see Chapter 5).

Conclusions from Propensity Score Sensitivity Analyses

Sensitivity analyses can be complicated, particularly when applied over a variety of analytic situations, as here. However, sensitivity analyses should be viewed less as absolute estimates and more as accumulations of evidence. This is the approach taken here: to estimate the treatment effects using a variety of matching algorithms and methods, and then to assess those methods for sensitivity to unmeasured confounds using all available methods. The evidence suggests that the observed average treatment effects are somewhat robust to unobserved covariates, but not overly so. Again, this does not mean that unobserved covariates do or do not exist, but simply that the results here appear to be somewhat robust to deviations from the identifying unconfoundedness assumption.

More importantly, from a substantive perspective, there is reasonable evidence that increasing school-level teacher retention rates can lead to decreases in student mobility. As shown in Chapter 5, decreased student mobility has positive implications for achievement. Therefore, it may be a valid policy initiative to look for programs that help to stabilize the teaching force within a given school, or to incentivize teachers to remain within that school.
Limitations of this Analysis

The key limitation in the use of a propensity score method is the same problem that analysts face in any observational study: the issue of “hidden bias” from potential unobserved variables. A fundamental assumption of propensity score matching is the assumption of unconfoundedness (Rosenbaum & Rubin, 1983), otherwise known as the ignorable treatment assignment assumption. This assumption states that, conditional on the set of covariates, assignment to treatment and control conditions is independent of the potential outcomes of treatment. The propensity score allows for the control of all observed characteristics, but it can satisfy the unconfoundedness assumption only if those observed characteristics include every relevant confound. The question, then, is whether the confound, the variable that affects both assignment to treatment (in this case, high school-level teacher retention) and the outcome (high school-level student mobility), could be an unobserved characteristic. If any of the relevant covariates are unobserved, propensity score matching will produce biased results (Arpino & Mealli, 2008). I attempt to address this criticism through the use of sensitivity analyses, which quantify how large the impact of an unobserved covariate would need to be to invalidate the inference. The substantive question remains: are these data rich enough to merit this type of analysis? These are relatively limited data: rich in sample size, but limited in the number and type of variables. Important factors, such as indicators of school leadership and school culture, are not measured. In this way, although propensity scores represent an improvement in the framework to evaluate effects, they are still limited by the covariates available to estimate the propensity score equation, and therefore still limited by the question of observable factors.
If the unobserved characteristic that could invalidate inferences in a multilevel modeling framework is also one that could affect both assignment to treatment and the outcome, then it cannot be measured and accounted for in the propensity match either. The results will still be biased, because an unobserved characteristic is, by definition, unmeasured and uncontrolled. This can be a rather bleak conclusion. However, even if this is the case, Guo & Fraser (2010) suggest that matching-based models offer improvements over OLS-based methods. Furthermore, the use of multiple methods in tandem with sensitivity analyses produces the best estimates possible from a set of observational data. Finally, it is incumbent upon the education research community to use available skills, methods, and knowledge to investigate a problem, and not to allow the limitations of data to preclude us from providing the evidence we can make available to the research community.

Case Study Schools: An Exploratory Analysis

To further interrogate the differences in outcomes for similar schools using a more descriptive approach, I identified the schools with a low propensity to have high levels of teacher retention, and then identified those schools that had high retention despite their low propensity. There were 106 schools with propensity scores lower than .50, in other words, a propensity of less than .50 to have high retention. Of those, 20 had high teacher retention while the other 86 did not. While the overall analysis provides useful information on the entire sample of schools, I wanted to focus specifically on those schools that were unlikely to have high teacher retention but did in fact have it, to see if there were any interesting or informative differences.
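The case-study selection rule is simple enough to state as code. In this hypothetical Python sketch, the School record and its field names are illustrative, not the dissertation's data structures; treatment is high retention, which Table 6.2 defines as school-level teacher retention above 85%.

```python
from dataclasses import dataclass


@dataclass
class School:
    school_id: str          # hypothetical identifier
    propensity: float       # estimated propensity of high retention (> 85%)
    high_retention: bool    # observed treatment status


def case_study_groups(schools, cutoff=0.50):
    """Return (high, low) retention schools among those with propensity below cutoff."""
    case = [s for s in schools if s.propensity < cutoff]
    high = [s for s in case if s.high_retention]
    low = [s for s in case if not s.high_retention]
    return high, low


def group_shares(n_high, n_low):
    """Shares of high- and low-retention schools within the case study group."""
    total = n_high + n_low
    return n_high / total, n_low / total


if __name__ == "__main__":
    toy = [School("A", 0.20, True), School("B", 0.35, False),
           School("C", 0.49, False), School("D", 0.80, True)]
    high, low = case_study_groups(toy)
    print([s.school_id for s in high], [s.school_id for s in low])
    print(group_shares(20, 86))
```

With the counts reported above, group_shares(20, 86) gives about 0.189 and 0.811, which round to the 19% and 81% shown in Table 6.6. School D is excluded from the toy case study group because its propensity is above the cutoff, even though it has high retention.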
In a basic descriptive analysis, I analyzed all of the schools with a propensity of less than .50 to have high teacher retention as “case study schools.” For this analysis, “high retention case study schools” and “low retention case study schools” refer only to those 106 schools identified as having a propensity of less than .50 for high teacher retention (see Table 6.6). The high retention case study schools are more likely to be town or rural schools. High retention case study schools had, on average, 55% of their student body as minority students, compared with 69% in the low retention (control) case study schools. However, when comparing case study schools overall with non-case study schools, case study schools (regardless of high or low teacher retention) were much more likely to have higher minority student populations: non-case study high schools had an average of 14% of their student body as minority students. Therefore, high retention case study schools had higher rates of minority students than the general population of schools, but lower rates than low retention case study schools. High retention case study schools had the same average proportion of students eligible for free and reduced lunch as low retention case study schools (60%). Again, case study schools overall had much higher average rates of free and reduced lunch eligibility (60% in case study schools, 31% in non-case study schools). High retention case study schools had a higher mean proportion of teachers with professional licenses than low retention case study schools (72% versus 65%), a lower mean proportion of minority teachers (28% versus 34%), and a higher mean proportion of highly qualified teachers (81% versus 75%). None of these differences meets the threshold for statistical significance.15 Turning to the individual characteristics of students in these schools, there were higher than expected proportions of American Indian and Hispanic students in the high retention case study schools.
Between high retention case study schools and low retention case study schools, the proportion of black students in each group was close to the expected value. However, when comparing all case study schools to non-case study schools, the proportion of black students was much higher than expected in case study schools. In other words, case study schools (those with a low propensity to have high teacher retention, regardless of actual teacher retention level) are disproportionately attended by black students.

15 Although they did not meet the statistical significance threshold, the “universe” nature of these data makes simple percentage comparisons informative, as differences are “real” differences and not simply due to sampling error.

The mean math scale scores were higher in the high retention case study schools than in the low retention case study schools, and this difference is statistically significant. High retention case study schools also had higher mean ELA scale scores. Students in high retention case study schools also change schools at a lower rate than those in low retention case study schools. The multilevel models outlined previously were re-run using only these 106 schools, but there was no detectable effect of high school-level teacher retention on student mobility under any of the three weighting strategies.

For comparison, Table 6.6 also includes the characteristics of non-case study schools. Case study schools were much more likely than expected to be city schools (55% of city schools were case study schools, compared with an expected value of 18%). Case study schools were also more likely to be small schools. The mean percentage of minority students was 66% in case study schools, compared with 14% in non-case study schools, and the mean percentage of students eligible for free/reduced lunch was 60% in case study schools, compared with 31% in non-case study schools.
Case study schools had a significantly smaller percentage of teachers with professional licenses (66% in case study schools compared with 87% in non-case study schools), and significantly higher mean percentages of minority teachers and new teachers. They were more likely to be charter schools, and had significantly higher mean rates of student mobility. From the perspective of student characteristics, the students in case study schools were overwhelmingly black: 48% of students in case study schools were black, compared with an expected value of 10%. Case study schools had significantly lower mean mathematics and ELA achievement, and significantly lower mean prior mathematics and ELA achievement.

With the case study schools, then, we see an interesting phenomenon. Within the 106 case study schools, the schools are rather similar in terms of structural characteristics, yet they differ significantly on achievement and on student mobility. However, the case study schools, taken as a whole, are very different from the non-case study schools. Treatment case study schools look more like non-case study schools than control case study schools do, but are still distinctly different in terms of student composition and achievement. That all of the case study schools, whether treatment or control, are more alike than case study schools are to non-case study schools suggests that there is something unique about the school culture of these schools that influences their ability to retain teachers and to perform more highly on achievement tests than anticipated. While this analysis cannot go deeper in measuring and investigating elements such as school culture, professionalism, and charismatic leadership, it is further evidence to suggest the importance of those factors.
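The “expected value” comparisons above treat each subgroup's expected share in the treatment group as the overall treatment share. A short Python sketch of that logic follows, plus a Pearson chi-square for a 2x2 table in case a formal test is wanted; both are illustrations, and the text does not state which test produced its significance statements. At the student level, the expected treatment share is 1,856 / 9,509, or about 0.195, against which the 42% (American Indian) and 35% (Hispanic) treatment shares in Table 6.6 stand out, while black students (18%) are close to it.

```python
def expected_share(treated_total, grand_total):
    """Expected treatment-group share of any subgroup if membership is
    independent of treatment status."""
    return treated_total / grand_total


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = subgroup yes/no, columns = treatment/control."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Student-level totals from Table 6.6: 1,856 treatment and 7,653 control students.
    print(round(expected_share(1856, 1856 + 7653), 3))
    # Hypothetical counts, for illustration only.
    print(round(pearson_chi_square_2x2(10, 20, 30, 40), 4))
```

Each cell's expected count is its row total times its column total divided by the grand total, which is the same independence assumption that defines the expected share.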
Table 6.1: Testing for Balance in Propensity Score Estimation

                                          Block 1              Block 2              Block 3
                                          Control  Treatment   Control  Treatment   Control  Treatment
N                                         49       7           27       18          60       115
Avg. propensity score                     0.14     0.13        0.36     0.4         0.64     0.68
Locale (suburb = reference)
  City                                    0.73     0.43        0.44     0.44        0.1      0.19
  Town                                    0.02     0           0        0.11        0.05     0.1
  Rural                                   0.06     0.29        0.22     0.17        0.71     0.62
Percent minority                          90.8     82.97       61.69    54.41       20.17    21.79
Percent free/reduced lunch                66.26    74.33       57.29    52.72       51.93    50.17
School size (300-999 = reference)
  Less than 300 students                  0.29     0.43        0.15     0.28        0.24     0.52
  Greater than 1000 students              0.26     0           0.3      0.33        0.14     0.19
Average math pretest                      794.79   793.79      799.76   800.46      804.77   808.02
Average ELA pretest                       800.17   797.04      802.82   804.3       804.35   809.29
Charter                                   0.41     0.29        0.15     0.17        0        0
Magnet                                    0.06     0.14        0.19     0.11        0.05     0.05
Proportion female students in school      56.1     54.41       56.17    50.58       51.12    53.14
Proportion AmerInd/PI                     0.18     0           0.71     0.87        1.87     0.62
Proportion Asian                          0.59     3.54        1.25     1.66        0.88     0.99
Proportion black                          81.42    74.28       51.52    38.31       10.02    15.86
Proportion Hispanic                       7.41     5.04        7.9      10.93       5.12     2.59
Proportion multiethnic                    0.28     0.18        0.98     0.37        0.55     0.13
Proportion students in special programs   5.75     15.18       0.37     5.92        1.16     0.21

No statistically significant difference in means detected using two-sample t-tests.
Tests for balance performed as part of Stata's pscore analytic package.

Table 6.1 (cont'd)

                                          Block 4              Block 5
                                          Control  Treatment   Control  Treatment
N                                         21       21          39       94
Avg. propensity score                     0.56     0.58        0.83     0.86
Locale (suburb = reference)
  City                                    0.03     0.05        0.07     0.08
  Town                                    0.05     0.07        0.27     0.2
  Rural                                   0.87     0.72        0.32     0.34
Percent minority                          8.77     12.34       10.78    11.04
Percent free/reduced lunch                38.08    38.17       26.21    23.67
School size (300-999 = reference)
  Less than 300 students                  0.31     0.13        0.1      0.05
  Greater than 1000 students              0.03     0.12        0.34     0.49
Average math pretest                      808.26   809.91      816.97   816.34
Average ELA pretest                       809.45   810.19      816.79   816.5
Charter                                   0        0           0.02     0.01
Magnet                                    0.03     0.03        0.12     0.16
Proportion female students in school      47.96    49.86       49.66    49.87
Proportion AmerInd/PI                     2.29     1.17        0.48     1.93
Proportion Asian                          0.55     0.7         2.35     1.9
Proportion black                          2.05     6.82        4.55     3.99
Proportion Hispanic                       2.49     2.76        2.72     2.45
Proportion multiethnic                    0.21     0.3         0.6      0.78
Proportion students in special programs   3.26     1.28        3.96     1.24

No statistically significant difference in means detected using two-sample t-tests.
Tests for balance performed as part of Stata's pscore analytic package.

Table 6.2: Logistic Regression for Estimating the Treatment Propensity Score (Treatment = School-Level Teacher Retention > 85%)

Independent variable                      Estimate   Std. error   p value
Percent minority students                 -0.06      0.03         0.03
Percent free/reduced lunch students       -0.01      0.01         0.179
Locale (suburb = reference)
  City                                    -0.33      0.41         0.417
  Town                                    -0.12      0.42         0.769
  Rural                                   -0.43      0.34         0.212
School size (300-999 students = reference)
  Less than 300 students                  -0.28      0.32         0.389
  Greater than 1000 students              0.72       0.31         0.021
Charter                                   -0.69      0.61         0.258
Magnet                                    0.56       0.37         0.128
Proportion female students                0.01       0.01         0.994
Proportion Asian students                 0.09       0.03         0.093
Proportion black students                 0.03       0.03         0.186
Proportion Hispanic students              0.04       0.03         0.094
Proportion multiethnic students           0.06       0.04         0.12
Proportion special program students       0.001      0.01         0.927
Average math pretest score                0.001      0.03         0.986
Average ELA pretest score                 0.05       0.03         0.093

Table 6.3: Estimated Effect of High School-Level Teacher Retention on Average School-Level Student Mobility

Model                                                       Coefficient   Odds ratio   Std. error   t-ratio   p value
Weighted by propensity (EOTM)                               -.78          0.46         .14          -2.48     .013
Weighted by propensity (treatment effect for the treated)   -.72          0.49         .16          -2.18     .029
Weighted by propensity (treatment effect for the control)   -.86          .42          .14          -2.60     .009
Unweighted, with covariates                                 -.57          .57          .16          -2.02     .044
Unweighted, bivariate                                       -1.62         .19          .04          -8.10     .000
Stratification method (ATT for each stratum)**              -0.083                     .048         -1.732    .042
Nearest neighbor*                                           -.091                      .046         -1.98     .048
Kernel matching*                                            -0.079                     .048         -1.63     .103
Radius matching*                                            -.09                       .042         -2.22     .026

*Estimated with psmatch2
**Estimated with the atts program for stratification matching

Table 6.4: Sensitivity Analyses using Mantel-Haenszel Bounds for Effect on Student Mobility: Nearest Neighbor Matching

Gamma   Q_mh+     Q_mh-       p_mh+      p_mh-
1       2.63323   2.63323     0.004229   0.004229
1.05    2.83957   2.43571     0.002259   0.007431
1.1     3.03302   2.244       0.001211   0.012416
1.15    3.21848   2.06125     0.000644   0.019639
1.2     3.39666   1.88665     0.000341   0.029604
1.25    3.56817   1.71947     0.00018    0.042764
1.3     3.73352   1.55911     0.000094   0.059485
1.35    3.89321   1.40501     0.000049   0.080009
1.4     4.04765   1.25669     0.000026   0.104433
1.45    4.19722   1.11372     0.000014   0.1327
1.5     4.34225   0.975715    7.10E-06   0.164603
1.55    4.48304   0.842333    3.70E-06   0.199801
1.6     4.61988   0.713265    1.90E-06   0.237841
1.65    4.75301   0.58823     1.00E-06   0.278189
1.7     4.88264   0.466975    5.20E-07   0.320259
1.75    5.009     0.349271    2.70E-07   0.363443
1.8     5.13226   0.234909    1.40E-07   0.40714
1.85    5.25261   0.123695    7.50E-08   0.450778
1.9     5.37018   0.015456    3.90E-08   0.493834
1.95    5.48515   -0.089971   2.10E-08   0.535845
2       5.59762   -0.053909   1.10E-08   0.521496

Gamma: odds of differential assignment due to unobserved factors
Q_mh+: Mantel-Haenszel statistic (assumption: overestimation of treatment effect)
Q_mh-: Mantel-Haenszel statistic (assumption: underestimation of treatment effect)
p_mh+: significance level (assumption: overestimation of treatment effect)
p_mh-: significance level (assumption: underestimation of treatment effect)

Table 6.5: Sensitivity Analyses using Mantel-Haenszel Bounds for Effect on Student Mobility: Stratification Method

Gamma   Q_mh+     Q_mh-       p_mh+      p_mh-
1       1.87972   1.87972     0.030073   0.030073
1.05    2.07905   1.70328     0.018807   0.044258
1.1     2.25903   1.52486     0.011941   0.063647
1.15    2.43158   1.35476     0.007517   0.087748
1.2     2.59735   1.1922      0.004697   0.116592
1.25    2.75692   1.03651     0.002917   0.149981
1.3     2.91079   0.887135    0.001803   0.187503
1.35    3.05939   0.743547    0.001109   0.228575
1.4     3.20313   0.605299    0.00068    0.27249
1.45    3.34234   0.471993    0.000415   0.318466
1.5     3.47736   0.343272    0.000253   0.365697
1.55    3.60845   0.218815    0.000154   0.413397
1.6     3.73587   0.098336    0.000094   0.460833
1.65    3.85986   -0.018424   0.000057   0.50735
1.7     3.98062   -0.131037   0.000034   0.552127
1.75    4.09833   -0.020381   0.000021   0.50813
1.8     4.21319   0.087155    0.000013   0.465274
1.85    4.32534   0.191752    7.60E-06   0.423968
1.9     4.43494   0.293575    4.60E-06   0.384541
1.95    4.5421    0.392775    2.80E-06   0.347243
2       4.64698   0.489493    1.70E-06   0.312246

Gamma: odds of differential assignment due to unobserved factors
Q_mh+: Mantel-Haenszel statistic (assumption: overestimation of treatment effect)
Q_mh-: Mantel-Haenszel statistic (assumption: underestimation of treatment effect)
p_mh+: significance level (assumption: overestimation of treatment effect)
p_mh-: significance level (assumption: underestimation of treatment effect)

Table 6.6: Comparison of School and Student Characteristics Between High- and Low-Retention Case Study Schools, and Between Case Study and Non-Case Study Schools

Case Study Schools: Characteristics of Treatment and Control Schools

                                              Treatment     Control
                                              N=20 (19%)    N=86 (81%)    N
Locale
  City                                        15%           85%           53
  Suburb                                      10%           90%           20
  Town                                        33%           67%           3
  Rural                                       30%           70%           30
School size
  Less than 300 students                      24%           76%           34
  300-999 students                            16%           84%           50
  Greater than 1000 students                  18%           82%           22
AYP status
  Fail                                        18%           82%           61
  Pass                                        20%           80%           45
Percent minority                              55%           69%
Percent free/reduced lunch                    60%           60%
Percent teachers with professional licenses   72%           65%
Percent new teachers                          19%           23%
Percent minority teachers                     28%           34%
Percent highly qualified teachers             81%           75%
Charter                                       18%           82%           27
Magnet                                        30%           70%           10
Student mobility                              36%           35%
Prior student mobility                        30%           29%

Table 6.6 (cont'd): Case Study Schools: Characteristics of Treatment and Control Schools

Student-level characteristics                 Treatment       Control
                                              N=1,856 (19%)   N=7,653 (81%)   N
Gender
  Male                                        20%             80%
  Female                                      19%             81%
Race
  American Indian                             42%             58%             57
  Asian                                       19%             81%             79
  Black                                       18%             82%             6,755
  Pacific Islander                            18%             82%             11
  White                                       19%             81%             2,050
  Hispanic                                    35%             65%             513
  Multi-ethnic                                2%              98%             44
Pre-test math scale score                     810.24          798.07
Pre-test ELA scale score                      814.01          803.01
11th grade math scale score                   1087.89         1071.58
11th grade ELA scale score                    1095.18         1080.72
Free/reduced lunch eligible                   16%             84%             5,230
Student mobility                              15%             85%             2,924

CHAPTER 7: THE CORRELATES AND IMPACTS OF SCHOOL-LEVEL TEACHER UNDERSUPPLY

The third paper uses the demand formula specification and resulting calculations obtained in Chapter 4, as well as the school-specific teacher retention rates obtained in Chapter 5, to address the third core question of the dissertation: what are the predictors of within-school undersupply, and how does undersupply affect achievement? This chapter estimates the predictors of undersupply using school characteristics, and also estimates the impact of subject-specific undersupply on student achievement test scores in mathematics, English language arts, and science. Importantly, this analysis also introduces the school-specific teacher retention rate as a key predictor. The modeling strategy employed in this chapter mirrors the one used to estimate the effect of school-level teacher retention on student achievement and student mobility. For the sake of parsimony, I will not fully restate the relevant literature outlined in the other two papers.

Background to the Problem: Undersupply as an Organizational Characteristic of Schools

The advent of more rigorous graduation requirements, the increased instructional demands of NCLB and the upcoming demands related to Race to the Top funding, and labor market pressures away from teaching have led to a situation of potential undersupply of qualified teachers. Many researchers have advanced a teacher shortage thesis that identifies increases in student enrollment and teacher retirement as driving factors in teacher shortages. However, following Ingersoll (2001, 2003) and Ingersoll & Perda (2009), I examine teacher shortages from an organizational perspective.
Undersupply can be considered an organizational feature of schools, one that can interact with school structural and compositional factors. While teacher shortages may or may not occur at an aggregate state or national level, individual schools may struggle to adequately staff their positions, particularly in the face of rigorous instructional demands. This undersupply can be exacerbated by teacher turnover. Both of these school-specific aspects of supply, undersupply and teacher turnover, can be considered alongside other factors that shape the culture of the school and, more importantly, how students in that school may perform. Schools that are high in undersupply may have low levels of relational trust (Bryk & Schneider, 2002), or may have high numbers of low-income or minority students. In a school's supply of teachers, turnover and undersupply are two related but distinct features. Turnover relates to the frequency with which teachers enter and exit the school. Organizational sociology and the sociology of work have extensively studied the impact of turnover on organizational culture and, in particular, on the effectiveness of the organization in achieving its goals (Hom & Griffeth, 1995; Kalleberg & Mastekaasa, 1998; Mueller & Price, 1990). Undersupply is related to turnover (a high rate of turnover can contribute to being undersupplied) but distinct, in that a school could have a high rate of turnover but not be undersupplied. An undersupply of teachers relates to the school's ability to staff all its positions adequately, and can therefore be considered a resource question. How does a lack of resources, as quantified by qualified personnel, relate to a school's culture, its hospitableness as a work environment, and its ability to teach and train children?
While teacher turnover and other features of teacher supply and demand have been heavily studied (see Darling-Hammond, 2000; Ingersoll, 1995, 2000, 2001), the relationship of a school-specific undersupply of teachers, coupled with teacher turnover in that school, to student achievement and mobility has not been studied. This analysis also moves teacher undersupply away from a general overall estimate of teacher supply toward a school-specific characteristic of individual schools, focusing on undersupply as an organizational characteristic rather than a meta-level economic condition. Two relevant questions arise when considering school-level teacher undersupply. The first is: are undersupply rates distributed unevenly over school characteristics? Are certain types of schools more likely to be undersupplied than others? Particularly where a state may appear to have an adequate number of teachers to meet instructional demands, are there schools that still struggle to recruit and retain staff so that they can provide appropriate instruction to all students? This has implications for the equitable distribution of teachers, as well as for closing achievement gaps, both key aspects of NCLB and the new Race to the Top funding. The second question is: once each school's level of undersupply is ascertained, what is the relationship between that undersupply, as a school characteristic, and student achievement outcomes? Do students in undersupplied schools perform more poorly than students in adequately supplied schools? One would assume so; it is difficult to imagine that the learning experience and achievement of students would be better in a school where there are not enough mathematics teachers, and class sizes are extremely high or students are forced to take study halls or independent studies in order to obtain instruction.
However, this needs to be investigated empirically. If there is evidence to suggest that undersupply is negatively related to student achievement, then it becomes incumbent upon districts and the state itself to ensure that each school has, at minimum, an adequate number of teachers to meet instructional demands.

Analytic Methods

Part 1: Model predicting undersupply

Utilizing both the tested demand formula and the refined supply calculations from the second paper, I estimate potential undersupply for each school in the state of Michigan. The first stage in the analysis examines the predictors of undersupply using a school-level ordinary least squares (OLS) regression analysis and a logistic regression analysis. Tables 7.1, 7.3, and 7.5 present the results of OLS regressions predicting undersupply in mathematics, English language arts, and science, respectively, with undersupply as a continuous outcome; Tables 7.2, 7.4, and 7.6 present the same analysis with a dichotomous outcome, where only those schools with greater than one FTE of undersupply are identified as “undersupplied.”16

16 Social studies is not represented here, as Michigan's social studies achievement test is one of its least robust, and as social studies is not a key focus of many federal and state initiatives.

Because undersupply is a continuous measure centered around zero, with positive numbers indicating increasing amounts of undersupply, using it as a continuous outcome represents the relationship between a decreasing supply of teachers (relative to demand) and the various predictors. A school may have an undersupply of -2, which means it is actually oversupplied, given the assumptions of the formula. This analysis estimates the impact of other school-level characteristics on that undersupply number, so even if a school is not “undersupplied” in terms of the cut point of more than one FTE, it still may have lower supply.
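The dichotomous outcome used in Tables 7.2, 7.4, and 7.6 is a threshold on the continuous undersupply measure. A hypothetical Python helper illustrating that rule, assuming undersupply is recorded in FTEs with negative values meaning oversupply:

```python
def significantly_undersupplied(undersupply_fte, threshold=1.0):
    """Return 1 if undersupply exceeds the threshold (strictly more than one FTE
    by default), else 0. Negative values indicate oversupply under the demand
    formula's assumptions and are coded 0."""
    return int(undersupply_fte > threshold)


if __name__ == "__main__":
    # Hypothetical school-level undersupply values, in FTEs.
    values = [-2.0, 0.0, 1.0, 1.5, 3.2]
    flags = [significantly_undersupplied(v) for v in values]
    print(flags)
    print(sum(flags) / len(flags))
```

Note that a school with exactly one FTE of undersupply is coded 0, since the definition requires greater than one FTE; the continuous models, by contrast, use every value, including the negative ones.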
The adequate supply of teachers across schools is strongly related to the equitable distribution of teachers. Are schools that serve low-income or minority students more likely to be undersupplied? Are schools in certain locales more likely to struggle to adequately staff their courses to meet the Michigan Merit Curriculum (MMC)? While this analysis cannot estimate teacher quality, an important precondition for quality is ensuring that schools have an adequate quantity of certified teachers to provide instruction in the MMC areas without excessively large class sizes.

Findings: Predictors of Subject-Specific Undersupply

The goal is to investigate the school-level characteristics associated with subject-specific undersupply, beginning with mathematics. Table 7.1 presents the results of the OLS regression predicting mathematics teacher undersupply. Small schools have significantly lower levels, and large schools significantly higher levels, of mathematics teacher undersupply than schools with 300-999 students. Rural schools have lower rates of undersupply than their suburban counterparts. Schools with a higher average proportion of minority teachers have higher rates of mathematics teacher undersupply, with each one-unit increase in the proportion of minority teachers associated with a .02-unit increase in mathematics teacher undersupply. Interestingly, schools with higher proportions of female students appear to have higher rates of mathematics undersupply, with each one-unit increase in the proportion of students who are female associated with a .02 increase in mathematics teacher undersupply (significant at the .001 level). Also interestingly, schools with higher proportions of Asian, black, and Hispanic students appear to have lower rates of mathematics teacher undersupply, relative to the proportion of white students in a school.
Using the dichotomous outcome variable allows for the estimation of the likelihood of being “significantly” undersupplied in mathematics, where significant is defined as an undersupply of greater than one FTE (see Table 7.2). Small schools are much less likely to be significantly undersupplied in mathematics, with an odds ratio of .10, which means that the odds of small schools being significantly undersupplied in mathematics are .10 times the odds for schools with 300-999 students. Large schools have four times the odds of being significantly undersupplied in mathematics as schools with 300-999 students. This suggests that adequately staffing a school with mathematics teachers is highly dependent on school size. Schools with higher rates of student mobility are also more likely to be significantly undersupplied, and again, schools with higher proportions of female students are more likely to be significantly undersupplied in mathematics.

In English language arts, the school size relationship observed in the mathematics undersupply predictions appears to be less salient, and the key predictors are compositional characteristics of the school's instructional staff (see Table 7.3 for the results of the OLS regression). Schools with a higher proportion of minority teachers experience higher rates of undersupply in English language arts, while schools with higher proportions of highly qualified teachers have lower rates of undersupply in ELA. Higher school-level prior ELA achievement is related to lower levels of ELA undersupply, with each one-unit increase in school-level prior ELA achievement related to a .03-unit decrease in ELA teacher undersupply. The relationships between the racial composition of the student body and undersupply observed in the mathematics analyses appear to disappear, with the exception of schools with higher proportions of black students, which have lower levels of ELA teacher undersupply.
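Because an odds ratio scales odds rather than probabilities, it can help to translate it into a change in probability at a stated baseline. The Python sketch below does so; the 20% baseline is purely hypothetical, not an estimate from this analysis. It also shows the coefficient-to-odds-ratio conversion that matches the odds ratio column of Table 6.3 (for example, exp(-.78) is about .46).

```python
import math


def odds_ratio_from_coefficient(b):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(b)


def probability_after_odds_ratio(p_baseline, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    print(round(odds_ratio_from_coefficient(-0.78), 2))
    p_new = probability_after_odds_ratio(0.20, 0.10)   # hypothetical 20% baseline
    print(round(p_new, 4), round(p_new / 0.20, 3))
```

At a 20% baseline, an odds ratio of .10 lowers the probability to about 2.4%, a probability ratio of about .12 rather than exactly .10; the two quantities converge only when the outcome is rare.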
Table 7.4 presents the results of the logistic regression predicting significant ELA teacher undersupply. Again, this model considers only the likelihood of being significantly undersupplied (more than one FTE of undersupply). Each one-unit increase in the proportion of minority teachers is associated with 1.04 times the odds of being significantly undersupplied. Each one-unit increase in school-level prior ELA achievement is associated with .91 times the odds of being significantly undersupplied. Schools with higher proportions of female students, Asian students, and American Indian students are all more likely to be significantly undersupplied.

Turning to science teacher undersupply, schools with higher proportions of students eligible for free and reduced lunch have higher levels of science teacher undersupply (see Table 7.5 for the OLS regression). Large schools have lower rates of science teacher undersupply. Schools with higher student mobility have higher science teacher undersupply, and schools with larger Asian and black student populations have lower levels of science teacher undersupply. Table 7.6 looks exclusively at significant science teacher undersupply. Each one-unit increase in the proportion of free and reduced lunch eligible students is associated with 1.05 times the odds of being significantly undersupplied in science. Each one-unit increase in student mobility is associated with 1.02 times the odds.

This analysis identifies several common themes regarding the predictors of teacher undersupply. The first is that schools with higher proportions of minority students, particularly black students, are less likely to be undersupplied in all three subject areas. Although counterintuitive, this finding suggests that Michigan does not have an inequitable distribution of teacher supply to schools that serve minority students.
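The odds ratios for continuous predictors describe a one-unit change. Logistic regression is linear in the log-odds, so a per-unit odds ratio compounds multiplicatively over larger differences. The sketch below illustrates this with the odds ratios reported above. The 10-unit contrast is hypothetical: whether 10 units is a meaningful difference depends on how each predictor is scaled, which this section does not state.

```python
import math


def odds_ratio_for_difference(or_per_unit: float, units: float) -> float:
    """Scale a per-unit odds ratio to a difference of `units` on the predictor.

    Log-odds add across units, so the odds ratio compounds: OR_k = OR_1 ** k.
    """
    return math.exp(math.log(or_per_unit) * units)


# Per-unit odds ratios from the text; the 10-unit contrast is HYPOTHETICAL.
for label, or_1 in [("minority teachers (ELA)", 1.04),
                    ("prior ELA achievement", 0.91),
                    ("free/reduced lunch (science)", 1.05)]:
    or_10 = odds_ratio_for_difference(or_1, 10)
    print(f"{label}: OR per unit = {or_1}, OR per 10 units = {or_10:.2f}")
```

Small per-unit odds ratios such as 1.04 can therefore correspond to substantial differences between schools that differ widely on a predictor.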
The second theme is the strong relationship between school-level student mobility and teacher undersupply: in all three subjects, higher student mobility was linked to greater teacher undersupply. Third, school-level teacher retention was not a significant predictor of subject-specific undersupply.

The Relationship Between Teacher Undersupply and Student Achievement

This study uses a multilevel modeling approach to estimate the contextual effects of the key predictor of interest: school-level teacher undersupply in mathematics, English language arts, and science. Fixed effects models can control for school factors without making parametric assumptions. I nonetheless chose to express these models in a multilevel framework in order to estimate effects simultaneously at both levels. The models relate student achievement outcomes to school-level teacher undersupply, the school-level teacher retention rate (from Chapter 5), and other school contextual factors. To estimate these relationships, I controlled for student-level factors such as prior ability, gender, race, and free or reduced lunch eligibility. The models are random intercept models, which model each school mean as a function of the key predictor of interest (school-level teacher undersupply) and other school covariates. No random slopes are estimated.

It is important to remember that these models consist of students nested within schools. Teacher data are available and are used to calculate school-level teacher undersupply and retention, as well as other characteristics of the instructional workforce. However, teachers are not currently linked to students in the Michigan data. It is therefore not possible to estimate the impact of a given teacher’s retention or mobility on a given student’s achievement outcomes.
Moreover, this study takes an organizational approach. It hypothesizes that the composition of the instructional workforce and the organizational culture of a school affect student achievement outcomes. Modeling individual outcomes as a function of school-level predictors is therefore appropriate.

Data Source

The data source for this study is the same as that used throughout the dissertation (see Chapter 2: Data and Methods for more information). This study uses rich longitudinal administrative data from the state of Michigan. The data cover all teachers, students, and schools in the state and were collected over a period of four years. The sample of high schools is the same as that defined in Chapter 5.

Measures

Main independent variable: The key predictor of interest is the school-level teacher undersupply rate in mathematics, English language arts, and science. Teacher undersupply in each subject is calculated in three steps: estimate demand with the refined demand formula outlined in Chapter 4, calculate supply from the 2008 Registry of Educational Personnel, and subtract supply from demand (Chapters 1 and 4 provide greater detail on this method). This yields a continuous measure of undersupply, with positive numbers indicating potential undersupply.2

This analysis also uses as a key predictor the main independent variable from Chapter 5, the school-level teacher retention rate. Teacher churn is measured by the teacher retention rate: the proportion of teachers in a given school who remain in that school from one year to the next.3 The teacher retention rate is calculated over four school years (2004-2005, 2005-2006, 2006-2007, and 2007-2008), which yield three year-to-year retention time points. These rates are averaged to generate an average retention rate for each high school in the sample. Retention is based on a teacher remaining in the same school from one year to the next.
If a teacher is in multiple schools, they are counted as either a stayer or a mover from all of the schools in which they teach. This calculation is at the person level.4

2 For the analyses predicting undersupply, this variable was used in both continuous and dichotomous form. In the multilevel models, it is used in its continuous form only.
3 This method is used rather than calculating the number of “new” teachers in a school in a given year because of the difficulty of defining the denominator for that calculation.
4 Like Ingersoll (2001), I analyze all turnovers or departures and do not distinguish between teacher attrition (from the profession) and teacher mobility (between schools). If a teacher is in the same school from one year to the next, they are retained; if they are not, they have “turned over.” This focuses on the organizational aspect of teacher retention, as the consequences to the organization are the same regardless of whether a teacher leaves the field entirely or simply moves to another school (Ingersoll, 2001).

Dependent variables: The outcome variable for the student achievement models is the scale score on the mathematics, English language arts, and science achievement tests from the 2009 Michigan Merit Examination.

School- and student-level covariates: The multilevel models are estimated with the same set of covariates described in Chapter 2 and Chapter 5. They are listed briefly here and discussed in greater detail in those chapters. School-level covariates include the proportion of teachers with professional licenses, the proportion of minority teachers, the proportion of highly qualified teachers, the percent minority students, the percent of students eligible for free and reduced lunch, locale, school size, the mean achievement pretest in each subject, and the student mobility rate.
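The two school-level predictors described under "Main independent variable" can be sketched in Python as follows. This is an illustration under stated assumptions, not the dissertation's code.

- `demand_fte` is a hypothetical, simplified demand formula (sections needed divided by the sections one FTE can teach). It stands in for the refined Chapter 4 formula, which is not reproduced in this section.
- The one-FTE cutoff follows the definition of significant undersupply used in the logistic models.
- The retention function reflects one assumed reading of the multiple-school rule: each teacher-school pair is judged a stayer or mover in that school.
- All teacher IDs, school IDs, and enrollment figures are made up.

```python
from collections import defaultdict


def demand_fte(enrollment, courses_per_student, class_size, courses_per_fte):
    """HYPOTHETICAL demand in FTE: sections needed / sections one FTE can teach.

    Not the refined Chapter 4 formula; used only to illustrate demand - supply.
    """
    sections_needed = enrollment * courses_per_student / class_size
    return sections_needed / courses_per_fte


def undersupply(demand, supply_fte):
    """Demand minus supply; positive values indicate potential undersupply."""
    return demand - supply_fte


def is_significantly_undersupplied(gap, threshold_fte=1.0):
    """Dichotomous outcome for the logistic models: more than one FTE short."""
    return gap > threshold_fte


def average_retention_rate(assignments_by_year):
    """Average school-level retention rate across consecutive year pairs.

    assignments_by_year maps a sortable school-year label to a set of
    (teacher_id, school_id) pairs. A teacher assigned to several schools
    appears once per school and is judged a stayer or mover in each one
    (an ASSUMED reading of the multiple-school rule).
    """
    years = sorted(assignments_by_year)
    rates = defaultdict(list)
    for this_year, next_year in zip(years, years[1:]):
        current = assignments_by_year[this_year]
        following = assignments_by_year[next_year]
        staff = defaultdict(int)
        stayers = defaultdict(int)
        for teacher, school in current:
            staff[school] += 1
            if (teacher, school) in following:  # same teacher, same school next year
                stayers[school] += 1
        for school, n_teachers in staff.items():
            rates[school].append(stayers[school] / n_teachers)
    # Average the yearly rates (three time points for four school years).
    return {school: sum(r) / len(r) for school, r in rates.items()}


# Usage with made-up numbers.
gap = undersupply(demand_fte(1200, 1, 25, 5), supply_fte=8)
print(round(gap, 2), is_significantly_undersupplied(gap))  # 1.6 True

assignments = {
    "2004-05": {("t1", "A"), ("t2", "A"), ("t3", "B")},
    "2005-06": {("t1", "A"), ("t3", "A"), ("t3", "B")},
    "2006-07": {("t1", "A"), ("t3", "B")},
    "2007-08": {("t1", "A"), ("t3", "B")},
}
rates = average_retention_rate(assignments)
print(rates)
```

Measuring retention as a rate, rather than counting new hires, sidesteps the denominator problem noted in footnote 3. The denominator is simply the set of teachers present in the base year.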
Student-level covariates include student mobility, achievement pretest in a given subject, free/reduced lunch eligibility, racial/ethnic identification, gender, and special program eligibility.

ANALYTIC APPROACH

This study posits school-level teacher undersupply as a critical factor in explaining variation in student achievement in mathematics, English language arts, and science. The first step in the analysis is therefore to calculate subject-specific teacher undersupply for each school. Simple descriptive statistics were then calculated for schools by subject-specific undersupply, and for all student- and school-level variables used in the multilevel models. Tables 3.10, 3.11, and 3.12 (in the descriptive analysis chapter) present the mean undersupply rates by school characteristics in each subject; these details are not repeated here.

The Multilevel Models to Be Estimated

To test the effects of teacher undersupply on student achievement outcomes, I estimate multilevel models (i.e., models with random effects) of student achievement in mathematics, English language arts, and science, with high school students nested within schools. These models are the same as those used in Chapter 5 and are not described again at length here.
The general model specification for the HLM models is as follows:

Level 1 model:

$$Y_{ij} = \beta_{0j} + \beta_{1j}(\text{student mobility})_{ij} + \boldsymbol{\beta}_{j}\mathbf{Z}'_{ij} + r_{ij}$$

Level 2 model:

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{undersupply})_{j} + \gamma_{02}(\text{teacher retention})_{j} + \boldsymbol{\gamma}\mathbf{Q}'_{j} + u_{0j}$$

where
$Y_{ij}$ = outcome (mathematics, ELA, or science scale score) for student $i$ in school $j$
$\beta_{0j}$ = adjusted mean for school $j$ (controlling for the level-1 covariates), modeled at level 2 as a function of the grand mean, the school's teacher undersupply rate, its teacher retention rate, and the vector of school-level covariates
$\beta_{1j}$ = coefficient for student mobility
$\boldsymbol{\beta}_{j}$ = vector of coefficients for the student covariates
$\mathbf{Z}'_{ij}$ = vector of student covariates
$\gamma_{00}$ = grand mean (intercept)
$\gamma_{01}$ = effect of subject-specific undersupply on $\beta_{0j}$ (each school mean)
$\gamma_{02}$ = effect of the school-level teacher retention rate (from Chapter 5) on $\beta_{0j}$ (each school mean)
$\boldsymbol{\gamma}$ = vector of coefficients for the school-level covariates
$\mathbf{Q}'_{j}$ = vector of school covariates
$u_{0j}$ = residual error of $\beta_{0j}$, distributed iid $N(0, \tau_{00})$
$r_{ij}$ = level-1 (student) error term, distributed iid $N(0, \sigma^2)$

One key challenge to this analysis is that the causal path could run in the opposite direction: student achievement could predict teacher undersupply. To partially address that criticism, the analysis draws on the longitudinal nature of the data. The models include each student's prior academic achievement from the 8th grade pretest, as well as the prior cohort-level mean mathematics achievement from 8th grade, to control for both prior individual and group achievement. Teacher undersupply is calculated from 2008 data, while the student outcome is from the 2009 testing year. The school-level undersupply predictor is therefore temporally prior to the student outcome.

A baseline for the HLM models is established by estimating an unconditional random effects ANOVA (output not reported). This allows for the calculation of the intraclass correlation, the proportion of the total variance that lies between schools.
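The intraclass correlation from the unconditional model is $\rho = \tau_{00} / (\tau_{00} + \sigma^2)$. HLM software typically estimates $\tau_{00}$ and $\sigma^2$ by (restricted) maximum likelihood. The sketch below instead uses a simpler substitute: the method-of-moments estimator from a balanced one-way random-effects ANOVA. It is appropriate only when every school contributes the same number of students, and the scores are toy values.

```python
from statistics import mean


def icc_oneway(groups):
    """Method-of-moments ICC(1) from a balanced one-way random-effects ANOVA.

    groups: one list of student scores per school, all of equal length k.
    Returns (tau00_hat, sigma2_hat, icc).
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("sketch assumes a balanced design with at least 2 students per school")
    n_schools = len(groups)
    grand_mean = mean(x for g in groups for x in g)
    school_means = [mean(g) for g in groups]
    # Between-school and within-school mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in school_means) / (n_schools - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, school_means) for x in g) / (n_schools * (k - 1))
    # E[MS_between] = sigma^2 + k * tau00, so solve for tau00 (truncated at zero).
    tau00 = max((ms_between - ms_within) / k, 0.0)
    sigma2 = ms_within
    return tau00, sigma2, tau00 / (tau00 + sigma2)


# Toy data: three schools with two students each.
tau00, sigma2, icc = icc_oneway([[1, 3], [5, 7], [9, 11]])
print(tau00, sigma2, round(icc, 3))  # 15.0 2.0 0.882
```

With unbalanced school sizes, as in real administrative data, a likelihood-based estimate from the fitted random-intercept model is the more appropriate route.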
Following the estimation of this baseline model, a series of multilevel models was estimated for mathematics, English language arts, and science. Four multilevel models are presented in Table 7.7 (mathematics), Table 7.8 (English language arts), and Table 7.9 (science). The first (Model 1) estimates the bivariate relationship between subject-specific school-level teacher undersupply and student achievement in that subject. The second (Model 2) adds free lunch eligibility, a predictor shown in the preliminary analyses to be highly correlated with student achievement. It also adds a pretest measure of achievement in the appropriate subject and the student-level mobility indicator, in order to account for student prior ability. The third (Model 3) adds the school-level teacher retention rate. The final model (Model 4) includes all predictors at level 1 and level 2.

Sensitivity Analyses

Research with observational data raises a concern: an unobserved characteristic could affect the outcome and invalidate the inferences drawn from the study. State administrative data, such as those used in this study, are rich in observations but often do not include a large number of variables. Sensitivity analyses are therefore conducted to characterize the robustness of these inferences to the potential impact of unobserved confounding variables (Frank, 2000). I also used this sensitivity analysis strategy in Chapter 5.

RESULTS5

5 The descriptive statistics are described in Chapter 3 and are not repeated here. Note, however, that when this is submitted as an individual journal article, the descriptive information would be included here.

Mathematics Achievement Outcomes6
To assess the relationship between student mathematics achievement and school-level mathematics teacher undersupply, I estimate the series of multilevel models outlined previously (see Table 7.7). School-level mathematics teacher undersupply is negatively and significantly associated with school mean mathematics achievement scores. In the bivariate model (Model 1), each one-unit increase in mathematics teacher undersupply is associated with a 1.09 scale score point decrease in school mean mathematics achievement (p ≤ .05). The proportion of variance explained in moving from the unconditional model to the bivariate model is 18%.

This effect of teacher undersupply remains in Model 2, which adds student mobility, student free and reduced lunch eligibility, and the measure of student prior ability (the mathematics pretest score). In Model 2, each one-unit increase in mathematics teacher undersupply is associated with a decrease of 0.46 scale score points in student mathematics achievement (p ≤ .022). Prior student mathematics achievement aggregated to the school level is also an important predictor: each one-unit increase in average school-level prior mathematics achievement is associated with a 1.23-point increase in mean scale scores. At the student level, mathematics pretest and free/reduced lunch eligibility are both important predictors. Higher prior mathematics achievement is associated with higher mathematics achievement outcomes, and free/reduced lunch eligibility is associated with significantly lower outcomes.
This model explains 87% of the variance at level 2 and 53% of the variance at level 1. This suggests that teacher undersupply and prior school-level mathematics achievement account for much of the variation in school mean mathematics achievement.

6 Results from the baseline model, a one-way ANOVA with random effects, yield an intraclass correlation of 18% for the mathematics achievement outcome.

Model 3 introduces the other key predictor of interest, the school-level teacher retention rate. Retention is positively and significantly related to mathematics achievement: each one-unit increase in the school-level teacher retention rate is associated with a .21-unit increase in mean mathematics scale scores (p < .001). In the full model (Model 4), mathematics teacher undersupply continues to be associated with lower student mathematics achievement. Each one-unit increase in mathematics teacher undersupply is associated with a .35-point decrease in mathematics scale scores. The relationship between the school-level teacher retention rate and mathematics achievement, however, is reduced to statistical nonsignificance, although it remains in the same direction. This suggests that mathematics teacher undersupply is an important predictor of student mathematics achievement after controlling for the other school structural and compositional characteristics, including the school-level teacher retention rate. The percent of students in a school who are free and reduced lunch eligible is also linked to lower mathematics achievement scores. Prior school-level mathematics achievement continues to be an important predictor of current student mathematics achievement. Turning to school structural characteristics, rural schools have higher mean mathematics achievement, with mean scale scores that are 1.43 points higher (p ≤ .009).
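The variance-explained figures reported for Model 2 are most likely proportional reductions in a variance component relative to the unconditional model. At level 2 the quantity is $(\tau_{00}^{\text{uncond}} - \tau_{00}^{\text{cond}}) / \tau_{00}^{\text{uncond}}$, and at level 1 it is the analogous ratio for $\sigma^2$. The minimal sketch below computes both. The variance components are invented values, chosen only so that the output reproduces the 87% and 53% reported above, because the estimated components are not given in this section.

```python
def proportion_variance_explained(var_unconditional: float, var_conditional: float) -> float:
    """Proportional reduction in a variance component relative to the unconditional model."""
    return (var_unconditional - var_conditional) / var_unconditional


# INVENTED variance components (not the study's estimates).
tau00_uncond, tau00_model2 = 40.0, 5.2      # level-2 (between-school) variance
sigma2_uncond, sigma2_model2 = 180.0, 84.6  # level-1 (within-school) variance

level2 = proportion_variance_explained(tau00_uncond, tau00_model2)
level1 = proportion_variance_explained(sigma2_uncond, sigma2_model2)
print(f"level 2: {level2:.0%}, level 1: {level1:.0%}")  # level 2: 87%, level 1: 53%
```

These reductions can turn negative when adding predictors increases an estimated variance component. For that reason they are usually read descriptively rather than as a formal R-squared.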
In terms of school compositional characteristics, schools with higher proportions of teachers with professional licenses have higher mean mathematics achievement. Each one-unit increase in the proportion of teachers with professional licenses is associated with a .09 increase in mean mathematics scale scores (p ≤ .002). Schools with higher proportions of minority teachers have lower mean mathematics achievement: each one-unit increase in the proportion of minority teachers is associated with a .07 decrease in mean mathematics scale scores (p ≤ .001). At the student level, higher prior mathematics achievement is associated with higher current mathematics achievement (γ20 = .90, p < .001). American Indian (γ40 = -2.17), black (γ60 = -7.11), Hispanic (γ70 = -2.11), and multiracial (γ80 = -2.85) students all had significantly lower mathematics achievement scores than their white counterparts, while Asian students had significantly higher scores (γ50 = 1.52). Finally, students who changed schools between their freshman year and the time of the test had significantly lower mathematics achievement (γ10 = -3.08, p < .001).

English Language Arts Achievement Outcomes

The modeling strategy for the relationship between English language arts teacher undersupply and English language arts achievement is the same as that used for mathematics (see Table 7.8).7 In the bivariate model, higher English language arts teacher undersupply is associated with lower ELA scores: each one-unit increase in undersupply is associated with a 1.31-unit decrease in mean ELA score (p ≤ .003). This relationship remains in Model 2, which adds the English language arts pretest and free/reduced lunch eligibility as key predictors. In Model 2, increased undersupply is associated with lower ELA scores (γ01 = -0.34), while higher prior student-level ELA achievement is associated with higher ELA scores (coefficient = 1.02, p < .001).
Free and reduced lunch eligibility is associated with significantly lower ELA achievement. Students who are free and reduced lunch eligible have ELA scale scores that are, on average, 4.41 points lower than those of their non-eligible counterparts (p < .001). When school-level teacher retention is added as a predictor in Model 3, each one-unit increase in the school-level teacher retention rate is associated with a 0.14-unit increase in mean ELA scale scores (p < .001). In the full model (Model 4), ELA teacher undersupply is still negatively and significantly associated with ELA achievement. Each one-unit increase in ELA teacher undersupply is associated with a 0.37-unit decrease in mean ELA achievement score (p < .001). However, the relationship between school-level teacher retention and ELA achievement is reduced to nonsignificance, possibly because the full model includes the other workforce composition variables.

7 From the unconditional ANOVA model (output not reported), the intraclass correlation is 20%.

Looking at school workforce characteristics, schools with higher proportions of teachers with professional licenses have higher mean ELA achievement scores. Schools with higher proportions of minority teachers have lower mean ELA achievement scores. Turning to student body composition, a higher proportion of minority students is associated with higher ELA scores (γ06 = .06, p = .005). The proportion of free and reduced lunch students is associated with lower ELA scores (γ07 = -0.14, p < .001). The average school-level ELA pretest score is also associated with higher mean ELA achievement (γ015 = 0.94, p < .001). Finally, student characteristics are important in understanding ELA achievement. Student mobility is negatively associated with ELA achievement: students who change schools score 2.74 scale score points lower than those who stay in the same school.
Free and reduced lunch eligibility is also associated with lower ELA achievement scores (γ30 = -3.94, p < .001). American Indian, black, Hispanic, and multiracial students all have significantly lower ELA achievement scores than their white counterparts, while Asian students had significantly higher ELA achievement scores. Female students also had significantly higher achievement scores than male students (γ90 = 2.02, p < .001).

Science Achievement Outcomes

All of the relationships between science teacher undersupply and student science achievement are in the same direction and of the same magnitude as the ELA relationships described above. For the sake of parsimony, they are not repeated here (see Table 7.9).

Comparison of Main Effects under Different Demand Estimations

In Chapter 4, the demand formula specification was justified and a formula was selected; that formula is used here. For comparison, and to see whether inferences about the impact of school-level teacher undersupply on student achievement change, these models were run under three different demand assumptions:
(1) a 0.7 smoothing constant for enrollment (the optimal formula identified in Chapter 4, and the basis for all previous analyses in this chapter);
(2) the original formula; and
(3) a 0.7 smoothing constant on enrollment combined with distributional assumptions on class size (~N(25, .5)) and courses taught per FTE (~N(5, 1)).
Table 7.10 compares the main effects from each of these methods for mathematics, Table 7.11 for English language arts, and Table 7.12 for science. In Table 7.10, the coefficients for the relationship between mathematics undersupply and student mathematics achievement are in the same direction, with similar standard errors and similar significance levels.
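The third demand scenario replaces fixed class-size and courses-per-FTE values with distributions. The sketch below illustrates the idea under several assumptions that this section does not confirm:

- the 0.7 smoothing constant is treated as simple exponential smoothing of the enrollment history;
- the second parameter of each normal distribution is treated as a standard deviation;
- draws below 1 are rejected so that demand stays finite;
- a hypothetical simplified demand formula (sections needed divided by the sections one FTE can teach) stands in for the Chapter 4 formula.

The enrollment figures are invented.

```python
import random
from statistics import mean


def smooth_enrollment(history, alpha=0.7):
    """ASSUMED simple exponential smoothing; alpha weights the most recent year."""
    smoothed = history[0]
    for value in history[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


def truncated_normal(rng, mu, sd, lower):
    """Draw from N(mu, sd), redrawing until the value is at least `lower`."""
    while True:
        x = rng.gauss(mu, sd)
        if x >= lower:
            return x


def demand_fte(enrollment, courses_per_student, class_size, courses_per_fte):
    """HYPOTHETICAL demand in FTE: sections needed / sections one FTE can teach."""
    return enrollment * courses_per_student / (class_size * courses_per_fte)


def simulated_demand(enrollment, courses_per_student, draws=10_000, seed=1):
    """Monte Carlo mean demand with class size ~ N(25, .5) and courses/FTE ~ N(5, 1)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        class_size = truncated_normal(rng, 25, 0.5, lower=1)
        courses_per_fte = truncated_normal(rng, 5, 1, lower=1)
        sims.append(demand_fte(enrollment, courses_per_student, class_size, courses_per_fte))
    return mean(sims)


enrollment = smooth_enrollment([1000, 1100, 1200])  # invented enrollment history
point = demand_fte(enrollment, 1, 25, 5)            # plug in the distribution means
simulated = simulated_demand(enrollment, 1)
print(f"smoothed enrollment: {enrollment:.1f}")
print(f"point-estimate demand: {point:.2f} FTE; simulated mean demand: {simulated:.2f} FTE")
```

Demand is a reciprocal function of class size and courses per FTE. Averaging over draws therefore gives a somewhat higher demand than plugging in the means (Jensen's inequality), which is one way distributional assumptions can shift undersupply estimates.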
The formula with distributional assumptions actually detects a stronger relationship between mathematics undersupply and student mathematics outcomes. This suggests that the primary formula used in this chapter may be the more conservative one, which reduces the risk of Type I errors. For English language arts (Table 7.11) and science (Table 7.12), the estimates and standard errors are nearly identical across formulas.

Sensitivity Analyses

As discussed in Chapter 5, the sensitivity question in this context is the following. To what extent could an unobserved confounding variable alter the inferences about the relationship between school-level teacher undersupply in mathematics, English language arts, and science and the related student achievement outcomes? To answer it, I use the Impact Threshold for Confounding Variables (ITCV). The ITCV quantifies how strong the impact of an unmeasured confound would have to be to invalidate these inferences (Crosnoe, 2009; Frank, 2000). This method is more fully developed in Chapter 5 and in Technical Appendix D. Sensitivity analyses are presented below for each of the three outcomes (mathematics, English language arts, and science).8 The impact of an unmeasured confound (recall impact =r v.y>