EXPLORATIONS OF TEACHER LABOR MARKETS

By

Seth L. Gershenson

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Economics

2011

ABSTRACT

EXPLORATIONS OF TEACHER LABOR MARKETS

By Seth L. Gershenson

My dissertation comprises three chapters that analyze various aspects of teacher labor supply. The first two chapters use the same primary dataset and similar empirical strategies to investigate substitute-teacher labor supply in an intermediate school district in Michigan. The data comes from an automated calling system used to offer jobs to available substitute teachers and is notable in two respects. First, both accepted and rejected offers are observed, which facilitates the estimation of a sequential binary-choice model of substitute teachers' job-offer acceptance decisions. Second, the calling system makes offers in a conditionally random order, which generates exogenous variation in offer quality across substitute teachers. This exogenous variation is exploited to identify the causal effects of a variety of job attributes on substitute teachers' labor-supply decisions.

Substitute teachers are an important, but often overlooked, source of instruction in U.S. public schools. Chapter 1 investigates substitute teachers' preferences for several non-wage job characteristics and their potential implications for education policy. I find that important determinants of the offer-acceptance decision include the offer's arrival time, commute time, day of week, classroom type, school type, and school quality. Interestingly, conditional on school quality, student demographics do not significantly influence substitutes' decisions. Longer and higher-paying full-day jobs are preferred to half-day jobs, although conditional on daily pay, job length does not significantly impact daily labor-supply decisions. Preferences for several job characteristics are found to vary with substitutes' regular-teacher certification status. Policy implications of these findings are discussed.

Chapter 2 estimates the causal effect of commute time on daily labor supply. The substitute-teacher labor market is an ideal environment in which to answer this question because workers are subject to daily exogenous variation in commute time and are free to adjust labor supply on a daily basis. The main result is an estimated offer-acceptance elasticity (with respect to commute time) of about -0.4, which suggests that commute time plays an important role in labor-supply decisions. The effect of commute time on labor supply is significantly larger on mornings when the temperature is below 20 degrees Fahrenheit, but fuel prices and rain do not significantly alter the effect of commuting. There is no statistically significant difference in the overall aversion to commuting between men and women; however, women are particularly averse to commuting in cold weather and are significantly more responsive to fuel prices than men.

Chapter 3 investigates the impact of an increase in the stakes of mandatory testing created by the 2001 No Child Left Behind Act (NCLB) on teacher quality in California. NCLB simultaneously created strong incentives for schools to improve student achievement and increased the stress and pressure on teachers. The empirics use a difference-in-differences identification strategy that compares teachers in tested second-grade classrooms to those in non-tested first-grade classrooms.
I find that the probability of second-grade teachers holding a graduate degree significantly decreased in response to (and in anticipation of) NCLB, that average years of experience declined slightly in tested relative to non-tested classrooms, and that there was no effect on teacher certification.

Copyright by SETH L. GERSHENSON 2011

Dedicated to the memory of my mother, Shona Zangari Gershenson

ACKNOWLEDGEMENTS

I am incredibly lucky to have had an advisor, Professor Steven Haider, who provided a tremendous amount of support, encouragement, and constructive criticism over the past several years. I will be forever grateful for the countless hours that he spent reading some very preliminary drafts of my work, discussing potential improvements, and teaching me how to read, write, and present economic research. Professor Haider was a fantastic mentor in every sense of the word. Professor Gary Solon also deserves a huge amount of thanks for the many important contributions he made to my dissertation and the constant friendly encouragement that he provided; in many ways Professor Solon acted as a second advisor. I also thank the other members of my dissertation committee, Professors Cassie Guarino and Stephen Woodbury, for their encouragement and support throughout.

I have thoroughly enjoyed my time at MSU. During the course of my graduate studies I have benefitted from conversations with fellow students, faculty members, and seminar participants too numerous to mention. I received excellent training at MSU and sincerely appreciate the time and effort that Professors John Giles, Steven Haider, Gary Solon, and Jeff Wooldridge put into their graduate courses. I am also lucky to have had some terrific classmates: Brian McNamara, Brian Moore, and Nick Sly have become good friends who helped me to cope with the ups and downs of graduate school and to prepare for my comprehensive exams. I am also thankful for the financial support and travel grants that I have received from MSU's Department of Economics, Graduate School, College of Social Science, and Council of Graduate Students. I would also like to acknowledge financial support from the American Education Finance Association's Pre-Doctoral New Scholar Award, which I received in recognition of the first two chapters of my dissertation. I thank Cassie Guarino for encouraging me to apply for the award.

Finally, I am blessed to have a wonderfully supportive father. His blind faith that I would succeed helped me to persevere in the first year of graduate school and his occasional "loans" made things much less stressful. Thanks, Dad; I cannot begin to tell you how much I appreciate everything that you have done for me over the years.

TABLE OF CONTENTS

LIST OF TABLES .......................................................................................................................... x LIST OF FIGURES ....................................................................................................................... xi CHAPTER 1: How do Substitute Teachers Substitute? 1.1 Introduction ............................................................................................................................... 2 1.2 Background and Literature ....................................................................................................... 3 1.3 Institutional Details and Data....................................................................................................
5 1.3.1 Data ........................................................................................................................................ 7 1.3.2 Descriptive Statistics .............................................................................................................. 9 1.4 Econometric Model ................................................................................................................. 14 1.4.1 Substitutes’ Optimal Decision Rule ..................................................................................... 14 1.4.2 Estimation ............................................................................................................................ 17 1.5 Results ..................................................................................................................................... 20 1.6 Conclusions ............................................................................................................................. 28 Appendix 1.1 Tables ..................................................................................................................... 32 Appendix 1.2 Figures .................................................................................................................... 43 Appendix 1.3 Average Partial Effect (APE) Definitions .............................................................. 47 Appendix 1.4 MP Probit Coefficients ........................................................................................... 49 References ..................................................................................................................................... 54 CHAPTER 2: Going the Extra Mile 2.1 Introduction ............................................................................................................................. 58 2.2 Literature Review.................................................................................................................... 60 2.3 Labor Market Environment & Data ........................................................................................ 63 2.3.1 The Intermediate School District ......................................................................................... 63 2.3.2 Data ...................................................................................................................................... 64 2.3.3 Descriptive Statistics ............................................................................................................ 65 2.4 Econometric Model ................................................................................................................. 67 2.4.1 Optimal Decision Rule ......................................................................................................... 67 2.4.2 Estimation ............................................................................................................................ 70 2.5 Results ..................................................................................................................................... 72 viii 2.5.1 Main Results ........................................................................................................................ 72 2.5.2 Sensitivity Analysis ............................................................................................................. 76 2.6 Conclusions ............................................................................................................................. 
79 Appendix 2.1 Tables ..................................................................................................................... 83 Appendix 2.2 Figures .................................................................................................................... 88 Appendix 2.3 Average Partial Effects, Elasticities, & Interaction Effects ................................... 92 Appendix 2.4 Baseline RE Probit Coefficients............................................................................. 95 References ..................................................................................................................................... 99 CHAPTER 3: The Effect of High-Stakes Testing on Teacher Quality - Evidence from California 3.1 Introduction ........................................................................................................................... 103 3.2 Literature Review.................................................................................................................. 107 3.3 Institutional Details & Data .................................................................................................. 109 3.3.1 Pre-NCLB Education Policy in California ........................................................................ 109 3.3.2 NCLB’s Impact in California............................................................................................. 111 3.3.3 Data .................................................................................................................................... 112 3.4 Empirical Model and Estimation .......................................................................................... 114 3.5 Results ................................................................................................................................... 116 3.5.1 DD Estimates ..................................................................................................................... 116 3.5.2 Event History Estimates ..................................................................................................... 117 3.5.3 Sensitivity Analysis ........................................................................................................... 119 3.6 Conclusion & Discussion...................................................................................................... 120 Appendix 3.1 Tables ................................................................................................................... 124 Appendix 3.2 Figures .................................................................................................................. 129 Appendix 3.3 FE Logit Coefficients ........................................................................................... 131 References ................................................................................................................................... 133 ix LIST OF TABLES Table 1.1: Mean Job Characteristics ............................................................................................. 33 Table 1.2: Mean Offer Characteristics .......................................................................................... 34 Table 1.3: Daily Offers Received and Daily Selectivity of Substitutes ........................................ 35 Table 1.4: RE-Probit Coefficients................................................................................................. 
36 Table 1.5: Average Partial Effects ................................................................................................ 39 Table 1.6: Mass-point Probit APE ................................................................................................ 42 Table A1: Mass-point Probit Coefficients .................................................................................... 50 Table 2.1: Mean Offer Characteristics .......................................................................................... 84 Table 2.2: RE-Probit Results ........................................................................................................ 85 Table 2.3: Linear Probability Model (LPM) Estimates ................................................................ 87 Table A2: Baseline RE-Probit Coefficients .................................................................................. 96 Table 3.1: PAIF Data Description .............................................................................................. 127 Table 3.2: Standard DD Estimates .............................................................................................. 128 Table 3.3: Event History Estimates (time-varying NCLB effects) ............................................. 129 Table 3.4: Sensitivity Analysis of Graduate-degree Results ...................................................... 130 Table A3: FE Logit Coefficients................................................................................................. 134 x LIST OF FIGURES Figure 1.1: Job-Length Distribution ............................................................................................. 44 Figure 1.2a: Day Of-Offer Time Distribution............................................................................... 45 Figure 1.2b: Day Before-Offer Time Distribution ........................................................................ 45 Figure 1.3: Offer Time-Acceptance Probability Gradient ............................................................ 46 Figure 2.1: Commute-Time Distributions..................................................................................... 89 Figure 2.2: Daily Weather Conditions .......................................................................................... 90 Figure 2.3: County-Level Average Daily Fuel Prices .................................................................. 91 Figure 3.1: Grade-Specific Trends in Average Teacher Characteristics .................................... 132 xi CHAPTER 1 HOW DO SUBSTITUTE TEACHERS SUBSTITUTE? 1 1.1 Introduction The quality of public education in the U.S. is important due to its relationship with economic growth (Hanushek & Woessmann, 2008) and individual labor market outcomes (Card & Krueger, 1992). Instruction is a primary input of the education production function and an extensive literature studies the principal purveyors of instruction: regular teachers (Dolton, 2006; Hanushek & Rivkin, 2006). Regular-teacher absence rates are between five and ten percent and teacher absences are typically covered by substitute teachers (Roza, 2007). Little is known about this secondary source of instruction, however, and the present paper begins to fill this gap in the education literature by analyzing daily substitute-teacher labor supply. Understanding the preferences of substitute teachers, particularly those certified as regular teachers, is potentially important for several reasons. 
First, many schools have trouble satisfying their demand for substitute teachers (Henderson et al., 2002; Rogers, 2001; Dorward et al., 2000). When a substitute teacher cannot be found, regular teachers and school administrators work overtime to cover their colleague's absence (Rogers, 2001). This increased workload likely decreases the covering teachers' effectiveness throughout the day. Second, recent work documenting the negative effect of teacher absences on student achievement finds that absences covered by certified substitutes are sometimes less harmful than absences covered by non-certified substitutes (Clotfelter et al., 2009), which suggests that substitute-teacher quality may influence student achievement. Third, poor and low-achieving schools have higher regular-teacher absence rates (Clotfelter et al., 2009; Miller et al., 2008a, 2008b) and are more likely to lose their regular teachers to wealthier and higher-achieving schools (Hanushek et al., 2004). If substitute teachers similarly avoid low-achieving schools, the problems associated with the availability and quality of substitute teachers discussed above are concentrated among the schools and students that can least afford them. Finally, understanding the preferences of substitute teachers might allow the design of a pay system that minimizes expenditures on substitutes or that increases efficiency or equity by altering the distribution of substitutes or substitute quality across schools.

I estimate a sequential binary-choice model based on an expected utility-maximizing optimal decision rule that is hypothesized to govern substitutes' job-offer acceptance decisions. The empirics utilize data on the job offers, both accepted and rejected, made by an automated calling system to substitute teachers. The offers are made in a conditionally random order that creates exogenous variation in offer quality across substitute teachers. Several non-wage offer characteristics are found to play an important role in substitutes' daily labor-supply decisions, including commute time, school type, school quality, and time of offer. Friday jobs are significantly less likely to be accepted, and certified substitutes are more likely to accept offers than non-certified substitutes. Interestingly, conditional on achievement, a school's demographic composition does not influence substitutes' daily decisions, nor does job length conditional on daily pay. Substitutes do, however, systematically prefer longer and higher-paying full-day jobs to half-day jobs.

1.2 Background and Literature

Substitute teachers have recently received attention from both policy makers and the popular media. For example, in 2007 H.R. 3345 (The Substitute Teacher Improvement Act) was introduced in Congress and in 2010 a New York Times editorial lamented the difficulties of substitute teaching (Bucior, 2010). Despite the apparent interest in substitute teachers, however, they have been neglected by economists and education-policy researchers. A possible explanation for the lack of rigorous research on substitute-teacher labor supply is the dearth of data on substitute teachers in large, nationally representative data sets like the National Center for Education Statistics' Schools and Staffing Survey. Existing studies of the substitute-teacher labor market come mainly from outside of economics. The contingent-labor literature, for instance, contains two case studies of substitute teaching.
Rogers (2001) found that substitutes in a Pennsylvania school district felt underpaid and underemployed. A sociological study found that both substitutes and regular teachers preferred arranging jobs personally to using an automated call system (Coverdill & Oulevey, 2007). Strauss (2003) was primarily interested in the demand for substitute teachers in the Pittsburgh area, but did ask some qualitative questions of Pittsburgh-area substitutes. Over 40% cited daily pay as the most important job characteristic. Overall, 98.4% of surveyed substitutes said that daily pay was either "very important" or "somewhat important." Other commonly mentioned important job characteristics were "advance professional career," "discipline in school," "safety of school," and "proximity to residence." Dorward et al. (2000) surveyed a random sample of 500 U.S. school districts on "issues related to substitute teaching." The authors report that 86% of school districts claimed to have a "problem" or "serious problem" with substitute availability and that 7% of districts deemed their substitutes "below average." The average daily pay in their sample was $65 per six-hour day and ranged from $35 to $180.

What, if any, findings from the regular-teacher literature might apply to substitute teachers? Substitute teachers operate on a daily margin, and regular teachers choose daily labor supply by being absent. Roza (2007) finds that regular teachers are absent about ten times per school year, accounting for about 5% of school days, while comparable professionals take only three sick days during an equivalent time period. While this difference may result from teachers being sick more often as a result of their close contact with children, a significant number of teacher absences appear to be discretionary: Ehrenberg et al. (1991) found that annual teacher absences are responsive to district-level policies and Jacobson (1988) found that a small cash bonus for perfect attendance caused a significant drop in absence rates and a large increase in perfect attendance.

In reviewing the literature on teacher quality, Hanushek and Rivkin (2006) generally find that certification standards and advanced degrees have little to no effect on student achievement. Absence rates, however, have been shown to negatively impact student achievement in a variety of settings: Clotfelter et al. (2009) in North Carolina, Miller et al. (2008a, b) in a large urban U.S. school district, and Das et al. (2007) in Zambia. Miller et al. (2008a) suggest that the negative effect of teacher absences may partially result from the low quality of substitutes. Substitute teachers are subject to significantly less-stringent requirements than regular teachers (Henderson et al., 2002). Clotfelter et al. (2009) provide evidence that substitute quality matters: absences in primary-school reading classes covered by certified substitutes are marginally less harmful than absences covered by non-certified substitutes.

1.3 Institutional Details and Data

This paper analyzes the daily labor supply of substitute teachers in a consortium of ten adjacent and autonomous Michigan school districts that contains more than 70 schools. The consortium's members enjoy economies of scale in a variety of administrative duties. For example, districts share the fixed costs of recruiting, training, and maintaining a large pool of substitute teachers and of running an automated calling system used to offer jobs to substitutes.
The requirements to substitute teach in the consortium include passing a criminal background check, at least three years of credits from an accredited college or university, completion of a four-hour orientation program, and either a valid Michigan teaching certificate or a Michigan substitute-teaching license. The latter costs $25 and must be renewed annually. Regular teachers in the consortium requested about 20,000 substitutes during the 2006-07 school year. About half of these requests were fulfilled via personal arrangements between regular teachers and substitutes. All remaining jobs were filled by the automated calling system. When using the automated calling system a regular teacher may request a specific substitute by name; this accounts for less than 10% of call-system requests. The subsequent analysis is restricted to the approximately 9,000 requests (jobs) that were filled by the calling system but did not specify a substitute teacher by name.

At any time prior to the start of a job, regular teachers can request a substitute by phone. The request must specify the job's characteristics, including start and end time, subject or grade level, location, and (optionally) a voicemail containing special instructions. The calling system then repeatedly offers the job to available substitutes until it is either accepted or the job begins. Offers (phone calls) are made between 4:00 p.m. and 11:00 p.m. one or more days in advance of the job and beginning at 5:00 a.m. on the morning of the job. The automated calling system makes offers in a random order, conditional on two observed characteristics: substitutes' regular-teacher certification status (in Michigan) and substitutes' offer-specific "preferred-list status." From the call system's perspective certification is a binary variable; it does not take the job's subject or the substitute's area of certification into account. Each regular teacher, school, and district maintains a fluid list of "preferred" substitute teachers. All substitutes are included on the district list, which is a substitute's default status. Because "list status" is offer specific, and some substitutes might be on more lists than others, I create a substitute-specific variable equal to the percentage of "preferred-list" offers.[1]

Footnote 1: While the lists are updated throughout the school year, status changes are relatively rare and tend to happen early in the school year.

Substitutes are not penalized by the system for rejecting offers and continue to receive offers after rejecting one. Nor are substitutes penalized for reneging on an acceptance in advance of the job's start time, in which case the job simply reenters the calling system's queue. After accepting a job, however, substitutes cease receiving offers that conflict with the accepted job. Returning to previously rejected offers is also prohibited. The model developed in section 1.4 accurately portrays the functioning of the automated calling system.

Upon answering a phone call from the automated system, a substitute learns the job's start and end time, regular teacher's name, subject, and school. The wage is not explicitly stated because it is a function of job length. Daily pay for all substitutes at all consortium schools is binary; half days pay $40 and full days pay $75. Full-day jobs are those longer than four hours and twenty minutes. Variation in job length within half and full days is created by differences in school schedules, class schedules, and regular teachers' discretionary choices.

1.3.1 Data

Job offers (phone calls) are the primary unit of observation.
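Because the pay rule just described fully determines daily pay from job length, it can be summarized in a few lines of code. The sketch below is only an illustration of the rule stated in the text; the function name and return format are my own.

```python
def daily_pay(job_hours: float) -> tuple[str, float]:
    """Consortium pay rule described above: jobs longer than 4 hours and
    20 minutes pay the full-day rate ($75); all other jobs pay $40."""
    if job_hours > 4 + 20 / 60:
        return "full", 75.0
    return "half", 40.0

# Implied hourly wages: a 4.5-hour "short" full-day job versus a 3-hour half day.
for hours in (4.5, 3.0):
    day_type, pay = daily_pay(hours)
    print(day_type, round(pay / hours, 2))   # full 16.67, half 13.33
```

The example makes clear why jobs just to the right of the cutoff carry an unusually high hourly wage, a point that matters for the descriptive statistics below.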
About 5% of the roughly 100,000 offers made by the automated calling system during the 2006-07 school year are dropped from the analysis because they concerned "alternative schools" for which school-level information is unavailable. The time and date of each offer, along with characteristics of the job being offered and the offer recipient, are observed. Job characteristics include start and end time, subject, school, and a unique job identifier. Recipient characteristics include substitutes' certification status, preferred-list status, gender, home zip code, and a unique substitute identifier. Measures of commute time were constructed for each substitute zip code-school address pair using MapQuest.com.[2]

The calling-system data is augmented with school-level data from two additional sources. First, total enrollment, student-teacher ratio, and the percentage of black, Hispanic, and lunch program-eligible students at each school were taken from the 2006-07 Common Core of Data.[3] Second, the school grades assigned by the Michigan Department of Education (MDE) in 2006-07 were taken as a measure of school achievement (quality). The MDE annually publishes a School Report Card that assigns a letter grade (A, B, C, D, or F) to each school in the state. Published grades are the average of three distinct grades for achievement status (test scores in levels), achievement change (first-differenced test scores), and implementation of "best practices" (self-reported usage of 40 specified instructional methods). The first two grades are based on Michigan Education Assessment Program (MEAP) standardized-test scores and are adjusted to account for variation across schools in average student socioeconomic status and to emphasize scores at the low end of the distribution.[4] Finally, overall grades are subject to two potential modifications based on the school's Adequate Yearly Progress (AYP) status: schools that make AYP and earn a D will have their grade improved to a C, while schools that fail to make AYP and earn an A will have their grade lowered to a B.[5] The formulas used to compute school grades are provided in MDE (2007).[6]

Footnote 2: MapQuest uses geocoding technology to assign approximate latitude-longitude coordinates to each school's address and each substitute's zip-code centroid. An algorithm that favors higher posted speed limits and fewer turns and intersections searches for an optimal route. Approximate driving distance and travel time are then estimated using posted speed limits, average stop-time at each intersection, and the average time it takes to make each left turn along the route. For additional details and references see Layton (2005). A trivial number of substitutes were assigned a non-Michigan zip code; these substitutes were dropped from the analysis.

Footnote 3: Lunch programs provide low-income students with free or reduced-price lunches. Eligibility for such programs is a commonly-used indicator of student poverty. The Common Core of Data is publicly provided by the National Center for Education Statistics: http://nces.ed.gov/ccd/.

Footnote 4: For additional MEAP information see http://www.michigan.gov/mde/0,1607,7-14022709_31168---,00.html.

Footnote 5: AYP is binary. It is computed using the percentage of "non-proficient" students and attendance rates (or graduation rates in high schools).

Footnote 6: The most recent School Report Cards and accompanying documentation are publicly available at https://oeaa.state.mi.us/ayp/. Past School Report Cards and documentation are available from the MDE upon request.

1.3.2 Descriptive Statistics

Table 1.1 describes the jobs offered by the automated calling system. Column 1 reports the average characteristics of all 8,566 unique jobs, 98% of which were ultimately accepted.
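The two AYP-based grade modifications are simple enough to state precisely. The sketch below encodes only the two adjustments described in the preceding paragraph; the full grading formulas are in MDE (2007), and the function name is mine.

```python
def adjusted_school_grade(base_grade: str, made_ayp: bool) -> str:
    """Apply the two AYP-based modifications described above: a D school that
    makes AYP is raised to a C, and an A school that fails to make AYP is
    lowered to a B. All other grades are unchanged."""
    if base_grade == "D" and made_ayp:
        return "C"
    if base_grade == "A" and not made_ayp:
        return "B"
    return base_grade

# Examples: adjusted_school_grade("D", True) -> "C"
#           adjusted_school_grade("A", False) -> "B"
```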
The average job was offered about eleven times before being accepted and the initial offer was made on the day of the job about one third of the time. Slightly more than one third of jobs were half-days, but given the way that daily pay is determined the overall average job length and hourly wage are not particularly interesting. Jobs were roughly evenly distributed between elementary, middle, and high schools. Only 4% of jobs were in charter schools. The majority of jobs were in well-performing schools; only 1% of jobs were in D schools and less than 10% of jobs were in C schools. The average job was in a school in which 19% of students were eligible for lunch assistance, 10% were black, and 4% were Hispanic. Finally, jobs were roughly evenly distributed across days of the week, with a slightly higher percentage of jobs occurring on Fridays. Not reported in table 1.1 is that jobs were evenly distributed across months, with the exception of fewer jobs in September and June.

The mean characteristics of the 3,126 never-accepted jobs are given in column 2. Two notable differences between columns 1 and 2 emerge. First, nearly three quarters of the never-accepted jobs were first offered on the day of the job. Second, the never-accepted jobs overwhelmingly fall on Fridays.

Columns 3 and 4 of table 1.1 investigate the differences between full and half-day jobs. Despite being significantly shorter on average, half days pay about $1.40 more per hour than full-day jobs. Half-days are less likely to be in rural districts and more likely to be in elementary schools. Full-day jobs are more likely to be in high schools. Half-day jobs are somewhat more likely to be in A schools while full-day jobs are more likely to be in B, C, and D schools. The remaining job characteristics do not systematically vary with half-day status.

Figure 1.1 shows that the job-length distribution is a bimodal mixture of two distributions centered on the half-day and full-day means. The full-day distribution is tightly centered around the full-day mean, while the half-day distribution exhibits more variation. One potential explanation for this is that full-day absences result from regular teachers being unavailable for the entire school day, rather than being unavailable for a period of time greater than four hours and 20 minutes but less than the length of the school day, while half-day absences result from commitments shorter than four hours and 20 minutes. Note, however, the non-zero mass to the immediate right of the full-half cutoff. These potentially interesting jobs pay a high hourly wage and are investigated in column 5 of table 1.1. There are 139 "short" full-day jobs that are less than five hours but pay the full-day wage. At first blush it is surprising that so many of these high-hourly wage jobs were never accepted, but this phenomenon is at least partly explained by the high percentage of "short" full-day jobs that were initially offered on the day of the job. For example, a regular teacher may have gone to school, unexpectedly needed to leave, and called a substitute for the remainder of the day.
Thus the job was in the four to five hour range because the teacher had been in school for an hour or two before requesting a substitute, but the calling system had insufficient time to find a substitute. The "short" full-day jobs paid over $16 per hour, several dollars more than average, and were less likely to be in rural districts, elementary schools, and high schools.

A potential concern is that teachers in unobservably bad classrooms took it upon themselves to attract substitute teachers by purposely choosing a job length to the immediate right of the half-day cutoff with the intention of creating a high hourly wage. Column 5 provides two pieces of evidence against this hypothesis, however. First, these jobs are no less likely to be in A-graded schools. Second, they are actually less likely to be on Fridays, which are the days on which it is most difficult to find a substitute. Thus it appears that teachers are not paying compensating wage differentials based on observable measures of quality. It is reasonable to assume, therefore, that regular teachers are not behaving this way based on unobservable job characteristics either.

Table 1.2 reports mean offer characteristics. Focusing on offers rather than jobs introduces two new dimensions to the data: the offer's recipient and timing. Column 1 summarizes the 94,106 offers made by the automated system. The average offer was made 1.6 days prior to the job's start, half of offers were made on the day of the job, and 36% were made on the day before the job. The average one-way commute was about 20 minutes (15 miles). Regarding offer-recipient characteristics, 25% of offers went to certified substitutes and 93% went to substitutes who accepted at least one call-system offer during the year. Just 1% of offers went to a substitute on a teacher's preferred list and 6% went to substitutes on a school's preferred list. The remaining school characteristics have offer-averages similar to the corresponding job averages in column 1 of table 1.1.

Columns 2 and 3 report offer means separately for half and full days, respectively. The average acceptance rate for full-day jobs is slightly higher. Most of the other offer-specific characteristics are similar across half and full days with one exception: certified substitutes receive a larger share of half-day offers. Columns 4 and 5 summarize offer characteristics by recipients' certification status. The acceptance rate of certified substitutes is three percentage points higher than that of non-certified substitutes. This could indicate that certified substitutes have stronger tastes for substitute teaching or that certified substitutes are more likely to accept offers because they receive higher-quality offers as a result of the calling system's preference for certified substitutes. As expected, on average certified substitutes receive offers one full day earlier than non-certified substitutes and are less likely to receive day-of offers. Average commutes are about the same for certified and non-certified substitutes, but school characteristics are not. Certified substitutes are more than twice as likely to receive elementary-school offers and less than half as likely to receive high-school offers as their non-certified counterparts. Certified substitutes are also more likely to receive offers from A schools, which could result from jobs being offered to certified substitutes first, who quickly accept the A offers, leaving fewer A jobs to be offered to non-certified substitutes.
Over 85% of offers were made either on the day of or day before the job. Figure 1.2 provides day-of and day-before offer-time histograms that examine the precise timing of these offers. Not surprisingly, the likelihood of receiving a day-of offer decreases monotonically with time. Offers are made until relatively late in the school day because jobs, particularly half-day jobs, can start at any time. The day-before offer-time distribution follows a U-shaped pattern: the probability of receiving an offer decreases from 4:30 to 6:00 p.m., remains relatively flat from 6:00 to 8:00 p.m., and then increases between 8:00 and 10:00 p.m. before slowly decreasing again. One explanation of this pattern is that many substitute requests are made during and immediately after the school day, then there is a lull during dinner time before another batch of requests are made later in the evening. These qualitative patterns remain when looking at the offer-time distributions separately for certified and non-certified substitutes.

Table 1.3 investigates the daily selectiveness of substitutes. For each of the 32,338 "sub-days" for which a substitute received at least one offer to work on a given day, the total number of sub-day offers received is tabulated separately for substitutes who worked on the day in question and those that did not. The average non-worker received 3.16 offers to work on the day in question while the average worker received 2.05 such offers. This difference is a result of the fact that offers to work on a given day essentially stop arriving once the substitute has accepted an offer to work on that day. Two striking features of table 1.3 will be revisited when discussing the empirical specification. First, of the non-workers on a given day, nearly 16% rejected six or more offers and over 40% of total offers went to these "multiple rejecters." It is unlikely that all of these substitutes received a series of unlucky draws from the job distribution; instead, these figures suggest that a nontrivial fraction of substitutes had a prohibitively large opportunity cost of substitute teaching on the day in question. Second, of the substitutes who worked on a given day, nearly 60% accepted the first offer that they received. One explanation of the high percentage of first-offer acceptances is that many substitutes have a low opportunity cost of subbing on a given day. Alternatively, these quick-to-accept substitutes may be extremely risk averse or worried that another offer may not arrive. This is not to say that there is no variation in the number of offers received before accepting, however, as 20% of substitutes sampled two offers, 9% sampled three offers, 5% sampled four offers, and over 3% sampled seven or more offers.

1.4 Econometric Model

1.4.1 Substitutes' Optimal Decision Rule

The functioning of the automated calling system in this labor market is remarkably similar to a finite-horizon job-search model with no recall and no on-the-job search (Mortensen, 1986). Supposing that substitute teachers maximize expected utility when making daily labor-supply decisions, the optimal strategy can be defined in terms of a reservation-utility decision rule: accept an offer if and only if the utility of accepting (U^A) exceeds the expected utility of rejecting (U^R). The former depends on the individual's tastes for substitute teaching and on the offer's characteristics.
The latter depends on both the individual's non-subbing alternative (U^N), or opportunity cost of substitute teaching, and expectations regarding future offers.

Let T represent the end of the school day, at which point the probability of receiving an offer becomes zero. Rejecting an offer at time T is therefore equivalent to choosing the non-subbing alternative, so U_T^R = U^N. If offers arrive at time t with probability π_t and only one offer can be received per period, the expected utility of rejecting at all t less than T is

$$U_t^R = \pi_{t+1} E\left[\max\left\{U_{t+1}^A,\, U_{t+1}^R\right\}\right] + \left(1 - \pi_{t+1}\right) E\left[U_{t+1}^R\right]. \qquad (1.1)$$

Because U_t^R is decreasing in t and U_T^R = U^N, U_t^R can be approximated as

$$U_t^R = U^N + b(t), \qquad (1.2)$$

where b(t) is nonnegative, monotonically decreasing in t, and equals zero at time T. Empirically, I will employ a flexible piecewise-linear approximation of the b(t) function that allows the first derivative to vary with the number of days in advance that the call is made.

I assume that substitute teachers' daily preferences are represented by the same utility function regardless of where, or if, they work. Daily utility is a function of non-labor income (Y), labor income (M), hours worked (H), commute time (h), and observed and unobserved individual, day, and non-wage job characteristics (ψ). Specifically, let daily utility be separable in income and leisure, taking the form

$$U = f(Y) + \alpha M - g(H, h) + \psi. \qquad (1.3)$$

The functions f and g are both increasing. M is valued linearly because small changes to lifetime earnings have approximately no income effect (Goette et al., 2004). The empirics will take a linear approximation of g, so the utility accruing to substitute s of accepting an offer to work on day d at time t is

$$U_{sdt}^A = f(Y_{sd}) + \gamma^A x_{sdt} + \lambda^A z_s + b^A j_{sdt} + \delta^A r_d + \omega_{sd}^A + \varepsilon_{sdt}, \qquad (1.4)$$

where x_sdt is a vector of observed job characteristics including M, H, and h; z_s is a vector of observed individual characteristics including gender, certification status, and preferred-list status; r_d is a vector of day-of-job variables including day of week and month; j_sdt is a vector of time-of-call variables; ω_sd^A is the substitute's unobserved day-specific taste for substitute teaching; and ε_sdt is an offer-specific error term capturing unobserved offer characteristics and distractions to the substitute at the time of offer.[7]

Footnote 7: Because daily pay is binary, M is replaced by a half-day dummy in the empirics. The day-of-week and month dummies enter equation (1.4) because they contain information on job quality. For example, students may be systematically rowdier on Fridays and in June because they are excited for the weekend and summer vacation, respectively. Time of call enters equation (1.4) because it proxies for job quality to the extent that the unobserved job-quality distribution changes over time and because it provides a measure of the substitutes' preparation time for the job.

A substitute's daily non-subbing utility depends on non-labor income and varies with observable individual and day characteristics. The characteristics of the non-subbing activity are unobserved and subsumed in an unobserved sub-day term ω_sd^N. Therefore, the utility of the non-subbing alternative is

$$U_{sd}^N = f(Y_{sd}) + \lambda^N z_s + \delta^N r_d + \omega_{sd}^N. \qquad (1.5)$$

Combining equations (1.2), (1.4), and (1.5) with the optimal decision rule discussed above yields the probability that an offer will be accepted conditional on it being received. Formally,

$$\Pr\left(A_{sdt} = 1 \mid p_{sdt} = 1, x_{sdt}, z_s, r_d, j_{sdt}, \omega_{sd}^A, \omega_{sd}^N\right) = \Pr\left(\gamma^A x_{sdt} + \lambda z_s + \delta r_d + b j_{sdt} + \omega_{sd} + \varepsilon_{sdt} > 0 \mid p_{sdt} = 1\right), \qquad (1.6)$$

where A_sdt is a binary indicator of offer acceptance, p_sdt is a binary indicator of having received an offer, and parameters lacking a superscript represent net effects that are defined as follows: λ = λ^A − λ^N, δ = δ^A − δ^N, b = b^A − b^N, and ω_sd = ω_sd^A − ω_sd^N. With the exception of γ^A, the primary object of interest in this study, only net effects of the model's covariates are identified because the same covariates enter equations (1.4) and (1.5). Finally, note that non-labor income was differenced out of (1.6) because it is valued identically in both U^A and U^N.[8]

Footnote 8: Intuitively, this is a result of consumption smoothing over the lifecycle and preferences that are separable in consumption and leisure. The assumption that non-labor income is valued differently on subbing and non-subbing days can be relaxed entirely by noting that any difference in utility would be sub-day specific and hence captured in ω_sd.

1.4.2 Estimation

The observance of all rejected offers made by the call system distinguishes this dataset from those typically used to estimate job-search models (Devine & Kiefer, 1991, p. 8). This is important because the usual sample-selection problem associated with observing only accepted offers is avoided and equation (1.6) can be estimated in a straightforward binary-response framework. A different sample-selection problem remains, however, because of the no on-the-job search rule followed by the automated calling system: substitutes who work on day d will, on average, receive fewer offers and have higher values of ω_sd than day-d non-workers. Because offer-acceptance decisions are only observed when an offer was made, the data can be viewed as a selected sample where p_sdt serves as the selection indicator. Thus pooled estimators of (1.6), which leave ω_sd in the error term, are inconsistent because ω_sd is negatively correlated with p_sdt.

Conditional on z_s and ω_sd, however, the offer-specific error term ε_sdt is independent of the selection indicator. This is a direct result of the call system's randomness. Accordingly, conditional on z_s and ω_sd, time periods in which no offer is received can be considered "missing at random" (Wooldridge, 2010, p. 795) and the sample-selection problem can be safely ignored. This solution to an unbalanced-panels problem in a nonlinear model is similar in spirit to Kiefer and Neumann (1981). I assume that ε_sdt | x_sdt, z_s, r_d, j_sdt, ω_sd ~ N(0, 1), so (1.6) can be rewritten as

$$\Pr\left(\gamma^A x_{sdt} + \lambda z_s + \delta r_d + b j_{sdt} + \omega_{sd} + \varepsilon_{sdt} > 0 \mid p_{sdt} = 1\right) = \Phi\left(\gamma^A x_{sdt} + \lambda z_s + \delta r_d + b j_{sdt} + \omega_{sd}\right). \qquad (1.7)$$

Assuming that ω_sd | z_s, r_d ~ N(0, σ_ω²), equation (1.7) can be estimated using the random effects (RE) probit estimator of Butler and Moffitt (1982). I will treat the RE-probit model as the baseline model.

The RE-probit model makes several strong assumptions, so I consider alternative estimators as well to verify the robustness of the results. First, consider relaxing the distributional assumption made on the unobserved sub-day effect ω_sd. As seen in table 1.3, the raw data suggests that on any given day a nontrivial number of substitutes have an extremely low opportunity cost (or high level of risk aversion) and another subset of substitutes have a prohibitively high opportunity cost.
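To make the recursion in equation (1.1) and the shape of b(t) in equation (1.2) concrete, the sketch below computes a reservation-utility path by backward induction. The arrival probability, the offer-utility distribution, and the normalization U^N = 0 are illustrative assumptions of mine, not values taken from the data.

```python
import numpy as np

# Backward induction for equation (1.1) under illustrative assumptions:
# offers arrive each period with probability pi, accepted-offer utilities are
# standard normal, and the non-subbing utility U^N is normalized to zero.
rng = np.random.default_rng(0)
T = 20                                   # periods until the end of the school day
pi = 0.3                                 # per-period offer-arrival probability
U_N = 0.0                                # utility of the non-subbing alternative
offer_draws = rng.normal(size=100_000)   # simulated draws of U^A

U_R = np.empty(T + 1)
U_R[T] = U_N                             # rejecting at T = taking the outside option
for t in range(T - 1, -1, -1):
    e_max = np.mean(np.maximum(offer_draws, U_R[t + 1]))  # E[max{U^A, U^R}]
    U_R[t] = pi * e_max + (1 - pi) * U_R[t + 1]

# b(t) = U^R_t - U^N is nonnegative, decreasing in t, and equal to zero at T,
# exactly the properties assumed in equation (1.2).
b = U_R - U_N
print(np.round(b[[0, 5, 10, 15, T]], 3))
```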
This suggests that ω_sd may not be normally distributed. An alternative is a nonparametric "mass point" distribution of ω_sd (Heckman & Singer, 1984). An additional benefit of the mass-point (MP) model is that the proportion of sub-days located at each mass point and mass point-specific marginal effects can be estimated. Both the number and preferences of substitutes "at the margin" of accepting a day-d offer may be of particular interest to policy makers because these are the substitutes who are likely to be influenced by small policy changes.

Second, I relax the assumption made by both the RE-probit and MP-probit models that ω_sd is conditionally independent of the offer characteristics by using the linear fixed-effects (FE) estimator to estimate a linear probability model (LPM). Comparing the linear-FE estimates to linear-RE estimates provides an approximate test of the call system's conditional randomness.[9] The FE-logit estimator is not an attractive option in the present case because the majority of observations would be dropped from the analysis, as there is no variation in the offer-acceptance decisions for the majority of sub-days.

Footnote 9: The test is approximate because the linear-RE estimator is inconsistent as a result of the unbalanced-panel problem discussed above (Wooldridge, 2010, p. 831).

The assumption that ε_sdt is independent of the offer characteristics is more contentious and less testable than the independence of ω_sd. This is because regular teachers may take it upon themselves to pay compensating wage differentials. For example, teachers in unobservably bad classrooms might systematically offer shorter assignments, causing ε_sdt to be correlated with H_sdt. Recall, however, that the discussion of table 1.1 in section 1.3 suggests that regular teachers are not paying compensating wage differentials based on observable job characteristics, making it unlikely that they do pay compensating wage differentials based on unobservables. Furthermore, this type of behavior is unlikely to be problematic for a number of reasons.[10]

Footnote 10: First, if teachers do behave this way, it is only problematic if substitutes are aware of each job's unobserved quality. Considering the large number of substitutes and regular teachers working in the consortium, it is unlikely that many substitutes, especially those accepting offers from the randomized call system, are aware of the intricacies of each specific classroom. Second, as seen in table 1.1, 98% of jobs are eventually accepted. With such a high fill rate the threat of not finding a substitute is quite low, which significantly lowers the incentive to implement such a strategy. Finally, concerned teachers have the more effective option of specifically requesting a substitute or compiling a list of teacher-preferred substitutes.

1.5 Results

Table 1.4 reports estimated RE-probit coefficients and their standard errors clustered at the substitute level for four alternative specifications. Columns 1 and 2 estimate the baseline RE-probit model using all offers (observations). The only difference between the two is the presence of a half day-hours interaction term in the former, which is negative and statistically significant at the 5% level. Surprisingly, the column 1 coefficient on hours is positive and statistically significant at 5%. The half day-hours coefficient is larger in magnitude, however, implying that the marginal effect of hours is negative for half-day jobs and positive for full-day jobs. The model estimated in column 2 assumes that the marginal effect of hours is identical for both full-day and half-day jobs and precisely estimates a zero hours coefficient. The hours coefficients in column 1 are quite small as well, suggesting that the true effect of job length on the labor-supply decisions of substitute teachers is quite small. The half-day dummy coefficient cannot be directly interpreted because half-day status cannot change while holding hours constant; coherent average partial effects (APE) will be discussed shortly. There are essentially no differences in the remaining coefficient estimates between columns 1 and 2.
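As a concrete illustration of the baseline estimator behind these coefficients, the sketch below writes the Butler and Moffitt (1982) random-effects probit log-likelihood with Gauss-Hermite quadrature over the sub-day effect, as described in section 1.4.2. It is a minimal sketch under my own simplifying choices (a generic design matrix and a log-parameterized σ_ω), not the code used to produce table 1.4.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def re_probit_negloglik(params, y, X, groups, n_nodes=12):
    """Butler-Moffitt RE-probit negative log-likelihood: integrate the sub-day
    effect out of the product of probit terms via Gauss-Hermite quadrature."""
    k = X.shape[1]
    beta, sigma = params[:k], np.exp(params[k])         # sigma = sd of omega_sd
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    omega = np.sqrt(2.0) * nodes                         # change of variables
    q = 2.0 * y - 1.0                                    # +1 accept, -1 reject
    xb = X @ beta
    neg_ll = 0.0
    for g in np.unique(groups):
        m = groups == g
        probs = norm.cdf(q[m][:, None] * (xb[m][:, None] + sigma * omega[None, :]))
        lik = (weights @ np.prod(probs, axis=0)) / np.sqrt(np.pi)
        neg_ll -= np.log(max(lik, 1e-300))
    return neg_ll

# Hypothetical usage, with y = accept indicators, X = offer characteristics
# (including a constant), and groups = sub-day identifiers:
# start = np.zeros(X.shape[1] + 1)
# fit = minimize(re_probit_negloglik, start, args=(y, X, groups), method="BFGS")
```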
Several offer characteristics have relatively large and statistically significant coefficients, including commute time, school type, MDE-assigned school grades, special education, certification, preferred-list status, and the time-of-call variables. Two sets of covariates included in the models are excluded from table 1.4 in the interest of brevity. First is a set of day-of-job indicators that omits Wednesday; only the Friday coefficient is statistically significant and reported in table 1.4. Second is a set of month-of-job indicators that omits October as the reference point. The fall and early winter month coefficients are statistically insignificant. The coefficients on the spring months (March, April, May, and June) are all about -0.30 and statistically significant at the 1% level.

Columns 3 and 4 of table 1.4 estimate the baseline specification of column 1 separately for certified and non-certified substitutes, respectively. A likelihood ratio test strongly rejects the hypothesis that the model's parameters are identical for certified and non-certified substitutes.[11] There are several noticeable differences in coefficients, particularly commute time, where the certified coefficient is more than three times larger than that for non-certified substitutes. Other large differences are found for school type, school quality, and subjects including foreign language and special education.

Footnote 11: The LR test statistic was formed by taking the log likelihood of the unrestricted model to be the sum of the log likelihoods from columns 3 and 4. The resulting LR statistic has a p-value well below 0.0001.

Columns 1–3 of table 1.5 report APE for the RE-probit models estimated in columns 1, 3 and 4 of table 1.4. These APE are comparable to the LPM coefficients reported in columns 4 and 5 of table 1.5 and were computed following Wooldridge (2010, p. 613), exploiting the fact that the conditional expectation of (1.7) can be written as

$$E\left(A_{sdt} \mid p_{sdt} = 1, x_{sdt}, z_s, r_d, j_{sdt}, \omega_{sd}\right) = \Phi\!\left(\frac{\gamma^A x_{sdt} + \lambda z_s + \delta r_d + b j_{sdt}}{\left(1 + \sigma_\omega^2\right)^{0.5}}\right). \qquad (1.8)$$

The value of the RHS of (1.8), averaged across all offers, provides an estimate of the predicted acceptance probability and is reported at the bottom of table 1.5.[12] Precise definitions of the estimated APE are provided in appendix 1.3. The APE standard errors were computed by taking the standard deviation of 50 bootstrapped APE estimates. The bootstrap procedure resampled with replacement at the substitute level, utilizing all observations from the chosen substitute. Resampling at the substitute level produces standard errors that are robust to substitute-level clustering and that are asymptotically equivalent to the usual robust "sandwich" standard error estimates (Cameron & Trivedi, 2005).

Footnote 12: APE were computed by averaging across all offers (observations). It is worth pointing out that different APE might be considered, however. For example, we might average across jobs because low-quality (frequently rejected) jobs are overrepresented in the sample. Similarly, averaging across substitutes might be useful to the extent that high-opportunity cost substitutes (frequent rejecters) are overrepresented in the sample. An alternative to computing APE at all is to simply scale the probit coefficients reported in table 1.4 by values ranging from zero to 0.4 (the range of possible values of the normal pdf).
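A minimal sketch of the APE calculation implied by equation (1.8), in the spirit of Wooldridge (2010, p. 613): scale the index by (1 + σ_ω²)^0.5, compute the implied acceptance probabilities, and difference (or differentiate) with respect to the covariate of interest before averaging across offers. Function and argument names are mine.

```python
import numpy as np
from scipy.stats import norm

def predicted_acceptance(X, beta, sigma_omega):
    """Acceptance probabilities implied by equation (1.8)."""
    return norm.cdf(X @ beta / np.sqrt(1.0 + sigma_omega ** 2))

def ape_binary(X, beta, sigma_omega, col):
    """APE of a dummy covariate: average change in the predicted acceptance
    probability when the indicator in column `col` is switched from 0 to 1."""
    X1, X0 = X.copy(), X.copy()
    X1[:, col], X0[:, col] = 1.0, 0.0
    return np.mean(predicted_acceptance(X1, beta, sigma_omega)
                   - predicted_acceptance(X0, beta, sigma_omega))

def ape_continuous(X, beta, sigma_omega, col):
    """APE of a continuous covariate: average derivative of equation (1.8)."""
    scale = np.sqrt(1.0 + sigma_omega ** 2)
    return np.mean(norm.pdf(X @ beta / scale)) * beta[col] / scale
```

Averaging the predicted probabilities across offers reproduces the predicted acceptance rate reported at the bottom of table 1.5; averaging across jobs or substitutes instead would yield the alternative APE mentioned in footnote 12.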
The half-day APE reported in column 1 of table 1.5 indicates that half-day jobs are 1.8 percentage points less likely to be accepted than full-day jobs. The preference for full-day jobs is about one percentage point higher among certified substitutes than non-certified substitutes. The overall APE of half-day job hours on the acceptance probability is -0.003 and does not vary with certification status. It is not significantly different from zero, and the standard error of 0.003 suggests that this is a precisely estimated "zero effect" of hours on the acceptance decision. The full day-hours APE is larger in magnitude, positive, and even more precisely estimated. Although statistically different from zero, at 0.007, a half-hour (one standard deviation) increase in job length only raises the acceptance probability by one third of one percentage point, suggesting that job length has no economically significant impact on the offer-acceptance decisions made by substitute teachers.

Commute time is measured in one-way hours, so a fifteen-minute increase lowers the acceptance rate by about 1.4 percentage points. This is a substantial effect given that the overall predicted acceptance rate is 0.12. The effect of commute time is substantially larger for certified substitutes than for non-certified substitutes: for certified substitutes, a 15-minute increase in one-way commute time lowers the predicted acceptance rate by over four percentage points. A likely explanation for this difference is that certified substitutes' opportunity cost of time is greater.

Regarding school type, overall, high-school jobs are two percentage points more likely to be accepted than elementary and middle-school jobs. This effect is statistically significant at the 1% level. Among certified substitutes, elementary-school jobs are nearly three percentage points less likely to be accepted, while non-certified substitutes react similarly to offers from elementary schools and middle schools. The preference for high schools is driven entirely by non-certified substitutes, however, who are 2.7 percentage points more likely to accept high-school jobs. Both types of substitutes are significantly less likely to accept jobs in charter schools, with a certified APE of about -0.05 and a non-certified APE of about -0.03. The charter-school result is interesting, especially because student demographics within the schools are being controlled for. If it is not the students that make these jobs less desirable, one possibility is that these jobs are more structured and require greater effort from the substitute teacher; another is that these jobs provide substitutes with fewer networking opportunities. Finally, it is important when interpreting these results to remember that the certified substitutes are called first and are therefore choosing from a different job-quality distribution than are non-certified substitutes.
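The standard errors attached to these APE come from the 50-replication substitute-level cluster bootstrap described above. A minimal sketch of that procedure, assuming a hypothetical `ape_fn` that maps a resampled data set to an APE estimate and a row-indexable `data` array:

```python
import numpy as np

def cluster_bootstrap_se(ape_fn, data, cluster_ids, reps=50, seed=0):
    """Substitute-level cluster bootstrap: resample substitutes with
    replacement, keep all of a sampled substitute's offers, re-estimate the
    APE, and take the standard deviation across replications."""
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster_ids)
    estimates = []
    for _ in range(reps):
        draw = rng.choice(ids, size=len(ids), replace=True)
        rows = np.concatenate([np.flatnonzero(cluster_ids == i) for i in draw])
        estimates.append(ape_fn(data[rows]))
    return np.std(estimates, axis=0, ddof=1)
```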
Relative to the reference group of highest-achieving A schools, jobs at both B and C schools are about 1.5 percentage points less likely to be accepted, and this effect is statistically significant at the 1% level. Jobs at the lowest-graded schools observed in the consortium, those earning a D, are 6.6 percentage points less likely to be accepted than jobs at A schools. In magnitude, the effects are slightly larger among non-certified substitutes. The school-grade effects for certified substitutes are similar to the overall APE, but imprecisely estimated. Having controlled for schools' achievement levels, it is interesting to note that the student-demographic variables are mostly insignificant. There also appear to be some subtle differences between certified and non-certified preferences for student type. For example, the APE of percent black suggests that a 10-percentage-point increase in black enrollment lowers the probability that a certified substitute will accept the offer by one percentage point but raises the probability that a non-certified substitute will accept the offer by a little more than half a percentage point, and these effects are marginally statistically significant. Again, part of this might be due to the fact that certified substitutes are called first.

The raw call-system data contains about 70 unique subject descriptions that I aggregated into 14 broad groups. The 13 indicator variables are strongly jointly significant, with English/reading serving as the omitted reference group because these subjects generally require less subject-specific knowledge than, for example, math or science. Only a few subjects are individually statistically significant, however. Large, negative, statistically significant effects were found on the art/gym/music, special education, and "other" indicators. These results suggest that substitutes generally preferred academic to non-academic subjects, but had little subject-specific preference within these broad groups (footnote 13). The overall negative effect of each of these seemingly diverse groups of subjects ranges from 0.02 to 0.03 in absolute value. These subjects share some common characteristics, however, notably that they each require some type of specific training and increased attentiveness on the part of the substitute. The latter is particularly true of gym and special-education classes. The large and significant negative effect of foreign-language classrooms for certified substitutes, coupled with a larger APE of the art/gym/music group, suggests that certified substitutes are particularly averse to accepting jobs in subjects that are foreign to them.

Footnote 13: Similar effects were found on art, gym, and music when each was assigned its own indicator variable. The "other" category includes agriculture, English as a second language, family life, home economics, life skills, and speech therapy.

Certified substitutes are about four percentage points more likely to accept an offer, according to the APE of certified in column 1, and this is precisely the difference in average predicted acceptance rates for certified and non-certified substitutes found at the bottom of columns 2 and 3. The APE of the list-status indicators, at two to three times the average predicted acceptance rate, are quite large. While it is tempting to infer from this that forging personal relationships between substitutes and regular teachers is a surefire way to increase offer-acceptance rates, remember that part of this positive effect results from the call order's dependence on list status, which inflates the estimated effect. The same applies to the certified dummy because of certification's role in the call order.
Friday jobs are about two percentage points less likely to be accepted than jobs on any other day of the week, and this is true for both certified and non-certified substitutes. This is likely the result of some combination of increased demand for substitutes on Fridays, higher opportunity costs for substitutes on Fridays, and poorer student behavior on Fridays.

Finally, recall the statistical significance and large magnitudes of the time-of-call coefficients in table 1.4. The day-of and time-of-call interactions complicate the calculation of scalar APE. Instead, I plot the average acceptance rate as a function of time of call in figure 1.3, which is easier to interpret. Three separate curves are plotted, representing the overall, certified, and non-certified responses to time of call. The offer time-acceptance gradient is essentially flat for all substitutes for offers made more than one day in advance, which is intuitive because this is too early to worry about not receiving additional offers and because average offer quality is unlikely to vary across time within days this far in advance of the job. The gradient is upward sloping on the day before, suggesting that the reservation utility is decreasing with time, as predicted by the search model. This effect is greater for non-certified substitutes, which again is intuitive because the non-certified subs are called last and thus might worry more than certified subs about whether additional offers will arrive.

For day-of offers, however, two things change. First, the gradient becomes downward sloping, which suggests that offer quality decreases rapidly with time on the morning of the job and that this negative effect dominates the search effect observed for day-before offers. One reason for this is that jobs still unfilled on the morning of the job have been thoroughly picked over; another is that morning-of offers are more likely to have been requested by the regular teacher that same morning, making these jobs less desirable because the teacher has not left a lesson plan and the students have not been prepared for the absence. Second, the certified gradient is now much steeper than the non-certified gradient, perhaps because certified substitutes have a stronger aversion to low-quality jobs.

Columns 4 and 5 of table 1.5 report RE- and FE-LPM estimates, respectively. The LPM estimates, for the most part, are similar in sign and magnitude to the probit APE in column 1. Additionally, the linear RE and FE estimates are themselves quite similar, which suggests that the calling system is conditionally random. Finally, an advantage of the linear models is that standard errors robust to two-way clustering are easily computed (Cameron et al., 2006). The probit standard errors are one-way clustered at the substitute level, which is important because the unobserved sub-day tastes for subbing and opportunity costs of subbing are likely correlated across days within substitutes. But if there are important unobserved job characteristics, proper inference must also account for this second source of clustering. The two-way standard errors reported in square brackets are quite similar to the one-way clustered standard errors reported in parentheses for the linear models.
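For reference, the two-way cluster-robust variance estimator of Cameron, Gelbach, and Miller (2006) combines one-way clustered variance matrices by inclusion-exclusion:

\widehat{V}_{\text{two-way}} = \widehat{V}_{\text{substitute}} + \widehat{V}_{\text{job}} - \widehat{V}_{\text{substitute}\,\cap\,\text{job}},

where the final term clusters on the intersection of the two dimensions, which here is essentially the individual offer because a substitute receives at most one offer for a given job.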
In the RE estimates, on average, the two-way standard errors are 12% larger than the one-way standard errors. The similarity between the one-way and two-way clustered standard errors in the linear models is reassuring for the interpretation of the one-way probit-model standard errors.

The overall mass-point APE, reported in column 1 of table 1.6, is the weighted average of the mass point-specific APE reported in columns 2 – 4, which are arranged in descending order of predicted acceptance probability. Equation (1.8) is unnecessary for the calculation of the mass-point APE because the estimated mass-point locations can be plugged directly into the RHS of equation (1.7). There is no practical difference between the RE-probit APE in column 1 of table 1.5 and the MP-probit APE in column 1 of table 1.6, which indicates that the results are robust to the assumed heterogeneity distribution.

The mass-point locations and corresponding location probabilities are provided at the bottom of table 1.6, where we see that 12% of offers go to substitutes with a 59% chance of accepting the offer, 48% go to substitutes with a 9% chance of accepting, and a remarkable 40% go to substitutes who will almost certainly reject any offer they receive. This last result was foreshadowed in table 1.3, where 43% of calls were made to substitutes who rejected six or more day-d offers on the way to not working on day d. With nearly half of offers going to substitutes who have no intention of accepting, or even listening to, the offer, the overall APE is effectively attenuated toward zero. The mass-point APE are useful, then, because they describe the preferences of substitutes "at the margin" of accepting an offer. The APE in columns 2 and 3 are about four and two times larger than the overall APE, respectively, suggesting that substitutes at the margin are significantly more responsive to offer characteristics than implied by the overall APE discussed in table 1.5. Policy makers seeking to redistribute substitutes or substitute quality across schools may be particularly interested in the APE of column 2, because these are the substitutes who are significantly "at risk" of working on a given day.
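As a quick check, the location probabilities and mass point-specific acceptance rates reported at the bottom of table 1.6 reproduce the overall average predicted acceptance probability:

0.12 \times 0.59 \;+\; 0.48 \times 0.09 \;+\; 0.40 \times 0.0001 \;\approx\; 0.114,

consistent with the overall value of 0.11 reported in column 1 of table 1.6.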
1.6 Conclusions

This paper has used data on accepted and rejected job offers to estimate a sequential binary-choice model of substitute teachers' daily labor supply. A variety of non-wage job characteristics were found to significantly affect the offer-acceptance probability, including commute time, school type, school quality, subject, day of job, and time of offer. Higher-paying, longer jobs were preferred to lower-paying, shorter jobs. Job length, conditional on daily pay, was a notable non-factor in substitutes' decision making, as were student demographics conditional on school achievement level. Future work might probe the wage elasticity by experimentally varying daily pay or by rigorously analyzing the impact of a wage change.

The basic results of the paper are of general interest for at least three reasons. First, the research potential of pseudo-random automated calling systems is displayed, both as a source of exogenous variation and as a collector of high-quality data. Second, economists study how individuals make decisions, and this paper has provided a unique glimpse into the determinants of a fundamentally important decision: when and where to work. Indeed, the analysis highlights the multitude of factors in addition to hours and wages that enter the decision-making process. Finally, the results may contribute to the regular-teacher literature more generally. Exogenous variation in important factors such as commute time, student achievement, and student demographics is typically nonexistent in studies of teacher attrition and sorting across schools.

The main results suggest several potentially welfare-enhancing substitute-teacher policies. First, the call-order algorithm might be adjusted to offer jobs to nearby substitutes first. This policy would decrease the number of calls made by the call system and increase substitutes' preparation time for jobs. More importantly, if schools routinely call the same set of substitutes first, these substitutes will repeatedly work in the same schools. Doing so will provide these substitutes with specific human capital with regard to schools' policies, layout, and individual students' needs. Similarly, substitutes would accumulate social capital with the administration, faculty, and students. A lack of both types of capital is often seen as a challenge to successful substitute teaching (Coverdill & Oulevey, 2007).

Second, a variety of methods might be implemented to attract certified substitutes to underperforming schools and improve the equity of the distribution of substitute-teacher quality. One solution is to pay compensating wage differentials in the low-achieving schools that stand to benefit the most from attracting higher-quality substitutes. Such a policy can be budget neutral, and can even decrease expenditures, if the compensating wage differential is created by decreasing the daily wages of preferred jobs. Similarly, the observed preferences for commute time and time of offer can be exploited in the calling-system algorithm to direct substitutes, or a subset of substitutes, towards particular schools or classrooms. There are, however, potentially complicated general-equilibrium effects of such policy changes that must be considered.

Finally, regarding the external validity of these findings, the estimated partial effects on the acceptance probability are likely small relative to the corresponding national or overall effects. This is because over 98% of substitute requests in the consortium were ultimately filled, yet substitute teachers nevertheless exhibited strong preferences over a variety of job characteristics. It stands to reason, then, that in labor markets with excess demand (substitute-teacher shortages) substitutes would be even choosier when accepting jobs. Still, it would be useful to apply a similar empirical approach to call-system data from other substitute-teacher labor markets to verify the robustness of these results.
30 CHAPTER 1 APPENDICIES 31 APPENDIX 1.1 CHAPTER 1 TABLES 32 Table 1.1: Mean Job Characteristics 1 0.02 10.99 (15.56) 0.32 0.36 5.76 (1.84) $11.16 (2.19) 0.35 NeverAccepted 2 1 47.21 (46.25) 0.74 0.40 5.41 (1.86) $11.89 (3.23) 0.36 Half-day Jobs 3 0.03 11.40 (15.04) 0.32 1 3.43 (0.52) $12.05 (3.14) 0.28 Full-day Jobs 4 0.02 10.75 (15.85) 0.32 0 7.09 (0.53) $10.66 (1.09) 0.40 Full-Day, Hours < 5 5 0.09 18.53 (26.76) 0.38 0 4.63 (0.22) $16.23 (0.75) 0.22 0.36 0.28 0.36 0.04 0.51 0.39 0.09 0.01 0.19 (0.15) 0.10 (0.13) 0.04 (0.04) 0.31 0.36 0.33 0.07 0.42 0.42 0.14 0.02 0.23 (0.18) 0.13 (0.18) 0.05 (0.04) 0.46 0.25 0.29 0.02 0.57 0.36 0.07 0.00 0.19 (0.15) 0.10 (0.12) 0.05 (0.03) 0.30 0.30 0.40 0.04 0.47 0.41 0.11 0.01 0.19 (0.16) 0.10 (0.14) 0.04 (0.04) 0.29 0.49 0.22 0.06 0.52 0.34 0.14 0.00 0.25 (0.15) 0.12 (0.13) 0.05 (0.04) 0.19 0.19 0.19 0.20 0.23 0.08 0.11 0.07 0.10 0.64 0.18 0.17 0.23 0.20 0.21 0.20 0.20 0.17 0.20 0.24 0.08 0.14 0.45 0.14 0.20 All Jobs Never accepted Times offered First offer was day of Half day Hours Hourly wage Rural district School Characteristics Elementary Middle High Charter Grade A Grade B Grade C Grade D % lunch % black % Hispanic Day of job Monday Tuesday Wednesday Thursday Friday N 8,566 214 3,126 5,440 139 Notes: Standard deviations of non-binary variables are given in parentheses. Time of first offer is measured in days prior to the job beginning. For example, 0 means the offer was made on the morning of the job, 1 the day before, and so on. 33 Table 1.2: Mean Offer Characteristics Job Length Certification Status All Offers Half Full Certified Non-cert. 1 2 3 4 5 Accepted 0.08 0.07 0.08 0.10 0.07 Half day 0.38 1 0 0.45 0.35 Hours 5.67 3.43 7.03 5.47 5.74 (1.85) (0.52) (0.66) (1.91) (1.83) Hourly wage $11.27 $12.07 $10.79 $11.18 $11.31 (2.41) (3.36) (1.36) (1.82) (2.58) Lead days 1.61 1.73 1.53 2.37 1.35 (4.88) (5.05) (4.78) (6.06) (4.38) Day-of job 0.49 0.47 0.50 0.35 0.54 Day-before job 0.36 0.35 0.36 0.42 0.34 One-way commute min. 19.41 18.89 19.73 19.24 19.47 (14.75) (14.88) (14.66) (12.60) (15.41) One-way commute miles 14.62 14.07 14.95 14.67 14.60 (14.50) (14.54) (14.47) (12.26) (15.19) Certified 0.25 0.30 0.22 1 0 Worked at least once 0.93 0.93 0.93 0.94 0.93 Teacher’s list 0.01 0.00 0.01 0.01 0.00 School’s list 0.06 0.06 0.05 0.05 0.06 Rural district 0.34 0.27 0.38 0.26 0.37 Elementary 0.38 0.46 0.33 0.65 0.29 Middle 0.30 0.27 0.32 0.21 0.33 High 0.32 0.27 0.35 0.14 0.38 Charter 0.07 0.06 0.08 0.06 0.08 Grade A 0.47 0.51 0.44 0.60 0.42 Grade B 0.38 0.38 0.39 0.29 0.41 Grade C 0.14 0.10 0.16 0.10 0.15 Grade D 0.01 0.01 0.01 0.01 0.02 % lunch 0.23 0.22 0.23 0.23 0.22 (0.19) (0.18) (0.19) (0.18) (0.19) % black 0.13 0.13 0.13 0.12 0.13 (0.17) (0.17) (0.18) (0.16) (0.18) % Hispanic 0.05 0.05 0.05 0.05 0.05 (0.04) (0.04) (0.04) (0.04) (0.04) Monday 0.18 0.17 0.19 0.18 0.18 Tuesday 0.16 0.15 0.17 0.17 0.16 Wednesday 0.18 0.23 0.14 0.19 0.17 Thursday 0.19 0.20 0.19 0.21 0.19 Friday 0.29 0.25 0.31 0.26 0.30 N 94,106 35,645 58,461 23,823 70,283 Notes: Standard deviations of non-binary variables are given in parentheses. Lead days measure the days prior to the job that the offer is made. For example, 0 means the offer was made on the morning of the job, 1 the day before, and so on. 
34 Table 1.3: Daily Offers Received and Daily Selectivity of Substitutes Day-d Non-workers Day-d Workers Offers Sub-days % sub-days % of offers Sub-days % sub-days % of offers 1 9,552 37.87 11.98 4,027 57.46 28.01 2 4,952 19.64 12.42 1,408 20.09 19.59 3 3,055 12.11 11.5 646 9.22 13.48 4 2,036 8.07 10.22 366 5.22 10.18 5 1,662 6.59 10.42 179 2.55 6.22 6 1,159 4.6 8.72 131 1.87 5.47 7 791 3.14 6.95 79 1.13 3.85 8 576 2.28 5.78 53 0.76 2.95 9 410 1.63 4.63 38 0.54 2.38 10 251 1 3.15 16 0.23 1.11 11-20 695 2.76 11.69 57 0.81 5.42 > 20 81 0.3 2.55 8 0.09 1.37 Total 25,220 100 100 7,008 100 100 Notes: Only 7,008 sub-days are observed for working substitutes because jobs accepted by substitutes with non-Michigan zip codes are dropped from the sample. There were also 82 cases in which a substitute worked two non-overlapping half-day jobs on the same day. 35 Table 1.4: RE-Probit Coefficients NonCertified 1 2 3 4 0.3963 -0.0596 0.6465 0.3767 (0.2160)* (0.0833) (0.4873) (0.2420) -0.0981 . -0.1325 -0.0943 (0.0475)** (0.1022) (0.0540)* 0.0646 0.0322 0.1055 0.0604 (0.0280)** (0.0248) (0.0540)* (0.0339)* -0.5356 -0.5361 -1.2762 -0.3615 (0.2159)** (0.2162)** (0.4519)*** (0.2258) -0.0055 -0.0014 0.0224 0.0039 (0.0550) (0.0550) (0.1015) (0.0624) -0.0455 -0.0431 -0.2197 -0.0179 (0.0551) (0.0552) (0.1245)* (0.0601) 0.1980 0.2004 -0.0830 0.2943 (0.0640)*** (0.0641)*** (0.1472) (0.0752)*** -0.3536 -0.3459 -0.4384 -0.3526 (0.1217)*** (0.1212)*** (0.2363)* (0.1303)*** 0.0169 0.0172 0.0002 0.0159 (0.0087)* (0.0087)** (0.0113) (0.0129) -0.1451 -0.1468 -0.0956 -0.2064 (0.0498)*** (0.0498)*** (0.0839) (0.0588)*** -0.1459 -0.1500 -0.1126 -0.2392 (0.0773)* (0.0773)* (0.1442) (0.0872)*** -0.8619 -0.8902 -0.2353 -1.1220 (0.3779)** (0.3775)** (0.6443) (0.4619)** 0.0266 0.0285 -0.0297 0.3270 (0.2568) (0.2567) (0.5128) (0.2810) 0.2949 0.3216 -0.8487 0.5996 (0.3242) (0.3242) (0.6502) (0.3965) -1.2323 -1.2631 -1.9631 -0.7467 (0.9328) (0.9318) (1.7965) (1.1357) -0.0345 -0.0358 0.1288 -0.0816 (0.1048) (0.1048) (0.2570) (0.1167) 0.0771 0.0790 0.2596 -0.0755 (0.0902) (0.0903) (0.2416) (0.1075) 0.0367 0.0404 0.2249 -0.1383 (0.0934) (0.0934) (0.2463) (0.0989) 0.0428 0.0465 0.1237 -0.0753 (0.0848) (0.0847) (0.2376) (0.0844) 0.1296 0.1327 0.3029 -0.0635 (0.1194) (0.1192) (0.2704) (0.1368) 0.0445 0.0418 0.3477 -0.0166 (0.0863) (0.0865) (0.3232) (0.0729) All Half day Half-day*Hours Hours One-way commute Rural district Elementary school High school Charter school Student-Teacher ratio Grade B Grade C Grade D Percent lunch program Percent black Percent Hispanic Pre-K/Kindergarten First/Second Grade Third/Fourth Grade Fifth/Sixth Grade Seventh/Eighth Grade Math All 36 Certified Table 1.4, Continued Science Social Studies Art/Gym/Music Business/Technology Foreign Language Special Education Other Certified Male Teacher’s List School’s List % Teacher’s Lists % School’s Lists Friday job Time of call (T – t) Day-of offer Day-before offer Day of*(T – t) Day before*(T – t) Observations Sub-days (RE) Substitutes (clusters) Log likelihood rho 0.0818 (0.0775) -0.0203 (0.0885) -0.2644 (0.0755)*** -0.0752 (0.0779) -0.0278 (0.0893) -0.1760 (0.0796)** -0.1955 (0.1052)* 0.3692 (0.1343)*** 0.0583 (0.1212) 2.0205 (0.1832)*** 1.3836 (0.1134)*** -0.5716 (1.9553) 0.1853 (0.5722) -0.2119 (0.0505)*** -0.0007 (0.0001)*** -1.1566 (0.1393)*** 0.9938 (0.2232)*** 0.1074 (0.0141)*** -0.0420 (0.0097)*** 0.0824 (0.0775) -0.0198 (0.0887) -0.2580 (0.0753)*** -0.0740 (0.0780) -0.0302 (0.0895) -0.1749 (0.0796)** -0.1938 (0.1053)* 0.3703 (0.1344)*** 0.0587 (0.1213) 2.0284 
(0.1831)*** 1.3887 (0.1136)*** -0.5771 (1.9540) 0.1821 (0.5729) -0.2101 (0.0504)*** -0.0007 (0.0001)*** -1.1500 (0.1395)*** 1.0054 (0.2234)*** 0.1069 (0.0141)*** -0.0425 (0.0098)*** 94,106 32,228 771 -21,587 0.685 94,106 32,228 771 -21,590 0.686 37 0.1917 (0.2855) -0.2560 (0.3097) -0.4309 (0.3039) 0.0089 (0.3937) -0.6902 (0.3135)** -0.0892 (0.3139) -0.2030 (0.3351) 0.0565 (0.0736) 0.0307 (0.0818) -0.2454 (0.0700)*** -0.0833 (0.0707) 0.0097 (0.0866) -0.1787 (0.0768)** -0.2086 (0.1101)* Yes No 0.3112 (0.2379) 1.6924 (0.3370)*** 0.8878 (0.1849)*** -2.1378 (2.7874) -1.4233 (1.3497) -0.1748 (0.0850)** -0.0006 (0.0002)*** -2.0669 (0.2447)*** 0.4098 (0.3566) 0.1962 (0.0261)*** -0.0201 (0.0156) 0.0133 (0.1392) 2.2188 (0.2241)*** 1.5859 (0.1434)*** 0.4711 (2.4342) 0.3614 (0.6441) -0.2141 (0.0630)*** -0.0006 (0.0002)*** -0.7591 (0.1702)*** 1.2544 (0.2951)*** 0.0734 (0.0165)*** -0.0504 (0.0129)*** 23,823 9,037 195 -6,751 0.647 70,283 23,191 576 -14,667 0.697 Table 1.4, Continued Notes: Standard errors, in parentheses, are robust to clustering at the substitute level. ***, **, and * indicate statistical significance at 1, 5, and 10 percent. Coefficients for all covariates are reported with the exception of month dummies and day-of-week dummies other than Friday. The rho statistic is the percentage of unobserved variation due to the sub-day random effect. The RE probits were fit using 12-point adaptive quadrature, which is the preferred approximation method when rho is relatively large (Rabe-Hesketh et al., 2005). The coefficient estimates are stable when the number of quadrature points is increased. 38 Table 1.5: Average Partial Effects All 1 -0.0176 (0.0031)*** RE Probits Certified 2 -0.0247 (0.0074)*** Non-Cert. 3 -0.0149 (0.0036)*** Half-day hours -0.0032 (0.0028) -0.0032 (0.0072) -0.0030 (0.0034) Full-day hours 0.0068 (0.0019)*** 0.0142 (0.0054)** 0.0058 (0.0016)*** One-way commute -0.0544 (0.0112)*** -0.1611 (0.0335)*** -0.0326 (0.0134)*** Rural district -0.0006 (0.0034) 0.0028 (0.0086) 0.0004 (0.0033) Elementary school -0.0046 (0.0048) -0.0283 (0.0089)*** -0.0017 (0.0038) High school 0.0208 (0.0044)*** -0.0101 (0.0109) 0.0278 (0.0050)*** Charter school -0.0325 (0.0084)*** -0.0495 (0.0201)*** -0.0292 (0.0099)*** Student/teach. 
ratio 0.0017 (0.0005)*** 0.00003 (0.0010) 0.0015 (0.0009)* Grade B school -0.0147 (0.0024)*** -0.0120 (0.0069)* -0.0187 (0.0031)*** Grade C school -0.0145 (0.0041)*** -0.0140 (0.0109) -0.0208 (0.0043)*** Grade D school -0.0663 (0.0173)*** -0.0296 (0.0482) -0.0712 (0.0213)*** % lunch program 0.0024 (0.0137) -0.0038 (0.0386) 0.0294 (0.0146)** % black 0.0304 (0.0202) -0.1058 (0.0591)* 0.0560 (0.0254)** -0.1262 (0.0610)** -0.2527 (0.1747)* -0.0691 (0.0732) Half day % Hispanic 39 RE-LPM All 4 -0.0037 (0.0122) [0.0161] -0.0007 (0.0030) [0.0033] 0.0014 (0.0014) [0.0020] -0.0568 (0.0139)*** [0.0140]*** 0.0019 (0.0032) [0.0037] -0.0022 (0.0030) [0.0037] 0.0146 (0.0040)*** [0.0045]*** -0.0233 (0.0076)*** [0.0087]*** 0.0009 (0.0005)* [0.0006] -0.0098 (0.0028)*** [0.0033]*** -0.0139 (0.0045)*** [0.0051]*** -0.0506 (0.0185)*** [0.0218]** 0.0141 (0.0150) [0.0172] 0.0196 (0.0186) [0.0213] -0.0755 (0.0471) [0.0548] FE-LPM All 5 -0.0095 (0.0126) [0.0133] 0.0004 (0.0032) [0.0032] 0.0009 (0.0014) [0.0015] -0.0677 (0.0139)*** [0.0139]*** 0.0018 (0.0028) [0.0029] -0.0005 (0.0030) [0.0031] 0.0130 (0.0038)*** [0.0039]*** -0.0221 (0.0080)*** [0.0083]*** 0.0002 (0.0005) [0.0005] -0.0079 (0.0025)*** [0.0026]*** -0.0133 (0.0043)*** [0.0044]*** -0.0348 (0.0172)** [0.0178]* 0.0149 (0.0136) [0.0144] 0.0015 (0.0172) [0.0176] -0.0327 (0.0451) [0.0461] Table 1.5, Continued Math 0.0048 (0.0043) 0.0487 (0.0218)** -0.0011 (0.0035) Science 0.0087 (0.0046)* 0.0258 (0.0268) 0.0056 (0.0054) Social Studies -0.0021 (0.0053) -0.0303 (0.0203) 0.0032 (0.0053) Art/Gym/Music -0.0251 (0.0030)*** -0.0482 (0.0189)** -0.0212 (0.0048)*** Business/Tech. -0.0073 (0.0052) 0.0017 (0.0364) -0.0073 (0.0044)* Foreign Language -0.0024 (0.0049) -0.0705 (0.0188)*** 0.0014 (0.0052) Special Education -0.0171 (0.0042)*** -0.0112 (0.0155) -0.0157 (0.0053)*** Other -0.0184 (0.0062)*** -0.0238 (0.0221) -0.0178 (0.0060)*** Certified 0.0398 (0.0076)*** Yes No 0.0057 (0.0076) 0.0414 (0.0309) 0.0012 (0.0071) Teacher’s List 0.3327 (0.0335)*** 0.3091 (0.0548)*** 0.3541 (0.0355)*** School’s List 0.1999 (0.0145)*** 0.1393 (0.0229)*** 0.2217 (0.0161)*** Friday job -0.0208 (0.0031)*** -0.0218 (0.0068)*** -0.0188 (0.0031)*** Average predicted A Observations Sub days Substitutes Jobs 0.115 (0.005)*** 94,106 32,228 771 8,566 0.14 (0.01)*** 23,823 9,037 195 5,796 0.10 (0.005)*** 70,283 23,191 576 7,343 Male 40 -0.0034 (0.0052) [0.0064] 0.0028 (0.0051) [0.0061] 0.0003 (0.0061) [0.0071] -0.0192 (0.0043)*** [0.0052]*** -0.0075 (0.0048) [0.0061] -0.0055 (0.0055) [0.0069] -0.0134 (0.0048)*** [0.0056]** -0.0156 (0.0058)*** [0.0075]** 0.0542 (0.0167)*** [0.0167]*** 0.0151 (0.0139) [0.0139] 0.3103 (0.0397)*** [0.0413]*** 0.1545 (0.0168)*** [0.0184]*** -0.0134 (0.0060)** [0.0066]** . -0.0082 (0.0045)* [0.0049]* -0.0001 (0.0050) [0.0052] 0.0026 (0.0060) [0.0062] -0.0169 (0.0041)*** [0.0043]*** -0.0076 (0.0047) [0.0050] -0.0074 (0.0052) [0.0054] -0.0129 (0.0047)*** [0.0048]*** -0.0167 (0.0059)*** [0.0060]*** . 94,106 32,228 771 8,566 94,106 32,228 771 8,566 . 0.2674 (0.0501)*** [0.0511]*** 0.1151 (0.0146)*** [0.0154]*** . . Table 1.5, Continued Notes: Substitute-clustered standard errors are reported in parentheses. The probit APE standard errors are based on 50 bootstrap replications. The square brackets in columns 4 and 5 contain two-way substitute-job clustered standard errors (Cameron et al., 2006). ***, **, and * indicate statistical significance at 1, 5, and 10 percent. 
41 Table 1.6: Mass-point Probit APE APE APE | MP 1 APE | MP 2 APE | MP 3 1 2 3 4 Half day -0.0168 -0.0567 -0.0205 -0.0000 (0.0153) (0.0594) (0.0279) (0.0017) One-way commute -0.0558 -0.1859 -0.0684 -0.0001 (0.0208)*** (0.0795)** (0.0651) (0.0035) Elementary school -0.0052 -0.0174 -0.0064 -0.0000 (0.0046) (0.0177) (0.0092) (0.0005) High school 0.0190 0.0619 0.0238 0.0000 (0.0071)*** (0.0264)** (0.0224) (0.0014) Charter school -0.0315 -0.1177 -0.0354 -0.0001 (0.0104)*** (0.0342)*** (0.0479) (0.0012) Grade B school -0.0131 -0.0443 -0.0160 -0.0000 (0.0043)*** (0.0166)*** (0.0159) (0.0008) Grade C school -0.0132 -0.0460 -0.0158 -0.0000 (0.0081)* (0.0273)* (0.0306) (0.0009) Grade D school -0.0625 -0.2649 -0.0620 -0.0001 (0.0259)*** (0.0680)*** (0.1651) (0.0018) Percent lunch program 0.0006 0.0019 0.0007 0.0000 (0.0266) (0.1045) (0.0494) (0.0026) Percent black 0.0254 0.0845 0.0311 0.0001 (0.0317) (0.1257) (0.0563) (0.0035) Percent Hispanic -0.1110 -0.3701 -0.1362 -0.0003 (0.0858) (0.3308) (0.1623) (0.0088) Math 0.0036 0.0120 0.0045 0.0000 (0.0048) (0.0192) (0.0102) (0.0005) Science 0.0088 0.0286 0.0111 0.0000 (0.0048) (0.0185) (0.0094) (0.0004) Social Studies -0.0043 -0.0144 -0.0052 -0.0000 (0.0055) (0.0222) (0.0091) (0.0005) Art/Gym/Music -0.0244 -0.0876 -0.0284 -0.0000 (0.0060)*** (0.0211)*** (0.0335) (0.0012) Special education -0.0172 -0.0599 -0.0204 -0.0000 (0.0065)*** (0.0229)*** (0.0260) (0.0010) Teacher’s list 0.3660 0.4019 0.6502 0.0144 (0.0358)*** (0.0374)*** (0.0480)*** (0.0552) School’s list 0.2077 0.3563 0.3415 0.0012 (0.0170)*** (0.0416)*** (0.0474)*** (0.0183) Friday job -0.0217 -0.0752 -0.0259 -0.0000 (0.0050)*** (0.0185)*** (0.0133)* (0.0005) MP Location 3 MP 0.12 -1.74 -5.06 Location Probability . 0.12 0.48 0.40 Avg. Predicted A 0.11 0.59 0.09 0.0001 (0.0098)*** (0.0461)*** (0.0485)* (0.0020) Notes: The model’s coefficients are reported in table A1. Standard errors are based on 50 bootstrap replications. ***, **, and * indicate statistical significance at 1, 5, and 10 percent. The three mass-point probit model was fit using the GLLAMM Stata package (Rabe-Hesketh et al., 2002). The likelihood function of a four mass-point model did not converge. 42 APPENDIX 1.2 CHAPTER 1 FIGURES 43 Figure 1.1: Job-length distribution Percent of jobs 40 30 20 10 0 0 2 4 Hours 6 8 Notes: Bins are 20 minutes wide. The vertical line indicates the half-full cutoff. 44 Figure 1.2a: Day of-offer time distribution 10 Percent of offers 8 6 4 2 0 5 a.m. 6 7 8 9 10 11 Offer time noon 1 p.m. 2 Note: Bins are 15 minutes wide. Figure 1.2b: Day before-offer time distribution 10 Percent of offers 8 6 4 2 0 4 p.m. 5 6 7 8 9 Offer time Note: Bins are 15 minutes wide. 45 10 11 p.m. 3 Predicted acceptance probability Figure 1.3: Offer time-acceptance probability gradient .2 .15 .1 .05 0 Two days before 4 p.m. 11 p.m. Offer time All Day before 4 p.m. Cert. Day of 11 p.m. 5 a.m. Non-Cert. Note: The gradients were computed using the RE-probit coefficients reported in columns 1, 3, and 4 of table 1.4. 46 4 p.m. APPENDIX 1.3 AVERAGE PARTIAL EFFECT (APE) DEFINITIONS 47 Let N represent the total number of offers (observations). 
In the baseline model, which assumes   2 that the sub-day random effect ωsd is ~ Normal 0,  , the APE of a continuous variable k is   γx sdt  λz s  δd d  bt sdt  APEk = 0.5   0.5 2 2 sdt 1  N 1   1    k  N         (A1.1) and the APE of a binary variable k is        k k   γx sdt   k 1  xsdt  λz s  δdd  bt sdt   γx sdt   k xsdt  λz s  δdd  bt sdt     APEk = N        . (A1.2) 0.5 0.5 2 2    sdt 1   1   1          1 N     For the half-day APE, equation (A1.2) is modified as follows: For the first CDF in A1.2, if the offered job was a half-day, the vector xsdt is left as is. If a full-day job was offered, in addition to adding γhalf, Hsdt is changed to 3.4, the mean half-day job length. Similarly, for the second CDF in (A1.2), if a full-day job was offered the vector xsdt is left as is. If a half-day job was offered, γhalf is subtracted, and Hsdt is changed to the full-day mean job length (7.2). In the three-mass point model, ωsd takes the values ω1, ω2, and ω3 with probabilities π1, π2, and π3, respectively. The APE of a continuous variable k, at mass point j, is APEk,j = N 1 k N    γxsdt  λz s  δdd  bt sdt   j . (A1.3) sdt 1 The APE of binary variables, and the adjustment for the half-day APE, are computed in similar fashion. The overall APE of variable k is simply the weighted average of the mass point-specific APE: APEk  3   j APEk , j . j 1 48 (A1.4) APPENDIX 1.4 MASS-POINT PROBIT COEFFICIENTS 49 Table A1: Mass-point Probit Coefficients Half day 0.3248 (0.2102) Half-day*Hours -0.0817 (0.0451)* Hours 0.0581 (0.0274)** One-way commute h -0.5414 (0.1941)*** Rural district 0.0041 (0.0515) Elementary school -0.0507 (0.0538) High school 0.1812 (0.0619)*** Charter school -0.3356 (0.1171)*** Student-Teacher ratio 0.0165 (0.0084)* Grade B -0.1291 (0.0476)*** Grade C -0.1328 (0.0743)* Grade D -0.7756 (0.3461)** % lunch program 0.0055 (0.2472) % black 0.2460 (0.3081) % Hispanic -1.0780 (0.8961) Pre-K/Kindergarten -0.0285 (0.1031) First/Second Grade 0.0858 (0.0853) Third/Fourth Grade 0.0432 (0.0889) Fifth/Sixth Grade 0.0310 (0.0812) Sev./Eighth Grade 0.1569 (0.1193) Math 0.0350 (0.0825) 50 Table A1, Continued Science Social Studies Art/Gym/Music Business/Technology Foreign Language Special Education Other Certified Male Teacher’s List School’s List % Teacher’s Lists % School’s Lists Friday job Time of call (T – t) Day-of offer Day-before offer Day of*(T – t) Day before*(T – t) Observations Sub-days (RE) Substitutes (clusters) Log likelihood 0.0840 (0.0759) -0.0419 (0.0874) -0.2515 (0.0721)*** -0.0779 (0.0752) -0.0465 (0.0884) -0.1729 (0.0777)** -0.2036 (0.1008)** 0.3629 (0.1221)*** 0.0453 (0.1112) 2.3162 (0.2312)*** 1.4084 (0.1193)*** -0.5942 (1.4940) -0.0819 (0.5900) -0.2168 (0.0462)*** -0.0006 (0.0001)*** -1.1166 (0.1236)*** 0.8010 (0.2185)*** 0.0992 (0.0122)*** -0.0340 (0.0095)*** 94,106 32,228 771 -21,587 51 Table A1, Continued Notes: Standard errors, in parentheses, are robust to clustering at the substitute level. ***, **, and * indicate statistical significance at 1, 5, and 10 percent. Coefficients for all covariates are reported with the exception of month dummies and day-of-week dummies other than Friday. The model was fit using the GLLAMM package (Rabe-Hesketh et al., 2002). 52 CHAPTER 1 REFERENCES 53 CHAPTER 1 REFERENCES Bucior, C. 2010. The Replacements. New York Times, January 2. Butler, J.S., and R. Moffitt. 1982. 
A computationally efficient quadrature procedure for the One Factor Multinomial Probit Model. Econometrica 50(3): 761-764. Cameron, A.C., J.B. Gelbach, and D.L. Miller. 2006. Robust inference with multi-way clustering. NBER Technical Working Paper No. 327. Cameron, A.C., and P.K. Trivedi. 2005. Microeconometrics: Methods and Applications, New York, NY: Cambridge Univ. Press. Card, D., and A. Krueger. 1992. Does school quality matter? Returns to education and the characteristics of public schools in the United States. Journal of Political Economy 100(1): 1-40. Clotfelter, C. F., H. Ladd, and J. Vigdor. 2009. Are teacher absences worth worrying about in the U.S.? Education Finance and Policy 4(2): 115-149. Coverdill, J., and P. Oulevey. 2007. Getting contingent work: Insights into on-call work, matching processes, and staffing technology from a study of substitute teachers. Sociological Quarterly 48(3): 533-557. Das, J., S. Dercon, J. Habyarimana, and P. Krishnan. 2007. Teacher shocks and student learning: Evidence from Zambia. Journal of Human Resources 42(4): 820-862. Devine, T. J., and N. M. Kiefer. 1991. Empirical Labor Economics: The Search Approach. Oxford: Oxford Univ. Press. Dolton, P. 2006. Teacher supply. In Handbook of the Economics of Education, vol. 2, ed. E. Hanushek and F. Welch, 1079-1161. Amsterdam: North-Holland. Dorward, J., A. Hawkins, and G. Smith. 2000. Substitute teacher availability, pay, and influence on teacher professional development: A national survey. ERS Spectrum Summer: 40-46. Ehrenberg, R. G., R. A. Ehrenberg, D. Rees, and E. Ehrenberg. 1991. School district leave policies, teacher absenteeism, and student achievement. Journal of Human Resources 26(1): 72105. Goette, L., D. Huffman, and E. Fehr. 2004. Loss aversion and labor supply. Journal of the European Economic Association 2(2-3): 216-228. 54 Hanushek, E., J. Kain, and S.G. Rivkin. 2004. Why public schools lose teachers? Journal of Human Resources 39(2): 326-354. Hanushek, E., and S.G. Rivkin. 2006. Teacher quality. In Handbook of the Economics of Education, vol. 2, ed. E. Hanushek and F. Welch., 1019-1078. Amsterdam: North-Holland. Hanushek, E., and L. Woessmann. 2008. The role of cognitive skills in economic development. Journal of Economic Literature 46(3): 607-668. Heckman, J. J., and B. Singer. 1984. A method for minimizing the impact of distributional assumptions in econometric models for duration data. Econometrica 52(2): 271-320. Henderson, E., N. Protheroe, and S. Porch. 2002. Developing an Effective Substitute Teacher Program. Arlington, VA: Educational Research Service. Jacobson, S. 1988. The effects of pay incentives on teacher absenteeism. Journal of Human Resources 24(2): 280-287. Kiefer, N., and G. Neumann. 1981. Individual effects in a nonlinear model: Explicit treatment of heterogeneity in the empirical job-search model. Econometrica 49(4): 965-979. Layton, J. 2005. How MapQuest works. HowStuffWorks.com. http://www.howstuffworks.com/mapquest.htm (accessed March 27, 2011). MDE. See Michigan Department of Education. Michigan Department of Education: Office of Educational Assessment and Accountability. 2007. Guide to Reading the Michigan School Report Cards (2007 Edition). Miller, R., R. Murnane, and J. Willett. 2008a. Do teacher absences impact student achievement? Longitudinal evidence from one urban school district. Educational Evaluation and Policy Analysis 30(2): 181-200. ———. 2008b. Do worker absences affect productivity? The case of teachers. International Labour Review 147(1): 71-89. 
Mortensen, D. 1986. Job search and labor market analysis. In Handbook of Labor Economics, vol. 2, ed. O. Ashenfelter and R. Layard, 849-919. Amsterdam: North-Holland. Rabe-Hesketh, S., A. Skrondal, and A. Pickles. 2002. Reliable estimation of generalized linear mixed models using adaptive quadrature. The Stata Journal 2: 1-21. ———. 2005. Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics 128(2): 301-323. Rogers, J. 2001. There’s no substitute. Work and Occupations 28(1): 64-90. 55 Roza, M. 2007. Frozen assets: Rethinking teacher contracts could free billions for school reform. January 2007 Education Sector Reports. Strauss, R. 2003. The Market for Substitute Classroom Teachers in South-West Pennsylvania in 2001-2002. Pittsburgh, PA: The Pittsburgh Foundation. Wooldridge, J.M. 2010. Econometric Analysis of Cross Section and Panel Data, 2nd Ed. Cambridge, MA: MIT Press. 56 CHAPTER 2 GOING THE EXTRA MILE 57 2.1 Introduction The cost of commuting influences a variety of economic decisions. It is a fundamental parameter in urban-economic spatial models of firm and household location (Muth, 1969) and is central to cost-benefit analyses of proposed transportation-infrastructure investments (Small & Verhoef, 2007, p.181). 14 A typical goal of the latter is to spur economic development in suburban and rural areas by decreasing the commute time to jobs in neighboring cities (So et al., 2001). The effect of commuting on labor supply enters firms’ hiring decisions as well by shaping the optimal ―spatial search radius‖ over which to recruit (Russo et al., 1996). Finally, both explicit and implicit commuting costs are fixed costs of working that may influence laborforce participation (Cogan, 1981). Black et al. (2010), for example, find significantly lower labor-force participation rates among married women in cities that have longer-than-average commutes. Identifying the effect of commuting on labor supply is complicated by a fundamental endogeneity problem, however: commute time is jointly determined by individuals’ job and residence choices. 15 I estimate the causal effect of commute time on daily labor supply by studying a unique labor market in which workers are subject to daily exogenous variation in commute time and are not constrained in their daily labor supply decisions. 16 To motivate my approach, consider how 14 Muth (1969) is a classic text on urban-economic spatial models, in which the marginal cost of commuting determines the size and shape of cities (p. 90-93), the wage-commute gradient, and the housing price-commute gradient (p. 71). The spatial model may be out of equilibrium, however; Stutzer and Frey (2008) find that individuals with longer commutes systematically report being unhappier than those with shorter commutes. 15 The existence of two-worker households increases the problem’s complexity. 16 This approach is very much in the spirit of the work on intertemporal labor supply that, for similar reasons, focuses on the cab-driver and stadium-vendor labor markets (Camerer et al., 1997; Oettinger, 1999). 58 the causal effect of commuting on daily labor supply would be identified experimentally. We would begin by holding each subject’s residential location and transportation mode fixed. Then, on a daily basis, individuals would be asked to choose between accepting a job at a randomly determined location and not working. 
If all jobs are identical aside from location, it is straightforward to use the observed decisions to estimate the marginal effect of commute time on daily labor supply. One particular substitute-teacher labor market is similar to this hypothetical ideal experiment. Each day, a consortium comprised of ten Michigan school districts and over 75 schools makes hundreds of take-it-or-leave-it job offers to substitute teachers via an automated calling system. Importantly, the call system makes offers in a conditionally random order, which generates exogenous variation in offer quality and commute time across substitute teachers. The call-system’s randomness solves the usual endogeneity problem of commute times being correlated with individuals’ unobserved tastes. Unlike in the ideal experiment, however, all jobs and schools (locations) are not identical. These confounding factors must be ―partialed out‖ by controlling for a variety of job characteristics and school fixed effects. The empirics are based on an optimal decision rule that is motivated by a job-search model of substitute teachers’ expected utility maximization. I use data on accepted and rejected offers to estimate sequential binary-choice models of substitutes’ offer acceptance decisions. The main results suggest that a fifteen-minute increase in one-way commute time decreases the acceptance probability by three percentage points and that the elasticity of the acceptance probability with respect to commute time is about -0.4. I also investigate whether certain day and individual characteristics influence the disutility of commuting. On average, the negative effect of commute time is about 36% larger 59 when the 6:00 a.m. temperature is below 20 degrees Fahrenheit but rainfall over the past 24 hours is not associated with commuting preferences. Fuel prices appear to increase the cost of commuting, but this effect is imprecisely estimated. Similarly, both women and substitutes who are certified as regular teachers tend to have a larger, but imprecisely estimated, aversion to commuting. Estimating the model separately for men and women, however, yields the interesting results that women are significantly more averse to commuting in cold temperatures and are significantly more responsive to changes in fuel prices than men: the negative effect of commute time for women is over 50% larger on frigid days and a one-dollar increase in the price per gallon of fuel increases women’s negative effect of commute time by 44%. 2.2 Literature Review There are both explicit and implicit private costs of commuting. 17 There are two types of explicit commuting costs. The first is monetary: the American Automobile Association (AAA) reports average vehicle costs of $0.42 to $0.66 per mile, about $0.10 of which is for fuel (AAA, 2009). The second includes potential physical and mental-health costs of commuting (Koslowsky et al. 1995). The primary implicit cost is forgone time: the average one-way commute in the US in 2004 was about 25 minutes, up from about 20 minutes in 1980 (Pisarski, 2006). Two recent studies have directly investigated the effect of commuting on labor supply. Using panel-data methods, Gutiérrez-i-Puigarnau and Van Ommeren (2010) find that longcommute German workers work fewer but longer days per week than individuals with shorter commutes, but find no difference between the groups in total weekly hours. Similarly, applying 17 Though not relevant here, there are also public (external) costs of commuting (e.g. Lemp & Kockelman, 2008). 
60 an instrumental-variables procedure to Spanish time-use data, Gimenez-Nadal and Molina (2011) find that an extra hour of commute time leads to a 35 minute increase in the length of the workday. Both of these studies are subject to the criticism that workers are likely to be constrained in their labor supply choices, however (Dickens & Lundberg, 1993; Kahn & Lang, 1991). Rather than estimating the effect of commute time on outcomes such as labor supply, the early empirical commuting literature used discrete-choice models of commuters’ transportationmode choices to estimate the willingness to pay per hour of commute time (WTP). On average, these studies find the cost of an hour of commute time to be about 50% of the hourly wage (Small & Verhoef, 2007, p. 52). A well-documented problem with this method is the implicit assumption that time spent travelling in one mode (e.g., a car) is equivalent to time spent traveling in another (e.g., a bus). Stated-preference survey data, in which respondents rank or choose from a hypothetical set of commute-wage bundles, has been proposed as a solution to the ―comparability‖ problem inherent in the transportation-mode choice literature mentioned above. Calfee et al. (2001) and Calfee and Winston (1988) evaluate stated-preference data using various econometric methods and find a significantly lower WTP of about 20 percent of the hourly wage. However, experimentalists have repeatedly found a ―hypothetical bias‖ in answers to subjective and hypothetical questions, questioning the validity of estimates based on stated-preference data (Harrison, 2006). A third approach to estimating WTP employs structural job-search models that treat job offers as wage-commute bundles. For example, Van Ommeren et al. (2000) find a WTP of about 50% of the hourly wage. Van Ommeren and Fosgerau (2009) extend the basic search model to 61 incorporate job-switching behavior and find that the total commuting cost, including both time and monetary costs, is 200% of the hourly wage. Commuting costs, particularly the opportunity cost of time, potentially vary across both observed and unobserved individual attributes. So et al. (2001) find that commuters, relative to non-commuters, earn higher wages, are younger, and are disproportionately male. One possible explanation of the latter is that women must stay closer to home because they are active in home production. This view is supported by Van den Berg and Gorter (1997) who find that women with children have significantly higher WTP than those without children, but no significant difference in WTP between men and childless women. Inclement weather might also increase the marginal cost of commuting for a variety of reasons. Snow, for example, has been shown to decrease commuters’ welfare by decreasing travel speed (Sabir et al., 2010). Despite not estimating a formal WTP, I contribute to the existing literature on commuting preferences in several ways. First, to the best of my knowledge this is the first paper to exploit arguably exogenous variation in the actual commute times faced by workers making laborsupply decisions in real time. Thus I am able to estimate the effect of commute time on labor supply without making the stronger assumptions required by the fixed-effects and instrumentalvariables estimators used in previous work. Second, my analysis is immune to the criticism that commute time-labor supply elasticities are attenuated because substitute teachers are unconstrained in their daily labor supply decisions. 
18 Third, I am able to make inroads on the longstanding question of whether there are gender-specific commuting preferences because the call system’s randomness eliminates the confounding dual worker-household problem. Finally, 18 Again, in this regard, the substitute-teacher labor market is similar to the cab-driver and stadium-vendor labor markets studied by Camerer et al. (1997) and Oettinger (1999). 62 the presence of daily variation in commute times allows me to investigate the role that cost shifters such as fuel prices and inclement weather play in commuting decisions. 2.3 Labor-Market Environment and Data 2.3.1 The Intermediate School District I investigate the daily labor supply decisions of substitute teachers in a consortium of ten adjacent and autonomous school districts in Michigan. The consortium consists of over 75 schools located across approximately 600 square miles. Substitute teachers live both within and outside the consortium. Membership in the consortium enables districts to enjoy economies of scale in training substitute teachers and in operating an automated calling system. The calling system is used to satisfy regular teachers’ requests for substitute teachers who were not filled personally. The subsequent analysis focuses solely on jobs filled by the automated calling system, accounting for about half of the consortium’s annual teacher absences. At any time prior to the start of a job a regular teacher may request a substitute through the automated calling system. After the regular teacher has specified the job’s characteristics the calling system sequentially offers the job to available substitute teachers until either the job is accepted or it begins. Job offers are made over the phone for same-day jobs beginning at 5:00 a.m. and for jobs one or more days in the future between 4:00 p.m. and 11:00 p.m. each day of the week. Substitutes are not penalized for rejecting an offer, but once a substitute accepts a job he or she will not receive any conflicting offers. Additionally, substitute teachers are prohibited from returning to previously-rejected offers. Substitutes receive offers from all consortium schools and are called in a conditionally random order. There are two conditioning variables: the substitutes’ regular-teacher certification 63 status and ―preferred-list status.‖ Each regular teacher and school enters a list of ―preferred substitutes‖ in the system. The phone calls made to available substitute teachers state the start and end time, teacher name, subject, and school of the job being offered. The job’s wage is not explicitly stated because it is a known function of job length. Daily pay in this labor market, for all substitutes and for all schools, is binary: half days pay $40 and full days pay $75, where half days are jobs lasting less than four hours and 21 minutes. Job length is ultimately at the discretion of the regular teacher making the request but also influenced by school- and subjectspecific schedules. 2.3.2 Data The primary labor-supply data comes directly from the automated calling system’s computer and includes every offer made during the 2006-07 school year. Over 100,000 offers regarding nearly 9,000 jobs (unique substitute requests) were made. Remarkably, 98% of these jobs were successfully filled. 
In addition to the job attributes mentioned above, I observe the day and time at which each offer was made, whether or not the offer was accepted, and a unique identifier of each substitute along with his or her certification status, preferred-list status, gender, and home zip code. 19 Measures of commute time and distance from the center of each substitute’s home zip code to each school’s street address were computed using MapQuest.com. 20 MapQuest uses geocoding technology to assign approximate latitude-longitude coordinates to each school’s 19 Age is only observed for about 70% of the substitutes, so is not used in the analysis. 20 The use of centroids was necessitated by privacy requirements that prevent access to the substitutes’ home addresses. Commutes, therefore, are measured with error because substitutes can live anywhere within the zip code. Implications of this measurement error are discussed in section 2.5.2. 64 street address and to the centroid of each five-digit zip code. An algorithm then searches a database of roadmaps and evaluates potential routes. The algorithm chooses an optimal route based on driving distance, posted speed limits, the number of left-hand turns, and the number of intersections. The travel distance (in miles) and estimated travel time for the optimal route are then reported. See Layton (2005) for additional details and references. I further augment the call-system data with information on daily weather and fuel-prices, both of which potentially influence the marginal cost of commuting. To account for inclement weather I use daily measures of rainfall and temperature from the U.S. National Climatic Data Center’s ―Land Surface Data.‖ 21 While snowfall is likely the most important weather-related shifter of commute costs for the general labor force (Sabir et al., 2010), it is of little interest in the present context because schools in the consortium close when winter weather creates hazardous driving conditions. To control for fuel costs, which represent as much as one quarter of per-mile vehicle costs (AAA, 2009), I use county-level daily average fuel prices that are based on daily samplings of about 100 gas stations located in the consortium’s MSA. 22 2.3.3 Descriptive Statistics Table 2.1 provides summary statistics of one-way commutes and some other offer characteristics. The average offered one-way commute was about 13 miles or 18 minutes, which is slightly shorter than the U.S. national average of about 25 minutes (Pisarski, 2006). About 16% of offers were made to substitutes residing within the offering district. Of the offer 21 The ―Land Surface Data‖ is collected daily at 6 a.m. by a CO-OP station in the center of the consortium and is publicly available from the National Climatic Data Center at http://www.ncdc.noaa.gov/oa/climate/stationlocator.html. 22 This is proprietary data that was purchased from the private market-research firm Oil Price Information Service (OPIS). 65 recipients, 34% were male and 24% were certified as regular teachers. The average accepted commute was about 2.5 miles and 3 minutes shorter than the average rejected commute, suggesting that longer commutes were less likely to be accepted regardless of whether commutes are measured in time or distance. In the subsequent analysis I focus only on commute times because the time and distance measures in my data are highly correlated and produce nearly identical estimates of the elasticity of the offer-acceptance probability with respect to commute length. 
Footnote 23: Van Ommeren and Fosgerau (2009) note that commute time and commute distance are typically not equivalent measures and discuss the relative merits of each. The correlation coefficient is 0.96 in my data, however, which likely results from numerous accessible highways and a general lack of traffic congestion in the consortium.

Similarly, on average, substitutes residing within the offering district are eight percentage points more likely to accept (footnote 24). Figure 2.1 depicts the distributions of offered, accepted, and rejected commute times. The majority of offered commutes are shorter than 30 minutes. Comparing the kernel density estimates of the accepted-offer and rejected-offer distributions again suggests that accepted offers tend to be associated with shorter commutes.

Footnote 24: This may have to do with preferences for the neighborhood school rather than commute time, however, a possibility that is investigated in the sensitivity analysis of section 2.5.2.

Some substitutes' home zip codes provided in the data imply one-way commute times longer than two hours, raising the concern that some zip codes are incorrect. I drop suspect zip codes from the subsequent analysis as follows. First, using a zip-code map of the area, I retain all zip codes contiguous to the consortium. Second, I retain all zip codes containing at least one active substitute that are contiguous to the area defined in step 1. I repeat step 2, retaining all zip codes that are contiguous to the area defined in the previous step and that contain at least one active substitute teacher, until the region is encapsulated by a ring of zip codes containing no active substitutes. The result is a contiguous region of 48 zip codes, 11 of which are located within the consortium.

Figure 2.2 plots rainfall over the 24-hour period ending at 6:00 a.m. and the daily temperature at 6:00 a.m. for each school day. Rainfall is only reported for days when at least one consortium school was open. Figure 2.3 plots the county-level average daily price per gallon of regular unleaded fuel over the course of the school year. Fuel prices decrease in the month of September and remain relatively stable until the end of December. Fuel prices then fall below $2.00 in January before steadily increasing over the remainder of the school year.

2.4 Econometric Model

2.4.1 Optimal Decision Rule

This section draws heavily upon the model developed in section 1.4 of this volume; for additional details, the interested reader is referred to section 1.4. I assume that substitute teachers maximize expected utility when deciding whether to accept or reject an offer, which in this case is accomplished by following a reservation-utility decision rule: accept if and only if the utility of accepting (U^A) exceeds the expected utility of rejecting (U^R) (footnote 25). U^A is a function of the offer's and recipient's characteristics. U^R is a function of both the recipient's non-subbing alternative (U^N) and expectations of future offers.

Footnote 25: The functioning of the automated calling system is essentially a finite-horizon job-search model with no recall and no on-the-job search, a la Mortensen (1986).

Let T be the last time that an offer to work on a particular day can be made. Rejecting an offer at time T, therefore, is equivalent to choosing the non-subbing alternative on that day; this implies that U^R_T = U^N. For all t less than T, U^R_t can be approximated by the sum of U^N and a nonnegative, monotonically decreasing function of offer time.
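Collecting the pieces above, the decision rule can be written compactly. The function h(·) below is my shorthand for the nonnegative, monotonically decreasing offer-time term and is not notation used in the dissertation; subscripts follow the paper's s (substitute), d (day), j (job), t (offer time) convention:

A_{sdjt} = \mathbf{1}\{\, U^{A}_{sdjt} \ge U^{R}_{sdt} \,\}, \qquad U^{R}_{sdT} = U^{N}_{sd}, \qquad U^{R}_{sdt} \approx U^{N}_{sd} + h(t) \;\; (t < T),

with h(t) ≥ 0, h nonincreasing in t, and h(T) = 0, so the option value of waiting for a better offer shrinks as the last possible offer time approaches.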
Substitute teachers' daily utility is assumed to take the same functional form whether substitute teaching, working elsewhere, or not working at all. Daily utility is a function of non-labor income (Y), labor income (M), hours worked (H), commuting costs (C), and a variety of individual, day, and non-wage job characteristics that are both observed and unobserved (ψ). Formally, let daily utility take the form

U = f(Y) + \alpha M + g(H) - C + \psi. \qquad (2.1)

Footnote 26: Daily pay (M) is valued linearly because there is approximately no income effect of a small change to lifetime earnings (Goette et al., 2004).

Taking a first-order approximation of g, so that H enters the utility function linearly as an observed job characteristic, the utility accruing to substitute s from accepting job j on day d at time t is

U^{A}_{sdjt} = f(Y_{sd}) + \gamma^{A} x_j - C_{sdjt} + \lambda^{A} z_s + b^{A} w_{sdjt} + \delta^{A} r_d + \omega^{A}_{sd} + \varepsilon_{sdjt}, \qquad (2.2)

where x_j is a vector of observed job characteristics including job length, daily pay, and full sets of subject and school dummies; z_s is a vector of observed substitute characteristics including gender, certification status, and preferred-list status; w_{sdjt} is a vector of offer-time variables that is piecewise linear in (T − t); r_d is a vector of day-of-job variables including rainfall, temperature, average fuel price, and full sets of day-of-week and month dummies; ω^A_{sd} is the substitute's unobserved day-specific taste for substitute teaching; and ε_{sdjt} is an offer-specific error term that captures job attributes that are unobserved by the econometrician (i.e., within-school, within-subject variation in classroom quality), substitutes' mood and attention level at the time of offer, and substitutes' preferences for specific jobs or schools (footnote 27).

Footnote 27: Because daily pay is binary, M will be replaced by a half-day indicator. Offer time enters U^A because it might proxy for offer quality in several ways. First, the distribution of offers might worsen over time. Second, a late-arriving offer might indicate that the regular teacher made the request late in the morning and therefore did not have time to prepare a lesson plan for the substitute teacher or to prepare students for the absence. Third, the amount of time the substitute has to prepare for the job might be an important measure of offer quality. Day-of-week variables enter U^A because they may contain information on job quality via student behavior. For example, students may behave differently on rainy or warm days, on Fridays, and towards the end of the school year. The implications of school-specific tastes among substitutes are considered and tested for in the sensitivity analysis of section 2.5.2.

The non-subbing utility U^N_{sd}, which can be interpreted as a substitute's opportunity cost of subbing on day d, depends on observed individual and day characteristics as well as unobserved sub-day-specific non-subbing opportunities ω^N_{sd}. Formally,

U^{N}_{sd} = f(Y_{sd}) + \lambda^{N} z_s + \delta^{N} r_d + \omega^{N}_{sd}. \qquad (2.3)

Combining equations (2.2) and (2.3) with the reservation-utility decision rule yields the probability that an offer will be accepted, conditional on it being received:
The lack of superscripts on the coefficients of z, w, and r and on the unobserved sub-day effect in (2.4) is notational, indicating that only the net effects of these variables on the acceptance probability are identified; specifically, λ = λ^A − λ^N, b = b^A − b^R, δ = δ^A − δ^N, and ω_sd = ω^A_sd − ω^N_sd. The cost of commuting, C_sdjt, will be approximated in the empirics by both linear and quadratic functions of commute time and by commute time interacted with elements of z and r. A final comment regarding equation (2.4) is that non-labor income was differenced out because it is valued identically in both U^A and U^N.[28]

[28] Intuitively, this is a result of consumption smoothing over the life cycle and of preferences that are separable in consumption and leisure. The assumption that non-labor income is valued identically on subbing and non-subbing days can be relaxed entirely by noting that any difference in utility would be sub-day specific and hence incorporated in ωsd.

2.4.2 Estimation

The typical sample-selection problem caused by a lack of data on rejected offers is absent here because all offers, accepted and rejected, are observed. However, the call-system data is a selected sample in the sense that offer-acceptance decisions are only observed when an offer was made (i.e., when psdt = 1). Substitutes who work on day d will, on average, receive fewer offers and have higher values of ωsd than those who do not work on day d, because substitutes do not receive offers that conflict with previously accepted jobs. The resulting negative correlation between ωsd and psdt implies that pooled estimators of (2.4) that fail to account for the presence of ωsd are inconsistent.

Conditional on zs and ωsd, however, the offer-specific error term εsdjt is independent of the selection indicator. This is a direct result of the call system's randomness. Accordingly, conditional on zs and ωsd, missing observations (time periods in which no offer is received) can be considered "missing at random" (Cameron & Trivedi, 2005, p. 926) and the conditioning on an offer being received can be removed from the right-hand side of equation (2.4). This solution to the problem of unbalanced panels in a nonlinear model is similar in spirit to Kiefer and Neumann (1981). The baseline model imposes the following assumptions:

ε_sdjt | (p_sdt = 1, x_j, C_sdjt, z_s, w_sdjt, r_d, ω_sd)  ~  ε_sdjt | (x_j, C_sdjt, z_s, w_sdjt, r_d, ω_sd)  ~  N(0, 1)   (2.5a)

and

ω_sd | (p_sdt = 1, x_j, C_sdjt, z_s, w_sdjt, r_d)  ~  ω_sd | (p_sdt = 1, z_s, r_d)  ~  N(0, σ_ω²).   (2.5b)

Assumption (2.5a) reflects that εsdjt is independent of psdt, conditional on ωsd, as discussed above. Assumption (2.5b) is a direct result of the call system's conditional randomness. Under these assumptions equation (2.4) can be rewritten as

Pr(γ^A x_j − β C_sdjt + λ z_s + b w_sdjt + δ r_d + ω_sd + ε_sdjt > 0 | p_sdt = 1)
   = Φ(γ^A x_j − β C_sdjt + λ z_s + b w_sdjt + δ r_d + ω_sd),   (2.6)

and estimated using the random-effects (RE) probit procedure of Butler and Moffitt (1982).

2.5 Results

2.5.1 Main Results

Table 2.2 reports the estimated RE-probit coefficients, average partial effects (APE), and elasticities of commute time. The APE and elasticities for the RE-probit model are defined in appendix 2.3. The APE and elasticity standard errors were computed by taking the standard deviation of the estimates from 50 bootstrap replications. The bootstrap procedure resampled with replacement at the substitute level, utilizing all observations from each chosen substitute.
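A minimal sketch of this substitute-level (cluster) bootstrap is given below. It uses a pooled probit from statsmodels as a stand-in for the RE probit of Butler and Moffitt (1982), which statsmodels does not provide directly, and every file and column name (offers.csv, accept, commute_hours, sub_id) is a hypothetical placeholder rather than the chapter's actual data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm

def commute_ape(df, xcols, ycol="accept", commute="commute_hours"):
    # Pooled probit stand-in for the RE probit; the APE of commute time is
    # the sample average of the normal density at the fitted index times the
    # commute-time coefficient.
    X = sm.add_constant(df[xcols])
    res = sm.Probit(df[ycol], X).fit(disp=False)
    xb = X.to_numpy() @ res.params.to_numpy()
    return norm.pdf(xb).mean() * res.params[commute]

def cluster_bootstrap_se(df, xcols, cluster="sub_id", reps=50, seed=0):
    # Resample whole substitutes with replacement, keeping every offer made
    # to each drawn substitute, and take the SD of the replicated APEs.
    rng = np.random.default_rng(seed)
    ids = df[cluster].unique()
    apes = []
    for _ in range(reps):
        draw = rng.choice(ids, size=len(ids), replace=True)
        boot = pd.concat([df[df[cluster] == i] for i in draw],
                         ignore_index=True)
        apes.append(commute_ape(boot, xcols))
    return np.std(apes, ddof=1)
```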
Resampling at the substitute level produces standard errors that are robust to substitute-level clustering and that are asymptotically equivalent to the usual robust "sandwich" standard-error estimates (Cameron & Trivedi, 2005). Clustering at the substitute level allows individuals' opportunity costs (ωsd) to be correlated across days. Omitted from table 2.2, but included in its regressions, are the substitute, day, and job characteristics described in section 2.4; the full set of coefficient estimates is reported in table A2.

Column 1 assumes that commute time enters the model linearly. The commute-time coefficient is negative and strongly statistically significant. The APE indicates that a fifteen-minute increase in one-way commute time lowers the acceptance probability by about three percentage points. In elasticity terms, a ten percent increase in commute time lowers the acceptance probability by about four percent. Allowing for a quadratic in commute time does not lead to a meaningful change in the estimated APE or elasticity of commute time. Because the quadratic-term coefficient is statistically insignificant and a likelihood-ratio test fails to reject the null hypothesis that the quadratic term adds no explanatory power, I subsequently treat column 1 as the baseline model.

In column 3, the baseline model is expanded to include several commute-time interaction terms that allow the effect of commute time to vary with day, job, and individual characteristics.[29] The first interactions involve weather. The frigid interaction uses a dummy variable equal to one when the temperature at 6:00 a.m. on the morning of the job was below 20 degrees Fahrenheit. Over 35% of job offers are made on the day before the job, and exactly 50% are made on the morning of the job, suggesting that the majority of offers are received at a time when substitutes have an expectation of the morning-of-job temperature. The frigid interaction effect is statistically significant at the five percent level and magnifies the effect of commuting by 0.045 (36%). One possible explanation for the aversion to driving in the cold is that it is physically uncomfortable; another is safety concerns over icy roads.

[29] I do not report the interaction coefficients because neither the sign nor the statistical significance of an interaction coefficient in a nonlinear model is directly interpretable (Ai & Norton, 2003).

Rainfall over the past 24 hours, measured at 6:00 a.m. on the day of the job, is an imperfect measure because the rain may have ended the previous day. Nonetheless, this is the best measure available, and there are at least two reasons to believe that this noisy measure of rainfall contains useful information. First, the timing is less of an issue for the 35% of offers made the day before because, despite the presence of forecasts, the precise ending time of the rain is uncertain. Second, even when the majority of the rain fell during the previous day, it is possible that roads were still wet at 6:00 a.m. the following day. Regardless, the interaction effect is a precisely estimated zero, suggesting that rainfall does not significantly influence substitutes' commuting preferences.

Fuel prices are the next determinant of the cost of commuting considered in column 3. The interaction term was constructed using the county-level daily (day-of-job) average per-gallon price of regular unleaded gasoline.
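In implementation terms, the frigid dummy and the commute-time interactions can be generated directly from the offer-level data. The sketch below is illustrative only; the column names (temp_6am, rain_24h, gas_price_county, commute_hours) are hypothetical placeholders:

```python
import pandas as pd

# Illustrative construction of the interaction regressors described above.
offers = pd.read_csv("offers.csv")   # hypothetical offer-level file

# Dummy for mornings below 20 degrees Fahrenheit at 6:00 a.m.
offers["frigid"] = (offers["temp_6am"] < 20).astype(int)

# Commute-time interactions; the main effects (frigid, rainfall, fuel price)
# already enter the model through the day-of-job vector r_d.
offers["commute_x_frigid"] = offers["commute_hours"] * offers["frigid"]
offers["commute_x_rain"] = offers["commute_hours"] * offers["rain_24h"]
offers["commute_x_gas"] = offers["commute_hours"] * offers["gas_price_county"]
```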
I use the day-of-job fuel price because past fuel purchases are a sunk cost that should not enter today's decisions, and for forward-looking substitutes today's fuel price is likely to be the best predictor of tomorrow's. The estimated fuel-price interaction effect suggests that a one-dollar increase in the per-gallon price of gasoline increases the negative effect of commute time by 0.027 (about 22%). This fairly large effect is imprecisely estimated, however, and is not statistically significant at traditional confidence levels.[30]

[30] As with rainfall, there is some question as to whether I am using the correct measure of fuel price: for instance, lagged fuel prices may influence substitutes' decision making. The qualitative result of a negative but statistically insignificant effect of fuel price is robust to instead using lagged daily fuel prices or lagged one- or two-week moving averages.

The next two interaction terms in column 3 are job length (in hours) and a half-day dummy. Intuitively, because the marginal opportunity cost of being away from home is presumably increasing with time, we might expect to see a greater aversion to commuting on longer days. Column 3 is consistent with this intuition: a one-hour increase in job length, all else equal, increases the negative effect of commute time by 0.008 (6%), although the effect is not statistically significant. The job-length interaction effect on commuting preferences captures only the increasing marginal cost of being away from home, because the half-day-commute-time interaction holds the effect of daily pay constant. The half-day interaction effect is negative but also imprecisely estimated.

It is well established that, on average, women have shorter commutes than men. Explaining this stylized fact is difficult, however, and is complicated by the fact that many women live in two-worker households. Including a male-commute interaction term allows for a simple test of whether women's aversion to commuting is significantly larger than men's when commute lengths are exogenously determined and not confounded by a joint residential decision. A disproportionate number of female substitutes in the sample are certified as regular teachers, however, so a certified-commute interaction is also included to disentangle gender differences in commuting preferences from those of certified teachers. This is important if, as is likely, certified substitutes have a higher opportunity cost of time than their non-certified counterparts. Both the male and certified interaction effects have the expected sign but are imprecisely estimated. The lack of a statistically significant difference between men and women suggests that the shorter commutes frequently observed among women are not due to inherent differences between the sexes in commuting preferences.

Given the commuting literature's longstanding interest in the differential between male and female commute times, I take this opportunity to further examine gender differences in commuting preferences by estimating the interaction model of column 3 separately for men and women. These results are reported in columns 4 and 5 of table 2.2. A likelihood-ratio test strongly rejects the null that the parameter values of the interaction model are the same for men and women.[31] A few striking results emerge when comparing columns 4 and 5.

[31] The log likelihood of the unrestricted model was computed by summing the log likelihoods of the male-only and female-only models.
First, it appears that the entire aversion to commuting in cold temperatures is driven by women: the negative effect of commute time is about 50% larger for women on frigid mornings, while there is virtually no temperature effect among men. Second, women have a significantly larger aversion to commuting when fuel prices are high: a one-dollar increase in the fuel price increases the partial effect of commute time by about 44%.

2.5.2 Sensitivity Analysis

The measurement error in commute times that results from the use of substitutes' zip-code centroids rather than home addresses is a potential cause for concern. If centroid-based commute times are independent of the measurement error, however, linear probability model (LPM) estimates are consistent (Deaton, 1997, p. 101). Similarly, in the probit model, the presence of a normally distributed measurement-error term that is independent of the model's covariates creates an attenuation bias in the estimated probit coefficients, but not in the estimated APE.[32] One reason that the measurement error might be independent of the centroid-based commute time is that centroid-based commute times represent average commute times of substitutes living within the zip code (Deaton, 1997, p. 101).

[32] This is similar to the "neglected heterogeneity" problem discussed in Wooldridge (2010, pp. 582-4).

Another potential concern is that individuals' commute times are negatively correlated with unobserved tastes for specific jobs if substitutes prefer to work in nearby schools for reasons unrelated to commuting. In terms of the model, the concern is that

ε_sdjt = η_sj + ε̃_sdjt  and  cov(η_sj, C_sdjt) < 0,   (2.7)

where η_sj is an unobserved substitute-job-specific match effect. If equation (2.7) holds, perhaps because substitutes prefer to work in the schools that their children attend or in the schools in which their neighbors work, the baseline estimates discussed above would overstate the aversion to commuting. I show below that this is not the case.

I test for the presence of confounding "neighborhood-school preferences" by estimating the baseline model on two restricted samples: one that excludes within-zip-code offers and offered commutes of less than ten minutes, and a second that excludes offers from the school in which the substitute worked most frequently (the substitute's modal school). These results are reported in columns 6 and 7 of table 2.2. In neither case does the estimated commute APE change in a meaningful way. The estimated coefficient and APE of commute time in column 6 are actually slightly larger in magnitude than their counterparts in the baseline model of column 1, suggesting that "neighborhood-school preferences" are not driving the results.[33] Together, the results in columns 6 and 7 suggest that the baseline results in column 1 are driven neither by substitute-school matching effects nor by measurement error in commute times.

[33] The slight increase in APE observed in the restricted-sample estimates could be caused by nonlinearities in the effect of commute time or by classical measurement error (CME) in commute times. Whether substitutes live on the near or far side of a zip code's centroid is arguably random. It is well known that CME in linear models causes an attenuation bias, but there are no similar analytic results for nonlinear models. Monte Carlo studies, however, have shown that the coefficients in binary-response models are attenuated and that the magnitude of the bias is negatively correlated with the signal-to-noise ratio (Cameron & Trivedi, 2005, p. 919). The signal-to-noise ratio is smaller when the substitute and school are located in the same zip code, so the restricted sample produces estimates that are less susceptible to CME-induced attenuation bias.

To this point the discussion has centered on probit-model estimates. Linear probability models (LPMs) are useful as well, however, for a number of reasons. First, because analytic results on CME-induced attenuation bias exist only for linear models, the robustness of the main results to the sample restriction imposed in column 6 of table 2.2 should be verified in the linear model.
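Before turning to those linear estimates, the two sample restrictions used in columns 6 and 7 can be expressed as simple data filters. The sketch below is illustrative only; the file and column names are hypothetical placeholders:

```python
import pandas as pd

# Sketch of the restricted samples used in columns 6 and 7 of table 2.2.
offers = pd.read_csv("offers.csv")   # hypothetical offer-level file

# Column 6: drop within-zip offers and offered commutes under ten minutes.
col6_sample = offers[(offers["sub_zip"] != offers["school_zip"])
                     & (offers["commute_minutes"] >= 10)]

# Column 7: drop offers from each substitute's modal school, identified here
# as the most frequently accepted school.
modal = (offers[offers["accept"] == 1]
         .groupby("sub_id")["school_id"]
         .agg(lambda s: s.mode().iat[0])
         .rename("modal_school")
         .reset_index())
col7_sample = offers.merge(modal, on="sub_id", how="left")
col7_sample = col7_sample[col7_sample["school_id"]
                          != col7_sample["modal_school"]]
```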
Linear sub-day random-effects estimates on the full sample are provided in column 1 of table 2.3 for comparison's sake, but they are inconsistent due to the endogenously unbalanced nature of the substitute-day panels (Wooldridge, 2010, p. 831).[34] Instead, I take the linear sub-day fixed-effects estimates in column 2 to be the baseline LPM estimates. The LPM estimates are strongly statistically significant and similar in magnitude to the probit APE, albeit slightly smaller. The linear estimates on the restricted sample, reported in column 3, are remarkably similar to the baseline LPM estimates in column 2 and are actually slightly larger, as was observed in the RE-probit results discussed above. Again, there is no evidence that the results are driven by measurement error or by preferences for neighborhood schools.

[34] The difference is that the linear-RE model, unlike the RE probit, does not condition on the random effect.

A second advantage of the LPM is that it is straightforward to compute standard errors that are robust to two-way clustering (Cameron et al., 2006). Two-way clustering might be important if, in addition to the substitute-specific taste for subbing, unobserved job effects (ζj) enter the model. Unobserved job effects might exist because the offers state the regular teacher's name, allowing substitutes to respond to teacher-specific job quality, but this information is not available in the data. The LPM estimates in table 2.3 report one-way, substitute-clustered standard errors in parentheses and two-way, substitute-job-clustered standard errors in brackets. In each case the one-way and two-way standard errors are nearly identical, suggesting that the failure to compute two-way robust standard errors for the RE-probit estimates in table 2.2 does not meaningfully affect statistical inference.

Unobserved job effects would create a more serious problem if ζj were correlated with xj. While the randomness of the call system implies that unobserved job quality is not correlated with commute time itself, it is conceivable that job length, which is under the control of the regular teacher, is correlated with unobserved job quality. For example, the regular teacher in a classroom full of unusually difficult students might systematically make shorter substitute requests to ensure that the position gets accepted. If regular teachers routinely follow this "compensating wage differential" strategy, the unobserved job effect will be positively correlated with job length and will potentially bias the estimated effect of commute time. In this case, two-way sub-day and job fixed effects are required for consistency, but they can only be implemented in the linear model.
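One common way to absorb two high-dimensional sets of fixed effects in a linear model is iterative within-group demeaning (the method of alternating projections). The sketch below illustrates that generic approach under hypothetical variable names; it is not the exact conjugate-gradient estimator used in this chapter:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def demean_two_way(df, cols, fe1="subday_id", fe2="job_id",
                   tol=1e-8, max_iter=500):
    # Alternately subtract sub-day means and job means until the columns
    # stop changing; this sweeps out both sets of fixed effects.
    out = df[cols].astype(float).copy()
    for _ in range(max_iter):
        before = out.to_numpy().copy()
        out = out - out.groupby(df[fe1]).transform("mean")
        out = out - out.groupby(df[fe2]).transform("mean")
        if np.max(np.abs(out.to_numpy() - before)) < tol:
            break
    return out

# Hypothetical usage: regress acceptance on commute time after sweeping out
# sub-day and job effects, clustering by substitute.
# offers = pd.read_csv("offers.csv")
# dm = demean_two_way(offers, ["accept", "commute_hours"])
# res = sm.OLS(dm["accept"], dm[["commute_hours"]]).fit(
#     cov_type="cluster", cov_kwds={"groups": offers["sub_id"]})
```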
Job fixed effects can be included in the model because within-job variation in commute time is created when the same job is offered to substitutes living in different zip codes. As discussed in Abowd et al. (1999), the usual approach of applying OLS to mean-differenced data is infeasible here due to the unbalanced nature of the panels and the high dimensionality of the problem (there are about 9,000 jobs and 32,000 sub-days). Instead, I use the two-way FE estimator of Abowd et al. (2002), the results of which are reported in column 4 of table 2.3.[35] The estimated effect of commute time is slightly smaller than the baseline sub-day FE estimate in column 2, but it remains strongly statistically significant. It is worth noting, however, that including job fixed effects sweeps away a substantial portion of the variation in the data: 1,705 jobs (about 20%) have no variation in commute time because they are only offered to substitutes residing within a single zip code. Furthermore, the LPM in a panel-data setting makes restrictive assumptions of its own on the range of values that the FE can take (Wooldridge, 2010, p. 608).

[35] Abowd et al. (2002) use the iterative conjugate gradient method and sparse matrices to develop the exact two-way FE estimator. I use Ouazad's (2008) A2REG Stata module.

2.6 Conclusion

I used data on the job offers made to substitute teachers by an automated calling system to estimate the causal effect of commute time on labor supply. Substitute teaching is an ideal labor market in which to answer this question because substitutes are both free to make daily labor-supply decisions and subject to daily exogenous variation in commute time. The main result is an offer-acceptance elasticity with respect to commute time of about -0.4. Interestingly, no statistically significant average effects of rainfall or fuel prices on the disutility of commuting were found, although women's commuting preferences were found to vary with fuel prices. Extremely low temperatures do increase the cost of commuting, however, and again this effect is particularly strong among women. While much has been made of the typically shorter commutes of women, I find no evidence that women are inherently more averse to commuting than men.

Because 98 percent of jobs were eventually accepted, there is no substitute-teacher shortage in this particular labor market. Were there a shortage, substitutes would likely be more selective when considering job offers and would exhibit even stronger preferences over commute time. In this sense, the effect of commute time found in this paper can be considered a lower bound. While the generalizability of substitute teachers' preferences to the broader U.S. workforce is an open question, these results may be particularly relevant to two important labor markets: regular-teacher labor markets and contingent labor markets.

Ideally, the willingness to pay (WTP) for reduced commute time would be computed in addition to the reported estimates of the causal effect of commute time on daily labor supply by taking the ratio of the marginal disutility of commuting to the marginal utility of daily pay (i.e., the marginal rate of substitution). In terms of the empirical model, this would simply be the ratio of the commute-time and daily-pay coefficients. I cannot do this convincingly, however, because the positive baseline-model coefficient on job hours reported in table A2 obfuscates the interpretation of the daily-pay coefficient (the half-day dummy).
In other words, I am unable to disentangle the effect of 80 job hours from the effect of daily pay and thus cannot compute the marginal rate of substitution between commute time and daily pay. Generally, the finding that commute time plays an important role in labor supply decisions suggests that employment policies and studies of labor supply ought to seriously consider time spent commuting in addition to hours worked. Similarly, firms ought to take potential employees’ locations seriously in the hiring and recruiting processes. From an education-policy perspective, schools might be advised to actively seek nearby residents to work as substitute teachers, compensate regular teachers who make long commutes to schools in less desirable neighborhoods, or even subsidize housing for regular teachers who choose to live nearby less desirable neighborhoods’ schools. 81 CHAPTER 2 APPENDICIES 82 APPENDIX 2.1 CHAPTER 2 TABLES 83 Table 2.1: Mean Offer Characteristics Offers All Rejected Acceptance rate 0.07 0 Half day 0.37 0.37 (0.48) (0.48) Hours 5.69 5.68 (1.85) (1.85) Wage $11.29 $11.30 (2.43) (2.45) One-way miles 12.63 12.81 (8.71) (8.74) One-way minutes 17.49 17.70 (9.29) (9.29) Offer Recipient Same town 0.16 0.16 Male 0.33 0.33 Certified 0.24 0.24 Accepted 1 0.36 (0.48) 5.79 (1.83) $11.12 (2.15) 10.35 (7.99) 14.87 (8.89) 0.24 0.34 0.32 N 97,205 90,040 7,165 Notes: Standard deviations are provided in parentheses for non-binary variables. 84 Table 2.2: RE-Probit Results Linear C 1 Coefficients One-way hours -1.1943 (0.2530)*** One-way hours2 . Quadratic C 2 Interactions 3 Men 4 Women 5 Drop Short 6 Drop Modal 7 -1.2661 (0.2543)*** -0.9542 (1.101) -1.1333 (0.4049)*** . -0.7869 (1.01) . -1.1535 (0.4229)*** . -1.4571 (0.3444)*** . -1.1806 (0.2853)*** . *** -0.1206 (0.0123)*** -0.4450 (0.0424)*** . -0.1247 (0.012)*** . -0.0980 (0.0208)*** . -0.1367 (0.0178)*** . Hours*Frigid -0.1255 (0.0133)*** -0.3886 (0.0378)*** . . . . . . . . Hours*Cert. . . . . Hours*Half-day . . . . Hours*Job length . . . . Hours*Male . . -0.0698 (0.0224)*** 0.0003 (0.0002) -0.0607 (0.0275)** -0.0267 (0.0334) -0.0291 (0.0426) -0.0018 (0.0117) . . Hours*Gas 0.0221 (0.0352) -0.0004 (0.0006) 0.0589 (0.0359) -0.0858 (0.0761) -0.0558 (0.0907) -0.0127 (0.0202) . -0.1068 (0.0122)*** -0.4224 *** (0.0425)*** . *** Hours*Rain . . Predicted A 0.111 (0.005)*** 97,205 763 0.111 (0.005)*** 97,205 763 -0.0450 (0.0185)** 0.0001 (0.0002) -0.027 (0.0201) -0.0282 (0.0329) -0.0464 (0.0371) -0.0076 (0.0098) 0.0222 (0.0258) 0.111 (0.005)*** 97,205 763 -0.1380 (0.0179)*** -0.5887 (0.0666)*** . 0.117 (0.0075)*** 32,435 217 0.107 (0.0062)*** 64,770 546 0.0953 (0.0053)*** 74,383 667 0.086 (0.0049)*** 84,191 740 Average Effects Commute APE Commute Elast. Observations Subs (clusters) 85 Table 2.2, Continued Sub-days (RE) 32,057 32,057 32,057 9,827 22, 230 25,347 27,928 Log Likelihood -22,007 -22,004 -22,000 -7,205 -14,637 -15,530 -16,708 0.63 0.63 0.63 0.67 0.62 0.63 0.63 Rho Notes: Standard errors, reported in parentheses, are based on 50 bootstrap replications and robust to substitute-level clustering. Bootstraps for column 7 are in progress. In column 6 offered commutes shorter than 15 minutes and within-zip offers are dropped from the analysis, while in column 7 offers from each substitute’s most-frequently-worked-in school are dropped. Definitions of the partial effects are reported in appendix 2.3. All regressions include the full set of control variables described in the text. 
The full set of estimated coefficients in the baseline (column 1) model are reported in table A2. 86 Table 2.3: Linear Probability Model (LPM) Estimates All All 1 2 Commute Coefficient -0.0968 -0.0854 (0.0156)*** (0.0152)*** [0.0156]*** [0.0153]*** Sub-day effect Job effects Random None Fixed None Drop Short 3 -0.0889 (0.0193)*** [0.0193]*** All 4 -0.0601 (0.0098)*** . Fixed None Fixed Fixed Observations 97,205 97,205 74,383 97,205 Substitutes 763 763 667 763 Sub-days 32,057 32,057 25,347 32,057 Jobs 8,950 8,950 8,123 8,950 Notes: The standard errors reported in parentheses are robust to clustering at the substitute level. The standard errors in square brackets are robust to two-way substitute-job clustering. All regressions include the full set of covariates discussed in the text. Column 3 makes the same sample restriction as column 6 in table 2.2. The two-way FE model in column 4 was estimated using the A2REG Stata package (Ouazad, 2008). 87 APPENDIX 2.2 CHAPTER 2 FIGURES 88 Figure 2.1: Commute-time Distributions .05 Density .04 .03 .02 .01 0 0 15 30 45 60 75 90 One-way commute time (minutes) 105 All offers (3 minute bins) Accepted-offer kernel density Rejected-offer kernel density 89 Figure 2.2: Daily Weather Conditions 80 60 100 40 50 20 0 0 10/2/06 12/1/06 2/1/07 Rainfall 3/30/07 Temperature 90 6/1/07 Temperature (Deg. Fahrenheit) Rainfall (hundredths of inches) 150 Figure 2.3: County-level Average Daily Fuel Prices Price per gallon ($) 4.00 3.50 3.00 2.50 2.00 10/1/06 12/1/06 2/1/07 4/1/07 Source: OPIS daily average retail gasoline prices. 91 6/1/07 APPENDIX 2.3 AVERAGE PARTIAL EFFECTS, ELASTICITIES, AND INTERACTION EFFECTS 92 Following Wooldridge (2010, p. 613) the conditional expectation of the acceptance probability is  A  γ x sdjt  Csdjt  λz s  bw sdjt  δrd E Asdjt | psdt  1, x sdjt , Csdjt , z s , w sdjt , rd , sd    0.5 2  1          .   (A2.1) Let N represent the total number of offers (observations), xβ represent the numerator of (A1), and s represent the denominator of (A1). The APE of a continuous variable xk is APEk = k  xi    s  i 1 N   Ns  (A2.2) and the APE of a binary variable xk is APEk = N 1    x    1  xk i k i      s i 1     N     xi  k xik .        (A2.3)    s The average elasticity (E) of continuous variable xk with respect to the acceptance probability is Ek = k N  xi   xi     . s   s   xik  Ns  i 1 (A2.4) The ―average partial interaction effect‖ (APIE) of xjxk when xj and xk are continuous is based on the cross derivative  2 E  Ai | xi  x j xk APIE jk  N 1 N    i 1   , which equals   j +  jk xik   k +  jk xij   ' xi    jk   xi      s  s2  N   s    s    j k    xi    jk xi  j +  jk xi  k +  jk xi  N 1       s3 i 1   s   s   The APIE of xjxk when xj is continuous and xk is binary is 93   .    (A2.5) APIE jk  N 1  E  Ai | xi , xk  1 E  Ai | xi , xk  0   . 
 x j x j i 1  N   94 (A2.6) APPENDIX 2.4 BASELINE RE-PROBIT COEFFICIENTS 95 Table A2: Baseline RE-Probit Coefficients Variable Coefficient Commute hours -1.1943 (0.2530)*** Male 0.1064 (0.1470) Certified 0.4176 (0.2045)** Half-day 0.0782 (0.1175) Frigid 0.1141 (0.0602)* Gas price 0.0426 (0.1742) Rain -0.0002 (0.0008) Hours 0.0743 (0.0353)** Pre-K/Kindergarten -0.0268 (0.1402) First/Second Grade 0.0166 (0.1208) Third/Fourth Grade -0.0017 (0.1380) Fifth/Sixth Grade 0.0228 (0.1165) Sev./Eighth Grade 0.0765 (0.1795) Math 0.0501 (0.1230) Science 0.0980 (0.0832) Social studies -0.0018 (0.1043) Art/Music/Gym -0.2047 (0.0872)** Tech./Computers -0.0214 (0.0917) Foreign Language -0.0718 (0.1064) Special Education -0.1686 (0.1059) Other 0.0262 (0.1054) Variable Monday Tuesday Thursday Friday September November December January February March April May June On Teacher’s List On School’s List Total Teacher Lists Total School Lists Time of call (T-t) Day of (do) Day before (db) db*(T-t) do*(T-t) 96 Coefficient -0.0088 (0.0703) 0.0268 (0.0495) -0.0125 (0.0589) -0.2408 (0.0629)*** -0.0563 (0.1315) -0.1255 (0.0756)* -0.3056 (0.1161)*** -0.1129 (0.1364) -0.3054 (0.1286)** -0.2996 (0.1354)** -0.3237 (0.1702)* -0.3693 (0.2138)* -0.3605 (0.2815) 1.4046 (0.2175)*** 1.4469 (0.1208)*** 3.5683 (1.8846)* 1.2762 (0.9410) -0.0006 (0.0002)*** -0.6458 (0.2505)*** 0.8865 (0.4526)* -0.0373 (0.0183)** 0.0510 (0.0194)*** Table A2, Continued Notes: These are the variables included in the baseline RE-probit model discussed in column 1 of table 2.2. The school-dummy coefficients are not reported, but are strongly jointly significant. Standard errors are robust to substitute-level clustering. 97 CHAPTER 2 REFERENCES 98 CHAPTER 2 REFERENCES AAA. See American Automobile Association. Abowd, J. M., R. H. Creecy, and F. Kramarz. 2002. Computing person and firm effects using linked longitudinal employer-employee data. Technical Paper No. 2002-06, Longitudinal Employer-Household Dynamics, Center for Economic Studies, U.S. Census Bureau. Abowd, J. M., F. Kramarz, and D. N. Margolis. 1999. High wage workers and high wage firms. Econometrica 67(2): 251-333. Ai, C., and E. C. Norton. 2003. Interaction terms in logit and probit models. Economics Letters 80(1): 123-129. American Automobile Association. 2009. Your Driving Costs. Heathrow, FL: AAA. Black, D., N. Kolesnikova, and L. Taylor. 2010. Why do so few women work in New York (and so many in Minneapolis)? Labor supply of married women across U.S. cities. Federal Reserve Bank of St. Louis Working Paper Series: Working Paper 2007-043E. Butler, J.S., and R. Moffitt. 1982. A computationally efficient quadrature procedure for the One Factor Multinomial Probit Model. Econometrica 50(3): 761-764. Calfee, J., and C. Winston. 1998. The value of automobile congestion time: Implications for congestion policy. Journal of Public Economics 69(1): 83-102. Calfee, J., C. Winston, and R. Stempski. 2001. Econometric issues in estimating consumer preferences from stated preference data: A case study of the value of automobile travel time. The Review of Economics and Statistics 83(4): 699-707. Camerer, C., L. Babcock, G. Loewenstein, and R. Thaler. 1997. Labor supply of New York City cabdrivers: One day at a time. Quarterly Journal of Economics 112(2): 407-441. Cameron, A.C., J.B. Gelbach, and D.L. Miller. 2006. Robust inference with multi-way clustering. NBER Technical Working Paper No. 327. Cameron, A.C., and P.K. Trivedi. 2005. 
Microeconometrics: Methods and Applications, New York, NY: Cambridge Univ. Press. Cogan, J. 1981. Fixed costs and labor supply. Econometrica 49(4): 945-964. Dickens, W., and S. Lundberg. 1993. Hours restrictions and labor supply. International Economic Review 34(1): 169–92. 99 Gimnez-Nadal, J. I., and J. A. Molina. 2011. Commuting time and labour supply: A causal effect? IZA Discussion Paper No. 5529. Goette, L., D. Huffman, and E. Fehr. 2004. Loss aversion and labor supply. Journal of the European Economic Association 2(2-3): 216-228. Gutiérrez-i-Puigarnau, E., and J.N. van Ommeren. 2010. Labour supply and commuting. Journal of Urban Economics 68(1): 82-89. Harrison, G. W. 2006. Experimental evidence on alternative environmental valuation methods. Environmental and Resource Economics 34(1): 125-162. Kahn, S., and K. Lang. 1991. The effect of hours constraints on labor supply estimates. Review of Economics and Statistics 73(4): 605-611. Kiefer, N., and G. Neumann. 1981. Individual effects in a nonlinear model: Explicit treatment of heterogeneity in the empirical job-search model. Econometrica 49(4): 965-979. Koslowsky, M., A. N. Kluger, and M. Reich. 1995. Commuting Stress: Causes, Effects, and Methods of Coping. New York: Plenum Press. Layton, J. 2005. How MapQuest works. HowStuffWorks.com. http://www.howstuffworks.com/mapquest.htm (accessed March 27, 2011). Lemp, J. D., and K. M. Kockelman. 2008. Quantifying the external costs of vehicle use: Evidence from America’s top-selling light-duty models. Transportation Research Part D: Transport and Environment 13(8): 491-504. Mortensen, D. 1986. Job search and labor market analysis. In Handbook of Labor Economics, vol. 2, ed. O. Ashenfelter and R. Layard, 849-919. Amsterdam: North-Holland. Muth, R. 1969. Cities and Housing. Chicago: University of Chicago Press. Oettinger, G. 1999. An empirical analysis of the daily labor supply of stadium vendors. Journal of Political Economy 107(2): 360-392. Ouazad, A. 2008. A2REG: Stata module to estimate models with two fixed effects. http://econpapers.repec.org/RePEc:boc:bocode:s456942 (accessed March 27, 2011). Pisarski, A. 2006. Commuting in America III: The Third National Report on Commuting Patterns and Trends. Washington, DC: Transportation Research Board. Russo G., P. Rietveld, P. Nijkamp, and C. Gorter. 1996. Spatial aspects of recruitment behaviour of firms: An empirical investigation. Environment and Planning A 28(6): 1077-1093. 100 Sabir, M., J. S. van Ommeren, M. Koetse, and P. Rietveld. 2010. Adverse weather and commuting speed. Networks and Spatial Economics: 1-12. Small, K. A., and E. T. Verhoef. 2007. The Economics of Urban Transportation. Abingdon, UK: Routledge. So, S., P. Orazem, and M. Otto. 2001. The effects of housing prices, wages, and commuting time on joint residential and job location choices. American Journal of Agricultural Economics 83(4): 1036-1048. Stutzer, A., and B.S. Frey. 2008. Stress that doesn’t pay: The commuting paradox. Scandinavian Journal of Economics 110(2): 339-366. Van den Berg, G.J., and Gorter, C. 1997. Job search and commuting time. Journal of Business & Economic Statistics 15(2): 269-281. Van Ommeren J. S., and M. Fosgerau. 2009. Workers’ marginal costs of commuting. Journal of Urban Economics 65(1): 38-47. Van Ommeren, J. S., G.J. van den Berg, and C. Gorter. 2000. Estimating the marginal willingness to pay for commuting. Journal of Regional Science 40(3): 541-563. Wooldridge, J.M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd Ed. 
Cambridge, MA: MIT Press.

CHAPTER 3

The Effect of High-Stakes Testing on Teacher Quality: Evidence from California

3.1 Introduction

The 2001 passage of the No Child Left Behind Act (NCLB) represents one of the U.S. Federal Government's largest forays into education policy since the 1965 Elementary and Secondary Education Act (ESEA) and the beginning of an era of increasing national attention to public-school quality. NCLB mandated, among other things, the publication of school report cards and sanctions on underperforming schools and districts, strengthening existing incentives for schools to improve student achievement and to increase school quality more broadly. However, this pressure to improve student achievement can also produce unintended consequences that undermine the policy's objectives, so understanding how both schools and teachers respond to evidence-based accountability programs is of the utmost importance to policy makers tasked with improving future iterations of education policy.[36]

This paper uses teacher-level data from California to investigate what effect, if any, increasing the stakes of standardized testing had on teacher quality as measured by education, experience, and certification status. The effect's direction is theoretically ambiguous because both schools and teachers are presented with potentially competing incentives.[37] The increase in the tests' stakes provided schools with strong incentives to increase achievement as measured by standardized test scores. Schools, for example, might respond by upgrading teacher quality in all grades and subjects, or by attempting to increase test scores at the expense of general learning and of learning in non-tested grades and subjects by reallocating low-performing teachers from tested to non-tested grades (Chingos & West, 2011). The latter approach is shortsighted and potentially harmful, of course, because decreasing educational quality in non-tested early grades may delay students' development and have long-lasting consequences (Chetty et al., 2011; Heckman et al., 2010; Heckman & Masterov, 2007).

[36] Hamilton et al. (2008) provide a thorough review of both the history and the existing literature regarding standards-based education policy. Dee and Jacob (2010) do the same with a specific focus on NCLB. Carnoy and Loeb (2002) exploit cross-sectional variation in accountability strength across states and find larger achievement gains in "strong accountability" states. Strategic responses of schools have been found to include re-classifying predicted low scorers as non-tested special-education students (Cullen & Reback, 2006; Figlio & Getzler, 2006), suspending predicted low scorers on test days (Figlio, 2006), reassigning low-performing teachers to non-tested grades and subjects (Chingos & West, 2011), and offering high-calorie school lunches on test days (Figlio & Winicki, 2005). Similarly, teachers have been found to "game the system" by explicitly cheating (Jacob & Levitt, 2003) and by "teaching to the test" (Jacob, 2005). Hannaway and Hamilton (2008) review the literature on how accountability influences teachers' instructional strategies.

[37] The empirical evidence on the effect of evidence-based accountability policy on teacher attrition, for example, is mixed: Boyd et al. (2008) show that attrition of New York fourth-grade teachers decreased in response to new high-stakes fourth-grade tests, while Clotfelter et al. (2004) find that attrition increased in North Carolina in response to a new accountability system.

Some teachers may relish the opportunity to make a difference in children's lives and welcome accountability programs. Not all teachers necessarily feel this way, however, and more generally the ability of schools to respond to the incentives created by accountability programs might be limited by a number of factors, including budget constraints, teacher shortages, and the preferences of teachers and teachers' unions. Specifically, high-stakes testing might increase teacher attrition and exacerbate teacher shortages by decreasing teachers' autonomy in the classroom (Luna & Turner, 2001), decreasing teachers' sense of job security (Reback et al., 2011), or increasing teachers' stress levels (Daly & Chrispeels, 2005). Highly educated teachers might be those most at risk of leaving if they have access to better non-teaching job opportunities.

The existing literature on the response of teacher quality to evidence-based accountability programs largely overlooks the potential for grade-specific effects of high-stakes testing on teacher quality (e.g., Clotfelter et al., 2004; Lee & Young, 2004), which is problematic if accountability programs primarily affect tested grades, because the resulting estimates will be weighted averages of effects in tested grades and non-effects in non-tested grades. Two exceptions to this critique are Boyd et al. (2008), who find that fourth-grade attrition rates decreased in response to New York's introduction of a fourth-grade test, and Phillips and Flashman (2007), who use pre-NCLB nationally representative data to compare inputs in tested versus non-tested grades across "strong-" and "weak-accountability" states; they find small differences in class size but no difference in teacher quality.

I contribute to this literature by estimating the effect of the NCLB-induced increase in testing stakes on teacher quality in California using a difference-in-differences (DD) approach that compares teacher quality in non-tested first-grade classrooms to that in tested second-grade classrooms. The empirical strategy is based on the fact that beginning in the 1997/98 school year all second- through eleventh-grade students in California took mandatory standardized tests that were used by California's pre-NCLB accountability system, but NCLB substantially increased the tests' stakes beginning in the 2002/03 school year. I also undertake two additional analyses. First, I fully interact the DD model's covariates with a Title 1-school indicator to test for a differential effect of NCLB in Title 1 schools, which were subject to significantly stronger sanctions under NCLB. Second, I consider an event-history specification that allows for year-specific policy effects, which is useful for understanding the timing of schools' and teachers' responses to the policy and pre-existing differences in teacher quality across grades.

Two caveats of the analysis are worth stressing at the outset. First, this is not a study of the effect of NCLB, which is itself a bundle of several policies. Rather, I examine the effect of increasing the stakes of standardized testing on the distribution of teacher quality across tested and non-tested grades. Prior to NCLB, California had an accountability program that provided schools with two types of incentives to improve student achievement. One was a rewards program that awarded high-achieving schools monetary rewards and public praise.
The other program, which low-achieving schools could voluntarily enter, provided financial assistance in conjunction with a threat of district or state intervention if the schools did not subsequently improve. NCLB raised the stakes of California's standardized tests by raising the bar of acceptable performance, mandating that the state provide school report cards indicating schools' performance, and imposing strong sanctions on underperforming Title 1 schools.

A second caveat is the possibility of confounding spill-over effects of the increased testing stakes on the control group (non-tested grades), caused by school administrators seeking to improve second-grade test scores by increasing first-grade teacher quality.[38] Similarly, first-grade teachers might worry about being blamed for low second-grade test scores. The direction of the spill-over bias is ambiguous, again because of the potentially competing school and teacher reactions.

[38] This would be a greater concern if the tests were administered early in the school year, but California administers its standardized tests in March, when about 85% of the school year is complete.

The strongest findings regard teacher education: the fraction of second-grade teachers holding a graduate degree decreased relative to that of first-grade teachers by about 1.4 percentage points in the year preceding NCLB and in each subsequent year through 2005/06. However, the fact that an "effect" is found in the year before the testing stakes were raised obfuscates the interpretation of the results. One possible interpretation of this anomalous finding is that there was a preemptive movement of highly educated teachers out of the tested second grade in anticipation of the changes to come, since NCLB was publicly debated for a full year before being passed in January of 2002. Alternatively, this pattern might indicate the presence of a pre-existing differential trend between first- and second-grade classrooms. Regardless of the cause of the difference, however, it is interesting to note that there does appear to be a difference in the prevalence of highly educated teachers between first and second grades. A much smaller and shorter-lived decrease in second-grade teachers' average years of experience is found, and no statistically significant effects are found on teacher-certification measures. Nor do Title 1 schools appear to behave any differently than their non-Title 1 counterparts, which is surprising given the stronger incentives and greater pressure placed on Title 1 schools by NCLB. Policy implications of these findings are discussed in the conclusion.

3.2 Literature Review

Lee and Young (2004) use nationally representative teacher-level data from the 1990s to investigate the effect of state-level accountability strength on a variety of teacher and school outcomes. Specifically, the authors find no effect of accountability strength on class size or teacher quality, the latter of which they measure by in-field teaching. Clotfelter et al. (2004) investigate the impact of the implementation of an accountability program in North Carolina. The authors find that the accountability program increased teacher attrition, decreased teacher experience, and had no effect on teacher quality (measured by the selectivity of teachers' undergraduate institutions). Neither paper, however, allows for grade-specific effects of accountability.
If responses are concentrated in tested grades as hypothesized in the introduction, however, studies that fail to account for this by averaging across all grades will produce attenuated policy-effect estimates. More recently, a handful of papers have examined the potential for differential impacts of accountability policies on tested versus non-tested grades. Reback et al. (2011) use state-level variation in the definition of adequate yearly progress (AYP) to find that NCLB led teachers in 107 tested grades to work more hours per week and to be more concerned about job security than their peers in non-tested grades. Chingos and West (2011) use post-NCLB administrative data from Florida to find that low-value added teachers are more likely to both move to low-stakes (non-tested) positions within their current school and to exit teaching. The most methodologically similar paper to my approach is Boyd et al. (2008), who use administrative data from New York State to investigate the effect of a newly-implemented fourth-grade testing requirement on fourth-grade teacher attrition and find the counter-intuitive result that testing led to a decrease in attrition in the newly tested fourth grade. The teachers who did leave the fourth grade in response to the new test were more likely to be experienced and less likely to have graduated from a highly-selective college. Post-test entrants to fourth grade were less likely to be new teachers and more likely to have graduated from a highly-selective college. Similarly, Phillips and Flashman (2007) use nationally representative pre-NCLB data from 1993 and 1999 to examine the difference in several teacher and classroom-level characteristics between tested and non-tested grades in ―strong-‖ and ―weak-accountability‖ states. The authors find marginally smaller class sizes in strong-accountability states, but no significant difference in teachers’ highest degree obtained, years of experience, licensure status, or college quality. I contribute to this literature in several ways. First, understanding the teacher labor market in as large and diverse a state as California is important in its own right. Second, I examine the effect of strengthening existing accountability policies rather than the effect of implementing new policies when none existed before (e.g., Clotfelter et al., 2004; Boyd et al., 2008). As federal and state policies evolve, the former question is arguably of more interest. 108 3.3 Institutional Details and Data 3.3.1 Pre-NCLB Education Policy in California The federal Improving America’s Schools Act of 1994 encouraged states to create or expand standards-based accountability programs. California responded by passing the Public Schools Accountability Act of 1999 (PSAA), which was comprised of three interrelated programs that sought to motivate schools to improve student achievement. 39 The PSAA’s primary innovation was the creation of an Academic Performance Index (API), which is an annual school-level achievement score based on student and demographic-group performance on California’s 40 Standardized Testing and Reporting Program (STAR). From STAR’s inception in 1997/98, all students in second through eighth grade were tested annually in a minimum of two subjects: math and English. 41 The tests are administered within a ten-day window of the date on which 85% of the school year is complete; testing dates typically fall in early to mid-March. API is scored on a scale from 200 to 1,000, with 800 being the target for proficiency. 
Under PSAA, schools must score better than 800 or make annual gains of at least five percent of their distance from 800 to remain in good standing. Schools that met these requirements and met a threshold test-participation rate became eligible for monetary prizes via the Governor's Performance Award Program, which was the second component of PSAA. Schools that failed to meet these guidelines became eligible to enter the Immediate Intervention/Underperforming Schools Program (II/USP), which was the third component of PSAA. Each year, schools that scored below a preordained API percentile became eligible to apply to II/USP.[42] Despite the voluntary nature of participation in II/USP from the state's perspective, many districts in California effectively forced their eligible schools to apply (AIR, 2003). Because applications typically outnumbered available funding, 430 applicant schools were randomly selected each year to receive the modest funding increase.[43] Analyses by Goe (2006) and AIR (2005) find no evidence of a significant effect of II/USP funding on student achievement, however, and it is unlikely that a slight increase in funding would affect the outcomes of interest in the present paper.[44] II/USP schools that failed to improve in the two to three years after entering the program were technically subject to state-level interventions, but most teachers and principals did not consider this a credible threat (AIR, 2003).

A final relevant pre-NCLB policy in California is the Grades K-3 Class Size Reduction Program (CSR), which is distinct from PSAA and provides schools and districts with financial incentives to reduce class size.[45]

[39] A thorough independent review of PSAA, commissioned by the California Department of Education, was conducted by the American Institutes for Research (AIR, 2003).

[40] API is computed as follows. First, student scores are sorted into five performance categories. The scores are then weighted by category-specific weights, with the higher-achieving categories receiving larger weights. Finally, all of the weighted individual test scores within a school are summed. The resulting number is the school's API. For additional details see http://www.eddata.k12.ca.us/articles/Article.asp?title=understanding%20the%20API.

[41] For additional information on the history of STAR, exemptions, test formats, and results see http://star.cde.ca.gov/.

[42] The median was the initial cutoff, but in subsequent years it was reduced to the lower quartile.

[43] The initial II/USP lasted three years. It has since been reinstituted under various names.

[44] Two potential explanations for the program's apparent lack of success have been put forth. First, principals may have been unsure of how to spend the funds; indeed, principals complained that state administrators provided no guidance in this regard. Second, the comparison group used in the empirical analyses may have been contaminated by some combination of districts responding endogenously by cutting funding to II/USP schools and redirecting it to similarly low-performing schools that did not receive II/USP funding, and of non-II/USP schools receiving money from similar federal programs (e.g., the Comprehensive School Reform Demonstration Program, CSRD).

[45] CSR was launched in 1996 and is still active today. Schools must have no more than 20 students per class. For additional information see http://www.cde.ca.gov/ls/cs/k3/.

Two key features of CSR are crucial for proceeding in the absence of detailed school-level CSR participation data. First, CSR schools must reduce class sizes in the following order: first grade, second grade, kindergarten or third grade, and then the grade level not chosen in the previous step. Second, by the 1997/98 school year virtually all first- and second-grade classrooms were participating in CSR (Carroll et al., 2000). For these reasons, the empirics focus on comparisons between first- and second-grade classrooms. Still, a valid concern is that NCLB encouraged some schools to enter CSR for first grade only or to expand CSR from first to second grade, differentially affecting first- and second-grade classrooms.

3.3.2 NCLB's Impact in California

NCLB increased the stakes of California's STAR tests in three fundamental ways. First, it required that all schools make adequate yearly progress (AYP), which was stricter than the existing API-score requirements because, in addition to reaching API growth targets, AYP requires that schools meet percent-proficient, attendance, and test-participation thresholds. Second, NCLB mandated that states publish "school report cards" containing information on schools' performance levels and AYP status. And third, NCLB threatened stronger sanctions on schools that receive Title 1 funds but fail to make AYP.

Title 1 was a component of the original ESEA that was reinstituted by NCLB; it provides federal funds to schools in proportion to the number of low-income students attending the school. This money can be used to cover the cost of tutoring, after-school, and summer programs that reinforce the school's standard curriculum.[46] Under NCLB, when a Title 1 school fails to make AYP for two consecutive years it must enter Program Improvement (PI), which is a five-year process of steadily increasing consequences that culminates in the drastic restructuring of the school (e.g., the school is reinvented as a charter, taken over by the state, or replaces a majority of the staff).[47] To leave PI and avoid restructuring, a school must make AYP in two consecutive years.

[46] For more information on Title 1 see http://www2.ed.gov/programs/titleiparta/index.html.

[47] For the precise PI timeline see http://www.cde.ca.gov/ta/ac/ti/nclbpireq.asp.

3.3.3 Data

The teacher data analyzed in this paper comes from the California Basic Educational Data System (CBEDS) Professional Assignment Information Form (PAIF).[48] I use PAIF data on all full-time first- and second-grade teachers in self-contained classrooms from 1998/99 through 2005/06 because schools' Title 1 status is unavailable prior to the 1998/99 school year, nearly all first- and second-grade classrooms were participating in CSR by 1998/99, and I am primarily interested in schools' and teachers' immediate responses to the NCLB-induced increase in testing stakes that took effect in 2002/03. I augment the PAIF data with information on schools' Title 1 status and demographic composition from the National Center for Education Statistics' Common Core of Data (CCD). Both data sources are publicly available.[49]

[48] A copy of a PAIF is available at www.cde.ca.gov/ds/dc/cb/documents/paif08.doc.

[49] CBEDS data is provided by the California Department of Education at http://www.cde.ca.gov/ds/sd/df/. The CCD is available at http://nces.ed.gov/ccd/.

Table 3.1 provides an overview of the schools containing first- and second-grade classrooms. The black share of enrollment fell slightly during the study's time frame, from 8.8% to 7.3%, while the Hispanic share of enrollment increased from 45.8% to 51.5%.
Charter schools became more popular during this time period and about 70% of schools received Title 1 funds. The PAIF sample contains about 45,000 self-contained classroom teachers each year that are evenly split between first and second grade. About 750 school districts and 5,000 schools are represented in each year. In figure 3.1, as a prelude to the empirical analysis, I show how four teacher-quality measures varied by grade between 1998/99 and 2005/06. Figure 3.1A examines trends in the fraction of teachers holding a graduate degree (masters or doctorate). The pre- and post-NCLB trends are quite similar. Second-grade classrooms are about three percentage points more likely to be taught by a teacher who holds a graduate degree, but this gap narrows somewhat over time. Similarly, figure 3.1B shows that second-grade teachers tend to have about one more year of experience than their first-grade counterparts and that the trends in average experience are similar for both grades, before and after NCLB. Figure 3.1C defines inexperienced teachers as those with either zero or one year of prior teaching experience. First-grade teachers are more likely to be inexperienced throughout, but the pre- and post-NCLB trends differ for both grades. Specifically, the post-NCLB second-grade gradient flattens out while the corresponding first-grade gradient increases slightly. Finally, figure 3.1D shows the fraction of self-contained first and second-grade teachers who are fully credentialed each year. There is a small gap between first and second-grade credential rates initially, but over time the two trends converge to about 97% fully credentialed. In sum, figure 3.1 does not suggest any obvious or large effects of NCLB, although to definitively answer this question a multivariate analysis that controls for school characteristics is necessary. 113 3.4 Empirical Model and Estimation NCLB increased the stakes of California’s STAR tests beginning in 2002/03, but this did not affect all grades equally because kindergarten and first-grade classrooms remained untested. I restrict the analysis to only first- and second-grade classrooms, however, for two reasons. First, as discussed in section 3.3.1, the CSR program requires that schools begin by reducing first- and second-grade class sizes and kindergarten and third-grade classes are differentially affected by CSR. Most first- and second-grade classrooms in California were participating in CSR by 1997/98, although kindergarten and third-grade participation potentially varied across schools. The DD estimator maintains the assumption that the treatment and control groups were not differentially affected by any other policy interventions, so given the presence of the CSR program, this assumption is most plausible when comparing first- to second-grade classrooms. Second, apart from testing status, first and second grade are similarly structured. Kindergarten is fundamentally different in that it is a half-day in many school districts and many kindergarten teachers teach two classes per day. Similarly, higher elementary grades are more likely to compartmentalize (have one teacher teach only math, one teacher teach only science, etc.) or track students (group students of like ability). The standard DD estimator compares differences in outcomes between the treated and non-treated groups, before and after the policy change, and assumes that the treatment effect of the policy is constant across years. 
I assume that the outcome of interest (y) is determined by

\[
y_{ist} = \delta \, Second_{ist} + \beta \left( NCLB_t \times Second_{ist} \right) + \theta_{st} + \varepsilon_{ist}, \qquad (3.1)
\]

where i indexes classrooms, s indexes schools, and t indexes years; Second_ist is a dummy variable equal to one if classroom i is a second-grade classroom; NCLB_t is a binary indicator of NCLB being in effect in year t; θ_st is a school-year fixed effect that controls for both observed and unobserved school-year attributes such as principal quality, student type, school size, and participation in programs such as CSR, II/USP, and CSRD; and ε_ist is an idiosyncratic error term that captures the effect of unobserved classroom-specific determinants of teacher quality. The school-year fixed effects (FE) subsume the year dummies typically included in a DD regression and are included in all subsequent models in order to partial out school-specific time trends. Conditioning on the school-year FE means that the results are identified by within-school-year differences between first- and second-grade classrooms.

NCLB was publicly debated in Congress during the year before it was implemented and was a prominent component of George W. Bush's 2000 presidential campaign before that. Given the high profile and controversial nature of NCLB, and the strong incentives that it provided schools to make AYP, schools could very well have reacted in anticipation of the passage of NCLB. Similarly, teachers who wanted to avoid the pressures of NCLB might have preemptively exited the profession or left tested classrooms. Alternatively, there may have been a pre-existing difference in the characteristics of first- and second-grade teachers. For these reasons, along with the fact that some schools and teachers might be slow to respond to NCLB, I also estimate event-history models that allow the treatment effect to vary across years. These richer models replace the NCLB_t indicator in equation (3.1) with a full set of year dummies, yielding the estimating equation

\[
y_{ist} = \delta \, Second_{ist} + \sum_{j=1}^{8} \beta_j \left( Year^{j}_{t} \times Second_{ist} \right) + \theta_{st} + \varepsilon_{ist}, \qquad (3.2)
\]

where Year^j_t = 1 when j = t, and 0 otherwise.

Equations (3.1) and (3.2) implicitly assume that all schools responded to NCLB in the same way, regardless of Title-1 status. This is a strong assumption, of course, because only schools that received Title 1 funds were required to enter PI after failing to make AYP in two consecutive years. The hypothesis that NCLB had a differential effect on Title-1 versus non-Title-1 schools is easily tested by fully interacting the covariates in equation (3.1) or (3.2) with a Title-1 dummy and testing the significance of the year-second grade-Title 1 triple-interaction terms.

Equations (3.1), (3.2), and the Title 1-interacted analogue of equation (3.1) are estimated by the standard linear FE estimator. Because school-years are nested in schools, which are nested in districts, standard errors are made robust to clustering at the district level in all models.[51] Doing so makes inference robust to the presence of district-wide and school-wide initiatives, programs, demographic effects, and superintendent effects.

[51] Clustering at the highest level, in this case the district, is advocated by Angrist and Pischke (2009, p. 319).
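To make the estimation procedure concrete, the sketch below shows one way to implement equation (3.1) with school-year fixed effects and district-clustered standard errors in Python. The column names (grad_degree, second, nclb, school_year, district_id) are hypothetical placeholders, and demeaning within school-year cells is used as a shortcut for absorbing the fixed effects (it reproduces the FE point estimates but ignores the small degrees-of-freedom adjustment a dedicated FE routine would apply); it is an illustrative sketch, not the code used to produce the tables below.

```python
import pandas as pd
import statsmodels.api as sm

def dd_with_school_year_fe(df: pd.DataFrame, outcome: str):
    """Sketch of eq. (3.1): y = d*Second + b*(NCLB x Second) + school-year FE + error."""
    d = df.copy()
    d["nclb_x_second"] = d["nclb"] * d["second"]
    cols = [outcome, "second", "nclb_x_second"]

    # Within-transformation: demeaning by school-year cell absorbs theta_st.
    demeaned = d[cols] - d.groupby("school_year")[cols].transform("mean")

    # Cluster standard errors at the district level, the highest level of nesting.
    fit = sm.OLS(demeaned[outcome], demeaned[["second", "nclb_x_second"]]).fit(
        cov_type="cluster", cov_kwds={"groups": d["district_id"]}
    )
    return fit

# Example usage: effect on the probability of holding a graduate degree.
# res = dd_with_school_year_fe(sample, "grad_degree")
# print(res.params, res.bse)
#
# For the event-history model in eq. (3.2), replace "nclb_x_second" with a full
# set of year-by-second-grade interactions; for the Title-1 analysis, further
# interact every regressor with a Title-1 dummy.
```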
3.5 Results

3.5.1 Difference-in-differences Estimates

Table 3.2 reports school-year fixed effects (FE) linear-model estimates of the simple DD regressions described by equation (3.1) for each of the following teacher characteristics: holds a graduate degree, years of experience, is inexperienced (a binary indicator of less than two years of experience), holds a full California teaching credential, and holds a general elementary license. The top panel, which assumes no heterogeneity in schools' responses to NCLB, finds small but marginally statistically significant decreases of about one percentage point in both the probability that second-grade teachers hold a graduate degree and the probability that second-grade teachers hold a full California teaching credential. The negative sign on the NCLB-second grade interaction term suggests that teacher quality decreased in the tested second grade relative to the non-tested first grade in response to NCLB. Precise null effects of NCLB are estimated for each of the other three measures of teacher quality.

The second panel of table 3.2 extends the baseline specification to allow for a differential effect in Title 1 schools. The Title 1-second grade-NCLB triple-interaction term is not statistically significant for any of the five outcomes of interest, however, suggesting that the response of Title 1 schools to NCLB did not systematically differ from that of non-Title-1 schools. The lack of a differential response is not evidence of a lack of desire to respond on the part of Title 1 schools, of course, as teacher shortages, resistance from teachers, and schools' budget constraints could all prevent schools from enacting desired changes.

3.5.2 Event History Estimates

As discussed in section 3.4, the simple DD estimates reported in table 3.2 can be misleading if there were delayed effects of the policy change, if schools and teachers altered their behavior in anticipation of NCLB, or if there were pre-existing differences between first and second grades in teacher quality. To accommodate these possibilities, table 3.3 reports estimates of the event-history models described by equation (3.2) for each of the five measures of teacher quality.

The estimates in columns 1 and 2 of table 3.3 are problematic for the simple pre/post comparison: there were fairly large and statistically significant decreases in both the probability of holding a graduate degree and in years of experience in 2001/02, the year before NCLB took effect. Because it treats the 2001/02 school year as pre-NCLB, the simple DD estimate of the effect on the probability of holding a graduate degree (-0.007, reported in column 1 of table 3.2) is only about half the size of the effects reported in the event-history specification of column 1, table 3.3. It is interesting to note that the decrease in the likelihood of second-grade teachers holding a graduate degree persisted in each year following the passage of NCLB. The effect on teacher experience, by contrast, was short-lived, dying out in subsequent years. Column 2 of table 3.3 shows that teachers' average experience fell by about 0.2 years in both the year preceding NCLB and the first year of NCLB, declines that largely offset each other in the simple pre/post comparison of table 3.2. The point estimates of the interaction terms remain negative in subsequent years, but decrease in magnitude and are not statistically significant at traditional confidence levels.
In column 3 of table 3.3 there appears to be a slight increase in the probability that second-grade classrooms were staffed by an inexperienced teacher in 2002/03, the first year of NCLB, but essentially no effect in subsequent years. This finding fits with the estimates in column 2, perhaps suggesting that a small subset of experienced teachers left second-grade classrooms immediately after the passage of NCLB and were replaced by inexperienced teachers. Finally, as in table 3.2, NCLB does not appear to have had a strong effect on either fully-credentialed or elementary-licensed teachers.

3.5.3 Sensitivity Analysis

Table 3.4 examines the robustness of the graduate-degree results to the choice of a linear probability model (LPM) and performs two falsification exercises that provide further evidence that the results discussed above are attributable to the increase in testing stakes brought about by NCLB. I focus on the graduate-degree outcome because it was the only outcome for which strong effects were found. The linear estimates were taken as the preferred baseline estimates because the FE logit estimator precludes the calculation of precise partial effects and requires dropping observations from school-years that contain no variation in the dependent variable.[52]

[52] For a textbook treatment of the FE logit estimator, see Wooldridge (2010, pp. 621-2).

Column 1 repeats the baseline estimates from column 1 of table 3.3 to facilitate comparisons with the alternative specifications reported in columns 2 and 3. Column 2 estimates the LPM on a restricted sample that excludes observations from school-years that did not experience any variation in the dependent variable (i.e., either all first- and second-grade teachers in the given school-year held a graduate degree, or none did). This is the sample restriction imposed by the conditional FE logit estimator reported in column 3, so it is reassuring to see that the LPM estimates in columns 1 and 2 are quite similar.

The FE-logit coefficients reported in column 3 follow the same sign and statistical-significance patterns as the LPM, suggesting that the results are robust to the linear functional form assumed by the LPM. The logit coefficients cannot be directly compared to the LPM coefficients, but scaled coefficients that are comparable to the LPM partial effects can be computed using the product of the sample average probability of holding a graduate degree (0.25) and one minus this probability as an approximate scaling factor.[53] This scaling factor is approximate because it effectively computes the "partial effect at the average" rather than the "average partial effect" and because the year-grade interactions are binary variables. Nonetheless, it is reassuring that the resulting scaling factor of 0.1875 produces partial-effect estimates of about -0.02, which are in line with the LPM estimates reported in column 2. FE-logit estimates for the other binary measures of teacher quality are reported in table A3.

[53] It is impossible to estimate proper "average partial effects" because the distribution of the school-year fixed effect θ_st is unknown (Wooldridge, 2010, pp. 620-1).
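For concreteness, the scaling just described works out as follows, using one of the post-NCLB logit coefficients of -0.098 from column 3 of table 3.4 as an example:

\[
\hat{p}\,(1-\hat{p}) = 0.25 \times 0.75 = 0.1875,
\qquad
0.1875 \times (-0.098) \approx -0.018,
\]

an approximate partial effect of roughly two percentage points, consistent with the LPM estimates in column 2.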
Columns 4 and 5 of table 3.4 are falsification exercises that look for a differential effect on the probability of the classroom teacher holding a graduate degree between two tested grades and between two non-tested grades, respectively. Column 4 shows no statistically significant "effect" of NCLB on third-grade teachers relative to second-grade teachers in any year before or after NCLB; this was to be expected, given that NCLB did not differentially affect the two grades. In column 5, which compares kindergarten to first grade, larger, positive, and marginally statistically significant effects are found, suggesting that NCLB increased the probability that first-grade classrooms were taught by highly-educated teachers. One potential explanation of this finding is that some graduate-degree-holding teachers moved from second- to first-grade classrooms in response to the increased testing stakes.

3.6 Conclusions and Discussion

This paper uses teacher-level data to analyze the effect of an increase in the stakes of standardized testing created by NCLB on teacher quality in California's second-grade classrooms. The identification strategy compares teachers in the non-tested first grade to those in the tested second grade. Fairly large, statistically significant, and persistent decreases in the probability that second-grade teachers held a graduate degree were found. A small, marginally significant decrease in teachers' experience was found in the years preceding and immediately following NCLB, but it died out in later years. No effect was found on teachers' certification. Surprisingly, NCLB's effect did not vary with schools' Title-1 status.

One possible explanation of the finding that teacher quality decreased in tested grades, despite schools' short-run incentives to increase teacher quality, is that highly-educated teachers were leaving tested "high-stakes" grades of their own accord in response to the increased pressure and decreased autonomy created by NCLB. That the exit of highly-educated teachers continued for several years after the initial passage of NCLB might indicate that cumulative accountability pressures induced a new group of highly-educated teachers to leave tested classrooms each year. Highly-educated teachers may be particularly likely to leave the teaching profession when external factors such as high-stakes testing increase the stresses associated with the job, because their education affords them viable alternative careers.

However, alternative explanations of the results exist. Specifically, while the statistically significant effect in the year prior to NCLB might indicate anticipatory behavior, it could just as easily be evidence of a pre-existing trend towards a difference in teacher quality between first and second grades. Furthermore, the results should be interpreted cautiously because the effects are relatively small and only statistically significant for one of the five investigated outcomes. Of course, as mentioned previously, the non-findings may be the result of spillover effects to the first grade. Simply put, more evidence is necessary to definitively answer the questions posed in this paper. Future work applying a similar methodology to administrative data from other states and to nationally representative data such as the NCES Schools and Staffing Survey will prove invaluable, as will the use of panel data that allows researchers to follow teachers over time.

Taken at face value, the finding that highly-educated teachers became less likely to teach in tested grades is a troubling unintended consequence of NCLB, but one that might be relatively easy to correct from a policy perspective.
Standard labor-economic theory suggests that if jobs in tested grades are significantly more stressful, these jobs should pay higher wages in order to compensate teachers for coping with that stress.[54] Compensation could be non-pecuniary as well, provided via additional resources such as extra planning periods, teaching aids, or professional development. Finally, even if the difference in educational attainment between first- and second-grade teachers is not driven by the presence of high-stakes testing, this research has identified grade-specific differences in teacher quality that are interesting in their own right. Are the differences driven by the supply side or the demand side of the teacher labor market? A better understanding of the causes of such differences has the potential to improve teacher hiring and training practices, for example.

[54] This is known as the theory of compensating wage differentials (Borjas, 2008, Chapter 6).

CHAPTER 3 APPENDICES

APPENDIX 3.1: CHAPTER 3 TABLES

Table 3.1: PAIF Data Description

Full sample (weighted) average characteristics
                   1998/99  1999/00  2000/01  2001/02  2002/03  2003/04  2004/05  2005/06    Total
Second grade         49.7%    49.6%    50.4%    50.3%    50.1%    50.0%    49.9%    49.8%    50.0%
% black               8.8%     8.6%     8.4%     8.2%     8.0%     7.7%     7.5%     7.3%     8.1%
% Hispanic           45.8%    46.9%    48.4%    49.3%    49.8%    50.4%    51.0%    51.5%    49.1%
% free lunch         48.3%    48.4%    47.5%    47.2%    46.5%    46.6%    46.5%    45.4%    47.0%
Charter               0.8%     1.4%     1.5%     1.6%     1.7%     1.6%     1.8%     2.1%     1.5%
Title-1 eligible     67.8%    61.8%    71.1%    72.0%    73.7%    73.2%    67.2%    69.2%    69.5%
N (teachers)        45,425   45,971   46,893   47,240   47,137   46,234   45,960   45,761  370,621

School-level average characteristics
% black               8.8%     8.6%     8.3%     8.2%     8.1%     8.0%     7.8%     7.5%        .
% Hispanic           39.7%    40.8%    41.8%    43.0%    43.8%    44.8%    46.0%    47.1%        .
% free lunch         44.2%    43.8%    42.6%    42.6%    42.3%    43.1%    43.5%    42.5%        .
Charter               1.0%     1.8%     2.0%     2.3%     2.6%     2.6%     2.9%     3.3%        .
Title-1 eligible     64.2%    58.4%    67.6%    69.0%    71.2%    71.2%    64.9%    67.3%        .
N (districts)          746      752      760      756      762      765      767      767        .
N (schools)          4,802    4,857    4,924    4,973    5,038    5,087    5,145    5,241        .

Notes: The school-characteristic data come from the Common Core of Data (CCD). Grade-specific means of the dependent variables are provided in the main results tables alongside each regression and separately by year in figure 3.1.

Table 3.2: Standard DD Estimates

                         (1)                (2)                (3)                (4)                (5)
Dependent var.:          Graduate degree    Years teaching     New teacher        Full credential    Elem. license

Baseline specification
Second grade             0.023 (0.004)***   1.079 (0.097)***   -0.012 (0.002)***  0.013 (0.006)**    -0.001 (0.001)
NCLB*second              -0.007 (0.003)**   -0.025 (0.066)     0.001 (0.002)      -0.008 (0.005)*    0.001 (0.001)
Constant                 0.241 (0.002)***   11.428 (0.045)***  0.067 (0.001)***   0.904 (0.002)***   0.971 (0.000)***

Differential Title-1 effect
Second grade             0.021 (0.006)***   1.321 (0.168)***   -0.013 (0.002)***  0.006 (0.003)*     0.000 (0.001)
Title-1*second           0.003 (0.006)      -0.356 (0.174)**   0.002 (0.003)      0.010 (0.006)*     -0.002 (0.002)
NCLB*second              -0.011 (0.006)*    -0.043 (0.141)     0.001 (0.003)      -0.003 (0.004)     0.000 (0.002)
Title-1*NCLB*second      0.005 (0.007)      0.037 (0.183)      0.000 (0.004)      -0.007 (0.004)     0.001 (0.002)
Constant                 0.241 (0.002)***   11.427 (0.047)***  0.067 (0.001)***   0.904 (0.002)***   0.971 (0.000)***

First grade mean         0.24               11.4               0.07               0.90               0.97
Second grade mean        0.26               12.5               0.06               0.91               0.97
Observations             370,621            370,479            370,479            370,621            370,621
School-year FE           40,067             40,066             40,066             40,067             40,067
District clusters        823                823                823                823                823

Notes: All models estimated in this table include school-by-year fixed effects (FE). The standard errors reported in parentheses are robust to district-level clustering. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels. "New teacher" is defined as a teacher with less than two years of experience. Sample sizes vary because experience was not available for all teachers.
Table 3.3: Event History Estimates (time-varying NCLB effects)

                         (1)                (2)                (3)                (4)                (5)
Dependent var.:          Graduate degree    Years teaching     New teacher        Full credential    Elem. license

Second grade             0.028 (0.004)***   1.153 (0.120)***   -0.015 (0.003)***  0.014 (0.008)*     -0.001 (0.002)
Second*99/00             -0.002 (0.004)     0.041 (0.065)      0.006 (0.003)*     0.002 (0.002)      0.001 (0.002)
Second*00/01             -0.006 (0.004)     -0.094 (0.103)     0.004 (0.004)      -0.002 (0.005)     -0.000 (0.002)
Second*01/02             -0.013 (0.005)***  -0.236 (0.096)**   0.002 (0.003)      -0.002 (0.004)     0.000 (0.002)
Second*02/03             -0.010 (0.005)**   -0.192 (0.102)*    0.007 (0.003)**    -0.004 (0.006)     0.001 (0.002)
Second*03/04             -0.014 (0.005)***  -0.114 (0.102)     0.003 (0.003)      -0.008 (0.006)     0.000 (0.002)
Second*04/05             -0.014 (0.005)***  -0.035 (0.112)     0.003 (0.005)      -0.012 (0.007)     0.002 (0.002)
Second*05/06             -0.014 (0.006)**   -0.055 (0.111)     0.004 (0.004)      -0.011 (0.007)     0.001 (0.002)
Constant                 0.241 (0.002)***   11.428 (0.045)***  0.067 (0.001)***   0.904 (0.002)***   0.971 (0.000)***

First grade mean         0.24               11.4               0.07               0.90               0.97
Second grade mean        0.26               12.5               0.06               0.91               0.97
Observations             370,621            370,479            370,479            370,621            370,621
School-year FE           40,067             40,066             40,066             40,067             40,067
District clusters        823                823                823                823                823

Notes: All models estimated in this table include school-by-year fixed effects (FE). The standard errors reported in parentheses are robust to district-level clustering. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels. "New teacher" is defined as a teacher with less than two years of experience. Sample sizes vary because experience was not available for all teachers.

Table 3.4: Sensitivity Analysis of Graduate-degree Results

                         (1)                (2)                (3)                (4)                (5)
Grades used:             1st and 2nd        1st and 2nd        1st and 2nd        2nd and 3rd        K and 1st
Specification:           Baseline LPM       Restricted LPM     FE Logit Coeff.    LPM                LPM

High-grade               0.028 (0.004)***   0.033 (0.005)***   0.178 (0.026)***   0.009 (0.004)**    -0.020 (0.004)***
High-grade*99/00         -0.002 (0.004)     -0.002 (0.005)     -0.016 (0.024)     -0.002 (0.003)     0.003 (0.006)
High-grade*00/01         -0.006 (0.004)     -0.008 (0.005)     -0.044 (0.024)*    0.006 (0.004)      0.003 (0.005)
High-grade*01/02         -0.013 (0.005)***  -0.015 (0.005)***  -0.087 (0.028)***  0.006 (0.004)      0.010 (0.006)*
High-grade*02/03         -0.010 (0.005)**   -0.011 (0.005)**   -0.069 (0.029)**   0.006 (0.005)      0.005 (0.005)
High-grade*03/04         -0.014 (0.005)***  -0.017 (0.005)***  -0.098 (0.028)***  0.002 (0.005)      0.011 (0.006)*
High-grade*04/05         -0.014 (0.005)***  -0.017 (0.006)***  -0.098 (0.029)***  0.005 (0.005)      0.014 (0.006)**
High-grade*05/06         -0.014 (0.006)**   -0.016 (0.007)**   -0.098 (0.032)***  0.001 (0.005)      0.022 (0.005)***
Constant                 0.241 (0.002)***   0.278 (0.002)***   .                  0.260 (0.001)***   0.250 (0.002)***

High grade mean          0.24               0.28               0.28               0.26               0.25
Low grade mean           0.26               0.30               0.30               0.27               0.24
Observations             370,621            315,804            315,804            368,933            367,643
School-year FE           40,067             31,044             31,044             40,372             40,610
District clusters        823                667                667                824                845

Notes: Column 1 is identical to column 1 in table 3.3; it is repeated to facilitate comparison with the restricted-sample LPM in column 2. The sample restriction in column 2 mimics that of the FE logit, which drops observations from school-years in which there was no variation in the dependent variable; this is why the sample sizes in this table vary. All models estimated in this table include school-by-year fixed effects (FE). The standard errors reported in parentheses are robust to district-level clustering. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels.
APPENDIX 3.2: CHAPTER 3 FIGURES

[Figure 3.1 comprises four panels, each plotting annual first- and second-grade series for 1998/99 through 2005/06: Figure 3.1A, Fraction Grad. Degree; Figure 3.1B, Average Experience; Figure 3.1C, Fraction Inexperienced; Figure 3.1D, Fraction Full Credential.]

APPENDIX 3.3: FE LOGIT COEFFICIENTS

Table A3: FE Logit Coefficients

                         (1)                (2)                (3)                (4)
Dependent var.:          Graduate degree    New teacher        Full credential    Elem. license

Second                   0.178 (0.026)***   -0.194 (0.035)***  0.139 (0.062)**    -0.042 (0.063)
Second*99/00             -0.016 (0.024)     0.059 (0.050)      0.015 (0.025)      0.047 (0.070)
Second*00/01             -0.044 (0.024)*    0.031 (0.051)      -0.012 (0.044)     -0.032 (0.071)
Second*01/02             -0.087 (0.028)***  -0.034 (0.045)     -0.002 (0.044)     -0.036 (0.092)
Second*02/03             -0.069 (0.029)**   0.007 (0.060)      -0.004 (0.057)     0.040 (0.080)
Second*03/04             -0.098 (0.028)***  -0.102 (0.055)*    -0.013 (0.064)     -0.023 (0.106)
Second*04/05             -0.098 (0.029)***  -0.093 (0.070)     -0.079 (0.066)     0.104 (0.096)
Second*05/06             -0.098 (0.032)***  -0.053 (0.057)     0.011 (0.073)      0.043 (0.086)

Observations             315,804            154,879            153,582            52,634
School-years             31,044             13,682             13,055             4,901
Districts                667                658                520                366
Pseudo R-squared         0.001              0.002              0.001              0.0001
Log likelihood           -129,294           -44,643            -55,881            -15,149

Notes: All models estimated in this table include school-by-year fixed effects (FE). The standard errors reported in parentheses are robust to district-level clustering. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels. "New teacher" is defined as a teacher with less than two years of experience. Sample sizes vary both because experience was not available for all teachers and because the FE logit estimator drops observations from school-years that did not experience any variation in the dependent variable (i.e., all 0 or all 1).

CHAPTER 3 REFERENCES

AIR. See American Institutes for Research.

American Institutes for Research. 2003. Evaluation Study of the Immediate Intervention/Underperforming Schools Program and the High Achieving/Improving Schools Program of the Public Schools Accountability Act of 1999. Washington, DC: American Institutes for Research.

———. 2005. Evaluation Study of the Immediate Intervention/Underperforming Schools Program of the Public Schools Accountability Act of 1999. Washington, DC: American Institutes for Research.

Angrist, J., and S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.

Borjas, G. 2008. Labor Economics. 4th ed. New York, NY: McGraw-Hill.

Boyd, D., H. Lankford, S. Loeb, and J. Wyckoff. 2008. The impact of assessment and accountability on teacher recruitment and retention: Are there unintended consequences? Public Finance Review 36(1): 88-111.

Carnoy, M., and S. Loeb. 2002. Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis 24(4): 305-331.

Carroll, S., R. Reichardt, and C. Guarino. 2000. The distribution of teachers among California's school districts and schools. Santa Monica, CA: RAND Corporation.
Chetty, R., J. N. Friedman, N. Hilger, E. Saez, D. W. Schanzenbach, and D. Yagan. 2011. How does your kindergarten classroom affect your earnings? Evidence from Project STAR. NBER Working Paper No. 16381.

Chingos, M. M., and M. R. West. 2011. Promotion and reassignment in public school districts: How do schools respond to differences in teacher effectiveness? Economics of Education Review 30(3): 419-433.

Clotfelter, C. T., H. F. Ladd, J. L. Vigdor, and R. A. Diaz. 2004. Do school accountability systems make it more difficult for low-performing schools to attract and retain high-quality teachers? Journal of Policy Analysis and Management 23(2): 251-271.

Cullen, J. B., and R. Reback. 2006. Tinkering toward accolades: School gaming under a performance accountability system. In Improving School Accountability: Check-ups or choice? (Advances in Applied Microeconomics, Vol. 14), ed. T. Gronberg and D. Jansen, 1-34. Amsterdam: JAI Press.

Daly, A. J., and J. Chrispeels. 2005. From problem to possibility: Leadership for implementing and deepening the process of effective schools. Journal for Effective Schools 4(1): 7-25.

Dee, T. S., and B. Jacob. 2010. Impact of NCLB on students, teachers, and schools. Brookings Papers on Economic Activity, Fall 2010.

Figlio, D. N. 2006. Testing, crime, and punishment. Journal of Public Economics 90(4-5): 837-851.

Figlio, D. N., and L. S. Getzler. 2006. Accountability, ability and disability: Gaming the system? In Improving School Accountability: Check-ups or choice? (Advances in Applied Microeconomics, Vol. 14), ed. T. Gronberg and D. Jansen, 35-49. Amsterdam: JAI Press.

Figlio, D. N., and J. Winicki. 2005. Food for thought: The effects of school accountability plans on school nutrition. Journal of Public Economics 89(2-3): 381-394.

Goe, L. 2006. Evaluating a state-sponsored school improvement program through an improved school finance lens. Journal of Education Finance 31(4): 395-419.

Hamilton, L. S., B. M. Stecher, and K. Yuan. 2008. Standards-based reform in the United States: History, research, and future directions. RAND Corporation Report Number 1384. Santa Monica, CA: RAND.

Hannaway, J., and L. Hamilton. 2008. Effects of Accountability Policies on Classroom Practices. Washington, DC: The Urban Institute.

Heckman, J. J., and D. V. Masterov. 2007. The productivity argument for investing in young children. Review of Agricultural Economics 29(3): 446-493.

Heckman, J. J., S. H. Moon, R. Pinto, P. A. Savelyev, and A. Yavitz. 2010. The rate of return to the HighScope Perry Preschool Program. Journal of Public Economics 94(1-2): 114-128.

Jacob, B. A. 2005. Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics 89(5-6): 761-796.

Jacob, B. A., and S. Levitt. 2003. Rotten apples: An investigation of the prevalence and predictors of teacher cheating. Quarterly Journal of Economics 118(3): 843-877.

Lee, J., and K. K. Wong. 2004. The impact of accountability on racial and socioeconomic equity: Considering both school resources and achievement outcomes. American Educational Research Journal 41(4): 797-832.

Luna, C., and C. L. Turner. 2001. The impact of the MCAS: Teachers talk about high-stakes testing. The English Journal 91(1): 79-87.

Phillips, M., and J. Flashman. 2007. How did the statewide assessment and accountability policies of the 1990s affect instructional quality in low-income elementary schools? In Standards-Based Reform and the Poverty Gap, ed. A. Gamoran, 47-90. Washington, DC: Brookings Institution Press.
Reback, R., J. Rockoff, and H. L. Schwartz. 2011. Under pressure: Job security, resource allocation, and productivity in schools under NCLB. NBER Working Paper No. 16745.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.