ESSAYS ON THE QUALITY EDUCATION INVESTMENT ACT AND WEIGHTED QUANTILE REGRESSION

By

Paul Burkander

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Economics - Doctor of Philosophy

2014

ABSTRACT

This dissertation contains three self-contained chapters. The first two are related in their analysis of California’s Quality Education Investment Act (QEIA), with the former estimating its effect on student achievement and the latter exploiting an aspect of the law to estimate district preferences for resource allocation across low-performing schools. The final chapter considers the distributions of quantile regression estimators under complex random sampling.

Beginning in the 2007-08 school year, California’s QEIA required schools selected via lottery to institute reforms including class size reduction, increased average teacher experience, and extra professional training. The act provided additional per-pupil funding for schools to meet these requirements. Conditional on known probabilities of selection, which differed across schools, treatment is uncorrelated with potential outcomes, allowing for non-parametric identification of the causal effect by inverse probability weighting. In the first fully-funded year of the program, math scores in 4th grade increased by 0.32 SD in the population of California school-grade averages, and by the second fully-funded year 5th grade math scores improved by 0.37 SD. By the third fully-funded year of the program, math scores in 2nd grade were 0.30 SD higher in the distribution of California school-grade averages, and 0.29 SD higher in 3rd grade. Selected schools did not increase teacher experience, and had 4.4 to 4.8 fewer students per class in 4th and 5th grade in the first fully-funded year. In kindergarten through 3rd grade, class sizes were reduced later and less dramatically, by 3 to 4.2 students by the third fully-funded year, due primarily to unselected schools exiting California’s previous class size reduction program. The timing of the class size reductions and student achievement gains suggests class size was the driving factor.

This novel intervention also required school districts to rank their low-performing schools; the analysis of these rankings constitutes my second chapter. Districts understood that higher-ranked schools would be more likely to receive significant increases in funding to implement QEIA reforms. These rankings provide a unique revelation of district preferences for resource allocation across low-performing schools. Using a discrete-choice model, I estimate which school characteristics led districts to rank schools highly. I find descriptive evidence that districts were more likely to give high ranks to schools with a high percentage of students eligible for Free and Reduced Price Lunch, and to schools repeatedly sanctioned under No Child Left Behind for failing to make Adequate Yearly Progress. I also find some evidence that districts gave high ranks to high schools that applied for an alternative program, in which they crafted their own reforms.

The final chapter, coauthored with Otávio Bartalotti, extends previous work that developed the asymptotic properties of quantile regression estimators under Standard Stratified sampling to Variable Probability sampling. Formulas for the asymptotic variance and feasible estimators are provided.
Simulation results are provided for both Standard Stratified and Variable Probability sampling. The simulations confirm econometric theory by demonstrating that under exogenous stratification unweighted estimates perform well and are more efficient than weighted estimates. Under endogenous stratification and SS sampling, no estimator of standard errors performs best across coefficients, quantiles, and sample sizes. Under endogenous stratification and VP sampling, however, bootstrapped standard errors consistently perform well.

Copyright by
PAUL BURKANDER
2014

To Maggie and Ollie.

ACKNOWLEDGMENTS

The production of this research has benefited tremendously from faculty, friends, and family. Gary Solon pushed me to improve my analyses without telling me exactly how; it is to this that I attribute much of my success as a researcher. In his labor and applied econometrics courses, Gary helped define for me the type of researcher I want to be. I appreciated stops by Todd Elder’s office, the best of which ended with us both haphazardly scribbling various formulas on paper. In Todd’s labor course it was typically when the final bell sounded that the class got most interesting, as he would hesitate to let us leave, insisting, as chalk dust flew and the board became increasingly filled, on showing us one more proof. Mike Conlin recognized one summer that I was not being productive enough, and insisted on meeting with me weekly, helping me find the pace of work that led to where I am.

The graduate student community at Michigan State owes much of its current strength to those who came before. Quentin Brummet, Otávio Bartalotti, Steve Dieterle, and others exemplified the type of graduate student and colleague that I wanted to be. I was also fortunate to be part of a tremendous cohort. Discussions with Michelle Maxfield, Brian Stacy, and Paul Thompson, and arguments with Hassan Enayati, were instrumental in my first year’s success and beyond. Dan Litwok, Michael Bates, and Hassan are the type of people that, if they all stand up to go somewhere, I am very likely to follow, even against the weight of statistical evidence.

My family stood by me through my non-traditional path. Thank you, Nick, Janet, and John. To no one am I more grateful than Kri, whose endless support has carried me throughout.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

Chapter 1  The Causal Effect of School Reform: Evidence from California’s Quality Education Investment Act
    1.1 Introduction
    1.2 Literature Review
    1.3 Policy Description
    1.4 Data
    1.5 Identification
    1.6 Results
        1.6.1 Regression Results
        1.6.2 Main Results
    1.7 Conclusion
    APPENDICES
        APPENDIX A - FIGURES
        APPENDIX B - TABLES
    BIBLIOGRAPHY
Chapter 2  School Districts’ Revealed Preference for Resource Allocation: Evidence from California’s Quality Education Investment Act
    2.1 Introduction
    2.2 Literature
    2.3 Institutional Details
    2.4 Data
    2.5 Model
    2.6 Results
    2.7 Conclusion
    APPENDICES
        APPENDIX A - FIGURES
        APPENDIX B - TABLES
    BIBLIOGRAPHY

Chapter 3  Asymptotic Properties of Quantile Regression for Standard Stratified and Variable Probability Sampling
    3.1 Introduction
    3.2 The Quantile Regression Population Problem
    3.3 Quantile Regression under SS and VP Sampling
        3.3.1 SS Sampling
        3.3.2 VP Sampling
    3.4 Simulation Results
    3.5 Conclusion
    APPENDICES
        APPENDIX A - TABLES
        APPENDIX B - PROOFS
    BIBLIOGRAPHY

LIST OF TABLES

Table 1.1   Descriptives, Elementary Regular QEIA Schools 2007
Table 1.2   Sample Moment Conditions
Table 1.3   Select Regression Results
Table 1.4   Class Size and HQT
Table 1.5   Average Performance Index
Table 1.6   Demographics
Table 1.7   Demographics Continued
Table 1.8   Test Taking
Table 1.9   Teacher Mobility
Table 1.10  Teacher Composition
Table 1.11  Class Size Grades K-2
Table 1.12  Class Size Grades 3-5
Table 1.13  Math Standardized Test
Table 1.14  ELL Standardized Test
Table 1.15  Cost-Benefit Analysis
Table 2.1   Descriptive Statistics, All Schools
Table 2.2   Descriptive Statistics, by Ranking
Table 2.3   Difference Above Median Ranking-Below Median Ranking
Table 2.4   Rank Order Logit Results, Efron’s Approximation for Ties
Table 3.1   Exogenous SS, Simulation Results
Table 3.2   Endogenous SS, Simulation Results
Table 3.3   Large Sample Endogenous SS, Simulation Results
Table 3.4   Exogenous VP Sampling, Simulation Results
Table 3.5   Endogenous VP Sampling, Simulation Results
Table 3.6   Large Sample Endogenous VP Sampling, Simulation Results

LIST OF FIGURES

Figure 1.1  Support over p, All Elementary Schools
Figure 1.2  Cohort-Level Class Size and Math Achievement Comparison
Figure 2.1  Portion of Form Submitted by San Diego

Chapter 1

The Causal Effect of School Reform: Evidence from California’s Quality Education Investment Act

1.1 Introduction

Current educational policy in the United States is focused on increasing the proportion of students who meet state-determined proficiency levels on standardized tests. There is disagreement about how to achieve this, with some arguing for additional educational inputs, and others for more efficient use of existing inputs. An extensive literature on education production functions[1] has attempted to resolve this and related questions. However, despite some randomized experiments and a plethora of natural experiments, no clear consensus has emerged on whether marginal changes in educational resources have any effect on educational outcomes.

[1] Summaries of the assumptions and methods employed in the education production function literature can be found in Hanushek (1979), Todd et al. (2003), and Rice et al. (2008).

California’s Quality Education Investment Act (QEIA) offers a unique opportunity to identify the effect of increased inputs on outcomes. In the 2007-2008[2] school year the QEIA went into effect, leading to increased funding and obligatory reforms for about 500 selected schools, which were chosen from 1,260 participating schools. Districts were first required to rank all of their participating schools, and then districts were randomly selected to have their highest ranked schools funded. Once selected, funded schools were required to institute several reforms: they had to reduce average class size, increase average teacher experience, and provide additional professional training to teachers. Conditional on districts’ rankings, selection of schools was random, though schools differed in their probability of selection. The selection process, and therefore the probabilities of selection, are known, and the average treatment effect of QEIA can therefore be nonparametrically identified using inverse probability weighting (IPW). A drawback of QEIA is that the effects of the individual reforms cannot be separately identified.

[2] Henceforth, school years are referred to by the year in which the Spring semester occurs. For example, the 2007-2008 school year is referred to as 2008.
However, bundled reforms are worth studying in their own right: pressure to improve outcomes often leads to concurrent policy changes, so QEIA reflects how reforms are actually carried out, and interactive effects may make bundled reforms more or less effective than the sum of their constituent parts. Moreover, as I find, QEIA caused a reduction in class size of about 4 students per class by the third fully-funded year of the program, but had no discernible effect on the other main policy lever that I observe, teacher experience. Reportedly, the vast majority of elementary schools eligible to participate in QEIA were already required to meet many of its requirements, with the exception of reduced class size, increased teacher experience, and increased professional training. Also, continued participation in QEIA was contingent on schools meeting achievement growth targets. The evidence therefore suggests that the causal effect of QEIA on standardized test scores occurred through some mix of class size reduction, professional training, and increased pressure to raise test scores.

Indeed, QEIA did cause a statistically significant increase in student achievement, as measured both by California’s Average Performance Index (API) and by grade-level results on California’s primary standardized test. The API is a weighted school-level average across all tested subjects, grades, and test types. With respect to the population of all elementary schools, the average treatment effect of QEIA on the API by the third fully-funded year of the program was an increase of 0.33 standard deviations, with larger gains for Hispanic and low-SES students. With respect to the population of grade-level averages across all California schools, by the third fully-funded year of the program standardized math scores increased by 0.28 standard deviations in 2nd grade, and by 0.44 standard deviations in 5th grade. QEIA caused more modest gains in English language arts, of 0.19 and 0.22 standard deviations in 2nd and 5th grade, respectively.

In what follows, section 1.2 reviews the relevant literature; section 1.3 describes QEIA in greater detail; section 1.4 reviews the data used in this analysis; section 1.5 outlines the identification strategy; and section 1.6 presents the results. Section 1.7 concludes.

1.2 Literature Review

This analysis contributes causal evidence to the aforementioned extensive literature on education production functions, which generally has found mixed results. Meta-analyses that find no clear evidence of an effect of increased school inputs on student outcomes include Hanushek (1986) and Hanushek (1997), though the methods employed in those analyses are criticized by Krueger (2002). In contrast, Greenwald et al. (1996) provide a meta-analysis that finds many school inputs do have positive effects, though their methods are criticized by Hanushek (1996). Within the education production function literature, this paper contributes to those strands concerned with the effects of reducing class size, providing professional training to teachers, and increasing accountability.

The study of class size effects on student achievement has a rich history, dating back at least a century. As noted by Rockoff (2009), early waves of the literature, which include field experiments as early as the 1920s, tended to find no effect from a reduction of class size.
Recent studies of variation in class size tend to be quasi-experimental, with the notable exception of Tennessee’s Project STAR (Student/Teacher Achievement Ratio). Project STAR was a randomized controlled trial that assigned students to small classes (13-17 students per class), regular classes (22-26 students per class), or regular classes with a teaching aide. The reduced class size treatment of Project STAR has generally been found to have had positive effects in the short run (Nye et al. (1999), Krueger (1999)) and on longer run outcomes (Krueger et al. (2001), Chetty et al. (2011a)), though nonrandom attrition, the lack of a baseline measure of student performance, and little information about teachers and how they were randomized should give us pause in interpreting results (Hanushek (1999)).

Notable natural experiments include Angrist et al. (1999), which uses a regression discontinuity design based on a class size rule in Israel. They find significant returns to class size reduction for math and reading scores for 5th graders. Hoxby (2000) also uses variation in class size generated by class size caps, and exploits exogenous variation in population, to analyze the effect of class size in Connecticut. She finds no returns to class size reduction, and in fact rules out even modest returns. The results in Hoxby (2000) are questioned by Jepsen et al. (2009), who note that, in using test scores from the following year, estimates may be attenuated.

Jepsen et al. (2009) analyze a previous class size reduction program in California, which was first implemented in 1996. Using a fixed-effects analysis, with a school-level measure of achievement as the outcome variable, the authors find that a ten-student reduction in class size led to a 0.06 to 0.1 standard deviation improvement in math, and a 0.04 to 0.06 standard deviation improvement in reading. An important contribution of Jepsen et al. (2009) is that, unlike previous class size analyses, they explore changes in teacher quality that result from rapidly reducing class size, finding that in the early years of the program class size returns were offset by losses from reduced teacher quality. This issue is explored further by Dieterle (2013), who finds that the reduction in teacher quality was large enough to account for the only modest returns to class size reduction found in an anonymous state. Chingos (2012) examines another class size reduction policy, implemented in Florida, and finds no effect using a comparative interrupted time series design. Chingos (2012) exploits the fact that many districts already met the class size requirements of Florida’s law when it was implemented. Districts that already met the requirement received the same increase in funding, so the counterfactual to increased funding and class size reduction is an unconstrained increase in funding.

The literature on the effects of teacher professional development is less well developed. Yoon et al. (2007) review over 1,300 studies conducted between 1986 and 2006, and find only nine that rigorously examine the effect of teacher professional development on student achievement. Among these, the range of effects was quite large, from -0.53 to 2.39 standard deviations,[3] with the smallest and largest effect coming from the same study. The student assessment tools were generally closely aligned with teacher training, and within studies there was wide variation in effects.

[3] Presumably the standard deviations are with respect to the population of students studied, though neither Yoon et al. (2007) nor the source material clarifies the point.
In a more recent study, Barrett et al. (2012) use a propensity score model to test whether less effective teachers are more likely to select into professional development, and whether accounting for this selection affects estimates of the effectiveness of such programs. They find that pre-treatment value added scores are an important predictor of participation, and that, controlling for selection, professional development increases student test scores by 0.08 standard deviations in elementary school.

Linking incentives to test scores has been shown to improve medium term math outcomes (Chiang (2009)) and long-term outcomes of low-performing students (Cohodes et al. (2013)). A growing body of literature considers the potential erosion of the signal quality of student assessments when those assessments are linked to incentives. There is evidence that schools manipulate the population of test takers (Figlio et al. (2006), Jacob (2005)), that schools shift resources towards marginal students (Neal et al. (2010)), and that teachers manipulate test results (Jacob et al. (2003)). The QEIA link between incentives and test scores differs from those studied above in at least two ways: using the API, an average of scores across all students, instead of percent proficient removes the incentive to teach to the marginal student, and scores on tests for cognitively impaired students count toward a school’s API. I nonetheless test below whether the population of test takers changed.

With conflicting results in meta-analyses and natural experiments, and few randomized controlled trials, it seems clear that after a century of research into education production functions, more research is needed. States such as California in 1996, Florida in 2003, and Ohio in 2009 have passed class size reduction laws, devoting resources toward increasing inputs that may or may not improve outcomes. If increased inputs can lead to improved output, it must be determined how to move closer to an optimal mix of inputs. To address these questions, more natural experiments with credible exogenous variation are needed. QEIA provides such credible evidence.

1.3 Policy Description

The QEIA was preceded in California by a larger and more ambitious class size reduction policy. That policy was enacted in 1996, and continues nominally to this day. Participation was voluntary, but incentivized: districts received $650 in the first year[4] per student in a K-3 class of 20 or fewer students. However, this incentive has been diminished twice over time. In 2004 the maximum qualifying average class size was increased to just under 22, and as of February 2009 classes of 25 or more students are still eligible for 70% of the per-pupil funds, though funding is given for no more than 20 students per class.[5]

[4] This number was adjusted for inflation in subsequent years.

[5] For instance, a class of 25 or more students would receive 0.70 × 20 × the full per-pupil amount.
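To make the weakened incentive concrete, here is an illustrative calculation using the $1,071 per-pupil figure cited in footnote [7] below: a compliant class of 20 students generates 20 × $1,071 = $21,420, while a class of 25 generates only 0.70 × 20 × $1,071 = $14,994, or roughly $600 per enrolled student rather than $1,071.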
The QEIA came about as the consequence of litigation against then California Governor Arnold Schwarzenegger. The plaintiffs in the case argued successfully that the state paid less than the legislated minimum amount to kindergarten through 12th grade public schools in the 2005[6] and 2006 school years. As a result, the state was required to pay back approximately $2.7 billion to K-12 schools.

[6] Governor Schwarzenegger had reached an agreement with a coalition including the California Teachers Association to underfund education by $2 billion below the amount guaranteed by Proposition 98, which pegs education funding to growth in general funds. However, state revenue exceeded expectations, and education funding was not updated to reflect this. For more information, see Bluth (2005).

Rather than distribute the money equally across all schools, legislators decided to focus on a subset of low-performing schools. The subset was chosen on a semi-random basis, and the number of schools was chosen such that per-student funding would increase by $500 in grades K-3,[7] $900 in grades 4-8, and $1,000 in high school from 2009-2014, and by half as much in 2008.[8]

[7] For comparison, in the first full year of QEIA funding the per-pupil funding for participation in California’s existing class size reduction program was $1,071.

[8] The reduced amount in 2008 was intended to give schools a chance to prepare for full implementation of reforms by 2009.

Schools were deemed eligible to participate in QEIA if they were in the bottom quintile of the state’s 2005 academic performance distribution, as determined by the API.[9] Eligible schools had to commit to meeting the requirements of QEIA before they could participate in the selection process. Schools could choose to participate in the regular QEIA program or an alternative program. Schools in the alternative program, which are excluded from this analysis, were able to design their own reform plans, which had to be approved as part of the application to participate in QEIA. Of the 1,455 schools eligible to participate in QEIA, 1,260 chose to do so, and 88 of these chose to participate in the alternative program.

[9] Very small schools, whose API scores were considered unreliable, were excluded.

Each district with more than one participating school was required to rank its schools. It was permissible to give multiple schools the same rank, and indeed several districts did so. Districts received as many random numbers as they had participating schools, and these random numbers were assigned to each district’s schools based on the district’s rankings. For example, if a district with two schools received the random numbers 213 and 314, the highest ranked school was assigned 213 and the second was assigned 314. If a district assigned the same ranking to multiple schools, the order within that ranking was determined randomly by the California Department of Education. The selection then proceeded in four stages.

First, schools for the alternative program were selected. High schools were given priority for this program, and the number of schools was chosen such that no more than 15% of the anticipated number of students in funded schools would be in the alternative program. The high schools in the alternative program with the lowest random numbers were funded until this target was reached.[10]

[10] Several middle and elementary schools applied for the alternative program, but given the number of high schools that applied they effectively had zero probability of being chosen.

Second, to ensure geographic diversity, in each county without a funded school from the first stage, the school with the lowest random number was selected.
Districts were told that after schools were selected for the alternative program and for geographic diversity, the schools with the lowest random numbers would be funded until funds were exhausted.[11] In fact, high schools were selected separately in the third stage: to ensure the legislatively mandated fair representation of grade spans, a target number of high school students was chosen so that the proportion of high school students in funded schools would be roughly equivalent to the proportion of high school students in all participating schools. The high schools with the lowest random numbers were selected until this target was reached. Any school with at least one high school student in 2007 was considered a high school for this purpose. Finally, the elementary and middle schools with the lowest random numbers were selected until QEIA funds were exhausted.[12]

[11] The actual selection differed somewhat, as described below. That districts were told this simplified version is evidenced in contemporaneous school board minutes (Santa Rosa City Schools (2007)) and CDE presentations (Balcom (2007)). This is also the depiction in the report to the California legislature (CDE (2010)), written three years after the selection process.

[12] As a result of this process, and unbeknownst to districts prior to selection, the funding results did not always follow district rankings. For instance, a highly ranked high school could go unfunded while a lower ranked elementary school was funded. Ranking a high school ahead of an elementary school could also lead to both going unfunded when, if the difference in random numbers was sufficiently large, the elementary school would have been funded had it been ranked higher.
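To fix ideas, the four stages can be sketched in code. The following is a simplified illustration of the selection process, not the CDE’s actual procedure; the field names, cost accounting, and stopping rules are hypothetical simplifications of the rules described above.

    def simulate_selection(schools, funds, alt_student_cap, hs_student_target):
        """One replication of the four-stage lottery (simplified sketch).

        Each school is a dict with hypothetical fields: 'id', 'rand' (the
        random number assigned via the district's ranking), 'alt' (applied
        to the alternative program), 'hs' (had at least one high school
        student in 2007), 'county', 'students', and 'cost' (annual QEIA
        cost if funded).
        """
        by_rand = sorted(schools, key=lambda s: s['rand'])
        selected = set()

        # Stage 1: alternative-program high schools, lowest random number
        # first, until the student cap on the alternative program binds.
        alt_students = 0
        for s in by_rand:
            if s['alt'] and s['hs'] and alt_students + s['students'] <= alt_student_cap:
                selected.add(s['id'])
                alt_students += s['students']

        # Stage 2: geographic diversity. In each county with no school
        # selected in stage 1, select the school with the lowest random number.
        covered = {s['county'] for s in schools if s['id'] in selected}
        for county in {s['county'] for s in schools} - covered:
            lowest = min((s for s in schools if s['county'] == county),
                         key=lambda s: s['rand'])
            selected.add(lowest['id'])

        # Stage 3: regular-program high schools, lowest random number first,
        # until the high school student target is reached.
        hs_students = sum(s['students'] for s in schools
                          if s['id'] in selected and s['hs'])
        for s in by_rand:
            if (s['hs'] and not s['alt'] and s['id'] not in selected
                    and hs_students < hs_student_target):
                selected.add(s['id'])
                hs_students += s['students']

        # Stage 4: elementary and middle schools, lowest random number first,
        # until the remaining funds are exhausted.
        spent = sum(s['cost'] for s in schools if s['id'] in selected)
        for s in by_rand:
            if s['id'] not in selected and not s['hs'] and spent + s['cost'] <= funds:
                selected.add(s['id'])
                spent += s['cost']

        return selected

Replicating a function like this across many draws of the districts’ random numbers, and averaging each school’s selection indicator, yields the simulated probabilities of selection used in section 1.5.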
At the conclusion of the selection process, 25 schools had been selected for the alternative program and 463 for the regular program. One school that was selected immediately withdrew from the program, and in subsequent years 13 schools were added. For the purpose of this analysis, I consider all schools initially selected to be treated, and all participating schools not selected to be the control group. Additionally, I restrict the sample to elementary schools,[13] which account for over 70% of schools participating in the regular QEIA program.

[13] This restriction has two motivations: there are additional QEIA requirements for high schools, complicating the interpretation of the treatment, and beginning in 6th grade students are sorted into various math examinations, so compositional changes may be confounded with changes in achievement. Results that include middle and high schools are qualitatively quite similar, and are available from the author by request.

Funded schools were required to implement the following: reduce class size; align average teacher experience with their district average; ensure that all teachers in the school be considered Highly Qualified Teachers (HQT) under the federal Elementary and Secondary Education Act; satisfy the requirements of the Williams settlement, which required schools to provide qualified teachers and safe, well-maintained facilities; provide professional training to teachers and paraprofessionals; and, for high schools, increase the counselor-student ratio. According to CDE (2010), the vast majority of schools eligible to participate in QEIA were already required to meet the HQT standard and the requirements of the Williams settlement, regardless of whether they were selected to be funded. This claim is substantiated by Table 1.1, which shows that the typical participating elementary school had 94% of its teachers classified as Highly Qualified, and that 95% of participating schools were required to satisfy the terms of the Williams settlement.

The class size reduction requirement stipulated that funded schools reduce class size to 20 students per class in grades K-3.[14] In grades 4-12, class sizes had to be reduced from their baseline level[15] by 5 students, or to 25 students per class, whichever was lower. In each of the first three years of QEIA, schools were required to reduce the difference between the pre-QEIA average class size and the QEIA target class size by one third. For some schools the average in 2007 was already quite low, and the requirement was particularly strenuous for small schools with a single classroom per grade. As such, many schools applied for and were granted waivers from this requirement, and instead met a higher minimum class size requirement.

[14] This is precisely the original requirement of California’s 1996 class size reduction policy, the maximum cap of which increased over time.

[15] The baseline was the grade-level average class size in 2006, unless that average was greater than 25, in which case 2007 was used.

Under QEIA, teacher experience is measured by the Teacher Experience Index (TEI). Teachers with more than 10 years of experience are assigned 10 years in calculating the average. Part-time teachers are given full weight in the calculation, and teachers teaching at multiple schools count towards each school’s average. Funded schools are required to exceed the district average TEI.
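A minimal sketch of the TEI calculation as described above (the function name and input format are my own, not CDE’s):

    def teacher_experience_index(experience_years):
        """Sketch of the TEI: experience is capped at 10 years; part-time
        teachers receive full weight; a teacher assigned to multiple
        schools appears in each school's list.

        `experience_years` is the list of experience values (in years) for
        every teacher with an assignment at the school.
        """
        capped = [min(years, 10) for years in experience_years]
        return sum(capped) / len(capped)

    # Example: teachers with 2, 8, and 23 years of experience. The 23 is
    # capped at 10, so the TEI is (2 + 8 + 10) / 3 = 6.67.
    print(round(teacher_experience_index([2, 8, 23]), 2))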
Districts selected for QEIA are required to provide professional development opportunities for teachers, administrators, and paraprofessionals, e.g., teaching assistants. Funded schools are required to build and maintain a system for tracking participation in professional development programs, and districts are required to ensure that funded schools are in fact meeting the requirements. Participation requirements for teachers are clearly spelled out by QEIA, e.g., each year at least one third of teachers in a QEIA funded school must participate in training, but the specifics of the training program are largely left to the schools and districts.

In addition to these reforms, participation in QEIA was contingent on meeting accelerated student achievement growth targets, as measured by California’s API. The target API for all schools in California is 800; all California schools below that target have a growth target of 5% of the difference between their API and 800, or 1 point, whichever is greater. By the third year of QEIA, funded schools were required to have exceeded their growth targets on average over those first three years. A school was permitted to fall short of its growth target in the first two years of full funding without repercussions, but after the third fully-funded year schools whose average growth did not exceed their average growth targets lost QEIA funding.

1.4 Data

This analysis relies on several publicly available data sets produced by the California Department of Education. These include school-level data on API scores; a data set with a rich set of school demographics; teacher-level data; assignment-level data, including, e.g., the number of classes assigned to a teacher and the number of students in each of those classes; and subject-grade-level data on California standardized tests. Though these data sets are available for earlier years as well, I rely primarily on data from 2005-2011, with one exception: the assignment-level data were not collected in 2010 due to budget constraints. As a result, I am unable to calculate average class size, the proportion of teachers classified as HQT, or the TEI for 2010. In addition to these publicly available data, I have obtained from CDE the rankings of participating schools submitted by each district, which include a variable for whether the school applied for the regular or alternative program.

The teacher-level data are not linked from year to year. Instead, in each year teachers are assigned a new ID, whose purpose is to facilitate linking teacher-level data to assignment-level data. The teacher-level data do, however, contain a number of teacher characteristics, such as years of teaching experience, years teaching in the same district, ethnicity, gender, and education. I use these characteristics to link teachers across years within a school. If multiple teachers at a school are observationally similar, I randomly link them across years. As a result, I do not reliably observe the duration of employment spells at a school. Similarly, if a teacher leaves, and in the following year a new, observationally identical teacher enters the school, I do not observe a change in the composition of teachers. The data can, however, be used to reliably determine net changes in the characteristics of the teacher workforce at a school. I use these data to measure average teacher experience, the proportion of teachers new to a school, and the proportion of teachers new to a school who are either new to teaching or experienced but new to the district.
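A stylized sketch of this linking procedure follows; the characteristic set, the field names, and the tie-breaking are illustrative assumptions rather than the exact implementation.

    import random

    def new_to_school(last_year, this_year):
        """Sketch of the within-school, across-year linking described above.
        Teachers are linked when their observable characteristics agree
        (with experience aged by one year); observationally identical
        teachers are linked at random. Returns this year's teachers who
        could not be linked, i.e., those counted as new to the school.
        """
        def key(t, aged=0):
            # Hypothetical characteristic set used for matching.
            return (t['experience'] + aged, t['district_years'] + aged,
                    t['ethnicity'], t['gender'], t['education'])

        # Pool last year's teachers by their characteristics, aged one year.
        pool = {}
        for t in last_year:
            pool.setdefault(key(t, aged=1), []).append(t)

        unmatched = []
        candidates = this_year[:]
        random.shuffle(candidates)  # random linking among identical teachers
        for t in candidates:
            matches = pool.get(key(t), [])
            if matches:
                matches.pop()       # link t to one observationally identical teacher
            else:
                unmatched.append(t)
        return unmatched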
For my measure of class size, I restrict the set of classes to math, English, science, and self-contained classes. Self-contained classes, in which subjects such as math and English are taught by the same teacher, are the most common class type in elementary schools. This analysis excludes special education courses, vocational courses, and other electives.[16]

[16] Teachers of these excluded classes are included in the teacher experience measure, in part so that my measure of experience is not dependent on data missing in 2010. The TEI is based on a subset of classes similar to that which I use to calculate average class size.

It has become common in the education production function literature to use student performance on standardized tests as a measure of the output of the education production process. Standardized tests surely fail to capture a number of cognitive and non-cognitive skills that an educational system is expected to impart to students. However, there is evidence that variation in school inputs that increases test scores also has a positive impact on a number of later-life outcomes, such as the probability of attending college, selectivity of college, and income (Chetty et al. (2011a), Chetty et al. (2011b)).

Often a student’s performance on standardized tests is the outcome in a regression that includes measures of scholastic inputs and the student’s performance in previous years as controls. The use of California’s API in this analysis is similar, but the API differs from student-level assessment scores in important ways. Notably, the API is an average of performance not just across students, but across subjects and even test types. For instance, an elementary school in 2010 would have administered an English and language arts test in grades 2-5, a math test in grades 2-5, and a science exam in grade 5. Additionally, two alternative exams, the California Modified Assessment and the California Alternative Performance Assessment, would have been administered to students with varying degrees of cognitive impairment. The API for that school is a weighted average across all these tests, subjects, and grades.[17] Nonetheless, the API is California’s primary tool for assessing academic performance, and the goal of QEIA was to improve API scores, so I include it in my analysis. I standardize API scores within years with respect to the distribution of all elementary school APIs.

[17] The average is weighted by the proportion of students for whom there is a valid score, and each subject and test receives an additional weight.

I supplement this measure of student achievement with grade-subject-level data on California’s primary standardized test, the eponymous CST. California makes publicly available the mean scaled score[18] and the percent of students whose scores fall into particular bins, referred to as proficiency levels. I use these data for 2nd-5th grade math and English language arts tests.

[18] The scaling of scores takes into account changes in the difficulty of tests across years, and therefore makes yearly comparisons more meaningful.
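A minimal sketch of the within-year standardization described above (function and variable names are my own):

    from statistics import mean, pstdev

    def standardize_within_year(api_by_year):
        """Express each school's API in standard deviation units of that
        year's distribution of all elementary school APIs. `api_by_year`
        maps a year to a dict of {school_id: api}.
        """
        standardized = {}
        for year, apis in api_by_year.items():
            mu = mean(apis.values())
            sd = pstdev(apis.values())
            standardized[year] = {sid: (api - mu) / sd
                                  for sid, api in apis.items()}
        return standardized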
Table 1.1 lists descriptive statistics for all funded and unfunded elementary schools in 2007, both overall and for schools for which p_i, the probability that a school is selected, lies between 0.10 and 0.90, as well as descriptive statistics for the restricted sample in 2011. The restricted sample is similar to the full sample with two notable exceptions: a much smaller proportion of schools in the restricted sample are in Los Angeles, and unfunded schools in the restricted sample have a higher TEI. Both funded and unfunded schools in QEIA typically had high proportions of students who were Hispanic, English language learners, eligible for free and reduced price lunch, and whose parents did not have a college degree. In 2007, a typical school in my sample had at least one third of its teacher workforce new to the school that year.[19] New teachers did, however, tend to have nearly three years of experience.

[19] Recall that my teacher-level data set can only distinguish net changes in teacher characteristics. New teachers who are observationally identical to departing teachers from the previous year are not recorded as new, and thus the one-third estimate is a lower bound.

From Table 1.1 it is apparent that the relative reduction in class size in funded schools in kindergarten through third grade is driven by increased class sizes in unfunded schools, while the relative reduction in class size in fourth and fifth grade is driven by smaller class sizes in funded schools. The QEIA requirement for class sizes in kindergarten through third grade replicated that of California’s prior class size reduction policy, the incentives for which were drastically weakened in the first year of QEIA. This weakened incentive led many unfunded schools to gradually increase class sizes in the lower grades.

1.5 Identification

Estimating the causal effect of QEIA is complicated by two facts: districts ranked schools according to unobserved objectives, and districts with more participating schools were more likely to be chosen at least once. A simple comparison of funded and unfunded schools within a district would surely be biased, though the direction of bias would depend on the district’s objective function. A comparison across even just the highest ranked schools in each district would also likely be biased, since larger districts, e.g., Los Angeles Unified, were almost certain to have their highest ranked schools funded, and the size of a district could be correlated with potential outcomes. Even within a school over time, potential outcomes might be correlated with treatment if districts gave higher rankings to schools poised to improve.[20]

[20] There is anecdotal evidence that this did in fact happen: in personal communication, a CDE employee who was on Sacramento’s school board when schools were ranked told me that one school was ranked highly because, with a new and talented principal soon starting there, it was poised to improve.

Instead, my estimation strategy relies on the following intuition: if we were to compare only schools that had an equal probability of being funded, e.g., 50%, then within that group treatment is random, and an OLS estimate would be consistent and unbiased. For each probability we could repeat this exercise, yielding treatment effects conditional on each probability. By the Law of Iterated Expectations, the unconditional average treatment effect could then be recovered. As Wooldridge (2004) shows, the result of an exercise like this is equivalent to the following population specification for τ_ATE, the average treatment effect:[21]

τ_ATE = E[T_i y_i / p_i − (1 − T_i) y_i / (1 − p_i)] = E[(T_i − p_i) y_i / (p_i (1 − p_i))] = E(y_i1 − y_i0),  (1.1)

where T_i is an indicator for treatment, in this case being funded by QEIA; y_i is an outcome measure; y_i0 is the outcome for school i if it is not selected, and y_i1 is the outcome for school i if it is selected; and p_i is the probability of selection, i.e., the propensity score. E(y_i1 − y_i0) is the average treatment effect: it captures the average change in outcome caused by QEIA. The parameter in equation (1.1) can be estimated using its sample analog.

[21] Wooldridge (2004) shows that the coefficient in a “conditional linear projection” of outcome on treatment, where the conditioning is on the probability of selection, can be averaged across probabilities to yield this form of the average treatment effect. He also notes several alternative and asymptotically equivalent forms of the estimator. The estimator is similar to that used in Horvitz et al. (1952). See also Imbens et al. (2009) and Wooldridge (2010).
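A minimal sketch of this sample analog, under the assumption that outcomes, treatment indicators, and the simulated propensity scores are held in arrays (names are my own):

    import numpy as np

    def ipw_ate(y, T, p):
        """Sample analog of equation (1.1): the inverse-probability-weighted
        average treatment effect with known propensity scores. y holds
        outcomes, T the 0/1 funding indicators, and p the probabilities of
        selection.
        """
        y, T, p = (np.asarray(a, dtype=float) for a in (y, T, p))
        k = (T - p) * y / (p * (1 - p))  # the summand in (1.1)
        tau_hat = k.mean()
        se = k.std(ddof=1) / np.sqrt(len(k))
        return tau_hat, se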
Given the selection mechanisms, determining the functional form of p_i is complex.[22] However, given the rules of selection and districts’ rankings, I am able to determine the true propensity score (up to an arbitrarily small error) by simulation: I randomly assign the numbers 1-1,260 to districts and replicate the selection process 1 million times.[23]

[22] Were the total number of high schools and elementary or middle schools predetermined, the problem would be considerably simpler, and p_i would be based on a summation of hypergeometric functions, weighted by the probability that the district has a school selected for geographic diversity. Since the number of schools selected depended ultimately on the number of students in each grade level in each school, the problem is considerably more complicated.

[23] As a result, my estimates have a standard error of sqrt(p_i(1 − p_i))/1000, which at its largest is 0.0005. Given this precision, I refer to my estimates as the true probabilities of selection. The actual random numbers assigned to districts are made publicly available by CDE. Using these, and the district rankings, my simulation of the selection process perfectly predicts funded schools.

This method allows the causal effect of QEIA to be non-parametrically identified if two assumptions are satisfied. First, treatment must be mean independent of the potential outcomes conditional on the propensity score (i.e., E(y_j | T, p) = E(y_j | p), j ∈ {0, 1}). This requirement is satisfied by the nature of the selection process. The second requirement is that there can be no schools for which p_i = 1 or p_i = 0. The intuition for this requirement is straightforward: among the schools for which p_i = 0 or p_i = 1, there is no variation in treatment, and so these schools contribute nothing to identification. Among schools participating in QEIA, some were in counties with only one participating school, and that school was therefore selected with probability one. Conversely, the middle and elementary schools that applied for the alternative program had zero probability of being selected. There were also many schools, e.g., Los Angeles Unified’s highest ranked schools, whose probability of selection was near one, and many, e.g., Los Angeles Unified’s lowest ranked schools, whose probability of selection was very near zero.

In practice, researchers drop observations with probability of treatment “close” to zero or one. Crump et al. (2009) suggest discarding observations less than α away from zero or one, where α satisfies the following:

1 / (α(1 − α)) = 2 · E[1 / (p_i(1 − p_i)) | 1 / (p_i(1 − p_i)) < 1 / (α(1 − α))].  (1.2)

As a general rule of thumb, Crump et al. (2009) suggest using α = 0.10. After dropping schools for which p_i is identically zero or one, I am able to calculate (1.2), and α = 0.10 nearly satisfies the requirement exactly. I therefore restrict the sample to schools for which p_i ∈ [0.10, 0.90].
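A sketch of how the cutoff implied by (1.2) can be found numerically, by searching over candidate values of α (the grid and function name are my own choices):

    import numpy as np

    def crump_alpha(p, grid=None):
        """Numerical search for the cutoff in equation (1.2). For each
        candidate alpha, largest first (i.e., smallest trimming bound),
        keep observations with 1/(p(1-p)) below the bound and check whether
        the bound is at least twice their mean; the first alpha satisfying
        the condition approximates the solution of (1.2).
        """
        p = np.asarray(p, dtype=float)
        g = 1.0 / (p * (1.0 - p))
        if grid is None:
            grid = np.linspace(0.001, 0.499, 499)
        for alpha in grid[::-1]:
            bound = 1.0 / (alpha * (1.0 - alpha))
            kept = g[g < bound]
            if kept.size and bound >= 2.0 * kept.mean():
                return float(alpha)
        return float(grid[0])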
To examine whether funded and unfunded schools share common support across p_i, Figure 1.1 graphs the number of funded and unfunded elementary schools by bins of p_i. Since schools are not uniformly distributed within each bin, we should not necessarily expect the proportion of schools funded to be at the midpoint of each bin, even in the population.

Though consistent, the sample analog to (1.1) is not efficient: as Hahn (1998) shows, it fails to achieve the semiparametric efficiency bound. Hirano, Imbens, and Ridder (2003) show that a two-step estimator, in which the first step estimates the probability of treatment using a logit series estimator, does achieve the semiparametric efficiency bound, even when the true probability is known. This puzzle is well known in the econometric literature (Henmi et al. (2004), Hitomi et al. (2008), Prokhorov et al. (2009), Han et al. (2011)), though as far as I know the result has never been applied empirically, presumably because the probability of treatment is rarely known, as it is in this case. Though seemingly counter-intuitive, this result rests on a well-known fact: even under exogenous treatment, if variation in the outcome can be explained by variation in other observables, partialling out this variation results in more efficient estimation. This same principle leads to the inclusion of covariates in an OLS estimate with random and dichotomous treatment: an OLS estimate of the causal effect is consistent and unbiased without covariates, but is more precise when covariates that explain variation in the outcome are included.

Wooldridge (2010) makes explicit the application of this intuition. Consider k_i = (T_i − p_i) y_i / [p_i (1 − p_i)], where E(k_i) = τ, my population parameter of interest. We could of course estimate τ using the sample average of k_i, but doing so treats variation in k_i that is explained by variation in covariates as noise, leading to inefficient estimation. If instead we were to estimate p_i in a first stage using a logit model, as Hirano et al. (2003) suggest, this would be equivalent to regressing k̂_i = (T_i − p̂_i) y_i / [p̂_i (1 − p̂_i)], where p̂_i is the predicted probability from the first stage, on a constant and d̂_i = X_i (T_i − p̂_i): the constant would be an estimate of τ, and the residuals can be used to estimate the variance of τ̂. To the degree that d̂_i explains variation in k̂_i, τ̂ gains efficiency.

Another way to reach the same conclusion is to note that E(k_i − τ) = 0 and E(d_i) = 0 are moment conditions. Estimating τ while treating p_i as known disregards the second moment condition, which, so long as it is correlated with the first moment condition, contains useful information that we incorporate in estimation by treating p_i as unknown. With known p_i, the gains in efficiency can be achieved by regressing k_i on d_i = X_i (T_i − p_i). This is equivalent to what Qian et al. (1999) call an augmented GMM estimator, in which efficiency gains are achieved with moment conditions that are not a function of the parameter of interest.

I provide results using the sample analog of (1.1), which I refer to as those with one moment condition, and results that regress k_i on d_i, where X_i includes an indicator for having met the growth target, the proportion of students eligible for Free and Reduced Price Lunch, enrollment, standardized API, and the percents of students who are Hispanic, English language learners, and migrant. I use the 2007 values of these variables. As illustrated in Table 1.2, the sample moment conditions implied by E(d_i) = 0 are all quite close to zero.
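A minimal sketch of this augmented estimator with known p_i, regressing k_i on a constant and d_i (the names and the robust-variance choice are my own):

    import numpy as np

    def augmented_ipw(y, T, p, X):
        """Regress k_i = (T_i - p_i) y_i / [p_i (1 - p_i)] on a constant
        and d_i = X_i (T_i - p_i). The intercept estimates tau; it gains
        precision to the extent d_i explains variation in k_i.
        """
        y, T, p = (np.asarray(a, dtype=float) for a in (y, T, p))
        X = np.asarray(X, dtype=float)      # n-by-K covariate matrix
        k = (T - p) * y / (p * (1 - p))
        d = X * (T - p)[:, None]
        Z = np.column_stack([np.ones(len(k)), d])
        beta, *_ = np.linalg.lstsq(Z, k, rcond=None)
        resid = k - Z @ beta
        # Heteroskedasticity-robust (sandwich) variance for the intercept.
        bread = np.linalg.inv(Z.T @ Z)
        meat = Z.T @ (Z * (resid ** 2)[:, None])
        V = bread @ meat @ bread
        return beta[0], np.sqrt(V[0, 0])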
1.6 Results

1.6.1 Regression Results

For comparison, I first present results based on various regression specifications, with the main QEIA requirements as outcome variables. It is important to note that, unlike the IPW estimator, consistent estimation of average treatment effects by the regression models depends on assumptions that might not be satisfied. Each regression has a full set of year dummies, excluding 2005, and interactions between a treatment indicator and the year dummies. For expositional purposes the table includes only the coefficient on the interaction between the treatment indicator and the dummy for 2011.[24] I present results from regressions on the full sample as well as regressions on the restricted sample for which p_i ∈ [0.10, 0.90].

[24] The full set of results is available from the author by request.

For each main QEIA requirement, I present five regression models. Model 1 includes only the year dummies and the interactions between a treatment indicator and the year dummies. This model consistently estimates the effect of QEIA on outcomes only if treatment is uncorrelated with potential outcomes. Since the probability of treatment depends on district rankings, as well as on the size of the district, we would not expect this assumption to be satisfied. The second regression model adds to Model 1 an interaction between the year dummies and the probability of treatment, p_i. If there is no heterogeneity in the treatment effect, Model 2 will consistently estimate it. If there is heterogeneity, then consistent estimation of the average treatment effect requires Var(T|p) to be uncorrelated with potential outcomes (Wooldridge (2004)); there is of course no way to know whether this condition is satisfied. Model 3 also nests Model 1, and includes as covariates an indicator for whether the school met its growth target in 2007, an indicator for whether the school is in Los Angeles Unified, the percent of students eligible for free and reduced price lunch in 2007, and enrollment in 2007. If, conditional on these covariates, treatment is uncorrelated with potential outcomes, the average treatment effect will be consistently estimated. Models 4 and 5 build on Model 1 by including fixed effects in the former, and the above mentioned covariates with fixed effects in the latter. Fixed effects estimation requires treatment to be uncorrelated with trends in potential outcomes; this assumption would be violated if districts ranked highly those schools that were primed for improvement.
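A sketch of Model 1 in code, under the assumption that the data are held in a long panel with hypothetical column names:

    import pandas as pd
    import statsmodels.api as sm

    def model_1(df, outcome, base_year=2005):
        """Regress the outcome on year dummies (the base year omitted) and
        treatment-by-year interactions. `df` needs columns 'year',
        'treated' (0/1), and the outcome; column names are my own.
        """
        X = pd.DataFrame(index=df.index)
        for y in sorted(set(df['year']) - {base_year}):
            X[f'y{y}'] = (df['year'] == y).astype(float)
            X[f'treat_x_y{y}'] = X[f'y{y}'] * df['treated']
        X = sm.add_constant(X)
        return sm.OLS(df[outcome], X).fit(cov_type='HC1')

    # The coefficient on 'treat_x_y2011' corresponds to the interaction
    # reported in Table 1.3.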
As Table 1.3 illustrates, the estimated effect of QEIA on class size is robust to a broad range of specifications and to the sample restriction. Average class size is estimated to have decreased in selected schools by about 4.5 students per class. In the full sample, estimates of the effect of QEIA on teacher experience are similarly robust to a broad range of specifications. Average experience appears to have decreased by 0.73 to 0.98 years, suggesting that funded schools were not able to reduce class size by hiring more experienced teachers. In the restricted sample, the standard errors are generally larger and the effects are smaller in each model, suggesting at most a 0.74-year reduction in average teacher experience, significant at the 10% level.

The regression estimates of the effect of QEIA on schools’ API vary across models. Controlling for the probability of treatment, the API in funded schools is estimated to have increased by 0.41 standard deviations (p < 0.001) in the distribution of APIs across all elementary schools in California. At the other extreme, controlling for covariates suggests QEIA had no effect on schools’ API. Estimates from the restricted sample are precise and less widely dispersed, with a maximum of 0.4 standard deviations and a minimum of 0.26 standard deviations.

Results for 5th grade assessments in math and English language arts vary across specifications, with effects for math being larger across specifications in both the full and restricted samples. In the full sample, the effect of QEIA on math scores varies from 0.25 (p < 0.01) to 0.46 (p < 0.001) standard deviations in the population of all school-level averages in California, and the effect on ELA scores ranges from an insignificant 0.06 to 0.30 (p < 0.001). The estimates are more uniform in the restricted sample, varying from 0.41 (p < 0.001) to 0.50 (p < 0.001) standard deviations in math, and from 0.19 (p < 0.01) to 0.30 (p < 0.001) in ELA.

There is some evidence from the regression results of an increase in persistence in funded schools, measured by the percent of students who were in the same school from the beginning of the school year through the time assessments were administered. A causal effect of QEIA on the composition of students in a school could suggest that it did not benefit particular students, but rather affected the likelihood that better students remained in the school. However, the estimated effects are small relative to the baseline of about 90% (see Table 1.1), precise only in some specifications, and then only significant at the 5% level. More importantly, as indicated below, results from IPW estimates suggest no change in student characteristics.

1.6.2 Main Results

Unlike the regression results above, the IPW estimates depend only on the assumption that the randomization was carried out correctly, and by all accounts it was. The remainder of the paper therefore focuses on these estimates, presenting those that depend on one and on two sets of moment conditions.[25]

[25] Average treatment effects on the treated are available from the author by request. The point estimates are quite similar to the average treatment effects, and are too noisy to distinguish from the average treatment effects.

Table 1.4 shows the causal effect of QEIA on average class size, the percent of teachers classified as highly qualified, average teacher experience, and the TEI. The point estimate on class size in 2009, the first full year of QEIA funding, suggests that QEIA reduced class size, but the effect is imprecisely measured. The standard errors are much smaller using two sets of moment conditions, though they are still larger than those from the regressions. Using estimates based on two moment conditions, in the final year for which class size data are available, QEIA reduced class size by 4.35 students per class, an estimate that is significant at the 0.001 level.

Consistent with the claim that both funded and unfunded schools were required to have high proportions of HQT teachers, being funded had no causal impact on the proportion of HQT teachers. The estimates that rely on two sets of moment conditions are all practically small, precisely measured, and not statistically discernible from zero. Similarly, teacher experience does not appear to have been affected by QEIA. From 2009 through 2011 the point estimates using both one and two sets of moment conditions are neither positive nor statistically discernible from zero, as evidenced by Table 1.4. The point estimates based on two moment conditions are smaller in absolute value than the regression estimates, both of which are small relative to the 2007 baseline of 11.81 years. The marginally significant effects on the TEI in 2007 are presumably spurious, and cast doubt on the significant differences in 2008 and 2009 using one moment condition, and in 2008 using two moment conditions. Even taking the point estimates at face value, QEIA appears to have reduced teacher experience, whether measured in years or by the TEI.

The estimated effect of QEIA on student achievement, as measured by California’s API, is quite similar to the regression estimates based on the restricted sample, as shown in Table 1.5. Using two sets of moment conditions,[26] QEIA increased API scores in funded schools by 0.35 standard deviations (p < 0.001) by 2011, with respect to the population of all elementary school-level averages. The effect for Hispanic students is significantly larger than for all students by 2011 (p = 0.068), as is the effect for low-SES students (p = 0.075). From 2008 onward there is a clear pattern of funded schools improving relative to unfunded schools.

[26] Effects on the 2007 API scores are not calculated using both moment conditions, because I use 2007 API scores in the second set of moment conditions.
Table 1.4 shows the causal effect of QEIA on average class size, the percent of teachers classified as highly qualified, average teacher experience, and the TEI. The point estimate on class size in 2009, the first full year of QEIA funding, suggests that QEIA reduced class size, but the effect is imprecisely measured. The standard errors are much smaller using two sets of moment conditions, though they remain larger than those from the regressions. Using estimates based on two sets of moment conditions, in the final year for which class size data are available, QEIA reduced class size by 4.35 students per class, an estimate that is significant at the 0.001 level. Consistent with the requirement that both funded and unfunded schools maintain high proportions of HQT teachers, being funded had no causal impact on the proportion of HQT teachers. The estimates that rely on two sets of moment conditions are all practically small, precisely measured, and not statistically discernible from zero. Similarly, teacher experience does not appear to have been affected by QEIA. From 2009 through 2011 the point estimates using both one and two sets of moment conditions are non-positive and not statistically discernible from zero, as Table 1.4 shows. The point estimates based on two moment conditions are smaller in absolute value than the regression estimates, and both are small relative to the 2007 baseline of 11.81 years. The marginally significant effects on the TEI in 2007 are presumably spurious, and cast doubt on the significant differences in 2008 and 2009 using one moment condition, and in 2008 using two moment conditions. Even taking the point estimates at face value, QEIA appears if anything to have reduced teacher experience, whether measured in years or by the TEI. The estimated effect of QEIA on student achievement, as measured by California's API, is quite similar to the regression estimates based on the restricted sample, as shown in Table 1.5. Using two sets of moment conditions, [Footnote 26: Effects on the 2007 API scores are not calculated using both sets of moment conditions, because 2007 API scores are used in the second set of moment conditions.] QEIA increased API scores in funded schools by 0.35 standard deviations (p < 0.001) by 2011, with respect to the population of all elementary school-level averages. The effect for Hispanic students is significantly larger than for all students by 2011 (p = 0.068), as is the effect for low-SES students (p = 0.075). From 2008 onward there is a clear pattern of funded schools improving relative to unfunded schools. These estimates capture the causal effect of QEIA at the school level. However, it is possible that these results are driven partly by changes in the composition of students in response to QEIA. For instance, it may be that especially savvy parents, whose children are more likely to receive extra support, became aware of QEIA and selected into a QEIA school. Although I cannot currently observe student-level characteristics, I do observe school-level averages of characteristics such as Free and Reduced Price Lunch eligibility, whether parents have a college degree, race, and whether the student was enrolled in the school from the beginning of the school year through the time assessments were administered. Table 1.6 displays the results for FRPL, [Footnote 27: Estimates relying on two sets of moment conditions for FRPL in 2007 are not calculated, since FRPL in 2007 is used in that set of moment conditions.] the percent of students whose parents have a college degree, and the percent of students who were in the school the prior year. Focusing on the estimates based on two sets of moment conditions, no coefficient is significant, and the magnitudes of the point estimates are quite small. Similarly, Table 1.7 shows no discernible impact on student enrollment, proportion black, or proportion Hispanic. This is consistent with the student population not changing in response to QEIA. However, it is still possible that the population of test takers at these schools changed in response to QEIA. This is particularly concerning since schools were required to improve API scores in order to remain in the program, increasing the stakes of the tests. Schools could manipulate their API scores by encouraging more students to take alternative tests, [Footnote 28: Alternative tests are included in the calculation of the API, but presumably the marginal student would find the regular California Standardized Test challenging, and the California Alternate Performance Assessment or the California Modified Assessment less so.] by discouraging low-performing students from taking any test, or by manipulating answer sheets. Of these possibilities, I am currently able to observe the number of students for whom there is a valid score, and the number who take the regular standardized test. Table 1.8 displays the results of this analysis. There is no evidence that the number of valid scores differed between funded and unfunded schools, either before or after QEIA. Similarly, there is no evidence of a difference in the number of valid scores for low-SES students or for Hispanic students. Neither is there evidence of a change in the proportion of students taking the regular standardized test. Though this is not definitive evidence, it is at least consistent with the population of test takers not changing in response to QEIA. Two key policy levers of QEIA, decreased class size and increased teacher experience, require changes to the teacher workforce at a school. Using the teacher-level data, I am able to observe the net changes in teacher characteristics at a school. Tables 1.9 and 1.10 list the results from this analysis. I examine differences caused by QEIA in the proportion of teachers new to the school; new to the school but not new to the district; [Footnote 29: A teacher is new to the school but not the district if no teacher with the same characteristics is observed in the school the prior year, and the teacher has more than one year of experience in the district.] average experience conditional on being new to the school; and the proportion of probationary, tenured, and temporary teachers.
In 2009 QEIA appears to have caused an increase in the proportion of teachers new to the school in funded schools relative to unfunded schools: in that year there were 7 percentage points more new teachers in funded schools (p < 0.10). The similar estimates for the change in new teachers with experience in the district suggest that nearly all teachers new to the school had experience in the district. Comparing the set of teachers new to a school in funded and unfunded schools, average experience is 0.92 years greater in funded schools in 2009 (p < 0.10). Table 1.10 lists the change in the proportion of teachers who are probationary, tenured, and long-term substitutes. The differences between funded and unfunded schools in the proportion of probationary teachers before 2008, significant at the 10% and 5% levels, are presumably spurious, casting some doubt on the results in later years. Assuming that the more precisely estimated differences in the proportion probationary after 2008 are not spurious, there is evidence of an increase in probationary teachers caused by QEIA. There is also significant evidence of fewer tenured teachers in funded schools (p < 0.05), and more long-term substitutes (p < 0.10). Tables 1.11 and 1.12 show the effect of QEIA on class sizes at the grade level. Estimates based on one set of moment conditions are noisy, and are at no point statistically different from zero. Class sizes in kindergarten through 3rd grade are not affected by QEIA until 2011, at which time there are 3.0 to 4.2 fewer students per class in those grades in funded schools. As mentioned above, class size data are not available in 2010, and the difference in class sizes in these earlier grades is driven by unfunded schools exiting the previous class size reduction program. Estimates based on two sets of moment conditions suggest that grades 4 and 5 decreased class sizes by about 4.8 (p < 0.001) and 4.4 (p < 0.01) students per class in 2009, respectively, and by 5.5 (p < 0.001) to 6.1 (p < 0.001) students in 2011. The effect of QEIA on API scores is important, since the primary goal of the policy was to improve API scores. However, given that the API is an average across students, grades, subjects, and even test types, changes in API scores are hard to interpret or to compare to other findings in the literature. Tables 1.13 and 1.14 therefore list the estimated effects of QEIA on mean scaled scores from California's Standardized Test for math and English language arts. The effects of QEIA on math scores are greater in later years of the program and in higher grades. There is no discernible effect on 2nd grade math scores until 2011, when they are 0.29 standard deviations higher in funded schools, with respect to the population of grade-level averages (p < 0.001; 0.13 student-level standard deviations). The 3rd grade math scores increase one year earlier, in 2010, by 0.18 standard deviations (p < 0.05; 0.08 student-level standard deviations), and by 0.29 standard deviations by 2011 (p < 0.01; 0.12 student-level standard deviations).
Math scores in 4th grade improve earlier: by 2009 they show an increase of 0.32 standard deviations (p < 0.001; 0.15 student-level standard deviations), rising to 0.40 (p < 0.001; 0.17 student-level standard deviations) in 2010 and leveling off in 2011 at 0.40 (p < 0.001; 0.17 student-level standard deviations). Interestingly, 5th grade math scores do not begin improving until 2010, at which time they were 0.37 standard deviations higher in funded schools (p < 0.001; 0.17 student-level standard deviations), and by 2011 they were 0.42 standard deviations higher (p < 0.001; 0.19 student-level standard deviations). Consistent with results from the vast majority of education reforms, the effects are smaller for English language arts. Still, the previous pattern persists: effects are larger in later years and in later grades, and there is an effect on 4th grade test scores in 2009 but not on 5th grade test scores. By 2011, ELA scores in 2nd grade are 0.20 standard deviations higher in funded schools (p < 0.01; 0.09 student-level standard deviations), and in 5th grade they are 0.23 standard deviations higher (p < 0.001; 0.11 student-level standard deviations). To better understand the effects of increased exposure to QEIA, Figure 1.2 replicates the information in the tables, displaying the average treatment effect on class size and achievement at the grade level, but by cohort exposure. Each panel in the figure displays the change in class size and achievement that a group of students with a normal grade progression would face. For instance, students in panel A enter kindergarten in 2005, and those who progress one grade each year are exposed to QEIA for one year, in 2009. Since class size data are not available in 2010, I instead use the same-grade average from 2009 and 2011; e.g., the 3rd grade class size in 2010 is the average of 3rd grade class size in 2009 and 2011. As the figure suggests, consecutive years of smaller classes do not lead to a widening of the achievement gains. Additionally, though it is not possible to empirically separate the effects of teacher training, high-stakes testing, and reduced class size, if teacher training or high-stakes testing explained the improved scores, their timing would have to be correlated with the changes in class size. Since the reductions in 2nd and 3rd grade class sizes are delayed in the same manner that the relative changes in test scores are delayed, it seems likely that the effect is driven by class size. Otherwise, there would have to be some reason that professional training and accountability pressure were also delayed. Though professional training is not observed, test scores in each of the first three years counted toward the achievement target, and it therefore seems unlikely that schools would not respond to that pressure until the third year of the program.

1.7 Conclusion

California's QEIA provides a unique opportunity to study the causal effects of school reform. Using district rankings and the details of the selection process, the probability of any school being selected is known. Between any two schools with the same probability of selection, being funded is uncorrelated with potential outcomes. Using this, and relying on methods described in Wooldridge (2004), Hirano, Imbens, and Ridder (2003), and Qian and Schmidt (1999), I am able to estimate the causal impact of QEIA by inverse probability weighting. Doing so, I find that QEIA caused a decrease in class size, and had no discernible effect on teacher experience.
Two of the other QEIA requirements applied to all QEIA-eligible schools, and are therefore not considered part of the treatment here. The remaining components of the treatment, professional training for teachers, which is unobserved, and the added incentive to increase achievement in order to maintain funding, may also contribute to the improvement in test scores. Test scores improved significantly, albeit unevenly across grades and years. Grades 4 and 5, in which class sizes were first reduced, experienced the largest and earliest increases in test scores. In the first fully-funded year of the program, math scores in 4th grade increased by 0.32 standard deviations in the population of school-grade averages, and by the second fully-funded year 5th grade math scores improved by 0.36 standard deviations. The improvement in test scores in 2nd and 3rd grade, like the reduction in class sizes in those grades, occurred later and was more modest. By the third fully-funded year of the program, math scores in 2nd grade were 0.28 standard deviations higher in the distribution of school-grade averages, and 0.27 standard deviations higher in 3rd grade. For teacher professional training and added test pressure to explain the improvement, they would have to exhibit a similar pattern of implementation across grades and years. Gains in English language arts were modest, but exhibit the same pattern across grades and years. To estimate the cost effectiveness of QEIA, I consider the expense that would be incurred by replicating the intervention for each of three cohorts of students: those in 2nd, 3rd, and 4th grade in 2009. As an upper bound, I treat the expenses that were incurred in addition to the per-pupil allocation as fixed costs. These include an annual expense of $2 million for county superintendents, $1.177 million annually for CDE staff, and a one-time expense of $5 million for regional support offices. The lower bound treats these expenses as variable costs. I use the OMB nominal interest rate on 3-year treasury bills from 2009-2011 to express the PDV of costs in 2008, the first year of the program. I average math and English language arts scores, and express all effects in student-level standard deviations, with respect to the population of all California students. For the sake of comparison, I compare the cost-benefit estimates to the cost of achieving the same class size reduction as in Project STAR. The average teacher salary in California in 2008 was $64,424 (U.S. Department of Education (2009)). Following Podgursky (2006), I allow for benefits to account for 20% of compensation, so the cost of an additional teacher in 2008 is $80,530. The one-year cost of the same class size reduction as in Project STAR is therefore (1/15 - 1/23) x $80,530 ≈ $1,867 per student. Comparing the change in student test scores caused by Project STAR to those caused by QEIA is complicated by the lack of a common measure. Under the strong assumption that a standard deviation with respect to a select sample of Tennessee students in kindergarten through third grade is comparable to a standard deviation with respect to the population of all California students in 2nd through 4th grade, a class-size reduction of this magnitude would result in gains of 0.20 to 0.28 standard deviations (Krueger (1999)). Table 1.15 shows the results of this exercise for each of three cohorts of QEIA students: those in 2nd, 3rd, and 4th grade in 2009.
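The cost-per-standard-deviation figures in Table 1.15 are the ratio of the PDV of per-student costs to the standard-deviation gain. As a worked check (my own arithmetic from the table's entries),

\text{cost per SD} = \frac{\text{PDV of per-student cost}}{\text{SD gain}}, \qquad \text{e.g.,} \quad \frac{\$2{,}975}{0.11} \approx \$27{,}000,

which matches the reported upper-bound figure of $27,047 for the 4th grade cohort in 2009 up to rounding of the inputs; cells with a zero gain have an infinite cost per standard deviation.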
Where the class size requirements of QEIA duplicated the existing class size reduction program, QEIA had no effect, and was of course not cost effective. In other years and grades, the upper-bound cost per standard deviation gain in test scores is comparable to Project STAR in the first and third year of each program, while the second year of Project STAR lies closer to the lower bound. Project STAR's much more dramatic, and much more expensive, reduction in class sizes is estimated to have achieved a commensurately dramatic increase in student achievement only in its second year. Though the design of QEIA precludes separate identification of the effects of its constituent reforms, it is nonetheless a remarkable policy, unprecedented in education for being a large-scale policy intervention with random assignment. Though potentially cost-effective relative to Project STAR in the years in which it was effective, QEIA was hampered by overlap with existing policies that rendered it completely ineffective in certain years and grades. The unique design of QEIA, which accommodated district preferences for resource allocation across schools, State budget constraints, and preferences for reform design, also allows for non-parametric identification of its causal effect. Were more policies to follow this design, our understanding of the effectiveness of various reforms could be dramatically improved.

APPENDICES

APPENDIX A - FIGURES

[Figure 1.1: Support over p, All Elementary Schools. Histogram of the number of unfunded and funded schools across bins of p of width 0.10, from [0, 0.10) through (0.90, 1]. Note: Numbers above bars refer to the percentage of schools funded in that bin. Sample includes all elementary schools participating in the regular QEIA program.]

[Figure 1.2: Cohort-Level Class Size and Math Achievement Comparison. Six panels (A through F), one per cohort, plot class size (left axis) and standardized math scores on California's CST (right axis) over 2005-2011, with 95% confidence intervals. Note: Estimates are of the average treatment effect using two moment conditions. Class size data are missing for 2010; shaded regions indicate use of the average of 2009 and 2011 same-grade class size (for example, 2010 4th grade class size is the average of 2009 4th grade class size and 2011 4th grade class size). Only grades 2 and above are tested.]

APPENDIX B - TABLES

Table 1.1: Descriptives, Elementary Regular QEIA Schools
                                    All Elem.         p_i ∈ [0, 1], 2007              p_i ∈ [0.10, 0.90], 2007        p_i ∈ [0.10, 0.90], 2011
                                    2007              Unfunded        Funded          Unfunded        Funded          Unfunded        Funded
Average Class Size                  21.86 (3.54)      21.93 (1.87)    21.96 (2.04)    22.14 (1.75)    22.12 (2.05)    25.03 (3.19)    20.34 (2.19)
Class Size Kindergarten             20.64 (4.14)      20.24 (3.39)    20.41 (3.78)    20.28 (3.46)    20.70 (4.23)    23.97 (4.31)    20.23 (3.48)
Class Size 1st Grade                19.32 (1.85)      19.29 (1.40)    19.13 (1.31)    19.33 (1.47)    19.21 (1.29)    23.78 (4.07)    19.54 (2.27)
Class Size 2nd Grade                19.14 (1.87)      18.84 (1.52)    18.97 (1.40)    18.76 (1.58)    19.00 (1.34)    23.85 (4.24)    19.36 (2.54)
Class Size 3rd Grade                19.89 (3.17)      19.42 (2.71)    19.60 (3.01)    19.76 (3.18)    19.60 (3.08)    23.95 (4.49)    19.54 (2.63)
Class Size 4th Grade                28.47 (4.23)      28.01 (3.82)    28.18 (3.80)    28.21 (3.57)    28.56 (3.81)    28.14 (4.30)    22.37 (3.97)
Class Size 5th Grade                28.89 (4.15)      28.28 (3.72)    28.51 (3.77)    28.70 (3.57)    28.62 (3.95)    28.36 (4.53)    22.25 (3.35)
Average Experience                  13.01 (3.95)      11.92 (3.17)    11.61 (3.35)    12.38 (3.10)    11.70 (3.38)    13.68 (3.56)    12.95 (3.17)
TEI Relative                        -0.04 (1.05)      -0.17 (1.02)    -0.28 (1.00)    -0.07 (0.89)    -0.26 (0.98)    -0.12 (0.80)    -0.15 (0.69)
Highly Qualified Teachers           0.96 (0.11)       0.94 (0.10)     0.94 (0.14)     0.96 (0.09)     0.94 (0.15)     0.99 (0.04)     0.99 (0.11)
Williams Settlement Applies         0.24 (0.43)       0.94 (0.24)     0.96 (0.20)     0.95 (0.22)     0.95 (0.22)     0.95 (0.22)     0.95 (0.22)
Std. API                            0.00 (1.00)       -1.10 (0.48)    -1.25 (0.48)    -1.13 (0.44)    -1.15 (0.47)    -1.11 (0.56)    -0.77 (0.61)
API Percentile Rank                 0.50 (0.29)       0.16 (0.11)     0.12 (0.10)     0.15 (0.11)     0.15 (0.11)     0.17 (0.14)     0.26 (0.17)
Met Growth Target                   0.70 (0.46)       0.68 (0.47)     0.65 (0.48)     0.66 (0.47)     0.60 (0.49)     0.60 (0.49)     0.71 (0.46)
Proportion Black                    0.08 (0.12)       0.09 (0.14)     0.11 (0.16)     0.07 (0.12)     0.08 (0.13)     0.06 (0.11)     0.08 (0.12)
Proportion Hispanic                 0.46 (0.30)       0.79 (0.21)     0.74 (0.25)     0.78 (0.21)     0.76 (0.25)     0.80 (0.20)     0.78 (0.24)
Proportion White                    0.33 (0.28)       0.06 (0.10)     0.07 (0.10)     0.09 (0.13)     0.07 (0.10)     0.08 (0.12)     0.06 (0.10)
English Language Learners           0.29 (0.23)       0.55 (0.18)     0.55 (0.20)     0.55 (0.18)     0.56 (0.20)     0.51 (0.18)     0.53 (0.20)
Proportion FRPL                     0.55 (0.31)       0.89 (0.11)     0.88 (0.11)     0.87 (0.12)     0.86 (0.12)     0.89 (0.14)     0.85 (0.17)
Parent College Grad                 0.18 (0.14)       0.06 (0.06)     0.06 (0.07)     0.06 (0.06)     0.07 (0.09)     0.07 (0.04)     0.07 (0.05)
Student Enrollment                  376.60 (192.56)   472.24 (195.68) 434.61 (179.70) 445.58 (164.11) 420.82 (154.59) 405.13 (138.53) 378.30 (135.89)
Proportion Same School              0.92 (0.08)       0.91 (0.04)     0.91 (0.04)     0.91 (0.04)     0.91 (0.04)     0.92 (0.05)     0.93 (0.08)
Los Angeles                         0.09 (0.29)       0.23 (0.42)     0.09 (0.29)     0.06 (0.24)     0.03 (0.17)     0.06 (0.24)     0.03 (0.17)
Proportion Teachers New to School   0.33 (0.27)       0.30 (0.26)     0.35 (0.25)     0.34 (0.26)     0.36 (0.26)     0.50 (0.31)     0.48 (0.29)
New to School, Not District         0.24 (0.24)       0.22 (0.23)     0.25 (0.23)     0.26 (0.25)     0.25 (0.23)     0.47 (0.31)     0.43 (0.29)
Average Experience New Teachers     3.21 (3.64)       2.76 (3.18)     3.13 (3.12)     3.17 (3.47)     3.13 (3.07)     6.44 (4.88)     5.87 (4.48)
Proportion Temp Teachers            0.06 (0.10)       0.05 (0.08)     0.05 (0.10)     0.06 (0.09)     0.06 (0.11)     0.04 (0.08)     0.04 (0.07)
Proportion Probationary             0.14 (0.18)       0.13 (0.14)     0.16 (0.16)     0.15 (0.15)     0.18 (0.17)     0.06 (0.10)     0.09 (0.13)
N                                   6476              546             307             198             171             198             171
Note: Table lists means and standard deviations (in parentheses). Funded and unfunded columns include all elementary schools participating in the regular QEIA program.

Table 1.2: Sample Moment Conditions Variable Mean S.D. (Fundedi -pi ) (2007 Met Growth Target)(Fundedi -pi ) (2007 Proportion FRPL)(Fundedi -pi ) (2007 Student Enrollment)(Fundedi -pi ) (2007 Std.
API)(Fundedi -pi ) (2007 Proportion Hispanic)(Fundedi -pi ) (2007 English Language Learners)(Fundedi -pi ) (2007 Migrant)(Fundedi -pi ) -0.00 -0.02 -0.01 -5.80 0.00 -0.01 0.00 -0.01 (0.45) (0.36) (0.39) (203.35 ) (0.53) (0.36) (0.26) (0.06) Note: Sample analogs of moments in condition E[X(Fundedi −pi )] = 0. 37 Table 1.3: Select Regression Results (1) (2) (3) (4) (5) Full Sample Avg. Class Size -4.45*** (0.56) Experience -0.83** (0.26) † 0.14 (0.08) Std. API 5 5 th th -4.76*** (0.56) † -0.73 (0.43) -4.54*** (0.40) -4.62*** (0.46) -4.58*** (0.42) -0.90** (0.27) -0.93** (0.28) -0.98*** (0.25) 0.41*** (0.07) 0.08 (0.07) 0.26*** (0.06) 0.22*** (0.05) Grade Math 0.27** (0.09) 0.46*** (0.09) 0.25** (0.08) 0.37*** (0.08) 0.34*** (0.08) Grade ELA 0.12* (0.06) 0.00 (0.01) 0.30*** (0.07) 0.01 (0.01) 0.06 (0.06) 0.01* (0.00) 0.17** (0.05) 0.00 (0.01) 0.16** (0.05) 0.01* (0.00) -4.90*** (0.45) † -0.59 (0.34) 0.37*** (0.06) -4.80*** (0.45) -0.67* (0.34) 0.34*** (0.06) Enrolled Since Previous Year pi ∈ [.10, .90] Avg. Class Size -4.69*** (0.45) † -0.74 (0.38) 0.34*** (0.06) -0.61 (0.45) 0.40*** (0.07) -4.58*** (0.41) † -0.73 (0.40) 0.26*** (0.06) Grade Math 0.43*** (0.09) 0.44*** (0.09) 0.41*** (0.09) 0.49*** (0.10) 0.47*** (0.10) Grade ELA 0.26*** (0.06) 0.01 (0.01) 0.30*** (0.07) 0.01 (0.01) 0.19** (0.06) 0.01* (0.01) 0.26*** (0.06) 0.01 (0.01) 0.24*** (0.06) 0.01* (0.01) Experience Std. API 5 5 th th Enrolled Since Previous Year -4.68*** (0.50) Covariates No No Yes No Yes Propensity Score No Yes No No No Fixed Effects No No No Yes Yes † Note: indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. Standard errors robust and clustered at district level. Omitted variable is 2005 unfunded. Covariates include dummy for whether school met growth target in 2007, percent of students eligible for Free and Reduced Price Lunches in 2007, an indicator for being in LA, and total enrollment in 2007. Results are from regression with time dummies, and time dummies interacted with treatment indicator. When propensity score is included, it is interacted with time dummies. Reported coefficients are from interaction of dummy for 2011 and treatment indicator. 38 Table 1.4: Class Size and HQT Avg. Class Size 2005 2006 2007 2008 2009 HQT TEI ATE 1 ATE 2 ATE 1 ATE 2 ATE 1 ATE 2 ATE 1 ATE 2 0.10 (3.93) 0.17 (3.97) 0.05 (3.90) -0.18 (3.92) -1.61 (3.82) 0.22 (0.84) 0.25 (0.81) 0.12 (0.81) 0.08 (0.68) -1.45† (0.80) 0.01 (0.18) 0.01 (0.17) -0.01 (0.17) 0.01 (0.18) 0.01 (0.18) 0.00 (0.04) 0.00 (0.04) -0.02 (0.04) 0.00 (0.04) 0.00 (0.04) -0.00 (0.10) -0.05 (0.10) -0.23† (0.12) -0.23† (0.14) -0.18 (0.13) -4.35*** (0.95) -0.01 (0.18) -0.00 (0.04) -0.20 (0.58) -0.29 (0.64) -0.76 (0.69) -0.55 (0.66) -0.45 (0.72) -0.63 (0.72) -0.58 (0.64) -0.02 (0.10) -0.09 (0.11) -0.30* (0.13) -0.28* (0.13) -0.25† (0.13) -4.65 (4.05) -0.06 (2.16) -0.21 (2.24) -0.69 (2.16) -0.41 (2.19) -0.45 (2.33) -0.64 (2.47) -0.57 (2.39) -0.02 (0.10) 0.01 (0.12) 2010 2011 Experience Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. Class size data, which is used to calculate TEI and includes HQT data, is not available in 2010. 39 Table 1.5: Average Performance Index Std. API 2005 2006 2007 2008 2009 2010 2011 Std. 
API Hispanic ATE 1 ATE 2 ATE 1 -0.01 (0.22) 0.02 (0.22) 0.01 (0.21) 0.07 (0.19) 0.16 (0.20) 0.31 (0.20) 0.43* (0.19) -0.03 (0.05) -0.03 (0.05) 0.08 (0.21) 0.08 (0.20) 0.05 (0.19) 0.15 (0.17) 0.22 (0.18) 0.47* (0.19) 0.45* (0.18) 0.03 (0.04) 0.10† (0.06) 0.23*** (0.06) 0.35*** (0.07) ATE 2 0.04 (0.07) 0.02 (0.06) 0.04 (0.06) 0.09 (0.07) 0.15† (0.08) 0.35*** (0.09) 0.41*** (0.08) Std. API Low SES ATE 1 -0.08 (0.20) -0.04 (0.19) -0.04 (0.19) 0.05 (0.16) 0.18 (0.17) 0.39* (0.19) 0.48** (0.18) ATE 2 -0.07 (0.06) -0.07 (0.05) -0.01 (0.04) 0.03 (0.06) 0.12 (0.08) 0.31*** (0.08) 0.41*** (0.08) Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. 2007 API effect not calculated since that covariate is used in the second moment condition. 40 Table 1.6: Demographics FRPL 2005 2006 2007 2008 2009 2010 2011 Parents College Degree Enrolled Since Previous Year ATE 1 ATE 2 ATE 1 ATE 2 ATE 1 ATE 2 -0.02 (0.16) -0.03 (0.15) -0.02 (0.15) -0.02 (0.16) -0.03 (0.16) -0.03 (0.16) -0.05 (0.16) 0.01 (0.03) -0.01 (0.03) 0.01 (0.02) 0.01 (0.02) 0.01 (0.02) 0.01 (0.01) 0.01 (0.02) 0.01 (0.02) 0.01 (0.01) 0.01 (0.01) 0.01 (0.01) 0.00 (0.01) -0.00 (0.01) 0.00 (0.01) 0.01 (0.01) 0.00 (0.01) 0.01 (0.16) 0.02 (0.16) 0.01 (0.16) 0.02 (0.17) 0.02 (0.16) 0.02 (0.16) 0.02 (0.16) 0.01 (0.03) 0.02 (0.03) 0.01 (0.03) 0.01 (0.03) 0.01 (0.03) 0.01 (0.03) 0.02 (0.03) 0.01 (0.03) -0.01 (0.03) -0.01 (0.03) -0.03 (0.04) Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. FRPL, parents with college degree, and enrolled previous year are expressed in proportions. 41 Table 1.7: Demographics Continued Enrollment 2005 2006 2007 2008 2009 2010 2011 Black Hispanic ATE 1 ATE 2 ATE 1 ATE 2 ATE 1 ATE 2 -32.38 (92.10) -25.03 (84.61) -24.21 (79.61) -34.45 (83.42) -30.22 (76.39) -25.21 (78.41) -31.41 (76.97) -5.38 (20.44) -1.19 (17.32) 0.02 (0.03) 0.02 (0.03) 0.02 (0.03) 0.02 (0.02) 0.02 (0.02) 0.01 (0.02) 0.01 (0.02) 0.02 (0.01) 0.02 (0.01) 0.01 (0.01) 0.01 (0.01) 0.01 (0.01) 0.01 (0.01) 0.01 (0.01) -0.03 (0.13) -0.03 (0.13) -0.02 (0.13) -0.02 (0.14) -0.02 (0.14) -0.01 (0.13) -0.02 (0.14) -0.00 (0.03) -0.00 (0.03) -11.82 (17.90) -7.30 (16.19) -6.93 (17.30) -7.36 (18.15) 0.00 (0.03) 0.00 (0.03) 0.00 (0.03) 0.00 (0.03) Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. Black and Hispanic are expressed in proportions. 42 Table 1.8: Test Taking Valid Scores 2005 2006 2007 2008 2009 2010 2011 Number Low SES Scores Number Hispanic Scores Prop. 
CST ATE 1 ATE 2 ATE 1 ATE 2 ATE 1 ATE 2 ATE 1 ATE 2 -25.25 (79.32) -17.20 (73.57) -22.40 (65.98) -23.29 (68.51) -15.96 (67.14) -19.13 (69.82) -22.24 (68.36) -2.19 (18.25) 3.96 (14.75) -1.28 (13.21) -9.40 (15.06) -2.69 (14.40) -2.09 (15.35) -0.65 (16.16) -28.88 (74.56) -21.36 (69.07) -25.92 (62.82) -25.84 (62.13) -21.52 (61.05) -27.98 (65.13) -33.00 (63.69) -0.73 (17.27) 2.47 (13.33) -0.29 (11.86) -5.47 (14.03) -3.30 (13.87) -4.49 (14.96) -7.19 (16.20) -21.88 (59.94) -14.61 (57.90) -20.11 (55.65) -21.02 (54.45) -14.70 (53.18) -14.93 (56.92) -21.41 (55.00) 2.85 (14.73) 8.95 (11.88) 3.99 (10.46) -3.41 (12.31) 2.88 (12.12) 5.70 (12.86) 4.05 (13.62) 0.03 (0.19) 0.03 (0.17) 0.02 (0.18) 0.01 (0.18) 0.03 (0.18) 0.03 (0.17) 0.03 (0.17) 0.01 (0.04) 0.01 (0.03) 0.01 (0.03) 0.01 (0.03) -0.00 (0.03) -0.00 (0.03) 0.00 (0.04) Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. Valid, low SES, and Hispanic scores refers to all standardized tests used in API. CST is proportion of students taking the California Standardized Test, a subset of the API. 43 Table 1.9: Teacher Mobility New to school 2005 2006 2007 2008 2009 2010 2011 New to School, Not New to Dist. Avg. Experience New Teachers ATE 1 ATE 2 ATE 1 ATE 2 ATE 1 ATE 2 0.07 (0.08) 0.05 (0.07) -0.01 (0.08) 0.06 (0.07) 0.08 (0.07) 0.03 (0.09) -0.05 (0.10) 0.06 (0.04) 0.04 (0.04) -0.01 (0.05) 0.06 (0.04) 0.07† (0.04) 0.04 (0.04) 0.00 (0.05) 0.08 (0.06) 0.04 (0.05) -0.04 (0.07) 0.03 (0.05) 0.08 (0.06) 0.03 (0.08) -0.06 (0.10) 0.07 (0.04) 0.03 (0.04) -0.04 (0.05) 0.03 (0.04) 0.07† (0.04) 0.05 (0.04) -0.01 (0.05) 0.80 (0.79) 0.40 (0.71) -0.47 (0.91) 0.52 (0.81) 1.04 (0.80) 0.10 (1.04) -0.65 (1.44) 0.64 (0.56) 0.25 (0.49) -0.58 (0.66) 0.35 (0.61) 0.92† (0.55) 0.37 (0.58) -0.04 (0.76) Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. 44 Table 1.10: Teacher Composition Probationary ATE 1 2005 2006 2007 2008 2009 2010 2011 0.05 (0.04) 0.04 (0.04) 0.04 (0.03) 0.04 (0.03) 0.03 (0.03) 0.03† (0.02) 0.03 (0.02) ATE 2 0.05* (0.02) 0.03 (0.02) 0.04† (0.02) 0.04* (0.02) 0.03* (0.02) 0.04** (0.01) 0.03* (0.01) Tenured Long-term Substitute ATE 1 ATE 2 ATE 1 ATE 2 -0.04 (0.13) -0.02 (0.14) -0.02 (0.13) -0.05 (0.14) -0.07 (0.14) -0.04 (0.15) -0.05 (0.17) -0.04 (0.05) -0.03 (0.05) -0.02 (0.05) -0.05 (0.04) -0.07* (0.04) -0.04 (0.04) -0.05 (0.04) 0.00 (0.01) 0.00 (0.01) 0.00 (0.01) 0.01 (0.02) 0.03† (0.02) 0.00 (0.01) 0.01 (0.01) 0.00 (0.01) -0.00 (0.01) 0.01 (0.01) 0.01 (0.02) 0.03† (0.02) -0.00 (0.01) 0.00 (0.01) Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. Probationary, tenured, and long-term substitute are expressed in proportions. 
45 Table 1.11: Class Size Grades K-2 2005 2006 2007 2008 2009 Class Size Kindergarten Class Size 1st grade Class Size 2nd grade ATE 1 ATE 2 ATE 1 ATE 2 ATE 1 ATE 2 0.33 (3.97) 0.28 (3.84) 0.24 (3.83) 0.50 (3.88) 0.15 (3.77) 0.96 (0.99) 0.83 (0.95) 0.88 (0.93) 1.26 (0.88) 0.34 (0.91) -0.18 (3.53) -0.93 (3.58) -0.41 (3.40) -0.01 (3.47) -0.50 (3.62) 0.09 (0.72) -0.17 (0.71) -0.10 (0.73) 0.59 (0.59) 0.16 (0.72) 0.08 (3.62) -0.27 (3.61) -0.30 (3.49) -0.21 (3.42) -0.16 (3.47) 0.24 (0.70) -0.12 (0.73) 0.29 (0.70) 0.36 (0.61) 0.17 (0.68) -3.26 (4.36) -3.04** (1.10) -4.70 (4.20) -3.98*** (0.94) -4.39 (4.06) -4.17*** (1.02) 2010 2011 Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. Class size data are not available in 2010. 46 Table 1.12: Class Size Grades 3-5 2005 2006 2007 2008 2009 Class Size 3rd grade Class Size 4th grade Class Size 5th grade ATE 1 ATE 2 ATE 1 ATE 1 0.21 (3.70) -0.27 (3.64) -0.16 (3.62) -0.08 (3.61) -1.33 (3.65) -0.15 (0.74) 0.11 (0.71) -0.23 (0.80) 0.42 (0.66) -1.07 (0.78) 0.86 (5.21) 0.26 (5.10) -0.40 (4.94) -1.68 (4.73) -4.63 (4.69) 0.30 (1.21) -0.25 (1.08) 0.50 (1.23) -1.51† (0.87) -4.77*** (1.09) 1.84 (5.70) 0.42 (5.44) 0.26 (5.55) -1.89 (4.79) -3.26 (5.09) -0.45 (1.36) -0.68 (1.20) -0.14 (1.29) -1.34 (0.98) -4.36** (1.39) -4.12 (4.04) -4.20*** (0.95) -4.79 (4.80) -5.54*** (1.14) -5.48 (4.85) -6.07*** (1.29) ATE 2 ATE 2 2010 2011 Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. Class size data are not available in 2010. 47 Table 1.13: Math Standardized Test 2nd Grade ATE 1 2005 2006 2007 2008 2009 2011 -0.02 (0.08) -0.06 (0.08) -0.08 (0.07) -0.08 (0.07) 0.05 (0.08) 0.10 (0.08) 0.30*** (0.09) ATE 1 -0.09 (0.19) 0.02 (0.18) -0.07 (0.21) -0.01 (0.19) 0.07 (0.18) 0.23 (0.18) 0.32† (0.18) 4th Grade ATE 2 ATE 1 -0.08 (0.06) 0.00 (0.06) -0.05 (0.07) -0.01 (0.07) 0.04 (0.07) 0.18* (0.08) 0.29** (0.09) -0.08 (0.20) -0.05 (0.19) -0.02 (0.17) 0.04 (0.17) 0.30* (0.15) 0.40** (0.16) 0.45** (0.15) 48 2010 -0.01 (0.22) -0.04 (0.23) -0.08 (0.19) -0.06 (0.18) 0.10 (0.17) 0.16 (0.19) 0.36* (0.15) ATE 2 3rd Grade ATE 2 -0.07 (0.06) -0.07 (0.07) -0.02 (0.06) 0.03 (0.06) 0.32*** (0.08) 0.40*** (0.08) 0.40*** (0.08) 5th Grade ATE 1 -0.10 (0.20) -0.06 (0.21) -0.01 (0.18) 0.16 (0.17) 0.14 (0.16) 0.45* (0.18) 0.47** (0.16) ATE 2 -0.09 (0.08) -0.05 (0.06) 0.00 (0.06) 0.11 (0.07) 0.09 (0.07) 0.37*** (0.09) 0.42*** (0.10) Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is estimate of average treatment effect using one moment condition; ATE2 is estimate of average treatment effect using two moment conditions. Standard errors bootstrapped and clustered at district level. 
Table 1.14: ELA Standardized Test

        2nd Grade                      3rd Grade                      4th Grade                       5th Grade
        ATE 1         ATE 2           ATE 1         ATE 2            ATE 1         ATE 2             ATE 1         ATE 2
2005    0.10 (0.21)   0.09 (0.06)     -0.01 (0.20)  -0.02 (0.04)     0.03 (0.20)   0.01 (0.05)       -0.01 (0.21)  0.00 (0.07)
2006    0.00 (0.21)   -0.02 (0.06)    0.05 (0.20)   0.05 (0.06)      0.00 (0.20)   -0.03 (0.05)      -0.01 (0.21)  -0.01 (0.06)
2007    -0.06 (0.19)  -0.06 (0.05)    0.01 (0.22)   0.01 (0.07)      0.09 (0.20)   0.06 (0.05)       -0.02 (0.19)  -0.01 (0.04)
2008    -0.03 (0.18)  -0.06 (0.07)    0.00 (0.20)   -0.02 (0.06)     0.07 (0.19)   0.03 (0.05)       0.14 (0.19)   0.08 (0.05)
2009    0.09 (0.18)   0.03 (0.06)     0.05 (0.21)   -0.01 (0.07)     0.17 (0.19)   0.16* (0.06)      0.07 (0.19)   0.03 (0.05)
2010    0.10 (0.17)   0.07 (0.07)     0.19 (0.19)   0.10 (0.08)      0.24 (0.19)   0.20** (0.06)     0.26 (0.19)   0.20** (0.06)
2011    0.26 (0.16)   0.20** (0.07)   0.23 (0.19)   0.18* (0.09)     0.33* (0.17)  0.24*** (0.06)    0.32† (0.19)  0.23*** (0.07)

Note: † indicates p < 0.10, * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001. ATE1 is the estimate of the average treatment effect using one moment condition; ATE2 is the estimate using two moment conditions. Standard errors bootstrapped and clustered at district level.

Table 1.15: Cost-Benefit Analysis

Grade in 2009    PDV       SD gain 2009   SD gain 2010   SD gain 2011   Cost per SD 2009   Cost per SD 2010   Cost per SD 2011
Upper Bound
4th              $2,975    0.11           0.13           NA             $27,047            $22,885            NA
3rd              $3,609    0              0.13           0.15           ∞                  $27,764            $24,062
2nd              $3,235    0              0              0.14           ∞                  ∞                  $23,107
Lower Bound
4th              $2,059    0.11           0.13           NA             $18,720            $15,838            NA
3rd              $2,506    0              0.13           0.15           ∞                  $19,277            $16,707
2nd              $2,132    0              0              0.14           ∞                  ∞                  $15,228
Project STAR
Pre-K            $5,247    0.20           0.28           0.22           $26,239            $18,742            $23,853

Note: Upper bound of QEIA costs assumes administrative expenses are all fixed costs, while the lower bound assumes they are variable costs. Estimates of the cost of implementing the Project STAR class size reduction assume the cost of an additional teacher is $80,530. Test score gains from the Project STAR class size reduction are from Krueger (1999).

BIBLIOGRAPHY

Angrist, J.D. and V. Lavy (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. The Quarterly Journal of Economics 114.2, pp. 533–575.
Balcom, Fred (Feb. 2007). Quality Education Investment Act (QEIA) of 2006. http://www.cde.ca.gov/fg/fo/r16/documents/qeia07present.ppt. California Department of Education.
Barrett, Nathan, JS Butler, and Eugenia F Toma (2012). Do less effective teachers choose professional development? Does it matter? Evaluation Review 36.5, pp. 346–374.
Bluth, Alexa H. (Aug. 2005). Lawsuit seeking cash for schools; Governor broke his word, say teachers and schools chief. Sacramento Bee. http://www.mikemcmahon.info/ctasuit.htm.
CDE (Jan. 2010). Report to the Legislature and the Governor; Quality Education Investment Act First Progress Report. http://www.cde.ca.gov/ta/lp/qe/documents/qeialegrpt.doc. California Department of Education.
Chetty, Raj, John N Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzenbach, and Danny Yagan (2011a). How does your kindergarten classroom affect your earnings? Evidence from Project STAR. The Quarterly Journal of Economics 126.4, pp. 1593–1660.
Chetty, Raj, John N Friedman, and Jonah E Rockoff (2011b). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood.
Chiang, Hanley (2009). How accountability pressure on failing schools affects student achievement. Journal of Public Economics 93.9, pp. 1045–1057.
Chingos, Matthew M (2012).
The impact of a universal class-size reduction policy: Evidence from Florida's statewide mandate. Economics of Education Review 31.5, pp. 543–562.
Cohodes, Sarah, David Deming, Jennifer Jennings, and Christopher Jencks (2013). School Accountability, Postsecondary Attainment and Earnings. NBER Working Paper.
Crump, Richard K, V Joseph Hotz, Guido W Imbens, and Oscar A Mitnik (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika 96.1, pp. 187–199.
Dieterle, Steven (2013). Class-size Reduction Policies and the Quality of Entering Teachers.
Figlio, David N and Lawrence S Getzler (2006). Accountability, ability and disability: Gaming the system? Advances in Applied Microeconomics 14, pp. 35–49.
Greenwald, R., L.V. Hedges, and R.D. Laine (1996). The effect of school resources on student achievement. Review of Educational Research 66.3, pp. 361–396.
Hahn, Jinyong (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, pp. 315–331.
Han, Chirok and Beomsoo Kim (2011). A GMM interpretation of the paradox in the inverse probability weighting estimation of the average treatment effect on the treated. Economics Letters 110.2, pp. 163–165.
Hanushek, E.A. (1979). Conceptual and empirical issues in the estimation of educational production functions. Journal of Human Resources, pp. 351–388.
Hanushek, E.A. (1986). The economics of schooling: Production and efficiency in public schools. Journal of Economic Literature 24.3, pp. 1141–1177.
Hanushek, E.A. (1996). A more complete picture of school resource policies. Review of Educational Research 66.3, pp. 397–409.
Hanushek, E.A. (1997). Assessing the effects of school resources on student performance: An update. Educational Evaluation and Policy Analysis 19.2, p. 141.
Hanushek, E.A. (1999). Some findings from an independent investigation of the Tennessee STAR experiment and from other investigations of class size effects. Educational Evaluation and Policy Analysis 21.2, p. 143.
Henmi, Masayuki and Shinto Eguchi (2004). A paradox concerning nuisance parameters and projected estimating functions. Biometrika 91.4, pp. 929–941.
Hirano, Keisuke, Guido W Imbens, and Geert Ridder (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71.4, pp. 1161–1189.
Hitomi, Kohtaro, Yoshihiko Nishiyama, and Ryo Okui (2008). A puzzling phenomenon in semiparametric estimation problems with infinite-dimensional nuisance parameters. Econometric Theory 24.06, pp. 1717–1728.
Horvitz, Daniel G and Donovan J Thompson (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47.260, pp. 663–685.
Hoxby, C.M. (2000). The effects of class size on student achievement: New evidence from population variation. Quarterly Journal of Economics 115.4, pp. 1239–1285.
Imbens, Guido W and Jeffrey M Wooldridge (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47.1, pp. 5–86.
Jacob, Brian A (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics 89.5, pp. 761–796.
Jacob, Brian A and Steven D Levitt (2003). Rotten apples: An investigation of the prevalence and predictors of teacher cheating. The Quarterly Journal of Economics 118.3, pp. 843–877.
Jepsen, Christopher and Steven Rivkin (2009).
Class Size Reduction and Student Achievement: The Potential Tradeoff between Teacher Quality and Class Size. Journal of Human Resources 44.1, pp. 223–250.
Krueger, A.B. (1999). Experimental estimates of education production functions. The Quarterly Journal of Economics 114.2, pp. 497–532.
Krueger, A.B. (2002). Understanding the magnitude and effect of class size on student achievement. The Class Size Debate, pp. 7–35.
Krueger, A.B. and D.M. Whitmore (2001). The effect of attending a small class in the early grades on college-test taking and middle school test results: Evidence from Project STAR. The Economic Journal 111.468, pp. 1–28.
Neal, Derek and Diane Whitmore Schanzenbach (2010). Left behind by design: Proficiency counts and test-based accountability. The Review of Economics and Statistics 92.2, pp. 263–283.
Nye, B., L.V. Hedges, and S. Konstantopoulos (1999). The long-term effects of small classes: A five-year follow-up of the Tennessee class size experiment. Educational Evaluation and Policy Analysis 21.2, pp. 127–142.
Podgursky, Michael (2006). Is Teacher Pay "Adequate?" Education Working Paper Archive.
Prokhorov, Artem and Peter Schmidt (2009). GMM redundancy results for general missing data problems. Journal of Econometrics 151.1, pp. 47–55.
Qian, Hailong and Peter Schmidt (1999). Improved instrumental variables and generalized method of moments estimators. Journal of Econometrics 91.1, pp. 145–169.
Rice, J.K. and A.E. Schwartz (2008). "Toward an understanding of productivity in education." Handbook of Research in Education Finance and Policy. Routledge New York. Chap. 8, pp. 131–165.
Rockoff, Jonah (2009). Field experiments in class size from the early twentieth century. The Journal of Economic Perspectives 23.4, pp. 211–230.
Santa Rosa City Schools (Mar. 2007). School Board Minutes, Quality Education Investment Act. http://www.srcs.k12.ca.us/board/agendas/attachments/032807-BRF7.pdf.
Todd, P.E. and K.I. Wolpin (2003). On the Specification and Estimation of the Production Function for Cognitive Achievement. The Economic Journal 113.485, F3–F33.
U.S. Department of Education (2009). http://nces.ed.gov/programs/digest/d09/tables/dt09_079.asp. U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics.
Wooldridge, Jeffrey M (2004). Estimating average partial effects under conditional moment independence assumptions. CeMMAP Working Paper Number CWP03/04.
Wooldridge, Jeffrey M (2010). Econometric analysis of cross section and panel data. Second Edition. MIT Press.
Yoon, Kwang Suk, Teresa Duncan, Silvia Wen-Yu Lee, Beth Scarloss, and Kathy L Shapley (2007). Reviewing the evidence on how teacher professional development affects student achievement. National Center for Educational Evaluation and Regional Assistance, Institute of Education Sciences, US Department of Education.

Chapter 2

School Districts' Revealed Preference for Resource Allocation: Evidence from California's Quality Education Investment Act

2.1 Introduction

Despite a growing concern since the 1970s over financial disparities across school districts, with the least resources going to those districts least able to raise local revenue, [Footnote 1: For a review, see Corcoran et al. (2008) and Springer et al. (2008).] relatively little attention has been paid to disparities in resource allocation within districts. Due primarily to a paucity of school-level financial data, few analyses have explored the causes and consequences of intra-district disparities.
Those that do analyze disparities within district are typically restricted to post-hoc descriptive analyses. They observe, often imperfectly, the outcome of the process that allocates resources across schools, a process driven by district preferences and constrained by institutional factors such as union contracts. California's Quality Education Investment Act (QEIA), on the other hand, provides a unique opportunity to directly examine district preferences over low-performing schools, which may be a driving force of intra-district disparities. Implemented in 2007, the QEIA allocated approximately $2.7 billion to low-performing schools, and required that funded schools implement a bundle of reforms. Rather than spread the money across the approximately 1,200 schools that were eligible and chose to participate, it was decided that funds would be distributed to a small number of semi-randomly chosen schools, to enable financing of ambitious reforms in each school. As part of the process of choosing funded schools, each district with more than one participating school was required to rank its participating schools. If selected for funding, a school was required to reduce class size, increase its counselor-student ratio, [Footnote 2: This requirement applied only to high schools.] provide training opportunities to staff, align its average teacher experience with the district average, and meet accelerated academic performance targets. Districts believed that the probability of a school receiving funding was higher for higher ranked schools. [Footnote 3: The actual selection process was misrepresented to schools, as described below. In fact, the probability of selection was not monotonic in district rankings.] District rankings therefore provide a window into district preferences for resource allocation. Using a discrete choice model that leverages districts' full rankings, this paper seeks to address the question of how districts choose to allocate resources across schools. I find some evidence that districts preferred schools that applied to, and would receive priority for, an alternative program, in which schools crafted their own reforms. There is also some evidence that districts ranked highly schools with a high percentage of students eligible for Free and Reduced Price Lunch (FRPL). There is clear evidence that districts ranked highly those schools that had been repeatedly sanctioned under No Child Left Behind for failing to make Adequate Yearly Progress.
An important limitation of QEIA is that it does not reveal district preferences over all schools, only over those that are low-performing and chose to participate in QEIA. The remaining paper proceeds as follows: section 2.2 relates this paper to existing literature; section 2.3 lays out the institutional details of QEIA; section 2.4 describes the data; section 2.5 outlines the model and identification; section 2.6 presents results. Section 2.7 concludes. 2.2 Literature This paper contributes to the still-nascent literature on intra-district resource allocation, which by necessity focuses on large school districts for which school-level financial data are available. For example, Iatarola et al. (2003) examine resource allocation across middle and elementary schools in New York City. They examine distributions across schools for 58 similar students, distributions within schools for different students, e.g., special education and regular education students, and the association between educational outcomes and equity of resources. In their analysis of resource allocation, they use measures of inequality such as the range, Gini coefficient, and coefficient of variation of resource allocation. They find that there is a trade-off between teacher salaries and certification on one hand, and lower pupil-teacher ratios on the other. They hypothesize that districts try to channel more teachers toward lower-performing schools, but that union contracts allow experienced teachers to choose higher-performing schools. Roza et al. (2004) find that the practice of using average salary, common among researchers and public officials, creates important intra-distract disparities that can be exacerbated by “budget layering.” They note, for example, that Title I is intended to supplement schools after districts equate spending across schools. However, since the legislation permits calculation of teacher salaries using the average in the district, disparities persist. They note that the vast majority of school districts are blind to these disparities because they focus on average teacher salaries, ignoring the possibility that more experienced, and thus higher paid teachers might be channeled toward particular types of schools. In another study, Klein (2008) analyzes school-level financial data from the Metropolitan Nashville-Davidson County School District in Tennessee, and finds that when enrollment is controlled for, districts actually allocate more resources to schools with high percentages of students eligible for Free and Reduced Price Lunches. No evidence is found to suggest that preferences are determined by academic performance or percent of minority students. Unlike the above studies, which rely on observed allocations of funding that are the result of district preferences and institutional constraints, the QEIA allows for a direct 59 analysis of district preferences over low-performing schools. The following section details the background and implementation of the QEIA. 2.3 Institutional Details The QEIA was the consequence of litigation against then California Governor Arnold Schwarzenegger. The plaintiffs in the case successfully argued that the state underfunded public schools in the 2004-2005 and 2005-2006 school years.4 As a result of the settlement, the state was required to pay back approximately $2.7 billion to K-12 schools. 
Recognizing that allocating the money equally across all schools, or even across all low-performing schools, would have a small impact on per-pupil funding in those schools, legislators decided instead to focus on a subset of low-performing schools. The subset was chosen using a lottery mechanism, and the number of schools was chosen such that funding would increase by $500 per student in kindergarten through 3rd grade, $900 in grades 4-8, and $1,000 in high school from 2008-2014, and by half as much in 2007. Schools were eligible to participate in the lottery if they were in the bottom two deciles of the state's academic performance distribution, as determined by California's Academic Performance Index (API). Eligible schools had to commit to meeting the requirements of QEIA before they could participate in the selection process, though they could instead apply to an alternative program, in which they would design their own reforms. Each district with more than one participating school was required to rank each of its participating schools. It was permissible to give multiple schools the same rank, and indeed several districts did so. Figure 2.1 provides an excerpt of the ranking submitted by the San Diego School District. Districts had to rank participating schools, list the type of each school (e.g., middle or elementary), whether the school was applying to the alternative program, and whether it should receive priority consideration for the alternative program. Priority consideration for the alternative program was given to all high schools, and not to any elementary or middle schools. Each district was then assigned as many random numbers as it had participating schools, and the numbers were allocated to schools based on the rankings. For example, if a district received the numbers 1 and 201, 1 was assigned to the highest ranked school and 201 to the next school. If districts assigned tied rankings, the California Department of Education randomly chose the ordering. Within each stage of the selection, these random numbers determined the order. Districts were told that the selection would occur in three stages: first, high schools that applied to the alternative program with the lowest random numbers would be chosen until 15% of funds were allocated; [Footnote 5: If all high schools had been selected and the 15% of funds not exhausted, those elementary and middle schools with the lowest random numbers applying to the alternative program would have been selected. However, given the number of high schools that applied, this outcome was not possible.] second, to ensure geographic diversity, the school with the lowest random number in each county would be selected, if that county did not have a school funded in the first stage; finally, schools with the lowest random numbers applying to the regular program would be selected, until all funds were exhausted. In fact, the last stage was divided in two, with high schools randomly selected separately from and prior to middle and elementary schools. That districts were told high schools and elementary schools would be treated equally is evidenced in contemporaneous school board minutes, e.g., Santa Rosa City Schools (2007), and CDE presentations (Balcom (2007)). This is also the depiction in the report to the California legislature (CDE (2010)), written 3 years after the selection process occurred.
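Because selection depends on district rankings and random numbers in a staged design, a school's probability of selection has no simple closed form, but it can be approximated by simulating the lottery. The sketch below is my own simplified illustration of that idea, not the author's code: it ignores the alternative-program, county, and separate high-school stages, and treats the budget as a fixed number of funded slots.

    import random

    def simulate_selection_probs(districts, n_slots, n_sims=10000, seed=0):
        """Approximate each school's probability of selection under a
        simplified version of the QEIA lottery.

        districts: dict mapping district -> list of school ids in rank order
        n_slots: number of schools fundable before the money runs out
        """
        rng = random.Random(seed)
        schools = [s for ranked in districts.values() for s in ranked]
        wins = {s: 0 for s in schools}
        for _ in range(n_sims):
            pool = list(range(len(schools)))
            rng.shuffle(pool)               # the state's random numbers
            draw = iter(pool)
            assigned = {}
            for ranked in districts.values():
                # each district's numbers go to its schools in rank order,
                # lowest number to the highest-ranked school
                numbers = sorted(next(draw) for _ in ranked)
                for school, num in zip(ranked, numbers):
                    assigned[school] = num
            for s in sorted(assigned, key=assigned.get)[:n_slots]:
                wins[s] += 1
        return {s: wins[s] / n_sims for s in schools}

    # Toy example: two districts, three funded slots.
    print(simulate_selection_probs({"A": ["a1", "a2", "a3"],
                                    "B": ["b1", "b2"]}, n_slots=3))

In this simplified version higher-ranked schools always have weakly higher selection probabilities; as footnote 3 notes, the actual staged process breaks that monotonicity.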
Indeed, to my knowledge there is no public source that correctly describes the separation of high schools in the selection process. [Footnote 6: I discovered that the common description was incorrect only after attempting to reconcile the district rankings, the actual random number allocation, and the funding results. After extensive conversation with CDE employees I learned the actual method.] Once a school was selected, it was required to implement the following reforms: reduce class sizes; align average teacher experience, as measured by the Teacher Experience Index (TEI), with the district average; provide professional development for teachers and staff; ensure that all teachers at the school met the requirements to be Highly Qualified Teachers (HQT); increase counselor-student ratios (high schools only); and exceed the school's average achievement growth target over the first three years. [Footnote 7: Student achievement growth targets in California are formulaic: each school is required to improve by 5% of the difference between its API and 800, or by 1 point, whichever is greater, until it reaches 800.] Schools in the alternative program were exempt from the bulk of these requirements. The QEIA stipulated that funded schools should have no more than 20 students per class in kindergarten through 3rd grade, and no more than 25 per class in grades 4-12, or 5 fewer than the baseline average class size, whichever was lower. [Footnote 8: The baseline was the school's grade-level average class size in 2006, unless that average was greater than 25, in which case 2007 was used.] In each of the first three years of QEIA, schools were required to reduce the difference between the pre-QEIA average class size and the QEIA target class size by 1/3. For some schools, the average in 2007 was already quite low, making the requirement particularly strenuous for small schools with a single classroom per grade. As such, many schools applied for and were granted waivers from this requirement, and instead met a higher minimum class size requirement. There is no evidence that schools were aware of the possibility of a waiver when they ranked schools.
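The class size rules lend themselves to a direct computation. The following sketch is my illustration of the rules as just described, with hypothetical function and argument names; it computes a grade's QEIA target and the interim requirement in each of the first three years:

    def qeia_class_size_target(baseline, grade):
        """Target: the statutory cap (20 in K-3, 25 in grades 4-12)
        or baseline minus 5, whichever is lower."""
        cap = 20 if grade <= 3 else 25
        return min(cap, baseline - 5)

    def interim_requirement(baseline, grade, year_of_program):
        """Schools had to close 1/3 of the gap between the pre-QEIA
        average and the target in each of the first three years."""
        target = qeia_class_size_target(baseline, grade)
        fraction = min(year_of_program, 3) / 3
        return baseline - fraction * (baseline - target)

    # Example: a 4th grade with a baseline average of 29 students has a
    # target of 24 and must reach about 27.3 in the first year.
    print(qeia_class_size_target(29, 4))             # 24
    print(round(interim_requirement(29, 4, 1), 1))   # 27.3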
According to CDE (2010), and as corroborated in the data (see Table 2.1), the vast majority of schools eligible to participate in QEIA were already required to meet the HQT standard. Schools were also required to meet the stipulations of the Williams settlement, the result of the 2004 court case Williams v. State of California, which already applied to most QEIA-eligible schools prior to QEIA (again, see Table 2.1). The Williams settlement required low-performing schools to have qualified teachers and safe, well-maintained facilities.

High schools that received funding under QEIA and were not in the alternative program were required to increase their counselor-student ratio to 1:300. As with the class size reduction, schools were required to reduce the difference between their initial counselor-student ratios and the target level by 1/3 in each of the first three years. Since so few schools in QEIA are high schools, and since half of the high schools in QEIA are in the alternative program, this requirement has not been monitored as extensively as the others (CDE (2010)).

All funded schools were required to have growth in API over the first three years of funding that exceeded the average target growth over those three years. The API growth target is determined formulaically for all schools in California. After the first three years of funding, regular QEIA schools are required to meet target growth rates for each subsequent year, and QEIA schools participating in the alternative program are required to continue exceeding the target.

QEIA went into effect against the backdrop of the federal No Child Left Behind Act (NCLB). Under NCLB, schools are required to make Adequate Yearly Progress (AYP) or enter "Program Improvement" (PI) status. Each year that a school remains in PI it faces increasing sanctions. If a school meets AYP for one year, its PI status remains the same, i.e., it does not advance to the next year of PI. If the school meets AYP for two consecutive years, it exits PI. In the first year of PI, districts must notify parents and provide them the option to choose another school in the district that is not in PI, while schools must divert Title I funding toward professional development. In the second year, the requirements of the first year persist, and districts must also provide supplemental services to students. In the third year, the district is required to take more severe corrective action, which can include replacing the entire school staff, replacing the curriculum, extending the school year or day, or appointing an outside expert. In the fourth year, a district must plan for changing the governance structure of the school; it can, for instance, reopen the school as a charter school, replace the staff, or allow the state to take over. In the fifth year the district must implement this plan.

2.4 Data

This analysis draws on a number of publicly available data sets collected by the California Department of Education (CDE). School-level demographic data are made publicly available in California's API Data Files (available at http://www.cde.ca.gov/ta/ac/ap/apidatafiles.asp). These data include demographics such as enrollment, percent Free and Reduced Price Lunch, and percent of parents with a high school degree. Also included is each school's API, which is a weighted average across subjects of that school's performance on state standardized tests.

Yearly teacher-level data are also publicly available. Unfortunately, teachers are assigned new unique identifiers each year, so the data are not linked across years.
I therefore create a synthetic panel, which links teachers across years and within schools on the basis of teacher characteristics, notably total teaching experience and experience within the district (a sketch of this linkage appears at the end of this section). For example, if a particular school has a teacher in 2008 with four years of teaching experience and two years of experience in the district, and in the following year the same school has a teacher with five years of teaching experience and three years of experience in the district, and the two are equivalent in gender and race, I link those observations. If two teachers are observationally equivalent, I randomly link them across years. An important shortcoming of this synthetic panel is that I cannot accurately determine duration spells. These data also include administrators and employees who interact with students but are not teachers, e.g., guidance counselors and librarians.

Class size data are available at the employee-assignment level, where an assignment is a class taught by a particular teacher. Teachers teaching multiple math classes at a school appear multiple times in these data, as do classes with multiple teachers, e.g., a teacher and a teacher's aide. I use these data to construct average class sizes and the class size targets schools would face were they to participate in QEIA.

Finally, I use the rankings submitted by districts to the California Department of Education. A portion of the form submitted by San Diego is displayed in Figure 2.1. Districts were required to note whether the school was participating; the type of the school; whether the school was applying to the alternative program; and, for alternative schools, whether they met the requirement for priority funding (priority funding for the alternative program was given to all high schools that applied for that program). Most importantly, districts had to assign a rank to each participating school. Districts could give all schools a rank of one, or rank a subset of schools equally.

Table 2.1 provides descriptive statistics for all California schools, those eligible to participate which chose not to, those that applied for the alternative program, and those that applied to the regular program. High schools were more likely to apply to the alternative program, for which they received priority. As mentioned above, the vast majority of teachers in schools eligible for and participating in QEIA were already classified as HQT, and indeed this is true of all schools in California. The Williams settlement applied to nearly a quarter of all schools, and to nearly all schools eligible for or participating in QEIA. Schools that were eligible to participate but chose not to were more likely to have met their growth target in 2006. Not surprisingly, schools that were ineligible were less likely to be in PI in 2007.

Table 2.2 provides descriptive statistics for my analytic sample. From the set of all schools participating in QEIA, my analytic sample drops those that are the only school in their district, and those for which all schools in the district were given the same ranking, since these schools provide no information on district preferences. The table provides summary statistics by the number of schools in the district. The data contain 30 districts that ranked 2 schools, 29 that ranked 3 schools, 51 districts with 4-10 schools, 18 districts with 11-55 schools, and one district, LA Unified, with 234 schools.
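Returning to the synthetic panel construction described at the start of this section, the linkage can be sketched as a year-over-year match on incremented experience and fixed demographics. The following is a minimal illustration under assumed column names; it is not the exact matching code used in the chapter.

```python
import random
import pandas as pd

def link_teachers(year_t, year_t1, rng=random.Random(0)):
    """Link teacher records at the same school across adjacent years:
    a year-t record matches a year-t+1 record if total and district
    experience have each increased by one and gender/race agree.
    Observationally equivalent candidates are linked at random."""
    links = []
    candidates = year_t1.copy()
    for _, t in year_t.iterrows():
        mask = ((candidates["school"] == t["school"]) &
                (candidates["exper"] == t["exper"] + 1) &
                (candidates["district_exper"] == t["district_exper"] + 1) &
                (candidates["gender"] == t["gender"]) &
                (candidates["race"] == t["race"]))
        matches = candidates[mask]
        if len(matches):
            chosen = matches.index[rng.randrange(len(matches))]  # random tie-break
            links.append((t["id"], candidates.loc[chosen, "id"]))
            candidates = candidates.drop(chosen)  # each record links at most once
    return links

# Toy example: the 2008 teacher with (4, 2) experience links to the
# 2009 teacher with (5, 3) experience at the same school.
t0 = pd.DataFrame([{"id": 1, "school": "S", "exper": 4, "district_exper": 2,
                    "gender": "F", "race": "W"}])
t1 = pd.DataFrame([{"id": 9, "school": "S", "exper": 5, "district_exper": 3,
                    "gender": "F", "race": "W"}])
print(link_teachers(t0, t1))  # [(1, 9)]
```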
For comparison with later results, Table 2.3 presents unconditional differences in covariates between schools ranked above the median and schools ranked below the median, by district size. If a district has an odd number of schools, the median school is randomly set to be above or below. In larger districts, including LA Unified, charter schools were less likely to be ranked above the median. (That several coefficients are significant only for larger districts is in part an artifact of the greater statistical power afforded by the larger samples.) Larger districts are also less likely to rank highly schools that have just entered PI, but more likely to rank highly schools that are in the 5th stage of PI. Across districts of all sizes, schools that bring in more revenue are more likely to be ranked highly, and even more so if they met their growth target in 2006. In larger districts middle schools were more likely to be ranked above the median.

2.5 Model

Each district with more than one participating school was required to rank each of its participating schools, with the understanding that the highest ranked school in each district had the highest probability of being chosen as a QEIA school.

Let $N_d$ denote the number of schools in district $d$. Schools are indexed by $r$, which is also their ranking, and are denoted $S_r$, where $S_r \succ S_i \; \forall i > r$, and $\succ$ denotes preference. If there exists a function $F(Z_r) = F^*(Z_r) + \epsilon_r$, where $\epsilon_r$ is iid type I extreme value, such that

$$S_1 \succ S_2 \succ \cdots \succ S_{N_d} \iff F(Z_1) > F(Z_2) > \cdots > F(Z_{N_d}) \quad (2.1)$$

then we can use the following result, known as a rank order logit or exploding logit, and first introduced into the economics literature by Beggs et al. (1981):

$$\Pr\left(S_1 \succ S_2 \succ \cdots \succ S_{N_d}\right) = \prod_{j=1}^{N_d} \frac{e^{F_j^*}}{\sum_{m=j}^{N_d} e^{F_m^*}} \quad (2.2)$$

This model does not explicitly allow for tied rankings, which would occur with probability zero. In the QEIA rankings, most ties occur where districts gave the same ranking to all their participating schools; these districts therefore provide no information about district preferences. I drop these schools, and am left with five districts that give the same rank to two schools and different ranks to other schools, and one that ranks three schools first, followed by others.

Tied rankings are analogous to tied exit times in a proportional hazard model. There, ties can be thought of as the consequence of low-frequency data. For example, if data are collected yearly, multiple observations may exit throughout the year at different points, but they are observed as exiting simultaneously. If districts are insensitive to small differences in $F$, then tied rankings would in fact obscure an underlying ordering. If there is in fact an underlying order, tied rankings can be accommodated by modifying equation (2.2) such that a set of tied schools contributes to the likelihood through the sum over all possible orderings. For example, if a district had 3 schools, and ranked two first and the other second, their contribution to the likelihood would be

$$\Pr\left(S_1 \succ S_2 \succ S_3 \;\cup\; S_2 \succ S_1 \succ S_3\right) = \frac{e^{F_1^*}}{e^{F_1^*} + e^{F_2^*} + e^{F_3^*}} \cdot \frac{e^{F_2^*}}{e^{F_2^*} + e^{F_3^*}} \;+\; \frac{e^{F_2^*}}{e^{F_1^*} + e^{F_2^*} + e^{F_3^*}} \cdot \frac{e^{F_1^*}}{e^{F_1^*} + e^{F_3^*}}$$

This can quickly complicate the likelihood, since for $T_j$ schools given tied ranking $j$, there are $T_j!$ terms in the summand. As such, Stata provides various methods for approximating the exact likelihood.
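As an illustration of the exploding-logit likelihood with ties, the sketch below computes one ordering's probability via equation (2.2) and then enumerates the orderings of a tied set. The variable names and toy utilities are assumptions for exposition, not estimation code.

```python
import math
from itertools import permutations

def exploding_logit_prob(utilities):
    """Probability of one complete ordering (best first) under the
    rank order logit, equation (2.2)."""
    prob = 1.0
    for j in range(len(utilities)):
        denom = sum(math.exp(u) for u in utilities[j:])
        prob *= math.exp(utilities[j]) / denom
    return prob

def tied_likelihood(tied, rest):
    """Exact contribution when the schools in `tied` share the top rank:
    sum equation (2.2) over all orderings of the tied set."""
    return sum(exploding_logit_prob(list(order) + rest)
               for order in permutations(tied))

# Toy example: F*_1 and F*_2 tied first, F*_3 ranked second.
print(tied_likelihood([0.4, 0.1], [-0.2]))
```

With $T_j$ tied schools the `permutations` call generates the $T_j!$ terms noted above, which is why approximations become attractive for large tied sets.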
For my results, I use Efron's approximation (Efron (1977)), which for the above example would use

$$\Pr\left(S_1 \succ S_2 \succ S_3 \;\cup\; S_2 \succ S_1 \succ S_3\right) \approx \frac{e^{F_1^*} e^{F_2^*}}{\left(e^{F_1^*} + e^{F_2^*} + e^{F_3^*}\right)\left(0.5\left(e^{F_1^*} + e^{F_2^*}\right) + e^{F_3^*}\right)}$$

Stata's "exactm" option, which uses a Gauss-Laguerre quadrature approximation of the exact likelihood, yields similar point estimates (results are available from the author by request), but does not allow for the calculation of robust standard errors.

The question then is how to model $F$. I assume $F^* = F^*(E(\text{Revenue}), E(\text{Cost}), X)$, and my interest lies in estimating $\partial F^*/\partial X$. That is, holding constant the expected revenue and costs of participating in QEIA, what school characteristics influence the rankings? The revenue from a school is a function of its enrollment and the probability that the school remains funded in each subsequent year, while the expected cost is a function of how many teachers must be hired, the required average experience of those teachers, and the total number of teachers, administrators, and paraprofessionals for whom professional development must be provided.

Conditional on meeting the class size, teacher experience, and professional training requirements, whether a school remains in the program is a function of whether its average growth over the first three years exceeds its average growth target over those years. I proxy for this using an indicator for whether the school met its growth requirement in 2006, which I interact with measures of revenue and costs. Revenue is included as the sum across grades of the per-pupil increase in funding times the number of students in each grade.

I model the cost of meeting the class size reduction requirement as proportional to the number of new teachers that must be hired, i.e., the change in the teacher-pupil ratio times the number of students:

$$\Delta T = \text{Number of new teachers} = \left( \frac{1}{CS_{\text{target}}} - \frac{1}{CS_{2007}} \right) \times \text{Enrollment}_{2007} \quad (2.3)$$

A school may be able to satisfy both its teacher experience requirement and the class size reduction requirement by ensuring that all newly hired teachers have at least 10 years of experience. However, it may also be the case that even after having satisfied the class size reduction requirement, additional changes to the teacher workforce will be required to meet the experience requirement. To capture this, I include an indicator for whether the TEI requirement binds even after meeting the class size reduction requirement, i.e.,

$$\text{TEI Binds} = \mathbf{1}\left[ \frac{1}{T_{2007} + \Delta T} \left( TEI \cdot T_{2007} + \Delta T \cdot 10 \right) < TEI_{\text{target}} \right] \quad (2.4)$$

where $T_t$ is the number of teachers in year $t$, and $\mathbf{1}[\cdot]$ is the indicator function.

To capture the cost of providing professional development, I include $T_{2007}$, and I include the number of paraprofessionals. This last variable is also interacted with an indicator for being a high school, to capture the need for high schools to meet the counselor-to-student ratio. (Guidance counselors are included in this variable, but cannot be identified separately from other positions, e.g., school nurses.)

Also included in the model are demographic variables, including indicators for what year of PI, if any, the school is in; whether the school applied to the alternative program; indicators for being a high school or middle school; an interaction between high school and alternative; an indicator for whether the school is a charter school; the percent of students eligible for Free and Reduced Price Lunch; and the percent of students who are Hispanic.
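The two cost constructs in equations (2.3) and (2.4) reduce to a few lines of arithmetic. A minimal sketch, with hypothetical numbers in place of the actual school data:

```python
def new_teachers(cs_target, cs_2007, enrollment_2007):
    """Equation (2.3): change in the teacher-pupil ratio times enrollment."""
    return (1 / cs_target - 1 / cs_2007) * enrollment_2007

def tei_binds(tei_2007, t_2007, delta_t, tei_target):
    """Equation (2.4): does the experience requirement still bind if every
    new hire brings the maximum credited 10 years of experience?"""
    post_hiring_tei = (tei_2007 * t_2007 + delta_t * 10) / (t_2007 + delta_t)
    return post_hiring_tei < tei_target

# Hypothetical school: 600 students, average class of 30, target of 25.
dT = new_teachers(25, 30, 600)          # 4 new teachers
print(dT, tei_binds(6.0, 20, dT, 7.5))  # TEI rises to ~6.67, still binds
```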
To summarize the model, I estimate the following:

$$F = \xi_1 \text{Revenue} + \xi_2 \text{Revenue} \times Met_{2006} + \xi_3 Met_{2006} + c\gamma + c\delta \times Met_{2006} + X\beta + \varepsilon \quad (2.5)$$

where $Met_{2006}$ is an indicator for having met the growth target in 2006, and $c$ is a row vector containing the number of teachers, the number of non-teaching employees working with students, the gap between a school's TEI and its target, the required number of new teachers, and indicators for high school and middle school.

Though this "kitchen sink" approach might fully control for the expected cost and revenue of participating in QEIA, care must be taken in interpreting the coefficients on measures of revenue and cost. Consider, for example, a middle school with an average class size of 35 students and a target of 25 students per class. Revenue, as a function of enrollment, cannot increase while holding constant the required number of new teachers, since for every 87.5 additional students the school receives an additional $78,750, and must hire one more teacher to fulfill the class size requirement. (For comparison, average teacher salary in California in 2008 was $64,424, excluding benefits, which Podgursky (2006) estimates account for 20% of total compensation (U.S. Department of Education (2009)).) While I attempt to control for expected revenue and costs, I don't tease apart their effects.

2.6 Results

Coefficients on demographic variables are presented in Table 2.4, with the baseline results in the first column. Standard errors in brackets are robust to misspecification. That is, even if $\varepsilon$ does not follow the type I extreme value distribution, the robust standard errors are correct for estimates of the parameters that minimize the misspecified log likelihood. Standard errors in parentheses are not robust to this misspecification.

Though districts were no more likely to rank highly middle and elementary schools that applied for the alternative program, the non-robust standard error and point estimate suggest that districts were more likely to rank highly high schools applying to this program. This suggests that districts understood that high schools would be given priority in this program, and that they valued the flexibility of the program. Districts were no more likely to rank charter schools higher than regular schools, though they were more likely to rank highly schools with a higher percentage of FRPL students, with a 43 percentage point increase in FRPL having an effect of the same magnitude as being a high school applying for the alternative program. There is marginal evidence that districts preferred schools with a high percentage of black students.

The pattern in the point estimates on year in PI suggests that the more years a school was in PI, the more likely the district was to rank the school highly. The financial and reputation costs of PI increased each year, and districts appear to have seen QEIA as a way to limit these costs. The effect of being in the 5th year of PI is almost four times as large as the effect of being in the 1st year, and is comparable to the effect of a school going from no students eligible for FRPL to all students being eligible.

LA schools make up 21.1% of my sample, and my results may partially be driven by the ranking of LA Unified. Column 2 presents estimation results excluding LA from the sample.
In a specification interacting all variables with an indicator for being an LA school, a Wald test rejects the null of no difference in coefficients with p < 0.001. Nonetheless, as the second column indicates, the pattern of demographic coefficients is quite similar to the baseline model. One exception is the effect of being a charter school, which diminished a school's ranking more in districts other than LA Unified. Districts give more favorable rankings to high schools participating in the alternative program, and to schools with a high percentage of students eligible for Free and Reduced Price Lunch. The pattern of districts giving higher rankings to schools the longer they are in PI persists, though the effect is smaller.

The distinct preferences of LA Unified are one example of how the assumption of identical coefficients across districts could be violated. Other tests are presented in columns 3 and 4, both of which include an interaction with a count variable of the number of participating schools, and the latter of which also excludes LA schools. The count variable is the number of participating schools minus the average number of participating schools in districts other than LA Unified. A Wald test of the null hypothesis that the coefficients on terms interacted with the count variable are all zero rejects when LA is included (p < 0.001), and at the 10% level when LA is excluded (p = 0.08). For a typical district excluding LA, schools with high percentages of FRPL students, and schools in later years of PI, are more likely to be ranked highly. There is some evidence that the typical district, excluding LA, was less likely to rank charter schools highly.

Another assumption of the rank order logit model is that districts weight characteristics equally whether they are ranking the first school or the last. One way to test this assumption is to estimate a conditional logit model, in which districts choose only the highest ranked school. The results from this exercise are presented in the final column. A Hausman test of the null hypothesis of no misspecification rejects (p = 0.0026), suggesting the rank order logit assumptions are violated. Nonetheless, one finding remains true across all specifications: districts were more likely to rank highly schools that were in the fifth, and most severe, year of PI.

Given that the rank order logit model fails several specification tests, the results should be interpreted as descriptive. Nonetheless, across specifications a clear pattern emerges: districts preferred to rank highly those schools that faced sanctions under NCLB, and the more severe those sanctions, the more highly ranked the school became. NCLB sanctions were intended to improve student achievement, but they imposed a cost on districts. For instance, in the first year of Program Improvement, schools had to provide additional professional training, but this effort was to be funded using existing Title I allocations, which were therefore diverted from elsewhere. The descriptive evidence suggests that districts thought sanctioned schools most deserved to participate in QEIA, in an effort to help those schools exit NCLB sanctioning. The federal government was therefore able to influence intra-district allocations by providing incentives for districts to shift resources to sanctioned schools.

2.7 Conclusion

The landmark court case Serrano v.
Priest of 1971 ushered in an era of awareness of disparities in educational resources across districts, with students from families with the fewest resources attending districts that were likewise under-resourced. Due primarily to a lack of within-district financial data, few studies have been able to address the question of whether disparities exist within districts as well. Resource allocation within a district is determined by district preferences and institutional constraints. Studies of within-district disparities have by necessity focused on the outcome of this process in a handful of districts. This paper seeks to understand the determinants of intra-district resource allocation by directly observing district preferences over low-performing schools.

Due to a requirement of California's QEIA, districts were essentially required to answer the question, "Were you to receive funding for one school to implement mandated reforms, which would you choose? Conditional on that school being funded, which would you choose next?" Using districts' responses, in the form of rankings, I model district preferences using a discrete choice model. Doing so, I find consistent evidence that districts preferred to fund schools that were in the 5th year of PI. Under No Child Left Behind, schools that fail to meet Adequate Yearly Progress are forced into increasingly strict sanctions, referred to in California as PI. By the fifth year of PI, schools are required to implement plans that dramatically change their organizational structure, by for instance reopening as a charter school, replacing the entire staff, or allowing the state to take over. Districts seem to have preferred giving these schools the opportunity to participate in QEIA. This has important implications for policy makers, particularly where policies from various levels of government overlap: schools on which the federal government imposes a cost for failing to meet achievement targets are more likely to receive support from their districts. The federal government can therefore effectively incentivize districts to shift resources toward under-performing schools.

The rank order logit model that I employ carries strong assumptions, such as constant coefficients across districts and choices. That is, each district is assumed to weight characteristics equally, whether it is choosing its highest ranked or second-to-last ranked school. Tests of the validity of these assumptions fail in the case of QEIA, and the results are therefore best viewed as descriptive, rather than as estimates of underlying parameters of districts' utility functions.

Another shortcoming of this study is that it is only capable of describing preferences over low-performing schools. QEIA required districts to rank only those schools eligible to participate, and eligibility was determined by an academic achievement cut-off. While district preferences for resource allocation across low-performing schools are important, this undoubtedly misses important dynamics in the allocation of resources across all schools within a district. The study of district preferences across all schools is therefore left to future research.
APPENDICES

APPENDIX A - FIGURES

Figure 2.1: Portion of Form Submitted by San Diego

APPENDIX B - TABLES

Table 2.1: Descriptive Statistics, All Schools

                      All       Eligible, Not    Participating     Participating
                      Schools   Participating    in Alternative    in Regular
2007 class size       22.53     23.33            25.36             23.18
                      (6.13)    (4.49)           (4.29)            (3.55)
Middle school         0.15      0.14             0.19              0.17
                      (0.36)    (0.35)           (0.40)            (0.38)
High school           0.24      0.22             0.45              0.11
                      (0.43)    (0.41)           (0.50)            (0.31)
HQT 2007              0.92      0.91             0.87              0.90
                      (0.17)    (0.17)           (0.13)            (0.15)
Williams applies      0.23      0.89             0.95              0.95
                      (0.42)    (0.32)           (0.21)            (0.23)
Met target 2006       0.61      0.71             0.60              0.65
                      (0.49)    (0.46)           (0.49)            (0.48)
LA                    0.08      0.01             0.24              0.18
                      (0.27)    (0.10)           (0.43)            (0.39)
Year 1 of PI          0.07      0.10             0.19              0.17
                      (0.26)    (0.30)           (0.40)            (0.37)
Year 2 of PI          0.03      0.10             0.07              0.10
                      (0.18)    (0.30)           (0.25)            (0.30)
Year 3 of PI          0.05      0.15             0.10              0.21
                      (0.22)    (0.36)           (0.30)            (0.41)
Year 4 of PI          0.03      0.18             0.18              0.16
                      (0.18)    (0.39)           (0.39)            (0.37)
Year 5 of PI          0.04      0.14             0.26              0.19
                      (0.19)    (0.35)           (0.44)            (0.39)
N                     9714      195              88                1172

Note: Table lists sample means, with standard deviations in parentheses. With the exception of 2007 class size, all variables are dichotomous, and thus means are the proportion of schools falling into that category. PI refers to "Program Improvement." All Schools is the universe of public schools in California in 2007.

Table 2.2: Descriptive Statistics, by Ranking

                      Participating Schools in District
                      2        3        4        5        6-10     11-55    234
Number of Districts   30       29       16       16       19       18       1
Alternative Program   0.033    0.080    0.063    0        0.144    0.055    0.090
                      (0.181)  (0.274)  (0.244)  (0)      (0.352)  (0.228)  (0.286)
HS*(Alternative)      0        0        0.016    0        0.068    0.027    0.051
                      (0)      (0)      (0.125)  (0)      (0.253)  (0.163)  (0.221)
Charter               0.033    0.023    0.016    0.025    0.014    0.037    0.026
                      (0.181)  (0.151)  (0.125)  (0.157)  (0.117)  (0.188)  (0.158)
Prop. FRPL            0.851    0.836    0.821    0.750    0.803    0.867    0.896
                      (0.125)  (0.126)  (0.175)  (0.145)  (0.139)  (0.136)  (0.096)
Prop. Hispanic        0.732    0.746    0.828    0.741    0.790    0.704    0.832
                      (0.221)  (0.227)  (0.185)  (0.216)  (0.173)  (0.231)  (0.195)
Prop. Black           0.055    0.057    0.031    0.113    0.079    0.151    0.132
                      (0.087)  (0.079)  (0.065)  (0.148)  (0.104)  (0.167)  (0.193)
Year 1 of PI          0.167    0.138    0.188    0.138    0.164    0.105    0.299
                      (0.376)  (0.347)  (0.393)  (0.347)  (0.372)  (0.307)  (0.459)
Year 2 of PI          0.150    0.092    0.047    0.125    0.137    0.084    0.047
                      (0.360)  (0.291)  (0.213)  (0.333)  (0.345)  (0.278)  (0.212)
Year 3 of PI          0.233    0.184    0.234    0.175    0.199    0.256    0.192
                      (0.427)  (0.390)  (0.427)  (0.382)  (0.400)  (0.437)  (0.395)
Year 4 of PI          0.150    0.322    0.188    0.250    0.144    0.194    0.047
                      (0.360)  (0.470)  (0.393)  (0.436)  (0.352)  (0.396)  (0.212)
Year 5 of PI          0.133    0.149    0.156    0.138    0.123    0.224    0.299
                      (0.343)  (0.359)  (0.366)  (0.347)  (0.330)  (0.417)  (0.459)
Elementary            0.750    0.713    0.641    0.700    0.699    0.692    0.675
                      (0.437)  (0.455)  (0.484)  (0.461)  (0.462)  (0.469)  (0.469)
N                     60       87       64       80       146      438      234

Note: Table lists sample means, with standard deviations in parentheses. All non-proportion variables are dichotomous, and thus their means are the proportion of schools falling into that category. PI is "Program Improvement." Required new teachers is the mandated change in the teacher/pupil ratio times student enrollment.

Table 2.3: Difference, Above Median Ranking - Below Median Ranking

                      Participating Schools in District
                      2         3          4         5          6-10       11-55      234
Number of Districts   30        29         16        16         19         18         1
Alt. program          0.000     -0.017     -0.063    0.000      0.068      0.028      0.043
                      (0.047)   (0.059)    (0.061)   (0.000)    (0.058)    (0.022)    (0.037)
HS*(alt.)             0.000     0.000      -0.031    0.000      0.027      0.019      0.051*
                      (0.000)   (0.000)    (0.031)   (0.000)    (0.042)    (0.016)    (0.029)
Charter               0.000     -0.044     0.031     -0.048     -0.027     -0.054***  -0.034*
                      (0.047)   (0.031)    (0.031)   (0.033)    (0.019)    (0.018)    (0.021)
Prop. FRPL            -0.011    -0.015     0.029     -0.034     -0.003     0.018      -0.031**
                      (0.032)   (0.027)    (0.044)   (0.033)    (0.023)    (0.013)    (0.012)
Prop. Hispanic        0.001     0.037      -0.007    0.016      0.029      -0.023     -0.029
                      (0.058)   (0.049)    (0.047)   (0.048)    (0.029)    (0.022)    (0.026)
Prop. Black           0.006     -0.010     0.004     -0.006     -0.010     0.027*     0.030
                      (0.023)   (0.017)    (0.016)   (0.033)    (0.017)    (0.016)    (0.025)
Year 1 of PI          -0.000    0.102      -0.063    -0.061     -0.137**   -0.081***  -0.376***
                      (0.098)   (0.075)    (0.099)   (0.077)    (0.061)    (0.029)    (0.055)
Year 2 of PI          0.167*    -0.040     -0.094*   -0.188***  -0.055     -0.022     -0.043
                      (0.091)   (0.062)    (0.052)   (0.069)    (0.057)    (0.027)    (0.028)
Year 3 of PI          -0.067    -0.033     0.031     0.068      0.178***   -0.034     0.060
                      (0.111)   (0.084)    (0.108)   (0.086)    (0.065)    (0.042)    (0.052)
Year 4 of PI          -0.100    0.022      0.125     -0.025     0.041      0.070*     0.060**
                      (0.093)   (0.101)    (0.098)   (0.098)    (0.058)    (0.038)    (0.028)
Year 5 of PI          0.067     0.079      0.125     0.189**    0.055      0.212***   0.496***
                      (0.089)   (0.077)    (0.091)   (0.077)    (0.055)    (0.039)    (0.051)
Elementary            -0.167    -0.273***  0.094     -0.281***  -0.301***  -0.163***  -0.530***
                      (0.112)   (0.095)    (0.121)   (0.100)    (0.072)    (0.044)    (0.051)
N                     60        87         64        80         146        438        234

Note: * indicates p < 0.10, ** indicates p < 0.05, *** indicates p < 0.01. Each cell reports the coefficient, with robust standard errors in parentheses, from a regression of the characteristic on an indicator for whether the school is above or below the district's median ranking. In districts with an odd number of schools, the median school is randomly assigned to be above or below the median. PI is "Program Improvement."

Table 2.4: Rank Order Logit Results, Efron's Approximation for Ties

                 Baseline    LA          Interact    Interact Count,  C. Logit
                             Excluded    Count       LA Excluded
Alt. prog.       -0.09       0.09        0.09        0.17             0.53
                 (0.33)      (0.40)      (0.41)      (0.39)           (0.94)
                 [0.39]      [0.51]      [0.52]      [0.52]           [0.87]
HS*(Alt.)        0.73        0.91        0.37        0.79             0.10
                 (0.37)**    (0.49)*     (0.59)      (0.50)           (1.18)
                 [0.54]      [0.84]      [0.94]      [0.81]           [1.10]
Charter          -0.38       -0.65       -0.91       -0.68            -0.55
                 (0.29)      (0.34)*     (0.40)**    (0.35)*          (0.95)
                 [0.34]      [0.26]**    [0.26]***   [0.27]**         [0.65]
Prop. FRPL       1.68        1.57        1.97        1.32             2.07
                 (0.54)***   (0.60)***   (0.65)***   (0.61)**         (1.82)
                 [0.82]**    [0.85]*     [0.93]**    [0.95]           [1.82]
Prop. Hispanic   0.49        0.45        -0.14       0.31             -0.55
                 (0.57)      (0.63)      (0.70)      (0.65)           (1.96)
                 [0.84]      [0.79]      [0.76]      [0.81]           [1.69]
Prop. black      1.07        1.36        0.25        1.19             1.08
                 (0.60)*     (0.74)*     (0.91)      (0.77)           (2.70)
                 [0.86]      [0.98]      [1.10]      [0.97]           [1.83]
Year 1 of PI     0.42        0.10        0.10        0.06             -0.25
                 (0.14)***   (0.18)      (0.20)      (0.19)           (0.53)
                 [0.31]      [0.23]      [0.25]      [0.25]           [0.54]
Year 2 of PI     0.85        0.39        0.51        0.44             0.13
                 (0.18)***   (0.19)**    (0.21)**    (0.20)**         (0.57)
                 [0.46]*     [0.21]*     [0.25]**    [0.24]*          [0.56]
Year 3 of PI     1.19        0.67        0.74        0.67             0.33
                 (0.15)***   (0.17)***   (0.18)***   (0.17)***        (0.50)
                 [0.52]**    [0.25]***   [0.25]***   [0.25]***        [0.48]
Year 4 of PI     1.10        0.66        0.73        0.67             0.59
                 (0.17)***   (0.18)***   (0.19)***   (0.18)***        (0.48)
                 [0.54]**    [0.30]**    [0.29]**    [0.29]**         [0.48]
Year 5 of PI     1.65        1.03        1.03        0.98             0.98
                 (0.16)***   (0.18)***   (0.19)***   (0.19)***        (0.49)**
                 [0.67]**    [0.40]**    [0.36]***   [0.39]**         [0.47]**
N                1097        863         1097        863              1093

Note: * indicates p < 0.10, ** indicates p < 0.05, *** indicates p < 0.01. Coefficients are from rank order logit regressions of district rankings, except the final column, which is from a conditional logit in which the dependent variable is 1 if the district ranked the school highest.
Coefficients for measures of expected revenue and expected cost are included but not shown. Standard errors in parentheses, and standard errors robust to misspecification in brackets. Count refers to the number of participating schools in the district, minus the average number of participating schools in districts other than LA.

BIBLIOGRAPHY

Balcom, Fred (Feb. 2007). Quality Education Investment Act (QEIA) of 2006. http://www.cde.ca.gov/fg/fo/r16/documents/qeia07present.ppt. California Department of Education.

Beggs, S., S. Cardell, and J. Hausman (1981). "Assessing the potential demand for electric cars." Journal of Econometrics 17.1, pp. 1-19.

CDE (Jan. 2010). Report to the Legislature and the Governor: Quality Education Investment Act First Progress Report. http://www.cde.ca.gov/ta/lp/qe/documents/qeialegrpt.doc. California Department of Education.

Corcoran, S.P. and W.N. Evans (2008). "Equity, adequacy, and the evolving state role in education finance." Handbook of Research in Education Finance and Policy.

Efron, Bradley (1977). "The efficiency of Cox's likelihood function for censored data." Journal of the American Statistical Association 72.359, pp. 557-565.

Iatarola, P. and L. Stiefel (2003). "Intradistrict equity of public education resources and performance." Economics of Education Review 22.1, pp. 69-78.

Klein, C.C. (2008). "Intradistrict public school funding equity, community resources, and performance in Nashville, Tennessee." Journal of Education Finance, pp. 1-14.

Podgursky, Michael (2006). "Is Teacher Pay 'Adequate?'" Education Working Paper Archive.

Roza, M., P.T. Hill, S. Sclafani, and S. Speakman (2004). "How within-district spending inequities help some schools to fail." Brookings Papers on Education Policy, pp. 201-227.

Santa Rosa City Schools (Mar. 2007). School Board Minutes, Quality Education Investment Act. http://www.srcs.k12.ca.us/board/agendas/attachments/032807-BR-F7.pdf.

Springer, Matthew G., Eric A. Houck, and James W. Guthrie (2008). "History and scholarship regarding United States education finance and policy." Handbook of Research in Education Finance and Policy, pp. 3-22.

U.S. Department of Education (2009). Digest of Education Statistics. http://nces.ed.gov/programs/digest/d09/tables/dt09_079.asp. Institute of Education Sciences, National Center for Education Statistics.

Chapter 3

Asymptotic Properties of Quantile Regression for Standard Stratified and Variable Probability Sampling

(This chapter is coauthored with Otávio Bartalotti.)

3.1 Introduction

Quantile Regression has been widely used in the social sciences in recent decades, in part due to its ability to estimate changes throughout the conditional distribution of an outcome of interest. Ordinary Least Squares models the effect on an outcome of interest as a location shift in the conditional distribution of the outcome variable. Yet causal effects may manifest as greater variance, skewness, or density in the tails of the conditional distribution, all of which may be obscured by focusing exclusively on location shifts. As exemplified by Koenker (2005), changes in independent variables may even induce a bimodal conditional distribution. Quantile Regression can reveal these effects. A natural use of Quantile Regression has been to analyze the wage structure and potential differences in the determinants of wages observed at different points of the wage distribution, e.g., Albrecht et al. (2003); Buchinsky (1998); Buchinsky (2001); Machado et al. (2005); Martins et al.
(2004); and Melly (2005).

Given a sample in which observations are selected with equal probability, well-established methods are available for estimating a Quantile Regression model (Koenker (2005); Wooldridge (2010)). Frequently, however, samples are not drawn with equal probability. Commonly used data sets such as the Current Population Survey, the Panel Study of Income Dynamics, the National Longitudinal Survey of Youth, and the Health and Retirement Study sample with unequal probability. In order to more precisely estimate characteristics of subpopulations of interest, these subpopulations are often oversampled. Ignoring the sampling design of such data sets may lead to inconsistent estimation, in which case consistent estimation can proceed by weighting observations. (For a general discussion of, and guidance on, the appropriateness of weighting, see Solon et al. (2013).)

Two types of sampling schemes that are prevalent in a wide range of surveys and data sets in the social sciences are Standard Stratified (SS) and Variable Probability (VP) sampling. With SS sampling, the population is divided into J mutually exclusive, exhaustive strata, and a random sample of size Nj is taken from stratum j. Alternatively, in the VP sampling case an observation is first drawn at random from the population, and if the observation falls into stratum j, it is kept with probability pj.

In either case, when stratification is exogenous, i.e., the probability of selection is independent of the outcome conditional on covariates, estimation can proceed without regard to the stratification; the usual estimators that ignore stratification are consistent, efficient, and asymptotically normal, and the usual variance estimators are valid (Wooldridge (1999); Wooldridge (2001)). When the probability of selection is not independent of the outcome conditional on covariates, stratification is said to be endogenous, and the standard estimators are generally inconsistent.

The asymptotic properties of M-estimators with smooth objective functions under VP and SS sampling have been analyzed in Wooldridge (1999) and Wooldridge (2001), respectively. However, these results are not directly applicable to the Quantile Regression case, due to the nonsmoothness of the objective function that provides the QR estimates. Bartalotti (2012) partially fills this gap by developing the asymptotic properties of quantile regressors under SS sampling. This paper extends the analysis to Quantile Regression under VP sampling. Additionally, we present evidence from simulations, which demonstrate that Stata's weighted standard errors are quite inaccurate, particularly under VP sampling. Bootstrapped standard errors outperform analytic standard errors under VP sampling across coefficients, quantiles, and sample sizes. Under SS sampling no method of estimating standard errors performs consistently well.

In what follows, section 3.2 reviews the standard Quantile Regression estimator. Section 3.3 reviews the asymptotic properties of Quantile Regression under SS sampling, and develops those of VP sampling. Section 3.4 provides results from Monte Carlo simulations. Section 3.5 concludes.

3.2 The Quantile Regression Population Problem

We are interested in estimating the Conditional Quantile Function (CQF) of a random variable y conditional on a vector of q explanatory variables x.
This is defined by

$$Q_\tau(y \mid x) = \inf\{y : F(y \mid x) \geq \tau\}$$

where $\tau \in (0,1)$ indexes the $\tau$th quantile of the conditional distribution of y. Let the CQF be described by a known function $g(\cdot)$ of the parameters and the explanatory variables:

$$Q_\tau(y \mid x) = g\left(x, \beta_{o,\tau}\right) \quad (3.1)$$

$\beta$ is subscripted with "o" to denote the true population parameter, and with $\tau$ to indicate that the parameters typically vary with $\tau$. A special case of interest is given by the linear model

$$y = x'\beta_{o,\tau} + \varepsilon \quad (3.2)$$

with $Q_\tau(\varepsilon \mid x) = 0$. (This formulation assumes the error term is additive and, hence, separable. For a treatment of the more general formulation with potentially non-separable $\varepsilon$, see Powell (1991).) Throughout this paper we concentrate on the linear CQF, for ease of exposition and since it is the most widely used by practitioners. Nevertheless, the results presented are valid for a nonlinear, correctly specified CQF $g(\cdot)$.

In the population, $\beta_{o,\tau}$ solves the following problem:

$$\min_{\beta_\tau \in \mathcal{B}} \; E\left[\rho_\tau\left(y - x'\beta_\tau\right)\right] \quad (3.3)$$

where $\rho_\tau(u) = (\tau - \mathbf{1}[u \leq 0])u$ and $\mathcal{B} \subset \mathbb{R}^K$ is the parameter space. Given a random sample of size N from the population, it is possible to obtain consistent estimates of $\beta_{o,\tau}$ by the standard Quantile Regression estimator, which solves the following:

$$\min_{\beta_\tau \in \mathcal{B}} \; N^{-1} \sum_{i=1}^{N} \rho_\tau\left(y_i - x_i'\beta_\tau\right) \quad (3.4)$$

Note that the minimization problem has the following first order condition and sample analogue (Buchinsky (1998)):

$$E\left[\left(\tau - \mathbf{1}\left[y - x'\beta_{o,\tau} \leq 0\right]\right)x\right] = 0 \quad (3.5)$$

$$N^{-1}\sum_{i=1}^{N}\left(\tau - \mathbf{1}\left[y_i - x_i'\breve{\beta}_\tau \leq 0\right]\right)x_i = 0 \quad (3.6)$$

where $\mathbf{1}[\cdot]$ is the indicator function. (In general the first order condition does not hold exactly, but the left-hand side of equation (3.6) is $o_p(N^{-1/2})$; see Buchinsky (1998).) We can therefore frame this problem as a GMM estimator that uses as moment conditions the first order conditions of the Quantile Regression problem that identify $\beta_{o,\tau}$. Under random sampling, the standard Quantile Regression procedures can be used to estimate $\beta_{o,\tau}$ and to perform inference.

3.3 Quantile Regression under SS and VP Sampling

3.3.1 SS Sampling

We review here the SS sampling case explicated in Bartalotti (2012), and extend the analysis to VP sampling. Under SS sampling, the population is divided into J strata, $W_1, W_2, \ldots, W_J$. A sample of $N_j$ observations is drawn randomly from each stratum j, and is denoted by $\{w_{ij} = (y_{ij}, x_{ij}) : i = 1, \ldots, N_j\}$. The strata sample sizes $N_j$ are nonrandom. Therefore, the total sample size, $N = N_1 + \cdots + N_J$, is nonrandom. The density of a characteristic y in the jth stratum is denoted by $dF(y \mid j)$, with $F(a \mid j)$ denoting the population proportion of households in stratum $W_j$ with $y < a$. Crucially, this density can differ across strata, so even though the observations are i.i.d. within strata, observations from different strata are independent but not necessarily identically distributed.

Bartalotti (2012) shows that a consistent estimator of $\beta_{o,\tau}$ uses the following sample moment condition:

$$\frac{1}{N} \sum_{i=1}^{N} \frac{Q_j}{H_j} \left(\tau - \mathbf{1}\left[y_{ij} - x_{ij}'\beta_\tau \leq 0\right]\right) x_{ij} = 0 \quad (3.7)$$

where $Q_j = P(w \in W_j)$ is assumed known, and $H_j = N_j/N$. If $Q_j$ is unknown, it can readily be estimated from large survey data. This is the empirical moment condition that is used to estimate the parameters of interest, defining the weighted Quantile Regression estimator under SS sampling.
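In practice, this weighted estimator can be computed by minimizing a weighted check-function objective whose first order condition is (3.7). A minimal sketch under an assumed two-stratum design; the strata, weights, and DGP here are illustrative only:

```python
import numpy as np
from scipy.optimize import minimize

def weighted_qr(y, X, weights, tau):
    """Minimize the weighted check-function objective; its first order
    condition is the weighted moment condition in equation (3.7)."""
    def loss(beta):
        u = y - X @ beta
        return np.mean(weights * (tau - (u <= 0)) * u)
    return minimize(loss, x0=np.zeros(X.shape[1]), method="Nelder-Mead").x

# Illustrative SS sample: two strata with known population shares Q_j
# and sample shares H_j = N_j / N. The weight is Q_j / H_j itself, not
# its square root as least squares weighting would require.
rng = np.random.default_rng(1)
N1, N2, Q = 800, 200, np.array([0.5, 0.5])
x = np.concatenate([rng.uniform(0, 1, N1), rng.uniform(1, 2, N2)])
y = 1.0 + x + rng.normal(size=N1 + N2) * (0.5 + 0.5 * x)
H = np.array([N1, N2]) / (N1 + N2)
w = np.where(np.arange(N1 + N2) < N1, Q[0] / H[0], Q[1] / H[1])
X = np.column_stack([np.ones_like(x), x])
print(weighted_qr(y, X, w, tau=0.5))  # roughly (1, 1) at the median
```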
The estimator defined by (3.7) is consistent for the parameters of interest under Standard Stratified sampling (Wooldridge (2001), Theorem 3.1). (As a minor point, note that if one wants to implement the weighted estimator by applying standard Quantile Regression to weighted data, the weight for each observation is given by $Q_{j_i}/H_{j_i}$, instead of the $(Q_{j_i}/H_{j_i})^{1/2}$ usually required when implementing least squares estimation and its variants.) As Bartalotti (2012) shows, under standard regularity conditions,

$$\sqrt{N}\left(\hat{\beta}_\tau - \beta_{o,\tau}\right) \overset{a}{\sim} N\left(0, A_1^{-1} B_1 A_1^{-1}\right)$$

where

$$A_1 = E\left[f_{y|x}\left(x'\beta_{o,\tau}\right) x x'\right]$$

and

$$B_1 = \sum_{j=1}^{J} \frac{Q_j^2}{H_j} \mathrm{Var}\left(q \mid w \in W_j\right)$$

and $q_{ij} = (\tau - \mathbf{1}[y_{ij} - x_{ij}'\beta_{o,\tau} \leq 0])x_{ij}$.

Two main points regarding $B_1$ are worth mentioning. The first, which is general to the Standard Stratification literature, is that we cannot replace $\mathrm{Var}(q \mid w \in W_j)$ by the outer product of the score, as in the random sampling case, because in general

$$E\left[\left(\tau - \mathbf{1}\left[y - x'\beta_{o,\tau} \leq 0\right]\right)x \;\middle|\; w \in W_j\right] \neq 0 \quad (3.8)$$

as pointed out by Wooldridge (2001). Without further assumptions, the population moment condition does not necessarily hold in each stratum. Second, it is interesting to note that, distinct from the standard results in Quantile Regression for random sampling, $B_1$ does not simplify to $\tau(1-\tau)E[xx']$ in this case, since the variance of the binary variable $\mathbf{1}[y_{ij} - x_{ij}'\beta_{o,\tau} \leq 0]$ is not necessarily the same for each stratum. That is, $x_{ij}'\beta_{o,\tau}$ will not represent the $\tau$th quantile in every stratum.

A feasible estimator requires knowledge of $f_{y|x}$. Koenker (2005) suggests using the fact that $1/f_{y|x} = dQ_\tau(Y|X)/d\tau$. $f_{y|x}$ can therefore be estimated using the inverse of a difference quotient:

$$\hat{f}_{y|x} = \frac{2h}{X\hat{\beta}_{\tau+h} - X\hat{\beta}_{\tau-h}} \quad (3.9)$$

We thus use the following estimate of $A_1$:

$$\hat{A}_1 = N^{-1} \sum_{i=1}^{N} \frac{Q_j}{H_j} \hat{f}_{i,y|x}\left(x_{ij}'\hat{\beta}_\tau\right) x_{ij} x_{ij}' \quad (3.10)$$

A natural estimate of $B_1$ is

$$\hat{B}_1 = N^{-1} \sum_{i=1}^{N} \frac{Q_j^2}{H_j^2} \left(\hat{q}_{ij} - \bar{\hat{q}}_j\right)\left(\hat{q}_{ij} - \bar{\hat{q}}_j\right)' \quad (3.11)$$

where $\hat{q}_{ij} = (\tau - \mathbf{1}[y_{ij} - x_{ij}'\hat{\beta}_\tau \leq 0])x_{ij}$, and $\bar{\hat{q}}_j = N_j^{-1} \sum_{i \in W_j} \hat{q}_{ij}$.

3.3.2 VP Sampling

Under VP sampling, N observations are first drawn at random from the population, and the sample is denoted by $\{w_i = (y_i, x_i) : i = 1, \ldots, N\}$. If an observation falls into stratum j, it is kept with probability $p_j$. Following Wooldridge (1999), for each individual i we define J indicator variables $s_{ij} = \mathbf{1}[w_i \in W_j]$. Likewise, we define for each individual i J binary variables $h_{ij}$, where $P(h_{ij} = 1) = p_j$. If observation i is in stratum j, it is kept if $h_{ij} = 1$. Finally, define $r_{ij} = s_{ij} h_{ij}$, which indicates whether random draw i is kept, and if so, what stratum it belongs to. Note that under VP sampling the number of observations kept from stratum j, $N_j$, is random, and so therefore is the total number of observations kept across strata, $N_0 = N_1 + \cdots + N_J$.

Corollary 1. With these definitions, a consistent Quantile Regression estimator under VP sampling is given by the following sample moment condition:

$$\sum_{i=1}^{N} \sum_{j=1}^{J} p_j^{-1} h_{ij} s_{ij} \left(\tau - \mathbf{1}\left[y_{ij} - x_{ij}'\tilde{\beta}_\tau \leq 0\right]\right) x_{ij} = 0 \quad (3.12)$$

All proofs are provided in Appendix B. Note that the outer summation in equation (3.12) is over N, which includes discarded observations. In practice one can use

$$\sum_{i=1}^{N_0} p_j^{-1} \left(\tau - \mathbf{1}\left[y_{ij} - x_{ij}'\tilde{\beta}_\tau \leq 0\right]\right) x_{ij} = 0 \quad (3.13)$$

The asymptotic distribution of $\tilde{\beta}_\tau$ follows from Newey et al. (1994), Theorem 7.1:

Corollary 2. If the conditions of Newey et al. (1994), Theorem 7.1, are satisfied, $\sqrt{N}(\tilde{\beta}_\tau - \beta_\tau) \overset{a}{\sim} N(0, A_2^{-1} B_2 A_2^{-1})$, where

$$A_2 = E\left[f_{y|x}\left(x'\beta_\tau\right) x x'\right] \quad (3.14)$$

and

$$B_2 = \sum_{j=1}^{J} p_j^{-1} E\left[s_{ij}\, q q'\right] \quad (3.15)$$

and q is as defined above.
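The estimator in (3.13) amounts to inverse-probability weighting of the retained observations. A minimal sketch under an assumed two-stratum VP design, reusing the weighted check-function minimizer from the SS sketch above; everything here is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def weighted_qr(y, X, weights, tau):
    """Weighted check-function objective; its FOC is equation (3.13)."""
    def loss(beta):
        u = y - X @ beta
        return np.mean(weights * (tau - (u <= 0)) * u)
    return minimize(loss, x0=np.zeros(X.shape[1]), method="Nelder-Mead").x

# Illustrative VP sampling: draw from the population, keep an observation
# in stratum j with probability p_j, then weight kept observations by 1/p_j.
rng = np.random.default_rng(2)
N = 5_000
x = rng.uniform(0, 2, N)
y = 1.0 + x + rng.normal(size=N)
strata = (x > 1).astype(int)          # two strata, defined by x here
p = np.array([0.2, 1.0])              # keep probabilities p_j
kept = rng.uniform(size=N) < p[strata]
w = 1.0 / p[strata][kept]             # inverse-probability weights

X = np.column_stack([np.ones(kept.sum()), x[kept]])
print(weighted_qr(y[kept], X, w, tau=0.5))  # roughly (1, 1) at the median
```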
We estimate (3.14) using

$$\hat{A}_2 = N^{-1} \sum_{i=1}^{N_0} p_j^{-1} \hat{f}_{i,y|x}\, x_{ij} x_{ij}' \quad (3.16)$$

where $\hat{f}_{y|x}$ is defined above. (3.15) can be estimated using

$$N^{-1} \sum_{i=1}^{N_0} p_j^{-2}\, \hat{q}_{ij} \hat{q}_{ij}' \quad (3.17)$$

Although both (3.16) and (3.17) depend on N, which is typically not observed, these cancel out in the expression for $\mathrm{Avar}(\tilde{\beta}_\tau)$.

3.4 Simulation Results

We compare the performance of the above analytic standard errors to those generated by Stata's qreg command, both with and without the "pweight" option. We also compare them to bootstrapped standard errors, where the bootstrapping procedure ignores the sampling scheme, but in each of 1,000 bootstrap replications coefficients are estimated with regard to the sampling scheme, i.e., using the "pweight" option in Stata.

Our data generating process follows the multiplicative heteroskedasticity model of Cameron et al. (2009):

$$y = 1 + x_1 + x_2 + u$$
$$u = (0.1 + 0.5 x_1) \times \varepsilon$$
$$x_1 \sim \chi^2_1, \quad x_2 \sim N(0, 25), \quad \varepsilon \sim N(0, 25)$$

An advantage of this DGP is that each quantile is linear in x:

$$Q_\tau(y \mid x) = \alpha_\tau + \beta_{1,\tau} x_1 + \beta_{2,\tau} x_2 = \left[1 + 0.1 F_\varepsilon^{-1}(\tau)\right] + \left[1 + 0.5 F_\varepsilon^{-1}(\tau)\right] x_1 + x_2$$

We create a population of 51 strata, with sizes proportional to the populations of the 50 US states and the District of Columbia. We present results with both exogenous and endogenous stratification, and under endogenous stratification we present results for two sample sizes. For the case of exogenous stratification, the u are sorted randomly across strata, and the sample size is the smaller of the two. In the case of endogenous stratification, observations are sorted across strata such that the most populous strata have the largest values of u. Since u is correlated with $x_1$, stratification is not exogenous, and estimators that ignore stratification are inconsistent. (A sketch of this design appears below.)

The SS sampling case sets $N_j = 20\ \forall j$ for the smaller sample size, and $N_j = 50\ \forall j$ for the larger sample size. For the VP sampling case, we set $p_j$ proportional to the inverse of the population of stratum j, so that in expectation each stratum is equally represented in the sample. The scaling factor is set so that $E(N) = 1{,}020$ for the smaller sample size and $E(N) = 2{,}550$ for the larger sample size. For both SS and VP sampling we draw 10,000 samples from the population.

We present results for infeasible estimates of $A_1$ and $A_2$ that rely on knowledge of the true $f_{y|x}$, the standard errors from which are denoted $f_i$, as well as feasible estimates that use equation (3.9). For the bandwidth h, we rely on Stata's three methods, the Hall-Sheather, Bofinger, and Chamberlain methods; the standard errors based on these are denoted $\hat{f}_{i,1}$, $\hat{f}_{i,2}$, and $\hat{f}_{i,3}$, respectively. Each bandwidth is a function of $\sum_{i=1}^{N} \text{weight}_{ij}$ and $\tau$, where $\text{weight}_{ij}$ is the weight for observation i in stratum j. Thus, for the VP case the bandwidths are random, since N is random, but in the SS case the bandwidths are not random.

Table 3.1 presents the results for the SS sampling case under exogenous stratification. For reference, the true values of the parameters are listed in the first row of each panel. Throughout the simulation results, estimates that do not have a w subscript use Stata's qreg command without weights, while those with the w subscript use weights. Confirming theory, the unweighted estimates well approximate the true values, and are more precise than the weighted estimates.
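To fix ideas, a minimal sketch of the simulation DGP just described; the random seed and function names are incidental, and the closed-form quantile coefficients provide the benchmark against which the simulated estimates are judged:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def draw_population(size):
    """Multiplicative heteroskedasticity DGP of section 3.4."""
    x1 = rng.chisquare(1, size)
    x2 = rng.normal(0.0, 5.0, size)   # N(0, 25): standard deviation 5
    eps = rng.normal(0.0, 5.0, size)  # N(0, 25)
    y = 1 + x1 + x2 + (0.1 + 0.5 * x1) * eps
    return y, x1, x2

# The implied quantile coefficients are known in closed form.
tau = 0.90
alpha_tau = 1 + 0.1 * norm.ppf(tau, scale=5)  # ~1.64, as in Table 3.1
beta1_tau = 1 + 0.5 * norm.ppf(tau, scale=5)  # ~4.20; beta2 is 1 at every tau
print(alpha_tau, beta1_tau)
```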
Since the precision of Quantile Regression estimators is determined in part by the amount of data in the neighborhood of the quantile, i.e., observations with $y_i - Q_\tau(y|x)$ near zero, the estimators are noisier in the tails, e.g., at the 10th and 90th percentiles. Stata's unweighted standard errors, which are correct under exogenous stratification, well approximate the standard deviation of the empirical distribution of the unweighted estimators. In contrast, the standard errors that use weights routinely underestimate the standard deviation of the empirical distribution of the weighted estimators. Among the estimates of standard errors of weighted estimators, those obtained by bootstrapping perform best. Both the infeasible and the feasible analytic standard errors tend to underestimate variation in the estimators. The bandwidth for $\hat{f}_{i,2}$ at the 25th percentile is approximately 0.35 (recall that the bandwidth is not random under SS sampling), and therefore it is not possible to estimate $\beta_{0.25-h}$.

Table 3.2 presents results under endogenous stratification. Not surprisingly, estimates of $\beta_1$ and $\alpha$ that fail to account for stratification do a poor job, while those that account for stratification well approximate the true values. The coefficients on $x_2$, which is random across strata, are unaffected by weighting. The variability of $\hat{\beta}_{1,w}$ does not exhibit the symmetric pattern observed under exogenous stratification, and is instead monotonically increasing in $\tau$. This is due to the sampling scheme: the endogenous stratification oversamples observations in the neighborhood of the 10th conditional percentile and undersamples observations in the neighborhood of the 90th. Stata's weighted standard errors appear completely insensitive to this fact, and instead exhibit a U-shaped pattern. (We believe Stata's weighted standard errors have two flaws relative to Equation (3.11): they use the outer product of the score, and the weighting factor $Q_j/H_j$ is not squared.) This leads to dramatic overstatement of variability at lower quantiles, and nonetheless still understatement of variability at higher quantiles. The bootstrapped standard errors capture the monotonic pattern of increase in $\tau$, but for estimates of $\alpha$ and $\beta_1$ the bootstrapped standard errors are too low for $\tau = 0.10$, and perform fairly well at $\tau = 0.90$. In stark contrast, both the feasible and infeasible standard errors for $\alpha_w$ and $\beta_{1,w}$ are too high for $\tau = 0.90$, but perform well for lower values of $\tau$.

We present the results for the larger sample size, $N_j = 50$, in Table 3.3. $\hat{\beta}_{1,w}$ is more precisely estimated across quantiles, with the proportional reduction in the empirical standard deviations being about constant across $\tau$. The bootstrapped standard errors again tend to overstate variability in estimates of $\alpha$ and $\beta_{1,w}$, particularly at lower values of $\tau$. With the larger sample size, both the feasible and infeasible analytic standard errors tend to outperform bootstrapped standard errors at each value of $\tau$, and for each coefficient.

Table 3.4 presents results for the VP case under exogenous stratification. Again confirming theory, when stratification is exogenous, unweighted estimates well approximate the true values, and are efficient relative to weighted estimates. The standard deviation of the empirical distribution of the estimates across all 10,000 simulations follows a U-shaped pattern across $\tau$, with less variation at $\tau = 0.50$, and the most variation in the tails.
Stata's unweighted standard errors accurately estimate the true variation in estimates, but the weighted standard errors are more than an order of magnitude too small. (We obtain numerically identical results to Stata's weighted standard errors when we use $p^{-1}$, instead of $p^{-2}$, in Equation (3.17).) The bootstrapped and infeasible standard errors perform well, though both underestimate variation of $\tilde{\beta}_1$ at $\tau = 0.90$. The feasible standard errors consistently underestimate variation in estimates.

Results under endogenous stratification with $E(N) = 1{,}020$ are presented in Table 3.5. Again, as in the SS case, unweighted estimates perform poorly under endogenous stratification, while the weighted estimates well approximate the true values. Variation in the estimates is increasing in $\tau$, which, as in the SS case, is a product of our sampling scheme: observations in the neighborhood of the conditional quantile are systematically undersampled at $\tau = 0.90$, and oversampled at $\tau = 0.10$. Stata's weighted standard errors are again wildly inaccurate. Across all coefficients and values of $\tau$, the bootstrapped standard errors perform quite well, while both the infeasible and feasible analytic standard errors tend to understate variation at higher values of $\tau$.

The results from endogenous stratification with $E(N) = 1{,}020$ are precisely mirrored in the results under endogenous stratification with $E(N) = 2{,}550$, presented in Table 3.6. Again Stata's weighted standard errors are woefully inaccurate, while bootstrapped standard errors perform quite well across coefficients and values of $\tau$. Both the feasible and infeasible analytic standard errors tend to understate variability at $\tau = 0.90$.

3.5 Conclusion

This paper extends the results of Bartalotti (2012), which addressed the issue of inference for Quantile Regression when the data are obtained through Standard Stratified sampling, to the case where data are obtained through Variable Probability sampling. We develop the asymptotic distribution of Quantile Regression estimators under VP sampling, and provide valid estimators of their asymptotic variance matrix. Given the insights provided by Quantile Regression, and the fact that many data sets are obtained through complex random sampling techniques, this paper fills an important gap in the literature.

We provide simulation results that confirm theory in showing that unweighted estimates perform well under exogenous stratification, and are in that case efficient relative to weighted estimators. We demonstrate the importance of weighting for consistent estimation under SS and VP sampling when the sampling scheme is endogenous. Under SS sampling, neither bootstrapped nor analytic standard errors are always best, though with larger sample sizes the analytic standard errors tended to do better. Under VP sampling, bootstrapped standard errors performed best across coefficients, quantiles, and sample sizes, while analytic standard errors underestimated variability around quantiles that were undersampled. A consistent finding throughout the simulation results is that Stata's weighted standard errors are erroneous, and should not be used.
APPENDICES

APPENDIX A - TABLES

Table 3.1: Exogenous SS, Simulation Results

                        10th     25th     50th     75th     90th
β1 (true)              -2.200   -0.690    1.000    2.690    4.200
β̂1                    -2.219   -0.699    0.997    2.671    4.192
                       (0.300)  (0.239)  (0.226)  (0.238)  (0.301)
β̂1,w                  -2.180   -0.685    0.992    2.660    4.162
                       (0.439)  (0.355)  (0.326)  (0.353)  (0.449)
Standard errors:
  Stata's Unweighted    0.299    0.240    0.223    0.238    0.302
  Stata's Weighted      0.291    0.237    0.218    0.235    0.296
  Bootstrapped          0.423    0.357    0.331    0.355    0.430
  f̂i,1                 0.405    0.342    0.315    0.335    0.399
  f̂i,2                 0.444    .        0.322    0.345    0.412
  f̂i,3                 0.384    0.328    0.306    0.324    0.382
  fi                    0.417    0.343    0.316    0.337    0.398

β2 (true)               1.000    1.000    1.000    1.000    1.000
β̂2                     1.001    1.000    1.000    1.000    1.000
                       (0.013)  (0.011)  (0.010)  (0.011)  (0.013)
β̂2,w                   1.001    1.001    1.001    1.001    1.001
                       (0.021)  (0.016)  (0.015)  (0.016)  (0.021)
Standard errors:
  Stata's Unweighted    0.013    0.010    0.010    0.010    0.012
  Stata's Weighted      0.012    0.010    0.009    0.010    0.012
  Bootstrapped          0.022    0.017    0.016    0.017    0.022
  f̂i,1                 0.018    0.014    0.013    0.014    0.017
  f̂i,2                 0.020    .        0.014    0.015    0.018
  f̂i,3                 0.017    0.013    0.012    0.013    0.016
  fi                    0.019    0.016    0.014    0.015    0.019

α (true)                0.350    0.660    1.000    1.340    1.640
α̂                      0.366    0.670    1.006    1.347    1.648
                       (0.082)  (0.066)  (0.062)  (0.067)  (0.085)
α̂w                     0.345    0.660    1.002    1.344    1.658
                       (0.125)  (0.100)  (0.092)  (0.101)  (0.128)
Standard errors:
  Stata's Unweighted    0.083    0.066    0.061    0.067    0.082
  Stata's Weighted      0.082    0.065    0.060    0.065    0.081
  Bootstrapped          0.130    0.104    0.095    0.104    0.131
  f̂i,1                 0.116    0.094    0.088    0.093    0.110
  f̂i,2                 0.130    .        0.090    0.097    0.116
  f̂i,3                 0.106    0.088    0.082    0.087    0.102
  fi                    0.118    0.096    0.088    0.094    0.112

Note: Estimates come from 10,000 simulations. β̂1 is estimated without weights; β̂1,w is estimated with Stata's "pweight" option. Numbers in parentheses are standard deviations of the estimates across the 10,000 simulations. Bootstrapped standard errors come from 1,000 repetitions, each of which draws from the sample with equal probability and uses the weighted estimator. f̂i,1, f̂i,2, and f̂i,3 are based on the Hall-Sheather, Bofinger, and Chamberlain methods of estimating the bandwidth, while fi uses the known distribution. A "." indicates an estimate that is unavailable because the bandwidth exceeds the quantile (see text).
Table 3.2: Endogenous SS, Simulation Results

                        10th     25th     50th     75th     90th
β1 (true)              -2.200   -0.690    1.000    2.690    4.200
β̂1                    -3.959   -2.835   -1.682   -0.635    0.163
                       (0.173)  (0.124)  (0.102)  (0.112)  (0.141)
β̂1,w                  -2.213   -0.689    0.992    2.686    4.144
                       (0.145)  (0.175)  (0.264)  (0.442)  (0.681)
Standard errors:
  Stata's Unweighted    0.210    0.153    0.129    0.130    0.163
  Stata's Weighted      0.303    0.241    0.221    0.231    0.277
  Bootstrapped          0.201    0.251    0.384    0.544    0.702
  f̂i,1                 0.140    0.168    0.250    0.411    0.567
  f̂i,2                 0.151    .        0.255    0.426    0.588
  f̂i,3                 0.137    0.165    0.243    0.390    0.542
  fi                    0.138    0.166    0.248    0.424    0.599

β2 (true)               1.000    1.000    1.000    1.000    1.000
β̂2                     0.999    1.001    1.001    1.001    1.003
                       (0.018)  (0.013)  (0.012)  (0.014)  (0.022)
β̂2,w                   1.001    1.001    1.001    1.001    1.001
                       (0.011)  (0.012)  (0.015)  (0.021)  (0.031)
Standard errors:
  Stata's Unweighted    0.016    0.013    0.012    0.014    0.021
  Stata's Weighted      0.013    0.010    0.009    0.010    0.012
  Bootstrapped          0.012    0.012    0.016    0.023    0.036
  f̂i,1                 0.011    0.011    0.013    0.018    0.025
  f̂i,2                 0.012    .        0.014    0.019    0.026
  f̂i,3                 0.010    0.010    0.012    0.017    0.024
  fi                    0.011    0.011    0.014    0.020    0.027

α (true)                0.350    0.660    1.000    1.340    1.640
α̂                      0.206    0.557    0.960    1.387    1.886
                       (0.088)  (0.059)  (0.051)  (0.062)  (0.107)
α̂w                     0.360    0.662    1.001    1.346    1.688
                       (0.057)  (0.057)  (0.079)  (0.122)  (0.193)
Standard errors:
  Stata's Unweighted    0.101    0.078    0.074    0.082    0.135
  Stata's Weighted      0.084    0.066    0.060    0.064    0.081
  Bootstrapped          0.070    0.074    0.101    0.142    0.219
  f̂i,1                 0.057    0.055    0.073    0.108    0.149
  f̂i,2                 0.061    .        0.075    0.115    0.158
  f̂i,3                 0.054    0.054    0.069    0.098    0.138
  fi                    0.055    0.054    0.073    0.113    0.152

Note: Estimates come from 10,000 simulations. β̂1 is estimated without weights; β̂1,w is estimated with Stata's "pweight" option. Numbers in parentheses are standard deviations of the estimates across the 10,000 simulations. Bootstrapped standard errors come from 1,000 repetitions, each of which draws from the sample with equal probability and uses the weighted estimator. f̂i,1, f̂i,2, and f̂i,3 are based on the Hall-Sheather, Bofinger, and Chamberlain methods of estimating the bandwidth, while fi uses the known distribution. A "." indicates an estimate that is unavailable because the bandwidth exceeds the quantile (see text).
Table 3.3: Large Sample Endogenous SS, Simulation Results

            10th       25th       50th       75th       90th
β1         -2.200     -0.690      1.000      2.690      4.200
β̂1        -3.960     -2.833     -1.681     -0.632      0.168
           (0.107)    (0.078)    (0.066)    (0.072)    (0.089)
β̂1,w      -2.212     -0.687      0.992      2.679      4.181
           (0.093)    (0.110)    (0.165)    (0.283)    (0.437)

β2          1.000      1.000      1.000      1.000      1.000
β̂2         1.000      1.001      1.001      1.001      1.003
           (0.011)    (0.008)    (0.007)    (0.009)    (0.014)
β̂2,w       1.001      1.001      1.001      1.000      1.000
           (0.007)    (0.007)    (0.009)    (0.013)    (0.019)

α           0.350      0.660      1.000      1.340      1.640
α̂          0.211      0.557      0.958      1.385      1.877
           (0.054)    (0.037)    (0.033)    (0.039)    (0.067)
α̂w         0.360      0.662      1.001      1.342      1.658
           (0.036)    (0.036)    (0.049)    (0.077)    (0.116)

[Standard errors reported for each coefficient and quantile: Stata's unweighted, Stata's weighted, bootstrapped, f̂i,1, f̂i,2, f̂i,3, and fi.]

Note: Estimates come from 10,000 simulations. β̂1 is estimated without weights; β̂1,w is estimated with Stata's "pweight" option. Numbers in parentheses are standard deviations of the estimates across the 10,000 simulations. Bootstrapped standard errors come from 1,000 repetitions, each of which draws from the sample with equal probability and uses the weighted estimator. f̂i,1, f̂i,2, and f̂i,3 use the Hall-Sheather, Bofinger, and Chamberlain methods of estimating the bandwidth, while fi uses the known distribution.
Table 3.4: Exogenous VP Sampling, Simulation Results

            10th       25th       50th       75th       90th
β1         -2.200     -0.690      1.000      2.690      4.200
β̃1        -2.223     -0.697      1.001      2.674      4.194
           (0.295)    (0.236)    (0.224)    (0.237)    (0.305)
β̃1,w      -2.186     -0.682      0.992      2.659      4.166
           (0.437)    (0.358)    (0.329)    (0.356)    (0.450)

β2          1.000      1.000      1.000      1.000      1.000
β̃2         1.001      1.000      1.000      1.000      1.000
           (0.013)    (0.011)    (0.010)    (0.011)    (0.013)
β̃2,w       1.001      1.000      1.001      1.000      1.001
           (0.020)    (0.016)    (0.015)    (0.016)    (0.020)

α           0.350      0.660      1.000      1.340      1.640
α̃          0.367      0.670      1.005      1.347      1.648
           (0.082)    (0.066)    (0.062)    (0.068)    (0.085)
α̃w         0.346      0.658      1.001      1.345      1.657
           (0.125)    (0.100)    (0.092)    (0.101)    (0.128)

[Standard errors reported for each coefficient and quantile: Stata's unweighted, Stata's weighted, bootstrapped, f̃i,1, f̃i,2, f̃i,3, and fi.]

Note: Estimates come from 10,000 simulations. β̃1 is estimated without weights; β̃1,w is estimated with Stata's "pweight" option. Numbers in parentheses are standard deviations of the estimates across the 10,000 simulations. Bootstrapped standard errors come from 1,000 repetitions, each of which draws from the sample with equal probability and uses the weighted estimator. f̃i,1, f̃i,2, and f̃i,3 use the Hall-Sheather, Bofinger, and Chamberlain methods of estimating the bandwidth, while fi uses the known distribution.
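Before the endogenous VP results, it may help to see how a Variable Probability sample is generated. The following sketch uses made-up strata and retention probabilities (two strata split on the outcome, p = (0.9, 0.3)); only the mechanics — Bernoulli(p_j) retention h_ij within strata s_ij and inverse-probability weights 1/p_j — follow the chapter's setup.

import numpy as np

rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(size=N)
y = 1.0 + 1.0 * x + rng.normal(size=N)

# Endogenous stratification: strata are defined on the outcome y.
# (An exogenous design would define them on x instead.)
stratum = (y > np.median(y)).astype(int)   # s_ij: stratum j of observation i
p = np.array([0.9, 0.3])                   # p_j: assumed retention probabilities

keep = rng.random(N) < p[stratum]          # h_ij ~ Bernoulli(p_j)
y_vp, x_vp = y[keep], x[keep]
w = 1.0 / p[stratum[keep]]                 # weights used by the weighted estimator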
Table 3.5: Endogenous VP Sampling, Simulation Results

            10th       25th       50th       75th       90th
β1         -2.200     -0.690      1.000      2.690      4.200
β̃1        -3.956     -2.830     -1.680     -0.634      0.165
           (0.189)    (0.142)    (0.122)    (0.127)    (0.157)
β̃1,w      -2.212     -0.691      0.989      2.636      4.107
           (0.202)    (0.252)    (0.374)    (0.530)    (0.728)

β2          1.000      1.000      1.000      1.000      1.000
β̃2         1.000      1.001      1.001      1.001      1.002
           (0.018)    (0.013)    (0.012)    (0.014)    (0.022)
β̃2,w       1.001      1.001      1.000      1.000      1.000
           (0.011)    (0.012)    (0.015)    (0.021)    (0.031)

α           0.350      0.660      1.000      1.340      1.640
α̃          0.204      0.554      0.959      1.389      1.887
           (0.097)    (0.068)    (0.064)    (0.078)    (0.135)
α̃w         0.359      0.662      1.002      1.357      1.689
           (0.068)    (0.072)    (0.098)    (0.137)    (0.198)

[Standard errors reported for each coefficient and quantile: Stata's unweighted, Stata's weighted, bootstrapped, f̃i,1, f̃i,2, f̃i,3, and fi.]

Note: Estimates come from 10,000 simulations. β̃1 is estimated without weights; β̃1,w is estimated with Stata's "pweight" option. Numbers in parentheses are standard deviations of the estimates across the 10,000 simulations. Bootstrapped standard errors come from 1,000 repetitions, each of which draws from the sample with equal probability and uses the weighted estimator. f̃i,1, f̃i,2, and f̃i,3 use the Hall-Sheather, Bofinger, and Chamberlain methods of estimating the bandwidth, while fi uses the known distribution.
Table 3.6: Large Sample Endogenous VP Sampling, Simulation Results

            10th       25th       50th       75th       90th
β1         -2.200     -0.690      1.000      2.690      4.200
β̃1        -3.959     -2.831     -1.679     -0.632      0.171
           (0.119)    (0.091)    (0.077)    (0.081)    (0.099)
β̃1,w      -2.210     -0.687      0.991      2.657      4.154
           (0.129)    (0.157)    (0.237)    (0.338)    (0.465)

β2          1.000      1.000      1.000      1.000      1.000
β̃2         1.000      1.001      1.001      1.001      1.003
           (0.011)    (0.008)    (0.007)    (0.009)    (0.014)
β̃2,w       1.001      1.001      1.001      1.001      1.001
           (0.007)    (0.007)    (0.009)    (0.013)    (0.019)

α           0.350      0.660      1.000      1.340      1.640
α̃          0.210      0.556      0.958      1.386      1.877
           (0.059)    (0.043)    (0.040)    (0.049)    (0.085)
α̃w         0.360      0.662      1.001      1.346      1.662
           (0.042)    (0.045)    (0.061)    (0.087)    (0.121)

[Standard errors reported for each coefficient and quantile: Stata's unweighted, Stata's weighted, bootstrapped, f̃i,1, f̃i,2, f̃i,3, and fi.]

Note: Estimates come from 10,000 simulations. β̃1 is estimated without weights; β̃1,w is estimated with Stata's "pweight" option. Numbers in parentheses are standard deviations of the estimates across the 10,000 simulations. Bootstrapped standard errors come from 1,000 repetitions, each of which draws from the sample with equal probability and uses the weighted estimator. f̃i,1, f̃i,2, and f̃i,3 use the Hall-Sheather, Bofinger, and Chamberlain methods of estimating the bandwidth, while fi uses the known distribution.

APPENDIX B - PROOFS

Proof of Corollary 1. It suffices to show that (3.12) converges in probability to (3.5). Using the facts that h_{ij} is independent of (s_{ij}, y_{ij}, x_{ij}), E(h_{ij}) = p_j, and \sum_{j=1}^{J} s_{ij} = 1, we have the following:

\begin{align*}
N^{-1}\sum_{i=1}^{N}\sum_{j=1}^{J} p_j^{-1} h_{ij} s_{ij}\left(\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0]\right)x_{ij}
&\overset{p}{\to} \sum_{j=1}^{J} p_j^{-1} E\left(h_{ij} s_{ij}(\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0])x_{ij}\right) \\
&= \sum_{j=1}^{J} p_j^{-1} E(h_{ij})\,E\left(s_{ij}(\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0])x_{ij}\right) \\
&= E\left(\sum_{j=1}^{J} s_{ij}(\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0])x_{ij}\right) \\
&= E\left((\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0])x_{ij}\right).
\end{align*}

Proof of Corollary 2. To apply the results from Newey and McFadden (1994), note that

\begin{align*}
\nabla_{\beta_\tau} E\left(\sum_{j=1}^{J} p_j^{-1} h_{ij} s_{ij}(\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0])x_{ij}\right)
&= \nabla_{\beta_\tau} E\left(\sum_{j=1}^{J} s_{ij}(\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0])x_{ij}\right) \\
&= \nabla_{\beta_\tau} E\left((\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0])x_{ij}\right) \\
&= \nabla_{\beta_\tau} E\left(E\left(\tau - 1[y_{ij} - x_{ij}\beta_\tau \le 0] \mid x_{ij}\right)x_{ij}\right) \\
&= \nabla_{\beta_\tau} E\left((\tau - F_{y|x}(x_{ij}\beta_\tau))x_{ij}\right) \\
&= -E\left(f_{y|x}(x_{ij}\beta_\tau)\,x_{ij}'x_{ij}\right),
\end{align*}

so that A_{VP} = E\left(f_{y|x}(x_{ij}\beta_\tau)\,x_{ij}'x_{ij}\right), and

\[
B_{VP} = E\left[\left(\sum_{j=1}^{J} p_j^{-1} h_{ij} s_{ij} q_{ij}\right)\left(\sum_{j=1}^{J} p_j^{-1} h_{ij} s_{ij} q_{ij}\right)'\right].
\]

Cross products cancel out since h_{ij} s_{ij} h_{im} s_{im} = 0 for any m \ne j. Note that (h_{ij} s_{ij})^2 = h_{ij} s_{ij}, so

\[
B_{VP} = E\left(\sum_{j=1}^{J} p_j^{-2} h_{ij} s_{ij} q_{ij} q_{ij}'\right)
= \sum_{j=1}^{J} p_j^{-2} E\left(h_{ij} s_{ij} q_{ij} q_{ij}'\right)
= \sum_{j=1}^{J} p_j^{-1} E\left(s_{ij} q_{ij} q_{ij}'\right).
\]
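The cross-product cancellation in the proof of Corollary 2 is easy to check numerically. The sketch below uses an arbitrary stand-in for the score q_ij and assumed retention probabilities p_j; it confirms that the outer product of the inverse-probability-weighted score averages to \sum_j p_j^{-1} E(s_{ij} q_{ij} q_{ij}').

import numpy as np

rng = np.random.default_rng(1)
N, J, k = 200_000, 3, 2
p = np.array([0.8, 0.5, 0.2])              # p_j: assumed retention probabilities

s = rng.integers(0, J, size=N)             # stratum of each observation
q = rng.normal(size=(N, k)) + s[:, None]   # stand-in for the score q_ij
h = rng.random(N) < p[s]                   # h_ij ~ Bernoulli(p_j), independent of q given s

# Left side: E[(sum_j p_j^-1 h_ij s_ij q_ij)(...)'], estimated by averaging
# outer products; (h_ij s_ij)^2 = h_ij s_ij, so no cross terms survive.
g = (h / p[s])[:, None] * q
lhs = g.T @ g / N

# Right side: sum_j p_j^-1 E(s_ij q_ij q_ij'), estimated stratum by stratum.
rhs = sum(q[s == j].T @ q[s == j] / (p[j] * N) for j in range(J))

print(np.round(lhs, 3))
print(np.round(rhs, 3))                    # the two agree up to simulation noise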
BIBLIOGRAPHY

Albrecht, James, Anders Björklund, and Susan Vroman (2003). "Is there a glass ceiling in Sweden?" Journal of Labor Economics 21.1, pp. 145–177.

Bartalotti, Otávio (2012). "Essays in econometrics". UMI Number 3509178.

Buchinsky, Moshe (1998). "Recent advances in quantile regression models: A practical guideline for empirical research." The Journal of Human Resources 33.1, pp. 88–126.

Buchinsky, Moshe (2001). "Quantile regression with sample selection: Estimating women's return to education in the U.S." Empirical Economics 26.1, pp. 87–113.

Cameron, Adrian Colin and Pravin K. Trivedi (2009). Microeconometrics Using Stata. Vol. 5. Stata Press, College Station, TX.

Koenker, Roger (2005). Quantile Regression. Vol. 38. Cambridge University Press.

Machado, José A. F. and José Mata (2005). "Counterfactual decomposition of changes in wage distributions using quantile regression." Journal of Applied Econometrics 20.4, pp. 445–465.

Martins, Pedro S. and Pedro T. Pereira (2004). "Does education reduce wage inequality? Quantile regression evidence from 16 countries." Labour Economics 11.3, pp. 355–371.

Melly, Blaise (2005). "Decomposition of differences in distribution using quantile regression." Labour Economics 12.4, pp. 577–590.

Newey, Whitney K. and Daniel McFadden (1994). "Large sample estimation and hypothesis testing." Handbook of Econometrics 4, pp. 2111–2245.

Powell, James L. (1991). "Estimation of monotonic regression models under quantile restrictions." Nonparametric and Semiparametric Methods in Econometrics (Cambridge University Press, New York, NY), pp. 357–384.

Solon, Gary, Steven J. Haider, and Jeffrey Wooldridge (2013). "What are we weighting for?" Tech. rep. National Bureau of Economic Research.

Wooldridge, Jeffrey M. (1999). "Asymptotic properties of weighted M-estimators for variable probability samples." Econometrica 67.6, pp. 1385–1406.

Wooldridge, Jeffrey M. (2001). "Asymptotic properties of weighted M-estimators for standard stratified samples." Econometric Theory 17.2, pp. 451–470.

Wooldridge, Jeffrey M. (2010). Econometric Analysis of Cross Section and Panel Data. Second Edition. MIT Press.