This is to certify that the
dissertation entitled

ESSAYS IN PANEL DATA ECONOMETRICS EXAMINING
SELECTION BIAS AND AVERAGE TREATMENT EFFECTS

 

presented by


 

Kamyar Nasseh

 


has been accepted towards fulﬁllment
of the requirements for the

Doctoral degree in Economics

 

 

Major Professor's Signature

Date

MSU is an affirmative-action, equal-opportunity employer

ESSAYS IN PANEL DATA ECONOMETRICS EXAMINING SELECTION
BIAS AND AVERAGE TREATMENT EFFECTS
By

Kamyar Nasseh

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Economics

2007

ABSTRACT

ESSAYS IN PANEL DATA ECONOMETRICS EXAMINING SELECTION
BIAS AND AVERAGE TREATMENT EFFECTS

By
Kamyar Nasseh
Chapter 1 considers the effect of time varying unobserved effects in an
unbalanced panel data model with possible selectivity bias. As in previous work
dealing with sample selection, the unobserved effects in the regression and
selection equations are allowed to be correlated with the regressors. Prior to
testing for selectivity bias, the parameters of interest are estimated using
Generalized Method of Moments. A minimum distance procedure is used to
correct for selection bias. An empirical application dealing with a wage equation
is used to illustrate the testing and correction procedures outlined in this chapter.

Keywords: Panel Data; Sample Selection; Time Varying Unobserved Effects;
Conditional Mean Independence; Generalized Method of Moments; Minimum
Distance Estimation

In Chapter 2, we consider nonlinear panel data models with possible
sample selection bias. Previous work has shown the robustness properties of the
quasi-maximum likelihood estimator under a conditional mean assumption. One
can exploit the robustness properties of this estimator to test and correct for
selection bias. Under a conditional mean assumption, a Generalized Method of
Moments procedure is also available as an estimation method under suitable
orthogonality conditions. An empirical example is used to illustrate the theory
discussed in this chapter.

Keywords: Panel Data; Sample Selection; Robustness; Quasi-conditional
maximum likelihood; Conditional Mean Independence; Generalized Method of
Moments

Chapter 3 considers estimation of Average Treatment Effects (ATEs) for
panel data models. Previous work has estimated the endogenous ATE with a
correction function for cross-sectional data. The correlated random coefficient
model gives us a framework from which to estimate ATEs, especially when the
treatment variable is possibly endogenous. To account for endogeneity, we use a
correction function estimator, which adds a function to correct for endogeneity
bias. Monte Carlo simulations show that the correction function estimator
performs well in finite samples. An empirical example illustrates the theory
presented in this chapter by estimating the effect of the school choice program in
Michigan on fourth grade student performance in mathematics.

Keywords: Panel Data; Average Treatment Effect; Correction Function; FE-IV;
Correlated Random Coefficient; Endogeneity Bias

DEDICATION
I would like to dedicate my thesis to my parents and grandparents, without
whose support and encouragement I could not have completed my doctoral degree
in economics. I would also like to thank my sister, Nooshin, for providing me
support and encouragement throughout the last ﬁve years. Her sense of humor
has always helped relieve the many stressful periods that come with completing a
thesis. My family has always provided me the support and inspiration to
complete my education and achieve my goals and aspirations.


ACKNOWLEDGEMENT

I would like to thank my thesis advisor, Professor Jeffrey Wooldridge, for
the excellent training he has given me in the ﬁeld of panel data econometrics. I
was inspired to write a thesis in panel data econometrics after taking his advanced
course in cross-section and panel data econometrics at the beginning of my
second year. I am forever grateful to him for providing me the support,
encouragement, and patience to complete my thesis. I would also like to thank
Professor Peter Schmidt and Professor Emma Iglesias for their support. I am
deeply appreciative of the feedback they have given me throughout the writing of
this thesis.

TABLE OF CONTENTS

LIST OF TABLES ....................................................................... viii
CHAPTER 1
1. Introduction ............................................................................ 1
2. Consistency of linear time varying unobserved effects models
in balanced and unbalanced panels ........................................... 3
2.1. Consistency in a balanced panel ......................................... 3
2.2. Consistency in an unbalanced panel .................................... 6
3. Variable addition tests for selectivity bias ......................................... 9
3.1. Testing when only the selection indicator is observed ............. 9
3.2. Testing when the selection variable is partially observed ........... 12
4. Correcting for sample selection bias ............................................... 14
4.1. Selection Corrections when the selection variable is
partially observed ............................................................... 15
4.2. Selection Corrections when only the selection variable is
observed ......................................................................... 18
5. Empirical Application: A wage offer equation ................................... 20
6. Conclusion ............................................................................. 24
Appendix A: GMM standard errors for testing ..................................... 25
Appendix B: GMM standard error correction ................................................... 26

Appendix C: Derivation of the optimal weighting matrix ..................... 27

CHAPTER 2
1. Introduction ........................................................................... 38
2. Consistency of nonlinear models in balanced and unbalanced panels ....... 40
2.1. Consistency in a balanced panel ........................................ 40
2.2. Consistency in an unbalanced panel .................................... 46
3. Tests for selection bias ............................................................... 50
3.1. A simple variable addition test for selection bias ..................... 50
3. 2. Testing for contemporaneous selection bias ........................... 52
4. Correcting for sample selection bias ............................................... 57
5. Empirical Application: A wage offer equation ................................... 60
6. Conclusion ............................................................................ 63
Appendix A .............................................................................. 64


CHAPTER 3

1. Introduction ............................................................................... 71
2. General model and assumptions ...................................................... 73
3. A general method for deriving the correction function ............................. 78
4. Examples ................................................................................... 80
4.1. Probit treatment variables ................................................... 80
4.2. Tobit treatment variables .................................................... 82
5. A simple test for slope heterogeneity using FE-IV ................................. 84
6. Monte Carlo simulation ................................................................. 86
7. Empirical Example: Michigan Schools of Choice Program ....................... 89
8. Conclusion ................................................................................ 90
Appendix A ................................................................................ 95
FOOTNOTES ............................................................................... 99
REFERENCES ........................................................................ 101


LIST OF TABLES

CHAPTER 1:
Table 1: Summary Statistics ................................................. 31
Table 2: Estimates for wage equation ...................................... 32
Table 3: Estimates for wage equation ...................................... 34
Table 4: Estimates for wage equation.
Minimum distance estimation ............................................... 36
CHAPTER 2:
Table 5: Summary Statistics ................................................ 68
Table 6: Wage offer equation. Fixed Effects Poisson (FEP)
estimates ........................................................................ 69
Table 7: Wage offer equation.
Linear Fixed Effects (FE) estimates ......................................... 70
CHAPTER 3:
Table 8: Monte Carlo Results .............................................. 92
Table 9: Monte Carlo Results .............................................. 93
Table 10: Summary Statistics .............................................. 94
Table 11: ATE estimates .................................................... 94


CHAPTER 1

1. Introduction

In practice, applied labor, public, and IO economists deal with missing data where only a
subset of the entire population is observed. For example, labor economists are often
interested in estimating wages for the working population, but may only observe a subset of
workers because hours worked are not observed for everyone in the working population.
Inconsistent parameter estimates, otherwise known as selectivity bias, result when the
sub-population is nonrandomly drawn from the overall population. Recent econometric
literature has attempted to test and correct for selectivity bias. Nijman and Verbeek (1992)
provide a simple way to test for selectivity bias for random effects estimation. Wooldridge
(1995) uses a fixed effects approach where he allows for correlation between the unobserved
effects and the regressors. No distributional assumption is imposed upon the idiosyncratic
errors of the regression equation in Wooldridge (1995), but a normality assumption is
imposed on the errors of the selection equation. Wooldridge (1995) also allows for serial
dependence in the errors of the regression equation. Since the unobserved effect is
differenced away, selection is also allowed to depend upon the unobserved effect in the
regression equation. Other econometricians have also dealt with missing data issues.
Kyriazidou (1997) has proposed sample selection methods that do not require distributional
assumptions. Instead, she uses a differencing approach between periods to eliminate the
unobserved effect and possible sample selection problem. Rochina-Barrachina (1999)
expands upon Wooldridge (1995) by making a normality assumption on the idiosyncratic
error in the regression equation.

All of these previous works on sample selection have only taken into account time
constant unobserved effects in the regression and selection equations. In this paper, we shall
make a contribution to the missing data econometric literature by introducing time varying
unobserved effects in the regression and selection equations. For example, labor economists
often have to deal with unbalanced panels when they forecast workers' wages. Wages not
only depend upon a worker’s experience, education, age, gender, etc., but also on an
unobservable skill which may have a price that varies through time. In a linear panel data
model denoting a wage equation, the unobserved effect can represent a worker’s
unobservable skills and the time varying parameter on the unobservable effect can represent
the time varying price attributed to those skills. However, recent econometric literature has
only attempted to deal with estimation of balanced panel data models with time varying
unobserved effects. Holtz-Eakin, Newey, and Rosen (1998) estimate a vector autoregression
panel data model with time varying unobserved effects and endogeneity. Ahn, Lee, and
Schmidt (2001) consider GMM estimation of balanced linear panel data models with time
varying unobserved effects. This paper will extend Ahn, Lee, and Schmidt’s (2001) analysis
of time varying unobserved effects models to unbalanced panels. More speciﬁcally, this
paper applies time varying unobserved effects to Wooldridge (1995). Due to the time
varying parameter we impose on the unobserved effects of the regression and selection
equations, standard pooled OLS estimation will not be feasible to generate parameter
estimates and standard errors to test for selectivity bias. Therefore, GMM estimation will be
necessary to generate estimates of parameters and standard errors necessary for tests of
selectivity bias.

This paper shall only consider the effect of time varying unobserved effects on
Wooldridge's (1995) model of sample selection. The main objective of the paper is to
consistently estimate Ahn, Lee, and Schmidt's (2001) time varying unobserved effects model
in an unbalanced panel and account for any selectivity bias, while maintaining many of the
assumptions imposed in Wooldridge (1995). The asymptotic analysis in this paper is for
fixed $T$ as $N$ goes to infinity.

The plan for this paper is as follows. Section 2 will consider consistency of the Ahn,
Lee, Schmidt (2001) estimator in a balanced and unbalanced panel. Section 3 will cover
variable addition tests for selectivity bias. Section 4 will cover a method that corrects for
selectivity bias in time varying unobserved panel data models. Section 5 applies our
estimation technique to a wage offer equation. Appendix A contains the derivation of GMM
standard errors needed for tests of selectivity bias, while Appendix B provides a GMM
method that gives precise standard errors in the presence of generated regressors. Appendix
C derives the optimal weighting matrix used in the minimum distance procedure to correct
for selectivity bias in Section 4.

2. Consistency of linear time varying
unobserved effects models in balanced
and unbalanced panels

2.1 Consistency in a balanced panel

In Wooldridge (1995), the regression equation of interest has a time constant unobserved
effect. In this chapter, we assume that the unobserved effect in the regression equation
affects the dependent variable differently for different individuals, but that the temporal
pattern of the unobserved effects is the same for each person. For example, assume a wage
equation where the price of a worker's skill set varies over time. More specifically, consider
the following linear time varying unobserved effects model from Ahn, Lee, and Schmidt (2001),
hereafter denoted as ALS, for i.i.d. cross-section observations: for any $i$,

$$y_{it1} = x_{it1}\beta + \theta_{t1}\alpha_{i1} + u_{it1}, \quad t = 1,\dots,T,\ i = 1,\dots,N, \tag{2.1}$$

where $x_{it1}$ is $1 \times K$, $\beta$ is $K \times 1$, $\theta_{t1}$ is a scalar element of
$(1,\theta_{21},\dots,\theta_{T1})'$, and the normalization $\theta_{11} = 1$ is imposed. In (2.1), the
number one in the subscripts of $(y_{it1}, x_{it1}, \alpha_{i1}, u_{it1})$ denotes the primary regression
equation of interest, such as a wage equation. We assume that $N$ cross-sectional observations
are available and that the asymptotic properties of the estimates are derived with $T$ fixed
and $N \to \infty$.

The following assumption ensures consistency of the ALS estimator in a balanced panel:

Assumption 1: For $t = 1,\dots,T$,

$$E(u_{it1} \mid x_{i11},\dots,x_{iT1},\alpha_{i1}) = 0.$$

Under Assumption 1, for fixed $T$, the ALS estimator is consistent and
$\sqrt{N}$-asymptotically normal as $N \to \infty$. Assumption 1 states that $x_{it1}$ is strictly
exogenous conditional on $\alpha_{i1}$ for $t = 1,\dots,T$. To conduct estimation, we first use a
quasi-differencing technique to eliminate the unobserved effect and then perform GMM to
estimate the parameters:

$$y_{it1} - \theta_{t1}y_{i11} = (x_{it1} - \theta_{t1}x_{i11})\beta + (u_{it1} - \theta_{t1}u_{i11}). \tag{2.2}$$
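The quasi-differencing step in (2.2) can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: a scalar regressor ($K = 1$), standard normal draws, a regressor that is correlated with the unobserved effect by construction, and hypothetical helper names (`simulate_panel`, `quasi_difference_residual`, `max_quasi_difference_gap`). It confirms that, at the true parameters, the quasi-differenced residual equals $u_{it1} - \theta_{t1}u_{i11}$, so the unobserved effect drops out.

```python
import random


def simulate_panel(n, theta, beta, seed=0):
    """Simulate y_it = x_it*beta + theta_t*alpha_i + u_it with a scalar regressor.

    theta[0] must equal 1.0 (the normalization theta_11 = 1 in the text).
    All distributions here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_periods = len(theta)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        # The regressor is correlated with alpha, as the model allows.
        x = [alpha + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        u = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y = [x[t] * beta + theta[t] * alpha + u[t] for t in range(n_periods)]
        panel.append({"x": x, "y": y, "u": u, "alpha": alpha})
    return panel


def quasi_difference_residual(unit, theta, beta, t):
    """Residual of (2.2): (y_t - theta_t*y_1) - (x_t - theta_t*x_1)*beta."""
    x, y = unit["x"], unit["y"]
    return (y[t] - theta[t] * y[0]) - (x[t] - theta[t] * x[0]) * beta


def max_quasi_difference_gap(panel, theta, beta):
    """Largest absolute gap between the residual and u_t - theta_t*u_1."""
    gap = 0.0
    for unit in panel:
        for t in range(1, len(theta)):
            target = unit["u"][t] - theta[t] * unit["u"][0]
            resid = quasi_difference_residual(unit, theta, beta, t)
            gap = max(gap, abs(resid - target))
    return gap
```

Calling `max_quasi_difference_gap(simulate_panel(100, [1.0, 1.5, 0.7], 2.0), [1.0, 1.5, 0.7], 2.0)` should return a value at floating-point rounding level, since `alpha` cancels exactly in the algebra.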

Assumption 1 implies a large number of moment conditions.² Although more moment
conditions do not hurt asymptotically, one would like to reduce the number of moment
conditions to produce better finite-sample estimates. In terms of the data available, ALS use
the following $T(T-1)K$ moment conditions to consistently estimate $\theta_{t1}$ and $\beta$ by GMM.

Proposition 2.1 Under Assumption 1, for $t = 1,\dots,T$, $r = 2,\dots,T$,

$$E\{x_{it1}'[(y_{ir1} - x_{ir1}\beta) - \theta_{r1}(y_{i11} - x_{i11}\beta)]\} = 0.$$
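Proof:

Under Assumption 1, $x_{it1}$ is strictly exogenous with respect to $u_{it1}$. The following
$T(T-1)K$ moment conditions consistently estimate $\theta_{t1}$ and $\beta$:

$$E\left\{x_{it1}'\left[(y_{ir1} - x_{ir1}\beta) - \frac{\theta_{r1}}{\theta_{s1}}(y_{is1} - x_{is1}\beta)\right]\right\} = 0, \quad \forall\, r \neq s.$$

Proposition 2.1 follows from Assumption 1 since

$$E\left\{x_{it1}'\left[(y_{ir1} - x_{ir1}\beta) - \frac{\theta_{r1}}{\theta_{s1}}(y_{is1} - x_{is1}\beta)\right]\right\}
= E\{x_{it1}'[(y_{ir1} - x_{ir1}\beta) - \theta_{r1}(y_{i11} - x_{i11}\beta)]\}
- \frac{\theta_{r1}}{\theta_{s1}}E\{x_{it1}'[(y_{is1} - x_{is1}\beta) - \theta_{s1}(y_{i11} - x_{i11}\beta)]\}.$$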

Proposition 2.1 implies

$$E[x_{it1}'(u_{ir1} - \theta_{r1}u_{i11})] = 0, \quad t = 1,\dots,T,\ r = 2,\dots,T, \tag{2.3}$$

since, from the Law of Iterated Expectations,

$$E[x_{it1}'(u_{ir1} - \theta_{r1}u_{i11})]
= E\{E[x_{it1}'(u_{ir1} - \theta_{r1}u_{i11}) \mid \alpha_{i1}, x_i]\}
= E\{x_{it1}'E(u_{ir1} - \theta_{r1}u_{i11} \mid \alpha_{i1}, x_i)\} = 0, \quad r = 2,\dots,T,\ t = 1,\dots,T.$$

Now that we have defined the conditions for consistency of the ALS estimator in a balanced
panel data sample, we will next define the conditions for consistency of ALS in an
unbalanced panel.

2.2 Consistency of the ALS estimator in an
unbalanced panel

In this section, we define the conditions sufficient for consistency of the ALS estimator
in an unbalanced panel. As in the previous section, due to the time varying parameter on the
unobserved effect, one has to perform GMM to estimate $\theta_{t1}$ and $\beta$. The vector of selection
indicators for each $i$ is denoted as $s_i \equiv (s_{i1},\dots,s_{iT})'$, and $y_{it1}$ is observed if
$s_{it} = 1$. We assume that $x_{it1}$ is observed for all $s_i$. $\theta_{t1}$ and $\beta$ are consistently
estimated by performing GMM on a selected sample of (2.1). For fixed $T$, as $N \to \infty$, the ALS
estimator is consistent and asymptotically normal on the selected subsample when:

Assumption 2: For $t = 1,\dots,T$,

$$E(u_{it1} \mid x_{i11},\dots,x_{iT1},\alpha_{i1},s_i) = 0.$$

Due to the multiplicative way the selection indicators interact with the regressors in the
GMM moment conditions, we need a strict exogeneity condition to consistently estimate
$\theta_{t1}$ and $\beta$. A conditional expectation condition in the form of Assumption 2 is therefore
necessary for consistency of $\theta_{t1}$ and $\beta$ in an unbalanced panel. Assumption 2 is the same as
in Wooldridge (1995), although time varying unobserved effects were not allowed there.
However, Assumption 2 can be easily extended to models that have time varying unobserved
effects. To consistently estimate $\theta_{t1}$ and $\beta$ by GMM, we need to write the necessary
$T(T-1)K$ moment conditions in terms of the data and parameters available.

Proposition 2.2 Under Assumption 2, for $t = 1,\dots,T$, $r \neq t$,

$$E\left\{s_{it}s_{ir}x_{it1}'\left[(y_{it1} - x_{it1}\beta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - x_{ir1}\beta)\right]\right\} = 0.$$

Proof:
Consider a model without time varying unobserved effects. Under Assumption 2, $x_{it1}$ is
strictly exogenous with respect to $u_{it1}$ and $s_i$. The following $T(T-1)K$ moment conditions
consistently estimate $\beta$:

$$E\{s_{is}s_{ir}x_{it1}'[(y_{ir1} - x_{ir1}\beta) - (y_{is1} - x_{is1}\beta)]\} = 0, \quad \forall\, t,r,s.$$

Therefore, Proposition 2.2 follows from Assumption 2 for a selected subsample since

$$E\{s_{is}s_{ir}x_{it1}'[(y_{ir1} - x_{ir1}\beta) - (y_{is1} - x_{is1}\beta)]\}
= E\{s_{it}s_{is}x_{it1}'[(y_{it1} - x_{it1}\beta) - (y_{is1} - x_{is1}\beta)]\}
- E\{s_{it}s_{ir}x_{it1}'[(y_{it1} - x_{it1}\beta) - (y_{ir1} - x_{ir1}\beta)]\}.$$

In a model with time varying parameters, since the expectation operator passes through
parameters, the same considerations apply as in a model without time varying parameters,
and therefore we can claim using Assumption 2 that in a selected subsample

$$E\left\{s_{it}s_{ir}x_{it1}'\left[(y_{it1} - x_{it1}\beta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - x_{ir1}\beta)\right]\right\} = 0, \quad t = 1,\dots,T,\ r \neq t.$$

If the data are not missing for either time period $t$ or time period $r$, we just use
$E\{x_{it1}'[(y_{it1} - x_{it1}\beta) - (\theta_{t1}/\theta_{r1})(y_{ir1} - x_{ir1}\beta)]\} = 0$ for any pair $(t,r)$ that is
observed. If data are missing for either time period $t$ or time period $r$, those entries in the
vector of moment conditions are zero. Proposition 2.2 allows for more flexibility when
estimating the parameters $\beta$ and $\theta_{t1}$. In the previous subsection, the product of $\theta_{t1}$ and
the data from period one were differenced from period $t$. However, in an unbalanced panel
with missing data, this would not be a useful transformation to estimate $\beta$ and $\theta_{t1}$. By
using the transformation in the previous subsection, if $s_{i1} = 0$, then we would lose
cross-sectional observation $i$ in estimation. Proposition 2.2 allows enough flexibility in that
we can capture individuals that drop out of and re-enter the sample. If data are missing for a
certain time period, then one can use the data that are available and observed for any other
two time periods $t$ and $r$. Plugging in for $y_{it1}$ and $y_{ir1}$, the moment conditions implied by
Proposition 2.2 hold at the true parameters:

$$E\left[s_{it}s_{ir}x_{it1}'\left(u_{it1} - \frac{\theta_{t1}}{\theta_{r1}}u_{ir1}\right)\right] = 0, \quad t = 1,\dots,T,\ r \neq t. \tag{2.4}$$
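Under exogenous selection, consistency of $\beta$ and $\theta_{t1}$ is achieved in an unbalanced panel
when the sample of data is selected for period $r$ and period $t$. Condition (2.4) follows from
Assumption 2 by the law of iterated expectations since

$$E\left[s_{it}s_{ir}x_{it1}'\left(u_{it1} - \frac{\theta_{t1}}{\theta_{r1}}u_{ir1}\right)\right]
= E\left\{E\left[s_{it}s_{ir}x_{it1}'\left(u_{it1} - \frac{\theta_{t1}}{\theta_{r1}}u_{ir1}\right) \,\Big|\, x_i,\alpha_{i1},s_{it},s_{ir}\right]\right\}$$

$$= E\{s_{it}s_{ir}x_{it1}'E(u_{it1} \mid x_i,\alpha_{i1},s_{it},s_{ir})\}
- \frac{\theta_{t1}}{\theta_{r1}}E\{s_{it}s_{ir}x_{it1}'E(u_{ir1} \mid x_i,\alpha_{i1},s_{it},s_{ir})\}
= 0, \quad t = 1,\dots,T,\ r \neq t. \tag{2.5}$$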
Conditioning on $(s_{it},s_{ir})$ in (2.5) is valid under Assumption 2 due to a law of iterated
expectations argument and the fact that $(s_{it},s_{ir})$ is a subset of $s_i$. To achieve consistency
of the parameters of interest, $\beta$ and $\theta_{t1}$, the GMM moment conditions are premultiplied by
the product of the selection indicators from period $r$ and period $t$, $s_{it}s_{ir}$, to account for all
combinations of $(s_{it},s_{ir})$. Identification of the parameters of interest is achieved when the
data available are selected for period $r$ and period $t$ and when the expected derivative matrix
derived from Proposition 2.2 with respect to $\beta$ and $\theta_{t1}$ achieves full column rank. As in
Wooldridge (1995), it is not sufficient to just put $x_{it1}$ and $s_{it}$ in the conditioning set at time $t$.
Under Assumption 2, selection is strictly exogenous conditional on $\alpha_{i1}$ and $x_i$. Assumption
2 puts no restrictions on how selection relates to $\alpha_{i1}$ and $x_i$. Therefore, selection is allowed
to depend on the unobserved effect and the regressors in an arbitrary way.

3. Variable addition tests for selectivity
bias
Now, we will derive variable addition tests by adding a time varying unobserved effect to
the selection and regression equations. This differs from the approach taken by Wooldridge
(1995), where only time constant unobserved effects were incorporated into the selection and
regression equations. The approach used is similar to that of Wooldridge (1995), where
either Tobit residuals or inverse Mills ratios are used as the additional variable.

3.1 Testing when only the selection indicator is
observed

As in Wooldridge (1995), assume that the explanatory variables $x_{it}$ are observed for all
$t = 1,\dots,T$ and the variable $y_{it1}$ is observed if $s_{it} = 1$ and not otherwise. For each
$t = 1,\dots,T$, define the selection process as

$$s_{it} = 1[x_{it}\delta_2 + \theta_{t2}\alpha_{i2} + u_{it2} \geq 0], \tag{3.1}$$

where $\alpha_{i2} = \eta_2 + \bar{x}_i\pi_2 + c_{i2}$ and $\theta_{t2}$ is a scalar element of
$(1,\theta_{22},\dots,\theta_{T2})'$. One can use the Chamberlain (1984) version for the unobserved
effect. However, to conserve on parameters and degrees of freedom, one can use the Mundlak
(1978) version for the unobserved effect, which is a linear projection of $\alpha_{i2}$ on a constant,
the time average of the explanatory variables, and an error term. Plugging the Mundlak (1978)
version for $\alpha_{i2}$ into (3.1), one can derive $T$ selection equations

$$s_{it} = 1[x_{it}\delta_2 + \theta_{t2}(\eta_2 + \bar{x}_i\pi_2) + v_{it2} \geq 0], \tag{3.2}$$

$$v_{it2} = \theta_{t2}c_{i2} + u_{it2}, \quad t = 1,\dots,T, \tag{3.3}$$

where $c_{i2}$ and $u_{it2}$ are jointly normal and $Var(v_{it2}) = \theta_{t2}^2\sigma_{c2}^2 + 1 = \tau_{t2}^2$. As in
Wooldridge (1995), $v_{it2}$ is independent of $x_i$ with $E(v_{it2}) = 0$. Putting the selection equation
in a labor context, the wage of a worker is observed if the prospective worker accepts a wage
at or above the reservation wage. Due to the time varying parameter, $\theta_{t2}$, heteroskedasticity
is introduced in $v_{it2}$. When the first stage probit estimation is done to collect the inverse
Mills ratio necessary for the variable addition test, the parameters $\delta_2$, $\pi_2$, and $\eta_2$ are
re-scaled due to the time varying parameter $\theta_{t2}$ and the non-constant variance of $v_{it2}$. The
resulting re-scaled parameters will be time-varying.

 

$$P(s_{it} = 1 \mid x_i) = \Phi\left(\frac{\theta_{t2}\eta_2 + x_{it}\delta_2 + \theta_{t2}\bar{x}_i\pi_2}{\tau_{t2}}\right) = \Phi(\psi_{t2} + x_{it}\delta_{t2} + \bar{x}_i\pi_{t2}), \tag{3.4}$$

where $x_{it}$ and $\bar{x}_i$ each contain $K_1 \geq K$ regressors and $\Phi(\cdot)$ is the standard normal CDF.
Although Wooldridge (1995) also allowed time varying parameters in the first stage probit
estimation, logically the parameters could be time constant. Here, since it is explicit that
$Var(v_{it2})$ varies over time, the probit parameters in this model must vary somewhat. In
essence, our probit selection equation offers less flexibility than Wooldridge (1995) since we
impose a time varying parameter on the unobserved effect in the selection equation. Since
$v_{it2}$ is a linear combination of zero mean normally distributed random variables independent
of $(\alpha_{i1},x_i)$, it is also distributed Normal$(0,\tau_{t2}^2)$ and independent of $(\alpha_{i1},x_i)$. As in
Wooldridge (1995), one needs to assume independence between $v_{it2}$ and $\alpha_{i1}$ to derive a
convenient test for selectivity bias. Using the independence assumption between $v_{it2}$ and
$(\alpha_{i1},x_i)$, one can easily derive a conditional expectation that leads to a test for selectivity
bias. Since the vector of selection indicators $s_i$ is a function of $(x_i,v_{i2})$, where
$v_{i2} = (v_{i12},\dots,v_{iT2})'$, a sufficient condition for Assumption 2 is

$$E(u_{it1} \mid \alpha_{i1},x_i,v_{i2}) = 0, \quad t = 1,\dots,T. \tag{3.5}$$

Under the alternative of selectivity bias,

$$E(u_{it1} \mid \alpha_{i1},x_i,v_{i2}) = E(u_{it1} \mid v_{it2}) = \rho v_{it2}, \quad t = 1,\dots,T. \tag{3.6}$$

Equation (3.6) states that $u_{it1}$ is mean independent of
$(\alpha_{i1},x_i,v_{i12},\dots,v_{i,t-1,2},v_{i,t+1,2},\dots,v_{iT2})$, conditional on $v_{it2}$. Under the
alternative (3.6) of selectivity bias,

$$E(y_{it1} \mid \alpha_{i1},x_i,v_{i2},s_i) = E(y_{it1} \mid \alpha_{i1},x_i,v_{i2}) = x_{it1}\beta + \theta_{t1}\alpha_{i1} + \rho v_{it2}. \tag{3.7}$$

However, since only the selection indicator is observed, we must condition on $s_i$ rather than
on $v_{i2}$ to derive a test. Later, when we claim that the selection variable is partially observed,
we can condition on $v_{i2}$ to derive a test. Using the law of iterated expectations and the fact
that $v_{i2}$ is independent of $(\alpha_{i1},x_i)$, one can show that

$$E(y_{it1} \mid \alpha_{i1},x_i,s_i) = x_{it1}\beta + \theta_{t1}\alpha_{i1} + \rho E(v_{it2} \mid \alpha_{i1},x_i,s_i)
= x_{it1}\beta + \theta_{t1}\alpha_{i1} + \rho E(v_{it2} \mid x_i,s_i). \tag{3.8}$$

For the purpose of obtaining a simple test for selectivity bias, replace $E(v_{it2} \mid x_i,s_i)$ with
$E(v_{it2} \mid x_i,s_{it} = 1)$. To derive the test for selectivity bias, estimate
$E(v_{it2} \mid x_i,s_{it} = 1)$ in order to derive the inverse Mills ratio and continue to assume that
$Var(v_{it2}) = \theta_{t2}^2\sigma_{c2}^2 + 1 = \tau_{t2}^2$. Then

$$E(v_{it2} \mid x_i,s_{it} = 1) = E(v_{it2} \mid x_i, v_{it2} > -(\psi_{t2} + x_{it}\delta_{t2} + \bar{x}_i\pi_{t2}))
= \lambda(\psi_{t2} + x_{it}\delta_{t2} + \bar{x}_i\pi_{t2}), \tag{3.9}$$

where $\lambda(\psi_{t2} + x_{it}\delta_{t2} + \bar{x}_i\pi_{t2})$ denotes the inverse Mills ratio. The following
procedure tests for selectivity bias when only the selection indicator is observed.
Procedure 3.1

1. For each $t$, estimate equation (3.4) using standard probit and compute the inverse Mills
ratio

$$\hat{\lambda}_{it2} \equiv \lambda(\hat{\psi}_{t2} + x_{it}\hat{\delta}_{t2} + \bar{x}_i\hat{\pi}_{t2}).$$

2. Estimate the equation

$$y_{it1} = x_{it1}\beta + \rho\hat{\lambda}_{it2} + \theta_{t1}\alpha_{i1} + error_{it1} \tag{3.10}$$

by GMM. Define $w_{it} \equiv (x_{it1}, \hat{\lambda}_{it2})$ and $\delta \equiv (\beta',\rho)'$. Rewrite (3.10) as

$$y_{it1} = w_{it}\delta + \theta_{t1}\alpha_{i1} + error_{it1}, \quad t = 1,\dots,T. \tag{3.11}$$

Under the following $T(T-1)(K+1)$ moment conditions in terms of the data available,
GMM will be consistent for $\delta$ and $\theta_{t1}$:

$$E\left\{s_{it}s_{ir}w_{it}'\left[(y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - w_{ir}\delta)\right]\right\} = 0, \quad t = 1,\dots,T,\ r \neq t. \tag{3.12}$$

Identification of the parameters is achieved when the expected derivative matrix derived
from (3.12) achieves full column rank.

3. Test $H_0: \rho = 0$ using the t-statistic for $\hat{\rho}$. A statistic that uses the standard GMM
standard error is valid. See Appendix A for the derivation of the GMM standard errors.
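Step 1 of Procedure 3.1 amounts to evaluating the inverse Mills ratio at a fitted probit index. The sketch below assumes that period-specific probit estimates are already available from some other routine; the function names and argument names (`psi_t`, `delta_t`, `pi_t`) are hypothetical labels for those estimates, and only the Python standard library is used.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(index):
    """Inverse Mills ratio lambda(index) = pdf(index) / cdf(index).

    For very negative index values the CDF underflows toward zero, so a
    production implementation would need a numerically safer formula.
    """
    return norm_pdf(index) / norm_cdf(index)


def probit_index(x_it, x_bar_i, psi_t, delta_t, pi_t):
    """Fitted index psi_t + x_it*delta_t + x_bar_i*pi_t for one (i, t) cell."""
    index = psi_t
    index += sum(a * b for a, b in zip(x_it, delta_t))
    index += sum(a * b for a, b in zip(x_bar_i, pi_t))
    return index
```

For a selected observation, `inverse_mills(probit_index(...))` would supply the extra regressor added in step 2; the GMM estimation and the t-test on its coefficient are not shown.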

3.2 Testing when the selection variable is partially
observed

In this subsection, we again assume that the explanatory variables $x_{it}$ are observed for
all $t = 1,\dots,T$. The variable $y_{it1}$ is observed only for a non-negative value of the latent
variable, $h_{it}^*$. Assume that for all $t$, the censored variable $h_{it} \equiv \max(0,h_{it}^*)$ is observed.
As in the previous subsection, we use the Mundlak (1978) version for $\alpha_{i2}$ in the selection
equation to conserve on degrees of freedom. The censored variable is defined as

$$h_{it} = \max(0, x_{it}\delta_2 + \theta_{t2}(\eta_2 + \bar{x}_i\pi_2) + v_{it2}), \tag{3.13}$$

where $v_{it2}$ is defined as in (3.3) for $t = 1,\dots,T$. In a labor context, the wage of a worker is
observed if the worker works more than zero hours. We continue to assume that $v_{it2}$ is
independent of $(\alpha_{i1},x_i)$ and distributed Normal$(0,\tau_{t2}^2)$. Under the null hypothesis of no
selectivity bias,

$$E(u_{it1} \mid \alpha_{i1},x_i,h_i) = 0, \quad t = 1,\dots,T, \tag{3.14}$$

where $h_i = (h_{i1},h_{i2},\dots,h_{iT})'$ replaces $s_i$ in the conditioning set, making (3.14) a stronger
version of Assumption 2. However, the interpretation of (3.14) is the same as that of
Assumption 2. Since $h_i$ is a function of $(x_i,v_{i2})$, (3.5) through (3.7) define the null
hypothesis of no selectivity bias and the alternative hypothesis indicating selectivity bias, with
only $h_i$ replacing $s_i$ in (3.7). If one could observe $v_{it2}$ in (3.7), then one could test the null
hypothesis of no selectivity bias by adding $v_{it2}$ as an additional regressor in the GMM
estimation of (2.1). One can estimate $v_{it2}$ by estimating a Tobit model for the selection
equation. The following procedure is a valid testing procedure when $h_{it}$ is observed.
Procedure 3.2

1. For each $t$, estimate (3.13) by standard Tobit. For $s_{it} = 1$, compute

$$\hat{v}_{it2} = h_{it} - \hat{\eta}_{t2} - x_{it}\hat{\delta}_{t2} - \bar{x}_i\hat{\pi}_{t2}.$$

2. Estimate the equation

$$y_{it1} = x_{it1}\beta + \rho\hat{v}_{it2} + \theta_{t1}\alpha_{i1} + error_{it1} \tag{3.15}$$

by GMM. Redefine $w_{it} \equiv (x_{it1}, \hat{v}_{it2})$ and $\delta \equiv (\beta',\rho)'$. Rewrite (3.15) as

$$y_{it1} = w_{it}\delta + \theta_{t1}\alpha_{i1} + error_{it1}, \quad t = 1,\dots,T. \tag{3.16}$$

Under the following $T(T-1)(K+1)$ moment conditions in terms of the data available,
GMM will be consistent for $\delta$ and $\theta_{t1}$:

$$E\left\{s_{it}s_{ir}w_{it}'\left[(y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - w_{ir}\delta)\right]\right\} = 0, \quad t = 1,\dots,T,\ r \neq t. \tag{3.17}$$

Identification of the parameters is achieved when the expected derivative matrix derived
from (3.17) achieves full column rank.

3. Test $H_0: \rho = 0$ using the t-statistic for $\hat{\rho}$. A statistic that uses the standard GMM
standard error is valid. See Appendix A for the derivation of the GMM standard errors.
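Step 1 of Procedure 3.2 can be sketched in the same spirit. The snippet assumes that period-specific Tobit coefficient estimates have already been obtained elsewhere; the function and argument names are hypothetical, and the residual is returned only for selected observations (those with a positive censored variable), mirroring the restriction to selected observations in step 1.

```python
def tobit_residual(h_it, x_it, x_bar_i, eta_hat, delta_hat, pi_hat):
    """Tobit residual h_it - (eta_hat + x_it*delta_hat + x_bar_i*pi_hat).

    Returns None for censored observations (h_it <= 0), for which the
    residual is not used in the variable addition test.
    """
    if h_it <= 0:
        return None
    fitted = eta_hat
    fitted += sum(a * b for a, b in zip(x_it, delta_hat))
    fitted += sum(a * b for a, b in zip(x_bar_i, pi_hat))
    return h_it - fitted


def residuals_for_unit(h_i, x_i, x_bar_i, eta_hats, delta_hats, pi_hats):
    """Apply tobit_residual period by period for one cross-sectional unit."""
    return [
        tobit_residual(h_i[t], x_i[t], x_bar_i,
                       eta_hats[t], delta_hats[t], pi_hats[t])
        for t in range(len(h_i))
    ]
```

The resulting residuals would play the role of the added regressor in step 2; periods returning `None` correspond to pairs that drop out of the moment conditions.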

4. Correcting for sample selection bias

We will now consider correcting for selectivity bias when the selection variable is
partially observed and when only the selection indicator is observed. In Section 3, we
considered tests for selectivity bias when $h_{it}$ was partially observed or when only $s_{it}$ was
observed. However, it is important to note that the procedures outlined in Section 3 are not
methods to correct for selection bias, only methods to test for selection bias. By
quasi-differencing $\alpha_{i1}$ out of (2.1), we need to condition upon selection in at least two time
periods to estimate $\beta$ and $\theta_{t1}$. In the methods that we propose below, we impose restrictions
upon $\alpha_{i1}$, as in Wooldridge (1995), to avoid having to condition upon selection in at least
two time periods. Although Dustmann and Rochina-Barrachina (2000) do condition upon
selection in two time periods and correct for selection, they use semiparametric estimation to
correct for selection bias. We will use an entirely parametric method to correct for selection
bias as in Wooldridge (1995) that does not restrict the distribution of $\alpha_{i1}$ given $(X_{i1},h_i)$,
where $X_{i1} = (x_{i11},\dots,x_{iT1})'$. Wooldridge (1995) estimates a regression equation with time
constant unobserved effects. However, in our paper we consider a regression equation with a
time varying unobserved effect. Despite the presence of time varying unobserved effects, it is
still possible to impose structure on $\alpha_{i1}$ as in Wooldridge (1995). Chamberlain (1984)
allows $\alpha_{i1}$ to depend on the entire history of $X_{i1}$. In the previous sections, we made no
specific assumptions on the unobserved effect, $\alpha_{i1}$, in the regression equation. Despite the
presence of the time varying effects in our model, we can still allow $\alpha_{i1}$ to depend on $X_{i1}$ as
in Chamberlain (1984). However, to conserve on parameters and make estimation more
tractable, we allow the unobserved effect in the regression equation to depend on the time
averages of the explanatory variables as opposed to the entire history of $X_{i1}$. Although this
is not as general as Chamberlain (1984), we still restrict $\alpha_{i1}$ in such a way as to avoid having
to condition upon selection in two separate time periods. We can then subtract the
time varying parameters, $\theta_{t1}$, using a GMM or minimum distance procedure.

4.1 Selection corrections when the selection
variable is partially observed

The main regression equation is still given by (2.1). In this section, we shall impose a
linearity assumption relating $a_{i1}$ to $\bar x_i$ and $v_{it2}$. Before we continue, we will formalize the
selection mechanism based upon Section 3.

Assumption 3: Define $s_{it}$ as in (3.1) and (3.2), where $v_{it2}$ is still independent of $x_i$
and $v_{it2} \sim Normal(0, \tau_{t2}^2)$, where $\tau_{t2}^2$ is defined in Section 3. Let $h_{it}$ be defined as in
(3.13) with $h_{it} \equiv \max(0, h_{it}^*)$ and $s_{it} = 1[h_{it}^* \geq 0]$, $t = 1, \dots, T$.

To correct for selectivity bias, we also make the following assumption,

Assumption 4: (i) $E(u_{it1} \mid x_i, v_{it2}) = E(u_{it1} \mid v_{it2}) = L(u_{it1} \mid v_{it2})$ and (ii)
$E(a_{i1} \mid x_i, v_{it2}) = L(a_{i1} \mid 1, \bar x_i, v_{it2})$

Since the entire history $(v_{i12}, \dots, v_{iT2})$ does not appear in Assumption 4(i), we are
allowing for the serial dependence in $v_{it2}$ to be entirely unrestricted. Assumption 4(i) is a
conditional mean independence condition that holds if $(u_{it1}, v_{it2})$ is independent of $x_i$.
Other than assuming linearity in $E(a_{i1} \mid x_i, v_{it2})$ in Assumption 4(ii), the distribution of $a_{i1}$
given $(x_i, v_{it2})$ is unrestricted for all $t$. Assumption 4(ii) also holds when $(a_{i1}, v_{it2})$
conditional on $x_i$ is bivariate normal. Since $E(u_{it1} \mid v_{it2})$ is assumed to be linear in
Assumption 4(i), one can write

$$E(u_{it1} \mid v_{it2}) = \rho_t v_{it2} \tag{4.1}$$

where $\rho_t$ is a scalar. Using the Mundlak (1978) device for the unobserved effect in the
regression equation, $a_{i1} = \omega_1 + \bar x_i\eta_1 + c_{i1}$, where $c_{i1}$ is a zero mean random variable, the
linear predictor in Assumption 4(ii) can be written as

$$L(a_{i1} \mid 1, \bar x_i, v_{it2}) = \omega_1 + \bar x_i\eta_1 + \phi_t v_{it2} \tag{4.2}$$

Using a law of iterated expectations argument and Assumption 3, by conditioning on only $x_i$,
we can write

$$E(a_{i1} \mid x_i) = \omega_1 + \bar x_i\eta_1 + \phi_t E(v_{it2} \mid x_i) = \omega_1 + \bar x_i\eta_1 \tag{4.3}$$

since $E(v_{it2} \mid x_i) = 0$ by Assumption 3. Using the Mundlak (1978) device for the
unobserved effect, (2.1) can be rewritten as

$$y_{it1} = x_{it1}\beta + \theta_{t1}(\omega_1 + \bar x_i\eta_1 + c_{i1}) + u_{it1} \tag{4.4}$$

where $\omega_1$ is a scalar, $\bar x_i$ is $1 \times G$ and $\eta_1$ is $G \times 1$, where $G \geq K$. Now, under Assumptions 3
and 4, we can write

$$E(y_{it1} \mid x_i, v_{it2}) = \omega_{t1} + x_{it1}\beta + \bar x_i\theta_{t1}\eta_1 + \varphi_{t1} v_{it2} \tag{4.5}$$

where $\omega_{t1} = \theta_{t1}\omega_1$ represents year dummies, $\eta_{t1} = \theta_{t1}\eta_1$ and $\varphi_{t1} = \rho_t + \theta_{t1}\phi_t$. Unlike
Wooldridge (1995), due to the time-varying parameter on $a_{i1}$, the coefficients on the
intercept and $\bar x_i$ are time varying. Since $v_{ir2}$ for $r \neq t$ is not included in the conditioning set
in (4.5), $v_{it2}$ is not strictly exogenous. Since $s_{it}$ is a function of $(x_i, v_{it2})$, (4.5) implies that

$$E(y_{it1} \mid x_i, v_{it2}, s_{it} = 1) = \omega_{t1} + x_{it1}\beta + \bar x_i\theta_{t1}\eta_1 + \varphi_{t1} v_{it2} \tag{4.6}$$

A GMM procedure that accounts for the first stage estimation of the Tobit residuals is
required to consistently estimate (4.6).
Procedure 4.1

1. Define the residual function, $\hat v_{it2}$, from $T$ standard Tobit equations as in Procedure 3.2.
For $s_{it} = 1$, define the $1 \times (1 + K + G + T)$ vector $\hat w_{it} = (1, \bar x_i, x_{it1}, 0, \dots, \hat v_{it2}, 0, \dots, 0)$.

2. Obtain $\hat\Upsilon = (\hat\omega_{11}, \dots, \hat\omega_{T1}, \hat\beta', \hat\theta_{21}, \dots, \hat\theta_{T1}, \hat\eta_1', \hat\varphi_{11}, \dots, \hat\varphi_{T1})'$, a $(3T - 1 + K + G) \times 1$
parameter vector, from the non-linear GMM estimation of

$$y_{it1} = \hat w_{it}\Upsilon + error_{it1} \tag{4.7}$$

for $s_{it} = 1$. To correct for the generated regressor problem, one needs to stack the
moment conditions implied by (4.7) on top of the first order conditions that generate the
Tobit residuals from the first stage estimation. See Appendix B for details. Under
Assumptions 3 and 4 and standard regularity conditions, $\hat\Upsilon$ is consistent and $\sqrt{N}$-asymptotically
normal.

3. Obtain the asymptotic variance of $\hat\Upsilon$, which will give $Avar(\hat\varphi_{t1})$. From this, one can
construct a Wald statistic with $T$ restrictions to test $H_0: \varphi_{t1} = 0$ for all $t$. Failing to
reject $H_0$ indicates no evidence of sample selection bias.
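
Step 3 can be sketched as follows. This is a minimal illustration of a Wald statistic for the joint hypothesis that all selection coefficients are zero; the function name and its inputs are hypothetical and assume that the estimated coefficients and the corresponding block of the estimated asymptotic variance matrix (already divided by N) are available from the GMM step.

```python
import numpy as np

def wald_statistic(phi_hat, avar_phi):
    """Wald statistic phi_hat' [Avar(phi_hat)]^{-1} phi_hat.

    phi_hat  : length-T vector of estimated selection coefficients
    avar_phi : (T, T) estimated variance matrix of phi_hat
    Under H0 the statistic is asymptotically chi-square with T degrees of freedom.
    """
    phi_hat = np.asarray(phi_hat, dtype=float)
    avar_phi = np.asarray(avar_phi, dtype=float)
    # solve() avoids forming the explicit inverse of the variance matrix
    return float(phi_hat @ np.linalg.solve(avar_phi, phi_hat))
```

The statistic would then be compared with a chi-square critical value with as many degrees of freedom as there are restrictions.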

4.2 Selection corrections when only the selection
indicator is observed

When only the selection indicator is observed, we can modify our approach from the
previous subsection to correct for selectivity bias. Rather than collecting Tobit residuals
from a first stage estimation, we need to do a first stage probit estimation. Assumptions 3
and 4 will still hold in this section. Since the variance of $v_{it2}$ is not constant through time,
we will need to re-scale the probit parameters as in Section 3. As in Section 3,

$Var(v_{it2}) = \theta_{t2}^2\sigma_2^2 + 1 = \tau_{t2}^2$. When only the selection indicator is observed, we need to
find the expectation of $y_{it1}$ given $(x_i, s_{it} = 1)$, which is

$$E(y_{it1} \mid x_i, s_{it} = 1) = \omega_{t1} + x_{it1}\beta + \bar x_i\theta_{t1}\eta_1 + \varphi_{t1}\lambda_{it2} \tag{4.8}$$

where $\lambda_{it2}$ denotes the inverse Mills ratio from the first stage probit estimation. Now we can
specify the following procedure that will correct for sample selection when only $s_{it}$ is
observed.
Procedure 4.2

1. For each $t$, estimate (3.4) by standard probit and construct the inverse Mills ratio
$\hat\lambda_{it2} \equiv \lambda(\hat\psi_{t2} + x_{it2}\hat\xi_{t2} + \bar x_i\hat\gamma_{t2})$.
For $s_{it} = 1$, define the $1 \times (1 + K + G + T)$ vector $\hat w_{it} = (1, \bar x_i, x_{it1}, 0, \dots, \hat\lambda_{it2}, 0, \dots, 0)$.

2. Obtain $\hat\Upsilon = (\hat\omega_{11}, \dots, \hat\omega_{T1}, \hat\beta', \hat\theta_{21}, \dots, \hat\theta_{T1}, \hat\eta_1', \hat\varphi_{11}, \dots, \hat\varphi_{T1})'$, a $(3T - 1 + K + G) \times 1$
structural parameter vector, from the non-linear GMM estimation of

$$y_{it1} = \hat w_{it}\Upsilon + error_{it1} \tag{4.9}$$

for $s_{it} = 1$. To correct for the generated regressor problem, one needs to stack the
moment conditions implied by (4.9) on top of the first order conditions that generate the
inverse Mills ratio from the first stage estimation. See Appendix B for details. Under
Assumptions 3 and 4 and standard regularity conditions, $\hat\Upsilon$ is consistent and $\sqrt{N}$-asymptotically
normal.

3. Obtain the asymptotic variance of $\hat\Upsilon$, which will give $Avar(\hat\varphi_{t1})$. From this, one can
construct a Wald statistic with $T$ restrictions to test $H_0: \varphi_{t1} = 0$ for all $t$. Failing to
reject $H_0$ indicates no evidence of sample selection bias.
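
The inverse Mills ratio constructed in step 1 is the standard normal density divided by the standard normal distribution function, evaluated at the fitted probit index. A minimal sketch using only the standard library is given below; the function names are illustrative and not taken from the text.

```python
import math

def norm_pdf(c):
    """Standard normal density."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def inverse_mills(index):
    """lambda(c) = pdf(c) / cdf(c) at the fitted probit index c.

    For very negative indexes the cdf underflows toward zero, so a
    production routine would need a more careful tail approximation.
    """
    return norm_pdf(index) / norm_cdf(index)
```

For each selected observation, the fitted index from the period-specific probit would be passed to `inverse_mills`, and the result placed in the column of the regressor vector that corresponds to that period.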

Alternatively, we can estimate (4.8) using a minimum distance procedure. In the next
section, an empirical example that illustrates the use of minimum distance to estimate (4.8) is
shown using wage and experience data. To estimate $\Upsilon$, one first needs to estimate an entirely
unrestricted version of (4.8) using pooled OLS. To account for the first stage estimation of
$\lambda_{it2}$, the standard errors for all parameters need to be adjusted. Details of the two step
estimation technique that accounts for the generated regressor problem are given in Appendix C.
The following procedure outlines a method to correct for selection bias when only $s_{it}$ is
observed.
Procedure 4.3

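1. For each $t$, estimate (3.4) by standard probit and construct the inverse Mills ratio
$\hat\lambda_{it2} \equiv \lambda(\hat\psi_{t2} + x_{it2}\hat\xi_{t2} + \bar x_i\hat\gamma_{t2})$.
For $s_{it} = 1$, define the $1 \times (1 + K + G + T)$ vector $\hat w_{it} = (1, \bar x_i, x_{it1}, 0, \dots, \hat\lambda_{it2}, 0, \dots, 0)$.

2. Estimate the following entirely unrestricted equation using pooled OLS

$$y_{it1} = \omega_{t1} + x_{it1}\beta + \bar x_i\eta_{t1} + \varphi_{t1}\hat\lambda_{it2} + error_{it} = \hat w_{it}\Gamma + error_{it} \tag{4.10}$$

where $\hat w_{it} = (1, d2_t, \dots, dT_t, x_{it1}, \bar x_i, d2_t \cdot \bar x_i, \dots, dT_t \cdot \bar x_i, \hat\lambda_{it2}, d2_t \cdot \hat\lambda_{it2}, \dots, dT_t \cdot \hat\lambda_{it2})$ is a
$1 \times (2T + K + TG)$ vector and $\Gamma$ is a $(2T + K + TG) \times 1$ vector of reduced form
parameters. The pooled OLS estimator on the selected sample is written as

$$\hat\Gamma = \left( \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat w_{it} \right)^{-1} \left( \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}' y_{it1} \right) \tag{4.11}$$

3. Obtain $Avar(\hat\Gamma)$ from the pooled OLS estimation of (4.10); its inverse is the optimal
weighting matrix for minimum distance estimation of the structural parameters. See Appendix C for the
derivation of $Avar(\hat\Gamma)$ that accounts for the first stage estimation of $\lambda_{it2}$.
The efficient CMD estimator for $\Upsilon$ solves

$$\min_{\Upsilon} \; \{\hat\Gamma - H(\Upsilon)\}' [Avar(\hat\Gamma)]^{-1} \{\hat\Gamma - H(\Upsilon)\} \tag{4.12}$$

where

$$H(\Upsilon) = (\omega_{11}, \dots, \omega_{T1}, \beta', \eta_1', \theta_{21}\eta_1', \dots, \theta_{T1}\eta_1', \varphi_{11}, \dots, \varphi_{T1})'$$

$H(\cdot)$ maps the structural parameters, $\Upsilon$, onto the reduced form parameters, $\Gamma$.

4. Obtain the asymptotic variance of $\hat\Upsilon$, which will give $Avar(\hat\varphi_{t1})$. From this, one can
construct a Wald statistic with $T$ restrictions to test $H_0: \varphi_{t1} = 0$ for all $t$. Failing to
reject $H_0$ indicates no evidence of sample selection bias.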
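
The minimum distance step can be sketched on a stripped-down version of the mapping. The example below is an illustration under stated assumptions rather than the estimator used in the empirical section: it keeps only the reduced form blocks that equal the period loading times a common vector, normalizes the first-period loading to one, and uses a Gauss-Newton iteration with a numerical Jacobian. All function names are hypothetical.

```python
import numpy as np

def h_map(ups, T, G):
    """Map structural (eta_1, theta_2, ..., theta_T) to stacked theta_t * eta_1."""
    eta = ups[:G]
    theta = np.concatenate(([1.0], ups[G:]))   # theta_1 = 1 (assumed normalization)
    return np.concatenate([th * eta for th in theta])

def num_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x."""
    f0 = f(x)
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return J

def cmd_estimate(gamma_hat, V, T, G, iters=50):
    """Minimize (gamma_hat - H(ups))' V^{-1} (gamma_hat - H(ups)) by Gauss-Newton."""
    W = np.linalg.inv(V)                        # weighting matrix is the inverse variance
    ups = np.concatenate([gamma_hat[:G], np.ones(T - 1)])   # starting values
    f = lambda u: h_map(u, T, G)
    for _ in range(iters):
        J = num_jacobian(f, ups)
        resid = gamma_hat - f(ups)
        step = np.linalg.solve(J.T @ W @ J, J.T @ W @ resid)
        ups = ups + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return ups
```

With the full mapping, the intercept, slope and selection coefficients would enter as additional blocks that are passed through unchanged.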

5. Empirical Application: A wage offer
equation
In this section we consider estimation of a wage offer equation with possible selectivity
bias. We consider a test of selectivity bias when only the selection indicator is observed. We
consider three versions of Procedure 3.1, two of which use the HNR (1988) transformation to
eliminate the unobserved effect. One version of Procedure 3.1 that we consider uses the
following $(T-1)(K+1)$ moment conditions

$$E\left\{ s_{it} s_{i,t-1} w_{it}' \left[ (y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{t-1,1}} (y_{i,t-1,1} - w_{i,t-1}\delta) \right] \right\} = 0, \quad t = 2, \dots, T \tag{5.1}$$

The second version of Procedure 3.1 considers $(T-1)^2(K+1)$ moment conditions for $t = 2, \dots, T$,

$$E\left\{ s_{it} s_{i,t-1} w_{is}' \left[ (y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{t-1,1}} (y_{i,t-1,1} - w_{i,t-1}\delta) \right] \right\} = 0 \tag{5.2}$$

The third version of Procedure 3.1 that we consider uses $T(T-1)(K+1)$ moment conditions
and is identical to (3.12)

$$E\left\{ s_{it} s_{ir} w_{it}' \left[ (y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{r1}} (y_{ir1} - w_{ir}\delta) \right] \right\} = 0, \quad t = 1, \dots, T, \; r \neq t \tag{5.3}$$

Adding more moment conditions increases asymptotic efficiency and does not violate the
assumptions we have made above. Finally, using efficient classical minimum distance
estimation, we correct for selectivity bias using Procedure 4.3. Using data on wages,
experience, education, and age, we test for selectivity bias using Procedure 3.1. The wage
offer equation we estimate is similar to the one estimated in Dustmann and
Rochina-Barrachina (2000) and Semykina and Wooldridge (2005), except that we assume
that all regressors in the wage offer equation are exogenous. Unlike Dustmann and
Rochina-Barrachina (2000) and Semykina and Wooldridge (2005), the unobserved effect that
represents innate skill in the wage offer equation has a price attributed to it that can vary
over time.

The data used in the estimation of the wage offer equation come from the Panel
Study of Income Dynamics (PSID) for the years 1980-1992 and are also used in Semykina and
Wooldridge (2005). The sample includes 877 individuals and has data on wage, age,
experience, education and labor force participation. The dependent variable in the wage
equation is the log of real average hourly earnings, which are deflated to 1983 dollars and
defined as the ratio of the individual's annual labor income to annual hours worked. The
explanatory variables in the wage equation are experience, experience squared, and
year dummies. As specified in Procedure 3.1, a test for selectivity bias is conducted by
testing the significance of the coefficient on the inverse Mills ratio, which is generated from a
first stage probit estimation of the participation equation. A participation indicator is the
dependent variable in the selection equation. The regressors in the selection equation are
experience, experience squared, education, age, age squared, an indicator for marital status,
other family income and its square, the number of children in three age categories, spouse's
education, spouse's age, spouse's age squared, the product of spouse's age and education,
duration of spouse's unemployment, and a binary indicator specifying whether the spouse's
duration of unemployment was recorded or not.

Table 1 reports the summary statistics for
the variables used in estimation. Table 2 reports the parameter values, standard errors and
t-statistics for the regressors, inverse Mills ratio, and ratio of the time varying parameters,
$\theta_{t1}/\theta_{t-1,1}$, using the GMM moment conditions specified in (5.1) and (5.2). To test for the
presence of time varying effects, we test the null hypothesis that $\theta_{t1}/\theta_{t-1,1} = 1$. As a
reference, Table 2 also reports parameters and robust standard errors from OLS, FE, and a
FE selection test without time varying parameters. Table 3 reports parameters and standard
errors using the GMM moment conditions implied by (5.3), where to test for the presence of
time varying effects, we test the null hypothesis that $\theta_t = 1$.

As can be seen from Table 2,
Procedure 3.1 with $(T-1)(K+1)$ moment conditions fails to reject the null hypothesis
of no selectivity bias at the 10% level. There is also little evidence that a time varying price
can be attributed to innate ability. The experience parameters are significant at the 10%
level and the return to experience becomes negative after about 28.5 years. The J-test rejects
the null hypothesis of correct specification at the 5% level but not at the 1% level, suggesting
that not all of the moment conditions used are valid. However, when the wage equation is
estimated using time demeaned regressors, the test for selectivity bias rejects the null
hypothesis at the 5% level. The inclusion of time varying fixed effects seems to make the
selection term less significant when the wage equation is estimated using GMM with
$(T-1)(K+1)$ moment conditions.

By contrast, Procedure 3.1 with $(T-1)^2(K+1)$
moment conditions rejects the null hypothesis of no selectivity bias at the 1% level. There is also
stronger evidence that a price can be attributed to a worker's skill set. The
experience parameters are both significant at the 1% level and the return to experience
becomes negative after about 25.3 years. The J-test rejects the null hypothesis of correct
specification at the 1% level, suggesting that the use of lagged and future experience as
instruments may not be valid. However, the use of more moment conditions decreases the
standard errors by a significant amount, which is not surprising. The return to experience
becomes negative after 23.75 years for pooled OLS, and after 58.6 years for the FE selection
test procedure.

Table 3, which shows the results from using the $T(T-1)(K+1)$ moment
conditions implied by (5.3), rejects the null hypothesis of no selectivity bias at the 1% level.
The experience parameters are also significant at the 1% level, as are most of the time
varying parameters. The return to experience using the specification in (5.3) becomes
negative after about 86.3 years. The J-test rejects the null hypothesis at the 1% level. As the
results from Tables 2 and 3 show, the model that we are testing may be somewhat
misspecified, although this is not surprising considering that we used experience and
experience squared as instruments. One reason the J-test may have rejected the null
hypothesis in all of the specifications used is that past wage shocks affect the number of
years of experience we observe for workers today. There may also be some serial correlation
in the errors of our model. Overall, the results from Tables 2 and 3 show that there is some
evidence of selection bias and therefore a need to do a correction.

Table 4 reports the results for the minimum distance procedure that corrects for
selectivity bias. The Wald test for selection bias rejects the null hypothesis of no selection
bias at the 5% level. The experience parameters are both significant at the 5% level.
However, the return to experience becomes negative only after 103.6 years, a much slower decline
than under Procedure 3.1. Correcting for selection bias seems to decrease the rate at which the
return to experience deteriorates. Overall, the return to experience never becomes negative
over the length of the panel in any of the procedures reported in Tables 2, 3 and 4. To test for
the presence of time varying effects, we test the null hypothesis that $\theta_t = 1$. As shown in
Table 4, for most years, there appears to be a statistically significant time varying price that
can be attributed to skill.
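
The turning points quoted in this section follow from the quadratic in experience: the marginal return is the coefficient on experience plus twice the coefficient on experience squared times experience, which crosses zero at minus the first coefficient over twice the second. A minimal sketch, with hypothetical function names, reproduces the arithmetic from the reported point estimates and also computes the t-statistic for the null that a time varying parameter equals one.

```python
def experience_turning_point(b_exp, b_exp_sq):
    """Years of experience at which b_exp + 2*b_exp_sq*exp equals zero."""
    if b_exp_sq >= 0:
        raise ValueError("a negative coefficient on experience squared is required")
    return -b_exp / (2.0 * b_exp_sq)

def t_stat_theta_equals_one(theta_hat, std_err):
    """t-statistic for H0: theta = 1."""
    return (theta_hat - 1.0) / std_err

# Point estimates as reported in Tables 2 and 4
tp_gmm = experience_turning_point(0.0912, -0.0016)   # about 28.5 years
tp_cmd = experience_turning_point(0.0829, -0.0004)   # about 103.6 years
```

Because the turning point is a ratio of two estimates, small changes in the coefficient on experience squared move it substantially, which is one reason the values differ so much across procedures.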

6. Conclusion

In this paper, we have shown how to test and correct for selectivity bias in a model with
time varying unobserved effects. Due to the time varying parameter on the unobserved
effect, standard OLS techniques are not sufficient to estimate the parameters of interest. Due
to the non-linear nature of the moment conditions specified in the paper, a GMM procedure
is required to estimate the parameters. The methods specified in this paper should be useful
when the econometrician suspects the presence of time varying unobserved effects. By
ignoring the presence of a time varying parameter on the unobserved effect, a researcher
risks misspecifying the model when using standard OLS techniques to test and correct for
selectivity bias. Future research can involve weakening the strict exogeneity assumption on
the regressors.

Appendix A: GMM STANDARD ERRORS FOR TESTING

For brevity, we will derive the GMM standard errors for the variable addition test when
only the selection indicator is observed. The method that calculates the GMM standard
errors when the selection variable is partially observed is similar. Consider the following
equation

$$y_{it1} = w_{it}\delta + \theta_{t1}a_{i1} + error_{it1}, \quad t = 1, \dots, T \tag{A.1}$$

where $w_{it} = (x_{it1}, \hat\lambda_{it2})$ and $\delta = (\beta, \rho)'$. In terms of the data available, GMM is consistent
for $\delta$ and $\theta_{t1}$ when the following $T(T-1)(K+1)$ moment conditions hold,

$$E\left\{ s_{it} s_{ir} w_{it}' \left[ (y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{r1}} (y_{ir1} - w_{ir}\delta) \right] \right\} = E[b_{i1}(\zeta)] = 0, \quad t = 1, \dots, T, \; r \neq t \tag{A.2}$$

where $\zeta = (\delta', \theta_{21}, \dots, \theta_{T1})'$. Defining the sample average of $b_{i1}(\zeta)$ by

$$b_{N1}(\zeta) = N^{-1}\sum_{i=1}^{N} b_{i1}(\zeta) \tag{A.3}$$

$\hat\zeta$ solves $\min_{\zeta} b_{N1}(\zeta)'\hat V_{11}^{-1} b_{N1}(\zeta)$, where $V_{11} = E[b_{i1}(\zeta)b_{i1}(\zeta)']$ and
$\hat V_{11} = (1/N)\sum_{i=1}^{N} b_{i1}(\tilde\zeta)b_{i1}(\tilde\zeta)'$, with $\tilde\zeta$ a preliminary consistent estimator. Using standard results from GMM and a Taylor series
expansion,

$$\sqrt{N}(\hat\zeta - \zeta) \xrightarrow{d} Normal\left(0, (B_1'V_{11}^{-1}B_1)^{-1}\right) \tag{A.4}$$

where $B_1 = E(\partial b_{i1}/\partial\zeta')$ is the expected derivative matrix and $V_{11}^{-1}$ is the optimal weighting
matrix of the moment conditions. The asymptotic covariance matrix $(B_1'V_{11}^{-1}B_1)^{-1}$ can be
estimated from a GMM procedure. Standard errors can then be directly obtained from the
estimated covariance matrix.
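
As a minimal sketch of the last step, the function below turns estimates of the expected derivative matrix and of the moment variance matrix into standard errors using the covariance formula in (A.4). The function name and argument names are illustrative assumptions; how the two input matrices are estimated is left to the GMM routine.

```python
import numpy as np

def gmm_standard_errors(B, V, n_obs):
    """Standard errors from (B' V^{-1} B)^{-1} / N under optimal weighting.

    B     : (M, P) estimated expected derivative of the moments
    V     : (M, M) estimated variance of the moment functions
    n_obs : cross-section sample size N
    """
    B = np.asarray(B, dtype=float)
    V = np.asarray(V, dtype=float)
    avar = np.linalg.inv(B.T @ np.linalg.solve(V, B))   # (B' V^{-1} B)^{-1}
    return np.sqrt(np.diag(avar) / n_obs)
```

The t-statistic used in the selection test is then the estimated coefficient on the selection term divided by the corresponding element of this vector.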

Appendix B: GMM STANDARD ERROR CORRECTION

In Section 4, we derived a two-step procedure that tests and corrects for selectivity bias
in an unbalanced panel with time varying unobserved effects. Because the inverse
Mills ratio or Tobit residuals are generated from the first stage estimation of the selection equation,
GMM estimation of the regression equation in the second stage can give invalid standard
errors, leading to an unreliable test of significance for $\delta$ in (4.7) and (4.9). In this section, we
will derive a GMM method that corrects for any possible generated regressor problem when
$H_0$ is rejected in Procedure 4.1 or 4.2. Wooldridge (1995) uses a method similar to Newey
(1984) and Pagan (1984) to adjust the standard errors derived from the second stage
estimation of the regression equation in the presence of generated regressors. Applying the
general framework in Wooldridge (2002, Chapter 14) and Newey and McFadden (1994), we
will use a GMM method that stacks the moment conditions implied by estimating (4.7) and (4.9)
on top of the first order conditions from the first stage probit or Tobit. This
method gives asymptotically valid standard errors for the selection correction.

GMM estimation when first stage estimation is a probit or a tobit

Consider the first stage estimation of the probit selection equation, in which the index
coefficients are scaled by $\tau_{t2}$ as in Section 3,

$$P(s_{it} = 1 \mid x_i) = \Phi(\psi_{t2} + x_{it2}\xi_{t2} + \bar x_i\gamma_{t2}) \tag{B.1}$$

The parameters of this equation can be estimated by doing $T$ cross-section probits. For each
$(i, t)$, the implied score from the probit log likelihood is

$$d_{it}(\zeta_t) = \frac{\phi(z_{it2}\zeta_t) z_{it2}' [s_{it} - \Phi(z_{it2}\zeta_t)]}{\Phi(z_{it2}\zeta_t)[1 - \Phi(z_{it2}\zeta_t)]} \tag{B.2}$$

where $z_{it2} = (1, x_{it2}, \bar x_i)$, $\zeta_t = (\psi_{t2}, \xi_{t2}', \gamma_{t2}')'$, and $\phi(\cdot)$ is the standard normal density, the derivative of $\Phi(\cdot)$. For each $i$, stack
the $d_{it}(\zeta_t)$ to get $d_i(\zeta)$. The following GMM moment condition consistently estimates $\delta$
from (4.9),

$$E[s_{it} w_{is}'(y_{it1} - w_{it}\delta)] = 0, \quad t = 1, \dots, T, \; s \neq t \tag{B.3}$$

Stacking this moment condition on top of the moment conditions implied by running $T$
cross-section probits,

$$E\begin{bmatrix} s_{it} w_{is}'(y_{it1} - w_{it}\delta) \\ d_i(\zeta) \end{bmatrix} = 0 \tag{B.4}$$

provides reliable standard errors for $\delta$. By applying the method discussed in Appendix A,
one can derive the asymptotic standard errors using the standard GMM variance formulas.
When the selection equation is a Tobit, simply run $T$ cross-section Tobits. For each $(i, t)$,
generate the score or first order condition for $\zeta_t$. As in the probit case, for each $i$ stack
the score conditions to generate $d_i(\zeta)$ and then use (B.4) to estimate $\delta$.
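
The probit score in (B.2) can be computed observation by observation. The sketch below uses only the standard library and treats the regressor vector and coefficient vector as plain lists; the function name is hypothetical.

```python
import math

def probit_score(z, s, zeta):
    """Score of the probit log likelihood for one (i, t) observation.

    z    : list of regressors, including the constant
    s    : 0/1 selection indicator
    zeta : list of probit coefficients, same length as z
    """
    index = sum(zj * cj for zj, cj in zip(z, zeta))
    pdf = math.exp(-0.5 * index * index) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))
    # generalized residual: pdf * (s - cdf) / (cdf * (1 - cdf))
    weight = pdf * (s - cdf) / (cdf * (1.0 - cdf))
    return [weight * zj for zj in z]
```

Stacking these vectors over the time periods for a given cross-section unit gives the block that sits below the regression moments in (B.4).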

Appendix C: DERIVATION OF THE OPTIMAL WEIGHTING MATRIX

The optimal weighting matrix for minimum distance estimation is derived from the
asymptotic variance of the unrestricted pooled OLS regression of (4.10). Since (4.10)
contains a generated regressor, the standard errors from the estimation of (4.10) need to be
adjusted to account for the first stage estimation of the selection terms. In the presence of
generated regressors, minimum distance estimation of the structural parameters of interest
will not be asymptotically efficient if the optimal weighting matrix does not
take into account the first stage estimation of the selection terms. This appendix outlines the
method to derive an asymptotically valid two-step reduced form asymptotic variance matrix
and optimal MD weighting matrix. It has been shown in Section 4 that

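$$E(y_{it1} \mid w_{it}, s_{it} = 1) = w_{it}\Gamma \tag{C.1}$$

where $w_{it} = (1, d2_t, \dots, dT_t, x_{it1}, \bar x_i, d2_t \cdot \bar x_i, \dots, dT_t \cdot \bar x_i, \lambda_{it2}, d2_t \cdot \lambda_{it2}, \dots, dT_t \cdot \lambda_{it2})$ is a
$1 \times (2T + K + TG)$ vector and $\Gamma = (\omega_{11}, \dots, \omega_{T1}, \beta', \eta_{11}', \dots, \eta_{T1}', \varphi_{11}, \dots, \varphi_{T1})'$ is a
$(2T + K + TG) \times 1$ vector. We can write (C.1) in error form as

$$y_{it1} = w_{it}\Gamma + e_{it1}, \quad t = 1, \dots, T \tag{C.2}$$

where $E(e_{it1} \mid w_{it}, s_{it} = 1) = 0$. The pooled OLS estimator on the selected sample, after
inserting the estimated selection terms from the first stage probit estimation, is

$$\hat\Gamma = \left( \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat w_{it} \right)^{-1} \left( \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}' y_{it1} \right) \tag{C.3}$$

Using the fact that $y_{it1} = w_{it}\Gamma + e_{it1} = \hat w_{it}\Gamma + (w_{it} - \hat w_{it})\Gamma + e_{it1}$, we can write (C.3) as

$$\sqrt{N}(\hat\Gamma - \Gamma) = A_0^{-1} N^{-1/2} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'[(w_{it} - \hat w_{it})\Gamma + e_{it1}] + o_p(1) \tag{C.4}$$

where $A_0 = E\left(\sum_{t=1}^{T} s_{it} w_{it}' w_{it}\right)$. Using a mean value expansion as in Wooldridge (2002)
Appendix 6A and the fact that $E(e_{it1} \mid w_{it}, s_{it} = 1) = 0$,

$$N^{-1/2} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'[(w_{it} - \hat w_{it})\Gamma + e_{it1}] = -E\left[ \sum_{t=1}^{T} s_{it} w_{it}'\Gamma'\nabla_\pi w_{it}' \right] \sqrt{N}(\hat\pi - \pi) + N^{-1/2} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it} w_{it}' e_{it1} + o_p(1) \tag{C.5}$$

where $\pi = (\psi_{12}, \xi_{12}', \gamma_{12}', \dots, \psi_{T2}, \xi_{T2}', \gamma_{T2}')'$ is a $(1 + 2K_1)T \times 1$ vector and $\nabla_\pi w_{it}'$
is the $(2T + K + TG) \times (1 + 2K_1)T$ Jacobian of $w_{it}'$ with respect to $\pi$. Since $\hat\pi$ is a vector of
probit maximum likelihood estimators for each $t$, it has the following representation

$$\sqrt{N}(\hat\pi - \pi) = N^{-1/2} \sum_{i=1}^{N} r_i(\pi) + o_p(1) \tag{C.6}$$

Therefore, with $D \equiv E\left[ \sum_{t=1}^{T} s_{it} w_{it}'\Gamma'\nabla_\pi w_{it}' \right]$,

$$N^{-1/2} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'[(w_{it} - \hat w_{it})\Gamma + e_{it1}] = N^{-1/2} \sum_{i=1}^{N} \left[ \sum_{t=1}^{T} s_{it} w_{it}' e_{it1} - D r_i(\pi) \right] + o_p(1) \tag{C.7}$$

So using (C.7), we can write

$$\sqrt{N}(\hat\Gamma - \Gamma) \overset{a}{\sim} Normal(0, A_0^{-1} B_0 A_0^{-1}) \tag{C.8}$$

where $B_0 = Var\left( \sum_{t=1}^{T} s_{it} w_{it}' e_{it1} - D r_i(\pi) \right) \equiv Var[p_i(\Gamma, \pi)]$. A consistent estimate of
$Avar[\sqrt{N}(\hat\Gamma - \Gamma)]$ can easily be generated by replacing unknown parameters with consistent
estimators. Define

$$\hat A \equiv N^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat w_{it} \quad \text{and} \quad \hat B \equiv N^{-1} \sum_{i=1}^{N} \hat p_i(\hat\Gamma, \hat\pi)\hat p_i(\hat\Gamma, \hat\pi)'$$

where $\hat e_{it1} = y_{it1} - \hat w_{it}\hat\Gamma$, $\hat D = N^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat\Gamma'\nabla_\pi\hat w_{it}'$, and $\hat r_i(\hat\pi)$ is evaluated at $\hat\pi$.
To compute $\hat D$ and $\hat r_i(\hat\pi)$, let us define $z_{it2} = (1, x_{it2}, \bar x_i)$. The Jacobian $\nabla_\pi\hat w_{it}'$ is a block
matrix with all zeros except in one block. Using the expression for the derivative of the
inverse Mills ratio from Wooldridge (2002 p. 522),

$$\nabla_\pi\hat w_{it}' = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -z_{it2}\hat\lambda_{it2}\cdot(z_{it2}\hat\pi_t + \hat\lambda_{it2}) & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{C.9}$$

the $1 \times (1 + 2K_1)$ row vector, $-z_{it2}\hat\lambda_{it2}\cdot(z_{it2}\hat\pi_t + \hat\lambda_{it2})$, appears in row $T + TG + K + t$ and
column $(1 + 2K_1)\cdot(t - 1) + 1$. Since $\Gamma'\nabla_\pi w_{it}' = -\varphi_{t1} z_{it2}\lambda_{it2}\cdot(z_{it2}\pi_t + \lambda_{it2})$,

$$\hat D \equiv -N^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat\varphi_{t1} z_{it2}\hat\lambda_{it2}\cdot(z_{it2}\hat\pi_t + \hat\lambda_{it2}) \tag{C.10}$$

From standard results for probit estimation, for each $i$ and $t$, the $(1 + 2K_1) \times 1$ vector $\hat r_{it}(\hat\pi_t)$ is
written as

$$\hat r_{it}(\hat\pi_t) = \hat A_t^{-1} \{\Phi(z_{it2}\hat\pi_t)[1 - \Phi(z_{it2}\hat\pi_t)]\}^{-1} \phi(z_{it2}\hat\pi_t) z_{it2}'[s_{it} - \Phi(z_{it2}\hat\pi_t)] \tag{C.11}$$

where

$$\hat A_t = N^{-1} \sum_{i=1}^{N} \frac{\{\phi(z_{it2}\hat\pi_t)\}^2 z_{it2}' z_{it2}}{\Phi(z_{it2}\hat\pi_t)[1 - \Phi(z_{it2}\hat\pi_t)]} \tag{C.12}$$

is a consistent estimator of the expected Hessian. For each $i$, stack the $\hat r_{it}(\hat\pi_t)$ to get
$\hat r_i(\hat\pi)$. Once $\hat D$ and $\hat r_i(\hat\pi)$ have been computed, we can compute the asymptotic variance
matrix of $\hat\Gamma$. Note that $Avar(\hat\Gamma) = \hat A^{-1}\hat B\hat A^{-1}/N$. Finally, use the inverse of $Avar(\hat\Gamma)$ as the
optimal weighting matrix in (4.12).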
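
The nonzero block of the Jacobian rests on the derivative of the inverse Mills ratio with respect to its index, which equals minus the ratio times the sum of the index and the ratio. A short numerical check of that identity, using only the standard library and illustrative function names, is sketched below.

```python
import math

def inverse_mills(c):
    """lambda(c) = pdf(c) / cdf(c) for the standard normal distribution."""
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    return pdf / cdf

def inverse_mills_derivative(c):
    """Analytic derivative: -lambda(c) * (c + lambda(c))."""
    lam = inverse_mills(c)
    return -lam * (c + lam)

def central_difference(f, c, h=1e-6):
    """Numerical derivative used only to check the analytic formula."""
    return (f(c + h) - f(c - h)) / (2.0 * h)
```

Multiplying the analytic derivative by the regressor row vector gives the derivative with respect to the probit coefficients, which is the row that appears in the Jacobian.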

Table 1. Summary Statistics. Mean values; standard deviations in parentheses.

Variable Description                      Entire Sample      Participants       Non-Participants
Participation Indicator                   0.74               1                  0
Log Real Wage                                                1.94 (0.62)
Years of Experience                       11.79 (7.76)       12.98 (7.58)       8.49 (7.28)
Years of Education                        12.93 (2.30)       13.12 (2.27)       12.40 (2.31)
Age                                       40.93 (10.27)      40.13 (9.62)       43.14 (11.63)
Married Indicator                         0.86               0.84               0.93
Other Household Income (thousands)        34.398 (40.379)    30.945 (30.868)    44.007 (58.237)
Spouse's Age                              37.00 (18.17)      35.17 (18.28)      42.10 (16.84)
Spouse's Education                        11.21 (5.25)       10.98 (5.50)       11.84 (4.40)
Spouse's Unemployment Duration (Weeks)    0.97 (4.96)        0.94 (4.80)        1.06 (5.36)
Weeks Unreported (=1 if Spouse's
  Unemployment not reported)              0.09               0.06               0.16
Children Aged 0-2                         0.14 (0.37)        0.11 (0.33)        0.21 (0.45)
Children Aged 3-5                         0.18 (0.42)        0.16 (0.40)        0.24 (0.48)
Children Aged 6-17                        0.82 (1.01)        0.84 (1.01)        0.77 (0.99)
Number of Obs.                            11,401             8,387              3,014

Table 2. Estimates for Wage Equation. Robust standard errors in parentheses.
Year dummies are included but not reported.

            POLS          FE            FE Mills      Procedure 3.1(a)   Procedure 3.1(b)
Exp         0.077***      0.083***      0.082***      0.0912**           0.1314***
            (0.0062)      (0.0101)      (0.0099)      (0.0338)           (0.0059)
Exp2        -0.0016***    -0.0009***    -0.0007***    -0.0016*           -0.0026***
            (0.00017)     (0.00017)     (0.00016)     (0.0008)           (0.0001)
IMR                                     -0.134***     -0.0813            -0.1421***
                                        (0.0385)      (0.0812)           (0.0135)
θ2/θ1                                                 1.0371             0.9363***
                                                      (0.0912)           (0.0166)
θ3/θ2                                                 0.9786             0.9643**
                                                      (0.0882)           (0.0149)
θ4/θ3                                                 1.1118             0.9722*
                                                      (0.1446)           (0.0168)
θ5/θ4                                                 0.7842             0.8758***
                                                      (0.1490)           (0.0178)
θ6/θ5                                                 1.0757             0.9832
                                                      (0.2251)           (0.0203)
θ7/θ6                                                 0.7963             0.7394***
                                                      (0.1322)           (0.0209)
θ8/θ7                                                 0.9723             0.7671***
                                                      (0.1422)           (0.0192)
θ9/θ8                                                 0.8156             0.7471***
                                                      (0.1382)           (0.0203)
θ10/θ9                                                1.2537             0.8775***
                                                      (0.2080)           (0.0223)
θ11/θ10                                               0.8604             0.6784***
                                                      (0.1695)           (0.0227)
θ12/θ11                                               0.8055             0.5766***
                                                      (0.1981)           (0.0296)
θ13/θ12                                               0.6994             0.2112***
                                                      (0.3279)           (0.0498)
J-Test                                                19.0169            484.0668
(P-Value)                                             (0.0250)           (0.00416)

(a) $(T-1)(K+1)$ moment conditions. (b) $(T-1)^2(K+1)$ moment conditions.
*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

Table 3. Estimates for Wage Equation. Year dummies
included but not reported.

            Procedure 3.1(a)
Exp         0.1036***
            (0.0030)
Exp2        -0.0006***
            (0.0001)
IMR         -0.1670***
            (0.0068)
θ2          0.9596***
            (0.0031)
θ3          1.0550***
            (0.0045)
θ4          1.0661***
            (0.0048)
θ5          0.9900**
            (0.0046)
θ6          1.0349***
            (0.0055)
θ7          1.0444***
            (0.0059)
θ8          0.9705***
            (0.0056)
θ9          0.9585***
            (0.0057)
θ10         1.0213***
            (0.0063)
θ11         1.0093
            (0.0066)
θ12         1.0236***
            (0.0070)
θ13         0.9610***
            (0.0072)
J-Test      543.2125
(P-Value)   (0.0006)

(a) $T(T-1)(K+1)$ moment conditions.
*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

Table 4. Estimates for Wage Equation. Minimum Distance Estimation.
Year dummies included but not reported.

            Procedure 4.3
Exp         0.0829***
            (0.0094)
Exp2        -0.0004**
            (0.0002)
IMR         -0.3547***
            (0.0667)
IMR81       0.0145
            (0.0640)
IMR82       -0.0302
            (0.0638)
IMR83       -0.1212*
            (0.0686)
IMR84       0.0710
            (0.0801)
IMR85       0.1339*
            (0.0725)
IMR86       0.1462*
            (0.0751)
IMR87       0.2918***
            (0.0810)
IMR88       0.1294
            (0.0843)
IMR89       0.0204
            (0.0907)
IMR90       0.0080
            (0.0959)
IMR91       0.0407
            (0.1006)
IMR92       0.2060*
            (0.1055)
θ2          1.0713***
            (0.0264)
θ3          1.0730**
            (0.0305)
θ4          1.0665*
            (0.0350)
θ5          1.0754*
            (0.0388)
θ6          1.1652***
            (0.0451)
θ7          1.0277
            (0.0438)
θ8          1.1550***
            (0.0486)
θ9          1.1462***
            (0.0513)
θ10         1.1361**
            (0.0605)
θ11         1.1561**
            (0.0618)
θ12         1.1967***
            (0.0644)
θ13         1.1781**
            (0.0736)
Wald Test for Selection Bias (chi-square)   133.44

*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

CHAPTER 2

1. Introduction

In the panel data literature, theoretical and applied econometricians have been interested
in problems dealing with missing data and sample selection bias. For example, Wooldridge
(1995) developed a ﬁxed effects approach to test and correct for selection bias in linear
models. Terza (1998) considered exponential models with sample selection and exogeneity
in the case of a cross-section. Econometricians have also turned their attention to non-linear
panel data models. Hausman, Hall, and Griliches (HHG) (1984) developed and estimated a
ﬁxed-effects Poisson (FEP) panel data model under various distributional assumptions.

HHG (1984) applied their model to examine the relationship between R&D expenditure and
the number of patents a ﬁrm receives in a given time period. Wooldridge (1999) extended
the work of HHG (1984) by proving that the quasi-CMLE is robust to any distributional
misspeciﬁcation under a conditional mean assumption.

Both HHG (1984) and Wooldridge (1999) assumed strict exogeneity conditional on an
unobserved effect. Blundell, Grifﬁth, Windmeijer (BGW) (2002) use GMM to estimate
balanced count panel data models with strictly exogenous regressors. BGW (2002) also
consider count data models with feedback. Wooldridge (1997b) considers general non-linear
panel data models under a weaker sequential exogeneity assumption that allows for feedback
from the lagged dependent variable to future explanatory variables. However, under a strict
exogeneity assumption, feedback is not allowed. This paper will only consider methods to
test and correct for selection bias in non-linear panel data models under a strict exogeneity
assumption. As has been shown in the literature, the assumption of strict exogeneity may not
be entirely realistic. For example, the number of patents a firm gets today can influence its
R&D expenditure in the future, or a wage shock in the past can affect the number of years of
experience we observe today.

The FEP estimator that we will consider in this paper can also be applied to data on
hourly earnings. When labor economists want to estimate the return to experience and
education, they often use log wage as their dependent variable. One can also consider using
wage data in its level form to measure the return to experience and education. Such a model
speciﬁcation for wages, experience, and education can be applied to non-linear panel data
models with an exponential mean function. In a selection context, the return to experience
may not be observed if a person does not accept a job offer or work a certain number of
hours. Estimating the return to experience with missing wage data can lead to selection bias.
Also, in the R&D and patent literature, the number of patents awarded to a ﬁrm in a given
year may not be observed if that ﬁrm has gone into bankruptcy. Estimating the relationship
between patents and R&D expenditure with missing data on the number of patents awarded
to a ﬁrm in a certain year can also lead to inconsistent estimates due to attrition bias.

The objective of this paper is to extend the work of Terza (1998) to cases where there
exists possible selection bias in a panel setting. We will use the estimation strategies
proposed by Wooldridge (1999) and BGW (2002) to test and correct for selection bias in the
presence of only strictly exogenous regressors. This paper will review the conditions needed
for consistency in a balanced panel and then derive the conditions needed for consistency in
an unbalanced panel. A procedure to test and correct for selection bias in non-linear panel
data models will also be outlined.

The plan of this paper is as follows. Section 2 will derive the conditions needed for
consistency in a balanced and unbalanced panel using the desirable robustness properties of
the Poisson QCMLE. Sections 3 and 4 provide procedures to test and correct for selection
bias. An empirical example in Section 5 will illustrate the estimation methods shown in
Sections 3 and 4. Section 6 concludes. An appendix is included to derive the asymptotic
variance matrix needed for selection corrections.

2. Consistency of nonlinear models in
balanced and unbalanced panels

2.1 Consistency in a balanced panel

This subsection will brieﬂy summarize the conditions needed for consistency of count
data models in a balanced panel. Recent econometric literature has focused on non-linear
panel data models. A typical example of a non-linear model that econometricians often
focus on is a count data model with an exponential mean function. Chamberlain (1992)
considered efﬁcient estimation in the exponential mean case when the regressors are
sequentially exogenous. Wooldridge (1997b) and Wooldridge (1999) considered a more
general class of nonlinear models with a multiplicative unobserved effect. Even when the
population model has an exponential form, the mean function conditional on a selected part
of the population will not have an exponential form. So, let $\{(x_i, y_i, v_i) : i = 1, \ldots, N\}$
denote a random draw from the population and $t$ denote a particular time period. For the
balanced panel, we observe $(y_{it}, x_{it})$ for $t = 1, \ldots, T$. The asymptotic analysis is done for a
fixed number of time periods, T, with the size of the cross-section, N, tending to infinity. In
order to achieve consistency for a balanced panel, we must assume the following.

Assumption 2.1 For $t = 1, \ldots, T$:

$$E(y_{it} \mid x_{i1}, \ldots, x_{iT}, v_i) = v_i \mu_{it}(\beta_0)$$
Note that $x_{it}$ is a $1 \times K$ vector of explanatory variables, $v_i$ is unobserved heterogeneity,
$\beta_0$ is a $K \times 1$ vector of parameters at the "true value" of $\beta$, and $\mu(\cdot)$ is a strictly positive
nonlinear function. The leading example to use in place of $\mu(\cdot)$ is the exponential function,

$$E(y_{it} \mid x_i, c_i) = \exp(x_{it}\beta_0 + c_i) = v_i \exp(x_{it}\beta_0) \tag{2.1}$$

Although the exponential function can be used as the leading case for $\mu(\cdot)$, $\mu(\cdot)$ can be
chosen such that it is strictly positive and well defined for all $x_{it}$ and $\beta$. This level of
flexibility in choosing $\mu(\cdot)$ will prove useful later when deriving a test and correction for
selection bias. Assumption 2.1 dictates that in the population $\{x_{it} : t = 1, \ldots, T\}$ is strictly
exogenous conditional on the unobserved effect, $c_i$. Under the so-called "fixed effects"
assumption, arbitrary correlation is allowed between $c_i$ and $x_i$. In the case of
strictly exogenous explanatory variables and the assumption that the $y_{it}$ follow a Poisson
distribution and are conditionally independent across time, Hausman, Hall, and Griliches
(1984) showed that their quasi-CMLE (QCMLE) fixed effects Poisson (FEP) estimator
consistently estimates $\beta_0$ (for fixed T as $N \to \infty$). Specifically, HHG (1984) assume the
following

$$y_{it} \mid x_i, v_i \sim \text{Poisson}(v_i \mu_{it}(\beta_0)), \quad t = 1, \ldots, T \tag{2.2}$$

and

$$y_{it}, y_{ir} \text{ are independent conditional on } x_i, v_i \tag{2.3}$$

Defining $n_i = \sum_{t=1}^{T} y_{it}$ as the sum of the counts across time for a certain individual, one can
show as in Hausman et al. (HHG) (1984) that

$$y_i \mid n_i, x_i, v_i \sim \text{Multinomial}\{n_i, p_1(x_i, \beta_0), \ldots, p_T(x_i, \beta_0)\} \tag{2.4}$$

where $y_i$ is a $T \times 1$ vector of counts and

$$p_t(x_i, \beta) = \frac{\mu_{it}(\beta)}{\sum_{r=1}^{T} \mu_{ir}(\beta)}$$
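As a small illustration of why the probabilities in (2.4) are free of the unobserved effect, the following Python sketch computes $p_t(x_i, \beta)$ under the exponential mean in (2.1) for a scalar regressor. The function name `fep_probs` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the data used in this paper.

```python
import math

def fep_probs(x, beta, c=0.0):
    """Return p_t = mu_it / sum_r mu_ir for the mean exp(x_it * beta + c).

    x    : list of scalar regressor values x_i1, ..., x_iT (hypothetical data)
    beta : scalar slope parameter
    c    : individual effect c_i; it scales every mu_it by exp(c) and cancels
    """
    mu = [math.exp(xt * beta + c) for xt in x]
    total = sum(mu)
    return [m / total for m in mu]

x_i = [0.2, 1.0, 1.5, 0.7]                     # hypothetical regressor path
p_no_effect = fep_probs(x_i, beta=0.5, c=0.0)
p_with_effect = fep_probs(x_i, beta=0.5, c=2.3)
# The two lists agree up to rounding: v_i = exp(c_i) drops out of p_t.
```

Because $v_i$ multiplies both the numerator and the denominator, any value of `c` gives the same probabilities, which is what allows arbitrary correlation between $c_i$ and $x_i$.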

However, it is important to note that the QCMLE estimator that Wooldridge (1999) and we
propose in this paper is consistent provided that the conditional mean is correctly specified.
To prove consistency for $\beta_0$, there is no need to assert the distributional assumptions that
HHG (1984) make. In other words, assumptions (2.2), (2.3), and (2.4) need not hold for the
QCMLE to be consistent. For the general case in which $\mu(\cdot)$ denotes the general
nonlinear mean function, Wooldridge (1999) showed that the FEP estimator is fully
robust to distributional misspecification and arbitrary dependence over time. Wooldridge
(1999) also proposed generalized method of moments estimators that are more efficient than
the FEP estimator when the full set of Poisson distributional assumptions fails. Therefore, $y_{it}$
can be a binary response, Tobit response, a nonnegative continuously distributed response, or
a logit response probability. Given a correctly specified conditional mean function, there is
nothing that restricts $y_{it}$ to be a count response. Blundell, Griffith, and Windmeijer (BGW)
(2002) independently derived the robustness of the FEP estimator using its first order
condition. In fact, BGW (2002) showed that the FEP estimator is the Poisson estimator that
has dummies included for each individual. This means that for a model that uses the
exponential mean function in place of $\mu(\cdot)$, the exponential regression does not suffer from
the incidental parameters problem, just as in the linear case. Under Assumption 2.1, a
variety of estimators consistently estimate $\beta_0$ (with fixed T and $N \to \infty$), provided the
regressors have some time variation. Since the Wooldridge (1999) estimator is robust to any
misspecification of the Poisson assumption as long as the conditional mean is correctly
specified and applicable to any non-linear function, I assert that the Wooldridge (1999)
QCMLE FEP estimator is identical to the BGW (2002) GMM estimator when the
exponential function is used as the non-linear function in the conditional mean.

However, for now we show that the Wooldridge (1999) QCMLE FEP is consistent in a
balanced panel under Assumption 2.1. The log likelihood for observation $i$ and parameter
vector $\beta$ is written as

$$\ell_i(\beta) = \sum_{t=1}^{T} y_{it} \log[p_t(x_i, \beta)] \tag{2.5}$$
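(2.5) can then be used to generate the score and Hessian functions needed for asymptotic
inference. As in Wooldridge (1999) the score for observation $i$ is

$$\begin{aligned}
f_i(\beta) \equiv \nabla_\beta \ell_i(\beta)' &= \sum_{t=1}^{T} y_{it} [\nabla_\beta p_t(x_i, \beta)' / p_t(x_i, \beta)] \\
&= \sum_{t=1}^{T} [\nabla_\beta p_t(x_i, \beta)' / p_t(x_i, \beta)] \{y_{it} - p_t(x_i, \beta) n_i\} \\
&= \sum_{t=1}^{T} [\nabla_\beta p_t(x_i, \beta)' / p_t(x_i, \beta)] \{r_{it}(x_i, \beta)\}
\end{aligned} \tag{2.6}$$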

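where $r_{it}(x_i, \beta) \equiv y_{it} - p_t(x_i, \beta) n_i$ is a residual function. The second equality in (2.6)
holds because the $p_t(x_i, \beta)$ sum to one over $t$, so that $\sum_{t=1}^{T} \nabla_\beta p_t(x_i, \beta) = 0$. Using
Assumption 2.1, the following lemma proves that the FEP QCMLE estimator is consistent in a
balanced panel.

Lemma 2.1 $E[r_{it}(x_i, \beta_0) \mid x_{i1}, \ldots, x_{iT}, v_i] = 0$ under Assumption 2.1.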

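Proof

Define $\bar{\mu}_i(x_{ir}, \beta_0) \equiv (1/T) \sum_{r=1}^{T} \mu_{ir}(x_{ir}, \beta_0)$. Plugging in for $y_{it}$, $p_t(x_i, \beta_0)$, and $n_i$, taking
expectations, and using the Law of Iterated Expectations,

$$\begin{aligned}
E[r_{it}(x_i, \beta_0)] &= E[E(y_{it} - p_t(x_i, \beta_0) n_i \mid x_i, v_i)] \\
&= E\left[ E(y_{it} \mid x_i, v_i) - \frac{\mu_{it}(x_{it}, \beta_0)}{\bar{\mu}_i(x_{ir}, \beta_0)} \left( (1/T) \sum_{r=1}^{T} E(y_{ir} \mid x_i, v_i) \right) \right] \\
&= E\left[ v_i \mu_{it}(x_{it}, \beta_0) - v_i \frac{\mu_{it}(x_{it}, \beta_0)}{\bar{\mu}_i(x_{ir}, \beta_0)} \left( (1/T) \sum_{r=1}^{T} \mu_{ir}(x_{ir}, \beta_0) \right) \right] = 0
\end{aligned}$$

Since $E[f_i(\beta_0)] = E[E[f_i(\beta_0) \mid x_i, v_i]] = 0$ by LIE and Assumption 2.1, the FEP estimator is
consistent. ∎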
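The mechanics of Lemma 2.1 can be checked numerically. Because $r_{it}$ is linear in the $y_{it}$, its conditional expectation equals the residual evaluated at the conditional means $v_i \mu_{it}(\beta_0)$. The sketch below does this for the exponential mean with a scalar regressor; the function name `residuals` and the values of `beta0`, `v_i`, and `x_i` are hypothetical.

```python
import math

def residuals(y, x, beta):
    """r_it = y_it - p_t(x_i, beta) * n_i for the mean exp(x_it * beta)."""
    mu = [math.exp(xt * beta) for xt in x]
    total = sum(mu)
    p = [m / total for m in mu]
    n = sum(y)                                   # n_i = sum_t y_it
    return [yt - pt * n for yt, pt in zip(y, p)]

beta0, v_i = 0.4, 3.0               # hypothetical true slope and heterogeneity
x_i = [0.0, 0.5, 1.0, 2.0]          # hypothetical regressor path
# Replace each y_it by its conditional mean v_i * exp(x_it * beta0).
mean_y = [v_i * math.exp(xt * beta0) for xt in x_i]
r_at_mean = residuals(mean_y, x_i, beta0)
# Every entry of r_at_mean is zero up to floating-point rounding.
```

The result does not depend on the value chosen for `v_i`, which mirrors the fact that the lemma places no restriction on the distribution of the unobserved effect.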
The ratio $\mu_{it}(\beta)/\bar{\mu}_i(\beta)$ does not depend on $\beta_j$ if $x_{itj}$ does not vary over time for a certain
$j$ when the exponential function is used in place of $\mu(\cdot)$. Therefore, as in linear fixed effects
panel data models, coefficients on time invariant regressors will not be identified using the
transformation in (2.6). However, interaction of time constant variables with year dummies
is allowed. To perform inference using the Wooldridge (1999) FEP estimator, one needs to
derive the expected Hessian from the log-likelihood in (2.5). After taking second derivatives
of (2.5), the expected Hessian is written as
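$$A_0 \equiv E[n_i \nabla_\beta p(x_i, \beta_0)' W(x_i, \beta_0) \nabla_\beta p(x_i, \beta_0)] \tag{2.7}$$

where $p(x_i, \beta_0) \equiv [p_1(x_i, \beta_0), \ldots, p_T(x_i, \beta_0)]'$ and
$W(x_i, \beta_0) \equiv [\text{diag}\{p_1(x_i, \beta_0), \ldots, p_T(x_i, \beta_0)\}]^{-1}$. Using the score and the expected
Hessian, the asymptotic variance of $\hat{\beta}$ is $A_0^{-1} B_0 A_0^{-1}/N$ where $A_0$ is defined in (2.7) and
$B_0 \equiv E[f_i(\beta_0) f_i(\beta_0)']$. Consistent estimates of $A_0$ and $B_0$ are

$$\hat{A} = N^{-1} \sum_{i=1}^{N} n_i \nabla_\beta p(x_i, \hat{\beta})' W(x_i, \hat{\beta}) \nabla_\beta p(x_i, \hat{\beta}) \tag{2.8}$$

$$\hat{B} = N^{-1} \sum_{i=1}^{N} f_i(\hat{\beta}) f_i(\hat{\beta})' \tag{2.9}$$

with the estimated asymptotic variance being equal to $\hat{A}^{-1} \hat{B} \hat{A}^{-1}/N$. A valid asymptotic
variance estimator when (2.2) and (2.3) hold is $\hat{A}^{-1}/N$.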
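To make (2.5) through (2.9) concrete, here is a minimal Python sketch of the FEP estimator and its two variance estimators for the exponential mean with a single regressor. It is an illustration under simplifying assumptions, not the code used for the results in this paper: $\beta$ is scalar, the optimizer is a plain Newton iteration, and the names `fep_pieces`, `fep_fit`, and the four-unit data set are hypothetical. In the scalar exponential case $\nabla_\beta p_t = p_t (x_{it} - \sum_r p_r x_{ir})$, which is what the code uses.

```python
import math

def fep_pieces(y, x, beta):
    """Return (f_i, n_i * grad_p' W grad_p) for one unit with scalar beta."""
    mu = [math.exp(xt * beta) for xt in x]
    total = sum(mu)
    p = [m / total for m in mu]
    n = sum(y)
    xbar = sum(pt * xt for pt, xt in zip(p, x))       # p-weighted mean of x_it
    score = sum(xt * (yt - pt * n) for xt, yt, pt in zip(x, y, p))
    # grad_p' W grad_p = sum_t p_t * (x_it - xbar)^2 in the scalar case
    a = n * sum(pt * (xt - xbar) ** 2 for pt, xt in zip(p, x))
    return score, a

def fep_fit(data, beta=0.0, tol=1e-10, max_iter=100):
    """Newton iterations for the FEP QCMLE, then the variances from (2.8)-(2.9)."""
    for _ in range(max_iter):
        pieces = [fep_pieces(y, x, beta) for y, x in data]
        step = sum(s for s, _ in pieces) / sum(a for _, a in pieces)
        beta += step
        if abs(step) < tol:
            break
    pieces = [fep_pieces(y, x, beta) for y, x in data]
    N = len(data)
    A_hat = sum(a for _, a in pieces) / N             # (2.8)
    B_hat = sum(s * s for s, _ in pieces) / N         # (2.9)
    robust_var = B_hat / (A_hat ** 2 * N)             # A^-1 B A^-1 / N
    nonrobust_var = 1.0 / (A_hat * N)                 # A^-1 / N
    return beta, robust_var, nonrobust_var

# Hypothetical data: (counts y_i, regressors x_i) for four units, T = 3.
data = [([1, 2, 4], [0.0, 0.5, 1.0]), ([0, 1, 1], [0.2, 0.1, 0.9]),
        ([3, 2, 6], [1.0, 0.3, 1.4]), ([2, 2, 1], [0.5, 0.6, 0.1])]
beta_hat, robust_var, nonrobust_var = fep_fit(data)
total_score = sum(fep_pieces(y, x, beta_hat)[0] for y, x in data)
```

The robust variance uses the outer product of the scores and therefore remains valid when (2.2) and (2.3) fail, while the nonrobust variance relies on them.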
Using Assumption 2.1 and the results from Lemma 2.1, the following GMM moment
conditions can be used to consistently estimate $\beta_0$:

$$E[h(x_{is}, \beta_0)' r_{it}(\beta_0)] = 0, \quad \forall s, t \tag{2.10}$$

where $h(x_{is}, \beta_0)$ denotes the available instruments, which may be just $x_{is}$ or may include
squared and cross product terms, and which can be a function of $\beta_0$. Given the choice of
instruments, $h(x_{is}, \beta_0)$, a GMM minimum chi-squared estimator of $\beta$ is obtained by
minimizing the following criterion function

$$\min_\beta \left[ N^{-1} \sum_{i=1}^{N} H(x_i, \tilde{\beta})' r_i(\beta) \right]' \hat{\Omega}^{-1} \left[ N^{-1} \sum_{i=1}^{N} H(x_i, \tilde{\beta})' r_i(\beta) \right] \tag{2.11}$$

with

$$\hat{\Omega} = N^{-1} \sum_{i=1}^{N} H(x_i, \tilde{\beta})' r_i(\tilde{\beta}) r_i(\tilde{\beta})' H(x_i, \tilde{\beta}) \tag{2.12}$$

being an $L \times L$ estimator of $\Omega \equiv E[H(x_i, \beta_0)' r_i(\beta_0) r_i(\beta_0)' H(x_i, \beta_0)]$. Note that $r_i(\beta)$ is a
$T \times 1$ residual vector defined in (2.6) and $H(x_i, \tilde{\beta})$ is a $T \times L$ matrix of instruments that
depends on a preliminary consistent estimator, $\tilde{\beta}$, such as the QCMLE. To derive an estimator
just as efficient as QCMLE, $H(x_i, \tilde{\beta})$ must be chosen carefully. For example, we can exploit
the results from Lemma 2.1 to derive a GMM estimator just as efficient as the QCMLE. One
possibility for $H(x_i, \tilde{\beta})$ that will produce an estimator more efficient than the QCMLE is the
$T \times 2L$ partitioned matrix

$$H(x_i, \tilde{\beta}) = [W(x_i, \tilde{\beta}) \nabla_\beta p(x_i, \tilde{\beta}) \; \vdots \; \nabla_\beta p(x_i, \tilde{\beta})]$$

By using this form for $H(x_i, \tilde{\beta})$, we add $L$ overidentifying restrictions by adding the
following orthogonality condition to the QCMLE score condition

$$E[\nabla_\beta p(x_i, \beta_0)' r_i(\beta_0)] = 0$$

Using a standard Taylor-series expansion and setting the optimal weighting matrix equal to
$\hat{\Omega}^{-1}$, the asymptotic variance is

$$\text{Avar} \sqrt{N}(\hat{\beta} - \beta_0) = (G' \Omega^{-1} G)^{-1} \tag{2.13}$$

where

$$G \equiv E[H(x_i, \beta_0)' \nabla_\beta r_i(\beta_0)] \tag{2.14}$$

is an $L \times K$ first derivative matrix. The asymptotic variance can be estimated by
$(\hat{G}' \hat{\Omega}^{-1} \hat{G})^{-1}/N$ where

$$\hat{G} \equiv N^{-1} \sum_{i=1}^{N} H(x_i, \tilde{\beta})' \nabla_\beta r_i(\hat{\beta}) \tag{2.15}$$

2.2 Consistency in an unbalanced panel

Now, having derived the conditions needed for consistency in a balanced panel, it is
natural to extend the analysis to conditions needed for consistency in an unbalanced panel.

By an unbalanced panel, we mean that some time periods might be missing for some
cross-sectional draws. Let $s_i = (s_{i1}, \ldots, s_{iT})'$ denote a $T \times 1$ vector of selection indicators.
The data $(y_{it}, x_{it})$ is observed for $s_{it} = 1$. Let $\{(x_i, y_i, v_i, s_i) : i = 1, \ldots, N\}$ denote a
random sample from the population and $T_i = \sum_{r=1}^{T} s_{ir}$ denote the number of time periods
observed for each cross-section observation $i$. The parameter vector of interest $\beta$ is
estimated by performing GMM or QCMLE on a selected sample based on (2.1). For fixed T,
as $N \to \infty$, the FEP estimator is consistent when the following assumption holds

Assumption 2.2 For $t = 1, \ldots, T$

$$E(y_{it} \mid x_{i1}, \ldots, x_{iT}, s_i, v_i) = v_i \mu_{it}(\beta_0)$$

Even for the unbalanced panel, the leading case to use in place of $\mu(\cdot)$ is the exponential
mean function. Assumption 2.2 states that selection is exogenous, although this assumption
does not hold when sequentially exogenous regressors are included in the model. As in the
case of the balanced panel, arbitrary correlation is allowed between the unobserved
heterogeneity, $v_i$, and $x_i$. Under the so-called "fixed effects" assumptions, selection is
allowed to be arbitrarily correlated with $(x_i, v_i)$. Therefore, selection in all time periods is
allowed to be correlated with $x_i$ and $v_i$. We do not impose a random effects structure in
which $s_i$ and $v_i$ are independent. For example, a person or a firm dropping out
of a sample due to individual characteristics does not result in inconsistent estimates of $\beta_0$.
Moreover, under Assumption 2.2, the FEP estimator is $\sqrt{N}$-asymptotically normal since we
assume that the cross-sectional dimension tends to infinity with the time dimension fixed.

Defining $n_i = \sum_{r=1}^{T} s_{ir} y_{ir}$ and

$$p_t(x_i, s_i, \beta) = \frac{\mu_{it}(x_{it}, \beta)}{\sum_{r=1}^{T} s_{ir} \mu_{ir}(x_{ir}, \beta)}$$

we can show that the FEP QCMLE is consistent in an unbalanced panel. For the unbalanced
panel, the FEP estimator is robust to any distributional misspecification as long as
Assumption 2.2 holds. The log likelihood for observation $i$ and parameter vector $\beta$ is written
as

$$\ell_i(\beta) = \sum_{t=1}^{T} s_{it} y_{it} \log[p_t(x_i, s_i, \beta)] \tag{2.16}$$

When the sample is not selected for periods $t$ or $r$, the observations for those periods
do not contribute to the estimation of $\beta$. As in Wooldridge (1999), with slight modification
to account for selection, the score for observation $i$ is

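$$\begin{aligned}
f_i(\beta) \equiv \nabla_\beta \ell_i(\beta)' &= \sum_{t=1}^{T} s_{it} y_{it} [\nabla_\beta p_t(x_i, s_i, \beta)' / p_t(x_i, s_i, \beta)] \\
&= \sum_{t=1}^{T} s_{it} [\nabla_\beta p_t(x_i, s_i, \beta)' / p_t(x_i, s_i, \beta)] \{y_{it} - p_t(x_i, s_i, \beta) n_i\} \\
&= \sum_{t=1}^{T} s_{it} [\nabla_\beta p_t(x_i, s_i, \beta)' / p_t(x_i, s_i, \beta)] \{r_{it}(x_i, s_i, \beta)\}
\end{aligned} \tag{2.17}$$

where $r_{it}(x_i, s_i, \beta) \equiv y_{it} - p_t(x_i, s_i, \beta) n_i$ is a residual function. The following lemma
illustrates that the QCMLE of $\beta_0$ is consistent in an unbalanced panel.

Lemma 2.2 $E[r_{it}(x_i, s_i, \beta_0) \mid x_{i1}, \ldots, x_{iT}, s_i, v_i] = 0$ under Assumption 2.2.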

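Proof

Define $\bar{\mu}_i(x_{ir}, \beta) \equiv (1/T_i) \sum_{r=1}^{T} s_{ir} \mu_{ir}(x_{ir}, \beta)$ and $T_i \equiv \sum_{r=1}^{T} s_{ir}$. Plugging in for $y_{it}$,
$p_t(x_i, s_i, \beta_0)$, and $n_i$, taking expectations and using the Law of Iterated Expectations,

$$\begin{aligned}
E[r_{it}(x_i, s_i, \beta_0)] &= E[E(y_{it} - p_t(x_i, s_i, \beta_0) n_i \mid x_i, v_i, s_i)] \\
&= E\left[ E(y_{it} \mid x_i, s_i, v_i) - \frac{\mu_{it}(x_{it}, \beta_0)}{\bar{\mu}_i(x_{ir}, \beta_0)} \left( (1/T_i) \sum_{r=1}^{T} s_{ir} E(y_{ir} \mid x_i, s_i, v_i) \right) \right] \\
&= E\left[ v_i \mu_{it}(x_{it}, \beta_0) - v_i \frac{\mu_{it}(x_{it}, \beta_0)}{\bar{\mu}_i(x_{ir}, \beta_0)} \left( (1/T_i) \sum_{r=1}^{T} s_{ir} \mu_{ir}(x_{ir}, \beta_0) \right) \right] = 0
\end{aligned}$$

Since $E[f_i(\beta_0)] = E[E[f_i(\beta_0) \mid x_i, v_i, s_i]] = 0$ by LIE and Assumption 2.2, the FEP estimator
is consistent in an unbalanced panel. ∎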
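The unbalanced-panel residual in (2.17) differs from the balanced one only in that $n_i$ and the denominator of $p_t$ are summed over selected periods. The Python sketch below illustrates this for the exponential mean with a scalar regressor, and checks that values recorded in unselected periods play no role. The function name `unbalanced_residuals` and all numbers are hypothetical; the sketch assumes at least one selected period per unit.

```python
import math

def unbalanced_residuals(y, x, s, beta):
    """Return s_it * (y_it - p_t(x_i, s_i, beta) * n_i), t = 1..T."""
    mu = [math.exp(xt * beta) for xt in x]
    denom = sum(st * m for st, m in zip(s, mu))    # sum_r s_ir * mu_ir
    n = sum(st * yt for st, yt in zip(s, y))       # n_i over selected periods
    return [st * (yt - (m / denom) * n) for st, yt, m in zip(s, y, mu)]

beta0, v_i = 0.4, 3.0                  # hypothetical parameter values
x_i = [0.0, 0.5, 1.0, 2.0]
s_i = [1, 0, 1, 1]                     # period 2 is not observed
mean_y = [v_i * math.exp(xt * beta0) for xt in x_i]
r_at_mean = unbalanced_residuals(mean_y, x_i, s_i, beta0)

# Overwrite the unobserved period with an arbitrary placeholder value.
placeholder_y = list(mean_y)
placeholder_y[1] = 999.0
r_placeholder = unbalanced_residuals(placeholder_y, x_i, s_i, beta0)
```

Both residual vectors are zero up to rounding, so the selected-sample score has mean zero regardless of what is stored for the missing period.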

As in the case of a balanced panel, the BGW (2002) estimator is a special case of the
Wooldridge (1999) estimator since both use the score of the conditional mean function to
calculate $\hat{\beta}$. As stated in the previous subsection, BGW (2002) show that the FEP estimator
is the Poisson estimator that has dummies included for every individual. Given this fact, it is
not surprising that the conditions needed for consistency in an unbalanced panel are the same
as in a linear panel data model. Having shown through Lemma 2.2 that (2.17) is a valid
transformation in the presence of only strictly exogenous regressors, the following GMM
moment conditions can be used to consistently estimate the parameters of interest in an
unbalanced panel,

$$E[s_{it} s_{iq} h(x_{iq}, \beta_0)' r_{it}(\beta_0)] = 0, \quad \forall t, q \tag{2.18}$$

where $r_{it}(\beta) = y_{it} - p_t(x_i, s_i, \beta) n_i$. In a panel data model with only strictly exogenous
regressors, consistency is achieved when the data is selected in time $t$ and $q$. $\beta_j$ is not
identified when the data is not selected for time periods $t$ and $q$ or when $x_{itj}$ is time invariant
for a certain $j$. The moment conditions in (2.18) are premultiplied by $s_{it} s_{iq}$ to account for all
combinations of selection in periods $t$ and $q$. For GMM to be just as efficient as QCMLE for
an unbalanced panel, one needs to include $\nabla_\beta p_t(x_i, s_i, \beta)'/p_t(x_i, s_i, \beta)$ as part of $h(x_{iq}, \beta)$.
Using the GMM formulas specified in the previous subsection, point estimates of $\beta_0$ and
their asymptotic standard errors can be obtained. To perform inference using QCMLE, use
the formulas in (2.8) and (2.9) to construct the expected Hessian and asymptotic variance of
the estimator of $\beta_0$.

3. Tests for selection bias

3.1 A simple variable addition test for selection bias

In unbalanced panels, a researcher often has an interest in testing for selection bias.
Nijman and Verbeek (1992) and Semykina and Wooldridge (2005) propose simple variable
addition tests in a linear panel data model. We extend that analysis to non-linear panel data
models. Using Assumption 2.2, we can test for selection bias by doing a simple variable
addition test. Assumption 2.2 says that selection is exogenous. A violation of this condition
would indicate that selection bias exists in the sample. Therefore, by adding lagged or future
values of the selection indicator to (2.1), we can test for selection bias. Using lagged or
future values of the selection indicators to test for selection bias is useful, especially when $x_{it}$
is not observed for all time periods. Estimation of the following model using the QCMLE
method outlined in Section 2 provides a useful test for selection bias,

$$y_{it} = v_i \exp(x_{it}\beta + \rho s_{i,t+1}) u_{it} \tag{3.1}$$

where by Assumption 2.2, $E(u_{it} \mid x_i, v_i, s_i) = 1$.³ Although using the lead of the selection
indicator as an additional variable to test for selection bias has been used in linear panel data
models, as in Semykina and Wooldridge (2005), such a test has not been proposed for a
nonlinear model like (3.1). Estimating (3.1) is a novel and simple way to test for selection
bias when data is missing for the regressors or the dependent variable. In order to test for
selection bias, estimate (3.1) using a QCMLE or method of moments procedure, and then test
$H_0 : \rho = 0$ versus $H_1 : \rho \neq 0$ using an asymptotic t-statistic. Under the null hypothesis, no
selection bias exists in the sample. By using $s_{i,t+1}$ ($s_{i,t-1}$) as an additional regressor in (3.1),
we lose the last (first) time period. Other possibilities to include as an additional regressor in
(3.1) include $\sum_{r=1}^{t-1} s_{ir}$, which is the number of times an individual appears in the sample
prior to period $t$, and $\sum_{r=t+1}^{T} s_{ir}$, which is the number of times an individual appears in a
sample after period $t$. A rejection of the null hypothesis also indicates that selection is not
strictly exogenous. To test for attrition bias, we can only include $s_{i,t+1}$ or $\sum_{r=t+1}^{T} s_{ir}$ as an
additional variable in (3.1) since attrition is an absorbing state. Including $s_{i,t-1}$ or $\sum_{r=1}^{t-1} s_{ir}$
as an additional variable to test for attrition bias would not work since neither variable varies
across $i$.
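The candidate test variables described above are simple functions of each unit's selection history. The following Python sketch builds them from a list of 0/1 indicators; the function name `selection_regressors` and the example history are hypothetical. Entries that would require a period outside $1, \ldots, T$ are set to `None`, reflecting the lost first or last period.

```python
def selection_regressors(s):
    """Build candidate test variables from one unit's selection indicators.

    s is a list [s_i1, ..., s_iT] of 0/1 values. Returns the lead s_{i,t+1},
    the lag s_{i,t-1}, the number of selected periods before t, and the
    number of selected periods after t, each as a list indexed by period.
    """
    T = len(s)
    lead = [s[t + 1] if t + 1 < T else None for t in range(T)]
    lag = [s[t - 1] if t >= 1 else None for t in range(T)]
    count_before = [sum(s[:t]) for t in range(T)]
    count_after = [sum(s[t + 1:]) for t in range(T)]
    return lead, lag, count_before, count_after

# Hypothetical attrition pattern: observed for three periods, then gone.
s_attrit = [1, 1, 1, 0, 0]
lead, lag, before, after = selection_regressors(s_attrit)
# lead   -> [1, 1, 0, 0, None]
# lag    -> [None, 1, 1, 1, 0]
# before -> [0, 1, 2, 3, 3]
# after  -> [2, 1, 0, 0, 0]
```

In the selected periods of this attrition example, the lag is always one and the count before period $t$ is determined by $t$ alone, whereas the lead and the count after period $t$ depend on when the unit leaves. This is the sense in which only forward-looking variables are informative about attrition.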

Since it is plausible that the selection indicators are correlated across time, the tests
described above can detect some contemporaneous selection bias. However, it would not be
valid to include $s_{it}$ as an additional variable in (3.1) since, by definition, $s_{it}$ does not vary by
$i$ or $t$ in a selected subsample. In the next subsection, we test for contemporaneous selection
bias by testing directly whether $s_{it}$ is correlated with a time varying unobservable term that
can cause selection bias.

3.2 Testing for contemporaneous selection bias

Terza (1998) is able to develop a selection mechanism by assuming bivariate normality
of the unobservables in the regression and selection equations for a cross-sectional model.
We extend Terza's (1998) framework for the cross-section to develop a selection mechanism
for an unbalanced panel. This section of the paper considers tests for selection bias in the
presence of only strictly exogenous regressors. As stated previously, this may not be an
entirely realistic assumption in applications dealing with the relationship between R&D
expenditures and patents and wages and experience. However, the assumption of strict
exogeneity may be a safer assumption in other panel data applications. For purposes of
developing a test, consider a case when only the selection indicator is observed. In this
section, we are not considering the case when we partially observe a tobit dependent
variable. We assume that $x_{it}$ is observed for all $t = 1, \ldots, T$ and $y_{it1}$ is observed only if
$s_{it} = 1$. Consider the following model with only strictly exogenous regressors

$$y_{it1} = \exp(x_{it1}\beta_1 + c_{i1} + a_{it1}) u_{it1} \tag{3.2}$$

$$s_{it} = 1[x_{it2}\beta_2 + c_{i2} + u_{it2} \geq 0] \tag{3.3}$$

where $c_{i2} = \eta_{02} + \bar{x}_i \eta_{12} + \varepsilon_{i2}$ and $\varepsilon_{i2}$ is a zero mean random variable independent of $x_i$.
One can use the Chamberlain (1984) device for the unobserved effect in (3.3), but to
conserve on parameters, the Mundlak (1978) device is suitable. The term $a_{it1}$ in (3.2), which
has mean zero and variance $\nu^2$, represents a possible selection term that when left out of
(3.2) causes an omitted variable bias that affects the point estimation of $\beta_1$. Also, for the
selection mechanism to work, one must include exclusion restrictions in (3.2) and (3.3)
where $x_{it1}$ is $1 \times K_1$, $x_{it2}$ is $1 \times K_2$ and $K_2 > K_1$. Plugging in for $c_{i2}$ in (3.3), the selection
equation can be written as

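$$s_{it} = 1[\eta_{02} + \bar{x}_i \eta_{12} + x_{it2}\beta_2 + e_{it2} \geq 0] \tag{3.4}$$

where

$$e_{it2} = \varepsilon_{i2} + u_{it2} \tag{3.5}$$

$x_i$ includes all regressors from $x_{it1}$ and $x_{it2}$ and $e_{it2} \sim \text{Normal}(0, \sigma_t^2)$. Due to time varying
heteroskedasticity, the probit parameters in (3.4) are rescaled since

$$P(s_{it} = 1 \mid x_i) = \Phi\left( \frac{\eta_{02} + \bar{x}_i \eta_{12} + x_{it2}\beta_2}{\sigma_t} \right) = \Phi(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i \gamma_{t2}) \tag{3.6}$$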

A test for selection bias would be valid whether or not the parameters in (3.4) are rescaled,
but by imposing the assumption that $e_{it2} \sim \text{Normal}(0, \sigma_t^2)$ we afford ourselves more
flexibility when testing for selection bias. In empirical work, the researcher would hope that
Assumption 2.2 holds and $E[\exp(a_{it1}) \mid x_i, c_{i1}, s_i] = 1$ for $t = 1, \ldots, T$. However, if
selection bias is substantial, by leaving out a correction term, estimation of (3.2) would result
in omitted variable bias. In order to derive a test for selection bias, we assume that
$(a_{it1}, e_{it2})$ is bivariate normal conditional on $(x_i, c_{i1})$. Using this assumption, one can write
that $a_{it1} = \rho e_{it2} + r_{it1}$ where $e_{it2}$ and $r_{it1}$ are independent. Therefore,
$E[\exp(a_{it1}) \mid e_{it2}] = \exp(\rho e_{it2}) \xi_0 = \exp(\eta_0 + \rho e_{it2})$ where $\eta_0 = \log(\xi_0)$. This assumes that
$a_{it1}$ is mean independent of $(x_i, c_{i1}, e_{i12}, \ldots, e_{i,t-1,2}, e_{i,t+1,2}, \ldots, e_{iT2})$ conditional on $e_{it2}$.
Hence, under the alternative of selection bias, $E[u_{it1} \mid x_i, s_i, c_{i1}, a_{it1}] = 1$ and
$E[\exp(a_{it1}) \mid e_{it2}] = \exp(\eta_0 + \rho e_{it2})$. Since time invariant regressors are not identified in
FEP estimation, $\eta_0$ cannot be estimated although a full set of year dummies can be included
in estimation. By integrating the function $\exp(\rho e_{it2})$ over the truncated normal CDF, we can
derive the function $g(z_{it2}\delta_{t2}, \rho)$. Therefore, under the alternative of selection bias,

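$$\begin{aligned}
E(y_{it1} \mid x_i, c_{i1}, e_{it2}) &= \exp(x_{it1}\beta_{10} + c_{i1}) E[\exp(a_{it1}) \mid x_i, c_{i1}, e_{it2}] \\
&= \exp(x_{it1}\beta_{10} + c_{i1}) E[\exp(a_{it1}) \mid e_{it2}] \\
&= \exp(x_{it1}\beta_{10} + c_{i1}) \exp(\rho_0 e_{it2})
\end{aligned} \tag{3.7}$$

For the purposes of deriving a test for contemporaneous selection bias, we need to
condition upon the selected subsample. Therefore,

$$\begin{aligned}
E(y_{it1} \mid x_i, c_{i1}, s_{it} = 1) &= \exp(x_{it1}\beta_{10} + c_{i1}) E[\exp(\rho_0 e_{it2}) \mid x_i, c_{i1}, e_{it2} > -(z_{it2}\delta_{t20})] \\
&= \exp(x_{it1}\beta_{10} + c_{i1}) \int_{-z_{it2}\delta_{t20}}^{\infty} \frac{\exp(\rho_0 e_{it2} - (1/2) e_{it2}^2)}{\sqrt{2\pi}\,(1 - \Phi(-z_{it2}\delta_{t20}))} \, de_{it2} \\
&= \exp(x_{it1}\beta_{10} + c_{i1}) \int_{-z_{it2}\delta_{t20}}^{\infty} \frac{\exp[-(1/2)(e_{it2} - \rho_0)^2 + (1/2)\rho_0^2]}{\sqrt{2\pi}\,(1 - \Phi(-z_{it2}\delta_{t20}))} \, de_{it2} \\
&= \exp(x_{it1}\beta_{10} + c_{i1}) \exp((1/2)\rho_0^2) \frac{\Phi(\rho_0 + z_{it2}\delta_{t20})}{\Phi(z_{it2}\delta_{t20})} \\
&= \exp(x_{it1}\beta_{10} + c_{i1}) g(z_{it2}\delta_{t20}, \rho_0) \\
&= v_{i1} m_{it}(\pi_0)
\end{aligned} \tag{3.8}$$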
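The closed form for $g(\cdot)$ in (3.8) can be verified numerically. The Python sketch below compares $\exp(\rho^2/2)\,\Phi(\rho + a)/\Phi(a)$ with a midpoint-rule approximation of $E[\exp(\rho e) \mid e > -a]$ for a standard normal $e$, where $a$ stands for the probit index $z_{it2}\delta_{t2}$. The function names, the integration settings, and the values of `a` and `rho` are hypothetical choices made only for this check.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g_closed_form(a, rho):
    """g(a, rho) = exp(rho^2 / 2) * Phi(rho + a) / Phi(a), as in (3.8)."""
    return math.exp(0.5 * rho ** 2) * Phi(rho + a) / Phi(a)

def g_numeric(a, rho, upper=12.0, steps=20000):
    """E[exp(rho * e) | e > -a] for e ~ N(0, 1), by the midpoint rule."""
    lower = -a
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        e = lower + (k + 0.5) * h
        total += math.exp(rho * e - 0.5 * e * e)
    return total * h / (math.sqrt(2.0 * math.pi) * Phi(a))

a, rho = 0.3, 0.5                       # hypothetical index and correlation
closed = g_closed_form(a, rho)
numeric = g_numeric(a, rho)
g_under_null = g_closed_form(a, 0.0)    # equals 1 when rho = 0
```

The two values agree closely, and setting `rho` to zero returns one, so the correction term drops out of the conditional mean when there is no selection bias.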

Note that under the null hypothesis of no selection bias, $\rho = 0$ and
$g(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i \gamma_{t2}, \rho) = 1$. It is also important to emphasize that under the null
hypothesis of no selection bias, arbitrary correlation is allowed between $s_{it}$ and $c_{i1}$ when the
FEP estimator is used to test for contemporaneous selection bias. Again, defining

$$p_t(x_i, s_i, \pi) = \frac{m_{it}(\pi)}{\sum_{r=1}^{T} s_{ir} m_{ir}(\pi)}$$

we can develop a test for selection bias using QCMLE as long as the conditional
mean assumption in (3.8) holds. The log likelihood for observation $i$ and parameter vector $\pi$
is written as

$$\ell_i(\pi) = \sum_{t=1}^{T} s_{it} y_{it1} \log[p_t(x_i, s_i, \pi)] \tag{3.9}$$

Using the results from Lemma 2.2, it can be seen that the score function derived from (3.9)
will have an expectation of zero. Apply the standard formulas shown in (2.8) and (2.9) to
derive asymptotic standard errors. Having derived the asymptotic variance estimator and
consistency of the QCMLE, a selection test can easily be derived. The following procedures
outline the steps needed to test for selection bias.

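Procedure 3.1

1. For each year, do standard probit of $s_{it}$ on $x_{it2}$ and $\bar{x}_i$ to estimate $(\hat{\omega}_t, \hat{\delta}_{t2}, \hat{\gamma}_{t2})$.

2. Using the QCMLE method outlined above, estimate

$$y_{it1} = \exp(x_{it1}\beta_1 + c_{i1}) g(\hat{\omega}_t + x_{it2}\hat{\delta}_{t2} + \bar{x}_i \hat{\gamma}_{t2}, \rho) + error_{it}$$

to generate $\hat{\beta}_1$ and $\hat{\rho}$.

3. Test $H_0 : \rho = 0$ against $H_A : \rho \neq 0$ using an asymptotic t-statistic.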
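Procedure 3.2 LM/Score Test

1. Using the QCMLE method outlined above, estimate the following equation subject to
the restriction that $\rho = 0$.

$$y_{it1} = \exp(x_{it1}\beta_1 + c_{i1}) + error_{it}$$

and obtain $\tilde{\beta}_1$.

2. Generate the score function evaluated at $\rho = 0$ using $\tilde{\beta}_1$ generated from step one. Use
the following LM statistic to test the null hypothesis of no selection bias

$$\left( \sum_{i=1}^{N} f_i(\tilde{\beta}_1) \right)' \left( \sum_{i=1}^{N} \nabla_\pi p(x_i, \tilde{\beta}_1)' W(x_i, \tilde{\beta}_1) \nabla_\pi p(x_i, \tilde{\beta}_1) \right)^{-1} \left( \sum_{i=1}^{N} f_i(\tilde{\beta}_1) \right)$$

which is asymptotically distributed as $\chi^2_1$ under the null and where $\sum_{i=1}^{N} f_i(\tilde{\beta}_1)$ is defined
as in (2.6).⁴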

Unfortunately, Procedure 3.2 does not produce a robust test that takes into account
possible serial correlation in the model. An alternative procedure can be derived by noting
that

$$\left. \frac{\partial m_{it}(\pi)}{\partial \rho} \right|_{\rho = 0} = \exp(x_{it1}\beta_1 + c_{i1}) \lambda(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i \gamma_{t2}) \tag{3.10}$$

where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is an inverse Mills ratio derived from running a standard probit
regression of $s_{it}$ on $x_{it2}$ and $\bar{x}_i$ for each $t$. Using this result, we can derive a robust variable
addition test that can detect selection bias and account for serial correlation.
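The result in (3.10) rests on the fact that the derivative of $g(a, \rho)$ with respect to $\rho$, evaluated at $\rho = 0$, equals the inverse Mills ratio $\lambda(a)$. The Python sketch below checks this with a central finite difference at a few hypothetical index values; the function names and the step size `h` are illustrative assumptions.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g(a, rho):
    """Selection correction g(a, rho) = exp(rho^2 / 2) * Phi(rho + a) / Phi(a)."""
    return math.exp(0.5 * rho ** 2) * Phi(rho + a) / Phi(a)

def inverse_mills(a):
    """lambda(a) = phi(a) / Phi(a)."""
    return phi(a) / Phi(a)

def dg_drho_at_zero(a, h=1e-6):
    """Central finite-difference approximation to dg/drho at rho = 0."""
    return (g(a, h) - g(a, -h)) / (2.0 * h)

indices = [-1.0, 0.0, 0.8]              # hypothetical probit index values
gaps = [abs(dg_drho_at_zero(a) - inverse_mills(a)) for a in indices]
log_mills = [math.log(inverse_mills(a)) for a in indices]
```

Since $\lambda(a)$ is strictly positive, its logarithm is always defined, which is what permits the log of the inverse Mills ratio to enter an exponential mean as an added regressor.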

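Procedure 3.3 Robust Variable Addition Test

1. For each year, do standard probit of $s_{it}$ on $x_{it2}$ and $\bar{x}_i$ to estimate $(\hat{\omega}_t, \hat{\delta}_{t2}, \hat{\gamma}_{t2})$. Generate
the inverse Mills ratio, $\hat{\lambda}_{it} \equiv \lambda(\hat{\omega}_t + x_{it2}\hat{\delta}_{t2} + \bar{x}_i \hat{\gamma}_{t2})$, and take its log.

2. Define the $(K_1 + 1) \times 1$ parameter vector $\pi = (\beta_1', \psi)'$. Using the QCMLE FEP method
outlined above, estimate the following equation subject to the restriction that $\rho = 0$

$$\begin{aligned}
y_{it1} &= \exp(x_{it1}\beta_1 + \psi \log(\hat{\lambda}_{it}) + c_{i1}) + error_{it} \\
&= v_{i1} m_{it}(\hat{\lambda}_{it}; \pi) + error_{it}
\end{aligned} \tag{3.11}$$

3. Using an asymptotic t-statistic, test $H_0 : \psi = 0$ versus $H_A : \psi \neq 0$.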

Alternatively, a GMM score statistic can be used to test for selection bias under the
restriction that $\rho = 0$. First do $T$ cross sectional probits and generate the inverse Mills ratio,
$\hat{\lambda}_{it} \equiv \lambda(\hat{\omega}_t + x_{it2}\hat{\delta}_{t2} + \bar{x}_i \hat{\gamma}_{t2})$. Next, simply estimate (3.11) using the following moment
conditions with $\pi_0 = (\beta_{10}', \psi_0)'$.

$$E[s_{it} h(x_{is}, \pi_0)' e_{it}(\lambda_{it}; \pi_0)] = 0, \quad \forall t, s \tag{3.12}$$

where

$$e_{it}(\lambda_{it}; \pi) = y_{it1} - \frac{m_{it}(\lambda_{it}; \pi)}{(1/T_i) \sum_{r=1}^{T} s_{ir} m_{ir}(\lambda_{ir}; \pi)} \, \bar{y}_{i1} \tag{3.13}$$

$\bar{y}_{i1} = (1/T_i) \sum_{r=1}^{T} s_{ir} y_{ir1}$, and $h(x_{is}, \pi_0)$ equals $\nabla_\pi p_t(x_i, s_i, \pi)'/p_t(x_i, s_i, \pi)$ derived from a
preliminary consistent QCMLE. Use the moment conditions implied by (3.12) and the
GMM formulas specified in Section 2 to generate $\hat{\psi}$ and $se(\hat{\psi})$, and then test $H_0 : \psi = 0$
versus $H_A : \psi \neq 0$. To test for selection bias over time, simply interact $\log(\hat{\lambda}_{it})$ with a full
set of year dummies and use an asymptotic Wald statistic to test $H_0 : \psi_t = 0$ versus
$H_1 : \psi_t \neq 0$.

Note that none of the testing procedures proposed above correct for selection bias since
we are not formally imposing restrictions on $c_{i1}$. We assume in all of the procedures shown
above that the null hypothesis holds and that the conditional mean is correctly specified. A
rejection of the null hypothesis indicates that the conditional mean is not correctly specified,
in which case we will have to formally impose some structure or restrictions on $c_{i1}$ in order
to correctly specify the conditional mean. We do this in the next section.

4. Correcting for sample selection bias

Various authors in recent econometric literature have proposed methods to correct for
selection bias. For example, Wooldridge (1995) proposes a method that corrects
for selection bias in linear panel data models. This section will extend Wooldridge
(1995) to deal with non-linear count panel data models. Consider the selection model in
(3.2) and (3.4), which is rewritten here

$$y_{it1} = \exp(x_{it1}\beta_1 + c_{i1} + a_{it1}) u_{it1} \tag{4.1}$$

$$s_{it} = 1[\eta_{02} + \bar{x}_i \eta_{12} + x_{it2}\beta_2 + e_{it2} \geq 0] \tag{4.2}$$

To avoid conditioning on selection in at least two separate time periods, Wooldridge (1995)
proposes a linear projection for the unobserved effect in the regression equation. In the
selection model represented in (4.1) and (4.2), a slightly stronger assumption is needed to
deal with the unobserved effect in (4.1) to correct for selection bias. Since (4.1) is a
nonlinear panel data model, a conditional expectation assumption is needed for $c_{i1}$ to correct
for selection bias in the presence of strictly exogenous regressors. Basically a Chamberlain
(1984) or Mundlak (1978) device needs to be imposed upon (4.1). Therefore, the following
assumption corrects for selection bias in the presence of only strictly exogenous regressors.

Assumption 4.1

i) Define the selection equation as in (4.2) with $e_{it2} \sim \text{Normal}(0, \sigma_t^2)$.

ii) $E(u_{it1} \mid x_i, c_{i1}, e_{it2}) = 1$

iii) $E(c_{i1} \mid x_i) = \exp(\pi_0 + \bar{x}_i \pi_1)$. Impose the Mundlak device on (4.1) by setting
$c_{i1} = \pi_0 + \bar{x}_i \pi_1 + \varepsilon_{i1}$ and define the composite error term $e_{it1} = \varepsilon_{i1} + a_{it1}$.

iv) $E[\exp(e_{it1}) \mid x_i, e_{it2}] = E[\exp(e_{it1}) \mid e_{it2}] = \exp(\eta_0 + \rho_t e_{it2})$

Since the Mundlak device contains a constant term, we can make the normalization that
$\exp(\eta_0) = 1$. Assumption 4.1(iii), which holds if $(e_{it1}, e_{it2})$ is mean independent of $x_i$, is
key as it gives us a mechanism to derive a selection correction term. Using Assumption 4.1,
the fact that $s_{it}$ is a function of $e_{it2}$, and integrating $\exp(\rho_t e_{it2})$ over the truncated normal
CDF,

$$\begin{aligned}
E(y_{it1} \mid x_i, s_{it} = 1) &= \exp(\pi_{00} + x_{it1}\beta_{10} + \bar{x}_i \pi_{10}) g(\omega_{t0} + x_{it2}\delta_{t20} + \bar{x}_i \gamma_{t20}, \rho_{t0}) \\
&= \exp[\pi_{00} + x_{it1}\beta_{10} + \bar{x}_i \pi_{10} + \log\{g(\omega_{t0} + x_{it2}\delta_{t20} + \bar{x}_i \gamma_{t20}, \rho_{t0})\}]
\end{aligned} \tag{4.3}$$

where

$$g(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i \gamma_{t2}, \rho_t) = \exp\left(\tfrac{1}{2}\rho_t^2\right) \left[ \frac{\Phi(\rho_t + \omega_t + x_{it2}\delta_{t2} + \bar{x}_i \gamma_{t2})}{\Phi(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i \gamma_{t2})} \right]$$

A Chamberlain (1984) or Mundlak (1978) device can be imposed for $c_{i1}$ in Assumption 4.1
iii. Imposing the Mundlak device for $c_{i1}$ in (4.1) conserves on degrees of freedom. The
parameters in (4.3) can either be estimated jointly in a GMM procedure or separately in a
two step pooled QMLE procedure. The following two-step pooled QMLE procedure
outlines the steps to estimate $\Gamma = (\pi_0, \beta_1', \pi_1', \rho_1, \ldots, \rho_T)'$ and correct for selection bias.
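Procedure 4.1

1. For each year $t$, do standard probit of $s_{it}$ on $x_{it2}$ and $\bar{x}_i$ to estimate $(\hat{\omega}_t, \hat{\delta}_{t2}, \hat{\gamma}_{t2})$.

2. Estimate the following equation by pooled QMLE

$$y_{it1} = \exp(\pi_0 + x_{it1}\beta_1 + \bar{x}_i \pi_1) g(\hat{\omega}_t + x_{it2}\hat{\delta}_{t2} + \bar{x}_i \hat{\gamma}_{t2}, \rho_t) + error_{it} \tag{4.4}$$

by maximizing the corresponding log-likelihood function

$$\ell_i(\theta) = \sum_{t=1}^{T} s_{it} \{ y_{it1} \log[m(z_{it}, \theta)] - m(z_{it}, \theta) \} \tag{4.5}$$

where $z_{it} = (1, x_{it1}, x_{it2}, \bar{x}_i)$, $\theta = (\Gamma', \omega_t, \delta_{t2}', \gamma_{t2}')'$, and $m(z_{it}, \theta)$ is the conditional
mean function on the right-hand side of (4.4), evaluated at the first stage probit estimates.

3. Asymptotic standard errors need to be adjusted to account for the first stage probit
estimates. Appendix A outlines a two-step procedure to obtain valid standard errors in
the presence of generated regressors.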
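The second step of Procedure 4.1 can be sketched as follows. The Python function below evaluates the pooled quasi-log-likelihood (4.5) for one unit, taking as given the regression index $\pi_0 + x_{it1}\beta_1 + \bar{x}_i \pi_1$ and the estimated probit index from step 1, so that $\log m_{it}$ is the regression index plus $\log g$ as in (4.3). The function names and all input values are hypothetical; an actual application would pass this objective, summed over units, to a numerical optimizer and then adjust the standard errors as described in step 3.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_g(a, rho):
    """log g(a, rho) = rho^2 / 2 + log Phi(rho + a) - log Phi(a)."""
    return 0.5 * rho ** 2 + math.log(Phi(rho + a)) - math.log(Phi(a))

def pooled_qll(y, index1, index2, s, rho):
    """Quasi-log-likelihood (4.5) for one unit.

    y      : outcomes y_it1 (any value may be stored where s_it = 0)
    index1 : regression index pi_0 + x_it1*beta_1 + xbar_i*pi_1, per period
    index2 : estimated probit index from step 1, per period
    s      : selection indicators s_it
    rho    : period-specific rho_t
    """
    total = 0.0
    for yt, a1, a2, st, rt in zip(y, index1, index2, s, rho):
        if st == 0:
            continue                     # unselected periods do not contribute
        log_m = a1 + log_g(a2, rt)       # log of the mean in (4.4)
        total += yt * log_m - math.exp(log_m)
    return total

# Hypothetical inputs for a single unit with T = 3.
y = [2.0, 0.0, 5.0]
index1 = [0.3, 0.1, 0.9]
index2 = [0.5, -0.2, 1.1]
s = [1, 0, 1]
qll_rho0 = pooled_qll(y, index1, index2, s, rho=[0.0, 0.0, 0.0])
qll_rho = pooled_qll(y, index1, index2, s, rho=[0.4, 0.4, 0.4])
```

With every $\rho_t$ set to zero the correction term vanishes and the objective reduces to the usual pooled Poisson quasi-log-likelihood with mean $\exp(\pi_0 + x_{it1}\beta_1 + \bar{x}_i \pi_1)$. For very negative probit indices `Phi` can underflow, so a production implementation would need a more careful evaluation of the log CDF.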

5. Empirical Application: A wage offer
equation
In most empirical wage offer equations, researchers use log wage as opposed to wage as

the dependent variable when measuring the return to experience and education. Blackburn
(2007) uses an exponential mean model to estimate the return to union status for a
cross-section. Santos-Silva and Tenreyro (ST) (2006) also note that due to the presence of
heteroskedasticity, OLS parameter estimates from log-linearized models can be biased.
Therefore, one may be interested in using the level of real hourly wages as the dependent
variable and using an exponential mean model to estimate a wage equation. The wage offer

equation we are interested in estimating is

$$wage_{it} = \exp(\eta_t + \beta_1 exper_{it} + \beta_2 expersq_{it} + \beta_3 educ_{it} + \rho \cdot sel_{it} + c_i) u_{it}, \quad t = 1, \ldots, T \tag{5.1}$$

where n, represents year dummies, u it is an idiosyncratic error term, c,- is unobserved
heterogeneity, 561,-, is a selection term used to test for selection bias, and wage it is real hourly
wage. The data used in estimation of (5. 1) contains 877 individuals and follows them over a
13 year period from 1980 to 1992. The source of the data used in estimation is from the
Panel Study of Income Dynamics (PSID) and is also used in Semykina and Wooldridge
(2005). To test for selection bias, we use the simple variable addition test described in
Section 3.1 with SU+1 as the additional variable and the robust variable addition test
described in Section 3.2 with log(/1,-,) as the selection term. As described in Section 3.2, the

inverse Mills ratio is derived from Tcross-sectional probits that regress the participation

60

indicator on exogenous explanatory variables and their time averages. The regressors in the
selection equation are experience, experience squared, education, age, age squared, an
indicator for marital status, other family income and its square, the number of children in
three age categories, spouse’s education, spouse’s age, spouse’s age squared, the product of
spouse’s age and education, duration of spouse’s unemployment, and a binary indicator
specifying whether the spouse’s duration of unemployment was recorded or not. Table 5
reports the summary statistics for the variables used in the selection and regression equations.
Table 6 reports the semielasticities of E(wage it | xi,cl-) with respect to

111,-, = (expel-it,e.rpersq,-,,educi,,sel,-,)5. Column 1 in Table 6 contains the Fixed Effects
Poisson maximum likelihood estimates when the lead value of the selection indicator is used
to test for selection bias. Column 2 in Table 6 reports the FEP maximum likelihood
estimates when $\log(\lambda_{it})$ is used as the selection term. Robust standard errors are reported in
parentheses and usual standard errors are reported in brackets. As can be seen in Table 6,
column 1, the simple variable addition test that uses $s_{i,t+1}$ as the additional regressor does
not reject the null hypothesis of no selection bias. Also, as shown in Table 6, column 2, the
robust variable addition test that uses $\log(\lambda_{it})$ as the additional regressor does not reject the
null hypothesis of no selection bias. Therefore, it would seem that there is little evidence
that selection bias exists in the sample. As a result, there is no need to do the selection
correction outlined in Section 4 using the model speciﬁed in (5.1). As seen in Table 6,
education and the return to experience are statistically signiﬁcant at the 1% level. Holding
other factors ﬁxed, an additional year of education raises real hourly wage by about 2%.
Meanwhile, holding other factors ﬁxed, the return to experience in the ﬁrst year of work is

about 7.8%. For comparison, Column 3 in Table 6 reports the return to experience and

education excluding the selection term.
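
The variable addition tests above amount to t-tests on the added selection term. As a rough illustration only, the sketch below recomputes the test statistics from the selection-term estimates and robust standard errors reported in Table 6, assuming the usual standard normal approximation. The helper name `two_sided_p` is hypothetical and not part of the original analysis.

```python
import math

def two_sided_p(coef, se):
    """Return the t statistic and two-sided p-value for H0: coefficient = 0."""
    t = coef / se
    # For Z ~ N(0, 1), P(|Z| > |t|) equals erfc(|t| / sqrt(2)).
    return t, math.erfc(abs(t) / math.sqrt(2.0))

# Selection-term estimates and robust standard errors as reported in Table 6.
t_vat, p_vat = two_sided_p(-0.0265, 0.0472)        # lead selection indicator s_{i,t+1}
t_mills, p_mills = two_sided_p(0.000973, 0.00287)  # log of the inverse Mills ratio

print(f"s_(i,t+1):      t = {t_vat:.2f}, p = {p_vat:.2f}")
print(f"log(lambda_it): t = {t_mills:.2f}, p = {p_mills:.2f}")
```

Both p-values are well above conventional significance levels, which is consistent with the failure to reject the null of no selection bias in the exponential mean model.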

Table 7 reports the point estimates and standard errors for education, experience,
experience squared, and a selection term from a linear model that uses $\log(wage_{it})$ as the
dependent variable as opposed to $wage_{it}$. The linear fixed effects estimates for education
and experience are similar to the estimates obtained from FEP estimation. Holding other
factors fixed, the return to an additional year of education is about 2.5%. Meanwhile, holding
education fixed, the return to experience after one year is about 8%. However, as seen in
columns 1 and 2 of Table 7, the tests for selection bias reject the null hypothesis at the 5%
level. Therefore, one would proceed as in Wooldridge (1995) to correct for selection bias by
imposing restrictions on the unobserved effect in the regression equation. It is interesting to
note that the coefficients on $s_{i,t+1}$ and the inverse Mills ratio change sign when
estimating the nonlinear model (5.1). The coefficient on $s_{i,t+1}$ is positive when $\log(wage_{it})$ is used
as the dependent variable but negative when $wage_{it}$ is used as the dependent variable.
Conversely, the coefficient on $\log(\lambda_{it})$ is positive in model (5.1) while the coefficient on $\lambda_{it}$ in the linear
model is negative. To see why this can happen, assume that

$$E(wage_{it} \mid x_i, c_i) = \exp[\mu(x_i, c_i) + \sigma^2(x_i, c_i)/2] \tag{5.2}$$

where $\mu(x_i, c_i) = E[\log(wage_{it}) \mid x_i, c_i]$ and $\sigma^2(x_i, c_i) = Var[\log(wage_{it}) \mid x_i, c_i]$.
Therefore, heteroskedasticity in $\log(wage_{it})$ is a likely reason for the change in sign in the
selection terms when estimating an exponential mean model for wages as opposed to a linear
model. As a result, the exponential mean model for wages in (5.1) may not be picking up the sample
selection problem that clearly exists in the linear model. However, the point estimates on

education and experience are quite similar when one compares the nonlinear model to the

linear model. Again, for comparison, column 3 in Table 7 reports the return to experience

and education excluding the selection term.
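
The role of heteroskedasticity in (5.2) can be illustrated numerically. The sketch below is not the model estimated in this chapter: it assumes, purely for illustration, a scalar regressor $x$ with $\mu(x) = b x$ and $\sigma^2(x) = s_0 + s_1 x$, with made-up parameter values. Under (5.2), the semielasticity of the conditional mean is then $b + s_1/2$, so a positive coefficient in the log-linear model can coexist with a negative one in the exponential mean model.

```python
import math

def mean_wage(x, b=0.09, s0=0.5, s1=-0.25):
    """E(wage | x) under (5.2), assuming mu(x) = b*x and sigma^2(x) = s0 + s1*x."""
    return math.exp(b * x + (s0 + s1 * x) / 2.0)

def semi_elasticity(x, h=1e-6, **params):
    """Central finite-difference derivative of log E(wage | x) with respect to x."""
    up = math.log(mean_wage(x + h, **params))
    down = math.log(mean_wage(x - h, **params))
    return (up - down) / (2.0 * h)

# Illustrative values only: b is the slope in the log-linear model.
homoskedastic = semi_elasticity(1.0, s1=0.0)       # variance does not depend on x
heteroskedastic = semi_elasticity(1.0, s1=-0.25)   # variance falls with x

print(homoskedastic, heteroskedastic)
```

With constant variance the semielasticity equals the log-linear slope; once the variance depends on $x$, the two differ and can even have opposite signs.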

6. Conclusion

In this paper, we extended Wooldridge (1999) and BGW (2002) to the case of an
unbalanced panel. As in Wooldridge (1999), we assumed strictly exogenous regressors. The
advantage of the FEP QCMLE approach is that as long as the conditional mean function is
correctly speciﬁed, we get consistent estimates of the parameters even if the Poisson
distributional assumption is violated. Even if the response variable is not a count variable,
we can use the FEP framework to estimate a nonlinear panel data model. This paper
proposed methods to test and correct for selection bias for an exponential mean model. We
extended Terza (1998) by applying the methods used in Wooldridge (1995) in linear panel
data models to test and correct for selection bias in nonlinear panel data models. Finally, we
applied the theory developed in the paper to test and possibly correct for selection bias. In
Section 5, we estimated a wage equation by using the level of real hourly wages and tested
for selection bias. Interestingly, we discovered that selection bias is not captured when
estimating the nonlinear wage equation model, but captured when estimating a linear wage
equation model.

For future research, one can extend this paper to account for selection and attrition bias
in nonlinear panel data models under weaker exogeneity assumptions. For example, the
methods to correct for selection bias outlined in this paper are not valid in the presence of
sequentially exogenous or predetermined regressors. Under such a scenario, one would have
to condition upon selection in two separate time periods. Correcting for attrition bias would
not be as problematic in the presence of non-strictly exogenous regressors since attrition is an
absorbing state. For example, in the patents and R&D relationship, firms might go bankrupt if
they cannot stay profitable by receiving a sufficient number of patents over a certain time
period. This would be a case where one would have to test and correct for attrition bias.

Appendix A

In Section 4, we outlined a QCMLE procedure to correct for selection bias. Define the
regressors for time period $t$ as $w_{it} = (1, x_{it1}, \bar{x}_i)$ and $z_{it} = (1, x_{it}, \bar{x}_i)$. The parameter vectors
are defined as $\theta = (\pi_0, \beta_1', \pi_1', \rho_1, \dots, \rho_T)' = (\Gamma', \rho_1, \dots, \rho_T)'$, a $(1 + K_1 + K_2 + T) \times 1$
vector, $\delta_t = (\psi_t, \delta_{t2}', \gamma_{t2}')'$, a $(1 + 2K_2) \times 1$ vector, and $\delta = (\delta_1', \delta_2', \dots, \delta_T')'$, which is a
$(1 + 2K_2)T \times 1$ vector. We continue to assume that $(w_{it}, z_{it})$ is observed for all $t$ and $y_{it1}$ is
observed if $s_{it} = 1$. This appendix derives the standard errors that take into account the first
stage estimation of $g(\psi_t + x_{it}\delta_{t2} + \bar{x}_i\gamma_{t2}, \rho_t)$. The log-likelihood we are interested in
maximizing is

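$$\ell_i(\theta, \delta) = \sum_{t=1}^{T} s_{it}\{ y_{it1}\log[m(w_{it}, z_{it}, \theta, \delta_t)] - m(w_{it}, z_{it}, \theta, \delta_t) \} \tag{A.1}$$

where

$$m_{it} \equiv m(w_{it}, z_{it}, \theta, \delta_t) = \exp(w_{it}\Gamma)\exp(\tfrac{1}{2}\rho_t^2)\,\frac{\Phi(z_{it}\delta_t + \rho_t)}{\Phi(z_{it}\delta_t)}$$

We define the $(1 + K_1 + K_2 + T) \times 1$ score for observation $i$ as $f_i(\theta, \delta) = \sum_{t=1}^{T} s_{it} f_{it}(\theta, \delta_t)$,
where

$$f_{it}(\theta, \delta_t) = \nabla_\theta m_{it}'\,\frac{(y_{it1} - m_{it})}{m_{it}} \tag{A.2}$$

Note that

$$\nabla_\theta m_{it}' = \begin{pmatrix} \nabla_\Gamma m_{it}' \\ \partial m_{it}/\partial \rho_t \end{pmatrix}$$

where

$$\nabla_\Gamma m_{it}' = w_{it}'\exp(w_{it}\Gamma)\exp(\tfrac{1}{2}\rho_t^2)\,\frac{\Phi(z_{it}\delta_t + \rho_t)}{\Phi(z_{it}\delta_t)} \tag{A.3}$$

and

$$\frac{\partial m_{it}}{\partial \rho_t} = \exp(\tfrac{1}{2}\rho_t^2)\left\{ \rho_t \exp(w_{it}\Gamma)\frac{\Phi(z_{it}\delta_t + \rho_t)}{\Phi(z_{it}\delta_t)} + \exp(w_{it}\Gamma)\frac{\phi(z_{it}\delta_t + \rho_t)}{\Phi(z_{it}\delta_t)} \right\} \tag{A.4}$$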
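
As a sanity check on (A.4), the sketch below compares the analytical derivative of $m_{it}$ with respect to $\rho_t$ against a central finite difference. It treats the indices $w_{it}\Gamma$ and $z_{it}\delta_t$ as scalars and uses arbitrary illustrative values; the function names are hypothetical.

```python
import math

def Phi(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def phi(a):
    """Standard normal PDF."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def m(wG, zd, rho):
    """Conditional mean m_it, with scalar indices wG = w_it*Gamma and zd = z_it*delta_t."""
    return math.exp(wG) * math.exp(0.5 * rho ** 2) * Phi(zd + rho) / Phi(zd)

def dm_drho(wG, zd, rho):
    """Analytical derivative of m_it with respect to rho_t, as in (A.4)."""
    return math.exp(0.5 * rho ** 2) * (
        rho * math.exp(wG) * Phi(zd + rho) / Phi(zd)
        + math.exp(wG) * phi(zd + rho) / Phi(zd)
    )

# Arbitrary illustrative values for the indices and for rho_t.
wG, zd, rho, h = 0.3, 0.5, -0.4, 1e-6
numeric = (m(wG, zd, rho + h) - m(wG, zd, rho - h)) / (2.0 * h)
analytic = dm_drho(wG, zd, rho)
print(numeric, analytic)
```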

The asymptotic variance matrix that we will derive will need to take into account the first
stage estimation of $\delta$. Under standard regularity conditions,

$$\sqrt{N}(\hat\theta - \theta_0) = A_0^{-1}\left( N^{-1/2}\sum_{i=1}^{N} f_i(\theta_0; \hat\delta) \right) + o_p(1) \tag{A.5}$$

where $A_0 = \sum_{t=1}^{T} E[s_{it}\nabla_\theta m(z_{it}, \theta_0, \delta^*)'\nabla_\theta m(z_{it}, \theta_0, \delta^*)/m(z_{it}, \theta_0, \delta^*)]$ is
$(1 + K_1 + K_2 + T) \times (1 + K_1 + K_2 + T)$ and $f_i(\theta_0; \hat\delta)$ is the $(1 + K_1 + K_2 + T) \times 1$ score function
derived from (A.1). We can also claim under standard regularity conditions that

$$N^{-1/2}\sum_{i=1}^{N} f_i(\theta_0; \hat\delta) = N^{-1/2}\sum_{i=1}^{N} f_i(\theta_0; \delta^*) + O_p(1) \tag{A.6}$$

It is important to note that when deriving the Hessian, it is only necessary to take derivatives
with respect to $\theta$. Given that $\sqrt{N}(\hat\delta - \delta^*) = O_p(1)$, we can apply a standard mean value
expansion to (A.6) which gives

$$N^{-1/2}\sum_{i=1}^{N} f_i(\theta_0; \hat\delta) = N^{-1/2}\sum_{i=1}^{N} f_i(\theta_0; \delta^*) + L_0\sqrt{N}(\hat\delta - \delta^*) + o_p(1) \tag{A.7}$$

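where $L_0 = E[\sum_{t=1}^{T} s_{it}\nabla_\delta f_{it}(w_{it}, z_{it}, \theta_0; \delta^*)]$. Taking partial derivatives of the score
condition (A.2),

$$\nabla_\delta f_{it} = \frac{\nabla_\delta\nabla_\theta m_{it}'(y_{it1} - m_{it})m_{it} - \nabla_\theta m_{it}'\nabla_\delta m_{it}\,m_{it} - \nabla_\theta m_{it}'\nabla_\delta m_{it}(y_{it1} - m_{it})}{m_{it}^2} \tag{A.8}$$

Taking expectations and using the fact that $E(y_{it1} \mid x_i, s_{it} = 1) = m_{it}$,

$$L_0 = -E\left[\sum_{t=1}^{T} s_{it}\frac{\nabla_\theta m_{it}'\nabla_\delta m_{it}}{m_{it}}\right] \tag{A.9}$$

where

$$\nabla_{\delta_t} m_{it} = \exp(w_{it}\Gamma)\exp(\tfrac{1}{2}\rho_t^2)\,\frac{z_{it}\phi(z_{it}\delta_t + \rho_t)\Phi(z_{it}\delta_t) - z_{it}\phi(z_{it}\delta_t)\Phi(z_{it}\delta_t + \rho_t)}{[\Phi(z_{it}\delta_t)]^2}$$

For each $i$ and $t$, $\nabla_\delta f_{it}$ is a block matrix of zeros except in one position. For each $t$, $\nabla_{\delta_t} f_{it}$
appears in row $(1 + K_1 + K_2 + t)$ and column $(1 + 2K_2)\cdot(t - 1) + 1$. A consistent estimate
of $L_0$ is

$$\hat{L} = -N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\frac{\nabla_\theta \hat{m}_{it}'\nabla_\delta \hat{m}_{it}}{\hat{m}_{it}} \tag{A.10}$$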
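For the probit log-likelihood parameters, under standard regularity conditions, we can
assume

$$\sqrt{N}(\hat\delta - \delta^*) = N^{-1/2}\sum_{i=1}^{N} q_i(\delta^*) + o_p(1) \tag{A.11}$$

The term $q_i(\delta^*)$ is a $(1 + 2K_2)T \times 1$ vector that has zero expectation and depends on the
score and Hessian of the probit log likelihoods. For each $i$, $q_i(\delta)$ is composed of a
series of $(1 + 2K_2) \times 1$ vectors, one for each $t$, stacked on top of each other,

$$q_{it} = C_t^{-1}\,\frac{\phi(z_{it}\delta_t)\,z_{it}'[s_{it} - \Phi(z_{it}\delta_t)]}{\Phi(z_{it}\delta_t)[1 - \Phi(z_{it}\delta_t)]} \tag{A.12}$$

where

$$\hat{C}_t = N^{-1}\sum_{i=1}^{N}\frac{[\phi(z_{it}\hat\delta_t)]^2 z_{it}'z_{it}}{\Phi(z_{it}\hat\delta_t)[1 - \Phi(z_{it}\hat\delta_t)]} \tag{A.13}$$

is a consistent estimator of $C_t$, minus the expected Hessian from the probit log likelihoods
for each $t$. To derive $q_i(\delta)$, stack the vectors $q_{it}$ for each $i$. We can now derive the
asymptotic variance matrix for $\hat\theta$. So

$$\sqrt{N}(\hat\theta - \theta_0) = A_0^{-1}\left( N^{-1/2}\sum_{i=1}^{N} g_i(\theta_0; \delta^*) \right) + o_p(1) \tag{A.14}$$

and therefore,

$$\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} Normal[0, A_0^{-1}D_0A_0^{-1}] \tag{A.15}$$

where $g_i(\theta_0; \delta^*) = f_i(\theta_0; \delta^*) + L_0 q_i(\delta^*)$ and
$D_0 \equiv E[g_i(\theta_0; \delta^*)g_i(\theta_0; \delta^*)'] = Var[g_i(\theta_0; \delta^*)]$. A consistent estimate of the asymptotic
variance of $\hat\theta$ is $\widehat{Avar}(\hat\theta) = \hat{A}^{-1}\hat{D}\hat{A}^{-1}/N$ where

$$\hat{A} = N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\nabla_\theta m(w_{it}, \hat\theta, \hat\delta)'\nabla_\theta m(w_{it}, \hat\theta, \hat\delta)/m(w_{it}, \hat\theta, \hat\delta) \tag{A.16}$$

$$\hat{D} = N^{-1}\sum_{i=1}^{N}\hat{g}_i(\hat\theta; \hat\delta)\hat{g}_i(\hat\theta; \hat\delta)' \tag{A.17}$$

$$\hat{g}_i = f_i(\hat\theta; \hat\delta) + \hat{L}q_i(\hat\delta) \tag{A.18}$$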
Table 5. Summary Statistics. Mean values; standard deviations in parentheses.

Variable Description                     Entire Sample      Participants       Non-Participants

Participation Indicator                  0.74               1                  0
Real Wage                                -                  8.37 (5.83)        -
Experience                               11.79 (7.76)       12.98 (7.58)       8.49 (7.28)
Education                                12.93 (2.30)       13.12 (2.27)       12.40 (2.31)
Age                                      40.93 (10.27)      40.13 (9.62)       43.14 (11.63)
Married Indicator                        0.86               0.84               0.93
Other Household Income (thousands)       34.398 (40.379)    30.945 (30.868)    44.007 (58.237)
Spouse’s Age                             37.00 (18.17)      35.17 (18.28)      42.10 (16.84)
Spouse’s Education                       11.21 (5.25)       10.98 (5.50)       11.84 (4.40)
Spouse’s Unemployment Duration (Weeks)   0.97 (4.96)        0.94 (4.80)        1.06 (5.36)
Weeks Unreported (=1 if Spouse’s
  Unemployment not reported)             0.09               0.06               0.16
Children Aged 0-2                        0.14 (0.37)        0.11 (0.33)        0.21 (0.45)
Children Aged 3-5                        0.18 (0.42)        0.16 (0.40)        0.24 (0.48)
Children Aged 6-17                       0.82 (1.01)        0.84 (1.01)        0.77 (0.99)
Number of Obs.                           11,401             8,387              3,014

Table 6. Wage Offer Equation. Fixed Effects Poisson (FEP) Estimates. Robust std. errors in
parentheses, regular std. errors in brackets. Dependent variable: real hourly wage in levels.

                        FEP VAT           FEP Mills         FEP

exper                   0.0849***         0.080?***         0.0803***
                        (0.0130)          (0.0123)          (0.0122)
                        [0.0064]          [0.00579]         [0.00573]
expersq                 -0.000815***      -0.000746***      -0.000736***
                        (0.000196)        (0.000201)        (0.000197)
                        [0.000114]        [0.000108]        [0.000101]
educ                    0.0232**          0.0216**          0.0214**
                        (0.00938)         (0.00961)         (0.0096)
                        [0.00723]         [0.00707]         [0.00706]
$s_{i,t+1}$             -0.0265           -                 -
                        (0.0472)
                        [0.0216]
$\log(\lambda_{it})$    -                 0.000973          -
                                          (0.00287)
                                          [0.00209]
log likelihood          -15533.612        -17491.209        -17506.58

*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

Table 7. Wage Offer Equation. Linear Fixed Effects (FE) Estimates. Robust std. errors in
parentheses. Year dummies included but not reported. Dependent variable: log of real
hourly wage.

                        FE VAT            FE Mills          FE

exper                   0.0875***         0.0816***         0.0829***
                        (0.0106)          (0.00994)         (0.010)
expersq                 -0.000939***      -0.000801***      -0.000924***
                        (0.000175)        (0.000167)        (0.000166)
educ                    0.0263***         0.0242**          0.0268***
                        (0.0097)          (0.00967)         (0.00976)
$s_{i,t+1}$             0.0883**          -                 -
                        (0.0347)
$\lambda_{it}$          -                 -0.129***         -
                                          (0.039)

*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

CHAPTER 3

1. Introduction

Recent interest in econometrics has focused on correlated random coefﬁcient (CRC)
models and the estimation of Average Partial Effects (APEs) and Average Treatment Effects
(ATEs), which are APEs for a discrete explanatory variable. Wooldridge (2002, Chapter 18)
provides an extensive review of methods to estimate ATEs. Although most work has
focused on cross-sectional random coefﬁcient models, recent interest has also centered upon
panel data models with random coefﬁcients. Wooldridge (2005a) uses a CRC panel data
model to estimate average treatment effects with strictly exogenous regressors. More
recently, Murtazashvili and Wooldridge (MW, 2005) and Murtazashvili (2007) have
introduced endogeneity into CRC panel data models. For roughly continuous endogenous
explanatory variables, MW (2005) use ﬁxed effects instrumental variables (FE-IV)
estimation to consistently estimate APEs. MW (2005) make the restrictive assumption that
the covariance between the detrended covariates and the random coefficient conditional on
the detrended IVs does not depend on the detrended IVs themselves, although the covariance
may change over time. For the cross section, Card (2001) shows that this assumption is
violated when he uses a binary indicator for proximity to a four-year college as the
instrument for treatment and IQ as a proxy for unobserved ability. For a cross-sectional
model, Wooldridge (2005b) is able to mitigate Card’s (2001) criticism by using a control
function approach for a roughly continuous endogenous treatment. Murtazashvili (2007)
extends Wooldridge (2005b) to deal with CRC panel data models with roughly continuous
endogenous treatments. A control function, as defined by Wooldridge (2005b) and
Murtazashvili (2007), is derived by generating residuals from a reduced form equation for the
roughly continuous endogenous variable and plugging in those residuals as covariates into
the structural model. However, the approach used by Wooldridge (2005b) and Murtazashvili
(2007) does not work when the endogenous treatment variable has more discrete properties.
For example, the control function approach used by Wooldridge (2005b) or Murtazashvili
(2007) does not allow for binary or corner solution endogenous treatments. For the
cross-sectional setup, Wooldridge’s (2007) CRC model allows for an endogenous discrete
treatment by making a distributional assumption on the reduced form of the treatment
variable. By making a distributional assumption for the discrete endogenous treatment
variable, Wooldridge (2007) derives a so-called "correction function" in order to produce a
consistent estimate of the ATE. The correction function is a function of exogenous
covariates derived from a ﬁrst stage probit or tobit regression. This function is then plugged
into the structural equation, and then the ATE is estimated using IV methods.

In this paper, we extend the framework used by Wooldridge (2007) and Murtazashvili
(2007) to deal with discrete endogenous treatments in CRC panel data models. As long as
the instruments chosen have sufficient variation, weaker assumptions than those used in
Murtazashvili (2007) are sufficient to produce a consistent IV estimator for the ATE. The
motivation for this paper comes from Wooldridge’s (2007) "correction function" approach
for cross-sectional CRC models. As in Wooldridge (2007), I propose a simple two-step
method that corrects for endogeneity in the structural model by using a correction function
derived from a first stage probit or tobit estimation. The IV estimator that I propose is
consistent and $\sqrt{N}$-asymptotically normal for $T$ fixed as $N \to \infty$. As in Murtazashvili (2007),
I allow the individual slopes on the treatment variables to vary over time, which allows me to
derive a mechanism to generate a correction function.

The plan of this paper is as follows. Section 2 introduces the model and conditions
needed to generate a consistent estimate for the ATE. Section 3 derives a general approach
to finding a correction function for the CRC panel data model. Section 4 shows how to
obtain a correction function when the endogenous treatment has a probit or tobit reduced
form distribution. Section 5 shows an FE-IV approach to estimating the endogenous ATE
along with the correction function. In Section 6, we test the ﬁnite sample properties of the
correction function estimator by performing Monte Carlo simulations. Section 7 applies the
theory presented in the paper to an empirical example dealing with the school choice
program in Michigan and student performance. Finally, Section 8 provides some concluding

remarks.

2. General model and assumptions

For a random draw $i$, consider the following structural model with time varying
individual slopes and discrete endogenous treatments

$$y_{it} = a_t + c_i + x_{it}\eta + w_{it}b_{it} + u_{it}, \qquad t = 1,\dots,T \tag{2.1}$$

where $c_i$ is unobserved heterogeneity, $w_{it}$ is a $1 \times K_1$ vector of discrete endogenous treatments,
$x_{it}$ is a $1 \times K_2$ vector of exogenous covariates that may include an individual’s productivity
characteristics, and $a_t$ are year dummies. Note that $b_{it}$ is a $K_1 \times 1$ vector of time varying
individual specific slopes, and $u_{it}$ is the idiosyncratic error. For simplicity, a balanced panel
is assumed. The setup we use is similar to that of Murtazashvili (2007), but in this case the
treatment variable is allowed to be discrete. Let $z_{it}$ be a $1 \times L$ vector of instruments where
$L + K_2 \geq K_1$. For now, I assume that $x_{it}$, which can include an individual’s productivity
characteristics, is strictly exogenous. As in Wooldridge (2007) and Murtazashvili (2007), I
will focus on $E(b_{it}) = \beta$, which is simply a time constant $K_1 \times 1$ ATE vector. We can
decompose $b_{it}$ into a nonrandom component and a zero mean random component. Therefore, we can
write

$$b_{it} = \beta + d_i + r_{it} \tag{2.2}$$

Let us write $q_{it} = d_i + r_{it}$, where by definition $E(q_{it}) = 0$. We assume a constant ATE over
time. Plugging (2.2) into (2.1) gives

$$y_{it} = a_t + c_i + x_{it}\eta + w_{it}\beta + (w_{it}q_{it} + u_{it}), \qquad t = 1,\dots,T \tag{2.3}$$

$$\phantom{y_{it}} = a_t + c_i + x_{it}\eta + w_{it}\beta + \varepsilon_{it} \tag{2.4}$$

Assumption 2.1: $E(u_{it} \mid x_i, z_i) = 0$, $t = 1,\dots,T$
where $x_i = (x_{i1}, \dots, x_{iT})$ and $z_i = (z_{i1}, \dots, z_{iT})$. Our model allows for two sources of
endogeneity: the correlation between $w_{it}$ and $\varepsilon_{it}$, in addition to arbitrary correlation between
$w_{it}$ and $c_i$. Let $b_i = (b_{i1}, \dots, b_{iT})$. Notice that we have not defined a relationship between
$(c_i, b_i)$ and $(x_i, z_i)$. In order to consistently estimate the ATE, we need to develop a
mechanism that relates $(c_i, b_i)$ to the covariates and the IVs. Therefore, let us make the
following assumption

Assumption 2.2: For $t = 1,\dots,T$

i) $E(c_i \mid x_i, z_i) = E(c_i \mid \bar{x}_i, \bar{z}_i) = \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3$

ii) $E(d_i \mid x_i, z_i) = E(d_i \mid \bar{x}_i, \bar{z}_i) = B_1(\bar{x}_i - \psi_X)' + B_2(\bar{z}_i - \psi_Z)'$

iii) $E(r_{it} \mid x_i, z_i) = B_3(x_{it} - \psi_{tX})'$

where $B_1$ and $B_3$ are $K_1 \times K_2$, $B_2$ is $K_1 \times L$, $\psi_X = E(\bar{x}_i)$, $\psi_Z = E(\bar{z}_i)$, and
$E(x_{it}) = \psi_{tX}$. In Assumption 2.2 i) and ii), $\bar{x}_i$ and $\bar{z}_i$ can be thought of as sufficient statistics
describing how the entire history of $\{x_{it}, z_{it} : t = 1,\dots,T\}$ affects $d_i$ and $c_i$. Assumption 2.2
iii) means that once we control for an individual’s observed productivity characteristics, the
time varying component is mean independent of the IVs. This is the sort of ignorability of
instruments assumption made in Wooldridge (2007). One needs to make the appropriate
exclusion restrictions for IV estimation to work. Assumption 2.1 is also still valid since
$(c_i, b_i)$ is a function of $x_i$ and $z_i$.

Under Assumption 2.2, one can write

$$q_{it} = B_1(\bar{x}_i - \psi_X)' + B_2(\bar{z}_i - \psi_Z)' + B_3(x_{it} - \psi_{tX})' + e_{it} \tag{2.5}$$

where $e_{it} = g_i + s_{it}$, $E(e_{it} \mid x_i, z_i) = 0$, $g_i$ is the zero mean random term implied in
Assumption 2.2 ii) and $s_{it}$ is the zero mean random term implied in Assumption 2.2 iii).

Having defined Assumption 2.1 and Assumption 2.2, we can now define an estimating
equation. Letting $p_i$ be the zero mean error term implied by $c_i$, one can write

$$\begin{aligned} y_{it} = {} & a_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + ((\bar{x}_i - \psi_X) \otimes w_{it})\beta_1 \\ & + ((\bar{z}_i - \psi_Z) \otimes w_{it})\beta_2 + ((x_{it} - \psi_{tX}) \otimes w_{it})\beta_3 + p_i + w_{it}e_{it} + u_{it} \end{aligned} \tag{2.6}$$

where $\beta_j = vec[B_j]$ for $j = 1, 2, 3$. Note that we are not trying to estimate $E(y_{it} \mid w_{it}, x_i, z_i)$.
Parametrizing $E(w_{it}e_{it} \mid w_{it}, x_i, z_i)$ as in Garen (1984) and Heckman and Robb (1986) would
require that we use the control function approach described by Murtazashvili (2007).

However, as we mentioned before, the control function approach used by Murtazashvili
(2007) breaks down when the treatment variable is discrete. Therefore, we need to apply the
so-called "correction function" approach used by Wooldridge (2007) for the cross-section.
This requires that we find a parametric form for $E(w_{it}e_{it} \mid x_i, z_i)$. Generally, $E(w_{it}e_{it} \mid x_i, z_i)$
depends on $(x_i, z_i)$. MW (2005) show that when the conditional covariance of the detrended
treatments and the unobserved heterogeneity conditional on the detrended instruments is
constant, FE-IV estimation produces a consistent estimate of the APE. Murtazashvili (2007)
relaxes this assumption even further and allows for the conditional covariance-variance
matrix to be heteroskedastic. Therefore, even if $E(w_{it}e_{it} \mid x_i, z_i)$ were not to depend
on $(x_i, z_i)$, we would still have to include year dummies in (2.6) to capture any
heteroskedasticity. In order to consistently estimate the ATE, we need to find a parametric
form for $E(w_{it}e_{it} \mid x_i, z_i)$ and use appropriate instruments. First, define

$$\kappa_{it}(x_i, z_i) \equiv E(w_{it}e_{it} \mid x_i, z_i) \tag{2.7}$$

$$\omega_{it} = w_{it}e_{it} - \kappa_{it}(x_i, z_i) \tag{2.8}$$

Hence, the estimating equation in error form is

$$\begin{aligned} y_{it} = {} & a_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta \\ & + ((\bar{x}_i - \psi_X) \otimes w_{it})\beta_1 + ((\bar{z}_i - \psi_Z) \otimes w_{it})\beta_2 \\ & + ((x_{it} - \psi_{tX}) \otimes w_{it})\beta_3 + \kappa_{it}(x_i, z_i) + r_{it} \end{aligned} \tag{2.9}$$

where $r_{it} = p_i + \omega_{it} + u_{it}$. Note that $E(r_{it} \mid x_i, z_i) = 0$, although $E(r_{it} \mid w_{it}, x_i, z_i) \neq 0$. In
effect, we are adding a "correction function" to avoid having to condition upon $w_{it}$. The
function $\kappa_{it}(x_i, z_i)$ acts as its own instrument in the IV estimation of (2.9). Since $\kappa_{it}(x_i, z_i)$
is unknown, we need to make an assumption as in Wooldridge (2007),

Assumption 2.3: $E(w_{it}e_{it} \mid x_i, z_i) = h_{it}(x_i, z_i, \pi)\rho$
where $h_{it}(x_i, z_i, \pi)$ is $1 \times K_1$ and $\rho$ is a $K_1 \times 1$ parameter vector. Under Assumption 2.3
and the claim that $w_{it}$ is discrete in nature, we need to make a distributional assumption
regarding $w_{it}$, which we will do later in this paper. Since $h_{it}(x_i, z_i, \pi)$ is a function not
based upon $E(w_{it}e_{it} \mid w_{it}, x_i, z_i)$, but instead $E(w_{it}e_{it} \mid x_i, z_i)$, it is a "correction function", not a
"control function", using the terminology from Wooldridge (2007). Therefore, under
Assumptions 2.1, 2.2, and 2.3, in the population (2.9) is written as

$$\begin{aligned} y_{it} = {} & a_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta \\ & + ((\bar{x}_i - \psi_X) \otimes w_{it})\beta_1 + ((\bar{z}_i - \psi_Z) \otimes w_{it})\beta_2 \\ & + ((x_{it} - \psi_{tX}) \otimes w_{it})\beta_3 + h_{it}(x_i, z_i, \pi)\rho + r_{it} \end{aligned} \tag{2.10}$$

Provided that we have a consistent first stage estimate for $\pi$, we can estimate the following
by any IV procedure, including GMM.

$$\begin{aligned} y_{it} = {} & a_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + vec[(\bar{x}_i - \hat\psi_X) \otimes w_{it}]'\beta_1 \\ & + vec[(\bar{z}_i - \hat\psi_Z) \otimes w_{it}]'\beta_2 \\ & + vec[(x_{it} - \hat\psi_{tX}) \otimes w_{it}]'\beta_3 + h_{it}(x_i, z_i, \hat\pi)\rho + error_{it} \end{aligned} \tag{2.11}$$

where $\hat\pi$ is a consistent first stage estimate of $\pi$, the set of possible IVs includes
$\{1, x_{it}, z_{it}, \bar{x}_i, \bar{z}_i, h_{it}(x_i, z_i, \hat\pi), [(\bar{x}_i - \hat\psi_X) \otimes x_{it}], [(\bar{x}_i - \hat\psi_X) \otimes z_{it}],$
$[(\bar{z}_i - \hat\psi_Z) \otimes x_{it}], [(\bar{z}_i - \hat\psi_Z) \otimes z_{it}], [(x_{it} - \hat\psi_{tX}) \otimes z_{it}]\}$, $\hat{w}_{itj}$ is an estimate of $E(w_{itj} \mid x_i, z_i)$
for $j = 1,\dots,K_1$, and $\hat{w}_{it} = (\hat{w}_{it1}, \dots, \hat{w}_{itK_1})$. Having restricted $c_i$ in Assumption 2.2 i), we can also
estimate (2.11) by random effects IV (RE-IV), which would allow us to account for any
serial correlation. By doing GMM, we can exploit the available moment conditions to
produce an efficient and consistent estimate for the ATE. There are two sources of
estimation error in estimating the ATE in (2.11). The first source of estimation error comes
from replacing $E(\bar{x}_i)$, $E(\bar{z}_i)$ and $E(x_{it})$ with $\hat\psi_X$, $\hat\psi_Z$, and $\hat\psi_{tX}$, respectively. Fortunately, as
shown in Wooldridge (2002, Chapter 6), the estimation error from replacing $E(\bar{x}_i)$, $E(\bar{z}_i)$ and
$E(x_{it})$ is cancelled out by the variation in $r_{it}$. The second source is the first stage estimation
of $\pi$: the asymptotic variance of
$\sqrt{N}(\hat\pi - \pi)$ affects the limiting distribution of the IV estimator for the ATE. Under the null
hypothesis of no endogeneity bias, $\rho = 0$, the first stage estimation of $\pi$ does not affect the
limiting distribution of the IV estimator for the ATE. [See Wooldridge (2002, Chapter 6)].
To test for endogeneity, simply use the usual asymptotic Wald statistic robust to
heteroskedasticity and serial correlation and test $H_0 : \rho = 0$ vs. $H_A : \rho \neq 0$. If the null
hypothesis is rejected, the standard errors need to be corrected to account for the generated
regressor problem. See Appendix A for the formulas to generate standard errors robust to
heteroskedasticity and estimation error from $\hat\pi$.

3. A general method for deriving the
correction function

Having shown the necessary conditions to generate a consistent IV estimate for the ATE,
we will now discuss how the correction function is derived. First, I will go through a general
approach to show how the correction function is derived. Then, I will show how to derive a
correction function when the endogenous treatment is a probit or tobit response. I will use a
similar framework to that used in Wooldridge (2007), but extend the analysis to deal with a
panel. For $j = 1,\dots,K_1$ and $t = 1,\dots,T$, $E(w_{itj}e_{itj} \mid x_i, z_i)$ needs to be defined. In order to
define the correction function, we need to formulate a particular distribution for $w_{itj}$
conditional on the covariates and IVs,

$$w_{itj} = f_j(x_i, z_i, v_{itj}; \alpha_j) \tag{3.1}$$

$$v_{itj} \mid x_i, z_i \sim G_j(v_{itj}; \eta_j) \tag{3.2}$$

$$E(e_{itj} \mid v_{itj}, x_i, z_i) = E(e_{itj} \mid v_{itj}) = \rho_j v_{itj} \tag{3.3}$$

Equation (3.1) specifies a specific functional form for $w_{itj}$, which is a function of the
covariates, IVs, and the reduced form error, $v_{itj}$. Meanwhile, (3.2) specifies a particular
distribution for the reduced form error, which has density $g_j(v_{itj}; \eta_j)$. Equation (3.3) is a
conditional mean independence assumption which states that conditional on $v_{itj}$, $e_{itj}$ is mean
independent of $(x_i, z_i)$. I assume a linear conditional expectation in (3.3), and this
assumption holds when $(e_{itj}, v_{itj})$ is bivariate normal conditional on $(x_i, z_i)$. Together, (3.1),
(3.2), and (3.3) define a particular distribution for $w_{itj}$, $D(w_{itj} \mid x_i, z_i)$. Although we are
specifying a particular distribution for $w_{itj}$, which may be too strong of an assumption, we
are not restricting $D(c_i \mid x_i, z_i)$ and $D(q_{it} \mid x_i, z_i)$. This is advantageous since it would be quite
restrictive to specify a particular distribution for unobserved heterogeneity. Using (3.1),
(3.2), and (3.3) and a law of iterated expectations argument, we can show that for
$j = 1,\dots,K_1$

$$\begin{aligned} E(w_{itj}e_{itj} \mid x_i, z_i) &= E[E(w_{itj}e_{itj} \mid v_{itj}, x_i, z_i) \mid x_i, z_i] \\ &= E[w_{itj}E(e_{itj} \mid v_{itj}, x_i, z_i) \mid x_i, z_i] \\ &= \rho_j E(w_{itj}v_{itj} \mid x_i, z_i) \\ &= \rho_j E[f_j(x_i, z_i, v_{itj}; \alpha_j)v_{itj} \mid x_i, z_i] \\ &= \rho_j \int f_j(x_i, z_i, v_{itj}; \alpha_j)\,v_{itj}\,g_j(v_{itj}; \eta_j)\,dv_{itj} \\ &= \rho_j h_{itj}(x_i, z_i; \pi_j) \end{aligned} \tag{3.4}$$

where $\pi_j = (\alpha_j', \eta_j')'$ is a parameter vector. Under standard regularity conditions, $\pi_j$ can be
estimated using maximum likelihood methods. In the next section, we outline methods to
generate a first stage estimate of $\pi_j$ when $w_{itj}$ is either a tobit or probit treatment.
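
The integral that defines the correction function in (3.4) can be evaluated numerically for any choice of reduced form. The sketch below is an illustration under simplifying assumptions: a single treatment, a standard normal reduced form error, and a hypothetical scalar index value. It approximates $\int f(v)\,v\,g(v)\,dv$ by the midpoint rule and, for a probit-type reduced form $f(v) = 1[\text{index} + v \geq 0]$, compares the result with the standard normal density evaluated at the index.

```python
import math

def phi(v):
    """Standard normal PDF, used here as the density g of the reduced form error."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def correction_h(f, lo=-8.0, hi=8.0, n=20000):
    """Midpoint-rule approximation of the integral of f(v) * v * phi(v) over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        v = lo + (k + 0.5) * step
        total += f(v) * v * phi(v)
    return total * step

index = 0.7  # hypothetical value of the reduced form index

# Probit-type reduced form: the treatment equals one when index + v >= 0.
h_probit = correction_h(lambda v: 1.0 if index + v >= 0 else 0.0)

print(h_probit, phi(index))
```

The numerical value agrees closely with $\phi(\text{index})$, which is the closed form used for binary treatments in the next section.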

4. Examples

I will now derive the correction function and define the procedure for testing and
correcting for endogeneity when the treatment is a tobit or probit variable. This section

extends Wooldridge (2007) to deal with a panel.

4.1 Probit treatment variables

For $j = 1,\dots,K_1$, let $w_{itj}$ represent a binary and possibly endogenous treatment, where
$w_{it} = (w_{it1}, \dots, w_{itK_1})$. Let $n_{it} = (x_{it}, z_{it})$ be a $1 \times (K_2 + L)$ vector of exogenous covariates
and IVs. Under Assumptions 2.1, 2.2 and equation (3.3), let us make the following binary
distributional assumption for $w_{itj}$:

$$w_{itj} = 1[n_{it}\theta_j + c_{i2} + v_{itj} \geq 0] \tag{4.1}$$

$$v_{itj} \mid n_i \sim Normal(0, \sigma^2_{vtj}) \tag{4.2}$$

where $c_{i2}$ is unobserved heterogeneity. We can allow for correlation between $c_{i2}$ and $n_i$ as
in Mundlak (1978) and write $c_{i2} = \alpha_{20} + \bar{n}_i\alpha_{21} + a_{i2}$ where $a_{i2} \sim Normal(0, 1)$. Restricting
$c_{i2}$, we can rewrite (4.1) and (4.2) as

$$w_{itj} = 1[\alpha_{20} + n_{it}\theta_j + \bar{n}_i\alpha_{21} + p_{itj} \geq 0] \tag{4.3}$$

$$p_{itj} \mid n_i \sim Normal(0, \sigma^2_{ptj}) \tag{4.4}$$

where $p_{itj} = a_{i2} + v_{itj}$ and $Var(p_{itj}) = \sigma^2_{ptj}$. Due to time varying heteroskedasticity in (4.3),
the estimated parameters in (4.3) need to be rescaled. So, the response probability in (4.3) is
written as

$$P(w_{itj} = 1 \mid n_i) = \Phi(\alpha_{20t} + n_{it}\theta_{tj} + \bar{n}_i\alpha_{21t}) = \Phi(n_i\pi_{tj}) \tag{4.5}$$

where $\pi_{tj} = (\alpha_{20t}, \theta_{tj}', \alpha_{21t}')'$ and $\pi_{tj} = \pi_j/\sigma_{ptj}$. Using the general framework described in
the previous section, we can now write as in Wooldridge (2007)

$$\begin{aligned} E(w_{itj}p_{itj} \mid n_i) &= \int_{-\infty}^{+\infty} 1[n_i\pi_j + p_{itj} \geq 0]\,p_{itj}\,\phi_j(p_{itj})\,dp_{itj} \\ &= \int_{-n_i\pi_j}^{\infty} p_{itj}\,\phi_j(p_{itj})\,dp_{itj} \\ &= \phi_j(n_i\pi_{tj}) \end{aligned} \tag{4.6}$$
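Using (4.6) we can now write the following estimating equation

$$\begin{aligned} y_{it} = {} & a_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + vec[(\bar{x}_i - \psi_X) \otimes w_{it}]'\beta_1 \\ & + vec[(\bar{z}_i - \psi_Z) \otimes w_{it}]'\beta_2 \\ & + vec[(x_{it} - \psi_{tX}) \otimes w_{it}]'\beta_3 + \phi_{it}(n_i\pi_t)\rho + r_{it} \end{aligned} \tag{4.7}$$

where $\phi_{it}(n_i\pi_t) = (\phi_{it1}, \dots, \phi_{itK_1})$. Each treatment $w_{itj}$, $j = 1,\dots,K_1$, follows a
univariate probit. We are not imposing a multinomial logit or probit for the treatments. The
following is a procedure for estimating (4.7).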
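Procedure 4.1

1. For each $j = 1,\dots,K_1$, do $T$ cross section probits by regressing $w_{itj}$ on $n_i$ in order to
estimate $\hat\pi_{tj}$. Using the fitted values, form the predicted probabilities $\Phi_j(n_i\hat\pi_{tj})$ along
with $\phi_j(n_i\hat\pi_{tj})$.⁶

2. Form the $1 \times K_1$ vectors $\hat\Phi_{it}(n_i\hat\pi_t) = (\hat\Phi_{it1}, \dots, \hat\Phi_{itK_1})$ and
$\hat\phi_{it}(n_i\hat\pi_t) = (\hat\phi_{it1}, \dots, \hat\phi_{itK_1})$.

3. Estimate the following equation by any IV procedure, including GMM,

$$\begin{aligned} y_{it} = {} & a_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + vec[(\bar{x}_i - \hat\psi_X) \otimes w_{it}]'\beta_1 \\ & + vec[(\bar{z}_i - \hat\psi_Z) \otimes w_{it}]'\beta_2 + vec[(x_{it} - \hat\psi_{tX}) \otimes w_{it}]'\beta_3 \\ & + \hat\phi_{it}(n_i\hat\pi_t)\rho + error_{it} \end{aligned} \tag{4.8}$$

using the following instruments
$[1, x_{it}, \bar{x}_i, \bar{z}_i, \hat\phi_{it}, vec[(x_{it} - \hat\psi_{tX}) \otimes \hat\Phi_{it}]', vec[(\bar{x}_i - \hat\psi_X) \otimes \hat\Phi_{it}]',$
$vec[(\bar{z}_i - \hat\psi_Z) \otimes \hat\Phi_{it}]']$.

4. Using an asymptotic and robust Wald statistic, test $H_0 : \rho = 0$ vs. $H_A : \rho \neq 0$. If the null
hypothesis is rejected, adjust the standard errors to account for the first stage estimation
of $\hat\pi_t$. See Appendix A.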

Procedure 4.1 provides an effective method to consistently estimate the ATE in a CRC
panel data model. The procedure is fairly easy to implement and is an effective way to test

for endogeneity in the treatment effect.
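
Steps 1 and 2 of Procedure 4.1 can be sketched in code. The example below is only a schematic illustration on simulated data: it assumes a single treatment and a single scalar regressor (rather than the full vector $n_i$ with time averages), and the helper `fit_probit` is a hypothetical Fisher scoring routine written for this sketch, not a library function. The IV step and the Wald test (steps 3 and 4) are not shown.

```python
import math
import random

def Phi(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def phi(a):
    """Standard normal PDF."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def fit_probit(x, y, iters=25):
    """Probit MLE of P(y = 1 | x) = Phi(b0 + b1 * x) by Fisher scoring (one regressor)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            a = b0 + b1 * xi
            p = min(max(Phi(a), 1e-10), 1.0 - 1e-10)
            d = phi(a)
            s = d * (yi - p) / (p * (1.0 - p))   # score weight
            w = d * d / (p * (1.0 - p))          # expected information weight
            g0 += s
            g1 += s * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det        # solve the 2x2 scoring system
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated data with hypothetical period-specific reduced form coefficients.
random.seed(0)
N, T = 1000, 2
true_pi = [(0.2, 0.8), (-0.1, 1.2)]
n = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
w = [[1 if a0 + a1 * n[t][i] + random.gauss(0, 1) >= 0 else 0 for i in range(N)]
     for t, (a0, a1) in enumerate(true_pi)]

pi_hat, Phi_hat, phi_hat = [], [], []
for t in range(T):                          # step 1: one cross section probit per period
    b0, b1 = fit_probit(n[t], w[t])
    pi_hat.append((b0, b1))
    idx = [b0 + b1 * v for v in n[t]]
    Phi_hat.append([Phi(a) for a in idx])   # step 2: predicted probabilities
    phi_hat.append([phi(a) for a in idx])   # step 2: correction function terms

print(pi_hat)
```

The lists `Phi_hat` and `phi_hat` play the roles of $\hat\Phi_{it}$ and $\hat\phi_{it}$: the latter enters the estimating equation as a regressor, and the former is interacted with the demeaned covariates to build instruments.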

4.2 Tobit treatment variables

In some cases, the treatment indicator is not a binary response variable. For example, the
number of hours spent in a job training program can act as a treatment to get a potential
worker into the workforce. For some fraction of the workforce, the number of hours
observed in a job training program is zero. Therefore, one can think of hours in a training
program as a tobit variable. In this subsection, we maintain Assumptions 2.1 and 2.2, but

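now write in place of (4.1)

$$w_{itj} = \max(0, n_{it}\theta_j + c_{i2} + v_{itj}) \tag{4.9}$$

Again, allowing for dependence between $c_{i2}$ and $n_i$, we can write (4.9) as

$$\begin{aligned} w_{itj} &= \max(0, \alpha_{20} + n_{it}\theta_j + \bar{n}_i\alpha_{21} + a_{i2} + v_{itj}) \\ &= \max(0, n_i\pi_j + p_{itj}) \end{aligned} \tag{4.10}$$

where $p_{itj} = a_{i2} + v_{itj}$ and $p_{itj} \mid n_i \sim Normal(0, \sigma^2_{ptj})$ as in the previous subsection. Having
defined the tobit reduced form for $w_{itj}$,

$$\begin{aligned} E(w_{itj}p_{itj} \mid n_i) &= \int_{-\infty}^{+\infty} \max(0, n_i\pi_j + p_{itj})\,p_{itj}\,\frac{1}{\sigma_{ptj}}\phi\!\left(\frac{p_{itj}}{\sigma_{ptj}}\right)dp_{itj} \\ &= \sigma^2_{ptj}\Phi(n_i\pi_{tj}) \end{aligned} \tag{4.11}$$

As in the previous subsection, $\pi_{tj} = \pi_j/\sigma_{ptj}$. Using (4.11), the estimating equation
can be written as follows,

$$\begin{aligned} y_{it} = {} & a_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + vec[(\bar{x}_i - \psi_X) \otimes w_{it}]'\beta_1 \\ & + vec[(\bar{z}_i - \psi_Z) \otimes w_{it}]'\beta_2 \\ & + vec[(x_{it} - \psi_{tX}) \otimes w_{it}]'\beta_3 + [\sigma^2_{pt}\Phi(n_i\pi_t)]\rho + r_{it} \end{aligned} \tag{4.12}$$

where $[\sigma^2_{pt}\Phi(n_i\pi_t)] = (\sigma^2_{pt1}\Phi(n_i\pi_{t1}), \dots, \sigma^2_{ptK_1}\Phi(n_i\pi_{tK_1}))$. Under standard tobit
mechanics, it can be shown that

$$E(w_{itj} \mid n_i) = \Phi(n_i\pi_{tj})\,n_i\pi_j + \sigma_{ptj}\phi(n_i\pi_{tj}) \tag{4.13}$$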
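
The closed forms in (4.11) and (4.13) can be checked by numerical integration. The sketch below assumes a single treatment and uses hypothetical values for the index $n_i\pi_j$ and the standard deviation $\sigma_{ptj}$; the helper `expect` is an illustrative midpoint-rule routine, not part of the original derivation.

```python
import math

def Phi(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def phi(a):
    """Standard normal PDF."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def expect(fn, sigma, lo=-10.0, hi=10.0, n=40000):
    """Midpoint-rule approximation of E[fn(p)] for p ~ Normal(0, sigma^2)."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        z = lo + (k + 0.5) * step       # standardized value, p = sigma * z
        total += fn(sigma * z) * phi(z)
    return total * step

a, sigma = 0.6, 1.5   # hypothetical index n_i*pi_j and standard deviation sigma_ptj

lhs_411 = expect(lambda p: max(0.0, a + p) * p, sigma)    # E(w * p | n)
rhs_411 = sigma ** 2 * Phi(a / sigma)                      # closed form in (4.11)

lhs_413 = expect(lambda p: max(0.0, a + p), sigma)        # E(w | n)
rhs_413 = Phi(a / sigma) * a + sigma * phi(a / sigma)      # closed form in (4.13)

print(lhs_411, rhs_411)
print(lhs_413, rhs_413)
```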
The following is a procedure to test and correct for endogeneity when the treatment has a
tobit reduced form.

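Procedure 4.2

1. For each $j = 1,\dots,K_1$, do $T$ cross section tobits by regressing $w_{itj}$ on $n_i$ in order to
estimate $\hat\pi_{tj}$ and $\hat\sigma^2_{ptj}$. Using the fitted values, form the predicted probabilities
$\Phi_j(n_i\hat\pi_{tj})$ along with $\phi_j(n_i\hat\pi_{tj})$ and $\hat{w}_{itj} = \Phi_j(n_i\hat\pi_{tj})\,n_i\hat\pi_j + \hat\sigma_{ptj}\phi_j(n_i\hat\pi_{tj})$.

2. Form the $1 \times K_1$ vectors $\hat\Phi_{it}(n_i\hat\pi_t) = (\hat\Phi_{it1}, \dots, \hat\Phi_{itK_1})$, $\hat\phi_{it}(n_i\hat\pi_t) = (\hat\phi_{it1}, \dots, \hat\phi_{itK_1})$,
and $\hat{w}_{it} = (\hat{w}_{it1}, \dots, \hat{w}_{itK_1})$.

3. Estimate the following equation by any IV procedure, including GMM

$$\begin{aligned} y_{it} = {} & a_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + vec[(\bar{x}_i - \hat\psi_X) \otimes w_{it}]'\beta_1 \\ & + vec[(\bar{z}_i - \hat\psi_Z) \otimes w_{it}]'\beta_2 \\ & + vec[(x_{it} - \hat\psi_{tX}) \otimes w_{it}]'\beta_3 + [\hat\sigma^2_{pt}\Phi(n_i\hat\pi_t)]\rho + error_{it} \end{aligned} \tag{4.14}$$

using the following instruments
$[1, x_{it}, \bar{x}_i, \bar{z}_i, vec[(x_{it} - \hat\psi_{tX}) \otimes \hat{w}_{it}]', vec[(\bar{x}_i - \hat\psi_X) \otimes \hat{w}_{it}]',$
$vec[(\bar{z}_i - \hat\psi_Z) \otimes \hat{w}_{it}]', \hat\Phi_{it}]$.

4. Using an asymptotic and robust Wald statistic, test $H_0 : \rho = 0$ vs. $H_A : \rho \neq 0$. If the null
hypothesis is rejected, adjust the standard errors to account for the first stage estimation
of $\hat\pi_t$. See Appendix A.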

5. A simple test for slope heterogeneity using FE-IV

In this section, we consider a fixed-effects IV estimator to test for slope heterogeneity. Consider the following modified model,

$$y_{it} = c_i + x_{it}\eta + w_{it}b_{it} + u_{it} \qquad (5.1)$$

where $x_{it}$ is a $1 \times K$ vector of controls, $b_{it} = \beta + r_{it}$, and $E(b_{it}) = \beta$. However, in this section we leave $c_i$ unrestricted. Under the following assumption, the FE-IV estimator is consistent.

Assumption 5.1 For $t = 1, \ldots, T$,
$$E(u_{it} \mid x_i, z_{i1}, \ldots, z_{iT}, c_i, b_{i1}, \ldots, b_{iT}) = 0$$

Therefore, we perform a "fixed effects" analysis by allowing for arbitrary correlation between $c_i$ and $(x_{it}, z_{it}, b_{it})$. Assumption 5.1 is a natural strict exogeneity assumption on the instruments. Again, we impose structure on $b_{it}$ by imposing Assumption 2.1 iii),

$$E(r_{it} \mid x_i, z_i) = B_1(x_{it} - \psi_{tx})' \qquad (5.2)$$

Using (5.2), we can write (5.1) as

$$y_{it} = c_i + x_{it}\eta + w_{it}\beta + \mathrm{vec}[(x_{it} - \psi_{tx}) \otimes w_{it}]'\beta_1 + w_{it}e_{it} + u_{it} \qquad (5.3)$$

where $r_{it} = B_1(x_{it} - \psi_{tx})' + e_{it}$ and $\beta_1 = \mathrm{vec}[B_1]$.

However, as in Section 2, we need to account for the term $w_{it}e_{it}$. Using a variation of Assumption 2.3, assume that

$$E(w_{it}e_{it} \mid x_i, z_{i1}, \ldots, z_{iT}) = \rho h_{it}(x_i, z_i) \qquad (5.4)$$

If $w_{it}$ has the probit reduced form $w_{it} = 1[z_{it}\gamma_1 + x_{it}\delta_1 + \bar{z}_i\gamma_2 + \bar{x}_i\delta_2 + v_{it} \geq 0]$ and $E(e_{it} \mid v_{it}, z_i, x_i) = E(e_{it} \mid v_{it}) = \rho v_{it}$, then, as in Section 3.1, $E(w_{it}e_{it} \mid z_{i1}, \ldots, z_{iT}, x_i) = \rho\phi(z_{it}\gamma_1 + x_{it}\delta_1 + \bar{z}_i\gamma_2 + \bar{x}_i\delta_2)$. Using an argument similar to the one in Section 2, we can write

$$\kappa_{it}(x_i, z_i) \equiv E(w_{it}e_{it} \mid z_{i1}, \ldots, z_{iT}, x_i) \qquad (5.5)$$
$$\omega_{it} = w_{it}e_{it} - \kappa_{it}(x_i, z_i) \qquad (5.6)$$

Using (5.4), (5.5), and (5.6), write (5.3) as

$$y_{it} = c_i + x_{it}\eta + w_{it}\beta + \mathrm{vec}[(x_{it} - \psi_{tx}) \otimes w_{it}]'\beta_1 + \rho h_{it}(x_i, z_i) + r_{it} \qquad (5.7)$$

where $r_{it} = \omega_{it} + u_{it}$ is a composite error term with $E(r_{it} \mid z_{i1}, \ldots, z_{iT}, x_i) = 0$. To account for the unobserved heterogeneity term, $c_i$, time demean (5.7) by subtracting the time average of each term. In essence, we are testing whether there is a significant interaction effect. The procedure that follows is only a test of whether there is evidence of slope heterogeneity and a significant interaction effect.
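The probit identity used above, $E(w_{it}e_{it}) = \rho\phi(\text{index})$ when $w_{it} = 1[\text{index} + v_{it} \geq 0]$ and $E(e_{it} \mid v_{it}) = \rho v_{it}$, can be illustrated by simulation. This is only a sketch under assumed values: the index, $\rho$, the number of draws, and the unit-variance noise added to $e_{it}$ are hypothetical choices, not taken from the text.

```python
import math
import random


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simulate_mean_we(index, rho, draws=200_000, seed=0):
    """Monte Carlo estimate of E[w * e] for w = 1[index + v >= 0]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        v = rng.gauss(0.0, 1.0)
        # e is built so that E(e | v) = rho * v (assumed linear projection).
        e = rho * v + rng.gauss(0.0, 1.0)
        w = 1.0 if index + v >= 0.0 else 0.0
        total += w * e
    return total / draws


index, rho = 0.3, 0.5  # hypothetical values for illustration only
simulated = simulate_mean_we(index, rho)
theoretical = rho * norm_pdf(index)
print("simulated  E[w e]:", simulated)
print("rho * phi(index): ", theoretical)
```

The simulated mean should be close to $\rho\phi(\text{index})$, which is why the standard normal density evaluated at the fitted probit index serves as the correction function.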


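Procedure 5.1

1. Estimate the following reduced form for $w_{it}$ using pooled probit,

$$P(w_{it} = 1 \mid z_i, x_i) = \Phi(\zeta_t + z_{it}\gamma_1 + x_{it}\delta_1 + \bar{z}_i\gamma_2 + \bar{x}_i\delta_2) \qquad (5.8)$$

With the fitted values, generate $\hat{\phi}_{it} \equiv \phi(\hat{\zeta}_t + z_{it}\hat{\gamma}_1 + x_{it}\hat{\delta}_1 + \bar{z}_i\hat{\gamma}_2 + \bar{x}_i\hat{\delta}_2)$ and $\hat{\Phi}_{it} \equiv \Phi(\hat{\zeta}_t + z_{it}\hat{\gamma}_1 + x_{it}\hat{\delta}_1 + \bar{z}_i\hat{\gamma}_2 + \bar{x}_i\hat{\delta}_2)$.

2. Estimate the following equation by FE-IV,

$$\ddot{y}_{it} = \ddot{x}_{it}\eta + \ddot{w}_{it}\beta + \ddot{\mathrm{vec}}[(x_{it} - \psi_{tx}) \otimes w_{it}]'\beta_1 + \rho\ddot{\hat{\phi}}_{it}(x_i, z_i) + \mathrm{error}_{it} \qquad (5.9)$$

where double dots denote time-demeaned variables, for example $\ddot{\hat{\phi}}_{it} = \hat{\phi}_{it} - (1/T)\sum_{r=1}^{T}\hat{\phi}_{ir}$. Define the instruments to be $(\ddot{\hat{\phi}}_{it},\; \ddot{\hat{\Phi}}_{it},\; \ddot{x}_{it},\; \ddot{\mathrm{vec}}[(x_{it} - \psi_{tx}) \otimes \hat{\Phi}_{it}])$.

3. In order to test for a significant interaction effect, test $H_0: \rho = 0$ using an asymptotic t-statistic with a standard error that is robust to heteroskedasticity and serial correlation.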
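The generated terms in steps 1 and 2 can be sketched as follows. This is an illustration only: the fitted probit indices in `fitted_index` are hypothetical numbers for N = 2 units and T = 3 periods, and the helper names are not from the text. The probit itself is assumed to have been estimated elsewhere.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def correction_terms(index_panel):
    """Map fitted probit indices (one list per unit) to pdf and cdf panels."""
    pdf_panel = [[norm_pdf(k) for k in unit] for unit in index_panel]
    cdf_panel = [[norm_cdf(k) for k in unit] for unit in index_panel]
    return pdf_panel, cdf_panel


def within_transform(panel):
    """Subtract each unit's time average from its observations."""
    demeaned = []
    for unit in panel:
        average = sum(unit) / len(unit)
        demeaned.append([value - average for value in unit])
    return demeaned


# Hypothetical fitted indices: unit 0 varies over time, unit 1 does not.
fitted_index = [[-0.5, 0.0, 0.5], [1.0, 1.0, 1.0]]
pdf_panel, cdf_panel = correction_terms(fitted_index)
pdf_demeaned = within_transform(pdf_panel)
cdf_demeaned = within_transform(cdf_panel)
print(pdf_demeaned)
```

In this sketch, a unit whose fitted index does not change over time contributes only zeros after demeaning, so identification of $\rho$ in (5.9) rests on within-unit variation in the fitted index.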

6. Monte Carlo simulation

In this section, we conduct Monte Carlo simulations to study the robustness properties of the correction function estimator when the endogenous binary treatment has a probit reduced form. We consider the CRC panel data model presented in Section 2. For simplicity, we do not include any controls, so we instrument only for the binary endogenous treatment. Consider the following model,

$$y_{it} = a_i + b_i w_{it} + u_{it}, \quad t = 1, \ldots, T \qquad (6.1)$$

Note that $a_i$ and $b_i$ have the following data generating process (DGP)

pz
(1, = a+p1,,§,+ 1———%e‘-’ (6.2)

I


2
_ _ l p
b,=/3+p,,,(z,—z)+ 1——}ie§’ (6.3)

where $z_{it}$ is normally distributed, $\bar{z}_i = (1/T)\sum_{t=1}^{T} z_{it}$, $e_i^a \sim \mathrm{Normal}(0, 1)$, $e_i^b \sim \mathrm{Normal}(0, 1)$, and $u_{it} \sim \mathrm{Normal}(0, 1)$. The endogenous treatment has the following DGP

,1:
M It

w,, = l[w','-‘, > 0] (6.4)

_ .. l
— 7a,, '1' 301,, '1' Vi, +0, + b,)

where $v_{it} \sim \mathrm{Normal}(0, 1)$. Defining $\zeta_t$ as year dummies, the reduced form parameters are obtained by estimating the following equation using pooled probit,

$$P(w_{it} = 1 \mid z_i) = \Phi(\zeta_t + z_{it}\gamma_1 + \bar{z}_i\gamma_2) \qquad (6.5)$$

The estimating equation for the main regression is

$$y_{it} = \zeta_t + w_{it}\beta + \bar{z}_i\gamma_1 + w_{it}(\bar{z}_i - \bar{z})\beta_1 + \rho\hat{\phi}_{it} + \mathrm{error}_{it} \qquad (6.6)$$

where $\hat{\phi}_{it}$ is the correction function derived from the reduced form probit equation. The main results of the Monte Carlo simulations are presented in Table 8 and Table 9. Table 8 reports results for $z_{it} \sim \mathrm{Normal}(-1, 1)$. To allow for more variability in $z_{it}$, Table 9 reports results for $z_{it} \sim \mathrm{Normal}(-1, 4)$. We run simulations for $N = 500$, $1000$, and $1500$ with $T = 5$. The true population parameters are set at $\beta = 2$ and $\alpha = 1$. Year dummies are included in the reduced form probit estimating equation and in the structural estimating equation. The number of replications is 500.

Along with the mean, mean absolute error (MAE), standard deviation (SD), and root mean squared error (RMSE), the median, lower quantile (LQ), and upper quantile (UQ) of $\hat{\beta}$ are reported in Tables 8 and 9. As shown in Tables 8 and 9, the sample correlations between $w_{it}$ and $b_i$ and between $w_{it}$ and $a_i$, $\hat{\rho}_{wb}$ and $\hat{\rho}_{wa}$, are similar across the two specifications of $z_{it}$. The estimated correlation between $w_{it}$ and $u_{it}$, $\hat{\rho}_{wu}$, falls when $z_{it} \sim \mathrm{Normal}(-1, 4)$. However, there is still evidence of substantial correlation between $w_{it}$ and $u_{it}$ whether $z_{it}$ is $\mathrm{Normal}(-1, 1)$ or $\mathrm{Normal}(-1, 4)$. As predicted by the theory, the so-called "correction function" estimator for the panel is consistent. The FE-IV estimator is also fairly well behaved, especially when $z_{it} \sim \mathrm{Normal}(-1, 1)$. For example, the FE-IV estimator has a smaller mean squared error than the correction function estimator when $z_{it} \sim \mathrm{Normal}(-1, 1)$. As shown in Tables 8 and 9, the FE-IV estimator is more efficient, especially when $z_{it} \sim \mathrm{Normal}(-1, 1)$. When $z_{it}$ has more variability, the correction function estimator is almost as efficient as the FE-IV estimator. The fact that the correction function estimator is not as efficient as FE-IV when $z_{it} \sim \mathrm{Normal}(-1, 1)$ is not surprising and suggests that the correction function is highly collinear with the instrument, $z_{it}$. However, the correction function estimator has a lower RMSE and bias than the FE-IV estimator when the variability of $z_{it}$ increases. As expected, POLS, IV, and FE are not consistent: their estimated values for the ATE are not close to the population value set in the simulations. However, the correction function estimator does not perform very well in terms of the estimated mean when $z_{it} \sim \mathrm{Normal}(-1, 1)$ and $N = 500$. This is not surprising, since we would expect the correction function estimator to perform better when there are more cross-sectional observations as well as more variability in $z_{it}$. Wooldridge (2007) points out that the correction function estimator performs better when there is a large number of cross-sectional observations.
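The summary statistics reported in Tables 8 and 9 can be computed from a list of replication estimates along the following lines. This is a sketch under stated assumptions: LQ and UQ are taken to be the 25th and 75th percentiles, the SD is the sample standard deviation, and the function name and the toy input are hypothetical. The text does not say which conventions were actually used.

```python
import math
import statistics


def monte_carlo_summary(estimates, true_value):
    """Summarize replication estimates of a parameter with known true value."""
    errors = [b - true_value for b in estimates]
    # Quartiles by linear interpolation between order statistics (assumed).
    lower, median, upper = statistics.quantiles(
        estimates, n=4, method="inclusive"
    )
    return {
        "mean": statistics.fmean(estimates),
        "sd": statistics.stdev(estimates),
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),
        "mae": statistics.fmean([abs(e) for e in errors]),
        "lq": lower,
        "median": median,
        "uq": upper,
    }


# Toy input for illustration; not simulation output from the text.
summary = monte_carlo_summary([1.0, 2.0, 3.0, 4.0, 5.0], true_value=3.0)
for name, value in summary.items():
    print(name, round(value, 4))
```

Because the squared RMSE equals the squared bias plus the variance (with the population variance convention), an estimator can have a small SD and still a large RMSE, which is the pattern POLS shows in both tables.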


7. Empirical Example: Michigan Schools of Choice Program

Starting in 1997, school districts in Michigan were allowed to enroll nonresident students without having to obtain permission from the student's district of residence. In this section, we use the correction function estimator to estimate the ATE of the school choice program on student performance. As a measure of student performance, we use satisfactory math pass rates for fourth graders. Controlling for real expenditures and eligibility in a school lunch program, we estimate the following equation,

$$pass4_{it} = a_t + c_i + \eta_1 \log(rexpp)_{it} + \eta_2\, lunch_{it} + b_{it}\, choice_{it} + u_{it} \qquad (7.1)$$

where $pass4_{it}$ is the satisfactory math pass rate for fourth graders, $choice_{it}$ is a binary indicator of whether a school district had a choice program in a particular year, $\log(rexpp)_{it}$ is the log of real expenditure per student in 1997 dollars, $lunch_{it}$ is the percentage of students eligible for a free lunch, $c_i$ is a district-specific effect, and $a_t$ are year effects. The sample consists of 550 school districts for the years 1997 and 1998. A school district is assumed to have a choice program if it has more than zero choice students. In equation (7.1), we assume that $choice_{it}$ is not strictly exogenous. For example, a particular school district may not admit students from another school district if administrators determine that performance in the previous year declined due to external and uncontrollable factors. Table 10 gives the summary statistics for the control variables, $choice_{it}$, and math pass rates for districts with and without a choice program. As shown in Table 10, school districts with a choice program have lower mean enrollment and slightly lower mean math pass rates than school districts without a choice program.

In order to test the effect of the choice program on math pass rates, we estimate (7.1) using the correction function estimator, FE-IV, FE, pooled IV, and pooled OLS. We instrument for $choice_{it}$ using the log of district enrollment, which we assume to be strictly exogenous. The robust t-statistic on log enrollment in the reduced form probit equation with controls is 3.07, while the usual non-robust t-statistic is 2.44, which suggests that log enrollment is a relevant instrument for the binary choice program variable. The results are presented in Table 11. Unfortunately, the correction function estimator does not perform as well as the Monte Carlo simulations would indicate, perhaps because there is not enough time variation in the data. There is evidence of multicollinearity in the estimates in column 1 of Table 11. However, the FE-IV estimator that tests for slope heterogeneity and the interaction effect performs somewhat better. There is strong statistical evidence of an interaction effect, as indicated by the t-statistic on the pdf function: its coefficient is statistically significant at the 1% level. Although the FE-IV estimate indicates that there is a positive impact of the school choice program on student performance, the coefficient on $choice_{it}$ is not statistically significant at the 10% level. For comparison, estimates from the FE, pooled IV, and pooled OLS regressions indicate that the choice program has a negative impact on student performance. However, only the POLS coefficient on $choice_{it}$ is statistically significant at the 10% level.

8. Conclusion

In this paper, we have estimated the ATE for a panel using a correlated random coefficient model. Unlike previous work on estimating APEs for a panel, our model allows for discrete treatments. In addition to allowing for discreteness in the treatment variable, we also allow for endogeneity. An advantage of using the correction function estimator or the FE-IV correction function estimator presented in Section 5 is that no distributional assumptions are required to restrict the unobserved heterogeneity. In order to derive the correction function and account for endogeneity, fairly strong assumptions are required regarding the nature of the discrete treatment. However, since the treatment is observed, it is fairly easy to make distributional assumptions regarding the endogenous treatment. In the simulations and in the empirical example presented in the paper, we used pooled probit to estimate the reduced form for the endogenous treatment. Monte Carlo simulations show that the correction function estimator performs well in finite samples, especially when there is a large number of cross-sectional observations. We applied the methods developed in this paper in an empirical example examining the effect of the Michigan schools of choice program on student performance. In the future, we expect the estimator developed in this paper to be used in the program evaluation literature to test the effect of certain policies over a fixed time period.


Table 8: Monte Carlo Results. 500 Replications. Year Dummies Included.
$\beta = 2$, $z_{it} \sim \mathrm{Normal}(-1, 1)$. Columns (6) to (8) report the sample correlations $\hat{\rho}_{wb}$, $\hat{\rho}_{wa}$, and $\hat{\rho}_{wu}$.

                 (1)      (2)      (3)      (4)      (5)         (6)      (7)      (8)
Estimator        POLS     IV       FE       FE-IV    Corr. Fxn   rho_wb   rho_wa   rho_wu

T = 5, N = 500
Mean             3.74     2.385    2.61     1.64     1.93        0.34     0.34     0.31
SD               0.084    0.216    0.078    0.18     0.76
RMSE             1.737    0.45     0.62     0.429    0.75
MAE              1.735    0.39     0.61     0.378    0.55
LQ               3.68     2.23     2.56     1.51     1.54
Median           3.74     2.38     2.61     1.63     1.98
UQ               3.79     2.54     2.66     1.75     2.38

T = 5, N = 1000
Mean             3.73     2.396    2.61     1.65     1.99        0.34     0.34     0.31
SD               0.059    0.15     0.058    0.129    0.475
RMSE             1.73     0.439    0.628    0.398    0.488
MAE              1.73     0.398    0.619    0.361    0.372
LQ               3.69     2.28     2.57     1.55     1.71
Median           3.74     2.40     2.62     1.65     2.02
UQ               3.77     2.50     2.65     1.73     2.29

T = 5, N = 1500
Mean             3.74     2.41     2.62     1.66     1.99        0.34     0.34     0.31
SD               0.047    0.12     0.047    0.107    0.393
RMSE             1.74     0.44     0.631    0.379    0.407
MAE              1.74     0.42     0.623    0.349    0.309
LQ               3.71     2.33     2.59     1.58     1.75
Median           3.73     2.40     2.62     1.66     1.99
UQ               3.77     2.48     2.65     1.73     2.25

Table 9: Monte Carlo Results. 500 Replications. Year Dummies Included.
$\beta = 2$, $z_{it} \sim \mathrm{Normal}(-1, 4)$. Columns (6) to (8) report the sample correlations $\hat{\rho}_{wb}$, $\hat{\rho}_{wa}$, and $\hat{\rho}_{wu}$.

                 (1)      (2)      (3)      (4)      (5)         (6)      (7)      (8)
Estimator        POLS     IV       FE       FE-IV    Corr. Fxn   rho_wb   rho_wa   rho_wu

T = 5, N = 500
Mean             3.49     3.04     2.19     1.82     1.99        0.355    0.361    0.16
SD               0.09     0.121    0.074    0.094    0.099
RMSE             1.49     1.05     0.239    0.242    0.16
MAE              1.48     1.04     0.196    0.194    0.086
LQ               3.42     2.96     2.14     1.76     1.93
Median           3.48     3.03     2.19     1.82     2.00
UQ               3.54     3.12     2.25     1.88     2.07

T = 5, N = 1000
Mean             3.48     3.05     2.19     1.82     2.00        0.356    0.361    0.15
SD               0.064    0.086    0.049    0.066    0.068
RMSE             1.48     1.05     0.23     0.228    0.143
MAE              1.48     1.05     0.193    0.186    0.063
LQ               3.43     2.99     2.15     1.78     1.95
Median           3.48     3.05     2.18     1.82     2.00
UQ               3.52     3.10     2.22     1.86     2.05

T = 5, N = 1500
Mean             3.49     3.05     2.19     1.83     2.00        0.356    0.361    0.16
SD               0.049    0.066    0.041    0.054    0.059
RMSE             1.49     1.06     0.23     0.221    0.140
MAE              1.49     1.06     0.20     0.181    0.056
LQ               3.46     3.01     2.16     1.79     1.96
Median           3.49     3.04     2.20     1.82     2.00
UQ               3.52     3.09     2.22     1.86     2.04

Table 10: Summary Statistics. Mean values. Standard deviations in parentheses.

Variables                    Choice = 1     Choice = 0
Enrollment                   2359           3486.5
                             (2819.9)       (10103.71)
4th grade math pass rate     66.01          67.989
                             (16.19)        (15.84)
Expenditures (1997$)         5948.63        6083.55
                             (843.68)       (1074.66)
Lunch eligibility            30.57          26.96
                             (14.41)        (17.02)
Number of obs.               437            663

Table 11: ATE Estimates. Robust standard errors in parentheses. Year dummies included but not reported.

                 (1)          (2)           (3)        (4)          (5)
                 Corr. Fxn    FE-IV         FE         Pooled IV    POLS
choice           -30.19       19.33         -0.467     -33.78       -1.51*
                 (52.21)      (15.53)       (1.56)     (30.30)      (0.884)
log(rexpp)       34.75        -15.67        7.55       2.77         10.19***
                 (188.67)     (28.85)       (6.29)     (8.59)       (3.27)
lunch            0.361        0.185         -0.073     -0.264**     -0.379***
                 (6.27)       (0.267)       (0.245)    (0.126)      (0.034)
pdf              -14.91       -244.52***    -          -            -
                 (167.9)      (80.93)

*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.
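The significance markers in Table 11 can be cross-checked from the reported coefficients and robust standard errors. The sketch below assumes two-sided tests with standard normal critical values (1.645, 1.960, 2.576). The text does not state which reference distribution was used, so this is an assumption, and the dictionary layout is purely illustrative.

```python
def significance_stars(coefficient, standard_error):
    """Return the star marker implied by |t| under normal critical values."""
    t_ratio = abs(coefficient / standard_error)
    if t_ratio >= 2.576:
        return "***"
    if t_ratio >= 1.960:
        return "**"
    if t_ratio >= 1.645:
        return "*"
    return ""


# (coefficient, robust standard error) pairs copied from Table 11.
table_11 = {
    "choice": {
        "Corr. Fxn": (-30.19, 52.21),
        "FE-IV": (19.33, 15.53),
        "FE": (-0.467, 1.56),
        "Pooled IV": (-33.78, 30.30),
        "POLS": (-1.51, 0.884),
    },
    "lunch": {
        "Pooled IV": (-0.264, 0.126),
        "POLS": (-0.379, 0.034),
    },
    "pdf": {
        "Corr. Fxn": (-14.91, 167.9),
        "FE-IV": (-244.52, 80.93),
    },
}

stars = {
    variable: {
        estimator: significance_stars(coef, se)
        for estimator, (coef, se) in columns.items()
    }
    for variable, columns in table_11.items()
}
print(stars)
```

Under these assumed critical values the implied markers match those printed in the table, including the insignificant FE-IV coefficient on choice and the 1% significance of the FE-IV coefficient on the pdf term.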


Appendix A
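In this section, I show how to derive robust standard errors when the reduced form for the endogenous treatment has the probit form. Consider the following model from Section 4 with a scalar endogenous treatment (see footnote 7),

$$y_{it} = g_{it}\Gamma + e_{it} \qquad (A.1)$$

where $g_{it} = (1,\; \bar{x}_i,\; \bar{z}_i,\; x_{it},\; w_{it},\; (\bar{x}_i - \psi_{\bar{x}})w_{it},\; (\bar{z}_i - \psi_{\bar{z}})w_{it},\; (x_{it} - \psi_{tx})w_{it},\; \phi_{it})$, $\Gamma = (\theta_1, \theta_2', \theta_3', \eta', \beta, \beta_1', \beta_2', \beta_3', \rho)'$, and $E(e_{it} \mid w_{it}, x_i, z_i) = 0$. Note that $\Gamma$ is $(3 + 4K + 2L) \times 1$. Define the generated instruments in time period $t$ by the $1 \times (3 + 4K + 2L)$ vector $\hat{h}_{it} = (1,\; \bar{x}_i,\; \bar{z}_i,\; x_{it},\; \hat{\Phi}_{it},\; (x_{it} - \psi_{tx})\hat{\Phi}_{it},\; (\bar{x}_i - \psi_{\bar{x}})\hat{\Phi}_{it},\; (\bar{z}_i - \psi_{\bar{z}})\hat{\Phi}_{it},\; \hat{\phi}_{it})$, which means that the ATE $\beta$ is just identified. Using instruments $\hat{h}_{it}$, the pooled 2SLS estimator is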

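$$\hat{\Gamma} = \left[\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\hat{h}_{it}\right)\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{h}_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{g}_{it}\right)\right]^{-1} \times \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\hat{h}_{it}\right)\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{h}_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'y_{it}\right) \qquad (A.2)$$

Multiplying through by $\sqrt{N}$, writing (A.1) as $y_{it} = \hat{g}_{it}\Gamma + (g_{it} - \hat{g}_{it})\Gamma + e_{it}$, and plugging this into (A.2), one can write

$$\sqrt{N}(\hat{\Gamma} - \Gamma) = \left[\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\hat{h}_{it}\right)\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{h}_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{g}_{it}\right)\right]^{-1} \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\hat{h}_{it}\right)\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{h}_{it}\right)^{-1} \times N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\left[(g_{it} - \hat{g}_{it})\Gamma + e_{it}\right] \qquad (A.3)$$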
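Under standard regularity conditions, (A.3) is written as

$$\sqrt{N}(\hat{\Gamma} - \Gamma) = (C'D^{-1}C)^{-1}C'D^{-1}\left(N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\left[\Gamma'(g_{it} - \hat{g}_{it})' + e_{it}\right]\right) + o_p(1) \qquad (A.4)$$

where $C \equiv E[\sum_{t=1}^{T} h_{it}'g_{it}]$ and $D \equiv E[\sum_{t=1}^{T} h_{it}'h_{it}]$. Using a mean value expansion, one can show that

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'e_{it} = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}h_{it}'e_{it} + \left[N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\nabla_\delta h_{it}')e_{it}\right]\sqrt{N}(\hat{\delta} - \delta) + o_p(1) \qquad (A.5)$$

where $(\nabla_\delta h_{it}')$ is the $(3 + 4K + 2L) \times (1 + 2L)$ Jacobian with respect to the probit reduced form parameter vector, $\delta$. Since $E(e_{it} \mid x_i, z_i) = 0$, it follows that $E[\nabla_\delta h_{it}'e_{it}] = 0$. Hence, one can write that

$$N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\nabla_\delta h_{it}')e_{it} = o_p(1) \qquad (A.6)$$

Since $\sqrt{N}(\hat{\delta} - \delta) = O_p(1)$,

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'e_{it} = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}h_{it}'e_{it} + o_p(1) \qquad (A.7)$$

Using another mean value expansion, one can write

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\Gamma'(g_{it} - \hat{g}_{it})' = -\left[N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}h_{it}'\Gamma'(\nabla_\delta g_{it}')\right]\sqrt{N}(\hat{\delta} - \delta) + o_p(1) = -F\sqrt{N}(\hat{\delta} - \delta) + o_p(1) \qquad (A.8)$$

where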

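$$\nabla_\delta g_{it}' = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ -k_{it}(k_{it}\delta)\phi_{it} \end{pmatrix} \qquad (A.9)$$

is the $(3 + 4K + 2L) \times (1 + 2L)$ Jacobian with respect to the probit reduced form parameter vector, $\delta$, $F \equiv E[\sum_{t=1}^{T} h_{it}'\Gamma'(\nabla_\delta g_{it}')]$, and $k_{it} = (1, z_{it}, \bar{z}_i)$ is $1 \times (1 + 2L)$. Also note that the limiting distribution of the first stage estimator is

$$\sqrt{N}(\hat{\delta} - \delta) = \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}A_{it}(\delta)\right)^{-1}\left(N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}s_{it}(\delta)\right) + o_p(1) = P^{-1}\left(N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}s_{it}(\delta)\right) + o_p(1) \qquad (A.10)$$

where

$$A_{it}(\delta) = \frac{\{\phi(k_{it}\delta)\}^2 k_{it}'k_{it}}{\Phi(k_{it}\delta)[1 - \Phi(k_{it}\delta)]} \qquad (A.11)$$

$$s_{it}(\delta) = \frac{\phi(k_{it}\delta)k_{it}'[w_{it} - \Phi(k_{it}\delta)]}{\Phi(k_{it}\delta)[1 - \Phi(k_{it}\delta)]} \qquad (A.12)$$

$E[s_{it}(\delta)] = 0$ and $P \equiv E[\sum_{t=1}^{T} A_{it}(\delta)]$. Therefore,

$$\sqrt{N}(\hat{\Gamma} - \Gamma) = (C'D^{-1}C)^{-1}C'D^{-1}N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[h_{it}'e_{it} - FP^{-1}s_{it}(\delta)\right] + o_p(1) \qquad (A.13)$$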
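Two properties of the probit terms in (A.11) and (A.12) are used implicitly here: the score has mean zero, and its second moment equals $A_{it}(\delta)$. The sketch below checks both for a single observation with a scalar regressor, which is a simplification assumed purely for illustration (in the text $k_{it}$ is a vector). The numerical values of `k` and `delta` are hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_score(k, delta, w):
    """Scalar analogue of s_it(delta) in (A.12)."""
    index = k * delta
    prob = norm_cdf(index)
    return norm_pdf(index) * k * (w - prob) / (prob * (1.0 - prob))


def probit_expected_hessian(k, delta):
    """Scalar analogue of A_it(delta) in (A.11)."""
    index = k * delta
    prob = norm_cdf(index)
    return norm_pdf(index) ** 2 * k * k / (prob * (1.0 - prob))


k, delta = 1.3, 0.4  # hypothetical regressor value and coefficient
prob_one = norm_cdf(k * delta)

# Exact expectations over w in {0, 1}, with P(w = 1) = Phi(k * delta).
mean_score = (prob_one * probit_score(k, delta, 1.0)
              + (1.0 - prob_one) * probit_score(k, delta, 0.0))
second_moment = (prob_one * probit_score(k, delta, 1.0) ** 2
                 + (1.0 - prob_one) * probit_score(k, delta, 0.0) ** 2)

print("E[s]      :", mean_score)
print("E[s^2]    :", second_moment)
print("A(delta)  :", probit_expected_hessian(k, delta))
```

The zero-mean score is what allows the first-stage term $FP^{-1}s_{it}(\delta)$ to enter (A.13) as an additional mean-zero component of the influence function.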

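By the Central Limit Theorem,

$$\sqrt{N}(\hat{\Gamma} - \Gamma) \xrightarrow{d} \mathrm{Normal}\left(0,\; (C'D^{-1}C)^{-1}C'D^{-1}LD^{-1}C(C'D^{-1}C)^{-1}\right) \qquad (A.14)$$

where

$$L \equiv \mathrm{Var}\left[\sum_{t=1}^{T}\left(h_{it}'e_{it} - FP^{-1}s_{it}(\delta)\right)\right] \qquad (A.15)$$

Hence, the asymptotic variance of $\hat{\Gamma}$, $\mathrm{Avar}(\hat{\Gamma})$, is estimated as

$$\widehat{\mathrm{Avar}}(\hat{\Gamma}) = (\hat{C}'\hat{D}^{-1}\hat{C})^{-1}\hat{C}'\hat{D}^{-1}\hat{L}\hat{D}^{-1}\hat{C}(\hat{C}'\hat{D}^{-1}\hat{C})^{-1}/N \qquad (A.16)$$

where

$$\hat{C} \equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{g}_{it} \qquad (A.17)$$

$$\hat{D} \equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{h}_{it} \qquad (A.18)$$

$$\hat{P} \equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}A_{it}(\hat{\delta}) \qquad (A.19)$$

$$\hat{F} \equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{\Gamma}'(\nabla_\delta \hat{g}_{it}') \qquad (A.20)$$

$$\hat{L} \equiv \frac{1}{N}\sum_{i=1}^{N}\left[\sum_{t=1}^{T}\left(\hat{h}_{it}'\hat{e}_{it} - \hat{F}\hat{P}^{-1}s_{it}(\hat{\delta})\right)\right]\left[\sum_{t=1}^{T}\left(\hat{h}_{it}'\hat{e}_{it} - \hat{F}\hat{P}^{-1}s_{it}(\hat{\delta})\right)\right]' \qquad (A.21)$$

and $\hat{e}_{it} = y_{it} - \hat{g}_{it}\hat{\Gamma}$.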

Footnotes
1. When one treats 01,, as parameters to estimate in a ﬁxed effects context, one also
needs to assume that u ,,1 is white noise. Such a strong assumption on the idiosyncratic
errors is not needed in a time constant unobserved effects model when the unobserved effect
is treated as a parameter to estimate.
2. The general transformation used by Holtz-Eakin, Newey, and Rosen (1988) involves differencing the previous time period from the current time period.

$$(y_{it} - x_{it}\beta) - \frac{\theta_t}{\theta_{t-1}}(y_{i,t-1} - x_{i,t-1}\beta) = u_{it} - \frac{\theta_t}{\theta_{t-1}}u_{i,t-1}$$

In fact, the unobserved effect is eliminated when any two time periods are differenced.

3. One can just as well write (3.1) in Chapter 2 with an additive error,

$$y_{it} = v_i\exp(x_{it}\beta + \rho s_{i,t+1}) + e_{it}$$

where, under Assumption 2.2, $E(e_{it} \mid x_i, v_i, s_i) = 0$. Notice that

$$u_{it} = 1 + \frac{e_{it}}{v_i\exp(x_{it}\beta + \rho s_{i,t+1})}$$

However, nothing is assumed about the data generating process regarding (3.1) in Chapter 2.
4. The outer product of the score function can also be used in the formula for the LM

statistic.

N N 1 N
LM= 2003.) 200309-031) 20030 ~ 2%
i=1 i=1 i=1
5. It is important to note that we are not necessarily specifying a correctly speciﬁed

conditional mean function. We are merely using sel,, as an additional variable to do a test

for selection bias.


6. One can also perform pooled probit to generate the predicted probabilities and
correction function.

7. The analysis can easily be extended to cases of multiple treatments.


REFERENCES

Ahn, S., Lee, Y., and Schmidt, P. (2001), "GMM Estimation of Linear Panel Data Models with Time-Varying Individual Effects," Journal of Econometrics 101, 219-255.

Blackburn, M. (2007), "Estimating Wage Differentials Without Logarithms," Labour Economics 14, 73-98.

Blundell, R., Griffith, R., and Windmeijer, F. (2002), "Individual Effects and Dynamics in Count Data Models," Journal of Econometrics 108, 113-131.

Card, D. (2001), "Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems," Econometrica 69, 1127-1160.

Chamberlain, G. (1984), "Panel Data," in Handbook of Econometrics, Volume 2, ed. Z. Griliches and M.D. Intriligator. Amsterdam: North Holland, 1247-1318.

Chamberlain, G. (1992), "Comment: Sequential Moment Restrictions in Panel Data," Journal of Business and Economic Statistics 10, 20-26.

Dustmann, C., and Rochina-Barrachina, M.E. (2000), "Selection Correction in Panel Data Models: An Application to Labour Supply and Wages," mimeo, Department of Economics, University College London.

Garen, J. (1984), "The Returns to Schooling: A Selectivity Bias Approach with a Continuous Choice Variable," Econometrica 52, 1199-1218.

Gronau, R. (1974), "Wage Comparisons- A Selectivity Bias," Journal of Political Economy 82, 1119-1143.

Hausman, J.A., Hall, B.H., and Griliches, Z. (1984), "Econometric Models for Count Data with an Application to the Patents-R&D Relationship," Econometrica 52, 909-938.

Heckman, J.J., and Robb, R. (1986), "Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes," in: Wainer, H. (Ed.), Drawing Inferences from Self-Selected Samples. Springer-Verlag, New York, 63-113.

Holtz-Eakin, D., Newey, W.K., and Rosen, H. (1988), "Estimating Vector Autoregressions with Panel Data," Econometrica 56, 1371-1395.

Kyriazidou, E. (1997), "Estimation of a Panel Data Sample Selection Model," Econometrica 65, 1335-1364.

Mundlak, Y. (1978), "On the Pooling of Time Series and Cross-Section Data," Econometrica 46, 69-85.

Murtazashvili, I. and Wooldridge, J.M. (2005), "Fixed Effects Instrumental Variables Estimation in Correlated Random Coefficient Panel Data Models," Mimeo, Michigan State University Department of Economics.

Murtazashvili, I. (2007), "A Control Function Approach to Estimation of Correlated Random Coefficient Panel Data Models," Mimeo, Michigan State University Department of Economics.

Newey, W.K. (1984), "A Method of Moments Interpretation of Sequential Estimators," Economics Letters 14, 201-206.

Newey, W.K., and McFadden, D. (1994), "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Volume 4, ed. R.F. Engle and D. McFadden. Amsterdam: North Holland, 2111-2245.

Nijman, T. and Verbeek, M. (1992), "Testing for Selectivity Bias in Panel Data Models," International Economic Review 33, 681-703.

Pagan, A. (1984), "Econometric Issues in the Analysis of Regressions with Generated Regressors," International Economic Review 25, 221-247.

Rochina-Barrachina, M.E. (1999), "A New Estimator for Panel Data Sample Selection Models," Annales d'Economie et de Statistique 55-56, 153-182.

Santos Silva, J.M.C. and Tenreyro, S. (2006), "The Log of Gravity," The Review of Economics and Statistics 88(4), 641-658.

Semykina, A. and Wooldridge, J.M. (2005), "Estimating Panel Data Models in the Presence of Endogeneity and Selection: Theory and Application," Unpublished Manuscript: Department of Economics, Michigan State University, East Lansing, MI.

Terza, J.V. (1998), "Estimating Count Models with Endogenous Switching: Sample Selection and Endogenous Treatment Effects," Journal of Econometrics 84, 129-154.

Wooldridge, J.M. (1995), "Selection Corrections for Panel Data Models Under Conditional Mean Independence Assumptions," Journal of Econometrics 68, 115-132.

Wooldridge, J.M. (1997a), "Quasi-Likelihood Methods for Count Data," in Handbook of Applied Econometrics, Volume 2, ed. M.H. Pesaran and P. Schmidt. Oxford: Blackwell, 352-406.

Wooldridge, J.M. (1997b), "Multiplicative Panel Data Models without the Strict Exogeneity Assumption," Econometric Theory 13, 667-678.

Wooldridge, J.M. (1999), "Distribution-Free Estimation of Some Nonlinear Panel Data Models," Journal of Econometrics 90, 77-97.

Wooldridge, J.M. (2002), "Econometric Analysis of Cross Section and Panel Data," MIT Press.

Wooldridge, J.M. (2005a), "Fixed Effects and Related Estimators in Correlated Random Coefficient and Treatment Effect Panel Data Models," Review of Economics and Statistics 87, 385-390.

Wooldridge, J.M. (2005b), "Unobserved Heterogeneity and Estimation of Average Partial Effects," in Identification and Inference for Econometric Models: Essays in Honor of Thomas J. Rothenberg, D.W.K. Andrews and J.H. Stock (eds.), 27-55. Cambridge: Cambridge University Press.

Wooldridge, J.M. (2007), "Instrumental Variables Estimation of the Average Treatment Effect in Correlated Random Coefficient Models," forthcoming in Advances in Econometrics, Volume 21 (Modeling and Evaluating Treatment Effects in Econometrics), Daniel Millimet, Jeffrey Smith, and Edward Vytlacil (eds.). Elsevier.
