LIBRARY
Michigan State
University

This is to certify that the
dissertation entitled

PANEL DATA MODELS WITH UNOBSERVED EFFECTS
AND ENDOGENOUS EXPLANATORY VARIABLES

presented by

IRINA MURTAZASHVILI

has been accepted towards fulfillment
of the requirements for the

Ph.D. degree in Economics

Major Professor's Signature

Date

MSU is an affirmative-action, equal-opportunity employer

PANEL DATA MODELS WITH UNOBSERVED EFFECTS
AND ENDOGENOUS EXPLANATORY VARIABLES

By

Irina Murtazashvili

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

2007

ABSTRACT

PANEL DATA MODELS WITH UNOBSERVED EFFECTS
AND ENDOGENOUS EXPLANATORY VARIABLES

By
Irina Murtazashvili

This dissertation consists of three essays that address issues of estimation in panel data models with unobserved effects and endogenous explanatory variables. The first essay considers estimation of correlated random coefficient (CRC) panel data models with endogenous regressors. This chapter provides a set of conditions sufficient for consistency of a general class of fixed effects instrumental variables (FE-IV) estimators in the context of a CRC panel data model. The usual FE-IV estimator turns out to be fairly robust to the presence of neglected individual-specific slopes. Monte Carlo simulations suggest that, provided a full set of period dummy variables is included, the proposed FE-IV estimator of the population averaged effect (PAE) performs better than other estimators in finite samples when the endogenous explanatory variables are (roughly) continuous.

The second essay continues studying the CRC panel data model from the first chapter but, in addition to allowing some explanatory variables to be correlated with the idiosyncratic error, allows the joint distribution of the endogenous regressors and the individual heterogeneity conditional on the instruments to depend on the instruments. The second essay uses a two-step control function approach to account for endogeneity and to consistently estimate average partial effects (APEs) in CRC panel data models with endogenous, roughly continuous regressors. The simulation findings indicate that, in finite samples, the control function approach to estimating the CRC balanced panel data model with time-constant individual heterogeneity performs better than other estimators under the considered conditions. The proposed method is applied to the problem of estimating the APEs of annual hours of on-the-job training on output scrap rates for manufacturing firms in Michigan.

In the third essay, a dynamic binary response panel data model that allows for an endogenous regressor is developed. This estimation approach is of particular value in settings where one wants to estimate the effect of a treatment that is itself endogenous. The model is applied to examine the impact of rural-urban migration on the likelihood that households in rural China fall below the poverty line. The empirical results suggest that migration is important for reducing the likelihood that poor households remain in poverty and that non-poor households fall into poverty. Further, failure to control for unobserved heterogeneity leads to an overestimate of the impact of migrant labor markets on the probability that households living below the poverty line stay poor.

Copyright by
Irina Murtazashvili

2007

ACKNOWLEDGMENTS

I would like to express the deepest appreciation to my adviser, Professor Jeffrey
Wooldridge for his generous advice and support. Without his guidance and help this
dissertation would not have been possible. I am very grateful for the assistance and
advice I received from Professor John Giles who also kindly provided me the data
for one of the applications. I wish to thank my committee members, Professor Peter
Schmidt, Professor Ana Maria Herrera, and Professor David Tschirley, for valuable
comments and fruitful discussions. I also thank other faculty members and doctoral students of the Department of Economics at Michigan State University for their support during my graduate studies.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 FIXED EFFECTS INSTRUMENTAL VARIABLES ESTIMATION IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS
1.1 Introduction
1.2 Model Specification and Previous Results
1.3 Conditions for Consistent FE-IV Estimation
1.4 Examples
1.5 Finite Sample Behavior of the FE-IV Estimator
1.6 Conclusion

2 A CONTROL FUNCTION APPROACH TO ESTIMATION OF CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS
2.1 Introduction
2.2 Model of Interest for Balanced Panels
2.3 Estimating Procedure and Calculation of Standard Errors
2.4 Finite Sample Behavior of the Control Function Estimator
2.5 Empirical Application to Effects of Job Training on Worker Productivity
2.6 Conclusion

3 ESTIMATION OF A DYNAMIC BINARY RESPONSE PANEL DATA MODEL WITH AN ENDOGENOUS REGRESSOR, WITH AN APPLICATION TO THE ANALYSIS OF POVERTY PERSISTENCE IN RURAL CHINA
3.1 Introduction
3.2 Estimation of a Dynamic Binary Response Panel Data Model with an Endogenous Regressor
3.2.1 Dynamic Binary Response Panel Data Models
3.2.2 A General Approach to Estimation
3.2.3 Allowing for Serial Correlation of Errors in the First Stage
3.2.4 Calculation of Average Partial Effects
3.3 Migrant Labor Markets and Poverty Persistence in Rural China
3.3.1 Rural-Urban Migration in China
3.3.2 The RCRE Household Survey
3.3.3 Migration, Consumption Growth and Poverty
3.3.4 Estimating the Impact of Migrant Labor Markets on Poverty Persistence
3.3.5 Identifying the Migrant Network
3.4 Results
3.5 Conclusions

APPENDICES
A Tables for Chapter 1
B Tables for Chapter 2
C Tables and Figures for Chapter 3

BIBLIOGRAPHY

LIST OF TABLES

A.1 Usual Unobserved Effects CRC Model for $\beta = 2$ and $T = 5$
A.2 Usual Unobserved Effects CRC Model for $\beta = 2$ and $T = 10$
A.3 Random Trend CRC Model for $\beta = 2$ and $T = 5$
A.4 Random Trend CRC Model for $\beta = 2$ and $T = 10$
B.1 Usual Unobserved Effect CRC Model for Continuous $y_{2it}$
B.2 Random Trend CRC Model for Continuous $y_{2it}$
B.3 Usual Unobserved Effect CRC Model for $y_{2it} \in (0,1)$
B.4 Random Trend CRC Model for $y_{2it} \in (0,1)$
B.5 Standard Errors for the Control Function Approach
B.6 Summary Statistics from Unbalanced and Balanced Datasets
B.7 POLS Estimates of the First Stage Regressions
B.8 FE-IV and CF Estimates of the Second Stage Regressions
B.9 Summary Statistics for the Control Variables
C.1 Household and Village Characteristics
C.2 Factors Determining the Size of the Village Migrant Network
C.3 CF Approach to Estimating Determinants of Poverty Status
C.4 Linear Probability Model for Determinants of Poverty Status
C.5 Average Partial Effects of Determinants of Poverty Status

LIST OF FIGURES

C.1 Share of Village Labor Force Employed as Migrants by Year
C.2 Village Consumption Growth
C.3 Change in Poverty Headcount
C.4 Change in Out-Migrants in Village Labor Force

CHAPTER 1

FIXED EFFECTS
INSTRUMENTAL VARIABLES
ESTIMATION IN CORRELATED
RANDOM COEFFICIENT
PANEL DATA MODELS

1.1 Introduction

In both cross section and panel data settings, there is substantial interest in estimating population averaged effects (PAEs), including average treatment effects (ATEs), in the correlated random coefficient (CRC) model. Models with both exogenous explanatory variables and endogenous regressors have been investigated in recent years. Angrist (1991) discusses the conditions for consistency of ATE estimates in models with binary endogenous variables and no exogenous covariates. A set of sufficient assumptions for consistent ATE estimation with (roughly) continuous endogenous regressors in a CRC model can be found in Wooldridge (2003). Both papers study estimation with random sampling from a cross section.

The possibility that treatment effects might depend on individual-specific heterogeneity motivated Imbens and Angrist (1994) to introduce the “local average treatment effect” (LATE) as an evaluation parameter, which provides a useful interpretation of the instrumental variables estimator when the effect of a binary treatment varies across units. That emphasis on LATE led to a reinterpretation of IV estimates in many empirical applications and spurred a great deal of research on interpreting IV estimators in a variety of contexts. Heckman and Vytlacil (2005) provide a recent unification, including a discussion of whether we should be interested in parameters such as LATE.

The understanding that IV generally consistently estimates LATE in simple settings is useful, but often we are interested in estimating the expected effect for a randomly drawn unit from the underlying population. Moreover, the strict interpretation of LATE as the average treatment effect among units induced into treatment by the switching of an instrumental variable (such as program eligibility) is limited to special cases. Here we study estimation of population average effects, or average treatment effects, in a general panel data model with heterogeneous slopes. By estimating population average effects we can easily estimate the aggregate effects of various policies, such as increasing the amount of job training among the population of manufacturing workers.

Wooldridge (2005a) studied general fixed effects estimators with strictly exogenous regressors in the CRC model with panel data, and derived conditions under which generalized fixed effects estimators (generalized in the sense that they sweep away unit-specific trends) are consistent for the population averaged effect. In this paper, we study the model in Wooldridge (2005a) but, in addition to allowing correlation between the instruments and the unobserved heterogeneity, we allow some explanatory variables to be correlated with the idiosyncratic error. The main result is a set of sufficient conditions under which fixed effects instrumental variables (FE-IV) estimators consistently estimate the population averaged effect, even when the individual-specific slopes are ignored. The results include the commonly used fixed effects two stage least squares estimator (FE-2SLS) as a special case, but also more general FE-IV estimators that sweep away individual-specific time trends. The conditions are most likely to apply when the endogenous explanatory variables are at least roughly continuous, as in Wooldridge (2003) for the cross-sectional case.

The remainder of the paper is organized as follows. In Section 1.2 we introduce the model and briefly review existing results. Section 1.3 contains the main consistency result, and Section 1.4 covers examples where the conditions will, and will not, hold. Section 1.5 contains a Monte Carlo study showing that the FE-IV estimator with a full set of time period dummies outperforms its obvious competitors. The simulation results support the findings in Sections 1.3 and 1.4. Section 1.6 contains a brief conclusion.

1.2 Model Speciﬁcation and Previous Results

The model of interest is the CRC model studied in Wooldridge (2005a). For a random draw $i$ from the population, the model is

$$y_{it} = w_t a_i + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \qquad (1.1)$$

where $y_{it}$ is the dependent variable, $w_t$ is a $1 \times J$ vector of aggregate time variables, which we treat as nonrandom, $a_i$ is a $J \times 1$ vector of individual-specific slopes on the aggregate variables, $x_{it}$ is a $1 \times K$ vector of endogenous covariates that change across time, $b_i$ is a $K \times 1$ vector of individual-specific slopes, and $u_{it}$ is an idiosyncratic error. As discussed in Wooldridge (2005a), we require $J < T$. So, if we have two time periods, we can only allow a scalar individual-specific intercept, $a_i$. If $T = 3$, we can allow individual-specific linear trends, too. Higher order trend terms are allowed as $T$ increases.

Equation (1.1) is a correlated random coefficients model when the individual-specific slopes, $b_i$ (as well as the elements of $a_i$), are allowed to be correlated with $x_{it}$. For example, a simple CRC wage equation might look like

$$\log(\text{wage}_{it}) = a_{i1} + a_{i2} t + b_{i1}\,\text{training}_{it} + b_{i2}\,\text{union}_{it} + b_{i3}\,\text{married}_{it} + u_{it}, \qquad (1.2)$$

where, in addition to the standard level effect $a_{i1}$, each individual is allowed to have his or her own unobserved growth in wages, $a_{i2}$. In addition, the time-varying explanatory variables have individual-specific returns. The variable training might be hours spent in job training, and the CRC model allows the return to training to be individual-specific and correlated with the amount of training, as a standard model of human capital accumulation would suggest.

Wooldridge (2005a) studied the consistency of fixed effects estimators of (1.1) that sweep out the $a_i$ but act as if $b_i = \beta$ for all $i$. To describe Wooldridge's main result, and the extension here, write $b_i = \beta + d_i$ and substitute into (1.1):

$$y_{it} = w_t a_i + x_{it}\beta + (x_{it} d_i + u_{it}) \equiv w_t a_i + x_{it}\beta + v_{it}, \qquad (1.3)$$

where $v_{it} \equiv x_{it} d_i + u_{it}$. We eliminate $a_i$ by regressing, for each $i$, $y_{it}$ on $w_t$, $t = 1, \ldots, T$, and $x_{it}$ on $w_t$, $t = 1, \ldots, T$, and keeping the residuals, $\ddot{y}_{it}$ and $\ddot{x}_{it}$, respectively. This gives the equations

$$\ddot{y}_{it} = \ddot{x}_{it} b_i + \ddot{u}_{it} = \ddot{x}_{it}\beta + (\ddot{x}_{it} d_i + \ddot{u}_{it}) \equiv \ddot{x}_{it}\beta + \ddot{v}_{it}, \quad t = 1, \ldots, T. \qquad (1.4)$$

The fixed effects estimator studied by Wooldridge (2005a) is just the pooled OLS estimator from (1.4). We control the amount of individual-specific detrending by choosing $w_t$ appropriately.
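The detrend-then-pool recipe can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not code from the dissertation: the panel is balanced and stored as arrays of shape (N, T) or (N, T, K), and `W` is the T × J matrix whose rows are $w_t$. Because $w_t$ is nonrandom and common to all units, every unit-by-unit regression on $w_t$ uses the same residual-maker matrix $M = I_T - W(W'W)^{-1}W'$. The function names are hypothetical.

```python
import numpy as np


def residual_maker(W):
    """M = I - W (W'W)^{-1} W' for the T x J matrix W with rows w_t."""
    W = np.asarray(W, dtype=float)
    return np.eye(W.shape[0]) - W @ np.linalg.solve(W.T @ W, W.T)


def detrend(panel, W):
    """Residuals from regressing each unit's series on w_t, t = 1, ..., T.

    panel: (N, T) for a scalar variable or (N, T, K) for a 1 x K vector.
    """
    M = residual_maker(W)
    panel = np.asarray(panel, dtype=float)
    if panel.ndim == 2:
        return panel @ M      # row i becomes (M y_i)'; M is symmetric
    return M @ panel          # broadcasts M over the N units


def fe_pooled_ols(y, X, W):
    """Generalized FE estimator: pooled OLS of detrended y on detrended X."""
    N, T, K = X.shape
    y_dd = detrend(y, W).reshape(N * T)
    X_dd = detrend(X, W).reshape(N * T, K)
    beta_hat, *_ = np.linalg.lstsq(X_dd, y_dd, rcond=None)
    return beta_hat


# Example: unit-specific intercepts and trends, constant slopes, no noise.
rng = np.random.default_rng(0)
N, T = 200, 5
t = np.arange(1, T + 1)
W = np.column_stack([np.ones(T), t])          # w_t = (1, t)
X = rng.normal(size=(N, T, 2))
a1, a2 = rng.normal(size=(N, 1)), rng.normal(size=(N, 1))
y = a1 + a2 * t + X @ np.array([2.0, -1.0])
print(fe_pooled_ols(y, X, W))
```

Setting `W = np.ones((T, 1))` reproduces the usual within (time-demeaning) transformation, while `w_t = (1, t)` sweeps out unit-specific linear trends and needs $T \geq 3$, in line with the requirement $J < T$.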

An assumption used by Wooldridge (2005a) is the standard strict exogeneity assumption conditional on $(a_i, b_i)$:

$$E(u_{it} \mid x_{i1}, \ldots, x_{iT}, a_i, b_i) = 0, \quad t = 1, \ldots, T. \qquad (1.5)$$

Using a simple iterated expectations argument, Wooldridge shows that, under the additional assumption

$$E(b_i \mid \ddot{x}_{it}) = E(b_i), \quad t = 1, \ldots, T, \qquad (1.6)$$

the fixed effects estimator is consistent for the population averaged effect, $\beta$.

Consistency of the usual FE estimator relies heavily on assumption (1.5), which rules out traditional simultaneity, time-varying measurement error, correlation between time-varying omitted factors (in $u_{it}$) and the elements of $x_{it}$, and models with lagged dependent variables or other kinds of regressors where changes in $u_{it}$ may feed back into changes in $x_{i,t+h}$ for $h \geq 1$. In the case where $b_i = \beta$, methods that first eliminate $a_i$ and then apply instrumental variables (usually 2SLS) have become a standard tool for the applied economist. Here, we study such estimators but allow for individual-specific slopes, $b_i$.

Let $z_{it}$ be a $1 \times L$ vector of instrumental variables, with $L \geq K$. Let $\ddot{z}_{it}$ be the “detrended” instruments from the individual-specific regressions of $z_{it}$ on $w_t$, $t = 1, \ldots, T$. Then we can estimate (1.4) using instruments $\ddot{z}_{it}$ for unit $i$ in time period $t$. Whether we use pooled 2SLS (the estimator we focus on here) or a more sophisticated generalized method of moments (GMM) estimator, the moment conditions we use are

$$E(\ddot{z}_{it}' \ddot{v}_{it}) = 0, \quad t = 1, \ldots, T. \qquad (1.7)$$

In the next section, we study consistency of the FE-2SLS estimator under conditions that relax those in Wooldridge (2005a).
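A pooled FE-2SLS estimator built on the same detrending can be sketched as follows. Again this is a hypothetical NumPy implementation under our own assumptions (balanced panel, arrays shaped (N, T, ·), `W` holding the rows $w_t$), not code from the dissertation. The optional period dummies anticipate the result of Section 1.3. After detrending, the dummies are collinear, so the sketch relies on `numpy.linalg.lstsq`, which tolerates rank deficiency; this leaves the coefficients on $\ddot{x}_{it}$ unaffected.

```python
import numpy as np


def residual_maker(W):
    """M = I - W (W'W)^{-1} W' for the T x J matrix W with rows w_t."""
    W = np.asarray(W, dtype=float)
    return np.eye(W.shape[0]) - W @ np.linalg.solve(W.T @ W, W.T)


def fe_2sls(y, X, Z, W, time_dummies=True):
    """Pooled 2SLS of detrended y on detrended X with detrended Z as instruments.

    y: (N, T); X: (N, T, K); Z: (N, T, L) with L >= K; W: (T, J).
    Returns the K coefficients on X.
    """
    N, T = y.shape
    M = residual_maker(W)
    y_dd = (y @ M).reshape(N * T)          # M is symmetric
    X_dd = (M @ X).reshape(N * T, -1)
    Z_dd = (M @ Z).reshape(N * T, -1)
    K = X_dd.shape[1]
    # Detrended period dummies are the same T x T block M for every unit.
    exog = np.tile(M, (N, 1)) if time_dummies else np.empty((N * T, 0))
    inst = np.hstack([Z_dd, exog])
    # First stage: fitted values of the detrended regressors.
    pi_hat, *_ = np.linalg.lstsq(inst, X_dd, rcond=None)
    X_fit = inst @ pi_hat
    # Second stage: detrended y on fitted regressors plus the exogenous dummies.
    coef, *_ = np.linalg.lstsq(np.hstack([X_fit, exog]), y_dd, rcond=None)
    return coef[:K]
```

With `W = np.ones((T, 1))` this is the usual FE-2SLS estimator on time-demeaned data; a richer `W` gives the generalized versions discussed in the text.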

1.3 Conditions for Consistent FE-IV Estimation

In order to ensure that (1.7) holds, we place conditions separately on the relationship between the instruments and the idiosyncratic errors, and on the relationship between the instruments and the unobserved effects. In addition, of course, there is always a standard rank condition.

ASSUMPTION 1: With the definitions in Section 1.2,

$$E(u_{it} \mid z_{i1}, z_{i2}, \ldots, z_{iT}) = 0, \quad t = 1, \ldots, T. \qquad (1.8)$$

Assumption 1 is stronger than we need (as will be clear, $E(\ddot{z}_{it}' \ddot{u}_{it}) = 0$, $t = 1, \ldots, T$, would suffice), but (1.8) is a natural strict exogeneity assumption on the instruments. Assumption 1 is common in simultaneous equations models with panel data, as well as in models with other kinds of endogeneity that induce correlation between $x_{it}$ and $u_{it}$, such as omitted variables and measurement error. Assumption 1 rules out lagged dependent variables among the instruments (as well as other non-strictly exogenous instruments), and so its application to dynamic models is limited unless sufficient strictly exogenous instruments are available. When $z_{it} = x_{it}$, so that the covariates are strictly exogenous, Wooldridge (2005a) included $a_i$ and $b_i$ in the conditioning set, as in (1.5). When the unit-specific trend function is correctly specified, this stronger form of the assumption is essentially harmless.

The second component of the error term in (1.4) is $\ddot{x}_{it} d_i$, and we need assumptions such that $\ddot{z}_{it}$ is uncorrelated with $\ddot{x}_{it} d_i$. This requires some care because $\ddot{x}_{it}$ contains endogenous elements. (That is, we allow components of $x_{it}$ to be endogenous even after removing unit-specific intercepts and trends.) The first assumption mimics the key assumption from Wooldridge (2005a), except that we replace the covariates with the instruments:

ASSUMPTION 2: $b_i$ is mean independent of all the unit-specific “detrended” $\ddot{z}_{it}$, that is,

$$E(b_i \mid \ddot{z}_{it}) = E(b_i) = \beta, \quad t = 1, \ldots, T. \qquad (1.9)$$

Because the $\ddot{z}_{it}$ are net either of a time average or, more generally, of level and trend effects, Assumption 2 maintains mean independence of the heterogeneous slopes and the deviations of the instruments from long-run levels or trends. Of course, in the case where the instruments are assumed, in each time period, to be independent of all heterogeneity, Assumption 2 automatically holds. Assumption 2 is practically much weaker than full independence because it allows $b_i$ to be arbitrarily correlated with systematic components of $z_{it}$; we cover some examples in Section 1.4. [Wooldridge (2005a) contains a discussion for the case of strictly exogenous $x_{it}$.]

Generally, the richer is $w_t$, the more likely (1.9) is to hold. For example, the usual FE-IV estimator removes time averages from the instruments, and this might not be enough to ensure (1.9) if the instruments are trending differently across units $i$. On the other hand, adding more aggregate factors to $w_t$ reduces the variation in $\ddot{z}_{it}$, generally leading to less efficient IV estimators. Not surprisingly, in deciding what to include in $w_t$ we confront the usual tradeoff between efficiency and consistency.

Unfortunately, Assumptions 1 and 2 are not enough to conclude that the IV estimator is consistent. We therefore also employ a constant conditional covariance assumption.

ASSUMPTION 3: For $j = 1, \ldots, K$,

$$\mathrm{Cov}(\ddot{x}_{itj}, b_{ij} \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{x}_{itj}, b_{ij}), \quad t = 1, \ldots, T. \qquad (1.10)$$

Importantly, (1.10) allows the detrended covariates and the random coefficients to be correlated, and the covariance may change over time; in fact, there is no restriction on the temporal pattern of $\mathrm{Cov}(\ddot{x}_{itj}, b_{ij})$. But the covariance conditional on the detrended IVs is assumed not to depend on $\ddot{z}_{it}$. [In any case, the covariances $\mathrm{Cov}(\ddot{x}_{itj}, b_{ij})$ do not depend on $i$ because of random sampling in the cross-sectional dimension. As we are conditioning only on $\ddot{z}_{it}$, the restriction is that the covariance conditional on $\ddot{z}_{it}$ does not depend on $\ddot{z}_{it}$; we have no need to place restrictions on other conditional covariances.]

Assumption 3 extends to the panel data case a condition used by Wooldridge (2003) for the pure cross-sectional case. An important difference is that Assumption 3 applies to the detrended covariates and instruments. Importantly, we allow the unconditional covariances to change arbitrarily over time. Of course, if $b_{ij} = \beta_j$ for all $i$, then (1.10) is trivially true because both sides are zero.

Assumptions 1 through 3 imply that the key orthogonality conditions (1.7) hold, and these conditions can be used in a generalized method of moments framework. For simplicity, we focus here on the fixed effects two stage least squares estimator, FE-2SLS [interpreted in the general sense of eliminating $a_i$ from (1.1)]. To ensure consistency of the FE-2SLS estimator we add a standard rank condition.

ASSUMPTION 4: (i) $\mathrm{rank}\, \sum_{t=1}^{T} E(\ddot{z}_{it}' \ddot{x}_{it}) = K$;

(ii) $\mathrm{rank}\, \sum_{t=1}^{T} E(\ddot{z}_{it}' \ddot{z}_{it}) = L$.

Practically speaking, the first part of Assumption 4 is the most important; it means that, after netting out individual-specific trends, there is still sufficient correlation between the instruments and the regressors. Part (ii) requires sufficient variation in the “detrended” instruments. It would be violated if, say, we specify $w_t = (1, t)$ and $z_{it}$ contains an element that is constant across $t$ for all $i$ (such as gender) or changes by the same value in each time period (such as a person's age when the length of the sampling period is constant).
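The failure of part (ii) described above is easy to see numerically. The following sketch is our own illustration with hypothetical simulated data. It linearly detrends three candidate instruments with $w_t = (1, t)$: a genuinely time-varying variable, a time-constant gender indicator, and age, which rises by one each period. The detrended gender and age columns are identically zero (up to rounding), so the sample analogue of $\sum_t E(\ddot{z}_{it}'\ddot{z}_{it})$ has rank 1 rather than $L = 3$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 6
t = np.arange(1, T + 1)
W = np.column_stack([np.ones(T), t])                      # w_t = (1, t)
M = np.eye(T) - W @ np.linalg.solve(W.T @ W, W.T)         # residual maker

varying = rng.normal(size=(N, T))                         # time-varying instrument
gender = np.repeat(rng.integers(0, 2, size=(N, 1)), T, axis=1).astype(float)
age = rng.integers(20, 60, size=(N, 1)) + (t - 1)         # rises by 1 each period
Z = np.stack([varying, gender, age], axis=2).astype(float)  # shape (N, T, L)

Z_dd = M @ Z                                              # detrend each unit
S = np.einsum("ntl,ntm->lm", Z_dd, Z_dd) / N              # (1/N) sum_i sum_t z''z
# An explicit tolerance separates true zeros from floating-point rounding.
print(np.linalg.matrix_rank(S, tol=1e-8))
```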

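PROPOSITION 1: Under Assumptions 1 to 4, the FE-IV estimator is consistent for $\beta$, provided a full set of time period dummies is included in (1.4).

PROOF: Under Assumption 2, $E(d_{ij} \mid \ddot{z}_{it}) = 0$, $j = 1, \ldots, K$, for all $t$, and so

$$E(\ddot{x}_{itj} d_{ij} \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{x}_{itj}, d_{ij} \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{x}_{itj}, b_{ij} \mid \ddot{z}_{it}).$$

But by Assumption 3, the conditional covariances equal the corresponding unconditional covariances, say $\gamma_{tj}$, and so $E(\ddot{x}_{itj} d_{ij} \mid \ddot{z}_{it}) = \gamma_{tj}$, $j = 1, \ldots, K$, $t = 1, \ldots, T$. Since $\ddot{x}_{it} d_i = \ddot{x}_{it1} d_{i1} + \ddot{x}_{it2} d_{i2} + \cdots + \ddot{x}_{itK} d_{iK}$, we have shown that $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = \gamma_{t1} + \cdots + \gamma_{tK} \equiv \theta_t$. Therefore, we can write $\ddot{x}_{it} d_i = \theta_t + r_{it}$, where $E(r_{it} \mid \ddot{z}_{it}) = 0$, $t = 1, \ldots, T$. Now we plug this expression for $\ddot{x}_{it} d_i$ into equation (1.4):

$$\ddot{y}_{it} = \theta_t + \ddot{x}_{it}\beta + (r_{it} + \ddot{u}_{it}), \quad t = 1, \ldots, T. \qquad (1.11)$$

As we have just shown, Assumptions 2 and 3 imply that $E(r_{it} \mid \ddot{z}_{it}) = 0$. Assumption 1 implies that $E(\ddot{u}_{it} \mid \ddot{z}_{it}) = 0$. Thus, the composite error in (1.11) satisfies $E(r_{it} + \ddot{u}_{it} \mid \ddot{z}_{it}) = 0$, $t = 1, \ldots, T$, and so any IV method that uses instruments $\ddot{z}_{it}$ at time $t$, and that accounts for the time-varying intercepts $\theta_t$ through a full set of period dummies, consistently estimates $\beta$. In particular, under the rank condition in Assumption 4 and standard finite moment conditions, the FE-2SLS estimator is consistent and $\sqrt{N}$-asymptotically normal. This completes the proof.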

Proposition 1 contains an important empirical lesson: unless there are very good reasons to the contrary, one should include a full set of time effects in a fixed effects IV analysis. Even if the model does not originally contain separate time period intercepts (itself a questionable premise), the estimating equation generally should include them if one wants to allow correlated random slope coefficients.

Because the error term in (1.11), $r_{it} + \ddot{u}_{it}$, is generally heteroskedastic and serially correlated (at a minimum due to the presence of $\ddot{x}_{it} d_i$), inference should be carried out using a fully robust variance matrix for $\hat{\beta}$. Typically this is straightforward for pooled 2SLS where all instruments have been detrended prior to estimation.
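One common way to compute such a fully robust variance matrix is the cluster-robust sandwich estimator, clustering on the cross-sectional unit. The sketch below is our own illustration, assuming the detrended data (including any detrended period dummies) have already been stacked into arrays with one row per $(i, t)$ and that `groups` gives the unit index of each row. It applies no small-sample degrees-of-freedom correction, and the function name is hypothetical.

```python
import numpy as np


def pooled_2sls_cluster_robust(y, X, Z, groups):
    """Pooled 2SLS with a variance matrix robust to heteroskedasticity and to
    arbitrary serial correlation within each cross-sectional unit.

    y: (n,); X: (n, K) regressors; Z: (n, L) instruments, L >= K;
    groups: (n,) unit identifiers. Returns (beta_hat, V_hat).
    """
    # Fitted regressors X_hat = P_Z X, with P_Z = Z (Z'Z)^{-1} Z'.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    bread = np.linalg.inv(X_hat.T @ X_hat)          # (X' P_Z X)^{-1}
    beta_hat = bread @ (X_hat.T @ y)
    resid = y - X @ beta_hat                        # residuals use X, not X_hat
    K = X.shape[1]
    meat = np.zeros((K, K))
    for g in np.unique(groups):
        rows = groups == g
        score = X_hat[rows].T @ resid[rows]         # sum over t of x_hat_it' e_it
        meat += np.outer(score, score)
    V_hat = bread @ meat @ bread
    return beta_hat, V_hat
```

Robust standard errors are then `np.sqrt(np.diag(V_hat))`. Summing the scores within each unit before taking outer products is what makes the estimator robust to arbitrary serial correlation in $r_{it} + \ddot{u}_{it}$.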

1.4 Examples

To see how Proposition 1 applies, suppose $x_{it}$ is linearly related to $z_{it}$ with heterogeneous linear trends for each element of $x_{it}$:

$$x_{it} = g_i \Gamma + t \cdot h_i \Psi + z_{it} \Pi + q_{it}, \quad t = 1, \ldots, T. \qquad (1.12)$$

Initially, take $w_t = (1, t)$, so the regressors and instruments are linearly detrended before applying pooled 2SLS. Assume the instruments also have heterogeneous linear trends, which are removed by individual-specific detrending. Then Assumption 2 simply requires that the idiosyncratic movements in $z_{it}$ are uncorrelated with $b_i$, a weak requirement on instrumental variables. For Assumption 3, write $\ddot{x}_{it} = \ddot{z}_{it}\Pi + \ddot{q}_{it}$, $t = 1, \ldots, T$, so that $\mathrm{Cov}(\ddot{x}_{it}, b_i \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{z}_{it}\Pi + \ddot{q}_{it}, b_i \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{q}_{it}, b_i \mid \ddot{z}_{it})$, $t = 1, \ldots, T$, under Assumption 2. Thus, provided

$$\mathrm{Cov}(\ddot{q}_{it}, b_i \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{q}_{it}, b_i), \quad t = 1, \ldots, T, \qquad (1.13)$$

we can use $\ddot{z}_{it}$ as IVs for $\ddot{x}_{it}$ to obtain a consistent estimate of the PAE, $\beta$, in equation (1.4). One might even assume that $(q_{i1}, \ldots, q_{iT}, b_i)$ is independent of $(z_{i1}, \ldots, z_{iT})$, which is sufficient for (1.13) [as well as for Assumption 2].

It is possible that the FE-IV estimator is consistent even if we only demean the regressors and instruments, provided the instruments satisfy a stronger exogeneity assumption. In other words, even though $x_{it}$ contains individual-specific linear trends, we ignore them in our estimation procedure. To see why we can still get consistency, demean $x_{it}$ to get

$$x_{it} - \bar{x}_i = [t - (T+1)/2] \cdot h_i \Psi + (z_{it} - \bar{z}_i)\Pi + (q_{it} - \bar{q}_i), \quad t = 1, \ldots, T. \qquad (1.14)$$

Now, suppose $[(q_{it} - \bar{q}_i), b_i]$ is independent of $(z_{it} - \bar{z}_i)$ for each $t$. Then (1.9) holds for $\ddot{z}_{it} = z_{it} - \bar{z}_i$, and (1.13) also holds. Therefore,

$$\mathrm{Cov}(x_{it} - \bar{x}_i, b_i \mid z_{it} - \bar{z}_i) = [t - (T+1)/2]\,\Psi' \mathrm{Cov}(h_i, b_i) = \mathrm{Cov}(x_{it} - \bar{x}_i, b_i)$$

for each $t$, which means that Assumption 3 holds: while the conditional covariances are not generally zero, or even constant over time, they do not depend on $z_{it} - \bar{z}_i$. So the FE-IV estimator will be consistent provided we include a full set of year dummies in estimation.
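For completeness, the averaging step behind (1.14) can be written out (our own worked step): with $\bar{x}_i = T^{-1}\sum_t x_{it}$ and $T^{-1}\sum_{t=1}^{T} t = (T+1)/2$, the time-constant term $g_i\Gamma$ cancels and the trend term keeps only its deviation from the average period. For example, with $T = 5$ the multipliers $t - (T+1)/2$ are $-2, -1, 0, 1, 2$, which is also why the trend component of $x_{it} - \bar{x}_i$ has a time-varying covariance with $b_i$.

$$
\begin{aligned}
\bar{x}_i &= \frac{1}{T}\sum_{t=1}^{T} x_{it} = g_i\Gamma + \Big(\frac{1}{T}\sum_{t=1}^{T} t\Big) h_i\Psi + \bar{z}_i\Pi + \bar{q}_i = g_i\Gamma + \frac{T+1}{2}\, h_i\Psi + \bar{z}_i\Pi + \bar{q}_i,\\
x_{it} - \bar{x}_i &= \Big[t - \frac{T+1}{2}\Big] h_i\Psi + (z_{it} - \bar{z}_i)\Pi + (q_{it} - \bar{q}_i).
\end{aligned}
$$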

What happens if we have a binary endogenous variable, $x_{it}$? Assumption 3 is unlikely to hold. To see why, take the case $w_t \equiv 1$, $t = 1, \ldots, T$, which corresponds to the usual unobserved effects model with correlated random coefficients. Then $\ddot{x}_{it} = x_{it} - \bar{x}_i$, $t = 1, \ldots, T$, and we need $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$ not to depend on $\ddot{z}_{it}$. Now, by iterated expectations,

$$E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = E[E(\ddot{x}_{it} d_i \mid d_i, z_i) \mid \ddot{z}_{it}] = E[d_i E(\ddot{x}_{it} \mid d_i, z_i) \mid \ddot{z}_{it}]. \qquad (1.15)$$

Standard models for binary responses, with $z_{it}$ strictly exogenous conditional on $d_i$, would have $P(x_{it} = 1 \mid d_i, z_i)$ depending on $d_i$ and $z_{it}$ in a nonlinear way. For concreteness, suppose $P(x_{it} = 1 \mid d_i, z_i)$ follows a probit model,

$$P(x_{it} = 1 \mid d_i, z_i) = P(x_{it} = 1 \mid d_i, z_{it}) = \Phi(\alpha_0 + \alpha_1 d_i + z_{it}\alpha_2). \qquad (1.16)$$

Then

$$E(\ddot{x}_{it} \mid d_i, z_i) = \Phi(\alpha_0 + \alpha_1 d_i + z_{it}\alpha_2) - T^{-1}\sum_{r=1}^{T} \Phi(\alpha_0 + \alpha_1 d_i + z_{ir}\alpha_2) \equiv g_t(d_i, z_i) \qquad (1.17)$$

and so, by (1.15),

$$E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = E[d_i\, g_t(d_i, z_i) \mid \ddot{z}_{it}]. \qquad (1.18)$$

Even if $d_i$ is independent of $z_i$ (a sensible strengthening of Assumption 2), (1.18) generally depends on $\ddot{z}_{it}$. Thus, assuming that $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$ does not depend on $\ddot{z}_{it}$ is rather strong for a binary endogenous explanatory variable $x_{it}$. [Heckman (1997) contains a detailed discussion of the behavioral implications of this assumption in different empirical studies.] In a cross-sectional context, Wooldridge (1997) proposes a modified set of assumptions that are sufficient for consistent estimation of the ATE, $\beta$, with a binary endogenous variable, but, applied to the current setup, $P(x_{it} = 1 \mid d_i, z_i)$ would have to follow a linear probability model.
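The dependence in (1.18) can be checked numerically. The sketch below is our own illustration under added assumptions: scalar $d_i \sim \mathrm{Normal}(0, 1)$ independent of $z_i$ (the strengthening of Assumption 2 mentioned above), scalar $z_{it}$, $T = 3$, and hypothetical probit coefficients $\alpha_0 = 0$, $\alpha_1 = \alpha_2 = 1$. It evaluates $g_t(d_i, z_i)$ from (1.17) and integrates $d_i\,g_t(d_i, z_i)$ over $d_i$ with Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`). When the instrument path is constant over time the moment is zero in every period, but for a time-varying path it is not, which suggests that $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$ moves with $\ddot{z}_{it}$.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def std_normal_cdf(v):
    """Standard normal CDF Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def g_t(t, d, z, alpha0, alpha1, alpha2):
    """E(x_it - xbar_i | d_i = d, z_i = z) under the probit model (1.16).

    t is a 0-based period index and z is the sequence (z_i1, ..., z_iT).
    """
    probs = [std_normal_cdf(alpha0 + alpha1 * d + alpha2 * z_r) for z_r in z]
    return probs[t] - sum(probs) / len(probs)


def moment_given_z(t, z, alpha0=0.0, alpha1=1.0, alpha2=1.0, n_nodes=40):
    """E[d_i g_t(d_i, z_i) | z_i = z] for d_i ~ Normal(0, 1) independent of z_i."""
    nodes, weights = hermegauss(n_nodes)     # weight function exp(-d^2 / 2)
    values = [d * g_t(t, d, z, alpha0, alpha1, alpha2) for d in nodes]
    return float(np.dot(weights, values) / math.sqrt(2.0 * math.pi))


for path in ([0.0, 0.0, 0.0], [-1.0, 0.0, 1.0]):
    print(path, [round(moment_given_z(t, path), 4) for t in range(3)])
```

The weights returned by `hermegauss` sum to $\sqrt{2\pi}$, which is why the weighted sum is divided by that constant to obtain an expectation under the standard normal.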

In a cross-sectional setting, Card (2001) shows that the analogue of Assumption 3 can also be violated in the case of roughly continuous explanatory variables due to heteroskedasticity in the variance matrix of $(x_i, b_i)$ given $z_i$. (With a pure cross section, there are no time subscripts and, of course, no unit-specific demeaning or detrending.) In an earnings equation where $x_i$ includes schooling, Card rejects $\mathrm{Cov}(x_i, b_i \mid z_i) = \mathrm{Cov}(x_i, b_i)$ using IQ score as a proxy for unobserved ability (an element of $b_i$) and a binary indicator for college proximity as an instrument for education. In our panel data setup, Assumption 1 allows $\mathrm{Cov}(x_{it}, b_i \mid z_{it})$ to depend on $z_{it}$, as it generally would if $x_{it}$ and $z_{it}$ contain persistent heterogeneity correlated with $b_i$. Using a generalized fixed effects approach, we need only assume that $\mathrm{Cov}(\ddot{x}_{it}, b_i \mid \ddot{z}_{it})$ does not depend on $\ddot{z}_{it}$, and this is much more plausible when we think the unit-specific detrending successfully eliminates the time-constant heterogeneity in $x_{it}$ and $z_{it}$.

More recently, in a cross-sectional setting, Wooldridge (2005b) proposes conditions that allow $\mathrm{Cov}(x_i, b_i \mid z_i)$ to depend on $z_i$, but these do not apply directly to the panel data case with time-constant heterogeneity that can be correlated with the covariates and instruments.

1.5 Finite Sample Behavior of the FE-IV Estimator

In this section we provide evidence on the finite sample properties of the FE-IV estimator of the population averaged effect in a CRC panel data model. Because one of the most common applications of CRC panel data models is the usual unobserved effects model with a random coefficient, we first assume $w_t \equiv 1$, $t = 1, \ldots, T$, in (1.1), as in the second part of the first example from Section 1.4. Also, for scalar processes $x_{it}$ and $z_{it}$, we assume a linear relationship between $x_{it}$ and $z_{it}$, with a linear trend for $x_{it}$. We use Monte Carlo simulations to draw the data and check the properties of the estimator. The number of replications is 500, and the results of the experiment are presented for cross-sectional sample sizes of 100, 400, and 800 for two time horizons, $T = 5$ and $T = 10$. The population average values are $\beta = 2$ and $\alpha = 3$.

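For $t = 1, \ldots, T$, the endogenous explanatory variable is generated as

$$x_{it} = \lambda_{xz} z_{it} + \lambda_{xu} u_{it} + \lambda_{xa} a_i + \xi(1+t) d_i + \sqrt{1 - \lambda_{xz}^2 - \lambda_{xu}^2 - \lambda_{xa}^2 - \xi^2 (1+t)^2}\; \rho_{it}, \qquad (1.19)$$

where $u_{it}, \rho_{it} \sim \mathrm{Normal}(0, 1)$, $a_i \sim \mathrm{Normal}(0, 1)$, $b_i = \beta + d_i$, $d_i \sim \mathrm{Normal}(0, \sigma_d^2)$, and $\lambda_{xz}$, $\lambda_{xu}$, $\lambda_{xa}$, and $\xi$ are constants. Further, the instrument is generated as $z_{it} = \lambda_{za} a_i + \sqrt{1 - \lambda_{za}^2}\, m_{it}$, where $a_i$ is defined above, $m_{it} \sim \mathrm{Normal}(t, 1)$, and $\lambda_{za}$ is the population correlation coefficient between $z_{it}$ and $a_i$, $t = 1, \ldots, T$.

In our reported simulations we use $\sigma_d^2 = 1$. When $\lambda_{za} = 0$, the coefficients $\lambda_{xz}$, $\lambda_{xu}$, and $\lambda_{xa}$ from (1.19) are the population correlation coefficients between $x_{it}$ and $z_{it}$, $x_{it}$ and $u_{it}$, and $x_{it}$ and $a_i$, $t = 1, \ldots, T$, respectively. The population correlation between $x_{it}$ and $b_i$ when $\lambda_{za} = 0$ is $\xi(1+t)$, $t = 1, \ldots, T$. We use the coefficient on the error term in (1.19) to ensure that $x_{it}$ has unit variance when $\lambda_{za} = 0$. When $\lambda_{za} \neq 0$, $\mathrm{Var}(x_{it}) = 1 + 2\lambda_{xz}\lambda_{xa}\lambda_{za}$, which is only slightly greater than one for our choices of the $\lambda$ parameters. The relevant covariances are $\mathrm{Cov}(x_{it}, u_{it}) = \lambda_{xu}$, $\mathrm{Cov}(x_{it}, a_i) = \lambda_{xz}\lambda_{za} + \lambda_{xa}$, and $\mathrm{Cov}(x_{it}, z_{it}) = \lambda_{xz} + \lambda_{xa}\lambda_{za}$. For the endogenous explanatory variable defined in (1.19), Assumption 3 is met: $\mathrm{Cov}(x_{it}, b_i \mid z_{it}) = \mathrm{Cov}(x_{it}, b_i) = \xi(1+t)$, $t = 1, \ldots, T$, and, because $d_i$ is independent of $(z_{i1}, \ldots, z_{iT})$, the same holds for the time-demeaned $\ddot{x}_{it}$ and $\ddot{z}_{it}$.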
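As a worked check of these moments (our own arithmetic, using $\sigma_d^2 = 1$ and the values $\lambda_{xz} = \lambda_{xa} = .20$ and $\lambda_{za} = .25$ given for Tables A.1 and A.2), the correlation between $x_{it}$ and $b_i$ when $\lambda_{za} \neq 0$ is scaled down only slightly by the extra variance term:

$$
\begin{aligned}
\mathrm{Corr}(x_{it}, b_i) &= \frac{\mathrm{Cov}(x_{it}, d_i)}{\sqrt{\mathrm{Var}(x_{it})\,\mathrm{Var}(d_i)}} = \frac{\xi(1+t)}{\sqrt{1 + 2\lambda_{xz}\lambda_{xa}\lambda_{za}}} \qquad (\sigma_d^2 = 1),\\
\sqrt{1 + 2(.20)(.20)(.25)} &= \sqrt{1.02} \approx 1.010,\\
t = 1:\quad \mathrm{Corr}(x_{i1}, b_i) &\approx .24/1.010 \approx .238 \ \ (\xi = .12), \qquad .12/1.010 \approx .119 \ \ (\xi = .06).
\end{aligned}
$$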

The dependent variable $y_{it}$ is generated as

$$y_{it} = a_i + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \qquad (1.20)$$

where $a_i$, $b_i$, $u_{it}$, and $x_{it}$ are defined above. Among other estimators, we obtain the FE-IV estimator of (1.20) acting as if $b_i = \beta$. Based on the first example from Section 1.4, we know this FE-IV estimator is consistent when $x_{it}$ is generated as in (1.19), provided we include a full set of time dummies, even though we only demean the regressor and the instrument while ignoring the individual-specific linear trend in the regressor.
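A single replication of the design in (1.19) and (1.20) can be sketched as below, assuming NumPy. This is our own illustration, not the code behind Tables A.1 and A.2. The default parameter values follow the text ($\lambda_{xu} = .40$, $\lambda_{xz} = \lambda_{xa} = .20$, $\lambda_{za} = .25$, $\sigma_d^2 = 1$, $\beta = 2$); adding the constant `alpha = 3` to $a_i$ in (1.20) is our assumed reading of the stated population value $\alpha = 3$, and it is swept out by demeaning, so it does not affect the FE estimates. The FE-IV estimator demeans $y_{it}$, $x_{it}$, and $z_{it}$ and optionally adds a full set of (demeaned) period dummies.

```python
import numpy as np


def simulate_crc_panel(N, T, xi, lam_xz=0.20, lam_xu=0.40, lam_xa=0.20,
                       lam_za=0.25, beta=2.0, alpha=3.0, sigma_d=1.0, seed=0):
    """One draw from (1.19)-(1.20); returns y, x, z as (N, T) arrays."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    a = rng.normal(size=(N, 1))                       # a_i ~ Normal(0, 1)
    d = rng.normal(scale=sigma_d, size=(N, 1))        # d_i ~ Normal(0, sigma_d^2)
    b = beta + d                                      # b_i = beta + d_i
    u = rng.normal(size=(N, T))                       # u_it ~ Normal(0, 1)
    rho = rng.normal(size=(N, T))                     # rho_it ~ Normal(0, 1)
    m = rng.normal(loc=t, size=(N, T))                # m_it ~ Normal(t, 1)
    z = lam_za * a + np.sqrt(1.0 - lam_za**2) * m
    scale = np.sqrt(1.0 - lam_xz**2 - lam_xu**2 - lam_xa**2 - xi**2 * (1 + t) ** 2)
    x = lam_xz * z + lam_xu * u + lam_xa * a + xi * (1 + t) * d + scale * rho
    y = alpha + a + x * b + u       # alpha: assumed intercept, removed by demeaning
    return y, x, z


def demean(v):
    """Within transformation: subtract each unit's time average."""
    return v - v.mean(axis=1, keepdims=True)


def fe_ols(y, x):
    """Pooled OLS on time-demeaned data (scalar regressor)."""
    yd, xd = demean(y).ravel(), demean(x).ravel()
    return float(xd @ yd / (xd @ xd))


def fe_iv(y, x, z, time_dummies):
    """Scalar FE-IV (just identified), optionally with a full set of period dummies."""
    N, T = y.shape
    yd, xd, zd = demean(y).ravel(), demean(x).ravel(), demean(z).ravel()
    if time_dummies:
        # Demeaned dummies: the same block I_T - 1/T for each unit (rank T - 1).
        D = np.tile(np.eye(T) - 1.0 / T, (N, 1))
    else:
        D = np.empty((N * T, 0))
    inst = np.column_stack([zd, D])
    x_fit = inst @ np.linalg.lstsq(inst, xd, rcond=None)[0]
    coef = np.linalg.lstsq(np.column_stack([x_fit, D]), yd, rcond=None)[0]
    return float(coef[0])


y, x, z = simulate_crc_panel(N=800, T=5, xi=0.12)
print("FE-OLS:", fe_ols(y, x))
print("FE-IV without dummies:", fe_iv(y, x, z, time_dummies=False))
print("FE-IV with dummies:", fe_iv(y, x, z, time_dummies=True))
```

In this design the demeaned instrument has a time-varying mean (because $m_{it} \sim \mathrm{Normal}(t, 1)$), and it lines up with the time-varying intercept $\theta_t$ from the proof of Proposition 1. That is one way to see why dropping the period dummies can leave FE-IV noticeably biased here, while including them removes the problem.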

Tables A.1 and A.2 present simulation results for the correlated random coefficient model for $\lambda_{xu} = .40$, $\lambda_{xa} = .20$, $\lambda_{xz} = .20$, and $\lambda_{za} = .25$. The implied correlation between $x_{it}$ and $z_{it}$ is about .245, which seems to be a reasonable value for panel data. For comparison, we used a data set provided with Wooldridge (2002) on domestic route air fares for 1,149 routes in the United States for 1997 through 2000. (The data set is called AIRFARE.) The correlation between the log of air fare (an endogenous explanatory variable in a passenger demand equation) and the instrumental variable candidate, the concentration ratio on the route, is about -.22, which is similar in magnitude to .245.

Table A.1 reports the simulation outcomes for $T = 5$, where $\xi = .12$, while Table A.2 covers the case $T = 10$, where $\xi = .06$. When $\xi = .12$, the correlation between $x_{i1}$ and $b_i$ is slightly less than .24; when $\xi = .06$, it is just below .12. Columns 1 through 6 contain the mean, standard deviation (SD), root mean squared error (RMSE), lower quartile (LQ), median, and upper quartile (UQ) of the PAE estimates from 500 replications. The rows of each table report statistics for the usual pooled ordinary least squares (POLS) estimates on the original data; the usual fixed effects estimates (FE-OLS), which are just pooled OLS on the time-demeaned data; pooled instrumental variables (IV) estimates using the original data; fixed effects instrumental variables estimates without period dummy variables (FE-IV without dummies); and fixed effects instrumental variables estimates with a full set of period dummy variables (FE-IV with dummies).
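The six summary columns can be computed from a vector of replication estimates as sketched below. The text does not say whether the SD uses an $R - 1$ divisor or how quartiles are interpolated; the sketch assumes the sample SD (`ddof=1`) and NumPy's default linear interpolation, and `summarize_estimates` is a hypothetical helper name.

```python
import numpy as np


def summarize_estimates(estimates, true_value):
    """Mean, SD, RMSE, LQ, median and UQ of Monte Carlo estimates of a scalar PAE."""
    est = np.asarray(estimates, dtype=float)
    return {
        "mean": est.mean(),
        "sd": est.std(ddof=1),                               # sample SD across replications
        "rmse": np.sqrt(np.mean((est - true_value) ** 2)),   # deviations from the true PAE
        "lq": np.percentile(est, 25),
        "median": np.median(est),
        "uq": np.percentile(est, 75),
    }


# Hypothetical usage with 500 simulated replication estimates
rng = np.random.default_rng(0)
print(summarize_estimates(1.0 + 0.1 * rng.standard_normal(500), true_value=1.0))
```

Because the RMSE is measured around the true value rather than the mean, it combines the SD and the bias, which is why it is the natural criterion for the bias-variance comparisons that follow.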

From the tables, we see that the POLS estimates are roughly 1.5 times the true value of $\beta$ in the samples of 100, 400, and 800 observations. One source of bias in the POLS estimates is the correlation between the unobserved heterogeneity $a_i$ and the regressor $x_{it}$. A second source is the endogeneity of the regressor $x_{it}$, whose correlation coefficient with $u_{it}$, $\rho_{xu}$, is very close to .4. A third source of bias (and inconsistency) is the correlation between $x_{it}$ and $b_i$.

The within transformation eliminates $a_i$, and so the correlation between $x_{it}$ and $a_i$ is not a source of bias for the usual FE-OLS estimator. But FE-OLS still produces a biased estimator of $\beta$ for the last two reasons mentioned above. The bias of the FE-OLS estimator is much lower than that of POLS, but it is still on the order of 30 percent.

The pooled IV estimator, that is, IV without removing time averages and without time period dummies, actually has a larger bias than the FE-OLS estimator, a finding that is not too surprising because the instruments are correlated with $a_i$. Using the FE transformation combined with IV eliminates the dependence between $z_{it}$ and $a_i$ because $z_{it} = \lambda_{za} a_i + \sqrt{1 - \lambda_{za}^2}\, r_{it}$. Therefore, the FE-IV estimator (without time dummies) has a smaller bias and a considerably smaller RMSE than the pooled IV estimator. More importantly, the FE-IV estimator with period dummies has the lowest RMSE among all estimators for all sample sizes and both time horizons. Moreover, its RMSE falls quickly as the sample size, $N$, grows. Without period dummies, the FE-IV estimates of $\beta$ are biased by at least 20 percent, and the bias does not disappear as $N \to \infty$. As $T$ increases, the RMSE of the FE-IV estimator without dummies decreases, but it remains higher than that of the FE-IV estimator with period dummy variables. Thus, even though the structural model (1.20) does not contain a time trend, including a full set of period dummies ensures the consistency of FE-IV estimation.
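The mechanism behind this result can be illustrated with a small simulation. The sketch below is not the design used for Tables A.1 and A.2, whose full specification (1.19) is not reproduced here; it is a hypothetical design in the same spirit. Its assumptions: $x_{it}$ loads on $\xi(1 + t) d_i$ with $d_i = b_i - \beta$ and $\sigma_b = 1$, and the instrument carries a common trend $\gamma t$, one simple way to make $z_{it}$ correlated with the time dummies. Under these assumptions the composite error $x_{it} d_i + u_{it}$ has conditional mean $\xi(1 + t)$, which varies with $t$; without period dummies this term stays in the error and is correlated with the trending demeaned instrument, while with period dummies it is absorbed. The names `within`, `fe_iv`, and `simulate` and all parameter values are illustrative.

```python
import numpy as np


def within(v):
    """Subtract each unit's time average (rows = units, columns = periods)."""
    return v - v.mean(axis=1, keepdims=True)


def fe_iv(y, x, z, time_dummies):
    """Just-identified FE-IV slope on x, acting as if b_i were constant.

    y, x, z are N x T arrays. With time_dummies=True, time-demeaned period
    indicators (one dropped) enter as exogenous regressors and as their own instruments.
    """
    N, T = y.shape
    yd = within(y).reshape(-1)
    X = within(x).reshape(-1, 1)
    Z = within(z).reshape(-1, 1)
    if time_dummies:
        D = np.tile(np.eye(T), (N, 1))   # row (i, t) has a one in column t
        Dd = (D - 1.0 / T)[:, 1:]        # within-transformed dummies, first period dropped
        X = np.hstack([X, Dd])
        Z = np.hstack([Z, Dd])
    coef = np.linalg.solve(Z.T @ X, Z.T @ yd)
    return coef[0]


def simulate(N, T, beta=1.0, xi=0.2, gamma=0.5, seed=0):
    """Hypothetical CRC design; the common trend gamma * t in z_it is an assumption."""
    rng = np.random.default_rng(seed)
    lam_xz, lam_xu, lam_xa, lam_za = 0.5, 0.4, 0.2, 0.25   # illustrative values
    t = np.arange(1, T + 1)[None, :]
    a = rng.standard_normal((N, 1))                 # unobserved effect a_i
    d = rng.standard_normal((N, 1))                 # d_i = b_i - beta, sigma_b = 1
    z = (lam_za * a + np.sqrt(1 - lam_za**2) * rng.standard_normal((N, T))
         + gamma * t)
    u = rng.standard_normal((N, T))
    x = (lam_xz * z + lam_xu * u + lam_xa * a + xi * (1 + t) * d
         + 0.7 * rng.standard_normal((N, T)))
    y = a + x * (beta + d) + u                      # structural model as in (1.20)
    return y, x, z


y, x, z = simulate(N=20000, T=5)
b_dum = fe_iv(y, x, z, time_dummies=True)
b_nod = fe_iv(y, x, z, time_dummies=False)
print(f"FE-IV with dummies: {b_dum:.3f}, without dummies: {b_nod:.3f} (beta = 1)")
```

For this particular design, a short calculation gives a probability limit of roughly 1.32 for the estimator without dummies, while the estimator with dummies is consistent for $\beta = 1$; the dummies appear in both the regressor and instrument lists because they are exogenous.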

Not surprisingly, the FE-OLS estimator has a smaller standard deviation than the FE-IV estimator (both without time dummies). Methods that treat regressors as exogenous typically have substantially less sampling variation than their IV counterparts because the correlation between the instrument and the regressor is typically well below one, as in the current simulation.

The difference between the FE-IV estimates with and without time dummies illustrates the trade-off between bias and variance. The FE-IV estimates without time period dummy variables are always less variable than the FE-IV estimates with time dummies. This is hardly surprising: including additional explanatory variables that are correlated with the instrument, here the time dummies, induces multicollinearity in the IV estimates. The instrument $z_{it}$ is constructed to be correlated with the time dummies, and so the FE-IV estimator with time dummies is less precise than the one without. But, of course, the estimator without time dummies suffers from substantial bias, even though the structural model does not contain separate period intercepts. The RMSE of the FE-IV estimator that includes a full set of dummies is much lower than that of the estimator that does not.

We also conducted simulations with more variability in the random coefficient, namely $\sigma_b^2 = 4$, so that the standard deviation of $b_i$ is double that in Tables A.1 and A.2. The results of these simulations are not included here but are available on request. With more variability in $b_i$, the bias induced by failing to include time dummies in the FE-IV estimation is more pronounced (even though, remember, the structural model does not include time effects). For example, with $T = 5$ and $N = 800$, the RMSE of the FE-IV estimator without dummies is about 1.36, compared with about .22 for the estimator that includes the dummies.

For the next set of simulations, we take $w_t \equiv (1, t)$, $t = 1, \dots, T$, in (1.1), so that each cross-sectional unit has its own linear trend. In particular, we generate $y_{it}$ as

$$y_{it} = a_{i0} + a_{i1} t + x_{it} b_i + u_{it}, \quad t = 1, \dots, T, \qquad (1.21)$$

where $a_{i0}$ and $a_{i1}$ are independent Normal(0, 1) random variables, and $b_i$ and $u_{it}$ are defined above. The endogenous explanatory variable $x_{it}$ is generated as

$$x_{it} \equiv \lambda_{xz} z_{it} + \lambda_{xu} u_{it} + \lambda_{xa}(a_{i0} + a_{i1}) + \xi b_i + \xi t d_i + \sqrt{1 - \lambda_{xz}^2 - \lambda_{xu}^2 - 2\lambda_{xa}^2 - \xi^2(1 + t)^2}\; e_{it}, \qquad (1.22)$$

and the instrument is generated as $z_{it} = \lambda_{za} a_{i0} + \sqrt{1 - \lambda_{za}^2}\, r_{it}$. Again, the coefficient on $e_{it}$ is chosen so that $\operatorname{Var}(x_{it}) = 1$ if $\lambda_{za} = 0$. We use the same values for the $\lambda$ parameters as in Tables A.1 and A.2, and we take $\sigma_b = 1$. (Simulation findings for the case $\sigma_b = 2$ are available on request.) Because the structural model (1.21) contains a time trend, the default is to include a full set of time period dummies in the various estimation methods. For comparison, we also include the FE-IV estimator without time period dummies.

The rows of Tables A.3 and A.4 report statistics for POLS with time dummies, fixed effects with time dummies, pooled instrumental variables with time dummies, fixed effects instrumental variables with time dummies, and fixed effects instrumental variables without time dummies. As in Tables A.1 and A.2, the simulation findings are unambiguous: fixed effects IV with a full set of time dummies is superior, by far, to the other estimation methods for all combinations of $N$ and $T$. Perhaps not surprisingly, when $y_{it}$ is itself trending, omitting aggregate time effects is much more detrimental than in the previous case.

The simulation findings are perhaps not too surprising: the only estimator that is essentially unbiased for the PAE removes the unobserved effect (or, more generally, the individual-specific trends), includes a full set of aggregate time effects, and instruments for the endogenous explanatory variable. Nevertheless, it is useful to see that the theoretical findings in Section 1.3 have practically important implications: the FE-IV estimator with time dummies is robust to correlation between the random coefficients and the explanatory variable, at least under assumptions that can be met by continuous endogenous explanatory variables.

1.6 Conclusion

This paper suggests a set of conditions sufficient for applying the standard IV approach to the estimation of population averaged effects in a correlated random coefficient panel data model with (roughly) continuous endogenous explanatory variables. Assumptions 1 through 4 ensure consistent FE-IV estimation of the population averaged slopes, $\beta$, even when the individual-specific slopes are ignored. Monte Carlo simulations suggest that the proposed FE-IV estimator of the PAE, provided a full set of period dummy variables is included, performs better in finite samples than the other estimators considered when the endogenous explanatory variables are (roughly) continuous.

A natural direction for future work is to relax the homoskedasticity assumption on $E(\tilde{x}_{it} d_i \mid \tilde{z}_{it})$, that is, to allow it to depend on $\tilde{z}_{it}$; Card (2001) showed how the analogous assumption can fail in a cross-sectional environment. Murtazashvili (2006) shows how this assumption can be relaxed with a control function approach, by putting restrictions on the reduced forms of the endogenous elements of $x_{it}$ (restrictions that can be met by roughly continuous variables) and by modeling the conditional covariances.

CHAPTER 2

A CONTROL FUNCTION APPROACH TO ESTIMATION OF CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS

2.1 Introduction

Recently, considerable attention has been devoted to estimation of average partial effects (APEs) in correlated random coefficient (CRC) models, in both cross-section and panel data settings. Most studies use a cross-sectional setup, with few exceptions for panel data. CRC panel data models have been investigated for both exogenous and endogenous explanatory variables. Wooldridge (2005a) discusses fixed effects estimation of a CRC model with exogenous explanatory variables in a panel data setting. Murtazashvili and Wooldridge (2005) address fixed effects instrumental variables (FE-IV) estimation of APEs with (at least roughly) continuous endogenous regressors in CRC panel data models.¹ One of the main conditions for consistent estimation of APEs in their study is that the covariance between the detrended endogenous regressors and the individual heterogeneity, conditional on the transformed instruments, does not depend on the detrended instruments. Card (2001) shows for cross-sectional data that this assumption can be violated for roughly continuous endogenous explanatory variables because of heteroskedasticity in the variance-covariance matrix of the explanatory variables and the individual heterogeneity conditional on the instruments. He rejects this assumption using IQ as a proxy for unobserved ability and a binary indicator for college proximity as an instrument for education in the human capital earnings model. Wooldridge (2005b) proposes conditions weaker than those in Murtazashvili and Wooldridge (2005) for obtaining consistent APE estimates for (roughly) continuous regressors subject to Card's problem in a cross-sectional setup.

¹We refer to continuous variables with some discrete characteristics as roughly continuous, and discuss this kind of variable in the next section.

In this paper, we study the model in Murtazashvili and Wooldridge (2005) but, in addition to allowing some explanatory variables to be correlated with the idiosyncratic error, we correct for the drawback described in Card (2001) while still allowing the endogenous regressors to be (roughly) continuous. We use a control function approach, which introduces residuals from the reduced form for the endogenous regressors as covariates in the structural model. We propose a two-step method that accounts for endogeneity and consistently estimates APEs in CRC panel data models with endogenous (roughly) continuous regressors. The motivation for our two-step panel data procedure comes from a cross-sectional study by Wooldridge (2005b). Further, we relax the assumptions in Wooldridge (2005a) and Murtazashvili and Wooldridge (2005) by allowing the individual slopes in a CRC model to vary over time. Both time-constant and time-varying individual slopes are covered in this paper.

Monte Carlo simulations indicate that, in finite samples, the control function (CF) approach we propose for estimating the CRC balanced panel data model with time-invariant individual heterogeneity performs better than other estimators when the joint distribution of the individual heterogeneity and the endogenous regressors conditional on the detrended instruments depends on the instrumental variables. We apply the proposed method to estimating the average partial effects of annual hours of on-the-job training on output scrap rates for manufacturing firms in Michigan, using firm-level data for 1987 through 1989. The proposed control function approach delivers APEs of annual hours of job training on output scrap rates that are larger in magnitude and statistically more significant than the APE estimates from the FE-IV approach.

2.2 Model of Interest for Balanced Panels

For a random draw $i$ from the population, the structural model is

$$y_{1it} = w_t a_i + x_{it} b_i + u_{it}, \quad t = 1, \dots, T, \qquad (2.1)$$

where $w_t$ is a $1 \times J$ vector of aggregate time variables, which we treat as nonrandom; $a_i$ is a $J \times 1$ vector of individual-specific slopes on the aggregate variables; $x_{it}$ is a $1 \times K$ vector of covariates that change across time, consisting of exogenous covariates $z_{1it}$ and an endogenous covariate $y_{2it}$ (in general, $x_{it} = f(z_{1it}, y_{2it})$); $b_i$ is a $K \times 1$ vector of individual-specific slopes; and $u_{it}$ is an idiosyncratic error. For simplicity, assume $x_{it} = (z_{1it}, y_{2it})$. Let $z_{it} = (z_{1it}, z_{2it})$ be a $1 \times L$ vector of instrumental variables, with $L \geq K$; that is, we assume the vector $z_{2it}$ contains at least one element. We assume a random sample of size $N$ drawn from the population, with $T$ fixed in the asymptotic analysis. For the purpose of this paper, we assume a balanced panel.

Our object of interest is $\beta = E(b_i)$, the $K \times 1$ vector of average partial effects, i.e., the vector of partial effects averaged over the population distribution of the unobserved heterogeneity. The APEs are usually of primary interest to empirical analysts. Another empirical question of possible interest is estimating the $b_i$ themselves. However, when we treat the $b_i$ as parameters, they cannot be estimated precisely unless $T$ is large. As an alternative, we turn to estimation of average partial effects in our model.

Following Murtazashvili and Wooldridge (2005), we study estimators of $\beta$ that are based on the assumption that the slopes $b_i$ are constant, but we study the properties of these estimators in the context of model (2.1). We write $b_i = \beta + d_i$, with $E(d_i) = 0$ by definition. In other words, we assume that the individual slopes have constant mean $\beta$ and random deviations $d_i$. Substitution into (2.1) gives

$$y_{1it} = w_t a_i + x_{it}\beta + (x_{it} d_i + u_{it}) \equiv w_t a_i + x_{it}\beta + v_{1it}, \qquad (2.2)$$

where $v_{1it} \equiv x_{it} d_i + u_{it}$. We estimate $\beta$ in (2.1) allowing the entire vector $b_i$ to vary by $i$ and to be arbitrarily correlated with $x_{it}$. Following a cross-sectional definition from Heckman and Vytlacil (1998), we call (2.1) a correlated random coefficient model because of the possible correlation between $b_i$ and $x_{it}$.

In this paper, we develop a two-step estimation method, motivated by Wooldridge (2005b), for obtaining consistent estimates of the average partial effects. The method we employ is called a control function approach, which was pioneered by Smith and Blundell (1986) and Rivers and Vuong (1988). The main idea of the control function method is to add control variables to the structural model to account for the endogeneity problem (regardless of its exact nature). To use the control function approach in our case, we need to make assumptions about the nature of the endogeneity in the random coefficient model. Since there are two sources of endogeneity in our model, namely the correlation between the unobserved heterogeneity and the regressor $y_{2it}$, and the correlation between that regressor and the structural error, we model the relationships among the random coefficients, the exogenous covariates, and the error from the reduced-form equation for the endogenous explanatory variable.

First, we assume there is some strictly monotonic function $h(\cdot)$ defined on the support of $y_{2it}$ such that

$$h(y_{2it}) = \delta_{20} + z_{it}\delta_{21} + \bar{z}_i\delta_{22} + v_{2it}, \quad t = 1, \dots, T, \qquad (2.3)$$

$$E(v_{2it} \mid z_{i1}, \dots, z_{iT}) = 0, \quad t = 1, \dots, T, \qquad (2.4)$$

where $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$, the $v_{2it}$ are error terms, and

$$E(u_{it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = E(u_{it} \mid v_{2i1}, \dots, v_{2iT}) = \rho_1 v_{2it} + \rho_2 \bar{v}_{2i}, \quad t = 1, \dots, T, \qquad (2.5)$$

where $\rho_1$ and $\rho_2$ are scalars and $\bar{v}_{2i} = T^{-1}\sum_{t=1}^{T} v_{2it}$. Assumption (2.5) is stronger than just assuming that $u_{it}$ is uncorrelated with $z_i$. There are two parts to this assumption. The first equality says that $u_{it}$ is conditional mean independent of $z_i$ given $v_{2i1}, \dots, v_{2iT}$. This is always true if $(u_{it}, v_{2i1}, \dots, v_{2iT})$ and $z_i$ are independent. The second equality states that $E(u_{it} \mid v_{2i1}, \dots, v_{2iT})$ is linear. Assumption (2.5) holds if $v_{2it} = c_{2i} + e_{2it}$, where $\{(u_{it}, e_{2it})\}$ is independently and identically distributed and all conditional expectations are linear. Thus, we regard (2.5) as a valid extension to CRC panel data models. Following Rivers and Vuong (1988), we call equation (2.3) a reduced-form equation.

Strict monotonicity of $h(\cdot)$ implies that $y_{2it}$ is a well-defined function of $\{z_{i1}, \dots, z_{iT}\}$ and $v_{2it}$. Further, assumptions (2.3) and (2.4) mean that when some function $h(\cdot)$ is applied to the endogenous explanatory variable $y_{2it}$, the result has a linear conditional mean given all the instruments. In other words, linearity of $E(y_{2it} \mid z_{i1}, \dots, z_{iT})$ itself might not be an appropriate assumption, while we still want $y_{2it}$ to enter the regression equation linearly. Assumption (2.4) always holds if $v_{2it}$ is independent of $z_i$. In the standard case of a continuous $y_{2it}$ with a large support, assumptions (2.3) and (2.4) are reasonable in many situations. But if the endogenous covariate has characteristics that are not quite suitable for a continuous variable, these assumptions do not generally hold. For example, suppose a continuous variable $y_{2it}$ with a large support is defined according to (2.3) with $h(\cdot)$ the identity, so that $v_{2it} \mid z_i \sim \text{Normal}(0, \sigma_{2i}^2)$, where $\sigma_{2i}^2 = \operatorname{Var}(v_{2it} \mid z_i)$ is a conditional variance that depends on $z_i$. In this case, we can standardize $v_{2it}$ to obtain $v_{2it}/\sigma_{2i}$, which is independent of $z_i$, guaranteeing that assumption (2.4) is satisfied. However, assumption (2.4) is unlikely to hold if $y_{2it}$ has some "discrete"-type characteristics. For instance, let $y_{2it}$ be a binary variable, so that $y_{2it} \mid z_i$ follows a probit model. Even after standardizing the error term $v_{2it}$, we cannot hope to obtain a new error that is independent of $z_i$.

For the purpose of our study, we refer to continuous variables with some discrete characteristics as roughly continuous, to distinguish them from traditional continuous variables and to emphasize that roughly continuous variables do not always share the convenient properties of continuous variables. Possible examples of such variables are income, education, and experience. Garen (1984) discusses estimation of models in the presence of selection bias when the choice variable is continuous and the choice set is ordered. He suggests treating the level of education in the human capital earnings model as such a continuous variable: on the one hand, schooling is traditionally thought of as a continuous variable; on the other hand, only integer values of that variable are observed.

Which functions can we use as the strictly monotonic function $h(\cdot)$ in (2.3)? For the trivial case of a continuous $y_{2it}$ with a large support, we can use $h(y_{2it}) = y_{2it}$. When the nature of $y_{2it}$ is more "exotic," the choice of $h(\cdot)$ is not as straightforward. For instance, Wooldridge (2005b) suggests using $h(y_{2it}) = \log[y_{2it}/(1 - y_{2it})]$ when $y_{2it}$ is a fraction in the open unit interval, and $h(y_{2it}) = \log(y_{2it})$ when $y_{2it} > 0$. Assumptions (2.3) and (2.4) rule out probit, logit, and Tobit models because there $y_{2it}$ has discrete characteristics.
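Both transformations are strictly increasing on their domains, so $y_{2it}$ can always be recovered from $h(y_{2it})$, which is what the argument above needs. A minimal sketch of the two choices of $h(\cdot)$; the function names `h_fraction` and `h_positive` are hypothetical.

```python
import numpy as np


def h_fraction(y2):
    """Log-odds transform log[y2 / (1 - y2)] for 0 < y2 < 1."""
    y2 = np.asarray(y2, dtype=float)
    if np.any((y2 <= 0) | (y2 >= 1)):
        raise ValueError("the log-odds transform needs 0 < y2 < 1")
    return np.log(y2 / (1 - y2))


def h_positive(y2):
    """Log transform for y2 > 0."""
    y2 = np.asarray(y2, dtype=float)
    if np.any(y2 <= 0):
        raise ValueError("the log transform needs y2 > 0")
    return np.log(y2)


grid = np.linspace(0.01, 0.99, 99)
h = h_fraction(grid)
print(np.all(np.diff(h) > 0))                     # strictly increasing on the grid
print(np.allclose(1 / (1 + np.exp(-h)), grid))    # y2 recovered by the logistic function
```

The domain checks mirror the text: the log-odds transform is only defined for fractions strictly inside (0, 1), which is one reason a variable with mass points at 0 or 1 falls outside (2.3) and (2.4).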

For example, suppose we want to estimate whether per-pupil spending affects math test pass rates for fourth graders in Michigan, and the endogenous variable of interest is per-pupil spending. Per-pupil spending is a roughly continuous variable, and a log transformation of it is appropriate. If we employ logged per-pupil spending as the endogenous covariate in the model, then it can be thought of as a continuous variable, and the function $h(y_{2it}) = y_{2it}$ with $y_{2it} = \log(\text{per-pupil spending})$ is clearly adequate.

Second, we need to make assumptions about the distribution of $(a_i, v_{2it})$ conditional on the instruments. We assume

$$E(a_i \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = E(a_i \mid \bar{z}_i, \bar{v}_{2i}) \qquad (2.6)$$

and

$$E(a_i \mid \bar{z}_i, \bar{v}_{2i}) = \alpha + A_1\bar{z}_i' + (A_2 + A_3\bar{z}_i')\bar{v}_{2i}, \qquad (2.7)$$

where $\alpha$ and $A_2$ are $J \times 1$ vectors, $A_1$ and $A_3$ are $J \times L$ matrices of constants, and $\bar{z}_i$ and $\bar{v}_{2i}$ are defined above. Assumption (2.6) means that $\bar{z}_i$ and $\bar{v}_{2i}$ can be thought of as sufficient statistics for describing the relationship between $a_i$ and the history $\{(z_{it}, v_{2it}) : t = 1, \dots, T\}$. Assumption (2.7) specifies a particular functional form for the relationship among $a_i$, $\bar{z}_i$, and $\bar{v}_{2i}$. Interactions between the exogenous variables $z_{it}$ and the errors $v_{2it}$ might be important. In a cross-sectional context, Card (2001) shows that the joint distribution of $(a_i, v_{2it})$ given $z_{it}$ can depend on $z_{it}$ because of heteroskedasticity in $\operatorname{Var}(a_i, v_{2it} \mid z_{it})$. He illustrates this using IQ as a proxy for unobserved ability and a binary indicator for college proximity as an instrument for education in the human capital earnings model. Assumptions (2.4) and (2.7) can still hold even when the conditional variance-covariance matrix $\operatorname{Var}(a_i, v_{2it} \mid z_i)$ is heteroskedastic.

Third, we need to make assumptions about the expected value of $d_i$ conditional on $\{z_{it}\}$ and $\{v_{2it}\}$. We assume

$$E(d_i \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = E(d_i \mid \bar{z}_i, \bar{v}_{2i}) \qquad (2.8)$$

and, in particular,

$$E(d_i \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = B_1(\bar{z}_i - \mu_z)' + (B_2 + B_3\bar{z}_i')\bar{v}_{2i}, \qquad (2.9)$$

where $\mu_z \equiv E(\bar{z}_i)$, $B_1$ and $B_3$ are $K \times L$ matrices and $B_2$ is a $K \times 1$ vector of constants, and $\bar{z}_i$ and $\bar{v}_{2i}$ are defined above. In Murtazashvili and Wooldridge (2005), one of the conditions for consistency of the estimator of $\beta$ states that the covariance between $d_i$ and the detrended $x_{it}$ conditional on the detrended $z_{it}$ equals its unconditional counterpart; that is, it does not depend on the detrended $z_{it}$. For the reasons mentioned earlier, this assumption might be too restrictive for roughly continuous endogenous explanatory variables. In this study, we relax this assumption, not only by working with the original data, but also by allowing the covariance between $d_i$ and $x_{it}$ conditional on the instruments to be a function of $z_{it}$. The conditions we employ in this paper ensure consistent estimation of $\beta$ when the endogenous explanatory variables are roughly continuous.

We then take the expectation of equation (2.2) conditional on $(z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT})$, use the fact that $y_{2it}$ is a deterministic function of $(z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT})$, and apply assumptions (2.3) through (2.9). The resulting estimating equation is

$$E(y_{1it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = w_t\alpha + (\bar{z}_i \otimes w_t)\alpha_1 + \bar{v}_{2i} w_t\alpha_2 + \bar{v}_{2i}(\bar{z}_i \otimes w_t)\alpha_3 + x_{it}\beta + ((\bar{z}_i - \mu_z) \otimes x_{it})\beta_1 + \bar{v}_{2i} x_{it}\beta_2 + \bar{v}_{2i}(\bar{z}_i \otimes x_{it})\beta_3 + \rho_1 v_{2it}, \qquad (2.10)$$

where $t = 1, \dots, T$. Here, $H = 2(1 + L)(J + K) + 1$ is the total number of second-stage explanatory variables. Equation (2.10) is an estimating equation for obtaining consistent estimates of the APE, $\beta$. Importantly, the components of $z_{2it}$, the instrumental variables excluded from the structural equation (2.2), do not enter the estimating equation (2.10) in levels or interacted only with $z_{1it}$. Generally, if any of these terms were included in (2.10), we would lose identification. [See Wooldridge (2005b) for more details.]
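A simple numerical illustration of the identification point, assuming a scalar $y_{2it}$ with $h(\cdot)$ the identity, a single excluded instrument, and no $z_{1it}$. Because the first-stage fit holds exactly in sample, $y_{2it}$ is a linear combination of $1$, $z_{it}$, $\bar{z}_i$, and $\hat{v}_{2it}$; a second stage that already contains $1$, $\bar{z}_i$, $y_{2it}$, and $\hat{v}_{2it}$ therefore becomes rank deficient once $z_{2it}$ is added in levels. The data are simulated and every value is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 4
z = rng.standard_normal((N, T))                       # one excluded instrument (L = 1)
zbar = np.repeat(z.mean(axis=1, keepdims=True), T, axis=1)
y2 = 0.5 + 1.0 * z + 0.5 * zbar + rng.standard_normal((N, T))   # (2.3) with h = identity

# First stage: POLS of y2 on (1, z_it, zbar_i); the residuals are the control function
F = np.column_stack([np.ones(N * T), z.ravel(), zbar.ravel()])
delta2, *_ = np.linalg.lstsq(F, y2.ravel(), rcond=None)
v2hat = y2.ravel() - F @ delta2

ok = np.column_stack([np.ones(N * T), zbar.ravel(), y2.ravel(), v2hat])
bad = np.column_stack([ok, z.ravel()])                # also adds z_2it in levels
rank_ok = np.linalg.matrix_rank(ok)
rank_bad = np.linalg.matrix_rank(bad)
print(rank_ok, "of", ok.shape[1], "|", rank_bad, "of", bad.shape[1])
```

The first design has full column rank, while the second loses one dimension: $z_{2it}$ is exactly recoverable from $y_{2it}$, $\hat{v}_{2it}$, $\bar{z}_i$, and the constant.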


In some cases, we might think that assumption (2.8) is too restrictive. In some potential applications, we might want to allow the random coefficients to vary not only across $i$ but also across $t$. In other words, for a random draw $i$ from the population, the structural model becomes

$$y_{1it} = w_t a_i + x_{it} b_{it} + u_{it}, \quad t = 1, \dots, T, \qquad (2.11)$$

where $b_{it}$ is a $K \times 1$ vector of time-varying individual-specific slopes. We write $b_{it} = \beta + q_{it}$, with $E(q_{it}) = 0$ by definition. In other words, we assume that the individual slopes have constant mean $\beta$ and random deviations $q_{it}$. Further, we assume that $q_{it}$ consists of both time-constant and time-varying zero-mean components, i.e., $q_{it} = d_i + r_{it}$. Substitution into (2.11) gives

$$y_{1it} = w_t a_i + x_{it}\beta + (x_{it} q_{it} + u_{it}) \equiv w_t a_i + x_{it}\beta + v_{1it}, \qquad (2.12)$$

where $v_{1it} \equiv x_{it} q_{it} + u_{it}$. The estimating equation for model (2.12) then needs to be expanded, compared with (2.10), to reflect the time-varying nature of the multiplicative individual heterogeneity. Assumptions (2.8) and (2.9) can be replaced with the following assumptions about the error term and the distribution of $(q_{it}, v_{2it})$ conditional on the instruments:

$$E(q_{it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = E(q_{it} \mid \bar{z}_i, \bar{v}_{2i}, v_{2it}, z_{it}), \qquad (2.13)$$

which says that $E(q_{it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT})$ depends only on the time-$t$ values and the time averages. Since we maintain that $q_{it}$ consists of time-constant and time-varying components, $d_i$ and $r_{it}$, respectively, assumption (2.13) reflects the nature of $q_{it}$. Finally, we assume

$$E(q_{it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = E(d_i + r_{it} \mid \bar{z}_i, \bar{v}_{2i}, v_{2it}, z_{it}) = B_1(\bar{z}_i - \mu_z)' + (B_2 + B_3\bar{z}_i')\bar{v}_{2i} + (B_4 + B_5 z_{it}')v_{2it}, \qquad (2.14)$$

where $\mu_z \equiv E(\bar{z}_i)$, $B_2$ and $B_4$ are $K \times 1$ vectors, $B_j$, $j = 1, 3, 5$, are $K \times L$ matrices of constants, and $\bar{z}_i$ and $\bar{v}_{2i}$ are defined above. Clearly, the right-hand side of (2.14) is identical to that of (2.9) when $B_4 = B_5 = 0$.

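Then, similar to the case of time-invariant individual heterogeneity, we take the expectation of equation (2.12) conditional on $(z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT})$, use the fact that $y_{2it}$ is a deterministic function of $(z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT})$, and apply assumptions (2.3) through (2.7), (2.13), and (2.14). The resulting estimating equation is

$$E(y_{1it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = w_t\alpha + (\bar{z}_i \otimes w_t)\alpha_1 + \bar{v}_{2i} w_t\alpha_2 + \bar{v}_{2i}(\bar{z}_i \otimes w_t)\alpha_3 + x_{it}\beta + ((\bar{z}_i - \mu_z) \otimes x_{it})\beta_1 + \bar{v}_{2i} x_{it}\beta_2 + \bar{v}_{2i}(\bar{z}_i \otimes x_{it})\beta_3 + v_{2it} x_{it}\beta_4 + v_{2it}(z_{it} \otimes x_{it})\beta_5 + \rho_1 v_{2it}, \qquad (2.15)$$

where $t = 1, \dots, T$, $\alpha_1 = \operatorname{vec}[A_1]$, $\alpha_2 = \operatorname{vec}[A_2] + (\rho_2, 0, \dots, 0)'$, where $(\rho_2, 0, \dots, 0)'$ is a $J \times 1$ vector, $\alpha_3 = \operatorname{vec}[A_3]$, and $\beta_j = \operatorname{vec}[B_j]$, $j = 1, \dots, 5$. Once again, equation (2.15) is an estimating equation for obtaining consistent estimates of the APE, $\beta$. When $w_t \equiv 1$, $t = 1, \dots, T$, equation (2.15) simplifies to

$$E(y_{1it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = \alpha + \bar{z}_i\alpha_1 + \bar{v}_{2i}\alpha_2 + \bar{v}_{2i}\bar{z}_i\alpha_3 + x_{it}\beta + ((\bar{z}_i - \mu_z) \otimes x_{it})\beta_1 + \bar{v}_{2i} x_{it}\beta_2 + \bar{v}_{2i}(\bar{z}_i \otimes x_{it})\beta_3 + v_{2it} x_{it}\beta_4 + v_{2it}(z_{it} \otimes x_{it})\beta_5 + \rho_1 v_{2it}, \quad t = 1, \dots, T. \qquad (2.16)$$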

2.3 Estimating Procedure and Calculation of Standard Errors

We employ a control function approach that uses the reduced-form errors $v_{2it}$ as "control variables" for heterogeneity and endogeneity in the structural model. A two-step method that consistently estimates the parameters of (2.11) is the following:

 

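1. Run the POLS regression of

$$h(y_{2it}) \text{ on } 1,\ z_{it},\ \bar{z}_i, \quad i = 1, \dots, N,\ t = 1, \dots, T, \qquad (2.17)$$

and save the residuals $\hat{v}_{2it}$, $i = 1, \dots, N$, $t = 1, \dots, T$. Obtain $\hat{\bar{v}}_{2i} = T^{-1}\sum_{t=1}^{T}\hat{v}_{2it}$, $i = 1, \dots, N$.

2. Run the POLS regression of

$$y_{1it} \text{ on } w_t,\ \operatorname{vec}[(\bar{z}_i \otimes w_t)]',\ \hat{\bar{v}}_{2i} w_t,\ \operatorname{vec}[(\bar{z}_i \otimes w_t)]'\hat{\bar{v}}_{2i},\ x_{it},\ \operatorname{vec}[((\bar{z}_i - \bar{z}) \otimes x_{it})]',\ \hat{\bar{v}}_{2i} x_{it},\ \operatorname{vec}[(\bar{z}_i \otimes x_{it})]'\hat{\bar{v}}_{2i},\ \hat{v}_{2it} x_{it},\ \operatorname{vec}[(z_{it} \otimes x_{it})]'\hat{v}_{2it},\ \hat{v}_{2it}, \qquad (2.18)$$

where $i = 1, \dots, N$, $t = 1, \dots, T$, and $\bar{z} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}$, and obtain $\hat{\beta}$ and the other parameter estimates. Terms containing the vec operator denote all possible interactions among the variables. For example, the term $\operatorname{vec}[(z_{it} \otimes x_{it})]'\hat{v}_{2it}$ in (2.18) consists of $K \cdot L$ interaction terms.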
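The two steps can be sketched in a few lines of NumPy for the simplest case covered by (2.16): a scalar endogenous regressor, so $x_{it} = y_{2it}$ and $K = 1$ with no $z_{1it}$; $w_t \equiv 1$; and $h(\cdot)$ the identity. These restrictions, the function names, and the simulated design (chosen to satisfy (2.3) through (2.9), including a Card-type interaction $B_3 \neq 0$) are illustrative assumptions, not the chapter's simulation design. Setting `time_varying=True` adds the $\beta_4$ and $\beta_5$ terms of (2.15). The sketch returns only $\hat{\beta}$; its standard error requires the first-stage adjustment derived in this section.

```python
import numpy as np


def cf_two_step(y1, y2, z, time_varying=False):
    """Two-step CF estimate of the APE beta for scalar y2 (K = 1, w_t = 1, h = identity).

    y1, y2: (N, T) arrays; z: (N, T, L) instruments. Returns beta_hat.
    """
    N, T, L = z.shape
    zb = np.repeat(z.mean(axis=1, keepdims=True), T, axis=1)    # zbar_i, shape (N, T, L)
    # Step 1 (2.17): POLS of y2 on (1, z_it, zbar_i); keep the residuals
    F = np.concatenate([np.ones((N, T, 1)), z, zb], axis=2).reshape(N * T, -1)
    delta2, *_ = np.linalg.lstsq(F, y2.reshape(-1), rcond=None)
    v2 = (y2.reshape(-1) - F @ delta2).reshape(N, T, 1)          # v2hat_it
    vb = np.repeat(v2.mean(axis=1, keepdims=True), T, axis=1)    # v2bar_i
    x = y2[:, :, None]
    zc = zb - z.reshape(-1, L).mean(axis=0)                      # zbar_i minus overall mean
    # Step 2 (2.18) with w_t = 1; column 0 is x_it, so theta[0] is beta_hat
    cols = [x,                    # beta (the APE)
            np.ones_like(x),      # alpha
            zb, vb, vb * zb,      # alpha_1, alpha_2 (absorbs rho_2), alpha_3
            zc * x,               # beta_1
            vb * x, vb * zb * x]  # beta_2, beta_3
    if time_varying:
        cols += [v2 * x, v2 * z * x]                             # beta_4, beta_5
    cols.append(v2)                                              # rho_1
    G = np.concatenate(cols, axis=2).reshape(N * T, -1)
    theta, *_ = np.linalg.lstsq(G, y1.reshape(-1), rcond=None)
    return theta[0]


def simulate_crc(N, T, beta=2.0, seed=0):
    """Hypothetical DGP satisfying (2.3)-(2.9) with L = 1; coefficients are illustrative."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((N, T, 1))
    zbar = z.mean(axis=1)                                        # (N, 1)
    v2 = rng.standard_normal((N, T))
    v2bar = v2.mean(axis=1, keepdims=True)                       # (N, 1)
    y2 = 0.5 + z[:, :, 0] + 0.5 * zbar + v2                      # (2.3) with h = identity
    a = 1.0 + 0.5 * zbar + 0.5 * v2bar + 0.5 * rng.standard_normal((N, 1))          # (2.7)
    d = 0.5 * zbar + (0.5 + 0.5 * zbar) * v2bar + 0.5 * rng.standard_normal((N, 1))  # (2.9)
    u = 0.5 * v2 + 0.3 * v2bar + rng.standard_normal((N, T))                        # (2.5)
    y1 = a + y2 * (beta + d) + u                                 # (2.1) with w_t = 1
    return y1, y2, z


y1, y2, z = simulate_crc(N=5000, T=5)
beta_hat = cf_two_step(y1, y2, z)
print(f"beta_hat = {beta_hat:.3f} (true beta = 2)")
```

Centering $\bar{z}_i$ at its overall mean before interacting it with $x_{it}$ is what makes the coefficient on $x_{it}$ an estimate of the average slope rather than of the slope at $\bar{z}_i = 0$; the $\hat{\bar{v}}_{2i}$ interactions have mean zero under (2.4) and so do not shift it.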

If we want to test whether the data exhibit time-varying or time-constant individual heterogeneity, we can test the joint significance of $\beta_j$, $j = 4, 5$, in (2.15). The null hypothesis of time-constant individual heterogeneity is $H_0\colon \beta_4 = \beta_5 = 0$. A fully robust Wald statistic, adjusted for the first-stage estimation, is appropriate. If the Wald test rejects the null hypothesis, then the model with time-varying individual heterogeneity, (2.15), should be estimated.

To test for endogeneity of $y_{2it}$ and individual heterogeneity, we can simply test for joint significance of all the second-stage terms other than $w_t$ and $x_{it}$. By construction, the errors from the second stage of the estimating procedure have zero mean conditional on all the explanatory variables in that stage. As a result, the POLS estimates of the second-stage parameters are consistent, and a standard F test of joint significance of all the second-stage terms containing the first-stage residuals and the time-averaged exogenous variables $\bar{z}_i$ is a valid test. If the coefficients on all the second-stage terms that contain the generated regressors and the time-averaged instruments are jointly statistically different from zero, there is an endogeneity and heterogeneity problem, and neglecting it will lead to misspecification.
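A minimal sketch of a Wald test on the second-stage POLS fit, clustered by cross-sectional unit so that it is robust to serial correlation and heteroskedasticity; we use this cluster-robust Wald statistic in place of the F statistic mentioned in the text. Under the null of no endogeneity and no heterogeneity, every coefficient on a term containing the first-stage residuals is zero, so the first-stage estimation does not affect the null distribution and the sketch skips the adjustment. For $H_0\colon \beta_4 = \beta_5 = 0$ those coefficients need not be zero under the null, so the covariance estimator in (2.20) should replace the cluster-robust one. The function name and the simulated check are hypothetical.

```python
import numpy as np


def cluster_robust_wald(G, y, T, R):
    """Wald statistic for H0: R @ theta = 0 after POLS of y on G, clustering by unit.

    Rows of G and y are ordered unit by unit, T rows per unit. No first-stage
    correction is applied. Returns (W, df); W is asymptotically chi-square(df) under H0.
    """
    N = G.shape[0] // T
    theta, *_ = np.linalg.lstsq(G, y, rcond=None)
    e = y - G @ theta
    GtG_inv = np.linalg.inv(G.T @ G)
    scores = (G * e[:, None]).reshape(N, T, -1).sum(axis=1)   # per-unit sums of g_it' e_it
    V = GtG_inv @ (scores.T @ scores) @ GtG_inv
    r = R @ theta
    return float(r @ np.linalg.solve(R @ V @ R.T, r)), R.shape[0]


# Hypothetical check: columns 2 and 3 have zero coefficients, column 1 does not
rng = np.random.default_rng(0)
N, T = 2000, 5
G = np.column_stack([np.ones(N * T), rng.standard_normal((N * T, 3))])
c = np.repeat(rng.standard_normal(N), T)                      # unit-level error component
y = 1.0 + 0.5 * G[:, 1] + c + rng.standard_normal(N * T)
W0, df0 = cluster_robust_wald(G, y, T, np.eye(4)[[2, 3]])
W1, df1 = cluster_robust_wald(G, y, T, np.eye(4)[[1]])
print(f"true null: W = {W0:.2f} (df {df0}); false null: W = {W1:.1f} (df {df1})")
```

Comparing $W$ with chi-square critical values (for example from `scipy.stats.chi2`) gives the p-value; the sketch avoids that dependency.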

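If the null hypothesis of no endogeneity and no individual heterogeneity is rejected, the standard errors from (2.18) should be adjusted for the first-stage estimation of $\delta_2 = (\delta_{20}, \delta_{21}', \delta_{22}')'$, the $(2L + 1) \times 1$ vector of first-stage parameters in (2.17). Define $g_{it}$ to be the $1 \times H$ vector, $H = (2J + 3K)(1 + L) + 1$, of all the second-stage explanatory variables, i.e.,

$$g_{it} = \big(w_t,\ \operatorname{vec}[(\bar{z}_i \otimes w_t)]',\ \bar{v}_{2i} w_t,\ \operatorname{vec}[(\bar{z}_i \otimes w_t)]'\bar{v}_{2i},\ x_{it},\ \operatorname{vec}[((\bar{z}_i - \bar{z}) \otimes x_{it})]',\ \bar{v}_{2i} x_{it},\ \operatorname{vec}[(\bar{z}_i \otimes x_{it})]'\bar{v}_{2i},\ v_{2it} x_{it},\ \operatorname{vec}[(z_{it} \otimes x_{it})]' v_{2it},\ v_{2it}\big).$$

Let $\hat{g}_{it}$ be the $1 \times H$ vector $g_{it}$ with the estimated first-stage residuals $\hat{v}_{2it}$ in place of $v_{2it}$:

$$\hat{g}_{it} = \big(w_t,\ \operatorname{vec}[(\bar{z}_i \otimes w_t)]',\ \hat{\bar{v}}_{2i} w_t,\ \operatorname{vec}[(\bar{z}_i \otimes w_t)]'\hat{\bar{v}}_{2i},\ x_{it},\ \operatorname{vec}[((\bar{z}_i - \bar{z}) \otimes x_{it})]',\ \hat{\bar{v}}_{2i} x_{it},\ \operatorname{vec}[(\bar{z}_i \otimes x_{it})]'\hat{\bar{v}}_{2i},\ \hat{v}_{2it} x_{it},\ \operatorname{vec}[(z_{it} \otimes x_{it})]'\hat{v}_{2it},\ \hat{v}_{2it}\big).$$

Then the estimating equation (2.18) can be rewritten as $y_{1it} = g_{it}\theta + e_{it}$, where $E(e_{it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = 0$ and $\theta$ is the column vector of all the parameters in the estimating equation. Define $y_{1i}$ to be the $T \times 1$ vector of $y_{1it}$, let $G_i$ be the $T \times H$ matrix with $t$th row $g_{it}$, and let $\hat{G}_i$ be the matrix with $t$th row $\hat{g}_{it}$. Then $\theta$ can be estimated as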

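$$\hat{\theta} = \Big(\sum_{i=1}^{N}\hat{G}_i'\hat{G}_i\Big)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}' y_{1it}. \qquad (2.19)$$

Write $y_{1it} = \hat{g}_{it}\theta + (g_{it} - \hat{g}_{it})\theta + e_{it}$. Plugging this into (2.19) and multiplying through by $\sqrt{N}$ gives

$$\sqrt{N}(\hat{\theta} - \theta) = \hat{A}^{-1} N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\big[(g_{it} - \hat{g}_{it})\theta + e_{it}\big],$$

where $\hat{A} = N^{-1}\sum_{i=1}^{N}\hat{G}_i'\hat{G}_i$. By the law of large numbers, $\hat{A} \xrightarrow{p} A \equiv E(G_i'G_i)$. Further, a mean value expansion gives

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}' e_{it} = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} g_{it}' e_{it} + \Big[N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\nabla_{\delta_2} g_{it})\, e_{it}\Big]\sqrt{N}(\hat{\delta}_2 - \delta_2) + o_p(1),$$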

where $\nabla_{\delta_2} g_{it}$ is the $H \times (2L + 1)$ Jacobian of $g_{it}'$ with respect to the first-stage parameters $\delta_2$. For each $(i, t)$, $\nabla_{\delta_2} g_{it}$ is a block matrix of the form

$$\nabla_{\delta_2} g_{it} = -\begin{pmatrix} 0_{J \times (2L+1)} \\ 0_{LJ \times (2L+1)} \\ w_t'\,\bar{z}^F_i \\ \operatorname{vec}[(\bar{z}_i \otimes w_t)]\,\bar{z}^F_i \\ 0_{K \times (2L+1)} \\ 0_{LK \times (2L+1)} \\ x_{it}'\,\bar{z}^F_i \\ \operatorname{vec}[(\bar{z}_i \otimes x_{it})]\,\bar{z}^F_i \\ x_{it}'\,z^F_{it} \\ \operatorname{vec}[(z_{it} \otimes x_{it})]\,z^F_{it} \\ z^F_{it} \end{pmatrix},$$

where $z^F_{it} = (1, z_{it}, \bar{z}_i)$ is the vector of first-stage explanatory variables (defined below), so that $\partial v_{2it}/\partial\delta_2 = -z^F_{it}$, and $\bar{z}^F_i \equiv T^{-1}\sum_{t=1}^{T} z^F_{it} = (1, \bar{z}_i, \bar{z}_i)$, so that $\partial\bar{v}_{2i}/\partial\delta_2 = -\bar{z}^F_i$.

Each block of rows of the Jacobian corresponds to a term in the estimating equation (2.15). Because $E(e_{it} \mid z_{i1}, \dots, z_{iT}, v_{2i1}, \dots, v_{2iT}) = 0$, $E[(\nabla_{\delta_2} g_{it})\, e_{it}] = 0$. It follows that

$$N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\nabla_{\delta_2} g_{it})\, e_{it} = o_p(1),$$

and, since $\sqrt{N}(\hat{\delta}_2 - \delta_2) = O_p(1)$, we get

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}' e_{it} = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} g_{it}' e_{it} + o_p(1).$$

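Next, using similar reasoning,

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'(g_{it} - \hat{g}_{it})\theta = -\Big[N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} g_{it}'\theta'(\nabla_{\delta_2} g_{it})\Big]\sqrt{N}(\hat{\delta}_2 - \delta_2) + o_p(1) = -B\sqrt{N}(\hat{\delta}_2 - \delta_2) + o_p(1),$$

where $B \equiv E\big[\sum_{t=1}^{T} g_{it}'\theta'(\nabla_{\delta_2} g_{it})\big]$. Further, based on the first stage of the estimation procedure, (2.17), we know that

$$\sqrt{N}(\hat{\delta}_2 - \delta_2) = C^{-1} N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}(z^F_{it})' v_{2it} + o_p(1),$$

where $C \equiv \sum_{t=1}^{T} E[(z^F_{it})' z^F_{it}]$, $z^F_{it} = (1, z_{it}, \bar{z}_i)$ is the $1 \times (2L + 1)$ vector of first-stage explanatory variables, i.e., a vector containing a constant, the exogenous explanatory variables $z_{it}$, and the time averages of the exogenous explanatory variables, $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$, and $E[(z^F_{it})' v_{2it}] = 0$, $t = 1, \dots, T$. Thus, collecting all the terms, we obtain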

\/N(6 — 6): A__1N 1/222[g,~,cit — BC 1F(z zit) )va] + 0p( (1).

i=1t=1
By the Central Limit Theorem,

\/N(6 — 6) L Normal(0,A—1MA-1),

where M EVart231( gztc it - BC 1z(,F)'12,-t). Therefore, the asymptotic variance of 6,
Avar(6), 1S estimated as

\7 E A'IMA‘l/N, (2.20)

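where $\hat A$ is defined above,

$$\hat M \equiv N^{-1}\sum_{i=1}^{N}\Big[\sum_{t=1}^{T}\big(\hat g_{it}'\hat e_{it} - \hat B\hat C^{-1}(z_{it}^F)'\hat v_{2it}\big)\Big]\Big[\sum_{t=1}^{T}\big(\hat g_{it}'\hat e_{it} - \hat B\hat C^{-1}(z_{it}^F)'\hat v_{2it}\big)\Big]',$$

$\hat B \equiv N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat g_{it}'\hat\delta'(\nabla_{\delta_2}\hat g_{it})$, $\hat C \equiv N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(z_{it}^F)'z_{it}^F$, and $\hat e_{it} \equiv y_{1it} - \hat g_{it}\hat\delta$.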
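To make the adjustment in (2.20) concrete, the following Python sketch computes $\hat A$, $\hat B$, $\hat C$, and $\hat M$ for a deliberately simple, hypothetical specification. It has a single endogenous regressor $x_{it}$ and a scalar instrument ($L = 1$). The first-stage regressors are $z_{it}^F = (1, z_{it}, \bar z_i)$ and the second-stage regressors are $g_{it} = (1, x_{it}, \hat v_{2it})$. This is not the chapter's estimating equation (2.15), which has more terms. It only shows how the Jacobian enters $\hat B$ when a regressor depends on $\delta_2$ through $\hat v_{2it} = x_{it} - z_{it}^F\hat\delta_2$. The function name `cf_adjusted_variance` and the unit-by-unit stacking convention are assumptions.

```python
import numpy as np


def cf_adjusted_variance(y1, x, z, T):
    """Two-step control function (CF) estimator with a first-stage-adjusted,
    cluster-robust variance in the spirit of (2.20).

    Hypothetical specification used only for illustration:
      first stage:   x_it  = zF_it d2 + v_it,   zF_it = (1, z_it, zbar_i)
      second stage:  y1_it = g_it d + e_it,     g_it  = (1, x_it, vhat_it)
    Observations are stacked unit by unit (T consecutive rows per unit).
    Returns (d_hat, V_adjusted, V_naive).
    """
    n_obs = y1.shape[0]
    N = n_obs // T
    zbar = np.repeat(z.reshape(N, T).mean(axis=1), T)   # time average zbar_i
    zF = np.column_stack([np.ones(n_obs), z, zbar])
    d2 = np.linalg.lstsq(zF, x, rcond=None)[0]          # first stage (pooled OLS)
    vhat = x - zF @ d2                                  # first-stage residuals
    g = np.column_stack([np.ones(n_obs), x, vhat])
    d = np.linalg.lstsq(g, y1, rcond=None)[0]           # second stage (pooled OLS)
    ehat = y1 - g @ d

    A = g.T @ g / N
    C = zF.T @ zF / N
    # Only vhat depends on d2 (d vhat / d d2 = -zF), so the Jacobian rows for the
    # constant and x are zero and B = N^-1 sum g' d' (Jacobian) = -d[2] g'zF / N.
    B = -d[2] * (g.T @ zF) / N
    A_inv = np.linalg.inv(A)

    def sandwich(r):
        s = r.reshape(N, T, -1).sum(axis=1)             # sum over t within unit i
        M = s.T @ s / N
        return A_inv @ M @ A_inv / N

    r_naive = g * ehat[:, None]                         # ignores the first stage
    # Influence terms g' e - B C^{-1} zF' v, written as rows
    r_adj = r_naive - (zF * vhat[:, None]) @ np.linalg.solve(C, B.T)
    return d, sandwich(r_adj), sandwich(r_naive)
```

Comparing `np.sqrt(np.diag(V_adj))` with `np.sqrt(np.diag(V_naive))` shows how much the first-stage adjustment matters in a given sample. The naive version corresponds to robust standard errors that ignore the first stage.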

2.4 Finite Sample Behavior of the Control Function Estimator

In this section we provide evidence on the finite sample properties of the control function estimator of the APE in CRC balanced panel data models. We assume that the unobserved heterogeneity is time constant. This assumption allows us to compare the proposed estimation method with other available estimators in the same context, and time-constant slopes are commonly assumed in many empirical applications. So, we consider two CRC panel data models with time-constant unobserved heterogeneity described by equation (2.10). First, we study the usual unobserved effect CRC model with a random coefficient, i.e., we assume $w_t \equiv 1$, $t = 1,\ldots,T$. Second, we employ the random trend CRC model with $w_t \equiv (1, t)$, $t = 1,\ldots,T$, so that each cross-sectional unit has its own linear trend. We use Monte Carlo simulations to draw the data and check the properties of the estimator. The number of replications is 500, and the results of the experiments are presented for cross-sectional sample sizes of $N = 500$ and $N = 1000$ with a time horizon of $T = 5$. The population values of the model parameters are set at $\beta = 2$ and $\alpha = 1$. We consider two options for a scalar endogenous explanatory variable $y_{2it}$: (1) a continuous $y_{2it}$ with a large support set, i.e., a traditional continuous variable, and (2) $y_{2it}$ being a fraction in the open unit interval, $y_{2it} \in (0,1)$, i.e., a roughly continuous variable.

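For the usual unobserved effect CRC model the dependent variable $y_{1it}$ is generated as:

$$y_{1it} = a_i + y_{2it}b_i + u_{it}, \quad t = 1,\ldots,T, \tag{2.21}$$

where

$$a_i \equiv \alpha + \lambda_{1a}\bar z_i + \lambda_{2a}\bar v_{2i} + \lambda_{3a}\bar z_i\bar v_{2i} + \lambda_{4a}c_i^a, \tag{2.22}$$

$$b_i \equiv \beta + \lambda_{1b}(\bar z_i - \bar{\bar z}) + \lambda_{2b}\bar v_{2i} + \lambda_{3b}\bar v_{2i}\bar z_i + \lambda_{4b}c_i^b, \tag{2.23}$$

and

$$u_{it} \equiv \lambda_{1u}v_{2it} + \lambda_{2u}\bar v_{2i} + \lambda_{3u}e_{1it}, \tag{2.24}$$

where $z_{it} \sim \text{Normal}(1,1)$, $v_{2it} \sim \text{Normal}(0,1)$, $c_i^a, c_i^b \sim \text{Normal}(0,1)$, $e_{1it} \sim \text{Normal}(0,1)$, $\bar z_i = T^{-1}\sum_{t=1}^{T}z_{it}$, $\bar{\bar z} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}z_{it}$, $\bar v_{2i} = T^{-1}\sum_{t=1}^{T}v_{2it}$, and $\lambda_{1a}$, $\lambda_{2a}$, $\lambda_{3a}$, $\lambda_{4a}$, $\lambda_{1b}$, $\lambda_{2b}$, $\lambda_{3b}$, $\lambda_{4b}$, $\lambda_{1u}$, $\lambda_{2u}$, and $\lambda_{3u}$ are constants.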

For a continuous $y_{2it}$ on a large support set, we define the endogenous explanatory variable to be $y_{2it} \equiv h(g_{2it}) \equiv g_{2it}$, where we generate $g_{2it}$ according to:

$$g_{2it} \equiv \lambda_{gz}z_{it} + \xi z_{it}d_i + \lambda_{gv_2}v_{2it}, \tag{2.25}$$

where $v_{2it} \sim \text{Normal}(0,1)$, $d_i = b_i - \beta$, and $\lambda_{gz}$, $\xi$, and $\lambda_{gv_2}$ are constants. For $y_{2it} \in (0,1)$, we use the following equality to define the endogenous regressor:

$$y_{2it} \equiv \frac{\exp(g_{2it})}{1 + \exp(g_{2it})}. \tag{2.26}$$

If we set $\xi$ to 0 in (2.25), then the condition for consistency of the FE-IV estimator of CRC panel data models in Murtazashvili and Wooldridge (2005) is satisfied. When $\xi \neq 0$, the covariance between the detrended endogenous explanatory variable, $\ddot y_{2it}$, and the unobserved heterogeneity, $b_i = \beta + d_i$, conditional on the detrended instrument, $\ddot z_{it}$, is not equal to its unconditional version: $\mathrm{Cov}(\ddot y_{2it}, b_i \mid \ddot z_{it}) \neq \mathrm{Cov}(\ddot y_{2it}, b_i)$. Thus, for $\xi \neq 0$, the FE-IV estimation in Murtazashvili and Wooldridge (2005) does not deliver consistent estimators of the model parameters. While (2.25) does not meet the requirements for consistent FE-IV estimation of (2.21) when $\xi \neq 0$, it does satisfy (2.3) through (2.14), so the CF approach can be used to obtain consistent parameter estimates in (2.21).

For the random trend CRC model the dependent variable $y_{1it}$ is generated as:

$$y_{1it} = a_{1i} + a_{2i}t + y_{2it}b_i + u_{it}, \quad t = 1,\ldots,T, \tag{2.27}$$

where both $a_{1i}$ and $a_{2i}$ are generated according to (2.22), and $b_i$, $y_{2it}$, and $u_{it}$ are defined above.
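The following Python sketch draws one sample from the usual unobserved effect design (2.21) through (2.26). It uses the $\xi \neq 0$ parameter values listed in footnote 2 for Tables B.1 and B.2. The heterogeneity terms follow our reading of (2.22) through (2.24). The function name `simulate_crc_panel`, the random number generator, and the array layout (units in rows, periods in columns) are illustrative assumptions, not the code used to produce the tables.

```python
import numpy as np


def simulate_crc_panel(N, T=5, alpha=1.0, beta=2.0, xi=0.55,
                       fraction=False, seed=0):
    """Draw (y1, y2, z, b) from the usual unobserved effect CRC design.

    y1, y2, z are N x T arrays; b is N x 1. The lambda values are the
    xi != 0 settings reported for Tables B.1 and B.2.
    """
    rng = np.random.default_rng(seed)
    l1a, l2a, l3a, l4a = 0.29, 0.29, 0.29, 0.84
    l1b, l2b, l3b, l4b = 0.2, 0.2, 0.2, 0.99
    l1u, l2u, l3u = 0.37, 0.37, 0.88
    l_gz, l_gv2 = 0.44, 0.71

    z = rng.normal(1.0, 1.0, size=(N, T))      # instrument, Normal(1, 1)
    v2 = rng.normal(0.0, 1.0, size=(N, T))     # reduced-form shock
    c_a = rng.normal(size=(N, 1))
    c_b = rng.normal(size=(N, 1))
    e1 = rng.normal(size=(N, T))

    z_bar = z.mean(axis=1, keepdims=True)      # unit time averages
    v2_bar = v2.mean(axis=1, keepdims=True)
    z_bar_bar = z.mean()                       # grand mean over i and t

    a = alpha + l1a * z_bar + l2a * v2_bar + l3a * z_bar * v2_bar + l4a * c_a  # (2.22)
    b = (beta + l1b * (z_bar - z_bar_bar) + l2b * v2_bar
         + l3b * v2_bar * z_bar + l4b * c_b)                                    # (2.23)
    u = l1u * v2 + l2u * v2_bar + l3u * e1                                      # (2.24)

    d = b - beta
    g2 = l_gz * z + xi * z * d + l_gv2 * v2                                     # (2.25)
    # (2.26): exp(g) / (1 + exp(g)), written in the equivalent form 1 / (1 + exp(-g))
    y2 = 1.0 / (1.0 + np.exp(-g2)) if fraction else g2
    y1 = a + y2 * b + u                                                         # (2.21)
    return y1, y2, z, b
```

Setting `xi=0.0` gives the design under which FE-IV is also consistent. Note that footnote 2 lists different $\lambda$ values for that case, and this sketch does not switch them automatically.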

Why would we think that the data generating process we propose in (2.22) through (2.25) is representative of something that we might actually see in practice? One possible empirical example is the study by Hall and Jones (1999). The authors attempt to explain differences in output per worker by differences in institutions and government policies, which they call social infrastructure. Even though Hall and Jones (1999) conduct a cross-sectional investigation, their idea can easily be extended to a panel data setup. Social infrastructure is thought to be endogenous. First, it may itself depend on the level of GDP per worker in a country. Second, we do not observe social infrastructure directly and need to deal with a measurement error problem. Hall and Jones (1999) suggest using Western European influence around the world as an instrumental variable for social infrastructure. Specifically, a country's distance from the equator and the fraction of its population speaking a European language are used as measures of Western European influence. Clearly, the distance of a country from the equator is time-invariant. Instead, we can use the time-varying fraction of the population speaking a European language as an IV in a panel data setting. While both models (2.21) and (2.27) can be thought appropriate, structural equation (2.27) should perhaps seem more suitable for modeling the behavior of output per worker, since we want to allow each country to have its own time trend. Further, the endogeneity of social infrastructure explains equations (2.24) and (2.25). Country-specific unobserved cultural characteristics, both additive and multiplicative, might be related to the fraction of the population speaking a European language. It was Western Europe that spread the ideas of Adam Smith and the importance of property rights (among others) to the rest of the world. As a result, the countries most influenced by Western Europe are more likely to have favorable social infrastructure. This would explain the linear terms in equations (2.22) and (2.23). Importantly, the joint distributions of $(a_{ji}, v_{2it})$ given $z_i$ and $(b_i, v_{2it})$ given $z_i$ may depend on $z_i$ because of heteroskedasticity in $\mathrm{Var}(a_{ji}, v_{2it}\mid z_i)$ or $\mathrm{Var}(b_i, v_{2it}\mid z_i)$, where $j = 1$ or $2$, as discussed by Card (2001) for the human capital earnings model. That is why we might think that the interaction terms in (2.22), (2.23), and (2.25) are required.
Table B.1 and Table B.2 present experimental results for the CRC model with a continuous scalar endogenous explanatory variable with a large support set and a scalar instrument $z_{it}$. Table B.1 reports the simulation outcomes for the usual unobserved effect CRC model, while Table B.2 covers the random trend CRC model. For the usual unobserved effect model, column 2 contains the sample correlation coefficients between the endogenous regressor, $y_{2it}$, and each of the following: the instrument, $z_{it}$, the error, $u_{it}$, the unobserved additive effect, $a_i$, and the unobserved multiplicative heterogeneity, $b_i$. These are denoted $\hat\rho_{y_2z}$, $\hat\rho_{y_2u}$, $\hat\rho_{y_2a}$, and $\hat\rho_{y_2b}$, respectively. We report sample correlations because analytical expressions are not readily available. For the random trend model, we report the sample correlations between $y_{2it}$ and $a_{1i}$ and between $y_{2it}$ and $a_{2i}$ separately, denoted $\hat\rho_{y_2a_1}$ and $\hat\rho_{y_2a_2}$, respectively. The sample correlations are reported for $t = 1$.²

Columns 3 through 10 contain the mean, regular standard error (Reg. SE), robust standard error (Rob. SE),³ standard deviation (SD), root mean squared error (RMSE), lower quartile (LQ), median, and upper quartile (UQ) of the APE estimates from 500 replications. The rows of the table report statistics for five estimators:

- the usual pooled ordinary least squares (POLS) estimates on the original data;
- the usual fixed effects estimates (FE-OLS), which are just pooled OLS on the time-demeaned data;
- the instrumental variables (IV) estimates using the original data;
- the fixed effects instrumental variables (FE-IV) estimates;
- the estimates from the control function approach (CF).

The adjusted standard error (Adj. SE) is reported for the CF approach.
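Each column of the tables is a simple function of the 500 replication estimates. Here is a minimal sketch of these summaries. It assumes the estimates are collected in an array and that the SD uses the $R - 1$ divisor, where $R$ is the number of replications. The chapter does not state which divisor is used, and the function name is hypothetical.

```python
import numpy as np


def mc_summary(estimates, true_value):
    """Summaries of Monte Carlo estimates of a scalar parameter (e.g., the APE)."""
    est = np.asarray(estimates, dtype=float)
    lq, median, uq = np.percentile(est, [25, 50, 75])
    return {
        "mean": est.mean(),
        "bias": est.mean() - true_value,
        "sd": est.std(ddof=1),                               # spread across replications
        "rmse": np.sqrt(np.mean((est - true_value) ** 2)),   # combines bias and spread
        "lq": lq,
        "median": median,
        "uq": uq,
    }
```

With this SD definition, $\text{RMSE}^2 = \text{bias}^2 + \text{SD}^2 (R - 1)/R$. This is why a small bias and a small SD together are needed for a small RMSE.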

It is easy to see that when $\xi = 0$ and the endogenous explanatory variable $y_{2it}$ is continuous on a large support set, i.e., $y_{2it}$ is defined by (2.25) with $\xi = 0$, the (conditional and unconditional) covariance between the detrended endogenous regressor and the unobserved heterogeneity is constant over time. Murtazashvili and Wooldridge (2005) emphasize that the FE-IV estimator should contain a full set of time dummies to deliver consistent estimates, but they do so to allow the covariances to vary with time while still being independent of the detrended instruments. Thus, for the usual unobserved effect model, when we define $y_{2it}$ according to (2.25), there is no need to include time dummies to obtain consistent FE-IV estimates of $\beta$ when $\xi = 0$. As a result, all the estimates we consider for the usual unobserved effect model, including the FE-IV estimates, are based on regressions without time dummies. For the random trend CRC model, all the reported estimates (except the CF) are based on regressions with time dummies.

² When $\xi \neq 0$, Table B.1 and Table B.2 are obtained for $\lambda_{1a} = \lambda_{2a} = \lambda_{3a} = 0.29$, $\lambda_{4a} = 0.84$, $\lambda_{1b} = \lambda_{2b} = \lambda_{3b} = 0.2$, $\lambda_{4b} = 0.99$, $\lambda_{1u} = \lambda_{2u} = 0.37$, $\lambda_{3u} = 0.88$, $\lambda_{gz} = 0.44$, $\xi = 0.55$, and $\lambda_{gv_2} = 0.71$. When $\xi = 0$, $\lambda_{1a} = \lambda_{2a} = \lambda_{3a} = 0.31$, $\lambda_{4a} = 0.82$, $\lambda_{1b} = \lambda_{2b} = \lambda_{3b} = 0.61$, $\lambda_{4b} = 0.91$, $\lambda_{1u} = \lambda_{2u} = 0.2$, $\lambda_{3u} = 0.96$, $\lambda_{gz} = 0.26$, and $\lambda_{gv_2} = 0.97$ are used for Table B.1 and Table B.2.

³ Robust standard errors are calculated using the scaling factor from Stata 9.0, i.e., they are clustered on individuals.

There are three sources of bias in the estimates under consideration. First, the correlation between the unobserved heterogeneity $a_i$ and the regressor $y_{2it}$ biases the estimates of the model parameters. Second, the endogeneity of the regressor $y_{2it}$ also biases the estimates we consider. Finally, the correlation between the regressor $y_{2it}$ and the random coefficient $b_i$ leads to bias (and inconsistency) in the estimates as well. As long as $\xi = 0$ in (2.25), the correlation between the endogenous explanatory variable and the random coefficient does not make the FE-IV estimator inconsistent. When $\xi = 0$, both the FE-IV and the CF methods deliver consistent estimates of $\beta$. When $\xi \neq 0$, the FE-IV estimates of $\beta$ are both biased and inconsistent. The CF estimates, while biased, are the only consistent estimates considered for $\xi \neq 0$.
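With a scalar endogenous regressor and a scalar instrument, the FE-OLS and FE-IV estimators in the tables reduce to ratios of within-transformed cross moments. The sketch below assumes a balanced $N \times T$ layout and computes both estimators by time-demeaning. The option `time_effects=True` additionally removes period means. In a balanced panel this is numerically equivalent to including a full set of time dummies. The function names are hypothetical, and the sketch does not cover the detrending needed for the random trend model (2.27).

```python
import numpy as np


def within(a, time_effects=False):
    """Within transformation of an N x T balanced panel array."""
    out = a - a.mean(axis=1, keepdims=True)               # remove unit means
    if time_effects:
        # two-way demeaning: a_it - abar_i - abar_t + abar (time dummies, balanced case)
        out = out - a.mean(axis=0, keepdims=True) + a.mean()
    return out


def fe_ols(y1, y2, time_effects=False):
    """FE-OLS slope: pooled OLS on the time-demeaned data."""
    y1d, y2d = within(y1, time_effects), within(y2, time_effects)
    return np.sum(y2d * y1d) / np.sum(y2d * y2d)


def fe_iv(y1, y2, z, time_effects=False):
    """Just-identified FE-IV slope: sum(zdd * y1dd) / sum(zdd * y2dd)."""
    y1d, y2d, zd = (within(a, time_effects) for a in (y1, y2, z))
    return np.sum(zd * y1d) / np.sum(zd * y2d)
```

In a constant-coefficient design where $u_{it}$ is correlated with $y_{2it}$, FE-OLS is pulled away from the true slope while FE-IV is not.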

Columns 4 and 5 contain the regular and robust standard errors of the estimates. To be exact, we report the averages of the regular and robust standard errors obtained from 500 replications. The regular SEs are calculated under the assumption that there is no heteroskedasticity or serial correlation in the error terms. The robust SEs are adjusted for serial correlation and heteroskedasticity that may be present in the errors. The standard errors reported for the CF approach are computed according to formula (2.20). They are adjusted for the first-stage estimation and are robust to arbitrary serial correlation and heteroskedasticity. As expected, the simulations show that for the first four estimators the robust standard errors are clearly better estimates of the standard deviations than the regular standard errors.
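Footnote 3 states that the robust standard errors are clustered on individuals using Stata's scaling factor. Below is a minimal sketch of regular and cluster-robust OLS standard errors. It uses the small-sample factor $\frac{G}{G-1}\cdot\frac{n-1}{n-k}$, with $G$ clusters, $n$ observations, and $k$ regressors. To our knowledge this is the factor Stata applies for clustered `regress`, but treat it as an assumption. The function name is hypothetical.

```python
import numpy as np


def ols_regular_and_cluster_se(X, y, groups):
    """OLS coefficients with regular and cluster-robust standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    # Regular SEs: no heteroskedasticity, no serial correlation
    sigma2 = resid @ resid / (n - k)
    se_regular = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Cluster-robust SEs: add up the scores within each cluster (individual) first
    labels = np.unique(groups)
    G = labels.size
    meat = np.zeros((k, k))
    for lab in labels:
        rows = groups == lab
        score = X[rows].T @ resid[rows]
        meat += np.outer(score, score)
    scale = (G / (G - 1)) * ((n - 1) / (n - k))   # Stata-style finite-sample factor
    V_cluster = scale * XtX_inv @ meat @ XtX_inv
    se_cluster = np.sqrt(np.diag(V_cluster))
    return beta, se_regular, se_cluster
```

When the errors share a unit-specific component, the regular SEs understate the sampling variability and the clustered SEs do not. This matches the pattern reported for the first four estimators.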

Studying Table B.1 and Table B.2 for both sample sizes in the case of $\xi \neq 0$, we conclude that the CF estimates have the smallest biases and the smallest RMSEs. Murtazashvili and Wooldridge (2005) show that FE-IV estimation yields consistent estimates of $\beta$ when $\xi = 0$. Table B.1 and Table B.2 indicate that the CF estimator and the FE-IV estimator have very similar RMSEs for $\xi = 0$. The closeness in RMSEs comes from similarity in both the biases and the standard deviations of these estimators. The CF estimator has a smaller standard deviation than the FE-IV estimator in all the cases considered in Tables B.1 and B.2. For instance, when $\xi = 0$ and $N = 1000$, the standard deviation of the FE-IV estimator is about 21% higher than that of the CF estimator. The efficiency of the CF estimator comes from the specific functional forms assumed for the endogenous variable and the random coefficients of equation (2.21).

For the next set of simulations, we take $y_{2it}$ to be a fraction in the open unit interval. Because the structural model (2.27) contains a trend, the default is to include a full set of time period dummies in every estimation technique except the CF approach. Table B.3 and Table B.4 show the simulation results for $y_{2it} \in (0,1)$.⁴ Now the bias in the CF estimate (and in all other estimates) is more pronounced, even though the CF estimating method still yields the smallest bias in $\beta$ among all the estimators. For example, for the random trend model with $y_{2it} \in (0,1)$, $\xi \neq 0$, $\hat\rho_{y_2b} = .581$, and $N = 500$, the CF estimate of $\beta$ is 2.229 with an RMSE of .750. By comparison, $\hat\beta_{CF} = 2.039$ with an RMSE of .182 for $y_{2it}$ with a large support set, with $\hat\rho_{y_2b} = .578$ and $N = 500$.

⁴ When $\xi \neq 0$, Table B.3 and Table B.4 are obtained for $\lambda_{1a} = \lambda_{2a} = \lambda_{3a} = 0.21$, $\lambda_{4a} = 0.92$, $\lambda_{1b} = 0.2$, $\lambda_{2b} = \lambda_{3b} = 0.7$, $\lambda_{4b} = 0.94$, $\lambda_{1u} = \lambda_{2u} = 0.43$, $\lambda_{3u} = 0.84$, $\lambda_{gz} = 0.87$, $\xi = 0.48$, and $\lambda_{gv_2} = 0.11$. When $\xi = 0$, $\lambda_{1a} = \lambda_{2a} = \lambda_{3a} = 0.31$, $\lambda_{4a} = 0.82$, $\lambda_{1b} = 0.2$, $\lambda_{2b} = \lambda_{3b} = 0.7$, $\lambda_{4b} = 0.94$, $\lambda_{1u} = \lambda_{2u} = 0.22$, $\lambda_{3u} = 0.96$, $\lambda_{gz} = 0.26$, and $\lambda_{gv_2} = 0.97$ are used for Table B.3 and Table B.4.

As expected, when $\xi = 0$, the CF approach has the smallest bias among all the estimators under consideration, since we simulate our dataset to satisfy assumptions (2.5) through (2.7). However, when $\xi = 0$, the evidence on the RMSEs of the CF and the FE-IV estimators is mixed. On the one hand, the RMSE of the FE-IV method is either clearly smaller or only marginally bigger than the RMSE of the CF method. For example, for the random trend model with a roughly continuous regressor, the RMSE is 0.599 for the FE-IV estimator vs. 0.875 for the CF estimator when $N = 500$. For the random trend model with a continuous explanatory variable, the RMSE is 0.235 for the FE-IV estimator and 0.223 for the CF estimator when $N = 500$. On the other hand, the bias of the FE-IV estimator is clearly much more severe. The differences between the FE-IV and the CF approaches for the random trend model with $y_{2it} \in (0,1)$ and $\xi = 0$ illustrate the trade-off between bias and efficiency. For $\xi = 0$, both the CF method and the FE-IV approach are consistent. Further, the simulations in Tables B.3 and B.4 show that the FE-IV estimates are always less variable than the CF estimates for the random trend model with $y_{2it} \in (0,1)$. However, when $\xi = 0$, the bias in the CF estimator is significantly smaller than the bias in the FE-IV estimator. Overall, the simulation findings in Tables B.3 and B.4 support the idea that the CF estimating method produces more desirable estimates of $\beta$ when $\xi \neq 0$.

Applied economists are quite often reluctant to use control function methods, because these approaches require the calculation of adjusted standard errors, which standard econometric packages do not routinely provide. Table B.5 contains detailed information on the standard errors of the control function estimates of the APEs for the two models and the two cases of the endogenous explanatory variable; part of this information also appears in Tables B.1 through B.4. Column 1 shows whether $\xi$ is different from zero. Column 2 reports the cross-sectional sample size. Columns 3 through 7 contain the mean, regular standard error (Reg. SE), robust standard error (Rob. SE), adjusted standard error (Adj. SE), and standard deviation (SD) of the APE estimates from the CF approach over 500 replications. The three standard errors are defined as follows:

- The regular SEs are the second-stage standard errors of the CF approach, with no adjustment for heteroskedasticity, serial correlation, or the first-stage estimation.
- The robust SEs are the second-stage standard errors of the CF method that are robust to both serial correlation and heteroskedasticity, but they ignore the first-stage estimation.
- The adjusted SEs are the only standard errors adjusted for the first-stage estimation (they are calculated according to formula (2.20)).

Clearly, besides being the only theoretically appropriate estimates of the standard errors, the adjusted standard errors based on (2.20) approximate the standard deviations best among the three standard errors considered.

To summarize, the simulation findings indicate the following. When the joint distribution of $(a_i, b_i, v_{2it})$ given $z_i$ depends on $z_i$, the most robust estimator of the average partial effect in a correlated random coefficient balanced panel data model is the control function estimator from the two-step estimating method (2.17)-(2.18).

2.5 Empirical Application to Effects of Job Training on Worker Productivity

The method we propose for estimating the average partial effects from a correlated random coefficient panel data model is developed for a large-N, small-T framework. However, real-life data limitations quite often do not allow researchers to use "truly" large-N datasets. Here, we consider a common real-life situation in which the N dimension of the available data is not very large. Suppose we want to estimate the average partial effect of job training on worker performance, measured by output scrap rates. Holzer, Block, Cheatham, and Knott (1993) explore the effects of a state-financed training grant program for manufacturing firms in Michigan using constant coefficient models. They use a three-year panel of data (1987-1989) from a unique survey of Michigan firms that applied for training grants under the state's Michigan Job Opportunity Bank-Upgrade (MJOB) program. This program was designed to provide one-time grants to eligible firms. An eligible firm was defined as a manufacturing company with 500 or fewer employees that was implementing new technology and had not received a grant before. We estimate the effects of on-the-job training on worker productivity, allowing for both additive and multiplicative unobserved firm-specific effects.

Why would we think that the random coefficient panel data model might be appropriate in this context? One possible justification for using an RC model is that some unobserved firm characteristics might cause firms to respond heterogeneously to job training. For instance, an unobserved "atmosphere" in each firm might result in a heterogeneous effect of the annual hours of training per employee. Workers might feel more supported and encouraged in firms where management promotes and advocates additional schooling and team efforts. By contrast, employees of firms with no policy on education beyond workers' current level might be discouraged from improving their present skills and effort. As a result, the same annual hours of job training in the two types of firms can lead to different output scrap rates. Since the effect of job training on worker performance might be related to the extent of the unobserved support from a firm, we should consider a correlated version of the random coefficient model.

To fit the method from Section 2.2, we balance the data from Holzer, Block, Cheatham, and Knott (1993) and obtain a sample of 45 firms that applied for an MJOB grant during 1988 and 1989. (The dataset is provided with Wooldridge (2002) and is called JTRAIN.) Of these firms, 27 had received a grant and 18 had not. The balancing is based on the availability of data on the scrap rate (per 100 items) and the annual hours of job training per employee.
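The balancing step can be sketched as follows: a firm is kept only if both the scrap rate and training hours are non-missing in every year. The record layout (dictionaries with `firm`, `year`, `scrap`, and `hrsemp` keys, with `None` marking a missing value) and the function name are illustrative assumptions, not the actual layout of the JTRAIN file.

```python
def balance_panel(records, years, required=("scrap", "hrsemp"), unit_key="firm"):
    """Keep units observed in every year with no missing required fields.

    `records` is an iterable of dicts; a missing value is represented by None.
    Returns the kept records for the requested years, sorted by unit and year.
    """
    years = set(years)
    by_unit = {}
    for rec in records:
        by_unit.setdefault(rec[unit_key], {})[rec["year"]] = rec

    kept = []
    for unit, obs in sorted(by_unit.items()):
        complete = years.issubset(obs) and all(
            obs[yr].get(var) is not None for yr in years for var in required
        )
        if complete:
            kept.extend(obs[yr] for yr in sorted(years))
    return kept
```

Dropping whole units, rather than individual firm-years, is what makes the panel balanced. This is also why the balanced sample can differ in composition from the original one.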

Balancing the data raises concerns that the final dataset might be a nonrandom sample. Table B.6 contains summary statistics from the unbalanced and balanced datasets for each of the following groups: the entire sample, firms that received a grant in either 1988 or 1989, and firms that did not receive a grant. A comparison of the two panels of Table B.6 shows that the proportions of firms that did and did not receive the grant changed. Even so, there are virtually no differences between the firms in the two datasets with regard to the scrap rates and the annual hours of job training. The balanced data seem to be very close to the unbalanced dataset in preserving the information on the scrap rates and the annual hours of training per worker. Of course, we should also be concerned that some unobserved firm characteristics played a role in the formation of these two samples. Since the MJOB program distributed grants to eligible firms on a first-come, first-served basis, we believe the grant distribution to be a fairly random process, and we assume firms are not selected into the two samples based on their unobserved characteristics. Given these assumptions, we feel sufficiently confident in relying on the balanced dataset to proceed with our analysis.

Our goal is to evaluate the average partial effect of another hour of job training on worker productivity, relaxing the traditional assumption of a constant effect of the annual hours of training per worker on the output scrap rate. In the context of the correlated random coefficient approach, a simple panel data model of interest is

$$\log(scrap_{it}) = \alpha + b_{1i}\,hrsemp_{it} + \delta_1 d88_t + \delta_2 d89_t + a_{1i} + u_{it}, \tag{2.28}$$

where $scrap_{it}$ is firm $i$'s scrap rate in year $t$, $hrsemp_{it}$ is annual hours of on-the-job training per employee, $a_{1i}$ is a firm-specific unobserved effect, and $u_{it}$ is an unobserved disturbance for firm $i$ in year $t$. We also allow different year intercepts in our structural model. The unobserved firm fixed effect, $a_{1i}$, can contain unmeasured worker ability, capital, and managerial skill, which we think of as roughly constant over the time period we consider. Since the unobserved firm effect includes worker ability, the annual hours of job training can be correlated with the unobserved effect. For example, firm managers might want to train lower-skilled workers more to improve their productivity. Alternatively, they might want to raise the productivity of relatively high-skilled workers even further in order to use new high-tech equipment that requires very well-trained employees. Further, we should be concerned if $u_{it}$ is correlated with $hrsemp_{it}$. For example, a firm might hire more skilled workers and reduce on-the-job training requirements at the same time. The possibility of measurement error in $hrsemp_{it}$ should also be considered, since grant recipients might have incentives to overstate, or non-recipients to understate, their training changes. If either (or both) of these is the case, we need to deal with the endogeneity of the annual hours of training in equation (2.28). Here, we exploit the fact that some firms received MJOB grants. We assume that grant designation in year $t$ is uncorrelated with the error term $u_{it}$ in every time period. This seems to be a reasonable assumption, since firms are eligible to receive a grant only once, and grants were distributed on a first-come, first-served basis, which we believe to be a fairly random process. Thus, whether or not a grant is received in year $t$ should not be related to changes in the output scrap rates in any other year directly, but only through changes in the annual hours of job training.⁵ Thus, we use a dummy variable indicating whether or not a grant was received as an instrumental variable for the annual hours of training per worker, provided that $hrsemp_{it}$ and $grant_{it}$ are correlated.

⁵ Using a constant coefficient approach, we regress the change in the log of the scrap rate between years $t - 1$ and $t$ on the change in the annual hours of on-the-job training, the change in a dummy variable indicating whether a grant was received, and a lag of this variable. The changes are taken to eliminate the firm fixed effect. The results of the regression suggest that none of the three variables is statistically significant, either individually or jointly, at any conventional level of significance ($R^2$ for the regression is 0.032, with an F-statistic of 0.96).

Clearly, the variable $hrsemp_{it}$ takes on only non-negative values, so we should consider a logarithmic transformation of this variable for the first-stage regression. At the same time, about 27% of all the observations have zero annual hours of job training. Normally, we would transform a variable $x$ that has zero observations using the $\log(1 + x)$ transformation. However, for the purpose of our method there is no gain in using this transformation, since the new variable, $\log(1 + x)$, still takes on only non-negative values. Thus, we choose to use the annual hours of on-the-job training in levels.

The first-stage regression results with and without different year intercepts are reported in Table B.7. Columns (1) and (2) report the first-stage results, with and without different year intercepts, when $grant_{it}$ is the only explanatory variable used. Table B.8 reports the estimation results for equation (2.28) by the FE-IV and CF methods.

Overall, the CF estimates of the APE of the annual hours of on-the-job training for equation (2.28) are bigger than the corresponding FE-IV estimates for the same equation [see columns (1) and (2) vs. columns (5) and (6) of Table B.8]. For example, the CF approach in regression (6) suggests that 10 more hours of job training per worker are estimated to reduce the scrap rate by about 37%. For the firms in the sample, the average amount of job training over the three-year period is 15.6 hours per employee, with a minimum of zero and a maximum of 154. Comparing regressions (6) and (2) from Table B.8, the FE-IV estimate of the APE of the annual hours of job training per worker is about 18 times smaller in magnitude and is statistically insignificant. Overall, the estimates from the CF method (regressions (5) and (6)) are more statistically significant and substantially larger in magnitude for the two regression specifications considered than the corresponding FE-IV estimates (regressions (1) and (2)).
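Because the dependent variable in (2.28) is in logs, the 37% figure reads the APE, $E(b_{1i})$, through the usual log approximation. The following shows the arithmetic only. The implied value is backed out from the stated 37% and is not a reported estimate.

$$\%\Delta scrap \approx 100\,E(b_{1i})\,\Delta hrsemp \;\Rightarrow\; -37 \approx 100\,E(b_{1i})\cdot 10 \;\Rightarrow\; E(b_{1i}) \approx -0.037,$$

$$100\big[\exp(-0.037\cdot 10) - 1\big] = 100\big[\exp(-0.37) - 1\big] \approx -30.9.$$

The second line is the exact proportional change for a common slope equal to the APE. With heterogeneous $b_{1i}$, the exact average change would instead involve $E[\exp(10\,b_{1i})]$.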

44

What do the differences between the two estimation methods suggest? They suggest that the correlation between hours of on-site job training and the firm-specific unobserved effect differs between firms that received a grant and firms that did not. This can hold even if the unobserved "atmosphere" of a firm is independent of the grant designation. Indeed, it is natural to think that the effect of unobserved worker ability, managerial skills, and a firm's "atmosphere" on hours of job training might be stronger among the firms that received grants. In other words, using the language of Section 2.2, the joint distribution of $(a_{1i}, b_{1i}, v_{2it})$ differs between firms that received a grant and those that did not.

To assess the rationale for a model with both additive and multiplicative individual heterogeneity in this application, we use several Wald statistics. Each tests whether the coefficients on a different set of second-stage variables are statistically different from zero. For specificity, all three tests use regression (6) from Table B.8.

1. We restrict to zero the coefficients on all the explanatory variables except $w_t$ and $x_{it}$ (and the year dummies, if included). The Wald statistic equals 13.00, with a p-value of 0.072 (the chi-squared critical value with 7 degrees of freedom at the 10% level is 12.02). This allows us to reject, at the 10% level, the null hypothesis that ignoring endogeneity and heterogeneity is appropriate for our data.
2. To account for the endogeneity of the main explanatory variable, we restrict to zero the coefficients on all the explanatory variables except $w_t$, $x_{it}$, and $\hat{\bar v}_{2i}$ (and the year dummies, if included). The Wald statistic is 13.33, which exceeds the critical value of 12.59 with 6 degrees of freedom at the 5% level (p-value 0.038).
3. To reflect assumption (2.5), we restrict to zero the coefficients on all the explanatory variables except $w_t$, $x_{it}$, $\hat{\bar v}_{2i}$, and $\hat{\bar v}_{2i}x_{it}$ (and the year dummies, if included). The Wald statistic is 13.23, which is above the critical value of 11.07 with 5 degrees of freedom at the 5% level (p-value 0.021).
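The Wald statistics and p-values above can be reproduced from a coefficient vector and its (adjusted) covariance matrix. Below is a minimal sketch using NumPy and the standard library. `wald_stat` forms $W = (R\hat\theta)'(R\hat V R')^{-1}(R\hat\theta)$ for the null $R\theta = 0$. `chi2_sf` evaluates the chi-squared upper tail for integer degrees of freedom via the recurrence $Q(k+2, x) = Q(k, x) + (x/2)^{k/2}e^{-x/2}/\Gamma(k/2 + 1)$. Both names are hypothetical helpers; a library routine such as SciPy's `chi2.sf` serves the same purpose.

```python
import math

import numpy as np


def wald_stat(theta, V, R):
    """Wald statistic for H0: R theta = 0, where V estimates Avar(theta)."""
    r = R @ theta
    return float(r @ np.linalg.solve(R @ V @ R.T, r))


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for an integer df >= 1."""
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1      # df = 1 base case
    else:
        q, k = math.exp(-x / 2.0), 2                  # df = 2 base case
    while k < df:
        # step from df = k to df = k + 2
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q
```

For example, `chi2_sf(13.00, 7)`, `chi2_sf(13.33, 6)`, and `chi2_sf(13.23, 5)` round to the p-values 0.072, 0.038, and 0.021 quoted in the text.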

Finally, we check whether the interaction terms between the averages of the first-stage residuals, $\hat{\bar v}_{2i}$, and the averages of the exogenous variables, $\bar z_i$, are jointly different from zero. To do so, we consider three possibilities: first, that all such terms are jointly significant; second, that only the terms arising from our assumption on the additive heterogeneity, $a_{1i}$, are jointly different from zero; and third, that only the terms arising from our assumption on the multiplicative heterogeneity, $b_{1i}$, are jointly important. The resulting Wald statistics for regression (6) are 11.74, 2.47, and 11.32, respectively. Thus, we can reject the first and the last null hypotheses at the 1% level, with chi-squared critical values of 9.21 and 6.63 for 2 and 1 degrees of freedom, respectively. By contrast, we cannot reject the null hypothesis that the interaction term associated with the additive unobserved fixed effect is zero at any conventional level of significance (the chi-squared critical value with 1 degree of freedom at the 10% level is 2.71). Thus, there is evidence at the 1% level of significance that the conditional variance-covariance matrix, $\mathrm{Var}(b_{1i}, x_{it}\mid z_i)$, is heteroskedastic. Overall, we conclude that the CF approach should perform favorably compared with the FE-IV method in the output scrap rate application.

We are unlikely to draw conclusions about the causal effect of on-the-job training on the output scrap rate from a model with only one explanatory variable, $hrsemp_{it}$, unless other control variables are also accounted for. We consider three additional explanatory variables: the logarithm of the dollar value of annual sales, $lsales_{it}$; the logarithm of the number of employees, $lemploy_{it}$; and the logarithm of the average annual employee salary, $lavgsal_{it}$. The summary statistics for these variables are reported in Table B.9. The first-stage regression results with and without different year intercepts are provided in Table B.7. Regressions (3) and (4) employ the additional explanatory variables available. In these regressions, the control variables $lsales_{it}$ and $lemploy_{it}$ and their time averages are statistically insignificant, both individually and jointly, at all conventional levels. The logarithm of the average annual employee salary, $lavgsal_{it}$, and its time average are individually and jointly significant at least at the 10% level. Based on these results, we treat the average annual salary as the only valid additional explanatory variable. Thus, the next model of interest is

$$\log(scrap_{it}) = \alpha + b_{1i}\,hrsemp_{it} + b_{2i}\,lavgsal_{it} + \delta_1 d88_t + \delta_2 d89_t + a_{1i} + u_{it}. \tag{2.29}$$

The estimation results for this model are provided in column (8) of Table B.8.
Column (7) of the same table contains the results for the case without year dum-
mies. The CF estimates of the APE of the annual hours of job training slightly
decrease when we use two explanatory variables in comparison with the case with
only one regressor besides year dummies [See columns (7) and (8) vs. columns (5)
and (6) of Table B8]. Once again, the estimates from the CF method (regressions (7)
and (8)) are more statistically signiﬁcant and substantially larger in magnitudes for
the two regression speciﬁcations considered than the corresponding FE—IV estimates
(regressions (3) and (4)).

Finally, one might think that assuming a heterogeneous response to the average annual salary per worker stretches our imagination too much. Indeed, the same annual salary per employee is likely to have the same impact on the output scrap rate across different firms: since the firms are located in the same area (Michigan), their workers are expected to get comparable value out of the same monetary compensation for their work. Consequently, we consider the following model:

$$\log(scrap_{it}) = a + b_{1i}\, hrsemp_{it} + b_2\, lavgsal_{it} + \delta_1 d88_t + \delta_2 d89_t + a_{1i} + u_{it}. \qquad (2.30)$$

The CF results for this model with the year dummies excluded and included are provided in columns (9) and (10) of Table B.8, respectively. The CF estimates are larger and more statistically significant than the corresponding FE-IV estimates reported in columns (3) and (4) of Table B.8. Interestingly, Table B.8 shows that the adjusted standard errors for the CF method are very close to the robust standard errors calculated ignoring the first-stage estimation. By contrast, the simulation results [see Table B.5] indicate that the adjustment in the standard errors for the first-stage estimation is not trivial.

Based on the results reported in columns (5) through (10) of Table B.8, we conclude that the CF estimates of the APE of annual hours of job training per worker are robust to different model specifications. Further, for all models considered, they are larger in magnitude and statistically more significant than the FE-IV estimates (regressions (1) through (4) of Table B.8).

2.6 Conclusion

This paper studies CRC balanced panel data models with endogenous regressors
as in Murtazashvili and Wooldridge (2005). However, in addition to allowing some
explanatory variables to be correlated with the idiosyncratic error, we also let the joint
distribution of the endogenous regressors and the individual heterogeneity conditional
on the instruments depend on the instruments. In particular, we allow the endogenous
regressors to be roughly continuous. We use a control function approach, which
introduces residuals from the reduced form for the endogenous regressors as covariates
in the structural model. We propose a two-step method to account for heterogeneity
and endogeneity and to consistently estimate APEs in CRC panel data models with
endogenous (roughly) continuous regressors. Further, we relax the assumptions in
Murtazashvili and Wooldridge (2005) by allowing the individual slopes in a CRC
model to vary over time.

Monte Carlo simulations indicate that, in finite samples, the control function approach to estimating the CRC balanced panel data model with time-invariant individual heterogeneity performs better than other estimators when the joint distribution of the individual heterogeneity and the endogenous regressors conditional on the instruments depends on these instruments.

Finally, we apply the new method to the problem of estimating the APEs of annual hours of on-the-job training on output scrap rates for manufacturing firms in Michigan, extending the work of Holzer, Block, Cheatham, and Knott (1993) to allow for a firm-specific effect. The control function approach we propose delivers APE estimates of annual hours of job training on output scrap rates that are larger in magnitude and statistically more significant than the APE estimates from the FE-IV approach.

CHAPTER 3

ESTIMATION OF A DYNAMIC BINARY RESPONSE PANEL DATA MODEL WITH AN ENDOGENOUS REGRESSOR, WITH AN APPLICATION TO THE ANALYSIS OF POVERTY PERSISTENCE IN RURAL CHINA

3.1 Introduction

Dynamic binary response models have considerable appeal for a diverse range of policy analyses in which one is interested in a binary outcome and identifying or controlling for state dependence is important.1 When the outcome is also affected by an endogenous treatment, an additional complication arises in efforts to identify the effects of the treatment on the outcome and on state dependence. In this paper, we propose a parametric approach to estimating dynamic binary response panel data models with endogenous contemporaneous regressors. Our method combines the approach to solving the unobserved heterogeneity and initial conditions problems in nonlinear dynamic models (Wooldridge, 2005c) with a control function approach to controlling for endogeneity of contemporaneous explanatory variables in cross-sectional nonlinear models (e.g., Rivers and Vuong, 1988; Smith and Blundell, 1986).

1The range of research areas in which dynamic binary response models have proven important includes labor force participation (Heckman and Willis, 1977; Hyslop, 1999), the probability of receiving welfare (Bane and Ellwood, 1986), the experience of social exclusion (Poggi, 2007), and the identification of adverse selection in insurance markets (Chiappori and Salanie, 2000).
Among other possible applications, the relevance and potential strength of our approach can be demonstrated in analyses of how migration in developing countries affects the poverty status of residents living in migrant source communities. In this setting, we face two important sources of endogeneity. First, the migration decisions of community residents may be driven by negative shocks that also raise the probability that households are poor. Second, we expect migration decisions to be correlated with the unobserved characteristics of individuals and communities, which may also affect poverty status. Our approach allows us to consistently estimate the parameters of a dynamic binary response panel data model with unobserved heterogeneity when some of the continuous contemporaneous explanatory variables are endogenous. To account for the endogeneity of migration from home communities, we employ a control function approach in which residuals from the reduced form for the endogenous regressor are introduced as covariates in the structural model. To deal with the dynamic nature of the model, we consider two possibilities. We first use a "pure" random effects approach that assumes the unobserved heterogeneity to be independent of the observed exogenous covariates and initial conditions. Next, we relax this strong assumption by employing the dynamic correlated random effects model introduced by Wooldridge (2005c). This approach is not only more relevant for analyses of poverty persistence, but also more flexible and computationally straightforward than alternative approaches currently in use.

We then implement our empirical approach using panel household and village data from rural China. Following the market-oriented reforms introduced in the early 1980s, there was a pronounced decline in the proportion of China's population living below the poverty line (Ravallion and Chen, 2007). While much of the literature examining growth in China's rural areas has focused on incentive effects related to reform and on the role of local nonfarm employment, relatively little research has examined the relationship between the reduction of barriers to migration from villages and the probability that households within the village have consumption levels below the poverty line. Our empirical analysis demonstrates an economically significant causal relationship between the reduction of barriers to migration and poverty reduction in rural China.

The paper proceeds as follows. In Section 3.2, we first review approaches to estimating dynamic binary response panel data models and then propose an approach to estimating these models when there is an endogenous regressor. In Section 3.3, we introduce the rural China setting and develop a specific implementation of the empirical model and a strategy for identifying the effect of migration on poverty within China's villages. In Section 3.4, we discuss our estimation results and the performance of the model, and in Section 3.5 we summarize our results and discuss directions for future research.

3.2 Estimation of a Dynamic Binary Response Panel Data Model with an Endogenous Regressor

3.2.1 Dynamic Binary Response Panel Data Models

Dynamic binary response panel data models with unobserved heterogeneity have been used extensively in theoretical and empirical studies. Both parametric and semiparametric methods have been proposed to solve the initial conditions problem and to obtain consistent estimates of model parameters when all explanatory variables other than the lagged dependent variable are strictly exogenous.2 Semiparametric methods allow estimation of parameters without specifying a distribution of the unobserved heterogeneity, but they are often overly restrictive with respect to the strictly exogenous covariates. Honoré and Kyriazidou (2000), for example, propose an approach that does not allow for discrete explanatory variables. More importantly, because semiparametric methods do not specify the distribution of the unobserved heterogeneity, the absolute importance of any of the explanatory variables in a dynamic binary response panel data model (for example, its average partial effect on the response probability) cannot be determined. Models with no assumption on either the unobserved effects or the initial conditions, or on their relationship to other covariates, are best described as fixed effects models, and the semiparametric approach of Honoré and Kyriazidou (2000) falls into this class.3

 

2With a structural binary outcome model that allows for unobserved effects, one must be concerned that bias could be introduced through a systematic relationship between an unobserved effect and the initial value of the dependent variable. This is known as the initial conditions problem.

3We follow Chay and Hyslop (2000) in classifying models requiring no assumption on unobservable effects or initial conditions as fixed effects models, and we refer to random effects models as those in which one specifies a distribution of unobserved effects and initial conditions given exogenous explanatory variables.


Because of their computational simplicity, parametric methods have received greater attention than semiparametric methods. Four main parametric approaches have been employed for estimating dynamic nonlinear panel data models with strictly exogenous covariates other than the lagged dependent variable. All four use conditional maximum likelihood estimation (CMLE). The first approach treats the initial condition for each cross-sectional unit, $y_{i0}$, as nonrandom. If, in addition, the unobserved effects $c_i$ are assumed to be independent of $\mathbf{z}_i$, one obtains the density of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ given the initial condition $y_{i0}$ and the exogenous explanatory variables $\mathbf{z}_i = (\mathbf{z}_{i1}, \mathbf{z}_{i2}, \ldots, \mathbf{z}_{iT})$ by integrating out $c_i$. We call the relationship between the observed exogenous covariates and the unobserved heterogeneity in the first method a "pure" random effects relationship because we assume $c_i$ to be independent of $\mathbf{z}_i$ and $y_{i0}$. While this method does provide a way to obtain consistent estimates of the model parameters, treating the initial conditions as nonrandom requires the very strong assumption of independence between the initial conditions and the unobserved effects.
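To make the first approach concrete, the following is a minimal Python sketch (not the authors' code) of one unit's log-likelihood in a dynamic probit with a "pure" random effect. It assumes a probit link, $c_i \sim \text{Normal}(0, \sigma_c^2)$ independent of $(\mathbf{z}_i, y_{i0})$, and Gauss-Hermite quadrature (20 nodes by default) for integrating out $c_i$; the names `unit_loglik`, `gamma`, and `sigma_c` are illustrative. Summing this over units and maximizing over $(\gamma, \rho, \sigma_c)$ would give the CMLE.

```python
import math

import numpy as np


def norm_cdf(x):
    """Standard normal CDF, applied elementwise via math.erf."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def unit_loglik(y, y0, z, gamma, rho, sigma_c, n_nodes=20):
    """Log-likelihood of (y_1, ..., y_T) given y_0 and z for one unit.

    Dynamic probit P(y_t = 1 | y_{t-1}, z, c) = Phi(z_t gamma + rho * y_{t-1} + c),
    with c ~ Normal(0, sigma_c^2) independent of (z, y_0) ("pure" random effects),
    integrated out by Gauss-Hermite quadrature.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    y = np.asarray(y, dtype=float)
    y_lag = np.concatenate(([float(y0)], y[:-1]))  # y_{t-1} for t = 1..T
    index = np.asarray(z, dtype=float) @ np.asarray(gamma, dtype=float) + rho * y_lag
    lik = 0.0
    for node, weight in zip(nodes, weights):
        c = math.sqrt(2.0) * sigma_c * node  # change of variables for N(0, sigma_c^2)
        p = norm_cdf(index + c)
        lik += weight * np.prod(np.where(y == 1.0, p, 1.0 - p))
    return math.log(lik / math.sqrt(math.pi))
```

Because the initial condition enters only through $y_{i,t-1}$ in the first period, this likelihood is valid only under the strong independence assumption discussed above.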

A second parametric approach treats the initial conditions as random and specifies a density of $y_{i0}$ given $(\mathbf{z}_i, c_i)$. With this density, one can then obtain the joint distribution of all the outcomes, $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$, conditional on the unobserved heterogeneity $c_i$ and the strictly exogenous observables $\mathbf{z}_i$. An important drawback of this approach is the difficulty of specifying the density of $y_{i0}$ given $(\mathbf{z}_i, c_i)$.4

A third method, proposed by Heckman (1981), approximates the density of the initial condition $y_{i0}$ given $(\mathbf{z}_i, c_i)$ and specifies a density of the unobserved effect given the strictly exogenous explanatory variables. The density of $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$ given $\mathbf{z}_i$ can then be obtained. While Heckman's approach avoids the drawback of the second method, it is computationally challenging. Since both the second and the third methods explicitly specify a distribution of the unobserved heterogeneity conditional on strictly exogenous observables and a distribution of the initial conditions conditional on the unobserved effects and the exogenous covariates, they can be classified as random effects models.

4More details on this approach and its potential drawbacks can be found in Wooldridge (2002), page 494.

Finally, an approach proposed by Wooldridge (2005c) obtains the joint distribution of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $(y_{i0}, \mathbf{z}_i)$ rather than the distribution of $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $\mathbf{z}_i$, as in Heckman's approach. For this method to work, we need to specify a density of $c_i$ given $(y_{i0}, \mathbf{z}_i)$.5 This fourth approach is more flexible and requires fewer computational resources than Heckman's technique. In this method, we call the relationship between the observed exogenous covariates and the unobserved heterogeneity a "correlated" random effects relationship because we allow $c_i$ to be a linear function of $\mathbf{z}_i$ and $y_{i0}$.
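As an illustration of the fourth approach, the following is a minimal sketch of the Wooldridge (2005c) specification for a dynamic probit without an endogenous regressor. The symbols $\psi$, $\psi_0$, $\boldsymbol{\psi}_1$, $\boldsymbol{\gamma}$, and $a_i$ are our illustrative notation, not taken from the text, and the probit link with unit idiosyncratic variance is an assumption. Because $a_i$ is independent of $(y_{i0}, \mathbf{z}_i)$, the resulting likelihood has the form of a standard random effects probit with $y_{i0}$ and $\mathbf{z}_i$ added as regressors in every period.

```latex
\begin{aligned}
c_i &= \psi + \psi_0\, y_{i0} + \mathbf{z}_i \boldsymbol{\psi}_1 + a_i,
\qquad a_i \mid (y_{i0}, \mathbf{z}_i) \sim \text{Normal}(0, \sigma_a^2), \\
P(y_{it} = 1 \mid y_{i,t-1}, \ldots, y_{i0}, \mathbf{z}_i, a_i)
&= \Phi\!\left(\mathbf{z}_{it}\boldsymbol{\gamma} + \rho\, y_{i,t-1} + \psi + \psi_0\, y_{i0} + \mathbf{z}_i \boldsymbol{\psi}_1 + a_i\right), \\
f(y_{i1}, \ldots, y_{iT} \mid y_{i0}, \mathbf{z}_i)
&= \int \prod_{t=1}^{T} \Phi(\cdot)^{y_{it}} \left[1 - \Phi(\cdot)\right]^{1 - y_{it}}
\frac{1}{\sigma_a}\, \phi\!\left(\frac{a}{\sigma_a}\right) da .
\end{aligned}
```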

In the next section, we develop a method that consistently estimates the parameters of a dynamic binary response panel data model when the contemporaneous explanatory variables are not strictly exogenous. To do so, we employ a control function approach, originally introduced by Smith and Blundell (1986) and Rivers and Vuong (1988). The main idea of the control function approach is to add control variables to the structural model to control for endogeneity. Since we consider a model with two possible sources of endogeneity, namely the correlation between the unobserved heterogeneity and a regressor and the correlation between a regressor and the structural error, we model the relationships among the unobserved effect, the exogenous covariates, and the error from the reduced form equation for the endogenous explanatory variable.

 

5The speciﬁcation of this density in Wooldridge’s method is motivated by Chamberlain’s
(1980) device.
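As a minimal illustration of the control function idea in the simplest linear cross-sectional case (a simplification for brevity; the chapter's model is a dynamic probit), the Python sketch below regresses the endogenous regressor on the instrument and then adds the first-stage residual to the structural regression. The simulated design (coefficient 1 on the endogenous regressor, 0.8 on the reduced-form error, first-stage coefficient 0.7) and all function names are hypothetical.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def simulate(n, beta=1.0, rho=0.8, pi=0.7, seed=0):
    """Cross-sectional data with one endogenous regressor x and one instrument z."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    v = rng.normal(size=n)
    x = pi * z + v                     # reduced form for x
    u = rho * v + rng.normal(size=n)   # structural error correlated with v
    y = beta * x + u
    return y, x, z


def control_function(y, x, z):
    """Two-step CF estimates of (intercept, beta, coefficient on v_hat)."""
    ones = np.ones_like(z)
    Z = np.column_stack([ones, z])
    v_hat = x - Z @ ols(x, Z)                          # step 1: reduced-form residuals
    return ols(y, np.column_stack([ones, x, v_hat]))   # step 2: v_hat as a control
```

In this linear case the coefficient on `x` coincides with the 2SLS estimate, while naive OLS is biased upward in this design; a t-test on the coefficient of `v_hat` is a regression-based test of exogeneity.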


3.2.2 A General Approach to Estimation

Our specification of the binary response model assumes that, for a random draw $i$ from the population, there is an underlying latent variable model:

$$y^*_{1it} = \mathbf{z}_{1it}\beta_1 + y_{2it}\beta_2 + \rho\, y_{1i,t-1} + c_{1i} + u_{1it}, \qquad (3.1)$$

$$y_{1it} = 1[y^*_{1it} \geq 0], \quad t = 1, \ldots, T, \qquad (3.2)$$

where $\mathbf{z}_{1it}$ is a $1 \times (K-1)$ vector of strictly exogenous covariates, which may contain a constant term, $y_{2it}$ is an endogenous covariate, $c_{1i}$ is an unobserved effect, and $u_{1it}$ is an idiosyncratic serially uncorrelated error such that $\text{Var}(u_{1it}) = 1$; $1[\cdot]$ is an indicator function. We assume a sample of size $N$ randomly drawn from the population and that $T$, the number of time periods, is fixed in the asymptotic analysis. For simplicity, we assume a balanced panel.

Let $\beta$ denote $(\beta_1', \beta_2, \rho)'$, the $(K+1) \times 1$ vector of all the parameters. Importantly, this model allows the probability of success at time $t$ to depend not only on the unobserved heterogeneity $c_{1i}$ but also on the outcome in $t-1$. A key assumption is that the dynamics are correctly specified: dynamic completeness of the model implies that the error term is serially uncorrelated. Thus, assuming that model (3.1) is dynamically correctly specified, we assume that the error term $u_{1it}$ is serially uncorrelated. Allowing $u_{1it}$ to have arbitrary serial correlation would suggest including more lags of the dependent variable in (3.1). For example, in the simplest case of a linear model in which the error term $u_{it}$ follows an AR(1) process, a simple calculation shows that the dependent variable, say $y_{it}$, depends not only on $y_{i,t-1}$ but also on $y_{i,t-2}$. Similarly, in the context of our model, one would need a good reason to expect a serially correlated error term $u_{1it}$ and yet include only one lag of $y_{1it}$.
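The "simple calculation" mentioned above can be written out for a linear model without covariates or unobserved effects (a simplification for illustration); $\lambda$ denotes the AR(1) coefficient of the error and $\varepsilon_{it}$ a white noise innovation, both our notation. Substituting the lagged equation for $u_{i,t-1}$ shows that a second lag of $y_{it}$ appears:

```latex
\begin{aligned}
y_{it} &= \rho\, y_{i,t-1} + u_{it}, \qquad u_{it} = \lambda\, u_{i,t-1} + \varepsilon_{it}, \\
u_{i,t-1} &= y_{i,t-1} - \rho\, y_{i,t-2}
\;\Longrightarrow\;
y_{it} = (\rho + \lambda)\, y_{i,t-1} - \rho\lambda\, y_{i,t-2} + \varepsilon_{it}.
\end{aligned}
```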


Further, we make additional assumptions on the strict exogeneity of the contemporaneous explanatory variables. First, some of the contemporaneous covariates, $\mathbf{z}_{1it}$, are assumed to be strictly exogenous (conditional on $c_{1i}$). Second, we allow some of the explanatory variables, here represented by the scalar $y_{2it}$, to be endogenous:

$$
\begin{aligned}
y_{2it} &= \mathbf{z}_{1it}\delta_1 + \mathbf{z}_{2it}\delta_2 + c_{2i} + u_{2it} \\
&= \mathbf{z}_{it}\delta + \bar{\mathbf{z}}_i\lambda + a_{2i} + u_{2it} \\
&= \mathbf{z}_{it}\delta + \bar{\mathbf{z}}_i\lambda + v_{2it}, \qquad (3.3)
\end{aligned}
$$

where $t = 1, \ldots, T$, $c_{2i}$ is an unobserved effect, and $u_{2it}$ is an idiosyncratic serially uncorrelated error with $\text{Var}(u_{2it}) = \sigma_2^2$. Let $\mathbf{z}_{it} = (\mathbf{z}_{1it}, \mathbf{z}_{2it})$ be a $1 \times L$ vector of instrumental variables, with $L \geq K$; that is, we assume the vector $\mathbf{z}_{2it}$ contains at least one element. We employ the Mundlak-Chamberlain device for the unobserved effect $c_{2i}$, which is reflected in the second line of equation (3.3): we replace $c_{2i}$ with its projection onto the time averages of all the exogenous variables, $c_{2i} = \bar{\mathbf{z}}_i\lambda + a_{2i}$. The new composite error term is then $v_{2it} = a_{2i} + u_{2it}$. Further, $\bar{\mathbf{z}}_i = T^{-1}\sum_{t=1}^{T}\mathbf{z}_{it}$ and $\delta = (\delta_1', \delta_2')'$. We follow Rivers and Vuong (1988) and refer to (3.3) as a reduced form equation.
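The first step, pooled OLS estimation of the reduced form (3.3) with the Mundlak time averages, might be sketched in Python as follows. This is a sketch under stated assumptions: the panel is balanced and stored in arrays, the instrument array excludes a constant column (an explicit intercept is added instead), and the function name is hypothetical. The residuals $\hat{v}_{2it}$ estimate the composite error $v_{2it} = a_{2i} + u_{2it}$.

```python
import numpy as np


def first_stage_residuals(y2, z):
    """Pooled OLS of y2_it on (1, z_it, zbar_i), the Mundlak form of (3.3).

    y2 : (N, T) endogenous regressor
    z  : (N, T, L) exogenous variables, assumed NOT to include a constant column
    Returns the coefficients (intercept, delta, lambda) and the (N, T) residuals.
    """
    n, t, l = z.shape
    # zbar_i computed once per unit and repeated in every period
    zbar = np.broadcast_to(z.mean(axis=1, keepdims=True), z.shape)
    X = np.concatenate([np.ones((n, t, 1)), z, zbar], axis=2).reshape(n * t, 1 + 2 * l)
    coef, *_ = np.linalg.lstsq(X, y2.reshape(n * t), rcond=None)
    v2_hat = (y2.reshape(n * t) - X @ coef).reshape(n, t)
    return coef, v2_hat
```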

Next, consider the relationship between $u_{1it}$ and $u_{2it}$. We assume that $(u_{1it}, u_{2it})$ has a zero-mean bivariate normal distribution and is independent of $\mathbf{z}_i = (\mathbf{z}_{1i}, \mathbf{z}_{2i}) = (\mathbf{z}_{i1}, \mathbf{z}_{i2}, \ldots, \mathbf{z}_{iT})$. Note that under joint normality of $(u_{1it}, u_{2it})$, with $\text{Var}(u_{1it}) = 1$, we can write

$$
\begin{aligned}
u_{1it} &= \theta u_{2it} + e_{1it} \\
&= \theta(v_{2it} - a_{2i}) + e_{1it}, \qquad (3.4)
\end{aligned}
$$

where $\theta = \eta/\sigma_2^2$, $\eta = \text{Cov}(u_{1it}, u_{2it})$, $\sigma_2^2 = \text{Var}(u_{2it})$, and $e_{1it}$ is a serially uncorrelated random term that is independent of $\mathbf{z}_i$ and $u_{2it}$. The absence of serial correlation in $e_{1it}$ follows from the fact that $u_{1it}$ and $u_{2it}$ are both assumed to be serially uncorrelated. If there were no lagged dependent variables on the right-hand side of equation (3.1), there would be little need to worry about possible serial correlation in the error term $u_{2it}$ of equation (3.3), as long as we assume that $u_{1it}$ is also serially uncorrelated. However, we are interested in a dynamic model, and the assumption of no serial correlation in $u_{2it}$ is crucial for equation (3.4). Since equation (3.3) is essentially a reduced form equation for the endogenous variable $y_{2it}$, the assumption of no serial correlation in $u_{2it}$ (and, as a result, in $e_{1it}$) is appropriate in the context of our model.
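A quick simulation can be used to check the decomposition in (3.4): with $\text{Var}(u_{1it}) = 1$, $\theta = \eta/\sigma_2^2$ is the population regression coefficient of $u_{1it}$ on $u_{2it}$, $e_{1it}$ is uncorrelated with $u_{2it}$, and $\text{Var}(e_{1it}) = 1 - \text{Corr}(u_{1it}, u_{2it})^2$. The Python sketch below draws from one hypothetical parameter setting; it illustrates the algebra and is not part of the estimation procedure.

```python
import numpy as np


def simulate_errors(eta, var_u2, n=200_000, seed=1):
    """Draw (u1, u2) bivariate normal with Var(u1) = 1, Cov(u1, u2) = eta,
    Var(u2) = var_u2, and form e1 = u1 - theta * u2 with theta = eta / var_u2."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, eta], [eta, var_u2]])
    u1, u2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    theta = eta / var_u2
    return theta, u1, u2, u1 - theta * u2
```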

Equation (3.4) is essentially an assumption regarding the contemporaneous endogeneity of $y_{2it}$. It states that the contemporaneous $u_{2it}$ is sufficient for explaining the relation between $u_{1it}$ and $\mathbf{u}_{2i} = (u_{2i1}, \ldots, u_{2iT})$. In other words, once we somehow account for the endogeneity of $y_{2it}$ in period $t$, we might think that $y_{2it}$ becomes "completely" exogenous and that we can estimate the parameters of interest using standard methods valid for exogenous explanatory variables. However, there is the possibility of additional feedback from the endogenous variable $y_2$ in other time periods to the main dependent variable of interest, $y_1$, at time $t$. This possibility arises because we let the reduced form equation for the endogenous variable $y_{2it}$ contain a time-constant unobserved effect, $a_{2i}$.

From assumption (3.4), $e_{1it} \sim \text{Normal}(0, \sigma^2_{e_1})$, where $\sigma^2_{e_1} = 1 - \xi^2$, since $\text{Var}(u_{1it}) = 1$ and $\xi = \text{Corr}(u_{1it}, u_{2it})$. We can therefore write

$$
\begin{aligned}
y_{1it} &= 1[\mathbf{x}_{1it}\beta + c_{1i} + \theta(v_{2it} - a_{2i}) + e_{1it} \geq 0] \\
&= 1[\mathbf{x}_{1it}\beta + \theta v_{2it} + (c_{1i} - \theta a_{2i}) + e_{1it} \geq 0] \\
&= 1[\mathbf{x}_{1it}\beta + \theta v_{2it} + c_{0i} + e_{1it} \geq 0], \qquad (3.5)
\end{aligned}
$$

where $t = 1, \ldots, T$, $\mathbf{x}_{1it} = (\mathbf{z}_{1it}, y_{2it}, y_{1i,t-1})$, $\beta = (\beta_1', \beta_2, \rho)'$, and $c_{0i} = c_{1i} - \theta a_{2i}$ is a composite unobserved effect. Since the unobserved effect $c_{0i}$ is present in equation (3.5), we must consider the relation between $c_{0i}$ and the explanatory variables in equation (3.5). Importantly, by construction the composite unobserved effect $c_{0i}$ is a function of $v_{2it}$, $t = 1, \ldots, T$:

$$c_{0i} = c_{1i} - \theta a_{2i} = c_{1i} - \theta(v_{2it} - u_{2it}), \qquad t = 1, \ldots, T.$$

Thus, in order to obtain consistent estimates of the parameters in equation (3.5), we must take into account the relation between $c_{0i}$ and $v_{2it}$ in different time periods.

First, we use a "pure" random effects approach; that is, we assume that

$$c_{0i} \mid \mathbf{z}_i, y_{1i0}, \mathbf{v}_{2i} \sim \text{Normal}(\alpha_0 \bar{v}_{2i}, \sigma^2_{a_1}), \qquad (3.6)$$

which can be written as $c_{0i} = \alpha_0 \bar{v}_{2i} + a_{1i}$, where $a_{1i} \mid \mathbf{z}_i, y_{1i0}, \mathbf{v}_{2i} \sim \text{Normal}(0, \sigma^2_{a_1})$, so that $a_{1i}$ is independent of $(\mathbf{z}_i, y_{1i0}, \mathbf{v}_{2i})$; here $\bar{v}_{2i} = T^{-1}\sum_{t=1}^{T} v_{2it}$ and $\mathbf{v}_{2i} = (v_{2i1}, v_{2i2}, \ldots, v_{2iT})$. While limiting in many potential applications, the "pure" RE assumption (3.6) may be relevant in certain cases. In particular, when every individual is in the same state in the initial time period (e.g., we are interested in the population of people who smoke), assumption (3.6) might be appropriate. Further, since we assume that the composite unobserved effect $c_{0i}$ is independent of the initial condition $y_{1i0}$, it is natural to think that the $v_{2it}$ in different time periods have equal impacts on $c_{0i}$. Consequently, we employ $\bar{v}_{2i}$ as a sufficient statistic for describing the relation between $c_{0i}$ and the $v_{2it}$ in different time periods.

Then, under assumptions (3.1)-(3.4) and (3.6), we can rewrite equation (3.5) as

$$y_{1it} = 1[\mathbf{x}_{1it}\beta + \theta v_{2it} + \alpha_0 \bar{v}_{2i} + a_{1i} + e_{1it} \geq 0]. \qquad (3.7)$$

Clearly, estimates of the scaled parameters $\beta_a = \beta/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}}$, $\theta_a = \theta/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}}$, and $\alpha_{0a} = \alpha_0/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}}$ can be obtained using standard random effects probit software by including the first-stage residual $\hat{v}_{2it}$ in each time period in the list of explanatory variables, along with $\mathbf{x}_{1it}$ and the time average $\hat{\bar{v}}_{2i} = T^{-1}\sum_{t=1}^{T}\hat{v}_{2it}$.
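The reason scaled parameters appear in (3.7) can be checked numerically: when $a_{1i}$ and $e_{1it}$ are independent zero-mean normals, $P(\text{index} + a_{1i} + e_{1it} \geq 0) = \Phi(\text{index}/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}})$. The Python sketch below compares this closed form with a Monte Carlo frequency for hypothetical values of the index and variances.

```python
import math

import numpy as np


def response_prob_closed_form(index, var_e1, var_a1):
    """P(index + a1 + e1 >= 0) = Phi(index / sqrt(var_e1 + var_a1))
    for independent zero-mean normal a1 and e1."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0 * (var_e1 + var_a1))))


def response_prob_simulated(index, var_e1, var_a1, n=400_000, seed=2):
    """Monte Carlo frequency of the event index + a1 + e1 >= 0."""
    rng = np.random.default_rng(seed)
    a1 = rng.normal(0.0, math.sqrt(var_a1), size=n)
    e1 = rng.normal(0.0, math.sqrt(var_e1), size=n)
    return float(np.mean(index + a1 + e1 >= 0.0))
```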
However, as discussed earlier, the assumption of independence between the unobserved effect, on the one hand, and the initial conditions and the exogenous covariates, on the other, is often too restrictive. In particular, the "pure" random effects assumption is unrealistic in the context of the application to poverty persistence that we examine below. For instance, unobserved dimensions of ability are very likely to be related to poverty status not only in the initial period but also in future periods.

Rather than using a "pure" random effects approach, we prefer to build on the dynamic "correlated" random effects model introduced by Wooldridge (2005c). Instead of the conditional distribution of $c_{0i}$ assumed in (3.6), we now assume that

$$c_{0i} \mid \mathbf{z}_i, y_{1i0}, \mathbf{v}_{2i} \sim \text{Normal}(\mathbf{v}_{2i}\alpha_0 + \mathbf{z}_i\alpha_1 + \alpha_2 y_{1i0}, \sigma^2_{a_1}), \qquad (3.8)$$

which follows from writing $c_{0i} = \mathbf{v}_{2i}\alpha_0 + \mathbf{z}_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i}$, where $a_{1i} \mid \mathbf{z}_i, y_{1i0}, \mathbf{v}_{2i} \sim \text{Normal}(0, \sigma^2_{a_1})$ and is independent of $(\mathbf{z}_i, y_{1i0}, \mathbf{v}_{2i})$. Since we allow for a nonzero correlation between the composite unobserved effect $c_{0i}$ and the initial condition $y_{1i0}$, the $v_{2it}$ in different time periods might have different effects on $c_{0i}$. Thus, we let the $v_{2it}$ from different time periods have unequal "weights" in explaining $c_{0i}$. Assumption (3.8) extends Chamberlain's assumption for a static probit model to the dynamic setting. To allow for correlation between $c_{0i}$ and $(\mathbf{z}_i, y_{1i0})$, we assume a conditional normal distribution with linear expectation and constant variance. Assumption (3.8) is restrictive since it specifies a distribution for $c_{0i}$ given $(\mathbf{z}_i, y_{1i0}, \mathbf{v}_{2i})$. However, it improves on the "pure" random effects approach in that it allows for some dependence between the unobserved effect and the vector of all explanatory variables across all time periods.

Then, under assumptions (3.1)-(3.4) and (3.8), we can rewrite equation (3.5) as

$$
\begin{aligned}
y_{1it} &= 1[\mathbf{x}_{1it}\beta + \theta v_{2it} + c_{0i} + e_{1it} \geq 0] \\
&= 1[\mathbf{x}_{1it}\beta + \theta v_{2it} + \mathbf{v}_{2i}\alpha_0 + \mathbf{z}_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i} + e_{1it} \geq 0]. \qquad (3.9)
\end{aligned}
$$

Equation (3.9) suggests that we can estimate $\beta_a = \beta/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}}$ and $\theta_a = \theta/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}}$, along with $\alpha_{0a} = \alpha_0/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}}$, $\alpha_{1a} = \alpha_1/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}}$, and $\alpha_{2a} = \alpha_2/\sqrt{\sigma^2_{e_1} + \sigma^2_{a_1}}$, using standard random effects probit software by including $\hat{\mathbf{v}}_{2i}$, $\mathbf{z}_i$, and $y_{1i0}$ in each time period in the list of explanatory variables, along with $\mathbf{x}_{1it}$ and $\hat{v}_{2it}$.
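For the second step, the regressor list described for (3.9) could be assembled as below. This is a Python sketch under assumptions of ours: a balanced panel stored in arrays, all $T$ residuals in $\hat{\mathbf{v}}_{2i}$ and all $T$ periods of $\mathbf{z}_i$ entered as unit-level regressors, and an explicit intercept (so $\mathbf{z}$ should exclude a constant column). The function name is hypothetical, and the random effects probit routine that would consume `X` and `y` (grouped by unit) is not shown.

```python
import numpy as np


def cre_design(y1, y2, z1, z, v2_hat):
    """Second-stage regressors for the correlated random effects probit (3.9).

    y1     : (N, T+1) binary outcomes; column 0 is the initial condition y1_i0
    y2     : (N, T)    endogenous regressor
    z1     : (N, T, K1) strictly exogenous covariates of the structural equation
    z      : (N, T, L)  all exogenous variables, assumed to exclude a constant column
    v2_hat : (N, T)    first-stage residuals
    Returns X with N*T rows (unit-major, matching y) and the stacked outcomes y.
    """
    n, t = y2.shape
    y_lag = y1[:, :-1]          # y1_{i,t-1}, t = 1..T
    v2_all = v2_hat             # (v2_i1, ..., v2_iT), repeated in every period
    z_all = z.reshape(n, -1)    # (z_i1, ..., z_iT), repeated in every period
    y0 = y1[:, 0]               # initial condition
    periods = []
    for s in range(t):
        periods.append(np.column_stack([
            np.ones(n), z1[:, s, :], y2[:, s], y_lag[:, s],  # intercept and x1_it
            v2_hat[:, s],                                    # contemporaneous control
            v2_all, z_all, y0,                               # terms from assumption (3.8)
        ]))
    X = np.stack(periods, axis=1).reshape(n * t, -1)
    y = y1[:, 1:].reshape(n * t)
    return X, y
```

Rows are ordered unit by unit, so unit $i$ (counting from 0) occupies rows $iT$ through $iT + T - 1$, which is the grouping a random effects probit needs.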

3.2.3 Allowing for Serial Correlation of Errors in the First Stage

If the first-stage error $u_{2it}$ is serially correlated, we must modify our two-step estimating procedure. To be specific, assume that $u_{2it}$ follows an AR(1) process, $u_{2it} = \pi u_{2i,t-1} + e_{2it}$, where $e_{2it}$ is a white noise error with $\text{Var}(e_{2it}) = \sigma^2_{e_2}$. Then, under assumption (3.4),

$$
\begin{aligned}
\text{Cov}(e_{1it}, e_{1i,t-1}) &= \text{Cov}(u_{1it} - \theta u_{2it},\; u_{1i,t-1} - \theta u_{2i,t-1}) \\
&= \text{Cov}(u_{1it} - \theta\pi u_{2i,t-1} - \theta e_{2it},\; u_{1i,t-1} - \theta u_{2i,t-1}) = \pi\theta^2 E(u^2_{2i,t-1}),
\end{aligned}
$$

which is nonzero unless either $\pi = 0$ or $\theta = 0$. Clearly, assumption (3.4) is no longer appropriate and needs to be corrected.

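Define the variance-covariance matrix of $\mathbf{v}_{2i}$ as $\Omega \equiv E(\mathbf{v}_{2i}'\mathbf{v}_{2i})$, a $T \times T$ matrix that we assume to be positive definite. Then,

$$
\Omega = E(\mathbf{v}_{2i}'\mathbf{v}_{2i}) = \sigma^2_{a_2}\, \mathbf{j}_T \mathbf{j}_T' + \sigma^2_{u_2}
\begin{pmatrix}
1 & \pi & \pi^2 & \cdots & \pi^{T-2} & \pi^{T-1} \\
\pi & 1 & \pi & \cdots & \pi^{T-3} & \pi^{T-2} \\
\pi^2 & \pi & 1 & \cdots & \pi^{T-4} & \pi^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\pi^{T-1} & \pi^{T-2} & \pi^{T-3} & \cdots & \pi & 1
\end{pmatrix}, \qquad (3.10)
$$

where $\mathbf{j}_T$ is a $T \times 1$ vector of ones and $\sigma^2_{u_2} = \sigma^2_{e_2}/(1 - \pi^2)$. We can obtain consistent estimates of the parameters in (3.10) and use them to transform $\hat{v}_{2it}$ to $v^*_{2it}$, a first-stage error free of serial correlation. One useful method for estimating $\pi$, $\sigma^2_{a_2}$, $\sigma^2_{e_2}$, and $\sigma^2_{u_2}$ is the minimum distance estimator, described in detail by Chamberlain (1984). Cappellari (1999) has developed code that conveniently implements this method in Stata.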
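The covariance structure (3.10) can be built directly, for example to check positive definiteness at candidate parameter values or to compare it with the sample covariance of the first-stage residuals. This Python sketch assumes a stationary AR(1) process ($|\pi| < 1$) and uses hypothetical names; it does not implement the minimum distance estimator itself.

```python
import numpy as np


def omega_ar1_re(T, pi, var_a2, var_e2):
    """Covariance of v2_i = a2_i * j_T + u2_i when u2 is a stationary AR(1), as in (3.10).

    var_a2 : variance of the time-constant effect a2_i
    var_e2 : variance of the AR(1) innovation e2_it
    """
    if not -1.0 < pi < 1.0:
        raise ValueError("a stationary AR(1) process requires |pi| < 1")
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))  # |t - s|
    var_u2 = var_e2 / (1.0 - pi ** 2)                              # stationary Var(u2_it)
    return var_a2 * np.ones((T, T)) + var_u2 * pi ** lags          # j j' term + AR(1) term
```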


Once we have first-stage errors free of serial correlation, we can use the transformation $u^*_{2it} = v^*_{2it} - a_{2i}$ to adjust assumption (3.4). We can then assume that, under joint normality of $(u_{1it}, u^*_{2it})$,

$$
\begin{aligned}
u_{1it} &= \theta u^*_{2it} + e_{1it} \\
&= \theta(v^*_{2it} - a_{2i}) + e_{1it}, \qquad (3.11)
\end{aligned}
$$

where $e_{1it}$ is a serially uncorrelated random term that is independent of $\mathbf{z}_i$ and $u^*_{2it}$. Including $u^*_{2it}$ instead of $u_{2it}$ in equation (3.11) guarantees that $e_{1it}$ will not be serially correlated. Then, we can write

$$
\begin{aligned}
y_{1it} &= 1[\mathbf{x}_{1it}\beta + c_{1i} + \theta v^*_{2it} - \theta a_{2i} + e_{1it} \geq 0] \\
&= 1[\mathbf{x}_{1it}\beta + \theta v^*_{2it} + (c_{1i} - \theta a_{2i}) + e_{1it} \geq 0] \\
&= 1[\mathbf{x}_{1it}\beta + \theta v^*_{2it} + c_{0i} + e_{1it} \geq 0], \qquad (3.12)
\end{aligned}
$$

where $t = 1, \ldots, T$, and $c_{0i} = c_{1i} - \theta a_{2i}$ is a composite unobserved effect.

Based on equation (3.12), it is straightforward to adjust the two-step estimating procedure of Section 3.2.2 to account for serial correlation in $u_{2it}$. For example, under the "correlated" random effects assumption (3.8), equation (3.12) can be written as

$$
\begin{aligned}
y_{1it} &= 1[\mathbf{x}_{1it}\beta + \theta v^*_{2it} + c_{0i} + e_{1it} \geq 0] \\
&= 1[\mathbf{x}_{1it}\beta + \theta v^*_{2it} + \mathbf{v}_{2i}\alpha_0 + \mathbf{z}_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i} + e_{1it} \geq 0]. \qquad (3.13)
\end{aligned}
$$

Then, we can estimate the parameters $\beta$, $\theta$, $\alpha_0$, $\alpha_1$, and $\alpha_2$ using standard random effects probit software by including $\hat{\mathbf{v}}_{2i}$, $\mathbf{z}_i$, and $y_{1i0}$ in each time period in the list of explanatory variables, along with $\mathbf{x}_{1it}$ and $\hat{v}^*_{2it}$.

3.2.4 Calculation of Average Partial Effects

To assess the magnitude of state dependence, we must calculate the average partial effect (APE) of the lagged poverty status on its current value. We follow an approach favored by Wooldridge (2002) to calculate the APEs after our two-step estimation procedure. The APEs can be calculated by taking either differences or derivatives of

$$E\left[\Phi\left(\mathbf{x}^{o}_{1t}\beta_a + \theta_a v_{2it} + \mathbf{v}_{2i}\alpha_{0a} + \mathbf{z}_i\alpha_{1a} + \alpha_{2a} y_{1i0}\right)\right], \qquad (3.14)$$

where $t = 1, \ldots, T$ and, in the argument of the expectation operator, variables with a subscript $i$ are random and all others are fixed.

To estimate (3.14), we appeal to a standard uniform weak law of large numbers argument.6 For any given value $\mathbf{x}^{o}_{1t}$ of $\mathbf{x}_{1t}$, a consistent estimator of expression (3.14) can be obtained by replacing the unknown parameters with consistent estimators:

$$N^{-1}\sum_{i=1}^{N}\Phi\left(\mathbf{x}^{o}_{1t}\hat\beta_* + \hat\theta_*\hat v_{2it} + \hat{\mathbf{v}}_{2i}\hat\alpha_{0*} + \mathbf{z}_i\hat\alpha_{1*} + \hat\alpha_{2*} y_{1i0}\right), \qquad (3.15)$$

where $t = 1, \ldots, T$, the $\hat v_{2it}$ are the first-stage pooled OLS residuals from regressing $y_{2it}$ on $(\mathbf{z}_{it}, \bar{\mathbf{z}}_i)$, $\hat{\mathbf{v}}_{2i} = (\hat v_{2i1}, \hat v_{2i2}, \ldots, \hat v_{2iT})$, the * subscript denotes multiplication by $\hat\sigma_a = (\hat\sigma^2_{e_1} + \hat\sigma^2_{a_1})^{-1/2}$, and $\hat\beta$, $\hat\theta$, $\hat\alpha_0$, $\hat\alpha_1$, $\hat\alpha_2$, and $\hat\sigma^2_{a_1}$ are the conditional MLEs. Note that $\hat\sigma^2_{a_1}$ is the usual error variance estimator from the second-stage random effects probit regression of $y_{1it}$ on $\mathbf{x}_{1it}$, $\hat v_{2it}$, $\hat{\mathbf{v}}_{2i}$, $\mathbf{z}_i$, and $y_{1i0}$. One may then employ either a mean value expansion or a bootstrapping approach to obtain asymptotic standard errors. We can compute either changes or derivatives of equation (3.15) with respect to $\mathbf{x}_{1t}$ to obtain the APEs of interest.
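A minimal Python sketch of the sample analogue (3.15), taking the difference in the lagged outcome $y_{1i,t-1}$ from 0 to 1 at fixed values of the other elements of $\mathbf{x}_{1t}$, is given below. It assumes that the scaled estimates (the * quantities) and the unit-level part of the index, $\hat{\mathbf{v}}_{2i}\hat\alpha_{0*} + \mathbf{z}_i\hat\alpha_{1*} + \hat\alpha_{2*}y_{1i0}$, have already been computed; all names are hypothetical.

```python
import math

import numpy as np


def norm_cdf(x):
    """Standard normal CDF, applied elementwise via math.erf."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def ape_lagged_outcome(xb_other, rho_s, theta_s, v2_hat_t, cre_index):
    """Sample analogue of (3.15), differenced in y1_{i,t-1} from 0 to 1.

    xb_other  : scalar, x1t^o beta_* excluding the lagged-outcome term
    rho_s     : scaled coefficient on y1_{i,t-1}
    theta_s   : scaled coefficient on the first-stage residual
    v2_hat_t  : (N,) first-stage residuals in period t
    cre_index : (N,) v2_i alpha0_* + z_i alpha1_* + alpha2_* y1_i0 for each unit
    """
    base = xb_other + theta_s * np.asarray(v2_hat_t) + np.asarray(cre_index)
    # average over units of Phi(index with y_{t-1} = 1) - Phi(index with y_{t-1} = 0)
    return float(np.mean(norm_cdf(base + rho_s) - norm_cdf(base)))
```

For standard errors, one option consistent with the bootstrapping approach mentioned above is to resample whole units $i$ with replacement and, in each replication, repeat both estimation steps and this average.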
As with the adjustment to our estimating procedure, one must also correct the estimated APEs when the first-stage errors are serially correlated. We obtain the APEs by taking either differences or derivatives of

$$E\left[\Phi\left(\mathbf{x}^{o}_{1t}\beta_a + \theta_a v^*_{2it} + \mathbf{v}_{2i}\alpha_{0a} + \mathbf{z}_i\alpha_{1a} + \alpha_{2a} y_{1i0}\right)\right], \qquad (3.16)$$

where $t = 1, \ldots, T$. For simplicity, consider the sample-average approach used above for (3.15) to obtain the APE estimates. For any given value $\mathbf{x}^{o}_{1t}$ of $\mathbf{x}_{1t}$, a consistent estimator of expression (3.16) is obtained by replacing the unknown parameters with consistent estimators:

$$N^{-1}\sum_{i=1}^{N}\Phi\left(\mathbf{x}^{o}_{1t}\hat\beta_* + \hat\theta_*\hat v^*_{2it} + \hat{\mathbf{v}}_{2i}\hat\alpha_{0*} + \mathbf{z}_i\hat\alpha_{1*} + \hat\alpha_{2*} y_{1i0}\right), \qquad (3.17)$$

where $t = 1, \ldots, T$, $\hat v^*_{2it}$ is a first-stage residual cleaned of serial correlation, the * subscript denotes multiplication by $\hat\sigma_a = (\hat\sigma^2_{e_1} + \hat\sigma^2_{a_1})^{-1/2}$, and $\hat\beta$, $\hat\theta$, $\hat\alpha_0$, $\hat\alpha_1$, $\hat\alpha_2$, and $\hat\sigma^2_{a_1}$ are the conditional MLEs. We can then compute either changes or derivatives of equation (3.17) with respect to $\mathbf{x}_{1t}$ to obtain the APEs of interest.

6See Wooldridge (2002) for details.

3.3 Migrant Labor Markets and Poverty Persistence in Rural China

Before applying the dynamic binary response model discussed above to an analysis of how migrant labor markets affect poverty status in rural China, we first briefly review the history of rural-urban migration in China and other evidence on the impact of migration in migrants' home villages. Next, we propose a specific implementation of the dynamic binary response model for analyzing the impact of migration on the probability that a rural household is poor. We then introduce the unique panel household and village data sources used in our analysis and describe our approach to identifying the migrant networks that affect the cost of finding migrant employment for village residents.

3.3.1 Rural-Urban Migration in China

China's labor market experienced a dramatic change during the 1990s, as the volume of rural migrants moving to urban areas for employment grew rapidly. Estimates using the one percent samples from the 1990 and 2000 rounds of the Population Census and the 1995 one percent population survey suggest that the inter-county migrant population grew from just over 20 million in 1990 to 45 million in 1995 and 79 million by 2000 (Liang and Ma, 2004). Surveys conducted by the National Bureau of Statistics (NBS) and the Ministry of Agriculture include more detailed retrospective information on past short-term migration and suggest even higher levels of labor migration than those reported in the census (Cai, Park and Zhao, 2007).

Before labor mobility restrictions were relaxed, households in remote regions of rural China faced low returns to local economic activity, reinforcing geographic poverty traps (Jalan and Ravallion, 2002). A considerable body of descriptive evidence related to the growth of migration in China raises the possibility that migrant opportunity may be an important mechanism for poverty reduction. Studies of the impact of migration on migrant households suggest that migration is associated with higher incomes (Taylor, Rozelle and de Brauw, 2003; Du, Park, and Wang, 2006), facilitates risk-coping and risk-management (Giles, 2006; Giles and Yoo, 2006), and is associated with higher levels of local investment in productive activities (Zhao, 2003).

Institutional changes, policy signals, and the high return to labor in urban areas each played a role in the expansion of migration during the 1990s. An early reform of the household registration (hukou) system in 1988 first established a mechanism for rural migrants to obtain legal temporary residence in China's urban areas (Mallee, 1995). To take advantage of this policy change, rural residents needed a national identity card to obtain a legal temporary worker card (zanzhu zheng), but not all rural counties had distributed IDs as of 1988.7 As China recovered from its post-Tiananmen retrenchment, some credit a series of policy speeches made by Deng Xiaoping in 1992 as signals of renewed openness toward the marketization of the economy, including the employment of migrant rural labor in urban areas (Chan and Zhang, 1999). Combined with economic expansion, these institutional and policy changes led to increased demand for construction and service sector workers and catalyzed the growth in rural-urban migration that continued throughout the 1990s.

7Legal temporary residence status does not confer access to the same set of benefits (e.g., subsidized education, health care, and housing) typically associated with permanent registration as a city resident.

Migrant networks and employment referral in urban areas are important dimensions of China's rural-urban migration experience. Rozelle et al. (1999) emphasize that villages with more migrants in 1988 experienced more rapid migration growth by 1995. Zhao (2003) shows that the number of early migrants from a village is correlated with the probability that an individual with no prior migration experience will choose to participate in the migrant labor market. Meng (2000) further suggests that variation in the size of migrant flows to different destinations can be partially explained by the size of the existing migrant population in potential destinations.8

8 Referral through one's social network is a common method of job search in both the developing and developed world. Carrington, Detragiache, and Vishwanath (1996) explicitly show that in a model of migration, moving costs can decline with the number of migrants over time, even if wage differentials between source communities and destinations narrow. Survey-based evidence suggests that roughly 50 percent of new jobs in the US are found through referrals facilitated by social networks (Montgomery, 1991). In a study of Mexican migrants in the US, Munshi (2003) shows that having more migrants from one's own village living in the same city increases the likelihood of employment.

3.3.2 The RCRE Household Survey

The primary data sources used for our analyses are the village and household surveys conducted by the Research Center for Rural Economy (RCRE) at China's Ministry of Agriculture from 1986 through the 2003 survey year. We use data from 90 villages in eight provinces (Anhui, Jilin, Jiangsu, Henan, Hunan, Shanxi, Sichuan and Zhejiang) that were surveyed over the 17-year period, with an average of 6305 households surveyed per year. Depending on village size, between 40 and 120 households were randomly surveyed in each village.

The RCRE household survey collected detailed household-level information on incomes and expenditures, education, labor supply, asset ownership, land holdings, savings, formal and informal access to credit, and remittances.9 As in the National Bureau of Statistics (NBS) Rural Household Survey, respondent households keep daily diaries of income and expenditure, and a resident administrator living in the county seat visits households once a month to collect information from the diaries.

Our measure of consumption includes nondurable goods expenditure plus an imputed flow of services from household durable goods and housing. To convert the stock of durables into a flow of consumption services, we assume that current and past investments in housing are "consumed" over a 20-year period and that investments in durable goods are consumed over a period of 7 years.10 We also annually "inflate" the value of the stock of durables to reflect the increase in durable goods' prices over the period. Finally, we deflate all income and expenditure data to 1986 prices using the NBS rural consumer price index for each province.
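The imputation of service flows described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure documented in Benjamin et al. (2005): it assumes straight-line consumption (each investment yields an equal share of its value in each year of its assumed life), revalues only the durable-goods flow using a hypothetical durable price index, and deflates by a provincial CPI normalized to 1.0 in the base year. The function names are hypothetical.

```python
import numpy as np


def service_flow(investments, life, price_index):
    """Annual service flow from a sequence of past investments.

    Assumes straight-line consumption: an investment made in year s yields
    1/life of its value in each of years s, ..., s + life - 1, revalued to
    year-t prices by price_index[t] / price_index[s].
    """
    investments = np.asarray(investments, dtype=float)
    price_index = np.asarray(price_index, dtype=float)
    n = len(investments)
    flow = np.zeros(n)
    for t in range(n):
        # Only investments made in the last `life` years still yield services.
        for s in range(max(0, t - life + 1), t + 1):
            flow[t] += investments[s] / life * price_index[t] / price_index[s]
    return flow


def real_consumption(nondurables, housing_inv, durable_inv, durable_prices, cpi):
    """Nondurable spending plus imputed housing (20-year) and durable (7-year)
    service flows, deflated to base-year prices (cpi = 1.0 in the base year)."""
    years = len(nondurables)
    # Housing flows are not revalued here (an assumption of this sketch).
    housing = service_flow(housing_inv, 20, np.ones(years))
    # The durable stock is "inflated" each year with a durable-goods price index.
    durables = service_flow(durable_inv, 7, durable_prices)
    nominal = np.asarray(nondurables, dtype=float) + housing + durables
    return nominal / np.asarray(cpi, dtype=float)


# Two years of hypothetical data for one household.
print(real_consumption([100, 100], [200, 0], [70, 0], [1.0, 1.1], [1.0, 1.1]))
```

Straight-line consumption is the simplest way to spread an investment over a fixed life; a declining-balance schedule would be an equally plausible alternative.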

There has been some debate over the representativeness of both the RCRE and NBS surveys, and concern over differences between trends in poverty and inequality in the NBS and RCRE surveys. These issues are reviewed extensively in Appendix B of Benjamin et al. (2005), but it is worth summarizing some of their findings here. First, when cross sections of the NBS and RCRE surveys are compared with overlapping years of cross-sectional surveys that do not use a diary method, it is apparent that some high- and low-income households are under-represented.11 Poorer illiterate households are likely to be under-represented because enumerators find it difficult to implement and monitor the diary-based survey, and refusal rates are likely to be high among affluent households who find the diary reporting method a costly use of their time. Second, much of the difference between levels and trends from the NBS and RCRE surveys can be explained by differences in the valuation of home-produced grain and in the treatment of taxes and fees.

9 One shortcoming of the survey is the lack of individual-level information. However, we know the numbers of working-age adults and dependents, as well as the gender composition of household members.
10 Our approach to valuing consumption follows the suggestions of Chen and Ravallion (1996) for the NBS Rural Household Survey, and is explained in more detail in Appendix A of Benjamin et al. (2005).
11 The cross-sections used were the rural samples of the 1993, 1997 and 2000 China Health and Nutrition Survey (CHNS) and a survey conducted in 2000 by the Center for Chinese Agricultural Policy (CCAP) with Scott Rozelle (UC Davis) and Loren Brandt (University of Toronto).

3.3.3 Migration, Consumption Growth and Poverty Trends

One of the benefits of the accompanying village survey is a question asked each year of village leaders about the number of registered village residents working and living outside the village. In our analysis, we consider all registered residents working outside their home county to be migrants.12 Both the tremendous increase in migration from 1987 onward and heterogeneity across villages are evident in Figure C.1. In 1987 an average of 3 percent of working age laborers in RCRE villages were working outside of their home villages, a share that rose steadily to 23 percent by 2003. Moreover, we observe considerable variability in the share of working age laborers working as migrants. Whereas some villages still had a small share of legal village residents employed as migrants, more than 50 percent of working age adults from other villages were employed outside the village by 2003.

The relationship between migration and consumption is of central concern for our analysis. The linear fit of the relationship between annual changes in migration and average village consumption growth in the RCRE data suggests a positive relationship (Figure C.2). The lowess fit, however, suggests the presence of nonlinearities, particularly around zero. Indeed, the prospect that out-migration may be driven by negative shocks which also depress consumption should raise concern that the size of the migrant network and consumption may be endogenous, driven in part by shocks affecting both variables.

12 From follow-up interviews with village leaders, it is apparent that registered residents living outside the county are unlikely to be commuters and generally live and work outside the village for more than six months of the year.

Even if consumption grows with an increase in the number of residents earning incomes from migrant employment, it is of considerable policy interest to understand which residents within villages are experiencing increases in consumption. Changes in the village poverty headcount are negatively associated with the change in the number of out-migrants, suggesting that poverty declines with increased out-migration (Figure C.3). Nonlinearities in the bivariate relationship are again evident in the nonparametric lowess plot. Whether these nonlinearities reflect shocks that simultaneously raise out-migration and poverty in some villages, or simply the fact that we have not controlled for other village characteristics, establishing a relationship between migration and increased consumption of poorer households within villages requires an analytical framework that eliminates bias due to simultaneity and potential sources of unobserved heterogeneity.

A Causal Relationship Between Migration and Consumption Growth

In other research using this data source, de Brauw and Giles (2007) use linear dynamic panel data methods with continuous regressors to demonstrate a robust relationship between the reduction of obstacles to rural-urban migration and household consumption growth. While one might suspect that the non-poor, who have sufficiently high human capital and other dimensions of ability, benefit most from reductions in barriers to migration, general equilibrium effects of out-migration may lead to greater specialization of households within villages that benefits the poor. In particular, de Brauw and Giles demonstrate that, with out-migration, households at the lower end of the consumption distribution increase both their investments in agriculture-related assets and the area of land that they cultivate by more than richer households do. This raises the prospect that the ability to migrate may be causally related to poverty reduction within rural communities as well.

In the empirical application of our discrete binary response model below, we are simply seeking to understand whether out-migration from villages is associated with reductions in the probability that household consumption falls below the poverty line in rural China. We are agnostic as to whether poverty is reduced through direct participation in the migrant labor market, or through indirect general equilibrium effects that raise the return to labor in agricultural and other local activities.

3.3.4 Estimating the Impact of Migrant Labor Markets on Poverty Persistence

The econometric approach derived in Section 3.2.2 allows us to control for household-specific unobserved effects, which will include fixed effects associated with the village in which households are located. We are interested in estimating the dynamic binary choice model for the probability that a household $i$ falls below the poverty line at time $t$:

$$pov_{it} = 1\left[\beta_1 pov_{it-1} + \beta_2 \left(M^i_{jt} \times pov_{it-1}\right) + \beta_3 M^i_{jt} + X_{it}'\alpha_1 + \alpha_2 lpc_{it} + D_t + u_i + \varepsilon_{it} > 0\right] \tag{3.18}$$

where $pov_{it}$ is a binary indicator for whether the household is in poverty in year $t$, which will be affected by poverty status in the prior period, $pov_{it-1}$; the size of the migrant network from village $j$ through which household $i$ may be able to obtain a job referral, $M^i_{jt}$; a vector of household demographic and human capital characteristics, $X_{it}$; household land per capita, $lpc_{it}$; and year dummies to control for macroeconomic shocks, $D_t$. We will be concerned about the possibility that an unobserved household effect, $u_i$, may be systematically related to the size of the household's migrant network, to other covariates, and to household poverty status, thus introducing endogeneity concerns. The error term, $\varepsilon_{it}$, may be serially correlated, and we may be concerned that shocks in the error term may also be systematically related to the size of the migrant network, $M^i_{jt}$, and to the possibility of falling into poverty, and thus contribute an additional endogeneity concern.

From the model specified in (3.18), we are particularly interested in identifying the coefficients on $pov_{it-1}$, $M^i_{jt}$ and $M^i_{jt} \times pov_{it-1}$. The coefficients on $pov_{it-1}$ and $M^i_{jt} \times pov_{it-1}$ allow us to gauge the importance of persistence in the probability that a household is poor, and the impact of access to migrant employment opportunities through the migrant network on poverty persistence. The coefficient on $M^i_{jt}$, $\beta_3$, allows us to determine the impact of the migrant network on the probability that a household will fall into poverty.
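To see how $\beta_1$, $\beta_2$ and $\beta_3$ combine, suppose, as a probit would, that $\varepsilon_{it}$ is standard normal, and hold the rest of the index fixed. The response probability of a previously poor household then moves with the network through $\beta_2 + \beta_3$, while that of a previously non-poor household moves through $\beta_3$ alone. The sketch below evaluates these probabilities; the function names and all coefficient values are hypothetical.

```python
from math import erf, sqrt


def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_poor(pov_lag, m, rest, b1, b2, b3):
    """P(pov_it = 1) = Phi(b1*pov_lag + b2*m*pov_lag + b3*m + rest).

    `rest` stands in for X_it'a1 + a2*lpc + D_t + u_i, held fixed here.
    """
    return norm_cdf(b1 * pov_lag + b2 * m * pov_lag + b3 * m + rest)


# Hypothetical coefficients: persistence b1 > 0, network effects b2, b3 < 0.
b1, b2, b3, rest = 1.0, -0.10, -0.05, -1.0
for pov_lag in (0, 1):
    p_small = prob_poor(pov_lag, 1.0, rest, b1, b2, b3)
    p_large = prob_poor(pov_lag, 2.0, rest, b1, b2, b3)
    # With pov_lag = 1 the index changes by (b2 + b3) per unit of m; otherwise by b3 alone.
    print(pov_lag, round(p_small, 4), round(p_large, 4))
```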

The specification shown in (3.18) may have additional sources of endogeneity if we believe that household demographic and human capital characteristics, $X_{it}$, or land per capita, $lpc_{it}$, may vary with unobserved shocks in period $t$ or $t-1$. We address the possible concern over endogenous household composition by using household demographic and human capital variables for the legal long-term registered residents of households. While household size may vary somewhat with shocks as individuals move in and out of the household to find temporary work elsewhere, such variations do not show up in registered household membership. Long-term membership only changes when households split, or with events such as marriage or a legal change of residence to another location. Land managed by the household may also vary with shocks. Land markets in rural China do not function well: land cannot be bought and sold, and only in the last few years have farmers gained the right to explicitly transfer land. Instead, land is allocated by village leaders, and reallocated or adjusted among households within village small groups if a household is judged to have too little land to support itself. Nonetheless, there is some possibility that reallocation may be related to shocks that occur in period $t$ or $t-1$ that may also be systematically related to poverty status and the migrant network size. Wooldridge (2002) shows that when the strict exogeneity assumption fails in the context of standard FE estimation, the inconsistency of the FE estimator is of order $T^{-1}$. We thus use the period $t-2$ value of land per capita and estimate:

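$$pov_{it} = 1\left[\beta_1 pov_{it-1} + \beta_2 \left(M^i_{jt} \times pov_{it-1}\right) + \beta_3 M^i_{jt} + X_{it}'\alpha_1 + \alpha_2 lpc_{it-2} + D_t + u_i + \varepsilon_{it} > 0\right] \tag{3.19}$$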

One remaining issue is that we do not perfectly observe the network $M^i_{jt}$ that household $i$ may use for job referrals. Instead, we observe the number of registered long-term village residents who are employed as migrants outside the village in a particular year, $M_{jt}$. The true migrant network may include former legal registered residents who have now changed their long-term residence status, implying that the actual potential network is larger. Alternatively, the household may not be familiar with all of the village out-migrants, and thus the actual network through which a household may seek referrals may be smaller. Thus, we will estimate:

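$$pov_{it} = 1\left[\beta_1 pov_{it-1} + \beta_2 \left(M_{jt} \times pov_{it-1}\right) + \beta_3 M_{jt} + X_{it}'\alpha_1 + \alpha_2 lpc_{it-2} + D_t + u_i + \varepsilon_{it} > 0\right] \tag{3.20}$$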

In our identification strategy below, we instrument the endogenous number of village out-migrants, $M_{jt}$, with village-level instruments for the size of the village migrant labor force, interacted with period $t-2$ land per capita, $lpc_{it-2}$, to allow for differences in the effective value of the village migrant network for households with different amounts of land.

Why might we expect interacting with $lpc_{it-2}$ to achieve this? We believe that the land per capita managed by households is likely to pick up a dimension of proximity among households within the village. Within villages in rural China, households are separated into smaller units of roughly 20 households known as village small groups (cun xiaozu), which were referred to as production teams during the Maoist period. These households are located in clusters and will have closer relationships with one another than with households of other small groups. Moreover, property rights to land in rural China typically reside with the small group, not with the village. Thus, land reallocations typically take place within but not across small groups. Small groups make frequent small adjustments to household land as land per capita becomes unequal with differential changes in household structure within the small group, but there is much less flexibility in making adjustments across small groups. As a result, much of the variability of land per capita within villages occurs across small groups.13 Interacting a village-level instrument for the migrant network with land per capita allows the importance of $M_{jt}$ to vary across households, and much of this variation arises from unobserved differences among the small groups in which households reside and which migrants regard as home.
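Footnote 13 reports a Lagrange Multiplier test of whether small-group dummies explain land per capita beyond village dummies. A minimal sketch of such a statistic, assuming homoskedastic errors and that each small group lies within a single village, computes $LM = N R^2_u$ from an auxiliary regression of the village-dummy residuals on the small-group dummies, to be compared with a $\chi^2$ distribution whose degrees of freedom equal the number of small groups minus the number of villages. The function names and data are hypothetical, and the reported statistic may have been computed differently.

```python
import numpy as np


def group_mean_fit(y, labels):
    """Fitted values from regressing y on a full set of dummies for `labels`
    (each observation is replaced by its group mean)."""
    labels = np.asarray(labels)
    fitted = np.empty(len(y))
    for g in np.unique(labels):
        mask = labels == g
        fitted[mask] = y[mask].mean()
    return fitted


def lm_nested_dummies(y, village, group):
    """LM statistic and degrees of freedom for H0: small-group dummies add
    nothing beyond village dummies (small groups nested within villages)."""
    y = np.asarray(y, dtype=float)
    resid = y - group_mean_fit(y, village)      # restricted (village-only) residuals
    fitted_aux = group_mean_fit(resid, group)   # auxiliary regression on group dummies
    r2_aux = (fitted_aux ** 2).sum() / (resid ** 2).sum()
    lm = len(y) * r2_aux
    df = len(np.unique(group)) - len(np.unique(village))
    return lm, df


# Hypothetical land per capita for 8 households in 2 villages and 4 small groups.
lm, df = lm_nested_dummies([1, 2, 3, 4, 5, 6, 7, 8],
                           village=[0, 0, 0, 0, 1, 1, 1, 1],
                           group=[0, 0, 1, 1, 2, 2, 3, 3])
print(lm, df)
```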

13 We do not know village small group membership in the RCRE survey prior to 2003, when a new survey instrument was introduced. If we regress land per capita on village dummy variables in 2003, we obtain an R-squared of 0.503, while if we regress land per capita on small group dummy variables, we obtain an R-squared of 0.616. A Lagrange Multiplier test for whether the small group effects add anything significant over the village effects, which is effectively a test of whether small group coefficients are constant within villages, yields an LM statistic of 310.67, which has a p-value of 0.0000.

3.3.5 Identifying the Migrant Network

To instrument the village migrant network, we make use of two policy changes that, working together, affect the strength of migrant networks outside home counties but are plausibly unrelated to other determinants of household consumption and poverty. First, a new national ID card (shenfen zheng) was introduced in 1984. While urban residents received IDs in 1984, residents of most rural counties did not receive them immediately. Second, in 1988, a reform of the residential registration system made it easier for migrants to gain legal temporary residence in cities, but a national ID card was necessary to obtain a temporary residence permit (Mallee, 1995). While some rural counties made national IDs available to rural residents as early as 1984, others distributed them in 1988, and still others did not issue IDs until several years later. The RCRE follow-up survey asked local officials when IDs had actually been issued to rural residents of the county. In our sample, 41 of the 90 counties issued cards in 1988, but cards were issued as early as 1984 in three counties and as late as 1997 in one county. It is important to note that IDs were not necessary for migration, and large numbers of migrants live in cities without legal temporary residence cards. However, migrants with temporary residence cards have a more secure position in the destination community, hold better jobs, and would thus plausibly make up part of a longer-term migrant network in migrant destinations. Thus, ID distribution had two effects after the 1988 residential registration (hukou) reform. First, the costs of migrating to a city should have fallen once IDs became available. Second, if the quality of the migrant network improves with the years since IDs became available, then the costs of finding migrant employment should have continued to fall over time.

As a result, the size of the migrant network should be a function of both whether or not cards have been issued and the time since cards were issued in the village. Given that the size of the potential network has an upper bound, we expect years-since-IDs-issued to have a nonlinear relationship with the size of the migrant labor force, and we expect growth in the migrant network to decline after initially increasing with the distribution of IDs. In Figure C.4, we show a lowess plot of the relationship between years since IDs were distributed and the change in the number of migrants from the village from year $t-1$ to $t$. Note the sharp increase in migrants from the time that IDs are distributed and then a slowing of the increase over time (which would imply an even slower growth rate). This pattern suggests nonlinearity in the relationship between ID distribution and new participants in the village migrant labor force. We thus specify our instrument as a dummy variable indicating that IDs had been issued interacted with the years since they had been issued, and then experiment with quadratic, cubic and quartic functions of years-since-IDs-issued. We settle on the quartic function for our instruments because, as we show below, it fits the pattern of expanding migrant networks better than the quadratic or the cubic functions.

Since ID distribution was the responsibility of county-level offices of the Ministry of Civil Affairs, which are distinctly separate from agencies involved in setting policies affecting land, credit, taxation and poverty alleviation (the Ministry of Agriculture and Ministry of Finance handle most decisions that affect these policies at the local level), it is plausible that ID distribution is not systematically related to unobservable policy decisions with a more direct relationship to household consumption. Still, using a function of the years since IDs were issued is not an ideal identification strategy. Ideally, a policy would exist that was randomly implemented, affecting the ability to migrate from some counties but not others. As the differential timing of the distribution of ID cards was not necessarily random, we must be concerned that counties with specific characteristics or that followed specific policies were singled out to receive ID cards earlier than other counties, or that features of counties receiving IDs earlier are systematically correlated with other policies affecting consumption growth. These counties, one might argue, were "allowed" to build up migrant networks faster than others.

In an earlier paper, de Brauw and Giles (2007) address several possible concerns with the use of the years-since-IDs quartic as instruments for the size of the village migrant labor force. They first show that the timing of ID distribution appears to be related to the remoteness of the village, but not systematically related to village policies that may affect consumption growth, to village administrative capacity, or to the demand for IDs within the village. They thus argue in favor of including a village fixed effect to control for features of the local county that may have affected the timing of ID distribution, and then identify the size of the village migrant labor force off of nonlinearities in the time that it takes for migrant networks to build up. In this paper, we interact the quartic in years since IDs were issued with predetermined household land per capita in period $t-2$ to identify the size of the village migrant network.
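A minimal sketch of how this instrument set might be built, assuming a household-year layout with the county's ID-issue year merged onto each record: the instruments are the ID-issued dummy times powers one through four of the years since issue, each multiplied by $lpc_{it-2}$. Counting the issue year itself as zero years since issue is an assumption of the sketch, as are the function and argument names.

```python
import numpy as np


def id_instruments(year, id_year, lpc_lag2, degree=4):
    """Instrument columns 1[IDs issued] * (years since issue)^k * lpc_{it-2}, k = 1..degree.

    All inputs are aligned arrays with one entry per household-year; id_year is
    the year the household's county issued IDs.
    """
    year = np.asarray(year, dtype=float)
    id_year = np.asarray(id_year, dtype=float)
    lpc_lag2 = np.asarray(lpc_lag2, dtype=float)
    issued = (year >= id_year).astype(float)     # ID-issued dummy
    years_since = issued * (year - id_year)      # zero before (and in) the issue year
    return np.column_stack([issued * years_since ** k * lpc_lag2
                            for k in range(1, degree + 1)])


# Three hypothetical household-years: two after a 1988 issue date, one before.
Z = id_instruments(year=[1990, 1990, 1985], id_year=[1988, 1988, 1988],
                   lpc_lag2=[2.0, 1.0, 3.0])
print(Z)
```

Setting `degree=2` or `degree=3` gives the quadratic and cubic variants; in this sketch the village fixed effects and household controls would enter the first stage separately.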

3.4 Results

Before estimating equation (3.20), we establish that our instruments are significantly related to the size of the migrant labor force. We estimate the relationship as a quadratic, cubic, and quartic function of the years since IDs were issued, each interacted with period $t-2$ land per capita. These results are reported in columns (1) through (3) and columns (4) through (6) of Table C.2 for each year from 1995 to 2001 and for odd years from 1989 to 2001, respectively.14 We find a strong relationship between our instruments and the size of the migrant network for each specification. For the remainder of our estimation we favor the quartic function interacted with $t-2$ land per capita for two reasons. First, the effects of ID card distribution on the migrant network can be modeled more flexibly with the quartic specification. Second, the partial $R^2$ increases slightly from the quadratic to the quartic for both samples we consider. After controlling for household characteristics, the instruments have jointly significant effects on the number of migrants from the village for both samples, with F-statistics of 39.82 and 54.65 for the 1995 to 2001 and odd-year 1989 to 2001 samples, respectively.
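The joint significance test and partial $R^2$ can be illustrated with a classical (homoskedastic, non-clustered) F statistic that compares first-stage regressions with and without the excluded instruments. This is a sketch under those assumptions; the reported statistics may rely on a different variance estimator, and the function name and simulated data are hypothetical.

```python
import numpy as np


def first_stage_f(y, exog, instruments):
    """Classical F statistic and partial R^2 for H0: instrument coefficients are zero.

    y: endogenous regressor (e.g. the village migrant count); exog: included
    controls (with a constant); instruments: excluded instruments.
    """
    y = np.asarray(y, dtype=float)
    x_r = np.asarray(exog, dtype=float)
    x_u = np.column_stack([x_r, np.asarray(instruments, dtype=float)])

    def ssr(x):
        coef, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ coef
        return resid @ resid

    ssr_r, ssr_u = ssr(x_r), ssr(x_u)
    q = x_u.shape[1] - x_r.shape[1]          # number of excluded instruments
    df_resid = len(y) - x_u.shape[1]
    f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / df_resid)
    partial_r2 = (ssr_r - ssr_u) / ssr_r
    return f_stat, partial_r2


# Simulated data with two relevant instruments.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=(n, 2))
m = 1.0 + 0.5 * x + z @ np.array([1.0, -1.0]) + rng.normal(size=n)
exog = np.column_stack([np.ones(n), x])
f_stat, pr2 = first_stage_f(m, exog, z)
print(round(f_stat, 1), round(pr2, 3))
```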

14 Since the RCRE survey was not conducted in 1992 and 1994, we estimate the dynamic model with one-year spacing from 1995 to 2001, and with two-year spacing from 1989 to 2001.

We apply the method introduced in Section 3.2.2 to estimate equation (3.20). In Table C.3, we report estimation results based on the pure random effects (RE) and correlated random effects approaches. Both sets of results are obtained using the Stata "xtprobit" command. In both specifications, the explanatory variables are the number of household members, the number of prime-age household laborers, the second lag of land per capita, average years of education, the share of females, lagged poverty status, the migration network, the interaction between lagged poverty status and the migration network, year dummies (not shown), and residuals from the first-stage estimation (not shown). The pure RE specification additionally includes the time averages of the first-stage residuals (not shown). The correlated RE specification instead additionally includes the first-stage residuals and all the exogenous explanatory variables in each time period (not shown), together with poverty status in the initial time period. For purposes of comparison, we also estimate model (3.20) using a naive linear probability model and provide the results in Table C.4.

Even after controlling for the unobserved effect using our correlated RE approach, the coefficients on lagged poverty status are highly statistically significant for explaining current poverty status in both datasets considered. The positive sign on lagged poverty status suggests that being poor in a previous period significantly increases the probability of being poor in the current period. Poverty status in the initial period is also very important, which implies substantial correlation between the unobserved effect and the initial condition. For the 1989-2001 dataset, the coefficient on initial-period poverty status (1.028) is larger than the coefficient on lagged poverty status (0.820); for the 1995-2001 dataset, the ordering is reversed (1.161 versus 1.523). The migrant network is statistically significant for explaining poverty status in all models except the correlated RE model estimated on the 1995-2001 dataset. The interaction between the migration network and lagged poverty status is statistically significant at the 0.01 level in every case we consider. The negative sign of the interaction term suggests that households that were poor in the previous period benefit more from increases in the size of the migration network in the current period. In other words, in our application, we find that migration is important both for reducing the likelihood that poor households remain in poverty and for reducing the likelihood that non-poor households fall into poverty. Further, failure to allow the unobserved heterogeneity to be correlated with the initial condition and the explanatory variables leads to an overestimate of the impact of migrant labor markets on the probability that households below the poverty line stay poor. The coefficient on the interaction term between the migration network and lagged poverty status for the pure RE approach (-0.188 for the 1989-2001 dataset and -0.180 for the 1995-2001 dataset) is larger in absolute value than the coefficient on the interaction term for the correlated RE method (-0.108 for the 1989-2001 dataset and -0.137 for the 1995-2001 dataset).
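The regressor set for the correlated RE specification can be sketched as a design-matrix builder for a balanced panel, following the description above: the current-period regressors and first-stage residual, plus initial poverty status and the full history of exogenous variables and residuals, repeated in every period. Year dummies are omitted for brevity, the function name and array layout are hypothetical, and the resulting `(y, W)` would still have to be passed to a random-effects probit routine such as Stata's xtprobit.

```python
import numpy as np


def cre_design(pov, m, x, v):
    """Stack (y, W) for a dynamic correlated-RE probit with a control function.

    pov, m, v: arrays of shape (N, T) with poverty status, migrant network and
    first-stage residuals; column 0 is the initial period.
    x: array of shape (N, T, K) of exogenous covariates.
    Each row of W, for periods t = 1..T-1, is
    [1, pov_{t-1}, m_t, m_t * pov_{t-1}, x_t, v_t, pov_0, x_1..x_{T-1}, v_1..v_{T-1}].
    """
    pov, m, v = (np.asarray(a, dtype=float) for a in (pov, m, v))
    x = np.asarray(x, dtype=float)
    n, t_max, _ = x.shape
    # Household-level terms repeated in every period: the initial condition and
    # the history of exogenous variables and residuals over periods 1..T-1.
    history = np.column_stack([pov[:, 0], x[:, 1:, :].reshape(n, -1), v[:, 1:]])
    ys, rows = [], []
    for t in range(1, t_max):
        lag = pov[:, t - 1]
        current = np.column_stack([np.ones(n), lag, m[:, t], m[:, t] * lag,
                                   x[:, t, :], v[:, t]])
        rows.append(np.hstack([current, history]))
        ys.append(pov[:, t])
    return np.concatenate(ys), np.vstack(rows)


# Two hypothetical households observed in three periods, one covariate.
y, W = cre_design(pov=[[1, 0, 1], [0, 0, 1]], m=[[5, 6, 7], [1, 2, 3]],
                  x=[[[10], [11], [12]], [[20], [21], [22]]],
                  v=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(W.shape)
```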

In Table C.5 we show the APEs for both models considered for both data samples. For example, for the sample from 1989 to 2001, the correlated random effects control function (CF) estimate of the APE of 100 more members in the migration network for households living above the poverty line is a reduction of about 3.6 percentage points in the probability of being poor. For households living below the poverty line, the corresponding APE is a reduction of 5.7 percentage points. Interestingly, the APEs calculated using the correlated random effects dynamic probit approach are generally smaller than those calculated using the linear probability model. This suggests that a naive LPM approach might lead us to conclude that migration has a stronger impact on poverty reduction than is found using the correlated random effects probit model.
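An APE of this kind averages, over the sample, the change in the predicted probability of being poor when the network grows by 100 members, holding lagged poverty status at 0 or 1. The sketch below assumes the probit coefficients have already been scaled for the random-effects variance (so the standard normal CDF applies directly), that the network is measured in persons, and that columns 1 to 3 of the design matrix hold lagged poverty, the network, and their interaction; these layouts, names and coefficient values are hypothetical.

```python
from math import erf, sqrt

import numpy as np

_erf = np.vectorize(erf)


def norm_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / sqrt(2.0)))


def ape_network(W, beta, lag_value, delta=100.0, lag_col=1, m_col=2, inter_col=3):
    """Average change in P(poor) when the network grows by `delta`, with
    lagged poverty set to `lag_value` (0 or 1) for every household."""
    w0 = np.array(W, dtype=float)                  # copy of the design matrix
    w0[:, lag_col] = lag_value
    w0[:, inter_col] = w0[:, m_col] * lag_value    # keep the interaction consistent
    w1 = w0.copy()
    w1[:, m_col] += delta
    w1[:, inter_col] = w1[:, m_col] * lag_value
    beta = np.asarray(beta, dtype=float)
    return float(np.mean(norm_cdf(w1 @ beta) - norm_cdf(w0 @ beta)))


# Hypothetical scaled coefficients: [constant, pov_lag, network, interaction].
W = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 50.0, 50.0]]
beta = [0.0, 0.5, -0.01, -0.005]
for lag in (0, 1):
    print(lag, round(ape_network(W, beta, lag), 4))
```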

3.5 Conclusions

In this paper, we have developed a dynamic binary response panel data model that allows for an endogenous regressor. This estimation approach is of particular value in settings in which one wants to estimate the effects of a treatment that is also endogenous. We then apply the model to examine the impact of rural-urban migration on the likelihood that households in rural China fall below the poverty line. In our application, we find that migration is important both for reducing the likelihood that poor households remain in poverty and for reducing the likelihood that households that were not poor in the previous period fall into poverty.

APPENDIX A

Tables for Chapter 1

Table A.1. Usual Unobserved Effects CRC Model for β = 2 and T = 5

                              (1)     (2)     (3)     (4)     (5)     (6)
Estimator  Time Dummies?     Mean      SD    RMSE      LQ  Median      UQ

N = 100
POLS       no               3.363    .189   1.377   3.238   3.356   3.486
FE-OLS     no               2.616    .138   0.642   2.527   2.621   2.711
IV         no               2.752    .225   0.781   2.612   2.761   2.901
FE-IV      no               2.423    .214   0.484   2.288   2.429   2.558
FE-IV      yes              1.945    .407   0.407   1.711   1.980   2.208

N = 400
POLS       no               3.369    .091   1.372   3.299   3.366   3.434
FE-OLS     no               2.623    .067   0.635   2.575   2.626   2.667
IV         no               2.745    .110   0.760   2.666   2.740   2.818
FE-IV      no               2.428    .096   0.455   2.362   2.423   2.498
FE-IV      yes              1.988    .177   0.213   1.887   1.997   2.101

N = 800
POLS       no               3.373    .063   1.375   3.330   3.366   3.412
FE-OLS     no               2.625    .046   0.637   2.596   2.624   2.655
IV         no               2.753    .076   0.764   2.700   2.750   2.801
FE-IV      no               2.436    .068   0.458   2.389   2.437   2.480
FE-IV      yes              2.004    .131   0.182   1.919   2.009   2.091

Table A.2. Usual Unobserved Effects CRC Model for β = 2 and T = 10

                              (1)     (2)     (3)     (4)     (5)     (6)
Estimator  Time Dummies?     Mean      SD    RMSE      LQ  Median      UQ

N = 100
POLS       no               3.204    .157   1.223   3.097   3.195   3.314
FE-OLS     no               2.534    .106   0.562   2.469   2.531   2.603
IV         no               2.397    .123   0.440   2.324   2.395   2.475
FE-IV      no               2.277    .115   0.331   2.208   2.276   2.351
FE-IV      yes              2.013    .283   0.313   1.841   2.020   2.210

N = 400
POLS       no               3.196    .077   1.202   3.146   3.193   3.247
FE-OLS     no               2.528    .056   0.545   2.490   2.527   2.565
IV         no               2.392    .061   0.417   2.450   2.393   2.431
FE-IV      no               2.270    .060   0.305   2.231   2.274   2.308
FE-IV      yes              1.995    .138   0.186   1.901   2.002   2.092

N = 800
POLS       no               3.194    .054   1.200   3.155   3.194   3.224
FE-OLS     no               2.525    .040   0.541   2.498   2.523   2.551
IV         no               2.388    .042   0.410   2.357   2.387   2.416
FE-IV      no               2.268    .041   0.299   2.241   2.267   2.294
FE-IV      yes              1.992    .100   0.160   1.926   1.993   2.062

Table A.3. Random Trend CRC Model for β = 2 and T = 5

                              (1)     (2)     (3)     (4)     (5)     (6)
Estimator  Time Dummies?     Mean      SD    RMSE      LQ  Median      UQ

N = 100
POLS       yes              4.293    .300   2.303   4.096   4.284   4.475
FE-OLS     yes              2.673    .182   0.697   2.555   2.671   2.782
IV         yes              2.929    .850   1.247   2.444   2.941   3.496
FE-IV      yes              2.000    .626   0.642   1.635   2.057   2.383
FE-IV      no              13.414   1.411  11.422  12.464  13.221  14.225

N = 400
POLS       yes              4.308    .144   2.312   4.201   4.307   4.411
FE-OLS     yes              2.663    .085   0.679   2.607   2.666   2.721
IV         yes              3.004    .411   1.073   2.704   3.023   3.292
FE-IV      yes              2.013    .269   0.301   1.835   2.019   2.204
FE-IV      no              13.406    .665  11.406  12.915  13.340  13.878

N = 800
POLS       yes              4.296    .097   2.294   4.225   4.295   4.363
FE-OLS     yes              2.660    .060   0.671   2.617   2.658   2.700
IV         yes              2.996    .278   1.038   2.809   2.993   3.171
FE-IV      yes              1.996    .187   0.223   1.874   2.005   2.130
FE-IV      no              13.351    .478  11.328  13.049  13.318  13.654

Table A.4. Random Trend CRC Model for β = 2 and T = 10

                              (1)     (2)     (3)     (4)     (5)     (6)
Estimator  Time Dummies?     Mean      SD    RMSE      LQ  Median      UQ

N = 100
POLS       yes              4.789    .407   2.820   4.522   4.814   5.051
FE-OLS     yes              2.651    .178   0.687   2.539   2.656   2.761
IV         yes              2.916   1.042   1.401   2.357   2.976   2.615
FE-IV      yes              1.968    .619   0.641   1.603   2.001   2.384
FE-IV      no              15.933    .771  13.919  15.367  15.902  16.479

N = 400
POLS       yes              4.808    .190   2.815   4.678   4.808   4.943
FE-OLS     yes              2.662    .089   0.678   2.600   2.662   2.718
IV         yes              3.000    .504   1.137   2.659   2.993   3.361
FE-IV      yes              1.981    .311   0.338   1.767   1.978   2.203
FE-IV      no              15.900    .406  13.875  15.633  15.890  16.177

N = 800
POLS       yes              4.788    .144   2.784   4.682   4.779   4.885
FE-OLS     yes              2.663    .062   0.674   2.618   2.660   2.703
IV         yes              3.000    .360   1.061   2.759   3.026   3.243
FE-IV      yes              1.997    .201   0.234   1.855   1.998   2.132
FE-IV      no              15.904    .289  13.888  15.693  15.895  16.098
APPENDIX B

Tables for Chapter 2

[Four rotated table pages preceding Table B5 are illegible in the source scan.]

Table B5. Standard Errors for the Control Function Approach

 (1)      (2)       (3)        (4)        (5)        (6)        (7)
 [?]       N       Mean     Reg. SE    Rob. SE    Adj. SE        SD

[?] on a large support set

Usual Unobserved Effect CRC Model
 ≠ 0      500     2.059      0.040      0.035      0.056      0.057
         1000     2.057      0.029      0.025      0.040      0.039
 = 0      500     2.002      0.080      0.075      0.090      0.089
         1000     2.000      0.057      0.053      0.064      0.062

Random Trend CRC Model
 ≠ 0      500     2.039      0.194      0.113      0.122      0.126
         1000     2.037      0.137      0.081      0.087      0.084
 = 0      500     1.993      0.278      0.169      0.176      0.178
         1000     2.001      0.196      0.120      0.124      0.120

[?] ∈ (0, 1)

Usual Unobserved Effect CRC Model
 ≠ 0      500     2.150      0.204      0.211      0.221      0.233
         1000     2.137      0.143      0.150      0.157      0.157
 = 0      500     2.012      0.319      0.246      0.256      0.241
         1000     2.005      0.225      0.174      0.181      0.186

Random Trend CRC Model
 ≠ 0      500     2.229      0.615      0.690      0.693      0.723
         1000     2.182      0.433      0.497      0.499      0.494
 = 0      500     1.949      1.077      0.875      0.877      0.864
         1000     1.965      0.760      0.617      0.619      0.668
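
Table B5 compares, for a two-step control function (CF) estimator, the average of several standard-error estimates with the Monte Carlo SD of the point estimates. The sketch below illustrates only the simplest constant-coefficient case, not the CRC models in the table. The first-stage residual `v_hat` is added as a regressor; the CF coefficient on x then coincides with 2SLS; and both the usual and the robust second-step standard errors treat `v_hat` as observed data. A standard result is that these naive standard errors are valid when the coefficient on `v_hat` is zero and otherwise need a correction for the estimated first stage. The excerpt does not define "Adj. SE", so the pairs bootstrap here, which re-runs both steps, is only one generic way to make that correction. All function names and the data-generating process are hypothetical.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients, residuals and (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    return b, y - X @ b, XtX_inv


def control_function(y, x, z):
    """Two-step CF estimate of the coefficient on x, with two naive SEs.

    Step 1: regress x on (1, z) and keep the residuals v_hat.
    Step 2: regress y on (1, x, v_hat).
    Both SEs treat v_hat as observed data (no first-stage correction).
    """
    n = len(y)
    _, v_hat, _ = ols(x, np.column_stack([np.ones(n), z]))
    X = np.column_stack([np.ones(n), x, v_hat])
    b, u_hat, XtX_inv = ols(y, X)
    sigma2 = u_hat @ u_hat / (n - X.shape[1])
    reg_se = np.sqrt(sigma2 * XtX_inv[1, 1])                # usual OLS SE
    meat = (X * u_hat[:, None] ** 2).T @ X
    rob_se = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])      # HC0 robust SE
    return b[1], reg_se, rob_se


def two_sls(y, x, z):
    """2SLS coefficient on x with instruments (1, z), for comparison."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    X = np.column_stack([np.ones(n), x])
    PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)             # projection of X on Z
    return np.linalg.solve(PzX.T @ X, PzX.T @ y)[1]


def bootstrap_se(y, x, z, reps=200, seed=1):
    """Pairs bootstrap that re-runs both steps, so first-stage noise is included."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(reps):
        idx = rng.integers(0, n, size=n)
        draws.append(control_function(y[idx], x[idx], z[idx])[0])
    return np.std(draws, ddof=1)


# Hypothetical DGP: y = 1 + 2x + u, with x endogenous through the shared shock e.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.8 * z + e
y = 1.0 + 2.0 * x + 0.5 * e + rng.normal(size=n)

beta_cf, reg_se, rob_se = control_function(y, x, z)
boot_se = bootstrap_se(y, x, z)
print(f"CF {beta_cf:.3f}  2SLS {two_sls(y, x, z):.3f}")
print(f"Reg. SE {reg_se:.3f}  Rob. SE {rob_se:.3f}  bootstrap SE {boot_se:.3f}")
```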

 

[One rotated table page following Table B5 is illegible in the source scan.]

Table B.7. POLS Estimates of the First Stage Regressions

Variable                       (1)        (2)        (3)        (4)        (5)        (6)
grant_it                   35.410*    36.198*    28.697*    32.504*    29.185*    32.736*
                           [5.934]    [5.515]    [5.364]    [5.024]    [5.294]    [5.081]
grantbar_i                  -7.693     -8.482      6.788      2.981     -2.352     -5.903
                          [11.885]   [12.530]   [16.474]   [17.456]   [12.982]   [14.064]
d_t                             No        Yes         No        Yes         No        Yes
Control Variable(s)             No         No        Yes        Yes        Yes        Yes
R²                            .258       .291       .314       .332       .284       .304
Number of Observations         135        135        117        117        120        120

Notes: (i) The dependent variable is hrsemp_it; grant_it is a dummy indicator for whether
a grant was received, and grantbar_i is the time average of grant_it. (ii) Quantities in
square brackets are fully robust standard errors. (iii) The row labeled "d_t" indicates
whether a regression includes separate year intercepts for 1988 and 1989 in the first
and second stages of estimation. (iv) Control variables in (3) and (4) include
log(employ_it), the log of the number of employees; log(sales_it), the log of annual
sales; and log(avgsal_it), the log of average employee salary. Controls in (5) and (6)
include log(avgsal_it) only.

*Statistically significant at the .01 level.
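
Table B.7 reports pooled OLS first-stage regressions of hrsemp_it on grant_it and its time average grantbar_i, with fully robust standard errors. The sketch below shows that computation on a simulated panel. Everything in it is hypothetical: the helper `pols_cluster`, the data-generating process, and the coefficient values. Treating "fully robust" as standard errors clustered by firm is an assumption, though it is the usual meaning for panel data. The last lines check the Mundlak (1978) result that, in a balanced panel, including the time average makes the coefficient on grant_it equal to the within (fixed effects) estimate with year dummies.

```python
import numpy as np


def pols_cluster(y, X, groups):
    """Pooled OLS with SEs robust to arbitrary within-cluster correlation."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    u = y - X @ b
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        s = X[groups == g].T @ u[groups == g]      # score summed over one firm
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return b, np.sqrt(np.diag(V))


# Hypothetical balanced panel: N firms observed in 1987, 1988 and 1989.
rng = np.random.default_rng(0)
N, T = 2000, 3
firm = np.repeat(np.arange(N), T)
year = np.tile(np.arange(T), N)
c = rng.normal(size=N)                               # unobserved firm effect
grant = ((rng.normal(size=(N, T)) + 0.5 * c[:, None]) > 0.8).astype(float).ravel()
grant_bar = grant.reshape(N, T).mean(axis=1).repeat(T)   # time average of grant_it
d88 = (year == 1).astype(float)
d89 = (year == 2).astype(float)
hrsemp = 5 + 30 * grant + 4 * d88 + 6 * d89 + 10 * c[firm] + rng.normal(scale=10, size=N * T)

X = np.column_stack([np.ones(N * T), grant, grant_bar, d88, d89])
b, se = pols_cluster(hrsemp, X, firm)
for name, bi, si in zip(["const", "grant", "grant_bar", "d88", "d89"], b, se):
    print(f"{name:>9}: {bi:8.3f} [{si:.3f}]")


def demean(a):
    """Subtract each firm's time average (the within transformation)."""
    return a - a.reshape(N, T).mean(axis=1).repeat(T)


Xw = np.column_stack([demean(grant), demean(d88), demean(d89)])
b_within = np.linalg.lstsq(Xw, demean(hrsemp), rcond=None)[0]
print(f"POLS with grant_bar: {b[1]:.4f}  within: {b_within[0]:.4f}")
```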

[Two rotated table pages following Table B.7 are illegible in the source scan.]

APPENDIX C

Tables and Figures for Chapter 3

 

Figure C.1. Share of Village Labor Force Employed as Migrants by Year
[Figure: share of the village labor force employed as migrants (vertical axis) by year, 1987 to 2003 (horizontal axis).]

 

Figure C.2. Village Consumption Growth
[Figure: average village consumption growth (vertical axis) against change in number of migrants (horizontal axis), with lowess and linear fits.]

 

Figure C.3. Change in Poverty Headcount
[Figure: change in village poverty headcount (vertical axis) against change in number of migrants (horizontal axis), with lowess and linear fits.]
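
Figures C.2 and C.3 overlay a lowess fit and a linear fit on the same scatter. As a rough illustration of what the lowess curve computes, the sketch below implements a minimal one-pass version: at each point it fits a tricube-weighted straight line to the nearest `frac` share of observations and evaluates that line at the point. The function name `lowess_fit`, the bandwidth `frac`, and the simulated village data are assumptions for illustration. The excerpt does not say which software or bandwidth produced the figures, and library implementations such as `statsmodels.nonparametric.smoothers_lowess.lowess` also add robustness re-weighting iterations that this sketch omits.

```python
import numpy as np


def lowess_fit(x, y, frac=0.5):
    """One-pass lowess: a tricube-weighted local linear fit evaluated at each x_i.

    No robustness re-weighting iterations are performed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = max(int(np.ceil(frac * n)), 2)                    # points in each local window
    fitted = np.empty(n)
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        h = max(np.sort(dist)[r - 1], 1e-12)              # distance to r-th nearest point
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        X = np.column_stack([np.ones(n), x - x0])
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]                               # local intercept = fit at x0
    return fitted


# Hypothetical village-level data (not the dissertation's survey data).
rng = np.random.default_rng(0)
d_migrants = rng.uniform(-100, 100, size=300)
growth = 0.05 * np.tanh(d_migrants / 50) + rng.normal(scale=0.02, size=300)
linear = np.polyval(np.polyfit(d_migrants, growth, 1), d_migrants)  # linear fit
smooth = lowess_fit(d_migrants, growth, frac=0.3)                   # lowess fit
for i in np.argsort(d_migrants)[::60]:
    print(f"{d_migrants[i]:7.1f}  linear {linear[i]:.4f}  lowess {smooth[i]:.4f}")
```

Because each local fit is linear, the lowess curve reproduces an exactly linear relationship; departures from the linear fit signal curvature in the data.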

 

 

Figure C.4. Change in Out-Migrants in Village Labor Force
[Figure: change in number of out-migrants in village workforce (vertical axis) against years since ID cards issued, from -5 to 10 (horizontal axis).]

 

[One rotated table page preceding Table C.2 is illegible in the source scan.]

Table C.2. Factors Determining the Size of the Village Migrant Network
First-Stage Regressions
Dependent Variable: Number of Migrants

                                         Odd Years from 1989 to 2001                Years from 1995 to 2001
Model                                     (1)          (2)          (3)          (4)          (5)          (6)
Household Population                  -1.709**     -1.763**     -1.785**    -2.335***    -2.338***    -2.341***
                                      (0.867)      (0.862)      (0.862)      (0.656)      (0.655)      (0.655)
Number of Working-Age Laborers         2.516**      2.300**      2.370**     4.348***     4.269***     4.284***
  in the Household                    (1.046)      (1.037)      (1.037)      (0.773)      (0.771)      (0.770)
Land Per Capita                           [?]      -3.719***    -4.752***  10.53[?]***  14.43[?]***   15.540***
                                      (2.666)      (1.337)      (1.414)      (1.503)      (1.825)      (2.067)
Average Years of Education             -0.293       -0.267       -0.272       -0.190       -0.193       -0.206
                                      (0.345)      (0.343)      (0.343)      (0.270)      (0.270)      (0.270)
Female Share of the Household          -0.410       -0.564       -0.707        1.805        2.023        1.999
                                      (3.139)      (3.123)      (3.126)      (2.925)      (2.923)      (2.924)
(Years-Since-ID-Issued)                 2.020***    -3.899***     4.396***    -0.169       -2.513***     3.925***
  × (Land Per Capita < .2)            (0.421)      (0.462)      (0.823)      (0.273)      (0.535)      (0.802)
(Years-Since-ID-Issued)²               -0.100***     0.772***     1.779***    -0.093***     0.246***     0.633***
  × (Land Per Capita < .2)            (0.020)      (0.081)      (0.241)      (0.015)      (0.071)      (0.167)
(Years-Since-ID-Issued)³                             0.034***     0.123***                 -0.014***     0.050***
  × (Land Per Capita < .2)                         (0.003)      (0.022)                   (0.003)      (0.015)
(Years-Since-ID-Issued)⁴                                          0.003***                               0.001***
  × (Land Per Capita < .2)                                      (0.001)                                (0.000)
Observations                            25692        25692        25692        22812        22812        22812
R-squared                                0.09         0.10         0.10         0.22         0.22         0.22
F-Statistic on IVs with Averages        11.80        23.10        34.21        55.84        37.59        29.68
F-Statistic on IVs w/o Averages         12.63        40.28        39.82       109.40        71.81        54.65
Partial R², IVs with Averages           0.003        0.009        0.015        0.007        0.007        0.008
Partial R², IVs w/o Averages            0.000        0.001        0.002        0.006        0.006        0.006

Notes: Fully robust standard errors are shown in parentheses [*** p<0.01, ** p<0.05, * p<0.1].
All regressions include time averages of the explanatory variables and year dummies.
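
Table C.2 reports, for each first-stage regression, an F statistic and a partial R² for the excluded instruments. One standard way to compute both is sketched below: compare the residual sum of squares of the first stage with and without the instruments. By the Frisch-Waugh-Lovell theorem, the resulting partial R² equals the R² from regressing the partialled-out regressor on the partialled-out instruments. This sketch computes the classical (homoskedastic) F. Because the table's standard errors are fully robust, its F statistics may instead be robust Wald versions, and the "with averages" and "w/o averages" rows presumably differ in the instrument set; both points are assumptions. The function names and data are hypothetical.

```python
import numpy as np


def resid(y, X):
    """Residuals from an OLS regression of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b


def first_stage_strength(x, W, Z):
    """Partial R^2 and classical F statistic for the excluded instruments Z.

    x: endogenous regressor; W: included exogenous regressors (with a
    constant); Z: excluded instruments, shape (n, q).
    """
    n, q = Z.shape
    k = W.shape[1] + q                          # regressors in the full first stage
    rss_r = np.sum(resid(x, W) ** 2)            # restricted: instruments dropped
    rss_u = np.sum(resid(x, np.column_stack([W, Z])) ** 2)
    partial_r2 = (rss_r - rss_u) / rss_r
    f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k))
    return partial_r2, f_stat


# Hypothetical data with deliberately weak instruments.
rng = np.random.default_rng(0)
n = 5000
W = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))
x = W @ np.array([1.0, 0.5]) + Z @ np.array([0.1, 0.05]) + rng.normal(size=n)
partial_r2, f_stat = first_stage_strength(x, W, Z)
print(f"partial R^2 = {partial_r2:.4f}, F = {f_stat:.2f}")
```

The two statistics are linked by F = [R²_p / (1 - R²_p)] · (n - k) / q, so with tens of thousands of observations even a partial R² well below 0.01 can yield a sizable F.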

[Three rotated table pages following Table C.2 are illegible in the source scan.]

BIBLIOGRAPHY

Ahn, S. C., Y. H. Lee, and P. Schmidt. (2001). GMM Estimation of Linear Panel
Data Models with Time-varying Individual Effects. Journal of Econometrics
101, 219-255.

Angrist, J. D. (1991). Instrumental Variables Estimation of Average Treatment
Effects in Econometrics and Epidemiology. National Bureau of Economic
Research Technical Working Paper Number 115.

Bane, M. J. and D. T. Ellwood. (1986). Slipping into and out of Poverty: The
Dynamics of Spells. Journal of Human Resources 21(1), 1-23.

Benjamin, D., L. Brandt, and J. Giles. (2005). The Evolution of Income Inequality
in Rural China. Economic Development and Cultural Change 53(4), 769-
824.

Cai, F., A. Park, and Y. Zhao. (2007). The Chinese Labor Market, chapter
prepared for China's Great Economic Transition, Loren Brandt and
Thomas Rawski (eds.), Cambridge University Press (in press).

Cappellari, L. (1999). Minimum Distance Estimation of Covariance Structures,
5th UK Meeting of Stata Users.

Card, D. (2001). Estimating the Return to Schooling: Progress on Some
Persistent Econometric Problems. Econometrica 69, 1127-1160.

Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data. Review of
Economic Studies 47, 225-238.

Chamberlain, G. (1984). Panel Data, in Handbook of Econometrics Volume 2, Z.
Griliches and M. D. Intriligator (Eds.). Amsterdam: North Holland,
1247-1318.

Chan, K. W. and L. Zhang. (1999). The Hukou System and Rural-Urban
Migration in China: Processes and Changes. China Quarterly 160, 818-855.

Chay, K. Y. and D. R. Hyslop. (2000). Identiﬁcation and Estimation of Dynamic
Binary Response Models: Empirical Evidence Using Alternative
Approaches, mimeo.

Chen, S. and M. Ravallion. (1996). Data in Transition: Assessing Rural Living
Standards in Southern China. China Economic Review 7(1), 23-56.


 

 

 

Cornwell, C., P. Schmidt, and R. C. Sickles. (1990). Production Frontiers with
Cross-sectional and Time-series Variation in Efficiency Levels. Journal of
Econometrics 46, 185-200.

de Brauw, A. and J. Giles. (2007). Migrant Labor Markets and the Welfare of
Rural Households in the Developing World: Evidence from China.
Michigan State University, Department of Economics, Mimeo.

Du, Y., A. Park, and S. Wang. (2005). Is Migration Helping China's Poor? Journal
of Comparative Economics 33(4), 688-709.

Garen, J. (1984). The Returns to Schooling: A Selectivity Bias Approach with a
Continuous Choice Variable. Econometrica 52, 1199-1218.

Giles, J. (2006). Is Life More Risky in the Open? Household Risk-Coping and the
Opening of China's Labor Markets. Journal of Development Economics
81(1), 25-60.

Giles, J. and K. Yoo. (2006). Precautionary Behavior, Migrant Networks and
Household Consumption Decisions: An Empirical Analysis Using
Household Panel Data from Rural China. Review of Economics and
Statistics (in press).

Hahn, J. and G. Kuersteiner. (2002). Asymptotically Unbiased Inference for a
Dynamic Panel Model with Fixed Effects When Both n and T Are Large.
Econometrica 70, 1639-1657.

Hall, R. E. and C. I. Jones. (1999). Why Do Some Countries Produce So Much
More Output per Worker than Others? Quarterly Journal of Economics
114, 83-116.

Heckman, J. J. (1981). The Incidental Parameters Problem and the Problem of
Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic
Process, in C. F. Manski and D. McFadden (Eds.), Structural Analysis of
Discrete Data with Econometric Applications. MIT Press, Cambridge, MA,
179-195.

Heckman, J. J. (1997). Instrumental Variables: A Study of Implicit Behavioral
Assumptions Used in Making Program Evaluations. Journal of Human
Resources 32, 441-462.

Heckman, J. J. and E. Vytlacil. (1998). Instrumental Variables Methods for the
Correlated Random Coefficient Model. Journal of Human Resources 33,
974-987.


 

Heckman, J. J. and E. Vytlacil. (2005). Structural Equations, Treatment Effects,
and Econometric Policy Evaluation. Econometrica 73, 669-738.

Holzer, H., R. Block, M. Cheatham, and J. Knott. (1993). Are Training Subsidies
for Firms Effective? The Michigan Experience. Industrial and Labor
Relations Review 46, 625-636.

Honoré, B. E. and E. Kyriazidou. (2000). Panel Data Discrete Choice Models
with Lagged Dependent Variables. Econometrica 68, 839-874.

Hyslop, D. R. (1999). State Dependence, Serial Correlation and Heterogeneity in
Intertemporal Labor Force Participation of Married Women. Econometrica
67(6), 1255-1294.

Imbens, G. and J. D. Angrist. (1994). Identiﬁcation and Estimation of Local
Average Treatment Effects. Econometrica 62, 467-476.

Jalan, J. and M. Ravallion. (1998). Transient Poverty in Post-Reform Rural
China. Journal of Comparative Economics 26(2), 338-357.

Jalan, J. and M. Ravallion. (2002). Geographic Poverty Traps? A Micro Model of
Consumption Growth in Rural China. Journal of Applied Econometrics
17(4), 329-46.

Liang, Z. and Z. Ma. (2004). China's Floating Population: New Evidence from the
2000 Census. Population and Development Review 30(3), 467-488.

Mallee, H. (1995). China's Household Registration System Under Reform.
Development and Change 26(1), 1-29.

Meng, X. (2000). Regional Wage Gap, Information Flow, and Rural-urban
Migration, in Y. Zhao and L. West (eds.), Rural Labor Flows in China.
Berkeley: University of California Press, 251-277.

Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data.
Econometrica 46, 69-85.

Murtazashvili, I. (2006). A Control Function Approach to Estimation of Correlated
Random Coefﬁcient Panel Data Models. Michigan State University
Department of Economics, Mimeo.

Murtazashvili, I. and J. M. Wooldridge. (2005). Fixed Effects Instrumental
Variables Estimation in Correlated Random Coefficient Panel Data
Models. Michigan State University Department of Economics, Mimeo.


 

 

 

Ravallion, M. and S. Chen. (2007). China's (Uneven) Progress Against Poverty.
Journal of Development Economics 82(1), 1-42.

Rivers, D. and Q. H. Vuong. (1988). Limited Information Estimators and
Exogeneity Tests for Simultaneous Probit Models. Journal of
Econometrics 39, 347-366.

Rozelle, S., L. Guo, M. Shen, A. Hughart and J. Giles. (1999). Leaving China's
Farms: Survey Results of New Paths and Remaining Hurdles to Rural
Migration. China Quarterly 158, 367-393.

Semykina, A. and J. M. Wooldridge. (2005). Estimating Panel Data Models in the
Presence of Endogeneity and Selection: Theory and Application. Michigan
State University Department of Economics, Mimeo.

Smith, R. and R. Blundell. (1986). An Exogeneity Test for a Simultaneous
Equation Tobit Model with an Application to Labor Supply. Econometrica
54, 679-685.

Taylor, J. E., S. Rozelle, and A. de Brauw. (2003). Migration and Incomes in
Source Communities: A New Economics of Migration Perspective from
China. Economic Development and Cultural Change 52(1), 75-101.

Wooldridge, J. M. (1997). On Two Stage Least Squares Estimation of the
Average Treatment Effect in a Random Coefﬁcient Model. Economics
Letters 56, 129-133.

Wooldridge, J. M. (2000). A Framework for Estimating Dynamic, Unobserved
Effects Panel Data Models with Possible Feedback to Future Explanatory
Variables. Economics Letters 68, 245-250.

Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data
(MIT Press, Cambridge, MA).

Wooldridge, J. M. (2003). Further Results on Instrumental Variables Estimation
of the Average Treatment Effect in the Correlated Random Coefficient
Model. Economics Letters 79, 185-191.

Wooldridge, J. M. (2005a). Fixed Effects and Related Estimators in Correlated
Random Coefﬁcient and Treatment Effect Panel Data Models. Review of
Economics and Statistics 87, 385-390.

Wooldridge, J. M. (2005b). Unobserved Heterogeneity and Estimation of Average
Partial Effects, in Identification and Inference for Econometric Models: A
Festschrift in Honor of Thomas J. Rothenberg, D. W. K. Andrews and
J. H. Stock (eds.). Cambridge: Cambridge University Press, 27-55.


 

 

 

Wooldridge, J. M. (2005c). Simple Solutions to the Initial Conditions Problem in
Dynamic, Nonlinear Panel Data Models with Unobserved Heterogeneity.
Journal of Applied Econometrics 20, 39-54.

Zhao, Y. (2003). The Role of Migrant Networks in Labor Migration: The Case of
China. Contemporary Economic Policy 21(4), 500-511.

 

 
