Michigan State University

This is to certify that the dissertation entitled PANEL DATA MODELS WITH UNOBSERVED EFFECTS AND ENDOGENOUS EXPLANATORY VARIABLES presented by IRINA MURTAZASHVILI has been accepted towards fulfillment of the requirements for the Ph.D. degree in Economics.

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Economics, 2007

ABSTRACT

This dissertation consists of three essays that address issues of estimation in panel data models with unobserved effects and endogenous explanatory variables. The first essay considers estimation of correlated random coefficient (CRC) panel data models with endogenous regressors. This chapter provides a set of conditions sufficient for consistency of a general class of fixed effects instrumental variables (FE-IV) estimators in the context of a CRC panel data model. The usual FE-IV estimator turns out to be fairly robust to the presence of neglected individual-specific slopes.
Monte Carlo simulations suggest that the proposed FE-IV estimator of the population averaged effect (PAE), provided a full set of period dummy variables is included, performs better than other estimators in finite samples for the case of (roughly) continuous endogenous explanatory variables.

The second essay continues studying a CRC panel data model from the first chapter but, in addition to allowing some explanatory variables to be correlated with the idiosyncratic error, the joint distribution of the endogenous regressors and the individual heterogeneity conditional on the instruments is allowed to depend on the instruments. The second essay uses a two-step control function approach to account for endogeneity and to consistently estimate average partial effects (APEs) in CRC panel data models with endogenous, roughly continuous regressors. The simulation findings indicate that in finite samples the control function approach to estimating the CRC balanced panel data model with time-constant individual heterogeneity performs better than other estimators under the considered conditions. The proposed method is applied to the problem of estimating the APEs of annual hours of on-the-job training on output scrap rates for manufacturing firms in Michigan.

In the third essay, a dynamic binary response panel data model that allows for an endogenous regressor is developed. This estimation approach is of particular value for settings in which one wants to estimate the effects of a treatment which is also endogenous. This model is applied to examine the impact of rural-urban migration on the likelihood that households in rural China fall below the poverty line. The empirical results suggest that migration is important for reducing the likelihood that poor households remain in poverty and that non-poor households fall into poverty.
Further, failure to control for unobserved heterogeneity leads to an overestimate of the impact of migrant labor markets on the probability of staying poor among those who lived below the poverty line.

Copyright by Irina Murtazashvili 2007

ACKNOWLEDGMENTS

I would like to express the deepest appreciation to my adviser, Professor Jeffrey Wooldridge, for his generous advice and support. Without his guidance and help this dissertation would not have been possible. I am very grateful for the assistance and advice I received from Professor John Giles, who also kindly provided me the data for one of the applications. I wish to thank my committee members, Professor Peter Schmidt, Professor Ana Maria Herrera, and Professor David Tschirley, for valuable comments and fruitful discussions. I also thank other faculty members and doctoral students of the Department of Economics at Michigan State University for support during my graduate studies.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 FIXED EFFECTS INSTRUMENTAL VARIABLES ESTIMATION IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS
1.1 Introduction
1.2 Model Specification and Previous Results
1.3 Conditions for Consistent FE-IV Estimation
1.4 Examples
1.5 Finite Sample Behavior of the FE-IV Estimator
1.6 Conclusion
2 A CONTROL FUNCTION APPROACH TO ESTIMATION OF CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS
2.1 Introduction
2.2 Model of Interest for Balanced Panels
2.3 Estimating Procedure and Calculation of Standard Errors
2.4 Finite Sample Behavior of the Control Function Estimator
2.5 Empirical Application to Effects of Job Training on Worker Productivity
2.6 Conclusion
3 ESTIMATION OF A DYNAMIC BINARY RESPONSE PANEL DATA MODEL WITH AN ENDOGENOUS REGRESSOR, WITH AN APPLICATION TO THE ANALYSIS OF POVERTY PERSISTENCE IN RURAL CHINA
3.1 Introduction
3.2 Estimation of a Dynamic Binary Response Panel Data Model with an Endogenous Regressor
3.2.1 Dynamic Binary Response Panel Data Models
3.2.2 A General Approach to Estimation
3.2.3 Allowing for Serial Correlation of Errors in the First Stage
3.2.4 Calculation of Average Partial Effects
3.3 Migrant Labor Markets and Poverty Persistence in Rural China
3.3.1 Rural-Urban Migration in China
3.3.2 The RCRE Household Survey
3.3.3 Migration, Consumption Growth and Poverty
3.3.4 Estimating the Impact of Migrant Labor Markets on Poverty Persistence
3.3.5 Identifying the Migrant Network
3.4 Results
3.5 Conclusions
APPENDICES
A Tables for Chapter 1
B Tables for Chapter 2
C Tables and Figures for Chapter 3
BIBLIOGRAPHY

LIST OF TABLES

A.1 Usual Unobserved Effects CRC Model for $\beta = 2$ and $T = 5$
A.2 Usual Unobserved Effects CRC Model for $\beta = 2$ and $T = 10$
A.3 Random Trend CRC Model for $\beta = 2$ and $T = 5$
A.4 Random Trend CRC Model for $\beta = 2$ and $T = 10$
B.1 Usual Unobserved Effect CRC Model for Continuous $y_{2it}$
B.2 Random Trend CRC Model for Continuous $y_{2it}$
B.3 Usual Unobserved Effect CRC Model for $y_{2it} \in (0,1)$
B.4 Random Trend CRC Model for $y_{2it} \in (0,1)$
B.5 Standard Errors for the Control Function Approach
B.6 Summary Statistics from Unbalanced and Balanced Datasets
B.7 POLS Estimates of the First Stage Regressions
B.8 FE-IV and CF Estimates of the Second Stage Regressions
B.9 Summary Statistics for the Control Variables
C.1 Household and Village Characteristics
C.2 Factors Determining the Size of the Village Migrant Network
C.3 CF Approach to Estimating Determinants of Poverty Status
C.4 Linear Probability Model for Determinants of Poverty Status
C.5 Average Partial Effects of Determinants of Poverty Status

LIST OF FIGURES

C.1 Share of Village Labor Force Employed as Migrants by Year
C.2 Village Consumption Growth
C.3 Change in Poverty Headcount
C.4 Change in Out-Migrants in Village Labor Force

CHAPTER 1

FIXED EFFECTS INSTRUMENTAL VARIABLES ESTIMATION IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS

1.1 Introduction

In both cross section and panel data settings, there is substantial interest in estimating population averaged effects (PAEs), including average treatment effects (ATEs), in the correlated random coefficient (CRC) model. Models with both exogenous explanatory variables and endogenous regressors have been investigated in recent years. Angrist (1991) discusses the conditions for consistency of ATE estimates in models with binary endogenous variables and no exogenous covariates. A set of sufficient assumptions required for consistent ATE estimates with (roughly) continuous endogenous regressors in a CRC model can be found in Wooldridge (2003). Both papers study estimation with random sampling from a cross section.

The possibility that treatment effects might depend on individual-specific heterogeneity motivated Imbens and Angrist (1994) to introduce the "local average treatment effect" (LATE) as an evaluation parameter, which provides a useful interpretation of the instrumental variables estimator when the effect of a binary treatment varies across units.
That emphasis on LATE led to a reinterpretation of IV estimates in many empirical applications, and spurred a great deal of research on interpreting IV estimators in a variety of contexts. Heckman and Vytlacil (2005) provide a recent unification, including a discussion of whether we should be interested in parameters such as LATE.

The understanding that IV generally consistently estimates LATE in simple settings is useful, but often we are interested in estimating the expected effect for a randomly drawn unit from the underlying population. Moreover, strict interpretation of LATE as the average treatment effect among units induced into treatment by the switching of an instrumental variable (such as program eligibility) is limited to special cases. Here we study estimation of population average effects, or average treatment effects, in a general panel data model with heterogeneous slopes. By estimating population average effects we can easily estimate the aggregate effects of various policies, such as increasing the amount of job training among the population of manufacturing workers.

Wooldridge (2005a) studied general fixed effects estimators with strictly exogenous regressors in the CRC model with panel data, and derived conditions under which generalized fixed effects estimators (generalized in the sense that they sweep away unit-specific trends) are consistent for the population averaged effect. In this paper, we study the model in Wooldridge (2005a) but, in addition to allowing correlation between the instruments and the unobserved heterogeneity, we allow some explanatory variables to be correlated with the idiosyncratic error. The main result is a set of sufficient conditions under which fixed effects instrumental variables (FE-IV) estimators consistently estimate the population averaged effect, even when the individual-specific slopes are ignored.
The results include the commonly used fixed effects two stage least squares estimator (FE-2SLS) as a special case, but also more general FE-IV estimators that sweep away individual-specific time trends. The conditions are most likely to apply when the endogenous explanatory variables are at least roughly continuous, as in Wooldridge (2003) for the cross-sectional case.

The remainder of the paper is organized as follows. In Section 1.2 we introduce the model and briefly review existing results. Section 1.3 contains the main consistency result, and Section 1.4 covers examples where the conditions will, and will not, hold. Section 1.5 contains a Monte Carlo study that shows how the FE-IV estimator, with a full set of time period dummies, outperforms its obvious competitors. The simulation results support the results in Sections 1.3 and 1.4. Section 1.6 contains a brief conclusion.

1.2 Model Specification and Previous Results

The model of interest is a CRC model studied in Wooldridge (2005a). For a random draw $i$ from the population, the model is

$$y_{it} = w_t a_i + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{1.1}$$

where $y_{it}$ is a dependent variable, $w_t$ is a $1 \times J$ vector of aggregate time variables, which we treat as nonrandom, $a_i$ is a $J \times 1$ vector of individual-specific slopes on the aggregate variables, $x_{it}$ is a $1 \times K$ vector of endogenous covariates that change across time, $b_i$ is a $K \times 1$ vector of individual-specific slopes, and $u_{it}$ is an idiosyncratic error. As discussed in Wooldridge (2005a), we require $J < T$. So, if we have two time periods, we can only allow a scalar individual-specific intercept, $a_i$. If $T = 3$, we can allow individual-specific linear trends, too. Higher order trend terms are allowed as $T$ increases.
Equation (1.1) is a correlated random coefficients model when the individual-specific slopes, $b_i$ (as well as the elements in $a_i$), are allowed to be correlated with $x_{it}$. For example, a simple CRC wage equation might look like

$$\log(wage_{it}) = a_{i1} + a_{i2} t + b_{i1}\, training_{it} + b_{i2}\, union_{it} + b_{i3}\, married_{it} + u_{it}, \tag{1.2}$$

where, in addition to the standard level effect $a_{i1}$, each individual is allowed to have his or her own unobserved growth in wages, $a_{i2}$. In addition, the time-varying explanatory variables have individual-specific returns. The variable $training$ might be hours spent in job training, and the CRC model allows the return to training to be individual-specific and correlated with the amount of training, as a standard model of human capital accumulation would suggest.

Wooldridge (2005a) studied the consistency of fixed effects estimators of (1.1) that sweep out the $a_i$ but act as if $b_i = \beta$ for all $i$. To describe Wooldridge's main result, and the extension here, write $b_i = \beta + d_i$, and substitute into (1.1):

$$y_{it} = w_t a_i + x_{it}\beta + (x_{it} d_i + u_{it}) \equiv w_t a_i + x_{it}\beta + v_{it}, \tag{1.3}$$

where $v_{it} \equiv x_{it} d_i + u_{it}$. We eliminate $a_i$ by regressing, for each $i$, $y_{it}$ on $w_t$, $t = 1, \ldots, T$, and $x_{it}$ on $w_t$, $t = 1, \ldots, T$, and keeping the residuals, $\ddot{y}_{it}$ and $\ddot{x}_{it}$, respectively. This gives the equations

$$\ddot{y}_{it} = \ddot{x}_{it} b_i + \ddot{u}_{it} = \ddot{x}_{it}\beta + (\ddot{x}_{it} d_i + \ddot{u}_{it}) \equiv \ddot{x}_{it}\beta + \ddot{v}_{it}, \quad t = 1, \ldots, T. \tag{1.4}$$

The fixed effects estimator studied by Wooldridge (2005a) is just the pooled OLS estimator from (1.4). We control the amount of individual-specific detrending by choosing $w_t$ appropriately.

An assumption used by Wooldridge (2005a) is the standard strict exogeneity assumption conditional on $(a_i, b_i)$:

$$E(u_{it} \mid x_{i1}, \ldots, x_{iT}, a_i, b_i) = 0, \quad t = 1, \ldots, T. \tag{1.5}$$

Using a simple iterated expectations argument, Wooldridge shows that, under the additional assumption

$$E(b_i \mid \ddot{x}_{it}) = E(b_i), \quad t = 1, \ldots, T, \tag{1.6}$$

the fixed effects estimator is consistent for the population averaged effect, $\beta$.
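The unit-specific detrending behind (1.4) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the procedure used for the results in this chapter: the panel is taken to be balanced, and the function names `annihilator`, `detrend`, and `generalized_fe` are hypothetical. Because $w_t$ is common to all units, a single residual-maker matrix $M = I_T - W(W'W)^{-1}W'$, where $W$ stacks $w_1, \ldots, w_T$, produces the residuals for every $i$.

```python
import numpy as np

def annihilator(w):
    """Residual-maker M = I - w (w'w)^{-1} w' for a (T, J) matrix w with J < T."""
    T, J = w.shape
    if J >= T:
        raise ValueError("the model requires J < T")
    return np.eye(T) - w @ np.linalg.solve(w.T @ w, w.T)

def detrend(v, w):
    """Unit-by-unit residuals from regressing v on w.

    v has shape (N, T) for a scalar series or (N, T, K) for K covariates.
    """
    v = np.asarray(v, dtype=float)
    M = annihilator(w)
    if v.ndim == 2:
        return v @ M      # M is symmetric, so each row becomes (M v_i)'
    return M @ v          # matmul broadcasts M over the unit dimension

def generalized_fe(y, x, w):
    """Pooled OLS of detrended y on detrended x, acting as if b_i = beta."""
    yd = detrend(y, w).reshape(-1)
    xd = detrend(x, w).reshape(-1, x.shape[-1])
    beta, *_ = np.linalg.lstsq(xd, yd, rcond=None)
    return beta
```

Choosing `w` as a column of ones gives the usual within transformation; adding a column $t = 1, \ldots, T$ also removes unit-specific linear trends, at the cost of less remaining variation in the detrended covariates.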
Consistency of the usual FE estimator relies heavily on assumption (1.5), which rules out traditional simultaneity, time-varying measurement error, correlation between time-varying omitted factors (in $u_{it}$) and the elements of $x_{it}$, and models with lagged dependent variables or other kinds of regressors where changes in $u_{it}$ may feed back into changes in $x_{i,t+h}$ for $h \geq 1$. In the case where $b_i = \beta$, methods that first eliminate $a_i$ and then apply instrumental variables (usually 2SLS) have become a standard tool for the applied economist. Here, we study such estimators but allow for individual-specific slopes, $b_i$.

Let $z_{it}$ be a $1 \times L$ vector of instrumental variables, with $L \geq K$. Let $\ddot{z}_{it}$ be the "detrended" instruments from the individual-specific regressions of $z_{it}$ on $w_t$, $t = 1, \ldots, T$. Then we can estimate (1.4) using instruments $\ddot{z}_{it}$ for unit $i$ in time period $t$. Whether we just use pooled 2SLS (the estimator we focus on here) or a more sophisticated generalized method of moments (GMM) estimator, the moment conditions we use are

$$E(\ddot{z}_{it}' \ddot{v}_{it}) = 0, \quad t = 1, \ldots, T. \tag{1.7}$$

In the next section, we study consistency of the FE-2SLS estimator under conditions that relax those in Wooldridge (2005a).

1.3 Conditions for Consistent FE-IV Estimation

In order to ensure that (1.7) holds, we place conditions separately on the relationship between the instruments and idiosyncratic errors and the instruments and the unobserved effects. In addition, of course, there is always a standard rank condition.

ASSUMPTION 1: With the definitions in Section 1.2,

$$E(u_{it} \mid z_{i1}, z_{i2}, \ldots, z_{iT}) = 0, \quad t = 1, \ldots, T. \tag{1.8}$$

Assumption 1 is stronger than we need (as will be clear, $E(\ddot{z}_{it}' \ddot{u}_{it}) = 0$, $t = 1, \ldots, T$, would suffice) but (1.8) is a natural strict exogeneity assumption on the instruments. Assumption 1 is common in simultaneous equations models with panel data, as well as models with other kinds of endogeneity that induce correlation between $x_{it}$ and $u_{it}$, such as omitted variables and measurement error.
Assumption 1 rules out lagged dependent variables among the instruments, as well as other non-strictly exogenous instruments, and so its application to dynamic models is limited unless sufficient strictly exogenous instruments are available. When $z_{it} = x_{it}$, so that the covariates are strictly exogenous, Wooldridge (2005a) included $a_i$ and $b_i$ in the conditioning set, as in (1.5). When the unit-specific trend function is correctly specified, this stronger form of the assumption is essentially harmless.

The second component of the error term in (1.4) is $\ddot{x}_{it} d_i$, and we need assumptions such that $\ddot{z}_{it}$ is uncorrelated with $\ddot{x}_{it} d_i$. This requires some care because $\ddot{x}_{it}$ contains endogenous elements. (That is, we allow components of $x_{it}$ to be endogenous even after removing unit-specific intercepts and trends.) The first assumption mimics the key assumption from Wooldridge (2005a), except that we replace the covariates with the instruments:

ASSUMPTION 2: $b_i$ is mean independent of all the unit-specific "detrended" $\ddot{z}_{it}$, that is,

$$E(b_i \mid \ddot{z}_{it}) = E(b_i) = \beta, \quad t = 1, \ldots, T. \tag{1.9}$$

Because the $\ddot{z}_{it}$ are net either of a time average or, more generally, level and trend effects, Assumption 2 maintains mean independence of the heterogeneous slopes and deviations of the instruments from long-run levels or trends. Of course, in the case where the instruments are assumed, in each time period, to be independent of all heterogeneity, Assumption 2 automatically holds. Assumption 2 is practically much weaker than full independence because it allows $b_i$ to be arbitrarily correlated with systematic components of $z_{it}$; we cover some examples in Section 1.4. [Wooldridge (2005a) contains a discussion for the case of strictly exogenous $x_{it}$.]

Generally, the richer is $w_t$, the more likely (1.9) is to hold. For example, the usual FE-IV estimator takes out time averages from the instruments, and this might not be enough to ensure (1.9) if the instruments are trending differently across units $i$.
On the other hand, adding more aggregate factors to $w_t$ reduces the variation in $\ddot{z}_{it}$, generally leading to less efficient IV estimators. Not surprisingly, in deciding what to include in $w_t$ we confront the usual tradeoff between efficiency and consistency.

Unfortunately, Assumptions 1 and 2 are not enough to conclude that the IV estimator is consistent. Instead, we employ a constant conditional covariance assumption.

ASSUMPTION 3: For $j = 1, \ldots, K$,

$$\mathrm{Cov}(\ddot{x}_{itj}, b_{ij} \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{x}_{itj}, b_{ij}), \quad t = 1, \ldots, T. \tag{1.10}$$

Importantly, (1.10) allows the detrended covariates and the random coefficient to be correlated, and the covariance may change over time; in fact, there is no restriction on the temporal pattern of $\mathrm{Cov}(\ddot{x}_{itj}, b_{ij})$. But the covariance conditional on the detrended IVs is assumed not to depend on $\ddot{z}_{it}$. [In any case, the covariances $\mathrm{Cov}(\ddot{x}_{itj}, b_{ij})$ do not depend on $i$ because of random sampling in the cross-sectional dimension. As we are conditioning only on $\ddot{z}_{it}$, the restriction is that the covariance conditional on $\ddot{z}_{it}$ does not depend on $\ddot{z}_{it}$; we have no need to place restrictions on other conditional covariances.]

Assumption 3 extends to the panel data case a condition used by Wooldridge (2003) for the pure cross-sectional case. An important difference is that Assumption 3 applies to the detrended covariates and instruments. Importantly, we allow the unconditional covariances to change arbitrarily over time. Of course, if $b_{ij} = \beta_j$ for all $i$, then (1.10) is trivially true because both sides are zero.

Assumptions 1 through 3 imply that the key orthogonality conditions (1.7) hold, and these conditions can be used in a generalized method of moments framework. For simplicity, we focus here on the fixed effects two stage least squares estimator, FE-2SLS [interpreted in the general sense of eliminating $a_i$ from (1.1)]. To ensure consistency of the FE-2SLS estimator we add a standard rank condition.
ASSUMPTION 4: (i) $\mathrm{rank}\left(\sum_{t=1}^{T} E(\ddot{z}_{it}' \ddot{x}_{it})\right) = K$; (ii) $\mathrm{rank}\left(\sum_{t=1}^{T} E(\ddot{z}_{it}' \ddot{z}_{it})\right) = L$.

Practically speaking, the first part of Assumption 4 is most important; it means that, after netting out individual-specific trends, there is still sufficient correlation between the instruments and regressors. Part (ii) requires sufficient variation in the "detrended" instruments. It would be violated if, say, we specify $w_t = (1, t)$ and $z_{it}$ contains an element that is constant across $t$ for all $i$ (such as gender) or changes by the same value in each time period (such as a person's age when the length of the sampling period is constant).

PROPOSITION 1: Under Assumptions 1 to 4, the FE-IV estimator is consistent for $\beta$, provided a full set of time period dummies is included in (1.4).

PROOF: Under Assumption 2 [equation (1.9)], $E(d_{ij} \mid \ddot{z}_{it}) = 0$, $j = 1, \ldots, K$, for all $t$, and so

$$E(\ddot{x}_{itj} d_{ij} \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{x}_{itj}, d_{ij} \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{x}_{itj}, b_{ij} \mid \ddot{z}_{it}).$$

But by Assumption 3, the conditional covariances equal the corresponding unconditional covariances, say $\gamma_{tj}$, and so

$$E(\ddot{x}_{itj} d_{ij} \mid \ddot{z}_{it}) = \gamma_{tj}, \quad j = 1, \ldots, K, \ t = 1, \ldots, T.$$

Since $\ddot{x}_{it} d_i = \ddot{x}_{it1} d_{i1} + \ddot{x}_{it2} d_{i2} + \cdots + \ddot{x}_{itK} d_{iK}$, we have shown that

$$E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = \gamma_{t1} + \cdots + \gamma_{tK} \equiv \theta_t.$$

Therefore, we can write $\ddot{x}_{it} d_i = \theta_t + r_{it}$, where $E(r_{it} \mid \ddot{z}_{it}) = 0$, $t = 1, \ldots, T$. Now we plug this expression for $\ddot{x}_{it} d_i$ into equation (1.4):

$$\ddot{y}_{it} = \theta_t + \ddot{x}_{it}\beta + (r_{it} + \ddot{u}_{it}), \quad t = 1, \ldots, T. \tag{1.11}$$

As we have just shown, Assumptions 2 and 3 imply that $E(r_{it} \mid \ddot{z}_{it}) = 0$. Assumption 1 implies that $E(\ddot{u}_{it} \mid \ddot{z}_{it}) = 0$. Thus, the composite error in (1.11) satisfies $E(r_{it} + \ddot{u}_{it} \mid \ddot{z}_{it}) = 0$, $t = 1, \ldots, T$, and so any IV method that uses instruments $\ddot{z}_{it}$ at time $t$ consistently estimates $\beta$. In particular, under the rank condition in Assumption 4, and standard finite moment conditions, the FE-2SLS estimator is consistent and $\sqrt{N}$-asymptotically normal. This completes the proof.
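To make the estimating equation (1.11) concrete, the following is a minimal sketch of pooled 2SLS on detrended data, with the period intercepts $\theta_t$ handled by a full set of time dummies that appear both as regressors and as their own instruments. It assumes a balanced panel stored as NumPy arrays; the names `detrend` and `fe_2sls` are hypothetical and the code is not taken from the dissertation. It returns point estimates only, so the fully robust variance matrix needed for inference is not computed here.

```python
import numpy as np

def detrend(v, w):
    """Unit-by-unit residuals from regressing v, of shape (N, T) or (N, T, K), on w (T, J)."""
    T = w.shape[0]
    M = np.eye(T) - w @ np.linalg.solve(w.T @ w, w.T)
    v = np.asarray(v, dtype=float)
    return v @ M if v.ndim == 2 else M @ v

def fe_2sls(y, x, z, w, time_dummies=True):
    """Pooled 2SLS of detrended y on detrended x using detrended z as instruments.

    y: (N, T); x: (N, T, K); z: (N, T, L) with L >= K; w: (T, J) with J < T.
    Returns the K slope estimates; dummy coefficients are discarded.
    """
    N, T = y.shape
    K = x.shape[-1]
    yd = detrend(y, w).reshape(N * T)
    xd = detrend(x, w).reshape(N * T, K)
    zd = detrend(z, w).reshape(N * T, -1)
    if time_dummies:
        # Rows are ordered unit by unit, so the dummy block repeats I_T N times.
        D = np.tile(np.eye(T), (N, 1))
        X, Z = np.hstack([xd, D]), np.hstack([zd, D])
    else:
        X, Z = xd, zd
    # First stage: project every regressor on the instrument set.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: OLS of y on the fitted regressors.
    coef = np.linalg.lstsq(X_hat, yd, rcond=None)[0]
    return coef[:K]
```

Setting `time_dummies=False` drops $\theta_t$ from the equation, which is the case in which neglected random slopes can make the estimator inconsistent.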
Proposition 1 contains an important empirical lesson: unless there are very good reasons to the contrary, one should include a full set of time effects in a fixed effects IV analysis. Even if the model does not originally contain separate time period intercepts (itself a questionable premise), the estimating equation generally should if one wants to allow correlated random slope coefficients.

Because the error term in (1.11), $r_{it} + \ddot{u}_{it}$, is generally heteroskedastic and serially correlated (at a minimum due to the presence of $\ddot{x}_{it} d_i$), inference should be carried out using a fully robust variance matrix for $\hat{\beta}$. Typically this is straightforward for pooled 2SLS where all instruments have been detrended prior to estimation.

1.4 Examples

To see how Proposition 1 applies, suppose $x_{it}$ is linearly related to $z_{it}$ with heterogeneous linear trends for each element of $x_{it}$:

$$x_{it} = g_i \Gamma + t \cdot h_i \Psi + z_{it} \Pi + q_{it}, \quad t = 1, \ldots, T. \tag{1.12}$$

Initially, take $w_t = (1, t)$, so the regressors and instruments are linearly detrended before applying pooled 2SLS. Assume the instruments also have heterogeneous linear trends, which are removed by individual-specific detrending. Then Assumption 2 simply requires that the idiosyncratic movements in $z_{it}$ are uncorrelated with $b_i$, a weak requirement on instrumental variables. For Assumption 3, write $\ddot{x}_{it} = \ddot{z}_{it} \Pi + \ddot{q}_{it}$, $t = 1, \ldots, T$, so that

$$\mathrm{Cov}(\ddot{x}_{it}, b_i \mid \ddot{z}_{it}) = \mathrm{Cov}[(\ddot{z}_{it}\Pi + \ddot{q}_{it}), b_i \mid \ddot{z}_{it}] = \mathrm{Cov}(\ddot{q}_{it}, b_i \mid \ddot{z}_{it}), \quad t = 1, \ldots, T,$$

under Assumption 2. Thus, provided

$$\mathrm{Cov}(\ddot{q}_{it}, b_i \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{q}_{it}, b_i), \quad t = 1, \ldots, T, \tag{1.13}$$

we can use $\ddot{z}_{it}$ as IVs for $\ddot{x}_{it}$ to obtain a consistent estimate of the PAE, $\beta$, in equation (1.4). One might even assume that $(q_{i1}, \ldots, q_{iT}, b_i)$ is independent of $(z_{i1}, \ldots, z_{iT})$, which is sufficient for (1.13) [as well as for Assumption 2].

It is possible that the FE-IV estimator is consistent even if we only demean the regressors and instruments, provided the instruments satisfy a stronger exogeneity assumption.
In other words, even though $x_{it}$ contains individual-specific linear trends, we ignore that in our estimation procedure. To see why we can still get consistency, demean $x_{it}$ to get

$$x_{it} - \bar{x}_i = [t - (T+1)/2] \cdot h_i \Psi + (z_{it} - \bar{z}_i)\Pi + (q_{it} - \bar{q}_i), \quad t = 1, \ldots, T. \tag{1.14}$$

Now, if $[(q_{it} - \bar{q}_i), b_i]$ is independent of $(z_{it} - \bar{z}_i)$ for each $t$, then (1.9) holds for $\ddot{z}_{it} = (z_{it} - \bar{z}_i)$ and (1.13) also holds. Therefore,

$$\mathrm{Cov}(x_{it} - \bar{x}_i, b_i \mid z_{it} - \bar{z}_i) = [t - (T+1)/2]\,\Psi' \mathrm{Cov}(h_i, b_i) = \mathrm{Cov}(x_{it} - \bar{x}_i, b_i)$$

for each $t$, which means that Assumption 3 holds: while the conditional covariances are not generally zero, or even constant over time, they do not depend on $z_{it} - \bar{z}_i$. So, the FE-IV estimator will be consistent provided we include a full set of year dummies in estimation.

What happens if we have a binary endogenous variable, $x_{it}$? Assumption 3 is unlikely to hold. To see why, take the case $w_t \equiv 1$, $t = 1, \ldots, T$, which corresponds to the usual unobserved effects model with correlated random coefficients. Then, $\ddot{x}_{it} = x_{it} - \bar{x}_i$, $t = 1, \ldots, T$, and we need $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$ not to depend on $\ddot{z}_{it}$. Now, by iterated expectations,

$$E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = E[E(\ddot{x}_{it} d_i \mid d_i, z_i) \mid \ddot{z}_{it}] = E[d_i E(\ddot{x}_{it} \mid d_i, z_i) \mid \ddot{z}_{it}]. \tag{1.15}$$

Standard models for binary responses, with $z_{it}$ strictly exogenous conditional on $d_i$, would have $P(x_{it} = 1 \mid d_i, z_i)$ depending on $d_i$ and $z_{it}$ in a nonlinear way. For concreteness, suppose $P(x_{it} = 1 \mid d_i, z_i)$ follows a probit model,

$$P(x_{it} = 1 \mid d_i, z_i) = P(x_{it} = 1 \mid d_i, z_{it}) = \Phi(\alpha_0 + \alpha_1 d_i + z_{it}\alpha_2). \tag{1.16}$$

Then

$$E(\ddot{x}_{it} \mid d_i, z_i) = \Phi(\alpha_0 + \alpha_1 d_i + z_{it}\alpha_2) - T^{-1} \sum_{r=1}^{T} \Phi(\alpha_0 + \alpha_1 d_i + z_{ir}\alpha_2) \equiv g_t(d_i, z_i) \tag{1.17}$$

and so, by (1.15),

$$E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = E[d_i g_t(d_i, z_i) \mid \ddot{z}_{it}]. \tag{1.18}$$

Even if $d_i$ is independent of $z_i$ (a sensible strengthening of Assumption 2), (1.18) generally depends on $\ddot{z}_{it}$.
Thus, assuming $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$ does not depend on $\ddot{z}_{it}$ is rather strong for a binary endogenous explanatory variable $x_{it}$. [Heckman (1997) contains a detailed discussion of the behavioral implications of this assumption in different empirical studies.] In a cross-sectional context, Wooldridge (1997) proposes a modified set of assumptions that are sufficient for consistent estimation of the ATE, $\beta$, with a binary endogenous variable, but, applied to the current setup, $P(x_{it} = 1 \mid d_i, z_i)$ would have to follow a linear probability model.

In a cross-sectional setting, Card (2001) shows that the analogue of Assumption 3 can also be violated in the case of roughly continuous explanatory variables due to heteroskedasticity in the variance matrix of $(x_i, b_i)$ given $z_i$. (With a pure cross section, there are no time subscripts and, of course, no unit-specific demeaning or detrending.) In an earnings equation where $x_i$ includes schooling, Card rejects $\mathrm{Cov}(x_i, b_i \mid z_i) = \mathrm{Cov}(x_i, b_i)$ using IQ score as a proxy for unobserved ability (an element of $b_i$) and a binary indicator for college proximity as an instrument for education. In our panel data setup, Assumption 3 allows $\mathrm{Cov}(x_{it}, b_i \mid z_{it})$ to depend on $z_{it}$, as it generally would if $x_{it}$ and $z_{it}$ contain persistent heterogeneity correlated with $b_i$. Using a generalized fixed effects approach, we need only assume $\mathrm{Cov}(\ddot{x}_{it}, b_i \mid \ddot{z}_{it})$ does not depend on $\ddot{z}_{it}$, and this is much more plausible when we think the unit-specific detrending successfully eliminates the time-constant heterogeneity in $\ddot{x}_{it}$ and $\ddot{z}_{it}$. More recently, in a cross-sectional setting, Wooldridge (2005b) proposes conditions that allow $\mathrm{Cov}(x_i, b_i \mid z_i)$ to depend on $z_i$, but these do not apply directly to the panel data case with time-constant heterogeneity that can be correlated with the covariates and instruments.
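Returning to the probit example in (1.16) through (1.18), a small numerical check may help show why the binary case is problematic. The sketch below evaluates $E[d_i\,\Phi(\alpha_0 + \alpha_1 d_i + \alpha_2 z)]$ at a fixed scalar $z$ by Gauss-Hermite quadrature. It rests on assumptions that are not in the text: $d_i \sim \text{Normal}(0,1)$ independent of the instrument, a scalar instrument, and illustrative parameter values; the function names are hypothetical. It looks only at the level term of $g_t$ in (1.17), not at the time-demeaned expression.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def norm_cdf(v):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def slope_weighted_prob(z, alpha0, alpha1, alpha2, nodes=80):
    """E[d * Phi(alpha0 + alpha1*d + alpha2*z)] for d ~ Normal(0, 1), z held fixed.

    hermegauss returns nodes and weights for the weight function exp(-d^2 / 2),
    so dividing by sqrt(2*pi) turns the weighted sum into a normal expectation.
    """
    pts, wts = hermegauss(nodes)
    vals = np.array([d * norm_cdf(alpha0 + alpha1 * d + alpha2 * z) for d in pts])
    return float(wts @ vals / math.sqrt(2.0 * math.pi))

# Illustrative values (assumptions): the moment changes with z.
values = [slope_weighted_prob(z, 0.5, 1.0, 1.0) for z in (-2.0, -0.5, 0.0, 2.0)]
```

Under these assumptions Stein's lemma gives the closed form $\alpha_1 (1+\alpha_1^2)^{-1/2}\,\phi\big((\alpha_0 + \alpha_2 z)/\sqrt{1+\alpha_1^2}\big)$, which is visibly a nonlinear function of $z$; a linear probability model would instead make this moment constant in $z$.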
1.5 Finite Sample Behavior of the FE-IV Estimator

In this section we provide evidence on the finite sample properties of the FE-IV estimator of the population averaged effect in a CRC panel data model. Because one of the most commonly used applications of CRC panel data models is the usual unobserved effects model with a random coefficient, we first assume $w_t \equiv 1$, $t = 1, \ldots, T$, in (1.1), as in the second part of the first example from Section 1.4. Also, for scalar processes $x_{it}$ and $z_{it}$, we assume a linear relationship between $x_{it}$ and $z_{it}$, with a linear trend for $x_{it}$.

We use Monte Carlo simulations to draw the data and check the properties of the estimator. The number of replications is 500, and the results of the experiment are presented for cross-sectional sample sizes of 100, 400, and 800 for two time horizons, $T = 5$ and $T = 10$. The population average values are $\beta = 2$ and $\alpha = 3$. For $t = 1, \ldots, T$, the endogenous explanatory variable is generated as

$$x_{it} = \lambda_{xz} z_{it} + \lambda_{xu} u_{it} + \lambda_{xa} a_i + \xi(1+t) d_i + \sqrt{1 - \lambda_{xz}^2 - \lambda_{xu}^2 - \lambda_{xa}^2 - \xi^2 (1+t)^2}\; e_{it}, \tag{1.19}$$

where $u_{it}, e_{it} \sim \text{Normal}(0,1)$, $a_i \sim \text{Normal}(0,1)$, $b_i = \beta + d_i$, $d_i \sim \text{Normal}(0, \sigma_d^2)$, and $\lambda_{xz}$, $\lambda_{xu}$, $\lambda_{xa}$, and $\xi$ are constants. Further, the instrument is generated as $z_{it} = \lambda_{za} a_i + \sqrt{1 - \lambda_{za}^2}\, m_{it}$, where $a_i$ is defined above, $m_{it} \sim \text{Normal}(t, 1)$, and $\lambda_{za}$ is the population correlation coefficient between $z_{it}$ and $a_i$, $t = 1, \ldots, T$. In our reported simulations we use $\sigma_d^2 = 1$.

When $\lambda_{za} = 0$, the coefficients $\lambda_{xz}$, $\lambda_{xu}$, and $\lambda_{xa}$ from (1.19) are the population correlation coefficients between $x_{it}$ and $z_{it}$, $x_{it}$ and $u_{it}$, and $x_{it}$ and $a_i$, $t = 1, \ldots, T$, respectively. The population correlation between $x_{it}$ and $b_i$ when $\lambda_{za} = 0$ is $\xi(1+t)$, $t = 1, \ldots, T$. We use the coefficient on the error term in (1.19) to ensure that $x_{it}$ has unit variance when $\lambda_{za} = 0$. When $\lambda_{za} \neq 0$, $\mathrm{Var}(x_{it}) = 1 + 2\lambda_{xz}\lambda_{za}\lambda_{xa}$, which is only slightly greater than one for our choices of the $\lambda$ parameters.
The relevant covariances are $\mathrm{Cov}(x_{it}, u_{it}) = \lambda_{xu}$, $\mathrm{Cov}(x_{it}, a_i) = \lambda_{xz}\lambda_{za} + \lambda_{xa}$, and $\mathrm{Cov}(x_{it}, z_{it}) = \lambda_{xz} + \lambda_{xa}\lambda_{za}$. For the endogenous explanatory variable defined in (1.19), Assumption 3 is met: $\mathrm{Cov}(\ddot{x}_{it}, b_i \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{x}_{it}, b_i) = \xi(1+t)$, $t = 1, \ldots, T$.

The dependent variable $y_{it}$ is generated as

$$y_{it} = a_i + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{1.20}$$

where $a_i$, $b_i$, $u_{it}$, and $x_{it}$ are defined above. Among other estimators, we obtain the FE-IV estimator in (1.20) acting as if $b_i = \beta$. Based on the first example from Section 1.4, we know this FE-IV estimator is consistent for $x_{it}$ generated as in (1.19) provided we include a full set of time dummies, even though we only demean the regressor and the instrument while ignoring the individual-specific linear trend in the regressor.

Tables A.1 and A.2 present simulation results for the correlated random coefficient model for $\lambda_{xu} = .40$, $\lambda_{xz} = .20$, $\lambda_{xa} = .20$, and $\lambda_{za} = .25$. The implied correlation between $x_{it}$ and $z_{it}$ is about .245, which seems to be a reasonable value for panel data. For comparison, we used a data set provided with Wooldridge (2002) on domestic route air fares for 1,149 routes in the United States for 1997 through 2000. (The data set is called AIRFARE.) The correlation between the log of air fare (an endogenous explanatory variable in a passenger demand equation) and the instrumental variable candidate, the concentration ratio on the route, is about $-.22$, which has a magnitude in the range of .245.

Table A.1 reports the simulation outcomes for $T = 5$, where $\xi = .12$, while Table A.2 covers the case $T = 10$, where $\xi = .06$. When $\xi = .12$, the correlation between $x_{i1}$ and $b_i$ is slightly less than .24; when $\xi = .06$, the correlation is just below .12. Columns 1 through 6 contain the mean, standard deviation (SD), root mean squared error (RMSE), lower quartile (LQ), median, and upper quartile (UQ) of the PAE estimates from 500 replications.
Rows of the table report statistics for the usual pooled ordinary least squares (POLS) estimates on the original data; the usual fixed effects estimates (FE-OLS), which is just pooled OLS on the time-demeaned data; pooled instrumental variables (IV) estimates using the original data; the fixed effects instrumental variables estimates without period dummy variables (FE-IV without dummies); and fixed effects instrumental variables estimates when a full set of period dummy variables is included (FE-IV with dummies).

From the table we see that the POLS estimates are roughly 1.5 times larger than the true value of $\beta$ in the 100, 400, and 800 observation samples. One source of bias of the POLS estimates is the correlation between the unobserved heterogeneity $a_i$ and the regressor $x_{it}$. A second source of bias in the POLS estimates is the endogeneity of the regressor $x_{it}$, with correlation coefficient $\rho_{xu}$ very close to .4. A third source of bias (and inconsistency) is the correlation between $x_{it}$ and $b_i$. The within transformation eliminates $a_i$, and so the correlation between $x_{it}$ and $a_i$ is not a source of bias for the usual FE-OLS estimator. But FE-OLS still produces a biased estimator of $\beta$ for the last two reasons mentioned above. The bias in the FE-OLS estimator is much lower than for POLS, but it is still on the order of 30 percent.

The pooled IV estimator (that is, without removing time averages and without time period dummies) actually has a larger bias than the FE-OLS estimator, a finding that is not too surprising because the instruments are correlated with $a_i$. Using the FE transformation combined with IV eliminates the dependence between $z_{it}$ and $a_i$ because $z_{it} = \lambda_{za} a_i + \sqrt{1 - \lambda_{za}^2}\, m_{it}$. Therefore, the FE-IV estimator (without time dummies) has a smaller bias and considerably smaller RMSE than the pooled IV estimator. More importantly, the FE-IV estimator with period dummies has the lowest RMSE among all estimators for all the sample sizes and both time horizons.
In addition, the RMSE of the FE-IV estimator with time dummies falls quickly as the sample size, $N$, grows. Without period dummies, the FE-IV estimates of $\beta$ are biased by at least 20 percent, and the bias does not disappear as $N \to \infty$. As $T$ increases, the RMSE of the FE-IV estimator without dummies decreases, but it is still higher than that of the FE-IV estimator with the period dummy variables included. Thus, even though the structural model (1.20) does not contain a time trend, inclusion of a full set of period dummies ensures the consistency of the FE-IV estimation.

Not surprisingly, the FE-OLS estimator has a smaller standard deviation than the FE-IV estimator (both without time dummies). Typically, methods that treat regressors as exogenous have substantially less sampling variation than their IV counterparts because the correlation between the instrument and regressor is typically well below one, as in the current simulation.

The difference between the FE-IV estimates with and without time dummies illustrates the trade-off between bias and variance. The FE-IV estimates without time period dummy variables are always less variable than the FE-IV estimates with time dummies. This is hardly surprising, as including more explanatory variables that are correlated with the instrument (the time dummies in this case) induces multicollinearity in the IV estimates. The instrument, $z_{it}$, is constructed to be correlated with the time dummies, and so the FE-IV estimator with time dummies is less precise than that without. But, of course, the estimator without time dummies suffers from substantial bias even though the structural model does not contain separate period intercepts. The RMSE for the FE-IV estimator that includes a full set of dummies is much lower than that of the estimator that does not.

We also conducted simulations with more variability in the random coefficient, namely, $\sigma_b^2 = 4$, so that the standard deviation of $b_i$ is double that in Tables A.1 and A.2.
The results of these simulations are not included here but are available on request. With more variability in $b_i$, the bias induced by failing to include time dummies in the FE-IV estimation is more pronounced (even though, remember, the structural model does not include time effects). For example, with $T = 5$ and $N = 800$, the RMSE of the FE-IV estimator without dummies is about 1.36, compared with about .22 for the estimator that does include the dummies.

For the next set of simulations, we take $w_t \equiv (1, t)$, $t = 1, \ldots, T$, in (1.1), so that each cross-sectional unit has its own linear trend. In particular, we generate $y_{it}$ as

$$y_{it} = a_{i0} + a_{i1} t + x_{it} b_i + u_{it}, \qquad t = 1, \ldots, T, \qquad (1.21)$$

where $a_{i0}$ and $a_{i1}$ are independent Normal(0, 1) random variables and $b_i$ and $u_{it}$ are defined above. The endogenous explanatory variable $x_{it}$ is generated as

$$x_{it} \equiv \lambda_{xz} z_{it} + \lambda_{xu} u_{it} + \lambda_{xa}(a_{i0} + a_{i1}) + \xi b_i + \xi t d_i + \sqrt{1 - \lambda_{xz}^2 - \lambda_{xu}^2 - 2\lambda_{xa}^2 - \xi^2(1+t)^2}\; e_{it}, \qquad (1.22)$$

and the instrument is generated as $z_{it} = \lambda_{za} a_{i0} + \sqrt{1 - \lambda_{za}^2}\, m_{it}$. Again, the coefficient on $e_{it}$ is chosen so that $\mathrm{Var}(x_{it}) = 1$ if $\lambda_{za} = 0$. We use the same values for the $\lambda$ parameters as in Tables A.1 and A.2, and we take $\sigma_b = 1$. (Simulation findings for the case $\sigma_b = 2$ are available on request.)

Because the structural model (1.21) contains a time trend, the default is to include a full set of time period dummies in the various estimation methods. For comparison, we include the FE-IV estimator without time period dummies. The rows of Tables A.3 and A.4 report statistics for POLS with time dummies, fixed effects with time dummies, pooled instrumental variables with time dummies, fixed effects instrumental variables estimates with time dummies, and fixed effects instrumental variables estimates without time dummies. As in Tables A.1 and A.2, the simulation findings are unambiguous: fixed effects IV with a full set of time dummies is superior, by far, to the other estimation methods, for all combinations of $N$ and $T$.
Perhaps not surprisingly, when $y_{it}$ is itself trending, the consequences of omitting aggregate time effects are much more detrimental than in the previous case.

The simulation findings are perhaps not too surprising: the only estimator that is essentially unbiased for the PAE removes the unobserved effect (or, more generally, the individual-specific trends), includes a full set of aggregate time effects, and instruments for the endogenous explanatory variable. Nevertheless, it is useful to see that the theoretical findings in Section 1.3 have practically important implications: the FE-IV estimator with time dummies is robust to correlation between the random coefficients and the explanatory variable, at least for assumptions that can be met by continuous endogenous explanatory variables.

1.6 Conclusion

This paper suggests a set of conditions sufficient for applying the standard IV approach to the estimation of population averaged effects in a correlated random coefficient panel data model with (roughly) continuous endogenous explanatory variables. Assumptions 1 through 4 ensure consistent FE-IV estimation of the population averaged slopes, $\beta$, even when individual-specific slopes are ignored. Monte Carlo simulations suggest that the proposed FE-IV estimator of the PAE, provided a full set of period dummy variables is included, performs better than other estimators in finite samples for the case of (roughly) continuous endogenous explanatory variables.

A natural direction for future work is to relax homoskedasticity of $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$; Card (2001) showed how the analogous assumption can fail in a cross-sectional environment. More recently, Murtazashvili (2006) shows how this assumption can be relaxed using a control function approach, by putting restrictions on the reduced forms of the endogenous elements of $x_{it}$ (restrictions that can be met for roughly continuous variables) and by modeling the conditional covariances.
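As an illustration of the estimator the conclusion refers to, the sketch below computes a just-identified FE-IV estimate for a scalar regressor and a scalar instrument in a balanced panel. It is a minimal sketch under stated assumptions, not the dissertation's code: the names `within` and `fe_iv` and the N x T array layout are hypothetical, and the full set of period dummies is handled by also removing period means, which is equivalent to including the dummies only because the panel is balanced.

```python
import numpy as np

def within(v, time_dummies=True):
    """Within (fixed effects) transformation of an N x T array.

    Subtracts unit means; if time_dummies is True, also removes period
    means, which partials out a full set of period dummies in a
    balanced panel.
    """
    out = v - v.mean(axis=1, keepdims=True)
    if time_dummies:
        out = out - out.mean(axis=0, keepdims=True)
    return out

def fe_iv(y, x, z, time_dummies=True):
    """FE-IV slope for one regressor x and one instrument z.

    Treats the slope as constant across units, as the usual FE-IV
    estimator does: beta_hat = sum(z_dd * y_dd) / sum(z_dd * x_dd),
    where _dd denotes the transformed data.
    """
    yd, xd, zd = (within(v, time_dummies) for v in (y, x, z))
    return float((zd * yd).sum() / (zd * xd).sum())
```

Setting `time_dummies=False` gives the FE-IV estimator without period dummies, so both variants compared in the simulations can be computed on the same panel.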
CHAPTER 2

A CONTROL FUNCTION APPROACH TO ESTIMATION OF CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS

2.1 Introduction

Recently, a lot of attention has been devoted to estimation of average partial effects (APEs) in correlated random coefficient (CRC) models, in both cross section and panel data settings. Studies are primarily conducted in a cross-sectional setup, with few exceptions for panel data. CRC panel data models are investigated for both exogenous and endogenous explanatory variables. Wooldridge (2005a) discusses fixed effects estimation of a CRC model for the case of exogenous independent variables in a panel data setting. Murtazashvili and Wooldridge (2005) address fixed effects instrumental variables (FE-IV) estimation of APEs with (at least roughly) continuous endogenous regressors in CRC panel data models.1 One of the main conditions for consistent estimation of APEs in their study is that the covariance between the detrended endogenous regressors and the individual heterogeneity, conditional on the detrended instruments, does not depend on those instruments. Card (2001) shows for cross-sectional data that this assumption can be violated in the case of roughly continuous endogenous explanatory variables due to heteroskedasticity in the variance-covariance matrix of the explanatory variables and the individual heterogeneity conditional on the instruments. He rejects this assumption using IQ as a proxy for unobserved ability and a binary indicator for college proximity as an instrument for education in the human capital earnings model. Wooldridge (2005b) proposes conditions weaker than those in Murtazashvili and Wooldridge (2005) for obtaining consistent APE estimates with (roughly) continuous regressors in the presence of Card's problem in a cross-sectional setup.

1 We refer to continuous variables with some discrete characteristics as roughly continuous, and discuss variables of this kind in the next section.
In this paper, we study the model in Murtazashvili and Wooldridge (2005) but, in addition to allowing some explanatory variables to be correlated with the idiosyncratic error, we correct for the drawback described in Card (2001) while still allowing the endogenous regressors to be (roughly) continuous. We use a control function approach, which introduces residuals from the reduced form for the endogenous regressors as covariates in the structural model. We propose a two-step method to account for endogeneity and to consistently estimate APEs in CRC panel data models with endogenous (roughly) continuous regressors. The motivation for our two-step panel data procedure comes from a cross section study by Wooldridge (2005b). Further, we relax the assumptions in Wooldridge (2005a) and Murtazashvili and Wooldridge (2005) by allowing the individual slopes in a CRC model to vary over time. Both cases, time-constant and time-varying individual slopes, are covered in this paper.

Monte Carlo simulations indicate that, in finite samples, the control function (CF) approach we propose for estimating the CRC balanced panel data model with time-invariant individual heterogeneity performs better than other estimators when the joint distribution of the individual heterogeneity and the endogenous regressors conditional on the detrended instruments depends on the instrumental variables. We apply the proposed method to the problem of estimating the average partial effects of annual hours of on-the-job training on output scrap rates for manufacturing firms in Michigan, using firm level data for 1987 through 1989. The control function approach we propose delivers APEs of annual hours of job training on output scrap rates that are larger in magnitude and statistically more significant than the APE estimates from the FE-IV approach.
2.2 Model of Interest for Balanced Panels

For a random draw $i$ from the population, the structural model is

$$y_{1it} = w_t a_i + x_{it} b_i + u_{it}, \qquad t = 1, \ldots, T, \qquad (2.1)$$

where $w_t$ is a $1 \times J$ vector of aggregate time variables, which we treat as nonrandom; $a_i$ is a $J \times 1$ vector of individual-specific slopes on the aggregate variables; $x_{it}$ is a $1 \times K$ vector built from exogenous covariates, $z_{1it}$, and an endogenous covariate, $y_{2it}$, that change across time (in general, $x_{it} = f(z_{1it}, y_{2it})$); $b_i$ is a $K \times 1$ vector of individual-specific slopes; and $u_{it}$ is an idiosyncratic error. For simplicity, assume $x_{it} = (z_{1it}, y_{2it})$. Let $z_{it} = (z_{1it}, z_{2it})$ be a $1 \times L$ vector of instrumental variables, with $L \geq K$, i.e., we assume the vector $z_{2it}$ contains at least one element. We assume a sample of size $N$ randomly drawn from the population, with $T$ fixed in the asymptotic analysis. For the purpose of this paper, we assume a balanced panel.

Our object of interest is $\beta = E(b_i)$, the $K \times 1$ vector of average partial effects, i.e., the vector of partial effects averaged over the population distribution of any unobserved heterogeneity. The APEs are usually of primary interest to empirical analysts. Another empirical question of possible interest is estimating the $b_i$ themselves. However, estimation of the $b_i$, when we treat them as parameters, is not precise unless $T$ is large. As an alternative, we turn to estimation of average partial effects in our model.

Following Murtazashvili and Wooldridge (2005), we study estimators of $\beta$ that are based on the assumption that the slopes $b_i$ are constant, but we study the properties of these estimators in the context of model (2.1). We write $b_i = \beta + d_i$, with $E(d_i) = 0$ by definition. In other words, we assume that the individual heterogeneities have constant means, $\beta$, and random error terms, $d_i$. Substitution into (2.1) gives

$$y_{1it} = w_t a_i + x_{it}\beta + (x_{it} d_i + u_{it}) \equiv w_t a_i + x_{it}\beta + v_{1it}, \qquad (2.2)$$

where $v_{1it} \equiv x_{it} d_i + u_{it}$. We estimate $\beta$ in (2.1) allowing the entire vector $a_i$ to vary by $i$, and to be arbitrarily correlated with $x_{it}$. Following a cross-sectional definition from Heckman and Vytlacil (1998), we call (2.1) a correlated random coefficient model because of the possible correlation between $b_i$ and $x_{it}$.

In this paper we develop a two-step estimation method motivated by Wooldridge (2005b) for obtaining consistent estimates of the average partial effects. The method we employ is called a control function approach, which was pioneered by Smith and Blundell (1986) and Rivers and Vuong (1988). The main idea of the control function method is to add control variables to the structural model to control for the endogeneity problem (regardless of its exact nature).

To use the control function approach in our case, we need to make assumptions about the nature of the endogeneity in the random coefficient model. Since we have two sources of endogeneity in our model (the correlation between the unobserved heterogeneities and the regressor $y_{2it}$, and the correlation between that regressor and the structural error), we are interested in modeling the relationships among the random coefficients, the exogenous covariates, and the error from the reduced form equation for the endogenous explanatory variable.

First, we assume there is some strictly monotonic function $h(\cdot)$ defined on the support set of $y_{2it}$, such that

$$h(y_{2it}) = \delta_{20} + z_{it}\delta_{21} + \bar{z}_i\delta_{22} + v_{2it}, \qquad t = 1, \ldots, T, \qquad (2.3)$$

$$E(v_{2it} \mid z_{i1}, \ldots, z_{iT}) = 0, \qquad t = 1, \ldots, T, \qquad (2.4)$$

where $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$, the $v_{2it}$ are error terms, and

$$E(u_{it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = E(u_{it} \mid v_{2i1}, \ldots, v_{2iT}) = \rho_1 v_{2it} + \rho_2 \bar{v}_{2i}, \qquad t = 1, \ldots, T, \qquad (2.5)$$

where $\rho_1$ and $\rho_2$ are scalars, and $\bar{v}_{2i} = T^{-1}\sum_{t=1}^{T} v_{2it}$. Assumption (2.5) is stronger than just assuming that $u_{it}$ is uncorrelated with $z_i$. There are two parts to this assumption.
The first equality says that $u_{it}$ is conditionally mean independent of $z_i$ given $v_{2i1}, \ldots, v_{2iT}$. This will always be true if $(u_{it}, v_{2i1}, \ldots, v_{2iT})$ and $z_i$ are independent. The second equality states that $E(u_{it} \mid v_{2i1}, \ldots, v_{2iT})$ is linear. Assumption (2.5) holds if $v_{2it} = a_{2i} + e_{2it}$, where $\{(u_{it}, e_{2it})\}$ is independently and identically distributed and all conditional expectations are linear. Thus, we maintain that (2.5) is a valid extension to CRC panel data models.

We follow Rivers and Vuong (1988) and call equation (2.3) a reduced form equation. Strict monotonicity of $h(\cdot)$ implies that $y_{2it}$ is a well-defined function of $\{z_{i1}, \ldots, z_{iT}\}$ and $v_{2it}$. Further, assumptions (2.3) and (2.4) mean that when some function $h(\cdot)$ is applied to the endogenous explanatory variable, $y_{2it}$, the result has a linear conditional mean given all the instruments. In other words, linearity of $E(y_{2it} \mid z_{i1}, \ldots, z_{iT})$ might not be an appropriate assumption, while we still want $y_{2it}$ to enter the regression equation linearly.

Assumption (2.4) always holds if $v_{2it}$ is independent of $z_i$. In the standard case of a continuous $y_{2it}$ with a large support set, assumptions (2.3) and (2.4) are very reasonable in many possible situations. But if the endogenous covariate has characteristics that are not quite suitable for a continuous variable, these assumptions do not generally hold. For example, assume a continuous variable $y_{2it}$ with a large support set is defined according to (2.3) when $h(\cdot)$ is the identity, so that $v_{2it} \mid z_i \sim \text{Normal}(0, \sigma_{2it}^2)$, where $\sigma_{2it}^2 = \mathrm{Var}(v_{2it} \mid z_i)$ is a conditional variance that depends on $z_i$. In this case we can standardize $v_{2it}$ to obtain a variable, $v_{2it}/\sigma_{2it}$, which is independent of $z_i$, guaranteeing that assumption (2.4) is satisfied. However, assumption (2.4) is unlikely to hold if $y_{2it}$ has some "discrete"-type characteristics. For instance, let $y_{2it}$ be a binary variable so that $y_{2it} \mid z_i$ follows a probit model.
Even after standardizing the error term for this variable, $v_{2it}$, we cannot hope to obtain a new one that is independent of $z_i$.

For the purpose of our study, we will refer to continuous variables with some discrete characteristics as roughly continuous, to distinguish them from traditional continuous variables and to emphasize that these roughly continuous variables do not always behave as well as continuous variables. Possible examples of these variables are income, education, experience, etc. Garen (1984) discusses estimation of models in the presence of selection bias when the choice variable is continuous and the choice set is ordered. He suggests treating the level of education in the human capital earnings model as such a continuous variable: on the one hand, schooling is traditionally thought of as a continuous variable; on the other hand, only integer values of that variable are observed.

Which functions can we use as a strictly monotonic function $h(\cdot)$ in transformation (2.3)? For the trivial case of a continuous $y_{2it}$ with a large support set, we can use $h(y_{2it}) = y_{2it}$. When the nature of $y_{2it}$ is more "exotic," the choice of $h(\cdot)$ is not so straightforward. For instance, Wooldridge (2005b) suggests using $h(y_{2it}) = \log[y_{2it}/(1 - y_{2it})]$ when $y_{2it}$ is a fraction in the open unit interval, and $h(y_{2it}) = \ln(y_{2it})$ when $y_{2it} > 0$. Assumptions (2.3) and (2.4) rule out probit, logit, and Tobit models because $y_{2it}$ has discrete characteristics.

For example, if we are interested in estimating whether there is an effect of per-pupil spending on math test pass rates for fourth graders in Michigan, and the endogenous variable of interest is per-pupil spending, then per-pupil spending is a roughly continuous variable and choosing a log-transformation of per-pupil spending is appropriate.
If we employ logged per-pupil spending as the endogenous covariate in the model, then logged per-pupil spending can be thought of as a continuous variable, and the function $h(y_{2it}) = y_{2it}$ with $y_{2it} = \ln(\text{per-pupil spending})$ is clearly adequate.

Second, we need to make assumptions about the distribution of $(a_i, v_{2it})$ conditional on the instruments. We assume

$$E(a_i \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = E(a_i \mid \bar{z}_i, \bar{v}_{2i}), \qquad (2.6)$$

and

$$E(a_i \mid \bar{z}_i, \bar{v}_{2i}) = \alpha + A_1\bar{z}_i' + (A_2 + A_3\bar{z}_i')\bar{v}_{2i}, \qquad (2.7)$$

where $\alpha$ and $A_2$ are $J \times 1$ and $A_1$ and $A_3$ are $J \times L$ matrices of constants, and $\bar{z}_i$ and $\bar{v}_{2i}$ are defined above. Assumption (2.6) means that $\bar{z}_i$ and $\bar{v}_{2i}$ can be thought of as sufficient statistics for describing the relationship between $a_i$ and the history of $\{z_{it}, v_{2it} : t = 1, \ldots, T\}$. Assumption (2.7) specifies a particular functional form for the relationship among $a_i$, $\bar{z}_i$, and $\bar{v}_{2i}$. Interactions among the exogenous variables $z_{it}$ and $v_{2it}$ might be important. In a cross-sectional context, Card (2001) shows that the joint distribution of $(a_i, v_{2it})$ given $z_{it}$ can depend on $z_{it}$ due to heteroskedasticity in $\mathrm{Var}(a_i, v_{2it} \mid z_{it})$. He shows this using IQ as a proxy for unobserved ability and a binary indicator for college proximity as an instrument for education in the human capital earnings model. Assumptions (2.4) and (2.7) can still be true even when the conditional variance-covariance matrix, $\mathrm{Var}(a_i, v_{2it} \mid z_i)$, is heteroskedastic.

Third, we need to make assumptions about the expected value of $d_i$ conditional on $\{z_{it}\}$ and $\{v_{2it}\}$. We assume

$$E(d_i \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = E(d_i \mid \bar{z}_i, \bar{v}_{2i}), \qquad (2.8)$$

and, in particular,

$$E(d_i \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = B_1(\bar{z}_i - \psi)' + (B_2 + B_3\bar{z}_i')\bar{v}_{2i}, \qquad (2.9)$$

where $\psi \equiv E(\bar{z}_i)$, $B_1$ and $B_3$ are $K \times L$ and $B_2$ is a $K \times 1$ matrix of constants, and $\bar{z}_i$ and $\bar{v}_{2i}$ are defined above.
In Murtazashvili and Wooldridge (2005), one of the conditions for consistency of the estimates of $\beta$ states that the covariance between $d_i$ and the detrended $x_{it}$ conditional on the detrended $z_{it}$ equals its unconditional version, that is, it does not depend on the detrended $z_{it}$. For the reasons mentioned earlier, this assumption might be too restrictive for the case of roughly continuous endogenous explanatory variables. In this study, we relax this assumption not only by dealing with the original data, but also by allowing the covariance between $d_i$ and $x_{it}$ conditional on the instruments to be a function of $z_{it}$. The conditions we employ in this paper assure the consistency of the estimates of $\beta$ in the case of roughly continuous endogenous explanatory variables.

Then, we take the expectation of equation (2.2) conditional on $(z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT})$, use the fact that $y_{2it}$ is a deterministic function of $(z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT})$, and apply assumptions (2.3) through (2.9). The resulting estimating equation is

$$E(y_{1it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = w_t\alpha + (\bar{z}_i \otimes w_t)\alpha_1 + \bar{v}_{2i} w_t\alpha_2 + \bar{v}_{2i}(\bar{z}_i \otimes w_t)\alpha_3 + x_{it}\beta + ((\bar{z}_i - \psi) \otimes x_{it})\beta_1 + \bar{v}_{2i} x_{it}\beta_2 + \bar{v}_{2i}(\bar{z}_i \otimes x_{it})\beta_3 + \rho_1 v_{2it}, \qquad (2.10)$$

where $t = 1, \ldots, T$. Here, $H = 2(1 + L)(J + K) + 1$ is the total number of independent second-stage variables. Equation (2.10) is an estimating equation for obtaining consistent estimates of the APE, $\beta$. Importantly, the components of $z_{2it}$ (the instrumental variables excluded from the structural equation (2.2)) do not enter the estimating equation (2.10) in levels or interacted only with $z_{1it}$. Generally, if any of these were introduced in (2.10), we would lose identification. [See Wooldridge (2005b) for more details.]

In some cases we might think that assumption (2.8) is too restrictive. In some potential applications we might want to allow the random coefficient to vary not only across $i$ but also across $t$.
In other words, for a random draw $i$ from the population, the structural model becomes

$$y_{1it} = w_t a_i + x_{it} b_{it} + u_{it}, \qquad t = 1, \ldots, T, \qquad (2.11)$$

where $b_{it}$ is a $K \times 1$ vector of time-varying individual-specific slopes. We write $b_{it} = \beta + q_{it}$, with $E(q_{it}) = 0$ by definition. In other words, we assume that the individual heterogeneities have constant means, $\beta$, and random error terms, $q_{it}$. Further, we assume that the $q_{it}$ consist of both time-constant and time-varying zero mean components, i.e., $q_{it} = d_i + r_{it}$. Substitution into (2.11) gives

$$y_{1it} = w_t a_i + x_{it}\beta + (x_{it} q_{it} + u_{it}) \equiv w_t a_i + x_{it}\beta + v_{1it}, \qquad (2.12)$$

where $v_{1it} \equiv x_{it} q_{it} + u_{it}$. The estimating equation for model (2.12) must then be expanded relative to (2.10) to reflect the time-varying nature of the individual multiplicative heterogeneity.

Assumptions (2.8) and (2.9) can be replaced with the following assumptions about the error term and the distribution of $(q_{it}, v_{2it})$ conditional on the instruments:

$$E(q_{it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = E(q_{it} \mid \bar{z}_i, \bar{v}_{2i}, v_{2it}, z_{it}), \qquad (2.13)$$

which says that $E(q_{it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT})$ depends only on the time $t$ values and the time averages. Since we maintain that the $q_{it}$ consist of time-constant and time-varying components, $d_i$ and $r_{it}$, respectively, assumption (2.13) reflects the nature of $q_{it}$. And, finally, we assume

$$E(q_{it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = E(d_i + r_{it} \mid \bar{z}_i, \bar{v}_{2i}, v_{2it}, z_{it}) = \{B_1(\bar{z}_i - \psi)' + (B_2 + B_3\bar{z}_i')\bar{v}_{2i}\} + (B_4 + B_5 z_{it}')v_{2it}, \qquad (2.14)$$

where $\psi \equiv E(\bar{z}_i)$, $B_2$ and $B_4$ are $K \times 1$, $B_j$, $j = 1, 3, 5$, are $K \times L$ matrices of constants, and $\bar{z}_i$ and $\bar{v}_{2i}$ are defined above. Clearly, the right hand side of equation (2.14) is identical to that of equation (2.9) when $B_4 = B_5 = 0$.
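Then, similar to the case of time-invariant individual heterogeneity, we take the expectation of equation (2.12) conditional on $(z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT})$, use the fact that $y_{2it}$ is a deterministic function of $(z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT})$, and apply assumptions (2.3) through (2.7), (2.13), and (2.14). The resulting estimating equation is

$$E(y_{1it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = w_t\alpha + (\bar{z}_i \otimes w_t)\alpha_1 + \bar{v}_{2i} w_t\alpha_2 + \bar{v}_{2i}(\bar{z}_i \otimes w_t)\alpha_3 + x_{it}\beta + ((\bar{z}_i - \psi) \otimes x_{it})\beta_1 + \bar{v}_{2i} x_{it}\beta_2 + \bar{v}_{2i}(\bar{z}_i \otimes x_{it})\beta_3 + v_{2it} x_{it}\beta_4 + v_{2it}(z_{it} \otimes x_{it})\beta_5 + \rho_1 v_{2it}, \qquad (2.15)$$

where $t = 1, \ldots, T$, $\alpha_1 = \mathrm{vec}[A_1]$, $\alpha_2 = \mathrm{vec}[A_2] + (\rho_2\ 0\ \cdots\ 0)'$, where $(\rho_2\ 0\ \cdots\ 0)'$ is a $J \times 1$ vector, $\alpha_3 = \mathrm{vec}[A_3]$, and $\beta_j = \mathrm{vec}[B_j]$, $j = 1, \ldots, 5$. Once again, equation (2.15) is an estimating equation for obtaining consistent estimates of the APE, $\beta$.

When $w_t \equiv 1$, $t = 1, \ldots, T$, equation (2.15) simplifies to

$$E(y_{1it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = \alpha + \bar{z}_i\alpha_1 + \bar{v}_{2i}\alpha_2 + \bar{v}_{2i}\bar{z}_i\alpha_3 + x_{it}\beta + ((\bar{z}_i - \psi) \otimes x_{it})\beta_1 + \bar{v}_{2i} x_{it}\beta_2 + \bar{v}_{2i}(\bar{z}_i \otimes x_{it})\beta_3 + v_{2it} x_{it}\beta_4 + v_{2it}(z_{it} \otimes x_{it})\beta_5 + \rho_1 v_{2it}, \qquad t = 1, \ldots, T. \qquad (2.16)$$

2.3 Estimating Procedure and Calculation of Standard Errors

We employ the control function approach that uses the reduced form error terms, $v_{2it}$, as "control variables" for heterogeneity and endogeneity in the structural model. A two-step method that consistently estimates the parameters from equation (2.11) is the following:

1. Run the POLS regression of

$$h(y_{2it}) \text{ on } 1,\ z_{it},\ \bar{z}_i, \qquad i = 1, \ldots, N,\ t = 1, \ldots, T, \qquad (2.17)$$

and save the residuals, $\hat{v}_{2it}$, $i = 1, \ldots, N$, $t = 1, \ldots, T$. Obtain $\bar{\hat{v}}_{2i} = T^{-1}\sum_{t=1}^{T}\hat{v}_{2it}$, $i = 1, \ldots, N$.

2. Run the POLS regression of

$$y_{1it} \text{ on } w_t,\ \mathrm{vec}[(\bar{z}_i \otimes w_t)]',\ \bar{\hat{v}}_{2i} w_t,\ \mathrm{vec}[(\bar{z}_i \otimes w_t)]'\bar{\hat{v}}_{2i},\ x_{it},\ \mathrm{vec}[(\bar{z}_i - \bar{z}) \otimes x_{it}]',\ \bar{\hat{v}}_{2i} x_{it},\ \mathrm{vec}[(\bar{z}_i \otimes x_{it})]'\bar{\hat{v}}_{2i},\ \hat{v}_{2it} x_{it},\ \mathrm{vec}[(z_{it} \otimes x_{it})]'\hat{v}_{2it},\ \hat{v}_{2it}, \qquad (2.18)$$

where $i = 1, \ldots, N$, $t = 1, \ldots, T$, and $\bar{z} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}$, and obtain $\hat{\beta}$ and the other parameter estimates.

Terms containing the vec operator are used to denote all possible interactions among the variables.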
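The two steps can be sketched for the simplest configuration: $w_t \equiv 1$, a single endogenous regressor (so the regressor vector is just $y_{2it}$), and a single excluded instrument. This is an illustrative sketch under those assumptions, not the dissertation's implementation: the function name `cf_two_step`, the N x T array layout, and the identity default for $h(\cdot)$ are hypothetical choices, and the sketch returns point estimates only, without the first-stage adjustment to the standard errors.

```python
import numpy as np

def cf_two_step(y1, y2, z, h=lambda v: v):
    """Two-step control function estimates for the scalar case of (2.16).

    y1, y2, z are N x T arrays (balanced panel); h is the strictly
    monotonic transformation applied to y2 in the first stage.
    Returns (beta_hat, theta_hat, v2_hat). Hypothetical helper.
    """
    N, T = y1.shape
    ones = np.ones((N, T))
    zbar = np.repeat(z.mean(axis=1, keepdims=True), T, axis=1)  # time averages

    # Step 1: pooled OLS of h(y2) on (1, z_it, zbar_i); keep residuals.
    Z1 = np.column_stack([ones.ravel(), z.ravel(), zbar.ravel()])
    hy = h(y2).ravel()
    delta2, *_ = np.linalg.lstsq(Z1, hy, rcond=None)
    v2 = (hy - Z1 @ delta2).reshape(N, T)
    v2bar = np.repeat(v2.mean(axis=1, keepdims=True), T, axis=1)

    # Step 2: pooled OLS of y1 on the second-stage regressors, ordered as
    # in (2.16); the grand mean of z stands in for its population mean.
    cols = [ones, zbar, v2bar, v2bar * zbar,
            y2, (zbar - z.mean()) * y2, v2bar * y2, v2bar * zbar * y2,
            v2 * y2, v2 * z * y2, v2]
    G = np.column_stack([c.ravel() for c in cols])
    theta, *_ = np.linalg.lstsq(G, y1.ravel(), rcond=None)

    beta_hat = float(theta[4])   # coefficient on y2: the APE estimate
    return beta_hat, theta, v2
```

Dropping the two columns that involve `v2 * y2` and `v2 * z * y2` gives the time-constant specification in (2.10) with $w_t \equiv 1$; for a fractional regressor, the log-odds function can be passed as `h`.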
For example, the term $\mathrm{vec}[(z_{it} \otimes x_{it})]'\hat{v}_{2it}$ in (2.18) consists of $K \cdot L$ interaction terms.

If we want to test whether the data exhibit the properties of time-varying or time-constant individual heterogeneity, we can employ a test of joint significance of $\beta_j$, $j = 4, 5$, in (2.15). The null hypothesis of time-constant individual heterogeneity is $H_0: \beta_4 = \beta_5 = 0$. A fully robust adjusted Wald statistic is appropriate. If the Wald test rejects the null hypothesis, then the model with time-varying individual heterogeneity, (2.15), should be estimated.

To test for endogeneity of $y_{2it}$ and individual heterogeneity we can simply test for joint significance of all the second-stage terms other than $w_t$ and $x_{it}$. By construction, the errors from the second stage of the estimating procedure are zero mean independent of all the explanatory variables at that stage. As a result, the POLS estimates of the second-stage parameters will be consistent, and a standard F test of joint significance of all the second-stage terms containing the first-stage residuals and the time-averaged exogenous variables, $\bar{z}_i$, will be a valid test. If the coefficients of all the terms from the second stage that contain the generated regressors and time-averaged instruments are jointly statistically different from zero, there is an endogeneity and heterogeneity problem, and neglecting it will lead to misspecification.

If the null hypothesis of no endogeneity and no individual heterogeneity is rejected, the standard errors in (2.18) should be adjusted for the first-stage estimation of $\delta_2 = (\delta_{20}, \delta_{21}', \delta_{22}')'$, a $(2L + 1) \times 1$ vector of the first-stage parameters in (2.17). Define $g_{it}$ to be a $1 \times H$, $H = (2J + 3K)(1 + L) + 1$, vector of all the independent second-stage variables, i.e.,

$$g_{it} = \bigl(w_t,\ \mathrm{vec}[(\bar{z}_i \otimes w_t)]',\ \bar{v}_{2i} w_t,\ \mathrm{vec}[(\bar{z}_i \otimes w_t)]'\bar{v}_{2i},\ x_{it},\ \mathrm{vec}[(\bar{z}_i - \bar{z}) \otimes x_{it}]',\ \bar{v}_{2i} x_{it},\ \mathrm{vec}[(\bar{z}_i \otimes x_{it})]'\bar{v}_{2i},\ v_{2it} x_{it},\ \mathrm{vec}[(z_{it} \otimes x_{it})]'v_{2it},\ v_{2it}\bigr).$$

Let $\hat{g}_{it}$ be the $1 \times H$ vector $g_{it}$ evaluated at the estimated first-stage residuals, $\hat{v}_{2it}$:

$$\hat{g}_{it} = \bigl(w_t,\ \mathrm{vec}[(\bar{z}_i \otimes w_t)]',\ \bar{\hat{v}}_{2i} w_t,\ \mathrm{vec}[(\bar{z}_i \otimes w_t)]'\bar{\hat{v}}_{2i},\ x_{it},\ \mathrm{vec}[(\bar{z}_i - \bar{z}) \otimes x_{it}]',\ \bar{\hat{v}}_{2i} x_{it},\ \mathrm{vec}[(\bar{z}_i \otimes x_{it})]'\bar{\hat{v}}_{2i},\ \hat{v}_{2it} x_{it},\ \mathrm{vec}[(z_{it} \otimes x_{it})]'\hat{v}_{2it},\ \hat{v}_{2it}\bigr).$$

Then, the estimating equation (2.18) can be rewritten as $y_{1it} = g_{it}\theta + e_{it}$, where $E(e_{it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = 0$, and $\theta$ is a column of all the parameters from the estimating equation. Define $y_{1i}$ to be the $T \times 1$ vector of $y_{1it}$, let $G_i$ be the matrix with $t$-th row $g_{it}$, and $\hat{G}_i$ be the matrix with $t$-th row $\hat{g}_{it}$. Then, $\theta$ can be estimated as

$$\hat{\theta} = \Bigl(\sum_{i=1}^{N}\hat{G}_i'\hat{G}_i\Bigr)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}' y_{1it}. \qquad (2.19)$$

Write $y_{1it} = \hat{g}_{it}\theta + (g_{it} - \hat{g}_{it})\theta + e_{it} = \hat{g}_{it}\theta + \theta'(g_{it} - \hat{g}_{it})' + e_{it}$. Plugging this into (2.19) and multiplying through by $\sqrt{N}$ gives

$$\sqrt{N}(\hat{\theta} - \theta) = \hat{A}^{-1}N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\bigl[\theta'(g_{it} - \hat{g}_{it})' + e_{it}\bigr],$$

where $\hat{A} = N^{-1}\sum_{i=1}^{N}\hat{G}_i'\hat{G}_i$. Using the Law of Large Numbers, we know that $\hat{A} \xrightarrow{p} A \equiv E(G_i'G_i)$. Further, a mean value expansion gives

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}' e_{it} = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} g_{it}' e_{it} + \Bigl[N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\nabla_{\delta_2} g_{it}) e_{it}\Bigr]\sqrt{N}(\hat{\delta}_2 - \delta_2) + o_p(1),$$

where $\nabla_{\delta_2} g_{it}$ is the $H \times (2L + 1)$ Jacobian of $g_{it}'$ with respect to the parameters $\delta_2$ from the first stage of the estimating procedure. Writing $z_{it}^F \equiv (1, z_{it}, \bar{z}_i)$ for the $1 \times (2L + 1)$ vector of first-stage regressors and $\bar{z}_i^F \equiv T^{-1}\sum_{t=1}^{T} z_{it}^F$, for each $(i, t)$, $\nabla_{\delta_2} g_{it}$ is a block matrix of the form

$$\nabla_{\delta_2} g_{it} = -\begin{pmatrix} 0 \\ 0 \\ w_t'\,\bar{z}_i^F \\ \mathrm{vec}[(\bar{z}_i \otimes w_t)]\,\bar{z}_i^F \\ 0 \\ 0 \\ x_{it}'\,\bar{z}_i^F \\ \mathrm{vec}[(\bar{z}_i \otimes x_{it})]\,\bar{z}_i^F \\ x_{it}'\,z_{it}^F \\ \mathrm{vec}[(z_{it} \otimes x_{it})]\,z_{it}^F \\ z_{it}^F \end{pmatrix}.$$

Each row block of the Jacobian matrix corresponds to one term of the estimating equation (2.15). Because $E(e_{it} \mid z_{i1}, \ldots, z_{iT}, v_{2i1}, \ldots, v_{2iT}) = 0$, $E((\nabla_{\delta_2} g_{it}) e_{it}) = 0$.
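It follows that

$$N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(\nabla_{\delta_2} g_{it}) e_{it} = o_p(1),$$

and, since $\sqrt{N}(\hat{\delta}_2 - \delta_2) = O_p(1)$, we get

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}' e_{it} = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} g_{it}' e_{it} + o_p(1).$$

Next, using similar reasoning,

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\theta'(g_{it} - \hat{g}_{it})' = -\Bigl[N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} g_{it}'\theta'(\nabla_{\delta_2} g_{it})\Bigr]\sqrt{N}(\hat{\delta}_2 - \delta_2) + o_p(1) = -B\sqrt{N}(\hat{\delta}_2 - \delta_2) + o_p(1),$$

where $B \equiv E\bigl(\sum_{t=1}^{T} g_{it}'\theta'(\nabla_{\delta_2} g_{it})\bigr)$. Further, based on the first stage of the estimation procedure, (2.17), we know that

$$\sqrt{N}(\hat{\delta}_2 - \delta_2) = C^{-1}N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}(z_{it}^F)' v_{2it} + o_p(1),$$

where $C \equiv \sum_{t=1}^{T} E[(z_{it}^F)' z_{it}^F]$, $z_{it}^F = (1, z_{it}, \bar{z}_i)$ is a $1 \times (2L + 1)$ vector of the first-stage explanatory variables (i.e., a vector containing a constant, the exogenous explanatory variables, $z_{it}$, and the time averages of the exogenous explanatory variables, $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$), and $E((z_{it}^F)' v_{2it}) = 0$, $t = 1, \ldots, T$. Thus, collecting all the terms we obtain

$$\sqrt{N}(\hat{\theta} - \theta) = A^{-1}N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\bigl[g_{it}' e_{it} - BC^{-1}(z_{it}^F)' v_{2it}\bigr] + o_p(1).$$

By the Central Limit Theorem,

$$\sqrt{N}(\hat{\theta} - \theta) \xrightarrow{d} \text{Normal}(0, A^{-1}MA^{-1}),$$

where $M \equiv \mathrm{Var}\bigl(\sum_{t=1}^{T}[g_{it}' e_{it} - BC^{-1}(z_{it}^F)' v_{2it}]\bigr)$. Therefore, the asymptotic variance of $\hat{\theta}$, $\mathrm{Avar}(\hat{\theta})$, is estimated as

$$\hat{V} \equiv \hat{A}^{-1}\hat{M}\hat{A}^{-1}/N, \qquad (2.20)$$

where $\hat{A}$ is defined above,

$$\hat{M} = N^{-1}\sum_{i=1}^{N}\Bigl[\sum_{t=1}^{T}\bigl(\hat{g}_{it}'\hat{e}_{it} - \hat{B}\hat{C}^{-1}(z_{it}^F)'\hat{v}_{2it}\bigr)\Bigr]\Bigl[\sum_{t=1}^{T}\bigl(\hat{g}_{it}'\hat{e}_{it} - \hat{B}\hat{C}^{-1}(z_{it}^F)'\hat{v}_{2it}\bigr)\Bigr]',$$

$$\hat{B} = N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\hat{\theta}'(\nabla_{\delta_2}\hat{g}_{it}), \qquad \hat{C} = N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(z_{it}^F)' z_{it}^F,$$

and $\hat{e}_{it} = y_{1it} - \hat{g}_{it}\hat{\theta}$.

2.4 Finite Sample Behavior of the Control Function Estimator

In this section we provide evidence on the finite sample properties of the control function estimator of the APE in CRC balanced panel data models. We assume that the unobserved heterogeneity is time constant. This assumption allows us to compare the proposed estimation method with other available estimators in the same context, and time-constant slopes are commonly assumed in many empirical applications. So, we consider two CRC panel data models with time-constant unobserved heterogeneity described by equation (2.10). First, we study the usual unobserved effect CRC model with a random coefficient, i.e., we assume $w_t \equiv 1$, $t = 1, \ldots, T$.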
Second, we employ the random trend CRC model with $w_t \equiv (1, t)$, $t = 1, \ldots, T$, so that each cross-sectional unit has its own linear trend.

We use Monte Carlo simulations to draw the data and check the properties of the estimator. The number of replications is 500, and the results of the experiments are presented for samples of 500 and 1000 observations for a time horizon $T = 5$. The population values of the model parameters are set at $\beta = 2$ and $\alpha = 1$. We consider two options for a scalar endogenous explanatory variable $y_{2it}$: (1) a continuous $y_{2it}$ with a large support set, i.e., a traditional continuous variable, and (2) $y_{2it}$ being a fraction in the open unit interval, $y_{2it} \in (0,1)$, i.e., a roughly continuous variable.

For the usual unobserved effect CRC model the dependent variable $y_{1it}$ is generated as:

$$y_{1it} = a_i + y_{2it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{2.21}$$

where

$$a_i \equiv \alpha + \lambda_{1a}\bar z_i + \lambda_{2a}\bar v_{2i} + \lambda_{3a}\bar v_{2i}\bar z_i + \lambda_{4a} c_i^a, \tag{2.22}$$

$$b_i \equiv \beta + \lambda_{1b}(\bar z_i - \bar z) + \lambda_{2b}\bar v_{2i} + \lambda_{3b}\bar v_{2i}\bar z_i + \lambda_{4b} c_i^b, \tag{2.23}$$

and

$$u_{it} \equiv \lambda_{1u} v_{2it} + \lambda_{2u}\bar v_{2i} + \lambda_{3u} e_{it}, \tag{2.24}$$

where $z_{it} \sim \mathrm{Normal}(1,1)$, $v_{2it} \sim \mathrm{Normal}(0,1)$, $c_i^a, c_i^b \sim \mathrm{Normal}(0,1)$, $e_{it} \sim \mathrm{Normal}(0,1)$, $\bar z_i = T^{-1}\sum_{t=1}^{T} z_{it}$, $\bar z = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}$, $\bar v_{2i} = T^{-1}\sum_{t=1}^{T} v_{2it}$, and $\lambda_{1a}$, $\lambda_{2a}$, $\lambda_{3a}$, $\lambda_{4a}$, $\lambda_{1b}$, $\lambda_{2b}$, $\lambda_{3b}$, $\lambda_{4b}$, $\lambda_{1u}$, $\lambda_{2u}$, and $\lambda_{3u}$ are constants.

For a continuous $y_{2it}$ on a large support set we define the endogenous explanatory variable $y_{2it}$ to be $y_{2it} \equiv h(g_{2it}) \equiv g_{2it}$, where we generate $g_{2it}$ according to:

$$g_{2it} \equiv \lambda_{g_2 z} z_{it} + \xi z_{it} d_i + \lambda_{g_2 v_2} v_{2it}, \tag{2.25}$$

where $v_{2it} \sim \mathrm{Normal}(0,1)$, $d_i = b_i - \beta$, and $\lambda_{g_2 z}$, $\xi$, and $\lambda_{g_2 v_2}$ are constants. For $y_{2it} \in (0,1)$, we use the following equality to define the endogenous regressor:

$$y_{2it} \equiv \frac{\exp(g_{2it})}{1 + \exp(g_{2it})}. \tag{2.26}$$

If we set $\xi$ to be 0 in (2.25), then the condition for consistency of the FE-IV estimator of CRC panel data models in Murtazashvili and Wooldridge (2005) will be satisfied.
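The data generating process in (2.21) through (2.26) can be sketched in a few lines of code. The following is only an illustrative sketch, not the code used for the reported experiments: the function name `simulate_crc_panel`, its argument names, and the returned row layout are assumptions made here, and the default constants are the values listed in footnote 2 for the case $\xi \neq 0$.

```python
import math
import random


def simulate_crc_panel(n, t_periods=5, alpha=1.0, beta=2.0, xi=0.55,
                       lam_a=(0.29, 0.29, 0.29, 0.84),
                       lam_b=(0.2, 0.2, 0.2, 0.99),
                       lam_u=(0.37, 0.37, 0.88),
                       lam_g=(0.44, 0.71),
                       fraction=False, seed=0):
    """Draw one balanced panel from (2.21)-(2.26); returns a list of row dicts.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Instrument z_it ~ Normal(1, 1) and first-stage shock v_2it ~ Normal(0, 1).
    z = [[rng.gauss(1.0, 1.0) for _ in range(t_periods)] for _ in range(n)]
    v2 = [[rng.gauss(0.0, 1.0) for _ in range(t_periods)] for _ in range(n)]
    zbar = [sum(row) / t_periods for row in z]      # time averages of z_it
    v2bar = [sum(row) / t_periods for row in v2]    # time averages of v_2it
    zgrand = sum(zbar) / n                          # overall mean of z_it

    rows = []
    for i in range(n):
        ca, cb = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # (2.22): additive heterogeneity a_i.
        a_i = (alpha + lam_a[0] * zbar[i] + lam_a[1] * v2bar[i]
               + lam_a[2] * v2bar[i] * zbar[i] + lam_a[3] * ca)
        # (2.23): random slope b_i, with population mean beta.
        b_i = (beta + lam_b[0] * (zbar[i] - zgrand) + lam_b[1] * v2bar[i]
               + lam_b[2] * v2bar[i] * zbar[i] + lam_b[3] * cb)
        d_i = b_i - beta
        for t in range(t_periods):
            e_it = rng.gauss(0.0, 1.0)
            # (2.24): idiosyncratic error, correlated with v_2it.
            u_it = lam_u[0] * v2[i][t] + lam_u[1] * v2bar[i] + lam_u[2] * e_it
            # (2.25): index for the endogenous regressor.
            g2 = lam_g[0] * z[i][t] + xi * z[i][t] * d_i + lam_g[1] * v2[i][t]
            # (2.26) if a fraction is requested, otherwise y_2it = g_2it.
            y2 = 1.0 / (1.0 + math.exp(-g2)) if fraction else g2
            y1 = a_i + y2 * b_i + u_it              # (2.21)
            rows.append({"i": i, "t": t + 1, "y1": y1, "y2": y2,
                         "z": z[i][t], "b": b_i})
    return rows


panel = simulate_crc_panel(n=500, fraction=True)
print(len(panel))  # 500 units observed over 5 periods
```

Setting `xi=0.0` gives the case in which the FE-IV consistency condition holds; the fraction case would use the constants from footnote 4 rather than these defaults.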
When $\xi \neq 0$, the covariance between the detrended endogenous explanatory variable, $\ddot y_{2it}$, and the unobserved heterogeneity, $b_i = \beta + d_i$, conditional on the detrended instrument, $\ddot z_{it}$, is not equal to its unconditional version: $\mathrm{Cov}(\ddot y_{2it}, b_i \mid \ddot z_{it}) \neq \mathrm{Cov}(\ddot y_{2it}, b_i)$. Thus, for $\xi \neq 0$, the FE-IV estimation in Murtazashvili and Wooldridge (2005) does not deliver consistent estimators of the model parameters. While (2.25) does not meet the requirements for consistent FE-IV estimation of (2.21) when $\xi \neq 0$, it does satisfy (2.3) through (2.14) and does allow using the CF approach to obtain consistent parameter estimates in (2.21).

For the random trend CRC model the dependent variable $y_{1it}$ is generated as:

$$y_{1it} = a_{1i} + a_{2i} t + y_{2it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{2.27}$$

where both $a_{1i}$ and $a_{2i}$ are generated according to (2.22), and $b_i$, $y_{2it}$, and $u_{it}$ are defined as above.

Why would we think that the data generating process we propose in (2.22) through (2.25) is representative of something that we might actually see in practice? One possible empirical example is the study by Hall and Jones (1999). The authors attempt to explain the differences in output per worker by differences in institutions and government policies, which they call social infrastructure. Even though Hall and Jones (1999) conduct a cross-sectional investigation, their idea can be easily extended to a panel data setup. Social infrastructure is thought to be endogenous. First, it can itself depend on the level of GDP per worker in a country. Second, we do not observe social infrastructure directly, and need to deal with a measurement error problem. Hall and Jones (1999) suggest using Western European influence around the world as an instrumental variable for social infrastructure. Specifically, the distance of a country from the equator and the fraction of the population speaking a European language are used as measures of Western European influence.
Clearly, the distance of a country from the equator is time-invariant. Instead, we can use a time-varying fraction of the population speaking a European language as an IV in a panel data setting. While both models (2.21) and (2.27) can be thought appropriate, structural equation (2.27) seems more suitable for modeling the behavior of output per worker, since we want to allow each country to have its own time trend. Further, endogeneity of social infrastructure explains equations (2.24) and (2.25). Country-specific unobserved cultural characteristics, both additive and multiplicative, might be related to the fraction of the population speaking a European language. It was Western Europe that spread to the rest of the world the ideas of Adam Smith and the importance of property rights (among others). As a result, countries that were influenced by Western Europe the most are more likely to have favorable social infrastructure. This would explain the linear terms in equations (2.22) and (2.23). Importantly, it is possible that the joint distributions of $(a_{ji}, v_{2it})$ given $z_i$ and $(b_i, v_{2it})$ given $z_i$ depend on $z_i$ due to heteroskedasticity in $\mathrm{Var}(a_{ji}, v_{2it} \mid z_i)$ or $\mathrm{Var}(b_i, v_{2it} \mid z_i)$, where $j = 1$ or $2$, as discussed by Card (2001) for the human capital earnings model. That is why we might think that the interaction terms in (2.22), (2.23), and (2.25) are required.

Table B.1 and Table B.2 present experimental results for the CRC model with a continuous scalar endogenous explanatory variable with a large support set and a scalar instrument $z_{it}$. Table B.1 reports the simulation outcomes for the usual unobserved effect CRC model, while Table B.2 covers the case of the random trend CRC model.
For the usual unobserved effect model, column 2 contains the sample correlation coefficients between the endogenous regressor, $y_{2it}$, and the instrument, $z_{it}$, the error, $u_{it}$, the unobserved additive effect, $a_i$, and the unobserved multiplicative heterogeneity, $b_i$, denoted $\hat\rho_{y_2 z}$, $\hat\rho_{y_2 u}$, $\hat\rho_{y_2 a}$, and $\hat\rho_{y_2 b}$, respectively; sample correlations are reported because analytical expressions are not readily available. For the random trend model, we report the sample correlations between $y_{2it}$ and $a_{1i}$, and between $y_{2it}$ and $a_{2i}$, separately. We denote these sample correlations $\hat\rho_{y_2 a_1}$ and $\hat\rho_{y_2 a_2}$, respectively. $\hat\rho_{y_2 b}$ is reported for $t = 1$. [2]

Columns 3 through 10 contain the mean, regular standard error (Reg. SE), robust standard error (Rob. SE) [3], standard deviation (SD), root mean squared error (RMSE), lower quartile (LQ), median, and upper quartile (UQ) of the APE estimates from 500 replications. Rows of the table report statistics for the usual pooled ordinary least squares (POLS) estimates on the original data, the usual fixed effects estimates (FE-OLS), which are just pooled OLS on the time-demeaned data, the instrumental variables (IV) estimates using the original data, the fixed effects instrumental variables estimates (FE-IV), and the estimates from the control function approach (CF). The adjusted standard error (Adj. SE) is reported for the CF approach.

It is easy to see that when $\xi = 0$ and the endogenous explanatory variable $y_{2it}$ is continuous on a large support set, i.e., $y_{2it}$ is defined by (2.25) with $\xi = 0$, the (conditional and unconditional) covariance between the detrended endogenous regressor and the unobserved heterogeneity is constant over time.
Even though Murtazashvili and Wooldridge (2005) emphasize that the FE-IV estimator should contain a full set of time dummies to deliver consistent estimates, they do so allowing the covariances to vary with time while still being independent of the detrended instruments. Thus, for the usual unobserved effect model, when we define $y_{2it}$ according to (2.25), there is no need to include time dummies to obtain consistent FE-IV estimates of $\beta$ when $\xi = 0$. As a result, all the estimates we consider for the usual unobserved effect model, including the FE-IV estimates, are based on regressions without time dummies. For the random trend CRC model, all the reported estimates (except the CF) are based on regressions with time dummies.

[2] When $\xi \neq 0$, Table B.1 and Table B.2 are obtained for $\lambda_{1a} = \lambda_{2a} = \lambda_{3a} = 0.29$, $\lambda_{4a} = 0.84$, $\lambda_{1b} = \lambda_{2b} = \lambda_{3b} = 0.2$, $\lambda_{4b} = 0.99$, $\lambda_{1u} = \lambda_{2u} = 0.37$, $\lambda_{3u} = 0.88$, $\lambda_{g_2 z} = 0.44$, $\xi = 0.55$, and $\lambda_{g_2 v_2} = 0.71$. When $\xi = 0$, $\lambda_{1a} = \lambda_{2a} = \lambda_{3a} = 0.31$, $\lambda_{4a} = 0.82$, $\lambda_{1b} = \lambda_{2b} = \lambda_{3b} = 0.61$, $\lambda_{4b} = 0.91$, $\lambda_{1u} = \lambda_{2u} = 0.2$, $\lambda_{3u} = 0.96$, $\lambda_{g_2 z} = 0.26$, and $\lambda_{g_2 v_2} = 0.97$ are used for Table B.1 and Table B.2.

[3] Robust standard errors are calculated using the scaling factor from Stata 9.0, i.e., they are clustered on individuals.

There are three sources of bias in the estimates under consideration. First, the correlation between the unobserved heterogeneity $a_i$ and the regressor $y_{2it}$ biases the estimates of the model parameters. Second, the endogeneity of the regressor $y_{2it}$ (its correlation with the idiosyncratic error) is a further source of bias. Finally, the correlation between the regressor $y_{2it}$ and the random coefficient $b_i$ leads to bias (and inconsistency) in the estimates as well. As long as $\xi = 0$ in (2.25), the correlation between the endogenous explanatory variable and the random coefficient does not result in inconsistency of the FE-IV estimator. When $\xi = 0$, both the FE-IV and the CF methods deliver consistent estimates of $\beta$.
When $\xi \neq 0$, the FE-IV estimates of $\beta$ are both biased and inconsistent. The CF estimates, while biased in finite samples, are the only consistent estimates considered for $\xi \neq 0$.

Columns 4 and 5 contain regular and robust standard errors of the estimates. To be exact, we report the averages of the regular and robust standard errors of the estimates obtained from 500 replications. The regular SE are the standard errors calculated under the assumption that there is no heteroskedasticity or serial correlation in the error terms. The robust SE are adjusted for both serial correlation and heteroskedasticity that are possibly present in the errors. The standard errors reported for the CF approach are computed according to formula (2.20); they are adjusted for the first stage estimation and are robust to arbitrary serial correlation and heteroskedasticity. As expected, the simulations show that the robust standard errors for the first four estimators approximate the standard deviations more closely than the regular standard errors do.

Studying Table B.1 and Table B.2 for both sample sizes in the case of $\xi \neq 0$, we conclude that the CF estimates have the smallest biases and the smallest RMSEs. Murtazashvili and Wooldridge (2005) show that the FE-IV estimation results in consistent estimates of $\beta$ when $\xi = 0$. Table B.1 and Table B.2 indicate that the CF estimator and the FE-IV estimator have very similar RMSEs for $\xi = 0$. The closeness in RMSEs comes from similarity in both the biases and the standard deviations of these estimators. The CF estimator has a smaller standard deviation than the FE-IV estimator in all the cases considered in Tables B.1 and B.2. For instance, when $\xi = 0$ and $N = 1000$, the standard deviation of the FE-IV estimator is about 21% higher than the standard deviation of the CF estimator.
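The summary statistics compared above (mean, SD, RMSE, and quartiles of the estimates across replications) are simple functions of the vector of replication estimates. The sketch below shows one way to compute them; the function name `summarize_estimates` and the quartile interpolation rule are assumptions made for illustration, since the text does not state which quartile definition was used.

```python
import math
import statistics


def summarize_estimates(estimates, true_value):
    """Summarize Monte Carlo estimates of a scalar parameter.

    Hypothetical helper: returns the mean, bias, sample SD, RMSE and
    quartiles of the replication estimates.
    """
    estimates = list(estimates)
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)  # sample SD (divides by R - 1)
    # RMSE is measured around the true parameter, not around the mean.
    rmse = math.sqrt(statistics.fmean([(b - true_value) ** 2 for b in estimates]))
    # Quartile rule is an assumption: linear interpolation, endpoints included.
    lq, median, uq = statistics.quantiles(estimates, n=4, method="inclusive")
    return {"mean": mean, "bias": mean - true_value, "sd": sd,
            "rmse": rmse, "lq": lq, "median": median, "uq": uq}


# Toy illustration with four made-up "estimates" of beta = 2.
stats = summarize_estimates([1.8, 2.0, 2.2, 2.4], true_value=2.0)
print(stats)
```

Because the squared RMSE equals the squared bias plus the variance of the estimates (computed with divisor $R$ rather than $R - 1$), two estimators can have similar RMSEs while trading bias against variability, which is the comparison made in the text.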
The efficiency of the CF estimators comes from the assumed specific functional forms for the endogenous variable and the random coefficients of equation (2.21).

For the next set of simulations, we take $y_{2it}$ to be a fraction in the open unit interval. Because the structural model (2.27) contains a trend, the default is to include a full set of time period dummies in every estimation technique except the CF approach. Table B.3 and Table B.4 show the simulation results for $y_{2it} \in (0,1)$. [4] Now, the bias in the CF estimate (and in all other estimates) is more pronounced, even though the CF estimating method results in the smallest bias in the estimate of $\beta$ among all the estimators. For example, when we consider the random trend model, for $y_{2it} \in (0,1)$ when $\xi \neq 0$, with $\hat\rho_{y_2 b} = .581$ and $N = 500$, the CF estimate of $\beta$ is 2.229 with an RMSE of .750, compared to $\hat\beta_{CF} = 2.039$ with an RMSE of .182, with $\hat\rho_{y_2 b} = .578$ and $N = 500$, for $y_{2it}$ with a large support set.

[4] When $\xi \neq 0$, Table B.3 and Table B.4 are obtained for $\lambda_{1a} = \lambda_{2a} = \lambda_{3a} = 0.21$, $\lambda_{4a} = 0.92$, $\lambda_{1b} = 0.2$, $\lambda_{2b} = \lambda_{3b} = 0.7$, $\lambda_{4b} = 0.94$, $\lambda_{1u} = \lambda_{2u} = 0.43$, $\lambda_{3u} = 0.84$, $\lambda_{g_2 z} = 0.87$, $\xi = 0.48$, and $\lambda_{g_2 v_2} = 0.11$. When $\xi = 0$, $\lambda_{1a} = \lambda_{2a} = \lambda_{3a} = 0.31$, $\lambda_{4a} = 0.82$, $\lambda_{1b} = 0.2$, $\lambda_{2b} = \lambda_{3b} = 0.7$, $\lambda_{4b} = 0.94$, $\lambda_{1u} = \lambda_{2u} = 0.22$, $\lambda_{3u} = 0.96$, $\lambda_{g_2 z} = 0.26$, and $\lambda_{g_2 v_2} = 0.97$ are used for Table B.3 and Table B.4.

As expected, when $\xi = 0$, it is the CF approach that has the smallest bias among all the estimators under consideration, since we simulate our dataset to satisfy assumptions (2.5) through (2.7). However, when $\xi = 0$, the evidence on the RMSEs of the CF and the FE-IV estimators is mixed. On the one hand, the RMSE of the FE-IV method is either clearly smaller or only marginally bigger than the RMSE of the CF method. For example, for the random trend model with a roughly continuous regressor, the RMSE of the FE-IV estimator is 0.599 vs. the RMSE of 0.875 of the CF estimator when $N = 500$.
The random trend model with a continuous explanatory variable results in an RMSE of 0.235 for the FE-IV estimator and an RMSE of 0.223 for the CF estimator when $N = 500$. On the other hand, the bias of the FE-IV estimator is clearly much more severe. The differences between the FE-IV and the CF approaches for the random trend model with $y_{2it} \in (0,1)$ and $\xi = 0$ illustrate the trade-off between bias and efficiency. For $\xi = 0$, both the CF method and the FE-IV approach are consistent. Further, the simulations in Tables B.3 and B.4 show that the FE-IV estimates are always less variable than the CF estimates for the random trend model with $y_{2it} \in (0,1)$. However, when $\xi = 0$, the bias in the CF estimator is significantly less than the bias in the FE-IV estimator. Overall, the simulation findings in Tables B.3 and B.4 support the idea that the CF estimating method produces more desirable estimates of $\beta$ when $\xi \neq 0$.

Applied economists are quite often reluctant to use control function methods, since control function approaches require the calculation of adjusted standard errors, which is not routinely done in standard econometric packages. Table B.5 contains detailed information (which can be partially seen in Tables B.1 through B.4) on the standard errors of the control function estimates of the APEs for the two models and the two cases of the endogenous explanatory variable considered. Column 1 shows whether $\xi$ is different from zero. Column 2 reports the cross-sectional sample size. Columns 3 through 7 contain the mean, regular standard error (Reg. SE), robust standard error (Rob. SE), adjusted standard error (Adj. SE), and standard deviation (SD) of the APE estimates from the CF approach from 500 replications. The regular SE are the standard errors from the second stage estimation for the CF approach without adjustment for heteroskedasticity and serial correlation and without taking the first stage estimation into account.
The robust SE are the second stage standard errors from the CF method, which are robust to both serial correlation and heteroskedasticity, and which are obtained ignoring the first stage estimation. The adjusted SE are the only standard errors that are adjusted for the first stage estimation (they are calculated according to formula (2.20)). Clearly, besides being the only theoretically appropriate estimates of the standard errors, the adjusted standard errors based on (2.20) approximate the standard deviations the best among the three standard errors considered.

To summarize, the simulation findings verify that when the joint distribution of $(a_i, b_i, v_{2it})$ given $z_i$ depends on $z_i$, the most robust estimator of the average partial effect in a correlated random coefficient balanced panel data model is the control function estimator from the two-step estimating method (2.17) and (2.18).

2.5 Empirical Application to Effects of Job Training on Worker Productivity

The method we propose for estimating the average partial effects from a correlated random coefficient panel data model is developed for a large $N$, small $T$ framework. However, real-life data limitations quite often do not allow researchers to use "truly" large $N$ datasets. Here, we follow a common real-life situation with a not so large $N$ dimension of the available data. Suppose we want to estimate an average partial effect of job training on worker performance measured by output scrap rates. Holzer, Block, Cheatham, and Knott (1993) explore the effects of a state-financed training grant program for manufacturing firms in Michigan using constant coefficient models. They use a three-year panel of data (1987-1989) from a unique survey of firms in Michigan that applied for training grants under the state's Michigan Job Opportunity Bank-Upgrade (MJOB) program. This program was designed to provide one-time grants to eligible firms.
An eligible firm was defined as a manufacturing company with 500 or fewer employees that was implementing new technology and had not received a grant before.

Let us estimate the effects of on-job-training on worker productivity allowing for both additive and multiplicative unobserved firm-specific effects. Why would we think that the random coefficient panel data model might be appropriate in this context? A possible justification for using an RC model is that some unobserved firm characteristics might cause firms to respond heterogeneously to the job training. For instance, an unobserved "atmosphere" in each firm might result in a heterogeneous effect of the annual hours of training per employee. Workers might feel more supported and encouraged in firms where the management promotes and advocates additional schooling and team efforts. In contrast, employees of firms with no policy on education beyond workers' current level might be discouraged from improving their present skills and effort. As a result, the same annual hours of job training in the two types of firms can lead to different outcomes for the output scrap rates. Since the effect of the job training on worker performance might be related to the extent of the unobserved support from a firm, we should consider a correlated version of the random coefficient model.

To fit the method from Section 2.2, we balance data from Holzer, Block, Cheatham, and Knott (1993), and obtain a sample of 45 firms that applied for an MJOB grant during 1988 and 1989. (The dataset is provided with Wooldridge (2002), and it is called JTRAIN.) Of these firms, 27 had received a grant and 18 had not. The balancing of the data is based on the availability of the data for the scrap rate (per 100 items) and the annual hours of job training per employee. Balancing the data raises concerns that the final dataset might be a non-random sample.
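The balancing step described above amounts to keeping only those firms for which the required variables are observed in every year of the panel. The following sketch illustrates the idea on a toy list of records; the helper `balance_panel`, the field names (`fcode`, `year`, `scrap`, `hrsemp`), and the numbers are illustrative assumptions, not the actual JTRAIN records or the code used here.

```python
from collections import defaultdict


def balance_panel(rows, id_key, time_key, required, periods):
    """Keep units with all `required` fields non-missing in every period.

    Hypothetical helper for illustration; missing values are coded as None.
    """
    wanted = set(periods)

    def is_complete(row):
        return all(row.get(k) is not None for k in required)

    # For each unit, collect the periods in which it has complete data.
    seen = defaultdict(set)
    for row in rows:
        if row[time_key] in wanted and is_complete(row):
            seen[row[id_key]].add(row[time_key])
    keep = {unit for unit, got in seen.items() if got == wanted}
    return [row for row in rows
            if row[id_key] in keep and row[time_key] in wanted and is_complete(row)]


# Made-up records: firm 2 lacks hrsemp in 1988, firm 3 has no 1989 record.
toy = [
    {"fcode": 1, "year": 1987, "scrap": 3.0, "hrsemp": 0.0},
    {"fcode": 1, "year": 1988, "scrap": 2.5, "hrsemp": 12.0},
    {"fcode": 1, "year": 1989, "scrap": 2.0, "hrsemp": 20.0},
    {"fcode": 2, "year": 1987, "scrap": 5.0, "hrsemp": 4.0},
    {"fcode": 2, "year": 1988, "scrap": 4.0, "hrsemp": None},
    {"fcode": 2, "year": 1989, "scrap": 4.5, "hrsemp": 8.0},
    {"fcode": 3, "year": 1987, "scrap": 1.0, "hrsemp": 2.0},
    {"fcode": 3, "year": 1988, "scrap": 1.2, "hrsemp": 3.0},
]
balanced = balance_panel(toy, "fcode", "year", ["scrap", "hrsemp"],
                         periods=[1987, 1988, 1989])
print(sorted({row["fcode"] for row in balanced}))
```

Dropping whole firms in this way keeps the panel rectangular, which the method requires, at the cost of the possible sample selection discussed in the text.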
Table B.6 contains summary statistics from the unbalanced and balanced datasets for each of the following groups: the entire sample, firms that received a grant in either 1988 or 1989, and firms that did not receive a grant. Comparison of the two panels of Table B.6 suggests that even though the proportions of firms that received the grant and firms that did not receive the grant changed, there are virtually no differences between the firms in the two datasets with regard to the scrap rates and the annual hours of job training. The balanced data seem to be very close to the unbalanced dataset in preserving the information on the scrap rates and the annual hours of training per worker. Of course, we should also be concerned that some unobserved firm characteristics played a role in the formation of these two samples. Since the MJOB program distributed grants to eligible firms on a first-come, first-served basis, we believe the grant distribution to be a fairly random process, and assume firms are not selected into the two samples based on their unobserved characteristics. Given these assumptions, we feel sufficiently confident in relying on the balanced dataset to proceed with our analysis.

Our goal is to evaluate the average partial effect of another hour of job training on worker productivity, relaxing the traditional assumption of a constant effect of the annual hours of training per worker on the output scrap rate. In the context of the correlated random coefficient approach, a simple panel data model of interest is

$$\log(\mathit{scrap}_{it}) = \alpha + b_{1i}\,\mathit{hrsemp}_{it} + \delta_1\,\mathit{d88}_t + \delta_2\,\mathit{d89}_t + a_{1i} + u_{it}, \tag{2.28}$$

where $\mathit{scrap}_{it}$ is firm $i$'s scrap rate in year $t$, $\mathit{hrsemp}_{it}$ is annual hours of on-job-training per employee, $a_{1i}$ is a firm-specific unobserved effect, and $u_{it}$ is an unobserved disturbance for firm $i$ in year $t$. We also allow different year intercepts in our structural model.
The unobserved firm fixed effect, $a_{1i}$, can contain unmeasured worker ability, capital, and managerial skill, which we think of as being roughly constant over the time period we consider. Since the unobserved firm effect includes worker ability, the annual hours of job training can be correlated with the unobserved effect. For example, firm managers might want to train workers with lower skills more to improve their productivity. Or, on the contrary, they might be interested in improving the productivity of relatively high skilled workers even more in order to utilize new hi-tech equipment that requires very well trained employees. Further, we should be concerned if $u_{it}$ is correlated with $\mathit{hrsemp}_{it}$. For example, a firm might hire more skilled workers and reduce the on-job-training requirements at the same time. The possibility of measurement error in $\mathit{hrsemp}_{it}$ should also be considered, since there might be some incentives for recipients of a grant to overstate, or non-recipients to understate, their training changes. If either (or both) of these is the case, we need to deal with the endogeneity of the annual hours of training in equation (2.28). Here, we exploit the fact that some firms received MJOB grants. We assume that grant designation in year $t$ is uncorrelated with the error term $u_{it}$ in every time period. This seems to be a reasonable assumption, since firms are eligible to receive a grant only once, and grants were distributed on a first-come, first-served basis, which we believe to be a fairly random process.
Thus, whether or not a grant is received in year $t$ should not be related to changes in the output scrap rates in any other year directly, but only through the changes in the annual hours of job training. [5] Thus, we use a dummy variable indicating whether or not a grant was received as an instrumental variable for the annual hours of training per worker, provided that $\mathit{hrsemp}_{it}$ and $\mathit{grant}_{it}$ are correlated.

[5] Using a constant coefficient approach, we regress the change in the log of the scrap rate between years $t - 1$ and $t$ on the change in the annual hours of on-job-training, the change in a dummy variable indicating whether a grant was received, and a lag of this variable. The changes are taken to eliminate the firm fixed effect. The results of the regression suggest that none of the three variables is either individually or jointly statistically significant at any conventional level of significance ($R^2$ for the regression is 0.032, with F-statistic = 0.96).

Clearly, the variable $\mathit{hrsemp}_{it}$ takes on only non-negative values, and we should consider taking a logarithmic transformation of this variable to run the first stage regression. At the same time, about 27% of all the observations have zero values for the annual hours of job training. Normally, we would transform a variable $x$ that has zero observations using the $\log(1 + x)$ transformation. However, for the purpose of our method, there is no gain in using this transformation, since the new variable, $\log(1 + x)$, will still take on only non-negative values. Thus, we choose to use the variable reporting the annual hours of on-job-training in levels. The first stage regression results with and without different year intercepts are reported in Table B.7. Columns (1) and (2) report the results from the first stage regression with and without different year intercepts when no variables other than $\mathit{grant}_{it}$ are used as explanatory variables.

Table B.8 reports the estimation results for equation (2.28) by the FE-IV and CF methods.
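The mechanics of the two-step CF procedure, the first stage (2.17) followed by pooled OLS of $y_{1it}$ on $\hat g_{it}$ as in (2.19), can be sketched as follows. This is only a minimal illustration under simplifying assumptions made here: a scalar instrument, $x_{it} = y_{2it}$, $w_t \equiv 1$, and made-up toy data with a constant slope. The helper names `ols` and `control_function_fit` are hypothetical, and the sketch returns point estimates only; it does not compute the adjusted standard errors in (2.20).

```python
import random


def ols(X, y):
    """Pooled OLS coefficients from the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * xc for x, xc in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def control_function_fit(y1, y2, z):
    """Two-step CF point estimates; y1, y2, z are N lists of length T."""
    n, T = len(z), len(z[0])
    zbar = [sum(row) / T for row in z]
    cells = [(i, t) for i in range(n) for t in range(T)]
    # Step 1: pooled OLS of y2_it on z^s_it = (1, z_it, zbar_i).
    d2 = ols([[1.0, z[i][t], zbar[i]] for i, t in cells],
             [y2[i][t] for i, t in cells])
    v2 = [[y2[i][t] - d2[0] - d2[1] * z[i][t] - d2[2] * zbar[i]
           for t in range(T)] for i in range(n)]
    v2bar = [sum(row) / T for row in v2]
    # Step 2: pooled OLS of y1_it on g_hat_it (scalar case, w_t = 1).
    G = []
    for i, t in cells:
        x, v, zb, vb = y2[i][t], v2[i][t], zbar[i], v2bar[i]
        G.append([1.0, zb, vb, zb * vb, x, zb * x, vb * x, zb * vb * x,
                  v * x, z[i][t] * v * x, v])
    theta = ols(G, [y1[i][t] for i, t in cells])
    return d2, theta


# Toy data (assumed, constant slope of 2) to exercise the mechanics only.
rng = random.Random(1)
n, T = 300, 5
z = [[rng.gauss(1, 1) for _ in range(T)] for _ in range(n)]
v = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(n)]
y2 = [[0.5 + 1.0 * z[i][t] + 0.3 * sum(z[i]) / T + v[i][t]
       for t in range(T)] for i in range(n)]
y1 = [[1.0 + 2.0 * y2[i][t] + 0.5 * v[i][t] + 0.1 * rng.gauss(0, 1)
       for t in range(T)] for i in range(n)]
delta2_hat, theta_hat = control_function_fit(y1, y2, z)
print(len(delta2_hat), len(theta_hat))
```

In this scalar case $\hat g_{it}$ has eleven elements, ordered as in its definition; because the first-stage residuals are generated regressors, conventional second-stage standard errors would not be valid without the adjustment in (2.20).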
Overall, the APE estimates of the annual hours of on-job-training by the CF approach for equation (2.28) are bigger than the corresponding estimates from the same equation by the FE-IV method [see columns (1) and (2) vs. columns (5) and (6) of Table B.8]. For example, the CF approach in regression (6) suggests that 10 more hours of job training per worker are estimated to reduce the scrap rate by about 37%. For the firms in the sample, the average amount of job training over the three-year period is 15.6 hours per employee, with a minimum of zero and a maximum of 154. Comparing regressions (6) and (2) from Table B.8, we can say that the FE-IV estimate of the APE of the average annual hours of job training per worker is about 18 times smaller in magnitude and is statistically insignificant. Overall, the estimates from the CF method (regressions (5) and (6)) are more statistically significant and substantially larger in magnitude for the two regression specifications considered than the corresponding FE-IV estimates (regressions (1) and (2)).

What do the differences in the two estimation methods suggest? They suggest that even if the unobserved "atmosphere" of a firm is independent of the grant designation, the correlation between hours of on-site job training and the firm-specific unobserved effect is different for firms that received a grant and firms that did not. Indeed, it is natural to think that the effect of the unobserved worker ability, managerial skills, and a firm's "atmosphere" on hours of job training might be stronger among those firms that received grants. In other words, using the language from Section 2.2, the joint distribution of $(a_i, b_i, v_{2it})$ is different for those firms that received a grant and those that did not.
To address the rationale for a model with both additive and multiplicative individual heterogeneity in the context of this application, we test whether the coefficients on different sets of variables from the second stage are statistically different from zero. Several Wald statistics are calculated. First, the coefficients on all the explanatory variables except $w_t$ and $x_{it}$ (and year dummies if included) are restricted to zero. For specificity, let us use regression (6) from Table B.8. The Wald statistic for this regression equals 13.00, which allows us to reject the null hypothesis that ignoring endogeneity and heterogeneity is appropriate for our data, with a p-value of 0.072 (the critical chi-squared value is 12.02 with 7 degrees of freedom at the 10% level). Second, to account for the endogeneity of the main explanatory variable, we restrict to zero the coefficients on all the explanatory variables except $w_t$, $x_{it}$, and $\hat v_{2it}$ (and year dummies if included). The Wald statistic for this test is 13.33, which exceeds the critical value of 12.59 with 6 degrees of freedom at the 5% level (the p-value is 0.038). Third, to reflect assumption (2.5), the coefficients on all the explanatory variables except $w_t$, $x_{it}$, $\hat v_{2it}$, and $\hat v_{2it} x_{it}$ (and year dummies if included) are restricted to zero. Now, the Wald statistic is 13.23, which is above the critical value of 11.07 with 5 degrees of freedom at the 5% level (the p-value is 0.021).

Finally, we check whether the coefficients on the interaction terms between the averages of the first-stage residuals, $\bar{\hat v}_{2i}$, and the averages of the exogenous variables, $\bar z_i$, are jointly different from zero. To do so, we consider three possibilities: first, all such terms are jointly significant; second, only the terms originating from our assumption on the additive heterogeneity, $a_{1i}$, are jointly different from zero; and, third, only the terms introduced by the assumption on the multiplicative heterogeneity, $b_{1i}$, are jointly important.
The resulting Wald statistics for regression (6) are 11.74, 2.47, and 11.32, respectively. Thus, we can reject the first and the last null hypotheses at the 1% level, with chi-squared critical values of 9.21 and 6.63 for 2 and 1 degrees of freedom, respectively. In contrast, we cannot reject the null hypothesis that the interaction term associated with the additive unobserved fixed effect has a zero coefficient at any conventional level of significance (the chi-squared critical value for 1 degree of freedom at the 10% level is 2.71). Thus, there is evidence at the 1% level of significance that the conditional variance-covariance matrix, $\mathrm{Var}(b_{1i}, x_{it} \mid z_i)$, is heteroskedastic. Overall, we can conclude that the CF approach should perform favorably in comparison with the FE-IV method for the output scrap rates application.

We are unlikely to be able to draw conclusions about the causal effect of on-job-training on the output scrap rate with only one explanatory variable, $\mathit{hrsemp}_{it}$, unless other control variables are also accounted for. We consider the logarithm of the dollar value of annual sales, $\mathit{lsales}_{it}$, the logarithm of the number of employees, $\mathit{lemploy}_{it}$, and the logarithm of the average annual employee salary, $\mathit{lavgsal}_{it}$, as additional explanatory variables. The summary statistics for these variables are reported in Table B.9. The first stage regression results with and without different year intercepts are provided in Table B.7. Regressions (3) and (4) employ the additional explanatory variables available. In these regressions, the control variables $\mathit{lsales}_{it}$ and $\mathit{lemploy}_{it}$ and their time averages are both individually and jointly statistically insignificant at any conventional level. The logarithm of the average annual employee salary, $\mathit{lavgsal}_{it}$, and its time average are individually and jointly significant at least at the 10% level of significance. Based on these results we consider the average annual salary as the only valid additional explanatory variable.
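The p-values and rejection decisions quoted for the Wald tests in this section can be checked directly from the chi-squared distribution. The sketch below is an illustration added for that purpose; the helper name `chi2_sf` is an assumption, and it uses the standard recurrence for the upper incomplete gamma function, which is valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1.

    Hypothetical helper: starts from the closed form for df = 2 (or df = 1)
    and adds one term per additional pair of degrees of freedom.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # df = 1
    while a < df / 2.0:
        q += math.exp(-y) * y ** a / math.gamma(a + 1.0)
        a += 1.0
    return q


# (Wald statistic, degrees of freedom) pairs reported for regression (6).
tests = [(13.00, 7), (13.33, 6), (13.23, 5), (11.74, 2), (2.47, 1), (11.32, 1)]
for stat, df in tests:
    print(f"W = {stat:5.2f}, df = {df}: p-value = {chi2_sf(stat, df):.3f}")
```

A p-value below the chosen significance level corresponds to the statistic exceeding the matching critical value, so the two ways of reporting the tests in the text are equivalent.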
Thus, the next model of interest is

$$\log(\mathit{scrap}_{it}) = \alpha + b_{1i}\,\mathit{hrsemp}_{it} + b_{2i}\,\mathit{lavgsal}_{it} + \delta_1\,\mathit{d88}_t + \delta_2\,\mathit{d89}_t + a_{1i} + u_{it}. \tag{2.29}$$

The estimation results for this model are provided in column (8) of Table B.8. Column (7) of the same table contains the results for the case without year dummies. The CF estimates of the APE of the annual hours of job training decrease slightly when we use two explanatory variables in comparison with the case with only one regressor besides year dummies [see columns (7) and (8) vs. columns (5) and (6) of Table B.8]. Once again, the estimates from the CF method (regressions (7) and (8)) are more statistically significant and substantially larger in magnitude for the two regression specifications considered than the corresponding FE-IV estimates (regressions (3) and (4)).

Finally, we might think that the assumption of a heterogeneous response to the annual average salary per worker is hard to justify. Indeed, the same amount of annual salary per employee is likely to have the same impact on the output scrap rate across different firms. Since the firms are located in the same area (Michigan), workers of these firms are expected to get comparable value out of the same monetary compensation for their work. Consequently, we consider the following model:

$$\log(\mathit{scrap}_{it}) = \alpha + b_{1i}\,\mathit{hrsemp}_{it} + \beta_2\,\mathit{lavgsal}_{it} + \delta_1\,\mathit{d88}_t + \delta_2\,\mathit{d89}_t + a_{1i} + u_{it}. \tag{2.30}$$

The results for this model using the CF approach when the year dummies are excluded and included are provided in columns (9) and (10) of Table B.8, respectively. The estimates from the CF method are bigger and more statistically significant than the corresponding FE-IV estimates reported in columns (3) and (4) of Table B.8. Interestingly, Table B.8 shows that the estimates of the adjusted standard errors for the CF method are very close to the estimates of the robust standard errors that are calculated ignoring the first stage estimation.
In contrast, the simulation results [see Table B.5] indicate that the adjustment in the standard errors for the first stage estimation is not trivial.

Based on the results reported in columns (5) through (10) of Table B.8, we can conclude that the CF approach estimates of the APE of the annual hours of job training per worker are robust to different model specifications. Further, they are larger in magnitude and statistically more significant than the FE-IV estimates (regressions (1) through (4) of Table B.8) for all models considered.

2.6 Conclusion

This paper studies CRC balanced panel data models with endogenous regressors as in Murtazashvili and Wooldridge (2005). However, in addition to allowing some explanatory variables to be correlated with the idiosyncratic error, we also let the joint distribution of the endogenous regressors and the individual heterogeneity conditional on the instruments depend on the instruments. In particular, we allow the endogenous regressors to be roughly continuous. We use a control function approach, which introduces residuals from the reduced form for the endogenous regressors as covariates in the structural model. We propose a two-step method to account for heterogeneity and endogeneity and to consistently estimate APEs in CRC panel data models with endogenous (roughly) continuous regressors. Further, we relax the assumptions in Murtazashvili and Wooldridge (2005) by allowing the individual slopes in a CRC model to vary over time.

Monte Carlo simulations indicate that in finite samples the control function approach to estimating the CRC balanced panel data model with time-invariant individual heterogeneity performs better than other estimators when the joint distribution of the individual heterogeneity and the endogenous regressors conditional on the instruments depends on these instruments.
Finally, we apply the new method to the problem of estimating the APEs of the annual hours of on-the-job training on the output scrap rates for manufacturing firms in Michigan, extending the work of Holzer, Block, Cheatham, and Knott (1993) to allow for a firm-specific effect. The control function approach we propose delivers APEs of the annual hours of job training on the output scrap rates that are larger in magnitude and statistically more significant than the APE estimates from the FE-IV approach.

CHAPTER 3

ESTIMATION OF A DYNAMIC BINARY RESPONSE PANEL DATA MODEL WITH AN ENDOGENOUS REGRESSOR, WITH AN APPLICATION TO THE ANALYSIS OF POVERTY PERSISTENCE IN RURAL CHINA

3.1 Introduction

Dynamic binary response models have considerable appeal for a diverse range of policy analyses in which identifying or controlling for state dependence is important and one is interested in a binary outcome.1 When the outcome is also affected by an endogenous treatment, an additional complication arises in efforts to identify the effects of the treatment on the outcome and on state dependence. In this paper, we propose a parametric approach to estimating binary response dynamic panel data models with endogenous contemporaneous regressors. Our method combines the approach to solving the unobserved heterogeneity and the initial conditions problems in non-linear dynamic models (Wooldridge, 2005c) with a control function approach to controlling for endogeneity of contemporaneous explanatory variables in cross-sectional non-linear models (e.g., Rivers and Vuong, 1988; Smith and Blundell, 1986).

1 The research areas for which dynamic binary response models have proven important include labor force participation (Heckman and Willis, 1977; Hyslop, 1999), the probability of receiving welfare (Bane and Ellwood, 1986), the experience of social exclusion (Poggi, 2007), and the identification of adverse selection in insurance markets (Chiappori and Salanie, 2000).
Among other possible applications, the relevance and potential strength of our approach can be demonstrated in analyses of how migration in developing countries affects the poverty status of residents living in migrant source communities. In this setting, we are faced with two important sources of endogeneity. First, the migration decision of community residents may be driven by negative shocks that also raise the probability that households are poor. Second, we expect there to be correlation between migration decisions and the unobserved characteristics of individuals and communities, which may also affect poverty status. Our approach allows us to consistently estimate parameters of a dynamic binary response panel data model with unobserved heterogeneity when some of the continuous contemporaneous explanatory variables are endogenous.

To account for the endogeneity in migration from home communities, we employ a control function approach in which residuals from the reduced form for the endogenous regressor are introduced as covariates in the structural model. To deal with the dynamic nature of the model, we consider two possibilities. We first use a "pure" random effects approach that assumes the unobserved heterogeneity to be independent of the observed exogenous covariates and initial conditions. Next, we relax this strong assumption by employing the dynamic correlated random effects model introduced by Wooldridge (2005c). This approach is not only more relevant for analyses of poverty persistence, but also more flexible and computationally straightforward than alternative approaches currently in use.

We then implement our empirical approach using panel household and village data from rural China. Following the market-oriented reforms introduced in the early 1980s, there was a pronounced decline in the proportion of China's population living below the poverty line (Ravallion and Chen, 2007).
While much of the literature examining growth in China's rural areas has focused on incentive effects related to reform and on the role of local non-farm employment, there has been relatively little research demonstrating the relationship between reduction of barriers to migration from villages and the probability that households within the village have consumption levels below the poverty line. Our empirical analysis demonstrates an economically significant causal relationship between reduction of barriers to migration and poverty reduction in rural China.

The paper proceeds as follows. In Section 3.2 below, we first review approaches to estimation of dynamic binary response panel data models, and then propose an approach to estimating these models when there is an endogenous regressor. In Section 3.3, we introduce the rural China setting, and develop a specific implementation of the empirical model and strategy for identifying the effect of migration on poverty within China's villages. In Section 3.4, we discuss our estimation results and the performance of the model, and then in Section 3.5 we summarize our results and discuss directions for future research.

3.2 Estimation of a Dynamic Binary Response Panel Data Model with an Endogenous Regressor

3.2.1 Dynamic Binary Response Panel Data Models

Dynamic binary response panel data models with unobserved heterogeneity have been used extensively in theoretical and empirical studies. Both parametric and semiparametric methods have been proposed to solve the initial conditions problem and to obtain consistent estimates of model parameters when all explanatory variables other than the lagged dependent variable are strictly exogenous.2 Semiparametric methods allow estimation of parameters without specifying a distribution of the unobserved heterogeneity, but they are often overly restrictive with respect to the strictly exogenous covariates.
Honoré and Kyriazidou (2000), for example, propose an approach that does not allow for discrete explanatory variables. More importantly, because the semiparametric methods do not specify the distribution of the unobserved heterogeneity, the absolute importance of any of the explanatory variables in a dynamic binary response panel data model cannot be determined. Models with no assumption on either the unobserved effects or the initial conditions, or their relationship to other covariates, are best described as fixed effects models, and the semiparametric approach of Honoré and Kyriazidou (2000) falls into this class of models.3

2 With a structural binary outcome model that allows for unobserved effects, one must be concerned that bias could be introduced through a systematic relationship between an unobserved effect and the initial value of the dependent variable. This is known as the initial conditions problem.

3 We follow Chay and Hyslop (2000) in classifying models requiring no assumption on unobservable effects or initial conditions as fixed effect models, and refer to random effect models as those in which one specifies a distribution of unobserved effects and initial conditions given exogenous explanatory variables.

Due to their computational simplicity, parametric methods have received greater attention than semiparametric methods. There are four main parametric approaches that have been employed for estimation of dynamic nonlinear panel data models with strictly exogenous covariates other than the lagged dependent variable. All four approaches use conditional maximum likelihood estimation (CMLE). The first approach treats the initial conditions for each cross-sectional unit, $y_{i0}$, as nonrandom variables.
If, in addition, the unobserved effects, $c_i$, are assumed to be independent of $z_i$, one obtains the density of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ given the initial conditions, $y_{i0}$, and the exogenous explanatory variables, $z_i = (z_{i1}, z_{i2}, \ldots, z_{iT})$, by integrating out the $c_i$. We refer to the relationship between the observed exogenous covariates and the unobserved heterogeneity in the first method as a "pure" random effects relationship because we assume $c_i$ to be independent of $z_i$ and $y_{i0}$. While this method does provide a way to obtain consistent estimates of the model parameters, nonrandomness of the initial conditions requires the very strong assumption of independence between the initial conditions and the unobserved effects.

A second parametric approach involves treating the initial conditions as random and specifying a density of $y_{i0}$ given $(z_i, c_i)$. With this density, one can then obtain the joint distribution of all the outcomes, $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$, conditional on unobserved heterogeneity, $c_i$, and strictly exogenous observables, $z_i$. One important drawback of this approach lies in the difficulty of specifying the density of $y_{i0}$ given $(z_i, c_i)$.4

A third method, proposed by Heckman (1981), suggests approximating a density of the initial conditions, $y_{i0}$, given $(z_i, c_i)$ and specifying a density of the unobserved effects given the strictly exogenous explanatory variables. The density of $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$ given $z_i$ can then be obtained. While Heckman's approach avoids the drawback of the second method, it is computationally challenging. Since both the second and the third methods explicitly specify a distribution of the unobserved heterogeneity conditional on strictly exogenous observables and a distribution of the initial conditions conditional on the unobserved effects and the exogenous covariates, they can be classified as random effects models.

4 More details on this approach and potential drawbacks can be found in Wooldridge (2002), page 494.
Finally, an approach proposed by Wooldridge (2005c) recommends obtaining a joint distribution of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $(y_{i0}, z_i)$ rather than a distribution of $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $z_i$ as in Heckman's approach. For this method to work, we need to specify a density of $c_i$ given $(y_{i0}, z_i)$.5 This fourth approach is more flexible and requires fewer computational resources than Heckman's technique. In this method, we call the relationship between the observed exogenous covariates and the unobserved heterogeneity a "correlated" random effects relationship because we allow $c_i$ to be a linear function of $z_i$ and $y_{i0}$.

In the next section we develop a theoretical method that consistently estimates parameters of a dynamic binary response panel data model when the contemporaneous explanatory variables are not strictly exogenous. To do so, we employ a control function approach, originally introduced by Smith and Blundell (1986) and Rivers and Vuong (1988). The main idea of the control function approach is to add control variables into the structural model to control for endogeneity. Since we will consider a model with two possible sources of endogeneity (the correlation between the unobserved heterogeneity and a regressor, and the correlation between a regressor and the structural error), we model the relationships among the unobserved effect, exogenous covariates, and the error from the reduced form equation for the endogenous explanatory variable.

5 The specification of this density in Wooldridge's method is motivated by Chamberlain's (1980) device.

3.2.2 A General Approach to Estimation

Our specification of the binary response model assumes that for a random draw $i$ from the population, there is an underlying latent variable model:

$$y^*_{1it} = z_{1it}\beta_1 + \beta_2 y_{2it} + \rho\, y_{1i,t-1} + c_{1i} + u_{1it}, \tag{3.1}$$

$$y_{1it} = 1[y^*_{1it} \ge 0], \quad t = 1, \ldots, T, \tag{3.2}$$

where $z_{1it}$ is a $1 \times (K-1)$ vector of strictly exogenous covariates, which may contain a constant term, $y_{2it}$ is an endogenous covariate, $c_{1i}$ is an unobserved effect, and $u_{1it}$ is an idiosyncratic serially uncorrelated error such that $\mathrm{Var}(u_{1it}) = 1$. $1[\cdot]$ is an indicator function. We assume a sample of size $N$ randomly drawn from the population, and that $T$, the number of time periods, is fixed in the asymptotic analysis. For simplicity, we assume a balanced panel. Let $\beta$ denote $(\beta_1', \beta_2, \rho)'$, which is a $(K+1) \times 1$ vector of all the parameters.

Importantly, this model allows the probability of success at time $t$ to depend not only on unobserved heterogeneity, $c_{1i}$, but also on the outcome in $t-1$. A key assumption is that the dynamics are correctly specified; dynamic completeness of the model implies that the error term is serially uncorrelated. Thus, assuming that model (3.1) is correctly specified dynamically, we assume that the error term $u_{1it}$ is serially uncorrelated. Allowing $u_{1it}$ to have arbitrary serial correlation would suggest including more lags of the dependent variable in (3.1). For example, in the simplest case of a linear model, when an error term, $u_{it}$, follows an AR(1) process, a simple calculation shows that the dependent variable, say $y_{it}$, actually depends not only on $y_{i,t-1}$ but also on $y_{i,t-2}$. Similarly, in the context of our model, one should have a good reason to expect a serially correlated error term $u_{1it}$ and yet to include only one lag of $y_{1it}$.

Further, we make additional assumptions on strict exogeneity of the contemporaneous explanatory variables. First, some of the contemporaneous covariates, $z_{1it}$, are assumed to be strictly exogenous (conditional on $c_{1i}$). Second, we allow some of the explanatory variables, here represented by the scalar $y_{2it}$, to be endogenous:
$$\begin{aligned}
y_{2it} &= z_{1it}\delta_1 + z_{2it}\delta_2 + c_{2i} + u_{2it} \\
&= z_{it}\delta + \bar{z}_i\lambda + a_{2i} + u_{2it} \\
&= z_{it}\delta + \bar{z}_i\lambda + v_{2it},
\end{aligned} \tag{3.3}$$

where $t = 1, \ldots, T$, $c_{2i}$ is an unobserved effect, and $u_{2it}$ is an idiosyncratic serially uncorrelated error with $\mathrm{Var}(u_{2it}) = \sigma_2^2$. Let $z_{it} = (z_{1it}, z_{2it})$ be a $1 \times L$ vector of instrumental variables, with $L \ge K$, i.e., we assume the vector $z_{2it}$ contains at least one element. We employ the Mundlak-Chamberlain device for the unobserved effect, $c_{2i}$, and this is reflected in line two of equation (3.3). We replace $c_{2i}$ with its projection onto the time averages of all the exogenous variables: $c_{2i} = \bar{z}_i\lambda + a_{2i}$. Then, the new composite error term is $v_{2it} = a_{2i} + u_{2it}$. Further, $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$ and $\delta = (\delta_1', \delta_2')'$. We follow Rivers and Vuong (1988) and refer to (3.3) as a reduced form equation.

Next, consider the relationship between $u_{1it}$ and $u_{2it}$. We assume that $(u_{1it}, u_{2it})$ has a zero mean, bivariate normal distribution and is independent of $z_i = (z_{1i}, z_{2i}) = (z_{i1}, z_{i2}, \ldots, z_{iT})$. Note that under joint normality of $(u_{1it}, u_{2it})$, with $\mathrm{Var}(u_{1it}) = 1$, we can write

$$u_{1it} = \theta u_{2it} + e_{1it} = \theta(v_{2it} - a_{2i}) + e_{1it}, \tag{3.4}$$

where $\theta = \eta/\sigma_2^2$, $\eta = \mathrm{Cov}(u_{1it}, u_{2it})$, $\sigma_2^2 = \mathrm{Var}(u_{2it})$, and $e_{1it}$ is a serially uncorrelated random term, which is independent of $z_i$ and $u_{2it}$. The absence of serial correlation in $e_{1it}$ follows from the fact that $u_{1it}$ and $u_{2it}$ are both assumed not to suffer from serial correlation. If there were no lagged dependent variables on the right hand side of equation (3.1), there would be little need to worry about possible serial correlation in the error term $u_{2it}$ of equation (3.3), as long as we assume that $u_{1it}$ is also serially uncorrelated. However, we are interested in a dynamic model, and the assumption of no serial correlation in $u_{2it}$ is crucial for equation (3.4). Since equation (3.3) is essentially a reduced form equation for the endogenous variable $y_{2it}$, the assumption of no serial correlation in $u_{2it}$ (and in $e_{1it}$, as a result) is appropriate in the context of our model.
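The decomposition in (3.4) can be checked numerically. The following is a minimal sketch, not part of the original analysis: it draws zero-mean bivariate normal errors with Var(u1) = 1 and compares the sample slope of u1 on u2 and the residual variance with eta/sigma_2^2 and 1 - xi^2, where xi = Corr(u1, u2). The function names, the sample size, and the parameter values passed to them are illustrative assumptions.

```python
import math
import random


def control_function_params(eta, sigma2_sq):
    """Population quantities implied by (3.4) when Var(u1) = 1."""
    theta = eta / sigma2_sq              # theta = Cov(u1, u2) / Var(u2)
    xi = eta / math.sqrt(sigma2_sq)      # Corr(u1, u2), because Var(u1) = 1
    var_e1 = 1.0 - xi ** 2               # Var(e1) = 1 - xi^2
    return theta, xi, var_e1


def simulate_errors(n, eta, sigma2_sq, seed=0):
    """Draw n pairs (u1, u2) with u1 = theta * u2 + e1 and e1 independent of u2."""
    rng = random.Random(seed)
    theta, _, var_e1 = control_function_params(eta, sigma2_sq)
    draws = []
    for _ in range(n):
        u2 = rng.gauss(0.0, math.sqrt(sigma2_sq))
        e1 = rng.gauss(0.0, math.sqrt(var_e1))
        draws.append((theta * u2 + e1, u2))
    return draws


def sample_moments(draws):
    """Sample slope of u1 on u2 and the variance of the implied residual."""
    n = len(draws)
    m1 = sum(u1 for u1, _ in draws) / n
    m2 = sum(u2 for _, u2 in draws) / n
    cov = sum((u1 - m1) * (u2 - m2) for u1, u2 in draws) / n
    var2 = sum((u2 - m2) ** 2 for _, u2 in draws) / n
    slope = cov / var2
    resid_var = sum(((u1 - m1) - slope * (u2 - m2)) ** 2 for u1, u2 in draws) / n
    return slope, resid_var
```

For example, `sample_moments(simulate_errors(50000, 0.6, 2.25))` should return values close to the population quantities given by `control_function_params(0.6, 2.25)`; the simulated u1 has unit variance by construction, which mirrors the normalization in (3.1).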
Equation (3.4) is essentially an assumption regarding the contemporaneous endogeneity of $y_{2it}$. It suggests that the contemporaneous $u_{2it}$ is sufficient for explaining the relation between $u_{1it}$ and $v_{2it}$. In other words, once we somehow account for endogeneity of $y_{2it}$ in period $t$, we might think that $y_{2it}$ becomes "completely" exogenous, and we can estimate the parameters of interest using standard methods valid for exogenous explanatory variables. However, there is the possibility of an additional feedback from the endogenous variable $y_2$ in different time periods to the main dependent variable of interest, $y_1$, at time $t$. This possibility arises because we let the reduced form equation for the endogenous variable, $y_{2it}$, contain a time-constant unobserved effect, $a_{2i}$.

From assumption (3.4), $e_{1it} \sim \mathrm{Normal}(0, \sigma_{e_1}^2)$, where $\sigma_{e_1}^2 = 1 - \xi^2$, since $\mathrm{Var}(u_{1it}) = 1$, and $\xi = \mathrm{Corr}(u_{1it}, u_{2it})$. We can then write

$$\begin{aligned}
y_{1it} &= 1[x_{1it}\beta + c_{1i} + \theta(v_{2it} - a_{2i}) + e_{1it} \ge 0] \\
&= 1[x_{1it}\beta + \theta v_{2it} + (c_{1i} - \theta a_{2i}) + e_{1it} \ge 0] \\
&= 1[x_{1it}\beta + \theta v_{2it} + c_{0i} + e_{1it} \ge 0],
\end{aligned} \tag{3.5}$$

where $t = 1, \ldots, T$, $x_{1it} = (z_{1it}, y_{2it}, y_{1i,t-1})$, $\beta = (\beta_1', \beta_2, \rho)'$, and $c_{0i} = c_{1i} - \theta a_{2i}$ is a composite unobserved effect. Since the unobserved effect $c_{0i}$ is present in equation (3.5), we should consider the relation between $c_{0i}$ and the explanatory variables in equation (3.5). Importantly, the composite unobserved effect $c_{0i}$ is a function of $v_{2it}$, $t = 1, \ldots, T$, by construction:

$$c_{0i} = c_{1i} - \theta a_{2i} = c_{1i} - \theta(v_{2it} - u_{2it}), \quad t = 1, \ldots, T.$$

Thus, in order to obtain consistent estimates of the parameters from equation (3.5), we must take into account the relation between $c_{0i}$ and $v_{2it}$ in different time periods.

First, we use a "pure" random effects approach, i.e., we assume that

$$c_{0i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(\alpha_0 \bar{v}_{2i}, \sigma_{a_1}^2), \quad t = 1, \ldots, T, \tag{3.6}$$

which can be written as $c_{0i} = \alpha_0 \bar{v}_{2i} + a_{1i}$, where $a_{1i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(0, \sigma_{a_1}^2)$ and is independent of $(z_i, y_{1i0}, v_{2i})$, where $\bar{v}_{2i} = T^{-1}\sum_{t=1}^{T} v_{2it}$ and $v_{2i} = (v_{2i1}, v_{2i2}, \ldots, v_{2iT})$. While a limiting assumption in many potential applications, the "pure" RE assumption (3.6) may be relevant for certain cases. In particular, when every individual in the initial time period is in the same state (e.g., we are interested in the population of people who smoke), assumption (3.6) might be appropriate. Further, since we assume that the composite unobserved effect, $c_{0i}$, is independent of the initial condition, $y_{1i0}$, it is natural to think that the $v_{2it}$ in different time periods have equal impacts on $c_{0i}$. Consequently, we employ $\bar{v}_{2i}$ as a sufficient statistic for describing the relation between $c_{0i}$ and the $v_{2it}$ in different time periods. Then, under assumptions (3.1)-(3.4) and (3.6), we can rewrite equation (3.5) as

$$y_{1it} = 1[x_{1it}\beta + \theta v_{2it} + \alpha_0 \bar{v}_{2i} + a_{1i} + e_{1it} \ge 0]. \tag{3.7}$$

Clearly, the estimates of

$$\beta_* = \frac{\beta}{\sqrt{\sigma_{e_1}^2 + \sigma_{a_1}^2}}, \quad \theta_* = \frac{\theta}{\sqrt{\sigma_{e_1}^2 + \sigma_{a_1}^2}}, \quad \text{and} \quad \alpha_{0*} = \frac{\alpha_0}{\sqrt{\sigma_{e_1}^2 + \sigma_{a_1}^2}}$$

can be obtained using standard random effects probit software by including $\hat{\bar{v}}_{2i}$ in each time period in the list of the explanatory variables along with $x_{1it}$ and $\hat{v}_{2it}$, where

$$\hat{\bar{v}}_{2i} = \frac{1}{T}\sum_{t=1}^{T} \hat{v}_{2it}.$$

However, as we discussed earlier, the assumption of independence between the unobserved effect and the initial conditions and the exogenous covariates is often too restrictive. In particular, the "pure" random effects assumption is unrealistic in the context of the application to poverty persistence that we will examine below. For instance, unobserved dimensions of ability are very likely to be related to poverty status not only in the initial period, but also in future periods.

Rather than using a "pure" random effects approach, we prefer building on the dynamic "correlated" random effects model introduced by Wooldridge (2005c).
Instead of the conditional distribution of $c_{0i}$ assumed in (3.6), we now assume that

$$c_{0i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(v_{2i}\alpha_0 + z_i\alpha_1 + \alpha_2 y_{1i0}, \sigma_{a_1}^2), \tag{3.8}$$

which follows from writing $c_{0i} = v_{2i}\alpha_0 + z_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i}$, where $a_{1i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(0, \sigma_{a_1}^2)$ and is independent of $(z_i, y_{1i0}, v_{2i})$. Since we allow for a nonzero correlation between the composite unobserved effect, $c_{0i}$, and the initial condition, $y_{1i0}$, the $v_{2it}$ in different time periods might have different effects on $c_{0i}$. Thus, we let the $v_{2it}$ from different time periods have unequal "weights" for explaining $c_{0i}$. Assumption (3.8) is an extension of Chamberlain's assumption for a static probit model to the dynamic setting. To allow for correlation between $c_{0i}$ and $z_i$ and $y_{1i0}$, we assume a conditional normal distribution with linear expectation and constant variance. Assumption (3.8) is a restrictive assumption since it specifies a distribution for $c_{0i}$ given $z_i, y_{1i0}, v_{2i}$. However, it is an improvement on the "pure" random effects approach in that it allows for some dependence between the unobserved effect and the vector of all explanatory variables across all time periods. Then, under assumptions (3.1)-(3.4) and (3.8), we can rewrite equation (3.5) as

$$\begin{aligned}
y_{1it} &= 1[x_{1it}\beta + \theta v_{2it} + c_{0i} + e_{1it} \ge 0] \\
&= 1[x_{1it}\beta + \theta v_{2it} + v_{2i}\alpha_0 + z_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i} + e_{1it} \ge 0].
\end{aligned} \tag{3.9}$$

Equation (3.9) suggests that we can estimate $\beta_* = \beta/\sqrt{\sigma_{e_1}^2 + \sigma_{a_1}^2}$ and $\theta_* = \theta/\sqrt{\sigma_{e_1}^2 + \sigma_{a_1}^2}$, along with $\alpha_{0*} = \alpha_0/\sqrt{\sigma_{e_1}^2 + \sigma_{a_1}^2}$, $\alpha_{1*} = \alpha_1/\sqrt{\sigma_{e_1}^2 + \sigma_{a_1}^2}$, and $\alpha_{2*} = \alpha_2/\sqrt{\sigma_{e_1}^2 + \sigma_{a_1}^2}$, using standard random effects probit software by including $\hat{v}_{2i}$, $z_i$, and $y_{1i0}$ in each time period in the list of the explanatory variables along with $x_{1it}$ and $\hat{v}_{2it}$.

3.2.3 Allowing for Serial Correlation of Errors in the First Stage

If the first stage error, $u_{2it}$, is serially correlated, we must modify our two-step estimating procedure.
To be specific, assume $u_{2it}$ follows an AR(1) process: $u_{2it} = \pi u_{2i,t-1} + e_{2it}$, where $e_{2it}$ is a white noise error with $\mathrm{Var}(e_{2it}) = \sigma_{e_2}^2$. Then, under assumption (3.4),

$$\begin{aligned}
\mathrm{Cov}(e_{1it}, e_{1i,t-1}) &= \mathrm{Cov}(u_{1it} - \theta u_{2it},\; u_{1i,t-1} - \theta u_{2i,t-1}) \\
&= \mathrm{Cov}(u_{1it} - \theta\pi u_{2i,t-1} - \theta e_{2it},\; u_{1i,t-1} - \theta u_{2i,t-1}) \\
&= \pi\theta^2 E(u_{2i,t-1}^2),
\end{aligned}$$

which is nonzero unless either $\pi = 0$ or $\theta = 0$. Clearly, assumption (3.4) is no longer appropriate and needs to be corrected.

Define the variance-covariance matrix of $v_{2i}$ as $\Omega \equiv E(v_{2i}'v_{2i})$, a $T \times T$ matrix that we assume to be positive definite. Then,

$$\Omega = E(v_{2i}'v_{2i}) = \sigma_{a_2}^2\, j_T j_T' + \sigma_2^2
\begin{pmatrix}
1 & \pi & \pi^2 & \cdots & \pi^{T-2} & \pi^{T-1} \\
\pi & 1 & \pi & \cdots & \pi^{T-3} & \pi^{T-2} \\
\pi^2 & \pi & 1 & \cdots & \pi^{T-4} & \pi^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\pi^{T-1} & \pi^{T-2} & \pi^{T-3} & \cdots & \pi & 1
\end{pmatrix}, \tag{3.10}$$

where $j_T$ is a $T \times 1$ vector of ones, and $\sigma_2^2 = \sigma_{e_2}^2/(1 - \pi^2)$. We can obtain consistent estimates of the parameters in (3.10), and use them to transform $\hat{v}_{2it}$ to $\tilde{v}_{2it}$, which is a first stage error free of serial correlation. One useful method for estimating $\pi$, $\sigma_{a_2}^2$, $\sigma_{e_2}^2$, and $\sigma_2^2$ is the minimum distance estimator, described in detail by Chamberlain (1984). Cappellari (1999) has developed code that conveniently implements this method in Stata.

Once we have first stage errors free of serial correlation, we can use the transformation $\tilde{u}_{2it} = \tilde{v}_{2it} - a_{2i}$ to adjust assumption (3.4). We can then assume that under joint normality of $(u_{1it}, \tilde{u}_{2it})$,

$$u_{1it} = \theta\tilde{u}_{2it} + e_{1it} = \theta(\tilde{v}_{2it} - a_{2i}) + e_{1it}, \tag{3.11}$$

where $e_{1it}$ is a serially uncorrelated random term, which is independent of $z_i$ and $\tilde{u}_{2it}$. Inclusion of $\tilde{u}_{2it}$ instead of $u_{2it}$ in equation (3.11) guarantees that $e_{1it}$ will not be serially correlated. Then, we can write

$$\begin{aligned}
y_{1it} &= 1[x_{1it}\beta + c_{1i} + \theta\tilde{v}_{2it} - \theta a_{2i} + e_{1it} \ge 0] \\
&= 1[x_{1it}\beta + \theta\tilde{v}_{2it} + (c_{1i} - \theta a_{2i}) + e_{1it} \ge 0] \\
&= 1[x_{1it}\beta + \theta\tilde{v}_{2it} + c_{0i} + e_{1it} \ge 0],
\end{aligned} \tag{3.12}$$

where $t = 1, \ldots, T$, and $c_{0i} = c_{1i} - \theta a_{2i}$ is a composite unobserved effect.
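As a rough illustration of the structure in (3.10), the sketch below builds the T x T covariance matrix of the composite first-stage error (a time-constant effect plus a stationary AR(1) component) and checks positive definiteness through a Cholesky factorization. It is only a sketch under stated assumptions: the function names and parameter values are hypothetical, the AR(1) process is assumed stationary so that Var(u_2it) = sigma_e2^2 / (1 - pi^2), and it does not reproduce the minimum distance estimator or the Stata code cited above.

```python
import math


def ar1_composite_cov(T, pi, sigma_a2_sq, sigma_e2_sq):
    """Covariance matrix of v_2i = (v_2i1, ..., v_2iT) with the form of (3.10)."""
    if abs(pi) >= 1.0:
        raise ValueError("stationarity requires |pi| < 1")
    sigma2_sq = sigma_e2_sq / (1.0 - pi ** 2)   # stationary Var(u_2it)
    # Entry (s, t): Var(a_2i) + Var(u_2it) * pi^{|s - t|}
    return [
        [sigma_a2_sq + sigma2_sq * pi ** abs(s - t) for t in range(T)]
        for s in range(T)
    ]


def cholesky(matrix):
    """Lower-triangular Cholesky factor; raises if not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower
```

With estimates of the variance parameters in hand, a factor such as the one returned by `cholesky` is one possible building block for a transformation that removes the serial dependence; the specific transformation used in the text is not spelled out here, so this remains an illustrative device only.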
Based on equation (3.12), it is straightforward to adjust the two-step estimating procedure discussed in Section 3.2.2 to account for the presence of serial correlation in $u_{2it}$. For example, under the "correlated" random effects assumption (3.8), equation (3.12) can be written as

$$\begin{aligned}
y_{1it} &= 1[x_{1it}\beta + \theta\tilde{v}_{2it} + c_{0i} + e_{1it} \ge 0] \\
&= 1[x_{1it}\beta + \theta\tilde{v}_{2it} + v_{2i}\alpha_0 + z_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i} + e_{1it} \ge 0].
\end{aligned} \tag{3.13}$$

Then, we can estimate the parameters $\beta$, $\theta$, $\alpha_1$, and $\alpha_2$ using standard random effects probit software by including $\hat{v}_{2i}$, $z_i$, and $y_{1i0}$ in each time period in the list of the explanatory variables along with $x_{1it}$.

3.2.4 Calculation of Average Partial Effects

To assess the magnitude of state dependence we must calculate the average partial effect (APE) of the lagged poverty status on its current value. We follow an approach favored by Wooldridge (2002) to calculate the APEs after our two-step estimation procedure. The APEs can be calculated by taking either differences or derivatives of

$$E[\Phi(x_{1t}\beta_* + \theta_* v_{2it} + v_{2i}\alpha_{0*} + z_i\alpha_{1*} + \alpha_{2*} y_{1i0})], \tag{3.14}$$

where $t = 1, \ldots, T$ and, in the argument of the expectations operator, variables with a subscript $i$ are random and all others are fixed. In order to obtain estimates of the parameter values in (3.14), we appeal to a standard uniform weak law of large numbers argument.6 For any given value of $x_{1t}$ ($x_{1t}^o$), a consistent estimator for expression (3.14) can be obtained by replacing unknown parameters by consistent estimators:

$$N^{-1}\sum_{i=1}^{N}\Phi(x_{1t}^o\hat{\beta}_* + \hat{\theta}_*\hat{v}_{2it} + \hat{v}_{2i}\hat{\alpha}_{0*} + z_i\hat{\alpha}_{1*} + \hat{\alpha}_{2*} y_{1i0}), \tag{3.15}$$

where $t = 1, \ldots, T$, the $\hat{v}_{2it}$ are the first-stage pooled OLS residuals from regressing $y_{2it}$ on $z_{it}$, $\hat{v}_{2i} = (\hat{v}_{2i1}, \hat{v}_{2i2}, \ldots, \hat{v}_{2iT})$, the $*$ subscript denotes multiplication by $(\hat{\sigma}_{e_1}^2 + \hat{\sigma}_{a_1}^2)^{-1/2}$, and $\hat{\beta}$, $\hat{\theta}$, $\hat{\alpha}_0$, $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{\sigma}^2$ are the conditional MLEs. Note that $\hat{\sigma}^2$ is the usual error variance estimator from the second-stage random effects probit regression of $y_{1it}$ on $x_{1it}$, $\hat{v}_{2it}$, $z_i$, and $y_{1i0}$.
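To illustrate how a sample average such as (3.15) turns into an APE for the lagged outcome, the sketch below averages, across units, the difference in the standard normal CDF with the lag set to one and to zero. This is a minimal sketch rather than the author's implementation: the function names are hypothetical, all coefficients are assumed to be already scaled (the starred quantities), and `fixed_index` stands for the part of the index built from the chosen values of the other covariates.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def unit_term(theta_star, v2_it, alpha0_star, v2_i,
              alpha1_star, z_i, alpha2_star, y1_i0):
    """Unit-specific part of the index: control function, heterogeneity terms."""
    return (theta_star * v2_it
            + sum(a * v for a, v in zip(alpha0_star, v2_i))
            + sum(a * z for a, z in zip(alpha1_star, z_i))
            + alpha2_star * y1_i0)


def ape_lagged_outcome(rho_star, fixed_index, unit_terms):
    """Average over units of Phi(index | lag = 1) - Phi(index | lag = 0)."""
    diffs = [
        _PHI(fixed_index + rho_star + h) - _PHI(fixed_index + h)
        for h in unit_terms
    ]
    return sum(diffs) / len(diffs)
```

In use, one would compute `unit_term` for every unit from the first-stage residuals and second-stage estimates, then pass the resulting list to `ape_lagged_outcome`. Taking a discrete difference suits the binary lagged outcome; for a continuous regressor the analogous step would average a derivative instead.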
One may then employ either a mean value expansion or a bootstrapping approach to obtain asymptotic standard errors. We can compute either changes or derivatives of equation (3.15) with respect to $x_{1t}$ to obtain the APEs of interest.

In common with the adjustment to our estimating procedure, one must also correct the estimated APEs when errors are serially correlated. We obtain the APEs by taking either differences or derivatives of

$$E[\Phi(x_{1t}\beta_* + \theta_*\tilde{v}_{2it} + v_{2i}\alpha_{0*} + z_i\alpha_{1*} + \alpha_{2*} y_{1i0})], \tag{3.16}$$

where $t = 1, \ldots, T$. For simplicity, consider the second approach used in Section 3.2.4 to obtain the APE estimates. For any given value of $x_{1t}$ ($x_{1t}^o$), a consistent estimator of expression (3.16) is obtained by replacing unknown parameters by consistent estimators:

$$N^{-1}\sum_{i=1}^{N}\Phi(x_{1t}^o\hat{\beta}_* + \hat{\theta}_*\hat{\tilde{v}}_{2it} + \hat{v}_{2i}\hat{\alpha}_{0*} + z_i\hat{\alpha}_{1*} + \hat{\alpha}_{2*} y_{1i0}), \tag{3.17}$$

where $t = 1, \ldots, T$, $\hat{\tilde{v}}_{2it}$ is a first stage residual cleaned of serial correlation, the $*$ subscript denotes multiplication by $(\hat{\sigma}_{e_1}^2 + \hat{\sigma}_{a_1}^2)^{-1/2}$, and $\hat{\beta}$, $\hat{\theta}$, $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{\sigma}^2$ are the conditional MLEs. We can then compute either changes or derivatives of equation (3.17) with respect to $x_{1t}$ to obtain the APEs of interest.

6 See Wooldridge (2002) for details.

3.3 Migrant Labor Markets and Poverty Persistence in Rural China

Before applying the dynamic binary response model discussed above to an analysis of how migrant labor markets affect poverty status in rural China, we first briefly review the history of rural-urban migration in China and review other evidence on the impact of migration in the home villages of migrants. Next, we propose a specific implementation of the dynamic binary response model to an analysis of the impact of migration on the probability that a rural household is poor. We then introduce the unique panel household and village data sources used in our analysis and describe our approach to identifying the migrant networks that affect the cost of finding migrant employment for village residents.
3.3.1 Rural-Urban Migration in China

China's labor market experienced a dramatic change during the 1990s, as the volume of rural migrants moving to urban areas for employment grew rapidly. Estimates using the one percent sample from the 1990 and 2000 rounds of the Population Census and the 1995 one percent population survey suggest that the inter-county migrant population grew from just over 20 million in 1990 to 45 million in 1995 and 79 million by 2000 (Liang and Ma, 2004). Surveys conducted by the National Bureau of Statistics (NBS) and the Ministry of Agriculture include more detailed retrospective information on past short-term migration, and suggest even higher levels of labor migration than those reported in the census (Cai, Park and Zhao, 2007).

Before labor mobility restrictions were relaxed, households in remote regions of rural China faced low returns to local economic activity, reinforcing geographic poverty traps (Jalan and Ravallion, 2002). A considerable body of descriptive evidence related to the growth of migration in China raises the possibility that migrant opportunity may be an important mechanism for poverty reduction. Studies of the impact of migration on migrant households suggest that migration is associated with higher incomes (Taylor, Rozelle and de Brauw, 2003; Du, Park, and Wang, 2006), facilitates risk-coping and risk-management (Giles, 2006; Giles and Yoo, 2006), and is associated with higher levels of local investment in productive activities (Zhao, 2003).

Institutional changes, policy signals and the high return to labor in urban areas each played a role in the expansion of migration during the 1990s. An early reform of the household registration (hukou) system in 1988 first established a mechanism for rural migrants to obtain legal temporary residence in China's urban areas (Mallee, 1995).
In order to take advantage of this policy change, rural residents required a national identity card to obtain a legal temporary worker card (zanzu zheng), but not all rural counties had distributed IDs as of 1988.7 As China recovered from its post-Tiananmen retrenchment, some credit a series of policy speeches made by Deng Xiaoping in 1992 as signals of renewed openness toward the marketization of the economy, including employment of migrant rural labor in urban areas (Chan and Zhang, 1999). Combined with economic expansion, these institutional and policy changes led to increased demand for construction and service sector workers, and catalyzed the growth in rural-urban migration that continued throughout the 1990s.

7 Legal temporary residence status does not confer access to the same set of benefits (e.g., subsidized education, health care, and housing) typically associated with permanent registration as a city resident.

The use of migrant networks and employment referral in urban areas are important dimensions of China's rural-urban migration experience. Rozelle et al. (1999) emphasize that villages with more migrants in 1988 experienced more rapid migration growth by 1995. Zhao (2003) shows that the number of early migrants from a village is correlated with the probability that an individual with no prior migration experience will choose to participate in the migrant labor market. Meng (2000) further suggests that variation in the size of migrant flows to different destinations can be partially explained by the size of the existing migrant population in potential destinations.8

3.3.2 The RCRE Household Survey

The primary data sources used for our analyses are the village and household surveys conducted by the Research Center for Rural Economy at China's Ministry of Agriculture from 1986 through the 2003 survey year.
We use data from 90 villages in eight provinces (Anhui, Jilin, Jiangsu, Henan, Hunan, Shanxi, Sichuan and Zhejiang) that were surveyed over the 17-year period, with an average of 6305 households surveyed per year. Depending on village size, between 40 and 120 households were randomly surveyed in each village.

8 Referral through one's social network is a common method of job search in both the developing and developed world. Carrington, Detragiache, and Vishwanath (1996) explicitly show that in a model of migration, moving costs can decline with the number of migrants over time, even if wage differentials narrow between source communities and destinations. Survey-based evidence suggests that roughly 50 percent of new jobs in the US are found through referrals facilitated by social networks (Montgomery, 1991). In a study of Mexican migrants in the US, Munshi (2003) shows that having more migrants from one's own village living in the same city increases the likelihood of employment.

The RCRE household survey collected detailed household-level information on incomes and expenditures, education, labor supply, asset ownership, land holdings, savings, formal and informal access to credit, and remittances.9 In common with the National Bureau of Statistics (NBS) Rural Household Survey, respondent households keep daily diaries of income and expenditure, and a resident administrator living in the county seat visits with households once a month to collect information from the diaries.

Our measure of consumption includes nondurable goods expenditure plus an imputed flow of services from household durable goods and housing.
In order to convert the stock of durables into a flow of consumption services, we assume that current and past investments in housing are "consumed" over a 20-year period and that investments in durable goods are consumed over a period of 7 years.10 We also annually "inflate" the value of the stock of durables to reflect the increase in durable goods' prices over the period. Finally, we deflate all income and expenditure data to 1986 prices using the NBS rural consumer price index for each province.

There has been some debate over the representativeness of both the RCRE and NBS surveys, and concern over differences between trends in poverty and inequality in the NBS and RCRE surveys. These issues are reviewed extensively in Appendix B of Benjamin et al. (2005), but it is worth summarizing some of their findings here. First, when comparing cross sections of the NBS and RCRE surveys with overlapping years from cross sectional surveys not using a diary method, it is apparent that some high and low income households are under-represented.11 Poorer illiterate households are likely to be under-represented because enumerators find it difficult to implement and monitor the diary-based survey, and refusal rates are likely to be high among affluent households who find the diary reporting method a costly use of their time. Second, much of the difference between levels and trends from the NBS and RCRE surveys can be explained by differences in the valuation of home-produced grain and treatment of taxes and fees.

9 One shortcoming of the survey is the lack of individual-level information. However, we know the numbers of working-age adults and dependents, as well as the gender composition of household members.

10 Our approach to valuing consumption follows the suggestions of Chen and Ravallion (1996) for the NBS Rural Household Survey, and is explained in more detail in Appendix A of Benjamin et al. (2005).

11 The cross-sections used were the rural samples of the 1993, 1997 and 2000 China Health and Nutrition Survey (CHNS) and a survey conducted in 2000 by the Center for Chinese Agricultural Policy (CCAP) with Scott Rozelle (UC Davis) and Loren Brandt (University of Toronto).

3.3.3 Migration, Consumption Growth and Poverty Trends

One of the benefits of the accompanying village survey is a question asked each year of village leaders about the number of registered village residents working and living outside the village. In our analysis, we consider all registered residents working outside their home county to be migrants.12 Both the tremendous increase in migration from 1987 onward and heterogeneity across villages are evident in Figure C.1. In 1987 an average of 3 percent of working age laborers in RCRE villages were working outside of their home villages, a share that rose steadily to 23 percent by 2003. Moreover, we observe considerable variability in the share of working age laborers working as migrants. Whereas some villages still had a small share of legal village residents employed as migrants, more than 50 percent of working age adults from other villages were employed outside the village by 2003.

12 From follow up interviews with village leaders, it is apparent that registered residents living outside the county are unlikely to be commuters and generally live and work outside the village for more than six months of the year.

The relationship between migration and consumption is of central concern for our analysis. The linear fit of the relationship between annual changes in migration and average village consumption growth in the RCRE data suggests a positive relationship (Figure C.2). The lowess fit, however, suggests the presence of nonlinearities, particularly around zero.
Indeed, the prospect that out-migration may be driven by negative shocks which also depress consumption should raise concern that the size of the migrant network and consumption may be endogenous and driven in part by shocks affecting both variables.

Even if consumption grows with an increase in the number of residents earning incomes from migrant employment, it is of considerable policy interest to understand which residents within villages are experiencing increases in consumption. Changes in the village poverty headcount are negatively associated with the change in the number of out-migrants, suggesting that poverty declines with increased out-migration (Figure C.3). Nonlinearities in the bivariate relationship are evident again in the non-parametric lowess plot of the relationship. Whether these non-linearities reflect the simultaneity of shocks and increases in out-migration and poverty for some villages, or simply the fact that we have not controlled for other characteristics of villages, establishing a relationship between migration and increased consumption of poorer households within villages requires an analytical framework that eliminates bias due to simultaneity and potential sources of unobserved heterogeneity.

A Causal Relationship Between Migration and Consumption Growth

In other research using this data source, de Brauw and Giles (2007) use linear dynamic panel data methods with continuous regressors to demonstrate a robust relationship between the reduction of obstacles to rural-urban migration and household consumption growth. While one might suspect that the non-poor, who have sufficiently high human capital and other dimensions of ability, benefit most from reductions in barriers to migration, general equilibrium effects of out-migration may lead to greater specialization of households in villages, which has benefits for the poor.
In particular, de Brauw and Giles demonstrate that, for households at the lower end of the consumption distribution, both investments in agriculture-related assets and the area of land cultivated increase more with out-migration than they do for richer households. This raises the prospect that the ability to migrate may be causally related to poverty reduction within rural communities as well.

In the empirical application of our binary response model below, we are simply seeking to understand whether out-migration from villages is associated with reductions in the probability that household consumption falls below the poverty line in rural China. We are agnostic as to whether poverty is reduced through direct participation in the migrant labor market, or through indirect general equilibrium effects that raise the return to labor in agricultural and other local activities.

3.3.4 Estimating the Impact of Migrant Labor Markets on Poverty Persistence

The econometric approach derived in Section 3.2.2 allows us to control for household-specific unobserved effects, which will include fixed effects associated with the village in which households are located. We are interested in estimating the dynamic binary choice model for the probability that a household $i$ falls below the poverty line at time $t$:

$$pov_{it} = 1[\beta_1 pov_{it-1} + \beta_2 (M^{i}_{jt} * pov_{it-1}) + \beta_3 M^{i}_{jt} + X_{it}'\alpha_1 + \alpha_2 lpc_{it} + D_t + u_i + \varepsilon_{it}], \qquad (3.18)$$

where $pov_{it}$ is a binary indicator for whether the household is in poverty in year $t$. It is affected by poverty status in the prior period, $pov_{it-1}$; the size of the migrant network from village $j$ through which household $i$ may be able to obtain a job referral, $M^{i}_{jt}$; a vector of household demographic and human capital characteristics, $X_{it}$; household land per capita, $lpc_{it}$; and year dummies to control for macroeconomic shocks, $D_t$.
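As a reading aid, the response probability implied by (3.18) can be written out under a probit specification (the chapter later refers to a dynamic probit). This is a sketch under stated assumptions rather than the estimating equation itself: the indicator is read as the event that the latent index is positive, and $\varepsilon_{it}$ is treated as standard normal and independent of the conditioning variables, which ignores the endogeneity concerns taken up next.

```latex
\begin{align*}
P\big(pov_{it}=1 \mid pov_{it-1}, M^{i}_{jt}, X_{it}, lpc_{it}, u_i\big)
  &= \Phi\big(\beta_1 pov_{it-1} + \beta_2 (M^{i}_{jt} * pov_{it-1}) + \beta_3 M^{i}_{jt}
     + X_{it}'\alpha_1 + \alpha_2 lpc_{it} + D_t + u_i\big), \\[4pt]
\text{index slope in } M^{i}_{jt}
  &= \begin{cases}
       \beta_3, & pov_{it-1}=0 \quad \text{(falling into poverty)},\\
       \beta_3+\beta_2, & pov_{it-1}=1 \quad \text{(remaining in poverty)}.
     \end{cases}
\end{align*}
```

Read this way, $\beta_3$ captures the effect of the network on the index for previously non-poor households, and $\beta_2$ captures how much that effect differs for households that were poor in the prior period.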
We will be concerned about the possibility that an unobserved household effect, $u_i$, may be systematically related to the size of the household's migrant network, to other covariates, and to household poverty status, thus introducing endogeneity concerns. The error term, $\varepsilon_{it}$, may be serially correlated, and shocks in the error term may also be systematically related to the size of the migrant network, $M^{i}_{jt}$, and to the possibility of falling into poverty, which contributes an additional endogeneity concern.

From the model specified in (3.18), we are particularly interested in identifying the coefficients on $pov_{it-1}$, $M^{i}_{jt}$, and $M^{i}_{jt} * pov_{it-1}$. The coefficients on $pov_{it-1}$ and $M^{i}_{jt} * pov_{it-1}$ allow us to gauge the importance of persistence in the probability that a household is poor, and the impact of access to migrant employment opportunities through the migrant network on poverty persistence. $\beta_3$, the coefficient on $M^{i}_{jt}$, allows us to determine the impact of the migrant network on the probability that a household will fall into poverty.

The specification shown in (3.18) may have additional sources of endogeneity if we believe that household demographic and human capital characteristics, $X_{it}$, or land per capita, $lpc_{it}$, may vary with unobserved shocks in period $t$ or $t-1$. We address the possible concern over endogenous household composition by using household demographic and human capital variables for the legal long-term registered residents of households. While household size may vary somewhat with shocks as individuals move in and out of the household for the purpose of finding temporary work elsewhere, such variations do not show up in registered household membership. Long-term membership only changes when households split because of events such as marriage or a legal change of residence to another location. Land managed by the household may also vary with shocks.
Land markets in rural China do not function well: land cannot be bought and sold, and only in the last few years have farmers gained the right to explicitly transfer land. Instead land is allocated by village leaders, and reallocated or adjusted among households within village small groups if a household is judged to have too little land to support itself. Nonetheless, there is some possibility that reallocation may be related to shocks that occur in period $t$ or $t-1$ that may also be systematically related to poverty status and the migrant network size. Wooldridge (2002) shows that when the assumption of strict exogeneity of the regressors fails in the context of standard FE estimation, the inconsistency of the estimator is of order $T^{-1}$. We thus use the period $t-2$ value of land per capita and estimate:

$$pov_{it} = 1[\beta_1 pov_{it-1} + \beta_2 (M^{i}_{jt} * pov_{it-1}) + \beta_3 M^{i}_{jt} + X_{it}'\alpha_1 + \alpha_2 lpc_{it-2} + D_t + u_i + \varepsilon_{it}], \qquad (3.19)$$

One issue remains: we do not perfectly observe the network $M^{i}_{jt}$ that household $i$ may use for job referrals. Instead, we observe the number of registered long-term village residents who are employed as migrants outside the village in a particular year, or $M_{jt}$. The true migrant network may include former legal registered residents who have now changed their long-term residence status, implying that the actual potential network is larger. Alternatively, the household may not be familiar with all of the village out-migrants, and thus the actual network through which a household may seek referrals may be smaller.
Thus, we will estimate:

$$pov_{it} = 1[\beta_1 pov_{it-1} + \beta_2 (M_{jt} * pov_{it-1}) + \beta_3 M_{jt} + X_{it}'\alpha_1 + \alpha_2 lpc_{it-2} + D_t + u_i + \varepsilon_{it}], \qquad (3.20)$$

In our identification strategy below, we will instrument the endogenous number of village out-migrants, $M_{jt}$, with village-level instruments that identify the size of the village migrant labor force, interacted with period $t-2$ lagged land per capita, $lpc_{it-2}$, in order to allow for differences in the effective value of the village migrant network for households with different amounts of land.

Why might we expect that interacting with $lpc_{it-2}$ might achieve this? We believe that the land per capita managed by households will likely pick up a dimension of proximity of different households within the village. Within villages in rural China, households are separated into smaller units of roughly 20 households known as village small groups (cun xiaozu), which were referred to as production teams during the Maoist period. These households are located in clusters and will have closer relationships with one another than with households of other small groups. Moreover, property rights to land in rural China typically reside with the small group, not with the village. Thus, when land reallocations take place they typically take place within but not across small groups. Small groups make more frequent small adjustments to household land as the land per capita available starts to become unequal with differential changes in household structure across households within the small group, but there is much less flexibility in making adjustments across small groups.
As a result, much of the variability of land per capita within villages occurs across small groups.13 Interacting a village-level instrument for the migrant network with land per capita will allow the importance of $M_{jt}$ to vary across households, and much of the difference across households occurs because of unobserved differences in the small groups in which they reside and which migrants regard as home.

13 We do not know village small group membership in the RCRE survey prior to 2003, when a new survey instrument was introduced. If we regress land per capita on village dummy variables in 2003, we obtain an R-squared of 0.503, while if we run a regression of land per capita on small group dummy variables, we obtain an R-squared of 0.616. A Lagrange Multiplier test for whether the small group effects add anything significant over the village effects, which is effectively a test of whether small group coefficients are constant within villages, yields an LM statistic of 310.67, which has a p-value of 0.0000.

3.3.5 Identifying the Migrant Network

To instrument the village migrant network, we make use of two policy changes that, working together, affect the strength of migrant networks outside home counties but are plausibly unrelated to the demand for and supply of schooling. First, a new national ID card (shenfen zheng) was introduced in 1984. While urban residents received IDs in 1984, residents of most rural counties did not receive them immediately. In 1988, a reform of the residential registration system made it easier for migrants to gain legal temporary residence in cities, but a national ID card was necessary to obtain a temporary residence permit (Mallee, 1995). While some rural counties made national IDs available to rural residents as early as 1984, others distributed them in 1988, and still others did not issue IDs until several years later.
The RCRE follow-up survey asked local officials when IDs had actually been issued to rural residents of the county. In our sample, 41 of the 90 counties issued cards in 1988, but cards were issued as early as 1984 in three counties and as late as 1997 in one county. It is important to note that IDs were not necessary for migration, and large numbers of migrants live in cities without legal temporary residence cards. However, migrants with temporary residence cards have a more secure position in the destination community, hold better jobs, and would thus plausibly make up part of a longer-term migrant network in migrant destinations. Thus, ID distribution had two effects after the 1988 residential registration (hukou) reform. First, the costs of migrating to a city should fall after IDs became available. Second, if the quality of the migrant network improves with the years since IDs are available, then the costs of finding migrant employment should continue to fall over time. As a result, the size of the migrant network should be a function of both whether or not cards have been issued and the time since cards have been issued in the village.

Given that the size of the potential network has an upper bound, we expect the years-since-IDs-issued to have a non-linear relationship with the size of the migrant labor force, and we expect growth in the migrant network to decline after initially increasing with distribution of IDs. In Figure C.4, we show a lowess plot of the relationship between years since IDs were distributed and the change in the number of migrants from the village from year $t-1$ to $t$. Note the sharp increase in migrants from the time that IDs are distributed and then a slowing of the increase over time (which would imply an even slower growth rate). This pattern suggests non-linearity in the relationship between ID distribution and new participants in the village migrant labor force.
We thus specify our instrument as a dummy variable indicating that IDs had been issued interacted with the years since they had been issued, and then experiment with quadratic, cubic and quartic functions of years-since-IDs-issued. We settle on the quartic function for our instruments because, as we show below, it fits the pattern of expanding migrant networks better than the quadratic or the cubic functions.

Since ID distribution was the responsibility of county-level offices of the Ministry of Civil Affairs, which are distinctly separate from agencies involved in setting policies affecting land, credit, taxation and poverty alleviation (the Ministry of Agriculture and Ministry of Finance handle most decisions that affect these policies at the local level), it is plausible that ID distribution is not systematically related to unobservable policy decisions with a more direct relationship to household consumption.

Still, using a function of the years since IDs were issued is not an ideal identification strategy. Ideally, a policy would exist that was randomly implemented, affecting the ability to migrate from some counties but not others. As the differential timing of the distribution of ID cards was not necessarily random, we must be concerned that counties with specific characteristics or that followed specific policies were singled out to receive ID cards earlier than other counties, or that features of counties receiving IDs earlier are systematically correlated with other policies affecting consumption growth. These counties, one might argue, were “allowed” to build up migrant networks faster than others. In an earlier paper, de Brauw and Giles (2007) address several possible concerns with use of the years-since-IDs quartic as instruments for the size of the village migrant labor force.
They first show that the timing of ID distribution appears to be related to remoteness of the village, but not systematically related to village policies that may affect consumption growth, to village administrative capacity, or to the demand for IDs within the village. They thus argue in favor of including a village fixed effect to control for features of the local county which may have affected the timing of ID distribution, and then identify the size of the village migrant labor force off of non-linearities in the time that it requires for migrant networks to build up. In this paper, we interact the quartic in years-since-IDs with pre-determined land per capita of households in period $t-2$ to identify the size of the village migrant network.

3.4 Results

Before estimating equation (3.20), we establish that our instruments are significantly related to the size of the migrant labor force. We estimate the relationship as a quadratic, cubic, and quartic function of the years since IDs were issued, each interacted with period $t-2$ land per capita. These results are reported in columns (1)-(3) and columns (4)-(6) of Table C.2 for each year from 1995-2001 and odd years from 1989-2001, respectively.14 We find a strong relationship between our instruments and the size of the migrant network for each specification. For the remainder of our estimation we favor the quartic function interacted with $t-2$ land per capita for two reasons. First, the effects of ID card distribution on the migration network can be determined more flexibly when we use the quartic specification. Second, the partial $R^2$ increases slightly from the quadratic to the quartic for both samples we consider. After controlling for the household characteristics, the instruments have jointly significant effects on the number of migrants from the village for both samples, with F-statistics of 39.82 and 54.65 for the 1995 to 2001 and odd-year 1989 to 2001 samples, respectively.
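The instrument set described above, a polynomial in years since IDs were issued (switched on by the issuance dummy) with each term interacted with period t-2 land per capita, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `id_instruments`, its arguments, and the convention that the year of issuance counts as zero years since issuance are all assumptions made here.

```python
def id_instruments(year, id_year, lpc_lag2, degree=4):
    """Build the interacted polynomial instruments for one household-year.

    year      -- survey year of the observation
    id_year   -- year IDs were issued in the household's county (hypothetical input)
    lpc_lag2  -- household land per capita in period t-2
    degree    -- 2, 3 or 4 for the quadratic, cubic or quartic specification

    Returns [lpc_lag2 * s, lpc_lag2 * s**2, ..., lpc_lag2 * s**degree],
    where s is years since issuance and is zero before IDs were issued.
    """
    issued = 1 if year >= id_year else 0   # dummy: IDs have been issued
    s = issued * (year - id_year)          # years since issuance (assumed 0 in issue year)
    return [lpc_lag2 * s ** k for k in range(1, degree + 1)]


# Example: a county that issued IDs in 1988, observed in 1995,
# for a household with 2.0 units of land per capita two periods earlier.
quartic = id_instruments(1995, 1988, 2.0)
quadratic = id_instruments(1995, 1988, 2.0, degree=2)
```

Because every term is multiplied by household land per capita, the instruments vary across households within a village even though ID timing itself varies only at the county level.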
14 Since the RCRE survey was not conducted in 1992 and 1994, we estimate the dynamic model with one-year spacing from 1995 to 2001, and with two-year spacing from 1989 to 2001.

We apply the method introduced in Section 3.2.2 to estimate equation (3.20). In Table C.3, we report estimation results based on the pure random effects and correlated random effects approaches. We obtain the pure RE estimation results using the Stata “xtprobit” command, with the following explanatory variables: year dummies (not shown), residuals from the first-stage estimation and their time averages (not shown), number of household members, number of prime-age household laborers, second lag of land per capita, average years of education, share of females, lagged poverty status, migration network, and the interaction between lagged poverty status and the migration network. The correlated RE estimation results are also obtained using the Stata “xtprobit” command, with the following explanatory variables: year dummies (not shown), residuals from the first-stage estimation (not shown), first-stage residuals and all the exogenous explanatory variables in each time period (not shown), number of household members, number of prime-age household laborers, second lag of land per capita, average years of education, share of females, lagged poverty status, migration network, the interaction between lagged poverty status and the migration network, and the poverty status in the initial time period. For purposes of comparison, we also estimate model (3.20) using a naive linear probability model and provide the results in Table C.4.

Even after controlling for the unobserved effect using our correlated RE approach, the coefficients on the lagged poverty status are highly statistically significant for explaining the current poverty status in both datasets considered.
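The extra regressors that enter the second-stage probit can be sketched as below. This is a simplified illustration under stated assumptions: first-stage residuals are taken as given (the hypothetical field `v_hat`), each household's residuals are summarized by their time average as in the pure RE description, and the initial-period poverty status is attached as in the correlated RE description. The text's correlated RE specification uses the residuals and exogenous variables from every period rather than their averages, which this sketch does not reproduce. The function and field names are invented for the example.

```python
from collections import defaultdict


def cre_controls(rows):
    """Attach control-function terms to each household-year record.

    rows -- list of dicts with keys 'hh' (household id), 'year',
            'pov' (0/1 poverty status) and 'v_hat' (first-stage residual).

    Returns new dicts with two added keys:
      'v_bar'    -- household time average of the first-stage residuals
      'pov_init' -- poverty status in the household's first observed year
    """
    by_hh = defaultdict(list)
    for r in rows:
        by_hh[r["hh"]].append(r)

    out = []
    for r in rows:
        panel = by_hh[r["hh"]]
        v_bar = sum(p["v_hat"] for p in panel) / len(panel)
        first = min(panel, key=lambda p: p["year"])   # earliest year = initial condition
        out.append({**r, "v_bar": v_bar, "pov_init": first["pov"]})
    return out


# Toy panel: two households observed in two years.
rows = [
    {"hh": 1, "year": 1995, "pov": 1, "v_hat": 0.5},
    {"hh": 1, "year": 1996, "pov": 0, "v_hat": -0.5},
    {"hh": 2, "year": 1995, "pov": 0, "v_hat": 2.0},
    {"hh": 2, "year": 1996, "pov": 1, "v_hat": 4.0},
]
augmented = cre_controls(rows)
```

The augmented records would then be passed, together with the household covariates, to a random effects probit routine such as the Stata command named in the text.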
The positive sign of the lagged poverty status suggests that being poor in a previous period significantly increases the probability of being poor in the current period. The initial value of the poverty status is also very important. It implies that there is substantial correlation between the unobserved effect and the initial condition. The coefficient on the lagged poverty status in the initial time period (1.028 for 1989-2001 dataset and 1.161 for 1995-2001 dataset) is larger than the coefficient on the lagged poverty status (0.820 for 1989-2001 dataset and 1.523 for 1995-2001 dataset).

The migrant network is statistically significant for explaining the poverty status in all models but the correlated RE using the 1995-2001 dataset. The interaction between the migration network and the poverty status is also statistically significant at the 0.01 level for every case we consider. The negative sign of the interaction term suggests that an increase in the size of the migration network in the current period lowers the probability of being poor by more for households that were poor in the previous period. In other words, in our application, we find that migration is important for reducing the likelihood that poor households remain in poverty and that non-poor households fall into poverty. Further, failure to control for unobserved heterogeneity leads to an overestimate of the impact of migrant labor markets on the probability of staying poor for those who lived below the poverty line. The coefficient on the interaction term between the migration network and the lagged poverty status for the pure RE approach (-0.188 for 1989-2001 dataset and -0.180 for 1995-2001 dataset) is larger in absolute value than the coefficient on the interaction term for the correlated RE method (-0.108 for 1989-2001 dataset and -0.137 for 1995-2001 dataset).

In Table C.5 we show the APEs for both models considered for both data samples.
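How an average partial effect of the network on the poverty probability might be computed from probit estimates, separately by lagged poverty status, can be sketched as follows. This is an illustration only, under assumptions that are not taken from the text: the coefficients are presumed already scaled as the correlated RE probit requires, `rest_index` (a hypothetical input) holds the remainder of each observation's estimated index evaluated at the chosen lagged poverty status, and the effect is computed as a discrete change of `delta` units in the network.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ape_network(rest_index, M, beta2, beta3, lagged_poor, delta=1.0):
    """Average change in P(poor) when the network grows by `delta`.

    rest_index  -- per-observation index excluding the network terms
    M           -- per-observation network size
    beta2       -- coefficient on network * lagged poverty
    beta3       -- coefficient on network
    lagged_poor -- 0 or 1: lagged poverty status held fixed
    """
    slope = beta3 + beta2 * lagged_poor   # index slope in the network
    diffs = [
        norm_cdf(r + slope * (m + delta)) - norm_cdf(r + slope * m)
        for r, m in zip(rest_index, M)
    ]
    return sum(diffs) / len(diffs)


# Single toy observation: index 0, network 0, slope -1 for the non-poor.
toy_ape = ape_network([0.0], [0.0], beta2=0.0, beta3=-1.0, lagged_poor=0)
```

Averaging the discrete differences over the sample, rather than evaluating at mean covariates, is what makes this an average partial effect; the toy call simply evaluates the difference between the normal CDF at -1 and at 0.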
For example, for the sample from 1989 to 2001, the correlated random effects CF estimate implies that 100 more members in the migration network reduce the probability of being poor by about 3.6 percentage points for those who were living above the poverty line. For those who lived below the poverty line, the correlated random effects CF estimate implies that 100 more members in the migration network reduce the probability of being poor by 5.7 percentage points. Interestingly, the APEs calculated using the correlated random effects dynamic probit approach are generally smaller than those calculated using the linear probability model. This suggests that using a naive LPM approach might lead us to conclude that migration has a stronger impact on poverty reduction than is found using the correlated random effects probit model.

3.5 Conclusions

In this paper, we have developed a dynamic binary response panel data model that allows for an endogenous regressor. This estimation approach is of particular value for settings in which one wants to estimate the effects of a treatment which is also endogenous. We apply the model to examine the impact of rural-urban migration on the likelihood that households in rural China fall below the poverty line. In our application, we find that migration is important for reducing both the likelihood that households remain in poverty and the likelihood that they fall into poverty if they were not poor in the previous period.

APPENDIX A

Tables for Chapter 1

Table A.1. Usual Unobserved Effects CRC Model for $\beta = 2$ and $T = 5$

                             (1)     (2)     (3)     (4)     (5)     (6)
Estimator Time Dummies?     Mean      SD    RMSE      LQ  Median      UQ
N = 100
POLS      no               3.363    .189   1.377   3.238   3.356   3.486
FE-OLS    no               2.616    .138   0.642   2.527   2.621   2.711
IV        no               2.752    .225   0.781   2.612   2.761   2.901
FE-IV     no               2.423    .214   0.484   2.288   2.429   2.558
FE-IV     yes              1.945    .407   0.407   1.711   1.980   2.208
N = 400
POLS      no               3.369    .091   1.372   3.299   3.366   3.434
FE-OLS    no               2.623    .067   0.635   2.575   2.626   2.667
IV        no               2.745    .110   0.760   2.666   2.740   2.818
FE-IV     no               2.428    .096   0.455   2.362   2.423   2.498
FE-IV     yes              1.988    .177   0.213   1.887   1.997   2.101
N = 800
POLS      no               3.373    .063   1.375   3.330   3.366   3.412
FE-OLS    no               2.625    .046   0.637   2.596   2.624   2.655
IV        no               2.753    .076   0.764   2.700   2.750   2.801
FE-IV     no               2.436    .068   0.458   2.389   2.437   2.480
FE-IV     yes              2.004    .131   0.182   1.919   2.009   2.091

Table A.2. Usual Unobserved Effects CRC Model for $\beta = 2$ and $T = 10$

                             (1)     (2)     (3)     (4)     (5)     (6)
Estimator Time Dummies?     Mean      SD    RMSE      LQ  Median      UQ
N = 100
POLS      no               3.204    .157   1.223   3.097   3.195   3.314
FE-OLS    no               2.534    .106   0.562   2.469   2.531   2.603
IV        no               2.397    .123   0.440   2.324   2.395   2.475
FE-IV     no               2.277    .115   0.331   2.208   2.276   2.351
FE-IV     yes              2.013    .283   0.313   1.841   2.020   2.210
N = 400
POLS      no               3.196    .077   1.202   3.146   3.193   3.247
FE-OLS    no               2.528    .056   0.545   2.490   2.527   2.565
IV        no               2.392    .061   0.417   2.450   2.393   2.431
FE-IV     no               2.270    .060   0.305   2.231   2.274   2.308
FE-IV     yes              1.995    .138   0.186   1.901   2.002   2.092
N = 800
POLS      no               3.194    .054   1.200   3.155   3.194   3.224
FE-OLS    no               2.525    .040   0.541   2.498   2.523   2.551
IV        no               2.388    .042   0.410   2.357   2.387   2.416
FE-IV     no               2.268    .041   0.299   2.241   2.267   2.294
FE-IV     yes              1.992    .100   0.160   1.926   1.993   2.062

Table A.3. Random Trend CRC Model for $\beta = 2$ and $T = 5$

                             (1)     (2)     (3)     (4)     (5)     (6)
Estimator Time Dummies?     Mean      SD    RMSE      LQ  Median      UQ
N = 100
POLS      yes              4.293    .300   2.303   4.096   4.284   4.475
FE-OLS    yes              2.673    .182   0.697   2.555   2.671   2.782
IV        yes              2.929    .850   1.247   2.444   2.941   3.496
FE-IV     yes              2.000    .626   0.642   1.635   2.057   2.383
FE-IV     no              13.414   1.411  11.422  12.464  13.221  14.225
N = 400
POLS      yes              4.308    .144   2.312   4.201   4.307   4.411
FE-OLS    yes              2.663    .085   0.679   2.607   2.666   2.721
IV        yes              3.004    .411   1.073   2.704   3.023   3.292
FE-IV     yes              2.013    .269   0.301   1.835   2.019   2.204
FE-IV     no              13.406    .665  11.406  12.915  13.340  13.878
N = 800
POLS      yes              4.296    .097   2.294   4.225   4.295   4.363
FE-OLS    yes              2.660    .060   0.671   2.617   2.658   2.700
IV        yes              2.996    .278   1.038   2.809   2.993   3.171
FE-IV     yes              1.996    .187   0.223   1.874   2.005   2.130
FE-IV     no              13.351    .478  11.328  13.049  13.318  13.654

Table A.4. Random Trend CRC Model for $\beta = 2$ and $T = 10$

                             (1)     (2)     (3)     (4)     (5)     (6)
Estimator Time Dummies?     Mean      SD    RMSE      LQ  Median      UQ
N = 100
POLS      yes              4.789    .407   2.820   4.522   4.814   5.051
FE-OLS    yes              2.651    .178   0.687   2.539   2.656   2.761
IV        yes              2.916   1.042   1.401   2.357   2.976   2.615
FE-IV     yes              1.968    .619   0.641   1.603   2.001   2.384
FE-IV     no              15.933    .771  13.919  15.367  15.902  16.479
N = 400
POLS      yes              4.808    .190   2.815   4.678   4.808   4.943
FE-OLS    yes              2.662    .089   0.678   2.600   2.662   2.718
IV        yes              3.000    .504   1.137   2.659   2.993   3.361
FE-IV     yes              1.981    .311   0.338   1.767   1.978   2.203
FE-IV     no              15.900    .406  13.875  15.633  15.890  16.177
N = 800
POLS      yes              4.788    .144   2.784   4.682   4.779   4.885
FE-OLS    yes              2.663    .062   0.674   2.618   2.660   2.703
IV        yes              3.000    .360   1.061   2.759   3.026   3.243
FE-IV     yes              1.997    .201   0.234   1.855   1.998   2.132
FE-IV     no              15.904    .289  13.888  15.693  15.895  16.098

APPENDIX B

Tables for Chapter 2
[Tables B.1 to B.4: printed rotated in the source scan; contents not recoverable.]

Table B.5. Standard Errors for the Control Function Approach

(1)      (2)     (3)      (4)      (5)      (6)      (7)
         N      Mean   Reg. SE  Rob. SE  Adj. SE     SD

[symbol illegible] on a large support set

Usual Unobserved Effect CRC Model
≠ 0      500   2.059    0.040    0.035    0.056    0.057
        1000   2.057    0.029    0.025    0.040    0.039
= 0      500   2.002    0.080    0.075    0.090    0.089
        1000   2.000    0.057    0.053    0.064    0.062

Random Trend CRC Model
≠ 0      500   2.039    0.194    0.113    0.122    0.126
        1000   2.037    0.137    0.081    0.087    0.084
= 0      500   1.993    0.278    0.169    0.176    0.178
        1000   2.001    0.196    0.120    0.124    0.120

[symbol illegible] ∈ (0, 1)

Usual Unobserved Effect CRC Model
≠ 0      500   2.150    0.204    0.211    0.221    0.233
        1000   2.137    0.143    0.150    0.157    0.157
= 0      500   2.012    0.319    0.246    0.256    0.241
        1000   2.005    0.225    0.174    0.181    0.186

Random Trend CRC Model
≠ 0      500   2.229    0.615    0.690    0.693    0.723
        1000   2.182    0.433    0.497    0.499    0.494
= 0      500   1.949    1.077    0.875    0.877    0.864
        1000   1.965    0.760    0.617    0.619    0.668
[Rotated table between Tables B.5 and B.7: contents not recoverable from the scan.]

Table B.7. POLS Estimates of the First Stage Regressions

Variable                     (1)        (2)        (3)        (4)        (5)        (6)
grant_it                   35.410*    36.198*    28.697*    32.504*    29.185*    32.736*
                           [5.934]    [5.515]    [5.364]    [5.024]    [5.294]    [5.081]
\overline{grant}_i         -7.693     -8.482      6.788      2.981     -2.352     -5.903
                          [11.885]   [12.530]   [16.474]   [17.456]   [12.982]   [14.064]
dtt                          No        Yes         No        Yes         No        Yes
Control Variable(s)          No         No        Yes        Yes        Yes        Yes
R2                          .258       .291       .314       .332       .284       .304
Number of Observations       135        135        117        117        120        120

Notes: (i) The dependent variable is hrsemp_it; grant_it is a dummy indicator for whether a grant was received, and \overline{grant}_i is the time average of grant_it. (ii) Quantities in square brackets are fully robust standard errors. (iii) The row labeled "dtt" indicates whether a regression includes separate year intercepts for 1988 and 1989 in both the first and the second stages of estimation. (iv) Control variables in (3) and (4) include log(employ_it), the log of the number of employees; log(sales_it), the log of annual sales; and log(avgsal_it), the log of average employee salary. Controls in (5) and (6) include log(avgsal_it) only.
*Statistically significant at the .01 level.
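The first stage summarized in Table B.7 regresses hrsemp_it on grant_it and its time average by pooled OLS. A minimal Python sketch of that kind of regression follows. It is an illustration under stated assumptions, not the code used for the table: the data are simulated, the helper names `time_average` and `pols_cluster` are hypothetical, and "fully robust" is taken here to mean standard errors clustered by firm with no small-sample correction.

```python
import numpy as np


def time_average(x, ids):
    """For each row, return the mean of x over all rows that share the same id."""
    x = np.asarray(x, dtype=float)
    _, inverse = np.unique(np.asarray(ids), return_inverse=True)
    sums = np.bincount(inverse, weights=x)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]


def pols_cluster(y, X, ids):
    """Pooled OLS coefficients with standard errors clustered by id."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    ids = np.asarray(ids)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sum the outer products of the within-cluster scores X_g' u_g.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(ids):
        rows = ids == g
        score = X[rows].T @ resid[rows]
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    # Clip guards against tiny negative values caused by rounding.
    se = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return beta, se


if __name__ == "__main__":
    # Simulated panel (assumption): 50 firms observed for 3 years.
    rng = np.random.default_rng(0)
    n_firms, n_years = 50, 3
    ids = np.repeat(np.arange(n_firms), n_years)
    grant = (rng.random(ids.size) < 0.3).astype(float)
    grant_bar = time_average(grant, ids)
    hrsemp = 10.0 + 30.0 * grant - 5.0 * grant_bar + rng.normal(scale=8.0, size=ids.size)
    X = np.column_stack([np.ones(ids.size), grant, grant_bar])
    beta, se = pols_cluster(hrsemp, X, ids)
    print("coefficients (const, grant, grant_bar):", np.round(beta, 3))
    print("clustered standard errors:", np.round(se, 3))
```

Including the time average alongside grant_it lets the first stage absorb the part of the firm effect that is correlated with grant receipt, which is why both rows appear in the table.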
[Rotated tables between Table B.7 and Appendix C: contents not recoverable from the scan.]

APPENDIX C

Tables and Figures for Chapter 3

Figure C.1. Share of Village Labor Force Employed as Migrants by Year
[Figure not reproduced. Vertical axis: Share; horizontal axis: Year, 1987 to 2003.]

Figure C.2. Village Consumption Growth
[Figure not reproduced. Vertical axis: Average Village Consumption Growth; horizontal axis: Change in Number of Migrants; series: Lowess Fit, Linear Fit.]

Figure C.3. Change in Poverty Headcount
[Figure not reproduced. Vertical axis: Change in Village Poverty Headcount; horizontal axis: Change in Number of Migrants; series: Lowess Fit, Linear Fit.]

Figure C.4. Change in Out-Migrants in Village Labor Force
[Figure not reproduced. Vertical axis: Change in Number of Out-Migrants in Village Workforce; horizontal axis: Years Since ID Cards Issued.]
[Rotated table preceding Table C.2: contents not recoverable from the scan.]

Table C.2. Factors Determining the Size of the Village Migrant Network
First-Stage Regressions
Dependent Variable: Number of Migrants

                                          Odd Years from 1989 to 2001           Years from 1995 to 2001
Model                                     (1)          (2)          (3)          (4)          (5)          (6)
Household Population                      -1.709**     -1.763**     -1.785**     -2.335***    -2.338***    -2.341***
                                          (0.867)      (0.862)      (0.862)      (0.656)      (0.655)      (0.655)
Number of Working Age Laborers            2.516**      2.300**      2.370**      4.348***     4.269***     4.284***
  in the Household                        (1.046)      (1.037)      (1.037)      (0.773)      (0.771)      (0.770)
Land Per Capita t-2                       9.363***†    -3.719***    4.752***†    10.53[?]***  14.43[?]***  15.540***
                                          (2.666)      (1.337)      (1.414)      (1.503)      (1.825)      (2.067)
Average Years of Education                -0.293       -0.267       -0.272       -0.190       -0.193       -0.206
                                          (0.345)      (0.343)      (0.343)      (0.270)      (0.270)      (0.270)
Female Share of the Household             -0.410       -0.564       -0.707       1.805        2.023        1.999
                                          (3.139)      (3.123)      (3.126)      (2.925)      (2.923)      (2.924)
(Years-Since-ID-Issued)                   2.020***†    -3.899***    4.396***†    -0.169       -2.513***    3.925***†
  × (Land Per Capita t-2)                 (0.421)      (0.462)      (0.823)      (0.273)      (0.535)      (0.802)
(Years-Since-ID-Issued)^2                 -0.100***    0.772***†    1.779***†    -0.093***    0.246***†    0.633***†
  × (Land Per Capita t-2)                 (0.020)      (0.081)      (0.241)      (0.015)      (0.071)      (0.167)
(Years-Since-ID-Issued)^3                              0.034***†    0.123***†                 -0.014***    0.050***†
  × (Land Per Capita t-2)                              (0.003)      (0.022)                   (0.003)      (0.015)
(Years-Since-ID-Issued)^4                                           0.003***                               0.001***
  × (Land Per Capita t-2)                                           (0.001)                                (0.000)

Observations                              25692        25692        25692        22812        22812        22812
R-squared                                 0.09         0.10         0.10         0.22         0.22         0.22
F-Statistic on IVs with Averages          11.80        23.10        34.21        55.84        37.59        29.68
F-Statistic on IVs w/o Averages           12.63        40.28        39.82        109.40       71.81        54.65
Partial R2, IVs with Averages             0.003        0.009        0.015        0.007        0.007        0.008
Partial R2, IVs w/o Averages              0.000        0.001        0.002        0.006        0.006        0.006

Notes: In parentheses we show fully robust standard errors [*** p<0.01, ** p<0.05, * p<0.1]. All regressions include time averages of the explanatory variables and year dummies.
[† Sign not legible in the scan. [?] marks an illegible digit.]
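The last rows of Table C.2 report F-statistics and partial R2 values for the instruments. The Python sketch below shows one standard way such quantities can be computed for a first-stage regression. It is only an illustration: it uses the classical homoskedastic F-statistic, whereas the table's statistics may be robust versions, and the function names and simulated data are hypothetical.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared residuals from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def excluded_iv_strength(y, W, Z):
    """Partial R2 and classical F-statistic for instruments Z given controls W.

    y is the endogenous regressor, W holds the included exogenous variables
    (with a constant), and Z holds the excluded instruments.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    W = np.asarray(W, dtype=float).reshape(n, -1)
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    q = Z.shape[1]                      # number of excluded instruments
    k = W.shape[1] + q                  # regressors in the unrestricted model
    ssr_r = ssr(y, W)                   # restricted: controls only
    ssr_u = ssr(y, np.column_stack([W, Z]))
    partial_r2 = (ssr_r - ssr_u) / ssr_r
    f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
    return partial_r2, f_stat


if __name__ == "__main__":
    # Simulated data (assumption): one control, two instruments.
    rng = np.random.default_rng(0)
    n = 500
    control = rng.normal(size=n)
    Z = rng.normal(size=(n, 2))
    y = 1.0 + 0.5 * control + 0.3 * Z[:, 0] - 0.2 * Z[:, 1] + rng.normal(size=n)
    W = np.column_stack([np.ones(n), control])
    pr2, f = excluded_iv_strength(y, W, Z)
    print("partial R2:", round(pr2, 4), "F-statistic:", round(f, 2))
```

A small partial R2 can coexist with a large F-statistic when the sample is large, which is the pattern the table shows with more than 20,000 observations.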
[Rotated tables following Table C.2: contents not recoverable from the scan.]