This is to certify that the dissertation entitled

ESSAYS IN PANEL DATA ECONOMETRICS EXAMINING SELECTION BIAS AND AVERAGE TREATMENT EFFECTS

presented by Kamyar Nasseh has been accepted towards fulfillment of the requirements for the Doctoral degree in Economics.

ESSAYS IN PANEL DATA ECONOMETRICS EXAMINING SELECTION BIAS AND AVERAGE TREATMENT EFFECTS

By Kamyar Nasseh

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Economics

2007

ABSTRACT

Chapter 1 considers the effect of time varying unobserved effects in an unbalanced panel data model with possible selectivity bias. As in previous work dealing with sample selection, the unobserved effects in the regression and selection equations are allowed to be correlated with the regressors. Prior to testing for selectivity bias, the parameters of interest are estimated using Generalized Method of Moments. A minimum distance procedure is used to correct for selection bias. An empirical application dealing with a wage equation is used to illustrate the testing and correction procedures outlined in this chapter.
Keywords: Panel Data; Sample Selection; Time Varying Unobserved Effects; Conditional Mean Independence; Generalized Method of Moments; Minimum Distance Estimation

In Chapter 2, we consider nonlinear panel data models with possible sample selection bias. Previous work has shown the robustness properties of the quasi-maximum likelihood estimator under a conditional mean assumption. One can exploit the robustness properties of this estimator to test and correct for selection bias. Under a conditional mean assumption, a Generalized Method of Moments procedure is also available as an estimation method under suitable orthogonality conditions. An empirical example is used to illustrate the theory discussed in this chapter.

Keywords: Panel Data; Sample Selection; Robustness; Quasi-conditional maximum likelihood; Conditional Mean Independence; Generalized Method of Moments

Chapter 3 considers estimation of Average Treatment Effects (ATEs) for panel data models. Previous work has estimated the endogenous ATE with a correction function for cross-sectional data. The correlated random coefficient model gives us a framework from which to estimate ATEs, especially when the treatment variable is possibly endogenous. To account for endogeneity, we use a correction function estimator, which adds a function to correct for endogeneity bias. Monte Carlo simulations show that the correction function estimator performs well in finite samples. An empirical example illustrates the theory presented in this chapter by estimating the effect of the school choice program in Michigan on fourth grade student performance in mathematics.

Keywords: Panel Data; Average Treatment Effect; Correction Function; FE-IV; Correlated Random Coefficient; Endogeneity Bias

DEDICATION

I would like to dedicate my thesis to my parents and grandparents, without whose support and encouragement I could not have completed my doctoral degree in economics.
I would also like to thank my sister, Nooshin, for providing me support and encouragement throughout the last five years. Her sense of humor has always helped relieve the many stressful periods that come with completing a thesis. My family has always provided me the support and inspiration to complete my education and achieve my goals and aspirations.

ACKNOWLEDGEMENT

I would like to thank my thesis advisor, Professor Jeffrey Wooldridge, for the excellent training he has given me in the field of panel data econometrics. I was inspired to write a thesis in panel data econometrics after taking his advanced course in cross-section and panel data econometrics at the beginning of my second year. I am forever grateful to him for providing me the support, encouragement, and patience to complete my thesis. I would also like to thank Professor Peter Schmidt and Professor Emma Iglesias for their support. I am deeply appreciative of the feedback they have given me throughout the writing of this thesis.

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER 1
1. Introduction
2. Consistency of linear time varying unobserved effects models in balanced and unbalanced panels
   2.1. Consistency in a balanced panel
   2.2. Consistency in an unbalanced panel
3. Variable addition tests for selectivity bias
   3.1. Testing when only the selection indicator is observed
   3.2. Testing when the selection variable is partially observed
4. Correcting for sample selection bias
   4.1. Selection Corrections when the selection variable is partially observed
   4.2. Selection Corrections when only the selection variable is observed
5. Empirical Application: A wage offer equation
6. Conclusion
Appendix A: GMM standard errors for testing
Appendix B: GMM standard error correction
Appendix C: Derivation of the optimal weighting matrix

CHAPTER 2
1. Introduction
2. Consistency of nonlinear models in balanced and unbalanced panels
   2.1. Consistency in a balanced panel
   2.2. Consistency in an unbalanced panel
3. Tests for selection bias
   3.1. A simple variable addition test for selection bias
   3.2. Testing for contemporaneous selection bias
4. Correcting for sample selection bias
5. Empirical Application: A wage offer equation
6. Conclusion
Appendix A

CHAPTER 3
1. Introduction
2. General model and assumptions
3. A general method for deriving the correction function
4. Examples
   4.1. Probit treatment variables
   4.2. Tobit treatment variables
5. A simple test for slope heterogeneity using FE-IV
6. Monte Carlo simulation
7. Empirical Example: Michigan Schools of Choice Program
8. Conclusion
Appendix A

FOOTNOTES
REFERENCES

LIST OF TABLES

CHAPTER 1:
Table 1: Summary Statistics
Table 2: Estimates for wage equation
Table 3: Estimates for wage equation
Table 4: Estimates for wage equation. Minimum distance estimation

CHAPTER 2:
Table 5: Summary Statistics
Table 6: Wage offer equation. Fixed Effects Poisson (FEP) estimates
Table 7: Wage offer equation. Linear Fixed Effects (FE) estimates

CHAPTER 3:
Table 8: Monte Carlo Results
Table 9: Monte Carlo Results
Table 10: Summary Statistics
Table 11: ATE estimates

CHAPTER 1

1. Introduction

In practice, applied labor, public, and IO economists deal with missing data where only a subset of the entire population is observed.
For example, labor economists are often interested in estimating wages for the working population, but may only observe a subset of workers because hours worked are not observed for everyone in the working population. Inconsistent parameter estimates, otherwise known as selectivity bias, result when the sub-population is nonrandomly drawn from the overall population.

Recent econometric literature has attempted to test and correct for selectivity bias. Nijman and Verbeek (1992) provide a simple way to test for selectivity bias for random effects estimation. Wooldridge (1995) uses a fixed effects approach where he allows for correlation between the unobserved effects and the regressors. No distributional assumption is imposed upon the idiosyncratic errors of the regression equation in Wooldridge (1995), but a normality assumption is imposed on the errors of the selection equation. Wooldridge (1995) also allows for serial dependence in the errors of the regression equation. Since the unobserved effect is differenced away, selection is also allowed to depend upon the unobserved effect in the regression equation.

Other econometricians have also dealt with missing data issues. Kyriazidou (1997) has proposed sample selection methods that do not require distributional assumptions. Instead, she uses a differencing approach between periods to eliminate the unobserved effect and possible sample selection problem. Rochina-Barrachina (1999) expands upon Wooldridge (1995) by making a normality assumption on the idiosyncratic error in the regression equation. All of these previous works on sample selection have only taken into account time constant unobserved effects in the regression and selection equations.

In this paper, we shall make a contribution to the missing data econometric literature by introducing time varying unobserved effects in the regression and selection equations.
For example, labor economists often have to deal with unbalanced panels when they forecast workers' wages. Wages depend not only upon a worker's experience, education, age, gender, etc., but also on an unobservable skill which may have a price that varies through time. In a linear panel data model denoting a wage equation, the unobserved effect can represent a worker's unobservable skills and the time varying parameter on the unobserved effect can represent the time varying price attributed to those skills.

However, recent econometric literature has only attempted to deal with estimation of balanced panel data models with time varying unobserved effects. Holtz-Eakin, Newey, and Rosen (1998) estimate a vector autoregression panel data model with time varying unobserved effects and endogeneity. Ahn, Lee, and Schmidt (2001) consider GMM estimation of balanced linear panel data models with time varying unobserved effects. This paper will extend Ahn, Lee, and Schmidt's (2001) analysis of time varying unobserved effects models to unbalanced panels. More specifically, this paper applies time varying unobserved effects to Wooldridge (1995).

Due to the time varying parameter we impose on the unobserved effects of the regression and selection equations, standard pooled OLS estimation will not be feasible to generate parameter estimates and standard errors to test for selectivity bias. Therefore, GMM estimation will be necessary to generate the parameter estimates and standard errors needed for tests of selectivity bias. This paper shall only consider the effect of time varying unobserved effects on Wooldridge's (1995) model of sample selection. The main objective of the paper is to consistently estimate Ahn, Lee, and Schmidt's (2001) time varying unobserved effects model in an unbalanced panel and account for any selectivity bias, while maintaining many of the assumptions imposed in Wooldridge (1995).
The asymptotic analysis in this paper is for fixed $T$ as $N$ goes to infinity.

The plan for this paper is as follows. Section 2 will consider consistency of the Ahn, Lee, and Schmidt (2001) estimator in a balanced and unbalanced panel. Section 3 will cover variable addition tests for selectivity bias. Section 4 will cover a method that corrects for selectivity bias in time varying unobserved panel data models. Section 5 applies our estimation technique to a wage offer equation. Appendix A contains the derivation of GMM standard errors needed for tests of selectivity bias, while Appendix B provides a GMM method that gives precise standard errors in the presence of generated regressors. Appendix C derives the optimal weighting matrix used in the minimum distance procedure to correct for selectivity bias in Section 4.

2. Consistency of linear time varying unobserved effects models in balanced and unbalanced panels

2.1 Consistency in a balanced panel

In Wooldridge (1995), the regression equation of interest has a time constant unobserved effect. In this chapter, we assume that the unobserved effect in the regression equation affects the dependent variable differently for different individuals, but that the temporal pattern of the unobserved effects is the same for each person. For example, assume a wage equation where the price of a worker's skill set varies over time. More specifically, consider the following linear time varying unobserved effects model from Ahn, Lee, and Schmidt (2001), hereafter denoted as ALS, for i.i.d. cross-section observations. For any $i$,

\[
y_{it1} = x_{it1}\beta + \theta_{t1}\alpha_{i1} + u_{it1}, \qquad t = 1,\dots,T,\ i = 1,\dots,N, \tag{2.1}
\]

where $x_{it1}$ is $1 \times K$, $\beta$ is $K \times 1$, $\theta_{t1}$ is a scalar element of $(1, \theta_{21}, \dots, \theta_{T1})'$ and the normalization $\theta_{11} = 1$ is imposed. In (2.1), the number one in the subscripts of $(y_{it1}, x_{it1}, \alpha_{i1}, u_{it1})$ denotes the primary regression equation of interest, such as a wage equation.
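To make the role of the time varying loading concrete, the following minimal sketch simulates (2.1) with a scalar regressor and compares two transformations of the regression error at the true $\beta$: ordinary first differencing, which leaves a term in $\alpha_{i1}$ whenever the loadings differ across periods, and differencing against $\theta_{t1}$ times the first period, which removes $\alpha_{i1}$ exactly. The function names, the data generating process, and all parameter values are illustrative assumptions and do not come from the dissertation.

```python
import random

def simulate_panel(n, theta, beta, seed=0):
    """Simulate y_it = x_it*beta + theta_t*alpha_i + u_it with a scalar regressor.

    theta[0] should equal 1 (the normalization theta_11 = 1).
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        # x is allowed to be correlated with the unobserved effect alpha
        x = [0.5 * alpha + rng.gauss(0.0, 1.0) for _ in theta]
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        y = [x[t] * beta + theta[t] * alpha + u[t] for t in range(len(theta))]
        panel.append({"x": x, "y": y, "alpha": alpha, "u": u})
    return panel

def first_difference(unit, t, beta):
    """(y_t - x_t*beta) - (y_{t-1} - x_{t-1}*beta); keeps alpha if theta varies."""
    e_t = unit["y"][t] - unit["x"][t] * beta
    e_prev = unit["y"][t - 1] - unit["x"][t - 1] * beta
    return e_t - e_prev

def quasi_difference(unit, t, theta, beta):
    """(y_t - x_t*beta) - theta_t*(y_1 - x_1*beta); alpha cancels."""
    e_t = unit["y"][t] - unit["x"][t] * beta
    e_1 = unit["y"][0] - unit["x"][0] * beta
    return e_t - theta[t] * e_1

theta = [1.0, 1.5, 0.7]  # hypothetical loadings with theta_11 = 1
beta = 2.0               # hypothetical slope
unit = simulate_panel(1000, theta, beta)[0]

# After first differencing, (theta_2 - theta_1)*alpha is left over.
leftover = first_difference(unit, 1, beta) - (unit["u"][1] - unit["u"][0])
print(leftover, (theta[1] - theta[0]) * unit["alpha"])
```

The two printed numbers should agree up to floating point error, which is the reason a standard within or first-difference transformation is not enough in this model.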
We assume that $N$ cross-sectional observations are available and that the asymptotic properties of the estimates are derived with $T$ fixed and $N \to \infty$. The following assumption ensures consistency of the ALS estimator in a balanced panel:

Assumption 1: For $t = 1,\dots,T$,

\[
E(u_{it1} \mid x_{i11}, \dots, x_{iT1}, \alpha_{i1}) = 0 .
\]

Under Assumption 1, for fixed $T$, the ALS estimator is consistent and $\sqrt{N}$-asymptotically normal as $N \to \infty$. Assumption 1 states that $x_{it1}$ is strictly exogenous conditional on $\alpha_{i1}$ for $t = 1,\dots,T$.¹ To conduct estimation, we first use a quasi-differencing technique to eliminate the unobserved effect and then perform GMM to estimate the parameters:

\[
y_{it1} - \theta_{t1} y_{i11} = (x_{it1} - \theta_{t1} x_{i11})\beta + (u_{it1} - \theta_{t1} u_{i11}). \tag{2.2}
\]

Assumption 1 implies a large number of moment conditions.² Although more moment conditions do not hurt asymptotically, one would like to reduce the number of moment conditions to produce better finite-sample estimates. In terms of the data available, ALS use the following $T(T-1)K$ moment conditions to consistently estimate $\theta_{t1}$ and $\beta$ by GMM.

Proposition 2.1: Using Assumption 1, for $t = 1,\dots,T$, $r = 2,\dots,T$,

\[
E\{x_{it1}'[(y_{ir1} - x_{ir1}\beta) - \theta_{r1}(y_{i11} - x_{i11}\beta)]\} = 0 .
\]

Proof: Under Assumption 1, $x_{it1}$ is strictly exogenous with respect to $u_{it1}$. The following $T(T-1)K$ moment conditions consistently estimate $\theta_{t1}$ and $\beta$:

\[
E\Big\{x_{it1}'\Big[(y_{ir1} - x_{ir1}\beta) - \frac{\theta_{r1}}{\theta_{s1}}(y_{is1} - x_{is1}\beta)\Big]\Big\} = 0, \qquad \forall\, t, r, s .
\]

Proposition 2.1 follows from Assumption 1 since

\[
E\Big\{x_{it1}'\Big[(y_{ir1} - x_{ir1}\beta) - \frac{\theta_{r1}}{\theta_{s1}}(y_{is1} - x_{is1}\beta)\Big]\Big\}
= E\{x_{it1}'[(y_{ir1} - x_{ir1}\beta) - \theta_{r1}(y_{i11} - x_{i11}\beta)]\}
- \frac{\theta_{r1}}{\theta_{s1}} E\{x_{it1}'[(y_{is1} - x_{is1}\beta) - \theta_{s1}(y_{i11} - x_{i11}\beta)]\} .
\]

Proposition 2.1 implies

\[
E[x_{it1}'(u_{ir1} - \theta_{r1} u_{i11})] = 0, \qquad t = 1,\dots,T,\ r = 2,\dots,T, \tag{2.3}
\]

since, from the law of iterated expectations,

\[
E[x_{it1}'(u_{ir1} - \theta_{r1} u_{i11})]
= E\{E[x_{it1}'(u_{ir1} - \theta_{r1} u_{i11}) \mid \alpha_{i1}, x_i]\}
= E\{x_{it1}' E[(u_{ir1} - \theta_{r1} u_{i11}) \mid \alpha_{i1}, x_i]\} = 0, \qquad r = 2,\dots,T,\ t = 1,\dots,T .
\]

Now that we have defined the conditions for consistency of the ALS estimator in a balanced panel data sample, we will next define the conditions for consistency of ALS in an unbalanced panel.

2.2 Consistency of the ALS estimator in an unbalanced panel

In this section, we define the conditions sufficient for consistency of the ALS estimator in an unbalanced panel. As in the previous section, due to the time varying parameter on the unobserved effect, one has to perform GMM to estimate $\theta_{t1}$ and $\beta$. The vector of selection indicators for each $i$ is denoted as $s_i \equiv (s_{i1}, \dots, s_{iT})'$ and $y_{it1}$ is observed if $s_{it} = 1$. We assume that $x_{it1}$ is observed for all $t$, whether or not $s_{it} = 1$. The parameters $\theta_{t1}$ and $\beta$ are consistently estimated by performing GMM on a selected sample of (2.1). For fixed $T$, as $N \to \infty$, the ALS estimator is consistent and asymptotically normal on the selected subsample when:

Assumption 2: For $t = 1,\dots,T$,

\[
E(u_{it1} \mid x_{i11}, \dots, x_{iT1}, \alpha_{i1}, s_i) = 0 .
\]

Due to the multiplicative way the selection indicators interact with the regressors in the GMM moment conditions, we need a strict exogeneity condition to consistently estimate $\theta_{t1}$ and $\beta$. A conditional expectation condition in the form of Assumption 2 is therefore necessary for consistency of $\hat{\theta}_{t1}$ and $\hat{\beta}$ in an unbalanced panel. Assumption 2 is the same as in Wooldridge (1995), even though that paper did not allow time varying unobserved effects; it extends easily to models that have them. To consistently estimate $\theta_{t1}$ and $\beta$ by GMM, we need to write the necessary $T(T-1)K$ moment conditions in terms of the data and parameters available.

Proposition 2.2: Under Assumption 2, for $t = 1,\dots,T$, $r \neq t$,

\[
E\Big\{s_{it}s_{ir}x_{it1}'\Big[(y_{it1} - x_{it1}\beta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - x_{ir1}\beta)\Big]\Big\} = 0 .
\]

Proof: Consider a model without time varying unobserved effects.
Under Assumption 2, $x_{it1}$ is strictly exogenous with respect to $u_{it1}$ and $s_i$. The following $T(T-1)K$ moment conditions consistently estimate $\beta$:

\[
E\{s_{is}s_{ir}x_{it1}'[(y_{ir1} - x_{ir1}\beta) - (y_{is1} - x_{is1}\beta)]\} = 0, \qquad \forall\, t, r, s .
\]

Therefore, Proposition 2.2 follows from Assumption 2 for a selected subsample since

\[
E\{s_{is}s_{ir}x_{it1}'[(y_{ir1} - x_{ir1}\beta) - (y_{is1} - x_{is1}\beta)]\}
= E\{s_{it}s_{is}x_{it1}'[(y_{it1} - x_{it1}\beta) - (y_{is1} - x_{is1}\beta)]\}
- E\{s_{it}s_{ir}x_{it1}'[(y_{it1} - x_{it1}\beta) - (y_{ir1} - x_{ir1}\beta)]\} .
\]

In a model with time varying parameters, since the expectation operator passes through parameters, the same considerations apply as in a model without time varying parameters, and therefore we can claim using Assumption 2 that in a selected subsample

\[
E\Big\{s_{it}s_{ir}x_{it1}'\Big[(y_{it1} - x_{it1}\beta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - x_{ir1}\beta)\Big]\Big\} = 0, \qquad t = 1,\dots,T,\ r \neq t .
\]

When data are missing in some periods, we use $E\{x_{it1}'[(y_{it1} - x_{it1}\beta) - (\theta_{t1}/\theta_{r1})(y_{ir1} - x_{ir1}\beta)]\} = 0$ for any pair $(t, r)$ that is observed. If data are missing for either time period $t$ or time period $r$, those entries in the vector of moment conditions are zero.

Proposition 2.2 allows for more flexibility when estimating the parameters $\beta$ and $\theta_{t1}$. In the previous subsection, the product of $\theta_{t1}$ and the data from period one was differenced from period $t$. However, in an unbalanced panel with missing data, this would not be a useful transformation to estimate $\beta$ and $\theta_{t1}$. By using the transformation in the previous subsection, if $s_{i1} = 0$, then we would lose cross-sectional observation $i$ in estimation. Proposition 2.2 allows enough flexibility that we can capture individuals who drop out of and re-enter the sample. If data are missing for a certain time period, then one can use the data that are available and observed for any other two time periods $t$ and $r$. Plugging in for $y_{it1}$ and $y_{ir1}$, the moment conditions implied by Proposition 2.2 hold at the true parameters:

\[
E\Big[s_{it}s_{ir}x_{it1}'\Big(u_{it1} - \frac{\theta_{t1}}{\theta_{r1}}u_{ir1}\Big)\Big] = 0, \qquad t = 1,\dots,T,\ r \neq t . \tag{2.4}
\]
Under exogenous selection, consistency of $\hat{\beta}$ and $\hat{\theta}_{t1}$ is achieved in an unbalanced panel when the sample of data is selected for period $r$ and period $t$. Condition (2.4) follows from Assumption 2 by the law of iterated expectations since

\[
\begin{aligned}
E\Big[s_{it}s_{ir}x_{it1}'\Big(u_{it1} - \frac{\theta_{t1}}{\theta_{r1}}u_{ir1}\Big)\Big]
&= E\Big\{E\Big[s_{it}s_{ir}x_{it1}'\Big(u_{it1} - \frac{\theta_{t1}}{\theta_{r1}}u_{ir1}\Big) \,\Big|\, x_i, \alpha_{i1}, s_{it}, s_{ir}\Big]\Big\} \\
&= E\{s_{it}s_{ir}x_{it1}' E(u_{it1} \mid x_i, \alpha_{i1}, s_{it}, s_{ir})\}
- \frac{\theta_{t1}}{\theta_{r1}} E\{s_{it}s_{ir}x_{it1}' E(u_{ir1} \mid x_i, \alpha_{i1}, s_{it}, s_{ir})\} \\
&= 0, \qquad t = 1,\dots,T,\ r \neq t .
\end{aligned} \tag{2.5}
\]

Conditioning on $(s_{it}, s_{ir})$ in (2.5) is valid under Assumption 2 due to a law of iterated expectations argument and the fact that $(s_{it}, s_{ir})$ is a subset of $s_i$. To achieve consistency of the parameters of interest, $\beta$ and $\theta_{t1}$, the GMM moment conditions are premultiplied by the product of the selection indicators from period $r$ and period $t$, $s_{it}s_{ir}$, to account for all combinations of $(s_{it}, s_{ir})$. Identification of the parameters of interest is achieved when the data available are selected for period $r$ and period $t$ and when the expected derivative matrix derived from Proposition 2.2 with respect to $\beta$ and $\theta_{t1}$ achieves full column rank.

As in Wooldridge (1995), it is not sufficient to just put $x_{it1}$ and $s_{it}$ in the conditioning set at time $t$. Under Assumption 2, selection is strictly exogenous conditional on $\alpha_{i1}$ and $x_i$. Assumption 2 puts no restrictions on how selection relates to $\alpha_{i1}$ and $x_i$. Therefore, selection is allowed to depend on the unobserved effect and the regressors in an arbitrary way.

3. Variable addition tests for selectivity bias

Now, we will derive variable addition tests by adding a time varying unobserved effect to the selection and regression equations. This differs from the approach taken by Wooldridge (1995), where only time constant unobserved effects were incorporated into the selection and regression equations. The approach used is similar to that of Wooldridge (1995), where either Tobit residuals or inverse Mills ratios are used as the additional variable.
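The selected-sample moment conditions of Proposition 2.2 can be illustrated numerically. The sketch below is a toy example under stated assumptions: a scalar regressor, a hypothetical selection rule that depends on the unobserved effect and the regressor but not on the idiosyncratic error (so that Assumption 2 holds), and made-up parameter values. It computes the sample analogue of one moment condition and is not the GMM estimator itself.

```python
import random

def simulate_unbalanced(n, theta, beta, seed=0):
    """Simulate model (2.1) with a scalar regressor and selection that depends
    on alpha_i and x_it but not on u_it (hypothetical rule)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        x = [0.5 * alpha + rng.gauss(0.0, 1.0) for _ in theta]
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        s = [1 if 0.5 * alpha + 0.3 * x[t] + rng.gauss(0.0, 1.0) > 0 else 0
             for t in range(len(theta))]
        # y is recorded only in periods where s_it = 1
        y = [x[t] * beta + theta[t] * alpha + u[t] if s[t] else None
             for t in range(len(theta))]
        data.append({"x": x, "y": y, "s": s})
    return data

def sample_moment(data, t, r, theta, beta):
    """(1/N) sum_i s_it*s_ir*x_it*[(y_it - x_it*b) - (theta_t/theta_r)*(y_ir - x_ir*b)]."""
    total = 0.0
    for unit in data:
        if unit["s"][t] and unit["s"][r]:
            e_t = unit["y"][t] - unit["x"][t] * beta
            e_r = unit["y"][r] - unit["x"][r] * beta
            total += unit["x"][t] * (e_t - theta[t] / theta[r] * e_r)
        # units missing period t or period r contribute zero
    return total / len(data)

theta = [1.0, 1.5, 0.7]  # hypothetical loadings
beta = 2.0               # hypothetical slope
data = simulate_unbalanced(5000, theta, beta)
print(sample_moment(data, 0, 1, theta, beta))        # should be close to zero
print(sample_moment(data, 0, 1, theta, beta + 0.5))  # evaluated at a wrong beta
```

Because any observed pair of periods can be used, a unit that is missing in period one still contributes through other pairs, which is the flexibility the text attributes to Proposition 2.2.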
3.1 Testing when only the selection indicator is observed

As in Wooldridge (1995), assume that the explanatory variables $x_{it1}$ are observed for all $t = 1,\dots,T$ and the variable $y_{it1}$ is observed if $s_{it} = 1$ and not otherwise. For each $t = 1,\dots,T$, define the selection process as

\[
s_{it} = 1[x_{it}\delta_2 + \theta_{t2}\alpha_{i2} + u_{it2} \geq 0], \tag{3.1}
\]

where $\alpha_{i2} = \eta_2 + \bar{x}_i\pi_2 + c_{i2}$ and $\theta_{t2}$ is a scalar element of $(1, \theta_{22}, \dots, \theta_{T2})'$. One can use the Chamberlain (1984) version for the unobserved effect. However, to conserve on parameters and degrees of freedom, one can use the Mundlak (1978) version for the unobserved effect, which is a linear projection of $\alpha_{i2}$ on a constant, the time average of the explanatory variables, and an error term. Plugging in the Mundlak (1978) version for $\alpha_{i2}$ in (3.1), one can derive $T$ selection equations

\[
s_{it} = 1[x_{it}\delta_2 + \theta_{t2}(\eta_2 + \bar{x}_i\pi_2) + v_{it2} \geq 0], \tag{3.2}
\]

\[
v_{it2} = \theta_{t2}c_{i2} + u_{it2}, \qquad t = 1,\dots,T, \tag{3.3}
\]

where $c_{i2}$ and $u_{it2}$ are jointly normal and $\mathrm{Var}(v_{it2}) = \theta_{t2}^2\sigma_{c2}^2 + 1 = \tau_{t2}^2$. As in Wooldridge (1995), $u_{it2}$ is independent of $x_i$ with $E(u_{it2}) = 0$. Putting the selection equation in a labor context, the wage of a worker is observed if the prospective worker accepts a wage at or above the reservation wage.

Due to the time varying parameter $\theta_{t2}$, heteroskedasticity is introduced in $v_{it2}$. When the first stage probit estimation is done to collect the inverse Mills ratio necessary for the variable addition test, the parameters $\delta_2$, $\pi_2$, and $\eta_2$ are re-scaled due to the time varying parameter $\theta_{t2}$ and the non-constant variance of $v_{it2}$. The resulting re-scaled parameters will be time varying:

\[
P(s_{it} = 1 \mid x_i) = \Phi\Big(\frac{\theta_{t2}\eta_2 + x_{it}\delta_2 + \theta_{t2}\bar{x}_i\pi_2}{\tau_{t2}}\Big) = \Phi(\psi_{t2} + x_{it}\delta_{t2} + \bar{x}_i\pi_{t2}), \tag{3.4}
\]

where $x_{it}$ and $\bar{x}_i$ each contain $K_1 \geq K$ regressors and $\Phi(\cdot)$ is the standard normal CDF. Although Wooldridge (1995) also allowed time varying parameters in the first stage probit estimation, logically the parameters could be time constant. Here, since it is explicit that $\mathrm{Var}(v_{it2})$ varies over time, the probit parameters in this model must vary somewhat.
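As a small illustration of the ingredients used in the first stage, the sketch below computes the Mundlak time average of a scalar regressor, a period-specific probit index of the form in (3.4), and the inverse Mills ratio under its standard definition $\lambda(z) = \phi(z)/\Phi(z)$. The coefficient values are hypothetical stand-ins for period-$t$ probit estimates, and the function names are chosen here for illustration only.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z).

    Numerically unreliable for very negative z, where Phi(z) underflows.
    """
    return norm_pdf(z) / norm_cdf(z)

def time_average(x_i):
    """Mundlak regressor: average of one unit's scalar regressor over T periods."""
    return sum(x_i) / len(x_i)

def probit_index(x_it, xbar_i, psi_t, delta_t, pi_t):
    """Period-t index psi_t + x_it*delta_t + xbar_i*pi_t with scalar regressors."""
    return psi_t + x_it * delta_t + xbar_i * pi_t

x_i = [0.2, -0.1, 0.5]                      # one unit's regressor over T = 3 periods
xbar_i = time_average(x_i)
psi_t, delta_t, pi_t = 0.1, 0.8, -0.3       # hypothetical period-1 probit estimates
lam = inverse_mills(probit_index(x_i[0], xbar_i, psi_t, delta_t, pi_t))
print(xbar_i, lam)
```

In an application the index would be evaluated with a separate set of estimated coefficients for each period, since the re-scaled probit parameters are time varying.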
In essence, our probit selection equation offers less flexibility than Wooldridge (1995) since we impose a time varying parameter on the unobserved effect in the selection equation. Since $v_{it2}$ is a linear combination of zero mean normally distributed random variables independent of $(\alpha_{i1}, x_i)$, it is also distributed Normal$(0, \tau_{t2}^2)$ and independent of $(\alpha_{i1}, x_i)$. As in Wooldridge (1995), one needs to assume independence between $v_{it2}$ and $\alpha_{i1}$ to derive a convenient test for selectivity bias. Using the independence assumption between $v_{it2}$ and $(\alpha_{i1}, x_i)$, one can easily derive a conditional expectation that leads to a test for selectivity bias.

Since the vector of selection indicators $s_i$ is a function of $(x_i, v_{i2})$, where $v_{i2} = (v_{i12}, \dots, v_{iT2})'$, a sufficient condition for Assumption 2 is

\[
E(u_{it1} \mid \alpha_{i1}, x_i, v_{i2}) = 0, \qquad t = 1,\dots,T. \tag{3.5}
\]

Under the alternative of selectivity bias,

\[
E(u_{it1} \mid \alpha_{i1}, x_i, v_{i2}) = E(u_{it1} \mid v_{it2}) = \rho v_{it2}, \qquad t = 1,\dots,T. \tag{3.6}
\]

Equation (3.6) states that $u_{it1}$ is mean independent of $(\alpha_{i1}, x_i, v_{i12}, \dots, v_{i,t-1,2}, v_{i,t+1,2}, \dots, v_{iT2})$, conditional on $v_{it2}$. Under the alternative (3.6) of selectivity bias,

\[
E(y_{it1} \mid \alpha_{i1}, x_i, v_{i2}, s_i) = E(y_{it1} \mid \alpha_{i1}, x_i, v_{i2}) = x_{it1}\beta + \theta_{t1}\alpha_{i1} + \rho v_{it2}. \tag{3.7}
\]

However, since only the selection indicator is observed, we must condition on $s_i$ rather than on $v_{i2}$ to derive a test. Later, when the selection variable is partially observed, we can condition on $v_{i2}$ to derive a test. Using the law of iterated expectations and the fact that $v_{i2}$ is independent of $(\alpha_{i1}, x_i)$, one can show that

\[
E(y_{it1} \mid \alpha_{i1}, x_i, s_i) = x_{it1}\beta + \theta_{t1}\alpha_{i1} + \rho E(v_{it2} \mid \alpha_{i1}, x_i, s_i) = x_{it1}\beta + \theta_{t1}\alpha_{i1} + \rho E(v_{it2} \mid x_i, s_i). \tag{3.8}
\]

For the purpose of obtaining a simple test for selectivity bias, replace $E(v_{it2} \mid x_i, s_i)$ with $E(v_{it2} \mid x_i, s_{it} = 1)$. To derive the test for selectivity bias, estimate $E(v_{it2} \mid x_i, s_{it} = 1)$ in order to derive the inverse Mills ratio and continue to assume that $\mathrm{Var}(v_{it2}) = \theta_{t2}^2\sigma_{c2}^2 + 1 = \tau_{t2}^2$:

\[
E(v_{it2} \mid x_i, s_{it} = 1) = E(v_{it2} \mid x_i, v_{it2} > -(\psi_{t2} + x_{it}\delta_{t2} + \bar{x}_i\pi_{t2})) = \lambda(\psi_{t2} + x_{it}\delta_{t2} + \bar{x}_i\pi_{t2}), \tag{3.9}
\]

where $\lambda(\psi_{t2} + x_{it}\delta_{t2} + \bar{x}_i\pi_{t2})$ denotes the inverse Mills ratio. The following procedure tests for selectivity bias when only the selection indicator is observed.

Procedure 3.1

1. For each $t$, estimate equation (3.4) using standard probit and compute the inverse Mills ratio $\hat{\lambda}_{it2} \equiv \lambda(\hat{\psi}_{t2} + x_{it}\hat{\delta}_{t2} + \bar{x}_i\hat{\pi}_{t2})$.

2. Estimate the equation

\[
y_{it1} = x_{it1}\beta + \rho\hat{\lambda}_{it2} + \theta_{t1}\alpha_{i1} + error_{it1} \tag{3.10}
\]

by GMM. Define $w_{it} = (x_{it1}, \hat{\lambda}_{it2})$ and $\delta = (\beta', \rho)'$. Rewrite (3.10) as

\[
y_{it1} = w_{it}\delta + \theta_{t1}\alpha_{i1} + error_{it1}, \qquad t = 1,\dots,T. \tag{3.11}
\]

Under the following $T(T-1)(K+1)$ moment conditions in terms of the data available, GMM will be consistent for $\delta$ and $\theta_{t1}$:

\[
E\Big\{s_{it}s_{ir}w_{it}'\Big[(y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - w_{ir}\delta)\Big]\Big\} = 0, \qquad t = 1,\dots,T,\ r \neq t. \tag{3.12}
\]

Identification of the parameters is achieved when the expected derivative matrix derived from (3.12) achieves full column rank.

3. Test $H_0: \rho = 0$ using the t-statistic for $\hat{\rho}$. A statistic that uses the standard GMM standard error is valid. See Appendix A for the derivation of the GMM standard errors.

3.2 Testing when the selection variable is partially observed

In this subsection, we again assume that the explanatory variables $x_{it1}$ are observed for all $t = 1,\dots,T$. The variable $y_{it1}$ is observed only for a non-negative value of the latent variable, $h_{it}^*$. Assume that for all $t$, the censored variable $h_{it} \equiv \max(0, h_{it}^*)$ is observed. As in the previous subsection, we use the Mundlak (1978) version for $\alpha_{i2}$ in the selection equation to conserve on degrees of freedom. The censored variable is defined as

\[
h_{it} = \max(0,\ x_{it}\delta_2 + \theta_{t2}(\eta_2 + \bar{x}_i\pi_2) + v_{it2}), \tag{3.13}
\]

where $v_{it2}$ is defined as in (3.3) for $t = 1,\dots,T$. In a labor context, the wage of a worker is observed if the worker works more than zero hours. We continue to assume that $v_{it2}$ is independent of $(\alpha_{i1}, x_i)$ and distributed Normal$(0, \tau_{t2}^2)$.
Under the null hypothesis of no selectivity bias,

\[
E(u_{it1} \mid \alpha_{i1}, x_i, h_i) = 0, \qquad t = 1,\dots,T, \tag{3.14}
\]

where $h_i = (h_{i1}, h_{i2}, \dots, h_{iT})'$ replaces $s_i$ in the conditioning set, making (3.14) a stronger version of Assumption 2. However, the interpretation of (3.14) is the same as that of Assumption 2. Since $h_i$ is a function of $(x_i, v_{i2})$, (3.5) through (3.7) define the null hypothesis of no selectivity bias and the alternative hypothesis indicating selectivity bias, with only $h_i$ replacing $s_i$ in (3.7). If one could observe $v_{it2}$ in (3.7), then one could test the null hypothesis of no selectivity bias by adding $v_{it2}$ as an additional regressor in the GMM estimation of (2.1). One can estimate $v_{it2}$ by estimating a Tobit model for the selection equation. The following procedure is a valid testing procedure when $h_{it}$ is observed.

Procedure 3.2

1. For each $t$, estimate (3.13) by standard Tobit. For $s_{it} = 1$, compute $\hat{v}_{it2} = h_{it} - \hat{\psi}_{t2} - x_{it}\hat{\delta}_{t2} - \bar{x}_i\hat{\pi}_{t2}$.

2. Estimate the equation

\[
y_{it1} = x_{it1}\beta + \rho\hat{v}_{it2} + \theta_{t1}\alpha_{i1} + error_{it1} \tag{3.15}
\]

by GMM. Redefine $w_{it} = (x_{it1}, \hat{v}_{it2})$ and $\delta = (\beta', \rho)'$. Rewrite (3.15) as

\[
y_{it1} = w_{it}\delta + \theta_{t1}\alpha_{i1} + error_{it1}, \qquad t = 1,\dots,T. \tag{3.16}
\]

Under the following $T(T-1)(K+1)$ moment conditions in terms of the data available, GMM will be consistent for $\delta$ and $\theta_{t1}$:

\[
E\Big\{s_{it}s_{ir}w_{it}'\Big[(y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - w_{ir}\delta)\Big]\Big\} = 0, \qquad t = 1,\dots,T,\ r \neq t. \tag{3.17}
\]

Identification of the parameters is achieved when the expected derivative matrix derived from (3.17) achieves full column rank.

3. Test $H_0: \rho = 0$ using the t-statistic for $\hat{\rho}$. A statistic that uses the standard GMM standard error is valid. See Appendix A for the derivation of the GMM standard errors.

4. Correcting for sample selection bias

We will now consider correcting for selectivity bias when the selection variable is partially observed and when only the selection indicator is observed. In Section 3, we considered tests for selectivity bias when $h_{it}^*$ was partially observed or when only $s_{it}$ was observed.
However, it is important to note that the procedures outlined in Section 3 are not methods to correct for selection bias, only methods to test for selection bias. By quasi-differencing $\alpha_{i1}$ out of (2.1), we need to condition upon selection in at least two time periods to estimate $\beta$ and $\theta_{t1}$. In the methods that we propose below, we impose restrictions upon $\alpha_{i1}$, as in Wooldridge (1995), to avoid having to condition upon selection in at least two time periods. Although Dustmann and Rochina-Barrachina (2000) do condition upon selection in two time periods and correct for selection, they use semiparametric estimation to correct for selection bias. We will use an entirely parametric method to correct for selection bias, as in Wooldridge (1995), that does not restrict the distribution of $\alpha_{i1}$ given $(X_{i1}, v_{it2})$, where $X_{i1} = (x_{i11}, \dots, x_{iT1})'$.

Wooldridge (1995) estimates a regression equation with time constant unobserved effects. However, in our paper we consider a regression equation with a time varying unobserved effect. Despite the presence of time varying unobserved effects, it is still possible to impose structure on $\alpha_{i1}$ as in Wooldridge (1995). Chamberlain (1984) allows $\alpha_{i1}$ to depend on the entire history of $X_{i1}$. In the previous sections, we made no specific assumptions on the unobserved effect, $\alpha_{i1}$, in the regression equation. Despite the presence of the time varying effects in our model, we can still allow $\alpha_{i1}$ to depend on $X_{i1}$ as in Chamberlain (1984). However, to conserve on parameters and make estimation more tractable, we allow the unobserved effect in the regression equation to depend on the time averages of the explanatory variables as opposed to the entire history of $X_{i1}$. Although this is not as general as Chamberlain (1984), we still restrict $\alpha_{i1}$ in such a way as to avoid having to condition upon selection in two separate time periods. We can then extract the time varying parameters, $\theta_{t1}$, using a GMM or minimum distance procedure.
4.1 Selection corrections when the selection variable is partially observed

The main regression equation is still given by (2.1). In this section, we shall impose a linearity assumption relating $\alpha_{i1}$ to $\bar x_i$ and $v_{it2}$. Before we continue, we formalize the selection mechanism based upon Section 3.

Assumption 3: Define $s_{it}$ as in (3.1) and (3.2), where $v_{it2}$ is still independent of $x_i$ and $v_{it2} \sim \text{Normal}(0, r_{t2}^2)$, where $r_{t2}^2$ is defined in Section 3. Let $h_{it}$ be defined as in (3.13) with $h_{it} \equiv \max(0, h_{it}^*)$ and $s_{it} = 1[h_{it}^* \geq 0]$, $t = 1, \ldots, T$.

To correct for selectivity bias, we also make the following assumption.

Assumption 4: (i) $E(u_{it1} \mid x_i, v_{it2}) = E(u_{it1} \mid v_{it2}) = L(u_{it1} \mid v_{it2})$ and (ii) $E(\alpha_{i1} \mid x_i, v_{it2}) = L(\alpha_{i1} \mid 1, \bar x_i, v_{it2})$.

Since the entire history $(v_{i12}, \ldots, v_{iT2})$ does not appear in Assumption 4(i), we are allowing the serial dependence in $v_{it2}$ to be entirely unrestricted. Assumption 4(i) is a conditional mean independence condition that holds if $(u_{it1}, v_{it2})$ is independent of $x_i$. Other than assuming linearity of $E(\alpha_{i1} \mid x_i, v_{it2})$ in Assumption 4(ii), the distribution of $\alpha_{i1}$ given $(x_i, v_{it2})$ is unrestricted for all $t$. Assumption 4(ii) also holds when $(\alpha_{i1}, v_{it2})$ conditional on $x_i$ is bivariate normal. Since $E(u_{it1} \mid v_{it2})$ is assumed to be linear in Assumption 4(i), one can write

$$E(u_{it1} \mid v_{it2}) = \rho_t v_{it2} \tag{4.1}$$

where $\rho_t$ is a scalar. Using the Mundlak (1978) device for the unobserved effect in the regression equation, $\alpha_{i1} = \omega_1 + \bar x_i\eta_1 + c_{i1}$, where $c_{i1}$ is a zero mean random variable, the linear predictor in Assumption 4(ii) can be written as

$$L(\alpha_{i1} \mid 1, \bar x_i, v_{it2}) = \omega_1 + \bar x_i\eta_1 + \phi_t v_{it2} \tag{4.2}$$

Using a law of iterated expectations argument and Assumption 3, by conditioning on only $x_i$, we can write

$$E(\alpha_{i1} \mid x_i) = \omega_1 + \bar x_i\eta_1 + \phi_t E(v_{it2} \mid x_i) = \omega_1 + \bar x_i\eta_1 \tag{4.3}$$

since $E(v_{it2} \mid x_i) = 0$ by Assumption 3.
Using the Mundlak (1978) device for the unobserved effect, (2.1) can be rewritten as

$$y_{it1} = x_{it1}\beta + \theta_{t1}(\omega_1 + \bar x_i\eta_1 + c_{i1}) + u_{it1} \tag{4.4}$$

where $\omega_1$ is a scalar, $\bar x_i$ is $1 \times G$ and $\eta_1$ is $G \times 1$, with $G \geq K$. Now, under Assumptions 3 and 4, we can write

$$E(y_{it1} \mid x_i, v_{it2}) = \omega_{t1} + x_{it1}\beta + \bar x_i\theta_{t1}\eta_1 + \varphi_{t1}v_{it2} \tag{4.5}$$

where $\omega_{t1} = \theta_{t1}\omega_1$ represents year dummies, $\eta_{t1} = \theta_{t1}\eta_1$ and $\varphi_{t1} = \rho_t + \theta_{t1}\phi_t$. Unlike in Wooldridge (1995), because of the time-varying parameter on $\alpha_{i1}$, the intercept and the coefficients on $\bar x_i$ are time varying. Since $v_{ir2}$ for $r \neq t$ is not included in the conditioning set in (4.5), $v_{it2}$ is not strictly exogenous. Since $s_{it}$ is a function of $(x_i, v_{it2})$, (4.5) implies that

$$E(y_{it1} \mid x_i, v_{it2}, s_{it} = 1) = \omega_{t1} + x_{it1}\beta + \bar x_i\theta_{t1}\eta_1 + \varphi_{t1}v_{it2} \tag{4.6}$$

A GMM procedure that accounts for the first stage estimation of the Tobit residuals is required to estimate (4.6) and obtain valid inference.

Procedure 4.1

1. Define the residual function, $\hat v_{it2}$, from $T$ standard Tobit equations as in Procedure 3.2. For $s_{it} = 1$, define the $1 \times (1 + K + G + T)$ vector $\hat w_{it} = (1, \bar x_i, x_{it1}, 0, \ldots, \hat v_{it2}, 0, \ldots, 0)$.

2. Obtain $\hat\Upsilon = (\hat\omega_{11}, \ldots, \hat\omega_{T1}, \hat\beta', \hat\theta_{21}, \ldots, \hat\theta_{T1}, \hat\eta_1', \hat\varphi_{11}, \ldots, \hat\varphi_{T1})'$, a $(3T - 1 + K + G) \times 1$ parameter vector, from the non-linear GMM estimation of

$$y_{it1} = \hat w_{it}\Upsilon + \text{error}_{it1} \tag{4.7}$$

for $s_{it} = 1$. To correct for the generated regressor problem, one needs to stack the moment conditions implied by (4.7) on top of the first order conditions that generate the Tobit residuals from the first stage estimation. See Appendix B for details. Under Assumptions 3 and 4 and standard regularity conditions, $\hat\Upsilon$ is consistent and $\sqrt{N}$-asymptotically normal.

3. Obtain the asymptotic variance of $\hat\Upsilon$, which will give $\text{Avar}(\hat\varphi_{t1})$. From this, one can construct a Wald statistic with $T$ restrictions to test $H_0: \varphi_{t1} = 0$ for all $t$. A failure to reject $H_0$ indicates no evidence of sample selection bias.
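The Wald test in the final step of the procedure can be sketched as follows. This is a minimal illustration, not the estimation code used in the paper: the function names (`solve`, `chi2_sf`, `wald_test`) and the numbers in the example are hypothetical, and the variance matrix passed in is assumed to be the estimated variance of the selection coefficients themselves (that is, already divided by N). Only the Python standard library is used, so the linear solve and the chi-square tail probability are written out by hand.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def chi2_sf(w, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = w / 2.0
    if df % 2 == 0:
        q, j = math.exp(-half), 2          # closed form for df = 2
    else:
        q, j = math.erfc(math.sqrt(half)), 1  # closed form for df = 1
    while j < df:  # recurrence that raises the degrees of freedom by 2
        q += half ** (j / 2.0) * math.exp(-half) / math.gamma(j / 2.0 + 1.0)
        j += 2
    return q


def wald_test(phi_hat, var_phi):
    """Wald statistic phi' V^{-1} phi and its chi-square p-value."""
    w = sum(p * s for p, s in zip(phi_hat, solve(var_phi, phi_hat)))
    return w, chi2_sf(w, len(phi_hat))


# Hypothetical estimates for T = 2 selection coefficients.
w_stat, p_value = wald_test([0.2, -0.1], [[0.01, 0.0], [0.0, 0.0025]])
```

With these made-up inputs each coefficient contributes a squared t-ratio of 4, so the statistic is 8 on 2 degrees of freedom.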
4.2 Selection corrections when only the selection indicator is observed

When only the selection indicator is observed, we can modify our approach from the previous subsection to correct for selectivity bias. Rather than collecting Tobit residuals from a first stage estimation, we need to do a first stage probit estimation. Assumptions 3 and 4 are maintained in this section. Since the variance of $v_{it2}$ is not constant through time, we will need to re-scale the probit parameters as in Section 3. As in Section 3, $\text{Var}(v_{it2}) = \theta_{t2}^2\sigma_2^2 + 1 = r_{t2}^2$. When only the selection indicator is observed, we need to find the expectation of $y_{it1}$ given $(x_i, s_{it} = 1)$, which is

$$E(y_{it1} \mid x_i, s_{it} = 1) = \omega_{t1} + x_{it1}\beta + \bar x_i\theta_{t1}\eta_1 + \varphi_{t1}\lambda_{it2} \tag{4.8}$$

where $\lambda_{it2}$ denotes the inverse Mills ratio from the first stage probit estimation. Now we can specify the following procedure, which corrects for sample selection when only $s_{it}$ is observed.

Procedure 4.2

1. For each $t$, estimate (3.4) by standard probit and construct the inverse Mills ratio $\hat\lambda_{it2} \equiv \lambda(\hat\psi_{t2} + x_{it2}\hat\delta_{t2} + \bar x_i\hat\gamma_{t2})$. For $s_{it} = 1$, define the $1 \times (1 + K + G + T)$ vector $\hat w_{it} = (1, \bar x_i, x_{it1}, 0, \ldots, \hat\lambda_{it2}, 0, \ldots, 0)$.

2. Obtain $\hat\Upsilon = (\hat\omega_{11}, \ldots, \hat\omega_{T1}, \hat\beta', \hat\theta_{21}, \ldots, \hat\theta_{T1}, \hat\eta_1', \hat\varphi_{11}, \ldots, \hat\varphi_{T1})'$, a $(3T - 1 + K + G) \times 1$ structural parameter vector, from the non-linear GMM estimation of

$$y_{it1} = \hat w_{it}\Upsilon + \text{error}_{it1} \tag{4.9}$$

for $s_{it} = 1$. To correct for the generated regressor problem, one needs to stack the moment conditions implied by (4.9) on top of the first order conditions that generate the inverse Mills ratio from the first stage estimation. See Appendix B for details. Under Assumptions 3 and 4 and standard regularity conditions, $\hat\Upsilon$ is consistent and $\sqrt{N}$-asymptotically normal.

3. Obtain the asymptotic variance of $\hat\Upsilon$, which will give $\text{Avar}(\hat\varphi_{t1})$. From this, one can construct a Wald statistic with $T$ restrictions to test $H_0: \varphi_{t1} = 0$ for all $t$. A failure to reject $H_0$ indicates no evidence of sample selection bias.

Alternatively, we can estimate (4.8) using a minimum distance procedure.
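The inverse Mills ratio that enters the first step of the probit-based procedure is the standard normal density divided by the standard normal distribution function, evaluated at the fitted probit index. The sketch below shows that calculation for one observation. The function names and the numerical inputs are hypothetical, and the first stage coefficients are taken as given rather than estimated.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def probit_index(psi, x, delta, xbar, gamma):
    """Fitted index psi + x*delta + xbar*gamma for one (i, t) observation."""
    return (psi
            + sum(a * b for a, b in zip(x, delta))
            + sum(a * b for a, b in zip(xbar, gamma)))


def inverse_mills(index):
    """lambda(a) = phi(a) / Phi(a), the inverse Mills ratio at index a."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical first stage estimates and regressors for one observation.
a = probit_index(psi=0.1, x=[1.0, 2.0], delta=[0.3, -0.2],
                 xbar=[0.5], gamma=[0.4])
lam = inverse_mills(a)
```

For strongly negative indices the denominator approaches zero and this simple ratio becomes numerically unstable, so a careful implementation would treat the lower tail separately.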
In the next section, an empirical example using wage and experience data illustrates the use of minimum distance to estimate (4.8). To estimate $\Upsilon$, one first needs to estimate an entirely unrestricted version of (4.8) using pooled OLS. To account for the first stage estimation of $\hat\lambda_{it2}$, the standard errors for all parameters need to be adjusted. Details of the two step estimation technique that accounts for the generated regressor problem are given in Appendix C. The following procedure outlines a method to correct for selection bias when only $s_{it}$ is observed.

Procedure 4.3

1. For each $t$, estimate (3.4) by standard probit and construct the inverse Mills ratio $\hat\lambda_{it2} \equiv \lambda(\hat\psi_{t2} + x_{it2}\hat\delta_{t2} + \bar x_i\hat\gamma_{t2})$. For $s_{it} = 1$, define the $1 \times (1 + K + G + T)$ vector $\hat w_{it} = (1, \bar x_i, x_{it1}, 0, \ldots, \hat\lambda_{it2}, 0, \ldots, 0)$.

2. Estimate the following entirely unrestricted equation using pooled OLS:

$$y_{it1} = \omega_{t1} + x_{it1}\beta + \bar x_i\eta_{t1} + \varphi_{t1}\hat\lambda_{it2} + \text{error}_{it1} = \hat w_{it}\Gamma + \text{error}_{it1} \tag{4.10}$$

where $\hat w_{it} = (1, d2_t, \ldots, dT_t, x_{it1}, \bar x_i, d2_t \cdot \bar x_i, \ldots, dT_t \cdot \bar x_i, \hat\lambda_{it2}, d2_t \cdot \hat\lambda_{it2}, \ldots, dT_t \cdot \hat\lambda_{it2})$ is a $1 \times (2T + K + TG)$ vector and $\Gamma$ is a $(2T + K + TG) \times 1$ vector of reduced form parameters. The pooled OLS estimator on the selected sample is written as

$$\hat\Gamma = \left(\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat w_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'y_{it1}\right) \tag{4.11}$$

3. Obtain $\widehat{\text{Avar}}(\hat\Gamma)$ from the pooled OLS estimation of (4.10); its inverse is the optimal weighting matrix for minimum distance estimation of $\Upsilon$. See Appendix C for the derivation of $\text{Avar}(\hat\Gamma)$ that accounts for the first stage estimation of $\hat\lambda_{it2}$. The efficient CMD estimator of $\Upsilon$ solves

$$\min_{\Upsilon}\ \{\hat\Gamma - H(\Upsilon)\}'\,[\widehat{\text{Avar}}(\hat\Gamma)]^{-1}\,\{\hat\Gamma - H(\Upsilon)\} \tag{4.12}$$

where

$$H(\Upsilon) = (\omega_{11}, \ldots, \omega_{T1}, \beta', \eta_1', \theta_{21}\eta_1', \ldots, \theta_{T1}\eta_1', \varphi_{11}, \ldots, \varphi_{T1})'$$

is the mapping from the structural parameters, $\Upsilon$, to the reduced form parameters, $\Gamma$.

4. Obtain the asymptotic variance of $\hat\Upsilon$, which will give $\text{Avar}(\hat\varphi_{t1})$. From this, one can construct a Wald statistic with $T$ restrictions to test $H_0: \varphi_{t1} = 0$ for all $t$. A failure to reject $H_0$ indicates no evidence of sample selection bias.

5.
Empirical Application: A wage offer equation

In this section we consider estimation of a wage offer equation with possible selectivity bias. We consider a test of selectivity bias when only the selection indicator is observed. We consider three versions of Procedure 3.1, two of which use the HNR (1988) transformation to eliminate the unobserved effect. The first version of Procedure 3.1 uses the following $(T-1)(K+1)$ moment conditions:

$$E\left\{ s_{it}s_{i,t-1}\,w_{it}'\left[(y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{t-1,1}}(y_{i,t-1,1} - w_{i,t-1}\delta)\right]\right\} = 0, \quad t = 2, \ldots, T \tag{5.1}$$

The second version of Procedure 3.1 uses $(T-1)^2(K+1)$ moment conditions for $t = 2, \ldots, T$:

$$E\left\{ s_{it}s_{i,t-1}\,w_{is}'\left[(y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{t-1,1}}(y_{i,t-1,1} - w_{i,t-1}\delta)\right]\right\} = 0 \tag{5.2}$$

The third version of Procedure 3.1 uses $T(T-1)(K+1)$ moment conditions and is identical to (3.12):

$$E\left\{ s_{it}s_{ir}\,w_{it}'\left[(y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - w_{ir}\delta)\right]\right\} = 0, \quad t = 1, \ldots, T,\ r \neq t \tag{5.3}$$

Adding more valid moment conditions weakly increases asymptotic efficiency and does not violate the assumptions we have made above. Finally, using efficient classical minimum distance estimation, we correct for selectivity bias using Procedure 4.3.

Using data on wages, experience, education, and age, we test for selectivity bias using Procedure 3.1. The wage offer equation we estimate is similar to the one estimated in Dustmann and Rochina-Barrachina (2000) and Semykina and Wooldridge (2005), except that we assume that all regressors in the wage offer equation are exogenous. Unlike in Dustmann and Rochina-Barrachina (2000) and Semykina and Wooldridge (2005), the unobserved effect that represents innate skill in the wage offer equation has a price attributed to it that can vary over time. The data used in the estimation of the wage offer equation come from the Panel Study of Income Dynamics (PSID) for the years 1980-1992 and are also used in Semykina and Wooldridge (2005). The sample includes 877 individuals and has data on wage, age, experience, education and labor force participation.
The dependent variable in the wage equation is the log of real average hourly earnings, expressed in 1983 dollars and defined as the ratio of the individual's annual labor income to annual hours worked. The explanatory variables in the wage equation are experience, experience squared, and year dummies. As specified in Procedure 3.1, a test for selectivity bias is conducted by testing the significance of the coefficient on the inverse Mills ratio, which is generated from a first stage probit estimation of the participation equation. A participation indicator is the dependent variable in the selection equation. The regressors in the selection equation are experience, experience squared, education, age, age squared, an indicator for marital status, other family income and its square, the number of children in three age categories, spouse's education, spouse's age, spouse's age squared, the product of spouse's age and education, duration of spouse's unemployment, and a binary indicator specifying whether the spouse's duration of unemployment was recorded or not.

Table 1 reports the summary statistics for the variables used in estimation. Table 2 reports the parameter estimates, standard errors and significance levels for the regressors, the inverse Mills ratio, and the ratios of the time varying parameters, $\theta_{t1}/\theta_{t-1,1}$, using the GMM moment conditions specified in (5.1) and (5.2). To test for the presence of time varying effects, we test the null hypothesis that $\theta_{t1}/\theta_{t-1,1} = 1$. As a reference, Table 2 also reports parameters and robust standard errors from OLS, FE, and a FE selection test without time varying parameters. Table 3 reports parameters and standard errors using the GMM moment conditions implied by (5.3), where, to test for the presence of time varying effects, we test the null hypothesis that $\theta_t = 1$.
As can be seen from Table 2, the version of Procedure 3.1 that uses $(T-1)(K+1)$ moment conditions fails to reject the null hypothesis of no selectivity bias at the 10% level. There is also little evidence that a time varying price can be attributed to innate ability. The experience parameters are significant at the 10% level, and the return to experience becomes negative after about 28.5 years. The J-test rejects the null hypothesis of correct specification at the 5% level but not at the 1% level, indicating that not all of the moment conditions used are valid. However, when the wage equation is estimated using time demeaned regressors, the test for selectivity bias rejects the null hypothesis at the 5% level. The inclusion of time varying fixed effects seems to make the selection term less significant when the wage equation is estimated using GMM with $(T-1)(K+1)$ moment conditions.

However, the version of Procedure 3.1 that uses $(T-1)^2(K+1)$ moment conditions rejects the null hypothesis of no selectivity bias at the 1% level. There is also stronger evidence that a price can be attributed to a worker's skill set. The experience parameters are both significant at the 1% level, and the return to experience becomes negative after about 25.3 years. The J-test rejects the null hypothesis of correct specification at the 1% level, indicating that the use of lagged and future experience as instruments may not be valid. However, the use of more moment conditions decreases the standard errors by a significant amount, which is not surprising. The return to experience becomes negative after 23.75 years for pooled OLS, and after 58.6 years for the FE selection test procedure.

Table 3, which shows the results from using the $T(T-1)(K+1)$ moment conditions implied by (5.3), rejects the null hypothesis of no selectivity bias at the 1% level. The experience parameters are also significant at the 1% level, as are most of the time varying parameters.
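The turning points quoted in this discussion follow from the quadratic specification in experience: with coefficients b1 on experience and b2 on experience squared, the marginal return b1 + 2*b2*exper reaches zero at exper = -b1/(2*b2). The short sketch below reproduces that arithmetic from the rounded coefficients reported in Table 2. The function name `turning_point` and the dictionary layout are choices made for this illustration only.

```python
def turning_point(b_exp, b_exp2):
    """Experience level at which b_exp + 2 * b_exp2 * exper equals zero."""
    return -b_exp / (2.0 * b_exp2)


# (coefficient on Exp, coefficient on Exp squared), rounded as in Table 2.
table2 = {
    "FE Mills": (0.082, -0.0007),
    "Procedure 3.1, (T-1)(K+1) moments": (0.0912, -0.0016),
    "Procedure 3.1, (T-1)^2(K+1) moments": (0.1314, -0.0026),
}

for name, (b1, b2) in table2.items():
    print(f"{name}: return to experience turns negative after "
          f"{turning_point(b1, b2):.1f} years")
```

Because the published coefficients are rounded, a turning point recomputed this way can differ slightly from a figure based on unrounded estimates.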
The return to experience using the specification in (5.3) becomes negative after about 86.3 years. The J-test rejects the null hypothesis at the 1% level. As the results from Tables 2 and 3 show, the model that we are testing may be somewhat misspecified, although this is not surprising considering that we used experience and experience squared as instruments. One possible reason why the J-test rejects the null hypothesis in all of the specifications used is that past wage shocks affect the number of years of experience we observe for workers today. There may also be some serial correlation in the errors of our model. Overall, the results from Tables 2 and 3 show that there is some evidence of selection bias and therefore a need for a correction.

Table 4 reports the results for the minimum distance procedure that corrects for selectivity bias. The Wald test for selection bias rejects the null hypothesis of no selection bias at the 5% level. The experience parameters are both significant at the 5% level. However, the return to experience becomes negative only after 103.6 years, much later than in Procedure 3.1. Correcting for selection bias seems to decrease the rate at which the return to experience deteriorates. Overall, the return to experience never becomes negative over the length of the panel in any of the procedures reported in Tables 2, 3 and 4. To test for the presence of time varying effects, we test the null hypothesis that $\theta_t = 1$. As shown in Table 4, for most years there appears to be a statistically significant time varying price that can be attributed to skill.

6. Conclusion

In this paper, we have shown how to test and correct for selectivity bias in a model with time varying unobserved effects. Due to the time varying parameter on the unobserved effect, standard OLS techniques are not sufficient to estimate the parameters of interest.
Due to the non-linear nature of the moment conditions specified in the paper, a GMM procedure is required to estimate the parameters. The methods specified in this paper should be useful when the econometrician suspects the presence of time varying unobserved effects. By ignoring the presence of a time varying parameter on the unobserved effect, a researcher risks misspecifying the model when using standard OLS techniques to test and correct for selectivity bias. Future research can involve weakening the strict exogeneity assumption on the regressors.

Appendix A: GMM STANDARD ERRORS FOR TESTING

For brevity, we derive the GMM standard errors for the variable addition test when only the selection indicator is observed. The method that calculates the GMM standard errors when the selection variable is partially observed is similar. Consider the following equation

$$y_{it1} = w_{it}\delta + \theta_{t1}\alpha_{i1} + \text{error}_{it1}, \quad t = 1, \ldots, T \tag{A.1}$$

where $w_{it} = (x_{it1}, \hat\lambda_{it2})$ and $\delta = (\beta', \rho)'$. In terms of the data available, GMM is consistent for $\delta$ and $\theta_{t1}$ when the following $T(T-1)(K+1)$ moment conditions hold:

$$E\left\{ s_{it}s_{ir}\,w_{it}'\left[(y_{it1} - w_{it}\delta) - \frac{\theta_{t1}}{\theta_{r1}}(y_{ir1} - w_{ir}\delta)\right]\right\} \equiv E[b_{i1}(\zeta)] = 0, \quad t = 1, \ldots, T,\ r \neq t \tag{A.2}$$

where $\zeta = (\delta', \theta_{21}, \ldots, \theta_{T1})'$. Define the sample average of $b_{i1}(\zeta)$ by

$$\bar b_{N1}(\zeta) = N^{-1}\sum_{i=1}^{N} b_{i1}(\zeta) \tag{A.3}$$

Then $\hat\zeta$ solves $\min_{\zeta}\ \bar b_{N1}(\zeta)'\,\hat V_{11}^{-1}\,\bar b_{N1}(\zeta)$, where $V_{11} = E[b_{i1}(\zeta)b_{i1}(\zeta)']$ and $\hat V_{11} = (1/N)\sum_{i=1}^{N}\hat b_{i1}\hat b_{i1}'$. Using standard results from GMM and a Taylor series expansion,

$$\sqrt{N}(\hat\zeta - \zeta) \xrightarrow{d} \text{Normal}\left(0, (B_1'V_{11}^{-1}B_1)^{-1}\right) \tag{A.4}$$

where $B_1 = E(\partial b_{i1}/\partial\zeta)$ is the expected derivative matrix and $V_{11}^{-1}$ is the optimal weighting matrix for the moment conditions. The asymptotic covariance matrix $(B_1'V_{11}^{-1}B_1)^{-1}$ can be estimated from a GMM procedure. Standard errors can then be obtained directly from the estimated covariance matrix.

Appendix B: GMM STANDARD ERROR CORRECTION

In Section 4, we derived a two-step procedure that tests and corrects for selectivity bias in an unbalanced panel with time varying unobserved effects.
Because the inverse Mills ratio or Tobit residuals are generated from the first stage estimation of the selection equation, GMM estimation of the regression equation in the second stage can give unreliable standard errors, leading to an imprecise test of significance for $\delta$ in (4.7) and (4.9). In this section, we derive a GMM method that corrects for the generated regressor problem when $H_0$ is rejected in Procedure 4.1 or 4.2. Wooldridge (1995) uses a method similar to Newey (1984) and Pagan (1984) to adjust the standard errors derived from the second stage estimation of the regression equation in the presence of generated regressors. Applying the general framework in Wooldridge (2002, Chapter 14) and Newey and McFadden (1994), we use a GMM method that stacks the first order conditions from the first stage probit or Tobit on top of the GMM moment conditions implied by estimating (4.7) and (4.9). This method gives valid standard errors for the selection correction.

GMM estimation when first stage estimation is a probit or a tobit

Consider the first stage estimation of the probit selection equation,

$$P(s_{it} = 1 \mid x_i) = \Phi\left(\frac{\psi_2 + x_{it2}\delta_2 + \bar x_i\gamma_2}{r_{t2}}\right) = \Phi(\psi_{t2} + x_{it2}\delta_{t2} + \bar x_i\gamma_{t2}) \tag{B.1}$$

The parameters of this equation can be estimated by running $T$ cross-section probits. For each $(i, t)$, the implied score from the probit log likelihood is

$$d_{it}(\zeta_t) = \frac{\phi(z_{it2}\zeta_t)\,z_{it2}'\,[s_{it} - \Phi(z_{it2}\zeta_t)]}{\Phi(z_{it2}\zeta_t)[1 - \Phi(z_{it2}\zeta_t)]} \tag{B.2}$$

where $z_{it2} = (1, x_{it2}, \bar x_i)$, $\zeta_t = (\psi_{t2}, \delta_{t2}', \gamma_{t2}')'$, and $\phi(\cdot)$ is the standard normal density, the derivative of $\Phi(\cdot)$. For each $i$, stack the $d_{it}(\zeta_t)$ to get $d_i(\zeta)$. The following GMM moment condition consistently estimates $\delta$ from (4.9):

$$E[s_{it}\,w_{is}'(y_{it1} - w_{it}\delta)] = 0, \quad t = 1, \ldots, T,\ s \neq t \tag{B.3}$$

Stacking this moment condition on top of the moment conditions implied by running $T$ cross-section probits,

$$E\begin{bmatrix} s_{it}\,w_{is}'(y_{it1} - w_{it}\delta) \\ d_i(\zeta) \end{bmatrix} = 0 \tag{B.4}$$

provides reliable standard errors for $\hat\delta$.
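To make the probit score concrete, the sketch below evaluates it for a single observation and compares it with a numerical derivative of the probit log likelihood. The function names and the numbers are hypothetical and serve only as a check on the formula; they are not taken from the original text.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def probit_loglik(s, z, zeta):
    """Log likelihood contribution of one observation with outcome s in {0, 1}."""
    p = N01.cdf(sum(zk * ck for zk, ck in zip(z, zeta)))
    return math.log(p) if s == 1 else math.log(1.0 - p)


def probit_score(s, z, zeta):
    """Score vector: phi(z zeta) z' [s - Phi(z zeta)] / {Phi (1 - Phi)}."""
    idx = sum(zk * ck for zk, ck in zip(z, zeta))
    p = N01.cdf(idx)
    scale = N01.pdf(idx) * (s - p) / (p * (1.0 - p))
    return [scale * zk for zk in z]


def numerical_gradient(s, z, zeta, h=1e-6):
    """Central finite-difference gradient of the log likelihood."""
    grad = []
    for k in range(len(zeta)):
        up = zeta[:k] + [zeta[k] + h] + zeta[k + 1:]
        dn = zeta[:k] + [zeta[k] - h] + zeta[k + 1:]
        grad.append((probit_loglik(s, z, up) - probit_loglik(s, z, dn)) / (2 * h))
    return grad


z_example = [1.0, 0.5, -1.2]     # hypothetical (1, x, xbar) row
zeta_example = [0.3, -0.2, 0.1]  # hypothetical probit coefficients
analytic = probit_score(1, z_example, zeta_example)
numeric = numerical_gradient(1, z_example, zeta_example)
```

In the stacked system, one such score vector per period would be placed beneath the second stage moment functions for each individual.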
By applying the method discussed in Appendix A, one can derive the asymptotic standard errors using the standard GMM variance formulas. When the selection equation is a tobit, simply run $T$ cross-section tobits. For each $(i, t)$, generate the score, or first order condition, for $\zeta_t$. As in the probit case, for each $i$ stack the score conditions to generate $d_i(\zeta)$ and then use (B.4) to estimate $\delta$.

Appendix C: DERIVATION OF THE OPTIMAL WEIGHTING MATRIX

The optimal weighting matrix for minimum distance estimation is derived from the asymptotic variance of the unrestricted pooled OLS regression (4.10). Since (4.10) contains a generated regressor, the standard errors from the estimation of (4.10) need to be adjusted to account for the first stage estimation of the selection terms. In the presence of generated regressors, minimum distance estimation of the structural parameters of interest will not be asymptotically efficient if the weighting matrix does not take into account the first stage estimation of the selection terms. This appendix outlines the method to derive an asymptotically valid two-step reduced form asymptotic variance matrix and optimal MD weighting matrix. It has been shown in Section 4 that

$$E(y_{it1} \mid w_{it}, s_{it} = 1) = w_{it}\Gamma \tag{C.1}$$

where $w_{it} = (1, d2_t, \ldots, dT_t, x_{it1}, \bar x_i, d2_t \cdot \bar x_i, \ldots, dT_t \cdot \bar x_i, \lambda_{it2}, d2_t \cdot \lambda_{it2}, \ldots, dT_t \cdot \lambda_{it2})$ is a $1 \times (2T + K + TG)$ vector and $\Gamma = (\omega_{11}, \ldots, \omega_{T1}, \beta', \eta_{11}', \ldots, \eta_{T1}', \varphi_{11}, \ldots, \varphi_{T1})'$ is a $(2T + K + TG) \times 1$ vector. We can write (C.1) in error form as

$$y_{it1} = w_{it}\Gamma + e_{it1}, \quad t = 1, \ldots, T \tag{C.2}$$

where $E(e_{it1} \mid w_{it}, s_{it} = 1) = 0$. The pooled OLS estimator on the selected sample, after inserting the estimated selection terms from the first stage probit estimation, is

$$\hat\Gamma = \left(\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat w_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'y_{it1}\right) \tag{C.3}$$

Using the fact that $y_{it1} = w_{it}\Gamma + e_{it1} = \hat w_{it}\Gamma + (w_{it} - \hat w_{it})\Gamma + e_{it1}$, we can write (C.3) as

$$\sqrt{N}(\hat\Gamma - \Gamma) = A_0^{-1}N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'[(w_{it} - \hat w_{it})\Gamma + e_{it1}] + o_p(1) \tag{C.4}$$

where $A_0 = E\left(\sum_{t=1}^{T} s_{it}w_{it}'w_{it}\right)$.
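Using a mean value expansion as in Wooldridge (2002), Appendix 6A, and the fact that $E(e_{it1} \mid w_{it}, s_{it} = 1) = 0$,

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'[(w_{it} - \hat w_{it})\Gamma + e_{it1}] = -E\left[\sum_{t=1}^{T} s_{it}w_{it}'\Gamma'\nabla_\pi w_{it}'\right]\sqrt{N}(\hat\pi - \pi) + N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}w_{it}'e_{it1} + o_p(1) \tag{C.5}$$

where $\pi = (\psi_{12}, \delta_{12}', \gamma_{12}', \ldots, \psi_{T2}, \delta_{T2}', \gamma_{T2}')'$ is a $(1 + 2K_1)T \times 1$ vector and $\nabla_\pi w_{it}'$ is the $(2T + K + TG) \times (1 + 2K_1)T$ Jacobian of $w_{it}'$ with respect to $\pi$. Since $\hat\pi$ is a vector of probit maximum likelihood estimators for each $t$, it has the following representation:

$$\sqrt{N}(\hat\pi - \pi) = N^{-1/2}\sum_{i=1}^{N} r_i(\pi) + o_p(1) \tag{C.6}$$

Therefore,

$$N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'[(w_{it} - \hat w_{it})\Gamma + e_{it1}] = N^{-1/2}\sum_{i=1}^{N}\left[\sum_{t=1}^{T} s_{it}w_{it}'e_{it1} - D\,r_i(\pi)\right] + o_p(1) \tag{C.7}$$

where $D = E\left[\sum_{t=1}^{T} s_{it}w_{it}'\Gamma'\nabla_\pi w_{it}'\right]$. So, using (C.7), we can write

$$\sqrt{N}(\hat\Gamma - \Gamma) \overset{a}{\sim} \text{Normal}(0, A_0^{-1}B_0A_0^{-1}) \tag{C.8}$$

where $B_0 = \text{Var}\left(\sum_{t=1}^{T} s_{it}w_{it}'e_{it1} - D\,r_i(\pi)\right) \equiv \text{Var}[p_i(\Gamma, \pi)]$. A consistent estimate of $\text{Avar}[\sqrt{N}(\hat\Gamma - \Gamma)]$ can easily be generated by replacing unknown parameters with consistent estimators. Define

$$\hat A \equiv N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat w_{it} \quad \text{and} \quad \hat B \equiv N^{-1}\sum_{i=1}^{N}\hat p_i(\hat\Gamma, \hat\pi)\,\hat p_i(\hat\Gamma, \hat\pi)'$$

where $\hat e_{it1} = y_{it1} - \hat w_{it}\hat\Gamma$, $\hat D = N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat\Gamma'\nabla_\pi\hat w_{it}'$, and $\hat r_i(\hat\pi)$ is $r_i$ evaluated at $\hat\pi$. To compute $\hat D$ and $\hat r_i(\hat\pi)$, define $z_{it2} = (1, x_{it2}, \bar x_i)$. The Jacobian $\nabla_\pi\hat w_{it}'$ is a block matrix with all zeros except in one block. Using the expression for the derivative of the inverse Mills ratio from Wooldridge (2002, p. 522),

$$\nabla_\pi\hat w_{it}' = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -z_{it2}\hat\lambda_{it2}(z_{it2}\hat\pi_t + \hat\lambda_{it2}) & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{C.9}$$

where the $1 \times (1 + 2K_1)$ row vector $-z_{it2}\hat\lambda_{it2}(z_{it2}\hat\pi_t + \hat\lambda_{it2})$ appears in row $1 + G + K + t$, starting at column $(1 + 2K_1)(t - 1) + 1$. Since $\hat\Gamma'\nabla_\pi\hat w_{it}' = -\hat\varphi_{t1}z_{it2}\hat\lambda_{it2}(z_{it2}\hat\pi_t + \hat\lambda_{it2})$,

$$\hat D = -N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\hat w_{it}'\hat\varphi_{t1}z_{it2}\hat\lambda_{it2}(z_{it2}\hat\pi_t + \hat\lambda_{it2}) \tag{C.10}$$

From standard results for probit estimation, for each $i$ and $t$, the $(1 + 2K_1) \times 1$ vector $\hat r_{it}(\hat\pi_t)$ is written as

$$\hat r_{it}(\hat\pi_t) = \hat A_t^{-1}\{\Phi(z_{it2}\hat\pi_t)[1 - \Phi(z_{it2}\hat\pi_t)]\}^{-1}\phi(z_{it2}\hat\pi_t)\,z_{it2}'[s_{it} - \Phi(z_{it2}\hat\pi_t)] \tag{C.11}$$

where

$$\hat A_t = N^{-1}\sum_{i=1}^{N}\frac{\{\phi(z_{it2}\hat\pi_t)\}^2 z_{it2}'z_{it2}}{\Phi(z_{it2}\hat\pi_t)[1 - \Phi(z_{it2}\hat\pi_t)]} \tag{C.12}$$

is a consistent estimator of the expected Hessian.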
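The derivative of the inverse Mills ratio that underlies the Jacobian block in this appendix can be verified in a few lines. The derivation below is a standard calculus result added for convenience. It uses only the definition $\lambda(a) = \phi(a)/\Phi(a)$ and the fact that $\phi'(a) = -a\phi(a)$, where $a$ is a generic scalar index introduced here for illustration.

```latex
\begin{align*}
\lambda'(a)
  &= \frac{\phi'(a)\,\Phi(a) - \phi(a)^2}{\Phi(a)^2}
   = \frac{-a\,\phi(a)\,\Phi(a) - \phi(a)^2}{\Phi(a)^2}
   = -\lambda(a)\,[\,a + \lambda(a)\,], \\[4pt]
\frac{\partial\,\lambda(z_{it2}\pi_t)}{\partial \pi_t'}
  &= \lambda'(z_{it2}\pi_t)\, z_{it2}
   = -\lambda_{it2}\,(z_{it2}\pi_t + \lambda_{it2})\, z_{it2}.
\end{align*}
```

The second line is the chain rule applied to the index $z_{it2}\pi_t$; multiplying it by the reduced form coefficient on the inverse Mills ratio gives the nonzero block that enters the correction term.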
For each $i$, stack the $\hat r_{it}(\hat\pi_t)$ to get $\hat r_i(\hat\pi)$. Once $\hat D$ and $\hat r_i(\hat\pi)$ have been computed, we can compute the asymptotic variance matrix of $\hat\Gamma$. Note that $\widehat{\text{Avar}}(\hat\Gamma) = \hat A^{-1}\hat B\hat A^{-1}/N$. Finally, use the inverse of $\widehat{\text{Avar}}(\hat\Gamma)$ as the optimal weighting matrix in (4.12).

Table 1. Summary Statistics. Mean values, standard deviations in parentheses.

Variable                                         Entire Sample      Participants       Non-Participants
Participation Indicator                          0.74               1                  0
Log Real Wage                                    —                  1.94 (0.62)        —
Years of Experience                              11.79 (7.76)       12.98 (7.58)       8.49 (7.28)
Years of Education                               12.93 (2.30)       13.12 (2.27)       12.40 (2.31)
Age                                              40.93 (10.27)      40.13 (9.62)       43.14 (11.63)
Married Indicator                                0.86               0.84               0.93
Other Household Income (thousands)               34.398 (40.379)    30.945 (30.868)    44.007 (58.237)
Spouse's Age                                     37.00 (18.17)      35.17 (18.28)      42.10 (16.84)
Spouse's Education                               11.21 (5.25)       10.98 (5.50)       11.84 (4.40)
Spouse's Unemployment Duration (weeks)           0.97 (4.96)        0.94 (4.80)        1.06 (5.36)
Weeks Unreported (=1 if spouse's
  unemployment not reported)                     0.09               0.06               0.16
Children Aged 0-2                                0.14 (0.37)        0.11 (0.33)        0.21 (0.45)
Children Aged 3-5                                0.18 (0.42)        0.16 (0.40)        0.24 (0.48)
Children Aged 6-17                               0.82 (1.01)        0.84 (1.01)        0.77 (0.99)
Number of Obs.                                   11,401             8,387              3,014

Table 2. Estimates for Wage Equation. Robust standard errors in parentheses. Year dummies are included but not reported.

              POLS                   FE                     FE Mills               Procedure 3.1 (a)      Procedure 3.1 (b)
Exp           0.077*** (0.0062)      0.083*** (0.0101)      0.082*** (0.0099)      0.0912** (0.0338)      0.1314*** (0.0059)
Exp²          −0.0016*** (0.00017)   −0.0009*** (0.00017)   −0.0007*** (0.00016)   −0.0016* (0.0008)      −0.0026*** (0.0001)
IMR           —                      —                      −0.134*** (0.0385)     −0.0813 (0.0812)       −0.1421*** (0.0135)
θ2/θ1         —                      —                      —                      1.0371 (0.0912)        0.9363*** (0.0166)
θ3/θ2         —                      —                      —                      0.9786 (0.0882)        0.9643** (0.0149)
θ4/θ3         —                      —                      —                      1.1118 (0.1446)        0.9722* (0.0168)
θ5/θ4         —                      —                      —                      0.7842 (0.1490)        0.8758*** (0.0178)
θ6/θ5         —                      —                      —                      1.0757 (0.2251)        0.9832 (0.0203)
θ7/θ6         —                      —                      —                      0.7963 (0.1322)        0.7394*** (0.0209)
θ8/θ7         —                      —                      —                      0.9723 (0.1422)        0.7671*** (0.0192)
θ9/θ8         —                      —                      —                      0.8156 (0.1382)        0.7471*** (0.0203)
θ10/θ9        —                      —                      —                      1.2537 (0.2080)        0.8775*** (0.0223)
θ11/θ10       —                      —                      —                      0.8604 (0.1695)        0.6784*** (0.0227)
θ12/θ11       —                      —                      —                      0.8055 (0.1981)        0.5766*** (0.0296)
θ13/θ12       —                      —                      —                      0.6994 (0.3279)        0.2112*** (0.0498)
J-Test        —                      —                      —                      19.0169                484.0668
(P-Value)                                                                          (0.0250)               (0.00416)

(a) $(T-1)(K+1)$ moment conditions. (b) $(T-1)^2(K+1)$ moment conditions.
*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

Table 3. Estimates for Wage Equation. Year dummies included but not reported.

              Procedure 3.1 (a)
Exp           0.1036*** (0.0030)
Exp²          −0.0006*** (0.0001)
IMR           −0.1670*** (0.0068)
θ2            0.9596*** (0.0031)
θ3            1.0550*** (0.0045)
θ4            1.0661*** (0.0048)
θ5            0.9900** (0.0046)
θ6            1.0349*** (0.0055)
θ7            1.0444*** (0.0059)
θ8            0.9705*** (0.0056)
θ9            0.9585*** (0.0057)
θ10           1.0213*** (0.0063)
θ11           1.0093 (0.0066)
θ12           1.0236*** (0.0070)
θ13           0.9610*** (0.0072)
J-Test        543.2125
(P-Value)     (0.0006)

(a) $T(T-1)(K+1)$ moment conditions.
*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

Table 4. Estimates for Wage Equation. Minimum Distance Estimation. Year dummies included but not reported.

              Procedure 4.3
Exp           0.0829*** (0.0094)
Exp²          −0.0004** (0.0002)
IMR           −0.3547*** (0.0667)
IMR81         0.0145 (0.0640)
IMR82         −0.0302 (0.0638)
IMR83         −0.1212* (0.0686)
IMR84         0.0710 (0.0801)
IMR85         0.1339* (0.0725)
IMR86         0.1462* (0.0751)
IMR87         0.2918*** (0.0810)
IMR88         0.1294 (0.0843)
IMR89         0.0204 (0.0907)
IMR90         0.0080 (0.0959)
IMR91         0.0407 (0.1006)
IMR92         0.2060* (0.1055)
θ2            1.0713*** (0.0264)
θ3            1.0730** (0.0305)
θ4            1.0665* (0.0350)
θ5            1.0754* (0.0388)
θ6            1.1652*** (0.0451)
θ7            1.0277 (0.0438)
θ8            1.1550*** (0.0486)
θ9            1.1462*** (0.0513)
θ10           1.1361** (0.0605)
θ11           1.1561** (0.0618)
θ12           1.1967*** (0.0644)
θ13           1.1781** (0.0736)
Wald Test for Selection Bias, $\chi^2_{13}$:  133.44

*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

CHAPTER 2

1. Introduction

In the panel data literature, theoretical and applied econometricians have been interested in problems dealing with missing data and sample selection bias. For example, Wooldridge (1995) developed a fixed effects approach to test and correct for selection bias in linear models. Terza (1998) considered exponential models with sample selection and exogeneity in the case of a cross-section. Econometricians have also turned their attention to non-linear panel data models. Hausman, Hall, and Griliches (HHG) (1984) developed and estimated a fixed-effects Poisson (FEP) panel data model under various distributional assumptions. HHG (1984) applied their model to examine the relationship between R&D expenditure and the number of patents a firm receives in a given time period. Wooldridge (1999) extended the work of HHG (1984) by proving that the quasi-CMLE is robust to any distributional misspecification under a conditional mean assumption. Both HHG (1984) and Wooldridge (1999) assumed strict exogeneity conditional on an unobserved effect. Blundell, Griffith, and Windmeijer (BGW) (2002) use GMM to estimate balanced count panel data models with strictly exogenous regressors. BGW (2002) also consider count data models with feedback. Wooldridge (1997b) considers general non-linear panel data models under a weaker sequential exogeneity assumption that allows for feedback from the lagged dependent variable to future explanatory variables. Under a strict exogeneity assumption, however, feedback is not allowed. This paper will only consider methods to test and correct for selection bias in non-linear panel data models under a strict exogeneity assumption.
As has been shown in the literature, the assumption of strict exogeneity may not be entirely realistic. For example, the number of patents a firm gets today can influence its R&D expenditure in the future, or a wage shock in the past can affect the number of years of experience we observe today.

The FEP estimator that we consider in this paper can also be applied to data on hourly earnings. When labor economists want to estimate the return to experience and education, they often use log wage as their dependent variable. One can also consider using wage data in level form to measure the return to experience and education. Such a specification for wages, experience, and education fits within non-linear panel data models with an exponential mean function. In a selection context, the wage may not be observed if a person does not accept a job offer or does not work a certain number of hours. Estimating the return to experience with missing wage data can lead to selection bias. Likewise, in the R&D and patent literature, the number of patents awarded to a firm in a given year may not be observed if that firm has gone into bankruptcy. Estimating the relationship between patents and R&D expenditure with missing data on the number of patents awarded to a firm in a certain year can also lead to inconsistent estimates due to attrition bias.

The objective of this paper is to extend the work of Terza (1998) to cases where there is possible selection bias in a panel setting. We will use the estimation strategies proposed by Wooldridge (1999) and BGW (2002) to test and correct for selection bias in the presence of only strictly exogenous regressors. This paper will review the conditions needed for consistency in a balanced panel and then derive the conditions needed for consistency in an unbalanced panel. A procedure to test and correct for selection bias in non-linear panel data models will also be outlined. The plan of this paper is as follows.
Section 2 will derive the conditions needed for consistency in a balanced and unbalanced panel using the desirable robustness properties of the Poisson QCMLE. Sections 3 and 4 provide procedures to test and correct for selection bias. An empirical example in Section 5 will illustrate the estimation methods shown in Sections 3 and 4. Section 6 concludes. An appendix is included to derive the asymptotic variance matrix needed for selection corrections.

2. Consistency of nonlinear models in balanced and unbalanced panels

2.1 Consistency in a balanced panel

This subsection briefly summarizes the conditions needed for consistency of count data models in a balanced panel. Recent econometric literature has focused on non-linear panel data models. A typical example of a non-linear model that econometricians often focus on is a count data model with an exponential mean function. Chamberlain (1992) considered efficient estimation in the exponential mean case when the regressors are sequentially exogenous. Wooldridge (1997b) and Wooldridge (1999) considered a more general class of nonlinear models with a multiplicative unobserved effect. Even when the population model has an exponential form, the mean function conditional on a selected part of the population will not have an exponential form. So, let $\{(x_i, y_i, v_i) : i = 1, \ldots, N\}$ denote a random draw from the population and let $t$ denote a particular time period. For the balanced panel, we observe $(y_{it}, x_{it})$ for $t = 1, \ldots, T$. The asymptotic analysis is done for a fixed number of time periods, $T$, with the size of the cross-section, $N$, tending to infinity. In order to achieve consistency for a balanced panel, we must assume the following.

Assumption 2.1 For $t = 1, \ldots, T$: $E(y_{it} \mid x_{i1}, \ldots, x_{iT}, v_i) = v_i\,\mu(x_{it}, \beta_0)$.

Note that $x_{it}$ is a $1 \times K$ vector of explanatory variables, $v_i$ is unobserved heterogeneity, $\beta_0$ is a $K \times 1$ vector of parameters at the "true value" of $\beta$, and $\mu(\cdot)$ is a strictly positive nonlinear function.
The leading example to use in place of $\mu(\cdot)$ is the exponential function,

$$E(y_{it} \mid x_i, c_i) = \exp(x_{it}\beta_0 + c_i) = v_i \exp(x_{it}\beta_0) \tag{2.1}$$

where $v_i = \exp(c_i)$. Although the exponential function is the leading case, $\mu(\cdot)$ can be any function that is strictly positive and well defined for all $x_{it}$ and $\beta$. This flexibility in choosing $\mu(\cdot)$ will prove useful later when deriving a test and correction for selection bias. Assumption 2.1 dictates that in the population $\{x_{it} : t = 1, \ldots, T\}$ is strictly exogenous conditional on the unobserved effect, $c_i$. Under the so-called "fixed effects" assumption, arbitrary correlation is allowed between $c_i$ and $x_i$.

In the case of strictly exogenous explanatory variables and the assumption that the $y_{it}$ follow a Poisson distribution and are conditionally independent across time, Hausman, Hall, and Griliches (HHG) (1984) showed that their quasi-CMLE (QCMLE) fixed effects Poisson (FEP) estimator consistently estimates $\beta_0$ (for fixed $T$ as $N \to \infty$). Specifically, HHG (1984) assume the following:

$$y_{it} \mid x_i, v_i \sim \text{Poisson}\big(v_i\,\mu_{it}(\beta_0)\big), \quad t = 1, \ldots, T \tag{2.2}$$

and

$$y_{it},\ y_{ir} \text{ are independent conditional on } x_i, v_i, \quad t \neq r \tag{2.3}$$

Defining $n_i = \sum_{t=1}^{T} y_{it}$ as the sum of the counts across time for a certain individual, one can show as in HHG (1984) that

$$y_i \mid n_i, x_i, v_i \sim \text{Multinomial}\{n_i, p_1(x_i, \beta_0), \ldots, p_T(x_i, \beta_0)\} \tag{2.4}$$

where $y_i$ is a $T \times 1$ vector of counts and

$$p_t(x_i, \beta) = \frac{\mu(x_{it}, \beta)}{\sum_{r=1}^{T} \mu(x_{ir}, \beta)}$$

However, it is important to note that the QCMLE that Wooldridge (1999) and we propose in this paper is consistent provided that the conditional mean is correctly specified. To prove consistency for $\beta_0$, there is no need to impose the distributional assumptions that HHG (1984) make. In other words, assumptions (2.2), (2.3), and (2.4) need not hold for the QCMLE to be consistent.
In fact, for the general case in which $\mu(\cdot)$ denotes a general nonlinear mean function, Wooldridge (1999) showed that the FEP estimator is fully robust to distributional misspecification and arbitrary dependence over time. Wooldridge (1999) also proposed generalized method of moments estimators that are more efficient than the FEP estimator when the full set of Poisson distributional assumptions fails. Therefore, $y_{it}$ can be a binary response, a Tobit response, a nonnegative continuously distributed response, or a logit response probability. Given a correctly specified conditional mean function, nothing restricts $y_{it}$ to be a count response.

Blundell, Griffith, and Windmeijer (BGW) (2002) independently derived the robustness of the FEP estimator using its first order condition. In fact, BGW (2002) showed that the FEP estimator is the Poisson estimator that includes a dummy variable for each individual. This means that, for a model that uses the exponential mean function in place of $\mu(\cdot)$, the exponential regression does not suffer from the incidental parameters problem, just as in the linear case. Under Assumption 2.1, a variety of estimators consistently estimate $\beta_0$ (with fixed $T$ and $N \to \infty$), provided the regressors have some time variation. Since the Wooldridge (1999) estimator is robust to any misspecification of the Poisson assumption as long as the conditional mean is correctly specified, and is applicable to any non-linear mean function, I assert that the Wooldridge (1999) QCMLE FEP estimator is identical to the BGW (2002) GMM estimator when the exponential function is used as the non-linear function in the conditional mean. However, for now we show that the Wooldridge (1999) QCMLE FEP estimator is consistent in a balanced panel under Assumption 2.1.
The log likelihood for observation $i$ and parameter vector $\beta$ is written as

$$\ell_i(\beta) = \sum_{t=1}^{T} y_{it} \log[p_t(x_i, \beta)] \tag{2.5}$$

Equation (2.5) can then be used to generate the score and Hessian functions needed for asymptotic inference. As in Wooldridge (1999), the score for observation $i$ is

$$f_i(\beta) \equiv \nabla_\beta \ell_i(\beta)' = \sum_{t=1}^{T} y_{it}\,[\nabla_\beta p_t(x_i, \beta)'/p_t(x_i, \beta)]$$
$$= \sum_{t=1}^{T} [\nabla_\beta p_t(x_i, \beta)'/p_t(x_i, \beta)]\,\{y_{it} - p_t(x_i, \beta)\,n_i\}$$
$$= \sum_{t=1}^{T} [\nabla_\beta p_t(x_i, \beta)'/p_t(x_i, \beta)]\,r_{it}(x_i, \beta) \tag{2.6}$$

where $r_{it}(x_i, \beta) = y_{it} - p_t(x_i, \beta)\,n_i$ is a residual function. The second equality holds because $\sum_{t=1}^{T} p_t(x_i, \beta) = 1$, so that $\sum_{t=1}^{T} \nabla_\beta p_t(x_i, \beta) = 0$. Using Assumption 2.1, the following lemma proves that the FEP QCMLE is consistent in a balanced panel.

Lemma 2.1 $E[r_{it}(x_i, \beta_0) \mid x_{i1}, \ldots, x_{iT}, v_i] = 0$ under Assumption 2.1.

Proof. Define $\bar{\mu}_i(x_i, \beta_0) \equiv (1/T)\sum_{r=1}^{T} \mu_{ir}(x_{ir}, \beta_0)$. Plugging in for $y_{it}$, $p_t(x_i, \beta_0)$, and $n_i$, taking expectations, and using the Law of Iterated Expectations (LIE),

$$E[r_{it}(x_i, \beta_0)] = E\big[E\big(y_{it} - p_t(x_i, \beta_0)\,n_i \mid x_i, v_i\big)\big]$$
$$= E\Big[E(y_{it} \mid x_i, v_i) - \frac{\mu_{it}(x_{it}, \beta_0)}{T\,\bar{\mu}_i(x_i, \beta_0)} \sum_{r=1}^{T} E(y_{ir} \mid x_i, v_i)\Big]$$
$$= E\Big[v_i\,\mu_{it}(x_{it}, \beta_0) - \frac{\mu_{it}(x_{it}, \beta_0)}{\bar{\mu}_i(x_i, \beta_0)}\Big((1/T)\sum_{r=1}^{T} v_i\,\mu_{ir}(x_{ir}, \beta_0)\Big)\Big] = 0$$

Since $E[f_i(\beta_0)] = E\{E[f_i(\beta_0) \mid x_i, v_i]\} = 0$ by LIE and Assumption 2.1, the FEP estimator is consistent. ∎

If the exponential function is used in place of $\mu(\cdot)$, the ratio $\mu_{it}(\beta)/\bar{\mu}_i(\beta)$ does not depend on $\beta_j$ when $x_{itj}$ does not vary over time. Therefore, as in linear fixed effects panel data models, coefficients on time invariant regressors are not identified using the transformation in (2.6). However, interactions of time constant variables with year dummies are allowed. To perform inference using the Wooldridge (1999) FEP estimator, one needs to derive the expected Hessian from the log-likelihood in (2.5). After taking second derivatives of (2.5), the expected Hessian is written as

$$A_0 \equiv E[n_i\,\nabla_\beta p(x_i, \beta_0)'\,W(x_i, \beta_0)\,\nabla_\beta p(x_i, \beta_0)] \tag{2.7}$$

where $p(x_i, \beta_0) \equiv [p_1(x_i, \beta_0), \ldots, p_T(x_i, \beta_0)]'$ and $W(x_i, \beta_0) = [\mathrm{diag}\{p_1(x_i, \beta_0), \ldots, p_T(x_i, \beta_0)\}]^{-1}$.
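As an illustration only, the conditional probabilities $p_t(x_i, \beta)$, the quasi-log-likelihood in (2.5), and the residual function in (2.6) can be sketched in a few lines of Python for the exponential mean case. The function names and the small data set are hypothetical and are not taken from the empirical application; the sketch is meant to show two properties stated above, namely that the residuals sum to zero over $t$ and that the coefficient on a time-constant regressor drops out of $p_t$.

```python
from math import exp, log

def fep_probs(x_i, beta):
    """p_t(x_i, beta) = exp(x_it beta) / sum_r exp(x_ir beta) for one unit i."""
    idx = [sum(xk * bk for xk, bk in zip(x_t, beta)) for x_t in x_i]
    shift = max(idx)                      # subtract the max for numerical stability
    weights = [exp(v - shift) for v in idx]
    total = sum(weights)
    return [w / total for w in weights]

def fep_loglik(y_i, x_i, beta):
    """Quasi-log-likelihood (2.5): sum_t y_it * log p_t(x_i, beta)."""
    return sum(y * log(p) for y, p in zip(y_i, fep_probs(x_i, beta)))

def fep_residuals(y_i, x_i, beta):
    """Residuals r_it = y_it - p_t(x_i, beta) * n_i, with n_i = sum_t y_it."""
    n_i = sum(y_i)
    return [y - p * n_i for y, p in zip(y_i, fep_probs(x_i, beta))]

# Hypothetical unit with T = 3 periods; the second regressor is time-constant.
x_i = [[0.2, 1.0], [0.5, 1.0], [0.9, 1.0]]
y_i = [1, 0, 3]

p_a = fep_probs(x_i, [0.7, 0.0])
p_b = fep_probs(x_i, [0.7, 5.0])   # only the time-constant coefficient changes
resid = fep_residuals(y_i, x_i, [0.7, 0.0])
```

In this sketch `p_a` and `p_b` coincide up to rounding, which is the numerical counterpart of the statement that coefficients on time invariant regressors are not identified.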
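Using the score and the expected Hessian, the asymptotic variance of $\hat{\beta}$ is $A_0^{-1} B_0 A_0^{-1}/N$, where $A_0$ is defined in (2.7) and $B_0 = E[f_i(\beta_0) f_i(\beta_0)']$. Consistent estimates of $A_0$ and $B_0$ are

$$\hat{A} = N^{-1} \sum_{i=1}^{N} n_i\,\nabla_\beta p(x_i, \hat{\beta})'\,W(x_i, \hat{\beta})\,\nabla_\beta p(x_i, \hat{\beta}) \tag{2.8}$$

$$\hat{B} = N^{-1} \sum_{i=1}^{N} f_i(\hat{\beta}) f_i(\hat{\beta})' \tag{2.9}$$

with the estimated asymptotic variance being equal to $\hat{A}^{-1} \hat{B} \hat{A}^{-1}/N$. A valid asymptotic variance estimator when (2.2) and (2.3) hold is $\hat{A}^{-1}/N$.

Using Assumption 2.1 and the results from Lemma 2.1, the following GMM moment conditions consistently estimate $\beta_0$:

$$E[h(x_{is}, \beta_0)'\,r_{it}(\beta_0)] = 0, \quad \forall s, t \tag{2.10}$$

where $h(x_{is}, \beta_0)$ are available instruments, which may be just $x_{is}$ or may include squared and cross product terms, and may depend on $\beta_0$. Given the choice of instruments, $h(x_{is}, \beta_0)$, a GMM minimum chi-squared estimator of $\beta$ is obtained by minimizing the following criterion function:

$$\min_{\beta}\ \Big[N^{-1} \sum_{i=1}^{N} H(x_i, \beta)'\,r_i(\beta)\Big]'\ \hat{\Omega}^{-1}\ \Big[N^{-1} \sum_{i=1}^{N} H(x_i, \beta)'\,r_i(\beta)\Big] \tag{2.11}$$

with $\hat{\Omega} = N^{-1} \sum_{i=1}^{N} H$ [...]

[...] Plugging in for $c_{i2}$ in (3.3), the selection equation can be written as

$$s_{it} = 1[\eta_{02} + \bar{x}_i \eta_2 + x_{it2}\beta_2 + e_{it2} \geq 0] \tag{3.4}$$

where

$$e_{it2} = \varepsilon_{i2} + u_{it2} \tag{3.5}$$

$\bar{x}_i$ includes all regressors from $x_{it1}$ and $x_{it2}$, and $e_{it2} \sim \text{Normal}(0, \sigma_t^2)$. Due to time varying heteroskedasticity, the probit parameters in (3.4) are rescaled since

$$P(s_{it} = 1 \mid x_i) = \Phi\Big(\frac{\eta_{02} + \bar{x}_i \eta_2 + x_{it2}\beta_2}{\sigma_t}\Big)$$

[...]

$$E(y_{it1} \mid x_i, c_{i1}, s_{it} = 1) = \exp(x_{it1}\beta_{10} + c_{i1}) \int_{-z_{it2}\delta_{t20}}^{\infty} \frac{\exp\big(\rho_0 e_{it2} - (1/2) e_{it2}^2\big)}{\sqrt{2\pi}\,\big(1 - \Phi(-z_{it2}\delta_{t20})\big)}\,de_{it2}$$
$$= \exp(x_{it1}\beta_{10} + c_{i1}) \int_{-z_{it2}\delta_{t20}}^{\infty} \frac{\exp\big\{-(1/2)(e_{it2} - \rho_0)^2 + (1/2)\rho_0^2\big\}}{\sqrt{2\pi}\,\big(1 - \Phi(-z_{it2}\delta_{t20})\big)}\,de_{it2}$$
$$= \exp(x_{it1}\beta_{10} + c_{i1})\,\exp\big((1/2)\rho_0^2\big)\,\frac{\Phi(\rho_0 + z_{it2}\delta_{t20})}{\Phi(z_{it2}\delta_{t20})}$$
$$= \exp(x_{it1}\beta_{10} + c_{i1})\,g(z_{it2}\delta_{t20}, \rho_0) = v_{i1}\,m_{it}(\pi_0) \tag{3.8}$$

Note that under the null hypothesis of no selection bias, $\rho = 0$ and $g(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i\eta_{t2}, \rho) = 1$. It is also important to emphasize that under the null hypothesis of no selection bias, arbitrary correlation is allowed between $s_{it}$ and $c_{i1}$ when the FEP estimator is used to test for contemporaneous selection bias.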
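The closed form behind (3.8) can be checked numerically. The Python sketch below is an illustration under the assumption that the selection error is standard normal after rescaling; the function names `g` and `g_numeric` and the chosen index and $\rho$ values are hypothetical. It compares $g(a, \rho) = \exp(\rho^2/2)\,\Phi(\rho + a)/\Phi(a)$ with a midpoint-rule approximation of $E[\exp(\rho e) \mid e > -a]$, and confirms that $g(a, 0) = 1$.

```python
from math import erf, exp, pi, sqrt

def norm_pdf(a):
    """Standard normal density."""
    return exp(-0.5 * a * a) / sqrt(2.0 * pi)

def norm_cdf(a):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))

def g(a, rho):
    """Correction function: exp(rho^2 / 2) * Phi(rho + a) / Phi(a)."""
    return exp(0.5 * rho * rho) * norm_cdf(rho + a) / norm_cdf(a)

def g_numeric(a, rho, upper=12.0, steps=100_000):
    """Midpoint-rule approximation of E[exp(rho * e) | e > -a], e ~ N(0, 1)."""
    lower = -a
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        e = lower + (k + 0.5) * h
        total += exp(rho * e) * norm_pdf(e)
    # Divide by P(e > -a) = Phi(a) to condition on selection.
    return total * h / norm_cdf(a)

closed_form = g(0.3, 0.5)
approximation = g_numeric(0.3, 0.5)
```

The two values agree closely, which is the content of the completing-the-square step in the derivation.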
Again, defining

$$p_t(x_i, s_i, \pi) \equiv \frac{s_{it}\,m_{it}(\pi)}{\sum_{r=1}^{T} s_{ir}\,m_{ir}(\pi)}$$

we can develop a test for selection bias using QCMLE as long as the conditional mean assumption in (3.8) holds. The log likelihood for observation $i$ and parameter vector $\pi$ is written as

$$\ell_i(\pi) = \sum_{t=1}^{T} s_{it}\,y_{it1} \log[p_t(x_i, s_i, \pi)] \tag{3.9}$$

Using the results from Lemma 2.2, it can be seen that the score function derived from (3.9) has an expectation of zero. Apply the standard formulas shown in (2.8) and (2.9) to derive asymptotic standard errors. Having derived the asymptotic variance estimator and consistency of the QCMLE, a selection test can easily be derived. The following procedures outline the steps needed to test for selection bias.

Procedure 3.1

1. For each year, do standard probit of $s_{it}$ on $x_{it2}$ and $\bar{x}_i$ to estimate $\hat{\omega}_t$, $\hat{\delta}_{t2}$, $\hat{\eta}_{t2}$.

2. Using the QCMLE method outlined above, estimate

$$y_{it1} = \exp(x_{it1}\beta_1 + c_{i1})\,g(\hat{\omega}_t + x_{it2}\hat{\delta}_{t2} + \bar{x}_i\hat{\eta}_{t2}, \rho) + \text{error}_{it}$$

to generate $\hat{\beta}_1$ and $\hat{\rho}$.

3. Test $H_0: \rho = 0$ against $H_A: \rho \neq 0$ using an asymptotic t-statistic.

Procedure 3.2 LM/Score Test

1. Using the QCMLE method outlined above, estimate the following equation subject to the restriction that $\rho = 0$,

$$y_{it1} = \exp(x_{it1}\beta_1 + c_{i1}) + \text{error}_{it}$$

and obtain $\tilde{\beta}_1$.

2. Generate the score function evaluated at $\rho = 0$ using $\tilde{\beta}_1$ from step one. Use the following LM statistic to test the null hypothesis of no selection bias:

$$\Big(\sum_{i=1}^{N} f_i(\tilde{\beta}_1)\Big)' \Big(\sum_{i=1}^{N} \nabla_\beta p(x_i, \tilde{\beta}_1)'\,W(x_i, \tilde{\beta}_1)\,\nabla_\beta p(x_i, \tilde{\beta}_1)\Big)^{-1} \Big(\sum_{i=1}^{N} f_i(\tilde{\beta}_1)\Big)$$

which is distributed $\chi_1^2$ and where $\sum_{i=1}^{N} f_i(\tilde{\beta}_1)$ is defined as in (2.6).

Unfortunately, Procedure 3.2 does not produce a robust test that takes into account possible serial correlation in the model. An alternative procedure can be derived by noting that

$$\frac{\partial\,[v_{i1}\,m_{it}(\pi)]}{\partial \rho}\Big|_{\rho = 0} = \exp(x_{it1}\beta_1 + c_{i1})\,\lambda(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i\eta_{t2}) \tag{3.10}$$

where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is an inverse Mills ratio derived from running a standard probit regression of $s_{it}$ on $x_{it2}$ and $\bar{x}_i$ for each $t$.
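The result in (3.10) rests on the fact that the derivative of the correction function with respect to $\rho$, evaluated at $\rho = 0$, is the inverse Mills ratio. A minimal Python sketch of that check follows; the function names, the probit index value, and the step size are hypothetical choices made for illustration.

```python
from math import erf, exp, pi, sqrt

def norm_pdf(a):
    """Standard normal density."""
    return exp(-0.5 * a * a) / sqrt(2.0 * pi)

def norm_cdf(a):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))

def g(a, rho):
    """Correction function: exp(rho^2 / 2) * Phi(rho + a) / Phi(a)."""
    return exp(0.5 * rho * rho) * norm_cdf(rho + a) / norm_cdf(a)

def inv_mills(a):
    """Inverse Mills ratio lambda(a) = phi(a) / Phi(a)."""
    return norm_pdf(a) / norm_cdf(a)

a = 0.4        # hypothetical value of the probit index
step = 1e-5    # step for the central difference in rho
numeric_derivative = (g(a, step) - g(a, -step)) / (2.0 * step)
analytic_derivative = inv_mills(a)
```

Because the exponential factor $\exp(x_{it1}\beta_1 + c_{i1})$ does not depend on $\rho$, multiplying both derivatives by it reproduces the two sides of (3.10).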
Using this result, we can derive a robust variable addition test that can detect selection bias and account for serial correlation.

Procedure 3.3 Robust Variable Addition Test

1. For each year, do standard probit of $s_{it}$ on $x_{it2}$ and $\bar{x}_i$ to estimate $\hat{\omega}_t$, $\hat{\delta}_{t2}$, $\hat{\eta}_{t2}$. Generate the inverse Mills ratio, $\hat{\lambda}_{it} = \lambda(\hat{\omega}_t + x_{it2}\hat{\delta}_{t2} + \bar{x}_i\hat{\eta}_{t2})$, and take its log.

2. Define the $(K_1 + 1) \times 1$ parameter vector $\pi = (\beta_1', \psi)'$. Using the QCMLE FEP method outlined above, estimate the following equation subject to the restriction that $\rho = 0$:

$$y_{it1} = \exp(x_{it1}\beta_1 + \psi \log(\hat{\lambda}_{it}) + c_{i1}) + \text{error}_{it} = v_{i1}\,m_{it}(\hat{\lambda}_{it}; \pi) + \text{error}_{it} \tag{3.11}$$

3. Using an asymptotic t-statistic, test $H_0: \psi = 0$ versus $H_A: \psi \neq 0$.

Alternatively, a GMM score statistic can be used to test for selection bias under the restriction that $\rho = 0$. First do $T$ cross-sectional probits and generate the inverse Mills ratio, $\hat{\lambda}_{it} = \lambda(\hat{\omega}_t + x_{it2}\hat{\delta}_{t2} + \bar{x}_i\hat{\eta}_{t2})$. Next, estimate (3.11) using the following moment conditions, where $\pi_0 = (\beta_{10}', \psi_0)'$:

$$E[s_{it}\,h(x_{is}, \pi_0)'\,e_{it}(\lambda_{it}; \pi_0)] = 0, \quad \forall t, s \tag{3.12}$$

where

$$e_{it}(\lambda_{it}; \pi) = y_{it1} - \frac{m_{it}(\lambda_{it}; \pi)}{(1/T_i)\sum_{r=1}^{T} s_{ir}\,m_{ir}(\lambda_{ir}; \pi)}\ \bar{y}_{i1} \tag{3.13}$$

$\bar{y}_{i1} = (1/T_i)\sum_{r=1}^{T} s_{ir}\,y_{ir1}$, and $h(x_{is}, \pi_0)$ equals $\nabla_\pi p_t(x_i, s_i, \pi)'/p_t(x_i, s_i, \pi)$ derived from a preliminary consistent QCMLE. Use the moment conditions implied by (3.12) and the GMM formulas specified in Section 2 to generate $\hat{\psi}$ and $se(\hat{\psi})$, and then test $H_0: \psi = 0$ versus $H_A: \psi \neq 0$. To test for selection bias over time, simply interact $\log(\hat{\lambda}_{it})$ with a full set of year dummies and use an asymptotic Wald statistic to test $H_0: \psi_t = 0$ for all $t$ versus $H_1: \psi_t \neq 0$ for some $t$.

Note that none of the testing procedures proposed above correct for selection bias, since we are not formally imposing restrictions on $c_{i1}$. We assume in all of the procedures shown above that the null hypothesis holds and that the conditional mean is correctly specified.
A rejection of the null hypothesis indicates that the conditional mean is not correctly specified, in which case we have to formally impose some structure or restrictions on $c_{i1}$ in order to correctly specify the conditional mean. We do this in the next section.

4. Correcting for sample selection bias

Various authors in the recent econometric literature have proposed methods to correct for selection bias. For example, Wooldridge (1995) proposes a method that corrects for selection bias in linear panel data models. This section extends Wooldridge (1995) to deal with non-linear count panel data models. Consider the selection model in (3.1) and (3.2), which is rewritten here:

$$y_{it1} = \exp(x_{it1}\beta_1 + c_{i1} + a_{it1})\,u_{it1} \tag{4.1}$$

$$s_{it} = 1[\eta_{02} + \bar{x}_i\eta_2 + x_{it2}\beta_2 + e_{it2} \geq 0] \tag{4.2}$$

To avoid conditioning on selection in at least two separate time periods, Wooldridge (1995) proposes a linear projection for the unobserved effect in the regression equation. In the selection model represented in (4.1) and (4.2), a slightly stronger assumption is needed to deal with the unobserved effect in (4.1) to correct for selection bias. Since (4.1) is a nonlinear panel data model, a conditional expectation assumption is needed for $c_{i1}$ to correct for selection bias in the presence of strictly exogenous regressors. Basically, a Chamberlain (1984) or Mundlak (1978) device needs to be imposed upon (4.1). Therefore, the following assumption is used to correct for selection bias in the presence of only strictly exogenous regressors.

Assumption 4.1

i) Define the selection equation as in (4.2) with $e_{it2} \sim \text{Normal}(0, \sigma_t^2)$.

ii) $E(u_{it1} \mid x_i, c_{i1}, e_{it2}) = 1$

iii) $E(c_{i1} \mid x_i) = \exp(\pi_0 + \bar{x}_i\pi_1)$. Impose the Mundlak device on (4.1) by setting $c_{i1} = \pi_0 + \bar{x}_i\pi_1 + \varepsilon_{i1}$ and define the composite error term $e_{it1} = \varepsilon_{i1} + a_{it1}$.

iv) $E[\exp(e_{it1}) \mid x_i, e_{it2}] = E[\exp(e_{it1}) \mid e_{it2}] = \exp(\eta_0 + \rho_t e_{it2})$

Since the Mundlak device contains a constant term, we can make the normalization that $\exp(\eta_0) = 1$. Assumption 4.
1(iii), which holds if $(e_{it1}, e_{it2})$ is mean independent of $x_i$, is key as it gives us a mechanism to derive a selection correction term. Using Assumption 4.1, the fact that $s_{it}$ is a function of $e_{it2}$, and integrating $\exp(\rho_t e_{it2})$ against the truncated normal distribution,

$$E(y_{it1} \mid x_i, s_{it} = 1) = \exp(\pi_{00} + x_{it1}\beta_{10} + \bar{x}_i\pi_{10})\,g(\omega_{t0} + x_{it2}\delta_{t20} + \bar{x}_i\eta_{t20}, \rho_{t0})$$
$$= \exp[\pi_{00} + x_{it1}\beta_{10} + \bar{x}_i\pi_{10} + \log\{g(\omega_{t0} + x_{it2}\delta_{t20} + \bar{x}_i\eta_{t20}, \rho_{t0})\}] \tag{4.3}$$

where

$$g(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i\eta_{t2}, \rho_t) = \exp\big(\tfrac{1}{2}\rho_t^2\big)\Big[\frac{\Phi(\rho_t + \omega_t + x_{it2}\delta_{t2} + \bar{x}_i\eta_{t2})}{\Phi(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i\eta_{t2})}\Big]$$

A Chamberlain (1984) or Mundlak (1978) device can be imposed for $c_{i1}$ in Assumption 4.1 iii. Imposing the Mundlak device for $c_{i1}$ in (4.1) conserves on degrees of freedom. The parameters in (4.3) can either be estimated jointly in a GMM procedure or separately in a two-step pooled QMLE procedure. The following two-step pooled QMLE procedure outlines the steps to estimate $\Gamma = (\pi_0, \beta_1', \pi_1', \rho_1, \ldots, \rho_T)'$ and correct for selection bias.

Procedure 4.1

1. For each year $t$, do standard probit of $s_{it}$ on $x_{it2}$ and $\bar{x}_i$ to estimate $\hat{\omega}_t$, $\hat{\delta}_{t2}$, $\hat{\eta}_{t2}$.

2. Estimate the following equation by pooled QMLE,

$$y_{it1} = \exp(\pi_0 + x_{it1}\beta_1 + \bar{x}_i\pi_1)\,g(\hat{\omega}_t + x_{it2}\hat{\delta}_{t2} + \bar{x}_i\hat{\eta}_{t2}, \rho_t) + \text{error}_{it} \tag{4.4}$$

by maximizing the corresponding log-likelihood function

$$\ell_i(\theta) = \sum_{t=1}^{T} s_{it}\{y_{it1} \log[m(z_{it}, \theta)] - m(z_{it}, \theta)\} \tag{4.5}$$

where $z_{it} = (1, x_{it1}, x_{it2}, \bar{x}_i)$, $\theta = (\Gamma', \hat{\omega}_t, \hat{\delta}_{t2}', \hat{\eta}_{t2}')'$, and $m(z_{it}, \theta)$ is the conditional mean function on the right-hand side of (4.4).

3. Asymptotic standard errors need to be adjusted to account for the first stage probit estimates. Appendix A outlines a two-step procedure to obtain valid standard errors in the presence of generated regressors.

5. Empirical Application: A wage offer equation

In most empirical wage offer equations, researchers use log wage as opposed to wage as the dependent variable when measuring the return to experience and education. Blackburn (2007) uses an exponential mean model to estimate the return to union status for a cross-section.
Santos-Silva and Tenreyro (ST) (2006) also note that due to the presence of heteroskedasticity, OLS parameter estimates from log-linearized models can be biased. Therefore, one may be interested in using the level of real hourly wages as the dependent variable and using an exponential mean model to estimate a wage equation. The wage offer equation we are interested in estimating is

$$wage_{it} = \exp(\eta_t + \beta_1 exper_{it} + \beta_2 expersq_{it} + \beta_3 educ_{it} + \rho \cdot sel_{it} + c_i)\,u_{it}, \quad t = 1, \ldots, T \tag{5.1}$$

where $\eta_t$ represents year dummies, $u_{it}$ is an idiosyncratic error term, $c_i$ is unobserved heterogeneity, $sel_{it}$ is a selection term used to test for selection bias, and $wage_{it}$ is real hourly wage. The data used in estimation of (5.1) contain 877 individuals followed over a 13 year period from 1980 to 1992. The data come from the Panel Study of Income Dynamics (PSID) and are also used in Semykina and Wooldridge (2005).

To test for selection bias, we use the simple variable addition test described in Section 3.1 with $s_{i,t+1}$ as the additional variable and the robust variable addition test described in Section 3.2 with $\log(\hat{\lambda}_{it})$ as the selection term. As described in Section 3.2, the inverse Mills ratio is derived from $T$ cross-sectional probits that regress the participation indicator on exogenous explanatory variables and their time averages. The regressors in the selection equation are experience, experience squared, education, age, age squared, an indicator for marital status, other family income and its square, the number of children in three age categories, spouse's education, spouse's age, spouse's age squared, the product of spouse's age and education, duration of spouse's unemployment, and a binary indicator specifying whether the spouse's duration of unemployment was recorded or not. Table 5 reports the summary statistics for the variables used in the selection and regression equations.
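Two of the ingredients mentioned above, the lead of the participation indicator $s_{i,t+1}$ and the time averages of the exogenous regressors, are simple transformations of one individual's panel. The Python sketch below shows one way they might be built; the helper names and the toy values are hypothetical and are not drawn from the PSID sample.

```python
def lead_selection(s_i):
    """Return s_{i,t+1} for each period t; the final period has no lead (None)."""
    return s_i[1:] + [None]

def time_averages(x_i):
    """Average each regressor over the T periods of one individual.

    x_i is a list of T rows, each row holding that period's regressors.
    """
    periods = len(x_i)
    return [sum(column) / periods for column in zip(*x_i)]

# Hypothetical individual observed for T = 4 periods.
s_i = [1, 0, 1, 1]                                   # participation indicator
x_i = [[1.0, 12.0], [2.0, 12.0], [3.0, 13.0], [4.0, 13.0]]

s_lead = lead_selection(s_i)   # regressor for the simple variable addition test
x_bar = time_averages(x_i)     # time averages added to each yearly probit
```

Since the final period has no lead, a test based on $s_{i,t+1}$ necessarily drops the last year of each individual's record.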
Table 6 reports the semielasticities of $E(wage_{it} \mid x_i, c_i)$ with respect to $w_{it} = (exper_{it}, expersq_{it}, educ_{it}, sel_{it})$. Column 1 in Table 6 contains the Fixed Effects Poisson maximum likelihood estimates when the lead value of the selection indicator is used to test for selection bias. Column 2 in Table 6 reports the FEP maximum likelihood estimates when $\log(\hat{\lambda}_{it})$ is used as the selection term. Robust standard errors are reported in parentheses and usual standard errors are reported in brackets.

As can be seen in Table 6, column 1, the simple variable addition test that uses $s_{i,t+1}$ as the additional regressor does not reject the null hypothesis of no selection bias. Also, as shown in Table 6, column 2, the robust variable addition test that uses $\log(\hat{\lambda}_{it})$ as the additional regressor does not reject the null hypothesis of no selection bias. Therefore, there is little evidence that selection bias exists in the sample. As a result, there is no need to do the selection correction outlined in Section 4 using the model specified in (5.1). As seen in Table 6, education and the return to experience are statistically significant at the 1% level. Holding other factors fixed, an additional year of education raises real hourly wage by about 2%. Meanwhile, holding other factors fixed, the return to experience in the first year of work is about 7.8%. For comparison, Column 3 in Table 6 reports the return to experience and education excluding the selection term.

Table 7 reports the point estimates and standard errors for education, experience, experience squared, and a selection term from a linear model that uses $\log(wage_{it})$ as the dependent variable as opposed to $wage_{it}$. The linear fixed effects estimates for education and experience are similar to the estimates obtained from FEP estimation. Holding other factors fixed, the return to an additional year of education is about 2.5%.
Meanwhile, holding education fixed, the return to experience after one year is about 8%. However, as seen in columns 1 and 2 of Table 7, the tests for selection bias reject the null hypothesis at the 5% level. Therefore, one would proceed as in Wooldridge (1995) to correct for selection bias by imposing restrictions on the unobserved effect in the regression equation.

It is interesting to note that the parameters on $s_{i,t+1}$ and the inverse Mills ratio change sign when estimating the nonlinear model (5.1). The sign on $s_{i,t+1}$ is positive when $\log(wage_{it})$ is used as the dependent variable but negative when $wage_{it}$ is used as the dependent variable. Likewise, the sign on $\log(\hat{\lambda}_{it})$ is positive in model (5.1) while the sign on $\hat{\lambda}_{it}$ in the linear model is negative. To see why this can happen, suppose that $\log(wage_{it})$ is normally distributed conditional on $(x_i, c_i)$. Specifically, we assume that

$$E(wage_{it} \mid x_i, c_i) = \exp[\mu(x_i, c_i) + \sigma^2(x_i, c_i)/2] \tag{5.2}$$

where $\mu(x_i, c_i) = E[\log(wage_{it}) \mid x_i, c_i]$ and $\sigma^2(x_i, c_i) = Var[\log(wage_{it}) \mid x_i, c_i]$. Therefore, heteroskedasticity in $\log(wage_{it})$ is a likely reason for the change in sign of the selection terms when estimating an exponential mean model for wages as opposed to a linear model. As a result, the model for the level of wages in (5.1) may not be picking up the sample selection problem that clearly exists in the linear model. However, the point estimates on education and experience are quite similar when one compares the nonlinear model to the linear model. Again, for comparison, column 3 in Table 7 reports the return to experience and education excluding the selection term.

6. Conclusion

In this paper, we extended Wooldridge (1999) and BGW (2002) to the case of an unbalanced panel. As in Wooldridge (1999), we assumed strictly exogenous regressors. The advantage of the FEP QCMLE approach is that as long as the conditional mean function is correctly specified, we get consistent estimates of the parameters even if the Poisson distributional assumption is violated.
Even if the response variable is not a count variable, we can use the FEP framework to estimate a nonlinear panel data model. This paper proposed methods to test and correct for selection bias for an exponential mean model. We extended Terza (1998) by applying the methods used in Wooldridge (1995) for linear panel data models to test and correct for selection bias in nonlinear panel data models. Finally, we applied the theory developed in the paper to test and possibly correct for selection bias. In Section 5, we estimated a wage equation using the level of real hourly wages and tested for selection bias. Interestingly, we found that selection bias is not detected when estimating the nonlinear wage equation model, but is detected when estimating a linear wage equation model.

For future research, one can extend this paper to account for selection and attrition bias in nonlinear panel data models under weaker exogeneity assumptions. For example, the methods to correct for selection bias outlined in this paper are not valid in the presence of sequentially exogenous or predetermined regressors. Under such a scenario, one would have to condition upon selection in two separate time periods. Correcting for attrition bias would not be as problematic in the presence of non-strictly exogenous regressors since attrition is an absorbing state. For example, in the patents and R&D relationship, firms might go bankrupt if they cannot stay profitable by receiving a sufficient number of patents over a certain time period. This would be a case where one would have to test and correct for attrition bias.

Appendix A

In Section 4, we outlined a QCMLE procedure to correct for selection bias. Define the regressors for time period $t$ as $w_{it} = (1, x_{it1}, \bar{x}_i)$ and $z_{it} = (x_{it2}, \bar{x}_i)$.
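The parameter vectors are defined as $\theta = (\pi_0, \beta_1', \pi_1', \rho_1, \ldots, \rho_T)' = (\Gamma', \rho_1, \ldots, \rho_T)'$, a $(1 + K_1 + K_2 + T) \times 1$ vector, $\delta_t = (\omega_t, \delta_{t2}', \eta_{t2}')'$, a $(1 + 2K_2) \times 1$ vector, and $\delta = (\delta_1', \delta_2', \ldots, \delta_T')'$, which is a $(1 + 2K_2)T \times 1$ vector. We continue to assume that $(w_{it}, z_{it})$ is observed for all $t$ and $y_{it1}$ is observed if $s_{it} = 1$. This appendix derives the standard errors that take into account the first stage estimation of $g(\omega_t + x_{it2}\delta_{t2} + \bar{x}_i\eta_{t2}, \rho_t)$. The log-likelihood we are interested in maximizing is

$$\ell_i(\theta, \delta) = \sum_{t=1}^{T} s_{it}\{y_{it1} \log[m(w_{it}, z_{it}, \theta, \delta_t)] - m(w_{it}, z_{it}, \theta, \delta_t)\} \tag{A.1}$$

where

$$m_{it} = m(w_{it}, z_{it}, \theta, \delta_t) = \exp(w_{it}\Gamma)\,\exp\big(\tfrac{1}{2}\rho_t^2\big)\,\frac{\Phi(z_{it}\delta_t + \rho_t)}{\Phi(z_{it}\delta_t)}$$

We define the $(1 + K_1 + K_2 + T) \times 1$ score for observation $i$ as $f_i(\theta, \delta) = \sum_{t=1}^{T} s_{it}\,f_{it}(\theta, \delta_t)$, where

$$f_{it}(\theta, \delta_t) = \frac{\nabla_\theta m_{it}'\,(y_{it1} - m_{it})}{m_{it}} \tag{A.2}$$

Note that

$$\nabla_\theta m_{it}' = \begin{pmatrix} \nabla_\Gamma m_{it}' \\ \partial m_{it}/\partial \rho_t \end{pmatrix}$$

where

$$\nabla_\Gamma m_{it}' = w_{it}' \exp(w_{it}\Gamma)\,\exp\big(\tfrac{1}{2}\rho_t^2\big)\,\frac{\Phi(z_{it}\delta_t + \rho_t)}{\Phi(z_{it}\delta_t)} \tag{A.3}$$

and

$$\frac{\partial m_{it}}{\partial \rho_t} = \exp\big(\tfrac{1}{2}\rho_t^2\big)\Big\{\rho_t \exp(w_{it}\Gamma)\,\frac{\Phi(z_{it}\delta_t + \rho_t)}{\Phi(z_{it}\delta_t)} + \exp(w_{it}\Gamma)\,\frac{\phi(z_{it}\delta_t + \rho_t)}{\Phi(z_{it}\delta_t)}\Big\} \tag{A.4}$$

The asymptotic variance matrix that we derive needs to take into account the first stage estimation of $\delta$. Under standard regularity conditions,

$$\sqrt{N}(\hat{\theta} - \theta_0) = A_0^{-1}\Big(N^{-1/2} \sum_{i=1}^{N} f_i(\theta_0; \hat{\delta})\Big) + o_p(1) \tag{A.5}$$

where $A_0 = E\big[\sum_{t=1}^{T} s_{it}\,\nabla_\theta m(w_{it}, z_{it}, \theta_0, \delta_t^*)'\,\nabla_\theta m(w_{it}, z_{it}, \theta_0, \delta_t^*)/m(w_{it}, z_{it}, \theta_0, \delta_t^*)\big]$ is $(1 + K_1 + K_2 + T) \times (1 + K_1 + K_2 + T)$ and $f_i(\theta_0; \hat{\delta})$ is the $(1 + K_1 + K_2 + T) \times 1$ score function derived from (A.1). We can also claim under standard regularity conditions that

$$N^{-1/2} \sum_{i=1}^{N} f_i(\theta_0; \hat{\delta}) = N^{-1/2} \sum_{i=1}^{N} f_i(\theta_0; \delta^*) + o_p(1) \tag{A.6}$$

It is important to note that when deriving the Hessian, it is only necessary to take derivatives with respect to $\theta$. Given that $\sqrt{N}(\hat{\delta} - \delta^*) = O_p(1)$, we can apply a standard mean value expansion to (A.6), which gives

$$N^{-1/2} \sum_{i=1}^{N} f_i(\theta_0; \hat{\delta}) = N^{-1/2} \sum_{i=1}^{N} f_i(\theta_0; \delta^*) + L_0 \sqrt{N}(\hat{\delta} - \delta^*) + o_p(1) \tag{A.7}$$

where $L_0 = E\big[\sum_{t=1}^{T} s_{it}\,\nabla_\delta f_{it}(w_{it}, z_{it}, \theta_0; \delta_t^*)\big]$.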
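As a sanity check on (A.2) and (A.4), the element of the score corresponding to $\rho_t$ can be compared with a numerical derivative of the Poisson quasi-log-likelihood contribution $y_{it1}\log m_{it} - m_{it}$. The Python sketch below is an illustration only; the function names and the values chosen for $w_{it}\Gamma$, $z_{it}\delta_t$, $\rho_t$, and $y_{it1}$ are hypothetical.

```python
from math import erf, exp, log, pi, sqrt

def norm_pdf(a):
    """Standard normal density."""
    return exp(-0.5 * a * a) / sqrt(2.0 * pi)

def norm_cdf(a):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))

def mean_fn(w_gamma, index, rho):
    """m_it = exp(w_it Gamma) * exp(rho^2 / 2) * Phi(index + rho) / Phi(index)."""
    return exp(w_gamma) * exp(0.5 * rho * rho) * norm_cdf(index + rho) / norm_cdf(index)

def dmean_drho(w_gamma, index, rho):
    """Analytic derivative of m_it with respect to rho_t, following (A.4)."""
    cdf_ratio = norm_cdf(index + rho) / norm_cdf(index)
    pdf_ratio = norm_pdf(index + rho) / norm_cdf(index)
    return exp(0.5 * rho * rho) * exp(w_gamma) * (rho * cdf_ratio + pdf_ratio)

def quasi_loglik(y, w_gamma, index, rho):
    """Poisson quasi-log-likelihood contribution y * log(m) - m."""
    m = mean_fn(w_gamma, index, rho)
    return y * log(m) - m

def score_rho(y, w_gamma, index, rho):
    """rho_t element of the score in (A.2): (dm/drho) * (y - m) / m."""
    m = mean_fn(w_gamma, index, rho)
    return dmean_drho(w_gamma, index, rho) * (y - m) / m

# Hypothetical values for y_it1, w_it Gamma, z_it delta_t, and rho_t.
y, w_gamma, index, rho = 3.0, 0.4, 0.2, 0.3
step = 1e-5
numeric_score = (quasi_loglik(y, w_gamma, index, rho + step)
                 - quasi_loglik(y, w_gamma, index, rho - step)) / (2.0 * step)
analytic_score = score_rho(y, w_gamma, index, rho)
```

Agreement between `numeric_score` and `analytic_score` is only a check on the algebra for one parameter; it does not replace the full two-step variance calculation derived in this appendix.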
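Taking partial derivatives of the score condition (A.2),

$$\nabla_\delta f_{it} = \frac{\nabla_\delta \nabla_\theta m_{it}'\,(y_{it1} - m_{it})\,m_{it} - \nabla_\theta m_{it}'\,\nabla_\delta m_{it}\,m_{it} - \nabla_\theta m_{it}'\,\nabla_\delta m_{it}\,(y_{it1} - m_{it})}{m_{it}^2} \tag{A.8}$$

Taking expectations and using the fact that $E(y_{it1} \mid x_i, s_{it} = 1) = m_{it}$,

$$L_0 = -E\Big[\sum_{t=1}^{T} s_{it}\,\frac{\nabla_\theta m_{it}'\,\nabla_\delta m_{it}}{m_{it}}\Big] \tag{A.9}$$

where

$$\nabla_{\delta_t} m_{it} = \exp(w_{it}\Gamma)\,\exp\big(\tfrac{1}{2}\rho_t^2\big)\,\frac{z_{it}\,\phi(z_{it}\delta_t + \rho_t)\,\Phi(z_{it}\delta_t) - z_{it}\,\phi(z_{it}\delta_t)\,\Phi(z_{it}\delta_t + \rho_t)}{[\Phi(z_{it}\delta_t)]^2}$$

For each $i$ and $t$, $\nabla_\delta f_{it}$ is a block matrix of zeros except in one position. For each $t$, $\nabla_{\delta_t} f_{it}$ appears in row $(1 + K_1 + K_2 + t)$ and column $(1 + 2K_2)(t - 1) + 1$. A consistent estimate of $L_0$ is

$$\hat{L} = -N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\,\frac{\nabla_\theta \hat{m}_{it}'\,\nabla_\delta \hat{m}_{it}}{\hat{m}_{it}} \tag{A.10}$$

For the probit log-likelihood parameters, under standard regularity conditions, we can assume

$$\sqrt{N}(\hat{\delta} - \delta^*) = N^{-1/2} \sum_{i=1}^{N} q_i(\delta^*) + o_p(1) \tag{A.11}$$

The term $q_i(\delta^*)$ is a $(1 + 2K_2)T \times 1$ vector that has zero expectation and depends on the score and Hessian of the probit log likelihoods. For each $i$, $q_i(\delta)$ is composed of a series of $(1 + 2K_2) \times 1$ vectors stacked on top of each other,

$$\hat{q}_{it} = \hat{C}_t^{-1}\,\frac{\phi(z_{it}\hat{\delta}_t)\,z_{it}'\,[s_{it} - \Phi(z_{it}\hat{\delta}_t)]}{\Phi(z_{it}\hat{\delta}_t)\,[1 - \Phi(z_{it}\hat{\delta}_t)]} \tag{A.12}$$

where

$$\hat{C}_t = N^{-1} \sum_{i=1}^{N} \frac{[\phi(z_{it}\hat{\delta}_t)]^2\,z_{it}' z_{it}}{\Phi(z_{it}\hat{\delta}_t)\,[1 - \Phi(z_{it}\hat{\delta}_t)]} \tag{A.13}$$

is a consistent estimator of minus the expected Hessian from the probit log likelihood for each $t$. To derive $q_i(\delta)$, stack the vectors $q_{it}$ for each $i$. We can now derive the asymptotic variance matrix for $\hat{\theta}$. So

$$\sqrt{N}(\hat{\theta} - \theta_0) = A_0^{-1}\Big(N^{-1/2} \sum_{i=1}^{N} g_i(\theta_0; \delta^*)\Big) + o_p(1) \tag{A.14}$$

and therefore,

$$\sqrt{N}(\hat{\theta} - \theta_0) \overset{a}{\sim} \text{Normal}[0, A_0^{-1} D_0 A_0^{-1}] \tag{A.15}$$

where $g_i(\theta_0; \delta^*) = f_i(\theta_0; \delta^*) + L_0\,q_i(\delta^*)$ and $D_0 = E[g_i(\theta_0; \delta^*)\,g_i(\theta_0; \delta^*)'] = Var[g_i(\theta_0; \delta^*)]$. A consistent estimate of the asymptotic variance of $\hat{\theta}$ is $\widehat{Avar}(\hat{\theta}) = \hat{A}^{-1}\hat{D}\hat{A}^{-1}/N$, where

$$\hat{A} = N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it}\,\nabla_\theta m(w_{it}, z_{it}, \hat{\theta}, \hat{\delta}_t)'\,\nabla_\theta m(w_{it}, z_{it}, \hat{\theta}, \hat{\delta}_t)/m(w_{it}, z_{it}, \hat{\theta}, \hat{\delta}_t) \tag{A.16}$$

$$\hat{D} = N^{-1} \sum_{i=1}^{N} g_i(\hat{\theta}; \hat{\delta})\,g_i(\hat{\theta}; \hat{\delta})' \tag{A.17}$$

$$g_i(\hat{\theta}; \hat{\delta}) = f_i(\hat{\theta}; \hat{\delta}) + \hat{L}\,q_i(\hat{\delta}) \tag{A.18}$$

Table 5. Summary Statistics. Mean Values. Standard Deviation in parentheses.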
Variable Description                             Entire Sample      Participants       Non-Participants
Participation Indicator                          0.74               1                  0
Real Wage                                                           8.37 (5.83)
Experience                                       11.79 (7.76)       12.98 (7.58)       8.49 (7.28)
Education                                        12.93 (2.30)       13.12 (2.27)       12.40 (2.31)
Age                                              40.93 (10.27)      40.13 (9.62)       43.14 (11.63)
Married Indicator                                0.86               0.84               0.93
Other Household Income (thousands)               34.398 (40.379)    30.945 (30.868)    44.007 (58.237)
Spouse's Age                                     37.00 (18.17)      35.17 (18.28)      42.10 (16.84)
Spouse's Education                               11.21 (5.25)       10.98 (5.50)       11.84 (4.40)
Spouse's Unemployment Duration (Weeks)           0.97 (4.96)        0.94 (4.80)        1.06 (5.36)
Weeks Unreported (=1 if Spouse's
  Unemployment not reported)                     0.09               0.06               0.16
Children Aged 0-2                                0.14 (0.37)        0.11 (0.33)        0.21 (0.45)
Children Aged 3-5                                0.18 (0.42)        0.16 (0.40)        0.24 (0.48)
Children Aged 6-17                               0.82 (1.01)        0.84 (1.01)        0.77 (0.99)
Number of Obs.                                   11,401             8,387              3,014

Table 6. Wage Offer Equation. Fixed Effects Poisson (FEP) Estimates. Robust std. errors in parentheses, regular std. errors in brackets. Dependent variable: real hourly wage in levels.

                         FEP VAT            FEP Mills          FEP
exper                    0.0849***          0.080[?]***        0.0803***
                         (0.0130)           (0.0123)           (0.0122)
                         [0.0064]           [0.00579]          [0.00573]
expersq                  -0.000815***       -0.000746***       -0.000736***
                         (0.000196)         (0.000201)         (0.000197)
                         [0.000114]         [0.000108]         [0.000101]
educ                     0.0232**           0.0216**           0.0214**
                         (0.00938)          (0.00961)          (0.0096)
                         [0.00723]          [0.00707]          [0.00706]
s_{i,t+1}                -0.0265
                         (0.0472)
                         [0.0216]
log(λ_it)                                   0.000973
                                            (0.00287)
                                            [0.00209]
log likelihood           -15533.612         -17491.209         -17506.58

*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

Table 7. Wage Offer Equation. Linear Fixed Effects (FE) Estimates. Robust std. errors in parentheses. Year dummies included but not reported. Dependent Variable: log of real hourly wage.

                         FE VAT             FE Mills           FE
exper                    0.0875***          0.0816***          0.0829***
                         (0.0106)           (0.00994)          (0.010)
expersq                  -0.000939***       -0.000801***       -0.000924***
                         (0.000175)         (0.000167)         (0.000166)
educ                     0.0263***          0.0242***          0.0268***
                         (0.0097)           (0.00967)          (0.00976)
s_{i,t+1}                0.0883**
                         (0.0347)
λ_it                                        -0.129***
                                            (0.039)

*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

CHAPTER 3

1. Introduction

Recent interest in econometrics has focused on correlated random coefficient (CRC) models and the estimation of Average Partial Effects (APEs) and Average Treatment Effects (ATEs), which are APEs for a discrete explanatory variable. Wooldridge (2002, Chapter 18) provides an extensive review of methods to estimate ATEs. Although most work has focused on cross-sectional random coefficient models, recent interest has also centered upon panel data models with random coefficients. Wooldridge (2005a) uses a CRC panel data model to estimate average treatment effects with strictly exogenous regressors. More recently, Murtazashvili and Wooldridge (MW) (2005) and Murtazashvili (2007) have introduced endogeneity into CRC panel data models. For roughly continuous endogenous explanatory variables, MW (2005) use fixed effects instrumental variables (FE-IV) estimation to consistently estimate APEs. MW (2005) make the restrictive assumption that the covariance between the detrended covariates and the random coefficient, conditional on the detrended IVs, does not depend on the detrended IVs themselves, although the covariance may change over time. For the cross section, Card (2001) shows that this assumption is violated when he uses a binary indicator for proximity to a four-year college as the instrument for treatment and IQ as a proxy for unobserved ability. For a cross-sectional model, Wooldridge (2005b) is able to mitigate Card's (2001) criticism by using a control function approach for a roughly continuous endogenous treatment. Murtazashvili (2007) extends Wooldridge (2005b) to deal with CRC panel data models with roughly continuous endogenous treatments.
A control function, as defined by Wooldridge (2005b) and Murtazashvili (2007), is derived by generating residuals from a reduced form equation for the roughly continuous endogenous variable and plugging those residuals into the structural model as covariates. However, the approach used by Wooldridge (2005b) and Murtazashvili (2007) does not work when the endogenous treatment variable is discrete. For example, the control function approach used by Wooldridge (2005b) or Murtazashvili (2007) does not allow for binary or corner solution endogenous treatments. For the cross-sectional setup, Wooldridge's (2007) CRC model allows for an endogenous discrete treatment by making a distributional assumption on the reduced form of the treatment variable. By making a distributional assumption for the discrete endogenous treatment variable, Wooldridge (2007) derives a so-called "correction function" in order to produce a consistent estimate of the ATE. The correction function is a function of exogenous covariates derived from a first stage probit or tobit regression. This function is then plugged into the structural equation, and the ATE is estimated using IV methods.

In this paper, we extend the framework used by Wooldridge (2007) and Murtazashvili (2007) to deal with discrete endogenous treatments in CRC panel data models. As long as the instruments chosen have sufficient variation, weaker assumptions than those used in Murtazashvili (2007) are sufficient to produce a consistent IV estimator for the ATE. The motivation for this paper comes from Wooldridge's (2007) "correction function" approach for cross-sectional CRC models. As in Wooldridge (2007), I propose a simple two-step method that corrects for endogeneity in the structural model by using a correction function derived from a first stage probit or tobit estimation. The IV estimator that I propose is consistent and $\sqrt{N}$-asymptotically normal for $T$ fixed as $N \to \infty$.
As in Murtazashvili (2007), I allow the individual slopes on the treatment variables to vary over time, which allows me to derive a mechanism to generate a correction function.

The plan of this paper is as follows. Section 2 introduces the model and conditions needed to generate a consistent estimate for the ATE. Section 3 derives a general approach to finding a correction function for the CRC panel data model. Section 4 shows how to obtain a correction function when the endogenous treatment has a probit or tobit reduced form distribution. Section 5 shows an FE-IV approach to estimating the endogenous ATE along with the correction function. In Section 6, we test the finite sample properties of the correction function estimator by performing Monte Carlo simulations. Section 7 applies the theory presented in the paper to an empirical example dealing with the school choice program in Michigan and student performance. Finally, Section 8 provides some concluding remarks.

2. General model and assumptions

For a random draw $i$, consider the following structural model with time varying individual slopes and discrete endogenous treatments,

$$y_{it} = \alpha_t + c_i + x_{it}\eta + w_{it}b_{it} + u_{it}, \quad t = 1,\dots,T \tag{2.1}$$

where $c_i$ is unobserved heterogeneity, $w_{it}$ is a $1 \times K_1$ vector of discrete endogenous treatments, $x_{it}$ is a $1 \times K_2$ vector of exogenous covariates that may include an individual's productivity characteristics, and $\alpha_t$ are year dummies. Note that $b_{it}$ is a $K_1 \times 1$ vector of time varying individual specific slopes, and $u_{it}$ is the idiosyncratic error. For simplicity, a balanced panel is assumed. The setup we use is similar to that of Murtazashvili (2007), but in this case the treatment variable is allowed to be discrete. Let $z_{it}$ be a $1 \times L$ vector of instruments where $L + K_3 \geq K_1$. For now, I assume that $x_{it}$ is strictly exogenous.
As in Wooldridge (2007) and Murtazashvili (2007), I will focus on $E(b_{it}) = \beta$, which is simply a time constant $K_1 \times 1$ ATE vector. We can decompose $b_{it}$ into a nonrandom component and a zero mean random component. Therefore, we can write

$$b_{it} = \beta + d_i + r_{it} \tag{2.2}$$

Let us write $q_{it} = d_i + r_{it}$, where by definition $E(q_{it}) = 0$. We assume a constant ATE over time. Plugging (2.2) into (2.1) gives

$$y_{it} = \alpha_t + c_i + x_{it}\eta + w_{it}\beta + (w_{it}q_{it} + u_{it}), \quad t = 1,\dots,T \tag{2.3}$$

$$y_{it} = \alpha_t + c_i + x_{it}\eta + w_{it}\beta + \epsilon_{it} \tag{2.4}$$

where $\epsilon_{it} \equiv w_{it}q_{it} + u_{it}$.

Assumption 2.1: $E(u_{it} \mid x_i, z_i) = 0$, $t = 1,\dots,T$, where $x_i = (x_{i1},\dots,x_{iT})$ and $z_i = (z_{i1},\dots,z_{iT})$.

Our model allows for two sources of endogeneity: the correlation between $w_{it}$ and $\epsilon_{it}$, in addition to arbitrary correlation between $w_{it}$ and $(c_i, b_i)$, where $b_i = (b_{i1},\dots,b_{iT})$. Notice that we have not defined a relationship between $(c_i, b_i)$ and $(x_i, z_i)$. In order to consistently estimate the ATE, we need to develop a mechanism that relates $(c_i, b_i)$ to the covariates and the IVs. Therefore, let us make the following assumption.

Assumption 2.2: For $t = 1,\dots,T$,
i) $E(c_i \mid x_i, z_i) = E(c_i \mid \bar{x}_i, \bar{z}_i) = \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3$
ii) $E(d_i \mid x_i, z_i) = E(d_i \mid \bar{x}_i, \bar{z}_i) = B_1(\bar{x}_i - \psi_X)' + B_2(\bar{z}_i - \psi_Z)'$
iii) $E(r_{it} \mid x_i, z_i) = B_3(x_{it} - \psi_{tX})'$
where $B_1$ and $B_3$ are $K_1 \times K_2$, $B_2$ is $K_1 \times L$, $\psi_X = E(\bar{x}_i)$, $\psi_Z = E(\bar{z}_i)$, and $E(x_{it}) = \psi_{tX}$.

In Assumption 2.2 i) and ii), $\bar{x}_i$ and $\bar{z}_i$ can be thought of as sufficient statistics describing how the entire history $\{x_{it}, z_{it} : t = 1,\dots,T\}$ affects $d_i$ and $c_i$. Assumption 2.2 iii) means that once we control for an individual's observed productivity characteristics, the time varying component is mean independent of the IVs. This is the sort of ignorability of instruments assumption made in Wooldridge (2007). One needs to make the appropriate exclusion restrictions for IV estimation to work. Assumption 2.1 is also still valid since $(c_i, b_i)$ is a function of $x_i$ and $z_i$.
Under Assumption 2.2, one can write

$$q_{it} = B_1(\bar{x}_i - \psi_X)' + B_2(\bar{z}_i - \psi_Z)' + B_3(x_{it} - \psi_{tX})' + e_{it} \tag{2.5}$$

where $e_{it} = g_i + s_{it}$, $E(e_{it} \mid x_i, z_i) = 0$, $g_i$ is the zero mean random term implied in Assumption 2.2 ii), and $s_{it}$ is the zero mean random term implied in Assumption 2.2 iii). Having defined Assumption 2.1 and Assumption 2.2, we can now define an estimating equation. Letting $p_i$ be the zero mean error term implied by $c_i$, one can write

$$y_{it} = \alpha_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + ((\bar{x}_i - \psi_X) \otimes w_{it})\beta_1 + ((\bar{z}_i - \psi_Z) \otimes w_{it})\beta_2 + ((x_{it} - \psi_{tX}) \otimes w_{it})\beta_3 + p_i + w_{it}e_{it} + u_{it} \tag{2.6}$$

where $\beta_j = \mathrm{vec}[B_j]$ for $j = 1,2,3$. Note that we are not trying to estimate $E(y_{it} \mid w_{it}, x_i, z_i)$. Parametrizing $E(w_{it}e_{it} \mid w_{it}, x_i, z_i)$ as in Garen (1984) and Heckman and Robb (1986) would require that we use the control function approach described by Murtazashvili (2007). However, as we mentioned before, the control function approach used by Murtazashvili (2007) breaks down when the treatment variable is discrete. Therefore, we need to apply the so-called "correction function" approach used by Wooldridge (2007) for the cross-section. This requires that we find a parametric form for $E(w_{it}e_{it} \mid x_i, z_i)$. Generally, $E(w_{it}e_{it} \mid x_i, z_i)$ depends on $(x_i, z_i)$. MW (2005) show that when the conditional covariance of the detrended treatments and the unobserved heterogeneity, conditional on the detrended instruments, is constant, FE-IV estimation produces a consistent estimate of the APE. Murtazashvili (2007) relaxes this assumption further and allows the conditional variance-covariance matrix to be heteroskedastic. Therefore, even if $E(w_{it}e_{it} \mid x_i, z_i)$ were not to depend on $(x_i, z_i)$, we would still have to include year dummies in (2.6) to capture any heteroskedasticity. In order to consistently estimate the ATE, we need to find a parametric form for $E(w_{it}e_{it} \mid x_i, z_i)$ and use appropriate instruments.
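The regressors in (2.6) that involve time averages and demeaned interactions can be assembled mechanically from the panel arrays. The following is only a minimal sketch, assuming a balanced panel and a scalar treatment ($K_1 = 1$), so that each Kronecker product reduces to an elementwise product. The function name `build_crc_regressors`, the array layout, and the use of sample means in place of $\psi_X$, $\psi_Z$, and $\psi_{tX}$ are illustrative choices, not part of the original text.

```python
import numpy as np

def build_crc_regressors(x, z, w):
    """Stack the non-dummy regressors of (2.6) for a scalar treatment.

    x: (N, T, K2) exogenous covariates, z: (N, T, L) instruments,
    w: (N, T) treatment. Returns an (N*T, 4*K2 + 2*L + 1) array.
    """
    n, t, _ = x.shape
    x_bar = x.mean(axis=1)        # (N, K2) time averages of x
    z_bar = z.mean(axis=1)        # (N, L) time averages of z
    psi_x = x_bar.mean(axis=0)    # sample analogue of E(x_bar_i)
    psi_z = z_bar.mean(axis=0)    # sample analogue of E(z_bar_i)
    psi_tx = x.mean(axis=0)       # (T, K2) sample analogue of E(x_it)
    w3 = w[:, :, None]            # (N, T, 1) for broadcasting
    blocks = [
        np.repeat(x_bar[:, None, :], t, axis=1),  # x_bar_i
        np.repeat(z_bar[:, None, :], t, axis=1),  # z_bar_i
        x,                                        # x_it
        w3,                                       # w_it
        (x_bar - psi_x)[:, None, :] * w3,         # (x_bar_i - psi_X) w_it
        (z_bar - psi_z)[:, None, :] * w3,         # (z_bar_i - psi_Z) w_it
        (x - psi_tx[None, :, :]) * w3,            # (x_it - psi_tX) w_it
    ]
    return np.concatenate(blocks, axis=2).reshape(n * t, -1)

# Hypothetical data: N = 50 units, T = 3 periods, K2 = 2, L = 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3, 2))
z = rng.normal(size=(50, 3, 1))
w = (rng.random((50, 3)) > 0.5).astype(float)
R = build_crc_regressors(x, z, w)
print(R.shape)  # (150, 11)
```

Year dummies and the correction function term are left out here on purpose; they would be appended as further columns before estimation.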
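First, define

$$\kappa_{it}(x_i, z_i) \equiv E(w_{it}e_{it} \mid x_i, z_i) \tag{2.7}$$

$$\omega_{it} = w_{it}e_{it} - \kappa_{it}(x_i, z_i) \tag{2.8}$$

Hence, the estimating equation in error form is

$$y_{it} = \alpha_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + ((\bar{x}_i - \psi_X) \otimes w_{it})\beta_1 + ((\bar{z}_i - \psi_Z) \otimes w_{it})\beta_2 + ((x_{it} - \psi_{tX}) \otimes w_{it})\beta_3 + \kappa_{it}(x_i, z_i) + r_{it} \tag{2.9}$$

where $r_{it} = p_i + \omega_{it} + u_{it}$. Note that $E(r_{it} \mid x_i, z_i) = 0$, although $E(r_{it} \mid w_{it}, x_i, z_i) \neq 0$. In effect, we are adding a "correction function" to avoid having to condition upon $w_{it}$. The function $\kappa_{it}(x_i, z_i)$ acts as its own instrument in the IV estimation of (2.9). Since $\kappa_{it}(x_i, z_i)$ is unknown, we need to make an assumption as in Wooldridge (2007).

Assumption 2.3: $E(w_{it}e_{it} \mid x_i, z_i) = h_{it}(x_i, z_i, \pi)\rho$, where $h_{it}(x_i, z_i, \pi)$ is $1 \times K_1$ and $\rho$ is a $K_1 \times 1$ parameter vector.

Under Assumption 2.3 and the claim that $w_{it}$ is discrete in nature, we need to make a distributional assumption regarding $w_{it}$, which we will do later in this paper. Since $h_{it}(x_i, z_i, \pi)$ is a function based not upon $E(w_{it}e_{it} \mid w_{it}, x_i, z_i)$ but instead upon $E(w_{it}e_{it} \mid x_i, z_i)$, it is a "correction function", not a "control function", using the terminology from Wooldridge (2007). Therefore, under Assumptions 2.1, 2.2, and 2.3, in the population (2.9) is written as

$$y_{it} = \alpha_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + ((\bar{x}_i - \psi_X) \otimes w_{it})\beta_1 + ((\bar{z}_i - \psi_Z) \otimes w_{it})\beta_2 + ((x_{it} - \psi_{tX}) \otimes w_{it})\beta_3 + h_{it}(x_i, z_i, \pi)\rho + r_{it} \tag{2.10}$$

Provided that we have a consistent first stage estimate for $\pi$, we can estimate the following by any IV procedure, including GMM,

$$y_{it} = \alpha_t + \theta_1 + \bar{x}_i\theta_2 + \bar{z}_i\theta_3 + x_{it}\eta + w_{it}\beta + \mathrm{vec}[(\bar{x}_i - \hat{\psi}_X) \otimes w_{it}]'\beta_1 + \mathrm{vec}[(\bar{z}_i - \hat{\psi}_Z) \otimes w_{it}]'\beta_2 + \mathrm{vec}[(x_{it} - \hat{\psi}_{tX}) \otimes w_{it}]'\beta_3 + h_{it}(x_i, z_i, \hat{\pi})\rho + \mathrm{error}_{it} \tag{2.11}$$

where $\hat{\pi}$ is a consistent first stage estimate of $\pi$, the set of possible IVs includes $\{1, x_{it}, z_{it}, \bar{x}_i, \bar{z}_i, h_{it}(x_i, z_i, \hat{\pi}), [(\bar{x}_i - \hat{\psi}_X) \otimes x_{it}], [(\bar{x}_i - \hat{\psi}_X) \otimes z_{it}], [(\bar{z}_i - \hat{\psi}_Z) \otimes x_{it}], [(\bar{z}_i - \hat{\psi}_Z) \otimes z_{it}], [(x_{it} - \hat{\psi}_{tX}) \otimes z_{it}]\}$, $\hat{w}_{itj}$ is an estimate of $E(w_{itj} \mid x_i, z_i)$ for $j = 1,\dots,K_1$, and $\hat{w}_{it} = (\hat{w}_{it1},\dots,\hat{w}_{itK_1})$.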
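To make the two-step logic behind (2.11) concrete, the following sketch simulates a deliberately stripped-down version of the problem and applies a correction function IV estimator to it. It is an illustration under strong simplifying assumptions, not the estimator of this chapter: the data are pooled with no unobserved effect $c_i$, no Mundlak terms, and no year dummies; the treatment is a scalar probit; and the design values, function names, and the Fisher scoring probit routine are all hypothetical choices made for the example. With a probit first stage, $E(w e \mid z) = \rho\,\phi(\cdot)$, so the normal density evaluated at the fitted index serves as the correction function and as its own instrument, while the fitted probability instruments for the treatment.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(a):
    return np.exp(-0.5 * a**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(a):
    return 0.5 * (1.0 + _erf(a / math.sqrt(2.0)))

def fit_probit(X, w, n_iter=25):
    """Pooled probit by Fisher scoring; returns the coefficient vector."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a = X @ coef
        cdf = np.clip(norm_cdf(a), 1e-10, 1.0 - 1e-10)
        pdf = norm_pdf(a)
        var = cdf * (1.0 - cdf)
        score = X.T @ ((w - cdf) * pdf / var)
        info = (X * (pdf**2 / var)[:, None]).T @ X
        coef = coef + np.linalg.solve(info, score)
    return coef

def correction_function_iv(y, w, z):
    """Two-step estimate of (intercept, ATE, rho) for a binary treatment."""
    X1 = np.column_stack([np.ones_like(z), z])
    index = X1 @ fit_probit(X1, w)                 # step 1: first stage probit
    phi_hat, Phi_hat = norm_pdf(index), norm_cdf(index)
    G = np.column_stack([np.ones_like(z), w, phi_hat])        # regressors
    H = np.column_stack([np.ones_like(z), Phi_hat, phi_hat])  # instruments
    return np.linalg.solve(H.T @ G, H.T @ y)       # step 2: just-identified IV

def simulate(n=60_000, beta=2.0, rho=0.8, seed=0):
    """Hypothetical design: the random slope and the error both load on v."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    v = rng.normal(size=n)
    w = (z + v >= 0).astype(float)                 # probit reduced form
    e = rho * v + rng.normal(scale=0.5, size=n)    # E(e | v) = rho * v
    u = 0.5 * v + rng.normal(size=n)               # endogeneity through v
    y = 1.0 + w * (beta + e) + u
    return y, w, z

y, w, z = simulate()
alpha_hat, beta_hat, rho_hat = correction_function_iv(y, w, z)
ols_slope = np.polyfit(w, y, 1)[0]                 # naive OLS for comparison
print("correction function IV:", beta_hat, "OLS:", ols_slope, "rho:", rho_hat)
```

In this design the naive OLS slope is pushed well above the true ATE of 2, while the IV estimate that includes the correction function should land close to it. Standard errors are omitted; they would need the generated regressor adjustment discussed in the text whenever $\rho \neq 0$.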
Having restricted $c_i$ in Assumption 2.2 i), we can also estimate (2.11) by random effects IV (RE-IV), which would allow us to account for any serial correlation. By doing GMM, we can exploit the available moment conditions to produce an efficient and consistent estimate for the ATE. There are two sources of estimation error in estimating the ATE in (2.11). The first source comes from replacing $E(\bar{x}_i)$, $E(\bar{z}_i)$, and $E(x_{it})$ with $\hat{\psi}_X$, $\hat{\psi}_Z$, and $\hat{\psi}_{tX}$, respectively. Fortunately, as shown in Wooldridge (2002, Chapter 6), the estimation error from replacing $E(\bar{x}_i)$, $E(\bar{z}_i)$, and $E(x_{it})$ is cancelled out by the variation in $r_{it}$. The second source is the first stage estimation of $\pi$: the asymptotic variance of $\sqrt{N}(\hat{\pi} - \pi)$ affects the limiting distribution of the IV estimator for the ATE. Under the null hypothesis of no endogeneity bias, $\rho = 0$, the first stage estimation of $\pi$ does not affect the limiting distribution of the IV estimator for the ATE (see Wooldridge (2002, Chapter 6)). To test for endogeneity, simply use the usual asymptotic Wald statistic robust to heteroskedasticity and serial correlation and test $H_0: \rho = 0$ vs. $H_A: \rho \neq 0$. If the null hypothesis is rejected, the standard errors need to be corrected to account for the generated regressor problem. See Appendix A for the formulas to generate standard errors robust to heteroskedasticity and estimation error from $\hat{\pi}$.

3. A general method for deriving the correction function

Having shown the necessary conditions to generate a consistent IV estimate for the ATE, we will now discuss how the correction function is derived. First, I will go through a general approach to show how the correction function is derived. Then, I will show how to derive a correction function when the endogenous treatment is a probit or tobit response. I will use a framework similar to that used in Wooldridge (2007), but extend the analysis to deal with a panel. For $j = 1,\dots,K_1$ and $t = 1,\dots,T$, $E(w_{itj}e_{itj} \mid x_i, z_i)$ needs to be defined.
In order to define the correction function, we need to formulate a particular distribution for $w_{itj}$ conditional on the covariates and IVs,

$$w_{itj} = f_j(x_i, z_i, v_{itj}; \alpha_j) \tag{3.1}$$

$$v_{itj} \mid x_i, z_i \sim G_j(v_{itj}; \eta_j) \tag{3.2}$$

$$E(e_{itj} \mid v_{itj}, x_i, z_i) = E(e_{itj} \mid v_{itj}) = \rho_j v_{itj} \tag{3.3}$$

Equation (3.1) specifies a functional form for $w_{itj}$, which is a function of the covariates, IVs, and the reduced form error, $v_{itj}$. Meanwhile, (3.2) specifies a particular distribution for the reduced form error, which has density $g_j(v_{itj}; \eta_j)$. Equation (3.3) is a conditional mean independence assumption which states that conditional on $v_{itj}$, $e_{itj}$ is mean independent of $(x_i, z_i)$. I assume a linear conditional expectation in (3.3); this assumption holds when $(e_{itj}, v_{itj})$ is bivariate normal conditional on $(x_i, z_i)$. Together, (3.1), (3.2), and (3.3) define a particular distribution for $w_{itj}$, $D(w_{itj} \mid x_i, z_i)$. Although we are specifying a particular distribution for $w_{itj}$, which may be too strong an assumption, we are not restricting $D(c_i \mid x_i, z_i)$ and $D(q_{it} \mid x_i, z_i)$. This is advantageous since it would be quite restrictive to specify a particular distribution for unobserved heterogeneity. Using (3.1), (3.2), and (3.3) and a law of iterated expectations argument, we can show that for $j = 1,\dots,K_1$,

$$\begin{aligned} E(w_{itj}e_{itj} \mid x_i, z_i) &= E[E(w_{itj}e_{itj} \mid v_{itj}, x_i, z_i) \mid x_i, z_i] \\ &= E[w_{itj}E(e_{itj} \mid v_{itj}, x_i, z_i) \mid x_i, z_i] \\ &= \rho_j E(w_{itj}v_{itj} \mid x_i, z_i) \\ &= \rho_j E[f_j(x_i, z_i, v_{itj}; \alpha_j)v_{itj} \mid x_i, z_i] \\ &= \rho_j \int f_j(x_i, z_i, v_{itj}; \alpha_j)\, v_{itj}\, g_j(v_{itj}; \eta_j)\, dv_{itj} \\ &= \rho_j h_{itj}(x_i, z_i; \pi_j) \end{aligned} \tag{3.4}$$

where $\pi_j = (\alpha_j', \eta_j')'$ is a parameter vector. Under standard regularity conditions, $\pi_j$ can be estimated using maximum likelihood methods. In the next section, we outline methods to generate a first stage estimate of $\pi_j$ when $w_{itj}$ is either a tobit or probit treatment.

4. Examples

I will now derive the correction function and define the procedure for testing and correcting for endogeneity when the treatment is a tobit or probit variable.
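The integral that defines the correction function in (3.4) can be checked numerically in simple cases. The sketch below is only an illustration: it takes a single index value and a normal reduced form error with standard deviation `sigma`, evaluates $E[f(\cdot, v)\,v]$ on a grid, and compares it with two standard normal-moment results, $E\{1[a + v \geq 0]\,v\} = \sigma\phi(a/\sigma)$ for a binary treatment and $E\{\max(0, a + v)\,v\} = \sigma^2\Phi(a/\sigma)$ for a corner solution treatment. The function names, the grid settings, and the test values of `a` and `sigma` are assumptions made for the example.

```python
import math
import numpy as np

def normal_pdf(x):
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_treatment(index, v):
    """Binary treatment: w = 1[index + v >= 0]."""
    return (index + v >= 0).astype(float)

def tobit_treatment(index, v):
    """Corner solution treatment: w = max(0, index + v)."""
    return np.maximum(0.0, index + v)

def correction_by_quadrature(f, index, sigma=1.0, n_grid=200_001, width=10.0):
    """Approximate E[f(index, v) * v] for v ~ Normal(0, sigma^2)."""
    v = np.linspace(-width * sigma, width * sigma, n_grid)
    density = normal_pdf(v / sigma) / sigma
    dv = v[1] - v[0]
    return float(np.sum(f(index, v) * v * density) * dv)  # Riemann sum

a, sigma = 0.7, 1.5  # hypothetical index value and error scale
probit_num = correction_by_quadrature(probit_treatment, a, sigma)
tobit_num = correction_by_quadrature(tobit_treatment, a, sigma)
print(probit_num, sigma * normal_pdf(a / sigma))     # binary case
print(tobit_num, sigma**2 * normal_cdf(a / sigma))   # corner solution case
```

Each printed pair should agree to several decimal places, which is the sense in which the correction function is a known function of the first stage index once a distribution for the reduced form error is assumed.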
This section extends Wooldridge (2007) to deal with a panel. 4.1 Probit treatment variables For j = l, ,K 1, let w represent a binary and possibly endogenous treatment, where it] w,, 2 (Wm , ,w,,K1 ). Let n,, = (x,,,z,,) be a 1 x (K2 + L) vector ofexogenous covariates and IVs. Under Assumptions 2.1, 2.2 and equation (3.3), lets make the following binary distributional assumption for W,”- Wit] = 1[n,,0j+c,2+v,,j Z 0] (4.1) v,,j|n, ~ Norma/(03%,) (4.2) where 6,2 is unobserved heterogeneity. We can allow for correlation between 6,2 and n, as in Mundlak (1978) and write c,2 = 020 + 11,012, + a,2 where (1,2 ~ Normal(0, l). Restricting c,2, we can rewrite (4.1) and (4.2) as W,',]' = l[(120 + n,,0j + fi,a21+p,,, Z O] (4.3) p,,J-|n, ~ Norma/(0,0 (4.4) 2 ptj) where p”, = (1,2 + 1),-,1- and Var(p,,j) = 0%,, Due to time varying heteroskedasticity in (4.3), the estimated parameters in (4.3) need to be rescaled. So, the response probability in (4.3) is written as 80 P(W = 1|n,) = (l)(azot + nitetj + fiia2lt) = ,,(n,ft,) = ((0,,1,,,,K1) and (it-Km?!» = Wm , a¢itK1)- 3. Estimate the following equation by any IV procedure, including GMM, 81 y,, = a, + 61+ i,02 + 2,03+x,,n + w,,B + vec[(i, — \VX) ® w,,]'[3, + + vec[(i, — \VZ) ® wit],BZ + vec[(x,, - WIX) ‘8’ “571,33 + + we.» + error, (48) using the following instruments .. . [laxitaiiaiiad’ita vec[(xit _ WtX) ® (DI-[Tavecixii — WX) ® (pity, wee-[(2,- — w) e wait-,1. 4. Using an asymptotic and robust Wald statistic, test Ho : p =0 vs. H A : p #0. If the null hypothesis is rejected, adjust the standard errors to account for the first stage estimation of it,. See Appendix A. Procedure 4.1 provides an effective method to consistently estimate the ATE in a CRC panel data model. The procedure is fairly easy to implement and is an effective way to test for endogeneity in the treatment effect. 4.2 Tobit treatment variables In some cases, the treatment indicator is not a binary response variable. 
For example, the number of hours spent in a job training program can act as a treatment to get a potential worker into the workforce. For some fraction of the workforce, the number of hours observed in a job training program is zero. Therefore, one can think of hours in a training program as a tobit variable. In this subsection, we maintain Assumptions 2.1 and 2.2, but now write in place of (4. 1) w,,, = max(0, n,,0j + C12 + V,,]') (4.9) Again, allowing for dependence between 6,2 and n,, we can write (4.9) as w,,, = max(0,a20 + n,,0j + [1,0121 + (1,2 + v,,,) = max(0,n,1tj +£01.11.) (4. 10) 82 where p,,, = a,2 + 1),-U- and p,,j|n, ~ N0rmal(0,012),,) as in the previous subsection. Having defined the tobit reduced form for w,,,, +33 1 pitj E(u-’,,jp,,j|n,) = J max(0,n,1tj +pltj)pl{/ ¢( O'ptl‘ )dpltj —00 Opt] = ogy¢(n,1r,j) (4.11) As in the previous subsection, ny- = j/O’py'. Using (4.1 l), the following estimating equation can be written as follows, y,, = a, + 01+ i,02 + i,03+x,,11 + w,,B + vec[(x, — WX) ® w,,]’01 + + vec[(i, — \VZ) ® w,,]'B2 + + vec[(x,, — WIX) ® w,,]'[33 +[o,2,,<1>(n,1t,)]p + r,, (4.12) where [0,29, 0] (6.4) _ .. l — 7a,, '1' 301,, '1' Vi, +0, + b,) where v,, ~ N0rmal(0, 1). Defining C, as year dummies, the reduced form parameters are derived from estimating the following equation using pooled probit POW: = 1|z,) = (MCI +z,,y1 +5172) (65) The estimating equation for the main regression equation is m = Ct + W113 + 51171 + Wizfii - Elm + 5011+ git (6-6) where (15,, is the correction function derived from the reduced form probit equation. The main results for the Monte Carlo simulations are presented in Table 8 and Table 9. In Table 8, Monte Carlo results are reported when 2,, ~ N0rmal(—l, 1). In order to allow for more variability in 2,,, we report in Table 9 Monte Carlo results for 2,, ~ N0rma1(—1,4). We do simulations for N = 500, 1000, and 1500 while T = 5. The true population parameters are set at 6 = 2 and a = 1. 
Year dummies are included in the reduced form probit estimating equation and in the structural estimating equation. The number of replications is 500. Along with the mean, mean absolute error (MAE), standard deviation (SD), and root mean squared error (RMSE), the median, lower quantile (LQ), and upper quantile (UQ) of $\hat{\beta}$ are reported in Tables 8 and 9. As shown in Tables 8 and 9, the sample correlations between $w_{it}$ and $b_i$ and between $w_{it}$ and $a_i$, $\hat{\rho}_{wb}$ and $\hat{\rho}_{wa}$, are similar across the two specifications of $z_{it}$. The estimated correlation between $w_{it}$ and $u_{it}$, $\hat{\rho}_{wu}$, falls when $z_{it} \sim \mathrm{Normal}(-1, 4)$. However, there is still evidence of substantial correlation between $w_{it}$ and $u_{it}$ whether $z_{it}$ is $\mathrm{Normal}(-1, 1)$ or $\mathrm{Normal}(-1, 4)$.

As predicted by the theory, the so-called "correction function" estimator for the panel behaves like a consistent estimator: its mean is close to the population value and its dispersion shrinks as $N$ grows. The FE-IV estimator is also reasonably well behaved, especially when $z_{it} \sim \mathrm{Normal}(-1, 1)$. For example, the FE-IV estimator has a smaller RMSE than the correction function estimator when $z_{it} \sim \mathrm{Normal}(-1, 1)$. As shown in Tables 8 and 9, the FE-IV estimator is more efficient, especially when $z_{it} \sim \mathrm{Normal}(-1, 1)$. When $z_{it}$ has more variability, the correction function estimator is almost as efficient as the FE-IV estimator. The fact that the correction function estimator is not as efficient as FE-IV when $z_{it} \sim \mathrm{Normal}(-1, 1)$ is not surprising and suggests that the correction function is a highly collinear function of the instrument, $z_{it}$. However, the correction function estimator has a lower RMSE and bias compared to the FE-IV estimator when the variability of $z_{it}$ increases. As expected, POLS, IV, and FE are not consistent. Their estimated values for the ATE are not close to the population value set in the simulations. However, the correction function estimator does not perform very well in terms of the estimated mean when $z_{it} \sim \mathrm{Normal}(-1, 1)$ and $N = 500$.
This is perhaps not at all surprising since we would expect the correction function estimator to perform better when there are more cross-sectional observations as well as more variability in $z_{it}$. Wooldridge (2007) points out that the correction function estimator performs better when there are a large number of cross-sectional observations.

7. Empirical Example: Michigan Schools of Choice Program

Starting in 1997, school districts in Michigan were allowed to enroll nonresident students without having to obtain permission from the student's district of residence. In this section, we will use the correction function estimator to estimate the ATE of the school choice program on student performance. As a measure of student performance, we use satisfactory math pass rates for fourth graders. Controlling for real expenditures and eligibility in a school lunch program, we estimate the following equation,

$$pass4_{it} = \alpha_t + c_i + \eta_1 \log(rexpp)_{it} + \eta_2 lunch_{it} + b_{it}\,choice_{it} + u_{it} \tag{7.1}$$

where $pass4_{it}$ is the satisfactory math pass rate for fourth graders, $choice_{it}$ is a binary indicator describing whether a school district had a choice program in a particular year, $\log(rexpp)_{it}$ is the log of real expenditure per student in 1997 dollars, $lunch_{it}$ is the percentage eligible for a free lunch, $c_i$ is a district specific effect, and $\alpha_t$ are year effects. The sample consists of 550 school districts for the years 1997 and 1998. A school district is assumed to have a choice program if the district has greater than zero choice students. In equation (7.1), we assume that $choice_{it}$ is not strictly exogenous. For example, a particular school district may not admit students from another school district if administrators determine that performance in the previous year declined due to external and uncontrollable factors. Table 10 gives the summary statistics for the control variables, $choice_{it}$, and math pass rates for districts with and without a choice program.
As shown in Table 10, school districts with a choice program have lower mean enrollment and slightly lower mean math pass rates than school districts without a choice program. In order to test the effect of the choice program on math pass rates, we estimate (7.1) using the correction function estimator, FE-IV, FE, pooled IV, and pooled OLS. We instrument for $choice_{it}$ by using the log of district enrollment, which we assume to be strictly exogenous. The robust t-statistic on log enrollment in the reduced form probit equation with controls is 3.07, while the usual non-robust t-statistic is 2.44, which indicates that log enrollment is a relevant instrument for the binary choice program variable. The results are presented in Table 11.

Unfortunately, the correction function estimator does not perform as well as the Monte Carlo simulations would indicate, perhaps because there is not enough time variation in the data. There is evidence of multicollinearity in the estimates in column 1 of Table 11. However, the FE-IV estimator that tests for slope heterogeneity and the interaction effect performs somewhat well. There is strong statistical evidence of an interaction effect, as indicated by the t-statistic on the pdf function: the coefficient on the pdf function is statistically significant at the 1% level. Although the FE-IV estimate indicates that there is a positive impact of the school choice program on student performance, the coefficient on $choice_{it}$ is not statistically significant at the 10% level. For comparison, estimates from the FE, pooled IV, and pooled OLS regressions indicate that the choice program has a negative impact on student performance. However, only the POLS coefficient on $choice_{it}$ is statistically significant at the 10% level.

8. Conclusion

In this paper, we have estimated the ATE for a panel using a correlated random coefficient model. Unlike previous works that deal with estimating APEs for a panel, our model allows for discrete treatments.
In addition to allowing for discreteness in our treatment variable, we also allow for endogeneity in our model. An advantage of using the correction function estimator or the FE-IV correction function estimator presented in Section 5 is that no distributional assumptions are required to restrict the unobserved heterogeneity. In order to derive the correction function and account for endogeneity, fairly strong assumptions are required regarding the nature of the discrete treatment. However, since the treatment is observed, it is fairly easy to make distributional assumptions regarding the endogenous treatment. In the simulations and in the empirical example presented in the paper, we used pooled probit to estimate the reduced form for the endogenous treatment. Monte Carlo simulations show that the correction function estimator performs well in finite samples, especially when there are a large number of cross-sectional observations. We applied the methods developed in this paper to an empirical example examining the effect of the Michigan schools of choice program on student performance. In the future, we expect the estimator developed in this paper to be used in the program evaluation literature to test the effect of certain policies over a fixed time period.

Table 8: Monte Carlo Results. 500 Replications. Year Dummies Included.
z_it ~ Normal(-1, 1), beta = 2, T = 5. Columns (6) to (8) are the sample correlations rho_wb, rho_wa, rho_wu.

            (1)     (2)     (3)     (4)     (5)         (6)     (7)     (8)
Estimator   POLS    IV      FE      FE-IV   Corr. Fxn   rho_wb  rho_wa  rho_wu
N = 500
  Mean      3.74    2.385   2.61    1.64    1.93        0.34    0.34    0.31
  SD        0.084   0.216   0.078   0.18    0.76
  RMSE      1.737   0.45    0.62    0.429   0.75
  MAE       1.735   0.39    0.61    0.378   0.55
  LQ        3.68    2.23    2.56    1.51    1.54
  Median    3.74    2.38    2.61    1.63    1.98
  UQ        3.79    2.54    2.66    1.75    2.38
N = 1000
  Mean      3.73    2.396   2.61    1.65    1.99        0.34    0.34    0.31
  SD        0.059   0.15    0.058   0.129   0.475
  RMSE      1.73    0.439   0.628   0.398   0.488
  MAE       1.73    0.398   0.619   0.361   0.372
  LQ        3.69    2.28    2.57    1.55    1.71
  Median    3.74    2.40    2.62    1.65    2.02
  UQ        3.77    2.50    2.65    1.73    2.29
N = 1500
  Mean      3.74    2.41    2.62    1.66    1.99        0.34    0.34    0.31
  SD        0.047   0.12    0.047   0.107   0.393
  RMSE      1.74    0.44    0.631   0.379   0.407
  MAE       1.74    0.42    0.623   0.349   0.309
  LQ        3.71    2.33    2.59    1.58    1.75
  Median    3.73    2.40    2.62    1.66    1.99
  UQ        3.77    2.48    2.65    1.73    2.25

Table 9: Monte Carlo Results. 500 Replications. Year Dummies Included.
z_it ~ Normal(-1, 4), beta = 2, T = 5. Columns (6) to (8) are the sample correlations rho_wb, rho_wa, rho_wu.

            (1)     (2)     (3)     (4)     (5)         (6)     (7)     (8)
Estimator   POLS    IV      FE      FE-IV   Corr. Fxn   rho_wb  rho_wa  rho_wu
N = 500
  Mean      3.49    3.04    2.19    1.82    1.99        0.355   0.361   0.16
  SD        0.09    0.121   0.074   0.094   0.099
  RMSE      1.49    1.05    0.239   0.242   0.16
  MAE       1.48    1.04    0.196   0.194   0.086
  LQ        3.42    2.96    2.14    1.76    1.93
  Median    3.48    3.03    2.19    1.82    2.00
  UQ        3.54    3.12    2.25    1.88    2.07
N = 1000
  Mean      3.48    3.05    2.19    1.82    2.00        0.356   0.361   0.15
  SD        0.064   0.086   0.049   0.066   0.068
  RMSE      1.48    1.05    0.23    0.228   0.143
  MAE       1.48    1.05    0.193   0.186   0.063
  LQ        3.43    2.99    2.15    1.78    1.95
  Median    3.48    3.05    2.18    1.82    2.00
  UQ        3.52    3.10    2.22    1.86    2.05
N = 1500
  Mean      3.49    3.05    2.19    1.83    2.00        0.356   0.361   0.16
  SD        0.049   0.066   0.041   0.054   0.059
  RMSE      1.49    1.06    0.23    0.221   0.140
  MAE       1.49    1.06    0.20    0.181   0.056
  LQ        3.46    3.01    2.16    1.79    1.96
  Median    3.49    3.04    2.20    1.82    2.00
  UQ        3.52    3.09    2.22    1.86    2.04

Table 10: Summary Statistics. Mean values. Standard deviations in parentheses.

Variables                   Choice = 1            Choice = 0
Enrollment                  2359 (2819.9)         3486.5 (10103.71)
4th grade math pass rate    66.01 (16.19)         67.989 (15.84)
Expenditures (1997$)        5948.63 (843.68)      6083.55 (1074.66)
Lunch eligibility           30.57 (14.41)         26.96 (17.02)
Number of obs.              437                   663

Table 11: ATE Estimates. Robust standard errors in parentheses. Year dummies included but not reported.

              (1)         (2)          (3)        (4)         (5)
              Corr. Fxn   FE-IV        FE         Pooled IV   POLS
choice        -30.19      19.33        -0.467     -33.78      -1.51*
              (52.21)     (15.53)      (1.56)     (30.30)     (0.884)
log(rexpp)    34.75       -15.67       7.55       2.77        10.19***
              (188.67)    (28.85)      (6.29)     (8.59)      (3.27)
lunch         0.361       0.185        -0.073     -0.264**    -0.379***
              (6.27)      (0.267)      (0.245)    (0.126)     (0.034)
pdf           -14.91      -244.52***
              (167.9)     (80.93)

*** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.

Appendix A

In this section, I will show how to derive robust standard errors when the reduced form for the endogenous treatment has the probit form. Consider the following model from Section 4 with a scalar endogenous treatment,

$$y_{it} = g_{it}\Gamma + e_{it} \tag{A.1}$$

where $g_{it} = (1, \bar{x}_i, \bar{z}_i, x_{it}, w_{it}, (\bar{x}_i - \psi_X)w_{it}, (\bar{z}_i - \psi_Z)w_{it}, (x_{it} - \psi_{tX})w_{it}, \phi_{it})$, $\Gamma = (\theta_1, \theta_2', \theta_3', \eta', \beta, \beta_1', \beta_2', \beta_3', \rho)'$, and $E(e_{it} \mid w_{it}, x_i, z_i) = 0$. Note that $\Gamma$ is $(3 + 4K + 2L) \times 1$. Define the generated instruments in time period $t$ by the $1 \times (3 + 4K + 2L)$ vector

$$\hat{h}_{it} = (1, \bar{x}_i, \bar{z}_i, x_{it}, \hat{\Phi}_{it}, (x_{it} - \hat{\psi}_{tX})\hat{\Phi}_{it}, (\bar{x}_i - \hat{\psi}_X)\hat{\Phi}_{it}, (\bar{z}_i - \hat{\psi}_Z)\hat{\Phi}_{it}, \hat{\phi}_{it})$$

which means that the ATE $\beta$ is just identified. Using instruments $\hat{h}_{it}$, the pooled 2SLS estimator is

$$\hat{\Gamma} = \left[\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\hat{h}_{it}\right)\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{h}_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{g}_{it}\right)\right]^{-1} \times \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{g}_{it}'\hat{h}_{it}\right)\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'\hat{h}_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{h}_{it}'y_{it}\right) \tag{A.2}$$

Multiplying through by $\sqrt{N}$, writing (A.1) as $y_{it} = \hat{g}_{it}\Gamma + (g_{it} - \hat{g}_{it})\Gamma + r_{it}$, and plugging this into (A.2), one can write z=1t=1 [=1 (=1 i=1 (=1 N T 1 N T , ’1 212222;, a WZZhnhn x (=1 t=l i=1 t=1 N T , x 11*“2 ZZhquu-énr +41 (A3) i=1 i=1 Under standard regularity conditions, (A3) is written as N T (CD-1C)’1CD-121+op(1) i=1 [=1 i=1 t=1 N T = P'1 (A10) i=1 t=1 where , _ (¢(ki15)}2kizkit A145" 90.2011 -¢