CONTROL FUNCTION METHODS IN APPLIED ECONOMETRICS By Riju Joshi A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Economics – Doctor of Philosophy 2018

ABSTRACT

CONTROL FUNCTION METHODS IN APPLIED ECONOMETRICS By Riju Joshi

This dissertation considers estimation and inference in three econometric models that address issues commonly encountered with observational data. Fundamental problems of self-selection, endogeneity, and missing observations are pervasive in observational data. Moreover, the observations in a dataset are rarely statistically independent and often have complex dependence structures. These issues can significantly affect causal analysis and pose serious limitations on popular methodologies, which either maintain restrictive assumptions or require complicated and computationally tedious solutions. This dissertation applies the control function method as the primary tool to design estimation procedures under relaxed distributional and functional form assumptions, allowing a researcher to incorporate more variability (or heterogeneity). I describe computationally simple solutions to these issues that yield more precise results.

Chapter 1: Specification Tests in Unbalanced Panels with Endogeneity (joint work with Jeffrey M. Wooldridge)

This chapter develops specification tests for unbalanced panels with endogenous explanatory variables. We obtain a general equivalence result for the Random Effects 2SLS and Pooled 2SLS estimators in an unbalanced panel. This algebraic result serves as the foundation for a fully-robust regression-based Hausman test that compares the RE2SLS and FE2SLS estimators in the form of a variable addition test. In addition, we also obtain an equivalence result for control function estimators and FE2SLS estimators in an unbalanced panel.
These results help us obtain a regression-based, fully robust specification test for correlation between the explanatory variables and the unobserved idiosyncratic errors. This test compares the FE estimator with the FE2SLS estimator.

Chapter 2: Control Function Sieve Estimation of Endogenous Switching Models with Endogeneity (joint work with Jeffrey M. Wooldridge)

In this chapter, we propose a sieve estimation procedure for estimating average treatment effects with a binary treatment in the framework of endogenous switching models. We consider a generalized model for the reduced form of the treatment variable that allows for heterogeneity in terms of a distribution-free, conditionally heteroskedastic error term. We derive a simple, two-step estimation method that uses control function methods to correct for the endogeneity of the treatment assignment. As an empirical illustration, we consider the effect of attending a Catholic high school on student math test scores.

Chapter 3: Control Function Estimation of Spatial Error Models with Endogeneity

This chapter considers estimation of linear regression models that allow some covariates to be endogenous when the data are suspected to exhibit spatial dependence. For example, a hedonic price model for housing markets not only contains endogenous covariates of prime interest, such as schooling quality, but also often involves spatially correlated neighborhood variables that are difficult to incorporate explicitly. These omitted, spatially correlated neighborhood variables induce spatial correlation in the errors of the model. This chapter uses the control function method to control for endogeneity and incorporates the spatial dependence of the data to achieve more precise results. I describe an estimation strategy that first divides the observations into groups based on the distance between them and then imposes control function assumptions to model the endogeneity within each group.
A computationally simple two-step procedure is suggested for the parametric estimation strategy, in which a GLS-type estimation accounts for the within-group correlations while ignoring the across-group correlations. Results from Monte Carlo simulation studies show that we obtain noticeable efficiency gains through this estimation procedure.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my sincere gratitude to the chair of my dissertation committee, Jeff Wooldridge, for his guidance and for his endless patience. He has always been supportive of me and my research, and his kind words of encouragement really helped me grow as a researcher. I feel truly fortunate to have him as a mentor. I would also like to thank Peter Schmidt, Kyooil Kim, and Saweda Liverpool-Tasie for serving on my committee and providing valuable feedback and assistance.

I also appreciate the comments of seminar participants at Michigan State University, the Association of Public Policy Analysis and Management Fall 2016 Research Conference, the North American Summer Meetings of the Econometric Society, and the 27th Annual Meeting of the Midwest Econometrics Group. I am especially thankful to the participants at the Third MEG Mentoring Workshop for Junior Female Economists.

I am grateful for the financial support I received from the Graduate School and the Department of Economics at Michigan State University, including the Delia Koo Global Student Scholarship. I am especially thankful to Siddharth Chandra for providing me with financial support several times during my time in graduate school. I also appreciate the support and advice that Lori Jean Nichols and Todd Elder gave me as I navigated the graduate program and job market.

I am truly indebted to my friend Muzna for all the emotional support, companionship, and caring she provided that helped me through the difficult times.
I am also grateful to my dearest friends Annie, Pallavi, Alyssa, Akanksha, Danielle, Walter, Meenakshi, Udita, Ashesh, and Kelsie for making my graduate experience so memorable. I am truly grateful to my parents Prakash and Usha and my sister Richa for their selfless love and faith in me. They always encouraged and helped me at every stage of my personal and academic life. Finally, I am grateful to my husband and love of my life, Pathikrit, for always believing in me. He has always been there for me, and without his love and support I would be lost. He is truly my rock.

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER 1  SPECIFICATION TESTS IN UNBALANCED PANELS WITH ENDOGENEITY
  1.1  Introduction
  1.2  Model
  1.3  Estimation Methods
    1.3.1  Fixed Effects 2SLS (FE2SLS)
    1.3.2  Random Effects
  1.4  An Algebraic Equivalence Result
  1.5  Regression-based Fully-robust Hausman Test to Compare RE2SLS and FE2SLS
  1.6  Robust Hausman Test to Compare FE vs. FE2SLS
    1.6.1  Model
  1.7  A Strategy for an Applied Econometrician
  1.8  Empirical Illustration
    1.8.1  Background
    1.8.2  Results
  1.9  Technical Details
    1.9.1  Derivation for β̂_P2SLS
    1.9.2  Proof of Theorem 2
  1.10  Concluding Remarks

CHAPTER 2  CONTROL FUNCTION SIEVE ESTIMATION OF ENDOGENOUS SWITCHING MODELS WITH ENDOGENEITY
  2.1  Introduction
  2.2  Constant Coefficients Endogenous Switching Regression
    2.2.1  Model
    2.2.2  Estimating Equation
  2.3  Estimation Strategy
    2.3.1  Parametric Estimation
    2.3.2  Sieve Estimation
  2.4  Asymptotics
    2.4.1  Asymptotic Properties of the Parametric Estimators
    2.4.2  Asymptotic Properties of the Sieve Estimators
      2.4.2.1  Consistency
      2.4.2.2  Asymptotic Normality
      2.4.2.3  Consistent Variance Estimator
      2.4.2.4  Explicit Expressions in our Model
    2.4.3  Numerical Equivalence
  2.5  Empirical Illustration
  2.6  Technical Details
    2.6.1  Asymptotic Variance for the Parametric Estimators
    2.6.2  Hölder Class
  2.7  Application of the Estimation Strategy to Other Econometric Models
    2.7.1  Heterogeneous Coefficients Model
    2.7.2  Binary Endogenous Variable
    2.7.3  Sample Selection Model
  2.8  Concluding Remarks

CHAPTER 3  CONTROL FUNCTION ESTIMATION OF SPATIAL ERROR MODELS WITH ENDOGENEITY
  3.1  Introduction
  3.2  Model
    3.2.1  A Linear Regression Model
    3.2.2  Spatially Correlated Errors
  3.3  Estimating Equation
    3.3.1  Control Function Assumption
    3.3.2  Instruments
  3.4  Estimation Procedures
    3.4.1  Control Function Estimation
      3.4.1.1  First Step
      3.4.1.2  Second Step
    3.4.2  Incorporating Extra Instruments in other Estimation Procedures
      3.4.2.1  Grouped 2SLS Estimation
      3.4.2.2  Spatial Generalized Instrumental Variable Estimation
    3.4.3  Estimation of the Spatial Parameter
  3.5  Asymptotics
    3.5.1  Asymptotics for Feasible Spatial Control Function Estimator
      3.5.1.1  Adjusting for First-step Estimation
      3.5.1.2  A Consistent Estimator for Variance Robust to Cross-sectional Structure
    3.5.2  Asymptotics for 2SLS, Grouped 2SLS and Spatial GIV
  3.6  Monte Carlo Simulations
    3.6.1  Data Generating Process
    3.6.2  Model
    3.6.3  Spatial Correlations
    3.6.4  Results
    3.6.5  Performance of Spatial GIV
  3.7  Conclusion and Future Research

BIBLIOGRAPHY

LIST OF TABLES

Table 1.1:  Empirical Illustration Results
Table 1.2:  Specification Tests
Table 2.1:  Empirical Illustration Results
Table 2.2:  Treatment Effects
Table 3.1:  M1  y2i = 1 + 3·x2i + ρ·ui + N(0,1)
Table 3.2:  M2  y2i = 1 + 3·x2i + 2·x(2,i+1) + ρ·ui + N(0,1)
Table 3.3:  M3  y2i = 1 + 3·x2i + 3·x(2,i+1) + ρ·ui + N(0,1)
Table 3.4:  Performance of Spatial GIV
CHAPTER 1

SPECIFICATION TESTS IN UNBALANCED PANELS WITH ENDOGENEITY

1.1 Introduction

Panel data has become very popular in contemporary empirical work, especially in the social and behavioral sciences. Hsiao (1985, 1986), Klevmarken (1989), and Baltagi (2001) attribute this popularity to the ability of panel data to capture dynamics through its time dimension, providing more variability in the data. Panel data permits heterogeneity across observations, which allows researchers to model complicated behavioral patterns. However, finding or constructing a balanced panel dataset is extremely rare. In most cases, observations on each time period for all cross-sectional units are not available, and we have an unbalanced or incomplete panel dataset. This is particularly common when the cross-sectional unit is a firm, household, or person. For instance, an unbalanced panel might be a result of the survey design, as in the case of a rotating panel, where equally sized sets of sample units are brought in and out of the sample in some specified pattern. In other cases, incomplete panels might arise because cross-sectional units drop out, leading to the problem of attrition.

Work on specification and estimation of econometric models for unbalanced panels has primarily focused on testing for the presence of selection bias and on estimating models when sample selection is present. Nijman and Verbeek (1992) develop a simple test to check for sample selection bias in the random effects framework, and Wooldridge (2010) extends this to fixed effects. Wooldridge (1995) develops variable addition tests for selection bias, and these tests are further extended to models with endogenous variables in Semykina and Wooldridge (2010). Hsiao et al. (2008) propose a limited information test to check for selection issues. In addition to testing for sample selection, a number of studies have addressed the treatment of unbalanced panels in the presence of selection bias.
Numerous parametric and semi-parametric solutions to correct for sample selection bias have been proposed in the econometrics literature. Wooldridge (1995), Semykina and Wooldridge (2010), Kyriazidou (1997), and Rochina-Barrachina (1999) are just a few notable examples that consider estimation in unbalanced panels for both linear and non-linear models with exogenous and endogenous covariates.

In the absence of selection bias, we can extend the models and estimation methods for balanced panels to their unbalanced counterparts. Specification tests for drawing comparisons between different estimation methods can also be extended to unbalanced panels. Specification tests such as the Hausman test to compare fixed effects and random effects have been developed for unbalanced panels with exogenous sampling. However, as is the case with balanced panels, the traditional Hausman test maintains the assumption that the conditional variances of the composite error terms¹ have the random effects structure. This becomes one of the key limitations of the traditional Hausman test, as the failure of this assumption distorts the asymptotic distribution of the test statistic.

Wooldridge (2010) explains why we need a test statistic that is robust to the violation of the random effects structure of the composite errors. A comparison between the fixed effects and random effects estimators essentially boils down to testing the correlation between unobserved heterogeneity and covariates. This is captured in the conditional first moment assumptions of the model. The assumption that the composite errors have the random effects structure, on the other hand, is an assumption about the second moments. The traditional Hausman test is concerned with verifying the validity of the former while also maintaining the latter. Failure of the second moment assumptions has serious consequences for a test statistic that is primarily concerned with the first moment.
In fact, in this case it causes the test statistic to have a non-standard asymptotic distribution. Thus a non-robust Hausman test statistic, which tests conditional mean specifications, has no systematic power against the violation of the conditional variance specifications.

¹Unobserved heterogeneity and idiosyncratic errors are clubbed together to form the composite errors.

The limitations of the traditional Hausman specification test when used as a pretest of the random effects specification are also studied in Guggenberger (2010). In particular, it is shown both theoretically and through Monte Carlo simulations that the asymptotic size of the t-statistic based on either the random effects or fixed effects specification, chosen according to the outcome of the Hausman pretest, is severely distorted.

For balanced panels, this issue is addressed by the fully-robust regression-based Hausman test. The fundamental idea behind regression-based tests in panel data is given by the Correlated Random Effects (CRE) models due to Mundlak (1978). For unbalanced panels with exogenous explanatory variables, correlated random effects models are developed in Wooldridge (2016), which subsequently lead to simple fully-robust Hausman specification tests to compare the Fixed Effects and Random Effects estimators. Models with individual-specific slopes are also considered there, and the correlated random effects assumption is used to develop tests for correlation between the selection and the heterogeneous slopes.

This paper extends Wooldridge (2016) to unbalanced panels where some elements of the time-varying explanatory variables are allowed to be correlated with the unobserved idiosyncratic shocks. In particular, Correlated Random Effects models are developed for unbalanced panels with endogeneity, and simple specification tests are suggested to compare the Fixed Effects 2SLS (FE2SLS) and Random Effects 2SLS (RE2SLS) estimators.
Wooldridge (2016) obtains an algebraic equivalence result in which the Fixed Effects estimator is computed as a Pooled OLS estimator of the model augmented with the time averages of the covariates (averaged across the unbalanced panel) as additional explanatory variables. We obtain a similar result for the case when some of the covariates are allowed to be endogenous.

The regression-based Hausman test for correlation between the instruments and individual heterogeneity in unbalanced panel data models begins by modeling the unobserved heterogeneity in terms of the time averages of the instruments. While this seems to be a natural extension of the balanced panel models, Wooldridge (2016) explains how CRE models in unbalanced panels differ from their balanced counterparts. The unbalanced nature of the panel is reflected in the time averages of the instruments, as the number of time periods for which observations are available differs across cross-sectional units. Thus the time averages of the instruments are computed only over the periods in which the full set of variables is observed. In addition, unlike in the balanced case, the time averages of aggregate time variables are also included, because we average different time periods for different cross-sectional units. In other words, in unbalanced panels, the unobserved heterogeneity is modeled in terms of both the instruments and the selection.

This paper uses the Mundlak (1978) assumption to model the unobserved heterogeneity, and we obtain an algebraic result: the FE2SLS estimator of the coefficients on the covariates is obtained by applying Pooled 2SLS or RE2SLS to the augmented model. This algebraic result showing the equivalence of the estimators serves as the building block for the specification test that compares the FE2SLS and RE2SLS estimators. It provides a way to obtain a regression-based Hausman test that is fully robust to the second moment conditions of the composite errors.
We also consider a test for the endogeneity of some explanatory variables using the control function approach. In addition to being a bit unwieldy, the traditional Hausman test for endogenous explanatory variables suffers from shortcomings such as using the wrong degrees of freedom and often producing a negative χ² test statistic. Moreover, it is not robust to heteroskedasticity. To address these issues, we adopt the control function approach and obtain an equivalence result. More specifically, we show that adding the residuals from the reduced form equation to the original model yields the FE2SLS estimator. This result is then used to develop a simple regression-based fully robust Hausman test for endogeneity of the explanatory variables.

The paper is structured as follows. Section 1.2 introduces the general model for unbalanced panels and the assumptions maintained in this paper. Section 1.3 specifies the key estimation methods for unbalanced panels with endogeneity, namely FE2SLS and RE2SLS. Section 1.4 obtains algebraic equivalences between different estimators. In Section 1.5, we develop a simple fully-robust regression-based Hausman specification test to compare the Random Effects 2SLS and Fixed Effects 2SLS estimators. In Section 1.6, we consider the control function approach to detect possible endogeneity of explanatory variables in the unbalanced panel data model. In Section 1.7, we briefly describe an empirical strategy that could be followed as a protocol for approaching endogeneity issues in a linear model with unbalanced panels. We illustrate this strategy and our theoretical findings with an empirical application in Section 1.8; more specifically, we study the effects of spending on student performance in Michigan schools. Section 1.10 concludes the paper.
1.2 Model

We begin by assuming that a random sample is drawn from an underlying population that consists of a large number of units for whom data on T time periods are potentially observable. In our model, for an individual i at time period t, yit denotes the potentially observed outcome variable, xit is a 1 × K vector of potentially observed time-varying covariates, and wi denotes the set of time-invariant variables (which contains unity). In addition to the potentially observable variables, we also draw unobservables for each i: ci denotes the unobserved heterogeneity associated with each i, and the idiosyncratic errors are denoted by uit. The standard linear model with additive heterogeneity is given as:

Assumption 1.2.1

yit = xit β + wi δ + ci + uit   (1.1)

We believe that some elements of xit are correlated with uit, or even with uir for r ≠ t. To deal with this endogeneity issue we have a set of 1 × L possible instrumental variables zit with L ≥ K. This set of instruments zit includes not only the excluded exogenous variables but also all the elements of xit that are exogenous. To allow for an unbalanced panel, we introduce a binary selection indicator sit, which is defined as

Assumption 1.2.2

sit = 1 if and only if (xit, yit, zit) is fully observed, and sit = 0 otherwise.   (1.2)

In other words, si ≡ {si1, si2, ..., siT} is the series of selection indicators for each i. This implies that sit = 1 if time period t for unit i can be used in estimation. The number of time periods for which unit i is observed is denoted by Ti, which is simply equal to ∑_{r=1}^{T} sir. Our panel data can be concisely represented as a vector of a randomly drawn sample across the cross-section dimension i, with fixed time periods T: {(sit, xit, yit, zit); ci}.
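As a concrete illustration of this bookkeeping (a hypothetical fragment, not from the dissertation), the selection indicator sit and the counts Ti can be built directly from the pattern of missingness in the observed variables:

```python
import numpy as np

# Toy unbalanced panel: N = 3 units, T = 4 periods; a NaN marks an
# unobserved (i, t) cell (only y is checked here, for brevity).
rng = np.random.default_rng(0)
y = rng.normal(size=(3, 4))
y[0, 3] = np.nan                     # unit 0 drops out in the last period (attrition)
y[2, 0] = np.nan                     # unit 2 enters late (rotation)

s = (~np.isnan(y)).astype(int)       # s_it = 1 iff the cell is fully observed
T_i = s.sum(axis=1)                  # T_i = sum_r s_ir

print(T_i)                           # -> [3 4 3]
```

In a real application the indicator would require the full set (xit, yit, zit) to be observed, not just yit.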
Since we allow for endogeneity of some of the explanatory variables xi ≡ (xi1, ..., xiT), our assumptions in this paper primarily structure the relationship between the instruments zi ≡ (zi1, ..., ziT), the selection indicators si, the unobserved individual heterogeneity ci, and the idiosyncratic errors ui = (ui1, ..., uiT).

1.3 Estimation Methods

We consider two main estimation methods to estimate our key parameter of interest β: Fixed Effects 2SLS and Random Effects 2SLS.

1.3.1 Fixed Effects 2SLS (FE2SLS)

First, we consider the case where the unobserved heterogeneity ci is allowed to be correlated with the history of instruments zi. As with the balanced panel analysis, the FE2SLS approach for unbalanced panel data transforms (1.1) to eliminate the unobserved effect ci. In unbalanced panels, the fixed effects transformation, also called the within transformation, is obtained by first multiplying equation (1.1) by sit:

sit yit = sit xit β + sit wi δ + sit ci + sit uit   (1.3)

Averaging this equation across t for each i gives us the time-averaged equation:

ȳi = x̄i β + wi δ + ci + ūi   (1.4)

The time averages are given by ȳi = Ti⁻¹ ∑_{r=1}^{T} sir yir and x̄i = Ti⁻¹ ∑_{r=1}^{T} sir xir. Note that the time averages for yit and xit (and also zit) are computed only for the periods when data exist on the full set of variables. Equation (1.4) is of interest in its own right because its Pooled 2SLS estimation using z̄i as instruments gives us the Between 2SLS estimator of β: β̂_B2SLS.

Time demeaning (1.3) using (1.4), we get:

sit (yit − ȳi) = sit (xit − x̄i) β + sit (uit − ūi)   (1.5)

We denote the time-demeaned variables as ẍit = (xit − x̄i) and ÿit = (yit − ȳi).
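The within transformation just described — time averages taken only over observed periods, demeaning, and then an IV step on the stacked selected cells — can be carried through in a few lines of numpy. This is a minimal sketch on simulated data (not the dissertation's code); all names and the data-generating process are hypothetical.

```python
import numpy as np

def fe2sls(y, X, Z, s):
    """FE2SLS on an unbalanced panel.
    y: (N, T); X: (N, T, K); Z: (N, T, L); s: (N, T) 0/1 selection indicator."""
    Ti = s.sum(axis=1, keepdims=True)                       # periods observed per unit
    ybar = (s * y).sum(axis=1, keepdims=True) / Ti          # averages over observed t only
    Xbar = (s[..., None] * X).sum(axis=1, keepdims=True) / Ti[..., None]
    Zbar = (s[..., None] * Z).sum(axis=1, keepdims=True) / Ti[..., None]
    keep = s.astype(bool).ravel()                           # stack only the selected cells
    yd = (y - ybar).ravel()[keep]                           # within-transformed data
    Xd = (X - Xbar).reshape(-1, X.shape[2])[keep]
    Zd = (Z - Zbar).reshape(-1, Z.shape[2])[keep]
    A = Xd.T @ Zd @ np.linalg.inv(Zd.T @ Zd)                # (sum x-dd'z-dd)(sum z-dd'z-dd)^{-1}
    return np.linalg.solve(A @ Zd.T @ Xd, A @ Zd.T @ yd)

# Simulated unbalanced panel: x_it is endogenous (it loads on u_it), z_it is a
# valid instrument, and roughly 20% of cells are missing at random.
rng = np.random.default_rng(7)
N, T = 600, 5
z = rng.normal(size=(N, T, 1))
c = rng.normal(size=(N, 1))                                 # unobserved heterogeneity
u = rng.normal(size=(N, T))                                 # idiosyncratic error
x = z[..., 0] + c + 0.7 * u + 0.5 * rng.normal(size=(N, T))
y = 2.0 * x + c + u                                         # true beta = 2
s = (rng.random((N, T)) < 0.8).astype(int)
s[:, 0] = 1                                                 # every unit observed at least once
b = fe2sls(y, x[..., None], z, s)                           # b[0] close to 2
```

Units with Ti = 1 contribute only zeros after demeaning, which mirrors the remark below that such observations drop out of the fixed effects analysis.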
The FE2SLS estimator is obtained by estimating (1.5) by Pooled 2SLS using z̈it ≡ (zit − z̄i) as instruments:

β̂_FE2SLS = [ (∑_{i=1}^{N} ∑_{t=1}^{T} sit ẍ′it z̈it) (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it z̈it)⁻¹ (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it ẍit) ]⁻¹
            × (∑_{i=1}^{N} ∑_{t=1}^{T} sit ẍ′it z̈it) (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it z̈it)⁻¹ (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it ÿit)

The key assumptions sufficient for consistency of the FE2SLS estimator on the unbalanced panel can be stated as:

Assumption 1.3.1 For all i = 1, ..., N
• FE2SLS.1  E[uit | zi, ci, si, wi] = 0
• FE2SLS.2  ∑_{t=1}^{T} E[sit z̈′it ẍit] is full rank and ∑_{t=1}^{T} E[sit z̈′it z̈it] is full rank

Assumption FE2SLS.1 implies two things. First is the strict exogeneity of selection and instruments with respect to the idiosyncratic errors. Second, we allow for correlation between ci and the vector of instruments zi. We also allow selection sit at time t to be correlated with (zi, ci). Assumption FE2SLS.2 is the appropriate rank condition that ensures invertibility of the relevant matrices in an unbalanced panel. It naturally implies that any time-invariant variables drop out of the fixed effects analysis. Under the assumptions stated above, FE2SLS on the unbalanced panel is consistent and asymptotically normal.

1.3.2 Random Effects

Fixed effects estimation methods suffer from a few limitations that arise from the demeaning of the variables. As illustrated in (1.5), time-invariant observables are eliminated in the process of eliminating ci. In addition, because of the time-demeaning, any observation with Ti = 1 also drops out, and much of the variation in the data is removed in the demeaning process. Random Effects estimation provides a remedy to these issues by imposing additional assumptions.

Assumption 1.3.2 For all i = 1, ..., N
• RE2SLS.1
E[ci | wi, zi, si] = E[ci] = 0

E[ci] = 0 can be assumed whenever we include an intercept in our model. The Random Effects transformation becomes straightforward if we add the assumption that:

Assumption 1.3.3 For all i = 1, ..., N
• RE2SLS.2  E[ui u′i | wi, zi, ci, si] = σ²u IT and E[c²i | wi, zi, si] = σ²c

Analogous to the case of balanced panel data, we get a straightforward Generalized Least Squares (GLS) transformation when we define

θi = 1 − [ σ²u / (σ²u + Ti σ²c) ]^{1/2}   (1.6)

where θi is viewed as a random variable that is a function of Ti. In addition, Ti is exogenous since E[uit | wi, zi, ci, si] = 0 and E[ci | wi, zi, si] = E[ci] = 0, which together imply that E[ci + uit | wi, zi, Ti] = 0.

A Pooled 2SLS regression on the selected sample of (yit − θi ȳi) on (xit − θi x̄i) and (1 − θi) wi, using (zit − θi z̄i) as instruments, gives us the RE2SLS estimator β̂_RE2SLS. Not surprisingly, just as in the balanced panel case, the consistency of our estimator is not affected if we use an incorrect variance-covariance structure; we would simply compute the fully robust variance-covariance matrix for β̂_RE2SLS. However, this is true only under the assumption of strict exogeneity of instruments and selection with respect to ci + uit. As Wooldridge (2010) notes, this is a very restrictive assumption.

1.4 An Algebraic Equivalence Result

Wooldridge (2016) obtains a general equivalence result for the Random Effects and Pooled OLS estimators in unbalanced panels. We extend the equivalence result to RE2SLS and Pooled 2SLS in an unbalanced panel. As we will see below, the equivalence result requires no assumption except that the appropriate matrices are invertible in the sample. Consider the following regression model:

(yit − θi ȳi) = (xit − θi x̄i) β + (1 − θi) z̄i ξ + (1 − θi) wi δ + uit   (1.7)

where we follow the notation of Section 1.2.
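Both the RE2SLS transformation and the quasi-demeaned regression (1.7) depend on the weight θi from (1.6), which varies with Ti across units of an unbalanced panel and moves from 0 toward 1 as Ti grows (so the GLS transformation moves from pooled estimation toward the within transformation). A small numeric sketch with made-up variance components:

```python
import numpy as np

sigma2_u, sigma2_c = 1.0, 0.5            # hypothetical variance components
T_i = np.array([2, 4, 8, 16])            # observed periods differ across units

# theta_i = 1 - sqrt(sigma2_u / (sigma2_u + T_i * sigma2_c)), as in (1.6)
theta_i = 1.0 - np.sqrt(sigma2_u / (sigma2_u + T_i * sigma2_c))
print(np.round(theta_i, 3))              # -> [0.293 0.423 0.553 0.667]
```

Units observed for more periods are quasi-demeaned more aggressively, which is exactly the Ti-dependence the text emphasizes.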
The Pooled 2SLS estimator of β on the selected sample (i.e., for sit = 1), using (zit − θi z̄i) as instruments for (xit − θi x̄i), is given as

β̂_P2SLS = [ (∑_{i=1}^{N} ∑_{t=1}^{T} sit (xit − θi x̄i)′ z̈it) (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it z̈it)⁻¹ (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it (xit − θi x̄i)) ]⁻¹
            × (∑_{i=1}^{N} ∑_{t=1}^{T} sit (xit − θi x̄i)′ z̈it) (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it z̈it)⁻¹ (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it (yit − θi ȳi))   (1.8)

where, as before, we define z̈it = (zit − z̄i). This expression is obtained by following the simple idea of the Frisch-Waugh-Lovell theorem and extending it to instrumental variables estimation. (See Section 1.9.1.)

Theorem 1 If (∑_{i=1}^{N} ∑_{t=1}^{T} sit (xit − θi x̄i)′ z̈it), (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it z̈it), and [ (∑_{i=1}^{N} ∑_{t=1}^{T} sit (xit − θi x̄i)′ z̈it) (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it z̈it)⁻¹ (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it (xit − θi x̄i)) ] are nonsingular matrices, then the Pooled 2SLS estimator obtained above is algebraically equivalent to the Fixed Effects 2SLS estimator on the unbalanced panel:

β̂_P2SLS = β̂_FE2SLS

Proof. For θi = 0 for all i,

β̂_P2SLS = [ (∑_{i=1}^{N} ∑_{t=1}^{T} sit x′it z̈it) (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it z̈it)⁻¹ (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it xit) ]⁻¹
            × (∑_{i=1}^{N} ∑_{t=1}^{T} sit x′it z̈it) (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it z̈it)⁻¹ (∑_{i=1}^{N} ∑_{t=1}^{T} sit z̈′it yit)

Noting that ∑_{i=1}^{N} ∑_{t=1}^{T} sit x′it z̈it simplifies to ∑_{i=1}^{N} ∑_{t=1}^{T} sit ẍ′it z̈it (and likewise for the remaining cross-moments), we get the expression for β̂_FE2SLS.

For 0 ≤ θi < 1, first consider the cross-moment ∑_{i=1}^{N} ∑_{t=1}^{T} sit (xit − θi x̄i)′ z̈it.
This term simplifies as follows:

    Σ_i Σ_t s_it (x_it − θ_i x̄_i)' z̈_it
      = Σ_i Σ_t s_it x_it' z̈_it − Σ_i θ_i x̄_i' (Σ_t s_it z̈_it)
      = Σ_i Σ_t s_it x_it' z̈_it
      = Σ_i Σ_t s_it ẍ_it' z̈_it,

where the second equality uses Σ_t s_it z̈_it = 0: the within-unit sum of the demeaned instruments over the observed periods is zero. Similarly, we get

    Σ_i Σ_t s_it z̈_it' (y_it − θ_i ȳ_i) = Σ_i Σ_t s_it z̈_it' y_it = Σ_i Σ_t s_it z̈_it' ÿ_it.

Substituting these expressions, our 2SLS coefficient becomes

    β̂_P2SLS = [(Σ_i Σ_t s_it ẍ_it' z̈_it)(Σ_i Σ_t s_it z̈_it' z̈_it)^{-1}(Σ_i Σ_t s_it z̈_it' ẍ_it)]^{-1}
              × [(Σ_i Σ_t s_it ẍ_it' z̈_it)(Σ_i Σ_t s_it z̈_it' z̈_it)^{-1}(Σ_i Σ_t s_it z̈_it' ÿ_it)]
            = β̂_FE2SLS.

1.5 Regression-based Fully-robust Hausman Test to Compare RE2SLS and FE2SLS

The specification test in this section is concerned with comparing the FE2SLS and RE2SLS estimators. In other words, we want to test whether the heterogeneity is mean independent of the instruments and selection in all time periods. As we mentioned before, the traditional Hausman test statistic for comparing RE2SLS and FE2SLS suffers from several limitations. An elegant way to deal with this is provided by the regression-based fully robust Hausman test. We apply the principle of the Wu–Hausman endogeneity test, additionally using the Mundlak (1978) device to modify our regression model.
A comparison of the FE2SLS and RE2SLS estimators in unbalanced panels is essentially a test of correlation between the unobserved heterogeneity and the instruments. In other words, the Hausman test can be interpreted as a test of

    E[c_i | z_i, s_i] = E[c_i].    (1.9)

In particular, our model is

    y_it = x_it β + w_i δ + c_i + u_it    (1.10)

for i = 1, 2, ..., N and t = 1, 2, ..., T. Recall that w_i contains unity. We specify a correlated random effects structure for c_i due to Mundlak (1978):

    c_i = ξ_0 + z̄_i ξ + a_i,    (1.11)

where Cov[a_i, z_i] = 0 and E[a_i] = 0. Substituting for c_i in our regression model, we get the usual Mundlak equation:

    y_it = x_it β + w_i δ + z̄_i ξ + a_i + u_it    (1.12)

where ξ_0 is absorbed into the intercept in w_i. The regression-based fully-robust Hausman test is now simply an application of Theorem 1. If we estimate the above equation by RE2SLS or Pooled 2SLS using (z_it, z̄_i) as instruments, then we know from Theorem 1 that the resulting estimator of β is the FE2SLS estimator. The regression-based Hausman test is simply a Wald test of H_0: ξ = 0. To obtain a fully robust test, we estimate (1.12) by Pooled 2SLS and use cluster-robust inference.

Testing for Correlation between Selection and Idiosyncratic Errors

The test developed above helps us decide whether to rule out estimation of the parameters by RE2SLS. However, FE2SLS estimation methods are consistent only if selection is strictly exogenous with respect to the idiosyncratic errors conditional on the unobserved effect. Thus, ruling out a relationship between selection (s_it) and the idiosyncratic errors (u_it), conditional on c_i, is imperative before we proceed. Further, in our context it would be erroneous to use the Heckman (1976) test (extended to the panel data context in Wooldridge (1995)) because it requires the exogenous components of x_it to be observable in all time periods; we do not impose such a restriction in our model.
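In an unbalanced panel, the Mundlak device uses time averages taken over each unit's own observed periods. A hypothetical sketch of that bookkeeping step (the helper name `mundlak_averages` is ours, not from the text):

```python
import numpy as np

def mundlak_averages(ids, Z):
    """Unit-specific time averages zbar_i, computed over each unit's
    observed periods only (the T_i differ across i in an unbalanced
    panel). The result repeats zbar_i on every row of unit i, ready to
    be appended to the regressors of the Mundlak equation (1.12)."""
    ids = np.asarray(ids)
    Z = np.asarray(Z, dtype=float)
    Zbar = np.empty_like(Z)
    for g in np.unique(ids):
        m = ids == g
        Zbar[m] = Z[m].mean(axis=0)
    return Zbar
```

The test of H_0: ξ = 0 is then a cluster-robust Wald test on the coefficients of these appended averages in the Pooled 2SLS regression with (z_it, z̄_i) as instruments.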
Nijman and Verbeek (1992) suggest a simple test for selection bias in the random effects context that also works for fixed effects estimation. We simply add the lagged value of the selection indicator, s_{i,t−1}, to the original model and check its significance. We can also add T_i and check its significance using a t-test. This test is extended to the fixed effects framework in Wooldridge (2010): we add s_{i,t−1} to our model and estimate the model by fixed effects using s_it = 1. A simple robust t-test on the coefficient of s_{i,t−1} tests the null hypothesis that selection in the previous period is not significant. Another alternative, useful for attrition problems, is to add the lead value of the selection indicator, s_{i,t+1}.

Once we rule out selection bias in our data, we can rely on Fixed Effects estimation methods to estimate the coefficients of our model. We would also like to check for the endogeneity of our explanatory variables; the next section develops simple regression-based fully-robust tests for endogeneity.

1.6 Robust Hausman Test to Compare FE vs FE2SLS

The previous sections focused on developing tests for whether the instruments are correlated with the individual heterogeneity. A natural addition to those tests is to check whether or not the explanatory variables are correlated with the idiosyncratic shocks. The traditional Hausman test for endogeneity of explanatory variables in panel data suffers from several shortcomings (Baum (2006)). It often generates a negative χ² test statistic, which makes the test infeasible. In addition, the degrees of freedom are often wrongly calculated, and this leads to degeneracies. Lastly, the traditional Hausman test statistic is not robust to heteroskedasticity, and robust versions of the test are not readily available in most software.² In this section, we use the control function approach to check the endogeneity of the explanatory variables.
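The Nijman–Verbeek variable-addition idea only requires constructing s_{i,t−1} (or the lead s_{i,t+1}) and adding it to the estimating equation. A small sketch, with hypothetical helper names of our own:

```python
import numpy as np

def lagged_selection(ids, times, s):
    """s_{i,t-1}: selection status of unit i in the previous calendar
    period; set to 0 when period t-1 falls outside the sample window."""
    ids, times, s = (np.asarray(a) for a in (ids, times, s))
    lag = np.zeros(len(s), dtype=float)
    for k in range(len(s)):
        m = (ids == ids[k]) & (times == times[k] - 1)
        if m.any():
            lag[k] = s[m][0]
    return lag
```

One then re-estimates the model by fixed effects on the s_it = 1 sample with this variable added and reads off a robust t-statistic; the lead version for attrition is built the same way with `times[k] + 1`.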
The control function approach deals with endogenous explanatory variables by formalizing the correlation between the endogenous explanatory variables and the unobservable idiosyncratic shocks. In cross-sectional analysis with linear endogenous explanatory variables, it has been shown that the control function approach leads to the same estimator as 2SLS. We will show in this section that this algebraic result also holds in unbalanced panels: in the panel data context, the control function approach yields the FE2SLS estimators. Through this result, we are able to obtain a regression-based fully robust specification test to compare fixed effects estimators with fixed effects 2SLS estimators.

1.6.1 Model

The model is defined in terms of the following assumptions:

Assumption 1.6.1 For all i = 1, ..., N, a linear equation holds for the outcome variable:

    y_it1 = x_it β + y_it2 α + c_i1 + u_it    (1.13)

Here y_it1 denotes the potentially observed outcome variable. In this section we slightly modify the notation and explicitly denote the 1 × K_y2 vector of endogenous variables by y_it2. In other words, while x_it denotes the 1 × K_x vector of potentially observed explanatory variables that are not correlated with the idiosyncratic errors u_it, some elements of y_it2 are allowed to be correlated with u_it. We introduce the 1 × K_w = 1 × (K_x + K_z) vector w_it as the full set of instruments, which also contains the exogenous variables, i.e., w_it = {x_it, z_it}, where the 1 × K_z vector z_it serves as instruments for y_it2. The reduced form equation for y_it2 is given by:

²We could always set up the whole model as a GMM problem.
Assumption 1.6.2 The reduced form of the endogenous variables is a set of linear equations:

    y_it2 = w_it γ + c_i2 + v_it    (1.14)

We introduce the selection indicator as:

Assumption 1.6.3

    s_it = 1 if and only if (w_it, y_it) is fully observed, and s_it = 0 otherwise.    (1.15)

As before, the time averages are computed only for the periods in which data exist on the full set of variables. To obtain the regression-based fully-robust Hausman test for the endogeneity of y_it2, we first estimate the reduced form (1.14) by Fixed Effects using the selected sample, i.e., for s_it = 1. We obtain the residuals from this regression and denote them by v̈̂_it. Next, we augment equation (1.13) with v̈̂_it and obtain:

    y_it1 = x_it β + y_it2 α + v̈̂_it ρ + error_it    (1.16)

Equation (1.16) is called the control function equation and serves as the primary equation for obtaining the Hausman test. The error_it term comprises both the individual heterogeneity and the idiosyncratic error. Our key algebraic result is stated in Theorem 2:

Theorem 2 Estimate the augmented equation (1.16) by Fixed Effects using the selected sample, and let β̃_FE(aug.) and α̃_FE(aug.) denote the Fixed Effects estimators of the augmented equation. Then

    β̃_FE(aug.) = β̂_FE2SLS and α̃_FE(aug.) = α̂_FE2SLS.

The proof of the result is given in Section 1.9.2. This result essentially gives us an elegant way to obtain the regression-based test for the possible endogeneity of the explanatory variables. Following the foundations of Section 1.5, the regression-based fully-robust Hausman test is given by a simple Wald test of H_0: ρ = 0 using robust standard errors.

1.7 A Strategy for an Applied Econometrician

In this paper, we have suggested two kinds of specification tests: one tests the endogeneity of the explanatory variables with respect to the time-varying idiosyncratic shocks (Section 1.6), and the other tests the endogeneity of the instruments with respect to the time-constant unobserved individual effects (Section 1.5).
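The two-step procedure behind Theorem 2 is easy to code. A minimal sketch under simplifying assumptions (within-transformation done explicitly, helper names our own):

```python
import numpy as np

def within(ids, A):
    """Time-demean each column of A within unit, over observed rows only."""
    ids = np.asarray(ids)
    A = np.asarray(A, dtype=float).copy()
    for g in np.unique(ids):
        m = ids == g
        A[m] -= A[m].mean(axis=0)
    return A

def cf_test_regression(ids, y1, X, y2, W):
    """Step 1: FE of the reduced form y2 on w; keep residuals vhat.
    Step 2: FE of y1 on (x, y2, vhat). By Theorem 2 the coefficients on
    (x, y2) equal FE2SLS; a robust t/Wald test on vhat's coefficient is
    the endogeneity test of H0: rho = 0."""
    Wd, y2d = within(ids, W), within(ids, y2)
    gamma = np.linalg.lstsq(Wd, y2d, rcond=None)[0]
    vhat = y2d - Wd @ gamma
    R = np.column_stack([within(ids, X), y2d, vhat])
    coefs = np.linalg.lstsq(R, within(ids, y1), rcond=None)[0]
    return coefs, vhat
```

By construction the step-1 residuals are orthogonal to the within-transformed instruments, which is exactly the property the proof of Theorem 2 exploits.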
An empirical econometrician would begin by testing the endogeneity of the explanatory variables in her model. This means comparing the FE estimator with the FE2SLS estimator using the regression-based fully-robust Hausman test, which takes the form of a Variable Addition Test (VAT), given in Section 1.6. A failure to reject the null implies that we can consider our explanatory variables to be exogenous with respect to the time-varying unobserved idiosyncratic shocks and go ahead with the usual Random Effects and Fixed Effects analysis; we could further compare the RE and FE estimators using a Hausman test. A rejection of the null implies that we need to account for the endogeneity of our explanatory variables. This entails using instruments, and we have two key estimation methods: RE2SLS and FE2SLS. Now, it is possible that our instruments are exogenous not only with respect to the idiosyncratic errors but also with respect to the unobserved time-invariant individual heterogeneity. If the FE2SLS estimates seem imprecise, we can use the VAT version of the Hausman test described in Section 1.5 to compare the RE2SLS and FE2SLS estimators. This helps us determine which estimation method fits our model best.

1.8 Empirical Illustration

To illustrate the methods above, we consider the problem of estimating the effects of spending on student performance. We use data on standardized test scores from 1992 to 1998 at Michigan schools to determine the effects of spending on math test outcomes for fourth graders. Papke (2005) studies this using school-level data and a linear functional form. Papke and Wooldridge (2008) extend the analysis by recognizing the fractional nature of the pass rates; specifically, they use fractional response models for district-level panel data. Both find non-trivial effects of spending on test pass rates.
Since we deal with linear models in this paper, our analysis is closer to Papke (2005).

1.8.1 Background

Funding for K-12 schools in Michigan changed dramatically in 1994, from a local, property-tax based system to a statewide system supported primarily through a higher sales tax. The primary goal of this policy change was to equalize spending, and this was reflected in the rise of per-pupil spending. Papke (2005) studies the effect of this policy change on student performance. The data come from the annual Michigan School Reports (MSRs). The outcome variable of the study is the percentage of students passing the Michigan Educational Assessment Program (MEAP) math test for 4th graders: math4. The key explanatory variable is the log of average per-pupil expenditure, log(avgrexpp), which serves as a measure of per-pupil spending. The data used in Papke (2005) form an unbalanced panel. In this empirical illustration, we revisit this problem taking note of the incomplete nature of the panel. In addition, we use Stata 14 and have fully robust standard errors, as 'clustering' is available for all the regressions.³

1.8.2 Results

Since our purposes are primarily illustrative, we focus on the simple specification:

    math4_it = θ_t + β_1 log(avgrexpp_it) + β_2 lunch_it + β_3 log(enroll_it) + c_i1 + u_it    (1.17)

where i indexes school and t indexes year. In this specification, θ_t is captured by adding time dummies. The covariate vector is given as x_it = [log(avgrexpp_it), lunch_it, log(enroll_it)]. Papke (2005) argues that log(avgrexpp_it) could be endogenous, as spending could be correlated with the idiosyncratic shocks u_it. She uses the district foundation grant log(found_it) as an instrument.

³Previous versions of Stata do not allow computing fully robust standard errors, for example in RE2SLS. However, one could bootstrap to obtain proper standard errors.
Thus, we have the vector of instruments z_it = [log(found_it), lunch_it, log(enroll_it)], where [lunch_it, log(enroll_it)] serve as their own instruments.

A simple Nijman–Verbeek (1992) test verifies that selection is not correlated with the idiosyncratic shocks: specifically, we add the lagged value of the selection indicator to our model and find it insignificant. This allows us to apply our tests to this empirical problem.

We begin by conducting a test to check the endogeneity of the explanatory variables. As mentioned before, Papke (2005) argues that the primary variable of interest, log(avgrexpp), is endogenous in the sense that it is correlated with the time-varying idiosyncratic errors. She verifies this claim using a fully robust Hausman test that compares the Pooled OLS and Pooled 2SLS estimators. In this paper, we further verify the endogeneity of log(avgrexpp) using the control function approach described in Section 1.6. More specifically, we begin by estimating the reduced form equation

    log(avgrexpp_it) = φ_t + π_1 lunch_it + π_2 log(enroll_it) + π_3 log(found_it) + c_i2 + ν_it2    (1.18)

by fixed effects. The residuals from this regression are denoted ν̂_it2. The control function equation is then equation (1.17) augmented with ν̂_it2:

    math4_it = θ_t + β_1 log(avgrexpp_it) + β_2 lunch_it + β_3 log(enroll_it) + ρ ν̂_it2 + c_i1 + error_it    (1.19)

We then estimate this equation by fixed effects. The results are given in Column (1) of Table 1.1. We see that the equivalence result holds and the estimates equal the FE2SLS estimates given in Column (6). To check the endogeneity of log(avgrexpp), we check the significance of the estimated coefficient on ν̂_it2. It is significantly different from zero at the 10 percent level, so we can conclude that average spending per pupil is endogenous. Treating this as a rejection, we can go ahead with a RE2SLS and FE2SLS analysis.
However, since the rejection is only at the 10 percent level, we also examine FE and RE. This is done in Columns (2) and (3) of Table 1.1. We compare the FE and RE estimates through the regression-based Hausman test for unbalanced panels given in Wooldridge (2016): the regression model augmented with the Mundlak device is estimated by Random Effects with robust standard errors. The results are given in Column (4). To test for correlation of the explanatory variables with the individual heterogeneity, we check the joint significance of the coefficients on the time averages of lunch, log(enroll), log(avgrexpp), and the year dummies y96, y97, y98. The χ² statistic and p-value of the test are given in Table 1.2. We can clearly infer that the null of no correlation between the explanatory variables and the unobserved time-invariant individual heterogeneity is rejected, validating the FE estimation method.

Next, we conduct a RE2SLS and FE2SLS analysis. The results are given in Columns (5) and (6) of Table 1.1. Both give a statistically significant estimate of the coefficient on log(avgrexpp). The results verify that the effects of spending on student performance are non-trivial, consistent with the results obtained in Papke (2005) and Papke and Wooldridge (2008). We find that the RE2SLS estimates are quite different from the FE2SLS estimates, and this motivates us to test for correlation of the instrument with the individual heterogeneity. We use the Mundlak (1978) device and apply the regression-based fully-robust Hausman test developed in Section 1.5. This also allows us to verify our equivalence results. Recall that we model the individual heterogeneity as E[c_i1 | z_i] = ξ_0 + z̄_i ξ and add it to our model. Our estimating equation becomes:

    math4_it = θ_t + β_1 log(avgrexpp)_it + β_2 lunch_it + β_3 log(enroll)_it
               + ξ_1 lunch̄_i + ξ_2 log(enroll)̄_i + ξ_3 log(found)̄_i + η_it    (1.20)

Aggregate time dummies should also be added to this specification; in other words, z̄_i includes the averages of the year dummies as well.
This is an important respect in which our analysis differs due to the unbalanced nature of the panel. Since different individuals have different T_i, we are averaging over different time periods for different i, so the time averages of the aggregate time variables change across i. We estimate equation (1.20) by RE2SLS; the results are given in Column (7) of Table 1.1. The estimates verify our equivalence result for unbalanced panels with endogeneity.

To check for correlation of our instrument log(found) with the unobserved heterogeneity c_i, we check the joint significance of the coefficients on z̄_i. This translates into checking the joint significance of the coefficients on the time averages of lunch, log(enroll), log(found), and the year dummies y96, y97, y98. The χ² statistic and p-value of the test are given in Table 1.2. We find that the variables are jointly significant, which indicates a non-zero correlation between the instruments and the time-invariant unobserved individual-specific heterogeneity. This validates FE2SLS as an appropriate estimation procedure.

1.9 Technical Details

1.9.1 Derivation of β̂_P2SLS

To obtain the expression for β̂_P2SLS in Section 1.4, we use the Frisch–Waugh–Lovell theorem for instrumental variables. Consider

    Y_i = X_1i β_1 + X_2i β_2 + ε_i

where X_2i is exogenous with respect to ε_i and X_1i is endogenous with respect to ε_i. To deal with this endogeneity problem, we have instruments Z_i. The Frisch–Waugh–Lovell theorem then states that the 2SLS estimator of β_1 can be obtained as follows:

• First, regress Z_i on X_2i and obtain the residuals R_i.
• Next, regress Y_i on X_1i using R_i as instruments.
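This two-step recipe can be checked numerically. In the just-identified case (as many instruments in Z_i as endogenous columns in X_1i), the coefficient from the second step reproduces the full-system 2SLS coefficient on X_1i exactly. A minimal simulation sketch, where the data-generating process and all names are our own illustrative choices:

```python
import numpy as np

def two_sls(y, X, Zfull):
    """Just-identified 2SLS: beta = (Z'X)^{-1} Z'y."""
    return np.linalg.solve(Zfull.T @ X, Zfull.T @ y)

def fwl_iv_beta1(y, X1, X2, Z):
    """FWL for IV: residualize the instruments on the exogenous block,
    then instrument X1 with those residuals."""
    F = np.linalg.lstsq(X2, Z, rcond=None)[0]
    R = Z - X2 @ F
    return np.linalg.solve(R.T @ X1, R.T @ y)

# Simulated just-identified design (assumed, for illustration only).
rng = np.random.default_rng(1)
n = 200
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])  # exogenous block
Z = rng.normal(size=(n, 1))                             # instrument
X1 = 0.7 * Z + 0.3 * rng.normal(size=(n, 1))            # endogenous block
y = X1[:, 0] * 2.0 + X2 @ np.array([1.0, -1.0]) + rng.normal(size=n)
b_full = two_sls(y, np.column_stack([X1, X2]), np.column_stack([Z, X2]))
b1_fwl = fwl_iv_beta1(y, X1, X2, Z)
```

Here `b_full[0]` and `b1_fwl[0]` agree up to floating-point error, which is the algebraic content of the theorem; with over-identification the sandwich form of (1.8) is needed instead.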
We follow a similar procedure in the following steps.

Step 1: First we run the Pooled OLS regression

    (z_it − θ_i z̄_i) = (1 − θ_i) z̄_i α_1 + (1 − θ_i) w̄_i α_2 + error

using s_it = 1.

Theorem 3 Consider the regression

    (z_it − θ_i z̄_i) = (1 − θ_i) z̄_i α_1 + (1 − θ_i) w̄_i α_2 + error.

Pooled OLS estimation using s_it = 1 yields α̂_1 = I_L (the identity matrix) and α̂_2 = 0 (the zero matrix).

Table 1.1: Empirical Illustration Results. Dependent variable: math4; robust standard errors in parentheses; *** p<0.01, ** p<0.05, * p<0.1. Columns: (1) CF, (2) FE, (3) RE, (4) CRE, (5) RE2SLS, (6) FE2SLS, (7) CRE2SLS. Each column reports coefficients on log(avgexp), lunch, log(enrol), the year dummies y96–y98, the control-function residual v̂ (column (1)), the Mundlak time averages including log(found) (columns (4) and (7)), and a constant. Observations: 5,913; number of schools: 1,643 in every column.

Table 1.2: Specification Tests

    Test                χ²        p-value   Degrees of freedom
    FE vs RE            109.53    0.00      6
    FE2SLS vs RE2SLS    105.67    0.00      6

Proof. First run a pooled regression of (z_it − θ_i z̄_i) on (1 − θ_i) z̄_i for s_it = 1.
The coefficient will be

    [Σ_i Σ_t s_it (1 − θ_i)² z̄_i' z̄_i]^{-1} [Σ_i Σ_t s_it (1 − θ_i) z̄_i' (z_it − θ_i z̄_i)]
      = [Σ_i T_i (1 − θ_i)² z̄_i' z̄_i]^{-1} [Σ_i (1 − θ_i) z̄_i' (T_i z̄_i − T_i θ_i z̄_i)]
      = [Σ_i T_i (1 − θ_i)² z̄_i' z̄_i]^{-1} [Σ_i T_i (1 − θ_i)² z̄_i' z̄_i] = I.

The residuals from this regression equal

    (z_it − θ_i z̄_i) − (z̄_i − θ_i z̄_i) = (z_it − z̄_i) ≡ z̈_it.

Next, we run a POLS regression of (1 − θ_i) w̄_i on (1 − θ_i) z̄_i for s_it = 1. Clearly, both the coefficient and the residuals from this regression depend only on i. The coefficient is

    Π̂ = [Σ_i T_i (1 − θ_i)² z̄_i' z̄_i]^{-1} [Σ_i T_i (1 − θ_i)² z̄_i' w̄_i],

and the residuals are

    (1 − θ_i) w̄_i − (1 − θ_i) z̄_i Π̂ ≡ ẽ_i  (depending only on i).

To obtain α̂_2, run a POLS regression of z̈_it = (z_it − z̄_i) on ẽ_i for s_it = 1 (Frisch–Waugh–Lovell theorem). We get

    α̂_2 = [Σ_i Σ_t s_it ẽ_i' ẽ_i]^{-1} [Σ_i Σ_t s_it ẽ_i' (z_it − z̄_i)]
         = [Σ_i T_i ẽ_i' ẽ_i]^{-1} [Σ_i ẽ_i' (Σ_t s_it z_it − T_i z̄_i)]
         = [Σ_i T_i ẽ_i' ẽ_i]^{-1} [Σ_i ẽ_i' (T_i z̄_i − T_i z̄_i)]
         = 0.

Finally, since α̂_2 = 0, to obtain α̂_1 we simply run a POLS regression of (z_it − θ_i z̄_i) on (1 − θ_i) z̄_i for s_it = 1, which was shown above to equal I_L.
Using Theorem 3, we obtain the residuals from the regression in Step 1 as (z_it − z̄_i) = z̈_it.

Step 2: Regress (y_it − θ_i ȳ_i) on (x_it − θ_i x̄_i) for s_it = 1 using z̈_it = (z_it − z̄_i) as instruments. The resulting 2SLS coefficient will be

    β̂_P2SLS = [(Σ_i Σ_t s_it (x_it − θ_i x̄_i)' z̈_it)(Σ_i Σ_t s_it z̈_it' z̈_it)^{-1}(Σ_i Σ_t s_it z̈_it' (x_it − θ_i x̄_i))]^{-1}
              × [(Σ_i Σ_t s_it (x_it − θ_i x̄_i)' z̈_it)(Σ_i Σ_t s_it z̈_it' z̈_it)^{-1}(Σ_i Σ_t s_it z̈_it' (y_it − θ_i ȳ_i))].

1.9.2 Proof of Theorem 2

We have

    y_it1 = x_it β + y_it2 α + c_i1 + u_it    (1.21)

and the reduced form equation

    y_it2 = w_it γ + c_i2 + v_it.    (1.22)

The first step is to estimate the reduced form equation by Fixed Effects. We first time-demean equation (1.22):

    ÿ_it2 = ẅ_it γ + v̈_it    (1.23)

and run a simple POLS using the selected sample, i.e., for s_it = 1. Denote the FE residuals by v̈̂_it. The control function equation is obtained by augmenting equation (1.21) with v̈̂_it:

    y_it1 = x_it β + y_it2 α + v̈̂_it ρ + error_it    (1.24)

We will show that the FE estimators of (β, α) in equation (1.24) are identical to the FE2SLS estimators from equation (1.21).

Proof: To obtain the FE estimators of (β, α) in the augmented equation (1.24), let x_it1 ≡ (x_it, y_it2), x_it2 ≡ v̈̂_it, and β_1 ≡ (β', α')'. Our equation becomes

    y_it1 = x_it1 β_1 + x_it2 ρ + η_it    (1.25)

The error term η_it in equation (1.25) contains both the individual heterogeneity term and the idiosyncratic error. The FE transformation is given as

    ÿ_it1 = ẍ_it1 β_1 + ẍ_it2 ρ + η̈_it    (1.26)

The FE estimator of β_1 is obtained by POLS on equation (1.26) using s_it = 1. To obtain an expression for β̂_1, we use the Frisch–Waugh–Lovell theorem:

• Step 1: Run a POLS regression of ẍ_it1 on ẍ_it2 for s_it = 1.
Let the estimated coefficient be denoted π̂:

    π̂ = [Σ_i Σ_t s_it ẍ_it2' ẍ_it2]^{-1} [Σ_i Σ_t s_it ẍ_it2' ẍ_it1].

Since x_it1 ≡ (x_it, y_it2) and x_it2 ≡ v̈̂_it, consider

    Σ_i Σ_t s_it ẍ_it2' ẍ_it1 = (Σ_i Σ_t s_it v̈̂_it' ẍ_it , Σ_i Σ_t s_it v̈̂_it' ÿ_it2).

Note that since the v̈̂_it are FE residuals from the reduced form equation (1.23), by construction we have

    Σ_i Σ_t s_it ẅ_it' v̈̂_it ≡ 0  ⟹  Σ_i Σ_t s_it v̈̂_it' (ẍ_it, z̈_it) = (0, 0).

This implies Σ_i Σ_t s_it v̈̂_it' ẍ_it = 0. So we get

    (Σ_i Σ_t s_it v̈̂_it' ẍ_it , Σ_i Σ_t s_it v̈̂_it' ÿ_it2) = (0 , Σ_i Σ_t s_it v̈̂_it' ÿ_it2)

    ⟹  π̂ = (0 , [Σ_i Σ_t s_it ẍ_it2' ẍ_it2]^{-1} Σ_i Σ_t s_it ẍ_it2' ÿ_it2).
Now, [Σ_i Σ_t s_it ẍ_it2' ẍ_it2]^{-1} [Σ_i Σ_t s_it ẍ_it2' ÿ_it2] is nothing but the FE estimator obtained when ÿ_it2 is regressed on ẍ_it2 ≡ v̈̂_it for s_it = 1, which by construction equals the identity matrix I. This implies

    π̂ = [0, I].

The residuals from regressing ẍ_it1 on ẍ_it2 are therefore

    r̂_it = ẍ_it1 − ẍ_it2 [0, I] = [ẍ_it, ÿ_it2] − [0, ẍ_it2] = [ẍ_it, (ÿ_it2 − ẍ_it2)].

Now, note that (ÿ_it2 − ẍ_it2) is nothing but the vector of fitted values of ÿ_it2 from equation (1.23): ÿ_it2 = ÿ̂_it2 + ẍ_it2.

• Step 2: Run a POLS regression of ÿ_it1 on r̂_it for s_it = 1. This is a regression of ÿ_it1 on ẍ_it and the fitted values ÿ̂_it2 from the reduced form equation, which is precisely the FE2SLS method.

1.10 Concluding Remarks

The literature on unbalanced panels can be broadly classified into two sections. The first focuses on problems related to the detection of selection bias and on estimation methods to correct for this bias when it cannot be ruled out. The second looks at cases where selection bias does not exist: when data are missing at random, the estimation methods of the balanced panel case can be extended, and a similar adaptation can be made for specification tests. However, just as in the balanced case, the limitations of these tests also hold in the unbalanced case. While methods to counter these limitations have been developed for balanced panels, limited work has been done to formally account for them in unbalanced panels. This paper hopes to contribute to the existing literature by addressing this issue.

The preliminary linear model considered in this paper has many possible extensions for future research. We are interested in developing correlated random effects models for linear unbalanced panel data models with individual-specific slopes; this would give us a way to obtain a specification test for heterogeneous slopes. Moreover, in this paper we have assumed that, conditional on the observables, the selection mechanism is exogenous to the idiosyncratic shocks. This assumption seldom holds in most unbalanced panels.
Thus, analysis of specifications for unbalanced panels where exogenous sampling does not hold is another area in which we would like to further extend our analysis.

CHAPTER 2

CONTROL FUNCTION SIEVE ESTIMATION OF ENDOGENOUS SWITCHING MODELS WITH ENDOGENEITY

2.1 Introduction

Evaluating the causal effects of a program or a policy intervention is one of the most important questions in econometric analysis. In the case of discrete treatments, endogenous switching models provide a powerful framework for capturing causal effects. Switching regression models have been extensively used to estimate structural shifts: to capture parameter variation, where each possible state of the parameter vector is called a regime. Endogenous switching models have been used as a primary econometric technique in labour economics to study wage differentials between public and private sectors (Adamchik and Bedi (1983)) and between union and non-union members (Lee (1978)). They have also been used in modeling housing demand and markets in disequilibrium (Thorst (1977)).

Endogenous switching models have traditionally been estimated by joint maximum likelihood estimation. This requires a full specification of the joint distribution of the unobservables, an approach that not only places restrictive assumptions on the model but is also computationally challenging. Murtazashvili and Wooldridge (2016) use control function methods to obtain computationally simple estimation of switching models under both coefficient homogeneity and heterogeneity; they, however, still maintain distributional assumptions and homoskedastic errors.

In this paper, we generalize the endogenous switching model by allowing a more flexible reduced form for the treatment variable. This allows us to incorporate more heterogeneity in our model. To allow for heterogeneity in the treatment variable model, we allow the errors in the reduced form to have conditional heteroskedasticity.
This allows the unobservables to contribute to the dependent variable in a heterogeneous manner. One can argue that heterogeneity can also be incorporated into an econometric model by allowing for individual-specific slopes in the reduced form. While that specification of the treatment variable model would be more structural, it also imposes a specific structure on the heteroskedasticity; we prefer a reduced form with a general form of heteroskedasticity. An attractive feature of allowing heteroskedasticity of a general form is that the unconditional distribution of the error term may then be of unknown form, which further relaxes the distributional assumptions on the errors.

The endogeneity of the treatment variable, or switching indicator, is modeled through control function methods. In particular, we model the endogeneity in terms of the relationship between the errors of the primary equation for the outcome variable ('structural' errors) and the errors of the reduced form equation for the treatment variable ('reduced-form' errors). Since we allow the 'reduced-form' errors to have heteroskedasticity that is conditional on the explanatory variables, the relationship between the two error terms is also modeled in terms of the explanatory variables. This is done using Conditional Linear Projections (CLPs). CLPs are conditional counterparts to the usual linear projections that are popular in modeling conditional expectations, especially of the auxiliary terms of an econometric model. The key assumption is that the relationship between the 'structural' errors and the 'reduced-form' errors, reflected in the conditional expectation of the 'structural' errors, can be represented as a linear projection conditional on the explanatory variables.
Conditional linear projections impose linearity only in the errors while allowing the dependence on the explanatory variables to be of an unknown functional form. Moreover, when we consider the model that allows for individual-specific slopes in the outcome equation, we have an added component contributing to the endogeneity: endogeneity arising from the correlation between the treatment and the idiosyncratic gain from the treatment. This correlation is again in terms of the explanatory variables because of the conditional heteroskedasticity. However, as we will see, conditional linear projections allow us to model this endogeneity as well.

The estimation is done in two steps. The correction terms for the endogeneity are obtained in the first-step estimation of the reduced form model for the treatment variable. An estimating equation that accounts for the endogeneity through these correction terms is estimated in the second step. Since the heteroskedasticity of the reduced-form error as well as the correction terms in the estimating equation are of unknown functional form, we estimate both steps by semi-nonparametric methods; in particular, we use the method of sieves in both steps. However, we also illustrate how one can impose functional form assumptions that make the model fully parametric and thus estimate both steps by standard parametric methods. We therefore obtain both parametric and semi-nonparametric estimation procedures.

An attractive feature of the endogenous switching model is that it serves as an umbrella for other econometric models. Specifically, the estimation strategy can be modified to estimate the parameters of sample selection models and of models with a binary endogenous variable. The estimation procedures in this paper are first obtained for the model with homogeneous treatment effects and a heteroskedastic reduced form for the treatment variable, and are then extended to three models.
The first model incorporates heterogeneous treatment effects through individual-specific slopes in the outcome equation. We further extend the estimation strategy to models with a binary endogenous variable and to sample selection models.

We also give detailed large sample properties of the estimators. Large sample properties of the parametric estimators are obtained by a straightforward GMM-type treatment of the two-step estimation. For the large sample theory of the sieve estimators, we refer to Hahn, Liao and Ridder (2018), who provide a general unified mathematical framework for investigating the asymptotic properties of such two-step sieve estimators. Since our sieve estimation procedure fits neatly into this framework, our estimators inherit these results. They also show a numerical equivalence result between the parametric and sieve variances; we use this equivalence result to obtain the expressions for the variances of the two-step sieve estimators.

The paper is organized as follows. In Section 2.2, we begin with the traditional endogenous switching model with constant coefficients. We describe the assumption of conditional heteroskedasticity in the reduced form and obtain the estimating equation using a control function approach based on conditional linear projections. In Section 2.3, we obtain both parametric and sieve two-step estimation procedures. Section 2.4 describes the asymptotics of our estimators and obtains consistency, asymptotic normality, and expressions for the asymptotic variances using the numerical equivalence result. Section 2.5 extends our analysis to other econometric models; in particular, we consider endogenous switching models with individual-specific slopes in the outcome equation, models with a binary endogenous explanatory variable, and sample selection models. Section 2.6 illustrates our estimation procedures with an empirical application.
Section 2.7 concludes the paper.

2.2 Constant Coefficients Endogenous Switching Regression

2.2.1 Model

Suppose that we are interested in evaluating the causal effect of a program on N individuals indexed by i = 1, 2, ..., N. The treatment status of individual i is denoted by a binary variable y2i that takes the value 1 if she is treated and 0 otherwise. As we will see later, y2i could also denote a binary endogenous explanatory variable or the selection indicator in a sample selection model. In the framework of switching models, y2i is the endogenous switching indicator for two regimes:

y2i = 0 for Regime 0, y2i = 1 for Regime 1    (2.1)

Consistent with the treatment literature, we postulate the existence of two potential outcomes, one in each regime, denoted by {y1i(0), y1i(1)}, where the superscript denotes the regime. Our first assumption states that these counterfactual outcomes are linear in the parameters:

Assumption 2.2.1 For each i = 1, 2, ..., N,

y1i(0) = x1i γ(0) + ui(0)
y1i(1) = x1i γ(1) + ui(1)

where x1i is a 1 × Kx1 vector of exogenous explanatory variables that includes unity. x1i may also include functional forms (such as logs and squares) that are linear in parameters. {ui(0), ui(1)} are the unobservables. We allow for different coefficients in each regime; however, these coefficients are not individual-specific.

Note that our observed outcome variable y1i can be expressed as:

y1i = (1 − y2i) y1i(0) + y2i y1i(1)    (2.2)

Substituting for the counterfactuals, we get our primary equation for the switching regression model with constant coefficients γ(0) and γ(1):

y1i = (1 − y2i) x1i γ(0) + y2i x1i γ(1) + (1 − y2i) ui(0) + y2i ui(1)    (2.3)

Changing the notation slightly, we re-write (2.3) as:

y1i = x1i β0 + y2i x1i β1 + v0i + y2i v1i    (2.4)

where β0 ≡ γ(0), β1 ≡ γ(1) − γ(0), v0i ≡ ui(0), and v1i ≡ ui(1) − ui(0). The average treatment effect is captured by the parameter β1. The switching indicator y2i is a binary variable modeled as:

y2i = 1[xi β2 − v2i ≥ 0]    (2.5)

where xi is a 1 × Kx vector of explanatory variables with Kx > Kx1, and we allow x1i ⊂ xi.¹

¹As we will see, this serves as an exclusion restriction to ensure proper estimation.

Heteroskedastic Errors: The key feature of our model is that we allow for heterogeneity in the treatment assignment by allowing for a heteroskedastic error variance in the treatment model.

Assumption 2.2.2 We incorporate multiplicative heteroskedasticity by assuming:

v2i = σ2(xi) · u2i,  u2i independent of xi,  u2i ∼ N(0, 1)    (2.6)

where N denotes the normal distribution and σ2(xi) is a function of all the explanatory variables.

As we will see, the form of the heteroskedasticity is specified in the parametric estimation procedure and is allowed to be an unknown function in the semi-nonparametric estimation procedure.

Before we describe the method for obtaining the estimating equation, consider the interpretation of the error structure in the model for the treatment variable. Substituting equation (2.6) into the reduced form equation for the treatment variable gives:

y2i = 1[xi β2 − σ2(xi) u2i ≥ 0]    (2.7)

If we interpret u2i as an unobserved variable that affects the probability that individual i is in the treated group, then equation (2.7) says that the contribution of this unobserved variable depends on the individual's other covariates, which affect his/her selection into the program in a flexible way (as reflected in the unspecified functional form of the heteroskedasticity). For instance, consider the standard example of studying the effect of a job training program on wages and suppose that u2i is a measure of unobserved ability.
A heteroskedastic error structure in the treatment equation implies that the contribution of unobserved ability to the probability of being selected into, or participating in, the job training program depends on the individual's socio-economic factors in a flexible way that is captured by σ2(xi).

2.2.2 Estimating Equation

First note that y2i is a function of (xi, v2i). Thus, to obtain the estimating equation, we first write:

E[y1i | xi, v2i] = x1i β0 + y2i x1i β1 + E[v0i | xi, v2i] + y2i E[v1i | xi, v2i]    (2.8)

E[v0i | xi, v2i] and E[v1i | xi, v2i] are the correction terms that correct for the bias due to the endogenous switching. It is clear from (2.8) that we need expressions for these correction terms. We will obtain them in terms of conditional linear projections (or predictions).

Conditional Linear Projections (CLPs)

Linear projections are a popular tool in econometrics for approximating conditional expectations. In our model, we want to approximate the relationships between the unobservables in the outcome equation and the treatment equation conditional on xi, as reflected in the correction terms. In addition, since we will allow the conditional heteroskedasticity to be of an unknown form in the semi-nonparametric estimation, we would like to impose linearity only with respect to v2i, leaving the functional form with respect to xi unspecified. Thus, for our purposes, we use conditional linear projections. Denoted by L[vji | v2i; xi], the conditional linear projection is defined as the linear projection of vji on v2i conditional on xi, for j = {0, 1}. The theory of CLPs was first developed in Hansen and Richard (1987), who essentially extended the conventional Hilbert space analysis to the conditional framework. Wooldridge (1999) uses CLPs to develop orthogonality conditions for distribution-free estimation of nonlinear panel data models.
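The content of a conditional linear projection can be illustrated numerically: if v0 = g(x)·v2 + e with E[e·v2 | x] = 0, the CLP coefficient Cov(v0, v2 | x)/Var(v2 | x) equals g(x) and varies with x, and a groupwise least squares slope recovers it. A minimal simulated sketch (all functional forms and values below are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# x takes two values so the conditional projection can be computed groupwise
x = rng.integers(0, 2, size=n).astype(float)

g = 0.5 + 1.0 * x                  # CLP coefficient: Cov(v0,v2|x)/Var(v2|x)
sigma2 = np.exp(0.3 * x)           # conditional sd of v2 given x
v2 = sigma2 * rng.normal(size=n)
v0 = g * v2 + rng.normal(size=n)   # so E[v0 | v2, x] = g(x) * v2

# Groupwise least squares slope of v0 on v2 recovers g(x) for each x:
slopes = []
for val in (0.0, 1.0):
    m = x == val
    slopes.append(np.cov(v0[m], v2[m])[0, 1] / np.var(v2[m]))
# slopes[0] should be near 0.5 and slopes[1] near 1.5
```

The point of the example is that a single unconditional linear projection of v0 on v2 would force one slope for all observations, whereas the CLP lets the slope depend on x without specifying how.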
In the context of this paper, CLPs form the key assumption:

Assumption 2.2.3 The conditional expectations are linear in v2i conditional on xi:

E[v0i | v2i, xi] = L[v0i | v2i; xi] ≡ [σ02(xi) / σ2²(xi)] v2i    (2.9)
E[v1i | v2i, xi] = L[v1i | v2i; xi] ≡ [σ12(xi) / σ2²(xi)] v2i    (2.10)

where σj2(xi) ≡ Cov(vji, v2i | xi), j = {0, 1}, denote the conditional covariances between the error terms in the outcome equation and the error term in the reduced form equation for the treatment variable, and σ2²(xi) ≡ Var(v2i | xi).

Assumption 2.2.3 imposes linearity of E[vji | v2i, xi], j = {0, 1}, but only with respect to v2i. Since we are using linear projections conditional on xi, we still allow E[vji | v2i, xi] to be a flexible nonlinear function of xi. To give some context to the assumption, denote the linear projection in the unconditional case by Lu[vji | v2i, xi], defined as the linear projection of vji on v2i and xi, for j = {0, 1}. We could use unconditional linear projections to model the relationships between the unobservables under the stronger assumption that (v0i, v1i, v2i) is jointly independent of xi; in that case, Assumption 2.2.3 reduces to E[vji | v2i, xi] = E[vji | v2i] = Lu[vji | v2i]. Furthermore, if we relax the independence assumption but impose linearity in xi, Assumption 2.2.3 becomes E[vji | v2i, xi] = Lu[vji | v2i, xi].

Substituting Assumption 2.2.3 into equation (2.8), we get:

E[y1i | xi, v2i] = x1i β0 + y2i x1i β1 + [σ02(xi) / σ2²(xi)] v2i + y2i [σ12(xi) / σ2²(xi)] v2i    (2.11)

Next, note that E[u2i | xi, y2i] are the generalized errors from the reduced form model for the treatment variable. In our context,

E[u2i | xi, y2i] = y2i λ(xi β2 / σ2(xi)) − (1 − y2i) λ(−xi β2 / σ2(xi)) ≡ h(y2i, xi)

where λ(.) is the inverse Mills ratio and h(y2i, xi) denotes the generalized residual from the first stage estimation.
It follows that

E[v2i | xi, y2i] = σ2(xi) E[u2i | xi, y2i] = σ2(xi) h(y2i, xi)

We also have E[y1i | xi, y2i] = E[E[y1i | xi, v2i] | xi, y2i]. So we get:

E[y1i | xi, y2i] = x1i β0 + y2i x1i β1 + [σ02(xi) / σ2²(xi)] E[v2i | xi, y2i] + y2i [σ12(xi) / σ2²(xi)] E[v2i | xi, y2i]
               = x1i β0 + y2i x1i β1 + [σ02(xi) / σ2(xi)] h(y2i, xi) + y2i [σ12(xi) / σ2(xi)] h(y2i, xi)

Defining g0(xi) ≡ σ02(xi) / σ2(xi) and g1(xi) ≡ σ12(xi) / σ2(xi), we get our estimating equation:

y1i = x1i β0 + y2i x1i β1 + g0(xi) h(y2i, xi) + y2i g1(xi) h(y2i, xi) + ηi    (2.12)

where E[ηi | xi, y2i] = 0 by construction.

2.3 Estimation Strategy

For estimation, we have:

y2i = 1[xi β2 − σ2(xi) u2i ≥ 0]
y1i = x1i β0 + y2i x1i β1 + g0(xi) h(y2i, xi) + y2i g1(xi) h(y2i, xi) + ηi

These two equations suggest a two-step estimation procedure. In the first step, we estimate the reduced form model for the treatment variable, or switching indicator. In the second step, we plug in the estimates from the first stage and estimate equation (2.12). Since we have terms of unknown functional form, {σ2(xi), g0(xi), g1(xi)}, we estimate both steps by semi-nonparametric procedures, specifically by the method of sieves. However, one can easily impose distributional and functional form assumptions and estimate both steps by standard parametric procedures. In this section we describe both the parametric and the sieve estimation of our model.

2.3.1 Parametric Estimation

To estimate the model parametrically, we first specify the functional form of the heteroskedasticity of the errors in the treatment model. More specifically, we assume:

Assumption 2.3.1 For all i = 1, ..., N,

σ2(xi) ≡ exp(xi Π2)    (2.13)

This implies Var[v2i | xi] = [exp(xi Π2)]². This allows us to estimate the reduced form model for the treatment by heteroskedastic probit.
First Step: The true parameters to be estimated in the first stage are θ02 ≡ {β02, Π02}, which belong to a finite dimensional parameter space Θ2. Denoting the standard normal cdf by Φ(.), our first stage parametric estimators θ̂2 ≡ (β̂2, Π̂2) solve:

θ̂2 = argmax_{θ2 ∈ Θ2} Σ_{i=1}^{N} { y2i log Φ(xi β2 / exp(xi Π2)) + (1 − y2i) log[1 − Φ(xi β2 / exp(xi Π2))] }    (2.14)

In practice, we would run the standard hetprobit command in Stata. This gives us estimators of β2 and Π2, which in turn give an estimator of λ(.) and h(xi, y2i). Denote these estimators by λ̂i ≡ λ(xi β̂2 / σ̂2(xi)) and ĥi ≡ ĥ(xi, y2i).

Second Step: Plugging the estimators from the first stage into equation (2.12):

y1i = x1i β0 + y2i x1i β1 + g0(xi) ĥ(y2i, xi) + y2i g1(xi) ĥ(y2i, xi) + η1i    (2.15)

In the second stage, we further specify the functional forms of the terms g0(xi) and g1(xi). The simplest assumption is that these terms are linear:

Assumption 2.3.2 For all i = 1, ..., N,

g0(xi) ≡ xi Ω02    (2.16)
g1(xi) ≡ xi Ω12    (2.17)

where {Ω02, Ω12} are parameters.

Since we can include interactions, squares, logarithms and other functional forms of the explanatory variables in xi, the above assumptions impose linearity only in the parameters. We denote the full list of true parameters to be estimated in the second stage by θ01 ≡ {β00, β01, Ω0,02, Ω0,12}, which belong to the finite dimensional parameter space Θ1. In the second step, we can estimate the parameters by simple least squares. More specifically, the estimator θ̂1 solves:

θ̂1 = argmin_{θ1 ∈ Θ1} Σ_{i=1}^{N} [y1i − x1i β0 − y2i x1i β1 − ĥi xi Ω02 − y2i ĥi xi Ω12]²    (2.18)

2.3.2 Sieve Estimation

The primary feature of our model is that we place no assumptions on the functional forms of the heteroskedasticity of the error terms.
We estimate the unknown functions using sieve estimation procedures. To be able to estimate the unknown functions, we need to impose some regularity conditions on them; more specifically, we need to specify their smoothness. Define h ≡ [σ2(xi), g0(xi), g1(xi)] to be the collection of all the unknown functions in our model. We assume that h belongs to a Hölder class; the technical definition of the Hölder class is given in Section 2.7. Hölder classes of functions are the most popular in nonparametric estimation in econometrics because they can be easily approximated by linear sieves. A sieve is called a finite-dimensional linear sieve if it is the linear span of finitely many basis functions. Power series, Fourier series, splines, B-splines and wavelets are some of the most popular linear sieves used in sieve estimation.

The estimation in the semi-nonparametric setting also takes place in two steps. Since in this case we impose no functional form assumptions on {σ2(xi), g0(xi), g1(xi)}, the parameters to be estimated no longer lie in the finite dimensional parameter space Θ ≡ {θ1, θ2}. Because we are now optimizing over an infinite dimensional space, we use sieve estimation methods to obtain the estimators in both stages. The preliminary step in both the first and second stage estimation is to define the basis functions and the sieve spaces.

First Step: In the first stage, the true parameters to be estimated are {β20, σ20(xi)}, which lie in the infinite dimensional parameter space A2. For sieve estimation, we define the finite dimensional sieve space as:

A2N = B2 × { σ2(.) = exp(S_{Kσ,N}(xi) ΠN) : ΠN ∈ R^{Kσ,N} }    (2.19)

where × denotes the tensor product and S_{Kσ,N}(xi) ≡ {s1(.), ..., s_{Kσ,N}(.)} is a 1 × Kσ,N vector of basis functions. Thus we get α2N ≡ (β2, σ2N) ∈ A2N, which is our finite dimensional sieve space. Next, denoting Φ(.)
as the standard normal cdf, our estimators are:

α̂2N = argmax_{α2N ∈ A2N} Σ_{i=1}^{N} { y2i log Φ(xi β2 / exp(S_{Kσ,N}(xi) ΠN)) + (1 − y2i) log[1 − Φ(xi β2 / exp(S_{Kσ,N}(xi) ΠN))] }    (2.20)

This gives us estimators of β20 and σ²20(xi), which in turn give an estimator of λ(.) and h(xi, y2i). Denote the first stage estimators by β̂2, σ̂2²(xi), λ̂i ≡ λ(xi β̂2 / σ̂2(xi)), and ĥi ≡ ĥ(xi, y2i).

Second Step: For the second stage we have the sieve space for {β0, β1, g0(xi), g1(xi)}:

A1N = B1 × { g0(.) = G0,Kg0,N(xi) ΩN,02 : ΩN,02 ∈ R^{Kg0,N} } × { g1(.) = G1,Kg1,N(xi) ΩN,12 : ΩN,12 ∈ R^{Kg1,N} } = B1 × G0N × G1N    (2.21)

where G0,Kg0,N(xi) ≡ {g0,1(.), ..., g0,Kg0,N(.)} is a 1 × Kg0,N vector of basis functions and G1,Kg1,N(xi) ≡ {g1,1(.), ..., g1,Kg1,N(.)} is a 1 × Kg1,N vector of basis functions. Plugging into equation (2.12):

y1i = x1i β0 + y2i x1i β1 + G0,Kg0,N(xi) ΩN,02 ĥi + y2i G1,Kg1,N(xi) ΩN,12 ĥi + η1i    (2.22)

As suggested by equation (2.21), the second stage estimators α̂1N ≡ {β̂0, β̂1, ĝ0(xi), ĝ1(xi)} ∈ A1N, where {ĝ0(xi), ĝ1(xi)} ≡ {G0,Kg0,N(xi) Ω̂N,02, G1,Kg1,N(xi) Ω̂N,12}, solve the following optimization problem:

α̂1N = argmin_{α1N ∈ A1N} Σ_{i=1}^{N} [y1i − x1i β0 − y2i x1i β1 − G0,Kg0,N(xi) ΩN,02 ĥi − y2i G1,Kg1,N(xi) ΩN,12 ĥi]²    (2.23)

In practice, this amounts to regressing y1i on x1i, y2i x1i, ĥi G0,Kg0,N(xi), and ĥi y2i G1,Kg1,N(xi).

2.4 Asymptotics

This section describes the asymptotic properties of our two-step estimators, in both the parametric and the sieve estimation. An important consideration in a two-step estimation procedure is whether and how the estimation error of the first step estimators affects the asymptotic variance of the second step estimators. In the parametric case, the method of adjusting the asymptotic variance of the second step estimator follows the standard procedures given in Wooldridge (2002).
In the sieve case, we apply the general results of Hahn, Liao and Ridder (2018), in which the statistical properties of two-step sieve estimators are derived. We further apply the numerical equivalence results of Hahn, Liao and Ridder (2018), which allow us to treat the two-step sieve estimation as if it were a standard two-step parametric estimation and thus conduct practical inference on the parameters.

2.4.1 Asymptotic Properties of the Parametric Estimators

In the parametric setting, the consistency of the estimators follows under finite moment conditions. Valid inference on the parameters in the second step should take into account the inclusion of the generalized residuals obtained in the first step. More specifically, we can obtain the asymptotic distribution and the variance analytically by formulating the two-step estimation method as a one-step method of moments problem. The asymptotic distribution of our parametric estimators is given by:

√N [ (θ̂1 − θ01)', (θ̂2 − θ02)' ]' →d N(0, V),  V ≡ A⁻¹ B A'⁻¹    (2.24)

We derive the analytical expression for the asymptotic variance of the parametric estimator in the Appendix. Alternatively, we can also run a bootstrap routine.

2.4.2 Asymptotic Properties of the Sieve Estimators

In this section, we specify the asymptotics of our two-step sieve M-estimators and the assumptions needed for these properties to hold. There is a rich literature on the asymptotic properties of two-step semi-nonparametric estimators. Two-step sieve M-estimators have been studied in great detail by Hahn, Liao and Ridder (2018), who derive asymptotic properties such as consistency and the asymptotic distribution of the estimators. Our estimation problem is well behaved and our model falls neatly into their framework. Thus the calculations and properties are easily adapted here, and we map their results by verifying the conditions.
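The bootstrap alternative mentioned in Section 2.4.1 resamples individuals with replacement and reruns both steps on each draw, so the first-step estimation error automatically propagates into the standard errors. A minimal pairs-bootstrap sketch (the toy estimator below is a hypothetical stand-in for the full two-step routine):

```python
import numpy as np

def pairs_bootstrap_se(data, two_step_estimator, n_boot=200, seed=0):
    """Pairs (nonparametric) bootstrap for a two-step estimator.
    `two_step_estimator` must rerun BOTH steps on the resampled data,
    so that first-step estimation error propagates into the SEs.
    `data` is an (n, k) array; rows (individuals) are resampled."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws.append(two_step_estimator(data[idx]))
    return np.std(np.asarray(draws), axis=0, ddof=1)

# Usage with a toy estimator (the sample mean, standing in for the full
# routine: heteroskedastic probit followed by the second-step regression):
data = np.random.default_rng(3).normal(size=(400, 1))
se = pairs_bootstrap_se(data, lambda d: d.mean(axis=0))
```

For the sample mean of n = 400 standard normal draws, the bootstrap standard error should be close to 1/√400 = 0.05.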
Assumption 2.4.1 {y1i, y2i, xi}, i = 1, 2, ..., N, is i.i.d.

Assumption 2.4.2 In the first step, α20 ≡ {β20, σ²20(xi)} ∈ A2 is the identified, unique solution to sup_{α2 ∈ A2} E[Q2(Z2i, α2)], where

Q2(Z2i, α2) ≡ y2i log Φ(xi β2 / σ2(xi)) + (1 − y2i) log[1 − Φ(xi β2 / σ2(xi))]    (2.25)

We estimate α20 ∈ A2 by α̂2N ∈ A2N, where A2N is the finite dimensional sieve space defined as:

A2N = B2 × { σ2(.) = exp(S_{Kσ,N}(xi) ΠN) : ΠN ∈ R^{Kσ,N} } = B2 × S2N

with dim(A2N) = dim(B2) + dim(S2N) = Kx + Kσ,N ≡ K2.

Assumption 2.4.3 Sieve Spaces in the First Step
(i) The sieve spaces A2n are compact under ||.||_{A2}
(ii) A2n ⊆ A2,n+1 ⊆ ... ⊆ A2 for all n ≥ 1
(iii) There exists πn α20 ∈ A2n such that ||πn α20 − α20||_{A2} → 0 as n → ∞

α̂2N is defined by:

(1/N) Σ_{i=1}^{N} Q2(Z2i, α̂2N) ≥ sup_{α2N ∈ A2N} (1/N) Σ_{i=1}^{N} Q2(Z2i, α2N) − Op(ε²2N)    (2.26)

where ε2N is the magnitude of the optimization error. Define Z2i ≡ (y2i, xi). Let β⃗10 ≡ (β00, β10)' and g⃗0(xi) ≡ (g00(xi), g10(xi)). Further define x⃗1i ≡ (x1i, y2i x1i) and h⃗i ≡ h⃗(α02; Z2i) ≡ [h(α02; Z2i), y2i h(α02; Z2i)]'.

Assumption 2.4.4 In the second stage, α10 ≡ (β⃗10, g⃗0) ∈ A1 is the unique solution to sup_{α1 ∈ A1} E[Q1(Z1i, α1, α20)], where

Q1(Z1i, α1, α20) ≡ −[y1i − (x⃗1i β⃗1 + g⃗0(xi) h⃗i)]² / 2    (2.27)

We estimate α10 ∈ A1 by α̂1N ∈ A1N, where A1N is the finite dimensional sieve space defined as:

A1N = B1 × { g0(.) = G0,Kg0,N(xi) ΩN,02 : ΩN,02 ∈ R^{Kg0,N} } × { g1(.) = G1,Kg1,N(xi) ΩN,12 : ΩN,12 ∈ R^{Kg1,N} } = B1 × GN

with dim(A1N) = dim(B1) + dim(GN) = Kx1 + Kg0,N + Kg1,N ≡ K1.

Assumption 2.4.5 Sieve Spaces in the Second Step
(i) The sieve spaces A1n are compact under ||.||_{A1}
(ii) A1n ⊆ A1,n+1 ⊆ ...
⊆ A1 for all n ≥ 1
(iii) There exists πn α10 ∈ A1n such that ||πn α10 − α10||_{A1} → 0 as n → ∞

α̂1N is defined by:

(1/N) Σ_{i=1}^{N} Q1(Z1i, α̂1N, α̂2N) ≥ sup_{α1N ∈ A1N} (1/N) Σ_{i=1}^{N} Q1(Z1i, α1N, α̂2N) − Op(ε²1N)    (2.28)

where ε1N is the magnitude of the optimization error.

2.4.2.1 Consistency

The following theorem provides the consistency result for our two-step sieve estimators.

Theorem 4
(a) Under Assumptions 2.4.1, 2.4.2 and 2.4.3, the first stage estimator α̂2N is consistent for α20 under the pseudo-metric ||.||_{A2} defined on A2. The convergence rate is denoted δ*2N.
(b) Under Assumptions 2.4.1, 2.4.4 and 2.4.5, the second step estimator α̂1N is consistent for α10 under the pseudo-metric ||.||_{A1} defined on A1. The convergence rate is denoted δ*1N.

Given the convergence rates, we can define shrinking neighborhoods and assume that:
• α̂2N belongs, with probability approaching one, to the shrinking neighborhood of α20 defined as N2N ≡ {α2N ∈ A2N : ||α2N − α20||_{A2} ≤ δ2N}, where δ2N ≡ δ*2N log(log(N)) = o(1).
• α̂1N belongs, with probability approaching one, to the shrinking neighborhood of α10 defined as N1N ≡ {α1N ∈ A1N : ||α1N − α10||_{A1} ≤ δ1N}, where δ1N ≡ δ*1N log(log(N)) = o(1).

The assumptions for the consistency of the first step sieve estimator correspond to the theoretical conditions of Theorem 3.1 in Chen (2007). Since the likelihood function in the first step is built from a smooth cdf, it satisfies the continuity condition sufficient for consistency. The consistency of the second step sieve estimator takes into account that the first step sieve estimator is consistent. Since the first step estimator essentially enters the second step estimating equation through an interaction term, we have a fairly straightforward least squares criterion function in the second step that satisfies the continuity and uniform convergence conditions.
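The approximation property (iii) underlying consistency can be illustrated numerically: for a smooth target, the best linear-sieve approximation error shrinks as the sieve dimension K_N grows. A minimal sketch with a power-series sieve (the target function is a hypothetical stand-in for σ2(x)):

```python
import numpy as np

def power_series_basis(x, K):
    # S_K(x) = (1, x, x^2, ..., x^{K-1}): a finite-dimensional linear sieve
    return np.column_stack([x**d for d in range(K)])

x = np.linspace(-1.0, 1.0, 400)
target = np.exp(0.5 * np.sin(2.0 * x))   # smooth stand-in for sigma2(x)

def sieve_error(K):
    # least squares projection of the target onto the K-dimensional sieve
    S = power_series_basis(x, K)
    coef, *_ = np.linalg.lstsq(S, target, rcond=None)
    return np.max(np.abs(S @ coef - target))

errors = [sieve_error(K) for K in (2, 4, 8, 12)]
# the approximation error falls rapidly as the sieve dimension K grows
```

In the estimation procedure the sieve dimension plays the role of K_N above, growing slowly with the sample size so that approximation bias and estimation variance both vanish.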
2.4.2.2 Asymptotic Normality

The asymptotic normality results follow the general asymptotic theory developed by Hahn, Liao and Ridder (2018). We denote α̂N ≡ {α̂1N, α̂2N}, which is an estimator of α0 ≡ {α10, α20} ∈ A ≡ A1 × A2. We are interested in the asymptotic properties of a linear functional ρ(α̂N), which is an estimator of ρ(α0). We defined the shrinking neighborhoods for our first and second stage estimators as N2N and N1N respectively. Based on these definitions, we can assume that α̂N belongs, with probability approaching one, to the shrinking neighborhood NN ≡ {(α1N, α2N) : α1N ∈ N1N and α2N ∈ N2N}.

Riesz Representer for the First Step: Suppose that for all α2N ∈ N2N, we can approximate Q2(Z2i, α2N) − Q2(Z2i, α20) by ∆2(Z2i, α20)[α2N − α20], which is linear in α2N − α20. Since α20 is the unique maximizer of E[Q2(Z2i, α2)] on A2, we can let

−∂E[∆2(Z2i, α20 + τ[α2N − α20])[α2N − α20]] / ∂τ |_{τ=0} ≡ ||α2N − α20||²_{A2}

which defines a norm on N2N. Let H2 be the closed linear span of N2N − {α20} under ||.||_{A2}, which is a Hilbert space under ||.||_{A2} with the corresponding inner product ⟨., .⟩_{A2} defined by

⟨a2N, b2N⟩_{A2} = −∂E[∆2(Z2i, α20 + τ b2N)[a2N]] / ∂τ |_{τ=0}

for any a2N, b2N ∈ H2. Typically, we will have

∆2(Z2i, α20)[a2N] = ∂Q2(Z2i, α20 + τ a2N) / ∂τ |_{τ=0} and ⟨a2N, b2N⟩_{A2} = −E[ ∂∆2(Z2i, α20 + τ b2N)[a2N] / ∂τ |_{τ=0} ]    (2.29)

if the derivative exists and we can interchange the derivative and the expectation. We assume that there is a linear functional ∂2ρ(α0)[.] : H2 → R such that

∂2ρ(α0)[a2] = ∂ρ(α10, α20 + τ a2) / ∂τ |_{τ=0} for all a2 ∈ H2    (2.30)

Let α20,N denote the projection of α20 on A2N under the norm ||.||_{A2}.
Let H2N denote the Hilbert space generated by N2N − {α20,N}; then dim(H2N) = dim(A2N) < ∞. By the Riesz representation theorem, there exists a sieve Riesz representer a*2N ∈ H2N such that

∂2ρ(α0)[a2] = ⟨a*2N, a2⟩_{A2} for all a2 ∈ H2N, and ||a*2N||²_{A2} = sup_{0 ≠ a2 ∈ H2N} |∂2ρ(α0)[a2]|² / ||a2||²_{A2}.

Riesz Representer for the Second Step: Suppose that for all α1N ∈ N1N, we can approximate Q1(Z1i, α1N, α20) − Q1(Z1i, α10, α20) by ∆1(Z1i, α10, α20)[α1N − α10], which is linear in α1N − α10. Since α10 is the unique maximizer of E[Q1(Z1i, α1, α20)] on A1, we can let

−∂E[∆1(Z1i, α10 + τ[α1N − α10], α20)[α1N − α10]] / ∂τ |_{τ=0} ≡ ||α1N − α10||²_{A1}

which defines a norm on N1N. Let H1 be the closed linear span of N1N − {α10} under ||.||_{A1}, which is a Hilbert space under ||.||_{A1} with the corresponding inner product ⟨., .⟩_{A1} defined by

⟨a1N, b1N⟩_{A1} = −∂E[∆1(Z1i, α10 + τ b1N, α20)[a1N]] / ∂τ |_{τ=0}

for any a1N, b1N ∈ H1. Typically, we will have

∆1(Z1i, α0)[a1N] = ∂Q1(Z1i, α10 + τ a1N, α20) / ∂τ |_{τ=0} and ⟨a1N, b1N⟩_{A1} = −E[ ∂∆1(Z1i, α10 + τ b1N, α20)[a1N] / ∂τ |_{τ=0} ]    (2.31)

if the derivative exists and we can interchange the derivative and the expectation. We assume that there is a linear functional ∂1ρ(α0)[.] : H1 → R such that

∂1ρ(α0)[a1] = ∂ρ(α10 + τ a1, α20) / ∂τ |_{τ=0} for all a1 ∈ H1    (2.32)

Let α10,N denote the projection of α10 on A1N under the norm ||.||_{A1}. Let H1N denote the Hilbert space generated by N1N − {α10,N}; then dim(H1N) = dim(A1N) < ∞.
By the Riesz representation theorem, there exists a sieve Riesz representer a*1N ∈ H1N such that

∂1ρ(α0)[a1] = ⟨a*1N, a1⟩_{A1} for all a1 ∈ H1N, and ||a*1N||²_{A1} = sup_{0 ≠ a1 ∈ H1N} |∂1ρ(α0)[a1]|² / ||a1||²_{A1}.

Let H = H1 × H2. For any a = (a1, a2) ∈ H, we denote

∂αρ(α0)[a] = ∂1ρ(α0)[a1] + ∂2ρ(α0)[a2]    (2.33)

To evaluate the effect of the first step sieve estimation on the asymptotic variance of the second step sieve estimator, we define

F1(α0)[a1] = ∂E[Q1(Z1i, α10 + τ1 a1, α20)] / ∂τ1 |_{τ1=0} for any a1 ∈ H1    (2.34)

and

F(α0)[a1, a2] = ∂F1(α10, α20 + τ2 a2)[a1] / ∂τ2 |_{τ2=0} for any a2 ∈ H2

Assume that F(α0)[., .] is a bilinear functional on H. Given the Riesz representer a*1N, define a*FN ∈ H2N by:

F(α0)[a2N, a*1N] = ⟨a2N, a*FN⟩_{A2} for any a2N ∈ H2N    (2.35)

Finally, we define:

||V*N||² ≡ Var( Σ_{i=1}^{N} { ∆2(Z2i, α20)[a*2N] + ∆2(Z2i, α20)[a*FN] + ∆1(Z1i, α0)[a*1N] } / N^{1/2} )

Theorem 5 Under the assumptions of Theorem 3.1 in Hahn, Liao and Ridder (2018), the two-step estimator satisfies:

√N [ρ(α̂1N, α̂2N) − ρ(α10,N, α20,N)] / ||V*N|| →d N(0, 1)    (2.36)
√N [ρ(α̂1N, α̂2N) − ρ(α10, α20)] / ||V*N|| →d N(0, 1)    (2.37)

2.4.2.3 Consistent Variance Estimator

To obtain a method of inference, we now need a consistent estimator of ||V*N||. In this section we obtain the sample analog of ||V*N||, which is a consistent estimator. First assume that our data are i.i.d. and that both criterion functions are twice pathwise differentiable with respect to α2N and (α1N, α2N) in NN.
Next we define

∆2(Z2i, α2N)[a2N] ≡ ∂Q2(Z2i, α2N + τ a2N) / ∂τ |_{τ=0}    (2.38)

and

H2(Z2i, α2N)[a2N, b2N] ≡ ∂∆2(Z2i, α2N + τ a2N)[b2N] / ∂τ |_{τ=0}    (2.39)

for all (a2N, b2N) ∈ H2N. Similarly, define

∆1(Z1i, α1N, α2N)[a1N] ≡ ∂Q1(Z1i, α1N + τ a1N, α2N) / ∂τ |_{τ=0}    (2.40)

and

H1(Z1i, α1N, α2N)[a1N, b1N] ≡ ∂∆1(Z1i, α1N + τ a1N, α2N)[b1N] / ∂τ |_{τ=0}    (2.41)

for all (a1N, b1N) ∈ H1N. The empirical Riesz representer â*2N is defined by:

∂2ρ(α̂N)[a2N] = ⟨a2N, â*2N⟩_{N,A2} for all a2N ∈ H2N    (2.42)

where ⟨a2N, b2N⟩_{N,A2} ≡ −(1/N) Σ_{i=1}^{N} H2(Z2i, α̂2N)[a2N, b2N] for any a2N, b2N ∈ H2N. Similarly, the empirical Riesz representer â*1N is defined by:

∂1ρ(α̂N)[a1N] = ⟨a1N, â*1N⟩_{N,A1} for all a1N ∈ H1N    (2.43)

where ⟨a1N, b1N⟩_{N,A1} ≡ −(1/N) Σ_{i=1}^{N} H1(Z1i, α̂N)[a1N, b1N] for any a1N, b1N ∈ H1N. And we define â*FN by:

FN(α̂N)[â*1N, a2N] = ⟨â*FN, a2N⟩_{N,A2} for all a2N ∈ H2N    (2.44)

where

FN(α̂N)[â*1N, a2N] = (1/N) Σ_{i=1}^{N} ∂∆1(Z1i, α̂1N, α̂2N + τ a2N)[â*1N] / ∂τ |_{τ=0}    (2.45)

A simple sample analog of ||V*N|| is given by:

||V̂*N||² = (1/N) Σ_{i=1}^{N} | ∆2(Z2i, α̂2N)[â*2N] + ∆2(Z2i, α̂2N)[â*FN] + ∆1(Z1i, α̂1N, α̂2N)[â*1N] |²    (2.46)

Theorem 6 Hahn, Liao and Ridder (2018) establish the following results in their Theorem 4.1:

| ||V̂*N|| / ||V*N|| − 1 | = op(1)    (2.47)

Therefore,

√N [ρ(α̂1N, α̂2N) − ρ(α10, α20)] / ||V̂*N|| →d N(0, 1)    (2.48)

2.4.2.4 Explicit Expressions in our Model

Let ⟨a2N, b2N⟩_{A2} = R2(Z2i, α20)[a2N, b2N] ≡ E[−H2(Z2i, α20)[a2N, b2N]] for all a2N,
b2N ∈ H 2 = R1(Z1i,ααα0)[a1N, b1N] ≡ E [−H1(Z1i,ααα0)[a1N, b1N]] for all a1N, b1N ∈ H 1  where(cid:126)111Kx is a Kx × 1 vector of ones and S(.) are the  (cid:126)111Kx In the first step, define ¯S ≡ S(cid:48) 2N(.) N vector of basis functions in S 2N. Now for the first step, we will obtain the following ∆2(Z2i,ααα20)[¯S] = ˙qqq2i(Z2i,ααα20) = E (−H2(Z2i,ααα20)[¯S, ¯S]) = ¨qqq2(Z2i,ααα20) =  (cid:20)  (cid:20) x(cid:48) i ˙qqq2(Z2i,ααα20) (cid:18) xiβββ 2 −S(.)(cid:48) ixiβββ 2 ˙qqq2(Z2i,ααα20) (cid:19)(cid:21)(cid:20) y2i − Φ (cid:18) xiβββ 2 σ2 20(xi) 1− Φ Φ  (cid:19) (cid:18) xiβββ 2 σ2 20(xi) σ2 20(xi) E[x(cid:48) E[−S(.)(cid:48) (cid:18) xiβββ 2 σ2 20(xi) Φ E[−x(cid:48) ixi(¨qqq2(Z2i,ααα20))] (cid:20) ixi(xiβββ 2 ¨qqq2(Z2i,ααα20))] E[−S(.)(cid:48) (cid:19)(cid:21)(cid:20) (cid:18) xiβββ 2 (cid:19)(cid:21)2 (cid:18) xiβββ 2 σ2 20(xi) 1− Φ (cid:19)(cid:21) [σ 2 φ 20(xi)]2 σ2 20(xi) (cid:32) (cid:19)(cid:21)φ (cid:33) xiβββ 2 σ 2 20(xi) 1 σ 2 20(xi)  i(xiβββ 2 ¨qqq2(Z2i,ααα20))] iS(.)(cid:48) iS(.)((xiβββ 2)2 ¨qqq2(Z2i,ααα20))] a∗ 2N = ¯S[R2(Z2i,ααα20)[¯S, ¯S]]−1∂2ρ(ααα0)[¯S] 50 In the second step, define ¯G ≡ is a Kx × 1 vector of ones and G(.) are N vector of basis functions in G N. Further define ∂(cid:126)hhhi ≡ ∂(cid:126)hhhi ∂ααα20 the 1× Kg step, we will obtain the following expressions: Now for the second G(cid:48) N(.) 
(cid:126)111Kx1  where (cid:126)111Kx1   (cid:126)x(cid:48)  E[(cid:126)x(cid:48) E[−(cid:126)x(cid:48) iG(.)(cid:48) 1i(cid:126)x1i] (cid:19) (cid:18) i(cid:126)xi(cid:126)hhhi] E[−G(.)(cid:48) E[−G(.)(cid:48) iG(.)((cid:126)hhhi)2] (cid:126)x(cid:48) i1G(.)∂(cid:126)hhhi ¯S G(.)(cid:48)G(.)∂(cid:126)hhhi ¯S a∗ = ¯G[R1(Z1i,ααα0)[¯G, ¯G]]−1∂1ρ(ααα0)[¯G] 1N FN = ¯S[R2(Z2i,ααα20)[¯S, ¯S]]−1F(ααα0)[¯S, a∗ a∗ 1N 1iηi G(.)(cid:48) (cid:126)hhhiηi i (cid:126)hhhi] i  ] ∆1(Z1i,ααα0)[¯G] = E (−H1(Z1i,ααα0)[¯G, ¯G]) = F(ααα0)[¯S, ¯G] = 2.4.3 Numerical Equivalence It has been well established in the theoretical econometrics literature that the in many cases of semi-nonparametric estimation, one can obtain the semi-nonparametric variances using the standard formulas derived in the parametric estimation. These numerical equivalence also hold true in the estimation strategies that involve two step estimation procedures. This greatly simpli- fies the estimation of the semiparametric asymptotic variance that takes account of the first step estimation. In this section, we will use this numerical equivalence result to derive the expression of the asymptotic variances of our sieve estimators. Recall that in our model, at each step we have unknown functions forms: σ2(xi) in the first stage and {g0(xi), g1(xi)} in the second stage. In the sieve estimation procedure, we construct the sieve spaces that are defined in section 2.2. Suppose that we make the incorrect Assumption that the unknown functions in both the steps take the parametric form. In particular, suppose 51 that for the misspecification, we have the following models for the functions: σ20(.) = S(.)ΠΠΠσ2 g0(.) = G0(.)ΩΩΩg0 g1(.) = G1(.)ΩΩΩg1 where the terms are defined as in section 2.2 except that we have now suppressed the super- scripts for notational simplicity. 
The important feature of this misspecification is that the dimension of each parametric term equals the number of basis functions in the sieve estimation. Under this misspecification, the criterion functions become:
\[
Q_1(x_i, \beta_0, \beta_1, G_0(\cdot)\Omega_{g_0}, G_1(\cdot)\Omega_{g_1};\, S(\cdot)\Pi_{\sigma^2}) \equiv Q_1 \tag{2.49}
\]
\[
Q_2(x_i, \beta_2, S(\cdot)\Pi_{\sigma^2}) \equiv Q_2 \tag{2.50}
\]
Define
\[
\beta_{Q_1} \equiv \{\beta_0, \beta_1, \Omega_{g_0}, \Omega_{g_1}\}, \qquad \beta_{Q_2} \equiv \{\beta_2, \Pi_{\sigma^2}\}.
\]
To solve for the asymptotic variances, we cast the optimization problem into the GMM framework:
\[
\begin{pmatrix} \partial Q_1/\partial \hat\beta_{Q_1} \\ \partial Q_2/\partial \hat\beta_{Q_2} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
The asymptotic result is as follows:
\[
\sqrt{N} \begin{pmatrix} \hat\beta_{Q_1} - \beta_{Q_1} \\ \hat\beta_{Q_2} - \beta_{Q_2} \end{pmatrix} \to_d N(0, V)
\]
where
\[
V \equiv A^{-1} B A^{-1\prime},
\]
\[
A \equiv E \begin{pmatrix} \dfrac{\partial^2 Q_1}{\partial \beta_{Q_1}\partial \beta_{Q_1}'} & \dfrac{\partial^2 Q_1}{\partial \beta_{Q_1}\partial \beta_{Q_2}'} \\[1ex] \dfrac{\partial^2 Q_2}{\partial \beta_{Q_2}\partial \beta_{Q_1}'} & \dfrac{\partial^2 Q_2}{\partial \beta_{Q_2}\partial \beta_{Q_2}'} \end{pmatrix} \equiv \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix},
\]
\[
B \equiv E \begin{pmatrix} \dfrac{\partial Q_1}{\partial \beta_{Q_1}} \Big[\dfrac{\partial Q_1}{\partial \beta_{Q_1}}\Big]' & \dfrac{\partial Q_1}{\partial \beta_{Q_1}} \Big[\dfrac{\partial Q_2}{\partial \beta_{Q_2}}\Big]' \\[1ex] \dfrac{\partial Q_2}{\partial \beta_{Q_2}} \Big[\dfrac{\partial Q_1}{\partial \beta_{Q_1}}\Big]' & \dfrac{\partial Q_2}{\partial \beta_{Q_2}} \Big[\dfrac{\partial Q_2}{\partial \beta_{Q_2}}\Big]' \end{pmatrix} \equiv \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}.
\]
Using the results for partitioned matrices, we can further extract the asymptotic variance of $\sqrt{N}(\hat\beta_{Q_1} - \beta_{Q_1})$ as:
\[
\sqrt{N}(\hat\beta_{Q_1} - \beta_{Q_1}) \to_d N(0, V_{\beta_{Q_1}}) \tag{2.51}
\]
where
\[
V_{\beta_{Q_1}} \equiv A_{11} B_{11} A_{11}' + A_{11} B_{12} A_{12}' + A_{12} B_{21} A_{11}' + A_{12} B_{22} A_{12}'
\]
with $(A_{11}, A_{12})$ the first block row of $A^{-1}$:
\[
A_{11} \equiv \big( Q_{11} - Q_{12} Q_{22}^{-1} Q_{21} \big)^{-1}, \qquad
A_{12} \equiv -A_{11} Q_{12} Q_{22}^{-1}, \qquad
A_{21} \equiv -\big( Q_{22} - Q_{21} Q_{11}^{-1} Q_{12} \big)^{-1} Q_{21} Q_{11}^{-1}.
\]
The estimators of these expressions are simply the sample analogues of the population terms. We denote the estimator of $V_{\beta_{Q_1}}$ as $\hat V_{\beta_{Q_1}}$. The numerical equivalence result implies that
\[
\|\widehat V_N^*\|^2 = \hat V_{\beta_{Q_1}}. \tag{2.52}
\]

2.5 Empirical Illustration

We illustrate our methods using a subset of the data on student performance and Catholic school attendance from Altonji, Elder and Taber (2005).
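The partitioned-matrix extraction of $V_{\beta_{Q_1}}$ described in Section 2.4.3 is easy to verify numerically. The following sketch (our illustration, using arbitrary simulated placeholder matrices $A$ and $B$ rather than quantities from the model) checks that the block formulas reproduce the upper-left block of the full sandwich $A^{-1}BA^{-1\prime}$:

```python
import numpy as np

rng = np.random.default_rng(0)
k1, k2 = 3, 2                       # sizes of the beta_Q1 and beta_Q2 blocks (placeholder values)
k = k1 + k2

# Placeholder Jacobian A and score-variance B (any well-conditioned matrices work)
M = rng.normal(size=(k, k))
A = M @ M.T + k * np.eye(k)
S = rng.normal(size=(k, k))
B = S @ S.T + np.eye(k)

# Full sandwich variance V = A^{-1} B A^{-1}'
Ainv = np.linalg.inv(A)
V = Ainv @ B @ Ainv.T

# Partitioned extraction: (A11, A12) is the first block row of A^{-1}
Q11, Q12, Q21, Q22 = A[:k1, :k1], A[:k1, k1:], A[k1:, :k1], A[k1:, k1:]
A11 = np.linalg.inv(Q11 - Q12 @ np.linalg.inv(Q22) @ Q21)   # Schur complement inverse
A12 = -A11 @ Q12 @ np.linalg.inv(Q22)
B11, B12, B21, B22 = B[:k1, :k1], B[:k1, k1:], B[k1:, :k1], B[k1:, k1:]

V_bQ1 = A11 @ B11 @ A11.T + A11 @ B12 @ A12.T + A12 @ B21 @ A11.T + A12 @ B22 @ A12.T
assert np.allclose(V_bQ1, V[:k1, :k1])   # matches the upper-left block of the full sandwich
```

The identity holds for any invertible $A$ and conformable $B$, which is why the parametric sandwich formula can be recycled wholesale for the sieve variance.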
We begin with the model
\[
math12_i = \alpha_1 + x_{1i}\beta_0 + \delta\, cathhs_i + cathhs_i\, x_{1i}\beta_1 + v_{0i} + cathhs_i\, v_{1i} \tag{2.53}
\]
where $x_{1i}$ includes mother's education, father's education, and the log of family income. Our primary parameter of interest is $\delta$, the coefficient on $cathhs$, a binary indicator for attending a Catholic high school. The instruments for $cathhs$ are the distance from the nearest Catholic high school divided into five bins. Thus the switching indicator $cathhs$ is modeled as:
\[
cathhs_i = \mathbb{1}[\alpha_2 + x_i\beta_2 - v_{2i} > 0] \tag{2.54}
\]
where $x_i$ contains $x_{1i}$ and four distance dummy variables.

Preliminary Results:² Following our estimation procedure, we estimate in two steps. In the first step, we estimate the binary response model for $cathhs$ and obtain the generalized residuals, denoted $\widehat{gr}$. In the second step, we run the estimating equation that contains the correction terms accounting for the endogeneity of $cathhs$. The results of the second-step estimation are given in Table 2.1, whose four columns contain the second-step estimation results from four estimation procedures.

In the first column, we assume that the model for the switching indicator $cathhs$ is homoskedastic and that the model for the outcome variable $math12$ has constant coefficients. Thus we run a simple probit in the first step, obtain the generalized residuals, and then run a simple OLS of $math12_i$ on
\[
1,\quad cathhs_i,\quad x_{1i},\quad cathhs_i \times (x_{1i} - \bar x_1),\quad \widehat{gr}_2,\quad cathhs_i \times \widehat{gr}_2 \tag{2.55}
\]
where centering $x_{1i}$ ensures that the coefficient on $cathhs$ gives the average treatment effect. The estimates in the first column are similar to those obtained in Wooldridge (2015): the estimated average treatment effect in the population is negative and not statistically different from zero.

²All the estimation is done in Stata 13, and the standard errors are based on 1000 bootstrap replications.
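To make the column (1) two-step logic concrete, here is a minimal Python sketch on simulated data (variable names, data-generating values, and the true zero ATE are illustrative stand-ins of ours, not the Altonji-Elder-Taber data; the generalized residual follows the usual probit convention, up to sign):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n = 5000
x1 = rng.normal(size=(n, 3))                  # stand-ins for mthed, fthed, lfaminc
z = rng.normal(size=(n, 4))                   # stand-ins for the distance-bin instruments
X = np.column_stack([np.ones(n), x1, z])      # first-step regressors

beta2 = np.r_[0.2, 0.1, 0.1, 0.1, 0.5, -0.3, 0.4, -0.2]
v2 = rng.normal(size=n)
treat = (X @ beta2 - v2 > 0).astype(float)    # switching indicator (e.g. cathhs)

# endogeneity: outcome error correlated with the treatment-equation error
v0 = 0.5 * v2 + rng.normal(size=n)
y = 1.0 + x1 @ np.r_[0.6, 0.9, 1.8] + 0.0 * treat + v0   # true ATE = 0 by construction

# Step 1: probit MLE for the treatment equation
def negll(b):
    p = np.clip(norm.cdf(X @ b), 1e-10, 1 - 1e-10)
    return -(treat * np.log(p) + (1 - treat) * np.log(1 - p)).sum()
b2 = minimize(negll, np.zeros(X.shape[1]), method="BFGS").x

# Generalized residuals: gr = y2*lambda(xb) - (1-y2)*lambda(-xb), lambda = phi/Phi
xb = X @ b2
lam = lambda s: norm.pdf(s) / norm.cdf(s)
gr = treat * lam(xb) - (1 - treat) * lam(-xb)

# Step 2: OLS with centered interactions; the coefficient on treat estimates the ATE
x1c = x1 - x1.mean(axis=0)
W = np.column_stack([np.ones(n), treat, x1, treat[:, None] * x1c, gr, treat * gr])
coef, *_ = np.linalg.lstsq(W, y, rcond=None)
ate_hat = coef[1]
```

In practice the second-step standard errors must account for the estimated first step, which is why the empirical results below use the bootstrap.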
In the second model we again assume that the model for the switching indicator $cathhs$ is homoskedastic, so the first step is again a probit. However, we now allow for random coefficients on all the explanatory variables. So in the second stage, we run OLS of $math12_i$ on
\[
1,\ cathhs_i,\ x_{1i},\ cathhs_i \times (x_{1i} - \bar x_1),\ \widehat{gr}_2,\ cathhs_i \times \widehat{gr}_2,\ \widehat{gr}_2 \times (x_{1i} - \bar x_1),\ cathhs_i \times \widehat{gr}_2 \times (x_{1i} - \bar x_1) \tag{2.56}
\]
The results of the second-step estimation are given in column 2 of Table 2.1 and replicate the results obtained in Wooldridge (2015): the average treatment effect in the entire population is essentially zero.

The third model introduces heteroskedasticity in the reduced-form model for the switching indicator $cathhs$. As a result, our estimating equation in the second step takes the form of Equation (12). We estimate the model by both parametric and sieve estimation in both steps. Column 3 of Table 2.1 gives the results from the parametric estimation. The first step is a simple hetprobit of $cathhs$ on $\{1, x_i\}$, with the heteroskedasticity a function of $x_i$. In the second step, the unknown functions are assumed to take the form given in Assumption 3.2. As before, we center the explanatory variables to make the coefficient on $cathhs$ consistent for the average treatment effect. We see that the average treatment effect is again not significantly different from zero.

Finally, in column 4 of Table 2.1, we carry out both the first-step and the second-step estimation through sieve estimation. In the first step, the sieve space for the heteroskedastic variance function $\sigma_2^2(x_i)$ is given as:
\[
\mathcal{S}_{2N} = \Big\{ \exp\big( S_{K_{\sigma},N}(x_i)\Pi_N \big) : \Pi_N \in \mathbb{R}^{K_{\sigma_N}} \Big\} \tag{2.57}
\]
where $S_{K_{\sigma},N}(x_i)$ is a vector of basis functions containing the elements of the second-order polynomial in $x_i$. Given this sieve space, we estimate the first-stage model using a flexible hetprobit to obtain the estimates of the generalized residuals.
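A heteroskedastic probit with an exponential-index variance function of the form $\exp(S(x_i)\Pi)$ can be estimated by direct maximum likelihood. The following Python sketch is ours and uses a deliberately small, hypothetical basis $S(x_i)$ (slopes only, for scale identification) rather than the full second-order polynomial:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 4000
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
S = x[:, 1:]                     # variance basis: no constant, so the scale is identified
beta = np.r_[0.3, 0.8, -0.5]
pi = np.r_[0.4, 0.2]
sigma = np.exp(S @ pi)           # sigma_2(x) = exp(S(x) Pi)
y2 = (x @ beta - sigma * rng.normal(size=n) > 0).astype(float)

# hetprobit log-likelihood: P(y2 = 1 | x) = Phi(x beta / exp(S Pi))
def negll(theta):
    b, p = theta[:3], theta[3:]
    pr = np.clip(norm.cdf((x @ b) / np.exp(S @ p)), 1e-10, 1 - 1e-10)
    return -(y2 * np.log(pr) + (1 - y2) * np.log(1 - pr)).sum()

fit = minimize(negll, np.zeros(5), method="BFGS")
b_hat, pi_hat = fit.x[:3], fit.x[3:]
```

Stata's `hetprobit` command fits the same multiplicative-heteroskedasticity specification; the sieve version simply enlarges the basis $S(\cdot)$ with the sample size.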
In the second step, we define the sieve spaces for the unknown functions $g_0(\cdot), g_1(\cdot)$ as follows:
\[
\mathcal{G}_{0N} = \Big\{ G_{0,K_{g_0},N}(x_i)\Omega_{N,02} : \Omega_{N,02} \in \mathbb{R}^{K_{g_0,N}} \Big\}, \qquad
\mathcal{G}_{1N} = \Big\{ G_{1,K_{g_1},N}(x_i)\Omega_{N,12} : \Omega_{N,12} \in \mathbb{R}^{K_{g_1,N}} \Big\} \tag{2.58}
\]
where $G_{0,K_{g_0},N}(x_i)$ and $G_{1,K_{g_1},N}(x_i)$ are each vectors of basis functions containing the elements of the second-order polynomial in $x_{1i}$. After defining the sieve spaces for the unknown functions, we run a flexible OLS in the second-step estimation with the explanatory variables appropriately demeaned. The results from the second-step estimation are given in column 4 of Table 2.1. We once again find that attending a Catholic high school does not have a significant average treatment effect on the scores.

Table 2.1: Empirical Illustration Results
Columns: (1) Hom_Const, (2) Hom_Rand, (3) Het_para, (4) Het_Sieve. Each column reports second-step coefficients on $cathhs$, mthed, fthed, lfaminc, their centered interactions with $cathhs$, the generalized residual $\widehat{gr}$ and its interactions, and the intercept. [Full cell-by-cell coefficient listing not reproduced.] Observations: 7,444 in every column. R-squared: 0.186, 0.185, 0.186, and 0.189 in columns (1)-(4), respectively. Bootstrapped standard errors in parentheses; *** p<0.01, ** p<0.05, * p<0.1.

We also compute the average treatment effect on the untreated and the average treatment effect on the treated for the heteroskedastic sieve estimation, using the method given in Wooldridge (2015) and running separate regressions. The results are given in Table 2.2. These estimates give deeper insight into why the average treatment effect in the population is not statistically different from zero. The ATE on the Treated is positive and significantly different from zero, whereas the ATE on the Untreated is not. Because only 452 observations are treated compared with 6,992 untreated, the population ATE is essentially the share-weighted average of the two, $(452 \times 2.775 + 6{,}992 \times 0.297)/7{,}444 \approx 0.447$, which is not significantly different from zero.

Table 2.2: Treatment Effects

                       Observed     Bootstrap                   Normal-based
                       Coefficient  Std. Error   z      P>|z|   95% confidence interval
ATE on the Treated     2.774772     1.319732     2.10   0.036   [0.1881439, 5.3614]
ATE on the Untreated   0.2969452    2.088351     0.14   0.887   [-3.796147, 4.390037]
ATE                    0.4473989    1.97412      0.23   0.821   [-3.421804, 4.316602]

Treated observations = 452; untreated observations = 6,992.

2.6 Technical Details

2.6.1 Asymptotic Variance for the Parametric Estimators

The asymptotic variance in the parametric setting can be obtained using M-estimation methods. We first define the notation in the parametric setting.
We have the following notation:
- $w_i \equiv \{x_{1i},\ y_{2i}x_{1i},\ h(y_{2i},x_i)x_i,\ y_{2i}h(y_{2i},x_i)x_i\}$
- $\theta_1 \equiv \{\beta_0, \beta_1, \Omega_{02}, \Omega_{12}\}$
- $\theta_2 \equiv \{\beta_2, \Pi\}$
- $\theta \equiv \{\theta_1, \theta_2\}$

Next, we define the estimation problem in the M-estimation framework:
\[
q(w_i, \theta_1; \theta_2) \equiv \begin{pmatrix} q_1(w_i, \theta_1; \theta_2) \\ q_2(w_i, \theta_2) \\ q_3(w_i, \theta_2) \end{pmatrix} \tag{2.59}
\]
where, writing $z_i \equiv x_i\beta_2/\sigma_2(x_i,\Pi)$,
\[
q_1(w_i, \theta_1; \theta_2) \equiv -\big[ y_{1i} - x_{1i}\beta_0 - y_{2i}x_{1i}\beta_1 - \hat h_i x_i \Omega_{02} - y_{2i}\hat h_i x_i \Omega_{12} \big] w_i' \equiv -\eta_i w_i' \tag{2.60}
\]
\[
q_2(w_i, \theta_2) \equiv \bigg[ \frac{\phi(z_i)}{\Phi(z_i)[1 - \Phi(z_i)]} \bigg]\big[ y_{2i} - \Phi(z_i) \big]\, \frac{x_i'}{\sigma_2(x_i,\Pi)} \tag{2.61}
\]
\[
q_3(w_i, \theta_2) \equiv \bigg[ \frac{\phi(z_i)}{\Phi(z_i)[1 - \Phi(z_i)]} \bigg]\big[ y_{2i} - \Phi(z_i) \big]\, \tilde x_i\, x_i' \tag{2.62}
\]
where $\tilde x_i \equiv -x_i\beta_2\,[\exp(x_i\Pi)]^{-1}$. The vector $q(\cdot)$ collects the first-order conditions of our estimation criterion functions, which implies:
\[
E \begin{pmatrix} q_1(w_i, \theta_{01}; \theta_{02}) \\ q_2(w_i, \theta_{02}) \\ q_3(w_i, \theta_{02}) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{2.63}
\]
The asymptotic distribution of our parametric estimators is given as:
\[
\sqrt{N} \begin{pmatrix} \hat\theta_1 - \theta_{01} \\ \hat\theta_2 - \theta_{02} \end{pmatrix} \to_d N(0, V) \tag{2.64}
\]
where
\[
V \equiv A^{-1} B A'^{-1}, \qquad A \equiv E\bigg[ \frac{\partial q(w_i,\theta_1;\theta_2)}{\partial \theta} \bigg], \qquad B \equiv E\big[ q(w_i,\theta_1;\theta_2)\, q(w_i,\theta_1;\theta_2)' \big].
\]
In addition, we define $\sigma_2^2(x_i,\Pi) \equiv \operatorname{Var}(v_{2i}|x_i)$. In the rest of this subsection, we define the terms of $A$ and $B$.

For $A$, the Jacobian has the block structure
\[
\frac{\partial q(w_i,\theta_1;\theta_2)}{\partial \theta} = \begin{pmatrix} \partial q_1/\partial\theta_1 & \partial q_1/\partial\beta_2 & \partial q_1/\partial\Pi \\ \partial q_2/\partial\theta_1 & \partial q_2/\partial\beta_2 & \partial q_2/\partial\Pi \\ \partial q_3/\partial\theta_1 & \partial q_3/\partial\beta_2 & \partial q_3/\partial\Pi \end{pmatrix}
\]
with blocks
\[
\frac{\partial q_1}{\partial\theta_1} = w_i'w_i, \qquad
\frac{\partial q_1}{\partial\beta_2} = w_i'\,\big( x_i\Omega_{02} + y_{2i}\, x_i\Omega_{12} \big)\, \lambda^d(z_i)\, \frac{x_i}{\sigma_2(x_i,\Pi)}, \qquad
\frac{\partial q_1}{\partial\Pi} = w_i'\,\big( x_i\Omega_{02} + y_{2i}\, x_i\Omega_{12} \big)\, \lambda^d(z_i)\, \tilde x_i\, x_i,
\]
\[
\frac{\partial q_2}{\partial\theta_1} = 0, \qquad \frac{\partial q_3}{\partial\theta_1} = 0,
\]
\[
\frac{\partial q_2}{\partial\beta_2} = -\bigg[ \frac{\phi(z_i)^2}{\Phi(z_i)[1-\Phi(z_i)]} \bigg] \frac{x_i'x_i}{\sigma_2^2(x_i,\Pi)} + \tilde q_2(w_i,\theta_2), \qquad
\frac{\partial q_2}{\partial\Pi} = -\bigg[ \frac{\phi(z_i)^2}{\Phi(z_i)[1-\Phi(z_i)]} \bigg] \frac{\tilde x_i\, x_i'x_i}{\sigma_2(x_i,\Pi)} + \tilde q_2(w_i,\theta_2),
\]
\[
\frac{\partial q_3}{\partial\beta_2} = -\bigg[ \frac{\phi(z_i)^2}{\Phi(z_i)[1-\Phi(z_i)]} \bigg] \frac{\tilde x_i\, x_i'x_i}{\sigma_2(x_i,\Pi)} + \tilde q_3(w_i,\theta_2), \qquad
\frac{\partial q_3}{\partial\Pi} = -\bigg[ \frac{\phi(z_i)^2}{\Phi(z_i)[1-\Phi(z_i)]} \bigg] \tilde x_i^2\, x_i'x_i + \tilde q_3(w_i,\theta_2),
\]
where $\lambda^d(\cdot)$ denotes the first derivative of the inverse Mills ratio, and $\tilde q_2(w_i,\theta_2)$ and $\tilde q_3(w_i,\theta_2)$ are terms that vanish when we take expectations because they contain the factor $[y_{2i} - \Phi(z_i)]$.

Next we define $B = E[q(w_i,\theta_1;\theta_2)\,q(w_i,\theta_1;\theta_2)']$:
\[
B = E \begin{pmatrix} q_1q_1' & q_1q_2' & q_1q_3' \\ q_2q_1' & q_2q_2' & q_2q_3' \\ q_3q_1' & q_3q_2' & q_3q_3' \end{pmatrix}.
\]
Suppressing the arguments of the functions for expositional simplicity and writing $\kappa_i \equiv \dfrac{\phi(z_i)\,[y_{2i}-\Phi(z_i)]}{\Phi(z_i)[1-\Phi(z_i)]}$ to lighten notation, the blocks are:
\[
q_1q_1' = \eta_i^2\, w_i'w_i, \qquad
q_1q_2' = -\eta_i\, \kappa_i\, \frac{w_i'x_i}{\sigma_2(x_i,\Pi)}, \qquad
q_1q_3' = -\eta_i\, \kappa_i\, \tilde x_i\, w_i'x_i,
\]
\[
q_2q_2' = \kappa_i^2\, \frac{x_i'x_i}{\sigma_2^2(x_i,\Pi)}, \qquad
q_2q_3' = \kappa_i^2\, \tilde x_i\, \frac{x_i'x_i}{\sigma_2(x_i,\Pi)}, \qquad
q_3q_3' = \kappa_i^2\, \tilde x_i^2\, x_i'x_i,
\]
with the remaining blocks given by symmetry.

2.6.2 Hölder Class

Assume that $X$, the support of $x$, is compact.³ Denote $x \equiv (x_1, \ldots, x_K)$, where $\dim(x) = K$, and let $0 < \gamma \le 1$. Define $|x|_e \equiv \big( \sum_{k=1}^K x_k^2 \big)^{1/2}$ to be the Euclidean norm, and let $f : X \to \mathbb{R}$ be a real-valued function.

- Hölder condition: $f$ is said to satisfy a Hölder condition with exponent $\gamma$ if $\exists\, c > 0$ such that $|f(x) - f(y)| \le c\,|x - y|_e^\gamma$ for all $\{x, y\} \in X$.

Next, let $\alpha \equiv (\alpha_1, \ldots, \alpha_K)$ with $[\alpha] \equiv \alpha_1 + \cdots + \alpha_K$. A differential operator is defined as $D^\alpha \equiv \dfrac{\partial^{[\alpha]}}{\partial x_1^{\alpha_1} \cdots \partial x_K^{\alpha_K}}$. Let $m$ be a non-negative integer and let $p = m + \gamma$.

- $f$ is said to be $p$-smooth if (a) $f$ is $m$ times continuously differentiable, and (b) $D^\alpha f$ satisfies a Hölder condition with exponent $\gamma$ for all $\alpha$ with $[\alpha] = m$.

- Hölder ball: Define the Hölder class to be the class of $p$-smooth functions $f$, denoted $\Lambda^p(X)$, and define $C^m(X)$ to be the space of $m$-times continuously differentiable functions $f$.

³We suppress the subscript $i$ to simplify the notation.
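As a simple illustrative example (ours, not from the source): the absolute-value function is $p$-smooth with $p = 1$ even though it is not differentiable at the origin. For $f(x) = |x|$ on $X = [-1, 1]$, the reverse triangle inequality gives

```latex
\[
|f(x) - f(y)| \;=\; \bigl|\,|x| - |y|\,\bigr| \;\le\; |x - y|
\qquad \forall\, x, y \in [-1, 1],
\]
```

so $f$ satisfies a Hölder condition with exponent $\gamma = 1$ and constant $c = 1$; since $m = 0$ (so $D^\alpha f = f$), $f$ is $p$-smooth with $p = 1$ despite failing to be continuously differentiable.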
The Hölder ball with smoothness $p = m + \gamma$ is defined as:
\[
\Lambda_c^p(X) = \bigg\{ f \in C^m(X) : \sup_{[\alpha] \le m}\, \sup_{x \in X} |D^\alpha f(x)| \le c;\ \ \sup_{[\alpha] = m}\, \sup_{x, y \in X,\, x \neq y} \frac{|D^\alpha f(x) - D^\alpha f(y)|}{|x - y|_e^\gamma} \le c \bigg\}.
\]
The key assumption in our model is that the unknown functions to be estimated belong to the Hölder class. Formally, we assume that
\[
h \in \Lambda_c^p(X). \tag{2.65}
\]

2.7 Application of the Estimation Strategy to other econometric models

The estimation strategy of this paper can be applied to other econometric models as well. In this section, we describe how our estimation strategy can be extended to three econometric models. The first model incorporates individual-specific slopes in the outcome equation along with a heteroskedastic reduced-form model for the treatment variable. The second model is a standard sample selection model with heteroskedastic errors. Finally, we consider a model in which the primary equation has a binary endogenous variable with a heteroskedastic reduced form. We show how our estimation strategy, both parametric and sieve, can be extended to estimate the parameters of these models.

2.7.1 Heterogeneous Coefficients Model

The first model that we consider is an endogenous switching model with heterogeneous coefficients in the primary equation for the outcome variable. As before, we also allow the reduced-form equation of the treatment variable to have conditionally heteroskedastic errors. Allowing individual-specific slopes in the primary outcome equation lets us capture heterogeneity in the treatment effects. However, individual-specific slopes also become another source of endogeneity of the treatment. As we will see, conditional linear predictors allow us to model this endogeneity in a similar way as before, letting us incorporate a substantial amount of heterogeneity in our econometric model.
Model: We define the primary outcome equation for each $i = 1, \ldots, N$ as:

Assumption 2.7.1
\[
y_{1i} = x_{1i}b_{0i} + y_{2i}x_{1i}b_{1i} + v_{0i} + y_{2i}v_{1i} \tag{2.66}
\]
where $b_{gi} \equiv \big( b_{1,gi}, \ldots, b_{K_1,gi} \big)'$ denotes the individual-specific slopes in each regime $g \in \{0,1\}$.

The primary outcome equation is derived as before from the counterfactual framework, but with individual-specific slopes:
\[
y_{1i}^{(0)} = x_{1i}\gamma_i^{(0)} + u_i^{(0)}, \qquad
y_{1i}^{(1)} = x_{1i}\gamma_i^{(1)} + u_i^{(1)}, \qquad
y_{1i} = (1 - y_{2i})\,y_{1i}^{(0)} + y_{2i}\,y_{1i}^{(1)}.
\]
In the random coefficient framework, our primary parameters of interest are the average population effects $\beta_g \equiv E[b_{gi}]$. Thus we can write $b_{gi} = \beta_g + d_{gi}$, where $d_{gi} \equiv \big( d_{1,gi}, \ldots, d_{K_1,gi} \big)'$ with $E[d_{gi}] = 0$ for $g \in \{0,1\}$ by construction. Substituting, the outcome equation (2.66) becomes:
\[
y_{1i} = x_{1i}\beta_0 + y_{2i}x_{1i}\beta_1 + x_{1i}d_{0i} + v_{0i} + y_{2i}x_{1i}d_{1i} + y_{2i}v_{1i} \tag{2.67}
\]
The switching indicator $y_{2i}$ is a binary variable modeled as before in Assumption 2.2.

Estimating Equation: To obtain the estimating equation, we first write:
\[
E[y_{1i}|x_i, v_{2i}] = x_{1i}\beta_0 + y_{2i}x_{1i}\beta_1 + x_{1i}E[d_{0i}|x_i, v_{2i}] + y_{2i}x_{1i}E[d_{1i}|x_i, v_{2i}] + E[v_{0i}|x_i, v_{2i}] + y_{2i}E[v_{1i}|x_i, v_{2i}] \tag{2.68}
\]
The expressions for $E[v_{0i}|x_i, v_{2i}]$ and $E[v_{1i}|x_i, v_{2i}]$ can be obtained from Assumption 2.2.3. In this estimating equation, we have two additional terms, $E[d_{0i}|x_i, v_{2i}]$ and $E[d_{1i}|x_i, v_{2i}]$, which stem from the correlation between the heterogeneous coefficients and the endogenous treatment. To obtain appropriate expressions for these additional correction terms, we extend the CLP approximation:

Assumption 2.7.2 For $i = 1, \ldots, N$ and for $g \in \{0,1\}$,
\[
E[d_{gi}|v_{2i}, x_i] = L[d_{gi}|v_{2i}, x_i] = \bigg( \frac{\sigma_{d_g 2}(x_i)}{\sigma_2^2(x_i)} \bigg) v_{2i} \tag{2.69}
\]
where $\sigma_{d_g 2}(x_i) \equiv \big( \sigma_{d_{g1} 2}(x_i), \ldots, \sigma_{d_{gK_1} 2}(x_i) \big)'$ and
\[
\sigma_{d_{gj} 2}(x_i) \equiv \operatorname{Cov}(d_{gi,j},\, v_{2i}|x_i) \tag{2.70}
\]
for $g \in \{0,1\}$ and $j = 1, 2, \ldots, K_1$. Substituting the expressions for the correction terms, we get:
\[
E[y_{1i}|x_i, v_{2i}] = x_{1i}\beta_0 + y_{2i}x_{1i}\beta_1 + x_{1i}\bigg( \frac{\sigma_{d_0 2}(x_i)}{\sigma_2^2(x_i)} \bigg) v_{2i} + y_{2i}x_{1i}\bigg( \frac{\sigma_{d_1 2}(x_i)}{\sigma_2^2(x_i)} \bigg) v_{2i} + \bigg( \frac{\sigma_{02}(x_i)}{\sigma_2^2(x_i)} \bigg) v_{2i} + y_{2i}\bigg( \frac{\sigma_{12}(x_i)}{\sigma_2^2(x_i)} \bigg) v_{2i} \tag{2.71}
\]
Next, using calculations similar to those in Section 2.3, we get:
\[
E[y_{1i}|x_i, y_{2i}] = x_{1i}\beta_0 + y_{2i}x_{1i}\beta_1 + x_{1i}\bigg( \frac{\sigma_{d_0 2}(x_i)}{\sigma_2(x_i)} \bigg) h(y_{2i}, x_i) + y_{2i}x_{1i}\bigg( \frac{\sigma_{d_1 2}(x_i)}{\sigma_2(x_i)} \bigg) h(y_{2i}, x_i) + \bigg( \frac{\sigma_{02}(x_i)}{\sigma_2(x_i)} \bigg) h(y_{2i}, x_i) + y_{2i}\bigg( \frac{\sigma_{12}(x_i)}{\sigma_2(x_i)} \bigg) h(y_{2i}, x_i)
\]
where $h(y_{2i}, x_i)$ is the generalized residual as defined previously. Next, define:
\[
f_{d_g}(x_i) \equiv \big( f_{d_{g1}}(x_i), \ldots, f_{d_{gK_1}}(x_i) \big)' \equiv \bigg( \frac{\sigma_{d_{g1} 2}(x_i)}{\sigma_2(x_i)}, \ldots, \frac{\sigma_{d_{gK_1} 2}(x_i)}{\sigma_2(x_i)} \bigg)' \quad \text{for } g \in \{0,1\},
\]
\[
f_{v_g}(x_i) \equiv \bigg( \frac{\sigma_{g2}(x_i)}{\sigma_2(x_i)} \bigg) \quad \text{for } g \in \{0,1\}.
\]
Using this notation, we get our final estimating equation:
\[
y_{1i} = x_{1i}\beta_0 + y_{2i}x_{1i}\beta_1 + x_{1i}f_{d_0}(x_i)h(y_{2i}, x_i) + y_{2i}x_{1i}f_{d_1}(x_i)h(y_{2i}, x_i) + f_{v_0}(x_i)h(y_{2i}, x_i) + y_{2i}f_{v_1}(x_i)h(y_{2i}, x_i) + \eta_i \tag{2.72}
\]
where $E[\eta_i|x_i, y_{2i}] = 0$ by construction.

Parametric Estimation: The estimation proceeds in two steps as before, with the first step being a hetprobit under Assumption 3.1, from which we obtain the estimator of $h(y_{2i}, x_i)$. In the second step, we must specify functional forms for the additional terms $f_{d_0}(x_i)$ and $f_{d_1}(x_i)$; that is, we specify functional forms for $f_{d_{gj}}(x_i)$ for each $g \in \{0,1\}$ and all $j \in \{1, \ldots, K_1\}$. The simplest functional form is a linear-in-parameters specification:

Assumption 2.7.3 For all $i = 1, \ldots, N$, each $g \in \{0,1\}$, and all $j = 1, \ldots, K_1$, assume
\[
f_{d_{gj}}(x_i) = x_i\omega_{d_{gj}} \tag{2.73}
\]
where each $\omega_{d_{gj}}$ is a $K \times 1$ vector of coefficients. This implies $f_{d_g}(x_i) = \mathbb{X}_i\Omega_{d_g}$, where
\[
\mathbb{X}_i \equiv \begin{pmatrix} x_i & 0 & \cdots & 0 \\ 0 & x_i & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & x_i \end{pmatrix} \quad \text{and} \quad
\Omega_{d_g} \equiv \begin{pmatrix} \omega_{d_{g1}} \\ \omega_{d_{g2}} \\ \vdots \\ \omega_{d_{gK_1}} \end{pmatrix}.
\]
Substituting the functional forms of the additional terms, the terms defined in Assumption 3.1, and the estimators from the first step, we obtain the parametric estimating equation:
\[
y_{1i} = x_{1i}\beta_0 + y_{2i}x_{1i}\beta_1 + x_{1i}\mathbb{X}_i\Omega_{d_0}\hat h_i + y_{2i}x_{1i}\mathbb{X}_i\Omega_{d_1}\hat h_i + \hat h_i x_i\Omega_{02} + y_{2i}\hat h_i x_i\Omega_{12} + \eta_i \tag{2.74}
\]
In the second step, equation (2.74) is estimated by least squares.

Sieve Estimation: The first step of the sieve estimation also follows the same procedure as in Section 2.2, defining the sieve space $\mathcal{A}_{2N}$ and then estimating the resulting MLE. In the second stage, we extend the sieve space to accommodate the additional functional forms arising from the individual-specific slopes. The sieve spaces for $\{f_{v_0}(x_i), f_{v_1}(x_i)\}$ are similar to those defined for $\{g_0(x_i), g_1(x_i)\}$. The sieve spaces for $\{f_{d_0}(x_i), f_{d_1}(x_i)\}$ are also straightforward, but a little more notationally involved.

In particular, first define sieve spaces for each component of $f_{d_g}(x_i)$:
\[
\bigg\{ f_{d_{gj}}(x_i) = D_{gj}(x_i)\omega_{N,d_{gj}} \equiv D_{gj}^{K_N^{d_{gj}}}(x_i)\omega_{N,d_{gj}} : \omega_{N,d_{gj}} \in \mathbb{R}^{K_N^{d_{gj}}} \bigg\} \tag{2.75}
\]
for each $j \in \{1, \ldots, K_1\}$ and $g \in \{0,1\}$, where $D_{gj}^{K_N^{d_{gj}}}(\cdot)$ is a $1 \times K_N^{d_{gj}}$ vector of basis functions. Next, let $K_N^{d_g} \equiv K_N^{d_{g1}} + K_N^{d_{g2}} + \cdots + K_N^{d_{gK_1}}$ and define:
\[
\underset{(K_1 \times K_N^{d_g})}{D_g(x_i)} \equiv \begin{pmatrix} D_{g1}(x_i) & 0 & \cdots & 0 \\ 0 & D_{g2}(x_i) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & D_{gK_1}(x_i) \end{pmatrix} \quad \text{and} \quad
\underset{(K_N^{d_g} \times 1)}{\Omega_{N,d_g}} \equiv \begin{pmatrix} \omega_{N,d_{g1}} \\ \omega_{N,d_{g2}} \\ \vdots \\ \omega_{N,d_{gK_1}} \end{pmatrix} \tag{2.76}
\]
We then have the sieve space for $f_{d_g}(x_i)$, $g \in \{0,1\}$:
\[
\bigg\{ f_{d_g}(x_i) = D_g(x_i)\Omega_{N,d_g} : \Omega_{N,d_g} \in \mathbb{R}^{K_N^{d_g}} \bigg\} \tag{2.77}
\]
In particular, the sieve space in our second stage for the parameters $\{\beta_0, \beta_1, f_{d_0}(x_i), f_{d_1}(x_i), f_{v_0}(x_i), f_{v_1}(x_i)\}$ is now defined as:
\[
\mathcal{A}_{1N} = \mathcal{B}_1 \times \Big\{ f_{v_0}(\cdot) = G_0^{K_N^{v_0}}(x_i)\Omega_{N,02} : \Omega_{N,02} \in \mathbb{R}^{K_N^{v_0}} \Big\} \times \Big\{ f_{v_1}(\cdot) = G_1^{K_N^{v_1}}(x_i)\Omega_{N,12} : \Omega_{N,12} \in \mathbb{R}^{K_N^{v_1}} \Big\} \times \Big\{ f_{d_0}(x_i) = D_0(x_i)\Omega_{N,d_0} : \Omega_{N,d_0} \in \mathbb{R}^{K_N^{d_0}} \Big\} \times \Big\{ f_{d_1}(x_i) = D_1(x_i)\Omega_{N,d_1} : \Omega_{N,d_1} \in \mathbb{R}^{K_N^{d_1}} \Big\} \tag{2.78}
\]
where $G_0^{K_N^{v_0}}(\cdot) \equiv \big( g_0^1(\cdot), \ldots, g_0^{K_N^{v_0}}(\cdot) \big)$ is a $1 \times K_N^{v_0}$ vector of basis functions and $G_1^{K_N^{v_1}}(\cdot) \equiv \big( g_1^1(\cdot), \ldots, g_1^{K_N^{v_1}}(\cdot) \big)$ is a $1 \times K_N^{v_1}$ vector of basis functions. Plugging into equation (2.72) and suppressing the superscripts:
\[
y_{1i} = x_{1i}\beta_0 + y_{2i}x_{1i}\beta_1 + x_{1i}D_0(x_i)\Omega_{N,d_0}\hat h_i + y_{2i}x_{1i}D_1(x_i)\Omega_{N,d_1}\hat h_i + G_0(x_i)\Omega_{N,02}\hat h_i + y_{2i}G_1(x_i)\Omega_{N,12}\hat h_i + \eta_i \tag{2.79}
\]
As equation (2.79) suggests, in the second step we obtain the estimators by regressing $y_{1i}$ on
\[
x_{1i},\quad y_{2i}x_{1i},\quad x_{1i}D_0(x_i)\hat h_i,\quad y_{2i}x_{1i}D_1(x_i)\hat h_i,\quad G_0(x_i)\hat h_i,\quad y_{2i}G_1(x_i)\hat h_i.
\]

2.7.2 Binary Endogenous Variable

The next model that we consider is a model with a binary endogenous variable. Adapting the estimation strategy lets us incorporate heterogeneity in the reduced form by allowing heteroskedastic errors.

Model: The model comprises a primary equation for the outcome variable $y_{1i}$ that is assumed to be linear in parameters:

Assumption 2.7.4 For all $i = 1, \ldots, N$,
\[
y_{1i} = x_{1i}\beta_1 + y_{2i}\beta_y + v_{1i} \tag{2.80}
\]
where $y_{2i}$ is a binary endogenous variable and $x_{1i}$ are the exogenous variables. We also have a linear reduced-form model for $y_{2i}$:

Assumption 2.7.5
\[
y_{2i} = \mathbb{1}[x_i\beta_2 - v_{2i} < 0] \tag{2.81}
\]
where $x_i$ is a vector of all the exogenous variables, which includes $x_{1i}$.
The formal version of the exogeneity is given as:

Assumption 2.7.6
\[
E[v_{1i}|x_i] = 0 \tag{2.82}
\]
The next assumption gives the heteroskedastic structure of the errors in the reduced form:
\[
v_{2i} \equiv \sigma_2(x_i)u_{2i} \tag{2.83}
\]
\[
u_{2i} \perp x_i, \qquad u_{2i} \sim N(0,1) \tag{2.84}
\]
Finally, we use conditional linear predictors to model the endogeneity of $y_{2i}$ through control function methods:

Assumption 2.7.7
\[
E[v_{1i}|v_{2i}; x_i] = L[v_{1i}|v_{2i}; x_i] = \frac{\sigma_{12}(x_i)}{\sigma_2^2(x_i)}\, v_{2i} \tag{2.85}
\]
This assumption gives us the correction term that we need to obtain the estimating equation.

Estimating Equation: To obtain the estimating equation, we follow an algorithm similar to that in Section 2.2:
\[
E[y_{1i}|x_i, y_{2i}] = E\big[ E[y_{1i}|x_i, v_{2i}] \,\big|\, x_i, y_{2i} \big] = x_{1i}\beta_1 + y_{2i}\beta_y + E\big[ E[v_{1i}|x_i, v_{2i}] \,\big|\, x_i, y_{2i} \big]
= x_{1i}\beta_1 + y_{2i}\beta_y + \underbrace{\bigg( \frac{\sigma_{12}(x_i)}{\sigma_2(x_i)} \bigg)}_{\equiv\, g(x_i)}\, h(x_i, y_{2i})
\]
where $h(x_i, y_{2i})$ is the generalized residual from the first-stage estimation of the reduced-form model for $y_{2i}$. Thus we get our estimating equation:
\[
y_{1i} = x_{1i}\beta_1 + y_{2i}\beta_y + g(x_i)h(y_{2i}, x_i) + \eta_i \tag{2.86}
\]
with $E[\eta_i|x_i, y_{2i}] = 0$ by construction. Estimation of equation (2.86) is analogous to the two-step estimation described in the previous sections: we can either impose parametric assumptions by specifying the functional forms of $\sigma_2(x_i)$ and $g(x_i)$, or perform sieve estimation in both steps.

Parametric Estimation: In the first step, we again specify the functional form for $\sigma_2(x_i)$ as in Assumption 3.3, run the hetprobit, and obtain the estimator of the generalized residuals. The second step involves specifying the functional form for $g(x_i)$:

Assumption 2.7.8 Assume that for all $i = 1, \ldots, N$,
\[
g(x_i) = x_i\Omega_{12} \tag{2.87}
\]
Next, we substitute the functional form and the estimators from the first step to obtain:
\[
y_{1i} = x_{1i}\beta_1 + y_{2i}\beta_y + x_i\Omega_{12}\hat h_i + \eta_i \tag{2.88}
\]
which is solved by regressing $y_{1i}$ on $x_{1i}$, $y_{2i}$, $\hat h_i x_i$.
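As a sanity check of this two-step procedure, the following Python sketch simulates the model (all coefficient values and the homoskedastic special case $\sigma_2(x_i) \equiv 1$ are illustrative choices of ours) and recovers $\beta_y$ with a probit first step and a control-function OLS second step:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 5000
xi = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # all exogenous vars (incl. instrument)
x1 = xi[:, :2]                                                # vars appearing in the outcome equation

# reduced form: y2 = 1[x*beta2 - v2 < 0]  (sign convention as in the text)
beta2 = np.r_[0.2, 0.7, -0.6]
v2 = rng.normal(size=n)
y2 = (xi @ beta2 - v2 < 0).astype(float)

# outcome with endogeneity through Cov(v1, v2); true beta_y = 0.8
v1 = 0.6 * v2 + rng.normal(size=n)
y1 = x1 @ np.r_[1.0, 0.5] + 0.8 * y2 + v1

# Step 1: probit MLE; here P(y2 = 1 | x) = P(v2 > x*beta2) = Phi(-x*beta2)
def negll(b):
    pr = np.clip(norm.cdf(-(xi @ b)), 1e-10, 1 - 1e-10)
    return -(y2 * np.log(pr) + (1 - y2) * np.log(1 - pr)).sum()
b2 = minimize(negll, np.zeros(3), method="BFGS").x

# generalized residual h = E[v2 | y2, x] for this sign convention
xb = xi @ b2
lam = lambda s: norm.pdf(s) / norm.cdf(s)
h = y2 * lam(-xb) - (1 - y2) * lam(xb)

# Step 2: OLS with the control function interacted per g(x) = x*Omega12
W = np.column_stack([x1, y2, xi * h[:, None]])
coef, *_ = np.linalg.lstsq(W, y1, rcond=None)
by_hat = coef[2]
```

Because the true $E[v_{1i}|v_{2i}; x_i]$ in this simulation is $0.6\,v_{2i}$, the constant column of $x_i h_i$ already spans the needed correction, and the extra columns simply mirror the flexible parametric form of Assumption 2.7.8.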
Sieve Estimation: The first step in sieve estimation again begins by defining the sieve space A_2N as done previously and running the resulting hetprobit. In the second step, the sieve space is defined as

  A_1N = B1 × { g(xi) = G_N^{Kg}(xi) Ω_12,N : Ω_12,N ∈ R^{Kg} }

where G_N^{Kg}(xi) ≡ {g1(.), ..., g^{Kg}(.)} is a 1 × Kg vector of basis functions. Substituting the first-stage estimators and the basis functions, we get

  y1i = x1i β1 + y2i βy + G_N^{Kg}(xi) Ω12 ĥi + ηi     (2.89)

and the parameters are obtained by regressing y1i on x1i, y2i, ĥi G_N^{Kg}(xi).

2.7.3 Sample Selection Model

Finally, we extend our estimation strategy to sample selection models with heteroskedasticity. Sample selection models are one of the leading applications of semi- and nonparametric estimation techniques. However, as Vella (1998) notes, heteroskedasticity in sample selection models remains an unexplored issue. Since heteroskedasticity in sample selection models manifests itself in the model for the selection indicator, which is usually non-linear, it creates inconsistency. On the other hand, identifying and correcting the issues arising due to heteroskedasticity requires restrictive distributional assumptions (Vella (1998)).

Heteroskedastic sample selection models have been previously addressed by Donald (1995) and Chen and Khan (2003). Donald (1995) permits heteroskedasticity of an unknown form and maintains the assumption of bivariate normality of the errors. He suggests a two-step estimation procedure that allows a non-parametric estimation of the model. However, as we will see, his method becomes complicated due to the necessity of a trimming rule. Chen and Khan (2003) propose a three-step estimator that corrects for heteroskedastic sample selection bias.
Their estimation procedure involves nonparametric estimation of propensity scores in the first step and nonparametric quantile estimation of the conditional interquartile range of the outcome equation's dependent variable for the selected observations in the second step. The first two steps yield a reweighted outcome equation that is partially linear, analogous to Donald's (1995), and is thus estimated using the appropriate methods. While innovative in nature, their three-step estimation strategy requires multiple smoothing parameters, including those for the quantile estimation in the second step and a trimming function in the third step.

The estimation procedure suggested in this section addresses these issues by proposing a simple two-step estimation that corrects for the sample selection bias and is straightforward to apply. We incorporate heteroskedasticity in the sample selection models, which can either be given a functional form in the parametric estimation or be of an unknown form in the sieve estimation. As before, we model the endogeneity arising in the model due to the sample selection bias using conditional linear predictors.

Model: The model comprises two latent variables: a continuous latent variable y*1i that yields the outcome variable, and a latent variable y*2i that yields the binary selection indicator. Both latent variables are assumed to be linear in parameters:

Assumption 2.7.9 For all i = 1, ..., N,

  y*1i = x1i β1 + v1i     (2.90)
  y*2i = xi β2 − v2i, x1i ⊂ xi     (2.91)
  y2i = 1[xi β2 − v2i < 0]     (2.92)
  y1i = y2i · y*1i     (2.93)

In other words, y2i = 1 if and only if {xi, y1i} is fully observed, and y2i = 0 otherwise.

We incorporate conditional heteroskedasticity of the multiplicative form:

Assumption 2.7.10

  v2i ≡ σ2(xi) u2i,     (2.94)
  u2i ⊥ xi, u2i ~ N(0, 1)     (2.95)

The endogeneity of sample selection is reflected in the relationship between the latent errors of the outcome equation and the selection equation.
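The selection correction implied by this model is an inverse-Mills-ratio term scaled by the heteroskedasticity function. A minimal numerical sketch, assuming (for illustration only) that the scaling g(x) collapses to a single scalar coefficient and using the conventional sign normalization in which selection occurs when index plus error is positive; `heckit_second_step` is an illustrative name, not the paper's notation:

```python
import numpy as np
from math import erf

def inv_mills(w):
    """Inverse Mills ratio lambda(w) = phi(w) / Phi(w)."""
    w = np.asarray(w, dtype=float)
    pdf = np.exp(-0.5 * w ** 2) / np.sqrt(2.0 * np.pi)
    cdf = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / np.sqrt(2.0))))(w)
    return pdf / cdf

def heckit_second_step(y1, x1, w, selected):
    """Selected-sample OLS of y1 on x1 and lambda_hat(w), where w is the
    standardized selection index x*beta2_hat / sigma2_hat(x) from a
    first-step hetprobit; g(x) is a scalar here purely for illustration."""
    lam = inv_mills(w)[selected]
    X = np.column_stack([x1[selected], lam])
    theta, *_ = np.linalg.lstsq(X, y1[selected], rcond=None)
    return theta
```

With an exclusion restriction in the selection index, the λ̂ regressor varies independently of x1 and the correction coefficient is well identified; without one, identification rests only on the nonlinearity of λ(·).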
The relationship is modeled using conditional linear predictors, which restrict linearity only in terms of v2i while allowing xi to enter in an unspecified functional form.

Assumption 2.7.11

  E[v1i | v2i; xi] = L[v1i | v2i; xi] = (σ12(xi) / σ2²(xi)) v2i     (2.96)

where y2i is now the selection indicator, so that the nonrandom selection is reflected in equation (2.96).

Estimating Equation: To obtain the estimating equation, first note that since y1i is observed only when y2i = 1, our key equation in sample selection models is E[y1i | xi, y2i = 1]. Thus our algorithm changes slightly to:

  E[y1i | xi, y2i = 1] = E[ E[y1i | xi, v2i] | xi, y2i = 1 ]
                       = x1i β1 + E[ E[v1i | xi, v2i] | xi, y2i = 1 ]
                       = x1i β1 + (σ12(xi) / σ2(xi)) λ( xi β2 / σ2(xi) )
                       ≡ x1i β1 + g(xi) λ( xi β2 / σ2(xi) )

with λ(.) being the Inverse Mills Ratio. Thus we get our estimating equation as:

  y1i = x1i β1 + g(xi) λ( xi β2 / σ2(xi) ) + ηi     (2.97)

with E[ηi | xi, y2i = 1] = 0 by construction.

Estimating Procedure: The estimating strategy is exactly as before for both parametric and sieve estimation, with two main differences. First, in the first step we need to obtain only the estimator of λ( xi β2 / σ2(xi) ) ≡ λ̂i and not the full generalized residual. Second, in the second stage, we run the regression only using the observations with y2i = 1, for obvious reasons.

To give some context to our estimation strategy, we compare it with that proposed by Donald (1995). To obtain his estimation procedure, Donald (1995) divides equation (2.97) by λi ≡ λ( xi β2 / σ2(xi) ), which yields:

  y1i / λi = (x1i / λi) β1 + g(xi) + ηi / λi

Replacing λi with λ̂i yields a partially linear model that is then estimated using a differencing procedure analogous to Robinson (1988). In Donald (1995), the first-stage estimation is done using nonparametric methods and λi is constructed by inverting the standard normal density function, which is only defined if the arguments of the density function are between zero and one. In addition, λ̂i then enters the estimating equation in the denominator. As Donald (1995) notes, this requires a trimming function that complicates the estimation. The semi-nonparametric estimation procedure suggested in this paper requires no such methods and directly estimates the unknown function in the estimating equation using sieve methods.

2.8 Concluding Remarks

This paper proposes a two-step sieve control function estimation of endogenous switching models. We allow the probit reduced form model for the binary switching indicator to incorporate heterogeneity by allowing for a conditionally heteroskedastic error term. Allowing the heteroskedastic function to be of an unknown form, we also relax the distributional assumption in the reduced form. Conditional linear projections help us obtain the correction terms in the estimating equation in an environment of conditional heteroskedasticity and unspecified distributional assumptions. Though the focus of the paper is sieve estimation, we also show how one can still incorporate a heteroskedastic reduced form and estimate the model parametrically. We also extend the model to endogenous switching models with individual-specific slopes in the outcome equation, linear models with a binary endogenous variable, and sample selection models. The extension to the sample selection model serves as an important contribution to the limited literature on sample selection models with heteroskedasticity.

CHAPTER 3

CONTROL FUNCTION ESTIMATION OF SPATIAL ERROR MODELS WITH ENDOGENEITY

3.1 Introduction

Cross-sectional dependence creates an interesting and challenging environment for estimation and inference in applied econometrics.
In the context of time-series data, we have several tractable procedures for estimation with correlated observations because the uni-directional nature of dependence gives a natural ordering to the data: outcomes in the past can affect outcomes in the future but not vice-versa. In the cross-sectional context, on the other hand, the lack of such natural ordering and structure restricts the use of the general forms of dependence that are routinely allowed in time-series data.

Allowing the observations to exhibit spatial dependence is an important way to model cross-sectional dependence among economic agents. In the framework of spatial dependence, individuals are interdependent due to their locations in a Euclidean space and their proximity to each other. This proximity is measured as the economic distance between individuals, as described in Conley (1999), and can be defined within the framework of the specific empirical application.

In this paper we consider the R² Euclidean space for the sake of exposition. Empirical questions in the fields of regional and urban economics, agricultural and environmental economics, industrial organization, as well as public health and epidemiology, are all concerned with data that exhibit significant spatial dependence due to the geographical location of the observations. Specifically, consider hedonic price models in housing markets. Hedonic price models are regression models in which the price of a commodity is regressed on its attributes. In housing markets, the price of a house depends not only on its own attributes, such as its floor plan, but also on the location of the house. The spatial dependence is exhibited in the unobserved (or poorly measured) neighborhood variables. These omitted neighborhood variables enter the unobservables of the regression model, inducing spatial dependence in the errors.
This serves as the key motivation behind Spatial Error Models (SEM), which provide one framework for capturing spatial dependence in the data. Spatial dependence can also be modeled in the framework of Spatial Lag Models, where the outcome variables are assumed to exhibit spatial correlation.

In this paper, we consider Spatial Error Models. Thus, we posit that the spatial dependence in the data is captured in the unobservables of the models. This entails having a non-spherical variance-covariance matrix of the errors. The spatial dependence is accounted for in the estimation of SEMs by incorporating the correlations between the observations in the estimation procedure. This entails accounting for all the pairwise correlations of the spatial data in a framework similar to Generalized Least Squares (GLS) estimation. However, in a large sample we have a huge error variance-covariance matrix, and this makes the efficient GLS-type estimation procedure very cumbersome to implement.

In this paper, we contribute to the literature by obtaining an estimation procedure for linear regression models with spatially correlated errors and endogenous variables. Our estimation procedure achieves efficiency gains by taking account of the spatial correlations between the observations. We provide a computationally simple estimation procedure by dividing the data into groups based on the distances between the observations and then accounting for only the correlation between the observations within a group, while ignoring the correlations between observations in different groups. The intuition is based on the First Law of Geography, according to Tobler (1970): "everything is related to everything else, but near things are more related than distant things".
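The distance-based grouping step can be implemented in many ways. One simple, illustrative rule (an assumption of this sketch, not a prescription of the paper) assigns locations in R² to square grid cells of a chosen side length, so that observations sharing a cell form a group:

```python
import numpy as np

def grid_groups(coords, cell):
    """Assign each location in R^2 to the square cell of side `cell` that
    contains it; observations sharing a cell form a group. Returns integer
    group labels in order of first appearance."""
    keys = np.floor(np.asarray(coords, dtype=float) / cell).astype(int)
    labels, out = {}, np.empty(len(keys), dtype=int)
    for i, k in enumerate(map(tuple, keys)):
        out[i] = labels.setdefault(k, len(labels))
    return out
```

Any rule that keeps nearby observations together (grid cells, administrative regions, clustering on coordinates) serves the same purpose; the grid is merely the simplest to state.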
Since the correlations within a group account for most of the correlation in the data, we get significant efficiency gains while simultaneously avoiding the tedious calculations of traditional GLS estimation with the variance-covariance matrix for the entire data. This is also motivated by the way in which spatial data are collected: data are often collected from different geographical regions, leading to a natural clustering or division of the observations. However, the estimation is different from estimation procedures for clustered data, because with spatial data the correlations within a group are a function of the distances between the observations in the group. In addition, unlike estimation procedures for clustered data, we do not impose independence assumptions between different groups.

Lu and Wooldridge (2017) consider estimation of linear econometric models with spatial data in which they describe an estimation procedure that first divides the observations into groups and then uses only the within-group information while ignoring the across-group correlations. For linear models, they obtain a Quasi-GLS estimation procedure that uses a tapered error covariance matrix as opposed to the full error covariance matrix used in traditional GLS estimation. Estimation for non-linear models can be done in a similar way to obtain Generalized Estimating Equations that account for the spatial correlations in the data (Lu 2013). Wang, Iglesias and Wooldridge (2013) also suggest a similar estimation procedure for spatial Probit models in which the observations are divided into pairwise groups and bivariate normal distributions are specified within each group.

In the Lu and Wooldridge (2017) framework for estimation of models with spatial data, all the covariates are assumed to be uncorrelated with the unobservables.
However, in practice, explanatory variables are often correlated with the errors, leading to endogeneity that causes inconsistency of estimators. For example, consider hedonic housing price models in which one is interested in studying the causal relationship between school quality and house prices. Nguyen-Hoang and Yinger (2011) give a detailed account of the studies that have explored the impact of school quality on housing values. In this empirical question, the school quality, often measured by some indicator of overall school achievement scores, is correlated with the omitted neighborhood variables, giving rise to the problem of endogeneity. In addition, houses located near one another tend to have similar unobservable attributes, which motivates the spatial error framework of spatial dependence. This serves as an important motivation behind the design of the econometric model of this paper.

In this paper, we consider linear models with spatially correlated data, allow some covariates to be endogenous, and implement the control function method to correct for the endogeneity. Our estimation strategy divides the observations into groups based on the distance between them and then incorporates the correlation structure of individuals within a group. The group structure is also conducive to obtaining additional instruments to correct for the endogeneity. Specifically, we recognize that for each individual within a group, the exogenous variables of his/her within-group neighbors are also eligible instruments for that individual. We use these additional instruments in the traditional 2SLS estimation to obtain what we call the Grouped Two Stage Least Squares estimator. We also describe how the groupwise spatial dependence can be incorporated with these additional instruments in a Generalized Instrumental Variables framework to obtain a Spatial Generalized Instrumental Variables estimator.
Finally, we describe a two-step control function estimation method in which we explicitly model the endogeneity through a control function assumption that is imposed for each group. This control function assumption also incorporates the groupwise spatial dependence and gives us a Spatial Control Function estimator.

Spatial econometrics has experienced major advancements in asymptotic theory research. Asymptotic theory for estimation with dependent processes establishes laws of large numbers and central limit theorems by imposing some structure on the cross-sectional dependence along with regularity conditions. In practice, this entails clearly defining the nature of the spatial dependence and then either deriving or applying the asymptotic results. Conley (1999) establishes laws of large numbers and central limit theorems for stationary mixing processes. Jenish and Prucha (2009) consider spatial asymptotics for more general forms of dependence, including nonstationary mixing processes. Jenish and Prucha (2012) derive laws of large numbers and central limit theorems for near-epoch dependent random fields. Under this general framework, they also establish consistency and asymptotic normality results for GMM estimators. In this paper, the asymptotic results are described under the framework of near-epoch dependence to give them the most general treatment.

There are two frameworks under which spatial asymptotic theory can be developed. Under increasing-domain asymptotics, we assume that the number of observations increases and that the minimum distance between observations is bounded below by a positive constant. In contrast, under fixed-domain or infill asymptotics, the observations become increasingly dense in a fixed and bounded region.
While there is a vast literature establishing asymptotic results on consistency and normality in the increasing-domain framework (Mardia and Marshall (1984), Cressie and Lahiri (1993), Lee (2004), Conley (1999)), under the fixed-domain framework, as the interactions between the observations increase with the sample size, it is known that Maximum Likelihood estimators are inconsistent (Lee (2004)). We obtain the asymptotic properties of our estimators under the increasing-domain framework.

For the asymptotic properties in our context, we assume that as the number of observations increases, the number of groups increases while the group size remains fixed. We further assume that the minimum distance between the observations and between the groups is bounded below by a positive constant. The asymptotic properties of the Spatial Control Function estimator are obtained by collecting the estimating equations and describing the two-step estimation procedure as a one-step estimation procedure. In addition, since we have a two-step estimation, we also obtain the asymptotic variance of the Spatial Control Function estimator that corrects for the first-step estimation.

We also obtain consistent variance-covariance estimators that are made robust to misspecification of the spatial correlation structure as well as to correlation between observations in different groups. This accounts for the across-group dependence between observations that we ignore in the estimation procedure, making our inference robust. We use the HAC estimator defined by Lu and Wooldridge (2017) that considers all the observations/groups within a fixed radius of a particular observation/group.

The paper is structured as follows. In section 2, we define a linear regression model with spatially dependent errors and endogenous covariates. In section 3, we obtain the estimating equations by implementing control function methods.
In section 4, we describe our estimation procedure that takes account of the within-group correlations. In section 5, we obtain the asymptotic properties of our estimator under near-epoch dependence. We also provide consistent variance estimators. In section 6, we describe the design and the results of the Monte Carlo simulation studies that illustrate the small-sample properties of our estimators. Finally, in section 7, we conclude and suggest possible avenues for future research.

3.2 Model

Let S be a two-dimensional Euclidean space in which the population resides. Let si represent a location in S for i = 1, 2, .... Denote the distance between locations si ∈ S and sj ∈ S by dij. The data points sampled at a location si ∈ S are denoted by {y1si, y2si, zsi} for i = 1, 2, ..., N. In addition, we have {usi, vsi} as the underlying unobservables. To keep the notation simple, we will denote the index si by i.

3.2.1 A Linear Regression Model

The primary equation for our outcome variable is assumed to be linear in parameters:

Assumption 3.2.1 We have a linear-in-parameters regression model for the outcome variable:

  y1i = x1i β1 + τ y2i + ui     (3.1)

where y2i is the scalar endogenous variable and x1i is a 1 × Kx1 vector of exogenous explanatory variables with first element equal to unity. The vector of coefficients on x1i is denoted by β1, which is Kx1 × 1, and τ is the scalar coefficient on the scalar endogenous variable y2i.

The reduced form of the scalar endogenous variable is also assumed to be linear in parameters:

Assumption 3.2.2 For i = 1, 2, ..., N,

  y2i = x2i β2 + vi     (3.2)

where x2i is a 1 × Kx2 vector of exogenous explanatory variables that includes x1i. We assume that Kx2 ≥ Kx1, which serves as an exclusion condition needed for identification. The vector of coefficients on x2i is denoted by β2, which is Kx2 × 1.
In matrix form, we write our model as:

  Y1N = X1N β1 + τ Y2N + UN     (3.3)
  Y2N = X2N β2 + VN     (3.4)

where

  Y1N ≡ (y11, ..., y1N)′;  Y2N ≡ (y21, ..., y2N)′;  X1N ≡ (x11′, ..., x1N′)′;     (3.5)
  UN ≡ (u1, ..., uN)′;  VN ≡ (v1, ..., vN)′;  X2N ≡ (x21′, ..., x2N′)′     (3.6)

We assume that strict exogeneity of the instruments holds. The assumption for the exogenous variables is formally stated below:

Assumption 3.2.3 Strict Exogeneity Condition:

  E[ (UN′, VN′)′ | X2N ] = E[ (UN′, VN′)′ ] = 0     (3.7)

3.2.2 Spatially Correlated Errors

In this paper, the spatial nature of the data is captured in the variance-covariance matrix of the error terms. In other words, we consider Spatial Error Models (SEMs). As Dubin (1988) notes, SEMs are an integral part of urban economics, and they play a special role in housing hedonic regression analysis. In housing markets, the omitted variables will have a high degree of dependence because houses that are near each other will have similar "neighborhood variables" (Dubin (1988)). Dubin (1988) recognizes crime rates, quality of public schools, and race as common neighborhood variables that are either unobservable and/or difficult to measure accurately. Thus, these spatially dependent variables often enter the error terms of the regression model, motivating an SEM analysis. Glass, Kenegalieva and Sickles (2012) also use SEM analysis in the context of state vehicle usage in the US. They make two economic arguments for incorporating an SEM. First, in the context of state vehicle usage in the US, they argue SEMs give a fuller representation of spatial dependence than models which do not include a spatial autocorrelation term.
Second, they illustrate how SEMs allow researchers to perform Wald tests of whole sets of coefficients against one another to ascertain whether models estimated using disaggregated data contain more information than the aggregate model.

In our model, we have two types of errors for each individual: a primary error ui, which is the unobservable term in the primary equation, and a reduced-form error vi, which is the unobservable term in the reduced form equation. Thus we focus on the variance-covariance matrix of (UN′, VN′)′. However, since we also have endogeneity in our model, the variance-covariance matrix of the error terms needs to capture that as well. The following assumption formalizes the spatial nature of the data and the endogeneity of the model:

Assumption 3.2.4 The variance-covariance matrix is given as:

  V[ (UN′, VN′)′ | X2N, D, ρ, λ ] = V[ (UN′, VN′)′ | D, ρ, λ ] = ΛN(D, ρ, λ)     (3.8)

where D contains all the pairwise distances between the observations in the data, ρ is the vector of variance-covariance parameters, and λ is the spatial parameter.

The essential idea that we would like to formalize through Assumption 3.2.4 is twofold. First, we specify that the unobservables of all the individuals exhibit spatial correlations. In other words, for all i, j = 1, 2, ..., N, ui is spatially correlated with uj and vi is spatially correlated with vj. Second, we postulate that the primary error of an individual i is spatially correlated with both the reduced-form error of individual i and the reduced-form error of individual j. In other words, for all i, j = 1, 2, ..., N, ui is spatially correlated with vj.
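Assumption 3.2.4 leaves the parameterization of ΛN(D, ρ, λ) generic. A common choice in applied work, used here purely as an illustration (the paper does not prescribe this functional form), is covariance that decays exponentially in Euclidean distance:

```python
import numpy as np

def exp_covariance(coords, sigma2, lam):
    """One illustrative parameterization of Lambda(D, rho, lambda):
    Cov(e_i, e_j) = sigma2 * exp(-d_ij / lam), with d_ij the Euclidean
    distance between locations i and j."""
    c = np.asarray(coords, dtype=float)
    # full pairwise distance matrix D via broadcasting
    D = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=2)
    return sigma2 * np.exp(-D / lam)
```

Here sigma2 plays the role of a variance parameter in ρ and lam is the spatial decay parameter λ; any positive-definite, distance-decaying kernel could stand in its place.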
We can write the error variance-covariance matrix as:

  ΛN(D, ρ, λ) = [ ΛUU,N(D, ρ, λ)   ΛUV,N(D, ρ, λ)
                  ΛUV,N(D, ρ, λ)′  ΛVV,N(D, ρ, λ) ]     (3.9)

where ΛUU,N(D, ρ, λ) ≡ V[UN], ΛVV,N(D, ρ, λ) ≡ V[VN] and ΛUV,N(D, ρ, λ) ≡ Cov[UN, VN]. In our model, we express the endogeneity of y2i through the relationship between the errors of the primary equation and the errors of the reduced-form equation. Formally, this relationship is expressed in the control function assumption that we state below. The components of the control function assumption, however, are defined through the block denoted by ΛUV,N(λ).

3.3 Estimating Equation

Our estimating equation needs to take account of the endogeneity in our spatially correlated data. To deal with this endogeneity, we divide our data set into G groups and then use the information on the observations within a group. The essential idea that we want to capture is that the internal correlations between the observations in a group are more important than the external correlations. We divide the total number of observations into G groups, each containing Lg observations. For simplicity, assume that there is the same number of observations in each group, i.e., L1 = L2 = ... = LG ≡ L. Writing the model at the group level, we have, for g = 1, ..., G:

  Y1g = X1g β1 + τ Y2g + Ug     (3.10)

with

  Y1g ≡ (y1g1, ..., y1gL)′;  Y2g ≡ (y2g1, ..., y2gL)′;  X1g ≡ (x1g1′, ..., x1gL′)′;  Ug ≡ (ug1, ..., ugL)′

Next, note that since y2i is a function of Z2g and Vg, where Z2g is the vector of all the exogenous variables within the group g:

  E[Y1g | Z2g, Vg] = X1g β1 + τ Y2g + E[Ug | Z2g, Vg]     (3.11)

As we will describe in section 3.3.2, Z2g can contain more elements than X2g ≡ (x2g1′, ..., x2gL′)′. If we did not have endogeneity, we would have E[Ug | Z2g, Vg] = 0. This is the framework studied in Lu and Wooldridge (2017). However, ruling out endogenous covariates in a regression analysis is often not justified by economic theory. In practice, explanatory variables are often correlated with the errors due to the commonly encountered issues of omitted variables and/or measurement error, which lead to inconsistency in parameter estimation. This calls for a more general treatment of linear regression models that allows for both spatially dependent errors and endogenous covariates.

In this paper, the endogeneity of y2i is captured in the relationship between the primary errors and the reduced-form errors. This relationship is captured by E[Ug | Z2g, Vg], which is also termed the correction term. Obtaining an expression for the correction term and including it in the estimating equation enables us to obtain consistent estimators of the parameters. The control function approach proceeds by imposing some structure on the correction term. In our context, since the unobservables also exhibit spatial dependence, the control function assumption that we impose allows us to incorporate this spatial dependence.

3.3.1 Control Function Assumption

The control function approach models the endogeneity of the covariates by explicitly specifying the endogeneity in the estimation. In our model, the control function method corrects for the endogeneity of y2 by modeling the relationship between the errors of the primary equation and the errors of the reduced-form equation. Since we have divided the data into groups, we specify the control function assumption for each group. This also incorporates the spatial correlation between the errors within each group. Thus the components of the control function assumption are given by the variance-covariance matrix of each group.
To obtain the expression for the control function assumption, define the group-specific variance-covariance matrix of the unobservables as:

  V[ (Ug′, Vg′)′ | Z2g, Dg, λ ] = [ ΛUU,g(λ)   ΛUV,g(λ)
                                    ΛUV,g(λ)′  ΛVV,g(λ) ]     (3.12)

where Dg contains all the pairwise distances between the observations within a group g. Formally stating, our control function assumption is:

Assumption 3.3.1 For each g = 1, 2, ..., G,

  E[Ug | Z2g, Vg] = E[Ug | Vg] = ΛUV,g [ΛVV,g]⁻¹ Vg     (3.13)

Define

  Πg ≡ ΛUV,g [ΛVV,g]⁻¹,     (3.14)

so that E[Ug | Vg] = Πg Vg, and, using the vectorization operator, we can write:

  E[Ug | Vg] = [Vg′ ⊗ IL] vec(Πg)     (3.15)

Substituting into equation (3.10), we obtain the estimating equation as:

  Y1g = X1g β1 + τ Y2g + [Vg′ ⊗ IL] vec(Πg) + ηg     (3.16)

where E[ηg | Z2g] = 0.

3.3.2 Instruments

Within a group, the exogenous variables for an individual gi are given by x2gi. However, the exogenous variables of the other individuals in the group also serve as instruments for individual gi. In other words, if we denote by X2g,−i the vector of exogenous covariates of the other individuals in group g, then the full set of instruments for gi is given by z2gi ≡ [x2gi, X2g,−i]. Recognizing that within a group the exogenous variables of one's neighbors can also serve as instruments, we obtain extra instruments. For example, suppose that L = 2; then the instruments for individual g1 are given by Z2g1 ≡ [x2g1, x2g2].
With L = 2, we have Y2g = (y2g1, y2g2)′. We can write the within-group reduced form model for the endogenous variable that incorporates all the exogenous variables found within a group:

  Y2g = Z2g δ2 + Vg     (3.17)

where Z2g ≡ (z2g1′, ..., z2gL′)′.

3.4 Estimation Procedures

In this section, we describe three new estimation procedures that incorporate the additional instruments as well as the within-group spatial dependence facilitated by the groupwise division of the data. We begin by describing a two-step estimation of the Spatial Control Function estimator. Next, we describe a Grouped Two Stage Least Squares estimation procedure that uses only the additional instruments and ignores the spatial dependence in the data. Finally, we describe the Spatial Generalized Instrumental Variables estimator that incorporates both the additional instruments and the within-group spatial dependence of the observations.

3.4.1 Control Function Estimation

We write the reduced form equation and the estimating equation again for convenience:

  Y1g = X1g β1 + τ Y2g + [Vg′ ⊗ IL] vec(Πg) + ηg
  Y2g = Z2g δ2 + Vg

The equations above suggest a two-step estimation procedure. In the first step, we estimate the reduced form model and obtain the residuals. In the second step, we plug the first-step residuals into the estimating equation and then use the Quasi-GLS estimation procedure described in Lu and Wooldridge (2017).

3.4.1.1 First Step

In the first step, we re-write:

  Y2g = Z2g δ2 + Vg     (3.18)

Denote the true parameters of the first step as δ2∗ ∈ ∆2, where ∆2 is a finite-dimensional parameter space.
The estimator, denoted by δ̂2, is given as:

  δ̂2 = argmin_{δ2 ∈ ∆2} (1/G) Σ_{g=1}^{G} (Y2g − Z2g δ2)′ (Y2g − Z2g δ2)     (3.19)

Since we are estimating a linear model, we get:

  δ̂2 = [ Σ_{g=1}^{G} Z2g′ Z2g ]⁻¹ [ Σ_{g=1}^{G} Z2g′ Y2g ]     (3.20)

In addition, this gives the first-stage residuals as:

  V̂g = Y2g − Z2g δ̂2     (3.21)

3.4.1.2 Second Step

In the second stage, we have the estimating equation:

  Y1g = X1g β1 + τ Y2g + [Vg′ ⊗ IL] vec(Πg) + ηg

To motivate the estimation procedure in the second step, assume that we fully observe Vg. Re-write the estimating equation as:

  Y1g = Xg θ1 + ηg     (3.22)

where Xg is a matrix that contains all the covariates of the estimating equation and θ1 is a vector that contains all the parameters. We can estimate the parameters in equation (3.22) using simple OLS and obtain consistent estimators. However, note that the errors in the estimating equation are going to exhibit spatial dependence. This suggests that we can obtain efficiency gains if we incorporate the spatial dependence in the estimation procedure.

To estimate the parameters in this spatial setting, we implement the Quasi-GLS estimation procedure proposed by Lu and Wooldridge (2017). Denote the variance-covariance matrix of ηg by Λgη(λη), where λη is the spatial parameter. Note that the group-specific variance-covariance matrix Λgη(λη) is contained in the full variance-covariance matrix for ηN, denoted by ΛNη(λη). Traditional GLS estimation considers the entire error variance-covariance matrix ΛNη(λη). This entails accounting for all the pairwise correlations of the observations, which is computationally very tedious.
However, since we divide the observations into groups according to the distances between them, the within-group correlations account for most of the correlation in the data, and ignoring the across-group correlations should not lead to much efficiency loss. This is precisely the motivation behind the Quasi-GLS estimation procedure of Lu and Wooldridge (2017). Quasi-GLS uses a tapered error variance-covariance matrix: we keep the within-group error variance-covariance matrices \Lambda_{g\eta}(\lambda_\eta) and set the error correlations across groups to zero.

Formally, define an N \times N tapering matrix T whose elements are T_{i,j} = 1 if observations i and j belong to the same group and T_{i,j} = 0 otherwise, for i, j = 1, 2, ..., N. Define \Lambda_\eta \equiv \mathrm{diag}[\Lambda_{1\eta}, \Lambda_{2\eta}, ..., \Lambda_{G\eta}]. In other words, \Lambda_\eta is a block-diagonal matrix that contains only the within-group error variance-covariances. Note that \Lambda_\eta \equiv T \circ \Lambda_{N\eta}, where \circ denotes the Hadamard product. Traditional GLS estimation is weighted by \Lambda_{N\eta}, while Quasi-GLS estimation is weighted by \Lambda_\eta.

For example, consider the simplest case of N = 4, and assume that observations {1,2} are in group 1 and observations {3,4} are in group 2. In this case,

    T = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix},
    \Lambda_{N\eta} = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} \end{bmatrix},
    \Lambda_\eta = T \circ \Lambda_{N\eta} = \begin{bmatrix} \sigma_{11} & \sigma_{12} & 0 & 0 \\ \sigma_{21} & \sigma_{22} & 0 & 0 \\ 0 & 0 & \sigma_{33} & \sigma_{34} \\ 0 & 0 & \sigma_{43} & \sigma_{44} \end{bmatrix}.

In our context, we call the resulting Quasi-GLS estimator the Spatial Control Function estimator (sp.CF). Denote the true parameters of the second step by \theta_{1*} \in \Theta_1, where \Theta_1 is a finite-dimensional parameter space.
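Before writing down the estimator, note that the tapering construction above is a one-line Hadamard product in code. A minimal numpy sketch of the N = 4 example (the numeric covariance entries are hypothetical placeholders for the \sigma_{ij}):

```python
import numpy as np

# Hypothetical full 4x4 error covariance (placeholder values for the sigma_ij).
Lam_N = np.array([[2.0, 0.8, 0.3, 0.2],
                  [0.8, 2.0, 0.4, 0.3],
                  [0.3, 0.4, 2.0, 0.9],
                  [0.2, 0.3, 0.9, 2.0]])

# Tapering matrix: observations {1,2} form group 1 and {3,4} form group 2.
T = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])

# Hadamard product keeps the within-group blocks and zeroes the
# across-group covariances, giving the block-diagonal Lambda_eta.
Lam_eta = T * Lam_N
```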
The estimator is obtained as:

    \hat\theta_1(sp.CF) = \arg\min_{\theta_1 \in \Theta_1} \frac{1}{G} \sum_{g=1}^{G} (Y_{1g} - X_g\theta_1)' \Lambda_{g\eta}^{-1} (Y_{1g} - X_g\theta_1)    (3.23)

By using the Quasi-GLS procedure to obtain our control function estimator in the second step, we account for some of the correlation in the data through \Lambda_{g\eta}(\lambda_\eta). We therefore obtain efficiency gains because we do not completely ignore the spatial dependence in the data.

In practice, we first plug in \hat V_g as consistent estimates of V_g to control for the endogeneity of Y_{2g}. Further, to obtain a feasible version of the Spatial Control Function estimator, we need a consistent estimator of \Lambda_{g\eta}(\lambda_\eta), which depends on the pairwise distances and the spatial parameter \lambda_\eta. Let \hat\Lambda_{g\eta}(\hat\lambda_\eta) be that consistent estimator. For notational convenience, X_g now denotes the vector of all the covariates with \hat V_g in place of V_g.

The second step then amounts to applying the Feasible Quasi-GLS procedure of Lu and Wooldridge (2017) to obtain what we call the Feasible Spatial Control Function (F.sp.CF) estimator, denoted \hat\theta_1(F.sp.CF):

    \hat\theta_1(F.sp.CF) = \arg\min_{\theta_1 \in \Theta_1} \frac{1}{G} \sum_{g=1}^{G} (Y_{1g} - X_g\theta_1)' \hat\Lambda_{g\eta}^{-1} (Y_{1g} - X_g\theta_1)    (3.24)

Solving this optimization problem yields the explicit expression:

    \hat\theta_1(F.sp.CF) = \left[ \sum_{g=1}^{G} X_g' \hat\Lambda_{g\eta}^{-1} X_g \right]^{-1} \left[ \sum_{g=1}^{G} X_g' \hat\Lambda_{g\eta}^{-1} Y_{1g} \right]    (3.25)

3.4.2 Incorporating Extra Instruments in Other Estimation Procedures

Dividing the observations into groups based on the distances between them allows us to obtain additional instruments for each group. We can incorporate these instruments into the traditional estimation procedures to get some efficiency gains.
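Before turning to the groupwise 2SLS notation, the two-step procedure of Section 3.4.1 can be collected into a short sketch. This is an illustrative implementation only: the control-function term enters simply as the residual vector \hat V_g (a simplified \Pi_g), and all array shapes and names are hypothetical:

```python
import numpy as np

def feasible_spatial_cf(Y1, Y2, X1, Z2, Lam_inv):
    """Two-step Feasible Spatial Control Function sketch.

    Y1, Y2  : length-G lists of (L,) group outcome vectors
    X1      : length-G list of (L, k1) exogenous covariate matrices
    Z2      : length-G list of (L, k2) within-group instrument matrices
    Lam_inv : length-G list of (L, L) inverted within-group covariance estimates
    """
    # First step: pooled OLS of Y2 on Z2 across all groups, then residuals.
    d2 = np.linalg.solve(sum(z.T @ z for z in Z2),
                         sum(z.T @ y for z, y in zip(Z2, Y2)))
    Vhat = [y - z @ d2 for y, z in zip(Y2, Z2)]

    # Second step: quasi-GLS of Y1 on (X1, Y2, Vhat), weighting group by group.
    Xg = [np.column_stack([x, y2, v]) for x, y2, v in zip(X1, Y2, Vhat)]
    A = sum(x.T @ W @ x for x, W in zip(Xg, Lam_inv))
    b = sum(x.T @ W @ y for x, W, y in zip(Xg, Lam_inv, Y1))
    return np.linalg.solve(A, b)
```

With the \hat\Lambda_{g\eta}^{-1} arguments replaced by identity matrices, this collapses to the pooled OLS control-function estimator, so the weighting is the only place the spatial structure enters.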
Let z_{1i} \equiv (x_{1i}, y_{2i}) and denote \beta \equiv [\beta_1, \tau]' for notational simplicity. The primary outcome equation can be re-written as:

    y_{1i} = z_{1i}\beta + u_i    (3.26)

In groupwise notation:

    Y_{1g} = Z_{1g}\beta + U_g    (3.27)

where Z_{1g} is the groupwise notation for z_{1i}. The traditional 2SLS estimator is given as:

    \hat\beta(2SLS) = \left[ \sum_{i=1}^{N} z_{1i}'x_{2i} \left( \sum_{i=1}^{N} x_{2i}'x_{2i} \right)^{-1} \sum_{i=1}^{N} x_{2i}'z_{1i} \right]^{-1} \left[ \sum_{i=1}^{N} z_{1i}'x_{2i} \left( \sum_{i=1}^{N} x_{2i}'x_{2i} \right)^{-1} \sum_{i=1}^{N} x_{2i}'y_{1i} \right]    (3.28)

3.4.2.1 Grouped 2SLS Estimation

When we group the data and estimate the model with the new instruments, we obtain the expression for what we call the grouped 2SLS estimator:

    \hat\beta(grpd.2SLS) = \left[ \sum_{g=1}^{G} Z_{1g}'Z_{2g} \left( \sum_{g=1}^{G} Z_{2g}'Z_{2g} \right)^{-1} \sum_{g=1}^{G} Z_{2g}'Z_{1g} \right]^{-1} \left[ \sum_{g=1}^{G} Z_{1g}'Z_{2g} \left( \sum_{g=1}^{G} Z_{2g}'Z_{2g} \right)^{-1} \sum_{g=1}^{G} Z_{2g}'Y_{1g} \right]    (3.29)

3.4.2.2 Spatial Generalized Instrumental Variables Estimation

We can also incorporate the extra instruments as well as the spatial dependence in the framework of Generalized Instrumental Variables (GIV) estimation. Recall that Var[U_g] = \Lambda_{UU,g} denotes the groupwise variance-covariance matrix for the primary outcome equation, and the instruments are Z_{2g}. GIV works by incorporating the spatial dependence of the errors through a GLS-type transformation of equation (3.27). Assuming \Lambda_{UU,g} is positive definite, we obtain the GLS transformation:

    \Lambda_{UU,g}^{-1/2} Y_{1g} = \Lambda_{UU,g}^{-1/2} Z_{1g}\beta + \Lambda_{UU,g}^{-1/2} U_g    (3.30)

where \Lambda_{UU,g}^{-1/2}\Lambda_{UU,g}^{-1/2} = \Lambda_{UU,g}^{-1}. Next, we estimate (3.30) using \Lambda_{UU,g}^{-1/2} Z_{2g} as instruments. We call this estimator the Spatial GIV:

    \hat\beta(sp.GIV) = \left[ \sum_{g=1}^{G} Z_{1g}'\Lambda_{UU,g}^{-1}Z_{2g} \left( \sum_{g=1}^{G} Z_{2g}'\Lambda_{UU,g}^{-1}Z_{2g} \right)^{-1} \sum_{g=1}^{G} Z_{2g}'\Lambda_{UU,g}^{-1}Z_{1g} \right]^{-1} \left[ \sum_{g=1}^{G} Z_{1g}'\Lambda_{UU,g}^{-1}Z_{2g} \left( \sum_{g=1}^{G} Z_{2g}'\Lambda_{UU,g}^{-1}Z_{2g} \right)^{-1} \sum_{g=1}^{G} Z_{2g}'\Lambda_{UU,g}^{-1}Y_{1g} \right]    (3.31)

In practice, we use the feasible version of the estimator once we plug in a consistent estimator \hat\Lambda_{UU,g}^{-1}:

    \hat\beta(F.sp.GIV) = \left[ \sum_{g=1}^{G} Z_{1g}'\hat\Lambda_{UU,g}^{-1}Z_{2g} \left( \sum_{g=1}^{G} Z_{2g}'\hat\Lambda_{UU,g}^{-1}Z_{2g} \right)^{-1} \sum_{g=1}^{G} Z_{2g}'\hat\Lambda_{UU,g}^{-1}Z_{1g} \right]^{-1} \left[ \sum_{g=1}^{G} Z_{1g}'\hat\Lambda_{UU,g}^{-1}Z_{2g} \left( \sum_{g=1}^{G} Z_{2g}'\hat\Lambda_{UU,g}^{-1}Z_{2g} \right)^{-1} \sum_{g=1}^{G} Z_{2g}'\hat\Lambda_{UU,g}^{-1}Y_{1g} \right]    (3.32)

3.4.3 Estimation of the Spatial Parameter

We can obtain a consistent estimator of \lambda_\eta using a straightforward minimization algorithm, starting from residuals based on a preliminary consistent estimator of \theta_1:

    \tilde\eta_g = Y_{1g} - X_g\hat\theta_1^{prelim}    (3.33)

Recall that we denoted the variance-covariance matrix of \eta_N by \Lambda_{N\eta}(D, \lambda_\eta). We specify the functional form of the individual elements of the matrix as \Lambda_{N\eta;(i,j)}(d_{ij}, \lambda_\eta), with \Lambda_{N\eta;(i,i)} = \sigma_\eta^2. A consistent estimator of the variance of \eta_i, denoted \hat\sigma_\eta, is given as:

    \hat\sigma_\eta = \frac{1}{N - K_x} \sum_{i=1}^{N} \tilde\eta_i^2    (3.34)

A consistent estimator of \lambda_\eta can then be obtained as:

    \hat\lambda_\eta = \arg\min \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq i} \left( \tilde\eta_i\tilde\eta_j - \Lambda_{N\eta;(i,j)}(d_{ij}, \hat\sigma_\eta, \lambda_\eta) \right)^2    (3.35)

3.5 Asymptotics

In spatial statistics, there are two main frameworks for asymptotic analysis (Lee (2004), Cressie (1993)): increasing-domain and fixed-domain. In the increasing-domain framework, we assume that the minimum distance between observations is bounded below by a positive constant, and the asymptotics are based on a growing observation region.
In the fixed-domain, or infill, framework, the observation region is assumed to be fixed and bounded, and as the number of observations increases the region becomes increasingly dense. It is well established (Cressie (1993), Stein (1999), Ripley (1988), Zhang (2004)) that general results under fixed-domain asymptotics are not available: as the number of observations increases, the number of interactions between observations also increases, and there is no theoretical basis for the usual behavior of estimators.

There exists a large body of literature on asymptotic analysis with spatial dependence under the increasing-domain framework. Mardia and Marshall (1984) and Cressie and Lahiri (1993) give consistency and asymptotic normality results for maximum likelihood and other likelihood-based estimators of regression models with spatially correlated errors. Lee (2004) investigates the asymptotic properties of the maximum likelihood and quasi-maximum likelihood estimators of spatial autoregressive models. Conley (1996) obtains asymptotic results for generalized method of moments estimators with stationary spatial data.

In this paper, we obtain the asymptotic results for our estimator under the increasing-domain framework. We collect our estimating equations and recast the two-step estimation procedure as a one-step procedure, which enables us to derive the asymptotic properties concisely. We choose the framework of Jenish and Prucha (2012) to apply the law of large numbers and central limit theorem for spatial near-epoch dependence. Let w_i = {y_{1i}, y_{2i}, x_{2i}, u_i, v_i} collect all the random variables in our model, for i = 1, 2, 3, ....

Assumption 3.5.1 For all N, the w_i are located on an infinitely countable lattice D \subset R^d of possibly unevenly placed locations.
The space R^d is equipped with the metric d(i, j) = \max_{1 \le l \le d} |i_l - j_l|, where i_l denotes the l-th element of i, and all elements of D are located at distances of at least d^* > 0 from each other.

Next, we define the nature of the dependence of the spatial processes in our model in order to obtain the asymptotic properties of our estimators. Jenish and Prucha (2012) obtain a law of large numbers and a central limit theorem under a general set of restrictions on near cross-sectional dependence. Specifically, the spatial processes are assumed to exhibit near-epoch dependence. Near-epoch dependence for random fields is described as:

Assumption 3.5.2 (a) For some random field \varepsilon = {\varepsilon_{i,N}; i \in T_N, T_N \subset D, N \ge 1} with |T_N| \to \infty as N \to \infty, where |\cdot| denotes the cardinality of a set, and for each element w_i^a of w_i, the random field {{w_i^a}_{i=1,...,N}}_{N \ge 1} is L_2-near-epoch dependent (NED) on the random field \varepsilon, i.e.:

    \| w_i^a - E( w_i^a \mid F_{i,N}(s) ) \|_2 \le C \psi(s)    (3.36)

where F_{i,N}(s) = \sigma(\varepsilon_{j,N}; j \in T_N, d_{i,j} \le s), C is some positive constant, and \psi(s) is a deterministic sequence, called the NED coefficients, with \psi(s) \ge 0 and \lim_{s \to \infty} \psi(s) = 0.

(b) \varepsilon is \alpha-mixing, with mixing coefficients satisfying Assumption 3 of Jenish and Prucha (2012).

(c) \psi(s) satisfies \sum_{r=1}^{\infty} r^{d-1} \psi(r) < \infty.

This assumption implies that the random variables in w_i can be arbitrarily well approximated by neighboring observations of an \alpha-mixing field. This includes the case in which w_i is itself \alpha-mixing, as in this model (Verdier (2016)).

Under Assumption 3.5.1 and Assumption 3.5.2 above, together with Assumption 4 of Jenish and Prucha (2012), we can apply the weak law of large numbers and central limit theorem given in Theorem 1 and Theorem 2 of Jenish and Prucha (2012).
3.5.1 Asymptotics for the Feasible Spatial Control Function Estimator

For g = 1, 2, ..., G, re-writing the equations:

    Y_{1g} = X_g\theta_1 + \eta_g    (3.37)
    Y_{2g} = Z_{2g}\delta_2 + V_g    (3.38)

and stacking them:

    y_g = w_g\theta + u_g    (3.39)

where

    y_g = \begin{bmatrix} Y_{1g} \\ Y_{2g} \end{bmatrix}, \quad w_g = \begin{bmatrix} X_g & 0 \\ 0 & Z_{2g} \end{bmatrix}, \quad u_g = \begin{bmatrix} \eta_g \\ V_g \end{bmatrix}

The next assumption imposes strict exogeneity:

Assumption 3.5.3 For i = 1, 2, ..., N,

    E[u_i \mid w_1, w_2, ...] = 0    (3.40)

where u_i and w_i are in individual-level notation. This implies that for any matrix \Omega_g of conformable dimension,

    E[w_g'\Omega_g^{-1}u_g] = 0 \text{ for all } g    (3.41)

In the context of this paper, we have

    \Omega_g = \begin{bmatrix} \Lambda_{g\eta} & 0 \\ 0 & I \end{bmatrix}    (3.42)

The Spatial Control Function estimator is given as:

    \hat\theta(sp.CF) = \arg\min_{\theta \in \Theta} \frac{1}{G} \sum_{g=1}^{G} (y_g - w_g\theta)'\Omega_g^{-1}(y_g - w_g\theta)    (3.43)

The first-order conditions of the optimization routine are:

    S_G(\hat\theta(sp.CF)) \equiv \frac{1}{G} \sum_{g=1}^{G} s_g(\hat\theta(sp.CF), w_g) = \frac{1}{G} \sum_{g=1}^{G} w_g'\Omega_g^{-1}(y_g - w_g\hat\theta(sp.CF)) = 0    (3.44)

where the score stacks as

    s_g = \begin{bmatrix} s_{g1}(\hat\theta_1(sp.CF), Z_{2g}; \hat\delta_2) \\ s_{g2}(\hat\delta_2, Z_{2g}) \end{bmatrix} = \begin{bmatrix} X_g'\Lambda_{g\eta}^{-1}(Y_{1g} - X_g\hat\theta_1(sp.CF)) \\ Z_{2g}'(Y_{2g} - Z_{2g}\hat\delta_2) \end{bmatrix}    (3.45)-(3.47)

These conditions are the sample analogues of the moment conditions:

    E[ s_g(\theta_*, w_g) \mid W_G ] = 0    (3.48)

where \theta_* \equiv {\theta_{1*}, \delta_{2*}} denotes the true parameters, W_G is the full data matrix, and

    s_g(\theta, w_g) = \begin{bmatrix} s_{g1}(\theta_1, Z_{2g}; \delta_2) \\ s_{g2}(\delta_2, Z_{2g}) \end{bmatrix} = \begin{bmatrix} X_g'\Lambda_{g\eta}^{-1}(Y_{1g} - X_g\theta_1) \\ Z_{2g}'(Y_{2g} - Z_{2g}\delta_2) \end{bmatrix}    (3.49)-(3.50)

Thus the estimator proposed in this paper can also be described in a GMM framework.
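As a numerical sanity check on this stacked-moment representation, one can verify that the stacked first-order conditions vanish at the two-step estimates, since s_{g2} involves only \delta_2 and s_{g1} is evaluated at the second-step solution. A small simulated sketch (identity weighting and all dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
G, L = 50, 2                                   # hypothetical: 50 groups of size 2

# Simulated first-step data: Y2g = Z2g * delta2 + Vg
Z2 = [rng.normal(size=(L, 2)) for _ in range(G)]
Y2 = [z @ np.array([1.0, -0.5]) + rng.normal(size=L) for z in Z2]

# First step: delta2_hat solves sum_g Z2g'(Y2g - Z2g delta2) = 0.
d2 = np.linalg.solve(sum(z.T @ z for z in Z2),
                     sum(z.T @ y for z, y in zip(Z2, Y2)))
Vh = [y - z @ d2 for y, z in zip(Y2, Z2)]

# Second step with X_g = [1, Y2g, Vhat_g] and identity weighting for the check.
Xg = [np.column_stack([np.ones(L), y2, v]) for y2, v in zip(Y2, Vh)]
Y1 = [x @ np.array([1.0, 1.0, 0.5]) + 0.1 * rng.normal(size=L) for x in Xg]
th = np.linalg.solve(sum(x.T @ x for x in Xg),
                     sum(x.T @ y for x, y in zip(Xg, Y1)))

# The stacked moment conditions vanish at the two-step estimates:
s1 = sum(x.T @ (y - x @ th) for x, y in zip(Xg, Y1))
s2 = sum(z.T @ (y - z @ d2) for z, y in zip(Z2, Y2))
print(np.allclose(s1, 0.0), np.allclose(s2, 0.0))   # True True
```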
The asymptotic theory developed by Jenish and Prucha (2012) for spatial GMM can readily be applied here. In this paper, we also explicitly correct for the first-step estimation. Next, define

    H_g(\theta_*, w_g) \equiv \frac{\partial s_g(\theta_*, w_g)}{\partial\theta'} = -w_g'\Omega_g^{-1}w_g    (3.51)

The next two assumptions impose boundedness conditions and appropriate rank conditions.

Assumption 3.5.4 A \equiv \lim_{G \to \infty} \frac{1}{G} \sum_{g=1}^{G} E[-H_g(\theta_*, w_g)] exists and has full rank.

In addition, assume:

Assumption 3.5.5 B \equiv \lim_{G \to \infty} E\left\{ \left[ \frac{1}{\sqrt G} \sum_{g=1}^{G} s_g(\theta_*, w_g) \right] \left[ \frac{1}{\sqrt G} \sum_{h=1}^{G} s_h(\theta_*, w_h) \right]' \right\} exists and has full rank.

Since Quasi-GLS groups the observations according to the distances between individuals, it essentially groups "nearby" observations, so the groups are also near-epoch dependent processes. In addition, NED is preserved under addition and multiplication (Verdier (2016), Davidson (1994)). Thus Theorem 1 and Theorem 3 of Jenish and Prucha (2012) can be applied.

In practice, we use the feasible version of the estimator:

    \hat\theta(F.sp.CF) = \arg\min_{\theta \in \Theta} \frac{1}{G} \sum_{g=1}^{G} (y_g - w_g\theta)'\hat\Omega_g^{-1}(y_g - w_g\theta)    (3.52)

Theorem 7 Under Assumptions 1-10, \hat\theta(F.sp.CF) is consistent and

    \sqrt G \left( \hat\theta(F.sp.CF) - \theta_* \right) \to_d N(0, A^{-1}BA^{-1})    (3.53)

Proof. Following the procedure in Lu (2013), we first obtain the asymptotic properties of \hat\theta(sp.CF) and then show that \hat\theta(sp.CF) and \hat\theta(F.sp.CF) are asymptotically equivalent.
• Consistency of \hat\theta(sp.CF): For consistency, note that

    \hat\theta(sp.CF) = \theta_* + \left[ \frac{1}{G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}w_g \right]^{-1} \left[ \frac{1}{G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}u_g \right]    (3.54)

Because NED is preserved under multiplication and addition (Theorems 17.8 and 17.9 in Davidson (1994)), w_g'\Omega_g^{-1}w_g and w_g'\Omega_g^{-1}u_g are also NED processes. Using Theorem 1 in Jenish and Prucha (2012), we get

    \frac{1}{G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}w_g \to_p A    (3.55)
    \frac{1}{G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}u_g \to_p 0    (3.56)

The consistency result follows.

• Asymptotic normality of \hat\theta(sp.CF): A mean value expansion of the first-order conditions around \theta_* gives

    0 = S_G(\hat\theta(sp.CF)) = \frac{1}{G}\sum_{g=1}^{G} s_g(\theta_*, w_g) + \left[ \frac{1}{G}\sum_{g=1}^{G} H_g(\ddot\theta, w_g) \right] (\hat\theta(sp.CF) - \theta_*)    (3.57)

where \ddot\theta has elements between \hat\theta(sp.CF) and \theta_*. Hence

    \sqrt G (\hat\theta(sp.CF) - \theta_*) = \left[ -\frac{1}{G}\sum_{g=1}^{G} H_g(\ddot\theta, w_g) \right]^{-1} \frac{1}{\sqrt G}\sum_{g=1}^{G} s_g(\theta_*, w_g)    (3.58)

Using Theorem 1 and Theorem 2 of Jenish and Prucha (2012),

    -\frac{1}{G}\sum_{g=1}^{G} H_g(\ddot\theta, w_g) \to_p A    (3.59)
    \frac{1}{\sqrt G}\sum_{g=1}^{G} s_g(\theta_*, w_g) \to_d N(0, B)    (3.60)

Bringing these two results together, we get

    \sqrt G (\hat\theta(sp.CF) - \theta_*) \to_d N(0, A^{-1}BA^{-1})    (3.61)

• Asymptotic equivalence of \hat\theta(sp.CF) and \hat\theta(F.sp.CF): We have

    \hat\theta(sp.CF) = \theta_* + \left[ \frac{1}{G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}w_g \right]^{-1} \left[ \frac{1}{G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}u_g \right]    (3.62)
    \hat\theta(F.sp.CF) = \theta_* + \left[ \frac{1}{G}\sum_{g=1}^{G} w_g'\hat\Omega_g^{-1}w_g \right]^{-1} \left[ \frac{1}{G}\sum_{g=1}^{G} w_g'\hat\Omega_g^{-1}u_g \right]    (3.63)

Since \hat\Omega_g \to_p \Omega_g and \frac{1}{G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}w_g \to_p A, we get:

    \frac{1}{G}\sum_{g=1}^{G} w_g'\hat\Omega_g^{-1}w_g \to_p \frac{1}{G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}w_g    (3.64)
    \frac{1}{G}\sum_{g=1}^{G} w_g'\hat\Omega_g^{-1}w_g \to_p A    (3.65)

Next,

    \frac{1}{\sqrt G}\sum_{g=1}^{G} w_g'\hat\Omega_g^{-1}u_g = \frac{1}{\sqrt G}\sum_{g=1}^{G} w_g'(\Omega_g^{-1} + o_p(1))u_g    (3.66)
    = \frac{1}{\sqrt G}\sum_{g=1}^{G} w_g'\Omega_g^{-1}u_g + o_p(1)    (3.67)

Thus we will have

    \frac{1}{\sqrt G}\sum_{g=1}^{G} w_g'\hat\Omega_g^{-1}u_g \to_d N(0, B)    (3.68)

and the asymptotic result follows.

3.5.1.1 Adjusting for First-Step Estimation

We can further tease out the asymptotic properties of \hat\theta_1(F.sp.CF) from our GMM framework; its asymptotic variance adjusts for the first-step estimation. We have:

    S_G(\hat\theta(F.sp.CF)) = \begin{bmatrix} S_{G1}(\hat\theta_1(F.sp.CF), Z_{2g}; \hat\delta_2) \\ S_{G2}(\hat\delta_2; Z_{2g}) \end{bmatrix} \equiv \begin{bmatrix} \frac{1}{G}\sum_{g=1}^{G} s_{g1}(\hat\theta_1(F.sp.CF), Z_{2g}; \hat\delta_2) \\ \frac{1}{G}\sum_{g=1}^{G} s_{g2}(\hat\delta_2; Z_{2g}) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}    (3.69)

Next, define

    H_{g1} \equiv \frac{\partial s_{g1}(\theta_{1*}, Z_{2g}; \delta_{2*})}{\partial\theta_1'}, \quad H_{g2} \equiv \frac{\partial s_{g1}(\theta_{1*}, Z_{2g}; \delta_{2*})}{\partial\delta_2'}    (3.70)

Also, using a mean value expansion,

    \frac{1}{G}\sum_{g=1}^{G} s_{g1}(\theta_{1*}, Z_{2g}; \hat\delta_2) = \frac{1}{G}\sum_{g=1}^{G} s_{g1}(\theta_{1*}, Z_{2g}; \delta_{2*}) + \left[ \frac{1}{G}\sum_{g=1}^{G} H_{g2} \right] (\hat\delta_2 - \delta_{2*}) + o_p(1)    (3.71)

Doing similar calculations as before, we get:

    \sqrt G (\hat\theta_1(F.sp.CF) - \theta_{1*}) = [A_1]^{-1} \frac{1}{\sqrt G}\sum_{g=1}^{G} \left\{ s_{g1}(\theta_{1*}, Z_{2g}; \delta_{2*}) \right\} + [A_1]^{-1} F \sqrt G (\hat\delta_2 - \delta_{2*}) + o_p(1)    (3.72)

where A_1 \equiv \lim_{G \to \infty} \frac{1}{G}\sum_{g=1}^{G} \{ -E[H_{g1}] \} and

    F \equiv \lim_{G \to \infty} -\frac{1}{G}\sum_{g=1}^{G} E[H_{g2}]

Next, we have a first-order representation of \sqrt G (\hat\delta_2 - \delta_{2*}):

    \sqrt G (\hat\delta_2 - \delta_{2*}) = \frac{1}{\sqrt G}\sum_{g=1}^{G} s_g^{(2)}(\delta_{2*}, Z_{2g}) + o_p(1)    (3.73)

where s_g^{(2)}(\delta_{2*}, Z_{2g}) depends on the first-step estimation and

    E[s_g^{(2)}(\delta_{2*}, Z_{2g})] = 0    (3.74)

Now, bringing together equations (3.70), (3.71) and (3.73), we have:

    \sqrt G (\hat\theta_1(F.sp.CF) - \theta_{1*}) = [A_1]^{-1} \frac{1}{\sqrt G}\sum_{g=1}^{G} \left[ s_{g1}(\theta_{1*}, Z_{2g}; \delta_{2*}) + F s_g^{(2)}(\delta_{2*}, Z_{2g}) \right] + o_p(1)    (3.75)
    \equiv [A_1]^{-1} \frac{1}{\sqrt G}\sum_{g=1}^{G} s_g^{(1)}(\theta_{1*}, Z_{2g}; \delta_{2*}) + o_p(1)    (3.76)

where s_g^{(1)}(\theta_{1*}, Z_{2g}; \delta_{2*}) \equiv s_{g1}(\theta_{1*}, Z_{2g}; \delta_{2*}) + F s_g^{(2)}(\delta_{2*}, Z_{2g}). Next, define

    B_1 \equiv \lim_{G \to \infty} E\left\{ \frac{1}{G} \left[ \sum_{g=1}^{G} s_g^{(1)}(\theta_{1*}, Z_{2g}; \delta_{2*}) \right] \left[ \sum_{g=1}^{G} s_g^{(1)}(\theta_{1*}, Z_{2g}; \delta_{2*}) \right]' \right\}    (3.77)

Using the law of large numbers and the central limit theorem for the mixing sequence, we can follow the steps above to establish the following:

Theorem 8 Under Assumptions 1-10,

    \sqrt G (\hat\theta_1(F.sp.CF) - \theta_{1*}) \to_d N(0, A_1^{-1}B_1A_1^{-1})    (3.78)

As in Lu and Wooldridge (2017), these results allow for non-normal errors and for three kinds of mis-specification of the error variance-covariance matrix. They allow for the mis-specification that the Quasi-GLS procedure makes by ignoring the correlations between individuals across groups.
Further, these results also allow for mis-specification of the structure of \Lambda_{g\eta}, as well as inconsistency in the estimation of the spatial parameters.

3.5.1.2 A Consistent Variance Estimator Robust to the Cross-Sectional Structure

To facilitate valid inference with the estimator described in this paper, we use the heteroskedasticity and autocorrelation consistent (HAC) estimator suggested in Lu and Wooldridge (2017). Define

    \tilde u_g \equiv y_g - w_g\hat\theta(F.sp.CF)    (3.79)

and let

    \hat u_g = w_g'\hat\Omega_g^{-1}\tilde u_g    (3.80)

The kernel function given in Lu and Wooldridge (2017), denoted K_{BC}(d_{g_i h_j}), is a function of the distance between the i-th observation in group g, denoted g_i, and the j-th observation in group h, denoted h_j:

    K_{BC}(d_{g_i h_j}) = \begin{cases} 1 - d_{g_i h_j}/d^* & d_{g_i h_j} \le d^* \\ 0 & d_{g_i h_j} > d^* \end{cases}    (3.81)

The robust variance estimator is then

    \widehat{Avar}\left( \hat\theta(F.sp.CF) \right)_{T.S.robust} = \left[ \sum_{g=1}^{G} w_g'\hat\Omega_g^{-1}w_g \right]^{-1} \left[ \sum_{g=1}^{G} \sum_{h=1}^{G} K_{BC}(d_{gh}) \hat u_g \hat u_h' \right] \left[ \sum_{g=1}^{G} w_g'\hat\Omega_g^{-1}w_g \right]^{-1}    (3.82)

where K_{BC}(d_{gh}) is the kernel matrix defined for groups g and h.

3.5.2 Asymptotics for 2SLS, Grouped 2SLS and Spatial GIV

Once we establish the assumptions on the nature of the spatial dependence, the results of Jenish and Prucha (2012) can be applied to obtain the asymptotic properties of the other estimators described in the paper. The arguments of the proofs are similar to those given in the previous section.
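Before restating the remaining estimators, the kernel-weighted ("tapered-sandwich") variance calculation of Section 3.5.1.2 can be sketched as follows. This is an illustrative implementation only; all names and array shapes are hypothetical:

```python
import numpy as np

def bartlett(d, dstar):
    """K_BC(d) = 1 - d/d* for d <= d*, and 0 beyond d*."""
    return np.maximum(0.0, 1.0 - d / dstar)

def hac_avar(W, Ominv, resid, dist, dstar):
    """Kernel-weighted sandwich variance sketch.

    W     : length-G list of (L, k) regressor matrices w_g
    Ominv : length-G list of (L, L) inverted weighting matrices
    resid : length-G list of (L,) residual vectors u~_g
    dist  : (G, G) symmetric matrix of between-group distances d_gh
    """
    G = len(W)
    # Group-level scores u^_g = w_g' Omega_g^{-1} u~_g
    score = [w.T @ O @ r for w, O, r in zip(W, Ominv, resid)]
    # "Bread": sum_g w_g' Omega_g^{-1} w_g
    A = sum(w.T @ O @ w for w, O in zip(W, Ominv))
    # "Meat": kernel-weighted double sum over group pairs
    B = sum(bartlett(dist[g, h], dstar) * np.outer(score[g], score[h])
            for g in range(G) for h in range(G))
    Ainv = np.linalg.inv(A)
    return Ainv @ B @ Ainv
```

Because the kernel is symmetric in (g, h), the resulting variance matrix is symmetric by construction.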
Re-writing the estimators, \hat\beta(2SLS), \hat\beta(grpd.2SLS) and \hat\beta(F.sp.GIV) are as given in equations (3.28), (3.29) and (3.32), respectively. Using the arguments described in the previous section, the estimators are consistent and asymptotically normal, with asymptotic variances given as follows.

For the traditional 2SLS estimator:

    Avar\left[ \sqrt N (\hat\beta(2SLS) - \beta) \right] = [A_{2SLS}]^{-1}[B_{2SLS}][A_{2SLS}]^{-1}    (3.83)

where

    A_{2SLS} \equiv \left[ \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} E(z_{1i}'x_{2i}) \right] \left[ \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} E(x_{2i}'x_{2i}) \right]^{-1} \left[ \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} E(x_{2i}'z_{1i}) \right]
    B_{2SLS} \equiv \left\{ \left[ \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} E(z_{1i}'x_{2i}) \right] \left[ \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} E(x_{2i}'x_{2i}) \right]^{-1} \right\} Var\left[ \frac{1}{\sqrt N}\sum_{i=1}^{N} x_{2i}'u_i \right] \left\{ \cdot \right\}'

For the Grouped 2SLS estimator:

    Avar\left[ \sqrt G (\hat\beta(grpd.2SLS) - \beta) \right] = [A_{grpd.2SLS}]^{-1}[B_{grpd.2SLS}][A_{grpd.2SLS}]^{-1}    (3.84)

where

    A_{grpd.2SLS} \equiv \left[ \lim_{G\to\infty} \frac{1}{G}\sum_{g=1}^{G} E(Z_{1g}'Z_{2g}) \right] \left[ \lim_{G\to\infty} \frac{1}{G}\sum_{g=1}^{G} E(Z_{2g}'Z_{2g}) \right]^{-1} \left[ \lim_{G\to\infty} \frac{1}{G}\sum_{g=1}^{G} E(Z_{2g}'Z_{1g}) \right]
    B_{grpd.2SLS} \equiv \left\{ \left[ \lim_{G\to\infty} \frac{1}{G}\sum_{g=1}^{G} E(Z_{1g}'Z_{2g}) \right] \left[ \lim_{G\to\infty} \frac{1}{G}\sum_{g=1}^{G} E(Z_{2g}'Z_{2g}) \right]^{-1} \right\} Var\left[ \frac{1}{\sqrt G}\sum_{g=1}^{G} Z_{2g}'U_g \right] \left\{ \cdot \right\}'

Finally, for the Feasible Spatial GIV estimator:

    Avar\left[ \sqrt G (\hat\beta(F.sp.GIV) - \beta) \right] = [A_{F.sp.GIV}]^{-1}[B_{F.sp.GIV}][A_{F.sp.GIV}]^{-1}    (3.85)

where

    A_{F.sp.GIV} \equiv \lim_{G\to\infty} \frac{1}{G}\sum_{g=1}^{G} E(Z_{1g}'\Lambda_{UU,g}^{-1}Z_{2g}) \left[ E(Z_{2g}'\Lambda_{UU,g}^{-1}Z_{2g}) \right]^{-1} E(Z_{2g}'\Lambda_{UU,g}^{-1}Z_{1g})
    B_{F.sp.GIV} \equiv Var\left[ \frac{1}{\sqrt G}\sum_{g=1}^{G} E(Z_{1g}'\Lambda_{UU,g}^{-1}Z_{2g}) \left[ E(Z_{2g}'\Lambda_{UU,g}^{-1}Z_{2g}) \right]^{-1} Z_{2g}'\Lambda_{UU,g}^{-1}U_g \right]

3.6 Monte Carlo Simulations

In this section, we illustrate the properties of our estimators using Monte Carlo simulations. The simulation results show the finite sample properties of the estimators, which we compare with the traditional 2SLS estimator.

3.6.1 Data Generating Process

The total number of observations is N. The data are generated on a \sqrt N \times \sqrt N lattice (Figure 1); in other words, each observation is located at an intersection of the square lattice. This generates a coordinate system for the entire data set, {(l_1, l_2) : l_1, l_2 = 1, 2, 3, ..., \sqrt N}, where (l_1, l_2) can be interpreted as the "longitude" and "latitude" of each observation. We consider N = 400, so that the observations lie on a 20 \times 20 lattice. The pairwise distance between observations i and j is calculated from these coordinates.
We can use several distance measures, such as the Euclidean distance, the Manhattan distance, or the maximum coordinate-wise distance, among others. In this paper, we consider only the Euclidean distance, for simplicity.

3.6.2 Model

The primary equation for the outcome variable is defined as:

    y_{1i} = \beta_{11} + \tau y_{2i} + u_i + N(0,1)    (3.86)
    \{\beta_{11}, \tau\} \equiv \{1, 1\}    (3.87)

where u_i, i = 1, 2, ..., N, is a spatially correlated vector of unobservables. We consider three models for the endogenous variable y_{2i}:

    M1: y_{2i} = 1 + 3 x_{2i} + \rho u_i + N(0,1)    (3.88)
    M2: y_{2i} = 1 + 3 x_{2i} + 2 x_{2,i+1} + \rho u_i + N(0,1)    (3.89)
    M3: y_{2i} = 1 + 3 x_{2i} + 3 x_{2,i+1} + \rho u_i + N(0,1)    (3.90)

where x_{2i} is an exogenous variable generated as a spatially correlated random variable, and x_{2,i+1} is the exogenous variable of the nearest neighbor of observation i. We include the latter in the reduced form of y_{2i} to increase the strength of the additional instruments obtained when dividing the data into groups. The parameter \rho is the level of endogeneity, captured as the relationship between y_{2i} and u_i. To study the effect of increasing endogeneity on our estimation, we consider two levels:

    \rho = 1, 3    (3.91)

3.6.3 Spatial Correlations

To generate a spatially correlated random vector U_N, we use the negative exponential functional form for the spatial correlation structure. Specifically, each element of the variance-covariance matrix of a spatially correlated random variable is defined as

    \Lambda_{u,N}(i,j) = \sigma \exp\left( \frac{-d_{ij}}{\lambda_u} \right), \quad \sigma \equiv 1    (3.92)

where \lambda_u is the spatial parameter that reflects how quickly the correlations decrease with increasing distance.
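Concretely, the lattice coordinates, the pairwise Euclidean distances, and a spatially correlated draw can be generated as follows. This is a sketch of the DGP only; the small diagonal jitter is numerical safeguarding for the Cholesky factorization, not part of the model:

```python
import numpy as np

rng = np.random.default_rng(0)
side = 20                                    # 20 x 20 lattice, so N = 400
N = side * side
xx, yy = np.meshgrid(np.arange(1, side + 1), np.arange(1, side + 1))
coords = np.column_stack([xx.ravel(), yy.ravel()])   # ("longitude", "latitude")

# Pairwise Euclidean distances between lattice points
diff = coords[:, None, :] - coords[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

# Negative exponential covariance (sigma = 1) and a correlated draw
lam_u = 0.5
Lam = np.exp(-D / lam_u)
u = np.linalg.cholesky(Lam + 1e-10 * np.eye(N)) @ rng.normal(size=N)
```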
A spatial correlated random vector is generated as: Λ1/2 u,N N (0,1) (3.93) u,N is the Cholesky Decomposition and N (0,1) is a vector of i.i.d standard normal where Λ1/2 variables To study the effect of increasing spatial dependence on our estimation, we consider follow- ing different levels of λu: λu = 0.1,0.5,0.8 (3.94) The spatial parameter for X2N is 0.5 and X2N is generated in the same way using the Cholesky Decomposition and standard normal distribution. 3.6.4 Results We estimate the model using Tradional 2SLS, Grouped 2SLS, Spatial GIV and Spatial Control Function estimation methods. For the grouped estimation, we divide the data set into groups with each group containing 2 observations. We report the robust standard errors generated in the way described in section 3.5. Table 3.1 lists the results of model M1, Table 3.2 lists the results of model M2 and Table 3.3 lists the results of model M3. We list the Mean, Standard Deviation, Standard Errors made robust to spatial correlation, Bias, Root MSE, Rejection rate and Coverage Probability of 95 percent confidence intervals. The rejection rates and the confidence intervals are calculated for the 95 percent confidence levels. As we can see, Grouped 2SLS, Spatial GIV and Spatial CF estimators give standard deviations that are almost 40 percent less than the standard deviation of the Traditional 2SLS even though they use only the nearest neighbor. 109 Comparing the performance of Grouped 2SLS, Spatial GIV and Spatial Cf, we find that the results are comparable to each other. However, Spatial GIV performs better in some cases both in terms of point estimates and inference. 110 ρ=1,λ =.1 2SLS G2SLS Sp. GIV Sp. CF ρ=1,λ =.5 2SLS G2SLS Sp. GIV Sp. CF ρ=1,λ =.8 2SLS Grpd. 2SLS Sp. GIV Sp. CF ρ=3,λ =0.1 2SLS G2SLS Sp. GIV Sp. CF ρ=3,λ =0.5 2SLS G2SLS Sp. GIV Sp. CF ρ=3,λ =0.8 2SLS G2SLS Sp. GIV Sp. 
Table 3.1: M1, y2i = 1 + 3*x2i + ρ*ui + N(0,1)

                Mean      Bias       SD        SE        RMSE      Cov
ρ=1, λ=0.1
  2SLS          0.99919   0.00081    0.054815  0.049795  0.054794  0.958
  G2SLS         0.9995    0.000505   0.05479   0.049792  0.054764  0.959
  Sp. GIV       0.99952   0.000476   0.054769  0.049793  0.054744  0.958
  Sp. CF        0.99961   0.000388   0.054997  0.056054  0.054971  0.979
ρ=1, λ=0.5
  2SLS          1.0015    -0.0015    0.064765  0.05694   0.06475   0.946
  G2SLS         1.0018    -0.00179   0.064777  0.056908  0.064769  0.947
  Sp. GIV       1.0013    -0.00132   0.06395   0.05642   0.063932  0.946
  Sp. CF        1.0014    -0.00136   0.064082  0.057404  0.064064  0.965
ρ=1, λ=0.8
  2SLS          0.99641   0.003591   0.079801  0.067279  0.079842  0.947
  G2SLS         0.99669   0.003313   0.07972   0.067256  0.079749  0.947
  Sp. GIV       0.99745   0.002547   0.074992  0.064722  0.074998  0.948
  Sp. CF        0.99748   0.002516   0.075113  0.058463  0.075117  0.941
ρ=3, λ=0.1
  2SLS          0.99668   0.003319   0.055837  0.049877  0.055908  0.951
  G2SLS         0.99753   0.002473   0.05564   0.049824  0.055667  0.956
  Sp. GIV       0.99751   0.002487   0.055652  0.04983   0.05568   0.956
  Sp. CF        0.99743   0.002572   0.056075  0.057293  0.056105  0.983
ρ=3, λ=0.5
  2SLS          1.0012    -0.00123   0.064568  0.056942  0.064547  0.957
  G2SLS         1.0022    -0.00216   0.064487  0.056824  0.064491  0.956
  Sp. GIV       1.0023    -0.00231   0.063988  0.05646   0.063998  0.961
  Sp. CF        1.0022    -0.00216   0.064128  0.0586    0.064132  0.976
ρ=3, λ=0.8
  2SLS          0.99737   0.002632   0.079359  0.067875  0.079363  0.947
  G2SLS         0.99835   0.001655   0.079082  0.067655  0.079059  0.949
  Sp. GIV       0.99923   0.000767   0.075195  0.065005  0.075161  0.95
  Sp. CF        0.99918   0.000818   0.075495  0.060856  0.075462  0.96

Table 3.2: M2, y2i = 1 + 3*x2i + 2*x(2,i+1) + ρ*ui + N(0,1)

                Mean      Bias       SD        SE        RMSE      Cov
ρ=1, λ=0.1
  2SLS          0.99896   0.001043   0.039498  0.036398  0.039492  0.958
  G2SLS         0.99898   0.001017   0.036695  0.033626  0.03669   0.963
  Sp. GIV       0.99898   0.001016   0.036693  0.033627  0.036689  0.963
  Sp. CF        0.99909   0.000914   0.036719  0.037769  0.036712  0.98
ρ=1, λ=0.5
  2SLS          1.0019    -0.00187   0.047544  0.041376  0.047557  0.956
  G2SLS         1.0022    -0.00217   0.045458  0.039112  0.045487  0.951
  Sp. GIV       1.0022    -0.00219   0.045453  0.039126  0.045483  0.95
  Sp. CF        1.0021    -0.00214   0.045482  0.039987  0.04551   0.961
ρ=1, λ=0.8
  2SLS          0.99961   0.000392   0.058086  0.04803   0.058058  0.931
  G2SLS         0.99969   0.000309   0.057516  0.046503  0.057488  0.93
  Sp. GIV       0.99974   0.000259   0.057342  0.046516  0.057314  0.932
  Sp. CF        0.99969   0.000305   0.057457  0.042024  0.057429  0.932
ρ=3, λ=0.1
  2SLS          0.99834   0.001658   0.037525  0.035985  0.037543  0.961
  G2SLS         0.99881   0.001193   0.03528   0.033241  0.035283  0.961
  Sp. GIV       0.99882   0.001183   0.035274  0.033242  0.035276  0.961
  Sp. CF        0.99878   0.001218   0.035336  0.038309  0.035339  0.983
ρ=3, λ=0.5
  2SLS          0.99857   0.001427   0.049543  0.04195   0.049539  0.949
  G2SLS         0.99874   0.001262   0.047417  0.039778  0.04741   0.951
  Sp. GIV       0.99887   0.00113    0.047432  0.039793  0.047422  0.952
  Sp. CF        0.9989    0.001096   0.047558  0.040621  0.047547  0.954
ρ=3, λ=0.8
  2SLS          1.0031    -0.00312   0.058043  0.048244  0.058097  0.954
  G2SLS         1.0033    -0.00332   0.056827  0.046631  0.056895  0.951
  Sp. GIV       1.0036    -0.00364   0.056699  0.046655  0.056787  0.955
  Sp. CF        1.0036    -0.00362   0.056764  0.043687  0.056851  0.952

Table 3.3: M3, y2i = 1 + 3*x2i + 3*x(2,i+1) + ρ*ui + N(0,1)

                Mean      Bias       SD        SE        RMSE      Cov
ρ=1, λ=0.1
  2SLS          0.99794   0.002064   0.046829  0.043653  0.046851  0.963
  G2SLS         0.99761   0.002393   0.035071  0.032641  0.035135  0.958
  Sp. GIV       0.99761   0.002389   0.035071  0.032641  0.035135  0.958
  Sp. CF        0.99761   0.002388   0.035042  0.036407  0.035106  0.973
ρ=1, λ=0.5
  2SLS          0.99786   0.002138   0.051009  0.044939  0.051028  0.96
  G2SLS         0.9971    0.002899   0.041219  0.035274  0.0413    0.948
  Sp. GIV       0.99713   0.002867   0.041216  0.035272  0.041295  0.948
  Sp. CF        0.99717   0.002833   0.041251  0.038399  0.041328  0.969
ρ=1, λ=0.8
  2SLS          0.99773   0.002267   0.052182  0.046249  0.052205  0.958
  G2SLS         1.0001    -0.0001    0.04499   0.038229  0.044968  0.956
  Sp. GIV       1.0002    -0.00016   0.044975  0.038228  0.044953  0.956
  Sp. CF        1.0002    -0.00017   0.044902  0.040531  0.04488   0.971
ρ=3, λ=0.1
  2SLS          1.0019    -0.00186   0.047362  0.043963  0.047374  0.962
  G2SLS         1.0013    -0.00133   0.034815  0.032403  0.034823  0.967
  Sp. GIV       1.0013    -0.00134   0.034815  0.032403  0.034823  0.967
  Sp. CF        1.0013    -0.00134   0.035018  0.037006  0.035026  0.982
ρ=3, λ=0.5
  2SLS          1.0017    -0.00174   0.050229  0.04479   0.050234  0.962
  G2SLS         1.0018    -0.00176   0.039554  0.035241  0.039573  0.959
  Sp. GIV       1.0018    -0.00185   0.039535  0.035239  0.039558  0.958
  Sp. CF        1.0018    -0.00177   0.03953   0.039045  0.03955   0.982
ρ=3, λ=0.8
  2SLS          1.0001    -0.00011   0.051082  0.04666   0.051057  0.958
  G2SLS         0.99973   0.000273   0.043154  0.038117  0.043134  0.954
  Sp. GIV       0.99994   0.0000626  0.043109  0.038117  0.043088  0.955
  Sp. CF        0.99999   0.0000129  0.042903  0.041549  0.042881  0.974

3.6.5 Performance of Spatial GIV

Since the Spatial GIV estimator performs better than the Spatial CF estimator when no additional assumptions are imposed, we conduct another set of Monte Carlo simulations to analyze its properties. Specifically, the outcome variable y1i is generated as in (3.85). The model for the endogenous variable y2i is given as 2*x2i + ρ*ui + N(0,1), where ρ indexes the degree of endogeneity and takes the values 1 and 3. To highlight the benefit of incorporating the within-group spatial correlation between observations, we increase the group size to L = 4 and, for simplicity, drop the Spatial CF estimator. In addition, we use only one additional instrument. Finally, the spatial parameter for x is 0.2, and the spatial parameter for u takes the values 0.1, 0.5, 2.0, and 5.0. The results are given in Table 3.4. As the table shows, once we increase the group size, the Spatial GIV estimator yields substantial efficiency gains even in the presence of high spatial correlation.
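To make the Monte Carlo design concrete, the following Python sketch simulates grouped data with within-group spatially correlated errors and an endogenous regressor, then computes the summary statistics reported in the tables (Monte Carlo mean, bias as true value minus mean, standard deviation, and RMSE). This is an illustrative reconstruction, not the dissertation's code: the exponential covariance exp(-λ·d), the coefficient values, the just-identified 2SLS estimator, and all function names are assumptions.

```python
import numpy as np

def simulate_group(L, rho, lam, rng):
    """One group of L observations with spatially correlated errors u.

    Within-group correlation decays with pairwise distance d through the
    (assumed) exponential covariance exp(-lam * d)."""
    loc = rng.uniform(0.0, 1.0, size=L)              # locations on a line
    d = np.abs(loc[:, None] - loc[None, :])          # pairwise distances
    u = rng.multivariate_normal(np.zeros(L), np.exp(-lam * d))
    z = rng.normal(size=L)                           # instrument
    y2 = 1.0 + 2.0 * z + rho * u + rng.normal(size=L)  # endogenous regressor
    y1 = 1.0 + 1.0 * y2 + u                          # outcome; true beta = 1
    return y1, y2, z

def tsls_slope(y1, y2, z):
    """Just-identified 2SLS slope: cov(z, y1) / cov(z, y2)."""
    zc = z - z.mean()
    return (zc @ (y1 - y1.mean())) / (zc @ (y2 - y2.mean()))

def monte_carlo(reps=200, groups=100, L=4, rho=1.0, lam=0.5, seed=0):
    """Return (mean, bias, SD, RMSE) of the 2SLS slope across replications."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        draws = [simulate_group(L, rho, lam, rng) for _ in range(groups)]
        y1, y2, z = (np.concatenate([g[i] for g in draws]) for i in range(3))
        est[r] = tsls_slope(y1, y2, z)
    # bias is reported as (true value - Monte Carlo mean), matching the tables
    return (est.mean(), 1.0 - est.mean(), est.std(ddof=1),
            np.sqrt(np.mean((est - 1.0) ** 2)))

mean, bias, sd, rmse = monte_carlo()
```

With this sign convention a Monte Carlo mean of 0.999 corresponds to a bias of about +0.001, which is how the Bias columns in Tables 3.1 through 3.4 line up with the Mean columns.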
Table 3.4: Performance of Spatial GIV

                Mean      Bias       SD        SE        RMSE
ρ=1, λ=0.1
  2SLS          0.9956    0.004438   0.0783    0.072     0.0783
  G2SLS         0.9965    0.003528   0.078     0.0719    0.078
  Sp. GIV       0.9965    0.003496   0.0655    0.0624    0.0655
ρ=1, λ=0.5
  2SLS          0.9957    0.00434    0.0781    0.072     0.0781
  G2SLS         0.9966    0.003421   0.0777    0.0719    0.0777
  Sp. GIV       0.9967    0.003269   0.0644    0.0617    0.0644
ρ=1, λ=2.0
  2SLS          0.9958    0.004207   0.0776    0.0718    0.0777
  G2SLS         0.9967    0.003275   0.0773    0.0717    0.0773
  Sp. GIV       0.9971    0.002886   0.0607    0.0594    0.0607
ρ=1, λ=5.0
  2SLS          0.9956    0.00443    0.0774    0.0715    0.0774
  G2SLS         0.9964    0.003607   0.0771    0.0713    0.0771
  Sp. GIV       0.9971    0.002864   0.059     0.0584    0.059
ρ=3, λ=0.1
  2SLS          0.9941    0.005869   0.0789    0.0727    0.0791
  G2SLS         0.9967    0.003284   0.0781    0.0723    0.0781
  Sp. GIV       0.996     0.00397    0.0657    0.0629    0.0657
ρ=3, λ=0.5
  2SLS          0.9943    0.005742   0.0786    0.0727    0.0788
  G2SLS         0.9969    0.003136   0.0776    0.0722    0.0776
  Sp. GIV       0.9964    0.003634   0.0646    0.0621    0.0646
ρ=3, λ=2.0
  2SLS          0.9944    0.005576   0.0782    0.0725    0.0783
  G2SLS         0.9971    0.002937   0.0769    0.0718    0.0769
  Sp. GIV       0.9972    0.002784   0.0609    0.0596    0.0609
ρ=3, λ=5.0
  2SLS          0.9943    0.005665   0.0778    0.0721    0.0779
  G2SLS         0.9966    0.003374   0.0766    0.0714    0.0766
  Sp. GIV       0.9974    0.002618   0.0592    0.0584    0.0592

3.7 Conclusion and Future Research

In this paper, we propose a computationally simple estimation procedure to account for spatially correlated errors in linear econometric models with endogenous variables in a cross-sectional dataset. We show that dividing the observations into groups according to the distances between them, and then accounting only for the spatial dependence of observations within a group, yields noticeable efficiency gains even though the correlations between across-group observations are ignored.
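The grouping step just described can be sketched as follows. The greedy nearest-neighbour rule and the function name make_groups are illustrative assumptions; the procedure only requires that groups be formed from the pairwise distances between observations.

```python
import numpy as np

def make_groups(coords, L=2):
    """Greedy grouping by proximity: repeatedly take the first unassigned
    observation together with its L-1 nearest unassigned neighbours."""
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    unassigned = list(range(n))
    groups = []
    while len(unassigned) >= L:
        i = unassigned.pop(0)
        unassigned.sort(key=lambda j: dist[i, j])   # nearest first
        groups.append([i] + unassigned[:L - 1])
        unassigned = unassigned[L - 1:]
    return groups

# two tight clusters -> two groups of nearby points
coords = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(make_groups(coords, L=2))  # → [[0, 1], [2, 3]]
```

Because only within-group dependence is modelled afterwards, the grouping rule mainly needs to place nearby observations together; a more sophisticated matching would tighten groups further but is not required by the procedure.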
We suggest a control function approach to account for endogeneity in the model, which yields intuitive estimating equations. The estimation procedure in this paper provides an empirical researcher with a powerful tool that delivers more precise estimators while avoiding the tedious calculations that become unavoidable when one attempts to account for the entire correlation structure of all the observations.

The estimation strategy described in this paper can also be extended to incorporate nonparametric methods of estimation. For example, we can use the method of sieves in the second step of our estimation procedure. To motivate a sieve estimator in the second step, consider the case where L = 2. In that case, Πg is a deterministic function of dg, the distance between the two observations in group g. We can allow Πg to be a flexible function of dg, which motivates a sieve approximation of Πg. Further, when we have two observations in a group, sieve methods can also be implemented to estimate the spatial dependence without restricting the functional form of the spatial correlation structure. Since the correlation between the two observations in each group depends on the spatial parameter λη and the distance dg, we can approximate the correlation function using the method of sieves. A formal description of the sieve methods in our estimation procedure and the corresponding asymptotic analysis is left for future research.

BIBLIOGRAPHY

• Adamchik, V., & Bedi, A. S. (2000). Wage differentials between the public and the private sectors: Evidence from an economy in transition. Labour Economics, 7, 203–224.
• Altonji, J. G., Elder, T. E., & Taber, C. R. (2005). An evaluation of instrumental variable strategies for estimating the effects of Catholic schooling. Journal of Human Resources, 40(4), 791–821.
• Arbia, G.
(2006). Spatial Econometrics: Statistical Foundations and Applications to Regional Convergence. Springer.
• Biorn, E. (1981). Estimating economic relations from incomplete cross-section/time-series data. Journal of Econometrics, 16(2), 221–236.
• Baltagi, B. H. (1985). Pooling cross-sections with unequal time-series lengths. Economics Letters, 18(2), 133–136.
• Baltagi, B. H. (2001). Econometric Analysis of Panel Data, second edition. West Sussex, UK: Wiley.
• Baltagi, B. H., & Chang, Y. J. (1994). Incomplete panels: A comparative study of alternative estimators for the unbalanced one-way error component regression model. Journal of Econometrics, 62(2), 67–89.
• Baltagi, B. H., Song, S. H., & Jung, B. C. (2002). A comparative study of alternative estimators for the unbalanced two-way error component regression model. The Econometrics Journal, 5(2), 480–493.
• Baltagi, B. H., & Song, S. H. (2006). Unbalanced panel data: A survey. Statistical Papers, 47(4), 493–523.
• Bell, K. P., & Bockstael, N. E. (2000). Applying the generalized-moments estimation approach to spatial problems involving micro-level data. The Review of Economics and Statistics, 82(1), 72–82.
• Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. Handbook of Econometrics, 6, 5549–5632.
• Chen, S., & Khan, S. (2003). Semiparametric estimation of a heteroskedastic sample selection model. Econometric Theory, 19(06), 1040–1064.
• Christakos, G. (1987). On the problem of permissible covariance and variogram models. Water Resources Research, 20, 251–265.
• Conley, T. G. (1999). GMM estimation with cross sectional dependence. Journal of Econometrics, 92(1), 1–45.
• Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons.
• Cressie, N., & Lahiri, S. N. (1993). The asymptotic distribution of REML estimators. Journal of Multivariate Analysis, 45, 217–233.
• Donald, S. G. (1995). Two-step estimation of heteroskedastic sample selection models. Journal of Econometrics, 65(2), 347–380.
• Dubin, R. A. (1988). Estimation of regression coefficients in the presence of spatially autocorrelated error terms. The Review of Economics and Statistics, 70(3), 466–474.
• Dubin, R. A. (1998). Spatial autocorrelation: A primer. Journal of Housing Economics, 7(4), 304–327.
• Glass, A. J., Kenjegalieva, K., & Sickles, R. (2012). The economic case for the spatial error model with an application to state vehicle usage in the US. Working paper.
• Guggenberger, P. (2010). The impact of a Hausman pretest on the size of a hypothesis test: The panel data case. Journal of Econometrics, 156(2), 337–343.
• Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, 46(6), 1251–1271.
• Hausman, J. A., & Taylor, W. E. (1981). Panel data and unobservable individual effects. Econometrica, 49(6), 1377–1398.
• Hsiao, C., Shen, Y., Wang, B., & Weeks, G. (2008). Evaluating the effectiveness of Washington state repeated job search services on the employment rate of prime-age female welfare recipients. Journal of Econometrics, 145(1), 98–108.
• Hahn, J., Liao, Z., & Ridder, G. (2018). Nonparametric two-step sieve M estimation and inference. Econometric Theory, 1–44.
• Hansen, L. P., & Richard, S. F. (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica, 55(3), 587–613.
• Kelejian, H. H., & Prucha, I. R. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review, 40, 509–533.
• Kelejian, H. H., & Prucha, I. R. (2001).
On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics, 104, 219–257.
• Kelejian, H. H., & Prucha, I. R. (2007). HAC estimation in a spatial framework. Journal of Econometrics, 140(1), 131–154.
• Kim, K. I. (2013). An alternative efficient estimation of average treatment effects. Journal of Market Economy, 42(3), 1–41.
• Kyriazidou, E. (1997). Estimation of a panel data sample selection model. Econometrica, 65(6), 1335–1364.
• Lee, L.-F. (1978). Unionism and wage rates: A simultaneous equations model with qualitative and limited dependent variables. International Economic Review, 19, 415–433.
• Lee, L.-F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72, 1899–1925.
• Lu, C., & Wooldridge, J. M. (2017). Quasi-generalized least squares regression estimation with spatial data. Economics Letters, 156, 138–141.
• Lu, C. (2013). Linear and nonlinear estimation with spatial data.
• Maddala, G. S. (1983). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
• Mardia, K. V., & Marshall, R. J. (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71, 135–146.
• Mincer, J., & Polachek, S. (1974). Family investment in human capital: Earnings of women. Journal of Political Economy, 82 (Supplement), S76–S108.
• Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46(1), 69–85.
• Murtazashvili, I., & Wooldridge, J. M. (2016). A control function approach to estimating switching regression models with endogenous explanatory variables and endogenous switching. Journal of Econometrics, 190(2), 252–266.
• Papke, L. E. (2005). The effects of spending on test pass rates: Evidence from Michigan.
Journal of Public Economics, 89(5), 821–839.
• Papke, L. E., & Wooldridge, J. M. (2008). Panel data methods for fractional response variables with an application to test pass rates. Journal of Econometrics, 145(1), 121–133.
• Rochina-Barrachina, M. E. (1999). A new estimator for panel data sample selection models. Annales d'Economie et de Statistique, 153–181.
• Semykina, A., & Wooldridge, J. M. (2010). Estimating panel data models in the presence of endogeneity and selection. Journal of Econometrics, 157(2), 375–380.
• Stein, M. L. (1999). Interpolation of Spatial Data. New York: Springer-Verlag.
• Trost, R. (1977). Demand for housing: A model based on inter-related choices between owning and renting. Ph.D. dissertation, University of Florida.
• Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46 (Supplement), 234–240.
• Vella, F. (1998). Estimating models with sample selection bias: A survey. Journal of Human Resources, 33(1), 127–169.
• Verbeek, M., & Nijman, T. (1992). Testing for selectivity bias in panel data models. International Economic Review, 33(3), 681–703.
• Wang, H., Iglesias, E. M., & Wooldridge, J. M. (2012). Partial maximum likelihood estimation of spatial probit models. Journal of Econometrics.
• Wansbeek, T., & Kapteyn, A. (1989). Estimation of the error-components model with incomplete panels. Journal of Econometrics, 41(3), 341–361.
• Wooldridge, J. M. (1995). Selection corrections for panel data models under conditional mean independence assumptions. Journal of Econometrics, 68(1), 115–132.
• Wooldridge, J. M. (1999). Distribution-free estimation of some nonlinear panel data models. Journal of Econometrics, 90(1), 77–97.
• Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data, second edition. Cambridge, MA: MIT Press.
• Wooldridge, J. M. (2015). Control function methods in applied econometrics.
Journal of Human Resources, 50(2), 420–445.
• Wooldridge, J. M. (2016). Correlated random effects models with unbalanced panels. Manuscript (version July 2009), Michigan State University.