ESTIMATING NONLINEAR CROSS SECTION AND PANEL DATA MODELS WITH ENDOGENEITY AND HETEROGENEITY

by Hoa Bao Nguyen

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY ECONOMICS 2011

ABSTRACT

The dissertation consists of three chapters that consider the estimation of nonlinear cross section and panel data models. This study contributes to the literature by developing new methods for estimating models with limited dependent variables and endogenous regressors in the presence of unobserved heterogeneity. It also contributes to the field of labor economics by applying the new estimators to the study of female labor supply.

In the first chapter, a fractional response model with a count endogenous regressor is considered. A new estimation method is proposed to handle discrete endogeneity in the presence of unobserved heterogeneity in a nonlinear setting. Two-step Quasi-Maximum Likelihood and Nonlinear Least Squares estimators using adaptive Gauss-Hermite quadrature are proposed. Average partial effects for discrete endogenous variables are obtained despite the difficulty of approximating a conditional mean that has no closed form under non-normal heterogeneity. Monte Carlo simulations show that the new estimators are the least biased and the most efficient among the estimators examined, including existing ones. This is the first study to support the necessity and significance of accounting for count endogeneity. The proposed estimators are applied to analyze US female labor supply. The results show diminishing marginal effects of additional children on women's working hours.
This novel finding is consistent with a story of fertility and presents evidence of economies of scale: mothers become more efficient after raising their first children, devote more time to work, and balance working time and family time.

In the second chapter, a dynamic Tobit panel data model that allows for an endogenous regressor (besides the lagged dependent variable) is developed. I also permit unobserved heterogeneity and serial correlation in the transitory shocks. A correlated random effects Tobit approach, a computationally attractive estimation method, is proposed. The estimation method employs the control function approach to account for endogeneity and to consistently estimate average partial effects. In addition, serial correlation in the reduced form is corrected, which makes the estimator more robust. The method is applied to Panel Study of Income Dynamics data from 1980 to 1992. I find strong evidence of persistence in the working hours of US white women, and the initial condition of female labor supply is statistically significant.

The third chapter considers the estimation of a panel data model with a corner solution response and a dummy endogenous variable as well as heterogeneity. The main contribution is to allow a joint distribution of the binary endogenous regressor and the unobserved factors that affect both the amount and participation equations. A bivariate probit model is suggested in the first stage. An exponential type II Tobit (ET2T) model is used for the amount equation to ensure that the predicted value of the response variable is positive, and the unobserved effects in the amount and participation equations are allowed to be correlated.
The two-step estimation procedure, inspired by Heckman's idea of adding correction terms for endogenous switching and a corner solution outcome, is used to analyze the impact of fertility on female labor force participation and labor supply using the Vietnamese Household Living Standard Surveys data for 2004-2008. The proposed approach gives a statistically significant negative effect of having a newborn on the hours of women who are working and remain in the labor market. It substantially corrects the bias in estimating the effect of a newborn on the mother's working hours compared with alternative estimation methods.

Copyright by Hoa Bao Nguyen 2011

This thesis is dedicated to my family, my husband, Minh Cong Nguyen, and my son, Ton Chi Cong Nguyen.

ACKNOWLEDGEMENTS

I would like to take this opportunity to thank the people who have helped me during the journey to a Ph.D. First, I would like to express my deepest gratitude to my advisor, Professor Jeffrey Wooldridge, a person of great knowledge and an exceptional teacher, for his generous advice, support and excellent training during my work on this dissertation. I would also like to thank my other committee members, Professors Peter Schmidt, Todd Elder and Joseph Gardiner, for their valuable comments and support.

I am very grateful for the support that I received from people at the World Bank, who kindly encouraged me to apply and develop more econometric models and estimation methods for nonlinear panel data with discrete endogenous variables, to be exploited as robust devices in useful applications. I wish to thank the faculty members and graduate students of the Department of Economics at Michigan State University for their useful training in many core branches of economics and for seminar discussions of econometrics topics.

I would like to express my warmest gratitude to my parents and my husband. I am indebted to my father, who motivated me to be a scientist and always encouraged as well as challenged me to make great accomplishments.
I am grateful to my mother and my parents-in-law, who have been there for me and made time for me to focus on my dissertation. I was fortunate to have a beloved husband who has helped me continuously and tremendously during my doctorate. Without his love and support, I would never have made it through. I also want to thank my sister and the other members of my extended family. Last but not least, I wish to thank my baby, Tony, since having him during my graduate study made me recognize many values of life and created my unstoppable determination to obtain the PhD and other future accomplishments. I dedicate this dissertation to my big family.

I would like to thank everyone whom I did not mention specifically but who helped me during my studies.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1 ESTIMATING A FRACTIONAL RESPONSE MODEL WITH A COUNT ENDOGENOUS REGRESSOR
1.1 Introduction
1.2 Theoretical Model - Specification and Estimation
1.2.1 Estimation Procedure
1.2.2 Average Partial Effects
1.2.2.1 The Case with Exogenous Covariates and a Normally Distributed Heterogeneity
1.2.2.2 The Case with a Count Endogenous Covariate and a Non-normally Distributed Heterogeneity
1.3 Monte Carlo Simulations
1.3.1 Estimators
1.3.2 Data Generating Process
1.3.3 Experiment Results
1.3.3.1 Simulation Result with a Strong Instrumental Variable
1.3.3.2 Simulation Result with a Weak Instrumental Variable
1.3.3.3 Simulation Result with Different Sample Sizes
1.3.3.4 Simulation Result with a Misspecified Distribution
1.3.4 Conclusion from the Monte Carlo Simulations
1.4 Application and Estimation Results
1.5 Conclusion

CHAPTER 2 ESTIMATION OF A DYNAMIC TOBIT PANEL DATA WITH AN ENDOGENOUS VARIABLE AND AN APPLICATION TO FEMALE LABOR SUPPLY
2.1 Introduction
2.2 Literature Review
2.3 Model
2.4 Average Partial Effects
2.5 Serial Correlation Correction
2.5.1 Estimation Procedure
2.5.2 Average Partial Effects
2.5.3 Comparison
2.6 Empirical Example
2.6.1 Data
2.6.2 Estimation and Result
2.7 Conclusion

CHAPTER 3 AN EXPONENTIAL TYPE II TOBIT PANEL DATA MODEL WITH BINARY ENDOGENOUS REGRESSOR - APPLICATION TO ESTIMATING THE EFFECT OF FERTILITY ON MOTHERS' LABOR FORCE PARTICIPATION AND LABOR SUPPLY
3.1 Introduction
3.2 Literature Review
3.3 Model and Estimation
3.4 Average Partial Effect
3.5 Empirical Example
3.5.1 Overview of Data
3.5.2 Estimation and Result
3.6 Conclusion

APPENDIX A TABLES FOR CHAPTER 1
APPENDIX B TABLES AND FIGURES FOR CHAPTER 2
APPENDIX C TABLES FOR CHAPTER 3
APPENDIX D TECHNICALITIES FOR CHAPTER 1
D.1 Details of the QML Estimator
D.1.1 Asymptotic Variance for the Two-step Estimator
D.1.2 Asymptotic Variance for the APEs
D.2 Details of the Tobit Model's Estimators
D.3 Formula of the NLS estimation
D.4 Derivation of the Heterogeneity Distribution
APPENDIX E TECHNICALITIES FOR CHAPTER 2
E.1 Asymptotic Variance of the Two-step Estimator
E.2 Asymptotic Variance of the Average Partial Effects
APPENDIX F TECHNICALITIES FOR CHAPTER 3
F.1 Bivariate Probit Model in the First Stage
F.2 Asymptotic Variance of the Two-step Estimator
BIBLIOGRAPHY

LIST OF TABLES

A.1 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)
A.2 Simulation Result of the Coefficient Estimates (N=1000, η1 = 0.5, 500 replications)
A.3 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.1, 500 replications)
A.4 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.9, 500 replications)
A.5 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications, δ23 = 0.3)
A.6 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications, δ23 = 0)
A.7 Simulation Result of the Average Partial Effects Estimates (N=100, η1 = 0.5, 500 replications)
A.8 Simulation Result of the Average Partial Effects Estimates (N=500, η1 = 0.5, 500 replications)
A.9 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)
A.10 Simulation Result of the Average Partial Effects Estimates (N=2000, η1 = 0.5, 500 replications)
A.11 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, a1 is normally distributed, 500 replications)
A.12 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)
A.13 Comparison of analytical and bootstrapping mean of standard errors (N=1000, η1 = 0.5, 200 replications)
A.14 Frequencies of the Number of Children
A.15 Descriptive Statistics
A.16 First-stage Estimates using Instrumental Variables
A.17 Estimates Assuming Number of Kids is Conditionally Exogenous
A.18 Estimates Assuming Number of Kids is Endogenous
B.1 Summary Statistics
B.2 Determinants of Female Working Experience - First stage regressions
B.3 Estimating Dynamic Female Labor Supply, Second Stage Regressions, Experience is Treated as an Endogenous Variable
B.4 Average Partial Effects on Female Labor Supply
C.1 Summary Statistics for the Whole Sample
C.2 Summary Statistics for Each Year in the Panel
C.3 Bivariate Probit Estimates of Fertility and LFP in the First Stage
C.4 Estimates for Log(Female Working Hours) Equation

LIST OF FIGURES

B.1 Distribution of Women's Annual Hours of Work in 1980-1992
B.2 Hours of Work vs. Experience
B.3 Hours of Work vs. Number of Children 0-2
B.4 Hours of Work vs. Number of Children 3-5
B.5 Hours of Work vs. Number of Children 6-17

Chapter 1 ESTIMATING A FRACTIONAL RESPONSE MODEL WITH A COUNT ENDOGENOUS REGRESSOR

1.1 Introduction

Many economic models employ a fraction or a percentage, instead of level values, as a dependent variable. In these models, the economic variables of interest occur as fractions, such as employee participation rates in 401(k) pension plans, firm market shares, and fractions of total weekly hours spent working. These fractional response variables take values in the unit interval [0,1] and have both continuous and discrete characteristics. As suggested in Papke & Wooldridge (1996, 2008), we can model fractional response variables based on a correctly specified conditional mean and use a simple quasi-maximum likelihood estimator (QMLE) with the Bernoulli distribution, or nonlinear least squares (NLS). This method is more attractive than other standard approaches, such as MLE with a beta distribution or the log-odds transformation, because it gives a direct estimate for the original dependent variable and ensures that the predicted value is in the unit interval.

Fractional response models (FRMs) with continuous or binary endogenous variables have been studied (see Papke & Wooldridge (2008) and Wooldridge (2010)). However, there has not been any well-developed estimation method or procedure to deal with count endogeneity in FRMs. Traditionally, a count endogenous explanatory variable (CEEV) is treated as a continuous endogenous variable and written as a linear function of covariates, including instruments, plus an additive error.
A common approach, such as two stage least squares (2SLS) based on this linear approximation, always gives a constant marginal effect. It ignores the fact that the marginal effect of one more unit of the CEEV on the outcome of interest might be larger or smaller than the marginal effect of the previous unit. To acknowledge this fact, we should study FRMs and count endogeneity with a nonlinear approximation in the first stage.

Specifically, we can handle a CEEV by allowing a Poisson distribution for the count variable in the reduced form. The heterogeneity term in the Poisson model is assumed to be correlated with the error term in the structural conditional mean. It is standard to let the heterogeneity follow a gamma distribution (which leads to a gamma error in the reduced form) because it results in a closed form solution, the Negative Binomial (NB) model. The key to correcting for endogeneity in this case is the assumption we are willing to make about the joint distribution of the errors. One strategy is to allow a linear correlation between a transformation of the gamma error, which is then normally distributed, and the error in the structural conditional mean (see further discussion in Weiss (1999)). However, this assumption does not allow a direct relationship between the two errors, which is what governs the endogeneity problem. The transformation function chosen is the inverse of the standard normal cdf, and it depends on unknown parameters in the distribution of the heterogeneity of the Poisson model. This strategy can be used to test for endogeneity of the count explanatory variable, but it makes the evaluation of the likelihood function more computationally demanding, and obtaining the conditional maximum likelihood estimator and its asymptotic covariance matrix is nontrivial and time-consuming.
An alternative strategy is to keep the gamma heterogeneity in the Poisson model and allow a linear, direct correlation between this heterogeneity and the error term in the structural conditional mean. We then need to integrate this heterogeneity out of the structural conditional mean. As discussed in Winkelmann (2000), the heterogeneity in a Poisson model can be represented as an additive correlated error or a multiplicative correlated error. The multiplicative correlated error has some advantage over the additive one on grounds of consistency, so a multiplicative correlated error is used in this model.

Because nonlinearity is allowed in both the reduced form and the main structural equation, a two-step estimation procedure is more attractive. In a simultaneous nonlinear equations model with a count dependent variable and a binary endogenous variable, Terza (1998) proposed a two-stage estimation method using a joint normal distribution of the error terms. He did not carry out the estimation of the Poisson Full-Information Maximum Likelihood (FIML) estimator or explore its properties, even though this approach was introduced in his paper, because of the computational burden of the FIML estimator. This emphasizes the advantage of the two-step estimation procedure that we can employ in this model. However, the joint normal distribution of the error terms is no longer appropriate here because a normal error term in the Poisson model does not lead to a closed form solution. This is why the error terms are specified as discussed above. Besides being computationally easier, the two-step estimation procedure ensures that the predicted values lie in the admissible range. Moreover, we do not need to find a conditional probability for each value of the CEEV without knowing in advance the specific values that a general count explanatory variable takes, and we avoid computing a conditional MLE, which would be very difficult.
Other estimation methods for a FRM with a CEEV can be considered. For example, semiparametric and nonparametric methods can be used (see Das (2005)). However, these approaches do not give estimates of the partial effects or the average partial effects of interest. If we are interested in estimating both parameters and average partial effects (APEs), a parametric approach is preferred. In addition, in a nonlinear model the quantity of interest is the APE, which is comparable to a linear model's estimate. Therefore, it is necessary and useful for applied economists and practitioners to use the parametric model and obtain the APE.

In this chapter, I show how to specify and estimate FRMs with a CEEV and unobserved heterogeneity. Based on the work of Papke & Wooldridge (1996, 2008), I use models for the conditional mean of the fractional response in which the fitted value is always in the unit interval. I focus on the probit response function since the probit mean function is less computationally demanding when obtaining the average partial effects. I suggest a new estimation method to handle discrete endogeneity in the presence of unobserved heterogeneity in a nonlinear setting. Two-step Quasi-Maximum Likelihood and Nonlinear Least Squares estimators using adaptive Gauss-Hermite quadrature are proposed. Average partial effects for discrete endogenous variables are obtained despite the difficulty of approximating a conditional mean that has no closed form under non-normal heterogeneity. Monte Carlo simulations show that the new estimators are the least biased and the most efficient among the estimators examined, including existing ones. I then apply these robust and efficient estimators to analyze US female labor supply. This is the first research whose empirical results support the necessity and significance of accounting for count endogeneity.

This chapter is organized as follows.
Section 1.2 introduces the specification and estimation of a FRM with a CEEV and shows how to estimate the parameters and the average partial effects using the two-step QMLE and NLS approaches. Section 1.3 presents Monte Carlo simulations, and an application to the fraction of total weekly hours that women spend working follows in Section 1.4. Section 1.5 concludes.

1.2 Theoretical Model - Specification and Estimation

For a $1 \times K$ vector of explanatory variables $z_1$, the conditional mean model is expressed as follows:

$$E(y_1 \mid y_2, z, a_1) = \Phi(\alpha_1 y_2 + z_1 \delta_1 + \eta_1 a_1), \tag{1.1}$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function (cdf), $y_1$ is a response variable ($0 \le y_1 \le 1$), and $a_1$ is a heterogeneous component or an omitted factor assumed to be correlated with $y_2$ but independent of the exogenous variables $z$. In equation (1.1), I focus on the fractional probit conditional mean because it gives a computationally simple estimator when we deal with unobserved heterogeneity and endogenous regressors, as well as a convenient way to obtain average partial effects later on.

The exogenous variables are $z = (z_1, z_2)$, where the exogenous variables $z_2$ must be excluded from (1.1). Here $z$ is a $1 \times L$ vector with $L > K$, and $z_2$ is a vector of instruments. $y_2$ is a count endogenous variable, which we assume has a Poisson distribution:

$$y_2 \mid z, a_1 \sim \text{Poisson}[\exp(z\delta_2 + a_1)], \tag{1.2}$$

so that the conditional density of $y_2$ is:

$$f(y_2 \mid z, a_1) = \frac{[\exp(z\delta_2 + a_1)]^{y_2} \exp[-\exp(z\delta_2 + a_1)]}{y_2!}, \tag{1.3}$$

where $a_1$ is assumed to be independent of $z$, and $\exp(a_1)$ is distributed as Gamma$(\delta_0, 1/\delta_0)$ using a single parameter $\delta_0$, with $E(\exp(a_1)) = 1$ and $\text{Var}(\exp(a_1)) = 1/\delta_0$.

The presence of $a_1$ in both equations (1.1) and (1.2) is what makes $y_2$ potentially endogenous in the equation of interest, (1.1).
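The endogeneity of y2 can be seen in a small simulation. The sketch below is only illustrative and rests on assumptions that are not in the text: the indices zδ2 and z1δ1 are held fixed at the hypothetical values `index2` and `index1`, the function names are invented for this example, and the Poisson draws use Knuth's multiplication method from the standard library rather than any routine used in the dissertation. It draws exp(a1) from Gamma(δ0, 1/δ0), draws y2 from (1.2), evaluates the conditional mean (1.1), and reports the sample correlation between a1 and y2.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cdf, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def draw_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(n, delta0, index2, alpha1, index1, eta1, seed=0):
    """Draw (a1, y2, E[y1 | y2, z, a1]) with the z-indices held fixed.

    index2 stands in for z*delta2 in (1.2), index1 for z1*delta1 in (1.1).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        c = rng.gammavariate(delta0, 1.0 / delta0)    # exp(a1): shape delta0, scale 1/delta0
        a1 = math.log(c)
        y2 = draw_poisson(math.exp(index2) * c, rng)  # Poisson[exp(z*delta2 + a1)]
        mean_y1 = norm_cdf(alpha1 * y2 + index1 + eta1 * a1)  # equation (1.1)
        draws.append((a1, y2, mean_y1))
    return draws


def correlation(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    sample = simulate(20000, delta0=3.0, index2=0.0, alpha1=-0.1, index1=0.2, eta1=0.5)
    a1_draws = [row[0] for row in sample]
    y2_draws = [row[1] for row in sample]
    print("corr(a1, y2) =", correlation(a1_draws, y2_draws))
```

With these illustrative values the correlation is clearly positive, so a regression of y1 on y2 that ignores a1 would confound the two channels whenever η1 is nonzero.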
To illustrate this point, we could use, for example, $u_2$ instead of $a_1$ in the reduced form and $u_1$ instead of $\eta_1 a_1$ in the structural conditional mean, and assume a linear function: $u_1 = \eta_1 u_2 + e_1$. Substituting the right-hand side of this function into the structural conditional mean and then removing $e_1$ by multiplying all the coefficients by the scale factor $1/\sqrt{1 + \sigma_e^2}$, we see $\eta_1 u_2$ and $u_2$ appear in the places of $\eta_1 a_1$ and $a_1$ in equations (1.1) and (1.2), respectively. Hence, rather than using $u_1$ and $u_2$, we simply use $a_1$ in the reduced form and $\eta_1 a_1$ in the structural conditional mean to govern the endogeneity of $y_2$.

After a transformation (see Appendix D.4 for the derivation), the distribution of $a_1$ is derived as follows:

$$f(a_1; \delta_0) = \frac{\delta_0^{\delta_0} [\exp(a_1)]^{\delta_0} \exp(-\delta_0 \exp(a_1))}{\Gamma(\delta_0)}. \tag{1.4}$$

In order to get the conditional mean $E(y_1 \mid y_2, z)$, I specify the conditional density function of $a_1$. Using Bayes' rule, it is:

$$f(a_1 \mid y_2, z) = \frac{f(y_2 \mid a_1, z) f(a_1 \mid z)}{f(y_2 \mid z)}.$$

Since $y_2 \mid z, a_1$ has a Poisson distribution and $\exp(a_1)$ has a gamma distribution, $y_2 \mid z$ is Negative Binomial II distributed, as a standard result (see the Poisson and Negative Binomial II models in Cameron & Trivedi (1986) and a specific derivation of this result from equations (D.1) to (D.3) in Appendix D). After some algebra, the conditional density function of $a_1$ is:

$$f(a_1 \mid y_2, z) = \frac{\exp[P] \, [\delta_0 + \exp(z\delta_2)]^{(y_2 + \delta_0)}}{\Gamma(y_2 + \delta_0)}, \tag{1.5}$$

where $P = -\exp(z\delta_2 + a_1) + a_1 (y_2 + \delta_0) - \delta_0 \exp(a_1)$.

The conditional mean $E(y_1 \mid y_2, z)$ is therefore obtained as:

$$E(y_1 \mid y_2, z) = \int_{-\infty}^{+\infty} \Phi(\alpha_1 y_2 + z_1 \delta_1 + \eta_1 a_1) f(a_1 \mid y_2, z) \, da_1 = \mu(\theta; y_2, z), \tag{1.6}$$

where $f(a_1 \mid y_2, z)$ is given in (1.5) and $\theta = (\alpha_1, \delta_1, \eta_1)$.

The key to obtaining the conditional mean of interest is the conditional density function of $a_1$. Therefore, we need to assume the distribution of $a_1$ and specify $f(a_1 \mid y_2, z)$ as above.
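As a numerical check on (1.5) and (1.6), the following sketch evaluates the conditional density and integrates it by brute force. It is an illustration under assumptions, not the routine used in this chapter: `lam` denotes exp(zδ2), `index1` denotes α1y2 + z1δ1, the function names are hypothetical, and a plain trapezoid rule on a wide grid replaces the adaptive quadrature described later. One way to read (1.5) is that, writing c = exp(a1), the density is that of the log of a gamma variable with shape y2 + δ0 and rate δ0 + exp(zδ2), which is why it integrates to one.

```python
import math


def norm_cdf(x):
    """Standard normal cdf, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def density_a1_given_y2(a1, y2, lam, delta0):
    """Equation (1.5), evaluated on the log scale; lam = exp(z*delta2)."""
    shape = y2 + delta0
    p = -lam * math.exp(a1) + a1 * shape - delta0 * math.exp(a1)
    log_density = p + shape * math.log(delta0 + lam) - math.lgamma(shape)
    return math.exp(log_density)


def trapezoid(func, lower, upper, steps):
    """Composite trapezoid rule on an equally spaced grid."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for j in range(1, steps):
        total += func(lower + j * width)
    return total * width


def conditional_mean(y2, index1, eta1, lam, delta0,
                     lower=-20.0, upper=8.0, steps=20000):
    """Equation (1.6): integrate Phi(index1 + eta1*a1) against (1.5).

    index1 = alpha1*y2 + z1*delta1; the integration limits are an assumption
    chosen so that the density is negligible outside [lower, upper].
    """
    def integrand(a1):
        return norm_cdf(index1 + eta1 * a1) * density_a1_given_y2(a1, y2, lam, delta0)

    return trapezoid(integrand, lower, upper, steps)


if __name__ == "__main__":
    mass = trapezoid(lambda a: density_a1_given_y2(a, 2, 1.0, 3.0), -20.0, 8.0, 20000)
    print("integral of (1.5):", mass)
    print("mu for y2 = 2:", conditional_mean(2, index1=0.1, eta1=0.5, lam=1.0, delta0=3.0))
```

When η1 = 0 the integral collapses to Φ(α1y2 + z1δ1), which gives a quick sanity check on any implementation of (1.6).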
For estimation purposes, it is necessary to compute $f(a_1 \mid y_2, z)$ in (1.5) based on the parameters in the reduced form (1.2). These parameters can be estimated using a Negative Binomial II regression and, henceforth, they can be viewed as first-step estimated parameters.

In the second step, we are interested in estimating the conditional mean parameters, $\theta$, in the FRM. For FRMs, we can consider a beta distribution or a log-odds transformation of the fractional dependent variable. However, Wooldridge (2010) shows that these two approaches have some drawbacks. First, they rule out the case in which the fractional response variable has some pileup at zero and/or one. Second, specifying a beta distribution is not robust and produces inconsistent estimators if any aspect of the distribution is misspecified. Third, the log-odds approach does not give a direct estimate of the conditional mean, which is the quantity of interest, since it offers only an estimate for the transformed dependent variable (see more discussion in Papke & Wooldridge (1996)). Therefore, for a dependent variable that has mass points at 0 and/or 1 and is continuous in (0,1), we can focus on estimating the conditional mean of the fractional response (as stated in equation (1.1)), which keeps the predicted value in the unit interval, and obtain robust estimators using the QMLE/NLS under a correctly specified conditional mean function. See Papke & Wooldridge (1996, 2008) for further details.

Given the fractional probit conditional mean model in equation (1.1), there are many ways to estimate $\theta$ consistently. One possibility is to adopt the NLS estimator. This estimator is consistent and $\sqrt{N}$-asymptotically normal. However, it is unlikely to be asymptotically efficient because homoskedasticity is unlikely to hold for $y_1$, even if we ignore the conditional Poisson distribution for $y_2$. It might also be computationally intensive to obtain the weighting matrix for the NLS estimator.
Hence, we can use a simpler, robust and efficient estimator, namely the quasi-maximum likelihood estimator (QMLE).

One can consider the QMLE using the Bernoulli distribution or the Poisson distribution of $y_1$. The QMLE is simple and strongly consistent even if the true distribution of $y_1$ is not Bernoulli, provided the first moment is correctly specified. There are other reasons that make the Bernoulli QMLE more attractive. First, maximizing the Bernoulli log likelihood is easy. Second, the Bernoulli distribution is a member of the linear exponential family (LEF) and does not impose the restrictions that other distributions do (see further discussion in Papke & Wooldridge (1996)). Moreover, it has some advantages over the Poisson distribution. For example, it is consistent with the nature of a fractional response variable, which has both continuous and discrete characteristics. The Poisson distribution is consistent with a non-negative response variable but does not take into account mass points at 0 and/or 1. In addition, even though the Poisson distribution is a member of the LEF, it is chosen when we want the variance to be proportional to the mean, which is not realistic for a fractional response variable: it is unlikely that the variance is monotonically increasing in the mean.

Another attraction of the Bernoulli QMLE is that it is efficient in a class of estimators containing all QMLEs in the LEF as long as the conditional mean is correctly specified and the variance assumption holds. The assumption that the variance associated with the quasi-log likelihood in equation (1.6) is the Bernoulli generalized linear models (GLM) variance will hold if the number of Bernoulli draws is independent of $z_i$. This assumption holds in the empirical example of this chapter.
However, in other applications, there is no guarantee that this assumption holds, and it is recommended to obtain fully robust sandwich standard errors (see more discussion in Papke & Wooldridge (1996) and Wooldridge (2010, section 18.6)).

Therefore, in what follows, we use the QMLE with the Bernoulli quasi-log likelihood function, or NLS, to estimate $\theta$ of equation (1.6) in the second step. The Bernoulli quasi-log likelihood function is given by:

$$l_i(\theta) = y_{1i} \ln \mu_i + (1 - y_{1i}) \ln(1 - \mu_i). \tag{1.7}$$

The QMLE of $\theta$ in the second step is obtained from the maximization problem (see more details in Appendix D.1):

$$\max_{\theta \in \Theta} \sum_{i=1}^{N} l_i(\theta). \tag{1.8}$$

The NLS estimator of $\theta$ in the second step is obtained from the minimization problem (see more details in Appendix D.3):

$$\min_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} [y_{1i} - \mu_i(\theta; y_{2i}, z_i)]^2 / 2. \tag{1.9}$$

After we obtain the estimated parameters from the first step and approximate the conditional mean (the detailed approximation procedure is discussed below), we estimate $\theta$ using the QMLE and NLS estimators described in the above maximization and minimization problems. These estimators are two-step M-estimators, which are consistent and asymptotically normal (see further discussion of these estimators in Newey & McFadden (1994) and Wooldridge (2002, chapter 12)).

Since $\mu_i = E(y_1 \mid y_2, z)$ does not have a closed form solution, it is necessary to use a numerical approximation. The numerical routine for integrating out the unobserved heterogeneity in the conditional mean equation (1.6) is based on adaptive Gauss-Hermite quadrature. This adaptive approximation has proven to be more accurate with fewer points than the ordinary Gauss-Hermite approximation. The quadrature locations are shifted and scaled to be under the peak of the integrand. Therefore, the adaptive quadrature performs well with an adequate number of points (see more in Skrondal & Sophia (2004)).
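Before turning to the quadrature formula, the two second-step objective functions in (1.7) to (1.9) can be written down in a few lines. This is a minimal sketch with hypothetical function names; it takes the fitted conditional means μi as given (however they were approximated) and assumes each μi lies strictly inside (0, 1) so that the logarithms are defined.

```python
import math


def bernoulli_quasi_loglik(y1, mu):
    """Sum over i of the Bernoulli quasi-log likelihood (1.7).

    y1 may contain fractions as well as exact zeros and ones;
    every element of mu is assumed to lie strictly between 0 and 1.
    """
    return sum(y * math.log(m) + (1.0 - y) * math.log(1.0 - m)
               for y, m in zip(y1, mu))


def nls_objective(y1, mu):
    """Nonlinear least squares criterion (1.9): mean squared residual over 2."""
    n = len(y1)
    return sum((y - m) ** 2 for y, m in zip(y1, mu)) / (2.0 * n)


if __name__ == "__main__":
    y1 = [0.0, 0.3, 1.0]
    close_fit = [0.05, 0.3, 0.95]
    flat_fit = [0.5, 0.5, 0.5]
    print(bernoulli_quasi_loglik(y1, close_fit), bernoulli_quasi_loglik(y1, flat_fit))
    print(nls_objective(y1, close_fit), nls_objective(y1, flat_fit))
```

Both criteria accept responses at the corners 0 and 1, which is the practical reason the text prefers them to the log-odds transformation.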
Using the Adaptive Gauss-Hermite approximation, the above integral (1.6) can be obtained as:

$$\mu_i = \int_{-\infty}^{+\infty} h_i(y_{2i}, z_i, a_1)\,da_1 \approx \sum_{m=1}^{M} \sqrt{2}\,\sigma_i\, w_m^{*} \exp\{(a_m^{*})^2\}\, h_i(y_{2i}, z_i, \sqrt{2}\sigma_i a_m^{*} + \omega_i), \tag{1.10}$$

where $\sigma_i$ and $\omega_i$ are the adaptive parameters for observation $i$, $w_m^{*}$ are the weights, $a_m^{*}$ are the evaluation points, and $M$ is the number of quadrature points. The approximation procedure follows Skrondal & Sophia (2004). The adaptive parameters $\sigma_i$ and $\omega_i$ are updated in the $k$th iteration of the optimization for $\mu_i$ with:

$$\mu_{i,k} \approx \sum_{m=1}^{M} \sqrt{2}\,\hat\sigma_{i,k-1}\, w_m^{*} \exp\{(a_m^{*})^2\}\, h_i(y_{2i}, z_i, \sqrt{2}\hat\sigma_{i,k-1} a_m^{*} + \hat\omega_{i,k-1}),$$

$$\hat\omega_{i,k} = \sum_{m=1}^{M} (\tau_{i,m,k-1})\, \frac{\sqrt{2}\,\hat\sigma_{i,k-1}\, w_m^{*} \exp\{(a_m^{*})^2\}\, h_i(y_{2i}, z_i, \tau_{i,m,k-1})}{\mu_{i,k}},$$

$$\hat\sigma_{i,k} = \sum_{m=1}^{M} (\tau_{i,m,k-1})^2\, \frac{\sqrt{2}\,\hat\sigma_{i,k-1}\, w_m^{*} \exp\{(a_m^{*})^2\}\, h_i(y_{2i}, z_i, \tau_{i,m,k-1})}{\mu_{i,k}} - (\hat\omega_{i,k})^2,$$

where

$$\tau_{i,m,k-1} = \sqrt{2}\,\hat\sigma_{i,k-1} a_m^{*} + \hat\omega_{i,k-1}.$$

This process is repeated until $\hat\sigma_{i,k}$ and $\hat\omega_{i,k}$ have converged for this iteration at observation $i$ of the maximization algorithm. This adaptation is applied to every iteration until the log-likelihood difference from the last iteration is less than a relative difference of $10^{-5}$; after this adaptation, the adaptive parameters are fixed.

Once the evaluation of the conditional mean has been done for all observations, the numerical values can be passed on to a maximizer in order to find the QMLE or NLS estimate $\hat\theta$. I summarize the method for estimating $\theta$ with the following procedure:

1.2.1 Estimation Procedure

(i) Estimate $\delta_2$ and $\delta_0$ by maximum likelihood of $y_{i2}$ on $z_i$ in the Negative Binomial model. Obtain the estimated parameters $\hat\delta_2$ and $\hat\delta_0$.

(ii) Use the fractional probit QMLE (or NLS) of $y_{i1}$ on $y_{i2}$, $z_{i1}$ to estimate $\alpha_1$, $\delta_1$ and $\eta_1$ with the approximated conditional mean. The conditional mean is approximated using the estimated parameters from the first step and the Adaptive Gauss-Hermite method.
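To make the weighted sum in (1.10) concrete, the following is a minimal sketch of a single quadrature evaluation for fixed scale and shift parameters. It does not implement the adaptive updating of $\sigma_i$ and $\omega_i$, and the integrand is a stand-in rather than the chapter's $h_i$: it combines a probit mean with a standard normal density for $a_1$, an assumption made only because that integral has a known closed form to compare against. The choice $M = 5$, the values of `c` and `eta`, and all function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Standard 5-point Gauss-Hermite nodes a*_m and weights w*_m
# (weight function exp(-a^2)); M = 5 is an illustrative choice.
NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
         0.9585724646138185, 2.0201828704560856]
WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
           0.3936193231522412, 0.01995324205904591]

def quadrature(h, sigma=1.0, omega=0.0):
    """Approximate the integral of h over the real line as in (1.10):
    sum_m sqrt(2)*sigma*w_m*exp(a_m^2)*h(sqrt(2)*sigma*a_m + omega)."""
    total = 0.0
    for a, w in zip(NODES, WEIGHTS):
        tau = math.sqrt(2.0) * sigma * a + omega  # shifted and scaled location
        total += math.sqrt(2.0) * sigma * w * math.exp(a * a) * h(tau)
    return total

# Stand-in integrand (assumption): probit mean times a N(0, 1) density for a1.
c, eta = 0.3, 0.5  # hypothetical index value and loading on a1

def h(a1):
    return STD_NORMAL.cdf(c + eta * a1) * STD_NORMAL.pdf(a1)

approx = quadrature(h)  # sigma = 1, omega = 0 reduces to ordinary Gauss-Hermite
exact = STD_NORMAL.cdf(c / math.sqrt(1.0 + eta ** 2))  # closed form for this stand-in
```

With `sigma = 1` and `omega = 0` the factor `exp(a*a)` cancels against the normal density, which is why the ordinary rule is recovered; the adaptive method instead chooses these two parameters per observation so that the nodes sit under the peak of the integrand.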
After obtaining all the estimated parameters $\hat\theta = (\hat\alpha_1, \hat\delta_1', \hat\eta_1)'$, the standard errors in the second stage should be adjusted for the first-stage estimation and obtained using the delta method. The standard errors obtained by using the delta method can be derived with the following formula:

$$\widehat{\mathrm{Avar}}(\hat\theta) = \frac{1}{N}\, \hat{A}_1^{-1} \left( N^{-1} \sum_{i=1}^{N} \hat{r}_{i1} \hat{r}_{i1}' \right) \hat{A}_1^{-1}. \tag{1.11}$$

For more details, see the derivation and matrix notation from equation (D.6) to equation (D.20) in Appendix D.1.

1.2.2 Average Partial Effects

Econometricians are often interested in estimating the average partial effects of explanatory variables in nonlinear models in order to get magnitudes comparable with other nonlinear models and with linear models. The average partial effects (APEs) can be obtained by taking the derivatives or the differences of a conditional mean equation with respect to the explanatory variables of interest. The partial effects cannot be computed directly in the presence of an unobserved factor. It is necessary to "integrate out" the unobserved variable in the conditional mean, or equivalently to average the partial effects across the distribution of the unobservable. We then obtain a single number by taking the average across the sample, which can be compared with the corresponding linear estimate. I begin by reviewing the calculation of APEs when the explanatory variables are exogenous, following Papke & Wooldridge (2008), and then show how to identify the APEs with a count endogenous explanatory variable.

1.2.2.1 The Case with Exogenous Covariates and a Normally Distributed Heterogeneity

In a fractional response model (FRM) with all exogenous covariates, model (1.1) with $y_2$ exogenous and a normally distributed $a_1$ is considered (for a general discussion of a FRM with all exogenous covariates, see Papke & Wooldridge (2008)). Let $w = (y_2, z_1)$. Dropping the observation index $i$, equation (1.1) is rewritten as:

$$E(y_1 | w, a_1) = \Phi(w\beta + a_1),$$

where $w$ contains the fixed terms and $a_1$ is the random term.
We can also allow elements of $w$ to be any function of $(y_2, z_1)$, including nonlinear functions, such as quadratic or cubic forms, and interactions. If $w_1$ is continuous, then the partial effect with respect to $w_1$ is:

$$\partial E(y_1 | w, a_1)/\partial w_1 = \beta_1 \phi(w\beta + a_1).$$

If $w_1$ is a dummy variable, we compute:

$$\Phi(w^1\beta + a_1) - \Phi(w^0\beta + a_1),$$

where $w^1$ and $w^0$ are two different values of the covariates including $w_1 = 1$ and $w_1 = 0$, respectively. Since $a_1$ is not observed but we assume $a_1 | w \sim \mathrm{Normal}(0, \sigma_a^2)$, we can obtain the APE by averaging the partial effects across the distribution of $a_1$:

$$E_{a_1}[\beta_1 \phi(w\beta + a_1)], \qquad E_{a_1}[\Phi(w^1\beta + a_1) - \Phi(w^0\beta + a_1)],$$

and these are equivalent to $\beta_{1a}\phi(w\beta_a)$ and $\Phi(w^1\beta_a) - \Phi(w^0\beta_a)$, where the subscript $a$ stands for division by $\sqrt{1 + \sigma_a^2}$. Then we can obtain a single number to compare with the linear estimates by averaging the derivative or the difference across the sample. For a continuous $z_{11}$, the APE is estimated by:

$$\hat\delta_{11a} \left[ N^{-1} \sum_{i=1}^{N} \phi(\hat\alpha_{1a} y_{2i} + z_{1i}\hat\delta_{1a}) \right].$$

For a count variable $y_2$, the APE is estimated by:

$$N^{-1} \sum_{i=1}^{N} \left[ \Phi(\hat\alpha_{1a} y_2^1 + z_{1i}\hat\delta_{1a}) - \Phi(\hat\alpha_{1a} y_2^0 + z_{1i}\hat\delta_{1a}) \right].$$

For example, if we are interested in obtaining the APE when $y_2$ changes from 0 to 1, it is necessary to predict the difference in mean responses with $y_2 = 1$ and $y_2 = 0$ and average the difference across all units.

1.2.2.2 The Case with a Count Endogenous Covariate and a Non-normally Distributed Heterogeneity

In a fractional response model with a count endogenous variable, model (1.1) is considered with the estimation procedure provided in the previous section. The APEs are obtained by taking the derivatives or the differences in:

$$E_{a_1}[\Phi(\alpha_1 y_2^0 + z_1^0\delta_1 + \eta_1 a_1)], \tag{1.12}$$

with respect to the elements of $(y_2^0, z_1^0)$. In the argument of the expectations operator, $(y_2^0, z_1^0)$ are fixed terms and $a_1$ is a random term.
The partial effect (PE) is obtained for a continuous variable $z_{11}$:

$$PE(y_2^0, z_1^0, a_1) = \delta_{11}\phi(\alpha_1 y_2^0 + z_1^0\delta_1 + \eta_1 a_1), \tag{1.13}$$

and for a discrete variable $y_2$, we compute:

$$\Phi(\alpha_1 y_2^1 + z_1^0\delta_1 + \eta_1 a_1) - \Phi(\alpha_1 y_2^0 + z_1^0\delta_1 + \eta_1 a_1), \tag{1.14}$$

which is the difference in mean responses at the two fixed points of interest, $y_2 = y_2^1$ and $y_2 = y_2^0$. To obtain the APEs, we need to average the above partial effects across the distribution of $a_1$:

$$APE_c = E_{a_1}[\delta_{11}\phi(\alpha_1 y_2^0 + z_1^0\delta_1 + \eta_1 a_1)], \tag{1.15}$$

for the continuous case, and

$$APE_d = E_{a_1}[\Phi(\alpha_1 y_2^1 + z_1^0\delta_1 + \eta_1 a_1) - \Phi(\alpha_1 y_2^0 + z_1^0\delta_1 + \eta_1 a_1)], \tag{1.16}$$

for the discrete case. This is equivalent to integrating out $a_1$, and we respectively obtain:

$$\psi = APE_c = \int_{-\infty}^{+\infty} \delta_{11}\phi(\alpha_1 y_2^0 + z_1^0\delta_1 + \eta_1 q_1)\, f(q_1 | y_2^0, z_1^0; \theta)\,dq_1, \tag{1.17}$$

$$\lambda = APE_d = \int_{-\infty}^{+\infty} \Phi(g^1\theta)\, f(q_1 | y_2^1, z_1^0; \theta)\,dq_1 - \int_{-\infty}^{+\infty} \Phi(g^0\theta)\, f(q_1 | y_2^0, z_1^0; \theta)\,dq_1, \tag{1.18}$$

where $q_1$ is a dummy argument in the integration, $g^1 = (y_2^1, z_1^0, q_1)$, and $g^0 = (y_2^0, z_1^0, q_1)$. These APEs are estimated by:

$$\widehat{APE}_c = \hat\delta_{11} \int_{-\infty}^{+\infty} \phi(\hat\alpha_1 y_2^0 + z_1^0\hat\delta_1 + \hat\eta_1 q_1)\, f(q_1 | y_2^0, z_1^0; \hat\theta)\,dq_1, \tag{1.19}$$

$$\widehat{APE}_d = \int_{-\infty}^{+\infty} \Phi(g^1\hat\theta)\, f(q_1 | y_2^1, z_1^0; \hat\theta)\,dq_1 - \int_{-\infty}^{+\infty} \Phi(g^0\hat\theta)\, f(q_1 | y_2^0, z_1^0; \hat\theta)\,dq_1. \tag{1.20}$$

Since equations (1.19) and (1.20) cannot be obtained in closed form, we need to use the Adaptive Gauss-Hermite method to approximate the integrals involving $f(q_1 | y_2^k, z_1^0; \theta)$, $k = 0, 1$. This is equivalent to obtaining:

$$\hat\psi = \widehat{APE}_c \approx \hat\delta_{11} \sqrt{2}\,\sigma_i \sum_{m=1}^{M} \{w_m^{*} \exp[(a_m^{*})^2]\}\, \phi(g^0\hat\theta)\, f(y_2^0, z_1^0, (\sqrt{2}\sigma_i a_m^{*} + \omega_i); \hat\theta), \tag{1.21}$$

$$\hat\lambda = \widehat{APE}_d = \hat\lambda^1 - \hat\lambda^0, \tag{1.22}$$

where

$$\hat\lambda^k = \sqrt{2}\,\sigma_i \sum_{m=1}^{M} \{w_m^{*} \exp[(a_m^{*})^2]\}\, \Phi(g^k\hat\theta)\, f(y_2^k, z_1^0, (\sqrt{2}\sigma_i a_m^{*} + \omega_i); \hat\theta), \tag{1.23}$$

with $g^k = (y_2^k, z_1^0, (\sqrt{2}\sigma_i a_m^{*} + \omega_i))$, $k = 0, 1$, and $\theta = (\alpha_1, \delta_1', \eta_1)'$. For a comparison between the linear model estimates and the fractional probit estimates, it is useful to have a single factor.
This single factor can be obtained by averaging out $z_{1i}$ across all individuals in the formulas for $\hat\psi$ and $\hat\lambda$. For example, in order to get the APE when $y_2$ changes from 0 to 1, it is necessary to predict the difference in the mean responses with $y_2 = 0$ and $y_2 = 1$ and take the average of the differences across all units. This APE gives us a number comparable to the linear model's estimate. The standard errors for the APEs are obtained using the delta method. The detailed derivation is provided from equation (D.21) to equation (D.40) in Appendix D.1.

1.3 Monte Carlo Simulations

This section examines the finite sample properties of the two-step QML and NLS estimators of the population averaged partial effect in a fractional response model with a count endogenous variable. Monte Carlo experiments are conducted to compare these estimators with other estimators under different scenarios. These estimators are evaluated under correct model specification with different degrees of endogeneity, with strong and weak instrumental variables, and with different sample sizes. The behavior of these estimators is also examined with respect to the choice of a particular distributional assumption.

1.3.1 Estimators

Two sets of estimators under two corresponding assumptions are considered: (1) $y_2$ is assumed to be exogenous, and (2) $y_2$ is assumed to be endogenous. Under the former assumption, three estimators are used: the ordinary least squares (OLS) estimator in a linear model, the maximum likelihood estimator (MLE) in a Tobit model and the quasi-maximum likelihood estimator (QMLE) in a fractional probit model.
Under the latter assumption, five estimators are examined: the two-stage least squares (2SLS) estimator, the two-step maximum likelihood estimator (MLE) in a Tobit model using the Blundell-Smith estimation method (hereafter the Tobit BS), the two-step QMLE in a fractional probit model using the Papke-Wooldridge estimation method (hereafter the QMLE-PW; see more discussion of handling endogeneity in Papke & Wooldridge (2008)), and the two-step QMLE and the two-step NLS estimators in a fractional probit model using the estimation method proposed in the previous section.

1.3.2 Data Generating Process

The count endogenous variable is generated from a conditional Poisson distribution:

$$f(y_{2i} | x_{1i}, x_{2i}, z_i, a_{1i}) = \frac{\exp(-\lambda_i)\lambda_i^{y_{2i}}}{y_{2i}!}, \tag{1.24}$$

with a conditional mean:

$$\lambda_i = E(y_{2i} | x_{1i}, x_{2i}, z_i, a_{1i}) = \exp(\delta_{21}x_{1i} + \delta_{22}x_{2i} + \delta_{23}z_i + \rho_1 a_{1i}), \tag{1.25}$$

using independent draws from normal distributions, $z \sim N(0, 0.3^2)$, $x_1 \sim N(0, 0.2^2)$, $x_2 \sim N(0, 0.2^2)$, and $\exp(a_1) \sim \mathrm{Gamma}(1, 1/\delta_0)$, where 1 and $1/\delta_0$ are the mean and variance of the gamma distribution. Parameters in the conditional mean model are set to be $(\delta_{21}, \delta_{22}, \delta_{23}, \rho_1, \delta_0) = (0.01, 0.01, 1.5, 1, 3)$.

The dependent variable is generated by first drawing a binomial random variable $x$ with $n$ trials and success probability $p$, and then setting $y_1 = x/n$. In this simulation, $n = 100$ and $p$ is given by the probit conditional mean:

$$p = E(y_{1i} | y_{2i}, x_{1i}, x_{2i}, a_{1i}) = \Phi(\delta_{11}x_{1i} + \delta_{12}x_{2i} + \alpha_1 y_{2i} + \eta_1 a_{1i}), \tag{1.26}$$

and parameters in this conditional mean are set at $(\delta_{11}, \delta_{12}, \alpha_1, \eta_1) = (0.1, 0.1, -1.0, 0.5)$.

In order to compare the magnitudes between a nonlinear model and a linear model, we are interested in computing APEs. Based on the population values of the parameters set above, the so-called true value of the APE with respect to each variable is reported as the mean of the APEs approximated via the simulations with the standard procedure described below.
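The data generating process in (1.24)-(1.26) can be sketched in a few lines. This is an illustration under stated assumptions rather than the simulation code used for the tables: the gamma distribution is parameterized with shape $\delta_0$ and scale $1/\delta_0$ so that $\exp(a_1)$ has mean 1 and variance $1/\delta_0$, the Poisson draw uses Knuth's multiplication method, and the function names and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(n_obs=1000, eta1=0.5, delta23=1.5, delta0=3.0, trials=100, seed=0):
    """Return a list of (y1, y2, x1, x2, z) tuples following (1.24)-(1.26)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_obs):
        z = rng.gauss(0.0, 0.3)
        x1 = rng.gauss(0.0, 0.2)
        x2 = rng.gauss(0.0, 0.2)
        # Assumption: exp(a1) ~ Gamma(shape=delta0, scale=1/delta0),
        # which has mean 1 and variance 1/delta0.
        a1 = math.log(rng.gammavariate(delta0, 1.0 / delta0))
        lam = math.exp(0.01 * x1 + 0.01 * x2 + delta23 * z + 1.0 * a1)
        y2 = draw_poisson(lam, rng)
        p = PHI(0.1 * x1 + 0.1 * x2 - 1.0 * y2 + eta1 * a1)
        successes = sum(rng.random() < p for _ in range(trials))
        data.append((successes / trials, y2, x1, x2, z))
    return data

sample = simulate()
```

Changing `delta23` to 0.3 or 0 corresponds to the weak-instrument and no-instrument designs discussed later, and changing `eta1` varies the degree of endogeneity.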
First, when $y_2$ is treated as a continuous variable, the so-called true value of the APE with respect to $y_2$ is approximated from simulations by first computing the derivative of the conditional mean with respect to $y_2$, and then taking the average across the distribution of $a_1$:

$$APE = -1.0 \times \frac{1}{N} \sum_{i=1}^{N} \phi(0.1 x_{1i} + 0.1 x_{2i} - 1.0\, y_{2i} + 0.5\, a_{1i}). \tag{1.27}$$

When $y_2$ is allowed to be a count variable, the so-called true values of the APEs with respect to $y_2$ are computed by first taking differences in the conditional mean. These true values of the APEs are computed at values of interest. In this chapter, I take the first three examples, in which $y_2$ increases from 0 to 1, 1 to 2, and 2 to 3, respectively. The corresponding true values of the APEs are:

$$APE_{01} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1 x_{1i} + 0.1 x_{2i} - 1.0 \times 1 + 0.5\, a_{1i}) - \Phi(0.1 x_{1i} + 0.1 x_{2i} - 1.0 \times 0 + 0.5\, a_{1i}) \right], \tag{1.28}$$

$$APE_{12} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1 x_{1i} + 0.1 x_{2i} - 1.0 \times 2 + 0.5\, a_{1i}) - \Phi(0.1 x_{1i} + 0.1 x_{2i} - 1.0 \times 1 + 0.5\, a_{1i}) \right], \tag{1.29}$$

$$APE_{23} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1 x_{1i} + 0.1 x_{2i} - 1.0 \times 3 + 0.5\, a_{1i}) - \Phi(0.1 x_{1i} + 0.1 x_{2i} - 1.0 \times 2 + 0.5\, a_{1i}) \right]. \tag{1.30}$$

These so-called true values of the APEs (which are approximated through simulations) with respect to $y_2$ and other exogenous variables are reported in Tables A.1-A.4. The experiment is conducted with 500 replications and the sample size is normally set at 1000 observations.

1.3.3 Experiment Results

I report sample means, sample standard deviations (SD) and root mean squared errors (RMSE) of these 500 estimates. In order to compare estimators across linear and nonlinear models, I compare the APE estimates from the different models.

1.3.3.1 Simulation Result with a Strong Instrumental Variable

Tables A.1-A.4 report the simulation outcomes of the APE estimates for the sample size $N = 1000$ with a strong instrumental variable (IV) and different degrees of endogeneity, where $\eta_1 = 0.1$, $\eta_1 = 0.5$, and $\eta_1 = 0.9$.
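The discrete-change APEs in (1.28)-(1.30) can be approximated by simulation along the following lines. This is a sketch under assumptions, not the procedure behind the reported tables: the draws of $x_1$, $x_2$ and $a_1$ follow the data generating process of Section 1.3.2 with the gamma distribution parameterized by shape $\delta_0$ and scale $1/\delta_0$, and the number of draws, the seed and the function name are hypothetical. For $\eta_1 = 0.5$ the chapter reports true values of −.3200, −.1273 and −.0212; a run of this sketch should be of similar magnitude but need not match them exactly.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def true_apes(n_draws=20000, eta1=0.5, alpha1=-1.0, delta0=3.0, seed=1):
    """Average Phi(index at y2 = hi) - Phi(index at y2 = lo) over draws."""
    rng = random.Random(seed)
    sums = {(0, 1): 0.0, (1, 2): 0.0, (2, 3): 0.0}
    for _ in range(n_draws):
        x1 = rng.gauss(0.0, 0.2)
        x2 = rng.gauss(0.0, 0.2)
        # Assumption: exp(a1) ~ Gamma(shape=delta0, scale=1/delta0).
        a1 = math.log(rng.gammavariate(delta0, 1.0 / delta0))
        base = 0.1 * x1 + 0.1 * x2 + eta1 * a1  # index without the y2 term
        for (lo, hi) in sums:
            sums[(lo, hi)] += PHI(base + alpha1 * hi) - PHI(base + alpha1 * lo)
    return {change: total / n_draws for change, total in sums.items()}

apes = true_apes()  # keys (0, 1), (1, 2), (2, 3) match APE01, APE12, APE23
```

Because $\alpha_1 < 0$, each successive increase in $y_2$ moves the index further into the lower tail of $\Phi$, so the differences shrink in absolute value; this is the pattern of diminishing effects that the discrete APEs are designed to capture.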
The IV is strong in the sense that the coefficient on $z$ is $\delta_{23} = 1.5$ in the first stage and the F-statistics for the significance test of $z$ in the first stage are large. These F-statistics have average values of 91.75, 107.57 and 133.56 in 500 replications for the three designs $\eta_1 = 0.1$, $\eta_1 = 0.5$, and $\eta_1 = 0.9$, respectively. The three values of $\eta_1$ correspond to low, medium and high degrees of endogeneity. Columns 2-10 contain the true values of the APEs and the means, SDs and RMSEs of the APE estimates from different models with different estimation methods. Columns 3-5 contain the means, SDs and RMSEs of the APE estimates for all variables from 500 replications with $y_2$ assumed to be exogenous. Columns 6-10 contain the means, SDs and RMSEs of the APE estimates for all variables from 500 replications with $y_2$ allowed to be endogenous.

I first report the simulation outcomes for the sample size $N = 1000$ and $\eta_1 = 0.5$ (see Table A.1). The APE estimates using the proposed QMLE and NLS methods in columns 9-10 are closest to the true values of the APEs when $y_2$ is discrete (−.3200, −.1273 and −.0212). These three APEs for a discrete $y_2$ (when $y_2$ goes from 0 to 1, 1 to 2 and 2 to 3) serve as examples to show the pattern of the means, SDs and RMSEs of the APE estimates. The APE estimate is also very close to the true value of the APE (−.2347) when $y_2$ is treated as a continuous variable. Table A.1 shows that the OLS estimate is about half of the true value of the APE. The first source of large bias in the OLS estimate is that it ignores the endogeneity of the count variable $y_2$ (with $\eta_1 = 0.5$). The second source of bias in the OLS estimate is the neglect of the nonlinearity in both the structural and reduced-form equations (1.1) and (1.2).
The 2SLS approach also produces a biased estimator of the APE because of the second reason mentioned above, even though the endogeneity is taken into account. The ML estimators in the Tobit model have smaller bias than the estimators in the linear model but larger bias than the estimators in the fractional probit model, because they do not account for the functional form of the fractional response variable and the count explanatory variable. When the endogeneity is corrected, the ML estimator in the Tobit model using the Blundell-Smith method has a smaller bias than its counterpart in which $y_2$ is assumed to be exogenous. Among the fractional probit models, the QML estimator in which $y_2$ is assumed to be exogenous (column 5) has the largest bias because it ignores the endogeneity of $y_2$. However, it still has a smaller bias than the estimators of the linear and Tobit models. The two-step QMLE-PW estimator (column 8) provides useful results because its estimates are also very close to the true values of the APEs, but it produces a larger bias than the two-step QMLE and NLS estimators proposed in this chapter. Similar to the two-step ML estimator in the Tobit model using the Blundell-Smith method, the two-step QMLE-PW estimator adopts the control function approach. This approach relies on linearity in the first-stage equation. As a result, it ignores the discreteness of $y_2$, which leads to a larger bias than that of the two-step QMLE and NLS estimators proposed in this chapter.

The first set of estimators, with $y_2$ assumed to be exogenous (columns 3-5), has relatively smaller SDs than the second set of estimators, with $y_2$ allowed to be endogenous (columns 6-10), because the methods that correct for endogeneity using IVs have more sampling variation than their counterparts without endogeneity correction. This results from the less-than-unit correlation between the instrument and the endogenous variable.
However, the SDs of the two-step QMLE and NLS estimators (columns 9-10) are no worse than that of the QML estimator in which $y_2$ is assumed to be exogenous (column 5). Among all estimators, the two-step QMLE and NLS estimators proposed in this chapter have the smallest RMSE, not only for the case where $y_2$ is allowed to be a discrete variable but also for the case where $y_2$ is treated as a continuous variable using the correct model specification. As discussed previously, the two-step QML estimator using the Papke-Wooldridge method has the third smallest RMSE since it also uses the same fractional probit model. Comparing columns 3 and 6, columns 4 and 7, and column 5 with columns 8-10, the RMSEs of the methods correcting for endogeneity are smaller than those of their counterparts.

Table A.2 reports simulation results for the coefficient estimates. The coefficient estimates are useful in the sense that they give the directions of the effects. For studies that only require the signs of the effects, the coefficient tables are sufficient. For studies that require comparing the magnitudes of the effects, we essentially want to estimate the APEs. Table A.2 shows that the means of the point estimates are close to their true values for all parameters using the two-step QML (or NLS) approach (−1.0, 0.1 and 0.1). The bias is large for both the 2SLS method and the OLS method. These results are as expected because the 2SLS method uses the predicted value from the first-stage OLS, so it ignores the distributional information of the right-hand-side (RHS) count variable as well as the functional form of the fractional response variable. The OLS estimates do not account for endogeneity. Both the 2SLS and OLS estimates are biased because they do not take into account the presence of unobserved heterogeneity.
The bias for the Tobit Blundell-Smith model is similar to the bias of the 2SLS method because it does not take into account the distributional information of the right-hand-side count variable and it employs a different functional form, given that the fractional response variable has a small number of zeros. The biases of both the QML estimator treating $y_2$ as an exogenous variable and the two-step QMLE-PW estimator are larger than those of the two-step QMLE and NLS estimators in this chapter. In short, the simulation results indicate that the means of the point estimates are close to their true values for all parameters using the two-step QMLE and NLS approach described in the previous section.

Simulations with different degrees of endogeneity, $\eta_1 = 0.1$ and $\eta_1 = 0.9$, are also conducted (see Tables A.3 and A.4). Not surprisingly, with less endogeneity, $\eta_1 = 0.1$, the set of estimators treating $y_2$ as an exogenous variable produces APE estimates closer to the true values of the APEs, and the set of estimators treating $y_2$ as an endogenous variable produces APE estimates further from the true values. With more endogeneity, $\eta_1 = 0.9$, the set of estimators treating $y_2$ as an endogenous variable produces APE estimates closer to the true values of the APEs, and the set of estimators treating $y_2$ as an exogenous variable produces APE estimates further from the true values. As an example, as $\eta_1$ increases, the APE estimates of the 2SLS method are less biased while the APE estimates of the QML estimator treating $y_2$ as an exogenous variable are more biased, and the difference between these two APE estimates is smaller since the endogeneity is corrected. All other previous discussions on the bias, SD and RMSE still hold with $\eta_1 = 0.1$ and $\eta_1 = 0.9$.
This confirms that the two-step QMLE and NLS estimators perform very well under different degrees of endogeneity.

1.3.3.2 Simulation Result with a Weak Instrumental Variable

Table A.5 reports the simulation outcomes of the APE estimates for the sample size $N = 1000$ with a weak IV and $\eta_1 = 0.5$. Using the rule of thumb for a weak instrument (suggested in Staiger & Stock (1997)), the coefficient on $z$ is chosen as $\delta_{23} = 0.3$, which corresponds to a very small first-stage F-statistic (the mean of the F-statistic is 6.97 in 500 replications). Columns 2-10 contain the true values of the APEs and the means, SDs and RMSEs of the APE estimates from different models with different estimation methods. Columns 3-5 contain the means, SDs and RMSEs of the APE estimates for all variables from 500 replications with $y_2$ assumed to be exogenous. Columns 6-10 contain the means, SDs and RMSEs of the APE estimates for all variables from 500 replications with $y_2$ allowed to be endogenous. The simulation results show that, even though the instrument is weak, the set of estimators assuming $y_2$ endogenous still has smaller bias than the set of estimators assuming $y_2$ exogenous. The two-step QMLE and NLS APE estimates are still very close to the true values of the APEs, both when $y_2$ is treated as a continuous variable and when $y_2$ is allowed to be a count variable. Their SDs and RMSEs are still the lowest among the estimators treating $y_2$ as endogenous. Table A.11 also provides this evidence.

Simulation results from my proposed procedure show that the two-step QMLE and NLS APE estimates are less biased and more efficient than the estimates from the linear model and the other models. However, at first glance, the standard deviations in Table A.5 are smaller than those in Table A.1 under the QMLE and NLS columns, which is contrary to the pattern of standard deviations in the linear model (under the column for the 2SLS estimation method).
The standard deviation from my proposed procedure (under the QMLE and NLS columns) is smaller in the case of a weak IV than in the case of a strong IV, which seems odd at first if we judge that exclusion restrictions are driving identification. Looking at the 2SLS column, the bias and inefficiency of the 2SLS estimates may arise because a linear model provides a poor approximation for the count and fractional response variables. This suggests that nonlinearities contribute more to identification than the exclusion restriction does in the nonlinear models. In other words, functional form assumptions, rather than the exclusion restriction, are mainly responsible for identification. It is therefore worth investigating why the estimates from my proposed procedure tend to be closer to the true values of the APEs and remain efficient, without an increase in the standard deviation, when the instrument is weak. We design an experiment similar to the simulated experiment in Table A.5 but with the coefficient on the instrument set to 0 ($\delta_{23} = 0$). This is equivalent to the case of no instruments. The results in Table A.6 show that the standard deviations under the QMLE and NLS columns are still smaller, suggesting that nonlinearity is responsible for identification since there is no exclusion restriction here.

1.3.3.3 Simulation Result with Different Sample Sizes

Four sample sizes are chosen to represent those commonly encountered in applied research. These range from small to large: 100, 500, 1000 and 2000. Tables A.7-A.10 report the simulation outcomes of the APE estimates with a strong IV and $\eta_1 = 0.5$ for sample sizes $N$ = 100, 500, 1000, and 2000, respectively. Table A.8 is equivalent to Table A.1. Columns 2-10 contain the true values of the APEs and the means, SDs and RMSEs of the APE estimates from different models with different estimation methods.
Columns 3-5 contain the means, SDs and RMSEs of the APE estimates for all variables from 500 replications with $y_2$ assumed to be exogenous. Columns 6-10 contain the means, SDs and RMSEs of the APE estimates for all variables from 500 replications with $y_2$ allowed to be endogenous. In general, the simulation results indicate that the SD and RMSE of all estimators are smaller for larger sample sizes. The previous discussion in Section 1.3.3.1 still applies. The two-step QMLE and NLS estimators perform very well at all sample sizes, with the smallest SD and RMSE. They are also the least biased among all the estimators in this discussion.

1.3.3.4 Simulation Result with a Misspecified Distribution

The original assumption is that $\exp(a_1) \sim \mathrm{Gamma}(1, 1/\delta_0)$. This part deals with misspecification: the distribution of $\exp(a_1)$ is no longer gamma; instead, $a_1 \sim N(0, 0.1^2)$ is assumed. The finite sample behavior of all the estimators under this incorrect specification is examined. Table A.11 shows the simulation results for the sample size $N = 1000$ with a strong IV and $\eta_1 = 0.5$ under misspecification. All of the previous discussion under the correct specification in Section 1.3.3.1 is unaffected. The APE estimates under the fractional probit model are still very close to the true values of the APEs.

Table A.12 shows the simulation results of the APE estimates with the sample size $N = 1000$ and $\eta_1 = 0.5$. The estimates are close to the true values of the APEs, with very small MSE and rejection rates close to 0.05. Note that, in all tables in the Monte Carlo section, the standard deviations under the QMLE and NLS columns using the proposed procedure are not directly comparable to standard errors.
However, the simulation results in Table A.13 show that the analytical variance of the proposed procedure is quite reliable, because the means of the analytical standard errors are quite close to those obtained by bootstrapping.

1.3.4 Conclusion from the Monte Carlo Simulations

This section examined the finite sample behavior of the estimators proposed for the FRM with an endogenous count variable. The results of the Monte Carlo experiments show that the two-step QMLE and NLS estimators have the smallest standard deviations and RMSEs and are the least biased when endogeneity is present. The two-step QMLE and NLS methods also produce the least biased estimates of both the parameters and the APEs compared with the alternative methods.

1.4 Application and Estimation Results

My proposed estimators can be applied to a model of female labor supply. The dependent variable is the share of total hours per week that mothers spend working. Hereafter, we refer to the dependent variable as weekly fractional working hours. The data in this chapter were used in Angrist & Evans (1998) to illustrate a linear model with a dummy endogenous variable: more than two kids. They estimate the effect of additional children on female labor supply, treating the number of children as endogenous and using two instruments: same sex and twins at the first two births. They found that married women who have a third child reduce their labor supply, and that their 2SLS estimates are roughly one half smaller than the corresponding OLS estimates.

In this application, the fractional response variable is the fraction of total weekly hours that a woman spends working. This variable is generated from the number of working hours, which was used in Angrist & Evans (1998), divided by the maximum hours per week (168). A substantial number of women do not spend any hours working: there are 13,068 observations at zero. Therefore, a Tobit model might be a choice.
In this application, we are interested in estimating a model of weekly fractional working hours (FrHour) for women in which the number of children is a count endogenous factor. We begin with the linear model as follows:

$$FrHour = \alpha_1 Kidno + \delta_1 Educ + \delta_2 Age + \delta_3 Agefb + \delta_4 Hispan + \delta_5 NmInc + a_1. \tag{1.31}$$

The count variable in this application is the number of children beyond two, between 0 and 10, instead of the indicator for having more than two kids that was used in Angrist & Evans (1998). The number of kids is considered endogenous, which is in line with the recent empirical literature. First, the number and timing of children born are controlled by the mother, who makes fertility decisions that are correlated with the number of children. Second, some women have a preference for family-based activity or for market-based work, so fertility is correlated with women's heterogeneity. The estimation sample contains 31,824 women: more than 50% are childless, 31% have one kid, 11% have two kids and the rest have more than two kids. Table A.14 gives the frequency distribution of the number of children, which appears to have excess zeros and a long tail, with an average number of children of around one. Other explanatory variables, which are exogenous and include demographic and economic variables of the family, are described in Table A.15. Research on parents' preferences over the sex mixture of their children using US data shows that most families would prefer at least one child of each sex. For example, Ben-Porath & Welch (1976) found that 56% of families with either two boys or two girls had a third birth, while only 51% of families with one boy and one girl had a third child. Angrist & Evans (1998) found that only 31.2% of women with one boy and one girl have a third child, whereas 38.8% and 36.5% of women with two girls and two boys have a third child, respectively.
Given the evidence that women with children of the same sex are more likely to have additional children, the instruments that we can use are same sex and twins. Table A.16 reports the first-stage estimates, in which same sex and twins are statistically significant. Table A.17 shows the estimation results of the OLS in a linear model, the MLE in a Tobit model and the QMLE in a fractional probit model when $y_2$ is assumed exogenous. The estimation results of the 2SLS in a linear model, the MLE in a Tobit BS model, the QMLE-PW, and the QMLE and NLS estimation in a fractional probit model are shown in Table A.18 when $y_2$ is assumed endogenous. Since I also analyze the model using the Tobit BS model, its model specification, the derivation of the conditional mean, the average partial effects and the estimation approach are included in Appendix D.2. The two-step NLS method, with the same conditional mean used in the two-step QMLE method, is presented in Appendix D.3.

Ordinary least squares

The OLS estimation often serves as a benchmark since its computation is simple, its interpretation is straightforward and it requires fewer assumptions for consistency. The estimates of a linear model in which the fraction of total working hours per week is the response variable and the number of kids is considered exogenous are provided in Table A.17. As discussed in the literature on women's labor supply, the coefficient on the number of kids is negative and statistically significant. The linear model with OLS estimation ignores functional form issues that arise from the excess zeros in the dependent variable. In addition, the fraction of total weekly working hours always lies in the unit interval, so the linear model with OLS estimation does not make sense if its predicted values fall outside this interval.
A Tobit model with an exogenous number of kids

There are two reasons that a Tobit model might be practical. First, the fraction of working hours per week has many zeros. Second, the predicted value needs to be nonnegative. The estimates are given in Table A.17. The Tobit coefficients have the same signs as the corresponding OLS estimates, and the statistical significance of the estimates is similar. For magnitude, the Tobit partial effects are computed to make them comparable to the linear model estimates. The partial effect of a discrete explanatory variable is obtained in two steps. First, the Tobit conditional mean is estimated. Second, the difference in the conditional mean between the two values of the explanatory variable that are of interest is computed (for example, we first plug in $y_{2i} = 1$ and then $y_{2i} = 0$). Having the first child reduces the estimated fraction of total weekly working hours by about 0.023, or 2.3 percentage points, a larger effect than the 1.9 percentage points of the OLS estimate. Having the second child and the third child makes the mother work less by about 0.021, or 2.1 percentage points, and 0.018, or 1.8 percentage points, respectively. All of the OLS and Tobit statistics are fully robust and statistically significant. Compared with the OLS partial effect, which is about 0.019 or 1.9 percentage points, the Tobit partial effects are larger for the first kid but almost the same for the second and the third kid. The partial effects of continuous explanatory variables can be obtained by taking derivatives of the conditional mean. In practice, we can also compute adjustment factors that make the adjusted Tobit coefficients roughly comparable to the OLS estimates. All of the Tobit coefficients given in Table A.17 for continuous variables are larger than the corresponding OLS coefficients in absolute value.
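The two-step calculation of a discrete partial effect described above can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard Tobit conditional mean $E(y|x) = \Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma)$ for censoring at zero, and the names `tobit_mean` and `ape_count`, as well as the numbers in the usage lines, are hypothetical and are not taken from Table A.17.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tobit_mean(index, sigma):
    """E[y | x] for y = max(0, index + u), u ~ Normal(0, sigma^2)."""
    z = index / sigma
    return norm_cdf(z) * index + sigma * norm_pdf(z)

def ape_count(other_index, alpha, sigma, k):
    """Sample average of the change in the Tobit mean when the count
    regressor moves from k to k + 1.

    other_index: per-observation values of the index excluding the count
    regressor (hypothetical inputs); alpha: coefficient on the count regressor.
    """
    diffs = [
        tobit_mean(xb + alpha * (k + 1), sigma) - tobit_mean(xb + alpha * k, sigma)
        for xb in other_index
    ]
    return sum(diffs) / len(diffs)

# Illustrative values only (not estimates from this chapter).
xb = [0.3, 0.1, -0.2]
first_child = ape_count(xb, alpha=-0.05, sigma=0.4, k=0)
second_child = ape_count(xb, alpha=-0.05, sigma=0.4, k=1)
```

Because the Tobit conditional mean is convex in the index, a negative coefficient produces reductions that shrink in magnitude as the count rises, which is the qualitative pattern reported for the first, second and third child.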
The Tobit partial effects for continuous variables, however, are only slightly larger than the corresponding OLS estimates in absolute value.

A fractional response model (FRM) with an exogenous number of kids

Following Papke & Wooldridge (1996), I also use the fractional probit model, assuming that the number of children is exogenous, for comparison purposes. The FRM estimates are similar to the Tobit estimates, but they are even closer to the OLS estimates. The statistical significance of the QML estimates is almost the same as that of the OLS estimates (see Table A.17). Having the second child reduces the estimated fraction of total weekly working hours by 1.9 percentage points, which is roughly the same as the OLS estimate. However, having the first and the third child results in different partial effects. Having the first kid makes a mother work less by 2.0 percentage points, and having the third kid makes a mother work less by 1.6 percentage points.

Two-stage least squares

In the literature on female labor supply, Angrist & Evans (1998) consider fertility endogenous. Their remarkable contribution is to use two binary instruments, an indicator that the first two births are of the same sex (samesex) and an indicator for twins at the second birth (multi2nd), to account for an endogenous third child. The 2SLS estimates are replicated and reported in Table A.18. The first-stage estimates, obtained by OLS and treating the number of children as continuous, are given in Table A.16. They show that an additional year of education is estimated to reduce the number of kids by 0.065. In magnitude, the 2SLS estimates are smaller than the OLS estimates for the number of kids but roughly the same for the other explanatory variables. With the IV estimates, having children leads a mother to work less by about 1.6 percentage points, which is smaller than the corresponding OLS estimate of about 1.9 percentage points. These findings are consistent with the results of Angrist & Evans (1998).
A Tobit BS model with an endogenous number of kids

A Tobit BS model is used with the number of children treated as endogenous (see Table A.18). Only the Tobit average partial effect of the number of kids is slightly larger than the corresponding 2SLS estimate. The APEs of the Tobit estimates are almost the same as those of the corresponding 2SLS estimates for the other explanatory variables. Having the first, second and third kid reduces the fraction of hours a mother spends working per week by around 1.8, 1.7 and 1.5 percentage points, respectively. Having the third kid reduces a mother's fraction of working hours per week by about the same amount as the 2SLS estimate. The statistical significance is almost the same for the number of kids. The Tobit BS method is similar to the 2SLS method in the sense that the first stage uses a linear estimation and ignores the discrete nature of the number of children. This explains why the Tobit BS results are very close to the 2SLS estimates.

A FRM with an endogenous number of kids

Now let us consider the FRM with the number of kids treated as endogenous. The fractional probit model with the Papke-Wooldridge (2008) method deals with the problem of endogeneity. However, this method does not take into account the problem of count endogeneity. The endogenous variable in this model is treated as a continuous variable, so the partial effects at discrete values of the count endogenous variable are not considered. In this chapter, the APEs of the QMLE-PW estimates are also computed in order to be comparable with the other APE estimates. Having the first kid reduces a mother's fraction of weekly working hours by the same amount as the 2SLS estimate. Treating the number of children as continuous also gives the same effect as the 2SLS estimate on the number of kids. As the number of children increases, a mother has to sacrifice more working hours.
Having the first, second and third kids reduces the fraction of hours a mother spends working per week by around 1.6, 1.5 and 1.4 percentage points, respectively. The statistical significance is the same as that of the Tobit BS estimates for the number of kids. The APEs of the two-step QMLE-PW estimates are almost the same as those of the corresponding 2SLS estimates for the other explanatory variables.

The fractional probit model with the methods proposed in this chapter is attractive because it controls for endogeneity, functional form issues and the presence of unobserved heterogeneity. More importantly, the number of children is treated as a count variable instead of a continuous variable. Both the two-step QMLE and NLS are considered, and the two-step NLS estimates are nearly identical to the two-step QML estimates. The coefficients and robust standard errors of the two-step QMLE and NLS are given in Table A.18, and the first-stage estimates are reported in Table A.16. In the first stage, the Poisson model for the count variable is preferred for two reasons. First, the distribution of the count variable, with its long tail and excess zeros, suggests that gamma heterogeneity is more appropriate than normal heterogeneity. Second, adding unobserved heterogeneity whose exponential has a standard gamma distribution to the Poisson model transforms it into the Negative Binomial model, which can be estimated by maximum likelihood. The OLS and Poisson estimates are not directly comparable. For instance, increasing education by one year reduces the number of kids by 0.065 according to the linear coefficient and by 7.8% according to the Poisson coefficient. The fractional probit estimates have the same signs as the corresponding OLS and 2SLS estimates. In addition, the results show that the two-step QMLE is more efficient than the OLS and 2SLS estimators. For magnitude, the fractional probit APEs are computed to make them comparable to the linear model estimates.
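The claim that a Poisson model with gamma heterogeneity becomes a Negative Binomial model can be checked numerically. The sketch below is illustrative only. It assumes a multiplicative heterogeneity term with a Gamma(δ, 1/δ) distribution (mean one), which is a common normalization and may differ from the exact parameterization used in this chapter. The function names and the values of `mu` and `delta` are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def gamma_pdf(e, delta):
    """Gamma density with shape delta and scale 1/delta (mean 1, variance 1/delta)."""
    return math.exp(delta * math.log(delta) + (delta - 1.0) * math.log(e)
                    - delta * e - math.lgamma(delta))

def negbin_pmf(y, mu, delta):
    """Negative Binomial probability with mean mu and dispersion parameter delta."""
    return math.exp(math.lgamma(y + delta) - math.lgamma(delta) - math.lgamma(y + 1)
                    + delta * math.log(delta / (delta + mu))
                    + y * math.log(mu / (delta + mu)))

def mixed_pmf(y, mu, delta, n=20000, upper=40.0):
    """Integrate the Poisson(mu * e) pmf against the gamma density of e
    using the midpoint rule on (0, upper)."""
    h = upper / n
    total = 0.0
    for j in range(n):
        e = (j + 0.5) * h
        total += poisson_pmf(y, mu * e) * gamma_pdf(e, delta) * h
    return total

# The two probabilities should agree up to numerical integration error.
gap = abs(mixed_pmf(3, 1.5, 2.0) - negbin_pmf(3, 1.5, 2.0))
```

The agreement between `mixed_pmf` and `negbin_pmf` is what allows the first stage to be estimated by maximum likelihood without numerical integration.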
As in the Tobit model, the partial effect of a discrete explanatory variable is obtained by estimating the conditional mean and taking the difference at the values we are interested in. Regarding the number of kids, having more kids reduces the fraction of hours that a mother works weekly. Having the first child cuts the estimated fraction of total weekly working hours by about 0.017, or 1.7 percentage points, which is similar to the 2SLS estimate and less than the OLS estimate. Having the second child and the third child makes a mother work less by about 1.5 percentage points and 1.4 percentage points, respectively. Even though having the third kid further reduces a mother's fraction of weekly working hours relative to having the second kid, the marginal reduction is smaller: it falls from 0.2 percentage points for the second kid to 0.1 percentage points for the third kid. This can be seen as an "adaptation effect", as the mother adapts and works more effectively after having the first kid. The partial effects of continuous explanatory variables can be obtained by taking derivatives of the conditional mean, so that they are comparable to the OLS, 2SLS and other alternative estimates.

All of the estimates in Table A.18 tell a consistent story about fertility. Having any children significantly reduces a mother's working hours per week. In addition, the more kids a woman has, the more hours she needs to forgo. The FRM that treats the number of kids as endogenous and as a count variable provides evidence that the marginal reduction in women's working hours per week becomes smaller as women have additional kids. In addition, the FRM estimates, which take into account the endogeneity and count nature of the number of children, are statistically significant and more significant than the corresponding linear model and Tobit BS estimates.
One advantage of the fractional probit model with the two-step QMLE (NLS) method discussed in this part is that it fits the data better than the alternative models or methods. Either the R-squared, $S_1 = SSE/SST = 1 - \sum_{i=1}^{n}(y_{1i} - \hat{y}_{1i})^2 / \sum_{i=1}^{n}(y_{1i} - \bar{y}_1)^2$, or the squared correlation, $S_2 = \{Corr[y_1, E(y_1|y_2, z)]\}^2$, can be used to compare the goodness of fit among these models. The statistics for the fractional probit QMLE, NLS, Tobit BS and linear 2SLS are 0.116, 0.114, 0.090 and 0.088, respectively. This shows that the fractional probit model using the two-step QMLE (NLS) methods has a larger goodness-of-fit statistic than the Tobit model using the Blundell-Smith procedure and the linear model using the 2SLS method.

It may seem questionable that the standard errors in the QMLE and NLS columns of Table A.18 are smaller than those of the alternative methods, given the simulation results in Table A.1. Two points need to be made clear: (i) the simulation results report standard deviations rather than standard error estimates; and (ii) Table A.1 shows the case of a strong IV, whereas our empirical application does not have a strong IV. Therefore, we need to look closely at Tables A.5 and A.6, which cover the cases of a weak IV or no IV. There, the standard deviations are also much smaller than those of the alternative methods, which matches the pattern observed in Table A.18. In addition, the standard errors in Table A.18 use analytical variances instead of bootstrapped variances, so they are not directly comparable to the other methods' variances, which are bootstrapped. However, the simulation result in Table A.13 implies that the analytical variance used in Table A.18 is quite reliable.
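The two goodness-of-fit measures used in this comparison, $S_1$ and $S_2$, are simple to compute once fitted conditional means are available. The following sketch assumes NumPy and uses hypothetical function names. The fitted values would come from whichever model is being evaluated, and the toy numbers below are not data from this chapter.

```python
import numpy as np

def r_squared(y, y_hat):
    """S1: one minus the residual sum of squares over the total sum of squares."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ssr = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - ssr / sst

def corr_squared(y, y_hat):
    """S2: squared sample correlation between the outcome and its fitted mean."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

# Toy inputs for illustration only.
y = [0.0, 1.0, 2.0, 3.0]
fitted = [0.5, 0.5, 2.5, 2.5]
s1 = r_squared(y, fitted)
s2 = corr_squared(y, fitted)
```

The two measures need not coincide for nonlinear models, which is one reason to report both when comparing the fractional probit, Tobit and linear specifications.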
Taken together, these results suggest that the linear approximation may be a poor one for the count and fractional response variables in a simultaneous model. It is also worth noting that the standard deviations of the nonlinear estimators do not increase when we go from the strong IV case to the weak IV case. In this particular fractional probit model, much of the identification appears to come from the nonlinearity. An exclusion restriction seems unnecessary when a nonlinear model, instead of a linear model, is used for the first stage. In other words, nonlinearity is responsible for identification. However, it is widely considered preferable to have an identification strategy that is robust to using a linear first-stage regression. This is ultimately a matter of judgment, and identification off the nonlinearity is still identification. We only need to worry about the assumptions on the functional form and the distribution of the error term.

1.5 Conclusion

I present the two-step QMLE and NLS methodology to estimate the fractional response model with a count endogenous explanatory variable. The unobserved heterogeneity is assumed to have an exponential gamma distribution, and the conditional mean of the fractional response model is estimated numerically. The two-step QMLE and NLS approaches are more efficient than the 2SLS and Tobit with IV estimates. They are more robust and less difficult to compute than the standard MLE method. This approach is applied to estimate the effect of fertility on the fraction of hours a woman works per week. Allowing the number of kids to be endogenous and using the data provided in Angrist & Evans (1998), I find that the marginal reduction in women's working hours per week becomes smaller as women have an additional kid.
In addition, the effect of the number of children on the fraction of hours that a woman spends working per week is statistically significant and more significant than the estimates in all the other linear and nonlinear models considered in this chapter.

Chapter 2

ESTIMATION OF A DYNAMIC TOBIT PANEL DATA WITH AN ENDOGENOUS VARIABLE AND AN APPLICATION TO FEMALE LABOR SUPPLY

2.1 Introduction

This chapter considers the estimation of a dynamic Tobit model with an endogenous regressor in the presence of unobserved heterogeneity in both stages and serial correlation in the first stage. The practical issue motivating this study is the dynamics of female annual labor supply, where the outcome is a corner solution that is affected by its previous state and by another source of endogeneity. The estimation method proposed in this chapter combines practical methods proposed in the literature. To deal with the first source of endogeneity, estimation methods for dynamic nonlinear models with a lagged dependent variable have been proposed with fixed effects or random effects. The fixed effects method is case-specific, computationally complex and often leads to estimators that do not converge at the usual $\sqrt{n}$ rate. In addition, partial effects for nonlinear models are not identified under this approach. Therefore, an appealing and robust method that handles the unobserved effects and the well-known initial condition problem has been proposed by Wooldridge (2005). To correct for the second source of endogeneity, we can use a control function approach, a convenient and computationally easy version of which was proposed by Smith & Blundell (1986) for limited dependent variables (LDV). As state dependence in a dynamic nonlinear model can be overestimated if serial correlation is not taken into account, we also need to correct for serial correlation.
The contribution of this chapter is to provide a computationally attractive estimation method for a dynamic censored model with an endogenous regressor (besides the lagged dependent variable) and serially correlated error terms. This method is readily applied to Panel Study of Income Dynamics (PSID) data for the years 1980 to 1992. Based on the estimation results, I find evidence of persistence in US white female working hours over the period 1980-1992. This suggests that the current labor supply of US women is affected by their past labor supply and by the initial condition of their labor supply. Both observed and unobserved individual heterogeneity, as well as serial correlation, play an important role in the persistence of US female labor supply.

This chapter is organized as follows. The second section reviews: (i) approaches to the estimation of a dynamic Tobit panel data model; (ii) a control function approach to handle the endogeneity problem; and (iii) methods to deal with serial correlation. It also discusses related issues in the dynamics of US female annual labor supply. The third section develops a model for dynamic Tobit panel data with an endogenous regressor, and the fourth section obtains average partial effect (APE) estimates. The fifth section discusses how to correct for serial correlation in the first stage. An empirical example follows in the next section. The last section summarizes and concludes.

2.2 Literature Review

The approach and framework of this chapter are most closely related to the work of Giles & Murtazashvili (2010). They allow continuous, endogenous contemporaneous regressors in a dynamic panel data model, but their outcome of interest is a binary variable. Their estimator is applied to analyze the impact of migrant labor markets on reducing the probability of falling into poverty.
Since the outcome variable in this chapter is continuous over positive values and has a pileup at zero, the framework for a dynamic binary response model in Giles & Murtazashvili (2010) has to be adjusted. A dynamic Tobit panel data model is appropriate in this case. There have recently been many studies on dynamic Tobit panel data models that allow for unobserved heterogeneity and dynamic feedback. These two features of the dynamic panel data model, however, often create difficulties in estimation. The main difficulty is that, with nonlinearity, it is not obvious how to "difference away" the individual specific effects and how to use instrumental variable type techniques.

Some progress has been made on estimating certain nonlinear dynamic models using the "fixed-effects" approach, for example, censored regression models (Honore (1993); Honore & Hu (2001)), sample selection models (Kyriazidou (2001)), discrete choice models (Honore & Kyriazidou (2000)), and models with multiplicative individual effects (Chamberlain (1992); Wooldridge (1997)). In particular, Honore (1993) proposed some solutions for estimating a censored regression panel data model with individual fixed effects and lagged censored dependent variables. Honore & Hu (2001) provided identification results for this approach under certain conditions. Honore & Hu (2004) allowed a lagged dependent variable and a set of strictly exogenous variables. They constructed moment conditions for the panel data model with fixed effects and a lagged (censored) dependent variable under the restrictive assumption of a non-negative coefficient on the lagged dependent variable. In addition, their approach does not yield estimates of APEs.
Even though semiparametric approaches do not make any assumptions on either the unobserved effects or the initial conditions, they are case-sensitive and often lead to estimators that do not converge at the usual $\sqrt{n}$ rate (Arellano & Honore (2001)). For example, Honore & Kyriazidou (2000) assumed that the transitory errors are iid over time ($c_i$ is allowed to depend arbitrarily on $X_{it}$). If the regressors are continuous or have high dimension, then the estimator will have a convergence rate slower than $\sqrt{n}$. The estimator will over-difference the data and understate the role of the initial value of the dependent variable, $y_{i0}$. This causes a downward-biased coefficient on the lagged dependent variable in finite samples, and this bias will not decrease as $T$ increases. More importantly, partial effects on the conditional mean are not identified. The amount of state dependence therefore cannot be determined.

An alternative estimation method for nonlinear dynamic models is the "random-effects" approach. This approach faces the notably difficult initial condition problem. Wooldridge (2005), Chay & Hyslop (1998), and Hsiao (1986) provide excellent summaries of how this problem is treated in the literature. There are three alternative assumptions on initial conditions. The first approach treats the initial condition as exogenous (Heckman (1978a,b, 1981b)). Initial conditions are independent of the individual effects and can be ignored when estimating the structural model. However, if either $c_i$ or $X_{it}$ is a determining factor in the initial sample conditions, then this approach will overstate the amount of state dependence in the process. Moreover, this is a very strong assumption and may not make sense. For example, it would require ability to be uncorrelated with initial earnings. The second approach treats the initial condition as in equilibrium (Card & Sullivan (1988)).
This restriction is unlikely to hold when observable covariates are time-varying and important determinants of the outcome. The initial condition is allowed to be random, and the distribution of the initial condition given the unobserved heterogeneity is specified. This model does not allow for additional covariates (Bhargava & Sargan (1983); Hsiao (1986)). The third way is to adopt a flexible reduced-form specification that approximates the initial sample observation (Heckman (1981b)). With this approach, estimates of the parameters and APEs are computationally difficult to obtain. The first approach is viewed as a "pure" random effects approach in which $c_i$ is independent of $z_i$ and $y_{i0}$. In addition, the unobserved effect is independent of the exogenous variables. One can obtain the density of $(y_{i1}, \ldots, y_{iT})$ given $y_{i0}$ and $z_i$ by integrating out $c_i$. This method requires a strong assumption of independence between the initial condition and the unobserved effect.

The fourth approach, proposed by Wooldridge (2005), is the least restrictive random effects model and is called the "correlated" random effects approach. Compared with the fixed effects model, it may provide substantial efficiency gains (Hausman (1978)), given a correctly specified distribution of $c_i$ and $y_{i0}$. It works with the joint distribution of $(y_{i1}, \ldots, y_{iT})$ conditional on $y_{i0}$ and $z_i$, rather than the distribution of $(y_{i0}, \ldots, y_{iT})$ conditional on $z_i$ as in Heckman's approach. However, we need to specify a density of $c_i$ given $y_{i0}$ and $z_i$ (motivated by the original idea of Chamberlain (1980)). The model is called "correlated" random effects because it allows a linear relationship between $c_i$ and both $z_i$ and $y_{i0}$. This approach requires less computational effort than Heckman's technique and delivers straightforward APEs. It also has the advantage that we can choose a flexible conditional distribution for the initial condition instead of relying on an approximation that is computationally difficult.
As a consequence, estimates are readily computed and partial effects can be easily determined.

The study of limited dependent variable models with an endogenous regressor (other than the lagged dependent variable) has a fairly long history. Most papers in the literature have assumed a reduced form for the endogenous variable. Examples include Nelson & Olson (1978); Heckman (1978a); Amemiya (1979); Newey (1985, 1986); Blundell & Smith (1989); Vella (1993); Blundell & Powell (2004); Das (2002) for cross sections and Vella & Verbeek (1999); Labeaga (1999); Giles & Murtazashvili (2010) for panel data. In a linear model, such a reduced form (or "first stage") can be thought of as a linear projection, and as such it is essentially always well-defined and consistently estimated by the OLS estimator. This is not the case in a nonlinear model, where it is typically assumed that the first stage is a conditional expectation and that the error is independent of the instruments. Smith & Blundell (1986) considered a static Tobit model to analyze female labor supply in the UK in 1981, treating other household income as endogenous. The insight of their paper is to substitute a consistent estimator of the residual in the reduced-form equation into the structural model to control for the endogeneity. This is called the control function approach, and it produces a two-step estimator based on the conditional likelihood for the equation of interest. As this chapter studies a source of endogeneity that does not come from the lagged dependent variable, we will employ a control function approach that allows for correlation between the unobserved effect and the regressors, as well as between the regressors and the structural error.

In Baltagi & Li (1991) and Baltagi & Wu (1999), estimation of a panel data model with AR(1) disturbances is based on a feasible generalized least squares procedure.
This method is simple to compute and provides natural estimates of the serial correlation and variance components parameters. The test for zero first-order serial correlation is also easily implemented. However, while this estimation procedure works very well for a linear panel data model, it has not been applied to a nonlinear panel data model. In order to deal with serial correlation in a dynamic nonlinear model, Lee (1999) proposed the simulated maximum likelihood method. This method is robust in a time-series context, but it is quite computationally intensive. We will exploit a method similar to Baltagi & Li (1991) to handle serial correlation in our first stage.

One possible application of this model is to study the persistence of US female labor supply, taking into account endogenous features of the observed covariates. Women's labor supply is a long-standing topic in labor supply research (Heckman (1974); Heckman & Macurdy (1980)). The number of studies of women's labor supply is growing rapidly due to the increasing availability of panel data and improved computational power and techniques. According to Heckman (1981a), state dependence may arise if working leads to the accumulation of human capital (skills, know-how, work ethic, etc.) and not working leads to the depreciation of human capital. Women who prefer work to leisure, who are highly motivated and who have high ability tend to stay in the work force for their entire working life, and so exhibit high labor supply persistence. Differences in "search costs" associated with different labor market states may also cause state dependence (Eckstein & Wolpin (1990); Hyslop (1999)). There might be a fixed cost of entering the labor market, which raises the cost for individuals who are not employed relative to those already in the labor market.
Shaw (1994) studied the persistence of US white female labor supply from 1967 to 1987 using a linear dynamic model with age stratification. She found persistence in their labor supply because, as women entered the labor force, they tended to become continuous workers. She also found that the extent of persistence changed little over the 20-year period studied after controlling for individual circumstances that are influential in early and late life periods, such as number of children, health status, age, and wages. However, she did not take into account the nature of working hours as a limited dependent variable, and she did not examine whether the persistence comes from transitory shocks that might be serially correlated.

2.3 Model

I consider a panel data model with the latent variable as follows:

$y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + c_{1i} + u_{1it}$, (2.1)

$y_{1it} = \max(0, y^*_{1it})$, $t = 1, \ldots, T$, (2.2)

where $y_{1it}$ is observed and equal to zero with positive probability while continuously distributed over strictly positive values, $y_{1i,t-1}$ is a lagged dependent variable and the dynamics are assumed to be first order, $y_{2it}$ is an endogenous variable, $x_{it}$ is a $1 \times K$ vector of time-varying explanatory variables which can contain a constant term, $c_{1i}$ is time-constant unobserved heterogeneity and $u_{1it}$ is an idiosyncratic error. $\beta$ is a $K \times 1$ vector of parameters, and $\rho$ and $\alpha$ are scalar parameters. $i$ indexes a random draw from the cross section with sample size $N$, and $t$ denotes a particular time period within a fixed number of time periods $T$. For simplicity, we assume a balanced panel. In what follows, $i = 1, \ldots, N$ and $t = 1, \ldots, T$. We assume that model (2.1) is correctly specified dynamically and that the error term is serially uncorrelated:

$u_{1it} | y_{1i,t-1}, \ldots, y_{1i0}, x_i, c_{1i} \sim Normal(0, \sigma^2_{u_1})$. (2.3)

If we allow the error term to be serially correlated, for example by allowing for an AR(1) process, we would want to include not only a lagged dependent variable but also lags of $x$. In this case, we include a single lag of $y_1$, contemporaneous $y_2$ and possibly lags of $x$. In model (2.1), $x_{it}$ is assumed to be strictly exogenous and $y_{2it}$ is allowed to be endogenous. Let $z_{it} = (x_{it}, z_{1it})$ be a set of strictly exogenous variables, a $1 \times L$ vector of instrumental variables, where $L > K$ and $z_{1it}$ is excluded from (2.1). Using the control function approach to model the endogeneity (see Smith & Blundell (1986); Rivers & Vuong (1988)), we can assume a linear reduced form for $y_{2it}$ as follows:

$y_{2it} = z_{it}\gamma + c_{2i} + u_{2it}$, (2.4)

where $u_{2it}$ is an idiosyncratic, serially uncorrelated error with $Var(u_{2it}) = \sigma^2_{u_2}$ and $c_{2i}$ is an unobserved effect. Using the device of Mundlak (1978), we allow $c_{2i} = \bar{z}_i\delta + a_{2i}$ and rewrite $y_{2it}$ as:

$y_{2it} = z_{it}\gamma + \bar{z}_i\delta + v_{2it}$, (2.5)

where $v_{2it} = a_{2i} + u_{2it}$, $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$ and $a_{2i} | z_i \sim Normal(0, \sigma^2_{a_2})$. We can also add time dummies to this reduced form.

It now comes down to the assumptions that we need to make about the conditional distributions of $u_{1it}$ and $c_{1i}$. We first discuss $u_{1it}$. As in the cross-sectional case discussed in Smith & Blundell (1986), we could allow joint normality of $u_{1it}$ and $v_{2it}$. However, $v_{2it}$ is serially correlated because of the presence of the heterogeneity $a_{2i}$, which would make the serial correlation issue in estimation with dynamics difficult to handle. As a result, we start with the assumption of joint normality of $u_{1it}$ and $u_{2it}$, as suggested in Giles & Murtazashvili (2010), since $u_{2it}$ can naturally be assumed to be serially uncorrelated. As we will see in the discussion below, this assumption is reasonable in our context of dynamics in the structural equation. We write:

$u_{1it} = \theta_1 u_{2it} + e_{1it}$, (2.6)

where $\theta_1 = Cov(u_{1it}, u_{2it})/Var(u_{2it})$ and $e_{1it} \sim Normal(0, \sigma^2_{e_1})$.
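As a minimal sketch of the reduced form (2.5), the following Python code builds the Mundlak time averages $\bar{z}_i$ and runs pooled OLS to obtain the residuals $v_{2it}$. The function name `first_stage_residuals`, the array layout and the use of NumPy are illustrative assumptions rather than the implementation used in this chapter. Time dummies are omitted for brevity, and the instruments are assumed to be time-varying so that the regressor matrix has full rank.

```python
import numpy as np

def first_stage_residuals(y2, Z):
    """Pooled OLS of y2_it on (1, z_it, zbar_i) for a balanced panel.

    y2: array of shape (N, T) with the endogenous regressor.
    Z:  array of shape (N, T, L) with the strictly exogenous variables.
    Returns the coefficient vector (intercept, gamma, delta) and the
    residuals v2_it arranged as an (N, T) array.
    """
    N, T, L = Z.shape
    zbar = Z.mean(axis=1, keepdims=True)        # time averages, shape (N, 1, L)
    zbar_rep = np.broadcast_to(zbar, (N, T, L))  # repeat for every period
    X = np.concatenate([np.ones((N, T, 1)), Z, zbar_rep], axis=2)
    X = X.reshape(N * T, 2 * L + 1)
    y = y2.reshape(N * T)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = (y - X @ coef).reshape(N, T)
    return coef, resid

# Simulated toy panel (hypothetical numbers, for illustration only).
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 4, 2))
y2 = 1.0 + Z @ np.array([0.5, -0.2]) + rng.normal(size=(50, 4))
coef, v2 = first_stage_residuals(y2, Z)
v2_bar = v2.mean(axis=1)  # time averages of the residuals for each unit
```

The residuals returned here play the role of the control function terms that enter the structural equation in the second step.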
$(u_{1it}, u_{2it})$ is allowed to have a zero-mean, bivariate normal distribution; $z_i$ is strictly exogenous in both equations (2.1) and (2.5), or in other words, $u_{1it}$ and $u_{2it}$ are independent of $z_i$. $e_{1it}$ is independent of $z_i$ and $u_{2it}$. We can assume that $e_{1it}$ is serially uncorrelated because $u_{2it}$ is serially uncorrelated and independent of $z_i$, and because $u_{1it}$ is free of serial correlation. Even if $u_{2it}$ is serially correlated, we can correct for this serial correlation without difficulty. We discuss this issue in more detail later. Regarding the endogeneity of $y_{2it}$, since $u_{2it} = v_{2it} - a_{2i}$, we can rewrite equation (2.6) as:

$u_{1it} = \theta_1 v_{2it} - \theta_1 a_{2i} + e_{1it}$. (2.7)

This shows the direct relation between $u_{1it}$ and $v_{2it}$, through which we can account for the endogeneity of $y_{2it}$ in period $t$. In addition, since $u_{2it}$ is free of serial correlation, equation (2.6) implies that $e_{1it}$ is not correlated with $u_{2i,t-1}$ and $v_{2i,t-1}$. By the same argument for past values of $v_{2it}$, $y_{2it}$ becomes sequentially exogenous in the estimating equation. With equation (2.7), we now have to handle the heterogeneity issues arising not only from $c_{1i}$ but also from $a_{2i}$. Rewriting the structural equation under the assumption in equation (2.7), we have:

$y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + c_{1i} + \theta_1 v_{2it} - \theta_1 a_{2i} + e_{1it}$, $t = 1, \ldots, T$,

or

$y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + s_i + \theta_1 v_{2it} + e_{1it}$, $t = 1, \ldots, T$, (2.8)

where $s_i = c_{1i} - \theta_1 a_{2i}$ is a composite error.

Using the Wooldridge-Chamberlain device (2005, 1980), and motivated by the "correlated" random effects dynamic model proposed by Wooldridge (2005) to handle the initial condition problem, we can specify $s_i$ as a linear function of $y_{1i0}$ and $z_i$ in order to use standard random effects Tobit software without approximating the density function of $s_i$.
However, the regressors in equation (2.8) now include $v_{2it}$ (which is not in $z_i$); therefore, we include $v_{2i}$ in the linear function that describes the relationship between $s_i$ and the initial condition as well as the explanatory variables in all time periods:

\[ s_i = \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + a_{1i}, \tag{2.9} \]

where

\[ a_{1i} \mid (y_{1i0}, z_i, v_{2i}) \sim \mathrm{Normal}(0, \sigma^2_{a_1}). \tag{2.10} \]

This is a reasonable assumption because the unobserved effect (such as motivation or ambition) is correlated with the initial condition of the outcome of interest (working hours). In addition, because the model has a lagged dependent variable, $y_{1i,t-1}$ and $c_{1i}$ are correlated. To conserve degrees of freedom or to reduce computation time, which is important in applied work with a substantial number of explanatory variables, we can assume that the explanatory variables in different time periods have equal impacts on $s_i$ and, using Mundlak's device, restrict assumption (2.9) to:

\[ s_i = \theta_2 y_{1i0} + \bar{z}_i\theta_3 + \bar{v}_{2i}\theta_4 + a_{1i}. \]

We can see that now $v_{1it} = \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}$. Substituting this into equation (2.1), we obtain:

\[ y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}, \quad t = 1, \ldots, T, \tag{2.11} \]

and in a shorter version:

\[ y_{1it} = \max(0,\; w_{1it}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}), \quad t = 1, \ldots, T, \tag{2.12} \]

where $w_{1it} = (y_{1i,t-1}, y_{2it}, x_{it})$ and $\lambda_1 = (\rho, \alpha, \beta')'$.

Based on the estimating equation (2.11), and following the framework suggested in Wooldridge (2002, section 13.9) and Wooldridge (2005), we can write the density as follows:

\[ f_t(y_{1t} \mid y_{1,t-1}, y_{2t}, x_t, y_{10}, z, v_2, a_1; \lambda) = f_{0t} f_{1t}, \tag{2.13} \]

where

\[ f_{0t} = \left\{1 - \Phi\left[(w_{1t}\lambda_1 + \theta_2 y_{10} + z\theta_3 + v_2\theta_4 + \theta_1 v_{2t} + a_1)/\sigma_{e_1}\right]\right\}^{1[y_{1t}=0]} \]

and

\[ f_{1t} = \left\{(1/\sigma_{e_1})\,\phi\left[(y_{1t} - w_{1t}\lambda_1 - \theta_2 y_{10} - z\theta_3 - v_2\theta_4 - \theta_1 v_{2t} - a_1)/\sigma_{e_1}\right]\right\}^{1[y_{1t}>0]}. \]

Thus the density of $(y_{1i1}, y_{1i2}, \ldots, y_{1iT})$ given $(y_{1i0} = y_{10}, z_i = z, v_{2i} = v_2, a_{1i} = a_1)$ is:

\[ \prod_{t=1}^{T} f_t(y_{1t} \mid y_{1,t-1}, y_{2t}, x_t, y_{10}, z, v_2, a_1; \lambda), \tag{2.14} \]

and since we do not observe $a_{1i}$, we need to integrate $a_1$ out of this density in order to estimate $\lambda$. Given $a_{1i} \mid (y_{1i0}, z_i, v_{2i}) \sim \mathrm{Normal}(0, \sigma^2_{a_1})$, we obtain the density of $(y_{1i1}, y_{1i2}, \ldots, y_{1iT})$ given $(y_{1i0} = y_{10}, z_i = z, v_{2i} = v_2)$ as:

\[ \int_{\mathbb{R}} \left[\prod_{t=1}^{T} f_t(y_{1t} \mid y_{1,t-1}, y_{2t}, x_t, y_{10}, z, v_2, a_1; \lambda)\right] (1/\sigma_{a_1})\,\phi(a_1/\sigma_{a_1})\, da_1, \tag{2.15} \]

which has exactly the same structure as in the standard random effects Tobit model, but the explanatory variables at time period $t$ are:

\[ w_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, v_{2i}, v_{2it}). \tag{2.16} \]

Now we can exploit standard random effects Tobit software for estimation. We add $y_{1i0}$, $z_i$, $v_{2i}$ and $v_{2it}$ as additional explanatory variables in each time period and estimate $\lambda$, $\theta_3$, $\theta_4$ and $\sigma^2_{e_1}$, where $v_{2i} = (v_{2i1}, v_{2i2}, \ldots, v_{2iT})$. Based on the above model development, the estimation procedure for the "correlated random effect" dynamic Tobit model is proposed as follows.

Estimation Procedure:

(i) Estimate the reduced form for $y_{2it}$ using pooled OLS of $y_{2it}$ on $z_{it}$, $\bar{z}_i$, and time dummies. Obtain the residuals, $\hat{v}_{2i}$ and $\hat{v}_{2it}$.

(ii) Use the random effect Tobit of $y_{1it}$ on $w_{it}$ and get all the estimates of interest, $\hat{\lambda}$, where $w_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, \hat{v}_{2i}, \hat{v}_{2it})$.

2.4 Average Partial Effects

In order to compare the magnitude of the estimate obtained in the nonlinear model of the previous section with a linear estimate, we need to obtain the marginal effect or the average partial effect (APE) of the explanatory variable of interest. Following Wooldridge (2002, 2005), the APEs are computed as the derivatives or differences of:

\[ E\left[m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i},\; \sigma^2_{e_1})\right], \quad t = 1, \ldots, T, \tag{2.17} \]

where $m(g, \sigma^2_{e_1}) = \Phi[g/\sigma_{e_1}]\,g + \sigma_{e_1}\phi[g/\sigma_{e_1}]$ with

\[ g = w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i}, \]

and, in the argument of the expectation operator, variables with a subscript $i$ are random and all others are fixed. Using iterated expectations, expression (2.17) can be rewritten as:

\[ E\left\{E\left[m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i},\; \sigma^2_{e_1}) \mid y_{1i0}, z_i, v_{2i}\right]\right\}, \tag{2.18} \]

where $w_{1t}$ are fixed values and the expectation is with respect to the distribution of $(y_{1i0}, z_i, v_{2i}, a_{1i})$. Since $a_{1i}$ and $(y_{1i0}, z_i, v_{2i})$ are independent, and $a_{1i} \sim \mathrm{Normal}(0, \sigma^2_{a_1})$, the conditional expectation in equation (2.18) is obtained by integrating

\[ m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i},\; \sigma^2_{e_1}) \]

over $a_{1i}$ with respect to the $\mathrm{Normal}(0, \sigma^2_{a_1})$ distribution. Since this function is itself obtained by integrating

\[ \max(0,\; w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}) \]

with respect to $e_{1it}$ over the $\mathrm{Normal}(0, \sigma^2_{e_1})$ distribution, the conditional expectation in equation (2.18) is:

\[ m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it},\; \sigma^2_{a_1} + \sigma^2_{e_1}). \tag{2.19} \]

For a given value of $w_{1t}$, say $w_1^0$, a consistent estimator of expression (2.19) is obtained by replacing the unknown parameters with consistent estimators and averaging across $i$:

\[ N^{-1}\sum_{i=1}^{N} m(w_1^0\hat{\lambda}_1 + \hat{\theta}_2 y_{1i0} + z_i\hat{\theta}_3 + \hat{v}_{2i}\hat{\theta}_4 + \hat{\theta}_1\hat{v}_{2it},\; \hat{\sigma}^2_{a_1} + \hat{\sigma}^2_{e_1}), \tag{2.20} \]

where $\hat{v}_{2it}$ are the first-stage pooled OLS residuals from the regression of $y_{2it}$ on $z_{it}$, $\bar{z}_i$ and time dummies, and $\hat{v}_{2i} = (\hat{v}_{2i1}, \hat{v}_{2i2}, \ldots, \hat{v}_{2iT})$.

The APEs are obtained by taking derivatives or differences of expression (2.19) (in which $w_1^0$ is replaced with $w_{1t}$) with respect to $w_{1t}$, and the estimators of these APEs are obtained from those derivatives or differences evaluated at the estimated parameters.
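The step from (2.18) to (2.19) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it codes $m(g, \sigma^2)$, integrates $m(g + a_1, \sigma^2_{e_1})$ against the $\mathrm{Normal}(0, \sigma^2_{a_1})$ density with a simple trapezoid rule, and compares the result with $m(g, \sigma^2_{a_1} + \sigma^2_{e_1})$. The function names and the numerical values of `g0`, `s2a` and `s2e` are hypothetical and chosen only for illustration.

```python
import math


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tobit_mean(g, sigma2):
    """m(g, sigma^2) = Phi(g/sigma) * g + sigma * phi(g/sigma)."""
    s = math.sqrt(sigma2)
    return norm_cdf(g / s) * g + s * norm_pdf(g / s)


def integrate_over_a(g, sigma2_a, sigma2_e, n=4001, width=10.0):
    """Trapezoid-rule integral of m(g + a, sigma2_e) against Normal(0, sigma2_a)."""
    sa = math.sqrt(sigma2_a)
    lo, hi = -width * sa, width * sa
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        a = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * tobit_mean(g + a, sigma2_e) * norm_pdf(a / sa) / sa
    return total * h


def ape_scale(g, sigma2_a, sigma2_e):
    """Derivative of m(g, sigma2_a + sigma2_e) with respect to g."""
    return norm_cdf(g / math.sqrt(sigma2_a + sigma2_e))


# Hypothetical index value and variances, for illustration only.
g0, s2a, s2e = 0.3, 0.5, 1.2
lhs = integrate_over_a(g0, s2a, s2e)   # inner expectation in (2.18)
rhs = tobit_mean(g0, s2a + s2e)        # closed form in (2.19)
```

Under these assumptions `lhs` and `rhs` should agree up to integration error. The helper `ape_scale` returns the scale factor $\Phi[g/(\sigma^2_{a_1} + \sigma^2_{e_1})^{1/2}]$ that multiplies a coefficient when a derivative of (2.19) is taken with respect to a continuous element of $w_{1t}$.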
For example, the APE of $y_{1,t-1}$ is estimated by:

\[ \hat{\rho}\,(NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} \Phi\left[(w_{1t}\hat{\lambda}_1 + \hat{\theta}_2 y_{1i0} + z_i\hat{\theta}_3 + \hat{v}_{2i}\hat{\theta}_4 + \hat{\theta}_1\hat{v}_{2it})/(\hat{\sigma}^2_{a_1} + \hat{\sigma}^2_{e_1})^{1/2}\right], \tag{2.21} \]

and the APE of $y_{2t}$ by:

\[ \hat{\alpha}\,(NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} \Phi\left[(w_{1t}\hat{\lambda}_1 + \hat{\theta}_2 y_{1i0} + z_i\hat{\theta}_3 + \hat{v}_{2i}\hat{\theta}_4 + \hat{\theta}_1\hat{v}_{2it})/(\hat{\sigma}^2_{a_1} + \hat{\sigma}^2_{e_1})^{1/2}\right]. \tag{2.22} \]

2.5 Serial Correlation Correction

As discussed in the previous section, the essential assumption made in equation (2.6) requires $u_{2it}$ to be free of serial correlation. If $u_{2it}$ is serially correlated, then we must correct for the resulting serial correlation in $e_{1it}$; otherwise our estimator will not be consistent. For simplicity, assume that $u_{2it}$ follows an AR(1) process, similar to the discussion in Giles & Murtazashvili (2010):

\[ u_{2it} = \eta u_{2i,t-1} + e_{2it}, \quad t = 1, \ldots, T, \tag{2.23} \]

where $e_{2it}$ is a white noise error with $\mathrm{Var}(e_{2it}) = \sigma^2_{e_2}$. We have:

\[ y_{2it} = w_{2it}\gamma_2 + a_{2i} + u_{2it}, \tag{2.24} \]

where $w_{2it} = (z_{it}, \bar{z}_i)$ and $\gamma_2 = (\gamma', \delta')'$, or, stacking over time periods:

\[ y_{2i} = w_{2i}\gamma_2 + v_{2i}. \tag{2.25} \]

Since $u_{2it}$ has serial correlation, $e_{1it}$ is serially correlated, as we can see below:

\[ \begin{aligned} \eta_e &= \mathrm{Cov}(e_{1it}, e_{1i,t-1}) = \mathrm{Cov}(u_{1it} - \theta_1 u_{2it},\; u_{1i,t-1} - \theta_1 u_{2i,t-1}) \\ &= \mathrm{Cov}(u_{1it} - \theta_1\eta u_{2i,t-1} - \theta_1 e_{2it},\; u_{1i,t-1} - \theta_1 u_{2i,t-1}) \\ &= \eta\,\theta_1^2\,\mathrm{Var}(u_{2i,t-1}), \end{aligned} \]

so that $\eta_e \neq 0$ unless $\eta = 0$ or $\theta_1 = 0$.

To remove serial correlation in $e_{1it}$, our strategy is to use a transformation procedure and obtain a first-stage residual that is free of serial correlation. Define the variance-covariance matrix of $v_{2i}$ as:

\[ \Gamma = E(v_{2i}v_{2i}') = \sigma^2_{a_2}\, j_T j_T' + \sigma^2_{u_2}\Psi(\eta), \tag{2.26} \]

where $\Gamma$ is a $T \times T$ positive definite matrix when $-1 < \eta < 1$, which I assume in what follows. This matrix is necessarily the same for all $i$ because of the random sampling assumption in the cross section. $j_T$ is a $T \times 1$ vector of ones, and $\Psi(\eta)$ is defined as:

\[ \Psi(\eta) = \begin{pmatrix} 1 & \eta & \eta^2 & \cdots & \eta^{T-2} & \eta^{T-1} \\ \eta & 1 & \eta & \cdots & \eta^{T-3} & \eta^{T-2} \\ \eta^2 & \eta & 1 & \cdots & \eta^{T-4} & \eta^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \eta^{T-2} & \eta^{T-3} & \eta^{T-4} & \cdots & 1 & \eta \\ \eta^{T-1} & \eta^{T-2} & \eta^{T-3} & \cdots & \eta & 1 \end{pmatrix}. \tag{2.27} \]

We also note that $\sigma^2_{u_2} = \sigma^2_{e_2}/(1 - \eta^2)$. After obtaining consistent estimates of $\eta$, $\sigma^2_{a_2}$ and $\sigma^2_{u_2}$ (and $\sigma^2_{e_2}$), we can transform $v_{2it}$ into $v^*_{2it}$, which is free of serial correlation. With this new serially uncorrelated error, we can transform $u_{2it}$ into a new serially uncorrelated $u^*_{2it}$ using the equation $u^*_{2it} = v^*_{2it} - a_{2i}$. This guarantees that the new error $e^*_{1it} = u_{1it} - \theta_1 u^*_{2it}$ is free of serial correlation. $e^*_{1it}$ is now serially uncorrelated, independent of $z_i$ and $u^*_{2it}$, and has a normal distribution, $\mathrm{Normal}(0, \sigma^{*2}_{e_1})$.

We briefly describe the transformation procedure as follows. Using the fact that $j_T' j_T = T$, $\Gamma$ is rewritten as:

\[ \Gamma = T\sigma^2_{a_2}\, j_T (j_T' j_T)^{-1} j_T' + \sigma^2_{u_2}\Psi(\eta) = T\sigma^2_{a_2} P_T + \sigma^2_{u_2}\Psi(\eta), \tag{2.28} \]

where $P_T \equiv I_T - Q_T$ and $Q_T = I_T - j_T (j_T' j_T)^{-1} j_T'$. Defining $\tau_1 = \sigma^2_{u_2}\Psi(\eta)/[T\sigma^2_{a_2} + \sigma^2_{u_2}\Psi(\eta)]$, we can write:

\[ \Gamma = \left[T\sigma^2_{a_2} + \sigma^2_{u_2}\Psi(\eta)\right](P_T + \tau_1 Q_T). \tag{2.29} \]

After some algebra, we can show that $(P_T + \tau_1 Q_T)^{-1/2} = (1 - \tau)^{-1}[I_T - \tau P_T]$, where $\tau = 1 - \sqrt{\tau_1}$. Hence,

\[ \Gamma^{-1/2} = \left[T\sigma^2_{a_2} + \sigma^2_{u_2}\Psi(\eta)\right]^{-1/2}(1 - \tau)^{-1}[I_T - \tau P_T] = \left[\sigma^2_{u_2}\Psi(\eta)\right]^{-1/2}[I_T - \tau P_T], \tag{2.30} \]

where $\tau = 1 - \left\{\sigma^2_{u_2}\Psi(\eta)/[T\sigma^2_{a_2} + \sigma^2_{u_2}\Psi(\eta)]\right\}^{1/2}$.

Define $C_T \equiv [\Psi(\eta)]^{-1/2}[I_T - \tau P_T]$ and transform equation (2.25) into:

\[ \tilde{y}_{2i} = \tilde{w}_{2i}\gamma_2 + \tilde{v}_{2i}, \tag{2.31} \]

by premultiplying both sides of equation (2.25) by $C_T$. The variance matrix of $\tilde{v}_{2i}$ is now:

\[ E(\tilde{v}_{2i}\tilde{v}_{2i}') = C_T\Gamma C_T' = \sigma^2_{u_2} I_T. \tag{2.32} \]

Therefore we have transformed $v_{2i}$ into $v^*_{2i}$ $(= \tilde{v}_{2i})$, which is serially uncorrelated and homoskedastic, by using:

\[ \tilde{v}_{2i} = C_T v_{2i}. \tag{2.33} \]

The estimator of $C_T$ is:

\[ \hat{C}_T = \hat{\sigma}_{u_2}\hat{\Gamma}^{-1/2}. \tag{2.34} \]

We can see that, in the special case when $\eta = 0$ (no serial correlation), $\Psi(\eta) = I_T$ and $C_T = [I_T - \tau P_T]$.
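The whitening property in (2.32) can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the procedure used in the chapter: it builds $\Psi(\eta)$ and $\Gamma$ from (2.26)-(2.27), forms $C_T = \sigma_{u_2}\Gamma^{-1/2}$ as in (2.34) through an eigendecomposition (instead of the closed form (2.30)), and checks that $C_T\Gamma C_T' = \sigma^2_{u_2} I_T$. True parameter values stand in for the estimates, and the function names and the numbers chosen for $T$, $\eta$ and the variances are hypothetical.

```python
import numpy as np


def psi(eta, T):
    """AR(1) correlation matrix of (2.27): element (s, t) equals eta**|s - t|."""
    idx = np.arange(T)
    return eta ** np.abs(idx[:, None] - idx[None, :])


def gamma_matrix(sigma2_a, sigma2_u, eta, T):
    """Gamma = sigma2_a * j j' + sigma2_u * Psi(eta), as in (2.26)."""
    j = np.ones((T, 1))
    return sigma2_a * (j @ j.T) + sigma2_u * psi(eta, T)


def inv_sqrt_sym(M):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def transform_matrix(sigma2_a, sigma2_u, eta, T):
    """C_T = sigma_u * Gamma^{-1/2}, following the form of (2.34)."""
    G = gamma_matrix(sigma2_a, sigma2_u, eta, T)
    return np.sqrt(sigma2_u) * inv_sqrt_sym(G)


# Hypothetical parameter values, for illustration only.
T, eta, sigma2_a, sigma2_e = 5, 0.4, 0.7, 1.0
sigma2_u = sigma2_e / (1.0 - eta ** 2)      # stationary AR(1) variance

G = gamma_matrix(sigma2_a, sigma2_u, eta, T)
C = transform_matrix(sigma2_a, sigma2_u, eta, T)
whitened_cov = C @ G @ C.T                  # should equal sigma2_u * I_T
```

In an application, each individual's stacked residual vector $\hat{v}_{2i}$ would be premultiplied by $\hat{C}_T$ to obtain $\hat{v}^*_{2i}$, as in (2.33).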
Now we can adjust equation (2.9) under the modified assumption that:

\[ s^*_i = \theta_2 y_{1i0} + z_i\theta_3 + v^*_{2i}\theta_4 + a^*_{1i}, \tag{2.35} \]

where $a^*_{1i} \mid (y_{1i0}, z_i, v^*_{2i}) \sim \mathrm{Normal}(0, \sigma^{*2}_{a_1})$, and obtain:

\[ y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + \theta_2 y_{1i0} + z_i\theta_3 + v^*_{2i}\theta_4 + \theta_1 v^*_{2it} + a^*_{1i} + e^*_{1it}, \quad t = 1, \ldots, T, \tag{2.36} \]

where $v^*_{2i} = (v^*_{2i1}, v^*_{2i2}, \ldots, v^*_{2iT})$. We estimate all parameters in the second stage using standard random effects Tobit software, based on the density of $(y_{1i1}, y_{1i2}, \ldots, y_{1iT})$ given $(y_{1i0} = y_{10}, z_i = z, v^*_{2i} = v^*_2)$:

\[ \int_{\mathbb{R}} \left[\prod_{t=1}^{T} f_t(y_{1t} \mid y_{1,t-1}, y_{2t}, x_t, y_{10}, z, v^*_2, a^*_1; \lambda^*)\right] (1/\sigma^*_{a_1})\,\phi(a^*_1/\sigma^*_{a_1})\, da^*_1, \tag{2.37} \]

which has exactly the same structure as in the standard random effects Tobit model, but the explanatory variables at time period $t$ are:

\[ w^*_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, v^*_{2i}, v^*_{2it}). \tag{2.38} \]

Now we can propose an estimation procedure for the "correlated random effect" dynamic Tobit model with first-stage residual serial correlation correction.

2.5.1 Estimation Procedure

(i) Run the random effect linear regression with an AR(1) disturbance of $y_{2it}$ on $w_{2it}$ (with time dummies) and obtain the residuals $\hat{v}_{2it}$ and $\hat{v}_{2i}$. Obtain $\hat{C}_T$ and transform $\hat{v}_{2it}$ and $\hat{v}_{2i}$ into $\hat{v}^*_{2it}$ and $\hat{v}^*_{2i}$ based on the above transformation procedure.

(ii) Use the random effect Tobit of $y_{1it}$ on $w^*_{it}$ and get all the estimates of interest, $\hat{\lambda}^*$, where $w^*_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, \hat{v}^*_{2i}, \hat{v}^*_{2it})$.

2.5.2 Average Partial Effects

As the errors in the first stage are serially correlated, we also need to adjust the estimates of the APEs. Instead of equation (2.18), we start with:

\[ E\left[m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v^*_{2i}\theta_4 + \theta_1 v^*_{2it} + a^*_{1i},\; \sigma^{*2}_{e_1})\right], \quad t = 1, \ldots, T, \tag{2.39} \]

and, following the same discussion as in the case with no serial correlation, we can obtain APEs with respect to $y_{1,t-1}$, $y_{2t}$, and $x_t$ by taking derivatives or differences of:

\[ N^{-1}\sum_{i=1}^{N} m(w_{1t}\hat{\lambda}_1 + \hat{\theta}_2 y_{1i0} + z_i\hat{\theta}_3 + \hat{v}^*_{2i}\hat{\theta}_4 + \hat{\theta}_1\hat{v}^*_{2it},\; \hat{\sigma}^{*2}_{a_1} + \hat{\sigma}^{*2}_{e_1}). \tag{2.40} \]

If the null hypothesis of no endogeneity and no serial correlation in the first stage is rejected, the standard errors in the second stage should be adjusted for the first-stage estimation by using the delta method or bootstrapping. In addition, we need to obtain asymptotic standard errors for the APEs. Appendix E shows how to obtain adjusted standard errors in the second stage and asymptotic standard errors for the APEs using the delta method.

2.5.3 Comparison

We compare the method proposed in the previous section with the traditional linear model and the model without serial correlation correction.

1. Linear Dynamic Model with an endogenous explanatory variable

We estimate model (2.1) using a system generalized method of moments (GMM) approach (Arellano & Bover (1995)) with both level and differenced instruments.

2. Correlated Random Effect model (without serial correlation correction)

We estimate model (2.1) with a correlated random effect model:

\[ y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}, \quad t = 1, \ldots, T. \]

Estimation Procedure:

(i) Estimate the reduced form for $y_{2it}$ using pooled OLS of $y_{2it}$ on $z_{it}$, $\bar{z}_i$ and time dummies. Obtain the residuals, $\hat{v}_{2i}$ and $\hat{v}_{2it}$.

(ii) Use the random effect Tobit of $y_{1it}$ on $w_{it}$ and get all the estimates of interest, $\hat{\lambda}$, where $w_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, \hat{v}_{2i}, \hat{v}_{2it})$, using the notation introduced in the previous section.

Using Mundlak's simpler version of Chamberlain's device (1980), we can use $\bar{v}_{2i}$ instead of $v_{2i}$ in the estimating equations above, since Mundlak's specification conserves degrees of freedom, which is especially important when $T$ is large in a dynamic model.
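To make concrete what the random effects Tobit step maximizes, the following sketch evaluates one individual's likelihood contribution of the form (2.15) or (2.37). It is only an illustration under stated assumptions: the integral over the unobserved effect is approximated with plain (non-adaptive) Gauss-Hermite quadrature, which is a choice made for this example and not necessarily what packaged random effects Tobit routines do, and the function names, argument names and numerical values are hypothetical. The argument `index` holds the linear index for each period excluding the unobserved effect.

```python
import math

from numpy.polynomial.hermite import hermgauss


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tobit_density(y, index, sigma_e):
    """Per-period Tobit density f_t of (2.13) for a corner at zero."""
    if y > 0:
        return norm_pdf((y - index) / sigma_e) / sigma_e
    return 1.0 - norm_cdf(index / sigma_e)


def re_tobit_likelihood(y, index, sigma_a, sigma_e, n_nodes=40):
    """Integrate the product of per-period densities over a ~ Normal(0, sigma_a^2)."""
    nodes, weights = hermgauss(n_nodes)      # rule for the weight exp(-x^2)
    total = 0.0
    for x_k, w_k in zip(nodes, weights):
        a = math.sqrt(2.0) * sigma_a * x_k   # change of variables a = sqrt(2)*sigma_a*x
        prod = 1.0
        for y_t, g_t in zip(y, index):
            prod *= tobit_density(y_t, g_t + a, sigma_e)
        total += w_k * prod
    return total / math.sqrt(math.pi)


# Hypothetical three-period history with one corner observation.
lik = re_tobit_likelihood([0.0, 1.2, 0.7], [0.1, 0.5, 0.6], sigma_a=0.8, sigma_e=1.0)
```

With a single period the integral has a closed form, because the sum of the unobserved effect and the idiosyncratic error is normal with variance $\sigma^2_{a_1} + \sigma^2_{e_1}$; this gives a simple way to check the quadrature.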
2.6 Empirical Example

The estimation procedure described above can be used in many applications. Here we apply it to analyze US female labor supply. In a panel data study, working hours exhibit dynamic behavior, and the persistence may be contaminated by heterogeneity, endogeneity and serial correlation. According to Heckman and MaCurdy's labor supply model, a censored model is appropriate. The challenge is to devise an econometric method for estimating a dynamic censored model with an endogenous variable besides the lagged dependent variable; this method was developed in the previous sections.

Endogeneity of experience is a potential problem, and it has two sources. First, experience is correlated with ability. Second, experience is constructed from working hours, so exogenous shocks to working hours in the past (through wages) are correlated with the number of years of experience we observe today. Therefore, experience is not viewed as strictly exogenous after conditioning on unobserved heterogeneity.

Controlling for unobserved effects does not guarantee unbiased estimates of state dependence, for two reasons. First, women with high average lifetime hours of work, and thus high xit, may have become permanent workers because their early experience in the market demonstrated to them the need for continuous hours of work to build and maintain their human capital investment. The result is that "human capital acquired through work experience raises the future probability of participation" (see Heckman & Willis (1974), initial definition of state dependence). In this case, the estimated state dependence parameter is biased downward towards 0, because state dependence operates entirely through a high lifetime ci. State dependence is the coefficient on lagged hours: a positive coefficient on lagged hours implies that past hours have a positive impact on future hours.
If ci is omitted from the regression, the coefficient on lagged hours will be biased upward by omitted variable bias, and therefore the importance of state dependence will be overestimated. The second problem is that state dependence cannot be separated from serially correlated errors. In other words, if shocks to hours are correlated over time, they will be picked up by the lagged hours variable, and the estimate of state dependence will be biased. Hyslop (1999) found that transitory errors are negatively correlated over time, suggesting that failing to control for serially correlated transitory errors would lead to underestimation of state dependence.

2.6.1 Data

One application of the model introduced in the previous sections is the study of the dynamics of female labor supply. We use data from the Panel Study of Income Dynamics (PSID) for the years 1980-1992. In this study, we focus on 864 white women who were either heads of households or spouses and who were between 18 and 65 years old. Women who are self-employed, in the army, or agricultural workers are excluded. Observations with inconsistent or missing data are dropped. More specifically, a person is dropped if any of the following happened in at least one year between 1976 and 1992: self-reported age exceeded the age constructed from the year of birth by more than two years, or self-reported age was smaller than the constructed age by more than one year; the person was less than 18 or more than 65 years old; the person had missing experience; the person's age exceeded her or his experience by less than six years; the spouse's weeks of unemployment were missing; the person reported positive work hours and zero earnings; the change in years of schooling between 1976 and 1985 was negative and exceeded one year in absolute value. In cases where the reported decrease in years of schooling was one year, the minimum of the two reported values was assigned in all periods. The final sample consists of 11,232 observations.
The dependent variable in the structural equation (2.1), y1it, is female annual working hours. The vector of explanatory variables includes the lagged dependent variable (y1i,t−1), an endogenous variable, experience (y2it), and a set of exogenous variables (xit): education (measured in years of schooling), the number of children in three age categories (0-2, 3-5, and 6-17), marital status, husband's employment status, and non-wife income. Experience is constructed by taking the information about prior experience from the 1976 survey year, or from the year when the individual entered the sample for the first time, and then updating this information annually. In each year, experience was increased by one if annual work hours were 2000 or more, and by the number of hours worked divided by 2000 if annual work hours were less than 2000. Education is considered to be strictly exogenous conditional on the unobserved effect, while experience is considered endogenous. The set of instruments, zit, contains years of schooling, age and its square, an indicator of marital status, the number of children in the family in three age categories, husband's employment status, and non-wife income, together with their time averages and time dummies.

Table B.1 reports the summary statistics for all variables used in the analysis. Figure B.1 shows the distribution of women's working hours during the period 1980-1992. Around 27 percent of women did not work at the time of the survey. On average, women work 1124 hours per year, which is about 21 hours per week (including women who do not work). The next largest group, accounting for 12 percent, consists of women who work 2000 hours per year, which is equivalent to 40 hours per week. The pile-up at zero hours and at 2000 hours suggests that hours of work are sensitive to changes in the structure of both observed and unobserved individual heterogeneity.
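The experience-updating rule described above can be sketched as follows. This is a minimal illustration, not the code used to build the data: the function name and the example values are hypothetical, and the sketch assumes that each year's hours are added to the stock of experience as that year is processed, a timing detail the text does not spell out.

```python
def update_experience(prior_experience, annual_hours):
    """Accumulate experience year by year from annual work hours.

    A year with 2000 or more hours adds one year of experience; a year
    with fewer hours adds hours / 2000.
    """
    experience = float(prior_experience)
    path = []
    for hours in annual_hours:
        experience += 1.0 if hours >= 2000 else hours / 2000.0
        path.append(experience)
    return path


# Hypothetical woman with 3 years of prior experience and four survey years.
path = update_experience(3.0, [2100, 1000, 0, 2000])
```

Because current hours enter the accumulated stock, shocks to hours feed into experience, which is the mechanical source of endogeneity discussed in Section 2.6.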
Figures B.2-B.5 illustrate the relationship between women's working hours and, respectively, their experience and their number of children in each of the three age groups. All of these relationships appear to fit our prior expectations. In our sample, 2,978 women worked zero hours, as opposed to 8,053 women who worked for a wage during the year with positive hours, ranging from 2 to 5,168. Hence, annual hours is a reasonable candidate for a Tobit model.

2.6.2 Estimation and Result

We are interested in estimating the dynamic Tobit model of working hours for a woman i at time t:

\[ \mathrm{Hours}_{it} = \max(0,\; \rho\,\mathrm{Hours}_{i,t-1} + \alpha\,\mathrm{Experience}_{it} + x_{it}\beta + c_{1i} + u_{1it}), \tag{2.41} \]

where Hours_it is annual working hours for woman i at time t, determined by her annual working hours in the previous period, Hours_i,t−1, her experience, Experience_it, and a vector of her characteristics including age, education, number of children, marital status and her husband's characteristics. The lagged dependent variable is included to capture the dynamic feature of working hours, in the sense that current working hours may depend on past working hours, all else held constant. This dependence arises from factors such as the accumulation of skills derived from past work. From this model, we are interested in estimating the coefficients on Hours_i,t−1 and Experience_it. The coefficient on Hours_i,t−1 will shed light on the persistence of US female labor supply over the period 1980-1992.

As women's experience is considered endogenous in this model, we instrument the endogenous regressor with age and its square, because there is a positive and significant correlation between experience and age, and age is strictly exogenous in the structural equation. The first-stage regression estimates and their statistics are reported in Table B.2. The instruments are jointly significant for experience, with an F-statistic of 196.26. We first test for the endogeneity of experience using the Hausman (1978) test.
Because experience and working hours are simultaneously determined, the exogeneity assumption for experience has to be tested. A test for endogeneity of y2it can be obtained by adding the first-stage residuals to the second-stage estimation and obtaining the t-statistic on v∗2it. Table B.3 shows that v2it and v∗2it are significant, which suggests that the hypothesis of exogenous experience is rejected.

Table B.4 reports estimation results (average partial effects) using the correlated random effect approach with and without serial correlation correction. The estimation result for the dynamic linear model using the GMM method is also shown for comparison. Since the results in Table B.4 are consistent with those in Table B.3, we focus the discussion on the results in Table B.3. In all models (columns (1)-(3)), the coefficients on lagged working hours are significant and positive, suggesting positive state dependence of labor supply for women. The positive sign on lagged working hours shows that women are likely to continue to work if they are already working, or to remain out of work if they do not work. The decline in the value of the coefficient on lagged working hours from model (1) to model (3) reflects the upward bias in the state dependence of women's working hours that arises when censoring, unobserved heterogeneity and serial correlation of unobserved factors are not taken into account. Unobserved heterogeneity that is correlated with women's characteristics contributes the most to this upward bias, followed by ignoring the pile-up at zero working hours and, last, the serial correlation of unobserved factors.

In these models, from column (1) to column (3), experience in general has a positive influence on working hours. The magnitude is larger when we control for serial correlation. This shows that if women work continuously and accumulate a substantial amount of experience, then the more experience they have, the more hours they work.
Compared to columns (1) and (2), in column (3) we control for an extra source of serial correlation in experience (the transitory shock) besides unobserved heterogeneity (the permanent shock). The coefficient on experience is considerably larger and its standard error is smaller. The intuition is as follows. Consider a positive (transitory) shock to experience. With a high degree of positive serial correlation and a rise in experience in the first period, experience will continue to rise in the next period and become very large over a long time period. This explains the higher coefficient on experience compared to those in (1) and (2). Even though the serial correlation correction in (3) is made in the first stage, the CRESC estimates are more efficient than the CRE estimates, and the standard error on experience is smaller than those in (1) and (2).

Other explanatory variables (for example, number of children) might be affected by using CRESC. When a lagged dependent variable, which is a proxy for the dependent variable, enters the equation and the endogenous variable is serially correlated, the higher effect of experience may pick up some of the effects of unmeasured variables as well as observed covariates. As a result, the coefficients on the lagged dependent variable and children (as well as mother's education) are reduced, and the significance of children might change. Even though a linear model does not require any assumption about the serial correlation of experience, the coefficient on children is more appealing in the nonlinear model where we use CRESC. We also note that children is allowed to be correlated with heterogeneity but not with the shocks to labor supply, so this assumption does not conflict with the endogeneity assumption in Chapter 1, where we deal with the cross section and allow correlation between children and heterogeneous preferences.
In this chapter, we treat children as exogenous with respect to shocks rather than with respect to heterogeneity. The coefficients on children aged 0-2 and 3-5 also indicate that small children have statistically significant negative effects on mothers' working hours. The results provide evidence that children aged 6 to 17 do not negatively affect women's working hours, and the corresponding statistics are not significant in models (2) and (3).

The initial value of working hours illustrates the correlation between the unobserved effect and the initial condition. The coefficient on the initial value of working hours is statistically significant in both models (2) and (3). It suggests strong state dependence of labor supply for women over a long period.

2.7 Conclusion

In this chapter, an attractive and easy-to-compute method for estimating dynamic Tobit panel data models with endogenous regressors (besides the lagged dependent variable) is proposed. This approach requires less computational effort than Heckman's technique and readily yields APEs. It also has several advantages; for example, we can choose a flexible conditional distribution for the initial condition instead of an approximation that leads to computational difficulty. As a consequence, estimates are readily computed and partial effects can be easily determined. In addition, the control function approach is used to control for the endogeneity that does not come from the lagged dependent variable. This approach allows for correlation between the unobserved effect and the regressors, as well as between the regressors and the structural error. To handle the presence of heterogeneity that causes serial correlation, a correction procedure is added and a serially uncorrelated residual is obtained in the first stage. The method discussed in this chapter provides a useful tool for applied economic research.
The method can be applied to various economic problems, such as the estimation of labor supply models, housing expenditure models, or children's educational expenditure models.

The proposed estimation procedure is applied to Panel Study of Income Dynamics data from 1980 to 1992. Based on the estimation results, I find strong evidence of persistence in the working hours of US white women after controlling for censoring, endogeneity and serial correlation. I also find that the initial condition of female labor supply is statistically significant and has a positive impact on women's working history. This suggests that the current labor supply of US women is affected by their past labor supply and by their initial condition of labor supply.

Chapter 3

AN EXPONENTIAL TYPE II TOBIT PANEL DATA MODEL WITH BINARY ENDOGENOUS REGRESSOR - APPLICATION TO ESTIMATING THE EFFECT OF FERTILITY ON MOTHERS' LABOR FORCE PARTICIPATION AND LABOR SUPPLY

3.1 Introduction

There has been growing interest in the estimation of nonlinear panel data models with discrete endogenous variables. Most studies focus on binary response or count models with an endogenous dummy variable. However, no method has been suggested for a panel data model with a corner solution response. Moreover, there is a correlation between the probability of a positive outcome and the outcome itself. Heterogeneity is also present in the model. Therefore, the goal of this chapter is to develop a panel data estimation method for a model with a corner solution response and a binary endogenous variable in the presence of heterogeneity and the correlation just mentioned.

Many approaches have been proposed to handle switching endogeneity in models with limited dependent variables. In a limited dependent variable panel data model, the main difficulty lies in the nonlinear functional form, because we cannot difference away the unobserved effect.
Full-information maximum likelihood can be used, but this approach is computationally intensive, which makes it unattractive. Semiparametric or nonparametric estimators are based on weaker distributional assumptions; nevertheless, these estimators give scaled index coefficients and not average partial effects. The simplest approach is 2SLS; however, this method ignores nonlinearity in both the first and second stages. It might provide a good approximation, but its two assumptions, that a binary endogenous variable is a linear function of the instruments and that a binary or censored dependent variable is a linear function of a binary endogenous variable, are unrealistic. In particular, this approach ignores the distribution of a censored variable with a large pile-up of zeros. Econometricians developed the control function approach to handle endogeneity so that nonlinearity is allowed in the second stage, but linearity is still maintained in the first stage. In this chapter, I propose a simple two-step estimator that keeps the nonlinearity assumption in both the first and second stages, and this method is more computationally attractive than the full-information maximum likelihood approach.

The model and estimation method can be used in various economic applications. For example, we can apply them to study the effects of union status on labor market outcomes, the effects of childbearing on women's labor supply, and many problems in health economics, business or epidemiology where binary endogeneity and a corner solution response occur. There are numerous studies on the effect of fertility on women's labor force participation (LFP) and labor supply. It is important to understand, in a system of related equations, how the childbearing decision affects a woman's participation in the labor force and how much she works. In this chapter, I treat the fertility decision as an endogenous dummy variable that influences both women's LFP and hours of work.
The labor supply equation is the amount equation with a corner solution response, while the LFP equation is the so-called participation equation. Using this system of equations, we can correct for both the corner solution and the endogeneity problems in the study of women's labor supply.

The contribution of this chapter is to propose a simple two-step estimator that is robust and can be easily implemented for a Tobit panel data model in the presence of discrete endogeneity and heterogeneity. The main estimation strategy is to add correction terms so that the endogeneity and corner solution biases are removed. This approach allows a joint distribution of the endogenous dummy regressor and the unobserved factors that affect both the amount and participation equations. I propose a two-step estimation method in which the first stage uses a bivariate probit model for the relationship between the dummy endogenous variable and the participation decision. For the amount equation, by using an Exponential Type II Tobit (ET2T) model (for more on Type II Tobit models, see Wooldridge (2010, chapter 17)), we can ensure that the predicted value of hours is positive, and we allow a correlation between the unobserved effects in the amount and participation equations. In addition, an exclusion restriction is used to identify the parameters in the structural equation. In other words, we allow some variables in the participation equation that are not determinants in the amount equation. Explanatory variables are permitted to be correlated with the heterogeneity. Finally, on the empirical side, the chapter contributes to the study of the effect of having a newborn on women's LFP and labor supply, taking into account their unique culture and characteristics, using Vietnamese household data from recent years.
This chapter is organized as follows. The second section reviews approaches to the estimation of a model with a binary endogenous explanatory variable and a limited dependent variable. It also discusses the literature on the effect of fertility on female labor supply and female labor force participation. The third section develops a model for Tobit panel data with a dummy endogenous regressor in the presence of correlated participation and heterogeneity. An estimation procedure is proposed and average treatment effects are obtained. The next section gives an overview of the data and the estimation results for an empirical example. The last section summarizes and concludes.

3.2 Literature Review

There have been many studies on limited dependent variable models with a dummy endogenous variable. These models were pioneered by Heckman (1978a) using joint normal distributional assumptions and the maximum likelihood (ML) method. Many other works use the conditional ML framework, such as Amemiya (1978, 1979); Newey (1986, 1987); Blundell & Smith (1989), but with different procedures: generalized least squares (GLS) estimators, minimum chi-squared estimators, or two-step estimators. The disadvantage of this canonical method is that it is hard to implement and computationally very expensive. In a panel data framework, most papers assume a reduced form for an endogenous variable or use a control function approach with generalized residuals (Vella & Verbeek (1999); Labeaga (1999)). Many studies also use this approach for cross-sectional cases (Vella (1993); Smith & Blundell (1986); Rivers & Vuong (1988)). Even though this approach produces consistent estimators, it is unrealistic to assume a linear function for a dummy variable. In order to avoid the distributional assumptions of the traditional ML framework in Heckman (1978a), some studies have proposed nonparametric or semiparametric estimators (Newey (1985); Lee (1996); Vytlacil (2002); Vytlacil & Yildiz (2007)).
However, these estimators are quite difficult to implement in the case where both the corner solution and binary endogeneity occur. Moreover, in a panel data framework, the semiparametric fixed-effect approach cannot identify average partial effects. Angrist (2001) discussed alternative methods for estimating models with dummy endogenous variables (including 2SLS, IV for an exponential conditional mean, minimum mean squared error approximation, and the quantile treatment effects approach). He prefers an IV estimation strategy (similar to Mullahy (1997) and Abadie (2000)) for nonlinear models with covariates and a nonstructural approach, since it gives similar average treatment effects. However, he did not give any evidence against using the structural approach. In this chapter, I focus on the simple two-step estimation method (Terza (1998); Kim (2006)) since our model has both corner solution and binary endogeneity problems. This method is attractive because it incorporates a correction similar to that of Heckman (1979) for sample selection. In addition, in our panel data framework, we use correlated random effects to handle heterogeneity in the presence of endogeneity and correlated participation (similar to Semykina & Wooldridge (2010)). Both IV strategies and the bivariate probit method are utilized to handle the binary endogeneity. The proposed method is applicable in many economic models, since switching endogeneity is of interest to many applied economists and policy makers. One interesting application is estimating the effect of having a newborn on women’s labor supply in the presence of a corner solution response and unobserved heterogeneity. Hence, the following part reviews the literature on the relationship between fertility and female labor supply. A remarkable number of studies have examined the effect of fertility on female labor supply and labor force participation.
These studies can be divided into four major groups, depending on how they handle the endogeneity problem of the fertility decision. The first group is represented by the studies of Gronau (1973), Heckman (1974), and Heckman & Willis (1977), who assumed exogenous fertility and established a strong negative correlation between female labor supply and fertility. Their main methodology is to use OLS to estimate the effects of fertility on labor supply. However, as Browning (1992) commented, very few credible inferences can be drawn from these studies even though they provide a number of robust correlations. A second group of studies, led by Cain & Dooley (1976), Schultz (1978), and Fleisher & Rhodes (1979), acknowledged endogenous fertility. They handled the endogeneity problem by estimating simultaneous equations models. The estimated effects of fertility are smaller when it is treated as an endogenous variable than when it is treated as an exogenous variable. The problem with this approach is that it is hard to find plausible exclusion restrictions that could identify the underlying structural parameters. A third group of studies, pioneered by the work of Nakamura & Nakamura (1992), added the lagged dependent variable (i.e., hours of work) to control for unobserved heterogeneity across women. This approach has been used subsequently by a number of authors (Even (1987); Lehrer (1992)). Although adding the lagged dependent variable can help control for unobserved heterogeneity, it still does not address the problem of the endogeneity of the fertility decision. Last but not least, a fourth group of studies solved the endogeneity problem of fertility by exploiting exogenous sources of variation in family size. Rosenzweig & Wolpin (1980) first used this strategy by comparing the labor supply of women who had twins at their first birth with that of women who had a single child. Then Bronars & Grogger (2001) and Jacobsen et al. (1999) used the same strategy but managed to obtain more precise estimates.
Other studies (Bloom et al. (2009); Kim & Aassve (2006)) exploit abortion legislation or the contraceptive choice of couples as an IV for fertility. In the same spirit as the twins studies mentioned above, Angrist & Evans (1998) estimated the effect of a third or higher order child on female labor supply by exploiting the fact that parents typically prefer mixed-sex siblings. For a sample of couples with at least two children, they instrumented further childbearing (i.e., having more than two children) with a dummy variable for whether the sex of the second child matched the sex of the first. Because sex mix is virtually random, this strategy allows for identification of the effect of a third or higher order child. Nguyen (2010) emphasized the significant negative impact of the number of children on female labor supply. The paper found diminishing impacts of having children on female labor supply, with the first child always having the largest adverse effect on a mother’s labor supply. This implies that children do not have equal impacts on a mother’s labor supply. This finding is similar to the idea from Browning (1992) that having a newborn has a more significant impact on a mother’s labor supply than the total number of children. However, the paper does not view the problem in terms of a two-part model that acknowledges the fact that people who decide to work will have positive working hours. This chapter considers female labor force participation jointly with female labor supply, and the impact of having a newborn on both a mother’s amount and participation decisions, which again calls for treating childbearing as a discrete endogenous variable.
3.3 Model and Estimation

I consider a panel data model with a corner solution response and a binary endogenous variable in the presence of a correlated participation decision and heterogeneity, as follows:

$$y_{1it} = y_{2it}\exp(X_{1it}\beta_1 + y_{3it}\alpha_1 + c_{1i} + u_{1it}), \tag{3.1}$$

or

$$\log(y_{1it}) = X_{1it}\beta_1 + y_{3it}\alpha_1 + c_{1i} + u_{1it} \quad \text{if } y_{1it} > 0 \text{ or } y_{2it} = 1 \ (\text{iff } y^*_{2it} > 0),$$

$$y^*_{2it} = X_{2it}\beta_2 + y_{3it}\alpha_2 + c_{2i} + u_{2it}, \tag{3.2}$$

$$y^*_{3it} = X_{3it}\beta_3 + c_{3i} + u_{3it}, \tag{3.3}$$

$$y_{2it} = 1[y^*_{2it} > 0], \tag{3.4}$$

$$y_{3it} = 1[y^*_{3it} > 0], \tag{3.5}$$

where $y_{1it}$ is continuous with strictly positive values when $y^*_{2it} > 0$ and equal to zero when $y^*_{2it} < 0$ with positive probability; hereafter $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. We assume that we observe $y_{1it}$ only when $y^*_{2it} > 0$ or $y_{2it} = 1$. $X_{mit}$ are $1 \times K_m$ vectors of exogenous explanatory variables (for $m = 1, 2, 3$) which can contain a constant term. $\beta_m$ are $K_m \times 1$ vectors of parameters. $c_{mi}$ are time-constant unobserved heterogeneity and $u_{mit}$ are idiosyncratic errors. $\alpha_1$ and $\alpha_2$ are scalar parameters. $1[\cdot]$ is an indicator function which has a value of one when the expression inside the bracket is true and a value of zero otherwise. Both $y_{2it}$ and $y_{3it}$ are dummy variables. We assume a balanced panel for simplicity, so $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$ throughout this chapter.

A novel feature of this panel data model is that the common endogenous variable $y_{3it}$ appears in both the amount and participation equations, (3.1) and (3.2). We therefore need to handle both endogeneity and the corner solution problem in equation (3.1). Following the work of Heckman et al. (1999) and Heckman (1979), $\alpha_1$ and $\alpha_2$ are identified if $X_{3i}$ includes at least one variable which is excluded from $X_{2i}$ or $X_{1i}$, under the correct assumption on the joint distribution of the error terms. Such variables are usually referred to as instrumental variables. $X_{2i}$ should include at least one variable which is not in $X_{1i}$.
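To make the structure of (3.1) to (3.5) concrete, the following is a minimal simulation sketch in Python. It is not the design used in this chapter: every coefficient value, the single regressor `x`, the participation-only variable `w`, the instrument `z`, and the function name `simulate_panel` are hypothetical choices made for illustration, and the heterogeneity terms are drawn independently of the regressors for simplicity.

```python
import math
import random


def simulate_panel(n_units=200, n_periods=3, alpha1=-0.3, alpha2=-0.5,
                   rho=-0.2, seed=0):
    """Simulate (i, t, y1, y2, y3) from a stylized version of (3.1)-(3.5).

    All numeric values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        # Time-constant heterogeneity c_1i, c_2i, c_3i.
        c1, c2, c3 = (rng.gauss(0.0, 0.5) for _ in range(3))
        for t in range(n_periods):
            x = rng.gauss(0.0, 1.0)  # regressor appearing in all equations
            w = rng.gauss(0.0, 1.0)  # in X2 but excluded from X1
            z = rng.gauss(0.0, 1.0)  # instrument: in X3 only
            u3 = rng.gauss(0.0, 1.0)
            # u2 is correlated with u3 (binary endogeneity).
            u2 = rho * u3 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            # u1 is correlated with u2 (correlated participation).
            u1 = 0.4 * u2 + rng.gauss(0.0, 0.5)
            y3 = 1 if 0.2 * x + 0.8 * z + c3 + u3 > 0 else 0       # (3.3), (3.5)
            y2 = 1 if (0.5 + 0.3 * x + 0.4 * w + alpha2 * y3
                       + c2 + u2) > 0 else 0                       # (3.2), (3.4)
            y1 = y2 * math.exp(1.0 + 0.1 * x + alpha1 * y3 + c1 + u1)  # (3.1)
            rows.append((i, t, y1, y2, y3))
    return rows
```

By construction the outcome is strictly positive exactly when the participation indicator equals one, and zero otherwise, which is the corner solution feature the model is meant to capture.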
Those instrumental variables are assumed strictly exogenous conditional on unobserved heterogeneity. With that in mind, and using the modeling device in Mundlak (1978), we can model the relationship between the unobserved effects $c_{mi}$ and $X_{mit}$ for each $m$. Let us rewrite equations (3.1)-(3.3) as follows:

$$y_{1it} = y_{2it}\exp(X_{1it}\beta_1 + y_{3it}\alpha_1 + c_{1i} + u_{1it}), \tag{3.6}$$

$$y^*_{2it} = X_{1it}\beta_{21} + X_{22it}\beta_{22} + y_{3it}\alpha_2 + c_{2i} + u_{2it}, \tag{3.7}$$

$$y^*_{3it} = X_{1it}\beta_{31} + X_{32it}\beta_{32} + c_{3i} + u_{3it}, \tag{3.8}$$

where $X_{32it}$ and $X_{22it}$ are instrumental variables. Now we assume that:

$$c_{mi} = \bar{Z}_i\delta_m + a_{mi}, \quad m = 1, 2, 3, \tag{3.9}$$

where $a_{mi}|X_{mi} \sim \text{Normal}(0, \sigma^2_{a_m})$; $\bar{Z}_i = T^{-1}\sum_{t=1}^{T} Z_{it}$; and $Z_{it}$ contains the explanatory variables $X_{1it}$, $X_{32it}$, and $X_{22it}$. $\bar{Z}_i$ is a $1 \times L$ vector where $L = K_2 + K_3 - K_1$. We can now rewrite equations (3.1)-(3.3) as:

$$\log(y_{1it}) = W_{1it}\gamma_1 + y_{3it}\alpha_1 + v_{1it} \quad \text{if } y_{1it} > 0 \text{ or } y_{2it} = 1; \quad W_{1it} \equiv (X_{1it}, \bar{Z}_i), \tag{3.10}$$

$$y^*_{2it} = W_{2it}\gamma_2 + y_{3it}\alpha_2 + v_{2it}; \quad W_{2it} = (X_{2it}, \bar{Z}_i), \tag{3.11}$$

$$y^*_{3it} = W_{3it}\gamma_3 + v_{3it}; \quad W_{3it} = (X_{3it}, \bar{Z}_i), \tag{3.12}$$

where $v_{mit} = a_{mi} + u_{mit}$; $m = 1, 2, 3$.

As discussed in Wooldridge (2010, Section 17.6.3), we model the corner solution using the ET2T model. First, this ensures that the predicted value of the response variable is positive. Second, it allows a correlation between the unobserved factors that affect the amount equation and the unobserved factors that affect the participation equation; that is, $v_{1it}$ and $v_{2it}$ are correlated. This relaxes the assumption of the usual lognormal hurdle model. Moreover, it is a reasonable assumption in empirical work. For example, in a model of married women’s labor supply, unobserved factors can influence both women’s LFP and labor supply, or the unobserved effects determining the two decisions can be related. Therefore, we can assume that:

$$E(v_{1it}|W_{1i}, v_{2it}) = \eta v_{2it}, \tag{3.13}$$

in addition to $\text{Var}(v_{2it}) = 1$; $\text{Var}(v_{1it}) = \sigma^2$; $\text{Cov}(v_{1it}, v_{2it}) = \psi\sigma = \eta$, where $\psi$ is the correlation between $v_{1it}$ and $v_{2it}$.
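The Mundlak device in (3.9) amounts to computing unit-level time averages and appending them to each period's regressors, as in (3.10) to (3.12). A minimal sketch in Python follows; the data layout (a dictionary mapping each unit to its list of per-period regressor rows) and the function names `time_averages` and `augment` are assumptions made for illustration only.

```python
def time_averages(panel):
    """Return Z-bar_i for each unit.

    panel: dict mapping unit id -> list of T rows, each row a list of floats
    (the elements of Z_it). Assumes a balanced panel within each unit.
    """
    zbar = {}
    for unit, rows in panel.items():
        n_periods = len(rows)
        # zip(*rows) iterates over columns, i.e. one regressor across time.
        zbar[unit] = [sum(col) / n_periods for col in zip(*rows)]
    return zbar


def augment(panel):
    """Build W_it = (Z_it, Z-bar_i) by appending the time averages to each row."""
    zbar = time_averages(panel)
    return {unit: [row + zbar[unit] for row in rows]
            for unit, rows in panel.items()}


# Hypothetical two-unit, two-period example with two regressors.
example = {1: [[1.0, 2.0], [3.0, 4.0]], 2: [[0.0, 1.0], [0.0, 3.0]]}
augmented = augment(example)
```

Because the averages are constant within a unit, they absorb the part of the heterogeneity that is correlated with the regressors, leaving $a_{mi}$ as the remaining unit-level component.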
We are interested in deriving $E(\log(y_{1it})|W_i, y_{3it}, y^*_{2it} > 0)$:

$$E(\log(y_{1it})|W_i, y_{3it}, y^*_{2it} > 0) = W_{1it}\gamma_1 + y_{3it}\alpha_1 + E(v_{1it}|W_i, y_{3it}, y^*_{2it} > 0). \tag{3.14}$$

Now,

$$E(v_{1it}|W_i, y_{3it}, y^*_{2it} > 0) = y_{3it}E(v_{1it}|W_i, y^*_{3it} > 0, y^*_{2it} > 0) + (1 - y_{3it})E(v_{1it}|W_i, y^*_{3it} < 0, y^*_{2it} > 0),$$

or

$$E(v_{1it}|W_i, y_{3it}, y^*_{2it} > 0) = y_{3it}E_1 + (1 - y_{3it})E_0.$$

We will derive $E_1$ first and apply a similar strategy for $E_0$:

$$E_1 = E(v_{1it}|W_i, y^*_{3it} > 0, y^*_{2it} > 0) = \eta_1 E(v_{2it}|W_i, y_{3it} = 1, y_{2it} = 1), \tag{3.15}$$

$$E_1 = \eta_{12}E_{12} + \eta_{13}E_{13}, \tag{3.16}$$

where

$$E_{12} = \phi(W_{3it}\gamma_3)\,\Phi(W_{2it}\gamma_2 + \alpha_2 - \rho W_{3it}\gamma_3)(1 - \rho^2)^{-1/2}\,\Phi_2^{-1}(W_{3it}\gamma_3, W_{2it}\gamma_2 + \alpha_2; \rho), \tag{3.17}$$

and

$$E_{13} = \phi(W_{2it}\gamma_2 + \alpha_2)\,\Phi(W_{3it}\gamma_3 - \rho(W_{2it}\gamma_2 + \alpha_2))(1 - \rho^2)^{-1/2}\,\Phi_2^{-1}(W_{3it}\gamma_3, W_{2it}\gamma_2 + \alpha_2; \rho), \tag{3.18}$$

under the assumption that $\text{Cov}(v_{2it}, v_{3it}) = \rho$ and $\text{Var}(v_{3it}) = 1$; we can also write $v_{2it} = \rho v_{3it} + e_{it}$, where $e_{it}|Z_i, v_{3it} \sim \text{Normal}(0, \sigma^2_e)$. Similarly,

$$E_0 = \eta_{02}E_{02} + \eta_{03}E_{03}, \tag{3.19}$$

where

$$E_{02} = \phi(-W_{3it}\gamma_3)\,\Phi(W_{2it}\gamma_2 - \rho W_{3it}\gamma_3)(1 - \rho^2)^{-1/2}\,\Phi_2^{-1}(-W_{3it}\gamma_3, W_{2it}\gamma_2; -\rho), \tag{3.20}$$

and

$$E_{03} = \phi(W_{2it}\gamma_2)\,\Phi(-W_{3it}\gamma_3 + \rho W_{2it}\gamma_2)(1 - \rho^2)^{-1/2}\,\Phi_2^{-1}(-W_{3it}\gamma_3, W_{2it}\gamma_2; -\rho). \tag{3.21}$$

Using these two regimes of $y_{3it}$ and correlated participation, we can handle both the endogenous switching and corner solution problems. Instead of proceeding with full-information maximum likelihood, we estimate the parameters of interest using a two-step estimation procedure based on Heckman’s idea of correcting a selection problem using correction functions (the inverse Mills ratio in Heckman’s model). For each regime, corresponding to either $y_{3it} = 0$ or $y_{3it} = 1$, we add two correction terms: one for fixing the endogeneity problem and the other for correcting the bias from correlated unobserved effects in the participation equation.
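The correction terms for the regime $y_{3it} = 1$, $y_{2it} = 1$ can be checked numerically against the first moment of a truncated standard bivariate normal. The Python sketch below is illustrative only and rests on two assumptions that should be stated plainly: the scale factor $(1-\rho^2)^{-1/2}$ is taken to multiply the argument inside $\Phi(\cdot)$, as in the standard textbook truncated-moment formula, and the bivariate normal CDF is approximated by a simple Simpson rule rather than a library routine. The names `h` (for $W_{3it}\gamma_3$), `k` (for $W_{2it}\gamma_2 + \alpha_2$), and all function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * step)
    return total * step / 3.0


def Phi2(h, k, rho):
    """P(V3 <= h, V2 <= k) for a standard bivariate normal with correlation rho."""
    lower = -10.0  # effectively minus infinity for a standard normal
    if h <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    return simpson(lambda z: phi(z) * Phi((k - rho * z) / s), lower, h)


def correction_terms_regime1(h, k, rho):
    """Textbook analogues of E12 and E13 for the regime y3 = 1, y2 = 1."""
    s = math.sqrt(1.0 - rho * rho)
    p11 = Phi2(h, k, rho)
    e12 = phi(h) * Phi((k - rho * h) / s) / p11
    e13 = phi(k) * Phi((h - rho * k) / s) / p11
    return e12, e13


def mean_v3_numeric(h, k, rho):
    """E[v3 | v3 > -h, v2 > -k] by direct numerical integration."""
    s = math.sqrt(1.0 - rho * rho)
    num = simpson(lambda z: z * phi(z) * Phi((k + rho * z) / s), -h, 10.0)
    return num / Phi2(h, k, rho)
```

Under these assumptions the truncated mean of $v_{3it}$ equals `e12 + rho * e13`, and by symmetry that of $v_{2it}$ equals `e13 + rho * e12`, so any conditional mean that is linear in the two errors is a linear combination of the two terms. This is why the coefficients $\eta_{12}$ and $\eta_{13}$ can be left free in the second stage.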
The conditional mean of interest with a positive outcome, $y_{2it} = 1$, is:

$$E(\log(y_{1it})|W_i, y_{3it}, y^*_{2it} > 0) = W_{1it}\gamma_1 + y_{3it}\alpha_1 + \eta_{12}y_{3it}E_{12}(\theta_1) + \eta_{13}y_{3it}E_{13}(\theta_1) + (1 - y_{3it})\eta_{02}E_{02}(\theta_1) + \eta_{03}(1 - y_{3it})E_{03}(\theta_1), \tag{3.22}$$

where $W_i = W_{1i} \cup W_{2i} \cup W_{3i}$ and the four correction terms $E_{12}$, $E_{13}$, $E_{02}$, and $E_{03}$ are as stated above. We can identify $\theta_1 = (\alpha_2, \gamma_3, \gamma_2, \rho)'$, a $Q \times 1$ vector ($Q = 1 + K_2 + K_3 + 2L + T$), using maximum likelihood estimation of a pooled bivariate probit model in the first stage. Similar to Heckman (1978a), Greene (1997), and Carrasco (2001), the log-likelihood function in the first stage that is maximized to obtain estimates of $\theta_1$ is:

$$\ln L_{it}(\theta_1) = (1 - y_{3it})(1 - y_{2it})\ln P_{00} + y_{3it}(1 - y_{2it})\ln P_{10} + y_{2it}(1 - y_{3it})\ln P_{01} + y_{3it}y_{2it}\ln P_{11}, \tag{3.23}$$

where

$$P_{11} = \Pr(y_{3it} = 1 \text{ and } y_{2it} = 1) = \Phi_2(W_{3it}\gamma_3, W_{2it}\gamma_2 + \alpha_2; \rho), \tag{3.24}$$

$$P_{00} = \Pr(y_{3it} = 0 \text{ and } y_{2it} = 0) = \Phi_2(-W_{3it}\gamma_3, -W_{2it}\gamma_2; \rho), \tag{3.25}$$

$$P_{10} = \Pr(y_{3it} = 1 \text{ and } y_{2it} = 0) = \Phi_2(W_{3it}\gamma_3, -W_{2it}\gamma_2 - \alpha_2; -\rho), \tag{3.26}$$

$$P_{01} = \Pr(y_{3it} = 0 \text{ and } y_{2it} = 1) = \Phi_2(-W_{3it}\gamma_3, W_{2it}\gamma_2; -\rho). \tag{3.27}$$

We can estimate the parameters in the first stage to obtain $\hat{\theta}_1$ (and its standard errors, as shown in the first-stage technicalities of Appendix F) and compute the four correction terms in equation (3.22) to plug into the second stage. In the second stage, we estimate the following equation by POLS on the selected sample with $y_{2it} = 1$, that is, with a positive dependent variable:

$$\log(y_{1it}) = W_{1it}\gamma_1 + y_{3it}\alpha_1 + \eta_{12}y_{3it}E_{12}(\hat{\theta}_1) + \eta_{13}y_{3it}E_{13}(\hat{\theta}_1) + (1 - y_{3it})\eta_{02}E_{02}(\hat{\theta}_1) + \eta_{03}(1 - y_{3it})E_{03}(\hat{\theta}_1) + \varepsilon_{it}. \tag{3.28}$$

Hence, in the same spirit as adding the inverse Mills ratio to correct for sample selection bias, we add four correction terms to control for a corner solution problem with a correlated participation decision and binary endogeneity.
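The four cell probabilities in (3.24) to (3.27) and the log-likelihood contribution in (3.23) can be sketched in a few lines of Python. This is an illustration rather than the estimation code used in this chapter: the bivariate normal CDF is approximated by a Simpson rule, the scalar arguments `w3g3` and `w2g2` stand for the indices $W_{3it}\gamma_3$ and $W_{2it}\gamma_2$, and all function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Phi2(h, k, rho, n=2000):
    """P(V3 <= h, V2 <= k), standard bivariate normal, via Simpson's rule."""
    lower = -10.0  # effectively minus infinity
    if h <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    step = (h - lower) / n

    def integrand(z):
        return phi(z) * Phi((k - rho * z) / s)

    total = integrand(lower) + integrand(h)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * integrand(lower + j * step)
    return total * step / 3.0


def cell_probs(w3g3, w2g2, alpha2, rho):
    """Cell probabilities keyed by (y3, y2), following (3.24)-(3.27)."""
    return {
        (1, 1): Phi2(w3g3, w2g2 + alpha2, rho),     # P11
        (0, 0): Phi2(-w3g3, -w2g2, rho),            # P00
        (1, 0): Phi2(w3g3, -w2g2 - alpha2, -rho),   # P10
        (0, 1): Phi2(-w3g3, w2g2, -rho),            # P01
    }


def loglik_it(y3, y2, w3g3, w2g2, alpha2, rho):
    """Contribution (3.23): exactly one indicator product equals one."""
    return math.log(cell_probs(w3g3, w2g2, alpha2, rho)[(y3, y2)])
```

A quick sanity check on any implementation is that the four probabilities sum to one for every observation; summing `loglik_it` over all $(i, t)$ gives the pooled objective to be maximized over $\theta_1$.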
We can rewrite the estimating equation above as:

$$\log(y_{1it}) = W_{1it}\gamma_1 + y_{3it}\alpha_1 + \sum_{j=1}^{4}\eta_j\hat{\lambda}_{itj} + \varepsilon_{it}, \tag{3.29}$$

where $\hat{\lambda}_{it1} = y_{3it}E_{12}(\hat{\theta}_1)$; $\hat{\lambda}_{it2} = y_{3it}E_{13}(\hat{\theta}_1)$; $\hat{\lambda}_{it3} = (1 - y_{3it})E_{02}(\hat{\theta}_1)$; and $\hat{\lambda}_{it4} = (1 - y_{3it})E_{03}(\hat{\theta}_1)$. Even though the two-step estimator is easy to implement and numerically robust, we need to adjust the second-stage standard errors to take the first-stage estimation into account. I show how to obtain $\hat{\theta}_1$ and derive the asymptotic variance of this two-step estimator ($\hat{\theta}_2$) in the technical section of Appendix F.

3.4 Average Partial Effect

The quantity of interest in this study is the average treatment effect (ATE) of the binary endogenous variable. We can also obtain average partial effects (APEs) for the exogenous explanatory variables. First we rewrite model (3.1) to (3.3) in conditional mean form as follows:

$$E(\log(y_{1it})|X_{1it}, y_{3it}, c_{1i}, u_{1it}) = X_{1it}\beta_1 + y_{3it}\alpha_1 + c_{1i} + u_{1it} \quad \text{if } y_{1it} > 0, \tag{3.30}$$

$$E(y_{2it}|X_{2it}, y_{3it}, c_{2i}, u_{2it}) = \Phi(X_{2it}\beta_2 + y_{3it}\alpha_2 + c_{2i} + u_{2it}), \tag{3.31}$$

$$E(y_{3it}|X_{3it}, c_{3i}, u_{3it}) = \Phi(X_{3it}\beta_3 + c_{3i} + u_{3it}). \tag{3.32}$$

Our main interest lies in the treatment effect of the binary endogenous variable in equations (3.30) and (3.31). We can evaluate the effect at values of the exogenous explanatory variables of interest. But first, we need to handle the correlated unobserved effects using Mundlak’s device as shown in equation (3.9) and follow the estimation procedure described in the previous section. Equations (3.30) and (3.31) have been previously derived as:

$$E(\log(y_{1it})|Z_i, y_{3it}, y_{1it} > 0) = X_{1it}\beta_1 + y_{3it}\alpha_1 + \bar{Z}_i\delta_1 + \sum_{j=1}^{4}\eta_j\lambda_{itj}, \tag{3.33}$$

$$E(y_{2it}|Z_i, y_{3it}, a_{2i}, u_{2it}) = \Phi(X_{2it}\beta_2 + y_{3it}\alpha_2 + \bar{Z}_i\delta_2 + a_{2i} + u_{2it}).$$
(3.34)

ATE for the amount equation: For $y_{3t}$ as a binary variable, the ATE at time $t$ can be obtained by averaging equation (3.30) over the distribution of $c_{1i}$ and $u_{1it}$, or by taking a difference in:

$$E_{(\bar{Z}_i, \lambda_{it})}\Big[X_{1t}\beta_1 + y_{3t}\alpha_1 + \bar{Z}_i\delta_1 + \sum_{j=1}^{4}\eta_j\lambda_{itj}\Big], \tag{3.35}$$

where, in the argument of the expectation operator, variables with a subscript $i$ are random and all others are fixed. With the definitions from equations (3.17)-(3.21), plus equation (3.29), (3.35) is rewritten as:

$$E_{E_{it}}[\alpha_1 + \eta_{12}E_{12}(\theta_1) + \eta_{13}E_{13}(\theta_1) - \eta_{02}E_{02}(\theta_1) - \eta_{03}E_{03}(\theta_1)]. \tag{3.36}$$

Given consistent estimators of $\theta_1$ and $\theta_2$, the ATE of the binary variable $y_{3t}$ in equation (3.33) can be estimated as:

$$\widehat{ATE} = N^{-1}\sum_{i=1}^{N}\big(\hat{\alpha}_1 + \hat{\eta}_{12}E_{12}(\hat{\theta}_1) + \hat{\eta}_{13}E_{13}(\hat{\theta}_1) - \hat{\eta}_{02}E_{02}(\hat{\theta}_1) - \hat{\eta}_{03}E_{03}(\hat{\theta}_1)\big), \tag{3.37}$$

where for each unit we predict the difference in mean responses with and without “treatment” (for $y_{3t} = 1$ and $y_{3t} = 0$), and then average the difference in these estimated mean responses across all units.

ATE for the participation equation: We rewrite model (3.34) with scaled coefficients using a standard mixing property of the normal distribution of $e_{it}$:

$$E(y_{2it}|Z_i, y_{3it}, v_{3it}, e_{it}) = \Phi(X_{2it}\beta_2 + y_{3it}\alpha_2 + \bar{Z}_i\delta_2 + \rho v_{3it} + e_{it}), \tag{3.38}$$

or

$$E(y_{2it}|Z_i, y_{3it}, v_{3it}, e_{it}) = \Phi(X_{2it}\beta_{2e} + y_{3it}\alpha_{2e} + \bar{Z}_i\delta_{2e} + \rho_e v_{3it}), \tag{3.39}$$

where the subscript $e$ denotes division by $\sqrt{1 + \sigma^2_e}$. Note that we can write (3.38)-(3.39) in terms of a bivariate probit model, as in the technical section, and the procedure to obtain the APE or ATE is the same as described below. That is, we average out $\bar{Z}_i$ and then take derivatives or changes with respect to the elements of $(X_{2t}, y_{3t})$. The APEs are obtained by computing derivatives, or obtaining differences, in:

$$E_{(\bar{Z}_i, v_{3it})}[\Phi(X_{2t}\beta_{2e} + y_{3t}\alpha_{2e} + \bar{Z}_i\delta_{2e} + \rho_e v_{3it})], \tag{3.40}$$

or

$$E_{(\bar{Z}_i)}[\Phi(X_{2t}\beta_{2v} + y_{3t}\alpha_{2v} + \bar{Z}_i\delta_{2v})], \tag{3.41}$$

where the subscript $v$ denotes division by $\sqrt{1 + \rho_e^2\sigma^2_{v_3}}$ and $\text{Var}(v_{3it}) = \sigma^2_{v_3} = 1$.
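The sample analogues of the difference and the derivative in (3.41) are simple averages over units. The Python sketch below shows the arithmetic only; the list `index_without_y3` is assumed to hold, for each unit $i$, the already-scaled index $X_{2t}\beta_{2v} + \bar{Z}_i\delta_{2v}$ evaluated at the chosen $X_{2t}$, and the function name and argument names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def participation_effects(index_without_y3, alpha2v, beta2v_k, y3=0):
    """Return (ATE of y3, APE of the k-th continuous regressor).

    index_without_y3: per-unit values of X_2t*beta_2v + Zbar_i*delta_2v.
    alpha2v: scaled coefficient on the binary endogenous variable.
    beta2v_k: scaled coefficient on the regressor whose APE is wanted.
    y3: value of the binary variable at which the APE is evaluated.
    """
    n = len(index_without_y3)
    # Average difference in response probabilities with and without treatment.
    ate = sum(Phi(a + alpha2v) - Phi(a) for a in index_without_y3) / n
    # Average derivative of the response probability w.r.t. the k-th regressor.
    ape = beta2v_k * sum(phi(a + alpha2v * y3) for a in index_without_y3) / n
    return ate, ape
```

Averaging over the units' $\bar{Z}_i$ is what replaces integration over the unobserved heterogeneity; standard errors for these quantities would still need to account for the estimation of the scaled coefficients, which this sketch does not attempt.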
In order to obtain partial effects, we average out $\bar{Z}_i$ and then take derivatives or changes with respect to the elements of $(X_{2t}, y_{3t})$. Across the sample for a chosen $t$, we can obtain the estimator of the APE with respect to one element $X_{2t1}$ of $X_{2t}$ as:

$$\widehat{APE} = \beta_{2v1}\,N^{-1}\sum_{i=1}^{N}\phi(X_{2t}\beta_{2v} + y_{3t}\alpha_{2v} + \bar{Z}_i\delta_{2v}). \tag{3.42}$$

The estimator of the ATE with respect to $y_{3t}$, which is the quantity we are interested in, is:

$$\widehat{ATE} = N^{-1}\sum_{i=1}^{N}\big[\Phi(X_{2t}\beta_{2v} + \alpha_{2v} + \bar{Z}_i\delta_{2v}) - \Phi(X_{2t}\beta_{2v} + \bar{Z}_i\delta_{2v})\big]. \tag{3.43}$$

3.5 Empirical Example

3.5.1 Overview of Data

Over the past two decades, fertility has decreased as the labor force participation rates of women in most developing and advanced countries have increased (Kim & Aassve (2006)). This change reflects the changing roles of women and changes in the time allocation among household members in both work activities and fertility behavior. We also observe this pattern in Vietnam. Over the last two decades, the fertility rates of Vietnamese women fell while the labor force participation rates for the whole population did not change very much. The decline in fertility was also accompanied by an increase in income. During the period from 1986 to 2006, while fertility dramatically decreased, GDP per capita increased 2.9 times to 587.4 USD. This pattern is consistent with microeconomic predictions: higher income leads to a reduction in fertility, and fertility and labor force participation are inversely related (Becker & Lewis (1973); Willis (1973)). Thus, it is important to analyze the data on fertility and the labor market behavior of working women. The data used in this chapter come from the Vietnamese Household Living Standard Surveys (VHLSS) 2004, 2006, and 2008, which were conducted by the Vietnamese General Statistical Office (GSO) with technical support from the World Bank.
The survey sample was randomly selected to represent the whole country, taking into account urban and rural structures, geographical conditions, regional issues, ethnic differences, and provincial representation. The sample used in this chapter has 665 women. The survey collected information about the following: household information, education, health, employment, migration, housing, fertility and family planning, incomes, expenditures, borrowing, lending, and savings. Only households with children under 18 years old and households with a mother and father younger than 60 and 65 years of age, respectively, at the time of the interview are included in this research. There are 1,995 households in the sample used for this research. Table C.1 provides a summary of the descriptive statistics for the whole sample. The dependent variables are working status and hours worked per day for a woman (being either head of household or spouse). According to Table C.1, 95% of mothers worked in the interview year, and on average, they worked 7.8 hours per day. The explanatory variables are whether the mother has a newborn; the mother’s education, age, and non-labor income; the father’s education and age; and other household characteristics such as whether the household lives in an urban area, whether the woman works on a farm, and ethnicity. In this sample, each household had an average of 2.5 children; 10% of the sample women had newborns; 55% of women in the sample had a boy first, and for 56% the first two children had the same gender. In general, the husband’s education is higher than the wife’s education. Income from other sources for women in the sample is about 8 million VND per year (approximately 400-450 USD per year). Table C.1 also shows that around 84% of working wives worked on farms and 18% of households were located in urban areas. Table C.2 shows the summary statistics for each year in the panel data. There is no obvious pattern in women’s working hours or labor force participation (LFP).
However, we can observe that the fertility rate declines over time. The percentage of women having a newborn goes down from 16% in 2004 to 9% in 2006 and 6% in 2008. In contrast, non-wife income increases over time.

3.5.2 Estimation and Result

The main contribution of this chapter and its application is to allow for correlation between women’s decision to participate in the labor market and their working hours; to acknowledge that having a newborn is a dummy variable; and to consider the influence of having a newborn on both women’s participation and labor supply. As shown in the literature, newborns have a negative effect on women’s labor force entry. This means that women who are not working are unlikely to take part in the labor market after delivering babies. This raises the question of how newborns affect the labor supply of mothers who are participating and stay in the market. When a mother has a newborn, she decides how many hours she will work after delivering the baby. If the endogeneity of fertility is not accounted for, we will not obtain consistent estimates of labor supply conditional on fertility. In order to obtain robust and credible estimates of the effects of newborns on women’s labor supply and participation, we need to take this endogeneity into account. Another important point is that the amount and participation decisions are jointly determined, because preferences for working and for work time are positively correlated. In the same way, preferences for having a baby and for working are negatively correlated. Therefore, we should model these decisions jointly. Using the VHLSS panel data for 2004-2008, we study women’s labor supply in a system of equations in which the fertility decision is an endogenous dummy variable appearing in both the labor supply and participation equations, and in which the equations are correlated with one another.
We are interested in estimating a panel data model of working hours for a woman $i$ at time $t$, who takes having a newborn into consideration as an endogenous factor, as follows:

$$\text{Log}(Hours_{it}) = Newborn_{it}\alpha_1 + Medu_{it}\beta_{11} + Mage_{it}\beta_{12} + Magesq_{it}\beta_{13} + NMincome_{it}\beta_{14} + c_{1i} + u_{1it} \quad \text{if } Hours_{it} > 0, \tag{3.44}$$

$$LFP^*_{it} = Medu_{it}\beta_{21} + Mage_{it}\beta_{22} + Magesq_{it}\beta_{23} + Hedu_{it}\beta_{24} + Hage_{it}\beta_{25} + Hagesq_{it}\beta_{26} + NMincome_{it}\beta_{27} + Newborn_{it}\alpha_2 + c_{2i} + u_{2it}, \tag{3.45}$$

$$Newborn^*_{it} = Samesex\,\beta_{32} + Medug_{it}\beta_{33} + Mage_{it}\beta_{34} + Magesq_{it}\beta_{35} + NMincome_{it}\beta_{36} + c_{3i} + u_{3it}. \tag{3.46}$$

These equations correspond to equations (3.1) to (3.3) in the model section. $Hours_{it}$ is annual working hours for a woman $i$ at time $t$, which is determined by her education, Medu; her age, Mage; her age squared, Magesq; her other income not from her wage, NMincome; other variables such as whether she lives in an urban area, whether her ethnicity is the majority, and whether she works on a farm; and whether she has a newborn, Newborn, aged 0 to 1. A woman’s LFP is influenced by her characteristics (the same variables as in equation (3.44)), her husband’s characteristics including education, age, and age squared (Hedu, Hage, Hagesq), non-mom income, as well as whether she has a newborn. The fertility decision equation has right-hand-side variables including an instrumental variable, whether the first two children have the same gender, and other exogenous variables including the mother’s characteristics and non-mom income. We also allow some explanatory variables to be correlated with heterogeneity and account for this relationship by adding time averages of the explanatory variables to each equation. With the new procedure to control for a corner solution, we can allow the unobserved factors that affect the amount and participation equations to be correlated. In addition, to ensure that the predicted value of labor supply is positive, we need to apply the Type II Tobit model to log(hours) rather than hours.
That is why we use the specification of the Exponential Type II Tobit (ET2T) model (see Wooldridge (2010, Chapter 17) for more). In addition, the ET2T model is applicable when we have exclusion restrictions. The participation equation contains several variables that are not in the amount equation, so that the parameters in the amount equation are identified. The choice of appropriate instrumental variables is important because it can affect the reliability of estimates and inferences. Valid and strong instrumental variables must satisfy two conditions: an instrumental variable should be uncorrelated with the error term, and it should be highly correlated with the right-hand-side endogenous regressor(s). In this research, that means that the instrumental variables have no correlation with factors that directly affect parental LFP and labor supply and that the instruments are correlated with fertility. Whether the first two children have the same gender is used to generate exogenous variation in fertility in this research. Normally, the gender of a child is a random variable, and it is uncorrelated with parental LFP and labor supply. In addition, we found that the boy-to-girl ratio of the first child was 1.05 in our sample, which is close to the natural ratio. Thus, the gender of the first child is a valid instrumental variable. However, this instrumental variable is not significant in the first stage. Angrist & Evans (1998) found that parents prefer a mixed sibling-sex composition, and parents whose first two children were both girls or both boys had a higher probability of having additional children. Carrasco (2001) also found that the same-sex instrumental variable is a strong instrument in US data. In this dataset, among women with more than two children, the likelihood of another birth was 28% if they had a son, 34% if they had two sons, and 11% if they had three sons. This evidence implies that siblings of mixed genders are desirable among Vietnamese families.
The same-gender-of-the-first-two-children variable meets the two conditions required of a valid and strong instrument, and it can serve as an instrumental variable to generate exogenous variation in fertility. This variable equals 1 if the first two children have the same gender, and 0 otherwise. According to Table C.1, 55% of sampled households had a male first child and 56% of households had first two children of the same gender. A t-test is implemented to see whether the same-gender instrument is strong. The result is -3.2, implying that this instrument can be used for this study. Table C.3 shows the estimation results for the bivariate probit model in the first stage. The coefficient on samesex is positive and statistically significant, implying that samesex is a good instrument in our study. The coefficient on a newborn in the LFP equation is negative and statistically significant. Having a newborn reduces the mother’s probability of LFP by 13.6%. In terms of the average treatment effect, compared to women without newborn babies, mothers with newborns have a probability of continuing to work that is lower by 12.7%. The estimate of $\rho$, -0.165, shows that there is a negative correlation between the unobserved effects that affect fertility and women’s LFP. This adds to the evidence from empirical studies of developing countries that having an additional child negatively influences the probability that working women who have just delivered a child come back to work. Table C.4 reports the coefficient estimates from six different estimation methods. Pooled OLS (POLS) assumes that all explanatory variables are uncorrelated with unobserved heterogeneity and are also strictly exogenous. The estimates based on POLS show that having a newborn reduces the mother’s working hours by 13.4%.
The POLS estimates have the largest bias because they do not take into account the endogeneity of fertility, the presence of heterogeneity that might be correlated with the explanatory variables, or the correlation between work participation and the amount of work. Pooled 2SLS takes into account the endogeneity of a newborn but does not remove the unobserved effect. Controlling for the endogeneity of a newborn reduces the bias by 10%: having a newborn now makes a mother reduce her working hours by 23%. Fixed effects (FE) allows for correlation between the explanatory variables and unobserved heterogeneity, and FE-2SLS further allows a newborn to be correlated with the idiosyncratic errors. Columns (3) and (4) show that mothers’ working hours are reduced by 16.4% and 27.7% using FE and FE-2SLS, respectively. However, FE-2SLS ignores the correlation between women’s decision to participate and how much to work. In addition, none of the methods from (1) to (4) treats a newborn as a dummy variable. To take into account correlated participation, we can also use the Heckman-type IV correction method (see Semykina & Wooldridge (2010)); hereafter, we call this estimator SW (column 5 of Table C.4). This estimator allows for correlated participation and heterogeneity in the presence of endogeneity. However, this method ignores the binary nature of the endogenous variable and assumes a linear reduced form (using pooled 2SLS in the second stage after obtaining the inverse Mills ratio in the first stage). The result shows that mothers’ working hours are reduced by 30.8% using SW, which is more than the reduction in mothers’ working hours using FE and FE-2SLS. This suggests that correlated participation does matter. However, this decrease is still smaller than the reduction in mothers’ working hours under the newly proposed procedure, which additionally accounts for the binary nature of the endogenous variable.
The newly proposed procedure corrects for the endogeneity of a newborn, its binary nature, and its influence on both women's participation and amount of work. It also reduces another source of bias, correlated heterogeneity, by adding time averages of the explanatory variables and time dummies to all equations. However, the standard errors are larger once these corrections are accounted for. After controlling for all these sources of bias, women who are still working decrease their working hours by 34.5%. The result shows that having a new child in Vietnamese households has a negative effect on maternal hours for working women. Women give up 34.5% of their working hours to take care of a newborn or to use the forgone time as an input to home production.

3.6 Conclusion

This chapter studies a nonlinear panel data model with an endogenous dummy variable and a corner solution response. The main contribution is to allow a joint distribution of the endogenous dummy regressor and the unobserved factors that affect both the amount and participation equations. I propose a two-step estimation method in which the first stage uses a bivariate probit model for the relationship between the endogenous dummy variable and the participation decision. For the amount equation, an ET2T model ensures that the predicted value of hours is positive and allows for correlation between the unobserved effects in the amount and participation equations. In addition, exclusion restrictions are needed to identify the parameters in the amount equation. In other words, the set of explanatory variables in the participation equation contains the set of explanatory variables in the amount equation. I also allow some explanatory variables to be correlated with heterogeneity. This estimation method is applied to analyze the effect of fertility on women's working hours and labor force participation.
The proposed approach gives a statistically significant negative effect of having a newborn on a woman who is working and remains in the labor market. Having a newborn has a significant negative impact on a woman's participation in the labor force and on her working hours. The proposed estimation method substantially corrects the bias in estimating the effect of a newborn on a mother's working hours compared with the alternative estimation methods.

APPENDICES

Appendix A

TABLES FOR CHAPTER 1

Table A.1: Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.2347 | -0.1283 (0.0046) [.0034] | -0.1591 (0.0042) [.0024] | -0.2079 (0.0051) [.0008] | -0.1583 (0.0110) [.0024] |
| y2 discrete 0-1 | -0.32 | | -0.2014 (0.0046) [.0038] | -0.2763 (0.0051) [.0014] | |
| y2 discrete 1-2 | -0.1273 | | -0.161 (0.0027) [.0011] | -0.1193 (0.0017) [.0002] | |
| y2 discrete 2-3 | -0.0212 | | -0.0388 (0.0030) [.0006] | -0.0259 (0.0012) [.0001] | |
| x1 | 0.0235 | 0.0224 (0.0181) | 0.021 (0.0125) | 0.0223 (0.0130) | 0.0237 (0.0189) |
| x2 | 0.0235 | 0.0218 (0.0181) | 0.0214 (0.0128) | 0.0195 (0.0129) | 0.023 (0.0192) |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively.
Table A.1 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1754 (0.0064) [.0019] | -0.2295 (0.0077) [.0002] | -0.2368 (0.0051) [.00008] | -0.2371 (0.0050) [.00008] |
| y2 discrete 0-1 | -0.2262 (0.0082) [.0030] | -0.3109 (0.0099) [.0003] | -0.3201 (0.0041) [.00005] | -0.3204 (0.0030) [.00001] |
| y2 discrete 1-2 | -0.1716 (0.0031) [.0014] | -0.1258 (0.0023) [.00005] | -0.128 (0.0020) [.00004] | -0.1278 (0.0016) [.00001] |
| y2 discrete 2-3 | -0.0317 (0.0031) [.0003] | -0.0224 (0.0014) [.00004] | -0.0214 (0.0014) [.00001] | -0.0212 (0.0010) [.000001] |
| x1 | 0.0212 (0.0131) | 0.0231 (0.0142) | 0.024 (0.0159) | 0.0238 (0.0140) |
| x2 | 0.0218 (0.0131) | 0.0241 (0.0134) | 0.0243 (0.0153) | 0.0244 (0.0136) |

Table A.2: Simulation Result of the Coefficient Estimates (N=1000, η1 = 0.5, 500 replications)

y2 is assumed exogenous.

| | True value Coef. | Linear OLS | Tobit MLE | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 | -1 | -0.1283 (0.0044) | -0.2024 (0.0046) | -0.8543 (0.0146) |
| x1 | 0.1 | 0.0224 (0.0181) | 0.0267 (0.0160) | 0.0917 (0.0534) |
| x2 | 0.1 | 0.0218 (0.0181) | 0.0272 (0.0163) | 0.0956 (0.0534) |

Note: Figures in parentheses () are standard deviations.
Table A.2 (continued): y2 is assumed endogenous

| | Linear 2SLS | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|---|
| y2 | -0.1583 (0.0089) | -0.2275 (0.0084) | -0.9387 (0.0255) | -1.045 (0.0483) | -1.044 (0.0424) |
| x1 | 0.0237 (0.0190) | 0.0275 (0.0171) | 0.0945 (0.0578) | 0.1061 (0.0702) | 0.1052 (0.0619) |
| x2 | 0.0231 (0.0192) | 0.0282 (0.0170) | 0.0987 (0.0548) | 0.1071 (0.0681) | 0.1073 (0.0600) |

Table A.3: Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.1, 500 replications)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.2461 | -0.1507 (0.0042) [.0031] | -0.1854 (0.0037) [.0019] | -0.2402 (0.0046) [.0002] | -0.16 (0.0102) [.0027] |
| y2 discrete 0-1 | -0.3383 | | -0.239 (0.0042) [.0032] | -0.3289 (0.0031) [.0003] | |
| y2 discrete 1-2 | -0.1332 | | -0.2001 (0.0018) [.0021] | -0.1319 (0.0011) [.00004] | |
| y2 discrete 2-3 | -0.0208 | | -0.0193 (0.0029) [.00005] | -0.0219 (0.0007) [.00003] | |
| x1 | 0.0246 | 0.0267 (0.0168) | 0.021 (0.0089) | 0.025 (0.0063) | 0.0265 (0.0170) |
| x2 | 0.0246 | 0.0241 (0.0178) | 0.0214 (0.0100) | 0.0246 (0.0070) | 0.0242 (0.0182) |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively.
Table A.3 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1887 (0.0053) [.0018] | -0.2442 (0.0056) [.0001] | -0.249 (0.0044) [.0001] | -0.2491 (0.0043) [.0001] |
| y2 discrete 0-1 | -0.2445 (0.0066) [.0030] | -0.3355 (0.0051) [.00009] | -0.3384 (0.0019) [.000005] | -0.3385 (0.0020) [.000007] |
| y2 discrete 1-2 | -0.2022 (0.0025) [.0022] | -0.1331 (0.0013) [.000004] | -0.1332 (0.0008) [.000001] | -0.1332 (0.0010) [.000001] |
| y2 discrete 2-3 | -0.0177 (0.0032) [.0001] | -0.0212 (0.0008) [.00001] | -0.0208 (0.0007) [.000001] | -0.0208 (0.0007) [.000001] |
| x1 | 0.0234 (0.0090) | 0.025 (0.0065) | 0.0255 (0.0072) | 0.0253 (0.0066) |
| x2 | 0.0222 (0.0100) | 0.0246 (0.0070) | 0.0252 (0.0077) | 0.0249 (0.0072) |

Table A.4: Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.9, 500 replications)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.2178 | -0.1104 (0.0042) [.0034] | -0.1368 (0.0039) [.0026] | -0.1777 (0.0054) [.0013] | -0.1548 (0.0148) [.0020] |
| y2 discrete 0-1 | -0.2973 | | -0.1706 (0.0049) [.0040] | -0.2307 (0.0069) [.0021] | |
| y2 discrete 1-2 | -0.1281 | | -0.1303 (0.0031) [.00007] | -0.111 (0.0024) [.0005] | |
| y2 discrete 2-3 | -0.0253 | | -0.0532 (0.0022) [.0009] | -0.0319 (0.0019) [.0002] | |
| x1 | 0.0218 | 0.0327 (0.0222) | 0.0276 (0.0176) | 0.0291 (0.0182) | 0.0263 (0.0169) |
| x2 | 0.0218 | 0.0215 (0.0201) | 0.0212 (0.0170) | 0.0236 (0.0179) | 0.0244 (0.0184) |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively.
Table A.4 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1637 (0.0096) [.0017] | -0.2144 (0.0124) [.0001] | -0.2208 (0.0052) [.0001] | -0.2205 (0.0045) [.0001] |
| y2 discrete 0-1 | -0.21 (0.0136) [.0028] | -0.2871 (0.0196) [.0003] | -0.3 (0.0054) [.00008] | -0.2994 (0.0040) [.00006] |
| y2 discrete 1-2 | -0.1491 (0.0060) [.0007] | -0.1232 (0.0047) [.0002] | -0.1288 (0.0025) [.00002] | -0.1291 (0.0020) [.00003] |
| y2 discrete 2-3 | -0.0452 (0.0030) [.0006] | -0.0258 (0.0023) [.00002] | -0.0247 (0.0021) [.00002] | -0.0249 (0.0015) [.00001] |
| x1 | 0.0273 (0.0184) | 0.0305 (0.0201) | 0.0318 (0.0237) | 0.0313 (0.0208) |
| x2 | 0.0199 (0.0187) | 0.0233 (0.0206) | 0.02 (0.0216) | 0.0203 (0.0187) |

Table A.5: Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications, δ23 = 0.3)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.2402 | -0.1352 (0.0047) [.0033] | -0.1618 (0.0037) [.0025] | -0.2094 (0.0050) [.0010] | -0.1661 (0.0621) [.0023] |
| y2 discrete 0-1 | -0.3202 | | -0.1992 (0.0044) [.0038] | -0.2724 (0.0055) [.0015] | |
| y2 discrete 1-2 | -0.1275 | | -0.1605 (0.0026) [.0010] | -0.1195 (0.0016) [.0003] | |
| y2 discrete 2-3 | -0.0213 | | -0.0386 (0.0027) [.0005] | -0.0268 (0.0011) [.0002] | |
| x1 | 0.024 | 0.0224 (0.0181) | 0.0114 (0.0118) | 0.0117 (0.0130) | 0.0237 (0.0189) |
| x2 | 0.024 | 0.0218 (0.0181) | 0.0104 (0.0123) | 0.0109 (0.0129) | 0.023 (0.0192) |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively. This table presents the case of a weak IV (δ23 = 0.3).
Table A.5 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1823 (0.0363) [.0023] | -0.2327 (0.0366) [.0002] | -0.2405 (0.0046) [.00001] | -0.2407 (0.0043) [.00001] |
| y2 discrete 0-1 | -0.2301 (0.0522) [.0018] | -0.3157 (0.0799) [.0001] | -0.3199 (0.0037) [.00001] | -0.3202 (0.0029) [.000001] |
| y2 discrete 1-2 | -0.1676 (0.0189) [.0029] | -0.1248 (0.0087) [.00009] | -0.128 (0.0016) [.00002] | -0.1279 (0.0015) [.00001] |
| y2 discrete 2-3 | -0.0332 (0.0108) [.0013] | -0.0227 (0.0066) [.00005] | -0.0215 (0.0012) [.000001] | -0.0214 (0.0010) [.000003] |
| x1 | 0.0235 (0.0225) | 0.0253 (0.0142) | 0.0261 (0.0146) | 0.0252 (0.0127) |
| x2 | 0.0227 (0.0244) | 0.0227 (0.0134) | 0.0244 (0.0135) | 0.025 (0.0119) |

Table A.6: Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications, δ23 = 0)

y2 is assumed exogenous in the first three estimators (OLS, Tobit MLE, Fractional Probit QMLE) and endogenous in the last four.

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|---|---|---|---|
| y2 continuous | -0.2441 | -0.1382 (0.0045) [.0033] | -0.1652 (0.0034) [.0024] | -0.2117 (0.0049) [.0010] | -0.1827 (0.0424) [.0020] | -0.2625 (0.0427) [.0002] | -0.2436 (0.0045) [.00001] | -0.2441 (0.0044) [.00001] |
| y2 discrete 0-1 | -0.3194 | | -0.2019 (0.0039) [.0038] | -0.2708 (0.0053) [.0014] | -0.2284 (0.0562) [.0018] | -0.3513 (0.0931) [.0001] | -0.3186 (0.0036) [.00001] | -0.3194 (0.0030) [.000001] |
| y2 discrete 1-2 | -0.1269 | | -0.1605 (0.0025) [.0010] | -0.1189 (0.0021) [.0002] | -0.1697 (0.0211) [.0015] | -0.1294 (0.0101) [.00007] | -0.1274 (0.0016) [.00001] | -0.127 (0.0015) [.00001] |
| y2 discrete 2-3 | -0.0211 | | -0.036 (0.0026) [.0004] | -0.0269 (0.0020) [.0002] | -0.0267 (0.0129) [.0010] | -0.0175 (0.0074) [.00004] | -0.0213 (0.0011) [.000001] | -0.021 (0.0010) [.000002] |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively. This table presents the case of no instrument (δ23 = 0).
Table A.7: Simulation Result of the Average Partial Effects Estimates (N=100, η1 = 0.5, 500 replications)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.235 | -0.1419 (0.0221) [.094] | -0.1695 (0.0193) [.0066] | -0.218 (0.0216) [.0017] | -0.1688 (0.0667) [.0066] |
| y2 discrete 0-1 | -0.3281 | | -0.2173 (0.0253) [.0111] | -0.3 (0.0339) [.0028] | |
| y2 discrete 1-2 | -0.1308 | | -0.1767 (0.0222) [.0046] | -0.1252 (0.0096) [.0006] | |
| y2 discrete 2-3 | -0.0214 | | -0.0316 (0.0129) [.0010] | -0.0243 (0.0034) [.0003] | |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively.

Table A.7 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1837 (0.0356) [.0051] | -0.234 (0.0253) [.0001] | -0.2371 (0.0166) [.0002] | -0.2366 (0.0162) [.00017] |
| y2 discrete 0-1 | -0.2386 (0.0492) [.0090] | -0.3277 (0.0457) [.00004] | -0.3288 (0.0137) [.00005] | -0.3281 (0.0122) [.00004] |
| y2 discrete 1-2 | -0.184 (0.0257) [.0053] | -0.1305 (0.0139) [.00004] | -0.13 (0.0073) [.00009] | -0.1306 (0.0063) [.00004] |
| y2 discrete 2-3 | -0.0286 (0.0134) [.0007] | -0.0217 (0.0041) [.00003] | -0.0211 (0.0036) [.00003] | -0.0213 (0.0027) [.00001] |

Table A.8: Simulation Result of the Average Partial Effects Estimates (N=500, η1 = 0.5, 500 replications)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.2358 | -0.1415 (0.0177) [.0046] | -0.171 (0.0157) [.0034] | -0.2201 (0.0163) [.0007] | -0.1617 (0.0175) [.0041] |
| y2 discrete 0-1 | -0.3285 | | -0.219 (0.0219) [.0059] | -0.3026 (0.0311) [.0012] | |
| y2 discrete 1-2 | -0.1309 | | -0.1782 (0.0205) [.0028] | -0.1259 (0.0082) [.0002] | |
| y2 discrete 2-3 | -0.0214 | | -0.0309 (0.0109) [.0004] | -0.024 (0.0025) [.0001] | |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively.
Table A.8 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1815 (0.0114) [.0029] | -0.2334 (0.0119) [.0001] | -0.2379 (0.0051) [.0001] | -0.2376 (0.0086) [.0001] |
| y2 discrete 0-1 | -0.2351 (0.0153) [.0052] | -0.3241 (0.0192) [.0002] | -0.3293 (0.0109) [.00001] | -0.329 (0.0106) [.00002] |
| y2 discrete 1-2 | -0.1847 (0.0158) [.0031] | -0.13 (0.0058) [.00004] | -0.131 (0.0044) [.00001] | -0.1311 (0.0043) [.000004] |
| y2 discrete 2-3 | -0.0267 (0.0084) [.0002] | -0.0219 (0.0018) [.00002] | -0.0212 (0.0017) [.00001] | -0.0213 (0.0014) [.000004] |

Table A.9: Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.2347 | -0.1283 (0.0046) [.0034] | -0.1591 (0.0042) [.0024] | -0.2079 (0.0051) [.0008] | -0.1583 (0.0110) [.0024] |
| y2 discrete 0-1 | -0.32 | | -0.2014 (0.0046) [.0038] | -0.2763 (0.0051) [.0014] | |
| y2 discrete 1-2 | -0.1273 | | -0.161 (0.0027) [.0011] | -0.1193 (0.0017) [.0002] | |
| y2 discrete 2-3 | -0.0212 | | -0.0388 (0.0030) [.0006] | -0.0259 (0.0012) [.0001] | |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively.
Table A.9 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1754 (0.0064) [.0019] | -0.2295 (0.0077) [.0002] | -0.2368 (0.0051) [.00008] | -0.2371 (0.0050) [.00008] |
| y2 discrete 0-1 | -0.2262 (0.0082) [.0030] | -0.3109 (0.0099) [.0003] | -0.3201 (0.0041) [.00016] | -0.3204 (0.0030) [.00001] |
| y2 discrete 1-2 | -0.1716 (0.0031) [.0014] | -0.1258 (0.0023) [.00005] | -0.128 (0.0020) [.00004] | -0.1278 (0.0016) [.00001] |
| y2 discrete 2-3 | -0.0317 (0.0031) [.0003] | -0.0224 (0.0014) [.00004] | -0.0214 (0.0014) [.00001] | -0.0212 (0.0010) [.000001] |

Table A.10: Simulation Result of the Average Partial Effects Estimates (N=2000, η1 = 0.5, 500 replications)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.2347 | -0.1286 (0.0028) [.0024] | -0.1591 (0.0028) [.0017] | -0.208 (0.0031) [.0006] | -0.1591 (0.0082) [.0017] |
| y2 discrete 0-1 | -0.3201 | | -0.2014 (0.0034) [.0027] | -0.2766 (0.0036) [.0010] | |
| y2 discrete 1-2 | -0.1275 | | -0.1609 (0.0020) [.0008] | -0.1194 (0.0012) [.0002] | |
| y2 discrete 2-3 | -0.0213 | | -0.039 (0.0020) [.0004] | -0.0259 (0.0008) [.0001] | |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively.
Table A.10 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1755 (0.0044) [.0013] | -0.2293 (0.0050) [.0001] | -0.2369 (0.0031) [.00005] | -0.2371 (0.0030) [.00006] |
| y2 discrete 0-1 | -0.2263 (0.0059) [.0021] | -0.3106 (0.0074) [.0002] | -0.3201 (0.0029) [.000001] | -0.3204 (0.0021) [.000007] |
| y2 discrete 1-2 | -0.1717 (0.0024) [.0010] | -0.1258 (0.0017) [.00004] | -0.1281 (0.0015) [.00001] | -0.1278 (0.0011) [.000009] |
| y2 discrete 2-3 | -0.0317 (0.0021) [.0002] | -0.0224 (0.0010) [.00003] | -0.0214 (0.0010) [.000002] | -0.0212 (0.0007) [.000001] |

Table A.11: Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, a1 is normally distributed, 500 replications)

y2 is assumed exogenous (OLS, Tobit MLE, Fractional Probit QMLE).

| | True value APE | Linear OLS | Tobit MLE | Fractional Probit QMLE | Linear 2SLS |
|---|---|---|---|---|---|
| y2 continuous | -0.2379 | -0.1599 (0.0053) [.0025] | -0.1876 (0.0041) [.0015] | -0.2369 (0.0050) [.00003] | -0.1625 (0.0088) [.0024] |
| y2 discrete 0-1 | -0.3409 | | -0.2431 (0.0040) [.0031] | -0.3393 (0.0028) [.00005] | |
| y2 discrete 1-2 | -0.1361 | | -0.2032 (0.0019) [.0021] | -0.1358 (0.0010) [.000007] | |
| y2 discrete 2-3 | -0.0215 | | -0.0195 (0.0027) [.00006] | -0.0217 (0.0006) [.000005] | |
| x1 | 0.0238 | 0.0265 (0.0165) | 0.0223 (0.0084) | 0.024 (0.0059) | 0.0237 (0.0189) |
| x2 | 0.0238 | 0.0234 (0.0179) | 0.0217 (0.0103) | 0.024 (0.0064) | 0.023 (0.0192) |

Note: Figures in parentheses () and brackets [] are standard deviations and RMSEs, respectively.
Table A.11 (continued): y2 is assumed endogenous

| | Tobit (BS) MLE | Fractional Probit QMLE-PW | Fractional Probit NLS | Fractional Probit QMLE |
|---|---|---|---|---|
| y2 continuous | -0.1885 (0.0051) [.0016] | -0.2375 (0.0056) [.00001] | -0.239 (0.0049) [.00003] | -0.239 (0.0048) [.00003] |
| y2 discrete 0-1 | -0.2445 (0.0063) [.0031] | -0.3403 (0.0050) [.00002] | -0.3401 (0.0023) [.00003] | -0.3401 (0.0018) [.00003] |
| y2 discrete 1-2 | -0.2037 (0.0026) [.0021] | -0.136 (0.0012) [.000002] | -0.136 (0.0011) [.000002] | -0.136 (0.0010) [.000002] |
| y2 discrete 2-3 | -0.0192 (0.0032) [.00007] | -0.0216 (0.0007) [.000003] | -0.0216 (0.0008) [.000003] | -0.0216 (0.0006) [.000003] |
| x1 | 0.0224 (0.0084) | 0.024 (0.0059) | 0.0242 (0.0064) | 0.024 (0.0061) |
| x2 | 0.0218 (0.0104) | 0.024 (0.0064) | 0.0239 (0.0064) | 0.0239 (0.0061) |

Table A.12: Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)

| APE (QMLE) | True APE | Mean | SD | MSE | Rejection rate |
|---|---|---|---|---|---|
| y2 continuous | -0.2347 | -0.2371 | 0.005 | 0.0051 | 0.046 |
| y2 discrete 0-1 | -0.32 | -0.3204 | 0.003 | 0.0029 | 0.045 |
| y2 discrete 1-2 | -0.1273 | -0.1278 | 0.0016 | 0.0014 | 0.046 |
| y2 discrete 2-3 | -0.0212 | -0.0212 | 0.001 | 0.0009 | 0.048 |

Table A.13: Comparison of analytical and bootstrapping mean of standard errors (N=1000, η1 = 0.5, 200 replications)

| | Fractional Probit QMLE, analytical | Fractional Probit QMLE, bootstrapping | Fractional Probit NLS, analytical | Fractional Probit NLS, bootstrapping |
|---|---|---|---|---|
| y2 continuous | -0.2406 (0.0043) | -0.2406 (0.0041) | -0.2405 (0.0046) | -0.2405 (0.0043) |
| y2 discrete 0-1 | -0.3203 (0.0030) | -0.3203 (0.0028) | -0.32 (0.0038) | -0.32 (0.0034) |
| y2 discrete 1-2 | -0.1279 (0.0015) | -0.1279 (0.0013) | -0.128 (0.0016) | -0.128 (0.0014) |
| y2 discrete 2-3 | -0.0214 (0.0010) | -0.0214 (0.0010) | -0.0215 (0.0012) | -0.0215 (0.0012) |

Note: Figures in parentheses () are means of standard errors. Figures not in parentheses are APE estimates. Bootstrapped standard errors are obtained using 100 bootstrap replications.
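To illustrate the kind of comparison reported in Table A.13, the sketch below contrasts an analytical standard error with a nonparametric bootstrap standard error using 100 bootstrap replications. It is only a generic illustration under stated assumptions: the statistic is a sample mean of simulated data, not the QMLE or NLS average partial effects, and the function names are hypothetical.

```python
import math
import random


def mean(values):
    """Sample mean."""
    return sum(values) / len(values)


def analytical_se(sample):
    """Textbook standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    center = mean(sample)
    variance = sum((v - center) ** 2 for v in sample) / (n - 1)
    return math.sqrt(variance / n)


def bootstrap_se(sample, statistic, replications=100, seed=0):
    """Standard deviation of the statistic across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    draws = []
    for _ in range(replications):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        draws.append(statistic(resample))
    center = mean(draws)
    return math.sqrt(sum((d - center) ** 2 for d in draws) / (replications - 1))


# Simulated data (assumption: standard normal draws, N = 1000).
data_rng = random.Random(123)
data = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]

print("analytical SE:", analytical_se(data))
print("bootstrap SE: ", bootstrap_se(data, mean))
```

With only 100 replications the bootstrap figure carries its own simulation noise, so small differences from the analytical value are expected.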
Table A.14: Frequencies of the Number of Children

| Number of kids | Frequency | Percent | Cumulative relative frequency |
|---|---|---|---|
| 0 | 16,200 | 50.9 | 50.9 |
| 1 | 10,000 | 31.42 | 82.33 |
| 2 | 3,733 | 11.73 | 94.06 |
| 3 | 1,373 | 4.31 | 98.37 |
| 4 | 323 | 1.01 | 99.39 |
| 5 | 134 | 0.42 | 99.81 |
| 6 | 47 | 0.15 | 99.96 |
| 7 | 6 | 0.02 | 99.97 |
| 8 | 4 | 0.01 | 99.99 |
| 9 | 2 | 0.01 | 99.99 |
| 10 | 2 | 0.01 | 100 |
| Total | 31,824 | 100 | |

Table A.15: Descriptive Statistics

| Variable | Description | Mean | S.D. | Min | Max |
|---|---|---|---|---|---|
| frhour | Women's weekly fractional working hours | 0.126 | 0.116 | 0 | 0.589 |
| kidno | Number of kids | 0.752 | 0.977 | 0 | 10 |
| age | Mother's age in years | 29.742 | 3.613 | 21 | 35 |
| agefstm | Mother's age in years when first child was born | 20.118 | 2.889 | 15 | 32 |
| hispan | =1 if race is hispanic; =0 if race is black | 0.593 | 0.491 | 0 | 1 |
| nonmomi | Non-mom's labor income | 31.806 | 20.375 | 0 | 157.4 |
| edu | Education = number of schooling years | 11.005 | 3.305 | 0 | 20 |
| samesex | =1 if the first two kids have the same sex; =0 otherwise | 0.503 | 0.5 | 0 | 1 |
| multi2nd | =1 if the second birth is a twin; =0 otherwise | 0.009 | 0.093 | 0 | 1 |

Table A.16: First-stage Estimates using Instrumental Variables

Dependent variable: kidno

| | Linear model (OLS) | Negative Binomial II model (MLE) |
|---|---|---|
| edu | -0.065 (0.002) | -0.078 (0.002) |
| age | 0.096 (0.002) | 0.119 (0.002) |
| agefstm | -0.114 (0.002) | -0.156 (0.003) |
| hispan | 0.036 (0.010) | 0.045 (0.015) |
| nonmomi | -0.002 (0.000) | -0.003 (0.000) |
| samesex | 0.075 (0.010) | 0.098 (0.013) |
| multi2nd | 0.786 (0.052) | 0.728 (0.045) |
| constant | 0.911 (0.042) | 0.013 (0.067) |

Note: Figures in parentheses are robust standard errors.
Table A.17: Estimates Assuming Number of Kids is Conditionally Exogenous

| | Linear OLS Coefficient | Tobit MLE Coefficient | Tobit MLE APE | Fractional Probit QMLE Coefficient | Fractional Probit QMLE APE |
|---|---|---|---|---|---|
| kidno (continuous) | -0.019 (0.0007) | -0.034 (0.0013) | -0.0225 (0.0008) | -0.099 (0.0040) | -0.0202 (0.0008) |
| kidno 0-1 | | | -0.0231 (0.0008) | | -0.0207 (0.0008) |
| kidno 1-2 | | | -0.0207 (0.0007) | | -0.0185 (0.0007) |
| kidno 2-3 | | | -0.0183 (0.0005) | | -0.0163 (0.0005) |
| edu | 0.004 (0.0002) | 0.008 (0.0004) | 0.005 (0.0002) | 0.022 (0.0010) | 0.005 (0.0002) |
| age | 0.005 (0.0002) | 0.008 (0.0003) | 0.006 (0.0002) | 0.024 (0.0010) | 0.005 (0.0002) |
| agefstm | -0.006 (0.0003) | -0.01 (0.0004) | -0.007 (0.0003) | -0.03 (0.0010) | -0.006 (0.0003) |
| hispan | -0.032 (0.0010) | -0.052 (0.0022) | -0.034 (0.0014) | -0.15 (0.0070) | -0.031 (0.0013) |
| nonmomi | -0.0003 (0.0000) | -0.0006 (0.0001) | -0.0004 (0.0000) | -0.002 (0.0002) | -0.0004 (0.0000) |

Note: Figures in parentheses under the Coefficient columns are robust standard errors. Figures in parentheses under the APE columns are bootstrapped standard errors.

Table A.18: Estimates Assuming Number of Kids is Endogenous

| | Linear 2SLS Coef. | Tobit (BS) MLE Coef. | Tobit (BS) MLE APE | Fractional Probit QMLE-PW (kidno is assumed cont.) Coef. | Fractional Probit QMLE-PW (kidno is assumed cont.) APE | Fractional Probit QMLE Coef. | Fractional Probit QMLE APE | Fractional Probit NLS Coef. | Fractional Probit NLS APE |
|---|---|---|---|---|---|---|---|---|---|
| kidno (continuous) | -0.016 (0.0070) | -0.027 (0.0130) | -0.018 (0.0080) | -0.078 (0.0370) | -0.016 (0.0080) | -0.081 (0.0070) | -0.017 (0.0010) | -0.081 (0.0070) | -0.017 (0.0010) |
| kidno 0-1 | | | -0.018 (0.0080) | | -0.016 (0.0080) | | -0.017 (0.0010) | | -0.017 (0.0010) |
| kidno 1-2 | | | -0.017 (0.0070) | | -0.015 (0.0070) | | -0.015 (0.0010) | | -0.015 (0.0010) |
| kidno 2-3 | | | -0.015 (0.0060) | | -0.014 (0.0050) | | -0.014 (0.0010) | | -0.014 (0.0010) |
| edu | 0.004 (0.0005) | 0.009 (0.0009) | 0.006 (0.0006) | 0.024 (0.0020) | 0.005 (0.0005) | 0.024 (0.0010) | 0.005 (0.0005) | 0.024 (0.0010) | 0.005 (0.0005) |
| age | 0.005 (0.0007) | 0.008 (0.0010) | 0.005 (0.0008) | 0.022 (0.0040) | 0.004 (0.0008) | 0.021 (0.0010) | 0.004 (0.0008) | 0.021 (0.0010) | 0.004 (0.0008) |
| agefstm | -0.006 (0.0008) | -0.01 (0.0010) | -0.006 (0.0010) | -0.028 (0.0040) | -0.006 (0.0009) | -0.027 (0.0020) | -0.005 (0.0008) | -0.027 (0.0020) | -0.005 (0.0008) |
| hispan | -0.032 (0.0010) | -0.052 (0.0020) | -0.034 (0.0010) | -0.15 (0.0070) | -0.031 (0.0010) | -0.151 (0.0070) | -0.031 (0.0010) | -0.151 (0.0070) | -0.031 (0.0010) |
| nonmomi | -0.0003 (0.00004) | -0.0005 (0.00006) | -0.0004 (0.00004) | -0.002 (0.00002) | -0.0003 (0.00004) | -0.002 (0.00020) | -0.0003 (0.00004) | -0.002 (0.00020) | -0.0003 (0.00004) |

Note: Figures in parentheses under the Coefficient columns are robust standard errors. Figures in parentheses under the APE columns are bootstrapped standard errors; those under the APEs for a count endogenous variable with the QMLE and NLS methods are computed standard errors.
Appendix B

TABLES AND FIGURES FOR CHAPTER 2

Table B.1: Summary Statistics

| Variable Description | Mean | Standard deviation |
|---|---|---|
| Annual Hours | 1105.7 | 886.52 |
| Experience (years) | 11.89 | 7.71 |
| Education (years) | 12.94 | 2.27 |
| Age (years) | 41.42 | 10.18 |
| Number of children aged 0-2 | 0.13 | 0.37 |
| Number of children aged 3-5 | 0.18 | 0.42 |
| Number of children aged 6-17 | 0.84 | 1.01 |
| Married (=1 if married) | 0.88 | 0.32 |
| Husband's employment status (=1 if working) | 0.82 | 0.39 |
| Non-wife income (thousand dollars) | 36,622.4 | 41,704 |
| Number of observations | 11,232 | |
| Number of women | 864 | |
| Number of years | 13 | |

Table B.2: Determinants of Female Working Experience - First stage regressions

Dependent variable: Female Working Experience

| Explanatory variable | Estimate |
|---|---|
| Number of children aged 0-2 | -0.442** [0.200] |
| Number of children aged 3-5 | -0.707*** [0.169] |
| Number of children aged 6-17 | -1.207*** [0.162] |
| Years of schooling | 0.470*** [0.096] |
| Married | -1.191 [1.186] |
| Husband's work participation | -1.256 [0.913] |
| Non-wife income | -0.00003*** [0.00001] |
| Age | 1.174*** [0.138] |
| Age squared | -0.010*** [0.002] |
| η | 0.958 |
| Number of observations | 11,232 |
| Number of women | 864 |
| R-squared | 0.37 |
| F-Statistics on IVs | 196.26 |

Note: *, **, ***: significant at the 10%, 5% and 1% level, respectively. Other explanatory variables include time dummies and time averages of all explanatory variables. Standard errors robust to heteroskedasticity and serial correlation are inside square brackets. Instrumental variables (IVs) are age and age squared.
Table B.3: Estimating Dynamic Female Labor Supply, Second Stage Regressions, Experience is Treated as an Endogenous Variable

| | Dynamic Linear, GMM [1] | Tobit, Correlated RE (CRE) [2] | Tobit, CRE with serial correlation correction [3] |
|---|---|---|---|
| Lagged Hours | 0.857*** [0.012] | 0.542*** [0.009] | 0.492*** [0.025] |
| Experience | 4.683*** [1.514] | 4.964** [2.381] | 13.207*** [1.582] |
| Children 0-2 | -37.657** [15.718] | -73.537*** [15.884] | -148.978*** [18.038] |
| Children 3-5 | 0.371 [12.932] | -44.292*** [13.66] | -97.080*** [15.628] |
| Children 6-17 | 29.820*** [5.81] | 46.139*** [8.108] | 7.103 [8.364] |
| Education | 7.979** [3.147] | -6.849 [9.415] | 3.975 [9.577] |
| Married | -134.671*** [33.576] | -253.253** [124.741] | -234.204** [117.946] |
| Husband's work status | 136.438*** [26.787] | 205.205*** [25.982] | 195.717*** [28.254] |
| Non-wife income | -0.001*** [0.0004] | 0.001*** [0.0002] | -0.001*** [0.0002] |
| Initial Condition | | 0.161*** [0.038] | 0.102*** [0.011] |
| v2it | | -59.413*** [3.643] | |
| v2it* | | | 427.820*** [47.673] |
| Observations | 10368 | 10368 | 10368 |
| Number of women | 864 | 864 | 864 |

Note: *, **, ***: significant at the 10%, 5% and 1% level, respectively. z i and v2i are included in (2) and (3) but not reported in the table. The first stage residual in (3) is free of serial correlation. Standard errors corrected for the first stage estimation are inside square brackets.
Table B.4: Average Partial Effects on Female Labor Supply

| | Dynamic Linear, GMM [1] | Tobit, CRE [2] | Tobit, CRE-SC [3] |
|---|---|---|---|
| Lagged Hours | 0.857*** [0.012] | 0.469*** [0.012] | 0.434*** [0.011] |
| Experience | 4.683*** [1.514] | 4.294 [3.074] | 11.481*** [2.662] |
| Children 0-2 | -37.657** [15.718] | -63.616*** [17.612] | -122.552*** [14.848] |
| Children 3-5 | 0.371 [12.932] | -38.317** [12.932] | -80.443*** [10.343] |
| Children 6-17 | 29.820*** [5.81] | 39.914*** [10.10] | 6.262 [8.722] |
| Education | 7.979** [3.147] | -5.925 [12.38] | 3.504 [9.669] |
| Married | -134.671*** [33.576] | -219.09 [278.695] | -251.11 [244.441] |
| Husband's work status | 136.438*** [26.787] | 177.521*** [58.886] | 161.88*** [35.728] |
| Non-wife income | -0.001*** [0.0004] | 0.001*** [0.0003] | -0.001*** [0.0004] |
| Replications | | 100 | 100 |

Note: *, **, ***: significant at the 10%, 5% and 1% level, respectively. The figures inside square brackets are bootstrapped standard errors with 100 replications.

[Figure B.1: Distribution of Women's Annual Hours of Work in 1980-1992. Horizontal axis: annual work hours; vertical axis: percent.]

[Figure B.2: Hours of Work vs. Experience. Horizontal axis: experience in years.]

[Figure B.3: Hours of Work vs. Number of Children 0-2. Horizontal axis: number of children in FU aged 0-2.]

[Figure B.4: Hours of Work vs. Number of Children 3-5. Horizontal axis: number of children in FU aged 3-5.]

[Figure B.5: Hours of Work vs. Number of Children 6-17. Horizontal axis: number of children in FU aged 6-17.]

Appendix C

TABLES FOR CHAPTER 3

Table C.1: Summary Statistics for the Whole Sample

| Variable Description | Mean | S.D. |
|---|---|---|
| Female labor participation | 0.95 | 0.21 |
| Annual Hours | 1938.02 | 755.29 |
| Education (years) | 7.29 | 3.88 |
| Age (years) | 39.18 | 7.21 |
| Newborn aged 0-2 | 0.1 | 0.3 |
| Spouse's age (years) | 41.9 | 7.48 |
| Spouse's education (years) | 8.2 | 3.86 |
| Non-wife income (millions) | 8.44 | 16.13 |
| First child's gender (=1 if a boy, =0 if a girl) | 0.55 | 0.5 |
| First two children have same sex (=1 if yes, =0 if not) | 0.56 | 0.5 |
| Live in urban | 0.18 | 0.38 |
| Live with grandparent | 0.09 | 0.29 |
| Work on farm | 0.84 | 0.37 |
| Ethnic (=1 if major, =0 if minor) | 0.79 | 0.4 |
| Number of women | 665 | |
| Number of observations | 1995 | |

Note: N=665 women. Years = 2004, 2006, 2008. S.D. stands for standard deviation.

Table C.2: Summary Statistics for Each Year in the Panel

| Variable Description | 2004 Mean | 2004 S.D. | 2006 Mean | 2006 S.D. | 2008 Mean | 2008 S.D. |
|---|---|---|---|---|---|---|
| Annual Hours (hours) | 1818.47 | 826.48 | 1842.79 | 839.42 | 1792.62 | 869.48 |
| Female labor participation (=1 if work, =0 if not) | 0.96 | 0.19 | 0.96 | 0.2 | 0.94 | 0.23 |
| Newborn aged 0-1 (=1 if yes, =0 if not) | 0.16 | 0.36 | 0.09 | 0.28 | 0.06 | 0.25 |
| Education (years) | 7.23 | 3.81 | 7.31 | 3.85 | 7.32 | 3.98 |
| Age (years) | 37.23 | 7.03 | 39.16 | 7 | 41.13 | 7.09 |
| Non-wife income (million dongs) | 6.32 | 14.77 | 7.89 | 13.05 | 11.1 | 19.53 |
| Spouse's age (years) | 39.97 | 7.32 | 41.9 | 7.32 | 43.83 | 7.3 |
| Spouse's education (years) | 8.12 | 3.81 | 8.18 | 3.8 | 8.3 | 3.96 |
| First two children have same sex (=1 if yes, =0 if not) | 0.59 | 0.49 | 0.57 | 0.5 | 0.52 | 0.5 |
| Live in urban (=1 if yes, =0 if not) | 0.17 | 0.38 | 0.18 | 0.38 | 0.19 | 0.39 |
| Work on farm (=1 if yes, =0 if not) | 0.86 | 0.35 | 0.83 | 0.38 | 0.82 | 0.38 |
| Ethnic (=1 if major, =0 if minor) | 0.8 | 0.4 | 0.8 | 0.4 | 0.79 | 0.41 |
| Number of observations | 665 | | 665 | | 665 | |

Table C.3: Bivariate Probit Estimates of Fertility and LFP in the First Stage

| Explanatory Variable | Fertility Equation, dependent variable Newborn [1] | LFP Equation, dependent variable LFP: Coefficient [2] | LFP Equation, dependent variable LFP: APE/ATE [3] |
|---|---|---|---|
| Newborn | | -0.136*** [0.05] | -0.127*** [0.03] |
| Samesex | 1.024*** [0.374] | - | - |
| Non-wife income | 0.008*** [0.003] | -0.001*** [0.0002] | -0.001*** [0.0002] |
| Age | -0.302 [0.204] | -0.031* [0.017] | -0.019* [0.011] |
| Age squared | 0.002 [0.002] | 0.0001* [0.00006] | 0.0001 [0.0001] |
| Education | -0.06 [0.08] | -0.005 [0.03] | -0.003 [0.03] |
| Husband's age | - | -0.007* [0.004] | -0.005* [0.003] |
| Husband's age squared | | 0.0001 [0.0003] | -0.0001 [0.0002] |
| Husband's education | | -0.07* [0.05] | -0.05* [0.03] |
| Cov(v2it, v3it) = ρ | | - | -0.165 [0.04] |
| Log likelihood | -866.48 | | |
| Number of observations | 1995 | | |

Note: N=665, T=3. Time averages of explanatory variables and year dummies for 2006 and 2008 are included. Figures in square brackets are clustered standard errors to control for serial correlation across time. *, **, ***: significant at the 10%, 5% and 1% level, respectively.
Table C.4: Estimates for Log(Female Working Hours) Equation

| Explanatory Variable | Pooled OLS [1] | Pooled 2SLS [2] | Fixed Effect [3] | Fixed Effect 2SLS [4] | SW Procedure [5] | Proposed Procedure [6] |
|---|---|---|---|---|---|---|
| Newborn | -0.134*** [0.04] | -0.232*** [0.053] | -0.164*** [0.044] | -0.277*** [0.082] | -0.308*** [0.117] | -0.345*** [0.108] |
| Education | 0.023*** [0.004] | 0.024*** [0.006] | 0.018* [0.01] | 0.014 [0.011] | 0.019* [0.012] | 0.016** [0.008] |
| Age | -0.014 [0.017] | 0.003 [0.046] | 0.039 [0.033] | -0.002 [0.067] | 0.035 [0.037] | -0.02** [0.01] |
| Age squared | 0.0001 [0.0002] | -0.0001 [0.0004] | -0.001 [0.0004] | 0.0002 [0.001] | -0.0005 [0.0005] | 0.0001** [0.0001] |
| Non-wife income | 0.001 [0.001] | 0.001 [0.001] | 0.001* [0.001] | 0.002* [0.001] | 0.002** [0.001] | 0.002*** [0.0006] |
| Urban | 0.090** [0.039] | 0.088** [0.04] | 0.077 [0.085] | 0.053 [0.197] | 0.091** [0.038] | 0.095*** [0.037] |
| Work on Farm | -0.318*** [0.037] | -0.317*** [0.037] | -0.101* [0.054] | -0.104 [0.066] | -0.329*** [0.033] | -0.328*** [0.034] |
| Ethnic | -0.143*** [0.034] | -0.138*** [0.037] | -0.176 [0.161] | -0.195 [0.235] | -0.142*** [0.028] | -0.143*** [0.028] |
| R-square | 0.1 | 0.08 | 0.07 | 0.03 | 0.1 | 0.11 |
| Number of observations | 1904 | 1904 | 1904 | 1904 | 1904 | 1904 |

Note: The dependent variable is log(hours), with 1904 observations of positive hours. Year dummy variables and time averages of explanatory variables are included. Standard errors are robust to serial correlation and heteroskedasticity. Standard errors in the SW and proposed procedures are corrected for the first-step estimation. *, **, ***: significant at the 10%, 5% and 1% level, respectively.

Appendix D

TECHNICALITIES FOR CHAPTER 1

D.1 Details of the QML Estimator

D.1.1 Asymptotic Variance for the Two-step Estimator

This section derives asymptotic standard errors for the QML estimator in the second step. The adjusted asymptotic standard errors for the NLS estimator can be derived in a similar way.
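In the first stage, we have $y_2 \mid z, a_1 \sim \text{Poisson}[\exp(z\delta_2 + a_1)]$ with the conditional density function:

$$f(y_2 \mid z, a_1) = \frac{[\exp(z\delta_2 + a_1)]^{y_2} \exp[-\exp(z\delta_2 + a_1)]}{y_2!}. \tag{D.1}$$

The unconditional density of $y_2$ conditioned only on $z$ is obtained by integrating $a_1$ out of the joint density. That is:

$$f(y_2 \mid z) = \int_{a_1} f(y_2 \mid z, a_1)\, f(a_1)\, da_1,$$

in which $f(a_1) = \delta_0^{\delta_0} \exp(a_1)^{\delta_0 - 1} \exp(-\delta_0 \exp(a_1))\, \Gamma^{-1}(\delta_0)$.

Let $m = \exp(z\delta_2)$ and $c = \exp(a_1)$; then the conditional density is:

$$f(y_2 \mid z, a_1) = \frac{[mc]^{y_2} \exp[-mc]}{\Gamma(y_2 + 1)},$$

and the unconditional density is:

$$f(y_2 \mid z) = \int_0^{\infty} \frac{[mc]^{y_2} \exp[-mc]}{\Gamma(y_2 + 1)} \cdot \frac{\delta_0^{\delta_0} c^{\delta_0 - 1} \exp(-\delta_0 c)}{\Gamma(\delta_0)}\, dc.$$

This is equivalent to:

$$f(y_2 \mid z) = \frac{m^{y_2} \delta_0^{\delta_0}}{\Gamma(y_2 + 1)\Gamma(\delta_0)} \int_0^{\infty} \exp[-c(m + \delta_0)]\, c^{y_2 + \delta_0 - 1}\, dc,$$

or

$$f(y_2 \mid z) = \frac{m^{y_2} \delta_0^{\delta_0}}{\Gamma(y_2 + 1)\Gamma(\delta_0)} \cdot \frac{\Gamma(y_2 + \delta_0)}{(m + \delta_0)^{y_2 + \delta_0}}. \tag{D.2}$$

Defining $h = \dfrac{\delta_0}{m + \delta_0}$ results in:

$$f(y_2 \mid z) = \frac{\Gamma(y_2 + \delta_0)\, h^{\delta_0} (1 - h)^{y_2}}{\Gamma(y_2 + 1)\Gamma(\delta_0)}, \tag{D.3}$$

where $y_2 = 0, 1, \ldots$ and $\delta_0 > 0$, which is the density function for the negative binomial distribution.

The log-likelihood for observation $i$ is:

$$l_i(\delta_2, \delta_0) = \delta_0 \ln\left[\frac{\delta_0}{\delta_0 + \exp(z_i\delta_2)}\right] + y_{2i} \ln\left[\frac{\exp(z_i\delta_2)}{\delta_0 + \exp(z_i\delta_2)}\right] + \ln\left[\frac{\Gamma(y_{2i} + \delta_0)}{\Gamma(y_{2i} + 1)\Gamma(\delta_0)}\right]. \tag{D.4}$$

For all observations:

$$L(\delta_2, \delta_0) = \sum_{i=1}^{N} l_i(\delta_2, \delta_0). \tag{D.5}$$

We can estimate $\delta_2$ and $\delta_0$ jointly by maximum likelihood. Let $\gamma = (\delta_2', \delta_0)'$ have dimension $(L + 1)$, where $L$ is the dimension of $\delta_2$, which is the sum of $K$ and the number of instruments. Under standard regularity conditions, we have:

$$\sqrt{N}(\hat{\gamma} - \gamma) = N^{-1/2} \sum_{i=1}^{N} r_{i2} + o_p(1), \tag{D.6}$$

where

$$r_{i2} = \begin{pmatrix} -A_{01}^{-1} s_{01} \\ -A_{02}^{-1} s_{02} \end{pmatrix}, \tag{D.7}$$

in which

$$s_0 = \begin{pmatrix} \nabla_{\delta_2} l_i \\ \nabla_{\delta_0} l_i \end{pmatrix} = \begin{pmatrix} s_{01} \\ s_{02} \end{pmatrix}, \quad \text{and} \quad A_0 = E(\nabla_\gamma^2 l_i) = E\begin{pmatrix} \nabla_{\delta_2}^2 l_i \\ \nabla_{\delta_0}^2 l_i \end{pmatrix} = E\begin{pmatrix} H_{01} \\ H_{02} \end{pmatrix} = \begin{pmatrix} A_{01} \\ A_{02} \end{pmatrix}.$$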
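As a numerical sanity check on the step from the Poisson-gamma mixture to the negative binomial density in (D.2) and (D.3), the following sketch integrates the mixture directly and compares it with the closed form. It is an illustration only: the parameter values m = 1.5 and δ0 = 2 are arbitrary assumptions, the function names are hypothetical, and a simple midpoint rule stands in for the quadrature used in the dissertation.

```python
import math


def nb_closed_form(y, m, d0):
    """Negative binomial density (D.3) with h = d0 / (m + d0)."""
    h = d0 / (m + d0)
    log_p = (math.lgamma(y + d0) - math.lgamma(y + 1) - math.lgamma(d0)
             + d0 * math.log(h) + y * math.log(1.0 - h))
    return math.exp(log_p)


def nb_by_integration(y, m, d0, upper=40.0, steps=20_000):
    """Integrate Poisson(y | m*c) against the Gamma(d0, d0) density of c = exp(a1)."""
    width = upper / steps
    total = 0.0
    for k in range(steps):
        c = (k + 0.5) * width  # midpoint of the k-th subinterval
        log_poisson = y * math.log(m * c) - m * c - math.lgamma(y + 1)
        log_gamma = (d0 * math.log(d0) + (d0 - 1.0) * math.log(c)
                     - d0 * c - math.lgamma(d0))
        total += math.exp(log_poisson + log_gamma) * width
    return total


# Hypothetical parameter values chosen only for illustration.
m_value, d0_value = 1.5, 2.0
for y_value in range(6):
    closed = nb_closed_form(y_value, m_value, d0_value)
    numeric = nb_by_integration(y_value, m_value, d0_value)
    print(y_value, closed, numeric)
```

The two columns should agree to several decimal places. For δ0 below one the integrand is unbounded near c = 0 when y2 = 0, and a crude midpoint rule would need refinement.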
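After taking the first and second derivatives, we have:
\[ s_{01} = \frac{z_i' \delta_0 (y_{2i} - \exp(z_i\delta_2))}{\delta_0 + \exp(z_i\delta_2)}, \tag{D.8} \]
\[ H_{01} = -\frac{z_i' z_i \delta_0 \exp(z_i\delta_2)}{\delta_0 + \exp(z_i\delta_2)}, \tag{D.9} \]
\[ s_{02} = \ln\left(\frac{\delta_0}{\delta_0 + \exp(z_i\delta_2)}\right) + \frac{\exp(z_i\delta_2) - y_{2i}}{\delta_0 + \exp(z_i\delta_2)} + \frac{\Gamma'(y_{2i} + \delta_0)}{\Gamma(y_{2i} + \delta_0)} - \frac{\Gamma'(\delta_0)}{\Gamma(\delta_0)}, \tag{D.10} \]
\[ H_{02} = H_{021} + H_{022}, \tag{D.11} \]
where
\[ H_{021} = \frac{\exp(z_i\delta_2)}{\delta_0[\delta_0 + \exp(z_i\delta_2)]} - \frac{\exp(z_i\delta_2) - y_{2i}}{[\delta_0 + \exp(z_i\delta_2)]^2}, \]
and
\[ H_{022} = \frac{\Gamma''(y_{2i} + \delta_0)\Gamma(y_{2i} + \delta_0) - [\Gamma'(y_{2i} + \delta_0)]^2}{[\Gamma(y_{2i} + \delta_0)]^2} - \frac{\Gamma''(\delta_0)\Gamma(\delta_0) - [\Gamma'(\delta_0)]^2}{[\Gamma(\delta_0)]^2}, \]
where $s_{01}$ and $H_{01}$ are $L \times 1$ and $L \times L$ matrices; $s_{02}$ and $H_{02}$ are $1 \times 1$ and $1 \times 1$ matrices. $r_{i2}(\gamma)$ has dimension $(L + 1) \times 1$.

With the two-step M-estimator, the asymptotic variance of $\sqrt{N}(\hat{\theta} - \theta)$ must be adjusted to account for the first-stage estimation of $\sqrt{N}(\hat{\gamma} - \gamma)$ (see Section 12.4.2 of Chapter 12 in Wooldridge, 2002).

The score of the QML (or the gradient) for observation $i$ with respect to $\theta$ is:
\[ \begin{aligned}
s_i(\theta; \gamma) &= \nabla_{\theta} l_i(\theta) \\
&= y_{1i} \frac{\nabla_{\theta}\mu_i}{\mu_i} - (1 - y_{1i}) \frac{\nabla_{\theta}\mu_i}{1 - \mu_i} \\
&= \frac{y_{1i} \nabla_{\theta}\mu_i (1 - \mu_i) - \mu_i (1 - y_{1i}) \nabla_{\theta}\mu_i}{\mu_i(1 - \mu_i)} \\
&= \frac{y_{1i} \nabla_{\theta}\mu_i - \mu_i \nabla_{\theta}\mu_i}{\mu_i(1 - \mu_i)} \\
&= \frac{(y_{1i} - \mu_i) \nabla_{\theta}\mu_i}{\mu_i(1 - \mu_i)} \\
&= \frac{(y_{1i} - \mu_i)}{\mu_i(1 - \mu_i)} \int_{-\infty}^{+\infty} \frac{\partial \Phi(g_i\theta)}{\partial \theta} f(a_1 \mid y_2, z)\, da_1, \\
s_i(\theta; \gamma) &= \frac{(y_{1i} - \mu_i)}{\mu_i(1 - \mu_i)} \int_{-\infty}^{+\infty} g_i' \phi(g_i\theta) f(a_1 \mid y_2, z)\, da_1,
\end{aligned} \tag{D.12} \]
where $g_i = (y_{2i}, z_{1i}, a_{1i})$, $\theta = (\alpha_1, \delta_1', \eta_1)'$, and $\theta$ has dimension $K + 2$. Then
\[ \sqrt{N}(\hat{\theta} - \theta) = A_1^{-1}\left(N^{-1/2} \sum_{i=1}^{N} r_{i1}(\theta; \gamma)\right) + o_p(1), \tag{D.13} \]
\[ A_1 = E[-\nabla_{\theta} s_i(\theta; \gamma)] = E\left[\frac{(\nabla_{\theta}\mu_i)' \nabla_{\theta}\mu_i}{\mu_i(1 - \mu_i)}\right] = E\left[\frac{1}{\mu_i(1 - \mu_i)} B'B\right], \qquad \hat{A}_1 = N^{-1} \sum_{i=1}^{N} \left[\frac{1}{\hat{\mu}_i(1 - \hat{\mu}_i)} \hat{B}'\hat{B}\right], \tag{D.14} \]
where $B = \int_{-\infty}^{+\infty} g_i \phi(g_i\theta) f(a_1 \mid y_2, z)\, da_1$, and
\[ r_{i1}(\theta; \gamma) = s_i(\theta; \gamma) - F_1 r_{i2}(\gamma), \qquad \hat{r}_{i1}(\hat{\theta}; \hat{\gamma}) = \hat{s}_i(\hat{\theta}; \hat{\gamma}) - \hat{F}_1 \hat{r}_{i2}(\hat{\gamma}), \tag{D.15} \]
where $r_{i1}(\theta; \gamma)$ and $s_i(\theta; \gamma)$ are $(K + 2) \times 1$ matrices, $r_{i2}(\gamma)$ and $F_1$ are $(L + 1) \times 1$ and $(K + 2) \times (L + 1)$ matrices, and $A_1$ is a $(K + 2) \times (K + 2)$ matrix.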
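The matrix $F_1$ is:
\[ F_1 = E[\nabla_{\gamma} s_i(\theta; \gamma)] = \left[\, E\nabla_{\delta_2} s_i(\theta; \gamma) \;\middle|\; E\nabla_{\delta_0} s_i(\theta; \gamma) \,\right], \]
\[ \hat{F}_1 = \frac{1}{N} \sum_{i=1}^{N} \left[\, -[\hat{\mu}_i(1 - \hat{\mu}_i)]^{-1} \hat{B}' \int_{-\infty}^{+\infty} \Phi(g_i\hat{\theta}) \frac{\partial f(a_1 \mid y_2, z)}{\partial \delta_2}\, da_1 \;\middle|\; -[\hat{\mu}_i(1 - \hat{\mu}_i)]^{-1} \hat{B}' \int_{-\infty}^{+\infty} \Phi(g_i\hat{\theta}) \frac{\partial f(a_1 \mid y_2, z)}{\partial \delta_0}\, da_1 \,\right], \tag{D.16} \]
where
\[ E\nabla_{\delta_2} s_i(\theta; \gamma) = E\left[\frac{-1}{\mu_i(1 - \mu_i)} B' \int_{-\infty}^{+\infty} \Phi(g_i\theta) \frac{\partial f(a_1 \mid y_2, z)}{\partial \delta_2}\, da_1\right], \]
\[ E\nabla_{\delta_0} s_i(\theta; \gamma) = E\left[\frac{-1}{\mu_i(1 - \mu_i)} B' \int_{-\infty}^{+\infty} \Phi(g_i\theta) \frac{\partial f(a_1 \mid y_2, z)}{\partial \delta_0}\, da_1\right], \]
\[ \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_2} = \frac{z_i \exp(P)\, C\, [\delta_0 + \exp(z_i\delta_2)]^{(y_{2i} + \delta_0 - 1)}}{\Gamma(y_{2i} + \delta_0)}, \tag{D.17} \]
in which $P = -\exp(z_i\delta_2 + a_1) + a_1(y_{2i} + \delta_0) - \delta_0 \exp(a_1)$ and $C = \{(y_{2i} + \delta_0)\exp(z_i\delta_2) - \exp(z_i\delta_2 + a_1)[\delta_0 + \exp(z_i\delta_2)]\}$, and
\[ \frac{\partial f(a_1 \mid y_2, z)}{\partial \delta_0} = f(a_1 \mid y_2, z)\, D, \tag{D.18} \]
in which
\[ D = a_1 - \exp(a_1) + \ln(\delta_0 + \exp(z_i\delta_2)) + \frac{y_{2i} + \delta_0}{\delta_0 + \exp(z_i\delta_2)} - \frac{\Gamma'(y_{2i} + \delta_0)}{\Gamma(y_{2i} + \delta_0)} \]
and
\[ f(a_1 \mid y_2, z) = \frac{\exp(P)\,[\delta_0 + \exp(z\delta_2)]^{(y_2 + \delta_0)}}{\Gamma(y_2 + \delta_0)}. \]
Therefore, we can obtain the asymptotic variance of the two-step estimator as:
\[ \text{Avar}\,\sqrt{N}(\hat{\theta} - \theta) = A_1^{-1}\, \text{Var}[r_{i1}(\theta; \gamma)]\, A_1^{-1}, \tag{D.19} \]
and the estimator of this variance is:
\[ \widehat{\text{Avar}}(\hat{\theta}) = \frac{1}{N} \hat{A}_1^{-1} \left(N^{-1} \sum_{i=1}^{N} \hat{r}_{i1} \hat{r}_{i1}'\right) \hat{A}_1^{-1}. \tag{D.20} \]
The asymptotic standard errors are obtained as the square roots of the diagonal elements of this matrix.

D.1.2 Asymptotic Variance for the APEs

First, we need to obtain the asymptotic variance of $\sqrt{N}(\hat{\psi} - \psi)$ for a continuous explanatory variable, where
\[ \hat{\psi} = \left[\int_{-\infty}^{+\infty} \phi(g\hat{\theta}) f(a_1 \mid y_2, z; \hat{\theta})\, da_1\right] \hat{\theta} \tag{D.21} \]
is the vector of scaled coefficients times the scaled factor in the APE section, and
\[ \psi = \left[\int_{-\infty}^{+\infty} \phi(g\theta) f(a_1 \mid y_2, z; \theta)\, da_1\right] \theta \]
is the vector of scaled population coefficients times the mean response.

If $y_2$ is treated as a continuous variable:
\[ \widehat{APE} = \left[\int_{-\infty}^{+\infty} \phi(\hat{\alpha}_1 y_2 + z_1\hat{\delta}_1 + \hat{\eta}_1 a_1) f(a_1 \mid y_2, z; \hat{\theta})\, da_1\right] \hat{\alpha}_1. \]
For a continuous variable $z_{11}$:
\[ \widehat{APE} = \left[\int_{-\infty}^{+\infty} \phi(\hat{\alpha}_1 y_2 + z_1\hat{\delta}_1 + \hat{\eta}_1 a_1) f(a_1 \mid y_2, z; \hat{\theta})\, da_1\right] \hat{\delta}_{11}. \tag{D.22} \]
Using Problem 12.12 in Wooldridge (2002), and letting $\hat{\pi} = (\hat{\theta}', \hat{\delta}_2', \hat{\delta}_0)'$, we have:
\[ \sqrt{N}(\hat{\psi} - \psi) = N^{-1/2} \sum_{i=1}^{N} [j(g_i, z_i, \pi) - \psi] + E[\nabla_{\pi} j(g_i, z_i, \pi)] \sqrt{N}(\hat{\pi} - \pi) + o_p(1), \tag{D.23} \]
where
\[ j(g_i, z_i, \pi) = \left[\int_{-\infty}^{+\infty} \phi(g_i\theta) f(a_1 \mid y_2, z; \theta)\, da_1\right] \theta, \]
and
\[ f(a_1 \mid y_2, z) = f(a_1; \delta_0, \delta_0) \left[\frac{\delta_0 + \exp(z\delta_2)}{\delta_0 + \exp(z\delta_2 + a_1)}\right]^{\delta_0 + y_2} [\exp(a_1)]^{y_2}. \]
First, we need to find $\sqrt{N}(\hat{\pi} - \pi)$:
\[ \sqrt{N}(\hat{\pi} - \pi) = N^{-1/2} \sum_{i=1}^{N} \begin{pmatrix} A_1^{-1} r_{i1} \\ r_{i2} \end{pmatrix} + o_p(1), \qquad \sqrt{N}(\hat{\pi} - \pi) = N^{-1/2} \sum_{i=1}^{N} k_i + o_p(1). \tag{D.24} \]
Thus the asymptotic variance of $\sqrt{N}(\hat{\psi} - \psi)$ is:
\[ \text{Var}\left\{\left[\int_{-\infty}^{+\infty} \phi(g_i\theta) f(a_1 \mid y_{2i}, z_i)\, da_1\right] \theta - \psi + J(\pi) k_i\right\}, \tag{D.25} \]
where $J(\pi) = E[\nabla_{\pi} j(g_i, z_i, \pi)]$.

Next, we need to find $\nabla_{\theta} j(g_i, z_i, \pi)$, $\nabla_{\delta_2} j(g_i, z_i, \pi)$ and $\nabla_{\delta_0} j(g_i, z_i, \pi)$:
\[ \nabla_{\theta} j(g_i, z_i, \pi) = \left[\int_{-\infty}^{+\infty} \phi(g_i\theta) f(a_1 \mid y_{2i}, z_i)\, da_1\right] I_{K+2} - \int_{-\infty}^{+\infty} \phi(g_i\theta)(g_i\theta)(\theta g_i) f(a_1 \mid y_{2i}, z_i)\, da_1, \tag{D.26} \]
where $I_{K+2}$ is the identity matrix and $(K + 2)$ is the dimension of $\theta$;
\[ \nabla_{\delta_2} j(g_i, z_i, \pi) = \theta \int_{-\infty}^{+\infty} \phi(g_i\theta) \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_2}\, da_1, \tag{D.27} \]
where $\partial f(a_{1i} \mid y_{2i}, z_i)/\partial \delta_2$ is defined in (D.17); and
\[ \nabla_{\delta_0} j(g_i, z_i, \pi) = \theta \int_{-\infty}^{+\infty} \phi(g_i\theta) \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_0}\, da_1, \tag{D.28} \]
where $\partial f(a_{1i} \mid y_{2i}, z_i)/\partial \delta_0$ is defined in (D.18). $\nabla_{\delta_2} j(g_i, z_i, \pi)$ is a $(K + 2) \times L$ matrix and $\nabla_{\delta_0} j(g_i, z_i, \pi)$ is a $(K + 2) \times 1$ matrix. Then,
\[ \nabla_{\pi} j(g_i, z_i, \pi) = \left[\nabla_{\theta} j(g_i, z_i, \pi; \theta) \mid \nabla_{\delta_2} j(g_i, z_i, \pi; \delta_2) \mid \nabla_{\delta_0} j(g_i, z_i, \pi; \delta_0)\right], \tag{D.29} \]
and its expected value is estimated as:
\[ \hat{J} = J(\hat{\pi}) = N^{-1} \sum_{i=1}^{N} \left[\nabla_{\theta} j(g_i, z_i, \hat{\pi}; \hat{\theta}) \mid \nabla_{\delta_2} j(g_i, z_i, \hat{\pi}; \hat{\delta}_2) \mid \nabla_{\delta_0} j(g_i, z_i, \hat{\pi}; \hat{\delta}_0)\right]. \tag{D.30} \]
Finally, $\text{Avar}[\sqrt{N}(\hat{\psi} - \psi)]$ is consistently estimated as:
\[ \widehat{\text{Avar}}\left[\sqrt{N}(\hat{\psi} - \psi)\right] = N^{-1} \sum_{i=1}^{N} \left\{\left[\int_{-\infty}^{+\infty} \phi(g_i\hat{\theta}) f(a_1 \mid y_{2i}, z_i)\, da_1\right] \hat{\theta} - \hat{\psi} + \hat{J}\hat{k}_i\right\} \times \left\{\left[\int_{-\infty}^{+\infty} \phi(g_i\hat{\theta}) f(a_1 \mid y_{2i}, z_i)\, da_1\right] \hat{\theta} - \hat{\psi} + \hat{J}\hat{k}_i\right\}', \tag{D.31} \]
where all quantities are evaluated at the estimators given above.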
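The asymptotic standard error for any particular APE is obtained as the square root of the corresponding diagonal element of (D.31), divided by $\sqrt{N}$.

Now we obtain the asymptotic variance of $\sqrt{N}(\hat{\lambda} - \lambda)$ for a count endogenous variable, where:
\[ APE = E_{a_1}\left[\Phi(\alpha_1 y_2^{k+1} + z_1\delta_1 + \eta_1 a_1) - \Phi(\alpha_1 y_2^{k} + z_1\delta_1 + \eta_1 a_1)\right]. \tag{D.32} \]
For example, $y_2^{k} = 0$ and $y_2^{k+1} = 1$. Then
\[ \widehat{APE} = \int_{-\infty}^{+\infty} \Phi(g_i^{k+1}\hat{\theta}) f(a_1 \mid y_2, z; \hat{\theta})\, da_1 - \int_{-\infty}^{+\infty} \Phi(g_i^{k}\hat{\theta}) f(a_1 \mid y_2, z; \hat{\theta})\, da_1, \]
\[ \begin{aligned}
\text{Var}\left[\sqrt{N}(\hat{\lambda} - \lambda)\right] &= \text{Var}\left\{\sqrt{N}\left[(\hat{\lambda}_{k+1} - \hat{\lambda}_k) - (\lambda_{k+1} - \lambda_k)\right]\right\} \\
&= \text{Var}\left[\sqrt{N}(\hat{\lambda}_{k+1} - \lambda_{k+1})\right] + \text{Var}\left[\sqrt{N}(\hat{\lambda}_k - \lambda_k)\right] - 2\,\text{Cov}\left[\sqrt{N}(\hat{\lambda}_{k+1} - \lambda_{k+1}), \sqrt{N}(\hat{\lambda}_k - \lambda_k)\right].
\end{aligned} \tag{D.33} \]
(1) We start with:
\[ \sqrt{N}(\hat{\lambda}_k - \lambda_k) = N^{-1/2} \sum_{i=1}^{N} \left[j(g_i^k, z_i, \pi) - \lambda_k\right] + E[\nabla_{\pi} j(g_i^k, z_i, \pi)] \sqrt{N}(\hat{\pi} - \pi) + o_p(1), \tag{D.34} \]
where $j(g_i^k, z_i, \pi) = \int_{-\infty}^{+\infty} \Phi(g_i^k\theta) f(a_1 \mid y_{2i}, z_i)\, da_1$. Then
\[ \widehat{\text{Var}}\left[\sqrt{N}(\hat{\lambda}_k - \lambda_k)\right] = N^{-1} \sum_{i=1}^{N} \left[\int_{-\infty}^{+\infty} \Phi(g_i^k\hat{\theta}) f(a_1 \mid y_{2i}, z_i)\, da_1 - \hat{\lambda}_k + \hat{J}\hat{k}_i\right]^2, \tag{D.35} \]
in which $\hat{k}_i$ is the same as in (D.24) and $\hat{J}$ is defined as follows:
\[ \hat{J} = J(\hat{\pi}) = N^{-1} \sum_{i=1}^{N} \left[\nabla_{\theta} j(g_i^k, z_i, \hat{\pi}; \hat{\theta}) \mid \nabla_{\delta_2} j(g_i^k, z_i, \hat{\pi}; \hat{\delta}_2) \mid \nabla_{\delta_0} j(g_i^k, z_i, \hat{\pi}; \hat{\delta}_0)\right], \tag{D.36} \]
\[ \nabla_{\theta} j(g_i^k, z_i, \pi; \theta) = \int_{-\infty}^{+\infty} g_i^k \phi(g_i^k\theta) f(a_1 \mid y_{2i}, z_i)\, da_1, \tag{D.37} \]
\[ \nabla_{\delta_2} j(g_i^k, z_i, \pi; \delta_2) = \int_{-\infty}^{+\infty} \Phi(g_i^k\theta) \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_2}\, da_1, \tag{D.38} \]
\[ \nabla_{\delta_0} j(g_i^k, z_i, \pi; \delta_0) = \int_{-\infty}^{+\infty} \Phi(g_i^k\theta) \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_0}\, da_1. \tag{D.39} \]
(2) $\text{Var}[\sqrt{N}(\hat{\lambda}_{k+1} - \lambda_{k+1})]$ is obtained in the same way as in (1).

(3) Using the formula $\text{Cov}(x, y) = E(xy) - E(x)E(y)$, forming the estimator of this covariance and noting that $E(\hat{\lambda}_k) = \lambda_k$, after some algebra the estimator of this covariance is 0.

Adding (1), (2) and (3) together, we get:
\[ \widehat{\text{Var}}\left[\sqrt{N}(\hat{\lambda} - \lambda)\right] = \widehat{\text{Var}}\left[\sqrt{N}(\hat{\lambda}_k - \lambda_k)\right] + \widehat{\text{Var}}\left[\sqrt{N}(\hat{\lambda}_{k+1} - \lambda_{k+1})\right]. \tag{D.40} \]
The asymptotic standard error for the APE of the count endogenous variable is obtained as the square root of the corresponding diagonal element of (D.40), divided by $\sqrt{N}$.

D.2 Details of the Tobit Model’s Estimators

This appendix shows how to obtain the average partial effects for Tobit models in both cases, where $y_2$ is assumed exogenous and endogenous respectively. Following the Smith-Blundell (1986) approach, the model with endogenous $y_2$ is written as:
\[ y_1 = \max(0, \alpha_1 y_2 + z_1\delta_1 + v_2\xi_1 + e_1), \tag{D.41} \]
where the reduced form of $y_2$ is:
\[ y_2 = z\pi_2 + v_2, \qquad v_2 \mid z \sim \text{Normal}(0, \Sigma_2), \tag{D.42} \]
and $e_1 \mid z, v_2 \sim \text{Normal}(0, \sigma_e^2)$. The conditional mean of $y_1$ is:
\[ E(y_1 \mid z, y_2, v_2) = \Phi\left[(\alpha_1 y_2 + z_1\delta_1 + v_2\xi_1)\big/\sqrt{1 + \sigma_e^2}\right] = \Phi(\alpha_{1e} y_2 + z_1\delta_{1e} + v_2\xi_{1e}). \]
The Blundell-Smith procedure for estimating $\alpha_1$, $\delta_1$, $\xi_1$ and $\sigma_e^2$ will then be:

(i) Run the OLS regression of $y_{i2}$ on $z_i$ and save the residuals $\hat{v}_{i2}$, $i = 1, 2, \ldots, N$.

(ii) Do Tobit of $y_{i1}$ on $y_{i2}$, $z_{1i}$ and $\hat{v}_{i2}$ to get $\hat{\alpha}_{1e}$, $\hat{\delta}_{1e}$ and $\hat{\xi}_{1e}$, $i = 1, 2, \ldots, N$.

APEs for the Tobit model with an exogenous or endogenous variable are obtained as follows.

* APE in Tobit Model with exogenous variable $y_2$
\[ y_1 = \max(0, y_1^*), \qquad y_1^* = \alpha_1 y_2 + z_1\delta_1 + a_1, \qquad a_1 \mid y_2, z_1 \sim N(0, \sigma^2). \]
The conditional mean is:
\[ E(y_1 \mid z_1, y_2) = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})(\alpha_1 y_2 + z_1\delta_1) + \sigma\phi(\alpha_{1s} y_2 + z_1\delta_{1s}), \tag{D.43} \]
where $\alpha_{1s} = \alpha_1/\sigma$ and $\delta_{1s} = \delta_1/\sigma$. We define $E(y_1 \mid z_1, y_2) = m(y_2, z_1, \theta_{1s}, \theta_1)$.

For a continuous variable $y_2$:
\[ APE = \frac{\partial E(y_1 \mid z_1, y_2)}{\partial y_2} = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})\alpha_1. \tag{D.44} \]
The estimator for this APE is:
\[ \widehat{APE} = \frac{1}{N} \sum_{i=1}^{N} \Phi(\hat{\alpha}_{1s} y_{2i} + z_{1i}\hat{\delta}_{1s})\hat{\alpha}_1. \tag{D.45} \]
For a discrete variable $y_2$ with the two values $c$ and $c + 1$:
\[ APE = m(y_{2i} = c + 1) - m(y_{2i} = c), \tag{D.46} \]
and the estimator for this APE is:
\[ \widehat{APE} = \frac{1}{N} \sum_{i=1}^{N} \left[\hat{m}(y_{2i} = c + 1) - \hat{m}(y_{2i} = c)\right], \tag{D.47} \]
where $\hat{m}(y_{2i} = c) = \Phi(\hat{\alpha}_{1s} c + z_{1i}\hat{\delta}_{1s})(\hat{\alpha}_1 c + z_{1i}\hat{\delta}_1) + \hat{\sigma}\phi(\hat{\alpha}_{1s} c + z_{1i}\hat{\delta}_{1s})$.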
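The APE formulas for the exogenous case can be illustrated with a short numerical sketch. It is a minimal example under stated assumptions, not the estimation code used in the dissertation: the function names are hypothetical, the index $z_{1i}\hat{\delta}_1$ is assumed to be precomputed and passed in as `z1_index`, and the parameter values are taken as given rather than estimated.

```python
import math


def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tobit_mean(index, sigma):
    """Conditional mean as in (D.43): Phi(index/sigma)*index + sigma*phi(index/sigma)."""
    scaled = index / sigma
    return std_normal_cdf(scaled) * index + sigma * std_normal_pdf(scaled)


def ape_continuous(y2, z1_index, alpha1, sigma):
    """Estimator in the spirit of (D.45): average of Phi(scaled index) * alpha1.

    z1_index holds the precomputed values z_1i * delta_1 (an assumption of this sketch).
    """
    terms = [std_normal_cdf((alpha1 * y + zd) / sigma) * alpha1
             for y, zd in zip(y2, z1_index)]
    return sum(terms) / len(terms)


def ape_discrete(z1_index, alpha1, sigma, c):
    """Estimator in the spirit of (D.47): average of m(y2 = c + 1) - m(y2 = c)."""
    terms = [tobit_mean(alpha1 * (c + 1) + zd, sigma) - tobit_mean(alpha1 * c + zd, sigma)
             for zd in z1_index]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    y2 = [0, 1, 2, 1]                      # illustrative values only
    z1_index = [0.2, -0.5, 1.0, 0.3]       # illustrative values only
    print(ape_continuous(y2, z1_index, alpha1=-0.4, sigma=1.2))
    print(ape_discrete(z1_index, alpha1=-0.4, sigma=1.2, c=0))
```

The derivative of `tobit_mean` with respect to its index equals $\Phi(\text{index}/\sigma)$, which is why the continuous APE in (D.44) reduces to the normal distribution function times $\alpha_1$.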
* APE in Tobit Model with endogenous $y_2$ (Blundell-Smith 1986)
\[ y_1 = \max(0, y_1^*), \qquad y_1^* = \alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1 + e_1 = \alpha_1 y_2 + z_1\delta_1 + u_1, \]
\[ y_2 = z\delta_2 + a_1, \qquad \text{Var}(a_1) = \sigma^2, \qquad e_1 \mid z, a_1 \sim N(0, \tau_1^2). \]
The standard method is to obtain APEs by computing the derivatives or the differences of:
\[ E_{a_1}\left[m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2)\right], \tag{D.48} \]
where $E_{a_1}[m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2)] = m(\alpha_1 y_2 + z_1\delta_1, \eta_1^2\sigma^2 + \tau_1^2)$. The conditional mean is:
\[ E(y_1 \mid z_1, y_2) = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})(\alpha_1 y_2 + z_1\delta_1) + \sqrt{\eta_1^2\sigma^2 + \tau_1^2}\, \phi(\alpha_{1s} y_2 + z_1\delta_{1s}), \tag{D.49} \]
where $\alpha_{1s} = \alpha_1\big/\sqrt{\eta_1^2\sigma^2 + \tau_1^2}$ and $\delta_{1s} = \delta_1\big/\sqrt{\eta_1^2\sigma^2 + \tau_1^2}$. We define:
\[ E(y_1 \mid z_1, y_2) = m(\alpha_1 y_2 + z_1\delta_1, \eta_1^2\sigma^2 + \tau_1^2). \tag{D.50} \]
Consistent estimators of the APEs result from the derivatives or the differences of $m(\hat{\alpha}_1 y_2 + z_1\hat{\delta}_1, \hat{\eta}_1^2\hat{\sigma}^2 + \hat{\tau}_1^2)$ with respect to the elements of $(z_1, y_2)$, where $\hat{\sigma}^2$ is the estimate of the error variance from the first-stage OLS regression.

APE with respect to $z_1$:
\[ \widehat{APE} = N^{-1} \sum_{i=1}^{N} \Phi(\hat{\alpha}_{1s} y_{2i} + z_{1i}\hat{\delta}_{1s})\hat{\delta}_{11}, \tag{D.51} \]
and APE with respect to $y_2$:
\[ \widehat{APE} = N^{-1} \sum_{i=1}^{N} \left[\hat{m}(y_{2i} = c + 1) - \hat{m}(y_{2i} = c)\right], \tag{D.52} \]
where $\hat{m}(y_{2i} = c) = \Phi(\hat{\alpha}_{1s} c + z_{1i}\hat{\delta}_{1s})(\hat{\alpha}_1 c + z_{1i}\hat{\delta}_1) + \sqrt{\hat{\eta}_1^2\hat{\sigma}^2 + \hat{\tau}_1^2}\, \phi(\hat{\alpha}_{1s} c + z_{1i}\hat{\delta}_{1s})$.

An alternative method is to get APEs by computing the derivatives or the differences of:
\[ E_{a_1}\left[m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2)\right], \tag{D.53} \]
where $m(z_1, y_2, a_1, \tau_1^2) = m(x, \tau_1^2) = \Phi(x/\tau_1)x + \tau_1\phi(x/\tau_1)$.

APE with respect to $z_1$:
\[ \widehat{APE} = N^{-1} \sum_{i=1}^{N} \Phi\left(x\big/\sqrt{\tau_1^2}\right)\delta_{11}. \tag{D.54} \]
APE with respect to $y_2$:
\[ \widehat{APE} = N^{-1} \sum_{i=1}^{N} [m_1 - m_0], \tag{D.55} \]
where $m_1 = m[y_2 = 1]$, $m_0 = m[y_2 = 0]$, $x = \hat{\alpha}_1 y_2 + z_1\hat{\delta}_1 + \hat{\eta}_1\hat{a}_1$, and $\hat{a}_1$ is the residual obtained from the first-stage estimation. For more details, see the Blundell-Smith procedure and the APEs in Wooldridge (2002, chapter 16).

D.3 Formula of the NLS estimation

In order to compare the NLS and the QML estimation, the basic framework is introduced below.
The first stage is to estimate $\delta_2$ and $\delta_0$ by using the step-wise maximum likelihood of $y_{i2}$ on $z_i$ in the Negative Binomial model. Obtain the estimated parameters $\hat{\delta}_2$ and $\hat{\delta}_0$.

In the second stage, instead of using QMLE, we use the NLS of $y_{i1}$ on $y_{i2}$, $z_{i1}$ to estimate $\alpha_1$, $\delta_1$ and $\eta_1$ with the approximated conditional mean $\mu_i(\theta; y_2, z)$. The NLS estimator of $\theta$ solves:
\[ \min_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} \left[y_{1i} - \int_{-\infty}^{+\infty} \Phi(\alpha_1 y_{2i} + z_{1i}\delta_1 + \eta_1 a_1) f(a_1 \mid y_2, z)\, da_1\right]^2, \]
or
\[ \min_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} [y_{1i} - \mu_i(\theta; y_{2i}, z_i)]^2/2. \]
The score function can be written as:
\[ s_i = -(y_{1i} - \mu_i) \int_{-\infty}^{+\infty} g_i' \phi(g_i\theta) f(a_1 \mid y_2, z)\, da_1. \tag{D.56} \]

D.4 Derivation of the Heterogeneity Distribution

We are given $\exp(a_1)$ distributed as $\text{Gamma}(\delta_0, 1/\delta_0)$ using a single parameter $\delta_0$. We are interested in obtaining the density function of $Y = a_1$. Let $X = \exp(a_1)$. The density function of $X$ is specified as follows:
\[ f(X; \delta_0) = \frac{\delta_0^{\delta_0} X^{\delta_0 - 1} \exp(-\delta_0 X)}{\Gamma(\delta_0)}; \qquad X > 0, \; \delta_0 > 0. \tag{D.57} \]
Since $X > 0$ and $Y = \ln(X)$, $dX/dY = \exp(Y)$ and $Y \in (-\infty, \infty)$. The density function of $Y$ is derived as:
\[ f(Y; \delta_0) = f[h(Y)] \left|\frac{dX}{dY}\right|; \qquad Y \in (-\infty, \infty), \tag{D.58} \]
where
\[ f[h(Y)] = \frac{\delta_0^{\delta_0} \exp(Y)^{\delta_0 - 1} \exp[-\delta_0 \exp(Y)]}{\Gamma(\delta_0)}. \]
Plugging in $Y = a_1$, we get:
\[ f(Y; \delta_0) = \frac{\delta_0^{\delta_0} \exp(a_1)^{\delta_0} \exp[-\delta_0 \exp(a_1)]}{\Gamma(\delta_0)}, \tag{D.59} \]
which is equation (1.4).

Appendix E

TECHNICALITIES FOR CHAPTER 2

E.1 Asymptotic Variance of the Two-step Estimator

If the null hypothesis of no endogeneity and no serial correlation in the first stage is rejected, the standard errors in the second stage should be adjusted for the first-stage estimation by using the delta method or bootstrapping. In addition, we also need to get asymptotic standard errors for the average partial effects.

We start with the linear reduced form in the first stage:
\[ y_{2it} = w_{2it}^* \gamma_2 + v_{2it}^*, \tag{E.1} \]
where $w_{2it}^* = (z_{it}, z_i)$ is a $1 \times (2L)$ vector of exogenous variables.
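Under standard regularity conditions, we have:
\[ \sqrt{N}(\hat{\gamma}_2 - \gamma_2) = N^{-1/2} \sum_{i=1}^{N} \pi_{i2}(\gamma_2) + o_p(1), \tag{E.2} \]
where
\[ \pi_{i2} = A_2^{-1} B_{2i}' v_{2i}^*, \tag{E.3} \]
$B_{2i}$ is the $T \times (2L)$ matrix with $t$th row $w_{2it}^*$, $A_2 = E(B_{2i}' B_{2i})$, and $v_{2i}^*$ is a $T \times 1$ vector of reduced form errors.

Now we can write:
\[ \begin{aligned}
M &= E(y_{1it} \mid z_i, y_{1i,t-1}, y_{2it}, y_{1i0}, w_{2i}^*), \\
M &= m\left[\rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + \theta_2 y_{1i0} + z_i\theta_3 + (y_{2i}^* - w_{2i}^*\gamma_2)\theta_4 + \theta_1(y_{2it}^* - w_{2it}^*\gamma_2),\; \sigma_{s*}^2\right], \\
M &= m\left[\alpha y_{2it} + w_{3it}\lambda_3 + \theta_1(y_{2it}^* - w_{2it}^*\gamma_2) + (y_{2i}^* - w_{2i}^*\gamma_2)\theta_4,\; \sigma_{s*}^2\right],
\end{aligned} \tag{E.4} \]
where $w_{3it} = (y_{1i,t-1}, x_{it}, y_{i0}, z_i)$, $\sigma_{s*}^2 = \sigma_{a_1}^{*2} + \sigma_{e_1}^{*2}$ and $\lambda_3 = (\rho, \beta', \theta_2, \theta_3')'$.

We collect all the parameters in $M$ except for $\gamma_2$ into the parameter vector $\lambda^*$ and, with some abuse of notation, let $w_{it} = (y_{2it}, w_{3it}, v_{2it}^*, v_{2i}^*)$ in this part. In the previous part we use $w_{it}^*$.

With maximum likelihood in the second stage, the log likelihood for observation $i$ in time period $t$ is:
\[ l_{it}(\lambda^*; \sigma_{s*}^2) = 1[y_{1it} = 0] \log[1 - \Phi(w_{it}\lambda^*/\sigma_{s*})] + 1[y_{1it} > 0]\left\{\log \phi[(y_{1it} - w_{it}\lambda^*)/\sigma_{s*}] - \log(\sigma_{s*}^2)/2\right\}. \tag{E.5} \]
Using the notation $\Phi(w_i\lambda^*/\sigma_{s*}) = \Phi_i$ and $\phi(w_i\lambda^*/\sigma_{s*}) = \phi_i$, and noting that the constant does not affect the maximization, we can rewrite this log likelihood as:
\[ l_i(\lambda^*; \sigma_{s*}^2) = 1[y_{1i} = 0] \log(1 - \Phi_i) - 1[y_{1i} > 0]\left\{\frac{1}{2}(y_{1i} - w_i\lambda^*)^2/\sigma_{s*}^2 + \frac{1}{2}\log(\sigma_{s*}^2)\right\}, \tag{E.6} \]
and we have the score as:
\[ s_i(\lambda^*; \gamma_2) = \begin{pmatrix} s_{i1} \\ s_{i2} \end{pmatrix} = \begin{pmatrix} \nabla_{\lambda^*} l_i \\ \nabla_{\sigma_{s*}^2} l_i \end{pmatrix}, \tag{E.7} \]
with
\[ s_{i1} = -1[y_{1i} = 0]\, \frac{\phi_i w_i'}{\sigma_{s*}(1 - \Phi_i)} + 1[y_{1i} > 0]\, \frac{(y_{1i} - w_i\lambda^*) w_i'}{\sigma_{s*}^2}, \]
\[ s_{i2} = 1[y_{1i} = 0]\, \frac{\phi_i w_i\lambda^*}{2\sigma_{s*}^3(1 - \Phi_i)} + 1[y_{1i} > 0]\left\{\frac{(y_{1i} - w_i\lambda^*)^2}{2\sigma_{s*}^4} - \frac{1}{2\sigma_{s*}^2}\right\}. \]
With the two-step M-estimator, the asymptotic variance of $\sqrt{N}(\hat{\lambda}^* - \lambda^*)$ must be adjusted to account for the first-stage estimation of $\sqrt{N}(\hat{\gamma}_2 - \gamma_2)$ (see Section 12.4.2 of Chapter 12 in Wooldridge, 2002).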
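The Tobit log likelihood in (E.6) and its score can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: the function names are hypothetical, the linear index $w_i\lambda^*$ is passed in as a scalar `xb`, and the score is expressed with respect to that index and to $\sigma_{s*}^2$ (multiplying the index derivative by $w_i'$ would give the score with respect to $\lambda^*$). The derivatives coded here are those implied by differentiating (E.6) directly.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tobit_loglik(y, xb, sigma2):
    """Log-likelihood contribution in the form of (E.6), constants dropped.

    y is the observed outcome (0 when censored), xb the index w_i * lambda.
    """
    sigma = math.sqrt(sigma2)
    if y == 0:
        return math.log(1.0 - normal_cdf(xb / sigma))
    return -0.5 * (y - xb) ** 2 / sigma2 - 0.5 * math.log(sigma2)


def tobit_score(y, xb, sigma2):
    """Derivatives of the contribution with respect to xb and sigma2."""
    sigma = math.sqrt(sigma2)
    if y == 0:
        u = xb / sigma
        ratio = normal_pdf(u) / (1.0 - normal_cdf(u))
        d_index = -ratio / sigma
        d_sigma2 = ratio * xb / (2.0 * sigma ** 3)
    else:
        d_index = (y - xb) / sigma2
        d_sigma2 = (y - xb) ** 2 / (2.0 * sigma2 ** 2) - 1.0 / (2.0 * sigma2)
    return d_index, d_sigma2


if __name__ == "__main__":
    step = 1e-6
    for y, xb, s2 in [(0.0, 0.3, 1.7), (2.1, 0.3, 1.7)]:   # illustrative values
        analytic = tobit_score(y, xb, s2)
        numeric = ((tobit_loglik(y, xb + step, s2) - tobit_loglik(y, xb - step, s2)) / (2 * step),
                   (tobit_loglik(y, xb, s2 + step) - tobit_loglik(y, xb, s2 - step)) / (2 * step))
        print(analytic, numeric)
```

Each printed pair compares the analytic derivatives with central finite differences; the two should agree closely for both censored and uncensored observations.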
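We can write:
\[ \sqrt{N}(\hat{\lambda}^* - \lambda^*) = A_1^{-1}\left[N^{-1/2} \sum_{i=1}^{N} \pi_{i1}(\lambda^*; \gamma_2)\right] + o_p(1), \tag{E.8} \]
where
\[ A_1 = E[-\nabla_{\lambda^*} s_{i1}(\lambda^*; \gamma_2)], \tag{E.9} \]
and
\[ \nabla_{\lambda^*} s_{i1}(\lambda^*; \gamma_2) = -\sigma_{s*}^{-2}\left\{1[y_{1i} = 0]\, \frac{\phi_i^2 - \phi_i(1 - \Phi_i)(w_i\lambda^*/\sigma_{s*})}{(1 - \Phi_i)^2} + 1[y_{1i} > 0]\right\} w_i' w_i, \]
and
\[ \pi_{i1}(\lambda^*; \gamma_2) = s_{i1}(\lambda^*; \gamma_2) - F_1 \pi_{i2}(\gamma_2), \tag{E.10} \]
where $F_1 = E[\nabla_{\gamma_2} s_{i1}(\lambda^*; \gamma_2)]$, or
\[ F_1 = -E\left\{\sum_{t=1}^{T} \left[(1 - \sigma_{s*})\phi_{it} w_{it}\lambda^* + \Phi_{it}\right](\theta_1 w_{2it}^* + \theta_4 w_{2i}^*)\right\}. \tag{E.11} \]
Therefore, we get:
\[ \text{Avar}\,\sqrt{N}(\hat{\lambda}^* - \lambda^*) = A_1^{-1} V A_1^{-1}, \tag{E.12} \]
where $V = \text{Var}[\pi_{i1}(\lambda^*; \gamma_2)]$. A valid estimator of $\text{Avar}\,\sqrt{N}(\hat{\lambda}^* - \lambda^*)$ is:
\[ \hat{A}_{11} = \hat{A}_1^{-1}\left[N^{-1} \sum_{i=1}^{N} \hat{\pi}_{i1}\hat{\pi}_{i1}'\right]\hat{A}_1^{-1}, \tag{E.13} \]
where
\[ \hat{A}_1 = N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\sigma}_{s*}^{-2}\left\{1[y_{1it} = 0]\, \frac{\hat{\phi}_{it}^2 - \hat{\phi}_{it}(1 - \hat{\Phi}_{it})(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*})}{(1 - \hat{\Phi}_{it})^2} + 1[y_{1it} > 0]\right\} w_{it}' w_{it}, \]
and $\hat{\pi}_{i1} = \hat{s}_{i1} - \hat{F}_1\hat{\pi}_{i2}$, in which $\hat{\pi}_{i2} = \hat{A}_2^{-1} B_{2i}' \hat{v}_{2i}^*$ and
\[ \hat{F}_1 = -N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[(1 - \hat{\sigma}_{s*})\hat{\phi}_{it} w_{it}\hat{\lambda}^* + \hat{\Phi}_{it}\right](\hat{\theta}_1 w_{2it}^* + \hat{\theta}_4 w_{2i}^*), \]
and the asymptotic variance of $\hat{\lambda}^*$ is:
\[ \text{Avar}(\hat{\lambda}^*) = \hat{A}_1^{-1}\hat{Q}\hat{A}_1^{-1}/N = \hat{A}_{11}, \tag{E.14} \]
where $\hat{Q} = N^{-1} \sum_{i=1}^{N} \hat{\pi}_{i1}\hat{\pi}_{i1}'$.

We can derive $\text{Avar}[\sqrt{N}(\hat{\sigma}_{s*}^2 - \sigma_{s*}^2)]$ by following the procedure above for the derivation of $\text{Avar}\,\sqrt{N}(\hat{\lambda}^* - \lambda^*)$, and get $A_{22}$. Denoting $\Psi \equiv (\lambda^{*\prime}, \sigma_{s*}^2)'$, we can derive $\text{Avar}[\sqrt{N}(\hat{\Psi} - \Psi)]$ as:
\[ \text{Avar}\left[\sqrt{N}(\hat{\Psi} - \Psi)\right] = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \tag{E.15} \]
where $\hat{A}_{22} = \hat{A}_2^{-1}\hat{Q}\hat{A}_2^{-1}/N$ and
\[ A_2 = -\sigma^{-4}\left\{(w_i\lambda^*/\sigma_{s*})^3\phi_i + (w_i\lambda^*/\sigma_{s*})\phi_i - \left[(w_i\lambda^*/\sigma_{s*})\phi_i^2/(1 - \Phi_i)\right] - 2\Phi_i\right\}/4, \]
and $\hat{A}_{12} = \hat{A}_{12}^{-1}\hat{Q}\hat{A}_{12}^{-1}/N$ and
\[ A_{12} = \sigma^{-3}\left\{(w_i\lambda^*/\sigma_{s*})^2\phi_i + \phi_i - \left[(w_i\lambda^*/\sigma_{s*})\phi_i^2/(1 - \Phi_i)\right]\right\} w_i/2. \]

E.2 Asymptotic Variance of the Average Partial Effects

Next, we obtain the standard errors for the average partial effects as in equations (2.21) and (2.22):
\[ \hat{\varphi} = \hat{\lambda}^*\left[(NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \Phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*})\right], \tag{E.16} \]
where $w_{it} = (y_{2it}, w_{3it}, v_{2it}^*, v_{2i}^*)$, and
\[ \varphi = \lambda^*\, T^{-1} \sum_{t=1}^{T} E\left[\Phi(w_{it}\lambda^*/\sigma_{s*})\right]. \]
Then we need to compute the asymptotic variance of $\sqrt{N}(\hat{\varphi} - \varphi)$. Let $\hat{\mu} = (\hat{\lambda}^{*\prime}; \hat{\gamma}_2')'$ and
\[ p(w_{it}, w_{2it}^*, \mu) \equiv \left(T^{-1} \sum_{t=1}^{T} \Phi\left(\left[\alpha y_{2it} + w_{3it}\lambda_3 + \theta_1(y_{2it}^* - w_{2it}^*\gamma_2) + (y_{2i}^* - w_{2i}^*\gamma_2)\theta_4\right]/\sigma_{s*}\right)\right)\lambda^*; \tag{E.17} \]
we have:
\[ \sqrt{N}(\hat{\varphi} - \varphi) = N^{-1/2} \sum_{i=1}^{N} \left\{\lambda^*\left(T^{-1} \sum_{t=1}^{T} \Phi(w_{it}\lambda^*/\sigma_{s*})\right) - \varphi\right\} + E[\nabla_{\mu} p]\sqrt{N}(\hat{\mu} - \mu) + o_p(1), \tag{E.18} \]
in which:
\[ \sqrt{N}(\hat{\mu} - \mu) = N^{-1/2} \sum_{i=1}^{N} D_i + o_p(1), \tag{E.19} \]
where
\[ D_i = \begin{pmatrix} A_1^{-1}\pi_{i1} \\ \pi_{i2} \end{pmatrix} \]
and all matrix definitions were introduced in step 1. A valid estimator of $D_i$ is:
\[ \hat{D} = \begin{pmatrix} \hat{A}_1^{-1}\hat{\pi}_{i1} \\ \hat{\pi}_{i2} \end{pmatrix}. \tag{E.20} \]
Therefore, the asymptotic variance of $\sqrt{N}(\hat{\varphi} - \varphi)$ is:
\[ \Omega = \text{Var}\left\{\lambda^*\left(T^{-1} \sum_{t=1}^{T} \Phi(w_{it}\lambda^*/\sigma_{s*})\right) - \varphi + PD\right\}, \tag{E.21} \]
in which $\Omega = \text{Var}(K + PD)$, where $K = \lambda^*\left(T^{-1} \sum_{t=1}^{T} \Phi(w_{it}\lambda^*/\sigma_{s*})\right) - \varphi$. Hence, we can get
\[ \hat{K} = \hat{\lambda}^*\left(T^{-1} \sum_{t=1}^{T} \Phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*})\right) - \hat{\varphi}. \tag{E.22} \]
The last job is to find the Jacobian $P$, where $P = E[\nabla_{\mu} p]$ and $P = [P_1 \mid P_2]$:
\[ P_1 = \nabla_{\lambda^*} p = T^{-1} \sum_{t=1}^{T} \left[\phi(w_{it}\lambda^*/\sigma_{s*})(w_{it}\lambda^*/\sigma_{s*}) + \Phi(w_{it}\lambda^*/\sigma_{s*})\right], \]
and
\[ \hat{P}_1 = T^{-1} \sum_{t=1}^{T} \left[\phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*})(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*}) + \Phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*})\right], \]
or in short:
\[ \hat{P}_1 = T^{-1} \sum_{t=1}^{T} \left[\phi(\hat{\omega})\hat{\omega} + \Phi(\hat{\omega})\right], \tag{E.23} \]
where $\hat{\omega} = w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*}$;
\[ P_2 = \nabla_{\gamma_2} p = T^{-1} \sum_{t=1}^{T} \phi(w_{it}\lambda^*/\sigma_{s*})(-\theta_1 w_{2it}^* - w_{2i}^*\theta_4)\lambda^*/\sigma_{s*}, \]
and
\[ \hat{P}_2 = T^{-1} \sum_{t=1}^{T} \phi(\hat{\omega})(-\hat{\theta}_1 w_{2it}^* - w_{2i}^*\hat{\theta}_4)\hat{\lambda}^*/\hat{\sigma}_{s*}. \tag{E.24} \]
Therefore
\[ \hat{P} = N^{-1} \sum_{i=1}^{N} \left[\, T^{-1} \sum_{t=1}^{T} \left[\phi(\hat{\omega})\hat{\omega} + \Phi(\hat{\omega})\right] \;\middle|\; T^{-1} \sum_{t=1}^{T} \phi(\hat{\omega})(-\hat{\theta}_1 w_{2it}^* - w_{2i}^*\hat{\theta}_4)\hat{\lambda}^*/\hat{\sigma}_{s*} \,\right]. \tag{E.25} \]
Finally, $\text{Avar}[\sqrt{N}(\hat{\varphi} - \varphi)]$ is consistently estimated as:
\[ \hat{\Omega} = N^{-1} \sum_{i=1}^{N} (\hat{K} + \hat{P}\hat{D})(\hat{K} + \hat{P}\hat{D})'. \tag{E.26} \]
The asymptotic standard error for any particular APE is obtained as the square root of the corresponding diagonal element in the above expression, divided by $\sqrt{N}$.

Appendix F

TECHNICALITIES FOR CHAPTER 3

Derivation of Maximum likelihood estimator in the first stage and the Asymptotic Variance in the second stage

F.1 Bivariate Probit Model in the First Stage

In the first stage, we estimate equation (3.11) and equation (3.12) simultaneously and get the log likelihood as in equation (3.23). Note that the model is qualitatively different from the usual bivariate probit model.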
In the simultaneous equations model (3.11)-(3.12), the second dependent variable $y_{3it}$ appears on the right-hand side of the equation for the dependent variable $y_{2it}$. One can derive the following conditional mean and obtain the corresponding marginal effects of interest:
\[ E(y_{2it} \mid W_i) = \Pr[y_{3it} = 1 \mid W_i]\, E[y_{2it} \mid y_{3it} = 1, W_i] + \Pr[y_{3it} = 0 \mid W_i]\, E[y_{2it} \mid y_{3it} = 0, W_i], \tag{F.1} \]
where
\[ E[y_{2it} \mid y_{3it} = 1, W_i] = \Pr[y_{2it} = 1 \mid y_{3it} = 1, W_i], \tag{F.2} \]
and
\[ E[y_{2it} \mid y_{3it} = 0, W_i] = \Pr[y_{2it} = 1 \mid y_{3it} = 0, W_i]. \tag{F.3} \]
Therefore
\[ E(y_{2it} \mid W_i) = \Phi_2(W_{3it}\gamma_3, W_{2it}\gamma_2 + \alpha_2; \rho) + \Phi_2(-W_{3it}\gamma_3, W_{2it}\gamma_2; -\rho). \tag{F.4} \]
To obtain the derivatives and Hessian, let us rewrite the log likelihood in a convenient way with $q_{i2} = 2y_{2i} - 1$ and $q_{i3} = 2y_{3i} - 1$ (which results in $q_{im} = 1$ if $y_{mi} = 1$ and $q_{im} = -1$ if $y_{mi} = 0$, for $m = 2, 3$):
\[ \ln L_{it} = \ln \Phi_2(k_{i2}, k_{i3}; \pi), \tag{F.5} \]
where $k_{im} = q_{im} W_{mit}\gamma_m$ for $m = 2, 3$ (here, with some abuse of notation, $\gamma_2 = (\gamma_2', \alpha_2)'$) and $\pi = q_{i2} q_{i3} \rho$.

The score function and the information matrix resulting from equation (F.5) are derived as follows:
\[ s_{it}(\theta_1) = \nabla_{\theta_1} \ln L_{it}(\theta_1) = \begin{pmatrix} \partial \ln L_{it}(\theta_1)/\partial \gamma_3 \\ \partial \ln L_{it}(\theta_1)/\partial \gamma_2 \\ \partial \ln L_{it}(\theta_1)/\partial \alpha_2 \\ \partial \ln L_{it}(\theta_1)/\partial \rho \end{pmatrix}, \tag{F.6} \]
and
\[ I(\theta_1) = -E\left[\nabla_{\theta_1}^2 \ln L_{it}(\theta_1)\right]. \tag{F.7} \]
We have:
\[ \partial \ln L_{it}(\theta_1)/\partial \gamma_3 = \Phi_2^{-1}(k_{i2}, k_{i3}; \pi)(q_{i3} W_{3it}) g_{i3}, \tag{F.8} \]
where $g_{i3} = \phi(k_{i3})\, \Phi\left[(k_{i2} - \pi k_{i3})(1 - \pi^2)^{-1/2}\right]$;
\[ \partial \ln L_{it}(\theta_1)/\partial \gamma_2 = \Phi_2^{-1}(k_{i2}, k_{i3}; \pi)(q_{i2} W_{2it}) g_{i2}, \tag{F.9} \]
where $g_{i2} = \phi(k_{i2})\, \Phi\left[(k_{i3} - \pi k_{i2})(1 - \pi^2)^{-1/2}\right]$;
\[ \partial \ln L_{it}(\theta_1)/\partial \alpha_2 = \Phi_2^{-1}(k_{i2}, k_{i3}; \pi)\, q_{i2}\, g_{i2}, \tag{F.10} \]
and
\[ \partial \ln L_{it}(\theta_1)/\partial \rho = \Phi_2^{-1}(k_{i2}, k_{i3}; \pi)\, q_{i2} q_{i3}\, \phi_2(k_{i2}, k_{i3}; \pi). \tag{F.11} \]
Therefore, the asymptotic variance of $\hat{\theta}_1$ is:
\[ \text{Avar}(\hat{\theta}_1) = C^{-1} V C^{-1}/N, \tag{F.12} \]
where
\[ C = N^{-1} \sum_{i=1}^{N} I(\theta_1), \tag{F.13} \]
and
\[ V = N^{-1} \sum_{i=1}^{N} s_{it}(\theta_1) s_{it}(\theta_1)'. \tag{F.14} \]
As a result, the estimator of the asymptotic variance of $\hat{\theta}_1$ is:
\[ \widehat{\text{Avar}}(\hat{\theta}_1) = \hat{C}^{-1}\hat{V}\hat{C}^{-1}/N, \tag{F.15} \]
and
\[ \sqrt{N}(\hat{\theta}_1 - \theta_1) \xrightarrow{d} \text{Normal}(0, C^{-1} V C^{-1}), \tag{F.16} \]
or
\[ \sqrt{N}(\hat{\theta}_1 - \theta_1) = N^{-1/2} \sum_{i=1}^{N} r_i(\theta_1) + o_p(1), \tag{F.17} \]
where
\[ r_i(\theta_1) = -I(\theta_1)^{-1} s_i(\theta_1), \tag{F.18} \]
and
\[ \hat{r}_i(\hat{\theta}_1) \equiv -I(\hat{\theta}_1)^{-1} s_i(\hat{\theta}_1). \tag{F.19} \]

F.2 Asymptotic Variance of the Two-step Estimator

The asymptotic variance of the second-stage parameters, $\theta_2$, needs to be corrected for general heteroskedasticity, serial correlation and first-stage estimation of $\theta_1$ using the delta method, as shown in Wooldridge (1995a) and Wooldridge (2002, chapter 12).

For $y_{2it} = 1$, we define the general regressors for time period $t$ as:
\[ \hat{w}_{it} = (W_{1it}, y_{3it}, 0, \ldots, 0, \hat{\lambda}_{it1}, 0, \ldots, 0, 0, \ldots, 0, \hat{\lambda}_{it2}, 0, \ldots, 0, 0, \ldots, 0, \hat{\lambda}_{it3}, 0, \ldots, 0, 0, \ldots, 0, \hat{\lambda}_{it4}, 0, \ldots, 0), \]
and the parameter vector in the second stage is:
\[ \theta_2 = (\gamma_1', \alpha_1, \eta_{11}, \ldots, \eta_{T1}, \eta_{12}, \ldots, \eta_{T2}, \eta_{13}, \ldots, \eta_{T3}, \eta_{14}, \ldots, \eta_{T4})', \]
which is a $G \times 1$ vector where $G = (1 + K_1 + L + 4T)$.

We can write $E[\log(y_{1it}) \mid w_{it}, y_{2it} = 1] = w_{it}\theta_2$; then we have $\log(y_{1it}) = w_{it}\theta_2 + \varepsilon_{it}$, where $E[\varepsilon_{it} \mid w_{it}, y_{2it} = 1] = 0$ $(t = 1, \ldots, T)$. On the selected sample, our POLS estimator is:
\[ \hat{\theta}_2 = \left(N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it}\hat{w}_{it}'\hat{w}_{it}\right)^{-1}\left(N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it}\hat{w}_{it}'\log(y_{1it})\right), \tag{F.20} \]
\[ \hat{\theta}_2 = \theta_2 + \left(N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it}\hat{w}_{it}'\hat{w}_{it}\right)^{-1}\left(N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it}\hat{w}_{it}'\varepsilon_{it}\right), \tag{F.21} \]
and it can be shown that:
\[ \sqrt{N}(\hat{\theta}_2 - \theta_2) \xrightarrow{d} \text{Normal}(0, A^{-1} B A^{-1}), \tag{F.22} \]
where
\[ A = E\left[\sum_{t=1}^{T} y_{2it} w_{it}' w_{it}\right], \tag{F.23} \]
\[ B = \text{Var}(h_i) = E(h_i h_i') \quad \text{and} \quad h_i = s_i - F r_i, \tag{F.24} \]
in which
\[ s_i = \sum_{t=1}^{T} y_{2it} w_{it}'\varepsilon_{it}, \tag{F.25} \]
\[ F = E\left[\sum_{t=1}^{T} y_{2it} w_{it}'\theta_2'\nabla_{\theta_1} w_{it}(\theta_1)'\right], \tag{F.26} \]
in which $\nabla_{\theta_1} w_{it}(\theta_1)'$ is a $G \times Q$ gradient of $w_{it}(\theta_1)'$ evaluated at $\theta_1$, and $r_i$ is defined in the previous part.

To estimate $\text{Avar}(\hat{\theta}_2) = A^{-1} B A^{-1}/N$, we obtain:
\[ \hat{A} \equiv N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it}\hat{w}_{it}'\hat{w}_{it}, \tag{F.27} \]
\[ \hat{F} \equiv N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it}\hat{w}_{it}'\hat{\theta}_2'\nabla_{\theta_1}\hat{w}_{it}(\hat{\theta}_1)', \tag{F.28} \]
and, for each $i = 1, \ldots, N$,
\[ \hat{s}_i \equiv \sum_{t=1}^{T} y_{2it}\hat{w}_{it}'\hat{\varepsilon}_{it}, \tag{F.29} \]
in which $\hat{\varepsilon}_{it} = \log(y_{1it}) - \hat{w}_{it}\hat{\theta}_2$, and
\[ \hat{h}_i = \hat{s}_i - \hat{F}\hat{r}_i. \tag{F.30} \]
A consistent estimator of $B$ is:
\[ \hat{B} \equiv N^{-1} \sum_{i=1}^{N} \hat{h}_i\hat{h}_i'. \tag{F.31} \]
The asymptotic variance of $\hat{\theta}_2$ is estimated as:
\[ \widehat{\text{Avar}}(\hat{\theta}_2) = \hat{A}^{-1}\hat{B}\hat{A}^{-1}/N, \tag{F.32} \]
and the asymptotic standard errors are obtained as the square roots of the diagonal elements of this matrix.

BIBLIOGRAPHY

Abadie, Alberto. 2000. Semiparametric estimation of instrumental variable models for causal effects. Working Paper 260. National Bureau of Economic Research.
Amemiya, Takeshi. 1978. The estimation of a simultaneous equation generalized probit model. Econometrica 46(5). 1193–1205.
Amemiya, Takeshi. 1979. The estimation of a simultaneous equation tobit model. International Economic Review 20(1). 169–81.
Angrist, Joshua D. 2001. Estimation of limited-dependent variable models with dummy endogenous regressors: Simple strategies for empirical practice. Journal of Business and Economic Statistics 19(1). 2–16.
Angrist, Joshua D. & William N. Evans. 1998. Children and their parents’ labor supply: Evidence from exogenous variation in family size. American Economic Review 88(3). 450–77.
Arellano, Manuel & Olympia Bover. 1995. Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68(1). 29–51.
Arellano, Manuel & Bo Honore. 2001. Panel data models: Some recent developments. In J.J. Heckman & E.E. Leamer (eds.), Handbook of econometrics, vol. 5 Handbook of Econometrics, chap. 53, 3229–3296. Elsevier.
Baltagi, Badi H. & Qi Li. 1991. A transformation that will circumvent the problem of autocorrelation in an error-component model. Journal of Econometrics 48(3). 385–393.
Baltagi, Badi H. & Ping X. Wu. 1999. Unequally spaced panel data regressions with AR(1) disturbances. Econometric Theory 15(06). 814–823.
Becker, Gary S. & H. Gregg Lewis. 1973. On the interaction between the quantity and quality of children. Journal of Political Economy 81(2). S279–88.
Ben-Porath, Yoram & Finis Welch. 1976. Do sex preferences really matter?
The Quarterly Journal of Economics 90(2). 285–307.
Bhargava, Alok & J. D. Sargan. 1983. Estimating dynamic random effects models from panel data covering short time periods. Econometrica 51(6). 1635–59.
Bloom, David, David Canning, Günther Fink & Jocelyn Finlay. 2009. Fertility, female labor force participation, and the demographic dividend. Journal of Economic Growth 14(2). 79–101.
Blundell, Richard W. & James L. Powell. 2004. Endogeneity in semiparametric binary response models. Review of Economic Studies 71. 655–679.
Blundell, Richard W. & Richard J. Smith. 1989. Estimation in a class of simultaneous equation limited dependent variable models. Review of Economic Studies 56(1). 37–57.
Bronars, Stephen G. & Jeff Grogger. 2001. The effect of welfare payments on the marriage and fertility behavior of unwed mothers: Results from a twins experiment. Journal of Political Economy 109(3). 529–545.
Browning, Martin. 1992. Children and household economic behavior. Journal of Economic Literature 30(3). 1434–75.
Cain, Glen G. & Martin D. Dooley. 1976. Estimation of a model of labor supply, fertility, and wages of married women. Journal of Political Economy 84(4). S179–99.
Cameron, Colin A. & Pravin K. Trivedi. 1986. Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics 1(1). 29–53.
Card, David & Daniel G. Sullivan. 1988. Measuring the effect of subsidized training programs on movements in and out of employment. Econometrica 56(3). 497–530.
Carrasco, Raquel. 2001. Binary choice with binary endogenous regressors in panel data: Estimating the effect of fertility on female labor participation. Journal of Business and Economic Statistics 19(4). 385–394.
Chamberlain, Gary. 1980. Analysis of covariance with qualitative data. Review of Economic Studies 47(1). 225–38.
Chamberlain, Gary. 1992. Sequential moment restrictions in panel data: Comment. Journal of Business and Economic Statistics 10(1).
20–26.
Chay, Kenneth Y. & Dean Hyslop. 1998. Identification and estimation of dynamic binary response panel data models: Empirical evidence using alternative approaches.
Das, Mitali. 2002. Estimators and inference in a censored regression model with endogenous covariates. Discussion papers. Columbia University.
Das, Mitali. 2005. Instrumental variables estimators of nonparametric models with discrete endogenous regressors. Journal of Econometrics 124(2). 335–361.
Eckstein, Zvi & Kenneth I. Wolpin. 1990. Estimating a market equilibrium search model from panel data on individuals. Econometrica 58(4). 783–808.
Even, William E. 1987. Career interruptions following childbirth. Journal of Labor Economics 5(2). 255–77.
Fleisher, Belton M. & George F. Rhodes, Jr. 1979. Fertility, women’s wage rates, and labor supply. American Economic Review 69(1). 14–24.
Giles, J. & I. Murtazashvili. 2010. A control function approach to estimating dynamic probit models with endogenous regressors, with an application to the study of poverty persistence in China.
Greene, William H. 1997. Econometric analysis. New York: Macmillan 3rd edn.
Gronau, Reuben. 1973. The intrafamily allocation of time: The value of the housewives’ time. American Economic Review 63(4). 634–51.
Hausman, Jerry A. 1978. Specification tests in econometrics. Econometrica 46(6). 1251–71.
Heckman, James J. 1974. Effects of child-care programs on women’s work effort. Journal of Political Economy 82(2). S136–S163.
Heckman, James J. 1978a. Dummy endogenous variables in a simultaneous equation system. Econometrica 46(4). 931–59.
Heckman, James J. 1978b. Simple statistical models for discrete panel data developed and applied to test the hypothesis of true state dependence against the hypothesis of spurious state dependence. In Manski C.E. & Daniel L. McFadden (eds.), The econometrics of panel data 30/31, 227–269. University of Chicago.
Heckman, James J. 1979. Sample selection bias as a specification error.
Econometrica 47(1). 153–61.
Heckman, James J. 1981a. Heterogeneity and state dependence. In Studies in labor markets NBER Chapters, 91–140. National Bureau of Economic Research, Inc.
Heckman, James J. 1981b. The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process. In Manski C.E. & Daniel L. McFadden (eds.), Structural analysis of discrete panel data with econometric applications, MIT press.
Heckman, James J., Robert J. Lalonde & Jeffrey A. Smith. 1999. The economics and econometrics of active labor market programs. In O. Ashenfelter & D. Card (eds.), Handbook of labor economics, vol. 3, chap. 31, 1865–2097. Elsevier.
Heckman, James J. & Thomas E. Macurdy. 1980. A life cycle model of female labour supply. Review of Economic Studies 47(1). 47–74.
Heckman, James J. & Robert J. Willis. 1974. Estimation of a stochastic model of reproduction: An econometric approach. NBER Working Papers 0034 National Bureau of Economic Research, Inc.
Heckman, James J. & Robert J. Willis. 1977. A beta-logistic model for the analysis of sequential labor force participation by married women. Journal of Political Economy 85(1). 27–58.
Honore, Bo E. 1993. Orthogonality conditions for tobit models with fixed effects and lagged dependent variables. Journal of Econometrics 59(1-2). 35–61.
Honore, Bo E. & Luojia Hu. 2001. Estimation of censored regression models with endogeneity.
Honore, Bo E. & Luojia Hu. 2004. Estimation of cross sectional and panel data censored regression models with endogeneity. Journal of Econometrics 122(2). 293–316.
Honore, Bo E. & Ekaterini Kyriazidou. 2000. Panel data discrete choice models with lagged dependent variables. Econometrica 68(4). 839–874.
Hsiao, Cheng. 1986. Analysis of panel data. Cambridge, MA: Cambridge University Press.
Hyslop, Dean R. 1999. State dependence, serial correlation and heterogeneity in intertemporal labor force participation of married women.
Econometrica 67(6). 1255–1294.
Jacobsen, Joyce P., James Wishart Pearce III & Joshua L. Rosenbloom. 1999. The effects of childbearing on married women’s labor supply and earnings: Using twin births as a natural experiment. Journal of Human Resources 34(3). 449–474.
Kim, Jungho & Arnstein Aassve. 2006. Fertility and its consequence on family labour supply. IZA Discussion Papers 2162 Institute for the Study of Labor (IZA).
Kim, Kyoo Il. 2006. Sample selection models with a common dummy endogenous regressor in simultaneous equations: A simple two-step estimation. Economics Letters 91(2). 280–286.
Kyriazidou, Ekaterini. 2001. Estimation of dynamic panel data sample selection models. Review of Economic Studies 68(3). 543–72.
Labeaga, Jose M. 1999. A double-hurdle rational addiction model with heterogeneity: Estimating the demand for tobacco. Journal of Econometrics 93(1). 49–72.
Lee, Lung-fei. 1999. Estimation of dynamic and arch tobit models. Journal of Econometrics 92(2). 355–390.
Lee, Myoung-jee. 1996. Methods of moments and semiparametric econometrics for limited dependent variable models. Springer.
Lehrer, Evelyn L. 1992. The impact of children on married women’s labor supply: Black-white differentials revisited. Journal of Human Resources 27(3). 422–444.
Mullahy, J. 1997. Instrumental-variable estimation of count data models: Applications to models of cigarette smoking behavior. Review of Economics and Statistics 79. 586–93.
Mundlak, Yair. 1978. On the pooling of time series and cross section data. Econometrica 46(1). 69–85.
Nakamura, Alice & Masao Nakamura. 1992. The econometrics of female labor supply and children. Econometric Reviews 11(1). 1–71.
Nelson, Forrest & Lawrence Olson. 1978. Specification and estimation of a simultaneous-equation model with limited dependent variables. International Economic Review 19(3). 695–709.
Newey, Whitney K. 1985. Semiparametric estimation of limited dependent variable models with endogenous explanatory variables.
Annales de l’inséé 59/60. Newey, Whitney K. 1986. Linear instrumental variable estimation of limited dependent variable models with endogenous explanatory variables. Journal of Econometrics 32(1). 127–141. 129 Newey, Whitney K. 1987. Efficient estimation of limited dependent variable models with endogenous explanatory variables. Journal of Econometrics 36(3). 231–250. Newey, Whitney K. & Daniel L. McFadden. 1994. Large sample estimation and hypothesis testing. In Robert F. Engle & Daniel L. McFadden (eds.), Handbook of econometrics, vol. 4, chap. 36, 2111 – 2245. Elsevier. Nguyen, Hoa B. 2010. Estimating a fractional response model with a count endogenous regressor and an application to female labor supply. In William H. Greene & R. Carter Hill (eds.), Advances in econometrics, vol. 26, 253–298. Emerald Group Publishing Limited. Papke, Leslie E. & Jeffrey M. Wooldridge. 1996. Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics 11(6). 619 – 32. Papke, Leslie E. & Jeffrey M. Wooldridge. 2008. Panel data methods for fractional response variables with an application to test pass rates. Journal of Econometrics 145(1-2). 121 – 133. Rivers, Douglas & Quang H. Vuong. 1988. Limited information estimators and exogeneity tests for simultaneous probit models. Journal of Econometrics 39(3). 347–366. Rosenzweig, Mark R. & Kenneth I. Wolpin. 1980. Life-cycle labor supply and fertility: Causal inferences from household models. Journal of Political Economy 88(2). 328–48. Schultz, T. Paul. 1978. Fertility and child mortality over the life cycle: Aggregate and individual evidence. American Economic Review 68(2). 208–15. Semykina, Anastasia & Jeffrey M. Wooldridge. 2010. Estimating panel data models in the presence of endogeneity and selection. Journal of Econometrics 157(2). 375–380. Shaw, Kathryn. 1994. The persistence of female labor supply: Empirical evidence and implications. 
Journal of Human Resources 29(2). 348–378. Skrondal, Anders & Rabe-Hesketh Sophia. 2004. Generalized latent variable modeling: Multilevel, longitudinal and structural equation models. Boca Raton, FL: Chapman and Hall, CRC. Smith, Richard J. & Richard W. Blundell. 1986. An exogeneity test for a simultaneous equation tobit model with an application to labor supply. Econometrica 54(3). 679–85. Staiger, Douglas & James H. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65(3). 557 – 586. Terza, Joseph V. 1998. Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects. Journal of Econometrics 84(1). 129 – 154. Vella, Francis. 1993. A simple estimator for simultaneous models with censored endogenous regressors. International Economic Review 34(2). 441–57. Vella, Francis & Marno Verbeek. 1999. Two-step estimation of panel data models with censored endogenous variables and selection bias. Journal of Econometrics 90(2). 239–263. 130 Vytlacil, Edward. 2002. Independence, monotonicity, and latent index models: An equivalence result. Econometrica 70(1). 331–341. Vytlacil, Edward & Nese Yildiz. 2007. Dummy endogenous variables in weakly separable models. Econometrica 75(3). 757–779. Weiss, Andrew A. 1999. A simultaneous binary choice/count model with an application to credit card approvals. In R. Engle & H. White (eds.), Cointegration, causality, and forecasting: A Festschrift in honour of Clive W. J. Granger, 429 – 461. Oxford and New York: Oxford University Press. Willis, Robert J. 1973. A new approach to the economic theory of fertility behavior. Journal of Political Economy 81(2). S14–64. Winkelmann, Rainer. 2000. Econometric analysis of count data. Berlin: Springer. Wooldridge, Jeffrey M. 1997. Multiplicative panel data models without the strict exogeneity assumption. Econometric Theory 13(5). 667–678. Wooldridge, Jeffrey M. 2002. Econometric analysis of cross section and panel data. 
Cambridge, MA: MIT Press. Wooldridge, Jeffrey M. 2005. Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. Journal of Applied Econometrics 20(1). 39–54. Wooldridge, Jeffrey M. 2010. Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press 2nd edn. 131