THREE ESSAYS ON UNBALANCED PANEL DATA MODELS

By
Do Won Kwak

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Economics
2011

Abstract

This dissertation consists of three chapters on panel data models. The first chapter examines the unconditional logit, conditional logit, and pooled correlated random effects (CRE) logit approaches in binary panel data models with highly dependent data. Simulation results show, first, that the conditional logit method is not robust to violations of the conditional independence (CI) assumption: the bias in the model coefficient is larger for data with a shorter time dimension and higher serial correlation. Second, we find no significant finite-sample bias in the average partial effects (APEs) or the associated rejection frequencies of CRE logit in the presence of high serial correlation, provided that unobserved heterogeneity is correctly specified. Finally, we decompose the bias into a part due to unobserved heterogeneity and a part due to serial correlation, using the unconditional logit and conditional logit methods. Finite-sample biases from both sources are important in binary panel models with highly persistent outcomes and correlated individual fixed effects. As an empirical example, we apply conditional logit and pooled CRE logit to data from the Survey of Income and Program Participation (SIPP), in which welfare participation exhibits high persistence. The results imply that it is important to account for state dependence and unobserved heterogeneity simultaneously.

The second chapter introduces two formal tests of the missing completely at random (MCAR) assumption in unbalanced panel data. A Monte Carlo (MC) simulation provides evidence that the tests have substantial power.
The inverse probability weighted (IPW) estimator and the multiple imputation (MI) estimator under the missing at random (MAR) assumption are studied as methods for handling missing data when the MCAR assumption is violated. We suggest combining the two sequentially: the MI method is applied to non-monotone missing data and the IPW method to monotone missing data. The proposed tests and the MI-IPW method are applied to estimate the effect of class size reduction (CSR) on student scores in grades K-3 using Project STAR. The empirical application indicates that the MCAR assumption is violated; the MI-IPW estimates of the effect of CSR on student achievement scores in grades K-3 are about 4.5 to 6 percent, while the unweighted estimates are about 6 to 7.5 percent.

In the last chapter, we extend the robust inference method with heterogeneous variance and autocorrelation in Hansen (2007) to unbalanced panel data with a large time dimension. Under the homogeneity assumption and random attrition, we derive the robust inference result using weighted least squares (WLS) for unbalanced panel data. We show that our WLS-based inference method for unbalanced panel data yields t-tests that are conservative in the sense of Bakirov and Szekely (2006) in the presence of attrition or heteroskedastic variance. We study the size and power of the proposed inference method in unbalanced panel data using Monte Carlo simulation. The simulation reveals that the proposed WLS method reduces size distortion and improves power relative to the methods of Ibragimov and Muller (2009) and Hansen (2007).

ACKNOWLEDGMENTS

I would never have been able to finish my dissertation without guidance, suggestions and support from several people. First, my deepest appreciation goes to my advisor, Jeffrey M. Wooldridge. He was readily available at every stage of the dissertation process and always provided his perspective and helpful comments.
I could never have reached the depths of this dissertation without his mentorship and insightful advice. Besides my advisor, I would like to sincerely thank the faculty members at Michigan State University, Todd Elder, Peter Schmidt, Timothy Vogelsang, and Tapabrata Maiti. Whenever I needed advice about my research, they helped make this dissertation a better work with their insights and suggestions. I have also been very fortunate to have conducted part of my dissertation research at StataCorp as an intern. The simulations in my dissertation would not have been developed as they are now without the computing and data expertise I gained at StataCorp. I would like to extend special thanks to David Drukker and Rafal Raciborski.

I would like to thank my fellow graduate students. Jeff Brown, Cheol-Keun Cho, Dooyeon Cho, Junghwan Hyun, Myoung-Jin Keay, Jongduk Kim, Jinyoung Lee, Cuicui Lu, Suhyeon Nam, Iraj Rahmani, Seunghwa Rho, Shengwu Shang, Wei-Siang Wang, and Yali Wang had numerous discussions with me, and I appreciate their thoughtful comments and enjoyable conversation.

I would like to express my gratitude to my parents and younger brother for their unwavering love and support. Last but not least, I would like to thank my wife, who was always there cheering me up and stood by me through the good and bad times.

Contents

List of Tables
List of Figures

1 The robustness of conditional logit estimator in a binary panel data model
  1.1 Introduction
  1.2 Model
    1.2.1 Ideal conditions for conditional logit estimator
  1.3 DGP for correlated logistic errors
    1.3.1 Correlated errors from multivariate logistic distribution
  1.4 Monte Carlo experiment
    1.4.1 The procedures of Data Generation Process: a continuous covariate
    1.4.2 The bias and size distortion for coefficient: a continuous covariate
    1.4.3 The bias for APEs and rejection frequency: a continuous covariate
    1.4.4 DGP: A model with both binary and continuous covariates
    1.4.5 The bias and size distortion for δ: binary covariate
    1.4.6 The bias for APEs and rejection frequency: binary covariate
  1.5 Panel data models for married women’s welfare participation
    1.5.1 Static panel data model for married women’s welfare participation
    1.5.2 Dynamic Unobserved effects logit models under strict exogeneity
  1.6 Conclusion

2 The MI-IPW method in an unbalanced panel data model: An empirical application on the effect of class size reduction on SAT scores for students in grades K-3
  2.1 Introduction
  2.2 Linear panel data model with missing and noncompliance
    2.2.1 Model
    2.2.2 Tests of missing completely at random
    2.2.3 Implications from the rejection of tests
  2.3 Formal tests of MCAR
    2.3.1 Hausman-type test I - Comparison of $\theta_{pls}$ and $\theta_{FE}$ using $E(c_i, s_{it}x_{it}) = 0$ as null
    2.3.2 Monte Carlo experiment I
    2.3.3 Hausman-type test II - Comparison of $\theta_{FE}$ and $\theta_{FD}$, Null: strict exogeneity
    2.3.4 Monte Carlo experiment: Test of MCAR using strict exogeneity assumption
    2.3.5 Variable addition test of strict exogeneity
  2.4 Estimations with MAR assumption: IPW and MI-IPW methods
    2.4.1 IPW method
    2.4.2 AIPW estimator
    2.4.3 Multiple imputation
    2.4.4 MI-IPW method
  2.5 Monte Carlo experiment
    2.5.1 DGP I: Correct specification of both outcome and probability of selection models
    2.5.2 DGP II: The correct probability of selection model with endogenous treatment effect
    2.5.3 The robustness of IPW estimator to misspecification of the probability of selection
    2.5.4 The robustness of first-differencing estimator with small within-variation
  2.6 Empirical application: The return to class size reduction (CSR) on SAT score
    2.6.1 Background information on Project STAR experiment of CSR
    2.6.2 Evidence of experimental violation
    2.6.3 The impact of CSR on SAT score for grades in K-3
    2.6.4 Test of strict exogeneity assumption
    2.6.5 Estimation with inverse probability weighted (IPW) and multiple imputation (MI)
    2.6.6 IPW estimation
    2.6.7 MI-IPW method
  2.7 Direction (sign) of the bias for unweighted estimators
    2.7.1 Bound for the effects of small class
  2.8 Conclusion

3 Cluster robust inference in unbalanced panel data when T is large
  3.1 Introduction
  3.2 Model
    3.2.1 Robust inference with t-test in unbalanced panel data
    3.2.2 WLS with heterogeneous covariance matrix in unbalanced panel data
  3.3 Monte Carlo experiment
    3.3.1 The t-test using $t^{*}_{wls}$-statistic with scaled factor under homogeneity and attrition
    3.3.2 The t-test using $t^{*}_{wls}$-statistic with scaled factor under heterogeneous variance and attrition
    3.3.3 Trade-off calculation: Dropping observation to make balanced panel data
  3.4 Conclusion

APPENDICES
  A Gaussian copula density
  B Proof for the consistency of conditional logit estimator in chapter 1
  C Monte Carlo simulation results: Coefficient β for continuous covariate
  D Monte Carlo simulation results: APEs for continuous covariate
  E Monte Carlo simulation results: Coefficient δ for binary covariate
  F Monte Carlo simulation results: APEs for binary covariate
  G Proof of Hausman-type test in chapter 2
  H Proof of Corollary 3 in Chapter 3: Robust inference in unbalanced panel data
  I Monte Carlo simulation results in chapter 3: Size-adjusted power for t-test in Ibragimov and Muller [2009]

List of Tables

  1.1 Basic Descriptive Statistics for the Generated Errors
  1.2 Pairwise correlation of the generated errors for ρ = 0.1
  1.3 Pairwise correlation of the generated errors for ρ = 0.5
  1.4 Pairwise correlation of the generated errors for ρ = 0.9
  1.5 Decomposition of bias for β for n = 100
  1.6 Decomposition of bias for β for n = 200
  1.7 The bias for β due to serial correlation using conditional logit for n = 200
  1.8 Decomposition of bias into two parts using unconditional logit for n = 200
  1.9 The bias of unconditional logit n = 200
  1.10 Decomposition of bias into two parts n = 100
  1.11 Decomposition of bias using unconditional logit for n = 100
  1.12 Basic statistics
  1.13 Logit estimation for married women’s welfare participation
  1.14 Probit estimation for married women’s welfare participation
  1.15 Probit estimation of dynamic model for married women’s welfare participation
  2.1 Hausman-type test I: Null and Alternative ($\mathrm{cov}(s_{it}d_{it}, c_i) = 0$)
  2.2 Hausman-type test I: Null and Alternative ($\mathrm{cov}(s_{it}d_{it}, c_i) = 0$)
  2.3 Pooled LS and FE estimates for α and p-value for $W_a$
  2.4 RE and FE estimates for α and p-value for $W_a$
  2.5 FD and FE estimates for α and p-value for $W_b$
  2.6 Test of strict exogeneity: variable addition test and Hausman test for $W_b$
  2.7 Specification of models for missing variables
  2.8 DGP I: MTE estimates for unweighted estimators
  2.9 DGP I: MTE estimates for IPW estimators
  2.10 DGP I: MTE for estimators with multiple imputations
  2.11 DGP II: MTE for unweighted estimators
  2.12 DGP II: MTE estimates for IPW estimators
  2.13 DGP II: MTE estimates for estimators with multiple imputations
  2.14 Selection model misspecification: MTE for unweighted estimators, δ = 0
  2.15 Selection model misspecification: MTE for IPW estimators, δ = 0
  2.16 Selection model misspecification: MTE for unweighted estimators, δ = 0.2
  2.17 Selection model misspecification: MTE for IPW estimators, δ = 0.2
  2.18 Selection model misspecification: MTE for unweighted estimators, δ = 0.5
  2.19 Selection model misspecification: MTE for IPW estimators, δ = 0.5
  2.20 Selection model misspecification: MTE for unweighted estimators, δ = 1.0
  2.21 Selection model misspecification: MTE for IPW estimators, δ = 1.0
  2.22 Sensitivity analysis to small within-variation: MTE with full within-variation
  2.23 Sensitivity analysis: MTE with within-variation of 50% sample only
  2.24 Sensitivity analysis: MTE with within-variation of 15% sample only
  2.25 Students in each class types for grades in K-3
  2.26 Attrition and reassignments of class types (noncompliance)
  2.27 SAT percentile scores at t − 1 for leavers and stayers
  2.28 Data availability for covariates
  2.29 Data availability by selection indicator
  2.30 Basic statistics: The mean and standard deviation
  2.31 Cross-section unweighted LS and reduced-form estimates
  2.32 Unweighted pooled LS and reduced-form
  2.33 Unweighted FD estimates
  2.34 Hausman-type test II: strict exogeneity
  2.35 Variable addition test: strict exogeneity
  2.36 Cross-section IPW LS and IPW reduced-form estimation
  2.37 IPW pooled LS, pooled IV
  2.38 Weighted FD estimates
  2.39 Covariates and distributional assumption used for MICE
  2.40 MI-IPW pooled LS, pooled IV
  2.41 MI-IPW FD estimates
  2.42 $\mathrm{corr}(v_{it}, s_{it+j})\ \forall\, j \ge 0$
  2.43 Upper bound estimates for cross-section IV estimator
  3.1 Selection process
  3.2 The size of test, 1(p < 0.05)
  3.3 The size adjusted power of test
  3.4 The size adjusted power of test, 1(p < 0.05), T = 100
  3.5 Configuration of heteroskedastic variance
  3.6 The size of test, 1(p < 0.05)
  3.7 The size adjusted power of test, 1(p < 0.05), T = 100
  3.8 The size of test, 1(p < 0.05) where AR(1) error with correlation coefficient 0.75
  3.9 The size adjusted power of test, 1(p < 0.05), T = 50
  C.1 Unconditional logit estimates for β with no serial correlation
  C.2 Conditional logit estimates for β with no serial correlation
  C.3 Unconditional logit estimates for β with serial correlation
  C.4 Conditional logit estimates for β with serial correlation
  C.5 Unconditional logit estimates for β with serial correlation
  C.6 Conditional logit estimates for β with serial correlation
  C.7 Unconditional logit estimates for β with serial correlation
  C.8 Conditional logit estimates for β with serial correlation
  C.9 Unconditional logit estimates for β with serial correlation
  C.10 Conditional logit estimates for β with serial correlation
  D.1 LPM, unconditional logit, CRE logit APE estimates for β
  D.2 LPM, unconditional logit, CRE logit APE estimates for β
  D.3 LPM, unconditional logit, CRE logit APE estimates for β
  D.4 LPM, unconditional logit, CRE logit APE estimates for β
  D.5 LPM, unconditional logit, CRE logit APE estimates for β
  D.6 LPM, unconditional logit, CRE logit APE estimates for β
  D.7 LPM, unconditional logit, CRE logit APE estimates for β
  D.8 LPM, unconditional logit, CRE logit APE estimates for β
  D.9 LPM, unconditional logit, CRE logit APE estimates for β
  D.10 LPM, unconditional logit, CRE logit APE estimates for β
  D.11 LPM, unconditional logit, CRE logit APE estimates for β
  D.12 LPM, unconditional logit, CRE logit APE estimates for β
  E.1 Unconditional logit for δ with low serial correlation for n = 100
  E.2 Unconditional logit for δ with high serial correlation for n = 100
  E.3 Conditional logit for δ with low serial correlation for n = 100
  E.4 Conditional logit for δ with high serial correlation for n = 100
  F.1 LPM, unconditional logit, CRE logit APE estimates for δ
  F.2 LPM, unconditional logit, CRE logit APE estimates for δ
  F.3 LPM, unconditional logit, CRE logit APE estimates for δ
  F.4 LPM, unconditional logit, CRE logit APE estimates for δ
  F.5 LPM, unconditional logit, CRE logit APE estimates for δ
  I.1 The size adjusted power of test, 1(p < 0.05), T = 100
  I.2 The size adjusted power of test, 1(p < 0.05), T = 300
  I.3 The size adjusted power of test, 1(p < 0.05), T = 1,000

List of Figures

  1.1 Bias for β with continuous covariate
  1.2 Bias of unconditional and conditional logit for β with continuous covariate
  1.3 Bias of unconditional logit and CRE-logit estimators for APEs
  1.4 Bias of unconditional and conditional logit estimators for β with binary covariate
  1.5 Unconditional logit and CRE-logit estimates of APEs for binary covariate
  2.1 Bias of unweighted and IPW LS estimator
  2.2 Bias of unweighted and IPW FD estimator to small within variation
  2.3 Estimated $\pi_{it}$ for t = 1, 2, 3
  2.4 Estimated $\pi_{it}$ conditional on $s_{it} = 0$ or $s_{it} = 1$ for t = 1, 2, 3
  2.5 Estimated $\pi_{it}$ and $p_{it}$ for t = 1, 2, 3 with attrition sample
Chapter 1

The robustness of conditional logit estimator in a binary panel data model

1.1 Introduction

Standard maximum likelihood estimators are usually inconsistent in binary panel data models with individual fixed effects when the fixed effects are left unspecified, the cross-section dimension (N) is large, and the number of time periods (T) is fixed. The problem arises because there are many nuisance parameters (the individual fixed effects) to estimate while the time dimension is fixed.^1 For micro panels, this fixed-T identification difficulty can be overcome by using the conditional logit approach if the logistic distributional assumption is true.^2 Chamberlain [2010] showed that conditional logit delivers $\sqrt{n}$-consistent estimation of the parameter of interest if the individual-and-period-specific errors are independent over time and logistic, and the covariates are unbounded.

^1 This situation is called the incidental parameters problem. References on the incidental parameters problem include Neyman and Scott [1948], Lancaster [2000] and Arellano and Hahn [2007]. Individual fixed effects are also called unobserved heterogeneity hereafter.

^2 These types of conditional methods are not available in general non-linear models and cannot be extended to the estimation of average partial effects (APEs), since they make no distributional assumption on unobserved heterogeneity. For general non-linear models, the recent literature focuses on approximately unbiased estimators rather than on estimators with no bias at all. Arellano and Hahn [2007], Hahn and Newey [2004], Arellano [2003], Fernandez-Val [2009] and others provide bias correction methods for non-linear fixed effects panel data models. An extensive review is provided in Arellano and Hahn [2007].

However, these conditions for $\sqrt{n}$-consistency of conditional logit can be very restrictive in micro panel data applications. In particular, the independence assumption is restrictive in applications with highly persistent response variables, such as married women’s labor force participation (Chay and Hyslop [2001]), married women’s welfare participation (Chay and Hyslop [2001]), the incidence of external debt crises in developing countries (Hajivassiliou and McFadden [1998]) and habitual smoking behavior (Collado and Browning [2007]). Moreover, because the fixed-T consistency results for model coefficients cannot be extended to APEs, this becomes a serious limitation in applications where researchers are interested in the effect of a policy on an outcome. As we do not make any distributional assumption on heterogeneity (i.e., individual fixed effects), it is difficult to determine what to plug in for unobserved heterogeneity when calculating the APEs. Nevertheless, the conditional logit approach is useful for estimating log odds ratios and relative effects between covariates.

In this chapter, we use a Monte Carlo (MC) experiment to study how robust the fixed-T consistency of the conditional logit estimator is to violations of the conditional independence (CI) assumption. Specifically, we introduce data generating processes (DGPs) designed to satisfy all the ideal conditions for the conditional logit estimator except CI, so as to isolate the effect of violating CI. For simplicity, the violation of CI in the DGPs is induced by logistic errors with AR(1) serial correlation, generated using a copula. We use a copula in the simulation because it separates the dependence structure over time from the marginal distribution in each time period. Thus, by specifying the marginal distribution of the outcome in each period and using a copula to specify the dependence among these marginal distributions over time, we specify a joint non-normal (logistic) distribution for the outcomes over time.
In the MC experiment, using DGPs with correlated logistic errors, we examine the performance of the conditional logit, correlated random effects (CRE) logit with pooled MLE, and unconditional logit methods.^3 Finally, we quantify the finite-sample bias due to unobserved heterogeneity and the bias due to true state dependence using the unconditional logit and conditional logit estimators. Simulation with highly persistent binary panel data is important for the conditional logit estimator, as well as for approximately unbiased estimators with bias correction, because these methods require the conditional independence assumption but the effect of its violation has not been studied much. When interpreting the simulation results, we focus on the finite-sample bias for the coefficients β and the APEs in (1.1), while in the DGPs the focus is on generating the correlated errors $\{u_{it}\}_{t=1}^{T}$, which are drawn from a T-variate logistic distribution with AR(1) serial correlation.

^3 The correlated random effects logit approach is based on a parametric specification of the distribution of unobserved heterogeneity, as in Chamberlain [1980]. In the simulation and application, we use the following parametric specification for unobserved heterogeneity: $D(c_i \mid x_i) \sim \text{logistic}(\psi + x_i\xi, \sigma_a^2)$. The unconditional logit approach is based on the logit model with MLE, in which the unobserved heterogeneity ($c_i$) is estimated as parameters. For the properties of unconditional logit, see Arellano and Hahn [2007] and Andersen [1973]. For the case of T = 2, Abrevaya [1997] shows that the bias of the unconditional MLE is 100%.

$$ y_{it} = 1[x_{it}\beta + c_i + u_{it} \ge 0], \quad i = 1, 2, \ldots, n, \quad t = 1, 2, \ldots, T \tag{1.1} $$

where n is large, T is fixed with T ≥ 2, $x_{it}$ is continuous or binary, $c_i$ is unobserved heterogeneity that is correlated with $x_{it}$, and $u_{it}$ is the idiosyncratic error.

In the simulation, the conditional logit estimator for β in (1.1) shows finite-sample bias when the CI assumption is violated by serial correlation among the errors.
The finite-sample bias is larger for high ρ (the serial correlation coefficient) and small T (the length of the time dimension). The size distortion of inference (the bias in the rejection frequency) with conditional logit is very severe for data with ρ ≥ 0.4, and it increases with n (the size of the cross-section). CRE logit with pooled MLE shows no finite-sample bias for the APEs or the associated rejection frequency, regardless of the magnitude of ρ. Unconditional logit exhibits bias due to both (inconsistent estimation of) unobserved heterogeneity and serial correlation. For the unconditional logit estimator of β, more of the bias is due to unobserved heterogeneity than to serial correlation in DGPs with ρ ≤ 0.4. For the unconditional logit estimator of the APEs, more of the bias is due to unobserved heterogeneity than to serial correlation in DGPs with ρ ≤ 0.6.

We analyze married women’s welfare participation using the conditional logit model with MLE and the CRE logit model with pooled MLE for β and the APEs, with data on 8 waves for 1,934 married women taken from Chay and Hyslop [2001]. Married women’s welfare participation shows substantial serial persistence. This persistence could be due either to certain individual characteristics or to true state dependence.^4 The estimation results and a comparison of the estimates imply that much of the serial persistence of welfare participation is explained by unobserved heterogeneity (54%), but true state dependence (46%) is also an important source of serial persistence.

^4 In our example of welfare participation, true state dependence corresponds to the "narcotic" effect in Chay and Hyslop [2001]: once a person starts to depend on welfare, participation itself changes the person’s attitude and behavior in a way that leads the person to rely more on welfare.
Therefore, the empirical example shows that properly controlling for true state dependence as well as unobserved heterogeneity is very important when specifying models of married women’s welfare participation.

The rest of the chapter is organized as follows. Section 2 describes the model and the ideal assumptions for the conditional logit method. Section 3 describes the DGPs, which use a copula to satisfy the ideal conditions for the conditional logit model except CI. Section 4 presents finite-sample simulation results for the conditional logit, CRE logit, and unconditional logit methods. Section 5 provides an empirical application to married women’s welfare participation, and Section 6 summarizes the results and concludes.

1.2 Model

The conditional logit estimator allows us to estimate β in (1.1) without making any assumption on unobserved heterogeneity, since the logit transformation separates the estimation of β from the unobserved heterogeneity (Andersen [1973], Chamberlain [1984] and Wooldridge [2010]). However, this comes at the cost of imposing the CI assumption, which is not necessary for competing estimators such as CRE logit with pooled MLE. In this section, we introduce the conditional logit model and the conditions for its ideal performance.

1.2.1 Ideal conditions for conditional logit estimator

Assumption 1 Our baseline model has a latent variable representation.
$$ y_{it} = 1[x_{it}\beta + c_i + u_{it} \ge 0], \qquad y_{it}^{*} = x_{it}\beta + c_i + u_{it}, $$

where $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$; $c_i \sim \text{logistic}(\psi + x_i\xi, \pi^2/6)$; $x_{it}$ is continuous; and $u_{it} \sim \text{logistic}(0, \pi^2/6)$.

Assumption 2 Conditional strict exogeneity (SE)

$$ E(y_{it} \mid x_i, c_i) = E(y_{it} \mid x_{it}, c_i) \quad \text{for each } t, $$
$$ P(y_{it} = 1 \mid x_i, c_i) = P(y_{it} = 1 \mid x_{it}, c_i) \quad \text{for each } t. $$

Assumption 3 (Functional form of the marginal distribution) The conditional mean of the binary response $y_{it}$ is specified as

$$ E(y_{it} \mid x_{it}, c_i) = P(y_{it} = 1 \mid x_{it}, c_i) = \Lambda(x_{it}\beta + c_i) \quad \text{for each } t = 1, 2, \ldots, T, $$

where

$$ \Lambda(x_{it}\beta + c_i) = \frac{\exp(x_{it}\beta + c_i)}{1 + \exp(x_{it}\beta + c_i)}, $$

that is, the conditional marginal response probability for each $y_{it}$ is logistic.

Assumption 4 Conditional independence (CI)

$$ D(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i, c_i) = \prod_{t=1}^{T} D(y_{it} \mid x_{it}, c_i) $$

Assumption 5 Other regularity conditions for the consistency of the estimator of β, such as convex support of β and continuity and differentiability of the objective function. We also assume random sampling over the cross-section.

Theorem Under Assumptions 1-5, the conditional logit estimator for β is consistent.

Proof See appendix.

The proof in the appendix shows that conditioning on the sum of the response variables, which serves as a sufficient statistic, eliminates the unobserved heterogeneity. The resulting $\beta_{c\text{-logit}}$ estimator is consistent and $\sqrt{n}$-asymptotically normal. Chamberlain [2010] also shows that the logit model is the only binary panel model that admits a fixed-T $\sqrt{n}$-consistent estimator for β.

Remark 1 The proof explicitly shows that the CI assumption is required to apply conditional MLE for β. CI can easily be violated when the response variables show state dependence.

1.3 DGP for correlated logistic errors

This section details the data generating processes (DGPs) used in the simulation to study the robustness of conditional logit to violations of the CI assumption. The key is to generate variables that satisfy the ideal conditions for conditional logit except CI.
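To make concrete where CI enters, the following is a minimal sketch of the standard two-period (T = 2) argument. It is not the appendix proof, and the shorthand $a_t$ is introduced here purely for illustration. It uses Assumptions 2 and 3 for the marginal probabilities and Assumption 4 to factor the joint probabilities.

```latex
% Illustrative sketch for T = 2 (a_t is hypothetical shorthand, not the author's notation).
% Let a_t = x_{it}\beta + c_i and \Lambda(a) = e^{a} / (1 + e^{a}).
\begin{align*}
P(y_{i1}=0,\, y_{i2}=1 \mid x_i, c_i)
  &= [1-\Lambda(a_1)]\,\Lambda(a_2)
   = \frac{e^{a_2}}{(1+e^{a_1})(1+e^{a_2})}
   && \text{(factorization uses CI)} \\
P(y_{i1}=1,\, y_{i2}=0 \mid x_i, c_i)
  &= \Lambda(a_1)\,[1-\Lambda(a_2)]
   = \frac{e^{a_1}}{(1+e^{a_1})(1+e^{a_2})}
   && \text{(factorization uses CI)} \\
P(y_{i2}=1 \mid y_{i1}+y_{i2}=1,\, x_i, c_i)
  &= \frac{e^{a_2}}{e^{a_1}+e^{a_2}}
   = \Lambda(a_2-a_1)
   = \Lambda\big((x_{i2}-x_{i1})\beta\big)
   && \text{($c_i$ cancels)}
\end{align*}
```

If the errors are serially correlated, the joint probabilities in the first two lines no longer factor into products of marginals, so the cancellation of $c_i$ in the last line is not guaranteed.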
1.3.1 Correlated errors from multivariate logistic distribution

We introduce dependence among the errors using a copula and the logistic distribution. In general, drawing variables from a multivariate logistic distribution can be very complicated because the joint distribution has to be defined. Using a copula, however, we can draw from a multivariate logistic distribution by specifying only the marginal distribution of each variable and, separately, a copula that describes the intended dependence among the marginals.

Copula

A copula is a function that links univariate marginal CDFs to their multivariate CDF with the intended dependence among the marginal distributions. It is useful for our purpose because any T-dimensional joint distribution function can be decomposed into its T marginal distributions and a copula function that completely describes the dependence structure among them. In our DGP, we use a copula to generate errors from a multivariate logistic distribution in which each marginal error has a standard logistic distribution and the errors have the intended dependence. We specify only the conditional marginal distribution of $z_{it}$ for each period and use a Gaussian copula to specify the dependence among the CDFs of $z_{i1}, z_{i2}, \dots, z_{iT}$. We choose a Gaussian copula because most statistical packages allow users to draw a set of variables from a joint normal distribution with user-specified pairwise covariances. The marginal distribution of each $z_{it}$ and the Gaussian copula for the dependence among the CDFs of $z_{i1}, z_{i2}, \dots, z_{iT}$ are therefore specified as in (1.2):

$$\Phi(z_1, z_2, \dots, z_T) = C(\Phi_1(z_1), \dots, \Phi_t(z_t), \dots, \Phi_T(z_T); \rho) \qquad (1.2)$$

where Φ(·) is the joint normal CDF, $\Phi_t(\cdot)$ is the univariate normal marginal CDF for period t, and C(·; ρ) is the copula function, in which ρ governs the dependence among the marginal distributions.
Then, we transform the normal marginals to logistic marginals in each period as in (1.3). The dependence imposed on the multivariate normal by the Gaussian copula is inherited by the logistic marginal CDFs, since a copula is invariant to increasing transformations. Thus, after transforming the marginal distributions, we obtain (1.4).

$$\Phi_t(z_t) \;\longrightarrow\; F_t(u_t) \quad \text{(transformation of marginal CDF)} \qquad (1.3)$$

$$F(u_1, u_2, \dots, u_T) = C(F_1(u_1), \dots, F_t(u_t), \dots, F_T(u_T); \rho) \qquad (1.4)$$

where F(·) is the joint logistic distribution and $F_t(\cdot)$ is the univariate logistic marginal distribution. Because of this invariance property of the copula, the dependence structure of $\{z_{it}\}_{t=1}^{T}$ is maintained in $\{u_{it}\}_{t=1}^{T}$ after the increasing transformation. The generated $\{u_{it}\}_{t=1}^{T}$ follows a joint logistic distribution with correlation parameter ρ as in (1.2). The joint logistic density $f(u_1, u_2, \dots, u_T)$ and its derivation using the multivariate Gaussian copula are provided in the appendix.

Extension: Generate correlated variables from non-normal distribution

The generation of dependent variables with a Gaussian copula can be extended to other non-normal multivariate distributions. The only requirement is that the intended marginal distribution can be obtained by an increasing transformation of the normal distribution. In our DGPs, for instance, each normal marginal CDF is transformed to a logistic marginal CDF by an increasing transformation, and the Gaussian copula specifies the dependence structure among the marginal distributions. The dependence structure is maintained after the increasing transformation by the invariance property of the copula.

Generation of a random vector from T-joint logistic distribution with dependence structure of AR(1) correlation

Consider a random vector $z = (z_1, z_2, \dots, z_T)$ drawn from a T-variate normal distribution with the AR(1) covariance matrix Cov(z) in (1.5) and $z_t \sim N(0, 1)$ for all t (footnote 5).

$$\mathrm{Cov}(z) = \begin{pmatrix} 1 & \rho & \rho^{2} & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & & \vdots \\ \rho^{T-2} & \rho^{T-3} & \cdots & 1 & \rho \\ \rho^{T-1} & \rho^{T-2} & \cdots & \rho & 1 \end{pmatrix} \qquad (1.5)$$

Footnote 5: We choose the pairwise correlations in (1.5) for simplicity. We have corr(z) = Cov(z) since $z_t$ is standard normal for all t. The main advantage of using a copula is the flexibility to model the marginal distributions and the dependence among them separately. Moreover, the dependence among marginal CDFs need not be limited to second moments or linear relationships; nonlinear and tail dependence are also allowed. In the simulation, however, our focus is serial correlation among the errors, so we use a copula with only a linear dependence specification for simplicity.

Using the Gaussian copula, we can express the T-variate joint normal distribution as in (1.6). Let Φ be the joint normal CDF of $z = (z_1, z_2, \dots, z_T)$ with marginal CDF $\Phi_t$ for each $z_t$. Then, by Sklar's theorem (Sklar [1959]), there exists a unique copula C that connects the marginal CDFs $\Phi_1(z_1), \Phi_2(z_2), \dots, \Phi_T(z_T)$ with the dependence structure of (1.5) such that equation (1.6) is satisfied:

$$\Phi(z_1, z_2, \dots, z_T) = C(\Phi_1(z_1), \dots, \Phi_T(z_T); \mathrm{Cov}(z)) \qquad (1.6)$$

where $\Phi_t$ is the normal CDF for all t = 1, ..., T.

The transformation from a normal marginal distribution to a logistic marginal distribution is performed by

$$u_t = F_t^{-1}(\Phi(z_t)) = \ln\!\left(\frac{\Phi(z_t)}{1 - \Phi(z_t)}\right),$$

where $F_t(\cdot)$ is the logistic CDF. We can rewrite (1.6) as equation (1.7) for a copula C and $u = (u_1, u_2, \dots, u_T) \in \mathbb{R}^T$ such that

$$F(u_1, u_2, \dots, u_T) = C(F_1(u_1), \dots, F_T(u_T); \rho) \qquad (1.7)$$

where F is the joint logistic CDF. Because the dependence captured by a copula is invariant to increasing transformations of the marginal distributions, the dependence structure of z in (1.6) is maintained for $\{u_{it}\}$, t = 1, 2, ..., T, in (1.7) with the same C (footnote 6). Lee [1979] also showed a similar relationship between the normal and logistic distributions for the bivariate case.
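The construction above (draw jointly normal variables with AR(1) covariance, map each to a uniform through Φ, then apply the logistic quantile function) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the dissertation: the function names `ar1_cov` and `draw_logistic_errors` are hypothetical, NumPy is assumed to be available, and the clipping constant is an added numerical safeguard.

```python
import math
import numpy as np

def ar1_cov(T, rho):
    """AR(1) covariance matrix with (t, s) entry rho**|t - s|, as in (1.5)."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Standard normal CDF applied element-wise (stdlib erf, vectorised by NumPy).
norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))

def draw_logistic_errors(n, T, rho, rng):
    """Return an (n, T) array with standard logistic marginals and
    dependence inherited from a Gaussian copula with AR(1) covariance."""
    # Draw z from the T-variate normal with covariance (1.5).
    z = rng.multivariate_normal(np.zeros(T), ar1_cov(T, rho), size=n)
    # Map each z_t to a uniform e_t = Phi(z_t); clip to avoid log(0).
    e = np.clip(norm_cdf(z), 1e-12, 1.0 - 1e-12)
    # Logistic quantile function: u_t = ln(e_t / (1 - e_t)).
    return np.log(e / (1.0 - e))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = draw_logistic_errors(n=100, T=8, rho=0.5, rng=rng)
    print(u.mean(axis=0).round(2), u.std(axis=0).round(2))
```

The sample Pearson correlations of the resulting `u` will generally be close to, but not exactly equal to, the correlations of `z`; only rank-based dependence is exactly preserved by the monotone transformation.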
1.4 Monte Carlo experiment

We generate data based on two model specifications. In the first, the model has a single covariate with unbounded support, and the individual fixed effects are correlated with that covariate. In the second, the model has two covariates, one continuous with unbounded support and one binary, and the individual fixed effects are correlated with the binary covariate. In both models, we generate dependent errors from a joint logistic distribution using a Gaussian copula. We include the simulation with a binary covariate because conditional logit with MLE requires covariates with unbounded support for the $\sqrt{n}$-consistency result in Chamberlain [2010]. The simulation with a binary covariate therefore provides information on how bounded support of a covariate affects the properties of the conditional logit estimator.

Footnote 6: Linear dependence (for instance, pairwise correlation), which we use here, cannot be exactly maintained after transforming the marginal distribution from normal to logistic, but general dependence measures such as Kendall's rank correlation are maintained under monotonic transformations such as that from normal to logistic (Schweizer and Sklar [1983] and Trivedi and Zimmer [2007]).

1.4.1 The procedures of Data Generation Process: a continuous covariate

We generate data $\{y_{it}, x_{it}, c_i, u_{it}\}$ for t = 1, 2, ..., T and i = 1, 2, ..., n based on model (1.1). We follow Heckman [1981], Green [2004] and Fernandez-Val [2009] for the design of the DGPs in this section (footnote 7). We draw four items with 2,000 replications. The correlation across the time dimension is induced through the error component u using a copula, as illustrated in section 1.3.1. We generate the four items as follows:

1. $x_{it} \sim$ iid N(0, 1). We generate a univariate covariate from independent and identical standard normal distributions.

2. Unobserved heterogeneity:

$$D(c_i \mid x_i) \sim \frac{x_{i1} + x_{i2} + \dots + x_{ik}}{\sqrt{2k}} + \frac{a_i}{\sqrt{2}}, \quad \text{where } a_i \sim \text{logistic}\!\left(0, \frac{\pi^2}{3}\right) \text{ and } k = \left[\frac{T}{2}\right]. \qquad (1.8)$$

3. We generate each $u_{it}$ such that it has a standard logistic marginal distribution (i.e. $u_{it} \sim \text{logistic}(0, \pi^2/3)$), $\mathrm{Cov}(u_{it}, u_{is}) = \rho^{|t-s|}$ for $t \neq s$, and $(u_1, u_2, \dots, u_T)$ is drawn from a joint logistic distribution by the procedure in section 1.3.1.

4. We generate $y_{it}$ as follows:

$$y_{it} = 1[x_{it}\beta + c_i + u_{it} \ge 0], \quad \text{where } \beta = 1, \qquad (1.9)$$

using $x_{it}$, $c_i$, $u_{it}$ as in items 1, 2 and 3.

Footnote 7: Devroye [1986] and Johnson [1987] contain extensive methods for generating multivariate random variables and vectors from non-normal distributions.

Generating $\{x_{it}\}_{t=1}^{T}$ and $c_i$ is straightforward, and generating $\{y_{it}\}_{t=1}^{T}$ is also straightforward once we have $\{u_{it}\}_{t=1}^{T}$. We use a copula to generate the errors $(u_{i1}, u_{i2}, \dots, u_{iT})$ from a T-variate logistic distribution in which each $u_{it}$ has a univariate logistic distribution and the dependence structure among $(u_{i1}, u_{i2}, \dots, u_{iT})$ is inherited from the dependence structure of the joint normal distribution of $(z_1, z_2, \dots, z_T)$.

Generation of correlated logistic errors, u

Let $z = (z_1, z_2, \dots, z_T)$ be drawn from a T-variate normal distribution with an AR(1) serial correlation structure. We obtain $(u_1, u_2, \dots, u_T)$ by the increasing transformation of the marginal distributions, $u_t = F_t^{-1}(\Phi_t(z_t))$, where $F_t$ is the logistic CDF and $\Phi_t$ is the standard normal CDF. The vector $u = (u_1, u_2, \dots, u_T)$ is generated by the following algorithm:

(Step 1) Draw $z_1, z_2, \dots, z_T$ from the T-variate normal distribution with the dependence structure Cov(z) in (1.5).

(Step 2) Obtain each marginal CDF value $e_t = \Phi_t(z_t)$, where $\Phi_t(\cdot)$ is the standard normal CDF, for all t = 1, ..., T.

(Step 3) Transform $z_t$ to $u_t$ by $u_t = F_t^{-1}(e_t) = F_t^{-1}(\Phi_t(z_t))$, where $F_t(\cdot) \sim \text{logistic}(0, \pi^2/3)$, i.e. $u_t = \ln\!\left(\frac{e_t}{1 - e_t}\right) = \ln\!\left(\frac{\Phi_t(z_t)}{1 - \Phi_t(z_t)}\right)$.

(Step 4) Repeat steps (1), (2) and (3) for all i = 1, 2, ..., n.

Table 1.1. Basic Descriptive Statistics for the Generated Errors

        ρ = 0.1     ρ = 0.25    ρ = 0.5     ρ = 0.75    ρ = 0.9
u1      -.07(1.9)   -.03(1.8)   -.03(1.8)   -.02(1.8)   -.02(1.8)
u2       .07(1.7)    .03(1.8)    .01(1.8)   -.01(1.8)   -.01(1.8)
u3       .12(1.8)    .08(1.9)    .06(1.9)    .02(1.9)    .02(1.9)
u4       .02(1.9)    .03(1.9)    .04(1.9)    .02(1.8)    .02(1.8)
u5      -.04(1.8)   -.02(1.8)    .01(1.9)    .01(1.9)    .01(1.9)
u6      -.05(1.9)   -.03(1.8)   -.02(1.8)   -.01(1.9)   -.01(1.9)
u7       .02(1.9)    .06(1.9)    .04(1.9)    .03(1.9)    .03(1.9)
u8       .08(1.9)    .11(1.8)    .11(1.9)    .07(1.9)    .07(1.9)

Table 1.2. Pairwise correlation of the generated errors for ρ = 0.1

      u1     u2     u3     u4     u5     u6     u7     u8
u1    1      0.09   0.06   0.02   0.03   0.03   0.02   0.02
u2           1      0.13   0.01   0.01   0.06   -0.05  -0.002
u3                  1      0.11   -0.02  0.01   0.01   0.02
u4                         1      0.11   0.02   0.05   0.07
u5                                1      0.11   0.01   0.04
u6                                       1      0.09   0.02
u7                                              1      0.08
u8                                                     1

Basic Statistics for correlated logistic errors

Table 1.1 reports the MC point estimates over 2,000 replications of the mean and the standard deviation of the errors generated from the multivariate logistic distribution with n = 100 and T = 8. The estimates of the standard deviations are reported in parentheses. Tables 1.2, 1.3 and 1.4 report the pairwise correlations among the generated errors for n = 100, T = 8 and ρ = 0.1, 0.5 and 0.9, respectively. The tables show that the pairwise correlations of z are closely reproduced in the pairwise correlations among u. The mean and standard deviation of $u_{it}$ are close to those of a standard logistic variable (mean 0, standard deviation $\pi/\sqrt{3} \approx 1.81$).

Table 1.3. Pairwise correlation of the generated errors for ρ = 0.5

      u1     u2     u3     u4     u5     u6     u7     u8
u1    1      0.51   0.30   0.14   0.07   -0.01  0.02   0.05
u2           1      0.54   0.28   0.09   0.04   0.03   0.06
u3                  1      0.52   0.26   0.13   0.08   0.12
u4                         1      0.48   0.26   0.14   0.10
u5                                1      0.50   0.23   0.12
u6                                       1      0.52   0.22
u7                                              1      0.50
u8                                                     1

Table 1.4. Pairwise correlation of the generated errors for ρ = 0.9

      u1     u2     u3     u4     u5     u6     u7     u8
u1    1      0.90   0.80   0.72   0.67   0.59   0.53   0.48
u2           1      0.90   0.80   0.74   0.66   0.59   0.54
u3                  1      0.90   0.81   0.73   0.65   0.59
u4                         1      0.90   0.82   0.73   0.67
u5                                1      0.91   0.81   0.74
u6                                       1      0.89   0.81
u7                                              1      0.91
u8                                                     1

1.4.2 The bias and size distortion for coefficient: a continuous covariate

All the results presented in this section are based on 2,000 replications and on robust standard errors of sandwich form that allow for weakly dependent serial correlation. We vary three components of the DGPs, T, n and ρ, since it is important to examine how the conditional logit estimator for β behaves in finite samples as T, n and ρ change. We are particularly interested in the typical micro panel, with small T, large n and high persistence (ρ > 0.4). Throughout the tables in this section, we report the mean of the MC point estimates of β (Mean), the standard deviation of the MC point estimates of β (SD), the mean of the MC estimates of the robust standard error for β (SE), the rejection frequency (1(pval < 0.05)), and the 95-percent coverage rate for the rejection frequency.

Conditional logit for β

Figure 1.2 summarizes the estimation results for the unconditional logit and conditional logit estimators of the coefficient β. As the second panel of figure 1.2 shows, for ρ = 0 the conditional logit estimator for β shows no bias, and the rejection frequency of the hypothesis test based on it shows no size distortion, regardless of n and T. For ρ = 0.2, the bias of the conditional logit estimator of β is less than 10% for T ≥ 4. The rejection frequency shows that a simulation with ρ = 0.2 leads to over-rejection by 100% for n = 400. The size distortion increases with n. For ρ = 0.4, the bias of the conditional logit estimator for β is greater than 10% for all T ≤ 8. The rejection frequency shows that a simulation with ρ = 0.4 leads to over-rejection by 700% for n = 400. For ρ = 0.6, the bias of the conditional logit estimator of β is greater than 20% for all T ≤ 8. The rejection frequency at nominal level 0.05 approaches 0.95 and beyond as n increases over 400, for all T ≥ 4. Finally, for ρ = 0.8, the bias of the conditional logit estimator for β is greater than 50% for all T ≤ 8. The rejection frequency at nominal level 0.05 approaches 1 as n increases over 200, for all T ≥ 3. Table 1.7 shows the magnitude of the bias in % as ρ changes, using the conditional logit estimates in the MC experiment.

Figure 1.1. Bias for β with continuous covariate
(Note) x-axis: time dimension, y-axis: estimated β. True β is 1. The color of each solid line represents the serial correlation coefficient ρ: blue (0), brown (.2), green (.4), purple (.6), and teal-blue (.8). We use n = 200 as the benchmark since the bias does not change much as we increase n over 200. The left panel of the first row shows the estimates for the unconditional logit estimator and the right panel of the first row shows the estimates for the conditional logit estimator. The left panel of the second row shows the estimates from both the conditional (dashed line) and unconditional (solid line) estimators together. More simulation results with various n, T, ρ are reported in the Appendix. "For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation."

Figure 1.2. Bias of unconditional and conditional logit for β with continuous covariate
(Note) x-axis: time dimension, y-axis: estimated β. The legend represents the number of cross-section units: true (teal), blue (100), red (200), green (400) and purple (600), with solid lines for unconditional logit estimates and dashed lines for conditional logit estimates. ρ represents the serial correlation coefficient. Each panel of the figure represents the estimated β for one value of ρ. Figures in the first row report the estimated β for ρ = 0 and ρ = .2. Figures in the second row report the estimated β for ρ = .4 and ρ = .6.
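The direction of this bias can be reproduced in the simplest case, T = 2, where the conditional MLE reduces to a no-intercept logit of $y_{i2}$ on $x_{i2} - x_{i1}$ among units with $y_{i1} + y_{i2} = 1$. The sketch below is illustrative only and is not the code behind the reported tables: the function names are hypothetical, the heterogeneity term follows one reading of (1.8) with k = 1, and the score equation is solved by bisection for simplicity.

```python
import math
import numpy as np

# Standard normal CDF applied element-wise (stdlib erf, vectorised by NumPy).
norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))

def simulate_panel(n, rho, rng, beta=1.0):
    """Draw (x, y) for T = 2 along the lines of items 1-4 of section 1.4.1."""
    x = rng.standard_normal((n, 2))                       # item 1
    a = rng.logistic(0.0, 1.0, size=n)                    # variance pi^2 / 3
    # Item 2 with k = [T/2] = 1 (assumed reading of equation (1.8)).
    c = x[:, 0] / math.sqrt(2.0) + a / math.sqrt(2.0)
    # Item 3: Gaussian copula with logistic marginals.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    e = np.clip(norm_cdf(z), 1e-12, 1.0 - 1e-12)
    u = np.log(e / (1.0 - e))
    y = (x * beta + c[:, None] + u >= 0).astype(int)      # item 4
    return x, y

def conditional_logit_t2(x, y):
    """Conditional MLE of beta for T = 2, using only units that switch."""
    switch = y.sum(axis=1) == 1
    d = x[switch, 1] - x[switch, 0]
    w = y[switch, 1].astype(float)

    def score(b):
        # Derivative of the conditional log-likelihood; decreasing in b.
        return np.sum(d * (w - 1.0 / (1.0 + np.exp(-d * b))))

    lo, hi = -20.0, 20.0
    for _ in range(60):                                   # bisection on the score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for rho in (0.0, 0.4, 0.8):
        x, y = simulate_panel(20000, rho, rng)
        print(rho, round(conditional_logit_t2(x, y), 3))
```

With ρ = 0 the estimate should be close to the true value of 1, and it should move upward as ρ increases, because positively correlated errors make the direction of a switch depend more tightly on $x_{i2} - x_{i1}$ than the independence-based conditional likelihood assumes.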
Figure 1.1 shows that an increase in T leads to a smaller bias in the coefficient estimates, whereas an increase in n does not change the magnitude of the finite sample bias for β, especially for T ≥ 3. Overall, the finite sample bias for β is larger for higher values of ρ and smaller T but is invariant to n. The size distortion becomes more severe as n and ρ increase, while a change in T (for all T ≥ 3) does not change the rejection frequency much. Regardless of T and n, the size distortion in inference is quite severe for ρ > 0.2. In sum, the simulation reveals that the conditional logit estimator with correlated logistic errors yields biased coefficient estimates and distorted rejection frequencies and is therefore not robust to violation of the CI assumption through serial correlation, especially for ρ > 0.2.

Unconditional logit for β: Decomposition of Bias

The left panel in the first row of figure 1.1 shows the finite sample bias of unconditional logit for β. The bias is larger for smaller T and higher values of ρ, while figure 1.2 shows that it does not change much as n changes. Furthermore, tables 1.5 and 1.6 use the unconditional logit estimator for β to decompose the bias into the part due to unobserved heterogeneity ($c_i$) and the part due to serial correlation (ρ), for n = 100 and n = 200, respectively.

Bias for β

The finite sample bias of unconditional logit is larger for smaller T and larger ρ. For instance, with no serial correlation (ρ = 0), the bias of unconditional logit is greater than 100% with T = 2 for all n ≤ 600, while it is about 20% with T = 8 for all n ≤ 600.
Even for T as large as 8, the finite sample bias of the unconditional logit estimator is about 17% with no serial correlation, and it increases to about 20%, 30%, 45% and 90% as ρ increases to .2, .4, .6 and .8, respectively.

Decomposition of bias

For ρ ≤ 0.4, the finite sample bias due to $c_i$ (unobserved heterogeneity) dominates the bias due to ρ (serial correlation among the errors) for all n and T. For T ≥ 5, the bias and its decomposition do not change much as n changes. For instance, in table 1.6, for n = 200 and T = 8, the finite sample bias due to unobserved heterogeneity is about 18%, while the finite sample bias due to serial correlation is about 4%, 13%, 29% and 73% for ρ = .2, .4, .6 and .8, respectively. As ρ increases, the finite sample biases from both sources are substantial, so it is important to control for both unobserved heterogeneity and serial correlation properly. Sometimes only one source of bias can be controlled. In that case, the choice between allowing serial dependence in the response variable ($D(y_i \mid x_i, c_i)$) and imposing no restriction on $D(c_i \mid x_i)$ should depend on whether researchers have more information on $D(y_i \mid x_i, c_i)$ or on $D(c_i \mid x_i)$ (footnote 8). For instance, when the response variable is highly persistent even after conditioning on covariates, pooled methods that allow serial dependence under a parametric assumption on $D(c_i \mid x_i)$ are probably more appropriate than methods that impose no restriction on $D(c_i \mid x_i)$ but require conditional independence of the outcome, $D(y_i \mid x_i, c_i)$.

Footnote 8: This is like determining which functional form assumption, on $D(y_i \mid x_i, c_i)$ or on $D(c_i \mid x_i)$, causes less distortion in the statistical analysis.

Table 1.5. Decomposition of bias for β for n = 100

        ci     ρ = 0.2   ρ = 0.4   ρ = 0.6   ρ = 0.8
t=2     146    42        113       286       624
t=3     75     19        51        112       478
t=4     48     12        33        73        186
t=5     36     9         24        54        136
t=6     28     7         19        43        108
t=7     23     5         15        35        90
t=8     20     5         13        29        76

(Note) % of bias is reported in the cells. ci represents the proportion of bias due to unobserved heterogeneity. ρ represents the serial correlation coefficient of the errors. Abrevaya [1997] shows that the unconditional logit MLE for β has probability limit 2β for T = 2. Thus, 46% of the bias is finite sample bias that has nothing to do with unobserved heterogeneity. With no serial correlation in the errors, we obtain $\hat{\beta}_{c\text{-}logit}$ = 1.23 and $\hat{\beta}_{unc\text{-}logit}$ = 2.46.

Table 1.6. Decomposition of bias for β for n = 200

        ci     ρ = 0.2   ρ = 0.4   ρ = 0.6   ρ = 0.8
t=2     119    31        80        175       493
t=3     67     17        45        99        238
t=4     45     11        30        68        168
t=5     33     8         23        51        128
t=6     26     6         18        41        100
t=7     22     5         14        33        84
t=8     18     4         13        29        73

(Note) % of bias is reported in the cells. ci represents the proportion of bias due to unobserved heterogeneity. ρ represents the serial correlation coefficient of the errors. 19% of the bias is finite sample bias that has nothing to do with unobserved heterogeneity. With no serial correlation in the errors, we obtain $\hat{\beta}_{c\text{-}logit}$ = 1.095 and $\hat{\beta}_{unc\text{-}logit}$ = 2.19.

Table 1.7. The bias for β due to serial correlation using conditional logit for n = 200

        ci     ρ = 0.2   ρ = 0.4   ρ = 0.6   ρ = 0.8
t=2     0      25        50        97        269
t=3     0      14        28        57        129
t=4     0      9         21        44        101
t=5     0      8         18        36        84
t=6     0      6         15        31        72
t=7     0      6         13        27        64
t=8     0      5         11        24        57

(Note) % of bias is reported in the cells. ci represents the proportion of bias due to finite sample estimation error for unobserved heterogeneity. ρ represents the serial correlation coefficient of the errors.

1.4.3 The bias for APEs and rejection frequency: a continuous covariate

Since the conditional logit approach avoids specifying the individual fixed effects, APEs cannot be obtained from the conditional logit estimator. For the APE of β in (1.1), we study the APEs from CRE logit with pooled MLE under correct specification of $D(c_i \mid x_i)$, and we compare these estimates with the APEs from the fixed effects (FE) LPM and from unconditional logit with MLE (footnote 9). CRE pooled MLE and the LPM with FE do not impose independence over time, while unconditional logit does not impose any restriction on unobserved heterogeneity. Thus, unconditional logit is an inconsistent estimator because of both unobserved heterogeneity and serial correlation.

Figure 1.3 summarizes the MC estimates of the APEs for CRE logit with pooled MLE, unconditional logit with MLE, and the FE LPM. With correct specification of unobserved heterogeneity for CRE logit, only the unconditional logit APEs show substantial bias. The finite sample bias of the unconditional logit APEs is positively correlated with the magnitude of the serial correlation.

CRE logit with pooled MLE: APEs

The left panel of the second row in figure 1.3 summarizes the APEs for CRE logit with pooled MLE. The results with n = 200 are reported as the benchmark (footnote 10). The LPM and CRE logit with pooled MLE show no finite sample bias in the APEs or in the associated rejection frequency. The correct coverage rate of the rejection frequency of the hypothesis test on the APE of β is obtained regardless of the serial correlation (ρ) and the size of the time dimension (T) (footnote 11). The finite sample bias is less than 6% (= (.165 − .156)/.165) for all T and all ρ, and is less than 2% for most T and ρ. In short, CRE logit with pooled MLE provides reliable estimates of the APE of β and shows no size distortion in inference under correct specification of unobserved heterogeneity (footnote 12).

Footnote 9: Unconditional logit is the logit model in which the individual fixed effects are estimated as parameters. CRE logit with pooled MLE is the logit model with a parametric assumption on unobserved heterogeneity, as in the DGPs of the previous section. Pooled estimation allows the joint model across time to be misspecified.

Footnote 10: More results with various n and T are reported in the appendix.

Footnote 11: The results do not change much as n changes. See the appendix for details.

Footnote 12: Correct specification of unobserved heterogeneity is implicitly assumed in the DGPs, since $c_i$ is generated to satisfy the CRE logit model.
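For reference, the APE of a continuous covariate in a logit model and a common pooled CRE sample analog can be written as follows. This is the textbook form (along the lines of Wooldridge [2010]) and is given as a sketch: the subscript a for the pooled, scaled estimates and the use of the time average $\bar{x}_i$ in the heterogeneity equation are assumptions of this illustration, not a restatement of the exact specification estimated in the simulations.

```latex
% Sketch of the APE for a continuous covariate in a logit model.
% \lambda is the logistic density; \bar{x}_i and the subscript a are
% assumptions of this illustration.
\begin{align*}
\lambda(v) &\equiv \Lambda(v)\,[1 - \Lambda(v)] \\
\mathrm{APE}(x_t) &= E_{c_i}\!\left[\beta\,\lambda(x_t\beta + c_i)\right] \\
\widehat{\mathrm{APE}} &= \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}
  \hat{\beta}_a\,\lambda\!\left(x_{it}\hat{\beta}_a + \hat{\psi}_a + \bar{x}_i\hat{\xi}_a\right)
\end{align*}
```

Because the APE averages over the distribution of $c_i$, it remains identified from the pooled index even when the serial dependence of the errors is left unspecified, which is why serial correlation alone need not distort it.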
Figure 1.3. Bias of unconditional logit and CRE-logit estimators for APEs
(Note) x-axis: time dimension, y-axis: estimated β. In the figures of the first row, solid lines report APEs for unconditional logit and dashed lines report APEs for conditional logit. Colors represent estimates with different numbers of cross-section units, from n = 100 to n = 600. In the first row, the left panel reports APEs with ρ = 0 and the right panel reports APEs with ρ = .6. In the second row, the left panel reports APEs for CRE logit with n = 200 and various ρ: blue (true), red (0), green (.2), purple (.4), teal (.6), and orange (.8); the right panel reports APEs for unconditional logit with n = 200 and various ρ: blue (0), red (.2), green (.4), purple (.6), and teal (.8).

Unconditional logit with pooled MLE: APEs

Unconditional logit with MLE shows substantial finite sample bias in the APEs and substantial size distortion in inference. Table 1.9 reports the unconditional logit estimates. The finite sample bias of the APEs is larger for smaller T and larger values of ρ. Even for ρ = 0, the finite sample bias is greater than 15% for all T. The bias of the APEs is greater than 17%, 42%, 96% and 204% with ρ = .2, .4, .6 and .8, respectively, for all T. The bias in the rejection frequency is larger for smaller T and larger values of ρ but does not depend on n (footnote 13).

The bias decomposition for APEs using unconditional logit

Table 1.8 reports the decomposition of the bias of the APEs into the part due to unobserved heterogeneity ($c_i$) and the part due to serial correlation (ρ) for n = 200. The bias due to $c_i$ dominates the bias due to serial correlation for ρ ≤ 0.6. This implies that, compared with the bias for β, the relative contribution of $c_i$ to the bias is larger for the APEs. The % of bias due to serial correlation decreases as T increases for T ≥ 4.

In sum, first, serial correlation does not affect the APEs from the LPM and from CRE logit with pooled MLE under correct specification of the individual effects.
Second, both the LPM and CRE logit with pooled MLE under correct specification of the individual effects show no significant finite sample bias in the APEs or the rejection frequency, regardless of n, T and ρ. Third, the unconditional logit estimates of the APEs show substantial bias, whose magnitude is positively related to the serial correlation and negatively related to the size of the time dimension (T).

Footnote 13: See the appendix for the results on the bias of the rejection frequency.

Table 1.8. Decomposition of bias into two parts using unconditional logit for n = 200

        ci, ρ = 0   ρ = 0.2   ρ = 0.4   ρ = 0.6   ρ = 0.8
t=2     114         9         17        24        17
t=3     67          10        21        37        52
t=4     44          6         16        32        55
t=5     30          5         13        25        48
t=6     22          4         10        21        41
t=7     17          3         7         16        34
t=8     15          2         6         14        30

(Note) % of bias of the unconditional logit APEs is reported in the cells. ci represents the proportion of bias due to finite sample estimation error for unobserved heterogeneity. ρ represents the serial correlation coefficient of the errors. The n = 200 case is reported as the benchmark since the bias decomposition changes very little as we change n. % of bias due to ci is obtained from $\frac{APE_{unc,\rho=0} - APE_{True}}{APE_{True}}$ and % of bias due to serial correlation (ρ) is obtained from $\frac{APE_{unc,\rho>0} - APE_{unc,\rho=0}}{APE_{unc,\rho>0}}$.

Table 1.9. The bias of unconditional logit n = 200

        true   ci, ρ = 0   ρ = 0.2   ρ = 0.4   ρ = 0.6   ρ = 0.8
t=2     .163   .349        .363      .378      .388      .277
t=3     .163   .273        .289      .308      .334      .258
t=4     .164   .236        .246      .262      .288      .326
t=5     .165   .214        .222      .235      .255      .294
t=6     .165   .201        .208      .218      .235      .269
t=7     .166   .194        .199      .206      .220      .251
t=8     .164   .188        .192      .198      .211      .237

(Note) The bias is reported in the cells. The n = 200 case is reported as the benchmark since the bias decomposition changes very little as we change n.

1.4.4 DGP: A model with both binary and continuous covariates

We generate data $\{y_{it}, x_{it}, D_{it}, c_i, u_{it}\}$ for t = 1, 2, ..., T and i = 1, 2, ..., n based on model (1.10) with 1,000 replications.
$$y_{it} = 1[x_{it}\beta + D_{it}\delta + c_i + u_{it} \ge 0] \qquad (1.10)$$

We draw the following five items. The correlation across time is induced through the error component u as in section 1.3.1.

1. $x_{it} \sim$ iid N(0, 1). We generate a univariate covariate from independent and identical standard normal distributions.

2. $D_{it} = 1(d_{it} - .5 \ge 0)$, where $d_{it} \sim$ iid Unif(0, 1).

3. Unobserved heterogeneity:

$$D(c_i \mid x_i) \sim \frac{D_{i1} + D_{i2} + \dots + D_{ik}}{\sqrt{2k}} + \frac{a_i}{\pi^2/3}, \quad \text{where } a_i \sim \text{logistic}\!\left(0, \frac{\pi^2}{3}\right) \text{ and } k = \left[\frac{T}{2}\right]. \qquad (1.11)$$

4. We generate each $u_{it}$ such that it has a standard logistic marginal distribution (i.e. $u_{it} \sim \text{logistic}(0, \pi^2/3)$), $\mathrm{Cov}(u_{it}, u_{is}) = \rho^{|t-s|}$ for $t \neq s$, and the joint distribution of $(u_1, u_2, \dots, u_T)$ is logistic.

5. We generate $y_{it}$:

$$y_{it} = 1[x_{it}\beta + D_{it}\delta + c_i + u_{it} \ge 0] \qquad (1.12)$$

using $x_{it}$, $D_{it}$, $c_i$, $u_{it}$ as in items 1, 2, 3 and 4, with β = .2 and δ = 1.

Figure 1.4. Bias of unconditional and conditional logit estimators for β with binary covariate
(Note) x-axis: time dimension, y-axis: the estimates for β are reported for n = 100. The color of each line represents the serial correlation coefficient: orange (true), blue (0), red (.2), green (.4), purple (.6), teal-blue (.8). The left panel of the first row shows the estimates for the unconditional logit estimator and the right panel of the first row shows the estimates for the conditional logit estimator. The left panel of the second row shows the estimates for both the conditional (dashed line) and unconditional (solid line) estimators together.

1.4.5 The bias and size distortion for δ: binary covariate

Conditional logit for δ

As figure 1.4 shows, the MC experiment with a binary covariate leads to the same conclusion as the MC experiment with a continuous covariate. Smaller T and larger values of ρ lead to a larger finite sample bias in the coefficient estimate of δ, but a change in n does not affect the bias much. The size distortion becomes more severe as n and ρ increase, while a change in T (for all T ≥ 3) does not have much effect on the rejection frequency. Regardless of the values of T and n, the size distortion is quite severe for ρ > 0.2. The conditional logit estimator for δ and its rejection frequency are biased and are therefore not robust to violation of the CI assumption for ρ > 0.2, for all n and T.

Table 1.10. Decomposition of bias into two parts n = 100

        ci     ρ = 0.2   ρ = 0.4   ρ = 0.6   ρ = 0.8
t=2     157    48        212       486       14343
t=3     65     18        47        104       450
t=4     41     10        28        63        162
t=5     29     8         21        47        116
t=6     26     6         15        36        92
t=7     20     6         15        32        80
t=8     18     4         11        25        68

(Note) % of bias is reported in the cells.

Unconditional logit for δ: Decomposition of Bias

We obtain the same conclusion as in the DGP with a continuous covariate, although the magnitude of the bias is larger with a binary covariate, especially for T ≤ 4. The bias of unconditional logit for δ is larger for data with smaller T and higher values of ρ, while it does not change much as n changes, as shown in figure 1.4. Table 1.10 shows that, for ρ ≤ 0.4, the bias due to $c_i$ (unobserved heterogeneity) dominates the bias due to ρ (serial correlation among the errors) for all T.

1.4.6 The bias for APEs and rejection frequency: binary covariate

In the estimation of APEs for β in (1.12), we study the APEs for CRE logit with pooled MLE and compare these estimates with the APEs for the LPM and for unconditional logit with MLE. Figure 1.5 summarizes the MC estimates of the APEs for CRE logit with pooled MLE, unconditional logit with MLE, and the LPM. As the left panel in the second row of figure 1.5 shows, the LPM and CRE logit with pooled MLE exhibit no substantial finite sample bias in the APEs for δ regardless of the serial correlation (the magnitude of ρ) and the size of T. The rejection frequency also shows no significant size distortion regardless of the magnitude of the serial correlation (ρ) and the size of T (footnote 14). In short, CRE logit with pooled MLE provides reliable finite sample estimates of the APEs of δ and valid inference under correct specification of unobserved heterogeneity.
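The rejection frequencies and percentage biases discussed in this section are Monte Carlo averages over replications. A minimal sketch of how such a summary could be computed from replication output is shown below; the function name `mc_summary`, its argument names, and the two-sided 5% normal critical value of 1.96 are assumptions of this illustration rather than details taken from the simulations.

```python
import numpy as np

def mc_summary(estimates, std_errors, true_value, crit=1.96):
    """Summarise Monte Carlo replications of one parameter.

    estimates, std_errors: one entry per replication.
    Returns the mean, the SD of the estimates, the mean standard error,
    the bias in percent of the true value, and the rejection frequency
    of the t-test of H0: parameter = true_value.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    t_stats = (est - true_value) / se
    return {
        "mean": est.mean(),
        "sd": est.std(ddof=1),
        "se": se.mean(),
        "bias_pct": 100.0 * (est.mean() - true_value) / true_value,
        "reject": float(np.mean(np.abs(t_stats) > crit)),
    }

if __name__ == "__main__":
    out = mc_summary([1.0, 1.2, 0.8, 1.5], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
    print(out)
```

Under correct size, `reject` should be close to the nominal level; values well above it correspond to the over-rejection described in the text.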
The bias decomposition for APE: binary covariate Unconditional logit with MLE shows substantial bias for APEs and rejection frequency. The right panel in the second row of figure 1.5 shows the bias of APE of δ for unconditional logit. The bias for APE is greater for smaller T and larger value of ρ. Even for ρ=0, the magnitude of bias for APE of δ is greater than 24 % ∀T . The bias increases with the magnitude of ρ for all T . The bias for rejection frequency is greater for smaller T and larger value of ρ.15 14 The results are reported 15 See appendix for further in appendix. results of the bias for rejection frequency. 30 Figure 1.5. Unconditional logit and CRE-logit estimates of APEs for binary covariate (Note) x-axis: time dimension, y-axis: APEs for β with n = 100 are reported. ρ represents serial correlation coefficient. Figures in first row report APEs with ρ=0 (left) and ρ=.6 (right) respectively. The colors of solid line represent various APE estimators; blue (true), red (LPM), green (unconditional logit), and purple (CRE logit). As we see, purple completely cover blue. The left panel of second row shows the estimates for CRE logit estimator and the right panel of second row shows the estimates for unconditional logit estimators with various serial correlation coefficient; blue (0), red (.2), green (.4), purple (.6) and teal-blue (.8). 31 Table 1.11. Decomposition of bias using unconditional logit for n = 100 ci ρ = 0.2 ρ = 0.4 ρ = 0.6 ρ = 0.8 t=2 164 27 65 140 255 t=3 87 14 37 73 197 t=4 55 7 23 48 104 t=5 41 7 18 38 82 t=6 33 4 13 28 65 t=7 27 5 12 26 59 t=8 25 4 10 22 51 (Note) % of bias of unconditional logit APEs are reported in the cells. ci represents the proportion of bias due to finite sample estimation error for unobserved heterogeneity. ρ represents serial correlation coefficient for errors. n = 100 case was reported as benchmark since very small difference of bias decomposition occurs as we change n. 
% of bias due to c_i is obtained from $(APE_{unc,\rho=0} - APE_{True}) / APE_{True}$, and % of bias due to serial correlation (ρ) is obtained from $(APE_{unc,\rho>0} - APE_{unc,\rho=0}) / APE_{unc,\rho>0}$.

1.5 Panel data models for married women's welfare participation

In this section, we illustrate the application of conditional logit for β and CRE logit with pooled MLE for APEs using the Survey of Income and Program Participation (SIPP) data on the welfare participation of married women from Chay and Hyslop [2001].16 As the welfare participation of married women exhibits substantial serial persistence over time, we apply CRE logit with pooled MLE to obtain the APEs of marital status and the number of children on welfare participation.

Welfare participation data are from the 1990 panel of the SIPP, which contains eight waves at four-month intervals covering a 32-month period. The data set contains 1,934 married women over 8 periods. The sample contains women who are 18-65 years old and who either received AFDC payments during or before the sample period or whose average total family income during the sample period is below the family-specific average poverty level.17 Welfare participation status shows clear serial correlation over time. For instance, out of 1,934 married women, 1,076 never participate in the welfare program and 364 participate in the welfare program in all sample periods. Basic statistics for the variables used in estimation are provided in table 1.12.

16 Pooled MLE with the CRE logit model does not assume that the full conditional density of y given x, c is correctly specified. This is very useful, since the full conditional density of y given x, c is likely to be very complicated and pooled MLE places no restrictions on serial dependence.

We start estimation with the static model of (1.13) for simplicity. We control for unobserved heterogeneity with the Chamberlain device, while we do not control for true state dependence explicitly.
$$\text{participation}_{it} = 1[z_{it}\beta + f_t + c_i + u_{it} \ge 0], \quad i = 1, 2, \ldots, n, \quad t = 1, 2, \ldots, T \tag{1.13}$$

where z_it includes a black race dummy, years of education, family poverty level status, age/10, age²/100, marital status and the number of children aged less than 18, and f_t denotes period fixed effects.

17 In summer 1993, the Nation had 36 million mothers 15 to 44 years old; 3.8 million of them (10 percent) were receiving AFDC (Aid to Families with Dependent Children) payments to help with the rearing of 9.7 million children. An additional 0.5 million women over 45 years old and 0.3 million fathers living with their dependent children also received AFDC.

Table 1.12. Basic statistics

Variable     Obs       Mean        S.D        Min        Max
AFDC         15,472    .302        .459       0          1
married      15,472    .291        .454       0          1
kids         15,472    1.849       1.479      0          10
Black        15,472    .310        .463       0          1
Educ         15,472    11.260      2.700      0          18
Poverty      15,472    1026.426    352.849    519.755    2377.113
age/10       15,472    3.405       1.149      1.6        6.3
age²/100     15,472    12.917      8.864      2.56       39.69

Note: There is no missing data. 15,472 = 8×1,934.

1.5.1 Static panel data model for married women's welfare participation

Table 1.13 shows that allowing unobserved heterogeneity to be correlated with marital status and number of children has important effects on the estimated APEs. LPM provides estimated coefficients of -0.27 and 0.05 on marital status and number of children, respectively, and both coefficients are statistically significant. Each child is estimated to increase the welfare participation probability by about 0.05, while being married is estimated to reduce the welfare participation probability by about 0.27. If we use logit and assume that unobserved heterogeneity is independent of the explanatory variables, the estimated APEs are somewhat larger in magnitude for both marital status and number of children. When we use the CRE logit model with pooled MLE, which controls for unobserved heterogeneity, we obtain APEs that are very similar to the LPM estimates.
Moreover, the coefficients on the time averages of the time-varying covariates are statistically significant and practically large. All these results imply that accounting for unobserved heterogeneity correctly is very important. The CRE logit model with full MLE leads to the same conclusion as with partial MLE. Using GEE with a logit link and correlated random effects provides estimates very similar to those of the CRE logit model with full MLE.

Finally, column (6) contains the estimates from conditional logit. Unfortunately, the coefficient magnitudes are difficult to interpret because of the scaling factor. The ratio of coefficients using conditional logit is $-2.80/0.76 = -3.68$, which is quite different from the ratio, $-1.76/0.34 = -5.18$, from the CRE logit model with pooled MLE. Moreover, if we do not control for unobserved heterogeneity, the ratio of coefficients for RE logit with pooled MLE is $-2.28/0.54 = -4.22$. The discrepancy in the ratio of coefficients between CRE logit and RE logit, which does not control for unobserved heterogeneity, is possibly caused by either misspecification of the distribution of unobserved heterogeneity or uncontrolled unobserved heterogeneity. Likewise, the discrepancy in the ratio of coefficients between CRE logit and conditional logit is possibly caused by either misspecification of the distribution of unobserved heterogeneity or serial correlation. Therefore, for instance, if we assume that the distribution of unobserved heterogeneity is correctly specified by Chamberlain's device, we can infer that the discrepancies originate from the bias due to serial correlation for conditional logit and the bias due to unobserved heterogeneity for RE logit.
Moreover, in this case, because CRE logit with pooled MLE in column (3) allows arbitrary weak dependence, it seems sensible to rely on its estimates under the assumptions that the marital status and number of children variables are strictly exogenous conditional on unobserved heterogeneity and that unobserved heterogeneity can be specified by the time averages of the time-varying covariates. However, without correct specification of D(c_i | x_i), no procedure strictly dominates the others. In particular, the choice among these procedures should be based on whether we are more confident in the correct specification of D(y_i | x_i, c_i) or of D(c_i | x_i).

Table 1.14 shows results for the probit model using the same estimation methods as in table 1.13. Although standard errors are smaller for the probit model, we draw the same conclusions as with the logit model.

Table 1.13. Logit estimation for married women's welfare participation

Model                 (1)       (2)                 (3)                 (4)                 (5)                 (6)
                      Linear    logit               CRE logit           CRE logit           CRE logit           FE logit
Estimation            FE        pooled MLE          pooled MLE          GEE                 Full MLE            MLE
                      coef      coef      APE       coef      APE       coef      APE       coef      APE       coef
married               -.2707    -2.276    -.3704    -1.757    -.2851    -1.591    -.2613    -3.305    -.2293    -2.802
                      (.0278)   (.152)    (.0211)   (.196)    (.0308)   (.185)    (.0298)   (.248)    (.0122)   (.242)
kids                  .0524     .539      .0877     .343      .0557     .294      .0483     .724      .0671     .761
                      (.0098)   (.056)    (.0086)   (.060)    (.0096)   (.054)    (.0088)   (.091)    (.0084)   (.104)
married (time avg.)                                 -.613               -.723               -1.762
                                                    (.231)              (.230)              (.349)
kids (time avg.)                                    .231                .242                .547
                                                    (.062)              (.061)              (.111)
1/√(1+σ̂²)                                                                                   .242
log likelihood                  -7571.91            -7552.25                                -4028.83            -1412.62
number of women       1934      1934                1934                1934                1934                494

*Note: Cluster-robust standard errors are reported in parentheses below all coefficients and APEs. All models include a full set of period dummy variables and the time-constant variables years of education, black race dummy, age/10, and age²/100, which are not reported.

Table 1.14. Probit estimation for married women's welfare participation

Model                 (1)       (2)                 (3)                 (4)                 (5)                 (6)
                      Linear    probit              CRE probit          CRE probit          CRE probit          FE logit
Estimation            FE        pooled MLE          pooled MLE          GEE                 Full MLE            MLE
                      coef      coef      APE       coef      APE       coef      APE       coef      APE       coef
married               -.2707    -1.251    -.3490    -.951     -.2644    -.906     -.2525    -1.870    -.2078    -2.802
                      (.0278)   (.078)    (.0189)   (.103)    (.0281)   (.105)    (.0289)   (.134)    (.0124)   (.242)
kids                  .0524     .309      .0861     .198      .0550     .177      .0492     .400      .0590     .761
                      (.0098)   (.033)    (.0086)   (.035)    (.0097)   (.032)    (.0091)   (.052)    (.0078)   (.104)
married (time avg.)                                 -.354               -1.304              -.726
                                                    (.126)              (.134)              (.217)
kids (time avg.)                                    .132                .404                .290
                                                    (.036)              (.037)              (.071)
1/√(1+σ̂²)                                                                                   .349
log likelihood                  -7600.63            -7580.68                                -3976.44            -1412.62
sample                1934      1934                1934                1934                1934                494

*Note: Cluster-robust standard errors are reported in parentheses below all coefficients and APEs. All models include a full set of period dummy variables and the time-constant variables years of education, black race dummy, age/10, and age²/100, which are not reported.

1.5.2 Dynamic unobserved effects logit models under strict exogeneity

Instead of handling serial correlation through models estimated by pooled MLE, we can control for serial dependence in y_i by using a dynamic model with full MLE. Chay and Hyslop [2001] emphasize the importance of introducing dynamic models in order to consider both unobserved heterogeneity and true state dependence seriously in welfare analysis. Explicitly distinguishing spurious state dependence, which arises from individual permanent characteristics, from true state dependence is possible only with dynamic models. Our treatment of the dynamic model follows Wooldridge [2005] and Wooldridge [2010]. Suppose we date our observations starting at t = 0, so that y_i0 is the first observation on the outcome, y. For t = 1, ..., T, we are interested in the dynamic model of (1.14).
$$E(y_{it} \mid y_{i,t-1}, \ldots, y_{i0}, z_i, c_i) = G(\rho y_{i,t-1} + z_{it}\beta + c_i) \tag{1.14}$$

where z_it is a vector of contemporaneous explanatory variables, and G can be the probit or logit function. There are several important points about this model. First, z_it is assumed to satisfy a strict exogeneity assumption (conditional on c_i). Second, the probability of success at time t is allowed to depend on the outcome in t − 1 as well as on unobserved heterogeneity, c_i.

Following Chay and Hyslop [2001], we use the dynamic specification to investigate the following two points. First, we are interested in the hypothesis test with the null H0: ρ = 0. ρ ≠ 0 implies the existence of state dependence after controlling for the unobserved heterogeneity, c_i. This provides information about the decomposition of welfare participation persistence into true state dependence and unobserved heterogeneity. Second, we are interested in the estimation of ρ and of the marginal effects of marital status and number of children on welfare participation. We start with the specification of the dynamic model in (1.15).

$$f(y_1, y_2, \ldots, y_T \mid y_0, z, c, \theta) = \prod_{t=1}^{T} f(y_t \mid y_{t-1}, \ldots, y_0, z_t, c, \theta) \tag{1.15}$$

We condition on y_i0 as well to solve the initial conditions problem. Furthermore, we propose a D(c_i | y_0, z_i) as in (1.17) to obtain f(y_1, y_2, ..., y_T | y_0, z_i) using (1.16). Given a density h(c | y_0, z, δ), we have

$$f(y_1, y_2, \ldots, y_T \mid y_0, z, \theta, \delta) = \int_{-\infty}^{\infty} f(y_1, y_2, \ldots, y_T \mid y_0, z, c, \theta)\, h(c \mid y_0, z, \delta)\, dc \tag{1.16}$$

where G is the standard normal CDF for the probit model and

$$h(c \mid y_0, z, \delta) \text{ is } \mathrm{Normal}(\psi + \varsigma_0 y_{i0} + z_i \xi,\ \sigma_a^2) \tag{1.17}$$

where z includes AFDC_{t−1}, number of children, marital status, and family poverty level. We can rewrite our estimation equation as in (1.18) using the Chamberlain device.

$$y_{it} = 1[\psi + \rho y_{i,t-1} + z_{it}\beta + \varsigma_0 y_{i0} + z_i \xi + a_i + u_{it} \ge 0] \tag{1.18}$$

Pooled MLE does not consistently estimate the scaled coefficients or the APEs in the dynamic model.
This is because, while $P(y_{it} = 1 \mid z_i, y_{i,t-1}, \ldots, y_{i0}, a_i) = \Phi(\psi + \rho y_{i,t-1} + z_{it}\beta + \varsigma_0 y_{i0} + z_i \xi + a_i)$ is true, $P(y_{it} = 1 \mid z_i, y_{i,t-1}, \ldots, y_{i0}) = \Phi(\psi_a + \rho_a y_{i,t-1} + z_{it}\beta_a + \varsigma_{0a} y_{i0} + z_i \xi_a)$ does not hold unless a_i is identically zero. Therefore, only random effects probit consistently estimates the scaled coefficients and the APEs. For comparison purposes, we also report in table 1.15 CRE probit with pooled MLE and RE probit with MLE, which does not control for unobserved heterogeneity.

Column (4) in table 1.15 reports the CRE probit full-MLE estimate of ρ in the dynamic model, which is 1.290 (s.e. = .072) and is statistically different from zero even after controlling for unobserved heterogeneity. This is strong evidence for the existence of true state dependence. The computation of the APE for AFDC_{t−1} is based on averaging $\Phi(\hat\psi_a + \hat\rho_a + z_{it}\hat\beta_a + \hat\varsigma_{0a} y_{i0} + z_i \hat\xi_a) - \Phi(\hat\psi_a + z_{it}\hat\beta_a + \hat\varsigma_{0a} y_{i0} + z_i \hat\xi_a)$ across all i and t.18 The CRE probit full-MLE estimate of the APE of AFDC_{t−1} is provided in model (4) of table 1.15 and is about .175. In other words, averaged across all women and all time periods, the probability of participating in welfare at time t is about .175 higher if the woman was on welfare at time t − 1 than if she was not. This estimate controls for unobserved heterogeneity, number of children, marital status, family poverty level and the woman's education, race, age, and age².

Column (2) of table 1.15 reports the APEs for random effects probit with MLE, which does not control for unobserved heterogeneity. The estimated APE for state dependence is .384, which is higher than the random effects estimate for which heterogeneity is controlled. This implies that much of the observed serial dependence in welfare participation could be attributed to unobserved heterogeneity: $(.384 - .175)/.384 = .544$. About 54% of the observed serial dependence of welfare participation is due to heterogeneity, while about 46% of the observed serial dependence is due to state dependence.
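The arithmetic behind this decomposition, and the averaging that defines the APE of lagged participation, can be checked with a short Python sketch. It is an illustration only: the function names and the toy index values are hypothetical and are not part of the original analysis; the only numbers taken from the text are the two APEs, .384 and .175.

```python
import math


def std_normal_cdf(v):
    """Standard normal CDF, Phi(v), computed from the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def ape_state_dependence(index_without_lag, rho):
    """Average Phi(index + rho) - Phi(index) over all (i, t) observations.

    index_without_lag holds, for each observation, the fitted index with the
    lagged outcome set to zero; rho is the coefficient on the lagged outcome.
    """
    diffs = [std_normal_cdf(v + rho) - std_normal_cdf(v) for v in index_without_lag]
    return sum(diffs) / len(diffs)


def heterogeneity_share(ape_not_controlled, ape_controlled):
    """Share of the uncontrolled APE that vanishes once heterogeneity is controlled."""
    return (ape_not_controlled - ape_controlled) / ape_not_controlled


# APEs of lagged participation reported in the text: .384 and .175.
share = heterogeneity_share(0.384, 0.175)
print(round(share, 3))      # share attributed to unobserved heterogeneity
print(round(1 - share, 3))  # remaining share attributed to state dependence

# Hypothetical index values, used only to show how the averaging works.
toy_index = [-1.5, -0.5, 0.0, 0.5]
print(ape_state_dependence(toy_index, rho=1.29))
```

The share is defined relative to the uncontrolled estimate, which is why the two parts sum to one.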
In other words, about 54% of the state dependence estimated by a simple dynamic RE probit model that ignores unobserved heterogeneity is attributable to heterogeneity rather than to true state dependence. In sum, the analysis with the dynamic model implies that both sources of persistent welfare participation are important, since the difference between the estimates clearly shows that welfare participation depends on individual permanent characteristics as well as on the "narcotic" effects of the welfare program.

18 Original coefficients are scaled by 1/√(1+σ̂²) = 1, where σ̂² = .00000045, in the calculation of all APEs.

Table 1.15. Probit estimation of dynamic model for married women's welfare participation

Model                 (1)       (2)                 (3)                 (4)                 (5)
                      Linear    RE probit           CRE probit          CRE probit          FE logit
Estimation Method     FE        Full MLE            pooled MLE          Full MLE            MLE
                      coef      coef      APE       coef      APE       coef      APE       coef
AFDC_{t-1}            .3703     2.028     .3840     1.290     .1343     1.290     .1747     1.643
                      (.0163)   (.073)    (.0058)   (.063)    (.0085)   (.062)    *         (.097)
married               -.1922    -.988     -.1076    -1.595    -.1545    -1.595    -.1545    -2.506
                      (.0230)   (.094)    (.0057)   (.143)    (.0115)   (.136)    *         (.303)
kids                  .0350     .211      .0220     .280      .0180     .280      .0178     .670
                      (.0075)   (.037)    (.0024)   (.059)    (.0034)   (.051)    *         (.129)
1/√(1+σ̂²)                       .734                                    .999
unobs heterogeneity   controlled  not controlled    controlled          controlled          controlled
log likelihood                  -2624.13            -1658.99            -1658.99            -936.45
sample                1934      1934                1934                1934                441

Note: Cluster-robust standard errors are reported in parentheses below all coefficients and APEs. All models include a full set of period dummy variables, the time-constant variables years of education, black race dummy, age/10, and age²/100, the initial-period value of AFDC, and the married and kids variables for all periods, which are not reported.

1.6 Conclusion

This paper examines the robustness of the conditional logit estimator to violation of the CI assumption using correlated logistic errors in a binary panel data model with correlated individual fixed effects. Simulation results show that the conditional logit method is not robust to violation of the CI assumption.
The magnitude of the bias in the coefficient is greater for data with a smaller time dimension (T) and a higher serial correlation coefficient (ρ). Under correct specification of unobserved heterogeneity, CRE logit with pooled MLE provides reliable estimates of the APEs with no significant finite-sample bias. Simulation of the rejection frequency of the hypothesis test provides evidence that CRE logit yields a reliable test in the presence of high serial correlation. Finite-sample bias due to both unobserved heterogeneity and serial correlation is substantial. An empirical example of welfare participation, which exhibits high persistence, shows that both sources of bias are important. The decomposition shows that the bias due to uncontrolled unobserved heterogeneity is about 54% and that due to serial correlation is about 46%. This implies the importance of accounting for both state dependence and unobserved heterogeneity.

Chapter 2

The MI-IPW method in an unbalanced panel data model: An empirical application on the effect of class size reduction on SAT scores for students in grades K-3

2.1 Introduction

Randomized experiments have recently become more popular in evaluating the impact of social programs. Angrist and Pischke [2009] reflect this trend by arguing that questions of causal inference about the effect of a program are best answered by a randomized experiment. However, perfect implementation of a randomized trial is extremely rare. In most experiments, experimental violations occur, typically through noncompliance and attrition, and can invalidate any statistical conclusions drawn from them. The possible bias from noncompliance can be addressed by the IV method, using initial treatments as instruments for actual treatments. However, standard methods for missing data, which include complete-case analysis, simple imputation and last-observation-carried-forward analysis, are not appropriate for valid inference without assuming missing completely at random (MCAR).1 Unfortunately, in most empirical applications, the MCAR assumption is unrealistic, and its violation inevitably leads to biased estimates, incorrect standard errors and invalid inferences for unweighted complete-case analysis.2 In this chapter, we focus on the marginal treatment effect (MTE) of a program in a random experiment with an ideal design at the beginning but subsequent missing data at the implementation stage.

1 The MCAR and missing at random (MAR) assumptions are in the sense defined by Rubin [1976] and Little and Rubin [2002]. MCAR implies that the probability of missing is independent of both observed and unobserved variables. MAR implies that, conditional on observed data, the probability of missing does not depend on the unobserved data.
2 Data are complete in the panel for a particular period if a cross-section unit has observations on the outcome and all covariates in that period.

The purpose of this article is twofold. First, we provide robust Hausman tests of the missing completely at random (MCAR) assumption in an unbalanced linear panel data model for the least-squares (LS), fixed-effects (FE) and first-differencing (FD) estimators, for which rejection implies invalid statistical analysis. The proposed Wald statistics for the robust Hausman tests are based on comparing two estimators that are consistent under the null, and they are robust in the sense that the estimators used in constructing the Wald statistics are not assumed to satisfy the conditions for efficiency.3 The idea of the test is that, under the null, the difference between the estimates of two consistent estimators should be entirely due to sampling error. This is an extension of the robust Hausman test for a balanced panel in [Wooldridge, 2010, p.321-334] to unbalanced panel data. We also conduct a Monte Carlo experiment on regression-based variable addition tests for the fixed-effects and first-differencing estimators to test the MCAR assumption.

3 The original Hausman tests (Hausman [1978]), which compare the random effects estimator and the fixed effects estimator, assume efficiency of the random effects estimator.

Second, we introduce practical methods with valid statistical analyses for handling missing data under the missing at random (MAR) assumption, which include the inverse probability weighted (IPW) method (Horvitz and Thompson [1952] and Robins et al. [1994]), the likelihood-based maximum likelihood (ML) method (Rubin [1987] and Little and Rubin [2002]) and the multiple imputation (MI) method (Buuren et al. [1999] and Schafer and Graham [2002]). Many microeconomic applications that use linear models involve high-dimensional covariates, which include continuous and categorical variables of various types (binary, count, fraction and so forth). Moreover, data can be missing for all covariates and the outcome. Therefore, the likelihood-based ML and MI methods are not appropriate for unbalanced panel data with these types of missing variables, since the ML and MI methods require modeling the joint distribution of the missing variables, and correct specification of a model for the joint distribution would be extremely difficult, if not impossible. In many applications, although joint normality is assumed, we have very little information on the joint distribution of the missing variables. On the other hand, the IPW method with missing outcome and covariates does not need to specify a fully parametric model for the joint distribution of the missing variables, since the probability weights can be modeled by a univariate normal (probit) or logistic (logit) distribution.

Meanwhile, the IPW method under MAR is criticized mainly due to the following three concerns.
First, the IPW method is generally less efficient than the MI method with correct specification for the missing variables, since the IPW method uses only complete-case data while the MI method uses the information in covariates and outcome even from incomplete-case samples (Clayton et al. [1999]). Therefore, for instance, if a linear regression model has high-dimensional covariates and only one of the covariates has missing data, the loss of efficiency for the IPW method is likely to be significant. However, for monotone missing data (i.e., attrition), there is no loss of information from this source, since data go missing in the outcome and all covariates simultaneously for a cross-section unit.4 The second concern is that IPW estimates can be very unstable if the estimated probability weights are close to zero for certain subpopulation units (Little and Rubin [2002]). However, we can assess the relevance of this criticism by examining the estimates with actual data in the application. Finally, the IPW estimates can be sensitive to the model specification for the probability of selection (or missing). We provide an example of using economic theory in the specification of the selection process and the MAR assumption. In particular, in the empirical example, we extend Becker [1993]'s model of parents' school choice between public and private schools to specify a selection process that satisfies the MAR assumption. For simplicity, we use the logit or the probit models for the distribution of the probability of selection.5 Unfortunately, it is very hard to test in practice how inaccurate the approximation given by the chosen predictors and the logit or probit models for the probability of selection is, so we conduct a Monte Carlo simulation to provide evidence on the sensitivity of the IPW estimator to misspecification of the selection model.

4 Recently, Robins and Rotnitzky [1995] and Scharfstein et al. [1999] proposed an AIPW (i.e., doubly robust) estimator, which allows the construction of a theoretically efficient version of IPW, so that the extension of IPW to the AIPW estimator can overcome the inefficiency problem at least asymptotically. An estimator is called doubly robust if it is consistent when either a model for the conditional expectation (the imputation model for missing variables) or a model for the probability of selection is correctly specified. An AIPW estimator can attain efficiency in the sense that its variance can reach the semi-parametric efficiency bound among the class of estimators that use $\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \cdot \mathrm{moment}_{it}(\mathrm{Observed}_{it}, \theta) = 0$. More discussion of the efficiency properties of AIPW estimators is provided in Bickel et al. [1998], Tsiatis [2006], Robins et al. [1994], and section 2.4.1.

We explicitly categorize unbalanced panel data into four types based on missing patterns. The four data types are the following: (i) complete-case data, (ii) individual units that leave the sample but re-enter it later (type-I missing data), (iii) individual units that are in the sample but have some variables missing (type-II missing data), and (iv) attrition (type-III missing data). For type-I missing data, we drop the observations and do not use them in the estimations, assuming that missing data of this type are completely random or negligibly few. For type-II missing data, we apply multiple imputation based on the types of the missing variables. Finally, we propose a method which applies MI and IPW sequentially. Specifically, after multiple imputation of the type-II missing variables, we treat the type-II missing data as if they were complete-case data. We then combine the imputed type-II missing data with the complete-case data and then apply the IPW method. Tests of MCAR and estimations with the unweighted, IPW and MI-IPW methods under MAR are conducted to gauge the effect of class size reduction (CSR) on SAT scores for students in grades K-3 using Tennessee's Project STAR.
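The reweighting idea behind the IPW step can be illustrated with a toy Python sketch. It is a minimal illustration under strong assumptions, not the procedure used in this chapter: selection depends only on an always-observed binary group indicator (so MAR holds by construction), the selection probabilities are estimated by group frequencies rather than by a logit or probit model, and the data and function names are hypothetical.

```python
def selection_probabilities(groups, selected):
    """Estimate P(s = 1 | group) by the observed selection frequency in each group."""
    totals, kept = {}, {}
    for g, s in zip(groups, selected):
        totals[g] = totals.get(g, 0) + 1
        kept[g] = kept.get(g, 0) + s
    return {g: kept[g] / totals[g] for g in totals}


def complete_case_mean(y, selected):
    """Unweighted mean over the observations that remain in the sample."""
    obs = [yi for yi, s in zip(y, selected) if s == 1]
    return sum(obs) / len(obs)


def ipw_mean(y, groups, selected):
    """Mean of y with each retained observation weighted by 1 / P(s = 1 | group)."""
    p = selection_probabilities(groups, selected)
    num = sum(yi / p[g] for yi, g, s in zip(y, groups, selected) if s == 1)
    den = sum(1.0 / p[g] for g, s in zip(groups, selected) if s == 1)
    return num / den


# Hypothetical data: group 1 has higher outcomes and loses half of its units.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
selected = [1, 1, 1, 1, 1, 1, 0, 0]

print(sum(y) / len(y))                  # full-sample mean (unobservable in practice)
print(complete_case_mean(y, selected))  # complete-case mean, pulled toward group 0
print(ipw_mean(y, groups, selected))    # reweighted mean
```

In the sequential MI-IPW scheme described above, weights of this kind would be applied only after the type-II missing values have been imputed, so that the only remaining selection to correct for is attrition.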
We employ Tennessee's Project STAR to evaluate the MI-IPW method in the application since it is a randomized experiment with significant amounts of noncompliance and attrition.6 For instance, 13% of the 11,601 students switched from treatment groups to the control group or vice versa (i.e., 13% of the sample are defiers), about 45% of the students either entered the program after kindergarten or left before the program ended, and about 5% of the participating students had missing variables during the STAR program. Moreover, potential differential attrition across treatment and control groups appears in the basic statistics, which show that a larger fraction of students left the sample from regular and regular-with-aide classes than from small classes in all grades from K to 3. As reported in Table 2.27, the difference in average scores between students who stayed and students who dropped out is greater in small classes than in regular and regular-with-aide classes for both first and second grades, although the differences are small. This implies that a relatively greater proportion of better-performing students left the program from regular and regular-with-aide classes than from small classes. This is informal evidence of differential drop-out rates of students across different types of classes and suggests possible overestimation of the effect of CSR on SAT scores. Therefore, the MCAR assumption is possibly violated and a standard complete-case estimation is probably inappropriate.7

5 Nonparametric estimation of the probability of selection is also available from Hirano et al. [2003], but it suffers from the curse of dimensionality in practical applications (Hall and Presnell [1999]).

Our application of the proposed tests and estimations to Project STAR data proceeds as follows. First, we apply formal tests of differential attrition across class types, which leads to violation of the MCAR assumption.
We test the MCAR assumption using Hausman-type tests and regression-based variable addition tests using the FD and FE estimators. In the application with Project STAR, both the Hausman-type tests and the regression-based variable addition tests reject MCAR, and the IPW and MI-IPW methods under MAR are proposed to overcome the endogeneity problem caused by the violation of MCAR. Meanwhile, we cannot directly test the null of weak exogeneity for the pooled LS estimator. This is because, for s_it = 0 (see (2.2) for the definition), we do not have observations for either the outcome or some covariates, so we cannot estimate E(u_it | s_it, x_it) directly (see (2.1) for the definitions of the variables).

6 Tennessee's Project STAR is a longitudinal data set with a single cohort of 11,601 students in grades K-3. Missing data information for STAR is reported in table 2.26.
7 Further details of the findings from unweighted complete-case estimation are provided in Word et al. [1990] and Krueger [1999].

Second, we use the LS, reduced-form (IV)8 and first-differencing estimators with the IPW and MI-IPW methods in a linear unbalanced panel data model to estimate the effect of CSR on SAT scores for grades K-3 using Tennessee's Project STAR data. Specifically, we apply the IPW method using the logit (or probit) model, in which all observed past covariates and outcomes are adopted as predictors of the probability of selection, based on Becker [1993]'s school choice model. In the application, the estimated probability weights are strictly greater than 0.2, so that we avoid the criticism that the IPW method is unstable when probability weights are too close to zero. Estimates from multiple imputation by chained equations are also obtained and compared to the IPW estimates. The estimates of the effect of CSR on SAT scores are about 4 to 6.5 percent for IPW and MI-IPW, while the unweighted estimates of the return to CSR are 5 to 7.5 percent.
Unweighted complete-case estimators overestimate the effect by about 1 to 2 percentage points. A direction-of-bias analysis for pooled estimators also suggests that unweighted complete-case estimators exhibit upward bias under an assumption of positive correlation between E(s_it | x_it) and E(s_it+1 | x_it).

8 Reduced-form estimation (IV hereafter) refers to LS estimation where the initial assignment of treatments is used as covariates instead of the actual treatments, as in Krueger [1999].

The rest of the chapter is organized as follows. In section 2, we provide the pooled LS, FE and FD estimators in an unbalanced linear panel data model. Section 3 introduces robust Hausman-type tests and variable addition tests of the MCAR assumption and shows the power of the tests. In section 4, we describe the IPW and MI-IPW methods for dealing with missing data under MAR and use Monte Carlo (MC) simulation to analyze the robustness of the IPW method to misspecification of the selection model. Section 5 illustrates the tests of MCAR and the IPW and MI-IPW estimations under MAR using Project STAR in the analysis of the effect of CSR on SAT scores for students in grades K-3. Section 6 provides a method with which to calculate the direction of the bias when the MAR assumption is violated. Section 7 contains concluding remarks.

2.2 Linear panel data model with missing and noncompliance

We are mainly interested in the estimation of the marginal treatment effect (MTE hereafter) of a program in a linear unbalanced panel data model. In particular, we focus on three popular estimators in linear panel data models: the pooled LS, FE and FD estimators.

2.2.1 Model

We introduce the selection indicator (2.2) to express explicitly the conditions for consistency and MCAR for the LS, FE and FD estimators.
$$y_{it} = d_{it}\beta + x_{1it}\delta + x_{2i}\gamma + f_t + c_i + u_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T \tag{2.1}$$

$$s_{it} = \begin{cases} 1 & \text{if all of } y_{it}, x_{it} \text{ are observed} \\ 0 & \text{otherwise} \end{cases} \tag{2.2}$$

where y_it is an outcome, d_it is a vector of indicators for treatments, x_1it is a vector of time-varying covariates, x_2i is a vector of time-constant covariates, f_t is a time effect, c_i is unobserved heterogeneity and u_it is an idiosyncratic error. Let x_it = (d_it, x_1it, x_2i, f_t), θ = (β′, δ′, γ′, 1)′ and v_it = c_i + u_it. Then we can rewrite (2.1) as (2.3).

$$y_{it} = x_{it}\theta + v_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T \tag{2.3}$$

We can express the pooled LS, FE and FD estimators with the selection indicator as (2.4), (2.5) and (2.6), respectively, and explicitly state the conditions for MCAR.

$$\hat\theta_{pls} = \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\, x_{it}' x_{it}\Big)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\, x_{it}' y_{it} \tag{2.4}$$

$$\hat\theta_{FE} = \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it}\Big)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{y}_{it} \tag{2.5}$$

where $\ddot{x}_{it} = x_{it} - \bar{x}_i$, $\bar{x}_i = \frac{1}{T_i}\sum_{t=1}^{T} s_{it} x_{it}$ and $T_i = \sum_{t=1}^{T} s_{it}$.

$$\hat\theta_{FD} = \Big(\sum_{i=1}^{N}\sum_{t=2}^{T} s^{FD}_{it}\, \Delta x_{it}' \Delta x_{it}\Big)^{-1} \sum_{i=1}^{N}\sum_{t=2}^{T} s^{FD}_{it}\, \Delta x_{it}' \Delta y_{it} \tag{2.6}$$

where $\Delta x_{it} = x_{it} - x_{i,t-1}$ for t = 2, 3, ..., T and $s^{FD}_{it} = \min\{s_{it}, s_{i,t-1}\}$.

Conditions for consistency with missing data

Pooled Least Squares

For the pooled least squares (PLS) estimator, we essentially need the rank condition (2.7) and the exogeneity condition (2.8) to obtain a consistent estimator.

$$\mathrm{rank}\, E(s_{it}\, x_{it}' x_{it}) = \dim(\theta), \quad t = 1, 2, \ldots, T \tag{2.7}$$

$$E(s_{it}\, x_{it}' v_{it}) = 0, \quad t = 1, 2, \ldots, T \tag{2.8}$$

Fixed effects

The rank condition and the exogeneity condition for the fixed-effects estimator are (2.9) and (2.10), respectively.

$$\mathrm{rank}\, E(s_{it}\, \ddot{x}_{it}' \ddot{x}_{it}) = \dim(\theta), \quad t = 1, 2, \ldots, T \tag{2.9}$$

$$E(s_{it}\, \ddot{x}_{it}' \ddot{u}_{it}) = 0, \quad t = 1, 2, \ldots, T \tag{2.10}$$

First differencing

The rank condition and the exogeneity condition for the first-differencing estimator are (2.11) and (2.12), respectively.
$$ \operatorname{rank} E(s^{FD}_{it}\, \Delta x_{it}' \Delta x_{it}) = \dim(\theta), \quad t = 2, \dots, T \tag{2.11} $$

$$ E(s^{FD}_{it}\, \Delta x_{it}' \Delta u_{it}) = 0, \quad t = 2, \dots, T \tag{2.12} $$

Exogeneity assumption
Strict exogeneity (2.13) is sufficient for the exogeneity conditions of the FE and FD estimators in (2.10) and (2.12), while weak exogeneity (2.15) is sufficient for the pooled estimator.

$$ E(u_{it} \mid s_i, x_i) = 0, \quad t = 1, 2, \dots, T \tag{2.13} $$

$$ \text{where } s_i = (s_{i1}, s_{i2}, \dots, s_{iT}) \text{ and } x_i = (x_{i1}, x_{i2}, \dots, x_{iT}) \tag{2.14} $$

$$ E(u_{it} \mid s_{it}, x_{it}) = 0 \text{ and } E(c_i \mid s_{it}, x_{it}) = 0, \quad t = 1, 2, \dots, T \tag{2.15} $$

The pooled estimator does not need strict exogeneity, but it requires no correlation between selection and unobserved heterogeneity.

2.2.2 Tests of missing completely at random

If missing completely at random (MCAR) is satisfied, all three estimators should be consistent in a randomized experiment.[9] We suggest Hausman-type tests whose null hypotheses are strict exogeneity and $E(c_i \mid s_{it}, x_{it}) = 0$ from section 2.2.1.

MCAR
In a randomized experiment, MCAR implies both weak and strict exogeneity, so a violation of either weak or strict exogeneity implies a violation of MCAR. Moreover, with proper rank conditions, the MCAR assumption is sufficient for the pooled LS, FE and FD estimators all to be consistent. A violation of weak exogeneity makes the unweighted pooled LS estimator inconsistent, and a violation of strict exogeneity makes the unweighted FE and FD estimators inconsistent.

Null I: strict exogeneity
Consider a case in which the violation of strict exogeneity is the only source of bias. Under strict exogeneity, both the FE and FD estimators are consistent, and the difference between the FE and FD estimates should be entirely due to sampling error. We extend the robust Hausman test that compares FE and FD estimators in balanced panel data, as in Wooldridge [2010, p. 324-325], to the unbalanced panel data model by constructing a Wald statistic based on $\sqrt{n}(\hat\beta_{FE} - \hat\beta_{FD})$. Rejection of the null hypothesis implies a violation of strict exogeneity ($Cov(u_{it}, s_i d_i) \neq 0$), in which case both the FE and FD estimators are inconsistent.
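As a concrete illustration of the complete-case estimators in (2.4), (2.5) and (2.6), the following is a minimal numpy sketch. The function names, the array layout (`y` of shape (N, T), `X` of shape (N, T, k), `s` a 0/1 array of shape (N, T)) and the zero-filling of unobserved cells are assumptions made here for illustration; they are not taken from the original study.

```python
import numpy as np

def _zero_fill(y, X, s):
    # Replace unobserved cells by 0 so that they drop out once multiplied by s_it.
    return np.where(s == 1, y, 0.0), np.where(s[..., None] == 1, X, 0.0)

def _weighted_ls(y, X, w):
    # Solves (sum_i sum_t w_it x_it' x_it) theta = sum_i sum_t w_it x_it' y_it.
    Xw = w[..., None] * X
    A = np.einsum("itk,itl->kl", Xw, X)
    b = np.einsum("itk,it->k", Xw, y)
    return np.linalg.solve(A, b)

def pooled_ls(y, X, s):
    """Pooled LS on complete cases, as in (2.4)."""
    y, X = _zero_fill(y, X, s)
    return _weighted_ls(y, X, s)

def fixed_effects(y, X, s):
    """FE estimator as in (2.5): demean over the T_i observed periods of each unit."""
    y, X = _zero_fill(y, X, s)
    Ti = np.maximum(s.sum(axis=1, keepdims=True), 1)   # guard against T_i = 0
    ybar = (s * y).sum(axis=1, keepdims=True) / Ti
    xbar = (s[..., None] * X).sum(axis=1, keepdims=True) / Ti[..., None]
    return _weighted_ls(y - ybar, X - xbar, s)

def first_difference(y, X, s):
    """FD estimator as in (2.6), with s^FD_it = min(s_it, s_it-1)."""
    y, X = _zero_fill(y, X, s)
    s_fd = np.minimum(s[:, 1:], s[:, :-1])
    return _weighted_ls(np.diff(y, axis=1), np.diff(X, axis=1), s_fd)
```

Note that `X` should contain only time-varying regressors when passed to `fixed_effects` or `first_difference`, since time-constant columns are wiped out by both transformations.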
In short, MCAR is violated and the unweighted FE and FD estimators are inconsistent.

Singularity
Since FE and FD cannot deliver estimates of the coefficients on time-constant covariates, the set of parameters used to construct the Wald statistic should exclude the coefficients on time-constant variables. Thus, in typical applications, the indicators for the treatments of a program should vary over time in order to apply our method of comparing two estimators that are consistent under the null.

Null II: no correlation between selection and unobserved heterogeneity
We propose a subsequent test for the case in which we cannot reject the null of strict exogeneity. We assume that strict exogeneity holds both under the null and under the alternative. Therefore, under the null, all conditions for the consistency of the pooled, FE and FD estimators are satisfied, while under the alternative $E(c_i \mid s_{it}, x_{it}) = 0$ is not satisfied. The fixed-effects and first-differencing estimators remain consistent under both the null and the alternative, while the pooled estimator is consistent only under the null.[10] Thus, we can test the null hypothesis of MCAR, taking as the alternative a differential correlation between selection and unobserved heterogeneity across treatments and control. We construct a Wald statistic based on $\sqrt{n}(\hat\beta_{pls} - \hat\beta_{FE})$ to perform a test of MCAR. Rejection of the null implies a violation of $E(c_i \mid s_{it}, x_{it}) = 0$ (that is, $Cov(c_i, s_{it} x_{it}) \neq 0$), and this makes the pooled estimator inconsistent. Under ideal conditions and $E(c_i \mid s_{it}, x_{it}) \neq 0$, MCAR is violated and the unweighted pooled estimator is inconsistent, whereas for the FE- and FD-transformed linear models the MCAR assumption holds and the unweighted FE and FD estimators are consistent.

[9] See Wooldridge [2009] for more discussion of the equivalence of the three estimators under the null.

[10] The strict exogeneity assumption is maintained both under the null and under the alternative.
We also assume that the rank conditions are satisfied for all estimators we consider in this section.

2.2.3 Implications from the rejection of tests

Rejection of strict exogeneity implies that both unweighted complete-case FE and FD estimators are inconsistent. If we fail to reject strict exogeneity but reject $Cov(c_i, s_{it} \mid x_{it}) = 0$, the unweighted complete-case pooled estimator is inconsistent, while we cannot reject the null that the unweighted complete-case FE and FD estimators are consistent.[11] Rejection of both tests implies that the missing indicator is probably correlated with both time-varying and time-constant unobserved variables even conditional on observed variables ($c_i$ and $u_i$ are correlated with $s_{it} x_{it}$), so that MCAR is violated and the unweighted complete-case FE and FD estimators are invalid. Therefore, we need to account for missing data to produce a valid statistical analysis. Neither the FD nor the FE transformation is enough to obtain valid inference, since the correlation between $u_i$ and $s_i x_i$ is not zero.

[11] Under the null of these two Hausman-type tests, all other conditions for consistency, including the rank condition and other regularity conditions, are assumed to be satisfied.

2.3 Formal tests of MCAR

In this section, we introduce the test statistics used in the robust Hausman tests of the MCAR assumption and investigate the power of the tests via Monte Carlo experiments.

2.3.1 Hausman-type test I - Comparison of θpls and θFE using E(ci, sit xit) = 0 as null

We construct a test statistic for a Hausman-type test of the null of no correlation between unobserved heterogeneity and missing data. First, we stack the two estimators $\hat\theta_{pls}$ and $\hat\theta_{FE}$. We denote the stacked estimator by $\hat\theta_a = (\hat\theta_{pls}', \hat\theta_{FE}')'$ and the $k \times 2k$ restriction matrix by $R$.

$$ \sqrt{n}\,\hat\theta_a = \begin{bmatrix} \Big( \frac{1}{n} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, x_{it}' x_{it} \Big)^{-1} & 0 \\ 0 & \Big( \frac{1}{n} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it} \Big)^{-1} \end{bmatrix}_{2k \times 2k} \begin{bmatrix} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, x_{it}' y_{it} \\ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{y}_{it} \end{bmatrix}_{2k \times 1} $$

and $R = [I_k \mid -I_k]$. We write the null hypothesis as (2.16); it implies that under the null both the pooled LS and FE estimators are consistent.

$$ H_0 : R\theta_a = 0 \tag{2.16} $$

$$ R\theta_a = \underbrace{\begin{bmatrix} I_k & -I_k \end{bmatrix}}_{k \times 2k} \underbrace{\begin{bmatrix} \theta_{1,ls} \\ \vdots \\ \theta_{k,ls} \\ \theta_{1,FE} \\ \vdots \\ \theta_{k,FE} \end{bmatrix}}_{2k \times 1} = \underbrace{\begin{bmatrix} \theta_{1,ls} - \theta_{1,FE} \\ \theta_{2,ls} - \theta_{2,FE} \\ \vdots \\ \theta_{k,ls} - \theta_{k,FE} \end{bmatrix}}_{k \times 1} = 0 $$

Using the null $H_0 : R\theta_a = 0$ [12] and the consistency conditions in section 2.2.1, we construct a Wald statistic that converges asymptotically to a chi-squared distribution with degrees of freedom equal to the number of restrictions, $k$, in (2.16).

[12] This is equivalent to $H_0 : \theta_{j,PLS} = \theta_{j,FE}$ for all $j = 1, 2, \dots, k$.

Assumptions
1. Conditional strict exogeneity: $E(u_{it} \mid x_i, s_i) = 0$ for all $t$.
2. No correlation between unobserved heterogeneity and missing data: $E(c_i \mid x_{it}, s_{it}) = 0$.
3. Rank conditions: $E(\sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it})$ and $E(\sum_{t=1}^{T} s_{it}\, x_{it}' x_{it})$ have full rank, with
$$ \operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, x_{it}' x_{it} = E\Big( \sum_{t=1}^{T} s_{it}\, x_{it}' x_{it} \Big), \qquad \operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it} = E\Big( \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it} \Big) $$
4. Conditions for applying a CLT to the following two objects: $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, x_{it}' u_{it}$ and $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{u}_{it}$.

A Wald statistic for a robust Hausman test

Proposition 2. Under assumptions 1, 2, 3 and 4, for the test with null hypothesis $H_0 : R\theta_a = 0$, we can define a Wald-type statistic $W_a$ that converges to a $\chi^2_k$ distribution:

$$ W_a = [R\sqrt{n}\,\hat\theta_a]'\, [R\, \operatorname{var}(\sqrt{n}\,\hat\theta_a)\, R']^{-1}\, R\sqrt{n}\,\hat\theta_a \sim \chi^2_k $$

where $R\theta_a = (\theta_{1,PLS} - \theta_{1,FE},\; \theta_{2,PLS} - \theta_{2,FE},\; \dots,\; \theta_{k,PLS} - \theta_{k,FE})'$ is the $k \times 1$ vector of differences defined in (2.16).

Proof. See appendix.

2.3.2 Monte Carlo experiment I

We investigate the power of the test using Monte Carlo simulation. We generate a selection process that reduces the full sample to either 75 or 50 percent of its size. The generated selection process induces a differential correlation between unobserved heterogeneity and selection across treatment and control. Thus, this particular selection process preserves the asymptotic consistency of the FE and FD estimators. We focus on how the power of the test changes as the fraction of complete-case data to full data changes to 75 and 50 percent.

Data generating process
Data generation is based on the following outcome and selection processes.

The model for DGP

$$ y_{it} = d_{it}\alpha + x_{it}\beta + c_i + u_{it}, \quad i = 1, 2, \dots, n,\; t = 1, 2, 3, 4 $$

where $d_{it}$ is a binary variable and $x_{it}$ is a continuous variable.

$$ s_{i1} = 1; \quad s_{it} = 1\Big[ \Phi\Big( d_{it}\gamma + \frac{w_i}{\sqrt{2}} + \frac{e_{it}}{2} \Big) > a \Big], \quad \gamma = 1,\; i = 1, 2, \dots, n,\; t = 2, 3, 4 $$

Variables generation
1. $d^*_{it} = \Phi\big( \frac{h_{it} + v_i}{\sqrt{2}} \big)$, where $\Phi(\cdot)$ is the standard normal CDF; $d_{it} = 1$ if $d^*_{it} > 0.7$. $\alpha = 1$, $h_{it} \sim iid\, N(0, 1)$ and $v_i \sim iid\, N(0, 1)$.
2. $x_{it} \sim iid\, N(0, 1)$, $\beta = 0.2$.
3. $u_{it} \sim iid\, N(0, 1)$.
4. $c_i = \frac{\sqrt{3}\, w_i}{2} + \frac{a_i}{2}$, $w_i \sim iid\, N(0, 1)$ and $a_i \sim iid\, N(0, 1)$.
5. $s^*_{it} = \Phi\big( d_{it}\gamma + \frac{w_i}{\sqrt{2}} + \frac{e_{it}}{2} \big)$, where $\gamma = 1$; $s_{it} = 1$ if $s^*_{it} > a$, where $a$ determines the proportion of missing data.

Table 2.1. Hausman-type test I: Null and Alternative (cov(sit dit, ci) = 0)

              Pooled LS      FE estimator
Null          Consistent     Consistent
Alternative   Inconsistent   Consistent

Table 2.2.
Hausman-type test I: Null and Alternative (cov(sit dit, ci) = 0)

              RE estimator               FE estimator
Null          Consistent and Efficient   Consistent
Alternative   Inconsistent               Consistent

With this data generating process, the bias of the pooled LS estimator for α originates from $w_i$, which causes a differential correlation between $s_{it}$ and $c_i$ across $d_{it}$, whereas the FE transformation eliminates $c_i$, so the FE estimator for α is consistent. This particular selection process leads to general missing. The size and power of the Hausman-type test I are based on estimators with the properties listed in Table 2.1 under the null and the alternative. Because the DGP satisfies the conditions under which the random effects (RE) estimator is ideal under the null, we can also carry out the Hausman-type test I by stacking the RE and FE estimators. In this case, the size and power of the Hausman-type test I are based on estimators with the properties listed in Table 2.2 under the null and the alternative.

Simulation results
MC simulations are performed with 1,000 replications, with T = 4 and n ranging from 100 to 5,000, and cluster-robust standard errors are used in all simulations. We choose T = 4 to mimic the actual sample size of Tennessee's Project STAR in the application. In the DGP, a determines the missing proportion of data. Therefore, a = 0 implies no missing data, so the panel is balanced, while a = 0.5 implies that 50 percent of the data is missing and only 25 percent of the data is complete case. The third and fourth columns in Tables 2.3 and 2.4 report the means of the pooled LS (or RE) and FE estimates for α, respectively. The p-value column shows the mean p-value of the Wald statistic $W_a$ for the Hausman-type test I. The 1(P < .05) column shows the rejection frequency of the null at the nominal level 0.05. Finally, the 95-CI column reports the 95 percent confidence interval for that rejection frequency. In both Table 2.3 and Table 2.4, the power of the test increases with the cross-sectional size, n, for all a > 0.
For example, for data of the same size as the Project STAR data, for which n = 6,800 in each grade and a = 0.5 (the missing fraction of data), the power of the test is 1 in both Table 2.3 and Table 2.4. Overall, if n is greater than 5,000, the power of the test is 1 regardless of the value of a (the fraction of non-complete data to full data) in the simulation. In short, the robust Hausman-type test based on the difference between pooled LS and FE is quite reliable as a test of MCAR for n > 5,000. As we see in Tables 2.3 and 2.4, under the alternative DGPs, both the bias and the standard error are smaller for the RE estimator than for the pooled estimator. In both tables, the panel with a = 0 shows the results for balanced panel data; it reports no size distortion for any n, and the confidence interval for the rejection frequency contains the nominal level, 0.05, for all n.

Table 2.3. Pooled LS and FE estimates for α and p-value for Wa

a = 0
n        True α   Pooled   FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        1.003    1.002   0.502     0.054          (0.040,0.068)
500      1        1.003    1.002   0.510     0.062          (0.047,0.076)
1,000    1        0.996    0.998   0.496     0.047          (0.034,0.060)
2,000    1        0.999    1.001   0.490     0.047          (0.034,0.060)
5,000    1        0.998    1.000   0.510     0.048          (0.035,0.061)

a = 0.25
n        True α   Pooled   FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        0.810    1.000   0.374     0.155          (0.132,0.177)
500      1        0.812    1.000   0.132     0.533          (0.502,0.564)
1,000    1        0.804    0.997   0.033     0.846          (0.823,0.868)
2,000    1        0.810    1.001   0.002     0.990          (0.984,0.996)
5,000    1        0.807    0.999   0.000     1              (1,1)

a = 0.5
n        True α   Pooled   FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        0.758    1.002   0.348     0.182          (0.158,0.206)
500      1        0.754    1.000   0.089     0.664          (0.635,0.693)
1,000    1        0.747    0.999   0.014     0.936          (0.920,0.951)
2,000    1        0.751    0.999   0.001     0.995          (0.991,0.999)
5,000    1        0.750    1.000   0.000     1              (1,1)

*Note: 1(P<.05) is the rejection rate for the null of H0: α=1 with nominal value 0.05; a is the ratio of missing sample to full sample; cluster-robust standard errors have been used in all estimations; p-value is the mean of the p-value for Wa, r is the rejection rate, and 95-CI is the coverage for r.

Table 2.4. RE and FE estimates for α and p-value for Wa

a = 0
n        True α   RE      FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        .999    .999    0.512     0.051          (0.037,0.065)
500      1        1.000   1.000   0.495     0.055          (0.040,0.069)
1,000    1        1.000   1.001   0.497     0.050          (0.036,0.064)
2,000    1        1.001   1.002   0.507     0.054          (0.040,0.068)
5,000    1        1.000   1.000   0.496     0.045          (0.032,0.058)

a = 0.25
n        True α   RE      FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        0.960   1.000   0.397     0.141          (0.119,0.163)
500      1        0.962   1.001   0.166     0.489          (0.458,0.520)
1,000    1        0.958   0.998   0.055     0.768          (0.742,0.794)
2,000    1        0.962   1.001   0.007     0.965          (0.954,0.976)
5,000    1        0.960   1.000   0.000     1              (1,1)

a = 0.5
n        True α   RE      FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        0.973   1.005   0.433     0.119          (0.099,0.139)
500      1        0.968   1.000   0.225     0.325          (0.296,0.354)
1,000    1        0.967   1.001   0.105     0.616          (0.586,0.646)
2,000    1        0.965   0.999   0.026     0.879          (0.859,0.899)
5,000    1        0.967   1.001   0.000     1              (1,1)

*Note: 1(P<.05) is the rejection rate for the null of H0: α=1 with nominal value 0.05; a is the ratio of missing sample to full sample; cluster-robust standard errors have been used in all estimations; p-value is the mean of the p-value for Wa, r is the rejection rate, and 95-CI is the coverage for r.

2.3.3 Hausman-type test II - Comparison of θFE and θFD, Null: strict exogeneity

We derive a test statistic for a Hausman-type test of strict exogeneity. We stack the two estimators $\hat\theta_{FE}$ and $\hat\theta_{FD}$, which are unbiased under strict exogeneity, and denote the stacked estimator by $\hat\theta_b = (\hat\theta_{FE}', \hat\theta_{FD}')'$ and the $k \times 2k$ restriction matrix by $R$.

$$ \sqrt{n}\,\hat\theta_b = \begin{bmatrix} \Big( \frac{1}{n} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it} \Big)^{-1} & 0 \\ 0 & \Big( \frac{1}{n} \sum_{i=1}^{n} \sum_{t=2}^{T} s^{FD}_{it}\, \Delta x_{it}' \Delta x_{it} \Big)^{-1} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{y}_{it} \\ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=2}^{T} s^{FD}_{it}\, \Delta x_{it}' \Delta y_{it} \end{bmatrix} $$

Using the null $H_0 : R\theta_b = 0$ and the assumptions for consistency in section 2.2.1, we can construct a Wald statistic that converges asymptotically to a chi-squared distribution under the following assumptions.

Assumptions
1. Conditional strict exogeneity: $E(u_{it} \mid x_i, s_i) = 0$ for all $t$.
2. Rank conditions: $E(\sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it})$ and $E(\sum_{t=2}^{T} s^{FD}_{it}\, \Delta x_{it}' \Delta x_{it})$ have full rank, with
$$ \operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it} = E\Big( \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{x}_{it} \Big), \qquad \operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \sum_{t=2}^{T} s^{FD}_{it}\, \Delta x_{it}' \Delta x_{it} = E\Big( \sum_{t=2}^{T} s^{FD}_{it}\, \Delta x_{it}' \Delta x_{it} \Big) $$
3. Conditions for applying a CLT to the following two objects: $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=1}^{T} s_{it}\, \ddot{x}_{it}' \ddot{u}_{it}$ and $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=2}^{T} s^{FD}_{it}\, \Delta x_{it}' \Delta u_{it}$.

Proposition 3. Under assumptions 1, 2 and 3, for the test with null hypothesis $H_0 : R\theta_b = 0$, we can derive a Wald-type statistic $W_b$ that converges to a $\chi^2_k$ distribution:

$$ W_b = [R\sqrt{n}\,\hat\theta_b]'\, [R\, \operatorname{var}(\sqrt{n}\,\hat\theta_b)\, R']^{-1}\, R\sqrt{n}\,\hat\theta_b \sim \chi^2_k $$

Proof. See appendix. It can be obtained in a similar manner to the Hausman-type test I.

Singularity
A test using the FE and FD estimators can only include the coefficients on time-varying covariates.[13]

[13] See Wooldridge [2010] for more discussion.

2.3.4 Monte Carlo experiment: Test of MCAR using strict exogeneity assumption

We investigate the power of the test for DGPs with selection processes that reduce the full sample to either 75, 50 or 25 percent of its size and induce a differential correlation between selection and idiosyncratic errors across $d_{it}$ (treatment and control). Thus, these selection processes violate the strict exogeneity assumption. We focus on how the power of the test changes as we increase the missing proportion of data from 25 percent to 75 percent.

Data generating process
Data generation is based on the following outcome and selection processes.

The model for DGP

$$ y_{it} = d_{it}\alpha + x_{it}\beta + f_t + c_i + u_{it} \tag{2.17} $$

where $i = 1, 2, \dots, n$ and $t = 1, 2, 3, 4$.

$$ s_{i1} = 1; \quad s_{it} = 1 \text{ if } \Phi\Big( \frac{h_{it} + w_i + e_{it}}{\sqrt{3}} \Big) > \text{cutoff}, \text{ and } s_{it} = 0 \text{ otherwise} \tag{2.18} $$

Variables generation
1. $d^*_{it} = \Phi\big( \frac{h_{it} + v_i}{\sqrt{2}} \big)$, where $\Phi(\cdot)$ is the standard normal CDF; $d_{it} = 1$ if $d^*_{it} > 0.5$ and $d_{it} = 0$ otherwise. $\alpha = 1$, $h_{it} \sim iid\, N(0, 1)$ and $v_i \sim iid\, N(0, 1)$.
2. $\begin{pmatrix} u_{it} \\ e_{it} \end{pmatrix} \sim BN\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$, where BN is bivariate normal and $\rho = 0.7$ in the simulation.
3. $x_{it} \sim iid\, N(0, 1)$, $\beta = -0.2$.
4. $f_t \sim uniform(-1, 1)$.
5. $c_i = \frac{\sqrt{3}\, w_i}{2} + \frac{v_i}{2}$, $w_i \sim iid\, N(0, 1)$ and $v_i \sim iid\, N(0, 1)$.
6. $s^*_{it} = \Phi\big( \frac{h_{it} + w_i + e_{it}}{\sqrt{3}} \big)$; $s_{it} = 1$ if $s^*_{it} > a$ and $s_{it} = 0$ otherwise, where $a$ determines the proportion of missing data since $s^*_{it}$ is a normal CDF.

Neither the FE nor the FD estimator for α is valid, because of a differential correlation between $s_{it}$ and $u_{it}$ across $d_{it}$: $s_{it} d_{it}$ is correlated with $u_{it}$, and the source of the differential correlation, $h_{it}$, is time-varying. The differential correlation remains even after the FE and FD transformations, so neither the FE nor the FD estimator is reliable for unbalanced panel data with a > 0.

Simulation results
Table 2.5 reports the size and power of the Hausman-type test II. The simulation is performed with 1,000 replications, with n ranging from 100 to 5,000 and T = 4. First, from the a = 0 panel, we see in the column r ≡ 1(P<.05) that there is no size distortion and that the confidence interval for the rejection frequency contains the nominal level 0.05 for all n from 100 to 5,000. Second, the power of the test increases with the proportion of missing data, a, and with the cross-sectional sample size, n. Third, for Project STAR data with a = 0.5 and n = 6,000, the power of the test is about 0.6. Therefore, it is quite possible that we fail to reject the strict exogeneity assumption even if selection and time-varying unobserved variables are correlated, since finite-sample power is strictly less than 1.
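To make the construction of $W_b$ concrete, the following is a minimal numpy sketch of a Wald statistic that compares complete-case FE and FD estimates using a cluster-robust variance for the stacked estimator. It is an illustration under stated assumptions, not the author's implementation: the function names, the array layout (`y` of shape (n, T), `X` of shape (n, T, k), `s` a 0/1 array) and the use of per-unit influence terms to estimate the joint variance are choices made here. The $\sqrt{n}$ scaling cancels in the quadratic form, so the code works with unscaled quantities.

```python
import numpy as np

def _fit(y, X, w):
    """Weighted LS; returns coefficients and per-unit influence terms A^{-1} g_i."""
    Xw = w[..., None] * X
    A = np.einsum("itk,itl->kl", Xw, X)
    theta = np.linalg.solve(A, np.einsum("itk,it->k", Xw, y))
    resid = y - X @ theta
    g = np.einsum("itk,it->ik", Xw, resid)      # g_i = sum_t w_it x_it' e_it
    return theta, g @ np.linalg.inv(A)          # A is symmetric

def hausman_fe_fd(y, X, s):
    """Wald statistic for H0: theta_FE = theta_FD (compare with a chi2_k critical value)."""
    y = np.where(s == 1, y, 0.0)                # zero-fill unobserved cells
    X = np.where(s[..., None] == 1, X, 0.0)
    Ti = np.maximum(s.sum(axis=1, keepdims=True), 1)
    ybar = (s * y).sum(axis=1, keepdims=True) / Ti
    xbar = (s[..., None] * X).sum(axis=1, keepdims=True) / Ti[..., None]
    th_fe, psi_fe = _fit(y - ybar, X - xbar, s)
    s_fd = np.minimum(s[:, 1:], s[:, :-1])      # s^FD_it = min(s_it, s_it-1)
    th_fd, psi_fd = _fit(np.diff(y, axis=1), np.diff(X, axis=1), s_fd)
    psi = np.hstack([psi_fe, psi_fd])           # (n, 2k) stacked influence terms
    V = psi.T @ psi                             # cluster-robust variance of (th_fe, th_fd)
    k = X.shape[2]
    R = np.hstack([np.eye(k), -np.eye(k)])      # R = [I_k | -I_k]
    diff = R @ np.concatenate([th_fe, th_fd])
    W = float(diff @ np.linalg.solve(R @ V @ R.T, diff))
    return W, k, th_fe, th_fd
```

With T = 2 the FE and FD estimators coincide and $R V R'$ is singular, so the sketch is meaningful only for T of at least 3. Replacing the FD block with a pooled LS fit would give the analogous statistic for test I.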
2.3.5 Variable addition test of strict exogeneity

Wooldridge [2009] proposed a simple regression-based test of strict exogeneity for FE and FD estimators in unbalanced panel data. The test simply adds the selection indicator $s_{it+1}$ as an additional regressor in the linear regression model for FD and FE estimation. The t-statistic for the coefficient λ on $s_{it+1}$ in (2.19) and (2.20) can then be used in a t-test of the null hypothesis $H_0 : \lambda = 0$. This is because strict exogeneity implies that any function of $s_i = (s_{i1}, s_{i2}, \dots, s_{iT})$ should be uncorrelated with $u_i = (u_{i1}, u_{i2}, \dots, u_{iT})$ conditional on $x_i$. Rejection of the test implies a violation of strict exogeneity, and a violation of strict exogeneity is sufficient for a violation of the MCAR assumption, so both unweighted FE and FD estimators are invalid.

Table 2.5. FD and FE estimates for α and p-value for Wb

a = 0
n        True α   FD      FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        0.994   1.002   0.492     0.061          (0.046,0.076)
500      1        1.001   1.002   0.504     0.045          (0.032,0.058)
1,000    1        0.998   0.998   0.513     0.047          (0.034,0.060)
2,000    1        0.999   0.999   0.499     0.044          (0.031,0.057)
5,000    1        1.001   1.000   0.507     0.054          (0.040,0.068)

a = 0.25
n        True α   FD      FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        0.831   0.849   0.487     0.078          (0.061,0.095)
500      1        0.834   0.854   0.469     0.061          (0.046,0.076)
1,000    1        0.829   0.849   0.456     0.082          (0.065,0.099)
2,000    1        0.831   0.850   0.418     0.121          (0.101,0.141)
5,000    1        0.830   0.850   0.311     0.263          (0.236,0.290)

a = 0.5
n        True α   FD      FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        0.749   0.802   0.457     0.087          (0.070,0.104)
500      1        0.743   0.803   0.388     0.167          (0.144,0.190)
1,000    1        0.752   0.802   0.361     0.205          (0.180,0.230)
2,000    1        0.753   0.808   0.259     0.352          (0.322,0.382)
5,000    1        0.751   0.807   0.162     0.599          (0.569,0.629)

a = 0.75
n        True α   FD      FE      p-value   r ≡ 1(P<.05)   95-CI
100      1        0.725   0.879   0.409     0.126          (0.105,0.147)
500      1        0.717   0.869   0.296     0.320          (0.291,0.349)
1,000    1        0.708   0.869   0.218     0.495          (0.464,0.526)
2,000    1        0.711   0.869   0.159     0.611          (0.581,0.641)
5,000    1        0.712   0.866   0.106     0.752          (0.725,0.779)

*Note: the 1(P < .05) column shows the rejection frequency of the null at the nominal level 0.05; a is the ratio of missing sample to full sample; cluster-robust standard errors have been used in all estimations; p-value is the mean of the p-value for Wb. The FD and FE columns report the means of the FD and FE estimates for α, respectively. Finally, the 95-CI column reports the coverage for 1(P < .05).

$$ \ddot{y}_{it} = \lambda\, s_{it+1} + \ddot{x}_{it}\theta + \ddot{f}_t + \ddot{u}_{it}, \quad \text{for } s_{it} = 1 \tag{2.19} $$

$$ \Delta y_{it} = \lambda\, \Delta s_{it+1} + \Delta x_{it}\theta + \Delta f_t + \Delta u_{it}, \quad \text{for } s^{FD}_{it} = 1 \tag{2.20} $$

We cannot directly test weak exogeneity with a variable addition test for the pooled estimator in (2.21), since we do not observe all of $x_{it}, y_{it}$ for individual i when $s_{it} = 0$. However, we can obtain indirect evidence from the variable addition test in the pooled LS regression, based on the assumption of non-zero serial correlation in the selection indicator. If we further assume that $E(s_{it} \mid x_{it})$ and $E(s_{it+1} \mid x_{it})$ are correlated, we can use the t-statistic on λ in (2.21) to test weak exogeneity and MCAR.

$$ y_{it} = \lambda\, s_{it+1} + x_{it}\theta + f_t + c_i + u_{it}, \quad \text{for } s_{it} = 1 \tag{2.21} $$

Data generating process
We use the same DGPs as in section 2.3.4, where we study the power of the Hausman-type test of strict exogeneity. We only change the model from (2.17) in section 2.3.4 to (2.19) and (2.20) for the FE and FD estimators. In these DGPs, selection and unobserved heterogeneity or time-varying idiosyncratic errors are differentially correlated across $d_{it}$, so neither the FD nor the FE transformation can eliminate the correlation and neither the FD nor the FE estimator is valid.

The power of variable addition test for strict exogeneity assumption
Table 2.6 reports the estimates for λ in (2.19) and (2.20) for the FE and FD estimators. In all simulations, the number of replications is 1,000, with T = 4 and n ranging from 100 to 5,000, and cluster-robust standard errors are used in all estimations.
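The variable addition regression (2.19) can be sketched as follows. This is a minimal numpy illustration under assumptions made here rather than in the original: the function name, the array layout, the omission of time effects, and the cluster-robust sandwich variance are all illustrative choices. The last period is dropped because $s_{it+1}$ is not defined for it.

```python
import numpy as np

def variable_addition_fe(y, X, s):
    """t-statistic on lambda in a within (FE) regression of y_it on (x_it, s_i,t+1),
    using rows with s_it = 1 for t = 1, ..., T-1, with errors clustered by unit."""
    lead = s[:, 1:].astype(float)                    # s_{i,t+1}
    w = s[:, :-1].astype(float)                      # s_it for t = 1, ..., T-1
    y = np.where(w == 1, y[:, :-1], 0.0)             # zero-fill unobserved cells
    Xo = np.where(w[..., None] == 1, X[:, :-1], 0.0)
    Z = np.concatenate([Xo, lead[..., None]], axis=2)
    Ti = np.maximum(w.sum(axis=1, keepdims=True), 1)
    yd = y - (w * y).sum(axis=1, keepdims=True) / Ti                       # within transform
    Zd = Z - (w[..., None] * Z).sum(axis=1, keepdims=True) / Ti[..., None]
    Zw = w[..., None] * Zd
    A = np.einsum("itk,itl->kl", Zw, Zd)
    coef = np.linalg.solve(A, np.einsum("itk,it->k", Zw, yd))
    g = np.einsum("itk,it->ik", Zw, yd - Zd @ coef)  # per-unit score sums
    Ainv = np.linalg.inv(A)
    V = Ainv @ (g.T @ g) @ Ainv                      # cluster-robust sandwich variance
    return coef[-1] / np.sqrt(V[-1, -1])             # lambda is the last coefficient
```

A large absolute t-statistic is evidence against strict exogeneity. The FD analogue in (2.20) would apply the same steps to first differences with $s^{FD}_{it}$ as the row selector.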
Selection indicators are generated so that the fraction of missing data to full data ranges from a = 0.1 to a = 0.75. For the pooled LS estimator, the power of the test of the null $H_0 : \lambda = 0$ in (2.21) is 1 for all a > 0 and n ≥ 500, but this is not a direct test of MCAR for the pooled LS estimator. However, if we further assume non-zero correlation between $E(s_{it} \mid x_{it})$ and $E(s_{it+1} \mid x_{it})$, this test becomes a valid test of weak exogeneity and MCAR. For both the FE and FD estimators, the power of the test increases with n and a. For instance, for a = 0.5, the power of the test increases from 0.07 to 0.26 for FE estimation and from 0.07 to 0.67 for FD estimation. Still, the power of the test is well below 1 for data of the actual size of Project STAR. The power of the variable addition test is greater for the FD estimator than for the FE estimator with the same n and a. Compared with the robust Hausman-type test of strict exogeneity in the last column of Table 2.6, the power of the variable addition test is greater when a ≤ 0.5 and lower when a = 0.75.

Table 2.6. Test of strict exogeneity: variable addition test and Hausman test for Wb

a = 0.1
         LS                                FD                                FE                                Hausman
n        t-stat   1(P<.05)   95-CI         t-stat   1(P<.05)   95-CI         t-stat   1(P<.05)   95-CI         1(P<.05)
100      3.40     0.92       (0.90,0.94)   -0.13    0.08       (0.06,0.09)   -0.07    0.07       (0.05,0.08)
500      7.30     1          (1,1)         -0.33    0.05       (0.04,0.06)   -0.18    0.05       (0.04,0.06)
1,000    10.35    1          (1,1)         -0.48    0.07       (0.05,0.09)   -0.23    0.06       (0.04,0.07)
2,000    14.54    1          (1,1)         -0.63    0.10       (0.08,0.11)   -0.39    0.07       (0.06,0.09)
5,000    23.04    1          (1,1)         -1.06    0.18       (0.16,0.21)   -0.59    0.07       (0.06,0.09)

a = 0.25
         LS                                FD                                FE                                Hausman
n        t-stat   1(P<.05)   95-CI         t-stat   1(P<.05)   95-CI         t-stat   1(P<.05)   95-CI         1(P<.05)
100      3.66     0.96       (0.95,0.97)   -0.23    0.07       (0.05,0.08)   -0.12    0.06       (0.04,0.07)   0.08
500      8.14     1          (1,1)         -0.66    0.10       (0.08,0.12)   -0.44    0.07       (0.05,0.09)   0.06
1,000    11.58    1          (1,1)         -0.84    0.15       (0.13,0.17)   -0.47    0.08       (0.07,0.10)   0.08
2,000    16.35    1          (1,1)         -1.18    0.22       (0.20,0.25)   -0.67    0.10       (0.08,0.11)   0.12
5,000    25.91    1          (1,1)         -1.85    0.45       (0.42,0.48)   -1.08    0.19       (0.16,0.21)   0.26

a = 0.5
         LS                                FD                                FE                                Hausman
n        t-stat   1(P<.05)   95-CI         t-stat   1(P<.05)   95-CI         t-stat   1(P<.05)   95-CI         1(P<.05)
100      3.52     0.93       (0.92,0.95)   -0.32    0.07       (0.06,0.09)   -0.20    0.07       (0.06,0.09)   0.09
500      7.94     1          (1,1)         -0.79    0.12       (0.10,0.14)   -0.46    0.07       (0.06,0.09)   0.17
1,000    11.21    1          (1,1)         -1.05    0.19       (0.16,0.21)   -0.60    0.09       (0.07,0.11)   0.21
2,000    15.97    1          (1,1)         -1.50    0.31       (0.28,0.34)   -0.86    0.15       (0.13,0.18)   0.35
5,000    25.19    1          (1,1)         -2.37    0.67       (0.64,0.70)   -1.31    0.26       (0.23,0.29)   0.60

a = 0.75
         LS                                FD                                FE                                Hausman
n        t-stat   1(P<.05)   95-CI         t-stat   1(P<.05)   95-CI         t-stat   1(P<.05)   95-CI         1(P<.05)
100      3.04     0.85       (0.83,0.87)   -0.31    0.10       (0.08,0.12)   -0.18    0.09       (0.07,0.11)   0.13
500      6.84     1          (1,1)         -0.64    0.11       (0.09,0.12)   -0.36    0.06       (0.04,0.07)   0.32
1,000    9.62     1          (1,1)         -0.87    0.14       (0.12,0.16)   -0.46    0.09       (0.07,0.11)   0.50
2,000    13.59    1          (1,1)         -1.27    0.25       (0.23,0.28)   -0.70    0.11       (0.09,0.13)   0.61
5,000    21.43    1          (1,1)         -2.04    0.52       (0.49,0.55)   -1.12    0.19       (0.16,0.21)   0.75

*Note: 1(P<.05) is the rejection rate of the null of H0: α=1. a is the ratio of missing sample to full sample.

2.4 Estimations with MAR assumption: IPW and MI-IPW methods

We introduce methods that deal with missing data under MAR when MCAR is violated. MAR assumes that the variables that explain the selection (or missing) process are either included in the outcome model, as covariates or past outcomes, or observed as auxiliary variables.

$$ p(s_{it} = 1 \mid y_{it}, z_{it}) = p(s_{it} = 1 \mid z_{it}) = p(z_{it}) \equiv G_{it} \tag{2.22} $$

where $t = 1, 2, 3$ and $z_{it}$ denotes all observed variables at t. Therefore, under the MAR assumption (2.22), individuals who have the same past outcomes and covariates have the same conditional probability of selection (or missing) status. In particular, we focus on two of the most popular methods for dealing with missing data under the MAR assumption, the IPW and MI methods.[14]

In this section, we first present the justification for the IPW and MI-IPW methods. Then, we examine the finite sample properties of the IPW and MI-IPW methods via Monte Carlo simulation. We consider two DGPs: (i) in the first DGP, the outcome and selection models are both correctly specified, and (ii) in the second, the outcome model is correctly specified but the selection model is misspecified. The simulation with a misspecified selection model provides information on the robustness of IPW estimators of the MTE to a violation of MAR, where the violation of MAR is induced by the selection model misspecification.

[14] Unfortunately, it is not possible to test the MAR assumption, since it is never possible to know that all appropriate observed variables have been included as predictors in the probability of selection model. Therefore, we suggest specifying the selection process on the basis of economic theory, so that the justification for a valid MAR assumption is the economic theory of the selection process.

2.4.1 IPW method

We first consider the IPW method (or the efficient version, the AIPW method) in Scharfstein et al. [1999], Wooldridge [2007] and Tsiatis [2006].
These methods are based on the idea that complete-case data weighted by a valid probability of selection create pseudo-data that mimic a random sample from the population of interest. Therefore, a standard estimation method applied to the inverse probability weighted complete cases provides a consistent estimator. A correctly specified probability of selection model leads to the satisfaction of the MAR assumption, so the IPW estimator is consistent.

Consistency for IPW estimator
We consider a linear model with a selection process as in (2.23) and (2.24), and we are primarily interested in α, the MTE for $d_{it}$.

$$ y_{it} = d_{it}\alpha + x_{1it}\beta + f_{1t} + c_{1i} + u_{it} \tag{2.23} $$

where $i = 1, 2, \dots, n$, $t = 1, 2, 3, 4$, $d_{it}$ is a treatment status indicator and $x_{1it}$ is a vector of time-varying continuous covariates. We focus exclusively on the missing pattern of monotone drop-out (i.e. attrition) in modeling the selection process for the IPW estimator.[15]

$$ p_{i1} = 1; \quad p_{it} = 1 \text{ if } \Phi(z_{it}\gamma + f_{2t} + c_{2i} + e_{it}) > a \tag{2.24} $$

and $p_{it} = 0$ otherwise, for $i = 1, 2, \dots, n$ and $t = 2, 3, 4$.

[15] We consider both monotone and non-monotone missing in modeling the selection process for the MI-IPW estimator.

The estimating equation for the balanced panel data model is given in (2.25).

$$ \sum_{i=1}^{n} \sum_{t=1}^{4} m_{it}(W_{it}, \theta) = 0, \quad \text{where } m_{it} = x_{it}' v_{it} \tag{2.25} $$

where $v_{it} = c_i + u_{it}$, $x_{it} = (d_{it}, x_{1it}, f_t)$ and $W_{it} = (y_{it}, x_{it})$. Once we introduce missing data, we can rewrite (2.25) for unbalanced panel data with complete cases as the estimating equation (2.26).

$$ \sum_{i=1}^{n} \sum_{t=1}^{4} s_{it} \cdot m_{it}(W_{it}, \theta) = 0 \tag{2.26} $$

The estimating equation for the IPW estimator with complete cases in unbalanced panel data is given in (2.27).

$$ \sum_{i=1}^{n} \sum_{t=1}^{4} \frac{s_{it}}{\hat{p}_{it}} \cdot m_{it}(W_{it}, \theta) = 0 \tag{2.27} $$

We suppose that the $\hat{p}_{it}$ used in (2.28) is obtained and converges to $E(s_{it} \mid W_{it})$.
Then, under the MAR assumption, we can extend the consistency result for the unweighted pooled LS estimator in balanced panel data to the IPW pooled LS estimator (2.28) in unbalanced panel data.

$$ \hat\beta_{PLS} = \Big( \sum_{i=1}^{n} \sum_{t=1}^{T} \frac{s_{it}\, x_{it}' x_{it}}{\hat{p}(z_{it})} \Big)^{-1} \Big( \sum_{i=1}^{n} \sum_{t=1}^{T} \frac{s_{it}\, x_{it}' y_{it}}{\hat{p}(z_{it})} \Big) \tag{2.28} $$

where $\hat{p}_{it}$ is estimated with either a probit or a logit model in the application. Recall first that the crucial exogeneity condition for the pooled LS estimator to be consistent in a balanced panel is given in (2.29).

$$ E\Big( \sum_{t=1}^{T} x_{it}' (c_i + u_{it}) \Big) = 0 \tag{2.29} $$

We now show that the analogue of the exogeneity condition (2.29) holds for the IPW pooled LS estimator. We start from (2.30).

$$ E\Big( \sum_{t=1}^{T} s_{it}\, x_{it}' (c_i + u_{it}) \Big) \tag{2.30} $$

Using the law of iterated expectations and MAR, we can rewrite (2.30) as follows.

$$ \begin{aligned} E\Big( \sum_{t=1}^{T} s_{it}\, x_{it}' (c_i + u_{it}) \Big) &= E\Big( \sum_{t=1}^{T} E(s_{it} \mid y_{it}, z_{it})\, x_{it}' (c_i + u_{it}) \Big) \\ &= E\Big( \sum_{t=1}^{T} P(s_{it} = 1 \mid y_{it}, z_{it})\, x_{it}' (c_i + u_{it}) \Big) \\ &= E\Big( \sum_{t=1}^{T} p(s_{it} = 1 \mid z_{it})\, x_{it}' (c_i + u_{it}) \Big) \\ &= E\Big( \sum_{t=1}^{T} p(z_{it})\, x_{it}' (c_i + u_{it}) \Big) \end{aligned} $$

Thus, provided that $\hat{p}(z_{it})$ is a consistent estimate of the correctly specified $p(z_{it})$, we obtain a consistent IPW pooled LS estimator if the exogeneity condition in (2.31) is satisfied.

$$ E\Big( \sum_{t=1}^{T} \frac{s_{it}\, x_{it}' (c_i + u_{it})}{p(z_{it})} \Big) = E\Big( \sum_{t=1}^{T} \frac{E(s_{it} \mid y_{it}, z_{it})\, x_{it}' (c_i + u_{it})}{p(z_{it})} \Big) = E\Big( \sum_{t=1}^{T} \frac{p(z_{it})\, x_{it}' (c_i + u_{it})}{p(z_{it})} \Big) = E\Big( \sum_{t=1}^{T} x_{it}' (c_i + u_{it}) \Big) = 0 \tag{2.31} $$

As shown in (2.31), the MAR assumption for the selection process is critical for the validity of the IPW method.

An example of using MAR for IPW estimator
To illustrate the use of the MAR assumption for the probability of selection, we consider the estimation of the MTE of CSR on student achievement with Project STAR data. We consider the IPW pooled LS estimator for missing data.

$$ \hat\beta_{PLS} = \Big( \sum_{i=1}^{n} \sum_{t=1}^{T} \frac{s_{it}\, x_{it}' x_{it}}{\hat{p}(z_{it})} \Big)^{-1} \Big( \sum_{i=1}^{n} \sum_{t=1}^{T} \frac{s_{it}\, x_{it}' y_{it}}{\hat{p}(z_{it})} \Big) $$

Specification for the probability of selection
We estimate $p(s_{it} = 1 \mid z_{it})$ by maximum likelihood, using binary logit and probit models for $G_{it}(z_{it}, \gamma_0)$.
$$ G_{it}(z_{it}, \gamma) \equiv p(s_{it} = 1 \mid z_{it}) = p(z_{it}), \quad t = 2, 3, 4 $$

$$ z_{it} = (x_{it-1}, y_{it-1}) \quad \text{or} \quad z_{it} = (x_{it-1}, \dots, x_{i1}, y_{it-1}, \dots, y_{i1}) $$

where $z_{it}$ is observed at t.

(i) $x_{it-1}$: class types (initial small, initial aide, cumulative years in small class, cumulative years in small class), student characteristics (gender, race, free lunch status), teacher characteristics (race, experience, highest degree dummy), class characteristics (fraction of classmates with kindergarten attendance, fraction of classmates with free lunch, fraction of female classmates, fraction of minority classmates)
(ii) $y_{it-1}$: previous year SAT percentile score

We use logit binary response models for the probability of selection:

$$ \pi_{it} = p(s_{it} = 1 \mid z_{it}, s_{it-1} = 1) = P(z_{it}\gamma + \epsilon_{it} > 0) = \Lambda(z_{it}\gamma), \quad s_{it-1} = 1 $$

where

$$ \Lambda(z_{it}\gamma) = \frac{\exp(z_{it}\gamma)}{1 + \exp(z_{it}\gamma)} $$

When a logit model is used for the selection probability, we can apply the conditional expectation argument sequentially to estimate the selection probability as follows.

$$ G_{i1} = 1, \quad G_{i2} = \pi_{i2}, \quad G_{i3} = \pi_{i3} \cdot \pi_{i2}, \quad G_{i4} = \pi_{i4} \cdot \pi_{i3} \cdot \pi_{i2} $$

$$ \hat{G}_{i1} = 1, \quad \hat{G}_{i2} = \Lambda(s_{i1} \cdot z_{i2}\hat\gamma), \quad \hat{G}_{i3} = \Lambda(s_{i1} \cdot z_{i2}\hat\gamma) \cdot \Lambda(s_{i2} \cdot z_{i3}\hat\gamma) $$

$$ \hat{G}_{i4} = \Lambda(s_{i1} \cdot z_{i2}\hat\gamma) \cdot \Lambda(s_{i2} \cdot z_{i3}\hat\gamma) \cdot \Lambda(s_{i3} \cdot z_{i4}\hat\gamma) $$

Once we obtain $\hat\gamma$ from the logit estimation, we can construct the weights for the inverse probability weighted (IPW) pooled LS estimator as in (2.32).

$$ \hat\beta_{PLS} = \Big( \sum_{i=1}^{n} \sum_{t=1}^{T} \frac{s_{it}\, x_{it}' x_{it}}{\hat{G}_{it}} \Big)^{-1} \Big( \sum_{i=1}^{n} \sum_{t=1}^{T} \frac{s_{it}\, x_{it}' y_{it}}{\hat{G}_{it}} \Big) \tag{2.32} $$

Idea behind the consistency of IPW estimator
Under MAR, cross-section units that have the same predictor values (past outcomes, covariates and auxiliary variables) should have the same probability of selection (missing). Thus, remaining cross-section units whose predictors imply a low (high) probability of selection receive high (low) weights and stand in for the cross-section units with the same predictor values that left the sample. Using valid probability weights eliminates the bias due to differential attrition across treatment and control.
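The sequential weights and the weighted estimator (2.32) can be sketched in numpy as follows. This is an illustration under assumptions made here, not the procedure used for the Project STAR estimates: the function names are hypothetical, the logit is fitted by a plain Newton-Raphson iteration, and the conditional probabilities are passed as an (n, T) array `pi` whose first column equals 1.

```python
import numpy as np

def logistic(v):
    """Lambda(v) = exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + np.exp(-v))

def fit_logit(Z, s, iters=25):
    """Newton-Raphson logit MLE of a 0/1 vector s on Z (Z should include a constant)."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = logistic(Z @ gamma)
        grad = Z.T @ (s - p)                       # score of the log-likelihood
        hess = (Z * (p * (1 - p))[:, None]).T @ Z  # negative Hessian
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma

def selection_weights(pi):
    """G_it = pi_i2 * ... * pi_it with G_i1 = 1 (cumulative product over t)."""
    return np.cumprod(pi, axis=1)

def ipw_pooled_ls(y, X, s, G):
    """IPW pooled LS as in (2.32): observed rows are weighted by 1 / G_it."""
    y = np.where(s == 1, y, 0.0)                   # zero-fill unobserved cells
    X = np.where(s[..., None] == 1, X, 0.0)
    Xw = (s / G)[..., None] * X
    A = np.einsum("itk,itl->kl", Xw, X)
    return np.linalg.solve(A, np.einsum("itk,it->k", Xw, y))
```

In use, one would fit `fit_logit` period by period (or pooled) on units with $s_{it-1} = 1$, fill `pi[:, t]` with the fitted values, and pass `selection_weights(pi)` as `G`. Under monotone attrition the entries of `G` after a unit drops out are never used, because $s_{it} = 0$ there.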
To illustrate the role of the weights, consider the study of the MTE of CSR on SAT scores. If the students who left the sample, and who might have performed well had they remained in the sample without treatment, were exclusively in regular classes, then ignoring the missing data distorts the MTE of small classes, since the remaining students in regular classes misrepresent the population. The IPW method recovers the random sample property of the balanced panel: remaining students who have the same predictors in the selection model as students who left the sample receive more weight, and thus stand in for both themselves and the missing students.

2.4.2 AIPW estimator

The objective function in (2.33) provides an AIPW (augmented IPW) estimator, which is consistent and optimally efficient if the data are MAR and the conditional expectation is correctly specified.

$$\sum_{i=1}^{n} \sum_{t=1}^{4} \left[ \frac{s_{it}}{p_{it}} \cdot m_{it}(W_{it}, \theta) + \left( 1 - \frac{s_{it}}{p_{it}} \right) E(m_{it} \mid I_{t-1}) \right] = 0 \tag{2.33}$$

where $I_{t-1}$ is all information available at $t - 1$. The IPW estimator is consistent if the probability of selection is correctly specified, but it is generally inefficient. An AIPW estimator, in contrast, can be efficient, and it is consistent if either the model for the probability of selection or the model for the conditional expectation (i.e. the joint distribution of the missing variables) is correctly specified. An efficient AIPW estimator can be constructed with an appropriate model for the conditional expectation in (2.33) (see footnote 16). Unfortunately, we do not have much information for a parametric specification of the model for the conditional expectation, $E[s_{it} \cdot m_{it}(W_{it}, \theta)] = 0$, to use in a practical application, since it essentially requires a correct specification of the joint distribution of the missing variables. Therefore, in the implementation, we focus on either the IPW estimator, whose consistency only requires correct specification of the probability of selection, or the MI-IPW estimator, for which MI is based on MICE for $E[s_{it} \cdot m_{it}(W_{it}, \theta)] = 0$.
In the application of MI, imputation is applied only to cross-section units who remain in the sample with missing variables, and not to cross-section units who leave the sample once but come back to it later.

Footnote 16: AIPW is doubly robust; an estimator is called doubly robust if it is consistent when either the model for the conditional expectation or the model for selection is correctly specified. Moreover, the AIPW estimator is efficient, as it attains the semiparametric efficiency bound among the class of estimators that use $\sum_{i=1}^{n} \sum_{t=1}^{4} s_{it} \cdot m_{it}(W_{it}, \theta) = 0$. More discussion of AIPW estimators is provided in Tsiatis [2006] and Robins et al. [1994]. Recently, there has been theoretical development of the AIPW estimator (doubly robust (DR) estimator), which combines the IPW method with the imputation method as in equation (2.33). For instance, the first term in the objective function (2.33) is the same as the one IPW uses for identification of α with complete cases, and the second term in (2.33) is the same objective function as in the imputation model.

Footnote 17: The theoretical attractiveness of the DR estimator, such as its robustness and efficiency, has been studied in great detail in Carpenter et al. [2006], Tsiatis [2006], Robins and Rotnitzky [1995], and Wooldridge [2007]. However, to our knowledge, there is no simulation study that evaluates the usefulness of the theoretical properties of IPW or DR estimators for regression models with high-dimensional missing covariates of various forms, including binary, count, ordinal and fractional variables. Most simulation studies have only one missing continuous variable in the model and use a joint multivariate normal distribution for the conditional expectation term in the simulations (Bang and Robins [2005], Carpenter et al. [2006] and Graham et al. [2008]).

2.4.3 Multiple imputation

We consider imputation-based methods to deal with missing data, as in Rubin [1987], Buuren et al. [1999] and Raghunathan et al. [2001].
Although multiple imputation has become increasingly popular in epidemiology, psychology, statistics and other social sciences, to our knowledge there has been no MI method appropriately designed for settings with multiple treatments in which missingness occurs in the outcome and in all covariates of various types. However, high-dimensional covariates with general missingness or attrition are quite common in economic applications, and the need to correctly specify the joint distribution of all missing covariates and the outcome makes the MI method unattractive in practice. For instance, in the application with Project STAR data, missingness occurs in the outcome and in covariates including class size, free lunch status dummy, teacher's years of experience, teacher's master degree dummy, teacher's race dummy, fraction of female classmates, fraction of minority classmates and so forth. Therefore, we restrict the use of the imputation-based method to individuals within the sample who have missing variables; we do not apply imputation to units who are not in the sample, for whom the outcome and all covariates are missing. In most applications, for the cross-section units in the sample with missing variables, it is very rare that the number of missing variables is more than three, so the burden of approximation is reduced significantly compared to imputing all variables for cross-section units who leave the sample. In implementation, we apply an MI method that specifies a logistic distribution for binary variables and a normal distribution for all other (continuous) variables. Among the many MI methods used in practice, we implement the fully conditional approach to multiple imputation of Royston [2004] and Royston [2005], using the Stata ice command (see footnote 18).

Set-up for MICE Estimator

The conditional predictive distribution is estimated from the observed data under MAR. Missing variables are multiply imputed for within-sample units.
In implementation, we adopt univariate conditional models in a Gibbs-sampler-type approach, as in Buuren et al. [1999]. We model each variable separately, conditional on the other observed variables. The sequence of imputation runs from the variable with the most observations to the variable with the fewest.

$$y_{it} = (y_O, y_M), \quad y_O: \text{observed}, \quad y_M: \text{missing} \tag{2.34}$$

$$x_{it} = (x_O, x_M), \quad x_O: \text{observed}, \quad x_M: \text{missing} \tag{2.35}$$

We need to model all variables with missing data as in (2.36) and (2.37).

$$f(y_{M,it} \mid W_{O,it-1}), \quad \forall t \tag{2.36}$$

$$f(x_{M,it} \mid W_{O,it-1}), \quad \forall t \tag{2.37}$$

where $W_{O,it-1}$ is all observed variables at $t - 1$.

Footnote 18: There are currently two widely used methods of model-based imputation: multiple imputation based on a multivariate normal distribution for the missing variables, and multiple imputation by chained equations (MICE) as in Buuren et al. [1999]. We implement an MI method for comparison with the IPW method, since it is readily available in most statistical packages such as Stata, SAS, and R.

We conduct multiple imputation by chained equations using the Stata implementation of Royston [2005], with the command below.

    ice y d x `auxiliary' using temp.dta, m(10) cmd(d:logit)

where m(10) implies that ice produces 10 imputed data sets.

2.4.4 MI-IPW method

In unbalanced panel data, we suggest the MI-IPW method, implemented as follows. First, we sort the data for each cross-section unit at each time into four categories: complete case, monotone (attrition), non-monotone-I, and non-monotone-II. Non-monotone-I missingness refers to a cross-section unit that is in the sample but has missing variables, while non-monotone-II missingness refers to a cross-section unit that leaves the sample but re-enters it later. We assume MCAR for cross-section units with the non-monotone-II missing pattern, so that we can ignore this type of data and still obtain a valid statistical analysis. Second, we impute missing variables for non-monotone-I units using the imputation methods in 2.4.3.
Third, treating imputed non-monotone-I units as complete cases, we combine the complete-case and imputed non-monotone-I data and apply IPW. We obtain the MTE $\hat{\alpha}_j$ and its covariance matrix $\hat{V}_j$. Finally, we iterate the second and third steps $M$ times. By extending Rubin's formula (Rubin [1987]) for the MI method to the MI-IPW method, we obtain the MI-IPW estimator for the coefficient and variance matrix as in (2.38).

$$\hat{\alpha} = \frac{1}{M} \sum_{j=1}^{M} \hat{\alpha}_j, \qquad \hat{V} = \frac{1}{M} \sum_{j=1}^{M} \hat{V}_j + \frac{M+1}{M} B \tag{2.38}$$

where $B = \frac{1}{M-1} \sum_{j=1}^{M} (\hat{\alpha}_j - \hat{\alpha})(\hat{\alpha}_j - \hat{\alpha})'$.

The probability weights in the IPW method eliminate the bias due to differential missingness, while MI for the missing variables enhances efficiency. Both the IPW and MI methods rely on the MAR assumption, in the specification of the probability of selection (missingness) and of the missing variables, respectively.

2.5 Monte Carlo experiment

We adopt data generating processes (DGPs) with monotone missing data (i.e. missing by attrition), so that missingness occurs simultaneously for the outcome and all covariates of a cross-section unit. The covariates in the outcome model include both discrete and continuous variables. We are primarily interested in obtaining a consistent MTE using the IPW method.

2.5.1 DGP I: Correct specification of both outcome and probability of selection models

We start with DGP I, which satisfies correct specification of both the outcome and the selection models. Conditional on the auxiliary variables $c_{2it}$, selection is not correlated with the unobserved errors in the outcome equation, so MAR is satisfied for DGP I.

Model for data generating process

We use a linear model with a single treatment and a single covariate for the DGP. We generate data with $n$ from 100 to 5,000 and $t = 4$ to mimic the actual size of the Project STAR data.

$$y_{it} = d_{it} \cdot \alpha + x_{it} \cdot \beta + c_{1i} + u_{it}$$

where $i = 1, 2, \ldots, n$, $d_{it}$ is a binary variable and $x_{it}$ is a continuous variable.
$$s_{i1} = 1; \quad s_{it} = 1 \text{ if } f_i + c_{2it} + \varepsilon_{it} > a \text{ and } s_{it-1} = 1, \quad \text{for } i = 1, 2, \ldots, n \text{ and } t = 2, 3, 4$$

where $a$ is the cut-off value that determines the fraction of missing sample to full sample. The values $a = 0, -1, -2$ are chosen, and the corresponding fractions of missing data to full data are 75, 50, and 25 percent, respectively.

Variables generation for DGP I

1. $d_{it} = 1$ if $f_i + q_{it} > 0$ and $d_{it} = 0$ otherwise, where $f_i \sim iid\, U(0, 1)$, $q_{it} \sim iid\, N(0, 1)$ and $\alpha = 1$

2. $x_{it} \sim iid\, U(0, 1)$, $\beta = -0.2$

3. $c_{1i} = 0$

4. $u_{it} = r_{it} + v_{it}$ where $r_{it} \sim iid\, U(0, 1)$ and

$$\begin{pmatrix} v_{i1} \\ v_{i2} \\ v_{i3} \\ v_{i4} \end{pmatrix} \sim MVN \left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix} \right)$$

where $MVN$ is multivariate normal and $\rho = 0.7$ in the simulation.

5. $s_{i1} = 1$ and $s_{it} = 1$ if $s_{it-1} = 1$ and $f_i + c_{2it} + \varepsilon_{it} > a$, where $\varepsilon_{it} \sim iid\, N(0, 1)$ and $c_{2it} = r_{it} + q_{it}$.

Selection process and the potential source of bias

The correlations among selection, treatment and unobserved errors are induced to be strictly positive in the DGP. First, for balanced data, $d_{it}$ is not correlated with the unobserved errors, so the balanced-panel unweighted estimators are consistent. The correlation between selection and treatment is induced by $f_i$ and $q_{it}$, while the correlation between selection and the unobserved errors is induced by $r_{it}$. Therefore, there is strictly positive differential correlation between selection and errors across $d_{it}$, and all of the unweighted pooled LS, FE and FD estimators should show finite sample bias. However, under MAR (i.e. conditional on $c_{2it} = r_{it} + q_{it}$ and $f_i$), selection is not correlated with the unobserved errors, so both the pooled LS and FD estimators with a correctly specified probability of selection should show no bias beyond sampling error.

Estimation of probability weight for IPW method

Estimation of the probability weights is based on (2.39) and (2.40).
$$\pi_{i1} = 1; \quad \pi_{it} = p(s_{it} = 1 \mid z_{it}, s_{it-1} = 1) = P(z_{it} \cdot \gamma + e_{it} > 0) = \Phi(z_{it} \cdot \gamma) \tag{2.39}$$

where $e_{it}$ is standard normal for $i = 1, 2, \ldots, n$ and $t = 2, 3, 4$.

$$p_{it} = \pi_{it} \cdot \pi_{it-1} \cdots \pi_{i2}, \quad \forall i \text{ and for } t = 2, 3, 4 \tag{2.40}$$

In Stata, we estimate $p_{it}$ for $t = 2, 3, 4$ using a command as follows (written schematically, with s_t-1 denoting the lagged selection indicator; the logit command is analogous):

    probit s f c2 if s_t-1=1

In this DGP, the probit model with predictors $f_i$ and $c_{2it}$ provides the correct specification for the probability of selection.

Imputation of missing variables using MICE

In Stata, we impute the missing variables y, d and x using the following specifications.

Table 2.7. Specification of models for missing variables

| variable | distributional assumption | command | prediction equation |
|---|---|---|---|
| $y^M_{it}$ | Normal | regress | $d^O_{it}$, $x^O_{it}$, $c^O_{2it}$, $f^O_i$ |
| $d^M_{it}$ | Logistic | logit | $y^O_{it}$, $x^O_{it}$, $c^O_{2it}$, $f^O_i$ |
| $x^M_{it}$ | Normal | regress | $y^O_{it}$, $d^O_{it}$, $c^O_{2it}$, $f^O_i$ |

M represents a missing variable and O represents an observed variable.

In implementation, we type the following Stata command (Royston [2005]):

    ice y d x c2 f using temp.dta, m(10) cmd(d: logit)

Simulation result: Correctly specified outcome and selection models

Table 2.8 reports unweighted estimates for pooled LS, FE and FD with DGP I. It shows that the unweighted estimates are all biased, since MCAR is violated. The biases of the unweighted estimators do not depend on $n$, but they decrease as $a$ (the fraction of missing data relative to the full data) falls. Interestingly, all unweighted estimators underestimate the marginal treatment effect (MTE). The magnitudes of the biases of the unweighted pooled LS, FE and FD estimators do not differ much from each other. Table 2.9 shows IPW estimates for pooled LS and FD obtained using data from DGP I. As theory predicts, both the IPW-pooled LS and IPW-FD estimators show no significant finite sample bias, as the mean of the MC point estimates converges to the true value 1 as $n$ increases for all $a$.
For inference with the IPW LS estimator, the rejection frequency converges to the nominal level 0.05 as $n$ increases, and the coverage interval of the rejection frequency contains 0.05 for all $n$ and $a \leq 0.5$. On the other hand, the IPW-FD estimates get close to the true MTE as $n$ increases, while the rejection frequency does not converge to 0.05 and the coverage interval of the rejection frequency does not contain 0.05 for any $n$ and $a$. There is over-rejection in inference with the IPW-FD estimator even for $n$ = 5,000 and all $a$. For example, the mean of the MC rejection frequency is .09 at the nominal level of .05. Table 2.10 reports MI estimates for pooled LS, FE and FD with DGP I. The mean of the MC point estimates of the MTE for pooled LS is quite close to the true value 1, while the coverage interval of the rejection rate fails to contain 0.05 for all $a$ and $n$. The means of the MC point estimates of the MTE and the associated rejection frequency for FD are all biased. MI estimates for LS, FE and FD provide incorrect standard errors in finite samples.

Table 2.8. DGP I: MTE estimates for unweighted estimators

| n | Missing fraction | LS Mean(α) | LS SD(α) | FE Mean(α) | FE SD(α) | FD Mean(α) | FD SD(α) |
|---|---|---|---|---|---|---|---|
| 200 | 0.75 | 0.968 | .152 | .951 | .142 | .952 | .155 |
| 200 | 0.5 | 0.974 | .111 | .968 | .082 | .967 | .093 |
| 200 | 0.25 | 0.987 | .090 | .986 | .065 | .987 | .073 |
| 500 | 0.75 | 0.963 | .099 | .951 | .086 | .951 | .095 |
| 500 | 0.5 | 0.974 | .071 | .969 | .053 | .970 | .059 |
| 500 | 0.25 | 0.985 | .058 | .984 | .040 | .984 | .046 |
| 1,000 | 0.75 | 0.963 | .071 | .951 | .062 | .951 | .068 |
| 1,000 | 0.5 | 0.974 | .048 | .968 | .037 | .967 | .042 |
| 1,000 | 0.25 | 0.988 | .041 | .985 | .028 | .985 | .033 |
| 5,000 | 0.75 | 0.961 | .031 | .951 | .026 | .951 | .029 |
| 5,000 | 0.5 | 0.974 | .022 | .969 | .016 | .969 | .018 |
| 5,000 | 0.25 | 0.987 | .018 | .985 | .013 | .985 | .015 |

*Note: Missing fraction is the fraction of missing sample to full sample.

Table 2.9.
DGP I: MTE estimates for IPW estimators

| n | Missing fraction | LS Mean(α) | LS SD(α) | LS r | LS coverage of r | FD Mean(α) | FD SD(α) | FD r | FD coverage of r |
|---|---|---|---|---|---|---|---|---|---|
| 200 | 0.75 | 1.001 | .246 | .086 | (.074,.099) | .986 | .238 | .143 | (.128,.159) |
| 200 | 0.5 | .997 | .142 | .060 | (.050,.070) | .993 | .125 | .090 | (.077,.103) |
| 200 | 0.25 | 1.001 | .096 | .050 | (.040,.060) | 1.000 | .081 | .090 | (.077,.103) |
| 500 | 0.75 | .996 | .179 | .080 | (.067,.091) | .985 | .169 | .122 | (.107,.136) |
| 500 | 0.5 | 1.001 | .091 | .047 | (.037,.056) | .998 | .079 | .087 | (.075,.099) |
| 500 | 0.25 | .998 | .061 | .049 | (.040,.058) | .997 | .050 | .086 | (.074,.098) |
| 1,000 | 0.75 | .997 | .137 | .078 | (.066,.089) | .986 | .127 | .106 | (.092,.119) |
| 1,000 | 0.5 | 1.001 | .065 | .047 | (.037,.056) | .996 | .057 | .096 | (.083,.109) |
| 1,000 | 0.25 | 1.000 | .043 | .049 | (.039,.058) | .998 | .036 | .086 | (.074,.098) |
| 5,000 | 0.75 | .994 | .070 | .055 | (.045,.064) | .992 | .065 | .087 | (.074,.099) |
| 5,000 | 0.5 | 1.001 | .030 | .048 | (.038,.057) | .997 | .026 | .082 | (.070,.094) |
| 5,000 | 0.25 | 1.001 | .019 | .046 | (.036,.055) | .999 | .016 | .092 | (.079,.105) |

*Note: r ≡ 1(p < .05) is the rejection rate for the null H0: α = 1 at the nominal level 0.05. Cluster-robust standard errors are used in all estimations.

Table 2.10.
DGP I: MTE for estimators with multiple imputations

| n | Missing fraction | LS Mean(α) | LS SD(α) | FE Mean(α) | FE SD(α) | FE r | FE coverage of r | FD Mean(α) | FD SD(α) | FD r | FD coverage of r |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 0.75 | 1.006 | .168 | .980 | .161 | .551 | (.529,.573) | .947 | .157 | .089 | (.077,.102) |
| 200 | 0.5 | 1.007 | .112 | .994 | .093 | .261 | (.242,.280) | .973 | .091 | .065 | (.054,.076) |
| 200 | 0.25 | 1.002 | .094 | .996 | .068 | .128 | (.113,.142) | .984 | .074 | .064 | (.053,.074) |
| 500 | 0.75 | 1.013 | .106 | .988 | .101 | .563 | (.541,.585) | .951 | .097 | .093 | (.080,.106) |
| 500 | 0.5 | 1.003 | .071 | .991 | .059 | .277 | (.257,.297) | .968 | .058 | .093 | (.080,.106) |
| 500 | 0.25 | 1.004 | .059 | .999 | .042 | .108 | (.095,.122) | .986 | .046 | .065 | (.054,.076) |
| 1,000 | 0.75 | 1.011 | .074 | .987 | .071 | .559 | (.537,.581) | .952 | .066 | .110 | (.096,.124) |
| 1,000 | 0.5 | 1.006 | .051 | .994 | .042 | .274 | (.255,.294) | .970 | .041 | .107 | (.094,.121) |
| 1,000 | 0.25 | 1.000 | .040 | .998 | .029 | .096 | (.083,.108) | .985 | .033 | .082 | (.070,.094) |

*Note: r ≡ 1(p < .05) is the rejection rate for the null H0: α = 1 at the nominal level 0.05. Cluster-robust standard errors are used in all estimations.

2.5.2 DGP II: The correct probability of selection model with endogenous treatment effect

Consider the same data generating process as DGP I, except that $c_{1i} = f_i + a_i$, so that, even in balanced panel data, the treatment $d_{it}$ is correlated with the unobserved heterogeneity. The FE or FD transformation, however, eliminates this source of bias in balanced panel data. In unbalanced panel data, the unweighted pooled LS, FE and FD estimators are biased due to the correlations between selection and the treatment and between selection and the time-varying errors. Under MAR for the probability of selection, however, the IPW-FD estimator can eliminate the source of differential correlation between selection and unobserved heterogeneity across $d_{it}$. Once the MAR assumption holds and the probability of selection is correctly specified, the IPW-FD estimator should not show any bias beyond finite sample error in the simulation.
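The data generating process of DGP I, on which DGP II builds, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used for the reported simulations: the function name `simulate_dgp1` and the dictionary layout are hypothetical, and the equicorrelated normal errors $v_{it}$ are generated through a common factor, $v_{it} = \sqrt{\rho}\, g_i + \sqrt{1-\rho}\, e_{it}$, which yields unit variances and pairwise correlation $\rho$.

```python
import math
import random


def simulate_dgp1(n, a, rho=0.7, alpha=1.0, beta=-0.2, T=4, seed=0):
    """Generate one panel from DGP I (illustrative sketch).

    Returns a list of n units; each unit is a list of T dicts holding the
    selection indicator s and the variables d, x, y, f, c2 for that period.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        f = rng.random()            # f_i ~ U(0, 1)
        g = rng.gauss(0.0, 1.0)     # common factor: makes v_it equicorrelated
        s_prev = 1
        unit = []
        for t in range(1, T + 1):
            q = rng.gauss(0.0, 1.0)                 # q_it ~ N(0, 1)
            r = rng.random()                        # r_it ~ U(0, 1)
            v = math.sqrt(rho) * g + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            d = 1 if f + q > 0 else 0               # treatment indicator
            x = rng.random()                        # x_it ~ U(0, 1)
            u = r + v                               # u_it = r_it + v_it
            y = d * alpha + x * beta + u            # c_1i = 0 in DGP I
            c2 = r + q                              # auxiliary variable c_2it
            eps = rng.gauss(0.0, 1.0)
            if t == 1:
                s = 1                               # everyone observed at t = 1
            else:
                # monotone attrition: once out of the sample, always out
                s = 1 if (s_prev == 1 and f + c2 + eps > a) else 0
            unit.append({"t": t, "s": s, "d": d, "x": x, "y": y, "f": f, "c2": c2})
            s_prev = s
        panel.append(unit)
    return panel


if __name__ == "__main__":
    panel = simulate_dgp1(n=1000, a=0.0, seed=1)
    observed = sum(row["s"] for unit in panel for row in unit)
    print("share of unit-periods observed:", observed / (1000 * 4))
```

The outcome is generated for every unit-period, so the same draw can be used both as the full (balanced) data and, after discarding rows with s = 0, as the unbalanced panel on which the unweighted and IPW estimators are compared.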
Simulation results

In Table 2.11, the unweighted pooled LS, FE and FD estimators show finite sample bias for all $n$ and $a > 0$. The unweighted pooled LS estimator overestimates the MTE, while the FE and FD estimators underestimate it for all $n$ and $a$. The biases do not depend on $n$ for any of the unweighted pooled LS, FE and FD estimators, while the biases decrease as $a$ (the fraction of missing sample to full sample) decreases.

Table 2.12 reports that IPW LS is biased and overestimates the MTE. Compared to unweighted LS, the IPW LS estimates have a much larger variance but a somewhat smaller bias. The bias of the IPW LS estimator does not depend on $n$ or $a$, and the size distortion is prohibitively large for all $n$ and $a$. On the other hand, the IPW FD estimates are very close to the true marginal effect of 1, while the rejection frequency shows some evidence of over-rejection. Although the size distortion decreases as $n$ increases, the over-rejection problem remains even for $n$ = 5,000 and all $a$. The size distortion of the IPW FD estimator does not disappear even for $n$ = 20,000.

Table 2.13 shows pooled LS, FE and FD with MI. The MC estimates show finite sample bias for all $n$ and $a$. The pooled LS estimator with MI overestimates the marginal treatment effect, while the MI FD estimator underestimates it for all $n$ and $a$. The biases do not depend on $n$ for either the MI LS or the MI FD estimator, while the biases decrease as $a$ (the fraction of missing sample to full sample) decreases. The mean of the MC point estimates for the MI FE estimator shows that the bias is quite small for all $n$ and $a \leq 0.5$, while the size distortion is large and is negatively correlated with $a$.

Table 2.11.
DGP II: MTE for unweighted estimators

| n | Missing fraction | LS Mean(α) | LS SD(α) | FE Mean(α) | FE SD(α) | FD Mean(α) | FD SD(α) |
|---|---|---|---|---|---|---|---|
| 200 | 0.75 | 1.144 | .171 | .949 | .140 | .949 | .154 |
| 200 | 0.5 | 1.142 | .122 | .968 | .083 | .968 | .094 |
| 200 | 0.25 | 1.136 | .098 | .985 | .064 | .985 | .064 |
| 500 | 0.75 | 1.150 | .110 | .952 | .088 | .952 | .098 |
| 500 | 0.5 | 1.145 | .077 | .969 | .051 | .969 | .058 |
| 500 | 0.25 | 1.138 | .060 | .985 | .040 | .984 | .046 |
| 1,000 | 0.75 | 1.148 | .075 | .951 | .059 | .952 | .065 |
| 1,000 | 0.5 | 1.143 | .054 | .970 | .037 | .970 | .041 |
| 1,000 | 0.25 | 1.137 | .043 | .985 | .028 | .985 | .031 |
| 5,000 | 0.75 | 1.148 | .034 | .952 | .027 | .952 | .030 |
| 5,000 | 0.5 | 1.144 | .024 | .969 | .016 | .969 | .018 |
| 5,000 | 0.25 | 1.138 | .020 | .985 | .013 | .985 | .014 |

*Note: Missing fraction is the fraction of missing sample to full sample. Sensitivity analysis is conducted for unweighted LS, FE and FD estimators.

Table 2.12. DGP II: MTE estimates for IPW estimators

| n | Missing fraction | LS Mean(α) | LS SD(α) | LS r | LS coverage of r | FD Mean(α) | FD SD(α) | FD r | FD coverage of r |
|---|---|---|---|---|---|---|---|---|---|
| 200 | 0.75 | 1.129 | .272 | .151 | (.136,.167) | .983 | .235 | .154 | (.139,.171) |
| 200 | 0.5 | 1.131 | .150 | .181 | (.164,.198) | .993 | .118 | .098 | (.084,.110) |
| 200 | 0.25 | 1.132 | .104 | .261 | (.242,.280) | .997 | .080 | .094 | (.081,.106) |
| 500 | 0.75 | 1.131 | .196 | .175 | (.158,.191) | .987 | .170 | .129 | (.114,.143) |
| 500 | 0.5 | 1.134 | .077 | .344 | (.323,.365) | .995 | .079 | .090 | (.077,.103) |
| 500 | 0.25 | 1.133 | .063 | .542 | (.520,.564) | .997 | .050 | .091 | (.078,.104) |
| 1,000 | 0.75 | 1.136 | .147 | .253 | (.233,.272) | .992 | .127 | .085 | (.072,.097) |
| 1,000 | 0.5 | 1.132 | .072 | .538 | (.516,.560) | .998 | .058 | .084 | (.071,.096) |
| 1,000 | 0.25 | 1.133 | .046 | .831 | (.815,.847) | .998 | .035 | .083 | (.070,.095) |
| 5,000 | 0.75 | 1.136 | .074 | .620 | (.598,.641) | .991 | .064 | .085 | (.073,.097) |
| 5,000 | 0.5 | 1.133 | .033 | .972 | (.965,.979) | .997 | .026 | .073 | (.062,.084) |
| 5,000 | 0.25 | 1.134 | .021 | .999 | (.997,1) | .999 | .015 | .076 | (.064,.088) |

*Note: r ≡ 1(p < .05) is the rejection rate for the null H0: α = 1 at the nominal level 0.05. Cluster-robust standard errors are used in all estimations.

Table 2.13.
DGP II: MTE estimates for estimators with multiple imputations

| n | Missing fraction | LS Mean(α) | LS SD(α) | FE Mean(α) | FE SD(α) | FE r | FE coverage of r | FD Mean(α) | FD SD(α) | FD r | FD coverage of r |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 0.75 | 1.141 | .176 | .982 | .163 | .547 | (.525,.569) | .950 | .154 | .073 | (.062,.085) |
| 200 | 0.5 | 1.142 | .121 | .993 | .096 | .270 | (.251,.290) | .968 | .094 | .075 | (.064,.087) |
| 200 | 0.25 | 1.139 | .100 | .998 | .067 | .107 | (.094,.121) | .985 | .072 | .056 | (.046,.066) |
| 500 | 0.75 | 1.145 | .111 | .989 | .107 | .578 | (.557,.600) | .953 | .097 | .087 | (.075,.100) |
| 500 | 0.5 | 1.141 | .076 | .994 | .060 | .259 | (.239,.278) | .970 | .059 | .084 | (.072,.096) |
| 500 | 0.25 | 1.136 | .060 | .997 | .041 | .100 | (.087,.113) | .985 | .046 | .057 | (.047,.067) |
| 1,000 | 0.75 | 1.143 | .078 | .986 | .073 | .572 | (.550,.594) | .949 | .066 | .117 | (.102,.131) |
| 1,000 | 0.5 | 1.137 | .053 | .991 | .043 | .289 | (.269,.309) | .968 | .040 | .114 | (.100,.128) |
| 1,000 | 0.25 | 1.136 | .045 | .997 | .030 | .099 | (.087,.113) | .984 | .032 | .080 | (.068,.092) |

*Note: r ≡ 1(p < .05) is the rejection rate for the null H0: α = 1 at the nominal level 0.05. Cluster-robust standard errors are used in all estimations.

2.5.3 The robustness of IPW estimator to misspecification of the probability of selection

This section provides simulation results on the sensitivity of the IPW estimator to misspecification of the model for the probability of selection. One of the main criticisms of the IPW method is the instability of its estimates under misspecification of the probability of selection. We test this sensitivity using a DGP in which an omitted variable induces misspecification of the probability of selection. For instance, in the STAR empirical analysis, it is possible that we omit some important family characteristics from the selection model.

DGP with selection model misspecification

We use a linear model with a treatment and a covariate. We generate data with $n$ from 100 to 5,000 and $t = 4$ to mimic the actual size of the Project STAR data.

$$y_{it} = d_{it} \cdot \alpha + x_{it} \cdot \beta + c_{1i} + u_{it}, \quad \text{for } i = 1, 2, \ldots, n \text{ and } t = 1, 2, 3, 4$$

where $d_{it}$ is a binary variable and $x_{it}$ is a continuous variable.
Variable generation is conducted as follows.

1. $d_{it} = 1$ if $f_i + q_{it} > 0$ and $d_{it} = 0$ otherwise, where $f_i \sim iid\, U(0, 1)$, $q_{it} \sim iid\, N(0, 1)$ and $\alpha = 1$

2. $x_{it} \sim iid\, U(0, 1)$, $\beta = -0.2$

3. $c_{1i} \sim iid\, U(0, 1)$

4. $u_{it} = r_{it} + v_{it}$ where $r_{it} = f_i + iid\, U(0, 1)$ and

$$\begin{pmatrix} v_{i1} \\ v_{i2} \\ v_{i3} \\ v_{i4} \end{pmatrix} \sim MVN \left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix} \right)$$

where $MVN$ is multivariate normal and $\rho = 0.7$ in the simulation.

The selection process is complicated by including an interaction term, which is omitted in the logit model estimation.

$$s_{i1} = 1 \text{ and } s_{it} = 1 \text{ if } s_{it-1} = 1 \text{ and } f_i + \delta\, y_{it-1} \cdot c_{2it} + c_{2it} + \varepsilon_{it} > a$$

where $\varepsilon_{it} \sim iid\, N(0, 1)$ and $c_{2it} = r_{it} + q_{it}$, and where $a$ is the cut-off value that determines the fraction of missing sample to full sample. The values $a = 0, -1, -2$ are chosen, and the corresponding fractions of missing data are about 75, 50, and 25 percent, respectively.

For a correctly specified selection model, the IPW FD estimator is consistent, since the unique source of differential correlation, $f_i$, can be eliminated by the FD transformation. The probability weights for IPW are obtained using the following model specification for the selection process, $\pi_{it}$:

$$\pi_{it} = \frac{\exp(z_{it}\gamma)}{1 + \exp(z_{it}\gamma)}, \quad t = 2, 3, 4 \text{ for } s_{it-1} = 1$$

where $z_{it} = (f_i, c_{2it})$. The omission of the interaction term and the use of logit instead of probit estimation lead to misspecification of the selection model.

Simulation Result: Sensitivity of IPW estimators to probability of selection model misspecification

We simulate with four values of the misspecification parameter, $\delta$ = 0, 0.2, 0.5, and 1, where $\delta = 0$ implies correct specification. As the value of $\delta$ increases, the degree of misspecification increases. For instance, in the DGP with $\delta = 0.2$, the variation due to the omitted variable ($y_{it-1} \cdot c_{2it}$) represents about 3 percent of the total variation.
Moreover, as $\delta$ increases to .5 and 1, the variation due to the omitted variable represents 10 and 25 percent of the total variation, respectively. This simulation is intended to clarify the impact of misspecification of the selection model, since we may not observe some important variables in the selection model. Tables 2.14 to 2.21 show the simulation results. Even with misspecification of the probability of selection model, the bias of the IPW estimators is no greater than the bias of the unweighted estimators for $\delta \leq 0.5$. Figure 2.1 compares the biases of the unweighted and IPW methods for the LS and FD estimators. In particular, if the missing fraction is less than or equal to 0.5, the biases of the IPW estimators are smaller than those of the unweighted estimators in the presence of selection model misspecification for $\delta \leq 0.5$. In sum, the simulation results show that, for $\delta \leq 0.5$, the IPW FD estimator does not exacerbate the bias relative to the unweighted FD estimator even if the selection model is misspecified.

Table 2.14. Selection model misspecification: MTE for unweighted estimators, δ=0

| n | Missing fraction | LS Mean(α) | LS SD(α) | FE Mean(α) | FE SD(α) | FD Mean(α) | FD SD(α) |
|---|---|---|---|---|---|---|---|
| 100 | 0.75 | 1.265 | .246 | .955 | .197 | .953 | .219 |
| 100 | 0.5 | 1.267 | .169 | .973 | .102 | .975 | .118 |
| 100 | 0.25 | 1.269 | .154 | .981 | .092 | .981 | .104 |
| 200 | 0.75 | 1.257 | .179 | .951 | .136 | .951 | .147 |
| 200 | 0.5 | 1.258 | .115 | .976 | .073 | .977 | .082 |
| 200 | 0.25 | 1.261 | .106 | .983 | .064 | .984 | .073 |
| 500 | 0.75 | 1.251 | .111 | .949 | .086 | .948 | .096 |
| 500 | 0.5 | 1.253 | .072 | .973 | .044 | .974 | .052 |
| 500 | 0.25 | 1.257 | .066 | .982 | .040 | .982 | .046 |
| 1,000 | 0.75 | 1.259 | .081 | .951 | .063 | .951 | .070 |
| 1,000 | 0.5 | 1.259 | .049 | .975 | .032 | .974 | .036 |
| 1,000 | 0.25 | 1.260 | .045 | .982 | .028 | .982 | .032 |

*Note: Missing fraction is the fraction of missing sample to full sample. δ represents the degree of misspecification of the selection model.

Table 2.15.
Selection model misspecification: MTE for IPW estimators, δ=0

| n | Missing fraction | LS Mean(α) | LS SD(α) | LS r | LS coverage of r | FD Mean(α) | FD SD(α) | FD r | FD coverage of r |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.75 | 1.271 | .354 | .209 | (.181,.236) | .989 | .299 | .171 | (.146,.196) |
| 100 | 0.5 | 1.271 | .193 | .342 | (.313,.371) | .994 | .139 | .090 | (.072,.108) |
| 100 | 0.25 | 1.275 | .165 | .413 | (.383,.444) | .994 | .113 | .081 | (.064,.098) |
| 200 | 0.75 | 1.250 | .275 | .233 | (.207,.260) | .982 | .216 | .120 | (.100,.140) |
| 200 | 0.5 | 1.261 | .134 | .552 | (.521,.583) | .998 | .097 | .087 | (.070,.104) |
| 200 | 0.25 | 1.265 | .115 | .663 | (.634,.692) | 1.000 | .083 | .079 | (.062,.096) |
| 500 | 0.75 | 1.243 | .199 | .384 | (.354,.414) | .984 | .160 | .116 | (.096,.136) |
| 500 | 0.5 | 1.263 | .086 | .876 | (.856,.896) | .998 | .064 | .088 | (.070,.106) |
| 500 | 0.25 | 1.263 | .073 | .955 | (.942,.968) | .998 | .051 | .077 | (.060,.094) |
| 1,000 | 0.75 | 1.255 | .153 | .553 | (.522,.584) | .990 | .116 | .097 | (.079,.115) |
| 1,000 | 0.5 | 1.266 | .058 | .990 | (.984,.996) | .998 | .043 | .084 | (.067,.101) |
| 1,000 | 0.25 | 1.266 | .049 | .998 | (.995,1) | .998 | .035 | .074 | (.058,.090) |

*Note: r ≡ 1(p < .05) is the rejection rate for the null H0: α = 1 at the nominal level 0.05. Cluster-robust standard errors are used in all estimations.

Table 2.16. Selection model misspecification: MTE for unweighted estimators, δ=.2

| n | Missing fraction | LS Mean(α) | LS SD(α) | FE Mean(α) | FE SD(α) | FD Mean(α) | FD SD(α) |
|---|---|---|---|---|---|---|---|
| 100 | 0.75 | 1.401 | .240 | .938 | .165 | .926 | .183 |
| 100 | 0.5 | 1.303 | .157 | .973 | .095 | .966 | .110 |
| 100 | 0.25 | 1.288 | .146 | .984 | .089 | .978 | .102 |
| 200 | 0.75 | 1.417 | .169 | .941 | .115 | .929 | .126 |
| 200 | 0.5 | 1.317 | .112 | .973 | .069 | .966 | .078 |
| 200 | 0.25 | 1.295 | .104 | .982 | .064 | .977 | .073 |
| 500 | 0.75 | 1.418 | .107 | .941 | .075 | .932 | .084 |
| 500 | 0.5 | 1.316 | .072 | .974 | .043 | .968 | .049 |
| 500 | 0.25 | 1.293 | .067 | .982 | .040 | .977 | .045 |
| 1,000 | 0.75 | 1.415 | .075 | .939 | .050 | .929 | .057 |
| 1,000 | 0.5 | 1.314 | .050 | .975 | .030 | .969 | .035 |
| 1,000 | 0.25 | 1.292 | .047 | .983 | .028 | .979 | .032 |

*Note: Missing fraction is the fraction of missing sample to full sample.

Table 2.17.
Selection model misspecification: MTE for IPW estimators, δ=.2

| n | Missing fraction | LS Mean(α) | LS SD(α) | LS r | LS coverage of r | FD Mean(α) | FD SD(α) | FD r | FD coverage of r |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.75 | 1.381 | .300 | .318 | (.289,.347) | .930 | .231 | .141 | (.119,.163) |
| 100 | 0.5 | 1.288 | .158 | .458 | (.426,.489) | .970 | .115 | .114 | (.094,.134) |
| 100 | 0.25 | 1.281 | .144 | .488 | (.455,.521) | .981 | .103 | .106 | (.086,.126) |
| 200 | 0.75 | 1.405 | .216 | .533 | (.502,.564) | .937 | .163 | .129 | (.108,.150) |
| 200 | 0.5 | 1.305 | .116 | .780 | (.754,.806) | .970 | .081 | .116 | (.096,.136) |
| 200 | 0.25 | 1.289 | .105 | .809 | (.784,.883) | .981 | .074 | .107 | (.088,.126) |
| 500 | 0.75 | 1.401 | .140 | .855 | (.833,.877) | .942 | .109 | .141 | (.119,.163) |
| 500 | 0.5 | 1.304 | .074 | .992 | (.986,.998) | .973 | .051 | .137 | (.116,.158) |
| 500 | 0.25 | 1.286 | .068 | .989 | (.983,.995) | .981 | .046 | .129 | (.108,.150) |
| 1,000 | 0.75 | 1.395 | .095 | .980 | (.971,.989) | .940 | .073 | .185 | (.161,.209) |
| 1,000 | 0.5 | 1.302 | .050 | 1 | (1,1) | .974 | .036 | .181 | (.157,.205) |
| 1,000 | 0.25 | 1.286 | .047 | 1 | (1,1) | .982 | .033 | .134 | (.113,.155) |

*Note: r ≡ 1(p < .05) is the rejection rate for the null H0: α = 1 at the nominal level 0.05. Cluster-robust standard errors are used in all estimations.

Table 2.18. Selection model misspecification: MTE for unweighted estimators, δ=.5

| n | Missing fraction | LS Mean(α) | LS SD(α) | FE Mean(α) | FE SD(α) | FD Mean(α) | FD SD(α) |
|---|---|---|---|---|---|---|---|
| 100 | 0.75 | 1.416 | .215 | .936 | .142 | .916 | .157 |
| 100 | 0.5 | 1.303 | .147 | .971 | .094 | .962 | .110 |
| 100 | 0.25 | 1.292 | .143 | .980 | .089 | .976 | .105 |
| 200 | 0.75 | 1.424 | .146 | .936 | .097 | .918 | .110 |
| 200 | 0.5 | 1.307 | .101 | .974 | .064 | .967 | .074 |
| 200 | 0.25 | 1.288 | .097 | .982 | .060 | .977 | .070 |
| 500 | 0.75 | 1.424 | .098 | .939 | .062 | .922 | .070 |
| 500 | 0.5 | 1.306 | .071 | .974 | .042 | .965 | .047 |
| 500 | 0.25 | 1.289 | .068 | .982 | .039 | .976 | .045 |
| 1,000 | 0.75 | 1.423 | .067 | .941 | .042 | .924 | .048 |
| 1,000 | 0.5 | 1.305 | .048 | .976 | .028 | .968 | .033 |
| 1,000 | 0.25 | 1.287 | .046 | .983 | .027 | .978 | .031 |

*Note: Missing fraction is the fraction of missing sample to full sample.

Table 2.19.
Selection model misspecification: MTE for IPW estimators, δ=.5

| n | Missing fraction | LS Mean(α) | LS SD(α) | LS r | LS coverage of r | FD Mean(α) | FD SD(α) | FD r | FD coverage of r |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.75 | 1.379 | .259 | .400 | (.370,.431) | .914 | .178 | .142 | (.120,.164) |
| 100 | 0.5 | 1.297 | .154 | .505 | (.472,.539) | .962 | .112 | .120 | (.098,.141) |
| 100 | 0.25 | 1.290 | .145 | .495 | (.454,.535) | .977 | .104 | .114 | (.088,.140) |
| 200 | 0.75 | 1.389 | .177 | .656 | (.627,.685) | .910 | .124 | .174 | (.150,.198) |
| 200 | 0.5 | 1.300 | .105 | .798 | (.773,.823) | .968 | .075 | .109 | (.090,.129) |
| 200 | 0.25 | 1.285 | .099 | .799 | (.773,.825) | .978 | .071 | .099 | (.080,.119) |
| 500 | 0.75 | 1.385 | .113 | .942 | (.927,.957) | .918 | .077 | .257 | (.230,.284) |
| 500 | 0.5 | 1.299 | .073 | .991 | (.985,.997) | .967 | .048 | .169 | (.146,.192) |
| 500 | 0.25 | 1.286 | .069 | .992 | (.986,.998) | .977 | .045 | .138 | (.117,.160) |
| 1,000 | 0.75 | 1.384 | .079 | .995 | (.991,.999) | .920 | .053 | .405 | (.375,.435) |
| 1,000 | 0.5 | 1.297 | .049 | 1 | (1,1) | .969 | .033 | .215 | (.189,.241) |
| 1,000 | 0.25 | 1.284 | .046 | 1 | (1,1) | .979 | .031 | .161 | (.138,.184) |

*Note: r ≡ 1(p < .05) is the rejection rate for the null H0: α = 1 at the nominal level 0.05. Cluster-robust standard errors are used in all estimations.

Table 2.20. Selection model misspecification: MTE for unweighted estimators, δ=1.0

| n | Missing fraction | LS Mean(α) | LS SD(α) | FE Mean(α) | FE SD(α) | FD Mean(α) | FD SD(α) |
|---|---|---|---|---|---|---|---|
| 100 | 0.75 | 1.378 | .207 | .929 | .129 | .905 | .150 |
| 100 | 0.5 | 1.314 | .156 | .961 | .093 | .944 | .107 |
| 100 | 0.25 | 1.298 | .151 | .966 | .091 | .955 | .106 |
| 200 | 0.75 | 1.382 | .150 | .936 | .093 | .910 | .107 |
| 200 | 0.5 | 1.313 | .114 | .966 | .067 | .952 | .077 |
| 200 | 0.25 | 1.297 | .110 | .973 | .064 | .963 | .072 |
| 500 | 0.75 | 1.374 | .090 | .931 | .057 | .905 | .064 |
| 500 | 0.5 | 1.313 | .067 | .964 | .042 | .948 | .047 |
| 500 | 0.25 | 1.299 | .065 | .972 | .040 | .960 | .045 |
| 1,000 | 0.75 | 1.376 | .064 | .933 | .038 | .907 | .045 |
| 1,000 | 0.5 | 1.311 | .049 | .964 | .029 | .949 | .034 |
| 1,000 | 0.25 | 1.298 | .047 | .972 | .027 | .962 | .032 |

*Note: Missing fraction is the fraction of missing sample to full sample.

Table 2.21.
Selection model misspecification: MTE for IPW estimators, δ=1.0

                   IPW: LS                                  IPW: FD
n      missing     Mean(α)  SD(α)  r      coverage of r     Mean(α)  SD(α)  r      coverage of r
       fraction
100    0.75        1.402    .248   .442   (.410,.473)       .902     .167   .170   (.146,.193)
100    0.5         1.323    .160   .529   (.496,.563)       .943     .108   .133   (.110,.156)
100    0.25        1.305    .154   .527   (.488,.565)       .955     .107   .127   (.102,.153)
200    0.75        1.412    .176   .730   (.702,.758)       .903     .115   .208   (.183,.233)
200    0.5         1.323    .118   .807   (.783,.832)       .952     .077   .149   (.127,.171)
200    0.25        1.305    .112   .817   (.792,.841)       .964     .072   .129   (.108,.150)
500    0.75        1.402    .105   .981   (.973,.989)       .896     .069   .416   (.385,.447)
500    0.5         1.322    .069   .997   (.994,1)          .947     .048   .257   (.230,.284)
500    0.25        1.307    .066   .996   (.992,1)          .960     .045   .204   (.179,.229)
1,000  0.75        1.406    .074   1      (1,1)             .897     .049   .614   (.584,.644)
1,000  0.5         1.321    .050   1      (1,1)             .948     .034   .423   (.392,.454)
1,000  0.25        1.305    .047   1      (1,1)             .962     .032   .297   (.269,.325)

*Note: r ≡ 1(p < .05) is the rejection rate for the null H0: α = 1 at the nominal 0.05 level. Cluster-robust standard errors are used in all estimations.

2.5.4 The robustness of the first-differencing estimator with small within-variation

In most experiments, program participants ideally keep their treatment or control group status from the start to the end of the program. This inevitably leaves very little within-variation in the treatment variable, which is one of the main criticisms of fixed effects and first-differencing estimators in program evaluation studies. We therefore examine the sensitivity of the IPW first-differencing estimator to small within-variation in the treatment variable. The magnitude of within-variation is induced randomly, and we assume that the probability of selection is correctly specified.
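The concern can be illustrated with a minimal simulation sketch. It is not the design used in this chapter: it assumes a two-period panel without missing data, y_it = α·d_it + c_i + u_it with α = 1, and a hypothetical parameter `share_switch` giving the fraction of units whose treatment status changes between periods. The function name `fd_estimates` and all tuning values are illustrative assumptions. First differencing removes c_i even though treatment is correlated with it, but the spread of the FD estimates grows as fewer units switch.

```python
import numpy as np


def fd_estimates(n, share_switch, alpha=1.0, reps=500, seed=0):
    """Return `reps` first-difference estimates of alpha from simulated panels.

    Illustrative assumptions: two periods, no attrition, and a fraction
    `share_switch` of units changing treatment status between periods.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        c = rng.normal(size=n)                            # unobserved heterogeneity c_i
        d1 = (c + rng.normal(size=n) > 0).astype(float)   # treatment correlated with c_i
        switch = rng.uniform(size=n) < share_switch       # units that change status
        d2 = np.where(switch, 1.0 - d1, d1)
        y1 = alpha * d1 + c + rng.normal(size=n)
        y2 = alpha * d2 + c + rng.normal(size=n)
        dd, dy = d2 - d1, y2 - y1                         # differencing removes c_i
        dd_c = dd - dd.mean()                             # OLS slope with an intercept
        estimates[r] = (dd_c @ (dy - dy.mean())) / (dd_c @ dd_c)
    return estimates


for share in (1.0, 0.5, 0.15):
    est = fd_estimates(n=200, share_switch=share)
    print(f"share={share:.2f}  mean={est.mean():.3f}  sd={est.std():.3f}")
```

In this sketch only switchers contribute to the denominator of the FD slope, so the effective sample size is roughly n times `share_switch`; that is the mechanism behind the loss of precision examined in this subsection.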
Motivation of sensitivity test for IPW FD estimator to small within variation

Fixed effects and first-differencing estimators are very effective at eliminating bias when the response variable differs systematically between units with and without missing data across treatment and control groups, and that difference is due to unobserved heterogeneity. However, identification of the first-differencing estimator rests only on within variation.

DGP for $d_{it}$ with small within variation

Three different magnitudes of within-variation for the treatment status $d_{it}$ are induced randomly in the simulation.

No restriction on treatment status (L = 0)

The first type generates the treatment variable without imposing any restriction on its within-variation. With no restriction, about 68 percent of the sample change status during the four time periods:

\[ d_{it} = 1 \text{ if } f_i + q_{it} > 0 \text{ and } d_{it} = 0 \text{ otherwise, where } f_i \sim \text{iid } U(0,1), \; q_{it} \sim \text{iid } N(0,1). \]

Figure 2.1. Bias of unweighted and IPW LS estimator
(Note) For the figures in the first row, the six sets of bars represent LS (a=.25), IPW-LS (a=.25), LS (a=.5), IPW-LS (a=.5), LS (a=.75), and IPW-LS (a=.75) estimates from left to right, where the true value is 0. The bar colors represent the degree of misspecification: blue (correctly specified), red (δ=.2), green (δ=.5), and purple (δ=1.0), where δ is the magnitude of the misspecification coefficient in the selection model and a is the fraction of missing data. For the figures in the second row, the six sets of bars represent FD (a=.25), IPW-FD (a=.25), FD (a=.5), IPW-FD (a=.5), FD (a=.75), and IPW-FD (a=.75) estimates from left to right. In both rows, the left panel reports simulation results for n=100 and the right panel for n=1000.

Small within variation treatment variable

Smaller within-variation in the treatment status is induced with the following DGP.

\[ d_{i1} = 1 \text{ if } f_i + q_{i1} > 0 \text{ and } d_{i1} = 0 \text{ otherwise.} \]
\[ d_{it} = \begin{cases} d_{it-1} & \text{if } -L < f_i + q_{it} < L \\ 0 & \text{if } f_i + q_{it} < -L \\ 1 & \text{if } f_i + q_{it} > L \end{cases} \]

The value of L therefore determines the fraction of units whose treatment variable varies over time: a larger L makes it more likely that a unit keeps its previous status. Selection is generated as

\[ s_{i1} = 1, \qquad s_{it} = 1 \text{ if } s_{it-1} = 1 \text{ and } f_i + c_{2it} + \varepsilon_{it} > a, \]

where $\varepsilon_{it} \sim \text{iid } N(0,1)$, $c_{2it} = r_{it} + q_{it}$, and $a$ is a cutoff value adjusted from DGP I. The model for the selection process is assumed to be correctly specified, so that $\pi_{it}$ is obtained from

\[ \pi_{it} = \frac{\exp(z_{it}\gamma)}{1 + \exp(z_{it}\gamma)}, \qquad t = 2, 3, 4, \text{ for } s_{it-1} = 1, \]

where $z_{it} = (f_i, c_{2it})$.

Sensitivity of IPW FD estimator to small within variation for a treatment variable

We test the sensitivity of the IPW FD estimator to small within variation in a treatment variable. The DGPs generate a treatment variable that varies over time for all units, for 50% of units, and for 15% of units. The simulation is intended to clarify the impact of small within variation on the consistency of the IPW FD estimator.

Simulation results

Tables 2.22 to 2.24 show the bias of the FD and IPW-FD estimators when the treatment has small within variation. The key finding is that when the missing fraction is less than 50% and more than 50% of units have within variation, the IPW-FD estimator recovers the MTE with little bias. Figure 2.2 compares the biases of the FD and IPW-FD methods. The IPW-FD method works particularly well when the missing fraction is less than 0.5. Unless fewer than 25 percent of units vary, the bias caused by randomly induced small within variation is not substantial.

Table 2.22.
Sensitivity analysis to small within-variation: MTE with full within-variation

                   FD                  IPW-FD
n      missing     Mean(α)  SD(α)      Mean(α)  SD(α)  r      coverage of r
       fraction
100    0.75        .946     .219       .974     .290   .147   (.123,.171)
100    0.5         .967     .130       .992     .155   .104   (.085,.122)
100    0.25        .984     .101       .994     .107   .083   (.066,.101)
200    0.75        .949     .150       .974     .224   .126   (.105,.146)
200    0.5         .970     .090       .998     .111   .076   (.060,.092)
200    0.25        .987     .072       1.000    .078   .083   (.066,.100)
500    0.75        .948     .096       .979     .169   .122   (.101,.142)
500    0.5         .967     .057       .993     .079   .089   (.071,.107)
500    0.25        .987     .046       .998     .049   .095   (.077,.113)
1,000  0.75        .947     .068       .985     .124   .093   (.080,.106)
1,000  0.5         .968     .040       .994     .055   .084   (.067,.101)
1,000  0.25        .986     .032       .999     .036   .100   (.081,.119)

*Note: Missing fraction is the fraction of the full sample that is missing. r ≡ 1(p < .05).

Table 2.23. Sensitivity analysis: MTE with within-variation of 50% sample only

                   FD                  IPW-FD
n      missing     Mean(α)  SD(α)      Mean(α)  SD(α)  r      coverage of r
       fraction
100    0.75        .941     .302       .964     .392   .190   (.163,.216)
100    0.5         .962     .177       .993     .218   .109   (.090,.129)
100    0.25        .987     .134       1.001    .145   .087   (.069,.105)
200    0.75        .952     .210       .987     .294   .145   (.123,.167)
200    0.5         .964     .120       .995     .156   .100   (.081,.119)
200    0.25        .983     .095       .998     .106   .081   (.064,.098)
500    0.75        .941     .137       .979     .223   .142   (.120,.164)
500    0.5         .961     .077       .998     .100   .071   (.055,.087)
500    0.25        .984     .058       .999     .067   .063   (.048,.078)
1,000  0.75        .933     .091       .983     .164   .115   (.095,.135)
1,000  0.5         .965     .053       1.001    .072   .063   (.048,.078)
1,000  0.25        .983     .041       1.000    .045   .071   (.055,.087)

*Note: Missing fraction is the fraction of the full sample that is missing. r ≡ 1(p < .05).

Table 2.24.
Sensitivity analysis: MTE with within-variation of 15% sample only

                   FD                  IPW-FD
n      missing     Mean(α)  SD(α)      Mean(α)  SD(α)  r      coverage of r
       fraction
100    0.75        .733     .735       .731     .793   .566   (.532,.599)
100    0.5         .935     .418       .968     .479   .221   (.195,.247)
100    0.25        .959     .280       .981     .299   .117   (.097,.137)
200    0.75        .904     .571       .922     .669   .412   (.381,.442)
200    0.5         .940     .273       .960     .343   .155   (.132,.177)
200    0.25        .973     .186       .994     .212   .090   (.077,.102)
500    0.75        .927     .314       .973     .485   .251   (.224,.278)
500    0.5         .942     .164       .974     .233   .116   (.096,.136)
500    0.25        .976     .116       1.003    .135   .060   (.045,.075)
1,000  0.75        .902     .208       .919     .385   .191   (.167,.215)
1,000  0.5         .946     .115       .986     .170   .097   (.079,.115)
1,000  0.25        .971     .082       1.004    .096   .055   (.041,.070)

*Note: Missing fraction is the fraction of the full sample that is missing. r ≡ 1(p < .05).

Figure 2.2. Bias of unweighted and IPW FD estimator to small within variation
(Note) The six sets of bars represent the FD and IPW-FD estimators by degree of covariate variation. From left to right, the sets of bars represent FD (v=1), IPW-FD (v=1), FD (v=.5), IPW-FD (v=.5), FD (v=.15), and IPW-FD (v=.15), where v is the fraction of units whose covariate varies over time. The missing fraction is represented by the bar color: blue for 25%, red for 50%, and green for 75%. The left panel reports the estimates for n=200 and the right panel reports the estimates for n=1,000.

2.6 Empirical application: The return to class size reduction (CSR) on SAT score

In this section, we estimate the effect of CSR on SAT scores for grades K-3 using data from Tennessee's STAR experiment, which followed a single cohort of students from 1985 to 1989. The data cover class size, student characteristics, teacher characteristics, classroom peer characteristics, and student achievement scores. However, a substantial proportion of the data is missing, so that only about 50 percent of the sample is complete-case. Of the 11,601 students in total, about 6,800 are in the program in each grade from K to 3.
We start by testing MCAR using a Hausman-type test and a variable addition test to determine whether we can use the unweighted estimator with complete cases. As we reject the null of MCAR, we apply the IPW and MI-IPW methods, using the parental school choice model of Becker [1993] to specify the probability of selection. This section proceeds as follows. We first briefly describe Tennessee's STAR experiment and provide detailed information on noncompliance and missing patterns in the data.19 Second, we provide the Hausman-type test and the variable addition test of MCAR. Third, we report IPW estimates and compare them to unweighted estimates.

2.6.1 Background information on Project STAR experiment of CSR

Origination of experiment

Observational data have not produced a consistent estimate of the effect of class size reduction on student performance (Hanushek [1986] and Hanushek [1997]), while experimental studies of class size reduction show a strong and positive effect.20 Therefore, in 1985 the Tennessee legislature funded a randomized experiment, Tennessee's Project STAR (Student/Teacher Achievement Ratio), to provide a definitive answer on the effectiveness of CSR for student achievement.21

19 A more detailed description of the STAR experiment is provided in Word et al. [1990].
20 For instance, experimental studies such as Tennessee's Project STAR (Student/Teacher Achievement Ratio), Wisconsin's Student Achievement Guarantee in Education (SAGE) Program, and Israel's Maimonides' Rule reported a strong link between class size reduction and improvement in student achievement.
21 The experiment was funded by the Tennessee State Legislature under Lamar Alexander at a total cost of $12 million, with $3 million for the first year of the program.

Experiment design

It was a randomized experiment that assigned students and teachers to three different class types: a small-size class (target of 13-17 students), a regular-size class (target of 22-25 students), or a regular-size class with a full-time teacher's aide.
The randomization was done within school for both students and teachers.22 Each year about 6,800 students, 330 classes, and 79 schools in 46 districts participated in the program from 1985 to 1989. A single cohort of students who entered a participating school in the 1985-1986 school year participated in the experiment for grades K-3. Thus, the Project STAR experiment possesses the essential features of a controlled experiment designed to produce reliable evidence about the effects of reducing class size.23

Major findings in previous studies

The evidence showed that students in the smaller classes outperformed students in the regular classes, whether or not the regular class teachers had a full-time aide helping them. In particular, the positive effect of smaller classes is greater for black and inner-city poor students. Moreover, students who had been in a small class performed better even after the treatment was over (Krueger and Whitmore [2001]).

22 All participating schools implemented at least one of each of the three types of classes in order to cancel out the possible influences coming from variations in the quality of the participating schools that might affect the quality of the classroom activity.
23 See previous studies such as Word et al. [1990] and Krueger [1999] for more details on the experiment design and implementation.

2.6.2 Evidence of experimental violation

The validity of an experimental analysis depends on initial randomization and on the absence of differential attrition and refreshment across treatment and control groups. There is good evidence that initial class types were randomly assigned among students and teachers within school, but the Project STAR experiment suffers from implementation problems of noncompliance and differential missing across treatment and control groups.24

Table 2.25. Students in each class type for grades K-3

                                     small    regular/aide    Total
Kindergarten
  Randomly assigned                  1900     4425            6325
  Total                              1900     4425            6325
First grade
  Previously randomly assigned       1293     2867            4160
  New entrant                        384      1929            2313
  Switchers                          248      108             356
  Switchers from previous years      248      108             356
  Total                              1925     4904            6829
Second grade
  Previously randomly assigned       1273     3402            4675
  New entrant                        366      1313            1679
  Switchers                          377      109             486
  Switchers from previous years      192      47              239
  Total                              2016     4824            6840
Third grade
  Previously randomly assigned       1276     3567            4843
  New entrant                        373      908             1281
  Switchers                          525      153             678
  Switchers from previous years      207      72              279
  Total                              2174     4628            6802

Source: author's own calculation from the sample of Project STAR data.

Noncompliance

Size of noncompliance sample

There was sizable noncompliance with the initial assignment to treatment and control class types. Among the 24,275 complete observations, about 2,200 were reassigned to a class type different from the initial assignment. Table 2.25 shows the number of reassignments from small to other class types or vice versa. For instance, in first grade, out of 4,515 remaining students, 356 (7.9%) switched class types from small to aide/regular or vice versa. For second and third grades, out of 5,049 and 5,413 remaining students, 239 (4.9%) and 279 (5.2%) students switched class types, respectively. In sum, about 10% of students were moved from one class type to another in a non-random manner.25

The solution to noncompliance

If the initial assignments to treatment and control were completely random, the noncompliance problem can be overcome quite easily in a linear panel data model by the instrumental variable (IV) method, using the initial treatment assignments as instruments for the actual treatments.

Missing Data

Differential missing

Ignoring missing data causes problems in the estimation of the MTE if missingness differs systematically across treatment and control groups. Typically, attrition can
Typically, attrition can 24 [Krueger, 1999, p.502] for details of initial random assignment of class types for kindergartners. 25 Krueger [1999] reported that most of these moves were due to student misbehavior, and were not typically the result of parental request for small class reassignment. 119 depend on some observed and unobserved characteristics and different characteristics across treatment and control groups can lead to differential attrition. Table 2.26 and 2.27 provide an indirect evidence for possible differential attrition across class types by comparing simple mean of SAT score across groups. Table 2.26 summarizes the frequency and fraction of missing for each class types and table 2.27 compares average SAT score for attritors and stayers of the program across class types. The difference of score between attritors and remainders is greater for students in regular classes and that in small classes. Size of missing data There was sizable missing from the prior year’s treatment and control groups. Table 2.27 shows that about 6,000 to 7,000 students for each grade, 1,809 students left the program in first grade, 1,608 students left the program in second grade, and 1,319 students left the program in third grade. For instance, among those students in the program at the beginning, about 45 percent of students left Project STAR program before finishing third grade. Overall, out of 11,601 students, about 48 % of all participant students either left the program before it ends or enter the program after kinder grade. 120 Table 2.26. Attrition and reassignments of class types (noncompliance) A. Kindergarten to first grade Kindergarten First grade small regular regular with aide leaving sample (All-K) fraction of attrition small 1293 60 48 380 (1,900) 0.20 regular 126 737 663 668 (2,194) 0.31 aide 122 761 706 642 (2,231) 0.29 new sample 384 1,026 903 2,313(new)/1,809(leaving) B. 
First grade to second grade First grade Second grade small regular regular with aide leaving sample (All-G1) fraction of attrition small 1,435 23 24 443 (1,925) 0.23 regular 152 1,498 202 732 (2,584) 0.28 aide 40 115 1,560 650 (2,320) 0.28 new sample 389 693 709 1,791(new)/1,668(leaving) C. Second grade to third grade Second grade Third grade small regular regular with aide leaving sample (All-G2) fraction of attrition small 1,564 37 35 380 (2,016) 0.188 regular 167 1,485 152 525 (2,329) 0.225 aide 40 76 1,857 522 (2,495) 0.209 new sample 403 487 481 1,371(new)/1,319(leaving) source: Project STAR and beyond: Database User’s Guide(2007) 121 Table 2.27. SAT percentile scores at t − 1 for Leavers and stayers 1st grader Frequency Stayer of small class at first grade 1,319 Leave small sample at first grade 453(25.6%) Stayer of regular or regular with aide class at first grade 2949 Leave regular or regular with aide class at first grade 1182(28.6%) Stayer of all class types at first grade 4267 Leaver of all class types at first grade 1635 2nd grader Frequency Stayer of small class at second grade 1173 Leave small sample at second grade 276(19.0%) Stayer of regular or regular with aide class at second grade 2187 Leave regular or regular with aide class at second grade 631(22.4%) Stayer of all class types at second grade 3360 Leaver of all class types at second grade 907 Mean 59.19 45.55 54.56 42.26 55.99 43.18 Mean 61.35 46.94 57.63 44.28 58.93 45.09 SE Diff(small-reg/aide) 25.40 13.64 28.22 25.26 12.30 26.54 25.40 12.81 27.05 SE 24.69 14.41 27.13 24.26 13.35 25.17 24.27 13.84 25.80 Note: Average score for the group are reported in mean column. Diff(small-reg/aide) column report the average score difference between stayers and leavers for each class type. 122 Basic statistics of missing data Table 2.28 shows data used in the regression analysis for grades in K-3. 
Overall, 48 percent of the data is missing, and 16 different missing patterns appear in the data, as shown in table 2.29. Among students in the STAR program, some participating students have missing variables in each grade (the ratio A/B in table 2.28 ranges from 0.865 to 0.942), and only about 52 percent of students have complete-case data. There is therefore a severe attrition problem. Missing data in Project STAR arise from both student attrition and missing variables for participating students.

Table 2.28. Data availability for covariates

                                                      K        G1       G2       G3
Small class                                           6325     6829     6840     6802
Aide class                                            6325     6829     6840     6802
Regular class                                         6325     6829     6840     6802
White/Asian                                           11467    11467    11467    11467
Female                                                11581    11581    11581    11581
Free Lunch                                            6300     6650     6496     6520
Teacher white                                         6286     6810     6780     6737
Teacher experience                                    6304     6810     6739     6751
Teacher master degree and above                       6304     6788     6744     6736
Fraction of free lunch in class                       6325     6781     6645     6652
Fraction of kinder attendee in class                  6325     6829     6840     6801
Outcome (y_it) available                              5907     6684     6559     6464
All covariates (x_it) available                       6256     6584     6288     6428
Both y_it and all covariates (x_it) available (≡ A)   5840     6430     5919     6086
Number of students in the program (≡ B)               6325     6829     6840     6802
A/B                                                   0.923    0.942    0.865    0.895

Table 2.29. Data availability by selection indicator

case      # of sample
case1     2418
case2     513
case3     1079
case4     898
case5     328
case6     935
case7     1582
case8     847
case9     492
case10    1154
case11    101
case12    53
case13    52
case14    223
case15    124
case16    802
Total     11601

Selection pattern by grade (columns K, G1, G2, G3): × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × ×

(Note) A check mark implies s_it = 1 and × implies s_it = 0.

Table 2.30.
Basic statistics: The mean and standard deviation

                                        K               G1              G2              G3
SAT score                               52.44 (26.49)   52.82 (27.46)   52.40 (26.99)   51.85 (27.17)
Small class                             0.30 (0.46)     0.28 (0.45)     0.29 (0.46)     0.32 (0.47)
Aide class                              0.35 (0.48)     0.34 (0.47)     0.37 (0.48)     0.37 (0.48)
Regular class                           0.35 (0.48)     0.38 (0.49)     0.34 (0.47)     0.31 (0.46)
White/Asian                             0.67 (0.47)     0.67 (0.47)     0.65 (0.48)     0.67 (0.47)
Female                                  0.49 (0.50)     0.48 (0.50)     0.48 (0.50)     0.48 (0.50)
Free Lunch                              0.48 (0.50)     0.52 (0.50)     0.51 (0.50)     0.51 (0.50)
Teacher white                           0.84 (0.37)     0.83 (0.38)     0.80 (0.40)     0.79 (0.41)
Teacher experience                      9.26 (5.81)     11.63 (8.94)    13.14 (8.65)    13.93 (8.61)
Teacher master degree and above         0.34 (0.47)     0.34 (0.47)     0.36 (0.48)     0.43 (0.49)
Fraction of free lunch in class         0.48 (0.29)     0.52 (0.28)     0.51 (0.29)     0.51 (0.28)
Fraction of kinder attendee in class    -               0.66 (0.18)     0.54 (0.20)     0.47 (0.19)

(Note) Standard deviations are in parentheses.

Selection indicator, $s_{it}$: The selection indicator $s_{it}$ takes the value 1 only if the response variable and all covariates [small class (dummy), class with aide (dummy), student (race, gender, free lunch), teacher (race, experience, degree), class (fraction of free lunch, fraction of kindergarten attendees)] are observed; otherwise we set $s_{it}$ to 0. We also denote $T_i = \sum_{t=K}^{3} s_{it}$. Table 2.29 shows the number of students for each type of data based on the selection indicator $s_{it}$. Cases 11 to 15 in table 2.29 are non-monotone type II missing, so we assume MCAR for these missing patterns when applying the MI-IPW method.

2.6.3 The impact of CSR on SAT score for grades in K-3

We start by replicating the complete-case estimations of previous studies under MCAR.

Model for the effect of CSR on SAT score

Using the model specification of Krueger [1999], we adopt the linear panel data model in (2.41).

Model

Operationally ignoring missing data in estimation, we can apply pooled LS or pooled reduced-form (IV) estimators.26

\[ y_{it} = D_{it}\beta + x_{1it}\delta + x_{2i}\gamma + f_t + c_i + u_{it}, \qquad i = \text{student}, \; t = K, G1, G2, G3, \tag{2.41} \]

where $D_{it}$ contains the two treatments. Equivalently,

\[ y_{it} = x_{it}\theta + c_i + u_{it}, \quad \text{where } x_{it} = (D_{it}, x_{1it}, x_{2i}, f_t), \; \theta = (\beta', \delta', \gamma', 1)'. \]

Here $y_{it}$ is student academic achievement; $D_{it}$ is the vector of class type assignments (small, and regular with aide); the observed covariates include student, teacher, and peer characteristics, with $x_{1it}$ the time-varying observable covariates (including school dummies) and $x_{2i}$ the time-fixed observable covariates; $c_i$ is unobserved heterogeneity; $f_t$ is the time effect; and $u_{it}$ is the unobserved error.

26 A reduced-form estimation uses the initial treatment and control assignments as explanatory variables in the model specification.

y (Achievement Score)

The achievement score we use is the Stanford Achievement Test (SAT). It measures student achievement in reading, math, and word skills in grades K-3. Following Krueger [1999], we scale the test scores into percentile ranks. Percentile ranks are calculated using the scores of students assigned to the regular and regular with full-time aide class types only, and a separate percentile ranking is generated for each of the three subjects. The response variable y is the average of these three percentile rankings. For students who do not have all three percentile rankings, we average the rankings that are available: if one of the three subject scores is missing, y is the average of the other two, and if two subject scores are missing, y is the single available ranking.27

Covariates x include: (i) student characteristics: race, gender, free lunch status; (ii) teacher characteristics: years of experience, highest degree dummy, race; (iii) peer characteristics: fraction of classmates with free lunch, fraction of female classmates, fraction of classmates who attended kindergarten, and fraction of minority classmates; and (iv) school variables: school indicator, school location indicator.

27 Project STAR Public Access Data is obtained from www.heros-inc.org/data.html. All the following descriptions of variables used in this study are based on Project STAR and Beyond: Database User's Guide (2007).

Unweighted estimations

Results

Cross-section estimation

Table 2.31 reports unweighted cross-section estimates for both LS and IV. The estimated cross-section return to a small class is about 5 to 7 percentile points for grades K-3, which is about 1/5 of a standard deviation of the SAT score. The estimated cross-section return to a class with a full-time aide is not statistically different from 0 for grades K-3.

Table 2.31. Cross-section unweighted LS and reduced-form estimates

Other covariates: school dummy, student and teacher characteristics
                OLS                                   reduced-form
                K        G1       G2       G3         K        G1       G2       G3
Small           5.205    7.683    5.821    4.964      5.205    6.621    5.279    5.247
                (.749)   (.740)   (.776)   (.822)     (.749)   (.758)   (.786)   (.814)
Aide            .259     1.866    1.642    -.492      .259     1.509    1.175    -.072
                (.698)   (.710)   (.733)   (.770)     (.698)   (.679)   (.700)   (.726)
R-squared       .309     .299     .278     .221       .309     .295     .276     .220
# of sample     5840     6430     5919     6086       5840     6430     5919     6086

Other covariates: school dummy, student, teacher, and peer (classmates) characteristics
                OLS                                   reduced-form
                K        G1       G2       G3         K        G1       G2       G3
Small           5.224    7.599    5.440    4.644      5.224    6.490    5.064    4.979
                (.749)   (.747)   (.782)   (.836)     (.749)   (.767)   (.786)   (.817)
Aide            .434     2.012    1.445    -.791      .434     1.560    1.156    -.078
                (.698)   (.716)   (.738)   (.777)     (.698)   (.680)   (.699)   (.728)
R-squared       .311     .300     .280     .222       .311     .296     .279     .221
# of sample     5840     6430     5919     6086       5840     6430     5919     6086

(Note) Student cluster-robust standard errors are in parentheses. Student characteristics include race, gender, and free lunch status and teacher characteristics include race, years of experience and highest degree dummy.
Peer characteristics include the fraction of classmates with free lunch, fraction of minority classmates and fraction of female classmates. Reduced-form estimation uses the initial treatment assignments as independent variables.

Pooled estimation of LS and FD estimation

Table 2.33 reports unweighted FD estimates of the return to CSR on SAT scores. First, the FD estimate of the return to CSR is about 6.4 percent (a coefficient of about 1.6 per cumulative year in a small class over four years: 1.6 · 4 = 6.4), while the unweighted pooled LS estimate of the return to CSR in table 2.32 is about 7.5 percent. Second, the unweighted FD estimates imply that a class with a full-time aide has a statistically significant positive impact on SAT scores relative to a regular class without a full-time aide, while the pooled estimates of the return to a class with a full-time aide are not statistically different from zero. Overall, with unweighted methods, the effect of CSR on SAT scores is between 5 (pooled IV) and 7.5 (pooled LS) percent.

Table 2.32. Unweighted pooled LS and reduced-form

                             LS: full sample      IV: full sample      LS: attrition sample   IV: attrition sample
Initial small                3.243     3.201      4.991     4.903      3.474     3.441        4.856     4.806
                             (.806)    (.806)     (.507)    (.507)     (.980)    (.980)       (.698)    (.699)
Initial aide                 .495      .518       .782      .796       -.022     -.004        .205      .244
                             (.671)    (.670)     (.507)    (.507)     (.797)    (.797)       (.663)    (.663)
Cumulative years in small    1.003     .978       -         -          .746      .743         -         -
                             (.388)    (.388)                          (.436)    (.436)
Cumulative years in aide     .262      .254       -         -          .221      .241         -         -
                             (.376)    (.377)                          (.426)    (.427)
R-squared                    .228      .229       .226      .227       .227      .229         .226      .227
# of sample                  24,275    24,275     24,275    24,275     16,222    16,222       16,222    16,222
Student characteristics      Yes       Yes        Yes       Yes        Yes       Yes          Yes       Yes
Teacher characteristics      Yes       Yes        Yes       Yes        Yes       Yes          Yes       Yes
Peer characteristics         No        Yes        No        Yes        No        Yes          No        Yes
School, grade, PP-year       Yes       Yes        Yes       Yes        Yes       Yes          Yes       Yes

(Note) Student cluster-robust standard errors are in parentheses. PP-year denotes program participation year dummies. Grade implies dummies for grades in K-3.
IV represents the estimation method in which the initial class type assignments are used as treatment variables.

Table 2.33. Unweighted FD estimates

                             FD with full sample            FD with attrition sample
Cumulative years in small    1.738     1.680     1.651      1.827     1.641     1.613
                             (.362)    (.364)    (.364)     (.416)    (.413)    (.414)
Cumulative years in aide     1.492     1.370     1.359      1.738     1.692     1.689
                             (.358)    (.363)    (.364)     (.426)    (.427)    (.428)
R-squared                    .016      .020      .020       .015      .019      .020
# of sample                  13214     12924     12924      9883      9793      9793
Student characteristics      Yes       Yes       Yes        Yes       Yes       Yes
Teacher characteristics      No        Yes       Yes        No        Yes       Yes
Peer characteristics         No        No        Yes        No        No        Yes
School, grade, PP-year       Yes       Yes       Yes        Yes       Yes       Yes

(Note) Student cluster-robust standard errors are in parentheses. Grade implies dummies for grades in K-3.

2.6.4 Test of strict exogeneity assumption

We formally test strict exogeneity using a Hausman-type test based on the unweighted FD and FE estimators. We perform the Hausman-type II test and the regression-based variable addition test of Wooldridge [2009] for unbalanced panel data.

Test Results

Table 2.34 shows that the null hypothesis of strict exogeneity is rejected at the nominal 0.05 level in every specification that includes covariates (p-values between .035 and .045); in the specification without covariates the p-value is .089. This implies that strict exogeneity is violated and that the unweighted FE and FD estimators with complete cases would be biased and have incorrect standard errors. Furthermore, table 2.35 shows the results of the regression-based variable addition tests: H0: λ = 0 in (2.19)-(2.21) is rejected for the FD and FE estimation models. The tests imply that MCAR is violated and, therefore, that neither the unweighted FD estimator nor the unweighted FE estimator is valid.

Indirect evidence of weak exogeneity from LS and reduced-form (IV)

Unfortunately, we cannot directly test conditional contemporaneous exogeneity (i.e., weak exogeneity conditional on selection, E(s_it · d_it · error_it) = 0) for pooled LS and reduced-form (IV) estimators.
However, if we assume that E(s_it | ·) and E(s_it+1 | ·) are serially correlated, we can conjecture the sign of E(s_it · d_it · error_it) from the sign of E(s_it+1 · d_it · error_it). The LS and IV columns in table 2.35 indicate that E(s_it+1 · d_it · error_it) > 0. Thus, if E(s_it · d_it · error_it) and E(s_it+1 · d_it · error_it) have the same sign, both the LS and IV estimators are biased upward.

Table 2.34. Hausman-type test II: strict exogeneity

                                   FD       FE         FD       FE         FD       FE         FD       FE
Cumulative year in small class     1.569    .544       1.741    .477       1.682    .458       1.648    .442
                                   (.391)   (.321)     (.403)   (.324)     (.363)   (.326)     (.363)   (.327)
Cumulative year in aide class      1.514    .809       1.501    .774       1.377    .705       1.360    .692
                                   (.386)   (.319)     (.400)   (.322)     (.361)   (.323)     (.362)   (.324)
# of sample                        31228               30309               29960               29960
H0: θpols = θFE, k = 2             W2 = 4.82           W2 = 6.70           W2 = 6.37           W2 = 6.16
P-values                           {.089}              {.035}              {.041}              {.045}
Student characteristics            No                  Yes                 Yes                 Yes
Teacher characteristics            No                  No                  Yes                 Yes
Peer characteristics               No                  No                  No                  Yes
School, grade, PP-year             Yes                 Yes                 Yes                 Yes

Table 2.35.
Variable addition test: strict exogeneity

                                   PLS       PIV       FE        FD        PLS       PIV       FE        FD
Selection (s_it+1)                 10.439    10.474    4.571     3.889     10.975    10.287    4.185     3.604
                                   (.420)    (.420)    (.514)    (.544)    (.418)    (.419)    (.510)    (.540)
Initial small                      2.702     5.047     -         -         2.848     4.113     -         -
                                   (.919)    (.575)                        (.923)    (.598)
Initial aide                       .076      .922      -         -         .185      .886      -         -
                                   (.749)    (.513)                        (.753)    (.516)
Cumulative year in small class     1.602     -         1.276     2.054     .951      -         1.249     2.084
                                   (.522)              (.474)    (.491)    (.545)              (.478)    (.494)
Cumulative year in aide class      .802      -         1.866     2.085     .668      -         1.756     2.110
                                   (.510)              (.468)    (.492)    (.516)              (.471)    (.498)
R-squared                          .269      .269      -         .037      .272      .272      -         .037
# of sample                        18386     18386     18386     8538      18189     18189     18189     8390
Student characteristics            Yes       Yes       Yes       Yes       Yes       Yes       Yes       Yes
Teacher characteristics            No        No        No        No        Yes       Yes       Yes       Yes
Peer characteristics               No        No        No        No        Yes       Yes       Yes       Yes
School, grade, PP-year             Yes       Yes       Yes       Yes       Yes       Yes       Yes       Yes

2.6.5 Estimation with inverse probability weighted (IPW) and multiple imputation (MI)

The two formal tests in the previous section provide evidence that the correlation among the unobserved heterogeneity $c_i$, the idiosyncratic errors $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})$, and the selection indicator $s_i$ differs across $d_{it}$, which implies that MCAR is violated. Therefore, we apply the IPW and MI-IPW methods, using as weights a probability of selection that satisfies the MAR assumption.

IPW estimation

To estimate the probability of selection (i.e., $E[s_{it} \mid W_{it-1}]$), we adopt the method in Robins et al. [1994] and Wooldridge [2010], in which the probability of selection is estimated sequentially for unbalanced panel data. We base our choice of predictors for the probability of selection on parents' choice between private and public school. At the beginning of the period, all students start in public school.

Parents' school choice and attrition

We use the parental school choice model in Becker [1993] to describe student attrition in the STAR program.
Using this model, we specify a model for the probability of selection that satisfies the MAR assumption. The estimated probability of selection is used to form the inverse probability weights in IPW estimation.

Assumptions

1. Parents maximize utility directly over two periods and indirectly over infinitely many periods.
2. Heterogeneity takes two forms: parent resources ($h_{1i}$) and student individual ability ($h_{2i}$).
3. In each period, parents choose how to allocate resources between (i) consumption and (ii) spending on the human capital of children ($E_t$).
4. Initially, as all students start in public school, we assume that the marginal cost of moving to private school is greater than the marginal benefit.

Model: Parent's optimization problem

Following Becker [1993], at time t, the parent's utility depends both on their own utility and on their children's utility, as in (2.42).

$$U_t = u(c_t) + \rho \cdot u(c_{t+1}) \tag{2.42}$$

where $c_t$ is the consumption of the parent, $\rho$ is a constant which measures the relative weight on the child's utility, and $c_{t+1}$ is the children's consumption when they become adults. If the preference (2.42) is the same for all generations and consumption during childhood can be ignored, then the utility of the parent can be written as in (2.43).

$$U_t = \sum_{i=1}^{\infty} \rho^{i} \cdot u(c_{t+i}). \tag{2.43}$$

The utility of the parent depends directly only on the utility of their own children and indirectly on all descendants. The resource constraint for parent and children at period t is given by

$$y_t = c_t + E_t; \quad y_t = h_{1i}; \quad y_{t+1} = f(h_{2i}, E_t); \quad \text{with linearity } y_{t+1} = a(h_{2i}) \cdot E_t \tag{2.44}$$

where $h_{1i}$ is the parent's income, $E_t$ is the parents' human capital investment in their children at t, and $a > 0$ is the marginal return to human capital investment, which varies across students as their ability varies. For simplicity of illustration, we assume a linear production function for children in (2.44). [28] At the optimum, under the linearity assumption on the children's production function, we have the following optimality condition.
$$u'(h_{1i} - E_t) = \rho \cdot E_t\big(a(h_{2i}) \cdot u'(c_{t+1}) \mid I_t\big) \tag{2.45}$$

where $E_t(\cdot \mid I_t)$, written with an argument, is the conditional expectation at period t given all information, $I_t$, available at the beginning of period t (it should be distinguished from the investment level $E_t$ on the LHS). The LHS of (2.45) is the marginal cost of moving to private school and the RHS of (2.45) is the expected marginal benefit of moving to private school. Parents do not have complete information on their children's ability, so they update their assessment of it as they obtain new information. At the beginning of the program, for all participating students, the LHS is greater than the RHS in (2.45). At the end of each grade, parents update their expected marginal benefit of moving to private school as new information on the student's score, class, teacher, and school quality becomes available. Therefore, attrition by school choice occurs when the RHS becomes larger than the LHS in (2.45) after new information arrives at the end of a period.

Students also move and leave the STAR program for other reasons, such as parents' job relocation, moves due to sibling concerns, and others. We assume that attrition for these reasons does not cause differential attrition across class types, since it is hard to think that these reasons to move arise in only one type of class. Therefore, we assume that differential attrition across class types occurs only through the parents' decision to move.

[28] We assume no transfer from parents to children other than human capital investment and bequest transfer. The bequest transfer is determined at the level at which the expected marginal return from the bequest transfer equals the marginal return from human capital investment.

The probability of selection model

Student attrition is determined by the information $E_t(\cdot \mid I_t)$ updated between the end of t − 1 and the beginning of t.
It includes the student's score ($y_{it-1}$) and updated class, teacher, and school characteristics ($z_{1it-1}$), where $z_{1it-1}$ includes the class average SAT score, teacher's experience, teacher's highest degree, the fraction of free lunch students, and so on. In the application, we use $y_{it-1}$ and $z_{1it-1}$ as predictors in the probability of selection model. In this setting, MAR is satisfied as in (2.46). MAR implies that, once we know the parents' information $I_t$, we can determine the parents' school choice and attrition status. We use a logit model in the application.

$$P(s_{it} = 1 \mid W_{it}) = P(s_{it} = 1 \mid z_{it}), \qquad P(s_{it} = 1 \mid W_{it}, s_{it-1} = 1) = P(s_{it} = 1 \mid z_{it}, s_{it-1} = 1) = \pi_{it} \tag{2.46}$$

where $z_{it} = (z_{1it-1}, y_{it-1})$.

$$\pi_{it} = G(z_{it}; \gamma) = \Lambda(z_{it} \cdot \gamma) \tag{2.47}$$

where $w_{it} = (x_{it-1}, y_{it-1})$ and $\Lambda(\cdot) = \frac{\exp(\cdot)}{1 + \exp(\cdot)}$.

2.6.6 IPW estimation

We illustrate IPW estimation with the complete case. MI-IPW estimation is conducted with pseudo-complete case data. [29] We construct $p_{it}$ sequentially using conditional expectations.

$$p_{it} = p(s_{it} = 1 \mid z_{it}, s_{it-1} = 1) \cdot p(s_{it-1} = 1 \mid z_{it-1}, s_{it-2} = 1) \cdots p(s_{i2} = 1 \mid z_{i2}, s_{i1} = 1), \quad t = 1, 2, 3 \tag{2.48}$$

that is, $p_{it} = \pi_{it} \cdot \pi_{it-1} \cdots \pi_{i1}$.

[29] The pseudo-complete case combines the complete case with the non-monotone II observations that become complete cases after imputation.

Using the sequential approach in (2.48), and assuming the sequence of binary response models is correctly specified as in (B.1), we can consistently estimate $\gamma$. Once we obtain $\hat{\gamma}$ from MLE, we use $G(z_{it}, \hat{\gamma})$ to construct $\hat{p}_{it}$ for all i and t with $s_{it} = 1$.

IPW estimates: The return of CSR on SAT score

For cross-section estimation, we estimate $p_{it} = \pi_{it} = p(s_{it} = 1 \mid z_{it}, s_{it-1} = 1)$ using a logit model. $z_{it-1}$ includes class types, student characteristics (gender, race, free lunch), teacher characteristics (race, experience, highest degree), school fixed effects, and the SAT percentile score of period t − 1.

Cross-section estimation of probability weight

Figure 2.3 shows the estimated probability weights for cross-section estimation in grades K-3.
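As a brief aside, the sequential construction in (2.47) and (2.48) can be sketched in a few lines. This is only an illustration under stated assumptions, not the estimation code used in the application: the coefficient vector `gamma` and the predictor rows are made-up values standing in for the logit MLE and the data, and the function names are hypothetical.

```python
import math


def logistic(a):
    """Logistic CDF: Lambda(a) = exp(a) / (1 + exp(a))."""
    return 1.0 / (1.0 + math.exp(-a))


def sequential_weights(z_rows, gamma):
    """Return (pi, p) for one student.

    z_rows: list of predictor vectors z_it, one per period t = 1..T.
    gamma:  logit coefficients (hypothetical values in the example below).
    pi[t] is the conditional retention probability Lambda(z_it * gamma);
    p[t] is the running product pi[0] * ... * pi[t], as in (2.48).
    """
    pi, p, running = [], [], 1.0
    for z in z_rows:
        index = sum(zk * gk for zk, gk in zip(z, gamma))
        prob = logistic(index)
        running *= prob
        pi.append(prob)
        p.append(running)
    return pi, p


# Made-up inputs: a constant and a standardized lagged score, three periods.
gamma = [1.0, 0.8]
z_rows = [[1.0, 0.5], [1.0, 0.2], [1.0, -0.1]]
pi, p = sequential_weights(z_rows, gamma)

# Inverse probability weights applied to observed (s_it = 1) periods.
ipw = [1.0 / pt for pt in p]
```

Because each conditional probability lies strictly between 0 and 1, the cumulative product shrinks with t, so later periods receive larger weights.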
One practical criticism of IPW estimation is that estimates can be very unstable if certain subsets of the population have very low non-attrition probabilities (Little and Rubin [2002]). However, as we see in figure 2.3, all estimated $p_{it}$ are strictly greater than 0.4, so our assumption of $p_{it} > 0$ is well satisfied. The overlap assumption on $p_{it}$ for treatment and control groups is also well satisfied, as we see in figure 2.4.

IPW estimates: cross-section

Table 2.36 reports cross-section IPW estimates of both LS and reduced form for grades 1-3. The estimated cross-section return of small class is about 5 to 7 percentile points: students in small classes obtain test scores about 5 to 7 percentile points higher than students in regular classes in grades K-G3. This is about 1/5 of a standard deviation of the SAT score. The estimated cross-section return of a full-time aide class is not statistically different from 0. The implication of the IPW estimates is no different from that of the unweighted estimates. In all grades, the difference between the unweighted and IPW estimates is within 1 percentile point.

Figure 2.3. Estimated $\pi_{it}$ for t = 1, 2, 3
(Note) For all figures, the x-axis represents $\pi_{it}$ and the y-axis represents the fraction. For the figure in the first row and first column, $\pi_{i1}$ on the x-axis ranges from 0.4 to 0.9 with unit interval 0.1, and the y-axis ranges from 0 to 0.05 with unit interval 0.01. For the figure in the first row and second column, $\pi_{i2}$ on the x-axis ranges from 0.4 to 1 with unit interval 0.2, and the y-axis ranges from 0 to 0.06 with unit interval 0.02. For the figure in the second row and first column, $\pi_{i3}$ on the x-axis ranges from 0.6 to 1 with unit interval 0.1, and the y-axis ranges from 0 to 0.06 with unit interval 0.02.

Figure 2.4. Estimated $\pi_{it}$ conditional on $s_{it} = 0$ or $s_{it} = 1$ for t = 1, 2, 3
(Note) The x-axis represents $\pi_{it}$ and the y-axis represents the fraction. We denote the figure in the ith row and jth column as F(i,j).
F(i,1) shows $\pi_{it}$ conditional on $s_{it} = 0$ and F(i,2) shows $\pi_{it}$ conditional on $s_{it} = 1$.
F(1,1) shows $\pi_{i1}$ conditional on $s_{i1} = 0$; the x-axis ranges from 0.4 to 0.9 and the y-axis ranges from 0 to 0.06 with interval 0.02.
F(2,1) shows $\pi_{i2}$ conditional on $s_{i2} = 0$; the x-axis ranges from 0.4 to 0.9 and the y-axis ranges from 0 to 0.05 with interval 0.01.
F(3,1) shows $\pi_{i3}$ conditional on $s_{i3} = 0$; the x-axis ranges from 0.6 to 1 and the y-axis ranges from 0 to 0.06 with interval 0.02.
F(1,2) shows $\pi_{i1}$ conditional on $s_{i1} = 1$; the x-axis ranges from 0.4 to 0.9 and the y-axis ranges from 0 to 0.06 with interval 0.02.
F(2,2) shows $\pi_{i2}$ conditional on $s_{i2} = 1$; the x-axis ranges from 0.4 to 1 and the y-axis ranges from 0 to 0.08 with interval 0.02.
F(3,2) shows $\pi_{i3}$ conditional on $s_{i3} = 1$; the x-axis ranges from 0.6 to 1 and the y-axis ranges from 0 to 0.08 with interval 0.02.

Table 2.36. Cross-section IPW LS and IPW reduced-form estimation

                           LS                               reduced-form
                K       G1      G2      G3          K       G1      G2      G3
Small         5.205   7.309   6.025   5.491       5.205   5.613   5.536   6.224
             (.749)  (.926)  (.928)  (.953)      (.749)  (.960)  (.953)  (.945)
Aide           .259   1.090   2.375    .289        .259    .013   1.480    .820
             (.698)  (.950)  (.913)  (.904)      (.698)  (.898)  (.848)  (.833)
R-squared      .309    .318    .294    .234        .309    .313    .293    .235
# of sample    5840    4052    4338    4533        5840    4052    4338    4533
Other covariates: school dummy, student and teacher characteristics

                           OLS                              reduced-form
                K       G1      G2      G3          K       G1      G2      G3
Small         5.224   7.282   5.595   5.355       5.224   5.508   5.312   6.072
             (.749)  (.935)  (.940)  (.969)      (.749)  (.973)  (.955)  (.949)
Aide           .434   1.442   2.235    .092        .434    .076   1.544    .728
             (.698)  (.955)  (.921)  (.910)      (.698)  (.897)  (.847)  (.834)
R-squared      .311    .320    .296    .235        .311    .316    .293    .235
# of sample    5840    4052    4338    4533        5840    4052    4338    4533
Other covariates: school dummy, student, teacher and peer (classmates) characteristics

(Note) Student cluster-robust standard errors are reported in parentheses.
Student characteristics include race, gender, and free lunch status, and teacher characteristics include race, years of experience, and a highest degree dummy. Peer characteristics include the fraction of classmates with free lunch, the fraction of minority classmates, and the fraction of female classmates. Reduced-form estimation uses the initial assignments of treatments as independent variables.

Panel probability weights estimation

$\pi_{it}$ is estimated by conditional MLE using equation (2.49).

$$\pi_{it} = \Lambda(z_{it} \cdot \gamma) \quad \text{for } s_{it-1} = 1, \tag{2.49}$$

where t = 1, 2, 3. The probability weight estimate, $p_{it}$, is constructed sequentially as in (2.50).

$$p_{it} = \pi_{it} \cdot \pi_{it-1} \cdots \pi_{i1}. \tag{2.50}$$

Figure 2.5 shows histograms of the estimated probability weights, $\pi_{it}$ and $p_{it}$. $\pi_{it}$ and $p_{it}$ are greater than .15 and .35 for all i and t, respectively.

Figure 2.5. Estimated $\pi_{it}$ and $p_{it}$ for t = 1, 2, 3 with attrition sample
(Note) F(i,j) identifies the location of a figure: i identifies the row and j identifies the column. Figures in the left panel represent $\pi_{it}$; the first, second, and third rows show $\pi_{i1}$, $\pi_{i2}$, and $\pi_{i3}$ respectively. Figures in the right panel represent $p_{it}$; the first, second, and third rows show $p_{i1}$, $p_{i2}$, and $p_{i3}$ respectively.
For $\pi_{i1}$ in F(1,1), the x-axis ranges from 0.4 to 0.9 and the y-axis ranges from 0 to 0.05 with interval 0.01.
For $\pi_{i2}$ in F(2,1), the x-axis ranges from 0.2 to 1 and the y-axis ranges from 0 to 0.06 with interval 0.02.
For $\pi_{i3}$ in F(3,1), the x-axis ranges from 0.5 to 1 and the y-axis ranges from 0 to 0.08 with interval 0.02.
For $p_{i1}$ in F(1,2), the x-axis ranges from 0.4 to 0.9 and the y-axis ranges from 0 to 0.05 with interval 0.01.
For $p_{i2}$ in F(2,2), the x-axis ranges from 0.2 to .8 and the y-axis ranges from 0 to 0.05 with interval 0.01.
For $p_{i3}$ in F(3,2), the x-axis ranges from 0 to .8 and the y-axis ranges from 0 to 0.05 with interval 0.01.

Panel IPW estimation results

Table 2.37 provides IPW pooled estimates.
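Once the weights $p_{it}$ in (2.50) are available, an IPW pooled LS estimator can be computed as weighted least squares with weights $s_{it}/p_{it}$. The following is a minimal sketch under simplifying assumptions: a single regressor plus an intercept, made-up data, and a hypothetical function name. It is not the specification behind the reported tables.

```python
def ipw_least_squares(y, x, s, p):
    """Weighted LS of y on (1, x) with weights s / p.

    y, x: outcomes and a single regressor (flattened over i and t);
          entries of y with s == 0 are placeholders and get zero weight.
    s:    selection indicators (1 = observed, 0 = missing).
    p:    estimated selection probabilities, assumed strictly positive.
    Returns (intercept, slope).
    """
    w = [si / pi for si, pi in zip(s, p)]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Made-up data: y = 2 + 3 * x exactly whenever the observation is selected.
x = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
s = [1, 1, 1, 1, 0, 1]
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.9]
y = [2.0 + 3.0 * xi if si == 1 else 0.0 for xi, si in zip(x, s)]
intercept, slope = ipw_least_squares(y, x, s, p)
```

In this toy example the observed data lie exactly on a line, so the weights do not change the fit. With real data, observations that were unlikely to remain in the sample (small $p_{it}$) count more heavily.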
The estimated effect of CSR on SAT score from IPW LS is about 6.4 % while the estimated effect from unweighted LS is 7.3 % in table 2.32. Moreover, the estimated effect of CSR on SAT score from IPW IV is about 4.5 % while the estimated effect from unweighted IV is 5.5 %. Table 2.38 reports IPW FD estimates. The estimated effect from the IPW FD estimator is 5.5 % while the effect from the unweighted FD estimator is about 7 %. Again, the effect is larger for the unweighted estimator. Thus, for all three estimators, the unweighted estimates of the return of CSR on score are greater than the IPW estimates.

Summary of estimation results: Unweighted and IPW estimates in unbalanced panel data

Unweighted estimates are greater than IPW estimates for the pooled LS, pooled reduced-form (IV), and FD estimators. However, the difference between the estimates is very small for the pooled LS, pooled reduced-form, and FD estimators and is within 2 percent. If the IPW estimates are consistent, the unweighted LS, reduced-form, and FD estimators overestimate the return of CSR by about 1 to 2 percent. In sum, we conclude that the return of CSR on SAT score for grades K-3 is within the range from 4 to 6.5 percent.

Table 2.37. IPW pooled LS, pooled IV

                                      Pooled LS                      Pooled IV
Initial small                 2.152    2.374    2.328        4.500    4.497    4.448
                             (1.163)  (1.163)  (1.163)       (.813)   (.815)   (.815)
Initial aide                  -.601    -.414    -.390        -.221    -.123    -.074
                              (.897)   (.905)   (.903)       (.762)   (.768)   (.768)
Cumulative years in small     1.154    1.033    1.038
                              (.506)   (.508)   (.508)
Cumulative years in aide       .360     .275     .297
                              (.471)   (.476)   (.476)
R-squared                      .228     .229     .231         .228     .229     .230
# of sample                   15395    15241    15241        15395    15241    15241
Student characteristics        Yes      Yes      Yes          Yes      Yes      Yes
Teacher characteristics        No       Yes      Yes          No       Yes      Yes
Peer characteristics           No       No       Yes          No       No       Yes
School, grade, PP-year         Yes      Yes      Yes          Yes      Yes      Yes

Table 2.38.
Weighted FD estimates

                                       IPW FD                     unweighted FD
Cumulative years in small     1.317    1.380    1.361        1.827    1.641    1.613
                              (.415)   (.416)   (.416)       (.416)   (.413)   (.414)
Cumulative years in aide      1.697    1.572    1.572        1.738    1.692    1.689
                              (.433)   (.435)   (.435)       (.426)   (.427)   (.428)
R-squared                      .014     .018     .019         .015     .019     .020
# of sample                    9512     9401     9401         9883     9793     9793
Student characteristics        Yes      Yes      Yes          Yes      Yes      Yes
Teacher characteristics        No       Yes      Yes          No       Yes      Yes
Peer characteristics           No       No       Yes          No       No       Yes
School, grade, PP-year         Yes      Yes      Yes          Yes      Yes      Yes

2.6.7 MI-IPW method

The major disadvantage of the imputation method in our application is the need to model the joint distribution of the missing variables, which include both categorical and continuous variables. For instance, our application to Project STAR data includes score (continuous), free lunch (binary), teacher experience (continuous), teacher race (binary), and many class peer variables (fractional variables taking values between zero and one). However, we mitigate this problem of high-dimensional covariates by restricting imputation to the non-monotone II type of missing data. The MI method is easy to implement since built-in commands for it exist in Stata, SAS and R. [30]

Models of missing variables with multiple imputation

Participating students with missing observations are about 5 percent at each grade, and missing observations mostly occur in the SAT score. Therefore, most of the missing data in Project STAR comes from attrition of students.

Imputation model for missing variables using MICE

In Stata, we impute the missing variables, which include the outcome, treatments and covariates, using the predictors and distributional assumptions in table 2.39.

[30] We implement multiple imputation for the estimation of the return of CSR on score using the "ice" Stata command with the simplest assumption on the distribution of missing variables.
Further details on imputation methods under MAR have been studied extensively in theory and practice (Rubin [1987] and Little and Rubin [2002]).

Table 2.39. Covariates and distributional assumption used for MICE

variables                             distribution  command  prediction equation
SAT score                             Normal        regress  treatments, student, teacher, peer characteristics
small class                           Logistic      logit    SAT score, student, teacher, peer characteristics
aide class                            Logistic      logit    SAT score, student, teacher, peer characteristics
free lunch                            Logistic      logit    SAT score, treatments, student, teacher, peer characteristics
teacher's race                        Logistic      logit    SAT score, treatments, student, teacher, peer characteristics
teacher's highest degree              Logistic      logit    SAT score, treatments, student, teacher, peer characteristics
teacher's years of experience         Logistic      logit    SAT score, treatments, student, teacher, peer characteristics
fraction of students with free lunch  Normal        regress  SAT score, treatments, student, teacher, peer characteristics
fraction of female students           Normal        regress  SAT score, treatments, student, teacher, peer characteristics
fraction of black students            Normal        regress  SAT score, treatments, student, teacher, peer characteristics

(Note) (i) Treatments are class-type indicators. (ii) Student characteristics include race, sex and free lunch status. (iii) Teacher characteristics include race, degree, and experience. (iv) Peer characteristics include the fraction of students with free lunch, the fraction of female classmates and the fraction of black classmates. For panel data MI estimation, grade, program participation grade and school dummies are additionally included in the predictions of all missing variables.

Estimates with MI-IPW

Using the set of imputed variables for STAR participating students as the complete case, we apply IPW estimation. Tables 2.40 and 2.41 report pooled LS, pooled IV and FD estimates of the return of CSR on score using the MI-IPW method.
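The text does not spell out how estimates from the imputed data sets are pooled. As an illustration only, the sketch below assumes the standard combination rules of Rubin [1987]: the point estimate is the average over the M imputations, and the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. The function name and the numbers are hypothetical.

```python
import math


def rubin_combine(estimates, variances):
    """Combine M imputation-specific estimates with Rubin's rules.

    estimates: point estimates, one per imputed data set.
    variances: their estimated sampling variances (squared standard errors).
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, math.sqrt(total)


# Made-up estimates from M = 5 imputed data sets, each with s.e. 0.75.
estimates = [4.4, 4.6, 4.5, 4.7, 4.3]
variances = [0.75 ** 2] * 5
pooled, pooled_se = rubin_combine(estimates, variances)
```

The between-imputation term reflects the extra uncertainty from not observing the imputed values, so the pooled standard error is never smaller than the square root of the average within-imputation variance.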
About 4.5 percent more sample observations are used in the MI-IPW method than in the IPW method. For instance, in the case of the pooled LS estimator, the MI-IPW method uses 17,468 observations while the IPW method uses 15,395 observations in the estimation. The rest of the missing data, which is more than 40 percent of the full sample, is due to attrition. The difference between the IPW and MI-IPW estimates is very small, while standard errors are smaller for the MI-IPW method, so the MI-IPW estimates carry the same implications as the IPW estimates. In sum, the unweighted estimator overestimates the effect of CSR on SAT score by about 1 to 2 percentage points.

Summary of results

The return of CSR on SAT score is about 4 to 6.5 percent for the MI-IPW estimator while it is about 5 to 7.5 percent for the unweighted estimator. The unweighted estimator overestimates the effect of CSR by about 1 to 2 percentage points. The return of aide class on SAT score is not statistically significant for the pooled LS and IV estimates while it is statistically significant for the FD estimator. The statistical significance of the return for both small class and aide class does not change whether we use the unweighted or the IPW (MI-IPW) methods.

Table 2.40. MI-IPW pooled LS, pooled IV

                                      Pooled LS                      Pooled IV
Initial small                 2.584    2.202    2.267        4.788    4.511    4.416
                             (1.084)  (1.087)  (1.080)       (.747)   (.756)   (.760)
Initial aide                   .170    -.127    -.094         .275     .066     .080
                              (.840)   (.839)   (.843)       (.716)   (.716)   (.716)
Cumulative years in small      .997    1.065     .905
                              (.453)   (.455)   (.488)
Cumulative years in aide       .090     .172     .164
                              (.421)   (.423)   (.435)
R-squared                      .201     .200     .206         .201     .200     .206
# of sample                   17468    17468    17468        17468    17468    17468
Student characteristics        Yes      Yes      Yes          Yes      Yes      Yes
Teacher characteristics        No       Yes      Yes          No       Yes      Yes
Peer characteristics           No       No       Yes          No       No       Yes
School, grade, PP-year         Yes      Yes      Yes          Yes      Yes      Yes

Table 2.41.
MI-IPW FD estimates

                                      MI-IPW FD                   unweighted FD
Cumulative years in small      .982     .894     .901        1.827    1.641    1.613
                              (.411)   (.413)   (.413)       (.416)   (.413)   (.414)
Cumulative years in aide      1.241    1.358    1.460        1.738    1.692    1.689
                              (.425)   (.430)   (.430)       (.426)   (.427)   (.428)
R-squared                      .016     .020     .020         .015     .019     .020
# of sample                  11,146   10,761   10,761         9883     9793     9793
Student characteristics        Yes      Yes      Yes          Yes      Yes      Yes
Teacher characteristics        No       Yes      Yes          No       Yes      Yes
Peer characteristics           No       No       Yes          No       No       Yes
School, grade, PP-year         Yes      Yes      Yes          Yes      Yes      Yes

2.7 Direction (sign) of the bias for unweighted estimators

In this section, we try to obtain information about the direction of the bias of unweighted estimators by assuming that $E(v_{it} s_{it} \mid z_{it})$ and $E(v_{it} s_{it+j} \mid z_{it})$ have the same sign. We do so because the source of bias for unweighted estimators is $E(v_{it} s_{it} \mid z_{it})$, which we cannot estimate directly. Although we cannot obtain an estimate of (2.51), we can obtain an estimate of (2.52), and we infer the sign of $E(v_{it} s_{it} \mid z_{it})$ from that of $E(v_{it} s_{it+j} \mid z_{it})$.

$$E(v_{it} s_{it} \mid x_{it}, z_{it}) = E(v_{it} s_{it} \mid z_{it}), \quad \text{where } x_{it} \subseteq z_{it} \text{ and } z_{it} \text{ is the instrument} \tag{2.51}$$

where $v_{it} = c_i + u_{it}$.

$$E(v_{it} s_{it+j} \mid z_{it}), \quad \text{where } |j| \geq 1 \tag{2.52}$$

Under the assumption that $E(v_{it} s_{it} \mid z_{it})$ and $E(v_{it} s_{it+j} \mid z_{it})$ have the same sign, we can determine the direction of the bias of the unweighted IV estimates and thereby obtain a bound for the true return of class size reduction.

2.7.1 Bound for the effects of small class

First, we consider an estimator for $E(v_{it} s_{it+j} \mid z_{it})$.

Cross-section IV regression

$$\hat{E}(v_{it} s_{it+j} \mid z_{it}) = \frac{1}{n} \sum_{i=1}^{n} \hat{v}_{it} s_{it+j} \tag{2.53}$$

where $\hat{v}_{it}$ is the residual from the unweighted IV estimator. [31] We can rewrite the IV estimator as follows.

$$
\begin{aligned}
\hat{\theta}_{IV} &= \Big(\sum_{i=1}^{n} s_{it} z_{it}' z_{it}\Big)^{-1} \Big(\sum_{i=1}^{n} s_{it} z_{it}' y_{it}\Big)
 = \Big(\sum_{i=1}^{n} s_{it} z_{it}' z_{it}\Big)^{-1} \Big(\sum_{i=1}^{n} s_{it} z_{it}' (z_{it}\theta + v_{it})\Big) \\
&= \theta + \Big(\sum_{i=1}^{n} s_{it} z_{it}' z_{it}\Big)^{-1} \Big(\sum_{i=1}^{n} z_{it}' s_{it} v_{it}\Big)
 = \theta + \Big(\frac{1}{n}\sum_{i=1}^{n} s_{it} z_{it}' z_{it}\Big)^{-1} \Big(\frac{1}{n}\sum_{i=1}^{n} z_{it}' s_{it} v_{it}\Big) \\
&\xrightarrow{p} \theta + E(s_{it} z_{it}' z_{it})^{-1} E(z_{it}' s_{it} v_{it})
 = \theta + E(s_{it} z_{it}' z_{it})^{-1} E(z_{it}') E(s_{it} v_{it}),
\end{aligned}
$$

where the last equality holds since $z_{it}$ is exogenous. Thus, finally, the probability limit of the cross-section IV estimator is

$$\theta_{IV} = \theta + E(s_{it} z_{it}' z_{it})^{-1} E(z_{it}') E(s_{it} v_{it}),$$

so that $\theta_{IV} > \theta$ if $E(s_{it} v_{it})$ is positive and $\theta_{IV} < \theta$ if $E(s_{it} v_{it})$ is negative. Therefore, $\theta_{IV}$ is an upper bound for $\theta_{true}$ if the sign of $E(s_{it} v_{it})$ is positive and a lower bound if the sign of $E(s_{it} v_{it})$ is negative.

Pooled IV regression

$$\hat{E}\Big(\sum_{t=K}^{G3} v_{it} s_{it+j} \mid z_{it}\Big) = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=K}^{G3} \hat{v}_{it} s_{it+j} \tag{2.54}$$

$$\hat{v}_{it} = y_{it} - z_{it} \hat{\theta}_{pooledIV}, \quad \text{where } \hat{\theta}_{pooledIV} = \Big(\sum_{i=1}^{n} \sum_{t=K}^{G3} s_{it} z_{it}' z_{it}\Big)^{-1} \Big(\sum_{i=1}^{n} \sum_{t=K}^{G3} s_{it} z_{it}' y_{it}\Big)$$

[31] This is not the conventional IV estimator. We define the IV estimator here to be equivalent to the estimator in Krueger [1999], who called it the reduced-form estimator.

Similarly to the cross-section IV case, for the pooled IV estimator we obtain

$$\theta_{pooledIV} = \theta + E\Big(\sum_{t=K}^{G3} s_{it} z_{it}' z_{it}\Big)^{-1} E\Big(\sum_{t=K}^{G3} z_{it}' s_{it} v_{it}\Big),$$

so that $\theta_{pooledIV} > \theta$ if $E(s_{it} v_{it})$ is positive and $\theta_{pooledIV} < \theta$ if $E(s_{it} v_{it})$ is negative. This is because $E(z_{it}) > 0$ and $z_{it}$ is exogenous in our data.

Estimation results

Pairwise correlation coefficient estimates for (2.53) are reported in table 2.42. The estimate of $E(v_{it} s_{it+j} \mid z_{it})$, $\frac{1}{n} \sum_{i=1}^{n} \hat{v}_{it} s_{it+j}$, is positive and statistically significantly different from zero for all $j \geq 1$ in all grades (i.e. residuals from the cross-section IV estimators and leads of selection are positively correlated in a statistically significant way). However, the pairwise correlation coefficients between residuals and lags of selection are not statistically different from zero. In other words, we cannot reject the null hypothesis of $E(v_{it} s_{it-j} \mid z_{it}) = 0$ for $j \geq 1$.
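The sample moment in (2.53) is simple to compute once residuals are in hand. The sketch below is illustrative only: the residuals and lead selection indicators are made-up numbers, the function name is hypothetical, and units with no residual at period t (not observed at t) are skipped.

```python
def lead_selection_moment(resid, s_lead):
    """Sample analogue of E(v_it * s_i,t+j): mean of v_hat_it * s_i,t+j.

    resid:  residuals v_hat_it from the unweighted estimator; None where the
            unit is not observed at period t.
    s_lead: selection indicators s_i,t+j (1 = still in sample at t + j).
    """
    pairs = [(v, s) for v, s in zip(resid, s_lead) if v is not None]
    return sum(v * s for v, s in pairs) / len(pairs)


# Made-up residuals for six students and their selection status one grade later.
resid = [1.0, -2.0, 0.5, None, 1.5, -1.0]
s_lead = [1, 0, 1, 1, 1, 0]
moment = lead_selection_moment(resid, s_lead)

# Under the same-sign assumption in the text, a positive moment points to
# upward bias of the unweighted estimator.
biased_upward = moment > 0
```

In this toy example students with positive residuals are the ones who stay, which is the pattern the text reports for leads of selection.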
Thus, the sample estimate of $E(v_{it} s_{it+j} \mid z_{it})$, $\frac{1}{n} \sum_{i=1}^{n} \hat{v}_{it} s_{it+j}$, is non-negative for all $j \neq 0$: it is positive for leads of selection and zero for lags of selection. Therefore, if we assume $\mathrm{sign}(E(v_{it} s_{it} \mid z_{it})) = \mathrm{sign}(E(v_{it} s_{it+j} \mid z_{it}))$ for all $j \geq 1$, we can infer that $\hat{\theta}_{IV}$ is biased upward. Thus, under this assumption, $\hat{\theta}_{IV}$ overestimates the true effect of small class. Table 2.43 shows the upper bound cross-section IV estimates.

Table 2.42. corr($v_{it}$, $s_{it+j}$) ∀ j ≥ 0

covariates (1)
            u_iK      u_iG1     u_iG2     u_iG3
s_iK                  .0011    -.0019    -.0020
s_iG1      .1954*               .0115     .0075
s_iG2      .2539*    .2583*               .0074
s_iG3      .2708*    .2810*    .1409*

covariates (2)
            u_iK      u_iG1     u_iG2     u_iG3
s_iK                  .0018     .0003    -.0007
s_iG1      .1733*               .0205     .0097
s_iG2      .2196*    .2261*              -.0007
s_iG3      .2282*    .2561*    .1157*

covariates (3)
            u_iK      u_iG1     u_iG2     u_iG3
s_iK                  .0049     .0014     .0022
s_iG1      .1692*               .0220     .0099
s_iG2      .2156*    .2235*               .0038
s_iG3      .2215*    .2481*    .1073*

covariates (4)
            u_iK      u_iG1     u_iG2     u_iG3
s_iK                  .0048     .0012     .0022
s_iG1      .1696*               .0224     .0099
s_iG2      .2156*    .2241*               .0041
s_iG3      .2217*    .2485*    .1066*

(Note) *: statistically significant at the .05 level. Covariates (1) include treatments and school dummies; covariates (2) include covariates (1) and student characteristics. Covariates (3) include covariates (2) and teacher characteristics and, finally, covariates (4) include covariates (3) and peer characteristics.

Table 2.43. Upper bound estimates for cross-section IV estimator (reduced form)

                    K       G1      G2      G3          K       G1      G2      G3
Initial small     5.205   6.621   5.279   5.247       5.224   6.490   5.064   4.979
                 (.749)  (.758)  (.786)  (.814)      (.749)  (.767)  (.786)  (.817)
Initial aide       .259   1.509   1.175   -.072        .434   1.560   1.156   -.078
                 (.698)  (.679)  (.700)  (.726)      (.698)  (.680)  (.699)  (.728)
R-squared          .309    .295    .276    .220        .311    .296    .279    .221
# of sample        5840    6430    5919    6086        5840    6430    5919    6086
Other covariates: left block, school dummy, student and teacher characteristics; right block, school dummy, student, teacher and peer (classmates) characteristics

(Note) Student cluster-robust standard errors are provided in parentheses.
Student characteristics include (i) race, (ii) gender, and (iii) free lunch status, and teacher characteristics include (i) race, (ii) years of experience, and (iii) highest degree. Peer characteristics include (i) the fraction of classmates with free lunch, (ii) the fraction of minority classmates, and (iii) the fraction of female classmates.

2.8 Conclusion

Two formal tests of the missing completely at random (MCAR) assumption are introduced for unbalanced panel data. The power of the tests for strict exogeneity increases with the cross-section size n and the proportion of missing data. For instance, the power of the Hausman-type test, which is based on the difference between the pooled LS (or RE) and FE estimators, is 1 with a data set of n ≥ 5,000 and T = 4. The power of the variable addition test is also 1 with a data set of n ≥ 5,000 and T = 4. We compare the estimates of the unweighted, IPW and MI-IPW methods in unbalanced panel data using a Monte Carlo simulation. The results show that the IPW method under MAR works well if the probability of selection is correctly specified. We also study the robustness of the IPW estimator to misspecification of the probability of selection model. The simulation shows that the finite sample bias of the IPW estimator is smaller than that of the unweighted estimator unless misspecification is severe. We provide the MI-IPW method under MAR to improve efficiency by combining the MI and IPW methods in an unbalanced panel data model. The estimated effect of class size reduction (CSR) on student academic achievement for students in grades K-3 is about 4 to 6 percent with both the IPW and MI-IPW versions of the LS and FD estimators, while the estimated effect for the unweighted LS and FD estimators with the complete case is about 6 to 7 percent. Estimators ignoring missing data overestimate the effect of CSR on student score by about 1 to 2 percentage points.
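As a small arithmetic check on the Hausman-type statistics reported in Table 2.34, the sketch below assumes that $W_2$ is compared with a chi-square distribution with k = 2 degrees of freedom, as the table's "k = 2" and bracketed p-values suggest. For 2 degrees of freedom the upper-tail probability has the closed form exp(−w/2), so no statistical library is needed. The function name is hypothetical.

```python
import math


def chi2_df2_pvalue(w):
    """Upper-tail probability of a chi-square(2) variable at w: exp(-w / 2)."""
    return math.exp(-w / 2.0)


# Statistics and p-values as reported in Table 2.34.
reported = {4.82: 0.089, 6.70: 0.035, 6.37: 0.041, 6.16: 0.045}
for w, p_reported in reported.items():
    print(w, round(chi2_df2_pvalue(w), 4), p_reported)
```

The computed values agree with the reported p-values up to rounding of the test statistics.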
Chapter 3

Cluster robust inference in unbalanced panel data when T is large

3.1 Introduction

Robust inference in the linear panel data model with a long T has become more relevant recently as more panel data with long time dimensions have become available. An inference method which ignores heterogeneity and dependence can severely distort inference about the parameters of interest. For instance, if errors are correlated across observations in either the time or the cross-section dimension, a covariance matrix based on OLS standard errors is biased. Recently, Wooldridge [2003], Hansen [2007], Bester et al. [2009], Vogelsang [2008] and Ibragimov and Muller [2009] provide robust inference methods with dependent data in a balanced panel data model. [1] However, attrition in panel data and unbalancedness in two-level data occur frequently in large-T data or two-level data. For instance, unbalancedness occurs when each school has a different number of classes, when each region has a different number of states, or when the starting periods of available data differ across cross-section units. Therefore, this paper focuses on robust inference in unbalanced panel data and on the effect of unbalancedness on the validity of the inference methods developed for balanced panel data. In particular, unbalancedness causes heterogeneity even if attrition or unbalancedness occurs completely at random, and heterogeneity can distort inference. For example, the methods in Hansen [2007] and Vogelsang [2008] require homogeneity across cross-section units to use the t-distribution and F-distribution as reference distributions for hypothesis tests. Therefore, unfortunately, robust inference methods for balanced panel data cannot be directly extended to unbalanced panel data.

[1] Arellano [1987] provides asymptotic properties of a covariance matrix in linear panel models with random sampling (over cross-section units), but it is limited to the case where T is fixed and n is large. Moreover, cluster covariance estimators are used with data that has a group structure with independence across groups. In our panel data setting, cross-section units represent groups and the time series represents group members, and we assume independence between groups. See Wooldridge [2003], Liang and Zeger [1986] and Bertrand et al. [2004] for more details.

Among the robust inference methods for balanced panel data, the Ibragimov and Muller [2009] approach provides a t-statistic based inference method. However, this t-statistic based method uses only between variation, so there is room for improving efficiency. Bester et al. [2009] show, using a 2SLS simulation with weak IV, that the t-statistic based method of Ibragimov and Muller [2009] performs poorly if the estimator has substantial finite sample bias. On the other hand, Hansen [2007] provides methods based on constructing t and Wald statistics using a cluster covariance matrix estimator (CCE). Under homogeneity in both covariates and errors, t and Wald statistics based on CCE follow standard t and F distributions with scale factors and provide valid inference. Under homogeneous covariates and heterogeneous variance, the t statistic using CCE with scaling factors converges to a t-distribution, so the t-test provides conservative inference in the sense of Bakirov and Szekely [2006]. When heterogeneity exists in both covariates and errors, the t statistic using CCE converges to a limiting distribution which is neither standard nor pivotal in both Hansen [2007] and Ibragimov and Muller [2009]. Without homogeneity, Hansen [2007]'s method cannot perform valid robust inference using the standard t and F reference distributions, while Ibragimov and Muller [2009]'s approach provides the conservative inference of Bakirov and Szekely [2006] in the presence of heterogeneous variance.
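To make the t-statistic based approach of Ibragimov and Muller [2009] concrete: the model is estimated separately on each of q groups, and the q resulting estimates are treated as observations in an ordinary one-sample t-test with q − 1 degrees of freedom. The sketch below illustrates only this final step. The group estimates are made-up numbers and the function name is hypothetical. In the panel setting of this chapter each group would correspond to one cross-section unit, which is stated here as an assumption of the illustration.

```python
import math


def group_t_statistic(group_estimates, beta_null=0.0):
    """One-sample t-statistic computed from q group-level estimates.

    group_estimates: estimates of the same scalar parameter, one per group.
    beta_null:       hypothesized parameter value under H0.
    Returns (t_statistic, degrees_of_freedom) with q - 1 degrees of freedom.
    """
    q = len(group_estimates)
    mean = sum(group_estimates) / q
    var = sum((b - mean) ** 2 for b in group_estimates) / (q - 1)
    t_stat = math.sqrt(q) * (mean - beta_null) / math.sqrt(var)
    return t_stat, q - 1


# Made-up slope estimates from q = 5 groups, testing H0: beta = 0.
estimates = [1.2, 0.8, 1.0, 1.4, 0.6]
t_stat, df = group_t_statistic(estimates)
```

Only the variation of the estimates across groups enters the statistic, which is the sense in which the method uses between variation only.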
Vogelsang [2008] provides a robust inference method based on constructing t and Wald statistics with HAC covariance matrix estimators, where the HAC smoothing parameter grows proportionally to the sample size (this is called fixed-b inference). Vogelsang [2008] also shows that, under homogeneity, t and Wald statistics with a HAC covariance matrix based on a fixed-b smoothing parameter converge to t and F distributions with scaling factors. In both Hansen [2007] and Vogelsang [2008], we need to calculate the limiting distributions of the t and Wald statistics treating the covariance matrix as a random variable in the limit. This causes heterogeneous variance for data in which the periods of attrition differ across cross-section units.

Our objective in this paper is to propose a method which provides robust inference with heterogeneous variance and autocorrelation in an unbalanced linear panel data model. We compare our method to those in Hansen [2007] and Ibragimov and Muller [2009], which were developed for balanced panel data. Our proposed inference method extends the OLS method in Hansen [2007] to weighted least squares (WLS) in constructing t and Wald statistics using CCE, where attrition periods are used as weights. We show that, using the result of Bakirov and Szekely [2006], the t-test with WLS provides conservative inference. To our knowledge, there has been no study that focuses on the asymptotic properties of a variance matrix in unbalanced panel data in the presence of heterogeneity and autocorrelation.

For simplicity, in the simulation we focus on random attrition for panel data and random unbalancedness for two-level data. In the MC experiment, heterogeneity in variance and covariates is induced through random attrition in unbalanced panel data.
Our proposed inference method based on WLS, which extends Hansen [2007], is compared to the t-statistic based method of Ibragimov and Muller [2009] and to the method of Hansen [2007] with OLS. The WLS method, which uses the attrition period information as weights, shows significantly smaller size distortion than the OLS method in Hansen [2007]. Simulation results also reveal that our method improves power over the t-statistic based method in Ibragimov and Muller [2009]. We also extend our proposed WLS method to the case where heterogeneity from both attrition and heteroskedastic variance is present. We compare the power of our method to that of Ibragimov and Muller [2009] and provide evidence of improved power through a MC simulation.

Unbalancedness caused by attrition introduces heterogeneity, and this forces conservative t-tests for the method in Ibragimov and Muller [2009]. Thus, we examine the tradeoff for the t-tests based on the method in Ibragimov and Muller [2009] when we drop data on purpose to make the data balanced. Dropping available data inevitably causes a loss of power, while the balancedness of the data eliminates heterogeneity and therefore leads to no size distortion in inference. Simulation results show that the cost of losing power dominates the benefit of eliminating size distortion, as the loss of power is substantial while the size distortion is not significant.

The rest of the paper is organized as follows. In section 2, we introduce models with individual fixed effects in unbalanced panel data and propose a robust inference method which extends the method of Hansen [2007] with WLS in unbalanced panel data. Section 3 contains the MC experiment results and, in section 4, we provide a brief summary of results and the conclusion.

3.2 Model

This section presents a method for conducting robust inference with attrition in a linear unbalanced panel data model.
\[ y_{it} = z_{it}\beta_0 + c_i + f_t + u_{it} \tag{3.1} \]

with $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$. Here $y_{it}$ is continuous, $z_{it}$ is a vector of covariates, $c_i$ is an individual fixed effect, $f_t$ is a time fixed effect and $u_{it}$ is a weakly dependent error. We assume random sampling for cross-section units but allow weak dependence in the time dimension.2

2 The balanced panel data model (3.1) with large T has appeared in the empirical finance literature, in equity returns and earnings surprises, and in these models standard errors were often estimated with an adjustment to reflect correlation across observations. The most common standard error adjustments are based on the Fama and MacBeth [1974] procedure or on clustering across time. Recently, the use of panel data and linear models with serially dependent outcomes has become quite common in finance and macroeconomics applications, but Petersen [2009] documents that the methods researchers use to adjust the bias of standard errors vary considerably.

For the sake of simplicity of notation, we rewrite (3.1) as in (3.2).

\[ y_{it} = x_{it}\beta + c_i + u_{it} \tag{3.2} \]

where $x_{it} = (z_{it}, f_t)$ and $\beta = (\beta_0', 1)'$. Now we define the selection indicator $s_{it}$ to express the model for panel data that are unbalanced because of random attrition: $s_{it} = 1$ if both the dependent variable $y_{it}$ and all of the covariates $x_{it}$ are available, and $s_{it} = 0$ otherwise. Consider the following fixed-effects (FE) transformation of the original model (3.1) to (3.3) with balanced panel data.

\[ \tilde{y}_{it} = \tilde{x}_{it}\beta + \tilde{u}_{it} \tag{3.3} \]

where

\[ \tilde{y}_{it} = y_{it} - \frac{1}{T_i}\sum_{t=1}^{T} s_{it}\, y_{it}, \quad T_i = \sum_{t=1}^{T} s_{it}, \quad \tilde{x}_{it} = x_{it} - \frac{1}{T_i}\sum_{t=1}^{T} s_{it}\, x_{it}, \quad \tilde{u}_{it} = u_{it} - \frac{1}{T_i}\sum_{t=1}^{T} s_{it}\, u_{it} \tag{3.4} \]

Correspondingly, the FE-transformed model for unbalanced panel data can be expressed with the selection indicator as follows.

\[ s_{it}\tilde{y}_{it} = s_{it}\tilde{x}_{it}\beta + s_{it}\tilde{u}_{it} \tag{3.5} \]

The focus is on inference about $\beta$. The OLS estimator for $\beta$ in (3.5) is given by (3.6). (We denote this FE estimator as OLS hereafter.)
\[ \hat{\beta}_{ols} = \Big( \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\tilde{x}_{it}'\tilde{x}_{it} \Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\tilde{x}_{it}'\tilde{y}_{it} \tag{3.6} \]

where we use $s_{it}^2 = s_{it}$.

\[ \hat{\beta}_{ols} - \beta = \Big( \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\tilde{x}_{it}'\tilde{x}_{it} \Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\tilde{x}_{it}'\tilde{u}_{it} = \Big( \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\tilde{x}_{it}'\tilde{x}_{it} \Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\tilde{x}_{it}' u_{it} \tag{3.7} \]

3.2.1 Robust inference with t-test in unbalanced panel data

In this section, we present fixed-n, large-T asymptotic properties of the CCE for unbalanced panel data. Under fixed n and large T, the CCE is not consistent but converges in distribution to a limiting random variable. In the CCE with attrition, $T_i$ plays the role of the HAC smoothing parameter, as $T_i$ determines the bandwidth for each cross-section unit i. In particular, the smoothing parameter $\lambda_i = T_i / T$, which changes for each cross-section unit, is equivalent to b in the HAR covariance estimator in fixed-b asymptotic theory (Kiefer and Vogelsang [2002] and Kiefer and Vogelsang [2005]). For balanced panel data, Hansen [2007] shows that we can perform t and F tests with scaled t and Wald statistics under the assumption that the distributions of covariates and error terms do not depend on the cross-section unit (hereafter we call this assumption cross-section homogeneity).

Theorem 4 (Hansen 2007) Under cross-section homogeneity and a balanced panel, the standard t-statistic for $\hat{\beta}_{ols}$ with cluster robust standard error converges, under the null hypothesis, to a t-distribution with $n - 1$ degrees of freedom scaled by $\sqrt{n/(n-1)}$.

However, methods which ignore attrition cannot be directly applied to unbalanced panel data, since attrition introduces heterogeneity in both covariates and variance, and this distorts inference. We extend the result of theorem 4 to unbalanced panel data using weights and the result of Bakirov and Szekely [2006]. We use a weight for each cross-section unit to recover the original homogeneity in covariates of balanced panel data.
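The idea of recovering homogeneity is that cross-section units with fewer time-series observations get a higher weight on each observation, so that units contribute evenly across the cross-section. With the following assumptions, under cross-section homogeneity of the original balanced panel data, the t-test with weighted least squares (WLS) in (3.8) and the CCE $\hat{V}_{w\text{-}clus}$ in (3.9) provides the conservative inference result of Bakirov and Szekely [2006] for unbalanced panel data.

\[ \hat{\beta}_{wls} = \Big( \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\frac{x_{it}'}{\sqrt{\lambda_i}}\frac{x_{it}}{\sqrt{\lambda_i}} \Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\frac{x_{it}'}{\sqrt{\lambda_i}}\frac{y_{it}}{\sqrt{\lambda_i}} \tag{3.8} \]

\[ \hat{V}_{w\text{-}clus} = \Big( \sum_{i=1}^{n}\sum_{t=1}^{T} \frac{1}{\lambda_i}\, s_{it}\, x_{it}' x_{it} \Big)^{-1} \hat{\Omega}_{wls} \Big( \sum_{i=1}^{n}\sum_{t=1}^{T} \frac{1}{\lambda_i}\, s_{it}\, x_{it}' x_{it} \Big)^{-1} \tag{3.9} \]

where

\[ \hat{\Omega}_{wls} = \sum_{i=1}^{n} \Big[ \Big( \sum_{t=1}^{T} \frac{1}{\lambda_i}\, s_{it}\, x_{it}' \hat{u}_{it} \Big) \Big( \sum_{t=1}^{T} \frac{1}{\lambda_i}\, s_{it}\, x_{it}' \hat{u}_{it} \Big)' \Big] \]

and $\hat{u}_{it} = y_{it} - x_{it}\hat{\beta}_{WLS}$.

Assumption 5
1. Independent sampling across the cross-section; n is fixed and $T_i$ goes to infinity for each i.
2. Weak dependence for $x_{it}$, $u_{it}$.
3. (WLLN) $\frac{1}{T}\sum_{t=1}^{T} s_{it}\, x_{it}' x_{it} \to_p \lambda_i Q_i$ and $Q_i^{-1}$ exists, where $\lambda_i = T_i / T$, $\lambda_i \in (0, 1]$.
4. (FCLT) $\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it}\, x_{it}' u_{it} \Rightarrow \Lambda_i W(\lambda_i)$, where $W(\lambda_i)$ is Brownian motion with mean zero and variance $\lambda_i$.
5. (Homogeneity across cross-section) $Q_i = Q$ and $\Lambda_i = \Lambda$.
6. (Missing Completely At Random) $E(u_{it} \mid x_{it}, s_i) = 0$, where $s_i = (s_{i1}, s_{i2}, \ldots, s_{iT})$.

Consider testing a linear hypothesis about $\beta$ of the form

\[ H_0: R\beta = r, \qquad H_1: R\beta \neq r \tag{3.10} \]

where R is a $q \times k$ matrix of known constants with full rank and r is a $q \times 1$ vector of known constants. In the case $q = 1$ we can define the t-statistics

\[ t = \frac{\sqrt{T}\,(R\hat{\beta}_{ols} - r)}{\sqrt{R \cdot T \cdot \hat{V}_{clus}\, R'}}, \qquad t^{*}_{wls} = \frac{\sqrt{T}\,(R\hat{\beta}_{wls} - r)}{\sqrt{R \cdot T \cdot \hat{V}_{w\text{-}clus}\, R'}} \tag{3.11} \]

Using the following two theorems 6 and 7, we obtain corollary 8, which allows conservative inference for the t-test with weights.

Theorem 6 (Bakirov and Szekely [2006]) Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample from a normal scale mixture, and let $y_1, y_2, \ldots, y_n$ be independent normal $(0, \sigma_i)$ random variables, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, and $S_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$. Then

\[ P\Big\{ \sqrt{n}\,\frac{\bar{X} - \mu}{S_X} > x \Big\} = \sup_{\sigma_1, \sigma_2, \ldots, \sigma_n} \int_{R^n} P\Big\{ \sqrt{n}\,\frac{\bar{y}}{S_y} > x \Big\} \prod_{i=1}^{n} dF(\sigma_i) \]

Proof. See Bakirov and Szekely [2006].

Theorem 7 (Bakirov and Szekely [2006]) Let $C_{n-1}(\alpha)$ be the critical value of the usual two-sided t-test of level $\alpha$ based on (3.12) (i.e. $P(|T_{n-1}| > C_{n-1}(\alpha)) = \alpha$),

\[ T_{n-1} = \frac{\sqrt{n}\,\bar{X}}{S_X} \tag{3.12} \]

where $X_1, X_2, \ldots, X_n$ is an i.i.d. sample from a normal distribution. Then, using theorem 6,

(i) If $\alpha \le 2\,(1 - \Phi(\sqrt{3})) = 0.0832\ldots$, then for all $n \ge 2$,

\[ \sup_{\sigma_1, \sigma_2, \ldots, \sigma_n} P\Big( \Big| \sqrt{n}\,\frac{\bar{y}}{S_y} \Big| > C_{n-1}(\alpha) \Big) \le P(|T_{n-1}| > C_{n-1}(\alpha)) = \alpha \tag{3.13} \]

(i.e. for $\alpha \le 0.0832$ and all $n \ge 2$, the critical values of $T_{n-1}$, the t-distribution with $n - 1$ degrees of freedom, remain conservative for $\sqrt{n}\,\bar{y}/S_y$).

(ii) The usual one-sided t-test of nominal level 0.05 or lower remains valid as long as $n \le 14$. For $n > 15$, usual two-sided tests of nominal level 0.1 (or one-sided tests of nominal level 0.05) are not automatically conservative for large n.

(iii) For $\alpha \le 0.0832$ and all $n \ge 2$, as $n \to \infty$,

\[ \sup_{\sigma_1, \sigma_2, \ldots, \sigma_n} P\Big( \Big| \sqrt{n}\,\frac{\bar{y}}{S_y} \Big| > C_{n-1}(\alpha) \Big) \le P(Z > C_{n-1}(\alpha)) = \alpha \]

where $Z \sim N(0, I)$.

Proof. See Bakirov and Szekely [2006].

Corollary 8 Under assumptions 5 and attrition, the $t^{*}_{wls}$ statistic in (3.11) converges, under the null hypothesis, to the $T_{n-1}$ distribution of theorem 7 scaled by $\sqrt{n/(n-1)}$.

Proof. Here we provide a sketch of the proof. See the appendix for details. First, consider the numerator of $t^{*}_{wls}$, with

\[ \hat{\beta}_{wls} = \Big( \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\frac{x_{it}'}{\sqrt{\lambda_i}}\frac{x_{it}}{\sqrt{\lambda_i}} \Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\frac{x_{it}'}{\sqrt{\lambda_i}}\frac{y_{it}}{\sqrt{\lambda_i}}. \]

Under assumptions 5 and the null hypothesis, we can rewrite the numerator as

\[ \sqrt{T}\,(R\hat{\beta}_{wls} - R\beta) = R\sqrt{T}\,(\hat{\beta}_{wls} - \beta) \]

\[ R\sqrt{T}\,(\hat{\beta}_{wls} - \beta) = R\sqrt{T}\,\Big( \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\frac{x_{it}'}{\sqrt{\lambda_i}}\frac{x_{it}}{\sqrt{\lambda_i}} \Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\frac{x_{it}'}{\sqrt{\lambda_i}}\frac{u_{it}}{\sqrt{\lambda_i}} \]

\[ R\sqrt{T}\,(\hat{\beta}_{wls} - \beta) = R\sqrt{T}\,\Big( \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\frac{x_{it}'}{\sqrt{T_i}}\frac{x_{it}}{\sqrt{T_i}} \Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it}\,\frac{x_{it}'}{\sqrt{T_i}}\frac{u_{it}}{\sqrt{T_i}} \]

\[ \Rightarrow R\,\Big( \sum_{i=1}^{n} \frac{1}{\lambda_i}\,\lambda_i Q_i \Big)^{-1} \sum_{i=1}^{n} \frac{1}{\lambda_i}\,\Lambda_i W(\lambda_i) = R\,\Big( \sum_{i=1}^{n} Q_i \Big)^{-1} \sum_{i=1}^{n} \frac{1}{\lambda_i}\,\Lambda_i W(\lambda_i) \]

where $W(\lambda_i)$ is a Wiener process with mean zero and variance $\lambda_i$.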
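Now consider the denominator, $T \cdot \hat{V}_{w\text{-}clus}$:

\[ T \cdot \hat{V}_{w\text{-}clus} = \Big( \sum_{i=1}^{n} \frac{1}{T_i}\sum_{t=1}^{T} s_{it}\, x_{it}' x_{it} \Big)^{-1} \hat{\Omega}_{wls} \Big( \sum_{i=1}^{n} \frac{1}{T_i}\sum_{t=1}^{T} s_{it}\, x_{it}' x_{it} \Big)^{-1} \]

\[ \hat{\Omega}_{wls} = \sum_{i=1}^{n} \Big[ \Big( \frac{1}{\lambda_i}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it}\, x_{it}' \hat{u}_{it} \Big) \Big( \frac{1}{\lambda_i}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it}\, x_{it}' \hat{u}_{it} \Big)' \Big] \]

\[ \frac{1}{\lambda_i}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it}\, x_{it}' \hat{u}_{it} = \frac{1}{\lambda_i}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it}\, x_{it}' u_{it} - \frac{1}{\lambda_i}\frac{1}{T}\sum_{t=1}^{T} s_{it}\, x_{it}' x_{it}\,\sqrt{T}\,(\hat{\beta}_{WLS} - \beta) + o_p(1) \]

\[ \Rightarrow \frac{1}{\lambda_i}\,\Lambda_i W(\lambda_i) - Q_i \Big( \sum_{j=1}^{n} Q_j \Big)^{-1} \Big( \sum_{j=1}^{n} \frac{1}{\lambda_j}\,\Lambda_j W(\lambda_j) \Big) \]

Let us denote $W(\lambda_j)$ by $W_j$ for simplicity of notation. Under cross-section homogeneity,

\[ \hat{\Omega}_{wls} \Rightarrow \Lambda \Big[ \sum_{i=1}^{n} \frac{W_i}{\lambda_i}\frac{W_i'}{\lambda_i} - \frac{1}{n}\Big( \sum_{j=1}^{n} \frac{W_j}{\lambda_j} \Big)\Big( \sum_{j=1}^{n} \frac{W_j}{\lambda_j} \Big)' \Big] \Lambda' \]

and

\[ T \cdot \hat{V}_{w\text{-}clus} \Rightarrow \Big( \sum_{i=1}^{n} Q_i \Big)^{-1} \hat{\Omega}_{wls} \Big( \sum_{i=1}^{n} Q_i \Big)^{-1} = (nQ)^{-1}\,\hat{\Omega}_{wls}\,(nQ)^{-1}. \]

Let us define $y_i \equiv RQ^{-1}\Lambda\, W(\lambda_i)/\lambda_i$. Since $W_i \equiv W(\lambda_i)$ is Brownian motion with mean 0 and variance $\lambda_i$, $y_i$ has mean zero and variance $RQ^{-1}\Lambda\,\mathrm{var}(W(\lambda_i))\,\Lambda' Q^{-1} R' / \lambda_i^2$. As the variance of $W(\lambda_i)$ is $\lambda_i \cdot I_q$, for the case of $q = 1$, $y_1, y_2, \ldots, y_n$ are independent and each $y_i$ is normal $(0, RQ^{-1}\Lambda\Lambda' Q^{-1} R' / \lambda_i)$. Then, in the case of $q = 1$, we have

\[ t^{*}_{wls} \Rightarrow \frac{\sum_{i=1}^{n} y_i}{\sqrt{\sum_{i=1}^{n} \big( y_i - \frac{1}{n}\sum_{j=1}^{n} y_j \big)^2}} = \frac{n\bar{y}}{\sqrt{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}} = \frac{n\bar{y}}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \sqrt{\frac{n}{n-1}} \cdot \frac{\sqrt{n}\,\bar{y}}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}} \equiv \sqrt{\frac{n}{n-1}}\; T_{n-1} \]

where $T_{n-1} \equiv \sqrt{n}\,\bar{y}/S_y$. Thus, the proper weight leaves $\lambda_i$ as the only source of heterogeneity, so that we can apply the Bakirov and Szekely [2006] result and obtain conservative inference.

Remark 9 Heterogeneity is induced by attrition. Attrition leads to heterogeneity in $s_{it}\, x_{it}' x_{it}$ and $s_{it}\, x_{it}' u_{it}$ for each i. However, using the ratio of the length of the observed time series to the full length T of the time series, $\lambda_i = T_i / T$, as the weight, we apply WLS and the weight eliminates heterogeneity in covariates. Under the cross-section homogeneity assumption, as attrition is the unique source of heterogeneity, the use of weights recovers the homogeneity in covariates. Moreover, since the empirically most relevant case is two-sided tests with $\alpha \le 0.05$, we can conduct inference with the $t^{*}_{wls}$ statistic multiplied by $\sqrt{(n-1)/n}$, which is equivalent to replacing $\hat{V}_{w\text{-}clus}$ with $\frac{n}{n-1} \cdot \hat{V}_{w\text{-}clus}$.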
For a significance level of .05 or lower, the t-test with the $t^{*}_{wls}$ statistic scaled by $\sqrt{(n-1)/n}$ remains conservative.

3.2.2 WLS with heterogeneous covariance matrix in unbalanced panel data

In addition to heterogeneity from attrition, we introduce heteroskedastic variance for each cross-section unit in unbalanced panel data. For balanced panel data, the t-statistic based tests in Bester et al. [2009] and Ibragimov and Muller [2009] provide the conservative inference results of Bakirov and Szekely [2006] in the presence of heteroskedastic variance. For unbalanced panel data, if homogeneity holds only for covariates but not for variance, so that $Q_i = Q$ holds but $\Lambda_i \neq \Lambda$ because of heteroskedastic variance, we can still obtain the conservative inference result in Bakirov and Szekely [2006] with the t-test when we use the $t^{*}_{wls}$ statistic with the scale factor. Under heteroskedastic variance and homogeneous covariates for unbalanced panel data, we have two sources of heterogeneity, but we can still apply corollary 8 with $\hat{\beta}_{WLS}$ and $\hat{V}_{w\text{-}clus}$ and perform the t-test. The only change from the homoskedastic case is that we define $y_i$ in the proof of corollary 8 as $RQ^{-1}\Lambda_i\, W(\lambda_i)/\lambda_i$, so that its variance is $RQ^{-1}\Lambda_i\Lambda_i' Q^{-1} R' / \lambda_i$.

The key to the determination of weights is to make each cross-section unit contribute evenly, as in the case of no heteroskedastic variance. Units with fewer time-series observations and with smaller variance receive more weight than units with more time-series observations or larger variance. This weighting eliminates heterogeneity in covariates, but the two sources of heteroskedasticity in the errors still remain after weighting. In sum, we show that the robust inference method for balanced panel data can be extended to unbalanced panel data with a proper choice of weights for cross-section units. Corollary 8 shows that the t-test with WLS in unbalanced panel data provides the bound result of conservative inference under heterogeneous variance and attrition.
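The WLS estimator in (3.8), the weighted cluster covariance in (3.9) and the scaled t-statistic can be sketched in a few lines for the scalar-regressor case. The following is only an illustration under stated assumptions, not the implementation used in this paper: the function name `wls_cluster_t`, the list-of-lists data layout and the small simulated example are hypothetical, and the within transformation of (3.4) is applied before weighting, which is an assumption about how (3.8) is meant to be computed.

```python
import math
import random


def wls_cluster_t(y, x, s, beta_null=0.0):
    """Weighted FE estimate, cluster variance and scaled t-statistic.

    y, x, s are lists of n lists of length T (scalar regressor);
    s[i][t] is 1 when unit i is observed in period t and 0 otherwise.
    """
    n, T = len(y), len(y[0])
    units = []
    a = 0.0  # sum_i (1/lambda_i) sum_t s_it x_it^2
    b = 0.0  # sum_i (1/lambda_i) sum_t s_it x_it y_it
    for i in range(n):
        obs = [t for t in range(T) if s[i][t] == 1]
        lam = len(obs) / T  # lambda_i = T_i / T
        # Assumption: demean over observed periods only, as in (3.4).
        xbar = sum(x[i][t] for t in obs) / len(obs)
        ybar = sum(y[i][t] for t in obs) / len(obs)
        xt = [x[i][t] - xbar for t in obs]
        yt = [y[i][t] - ybar for t in obs]
        a += sum(v * v for v in xt) / lam
        b += sum(v * w for v, w in zip(xt, yt)) / lam
        units.append((lam, xt, yt))
    beta = b / a
    # Omega: sum over units of the squared weighted score, as in (3.9).
    omega = 0.0
    for lam, xt, yt in units:
        score = sum(v * (w - v * beta) for v, w in zip(xt, yt)) / lam
        omega += score ** 2
    var = omega / a ** 2
    # Scale by sqrt((n-1)/n), i.e. replace V by n/(n-1) * V.
    t_scaled = math.sqrt((n - 1) / n) * (beta - beta_null) / math.sqrt(var)
    return beta, var, t_scaled


# Hypothetical example: n = 4 units, T = 50, observed lengths 50, 50, 10, 10.
rng = random.Random(0)
T, lengths = 50, [50, 50, 10, 10]
s = [[1 if t < Ti else 0 for t in range(T)] for Ti in lengths]
x = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in lengths]
y = [[x[i][t] + i + rng.gauss(0.0, 1.0) for t in range(T)]
     for i in range(len(lengths))]
beta_hat, var_hat, t_stat = wls_cluster_t(y, x, s, beta_null=1.0)
```

The resulting `t_stat` would then be compared with a critical value of the t-distribution with n − 1 degrees of freedom. Because each unit's sums are divided by $\lambda_i$, a unit observed for 10 periods and a unit observed for 50 periods enter the estimator with roughly equal total weight.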
3.3 Monte Carlo experiment

In this section, we examine finite sample properties of the proposed WLS method in an unbalanced linear panel data model with dependent data. We provide two sets of simulations. In the first simulation the only source of heterogeneity is attrition, while in the second simulation there are two sources of heterogeneity, attrition and heteroskedastic variance. Simulation results for the size distortion and the power of the test are provided. In each set of simulations, we compare the size distortion and the power of our method to those of the Hansen [2007] and Ibragimov and Muller [2009] methods, which were developed for balanced panel data. Attrition time periods are generated in the simulation so as not to be correlated with either the outcome or the covariates. Thus, each cross-section unit has a different attrition time period, and this leads to different time-series lengths, which introduce heterogeneous variance for each cross-section unit.

3.3.1 The t-test using the $t^{*}_{wls}$ statistic with scale factor under homogeneity and attrition

DGP I: Attrition is the only source of heterogeneity

We consider the following data generating process for balanced panel data.

\[ Y_{it} = x_{it} \cdot \beta + c_i + f_t + u_{it}, \quad \text{for } i = 1, 2, \ldots, n \text{ and } t = 1, 2, \ldots, T \]

where $x_{it}$ is a continuous variable.

Variables generation
1. $x_{it} \sim \text{iid } N(0, 1)$, $\beta = 1$
2. Homogeneous error $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$ with

\[ \begin{pmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \end{pmatrix} \sim MVN\left( \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{pmatrix} \right) \]

where MVN is multivariate normal and $\rho = 0.75$ in the simulation.
3. $c_i = a_i + \bar{x}_i$, $a_i \sim \text{iid } N(0, 1)$ and $\bar{x}_i = \sum_{t=1}^{T} s_{it} \cdot x_{it}$
4. $f_t = 0$

The determination of attrition for unbalanced panel data is presented in table 3.1.
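One draw from a process of this kind could be simulated along the following lines. This is a sketch under stated assumptions rather than the simulation code behind the reported results: the function name `simulate_dgp1` is hypothetical, each unit is assumed to be observed for its first $T_i$ periods and missing afterwards, and $\bar{x}_i$ is computed as the average of $x_{it}$ over the observed periods, which is an assumption about the normalization.

```python
import math
import random


def simulate_dgp1(lengths, T, beta=1.0, rho=0.75, seed=0):
    """Simulate one unbalanced panel in the spirit of DGP I.

    lengths[i] is the number of observed periods T_i of unit i.
    Returns y, x, u, s as lists of n lists of length T.
    """
    rng = random.Random(seed)
    y, x, u, s = [], [], [], []
    for Ti in lengths:
        # Assumption: unit observed in periods 0..Ti-1, missing afterwards.
        s_i = [1 if t < Ti else 0 for t in range(T)]
        x_i = [rng.gauss(0.0, 1.0) for _ in range(T)]
        # Stationary AR(1) with unit variance: corr(u_t, u_{t+k}) = rho**k.
        u_i = [rng.gauss(0.0, 1.0)]
        for _ in range(1, T):
            shock = math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            u_i.append(rho * u_i[-1] + shock)
        # Assumption: x-bar_i is the average of x over observed periods.
        xbar_i = sum(si * xi for si, xi in zip(s_i, x_i)) / Ti
        c_i = rng.gauss(0.0, 1.0) + xbar_i
        y_i = [beta * xi + c_i + ui for xi, ui in zip(x_i, u_i)]  # f_t = 0
        y.append(y_i)
        x.append(x_i)
        u.append(u_i)
        s.append(s_i)
    return y, x, u, s


# Hypothetical configuration: T = 50, n = 4, lengths 50(2) and 10(2).
lengths = [50, 50, 10, 10]
y, x, u, s = simulate_dgp1(lengths, T=50)
missing_fraction = 1.0 - sum(map(sum, s)) / (len(lengths) * 50)
```

With two units observed for 50 periods and two for 10 periods, 120 of the 200 cells are observed, so `missing_fraction` is 0.4.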
Two key features in the determination of attrition are: (i) the proportion of available data is about 60%, and (ii) the length of the available time series varies across cross-section units, so the variation in time length induces heterogeneity. We consider various lengths of time series, T = 50, 100, 300, and 1,000, for balanced panel data. T = 50 has relevance for yearly data of 40 years, such as data from 1970 to 2009, while T = 100 has relevance for quarterly data of 25 years, such as from 1985 to 2009.

Table 3.1. Selection process

T=50 | length of time (number of cross-section units) | missing fraction
n=4 | 50(2), 10(2) | 0.4
n=5 | 50(2), 30(1), 10(2) | 0.4
n=6 | 50(2), 40(1), 20(1), 10(2) | 0.4
n=7 | 50(2), 40(1), 30(1), 20(1), 10(2) | 0.4
n=8 | 50(2), 40(2), 20(2), 10(2) | 0.4
n=9 | 50(2), 40(1), 35(1), 30(1), 25(1), 20(1), 10(2) | 0.4
n=10 | 50(2), 40(1), 35(1), 30(2), 25(1), 20(1), 10(2) | 0.4
n=11 | 50(2), 40(1), 35(1), 30(3), 25(1), 20(1), 10(2) | 0.4
n=16 | 50(2), 40(2), 35(2), 30(4), 25(2), 20(2), 10(2) | 0.4

T=100 | length of time (number of cross-section units) | missing fraction
n=4 | 100(2), 20(2) | 0.4
n=5 | 100(2), 60(1), 20(2) | 0.4
n=6 | 100(2), 70(1), 50(1), 20(2) | 0.4
n=7 | 100(2), 70(1), 60(1), 50(1), 20(2) | 0.4
n=8 | 100(2), 70(1), 60(2), 50(1), 20(2) | 0.4
n=16 | 100(2), 80(2), 70(2), 60(4), 50(2), 40(2), 20(2) | 0.4

T=300 | length of time (number of cross-section units) | missing fraction
n=4 | 300(2), 100(2) | 0.33
n=5 | 300(2), 100(3) | 0.4
n=6 | 300(2), 140(2), 100(2) | 0.4
n=7 | 300(2), 180(1), 140(2), 100(2) | 0.4
n=8 | 300(2), 200(2), 160(1), 140(1), 100(2) | 0.4
n=16 | 300(4), 240(1), 210(2), 150(2), 125(2), 90(4) | 0.4

(Note) The length of time is reported and the number of cross-section units is reported in parentheses.

Simulation result: Homogeneity and attrition

In all simulations, we consider inference about the parameter $\beta$ in the panel data model (3.3) using OLS in (3.6) and WLS in (3.8). Table 3.2 shows the size of the t-test with nominal level 0.05. The first column shows the dimension of the panel data. The second column reports the rejection probability of the t-test for balanced panel data using the t-statistic in Hansen [2007].
As the homogeneity assumption is satisfied, the t-test for balanced panel data shows no size distortion. The 95% intervals for the rejection probability contain the nominal level 0.05 for all n. The third column reports the rejection probability for the t-test of the complete case from unbalanced panel data. For all simulated cross-section dimensions n ≤ 8, the rejection probability shows substantial size distortion from over-rejection, and the size distortion is greater for unbalanced panel data with smaller n. Therefore, simulation results verify that the t-test using the t-statistic in Hansen [2007] shows size distortion with unbalanced panel data. The fourth column shows the rejection probability from the t-test of the Ibragimov and Muller [2009] method. The t-statistic in Ibragimov and Muller [2009] is obtained from (3.14).

\[ t_{\beta} = \sqrt{n}\,\frac{\bar{\hat{\beta}}}{s_{\hat{\beta}}} \tag{3.14} \]

where $\bar{\hat{\beta}} = \frac{1}{n}\sum_{j=1}^{n} \hat{\beta}_j$ and $s_{\hat{\beta}}^2 = \frac{1}{n-1}\sum_{j=1}^{n} (\hat{\beta}_j - \bar{\hat{\beta}})^2$.

The rejection probability from the t-test of the Ibragimov and Muller [2009] method provides evidence of conservative inference when there is heterogeneity. It also shows that the size distortion tapers off as n increases. Eventually, the Ibragimov and Muller [2009] method with heterogeneity and n = 16 shows no size distortion at all.

The fifth column in table 3.2 provides the rejection probability of the t-test with the $t^{*}_{wls}$ statistic, where the weight for each cross-section unit is calculated as $T / T_i$. WLS puts more weight on cross-section units with shorter time series and thereby eliminates heterogeneity across the cross-section. Simulation results for the t-test with the $t^{*}_{wls}$ statistic in table 3.2 verify the conservative inference of corollary 8, and the size distortion with WLS is substantially smaller than that of OLS, especially for small n ≤ 8. In sum, the WLS method reduces type I error. The bound from WLS is very close to .05 and much closer to .05 than the bound from the Ibragimov and Muller [2009] method. Table 3.3 reports the size-adjusted power of the t-test for T = 50.
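The t-statistic in (3.14) is built from one slope estimate per cross-section unit. The sketch below illustrates this construction for a scalar regressor. The helper names `unit_slope` and `im_t_stat` are hypothetical, and the `beta_null` argument, which centers the statistic at a hypothesized value, is an addition for illustration that reduces to (3.14) when it is zero.

```python
import math


def unit_slope(y_i, x_i, s_i):
    """OLS slope (with intercept) for one unit, using observed periods only."""
    obs = [t for t, flag in enumerate(s_i) if flag == 1]
    xbar = sum(x_i[t] for t in obs) / len(obs)
    ybar = sum(y_i[t] for t in obs) / len(obs)
    sxy = sum((x_i[t] - xbar) * (y_i[t] - ybar) for t in obs)
    sxx = sum((x_i[t] - xbar) ** 2 for t in obs)
    return sxy / sxx


def im_t_stat(betas, beta_null=0.0):
    """t-statistic from n unit-level estimates, as in (3.14)."""
    n = len(betas)
    mean = sum(betas) / n
    var = sum((b - mean) ** 2 for b in betas) / (n - 1)
    return math.sqrt(n) * (mean - beta_null) / math.sqrt(var)


# Hypothetical unit-level estimates for n = 3 units.
t_value = im_t_stat([1.0, 2.0, 3.0])
```

For the three illustrative estimates the mean is 2 and the sample variance is 1, so `t_value` equals 2√3. Only the variation between the unit-level estimates enters the statistic.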
Power is greater for the t-test of Hansen [2007] with WLS than for that of Ibragimov and Muller [2009]. Inference with Hansen [2007] and WLS dominates that of Ibragimov and Muller [2009] for all n, and the power advantage of Hansen [2007] with WLS comes from using both within and between variation, while Ibragimov and Muller [2009] use only between variation. All these results imply that robust inference of Hansen [2007] with WLS is the preferred method, especially with small n ≤ 8, when attrition is the sole source of heterogeneity. We obtain the same qualitative implication in table 3.4 with T = 100. As n and T increase, the power of the tests also increases for all four methods. The ordering from higher to lower power is balanced data, complete case with OLS, complete case with WLS and complete case with the Ibragimov and Muller [2009] method.

Table 3.2. The size of test, 1(p < 0.05)

T=50 | balanced OLS | complete-case OLS | complete-case t-stat inference of IM | complete-case WLS
n=4 | .051(.047,.055) | .090(.084,.096) | .032(.028,.035) | .051(.046,.055)
n=5 | .053(.048,.057) | .074(.069,.079) | .034(.030,.037) | .051(.046,.055)
n=6 | .053(.048,.057) | .068(.063,.073) | .039(.035,.043) | .050(.044,.054)
n=7 | .053(.048,.057) | .065(.060,.070) | .039(.035,.043) | .049(.045,.054)
n=8 | .049(.045,.053) | .058(.053,.062) | .045(.041,.049) | .049(.044,.053)
n=16 | .050(.046,.054) | .049(.045,.054) | .050(.044,.054) | .046(.041,.051)

T=100 | balanced OLS | complete-case OLS | complete-case t-stat inference of IM | complete-case WLS
n=4 | .050(.046,.054) | .083(.078,.089) | .035(.031,.038) | .044(.040,.048)
n=5 | .052(.047,.056) | .090(.085,.096) | .024(.021,.027) | .045(.041,.049)
n=6 | .048(.044,.053) | .079(.073,.084) | .029(.026,.032) | .045(.041,.049)
n=7 | .049(.045,.054) | .068(.063,.073) | .028(.025,.031) | .043(.040,.047)
n=8 | .051(.046,.056) | .061(.056,.066) | .042(.038,.045) | .045(.041,.049)
n=16 | .051(.045,.057) | .050(.044,.056) | .051(.045,.058) | .050(.044,.056)

T=300 | balanced OLS | complete-case OLS | complete-case t-stat inference of IM | complete-case WLS
n=4 | .051(.046,.055) | .075(.069,.080) | .034(.030,.037) | .042(.038,.046)
n=5 | .049(.044,.053) | .067(.062,.071) | .040(.036,.044) | .045(.041,.049)
n=6 | .052(.047,.056) | .067(.062,.072) | .042(.038,.046) | .047(.043,.051)
n=7 | .047(.043,.051) | .062(.058,.067) | .041(.037,.045) | .045(.041,.049)
n=8 | .049(.044,.053) | .061(.056,.066) | .042(.038,.045) | .045(.041,.049)
n=16 | .048(.044,.052) | .055(.050,.059) | .047(.043,.051) | .049(.045,.054)

(Note) The rejection probability with nominal level .05 is reported. The 95% coverage rate for the rejection probability is reported in parentheses. The number of replications in the simulation is 10,000. Balanced OLS and complete-case OLS are based on the methods in Hansen [2007].

Table 3.3. The size adjusted power of test, T = 50

n | bias | balanced OLS | complete-case OLS | complete-case t-stat inference of IM | complete-case WLS
n=4 | bias=0 | .051(.047,.055) | .090(.084,.096) | .032(.028,.035) | .051(.046,.055)
n=4 | bias=2% | .056(.051,.060) | .089(.083,.094) | .033(.029,.037) | .050(.045,.054)
n=4 | bias=5% | .077(.071,.082) | .103(.097,.109) | .039(.035,.042) | .060(.056,.065)
n=4 | bias=10% | .147(.140,.153) | .152(.145,.159) | .061(.057,.066) | .096(.091,.102)
n=4 | bias=20% | .395(.385,.404) | .347(.337,.356) | .157(.149,.164) | .244(.236,.253)
n=5 | bias=0 | .053(.048,.057) | .074(.069,.079) | .034(.030,.037) | .051(.046,.055)
n=5 | bias=2% | .057(.052,.061) | .082(.077,.087) | .037(.033,.041) | .054(.050,.059)
n=5 | bias=5% | .084(.078,.089) | .098(.092,.104) | .048(.044,.053) | .067(.062,.072)
n=5 | bias=10% | .188(.181,.196) | .171(.164,.178) | .087(.082,.093) | .119(.113,.126)
n=5 | bias=20% | .551(.541,.561) | .429(.420,.439) | .238(.229,.246) | .345(.226,.354)
n=6 | bias=0 | .053(.048,.057) | .068(.063,.073) | .039(.035,.042) | .050(.045,.054)
n=6 | bias=2% | .055(.050,.060) | .074(.069,.079) | .045(.041,.049) | .056(.051,.060)
n=6 | bias=5% | .091(.086,.097) | .096(.090,.101) | .059(.054,.063) | .072(.067,.077)
n=6 | bias=10% | .227(.219,.235) | .188(.180,.195) | .107(.100,.113) | .149(.142,.156)
n=6 | bias=20% | .668(.658,.677) | .498(.488,.507) | .307(.298,.316) | .430(.420,.440)
n=7 | bias=0 | .053(.048,.057) | .065(.060,.070) | .039(.035,.042) | .049(.045,.054)
n=7 | bias=2% | .058(.053,.062) | .072(.067,.077) | .045(.040,.049) | .056(.052,.061)
n=7 | bias=5% | .107(.101,.113) | .100(.094,.106) | .064(.059,.069) | .080(.075,.086)
n=7 | bias=10% | .278(.269,.287) | .206(.198,.214) | .131(.125,.138) | .172(.165,.179)
n=7 | bias=20% | .763(.754,.771) | .576(.567,.586) | .372(.363,.382) | .513(.503,.523)
n=8 | bias=0 | .049(.045,.053) | .058(.053,.062) | .045(.041,.049) | .049(.044,.053)
n=8 | bias=2% | .059(.054,.063) | .066(.061,.071) | .046(.042,.050) | .056(.052,.061)
n=8 | bias=5% | .118(.111,.124) | .105(.099,.111) | .075(.069,.080) | .089(.084,.095)
n=8 | bias=10% | .320(.311,.330) | .230(.221,.238) | .158(.151,.165) | .201(.193,.209)
n=8 | bias=20% | .835(.827,.842) | .641(.631,.650) | .465(.455,.474) | .601(.591,.610)

(Note) The rejection probability is reported. The 95% coverage rate for the rejection probability is reported in parentheses. Generated errors are from an AR(1) process with correlation coefficient 0.75. The number of replications is 10,000. Balanced OLS and complete-case OLS are based on the methods in Hansen [2007].

Table 3.4. The size adjusted power of test, 1(p < 0.05), T = 100

n | bias | balanced OLS | complete-case OLS | complete-case t-stat inference of IM | complete-case WLS
n=4 | bias=0 | .050(.046,.054) | .083(.078,.089) | .035(.031,.038) | .044(.040,.048)
n=4 | bias=2% | .056(.051,.060) | .083(.078,.089) | .036(.032,.039) | .049(.044,.053)
n=4 | bias=5% | .097(.091,.104) | .117(.109,.124) | .044(.039,.049) | .067(.061,.073)
n=4 | bias=10% | .228(.220,.237) | .223(.215,.231) | .097(.091,.103) | .140(.133,.146)
n=4 | bias=20% | .646(.635,.657) | .549(.537,.560) | .289(.278,.299) | .407(.396,.418)
n=5 | bias=0 | .052(.047,.056) | .090(.085,.096) | .024(.021,.027) | .045(.041,.049)
n=5 | bias=2% | .063(.056,.069) | .097(.089,.104) | .027(.023,.031) | .046(.040,.051)
n=5 | bias=5% | .117(.109,.125) | .141(.132,.149) | .040(.035,.045) | .080(.073,.087)
n=5 | bias=10% | .320(.309,.330) | .281(.271,.290) | .096(.090,.103) | .176(.168,.185)
n=5 | bias=20% | .821(.813,.828) | .678(.669,.687) | .285(.276,.293) | .543(.533,.553)
n=6 | bias=0 | .048(.044,.053) | .079(.073,.084) | .029(.026,.032) | .045(.041,.049)
n=6 | bias=2% | .058(.054,.063) | .089(.083,.094) | .030(.027,.033) | .056(.052,.060)
n=6 | bias=5% | .140(.131,.148) | .142(.133,.150) | .056(.050,.061) | .094(.087,.102)
n=6 | bias=10% | .406(.396,.416) | .315(.306,.324) | .131(.124,.138) | .233(.225,.241)
n=6 | bias=20% | .916(.909,.923) | .766(.755,.777) | .399(.387,.411) | .679(.667,.691)
n=7 | bias=0 | .049(.045,.054) | .068(.063,.073) | .028(.025,.031) | .043(.039,.047)
n=7 | bias=2% | .073(.067,.080) | .086(.078,.093) | .038(.033,.043) | .060(.052,.061)
n=7 | bias=5% | .172(.161,.182) | .142(.132,.151) | .067(.059,.069) | .102(.093,.111)
n=7 | bias=10% | .487(.474,.500) | .366(.354,.378) | .171(.161,.180) | .293(.282,.304)
n=7 | bias=20% | .964(.960,.968) | .850(.842,.858) | .495(.484,.505) | .789(.780,.798)
n=8 | bias=0 | .051(.046,.056) | .061(.056,.066) | .042(.038,.045) | .045(.041,.049)
n=8 | bias=2% | .070(.064,.075) | .078(.072,.083) | .049(.044,.053) | .060(.052,.071)
n=8 | bias=5% | .188(.178,.198) | .163(.153,.173) | .095(.086,.104) | .128(.118,.139)
n=8 | bias=10% | .554(.544,.564) | .408(.398,.418) | .248(.238,.258) | .348(.338,.358)
n=8 | bias=20% | .986(.982,.989) | .893(.883,.902) | .670(.659,.680) | .855(.844,.865)

(Note) The rejection probability is reported. The 95% coverage rate for the rejection probability is reported in parentheses. Generated errors are from an AR(1) process with correlation coefficient 0.75. The number of replications is 10,000. Balanced OLS and complete-case OLS are based on the methods in Hansen [2007].

3.3.2 The t-test using the $t^{*}_{wls}$ statistic with scale factor under heterogeneous variance and attrition

DGP II: Attrition and heteroskedastic variance

We use the same data generating process as in DGP I, where attrition is the only source of heterogeneity, and the same attrition configurations as in table 3.1. The only difference in DGP II comes from the error process $u_i$. As in DGP I, each $u_{it}$ is initially drawn from a standard normal AR(1) process with serial correlation coefficient $\rho = 0.75$. Then, we introduce heteroskedastic variance by replacing $u_{it}$ with $a_i \cdot u_{it}$ where, for the case of n = 4, we assign $a_1 = 1.5$, $a_2 = .5$, $a_3 = 2$ and $a_4 = .8$. The heteroskedastic variance configurations in the DGPs are provided in table 3.5.

Table 3.5. Configuration of heteroskedastic variance

n | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8
n=4 | 1.5 | .5 | 2 | .8 | | | |
n=5 | 1.5 | .5 | 2 | .8 | .9 | | |
n=6 | 1.5 | .5 | 2 | .8 | .9 | 1.1 | |
n=7 | 1.5 | .5 | 2 | .8 | .9 | 1.1 | 1.2 |
n=8 | 1.5 | .5 | 2 | .8 | .9 | 1.1 | 1.2 | .7

Simulation result: Heteroskedastic variance and attrition

Table 3.6 shows the size of the t-test at the nominal level .05 for data with n = 4, 5, 6, 7, 8, 16 and T = 100. The results of the t-test inference of Hansen [2007] in balanced panel data are reported in the second column, and they show conservative inference for all n. As n increases, the rejection frequency approaches the nominal level .05, so that the size distortion disappears.
The third column shows the results of the t-test with the complete case in unbalanced panel data. There are two sources of heterogeneity for the t-test with the complete case: attrition and heteroskedastic variance. Neither source of heterogeneity is controlled for in the t-test with the complete case. The results in the third column show that the two sources of heterogeneity may work against each other, as attrition reduces the standard error while heteroskedastic variance increases the standard error used in the t-statistic. Thus, there is no consistent pattern of over-rejection or under-rejection as we change n. The fourth column reports the results of t-tests using the method in Ibragimov and Muller [2009] with the complete case in unbalanced panel data. The t-tests provide conservative inference for all n, while the rejection frequency approaches the nominal level .05 as n increases up to n = 16. The mean value of the MC rejection frequency for Ibragimov and Muller [2009] is lower than that for Hansen [2007] in balanced panel data. The fifth column reports the results of the t-test with the complete case in unbalanced panel data using WLS, which is proposed in this paper. For this method, the weights used in WLS eliminate heterogeneity in covariates due to attrition, while heterogeneity in the errors from the two sources still remains. The results in the fifth column show conservative inference for all n. The rejection frequency approaches the nominal level .05 as n increases up to n = 16. The mean value of the MC rejection frequency is lower than that in balanced panel data for all n.

Table 3.7 shows the power of t-tests for the method in Hansen [2007] with a balanced panel, the method in Hansen [2007] with an unbalanced panel, the method in Ibragimov and Muller [2009] with an unbalanced panel, and our suggested WLS method with an unbalanced panel. The power increases with n for all methods. Our WLS method has higher power than the method in Ibragimov and Muller [2009] for all n and all magnitudes of bias.
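A Monte Carlo rejection frequency together with an interval of the kind reported in parentheses in the tables can be computed as in the following sketch. The function name `rejection_rate_with_interval` is hypothetical, and the interval uses a normal approximation to the binomial, which is an assumption here since the text does not state how the reported intervals were formed.

```python
import math


def rejection_rate_with_interval(rejections, z=1.96):
    """Rejection frequency and a normal-approximation interval.

    rejections is a list of 0/1 indicators, one per replication,
    equal to 1 when the test rejected at the chosen nominal level.
    """
    reps = len(rejections)
    rate = sum(rejections) / reps
    half_width = z * math.sqrt(rate * (1.0 - rate) / reps)
    return rate, rate - half_width, rate + half_width


# Hypothetical outcome: 500 rejections in 10,000 replications.
indicators = [1] * 500 + [0] * 9500
rate, low, high = rejection_rate_with_interval(indicators)
```

With 500 rejections in 10,000 replications the rate is .05 and the interval is roughly (.046, .054), so the half-width shrinks with the square root of the number of replications.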
The third column shows the power for the method in Hansen [2007], while the fifth column reports the power for our WLS method in unbalanced panel data. The WLS method has lower power than the method in Hansen [2007], but the difference is quite small overall. However, compared to the homogeneous case, the introduction of heteroskedastic variance leads to a loss of power for all methods considered in the simulation.

Table 3.6. The size of test, 1(p < 0.05), T = 100

        balanced          complete-case     complete-case t-stat   complete-case
        OLS               OLS               inference of IM        WLS
n=4     .033(.030,.037)   .031(.028,.034)   .017(.014,.020)        .018(.015,.020)
n=5     .036(.031,.040)   .049(.044,.054)   .027(.023,.031)        .028(.024,.031)
n=6     .038(.033,.042)   .051(.045,.056)   .027(.023,.031)        .033(.028,.037)
n=7     .042(.038,.046)   .051(.047,.056)   .029(.025,.032)        .037(.033,.040)
n=8     .042(.038,.046)   .055(.050,.060)   .036(.032,.040)        .042(.037,.046)
n=16    .047(.041,.053)   .058(.052,.064)   .042(.036,.048)        .046(.040,.052)

(Note) The rejection probability with nominal level .05 is reported. The 95% coverage rate for the rejection probability is reported in parentheses. The number of replications in the simulation is 2,000. Balanced OLS and complete-case OLS are based on the methods in Hansen [2007].

Table 3.7. The size adjusted power of test, 1(p < 0.05), T = 100

             balanced          complete-case     complete-case t-stat   complete-case
             OLS               OLS               inference of IM        WLS
T=100, n=4
bias=0       .033(.030,.037)   .031(.028,.034)   .017(.014,.020)        .018(.015,.020)
bias=2%      .037(.028,.046)   .033(.030,.037)   .017(.014,.020)        .019(.016,.021)
bias=5%      .069(.061,.077)   .059(.051,.064)   .028(.024,.033)        .034(.027,.041)
bias=10%     .190(.182,.198)   .153(.145,.161)   .064(.059,.069)        .093(.087,.100)
bias=20%     .497(.489,.505)   .422(.414,.430)   .208(.203,.213)        .300(.293,.307)
T=100, n=5
bias=0       .036(.031,.044)   .049(.044,.054)   .027(.023,.031)        .028(.024,.031)
bias=2%      .045(.037,.053)   .097(.089,.104)   .027(.023,.031)        .046(.040,.053)
bias=5%      .081(.073,.089)   .141(.133,.149)   .040(.035,.045)        .080(.073,.087)
bias=10%     .260(.252,.268)   .281(.273,.289)   .096(.091,.101)        .176(.169,.184)
bias=20%     .681(.673,.689)   .678(.670,.686)   .285(.276,.290)        .543(.535,.550)
T=100, n=6
bias=0       .038(.033,.042)   .051(.045,.056)   .027(.023,.031)        .033(.028,.037)
bias=2%      .051(.043,.059)   .058(.053,.063)   .035(.030,.040)        .043(.037,.050)
bias=5%      .114(.106,.122)   .107(.100,.114)   .051(.047,.056)        .077(.070,.084)
bias=10%     .316(.308,.324)   .254(.246,.262)   .111(.106,.116)        .181(.174,.188)
bias=20%     .786(.778,.794)   .622(.614,.630)   .349(.344,.354)        .565(.558,.572)
T=100, n=7
bias=0       .042(.038,.046)   .051(.047,.056)   .029(.025,.032)        .037(.033,.041)
bias=2%      .059(.051,.067)   .063(.056,.071)   .036(.031,.040)        .045(.038,.052)
bias=5%      .137(.129,.144)   .117(.110,.125)   .060(.055,.065)        .085(.078,.092)
bias=10%     .391(.383,.399)   .311(.303,.319)   .134(.129,.140)        .241(.234,.248)
bias=20%     .870(.863,.878)   .717(.709,.725)   .409(.404,.415)        .655(.648,.662)
T=100, n=8
bias=0       .042(.038,.046)   .055(.050,.060)   .036(.032,.040)        .042(.037,.046)
bias=2%      .070(.064,.075)   .078(.072,.083)   .049(.044,.053)        .060(.052,.071)
bias=5%      .188(.178,.198)   .163(.153,.173)   .095(.086,.104)        .128(.118,.139)
bias=10%     .554(.544,.564)   .408(.398,.418)   .248(.238,.258)        .348(.338,.358)
bias=20%     .986(.982,.989)   .893(.883,.902)   .670(.659,.680)        .855(.844,.865)

(Note) The rejection probability is reported. The 95% coverage rate for the rejection probability is reported in parentheses. The number of replications is 2,000. Balanced OLS and complete-case OLS are based on the methods in Hansen [2007].

3.3.3 Trade-off calculation: Dropping observations to make balanced panel data

Unbalancedness of panel data causes heterogeneity, which forces conservative t-tests for the methods in Ibragimov and Muller [2009] and Bester et al. [2009] and renders t-tests invalid when these methods use the t-distribution as the reference distribution. Thus, we examine the tradeoff of dropping some observations on purpose to make the data balanced.3 Dropping observations inevitably introduces a loss of power, while balanced panel data eliminate heterogeneity and the size distortion in inference. Thus, we can quantify the tradeoff between the benefit of eliminating the rejection probability bias and the cost of losing power using DGP I.

3 We observe this practice of balancing data occasionally, especially in international macroeconomic data. Many data are available from 1970, and the number of countries analyzed is sometimes less than 50. It is quite common that researchers have shorter time series for developing countries and use only balanced data from 1980 or a later period.

Table 3.8 reports the size of t-tests for the method in Ibragimov and Muller [2009]. The first column shows the cross-section dimension of the data. The second column reports the rejection probability of the t-test for balanced panel data without attrition. The fourth and fifth columns report the rejection probability of the t-test for balanced panels obtained by dropping observations from the complete case (reduced balanced panels) along the time dimension and the cross-section dimension, respectively. These three columns show that there is no size distortion of inference for balanced panels in the simulation. The third column reports the rejection probability of the t-test for the complete case of unbalanced panel data. Different lengths of data, due to the missing-process configuration of table 3.1, induce heterogeneity, and heteroskedastic variance leads to conservative inference of the t-test, as the theory of the Ibragimov and Muller [2009] approach predicts.
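The rejection probabilities in these tables, and the intervals reported in parentheses next to them, are Monte Carlo frequencies. A minimal sketch of how such a frequency and an accompanying 95% interval could be computed from replication-level p-values is given below. The function name and the binomial normal approximation are assumptions made for illustration, since the text does not state how the reported intervals were constructed.

```python
import math


def rejection_frequency(p_values, level=0.05, z=1.96):
    """Share of replications with p < level, with a normal-approximation interval.

    p_values: one p-value per Monte Carlo replication.
    Returns the rejection rate and a (lower, upper) interval around it.
    """
    reps = len(p_values)
    rate = sum(1 for p in p_values if p < level) / reps
    # Standard error of a binomial proportion, scaled by the normal quantile z.
    half_width = z * math.sqrt(rate * (1.0 - rate) / reps)
    return rate, (rate - half_width, rate + half_width)


# Illustrative input only: 4,000 replications, of which exactly 200 reject.
p_values = [0.01] * 200 + [0.50] * 3800
rate, (lower, upper) = rejection_frequency(p_values)
print(f"{rate:.3f} ({lower:.3f}, {upper:.3f})")
```

With 4,000 replications and a rejection rate of .05, the half-width under this approximation is a little under .007, which is the same order of magnitude as the intervals reported for the balanced panels in Table 3.8.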
As n increases, the degree of conservativeness of inference is mitigated and, for n=16, inference exhibits no size distortion regardless of the size of the time dimension. Table 3.9 and appendix I report the size adjusted power of t-tests in Ibragimov and Muller [2009] for T = 50, 100, 300 and 1,000, respectively. Regardless of T, the loss of power is quite significant for the reduced balanced panel. Especially for large n ≥ 8 and bias=20%, the power of the t-test for the reduced balanced panel is less than 50% of the power for the unbalanced panel of the complete case. This is a noticeable decline in power caused by dropping observations. As reported in tables 3.3 and 3.4, the difference in the power of t-tests between the complete case with OLS and the complete case with WLS is negligibly small. The power analysis for reduced balanced panel data shows that the cost of losing power overwhelms the benefit of no size distortion.

Table 3.8. The size of test, 1(p < 0.05), where errors are AR(1) with correlation coefficient 0.75

          balanced          complete-case     balanced,                balanced,
                                              keep ∀i, drop t          keep ∀t, drop i
T=50
n=4       .053(.045,.059)   .031(.025,.036)   .050(.042,.056)          .050(.043,.056)
n=5       .054(.046,.060)   .034(.029,.040)   .045(.039,.051)          .055(.048,.063)
n=6       .047(.040,.054)   .037(.031,.043)   .049(.042,.056)          .050(.043,.056)
n=7       .046(.039,.053)   .042(.036,.048)   .048(.041,.054)          .049(.042,.055)
n=8       .051(.044,.058)   .039(.033,.045)   .050(.043,.056)          .052(.045,.059)
n=9       .052(.045,.059)   .040(.034,.046)   .047(.040,.053)          .052(.045,.059)
n=10      .047(.040,.054)   .042(.036,.048)   .049(.042,.056)          .053(.046,.060)
n=11      .049(.042,.056)   .048(.041,.055)   .051(.044,.058)          .052(.045,.059)
n=16      .050(.043,.057)   .048(.041,.055)   .054(.047,.062)          .048(.041,.054)
T=100
n=4       .048(.041,.054)   .028(.023,.033)   .046(.040,.053)          .053(.046,.060)
n=5       .050(.043,.056)   .023(.019,.028)   .051(.044,.058)          .047(.040,.053)
n=6       .046(.040,.053)   .027(.022,.032)   .046(.039,.053)          .048(.041,.054)
n=7       .054(.047,.061)   .030(.025,.036)   .047(.040,.054)          .048(.041,.054)
n=8       .048(.042,.055)   .038(.032,.044)   .050(.043,.057)          .053(.046,.060)
n=16      .048(.041,.055)   .049(.040,.058)   .050(.042,.057)          .050(.043,.057)
T=300
n=4       .046(.039,.053)   .029(.022,.035)   .053(.042,.063)          .048(.039,.057)
n=5       .051(.044,.058)   .032(.026,.038)   .051(.045,.058)          .046(.040,.053)
n=6       .053(.046,.060)   .039(.032,.045)   .050(.043,.057)          .052(.046,.059)
n=7       .045(.038,.052)   .038(.032,.044)   .051(.044,.059)          .049(.041,.056)
n=8       .053(.042,.055)   .042(.035,.049)   .049(.040,.057)          .050(.042,.058)
n=16      .046(.041,.055)   .050(.043,.057)   .050(.043,.057)          .048(.041,.055)
T=1,000
n=4       .056(.049,.063)   .033(.026,.040)   .042(.034,.049)          .057(.050,.064)
n=5       .051(.044,.058)   .039(.033,.046)   .045(.038,.052)          .049(.042,.055)
n=6       .050(.043,.057)   .040(.034,.047)   .050(.043,.057)          .050(.043,.057)
n=7       .047(.040,.053)   .038(.032,.044)   .047(.040,.053)          .053(.046,.060)
n=8       .046(.038,.055)   .037(.031,.044)   .052(.045,.058)          .055(.048,.063)
n=9       .051(.041,.058)   .043(.036,.051)   .049(.042,.056)          .045(.038,.052)
n=16      .050(.043,.057)   .046(.039,.053)   .050(.043,.057)          .048(.041,.055)

(Note) The rejection probability is reported. The 95% coverage rate for the rejection probability is reported in parentheses. The number of replications is 4,000.

Table 3.9.
The size adjusted power of test, 1(p < 0.05), T = 50

             balanced          complete-case     balanced,                balanced,
                                                 keep ∀i, drop t          keep ∀t, drop i
T=50, n=4
bias=0       .053(.045,.059)   .031(.025,.036)   .050(.042,.056)          .050(.043,.056)
bias=2%      .058(.047,.068)   .043(.034,.051)   .047(.037,.057)          .057(.046,.067)
bias=5%      .074(.063,.085)   .049(.040,.058)   .049(.040,.058)          .066(.056,.076)
bias=10%     .137(.121,.152)   .072(.060,.083)   .058(.047,.068)          .082(.070,.094)
bias=20%     .391(.369,.412)   .174(.157,.190)   .101(.088,.114)          .121(.107,.135)
T=50, n=8
bias=0       .051(.044,.058)   .039(.033,.045)   .050(.043,.056)          .052(.045,.059)
bias=2%      .058(.048,.068)   .052(.042,.062)   .054(.044,.064)          .059(.049,.069)
bias=5%      .114(.100,.128)   .071(.059,.082)   .060(.049,.070)          .064(.053,.074)
bias=10%     .307(.287,.327)   .152(.136,.167)   .081(.069,.092)          .083(.071,.095)
bias=20%     .812(.794,.829)   .467(.445,.488)   .204(.186,.222)          .132(.117,.146)
T=50, n=16
bias=0       .050(.043,.057)   .048(.041,.055)   .054(.047,.062)          .048(.041,.054)
bias=2%      .074(.063,.085)   .053(.043,.062)   .053(.043,.063)          .054(.049,.064)
bias=5%      .191(.173,.208)   .102(.089,.115)   .083(.071,.095)          .073(.062,.084)
bias=10%     .613(.591,.634)   .306(.286,.326)   .176(.159,.193)          .132(.117,.147)
bias=20%     .993(.989,.997)   .810(.792,.827)   .503(.481,.525)          .383(.361,.404)

(Note) The rejection probability is reported. The 95% coverage rate for the rejection probability is reported in parentheses. Generated errors are from an AR(1) process with correlation coefficient 0.75. The number of replications is 2,000.

3.4 Conclusion

In this paper, we extend the robust inference method with heterogeneous variance and autocorrelation in Hansen [2007] to unbalanced panel data. In unbalanced panel data, heterogeneity is induced by attrition, and heterogeneity in both covariates and variance makes the t-test using OLS with the scaled t-statistic in Hansen [2007] invalid.
We derive a robust inference result using weighted least squares (WLS) for unbalanced panel data, where the t∗-statistic with the WLS scaling factor provides conservative inference with the standard t-test in the presence of both attrition and heteroskedastic variance. We examine our proposed WLS inference method and compare it to the methods in Hansen [2007] and Ibragimov and Muller [2009] via MC simulation. Simulation results verify the bound result of the t-test for the method with WLS when attrition is the only source of heterogeneity. Compared to the method in Hansen [2007], the size distortion is significantly reduced with WLS, especially for small n. Moreover, compared to the t-statistic based method of Ibragimov and Muller [2009], the WLS method performs better in power, and the bound of the type I error is closer to .05. Simulation also shows that the proposed WLS method in an unbalanced panel provides conservative inference but shows greater power than the method in Ibragimov and Muller [2009] in the presence of both heterogeneous variance and attrition. Finally, we examine the tradeoff of dropping some observations on purpose to make the data balanced. Balanced panel data eliminate heterogeneity and the size distortion in inference, but the loss of power due to dropping observations is quite substantial. Overall, the cost of losing power dominates the benefit of eliminating the rejection probability bias.

APPENDICES

Appendix A

Gaussian copula density

Consider the following joint logistic distribution for $(u_1, u_2, \ldots, u_T)$.

$$F(u_1, u_2, \ldots, u_T) = C(F_1(u_1), \ldots, F_t(u_t), \ldots, F_T(u_T); \rho)$$

where $F(\cdot)$ is the joint logistic distribution and $F_t(\cdot)$ is the univariate logistic distribution. Then we can obtain the joint logistic density as follows.

$$f(u_1, u_2, \ldots, u_T) = \frac{\partial^T C(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T; \rho)}{\partial\varepsilon_1 \partial\varepsilon_2 \cdots \partial\varepsilon_T} \, \frac{\partial\varepsilon_1}{\partial u_1} \frac{\partial\varepsilon_2}{\partial u_2} \cdots \frac{\partial\varepsilon_T}{\partial u_T} \tag{A.1}$$

where $F_t(u_t) = \varepsilon_t$ and $\partial\varepsilon_t / \partial u_t = f_t(u_t)$ for all t, and let $\dfrac{\partial^T C(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T; \rho)}{\partial\varepsilon_1 \partial\varepsilon_2 \cdots \partial\varepsilon_T} = c(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T; \rho)$.
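For T = 2, the density in (A.1) can be evaluated numerically once c(·) is taken to be the Gaussian copula density. The sketch below is a minimal illustration under stated assumptions: standard logistic marginals, a single correlation parameter ρ, and the well-known closed form of the bivariate Gaussian copula density. All function names are hypothetical and are not taken from the text.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF


def logistic_cdf(u):
    """Standard logistic CDF F_t(u)."""
    return 1.0 / (1.0 + math.exp(-u))


def logistic_pdf(u):
    """Standard logistic density f_t(u) = F_t(u) * (1 - F_t(u))."""
    p = logistic_cdf(u)
    return p * (1.0 - p)


def gaussian_copula_density(e1, e2, rho):
    """Bivariate Gaussian copula density c(e1, e2; rho) for e1, e2 in (0, 1)."""
    g1 = _STD_NORMAL.inv_cdf(e1)
    g2 = _STD_NORMAL.inv_cdf(e2)
    det = 1.0 - rho ** 2  # determinant of the 2x2 correlation matrix
    quad = (rho ** 2 * (g1 ** 2 + g2 ** 2) - 2.0 * rho * g1 * g2) / (2.0 * det)
    return math.exp(-quad) / math.sqrt(det)


def joint_logistic_density(u1, u2, rho):
    """f(u1, u2) = c(F(u1), F(u2); rho) * f(u1) * f(u2), the T = 2 case of (A.1)."""
    c = gaussian_copula_density(logistic_cdf(u1), logistic_cdf(u2), rho)
    return c * logistic_pdf(u1) * logistic_pdf(u2)


# With rho = 0 the copula density is 1 and the joint density factorizes.
print(joint_logistic_density(0.3, -0.5, rho=0.0))
print(joint_logistic_density(0.3, -0.5, rho=0.6))
```

A log of this density, summed over observations, would give a log-likelihood in ρ and any marginal parameters; the sketch stops short of that step because the estimation setup is not spelled out here.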
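Since $C(\cdot)$ is a Gaussian copula, we can rewrite $c(\cdot)$ as follows using the probability integral transformation (Hoel et al. [1972]), which transforms the normal CDF to the logistic CDF.

$$c(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T; \rho) = \frac{\phi(\Phi_1^{-1}(\varepsilon_1), \Phi_2^{-1}(\varepsilon_2), \ldots, \Phi_T^{-1}(\varepsilon_T); \rho)}{\phi_1(\Phi_1^{-1}(\varepsilon_1)) \cdot \phi_2(\Phi_2^{-1}(\varepsilon_2)) \cdots \phi_T(\Phi_T^{-1}(\varepsilon_T))} \tag{A.2}$$

where $\phi(\cdot)$ is the joint normal density and $\phi_t(\cdot)$ is a univariate normal density.

Using equations (A.1) and (A.2), we obtain the following logistic density. In applications, we can use equation (A.3) to estimate model parameters by maximum likelihood estimation.

$$f(u_1, u_2, \ldots, u_T) = \phi(\Phi_1^{-1}(F_1(u_1)), \Phi_2^{-1}(F_2(u_2)), \ldots, \Phi_T^{-1}(F_T(u_T)); \rho) \prod_{t=1}^{T} \frac{f_t(u_t)}{\phi_t(\Phi_t^{-1}(F_t(u_t)))} \tag{A.3}$$

where the joint normal density is given by the following.

$$\phi(g_1, g_2, \ldots, g_T) = \left(\frac{1}{2\pi}\right)^{T/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}\, g' \Sigma^{-1} g\right)$$

where $g = (g_1, g_2, \ldots, g_T)'$ and $\Sigma$ is the covariance matrix of g, cov(g). When $\Sigma$ is a correlation matrix, dividing by the product of the univariate standard normal densities as in (A.2) leaves the factor $|\Sigma|^{-1/2} \exp\left(-\frac{1}{2}\, g' (\Sigma^{-1} - I_T)\, g\right)$.

Appendix B

Proof for the consistency of conditional logit estimator in chapter 1

The likelihood of the joint T events $y_{i1}, y_{i2}, \ldots, y_{iT}$ can be obtained from the following.

$$
\begin{aligned}
P(y_{i1} = y_1, \ldots, y_{iT} = y_T \mid x_i, c_i, n_i = n)
&= \frac{P(y_{i1} = y_1, \ldots, y_{iT} = y_T \mid x_i, c_i)}{P(n_i = n \mid x_i, c_i)} \\
&\overset{\text{CI-A4}}{=} \frac{P(y_{i1} = y_1 \mid x_i, c_i)\, P(y_{i2} = y_2 \mid x_i, c_i) \cdots P(y_{iT} = y_T \mid x_i, c_i)}{P(n_i = n \mid x_i, c_i)} \\
&\overset{\text{SE-A2}}{=} \frac{P(y_{i1} = y_1 \mid x_{i1}, c_i)\, P(y_{i2} = y_2 \mid x_{i2}, c_i) \cdots P(y_{iT} = y_T \mid x_{iT}, c_i)}{P(n_i = n \mid x_i, c_i)} \\
&\overset{\text{A3}}{=} \frac{\exp\left(\sum_{t=1}^{T} y_{it} x_{it} \beta\right)}{\sum_{a \in R_i} \exp\left(\sum_{t=1}^{T} a_t x_{it} \beta\right)}
\end{aligned}
\tag{B.1}
$$

where $a = (a_1, a_2, \ldots, a_T)$, $a_t \in \{0, 1\}$, $a_t = 1$ if $y_{it} = 1$, and $R_i$ is the subset of $\mathbb{R}^T$ defined as $\{a \in \mathbb{R}^T : a_t \in \{0, 1\} \text{ and } \sum_{t=1}^{T} a_t = n_i\}$.

$$\hat{\beta}_{c\text{-}logit} = \arg\max_b \sum_{i=1}^{n} l_i(b) \tag{B.2}$$

$$l_i(b) = \log\left\{ \frac{\exp\left(\sum_{t=1}^{T} y_{it} x_{it} b\right)}{\sum_{a \in R_i} \exp\left(\sum_{t=1}^{T} a_t x_{it} b\right)} \right\}$$

Appendix C

Monte Carlo simulation results: Coefficient β for continuous covariate

Table C.1.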
Unconditional logit estimates for β with no serial correlation

ρ=0, n=100
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.46   1.20   0.86   0.30            (0.28,0.32)
t=3    1.75   0.46   0.44   0.34            (0.32,0.36)
t=4    1.48   0.31   0.29   0.33            (0.31,0.35)
t=5    1.36   0.23   0.22   0.32            (0.30,0.34)
t=6    1.28   0.18   0.18   0.30            (0.28,0.32)
t=7    1.23   0.16   0.16   0.29            (0.27,0.31)
t=8    1.20   0.14   0.14   0.27            (0.25,0.29)

ρ=0, n=200
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.19   0.59   0.54   0.60            (0.58,0.62)
t=3    1.67   0.30   0.29   0.63            (0.61,0.65)
t=4    1.45   0.20   0.20   0.61            (0.59,0.63)
t=5    1.33   0.16   0.15   0.57            (0.54,0.59)
t=6    1.26   0.13   0.13   0.52            (0.50,0.55)
t=7    1.22   0.11   0.11   0.50            (0.47,0.52)
t=8    1.18   0.10   0.10   0.47            (0.45,0.49)

ρ=0, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.07   0.37   0.36   0.89            (0.88,0.90)
t=3    1.63   0.21   0.20   0.92            (0.90,0.93)
t=4    1.42   0.14   0.14   0.90            (0.89,0.91)
t=5    1.32   0.11   0.11   0.87            (0.85,0.88)
t=6    1.25   0.09   0.09   0.82            (0.81,0.83)
t=7    1.20   0.08   0.08   0.77            (0.75,0.79)
t=8    1.17   0.07   0.07   0.74            (0.71,0.77)

ρ=0, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.03   0.30   0.29   0.99            (0.98,0.99)
t=3    1.62   0.17   0.16   0.98            (0.98,0.99)
t=4    1.42   0.11   0.11   0.98            (0.98,0.99)
t=5    1.31   0.09   0.09   0.96            (0.95,0.98)
t=6    1.24   0.07   0.07   0.95            (0.94,0.96)
t=7    1.20   0.06   0.06   0.92            (0.91,0.94)
t=8    1.17   0.05   0.05   0.90            (0.88,0.92)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.2.
Conditional logit estimates for β with no serial correlation

ρ=0, n=100
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.23   0.60   0.45   0.03            (0.03,0.04)
t=3    1.08   0.26   0.25   0.04            (0.03,0.05)
t=4    1.04   0.20   0.19   0.06            (0.05,0.07)
t=5    1.04   0.16   0.16   0.05            (0.04,0.06)
t=6    1.03   0.14   0.14   0.04            (0.03,0.05)
t=7    1.03   0.13   0.12   0.05            (0.04,0.06)
t=8    1.02   0.11   0.11   0.05            (0.04,0.06)

ρ=0, n=200
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.09   0.29   0.27   0.04            (0.03,0.05)
t=3    1.03   0.17   0.17   0.05            (0.04,0.06)
t=4    1.02   0.13   0.13   0.05            (0.04,0.06)
t=5    1.02   0.11   0.11   0.05            (0.04,0.06)
t=6    1.01   0.10   0.10   0.05            (0.04,0.06)
t=7    1.01   0.09   0.09   0.05            (0.04,0.06)
t=8    1.01   0.08   0.08   0.05            (0.04,0.06)

ρ=0, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.03   0.19   0.18   0.04            (0.03,0.05)
t=3    1.01   0.12   0.12   0.05            (0.04,0.06)
t=4    1.01   0.09   0.09   0.05            (0.04,0.06)
t=5    1.00   0.08   0.08   0.05            (0.04,0.06)
t=6    1.00   0.07   0.07   0.05            (0.04,0.06)
t=7    1.00   0.06   0.06   0.05            (0.04,0.06)
t=8    1.00   0.05   0.05   0.05            (0.04,0.06)

ρ=0, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.01   0.15   0.15   0.05            (0.04,0.06)
t=3    1.01   0.10   0.09   0.05            (0.04,0.06)
t=4    1.00   0.07   0.07   0.05            (0.04,0.06)
t=5    1.00   0.06   0.06   0.06            (0.05,0.07)
t=6    1.00   0.05   0.05   0.05            (0.04,0.06)
t=7    1.00   0.05   0.05   0.05            (0.04,0.06)
t=8    1.00   0.05   0.05   0.05            (0.04,0.06)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.3.
Unconditional logit estimates for β with serial correlation

ρ=0.2, n=100
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.88   1.89   1.04   0.36            (0.34,0.38)
t=3    1.94   0.53   0.48   0.45            (0.43,0.48)
t=4    1.60   0.34   0.31   0.46            (0.44,0.48)
t=5    1.45   0.24   0.24   0.46            (0.44,0.48)
t=6    1.35   0.20   0.19   0.41            (0.39,0.43)
t=7    1.28   0.17   0.16   0.39            (0.37,0.41)
t=8    1.25   0.15   0.14   0.38            (0.36,0.40)

ρ=0.2, n=200
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.50   0.72   0.62   0.72            (0.70,0.74)
t=3    1.84   0.33   0.32   0.80            (0.78,0.82)
t=4    1.56   0.21   0.21   0.78            (0.76,0.80)
t=5    1.41   0.16   0.16   0.75            (0.73,0.76)
t=6    1.32   0.13   0.13   0.70            (0.68,0.72)
t=7    1.27   0.12   0.11   0.67            (0.65,0.69)
t=8    1.22   0.10   0.10   0.62            (0.60,0.64)

ρ=0.2, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.37   0.42   0.42   0.97            (0.96,0.98)
t=3    1.80   0.23   0.22   0.98            (0.97,0.99)
t=4    1.54   0.15   0.15   0.98            (0.97,0.99)
t=5    1.40   0.12   0.11   0.97            (0.96,0.98)
t=6    1.32   0.09   0.09   0.95            (0.93,0.96)
t=7    1.26   0.08   0.08   0.92            (0.91,0.93)
t=8    1.22   0.07   0.07   0.90            (0.88,0.92)

ρ=0.2, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.31   0.35   0.33   0.99            (0.99,1)
t=3    1.79   0.19   0.18   1               (1,1)
t=4    1.53   0.12   0.12   1               (1,1)
t=5    1.40   0.09   0.09   1               (1,1)
t=6    1.31   0.07   0.08   1               (1,1)
t=7    1.25   0.07   0.07   0.99            (0.98,0.99)
t=8    1.22   0.06   0.06   0.98            (0.97,0.99)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.4.
Conditional logit estimates for β with serial correlation

ρ=0.2, n=100
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.44   0.94   0.56   0.02            (0.01,0.02)
t=3    1.18   0.30   0.27   0.05            (0.04,0.06)
t=4    1.12   0.22   0.20   0.07            (0.06,0.08)
t=5    1.10   0.17   0.17   0.07            (0.06,0.08)
t=6    1.08   0.15   0.14   0.07            (0.06,0.08)
t=7    1.07   0.13   0.13   0.07            (0.06,0.08)
t=8    1.06   0.12   0.12   0.07            (0.06,0.08)

ρ=0.2, n=200
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.25   0.36   0.32   0.05            (0.04,0.06)
t=3    1.14   0.19   0.18   0.07            (0.06,0.08)
t=4    1.09   0.14   0.14   0.08            (0.07,0.09)
t=5    1.08   0.11   0.12   0.08            (0.07,0.09)
t=6    1.06   0.10   0.10   0.08            (0.07,0.09)
t=7    1.06   0.09   0.09   0.09            (0.08,0.10)
t=8    1.05   0.09   0.08   0.09            (0.08,0.10)

ρ=0.2, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.18   0.21   0.21   0.08            (0.06,0.09)
t=3    1.11   0.13   0.13   0.11            (0.10,0.13)
t=4    1.08   0.10   0.10   0.12            (0.10,0.13)
t=5    1.07   0.08   0.08   0.13            (0.12,0.15)
t=6    1.06   0.07   0.07   0.12            (0.10,0.14)
t=7    1.05   0.06   0.06   0.10            (0.08,0.11)
t=8    1.04   0.06   0.06   0.10            (0.08,0.12)

ρ=0.2, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.16   0.17   0.17   0.10            (0.08,0.12)
t=3    1.11   0.11   0.10   0.15            (0.13,0.17)
t=4    1.08   0.08   0.08   0.16            (0.13,0.18)
t=5    1.06   0.06   0.07   0.14            (0.12,0.17)
t=6    1.05   0.06   0.06   0.14            (0.12,0.16)
t=7    1.04   0.05   0.05   0.13            (0.11,0.15)
t=8    1.03   0.05   0.05   0.13            (0.11,0.15)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.5.
Unconditional logit estimates for β with serial correlation

ρ=0.4, n=100
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    3.59   2.65   1.32   0.44            (0.42,0.46)
t=3    2.26   0.65   0.57   0.60            (0.58,0.63)
t=4    1.81   0.40   0.36   0.63            (0.61,0.65)
t=5    1.60   0.27   0.26   0.65            (0.63,0.67)
t=6    1.47   0.21   0.21   0.62            (0.60,0.65)
t=7    1.38   0.18   0.18   0.57            (0.54,0.59)
t=8    1.33   0.16   0.15   0.57            (0.55,0.59)

ρ=0.4, n=200
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.99   0.92   0.77   0.84            (0.83,0.86)
t=3    2.12   0.39   0.38   0.93            (0.91,0.94)
t=4    1.75   0.25   0.24   0.93            (0.91,0.94)
t=5    1.56   0.18   0.18   0.92            (0.91,0.94)
t=6    1.44   0.15   0.14   0.89            (0.88,0.91)
t=7    1.36   0.12   0.12   0.87            (0.86,0.89)
t=8    1.31   0.11   0.11   0.83            (0.81,0.84)

ρ=0.4, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.79   0.54   0.50   0.99            (0.99,1)
t=3    2.07   0.27   0.26   1               (1,1)
t=4    1.73   0.17   0.17   1               (1,1)
t=5    1.54   0.13   0.13   1               (1,1)
t=6    1.42   0.10   0.10   1               (1,1)
t=7    1.35   0.09   0.09   0.99            (0.99,1)
t=8    1.30   0.08   0.08   0.98            (0.98,0.99)

ρ=0.4, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.72   0.43   0.40   1               (1,1)
t=3    2.06   0.21   0.21   1               (1,1)
t=4    1.72   0.14   0.14   1               (1,1)
t=5    1.53   0.10   0.10   1               (1,1)
t=6    1.42   0.08   0.08   1               (1,1)
t=7    1.35   0.07   0.07   1               (1,1)
t=8    1.30   0.06   0.06   1               (1,1)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.6.
Conditional logit estimates for β with serial correlation

ρ=0.4, n=100
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.80   1.32   0.77   0.004           (0.01,0.02)
t=3    1.35   0.35   0.32   0.11            (0.04,0.06)
t=4    1.25   0.25   0.22   0.14            (0.06,0.08)
t=5    1.20   0.19   0.18   0.15            (0.06,0.08)
t=6    1.17   0.16   0.15   0.16            (0.14,0.18)
t=7    1.13   0.14   0.14   0.15            (0.13,0.16)
t=8    1.13   0.13   0.13   0.15            (0.13,0.16)

ρ=0.4, n=200
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.50   0.46   0.39   0.12            (0.10,0.13)
t=3    1.28   0.21   0.21   0.21            (0.19,0.23)
t=4    1.21   0.16   0.15   0.25            (0.23,0.27)
t=5    1.18   0.13   0.12   0.26            (0.23,0.27)
t=6    1.15   0.11   0.11   0.25            (0.22,0.26)
t=7    1.13   0.09   0.10   0.24            (0.21,0.26)
t=8    1.11   0.09   0.09   0.23            (0.21,0.25)

ρ=0.4, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.39   0.27   0.25   0.27            (0.25,0.29)
t=3    1.26   0.15   0.14   0.40            (0.38,0.42)
t=4    1.20   0.11   0.11   0.45            (0.42,0.47)
t=5    1.16   0.09   0.09   0.46            (0.43,0.48)
t=6    1.14   0.08   0.08   0.41            (0.38,0.44)
t=7    1.12   0.07   0.07   0.42            (0.39,0.45)
t=8    1.10   0.06   0.06   0.40            (0.37,0.43)

ρ=0.4, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.36   0.21   0.20   0.40            (0.37,0.43)
t=3    1.26   0.12   0.12   0.59            (0.56,0.62)
t=4    1.19   0.09   0.09   0.64            (0.61,0.67)
t=5    1.16   0.07   0.07   0.61            (0.58,0.64)
t=6    1.13   0.06   0.06   0.59            (0.56,0.62)
t=7    1.11   0.06   0.05   0.56            (0.52,0.59)
t=8    1.10   0.05   0.05   0.53            (0.50,0.56)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.7.
Unconditional logit estimates for β with serial correlation

ρ=0.6, n=100
       Mean   SD      SE     r ≡ 1(P<0.05)   95-CI
t=2    5.32   10.91   1.89   0.57            (0.55,0.59)
t=3    2.87   0.94    0.76   0.79            (0.77,0.81)
t=4    2.21   0.49    0.45   0.86            (0.84,0.87)
t=5    1.90   0.33    0.32   0.88            (0.87,0.90)
t=6    1.71   0.25    0.25   0.87            (0.85,0.88)
t=7    1.58   0.21    0.20   0.85            (0.84,0.87)
t=8    1.49   0.18    0.17   0.83            (0.82,0.85)

ρ=0.6, n=200
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    3.94   1.66   1.09   0.91            (0.90,0.93)
t=3    2.66   0.52   0.49   0.99            (0.98,0.99)
t=4    2.13   0.32   0.30   0.99            (0.99,1)
t=5    1.84   0.22   0.22   0.99            (0.99,1)
t=6    1.67   0.17   0.17   0.99            (0.99,1)
t=7    1.55   0.15   0.14   0.99            (0.99,1)
t=8    1.47   0.13   0.12   0.98            (0.98,0.99)

ρ=0.6, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    3.55   0.74   0.67   1               (1,1)
t=3    2.56   0.35   0.33   1               (1,1)
t=4    2.09   0.21   0.21   1               (1,1)
t=5    1.82   0.15   0.15   1               (1,1)
t=6    1.64   0.12   0.12   1               (1,1)
t=7    1.53   0.10   0.10   1               (1,1)
t=8    1.46   0.09   0.09   1               (1,1)

ρ=0.6, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    3.43   0.58   0.52   0.99            (0.99,1)
t=3    2.55   0.27   0.27   1               (1,1)
t=4    2.08   0.16   0.17   1               (1,1)
t=5    1.80   0.12   0.12   1               (1,1)
t=6    1.64   0.10   0.10   1               (1,1)
t=7    1.53   0.08   0.08   1               (1,1)
t=8    1.45   0.07   0.07   1               (1,1)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.8.
Conditional logit estimates for β with serial correlation

ρ=0.6, n=100
       Mean   SD      SE        r ≡ 1(P<0.05)   95-CI
t=2    2.86   11.06   3048013   0.001           (0,0.03)
t=3    1.68   0.50    0.41      0.27            (0.25,0.29)
t=4    1.49   0.29    0.27      0.39            (0.37,0.41)
t=5    1.40   0.22    0.21      0.45            (0.42,0.47)
t=6    1.34   0.18    0.18      0.47            (0.44,0.49)
t=7    1.29   0.16    0.15      0.45            (0.43,0.47)
t=8    1.25   0.14    0.14      0.44            (0.42,0.46)

ρ=0.6, n=200
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.97   0.83   0.58   0.23            (0.21,0.25)
t=3    1.57   0.28   0.26   0.59            (0.57,0.61)
t=4    1.44   0.19   0.18   0.72            (0.70,0.74)
t=5    1.36   0.15   0.14   0.73            (0.71,0.75)
t=6    1.31   0.13   0.12   0.75            (0.73,0.77)
t=7    1.27   0.11   0.11   0.71            (0.69,0.73)
t=8    1.24   0.10   0.09   0.71            (0.69,0.73)

ρ=0.6, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.77   0.37   0.34   0.68            (0.67,0.70)
t=3    1.52   0.19   0.18   0.89            (0.88,0.90)
t=4    1.42   0.13   0.13   0.95            (0.94,0.96)
t=5    1.35   0.10   0.10   0.96            (0.95,0.97)
t=6    1.30   0.09   0.08   0.95            (0.94,0.96)
t=7    1.26   0.07   0.07   0.95            (0.94,0.96)
t=8    1.23   0.07   0.07   0.96            (0.94,0.97)

ρ=0.6, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    1.71   0.29   0.27   0.85            (0.83,0.87)
t=3    1.51   0.15   0.14   0.98            (0.98,0.99)
t=4    1.41   0.10   0.10   1               (0.99,1)
t=5    1.34   0.08   0.08   1               (0.99,1)
t=6    1.29   0.07   0.07   0.99            (0.99,1)
t=7    1.25   0.06   0.06   0.99            (0.99,1)
t=8    1.22   0.06   0.05   0.99            (0.98,0.99)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.9.
Unconditional logit estimates for β with serial correlation

ρ=0.8, n=100
       Mean   SD      SE     r ≡ 1(P<0.05)   95-CI
t=2    8.70   17.11   3.05   0.67            (0.65,0.68)
t=3    6.53   25.88   1.67   0.91            (0.89,0.92)
t=4    3.34   0.90    0.75   0.98            (0.98,0.99)
t=5    2.72   0.52    0.50   1               (0.99,1)
t=6    2.36   0.39    0.36   1               (0.99,1)
t=7    2.13   0.31    0.29   1               (0.99,1)
t=8    1.96   0.25    0.24   1               (0.99,1)

ρ=0.8, n=200
       Mean   SD      SE     r ≡ 1(P<0.05)   95-CI
t=2    7.12   12.38   2.33   0.93            (0.92,0.94)
t=3    4.05   0.92    0.83   1               (1,1)
t=4    3.13   0.53    0.48   1               (1,1)
t=5    2.61   0.34    0.33   1               (1,1)
t=6    2.26   0.25    0.24   1               (1,1)
t=7    2.06   0.20    0.20   1               (1,1)
t=8    1.91   0.17    0.17   1               (1,1)

ρ=0.8, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    5.50   2.28   1.25   1               (0.99,1)
t=3    3.81   0.58   0.54   1               (1,1)
t=4    3.03   0.34   0.33   1               (1,1)
t=5    2.55   0.23   0.23   1               (1,1)
t=6    2.24   0.18   0.17   1               (1,1)
t=7    2.04   0.14   0.14   1               (1,1)
t=8    1.89   0.12   0.12   1               (1,1)

ρ=0.8, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    5.09   1.07   0.92   1               (1,1)
t=3    3.74   0.46   0.43   1               (1,1)
t=4    3.00   0.27   0.26   1               (1,1)
t=5    2.52   0.18   0.18   1               (1,1)
t=6    2.23   0.14   0.14   1               (1,1)
t=7    2.03   0.11   0.11   1               (1,1)
t=8    1.89   0.09   0.09   1               (1,1)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table C.10.
Conditional logit estimates for β with serial correlation

ρ=0.8, n=100
       Mean   SD      SE        r ≡ 1(P<0.05)   95-CI
t=2    5.52   29.96   6296147   0               (0,0)
t=3    4.03   19.14   380015    0.61            (0.59,0.63)
t=4    2.12   0.50    0.42      0.89            (0.87,0.90)
t=5    1.91   0.32    0.30      0.95            (0.94,0.97)
t=6    1.78   0.26    0.24      0.96            (0.95,0.97)
t=7    1.69   0.22    0.20      0.96            (0.95,0.96)
t=8    1.61   0.19    0.18      0.97            (0.96,0.97)

ρ=0.8, n=200
       Mean   SD      SE        r ≡ 1(P<0.05)   95-CI
t=2    3.69   11.54   1053342   0.27            (0.25,0.29)
t=3    2.29   0.48    0.44      0.98            (0.98,0.99)
t=4    2.01   0.29    0.27      1               (1,1)
t=5    1.84   0.21    0.20      1               (1,1)
t=6    1.72   0.17    0.16      1               (1,1)
t=7    1.64   0.14    0.14      1               (1,1)
t=8    1.57   0.13    0.12      1               (1,1)

ρ=0.8, n=400
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.75   1.14   0.68   0.98            (0.98,0.99)
t=3    2.17   0.30   0.28   1               (1,1)
t=4    1.95   0.19   0.18   1               (1,1)
t=5    1.81   0.14   0.14   1               (1,1)
t=6    1.70   0.12   0.11   1               (1,1)
t=7    1.63   0.10   0.10   1               (1,1)
t=8    1.56   0.09   0.08   1               (1,1)

ρ=0.8, n=600
       Mean   SD     SE     r ≡ 1(P<0.05)   95-CI
t=2    2.55   0.54   0.47   1               (1,1)
t=3    2.13   0.24   0.22   1               (1,1)
t=4    1.94   0.15   0.15   1               (1,1)
t=5    1.79   0.11   0.11   1               (1,1)
t=6    1.69   0.09   0.09   1               (1,1)
t=7    1.62   0.08   0.08   1               (1,1)
t=8    1.56   0.07   0.07   1               (1,1)

*Note: r is the rejection rate for the null of H0: β=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Appendix D

Monte Carlo simulation results: APEs for continuous covariate

Table D.1.
LPM, unconditional logit, CRE logit APE estimates for β

n=100, ρ=0
                    Mean    SD      SE      r ≡ 1(P<0.05)   95-CI
t=2  LPM            0.157   0.040   0.040   0.058           (0.047,0.068)
t=2  uncon logit    0.348   0.081   0.048   0.869           (0.854,0.884)
t=2  CRE logit      0.158   0.040   0.040   0.059           (0.048,0.068)
t=3  LPM            0.161   0.028   0.028   0.058           (0.048,0.068)
t=3  uncon logit    0.276   0.044   0.034   0.833           (0.816,0.849)
t=3  CRE logit      0.162   0.028   0.028   0.057           (0.056,0.068)
t=4  LPM            0.163   0.023   0.023   0.049           (0.040,0.058)
t=4  uncon logit    0.235   0.032   0.027   0.692           (0.672,0.712)
t=4  CRE logit      0.163   0.023   0.023   0.053           (0.043,0.063)
t=5  LPM            0.164   0.020   0.020   0.056           (0.046,0.066)
t=5  uncon logit    0.215   0.025   0.022   0.586           (0.048,0.068)
t=5  CRE logit      0.164   0.020   0.019   0.060           (0.050,0.071)
t=6  LPM            0.165   0.018   0.018   0.050           (0.040,0.059)
t=6  uncon logit    0.202   0.021   0.019   0.453           (0.432,0.475)
t=6  CRE logit      0.165   0.018   0.018   0.069           (0.040,0.058)
t=7  LPM            0.166   0.017   0.016   0.053           (0.041,0.060)
t=7  uncon logit    0.194   0.019   0.017   0.392           (0.370,0.413)
t=7  CRE logit      0.166   0.017   0.016   0.059           (0.049,0.069)
t=8  LPM            0.166   0.016   0.015   0.051           (0.041,0.060)
t=8  uncon logit    0.194   0.017   0.015   0.332           (0.311,0.352)
t=8  CRE logit      0.166   0.015   0.015   0.056           (0.046,0.066)

*Note: r is the rejection rate for the null of H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.162 (t = 2), 0.162 (t = 3), 0.165 (t = 4), 0.165 (t = 5), 0.167 (t = 6), 0.166 (t = 7), and 0.166 (t = 8).

Table D.2.
LPM, unconditional logit, CRE logit APE estimates for β

n=200, ρ=0
                    Mean    SD      SE      r ≡ 1(P<0.05)   95-CI
t=2  LPM            0.157   0.028   0.028   0.059           (0.049,0.069)
t=2  uncon logit    0.349   0.050   0.032   0.979           (0.973,0.985)
t=2  CRE logit      0.157   0.028   0.028   0.060           (0.049,0.070)
t=3  LPM            0.161   0.020   0.020   0.053           (0.043,0.063)
t=3  uncon logit    0.273   0.031   0.024   0.973           (0.966,0.980)
t=3  CRE logit      0.161   0.020   0.020   0.055           (0.045,0.065)
t=4  LPM            0.163   0.016   0.016   0.053           (0.043,0.062)
t=4  uncon logit    0.236   0.022   0.019   0.927           (0.915,0.938)
t=4  CRE logit      0.163   0.016   0.016   0.058           (0.048,0.068)
t=5  LPM            0.164   0.014   0.014   0.057           (0.047,0.067)
t=5  uncon logit    0.214   0.028   0.015   0.851           (0.834,0.866)
t=5  CRE logit      0.164   0.014   0.014   0.056           (0.045,0.066)
t=6  LPM            0.165   0.013   0.013   0.054           (0.044,0.063)
t=6  uncon logit    0.201   0.015   0.013   0.753           (0.734,0.772)
t=6  CRE logit      0.165   0.013   0.013   0.055           (0.045,0.065)
t=7  LPM            0.166   0.011   0.011   0.047           (0.037,0.056)
t=7  uncon logit    0.194   0.013   0.012   0.658           (0.637,0.678)
t=7  CRE logit      0.166   0.011   0.011   0.049           (0.039,0.058)
t=8  LPM            0.166   0.011   0.011   0.060           (0.049,0.070)
t=8  uncon logit    0.188   0.011   0.011   0.622           (0.601,0.643)
t=8  CRE logit      0.166   0.011   0.011   0.059           (0.049,0.069)

*Note: r is the rejection rate for the null of H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.163 (t = 2), 0.163 (t = 3), 0.164 (t = 4), 0.165 (t = 5), 0.165 (t = 6), 0.166 (t = 7), and 0.164 (t = 8).

Table D.3.
LPM, unconditional logit, CRE logit APE estimates for β

n=400, ρ=0
                    Mean    SD      SE      r ≡ 1(P<0.05)   95-CI
t=2  LPM            0.156   0.020   0.020   0.063           (0.051,0.073)
t=2  uncon logit    0.347   0.033   0.023   1               (1,1)
t=2  CRE logit      0.156   0.020   0.020   0.063           (0.052,0.073)
t=3  LPM            0.160   0.014   0.013   0.060           (0.049,0.070)
t=3  uncon logit    0.273   0.022   0.017   1               (1,1)
t=3  CRE logit      0.160   0.014   0.014   0.055           (0.045,0.065)
t=4  LPM            0.163   0.011   0.011   0.056           (0.046,0.066)
t=4  uncon logit    0.235   0.015   0.013   0.997           (0.994,0.999)
t=4  CRE logit      0.163   0.011   0.011   0.056           (0.046,0.066)
t=5  LPM            0.164   0.010   0.010   0.056           (0.046,0.067)
t=5  uncon logit    0.214   0.012   0.011   0.986           (0.980,0.991)
t=5  CRE logit      0.164   0.010   0.010   0.056           (0.046,0.067)
t=6  LPM            0.165   0.009   0.009   0.044           (0.035,0.060)
t=6  uncon logit    0.202   0.010   0.009   0.962           (0.953,0.970)
t=6  CRE logit      0.165   0.009   0.009   0.048           (0.039,0.062)
t=7  LPM            0.165   0.008   0.008   0.054           (0.044,0.064)
t=7  uncon logit    0.193   0.009   0.008   0.847           (0.831,0.865)
t=7  CRE logit      0.165   0.008   0.008   0.056           (0.046,0.066)
t=8  LPM            0.166   0.007   0.007   0.050           (0.040,0.060)
t=8  uncon logit    0.188   0.008   0.008   0.800           (0.780,0.820)
t=8  CRE logit      0.166   0.007   0.007   0.053           (0.043,0.063)

*Note: r is the rejection rate for the null of H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.161 (t = 2), 0.162 (t = 3), 0.166 (t = 4), 0.166 (t = 5), 0.165 (t = 6), 0.167 (t = 7), and 0.167 (t = 8).

Table D.4.
LPM, unconditional logit, CRE logit APE estimates for β

n=600, ρ=0
                    Mean    SD      SE      r ≡ 1(P<0.05)   95-CI
t=2  LPM            0.156   0.016   0.016   0.101           (0.082,0.119)
t=2  uncon logit    0.346   0.028   0.018   1               (1,1)
t=2  CRE logit      0.156   0.016   0.016   0.097           (0.078,0.115)
t=3  LPM            0.160   0.012   0.011   0.062           (0.047,0.076)
t=3  uncon logit    0.273   0.018   0.014   1               (1,1)
t=3  CRE logit      0.160   0.012   0.011   0.062           (0.047,0.076)
t=4  LPM            0.163   0.009   0.009   0.056           (0.041,0.072)
t=4  uncon logit    0.235   0.013   0.011   1               (1,1)
t=4  CRE logit      0.163   0.009   0.009   0.060           (0.045,0.076)
t=5  LPM            0.164   0.008   0.008   0.059           (0.044,0.073)
t=5  uncon logit    0.214   0.010   0.009   1               (1,1)
t=5  CRE logit      0.164   0.008   0.008   0.059           (0.045,0.073)
t=6  LPM            0.165   0.007   0.007   0.047           (0.034,0.060)
t=6  uncon logit    0.202   0.008   0.007   0.998           (0.995,1)
t=6  CRE logit      0.165   0.007   0.007   0.047           (0.034,0.060)
t=7  LPM            0.165   0.007   0.007   0.055           (0.041,0.069)
t=7  uncon logit    0.194   0.007   0.007   0.997           (0.967,0.986)
t=7  CRE logit      0.165   0.007   0.007   0.058           (0.044,0.073)
t=8  LPM            0.166   0.006   0.006   0.049           (0.036,0.062)
t=8  uncon logit    0.188   0.006   0.006   0.931           (0.915,0.947)
t=8  CRE logit      0.166   0.006   0.006   0.055           (0.040,0.069)

*Note: r is the rejection rate for the null of H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.165 (t = 2), 0.163 (t = 3), 0.167 (t = 4), 0.165 (t = 5), 0.164 (t = 6), 0.167 (t = 7), and 0.167 (t = 8).

Table D.5.
LPM, unconditional logit, CRE logit APE estimates for β

n=100, ρ=0.2
                    Mean    SD      SE      r ≡ 1(P<0.05)   95-CI
t=2  LPM            0.156   0.039   0.038   0.068           (0.057,0.078)
t=2  uncon logit    0.362   0.083   0.047   0.906           (0.893,0.918)
t=2  CRE logit      0.157   0.039   0.038   0.070           (0.060,0.081)
t=3  LPM            0.160   0.027   0.027   0.056           (0.045,0.066)
t=3  uncon logit    0.289   0.047   0.034   0.889           (0.875,0.903)
t=3  CRE logit      0.161   0.027   0.027   0.057           (0.046,0.067)
t=4  LPM            0.162   0.023   0.023   0.057           (0.046,0.067)
t=4  uncon logit    0.246   0.033   0.027   0.802           (0.785,0.820)
t=4  CRE logit      0.163   0.023   0.023   0.057           (0.047,0.067)
t=5  LPM            0.164   0.020   0.020   0.053           (0.043,0.064)
t=5  uncon logit    0.223   0.025   0.022   0.713           (0.693,0.733)
t=5  CRE logit      0.164   0.020   0.019   0.053           (0.043,0.063)
t=6  LPM            0.165   0.018   0.018   0.055           (0.045,0.065)
t=6  uncon logit    0.208   0.021   0.019   0.614           (0.477,0.521)
t=6  CRE logit      0.165   0.018   0.018   0.063           (0.052,0.074)
t=7  LPM            0.165   0.016   0.016   0.050           (0.040,0.060)
t=7  uncon logit    0.198   0.018   0.017   0.499           (0.477,0.521)
t=7  CRE logit      0.165   0.016   0.016   0.056           (0.046,0.067)
t=8  LPM            0.166   0.015   0.015   0.046           (0.036,0.055)
t=8  uncon logit    0.193   0.016   0.015   0.426           (0.404,0.448)
t=8  CRE logit      0.166   0.015   0.015   0.050           (0.040,0.059)

*Note: r is the rejection rate for the null of H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.163 (t = 2), 0.162 (t = 3), 0.164 (t = 4), 0.165 (t = 5), 0.165 (t = 6), 0.166 (t = 7), and 0.166 (t = 8).

Table D.6.
LPM, unconditional logit, CRE logit APE estimates for β (n=200, ρ=0.2)

t    Method       Mean   SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.157  0.027  0.028  0.059          (0.049,0.069)
t=2  uncon logit  0.363  0.050  0.032  0.979          (0.973,0.985)
t=2  CRE logit    0.157  0.028  0.028  0.060          (0.049,0.070)
t=3  LPM          0.161  0.019  0.019  0.049          (0.039,0.058)
t=3  uncon logit  0.289  0.031  0.024  0.995          (0.992,0.998)
t=3  CRE logit    0.161  0.019  0.019  0.054          (0.044,0.064)
t=4  LPM          0.163  0.016  0.016  0.051          (0.041,0.061)
t=4  uncon logit  0.246  0.022  0.019  0.978          (0.972,0.984)
t=4  CRE logit    0.163  0.016  0.016  0.049          (0.039,0.058)
t=5  LPM          0.164  0.014  0.014  0.049          (0.040,0.058)
t=5  uncon logit  0.222  0.017  0.015  0.946          (0.936,0.955)
t=5  CRE logit    0.164  0.014  0.014  0.050          (0.040,0.059)
t=6  LPM          0.165  0.013  0.013  0.050          (0.040,0.059)
t=6  uncon logit  0.208  0.014  0.013  0.858          (0.842,0.872)
t=6  CRE logit    0.165  0.013  0.012  0.055          (0.045,0.065)
t=7  LPM          0.166  0.011  0.011  0.051          (0.041,0.060)
t=7  uncon logit  0.199  0.013  0.012  0.776          (0.758,0.794)
t=7  CRE logit    0.166  0.011  0.011  0.056          (0.046,0.066)
t=8  LPM          0.166  0.011  0.011  0.054          (0.044,0.064)
t=8  uncon logit  0.192  0.012  0.011  0.664          (0.643,0.685)
t=8  CRE logit    0.166  0.011  0.011  0.056          (0.046,0.066)

*Note: r is the rejection rate for the null H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.163 (t=2), 0.162 (t=3), 0.164 (t=4), 0.165 (t=5), 0.166 (t=6), 0.166 (t=7), and 0.164 (t=8).

Table D.7.
LPM, unconditional logit, CRE logit APE estimates for β (n=400, ρ=0.2)

t    Method       Mean   SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.156  0.019  0.019  0.056          (0.046,0.066)
t=2  uncon logit  0.364  0.032  0.020  1              (1,1)
t=2  CRE logit    0.156  0.019  0.019  0.056          (0.046,0.066)
t=3  LPM          0.160  0.014  0.014  0.053          (0.043,0.063)
t=3  uncon logit  0.288  0.022  0.017  1              (1,1)
t=3  CRE logit    0.160  0.014  0.013  0.052          (0.042,0.062)
t=4  LPM          0.163  0.011  0.011  0.055          (0.045,0.065)
t=4  uncon logit  0.246  0.015  0.013  1              (1,1)
t=4  CRE logit    0.163  0.011  0.011  0.058          (0.048,0.068)
t=5  LPM          0.164  0.010  0.010  0.052          (0.041,0.063)
t=5  uncon logit  0.222  0.012  0.011  0.998          (0.996,1)
t=5  CRE logit    0.164  0.010  0.010  0.056          (0.046,0.067)
t=6  LPM          0.165  0.010  0.009  0.052          (0.038,0.066)
t=6  uncon logit  0.208  0.010  0.009  0.990          (0.984,0.996)
t=6  CRE logit    0.165  0.009  0.009  0.054          (0.039,0.068)
t=7  LPM          0.165  0.008  0.008  0.050          (0.040,0.060)
t=7  uncon logit  0.198  0.009  0.008  0.972          (0.964,0.979)
t=7  CRE logit    0.165  0.008  0.008  0.052          (0.042,0.062)
t=8  LPM          0.166  0.008  0.008  0.058          (0.044,0.072)
t=8  uncon logit  0.192  0.008  0.008  0.923          (0.907,0.939)
t=8  CRE logit    0.166  0.008  0.007  0.056          (0.042,0.070)

*Note: r is the rejection rate for the null H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.162 (t=2), 0.162 (t=3), 0.166 (t=4), 0.165 (t=5), 0.165 (t=6), 0.165 (t=7), and 0.166 (t=8).

Table D.8.
LPM, unconditional logit, CRE logit APE estimates for β (n=600, ρ=0.2)

t    Method       Mean   SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.155  0.015  0.016  0.062          (0.047,0.077)
t=2  uncon logit  0.363  0.028  0.017  1              (1,1)
t=2  CRE logit    0.155  0.015  0.015  0.060          (0.045,0.075)
t=3  LPM          0.160  0.011  0.011  0.052          (0.039,0.066)
t=3  uncon logit  0.288  0.018  0.014  1              (1,1)
t=3  CRE logit    0.160  0.011  0.011  0.052          (0.039,0.066)
t=4  LPM          0.163  0.009  0.009  0.060          (0.045,0.075)
t=4  uncon logit  0.246  0.013  0.011  1              (1,1)
t=4  CRE logit    0.163  0.009  0.009  0.060          (0.045,0.075)
t=5  LPM          0.164  0.008  0.008  0.046          (0.033,0.059)
t=5  uncon logit  0.222  0.010  0.009  1              (1,1)
t=5  CRE logit    0.164  0.008  0.008  0.050          (0.036,0.064)
t=6  LPM          0.165  0.007  0.007  0.046          (0.033,0.059)
t=6  uncon logit  0.208  0.008  0.008  1              (1,1)
t=6  CRE logit    0.165  0.007  0.007  0.049          (0.036,0.062)
t=7  LPM          0.165  0.007  0.007  0.057          (0.042,0.071)
t=7  uncon logit  0.198  0.008  0.007  0.990          (0.984,1)
t=7  CRE logit    0.165  0.007  0.007  0.057          (0.042,0.071)
t=8  LPM          0.166  0.006  0.006  0.045          (0.032,0.058)
t=8  uncon logit  0.192  0.006  0.006  0.975          (0.965,0.985)
t=8  CRE logit    0.166  0.006  0.006  0.045          (0.032,0.058)

*Note: r is the rejection rate for the null H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.165 (t=2), 0.163 (t=3), 0.165 (t=4), 0.164 (t=5), 0.164 (t=6), 0.167 (t=7), and 0.167 (t=8).

Table D.9.
LPM, unconditional logit, CRE logit APE estimates for β (n=100, ρ=0.4)

t    Method       Mean   SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.155  0.037  0.037  0.074          (0.063,0.086)
t=2  uncon logit  0.379  0.094  0.047  0.936          (0.925,0.946)
t=2  CRE logit    0.155  0.037  0.036  0.078          (0.066,0.089)
t=3  LPM          0.160  0.027  0.026  0.057          (0.047,0.068)
t=3  uncon logit  0.310  0.047  0.033  0.955          (0.946,0.964)
t=3  CRE logit    0.161  0.027  0.026  0.058          (0.048,0.068)
t=4  LPM          0.163  0.022  0.022  0.058          (0.048,0.068)
t=4  uncon logit  0.262  0.034  0.027  0.910          (0.897,0.923)
t=4  CRE logit    0.163  0.022  0.022  0.059          (0.049,0.070)
t=5  LPM          0.164  0.019  0.019  0.055          (0.045,0.065)
t=5  uncon logit  0.235  0.025  0.022  0.859          (0.844,0.874)
t=5  CRE logit    0.164  0.019  0.019  0.059          (0.049,0.069)
t=6  LPM          0.165  0.018  0.018  0.053          (0.043,0.063)
t=6  uncon logit  0.218  0.021  0.019  0.762          (0.743,0.781)
t=6  CRE logit    0.165  0.018  0.018  0.058          (0.049,0.069)
t=7  LPM          0.165  0.016  0.016  0.045          (0.036,0.054)
t=7  uncon logit  0.206  0.018  0.017  0.669          (0.648,0.690)
t=7  CRE logit    0.165  0.016  0.016  0.050          (0.041,0.060)
t=8  LPM          0.166  0.015  0.015  0.055          (0.045,0.065)
t=8  uncon logit  0.198  0.017  0.016  0.561          (0.539,0.583)
t=8  CRE logit    0.166  0.015  0.015  0.057          (0.047,0.067)

*Note: r is the rejection rate for the null H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.161 (t=2), 0.162 (t=3), 0.164 (t=4), 0.165 (t=5), 0.166 (t=6), 0.165 (t=7), and 0.167 (t=8).

Table D.10.
LPM, unconditional logit, CRE logit APE estimates for β (n=200, ρ=0.4)

t    Method       Mean   SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.157  0.026  0.026  0.057          (0.046,0.067)
t=2  uncon logit  0.378  0.048  0.027  0.998          (0.996,1)
t=2  CRE logit    0.157  0.026  0.025  0.059          (0.048,0.069)
t=3  LPM          0.160  0.018  0.019  0.048          (0.039,0.057)
t=3  uncon logit  0.308  0.031  0.023  0.995          (0.996,1)
t=3  CRE logit    0.160  0.018  0.018  0.053          (0.043,0.063)
t=4  LPM          0.162  0.016  0.016  0.060          (0.049,0.070)
t=4  uncon logit  0.262  0.023  0.019  0.998          (0.982,0.995)
t=4  CRE logit    0.162  0.016  0.016  0.057          (0.047,0.067)
t=5  LPM          0.164  0.014  0.014  0.055          (0.040,0.070)
t=5  uncon logit  0.235  0.018  0.016  0.989          (0.982,0.995)
t=5  CRE logit    0.164  0.014  0.013  0.051          (0.047,0.076)
t=6  LPM          0.165  0.012  0.012  0.058          (0.047,0.068)
t=6  uncon logit  0.218  0.015  0.013  0.953          (0.943,0.962)
t=6  CRE logit    0.165  0.012  0.012  0.056          (0.046,0.076)
t=7  LPM          0.165  0.011  0.011  0.053          (0.043,0.063)
t=7  uncon logit  0.206  0.013  0.012  0.906          (0.893,0.919)
t=7  CRE logit    0.165  0.011  0.011  0.051          (0.041,0.060)
t=8  LPM          0.166  0.011  0.011  0.055          (0.045,0.065)
t=8  uncon logit  0.198  0.012  0.011  0.834          (0.818,0.851)
t=8  CRE logit    0.166  0.011  0.011  0.059          (0.049,0.070)

*Note: r is the rejection rate for the null H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.163 (t=2), 0.161 (t=3), 0.165 (t=4), 0.163 (t=5), 0.166 (t=6), 0.165 (t=7), and 0.164 (t=8).

Table D.11.
LPM, unconditional logit, CRE logit APE estimates for β (n=400, ρ=0.4)

t    Method       Mean   SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.156  0.019  0.018  0.068          (0.056,0.079)
t=2  uncon logit  0.379  0.032  0.018  1              (1,1)
t=2  CRE logit    0.156  0.019  0.018  0.066          (0.056,0.076)
t=3  LPM          0.160  0.013  0.013  0.050          (0.040,0.060)
t=3  uncon logit  0.307  0.022  0.016  1              (1,1)
t=3  CRE logit    0.160  0.013  0.013  0.054          (0.044,0.064)
t=4  LPM          0.163  0.011  0.011  0.046          (0.037,0.055)
t=4  uncon logit  0.262  0.016  0.013  1              (1,1)
t=4  CRE logit    0.163  0.011  0.011  0.048          (0.038,0.057)
t=5  LPM          0.164  0.010  0.010  0.054          (0.044,0.064)
t=5  uncon logit  0.234  0.012  0.011  1              (1,1)
t=5  CRE logit    0.164  0.010  0.010  0.056          (0.046,0.067)
t=6  LPM          0.165  0.009  0.009  0.048          (0.035,0.060)
t=6  uncon logit  0.217  0.010  0.010  1              (1,1)
t=6  CRE logit    0.165  0.009  0.009  0.048          (0.035,0.060)
t=7  LPM          0.165  0.008  0.008  0.050          (0.040,0.060)
t=7  uncon logit  0.206  0.009  0.008  1              (1,1)
t=7  CRE logit    0.165  0.008  0.008  0.051          (0.041,0.061)
t=8  LPM          0.166  0.008  0.008  0.056          (0.046,0.076)
t=8  uncon logit  0.199  0.008  0.008  0.982          (0.972,0.991)
t=8  CRE logit    0.166  0.008  0.007  0.055          (0.045,0.075)

*Note: r is the rejection rate for the null H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.162 (t=2), 0.162 (t=3), 0.166 (t=4), 0.165 (t=5), 0.166 (t=6), 0.166 (t=7), and 0.165 (t=8).

Table D.12.
LPM, unconditional logit, CRE logit APE estimates for β (n=600, ρ=0.4)

t    Method       Mean   SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.155  0.015  0.015  0.071          (0.055,0.086)
t=2  uncon logit  0.378  0.026  0.014  1              (1,1)
t=2  CRE logit    0.155  0.015  0.015  0.074          (0.057,0.090)
t=3  LPM          0.160  0.011  0.011  0.052          (0.039,0.066)
t=3  uncon logit  0.309  0.017  0.013  1              (1,1)
t=3  CRE logit    0.160  0.011  0.011  0.054          (0.041,0.068)
t=4  LPM          0.163  0.009  0.009  0.045          (0.032,0.057)
t=4  uncon logit  0.262  0.012  0.011  1              (1,1)
t=4  CRE logit    0.163  0.009  0.009  0.048          (0.035,0.061)
t=5  LPM          0.164  0.008  0.008  0.045          (0.032,0.057)
t=5  uncon logit  0.234  0.010  0.009  1              (1,1)
t=5  CRE logit    0.164  0.008  0.008  0.048          (0.035,0.061)
t=6  LPM          0.165  0.007  0.007  0.046          (0.033,0.059)
t=6  uncon logit  0.217  0.008  0.008  1              (1,1)
t=6  CRE logit    0.165  0.007  0.007  0.043          (0.030,0.056)
t=7  LPM          0.165  0.007  0.007  0.054          (0.040,0.068)
t=7  uncon logit  0.206  0.007  0.007  1              (1,1)
t=7  CRE logit    0.165  0.007  0.007  0.057          (0.042,0.071)
t=8  LPM          0.166  0.006  0.006  0.044          (0.031,0.057)
t=8  uncon logit  0.199  0.006  0.006  1              (1,1)
t=8  CRE logit    0.166  0.006  0.006  0.043          (0.031,0.055)

*Note: r is the rejection rate for the null H0: β=1. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.165 (t=2), 0.162 (t=3), 0.163 (t=4), 0.165 (t=5), 0.164 (t=6), 0.166 (t=7), and 0.166 (t=8).

Appendix E

Monte Carlo simulation results: Coefficient δ for binary covariate

Table E.1.
Unconditional logit for δ with low serial correlation for n = 100

ρ=0
t    Mean    SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  2.57    3.04   1.22   0.16           (0.13,0.18)
t=3  1.65    0.63   0.62   0.13           (0.11,0.15)
t=4  1.41    0.43   0.42   0.14           (0.12,0.16)
t=5  1.29    0.33   0.34   0.12           (0.10,0.14)
t=6  1.26    0.29   0.29   0.14           (0.12,0.16)
t=7  1.20    0.25   0.25   0.11           (0.09,0.13)
t=8  1.18    0.23   0.23   0.11           (0.00,0.13)

ρ=0.2
t    Mean    SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  3.05    3.80   7.93   0.18           (0.16,0.21)
t=3  1.83    0.69   0.66   0.18           (0.16,0.21)
t=4  1.51    0.45   0.45   0.17           (0.15,0.20)
t=5  1.37    0.35   0.35   0.15           (0.13,0.17)
t=6  1.32    0.30   0.30   0.17           (0.15,0.19)
t=7  1.26    0.26   0.36   0.16           (0.13,0.18)
t=8  1.22    0.24   0.24   0.14           (0.12,0.16)

ρ=0.4
t    Mean    SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  4.69    30.20  1.73   0.21           (0.18,0.24)
t=3  2.12    0.77   0.75   0.26           (0.24,0.29)
t=4  1.69    0.50   0.48   0.27           (0.24,0.30)
t=5  1.50    0.37   0.37   0.24           (0.22,0.26)
t=6  1.41    0.32   0.32   0.25           (0.23,0.28)
t=7  1.35    0.28   0.28   0.23           (0.20,0.25)
t=8  1.29    0.25   0.25   0.20           (0.18,0.23)

*Note: r is the rejection rate for the null H0: δ=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table E.2. Unconditional logit for δ with high serial correlation for n = 100

ρ=0.6
t    Mean    SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  7.43    35.86  6.24   0.27           (0.24,0.30)
t=3  2.69    1.52   0.92   0.40           (0.37,0.43)
t=4  2.04    0.57   0.56   0.45           (0.42,0.48)
t=5  1.76    0.43   0.42   0.42           (0.39,0.45)
t=6  1.62    0.35   0.34   0.43           (0.40,0.46)
t=7  1.52    0.30   0.30   0.41           (0.38,0.44)
t=8  1.43    0.26   0.26   0.37           (0.34,0.40)

ρ=0.8
t    Mean    SD     SE     r ≡ 1(P<0.05)  95-CI
t=2  146.09  46.19  20.32  0.34           (0.31,0.38)
t=3  6.15    23.83  1.77   0.63           (0.60,0.66)
t=4  3.03    1.37   0.81   0.78           (0.75,0.80)
t=5  2.45    0.58   0.56   0.80           (0.77,0.82)
t=6  2.18    0.47   0.44   0.80           (0.77,0.83)
t=7  2.00    0.38   0.37   0.79           (0.76,0.81)
t=8  1.86    0.33   0.32   0.77           (0.74,0.80)

*Note: r is the rejection rate for the null H0: δ=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table E.3.
Conditional logit for δ with low serial correlation for n = 100

ρ=0
t    Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  1.29   1.55   19.40    0.03           (0.01,0.04)
t=3  1.06   0.39   0.38     0.05           (0.03,0.06)
t=4  1.04   0.31   0.30     0.06           (0.04,0.07)
t=5  1.02   0.26   0.26     0.04           (0.03,0.05)
t=6  1.04   0.23   0.23     0.05           (0.04,0.07)
t=7  1.02   0.21   0.21     0.05           (0.03,0.06)
t=8  1.02   0.19   0.19     0.04           (0.03,0.06)

ρ=0.2
t    Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  1.53   1.94   38.55    0.02           (0.01,0.03)
t=3  1.17   0.43   0.18     0.06           (0.04,0.07)
t=4  1.10   0.32   0.14     0.05           (0.04,0.06)
t=5  1.08   0.27   0.27     0.05           (0.03,0.06)
t=6  1.07   0.24   0.24     0.06           (0.04,0.07)
t=7  1.06   0.22   0.22     0.05           (0.04,0.06)
t=8  1.06   0.21   0.20     0.05           (0.03,0.06)

ρ=0.4
t    Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  4.32   75.84  68.47    0.02           (0.01,0.03)
t=3  1.34   0.46   0.45     0.07           (0.05,0.08)
t=4  1.23   0.35   0.34     0.08           (0.07,0.10)
t=5  1.17   0.28   0.29     0.08           (0.06,0.10)
t=6  1.16   0.25   0.25     0.09           (0.07,0.10)
t=7  1.14   0.23   0.23     0.08           (0.07,0.10)
t=8  1.12   0.21   0.21     0.08           (0.06,0.10)

*Note: r is the rejection rate for the null H0: δ=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Table E.4. Conditional logit for δ with high serial correlation for n = 100

ρ=0.6
t    Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  5.23   45.89  326.60   0.02           (0.01,0.02)
t=3  1.67   0.82   3.24     0.17           (0.14,0.19)
t=4  1.47   0.39   0.38     0.19           (0.17,0.22)
t=5  1.37   0.32   0.32     0.19           (0.17,0.22)
t=6  1.33   0.28   0.27     0.20           (0.17,0.23)
t=7  1.28   0.25   0.25     0.19           (0.17,0.21)
t=8  1.24   0.23   0.23     0.17           (0.14,0.19)

ρ=0.8
t    Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  253.9  80.97  1255596  0.003          (0.00,0.006)
t=3  3.48   11.39  155.10   0.38           (0.35,0.41)
t=4  2.11   0.78   2.58     0.56           (0.53,0.59)
t=5  1.88   0.41   0.40     0.59           (0.56,0.62)
t=6  1.76   0.36   0.34     0.61           (0.58,0.64)
t=7  1.68   0.31   0.30     0.63           (0.60,0.66)
t=8  1.60   0.28   0.27     0.62           (0.59,0.65)

*Note: r is the rejection rate for the null H0: δ=1 with nominal value 0.05. ρ is the AR(1) serial correlation coefficient for the errors.

Appendix F

Monte Carlo simulation results: APEs for binary covariate

Table F.1.
LPM, unconditional logit, CRE logit APE estimates for δ (n=100, ρ=0)

t    Method       Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.184  0.089  0.084    0.063          (0.048,0.086)
t=2  uncon logit  0.454  0.373  1834.10  0.564          (0.534,0.595)
t=2  CRE logit    0.174  0.084  0.036    0.065          (0.049,0.080)
t=3  LPM          0.176  0.059  0.059    0.050          (0.036,0.063)
t=3  uncon logit  0.311  0.100  0.078    0.483          (0.453,0.513)
t=3  CRE logit    0.168  0.055  0.055    0.052          (0.038,0.065)
t=4  LPM          0.174  0.049  0.049    0.061          (0.046,0.076)
t=4  uncon logit  0.260  0.069  0.059    0.391          (0.361,0.421)
t=4  CRE logit    0.168  0.046  0.046    0.053          (0.038,0.068)
t=5  LPM          0.167  0.041  0.041    0.056          (0.042,0.070)
t=5  uncon logit  0.231  0.053  0.048    0.331          (0.302,0.360)
t=5  CRE logit    0.162  0.039  0.039    0.056          (0.042,0.070)
t=6  LPM          0.170  0.038  0.037    0.051          (0.037,0.065)
t=6  uncon logit  0.221  0.046  0.041    0.306          (0.277,0.335)
t=6  CRE logit    0.166  0.036  0.035    0.052          (0.038,0.066)
t=7  LPM          0.163  0.033  0.034    0.043          (0.030,0.056)
t=7  uncon logit  0.205  0.039  0.037    0.253          (0.226,0.280)
t=7  CRE logit    0.160  0.031  0.031    0.048          (0.035,0.061)
t=8  LPM          0.164  0.031  0.031    0.051          (0.037,0.065)
t=8  uncon logit  0.198  0.035  0.033    0.246          (0.219,0.273)
t=8  CRE logit    0.160  0.030  0.030    0.058          (0.043,0.073)

*Note: r is the rejection rate for the null H0: APE for δ = true APE. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.172 (t=2), 0.166 (t=3), 0.168 (t=4), 0.164 (t=5), 0.166 (t=6), 0.162 (t=7), and 0.160 (t=8).

Table F.2.
LPM, unconditional logit, CRE logit APE estimates for δ (n=100, ρ=0.2)

t    Method       Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.184  0.081  0.080    0.068          (0.052,0.084)
t=2  uncon logit  0.501  0.397  13231    0.616          (0.586,0.647)
t=2  CRE logit    0.174  0.077  0.075    0.068          (0.052,0.084)
t=3  LPM          0.176  0.057  0.057    0.050          (0.036,0.064)
t=3  uncon logit  0.335  0.104  0.080    0.547          (0.516,0.578)
t=3  CRE logit    0.168  0.054  0.053    0.050          (0.036,0.064)
t=4  LPM          0.172  0.047  0.047    0.051          (0.038,0.065)
t=4  uncon logit  0.273  0.069  0.060    0.439          (0.408,0.470)
t=4  CRE logit    0.168  0.044  0.044    0.058          (0.043,0.073)
t=5  LPM          0.167  0.040  0.041    0.050          (0.037,0.064)
t=5  uncon logit  0.243  0.053  0.049    0.280          (0.350,0.410)
t=5  CRE logit    0.163  0.038  0.038    0.056          (0.043,0.070)
t=6  LPM          0.170  0.037  0.037    0.056          (0.037,0.070)
t=6  uncon logit  0.228  0.046  0.042    0.360          (0.330,0.390)
t=6  CRE logit    0.166  0.035  0.035    0.051          (0.037,0.065)
t=7  LPM          0.164  0.032  0.033    0.046          (0.033,0.059)
t=7  uncon logit  0.213  0.040  0.037    0.298          (0.270,0.326)
t=7  CRE logit    0.160  0.031  0.031    0.051          (0.037,0.065)
t=8  LPM          0.164  0.031  0.031    0.049          (0.036,0.062)
t=8  uncon logit  0.204  0.036  0.033    0.294          (0.266,0.322)
t=8  CRE logit    0.160  0.029  0.029    0.051          (0.037,0.065)

*Note: r is the rejection rate for the null H0: APE for δ = true APE. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.172 (t=2), 0.166 (t=3), 0.168 (t=4), 0.164 (t=5), 0.166 (t=6), 0.161 (t=7), and 0.159 (t=8).

Table F.3.
LPM, unconditional logit, CRE logit APE estimates for δ (n=100, ρ=0.4)

t    Method       Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.182  0.076  0.075    0.062          (0.047,0.077)
t=2  uncon logit  0.566  0.528  17430    0.677          (0.647,0.706)
t=2  CRE logit    0.173  0.072  0.070    0.062          (0.047,0.077)
t=3  LPM          0.176  0.053  0.054    0.050          (0.039,0.061)
t=3  uncon logit  0.372  0.105  0.082    0.652          (0.627,0.677)
t=3  CRE logit    0.169  0.050  0.050    0.049          (0.038,0.060)
t=4  LPM          0.173  0.045  0.045    0.058          (0.043,0.073)
t=4  uncon logit  0.298  0.071  0.061    0.555          (0.524,0.586)
t=4  CRE logit    0.167  0.042  0.042    0.058          (0.043,0.073)
t=5  LPM          0.167  0.038  0.039    0.044          (0.031,0.057)
t=5  uncon logit  0.261  0.054  0.050    0.477          (0.446,0.508)
t=5  CRE logit    0.162  0.036  0.037    0.058          (0.034,0.061)
t=6  LPM          0.169  0.036  0.036    0.059          (0.044,0.074)
t=6  uncon logit  0.242  0.046  0.042    0.436          (0.405,0.467)
t=6  CRE logit    0.165  0.034  0.034    0.059          (0.044,0.074)
t=7  LPM          0.164  0.032  0.032    0.046          (0.033,0.059)
t=7  uncon logit  0.225  0.041  0.038    0.417          (0.386,0.448)
t=7  CRE logit    0.161  0.031  0.031    0.050          (0.036,0.064)
t=8  LPM          0.163  0.031  0.030    0.051          (0.037,0.065)
t=8  uncon logit  0.213  0.036  0.034    0.379          (0.349,0.409)
t=8  CRE logit    0.159  0.029  0.029    0.051          (0.037,0.065)

*Note: r is the rejection rate for the null H0: APE for δ = true APE. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.172 (t=2), 0.166 (t=3), 0.168 (t=4), 0.164 (t=5), 0.166 (t=6), 0.162 (t=7), and 0.158 (t=8).

Table F.4.
LPM, unconditional logit, CRE logit APE estimates for δ (n=100, ρ=0.6)

t    Method       Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.182  0.069  0.069    0.058          (0.043,0.073)
t=2  uncon logit  0.695  0.729  95297    0.737          (0.708,0.765)
t=2  CRE logit    0.172  0.065  0.064    0.066          (0.050,0.082)
t=3  LPM          0.176  0.050  0.050    0.049          (0.036,0.062)
t=3  uncon logit  0.432  0.163  0.781    0.785          (0.759,0.810)
t=3  CRE logit    0.169  0.047  0.047    0.052          (0.038,0.066)
t=4  LPM          0.173  0.041  0.042    0.050          (0.037,0.064)
t=4  uncon logit  0.341  0.072  0.063    0.737          (0.710,0.765)
t=4  CRE logit    0.168  0.022  0.022    0.053          (0.039,0.067)
t=5  LPM          0.167  0.037  0.037    0.056          (0.042,0.070)
t=5  uncon logit  0.294  0.058  0.052    0.673          (0.644,0.702)
t=5  CRE logit    0.163  0.035  0.035    0.054          (0.040,0.068)
t=6  LPM          0.169  0.034  0.034    0.052          (0.038,0.065)
t=6  uncon logit  0.268  0.048  0.044    0.633          (0.603,0.663)
t=6  CRE logit    0.164  0.032  0.032    0.056          (0.042,0.070)
t=7  LPM          0.164  0.031  0.031    0.054          (0.040,0.068)
t=7  uncon logit  0.247  0.042  0.039    0.591          (0.560,0.621)
t=7  CRE logit    0.161  0.029  0.029    0.054          (0.040,0.068)
t=8  LPM          0.162  0.029  0.029    0.054          (0.040,0.068)
t=8  uncon logit  0.232  0.037  0.035    0.568          (0.536,0.599)
t=8  CRE logit    0.158  0.028  0.028    0.056          (0.041,0.070)

*Note: r is the rejection rate for the null H0: APE for δ = true APE. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.172 (t=2), 0.166 (t=3), 0.168 (t=4), 0.164 (t=5), 0.166 (t=6), 0.162 (t=7), and 0.158 (t=8).

Table F.5.
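LPM, unconditional logit, CRE logit APE estimates for δ (n=100, ρ=0.8)

t    Method       Mean   SD     SE       r ≡ 1(P<0.05)  95-CI
t=2  LPM          0.174  0.061  0.061    0.060          (0.043,0.077)
t=2  uncon logit  0.892  1.112  173194   0.739          (0.708,0.770)
t=2  CRE logit    0.165  0.058  0.058    0.055          (0.038,0.073)
t=3  LPM          0.176  0.045  0.045    0.045          (0.032,0.058)
t=3  uncon logit  0.638  0.600  40290    0.902          (0.883,0.920)
t=3  CRE logit    0.168  0.042  0.042    0.050          (0.036,0.064)
t=4  LPM          0.175  0.037  0.039    0.052          (0.038,0.066)
t=4  uncon logit  0.434  0.127  0.329    0.946          (0.933,0.961)
t=4  CRE logit    0.168  0.032  0.032    0.046          (0.033,0.059)
t=5  LPM          0.168  0.034  0.034    0.044          (0.032,0.057)
t=5  uncon logit  0.366  0.061  0.054    0.934          (0.919,0.950)
t=5  CRE logit    0.163  0.032  0.032    0.044          (0.032,0.057)
t=6  LPM          0.169  0.032  0.032    0.051          (0.037,0.067)
t=6  uncon logit  0.329  0.052  0.046    0.911          (0.893,0.929)
t=6  CRE logit    0.165  0.031  0.030    0.054          (0.039,0.069)
t=7  LPM          0.164  0.029  0.029    0.047          (0.034,0.060)
t=7  uncon logit  0.301  0.044  0.041    0.906          (0.888,0.924)
t=7  CRE logit    0.161  0.028  0.028    0.052          (0.038,0.065)
t=8  LPM          0.162  0.028  0.028    0.050          (0.037,0.064)
t=8  uncon logit  0.279  0.040  0.037    0.867          (0.845,0.888)
t=8  CRE logit    0.159  0.026  0.026    0.048          (0.035,0.061)

*Note: r is the rejection rate for the null H0: APE for δ = true APE. ρ is the AR(1) serial correlation coefficient for the errors. True APE is 0.171 (t=2), 0.166 (t=3), 0.168 (t=4), 0.164 (t=5), 0.166 (t=6), 0.162 (t=7), and 0.159 (t=8).

Appendix G

Proof of Hausman-type test in chapter 2

We start by rewriting the stacked estimator to obtain its asymptotic variance.

\[
\hat{\theta}_a =
\begin{pmatrix}
\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' x_{it} & 0 \\
0 & \sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{x}_{it}
\end{pmatrix}^{-1}
\begin{pmatrix}
\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' y_{it} \\
\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{y}_{it}
\end{pmatrix},
\]

where the first matrix is $2k \times 2k$ and the second is $2k \times 1$. Then

\[
\sqrt{n}(\hat{\theta}_a - \theta_a) =
\begin{pmatrix}
\frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' x_{it} & 0 \\
0 & \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{x}_{it}
\end{pmatrix}^{-1}
\begin{pmatrix}
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' u_{it} \\
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{u}_{it}
\end{pmatrix},
\]

that is,

\[
\sqrt{n}(\hat{\theta}_a - \theta_a) =
\begin{pmatrix}
\sqrt{n}(\hat{\theta}_{pls} - \theta) \\
\sqrt{n}(\hat{\theta}_{FE} - \theta)
\end{pmatrix}
=
\begin{pmatrix}
\left(\frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' x_{it}\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' u_{it}\right) \\
\left(\frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{x}_{it}\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{u}_{it}\right)
\end{pmatrix}.
\]

$\hat{\theta}_a$ is a consistent estimator for $\theta_a$ by the rank conditions and exogeneity conditions for the pooled LS and FE estimators:

\[
\left(\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' x_{it}\right)^{-1} \equiv A_{11}^{-1} = \left[E\left(\sum_{t=1}^{T} s_{it} x_{it}' x_{it}\right)\right]^{-1},
\qquad
\left(\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{x}_{it}\right)^{-1} \equiv A_{22}^{-1} = \left[E\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{x}_{it}\right)\right]^{-1},
\]

\[
\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' u_{it} = E\left(\sum_{t=1}^{T} s_{it} x_{it}' u_{it}\right) = 0,
\qquad
\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{u}_{it} = E\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{u}_{it}\right) = 0.
\]

Consider an estimator for the variance of $\sqrt{n}(\hat{\theta}_a - \theta)$. We use the following notation and obtain the asymptotic variance estimator in (G.1):

\[
\hat{G}_a^{-1} = \begin{pmatrix} \hat{A}_{11} & 0 \\ 0 & \hat{A}_{22} \end{pmatrix}^{-1},
\qquad
\hat{D}_a = \begin{pmatrix} \hat{D}_{11} & \hat{D}_{12} \\ \hat{D}_{21} & \hat{D}_{22} \end{pmatrix},
\]

\[
\hat{A}_{11} = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} x_{it}' x_{it},
\qquad
\hat{A}_{22} = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{x}_{it},
\]

\[
\hat{D}_{11} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\sum_{t=1}^{T} s_{it} x_{it}' \hat{u}_{it}\right)\left(\sum_{t=1}^{T} s_{it} x_{it}' \hat{u}_{it}\right)'\right],
\qquad
\hat{D}_{12} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\sum_{t=1}^{T} s_{it} x_{it}' \hat{u}_{it}\right)\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \hat{\ddot{u}}_{it}\right)'\right],
\]

\[
\hat{D}_{21} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \hat{\ddot{u}}_{it}\right)\left(\sum_{t=1}^{T} s_{it} x_{it}' \hat{u}_{it}\right)'\right],
\qquad
\hat{D}_{22} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \hat{\ddot{u}}_{it}\right)\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \hat{\ddot{u}}_{it}\right)'\right],
\]

where $\hat{\ddot{u}}_{it}$ is the residual from FE estimation and $\hat{u}_{it}$ is the residual from pooled LS estimation.

\[
\widehat{\operatorname{var}}\left(\sqrt{n}(\hat{\theta}_a - \theta_a)\right) =
\begin{pmatrix} \hat{A}_{11} & 0 \\ 0 & \hat{A}_{22} \end{pmatrix}^{-1}
\begin{pmatrix} \hat{D}_{11} & \hat{D}_{12} \\ \hat{D}_{21} & \hat{D}_{22} \end{pmatrix}
\begin{pmatrix} \hat{A}_{11} & 0 \\ 0 & \hat{A}_{22} \end{pmatrix}^{-1}
\tag{G.1}
\]

Each block of $\hat{D}_a$ converges in probability to its population counterpart:

\[
\hat{D}_{11} \xrightarrow{p} E\left[\left(\sum_{t=1}^{T} s_{it} x_{it}' u_{it}\right)\left(\sum_{t=1}^{T} s_{it} x_{it}' u_{it}\right)'\right] = D_{11},
\qquad
\hat{D}_{12} \xrightarrow{p} E\left[\left(\sum_{t=1}^{T} s_{it} x_{it}' u_{it}\right)\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{u}_{it}\right)'\right] = D_{12},
\]

\[
\hat{D}_{21} \xrightarrow{p} E\left[\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{u}_{it}\right)\left(\sum_{t=1}^{T} s_{it} x_{it}' u_{it}\right)'\right] = D_{21},
\qquad
\hat{D}_{22} \xrightarrow{p} E\left[\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{u}_{it}\right)\left(\sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{u}_{it}\right)'\right] = D_{22}.
\]

We obtain (G.4) from (G.2) and (G.3).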
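\[
\hat{A}_{11}^{-1} \xrightarrow{p} A_{11}^{-1}, \qquad \hat{A}_{22}^{-1} \xrightarrow{p} A_{22}^{-1},
\tag{G.2}
\]

where $\xrightarrow{p}$ denotes convergence in probability;

\[
\begin{pmatrix} \hat{D}_{11} & \hat{D}_{12} \\ \hat{D}_{21} & \hat{D}_{22} \end{pmatrix}
\xrightarrow{p}
\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix};
\tag{G.3}
\]

\[
\sqrt{n}(\hat{\theta}_a - \theta) \Rightarrow N\left(0,\;
\begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}^{-1}
\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}
\begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}^{-1}
\right),
\tag{G.4}
\]

where $\Rightarrow$ denotes weak convergence (i.e. convergence in distribution). The variance in (G.4) can be written as

\[
\begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}^{-1}
\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}
\begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}^{-1}
=
\begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{pmatrix}
\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}
\begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{pmatrix}
=
\begin{pmatrix}
A_{11}^{-1} D_{11} A_{11}^{-1} & A_{11}^{-1} D_{12} A_{22}^{-1} \\
A_{22}^{-1} D_{21} A_{11}^{-1} & A_{22}^{-1} D_{22} A_{22}^{-1}
\end{pmatrix}.
\]

Thus, we obtain (G.5) from (G.4). Since $R\theta = 0$,

\[
R\sqrt{n}(\hat{\theta}_a - \theta) = R\sqrt{n}\,\hat{\theta}_a \Rightarrow N\left(0,\;
R \begin{pmatrix}
A_{11}^{-1} D_{11} A_{11}^{-1} & A_{11}^{-1} D_{12} A_{22}^{-1} \\
A_{22}^{-1} D_{21} A_{11}^{-1} & A_{22}^{-1} D_{22} A_{22}^{-1}
\end{pmatrix} R'
\right).
\tag{G.5}
\]

Use the following notation:

\[
G_a^{-1} = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}^{-1},
\qquad
D_a = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}.
\]

Then, we have (G.6) from (G.1) and (G.4):

\[
\widehat{\operatorname{var}}\left(\sqrt{n}(\hat{\theta}_a - \theta)\right) = \hat{G}_a^{-1} \hat{D}_a \hat{G}_a^{-1} = G_a^{-1} D_a G_a^{-1} + o_p(1),
\quad \text{i.e. } \hat{G}_a^{-1} \hat{D}_a \hat{G}_a^{-1} \xrightarrow{p} G_a^{-1} D_a G_a^{-1},
\tag{G.6}
\]

where

\[
G_a^{-1} D_a G_a^{-1} =
\begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{pmatrix}
\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}
\begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{pmatrix}
=
\begin{pmatrix}
A_{11}^{-1} D_{11} A_{11}^{-1} & A_{11}^{-1} D_{12} A_{22}^{-1} \\
A_{22}^{-1} D_{21} A_{11}^{-1} & A_{22}^{-1} D_{22} A_{22}^{-1}
\end{pmatrix}.
\]

Thus, from (G.6) we can construct a statistic $W_1 = [R\sqrt{n}\,\hat{\theta}_a]'[R\hat{G}_a^{-1}\hat{D}_a\hat{G}_a^{-1}R']^{-1}R\sqrt{n}\,\hat{\theta}_a$, which converges to a chi-squared distribution:

\[
R\sqrt{n}\,\hat{\theta}_a \overset{a}{\sim} N\left(0,\; R G_a^{-1} D_a G_a^{-1} R'\right),
\]

\[
[R\sqrt{n}\,\hat{\theta}_a]'[R\hat{G}_a^{-1}\hat{D}_a\hat{G}_a^{-1}R']^{-1}R\sqrt{n}\,\hat{\theta}_a \overset{a}{\sim} \chi^2_k.
\]

Appendix H

Proof of Corollary 3 in Chapter 3: Robust inference in unbalanced panel data

Proof. First, we consider the WLS estimator.

\[
\hat{\beta}_{wls}
= \left(\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \frac{1}{\sqrt{\lambda_i}} x_{it}' \frac{1}{\sqrt{\lambda_i}} x_{it}\right)^{-1}
\left(\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \frac{1}{\sqrt{\lambda_i}} x_{it}' \frac{1}{\sqrt{\lambda_i}} y_{it}\right)
= \left(\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \frac{1}{\sqrt{T_i}} x_{it}' \frac{1}{\sqrt{T_i}} x_{it}\right)^{-1}
\left(\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \frac{1}{\sqrt{T_i}} x_{it}' \frac{1}{\sqrt{T_i}} y_{it}\right).
\]

Then we have

\[
\begin{aligned}
\sqrt{T}(\hat{\beta}_{wls} - \beta)
&= \sqrt{T}\left(\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \frac{1}{\sqrt{T_i}} x_{it}' \frac{1}{\sqrt{T_i}} x_{it}\right)^{-1}
\left(\sum_{i=1}^{n}\sum_{t=1}^{T} s_{it} \frac{1}{\sqrt{T_i}} x_{it}' \frac{1}{\sqrt{T_i}} u_{it}\right) \\
&= \left(\sum_{i=1}^{n} \frac{1}{T_i}\sum_{t=1}^{T} s_{it} x_{it}' x_{it}\right)^{-1}
\left(\sum_{i=1}^{n} \frac{1}{\sqrt{T_i \lambda_i}}\sum_{t=1}^{T} s_{it} x_{it}' u_{it}\right) \\
&= \left(\sum_{i=1}^{n} \frac{T}{T_i}\,\frac{1}{T}\sum_{t=1}^{T} s_{it} x_{it}' x_{it}\right)^{-1}
\left(\sum_{i=1}^{n} \frac{\sqrt{T}}{\sqrt{T_i \lambda_i}}\,\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it} x_{it}' u_{it}\right) \\
&\Rightarrow \left(\sum_{i=1}^{n} \frac{1}{\lambda_i}\lambda_i Q_i\right)^{-1}\left(\sum_{i=1}^{n} \frac{1}{\lambda_i}\Lambda_i W_i(\lambda_i)\right)
= \left(\sum_{i=1}^{n} Q_i\right)^{-1}\left(\sum_{i=1}^{n} \frac{1}{\lambda_i}\Lambda_i W_i(\lambda_i)\right).
\end{aligned}
\]

Writing $W_i$ for $W_i(\lambda_i)$,

\[
\sqrt{T}(\hat{\beta}_{wls} - \beta) \Rightarrow \left(\sum_{i=1}^{n} Q_i\right)^{-1}\left(\sum_{i=1}^{n} \frac{1}{\lambda_i}\Lambda_i W_i\right).
\tag{H.1}
\]

The variance estimator is

\[
T \cdot \widehat{\operatorname{var}}(\hat{\beta}_{wls}) = T \cdot \hat{V}_{w\text{-}clus}
= \left(\sum_{i=1}^{n} \frac{1}{T_i}\sum_{t=1}^{T} s_{it} x_{it}' x_{it}\right)^{-1}
\hat{\Omega}_{wls}
\left(\sum_{i=1}^{n} \frac{1}{T_i}\sum_{t=1}^{T} s_{it} x_{it}' x_{it}\right)^{-1},
\]

where

\[
\hat{Q}_i = \frac{1}{T_i}\sum_{t=1}^{T} s_{it} x_{it}' x_{it},
\qquad
\hat{\Omega}_{wls} = \sum_{i=1}^{n}\left[\left(\frac{1}{\lambda_i}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it} x_{it}' \hat{u}_{it}\right)\left(\frac{1}{\lambda_i}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it} x_{it}' \hat{u}_{it}\right)'\right].
\]

For each $i$,

\[
\begin{aligned}
\frac{1}{\lambda_i}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it} x_{it}' \hat{u}_{it}
&= \frac{1}{\lambda_i}\frac{1}{\sqrt{T}}\sum_{t=1}^{T} s_{it} x_{it}' u_{it}
- \frac{1}{\lambda_i}\frac{1}{T}\sum_{t=1}^{T} s_{it} x_{it}' x_{it}\,\sqrt{T}(\hat{\beta}_{wls} - \beta) + o_p(1) \\
&\Rightarrow \frac{1}{\lambda_i}\Lambda_i W_i - \frac{1}{\lambda_i}\lambda_i Q_i\left(\sum_{j=1}^{n} Q_j\right)^{-1}\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}\Lambda_j W_j\right)
= \frac{1}{\lambda_i}\Lambda_i W_i - Q_i\left(\sum_{j=1}^{n} Q_j\right)^{-1}\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}\Lambda_j W_j\right).
\end{aligned}
\]

Therefore

\[
\begin{aligned}
\hat{\Omega}_{wls}
&\Rightarrow \sum_{i=1}^{n}\left[\frac{1}{\lambda_i}\Lambda_i W_i - Q_i\left(\sum_{j=1}^{n} Q_j\right)^{-1}\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}\Lambda_j W_j\right)\right]
\left[\frac{1}{\lambda_i}\Lambda_i W_i - Q_i\left(\sum_{j=1}^{n} Q_j\right)^{-1}\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}\Lambda_j W_j\right)\right]' \\
&= \sum_{i=1}^{n}\Bigg[\frac{1}{\lambda_i}\Lambda_i W_i W_i' \Lambda_i' \frac{1}{\lambda_i}
- Q_i\left(\sum_{j=1}^{n} Q_j\right)^{-1}\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}\Lambda_j W_j\right)\frac{1}{\lambda_i}W_i'\Lambda_i' \\
&\qquad\quad - \frac{1}{\lambda_i}\Lambda_i W_i\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}W_j'\Lambda_j'\right)\left(\sum_{j=1}^{n} Q_j\right)^{-1}Q_i
+ Q_i\left(\sum_{j=1}^{n} Q_j\right)^{-1}\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}\Lambda_j W_j\right)\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}W_j'\Lambda_j'\right)\left(\sum_{j=1}^{n} Q_j\right)^{-1}Q_i\Bigg].
\end{aligned}
\]

Using homogeneity, $\Lambda_j = \Lambda$ and $Q_j = Q$ for all $j = 1, 2, \ldots, n$, so that $Q_i(\sum_{j=1}^{n} Q_j)^{-1} = \frac{1}{n}I$ and

\[
\begin{aligned}
&= \sum_{i=1}^{n}\Bigg[\frac{1}{\lambda_i}\Lambda W_i W_i' \Lambda' \frac{1}{\lambda_i}
- \frac{1}{n}\Lambda\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}W_j\right)\frac{1}{\lambda_i}W_i'\Lambda'
- \Lambda\frac{1}{\lambda_i}W_i\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}W_j'\right)\Lambda'\frac{1}{n}
+ \frac{1}{n}\Lambda\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}W_j\right)\left(\sum_{j=1}^{n}\frac{1}{\lambda_j}W_j'\right)\Lambda'\frac{1}{n}\Bigg] \\
&= \Lambda\Bigg[\sum_{i=1}^{n}\frac{W_i}{\lambda_i}\frac{W_i'}{\lambda_i}
- \frac{1}{n}\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)\left(\sum_{i=1}^{n}\frac{W_i'}{\lambda_i}\right)
- \left(\sum_{i=1}^{n}\frac{W_i}{\lambda_i}\right)\left(\sum_{j=1}^{n}\frac{W_j'}{\lambda_j}\right)\frac{1}{n}
+ \frac{1}{n}\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)\left(\sum_{j=1}^{n}\frac{W_j'}{\lambda_j}\right)\Bigg]\Lambda' \\
&= \Lambda\Bigg[\sum_{i=1}^{n}\frac{W_i}{\lambda_i}\frac{W_i'}{\lambda_i}
- \frac{1}{n}\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)\left(\sum_{i=1}^{n}\frac{W_i'}{\lambda_i}\right)\Bigg]\Lambda'.
\end{aligned}
\]

Hence

\[
\hat{\Omega}_{wls} \Rightarrow \Lambda\Bigg[\sum_{i=1}^{n}\frac{W_i}{\lambda_i}\frac{W_i'}{\lambda_i}
- \frac{1}{n}\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)'\Bigg]\Lambda'
\]

and

\[
T \cdot \hat{V}_{w\text{-}clus}
= \left(\sum_{i=1}^{n}\hat{Q}_i\right)^{-1}\hat{\Omega}_{wls}\left(\sum_{i=1}^{n}\hat{Q}_i\right)^{-1}
\Rightarrow (nQ)^{-1}\Lambda\Bigg[\sum_{i=1}^{n}\frac{W_i}{\lambda_i}\frac{W_i'}{\lambda_i}
- \frac{1}{n}\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)'\Bigg]\Lambda'(nQ)^{-1}.
\tag{H.2}
\]

Now consider the t-statistic based on WLS, which we use for the t-test:

\[
t^*_{wls} = \frac{\sqrt{T}(R\hat{\beta}_{wls} - r)}{\sqrt{R \cdot T \cdot \hat{V}_{w\text{-}clus} R'}}
= \frac{R\sqrt{T}(\hat{\beta}_{wls} - \beta)}{\sqrt{R \cdot T \cdot \hat{V}_{w\text{-}clus} R'}}.
\]

For the numerator, using (H.1), we have

\[
R\sqrt{T}(\hat{\beta}_{wls} - \beta) \Rightarrow R(nQ)^{-1}\Lambda\left(\sum_{i=1}^{n}\frac{W_i}{\lambda_i}\right)
= \frac{1}{n}\left(\sum_{i=1}^{n}\frac{RQ^{-1}\Lambda W_i}{\lambda_i}\right)
\equiv \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right).
\tag{H.3}
\]

For the denominator, using (H.2), we have

\[
\begin{aligned}
R \cdot T \cdot \hat{V}_{w\text{-}clus} R'
&\Rightarrow R(nQ)^{-1}\Lambda\Bigg[\sum_{i=1}^{n}\frac{W_i}{\lambda_i}\frac{W_i'}{\lambda_i}
- \frac{1}{n}\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)\left(\sum_{j=1}^{n}\frac{W_j}{\lambda_j}\right)'\Bigg]\Lambda'(nQ)^{-1}R' \\
&= \frac{1}{n^2}\Bigg[\sum_{i=1}^{n}\frac{RQ^{-1}\Lambda W_i}{\lambda_i}\frac{W_i'\Lambda'Q^{-1}R'}{\lambda_i}
- \frac{1}{n}\left(\sum_{i=1}^{n}\frac{RQ^{-1}\Lambda W_i}{\lambda_i}\right)\left(\sum_{j=1}^{n}\frac{W_j'\Lambda'Q^{-1}R'}{\lambda_j}\right)\Bigg] \\
&= \frac{1}{n^2}\Bigg[\sum_{i=1}^{n} y_i y_i' - \frac{1}{n}\sum_{i=1}^{n} y_i\sum_{j=1}^{n} y_j'\Bigg].
\end{aligned}
\tag{H.4}
\]

Using (H.3) and (H.4) together, and writing $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, we have

\[
t^*_{wls} = \frac{R\sqrt{T}(\hat{\beta}_{wls} - \beta)}{\sqrt{R \cdot T \cdot \hat{V}_{w\text{-}clus} R'}}
\Rightarrow \frac{\frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\frac{1}{n^2}\Big[\sum_{i=1}^{n} y_i y_i' - \frac{1}{n}\sum_{i=1}^{n} y_i\sum_{j=1}^{n} y_j'\Big]}}
= \frac{n\bar{y}}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
= \sqrt{\frac{n}{n-1}} \cdot \frac{\sqrt{n}\,\bar{y}}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}}.
\]

Appendix I

Monte Carlo simulation results in chapter 3: Size-adjusted power for t-test in Ibragimov and Muller [2009]

Table I.1.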
The size-adjusted power of the test, 1(p < 0.05), T = 100

T=100, n=4
bias      balanced         complete-case    balanced, keep ∀i, drop t  balanced, keep ∀t, drop i
bias=0    .048(.041,.054)  .028(.023,.033)  .046(.040,.053)            .053(.046,.060)
bias=2%   .059(.048,.069)  .038(.029,.046)  .048(.039,.057)            .063(.052,.073)
bias=5%   .092(.079,.105)  .052(.042,.061)  .050(.040,.060)            .070(.059,.081)
bias=10%  .219(.200,.237)  .100(.087,.113)  .063(.052,.074)            .093(.080,.105)
bias=20%  .626(.605,.647)  .302(.282,.322)  .136(.120,.151)            .159(.143,.175)

T=100, n=8
bias      balanced         complete-case    balanced, drop t           balanced, drop i
bias=0    .048(.042,.055)  .038(.032,.044)  .050(.040,.059)            .053(.046,.060)
bias=2%   .073(.061,.084)  .054(.044,.063)  .047(.037,.056)            .052(.042,.061)
bias=5%   .167(.150,.183)  .099(.085,.112)  .062(.051,.072)            .060(.049,.070)
bias=10%  .517(.495,.539)  .243(.224,.262)  .108(.094,.122)            .089(.077,.101)
bias=20%  .977(.970,.983)  .671(.650,.692)  .311(.290,.331)            .156(.140,.171)

T=100, n=16
bias      balanced         complete-case    balanced, drop t           balanced, drop i
bias=0    .048(.041,.055)  .050(.043,.057)  .050(.043,.057)            .048(.041,.055)
bias=2%   .110(.096,.123)  .069(.057,.080)  .057(.047,.067)            .059(.048,.069)
bias=5%   .362(.340,.383)  .191(.174,.208)  .124(.109,.138)            .094(.081,.106)
bias=10%  .887(.873,.901)  .574(.552,.595)  .330(.309,.351)            .225(.206,.243)
bias=20%  1(1,1)           .984(.978,.989)  .839(.823,.855)            .628(.607,.649)

(Note) The rejection probability is reported. The 95% confidence interval for the rejection probability is reported in parentheses. Generated errors are from an AR(1) process with correlation coefficient 0.75. The number of replications is 2,000.

Table I.2.
The size-adjusted power of the test, 1(p < 0.05), T = 300

T=300, n=4
bias      balanced         complete-case    balanced, keep ∀i, drop t  balanced, keep ∀t, drop i
bias=0    .046(.036,.055)  .029(.022,.037)  .058(.047,.068)            .048(.039,.077)
bias=2%   .067(.056,.078)  .034(.026,.041)  .053(.043,.063)            .055(.046,.065)
bias=5%   .169(.153,.185)  .088(.075,.100)  .072(.061,.083)            .079(.067,.090)
bias=10%  .520(.498,.541)  .237(.218,.255)  .143(.127,.158)            .145(.130,.160)
bias=20%  .968(.960,.976)  .642(.621,.663)  .367(.346,.388)            .262(.243,.281)

T=300, n=8
bias      balanced         complete-case    balanced, drop t           balanced, drop i
bias=0    .053(.046,.060)  .042(.034,.050)  .049(.040,.059)            .050(.043,.058)
bias=2%   .122(.107,.136)  .089(.076,.101)  .070(.059,.081)            .053(.043,.063)
bias=5%   .435(.413,.457)  .270(.250,.289)  .179(.162,.196)            .070(.059,.081)
bias=10%  .941(.931,.951)  .748(.729,.767)  .548(.526,.569)            .126(.111,.141)
bias=20%  1(1,1)           .999(.998,1)     .986(.980,.991)            .249(.230,.268)

T=300, n=16
bias      balanced         complete-case    balanced, drop t           balanced, drop i
bias=0    .046(.036,.055)  .050(.040,.060)  .050(.040,.059)            .048(.038,.057)
bias=2%   .187(.170,.204)  .135(.120,.150)  .084(.072,.096)            .060(.050,.071)
bias=5%   .798(.780,.815)  .502(.480,.523)  .294(.274,.313)            .180(.163,.197)
bias=10%  1(1,1)           .960(.951,.969)  .782(.763,.800)            .530(.508,.551)
bias=20%  1(1,1)           1(1,1)           1(.999,1)                  .971(.963,.978)

(Note) The rejection probability is reported. The 95% confidence interval for the rejection probability is reported in parentheses. Generated errors are from an AR(1) process with correlation coefficient 0.75. The number of replications is 2,000.

Table I.3.
The size-adjusted power of the test, 1(p < 0.05), T = 1,000

T=1,000, n=4
bias      balanced         complete-case    balanced, keep ∀i, drop t  balanced, keep ∀t, drop i
bias=0    .056(.045,.066)  .033(.025,.041)  .042(.033,.050)            .061(.050,.071)
bias=2%   .136(.121,.151)  .067(.056,.078)  .047(.038,.057)            .076(.064,.087)
bias=5%   .484(.462,.506)  .220(.201,.238)  .050(.040,.060)            .130(.115,.144)
bias=10%  .942(.931,.952)  .603(.581,.624)  .072(.061,.083)            .250(.231,.269)
bias=20%  1(1,1)           .959(.950,.967)  .147(.131,.163)            .480(.458,.502)

T=1,000, n=8
bias      balanced         complete-case    balanced, drop t           balanced, drop i
bias=0    .046(.038,.054)  .036(.027,.044)  .052(.042,.061)            .055(.045,.065)
bias=2%   .273(.253,.292)  .144(.129,.159)  .092(.079,.104)            .066(.055,.077)
bias=5%   .906(.893,.918)  .577(.555,.598)  .281(.261,.300)            .130(.115,.144)
bias=10%  1(1,1)           .963(.955,.971)  .740(.721,.759)            .240(.221,.258)
bias=20%  1(1,1)           1(1,1)           .999(.997,1)               .442(.420,.464)

T=1,000, n=16
bias      balanced         complete-case    balanced, drop t           balanced, drop i
bias=0    .050(.045,.059)  .046(.037,.055)  .050(.040,.060)            .050(.040,.060)
bias=2%   .523(.046,.060)  .285(.265,.305)  .187(.169,.204)            .133(.118,.147)
bias=5%   1(1,1)           .939(.929,.949)  .739(.721,.758)            .469(.447,.491)
bias=10%  1(1,1)           1(1,1)           .999(.997,1)               .942(.932,.952)
bias=20%  1(1,1)           1(1,1)           1(1,1)                     1(1,1)

(Note) The rejection probability is reported. The 95% confidence interval for the rejection probability is reported in parentheses. Generated errors are from an AR(1) process with correlation coefficient 0.75. The number of replications is 2,000.

BIBLIOGRAPHY

J. Abrevaya. The equivalence of two estimators of the fixed-effects logit model. Economics Letters, 55:41–43, 1997.

E. Andersen. Conditional Inference and Models for Measuring. Mentalhygiejnisk Forlag, Copenhagen, 1973.

J. Angrist and J. Pischke. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton Univ. Press, Princeton, New Jersey, 2009.

M. Arellano. Computing robust standard errors for within-groups estimators.
Oxford Bulletin of Economics and Statistics, 49:431–34, 1987.
M. Arellano. Discrete choices with panel data. Investigaciones Economicas, 27:423–458, 2003.
M. Arellano and J. Hahn. Understanding Bias in Nonlinear Panel Models: Some Recent Developments. In Advances in Economics and Econometrics, Blundell R, Newey W, Persson T (eds). Cambridge University Press, Cambridge, 2007.
N. Bakirov and G. Szekely. Student’s t-test for Gaussian scale mixtures. Journal of Mathematical Science, 139:6497–6505, 2006.
H. Bang and J. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61:692–697, 2005.
G. Becker. Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education. The University of Chicago Press, Chicago and London, 1993.
M. Bertrand, E. Duflo, and S. Mullainathan. How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119:249–275, 2004.
A. Bester, T. Conley, and C. Hansen. Inference with dependent data using cluster covariance estimators. mimeo, 2009.
P. Bickel, C. Klaassen, Y. Ritov, and J. Wellner. Efficient and Adaptive Estimation in Semiparametric Models. Springer Verlag, New York, 1998.
S. Van Buuren, H. Boshuizen, and D. Knook. Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18:681–694, 1999.
J. Carpenter, M. Kenward, and S. Vansteelandt. A comparison of multiple imputation and doubly robust estimation for analyses with missing data. Journal of the Royal Statistical Society, Series A (Statistics in Society), 169:571–584, 2006.
G. Chamberlain. Analysis of covariance with qualitative data. Review of Economic Studies, 47:225–238, 1980.
G. Chamberlain. Panel data. In Z. Griliches and M. Intriligator (eds), Handbook of Econometrics, 2:1248–1313, 1984.
G. Chamberlain. Binary response models for panel data: Identification and information. Econometrica, 78:159–168, 2010.
K. Chay and D. Hyslop. Identification and estimation of dynamic binary panel data models: Empirical evidence using alternative approaches. 2001.
D. Clayton, D. Spiegelhalter, and A. Pickles. Analysis of longitudinal binary data from multi-phase sampling (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology), 60:71–87, 1999.
M. Collado and M. Browning. Habits and heterogeneity in demands: a panel data analysis. Journal of Applied Econometrics, 22:625–640, 2007.
L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag, New York, 1986.
E. Fama and J. MacBeth. Long-term growth in a short-term market. Journal of Finance, 29(3):857–85, 1974.
I. Fernandez-Val. Fixed effects estimation of structural parameters and marginal effects in panel probit models. Journal of Econometrics, 150:71–85, 2009.
B. Graham, C. Campos de Xavier Pinto, and D. Egel. Inverse probability tilting for moment condition models with missing data. NBER Working Paper No. 13981, 2008.
W. Greene. The behavior of the fixed effects estimator in nonlinear models. The Econometrics Journal, 7:98–119, 2004.
J. Hahn and W. Newey. Jackknife and analytical bias reduction for nonlinear panel models. Econometrica, 72:1295–1319, 2004.
V. Hajivassiliou and D. McFadden. The method of simulated scores with application to models of external debt crises. Econometrica, 66:863–896, 1998.
P. Hall and B. Presnell. Biased bootstrap methods for reducing the effects of contamination. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 61:661–680, 1999.
C. Hansen. Asymptotic properties of a robust variance matrix estimator for panel data when T is large. Journal of Econometrics, 141:597–620, 2007.
E. Hanushek. Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica, 70:91–117, 1986.
E. Hanushek. The effects of birth inputs on birthweight: Evidence from quantile estimation on panel data. Journal of Business and Economic Statistics, 26:379–397, 1997.
J. Hausman. Specification tests in econometrics. Econometrica, 46:1251–1271, 1978.
J. Heckman. The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process and Some Monte Carlo Evidence. MIT Press, Cambridge, Massachusetts, 1981.
K. Hirano, G. Imbens, and G. Ridder. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71:1161–1189, 2003.
P. Hoel, S. Port, and C. Stone. Introduction to Probability Theory. Brooks Cole, Boston, 1972.
D. Horvitz and D. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47:663–685, 1952.
R. Ibragimov and U. Muller. t-statistic based correlation and heterogeneity robust inference. Journal of Business and Economic Statistics, 28:453–468, 2009.
M. Johnson. Multivariate Statistical Simulation. John Wiley and Sons, New York, 1987.
N. Kiefer and T. Vogelsang. Heteroskedasticity-autocorrelation robust standard errors using the Bartlett kernel without truncation. Econometrica, 70(5):2093–2095, 2002.
N. Kiefer and T. Vogelsang. A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory, 21(06):1130–1164, 2005.
A. Krueger. Experimental estimates of education production functions. Quarterly Journal of Economics, 114:497–532, 1999.
A. Krueger and D. Whitmore. The effect of attending a small class in the early grades on college-test taking and middle school test results: Evidence from Project STAR. Economic Journal, 111:1–28, 2001.
T. Lancaster. The incidental parameters problem since 1948. Journal of Econometrics, 95:391–413, 2000.
L. Lee. On comparisons of normal and logistic models in the bivariate dichotomous analysis. Journal of Econometrics, 4:151–155, 1979.
K. Liang and S. Zeger. Longitudinal data analysis using generalized linear model. Biometrika, 73:13–22, 1986.
R. Little and D. Rubin. Statistical Analysis with Missing Data. J. Wiley and Sons, New York, 2002.
J. Neyman and E. Scott. Consistent estimates based on partially consistent observations. Econometrica, 16:1–32, 1948.
M. Petersen. Estimating standard errors in finance panel data sets: Comparing approaches. The Review of Financial Studies, 22:435–480, 2009.
T. Raghunathan, J. Lepkowski, J. van Hoewyk, and P. Solenberger. A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27:85–95, 2001.
J. Robins and A. Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90:122–129, 1995.
J. Robins, A. Rotnitzky, and L. Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89:846–866, 1994.
P. Royston. Multiple imputation of missing values. Stata Journal, 4:227–241, 2004.
P. Royston. Multiple imputation of missing values: update. Stata Journal, 5:188–201, 2005.
D. Rubin. Inference and missing data. Biometrika, 63:581–592, 1976.
D. Rubin. Semiparametric Theory and Missing Data. J. Wiley and Sons, New York, 1987.
J. Schafer and J. Graham. Missing data: Our view of the state of the art. Psychological Methods, 7:147–177, 2002.
D. Scharfstein, A. Rotnitzky, and J. Robins. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94:1096–1120, with rejoinder, 1135–1146, 1999.
B. Schweizer and A. Sklar. Probabilistic Metric Spaces. North Holland, New York, 1983.
A. Sklar. Fonctions de repartition a n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 1959.
T. Trivedi and D. Zimmer. Copula Modeling: an Introduction for Practitioners. Now Publishers, 2007.
A. Tsiatis. Semiparametric Theory and Missing Data. Springer, New York, 2006.
T. Vogelsang. Heteroskedasticity, autocorrelation, and spatial correlation robust inference in linear panel models with fixed-effects. 2008.
J. Wooldridge. Cluster-sample methods in applied econometrics. American Economic Review, 93(2):133–138, 2003.
J. Wooldridge. Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. Journal of Applied Econometrics, 20:39–54, 2005.
J. Wooldridge. Inverse probability weighted estimation for general missing data problems. Journal of Econometrics, 141:1281–1301, 2007.
J. Wooldridge. Correlated random effects models with unbalanced panels. mimeo, 2009.
J. Wooldridge. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, Massachusetts, 2010.
E. Word, J. Johnston, H. Bain, B. Fulton, C. Achilles, M. Lintz, J. Folger, and C. Breda. The State of Tennessee’s Student/Teacher Achievement Ratio (STAR) project: Technical report 1985-1990. Tennessee State Department of Education, 1990.