ESSAYS IN MULTIPLE FRACTIONAL RESPONSES WITH ENDOGENOUS EXPLANATORY VARIABLES By Suhyeon Nam A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Economics – Doctor of Philosophy 2014 ABSTRACT ESSAYS IN MULTIPLE FRACTIONAL RESPONSES WITH ENDOGENOUS EXPLANATORY VARIABLES By Suhyeon Nam This dissertation consists of three chapters. The first and second chapters develop new estimation methods for multiple fractional response variables with endogeneity. Multiple fractional response variables have two features. Each response is between zero and one, and the sum of the responses for a unit is one. The first chapter proposes an estimation method accounting for these features when there is a continuous endogenous explanatory variable (EEV). It is a two step estimation method combining a control function approach. The first step generates a control function using a linear regression, and the second step maximizes a multinomial log likelihood function with a multinomial logit conditional mean which depends on the control function generated in the first step. Monte Carlo simulations examine the performance of the estimation method when the conditional mean in the second step is misspecified. The simulation results provide evidence that the method’s average partial effect (APE) estimates approximate well true APEs as long as an instrument is not weak and that the method’s approximation outperforms an alternative linear method’s. We apply the proposed two step estimation method to Michigan’s fourth grade math test data to estimate the average partial effects of spending on student performance. The second chapter develops and evaluates an estimation method allowing for the discrete nature of an EEV. We modify the two step estimation method proposed in the first chapter by following Wooldridge (2014); instead of unstandardized residual, we use the generalized residuals as control functions The Monte Carlo simulation demonstrate that although the two step estimation method cannot provide consistent estimators for the mean parameters and average partial effects under the conditional mean misspecification, it yields a decent approximation to average partial effects. In the third chapter, we clarify some issues in computing average partial (or marginal) effects in models that have been estimated using control function or correlated random effects approaches (or some combination). In particular, we show that a common method of estimating average partial effects, where the averaging is done across all variables and across the entire sample, estimates an interesting parameter. Nevertheless, the method differs from averaging across the observed covariates the partial effects obtained via the average structural function. In the special case where unobservables are independent of the observed covariates the two methods are identical. Copyright by SUHYEON NAM 2014 To my husband, Sungsam Chung, my family, and God. v ACKNOWLEDGMENTS I would not have been able to accomplish my doctoral study without the help and support from many people. First, I am deeply grateful to my advisor, Professor Jeffrey Wooldridge for his invaluable guidance and continuous support during these long years. I would also like to thank my dissertation committee members, Professor Leslie Papke, Professor Peter Schmidt, and Dr. Kenneth Frank for their insightful comments and encouragements. I want to express my special thanks to Hajin Kim and Eunsil Lee, who have been my family in Michigan. My special thanks also go to my family. Especially, I am truly grateful to my parents and parents-in-law for their unconditional love and support. And I thank my husband, Sungsam Chung, who shared with me every moment of this journey. Finally, I thank God, who has always been with me. vi TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MULTIPLE FRACTIONAL RESPONSE VARIABLES WITH CONTINUOUS ENDOGENOUS EXPLANATORY VARIABLES . . . . 1.1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 THE MODEL AND ESTIMATION WITH ENDOGENEITY . . . . . . . . . . 1.3 MONTE CARLO SIMULATIONS . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 The Quantities of Interest . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.2 Data Generating Process . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 APPLICATION: MICHIGAN EDUCATIONAL ASSESSMENT PROGRAM MATH TEST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x CHAPTER 1 MULTIPLE FRACTIONAL RESPONSE VARIABLES WITH A BINARY ENDOGENOUS EXPLANATORY VARIABLE . . . . . 2.1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 THE MODIFIED TWO STEP ESTIMATION . . . . . . . . . . . . . . . . . . 2.3 MONTE CARLO SIMULATION . . . . . . . . . . . . . . . . . . . . . . . . 2.3.1 Data Generating Process . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 4 11 11 12 14 60 66 67 78 CHAPTER 2 ON COMPUTING AVERAGE PARTIAL EFFECTS IN MODELS WITH ENDOGENEITY OR HETEROGENEITY . . . . . . . . . 3.1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 THE AVERAGE STRUCTURAL FUNCTION AND AVERAGE PARTIAL EFFECTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 APES ACROSS THE ENTIRE POPULATION . . . . . . . . . . . . . . . . . 3.4 EMPIRICAL EXAMPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 81 83 89 89 90 115 116 CHAPTER 3 vii . 118 . 118 . . . . . 121 128 134 136 138 LIST OF TABLES Table 1.1 Average APEs under Condition 1 . . . . . . . . . . . . . . . . . . . . . 16 Table 1.2 Percentile APEs under Condition 1 and Normal distribution . . . . . 17 Table 1.3 Percentile APEs under Condition 1 and χ23 distribution . . . . . . . . . 19 Table 1.4 F statistics of the 1st step (H0 : π2 = 0) . . . . . . . . . . . . . . . . . . 33 Table 1.5 Average APEs under Condition 2 and Normal distribution . . . . . . 34 Table 1.6 Mean Squared Errors of Average APE estimates under Condition 2 . . 44 Table 1.7 Percentile APEs under Condition 2, π2 = 0.1, and Normal distribution 46 Table 1.8 Percentile APEs under Condition 2, π2 = 0.2, and Normal distribution 48 Table 1.9 Percentile APEs under Condition 2, π2 = 0.5, and Normal distribution 50 Table 1.10 Percentile APEs under Condition 2, π2 = 0.1, and χ23 distribution . . . 52 Table 1.11 Percentile APEs under Condition 2, π2 = 0.2, and χ23 distribution . . . 54 Table 1.12 Percentile APEs under Condition 2, π2 = 0.5, and χ23 distribution . . . 56 Table 1.13 Mean Squared Errors of Percentile APE estimates under Condition 2 . 58 Table 1.14 Four levels of MEAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Table 1.15 Summary statistics of the dependent variables . . . . . . . . . . . . . . 61 Table 1.16 Summary statistics of the data . . . . . . . . . . . . . . . . . . . . . . . 62 Table 1.17 The first step estimation result . . . . . . . . . . . . . . . . . . . . . . . 63 Table 1.18 Average APE estimates of spending on the fourth grade MEAP math test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Table 1.19 Percentile APE estimates of spending on the fourth grade MEAP math test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 viii Table 1.20 (yi1 , yi2 , yi3 ) generated across the simulations . . . . . . . . . . . . . . . 77 Table 2.1 APEs with r = ρu + e, π2 = 1, and e|v ∼ Normal (0, 1) . . . . . . . . . 92 Table 2.2 APEs with r = ρu + e, π2 = 1, and e|v ∼ χ23 Table 2.3 MSEs of APEs with r = ρu + e and π2 = 1 . . . . . . . . . . . . . . . . 94 Table 2.4 F and Wald statistics of the first stage/step (H0 : π2 = 0) . . . . . . . . 95 Table 2.5 APEs with w = 1[z2 π2 + u > 0], ρ = 1, and e|v ∼ Normal (0, 1) . . . . 97 Table 2.6 APEs with w = 1[z2 π2 + u > 0], ρ = 1, and e|v ∼ χ23 . . . . . . . . . . 98 Table 2.7 MSEs of APE estimates with w = 1[z2 π2 + u > 0] and ρ = 1 . . . . . . 99 Table 2.8 Rejection Frequencies for α = 0.05 test in Case 1 with varying ρ . . . . 101 Table 2.9 Rejection Frequencies for α = 0.05 test in Case 1 with varying π2 . . . 104 . . . . . . . . . . . . . . . 93 Table 2.10 APEs with r = ρu + e and π2 = 1 in Case 2 . . . . . . . . . . . . . . . . 107 Table 2.11 MSEs of APE estimates with r = ρu + e and π2 = 1 in Case 2 . . . . . 109 Table 2.12 APEs with w = 1[z2 π2 + u > 0] and ρ = 1 in Case 2 . . . . . . . . . . . 110 Table 2.13 MSEs of APE estimates with w = 1[z2 π2 + u > 0] and ρ = 1 in Case 2 111 Table 2.14 Rejection Frequencies for α = 0.05 test in Case 2 with varying ρ . . . . 113 Table 2.15 Rejection Frequencies for α = 0.05 test in Case 2 with varying π2 . . . 113 Table 3.1 APE estimates in Papke and Wooldridge (2008) . . . . . . . . . . . . . 135 Table 3.2 APE estimates in Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . 136 ix LIST OF FIGURES Figure 1.1 Empirical distributions of Average APE estimates under Condition 1 and Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 21 Figure 1.2 Empirical distributions of Average APE estimates under Condition 1 and Logistic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Figure 1.3 Empirical distributions of Average APE estimates under Condition 1 and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Figure 1.4 Empirical distributions of Percentile APE estimates including w2 under Condition 1 and Normal distribution . . . . . . . . . . . . . . . . . 24 Figure 1.5 Empirical distributions of Percentile APE estimates including w2 under Condition 1 and Logistic distribution . . . . . . . . . . . . . . . . . 27 Figure 1.6 Empirical distribution of Percentile APEs under Condition 1 and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 Figure 1.7 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and Normal distribution . . . . . . . . . . . . . . . . . . . 35 Figure 1.8 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and Normal distribution . . . . . . . . . . . . . . . . . . . 36 Figure 1.9 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and Normal distribution . . . . . . . . . . . . . . . . . . . 37 Figure 1.10 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and Logistic distribution . . . . . . . . . . . . . . . . . . . 38 Figure 1.11 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and Logistic distribution . . . . . . . . . . . . . . . . . . . 39 Figure 1.12 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and Logistic distribution . . . . . . . . . . . . . . . . . . . 40 Figure 1.13 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . 41 x Figure 1.14 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . 42 Figure 1.15 Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and χ23 distribution . . . . . . . . . . . . . . . . . . . . . . . 43 Figure 2.1 Rejection Frequencies for Normal(0,1) with varying ρ . . . . . . . . . . 102 Figure 2.2 Rejection Frequencies for χ23 with varying ρ . . . . . . . . . . . . . . . 102 Figure 2.3 Rejection Frequencies for Normal(0,1) with varying π2 , (π1 = τ = 0) . 105 Figure 2.4 Rejection Frequencies for χ23 with varying π2 , (π1 = τ = 0) . . . . . . 105 Figure 2.5 Rejection Frequencies for Normal 0, 1 + 21 v2 with varying π2 , (π1 = τ = 0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 Figure 2.6 Rejection Frequencies for Normal 0, 1 + 12 v2 with varying π2 , (π1 = τ = 0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 xi CHAPTER 1 MULTIPLE FRACTIONAL RESPONSE VARIABLES WITH CONTINUOUS ENDOGENOUS EXPLANATORY VARIABLES 1.1 INTRODUCTION Fractional responses have interesting functional form issues that have been attracting econometricians’ attentions. The research began with a single fractional response, a fractional scalar yi , which has a salient feature - the bounded nature: 0 ≤ yi ≤ 1. Then it has moved to two kinds of systems of fractional responses. One is panel data setting in which a cross sectional unit has relatively smaller time periods. The other is multiple responses in which a cross sectional unit has a set of several choices. For a single fractional response, an OLS estimator or an IV estimator of a linear model are consistent for the parameters in the linear projection. They, however, do not guarantee that their fitted values lie within the unit interval nor that their partial effects estimates y , is a for regressors’ extreme values are good.1 The log-odds transformation, log 1−y traditional solution to recognize the bounded nature. But it requires the response to be strictly inside the unit interval. Papke and Wooldridge (1996) introduce a quasi maximum likelihood estimation (QMLE), which is applicable even when the response takes the boundary values. Their nonlinear estimation method directly models the conditional mean of the response as an appropriate function. 1 Theses are the same drawbacks as the linear probability model for a binary response has. 1 Papke and Wooldridge (2008) extend their single fractional response discussion to a panel data setting with allowing for endogeneity. They allow time invariant unobserved effect to be correlated with explanatory variables and develop another QMLE method employing a control function approach to account for endogeneity. Multiple fractional responses have one additional feature as well as the bounded nature: an adding-up constraint. The sum of a cross sectional unit’s multiple responses is one. For example, a researcher studies student performance for a test in a state. Suppose that she is interested in how the public school spending affects the test outcome and her response variable is a set of students’ pass and fail shares in a district i, (yi,pass , yi,fail ) where yi,pass + yi,fail = 1. This example can fall into the single fractional response category since there are only two shares. However, if the test outcomes are graded into the student’s level of proficiency from Level 1 to Level 4 and district-level data she can access contain the four shares (yi,level1 , yi,level2 , yi,level3 , yi,level4 ) where ∑41 yi,levelg = 1 instead of the pass and fail rates, the types of behaviors described by the single fractional response analysis are limited. Hence, an alternative estimation method is required to exploit all of the available information when there are more than two shares. Such an estimation method is proposed by Sivakumar and Bhat (2002). It is a method of QMLE with the multinomial distribution and the multinomial logit conditional mean specification. It is a multivariate generalization of the method proposed by Papke and Wooldridge (1996). Mullahy (2010) studies this method with more detail. Buis (2008) has written a STATA module of this QMLE, and dubs it as “fractional multinomial logit (fmlogit).” In this chapter, we also refer this QMLE as fractional multinomial logit or fmlogit. Although these studies develop a new estimation method, which can consistently estimate the parameters in the mean as long as the mean specification is correct, they do not address endogeneity. In empirical works, however, endogeneity often arises. In the student performance example, school spending could be endogenous because it is likely 2 to be correlated with some unobserved district effects. This endogeneity issue may lead to inconsistency of the fractional multinomial logit estimation. In this chapter, we develop an estimation method for multiple fractional responses with endogenous explanatory variables. In the model, we allow a continuous endogenous explanatory variable to be correlated with an unobserved omitted variable. Then we propose a two step estimation method employing a control function approach to deal with the endogeneity. The first step generates a control function and the second step applies fractional multinomial logit with including the control function as an extra regressor in the conditional mean. The method can provide a consistent estimator of the conditional mean parameters provided that the conditional mean specification in the second step is correct. A distinct feature of this method is that although the multinomial logit specification in the second step is sensible as a multiple fractional responses’ conditional mean, it is not underpinned by usual structural assumptions. The functional form of the conditional mean in the second step is determined by the two structural components. One is the functional form of the conditional mean depending on the unobserved omitted variable (structural conditional mean). The other is the distributional assumption of the error, which appears when the control function approach is combined. However, there are no closed forms for them to allow the second step conditional mean to be multinomial logit. Thus we suggest directly specifying the conditional mean of the second step as multinomial logit without usual assumptions about them. But we would like to examine how the two step estimation method works if the multinomial logit specification is wrong by conducting Monte Carlo simulations. The simulations focus on whether or not the estimates by the proposed two step estimation method can approximate well the average partial effects of the endogenous explanatory variable, which is the partial effects of the endogenous explanatory variable on the conditional mean averaged across the population distribution of the endogenous explanatory vari- 3 able. Further, we compare the method’s approximation ability with an alternative linear model’s. The simulation results provide evidence that even though the conditional mean is misspecified, the two step estimation method with a strong instrument yields a good approximation. Although a weak instrument deteriorates its approximation performance, it still outperforms an alternative linear approach. The rest of the chapter is organized as follows. Section 1.2 describes the model and the two step estimation method.Section 1.3 presents a Monte Carlo simulation design and results where the conditional mean of the two step estimation method is misspecified. Section 1.4 includes an application of the method to examine the relationship between public school spending and the fourth grade math test outcome for Michigan. And Section 1.5 concludes the chapter. 1.2 THE MODEL AND ESTIMATION WITH ENDOGENEITY We assume that a random sampling across the cross section is available, and each cross sectional unit i has G choices where the sum of i’s responses is one. The dependent variable for i is expressed as   yi1    ..   .       yi =   yig     ..   .    yiG (1.1) G ×1 where 0 ≤ yig ≤ 1, g = 1, · · · , G, 4 (1.2) and G ∑ yig = 1. (1.3) g (1.2) and (1.3) represent the bounded nature and the adding-up constraint, respectively. To represent endogeneity in the model, we assume that there is a continuous endogenous explanatory variable wig , and that it is correlated with an unobserved omitted variable rig . To simplify the exposition, wig and rig are assumed to be invariant across choices: ∀ g, wig = wi and rig = ri . Then, for a set of explanatory variables in all choices, Xi = (xi1 , · · · , xiG ), we assume E(yig |Xi , ri ) = E(yig |Zi , wi , ri ) = Gg (Zi1 , wi , ri ; β), g = 1, · · · , G, (1.4) where 0 < Gg (·) < 1 (1.5) and G ∑ Gg (·) = 1. (1.6) g Zi ≡ (zi1 , · · · , ziG ) is a set of exogenous variables in all choices where zig = (zi1g zi2g ) indicates exogenous variables for choice g and Zi1 ≡ (zi11 , · · · , zi1G ) is a set of zi1g , ∀ g. (1.5) ensures that the fitted value will lie between zero and one. The adding-up constraint (1.3) leads to (1.6). Any function satisfying both (1.5) and (1.6) can be specified for G (·). wi can appear very flexibly in (1.4); for example, we can add wi2 to allow for the quadratic effect of w. If wi and wi2 appear in the specification, plug-in methods are subject to the “forbidden regression” problem as Wooldridge (2010) discusses. To deal with the endogeneity, we employ a control function approach. It includes extra regressors in the estimating equation so that the remaining variation in the endogenous explanatory variable would not be correlated with the unobservables. Since the approach 5 requires an exclusion restriction, a part of Zi appears in Gg (·). We further assume wi = f ( Z i ; π ) + v i ri = ρvi + ei (1.7) (1.8) and (ri , vi ) is independent of Zi . (1.9) (1.7) models the endogenous variable wi as a function of Zi where π is the parameter vector. It includes the exogenous variables excluded from (1.4) so that the instruments could be allowed to be correlated with w. (1.8) models the omitted variable ri as a linear function of the reduced form error vi , which plays a role of the control function - the extra regressor - in this study, where ei is independent of wi . (1.8) is for simplicity; it can be allowed to be more flexible by modeling it with polynomial functions of vi as well as vi . (1.8) reveals that if there is any correlation between wi and ri , it can only come through vi . So ρ shows how much wi is correlated with ri , and consequently tells whether wi is endogenous or not. Due to (1.9), wi cannot have discreteness. The independence assumption implies D ( ei | Z i , v i ) = D ( ei ). (1.10) (1.8) and (1.9) ensure that a single control function, vi , can correct the endogeneity of wi even when flexible functional forms for wi appear in (1.4). If we assume a parametric model for the distribution in (1.10), then one could derive the mean function conditional on Xi as E(yig |Xi ) = K g (Zi1 , wi , vi ; θ) (1.11) 0 < K g (·) < 1 (1.12) where 6 and G ∑ Kg (·) = 1. (1.13) g If we know the functional form of K g (·) and vi is observed, θ can be estimated by nonlinear least squares or a QMLE using multinomial distribution by specifying K g (·) as a proper functional form satisfying (1.12) and (1.13). However, since vi is unobserved here, a simple way to estimate θ is to replace vi with a consistent estimator of vi and apply one of those estimation methods. In general, it is difficult to start with function Gg (·) and a distribution for ei and obtain K g (·) as a simple function. Instead, the proposal is to directly model K g (·) parametrically. A natural choice for a proper functional form of K g (·) is multinomial logit, K g (hi ; θ) = exp hi θg ∑hG exp (hi θh ) (1.14) where hi = (xi1 vi ) = (zi1 wi vi ) is a 1 × p vector, θ = (θ1 . . . θG ) is a pG × 1 parameter vector, θg is a p × 1 vector, g = 2, · · · , G, and θ1 = 0.2 In the basic multinomial logit model, a set of explanatory variables change by unit i but not by choice g. Its coefficient parameters change by choice g, instead.3 In accordance with this choice, we rewrite (1.7) as a linear function of zi = (zi1 zi2 )1× M : wi = zi π + vi = zi1 π 1 + zi2 π 2 + vi (1.15) where π = (π 1 π 2 ) is a M × 1 parameter vector and the constant is subsumed in zi1 . The transformation of w should be carefully chosen to yield (1.15) where vi is arguably independent of zi . Plus, we can add zi in a flexible way. (1.15) is to simplify the notation. Then we propose the following procedure for θ: 2 The first choice is a reference. Hence this specification is appropriate for problems where the characteristics of choices are unimportant or are not of interest. 3 7 Procedure 1.2.1 Step 1. Obtain the OLS residual vi from the regression of wi on zi . Step 2. Apply fractional multinomial logit of (yi1 , yi2 , · · · , yiG ) on zi1 , wi and vi to estimate θ. This is a QMLE with (1.14) and the following log likelihood for i replacing vi with vi : G i ( θ) = ∑ yig log K g (hi ; θ). (1.16) g Procedure 1.2.1 yields a consistent estimator of θ under (1.14). Its consistency does not hinge on whether or not (1.16) is true. It is because the multinomial distribution is a member of the linear exponential family (LEF). Gourieroux et al. (1984) show that a QMLE with a distribution in the LEF provides a consistent estimate of the parameters in a correctly specified conditional mean even when the rest of distribution is misspecified. Furthermore, Procedure 1.2.1 is able to provide a very useful estimator for the quantity regarding the structural conditional mean Gg (·). Dropping the cross-sectional unit i, the partial effect of interest for a continuous explanatory variable x1j , the jth element of x1 is ∂Gg (x1 , r; β) ∂E(y g |x, r ) = , ∂x1j ∂x1j ∀ g, (1.17) where x = (z, w) and x1 = (z1 , w). However, (1.17) is not identified because r is unobserved. Thus the quantity of more interest is the average partial effect (APE), which can be identified by averaging (1.17) over the distribution of r: Er ∂Gg (x01 , r; β) ∂x1j , ∀g (1.18) where the APEs are evaluated at x01 , a set of fixed values of the covariates. From Wooldridge 8 (2010, section 2.2.5) Er ∂Gg (x01 , r; β) ∂x1j = Ev ∂K g (x01 , v; θ) ∂x1j (1.19) under (1.9) and (1.15). Hence Procedure 1.2.1 can estimate the APE on the structural conditional mean Gg (·) even though it does not estimate β, the structural parameters. The asymptotic variances of θ and the APE estimator need to consider the additional variation from the first step of the procedure. The appendix derives their valid asymptotic variances. Notice that the two step estimation method does not assume anything about a functional form of Gg (·) and a distribution of D (e) although they determine the functional form of K g (·). If a combination of their specific forms were able to obtain a certain explicit functional form for K g (·) satisfying (1.12) and (1.13), we would assume those specific forms and maximize the multinomial distribution with a derived K g (·) from those assumptions. However, there are no closed forms of Gg (·) and D (e) to generate an explicitly known form satisfying (1.12) and (1.13). Considering that Gg (·) should satisfy (1.5) and (1.6), a natural choice for Gg (·) is also multinomial logit. Yet, it cannot derive a closed form of K g (·) whatever D (e) is. If Gg (·) is specified as (1.20), it can derive an explicit form by assuming that e is normally distributed: Gg (Xi ; β) = Φ(xig β), G −1 GG ( X i ; β ) = 1 − ∑ g = 1, · · · , G − 1, Φ(xig β) (1.20) g where Φ(·) is the standard normal cumulative distribution function. Based on the mixing property of the normal distribution, the derived function is similar as (1.20). However, GG (·) and the derived function for choice G are not necessarily between zero and one, which violate (1.5) and (1.12). It is the same drawback that the linear models have. So (1.20) is not appropriate, either. 9 Alternatively, the two step estimation method suggests directly specifying K g (·) as multinomial logit. That is, instead of making usual assumptions about Gg (·) and D (e), it implicitly assumes that their combination derives the multinomial logit functional form of K g (·).4 To be cautious about this, we conduct Monte Carlo simulations to investigate how it works as an approximation when the specification is wrong. Some researchers may be inclined to use a linear model rather than a nonlinear model, in which case it would be natural to drop one of the G equations and apply a linear method, say a linear control function (LCF) approach to remaining equations. Then, for choice g, w’s coefficient parameter estimate by the linear method is comparable to (1.23). Procedure 1.2.2 summarizes the LCF approach: Procedure 1.2.2 Step 1. Obtain the OLS residual vi from the regression of wi on zi . This is the same as Step 1 of Procedure 1.2.1. Step 2. For each g = 2, · · · , G,5 regress yig on zi1 , wi and vi to estimate γ g , where γ g = (γzg γwg γvg ) is a 4 × 1 coefficient parameter vector for choice g. Obtain γ1 based on γ1 = e1 − γ2 − γ3 , where e1 is a 4 × 1 unit vector.6 The asymptotic variance of γ g also needs the adjustment taking the extra variation from the first step into account; see the appendix. The simulations compare the approximation by the two step estimation method with the misspecified conditional mean and one by this LCF approach. 4 The approach reflects the manner in which Petrin and Train (2010) employ a control function approach when their dependent variable is a multinomial choice. They divide the structural error in their consumer utility into two parts to generate a mixed logit. Without a distributional assumption of the structural error, one divided part is assumed to be normal and the other is assumed to be type 1 extreme value. 5 The first choice is dropped as the reference choice. 6 The coefficients of a variable across choices sum to be 0 and those of the constant sum to be 1 because of (1.3). 10 1.3 MONTE CARLO SIMULATIONS 1.3.1 The Quantities of Interest The quantity of interest in the simulations is the APE of the endogenous explanatory variable w, Er ∂Gg (x01 , r; β) ∂w = Er ∂Gg (z01 , w0 , r; β) ∂w , ∀ g. (1.21) Since (1.21) depends on where it is evaluated, the simulations use two approaches to obtain a single number. One is averaging (1.21) out across the sample again and the other is evaluating (1.21) at a certain set of values, (z1 , w p ) where z1 is the mean of z1 and w p stands for the pth percentile of w’s distribution. We call the former “average APE” and the latter “percentile APE.” If the two step estimation method’s mean specification (1.14) is correct, (1.21) is obtained by estimating Ev ∂K g (x01 , v; θ) ∂w = Ev K g (x1◦ , v; θ) · ∑hG θwh exp (x1◦ θxh + θvh v) θwg − ∑hG exp x1◦ θxh + θvh v (1.22) where θxh = (θzh θwh ) . Since the distribution of v is not assumed, (1.22) can be estimated by averaging out vi across the sample, instead: 1 N N   ∑ Kg (x01, vi ; θ) · θwg − i ∑hG θwh exp x1◦ θxh + θvh vi ∑hG exp x1◦ θxh + θvh vi   (1.23) where θ is obtained from Procedure 1.2.1. The simulations let (1.14) be misspecified, and so we examine how close (1.23) is to (1.21), if it is closer than the estimates by the LCF approach. Some simulations allow for w’s quadratic effect by including w2 in the model. These simulations add v2i and v3i in the two procedures’ second steps to see if it improves their 11 approximations. 1.3.2 Data Generating Process For the simulations, the number of observation N and choice G are 500 and 3, respectively. We use 1000 replications. The covariates For each replication, we generate 500 observations of zi , wi , ri , vi and ei as following. • zi = (zi1 zi2 ) = (1 zi1 zi2 )1×3 where zi1 = (1 zi1 ) and       zi1   0   1 τ    ∼ MV Normal   ,   , zi2 0 τ 1 τ ∈ {0, −0.5}. There are one included exogenous variable and one excluded exogenous variable where they are drawn from the multivariate normal distribution. A simulation allows them to be correlated: τ = −0.5. • D (e) is one of the three distributions: (a) ei ∼ Normal (0, 1) (b) ei ∼ Logistic(0, 1) (c) ei ∼ χ23 To study various misspecifications, three distributions of e are in use: two symmetric distributions and one asymmetric distribution. • vi ∼ Normal (0, σ2 )7 7 σ2 is adjusted for the variance of wi to be invariant across the simulations. 12 • wi = π1 zi1 + π2 zi2 + vi The endogenous variable is generated based on (1.15). The coefficient parameter for the constant is set to be zero. • ri = ρvi + ei The omitted variable is generated based on (1.8). The structural conditional mean Gg (·) specification: We specify Gg (·) as multinomial logit because it satisfies (1.5) and (1.6): E(yig |xi , ri ) = Gg (zi1 , wi , ri ; β) = exp zi1 βzg + wi β wg + ri β rg ∑3h exp (zi1 βzh + wi β wh + ri β rh ) (1.24) where β = ( β1 β2 β3 ) is a 12 × 1 parameter vector, β g = ( βzg β wg β rg ) is a 4 × 1 parameter vector for g = 2, 3, and β1 = 0 since the first choice is chosen as a reference. Other parameters are set to be 1: β g = (1 1 1 1) for g = 2, 3.8 Note that (1.14) is misspecified under (1.24) and any of the three distributions for e. The multiple fractional dependent variables y: The multiple fractional dependent variables for each observation i are generated by the following process. (1) Calculate the response probabilities Gi1 , Gi2 , and Gi3 by using (1.24) and the covariates generated above.9 (2) Draw 100 multinomial outcomes among 1, 2, and 3 based on the calculated response probabilities. 8 For the simulations including w2 , Gg (zi1 , wi , ri ; β) = exp zi1 βzg + wi β wg + wi2 β w2 g + ri β rg ∑3h exp zi1 βzh + wi β wh + wi2 β w2 h + ri β rh where β is a 15 × 1 parameter vector, β g = ( βzg β wg β w2 g β rg ) = (1 1 1 − 0.1 1) for g = 2, 3. 9 (1.25) is used for the model including w2 . 13 (1.25) (3) Count the frequencies and obtain the proportion for each outcome. For instances, if 1 is drawn 50 times, 2 is drawn 30 times, and 3 is drawn 20 times for an observation i, then yi1 = 0.5, yi2 = 0.3, and yi3 = 0.2.10 Appendix includes a table showing the summary of (yi1 , yi2 , yi3 ) generated by this process. 1.3.3 Simulation Results The first columns of the simulation result tables show whether or not the quadratic effect of w is included in the model: w indicates that it is not, and w2 indicates that it is. The tables report the mean of (1.21) over the 1000 replications (True) along with the results of the two step estimation method and the LCF approach - the means of the estimates (Mean), their standard deviations (SD), and the means of their adjusted standard errors (SE). For the model including the quadratic effect, there are two additional estimation results allowing for the control function in a flexible way: Two step (flexible) and LCF (flexible). They includ v, v2 ,and v3 in the second stage; We would like to examine if it helps the approximations. Condition 1: π2 = 1, ρ = 1 In generating the data for Condition 1, we allow the instrument and the endogeneity to be strong: wi = zi2 + vi and ri = vi + ei .11,12 The simulation results under Condition 1 demonstrate that while both the two step estimation method and the LCF approach provide good approximations to average APEs, the two step estimation method provides better percentile APE estimates. In Table 1.1, the average APE estimates by the two methods are quite similar to true APEs. However, 10 Through this process, the upper corner 1 is generated only for the reference choice while the lower corner 0 is generated for all three choices because of the multinomial logit response probabilities structure. 11 z has no effect on w : π = 0 1 i1 i 12 The simulations allowing z to affect w (w = 0.5z + z + v , τ = −0.5, ρ = 1) provide similar i1 i i i1 i2 i results as those under Condition 1. 14 Table 1.2 and Table 1.3 illustrate that the percentile APE estimates by the two step estimation method have less biased without any sign distortions across the percentiles of w’s distribution than those by the LCF approach.13 The percentile estimates by the LCF approach under the χ23 distribution have the opposite directions to the true APEs at the 90th percentile. The results also present that allowing for the flexible forms of vi does not help the approximations when w’s quadratic effect is included in the models. In Table 1.1 through 1.3, the estimates with allowing for the flexible forms of vi are similar to those without it. The empirical distributions of APE estimates in Figure 1.1 through 1.6, confirm these results. 13 The results of percentile APEs under the logistic distribution are similar to those under the normal distribution in general. 15 Table 1.1: Average APEs under Condition 1 D (e) g True Two Step w LCF w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) 1. 2. Mean Mean SD SE Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE Normal 1 2 -0.117 0.059 -0.117 0.059 0.008 0.005 -0.109 0.055 0.009 0.005 0.009 0.005 -0.127 0.063 -0.127 0.063 0.009 0.005 -0.126 0.063 0.009 0.005 -0.118 0.059 0.009 0.005 0.008 0.005 -0.118 0.059 0.008 0.005 0.008 0.005 3 0.059 0.058 0.005 0.054 0.005 0.005 0.063 0.063 0.005 0.063 0.005 0.059 0.005 0.005 0.059 0.005 0.005 Logistic 1 2 -0.108 0.054 -0.108 0.054 0.011 0.006 -0.102 0.051 0.011 0.006 0.011 0.006 -0.116 0.058 -0.115 0.058 0.011 0.006 -0.115 0.057 0.011 0.006 -0.108 0.054 0.011 0.006 0.011 0.006 -0.109 0.054 0.011 0.006 0.011 0.006 3 0.054 0.054 0.006 0.051 0.006 0.006 0.058 0.058 0.006 0.057 0.006 0.054 0.006 0.006 0.054 0.006 0.006 1 -0.049 -0.048 0.007 -0.052 0.008 0.008 -0.061 -0.060 0.008 -0.058 0.008 -0.064 0.008 0.008 -0.064 0.008 0.008 χ23 2 0.025 0.024 0.004 0.026 0.005 0.005 0.031 0.030 0.004 0.029 0.004 0.032 0.005 0.004 0.032 0.005 0.004 3 0.025 0.024 0.004 0.026 0.005 0.005 0.031 0.030 0.005 0.029 0.004 0.032 0.005 0.004 0.032 0.005 0.004 π1 = 0, π2 = 1, ρ = 1 We cannot obtain the standard errors of average APE estimates by the two step estimation method; the process in STATA to calculate them takes too much time to complete. 16 Table 1.2: Percentile APEs under Condition 1 and Normal distribution wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) 1. 2. Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 1 -0.181 -0.184 0.016 0.016 -0.241 -0.240 0.022 0.021 -0.240 0.022 0.021 -0.206 0.018 0.016 -0.190 0.018 0.016 10th 2 0.090 0.092 0.008 0.008 0.121 0.120 0.011 0.011 0.120 0.011 0.011 0.103 0.009 0.008 0.095 0.009 0.008 3 0.090 0.092 0.008 0.008 0.121 0.120 0.011 0.011 0.120 0.011 0.011 0.103 0.009 0.008 0.095 0.009 0.009 1 -0.171 -0.173 0.018 0.018 -0.206 -0.208 0.021 0.020 -0.208 0.021 0.020 -0.164 0.013 0.012 -0.156 0.013 0.012 25th 2 0.085 0.087 0.009 0.009 0.103 0.104 0.011 0.010 0.104 0.011 0.010 0.082 0.007 0.006 0.078 0.007 0.006 APEs at (z1 , w p ). π1 = 0, π2 = 1, ρ = 1. 17 3 0.085 0.087 0.009 0.009 0.103 0.104 0.011 0.010 0.104 0.011 0.010 0.082 0.007 0.006 0.078 0.006 0.006 1 -0.132 -0.131 0.012 0.011 -0.133 -0.133 0.014 0.011 -0.132 0.014 0.011 -0.118 0.009 0.008 -0.118 0.009 0.008 50th 2 0.066 0.066 0.006 0.006 0.066 0.067 0.007 0.006 0.066 0.008 0.006 0.059 0.005 0.005 0.059 0.005 0.005 3 0.066 0.066 0.006 0.006 0.066 0.066 0.007 0.006 0.066 0.007 0.006 0.059 0.005 0.005 0.059 0.005 0.005 1 -0.085 -0.082 0.007 0.004 -0.073 -0.072 0.009 0.007 -0.071 0.010 0.008 -0.071 0.010 0.010 -0.080 0.010 0.009 75th 2 0.043 0.041 0.004 0.003 0.036 0.036 0.005 0.005 0.036 0.006 0.005 0.036 0.006 0.006 0.040 0.005 0.005 3 0.043 0.041 0.004 0.003 0.036 0.036 0.006 0.005 0.036 0.006 0.005 0.035 0.006 0.006 0.040 0.006 0.005 Table 1.2: (cont’d) wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 18 1 -0.050 -0.048 0.006 0.003 -0.040 -0.039 0.007 0.005 -0.039 0.007 0.006 -0.029 0.016 0.014 -0.045 0.014 0.012 90th 2 0.025 0.024 0.004 0.003 0.020 0.020 0.005 0.005 0.019 0.006 0.005 0.014 0.008 0.008 0.023 0.008 0.007 3 0.025 0.024 0.004 0.003 0.020 0.019 0.006 0.005 0.019 0.006 0.005 0.014 0.009 0.008 0.023 0.008 0.007 Table 1.3: Percentile APEs under Condition 1 and χ23 distribution wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 1 -0.100 -0.096 0.022 0.020 -0.153 -0.134 0.033 0.029 -0.133 0.032 0.028 -0.162 0.019 0.017 -0.143 0.022 0.020 10th 2 0.050 0.048 0.011 0.010 0.076 0.067 0.017 0.015 0.067 0.016 0.015 0.081 0.010 0.009 0.072 0.011 0.010 3 0.050 0.048 0.011 0.010 0.076 0.067 0.017 0.015 0.066 0.016 0.015 0.081 0.010 0.009 0.071 0.011 0.010 1 -0.068 -0.061 0.011 0.010 -0.085 -0.081 0.013 0.012 -0.078 0.013 0.012 -0.116 0.013 0.012 -0.106 0.015 0.014 25th 2 0.034 0.031 0.006 0.005 0.043 0.041 0.007 0.007 0.039 0.007 0.007 0.058 0.007 0.007 0.053 0.008 0.007 1. 3 0.034 0.030 0.006 0.005 0.043 0.040 0.007 0.007 0.039 0.007 0.007 0.058 0.007 0.007 0.053 0.008 0.007 1 -0.038 -0.033 0.004 0.003 -0.038 -0.040 0.007 0.006 -0.039 0.007 0.006 -0.064 0.009 0.008 -0.064 0.008 0.008 50th 2 0.019 0.017 0.003 0.003 0.019 0.020 0.004 0.004 0.019 0.004 0.005 0.032 0.005 0.004 0.032 0.005 0.004 3 0.019 0.017 0.003 0.003 0.019 0.020 0.004 0.004 0.019 0.004 0.005 0.032 0.005 0.005 0.032 0.005 0.004 1 -0.018 -0.017 0.002 0.002 -0.016 -0.018 0.004 0.003 -0.018 0.004 0.004 -0.013 0.008 0.007 -0.023 0.007 0.006 75th 2 0.009 0.009 0.003 0.003 0.008 0.009 0.003 0.003 0.009 0.003 0.005 0.006 0.005 0.005 0.011 0.004 0.004 3 0.009 0.009 0.003 0.003 0.008 0.009 0.003 0.003 0.009 0.004 0.005 0.006 0.005 0.005 0.011 0.005 0.004 APEs at (z1 , w p ). π1 = 0, π2 = 1, ρ = 1. 3. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 2. 19 Table 1.3: (cont’d) wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 20 1 -0.009 -0.009 0.002 0.001 -0.007 -0.008 0.002 0.002 -0.008 0.002 0.002 0.034 0.012 0.011 0.015 0.012 0.010 90th 2 3 0.004 0.004 0.005 0.005 0.003 0.003 0.003 0.003 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.005 0.006 0.006 -0.017 -0.017 0.007 0.007 0.006 0.006 -0.007 -0.007 0.007 0.007 0.006 0.006 −.14 −.12 ape1 −.08 100 .04 .05 .06 ape2 2Step .07 .08 −.12 ape1 2Step 2Step(flexible) −.1 LCF LCF(flexible) −.08 .06 ape3 .07 .08 LCF 60 0 20 density 40 60 20 0 −.14 .05 2Step density 40 60 density 40 20 0 −.16 .04 LCF 80 LCF 80 2Step −.1 80 −.16 0 20 density 40 60 80 100 80 density 40 60 20 0 0 20 density 40 60 80 100 Figure 1.1: Empirical distributions of Average APE estimates under Condition 1 and Normal distribution .04 .05 .06 ape2 2Step 2Step(flexible) 1. The .07 LCF LCF(flexible) .08 .04 .05 .06 ape3 2Step 2Step(flexible) .07 .08 LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 21 −.12 −.1 ape1 −.08 80 .03 .04 .05 ape2 2Step .06 .07 .04 LCF −.1 −.08 ape1 2Step 2Step(flexible) LCF LCF(flexible) .07 LCF 60 0 20 density 40 60 20 0 −.12 .06 2Step density 40 60 density 40 20 0 −.14 .05 ape3 80 LCF 80 2Step −.06 80 −.14 0 20 density 40 60 80 60 density 40 20 0 0 20 density 40 60 80 Figure 1.2: Empirical distributions of Average APE estimates under Condition 1 and Logistic distribution .04 .05 .06 ape2 2Step 2Step(flexible) 1. The .07 .08 .03 .04 .05 .06 .07 .08 ape3 LCF LCF(flexible) 2Step 2Step(flexible) LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 22 −.07 −.06 −.05 −.04 −.03 100 0 20 density 40 60 80 100 80 density 40 60 20 0 0 20 density 40 60 80 100 Figure 1.3: Empirical distributions of Average APE estimates under Condition 1 and χ23 distribution −.02 .01 .02 ape1 .04 LCF −.06 ape1 2Step 2Step(flexible) −.05 −.04 LCF LCF(flexible) .04 LCF 60 0 20 density 40 60 20 0 −.07 .03 2Step density 40 60 density 40 20 −.08 .02 80 2Step 0 −.09 .01 ape3 80 LCF 80 2Step .03 ape2 .01 .02 .03 ape2 2Step 2Step(flexible) 1. The .04 LCF LCF(flexible) .05 .01 .02 .03 ape3 2Step 2Step(flexible) .04 .05 LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 23 −.3 −.25 −.2 −.15 50 0 10 density 20 30 40 50 40 density 20 30 10 0 0 10 density 20 30 40 50 Figure 1.4: Empirical distributions of Percentile APE estimates including w2 under Condition 1 and Normal distribution .08 .1 ape1_p10 .16 .08 LCF LCF(flexible) −.25 −.2 −.15 ape1_p25 2Step 2Step(flexible) LCF LCF(flexible) .12 ape3_p10 .14 2Step 2Step(flexible) LCF LCF(flexible) .08 .12 .16 40 0 20 density 40 0 20 density 40 density 20 0 −.3 .1 60 2Step 2Step(flexible) .14 60 LCF LCF(flexible) 60 2Step 2Step(flexible) .12 ape2_p10 .06 .08 .1 ape2_p25 2Step 2Step(flexible) 24 .12 LCF LCF(flexible) .14 .06 .1 ape3_p25 2Step 2Step(flexible) LCF LCF(flexible) .14 −.14 ape1_p50 −.12 −.1 LCF LCF(flexible) −.1 −.06 80 .05 .06 .07 ape2_p50 .08 .09 2Step 2Step(flexible) LCF LCF(flexible) .03 .05 .04 −.08 ape1_p75 2Step 2Step(flexible) LCF LCF(flexible) −.04 .06 .07 ape3_p50 .08 .09 LCF LCF(flexible) 60 0 20 density 40 60 0 20 density 40 60 density 40 20 0 −.12 .05 2Step 2Step(flexible) 80 2Step 2Step(flexible) .04 80 −.16 80 −.18 0 20 density 40 60 80 60 density 40 20 0 0 20 density 40 60 80 Figure 1.4: (cont’d) .02 .04 ape2_p75 2Step 2Step(flexible) 25 LCF LCF(flexible) .06 .02 .03 .04 ape3_p75 2Step 2Step(flexible) .05 LCF LCF(flexible) .06 −.06 −.04 ape1_p90 2Step 2Step(flexible) −.02 LCF LCF(flexible) 0 80 0 20 density 40 60 80 60 density 40 20 0 0 20 density 40 60 80 Figure 1.4: (cont’d) 0 .01 .02 ape2_p90 2Step 2Step(flexible) 26 .03 LCF LCF(flexible) .04 0 .01 .02 ape3_p90 2Step 2Step(flexible) .03 LCF LCF(flexible) .04 −.25 −.2 ape1_p10 −.1 40 .04 .06 .08 .1 ape2_p10 .12 2Step 2Step(flexible) .14 .04 −.2 −.15 ape1_p25 2Step 2Step(flexible) LCF LCF(flexible) −.1 .08 .1 ape3_p10 2Step 2Step(flexible) .12 .14 LCF LCF(flexible) 40 0 20 density 40 0 20 density 40 density 20 0 −.25 .06 LCF LCF(flexible) 60 LCF LCF(flexible) 60 2Step 2Step(flexible) −.15 60 −.3 0 10 density 20 30 40 30 density 20 10 0 0 10 density 20 30 40 Figure 1.5: Empirical distributions of Percentile APE estimates including w2 under Condition 1 and Logistic distribution .04 .06 .08 ape2_p25 2Step 2Step(flexible) 27 .1 .12 LCF LCF(flexible) .06 .08 .1 ape3_p25 2Step 2Step(flexible) .12 LCF LCF(flexible) .14 −.2 −.15 −.1 −.05 80 0 20 density 40 60 80 60 density 40 20 0 0 20 density 40 60 80 Figure 1.5: (cont’d) .02 .04 .06 ape2_p50 ape1_p50 2Step 2Step(flexible) −.04 LCF LCF(flexible) −.02 .08 .1 LCF LCF(flexible) 60 0 20 density 40 60 20 −.08 −.06 ape1_p75 .06 ape3_p50 2Step 2Step(flexible) 0 −.1 .04 LCF LCF(flexible) density 40 60 density 40 20 0 −.12 .02 80 2Step 2Step(flexible) .1 80 LCF LCF(flexible) 80 2Step 2Step(flexible) .08 .02 .03 .04 ape2_p75 2Step 2Step(flexible) 28 .05 LCF LCF(flexible) .06 .01 .02 .03 .04 ape3_p75 2Step 2Step(flexible) .05 .06 LCF LCF(flexible) −.08 −.06 −.04 ape1_p90 2Step 2Step(flexible) −.02 0 LCF LCF(flexible) 60 0 20 density 40 60 40 density 20 0 0 20 density 40 60 Figure 1.5: (cont’d) −.01 0 .01 .02 ape2_p90 2Step 2Step(flexible) 29 .03 .04 LCF LCF(flexible) −.02 0 .02 ape3_p90 2Step 2Step(flexible) .04 LCF LCF(flexible) .06 −.2 −.15 ape1_p10 −.05 40 0 .05 .1 .15 .02 2Step 2Step(flexible) LCF LCF(flexible) −.1 −.08 ape1_p25 2Step 2Step(flexible) −.06 LCF LCF(flexible) −.04 .08 ape3_p10 .1 .12 LCF LCF(flexible) 40 0 20 density 40 0 20 density 40 density 20 −.12 .06 2Step 2Step(flexible) 60 LCF LCF(flexible) 0 −.14 .04 ape2_p10 60 2Step 2Step(flexible) −.1 60 −.25 0 10 density 20 30 40 30 density 20 10 0 0 10 density 20 30 40 Figure 1.6: Empirical distribution of Percentile APEs under Condition 1 and χ23 distribution .02 .03 .04 ape2_p25 2Step 2Step(flexible) 30 .05 .06 LCF LCF(flexible) .02 .03 .04 .05 ape3_p25 2Step 2Step(flexible) .06 LCF LCF(flexible) .07 −.06 −.05 −.04 ape1_p50 −.02 100 0 .01 .02 ape2_p50 2Step 2Step(flexible) .03 .04 0 −.02 ape1_p75 2Step 2Step(flexible) −.01 LCF LCF(flexible) .02 ape3_p50 2Step 2Step(flexible) .03 .04 LCF LCF(flexible) 100 0 50 density 100 0 50 density 100 density 50 0 −.03 .01 LCF LCF(flexible) 150 LCF LCF(flexible) 150 2Step 2Step(flexible) −.03 150 −.07 0 20 density 40 60 80 100 80 density 40 60 20 0 0 20 density 40 60 80 100 Figure 1.6: (cont’d) 0 .005 .01 ape2_p75 2Step 2Step(flexible) 31 .015 .02 LCF LCF(flexible) −.005 0 .005 .01 ape3_p75 2Step 2Step(flexible) .015 .02 LCF LCF(flexible) −.02 −.015 −.01 ape1_p90 2Step 2Step(flexible) −.005 0 200 0 50 density 100 150 200 150 density 100 50 0 0 50 density 100 150 200 Figure 1.6: (cont’d) −.01 0 .01 .02 ape2_p90 LCF LCF(flexible) 2Step 2Step(flexible) 32 −.01 0 .01 .02 ape3_p90 LCF LCF(flexible) 2Step 2Step(flexible) LCF LCF(flexible) Condition 2: π2 < 1, ρ = 1 To see if the above results are dependent on the instrument’s strong predictive power, we consider a condition where its predictive power is weaker than Condition 1 by generating the data with wi = π2 zi2 + vi where π2 ∈ {0.1, 0.2, 0.5} and ri = vi + ei . Staiger and Stock (1997) suggest a guideline for dividing weak and strong instruments by using the first step’s F statistic, which tests the hypothesis that the instruments are not correlated with the endogenous regressor; a threshold that they suggest is a F statistic of 10. Table 1.4 shows that the mean of F statistics testing the null hypothesis of π2 being 0 is not larger than 10 until π2 = 0.2. It also shows that about a half of 1000 replications has the F statistic which is larger than 10 as π2 = 0.2 and every replication has it as π2 = 0.5. Thus, according to the Staiger and Stock’s discussion, the instrument is weak as π2 = 0.1 and strong as π2 = 0.5. When π2 = 0.2, it barely manages to have properties as a strong instrument. Table 1.4: F statistics of the 1st step (H0 : π2 = 0) π2 F statistics 1. Mean SD ( F > 10) 0.1 3.379 (3.436) 0.053 0.2 10.942 (6.665) 0.485 0.5 71.888 (19.440) 1.000 1 507.208 (80.357) 1.000 ( F > 10) stands for the proportion of the F statistics being greater than 10 among the 1000 replications. Table 1.5 and Figure 1.7 to 1.9 illustrate that under the normal distribution, the average APE estimates by both the two methods are more biased and more volatile as π2 decreases; the weak instrument makes their approximations worse.14 But the mean squared errors (MSEs) in Table 1.6 suggest that, for all three distributions, the LCF approach is worse than the two step estimation method as a weak instrument is in use. 14 Figure 1.10 through 1.15 show that the results under the other distributions are similar. 33 Table 1.5: Average APEs under Condition 2 and Normal distribution π2 g w True Two Step LCF w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 1 -0.113 -0.087 0.110 -0.008 2.509 36.96 -0.121 -0.091 0.113 -0.085 0.109 0.035 2.954 43.42 0.032 3.301 47.57 0.1 2 3 1 0.056 0.056 -0.113 0.045 0.042 -0.096 0.059 0.067 0.061 0.033 -0.025 -0.085 1.511 1.318 0.105 27.92 19.25 0.088 0.061 0.061 -0.121 0.045 0.046 -0.102 0.063 0.065 0.064 0.042 0.043 -0.097 0.062 0.064 0.063 -0.036 0.001 -0.091 1.624 1.732 0.094 29.42 26.22 0.082 -0.033 0.000 -0.092 1.482 2.076 0.099 22.03 36.71 0.082 1. 0.2 2 0.057 0.048 0.032 0.043 0.052 0.045 0.061 0.051 0.033 0.049 0.034 0.045 0.044 0.041 0.046 0.046 0.041 3 0.057 0.048 0.033 0.042 0.055 0.047 0.061 0.051 0.034 0.049 0.034 0.046 0.054 0.045 0.046 0.056 0.045 1 -0.114 -0.111 0.020 -0.098 0.021 0.021 -0.122 -0.119 0.021 -0.117 0.022 -0.106 0.020 0.020 -0.106 0.020 0.019 0.5 2 0.057 0.055 0.011 0.049 0.011 0.011 0.061 0.059 0.011 0.058 0.011 0.053 0.011 0.011 0.053 0.010 0.010 3 0.057 0.055 0.011 0.049 0.011 0.011 0.061 0.059 0.011 0.058 0.012 0.053 0.011 0.011 0.053 0.011 0.010 π1 = 0, ρ = 1. We cannot obtain the standard errors of average APE estimates by the two step estimation method; the process in STATA to calculate them takes too much time to complete. 3. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 2. 34 0 .2 ape1 10 .6 −.2 −.1 0 ape2 .1 .2 2Step LCF .2 ape1 2Step 2Step(flexible) .4 .6 .2 LCF 8 2 0 −.3 −.2 −.1 0 .1 ape2 LCF LCF(flexible) 0 density 4 6 8 2 0 0 −.2 2Step density 4 6 8 density 4 6 2 0 −.2 −.4 ape3 10 LCF 10 2Step .4 10 −.2 0 2 4 density 6 8 10 8 6 density 4 2 0 0 2 4 density 6 8 10 Figure 1.7: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and Normal distribution 2Step 2Step(flexible) 1. The .2 −.4 −.2 0 .2 ape3 LCF LCF(flexible) 2Step 2Step(flexible) LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 35 −.3 −.2 −.1 0 .1 .2 15 0 5 density 10 15 10 density 5 0 0 5 density 10 15 Figure 1.8: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and Normal distribution −.1 −.05 ape1 0 .05 .1 .15 −.1 LCF 2Step 0 .1 ape1 2Step 2Step(flexible) LCF LCF(flexible) .2 LCF 10 0 5 density 10 0 5 density 10 density 5 −.1 .2 15 2Step 0 −.2 .1 ape3 15 LCF 15 2Step −.3 0 ape2 −.1 −.05 0 ape2 2Step 2Step(flexible) 1. The .05 .1 −.1 0 .1 .2 ape3 LCF LCF(flexible) 2Step 2Step(flexible) LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 36 −.15 −.1 −.05 0 40 0 10 density 20 30 40 30 density 20 10 0 0 10 density 20 30 40 Figure 1.9: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and Normal distribution 0 .02 .04 ape2 ape1 2Step 2Step(flexible) −.05 0 .08 LCF 30 10 0 0 .02 .04 .06 .08 ape2 LCF LCF(flexible) .06 density 20 30 10 −.1 ape1 .04 ape3 2Step 0 −.15 .02 LCF density 20 30 density 20 10 0 −.2 0 40 2Step .08 40 LCF 40 2Step .06 2Step 2Step(flexible) 1. The LCF LCF(flexible) .1 0 .02 .04 ape3 2Step 2Step(flexible) .06 .08 LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 37 0 .2 ape1 .4 .8 8 −.2 −.1 0 ape2 2Step .1 .2 .2 ape1 2Step 2Step(flexible) .4 .6 LCF LCF(flexible) .8 −.2 ape3 0 .2 LCF 6 0 2 density 4 6 2 0 0 −.4 2Step density 4 6 density 4 2 0 −.2 −.6 LCF 8 LCF 8 2Step .6 8 −.2 0 2 density 4 6 8 6 density 4 2 0 0 2 density 4 6 8 Figure 1.10: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and Logistic distribution −.6 −.4 −.2 ape2 2Step 2Step(flexible) 1. The 0 .2 −.4 −.2 0 .2 ape3 LCF LCF(flexible) 2Step 2Step(flexible) LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 38 −.3 −.2 −.1 0 .1 .2 15 0 5 density 10 15 10 density 5 0 0 5 density 10 15 Figure 1.11: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and Logistic distribution −.15 −.1 −.05 ape1 0 .05 .1 −.1 LCF 2Step 0 .1 ape1 2Step 2Step(flexible) .2 .15 LCF 10 5 0 −.1 −.05 0 .05 .1 ape2 LCF LCF(flexible) .1 density 10 0 5 density 10 density 5 −.1 .05 15 2Step 0 −.2 0 ape3 15 LCF 15 2Step −.3 −.05 ape2 2Step 2Step(flexible) 1. The LCF LCF(flexible) .15 −.2 −.1 0 ape3 2Step 2Step(flexible) .1 .2 LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 39 −.2 30 0 10 density 20 30 20 density 10 0 0 10 density 20 30 Figure 1.12: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and Logistic distribution 0 0 .02 .04 ape2 ape1 .08 LCF 2Step 2Step(flexible) −.05 0 .08 .1 LCF 30 10 0 0 .02 .04 .06 .08 ape2 LCF LCF(flexible) .06 density 20 30 10 0 −.1 ape1 .04 2Step density 20 30 density 20 10 −.15 .02 40 2Step 0 −.2 0 ape3 40 LCF 40 2Step .06 2Step 2Step(flexible) 1. The LCF LCF(flexible) .1 0 .02 .04 ape3 2Step 2Step(flexible) .06 .08 .1 LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 40 −.5 0 .5 8 0 2 density 4 6 8 6 density 4 2 0 0 2 density 4 6 8 Figure 1.13: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.1, and χ23 distribution 1 −.6 −.4 ape1 0 1 2Step 2Step(flexible) LCF LCF(flexible) 0 .2 LCF 8 0 2 density 4 6 8 2 .5 ape1 −.2 ape3 2Step 0 0 −.4 LCF density 4 6 8 density 4 6 2 0 −.5 −.6 10 2Step .2 10 LCF 10 2Step −.2 ape2 −.6 −.4 −.2 ape2 2Step 2Step(flexible) 1. The 0 .2 LCF LCF(flexible) −.6 −.4 −.2 ape3 2Step 2Step(flexible) 0 .2 LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 41 −.1 0 ape1 .2 20 −.1 0 .1 .2 2Step 2Step 2Step(flexible) .1 LCF LCF(flexible) .2 .05 .1 LCF 15 0 5 density 10 15 5 0 0 ape1 0 ape3 2Step density 10 15 density 10 5 −.1 −.05 LCF 20 LCF 0 −.2 −.1 ape2 20 2Step .1 20 −.2 0 5 density 10 15 20 15 density 10 5 0 0 5 density 10 15 20 Figure 1.14: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.2, and χ23 distribution −.1 −.05 0 ape2 2Step 2Step(flexible) 1. The .05 .1 −.1 0 .1 .2 ape3 LCF LCF(flexible) 2Step 2Step(flexible) LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 42 −.1 −.05 0 .05 50 0 10 density 20 30 40 50 40 density 20 30 10 0 0 10 density 20 30 40 50 Figure 1.15: Empirical distributions of Average APE estimates under Condition 2, π2 = 0.5, and χ23 distribution −.02 0 ape1 .06 2Step 2Step(flexible) 0 LCF LCF(flexible) .05 .04 .06 LCF 40 0 10 density 20 30 40 10 −.05 ape1 .02 ape3 2Step 0 −.1 0 LCF density 20 30 40 density 20 30 10 0 −.15 −.02 50 2Step .04 50 LCF 50 2Step .02 ape2 −.02 0 .02 ape2 2Step 2Step(flexible) 1. The .04 LCF LCF(flexible) .06 −.02 0 .02 ape3 2Step 2Step(flexible) .04 .06 LCF LCF(flexible) distributions in the first row are those of the estimates when the model includes only w and those in the second row are when the model includes w2 . 43 Table 1.6: Mean Squared Errors of Average APE estimates under Condition 2 1 0.1 2 3 Two Step LCF Two Step Two Step (Flexible) LCF LCF (Flexible) 0.013 6.305 0.014 0.013 8.752 10.923 0.004 2.285 0.004 0.004 2.647 2.204 0.005 1.743 0.004 0.004 3.003 4.314 Two Step LCF Two Step Two Step (Flexible) LCF LCF (Flexible) 0.015 9.419 0.016 0.015 15.493 12.334 0.004 1.312 0.005 0.005 5.048 3.486 0.005 5.220 0.005 0.004 3.587 3.412 Two Step LCF Two Step Two Step (Flexible) LCF LCF (Flexible) 0.011 2.386 0.010 0.011 5.328 5.039 0.003 0.855 0.004 0.004 2.061 1.914 0.004 0.538 0.003 0.004 1.312 1.272 π2 g w w2 w w2 w w2 1. 0.2 1 2 Normal 0.004 0.001 0.012 0.003 0.004 0.001 0.005 0.001 0.010 0.002 0.011 0.002 Logistic 0.005 0.001 0.013 0.003 0.006 0.001 0.006 0.001 0.012 0.003 0.012 0.003 χ23 0.002 0.001 0.007 0.002 0.002 0.001 0.003 0.001 0.004 0.001 0.004 0.001 3 1 0.5 2 0.001 0.003 0.001 0.001 0.003 0.003 0.000 0.001 0.000 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.003 0.002 0.002 0.003 0.004 0.001 0.001 0.001 0.001 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.002 0.001 0.001 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 3 1 1 2 3 The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 44 Table 1.7 through 1.1215 show that percentile APE estimates have similar patters with those in average APE estimates. Plus, the tables presents that the weak instrument causes huge standard errors at some replications.16 The standard errors significantly decrease as π2 rises. As in the average APE estimates, Table 1.13 shows that the weak instrument causes the LCF approach to have much worse approximation than the two step estimation method. Under Condition 2, the results does not provide enough evidence that including vi in a flexible way helps their approximations. Actually, with the weak instrument, it makes the two methods’ approximations worse; it causes the two step estimation method to have enormous standard errors at several replications and the two methods to have more sign distortions. Hence the results under Condition 2 demonstrate that the quality of an instrument affects the two methods’ approximation performances, and that the two step estimation method is less sensitive to a weak instrument than the LCF approach. Plus, it is better not to include the additional terms of the control function for their approximations if a weak instrument is used. 15 The logistic distribution has similar results to the normal distribution. There are big differences between the medians and the means of the standard errors except the two step estimation. 16 45 Table 1.7: Percentile APEs under Condition 2, π2 = 0.1, and Normal distribution wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE SE* Mean SD SE SE* Mean SD SE SE* 1 -0.164 -0.125 0.122 0.232 -0.219 -0.150 0.125 0.237 -0.146 0.147 8E+04 0.155 -0.063 2.953 43.42 0.106 -0.009 3.467 47.77 0.168 10th 2 0.082 0.064 0.063 0.132 0.110 0.075 0.068 0.156 0.074 0.084 2E+13 0.085 0.013 1.624 29.42 0.058 -0.012 1.563 22.13 0.088 3 0.082 0.061 0.069 0.148 0.110 0.075 0.069 0.158 0.073 0.085 6E+11 0.081 0.050 1.732 26.22 0.057 0.021 2.148 36.81 0.090 1 -0.156 -0.160 0.173 0.485 -0.188 -0.182 0.180 0.367 -0.177 0.180 5E+07 0.180 -0.016 2.954 43.42 0.105 0.011 3.385 47.65 0.130 1. 25th 2 0.078 0.082 0.088 0.259 0.094 0.090 0.096 0.258 0.088 0.099 7E+12 0.094 -0.010 1.624 29.41 0.057 -0.022 1.521 22.07 0.068 3 0.078 0.078 0.097 0.353 0.094 0.092 0.096 0.224 0.089 0.101 3E+11 0.093 0.027 1.732 26.22 0.057 0.011 2.113 36.75 0.069 1 -0.127 -0.112 0.147 0.355 -0.127 -0.112 0.147 0.305 -0.110 0.154 8E+09 0.134 0.035 2.954 43.42 0.105 0.032 3.299 47.57 0.099 50th 2 0.064 0.058 0.078 0.269 0.064 0.055 0.084 0.211 0.054 0.088 3E+12 0.071 -0.036 1.624 29.41 0.057 -0.032 1.481 22.03 0.055 3 0.064 0.054 0.088 0.246 0.064 0.056 0.083 0.200 0.056 0.087 9E+10 0.075 0.001 1.732 26.22 0.057 0.001 2.075 36.71 0.055 APEs at (z1 , w p ). π1 = 0, ρ = 1. 3. SE* is the median of the standard errors. 4. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 2. 46 Table 1.7: (cont’d) wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE SE* Mean SD SE SE* Mean SD SE SE* 1 -0.089 -0.044 0.092 0.456 -0.075 -0.035 0.094 0.294 -0.031 0.116 2E+09 0.094 0.087 2.955 43.42 0.105 0.054 3.244 47.58 0.100 75th 2 0.044 0.023 0.056 0.489 0.037 0.016 0.058 0.284 0.015 0.070 1E+12 0.061 -0.062 1.624 29.41 0.057 -0.044 1.457 22.04 0.056 47 3 0.044 0.020 0.063 0.438 0.037 0.018 0.062 0.273 0.016 0.069 5E+10 0.060 -0.025 1.732 26.22 0.057 -0.010 2.050 36.71 0.058 90th 1 2 -0.057 0.028 -0.014 0.008 0.058 0.043 0.213 0.235 -0.044 0.022 -0.003 0.001 0.062 0.044 0.251 0.178 0.010 -0.005 0.098 0.065 2E+14 5E+12 0.087 0.068 0.134 -0.085 2.955 1.625 43.42 29.42 0.106 0.058 0.074 -0.054 3.220 1.449 47.65 22.08 0.127 0.073 3 0.028 0.006 0.045 0.193 0.022 0.003 0.046 0.190 -0.005 0.065 2E+11 0.069 -0.049 1.732 26.22 0.057 -0.020 2.039 36.75 0.074 Table 1.8: Percentile APEs under Condition 2, π2 = 0.2, and Normal distribution wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 1 -0.164 -0.144 0.089 0.088 -0.220 -0.182 0.092 0.090 -0.182 0.104 152.5 -0.189 0.097 0.085 -0.152 0.159 0.120 10th 2 0.082 0.072 0.045 0.045 0.110 0.091 0.046 0.046 0.091 0.055 13.09 0.094 0.046 0.042 0.076 0.078 0.060 3 0.082 0.072 0.046 0.045 0.110 0.091 0.047 0.046 0.091 0.056 217.2 0.095 0.055 0.046 0.076 0.083 0.063 1 -0.157 -0.147 0.101 0.102 -0.188 -0.174 0.108 0.108 -0.174 0.108 43.23 -0.143 0.095 0.083 -0.124 0.123 0.099 25th 2 0.078 0.074 0.051 0.052 0.094 0.087 0.055 0.055 0.087 0.056 0.626 0.071 0.045 0.041 0.062 0.059 0.049 1. 3 0.078 0.073 0.052 0.052 0.094 0.087 0.055 0.055 0.087 0.057 6.530 0.072 0.054 0.045 0.062 0.067 0.052 1 -0.127 -0.111 0.074 0.072 -0.127 -0.112 0.075 0.073 -0.110 0.078 5.656 -0.091 0.094 0.082 -0.092 0.098 0.082 50th 2 0.064 0.056 0.039 0.039 0.064 0.056 0.039 0.039 0.055 0.041 0.089 0.045 0.044 0.041 0.046 0.046 0.041 3 0.064 0.055 0.040 0.039 0.064 0.056 0.041 0.039 0.055 0.042 0.457 0.046 0.054 0.045 0.046 0.056 0.045 1 -0.089 -0.065 0.045 0.039 -0.075 -0.055 0.048 0.043 -0.050 0.062 0.759 -0.040 0.094 0.082 -0.061 0.103 0.078 75th 2 0.044 0.033 0.026 0.027 0.037 0.027 0.027 0.028 0.025 0.034 0.066 0.020 0.044 0.041 0.030 0.050 0.041 3 0.044 0.032 0.028 0.027 0.037 0.028 0.029 0.028 0.025 0.035 0.243 0.020 0.054 0.045 0.031 0.059 0.045 APEs at (z1 , w p ). π1 = 0, ρ = 1. 3. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 2. 48 Table 1.8: (cont’d) wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 49 1 -0.057 -0.037 0.033 0.033 -0.043 -0.024 0.038 0.037 -0.013 0.063 8.476 0.007 0.095 0.084 -0.032 0.130 0.088 90th 2 3 0.028 0.028 0.019 0.018 0.022 0.023 0.024 0.025 0.022 0.022 0.012 0.012 0.024 0.025 0.026 0.025 0.007 0.006 0.037 0.037 0.256 0.690 -0.004 -0.003 0.045 0.054 0.042 0.046 0.016 0.016 0.065 0.072 0.048 0.052 Table 1.9: Percentile APEs under Condition 2, π2 = 0.5, and Normal distribution wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) 1. 2. Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 1 -0.168 -0.164 0.037 0.037 -0.224 -0.213 0.041 0.039 -0.214 0.044 0.042 -0.202 0.028 0.026 -0.168 0.036 0.032 10th 2 0.084 0.082 0.019 0.019 0.112 0.106 0.020 0.020 0.107 0.022 0.022 0.101 0.014 0.014 0.084 0.018 0.017 3 0.084 0.082 0.019 0.019 0.112 0.107 0.020 0.020 0.107 0.022 0.022 0.101 0.015 0.014 0.084 0.018 0.017 1 -0.159 -0.158 0.040 0.039 -0.192 -0.188 0.044 0.043 -0.188 0.044 0.042 -0.157 0.024 0.022 -0.139 0.027 0.025 25th 2 0.080 0.079 0.020 0.020 0.096 0.094 0.022 0.022 0.094 0.023 0.022 0.078 0.012 0.012 0.069 0.014 0.013 APEs at (z1 , w p ). π1 = 0, ρ = 1. 50 3 0.080 0.079 0.020 0.020 0.096 0.094 0.022 0.022 0.094 0.023 0.022 0.078 0.012 0.012 0.070 0.014 0.013 1 -0.129 -0.125 0.026 0.025 -0.129 -0.126 0.027 0.026 -0.125 0.028 0.026 -0.106 0.021 0.020 -0.106 0.020 0.019 50th 2 0.064 0.062 0.014 0.013 0.064 0.063 0.014 0.014 0.062 0.015 0.014 0.053 0.011 0.011 0.053 0.010 0.010 3 0.064 0.062 0.014 0.013 0.064 0.063 0.015 0.014 0.062 0.015 0.014 0.053 0.011 0.011 0.053 0.011 0.010 1 -0.088 -0.083 0.010 0.009 -0.075 -0.072 0.014 0.013 -0.069 0.018 0.017 -0.056 0.021 0.021 -0.074 0.020 0.018 75th 2 0.044 0.041 0.007 0.007 0.037 0.036 0.009 0.009 0.035 0.011 0.010 0.028 0.011 0.011 0.037 0.011 0.010 3 0.044 0.041 0.008 0.007 0.037 0.036 0.009 0.009 0.035 0.011 0.010 0.028 0.012 0.011 0.037 0.011 0.010 Table 1.9: (cont’d) wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 51 1 -0.055 -0.052 0.007 0.006 -0.043 -0.041 0.010 0.010 -0.037 0.018 0.016 -0.010 0.026 0.024 -0.044 0.026 0.021 90th 2 0.028 0.026 0.007 0.007 0.021 0.021 0.008 0.008 0.019 0.012 0.012 0.005 0.013 0.013 0.022 0.014 0.013 3 0.028 0.026 0.007 0.007 0.021 0.020 0.008 0.008 0.018 0.012 0.011 0.005 0.014 0.013 0.022 0.015 0.013 Table 1.10: Percentile APEs under Condition 2, π2 = 0.1, and χ23 distribution wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE SE* Mean SD SE SE* Mean SD SE SE* 1 -0.097 -0.133 0.160 0.237 -0.145 -0.146 0.164 0.271 -0.136 0.192 4E+25 0.209 -0.058 2.305 41.22 0.090 0.019 2.401 37.35 0.221 10th 2 0.049 0.069 0.083 0.886 0.072 0.073 0.087 3.391 0.070 0.103 5E+21 0.117 0.028 1.434 27.90 0.051 -0.008 1.420 26.65 0.113 3 0.049 0.064 0.089 0.877 0.072 0.072 0.087 3.362 0.067 0.107 1E+20 0.116 0.030 1.144 20.46 0.051 -0.011 1.238 19.09 0.116 1 -0.070 -0.093 0.150 0.268 -0.087 -0.118 0.160 0.458 -0.100 0.155 1E+29 0.136 0.002 2.305 41.22 0.089 0.041 2.302 37.13 0.152 1. 25th 2 0.035 0.049 0.078 0.685 0.044 0.059 0.089 3.292 0.051 0.090 2E+22 0.078 -0.002 1.434 27.90 0.050 -0.019 1.391 26.55 0.079 3 0.035 0.044 0.089 0.612 0.044 0.059 0.086 3.084 0.049 0.087 4E+21 0.079 0.000 1.144 20.46 0.051 -0.021 1.175 18.98 0.082 1 -0.043 -0.030 0.109 0.306 -0.043 -0.044 0.109 0.434 -0.041 0.140 1E+27 0.126 0.069 2.305 41.22 0.089 0.063 2.241 36.96 0.090 50th 2 0.022 0.017 0.065 0.573 0.022 0.023 0.069 1.627 0.021 0.081 1E+31 0.072 -0.035 1.434 27.90 0.050 -0.031 1.383 26.47 0.051 3 0.022 0.013 0.071 0.674 0.022 0.021 0.065 1.667 0.020 0.078 6E+29 0.074 -0.034 1.144 20.46 0.051 -0.032 1.124 18.91 0.050 APEs at (z1 , w p ). π1 = 0, ρ = 1. 3. SE* is the median of the standard errors. 4. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 2. 52 Table 1.10: (cont’d) wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE SE* Mean SD SE SE* Mean SD SE SE* 75th 1 2 3 -0.023 0.012 0.012 0.006 -0.001 -0.004 0.081 0.053 0.057 0.503 0.436 0.421 -0.020 0.010 0.010 -0.003 0.002 0.001 0.078 0.056 0.052 0.525 1.332 0.998 0.020 -0.010 -0.010 0.136 0.081 0.076 4E+24 5E+30 2E+29 0.150 0.108 0.113 0.136 -0.069 -0.067 2.304 1.434 1.144 41.22 27.90 20.46 0.089 0.050 0.050 0.087 -0.044 -0.043 2.248 1.400 1.111 37.01 26.50 18.95 0.076 0.049 0.049 53 90th 1 2 3 -0.012 0.006 0.006 0.014 -0.007 -0.007 0.057 0.041 0.044 0.164 0.380 0.339 -0.010 0.005 0.005 0.007 -0.003 -0.004 0.051 0.042 0.041 0.561 0.537 0.947 0.048 -0.025 -0.023 0.122 0.075 0.073 2E+28 2E+30 4E+28 0.189 0.162 0.169 0.196 -0.099 -0.097 2.304 1.434 1.144 41.22 27.90 20.46 0.089 0.051 0.051 0.109 -0.056 -0.053 2.308 1.435 1.131 37.22 26.59 19.05 0.125 0.075 0.073 Table 1.11: Percentile APEs under Condition 2, π2 = 0.2, and χ23 distribution wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 1 -0.097 -0.111 0.113 0.107 -0.145 -0.134 0.124 0.120 -0.130 0.127 0.817 -0.183 0.069 0.062 -0.130 0.185 0.132 10th 2 0.049 0.056 0.057 0.055 0.072 0.067 0.063 0.061 0.065 0.067 15.77 0.091 0.036 0.033 0.065 0.095 0.068 3 0.049 0.056 0.057 0.055 0.072 0.067 0.063 0.062 0.065 0.067 25.78 0.092 0.039 0.035 0.065 0.094 0.069 1 -0.070 -0.066 0.069 0.066 -0.087 -0.091 0.080 0.077 -0.080 0.069 0.312 -0.123 0.067 0.061 -0.096 0.122 0.095 25th 2 0.035 0.033 0.037 0.036 0.044 0.045 0.042 0.041 0.040 0.038 15.56 0.062 0.035 0.032 0.048 0.062 0.049 1. 3 0.035 0.033 0.037 0.036 0.044 0.045 0.042 0.041 0.040 0.039 34.77 0.062 0.038 0.034 0.048 0.062 0.050 1 -0.043 -0.027 0.038 0.031 -0.043 -0.041 0.036 0.029 -0.037 0.060 0.420 -0.057 0.066 0.060 -0.058 0.064 0.059 50th 2 0.021 0.014 0.024 0.022 0.022 0.020 0.022 0.021 0.018 0.033 13.94 0.028 0.034 0.032 0.029 0.033 0.032 3 0.021 0.014 0.023 0.022 0.022 0.021 0.023 0.022 0.019 0.033 49.29 0.029 0.037 0.034 0.029 0.036 0.033 1 -0.023 -0.009 0.032 0.024 -0.020 -0.016 0.027 0.023 0.001 0.071 19.58 0.009 0.065 0.060 -0.019 0.075 0.046 75th 2 3 0.011 0.011 0.005 0.004 0.022 0.020 0.021 0.021 0.010 0.010 0.008 0.008 0.019 0.020 0.021 0.020 -0.001 0.000 0.038 0.039 12.11 65.85 -0.005 -0.004 0.033 0.037 0.032 0.034 0.009 0.010 0.039 0.042 0.028 0.030 APEs at (z1 , w p ). π1 = 0, ρ = 1. 3. SE* is the median of the standard errors. 4. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 2. 54 Table 1.11: (cont’d) wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 55 1 -0.012 -0.002 0.029 0.024 -0.010 -0.006 0.023 0.022 0.025 0.085 4143 0.069 0.065 0.061 0.016 0.129 0.070 90th 2 3 0.006 0.006 0.001 0.001 0.021 0.020 0.021 0.021 0.005 0.005 0.003 0.003 0.018 0.018 0.020 0.021 -0.013 -0.012 0.047 0.046 490.5 400.8 -0.035 -0.034 0.033 0.037 0.033 0.035 -0.008 -0.008 0.068 0.069 0.041 0.042 Table 1.12: Percentile APEs under Condition 2, π2 = 0.5, and χ23 distribution wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 1 -0.098 -0.096 0.045 0.042 -0.147 -0.123 0.056 0.052 -0.124 0.054 0.054 -0.184 0.026 0.023 -0.138 0.046 0.040 10th 2 0.049 0.048 0.023 0.021 0.073 0.062 0.028 0.027 0.062 0.028 0.028 0.092 0.014 0.012 0.069 0.024 0.021 3 0.049 0.048 0.023 0.021 0.073 0.062 0.028 0.027 0.062 0.028 0.028 0.092 0.014 0.012 0.069 0.024 0.021 1 -0.070 -0.063 0.023 0.021 -0.087 -0.085 0.028 0.025 -0.079 0.025 0.029 -0.127 0.021 0.019 -0.103 0.032 0.028 25th 2 0.035 0.031 0.012 0.012 0.044 0.043 0.014 0.014 0.039 0.014 0.016 0.063 0.011 0.010 0.051 0.016 0.015 1. 3 0.035 0.031 0.012 0.012 0.044 0.043 0.015 0.014 0.040 0.014 0.015 0.064 0.011 0.010 0.052 0.016 0.015 1 -0.042 -0.036 0.008 0.007 -0.042 -0.047 0.010 0.009 -0.042 0.014 0.025 -0.064 0.018 0.016 -0.064 0.017 0.016 50th 2 0.021 0.018 0.006 0.006 0.021 0.023 0.007 0.007 0.021 0.008 0.013 0.032 0.010 0.009 0.032 0.010 0.009 3 0.021 0.018 0.006 0.006 0.021 0.023 0.007 0.007 0.021 0.009 0.013 0.032 0.010 0.009 0.032 0.010 0.009 1 -0.022 -0.019 0.004 0.003 -0.019 -0.022 0.006 0.005 -0.019 0.009 0.045 -0.001 0.017 0.017 -0.025 0.013 0.010 75th 2 0.011 0.010 0.005 0.005 0.010 0.011 0.006 0.006 0.010 0.007 0.019 0.000 0.010 0.009 0.012 0.009 0.007 3 0.011 0.010 0.005 0.005 0.010 0.011 0.006 0.006 0.010 0.007 0.019 0.000 0.010 0.009 0.012 0.009 0.007 APEs at (z1 , w p ). π1 = 0, ρ = 1. 3. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 2. 56 Table 1.12: (cont’d) wp g w True Two Step w2 True Two Step Two Step (Flexible) LCF LCF (Flexible) Mean Mean SD SE Mean Mean SD SE Mean SD SE Mean SD SE Mean SD SE 57 1 -0.011 -0.011 0.003 0.003 -0.009 -0.010 0.004 0.004 -0.008 0.009 0.123 0.056 0.020 0.019 0.011 0.023 0.017 90th 2 3 0.006 0.006 0.006 0.006 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.006 0.006 0.006 0.006 0.004 0.004 0.009 0.009 0.040 0.040 -0.028 -0.028 0.011 0.011 0.011 0.011 -0.005 -0.005 0.014 0.014 0.011 0.011 Table 1.13: Mean Squared Errors of Percentile APE estimates under Condition 2 π2 g 1 10th 2 0.1 Two step Two step (Flexible) LCF LCF (Flexible) Two step Two step (Flexible) LCF LCF (Flexible) Two step Two step (Flexible) LCF LCF (Flexible) 0.020 0.027 8.747 12.06 0.010 0.012 0.010 0.030 0.002 0.002 0.001 0.004 0.006 0.008 2.646 2.457 0.002 0.003 0.002 0.007 0.000 0.001 0.000 0.001 0.006 0.009 3.002 4.621 0.003 0.003 0.003 0.008 0.000 0.001 0.000 0.001 Two step Two step (Flexible) LCF LCF (Flexible) Two step Two step (Flexible) LCF LCF (Flexible) Two step Two step (Flexible) LCF LCF (Flexible) 0.027 0.037 5.321 5.793 0.015 0.016 0.006 0.034 0.004 0.003 0.002 0.002 0.008 0.011 2.060 2.023 0.004 0.004 0.002 0.009 0.001 0.001 0.001 0.001 0.008 0.012 1.310 1.539 0.004 0.005 0.002 0.009 0.001 0.001 0.001 0.001 wp w2 0.2 0.5 w2 0.1 0.2 0.5 1. 2. 3 25th 1 2 Normal 0.032 0.009 0.033 0.010 8.753 2.648 11.50 2.328 0.012 0.003 0.012 0.003 0.011 0.003 0.019 0.005 0.002 0.001 0.002 0.001 0.002 0.000 0.004 0.001 χ23 0.027 0.008 0.024 0.008 5.320 2.059 5.316 1.938 0.006 0.002 0.005 0.001 0.006 0.002 0.015 0.004 0.001 0.000 0.001 0.000 0.002 0.001 0.001 0.000 3 1 50th 2 3 1 75th 2 3 0.009 0.010 3.003 4.470 0.003 0.003 0.003 0.005 0.001 0.001 0.000 0.001 0.022 0.007 0.024 0.008 8.754 2.648 10.91 2.202 0.006 0.002 0.006 0.002 0.010 0.002 0.011 0.002 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.000 0.007 0.008 3.003 4.308 0.002 0.002 0.003 0.003 0.000 0.000 0.000 0.000 0.010 0.004 0.004 0.015 0.005 0.005 8.757 2.649 3.004 10.54 2.130 4.206 0.003 0.001 0.001 0.004 0.001 0.001 0.010 0.002 0.003 0.011 0.003 0.004 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.008 0.008 1.310 1.385 0.002 0.002 0.002 0.004 0.000 0.000 0.001 0.000 0.012 0.005 0.020 0.006 5.324 2.060 5.035 1.917 0.001 0.000 0.004 0.001 0.005 0.001 0.004 0.001 0.000 0.000 0.000 0.000 0.001 0.000 0.001 0.000 0.004 0.006 1.311 1.266 0.001 0.001 0.001 0.001 0.000 0.000 0.000 0.000 0.006 0.020 5.335 5.064 0.001 0.005 0.005 0.006 0.000 0.000 0.001 0.000 0.003 0.007 2.063 1.964 0.000 0.002 0.001 0.002 0.000 0.000 0.000 0.000 0.003 0.006 1.314 1.238 0.000 0.002 0.002 0.002 0.000 0.000 0.000 0.000 The Mean Squared Errors are calculated from Table 1.7 through 1.12. The grey colored cells indicate that at least one of the APE estimates for three choices has the opposite direction to its true APE. 58 Table 1.13: (cont’d) 1 90th 2 3 0.005 0.013 8.765 10.38 0.002 0.005 0.012 0.017 0.000 0.000 0.002 0.001 0.002 0.005 2.651 2.104 0.001 0.002 0.003 0.004 0.000 0.000 0.000 0.000 0.002 0.005 3.006 4.158 0.001 0.002 0.004 0.005 0.000 0.000 0.000 0.000 0.003 0.006 5.353 5.340 0.001 0.008 0.010 0.017 0.000 0.000 0.005 0.001 0.002 0.006 2.067 2.063 0.000 0.003 0.003 0.005 0.000 0.000 0.001 0.000 0.002 wp π2 g Normal w2 0.1 0.2 0.5 w2 0.1 0.2 0.5 Two step Two step (Flexible) LCF LCF (Flexible) Two step Two step (Flexible) LCF LCF (Flexible) Two step Two step (Flexible) LCF LCF (Flexible) χ23 Two step Two step (Flexible) LCF LCF (Flexible) Two step Two step (Flexible) LCF LCF (Flexible) Two step Two step (Flexible) LCF LCF (Flexible) 1.318 1.282 0.000 0.002 0.003 0.005 0.000 0.000 0.001 0.000 Condition 3: ρ < 1 We also generate the data by allowing the amount of endogeneity to be smaller than Conditions 1 and 2: wi = π2 zi2 + vi where π2 ∈ {0.1, 0.2, 0.5, 1} and ri = ρvi + ei where ρ ∈ {0.1, 0.5} Although a fewer sign distortions are observed than ρ = 1, the previous results remain almost the same, in general. In summary, the simulations under Condition 1 through Condition 3 demonstrate that the two step estimation method with a strong instrument provides a good approxima- 59 tion even though its conditional mean is misspecified, and that it outperforms the LCF approach regardless of the instrument’s quality and the amount of endogeneity. Furthermore, in the simulations, adding v2i and v3i in the estimation does not improve the two methods’ approximations.17 1.4 APPLICATION: MICHIGAN EDUCATIONAL ASSESSMENT PROGRAM MATH TEST We apply the two step estimation method to Michigan Educational Assessment Program (MEAP) data of the school year 2004/2005 in order to estimate the effects of spending on the fourth grade math test outcome. The fourth grade MEAP math test is a statewide assessment test given by the State Board of Education in Michigan. It measures public school student achievement in relation to Michigan curriculum standards, which groups of educators, teachers and school administrators set. A student’s outcome is rated at one of the four levels as described in Table 1.14 and public school districts’ percentage shares of students for the four levels are available from the Michigan Department of Education (MDE) website.18 Table 1.14: Four levels of MEAP Outcome Level 1 Level 2 Level 3 Level 4 1. Description Exceeded Michigan Standards Met Michigan Standards demonstrated Basic knowledge and skills of Michigan Standards Apprentice level, showing little success in meeting Michigan standards The description is from Michgian Department of Education (2005). Papke (2005, 2008), Papke and Wooldridge (2008), and Roy (2011) examine the relationship between spending and pass rates of this test by using panel data. During their data periods, the test had three performance levels (Satisfactory, Moderate, Low) and 17 18 That the simulations model ri as a linear function of vi could be one of the reasons. ❤tt♣✿✴✴✇✇✇✳♠✐❝❤✐❣❛♥✳❣♦✈✴♠❞❡✴ 60 their pass rates indicate the percentage of students in the satisfactory level. They find that there exist significant positive causal effects of spending on the pass rates although their magnitudes are different. The application in this chapter can help understand how spending shifts students in the four different levels instead of between pass and fail. We use district-level data including 512 districts.19 We turn these districts’ percentage shares into proportions to obtain fractional dependent variables.20 Table 1.15 illustrates the dependent variables’ summary statistics. While the lower corner 0 appears for all of the four levels, the upper corner 1 appears only for level 1 like the dependent variables generated in the simulations. We choose the first level as the reference as in the simulations. Table 1.15: Summary statistics of the dependent variables Variable y1 y2 y3 y4 Total Mean 0.283 0.463 0.221 0.033 1.000 SD 0.138 0.082 0.098 0.035 Min 0 0 0 0 Max 1.000 0.742 0.643 0.278 Description fraction of Level 1 fraction of Level 2 fraction of Level 3 fraction of Level 4 The key explanatory variable, spending, is constructed as per pupil general fund expenditure in a logarithmic form, log(per pupil GF expenditure). Although we additionally control for the fraction of applications for the free and reduced-price lunch program as a measure of the poverty rate and log(enrollment) as a measure of school district size, we still suspect that spending is endogenous. There can be unobserved district effects such as parental involvement, which are correlated with spending and are able to affect student outcome as well. Thus we need an instrument to achieve more accurate estimation for the effects of spending on the student test outcome. To find an instrument, we exploit Michigan’s school funding system reform in 1994, which is called “Proposal A.” The reform changed Michigan’s school funding sources 19 In the year of 2004/2005, Michigan has 552 public school districts. The original percentage shares for some districts do not sum to 100 because of rounding errors. Thus We calculate the proportions not based on 100, but based on total percentage shares of the four levels. 20 61 and started to provide school districts with foundation allowances. It resulted in not only significantly raising districts spending, but also reducing the spending inequalities across districts by letting the low spending districts’ foundation allowances increase faster than the others. The initial foundation allowance for a district, which was awarded in 1994/1995, was determined based on its per pupil spending in 1993/1994 and the dollar increases for the following years has been decided solely by the state legislature. Therefore per pupil foundation allowance in 2004/2005 meets the requirements of an instrument for spending if spending in 1993/1994 is controlled for. Letting zi1 = (log(enrollmenti ) freelunchi spending93i ), we apply Procedure 1.2.1 to the model K g (hi ; θ) = exp hi θg ∑4h exp (hi θh ) , ∀ g = 1, · · · , 4 (1.26) where hi = (zi1 spendingi vi ), freelunchi represents the fraction of applications for the free and reduced-price lunch program, and spending93i is spending in 1993/1994. The spending variable’s reduced form is expressed as spendingi = zi1 π 1 + π2 foundationi + vi (1.27) where foundationi = log(per pupil foundation allowancei ). Table 1.16 includes summary statistics of the data. Table 1.16: Summary statistics of the data Variable enrollment fraction of applications for free and reduced lunch per pupil expenditure in 2004/2005 per pupil expenditure in 1994/1995 per pupil foundation allowance # of districts Mean (SD) 3127.186 (6972.843) 0.354 (0.176) 8086.428 (1090.264) 4901.967 (943.182) 6979.738 (655.772) 512 Table 1.17 contains the first step estimation result. The foundation variable’s t statistic shows that spending is correlated with the foundation allowance, netting out the other 62 explanatory variables. In addition, the F statistic suggests that it can be declared as a strong instrument according to Staiger and Stock (1997). Table 1.17: The first step estimation result log(enrollment) freelunch spending93 foundation constant R2 F ( H0 : foundation = 0) coefficient 0.004 0.323 0.180 0.738 0.787 0.625 37.28 SE 0.005 0.024 0.053 0.121 0.706 t 0.860 13.230 3.390 6.110 1.110 p-values 0.391 0.000 0.001 0.000 0.266 Table 1.18 reports the average APE estimates by the two methods. The two step estimation method provides statistically significant effects on level 1 and level 3. When the quadratic effect of spending is not allowed for, a 10% increase in per pupil expenditure - an increase in spending by 0.1 - causes the fractions of student at level 1 and level 3 to raise 6.8 percentage points and decrease 6.2 percentage points, respectively.21 With the quadratic effect, the magnitudes of the effects are slightly larger. The LCF approach yields similar result except that the fraction at level 4 is also statistically significant. The magnitudes of the effects are bigger than those by the two step estimation method. Without the quadratic effect, its estimated effect of a 10% increase in per pupil expenditure on the fraction at level 1 is 7.7 percentage points and those on the fractions at level 3 and 4 are -6.3 and -1.0 percentage points, respectively. With the quadratic effect, those effects become larger. Including the flexible forms of vi causes the estimated effects by both the methods to be smaller except that on level 4 by the LCF approach. Table 1.19 presents that the two methods’ percentile APE estimates for level 1 and level 3 by the two methods and those for level 4 by the LCF approach at 75th and 90th percentiles are statistically significant. All of the estimates for level 1 are positive and those for level 21 Considering that 119,687 students took the MEAP math test in 2004/2005, one percentage point increase(decrease) in the number of students at a certain level translates into about 1200 student increase(decrease). 63 Table 1.18: Average APE estimates of spending on the fourth grade MEAP math test w level Two Step LCF w2 Two Step Two Step (Flexible) LCF LCF (Flexible) 1 0.681* (0.158) 0.768* (0.213) 0.779* (0.067) 0.691* (0.038) 0.856* (0.218) 0.771* (0.227) 2 0.017 (0.158 ) -0.039 (0.131) -0.044 (0.126) 0.014 (0.158) -0.051 (0.143) 0.019 ( 0.146) 3 4 -0.619* -0.079 (0.048) (0.048) -0.625* -0.104* (0.165) (0.049) -0.667* -0.068 (0.095) (0.035) -0.602* -0.103 (0.174) (0.054) -0.693* -0.112* (0.182) (0.056) -0.626* -0.163* (0.171) (0.068) 1. w = spending = log(per pupil GF expenditure). Standard errors are in parentheses. 3. The two step estimation method’s standard errors are calculated using 1000 bootstrap replications. 4. * is significant at, or below, 5 percent. 2. 3 and 4 are negative. As the percentile of spending increases, the magnitudes of the estimated effects for level 1 and 3 by the LCF approach decrease, illustrating that given the same percentage increase in per pupil expenditure, low spending districts are affected more than high spending districts. On the other hand, the two step estimation method has different patterns. Without the quadratic effect, the higher the percentile of spending is, the bigger its estimated effect for level 1 is and the smaller that for level 3 is. But its level 1 estimate with the quadratic effect increases from the 10th to the 75th percentile and then drops at the 90th percentile. Allowing for v2i and v3i in the estimation has the same result as the average APE estimates. The effects on level 2 in Table 1.18 and Table 1.19 are not significant. It might be because the fractions of students who used to be at lower levels move to level 2 and who used to be at level 2 to level 1 are similar. In summary, the two methods show that spending affects mainly on level 1 and 3. Plus, considering the magnitudes of their estimates along with the directions, we can 64 Table 1.19: Percentile APE estimates of spending on the fourth grade MEAP math test w wp level Two Step w2 Two Step Two Step (Flexible) LCF LCF (Flexible) w wp level Two Step w2 Two Step Two Step (Flexible) LCF LCF (Flexible) w wp level Two Step w2 Two Step Two Step (Flexible) LCF LCF (Flexible) 10th 1 0.601* (0.135) 0.720* (0.146) 0.628* (0.163) 0.922* (0.231) 0.791* (0.242) 2 0.213 (0.156) 0.163 (0.194) 0.202 (0.178) -0.060 (0.163) 0.040 (0.169) 3 -0.729* (0.209) -0.832* (0.246) -0.710* (0.215) -0.745* (0.208) -0.646* (0.192) 25th 4 -0.084 (0.074) -0.051 (0.078) -0.120 (0.102) -0.117 (0.066) -0.185* (0.081) 1 0.633* (0.156) 0.755* (0.171) 0.658* (0.184) 0.903* (0.227) 0.785* (0.237) 50th 1 0.679* (0.188) 0.798* (0.206) 0.697* (0.213) 0.872* (0.221) 0.776* (0.230) 2 0.062 (0.125) -0.012 (0.141) 0.052 (0.140) -0.053 (0.147) 0.024 (0.151) 3 -0.660* (0.179) -0.718* (0.190) -0.643* (0.175) -0.706* (0.188) -0.631* (0.175) 2 -0.252 (0.179) -0.273 (0.181) -0.230 (0.185) -0.038 (0.132) -0.011 (0.132) 3 -0.452* (0.072) -0.436* (0.070) -0.437* (0.074) -0.619* (0.162) -0.598* (0.159) 3 -0.706* (0.200) -0.792* (0.227) -0.688* (0.202) -0.730* (0.200) -0.641* (0.185) 4 -0.083 (0.067) -0.059 (0.071) -0.115 (0.088) -0.115 (0.063) -0.179* (0.077) 75th 4 -0.080 (0.056) -0.067 (0.059) -0.106 (0.067) -0.113 (0.058) -0.169* (0.071) 90th 1 0.764* (0.242) 0.775* (0.243) 0.733* (0.245) 0.761* (0.214) 0.741* (0.216) 2 0.157 (0.142) 0.095 (0.169) 0.145 (0.159) -0.057 (0.156) 0.034 (0.162) 4 -0.060* (0.019) -0.065* (0.022) -0.066* (0.018) -0.104* (0.049) -0.132* (0.058) 1. w = spending = log(per pupil GF expenditure). Standard errors are in parentheses. 3. APEs at ( z , spending ). 1 p 4. * is significant at, or below, 5 percent. 2. 65 1 0.728* (0.222) 0.819* (0.236) 0.731* (0.239) 0.828* (0.215) 0.762* (0.222) 2 -0.070 (0.132) -0.141 (0.143) -0.072 (0.147) -0.047 (0.137) 0.010 (0.140) 3 -0.584* (0.140) -0.607* (0.138) -0.568* (0.134) -0.672* (0.174) -0.618* (0.166) 4 -0.074 (0.041) -0.072 (0.042) -0.091* (0.043) -0.109* (0.053) -0.154* (0.064) conclude that an increase in spending tends to shift students who are rated at a lower level to an upper level, in general. It is consistent with the results of the four research that mentioned above. 1.5 CONCLUSION This chapter develops a two step estimation method for multiple fractional dependent variables especially when endogenous explanatory variables are continuous. The method directly specifies the estimating conditional mean rather than the structural one and deals with the endogeneity by combining a control function approach. Although the method is not applicable when the characteristics of choices are of interest, it can provide not only a consistent estimator of the parameters in the estimating conditional mean as long as the conditional mean specification is correct, but also a useful estimator for the quantities of the structural conditional mean without estimating the structural mean parameters. Monte Carlo simulations demonstrate that the method even with a misspecified conditional mean works well as an approximation to true APEs if a strong instrument is available. The simulations also provide evidence that the two step estimation is preferable to an alternative linear method - a LCF approach; the linear method’s approximation is more sensitive to the quality of an instrument. The application to the fourth grade MEAP math test of the year 2004/2005 illustrates that the two step estimation method and the LCF approach provide similar results in general: the more a school district spends, the more students are in the exceeded Michigan Standards level and the fewer students are in the demonstrated basic knowledge and skills of Michigan Standards level. That is, an increase in spending tends to shift students from a lower level to a upper level. 66 APPENDIX 67 Appendix for Chapter 1 In this appendix, we show how to obtain the estimators’ standard errors taking the extra variations from the first steps of Procedure 1.2.1 and Procedure 1.2.2 into account especially when the quadratic effect is not allowed for. The model including w2 and using the flexible forms of v require modifications. This appendix also includes Table 1.20 showing how the response variables for three choices are generated across the simulations. The standard errors of θ First, we obtain the standard errors of θ from Procedure 1.2.1. The first step of Procedure 1.2.1 applies OLS to (1.15) and so, under the standard regularity conditions, √ 1 N (π − π ) = √ N N ∑ E(z z) i −1 1 z i v i + o p (1) = √ N N ∑ q i + o p (1) (1.28) i where qi ≡ E(z z)−1 zi vi . The conditional mean of the second step obtaining θ is E(yig |zi , wi ) = K g (hi ; θ) = = exp(θg hi ) 1 + ∑hG=2 exp(θh hi ) 1 1 + ∑hG=2 exp(θh hi ) , g = 2, · · · , G, (1.29) , g = 1, (1.30) where hi = (xi1 vi ) is a 1 × p vector, vi = wi − zi π, and we redefine θ for the appendix by dropping θ1 from it: θ= θ2 · · · θg · · · θG p( G −1)×1 where θg = (θz θw θv ) p×1 , for g = 2, · · · G. Then the first order condition is N ∑ si (θ, π ) = 0 i 68 (1.31) where   ∂ i   ) (  ∂θ2  s  .   i2   .   ..   .   .       ∂ i       ) si (θ, π ) ≡ (∇θ i ) =  (  =  sig  ∂θ g      ..   ..   .   .       ∂ i  siG ( ) ∂θG and sig = ( ∂ i ) = hi (yig − K g (hi ; θ)), p × 1 vector. ∂θg A mean value expansion (MVE) around θ gives N ∑ si (θ, π ) = ∑ si (θ, π ) + i i ∇θ ∑ si (θ¨ ) (θ − θ) (1.32) i 1 where θ¨ is on the line segment between θ and θ. By multiplying through by √ and N using (1.31) and the weak law of large numbers (WLLN), we rearrange (1.32): √ 1 N (θ − θ) = − √ N N ∑ A−1 si (θ, π ) + o p (1) (1.33) i where A = − E(Hi (θ)) = − E(∇2θ i (θ)) p(G−1)× p(G−1) ,   hi hi Ki2 Ki3 ··· ··· hi hi Ki2 KiG −hi hi Ki2 (1 − Ki2 )      h h K K − h h K ( 1 − K ) · · · · · · h h K K i3   i i i3 i2 i i i3 i i i2 iG Hi =  , . . .   .. .. ..     hi hi KiG Ki2 ··· · · · hi hi KiG KiG −hi hi KiG (1 − KiG ) and Kig = K g (hi ; θ). Since ∑i si (θ, π ) still depends on the sample, we can not apply the central limit the- 69 orem (CLT) yet. Using a MVE around π, multiplying through by √1 N and using (1.28) gives 1 √ N √ N ( π − π ) + o p (1). s ( θ, π ) + E [∇ s ( θ, π )] π i ∑ i N N 1 ∑ si (θ, π ) = √ N i i N 1 =√ N ∑ (si + Fqi ) + o p (1) (1.34) i where ∂si2 ∂π F = E[∇π si (θ, π )] p(G−1)× M = E ··· ∂sig ∂π ··· ∂siG ∂π and ∂sig ∂h θ exp(hi θh ) ∑ = i (yig − Kig ) + hi zi Kig θvg − h=2 vh ∂π ∂π 1 + ∑h=2 exp(hi θh ) . By plugging (1.34) into (1.33), √ N (θ − θ) = A−1 1 −√ N N ∑ di (θ, π ) + o p (1) (1.35) i where di ≡ si + Fqi . Therefore, Avar √ N (θ − θ) = A−1 DA−1 (1.36) where D ≡ Var(di ) = Var(si + Fqi ), and so a valid estimator of Avar(θ) is 1 −1 A D A −1 N (1.37) where D≡ 1 N N ∑ di di = i 1 N ∑(si + Fqi )(si + Fqi ) , (1.38) i si = si (hi ; θ), 70 (1.39) F = Fi (hi ; θ), q=( 1 N (1.40) N ∑ z i z i ) −1 z i v i , (1.41) i and 1 A=− N N ∑ Hi (hi ; θ). (1.42) i The square roots of (1.37)’s diagonal elements are the standard errors. The standard errors of the two step APE estimator Next, we derive the standard errors of (1.23). Let’s define (1.22) as δg (x1 ; η): δg (x1 ; η) ≡ Ev K g (x1 , v; θ) · where η = θ θwg − ∑hG θwh exp (x1 θxh + θvh v) ∑hG exp (x1 θxh + θvh v) (1.43) . Since it depends on the value of x1 , we use two ap- π ( p( G −1)+ M)×1 proaches to obtain a single number. One is the average APEs, δgAVG = Ex1 δg (x1 ; η) (1.44) and the other is the percentile APEs, δgPCT = δg (x1◦ ; η) (1.45) where x1◦ = (z1 , w p ). (1.44) is estimated as δgAVG = 1 N N ∑ δg (x j1; η) j  where δg (x j1 ; η) = 1 N (1.46) ∑iN K g (x j1 , vi ; θ) · θwg − 71 ∑hG θwh exp x j1 θxh + θvh vi ∑hG exp x j1 θxh + θvh vi  . Based on (1.28) and (1.35), we can write  √ N (η − η) = √   1  θ−θ  N = √ N π−π N A −1 d i qi  ∑  1  + o p (1) = √ N i Applying a MVE to (1.46) around η, multiplying through √ N ∑ k i + o p (1). (1.47) i N, using (1.47) and WLLN gives √ We subtract √ √ N (δgAVG N 1 N δgAVG = √ N ∑ δg (x j1 ; η) + E ∇η δg (x j1 ; η) k j + o p (1). (1.48) j NδgAVG from both sides of (1.48): − δgAVG ) 1 =√ N N ∑ δg (x j1 ; η) − δgAVG + E ∇η δg (x j1 ; η) k j + o p (1) (1.49) j where E δg (x j1 ; η) − δgAVG + E ∇η δg (x j1 ; η) k = 0. Therefore, based on the CLT, Avar √ N (δgAVG − δgAVG ) = Var δg (x j1 ; η) − δgAVG + ∆ g (η)k (1.50) where ∆ g (η) = E ∇η δg (x j1 ; η) , the 1 × { p( G − 1) + M} Jacobian of δg (x j1 ; η), and a valid estimator of (1.50) is 1 N N ∑ δg (x j1 ; η) − δgAVG + ∆ g (η)k j δg (x j1 ; η) − δgAVG + ∆ g (η)k j (1.51) ∇η δg (x j1 ; η) , (1.52) j where 1 ∆ g (η) = N and N ∑ j   kj =  A −1 d qj 72  j . (1.53) The percentile APE (1.45) is estimated as δgPCT = δg (x1◦ ; η) 1 = N  where jg (x1◦ , vi ; η) = K g (x1◦ , vi ; θ) · θwg − N ∑ jg (x1◦ , vi ; η) ∑hG θwh exp x1◦ θxh + θvh vi ∑hG (1.54) i exp x1◦ θxh + θvh vi  . Through the same process as in the average APE estimate, √ 1 N (δgPCT − δgPCT ) = √ N N ∑ jg (x1◦ , vi ; η) − δgPCT + J g (η)ki + o p (1) (1.55) i where J g (η) = E ∇η jg (x1◦ , v; η) , the 1 × { p( G − 1) + M} Jacobian of jg (x1◦ , v; η). Since E jg (x1◦ , v; η) − δgPCT + J g (η)k = 0, Avar √ N (δgPCT − δgPCT ) = Var jg (x1◦ , v; η) − δgPCT + J g (η)k . (1.56) Thus, a valid estimator of (1.56) is 1 N N ∑ jg (x1◦ , vi ; η) − δgPCT + J g (η)ki jg (x1◦ , vi ; η) − δgPCT + J g (η)ki (1.57) j where J g (η) = 1 N N ∑ ∇η jg (x1◦ , vi ; η) . (1.58) i Hence, the the asymptotic standard errors of δgAVG and δgPCT are obtained as the square √ roots of (1.51) and (1.57), divided by N, respectively. The standard errors of γ For g = 2, · · · , G, the LCF approach models yig = zi1 γzg + γwg wi + uig 73 (1.59) and uig = ρ g vi + eig (1.60) with the reduced form of the endogenous variable w, (1.15) and yi1 = ∑G g=2 yig . Plugging (1.60) into (1.59), yig = zi1 γzg + γwg wi + ρ g vi + eig = hi γ g + eig , g = 2, · · · , G (1.61) where γ g = (γzg γwg ρ g ) = (γzg γwg γvg ) p×1 . Considering that the second step of Procedure 1.2.2 replaces v with v, the estimating equation of the LCF approach for g = 2, · · · , G is yig = zi1 γzg + γwg wi + ρ g vi + ρ g (vi − vi ) + eig = hi γ g + ρ g (vi − vi ) + eig = hi γ g + (hi − hi )γ g + eig (1.62) and the LCF estimator is expressed as −1 ∑ hi hi γg = γg + ∑ hi (hi − hi )γ g + eig , g = 2, · · · , G (1.63) i i and G ∑ γg . γ1 = e1 − (1.64) g =2 Therefore, √ N (γ g − γ g ) = = 1 N 1 N −1 ∑ hi hi i ∑ hi hi i 1 √ N −1 G ∑ g =2 ∑ hi (hi − hi )γ g + eig , g = 2, · · · , G, (1.65) i 1 −√ N 74 ∑ hi [(hi − hi )γg + eig ] i , g = 1. (1.66) A similar reasoning in Wooldridge (2010, Appendix 6A) rewrites (1.65) and (1.66) as √ N γ g − γ g = C −1 = C −1 1 √ N N ∑ hi eig − R g B−1 zi vi + o p (1), g = 2, · · · , G, (1.67) i 1 −√ N N G ∑∑ hi eig − R g B−1 zi vi + o p (1), g = 1. (1.68) i g =2 where C = E ( h h ), R g = E (γ g ⊗ h) ∇π h , and B = E ( z z ). (1.69) Since E h e g − R g B−1 z v = 0 ∀ g, Avar √ N ( γ g − γ g ) = C −1 M g C −1 (1.70) where M g = Var h e g − R g B−1 z v , g = 2, · · · , G. (1.71) and G M1 = Var ∑ h e g − R g B −1 z v (1.72) g =2 Therefore, Avar(γ g ) is estimated as 1 C −1 M g C −1 N (1.73) where C= 1 N N ∑ hi hi , i 75 (1.74) 1 Mg = N N ∑ hi eig − R g B−1 zi vi , g = 2, · · · , G, (1.75) i G M1 = hi eig − R g B−1 zi vi 1 ∑ Mg + N ∑ g =2 hi eig − R g B−1 zi vi hi eik − Rk B−1 zi vi , (1.76) g =k Rg = 1 N N ∑ γ g ⊗ hi ∇ π x i ( π ), (1.77) i B= 1 N N ∑ zi zi , (1.78) i and eig = yig − hi γ g . (1.79) The standard error of γ g is obtained as the square root of (1.73). (y1 , y2 , y3 ) generated across the simulations In Table 1.20, we report the average outcome in each of the three choices across the simulations, and the fraction of times at least one choice being below 0.05. The fractions are about 0.25 ∼ 0.45 with the two symmetric distributions and the χ2 distribution has higher fractions, about 0.70. They suggest that the dependent variable generating process covers the cases where yi has a set of extreme values such as (0.95, 0.05,0). 76 Table 1.20: (yi1 , yi2 , yi3 ) generated across the simulations D (e) π2 ρ 1 1 0.5 1 0.2 1 0.1 1 1 0.5 0.5 0.5 0.2 0.5 0.1 0.5 1 0.1 0.5 0.1 0.2 0.1 0.1 0.1 y1 0.296 (0.014) 0.289 (0.014) 0.286 (0.013) 0.286 (0.013) 0.277 (0.013) 0.268 (0.012) 0.265 (0.012) 0.264 (0.012) 0.263 (0.012) 0.251 (0.011) 0.248 (0.011) 0.247 (0.011) Normal mean y2 y3 0.352 0.352 (0.007) (0.007) 0.356 0.356 (0.007) (0.007) 0.357 0.357 (0.007) (0.007) 0.357 0.357 (0.007) (0.007) 0.361 0.361 (0.007) (0.007) 0.366 0.366 (0.006) (0.006) 0.368 0.368 (0.006) (0.006) 0.368 0.368 (0.006) (0.006) 0.368 0.368 (0.006) (0.006) 0.374 0.374 (0.006) (0.006) 0.376 0.376 (0.006) (0.006) 0.376 0.376 (0.006) (0.006) fraction 0.39 (0.02) 0.37 (0.02) 0.36 (0.02) 0.36 (0.02) 0.34 (0.02) 0.31 (0.02) 0.30 (0.02) 0.30 (0.02) 0.30 (0.02) 0.27 (0.02) 0.26 (0.02) 0.25 (0.02) y1 0.313 (0.015) 0.307 (0.015) 0.305 (0.015) 0.305 (0.015) 0.299 (0.014) 0.291 (0.014) 0.289 (0.014) 0.289 (0.014) 0.289 (0.014) 0.280 (0.013) 0.277 (0.013) 0.277 (0.013) Logistic mean y2 y3 0.344 0.344 (0.008) (0.008) 0.346 0.346 (0.008) (0.008) 0.347 0.347 (0.008) (0.007) 0.347 0.347 (0.008) (0.007) 0.351 0.351 (0.007) (0.007) 0.354 0.354 (0.007) (0.007) 0.355 0.355 (0.007) (0.007) 0.356 0.355 (0.007) (0.007) 0.356 0.356 (0.007) (0.007) 0.360 0.360 (0.007) (0.007) 0.361 0.361 (0.007) (0.007) 0.361 0.361 (0.007) (0.007) 1. τ = 0. Standard deviations are in parentheses. 3. These are the results when the quadratic effect of w is not allowed. 2. 77 χ23 fraction 0.44 (0.02) 0.42 (0.02) 0.42 (0.02) 0.42 (0.02) 0.40 (0.02) 0.38 (0.02) 0.37 (0.02) 0.37 (0.02) 0.37 (0.02) 0.35 (0.02) 0.34 (0.02) 0.34 (0.02) y1 0.103 (0.009) 0.095 (0.008) 0.093 (0.008) 0.093 (0.008) 0.085 (0.008) 0.077 (0.007) 0.074 (0.007) 0.074 (0.006) 0.073 (0.007) 0.065 (0.006) 0.062 (0.005) 0.062 (0.005) mean y2 0.449 (0.005) 0.452 (0.005) 0.453 (0.005) 0.453 (0.005) 0.458 (0.004) 0.462 (0.004) 0.463 (0.004) 0.463 (0.004) 0.463 (0.004) 0.468 (0.004) 0.469 (0.003) 0.469 (0.003) fraction y3 0.449 (0.005) 0.452 (0.005) 0.453 (0.004) 0.453 (0.004) 0.457 (0.004) 0.462 (0.004) 0.463 (0.004) 0.463 (0.004) 0.463 (0.004) 0.468 (0.004) 0.469 (0.003) 0.469 (0.003) 0.68 (0.02) 0.68 (0.02) 0.68 (0.02) 0.68 (0.02) 0.69 (0.02) 0.69 (0.02) 0.69 (0.02) 0.69 (0.02) 0.69 (0.02) 0.70 (0.02) 0.70 (0.02) 0.70 (0.02) REFERENCES 78 REFERENCES Arsen, D., and D. N. Plank., Michigan School Finance Under Proposal A: State Control, Local Consequences. Buis, M. L. 2008. “FMLOGIT: Stata module fitting a fractional multinomial logit model by quasi maximum likelihood.” Statistical Software Components, Boston College Department of Economics, June. Gourieroux, C., A. Monfort, and A. Trognon. 1984. “Pseudo Maximum Likelihood Methods: Theory.” Econometrica, 52(3): 681–700. Greene, W. H. 2008. Econometric Analysis.: Prentice Hall. Lockwood, A. 2002. School finance reform in Michigan, Proposal A: Retrospective.: Office of Revenue and Tax Analysis, Michigan Department of Treasury. Michgian Department of Education. 2005. “Michigan Educational Assessment Program (MEAP) Frequently Asked Questions – Winter 2005.” State of Michigan, Department of Education, May. Mullahy, J. 2010. “Multivariate Fractional Regression Estimation of Econometric Share Models.” NBER Working Papers 16354, National Bureau of Economic Research, Inc. Papke, L. E. 2005. “The effects of spending on test pass rates: evidence from Michigan.” Journal of Public Economics, 89(5-6): 821–839. . 2008. “The Effects of Changes in Michigan’s School Finance System.” Public Finance Review, 36(4): 456–474. , and J. M. Wooldridge. 1996. “Econometric Methods for Fractional Response Variables with an Application to 401(K) Plan Participation Rates.” Journal of Applied Econometrics, 11(6): 619–32. , and . 2008. “Panel data methods for fractional response variables with an application to test pass rates.” Journal of Econometrics, 145(1-2): 121–133. Petrin, A., and K. Train. 2010. “A Control Function Approach to Endogeneity in Consumer Choice Models.” Journal of Marketing Research, 47(1): 3–13. Roy, J. 2011. “Impact of School Finance Reform on Resource Equalization and Academic Performance: Evidence from Michigan.” Education Finance and Policy, 6(2): 137–167. Sivakumar, A., and C. Bhat. 2002. “Fractional Split-Distribution Model for Statewide Commodity-Flow Analysis.” Transportation Research Record, 1790(1): 80–88. 79 Staiger, D., and J. H. Stock. 1997. “Instrumental Variables Regression with Weak Instruments.” Econometrica, 65(3): 557–586. Wicksall, B., and M. A. Cleary. 2009. “The Basics of the Foundation Allowance – FY 2006-07.” House Fiscal Agency, January, memorandum. Wooldridge, J. M. 2003. Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data.: MIT Press. . 2005. “Unobserved Heterogeneity and Estimation of Average Partial Effects.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg. eds. by D. W. K. Andrews, and J. H. Stock: Cambridge University Press, , Chap. 3. . 2010. Econometric Analysis of Cross Section and Panel Data.: MIT Press. 80 CHAPTER 2 MULTIPLE FRACTIONAL RESPONSE VARIABLES WITH A BINARY ENDOGENOUS EXPLANATORY VARIABLE 2.1 INTRODUCTION In this chapter, we extend Chapter 1 by allowing endogenous explanatory variables (EEVs) to be discrete. In Chapter 1, we developed a two step estimation method for the response variable. Unlike in a linear model, the response variable’s two features are inherent in the nonlinear model for the two step estimation method; each response lies in the unit interval1 and the sum of responses for a cross sectional unit is one. It maximizes a multinomial distribution with specifying the conditional mean, which includes the first step residuals, as multinomial logit. That is, the method combines fractional multinomial logit developed by Sivakumar and Bhat (2002) and Mullahy (2010) with a control function (CF) approach solving the endogeneity problem by including extra regressors, which is called a control function, in the equation so that the correlation between EEVs and the unobservables would not exist. The method, however, restricts the probabilistic nature of EEVs. The EEVs should be continuous since it relies on the assumption that the exogenous variables are independent of the EEVs’ reduced form errors. So it is not applicable for a research which has discrete EEVs such as “how much people have their pension funds invested in stocks, bonds, and 1 It can take a corner value, zero or one. 81 other” where only the shares for these three assets are available and the key explanatory variable is whether or not taking a class of financial education, which could be correlated with some factors that a researcher cannot observe. Thus, in this chapter we modify the method in order to take the discrete nature of EEVs into account. Especially we consider a binary EEV in the model. A control function approach handling discrete EEVs in nonlinear models was discussed by Terza et al. (2008).2 They suggested ”two-stage residual inclusion” (2SRI), which includes the unstandardized residuals – an estimator of the difference between the EEV and its conditional mean – as additional regressors in the second step. Wooldridge (2014) extended two-stage residual inclusion and proposed another control function approach. Motivated by variable addition tests, he suggests using the standardized or generalized residuals as control functions instead of the unstandardized residuals. We employ the approach in Wooldridge (2014) to modify the two step estimation method proposed in Chapter 1. The modified two step estimation method generates the generalized residuals from a probit regression at the first step. Then, it applies the fractional multinomial logit with including the generalized residuals in the conditional mean at the second step. The second step is a quasi maximum likelihood estimation (QMLE) using a distribution belonging to the linear exponential family. Therefore, it provides a consistent estimator of the mean parameter if the multinomial logit conditional mean is correctly specified, which Gourieroux et al. (1984) describe. In other words, the consistency does not require other distributional assumptions than the conditional mean specification. Notice that it is the parameter in the “estimating” conditional mean that the method consistently estimates; it is not the structural mean parameter. However, without estimating the structural mean parameter, the method can provide a consistent estimator of average partial effect (APE) on the “structural” conditional mean, which is often more interesting than the mean parameter itself. For the APE estimator to be consistent, the 2 Actually, they did not restrict the EEVs’ nature. 82 multinomial logit specification needs to be true. To see how the method works as an approximation when the specification is wrong, we conduct Monte Carlo simulations. The quantities that we are interested in are the true APEs of the binary EEV. We evaluate the method’s approximation to these APEs, compared to several alternative estimation methods’ including two stage least squares (2SLS), linear control function (LCF) approaches, and forbidden regressions. In addition, we compare the performance of its test for endogeneity with one from a linear control function using the same control function – the generalized residuals. The simulations provide evidence that although the two step estimation method’s approximation to APEs depends on how strong the instrument is, it is generally as good as or often better than the alternative methods’. In addition, the two step estimation method’s test for endogeneity have not only about the correct size but also better power. The remainder of this chapter is structured as follows. In the next section, we describe the set of assumptions and the modified two step estimation method. Section 2.3 contains the Monte Carlo simulation design and presents the simulation results where the conditional mean of the method is misspecified. And we conclude the chapter in Section 2.4. 2.2 THE MODIFIED TWO STEP ESTIMATION Let’s consider a random sample in the cross section where each cross sectional observation has G choices or shares. The dependent response of interest – multiple fractional response – for observation i is written as,   yi1    ..   .       yi =   yig     ..   .    yiG 83 (2.1) G ×1 where 0 ≤ yig ≤ 1 and ∑G g yig = 1. Dropping the cross sectional observation index i, write the structural conditional mean of the response for choice g: E(y g |x, r ) = E(y g |z, w, r ) = Gg (z1 , w, r; β), g = 1, 2, · · · , G, (2.2) where 0 < Gg (·) < 1 and ∑G g Gg = 1. These two conditions for Gg (·) are required because of the response variable’s two features: the bounded nature and the adding-up constraint. x is the set of explanatory variables including a binary endogenous explanatory variable w and the set of exogenous variables z = (z1 , z2 ) where z1 includes an intercept. r is an unobserved omitted variable. Note that the covariates do not have are the g subscript in (2.2). That is, each choice has the same covariates; choice specific covariates are not allowed for. The modified two step estimation method specifies an estimating conditional mean, derived from (2.2), as multinomial logit.3 The multinomial logit model allows covariates to contain characteristics varying across cross sectional observations, not choices. Yet, E(y g |x, r ) = E(yh |x, r ) for g = h – is still possible since the model allows the parameters to vary across g. The second equality of (2.2) shows that z2 is redundant in the structural conditional mean, indicating that there is an exclusion restriction. To set up endogeneity into the model, we use an omitted variable problem by assuming w is correlated with r in the following fashion; w = 1 [zπ + u > 0] , (2.3) u ∼ Normal (0, 1), (2.4) and we are allowing r to be correlated with u. 3 In (2.2), we are not specifying Gg (·) as multinomial logit. The assumptions regarding Gg (·) are just the two conditions and the restriction for the covariates. Due to the restriction, this approach is more appropriate for the cases where the characteristics for choices are not important. 84 Additionally, add the following conditional independence assumption: D (r |z, w) = D (r |v), where v is the generalized error 4 (2.5) of w, which playing a role of a sufficient statistic to control for w’s endogeneity. Since u ∼ Normal (0, 1), v ≡ E(u|z, w) = wλ(zπ ) − (1 − w)λ(−zπ ) (2.6) φ(·) is the Inverse Mill’s Ratio. Φ(·) Based on the assumptions above, the estimating conditional mean where λ(·) = E(y g |x) = K g (z1 , w, v; θ). (2.7) Its functional form is determined by the functional form of Gg (·) and the distribution of e. The estimating conditional mean, however, can be specified as any function satisfying 0 < K g (·) < 1 and ∑G g K g = 1, which are from the two conditions of Gg (·), because we have not yet made any assumptions about those. Instead, we assume that their combination leads to the multinomial logit form for K g (·): K g (h; θ) = exp hθg ∑hG exp (hθh ) (2.8) where h = (z1 w v) is a 1 × p vector, θg is a p × 1 parameter vector for choice g. θ = (θ1 . . . θG ) is a pG × 1 vector where θ1 = 0 as a reference.5 We can start with specifying Gg (·) as a multinomial logit in (2.2), and assume joint normality of (r, u); then, in principle, we can find E(y g |x). But it would not be in closed form. So, instead, we use (2.5) as an approximation, and the model E(y g |x) = E(y g |x, v) 4 5 Gourieroux et al. (1987) (2.8) satisfies the two conditions for K g (·) 85 directly as a multinomial logit. Neither approach contains the other, as Wooldridge (2014) discussed; they use different assumptions. In order to estimate θ, a consistent estimator of generalized error vi should be obtained first since it is not observed. Therefore, the modified two step estimation method is summarized as follows: Procedure 2.2 Step 1. Obtain π from the probit regression of wi on zi and compute the generalized residual vi , vi = wλ(zπ ) − (1 − w)λ(−zπ ) = φ(zi π ) (wi − Φ(zi π )) Φ(zi π )(1 − Φ(zi π )) (2.9) Step 2. Run fractional multinomial logit (fmlogit) of (yi1 , · · · , yiG ) on zi1 , wi and vi , which maximizes a multinomial log-likelihood: N ∑ i N i ( θ) =∑ i G ∑ yig log Kg (hi ; θ) . (2.10) g where hi = (zi1 wi vi ). The method provides a convenient test for endogeneity with the null hypothesis that w is exogenous by obtaining an asymptotic robust t statistic on vi . This is a variable addition test discussed in Wooldridge (2014). Furthermore, it is able to estimate average partial effect (APE), which is often the quantity of more interest, without estimating the structural mean parameter β. It is because 86 the assumptions above ensure the following equivalence: δg (z◦ ) = APEg (z◦ ) ≡ Er [ E(y g |z◦ , w = 1, r ) − E(y g |z◦ , w = 0, r )] (2.11) = Er [ Gg (z1◦ , w = 1, r; β) − Gg (z1◦ , w = 0, r; β)] (2.12) = Ev [K g (z1◦ , w = 1, v; θ) − K g (z1◦ , w = 0, v; θ)], (2.13) where z◦ denotes a fixed value of z.6 As shown in Wooldridge (2010, section 2.2.5), (2.13) holds under (2.5) and E(y g |x, r, v) = E(y g |x, r ), which can be derived under the assumptions that have been made so far. In order to have a representative single number summarizing δg (z◦ ), we average it out across the sample for z: δg = Ez δg (z) (2.14) Thus, δg can be estimated by obtaining 2step δg = 1 N N ∑ j 1 N N ∑ K g (z j1 , w = 1, vi ; θ) − K g (z j1 , w = 0, vi ; θ) , (2.15) i where θ is the two step estimator of θ. 2step One thing to be aware of is that the standard errors of θ and δg need to take the additional variation caused by the firs step into consideration by using the delta method or bootstrapping the two steps. θ is a QMLE estimator using a distribution, which is one of linear exponential family. Therefore, based on the discussion of Gourieroux, Monfort, and Trognon (1984), its consistency is ensured only by (2.8). In other words, it is still consistent even though the distribution specification is completely wrong except the conditional mean. 2step If (2.8) is wrong, δg is inconsistent because θ is inconsistent and the wrong func- tional form is used in (2.15). So we conduct Monte Carlo simulations in the next section 6 (2.11) can be written as ASF ( z◦ , w = 1) − ASF ( z◦ , w = 0), ASF (·) denotes “average structure function” defined in Blundell and Powell (2003) and Wooldridge (2005). 87 in order to examine how well the two step estimation method approximate to δg when the multinomial logit conditional mean is misspecified. The simulations compare its approximation with several alternative methods that researchers would use. These methods are 1) Two stage least squares (2SLS),7 2) Linear control function approach using the generalized residual (LCF), 3) Linear IV using the fitted probability Φ(·) as an instrument (LIV), 4) Linear plug-in method, and 5) Two step plug-in method.8 The adding-up constraint is not inherent in the alternative linear models. So they should apply their estimation methods to G − 1 equations by dropping one of G choices. Then, the dropped one’s parameters are obtained based on the constraint and the estimates for the other choices.9 The coefficient estimates of these linear models are comparable to δg . The alternative methods include two forbidden regressions – the two plug-in methods. They substitute wi with its fitted probability Φ(·) at their second steps. Researchers often attempt to use this kind of approach, believing that it is legitimate because it emulates the 2SLS procedure. However, they are inappropriate especially for nonlinear models. In addition to 1) through 5), we include the generalized residual – the control function – in a flexible way10 for the two step estimation method and LCF in order to examine if it helps their approximations. We call them 6) Two step flexible and 7) LCF flexible, respectively. Furthermore, we compare the performance of tests for endogeneity done by the two step method and LCF with significance level α = 0.05 by varying degree of endogeneity and the instrument’s predictive power; the null hypotheses of the tests are that there is no endogeneity.11 7 This is the same as a linear control function approach using the residual from the linear regression w on z. 8 Terza et al. (2008) call this approach “two-stage predictor substitution (2SPS).” In the simulations, we drop the first choice (g = 1), which is the reference choice for the two step estimation method, and obtain its estimates using γ1 = e1 − ∑G g=2 γ g where γ g is the coefficient parameter for choice g in the linear models and e1 is a unit vector. 10 v2 and v3 are additionally included in the second step. 11 H : θ = 0 for the two step estimation method and H : γ = 0 for LCF. v v 0 0 9 88 2.3 2.3.1 MONTE CARLO SIMULATION Data Generating Process We use 1000 replications where each replication has 500 cross sectional observations (N = 500) and 3 choices (g = 3). The data generating process for each replication is as follows: The covariates • zi = zi1 zi2  = 1 zi1 zi2  1×3     zi1   0   1 τ  where   ∼ MV Normal   ,   and τ ∈ {0, −0.5}. zi2 0 τ 1 • ui ∼ Normal (0, σu2 )12 • wi = 1[wi∗ > 0] = 1[zi π + ui > 0] = 1[zi1 π1 + zi2 π2 + ui > 0] where π1 ∈ {0, 0.5} and π2 ∈ {0.1, 0.2, 0.5, 1}. • vi = σu wi λ zi π σu − (1 − wi ) λ − zi π σu • ri = ρui + ei where ρ ∈ {0.1, 0.25, 0.5, 0.75, 0.9, 1} and D (ei |vi ) is one of the three: (a) ei |vi ∼ Normal (0, 1) (b) ei |vi ∼ χ23 1 (c) ei |vi ∼ Normal 0, 1 + v2i . 2 Two symmetric distributions and an asymmetric distribution are in use for the distribution of ei |vi With (a) or (b), r is generated to be uncorrelated with w if ρ = 0. The values of π1 and π2 affect Var (wi ). As (π1 , π2 ) varies, we adjust σu for Var (wi∗ ) to be invariant instead of Var (wi ): Var (wi∗ ) = 2. 12 89 However, with (c), where there is heteroskedasticity, r still depends on w even though ρ = 0. The structural conditional mean Gg (·) specification: We specify the structural conditional mean as multinomial logit as well since it satisfies the two conditions for Gg (·): E(yig |xi , ri ) = Gg (zi1 , wi , ri ; β) = exp zi1 βzg + wi β wg + ri β rg ∑3h exp (zi1 βzh + wi β wh + ri β rh ) (2.16) where β = ( β1 β2 β3 ) is a 12 × 1 vector, β g = ( βzg β wg β rg ) = (1 1 1 1) for g = 2, 3, and β1 = 0. Under (2.16), neither of the three distributions of e can derives (2.8). The multiple fractional dependent variables y: The dependent variable generating process is the same as that in Chapter 1; we first draw 100 multinomial outcomes among 1, 2, and 3 based on (2.16), and then calculate the proportions for the three outcomes. 2.3.2 Simulation Results Case 1: Endogeneity comes only through u. Let’s consider the two settings where the data are generated so that w becomes exogenous when ρ = 0. Table 2.1 through 2.3 contain simulation results as π2 = 1.13 In Table 2.1, with the 2step standard normal distribution, δg is pretty similar to δg regardless of the degree of en- dogeneity, and the alternative methods also provide good approximations. Even the forbidden regressions’ approximations are good. Including the flexible forms of v does not improve the approximations for both the two step estimation method and LCF. In addition, allowing for the correlation between z1 and z2 does not change the story.14 Table 2.2 shows that, under the asymmetric distribution, the biases are slightly larger than those in 13 14 The results with other values of ρ are available upon request. π1 = 0.5 and τ = −0.5 90 Table 2.1. Yet, in considering the efficiency as well as the bias, all of the methods under the asymmetric distribution also approximate well, as shown in Table 2.3. 91 Table 2.1: APEs with r = ρu + e, π2 = 1, and e|v ∼ Normal (0, 1) ρ π1 g δg Two Step Two Step Flexible Two Step Plug-in LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in Mean Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Two Step Flexible Two Step Plug-in LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in 1. 3 0.054 0.053 (0.019) 0.052 (0.026) 0.053 (0.020) 0.053 (0.020) 0.053 (0.026) 0.053 (0.020) 0.053 (0.020) 0.053 (0.020) 1 -0.107 -0.108 (0.029) -0.106 (0.035) -0.107 (0.030) -0.107 (0.030) -0.107 (0.036) -0.107 (0.031) -0.107 (0.030) -0.107 (0.030) 0.5 ρ π1 g δg Two Step 1 -0.107 -0.107 (0.035) -0.105 (0.046) -0.106 (0.037) -0.107 (0.037) -0.107 (0.047) -0.106 (0.037) -0.106 (0.037) -0.106 (0.037) 1 0.5 2 0.054 0.054 (0.020) 0.053 (0.026) 0.053 (0.021) 0.054 (0.020) 0.054 (0.026) 0.053 (0.021) 0.053 (0.021) 0.053 (0.021) 1 0 2 0.054 0.054 (0.016) 0.053 (0.020) 0.054 (0.017) 0.054 (0.017) 0.054 (0.020) 0.054 (0.017) 0.054 (0.017) 0.054 (0.017) 3 0.054 0.054 (0.016) 0.053 (0.020) 0.053 (0.016) 0.053 (0.016) 0.053 (0.020) 0.053 (0.017) 0.053 (0.016) 0.053 (0.016) 0.1 0 Mean Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD 1 -0.108 -0.108 (0.023) -0.108 (0.028) -0.107 (0.024) -0.107 (0.024) -0.108 (0.030) -0.107 (0.025) -0.107 (0.024) -0.107 (0.024) 2 0.054 0.054 (0.014) 0.054 (0.017) 0.054 (0.014) 0.054 (0.014) 0.054 (0.017) 0.054 (0.015) 0.054 (0.014) 0.054 (0.014) 3 0.054 0.054 (0.013) 0.054 (0.017) 0.054 (0.014) 0.054 (0.014) 0.054 (0.017) 0.053 (0.014) 0.054 (0.014) 0.053 (0.014) As π1 = 0.5, τ = −0.5; otherwise, τ = 0. 92 1 -0.108 -0.108 (0.021) -0.109 (0.025) -0.108 (0.021) -0.108 (0.022) -0.108 (0.027) -0.108 (0.022) -0.108 (0.022) -0.108 (0.022) 2 0.054 0.054 (0.013) 0.055 (0.016) 0.054 (0.013) 0.054 (0.013) 0.054 (0.016) 0.054 (0.014) 0.054 (0.013) 0.054 (0.013) 3 0.054 0.054 (0.012) 0.054 (0.015) 0.054 (0.013) 0.054 (0.013) 0.054 (0.016) 0.054 (0.013) 0.054 (0.013) 0.054 (0.013) Table 2.2: APEs with r = ρu + e, π2 = 1, and e|v ∼ χ23 ρ π1 g δg Two Step Two Step Flexible Two Step Plug-in LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in Mean Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Two Step Flexible Two Step Plug-in LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in 1. 3 0.017 0.019 (0.011) 0.017 (0.015) 0.014 (0.013) 0.014 (0.013) 0.016 (0.017) 0.014 (0.013) 0.014 (0.013) 0.014 (0.013) 1 -0.032 -0.036 (0.012) -0.033 (0.014) -0.029 (0.015) -0.028 (0.015) -0.032 (0.018) -0.029 (0.015) -0.029 (0.015) -0.029 (0.015) 0.5 ρ π1 g δg Two Step 1 -0.033 -0.039 (0.015) -0.035 (0.020) -0.029 (0.019) -0.028 (0.020) -0.034 (0.025) -0.029 (0.019) -0.029 (0.019) -0.029 (0.019) 1 0.5 2 0.017 0.020 (0.012) 0.018 (0.016) 0.015 (0.013) 0.015 (0.013) 0.018 (0.017) 0.015 (0.013) 0.015 (0.013) 0.015 (0.013) 1 0 2 0.016 0.019 (0.010) 0.017 (0.012) 0.015 (0.011) 0.015 (0.011) 0.017 (0.013) 0.015 (0.011) 0.015 (0.011) 0.015 (0.011) 3 0.016 0.018 (0.009) 0.016 (0.012) 0.014 (0.010) 0.014 (0.010) 0.016 (0.013) 0.014 (0.011) 0.014 (0.010) 0.014 (0.010) 0.1 0 Mean Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD 1 -0.028 -0.029 (0.009) -0.028 (0.011) -0.028 (0.011) -0.027 (0.011) -0.028 (0.013) -0.027 (0.011) -0.027 (0.011) -0.027 (0.011) 2 0.014 0.015 (0.009) 0.015 (0.011) 0.014 (0.009) 0.014 (0.009) 0.015 (0.011) 0.014 (0.009) 0.014 (0.009) 0.014 (0.009) 3 0.014 0.014 (0.008) 0.014 (0.011) 0.013 (0.009) 0.013 (0.009) 0.013 (0.011) 0.013 (0.009) 0.013 (0.009) 0.013 (0.009) As π1 = 0.5, τ = −0.5; otherwise, τ = 0. 93 1 -0.027 -0.027 (0.008) -0.027 (0.010) -0.027 (0.009) -0.027 (0.009) -0.027 (0.011) -0.027 (0.009) -0.027 (0.009) -0.027 (0.009) 2 0.013 0.014 (0.009) 0.014 (0.011) 0.014 (0.009) 0.014 (0.009) 0.014 (0.011) 0.014 (0.009) 0.014 (0.009) 0.014 (0.009) 3 0.013 0.013 (0.008) 0.013 (0.011) 0.013 (0.008) 0.013 (0.008) 0.013 (0.011) 0.013 (0.009) 0.013 (0.008) 0.013 (0.008) Table 2.3: MSEs of APEs with r = ρu + e and π2 = 1 D (e|v) Normal (0,1) χ23 1. 2. ρ π1 g Two Step Two Step Flexible Two Step Plug-in LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in Two Step Two Step Flexible Two Step Plug-in LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in 1 0.001 0.002 0.001 0.001 0.002 0.001 0.001 0.001 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 1 0.5 2 0.000 0.001 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1 3 0.000 0.001 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 As π1 = 0.5, τ = −0.5; otherwise, τ = 0. MSEs are calculated from Table 2.1 and 2.2. 94 2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.5 0 2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.1 3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1 0.000 0.001 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 We weaken the predictive power (π2 < 1)15 while keeping the strongest degree of endogeneity (ρ = 1) in order to examine if these good approximations hinges on the instrument’s strong predictive power. To determine if z2 is a strong instrument, we use the rule of thumb suggested by Staiger and Stock (1997); the first stage F statistic, testing the null hypothesis that instruments are uncorrelated with EEVs, should be larger than 10 for the instruments to have properties as strong instruments. Since the two step estimation method’s first step is a probit regression, we apply their rule to the first step Wald statistics testing the same null hypothesis. Table 2.4 gives the summary of the Wald and the F statistics.16 Table 2.4: F and Wald statistics of the first stage/step (H0 : π2 = 0) D (e|v) Normal (0,1) χ23 OLS τ π1 π2 F statistic Probit [F>10] Wald statistic 1st step [Wald>10] Replication OLS F statistic Probit [F>10] Wald statistic [Wald>10] Replication 0 0 0.1 2.661 (2.914) 2.6% 2.582 (2.746) 2.5% 995 2.861 (3.215) 4.0% 2.763 (2.988) 3.0% 987 0.2 7.923 (5.733) 28.9% 7.410 (5.025) 26.2% 999 7.992 (5.905) 32.0% 7.464 (5.204) 29.4% 999 0.5 53.215 (19.551) 100.0% 39.589 (10.962) 99.8% 1000 54.203 (19.308) 99.8% 40.174 (10.957) 99.8% 1000 1 443.384 (84.613) 100.0% 133.797 (8.861) 100.0% 1000 443.938 (85.330) 100.0% 134.474 (9.186) 100.0% 1000 -0.5 0.5 1 261.906 (60.292) 100.0% 107.881 (11.291) 100.0% 1000 263.869 (59.632) 100.0% 108.501 (11.390) 100.0% 1000 1. Standard deviations are in parentheses [F>10] and [Wald>10] indicate the proportions of the F and Wald statistics being greater than 10 among the replications, respectively. 3. The tests are robust to heteroskedasticity. 2. They are not greater than 10 until π2 = 0.5, on average. When π2 = 0.1, only 25 15 π1 = τ = 0. The first stages of 2SLS and LIV are linear regressions. We report only the former because the latter are quite similar. (The summary of LIV’s first stage F statistics are available upon request.) 16 95 ∼ 40 replications have the statistics being larger than 10. The proportions increase as π2 increases, and then they are larger than 10 in almost every replication as π2 = 0.5. Therefore, we consider z2 as a weak instrument if π2 ≤ 0.2 and as a strong one if π2 ≥ 0.5. Table 2.5, containing the results with the standard normal distribution, show that, in general, the higher the predictive power is, the less biased and the more precise the estimates are, and that as π2 increases, the nonlinear models more quickly recover their approximation abilities. When π2 = 0.1, the linear models yield worse approximations than the three nonlinear models, providing much larger biases and huge standard deviations. When π2 = 0.2, the nonlinear models’ estimates become closer to δg and more precise whereas the linear models still provide poor approximations. As π2 = 0.5, which makes z2 a strong instrument, the approximations by all the methods are quite good. Table 2.7 clearly shows that, among the nonlinear models, the two step estimation method yields a better approximation in both bias and efficiency criteria, and that it does not improve the approximations to include v in a flexible way. The flexible way does not help the estimation to obtain smaller biases but causes bigger standard deviations. Table 2.6 and 2.7 also present that, in general, the asymmetric distribution have a similar results as the standard normal distribution. 96 Table 2.5: APEs with w = 1[z2 π2 + u > 0], ρ = 1, and e|v ∼ Normal (0, 1) π2 g Mean Mean SD Two Step Mean SD Flexible Two Step Mean SD Plug-in LCF Mean SD LCF Mean SD Flexible 2SLS Mean SD LIV Mean (IV=Φ(·)) SD Linear Mean Plug-in SD Replications δg Two Step 1. 1 -0.106 -0.171 (0.300) -0.184 (0.427) -0.147 (0.440) -0.061 (6.387) 1.427 (15.611) 2.545 (91.074) 0.112 (7.392) -0.218 (5.287) 0.1 2 0.053 0.092 (0.186) 0.097 (0.283) 0.082 (0.270) 0.061 (3.274) -0.861 (9.053) -1.303 (47.430) -0.068 (4.376) 0.150 (2.830) 995/1000 3 0.053 0.079 (0.170) 0.088 (0.290) 0.064 (0.249) 0.000 (3.165) -0.566 (7.174) -1.242 (43.651) -0.045 (3.079) 0.067 (2.590) 1 -0.106 -0.082 (0.189) -0.127 (0.350) -0.053 (0.273) -0.017 (0.752) 0.108 (2.567) -0.023 (0.524) -0.027 (0.496) -0.019 (0.597) π1 = τ = 0. 97 0.2 2 0.053 0.042 (0.110) 0.067 (0.210) 0.028 (0.151) 0.010 (0.370) -0.035 (1.639) 0.014 (0.260) 0.015 (0.259) 0.012 (0.284) 999/1000 3 0.053 0.039 (0.103) 0.060 (0.223) 0.025 (0.147) 0.007 (0.397) -0.073 (1.504) 0.009 (0.292) 0.013 (0.257) 0.007 (0.337) 0.5 1 2 3 -0.106 0.053 0.053 -0.102 0.051 0.051 (0.071) (0.040) (0.038) -0.100 0.050 0.050 (0.123) (0.070) (0.069) -0.101 0.050 0.050 (0.075) (0.041) (0.039) -0.101 0.051 0.050 (0.075) (0.041) (0.039) -0.099 0.050 0.049 (0.132) (0.073) (0.073) -0.101 0.050 0.050 (0.075) (0.041) (0.040) -0.101 0.050 0.050 (0.075) (0.041) (0.040) -0.101 0.050 0.050 (0.075) (0.041) (0.040) 1000/1000 Table 2.6: APEs with w = 1[z2 π2 + u > 0], ρ = 1, and e|v ∼ χ23 π2 g Mean Mean SD Two Step Mean Flexible SD Two Step Mean SD Plug-in LCF Mean SD LCF Mean SD Flexible 2SLS Mean SD LIV Mean SD (IV=Φ(·)) Linear Mean Plug-in SD Replications δg Two Step 1. 1 -0.036 -0.064 (0.275) 0.011 (0.431) -0.036 (0.428) -0.046 (3.994) -1.639 (9.191) 0.057 (2.241) -0.074 (1.891) 0.077 (2.632) 0.1 2 0.018 0.035 (0.168) -0.005 (0.279) 0.018 (0.252) 0.021 (2.047) 0.772 (5.188) -0.044 (1.396) 0.194 (4.725) -0.055 (1.632) 987/1000 3 0.018 0.028 (0.166) -0.006 (0.283) 0.018 (0.253) 0.025 (2.060) 0.868 (4.902) -0.013 (1.118) -0.120 (3.054) -0.022 (1.229) 1 -0.036 -0.029 (0.159) -0.002 (0.345) -0.012 (0.218) 0.042 (1.484) -0.464 (4.056) -0.029 (0.645) -0.007 (0.411) -0.043 (1.044) π1 = τ = 0. 98 0.2 2 0.018 0.018 (0.093) 0.000 (0.215) 0.008 (0.119) -0.013 (0.654) 0.227 (2.409) 0.018 (0.275) 0.007 (0.195) 0.024 (0.443) 999/1000 3 0.018 0.012 (0.093) 0.002 (0.209) 0.003 (0.126) -0.029 (0.838) 0.237 (1.797) 0.012 (0.392) 0.000 (0.250) 0.019 (0.614) 0.5 1 2 3 -0.035 0.018 0.018 -0.044 0.023 0.021 (0.031) (0.022) (0.022) -0.030 0.016 0.013 (0.082) (0.053) (0.051) -0.027 0.014 0.013 (0.042) (0.026) (0.026) -0.026 0.014 0.012 (0.041) (0.026) (0.025) -0.039 0.021 0.018 (0.081) (0.054) (0.051) -0.026 0.014 0.012 (0.040) (0.025) (0.025) -0.026 0.014 0.012 (0.040) (0.025) (0.025) -0.026 0.014 0.012 (0.040) (0.025) (0.025) 1000/1000 Table 2.7: MSEs of APE estimates with w = 1[z2 π2 + u > 0] and ρ = 1 D (e|v) Normal (0,1) χ23 1. 2. π2 g Two Step Two Step Flexible Two Step Plug-in LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in Two Step Two Step Flexible Two Step Plug-in LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in 1 0.094 0.189 0.196 40.802 246.059 8301.533 54.686 27.963 0.076 0.188 0.183 15.953 87.046 5.032 3.579 6.941 0.1 2 0.036 0.082 0.074 10.722 82.795 2251.416 19.166 8.016 0.029 0.078 0.063 4.189 27.488 1.952 22.353 2.670 3 0.030 0.085 0.062 10.023 51.851 1907.082 9.490 6.706 0.028 0.081 0.064 4.244 24.755 1.251 9.343 1.512 π1 = τ = 0. MSEs are calculated from Table 2.5 and 2.6. 99 1 0.036 0.123 0.078 0.574 6.636 0.281 0.252 0.364 0.025 0.120 0.048 2.208 16.638 0.416 0.170 1.091 0.2 2 3 0.012 0.011 0.044 0.050 0.024 0.022 0.139 0.160 2.693 2.279 0.069 0.087 0.069 0.068 0.082 0.116 0.009 0.009 0.047 0.044 0.014 0.016 0.428 0.705 5.848 3.277 0.075 0.153 0.038 0.063 0.196 0.377 1 0.005 0.015 0.006 0.006 0.017 0.006 0.006 0.006 0.001 0.007 0.002 0.002 0.007 0.002 0.002 0.002 0.5 2 0.002 0.005 0.002 0.002 0.005 0.002 0.002 0.002 0.001 0.003 0.001 0.001 0.003 0.001 0.001 0.001 3 0.001 0.005 0.002 0.002 0.005 0.002 0.002 0.002 0.000 0.003 0.001 0.001 0.003 0.001 0.001 0.001 Table 2.8 compares the rejection frequencies for the different degrees of endogeneity when the strongest instrument is in use. The tests by both the two step estimation method and LCF have good size properties; as ρ = 0, their rejection frequencies are quite close to the nominal value. Allowing for the additional terms of v makes them over-reject. The two step estimation method, however, has better power than LCF for both the two distributions. Figure 2.1 and 2.217 illustrate a pattern that the two step estimation method’s rejection frequencies increase faster than LCF’s as ρ > 0. With χ23 distribution, the pattern is more evident and the flexible way helps the two step estimation method have slightly better power. 17 The graphs show average rejection frequencies for the two choices at each value of ρ in Table 2.8. 100 Table 2.8: Rejection Frequencies for α = 0.05 test in Case 1 with varying ρ D (e|v) Normal (0,1) χ23 1. 2. ρ g Two Step LCF Two Step flexible LCF flexible Two Step LCF Two Step flexible LCF flexible 0 2 0.05 0.05 0.11 0.10 0.06 0.06 0.17 0.11 0.1 3 0.06 0.06 0.11 0.10 0.07 0.04 0.17 0.11 2 0.11 0.09 0.15 0.13 0.09 0.06 0.19 0.12 3 0.12 0.09 0.16 0.12 0.08 0.05 0.18 0.11 0.25 2 0.46 0.30 0.40 0.26 0.22 0.10 0.28 0.13 3 0.46 0.32 0.41 0.25 0.22 0.07 0.29 0.13 0.5 2 0.92 0.79 0.89 0.66 0.54 0.20 0.58 0.18 3 0.92 0.79 0.89 0.68 0.55 0.18 0.58 0.21 0.75 2 1.00 0.97 0.99 0.93 0.78 0.37 0.85 0.32 3 1.00 0.97 0.99 0.94 0.79 0.38 0.85 0.31 π2 = 1, π1 = τ = 0. The null hypotheses for the flexible models are that the coefficients of v, v2 , and v3 are zeroes. 101 0.9 2 1.00 0.99 1.00 0.98 0.88 0.48 0.94 0.41 3 1.00 0.99 1.00 0.98 0.88 0.51 0.93 0.41 1 2 1.00 1.00 1.00 0.99 0.91 0.56 0.96 0.48 3 1.00 1.00 1.00 0.99 0.90 0.60 0.96 0.49 Figure 2.1: Rejection Frequencies for Normal(0,1) with varying ρ 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0 0.1 Two Step 0.25 LCF 0.5 0.75 0.9 Two Step flexible 1 ρ LCF flexible Figure 2.2: Rejection Frequencies for χ23 with varying ρ 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0 Two Step 0.1 0.25 LCF 0.5 0.75 Two Step flexible 0.9 1 ρ LCF flexible Furthermore, the rejection frequencies in Table 2.9 suggest that the stronger the instru- 102 ment is, the higher the power is, except LCF flexible. Figure 2.3 and 2.418 show that the two step estimation method has slightly better power with the standard normal distribution and much better power with the χ23 distribution than LCF. Again, adding additional forms of v in the estimation have the two step estimation method have slightly higher rejection frequencies for χ23 distribution. LCF flexible has a different pattern, however; it has a U-shape graph whereas the other three are monotonically increasing, suggesting LCF could have higher power by including v in a flexible way if a weak instrument is in use. 18 The graphs show average rejection frequencies for the two choices at each value of π where π = 2 1 τ = 0 in Table 2.9. 103 Table 2.9: Rejection Frequencies for α = 0.05 test in Case 1 with varying π2 D (e|v) Normal (0,1) χ23 1. 2. τ π1 π2 g Two Step LCF Two Step flexible LCF flexible Replication Two Step LCF Two Step flexible LCF flexible Replication -0.5 0.5 1 0 0 1 0.5 0.2 0.1 2 3 1.00 1.00 1.00 1.00 1.00 1.00 0.98 0.99 1000 0.88 0.89 0.58 0.60 0.95 0.95 0.48 0.48 1000 2 3 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.99 1000 0.91 0.90 0.56 0.60 0.96 0.96 0.48 0.49 1000 2 3 0.97 0.97 0.92 0.92 0.98 0.97 0.86 0.87 1000 0.61 0.61 0.35 0.36 0.70 0.70 0.31 0.28 1000 2 3 0.43 0.43 0.32 0.33 0.40 0.39 0.60 0.58 999 0.17 0.16 0.10 0.10 0.23 0.24 0.35 0.33 999 2 3 0.15 0.14 0.14 0.12 0.16 0.15 0.69 0.68 995 0.09 0.09 0.08 0.07 0.16 0.16 0.53 0.52 987 ρ=1 The null hypotheses for the flexible models are that the coefficients of v, v2 , and v3 are zeroes. 104 Figure 2.3: Rejection Frequencies for Normal(0,1) with varying π2 , (π1 = τ = 0) 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0.1 Two Step 0.2 LCF 0.5 Two Step flexible 1 π2 LCF flexible Figure 2.4: Rejection Frequencies for χ23 with varying π2 , (π1 = τ = 0) 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0.1 Two Step 0.2 LCF 0.5 Two Step flexible 1 π2 LCF flexible Therefore, the simulations with the data generated with the two distributions demonstrate that when the instrument is strong, the two step estimation’s approximation to 105 APEs is as good as the alternative methods. Plus, although a weak instrument deteriorates its approximation, it makes much worse the approximations by the alternative linear models. In addition, the two step estimation’s test for endogeneity show good size and better power than LCF’s. Case 2: Endogeneity comes through e as well as u. Now we consider a setting where D (e|v) is a heteroskedastic normal, in which w is not exogenous even though ρ = 0. For this case, we include two additional estimation methods treating that all covariates are exogenous: Fmlogit and OLS. We would like to examine how they work as ρ = 0. When z2 is a strong instrument,19 Table 2.10 shows that the estimates have bigger biases than those with the standard normal distribution, and that the larger ρ is, the larger their biases are.20 Among the estimation methods, LCF flexible is the best with regard to bias. Yet, in considering the mean squared errors (MSEs) in Table 2.11, the two step estimation method and the other alternative methods also provide good approximations.21 Interestingly, as ρ = 0, the estimates of Fmlogit and OLS are quite similar to δg . So these estimates suggest that the dependence w and r due to the heteroskedasticity of e does not cause any distortion in estimating APEs. With the strong degree of endogeneity (ρ = 1), although the biases in Table 2.12 are not monotonically decreasing as π2 is larger, the MSEs in Table 2.13 are. Note that when a weak instrument is used, LCF Flexible provides more biased estimates than Fmlogit and OLS, and the MSEs of the methods taking the endogeneity into account are not smaller than them except the two step estimation method with π2 = 0.2. The results also suggest that the flexible methods are not helpful at all. 19 The first stage F statistics and the first step Wald statistics in this case have similar summary statistics as in Table 2.4. 20 The results with other values of ρ are available upon request. 21 When τ = −0.5 and ( π , π ) = (0.5, 1), the results are not much different from those as π = 1 in 2 2 1 Table 2.10 and2.11. 106 Table 2.10: APEs with r = ρu + e and π2 = 1 in Case 2 ρ g δg Two Step Two Step Flexible Two Step Plug-in Fmlogit LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in OLS Mean Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD 1 -0.106 -0.115 (0.031) -0.112 (0.038) -0.114 (0.032) -0.224 (0.018) -0.115 (0.032) -0.106 (0.039) -0.113 (0.032) -0.114 (0.032) -0.114 (0.032) -0.224 (0.018) 1. e|v ∼ Normal 0, 1 + 21 v2 2. π1 = τ = 0. 1 2 0.053 0.058 (0.018) 0.056 (0.021) 0.057 (0.018) 0.112 (0.010) 0.058 (0.018) 0.053 (0.021) 0.057 (0.018) 0.057 (0.018) 0.057 (0.018) 0.112 (0.010) 107 3 0.053 0.057 (0.017) 0.056 (0.021) 0.057 (0.017) 0.112 (0.010) 0.058 (0.018) 0.053 (0.021) 0.056 (0.018) 0.057 (0.017) 0.057 (0.017) 0.112 (0.010) 1 -0.107 -0.112 (0.028) -0.115 (0.033) -0.111 (0.027) -0.167 (0.016) -0.111 (0.029) -0.107 (0.034) -0.110 (0.028) -0.110 (0.028) -0.110 (0.028) -0.167 (0.016) 0.5 2 0.053 0.056 (0.016) 0.058 (0.019) 0.055 (0.016) 0.083 (0.009) 0.056 (0.016) 0.053 (0.019) 0.055 (0.016) 0.055 (0.016) 0.055 (0.016) 0.083 (0.009) 3 0.053 0.056 (0.015) 0.057 (0.019) 0.055 (0.015) 0.083 (0.009) 0.055 (0.016) 0.053 (0.019) 0.055 (0.016) 0.0554 (0.015) 0.055 (0.015) 0.083 (0.009) Table 2.10: (cont’d) ρ g δg Two Step Two Step Flexible Two Step Plug-in Fmlogit LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in OLS Mean Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD 1 -0.107 -0.109 (0.026) -0.115 (0.031) -0.108 (0.025) -0.119 (0.015) -0.108 (0.027) -0.107 (0.031) -0.108 (0.026) -0.108 (0.026) -0.108 (0.026) -0.119 (0.016) 0.1 2 0.054 0.055 (0.015) 0.058 (0.018) 0.054 (0.015) 0.059 (0.009) 0.054 (0.015) 0.054 (0.018) 0.054 (0.015) 0.054 (0.015) 0.054 (0.015) 0.059 (0.009) 108 3 0.054 0.054 (0.015) 0.057 (0.018) 0.054 (0.015) 0.059 (0.009) 0.054 (0.015) 0.053 (0.018) 0.054 (0.015) 0.054 (0.015) 0.054 (0.015) 0.059 (0.009) 1 -0.107 -0.108 (0.026) -0.115 (0.031) -0.108 (0.025) -0.107 (0.015) -0.108 (0.027) -0.107 (0.031) -0.108 (0.026) -0.108 (0.025) -0.108 (0.026) -0.107 (0.016) 0 2 0.054 0.054 (0.015) 0.057 (0.017) 0.054 (0.015) 0.053 (0.009) 0.054 (0.015) 0.054 (0.018) 0.054 (0.015) 0.054 (0.015) 0.054 (0.015) 0.053 (0.009) 3 0.054 0.054 (0.015) 0.057 (0.018) 0.054 (0.014) 0.053 (0.009) 0.054 (0.015) 0.054 (0.018) 0.054 (0.015) 0.054 (0.015) 0.054 (0.015) 0.053 (0.009) Table 2.11: MSEs of APE estimates with r = ρu + e and π2 = 1 in Case 2 ρ g Two Step Two Step Flexible Two Step Plug-in Fmlogit LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in OLS 1 0.001 0.002 0.001 0.014 0.001 0.001 0.001 0.001 0.001 0.014 1 2 0.000 0.000 0.000 0.004 0.000 0.000 0.000 0.000 0.000 0.004 3 0.000 0.000 0.000 0.004 0.000 0.000 0.000 0.000 0.000 0.004 1. e|v ∼ Normal 0, 1 + 12 v2 2. MSEs are calculated from Table 2.10. 1 0.001 0.001 0.001 0.004 0.001 0.001 0.001 0.001 0.001 0.004 0.5 2 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.001 109 3 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.001 1 0.001 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.001 0.000 0.1 2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1 0.001 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.001 0.000 0 2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 Table 2.12: APEs with w = 1[z2 π2 + u > 0] and ρ = 1 in Case 2 π2 g Mean Mean SD Two Step Mean SD Flexible Two Step Mean SD Plug-in Fmlogit Mean SD LCF Mean SD LCF Mean SD Flexible 2SLS Mean SD LIV Mean (IV=Φ(·)) SD Linear Mean Plug-in SD OLS Mean SD Replications δg Two Step 0.1 1 2 -0.102 0.051 -0.175 0.093 (0.324) (0.192) -0.198 0.097 (0.440) (0.286) -0.138 0.073 (0.483) (0.280) -0.333 0.167 (0.020) (0.010) -0.205 0.112 (2.644) (1.284) 0.884 -0.446 (11.653) (7.151) -0.194 0.108 (2.241) (1.314) -0.221 0.142 (2.618) (1.842) -0.197 0.112 (2.213) (1.268) -0.333 0.167 (0.020) (0.010) 997/1000 1. e|v ∼ Normal 0, 1 + 12 v2 2. π1 = τ = 0. 3 0.051 0.082 (0.192) 0.101 (0.293) 0.064 (0.280) 0.167 (0.011) 0.093 (1.509) -0.438 (5.858) 0.085 (1.189) 0.079 (1.314) 0.086 (1.203) 0.167 (0.011) 0.2 1 2 3 -0.102 0.051 0.051 -0.109 0.057 0.052 (0.212) (0.116) (0.117) -0.160 0.081 0.079 (0.378) (0.223) (0.227) -0.090 0.047 0.042 (0.271) (0.146) (0.148) -0.330 0.165 0.165 (0.020) (0.010) (0.011) -0.103 0.056 0.047 (0.782) (0.469) (0.336) 0.111 -0.060 -0.051 (3.230) (2.032) (1.919) -0.089 0.051 0.038 (0.575) (0.398) (0.232) -0.079 0.043 0.037 (0.397) (0.207) (0.219) -0.091 0.052 0.039 (0.628) (0.430) (0.246) -0.330 0.165 0.165 (0.020) (0.010) (0.011) 1000/1000 110 0.5 1 2 3 -0.103 0.051 0.051 -0.125 0.063 0.062 (0.081) (0.044) (0.043) -0.120 0.060 0.060 (0.149) (0.079) (0.081) -0.126 0.063 0.063 (0.080) (0.043) (0.042) -0.306 0.153 0.153 (0.020) (0.010) (0.011) -0.128 0.064 0.064 (0.082) (0.044) (0.043) -0.094 0.047 0.047 (0.156) (0.082) (0.085) -0.126 0.063 0.063 (0.081) (0.044) (0.043) -0.127 0.064 0.063 (0.081) (0.044) (0.043) -0.127 0.064 0.063 (0.081) (0.044) (0.043) -0.306 0.153 0.153 (0.020) (0.010) (0.011) 1000/1000 Table 2.13: MSEs of APE estimates with w = 1[z2 π2 + u > 0] and ρ = 1 in Case 2 π2 g Two Step Two Step Flexible Two Step Plug-in Fmlogit LCF LCF Flexible 2SLS LIV (IV=Φ(·)) Linear Plug-in OLS 1 0.110 0.203 0.234 0.054 7.004 136.772 5.031 6.867 4.908 0.054 0.1 2 0.039 0.084 0.079 0.013 1.652 51.391 1.730 3.403 1.611 0.013 3 0.038 0.088 0.079 0.014 2.278 34.559 1.415 1.727 1.449 0.014 1. e|v ∼ Normal 0, 1 + 12 v2 2. π1 = τ = 0. MSEs are calculated from Table 2.12. 3. 111 1 0.045 0.146 0.073 0.052 0.611 10.480 0.330 0.158 0.395 0.053 0.2 2 3 0.014 0.014 0.051 0.052 0.021 0.022 0.013 0.013 0.220 0.113 4.141 3.695 0.159 0.054 0.043 0.048 0.185 0.060 0.013 0.013 1 0.007 0.023 0.007 0.042 0.007 0.024 0.007 0.007 0.007 0.042 0.5 2 0.002 0.006 0.002 0.010 0.002 0.007 0.002 0.002 0.002 0.010 3 0.002 0.007 0.002 0.010 0.002 0.007 0.002 0.002 0.002 0.010 The rejection frequencies in Table 2.14 and 2.15 tell that the tests by the two step estimation method and LCF cannot detect the endogeneity due to the heteroskedasticity of e; as ρ = 0, their rejection frequencies are still close to 0.05. Yet, those by their flexible methods can detect it. The form of heteroskedasticity and the flexible methods contain v2 and v2 , respectively. We suspect that it might be related with that the flexible methods yield higher rejection frequencies. As ρ > 0, the patters in the rejection frequencies are similar as those with the standard normal distribution although the rejection frequencies themselves are generally smaller. It is more obviously shown in Figure 2.5 and 2.6.22 Overall, we find that the results with the heteroskedastic normal distribution are pretty similar to those of the standard normal distribution regarding approximations to APEs and a test of endogeneity. The two additional estimation methods provide evidence that the endogeneity caused only by heteroskedasticity does not matter to obtain an approximation to APEs. Plus, although including the higher polynomial v ensure the two step estimation method and LCF to have tests detecting this kind of endogeneity, it does not helps their approximations to APEs. In summary, the simulations demonstrate that the two step estimation method with a misspecified conditional mean works well as an approximation if the instrument is strong. Even with a weak instrument, it yields a better approximation than the alternative methods. In addition, its test for endogeneity has about the correct size and better power than LCF’s. 22 The graphs are average rejection frequencies for the two choices at each value of ρ and π in Table 2.14 2 and 2.15, respectively. 112 Table 2.14: Rejection Frequencies for α = 0.05 test in Case 2 with varying ρ D (e|v) Normal 0, 1 + 12 v2 1. 2. 0 0.1 0.25 0.5 0.75 0.9 2 0.08 3 0.07 2 0.08 3 0.08 2 0.20 3 0.20 2 0.53 3 0.53 2 0.84 3 0.85 2 0.94 3 0.94 2 0.96 3 0.96 LCF Two Step flexible LCF flexible 0.07 0.43 0.31 0.07 0.43 0.30 0.07 0.45 0.31 0.07 0.45 0.32 0.16 0.50 0.39 0.17 0.50 0.37 0.46 0.70 0.58 0.47 0.70 0.59 0.77 0.88 0.80 0.77 0.89 0.81 0.90 0.95 0.89 0.90 0.94 0.89 0.94 0.97 0.93 0.94 0.97 0.92 π2 = 1, π1 = τ = 0. The null hypotheses for the flexible models are that the coefficients of v, v2 , and v3 are zeroes. Table 2.15: Rejection Frequencies for α = 0.05 test in Case 2 with varying π2 D (e|v) Normal 0, 1 + 12 v2 1. 2. 1 ρ g Two Step -0.5 0.5 1 τ π1 π2 g Two Step 2 0.94 LCF Two Step flexible LCF flexible Replication 0.91 0.91 0.94 0.94 0.90 0.92 1000 3 0.94 0 0 1 2 0.96 0.5 3 0.96 0.94 0.94 0.97 0.97 0.93 0.92 1000 2 0.69 3 0.70 0.62 0.65 0.66 0.66 0.63 0.62 1000 0.2 2 0.21 3 0.21 0.18 0.18 0.18 0.18 0.39 0.39 1000 0.1 2 0.09 3 0.10 0.09 0.09 0.10 0.10 0.51 0.52 997 ρ=1 The null hypotheses for the flexible models are that the coefficients of v, v2 , and v3 are zeroes. 113 Figure 2.5: Rejection Frequencies for Normal 0, 1 + 12 v2 with varying π2 , (π1 = τ = 0) 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0 0.1 Two Step 0.25 LCF 0.5 0.75 Two Step flexible 0.9 1 ρ LCF flexible Figure 2.6: Rejection Frequencies for Normal 0, 1 + 12 v2 with varying π2 , (π1 = τ = 0) 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0.1 Two Step 0.2 LCF 0.5 Two Step flexible 114 1 LCF flexible π2 2.4 CONCLUSION This chapter studies a two step estimation method for multiple fractional dependent variables especially when there is a binary endogenous explanatory variable. By employing a control function approach suggested by Wooldridge (2014), this method is an extension of a two step estimation method developed in Chapter 1 where the endogenous explanatory variable is continuous. With the assumption that the conditional mean, conditional on only observed variables and the generalized residual, is multinomial logit, it applies a QMLE to obtain consistent estimator of the parameters in the conditional mean. Although it is not able to estimate the mean parameter in the structural conditional mean, it is able to provide a consistent estimator of APE, which is often more interesting. By conducting Monte Carlo simulations, this chapter provides evidence that even when the conditional mean is misspecified, the two step estimation method yields a decent approximation to average partial effect. Its approximation is as good as or often better than the alternative methods including a linear control function approach, the standard two stage least squares, and plug-in methods. This results tell a consistent story with Chapter 1. In addition, the simulations demonstrate that the two step estimation method’s test for endogeneity outperforms the linear control function approach in power although both of the two methods have approximately the correct size. 115 REFERENCES 116 REFERENCES Blundell, R., and J. L. Powell. 2003. “Endogeneity in Nonparametric and Semiparametric Regression Models.” In Advances in Economics and Econometrics. eds. by L. P. H. Dewatripont, Mathias, and S. J. Turnovsky, 2: Cambridge: Cambridge University Press, , 1st edition 312–357. Gourieroux, C., A. Monfort, E. Renault, and A. Trognon. 1987. “Generalised residuals.” Journal of Econometrics, 34(1-2): 5–32. , , and A. Trognon. 1984. “Pseudo Maximum Likelihood Methods: Theory.” Econometrica, 52(3): 681–700. Mullahy, J. 2010. “Multivariate Fractional Regression Estimation of Econometric Share Models.” NBER Working Papers 16354, National Bureau of Economic Research, Inc. Nam, S. 2014. “Multiple fractional response variables with continuous endogenous explanatory variables.” June. Sivakumar, A., and C. Bhat. 2002. “Fractional Split-Distribution Model for Statewide Commodity-Flow Analysis.” Transportation Research Record, 1790(1): 80–88. Staiger, D., and J. H. Stock. 1997. “Instrumental Variables Regression with Weak Instruments.” Econometrica, 65(3): 557–586. Terza, J. V., A. Basu, and P. J. Rathouz. 2008. “Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling.” Journal of Health Economics, 27(3): 531–543. Wooldridge, J. M. 2005. “Unobserved Heterogeneity and Estimation of Average Partial Effects.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg. eds. by D. W. K. Andrews, and J. H. Stock: Cambridge University Press, , Chap. 3. . 2010. Econometric Analysis of Cross Section and Panel Data.: MIT Press. Wooldridge, J. M. 2014. “Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables.” Journal of Econometrics. 117 CHAPTER 3 ON COMPUTING AVERAGE PARTIAL EFFECTS IN MODELS WITH ENDOGENEITY OR HETEROGENEITY 3.1 INTRODUCTION It is now widely recognized that magnitudes of partial effects are important for determining the importance of policy interventions and for understanding the strengths of relationships posited by economic theory. In models where unobservables are assumed to be independent of observed covariates (or, in some cases, a weaker assumption, such as mean independence), there is little controversy about how one should compute partial effects – at least once one has decided whether the conditional mean or some other feature of a conditional distribution is the focus. Assuming, as we do in this chapter, that the conditional mean is the focus, one typically computes partial derivatives or discrete changes of a conditional mean function with respect to the explanatory variables of interest. To obtain a single number summarizing the effect of a covariate, one often averages the partial effects across the distribution of the covariates. This average leads to the notion of an “average partial effect” (or APE), and APEs are typically straightforward to estimate in parametric, semiparametric, and even nonparametric models with exogenous explanatory variables.1 Other possibilities for summarizing the effect of a covariate is to insert specific values of the covariates, such as means or medians, into the estimated 1 We prefer the name “partial effect” to “marginal effect” because “marginal” connotes a small change. Average partial effects can be computed using derivatives or discrete changes, and the discrete changes can be of any magnitude. 118 partial effects, but these seems less desirable and is less popular. When unobservables are correlated with one or more covariates – so we have some form of “endogeneity” – it is not clear how one should summarize partial effects. Blundell and Powell (2003) propose the notion of an “average structural function” (ASF). The ASF is defined as a function of covariates after the unobservables have been averaged out. More precisely, suppose a response Y is determined as Y = G (X, U) for observed covariates X and unobservables U. We obtain the ASF as a function of x by inserting x into G (·, ·) and then averaging across the unobservables. More precisely, ASF (x) ≡ EU [ G (x, U)]. Partial effects are then obtained by taking partial derivatives or differences of ASF (x). In general, the partial effects defined in this way depend on x. Blundell and Powell (2003) showed how to estimate the ASF very generally in models with endogenous explanatory variables, provided, of course, one has sufficient instrumental variables. Wooldridge (2005b) focused on the partial effects, which in the continuous case are defined as ∂ASF (x)/∂x j , in a similar setting, but with parametric models.2 Part of the appeal of the ASF is that its definition is the same regardless of whether U and X are dependent. Because the ASF is a function of x, one can see how partial effects on the ASF change as elements of x change, and the differences across different values of x can be of significant interest. Even so, one often wants to compare estimates from nonlinear models with estimates from simple linear models – which are often estimated by two stage least squares or, in the case of panel data, standard fixed effects methods. The question is: How should one summarize the partial effects of observed covariates in nonlinear models to make them comparable to linear estimates? There are two possibilities. First, we can obtain the partial effects from the ASF and then average across the distribution of the observed covariates, X. A second possibility is to compute partial effects with respect to X from Y = G (X, U) and then average across (X, U). As we will see in Section 3.3, these two methods are not generally the same, and they can actually be 2 In Wooldridge (2010)and the first edition, the notion of an APE is used regularly for models with endogeneity or correlated random effects. 119 very different. Wooldridge (2010) discusses APEs based on the average structural function and those based directly on a conditional mean specification such as E(Y |X, U) = G (X, U). But he makes no systematic attempt to compare these APEs, and does not even mention that they can be different. This chapter has two goals. The first is to clarify the relationship between the two kinds of partial effects and demonstrate that they generally differ. We will not resolve which partial effect is “better” because it is mainly a matter of test. Second, we show that the kind of control function/correlated random effects approaches discussed in Blundell and Powell (2003), Altonji and Matzkin (2005), and Wooldridge (2005b, 2010) can be used to consistently estimate both types of partial effects. The rest of the chapter is organized as follows. Section 3.2 defines the average structural function, slightly extending the definition in Blundell and Powell (2003). We then show that the key Blundell and Powell result about identifying the ASF when suitable “proxies” are available carries through with the more general definition. We also note that, assuming derivatives can be passed under integrals, the ASF identifies average partial effects. Section 3.3 offers a different way to define a single average partial effect and shows that it is generally different from a definition based on averaging the observed covariates out of the average structural function. We also show that if the heterogeneity and covariates are assumed to be independent, the two ways of computing APEs coincide. Section 3.4 illustrates how to compute the two types of partial effects in two empirical examples using Michigan Educational Assessment Program data, and Section 3.5 summarizes and concludes. 120 3.2 THE AVERAGE STRUCTURAL FUNCTION AND AVERAGE PARTIAL EFFECTS In what follows, it is helpful to use a notation that clearly distinguishes random vectors from particular outcomes of those vectors. We use the traditional convention from probability that upper case letters are random variables or vectors and the lower case counterparts are specific possible values. We are interested in the conditional mean of a response variable, Y, conditional on a vector of observed covariates, X, and a vector of unosbervables, U: E(Y |X, U) = G (X, U), (3.1) where (Y, X, U) has a joint distribution in a population. We can also write E(Y |X = x, U = u) = G (x, u). (3.2) This setup subsumes that in Blundell and Powell (2003), who assume that Y is a deterministic function of (X, U). Because conditional probabilities can be written as conditional means we cover partial effects for probabilities as a special case. In many applications, G (·, u) is continuously differentiable on the the support of X, X , which we would then assume to be an open set. But we are also interested in cases where G (·, ·) is not differentiable in either argument. For example, if Y is binary and we assume Y = 1[α + Xβ + U > 0] for a K × 1 vector β, G (·, ·) is not differentiable. In this case we define partial effects as changes. For example, in moving the K th variable from x0K to x1K , holding the other 121 elements of x, say x(K ) fixed, the partial effect is 1[α + x(K ) β(K ) + β K x1K + u > 0] − 1[α + x(K ) β(K ) + β K x0K + u > 0] More generally, the effect of changing x from x0 to x1 is G ( x1 , u ) − G ( x0 , u ). When G (·, u) is continuously differentiable in x j we can define the partial effect with respect to x j as g j (x, u) = ∂G (x, u) . ∂x j (3.3) Wooldridge (2005b, 2010) defines the average partial effect as APEj (x) = EU [ g j (x, U)], (3.4) so that the unobservables are averaged out. It is important to see that the APE is obtained by inserting a fixed value for x and then averaging across the unconditional distribution of U. We are not using the conditional distribution, D (U|X), in the averaging, and we are not restricting this conditional distribution. In the special case that U and X are independent, D (U|X) = D (U) and then the distinction is irrelevant. As discussed in Wooldridge (2005b, 2010, Section 2.2.5), APEj (x) is closely related to the notion of an average structural function (Blundell and Powell, 2003): ASF (x) = EU [ G (x, U)]. (3.5) Actually, this definition is somewhat more general than that used by Blundell and Powell, who effectively write Y = G (X, U). The more general definition is important in situations where Y is discrete and we model endogeneity as correlation between one or more omit- 122 ted variables, in U, and one or more observed covariates, in X. The ASF is a very useful function because one can take derivatives or changes with respect to elements of x. Assuming that the partial derivative can be taken through the sum or integral defining EU [ G (x, U)] – see, for example, Bartle (1966) for general conditions – we have APEj (x) = ∂ASF (x) . ∂x j (3.6) As mentioned earlier, the definition of the ASF takes no stand on D (U|X). However, if U and X are independent then the ASF is the same as the conditional expectation. This follows immediately from the the law of iterated expectations and the general Lebesgue representation of conditional expectation. Let E(Y |X = x) = E[ E(Y |X, U)|X = x] = = G (x, u) Q(du|x) G (x, u) Q(du) = EU [ G (x, U)] ≡ ASF (x), (3.7) where Q(·) is an appropriate σ-finite measure. When U and X are dependent, we generally cannot obtain interesting partial effects by estimating E(Y |X). There are exceptions, of course. For example, if Y = α + Xfi + U with E(U ) = 0, E(X U ) = 0, but E(U |X) = E(U ), OLS using a random sample is generally consistent for (α, β), which indexes the ASF: ASF (x) = α + xfi. However, simple extensions with correlated random slopes do permit OLS to consistently estimate the ASF. And in nonlinear models, directly estimating E(Y |X) rarely leads to interesting quantities unless U and X are independent. Blundell and Powell (2003) and Wooldridge (2005b) show how to identify the ASF when “proxy” variables for U, say V, are available. Sometimes we assume that we observe suitable proxies, such as standardized test scores to proxy for cognitive ability. In the context of BP (2003), V is a vector of reduced form errors for endogenous elements of X. When V is a vector of reduced form errors we need exogenous variables from outside 123 the equation to serve as instruments. Suitable proxies are also available in a panel data context, where V can be a vector of functions of a time series of covariates {X1 , X2 , ..., X T }. A leading case is V = T −1 ∑tT=1 Xt . See, for example, Altonji and Matzkin (2005) and Wooldridge (2005a, 2010). Two key restrictions on V suffice to identify average partial effects. The first is that V is redundant in the “structural” conditional expectation. The second is that V is a good enough proxy for U so that U and X are independent conditional on V. Formally, we state these assumptions as E(Y |X, U, V) = E(Y |X, U) (redundancy of V) D (U|X, V) = D (U|V) (conditional independence) (3.8) (3.9) Sometimes assumption (3.8) is called a “conditional mean independence” assumption, because the mean of Y is independent of V once we also condition on (X, U). Under assumptions (3.8) and (3.9), we have an important identification result, which was discovered independently in different settings by Blundell and Powell (2003), Altonji and Matzkin (2005), and Wooldridge (2002, 2005b). In what follows we assume that conditional means exist, along with standard moment conditions. Proposition 1: Let (Y, X, U, V) be a random vector such that (3.8) and (3.9) hold. Define H (x, v) ≡ E(Y |X = x, V = v). (3.10) ASF (x) = EV [ H (x, V)] (3.11) Then 124 Proof: By the law of iterated expectations and the redundancy condition (3.8), E(Y |X = x, V = v) = E[ E(Y |X = x, U, V = v)|X = x, V = v] = E[ E(Y |X = x, U)|X = x, V = v] = E[ G (x, U)|X = x, V = v] Next, by conditional independence (3.9), E[ G (x, U)|X = x, V = v] = E[ G (x, U)|V = v], and so we have established the key relationship H (x, v) = E[ G (x, U)|V = v]. (3.12) Now integrate (in the measure theoretic sense) both sides with respect to the distribution of V and use iterated expectations on the right: EV [ H (x, V)] = EV { E[ G (x, U)|V]} = EU [ G (x, U)] = ASF (x). We can use Proposition 1 to obtain useful formulas for partial effects based on discrete changes. The “structural” quantity of interest is ASF (x1 ) − ASF (x0 ) = EU [ G (x1 , U) − G (x0 , U)] and Proposition 1 shows that this is the same as EV [ H (x1 , V) − H (x0 , V)]. An important special case, suppose XK is a binary treatment indicator – perhaps indicat- 125 ing participation in a program. Generally, we can estimate the average treatment effect of the program with the observables set at particular values. In particular, under the conditions of Proposition 1, EV [ H (x(K ) , 1, V) − H (x(K ) , 0, V)] = APEK (x(K ) ) = EU [ G (x(K ) , 1, U) − G (x(K ) , 0, U)], (3.13) where x(K ) denotes x without xK . For partial effects defined as derivatives, we have the following. Proposition 2: Under the same assumptions in Proposition 1, assume in addition that G (·, u) is continuously differentiable and the partial derivative with to x j can be passed through the integrals defining EV [ H (x, V)] and EU [ G (x, U)] then EV ∂H (x, V) ∂x j = EU ∂G (x, U) ∂x j (3.14) and so the APEs can be gotten from E(Y |X = x, V = v) by taking derivatives (or changes) with respect to x j and then averaging out V. The conclusion in equation (3.11) is very powerful, especially considering that there are general conditions under which H (x, v) is nonparametrically identified. Blundell and Powell (2003) study the case where some elements of X are correlated with U, but exogenous variables Z are available such that endogenous variables X2 can be represented as X2 = F ( Z ) + V (3.15) where (U, V) is independent of Z. Provided there are enough elements Z2 , where Z = (X1 , Z2 ), the function H (x, v) is nonparametrically identified. Of course, one can use parametric models or semiparametric models to estimate the functions F(z) and H (x, v) – either as the true models or as convenient approximations. We can turn the population formulas into estimators very generally. Given a random 126 sample {(Yi , Xi , Vi ) : i = 1, ..., N } and a consistent estimator Hˆ (x, v) of E(Y |X = x, V = v), a generally consistent estimator of ASF (x) is N ASF (x) = N −1 ∑ Hˆ (x, Vi ), (3.16) i =1 and the previous analysis shows that we can take derivatives or changes with respect to elements in x to estimate average partial effects. In some cases, including in the Blundell ˆ i , which depends on parameters or and Powell (2003) setup, Vi must be replaced with V functions that are consistently estimated in a first stage. In many studies one wants to report a single number that measures the effect of, say, x j on ASF (x). We might do this by evaluating APEj (x) at a central values of the elements of x, such as the means or medians. In the spirit of the average treatment effect literature, it is probably more appealing to average APEj (X) across the distribution of X: δj = EX [ APEj (X)]. (3.17) A consistent estimate of δj is immediate under standard regularity conditions, provided APE j (x) is consistent for each x: N δˆj = N −1 ∑ APE j (Xi ). (3.18) i =1 It turns out that the quantity in (3.18) is not the most commonly reported in empirical studies, and it is a bit cumbersome to compute. Plus, obtaining a standard error via analytical methods is a bit tricky. In the next section we discuss APEs where we average jointly across the distribution of (X, U). 127 3.3 APES ACROSS THE ENTIRE POPULATION As in the previous section, we assume interest centers on E(Y |X = x, U = u) = G (x, u), but now we wish to average the partial effects across the joint distribution of the observables and unobservables. For example, in moving xK from x0K to x1K , the average partial effect across the entire population is E[X(K) ,U] G (X(K ) , x1K , U) − G (X(K ) , x0K , U) . For a continuous variable X j we may define the partial effect as g j (x, u) = ∂G (x, u) ∂x j (3.19) and then the parameter of interest is η j = E(X,U) g j (X, U) . (3.20) Notice that X and U are treated symmetrically in equation (3.20), as in the treatment effects literature [for example, Imbens and Wooldridge (2009)]. In studying identification of (3.20) it makes no sense to start with the ASF because E(X,U) [ G (X, U)] = E(Y ) by iterated expectations, and so the expected value of the ASF with respect (X, U) conveys no useful information about how X j affects Y. Focusing for now on the the case where X j is continuous, the partial effect defined by (3.20) generally differs from δj in (3.17). In other words, EX ∂ASF (X) ∂x j = E(X,U) g j (X, U) . (3.21) To see why, it is helpful to consider a simple example, where X and U are both scalars. 128 Assume the conditional mean function is E(Y | X, U ) = β 0 + β 1 X + β 2 U + β 3 X 2 U, (3.22) where E(U ) = 0. Then ASF ( x ) = EU ( β 0 + β 1 x + β 2 U + β 3 x2 U ) = β 0 + β 1 x and so ∂ASF ( x ) = β1. ∂x (3.23) No further averaging is needed to obtain a single effect because APE( x ) is constant. By contrast, the partial effect of x on E(Y | X = x, U = u) is PE( x, u) = ∂E(Y | X = x, U = u) = β 1 + 2β 3 xu ∂x (3.24) and so η ≡ E(X,U ) [ PE( X, U )] = β 1 + 2β 3 Cov( X, U ). (3.25) Only if U and X are uncorrelated does (3.25) equal the partial derivative of the ASF. With substantial correlation between X and U or if β 3 is large in magnitude, the difference between (3.25) and (3.23) can be substantial. When one wants to study how the partial effects differ across a range of values of the explanatory variables, the ASF seems to be natural quantity of interest. But it is less clear that (3.17) is the best definition of the average effect across the population. If we follow the approach from the average treatment effects literature then (3.20) is preferred. As in the previous section, we first state a result that can be applied to partial effects defined by differences. Proposition 3: Under the same assumptions as Proposition 1, let xK be a fixed value. 129 Then E[X(K) ,V] H (X(K ) , xK , V) = E[X(K) ,U] G (X(K ) , xK , U) . Proof: From equation (3.12) we can write H (x(K ) , xK , v) = E[ G (x(K ) , xK , U)|V = v]. Now use conditional independence again: E[ G (x(K ) , xK , U)|V = v] = E[ G (x(K ) , xK , U)|X(K ) = x(K ) , V = v] = E[ G (X(K) , xK , U)|X(K) = x(K) , V = v], which we can write in terms of random variables as H (X(K ) , xK , V) = E[ G (X(K ) , xK , U)|X(K ) , V]. The proof is finished by taking the expected value with respect to (X(K ) , V) and using iterated expectations. For the continuous case, we have the following. Define h j (x, v) = ∂H (x, v) , ∂x j (3.26) the partial effect of E(Y |X = x, V = v) with respect to x j . Proposition 4: Define η j as in equation (3.20). Under the same assumptions as Proposition 2, η j = E(X,V) h j (X, V) . (3.27) Proof: From equation (3.12), H (x, v) = E[ G (x, U)|V = v], and assuming the partial derivative with respect to x j can be passed through the integrals, h j (x, v) = E[ g j (x, U)|V = v]. 130 Now use the conditional independence assumption again: h j (x, v) = E[ g j (x, U)|V = v] = E[ g j (x, U)|X = x, V = v] = E[ g j (X, U)|X = x, V = v], which we can write in terms of random variables as h j (X, V) = E[ g j (X, U)|X, V]. Now use iterated expectations to get E(X,V) h j (X, V) = E(X,U) g j (X, U) . (3.28) The four propositions stated in this and the previous section show that, under the same set of “control function” assumptions, average partial effects obtained by averaging X out of the ASF and those obtained by averaging the partial effects of E(Y |X, U) are generally identified. For better or worse, these partial effects are not generally the same, and there is unlikely to be concensus on which measure is “best.” The differences can be economically important. For example, in equation (3.22), the partial effect of the ASF, averaged across X, is simply β 1 . The average partial effect across ( X, U ) is β 1 + 2β 3 σXU . In principal, these need not even have the same sign, let alone similar magnitudes. (In practice, β 3 might be small because it is the coefficient on the interaction X 2 U.) The example in equation (3.22) can also be used to illustrate the main result of this section, namely, that the η j in (3.20) are identified if we have access to a suitable proxy variable, V. Assume V – which is observed or depends on parameters that we can consistently estimate – has a zero mean. Make the redundancy assumption along with a 131 linearity assumption on E(U |V ): E(U | X, V ) = E(U |V ) = θV (3.29) Then E(Y | X, V ) = β 0 + β 1 X + β 2 E(U | X, V ) + β 3 X 2 E(U | X, V ) = β 0 + β 1 X + β 2 θV + β 3 X 2 θV ≡ h( X, V ) The partial derivative of h( x, v) with respect to x is ∂h( x, v) = β 1 + 2γ3 xv ∂x where γ3 = β 3 θ. Now E(X,V ) ( β 1 + 2γ3 XV ) = β 1 + 2γ3 E( XV ) = β 1 + 2β 3 E( XθV ) and E( XθV ) = E[ XE(U |V )] = E[ E( XU | X, V )] = E( XU ) by iterated expectations. This shows that estimating E(Y | X, V ), taking the partial effect with respect to X, and then averaging across ( X, V ) is the same as starting with E(Y | X, U ) and performing the same operations. This same analysis shows that β 1 = ∂ASF ( x )/∂x is also identified by E(Y | X, V ). As mentioned earlier, there is one important case where the different definitions of the average partial effects are the same. We already showed that when U and X are independent then ASF (x) = E(Y |X = x). We now can say more. Namely, basing the average partial effect on the ASF, or first taking the partial derivative of the original structural function, give the same answer after all random variables are averaged out. 132 Proposition 5: Under the assumptions of Proposition 4, assume that U and X are independent. Then E(X,U) g j (X, U) = EX ∂ASF (X) . ∂x j Proof: By the law of iterated expectations, E(X,U) g j (X, U) = EX { E[ g j (X, U)|X]}. Now, we can write E[ g j (X, U)|X = x] = g j (x, U) Q(du|x) = ∂ ∂g (x, u) Q(du) = ∂x j g(x, u) Q(du) ∂ASF (x) = , ∂x j ∂x j where the second equality uses independence and the third uses the technical assumption of interchanging the derivative and the integral. It follows that EX { E[ g j (X, U)|X]} = EX and this completes the proof. 133 ∂ASF (X) ∂x j 3.4 EMPIRICAL EXAMPLE In this section, we illustrate how to obtain the estimates of (3.17) and (3.20) with two empirical examples using Michigan Educational Assessment Program (MEAP) math test outcome for the fourth graders. Papke and Wooldridge (2008) apply a control function approach to estimate the effects of spending on student’s performance for this test using 501 school districts from 1992 through 2001. Their dependent variable is a pass rate3 and they use foundation allowance as the instrument variable for spending. For this example, (3.17) and (3.20) are estimated as δ = β· 1 NT N T ∑∑ 1 NT 1 NT N T j s N T ∑∑ i φ(ψ + x js β + hi ξ + ρvit ) , (3.30) t and η= β· ∑∑ i φ(ψ + xit β + hi ξ + ρvit ) , (3.31) t where xit includes spendingit , freelunchit , log(enrollment)it , spendingi,1994 , time dummies and the interactions between spendingi,1994 and time dummies, and hi contains time averages of freelunchit and log(enrollment)it .4,5 vit is the residual obtained by the first step of, and β, ψ, ξ and ρ are the estimates obtained by the second step of Procedure 4.1 in Papke and Wooldridge (2008), respectively. Table 3.1 presents δ and η for spendingit , freelunchit , and log(enrollment)it . The estimates η are the same as those reported in Papke and Wooldridge (2008).6 We see that δ is more 3 In their data period, the test outcome is graded as one of Satisfactory, Moderate, or Low. The pass rate is a fraction of students rated at the Satisfactory level. 4 In Papke and Wooldridge (2008), spending is constructed as the average of real expenditures per pupil it for the recent four years in logarithmic form, freelunchit is the fraction of students who are eligible for free and reduced-price lunch program. 5 They allow the correlation between the district individual heterogeneity and the time averages of freelunchit and log(enrollment)it . 6 Table 5 in Papke and Wooldridge (2008). 134 precisely estimated than η but they are similar.7 Table 3.1: APE estimates in Papke and Wooldridge (2008) APE variables spending freelunch log(lenrollment) Scale factor η 0.583 (0.256) -0.100 (0.069) 0.096 (0.073) 0.337 δ 0.558 (0.210) -0.096 (0.065) 0.092 (0.061) 0.323 1. The standard errors, in parentheses, are obtained by using 1000 bootstrap replications. 2. Scale factor indicates the average sum of φ(·) in (3.30) and (3.31). We now calculate the two APE estimates of the empirical application in Chapter 1, which also examines how spending affects the MEAP test outcomes using the 2004 school year data with multiple fractional response variables. In this application, the dependent variable is a set of student fractions of the four levels for a district where the sum of a district’s four fractions becomes one.8 The two APE of spending on these four levels are estimated by δg = 1 N N  1 N   exp x j θxg + θvg vi ∑N ∑ j i ∑4h exp  x j θxh + θvh vi · θwg − ∑4h θwh exp x j θxh + θvh vi ∑4h exp x j θxh + θvh vi      (3.32) 7 Their difference comes from the difference between their scale factors. Considering that the scale factors are normal density functions raging from zero to 0.4, the difference could not be large unless the coefficient estimate β is huge. 8 In 2004, there were four categories. 135 , and ηg = N 1 N  ∑ i  exp xi θxg + θvg vi ∑4h exp xi θxh + θvh vi · θwg − ∑4h θwh exp xi θxh + θvh vi ∑4h exp xi θxh + θvh vi   , (3.33) where xi contains spendingi , freelunchi , log(enrollmenti ), and spending93i ,9 , vi is the OLS residual in the first step, and θ is the fmlogit estimates in the second step. Table 3.2 reports the two estimates. As in the above illustration, they are similar in general. Table 3.2: APE estimates in Chapter 1 spending v η δ spending2 v only η δ v, v3 , v3 η δ level 1 0.699 (0.223) 0.681 (0.205) 0.806 (0.239) 0.779 (0.216) 0.709 (0.248) 0.691 (0.229) level 2 0.016 (0.139) 0.017 (0.135) -0.053 (0.153) -0.044 (0.150) 0.012 (0.158) 0.014 (0.154) level 3 -0.634 (0.190) -0.619 (0.172) -0.679 (0.199) -0.667 (0.183) -0.618 (0.188) -0.602 (0.170) level 4 -0.081 (0.054) -0.079 (0.057) -0.074 (0.055) -0.068 (0.056) -0.103 (0.061) -0.103 (0.068) 1. The standard errors, in parentheses, are obtained by using 1000 bootstrap replications. 2. spending2 indicates, the model includes spending2 as well as spending. 2. v, v3 , v3 indicates that the estimation includes them. 3.5 CONCLUSION In this chapter, we have examined two types of a single APE that summarizes the APEs of observed covariates. One is obtained by calculating ASF – averaging the conditional 9 spending i = log(per pupil GF expenditure), and spending93i is spending in 1993/1994. 136 mean over the distribution of unobservables – and then averaging its partial derivatives or discrete changes across the observed covariates. The other, which is more commonly used in empirical studies, is obtained by averaging the partial effects of conditional mean over the joint distribution of observed covariates and unobservables. Through the propositions, we have shown that the two APEs are identified in general as long as there are suitable proxy variables satisfying the redundancy and the conditional independence assumptions. Furthermore, they are not generally the same unless the unobservables and observed covariates are independent. We have also illustrated how the two APEs are estimated in the empirical examples using MEAP math test outcomes. In these examples, the two types of APE estimates are similar. 137 REFERENCES 138 REFERENCES Altonji, J. G., and R. L. Matzkin. 2005. “Cross section and panel data estimators for nonseparable models with endogenous regressors.” Econometrica, 73(4): 1053–1102. Bartle, R. G. 1966. The Elements of Integration.: New York: Wiley. Blundell, R., and J. L. Powell. 2003. “Endogeneity in Nonparametric and Semiparametric Regression Models.” In Advances in Economics and Econometrics. eds. by L. P. H. Dewatripont, Mathias, and S. J. Turnovsky, 2: Cambridge: Cambridge University Press, , 1st edition 312–357. Imbens, G. W., and J. M. Wooldridge. 2009. “Recent Developments in the Econometrics of Program Evaluation.” Journal of Economic Literature, 47(1): 5–86. Nam, S. 2014. “Multiple fractional response variables with continuous endogenous explanatory variables.” June. Papke, L. E., and J. M. Wooldridge. 2008. “Panel data methods for fractional response variables with an application to test pass rates.” Journal of Econometrics, 145(1-2): 121– 133. Wooldridge, J. M. 2002. Econometric analysis of cross section and panel data.: MIT press. . 2005a. “Fixed-effects and related estimators for correlated random-coefficient and treatment-effect panel data models.” Review of Economics and Statistics, 87(2): 385– 390. Wooldridge, J. M. 2005b. “Unobserved Heterogeneity and Estimation of Average Partial Effects.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg. eds. by D. W. K. Andrews, and J. H. Stock: Cambridge University Press, , Chap. 3. . 2010. Econometric Analysis of Cross Section and Panel Data,. MIT Press Books: MIT Press, , 2nd edition. 139