Essays on discrete multivalued treatments with endogeneity and heterogeneous counterfactual errors
This dissertation comprises three chapters, each of which studies discrete multivalued treatments with endogeneity and heterogeneous counterfactual errors. The first chapter extends the study of average treatment effects (ATEs) from the extensively studied binary-treatment case to discrete multivalued treatments with both endogeneity and heterogeneous counterfactual errors, and it explores the behavior of control function (CF) and instrumental variables (IV) methods in this framework. Specifically, I offer identification strategies for the ATEs, suggest a consistent estimator for the ATEs, show the asymptotic properties of the CF parameter estimates, and derive a score test for drawing inferences about the ATEs and other parameters of interest. Moreover, using a Monte Carlo simulation analysis, I compare the CF method with the widely used IV method in terms of asymptotic efficiency, asymptotic unbiasedness, and consistency. The simulation results suggest that the CF method can be asymptotically up to 12% more efficient than the IV method, and that the asymptotic bias in the IV parameter estimates can be as high as 43%. However, when misspecification is introduced, the simulation results favor the IV method. For the empirical illustration, I apply ordinary least squares (OLS), CF, IV, and nonparametric bound analysis to estimate how limited English proficiency (LEP) influences the wages of Hispanic workers in the USA. The data come from the 1% Public Use Microdata Series of the 1990 US Census, and age at arrival serves as the instrumental variable. Both the OLS and CF methods indicate that LEP on average imposes a statistically significant wage penalty (up to 79% in some CF estimates) on the Hispanic community in the USA. The IV method mostly produces insignificant results, and the nonparametric bound analysis provides uninformative lower bounds.
The second chapter incorporates a structure of correlated random coefficients (CRCs) into the framework introduced in the first chapter. In this new setting with CRCs, however, the conventional IV method is suspected to be inconsistent for the ATEs. In this chapter, I propose a consistent CF estimation procedure for the ATEs and show the asymptotic properties of the CF parameter estimates. In addition, my Monte Carlo simulation analysis suggests that, in the absence of misspecification, the CF method is asymptotically unbiased and consistent (but not necessarily more efficient), whereas the IV method is generally asymptotically biased and inconsistent. In the presence of misspecification, the simulation results show that both the CF and IV methods produce biased estimates, with the bias larger for the CF estimates. With regard to efficiency, the simulation findings show that neither method clearly outperforms the other.
In the third chapter, I take the treatment model from the first chapter to a specific linear, high dimensional, sparse setting in which the high dimensional variables are irrelevant to treatment choice given the instruments and appear only in the outcome equation. Using a detailed simulation analysis, I examine the finite sample properties, model selection features, and prediction capabilities of several machine learning (ML) methods and of the CF method from the first chapter. To estimate the parameters of interest, I use four different ML methods: LASSO; post partial-out LASSO of Belloni et al. (2012); post double selection LASSO of Belloni, Chernozhukov, and Hansen (2014a); and double/debiased ML LASSO of Chernozhukov et al. (2018). The most important simulation result is that, in the presence of enough extra predictive variables that are ignorable in treatment selection and come from a set of high dimensional predictors of the outcome, the more complicated LASSO-based methods result in efficiency gains in the ATE estimates over the simpler CF method, although the LASSO-based methods and the CF method perform more or less the same as far as finite sample bias is concerned. As far as model selection goes, the simulations show that the double/debiased ML LASSO both selects the largest number of potential variables and correctly selects the largest number of variables with a true nonzero impact on the outcome. As to prediction, the simulation results suggest that LASSO has the best prediction features.
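
The contrast between CF and IV estimation described in the abstract can be illustrated with a small simulation. The sketch below is not the dissertation's estimator: it assumes a simplified binary treatment (rather than a multivalued one), a probit first stage, jointly normal errors, and made-up parameter values. All function names (`simulate`, `fit_probit`, `cf_ate`, `iv_ate`) are hypothetical. Under these assumptions the CF regression, which adds inverse-Mills-ratio terms for each treatment arm, should recover the ATE, while the simple IV ratio generally should not, because the counterfactual errors load differently on the selection error.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x):
    """Standard normal CDF, built from math.erf to avoid a SciPy dependency."""
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def simulate(n=50_000, seed=0):
    """Assumed design: binary treatment, true ATE = 1, heterogeneous errors."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                    # instrument
    v = rng.standard_normal(n)                    # selection error
    d = (1.0 + z + v > 0).astype(float)           # treatment choice
    u0 = 0.2 * v + 0.5 * rng.standard_normal(n)   # untreated counterfactual error
    u1 = 1.2 * v + 0.5 * rng.standard_normal(n)   # treated counterfactual error
    y = np.where(d == 1.0, 2.0 + u1, 1.0 + u0)    # observed outcome
    return y, d, z

def fit_probit(X, d, iters=25):
    """Probit maximum likelihood by Newton-Raphson, starting from zero."""
    b = np.zeros(X.shape[1])
    q = 2.0 * d - 1.0
    for _ in range(iters):
        xb = X @ b
        lam = q * norm_pdf(xb) / np.clip(norm_cdf(q * xb), 1e-12, None)
        score = X.T @ lam
        hess = -(X * (lam * (lam + xb))[:, None]).T @ X
        step = np.linalg.solve(hess, score)
        b = b - step
        if np.max(np.abs(step)) < 1e-10:
            break
    return b

def cf_ate(y, d, z):
    """Two-step CF: probit first stage, then OLS with arm-specific
    inverse-Mills-ratio terms; the coefficient on d estimates the ATE."""
    X1 = np.column_stack([np.ones_like(z), z])
    idx = X1 @ fit_probit(X1, d)
    pdf, cdf = norm_pdf(idx), norm_cdf(idx)
    lam1 = pdf / np.clip(cdf, 1e-12, None)          # E[v | d = 1, z]
    lam0 = -pdf / np.clip(1.0 - cdf, 1e-12, None)   # E[v | d = 0, z]
    W = np.column_stack([np.ones_like(z), d, d * lam1, (1.0 - d) * lam0])
    coef = np.linalg.lstsq(W, y, rcond=None)[0]
    return coef[1]

def iv_ate(y, d, z):
    """Simple IV (Wald-type) ratio with z as the single instrument."""
    return np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]

if __name__ == "__main__":
    y, d, z = simulate()
    print("CF estimate of the ATE:", cf_ate(y, d, z))
    print("IV estimate:", iv_ate(y, d, z))
```

The CF estimate here depends on the normality and probit assumptions being correct, which is consistent with the abstract's remark that the comparison can reverse once misspecification is introduced.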
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Kekec, Ibrahim
- Thesis Advisors: Wooldridge, Jeffrey M
- Committee Members: Schmidt, Peter; Kim, Kyoo il; Sung, Chih-Li
- Date Published: 2021
- Subjects: Economics
- Program of Study: Economics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xiii, 230 pages
- ISBN: 9798759956440
- Permalink: https://doi.org/doi:10.25335/g2c5-6s24