Efficient estimation with missing values in cross section and panel data

Chapter 1: Efficient Estimation with Missing Data and Endogeneity
I study the problem of missing values in both the outcome and the covariates in linear models with endogenous covariates. I propose an estimator that improves efficiency relative to a Two Stage Least Squares (2SLS) estimator based only on the complete cases. My framework also unifies the literature on missing data and combining data sets, and includes the "Two-Sample 2SLS" as a special case. The method is an extension of Abrevaya and Donald (2017), who provide methods of improving efficiency over complete cases estimators in linear models with cross-section data and missing covariates. I also provide guidance on dealing with missing values in the instruments and in commonly used nonlinear functions of the endogenous covariates, like squares and interactions, without introducing inconsistency in the estimates.

Chapter 2: Imputing Missing Covariate Values in Nonlinear Models
I study the problem of missing covariate values in nonlinear models with continuous or discrete covariates. In order to use the information in the incomplete cases, I propose an inverse probability weighted one-step imputation estimator that provides gains in efficiency relative to the complete cases estimator, using a reduced form for the outcome in terms of the always-observed covariates. Unlike the two-step imputation and dummy variable methods commonly used in empirical work, my estimator is consistent for a wide class of nonlinear models. It relies only on the commonly used "missing at random" assumption, and provides a specification test for the resulting restrictions. I show how the results apply to nonlinear models for fractional and nonnegative responses.

Chapter 3: Efficient Estimation of Linear Panel Data Models with Missing Covariates
We study the problem of missing covariates in the context of linear, unobserved effects panel data models. In order to use information on incomplete cases, we propose generalized method of moments (GMM) estimation. By using information on the incomplete cases from all time periods, the proposed estimators provide gains in efficiency relative to the fixed effects (and Mundlak) estimators that use only the complete cases. The method is an extension of Abrevaya and Donald (2017), who consider a linear model with cross-sectional data and incorporate the linear imputation method in the set of moment conditions to obtain gains in efficiency. Our first proposed estimator uses the assumption of strict exogeneity of the covariates as well as the selection, while allowing the selection to be correlated with the observed covariates and unobserved heterogeneity in both the outcome equation and the imputation equation. We also consider the case in which the covariates are only sequentially exogenous and propose an estimator based on the method of forward orthogonal deviations introduced by Arellano and Bover (1995). Our framework suggests a simple test for whether selection is correlated with unobserved shocks, both contemporaneous and those in other time periods.

In Collections: Electronic Theses & Dissertations

Copyright Status: In Copyright

Material Type: Theses

Authors: Rai, Bhavna

Thesis Advisors: Wooldridge, Jeffrey

Committee Members: Schmidt, Peter
Elder, Todd
Caputo, Vincenzina

Date Published: 2021

Subjects: Economics--Methodology
Structural equation modeling
Regression analysis

Program of Study: Economics - Doctor of Philosophy

Degree Level: Doctoral

Language: English

Pages: viii, 130 pages

ISBN: 9798535565194

Permalink: https://doi.org/doi:10.25335/m8k6-jp80
