ESTIMATING NONLINEAR CROSS SECTION AND PANEL DATA MODELS WITH
ENDOGENEITY AND HETEROGENEITY
by
Hoa Bao Nguyen

A DISSERTATION
Submitted
to Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
ECONOMICS
2011

ABSTRACT
ESTIMATING NONLINEAR CROSS SECTION AND PANEL DATA MODELS WITH
ENDOGENEITY AND HETEROGENEITY
by
Hoa Bao Nguyen
The dissertation consists of three chapters that consider the estimation of nonlinear cross section and panel data models. This study contributes to the literature by developing new estimation methods for models with limited dependent variables and endogenous regressors in the presence of unobserved heterogeneity. It also contributes to the field of labor economics by applying the new estimators to the study of female labor supply.
In the first chapter, a fractional response model with a count endogenous regressor is considered. A new estimation method is proposed to handle discrete endogeneity in a nonlinear setting with unobserved heterogeneity. Two-step Quasi-Maximum Likelihood and Nonlinear Least Squares estimators using adaptive Gauss-Hermite quadrature are proposed. Average partial effects for discrete endogenous variables are obtained despite the difficulty of approximating a conditional mean that has no closed form and involves non-normal heterogeneity. Monte Carlo simulations show that the new estimators are the least biased and the most efficient among the estimators examined, which include existing estimators. This is the first research that supports the necessity and significance of count endogeneity. The proposed estimators are applied to analyze the US female labor supply. The results show diminishing marginal effects of additional children on women's working hours. This novel finding is consistent with a story of fertility and presents evidence of economies of scale: mothers become more efficient after raising their first children, devote more time to work, and balance working time and family time.
In the second chapter, a dynamic Tobit panel data model that allows for an endogenous regressor (besides the lagged dependent variable) is developed. I also permit the presence of unobserved heterogeneity and serial correlation of transitory shocks. A correlated random effects Tobit approach, a computationally attractive estimation method, is proposed. The estimation method employs the control function approach to account for endogeneity and to consistently estimate average partial effects. In addition, serial correlation in the reduced form is corrected, which makes the estimator more robust. The method is applied to Panel Study of Income Dynamics data from 1980 to 1992. I find strong evidence of persistence in the working hours of US white women, and the initial condition of female labor supply is statistically significant.
The third chapter considers the estimation of a panel data model with a corner solution response, a dummy endogenous variable, and heterogeneity. The main contribution is to allow a joint distribution of the binary endogenous regressor and the unobserved factors that affect both the amount and participation equations. A bivariate probit model is suggested in the first stage. An exponential type II Tobit (ET2T) model is used for the amount equation to ensure that the predicted value of the response variable is positive and to allow correlation between the unobserved effects in the amount and participation equations. A two-step estimation procedure, inspired by Heckman's idea of adding correction terms for endogenous switching and a corner solution outcome, is used to analyze the impact of fertility on female labor force participation and labor supply using the Vietnamese Household Living Standard Surveys data for 2004-2008. The proposed approach gives a statistically significant negative effect of having a newborn on women who are working and remain in the labor market. It substantially corrects the bias in estimating the effect of a newborn on mothers' working hours compared with alternative estimation methods.

Copyright by
Hoa Bao Nguyen
2011

This thesis is dedicated to my family, my husband, Minh Cong Nguyen, and my son, Ton Chi Cong
Nguyen.


ACKNOWLEDGEMENTS

I would like to take this opportunity to thank the people who have helped me during the journey to a Ph.D. First, I would like to express my deepest gratitude to my advisor, Professor Jeffrey Wooldridge, a person of great knowledge and an exceptional teacher, for his generous advice, support and excellent training during my work on this dissertation.
I would also like to thank my other committee members, Professors Peter Schmidt, Todd Elder and Joseph Gardiner, for their valuable comments and support.
I am very grateful for the support that I received from people at the World Bank, who kindly encouraged me to apply and develop more econometric models and estimation methods for nonlinear panel data with discrete endogenous variables, to be used as robust tools in useful applications.
I wish to thank the faculty members and graduate students of the Department of Economics at Michigan State University for their useful training in many core branches of economics and for seminar discussions of econometric topics.
I would like to express my warmest gratitude to my parents and my husband. I am indebted to my father because he motivated me to be a scientist and always encouraged as well as challenged me to make great accomplishments. I am grateful to my mother and my parents-in-law, who have been there for me and made the time for me to focus on my dissertation. I was fortunate to have a beloved husband who has helped me continuously and tremendously during my doctorate. Without his love and support, I would never have made it through. I also want to thank my sister and other members of my extended family. Last but not least, I wish to thank my baby, Tony, since having him during my graduate studies made me recognize many values of life and created my unstoppable determination to obtain the PhD and other future accomplishments. I dedicate this dissertation to my big family.
I would like to thank everyone whom I did not mention specifically but who helped me during my studies.


TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1 ESTIMATING A FRACTIONAL RESPONSE MODEL WITH A COUNT ENDOGENOUS REGRESSOR
1.1 Introduction
1.2 Theoretical Model - Specification and Estimation
  1.2.1 Estimation Procedure
  1.2.2 Average Partial Effects
    1.2.2.1 The Case with Exogenous Covariates and a Normally Distributed Heterogeneity
    1.2.2.2 The Case with a Count Endogenous Covariate and a Non-normally Distributed Heterogeneity
1.3 Monte Carlo Simulations
  1.3.1 Estimators
  1.3.2 Data Generating Process
  1.3.3 Experiment Results
    1.3.3.1 Simulation Result with a Strong Instrumental Variable
    1.3.3.2 Simulation Result with a Weak Instrumental Variable
    1.3.3.3 Simulation Result with Different Sample Sizes
    1.3.3.4 Simulation Result with a Misspecified Distribution
  1.3.4 Conclusion from the Monte Carlo Simulations
1.4 Application and Estimation Results
1.5 Conclusion

CHAPTER 2 ESTIMATION OF A DYNAMIC TOBIT PANEL DATA WITH AN ENDOGENOUS VARIABLE AND AN APPLICATION TO FEMALE LABOR SUPPLY
2.1 Introduction
2.2 Literature Review
2.3 Model
2.4 Average Partial Effects
2.5 Serial Correlation Correction
  2.5.1 Estimation Procedure
  2.5.2 Average Partial Effects
  2.5.3 Comparison
2.6 Empirical Example
  2.6.1 Data
  2.6.2 Estimation and Result
2.7 Conclusion

CHAPTER 3 AN EXPONENTIAL TYPE II TOBIT PANEL DATA MODEL WITH BINARY ENDOGENOUS REGRESSOR - APPLICATION TO ESTIMATING THE EFFECT OF FERTILITY ON MOTHERS' LABOR FORCE PARTICIPATION AND LABOR SUPPLY
3.1 Introduction
3.2 Literature Review
3.3 Model and Estimation
3.4 Average Partial Effect
3.5 Empirical Example
  3.5.1 Overview of Data
  3.5.2 Estimation and Result
3.6 Conclusion

APPENDIX A TABLES FOR CHAPTER 1

APPENDIX B TABLES AND FIGURES FOR CHAPTER 2

APPENDIX C TABLES FOR CHAPTER 3

APPENDIX D TECHNICALITIES FOR CHAPTER 1
D.1 Details of the QML Estimator
  D.1.1 Asymptotic Variance for the Two-step Estimator
  D.1.2 Asymptotic Variance for the APEs
D.2 Details of the Tobit Model's Estimators
D.3 Formula of the NLS estimation
D.4 Derivation of the Heterogeneity Distribution

APPENDIX E TECHNICALITIES FOR CHAPTER 2
E.1 Asymptotic Variance of the Two-step Estimator
E.2 Asymptotic Variance of the Average Partial Effects

APPENDIX F TECHNICALITIES FOR CHAPTER 3
F.1 Bivariate Probit Model in the First Stage
F.2 Asymptotic Variance of the Two-step Estimator

BIBLIOGRAPHY

LIST OF TABLES

A.1 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)
A.2 Simulation Result of the Coefficient Estimates (N=1000, η1 = 0.5, 500 replications)
A.3 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.1, 500 replications)
A.4 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.9, 500 replications)
A.5 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications, δ23 = 0.3)
A.6 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications, δ23 = 0)
A.7 Simulation Result of the Average Partial Effects Estimates (N=100, η1 = 0.5, 500 replications)
A.8 Simulation Result of the Average Partial Effects Estimates (N=500, η1 = 0.5, 500 replications)
A.9 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)
A.10 Simulation Result of the Average Partial Effects Estimates (N=2000, η1 = 0.5, 500 replications)
A.11 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, a1 is normally distributed, 500 replications)
A.12 Simulation Result of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)
A.13 Comparison of analytical and bootstrapping mean of standard errors (N=1000, η1 = 0.5, 200 replications)
A.14 Frequencies of the Number of Children
A.15 Descriptive Statistics
A.16 First-stage Estimates using Instrumental Variables
A.17 Estimates Assuming Number of Kids is Conditionally Exogenous
A.18 Estimates Assuming Number of Kids is Endogenous
B.1 Summary Statistics
B.2 Determinants of Female Working Experience - First stage regressions
B.3 Estimating Dynamic Female Labor Supply, Second Stage Regressions, Experience is Treated as an Endogenous Variable
B.4 Average Partial Effects on Female Labor Supply
C.1 Summary Statistics for the Whole Sample
C.2 Summary Statistics for Each Year in the Panel
C.3 Bivariate Probit Estimates of Fertility and LFP in the First Stage
C.4 Estimates for Log(Female Working Hours) Equation

LIST OF FIGURES

B.1 Distribution of Women's Annual Hours of Work in 1980-1992
B.2 Hours of Work vs. Experience
B.3 Hours of Work vs. Number of Children 0-2
B.4 Hours of Work vs. Number of Children 3-5
B.5 Hours of Work vs. Number of Children 6-17

Chapter 1
ESTIMATING A FRACTIONAL RESPONSE MODEL WITH A COUNT ENDOGENOUS
REGRESSOR

1.1 Introduction
Many economic models employ a fraction or a percentage, instead of level values, as a dependent variable. In these models, the economic variables of interest are fractions, such as employee participation rates in 401(k) pension plans, firm market shares, and fractions of total weekly hours spent working. These fractional response variables take values in the unit interval [0,1] and have both continuous and discrete characteristics. As suggested in Papke & Wooldridge (1996, 2008), we can model fractional response variables based on a correctly specified conditional mean and use a simple quasi-maximum likelihood estimator (QMLE) with the Bernoulli distribution or nonlinear least squares (NLS). This method is more attractive than other standard approaches, such as the MLE with a beta distribution or the log-odds transformation, because it gives a direct estimate of the conditional mean of the original dependent variable and ensures that the predicted value is in the unit interval.
Fractional response models (FRMs) with continuous or binary endogenous variables have been studied (see Papke & Wooldridge (2008) and Wooldridge (2010)). However, there has not been a well-developed estimation method and procedure to deal with count endogeneity in FRMs. Traditionally, a count endogenous explanatory variable (CEEV) is treated as a continuous endogenous variable and written as a linear function of covariates, including instruments, plus an additive error. A common approach such as two stage least squares (2SLS), which uses this linear approximation, always gives a constant marginal effect. This approach ignores the fact that the marginal effect of one more unit of the CEEV on the outcome of interest might be larger or smaller than the marginal effect of the previous unit. To acknowledge this fact, we should study FRMs and count endogeneity with a nonlinear approximation in the first stage. Specifically, we can handle a CEEV by assuming a Poisson distribution for the count variable in the reduced form. The heterogeneity term in the Poisson model is assumed to be correlated with the error term in the structural conditional mean. It is standard to let the heterogeneity follow a gamma distribution (which leads to a gamma error in the reduced form) because it results in a closed form solution, the Negative Binomial (NB) model. The key to correcting for the endogeneity problem in this case is the assumption we are willing to make on the joint distribution of the errors. One strategy is to allow a linear correlation between a transformation of the gamma error, which is then normally distributed, and the error in the structural conditional mean (see further discussion in Weiss (1999)). However, this assumption does not allow a direct relationship between the two errors, which is what governs the endogeneity problem. The transformation function involves the inverse of the standard normal cdf and depends on unknown parameters in the distribution of the heterogeneity in the Poisson model. This strategy can be used to test for endogeneity of the count explanatory variable, but it makes the evaluation of the likelihood function more computationally demanding, and obtaining the conditional maximum likelihood estimator as well as its asymptotic covariance matrix is nontrivial and time-consuming. An alternative strategy is to keep the gamma heterogeneity in the Poisson model and allow a linear, direct correlation between this heterogeneity and the error term in the structural conditional mean. We then need to integrate this heterogeneity out of the structural conditional mean. As discussed in Winkelmann (2000), the heterogeneity in a Poisson model can be represented as an additive correlated error or a multiplicative correlated error. However, the multiplicative correlated error has some advantage over the additive correlated error on grounds of consistency. As a result, a multiplicative correlated error is used in this model.
Because nonlinearity is allowed in both the reduced form and the main structural equation, a two-step estimation procedure is more attractive. In a simultaneous nonlinear equations model with a count dependent variable and a binary endogenous variable, Terza (1998) proposed a two-stage estimation method using a joint normal distribution of the error terms. He did not carry out estimation by Poisson Full-Information Maximum Likelihood (FIML) or explore its properties, even though this approach was introduced in his paper, because of the computational burden of the FIML estimator. This emphasizes the advantage of the two-step estimation procedure that we can employ in this model. However, the joint normal distribution of the error terms is no longer appropriate in this case because a normal error term in the Poisson model does not lead to a closed form solution. This is why the assumptions on the error terms discussed above are adopted. Besides being computationally easier, the two-step estimation procedure ensures that the predicted value lies in the admissible range. Moreover, we do not need to find a conditional probability for each value of the CEEV without knowing in advance the specific values that a general count explanatory variable takes, and we avoid computing a conditional MLE, which would be very difficult.
Other estimation methods for a FRM with a CEEV can be considered. For example, semiparametric and nonparametric methods can be used (see Das (2005)). However, this approach does not give estimates of the partial effects or the average partial effects of interest. If we are interested in estimating both parameters and average partial effects (APEs), a parametric approach is preferred. In addition, in a nonlinear model the quantity of interest is the APE, which is comparable to a linear model's estimate. Therefore, it is necessary and useful for applied economists and practitioners to use the parametric model and obtain the APE.
In this chapter, I show how to specify and estimate FRMs with a CEEV and unobserved heterogeneity. Following Papke & Wooldridge (1996, 2008), I use models for the conditional mean of the fractional response in which the fitted value is always in the unit interval. I focus on the probit response function since the probit mean function is less computationally demanding when obtaining the average partial effects. I suggest a new estimation method to handle discrete endogeneity in a nonlinear setting with unobserved heterogeneity. Two-step Quasi-Maximum Likelihood and Nonlinear Least Squares estimators using adaptive Gauss-Hermite quadrature are proposed. Average partial effects for discrete endogenous variables are obtained despite the difficulty of approximating a conditional mean that has no closed form and involves non-normal heterogeneity. Monte Carlo simulations show that the new estimators are the least biased and the most efficient among the estimators examined, which include existing estimators. I apply these robust and efficient estimators to analyze the US female labor supply. The empirical results support the necessity and significance of accounting for count endogeneity, and this is the first research to do so.
This chapter is organized as follows. Section 2 introduces the specification and estimation of a FRM with a CEEV and shows how to estimate the parameters and the average partial effects using the two-step QMLE and NLS approaches. Section 3 presents Monte Carlo simulations, and Section 4 presents an application to the fraction of total weekly hours that women spend working. Section 5 concludes.

1.2 Theoretical Model - Specification and Estimation
For a $1 \times K$ vector of explanatory variables $z_1$, the conditional mean model is expressed as follows:
$$E(y_1 \mid y_2, z, a_1) = \Phi(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1), \tag{1.1}$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function (cdf), $y_1$ is a response variable ($0 \le y_1 \le 1$), and $a_1$ is a heterogeneous component or omitted factor assumed to be correlated with $y_2$ but independent of the exogenous variables $z$. In equation (1.1), I focus on the fractional probit conditional mean because it gives a computationally simple estimator when we deal with unobserved heterogeneity and endogenous regressors, as well as a convenient way to obtain average partial effects later on. The exogenous variables are $z = (z_1, z_2)$, where the exogenous variables $z_2$ must be excluded from (1.1). Here $z$ is a $1 \times L$ vector with $L > K$, and $z_2$ is a vector of instruments. $y_2$ is a count endogenous variable, and we assume that it has a Poisson distribution:
$$y_2 \mid z, a_1 \sim \text{Poisson}[\exp(z\delta_2 + a_1)], \tag{1.2}$$

then the conditional density of $y_2$ is specified as:
$$f(y_2 \mid z, a_1) = \frac{[\exp(z\delta_2 + a_1)]^{y_2} \exp[-\exp(z\delta_2 + a_1)]}{y_2!}, \tag{1.3}$$

where $a_1$ is assumed to be independent of $z$, and $\exp(a_1)$ is distributed as Gamma$(\delta_0, 1/\delta_0)$ using a single parameter $\delta_0$, with $E(\exp(a_1)) = 1$ and $\mathrm{Var}(\exp(a_1)) = 1/\delta_0$.
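The structure in (1.1) to (1.3) can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the parameter values, the single regressor and single instrument, the function name `simulate_frm`, and the choice to generate $y_1$ as a binomial fraction with conditional mean (1.1) are all illustrative and are not taken from the data generating process used later in this chapter.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal cdf, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))

def simulate_frm(n, alpha1, delta1, eta1, delta2, delta0, n_trials=100, seed=0):
    """Draw (y1, y2, z1, z2, a1) from an illustrative version of (1.1)-(1.3).

    Hypothetical setup: one included regressor z1, one excluded instrument z2.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n)
    z2 = rng.normal(size=n)
    # exp(a1) ~ Gamma(shape=delta0, scale=1/delta0): mean 1, variance 1/delta0.
    exp_a1 = rng.gamma(shape=delta0, scale=1.0 / delta0, size=n)
    a1 = np.log(exp_a1)
    # Reduced form (1.2): y2 | z, a1 ~ Poisson[exp(z delta2 + a1)].
    lam = np.exp(delta2[0] + delta2[1] * z1 + delta2[2] * z2) * exp_a1
    y2 = rng.poisson(lam)
    # Structural conditional mean (1.1).
    mean1 = norm_cdf(alpha1 * y2 + delta1[0] + delta1[1] * z1 + eta1 * a1)
    # Assumption: y1 is a fraction of successes, so E(y1 | y2, z, a1) = mean1.
    y1 = rng.binomial(n_trials, mean1) / n_trials
    return y1, y2, z1, z2, a1

y1, y2, z1, z2, a1 = simulate_frm(
    20000, alpha1=-0.2, delta1=(0.1, 0.3), eta1=0.5,
    delta2=(0.0, 0.2, 0.3), delta0=2.0,
)
```

Because $a_1$ enters both the Poisson mean and the probit index, $y_2$ is correlated with the omitted factor in (1.1) whenever $\eta_1 \neq 0$.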

The presence of $a_1$ in both equations (1.1) and (1.2) is what makes $y_2$ potentially endogenous in the equation of interest, (1.1). To illustrate this point, we could use, for example, $u_2$ instead of $a_1$ in the reduced form and $u_1$ instead of $\eta_1 a_1$ in the structural conditional mean, and assume a linear relationship: $u_1 = \eta_1 u_2 + e_1$. Substituting the right-hand side of this relationship into the structural conditional mean and then integrating out $e_1$, which multiplies all the coefficients by the scale factor $1/\sqrt{1 + \sigma_e^2}$, we see $\eta_1 u_2$ and $u_2$ appear in place of $\eta_1 a_1$ and $a_1$ in equations (1.1) and (1.2), respectively. Hence, rather than using $u_1$ and $u_2$, we simply use $a_1$ and $\eta_1 a_1$ in the reduced form and the structural conditional mean, respectively, to govern the endogeneity of $y_2$.
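The rescaling argument can be checked numerically. The sketch below assumes, as the scale factor implies, that $e_1$ is normal with mean zero and variance $\sigma_e^2$ and is independent of everything else in the index; the index value, $\sigma_e$, the number of draws and the function names are hypothetical choices for illustration only.

```python
import math
import random

def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def averaged_probit(index, sigma_e, draws=200_000, seed=1):
    """Monte Carlo estimate of E[Phi(index + e1)] with e1 ~ N(0, sigma_e^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += norm_cdf(index + rng.gauss(0.0, sigma_e))
    return total / draws

# Hypothetical values: 'index' stands for alpha1*y2 + z1*delta1 + eta1*u2.
index, sigma_e = 0.4, 0.8
simulated = averaged_probit(index, sigma_e)
# Integrating out e1 rescales the whole index by 1/sqrt(1 + sigma_e^2).
closed_form = norm_cdf(index / math.sqrt(1.0 + sigma_e ** 2))
```

Up to simulation noise, `simulated` and `closed_form` agree, which is why $e_1$ can be dropped once the coefficients are understood as scaled coefficients.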
After a transformation (see Appendix D.4 for the derivation), the distribution of $a_1$ is derived as follows:
$$f(a_1; \delta_0) = \frac{\delta_0^{\delta_0} [\exp(a_1)]^{\delta_0} \exp(-\delta_0 \exp(a_1))}{\Gamma(\delta_0)}. \tag{1.4}$$
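As a quick sanity check on (1.4), the density can be integrated numerically. This sketch uses a plain trapezoid rule on a wide grid; the value $\delta_0 = 2$, the integration limits and the function names are arbitrary illustrative choices.

```python
import math

def log_gamma_density(a1, delta0):
    """Density (1.4) of a1 when exp(a1) ~ Gamma(shape=delta0, scale=1/delta0)."""
    log_f = (delta0 * math.log(delta0) + delta0 * a1
             - delta0 * math.exp(a1) - math.lgamma(delta0))
    return math.exp(log_f)

def trapezoid(func, lower, upper, steps=20_000):
    """Composite trapezoid rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h

delta0 = 2.0  # illustrative value only
total_mass = trapezoid(lambda a: log_gamma_density(a, delta0), -30.0, 5.0)
mean_exp_a1 = trapezoid(
    lambda a: math.exp(a) * log_gamma_density(a, delta0), -30.0, 5.0)
var_exp_a1 = trapezoid(
    lambda a: (math.exp(a) - 1.0) ** 2 * log_gamma_density(a, delta0), -30.0, 5.0)
```

The three numbers should be close to 1, 1 and $1/\delta_0$, matching the normalization $E(\exp(a_1)) = 1$ and $\mathrm{Var}(\exp(a_1)) = 1/\delta_0$ stated above.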

In order to get the conditional mean $E(y_1 \mid y_2, z)$, I specify the conditional density function of $a_1$. Using Bayes' rule, it is:
$$f(a_1 \mid y_2, z) = \frac{f(y_2 \mid a_1, z)\, f(a_1 \mid z)}{f(y_2 \mid z)}.$$

Since $y_2 \mid z, a_1$ has a Poisson distribution and $\exp(a_1)$ has a gamma distribution, $y_2 \mid z$ is Negative Binomial II distributed, a standard result (see the Poisson and Negative Binomial II models in Cameron & Trivedi (1986) and the derivation of this result in equations (D.1) to (D.3) in Appendix D).
After some algebra, the conditional density function of $a_1$ is:
$$f(a_1 \mid y_2, z) = \frac{\exp[P]\,[\delta_0 + \exp(z\delta_2)]^{(y_2 + \delta_0)}}{\Gamma(y_2 + \delta_0)}, \tag{1.5}$$
where $P = -\exp(z\delta_2 + a_1) + a_1(y_2 + \delta_0) - \delta_0 \exp(a_1)$.
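Density (1.5) says that, conditional on $(y_2, z)$, $\exp(a_1)$ is again gamma distributed, with shape $y_2 + \delta_0$ and rate $\delta_0 + \exp(z\delta_2)$. The sketch below checks this numerically for one hypothetical observation; the values of $y_2$, $z\delta_2$ and $\delta_0$ and the function names are illustrative assumptions.

```python
import math

def posterior_density(a1, y2, lin_index2, delta0):
    """Density (1.5) of a1 given (y2, z), where lin_index2 stands for z*delta2."""
    lam = math.exp(lin_index2)
    # P = -exp(z*delta2 + a1) + a1*(y2 + delta0) - delta0*exp(a1)
    p = -lam * math.exp(a1) + a1 * (y2 + delta0) - delta0 * math.exp(a1)
    log_f = p + (y2 + delta0) * math.log(delta0 + lam) - math.lgamma(y2 + delta0)
    return math.exp(log_f)

def trapezoid(func, lower, upper, steps=20_000):
    """Composite trapezoid rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h

# Hypothetical observation and first-step parameter values.
y2, lin_index2, delta0 = 3, 0.5, 2.0
mass = trapezoid(lambda a: posterior_density(a, y2, lin_index2, delta0), -30.0, 5.0)
post_mean = trapezoid(
    lambda a: math.exp(a) * posterior_density(a, y2, lin_index2, delta0), -30.0, 5.0)
# Mean of a gamma variable with shape y2 + delta0 and rate delta0 + exp(z*delta2).
gamma_mean = (y2 + delta0) / (delta0 + math.exp(lin_index2))
```

Here `mass` should be close to one and `post_mean` close to `gamma_mean`.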
The conditional mean $E(y_1 \mid y_2, z)$ is therefore obtained as:
$$E(y_1 \mid y_2, z) = \int_{-\infty}^{+\infty} \Phi(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1)\, f(a_1 \mid y_2, z)\, da_1 = \mu(\theta; y_2, z), \tag{1.6}$$
where $f(a_1 \mid y_2, z)$ is given in (1.5) and $\theta = (\alpha_1, \delta_1, \eta_1)$.
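To see what (1.6) involves computationally, the integral can be evaluated by brute force for a single observation. This is a sketch only: it uses a trapezoid rule rather than the quadrature routine adopted in this chapter, and all parameter values and function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_density(a1, y2, lin_index2, delta0):
    """Density (1.5); lin_index2 stands for z*delta2."""
    lam = math.exp(lin_index2)
    p = -lam * math.exp(a1) + a1 * (y2 + delta0) - delta0 * math.exp(a1)
    return math.exp(p + (y2 + delta0) * math.log(delta0 + lam)
                    - math.lgamma(y2 + delta0))

def conditional_mean(alpha1, lin_index1, eta1, y2, lin_index2, delta0,
                     lower=-30.0, upper=5.0, steps=20_000):
    """Trapezoid approximation of mu(theta; y2, z) in (1.6).

    lin_index1 stands for z1*delta1 (hypothetical shorthand).
    """
    def integrand(a1):
        return (norm_cdf(alpha1 * y2 + lin_index1 + eta1 * a1)
                * posterior_density(a1, y2, lin_index2, delta0))
    h = (upper - lower) / steps
    total = 0.5 * (integrand(lower) + integrand(upper))
    for k in range(1, steps):
        total += integrand(lower + k * h)
    return total * h

# With eta1 = 0 the heterogeneity drops out of the probit index.
mu_exog = conditional_mean(-0.2, 0.1, 0.0, y2=3, lin_index2=0.5, delta0=2.0)
# With eta1 != 0 the mean depends on the whole conditional density of a1.
mu_endog = conditional_mean(-0.2, 0.1, 0.5, y2=3, lin_index2=0.5, delta0=2.0)
```

When $\eta_1 = 0$ the result collapses to $\Phi(\alpha_1 y_2 + z_1\delta_1)$, which gives a simple check on the integration.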

The key to obtaining the conditional mean of interest is the conditional density function of $a_1$. Therefore, we need to assume a distribution for $a_1$ and specify $f(a_1 \mid y_2, z)$ as above. For estimation purposes, it is necessary to compute $f(a_1 \mid y_2, z)$ in (1.5) based on the parameters in the reduced form (1.2). These parameters can be estimated using a Negative Binomial II regression and can henceforth be viewed as first-step estimated parameters. In the second step, we are interested in estimating the conditional mean parameters, $\theta$, in the FRM.
For FRMs, we can consider a beta distribution or a log-odds transformation of the fractional dependent variable. However, Wooldridge (2010) shows that these two approaches have some drawbacks. First, they rule out the case in which the fractional response variable has some pileup at zero and/or one. Second, specifying a beta distribution is not robust and produces inconsistent estimators if any aspect of the distribution is misspecified. Third, the log-odds approach does not give a direct estimate of the conditional mean, which is of interest, since it offers only an estimate for the transformed dependent variable (see more discussion in Papke & Wooldridge (1996)). Therefore, for a dependent variable that has mass points at 0 and/or 1 and is continuous in (0,1), we can focus on estimating the conditional mean of the fractional response (as stated in equation (1.1)), which keeps the predicted value in the unit interval, and obtain robust estimators using the QMLE/NLS under a correctly specified conditional mean function. See Papke & Wooldridge (1996, 2008) for further details.
Given the fractional probit conditional mean model in equation (1.1), there are many ways to estimate $\theta$ consistently. One possibility is to adopt the NLS estimator. This estimator is consistent and $\sqrt{N}$-asymptotically normal. However, it is unlikely to be asymptotically efficient because homoskedasticity is unlikely to hold for $y_1$, even if we ignore the conditional Poisson distribution for $y_2$. It might also be computationally intensive to obtain the weighting matrix for the NLS estimator. Hence, we can use a simpler, robust and efficient estimator, the quasi-maximum likelihood estimator (QMLE).
One can consider the QMLE using the Bernoulli distribution or the Poisson distribution of $y_1$. The QMLE is simple and strongly consistent even if the true distribution of $y_1$ is not Bernoulli, provided the first moment is correctly specified. There are other reasons that make the Bernoulli QMLE more attractive. First, maximizing the Bernoulli log likelihood is easy. Second, the Bernoulli distribution is a member of the linear exponential family (LEF) and does not impose the restrictions that other distributions do (see further discussion in Papke & Wooldridge (1996)). Moreover, it has some advantages over the Poisson distribution. For example, it is consistent with the nature of a fractional response variable, which has both continuous and discrete characteristics. The Poisson distribution is consistent with a non-negative response variable but does not take into account mass points at 0 and/or 1. In addition, even though the Poisson distribution is a member of the LEF, it is chosen when we want the variance to be proportional to the mean, which is not realistic for a fractional response variable: it is unlikely that the variance is monotonically increasing in the mean. Another attraction of the Bernoulli QMLE is that it is efficient in a class of estimators containing all QMLEs in the LEF as long as the conditional mean is correctly specified and the variance assumption holds. The assumption that the variance associated with the quasi-log likelihood in equation (1.6) is the Bernoulli generalized linear model (GLM) variance will hold if the number of Bernoulli draws is independent of $z_i$. This assumption holds in the empirical example of this chapter. However, in other applications there is no guarantee that it holds, and it is recommended to obtain fully robust sandwich standard errors (see more discussion in Papke & Wooldridge (1996) and Wooldridge (2010, Section 18.6)).
Therefore, in what follows, we use the QMLE or NLS with the Bernoulli quasi-log likelihood
function to estimate θ of equation (1.6) in the second step.
The Bernoulli quasi-log likelihood function is given by:
$$l_i(\theta) = y_{1i} \ln \mu_i + (1 - y_{1i}) \ln(1 - \mu_i). \tag{1.7}$$
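A short numerical illustration of (1.7) may help show why it is usable with a fractional $y_1$: for any $y_1$ in the unit interval, the function is maximized over $\mu$ at $\mu = y_1$. The value $y_1 = 0.3$, the grid and the function name below are illustrative assumptions.

```python
import math

def bernoulli_quasi_loglik(y1, mu):
    """Quasi-log likelihood (1.7) for one observation; y1 in [0, 1], mu in (0, 1)."""
    return y1 * math.log(mu) + (1.0 - y1) * math.log(1.0 - mu)

# A fractional outcome, not a binary one.
y1 = 0.3
# Search a grid of candidate means strictly inside the unit interval.
grid = [k / 1000 for k in range(1, 1000)]
best_mu = max(grid, key=lambda mu: bernoulli_quasi_loglik(y1, mu))
```

The maximizer `best_mu` coincides with `y1`, which is the single-observation counterpart of the result that the Bernoulli QMLE identifies a correctly specified conditional mean.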

The QMLE of $\theta$ in the second step is obtained from the maximization problem (see more details in Appendix D.1):
$$\max_{\theta \in \Theta} \sum_{i=1}^{N} l_i(\theta). \tag{1.8}$$

The NLS estimator of $\theta$ in the second step is obtained from the minimization problem (see more details in Appendix D.3):
$$\min_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} [y_{1i} - \mu_i(\theta; y_{2i}, z_i)]^2 / 2. \tag{1.9}$$

After obtaining the estimated parameters from the first step and approximating the conditional mean (the detailed approximation procedure is discussed below), we estimate $\theta$ using the QMLE and NLS estimators described in the above maximization and minimization problems. These estimators are two-step M-estimators, which are consistent and asymptotically normal (see further discussion of these estimators in Newey & McFadden (1994) and Wooldridge (2002, Chapter 12)).
Since μi = E(y1 |y2 , z) does not have a closed-form solution, it is necessary to use a numerical
approximation. The numerical routine for integrating out the unobserved heterogeneity in the conditional mean equation (1.6) is based on the Adaptive Gauss-Hermite quadrature. This adaptive
approximation has proven to be more accurate with fewer points than the ordinary Gauss-Hermite
approximation. The quadrature locations are shifted and scaled to be under the peak of the integrand. Therefore, the adaptive quadrature performs well with an adequate number of points
(see more in Skrondal & Sophia (2004)).
Using the Adaptive Gauss-Hermite approximation, the integral in (1.6) can be approximated as:
\[
\mu_i = \int_{-\infty}^{+\infty} h_i(y_{2i}, z_i, a_1)\,da_1 \approx \sqrt{2}\,\sigma_i \sum_{m=1}^{M} w_m^{*} \exp\{(a_m^{*})^2\}\, h_i(y_{2i}, z_i, \sqrt{2}\,\sigma_i a_m^{*} + \omega_i), \tag{1.10}
\]
where $\sigma_i$ and $\omega_i$ are the adaptive parameters for observation i, $w_m^{*}$ are the weights and $a_m^{*}$ are the
evaluation points, and M is the number of quadrature points. The approximation procedure follows
Skrondal & Sophia (2004). The adaptive parameters $\sigma_i$ and $\omega_i$ are updated in the kth iteration of
the optimization for $\mu_i$ with:

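\[
\mu_{i,k} \approx \sum_{m=1}^{M} \sqrt{2}\,\hat{\sigma}_{i,k-1}\, w_m^{*} \exp\{(a_m^{*})^2\}\, h_i(y_{2i}, z_i, \sqrt{2}\,\hat{\sigma}_{i,k-1} a_m^{*} + \hat{\omega}_{i,k-1}),
\]
\[
\hat{\omega}_{i,k} = \sum_{m=1}^{M} (\tau_{i,m,k-1})\, \frac{\sqrt{2}\,\hat{\sigma}_{i,k-1}\, w_m^{*} \exp\{(a_m^{*})^2\}\, h_i(y_{2i}, z_i, \tau_{i,m,k-1})}{\mu_{i,k}},
\]
\[
\hat{\sigma}_{i,k} = \sum_{m=1}^{M} (\tau_{i,m,k-1})^2\, \frac{\sqrt{2}\,\hat{\sigma}_{i,k-1}\, w_m^{*} \exp\{(a_m^{*})^2\}\, h_i(y_{2i}, z_i, \tau_{i,m,k-1})}{\mu_{i,k}} - (\hat{\omega}_{i,k})^2,
\]
where
\[
\tau_{i,m,k-1} = \sqrt{2}\,\hat{\sigma}_{i,k-1} a_m^{*} + \hat{\omega}_{i,k-1}.
\]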

This process is repeated until $\hat{\sigma}_{i,k}$ and $\hat{\omega}_{i,k}$ have converged for this iteration at observation i of
the maximization algorithm. This adaptation is applied to every iteration until the log-likelihood
difference from the last iteration is less than a relative difference of 1e−5; after this adaptation, the
adaptive parameters are fixed.
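The adaptive updating scheme can be illustrated with a short NumPy sketch. It is not the routine used in this chapter: the function name, the starting values (0, 1), the stopping rule and the iteration cap are assumptions made for illustration. The sketch also takes the square root of the second-moment expression so that the scale parameter is on the standard-deviation scale, which is a choice of the sketch rather than something stated in the text.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def adaptive_gauss_hermite(h, M=8, max_iter=50, tol=1e-10):
    """Approximate the integral of h(a) over the real line.

    h must accept a NumPy array and return non-negative values.
    Returns (integral, location, scale) after adaptation.
    """
    a, w = hermgauss(M)          # nodes/weights for the weight function exp(-a^2)
    omega, sigma = 0.0, 1.0      # assumed starting location and scale
    for _ in range(max_iter):
        tau = np.sqrt(2.0) * sigma * a + omega
        terms = np.sqrt(2.0) * sigma * w * np.exp(a ** 2) * h(tau)
        mu = terms.sum()
        # Location and scale of the normalized integrand at the current nodes.
        omega_new = (tau * terms).sum() / mu
        second_moment = (tau ** 2 * terms).sum() / mu - omega_new ** 2
        sigma_new = np.sqrt(max(second_moment, 1e-12))  # sketch: use the sd scale
        done = abs(omega_new - omega) < tol and abs(sigma_new - sigma) < tol
        omega, sigma = omega_new, sigma_new
        if done:
            break
    # Final evaluation with the adapted nodes.
    tau = np.sqrt(2.0) * sigma * a + omega
    mu = (np.sqrt(2.0) * sigma * w * np.exp(a ** 2) * h(tau)).sum()
    return mu, omega, sigma
```

When the integrand is proportional to a normal density, the adapted nodes sit under its peak and the quadrature sum recovers the integral with very few points, which is the motivation for the adaptive version given above.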
Once the evaluation of the conditional mean has been done for all observations, the numerical
values can be passed on to a maximizer in order to find the QMLE or NLS estimate $\hat{\theta}$.
I summarize the method for estimating θ with the following procedure:
1.2.1 Estimation Procedure
(i) Estimate δ2 and δ0 by maximum likelihood of $y_{i2}$ on $z_i$ in the Negative Binomial
model. Obtain the estimated parameters $\hat{\delta}_2$ and $\hat{\delta}_0$.
(ii) Use the fractional probit QMLE (or NLS) of $y_{i1}$ on $y_{i2}$, $z_{i1}$ to estimate α1, δ1 and η1 with the
approximated conditional mean. The conditional mean is approximated using the estimated
parameters from the first step and the Adaptive Gauss-Hermite method.
After getting all the estimated parameters $\hat{\theta} = (\hat{\alpha}_1, \hat{\delta}_1', \hat{\eta}_1)'$, the standard errors in the second
stage should be adjusted for the first-stage estimation and obtained using the delta method. The
standard errors obtained by using the delta method can be derived with the following formula:
\[
\widehat{\mathrm{Avar}}(\hat{\theta}) = \frac{1}{N}\, \hat{A}_1^{-1} \left( N^{-1} \sum_{i=1}^{N} \hat{r}_{i1} \hat{r}_{i1}' \right) \hat{A}_1^{-1}. \tag{1.11}
\]

For more details, see the derivation and matrix notation from equation (D.6) to equation (D.20)
in Appendix D.1.
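Given the two ingredients of (1.11), the sandwich form can be computed in a few lines. The sketch below assumes that the matrix estimate and the first-step-adjusted scores have already been constructed as described in Appendix D.1; the names `A_hat` and `r_hat` and the function itself are hypothetical.

```python
import numpy as np

def two_step_avar(A_hat, r_hat):
    """Sandwich estimate A^{-1} (N^{-1} sum_i r_i r_i') A^{-1} / N.

    A_hat: (P, P) array, assumed to be supplied by the user.
    r_hat: (N, P) array whose i-th row is the adjusted score for unit i.
    """
    A_hat = np.asarray(A_hat, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    N = r_hat.shape[0]
    A_inv = np.linalg.inv(A_hat)
    B_hat = r_hat.T @ r_hat / N       # N^{-1} * sum of outer products
    return A_inv @ B_hat @ A_inv / N

def standard_errors(avar):
    """Square roots of the diagonal of the variance estimate."""
    return np.sqrt(np.diag(avar))
```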


1.2.2 Average Partial Effects
Econometricians are often interested in estimating the average partial effects of explanatory variables in non-linear models in order to get magnitudes comparable with other nonlinear models and
linear models. The Average Partial Effects (APE) can be obtained by taking the derivatives or the
differences of a conditional mean equation with respect to the explanatory variables of interest.
The APE cannot be estimated directly in the presence of an unobserved factor. It is necessary to "integrate
out" the unobserved variable in the conditional mean or average the partial effects across the distribution of the unobservable. We then obtain a single number by taking the average across the
sample in order to compare with the corresponding linear estimate.
I begin by reviewing the calculation of APEs when the explanatory variables are exogenous,
following Papke & Wooldridge (2008), and then show how to identify the APEs with a count
endogenous explanatory variable.

1.2.2.1 The Case with Exogenous Covariates and a Normally Distributed Heterogeneity
In a FRM with all exogenous covariates, model (1.1) with y 2 exogenous and a normally distributed
a1 is considered (for a general discussion of a FRM with all exogenous covariates, see Papke &
Wooldridge (2008)).
Let w = (y2 , z1 ), dropping observation index i, equation (1.1) is rewritten as:
E(y1 |w, a1 ) = Φ(wβ + a1 ),
where w is the ﬁxed terms and a1 is the random term. We can also allow elements of w to be
any function of (y2 , z1 ), including nonlinear functions, such as quadratic or cubic forms, and
interactions. If w1 is continuous, then the partial effect with respect to w 1 is:

∂ E(y1 |w, a1)/∂ w1 = β1 φ (wβ + a1 ).
If $w_1$ is a dummy variable, we compute:
\[
\Phi(w^{1}\beta + a_1) - \Phi(w^{0}\beta + a_1),
\]
where $w^{1}$ and $w^{0}$ are two different values of the covariates with $w_1 = 1$ and $w_1 = 0$, respectively.
Since $a_1$ is not observed but we assume $a_1 | w \sim \mathrm{Normal}(0, \sigma_a^2)$, we can obtain the APE by
averaging the partial effects across the distribution of $a_1$:
\[
E_{a_1}[\beta_1 \phi(w\beta + a_1)],
\]
\[
E_{a_1}[\Phi(w^{1}\beta + a_1) - \Phi(w^{0}\beta + a_1)],
\]
and these are equivalent to $\beta_{1a}\phi(w\beta_a)$ and $\Phi(w^{1}\beta_a) - \Phi(w^{0}\beta_a)$, where the subscript a stands
for division by $\sqrt{1 + \sigma_a^2}$.

Then we can obtain a single number to compare with the linear estimates by averaging the
derivative or the difference across the sample.
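For a continuous $z_{11}$, the APE is estimated by:
\[
\hat{\delta}_{11a} \left[ N^{-1} \sum_{i=1}^{N} \phi(\hat{\alpha}_{1a} y_{2i} + z_{1i}\hat{\delta}_{1a}) \right].
\]
For a count variable $y_2$, the APE is estimated by:
\[
N^{-1} \sum_{i=1}^{N} \left[ \Phi(\hat{\alpha}_{1a} y_2^{1} + z_{1i}\hat{\delta}_{1a}) - \Phi(\hat{\alpha}_{1a} y_2^{0} + z_{1i}\hat{\delta}_{1a}) \right].
\]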

For example, if we are interested in obtaining the APE when y2 changes from 0 to 1, it is
necessary to predict the difference in mean responses with y2 = 1 and y2 = 0 and average the
difference across all units.
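The two sample-average formulas for the exogenous case can be sketched as below. The inputs are assumed to be the scaled estimates (coefficients already divided by the square root of one plus the heterogeneity variance); all function and argument names are hypothetical, and the normal CDF is built from `math.erf` only to keep the sketch free of extra dependencies.

```python
import numpy as np
from math import erf, sqrt, pi

def norm_pdf(x):
    """Standard normal density, vectorized."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / sqrt(2.0 * pi)

def norm_cdf(x):
    """Standard normal CDF, vectorized through math.erf."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def ape_continuous(alpha_a, delta_a, y2, z1, j):
    """APE of the j-th (continuous) column of z1: scaled coefficient times mean density."""
    index = alpha_a * np.asarray(y2, dtype=float) + np.asarray(z1) @ np.asarray(delta_a)
    return float(delta_a[j] * np.mean(norm_pdf(index)))

def ape_count(alpha_a, delta_a, z1, y2_from, y2_to):
    """APE of moving the count variable from y2_from to y2_to, averaged over units."""
    base = np.asarray(z1) @ np.asarray(delta_a)
    diff = norm_cdf(alpha_a * y2_to + base) - norm_cdf(alpha_a * y2_from + base)
    return float(np.mean(diff))
```

For instance, `ape_count(alpha_a, delta_a, z1, 0, 1)` averages the difference in mean responses between $y_2 = 1$ and $y_2 = 0$ across all units.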

1.2.2.2 The Case with a Count Endogenous Covariate and a Non-normally Distributed Heterogeneity
In a fractional response model with a count endogenous variable, model (1.1) is considered with
the estimation procedure provided in the previous section. The APEs are obtained by taking the
derivatives or the differences in:
\[
E_{a_1}[\Phi(\alpha_1 y_2^{0} + z_1^{0}\delta_1 + \eta_1 a_1)], \tag{1.12}
\]
with respect to the elements of $(y_2^{0}, z_1^{0})$. In the argument of the expectations operator, $(y_2^{0}, z_1^{0})$ are
fixed terms and $a_1$ is a random term.
The partial effect (PE) is obtained for a continuous variable $z_{11}$:
\[
PE(y_2^{0}, z_1^{0}, a_1) = \delta_{11}\phi(\alpha_1 y_2^{0} + z_1^{0}\delta_1 + \eta_1 a_1), \tag{1.13}
\]
and for a discrete variable $y_2$, we compute:
\[
\Phi(\alpha_1 y_2^{1} + z_1^{0}\delta_1 + \eta_1 a_1) - \Phi(\alpha_1 y_2^{0} + z_1^{0}\delta_1 + \eta_1 a_1), \tag{1.14}
\]
which is the difference in mean responses at the two fixed points of interest, $y_2 = y_2^{1}$ and $y_2 = y_2^{0}$.
To obtain the APEs, we need to average the above partial effects across the distribution of $a_1$:
\[
APE_c = E_{a_1}[\delta_{11}\phi(\alpha_1 y_2^{0} + z_1^{0}\delta_1 + \eta_1 a_1)], \tag{1.15}
\]
for the continuous case, and
\[
APE_d = E_{a_1}[\Phi(\alpha_1 y_2^{1} + z_1^{0}\delta_1 + \eta_1 a_1) - \Phi(\alpha_1 y_2^{0} + z_1^{0}\delta_1 + \eta_1 a_1)], \tag{1.16}
\]
for the discrete case.
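This is equivalent to integrating out $a_1$, which yields, respectively:
\[
\psi = APE_c = \int_{-\infty}^{+\infty} \delta_{11}\phi(\alpha_1 y_2^{0} + z_1^{0}\delta_1 + \eta_1 q_1)\, f(q_1 | y_2^{0}, z_1^{0}; \theta)\,dq_1, \tag{1.17}
\]
\[
\lambda = APE_d = \int_{-\infty}^{+\infty} \Phi(g^{1}\theta)\, f(q_1 | y_2^{1}, z_1^{0}; \theta)\,dq_1 - \int_{-\infty}^{+\infty} \Phi(g^{0}\theta)\, f(q_1 | y_2^{0}, z_1^{0}; \theta)\,dq_1, \tag{1.18}
\]
where $q_1$ is a dummy argument in the integration, $g^{1} = (y_2^{1}, z_1^{0}, q_1)$, and $g^{0} = (y_2^{0}, z_1^{0}, q_1)$.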
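These APEs are estimated by:
\[
\widehat{APE}_c = \hat{\delta}_{11} \int_{-\infty}^{+\infty} \phi(\hat{\alpha}_1 y_2^{0} + z_1^{0}\hat{\delta}_1 + \hat{\eta}_1 q_1)\, f(q_1 | y_2^{0}, z_1^{0}; \hat{\theta})\,dq_1, \tag{1.19}
\]
\[
\widehat{APE}_d = \int_{-\infty}^{+\infty} \Phi(g^{1}\hat{\theta})\, f(q_1 | y_2^{1}, z_1^{0}; \hat{\theta})\,dq_1 - \int_{-\infty}^{+\infty} \Phi(g^{0}\hat{\theta})\, f(q_1 | y_2^{0}, z_1^{0}; \hat{\theta})\,dq_1. \tag{1.20}
\]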

Since equations (1.19) and (1.20) cannot be obtained in closed form, we need to use the
Adaptive Gauss-Hermite method to approximate the integrals involving the density $f(q_1 | y_2^{k}, z_1^{0}; \theta)$, $k = 0, 1$. This is
equivalent to obtaining:
\[
\hat{\psi} = \widehat{APE}_c \approx \hat{\delta}_{11} \left\{ \sqrt{2}\,\sigma_i \sum_{m=1}^{M} \{w_m^{*} \exp[(a_m^{*})^2]\}\, \phi(g^{0}\hat{\theta})\, f(y_2^{0}, z_1^{0}, (\sqrt{2}\,\sigma_i a_m^{*} + \omega_i); \hat{\theta}) \right\}, \tag{1.21}
\]
\[
\lambda = \widehat{APE}_d = \lambda^{1} - \lambda^{0}, \tag{1.22}
\]
where
\[
\lambda^{k} = \sqrt{2}\,\sigma_i \sum_{m=1}^{M} \{w_m^{*} \exp[(a_m^{*})^2]\}\, \Phi(g^{k}\hat{\theta})\, f(y_2^{k}, z_1^{0}, (\sqrt{2}\,\sigma_i a_m^{*} + \omega_i); \hat{\theta}), \tag{1.23}
\]
in addition to $g^{k} = (y_2^{k}, z_1^{0}, (\sqrt{2}\,\sigma_i a_m^{*} + \omega_i))$, $k = 0, 1$, and $\theta = (\alpha_1, \delta_1', \eta_1)'$. For a comparison
between the linear model estimates and the fractional probit estimates, it is useful to have a single
number. This single number can be obtained by averaging over $z_{1i}$ across all individuals in the formulas
for $\hat{\psi}$ and $\lambda$. For example, in order to get the APE when $y_2$ changes from 0 to 1, it is necessary to
predict the difference in the mean responses with $y_2 = 0$ and $y_2 = 1$ and take the average of the
differences across all units. This APE gives us a number comparable to the linear model's estimate.
The standard errors for the APEs will be obtained using the delta method. The detailed derivation is provided from equation (D.21) to equation (D.40) in Appendix D.1.

1.3 Monte Carlo Simulations
This section examines the ﬁnite sample properties of the two-step QML and NLS estimators of the
population averaged partial effect in a fractional response model with a count endogenous variable.
Some Monte Carlo experiments are conducted to compare these estimators with other estimators
under different scenarios. These estimators are evaluated under correct model speciﬁcation with
different degrees of endogeneity; with strong and weak instrumental variables; and with different
sample sizes. The behavior of these estimators is also examined with respect to a choice of a
particular distributional assumption.


1.3.1 Estimators
Two sets of estimators under two corresponding assumptions are considered: (1) y 2 is assumed
to be exogenous, and (2) y2 is assumed to be endogenous. Under the former assumption, three
estimators are used: the ordinary least squares (OLS) estimator in a linear model, the maximum
likelihood estimator (MLE) in a Tobit model and the quasi-maximum likelihood estimator (QMLE)
in a fractional probit model. Under the latter assumption, five estimators are examined: the two-stage least squares (2SLS) estimator, the two-step maximum likelihood estimator (MLE) in a Tobit
model using the Blundell-Smith estimation method (hereafter the Tobit BS), the two-step QMLE in
a fractional probit model using the Papke-Wooldridge estimation method (hereafter the QMLE-PW; see more discussion of handling endogeneity in Papke & Wooldridge (2008)), the two-step
QMLE and the two-step NLS estimators in a fractional probit model using the estimation method
proposed in the previous section.

1.3.2 Data Generating Process
The count endogenous variable is generated from a conditional Poisson distribution:
\[
f(y_{2i} | x_{1i}, x_{2i}, z_i, a_{1i}) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_{2i}}}{y_{2i}!}, \tag{1.24}
\]
with a conditional mean:
\[
\lambda_i = E(y_{2i} | x_{1i}, x_{2i}, z_i, a_{1i}) = \exp(\delta_{21} x_{1i} + \delta_{22} x_{2i} + \delta_{23} z_i + \rho_1 a_{1i}), \tag{1.25}
\]
using independent draws from normal distributions: $z \sim N(0, 0.3^2)$, $x_1 \sim N(0, 0.2^2)$, $x_2 \sim N(0, 0.2^2)$
and $\exp(a_1) \sim \mathrm{Gamma}(1, 1/\delta_0)$, where 1 and $1/\delta_0$ are the mean and variance of the gamma distribution. Parameters in the conditional mean model are set to be:
\[
(\delta_{21}, \delta_{22}, \delta_{23}, \rho_1, \delta_0) = (0.01, 0.01, 1.5, 1, 3).
\]
The dependent variable is generated by first drawing a binomial random variable x with n trials
and a probability p and then setting y1 = x/n. In this simulation, n = 100 and p comes from a conditional
normal distribution with the conditional mean:
\[
p = E(y_{1i} | y_{2i}, x_{1i}, x_{2i}, a_{1i}) = \Phi(\delta_{11} x_{1i} + \delta_{12} x_{2i} + \alpha_1 y_{2i} + \eta_1 a_{1i}), \tag{1.26}
\]
and parameters in this conditional mean are set at: $(\delta_{11}, \delta_{12}, \alpha_1, \eta_1) = (0.1, 0.1, -1.0, 0.5)$.
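The data generating process in (1.24)-(1.26) can be sketched with NumPy as follows. This is an illustration rather than the code used for the experiments: the function name, the seed handling and the returned dictionary are hypothetical, and the sketch assumes that N(0, 0.3²) denotes a standard deviation of 0.3 and that the gamma distribution with mean 1 and variance 1/δ0 is parameterized by shape δ0 and scale 1/δ0.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF, vectorized through math.erf."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def simulate_dgp(N=1000, eta1=0.5, delta23=1.5, delta0=3.0, n_trials=100, seed=0):
    """Draw one sample from the design in Section 1.3.2 (sketch)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 0.3, N)    # instrument
    x1 = rng.normal(0.0, 0.2, N)
    x2 = rng.normal(0.0, 0.2, N)
    # exp(a1) ~ Gamma with mean 1 and variance 1/delta0.
    a1 = np.log(rng.gamma(shape=delta0, scale=1.0 / delta0, size=N))
    # Count endogenous regressor, equations (1.24)-(1.25), with rho1 = 1.
    lam = np.exp(0.01 * x1 + 0.01 * x2 + delta23 * z + 1.0 * a1)
    y2 = rng.poisson(lam)
    # Fractional response, equation (1.26): binomial proportion out of n_trials.
    p = norm_cdf(0.1 * x1 + 0.1 * x2 - 1.0 * y2 + eta1 * a1)
    y1 = rng.binomial(n_trials, p) / n_trials
    return {"y1": y1, "y2": y2, "x1": x1, "x2": x2, "z": z, "a1": a1}
```

Changing `eta1` varies the degree of endogeneity and changing `delta23` varies the strength of the instrument, which correspond to the designs examined in the results below.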
In order to compare the magnitudes between a nonlinear model and a linear model, we are
interested in computing APEs. Based on the population values of the parameters set above, the
so-called true value of the APE with respect to each variable is reported as the mean of the APEs
approximated via the simulations with the standard procedure described below.
First, when y2 is treated as a continuous variable, the so-called true value of the APE with
respect to y2 is approximated from simulations by ﬁrst computing the derivative of the conditional
mean with respect to y2 , and then taking the average across the distribution of a 1 :
\[
APE = -1.0 \times \frac{1}{N} \sum_{i=1}^{N} \phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0\, y_{2i} + 0.5\, a_{1i}). \tag{1.27}
\]

Now when y2 is allowed to be a count variable, the so-called true values of the APEs with
respect to y2 are computed by ﬁrst taking differences in the conditional mean. These true values
of the APEs are computed at interesting values. In this chapter, I will take the ﬁrst three examples
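when $y_2$ increases from 0 to 1, 1 to 2 and 2 to 3, respectively, and the corresponding true values of
the APEs are:
\[
APE_{01} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 1 + 0.5\, a_{1i}) - \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 0 + 0.5\, a_{1i}) \right], \tag{1.28}
\]
\[
APE_{12} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 2 + 0.5\, a_{1i}) - \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 1 + 0.5\, a_{1i}) \right], \tag{1.29}
\]
\[
APE_{23} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 3 + 0.5\, a_{1i}) - \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 2 + 0.5\, a_{1i}) \right]. \tag{1.30}
\]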

These so-called true values of the APEs (which are approximated through simulations) with
respect to y2 and other exogenous variables are reported in Tables A.1-A.4. The experiment is
conducted with 500 replications and the sample size is normally set at 1000 observations.
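The averages in (1.27)-(1.30) can be computed from simulated draws as sketched below. The function name and its interface are hypothetical; the coefficient values are the population values stated above, and the draws of the covariates and of the heterogeneity are assumed to be supplied by the user.

```python
import numpy as np
from math import erf, sqrt, pi

def norm_pdf(x):
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / sqrt(2.0 * pi)

def norm_cdf(x):
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def true_apes(x1, x2, y2, a1, alpha1=-1.0, d11=0.1, d12=0.1, eta1=0.5):
    """Simulated 'true' APEs of y2: derivative form (1.27) and differences (1.28)-(1.30)."""
    x1, x2, y2, a1 = (np.asarray(v, dtype=float) for v in (x1, x2, y2, a1))
    base = d11 * x1 + d12 * x2 + eta1 * a1      # part of the index not involving y2
    out = {"continuous": float(alpha1 * np.mean(norm_pdf(base + alpha1 * y2)))}
    for lo, hi in [(0, 1), (1, 2), (2, 3)]:
        diff = norm_cdf(base + alpha1 * hi) - norm_cdf(base + alpha1 * lo)
        out[f"APE{lo}{hi}"] = float(np.mean(diff))
    return out
```

A usage example would pass arrays drawn from the data generating process of Section 1.3.2, for instance `true_apes(x1, x2, y2, a1)` with a large number of draws.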

1.3.3 Experiment Results
I report sample means, sample standard deviations (SD) and root mean squared errors (RMSE)
of these 500 estimates. In order to compare estimators across linear and non-linear models, I am
interested in comparing the APE estimates from different models.
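The three summary statistics reported for each estimator can be computed from the vector of replication estimates as in the following sketch. The function name is hypothetical, and the use of the sample standard deviation with a divisor of R − 1 (R being the number of replications) is an assumption, since the text does not state the divisor.

```python
import numpy as np

def summarize_replications(estimates, true_value):
    """Return (mean, SD, RMSE) of Monte Carlo estimates of one APE."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    sd = est.std(ddof=1)                               # assumed divisor R - 1
    rmse = np.sqrt(np.mean((est - true_value) ** 2))   # combines bias and variance
    return float(mean), float(sd), float(rmse)
```

The RMSE is measured around the true value rather than around the sample mean, so an estimator with a small SD can still have a large RMSE when it is biased.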

1.3.3.1 Simulation Result with a Strong Instrumental Variable
Tables A.1-A.4 report the simulation outcomes of the APE estimates for the sample size N = 1000
with a strong instrumental variable (IV) and different degrees of endogeneity, where η1 = 0.1,
η1 = 0.5, and η1 = 0.9. The IV is strong in the sense that the coefficient on z is δ23 = 1.5 in the first
stage and the F-statistics for the significance test of z in the first stage are large. These F-statistics
have average values of 91.75, 107.57 and 133.56 over 500 replications for the three designs
η1 = 0.1, η1 = 0.5, and η1 = 0.9, respectively. The three values of η1 correspond
to low, medium and high degrees of endogeneity. Columns 2-10 contain the
true values of the APEs and the means, SD and RMSE of the APE estimates from different
models with different estimation methods. Columns 3-5 contain the means, SD and RMSE of the
APE estimates for all variables from 500 replications with y2 assumed to be exogenous. Columns
6-10 contain the means, SD and RMSE of the APE estimates for all variables from 500 replications
with y2 allowed to be endogenous.
I ﬁrst report the simulation outcomes for the sample size N = 1000 and η 1 = 0.5 (see Table
A.1). The APE estimates using the proposed methods of QMLE and NLS in columns 9-10 are
closest to the true values of the APEs when y2 is discrete (−.3200, −.1273 and −.0212). It is
typical to get these three APEs for a discrete y2 (when y2 goes from 0 to 1, 1 to 2 and 2 to 3) as
examples in order to see the pattern of the means, SD and RMSE of the APE estimates. The APE
estimate is also very close to the true value of the APE (−.2347) when y 2 is treated as a continuous
variable. Table A.1 shows that the OLS estimate is about a half of the true value of the APE.
The first source of large bias in the OLS estimate is that it ignores the endogeneity
of the count variable y2 (with η1 = 0.5). The second source of bias in the OLS estimate is
the neglect of the non-linearity in both the structural and reduced-form equations (1.1) and
(1.2). The 2SLS approach also produces a biased estimator of the APE because of the second
reason mentioned above even though the endogeneity is taken into account. The MLE estimators
in the Tobit model have smaller bias than the estimators in the linear model but larger bias than
the estimators in the fractional probit model because they do not consider the functional form
of the fractional response variable and the count explanatory variable. When the endogeneity is
corrected, the MLE estimator in the Tobit model using Blundell-Smith method has a smaller bias
than the counterpart where y2 is assumed to be exogenous. Among the fractional probit models, the
two-step QMLE estimator, where y2 is assumed to be exogenous, (column 5) has the largest bias
because it ignores the endogeneity of y2 . However, it still has a smaller bias than other estimators
of the linear and Tobit models. The two-step QMLE-PW estimator (column 8) provides useful
result because its estimates are also very close to the true values of the APEs but it produces a
larger bias than the two-step QMLE and NLS estimators proposed in this chapter. Similar to the
two-step MLE estimator in Tobit model using Blundell-Smith method, the two-step QMLE-PW
estimator adopts the control function approach. This approach utilizes the linearity in the ﬁrst
stage equation. As a result, it ignores the discreteness in y 2 which leads to the larger bias than the
two-step QMLE and NLS estimators proposed in this chapter.
The ﬁrst set of estimators with y2 assumed to be exogenous (columns 3-5) has relatively smaller
SDs than the second set of estimators with y 2 allowed to be endogenous (columns 6-10) because
the methods that correct for endogeneity using IVs have more sampling variation than their counterparts without endogeneity correction. This results from the less-than-unit correlation between
the instrument and the endogenous variable. However, the SDs of the two-step QMLE and NLS estimators (columns 9-10) are no worse than that of the QMLE estimator where y2 is assumed to be exogenous
(column 5).
Among all estimators, the two-step QMLE and NLS estimators proposed in this chapter have
the smallest RMSE, not only for the case where y2 is allowed to be a discrete variable but also
for the case where y2 is treated as a continuous variable using the correct model specification. As
discussed previously, the two-step QMLE estimator using the Papke-Wooldridge method has the third
smallest RMSE since it also uses the same fractional probit model. Comparing columns 3 and 6,
columns 4 and 7, and column 5 with columns 8-10, the RMSEs of the methods correcting for endogeneity are
smaller than those of their counterparts.
Table A.2 reports the simulation results for the coefficient estimates. The coefficient estimates are useful in the sense that they give the directions of the effects. For studies which only require exploring
the signs of the effects, the coefficient tables are necessary. For studies which require comparing
the magnitudes of the effects, we essentially want to estimate the APEs. Table A.2 shows that the
means of point estimates are close to their true values for all parameters using the two-step QML
(or NLS) approach (−1.0, 0.1 and 0.1). The bias is large for both 2SLS method and OLS method.
These results are as expected because the 2SLS method uses the predicted value from the ﬁrst stage
OLS so it ignores the distributional information of the right-hand-side (RHS) count variable, regardless of the functional form of the fractional response variable. The OLS estimates do not carry
the information of endogeneity. Both the 2SLS and OLS estimates are biased because they do not
take into account the presence of unobserved heterogeneity. The bias for a Tobit Blundell-Smith
model is similar to the bias with the 2SLS method because it does not take into account the distributional information of the right-hand-side count variable and it employs a different functional
form given the fact that the fractional response variable has a small number of zeros. The biases
for both the QMLE estimator treating y 2 as an exogenous variable and for the two-step QMLE-PW
estimator are larger than those of the two-step QMLE and NLS estimators in this chapter. In short,
simulation results indicate that the means of point estimates are close to their true values for all
parameters using the two-step QMLE and the NLS approach mentioned in the previous section.
Simulations with different degrees of endogeneity through the coefﬁcient η 1 = 0.1 and η1 =
0.9 are also conducted (see Table A.3 and A.4). Not surprisingly, with less endogeneity, η 1 = 0.1,
the set of the estimators treating y2 as an exogenous variable produces the APE estimates closer to
the true values of the APE estimates; the set of the estimators treating y 2 as an endogenous variable
has the APE estimates further from the true values of the APE estimates. With more endogeneity,
η1 = 0.9, the set of the estimators treating y2 as an endogenous variable has the APE estimates
getting closer to the true values of the APE estimates; and the set of the estimators treating y 2 as
an exogenous variable gives the APE estimates further from the true values of the APE estimates.
As an example, it is noted that, as η 1 increases, the APE estimates of the 2SLS method are less
biased while the APE estimates of the QMLE estimator treating y 2 as an exogenous variable are
more biased and the difference between these two APE estimates is smaller since the endogeneity
is corrected.
All other previous discussions on the bias, SD and RMSE still hold with η 1 = 0.1 and η1 = 0.9.
It conﬁrms that the two-step QMLE and NLS estimators perform very well under different degrees
of endogeneity.

1.3.3.2 Simulation Result with a Weak Instrumental Variable
Table A.5 reports the simulation outcomes of the APE estimates for the sample size N = 1000 with
a weak IV and η1 = 0.5. Using the rule of thumb on a weak instrument (suggested in Staiger &
Stock (1997)), the coefficient on z is chosen as δ23 = 0.3, which corresponds to a very small first-stage F-statistic (the mean of the F-statistic is 6.97 in 500 replications). Columns 2-10 contain the
true values of the APEs and the means, SD and RMSE of the APE estimates from different
models with different estimation methods. Columns 3-5 contain the means, SD and RMSE of the
APE estimates for all variables from 500 replications with y2 assumed to be exogenous. Columns
6-10 include the means, SD and RMSE of the APE estimates for all variables from 500 replications
with y2 allowed to be endogenous. The simulation results show that, even though the instrument is
weak, the set of estimators assuming y 2 endogenous still has smaller bias than the set of estimators
assuming y2 exogenous. The two-step QMLE and NLS APE estimates are still very close to the
true values of the APEs for both cases in which y2 is treated to be a continuous variable and y2
is allowed to be a count variable. Their SD and RMSE are still the lowest among the estimators
considering y2 endogenous. Table A.11 also provides this evidence.
Simulation results from my proposed procedure show that the two-step QMLE and NLS APE
estimates are less biased and more efficient than the estimates from the linear model and the other models.
However, at first glance, the standard deviations in Table A.5 are smaller
than those in Table A.1 under the QMLE and NLS columns, which is contrary
to the pattern of standard deviations in the linear model (under the 2SLS column).
That the standard deviation from my proposed procedure is smaller with a weak IV than with a strong IV
seems odd at first if we judge that exclusion restrictions are driving identification. Looking at the
2SLS column, the bias and inefficiency of the 2SLS estimates may arise because a linear model
provides a poor approximation for the count and fractional response variables. This suggests that
nonlinearities contribute more to identification than the exclusion restriction in the nonlinear models. In other words, functional form assumptions, rather than the exclusion restriction, are mainly responsible for identification.
Therefore, it is worth investigating why the estimates from my proposed procedure tend to be close to the true values of the APEs and remain
efficient, without an increase in the standard deviation, when the instrument is weak. We design an
experiment similar to the one in Table A.5 but with the coefficient on the instrument set to
0 (δ23 = 0). This is equivalent to the case of no instruments. The results in Table A.6 show that the
standard deviations under the QMLE and NLS columns are still smaller, suggesting that nonlinearity is
responsible for identification since there is no exclusion restriction here.

1.3.3.3 Simulation Result with Different Sample Sizes
Four sample sizes are chosen to represent those commonly encountered sizes in applied research.
These range from small to large sample sizes: 100, 500, 1000 and 2000. Tables A.7-A.10 report the
simulation outcomes of the APE estimates with a strong IV, η 1 = 0.5, for sample sizes N = 100,
500, 1000, and 2000 respectively. Table A.8 is equivalent to Table A.1. Columns 2-10 contain the
true values of the APE estimates and the means, SD and RMSE of the APE estimates from different
models with different estimation methods. Columns 3-5 contain the means, SD and RMSE of the
APE estimates for all variables from 500 replications with y2 assumed to be exogenous. Columns
6-10 contain the means, SD and RMSE of the APE estimates for all variables from 500 replications
with y2 allowed to be endogenous. In general, the simulation results indicate that the SD and
RMSE for all estimators are smaller for larger sample sizes. The previous discussion in 1.3.3.1 still
applies. The two-step QMLE and NLS estimators perform very well in all sample sizes with the
smallest SD and RMSE. They are also the least biased estimators among all the estimators in this
discussion.

1.3.3.4 Simulation Result with a Misspecified Distribution
The original assumption is that exp(a1) ∼ Gamma(1, 1/δ0). This part considers misspecification:
the distribution of exp(a1) is no longer gamma; instead, a1 ∼ N(0, 0.1²) is
assumed. The finite sample behavior of all the estimators under this incorrect specification is examined. Table A.11 shows the simulation results for the sample size N = 1000 with a strong IV and
η1 = 0.5 under misspecification. All of the previous discussion under the correct specification as
in 1.3.3.1 is not affected. The APE estimates under the fractional probit model are still very close
to the true values of the APEs.
Table A.12 shows the simulation results of the APE estimates with the sample size N = 1000
and η1 = 0.5. The estimates are close to the true values of the APEs, with very small MSE and
rejection rates close to 0.05.
We should note from all tables in the Monte Carlo section that the standard
deviations under the QMLE and NLS columns using the proposed procedure are not directly
comparable to standard errors. However, the simulation results in Table A.13 show
that the proposed procedure's analytical variance is quite reliable because the means of the
analytical standard errors are quite close to those obtained by bootstrapping.

1.3.4 Conclusion from the Monte Carlo Simulations
This section examined the finite sample behavior of the estimators proposed for the FRM with
an endogenous count variable. The results of the Monte Carlo experiments show that the two-step
QMLE and NLS estimators have the smallest standard deviations and RMSE and are the least biased when
endogeneity is present. The two-step QMLE and NLS methods also produce the least biased
estimates of both the parameters and the APEs compared with the alternative methods.

1.4 Application and Estimation Results
My proposed estimators can be applied in a model of female labor supply. The dependent variable
refers to the allocation of total hours per week mothers spent on working. Hereafter, we name
the dependent variable as weekly fractional working hours. The data in this chapter were used in
Angrist & Evans (1998) to illustrate a linear model with a dummy endogenous variable: more than
two kids. They estimate the effect of additional children on female labor supply, considering the
number of children as endogenous and using the instruments: same sex and twins at the ﬁrst two
births. They found that married women who have the third child reduce their labor supply, and
their 2SLS estimates are roughly a half smaller than the corresponding OLS estimates.
In this application, the fractional response variable is the fraction of total weekly hours that
a woman spends working. This variable is generated from the number of working hours, which
was used in Angrist & Evans (1998), divided by the maximum hours per week (168). There is a
substantial number of women who do not spend any hours working, 13068 observations at zero.
Therefore, a Tobit model might be a choice.
In this application, we are interested in estimating a model of weekly fractional working hours
(FrHour) for women, treating the number of children as a count
endogenous variable. We begin with the linear model as follows:
\[
FrHour = \alpha_1 Kidno + \delta_1 Educ + \delta_2 Age + \delta_3 Agefb + \delta_4 Hispan + \delta_5 NmInc + a_1. \tag{1.31}
\]

The count variable in this application is the number of children beyond two, between 0 and 10,
instead of an indicator for having more than two kids which was used in Angrist & Evans (1998).
The number of kids is considered endogenous, which is in line with the recent existing empirical
literature. First, the number and timing of children born are controlled by a mother who makes fertility
decisions, which are correlated with the number of children. Second, some women have a preference for family-based activity or market-based work, so fertility is correlated with women's heterogeneity. The
estimation sample contains 31,824 women: more than 50% are childless, 31% have one kid, 11%
have two kids and the rest have more than two kids. Table A.14 gives the frequency distribution of
the number of children; it appears to have excess zeros and a long tail, with an average number of
children of around one. Other explanatory variables, which are exogenous and include demographic
and economic variables of the family, are described in Table A.15.
The current research on parent’s preferences over the sex mixture of their children using US
data shows that most families would prefer at least one child of each sex. For example, Ben-Porath
& Welch (1976) found that 56% of families with either two boys or two girls has a third birth while
only 51% families with one boy and one girl had a third child. Angrist & Evans (1998) found that
only 31.2% of women with one boy and one girl have a third child whereas 38.8% and 36.5% of
women with two girls and two boys have a third child, respectively. With the evidence that women
with children of the same sex are more likely to have additional children, the instruments that we
can use are same sex and twins. Table A.16 illustrates the result of ﬁrst-stage estimates with the
signiﬁcant statistics of same sex and twins.
Table A.17 shows the estimation results of the OLS in a linear model, the MLE in a Tobit model
and the QMLE in a fractional probit model when y 2 is assumed exogenous. The estimation results
of the 2SLS in a linear model, the MLE in a Tobit BS model, the QMLE-PW, the QMLE and NLS
estimation in a fractional probit model are shown in Table A.18 when y 2 is assumed endogenous.
Since I also analyze the model using the Tobit BS model, its model speciﬁcation and derivation
of the conditional mean, the average partial effects and the estimation approach are included in
Appendix D.2. The two-step NLS method with the same conditional mean used in the two-step
QMLE method is also presented in Appendix D.3.

Ordinary least squares
The OLS estimation often plays a role as a benchmark since its computation is simple, its
interpretation is straightforward and it requires fewer assumptions for consistency. The estimates
of a linear model in which the fraction of total working hours per week is the response variable
and the number of kids is considered exogenous are provided in Table A.17. As discussed in the
literature of women’s labor supply, the coefﬁcient of the number of kids is negative and statistically
significant. The linear model with the OLS estimation ignores functional form issues that arise
from the excess-zeros nature of the dependent variable. In addition, the fraction of total weekly
working hours always lies in the unit interval, but the OLS fitted values are not restricted to it.
The linear model with the OLS estimation makes little sense when a predicted value falls
outside this interval.

A Tobit model with an exogenous number of kids
There are two reasons that a Tobit model might be practical. First, the fraction of working hours
per week has many zeros. Second, the predicted value needs to be nonnegative. The estimates
are given in Table A.17. The Tobit coefﬁcients have the same signs as the corresponding OLS
estimates, and the statistical signiﬁcance of the estimates is similar. For magnitude, the Tobit
partial effects are computed to make them comparable to the linear model estimates. First of all,
the partial effect of a discrete explanatory variable is obtained by estimating the Tobit conditional
mean. Second, the difference in the conditional mean at the two values of the explanatory variable
that are of interest is computed (for example, we first plug in y2i = 1 and then y2i = 0).
As implied by the coefficient, having the first child reduces the estimated fraction of total weekly
working hours by about 0.023, or 2.3 percentage points, a larger effect than the 1.9 percentage points
of the OLS estimate. Having the second child and the third child makes the mother work less by
about 0.021, or 2.1 percentage points, and 0.018, or 1.8 percentage points, respectively. All of the
OLS and Tobit statistics are fully robust and statistically significant. Compared with the OLS
partial effect, which is about 0.019 or 1.9 percentage points, the Tobit partial effects are larger
for having the first kid but almost the same for the second and the third kid. The partial effects of
continuous explanatory variables can be obtained by taking the derivatives of the conditional mean;
in practice, we can compute adjustment factors that make the adjusted Tobit coefficients roughly
comparable to the OLS estimates. All of the Tobit coefficients given in Table A.17 for continuous
variables are larger than the corresponding OLS coefficients in absolute value. However, the Tobit
partial effects for continuous variables are only slightly larger than the corresponding OLS estimates in
absolute value.
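
The two computations described above (a difference in the Tobit conditional mean for a discrete regressor, and an adjustment factor for continuous regressors) can be sketched as follows. This is a minimal illustration for a Tobit model censored at zero, whose conditional mean is Φ(xβ/σ)·xβ + σ·φ(xβ/σ). The function names and the numerical values in the usage lines are hypothetical and are not the estimates reported in Table A.17.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tobit_mean(index: float, sigma: float) -> float:
    """E[y | x] for a Tobit censored at zero, where index = x * beta."""
    r = index / sigma
    return norm_cdf(r) * index + sigma * norm_pdf(r)


def discrete_partial_effect(base_index: float, coef: float, k: int, sigma: float) -> float:
    """Change in E[y | x] when the count regressor moves from k to k + 1.

    base_index is the part of x * beta that excludes the count regressor.
    """
    high = tobit_mean(base_index + coef * (k + 1), sigma)
    low = tobit_mean(base_index + coef * k, sigma)
    return high - low


def adjustment_factor(index: float, sigma: float) -> float:
    """Scale factor that converts a Tobit coefficient on a continuous
    regressor into a partial effect on E[y | x]."""
    return norm_cdf(index / sigma)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the mechanics.
    for k in range(3):
        print(k, discrete_partial_effect(0.2, -0.05, k, 0.3))
```

Because the Tobit conditional mean is convex in the index, a negative coefficient on the count regressor implies partial effects that shrink in magnitude as the count grows, which is the pattern reported in the text.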

A Fractional response model (FRM) with an exogenous number of kids
Following Papke & Wooldridge (1996), I also use the fractional probit model, assuming the
number of children is exogenous, for comparison purposes. The FRM estimates are similar to
the Tobit estimates, but they are even closer to the OLS estimates. The statistical significance
of the QML estimates is almost the same as that of the OLS estimates (see Table A.17). Having the
second child reduces the estimated fraction of total weekly working hours by 1.9 percentage points,
which is roughly the same as the OLS estimate. However, having the first and the third child results
in different partial effects. Having the first kid makes a mother work less by 2.0 percentage
points, and having the third kid makes a mother work less by 1.6 percentage points.
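
As a rough illustration of how such partial effects are obtained in a fractional probit model with an exogenous count regressor, the sketch below assumes the conditional mean E(y1 | x) = Φ(xβ) and the Bernoulli quasi-log-likelihood used by Papke & Wooldridge (1996). The function names and inputs are hypothetical; `other_index` stands for the part of xβ that excludes the number of kids, evaluated for each observation.

```python
import math
from typing import Sequence


def norm_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bernoulli_quasi_loglik(y: float, index: float) -> float:
    """Bernoulli quasi-log-likelihood contribution for a fractional outcome
    y in [0, 1] with conditional mean Phi(index)."""
    p = norm_cdf(index)
    return y * math.log(p) + (1.0 - y) * math.log(1.0 - p)


def ape_count(other_index: Sequence[float], coef: float, k: int) -> float:
    """Average partial effect of moving the count regressor from k to k + 1,
    averaged over the sample values of the remaining index."""
    diffs = [
        norm_cdf(b + coef * (k + 1)) - norm_cdf(b + coef * k)
        for b in other_index
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical index values and coefficient, for illustration only.
    sample_index = [-0.4, -0.1, 0.0, 0.3]
    for k in range(3):
        print(k, ape_count(sample_index, -0.1, k))
```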

Two-stage least squares
In the literature on female labor supply, Angrist & Evans (1998) treat fertility as endogenous.
Their notable contribution is to use two binary instruments, namely that the genders of the first
two births are the same (samesex) and twins at the first two births (multi2nd), to account for an
endogenous third child. The 2SLS estimates are replicated and reported in Table A.18. The first-stage
estimates using the OLS method and treating the number of children as continuous, given in Table A.16,
show that an additional year of education is associated with about 0.065 fewer children.
In magnitude, the 2SLS estimate is smaller than the OLS estimate for the number of kids
but roughly the same for other explanatory variables. With IV estimates, having children leads a
mother to work less by about 1.6 percentage points, which is smaller than the corresponding OLS
estimate of about 1.9 percentage points. These findings are consistent with Angrist and Evans'
result.
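
The logic of instrumenting fertility with a binary variable can be illustrated with the simple Wald (IV) estimator, which divides the difference in mean outcomes across instrument groups by the difference in mean values of the endogenous regressor. This is a deliberately simplified sketch with one instrument and no covariates, not the 2SLS specification estimated in Table A.18; the function name and the toy data are hypothetical.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def wald_iv(y: Sequence[float], x: Sequence[float], z: Sequence[int]) -> float:
    """Wald estimator of the effect of x on y with a binary instrument z.

    Equals Cov(z, y) / Cov(z, x) when z takes values 0 and 1.
    """
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    return (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))


if __name__ == "__main__":
    # Toy data: z plays the role of a same-sex indicator, x the number of
    # kids, y the fraction of weekly hours worked (all values made up).
    z = [0, 0, 0, 1, 1, 1]
    x = [2, 2, 3, 3, 3, 2]
    y = [0.5 - 0.02 * xi for xi in x]
    print(wald_iv(y, x, z))
```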

A Tobit BS model with an endogenous number of kids

A Tobit BS model is used with the number of children endogenous (see Table A.18). Only
the Tobit average partial effect (APE) of the number of kids is slightly larger than
the corresponding 2SLS estimate. The APEs of the Tobit estimates are almost the same as those of the
corresponding 2SLS estimates for other explanatory variables. Having the first, second and third
kid reduces the fraction of hours a mother spends working per week by around 1.8, 1.7 and 1.5
percentage points, respectively. Having the third kid reduces a mother's fraction of working hours
per week by the same amount as the 2SLS estimates. The statistical significance is almost the same
for the number of kids. The Tobit BS method is similar to the 2SLS method in the sense that the
first stage uses a linear estimation and ignores the discrete nature of the number of children. This
explains why the Tobit BS result is very close to the 2SLS estimates.

A FRM with an endogenous number of kids
Now let us consider the FRM with the number of kids endogenous. The fractional probit model
with the Papke-Wooldridge (2008) method deals with the problem of endogeneity. However, this
method does not take into account the problem of count endogeneity. The endogenous variable
in this model is treated as a continuous variable; hence, the partial effects at discrete values of
the count endogenous variable are not considered. In this chapter, the APEs of the QMLE-PW
estimates are also computed in order to be comparable with other APE estimates. Having the
first kid reduces a mother's fraction of weekly working hours by the same amount as the 2SLS
estimates. Treating the number of children as continuous also gives the same effect as the 2SLS
estimate on the number of kids. The more children a mother has, the more working hours she
has to sacrifice. Having the first, second and third kids reduces the fraction of hours a mother
spends working per week by around 1.6, 1.5 and 1.4 percentage points, respectively. The statistical
significance is the same as that of the Tobit BS estimates for the number of kids. The APEs of the
two-step QMLE-PW estimates are almost the same as those of the corresponding 2SLS estimates for other
explanatory variables.
The fractional probit model with the methods proposed in this chapter is attractive because
it controls for endogeneity, functional form issues and the presence of unobserved heterogeneity.
More importantly, the number of children is considered a count variable instead of a continuous
variable. Both the two-step QMLE and NLS are considered, and the two-step NLS estimates are
very close to the two-step QML estimates. The two-step QML and NLS coefficients and
robust standard errors are given in Table A.18 and the first-stage estimates are reported in Table
A.16.
In the first stage, the Poisson model for the count variable is preferred for two reasons.
First, the distribution of the count variable, with a long tail and excess zeros, suggests that
gamma heterogeneity is more appropriate than normal heterogeneity. Second, adding unobserved
heterogeneity with the standard exponential gamma distribution to the Poisson model transforms
the model into the Negative Binomial model, which can be estimated by the maximum likelihood
method. The OLS and Poisson estimates are not directly comparable. For instance, increasing
education by one year reduces the number of kids by 0.065 according to the linear coefficient and
by 7.8% according to the Poisson coefficient.
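
The statement that a Poisson model with gamma heterogeneity becomes a Negative Binomial model can be checked numerically. The sketch below assumes that the multiplicative heterogeneity exp(a) follows a Gamma distribution with shape and rate both equal to δ, so that its mean is one; this normalization, the function names and the parameter values are illustrative assumptions rather than the specification estimated in this chapter. It integrates the Poisson probability against the gamma density with a midpoint rule and compares the result with the closed-form Negative Binomial probability.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability of count y with mean lam > 0."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def gamma_pdf(e: float, delta: float) -> float:
    """Density of Gamma(shape=delta, rate=delta), which has mean one."""
    return math.exp(
        delta * math.log(delta) + (delta - 1.0) * math.log(e)
        - delta * e - math.lgamma(delta)
    )


def negbin_pmf(y: int, mu: float, delta: float) -> float:
    """Negative Binomial probability with mean mu and dispersion delta."""
    return math.exp(
        math.lgamma(y + delta) - math.lgamma(delta) - math.lgamma(y + 1)
        + delta * math.log(delta / (delta + mu))
        + y * math.log(mu / (delta + mu))
    )


def mixture_pmf(y: int, mu: float, delta: float,
                upper: float = 60.0, steps: int = 60000) -> float:
    """Integrate Poisson(y | mu * e) over e ~ Gamma(delta, delta)
    using a midpoint rule on (0, upper)."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        e = (j + 0.5) * h
        total += poisson_pmf(y, mu * e) * gamma_pdf(e, delta)
    return total * h


if __name__ == "__main__":
    # Hypothetical mean and dispersion, for illustration only.
    for y in range(4):
        print(y, mixture_pmf(y, 1.0, 2.0), negbin_pmf(y, 1.0, 2.0))
```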
The fractional probit estimates have the same signs as the corresponding OLS and 2SLS estimates.
In addition, the results show that the two-step QMLE is more efficient than the OLS and
2SLS estimators. For magnitude, the fractional probit APEs are computed to make them comparable
to the linear model estimates. As in the Tobit model, the partial effect of a discrete
explanatory variable is obtained by estimating the conditional mean and taking the difference at
the values we are interested in. Regarding the number of kids, having more kids reduces the fraction
of hours that a mother works weekly. Having the first child cuts the estimated fraction of total
weekly working hours by about 0.017, or 1.7 percentage points, which is similar to the 2SLS estimate
and less than the OLS estimate. Having the second child and the third child makes a mother
work less by about 1.5 percentage points and 1.4 percentage points, respectively. Even though
having the third kid reduces a mother's fraction of weekly working hours compared to having the
second kid, the marginal reduction is smaller: the change in the effect is 0.2 percentage points for
the second kid but only 0.1 percentage points for the third kid. This can
be seen as an "adaptation effect": the mother adapts and works more effectively after having
the first kid. The partial effects of continuous explanatory variables can be obtained by taking the
derivatives of the conditional mean so that they are comparable to the OLS, 2SLS
and other alternative estimates.
All of the estimates in Table A.18 tell a consistent story about fertility. Statistically, having
any children significantly reduces a mother's working hours per week. In addition, the more kids
a woman has, the more hours she needs to forgo. The FRM treating the number of kids as
endogenous and as a count variable provides evidence that the marginal reduction of women's
working hours per week becomes smaller as women have additional kids. In addition, the FRM estimates,
taking into account the endogeneity and count nature of the number of children, are statistically
significant and more significant than the corresponding linear model and Tobit BS estimates.
One advantage of the fractional probit model with the two-step QMLE (NLS) method that we
are discussing in this part is that it fits the data better than alternative models or methods. Either
R-squared,

$$S_1 = SSE/SST = 1 - \frac{\sum_{i=1}^{n} (y_{1i} - \hat{y}_{1i})^2}{\sum_{i=1}^{n} (y_{1i} - \bar{y}_1)^2},$$

or the correlation squared, $S_2 = \{\mathrm{Corr}[y_1, E(y_1 \mid y_2, z)]\}^2$, can be used to
compare the goodness-of-fit among these models. The
statistics on fractional probit-QMLE, NLS, Tobit BS, and Linear 2SLS are 0.116, 0.114, 0.090, and
0.088, respectively. This shows that the fractional probit model using the two-step QMLE(NLS)
methods has larger goodness-of-ﬁt statistic(s) than that of the Tobit model using Blundell-Smith’s
procedure and the linear model using the 2SLS method.
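
For reference, the two goodness-of-fit measures can be computed from observed outcomes and fitted conditional means as in the sketch below. The function names and the toy numbers are hypothetical; the statistics reported above come from the estimated models, not from this code.

```python
from typing import Sequence


def r_squared(y: Sequence[float], fitted: Sequence[float]) -> float:
    """S1: one minus the residual sum of squares over the total sum of squares."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def squared_correlation(y: Sequence[float], fitted: Sequence[float]) -> float:
    """S2: squared sample correlation between y and the fitted mean."""
    y_bar = sum(y) / len(y)
    f_bar = sum(fitted) / len(fitted)
    cov = sum((yi - y_bar) * (fi - f_bar) for yi, fi in zip(y, fitted))
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    var_f = sum((fi - f_bar) ** 2 for fi in fitted)
    return cov * cov / (var_y * var_f)


if __name__ == "__main__":
    # Toy outcome and fitted values, for illustration only.
    y_obs = [0.0, 0.2, 0.4, 0.6]
    y_fit = [0.1, 0.2, 0.4, 0.5]
    print(r_squared(y_obs, y_fit), squared_correlation(y_obs, y_fit))
```

In nonlinear models the two measures need not coincide, which is why both are stated in the text.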
It may seem questionable that the standard errors under the QMLE and NLS columns
(see Table A.18) are smaller than the alternative methods' standard errors when we compare
them with the simulation results in Table A.1. There are two things we need to make clear: i) the
simulation results report standard deviations instead of standard error estimates; and ii) Table
A.1 shows the case of a strong IV whereas our empirical application does not have a strong IV. Therefore,
we need to look closely at Tables A.5 and A.6, where we have a weak IV or no IV. There we also see the
pattern that the standard deviations of the QMLE and NLS estimators are much smaller than the
alternative methods' standard deviations, and they are much the same as what we observe in Table A.18.
In addition, we note that the standard errors in Table A.18 use analytical variances instead of
bootstrapped variances, so they are not directly comparable to the other methods' variances, which are
bootstrapped. However, the simulation result in Table A.13 implies that the analytical variance used
in Table A.18 is quite reliable. For these reasons, we can conclude that the linear approximation
may provide a poor approximation for the count and fractional response variables in a simultaneous
model. It is also worth noting that the standard deviations of the nonlinear estimators do not increase
when we go from the strong to the weak IV case. In this particular fractional probit model, much of
the identification appears to come from the nonlinearity. An exclusion restriction does not seem
necessary when a nonlinear model, instead of a linear model, is used for the first stage. In other words,
nonlinearity is responsible for identification. However, it is widely considered preferable to have
an identification strategy that is robust to using a linear first-stage regression. This is really a
matter of judgment, and identification off the nonlinearity is still identification. We only need
to worry about the assumptions on the functional form and the distribution of the error term.

1.5 Conclusion
I present the two-step QMLE and NLS methodology to estimate the fractional response model
with a count endogenous explanatory variable. The unobserved heterogeneity is assumed to have
an exponential gamma distribution, and the conditional mean of the fractional response model
is estimated numerically. The two-step QMLE and NLS approaches are more efficient than the
2SLS and Tobit with IV estimators. They are more robust and less difficult to compute than the
standard MLE method. This approach is applied to estimate the effect of fertility on the fraction
of hours a woman works per week. Allowing the number of kids to be endogenous and using the
data provided in Angrist & Evans (1998), I find that the marginal reduction of women's working
hours per week becomes smaller as women have additional kids. In addition, the effect of the number of
children on the fraction of hours that a woman spends working per week is statistically significant
and more significant than the estimates in all other linear and nonlinear models considered in this
chapter.

Chapter 2
ESTIMATION OF A DYNAMIC TOBIT PANEL DATA WITH AN ENDOGENOUS
VARIABLE AND AN APPLICATION TO FEMALE LABOR SUPPLY

2.1 Introduction
This chapter considers the estimation of a dynamic Tobit model with an endogenous regressor in
the presence of unobserved heterogeneity in both stages and serial correlation in the first stage.
The practical issue motivating this study is the dynamics of female annual labor supply, where the
outcome is a corner solution and is affected by its previous state and by another
source of endogeneity. The estimation method proposed in this chapter combines
practical methods proposed in the literature. To deal with the first source of endogeneity,
estimation methods for dynamic nonlinear models with a lagged dependent variable have
been proposed with fixed effects or random effects. The first method is case-specific, computationally
complex and often leads to estimators that do not converge at the usual $\sqrt{n}$ rate. In addition,
partial effects for nonlinear models using this approach are not identified. Therefore, an appealing
and robust method that deals with unobserved effects and the well-known initial condition problem
has been proposed by Wooldridge (2005). To correct for the second source of endogeneity, we
can use the control function approach proposed by Smith & Blundell (1986) for limited dependent
variables (LDV), which is especially convenient and computationally easy. As state dependence in a
dynamic nonlinear model can be overestimated without taking into account serial correlation, we also
need to correct for serial correlation.
The contribution of this chapter is to provide a computationally attractive estimation method
for a dynamic censored model with an endogenous regressor (besides the lagged dependent variable)
and serially correlated error terms. This method is applied to Panel Study of Income
Dynamics (PSID) data for the years 1980 to 1992. Based on the estimation results, I find
evidence of persistence in US white female working hours over the period 1980-1992. It
suggests that the current labor supply of US women is affected by their past labor supply and
their initial condition of labor supply. Both observed and unobserved individual heterogeneity, and
serial correlation play an important role in the persistence of US female labor supply.
This chapter is organized as follows. The second section reviews: i) approaches to estimation
of a dynamic Tobit panel data model; ii) a control function approach to handle the endogeneity
problem; and iii) methods to deal with serial correlation. It also discusses related issues on the
dynamics of the US female annual labor supply. The third section develops a model for dynamic
Tobit panel data with an endogenous regressor and the fourth section obtains average partial effect
(APE) estimates. The fifth section discusses how to correct for serial correlation in the first stage.
An empirical example follows in the next section. The last section summarizes and concludes.

2.2 Literature Review
The approach and framework of this chapter are most closely related to the work proposed by
Giles & Murtazashvili (2010). They allow continuous, endogenous contemporaneous regressors
in a dynamic panel data model but their outcome of interest is a binary variable. Their estimator
is applied to analyze the impact of migrant labor markets on reducing the probability of falling
into poverty. Since the outcome variable in this chapter is continuously distributed over positive
values and has a pileup at zero, the framework for a dynamic binary response model in Giles &
Murtazashvili (2010) has to be adjusted. A dynamic Tobit panel data model is appropriate
in this case.
There have recently been many studies on a dynamic Tobit panel data model which allows for
unobserved heterogeneity and dynamic feedback. These two features of the dynamic panel data
model, however, often create difﬁculties in estimation. The main difﬁculty is that with nonlinearity, it is not obvious how to “difference away” the individual speciﬁc effects and how to use
instrumental variable type techniques.
Some developments have been made on estimating certain nonlinear dynamic models using
the “fixed-effects” approach, for example, the censored regression models (Honore (1993);
Honore & Hu (2001)), the sample selection models (Kyriazidou (2001)), the discrete choice models
(Honore & Kyriazidou (2000)), and the models with multiplicative individual effects (Chamberlain
(1992); Wooldridge (1997)). In particular, Honore (1993) proposed some solutions for estimating
a censored regression panel data model with individual fixed effects and lagged censored dependent
variables. Honore & Hu (2001) provided identification results for this approach under certain
conditions, and Honore & Hu (2004) allowed a lagged dependent variable and a set of strictly
exogenous variables. They constructed moment conditions for the panel data model with fixed
effects and a lagged (censored) dependent variable under the restrictive assumption of a non-negative
coefficient on the lagged dependent variable. In addition, their approach does not yield estimates
of APEs. Even though semiparametric approaches do not make any assumptions on either
unobserved effects or initial conditions, they are case-specific and often lead to estimators that do
not converge at the usual $\sqrt{n}$ rate (Arellano & Honore (2001)). For example, Honore &
Kyriazidou (2000) assumed that transitory errors are iid over time (ci may depend arbitrarily on Xit).
If the regressors are continuous or have high dimension, then the estimator will have a convergence
rate slower than $\sqrt{n}$. The estimator will over-difference the data and understate the role of the
initial value of the dependent variable, yi0. This causes a downward-biased coefficient on the lagged
dependent variable in finite samples, and this bias will not decrease as T increases. More importantly,
partial effects on the conditional mean are not identified. The amount of state dependence therefore
cannot be determined.
An alternative estimation method for nonlinear dynamic models is the “random-effects”
approach. This approach faces the notably difficult initial condition problem.
Wooldridge (2005), Chay & Hyslop (1998), and Hsiao (1986) provide excellent summaries of
how this problem is treated in the literature. There are three alternative assumptions on initial
conditions. The first approach treats the initial condition as exogenous (Heckman (1978a,b, 1981b)).
Initial conditions are independent of the individual effects and can be ignored when estimating the
structural model. However, if either ci or Xit is a determining factor in the initial sample conditions,
then this approach will overstate the amount of state dependence in the process. Moreover, this is a
very strong assumption and may not make sense. For example, ability is required to be uncorrelated
with initial earnings. The second approach treats the initial condition as being in equilibrium (Card &
Sullivan (1988)). This restriction is unlikely to hold when observable covariates are time-varying
and important determinants of the outcome. The initial condition is allowed to be random and the
distribution of the initial condition given unobserved heterogeneity is specified. This model does
not allow for additional covariates (Bhargava & Sargan (1983); Hsiao (1986)). The third way is to
adopt a flexible reduced form specification that approximates the initial sample observation (Heckman
(1981b)). With this approach, estimates of parameters and APEs are computationally difficult to obtain.
The first approach is viewed as the “pure” random effects approach, where ci is independent of zi
and yi0. In addition, the unobserved effect is independent of the exogenous variables. One can obtain
the density of (yi1, . . . , yiT) given yi0 and zi by integrating out ci. This method requires a strong
assumption of independence between the initial condition and the unobserved effect. The fourth
approach, proposed by Wooldridge (2005), is the least restrictive random effects model
and is called the “correlated” random effects model. Compared with the fixed effects model, it may
provide a substantial efficiency gain (Hausman (1978)) given a correctly specified distribution of ci
and yi0. It recommends obtaining the joint distribution of (yi1, . . ., yiT) conditional on yi0 and zi
rather than the distribution of (yi0, . . ., yiT) conditional on zi as in Heckman's approach. However,
we need to specify a density of ci given yi0 and zi (motivated by the original idea of Chamberlain
(1980)). The model is called “correlated” random effects because we allow a linear relationship
between ci and (zi, yi0). This approach requires less computational effort than Heckman's technique
and yields APEs directly. It also has several advantages: we can choose a flexible conditional
distribution of the initial condition instead of an approximation that results in computational
difficulty. As a consequence, estimates are readily computed and partial effects can be easily determined.
The study of limited dependent variable models with an endogenous regressor (other than a
lagged dependent variable) has a fairly long history. Most papers in the literature have assumed a
reduced form for the endogenous variable. Examples of this include the papers by Nelson & Olson
(1978); Heckman (1978a); Amemiya (1979); Newey (1985, 1986); Blundell & Smith (1989); Vella
(1993); Blundell & Powell (2004); Das (2002) for cross sections and Vella & Verbeek (1999);
Labeaga (1999); Giles & Murtazashvili (2010) for panel data. In a linear model, such a reduced
form (or the “first stage”) can be thought of as a linear projection, and as such it is essentially
always well-defined and consistently estimated by the OLS estimator. This is not the case in a
nonlinear model, where it is typically assumed that the first stage is a conditional expectation and
that the error is independent of the instruments. Smith & Blundell (1986) considered a static
Tobit model to analyze female labor supply in the UK in 1981, treating other household income as
endogenous. The insight of their paper is to substitute a consistent estimator of the residual in the
reduced-form equation into the structural model to control for the endogeneity. This approach
is called the control function approach, and it produces a two-step estimator based on the conditional
likelihood for the equation of interest. As this chapter studies a source of endogeneity not coming
from the lagged dependent variable, we will employ the control function approach that allows
for a correlation between the unobserved effect and the regressors, as well as between the regressors
and the structural error.
As in Baltagi & Li (1991) and Baltagi & Wu (1999), estimation of a panel data model with
AR(1) disturbances is based on a feasible generalized least squares procedure. This method is
simple to compute and provides natural estimates of the serial correlation and variance components
parameters. The test for zero first-order serial correlation is also easily implemented. However, while
this estimation procedure works very well for a linear panel data model, it has not been implemented in a
nonlinear panel data model. In order to deal with serial correlation in a dynamic nonlinear model,
Lee (1999) proposed the simulated maximum likelihood method. This method is robust in a
time-series context; however, it is quite computationally intensive. We will exploit a method
similar to Baltagi & Li (1991) to handle serial correlation in our first stage.
One of the possible applications for this model is to study the persistence of the US female
labor supply taking into account endogenous features of observed covariates. The literature on
labor supply has examined female labor supply in many studies. Women's labor supply is one
of the long-standing topics of labor supply research (Heckman (1974); Heckman & Macurdy (1980)).
Studies of women's labor supply are growing rapidly due to the increasing availability of panel data and
improved computational power and techniques. According to Heckman (1981a), state dependence
may arise if working leads to accumulation of human capital (skills, know-how, work ethic, etc.)
and not working leads to depreciation of human capital. Women who prefer work to leisure, who
are highly motivated and have high ability tend to stay in the work force for their entire working life
and thus exhibit high labor supply persistence. Differences in “search costs” associated with
different labor market states may also cause state dependence (Eckstein & Wolpin (1990); Hyslop
(1999)). There might be a fixed cost to enter the labor market, raising the cost for individuals who
are not employed relative to those already in the labor market.
Shaw (1994) studies the persistence of the US white female labor supply from 1967 to 1987
using a linear dynamic model with age stratiﬁcation and she found persistence in their labor supply
because as women entered the labor force, they tended to become continuous workers. She also
found that the extent of persistence changed little over the 20 year period studied after controlling
for individual circumstances which are inﬂuential for early and late life periods such as number of
children, health status, age, and wages. However, she does not take into account the nature of the
working hours as a limited dependent variable. And she does not examine whether the persistence
comes from transitory shocks that might be serially correlated.

2.3 Model
I consider a panel data model with the latent variable as follows:
$$y^*_{1it} = \rho\, y_{1i,t-1} + \alpha\, y_{2it} + x_{it}\beta + c_{1i} + u_{1it}, \tag{2.1}$$
$$y_{1it} = \max(0, y^*_{1it}), \qquad t = 1, \dots, T, \tag{2.2}$$

where y1it is observed and equal to zero with a positive probability while continuously distributed
over strictly positive values, y1i,t−1 is a lagged dependent variable and the dynamics are assumed to be
first order, y2it is an endogenous variable, xit is a 1 × K vector of time-varying explanatory variables
which can contain a constant term, c1i is a time-constant unobserved heterogeneity and u1it is an
idiosyncratic error. β is a K × 1 vector of parameters, and ρ and α are scalar parameters. i indexes
a random draw from the cross section with sample size N and t denotes a particular time period
within a fixed number of time periods T. For simplicity, we assume a balanced panel. In what
follows, we have i = 1, . . . , N, and t = 1, . . ., T.
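
To make the latent-variable structure in (2.1) and (2.2) concrete, the sketch below simulates a small balanced panel from a dynamic Tobit process. It is only an illustration of the corner-solution mechanics: y2it is drawn as an exogenous standard normal variable here, xit is a single regressor, the initial condition y1i0 is generated by a simple censored draw, and all parameter values and function names are hypothetical.

```python
import random
from typing import List, Tuple

# Each row holds (lagged y1, y2, x, current y1) for one unit and period.
Row = Tuple[float, float, float, float]


def simulate_panel(n: int, t_periods: int, rho: float, alpha: float,
                   beta: float, sigma_c: float = 1.0, sigma_u: float = 1.0,
                   seed: int = 0) -> List[List[Row]]:
    """Simulate y*_it = rho*y_{i,t-1} + alpha*y2_it + beta*x_it + c_i + u_it
    with observed y_it = max(0, y*_it)."""
    rng = random.Random(seed)
    panel: List[List[Row]] = []
    for _ in range(n):
        c1 = rng.gauss(0.0, sigma_c)  # time-constant heterogeneity
        # Assumed initial condition: a censored draw that depends on c1.
        y_prev = max(0.0, c1 + rng.gauss(0.0, sigma_u))
        rows: List[Row] = []
        for _ in range(t_periods):
            x = rng.gauss(0.0, 1.0)
            y2 = rng.gauss(0.0, 1.0)  # exogenous here, for simplicity
            y_star = rho * y_prev + alpha * y2 + beta * x + c1 + rng.gauss(0.0, sigma_u)
            y = max(0.0, y_star)      # corner solution at zero
            rows.append((y_prev, y2, x, y))
            y_prev = y
        panel.append(rows)
    return panel


if __name__ == "__main__":
    data = simulate_panel(200, 5, rho=0.5, alpha=-0.3, beta=0.2, seed=1)
    outcomes = [row[3] for rows in data for row in rows]
    share_zero = sum(1 for y in outcomes if y == 0.0) / len(outcomes)
    print(len(outcomes), share_zero)
```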
We assume that model (2.1) is correctly speciﬁed dynamically and the error term is serially
uncorrelated:
$$u_{1it} \mid y_{1i,t-1}, \dots, y_{1i0}, x_i, c_{1i} \sim \mathrm{Normal}(0, \sigma^2_{u_1}). \tag{2.3}$$

If we allow the error term to be serially correlated, for example, allowing for an AR(1) process,
we would want to include not only a lagged dependent variable but also lags of x as well. In this
case, we include a single lag of y1 , contemporaneous y2 and possibly of x’s.
In model (2.1), xit is assumed to be strictly exogenous and y2it is allowed to be endogenous.
Let zit = (xit, z1it) be a 1 × L vector of strictly exogenous variables that serve as instrumental
variables, where L > K and z1it is excluded from (2.1). Using the control function approach to model the
endogeneity (see Smith & Blundell (1986); Rivers & Vuong (1988)), we can assume a linear
reduced form for y2it as follows:
$$y_{2it} = z_{it}\gamma + c_{2i} + u_{2it}, \tag{2.4}$$

where u2it is an idiosyncratic serially uncorrelated error with $\mathrm{Var}(u_{2it}) = \sigma^2_{u_2}$ and c2i is an
unobserved effect. Using Mundlak (1978)'s device, we allow $c_{2i} = \bar{z}_i\delta + a_{2i}$ and rewrite y2it as:
$$y_{2it} = z_{it}\gamma + \bar{z}_i\delta + v_{2it}, \tag{2.5}$$
where $v_{2it} = a_{2i} + u_{2it}$, $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$ and $a_{2i} \mid z_i \sim \mathrm{Normal}(0, \sigma^2_{a_2})$. We can also add time
dummies into this reduced form.
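
In practice, Mundlak's device amounts to appending the unit-specific time averages of the exogenous variables to each row of the reduced form. A minimal sketch, assuming the data for one unit are stored as a list of T rows with L values each (the function names and the toy numbers are hypothetical):

```python
from typing import List


def time_averages(z_i: List[List[float]]) -> List[float]:
    """Return the 1 x L vector of time averages for one unit,
    where z_i holds T rows of L exogenous variables."""
    periods = len(z_i)
    width = len(z_i[0])
    return [sum(row[j] for row in z_i) / periods for j in range(width)]


def mundlak_rows(z_i: List[List[float]]) -> List[List[float]]:
    """Append the unit's time averages to every period's regressors,
    giving the right-hand-side variables of the reduced form."""
    z_bar = time_averages(z_i)
    return [row + z_bar for row in z_i]


if __name__ == "__main__":
    # Toy data for one unit observed over two periods.
    unit = [[1.0, 2.0], [3.0, 4.0]]
    print(mundlak_rows(unit))
```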
Now it boils down to the assumptions that we need to make for the conditional distributions of
u1it and c1i. We first discuss u1it. As in the cross-sectional case discussed in Smith &
Blundell (1986), we can allow joint normality between u1it and v2it. However, v2it is serially
correlated because of the presence of the heterogeneity a2i; this makes the serial
correlation issue in the estimation with dynamics difficult to handle. As a result, we start with
the assumption of joint normality of u1it and u2it, as suggested in Giles & Murtazashvili (2010),
since u2it can be naturally assumed to be serially uncorrelated. As we will see in the discussion
below, this assumption is reasonable in our context of dynamics in the structural equation. We
write:
$$u_{1it} = \theta_1 u_{2it} + e_{1it}, \tag{2.6}$$
where $\theta_1 = \mathrm{Cov}(u_{1it}, u_{2it})/\mathrm{Var}(u_{2it})$ and $e_{1it} \sim \mathrm{Normal}(0, \sigma^2_{e_1})$.

(u1it, u2it) is assumed to have a zero-mean bivariate normal distribution; zi is strictly exogenous
in both equations (2.1) and (2.5) or, in other words, u1it and u2it are independent of zi.
e1it is independent of zi and u2it. We can assume that e1it is serially uncorrelated because u2it
is serially uncorrelated and independent of zi, in addition to the fact that u1it is free of serial
correlation. Even if u2it is serially correlated, we can correct for this serial correlation without
difficulty. We will discuss this issue in more detail later.
Regarding the issue of the endogeneity of y2it , let us rewrite equation (2.6):
$$u_{1it} = \theta_1 v_{2it} - \theta_1 a_{2i} + e_{1it}. \tag{2.7}$$

Equation (2.7) shows the direct relation between u1it and v2it, through which we can account for the
endogeneity of y2it in period t. In addition, since u2it is free of serial correlation, it follows from
equation (2.6) that e1it is not correlated with u2i,t−1 and v2i,t−1. By the same argument for earlier
past values of v2it, y2it becomes sequentially exogenous in the estimating equation. With equation (2.7),
we now have to handle the heterogeneity issues, not only c1i but also a2i. Rewriting the structural
equation under the assumption in equation (2.7), we have:
y∗ = ρ y1i,t−1 + α y2it + xit β + c1i + θ1 v2it − θ1 a2i + e1it , t = 1, . . . , T,
1it
or
y∗ = ρ y1i,t−1 + α y2it + xit β + si + θ1 v2it + e1it , t = 1, . . . , T,
1it
where si = c1i − θ1 a2i , which is a composite error.
37

(2.8)

Following the device of Chamberlain (1980) and the "correlated" random effects dynamic model that Wooldridge (2005) proposed to handle the initial conditions problem, we can specify $s_i$ as a linear function of $y_{1i0}$ and $z_i$. This lets us use standard random effects Tobit software without approximating the density of $s_i$. However, the regressors in equation (2.8) now include $v_{2it}$, which is not in $z_i$. We therefore also include $v_{2i}$ in the linear function that relates $s_i$ to the initial condition and to the explanatory variables in all time periods:
$$s_i = \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + a_{1i}, \tag{2.9}$$
where
$$a_{1i}|(y_{1i0}, z_i, v_{2i}) \sim \mathrm{Normal}(0, \sigma^2_{a_1}). \tag{2.10}$$

This is a reasonable assumption because the unobserved effect (such as motivation or ambition) is correlated with the initial condition of the outcome of interest (working hours). In addition, because the model has a lagged dependent variable, $y_{1i,t-1}$ and $c_{1i}$ are correlated. To conserve degrees of freedom and reduce computation time, which matters in applied work with a substantial number of explanatory variables, we can assume that the explanatory variables in different time periods have equal impacts on $s_i$. Using Mundlak’s device, assumption (2.9) is then restricted to $s_i = \theta_2 y_{1i0} + \bar{z}_i\theta_3 + \bar{v}_{2i}\theta_4 + a_{1i}$, where $\bar{z}_i$ and $\bar{v}_{2i}$ are time averages.
We now have $v_{1it} = \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}$. Substituting this into equation (2.1), we obtain:
$$y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}, \quad t = 1, \ldots, T, \tag{2.11}$$
and, more compactly:
$$y_{1it} = \max(0, w_{1it}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}), \quad t = 1, \ldots, T, \tag{2.12}$$
where $w_{1it} = (y_{1i,t-1}, y_{2it}, x_{it})$ and $\lambda_1 = (\rho, \alpha, \beta')'$.

Based on the estimating equation (2.11), and following the framework in Wooldridge (2002, section 13.9) and Wooldridge (2005), we can write the density as:
$$f_t(y_{1t}|y_{1,t-1}, y_{2t}, y_{10}, z, v_2, a_1; \lambda) = f_{0t} f_{1t}, \tag{2.13}$$
where
$$f_{0t} = \{1 - \Phi[(w_{1t}\lambda_1 + \theta_2 y_{10} + z\theta_3 + v_2\theta_4 + \theta_1 v_{2t} + a_1)/\sigma_{e_1}]\}^{1[y_{1t}=0]},$$
and
$$f_{1t} = \{(1/\sigma_{e_1})\phi[(y_{1t} - w_{1t}\lambda_1 - \theta_2 y_{10} - z\theta_3 - v_2\theta_4 - \theta_1 v_{2t} - a_1)/\sigma_{e_1}]\}^{1[y_{1t}>0]}.$$
Thus the density of $(y_{1i1}, y_{1i2}, \ldots, y_{1iT})$ given $(y_{1i0} = y_{10}, z_i = z, v_{2i} = v_2, a_{1i} = a_1)$ is:
$$\prod_{t=1}^{T} f_t(y_{1t}|y_{1,t-1}, y_{2t}, x_t, y_{10}, z, v_2, a_1; \lambda), \tag{2.14}$$
and since we do not observe $a_{1i}$, we need to integrate $a_1$ out of this density in order to estimate $\lambda$. Given $a_{1i}|(y_{1i0}, z_i, v_{2i}) \sim \mathrm{Normal}(0, \sigma^2_{a_1})$, the density of $(y_{1i1}, y_{1i2}, \ldots, y_{1iT})$ given $(y_{1i0} = y_{10}, z_i = z, v_{2i} = v_2)$ is:
$$\int_{\mathbb{R}} \left[\prod_{t=1}^{T} f_t(y_{1t}|y_{1,t-1}, y_{2t}, x_t, y_{10}, z, v_2, a_1; \lambda)\right] (1/\sigma_{a_1})\phi(a_1/\sigma_{a_1})\,da_1, \tag{2.15}$$

which has exactly the same structure as the standard random effects Tobit model, except that the explanatory variables at time period $t$ are:
$$w_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, v_{2i}, v_{2it}). \tag{2.16}$$
We can therefore use standard random effects Tobit software for estimation. We add $y_{1i0}$, $z_i$, $v_{2i}$ and $v_{2it}$ as additional explanatory variables in each time period and estimate $\lambda$, $\theta_3$, $\theta_4$ and $\sigma^2_{e_1}$, where $v_{2i} = (v_{2i1}, v_{2i2}, \ldots, v_{2iT})$.
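To make the integral in (2.15) concrete, the following is a minimal Python sketch of one unit's log-likelihood contribution. It is not the routine used by any particular random effects Tobit software: the function name, the argument layout and the simple midpoint rule (used here in place of a proper quadrature rule) are assumptions made for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def re_tobit_loglik_unit(y, index, sigma_e, sigma_a, n_nodes=400):
    """Log of the integral in (2.15) for one cross-sectional unit.

    y[t]     : observed outcome in period t (0 if censored)
    index[t] : linear index in period t, excluding a1 (hypothetical input)
    The integral over a1 uses a midpoint rule on [-8*sigma_a, 8*sigma_a].
    """
    lo = -8.0 * sigma_a
    step = 16.0 * sigma_a / n_nodes
    total = 0.0
    for k in range(n_nodes):
        a1 = lo + (k + 0.5) * step
        dens = 1.0
        for y_t, g_t in zip(y, index):
            if y_t > 0:
                # f_1t: density of an uncensored observation
                dens *= norm_pdf((y_t - g_t - a1) / sigma_e) / sigma_e
            else:
                # f_0t: probability of a corner outcome
                dens *= 1.0 - norm_cdf((g_t + a1) / sigma_e)
        total += dens * norm_pdf(a1 / sigma_a) / sigma_a * step
    return math.log(total)

# Two one-period checks with known closed forms.
ll_censored = re_tobit_loglik_unit([0.0], [0.0], sigma_e=1.0, sigma_a=1.0)
ll_positive = re_tobit_loglik_unit([1.0], [0.0], sigma_e=1.0, sigma_a=1.0)
```

With a zero index and one censored period the integral equals 1/2 by symmetry, and with one positive outcome it equals the Normal(0, 2) density at the observed value, which gives a quick sanity check on the integration.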

Based on the above model development, the estimation procedure for the "correlated random effects" dynamic Tobit model is as follows.
Estimation Procedure:
(i) Estimate the reduced form for $y_{2it}$ by pooled OLS of $y_{2it}$ on $z_{it}$, $\bar{z}_i$, and time dummies. Obtain the residuals, $\hat{v}_{2i}$ and $\hat{v}_{2it}$.
(ii) Estimate a random effects Tobit of $y_{1it}$ on $w_{it}$ to obtain the estimates of interest, $\hat{\lambda}$, where $w_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, v_{2i}, v_{2it})$.
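Step (i) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: there is a single scalar instrument, time dummies are omitted, and the helper names (`time_averages`, `pooled_ols`, `first_stage_residuals`) are hypothetical. Step (ii) would then pass the residuals to a random effects Tobit routine, which is not shown.

```python
def time_averages(z):
    """z[i][t] is a scalar instrument for unit i in period t; return unit means."""
    return [sum(row) / len(row) for row in z]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def pooled_ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

def first_stage_residuals(y2, z):
    """Pooled OLS of y2_it on (1, z_it, zbar_i); returns coefficients and residuals."""
    zbar = time_averages(z)
    rows = [(i, t) for i in range(len(z)) for t in range(len(z[i]))]
    X = [[1.0, z[i][t], zbar[i]] for i, t in rows]
    y = [y2[i][t] for i, t in rows]
    coef = pooled_ols(X, y)
    v2 = [[0.0] * len(z[i]) for i in range(len(z))]
    for (i, t), x, yi in zip(rows, X, y):
        v2[i][t] = yi - sum(c * xj for c, xj in zip(coef, x))
    return coef, v2

# Made-up data in which y2 is an exact linear function of z_it and zbar_i.
z = [[1.0, 2.0, 4.0], [0.0, 3.0, 9.0], [2.0, 2.0, 5.0], [7.0, 1.0, 0.0]]
zbar = time_averages(z)
y2 = [[1.0 + 2.0 * z[i][t] + 3.0 * zbar[i] for t in range(3)] for i in range(4)]
coef, v2_hat = first_stage_residuals(y2, z)
```

Because the made-up data contain no noise, the recovered coefficients are approximately (1, 2, 3) and the residuals are numerically zero; with real data, `v2_hat[i]` plays the role of the residual vector for unit i.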

2.4 Average Partial Effects
To compare the magnitude of the estimates from the nonlinear model of the previous section with linear estimates, we need the marginal effect or the average partial effect (APE) of the explanatory variable of interest. Following Wooldridge (2002, 2005), the APEs are computed as the derivatives or differences of:
$$E[m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i}, \sigma^2_{e_1})], \quad t = 1, \ldots, T, \tag{2.17}$$
where $m(g, \sigma^2_{e_1}) = \Phi[g/\sigma_{e_1}]g + \sigma_{e_1}\phi[g/\sigma_{e_1}]$ with
$$g = w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i},$$
and, in the argument of the expectation operator, variables with a subscript $i$ are random and all others are fixed.
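The function $m(g, \sigma^2)$ is the mean of a variable censored at zero. As a hedged illustration, the sketch below evaluates the closed form and compares it with a direct numerical integration of $\max(0, g + e)$ over a normal error; the function names and the example values are arbitrary choices, not part of the original analysis.

```python
import math

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def tobit_mean(g, sigma):
    """Closed form m(g, sigma^2) = Phi(g/sigma) * g + sigma * phi(g/sigma)."""
    z = g / sigma
    return norm_cdf(z) * g + sigma * norm_pdf(z)

def tobit_mean_numeric(g, sigma, n_nodes=20000):
    """E[max(0, g + e)], e ~ Normal(0, sigma^2), by a midpoint rule on [-8s, 8s]."""
    lo = -8.0 * sigma
    step = 16.0 * sigma / n_nodes
    total = 0.0
    for k in range(n_nodes):
        e = lo + (k + 0.5) * step
        total += max(0.0, g + e) * norm_pdf(e / sigma) / sigma * step
    return total

closed = tobit_mean(0.7, 1.3)
numeric = tobit_mean_numeric(0.7, 1.3)
```

The two values agree up to the integration error, which is the sense in which $m$ summarizes the censored mean.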
Using iterated expectations, expression (2.17) can be rewritten as:
$$E\{E[m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i}, \sigma^2_{e_1})|y_{1i0}, z_i, v_{2i}]\}, \tag{2.18}$$
where $w_{1t}$ is held at fixed values and the expectation is with respect to the distribution of $(y_{1i0}, z_i, v_{2i}, a_{1i})$. Since $a_{1i}$ and $(y_{1i0}, z_i, v_{2i})$ are independent, and $a_{1i} \sim \mathrm{Normal}(0, \sigma^2_{a_1})$, the conditional expectation in equation (2.18) is obtained by integrating
$$m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i}, \sigma^2_{e_1})$$
over $a_{1i}$ with respect to the $\mathrm{Normal}(0, \sigma^2_{a_1})$ distribution.

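Since
$$m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i}, \sigma^2_{e_1})$$
is obtained by integrating
$$\max(0, w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it})$$
with respect to $e_{1it}$ over the $\mathrm{Normal}(0, \sigma^2_{e_1})$ distribution, the conditional expectation in equation (2.18) is:
$$m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it}, \sigma^2_{a_1} + \sigma^2_{e_1}). \tag{2.19}$$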
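The step from (2.18) to (2.19) says that integrating $a_{1i}$ out of $m(\cdot, \sigma^2_{e_1})$ simply adds the two variances. The following Python sketch checks this numerically for one arbitrary set of values; the function names and the numbers are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def tobit_mean(g, sigma):
    """m(g, sigma^2) = Phi(g/sigma) * g + sigma * phi(g/sigma)."""
    z = g / sigma
    return norm_cdf(z) * g + sigma * norm_pdf(z)

def integrate_out_a(g, sigma_a, sigma_e, n_nodes=4000):
    """E_a[m(g + a, sigma_e^2)], a ~ Normal(0, sigma_a^2), by a midpoint rule."""
    lo = -8.0 * sigma_a
    step = 16.0 * sigma_a / n_nodes
    total = 0.0
    for k in range(n_nodes):
        a = lo + (k + 0.5) * step
        total += tobit_mean(g + a, sigma_e) * norm_pdf(a / sigma_a) / sigma_a * step
    return total

g, sigma_a, sigma_e = 0.7, 1.2, 0.9
lhs = integrate_out_a(g, sigma_a, sigma_e)                  # inner expectation in (2.18)
rhs = tobit_mean(g, math.sqrt(sigma_a**2 + sigma_e**2))     # expression (2.19)
```

Here `lhs` and `rhs` coincide up to integration error, because $a_{1i} + e_{1it}$ is itself normal with variance $\sigma^2_{a_1} + \sigma^2_{e_1}$.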

For a given value of $w_{1t}$, say $w^0_1$, a consistent estimator of expression (2.19) is obtained by replacing the unknown parameters with consistent estimators:
$$N^{-1}\sum_{i=1}^{N} m(w^0_1\hat{\lambda}_1 + \hat{\theta}_2 y_{1i0} + z_i\hat{\theta}_3 + \hat{v}_{2i}\hat{\theta}_4 + \hat{\theta}_1\hat{v}_{2it}, \hat{\sigma}^2_{a_1} + \hat{\sigma}^2_{e_1}), \tag{2.20}$$
where $\hat{v}_{2it}$ are the first-stage pooled OLS residuals from the regression of $y_{2it}$ on $z_{it}$, $\bar{z}_i$ and time dummies, and $\hat{v}_{2i} = (\hat{v}_{2i1}, \hat{v}_{2i2}, \ldots, \hat{v}_{2iT})$.
The APEs are obtained by taking derivatives or differences of expression (2.19) with respect to $w_{1t}$. Their estimators are based on those derivatives or differences evaluated at the estimated parameters, as in expression (2.20) with $w^0_1$ replaced by $w_{1t}$.
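For example, the APE of $y_{1,t-1}$ is estimated by:
$$\hat{\rho}\,(NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} \Phi[(w_{1it}\hat{\lambda}_1 + \hat{\theta}_2 y_{1i0} + z_i\hat{\theta}_3 + \hat{v}_{2i}\hat{\theta}_4 + \hat{\theta}_1\hat{v}_{2it})/(\hat{\sigma}^2_{a_1} + \hat{\sigma}^2_{e_1})^{1/2}], \tag{2.21}$$
and the APE of $y_{2t}$ by:
$$\hat{\alpha}\,(NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} \Phi[(w_{1it}\hat{\lambda}_1 + \hat{\theta}_2 y_{1i0} + z_i\hat{\theta}_3 + \hat{v}_{2i}\hat{\theta}_4 + \hat{\theta}_1\hat{v}_{2it})/(\hat{\sigma}^2_{a_1} + \hat{\sigma}^2_{e_1})^{1/2}]. \tag{2.22}$$
The scale factor is the square root of the summed variances because the derivative of $m(g, \sigma^2)$ with respect to $g$ is $\Phi(g/\sigma)$.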
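As a hedged illustration of (2.21) and (2.22), the sketch below averages the scaled normal CDF over fitted index values and multiplies by the coefficient of interest. The function name, the flat list of fitted indices and the numbers are hypothetical; in practice the indices would come from the second-stage estimates.

```python
import math

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ape_continuous(coef, fitted_indices, sigma_a, sigma_e):
    """coef times the (i, t) average of Phi(index / sqrt(sigma_a^2 + sigma_e^2)).

    fitted_indices holds one fitted linear index per (i, t) observation,
    excluding the random effect a1.
    """
    scale = math.sqrt(sigma_a**2 + sigma_e**2)
    avg = sum(norm_cdf(v / scale) for v in fitted_indices) / len(fitted_indices)
    return coef * avg

# Toy example: with all indices at zero, Phi(0) = 0.5 and the APE is coef / 2.
ape_toy = ape_continuous(0.5, [0.0, 0.0], sigma_a=1.0, sigma_e=1.0)
# A second call with made-up indices, showing that the APE shrinks the coefficient.
ape_example = ape_continuous(0.4, [-1.2, 0.3, 2.0, 0.8], sigma_a=0.6, sigma_e=0.8)
```

Since the averaged CDF lies between 0 and 1, the APE is always smaller in absolute value than the underlying coefficient.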

2.5 Serial Correlation Correction
As discussed above, the key assumption in equation (2.6) requires $u_{2it}$ to be free of serial correlation. If $u_{2it}$ is serially correlated, then we must correct for the resulting serial correlation in $e_{1it}$; otherwise our estimator will not be consistent. For simplicity, assume that $u_{2it}$ follows an AR(1) process, as in Giles & Murtazashvili (2010):
$$u_{2it} = \eta u_{2i,t-1} + e_{2it}, \quad t = 1, \ldots, T, \tag{2.23}$$
where $e_{2it}$ is a white noise error with $\mathrm{Var}(e_{2it}) = \sigma^2_{e_2}$.
We have:
$$y_{2it} = w_{2it}\gamma_2 + a_{2i} + u_{2it}, \tag{2.24}$$
where $w_{2it} = (z_{it}, \bar{z}_i)$ and $\gamma_2 = (\gamma', \delta')'$, or, stacking over $t$:
$$y_{2i} = w_{2i}\gamma_2 + v_{2i}. \tag{2.25}$$

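Since $u_{2it}$ is serially correlated, $e_{1it}$ is serially correlated as well. Writing $\eta_e = \mathrm{Cov}(e_{1it}, e_{1i,t-1})$:
$$\begin{aligned}
\eta_e &= \mathrm{Cov}(u_{1it} - \theta_1 u_{2it},\; u_{1i,t-1} - \theta_1 u_{2i,t-1}) \\
&= \mathrm{Cov}(u_{1it} - \theta_1\eta u_{2i,t-1} - \theta_1 e_{2it},\; u_{1i,t-1} - \theta_1 u_{2i,t-1}) \\
&= \eta\theta_1^2 \mathrm{Var}(u_{2i,t-1}),
\end{aligned}$$
so that $\eta_e \neq 0$ unless $\eta = 0$ or $\theta_1 = 0$.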
To remove the serial correlation in $e_{1it}$, our strategy is to transform the first-stage residual so that it is free of serial correlation. Define the variance-covariance matrix of $v_{2i}$ as:
$$\Gamma = E(v_{2i}v_{2i}') = \sigma^2_{a_2} j_T j_T' + \sigma^2_{u_2}\Psi(\eta), \tag{2.26}$$
where $\Gamma$ is a $T \times T$ positive definite matrix when $-1 < \eta < 1$, which I assume in what follows. This matrix is necessarily the same for all $i$ because of the random sampling assumption in the cross section. $j_T$ is a $T \times 1$ vector of ones, and $\Psi(\eta)$ is defined as:
$$\Psi(\eta) = \begin{pmatrix}
1 & \eta & \eta^2 & \cdots & \eta^{T-3} & \eta^{T-2} & \eta^{T-1} \\
\eta & 1 & \eta & \cdots & \eta^{T-4} & \eta^{T-3} & \eta^{T-2} \\
\eta^2 & \eta & 1 & \cdots & \eta^{T-5} & \eta^{T-4} & \eta^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
\eta^{T-2} & \eta^{T-3} & \eta^{T-4} & \cdots & \eta & 1 & \eta \\
\eta^{T-1} & \eta^{T-2} & \eta^{T-3} & \cdots & \eta^2 & \eta & 1
\end{pmatrix}. \tag{2.27}$$
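The matrices in (2.26) and (2.27) are easy to build numerically, since the $(s, t)$ element of $\Psi(\eta)$ is $\eta^{|s-t|}$. A minimal Python sketch follows; the function names are illustrative and the parameter values are made up.

```python
def psi(eta, T):
    """T x T AR(1) correlation matrix with (s, t) element eta ** |s - t|."""
    return [[eta ** abs(s - t) for t in range(T)] for s in range(T)]

def gamma_matrix(sigma_a2, sigma_u2, eta, T):
    """Gamma = sigma_a2 * j_T j_T' + sigma_u2 * Psi(eta), as in (2.26)."""
    P = psi(eta, T)
    return [[sigma_a2 + sigma_u2 * P[s][t] for t in range(T)] for s in range(T)]

psi_3 = psi(0.5, 3)
gamma_2 = gamma_matrix(1.0, 2.0, 0.5, 2)
```

For example, `psi(0.5, 3)` has ones on the diagonal, 0.5 next to the diagonal and 0.25 in the corners, and setting `eta = 0` returns the identity matrix.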

We also note that $\sigma^2_{u_2} = \sigma^2_{e_2}/(1 - \eta^2)$. After obtaining consistent estimates of $\eta$, $\sigma^2_{a_2}$, $\sigma^2_{u_2}$ (and $\sigma^2_{e_2}$), we can transform $v_{2it}$ into $v^*_{2it}$, which is free of serial correlation. With this new serially uncorrelated error $v^*_{2it}$, we can transform $u_{2it}$ into a new serially uncorrelated $u^*_{2it}$ using $u^*_{2it} = v^*_{2it} - a_{2i}$. This guarantees that the new error $e^*_{1it} = u_{1it} - \theta_1 u^*_{2it}$ is free of serial correlation. $e^*_{1it}$ is now serially uncorrelated, independent of $z_i$ and $u^*_{2it}$, and has a normal distribution, $\mathrm{Normal}(0, \sigma^{*2}_{e_1})$.
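The relation $\sigma^2_{u_2} = \sigma^2_{e_2}/(1 - \eta^2)$ and the entries of $\Psi(\eta)$ both follow from the AR(1) process (2.23). A short derivation is sketched below, under the assumptions (standard, but stated here rather than taken from the text) that the process is stationary with $|\eta| < 1$ and that $e_{2it}$ is uncorrelated with past values of $u_{2it}$:

```latex
\begin{aligned}
\mathrm{Var}(u_{2it}) &= \eta^2\,\mathrm{Var}(u_{2i,t-1}) + \sigma^2_{e_2}
  \;\Longrightarrow\; \sigma^2_{u_2} = \frac{\sigma^2_{e_2}}{1 - \eta^2}, \\
\mathrm{Cov}(u_{2it}, u_{2i,t-k}) &= \eta^{k}\,\sigma^2_{u_2}, \qquad k = 0, 1, \ldots, T-1 .
\end{aligned}
```

The second line is why the $(s, t)$ element of $\Psi(\eta)$ equals $\eta^{|s-t|}$.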

The transformation procedure is briefly as follows. Using the fact that $j_T'j_T = T$, $\Gamma$ can be rewritten as:
$$\Gamma = T\sigma^2_{a_2}\, j_T (j_T'j_T)^{-1} j_T' + \sigma^2_{u_2}\Psi(\eta) = T\sigma^2_{a_2} P_T + \sigma^2_{u_2}\Psi(\eta), \tag{2.28}$$
where $P_T \equiv I_T - Q_T$ and $Q_T = I_T - j_T (j_T'j_T)^{-1} j_T'$. Defining $\tau_1 = \sigma^2_{u_2}\Psi(\eta)/[T\sigma^2_{a_2} + \sigma^2_{u_2}\Psi(\eta)]$, we can write:
$$\Gamma = [T\sigma^2_{a_2} + \sigma^2_{u_2}\Psi(\eta)](P_T + \tau_1 Q_T). \tag{2.29}$$
After some algebra, we can show that $(P_T + \tau_1 Q_T)^{-1/2} = (1 - \tau)^{-1}[I_T - \tau P_T]$, where $\tau = 1 - \sqrt{\tau_1}$. Hence,
$$\Gamma^{-1/2} = [T\sigma^2_{a_2} + \sigma^2_{u_2}\Psi(\eta)]^{-1/2}(1 - \tau)^{-1}[I_T - \tau P_T] = [\sigma^2_{u_2}\Psi(\eta)]^{-1/2}[I_T - \tau P_T], \tag{2.30}$$
where $\tau = 1 - \{\sigma^2_{u_2}\Psi(\eta)/[T\sigma^2_{a_2} + \sigma^2_{u_2}\Psi(\eta)]\}^{1/2}$.
Define $C_T \equiv [\Psi(\eta)]^{-1/2}[I_T - \tau P_T]$ and premultiply both sides of equation (2.25) by $C_T$ to obtain:
$$\tilde{y}_{2i} = \tilde{w}_{2i}\gamma_2 + \tilde{v}_{2i}, \tag{2.31}$$
where a tilde denotes a variable premultiplied by $C_T$. The variance matrix of $\tilde{v}_{2i}$ is:
$$E(\tilde{v}_{2i}\tilde{v}_{2i}') = C_T\Gamma C_T' = \sigma^2_{u_2} I_T. \tag{2.32}$$
Therefore we have transformed $v_{2i}$ into $v^*_{2i}$ $(= \tilde{v}_{2i})$, which is serially uncorrelated and homoskedastic, by using:
$$\tilde{v}_{2i} = C_T v_{2i}. \tag{2.33}$$
The estimator of $C_T$ is:
$$\hat{C}_T = \hat{\sigma}_{u_2}\hat{\Gamma}^{-1/2}. \tag{2.34}$$
In the special case $\eta = 0$ (no serial correlation), $\Psi(\eta) = I_T$ and $C_T = [I_T - \tau P_T]$.
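For the special case $\eta = 0$, the claim that $C_T \Gamma C_T' = \sigma^2_{u_2} I_T$ can be verified numerically. The Python sketch below does this for one arbitrary choice of $T$ and the variances; it covers only the $\eta = 0$ case, and the helper names are illustrative assumptions.

```python
import math

def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Plain-list matrix transpose."""
    return [list(col) for col in zip(*A)]

def re_transform(T, sigma_a2, sigma_u2):
    """C_T = I_T - tau * P_T for eta = 0, with P_T = j_T j_T' / T."""
    tau = 1.0 - math.sqrt(sigma_u2 / (T * sigma_a2 + sigma_u2))
    return [[(1.0 if s == t else 0.0) - tau / T for t in range(T)]
            for s in range(T)]

T, sigma_a2, sigma_u2 = 4, 1.5, 0.8
# Gamma = sigma_a2 * j j' + sigma_u2 * I_T when eta = 0.
gamma = [[sigma_a2 + (sigma_u2 if s == t else 0.0) for t in range(T)]
         for s in range(T)]
C = re_transform(T, sigma_a2, sigma_u2)
V = matmul(matmul(C, gamma), transpose(C))   # should equal sigma_u2 * I_T
```

The resulting `V` has `sigma_u2` on the diagonal and zeros elsewhere, up to rounding, so the transformed residual is homoskedastic and uncorrelated across periods in this case.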
Now we can adjust equation (2.9) under the modified assumption that:
$$s^*_i = \theta_2 y_{1i0} + z_i\theta_3 + v^*_{2i}\theta_4 + a^*_{1i}, \tag{2.35}$$
where $a^*_{1i}|(y_{1i0}, z_i, v^*_{2i}) \sim \mathrm{Normal}(0, \sigma^{*2}_{a_1})$, and obtain:
$$y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + \theta_2 y_{1i0} + z_i\theta_3 + v^*_{2i}\theta_4 + \theta_1 v^*_{2it} + a^*_{1i} + e^*_{1it}, \quad t = 1, \ldots, T, \tag{2.36}$$
where $v^*_{2i} = (v^*_{2i1}, v^*_{2i2}, \ldots, v^*_{2iT})$. We estimate all parameters in the second stage using standard random effects Tobit software, based on the density of $(y_{1i1}, y_{1i2}, \ldots, y_{1iT})$ given $(y_{1i0} = y_{10}, z_i = z, v^*_{2i} = v^*_2)$:
$$\int_{\mathbb{R}} \left[\prod_{t=1}^{T} f_t(y_{1t}|y_{1,t-1}, y_{2t}, x_t, y_{10}, z, v^*_2, a^*_1; \lambda^*)\right] (1/\sigma^*_{a_1})\phi(a^*_1/\sigma^*_{a_1})\,da^*_1, \tag{2.37}$$
which has exactly the same structure as the standard random effects Tobit model, except that the explanatory variables at time period $t$ are:
$$w^*_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, v^*_{2i}, v^*_{2it}). \tag{2.38}$$
We can now propose an estimation procedure for the “correlated random effects” dynamic Tobit model with first-stage residual serial correlation correction.

2.5.1 Estimation Procedure
(i) Run a random effects linear regression with an AR(1) disturbance of $y_{2it}$ on $w_{2it}$ (with time dummies) and obtain the residuals $\hat{v}_{2it}$ and $\hat{v}_{2i}$. Obtain $\hat{C}_T$ and transform $\hat{v}_{2it}$ and $\hat{v}_{2i}$ into $\hat{v}^*_{2it}$ and $\hat{v}^*_{2i}$ using the transformation procedure above.
(ii) Estimate a random effects Tobit of $y_{1it}$ on $w^*_{it}$ to obtain the estimates of interest, $\hat{\lambda}^*$, where $w^*_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, v^*_{2i}, v^*_{2it})$.

2.5.2 Average Partial Effects
As the errors in the first stage are serially correlated, we also need to adjust the estimates of the APEs. Instead of equation (2.18), we start with:
$$E[m(w_{1t}\lambda_1 + \theta_2 y_{1i0} + z_i\theta_3 + v^*_{2i}\theta_4 + \theta_1 v^*_{2it} + a^*_{1i}, \sigma^{*2}_{e_1})], \quad t = 1, \ldots, T, \tag{2.39}$$
and, following the same argument as in the case with no serial correlation, we obtain the APEs with respect to $y_{1,t-1}$, $y_{2t}$, and $x_t$ by taking derivatives or differences of:
$$N^{-1}\sum_{i=1}^{N} m(w_{1t}\hat{\lambda}_1 + \hat{\theta}_2 y_{1i0} + z_i\hat{\theta}_3 + \hat{v}^*_{2i}\hat{\theta}_4 + \hat{\theta}_1\hat{v}^*_{2it}, \hat{\sigma}^{*2}_{a_1} + \hat{\sigma}^{*2}_{e_1}). \tag{2.40}$$

If the null hypothesis of no endogeneity and no serial correlation in the first stage is rejected, the standard errors in the second stage should be adjusted for the first-stage estimation using the delta method or bootstrapping. We also need asymptotic standard errors for the APEs. Appendix E shows how to obtain adjusted second-stage standard errors and asymptotic standard errors for the APEs using the delta method.

2.5.3 Comparison
We compare the method proposed in the previous section with the traditional linear model and with the model without serial correlation correction.
1. Linear Dynamic Model with an endogenous explanatory variable
We estimate model (2.1) using a system generalized method of moments (GMM) approach (Arellano & Bover (1995)) with both level and differenced instruments.
2. Correlated Random Effect model (without serial correlation correction)
We estimate model (2.1) with a correlated random effects model:
$$y^*_{1it} = \rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + \theta_2 y_{1i0} + z_i\theta_3 + v_{2i}\theta_4 + \theta_1 v_{2it} + a_{1i} + e_{1it}, \quad t = 1, \ldots, T.$$
Estimation Procedure:
(i) Estimate the reduced form for $y_{2it}$ by pooled OLS of $y_{2it}$ on $z_{it}$, $\bar{z}_i$ and time dummies. Obtain the residuals, $\hat{v}_{2i}$ and $\hat{v}_{2it}$.
(ii) Estimate a random effects Tobit of $y_{1it}$ on $w_{it}$ to obtain the estimates of interest, $\hat{\lambda}$, where $w_{it} = (y_{1i,t-1}, y_{2it}, x_{it}, y_{1i0}, z_i, v_{2i}, v_{2it})$, using the notation introduced in the previous section.
Using Mundlak’s simpler version of Chamberlain’s device (1980), we can use the time average $\bar{v}_{2i}$ instead of the full vector $v_{2i}$ in the estimating equations above. Mundlak’s version conserves degrees of freedom, which is especially important when $T$ is large in a dynamic model.

2.6 Empirical Example
The estimation procedure described above can be used in many applications. Here we apply it to US female labor supply. In panel data, working hours exhibit dynamic behavior, and the estimated persistence may be contaminated by heterogeneity, endogeneity and serial correlation. According to Heckman and MaCurdy’s labor supply model, a censored model is appropriate. The challenge is to estimate a dynamic censored model with an endogenous variable in addition to the lagged dependent variable, and the previous sections developed a device for doing so.
Experience is potentially endogenous for two reasons. First, experience is correlated with ability. Second, experience is constructed from working hours, so exogenous shocks to past working hours (through wages) are correlated with the years of experience observed today. Therefore, experience cannot be viewed as strictly exogenous even after conditioning on unobserved heterogeneity.
Controlling for unobserved effects does not guarantee unbiased estimates of state dependence, for two reasons. First, women with high average lifetime hours of work, and thus high $x_{it}$, may have become permanent workers because their early experience in the market demonstrated to them the need for continuous hours of work to build and maintain their human capital investment. The result is that "human capital acquired through work experience raises the future probability of participation" (see Heckman & Willis (1974), initial definition of state dependence). In this case, the estimated state dependence parameter is biased downward towards 0, because state dependence operates entirely through a high lifetime $c_i$. State dependence is the coefficient on lagged hours: a positive coefficient on lagged hours implies that past hours have a positive impact on future hours. If $c_i$ is omitted from the regression, the coefficient on lagged hours will be biased upward by omitted variable bias, and the importance of state dependence will therefore be overestimated.
The second problem is that state dependence cannot be separated from serially correlated errors. In other words, if shocks to hours are correlated over time, they will be picked up by the lagged hours variable, and the estimate of state dependence will be biased. Hyslop (1999) found that transitory errors are negatively correlated over time, which suggests that failing to control for serially correlated transitory errors leads to underestimation of state dependence.

2.6.1 Data
One application of the model introduced above is the study of the dynamics of female labor supply. We use data from the Panel Study of Income Dynamics (PSID) for the years 1980-1992. We focus on 864 white women aged 18 to 65 who were either heads of households or spouses. Women who are self-employed, in the army, or agricultural workers are excluded. Observations with inconsistent or missing data are dropped. More specifically, a person is dropped if any of the following happened in at least one year between 1976 and 1992: self-reported age exceeded the age constructed from the year of birth by more than two years, or was smaller than the constructed age by more than one year; the person was less than 18 or more than 65 years old; the person had missing experience; the person’s age exceeded her/his experience by less than six years; the spouse’s weeks of unemployment were missing; the person reported positive work hours and zero earnings; the change in years of schooling between 1976 and 1985 was negative and exceeded one year in absolute value. When the reported decrease in years of schooling was one year, the minimum of the two reported values was assigned in all periods. The final sample consists of 11,232 observations.
The dependent variable in the structural equation (2.1), $y_{1it}$, is female annual working hours. The vector of explanatory variables includes the lagged dependent variable ($y_{1i,t-1}$), an endogenous variable, experience ($y_{2it}$), and a set of exogenous variables ($x_{it}$): education (measured in years of schooling), the number of children in three age categories (0-2, 3-5, and 6-17), marital status, husband’s employment status, and non-wife income. Experience is constructed by taking the information on prior experience from the 1976 survey year, or from the year when the individual first entered the sample, and then updating it annually. In each year, experience increases by one if annual work hours were 2000 or more, and by the number of hours worked divided by 2000 if annual work hours were less than 2000. Education is considered strictly exogenous conditional on the unobserved effect, while experience is considered endogenous.
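The updating rule for experience can be written as a small function. This is a sketch of the rule as described above, with hypothetical function names and made-up hours; the exact timing convention (whether a year's hours count toward that year's experience) is an assumption here.

```python
def update_experience(experience, annual_hours, full_year_hours=2000.0):
    """Add one year for 2000+ hours, otherwise the fraction hours / 2000."""
    if annual_hours >= full_year_hours:
        return experience + 1.0
    return experience + annual_hours / full_year_hours

def experience_path(initial_experience, hours_by_year):
    """Apply the annual update to a sequence of annual hours."""
    path = []
    current = initial_experience
    for hours in hours_by_year:
        current = update_experience(current, hours)
        path.append(current)
    return path

# A woman with 3 years of prior experience who works 2100, 1000 and 0 hours.
path = experience_path(3.0, [2100.0, 1000.0, 0.0])
```

In this example the path is 4.0, 4.5 and 4.5 years: a full year is credited for 2100 hours, half a year for 1000 hours, and nothing for a year without work.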
The set of instruments, $z_{it}$, contains years of schooling, age and its square, an indicator of marital status, the number of children in the three age categories, husband’s employment status, and non-wife income, together with their time averages and time dummies. Table B.1 reports summary statistics for all variables used in the analysis.
Figure B.1 shows the distribution of women’s working hours during the period 1980-1992. Around 27 percent of women did not work at the time of the survey. On average, women work 1124 hours per year, which is about 21 hours per week (including women who do not work). The next largest group consists of women who work 2000 hours per year, equivalent to 40 hours per week, accounting for 12 percent. The pile-up at zero hours and at 2000 hours suggests that hours of work are sensitive to changes in the structure of both observed and unobserved individual heterogeneity. Figures B.2-B.5 illustrate the relationship between women’s working hours and their experience and number of children in each of the three age groups, respectively. All of these relationships fit our prior expectations.
In our sample, 2,978 women worked zero hours, as opposed to 8,053 women who worked for a wage during the year with positive hours ranging from 2 to 5,168. Hence, annual hours is a reasonable candidate for a Tobit model.

2.6.2 Estimation and Result
We are interested in estimating the dynamic Tobit model of working hours for woman $i$ at time $t$:
$$\mathrm{Hours}_{it} = \max(0, \rho\,\mathrm{Hours}_{i,t-1} + \alpha\,\mathrm{Experience}_{it} + x_{it}\beta + c_{1i} + u_{1it}), \tag{2.41}$$
where $\mathrm{Hours}_{it}$ is annual working hours for woman $i$ at time $t$, determined by her annual working hours in the previous period, $\mathrm{Hours}_{i,t-1}$, her experience, $\mathrm{Experience}_{it}$, and a vector of her characteristics including age, education, number of children, marital status and her husband’s characteristics. The lagged dependent variable captures the dynamics of working hours, in the sense that current working hours may depend on past working hours, all else held constant. This dependence arises from factors such as the accumulation of skills from past work. We are interested in the coefficients on $\mathrm{Hours}_{i,t-1}$ and $\mathrm{Experience}_{it}$. The coefficient on $\mathrm{Hours}_{i,t-1}$ sheds light on the persistence of US female labor supply over the period 1980-1992.
As women’s experience is considered endogenous in this model, we instrument it with age and its square, because experience and age are significantly positively correlated and age is strictly exogenous in the structural equation. The first-stage regression estimates and their statistics are reported in Table B.2. The instruments are jointly significant for experience, with an F-statistic of 196.26.
We first test for the endogeneity of experience using the Hausman (1978) test. Because experience and working hours are simultaneously determined, the exogeneity of experience has to be tested. A test for endogeneity of $y_{2it}$ is obtained by adding the first-stage residuals to the second-stage estimation and computing the t-statistic on $v^*_{2it}$. Table B.3 shows that $v_{2it}$ and $v^*_{2it}$ are significant, so the hypothesis that experience is exogenous is rejected.
Table B.4 reports estimation results (average partial effects) using the correlated random effects approach with and without serial correlation correction. The estimation result for the dynamic linear model using GMM is also shown for comparison. Since the results in Table B.4 are consistent with those in Table B.3, the discussion below focuses on Table B.3. In all models (columns (1)-(3)), the coefficients on lagged working hours are significant and positive, suggesting positive state dependence of labor supply for women. The positive sign shows that women are likely to continue working if they already work, and to remain out of work if they do not. The decline in the coefficient on lagged working hours from model (1) to model (3) reflects the upward bias in state dependence that arises when censoring, unobserved heterogeneity and serial correlation of unobserved factors are not taken into account. Unobserved heterogeneity that is correlated with women’s characteristics contributes most to this upward bias, followed by ignoring the mass of zero working hours, and lastly by serial correlation of unobserved factors.
In columns (1) to (3), experience generally has a positive influence on working hours. The magnitude is larger when we control for serial correlation. This indicates that the more experience women accumulate through continuous work, the more hours they work.
Compared with columns (1) and (2), column (3) controls for an additional source of serial correlation in experience (the transitory shock) besides unobserved heterogeneity (the permanent shock). The coefficient on experience is considerably larger and its standard error is smaller. The intuition is as follows. Consider a positive transitory shock to experience. With a high degree of positive serial correlation, a rise in experience in the first period is followed by a further rise in the next period, and experience becomes very large over a long time period. This explains why the coefficient on experience is higher than in (1) and (2). Moreover, although the correction for serial correlation in (3) is made in the first stage, the CRESC estimates are more efficient than the CRE estimates, as shown by the smaller standard error on experience relative to (1) and (2).
Other explanatory variables (for example, number of children) might be affected when CRESC is used. The lagged dependent variable acts as a proxy for the dependent variable, and in the presence of serial correlation in the endogenous variable, the larger effect of experience may pick up some of the effects of unmeasured variables as well as observed covariates. As a result, the coefficients on the lagged dependent variable and children (as well as mother’s education) are reduced, and the significance of children might change. Even though a linear model does not require any assumption on the serial correlation of experience, the coefficient on children is more appealing in the nonlinear model estimated by CRESC. We also note that children is allowed to be correlated with heterogeneity but not with the shocks to labor supply. This assumption therefore does not conflict with the endogeneity assumption in Chapter 1, where we work with a cross section and allow correlation between children and heterogeneous preferences. In this chapter, we treat children as exogenous with respect to shocks rather than with respect to heterogeneity.
The coefficients on children aged 0-2 and 3-5 indicate that small children have statistically significant negative effects on mothers’ working hours. There is evidence that children aged 6 to 17 do not negatively affect women’s working hours; the corresponding estimates are not significant in models (2) and (3).
The initial value of working hours captures the correlation between the unobserved effect and the initial condition. The coefficient on the initial value of working hours is statistically significant in both models (2) and (3). It suggests strong state dependence of labor supply for women over a long period.

2.7 Conclusion
In this chapter, an attractive and easy-to-compute method for estimating dynamic Tobit panel data models with endogenous regressors (in addition to the lagged dependent variable) is proposed. The approach requires less computational effort than Heckman’s technique and yields APEs directly. It also has several advantages; for example, we can specify a flexible conditional distribution of the initial condition instead of relying on an approximation, which is computationally difficult. As a consequence, estimates are readily computed and partial effects are easily determined. In addition, the control function approach is used to control for endogeneity that does not come from the lagged dependent variable. This approach allows for correlation between the unobserved effect and the regressors, as well as between the regressors and the structural error. To handle the serial correlation that arises in the presence of heterogeneity, a correction procedure is added and a serially uncorrelated first-stage residual is obtained.
The method proposed in this chapter provides a useful tool for applied economic research. It can be applied to various economic applications, such as the estimation of labor supply models, housing expenditure models, or children’s educational expenditure models. The proposed estimation procedure is applied to Panel Study of Income Dynamics data from 1980 to 1992. Based on the estimation results, I find strong evidence of persistence in the working hours of US white women after controlling for censoring, endogeneity and serial correlation. I also find that the initial condition of female labor supply is statistically significant and has a positive impact on women’s working history. This suggests that the current labor supply of US women is affected by their past labor supply and by their initial condition of labor supply.


Chapter 3
AN EXPONENTIAL TYPE II TOBIT PANEL DATA MODEL WITH BINARY
ENDOGENOUS REGRESSOR - APPLICATION TO ESTIMATING THE EFFECT OF
FERTILITY ON MOTHERS’ LABOR FORCE PARTICIPATION AND LABOR SUPPLY

3.1 Introduction
There has been growing interest in the estimation of nonlinear panel data models with discrete endogenous variables. Most studies focus on binary response or count models with an endogenous dummy variable. However, no method has been suggested for a panel data model with a corner solution response. Moreover, the probability of a positive outcome is correlated with the outcome itself, and heterogeneity is also present in the model. Therefore, the goal of this chapter is to develop a panel data estimation method for a model with a corner solution response and a binary endogenous variable in the presence of heterogeneity and this correlation.
Many approaches have been proposed to handle switching endogeneity in models with limited dependent variables. In a limited dependent variable panel data model, the main difficulty lies in the nonlinear functional form, which means the unobserved effect cannot be differenced away. Full-information maximum likelihood can be used, but it is computationally intensive, which makes it unattractive. Semiparametric or nonparametric estimators rely on weaker distributional assumptions; nevertheless, they deliver scaled index coefficients rather than average partial effects. The simplest approach is 2SLS; however, this method ignores nonlinearity in both the first and second stages. It might provide a good approximation, but its two assumptions, that the binary endogenous variable is a linear function and that the binary or censored dependent variable is a linear function of the binary endogenous variable, are unrealistic. In particular, this approach ignores the distribution of a censored variable with a large pile-up of zeros. The control function approach handles endogeneity while allowing nonlinearity in the second stage, but it still imposes linearity in the first stage. In this chapter, I propose a simple two-step estimator that keeps nonlinearity in both the first and second stages and is computationally more attractive than full-information maximum likelihood.
The model and estimation can be used in various economic applications. For example, we can
apply it to study the effects of union status on labor market outcomes, the effects of childbearing
on women’s labor supply and many studies on health economics, business or epidemiology where
binary endogeneity and the corner solution response occur. There are enormous studies on the
effect of fertility on women’s labor force participation (LFP) and labor supply. It is important to
understand how the childbearing decision affects female participation in the labor force and how
much she will work in a system of related equations. In this chapter, I will consider the fertility
decision an endogenous dummy variable that inﬂuences both women’s LFP and hours of work.
The labor supply equation is the amount equation with a corner solution response while the LFP
equation is the so-called participation equation. Using this system of equations, we can correct for
both corner solution and endogenous problems in the study of women’s labor supply.
The contribution of this chapter is to propose a simple two-step estimator which is robust and
can be easily implemented for a Tobit panel data model in the presence of discrete endogeneity and
heterogeneity. The main estimation strategy is to add correction terms so that the endogeneity and
corner solution bias will be removed. This approach allows a joint distribution of the endogenous
dummy regressor and the unobserved factors that affect both the amount and participation equations. I propose a two-step estimation method in which the ﬁrst stage exploits a bivariate probit
model for the relationship between the dummy endogenous variable and the participation decision.
For the amount equation, by using an Exponential Type II Tobit (ET2T) model (see Wooldridge (2010, chapter 17) for more on Type II
Tobit models), we can ensure that the predicted value of hours
is positive and allow a correlation between the unobserved effects in the amount and participation equations. In addition, exclusion restrictions are used in order to identify the parameters in the
structural equation. In other words, we allow some variables in the participation equation which
are not determinants in the amount equation. Explanatory variables are permitted to be correlated
with the heterogeneity. Finally, on the empirical side, the chapter also contributes to the study of the effect
of having a newborn on women’s LFP and labor supply, taking into account the culture
and characteristics specific to these women, using Vietnamese household data from recent years.
This chapter is organized as follows: The second section reviews approaches to the estimation of a model with a binary endogenous explanatory variable and a limited dependent variable.
It also discusses the literature on the effect of fertility on female labor supply and female labor
participation. The third section develops a model for Tobit panel data with a dummy endogenous
regressor in the presence of correlated participation and heterogeneity. An estimation procedure is
proposed and average treatment effects are obtained. The next section gives an overview of data
and estimation results for an empirical example. The last section summarizes and concludes.

3.2 Literature Review
There have been many studies on limited dependent variable models with a dummy endogenous
variable. These models were pioneered by Heckman (1978a), using joint normality
assumptions and the maximum likelihood (ML) method. Many other works use the conditional ML
framework such as Amemiya (1978, 1979); Newey (1986, 1987); Blundell & Smith (1989) but with
different procedures: generalized least squares (GLS) estimators, minimum Chi-squared estimators
or two-step estimators. The disadvantage of this canonical method is that it is hard to implement
and very computationally expensive.
In a panel data framework, most papers assume a reduced form for an endogenous variable
or use a control function approach with generalized residuals (Vella & Verbeek (1999); Labeaga
(1999)). Many studies also use this approach for cross-sectional cases (Vella (1993); Smith &
Blundell (1986); Rivers & Vuong (1988)). Even though this approach produces consistent estimators, it would be unrealistic to assume a linear function for a dummy variable.
In order to avoid distributional assumptions in traditional ML framework as in Heckman (1978a),
some studies have proposed nonparametric or semiparametric estimators (Newey (1985); Lee
(1996); Vytlacil (2002); Vytlacil & Yildiz (2007)). However, these estimators are quite difﬁcult to
implement in the case where both the corner solution and binary endogeneity occur. Moreover, in
a panel data framework, the semiparametric ﬁxed-effect approach cannot identify average partial
effects.
Angrist (2001) discussed alternative methods for estimating models with dummy endogenous variables (including 2SLS, IV for an exponential conditional mean, minimum mean squared error
approximation, and the quantile treatment effects approach). He prefers an IV estimation strategy (similar to Mullahy (1997) and Abadie (2000)) for nonlinear models with covariates and a nonstructural
approach, since it gives similar average treatment effects. However, he did not give any evidence
against using the structural approach.
In this chapter, I focus on the simple two-step estimation method (Terza (1998); Kim (2006))
since our model has both corner solution and binary endogeneity problems. It would be attractive
to use this method that incorporates a method similar to Heckman (1979) to correct for sample
selection. In addition, in our panel data framework, we would like to use correlated random effects
to handle heterogeneity in the presence of endogeneity and correlated participation (similar to
Semykina & Wooldridge (2010)). Both IV strategies and the bivariate probit method are utilized
to handle the binary endogeneity.
The proposed method is very applicable in many economic models since switching endogeneity is of interest to many applied economists and policy makers. One interesting application is
estimating the effect of having a newborn on women’s labor supply in the presence of a corner solution response and unobserved heterogeneity. Hence, the following part will consist of a literature
review on the relationship between fertility and female labor supply.
A remarkable number of studies have examined the effect of fertility on female labor supply
and labor force participation. These studies can be divided into four major groups, depending on
how they handle the endogeneity problem of the fertility decision. The first group is represented
by the studies of Gronau (1973), Heckman (1974), and Heckman & Willis (1977) who assumed
exogenous fertility and established a strong negative correlation between female labor supply and
fertility. However, as Browning (1992) commented, very few credible inferences can be drawn
from them even though we have a number of robust correlations. Their main methodology is to
use OLS to estimate the effects of fertility on labor supply.
A second group of studies led by Cain & Dooley (1976), Schultz (1978), and Fleisher & Rhodes
(1979) acknowledged endogenous fertility. They handled the endogeneity problem by estimating
simultaneous equations models. Smaller estimates on fertility are found when treating it as an endogenous variable than when treating it as an exogenous variable. The problem with this approach
is that it is hard to ﬁnd plausible exclusion restrictions that could identify the underlying structural
parameters.
A third group of studies, pioneered by the work of Nakamura & Nakamura (1992), added the
lagged dependent variable (i.e. hours of work) to control for unobserved heterogeneity across
women. This approach has been used subsequently by a number of authors (Even (1987); Lehrer
(1992)). Although adding the lagged dependent variable can help control for unobserved heterogeneity, it still does not address the problem of the endogeneity of the fertility decision.
Last but not least, a fourth group of studies solved the endogeneity problem of fertility by
exploiting exogenous sources of variation in family size. Rosenzweig & Wolpin (1980) ﬁrst used
this strategy by comparing the labor supply of women who had twins at their ﬁrst birth with that of
women who had a single child. Then Bronars & Grogger (2001); Jacobsen et al. (1999) used the
same strategy but managed to obtain more precise estimates. Other studies (Bloom et al. (2009);
Kim & Aassve (2006)) exploit abortion legislation or the contraceptive choice of couples as an
IV for fertility. In the same spirit as the twins studies mentioned above, Angrist & Evans (1998)
estimated the effect of a third or higher order child on female labor supply by exploiting the fact
that parents typically prefer mixed-sex siblings. For a sample of couples with at least two children,
they instrumented further childbearing (i.e. having more than two children) with a dummy variable
for whether the sex of the second child matched the sex of the ﬁrst. Because sex mix is virtually
random, this strategy allows for identiﬁcation of the effect of a third or higher order child.
Nguyen (2010) emphasized the significant negative impact of the number of children on female
labor supply. The paper found diminishing impacts of having children on female labor supply and
the ﬁrst child always has the largest adverse effect on a mother’s labor supply. This implies that
children do not have equal impacts on a mother’s labor supply. This finding is similar to the idea
from Browning (1992) that having a newborn has a more significant impact on a mother’s labor
supply than the total number of children. However, the paper does not view the problem
in terms of a two-part model that acknowledges the fact that only people who decide to work have
positive working hours. This chapter considers female labor force participation
jointly with female labor supply, and the impact of having a newborn on both a mother’s amount
and participation decisions, which again requires handling the discrete endogeneity of having children.

3.3 Model and Estimation
I consider a panel data model with a corner solution response and a binary endogenous variable in
the presence of correlated participation decision and heterogeneity as follows:
\[
y_{1it} = y_{2it}\exp(X_{1it}\beta_1 + y_{3it}\alpha_1 + c_{1i} + u_{1it}), \tag{3.1}
\]
or
\[
\log(y_{1it}) = X_{1it}\beta_1 + y_{3it}\alpha_1 + c_{1i} + u_{1it} \quad \text{if } y_{1it} > 0 \text{ or } y_{2it} = 1 \ (\text{iff } y^*_{2it} > 0),
\]
\[
y^*_{2it} = X_{2it}\beta_2 + y_{3it}\alpha_2 + c_{2i} + u_{2it}, \tag{3.2}
\]
\[
y^*_{3it} = X_{3it}\beta_3 + c_{3i} + u_{3it}, \tag{3.3}
\]
\[
y_{2it} = 1[y^*_{2it} > 0], \tag{3.4}
\]
\[
y_{3it} = 1[y^*_{3it} > 0], \tag{3.5}
\]
where $y_{1it}$ is continuous with strictly positive values when $y^*_{2it} > 0$ and equal to zero when $y^*_{2it} < 0$
with positive probability, hereafter $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. We assume that we observe
$y_{1it}$ only when $y^*_{2it} > 0$ or $y_{2it} = 1$. $X_{mit}$ are $1 \times K_m$ vectors of exogenous explanatory variables
(for $m = 1, 2, 3$) which can contain a constant term. $\beta_m$ are $K_m \times 1$ vectors of parameters. $c_{mi}$
are time-constant unobserved heterogeneity and $u_{mit}$ are idiosyncratic errors. $\alpha_1$ and $\alpha_2$ are scalar
parameters. $1[\cdot]$ is an indicator function which has a value of one when the expression inside the
bracket is true, and a value of zero otherwise. Both $y_{2it}$ and $y_{3it}$ are dummy variables. We assume
a balanced panel for simplicity, so $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$ is assumed throughout this
chapter. A novel feature of this panel data model is that the common endogenous variable $y_{3it}$
appears in both the amount and participation equations, (3.1) and (3.2). We therefore need to
handle both endogeneity and the corner solution problem in equation (3.1).
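
To make the structure of equations (3.1)-(3.5) concrete, the following is a minimal simulation sketch in Python. All numerical values (coefficients, error loadings, sample sizes) and the function name `simulate_panel` are illustrative assumptions, not quantities from this chapter, and the error variances are not normalized to one as they are in the estimation section.

```python
import math
import random

def simulate_panel(n_units=200, n_periods=3, alpha1=-0.3, alpha2=-0.5, seed=0):
    """Draw (y1, y2, y3) following the structure of (3.1)-(3.5), one regressor per role."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        # Time-constant heterogeneity c_{1i}, c_{2i}, c_{3i} (independent here for simplicity).
        c1, c2, c3 = (rng.gauss(0, 0.5) for _ in range(3))
        for t in range(n_periods):
            x = rng.gauss(0, 1)    # exogenous regressor common to all equations
            z2 = rng.gauss(0, 1)   # appears in participation only (exclusion restriction)
            z3 = rng.gauss(0, 1)   # instrument, appears in (3.3) only
            shock = rng.gauss(0, 1)            # shared shock linking u1 and u2
            u3 = rng.gauss(0, 1)
            u2 = -0.3 * u3 + 0.5 * shock + rng.gauss(0, 1)   # correlated with u3
            u1 = 0.4 * shock + rng.gauss(0, 0.5)             # correlated with u2
            y3 = 1 if 0.2 * x + 0.8 * z3 + c3 + u3 > 0 else 0                      # (3.3), (3.5)
            y2 = 1 if 0.5 + 0.3 * x + 0.6 * z2 + alpha2 * y3 + c2 + u2 > 0 else 0  # (3.2), (3.4)
            y1 = y2 * math.exp(1.0 + 0.2 * x + alpha1 * y3 + c1 + u1)              # (3.1)
            rows.append({"i": i, "t": t, "y1": y1, "y2": y2, "y3": y3})
    return rows
```

By construction, the outcome is strictly positive exactly when the participation indicator equals one, which is the corner solution feature the estimator has to handle.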
Following the work of Heckman et al. (1999) and Heckman (1979), $\alpha_1$ and $\alpha_2$ are identified if
$X_{3i}$ includes at least one variable which is excluded from $X_{2i}$ or $X_{1i}$, provided that the joint distribution
of the error terms is correctly specified. Such a variable is usually referred to as an instrumental variable. $X_{2i}$ should also include at least one variable which is not in $X_{1i}$. These instrumental variables
are assumed to be strictly exogenous conditional on unobserved heterogeneity. With that in mind, and
using the modeling device in Mundlak (1978), we can model the relationship between the unobserved
effects $c_{mi}$ and $X_{mit}$ for each $m$. Let us rewrite equations (3.1)-(3.3) as follows:
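\[
y_{1it} = y_{2it}\exp(X_{1it}\beta_1 + y_{3it}\alpha_1 + c_{1i} + u_{1it}), \tag{3.6}
\]
\[
y^*_{2it} = X_{1it}\beta_{21} + X_{22it}\beta_{22} + y_{3it}\alpha_2 + c_{2i} + u_{2it}, \tag{3.7}
\]
\[
y^*_{3it} = X_{1it}\beta_{31} + X_{32it}\beta_{32} + c_{3i} + u_{3it}, \tag{3.8}
\]
where $X_{32it}$ and $X_{22it}$ are instrumental variables.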
Now we assume that:
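\[
c_{mi} = \bar{Z}_i\delta_m + a_{mi}, \quad m = 1, 2, 3, \tag{3.9}
\]
where $a_{mi} \mid X_{mi} \sim \mathrm{Normal}(0, \sigma^2_{a_m})$; $\bar{Z}_i = T^{-1}\sum_{t=1}^{T} Z_{it}$; $Z_{it}$ contains the explanatory variables $X_{1it}$,
$X_{32it}$, and $X_{22it}$. $\bar{Z}_i$ is a $1 \times L$ vector where $L = K_2 + K_3 - K_1$.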
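
The time averages in (3.9) are simple to construct in practice. The sketch below, in plain Python, appends the unit-level average of each chosen variable to every row of a long-format panel. The function name `add_time_averages`, the `_bar` suffix, and the toy data are illustrative assumptions.

```python
from collections import defaultdict

def add_time_averages(rows, unit_key, variables):
    """Append the unit-level time average of each variable (the Mundlak terms) to every row."""
    sums = defaultdict(lambda: [0.0] * len(variables))
    counts = defaultdict(int)
    for row in rows:
        counts[row[unit_key]] += 1
        for k, name in enumerate(variables):
            sums[row[unit_key]][k] += row[name]
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for k, name in enumerate(variables):
            new_row[name + "_bar"] = sums[row[unit_key]][k] / counts[row[unit_key]]
        out.append(new_row)
    return out

# Toy panel with two units observed in two periods each.
panel = [
    {"i": 1, "t": 1, "x": 1.0}, {"i": 1, "t": 2, "x": 3.0},
    {"i": 2, "t": 1, "x": 4.0}, {"i": 2, "t": 2, "x": 6.0},
]
augmented = add_time_averages(panel, "i", ["x"])
```

Each augmented row then carries both the period value and the time average, which together form the regressor vectors used after the Mundlak substitution.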
And now we can rewrite equations (3.1)-(3.3) as:
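\[
\log(y_{1it}) = W_{1it}\gamma_1 + y_{3it}\alpha_1 + v_{1it} \quad \text{if } y_{1it} > 0 \text{ or } y_{2it} = 1; \quad W_{1it} \equiv (X_{1it}, \bar{Z}_i), \tag{3.10}
\]
\[
y^*_{2it} = W_{2it}\gamma_2 + y_{3it}\alpha_2 + v_{2it}; \quad W_{2it} = (X_{2it}, \bar{Z}_i), \tag{3.11}
\]
\[
y^*_{3it} = W_{3it}\gamma_3 + v_{3it}; \quad W_{3it} = (X_{3it}, \bar{Z}_i), \tag{3.12}
\]
where $v_{mit} = a_{mi} + u_{mit}$; $m = 1, 2, 3$.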
As discussed in Wooldridge (2010, section 17.6.3), we model the corner solution using the
ET2T model. First, this ensures that the predicted value of the response variable is positive.
Second, it allows a correlation between the unobserved factors that affect the
amount equation and the unobserved factors that affect the participation equation, that is, $v_{1it}$ and $v_{2it}$
are correlated. This relaxes the independence assumption of the usual lognormal hurdle
model. Moreover, it is a reasonable assumption in empirical work. For example, in a model
of married women’s labor supply, unobserved factors can influence both women’s LFP and labor
supply, or the unobserved effects determining the two decisions can be related. Therefore, we can assume
that:
\[
E(v_{1it} \mid W_{1i}, v_{2it}) = \eta\, v_{2it}, \tag{3.13}
\]
in addition to $\mathrm{Var}(v_{2it}) = 1$; $\mathrm{Var}(v_{1it}) = \sigma^2$; $\mathrm{Cov}(v_{1it}, v_{2it}) = \psi\sigma = \eta$, where $\psi$ is the correlation
between $v_{1it}$ and $v_{2it}$.
We are interested in deriving $E(\log(y_{1it}) \mid W_i, y_{3it}, y^*_{2it} > 0)$:
\[
E(\log(y_{1it}) \mid W_i, y_{3it}, y^*_{2it} > 0) = W_{1it}\gamma_1 + y_{3it}\alpha_1 + E(v_{1it} \mid W_i, y_{3it}, y^*_{2it} > 0). \tag{3.14}
\]
Now,
\[
E(v_{1it} \mid W_i, y_{3it}, y^*_{2it} > 0) = y_{3it}\, E(v_{1it} \mid W_i, y^*_{3it} > 0, y^*_{2it} > 0) + (1 - y_{3it})\, E(v_{1it} \mid W_i, y^*_{3it} < 0, y^*_{2it} > 0),
\]
or
\[
E(v_{1it} \mid W_i, y_{3it}, y^*_{2it} > 0) = y_{3it} E_1 + (1 - y_{3it}) E_0.
\]
We will derive $E_1$ first and apply a similar strategy for $E_0$.
\[
E_1 = E(v_{1it} \mid W_i, y^*_{3it} > 0, y^*_{2it} > 0) = \eta_1 E(v_{2it} \mid W_i, y_{3it} = 1, y_{2it} = 1), \tag{3.15}
\]
\[
E_1 = \eta_{12} E_{12} + \eta_{13} E_{13}, \tag{3.16}
\]

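where
\[
E_{12} = \phi(W_{3it}\gamma_3)\,\Phi(W_{2it}\gamma_2 + \alpha_2 - \rho W_{3it}\gamma_3)(1 - \rho^2)^{-1/2}\,\Phi_2^{-1}(W_{3it}\gamma_3, W_{2it}\gamma_2 + \alpha_2; \rho), \tag{3.17}
\]
and
\[
E_{13} = \phi(W_{2it}\gamma_2 + \alpha_2)\,\Phi(W_{3it}\gamma_3 - \rho(W_{2it}\gamma_2 + \alpha_2))(1 - \rho^2)^{-1/2}\,\Phi_2^{-1}(W_{3it}\gamma_3, W_{2it}\gamma_2 + \alpha_2; \rho), \tag{3.18}
\]
under the assumption that $\mathrm{Cov}(v_{2it}, v_{3it}) = \rho$ and $\mathrm{Var}(v_{3it}) = 1$; we can also write $v_{2it} = \rho v_{3it} + e_{it}$,
where $e_{it} \mid Z_i, v_{3it} \sim \mathrm{Normal}(0, \sigma_e^2)$.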

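\[
E_0 = \eta_{02} E_{02} + \eta_{03} E_{03}, \tag{3.19}
\]
where
\[
E_{02} = \phi(-W_{3it}\gamma_3)\,\Phi(W_{2it}\gamma_2 - \rho W_{3it}\gamma_3)(1 - \rho^2)^{-1/2}\,\Phi_2^{-1}(-W_{3it}\gamma_3, W_{2it}\gamma_2; -\rho), \tag{3.20}
\]
and
\[
E_{03} = \phi(W_{2it}\gamma_2)\,\Phi(-W_{3it}\gamma_3 + \rho W_{2it}\gamma_2)(1 - \rho^2)^{-1/2}\,\Phi_2^{-1}(-W_{3it}\gamma_3, W_{2it}\gamma_2; -\rho). \tag{3.21}
\]
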
Using these two regimes of $y_{3it}$ and correlated participation, we can handle both the endogenous
switching and corner solution problems. Instead of proceeding with full information maximum
likelihood, we estimate the parameters of interest using a two-step procedure based on
Heckman’s idea of correcting a selection problem with correction functions (the
inverse Mills ratio in Heckman’s model). For each regime, corresponding to either $y_{3it} = 0$ or
$y_{3it} = 1$, we add two correction terms: one part fixes the endogeneity problem
and the other corrects the bias from correlated unobserved effects in the participation equation.
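
As an illustration of how the four correction terms in (3.17)-(3.21) could be evaluated at given index values, the sketch below uses only the Python standard library. It rests on two assumptions that are not stated in the text: the factor $(1-\rho^2)^{-1/2}$ is read as scaling the argument of $\Phi$, as in the standard moments of a truncated bivariate normal, and $\Phi_2$ is approximated by one-dimensional Simpson quadrature. The names `bvn_cdf` and `correction_terms` are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi

def bvn_cdf(h, k, rho, n=400):
    """Approximate Phi_2(h, k; rho) by Simpson's rule on phi(x) * Phi((k - rho*x)/s)."""
    lower = -8.0                      # probability mass below -8 is negligible
    if h <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)    # requires |rho| < 1
    step = (h - lower) / n            # n must be even for Simpson's rule

    def f(x):
        return STD.pdf(x) * STD.cdf((k - rho * x) / s)

    total = f(lower) + f(h)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(lower + j * step)
    return total * step / 3.0

def correction_terms(a3, a2, alpha2, rho):
    """Return (E12, E13, E02, E03) for a3 = W3*gamma3 and a2 = W2*gamma2."""
    s = math.sqrt(1.0 - rho * rho)
    b1 = a2 + alpha2                  # participation index when y3 = 1
    p11 = bvn_cdf(a3, b1, rho)        # Phi_2(W3g3, W2g2 + alpha2; rho)
    p01 = bvn_cdf(-a3, a2, -rho)      # Phi_2(-W3g3, W2g2; -rho)
    e12 = STD.pdf(a3) * STD.cdf((b1 - rho * a3) / s) / p11
    e13 = STD.pdf(b1) * STD.cdf((a3 - rho * b1) / s) / p11
    e02 = STD.pdf(-a3) * STD.cdf((a2 - rho * a3) / s) / p01
    e03 = STD.pdf(a2) * STD.cdf((-a3 + rho * a2) / s) / p01
    return e12, e13, e02, e03
```

Under this reading, setting $\rho = 0$ collapses each term to a univariate inverse Mills ratio, which is a convenient sanity check on any implementation.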
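The conditional mean of interest with a positive outcome, $y_{2it} = 1$, is:
\[
\begin{aligned}
E(\log(y_{1it}) \mid W_i, y_{3it}, y^*_{2it} > 0) ={}& W_{1it}\gamma_1 + y_{3it}\alpha_1 + \eta_{12}\, y_{3it} E_{12}(\theta_1) + \eta_{13}\, y_{3it} E_{13}(\theta_1) \\
& + (1 - y_{3it})\eta_{02} E_{02}(\theta_1) + \eta_{03}(1 - y_{3it}) E_{03}(\theta_1),
\end{aligned} \tag{3.22}
\]
where $W_i = W_{1i} \cup W_{2i} \cup W_{3i}$ and the four correction terms $E_{12}$, $E_{13}$, $E_{02}$ and $E_{03}$ are as stated above.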
We can identify $\theta_1 = (\alpha_2, \gamma_3', \gamma_2', \rho)'$, a $Q \times 1$ vector ($Q = 1 + K_2 + K_3 + 2L + T$), using maximum likelihood estimation of a pooled bivariate probit model in the first stage. Similar to Heckman
(1978a), Greene (1997), and Carrasco (2001), the log likelihood function in the first stage that
solves for estimates of $\theta_1$ is:

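\[
\begin{aligned}
\ln L_{it}(\theta_1) ={}& (1 - y_{3it})(1 - y_{2it}) \ln P_{00} + y_{3it}(1 - y_{2it}) \ln P_{10} \\
& + y_{2it}(1 - y_{3it}) \ln P_{01} + y_{3it}\, y_{2it} \ln P_{11},
\end{aligned} \tag{3.23}
\]
where
\[
P_{11} = \Pr(y_{3it} = 1 \text{ and } y_{2it} = 1) = \Phi_2(W_{3it}\gamma_3, W_{2it}\gamma_2 + \alpha_2; \rho), \tag{3.24}
\]
\[
P_{00} = \Pr(y_{3it} = 0 \text{ and } y_{2it} = 0) = \Phi_2(-W_{3it}\gamma_3, -W_{2it}\gamma_2; \rho), \tag{3.25}
\]
\[
P_{10} = \Pr(y_{3it} = 1 \text{ and } y_{2it} = 0) = \Phi_2(W_{3it}\gamma_3, -W_{2it}\gamma_2 - \alpha_2; -\rho), \tag{3.26}
\]
\[
P_{01} = \Pr(y_{3it} = 0 \text{ and } y_{2it} = 1) = \Phi_2(-W_{3it}\gamma_3, W_{2it}\gamma_2; -\rho). \tag{3.27}
\]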
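
The four cell probabilities and one observation's log-likelihood contribution in (3.23)-(3.27) can be sketched in Python as follows. This is an illustration under assumptions, not the chapter's implementation: $\Phi_2$ is approximated by Simpson quadrature with the standard library, and the function names are hypothetical. A full first stage would sum these contributions over all $i$ and $t$ and maximize over $\theta_1$ with a numerical optimizer.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def bvn_cdf(h, k, rho, n=400):
    """Approximate Phi_2(h, k; rho) by Simpson's rule (n even, |rho| < 1)."""
    lower = -8.0
    if h <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    step = (h - lower) / n

    def f(x):
        return STD.pdf(x) * STD.cdf((k - rho * x) / s)

    total = f(lower) + f(h)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(lower + j * step)
    return total * step / 3.0

def cell_probabilities(a3, a2, alpha2, rho):
    """Return {(y3, y2): probability} following (3.24)-(3.27)."""
    return {
        (1, 1): bvn_cdf(a3, a2 + alpha2, rho),
        (0, 0): bvn_cdf(-a3, -a2, rho),
        (1, 0): bvn_cdf(a3, -a2 - alpha2, -rho),
        (0, 1): bvn_cdf(-a3, a2, -rho),
    }

def loglik_it(y2, y3, a3, a2, alpha2, rho):
    """Log-likelihood contribution (3.23): the log of the probability of the observed cell."""
    return math.log(cell_probabilities(a3, a2, alpha2, rho)[(y3, y2)])
```

Because exactly one of the four indicator products in (3.23) equals one for any observation, the contribution reduces to the log of a single cell probability.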

We can estimate the parameters in the first stage to obtain $\hat{\theta}_1$ (and its standard errors, as shown in
the first-stage technicalities of Appendix F) and compute the four correction terms in equation (3.22) to plug into
the second stage. In the second stage, we estimate the following equation by POLS on the selected sample
with $y_{2it} = 1$, that is, with a positive dependent variable:

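\[
\begin{aligned}
\log(y_{1it}) ={}& W_{1it}\gamma_1 + y_{3it}\alpha_1 + \eta_{12}\, y_{3it} E_{12}(\hat{\theta}_1) + \eta_{13}\, y_{3it} E_{13}(\hat{\theta}_1) \\
& + (1 - y_{3it})\eta_{02} E_{02}(\hat{\theta}_1) + \eta_{03}(1 - y_{3it}) E_{03}(\hat{\theta}_1) + \varepsilon_{it}.
\end{aligned} \tag{3.28}
\]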

Hence, with a similar idea of adding the inverse Mills ratio to correct for the sample selection
bias, we can add 4 correction terms to control for a corner solution problem with a correlated
participation decision and binary endogeneity. We can rewrite the estimating equation above as:
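\[
\log(y_{1it}) = W_{1it}\gamma_1 + y_{3it}\alpha_1 + \sum_{j=1}^{4} \eta_j \hat{\lambda}_{itj} + \varepsilon_{it}, \tag{3.29}
\]
where $\hat{\lambda}_{it1} = y_{3it} E_{12}(\hat{\theta}_1)$; $\hat{\lambda}_{it2} = y_{3it} E_{13}(\hat{\theta}_1)$; $\hat{\lambda}_{it3} = (1 - y_{3it}) E_{02}(\hat{\theta}_1)$; and $\hat{\lambda}_{it4} = (1 - y_{3it}) E_{03}(\hat{\theta}_1)$.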
Even though the two-step estimator is easy to implement and numerically robust, we need to
adjust the second-stage standard errors to take into account the first-stage estimation. I show how
to obtain $\hat{\theta}_1$ and derive the asymptotic variance of this two-step estimator ($\hat{\theta}_2$) in the technical
section of Appendix F.
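
The second-stage regression (3.29) can be sketched as a pooled OLS on the subsample with $y_{2it} = 1$, with the four generated regressors built from first-stage correction terms. The Python below is a minimal illustration with a single exogenous regressor `x` and an intercept. The row layout (keys `y1`, `y2`, `y3`, `x`, `E`) and the function names are assumptions made for the example, and the sketch returns point estimates only: it does not perform the standard-error adjustment for the first-stage estimation.

```python
import math

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]

def second_stage(rows):
    """POLS of log(y1) on (1, x, y3, lambda_1, ..., lambda_4) over rows with y2 = 1."""
    X, y = [], []
    for r in rows:
        if r["y2"] != 1:
            continue                      # the amount equation uses participants only
        e12, e13, e02, e03 = r["E"]       # first-stage correction terms for this row
        y3 = r["y3"]
        X.append([1.0, r["x"], y3,
                  y3 * e12, y3 * e13, (1 - y3) * e02, (1 - y3) * e03])
        y.append(math.log(r["y1"]))
    return ols(X, y)
```

The returned list is ordered as (intercept, coefficient on `x`, $\alpha_1$, $\eta_{12}$, $\eta_{13}$, $\eta_{02}$, $\eta_{03}$).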


3.4 Average Partial Effect
The quantity of interest in this study is average treatment effect (ATE) of the binary endogenous
variable. We can also obtain average partial effects (APEs) for exogenous explanatory variables.
First we rewrite model (3.1) to (3.3) in the conditional mean forms as follows:
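\[
E(\log(y_{1it}) \mid X_{1it}, y_{3it}, c_{1i}, u_{1it}) = X_{1it}\beta_1 + y_{3it}\alpha_1 + c_{1i} + u_{1it} \quad \text{if } y_{1it} > 0, \tag{3.30}
\]
\[
E(y_{2it} \mid X_{2it}, y_{3it}, c_{2i}, u_{2it}) = \Phi(X_{2it}\beta_2 + y_{3it}\alpha_2 + c_{2i} + u_{2it}), \tag{3.31}
\]
\[
E(y_{3it} \mid X_{3it}, c_{3i}, u_{3it}) = \Phi(X_{3it}\beta_3 + c_{3i} + u_{3it}). \tag{3.32}
\]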

Our main interest lies in the treatment effect of a binary endogenous variable in both equations
(3.30) and (3.31). We can evaluate the effect at values of exogenous explanatory variables of
interest. But ﬁrst, we need to handle the correlated unobserved effects using Mundlak’s device
as shown in equation (3.9) and follow the estimation procedure that is clariﬁed in the previous
section. Now (3.30) and (3.31) have been previously derived as:
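\[
E(\log(y_{1it}) \mid Z_i, y_{3it}, y_{1it} > 0) = X_{1it}\beta_1 + y_{3it}\alpha_1 + \bar{Z}_i\delta_1 + \sum_{j=1}^{4} \eta_j \lambda_{itj}, \tag{3.33}
\]
\[
E(y_{2it} \mid Z_i, y_{3it}, a_{2i}, u_{2it}) = \Phi(X_{2it}\beta_2 + y_{3it}\alpha_2 + \bar{Z}_i\delta_2 + a_{2i} + u_{2it}). \tag{3.34}
\]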

ATE for the amount equation:
For $y_{3t}$ as a binary variable, the ATE at time $t$ can be obtained by averaging equation (3.30)
over the distribution of $c_{1i}$ and $u_{1it}$, or by taking a difference in:
\[
E_{(\bar{Z}_i, \lambda_{it})}\Big[X_{1t}\beta_1 + y_{3t}\alpha_1 + \bar{Z}_i\delta_1 + \sum_{j=1}^{4} \eta_j \lambda_{itj}\Big], \tag{3.35}
\]
where in the argument of the expectation operator, variables with a subscript $i$ are random and all
others are fixed.
With the definitions from equations (3.17)-(3.21), plus equation (3.29), (3.35) is rewritten as:
\[
E_{E_{it}}\big[\alpha_1 + \eta_{12} E_{12}(\theta_1) + \eta_{13} E_{13}(\theta_1) - \eta_{02} E_{02}(\theta_1) - \eta_{03} E_{03}(\theta_1)\big]. \tag{3.36}
\]

Given consistent estimators of $\theta_1$ and $\theta_2$, the ATE of the binary variable $y_{3t}$ in equation (3.33)
can be estimated as:
\[
ATE = N^{-1} \sum_{i=1}^{N} \big(\alpha_1 + \eta_{12} E_{12}(\theta_1) + \eta_{13} E_{13}(\theta_1) - \eta_{02} E_{02}(\theta_1) - \eta_{03} E_{03}(\theta_1)\big), \tag{3.37}
\]

where for each unit we predict the difference in mean responses with and without “treatment” (for
y3t = 1 and y3t = 0), and then average the difference in these estimated mean responses across all
units.
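
The averaging in (3.37) is straightforward once the second-stage coefficients and the unit-level correction terms are available. A minimal Python sketch follows; the function name `ate_amount` and the numbers in the usage lines are purely illustrative assumptions.

```python
def ate_amount(alpha1, eta, terms):
    """Average alpha1 + eta12*E12 + eta13*E13 - eta02*E02 - eta03*E03 over units, as in (3.37).

    eta   : (eta12, eta13, eta02, eta03), second-stage coefficients
    terms : one (E12, E13, E02, E03) tuple per unit, evaluated at the first-stage estimates
    """
    eta12, eta13, eta02, eta03 = eta
    diffs = [alpha1 + eta12 * e12 + eta13 * e13 - eta02 * e02 - eta03 * e03
             for e12, e13, e02, e03 in terms]
    return sum(diffs) / len(diffs)

# Made-up inputs for two units, for illustration only.
example = ate_amount(-0.3, (0.1, 0.2, 0.1, 0.2),
                     [(1.0, 1.0, 1.0, 1.0), (2.0, 0.0, 0.0, 2.0)])
```

When all four coefficients on the correction terms are zero, the estimate reduces to the coefficient on the endogenous dummy itself.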
ATE for the participation equation:
We rewrite model (3.34) with scaled coefficients, using a standard mixing property of the
normal distribution of $e_{it}$:
\[
E(y_{2it} \mid Z_i, y_{3it}, v_{3it}, e_{it}) = \Phi(X_{2it}\beta_2 + y_{3it}\alpha_2 + \bar{Z}_i\delta_2 + \rho v_{3it} + e_{it}), \tag{3.38}
\]
or
\[
E(y_{2it} \mid Z_i, y_{3it}, v_{3it}, e_{it}) = \Phi(X_{2it}\beta_{2e} + y_{3it}\alpha_{2e} + \bar{Z}_i\delta_{2e} + \rho_e v_{3it}), \tag{3.39}
\]
where the subscript $e$ denotes division by $\sqrt{1 + \sigma_e^2}$.

Note that we can write (3.38)-(3.39) in terms of the bivariate probit model, as in the
technical section, and the procedure to obtain the APE or ATE is the same as described below. That
is, we average out $\bar{Z}_i$ and then take derivatives or changes with respect to the elements of
$(X_{2t}, y_{3t})$.
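
The mixing property behind the scaled coefficients states that averaging $\Phi(a + e)$ over $e \sim \mathrm{Normal}(0, \sigma_e^2)$ gives $\Phi(a / \sqrt{1 + \sigma_e^2})$. The short Python check below compares a numerical average with the closed form; the function names, the quadrature settings, and the chosen values of the index and $\sigma_e$ are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def averaged_probit(index, sigma, n=2000, width=8.0):
    """Approximate E[Phi(index + e)] for e ~ Normal(0, sigma^2) by the midpoint rule."""
    dens = NormalDist(0.0, sigma)
    lower = -width * sigma               # integrate over +/- width standard deviations
    step = 2.0 * width * sigma / n
    total = 0.0
    for j in range(n):
        e = lower + (j + 0.5) * step
        total += STD.cdf(index + e) * dens.pdf(e)
    return total * step

def scaled_probit(index, sigma):
    """Closed form: Phi(index / sqrt(1 + sigma^2))."""
    return STD.cdf(index / math.sqrt(1.0 + sigma ** 2))

numeric = averaged_probit(0.7, 1.3)
closed_form = scaled_probit(0.7, 1.3)
```

The two values should agree up to quadrature error, which is why the unobserved component can be integrated out simply by rescaling the coefficients.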
The APEs are obtained by computing derivatives, or obtaining differences, in:
\[
E_{(\bar{Z}_i, v_{3it})}\big[\Phi(X_{2t}\beta_{2e} + y_{3t}\alpha_{2e} + \bar{Z}_i\delta_{2e} + \rho_e v_{3it})\big], \tag{3.40}
\]
or
\[
E_{(\bar{Z}_i)}\big[\Phi(X_{2t}\beta_{2v} + y_{3t}\alpha_{2v} + \bar{Z}_i\delta_{2v})\big], \tag{3.41}
\]
where the subscript $v$ denotes division by $\sqrt{1 + \rho_e^2 \sigma_{v_3}^2}$ and $\mathrm{Var}(v_{3it}) = \sigma_{v_3}^2 = 1$. In order to obtain
partial effects, we average out $\bar{Z}_i$ and then take derivatives or changes with respect to the elements
of $(X_{2t}, y_{3t})$. Across the sample for a chosen $t$, we can obtain the estimator of the APE with respect
to one element $X_{2t1}$ of $X_{2t}$ as:
\[
APE = \beta_{2v1}\, N^{-1} \sum_{i=1}^{N} \phi(X_{2t}\beta_{2v} + y_{3t}\alpha_{2v} + \bar{Z}_i\delta_{2v}). \tag{3.42}
\]
The estimator of the ATE with respect to $y_{3t}$ is:
\[
ATE = N^{-1} \sum_{i=1}^{N} \big[\Phi(X_{2t}\beta_{2v} + \alpha_{2v} + \bar{Z}_i\delta_{2v}) - \Phi(X_{2t}\beta_{2v} + \bar{Z}_i\delta_{2v})\big], \tag{3.43}
\]
which is the quantity of interest.
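
The two sample averages in (3.42) and (3.43) can be sketched in Python as follows. The inputs are assumed to be already-scaled quantities: `indexes` holds the per-unit value of $X_{2t}\beta_{2v} + \bar{Z}_i\delta_{2v}$ (excluding the treatment term), and `alpha2v` and `beta2v1` are the scaled coefficients. The function name and the argument layout are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def participation_effects(indexes, y3, alpha2v, beta2v1):
    """Return (APE, ATE) for the participation equation, following (3.42) and (3.43).

    indexes : per-unit index X2t*beta2v + Zbar_i*delta2v, without the y3 term
    y3      : per-unit value of the binary endogenous variable (0 or 1)
    """
    n = len(indexes)
    # APE of a continuous regressor: coefficient times the average normal density.
    ape = beta2v1 * sum(STD.pdf(idx + d * alpha2v) for idx, d in zip(indexes, y3)) / n
    # ATE of the dummy: average difference in probabilities with and without treatment.
    ate = sum(STD.cdf(idx + alpha2v) - STD.cdf(idx) for idx in indexes) / n
    return ape, ate
```

A negative scaled coefficient on the dummy yields a negative ATE, since the treated probability lies below the untreated one for every unit.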

3.5 Empirical Example
3.5.1 Overview of Data
Over the past two decades, fertility has decreased as the labor force participation rates of women
in most developing and advanced countries have increased (Kim & Aassve (2006)). This change
implies the changing roles of women and changes in the time allocation among household members
in both work activities and fertility behavior. We also observed this pattern in Vietnam.
For the last two decades, the fertility rates of Vietnamese women fell while the labor force
participation rates for the whole population did not change very much. A decline in fertility also
accompanied an increase in income. During the period from 1986 to 2006, while fertility dramatically decreased, GDP per capita increased 2.9 times to 587.4 USD per capita. This pattern
is consistent with microeconomic predictions: higher income leads to a reduction in fertility and
the inverse relationship of fertility and labor force participation (Becker & Lewis (1973); Willis
(1973)). Thus, it is important to analyze the data on fertility and labor market behavior of working
women.
The data used in this paper came from the Vietnamese Household Living Standard Surveys
(VHLSS) 2004, 2006, and 2008, which were conducted by the Vietnamese General Statistical
Office (GSO) with technical support from The World Bank. The survey sample was randomly
selected to represent the whole country, taking into account urban and rural structures, geographical
conditions, regional issues, ethnic differences, and provincial representation. The sample used in
this chapter has 665 women. The survey collected information about the following: household
information, education, health, employment, migration, housing, fertility and family planning,
incomes, expenditures, borrowing, lending, and savings.
Only households with children under 18 years old and households with a mother and father
younger than 60 and 65 years of age, respectively, at the time of the interview are included in this
research. There are 1,995 households in the sample used for this research. Table C.1 provides a
summary of the descriptive statistics for the whole sample. The dependent variables are working
status and hours worked per day for a woman (being either head of household or spouse). According to Table C.1, 95% of mothers worked in the interview year, and on average, they worked
7.8 hours per day. The explanatory variables are whether the mother has a newborn, the mother’s education, age, and non-labor income, the father’s education and age, and other household characteristics such as
whether the household lives in an urban area, whether the mother works on a farm, and ethnicity. In this sample, each
household had an average of 2.5 children; 10% of the sampled women had newborns; 55% of the
women had a boy first, and for 56% the first two children had the same gender. In
general, the husband’s education is higher than the wife’s education. Income from other sources for
women in the sample is about 8 million VND per year (approximately 400-450 USD per year).
Table C.1 also shows that around 84% of working wives worked on farms and 18% of households
were located in urban areas.
Table C.2 shows the summary statistics for each year in the panel data. There is no obvious
pattern for women’s working hours and labor force participation (LFP). However, we can observe
that the fertility rate declines over time. The percentage of having a newborn goes down from 16%
in 2004 to 9% in 2006 and 6% in 2008. On the contrary, non-wife income increases over time.


3.5.2 Estimation and Result
The main contribution of this chapter and its following application is to allow the correlation between women’s decision to participate in the labor market and their amount of working hours; to
acknowledge the nature of having a newborn as a dummy variable and to consider the inﬂuence
of having a newborn on both women’s participation and labor supply. As shown in the literature,
newborns have negative effect on women’s labor force entry. This means that women who are not
working are unlikely to take part in the labor market after delivering babies. This raises the question of how newborns affect their mothers’ labor supply for those women who are participating
and stay in the market.
When a mother has a newborn, she decides how many hours she will work after
delivering the baby. If the endogeneity of fertility is not accounted for, we will not obtain consistent estimates of labor supply conditional on fertility. In order to draw robust and credible estimates of the
effects of newborns on women’s labor supply and participation, we need to take this
endogeneity into account. Another important point is that the amount and participation decisions are jointly determined, because preferences for working and for work time are positively correlated. In the
same way, preferences for having a baby and for working are negatively correlated. Therefore, we
should model these decisions jointly. Using the VHLSS panel data for 2004-2008,
we study women’s labor supply in a system of equations in which the fertility decision is an endogenous dummy variable that appears in both the labor supply and participation equations, and the equations of this system
are correlated. We are interested in estimating a panel data model of working
hours for a woman i at time t, who takes having a newborn into consideration as an endogenous
factor, as follows:

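\[
\begin{aligned}
\log(\mathit{Hours}_{it}) ={}& \mathit{Newborn}_{it}\alpha_1 + \mathit{Medu}_{it}\beta_{11} + \mathit{Mage}_{it}\beta_{12} + \mathit{Magesq}_{it}\beta_{13} \\
& + \mathit{NMincome}_{it}\beta_{14} + c_{1i} + u_{1it} \quad \text{if } \mathit{Hours}_{it} > 0,
\end{aligned} \tag{3.44}
\]
\[
\begin{aligned}
\mathit{LFP}^*_{it} ={}& \mathit{Medu}_{it}\beta_{21} + \mathit{Mage}_{it}\beta_{22} + \mathit{Magesq}_{it}\beta_{23} + \mathit{Hedu}_{it}\beta_{24} + \mathit{Hage}_{it}\beta_{25} \\
& + \mathit{Hagesq}_{it}\beta_{26} + \mathit{NMincome}_{it}\beta_{27} + \mathit{Newborn}_{it}\alpha_2 + c_{2i} + u_{2it},
\end{aligned} \tag{3.45}
\]
\[
\begin{aligned}
\mathit{Newborn}^*_{it} ={}& \mathit{Samesex}\,\beta_{32} + \mathit{Medug}_{it}\beta_{33} + \mathit{Mage}_{it}\beta_{34} \\
& + \mathit{Magesq}_{it}\beta_{35} + \mathit{NMincome}_{it}\beta_{36} + c_{3i} + u_{3it}.
\end{aligned} \tag{3.46}
\]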

These equations correspond to equations (3.1) to (3.3) in the model section. Hours
is annual working hours for a woman i at time t, which is determined by her education, Medu,
her age, Mage, her age squared, Magesq, her income from sources other than her wage, NMincome, other
variables such as whether she lives in an urban area, whether her ethnicity is the majority, whether she
works on a farm, and whether she has a newborn (a child aged 0 to 1), Newborn. A woman’s
LFP is influenced by her characteristics (the same variables as in equation (3.44)), her husband’s
characteristics, including education, age, and age squared (Hedu, Hage, Hagesq), non-mother income, as
well as whether she has a newborn. The fertility decision equation has right-hand-side variables
including an instrumental variable, whether the first two children have the same gender, and other
exogenous variables, including the mother’s characteristics and non-mother income. We also allow some
explanatory variables to be correlated with the heterogeneity and account for this relationship by
adding time averages of the explanatory variables to each equation.
With the new procedure to control for a corner solution, we can allow unobserved factors that
affect both amount and participation equations to be correlated. In addition, to ensure that the
predicted value of labor supply is positive, we need to apply the Type II Tobit model to log(hours)
rather than hours. That is why we use the Exponential Type II Tobit (ET2T) specification
(see Wooldridge, 2010, Chapter 17). In addition, the ET2T model is applicable when we
have exclusion restrictions. The participation equation contains many more variables which are
not in the amount equation so that the parameters in the amount equation will be identiﬁed.
The choice of appropriate instrumental variables is important because these can affect the reliability of estimates and inferences. Valid and strong instrumental variables must satisfy two conditions: an instrumental variable should be uncorrelated with the error term and it should be highly
correlated with the right-hand-side endogenous regressor(s). In this research, that means that the
instrumental variables have no correlation with factors that directly affect parental LFP and labor
supply and that the instruments are correlated with fertility. Whether the first two children have
the same gender is used to generate exogenous variations in fertility in this research.
Normally, the gender of a child is a random variable, and it is uncorrelated with parental LFP
and labor supply. In addition, we found that the boy-to-girl ratio of the ﬁrst child was 1.05 in our
sample, which is close to the natural ratio. Thus, the gender of the ﬁrst child is a valid instrumental
variable. However, this instrumental variable is not signiﬁcant in the ﬁrst stage. Angrist & Evans
(1998) found that parents prefer a mixed sibling-sex composition, and parents who ﬁrst had two
girls or boys had a higher probability of having additional children. Carrasco (2001) also found
that the same-sex variable is a strong instrument in US data. In this dataset, among women
with more than two children, the likelihood of another birth was 28% if they had a son, 34%
if they had two sons and 11% if they had three sons. This evidence implies that siblings of
mixed genders are desirable among Vietnamese families. The same-gender indicator for the first two children
meets the two conditions required of a valid and strong instrument, and it can serve as
an instrumental variable to generate exogenous variations in fertility. The indicator
equals 1 if the first two children have the same gender, and 0 otherwise. According to
Table C.1, 55% of sampled households had a male first child and 56% of households had the first
two children with the same gender. A t-test is used to check whether the same-gender instrument
is strong. The result is -3.2, implying that this instrument can be used for this study.
Table C.3 shows the estimation results for the bivariate probit model in the first stage. The coefficient on samesex is positive and statistically significant, implying that samesex is a relevant
instrument in our study. The coefficient on a newborn in the LFP equation is negative and statistically significant. Having a newborn reduces the mother’s probability of LFP
by 13.6%. In terms of the average treatment effect, compared to women without newborn babies,
mothers with newborns have a 12.7% lower probability of continuing to work. The estimate of
$\rho$, -0.165, shows that there is a negative correlation between the unobserved effects that affect
fertility and women’s LFP. This adds to the evidence from empirical studies of developing countries
that having an additional child negatively influences the probability that a working woman who
has just delivered a child returns to work.


Table C.4 reports the coefﬁcient estimates from six different estimation methods. Pooled OLS
(POLS) assumes that all explanatory variables are uncorrelated with unobserved heterogeneity and
are also strictly exogenous. The estimates based on POLS show that having a newborn reduces the
mother’s working hours by 13.4%. The POLS estimates have the largest bias because they do not
take into account endogeneity of fertility, the presence of heterogeneity which might be correlated
with explanatory variables, and the correlation between work participation and the amount of work.
Pooled 2SLS takes into account endogeneity of a newborn but does not remove an unobserved
effect. Controlling for endogeneity of a newborn reduces the bias by 10%. Now having a newborn
will make a mother reduce her working hours by 23%. Fixed effects (FE) allows for correlation
between the explanatory variables and unobserved heterogeneity and FE-2SLS further allows a
newborn to be correlated with the idiosyncratic errors. Columns (3) and (4) show that mothers’
working hours are diminished by 16.4% and 27.7% using the FE and FE-2SLS. However, FE2SLS ignores the correlation between women’s decision to participate and how much to work. In
addition, all methods from (1) to (4) do not consider a newborn a dummy variable.
To take into account correlated participation, we can also use the Heckman-type IV correction
method (see Semykina & Wooldridge (2010)); hereafter, we call this estimator SW
(column 5 of Table C.4). This estimator allows for correlated participation and heterogeneity in the
presence of endogeneity. However, it ignores the binary nature of the endogenous regressor and
assumes a linear reduced form (using pooled 2SLS in the second stage after obtaining the inverse
Mills ratio in the first stage). The results show that mothers' working hours are reduced by 30.8%
using SW, which is a larger reduction than under FE and FE-2SLS. This suggests that correlated participation does matter. However, this decrease is still smaller
than the reduction obtained with the newly proposed procedure, which also
accounts for the binary endogeneity.
The newly proposed procedure corrects for the endogeneity of a newborn, its binary nature, and
its influence on both women's participation and the amount of work. It also reduces another source
of bias, correlated heterogeneity, by adding time averages of the explanatory variables and time dummies to all
equations. However, the standard errors are larger once these corrections are
accounted for. After controlling for all these sources of bias, women who are still working
decrease their working hours by 34.5%.
The results show that having a new child in Vietnamese households has a negative effect on
maternal hours for working women. Women give up 34.5% of their working hours to
take care of their newborn or to use the forgone time as an input into home production.
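
Because the dependent variable in Table C.4 is log(hours), the percentages quoted above read the coefficient β on the newborn dummy as 100·β percent. As a rough check that is not part of the original estimation, the exact change in the level of hours implied by a dummy coefficient in a log equation is 100·(exp(β) − 1) percent. The Python sketch below applies this standard conversion to the newborn coefficients reported in Table C.4; the function name exact_percent_change is hypothetical and introduced only for illustration.

```python
import math


def exact_percent_change(beta: float) -> float:
    """Exact percentage change in y implied by a dummy coefficient in a log(y) equation."""
    return 100.0 * (math.exp(beta) - 1.0)


# Newborn coefficients as reported in Table C.4, columns [1] to [6].
newborn_coefficients = {
    "Pooled OLS": -0.134,
    "Pooled 2SLS": -0.232,
    "FE": -0.164,
    "FE-2SLS": -0.277,
    "SW": -0.308,
    "Proposed": -0.345,
}

for method, beta in newborn_coefficients.items():
    approx = 100.0 * beta                  # the 100*beta reading used in the text
    exact = exact_percent_change(beta)     # exact change in the level of hours
    print(f"{method:12s} approx = {approx:6.1f}%   exact = {exact:6.1f}%")
```

For the proposed procedure, the exact conversion gives a reduction of about 29.2%, somewhat smaller in magnitude than the 34.5% obtained from the 100·β approximation; the ranking of the six estimators is unchanged.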

3.6 Conclusion
This chapter studies a nonlinear panel data model with an endogenous dummy variable and a
corner solution response. The main contribution is to allow for a joint distribution of the endogenous
dummy regressor and the unobserved factors that affect both the amount and participation equations. I
propose a two-step estimation method in which the first stage uses a bivariate probit model for
the relationship between the endogenous dummy variable and the participation decision. For the
amount equation, using an ET2T model ensures that the predicted value of hours is
positive and allows for correlation between the unobserved effects in the amount and participation
equations. In addition, we need exclusion restrictions in order to identify the parameters
in the amount equation. In other words, the set of explanatory variables in the participation
equation contains the set of explanatory variables in the amount equation. I also allow
some explanatory variables to be correlated with heterogeneity.
This estimation method is applied to analyze the effect of fertility on women's working hours
and labor force participation. The proposed approach yields a statistically significant negative effect
of having a newborn on the hours of a woman who is working and remains in the labor market. Having a
newborn has a significant negative impact on both a woman's labor force participation and her
working hours. Compared with the alternative estimation methods, the proposed method substantially
corrects the bias in the estimated effect of a newborn on a mother's working hours.

APPENDICES

Appendix A
TABLES FOR CHAPTER 1

Table A.1: Simulation Results of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2347    -0.1283    -0.1591    -0.2079    -0.1583    -0.1754    -0.2295    -0.2368    -0.2371
                             (0.0046)   (0.0042)   (0.0051)   (0.0110)   (0.0064)   (0.0077)   (0.0051)   (0.0050)
                             [.0034]    [.0024]    [.0008]    [.0024]    [.0019]    [.0002]    [.00008]   [.00008]
y2 discrete 0-1   -0.32                 -0.2014    -0.2763               -0.2262    -0.3109    -0.3201    -0.3204
                                        (0.0046)   (0.0051)              (0.0082)   (0.0099)   (0.0041)   (0.0030)
                                        [.0038]    [.0014]               [.0030]    [.0003]    [.00005]   [.00001]
y2 discrete 1-2   -0.1273               -0.161     -0.1193               -0.1716    -0.1258    -0.128     -0.1278
                                        (0.0027)   (0.0017)              (0.0031)   (0.0023)   (0.0020)   (0.0016)
                                        [.0011]    [.0002]               [.0014]    [.00005]   [.00004]   [.00001]
y2 discrete 2-3   -0.0212               -0.0388    -0.0259               -0.0317    -0.0224    -0.0214    -0.0212
                                        (0.0030)   (0.0012)              (0.0031)   (0.0014)   (0.0014)   (0.0010)
                                        [.0006]    [.0001]               [.0003]    [.00004]   [.00001]   [.000001]
x1                0.0235     0.0224     0.021      0.0223     0.0237     0.0212     0.0231     0.024      0.0238
                             (0.0181)   (0.0125)   (0.0130)   (0.0189)   (0.0131)   (0.0142)   (0.0159)   (0.0140)
x2                0.0235     0.0218     0.0214     0.0195     0.023      0.0218     0.0241     0.0243     0.0244
                             (0.0181)   (0.0128)   (0.0129)   (0.0192)   (0.0131)   (0.0134)   (0.0153)   (0.0136)
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.

Table A.2: Simulation Results of the Coefficient Estimates (N=1000, η1 = 0.5, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  value                            Probit                BS         Probit     Probit     Probit
Estimation Method Coef.      OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2                -1         -0.1283    -0.2024    -0.8543    -0.1583    -0.2275    -0.9387    -1.045     -1.044
                             (0.0044)   (0.0046)   (0.0146)   (0.0089)   (0.0084)   (0.0255)   (0.0483)   (0.0424)
x1                0.1        0.0224     0.0267     0.0917     0.0237     0.0275     0.0945     0.1061     0.1052
                             (0.0181)   (0.0160)   (0.0534)   (0.0190)   (0.0171)   (0.0578)   (0.0702)   (0.0619)
x2                0.1        0.0218     0.0272     0.0956     0.0231     0.0282     0.0987     0.1071     0.1073
                             (0.0181)   (0.0163)   (0.0534)   (0.0192)   (0.0170)   (0.0548)   (0.0681)   (0.0600)
Note: Figures in parentheses () are standard deviations.

Table A.3: Simulation Results of the Average Partial Effects Estimates (N=1000, η1 = 0.1, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2461    -0.1507    -0.1854    -0.2402    -0.16      -0.1887    -0.2442    -0.249     -0.2491
                             (0.0042)   (0.0037)   (0.0046)   (0.0102)   (0.0053)   (0.0056)   (0.0044)   (0.0043)
                             [.0031]    [.0019]    [.0002]    [.0027]    [.0018]    [.0001]    [.0001]    [.0001]
y2 discrete 0-1   -0.3383               -0.239     -0.3289               -0.2445    -0.3355    -0.3384    -0.3385
                                        (0.0042)   (0.0031)              (0.0066)   (0.0051)   (0.0019)   (0.0020)
                                        [.0032]    [.0003]               [.0030]    [.00009]   [.000005]  [.000007]
y2 discrete 1-2   -0.1332               -0.2001    -0.1319               -0.2022    -0.1331    -0.1332    -0.1332
                                        (0.0018)   (0.0011)              (0.0025)   (0.0013)   (0.0008)   (0.0010)
                                        [.0021]    [.00004]              [.0022]    [.000004]  [.000001]  [.000001]
y2 discrete 2-3   -0.0208               -0.0193    -0.0219               -0.0177    -0.0212    -0.0208    -0.0208
                                        (0.0029)   (0.0007)              (0.0032)   (0.0008)   (0.0007)   (0.0007)
                                        [.00005]   [.00003]              [.0001]    [.00001]   [.000001]  [.000001]
x1                0.0246     0.0267     0.021      0.025      0.0265     0.0234     0.025      0.0255     0.0253
                             (0.0168)   (0.0089)   (0.0063)   (0.0170)   (0.0090)   (0.0065)   (0.0072)   (0.0066)
x2                0.0246     0.0241     0.0214     0.0246     0.0242     0.0222     0.0246     0.0252     0.0249
                             (0.0178)   (0.0100)   (0.0070)   (0.0182)   (0.0100)   (0.0070)   (0.0077)   (0.0072)
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.

Table A.4: Simulation Results of the Average Partial Effects Estimates (N=1000, η1 = 0.9, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2178    -0.1104    -0.1368    -0.1777    -0.1548    -0.1637    -0.2144    -0.2208    -0.2205
                             (0.0042)   (0.0039)   (0.0054)   (0.0148)   (0.0096)   (0.0124)   (0.0052)   (0.0045)
                             [.0034]    [.0026]    [.0013]    [.0020]    [.0017]    [.0001]    [.0001]    [.0001]
y2 discrete 0-1   -0.2973               -0.1706    -0.2307               -0.21      -0.2871    -0.3       -0.2994
                                        (0.0049)   (0.0069)              (0.0136)   (0.0196)   (0.0054)   (0.0040)
                                        [.0040]    [.0021]               [.0028]    [.0003]    [.00008]   [.00006]
y2 discrete 1-2   -0.1281               -0.1303    -0.111                -0.1491    -0.1232    -0.1288    -0.1291
                                        (0.0031)   (0.0024)              (0.0060)   (0.0047)   (0.0025)   (0.0020)
                                        [.00007]   [.0005]               [.0007]    [.0002]    [.00002]   [.00003]
y2 discrete 2-3   -0.0253               -0.0532    -0.0319               -0.0452    -0.0258    -0.0247    -0.0249
                                        (0.0022)   (0.0019)              (0.0030)   (0.0023)   (0.0021)   (0.0015)
                                        [.0009]    [.0002]               [.0006]    [.00002]   [.00002]   [.00001]
x1                0.0218     0.0327     0.0276     0.0291     0.0263     0.0273     0.0305     0.0318     0.0313
                             (0.0222)   (0.0176)   (0.0182)   (0.0169)   (0.0184)   (0.0201)   (0.0237)   (0.0208)
x2                0.0218     0.0215     0.0212     0.0236     0.0244     0.0199     0.0233     0.02       0.0203
                             (0.0201)   (0.0170)   (0.0179)   (0.0184)   (0.0187)   (0.0206)   (0.0216)   (0.0187)
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.

Table A.5: Simulation Results of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications, δ23 = 0.3)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2402    -0.1352    -0.1618    -0.2094    -0.1661    -0.1823    -0.2327    -0.2405    -0.2407
                             (0.0047)   (0.0037)   (0.0050)   (0.0621)   (0.0363)   (0.0366)   (0.0046)   (0.0043)
                             [.0033]    [.0025]    [.0010]    [.0023]    [.0023]    [.0002]    [.00001]   [.00001]
y2 discrete 0-1   -0.3202               -0.1992    -0.2724               -0.2301    -0.3157    -0.3199    -0.3202
                                        (0.0044)   (0.0055)              (0.0522)   (0.0799)   (0.0037)   (0.0029)
                                        [.0038]    [.0015]               [.0018]    [.0001]    [.00001]   [.000001]
y2 discrete 1-2   -0.1275               -0.1605    -0.1195               -0.1676    -0.1248    -0.128     -0.1279
                                        (0.0026)   (0.0016)              (0.0189)   (0.0087)   (0.0016)   (0.0015)
                                        [.0010]    [.0003]               [.0029]    [.00009]   [.00002]   [.00001]
y2 discrete 2-3   -0.0213               -0.0386    -0.0268               -0.0332    -0.0227    -0.0215    -0.0214
                                        (0.0027)   (0.0011)              (0.0108)   (0.0066)   (0.0012)   (0.0010)
                                        [.0005]    [.0002]               [.0013]    [.00005]   [.000001]  [.000003]
x1                0.024      0.0224     0.0114     0.0117     0.0237     0.0235     0.0253     0.0261     0.0252
                             (0.0181)   (0.0118)   (0.0130)   (0.0189)   (0.0225)   (0.0142)   (0.0146)   (0.0127)
x2                0.024      0.0218     0.0104     0.0109     0.023      0.0227     0.0227     0.0244     0.025
                             (0.0181)   (0.0123)   (0.0129)   (0.0192)   (0.0244)   (0.0134)   (0.0135)   (0.0119)
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.
This table presents the case of a weak IV (δ23 = 0.3).

Table A.6: Simulation Results of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications, δ23 = 0)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Tobit      Fractional Fractional Fractional
                  APE                              Probit     BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2441    -0.1382    -0.1652    -0.2117    -0.1827    -0.2625    -0.2436    -0.2441
                             (0.0045)   (0.0034)   (0.0049)   (0.0424)   (0.0427)   (0.0045)   (0.0044)
                             [.0033]    [.0024]    [.0010]    [.0020]    [.0002]    [.00001]   [.00001]
y2 discrete 0-1   -0.3194               -0.2019    -0.2708    -0.2284    -0.3513    -0.3186    -0.3194
                                        (0.0039)   (0.0053)   (0.0562)   (0.0931)   (0.0036)   (0.0030)
                                        [.0038]    [.0014]    [.0018]    [.0001]    [.00001]   [.000001]
y2 discrete 1-2   -0.1269               -0.1605    -0.1189    -0.1697    -0.1294    -0.1274    -0.127
                                        (0.0025)   (0.0021)   (0.0211)   (0.0101)   (0.0016)   (0.0015)
                                        [.0010]    [.0002]    [.0015]    [.00007]   [.00001]   [.00001]
y2 discrete 2-3   -0.0211               -0.036     -0.0269    -0.0267    -0.0175    -0.0213    -0.021
                                        (0.0026)   (0.0020)   (0.0129)   (0.0074)   (0.0011)   (0.0010)
                                        [.0004]    [.0002]    [.0010]    [.00004]   [.000001]  [.000002]
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.
This table presents the case of no instrument (δ23 = 0).

Table A.7: Simulation Results of the Average Partial Effects Estimates (N=100, η1 = 0.5, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.235     -0.1419    -0.1695    -0.218     -0.1688    -0.1837    -0.234     -0.2371    -0.2366
                             (0.0221)   (0.0193)   (0.0216)   (0.0667)   (0.0356)   (0.0253)   (0.0166)   (0.0162)
                             [.094]     [.0066]    [.0017]    [.0066]    [.0051]    [.0001]    [.0002]    [.00017]
y2 discrete 0-1   -0.3281               -0.2173    -0.3                  -0.2386    -0.3277    -0.3288    -0.3281
                                        (0.0253)   (0.0339)              (0.0492)   (0.0457)   (0.0137)   (0.0122)
                                        [.0111]    [.0028]               [.0090]    [.00004]   [.00005]   [.00004]
y2 discrete 1-2   -0.1308               -0.1767    -0.1252               -0.184     -0.1305    -0.13      -0.1306
                                        (0.0222)   (0.0096)              (0.0257)   (0.0139)   (0.0073)   (0.0063)
                                        [.0046]    [.0006]               [.0053]    [.00004]   [.00009]   [.00004]
y2 discrete 2-3   -0.0214               -0.0316    -0.0243               -0.0286    -0.0217    -0.0211    -0.0213
                                        (0.0129)   (0.0034)              (0.0134)   (0.0041)   (0.0036)   (0.0027)
                                        [.0010]    [.0003]               [.0007]    [.00003]   [.00003]   [.00001]
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.

Table A.8: Simulation Results of the Average Partial Effects Estimates (N=500, η1 = 0.5, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2358    -0.1415    -0.171     -0.2201    -0.1617    -0.1815    -0.2334    -0.2379    -0.2376
                             (0.0177)   (0.0157)   (0.0163)   (0.0175)   (0.0114)   (0.0119)   (0.0051)   (0.0086)
                             [.0046]    [.0034]    [.0007]    [.0041]    [.0029]    [.0001]    [.0001]    [.0001]
y2 discrete 0-1   -0.3285               -0.219     -0.3026               -0.2351    -0.3241    -0.3293    -0.329
                                        (0.0219)   (0.0311)              (0.0153)   (0.0192)   (0.0109)   (0.0106)
                                        [.0059]    [.0012]               [.0052]    [.0002]    [.00001]   [.00002]
y2 discrete 1-2   -0.1309               -0.1782    -0.1259               -0.1847    -0.13      -0.131     -0.1311
                                        (0.0205)   (0.0082)              (0.0158)   (0.0058)   (0.0044)   (0.0043)
                                        [.0028]    [.0002]               [.0031]    [.00004]   [.00001]   [.000004]
y2 discrete 2-3   -0.0214               -0.0309    -0.024                -0.0267    -0.0219    -0.0212    -0.0213
                                        (0.0109)   (0.0025)              (0.0084)   (0.0018)   (0.0017)   (0.0014)
                                        [.0004]    [.0001]               [.0002]    [.00002]   [.00001]   [.000004]
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.

Table A.9: Simulation Results of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2347    -0.1283    -0.1591    -0.2079    -0.1583    -0.1754    -0.2295    -0.2368    -0.2371
                             (0.0046)   (0.0042)   (0.0051)   (0.0110)   (0.0064)   (0.0077)   (0.0051)   (0.0050)
                             [.0034]    [.0024]    [.0008]    [.0024]    [.0019]    [.0002]    [.00008]   [.00008]
y2 discrete 0-1   -0.32                 -0.2014    -0.2763               -0.2262    -0.3109    -0.3201    -0.3204
                                        (0.0046)   (0.0051)              (0.0082)   (0.0099)   (0.0041)   (0.0030)
                                        [.0038]    [.0014]               [.0030]    [.0003]    [.00016]   [.00001]
y2 discrete 1-2   -0.1273               -0.161     -0.1193               -0.1716    -0.1258    -0.128     -0.1278
                                        (0.0027)   (0.0017)              (0.0031)   (0.0023)   (0.0020)   (0.0016)
                                        [.0011]    [.0002]               [.0014]    [.00005]   [.00004]   [.00001]
y2 discrete 2-3   -0.0212               -0.0388    -0.0259               -0.0317    -0.0224    -0.0214    -0.0212
                                        (0.0030)   (0.0012)              (0.0031)   (0.0014)   (0.0014)   (0.0010)
                                        [.0006]    [.0001]               [.0003]    [.00004]   [.00001]   [.000001]
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.

Table A.10: Simulation Results of the Average Partial Effects Estimates (N=2000, η1 = 0.5, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2347    -0.1286    -0.1591    -0.208     -0.1591    -0.1755    -0.2293    -0.2369    -0.2371
                             (0.0028)   (0.0028)   (0.0031)   (0.0082)   (0.0044)   (0.0050)   (0.0031)   (0.0030)
                             [.0024]    [.0017]    [.0006]    [.0017]    [.0013]    [.0001]    [.00005]   [.00006]
y2 discrete 0-1   -0.3201               -0.2014    -0.2766               -0.2263    -0.3106    -0.3201    -0.3204
                                        (0.0034)   (0.0036)              (0.0059)   (0.0074)   (0.0029)   (0.0021)
                                        [.0027]    [.0010]               [.0021]    [.0002]    [.000001]  [.000007]
y2 discrete 1-2   -0.1275               -0.1609    -0.1194               -0.1717    -0.1258    -0.1281    -0.1278
                                        (0.0020)   (0.0012)              (0.0024)   (0.0017)   (0.0015)   (0.0011)
                                        [.0008]    [.0002]               [.0010]    [.00004]   [.00001]   [.000009]
y2 discrete 2-3   -0.0213               -0.039     -0.0259               -0.0317    -0.0224    -0.0214    -0.0212
                                        (0.0020)   (0.0008)              (0.0021)   (0.0010)   (0.0010)   (0.0007)
                                        [.0004]    [.0001]               [.0002]    [.00003]   [.000002]  [.000001]
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.

Table A.11: Simulation Results of the Average Partial Effects Estimates (N=1000, η1 = 0.5, a1 is normally distributed, 500 replications)

                             y2 is assumed exogenous          y2 is assumed endogenous
Model             True       Linear     Tobit      Fractional Linear     Tobit      Fractional Fractional Fractional
                  APE                              Probit                BS         Probit     Probit     Probit
Estimation Method            OLS        MLE        QMLE       2SLS       MLE        QMLE-PW    NLS        QMLE
y2 continuous     -0.2379    -0.1599    -0.1876    -0.2369    -0.1625    -0.1885    -0.2375    -0.239     -0.239
                             (0.0053)   (0.0041)   (0.0050)   (0.0088)   (0.0051)   (0.0056)   (0.0049)   (0.0048)
                             [.0025]    [.0015]    [.00003]   [.0024]    [.0016]    [.00001]   [.00003]   [.00003]
y2 discrete 0-1   -0.3409               -0.2431    -0.3393               -0.2445    -0.3403    -0.3401    -0.3401
                                        (0.0040)   (0.0028)              (0.0063)   (0.0050)   (0.0023)   (0.0018)
                                        [.0031]    [.00005]              [.0031]    [.00002]   [.00003]   [.00003]
y2 discrete 1-2   -0.1361               -0.2032    -0.1358               -0.2037    -0.136     -0.136     -0.136
                                        (0.0019)   (0.0010)              (0.0026)   (0.0012)   (0.0011)   (0.0010)
                                        [.0021]    [.000007]             [.0021]    [.000002]  [.000002]  [.000002]
y2 discrete 2-3   -0.0215               -0.0195    -0.0217               -0.0192    -0.0216    -0.0216    -0.0216
                                        (0.0027)   (0.0006)              (0.0032)   (0.0007)   (0.0008)   (0.0006)
                                        [.00006]   [.000005]             [.00007]   [.000003]  [.000003]  [.000003]
x1                0.0238     0.0265     0.0223     0.024      0.0237     0.0224     0.024      0.0242     0.024
                             (0.0165)   (0.0084)   (0.0059)   (0.0189)   (0.0084)   (0.0059)   (0.0064)   (0.0061)
x2                0.0238     0.0234     0.0217     0.024      0.023      0.0218     0.024      0.0239     0.0239
                             (0.0179)   (0.0103)   (0.0064)   (0.0192)   (0.0104)   (0.0064)   (0.0064)   (0.0061)
Note: Figures in parentheses () are standard deviations; figures in square brackets [] are RMSEs.

Table A.12: Simulation Results of the Average Partial Effects Estimates (N=1000, η1 = 0.5, 500 replications)

APE (QMLE)        True APE   Mean       SD         MSE        Rejection rate
y2 continuous     -0.2347    -0.2371    0.005      0.0051     0.046
y2 discrete 0-1   -0.32      -0.3204    0.003      0.0029     0.045
y2 discrete 1-2   -0.1273    -0.1278    0.0016     0.0014     0.046
y2 discrete 2-3   -0.0212    -0.0212    0.001      0.0009     0.048

Table A.13: Comparison of Analytical and Bootstrapped Means of Standard Errors (N=1000, η1 = 0.5, 200 replications)

Model: Fractional Probit
Estimation Method   QMLE                          NLS
Standard error      analytical    bootstrapping   analytical    bootstrapping
y2 continuous       -0.2406       -0.2406         -0.2405       -0.2405
                    (0.0043)      (0.0041)        (0.0046)      (0.0043)
y2 discrete 0-1     -0.3203       -0.3203         -0.32         -0.32
                    (0.0030)      (0.0028)        (0.0038)      (0.0034)
y2 discrete 1-2     -0.1279       -0.1279         -0.128        -0.128
                    (0.0015)      (0.0013)        (0.0016)      (0.0014)
y2 discrete 2-3     -0.0214       -0.0214         -0.0215       -0.0215
                    (0.0010)      (0.0010)        (0.0012)      (0.0012)
Note: Figures in parentheses () are means of standard errors. Figures not in
parentheses are the APE estimates. Bootstrapped standard errors are
obtained using 100 bootstrap replications.
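
The bootstrapped standard errors in Table A.13 come from resampling the data and re-estimating the APEs. The Python sketch below illustrates only the generic nonparametric bootstrap idea on a simple statistic (a sample mean), not the dissertation's two-step estimator; the function name bootstrap_se, the simulated data, and the seeds are all hypothetical choices made for illustration.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=100, seed=0):
    """Standard deviation of `statistic` across n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        # Resample n observations with replacement and recompute the statistic.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(statistic(resample))
    return statistics.stdev(draws)


if __name__ == "__main__":
    rng = random.Random(123)
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # simulated data, illustration only
    analytical = statistics.stdev(sample) / len(sample) ** 0.5
    bootstrapped = bootstrap_se(sample, statistics.mean, n_boot=100)
    print(f"analytical SE = {analytical:.4f}, bootstrapped SE = {bootstrapped:.4f}")
```

With only 100 bootstrap replications, the two standard errors should be close but not identical, which is the pattern Table A.13 reports for the QMLE and NLS estimators.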

Table A.14: Frequencies of the Number of Children

Number of kids   Frequency   Percent   Cumulative relative frequency
0                16,200      50.9      50.9
1                10,000      31.42     82.33
2                3,733       11.73     94.06
3                1,373       4.31      98.37
4                323         1.01      99.39
5                134         0.42      99.81
6                47          0.15      99.96
7                6           0.02      99.97
8                4           0.01      99.99
9                2           0.01      99.99
10               2           0.01      100
Total            31,824      100

Table A.15: Descriptive Statistics

Variable   Description                                             Mean     S.D.     Min   Max
frhour     Women's weekly fractional working hours                 0.126    0.116    0     0.589
kidno      Number of kids                                          0.752    0.977    0     10
age        Mother's age in years                                   29.742   3.613    21    35
agefstm    Mother's age in years when first child was born         20.118   2.889    15    32
hispan     =1 if race is hispanic; =0 if race is black             0.593    0.491    0     1
nonmomi    Non-mom's labor income                                  31.806   20.375   0     157.4
edu        Education = Number of schooling years                   11.005   3.305    0     20
samesex    =1 if the 1st 2 kids have the same sex; =0 otherwise    0.503    0.5      0     1
multi2nd   =1 if the 2nd birth is twin; =0 otherwise               0.009    0.093    0     1

Table A.16: First-stage Estimates using Instrumental Variables

Dependent Variable - Kidno   Linear model (OLS)   Negative Binomial II model (MLE)
edu                          -0.065               -0.078
                             (0.002)              (0.002)
age                          0.096                0.119
                             (0.002)              (0.002)
agefstm                      -0.114               -0.156
                             (0.002)              (0.003)
hispan                       0.036                0.045
                             (0.010)              (0.015)
nonmomi                      -0.002               -0.003
                             (0.000)              (0.000)
samesex                      0.075                0.098
                             (0.010)              (0.013)
multi2nd                     0.786                0.728
                             (0.052)              (0.045)
constant                     0.911                0.013
                             (0.042)              (0.067)
Note: Figures in parentheses are robust standard errors.

Table A.17: Estimates Assuming Number of Kids is Conditionally Exogenous

Model                 Linear        Tobit                       Fractional Probit
Estimation Method     OLS           MLE                         QMLE
                      Coefficient   Coefficient   APE           Coefficient   APE
kidno (continuous)    -0.019        -0.034        -0.0225       -0.099        -0.0202
                      (0.0007)      (0.0013)      (0.0008)      (0.0040)      (0.0008)
0-1                                               -0.0231                     -0.0207
                                                  (0.0008)                    (0.0008)
1-2                                               -0.0207                     -0.0185
                                                  (0.0007)                    (0.0007)
2-3                                               -0.0183                     -0.0163
                                                  (0.0005)                    (0.0005)
edu                   0.004         0.008         0.005         0.022         0.005
                      (0.0002)      (0.0004)      (0.0002)      (0.0010)      (0.0002)
age                   0.005         0.008         0.006         0.024         0.005
                      (0.0002)      (0.0003)      (0.0002)      (0.0010)      (0.0002)
agefstm               -0.006        -0.01         -0.007        -0.03         -0.006
                      (0.0003)      (0.0004)      (0.0003)      (0.0010)      (0.0003)
hispan                -0.032        -0.052        -0.034        -0.15         -0.031
                      (0.0010)      (0.0022)      (0.0014)      (0.0070)      (0.0013)
nonmomi               -0.0003       -0.0006       -0.0004       -0.002        -0.0004
                      (0.0000)      (0.0001)      (0.0000)      (0.0002)      (0.0000)
Note: Figures in parentheses under the Coefficient columns are robust standard errors.
Figures in parentheses under the APE columns are bootstrapped standard errors.

Table A.18: Estimates Assuming Number of Kids is Endogenous

Model               Linear      Tobit (BS)              Fractional Probit       Fractional Probit       Fractional Probit
Estimation Method   2SLS        MLE                     QMLE-PW                 QMLE                    NLS
                                                        Kidno is assumed cont.
                    Coef.       Coef.       APE         Coef.       APE         Coef.       APE         Coef.       APE
kidno (continuous)  -0.016      -0.027      -0.018      -0.078      -0.016      -0.081      -0.017      -0.081      -0.017
                    (0.0070)    (0.0130)    (0.0080)    (0.0370)    (0.0080)    (0.0070)    (0.0010)    (0.0070)    (0.0010)
0-1                                         -0.018                  -0.016                  -0.017                  -0.017
                                            (0.0080)                (0.0080)                (0.0010)                (0.0010)
1-2                                         -0.017                  -0.015                  -0.015                  -0.015
                                            (0.0070)                (0.0070)                (0.0010)                (0.0010)
2-3                                         -0.015                  -0.014                  -0.014                  -0.014
                                            (0.0060)                (0.0050)                (0.0010)                (0.0010)
edu                 0.004       0.009       0.006       0.024       0.005       0.024       0.005       0.024       0.005
                    (0.0005)    (0.0009)    (0.0006)    (0.0020)    (0.0005)    (0.0010)    (0.0005)    (0.0010)    (0.0005)
age                 0.005       0.008       0.005       0.022       0.004       0.021       0.004       0.021       0.004
                    (0.0007)    (0.0010)    (0.0008)    (0.0040)    (0.0008)    (0.0010)    (0.0008)    (0.0010)    (0.0008)
agefstm             -0.006      -0.01       -0.006      -0.028      -0.006      -0.027      -0.005      -0.027      -0.005
                    (0.0008)    (0.0010)    (0.0010)    (0.0040)    (0.0009)    (0.0020)    (0.0008)    (0.0020)    (0.0008)
hispan              -0.032      -0.052      -0.034      -0.15       -0.031      -0.151      -0.031      -0.151      -0.031
                    (0.0010)    (0.0020)    (0.0010)    (0.0070)    (0.0010)    (0.0070)    (0.0010)    (0.0070)    (0.0010)
nonmomi             -0.0003     -0.0005     -0.0004     -0.002      -0.0003     -0.002      -0.0003     -0.002      -0.0003
                    (0.00004)   (0.00006)   (0.00004)   (0.00002)   (0.00004)   (0.00020)   (0.00004)   (0.00020)   (0.00004)
Note: Figures in parentheses under the Coefficient columns are robust standard errors. Figures in parentheses under the APE
columns are bootstrapped standard errors; those under the APEs for a count endogenous variable with the QMLE and NLS
methods are computed standard errors.

Appendix B
TABLES AND FIGURES FOR CHAPTER 2

Table B.1: Summary Statistics

Variable Description                           Mean       Standard deviation
Annual Hours                                   1105.7     886.52
Experience (years)                             11.89      7.71
Education (years)                              12.94      2.27
Age (years)                                    41.42      10.18
Number of children aged 0-2                    0.13       0.37
Number of children aged 3-5                    0.18       0.42
Number of children aged 6-17                   0.84       1.01
Married (= 1 if married)                       0.88       0.32
Husband's employment status (=1 if working)    0.82       0.39
Non-wife income (thousand dollars)             36,622.4   41,704
Number of observations                         11,232
Number of women                                864
Number of years                                13

Table B.2: Determinants of Female Working Experience - First Stage Regressions

Dependent variable: Female Working Experience
Number of children aged 0-2     -0.442**
                                [0.200]
Number of children aged 3-5     -0.707***
                                [0.169]
Number of children aged 6-17    -1.207***
                                [0.162]
Years of schooling              0.470***
                                [0.096]
Married                         -1.191
                                [1.186]
Husband's work participation    -1.256
                                [0.913]
Non-wife income                 -0.00003***
                                [0.00001]
Age                             1.174***
                                [0.138]
Age squared                     -0.010***
                                [0.002]
η                               0.958
Number of observations          11,232
Number of women                 864
R-squared                       0.37
F-Statistics on IVs             196.26
Note: *, **, ***: significant at the 10%, 5% and 1% levels respectively. Other explanatory
variables include time dummies and time averages of all explanatory variables.
Standard errors robust to heteroskedasticity and serial correlation are inside
square brackets. Instrumental variables (IVs) are age and age squared.

Table B.3: Estimating Dynamic Female Labor Supply, Second Stage Regressions, Experience is
Treated as an Endogenous Variable

Model                    Dynamic Linear   Tobit                 Tobit
Estimation Method        GMM              Correlated RE (CRE)   CRE with serial
                                                                correlation correction
                         [1]              [2]                   [3]
Lagged Hours             0.857***         0.542***              0.492***
                         [0.012]          [0.009]               [0.025]
Experience               4.683***         4.964**               13.207***
                         [1.514]          [2.381]               [1.582]
Children 0-2             -37.657**        -73.537***            -148.978***
                         [15.718]         [15.884]              [18.038]
Children 3-5             0.371            -44.292***            -97.080***
                         [12.932]         [13.66]               [15.628]
Children 6-17            29.820***        46.139***             7.103
                         [5.81]           [8.108]               [8.364]
Education                7.979**          -6.849                3.975
                         [3.147]          [9.415]               [9.577]
Married                  -134.671***      -253.253**            -234.204**
                         [33.576]         [124.741]             [117.946]
Husband's work status    136.438***       205.205***            195.717***
                         [26.787]         [25.982]              [28.254]
Non-wife income          -0.001***        0.001***              -0.001***
                         [0.0004]         [0.0002]              [0.0002]
Initial Condition                         0.161***              0.102***
                                          [0.038]               [0.011]
v2it                                      -59.413***
                                          [3.643]
v2it*                                                           427.820***
                                                                [47.673]
Observations             10368            10368                 10368
Number of women          864              864                   864
Note: *, **, ***: significant at the 10%, 5% and 1% levels respectively. z i and v2i
are included in (2) and (3) but not reported in the table. The first stage residual
in (3) is free of serial correlation. Standard errors corrected for the first
stage estimation are inside square brackets.

Table B.4: Average Partial Effects on Female Labor Supply

Model                    Dynamic Linear   Tobit            Tobit
Estimation Method        GMM              CRE              CRE-SC
                         [1]              [2]              [3]
Lagged Hours             0.857***         0.469***         0.434***
                         [0.012]          [0.012]          [0.011]
Experience               4.683***         4.294            11.481***
                         [1.514]          [3.074]          [2.662]
Children 0-2             -37.657**        -63.616***       -122.552***
                         [15.718]         [17.612]         [14.848]
Children 3-5             0.371            -38.317**        -80.443***
                         [12.932]         [12.932]         [10.343]
Children 6-17            29.820***        39.914***        6.262
                         [5.81]           [10.10]          [8.722]
Education                7.979**          -5.925           3.504
                         [3.147]          [12.38]          [9.669]
Married                  -134.671***      -219.09          -251.11
                         [33.576]         [278.695]        [244.441]
Husband's work status    136.438***       177.521***       161.88***
                         [26.787]         [58.886]         [35.728]
Non-wife income          -0.001***        0.001***         -0.001***
                         [0.0004]         [0.0003]         [0.0004]
Replications                              100              100
Note: *, **, ***: significant at the 10%, 5% and 1% levels respectively. The figures
inside square brackets are bootstrapped standard errors with 100 replications.

Figure B.1: Distribution of Women's Annual Hours of Work in 1980-1992
(Horizontal axis: annual work hours; vertical axis: percent.)

Figure B.2: Hours of Work vs. Experience
(Horizontal axis: experience in years.)

Figure B.3: Hours of Work vs. Number of Children 0-2
(Horizontal axis: number of children in FU aged 0-2.)

Figure B.4: Hours of Work vs. Number of Children 3-5
(Horizontal axis: number of children in FU aged 3-5.)

Figure B.5: Hours of Work vs. Number of Children 6-17
(Horizontal axis: number of children in FU aged 6-17.)

Appendix C
TABLES FOR CHAPTER 3

Table C.1: Summary Statistics for the Whole Sample

Variable Description                                      Mean      S.D.
Female labor participation                                0.95      0.21
Annual Hours                                              1938.02   755.29
Education (years)                                         7.29      3.88
Age (years)                                               39.18     7.21
Newborn aged 0-2                                          0.1       0.3
Spouse's age (years)                                      41.9      7.48
Spouse's education (years)                                8.2       3.86
Non-wife income (millions)                                8.44      16.13
First child's gender (=1 if a boy, =0 if a girl)          0.55      0.5
First two children have same sex (=1 if yes, =0 if not)   0.56      0.5
Live in urban                                             0.18      0.38
Live with grandparent                                     0.09      0.29
Work on farm                                              0.84      0.37
Ethnic (=1 if major, =0 if minor)                         0.79      0.4
Number of women                                           665
Number of observations                                    1995
Note: N=665 women. Years = 2004, 2006, 2008.
S.D. stands for standard deviation.

Table C.2: Summary Statistics for Each Year in the Panel

                                                          2004                2006                2008
Variable Description                                      Mean      S.D.      Mean      S.D.      Mean      S.D.
Annual Hours (hours)                                      1818.47   826.48    1842.79   839.42    1792.62   869.48
Female labor participation (=1 if work, =0 if not)        0.96      0.19      0.96      0.2       0.94      0.23
Newborn aged 0-1 (=1 if yes, =0 if not)                   0.16      0.36      0.09      0.28      0.06      0.25
Education (years)                                         7.23      3.81      7.31      3.85      7.32      3.98
Age (years)                                               37.23     7.03      39.16     7         41.13     7.09
Non-wife income (million dongs)                           6.32      14.77     7.89      13.05     11.1      19.53
Spouse's age (years)                                      39.97     7.32      41.9      7.32      43.83     7.3
Spouse's education (years)                                8.12      3.81      8.18      3.8       8.3       3.96
First two children have same sex (=1 if yes, =0 if not)   0.59      0.49      0.57      0.5       0.52      0.5
Live in urban (=1 if yes, =0 if not)                      0.17      0.38      0.18      0.38      0.19      0.39
Work on farm (=1 if yes, =0 if not)                       0.86      0.35      0.83      0.38      0.82      0.38
Ethnic (=1 if major, =0 if minor)                         0.8       0.4       0.8       0.4       0.79      0.41
Number of observations                                    665                 665                 665

Table C.3: Bivariate Probit Estimates of Fertility and LFP in the First Stage

                          Fertility Equation   LFP Equation
Dependent Variable        Newborn              LFP
                          [1]                  [2]             [3]
Explanatory Variable                           Coefficient     APE/ATE
Newborn                                        -0.136***       -0.127***
                                               [0.05]          [0.03]
Samesex                   1.024***             -               -
                          [0.374]
Non-wife income           0.008***             -0.001***       -0.001***
                          [0.003]              [0.0002]        [0.0002]
Age                       -0.302               -0.031*         -0.019*
                          [0.204]              [0.017]         [0.011]
Age squared               0.002                0.0001*         0.0001
                          [0.002]              [0.00006]       [0.0001]
Education                 -0.06                -0.005          -0.003
                          [0.08]               [0.03]          [0.03]
Husband's age             -                    -0.007*         -0.005*
                                               [0.004]         [0.003]
Husband's age squared     -                    0.0001          -0.0001
                                               [0.0003]        [0.0002]
Husband's education       -                    -0.07*          -0.05*
                                               [0.05]          [0.03]
Cov(v2it,v3it) = ρ        -0.165
                          [0.04]
Log likelihood            -866.48
Number of observations    1995
Note: N=665, T=3. Time averages of explanatory variables and year dummies for 2006
and 2008 are included. Figures in square brackets are clustered standard errors
to control for serial correlation across time.
*, **, ***: significant at the 10%, 5% and 1% levels respectively.

Table C.4: Estimates for Log(Female Working Hours) Equation

Explanatory              Pooled      Pooled      Fixed       Fixed       SW          Proposed
Variable                 OLS         2SLS        Effect      Effect 2SLS Procedure   Procedure
                         [1]         [2]         [3]         [4]         [5]         [6]
Newborn                  -0.134***   -0.232***   -0.164***   -0.277***   -0.308***   -0.345***
                         [0.04]      [0.053]     [0.044]     [0.082]     [0.117]     [0.108]
Education                0.023***    0.024***    0.018*      0.014       0.019*      0.016**
                         [0.004]     [0.006]     [0.01]      [0.011]     [0.012]     [0.008]
Age                      -0.014      0.003       0.039       -0.002      0.035       -0.02**
                         [0.017]     [0.046]     [0.033]     [0.067]     [0.037]     [0.01]
Age squared              0.0001      -0.0001     -0.001      0.0002      -0.0005     0.0001**
                         [0.0002]    [0.0004]    [0.0004]    [0.001]     [0.0005]    [0.0001]
Non-wife income          0.001       0.001       0.001*      0.002*      0.002**     0.002***
                         [0.001]     [0.001]     [0.001]     [0.001]     [0.001]     [0.0006]
Urban                    0.090**     0.088**     0.077       0.053       0.091**     0.095***
                         [0.039]     [0.04]      [0.085]     [0.197]     [0.038]     [0.037]
Work on Farm             -0.318***   -0.317***   -0.101*     -0.104      -0.329***   -0.328***
                         [0.037]     [0.037]     [0.054]     [0.066]     [0.033]     [0.034]
Ethnic                   -0.143***   -0.138***   -0.176      -0.195      -0.142***   -0.143***
                         [0.034]     [0.037]     [0.161]     [0.235]     [0.028]     [0.028]
R-square                 0.1         0.08        0.07        0.03        0.1         0.11
Number of observations   1904        1904        1904        1904        1904        1904
Note: The dependent variable is log(hours), with 1904 observations of positive hours. Year dummy
variables and time averages of explanatory variables are included. Standard errors are robust to
serial correlation and heteroskedasticity. Standard errors in the SW and proposed procedures are
corrected for the first-step estimation. *, **, ***: significant at the 10%, 5% and 1% levels respectively.

Appendix D
TECHNICALITIES FOR CHAPTER 1

D.1 Details of the QML Estimator
D.1.1 Asymptotic Variance for the Two-step Estimator
This section derives asymptotic standard errors for the QML estimator in the second step. The
adjusted asymptotic standard errors for the NLS estimator can be derived in a similar way. In the
first stage, we have $y_2 \mid z, a_1 \sim \mathrm{Poisson}[\exp(z\delta_2 + a_1)]$ with the conditional density function:
\[
f(y_2 \mid z, a_1) = \frac{[\exp(z\delta_2 + a_1)]^{y_2} \exp[-\exp(z\delta_2 + a_1)]}{y_2!}. \tag{D.1}
\]

The density of $y_2$ conditional only on $z$ is obtained by integrating $a_1$ out of the
joint density. That is:
\[
f(y_2 \mid z) = \int_{a_1} f(y_2 \mid z, a_1)\, f(a_1)\, da_1,
\]
in which $f(a_1) = \delta_0^{\delta_0} \exp(a_1)^{\delta_0 - 1} \exp(-\delta_0 \exp(a_1))\, \Gamma^{-1}(\delta_0)$.
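Let $m = \exp(z\delta_2)$ and $c = \exp(a_1)$, then the conditional density is:
\[
f(y_2 \mid z, a_1) = \frac{[mc]^{y_2} \exp[-mc]}{\Gamma(y_2 + 1)},
\]
and the unconditional density is:
\[
f(y_2 \mid z) = \int_0^{\infty} \frac{[mc]^{y_2} \exp[-mc]}{\Gamma(y_2 + 1)} \, \frac{\delta_0^{\delta_0} c^{\delta_0 - 1} \exp(-\delta_0 c)}{\Gamma(\delta_0)} \, dc.
\]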

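This is equivalent to:
\[
f(y_2 \mid z) = \frac{m^{y_2} \delta_0^{\delta_0}}{\Gamma(y_2 + 1)\Gamma(\delta_0)} \int_0^{\infty} \exp[-c(m + \delta_0)]\, c^{y_2 + \delta_0 - 1}\, dc,
\]
or
\[
f(y_2 \mid z) = \frac{m^{y_2} \delta_0^{\delta_0}}{\Gamma(y_2 + 1)\Gamma(\delta_0)} \, \frac{\Gamma(y_2 + \delta_0)}{(m + \delta_0)^{(y_2 + \delta_0)}}. \tag{D.2}
\]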

Defining $h = \delta_0 / (m + \delta_0)$ results in:
\[
f(y_2 \mid z) = \frac{\Gamma(y_2 + \delta_0)\, h^{\delta_0} (1 - h)^{y_2}}{\Gamma(y_2 + 1)\Gamma(\delta_0)}, \tag{D.3}
\]

where y2 = 0, 1, ... and δ0 > 0, which is the density function for the negative binomial distribution.
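The log-likelihood for observation $i$ is:
\[
l_i(\delta_2, \delta_0) = \delta_0 \ln\left[\frac{\delta_0}{\delta_0 + \exp(z_i\delta_2)}\right] + y_{2i} \ln\left[\frac{\exp(z_i\delta_2)}{\delta_0 + \exp(z_i\delta_2)}\right] + \ln\left[\frac{\Gamma(y_{2i} + \delta_0)}{\Gamma(y_{2i} + 1)\Gamma(\delta_0)}\right]. \tag{D.4}
\]
For all observations:
\[
L(\delta_2, \delta_0) = \sum_{i=1}^{N} l_i(\delta_2, \delta_0). \tag{D.5}
\]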

We can estimate jointly δ2 and δ0 by maximum likelihood estimation method.
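
The mixture argument behind (D.2)-(D.4) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names (`nb_logpmf`, `mixture_pmf`, `nb_loglik`) and the plain midpoint rule are hypothetical choices made here, not part of the dissertation's estimation code. It integrates the Poisson density against the Gamma($\delta_0$, $1/\delta_0$) density of $c = \exp(a_1)$ and compares the result with the closed-form negative binomial density in (D.3).

```python
import math

def nb_logpmf(y, m, delta0):
    """Log of (D.3): negative binomial density with h = delta0 / (m + delta0)."""
    h = delta0 / (m + delta0)
    return (math.lgamma(y + delta0) - math.lgamma(y + 1) - math.lgamma(delta0)
            + delta0 * math.log(h) + y * math.log(1.0 - h))

def mixture_pmf(y, m, delta0, upper=60.0, steps=20000):
    """Integrate Poisson(m*c) against the Gamma(delta0, 1/delta0) density of c.

    Uses a simple midpoint rule on (0, upper); this is an illustrative
    assumption, not the quadrature used in the text.
    """
    width = upper / steps
    total = 0.0
    for j in range(steps):
        c = (j + 0.5) * width
        log_pois = y * math.log(m * c) - m * c - math.lgamma(y + 1)
        log_gam = (delta0 * math.log(delta0) + (delta0 - 1.0) * math.log(c)
                   - delta0 * c - math.lgamma(delta0))
        total += math.exp(log_pois + log_gam) * width
    return total

def nb_loglik(y2, z, delta2, delta0):
    """Sum of (D.4) over observations, i.e. (D.5); z is a list of rows."""
    total = 0.0
    for yi, zi in zip(y2, z):
        m = math.exp(sum(a * b for a, b in zip(zi, delta2)))
        total += nb_logpmf(yi, m, delta0)
    return total
```

In practice `nb_loglik` would be passed (with its sign flipped) to a numerical optimizer to obtain the joint estimates of $\delta_2$ and $\delta_0$; the choice of optimizer is left open here.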
Let $\gamma = (\delta_2', \delta_0)'$, which has dimension $(L + 1)$, where $L$ is the dimension of $\delta_2$, that is, the sum
of $K$ and the number of instruments. Under standard regularity conditions, we have:
\[
\sqrt{N}(\hat{\gamma} - \gamma) = N^{-1/2} \sum_{i=1}^{N} r_{i2} + o_p(1), \tag{D.6}
\]

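where
\[
r_{i2} = \begin{pmatrix} -A_{01}^{-1} s_{01} \\ -A_{02}^{-1} s_{02} \end{pmatrix}, \tag{D.7}
\]
in which
\[
s_0 = \begin{pmatrix} \nabla_{\delta_2} l_i \\ \nabla_{\delta_0} l_i \end{pmatrix} = \begin{pmatrix} s_{01} \\ s_{02} \end{pmatrix},
\]
and
\[
A_0 = E(\nabla^2_{\gamma} l_i) = E\begin{pmatrix} \nabla^2_{\delta_2} l_i \\ \nabla^2_{\delta_0} l_i \end{pmatrix} = E\begin{pmatrix} H_{01} \\ H_{02} \end{pmatrix} = \begin{pmatrix} A_{01} \\ A_{02} \end{pmatrix}.
\]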

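After taking the first derivative and the second derivative, we have:
\[
s_{01} = \frac{z_i' \delta_0 (y_{2i} - \exp(z_i\delta_2))}{\delta_0 + \exp(z_i\delta_2)}, \tag{D.8}
\]
\[
H_{01} = -\frac{z_i' z_i \delta_0 \exp(z_i\delta_2)}{\delta_0 + \exp(z_i\delta_2)}, \tag{D.9}
\]
\[
s_{02} = \ln\left(\frac{\delta_0}{\delta_0 + \exp(z_i\delta_2)}\right) + \frac{\exp(z_i\delta_2) - y_{2i}}{\delta_0 + \exp(z_i\delta_2)} + \frac{\Gamma'(y_{2i} + \delta_0)}{\Gamma(y_{2i} + \delta_0)} - \frac{\Gamma'(\delta_0)}{\Gamma(\delta_0)}, \tag{D.10}
\]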

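\[
H_{02} = H_{021} + H_{022}, \tag{D.11}
\]
where
\[
H_{021} = \frac{\exp(z_i\delta_2)}{\delta_0[\delta_0 + \exp(z_i\delta_2)]} - \frac{\exp(z_i\delta_2) - y_{2i}}{[\delta_0 + \exp(z_i\delta_2)]^2},
\]
and
\[
H_{022} = \frac{\Gamma''(y_{2i} + \delta_0)\Gamma(y_{2i} + \delta_0) - [\Gamma'(y_{2i} + \delta_0)]^2}{[\Gamma(y_{2i} + \delta_0)]^2} - \frac{\Gamma''(\delta_0)\Gamma(\delta_0) - [\Gamma'(\delta_0)]^2}{[\Gamma(\delta_0)]^2},
\]
where $s_{01}$ and $H_{01}$ are $L \times 1$ and $L \times L$ matrices; $s_{02}$ and $H_{02}$ are both $1 \times 1$. $r_{i2}(\gamma)$
has the dimension of $(L + 1) \times 1$.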

With the two-step M-estimator, the asymptotic variance of $\sqrt{N}(\hat{\theta} - \theta)$ must be adjusted to
account for the first-stage estimation of $\sqrt{N}(\hat{\gamma} - \gamma)$ (see more in 12.4.2 of chapter 12, Wooldridge,
2002).
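The score of the QML (or the gradient) for observation $i$ with respect to $\theta$ is:
\[
\begin{aligned}
s_i(\theta; \gamma) &= \nabla_\theta l_i(\theta), \\
&= y_{1i} \frac{\nabla_\theta \mu_i}{\mu_i} - (1 - y_{1i}) \frac{\nabla_\theta \mu_i}{1 - \mu_i}, \\
&= \frac{y_{1i} \nabla_\theta \mu_i (1 - \mu_i) - \mu_i (1 - y_{1i}) \nabla_\theta \mu_i}{\mu_i (1 - \mu_i)}, \\
&= \frac{y_{1i} \nabla_\theta \mu_i - \mu_i \nabla_\theta \mu_i}{\mu_i (1 - \mu_i)}, \\
&= \frac{(y_{1i} - \mu_i) \nabla_\theta \mu_i}{\mu_i (1 - \mu_i)}, \\
&= \frac{(y_{1i} - \mu_i)}{\mu_i (1 - \mu_i)} \int_{-\infty}^{+\infty} \frac{\partial \Phi(g_i\theta)}{\partial \theta} f(a_1 \mid y_2, z)\, da_1,
\end{aligned}
\]
\[
s_i(\theta; \gamma) = \frac{(y_{1i} - \mu_i)}{\mu_i (1 - \mu_i)} \int_{-\infty}^{+\infty} g_i' \phi(g_i\theta) f(a_1 \mid y_2, z)\, da_1, \tag{D.12}
\]
where $g_i = (y_{2i}, z_{1i}, a_{1i})$ and $\theta = (\alpha_1, \delta_1', \eta_1)'$, and $\theta$ has the dimension of $K + 2$.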
\[
\sqrt{N}(\hat{\theta} - \theta) = A_1^{-1} \left( N^{-1/2} \sum_{i=1}^{N} r_{i1}(\theta; \gamma) \right) + o_p(1), \tag{D.13}
\]

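\[
\begin{aligned}
A_1 &= E[-\nabla_\theta s_i(\theta; \gamma)], \\
&= E\left[\frac{(\nabla_\theta \mu_i)' \nabla_\theta \mu_i}{\mu_i (1 - \mu_i)}\right], \\
&= E\left[\frac{1}{\mu_i (1 - \mu_i)} B'B\right].
\end{aligned}
\]
\[
\hat{A}_1 = N^{-1} \sum_{i=1}^{N} \left[\frac{1}{\hat{\mu}_i (1 - \hat{\mu}_i)} \hat{B}'\hat{B}\right], \tag{D.14}
\]
where $B = \int_{-\infty}^{+\infty} g_i \phi(g_i\theta) f(a_1 \mid y_2, z)\, da_1$.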

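\[
\begin{aligned}
r_{i1}(\theta; \gamma) &= s_i(\theta; \gamma) - F_1 r_{i2}(\gamma), \\
\hat{r}_{i1}(\hat{\theta}; \hat{\gamma}) &= \hat{s}_i(\hat{\theta}; \hat{\gamma}) - \hat{F}_1 \hat{r}_{i2}(\hat{\gamma}),
\end{aligned} \tag{D.15}
\]
where $r_{i1}(\theta; \gamma)$ and $s_i(\theta; \gamma)$ are $(K + 2) \times 1$ matrices, $r_{i2}(\gamma)$ and $F_1$ are $(L + 1) \times 1$ and $(K + 2) \times
(L + 1)$ matrices, and $A_1$ is a $(K + 2) \times (K + 2)$ matrix.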

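\[
F_1 = E[\nabla_\gamma s_i(\theta; \gamma)] = \Big( E[\nabla_{\delta_2} s_i(\theta; \gamma)] \;\Big|\; E[\nabla_{\delta_0} s_i(\theta; \gamma)] \Big),
\]
where
\[
\begin{aligned}
E[\nabla_{\delta_2} s_i(\theta; \gamma)] &= E\left[\frac{-1}{\mu_i (1 - \mu_i)} B' \int_{-\infty}^{+\infty} \Phi(g_i\theta) \frac{\partial f(a_1 \mid y_2, z)}{\partial \delta_2}\, da_1\right], \\
E[\nabla_{\delta_0} s_i(\theta; \gamma)] &= E\left[\frac{-1}{\mu_i (1 - \mu_i)} B' \int_{-\infty}^{+\infty} \Phi(g_i\theta) \frac{\partial f(a_1 \mid y_2, z)}{\partial \delta_0}\, da_1\right],
\end{aligned}
\]
\[
\hat{F}_1 = \frac{1}{N} \sum_{i=1}^{N} \Big( -[\hat{\mu}_i (1 - \hat{\mu}_i)]^{-1} \hat{B}' \int_{-\infty}^{+\infty} \Phi(g_i\hat{\theta}) [\partial f(a_1 \mid y_2, z)/\partial \delta_2]\, da_1 \;\Big|\; -[\hat{\mu}_i (1 - \hat{\mu}_i)]^{-1} \hat{B}' \int_{-\infty}^{+\infty} \Phi(g_i\hat{\theta}) [\partial f(a_1 \mid y_2, z)/\partial \delta_0]\, da_1 \Big), \tag{D.16}
\]
\[
\frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_2} = \frac{z_i\, P\, C\, [\delta_0 + \exp(z_i\delta_2)]^{(y_{2i} + \delta_0 - 1)}}{\Gamma(y_{2i} + \delta_0)}, \tag{D.17}
\]
in which
\[
P = -\exp(z_i\delta_2 + a_1) + a_1 (y_{2i} + \delta_0) - \delta_0 \exp(a_1),
\]
and
\[
C = \{(y_{2i} + \delta_0) \exp(z_i\delta_2) - \exp(z_i\delta_2 + a_1)[\delta_0 + \exp(z_i\delta_2)]\}.
\]
\[
\frac{\partial f(a_1 \mid y_2, z)}{\partial \delta_0} = f(a_1 \mid y_2, z)\, D, \tag{D.18}
\]
in which
\[
D = a_1 - a_1 \exp(a_1) + \ln(\delta_0 + \exp(z_i\delta_2)) + \frac{y_{2i} + \delta_0}{\delta_0 + \exp(z_i\delta_2)} - \Gamma'(y_{2i} + \delta_0),
\]
and
\[
f(a_1 \mid y_2, z) = \frac{\exp(P)\, [\delta_0 + \exp(z\delta_2)]^{(y_2 + \delta_0)}}{\Gamma(y_2 + \delta_0)}.
\]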

Therefore, we can obtain the asymptotic variance of the two-step estimator as:
\[
\mathrm{Avar}\, \sqrt{N}(\hat{\theta} - \theta) = A_1^{-1}\, \mathrm{Var}[r_{i1}(\theta; \gamma)]\, A_1^{-1}, \tag{D.19}
\]
and the estimator of this variance is:
\[
\widehat{\mathrm{Avar}}(\hat{\theta}) = \frac{1}{N} \hat{A}_1^{-1} \left( N^{-1} \sum_{i=1}^{N} \hat{r}_{i1} \hat{r}_{i1}' \right) \hat{A}_1^{-1}. \tag{D.20}
\]
The asymptotic standard errors are obtained by the square roots of the diagonal elements of
this matrix.

D.1.2 Asymptotic Variance for the APEs
First, we need to obtain the asymptotic variance of $\sqrt{N}(\hat{\psi} - \psi)$ for a continuous explanatory variable,
where:
\[
\hat{\psi} = \left( \int_{-\infty}^{+\infty} \phi(g\hat{\theta}) f(a_1 \mid y_2, z; \hat{\theta})\, da_1 \right) \hat{\theta}, \tag{D.21}
\]
is the vector of scaled coefficients times the scaled factor in the APE section, and
\[
\psi = \left( \int_{-\infty}^{+\infty} \phi(g\theta) f(a_1 \mid y_2, z; \theta)\, da_1 \right) \theta,
\]
is the vector of scaled population coefficients times the mean response.
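If $y_2$ is treated as a continuous variable:
\[
\widehat{APE} = \left( \int_{-\infty}^{+\infty} \phi(\hat{\alpha}_1 y_2 + z_1 \hat{\delta}_1 + \hat{\eta}_1 a_1) f(a_1 \mid y_2, z; \hat{\theta})\, da_1 \right) \hat{\alpha}_1.
\]
For a continuous variable $z_{11}$:
\[
\widehat{APE} = \left( \int_{-\infty}^{+\infty} \phi(\hat{\alpha}_1 y_2 + z_1 \hat{\delta}_1 + \hat{\eta}_1 a_1) f(a_1 \mid y_2, z; \hat{\theta})\, da_1 \right) \hat{\delta}_{11}. \tag{D.22}
\]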

Using problem 12.12 in Wooldridge (2002), and letting $\hat{\pi} = (\hat{\theta}', \hat{\delta}_2', \hat{\delta}_0)'$, we have:
\[
\sqrt{N}(\hat{\psi} - \psi) = N^{-1/2} \sum_{i=1}^{N} [j(g_i, z_i, \pi) - \psi] + E[\nabla_\pi j(g_i, z_i, \pi)] \sqrt{N}(\hat{\pi} - \pi) + o_p(1), \tag{D.23}
\]

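where
\[
j(g_i, z_i, \pi) = \left( \int_{-\infty}^{+\infty} \phi(g_i\theta) f(a_1 \mid y_2, z; \theta)\, da_1 \right) \theta,
\]
and
\[
f(a_1 \mid y_2, z) = f(a_1; \delta_0, \delta_0) \left[ \frac{\delta_0 + \exp(z\delta_2)}{\delta_0 + \exp(z\delta_2 + a_1)} \right]^{\delta_0 + y_2} [\exp(a_1)]^{y_2}.
\]
First, we need to find $\sqrt{N}(\hat{\pi} - \pi)$:
\[
\begin{aligned}
\sqrt{N}(\hat{\pi} - \pi) &= N^{-1/2} \sum_{i=1}^{N} \begin{pmatrix} A_1^{-1} r_{i1} \\ r_{i2} \end{pmatrix} + o_p(1), \\
\sqrt{N}(\hat{\pi} - \pi) &= N^{-1/2} \sum_{i=1}^{N} k_i + o_p(1).
\end{aligned} \tag{D.24}
\]
Thus the asymptotic variance of $\sqrt{N}(\hat{\psi} - \psi)$ is:
\[
\mathrm{Var}\left[ \left( \int_{-\infty}^{+\infty} \phi(g_i\theta) f(a_1 \mid y_{2i}, z_i)\, da_1 \right) \theta - \psi + J(\pi) k_i \right], \tag{D.25}
\]
where $J(\pi) = E[\nabla_\pi j(g_i, z_i, \pi)]$.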
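Next, we need to find $\nabla_\theta j(g_i, z_i, \pi)$, $\nabla_{\delta_2} j(g_i, z_i, \pi)$ and $\nabla_{\delta_0} j(g_i, z_i, \pi)$.
\[
\nabla_\theta j(g_i, z_i, \pi) = \left( \int_{-\infty}^{+\infty} \phi(g_i\theta) f(a_1 \mid y_{2i}, z_i)\, da_1 \right) I_{K+2} - \int_{-\infty}^{+\infty} \phi(g_i\theta) (g_i\theta) (\theta g_i) f(a_1 \mid y_{2i}, z_i)\, da_1, \tag{D.26}
\]
where $I_{K+2}$ is the identity matrix and $(K + 2)$ is the dimension of $\theta$.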

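\[
\nabla_{\delta_2} j(g_i, z_i, \pi) = \theta \left( \int_{-\infty}^{+\infty} \phi(g_i\theta) \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_2}\, da_1 \right), \tag{D.27}
\]
where $\partial f(a_{1i} \mid y_{2i}, z_i)/\partial \delta_2$ is defined in (D.17), and
\[
\nabla_{\delta_0} j(g_i, z_i, \pi) = \theta \left( \int_{-\infty}^{+\infty} \phi(g_i\theta) \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_0}\, da_1 \right), \tag{D.28}
\]
where $\partial f(a_{1i} \mid y_{2i}, z_i)/\partial \delta_0$ is defined in (D.18). $\nabla_{\delta_2} j(g_i, z_i, \pi)$ is a $(K + 2) \times L$ matrix and $\nabla_{\delta_0} j(g_i, z_i, \pi)$
is a $(K + 2) \times 1$ matrix.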
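Then,
\[
\nabla_\pi j(g_i, z_i, \pi) = \big[ \nabla_\theta j(g_i, z_i, \pi; \theta) \mid \nabla_{\delta_2} j(g_i, z_i, \pi; \delta_2) \mid \nabla_{\delta_0} j(g_i, z_i, \pi; \delta_0) \big], \tag{D.29}
\]
and its expected value is estimated as:
\[
\hat{J} = J(\hat{\pi}) = N^{-1} \sum_{i=1}^{N} \big[ \nabla_\theta j(g_i, z_i, \hat{\pi}; \hat{\theta}) \mid \nabla_{\delta_2} j(g_i, z_i, \hat{\pi}; \hat{\delta}_2) \mid \nabla_{\delta_0} j(g_i, z_i, \hat{\pi}; \hat{\delta}_0) \big]. \tag{D.30}
\]
Finally, $\mathrm{Avar}[\sqrt{N}(\hat{\psi} - \psi)]$ is consistently estimated as:
\[
\widehat{\mathrm{Avar}}\big[\sqrt{N}(\hat{\psi} - \psi)\big] = N^{-1} \sum_{i=1}^{N} \left[ \left( \int_{-\infty}^{+\infty} \phi(g_i\hat{\theta}) \hat{f}(a_1 \mid y_{2i}, z_i)\, da_1 \right) \hat{\theta} - \hat{\psi} + \hat{J}\hat{k}_i \right] \times \left[ \left( \int_{-\infty}^{+\infty} \phi(g_i\hat{\theta}) \hat{f}(a_1 \mid y_{2i}, z_i)\, da_1 \right) \hat{\theta} - \hat{\psi} + \hat{J}\hat{k}_i \right]', \tag{D.31}
\]
where all quantities are evaluated at the estimators given above. The asymptotic standard error for
any particular APE is obtained as the square root of the corresponding diagonal element of (D.31),
divided by $\sqrt{N}$.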
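Now we obtain the asymptotic variance of $\sqrt{N}(\hat{\lambda} - \lambda)$ for a count endogenous variable, where:
\[
APE = E_{a_1}[\Phi(\alpha_1 y_2^{k+1} + z_1\delta_1 + \eta_1 a_1) - \Phi(\alpha_1 y_2^{k} + z_1\delta_1 + \eta_1 a_1)]. \tag{D.32}
\]
For example, $y_2^{k} = 0$ and $y_2^{k+1} = 1$.
\[
\widehat{APE} = \int_{-\infty}^{+\infty} \Phi(g_i^{k+1}\hat{\theta}) f(a_1 \mid y_2, z; \hat{\theta})\, da_1 - \int_{-\infty}^{+\infty} \Phi(g_i^{k}\hat{\theta}) f(a_1 \mid y_2, z; \hat{\theta})\, da_1,
\]
\[
\begin{aligned}
\mathrm{Var}\big[\sqrt{N}(\hat{\lambda} - \lambda)\big] &= \mathrm{Var}\big\{\sqrt{N}\big[(\hat{\lambda}_{k+1} - \hat{\lambda}_k) - (\lambda_{k+1} - \lambda_k)\big]\big\}, \\
&= \mathrm{Var}\big[\sqrt{N}(\hat{\lambda}_{k+1} - \lambda_{k+1})\big] + \mathrm{Var}\big[\sqrt{N}(\hat{\lambda}_k - \lambda_k)\big] \\
&\quad - 2\,\mathrm{Cov}\big[\sqrt{N}(\hat{\lambda}_{k+1} - \lambda_{k+1}), \sqrt{N}(\hat{\lambda}_k - \lambda_k)\big].
\end{aligned}
\]
(1) We start with:
\begin{align}
\sqrt{N}(\hat{\lambda}_k - \lambda_k) &= N^{-1/2} \sum_{i=1}^{N} \big[ j(g_i^k, z_i, \pi) - \lambda_k \big] \tag{D.33} \\
&\quad + E[\nabla_\pi j(g_i^k, z_i, \pi)] \sqrt{N}(\hat{\pi} - \pi) + o_p(1), \tag{D.34}
\end{align}
where $j(g_i^k, z_i, \pi) = \int_{-\infty}^{+\infty} \Phi(g_i^k\theta) f(a_1 \mid y_{2i}, z_i)\, da_1$.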

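\[
\widehat{\mathrm{Var}}\big[\sqrt{N}(\hat{\lambda}_k - \lambda_k)\big] = N^{-1} \sum_{i=1}^{N} \left[ \int_{-\infty}^{+\infty} \Phi(g_i^k\hat{\theta}) f(a_1 \mid y_{2i}, z_i)\, da_1 - \hat{\lambda}_k + \hat{J}\hat{k}_i \right]^2, \tag{D.35}
\]
in which the notation $\hat{k}_i$ is the same as in (D.24) and $\hat{J}$ is defined as follows:
\[
\hat{J} = J(\hat{\pi}) = N^{-1} \sum_{i=1}^{N} \big[ \nabla_\theta j(g_i^k, z_i, \hat{\pi}; \hat{\theta}) \mid \nabla_{\delta_2} j(g_i^k, z_i, \hat{\pi}; \hat{\delta}_2) \mid \nabla_{\delta_0} j(g_i^k, z_i, \hat{\pi}; \hat{\delta}_0) \big], \tag{D.36}
\]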

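\[
\nabla_\theta j(g_i^k, z_i, \pi; \theta) = \int_{-\infty}^{+\infty} g_i^k \phi(g_i^k\theta) f(a_1 \mid y_{2i}, z_i)\, da_1, \tag{D.37}
\]
\[
\nabla_{\delta_2} j(g_i^k, z_i, \pi; \delta_2) = \int_{-\infty}^{+\infty} \Phi(g_i^k\theta) \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_2}\, da_1, \tag{D.38}
\]
\[
\nabla_{\delta_0} j(g_i^k, z_i, \pi; \delta_0) = \int_{-\infty}^{+\infty} \Phi(g_i^k\theta) \frac{\partial f(a_1 \mid y_{2i}, z_i)}{\partial \delta_0}\, da_1. \tag{D.39}
\]
(2) $\mathrm{Var}\big[\sqrt{N}(\hat{\lambda}_{k+1} - \lambda_{k+1})\big]$ is obtained in a similar way as (1).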

(3) Using the formula $\mathrm{Cov}(x, y) = E(xy) - E(x)E(y)$ to form the estimator of this covariance,
and noting that $E(\hat{\lambda}_k) = \lambda_k$, after some algebra the estimator of this covariance is 0.
Adding (1), (2) and (3) together, we get:
\[
\mathrm{Var}\big[\sqrt{N}(\hat{\lambda} - \lambda)\big] = \mathrm{Var}\big[\sqrt{N}(\hat{\lambda}_k - \lambda_k)\big] + \mathrm{Var}\big[\sqrt{N}(\hat{\lambda}_{k+1} - \lambda_{k+1})\big]. \tag{D.40}
\]
The asymptotic standard error for the APE of the count endogenous variable is obtained as the
square root of the corresponding diagonal element of (D.40), divided by $\sqrt{N}$.

D.2 Details of the Tobit Model’s Estimators
This appendix shows how to obtain the average partial effects for Tobit models in both cases where
y2 is assumed exogenous and endogenous respectively.
Following the Smith-Blundell (1986) approach, the model with endogenous $y_2$ is written as:
\[
y_1 = \max(0, \alpha_1 y_2 + z_1\delta_1 + v_2\xi_1 + e_1), \tag{D.41}
\]
where the reduced form of $y_2$ is:
\[
y_2 = z\pi_2 + v_2, \quad v_2 \mid z \sim \mathrm{Normal}(0, \Sigma_2), \tag{D.42}
\]
and $e_1 \mid z, v_2 \sim \mathrm{Normal}(0, \sigma_e^2)$. The conditional mean of $y_1$ is:
\[
\begin{aligned}
E(y_1 \mid z, y_2, v_2) &= \Phi\big[(\alpha_1 y_2 + z_1\delta_1 + v_2\xi_1)/\sqrt{1 + \sigma_e^2}\big], \\
&= \Phi(\alpha_{1e} y_2 + z_1\delta_{1e} + v_2\xi_{1e}).
\end{aligned}
\]
The Smith-Blundell procedure for estimating $\alpha_1$, $\delta_1$, $\xi_1$ and $\sigma_e^2$ is then:

(i) Run the OLS regression of $y_{i2}$ on $z_i$ and save the residuals $\hat{v}_{i2}$, $i = 1, 2, \ldots, N$.
(ii) Do Tobit of $y_{i1}$ on $y_{i2}$, $z_{1i}$ and $\hat{v}_{i2}$ to get $\hat{\alpha}_{1e}$, $\hat{\delta}_{1e}$, and $\hat{\xi}_{1e}$, $i = 1, 2, \ldots, N$.
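APEs for the Tobit model with an exogenous or endogenous variable are obtained as follows:
* APE in Tobit Model with exogenous variable $y_2$
\[
y_1 = \max(0, y_1^*), \quad y_1^* = \alpha_1 y_2 + z_1\delta_1 + a_1, \quad a_1 \mid y_2, z_1 \sim N(0, \sigma^2).
\]
The conditional mean is:
\[
E(y_1 \mid z_1, y_2) = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})(\alpha_1 y_2 + z_1\delta_1) + \sigma \phi(\alpha_{1s} y_2 + z_1\delta_{1s}), \tag{D.43}
\]
where $\alpha_{1s} = \alpha_1/\sigma$, $\delta_{1s} = \delta_1/\sigma$.
We define $E(y_1 \mid z_1, y_2) = m(y_2, z_1, \theta_{1s}, \theta_1)$.
For a continuous variable $y_2$:
\[
APE = \frac{\partial E(y_1 \mid z_1, y_2)}{\partial y_2} = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})\alpha_1. \tag{D.44}
\]
The estimator for this APE is:
\[
\widehat{APE} = \frac{1}{N} \sum_{i=1}^{N} \Phi(\hat{\alpha}_{1s} y_{2i} + z_{1i}\hat{\delta}_{1s})\hat{\alpha}_1. \tag{D.45}
\]
For a discrete variable $y_2$ with the two values $c$ and $c + 1$:
\[
APE = m(y_{2i} = c + 1) - m(y_{2i} = c), \tag{D.46}
\]
and the estimator for this APE is:
\[
\widehat{APE} = \frac{1}{N} \sum_{i=1}^{N} \big[\hat{m}(y_{2i} = c + 1) - \hat{m}(y_{2i} = c)\big], \tag{D.47}
\]
where $\hat{m}(y_{2i} = c) = \Phi(\hat{\alpha}_{1s} c + z_{1i}\hat{\delta}_{1s})(\hat{\alpha}_1 c + z_{1i}\hat{\delta}_1) + \hat{\sigma} \phi(\hat{\alpha}_{1s} c + z_{1i}\hat{\delta}_{1s})$.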
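
As an illustration of (D.43)-(D.47), the sketch below evaluates the Tobit conditional mean and the two APE estimators for given parameter values. The function names (`tobit_mean`, `ape_continuous`, `ape_discrete`) and the list-based data layout are assumptions made for this example only; the parameter values would come from a Tobit fit that is not shown.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def index(y2, z1, alpha1, delta1):
    """Linear index alpha1*y2 + z1*delta1 (z1 and delta1 are lists)."""
    return alpha1 * y2 + sum(z * d for z, d in zip(z1, delta1))

def tobit_mean(y2, z1, alpha1, delta1, sigma):
    """Conditional mean (D.43): Phi(xb/sigma)*xb + sigma*phi(xb/sigma)."""
    xb = index(y2, z1, alpha1, delta1)
    return norm_cdf(xb / sigma) * xb + sigma * norm_pdf(xb / sigma)

def ape_continuous(y2_obs, z1_rows, alpha1, delta1, sigma):
    """Sample analogue (D.45): average of Phi(xb_i/sigma) * alpha1."""
    vals = [norm_cdf(index(y, z1, alpha1, delta1) / sigma) * alpha1
            for y, z1 in zip(y2_obs, z1_rows)]
    return sum(vals) / len(vals)

def ape_discrete(z1_rows, alpha1, delta1, sigma, c):
    """Sample analogue (D.47): average of m(c + 1) - m(c) over observations."""
    diffs = [tobit_mean(c + 1, z1, alpha1, delta1, sigma)
             - tobit_mean(c, z1, alpha1, delta1, sigma) for z1 in z1_rows]
    return sum(diffs) / len(diffs)
```

Because the derivative of (D.43) with respect to the index is $\Phi(\cdot)$, the continuous APE for a single observation should agree with a finite-difference derivative of `tobit_mean`, which gives a quick sanity check on any implementation.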
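* APE in Tobit Model with endogenous $y_2$ (Blundell-Smith 1986)
\[
\begin{aligned}
y_1 &= \max(0, y_1^*), \quad y_1^* = \alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1 + e_1 = \alpha_1 y_2 + z_1\delta_1 + u_1, \\
y_2 &= z\delta_2 + a_1, \\
\mathrm{Var}(a_1) &= \sigma^2, \quad e_1 \mid z, a_1 \sim N(0, \tau_1^2).
\end{aligned}
\]
The standard method is to obtain APEs by computing the derivatives or the differences of:
\[
E_{a_1}[m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2)], \tag{D.48}
\]
where $m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2) = m(\alpha_1 y_2 + z_1\delta_1, \eta_1^2\sigma^2 + \tau_1^2)$.
The conditional mean is:
\[
E(y_1 \mid z_1, y_2) = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})(\alpha_1 y_2 + z_1\delta_1) + \sqrt{\eta_1^2\sigma^2 + \tau_1^2}\; \phi(\alpha_{1s} y_2 + z_1\delta_{1s}), \tag{D.49}
\]
where $\alpha_{1s} = \dfrac{\alpha_1}{\sqrt{\eta_1^2\sigma^2 + \tau_1^2}}$, $\delta_{1s} = \dfrac{\delta_1}{\sqrt{\eta_1^2\sigma^2 + \tau_1^2}}$.
We define:
\[
E(y_1 \mid z_1, y_2) = m(\alpha_1 y_2 + z_1\delta_1, \eta_1^2\sigma^2 + \tau_1^2). \tag{D.50}
\]
Consistent estimators of APEs result from the derivatives or the differences of $m(\hat{\alpha}_1 y_2 +
z_1\hat{\delta}_1, \hat{\eta}_1^2\hat{\sigma}^2 + \hat{\tau}_1^2)$ with respect to elements of $(z_1, y_2)$, where $\hat{\sigma}^2$ is the estimate of the error variance
from the first-stage OLS regression. APE with respect to $z_1$:
\[
\widehat{APE} = N^{-1} \sum_{i=1}^{N} \Phi(\hat{\alpha}_{1s} y_{2i} + z_{1i}\hat{\delta}_{1s})\hat{\alpha}_1, \tag{D.51}
\]
and APE with respect to $y_2$:
\[
\widehat{APE} = N^{-1} \sum_{i=1}^{N} \big[\hat{m}(y_{2i} = c + 1) - \hat{m}(y_{2i} = c)\big], \tag{D.52}
\]
where $\hat{m}(y_{2i} = c) = \Phi(\hat{\alpha}_{1s} c + z_{1i}\hat{\delta}_{1s})(\hat{\alpha}_1 c + z_{1i}\hat{\delta}_1) + \sqrt{\hat{\eta}_1^2\hat{\sigma}^2 + \hat{\tau}_1^2}\; \phi(\hat{\alpha}_{1s} c + z_{1i}\hat{\delta}_{1s})$.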

An alternative method is to get APEs by computing the derivatives or the differences of:
\[
E_{a_1}[m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2)], \tag{D.53}
\]
where $m(z_1, y_2, a_1, \tau_1^2) = m(x, \tau_1^2) = \Phi(x/\tau_1)x + \tau_1 \phi(x/\tau_1)$.
APE with respect to $z_1$:
\[
\widehat{APE} = N^{-1} \sum_{i=1}^{N} \Phi\Big(x/\sqrt{\tau_1^2}\Big)\delta_{11}. \tag{D.54}
\]
APE with respect to $y_2$:
\[
\widehat{APE} = N^{-1} \sum_{i=1}^{N} [m_1 - m_0], \tag{D.55}
\]
where $m_0 = m[y_2 = 0]$, $x = \alpha_1 y_2 + z_1\delta_1 + \eta_1 \hat{a}_1$, and $\hat{a}_1$ is the residual obtained from the first
stage estimation.
For more details, see the Blundell-Smith procedure and the APEs in Wooldridge (2002, chapter
16).

D.3 Formula of the NLS estimation
In order to compare the NLS and the QML estimation, the basic framework is introduced as below.
The first stage is to estimate $\delta_2$ and $\delta_0$ by using the step-wise maximum likelihood of $y_{i2}$ on
$z_i$ in the Negative Binomial model. Obtain the estimated parameters $\hat{\delta}_2$ and $\hat{\delta}_0$. In the second
stage, instead of using QMLE, we use the NLS of $y_{i1}$ on $y_{i2}$, $z_{i1}$ to estimate $\alpha_1$, $\delta_1$ and $\eta_1$ with the
approximated conditional mean $\mu_i(\theta; y_2, z)$.
The NLS estimator of $\theta$ solves:
\[
\min_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} \left[ y_{1i} - \int_{-\infty}^{+\infty} \Phi(\alpha_1 y_{2i} + z_{1i}\delta_1 + \eta_1 a_1) f(a_1 \mid y_2, z)\, da_1 \right]^2,
\]
or
\[
\min_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} [y_{1i} - \mu_i(\theta; y_{2i}, z_i)]^2 / 2.
\]
The score function can be written as:
\[
s_i = -(y_{1i} - \mu_i) \int_{-\infty}^{+\infty} g_i' \phi(g_i\theta) f(a_1 \mid y_2, z)\, da_1. \tag{D.56}
\]
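
The conditional mean $\mu_i$ in the NLS objective has no closed form, so it must be approximated numerically. The sketch below is a rough illustration, not the dissertation's implementation: it replaces the Adaptive Gauss-Hermite quadrature mentioned in the text with a plain midpoint rule, and the names `log_f_a1`, `cond_mean` and `nls_objective`, as well as the integration limits, are assumptions introduced here. The density used is $f(a_1 \mid y_2, z) = \exp(P)[\delta_0 + \exp(z\delta_2)]^{(y_2 + \delta_0)}/\Gamma(y_2 + \delta_0)$ from Section D.1.1, with $m = \exp(z\delta_2)$.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_f_a1(a1, y2, m, delta0):
    """Log of f(a1 | y2, z) with m = exp(z*delta2), as given in Section D.1.1."""
    P = -m * math.exp(a1) + a1 * (y2 + delta0) - delta0 * math.exp(a1)
    return P + (y2 + delta0) * math.log(delta0 + m) - math.lgamma(y2 + delta0)

def cond_mean(y2, z1, m, theta, delta0, lo=-30.0, hi=10.0, steps=8000):
    """Approximate mu_i = int Phi(alpha1*y2 + z1*delta1 + eta1*a1) f(a1|y2,z) da1.

    Midpoint rule on [lo, hi]; the limits and step count are illustrative.
    """
    alpha1, delta1, eta1 = theta
    base = alpha1 * y2 + sum(z * d for z, d in zip(z1, delta1))
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        a1 = lo + (j + 0.5) * width
        total += norm_cdf(base + eta1 * a1) * math.exp(log_f_a1(a1, y2, m, delta0)) * width
    return total

def nls_objective(y1, y2, z1_rows, m_vals, theta, delta0):
    """N^{-1} * sum of [y1i - mu_i]^2 / 2, the second form of the NLS problem."""
    resid2 = [(y - cond_mean(c, z1, m, theta, delta0)) ** 2 / 2.0
              for y, c, z1, m in zip(y1, y2, z1_rows, m_vals)]
    return sum(resid2) / len(resid2)
```

When $\eta_1 = 0$ the integrand no longer depends on $a_1$, so `cond_mean` should return $\Phi(\alpha_1 y_2 + z_1\delta_1)$; this is a convenient check that the density integrates to one.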

D.4 Derivation of the Heterogeneity Distribution
We are given $\exp(a_1)$ distributed as Gamma($\delta_0$, $1/\delta_0$) using a single parameter $\delta_0$. We are interested in obtaining the density function of $Y = a_1$. Let $X = \exp(a_1)$. The density function of $X$ is
specified as follows:
\[
f(X; \delta_0) = \frac{\delta_0^{\delta_0} X^{\delta_0 - 1} \exp(-\delta_0 X)}{\Gamma(\delta_0)}; \quad X > 0, \ \delta_0 > 0. \tag{D.57}
\]
Since $X > 0$ and $Y = \ln(X)$, $dX/dY = \exp(Y)$ and $Y \in (-\infty, \infty)$. The density function of $Y$ will
be derived as:
\[
f(Y; \delta_0) = f[h(Y)] \left| \frac{dX}{dY} \right|; \quad Y \in (-\infty, \infty), \tag{D.58}
\]
where $f[h(Y)] = \dfrac{\delta_0^{\delta_0} \exp(Y)^{\delta_0 - 1} \exp[-\delta_0 \exp(Y)]}{\Gamma(\delta_0)}$.
Plugging in $Y = a_1$, we get:
\[
f(Y; \delta_0) = \frac{\delta_0^{\delta_0} \exp(a_1)^{\delta_0} \exp[-\delta_0 \exp(a_1)]}{\Gamma(\delta_0)}, \tag{D.59}
\]
which is equation (1.4).
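
The change of variables in (D.57)-(D.59) can be verified numerically. The following sketch uses hypothetical helper names (`gamma_density`, `a1_density`, `integrate`) and a simple midpoint rule chosen for this example; it checks that (D.59) equals (D.57) times the Jacobian $\exp(a_1)$, that it integrates to one, and that $E[\exp(a_1)] = 1$, as implied by the Gamma($\delta_0$, $1/\delta_0$) normalization.

```python
import math

def gamma_density(x, delta0):
    """(D.57): density of X = exp(a1), Gamma with shape delta0 and scale 1/delta0."""
    return math.exp(delta0 * math.log(delta0) + (delta0 - 1.0) * math.log(x)
                    - delta0 * x - math.lgamma(delta0))

def a1_density(a1, delta0):
    """(D.59): density of a1 = ln(X)."""
    return math.exp(delta0 * math.log(delta0) + delta0 * a1
                    - delta0 * math.exp(a1) - math.lgamma(delta0))

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]; adequate for these smooth, fast-decaying integrands."""
    width = (hi - lo) / steps
    return sum(f(lo + (j + 0.5) * width) for j in range(steps)) * width

delta0 = 1.5
total_mass = integrate(lambda a: a1_density(a, delta0), -40.0, 8.0)
mean_exp_a1 = integrate(lambda a: math.exp(a) * a1_density(a, delta0), -40.0, 8.0)
```

Both `total_mass` and `mean_exp_a1` should be close to one for any $\delta_0 > 0$, provided the integration range covers the bulk of the density.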


Appendix E
TECHNICALITIES FOR CHAPTER 2

E.1 Asymptotic Variance of the Two-step Estimator
If the null hypothesis of no endogeneity and no serial correlation in the ﬁrst stage is rejected,
the standard errors in the second stage should be adjusted for the ﬁrst stage estimation by using
delta method or bootstrapping. In addition, we also need to get asymptotic standard errors for the
average partial effects.
We start with the linear reduced form in the first stage:
\[
y_{2it} = w^*_{2it}\gamma_2 + v^*_{2it}, \tag{E.1}
\]
where $w^*_{2it} = (z_{it}, \bar{z}_i)$ is a $1 \times (2L)$ vector of exogenous variables. Under standard regularity conditions, we have:
\[
\sqrt{N}(\hat{\gamma}_2 - \gamma_2) = N^{-1/2} \sum_{i=1}^{N} \pi_{i2}(\gamma_2) + o_p(1), \tag{E.2}
\]
where
\[
\pi_{i2} = A_2^{-1} B_{2i}' v^*_{2i}, \tag{E.3}
\]
and $B_{2i}$ is the $T \times (2L)$ matrix with $t$th row $w^*_{2it}$, $A_2 = E(B_{2i}'B_{2i})$ and $v^*_{2i}$ is a $T \times 1$ vector of
reduced form errors.
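Now we can write:
\[
\begin{aligned}
M &= E(y_{1it} \mid z_i, y_{1i,t-1}, y_{2it}, y_{1i0}, w^*_{2i}), \\
M &= m[\rho y_{1i,t-1} + \alpha y_{2it} + x_{it}\beta + \theta_2 y_{1i0} + \bar{z}_i\theta_3 + (y^*_{2i} - w^*_{2i}\gamma_2)\theta_4 + \theta_1 (y^*_{2it} - w^*_{2it}\gamma_2), \sigma^2_{s*}], \\
M &= m[\alpha y_{2it} + w_{3it}\lambda_3 + \theta_1 (y^*_{2it} - w^*_{2it}\gamma_2) + (y^*_{2i} - w^*_{2i}\gamma_2)\theta_4, \sigma^2_{s*}],
\end{aligned} \tag{E.4}
\]
where $w_{3it} = (y_{1i,t-1}, x_{it}, y_{i0}, \bar{z}_i)$; $\sigma^2_{s*} = \sigma^{*2}_{a_1} + \sigma^{*2}_{e_1}$ and $\lambda_3 = (\rho, \beta', \theta_2, \theta_3')'$.

We collect all the parameters in $M$ except for $\gamma_2$ into the parameter vector $\lambda^*$ and abuse the
notation so that $w_{it} = (y_{2it}, w_{3it}, v^*_{2it}, v^*_{2i})$ in this part. In the previous part we use $w^*_{it}$.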

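With the maximum likelihood in the second stage, the log likelihood for observation $i$ in period
time $t$ is:
\[
l_{it}(\lambda^*; \sigma^2_{s*}) = 1[y_{1it} = 0] \log[1 - \Phi(w_{it}\lambda^*/\sigma_{s*})] + 1[y_{1it} > 0]\{\log \phi[(y_{1it} - w_{it}\lambda^*)/\sigma_{s*}] - \log(\sigma^2_{s*})/2\}. \tag{E.5}
\]
Using the notation $\Phi(w_i\lambda^*/\sigma_{s*}) = \Phi_i$ and $\phi(w_i\lambda^*/\sigma_{s*}) = \phi_i$, and noting that the constant does not affect the
maximization, we can rewrite this log likelihood as:
\[
l_i(\lambda^*; \sigma^2_{s*}) = 1[y_{1i} = 0] \log(1 - \Phi_i) - 1[y_{1i} > 0]\left\{\frac{1}{2}(y_{1i} - w_i\lambda^*)^2/\sigma^2_{s*} + \frac{1}{2}\log(\sigma^2_{s*})\right\}, \tag{E.6}
\]
and we have the score as:
\[
s_i(\lambda^*; \gamma_2) = \begin{pmatrix} s_{i1} \\ s_{i2} \end{pmatrix} = \begin{pmatrix} \nabla_{\lambda^*} l_i \\ \nabla_{\sigma^2_{s*}} l_i \end{pmatrix}, \tag{E.7}
\]
and
\[
s_{i1} = -1[y_{1i} = 0](\phi_i w_i)/[\sigma_{s*}(1 - \Phi_i)] + 1[y_{1i} > 0](y_{1i} - w_i\lambda^*) w_i/\sigma^2_{s*},
\]
\[
s_{i2} = 1[y_{1i} = 0](\phi_i w_i\lambda^*)/[2\sigma_{s*}(1 - \Phi_i)] + 1[y_{1i} > 0]\left[(y_{1i} - w_i\lambda^*)^2/(2\sigma^4_{s*}) - 1/(2\sigma^2_{s*})\right].
\]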
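
To make (E.6) and the score $s_{i1}$ concrete, the sketch below codes the per-observation Tobit log likelihood and its gradient with respect to $\lambda^*$. The names `tobit_loglik_i` and `tobit_score_lam` and the list-based inputs are assumptions for this illustration; the control-function regressors in $w_i$ are taken as given rather than constructed from a first stage.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def dot(w, lam):
    """Inner product of the regressor row w and the parameter list lam."""
    return sum(a * b for a, b in zip(w, lam))

def tobit_loglik_i(y1, w, lam, sigma2):
    """Per-observation log likelihood (E.6), constant omitted."""
    idx = dot(w, lam)
    if y1 > 0:
        return -0.5 * ((y1 - idx) ** 2 / sigma2 + math.log(sigma2))
    # 1 - Phi(x) is computed as Phi(-x) for numerical stability
    return math.log(norm_cdf(-idx / math.sqrt(sigma2)))

def tobit_score_lam(y1, w, lam, sigma2):
    """Score s_i1: gradient of (E.6) with respect to lambda*."""
    sigma = math.sqrt(sigma2)
    idx = dot(w, lam)
    if y1 > 0:
        scale = (y1 - idx) / sigma2
    else:
        scale = -norm_pdf(idx / sigma) / (sigma * norm_cdf(-idx / sigma))
    return [scale * wk for wk in w]
```

Summing `tobit_loglik_i` over all $(i, t)$ gives the pooled objective; comparing `tobit_score_lam` with a finite-difference gradient of `tobit_loglik_i` is a simple way to check the algebra.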
With the two-step M-estimator, the asymptotic variance of $\sqrt{N}(\hat{\lambda}^* - \lambda^*)$ must be adjusted to
account for the first-stage estimation of $\sqrt{N}(\hat{\gamma}_2 - \gamma_2)$ (see more in 12.4.2 of Chapter 12, Wooldridge,
2002). We can write:
\[
\sqrt{N}(\hat{\lambda}^* - \lambda^*) = A_1^{-1} \left[ N^{-1/2} \sum_{i=1}^{N} \pi_{i1}(\lambda^*; \gamma_2) \right] + o_p(1), \tag{E.8}
\]

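where
\[
A_1 = E[-\nabla_{\lambda^*} s_{i1}(\lambda^*; \gamma_2)], \tag{E.9}
\]
and
\[
\nabla_{\lambda^*} s_{i1}(\lambda^*; \gamma_2) = -\sigma^{-2}\left\{1[y_{1i} = 0]\,[\phi_i^2 - \phi_i (1 - \Phi_i)\lambda^*]/(1 - \Phi_i)^2 + 1[y_{1i} > 0]\right\} w_i' w_i,
\]
and
\[
\pi_{i1}(\lambda^*; \gamma_2) = s_{i1}(\lambda^*; \gamma_2) - F_1 \pi_{i2}(\gamma_2), \tag{E.10}
\]
where
\[
F_1 = E[\nabla_{\gamma_2} s_{i1}(\lambda^*; \gamma_2)], \tag{E.11}
\]
or
\[
F_1 = -E\left\{ \sum_{t=1}^{T} [(1 - \sigma_{s*})\phi_{it} w_{it}\lambda^* + \Phi_{it}](\theta_1 w^*_{2it} + \theta_4 w^*_{2i}) \right\}.
\]
Therefore, we get:
\[
\mathrm{Avar}\, \sqrt{N}(\hat{\lambda}^* - \lambda^*) = A_1^{-1} V A_1^{-1}, \tag{E.12}
\]
where $V = \mathrm{Var}[\pi_{i1}(\lambda^*; \gamma_2)]$.
A valid estimator of $\mathrm{Avar}\, \sqrt{N}(\hat{\lambda}^* - \lambda^*)$ is:
\[
\hat{A}_{11} = \hat{A}_1^{-1} \left( N^{-1} \sum_{i=1}^{N} \hat{\pi}_{i1}\hat{\pi}_{i1}' \right) \hat{A}_1^{-1}, \tag{E.13}
\]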

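where
\[
\hat{A}_1 = N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\sigma}^{-2}\left\{1[y_{1it} = 0]\,[\hat{\phi}_{it}^2 - \hat{\phi}_{it}(1 - \hat{\Phi}_{it})\hat{\lambda}^*]/(1 - \hat{\Phi}_{it})^2 + 1[y_{1it} > 0]\right\} w_{it}' w_{it},
\]
and $\hat{\pi}_{i1} = \hat{s}_{i1} - \hat{F}_1 \hat{\pi}_{i2}$, in which $\hat{\pi}_{i2} = \hat{A}_2^{-1} B_{2i}' \hat{v}^*_{2i}$ and
\[
\hat{F}_1 = -N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} [(1 - \hat{\sigma}_{s*})\hat{\phi}_{it} w_{it}\hat{\lambda}^* + \hat{\Phi}_{it}](\hat{\theta}_1 w^*_{2it} + \hat{\theta}_4 w^*_{2i}),
\]
and the asymptotic variance of $\hat{\lambda}^*$ is:
\[
\mathrm{Avar}(\hat{\lambda}^*) = \hat{A}_1^{-1} \hat{Q} \hat{A}_1^{-1}/N = \hat{A}_{11},
\]
where $\hat{Q} = N^{-1} \sum_{i=1}^{N} \hat{\pi}_{i1}\hat{\pi}_{i1}'$.
We can derive $\mathrm{Avar}\big[\sqrt{N}(\hat{\sigma}^2_{s*} - \sigma^2_{s*})\big]$ following the above procedure for the derivation of
$\mathrm{Avar}\, \sqrt{N}(\hat{\lambda}^* - \lambda^*)$ and get $A_{22}$.
Denoting $\Psi \equiv (\lambda^{*\prime}, \sigma^2_{s*})'$, we can derive $\mathrm{Avar}\big[\sqrt{N}(\hat{\Psi} - \Psi)\big]$ as:
\[
\mathrm{Avar}\big[\sqrt{N}(\hat{\Psi} - \Psi)\big] = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \tag{E.14}
\]
where $A_{22} = A_2^{-1} Q A_2^{-1}/N$ and
\[
A_2 = -\sigma^{-4}\{(w_i\lambda^*/\sigma_{s*})^3 \phi_i + (w_i\lambda^*/\sigma_{s*})\phi_i - [(w_i\lambda^*/\sigma_{s*})\phi_i^2/(1 - \Phi_i)] - 2\Phi_i\}/4, \tag{E.15}
\]
and $A_{12} = A_{12}^{-1} Q A_{12}^{-1}/N$, with the inner $A_{12}$ given by
\[
A_{12} = \sigma^{-3}\{(w_i\lambda^*/\sigma_{s*})^2 \phi_i + \phi_i - [(w_i\lambda^*/\sigma_{s*})\phi_i^2/(1 - \Phi_i)]\} w_i/2.
\]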

E.2 Asymptotic Variance of the Average Partial Effects
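Next, we obtain the standard errors for the average partial effects as in equations (2.21) and (2.22).
\[
\hat{\varphi} = \hat{\lambda}^* \left[ (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \Phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*}) \right], \tag{E.16}
\]
where $w_{it} = (y_{2it}, w_{3it}, v^*_{2it}, v^*_{2i})$.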
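
The APE estimator in (E.16) is simply the coefficient vector scaled by the sample average of $\Phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*})$. A minimal sketch follows; the function name `ape_scaled` and the flat list of pooled $(i, t)$ rows are assumptions for this example, and the estimates `lam_hat` and `sigma_hat` are taken as given.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ape_scaled(w_rows, lam_hat, sigma_hat):
    """(E.16): lam_hat times the pooled average of Phi(w_it * lam_hat / sigma_hat).

    w_rows holds one regressor row per (i, t) pair, so its length is N*T.
    """
    probs = [norm_cdf(sum(a * b for a, b in zip(w, lam_hat)) / sigma_hat)
             for w in w_rows]
    scale = sum(probs) / len(probs)
    return [scale * coef for coef in lam_hat]
```

Since the scale factor lies strictly between zero and one, each APE has the same sign as its coefficient and a smaller magnitude.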
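\[
\varphi = \lambda^* \left\{ T^{-1} \sum_{t=1}^{T} E[\Phi(w_{it}\lambda^*/\sigma_{s*})] \right\}.
\]
Then we need to compute the asymptotic variance of $\sqrt{N}(\hat{\varphi} - \varphi)$.
Let $\mu = (\lambda^*; \gamma_2)$ and
\[
p(w_{it}, w^*_{2it}, \mu) \equiv \left( T^{-1} \sum_{t=1}^{T} \Phi\big( [\alpha y_{2it} + w_{3it}\lambda_3 + \theta_1 (y^*_{2it} - w^*_{2it}\gamma_2) + (y^*_{2i} - w^*_{2i}\gamma_2)\theta_4]/\sigma_{s*} \big) \right) \lambda^*, \tag{E.17}
\]
we have:
\[
\sqrt{N}(\hat{\varphi} - \varphi) = N^{-1/2} \sum_{i=1}^{N} \left\{ \lambda^* \Big( T^{-1} \sum_{t=1}^{T} [\Phi(w_{it}\lambda^*/\sigma_{s*})] \Big) - \varphi \right\} + E[\nabla_\mu p] \sqrt{N}(\hat{\mu} - \mu) + o_p(1), \tag{E.18}
\]
in which:
\[
\sqrt{N}(\hat{\mu} - \mu) = N^{-1/2} \sum_{i=1}^{N} D_i + o_p(1), \tag{E.19}
\]
where $D_i = \begin{pmatrix} A_1^{-1}\pi_{i1} \\ \pi_{i2} \end{pmatrix}$, all matrix definitions were introduced in step 1, and a valid estimator
of $D_i$ is:
\[
\hat{D} = \begin{pmatrix} \hat{A}_1^{-1}\hat{\pi}_{i1} \\ \hat{\pi}_{i2} \end{pmatrix}. \tag{E.20}
\]
Therefore, the asymptotic variance of $\sqrt{N}(\hat{\varphi} - \varphi)$ is:
\[
\Omega = \mathrm{Var}\left\{ \lambda^* \Big( T^{-1} \sum_{t=1}^{T} [\Phi(w_{it}\lambda^*/\sigma_{s*})] \Big) - \varphi + PD \right\}, \tag{E.21}
\]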

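in which $\Omega = \mathrm{Var}(K + PD)$, where $K = \lambda^* \big( T^{-1} \sum_{t=1}^{T} [\Phi(w_{it}\lambda^*/\sigma_{s*})] \big) - \varphi$.
Hence, we can get
\[
\hat{K} = \hat{\lambda}^* \Big( T^{-1} \sum_{t=1}^{T} \Phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*}) \Big) - \hat{\varphi}. \tag{E.22}
\]
The last job is to find the Jacobian $P$, where $P = E[\nabla_\mu p]$ and $P = [P_1 \mid P_2]$:
\[
P_1 = \nabla_{\lambda^*} p = T^{-1} \sum_{t=1}^{T} \big[ \phi(w_{it}\lambda^*/\sigma_{s*}) (w_{it}\lambda^*/\sigma_{s*}) + \Phi(w_{it}\lambda^*/\sigma_{s*}) \big],
\]
and
\[
\hat{P}_1 = T^{-1} \sum_{t=1}^{T} \big[ \phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*}) (w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*}) + \Phi(w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*}) \big],
\]
or in short:
\[
\hat{P}_1 = T^{-1} \sum_{t=1}^{T} [\phi(\hat{\omega})\hat{\omega} + \Phi(\hat{\omega})], \tag{E.23}
\]
where $\hat{\omega} = w_{it}\hat{\lambda}^*/\hat{\sigma}_{s*}$.
\[
P_2 = \nabla_{\gamma_2} p = T^{-1} \sum_{t=1}^{T} \big[ \phi(w_{it}\lambda^*/\sigma_{s*}) (-\theta_1 w^*_{2it} - w^*_{2i}\theta_4)\lambda^*/\sigma_{s*} \big],
\]
and
\[
\hat{P}_2 = T^{-1} \sum_{t=1}^{T} \big[ \phi(\hat{\omega}) (-\hat{\theta}_1 w^*_{2it} - w^*_{2i}\hat{\theta}_4)\hat{\lambda}^*/\hat{\sigma}_{s*} \big]. \tag{E.24}
\]
Therefore
\[
\hat{P} = (NT)^{-1} \sum_{i=1}^{N} \left[ T^{-1} \sum_{t=1}^{T} [\phi(\hat{\omega})\hat{\omega} + \Phi(\hat{\omega})] \;\Big|\; T^{-1} \sum_{t=1}^{T} \big[ \phi(\hat{\omega}) (-\hat{\theta}_1 w^*_{2it} - w^*_{2i}\hat{\theta}_4)\hat{\lambda}^*/\hat{\sigma}_{s*} \big] \right]. \tag{E.25}
\]
Finally, $\mathrm{Avar}\big[\sqrt{N}(\hat{\varphi} - \varphi)\big]$ is consistently estimated as:
\[
\hat{\Omega} = N^{-1} \sum_{i=1}^{N} (\hat{K} + \hat{P}\hat{D})(\hat{K} + \hat{P}\hat{D})'. \tag{E.26}
\]
The asymptotic standard error for any particular APE is obtained as the square root of the
corresponding diagonal element in the above expression, divided by $\sqrt{N}$.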


Appendix F
TECHNICALITIES FOR CHAPTER 3

Derivation of the Maximum Likelihood Estimator in the First Stage and the Asymptotic Variance
in the Second Stage

F.1 Bivariate Probit Model in the First Stage
In the first stage, we estimate equation (3.11) and equation (3.12) simultaneously and get the log
likelihood as in equation (3.23). Note that the model is qualitatively different from the usual bivariate probit model. In the simultaneous equations model (3.11)-(3.12), the second dependent variable
$y_{3it}$ appears on the right hand side of the equation with the dependent variable $y_{2it}$. One can derive
the following conditional mean and obtain the corresponding marginal effects of interest:
\[
E(y_{2it} \mid W_i) = \Pr[y_{3it} = 1 \mid W_i]\, E[y_{2it} \mid y_{3it} = 1, W_i] + \Pr[y_{3it} = 0 \mid W_i]\, E[y_{2it} \mid y_{3it} = 0, W_i], \tag{F.1}
\]
where
\[
E[y_{2it} \mid y_{3it} = 1, W_i] = \Pr[y_{2it} = 1 \mid y_{3it} = 1, W_i], \tag{F.2}
\]
and
\[
E[y_{2it} \mid y_{3it} = 0, W_i] = \Pr[y_{2it} = 1 \mid y_{3it} = 0, W_i]. \tag{F.3}
\]
Therefore
\[
E(y_{2it} \mid W_i) = \Phi_2(W_{3it}\gamma_3, W_{2it}\gamma_2 + \alpha_2; \rho) + \Phi_2(-W_{3it}\gamma_3, W_{2it}\gamma_2; -\rho). \tag{F.4}
\]
To obtain the derivatives and Hessian, let us rewrite the log likelihood in a convenient way with
$q_{i2} = 2y_{2i} - 1$ and $q_{i3} = 2y_{3i} - 1$ (which results in $q_{im} = 1$ if $y_{mi} = 1$ and $q_{im} = -1$ if $y_{mi} = 0$, for
$m = 2, 3$):
\[
\ln L_{it} = \ln \Phi_2(k_{i2}, k_{i3}; \pi), \tag{F.5}
\]
where $k_{im} = q_{im} W_{mit}\gamma_m$ for $m = 2, 3$ (here the notation is abused in that $\gamma_2$ stands for $(\gamma_2, \alpha_2)$)
and $\pi = q_{i2} q_{i3} \rho$.
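The score function and the information matrix resulting from equation (F.5) are derived as
follows:
\[
s_{it}(\theta_1) = \nabla_{\theta_1} \ln L_{it}(\theta_1) = \begin{pmatrix} \partial \ln L_{it}(\theta_1)/\partial \gamma_3 \\ \partial \ln L_{it}(\theta_1)/\partial \gamma_2 \\ \partial \ln L_{it}(\theta_1)/\partial \alpha_2 \\ \partial \ln L_{it}(\theta_1)/\partial \rho \end{pmatrix}, \tag{F.6}
\]
and
\[
I(\theta_1) = -E\big[\nabla^2_{\theta_1} \ln L_{it}(\theta_1)\big]. \tag{F.7}
\]
We have:
\[
\partial \ln L_{it}(\theta_1)/\partial \gamma_3 = \Phi_2^{-1}(k_{i2}, k_{i3}; \pi)(q_{i3} W_{3it})\, g_{i3}, \tag{F.8}
\]
where $g_{i3} = \phi(k_{i3})\, \Phi\big[(k_{i2} - \pi k_{i3})(1 - \pi^2)^{-1/2}\big]$.
\[
\partial \ln L_{it}(\theta_1)/\partial \gamma_2 = \Phi_2^{-1}(k_{i2}, k_{i3}; \pi)(q_{i2} W_{2it})\, g_{i2}, \tag{F.9}
\]
where $g_{i2} = \phi(k_{i2})\, \Phi\big[(k_{i3} - \pi k_{i2})(1 - \pi^2)^{-1/2}\big]$.
\[
\partial \ln L_{it}(\theta_1)/\partial \alpha_2 = \Phi_2^{-1}(k_{i2}, k_{i3}; \pi)\, q_{i2}\, g_{i2}, \tag{F.10}
\]
and
\[
\partial \ln L_{it}(\theta_1)/\partial \rho = \Phi_2^{-1}(k_{i2}, k_{i3}; \pi)\, q_{i2} q_{i3}\, \phi_2(k_{i2}, k_{i3}; \pi). \tag{F.11}
\]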

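Therefore, the asymptotic variance of $\hat{\theta}_1$ is:
\[
\mathrm{Avar}(\hat{\theta}_1) = C^{-1} V C^{-1}/N, \tag{F.12}
\]
where
\[
C = N^{-1} \sum_{i=1}^{N} I(\theta_1), \tag{F.13}
\]
and
\[
V = N^{-1} \sum_{i=1}^{N} s_{it}(\theta_1) s_{it}(\theta_1)'. \tag{F.14}
\]
As a result, the estimator of the asymptotic variance of $\hat{\theta}_1$ is:
\[
\widehat{\mathrm{Avar}}(\hat{\theta}_1) = \hat{C}^{-1} \hat{V} \hat{C}^{-1}/N, \tag{F.15}
\]
and
\[
\sqrt{N}(\hat{\theta}_1 - \theta_1) \xrightarrow{d} \mathrm{Normal}(0, C^{-1} V C^{-1}), \tag{F.16}
\]
or
\[
\sqrt{N}(\hat{\theta}_1 - \theta_1) = N^{-1/2} \sum_{i=1}^{N} r_i(\theta_1) + o_p(1), \tag{F.17}
\]
where
\[
r_i(\theta_1) = -I(\theta_1)^{-1} s_i(\theta_1), \tag{F.18}
\]
and
\[
\hat{r}_i(\hat{\theta}_1) \equiv -I(\hat{\theta}_1)^{-1} s_i(\hat{\theta}_1). \tag{F.19}
\]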

F.2 Asymptotic Variance of the Two-step Estimator
The asymptotic variance of the second-stage parameters, $\theta_2$, needs to be corrected for general
heteroskedasticity, serial correlation and first-stage estimation of $\theta_1$ using the delta method as shown
in Wooldridge (1995a) and Wooldridge (2002, chapter 12).
For $y_{2it} = 1$, we define the general regressors for time period $t$ as:
\[
\hat{w}_{it} = (W_{1it}, y_{3it}, 0, \ldots, 0, \hat{\lambda}_{it1}, 0, \ldots, 0, 0, \ldots, 0, \hat{\lambda}_{it2}, 0, \ldots, 0, 0, \ldots, 0, \hat{\lambda}_{it3}, 0, \ldots, 0, 0, \ldots, 0, \hat{\lambda}_{it4}, 0, \ldots, 0)
\]
and the parameter vector in the second stage is:
\[
\theta_2 = (\gamma_1', \alpha_1, \eta_{11}, \ldots, \eta_{T1}, \eta_{12}, \ldots, \eta_{T2}, \eta_{13}, \ldots, \eta_{T3}, \eta_{14}, \ldots, \eta_{T4})'
\]
which is a $G \times 1$ vector where $G = (1 + K_1 + L + 4T)$.

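We can write $E[\log(y_{1it}) \mid w_{it}, y_{2it} = 1] = w_{it}\theta_2$, so that $\log(y_{1it}) = w_{it}\theta_2 + \varepsilon_{it}$, where
$E[\varepsilon_{it} \mid w_{it}, y_{2it} = 1] = 0$ $(t = 1, \ldots, T)$.
On the selected sample, our POLS estimator is:
\[
\hat{\theta}_2 = \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it} \hat{w}_{it}' \hat{w}_{it} \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it} \hat{w}_{it}' \log(y_{1it}) \right), \tag{F.20}
\]
\[
\hat{\theta}_2 = \theta_2 + \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it} \hat{w}_{it}' \hat{w}_{it} \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it} \hat{w}_{it}' \varepsilon_{it} \right), \tag{F.21}
\]
and it can be shown that:
\[
\sqrt{N}(\hat{\theta}_2 - \theta_2) \xrightarrow{d} \mathrm{Normal}(0, A^{-1} B A^{-1}), \tag{F.22}
\]
where
\[
A = E\left( \sum_{t=1}^{T} y_{2it} w_{it}' w_{it} \right), \tag{F.23}
\]
\[
B = \mathrm{Var}(h_i) = E(h_i h_i') \quad \text{and} \quad h_i = s_i - F r_i, \tag{F.24}
\]
in which
\[
s_i = \sum_{t=1}^{T} y_{2it} w_{it}' \varepsilon_{it}, \tag{F.25}
\]
\[
F = E\left[ \sum_{t=1}^{T} y_{2it} w_{it}' \theta_2' \nabla_{\theta_1} w_{it}(\theta_1)' \right], \tag{F.26}
\]
in which $\nabla_{\theta_1} w_{it}(\theta_1)'$ is a $G \times Q$ gradient of $w_{it}(\theta_1)'$ evaluated at $\theta_1$ and $r_i$ is defined in the previous
part.
To estimate $\mathrm{Avar}(\hat{\theta}_2) = A^{-1} B A^{-1}/N$, we obtain:
\[
\hat{A} \equiv N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it} \hat{w}_{it}' \hat{w}_{it}, \tag{F.27}
\]
\[
\hat{F} \equiv N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} y_{2it} \hat{w}_{it}' \hat{\theta}_2' \nabla_{\theta_1} \hat{w}_{it}(\hat{\theta}_1)', \tag{F.28}
\]
and for each $i = 1, \ldots, N$: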

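\[
\hat{s}_i \equiv \sum_{t=1}^{T} y_{2it} \hat{w}_{it}' \hat{\varepsilon}_{it}, \tag{F.29}
\]
in which $\hat{\varepsilon}_{it} = \log(y_{1it}) - \hat{w}_{it}\hat{\theta}_2$, and
\[
\hat{h}_i = \hat{s}_i - \hat{F}\hat{r}_i. \tag{F.30}
\]
A consistent estimator of $B$ is:
\[
\hat{B} \equiv N^{-1} \sum_{i=1}^{N} \hat{h}_i \hat{h}_i'. \tag{F.31}
\]
The asymptotic variance of $\hat{\theta}_2$ is estimated as:
\[
\widehat{\mathrm{Avar}}(\hat{\theta}_2) = \hat{A}^{-1} \hat{B} \hat{A}^{-1}/N, \tag{F.32}
\]
and the asymptotic standard errors are obtained as the square roots of the diagonal elements of this
matrix.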

BIBLIOGRAPHY

Abadie, Alberto. 2000. Semiparametric estimation of instrumental variable models for causal
effects. Working Paper 260. National Bureau of Economic Research.
Amemiya, Takeshi. 1978. The estimation of a simultaneous equation generalized probit model.
Econometrica 46(5). 1193–1205.
Amemiya, Takeshi. 1979. The estimation of a simultaneous equation tobit model. International
Economic Review 20(1). 169–81.
Angrist, Joshua D. 2001. Estimation of limited-dependent variable models with dummy endogenous regressors: Simple strategies for empirical practice. Journal of Business and Economic
Statistics 19(1). 2–16.
Angrist, Joshua D. & William N. Evans. 1998. Children and their parents’ labor supply: Evidence
from exogenous variation in family size. American Economic Review 88(3). 450–77.
Arellano, Manuel & Olympia Bover. 1995. Another look at the instrumental variable estimation
of error-components models. Journal of Econometrics 68(1). 29–51.
Arellano, Manuel & Bo Honore. 2001. Panel data models: Some recent developments. In J.J.
Heckman & E.E. Leamer (eds.), Handbook of econometrics, vol. 5 Handbook of Econometrics,
chap. 53, 3229–3296. Elsevier.
Baltagi, Badi H. & Qi Li. 1991. A transformation that will circumvent the problem of autocorrelation in an error-component model. Journal of Econometrics 48(3). 385–393.
Baltagi, Badi H. & Ping X. Wu. 1999. Unequally spaced panel data regressions with AR(1) disturbances. Econometric Theory 15(06). 814–823.
Becker, Gary S. & H. Gregg Lewis. 1973. On the interaction between the quantity and quality of
children. Journal of Political Economy 81(2). S279–88.
Ben-Porath, Yoram & Finis Welch. 1976. Do sex preferences really matter? The Quarterly Journal
of Economics 90(2). 285–307.
Bhargava, Alok & J. D. Sargan. 1983. Estimating dynamic random effects models from panel data
covering short time periods. Econometrica 51(6). 1635–59.
Bloom, David, David Canning, Günther Fink & Jocelyn Finlay. 2009. Fertility, female labor force
participation, and the demographic dividend. Journal of Economic Growth 14(2). 79–101.
Blundell, Richard W. & James L. Powell. 2004. Endogeneity in semiparametric binary response
models. Review of Economic Studies 71. 655–679.
Blundell, Richard W. & Richard J. Smith. 1989. Estimation in a class of simultaneous equation
limited dependent variable models. Review of Economic Studies 56(1). 37–57.

Bronars, Stephen G. & Jeff Grogger. 2001. The effect of welfare payments on the marriage and
fertility behavior of unwed mothers: Results from a twins experiment. Journal of Political
Economy 109(3). 529–545.
Browning, Martin. 1992. Children and household economic behavior. Journal of Economic Literature 30(3). 1434–75.
Cain, Glen G. & Martin D. Dooley. 1976. Estimation of a model of labor supply, fertility, and
wages of married women. Journal of Political Economy 84(4). S179–99.
Cameron, Colin A. & Pravin K. Trivedi. 1986. Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics 1(1).
29–53.
Card, David & Daniel G. Sullivan. 1988. Measuring the effect of subsidized training programs on
movements in and out of employment. Econometrica 56(3). 497–530.
Carrasco, Raquel. 2001. Binary choice with binary endogenous regressors in panel data: Estimating the effect of fertility on female labor participation. Journal of Business and Economic
Statistics 19(4). 385–394.
Chamberlain, Gary. 1980. Analysis of covariance with qualitative data. Review of Economic
Studies 47(1). 225–38.
Chamberlain, Gary. 1992. Sequential moment restrictions in panel data: Comment. Journal of
Business and Economic Statistics 10(1). 20–26.
Chay, Kenneth Y. & Dean Hyslop. 1998. Identiﬁcation and estimation of dynamic binary response
panel data models: Empirical evidence using alternative approaches.
Das, Mitali. 2002. Estimators and inference in a censored regression model with endogenous
covariates. Discussion papers. Columbia University.
Das, Mitali. 2005. Instrumental variables estimators of nonparametric models with discrete endogenous regressors. Journal of Econometrics 124(2). 335–361.
Eckstein, Zvi & Kenneth I. Wolpin. 1990. Estimating a market equilibrium search model from
panel data on individuals. Econometrica 58(4). 783–808.
Even, William E. 1987. Career interruptions following childbirth. Journal of Labor Economics
5(2). 255–77.
Fleisher, Belton M. & Jr. Rhodes, George F. 1979. Fertility, women’s wage rates, and labor supply.
American Economic Review 69(1). 14–24.
Giles, J. & I. Murtazashvili. 2010. A control function approach to estimating dynamic probit
models with endogenous regressors, with an application to the study of poverty persistence in
China.
Greene, William H. 1997. Econometric analysis. New York: Macmillan 3rd edn.

Gronau, Reuben. 1973. The intrafamily allocation of time: The value of the housewives’ time.
American Economic Review 63(4). 634–51.
Hausman, Jerry A. 1978. Speciﬁcation tests in econometrics. Econometrica 46(6). 1251–71.
Heckman, James J. 1974. Effects of child-care programs on women’s work effort. Journal of
Political Economy 82(2). S136–S163.
Heckman, James J. 1978a. Dummy endogenous variables in a simultaneous equation system.
Econometrica 46(4). 931–59.
Heckman, James J. 1978b. Simple statistical models for discrete panel data developed and applied
to test the hypothesis of true state dependence against the hypothesis of spurious state dependence. In Manski C.E. & Daniel L. McFadden (eds.), The econometrics of panel data 30/31,
227–269. University of Chicago.
Heckman, James J. 1979. Sample selection bias as a speciﬁcation error. Econometrica 47(1).
153–61.
Heckman, James J. 1981a. Heterogeneity and state dependence. In Studies in labor markets NBER
Chapters, 91–140. National Bureau of Economic Research, Inc.
Heckman, James J. 1981b. The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process. In Manski C.E. & Daniel L.
McFadden (eds.), Structural analysis of discrete panel data with econometric applications, MIT
press.
Heckman, James J., Robert J. Lalonde & Jeffrey A. Smith. 1999. The economics and econometrics of active labor market programs. In O. Ashenfelter & D. Card (eds.), Handbook of labor
economics, vol. 3, chap. 31, 1865–2097. Elsevier.
Heckman, James J. & Thomas E. Macurdy. 1980. A life cycle model of female labour supply.
Review of Economic Studies 47(1). 47–74.
Heckman, James J. & Robert J. Willis. 1974. Estimation of a stochastic model of reproduction: An
econometric approach. NBER Working Papers 0034 National Bureau of Economic Research,
Inc.
Heckman, James J. & Robert J. Willis. 1977. A beta-logistic model for the analysis of sequential
labor force participation by married women. Journal of Political Economy 85(1). 27–58.
Honore, Bo E. 1993. Orthogonality conditions for tobit models with ﬁxed effects and lagged
dependent variables. Journal of Econometrics 59(1-2). 35–61.
Honore, Bo E. & Luojia Hu. 2001. Estimation of censored regression models with endogeneity.
Honore, Bo E. & Luojia Hu. 2004. Estimation of cross sectional and panel data censored regression
models with endogeneity. Journal of Econometrics 122(2). 293–316.

Honore, Bo E. & Ekaterini Kyriazidou. 2000. Panel data discrete choice models with lagged
dependent variables. Econometrica 68(4). 839–874.
Hsiao, Cheng. 1986. Analysis of panel data. Cambridge: Cambridge University Press.
Hyslop, Dean R. 1999. State dependence, serial correlation and heterogeneity in intertemporal
labor force participation of married women. Econometrica 67(6). 1255–1294.
Jacobsen, Joyce P., James Wishart Pearce III & Joshua L. Rosenbloom. 1999. The effects of
childbearing on married women’s labor supply and earnings: Using twin births as a natural
experiment. Journal of Human Resources 34(3). 449–474.
Kim, Jungho & Arnstein Aassve. 2006. Fertility and its consequence on family labour supply. IZA
Discussion Papers 2162 Institute for the Study of Labor (IZA).
Kim, Kyoo Il. 2006. Sample selection models with a common dummy endogenous regressor in
simultaneous equations: A simple two-step estimation. Economics Letters 91(2). 280–286.
Kyriazidou, Ekaterini. 2001. Estimation of dynamic panel data sample selection models. Review
of Economic Studies 68(3). 543–72.
Labeaga, Jose M. 1999. A double-hurdle rational addiction model with heterogeneity: Estimating
the demand for tobacco. Journal of Econometrics 93(1). 49–72.
Lee, Lung-fei. 1999. Estimation of dynamic and ARCH tobit models. Journal of Econometrics 92(2).
355–390.
Lee, Myoung-jae. 1996. Methods of moments and semiparametric econometrics for limited dependent variable models. Springer.
Lehrer, Evelyn L. 1992. The impact of children on married women’s labor supply: Black-white
differentials revisited. Journal of Human Resources 27(3). 422–444.
Mullahy, J. 1997. Instrumental-variable estimation of count data models: Applications to models
of cigarette smoking behavior. Review of Economics and Statistics 79. 586–93.
Mundlak, Yair. 1978. On the pooling of time series and cross section data. Econometrica 46(1).
69–85.
Nakamura, Alice & Masao Nakamura. 1992. The econometrics of female labor supply and children. Econometric Reviews 11(1). 1–71.
Nelson, Forrest & Lawrence Olson. 1978. Speciﬁcation and estimation of a simultaneous-equation
model with limited dependent variables. International Economic Review 19(3). 695–709.
Newey, Whitney K. 1985. Semiparametric estimation of limited dependent variable models with
endogenous explanatory variables. Annales de l’INSEE 59/60.
Newey, Whitney K. 1986. Linear instrumental variable estimation of limited dependent variable
models with endogenous explanatory variables. Journal of Econometrics 32(1). 127–141.

Newey, Whitney K. 1987. Efﬁcient estimation of limited dependent variable models with endogenous explanatory variables. Journal of Econometrics 36(3). 231–250.
Newey, Whitney K. & Daniel L. McFadden. 1994. Large sample estimation and hypothesis testing.
In Robert F. Engle & Daniel L. McFadden (eds.), Handbook of econometrics, vol. 4, chap. 36,
2111–2245. Elsevier.
Nguyen, Hoa B. 2010. Estimating a fractional response model with a count endogenous regressor and an application to female labor supply. In William H. Greene & R. Carter Hill (eds.),
Advances in econometrics, vol. 26, 253–298. Emerald Group Publishing Limited.
Papke, Leslie E. & Jeffrey M. Wooldridge. 1996. Econometric methods for fractional response
variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics
11(6). 619–32.
Papke, Leslie E. & Jeffrey M. Wooldridge. 2008. Panel data methods for fractional response
variables with an application to test pass rates. Journal of Econometrics 145(1-2). 121–133.
Rivers, Douglas & Quang H. Vuong. 1988. Limited information estimators and exogeneity tests
for simultaneous probit models. Journal of Econometrics 39(3). 347–366.
Rosenzweig, Mark R. & Kenneth I. Wolpin. 1980. Life-cycle labor supply and fertility: Causal
inferences from household models. Journal of Political Economy 88(2). 328–48.
Schultz, T. Paul. 1978. Fertility and child mortality over the life cycle: Aggregate and individual
evidence. American Economic Review 68(2). 208–15.
Semykina, Anastasia & Jeffrey M. Wooldridge. 2010. Estimating panel data models in the presence
of endogeneity and selection. Journal of Econometrics 157(2). 375–380.
Shaw, Kathryn. 1994. The persistence of female labor supply: Empirical evidence and implications. Journal of Human Resources 29(2). 348–378.
Skrondal, Anders & Sophia Rabe-Hesketh. 2004. Generalized latent variable modeling: Multilevel, longitudinal and structural equation models. Boca Raton, FL: Chapman & Hall/CRC.
Smith, Richard J. & Richard W. Blundell. 1986. An exogeneity test for a simultaneous equation
tobit model with an application to labor supply. Econometrica 54(3). 679–85.
Staiger, Douglas & James H. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65(3). 557–586.
Terza, Joseph V. 1998. Estimating count data models with endogenous switching: Sample selection
and endogenous treatment effects. Journal of Econometrics 84(1). 129–154.
Vella, Francis. 1993. A simple estimator for simultaneous models with censored endogenous
regressors. International Economic Review 34(2). 441–57.
Vella, Francis & Marno Verbeek. 1999. Two-step estimation of panel data models with censored
endogenous variables and selection bias. Journal of Econometrics 90(2). 239–263.
Vytlacil, Edward. 2002. Independence, monotonicity, and latent index models: An equivalence
result. Econometrica 70(1). 331–341.
Vytlacil, Edward & Nese Yildiz. 2007. Dummy endogenous variables in weakly separable models.
Econometrica 75(3). 757–779.
Weiss, Andrew A. 1999. A simultaneous binary choice/count model with an application to credit
card approvals. In R. Engle & H. White (eds.), Cointegration, causality, and forecasting: A
Festschrift in honour of Clive W. J. Granger, 429–461. Oxford and New York: Oxford University Press.
Willis, Robert J. 1973. A new approach to the economic theory of fertility behavior. Journal of
Political Economy 81(2). S14–64.
Winkelmann, Rainer. 2000. Econometric analysis of count data. Berlin: Springer.
Wooldridge, Jeffrey M. 1997. Multiplicative panel data models without the strict exogeneity assumption. Econometric Theory 13(5). 667–678.
Wooldridge, Jeffrey M. 2002. Econometric analysis of cross section and panel data. Cambridge,
MA: MIT Press.
Wooldridge, Jeffrey M. 2005. Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. Journal of Applied Econometrics 20(1).
39–54.
Wooldridge, Jeffrey M. 2010. Econometric analysis of cross section and panel data. Cambridge,
MA: MIT Press 2nd edn.
