This is to certify that the

dissertation entitled

Essays on Sample Selection, Self Selection
and Model Selection

presented by
Ai-Chi Hsu

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in Economics

 

 


ESSAYS ON SAMPLE SELECTION, SELF SELECTION
AND MODEL SELECTION

By

Ai-Chi Hsu

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Economics

1998

ABSTRACT

ESSAYS ON SAMPLE SELECTION, SELF SELECTION
AND MODEL SELECTION

By

Ai-Chi Hsu

This dissertation analyzes several different issues concerning selection corrections
with a censored selection variable, including sample selection bias corrections, self-
selection bias corrections, and some model selection issues. Several methods are
developed to correct sample and self-selection biases. Model selection for hurdle-type
models is also discussed.

Chapter 1 develops a sample selection bias correction procedure when the
endogenous explanatory variable appearing in the structural equation is also the variable
determining selection. Assuming that the selection variable follows a standard Tobit, a
simple two-step estimator is consistent and asymptotically normal. The method is applied to
a popular data set from the labor economics literature. Some comparisons are also made
across different estimation approaches.

Chapter 2 develops a method to simultaneously deal with self-selection and sample
selection problems with a Tobit selection equation and a roughly continuous endogenous
explanatory variable, as in Garen (1984). This method is applied to two different data sets
from the labor economics literature.

Chapter 3 studies model selection for hurdle models, which are often used as extensions
of the Tobit model. Two competing non-nested models are considered. One is Cragg’s
(1971) truncated normal model, which assumes that the dependent variable follows a normal
distribution truncated at zero. The other is a log-normal model, which assumes that the
dependent variable follows a log-normal distribution. To select between the two models,
Vuong’s (1989) approach is applied to see which model fits the data better. The simulation
results show that Vuong’s test has reasonable power for choosing the correct model.

Dedicated to My Parents and Brother


ACKNOWLEDGEMENTS

First and foremost, I would like to say “thank you!” to my dissertation advisor,
Professor Jeffrey Wooldridge. Professor Wooldridge shared his professional knowledge
and valuable time with me. This dissertation would never have been possible without his
help and support. I also thank the other committee members, Professor Robert Lalonde and
Professor Stephen Woodbury, for their helpful comments on my thesis, especially chapter
2.

There are some friends and fellow graduate students to whom I also want to say
thanks. Thanks to Hui-Wen Shih, Cheng-Ping Cheng, Te-Fen Lo, I-Jung Tsai, and the
other graduate students I worked with.

I am grateful to my parents and my brother for their support throughout my
graduate study. Thanks to them again.

TABLE OF CONTENTS

LIST OF TABLES
INTRODUCTION
CHAPTER 1
SELECTION CORRECTION WITH ENDOGENOUS EXPLANATORY
VARIABLES AND A TOBIT SELECTION EQUATION
I. Introduction
II. The Basic Model
III. Empirical Application
III.1 Estimating the Structural Wage Equation
III.2 Estimating the Labor Supply Equation
IV. An Extension of the Model
V. Models with Additional Endogenous Explanatory Variables
VI. Conclusion
CHAPTER 2
SELECTION CORRECTIONS WHEN BOTH SELF-SELECTION
BIASES AND SAMPLE SELECTION BIASES ARE PRESENT
I. Introduction
II. A Random Coefficient Model
III. Garen’s Model with Sample Selection
IV. Empirical Examples
V. The Consistency of Procedure 2.2
VI. Conclusion
CHAPTER 3
MODEL SELECTION TESTS FOR TWO-PART MODELS
I. Introduction
II. Basic Framework
III. Empirical Result
(i) The Simulation Results
(ii) The Real Data Sets Results
IV. Conclusion
CHAPTER 4
CONCLUSION
LIST OF REFERENCES

LIST OF TABLES

Chapter 1
SELECTION CORRECTION WITH ENDOGENOUS EXPLANATORY VARIABLES
AND A TOBIT SELECTION EQUATION
Table 1.1: The Descriptive Statistics of NLS Data Set
Table 1.2: The Estimation of Different Specifications
Table 1.3: Two Step and Moffitt’s MLE Estimation
Table 1.4: Tobit Estimation for the Labor Supply Equation
Table 1.5: Two Step Estimation with Assumption Relaxed

Chapter 2
SELECTION CORRECTIONS WHEN BOTH SELF-SELECTION BIASES AND
SAMPLE SELECTION BIASES ARE PRESENT
Table 2.1: The Descriptive Statistics for PSID Data Set
Table 2.2: The Estimation of Different Specifications (PSID Data Set)
Table 2.3: Joint Tests for Self-Selection and Sample Selection Biases (PSID Data Set)
Table 2.4: Descriptive Statistics for CPS Data Set
Table 2.5: The Estimation of Different Specifications (CPS Data Set)
Table 2.6: Joint Tests for Self-Selection and Sample Selection Biases (CPS Data Set)
Table 2.7: The Estimation of Different Specifications
(PSID Data Set without Parents’ Education as Instruments)

Chapter 3
ALTERNATIVE MODELS SELECTION FOR TOBIT SPECIFICATION
Table 3.1: Combinations of the Simulation
Table 3.2: The Rejection Rate (True Model: Log Normal Model)
Table 3.3: The Rejection Rate (True Model: Truncated Normal Model)

INTRODUCTION

This dissertation analyzes several different topics for selection corrections with a
censored selection variable, including sample selection bias corrections, self-selection
bias corrections, and some model selection issues.

It is quite normal in econometrics to assume that a random sample is available from
the underlying population of interest. However, this is not always the case. Sometimes
due to the way economic data are collected and economic behavior by the individuals
being sampled, a nonrandom sample can be generated. By “sample selection” I mean
cases where certain variables cannot be observed for a subset of the population. Thus,
sample selection concerns data availability.

Self-selection refers to any situation in which one or more explanatory variables are
correlated with unobservable factors affecting the outcome equation. An example is
education in a wage equation, where education is correlated with unobserved ability that
also affects the wage.

The distinction between self-selection and sample selection is not always sharp. For
example, people can self-select into the workforce, which leads to a data observability
problem (we do not observe the wage offer for those out of the workforce). We will treat
this as a sample selection problem. The labels we give to these econometric problems are
not crucial, but it is useful to have a way of distinguishing endogeneity of explanatory
variables from missing data problems.

In this thesis I propose methods to correct for possible sample selection biases. I
study the case when the variable determining selection follows a reduced form Tobit
model. Usually, the sample selection issue appears in the context of binary selection
information: either we observe the data or we do not. But in some cases more information is
available. A leading example is estimating a wage offer equation for working age adults.
In addition to whether a person works, data on hours worked are also available. It is
possible to improve the estimation using this extra information.

Heckman’s (1976) method for correcting potential sample selection
biases is widely used in empirical studies. However, the method developed in this thesis
has advantages over Heckman’s method: identification is easier, and endogenous
explanatory variables are easier to handle.

In chapter 1 I develop a sample selection bias correction procedure for the case in which an
endogenous variable, in particular the Tobit selection variable, is an explanatory variable in
the structural equation. The procedure can be used in a variety of studies. For example, it
can be used to estimate a wage offer equation for a certain group of people (such as those
aged over 25), which we use as an application in chapter 1. Hours worked is the selection
variable in this case, and it also appears as an explanatory variable in the
structural equation. There is a self-selection component to this example: people self-select
into employment, so whether we observe the wage offer depends on the individual’s
decision. As noted above, whether we call an example like this sample selection or
self-selection is not a very important issue. However, a typical self-selection issue in
economics has some features that distinguish it from the data observability problem.

Chapter 2 continues the reasoning in chapter 1 and develops a procedure that can
handle both self-selection and sample selection problems at the same time to obtain
consistent estimates of the parameters in a structural equation. The problem of
self-selection often occurs when evaluating various programs such as job training, welfare, or
the returns to education using nonexperimental data. Generally, the issue is that
individuals decide whether to participate in a program or to receive some treatment. If the
participation decision is related to factors that affect the outcome variable, ignoring the
determinants of this decision leads to biased estimates of program impacts. We consider
the case in which the endogenous explanatory variable is interacted with an unobservable.
The reasoning behind the procedure is similar to that of Garen’s (1984) procedure, which he
used to estimate the return to education when education and unobserved ability interact.
However, we extend Garen’s work by accounting for sample selection bias.

Hurdle models are more flexible alternatives to the Tobit model. Chapter 3
studies model selection for hurdle, or two-tier, models. Two competing non-nested models
are considered. One is Cragg’s truncated normal model; the other is a two-part log-normal
model.

For choosing between nonnested models, two very different approaches have
been proposed. The earliest is based on the work of Cox (1961, 1962), who derived
specification tests that use information about a specific alternative and test whether the
null can predict the performance of the alternative. This approach assumes that one of the
competing models is the true model and tests the hypothesis based on this assumption.

The other approach is what Vuong (1989) terms a “model selection approach.”
Vuong bases tests of nonnested alternatives on an estimate of the Kullback-Leibler (1951)
Information Criterion (KLIC), which measures the distance between each candidate
distribution and the true distribution. Compared with Cox’s approach, Vuong’s approach does
not assume that one of the models is correct under $H_0$. The null hypothesis is that the
models fit the data equally well, and the alternative is that one model fits better. If a
model is correctly specified then, asymptotically, it produces the best fit.
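
The comparison just described can be sketched in a few lines. This is a minimal illustration, assuming the basic form of the Vuong (1989) statistic for strictly non-nested models with no adjustment for the number of parameters; the function names and the example inputs are hypothetical and are not taken from this dissertation.

```python
import math

def vuong_statistic(loglik_f, loglik_g):
    """Basic Vuong statistic for two strictly non-nested models.

    loglik_f and loglik_g hold the per-observation log-likelihood
    contributions of the two models, each evaluated at its own estimates.
    """
    if len(loglik_f) != len(loglik_g):
        raise ValueError("both models must use the same observations")
    n = len(loglik_f)
    # Pointwise log-likelihood ratios d_i = log f_i - log g_i.
    d = [lf - lg for lf, lg in zip(loglik_f, loglik_g)]
    mean_d = sum(d) / n
    # Sample variance of d_i (divisor n), the usual estimate of omega^2.
    omega2 = sum(di * di for di in d) / n - mean_d ** 2
    return math.sqrt(n) * mean_d / math.sqrt(omega2)

def vuong_decision(stat, critical=1.96):
    """Two-sided rule based on a standard normal critical value."""
    if stat > critical:
        return "prefer f"
    if stat < -critical:
        return "prefer g"
    return "cannot discriminate"

# Hypothetical log-likelihood contributions, for illustration only.
stat = vuong_statistic([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(stat, vuong_decision(stat))
```

Under the null that both models fit equally well the statistic is asymptotically standard normal, so large positive values favor the first model and large negative values favor the second.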

I apply Vuong’s (1989) approach to see which hurdle model fits the data better. A small
simulation study, using data sets generated from both the truncated normal
distribution and the log-normal distribution, examines whether Vuong’s procedure has power
for picking the true model. I also apply the test to three labor data sets.

Chapter 4 contains concluding comments and directions for future research.

Chapter 1
SELECTION CORRECTION WITH ENDOGENOUS EXPLANATORY
VARIABLES AND A TOBIT SELECTION EQUATION

I. Introduction

The problem of nonrandom sampling occurs frequently in econometrics. Sample
selection problems can arise when a random sample is not available from the underlying
population of interest. Due to the way economic data are collected, sometimes we can
obtain only a nonrandom sample from the population. If the sample does not represent the
population of interest, sample selection biases may arise. As an example, suppose we are
interested in estimating a wage offer equation for women over age 30. By deﬁnition, this
equation is supposed to represent all women over age 30, whether or not a woman is
actually working at the time of the survey. Since we can only observe the wage offer for
working women, we actually select our sample on this basis. The sample selection problem
arises because data on a key variable, wage, is available only for a clearly deﬁned subset of
the population. This is an example of incidental truncation (of the wage offer, in this
example), because wage is missing due to the outcome of labor force participation.

One can easily confuse sample selection with
self-selection. Consider the above example without the incidental truncation problem. The
population is working women, so we do not have a sample selection problem. However,
education level might be correlated with unobserved characteristics, such as “ability,” that
also have a direct effect on the wage. In other words, people “self-select” their education
levels. This makes the self-selection problem and the sample selection problem different in
nature. In general, the terms “sample selection” and “self-selection” are used to
distinguish clearly between the two kinds of selection problem. Sample selection refers to
the data observability problem that arises when the sample is selected; self-selection refers
to the endogeneity of explanatory variables, that is, to explanatory variables that are
correlated with the error term in the structural equation. There is no strict separation
between these problems. Self-selection could be the cause of the sample selection
problem. As in our wage offer example, we do not observe the wage offer for
people who do not work, so it is a sample selection problem. However, one can also say
that whether people work or not comes from their “self-selection.”

It is fairly well known that in a linear model with endogenous explanatory
variables, if the selection rule is based on exogenous variables, estimation of the
population model by two stage least squares (2SLS hereafter) using the selected sample is
consistent and asymptotically normal (see, for example, Wooldridge (1996)). If the
selection rule is based on endogenous variables, applying 2SLS to estimate the population
model using the selected sample will not generally be consistent. Therefore, it is important
to have methods for testing and correcting for sample selection with endogenous
explanatory variables.

The purpose of this chapter is to derive and apply an alternative method to test for and
correct sample selection bias while estimating what is essentially the Type III Tobit model¹
with endogenous explanatory variables, concentrating on the case where the selection variable
appears in the structural equation. This makes our model similar to that of Nelson and
Olson (1978) and a special case of the Type IV Tobit model, which is an
extension of Type III Tobit. However, we make fewer assumptions and obtain simple
two-step estimators.

A standard Type III Tobit model consists of two equations. One is the structural
equation, whose coefficients we want to estimate consistently. The other is
the selection equation, which takes the form of a Tobit equation. Unlike the Type II Tobit
model, whose selection equation involves only a binary variable (a probit equation), the
Type III Tobit model uses more information in the selection equation. Heckman (1976)
pioneered the Type II Tobit model. Heckman’s two-step method for correcting sample
selection bias in his labor supply model, which adds the inverse Mills ratio to the structural
equation, has been widely used. The procedure was further extended to a wide class of
models by Lee (1978). The drawback of Heckman’s procedure is that, if there is no
good exclusion restriction in the structural model, the estimators can be very imprecise in
finite samples. Full maximum likelihood methods or semiparametric methods could be
applied to estimate the Type II Tobit model. However, as discussed by Wooldridge (1996),
none of them is fully satisfactory. If more information is available in the selection equation,
as in the Type III Tobit form, the additional information may be used to obtain more precise
estimators.

The traditional method for estimating Type III and IV Tobit models is maximum
likelihood, which makes full parametric assumptions; see Amemiya (1985, section 10.8).
To detect whether sample selection bias exists, Vella (1992) extended the testing
procedures proposed by Heckman (1979) and Vella (1993) to construct a t-test on a
constructed variable in an auxiliary equation. However, Vella did not prove the
consistency of the estimators of the parameters in the structural equation under his method.
In the case of a Tobit selection equation, Wooldridge (1996) showed that adding the Tobit
residuals can produce consistent estimators in the structural equation. Wooldridge relaxed
the basic Type III Tobit assumption, needed for MLE, that the error terms
in the structural equation and the selection equation are bivariate normal. Instead, he
proposed the weaker assumption that only the error term in the selection equation is normally
distributed and that the two error terms are linearly related. Under these modifications of
the Type III Tobit model, he proposed a multi-step procedure to estimate the model. He
also proved that the testing procedure proposed by Vella serves not only
as a test but also as a correction for sample selection bias in the cross-section linear model.
In other words, he proved the consistency of the estimators of the structural parameters after
the constructed variable is added as a regressor.

¹ See Amemiya (1985, section 10.3).

Unlike MLE, which is computationally burdensome, the multi-step procedure is easy to
implement, even for models with endogenous explanatory variables (the case we
discuss in this chapter). It also requires weaker assumptions than MLE; in other words, it
is more robust. Furthermore, it provides a simple test for sample selection bias and
corrects the problem at the same time if bias does exist.

In this chapter, the basic approach is a simple two-step method that extends
a method proposed by Wooldridge (1996). Wooldridge covers the case of the Type III Tobit
model. I focus on the Type III Tobit model with endogenous explanatory
variables, especially the case where the dependent variable of the selection equation itself
is an explanatory variable in the structural equation (a special case of Type
IV Tobit). To estimate our model consistently, we modify the Type III
Tobit model by putting the dependent variable of the selection equation on the right-hand
side of the structural equation as an explanatory variable, and we
correct for sample selection bias and endogeneity of the selection variable at the same
time. We also extend the two-step procedure to a three-step procedure in order to estimate
the selection equation consistently. We use the same data set as Moffitt (1984), which
contains information on married women in the 1972 wave of the National Longitudinal
Survey of Older Women. Moffitt used a maximum likelihood method to estimate the labor
supply function in his paper. Compared with his method, the method we use here is
easier to compute and requires fewer prior assumptions, which means the estimation
procedure we use is more robust.

The estimation results show that (1) there is sample selection bias
that cannot be ignored; (2) the effect of hours on log(wage) appears to be linear; (3) the
return to education appears to be overestimated when the sample selection problem is not
accounted for; (4) the wage effect is positive and significant in the labor supply equation;
and (5) the wage elasticity of labor supply is smaller when it is conditional on positive
working hours than when it is unconditional.

The basic econometric model is in section II. The empirical results and the
comparisons are presented in section III. In section IV we relax the assumption of a linear
relationship between the error terms in the structural equation and the selection equation.
In section V we extend the basic model to allow additional endogenous explanatory
variables in the structural equation. Section VI is the conclusion.

II. The Basic Model

Consider the following structural model:

$$y_1 = \beta y_2^* + z_1\gamma + u_1 \qquad (1.1)$$
$$y_2^* = \alpha_0 y_1 + z_2\delta + u_2 \qquad (1.2)$$
$$y_2 = \max(0, y_2^*) \qquad (1.3)$$

where $y_1$, $y_2$ are endogenous variables and $y_1$ is observed only when $y_2 > 0$; $z_1$, $z_2$ are
exogenous variables and are always observed, $z_1$ is a $1 \times K_1$ vector with $z_{11} \equiv 1$ and $z_2$ is
a $1 \times K_2$ vector with $z_{21} \equiv 1$; $u_1$, $u_2$ are structural errors; $y_2^*$ is the latent variable,
which equals $y_2$ whenever $y_2 > 0$.

This model basically belongs to the class of Tobit selection models with
endogenous explanatory variables. The basic framework is similar to Nelson and Olson
(1978). However, Nelson and Olson assume $y_1$ is always observed, so there is no sample
selection problem, which differs from our specification. We use $y_2^*$ instead of $y_2$ as the
explanatory variable in (1.1) so that the reduced form of equation (1.2) can be derived.
The reduced form of equation (1.2) can be written as (1.4) below, so the general model becomes
(1.1), (1.3) and

$$y_2^* = z\pi_2 + v_2 \qquad (1.4)$$

We can combine (1.3) and (1.4) to obtain a reduced form selection equation:

$$y_2 = \max(0, z\pi_2 + v_2) \qquad (1.5)$$
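
As a sketch of the step from the structural system to the reduced form, substitute (1.1) into (1.2) and solve for $y_2^*$. The derivation assumes $\alpha_0\beta \neq 1$, a condition that is not stated explicitly in the text, and the closed forms for $z\pi_2$ and $v_2$ in terms of the structural parameters are written out only for illustration:

```latex
\begin{aligned}
y_2^* &= \alpha_0\,(\beta y_2^* + z_1\gamma + u_1) + z_2\delta + u_2 \\
(1 - \alpha_0\beta)\, y_2^* &= \alpha_0 z_1\gamma + z_2\delta + \alpha_0 u_1 + u_2 \\
y_2^* &= \frac{\alpha_0 z_1\gamma + z_2\delta}{1 - \alpha_0\beta}
        + \frac{\alpha_0 u_1 + u_2}{1 - \alpha_0\beta}
        \;\equiv\; z\pi_2 + v_2 .
\end{aligned}
```

Under this reading, $v_2$ is a linear combination of $u_1$ and $u_2$, which is why $u_1$ and $v_2$ are generally correlated.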


The $1 \times K$ vector $z$ contains the nonredundant elements of $z_1$ and $z_2$. This model is similar
to the Type III Tobit model except that we include $y_2^*$ in (1.1) as an explanatory variable.

We first consider a general linear model where selection is determined by the
instrumental variables. The population model of interest is

$$y = \beta_1 + \beta_2 x_2 + \dots + \beta_k x_k + u = x\beta + u,$$

where $x_1 \equiv 1$. One or more of the elements of $x$ can be correlated with $u$. We also assume
the availability of a $1 \times L$ vector $z$, $L \ge k$, such that

$$E(u \mid z) = 0.$$

Naturally, any exogenous elements in $x$ are included in $z$. Under the rank condition for
identification, rank $E(z'x) = k$, $\beta$ could be consistently estimated by two stage least
squares if we had a random sample from the population. However, if we can observe data
on $(x, y, z)$ for only a subset of the population, consistency depends on the nature of the
selection rule.

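Let $s$ be a binary selection indicator such that $s = 1$ if $(x, y, z)$ is observed and
$s = 0$ otherwise. Assume that $s = h(z)$ for some known function $h(\cdot)$, so that selection is
based entirely on the exogenous instrumental variables. Let $(x_i, y_i, z_i)$, $i = 1, 2, \dots, N$, be a
random sample from the population, and let $s_i = h(z_i)$. Then the 2SLS estimator using
the selected sample can be written as

$$\hat\beta = \left[ \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' x_i \Big)' \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' z_i \Big)^{-1} \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' x_i \Big) \right]^{-1} \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' x_i \Big)' \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' z_i \Big)^{-1} \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' y_i \Big).$$

Substituting $y_i = x_i\beta + u_i$ gives

$$\hat\beta = \beta + \left[ \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' x_i \Big)' \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' z_i \Big)^{-1} \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' x_i \Big) \right]^{-1} \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' x_i \Big)' \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' z_i \Big)^{-1} \Big( N^{-1} \sum_{i=1}^{N} s_i z_i' u_i \Big).$$

Since we assume $E(u \mid z) = 0$ and $s_i z_i$ is a function of $z_i$, $E(s_i z_i' u_i) = 0$ by
iterated expectations. By the law of large numbers, plim $\hat\beta = \beta$. More details
are given in Wooldridge (1996).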
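
A small simulation can illustrate the argument. The sketch below is not from the dissertation: the data generating process, the parameter values and the function names are all assumptions chosen for illustration. It uses one endogenous regressor and one instrument, so 2SLS reduces to the ratio of sample covariances, and compares selection on the instrument with selection on the outcome.

```python
import random

def iv_slope(rows):
    """Just-identified 2SLS slope with an intercept: cov(z, y) / cov(z, x).

    Each row is a tuple (x, y, z).
    """
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    mz = sum(r[2] for r in rows) / n
    czy = sum((r[2] - mz) * (r[1] - my) for r in rows)
    czx = sum((r[2] - mz) * (r[0] - mx) for r in rows)
    return czy / czx

def simulate(n=40000, beta=2.0, seed=0):
    """Hypothetical design: y = 1 + beta * x + u, with x correlated with u."""
    rng = random.Random(seed)
    exog_sel, endog_sel = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                 # instrument, independent of u
        u = rng.gauss(0.0, 1.0)                 # structural error
        x = z + 0.8 * u + rng.gauss(0.0, 1.0)   # endogenous regressor
        y = 1.0 + beta * x + u
        if z > 0.0:                             # selection s = h(z): exogenous
            exog_sel.append((x, y, z))
        if y > 1.0:                             # selection on the outcome
            endog_sel.append((x, y, z))
    return iv_slope(exog_sel), iv_slope(endog_sel)

b_exog, b_endog = simulate()
print("selection on z:", round(b_exog, 3))
print("selection on y:", round(b_endog, 3))
```

In this design the estimate from the sample selected on $z$ should be close to the true slope of 2, while the estimate from the sample selected on $y$ should not, in line with the distinction drawn above.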

In our setting, selection is determined by a censored selection variable. Let $v$ be a
zero mean random variable such that $(z, v)$ determines sample selection: $s = h(z, v)$
and $(x, y, z, v)$ is observed when $s = 1$. Assuming that $(u, v)$ is independent of $z$, and that
$E(u \mid v) = \xi v$, we can write

$$y = x\beta + \xi v + e, \qquad (1.6)$$

where $E(e \mid z, v) = 0$. If we use 2SLS on the selected sample to estimate (1.1), we will not
get a consistent estimator of $\beta$. But we can estimate (1.6) by 2SLS using instruments
$(z, v)$.
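
The claim that the new error has zero conditional mean can be checked in two lines. This is a sketch that uses only the two stated assumptions (independence of $(u, v)$ from $z$, and linearity of $E(u \mid v)$ with slope written here as $\xi$); the definition $e \equiv u - \xi v$ is the one implied by (1.6):

```latex
\begin{aligned}
E(u \mid z, v) &= E(u \mid v) = \xi v
   && \text{since } (u, v) \text{ is independent of } z, \\
E(e \mid z, v) &= E(u \mid z, v) - \xi v = 0 .
\end{aligned}
```

Because $s = h(z, v)$ is a function of the conditioning variables, the same equality holds on the selected sample, $E(e \mid z, v, s = 1) = 0$, which is what makes estimation of (1.6) on the selected sample valid.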

We can apply this result to the model (1.1) and (1.5) to estimate $\beta$ and $\gamma$. To test for
and correct possible sample selection bias when estimating (1.1), we need a few
assumptions:

Assumptions 1.1

(i) $(z, y_2)$ is always observed but $y_1$ is observed only when $y_2 > 0$.
(ii) $(u_1, v_2)$ is zero mean and independent of $z$.
(iii) $v_2 \sim \text{Normal}(0, \tau_2^2)$
(iv) $E(u_1 \mid v_2) = \xi v_2$
An important assumption in the Type III Tobit model is bivariate normality of the
errors, which allows the model to be estimated by MLE. Here, however, we need only
Assumption 1.1 (iv) instead of bivariate normality of $(u_1, v_2)$. This allows a fairly
broad range of distributions for $u_1$. Under Assumptions 1.1 (ii) and (iv) we can write

$$u_1 = \xi v_2 + e \qquad (1.7)$$

where $e$ is a disturbance such that $E(e \mid z, v_2) = 0$. Plugging (1.7) into (1.1) gives

$$y_1 = \beta y_2^* + z_1\gamma + \xi v_2 + e \qquad (1.8)$$

Here the test of $H_0: \xi = 0$ is a valid test for the sample selection problem, as it tests whether
$u_1$ and $v_2$ are uncorrelated. If $H_0$ is not rejected, we can treat $y_2^*$ as exogenous in the
structural equation (1.1) and selection is based on exogenous variables (since $z$ is exogenous).
Thus, OLS on the selected sample works in that case. On the other hand, $y_2^*$ is endogenous in
the structural equation (1.1) if $\xi \neq 0$. However, since $y_2^*$ depends only on $(z, v_2)$, it is
exogenous in equation (1.8). In fact,

$$E(y_1 \mid z, v_2) = \beta y_2^* + z_1\gamma + \xi v_2 \qquad (1.9)$$

Further, the expectation is the same if we condition also on $v_2 > -z\pi_2$. From Wooldridge
(1996) Theorem 2.1, it follows that OLS estimation of (1.9) using the selected sample will
be consistent (take $x = (y_2^*, z_1, v_2)$). Because $v_2$ is not observed, it must be estimated.
This leads to the following procedure.

 

PROCEDURE 1.1:

(i) Estimate
$$y_{i2} = \max(0, z_i\pi_2 + v_{i2}), \qquad v_{i2} \mid z_i \sim \text{Normal}(0, \tau_2^2)$$
by standard Tobit MLE using all $N$ observations.

(ii) Obtain the residuals from step (i) as $\hat v_{i2}$ for $y_{i2} > 0$ $(i = 1, 2, \dots, N_1)$ by defining
$$\hat v_{i2} = y_{i2} - z_i\hat\pi_2, \qquad i = 1, 2, \dots, N_1.$$

(iii) Regress $y_{i1}$ on $y_{i2}$, $z_{i1}$, and $\hat v_{i2}$, $i = 1, 2, \dots, N_1$.
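
The three steps can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the implementation used in the dissertation: it relies on NumPy, the Tobit step uses Newton iterations on Olsen's concave reparameterization (a choice made here for compactness), and the data generating process, parameter values and function names are all hypothetical.

```python
import math
import numpy as np

def norm_pdf(c):
    return np.exp(-0.5 * c ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    return 0.5 * (1.0 + np.vectorize(math.erf)(c / math.sqrt(2.0)))

def tobit_mle(y, Z, max_iter=100):
    """Standard Tobit MLE in Olsen's form a = pi/tau, h = 1/tau."""
    pos = y > 0
    Zp, yp, Z0 = Z[pos], y[pos], Z[~pos]
    k = Z.shape[1]

    def loglik(a, h):
        return (np.log(norm_cdf(-(Z0 @ a))).sum()
                + yp.size * math.log(h) - 0.5 * ((h * yp - Zp @ a) ** 2).sum())

    b = np.linalg.lstsq(Zp, yp, rcond=None)[0]   # OLS start on positive part
    h = 1.0 / (yp - Zp @ b).std()
    a = b * h
    for _ in range(max_iter):
        c0 = Z0 @ a
        lam = norm_pdf(c0) / norm_cdf(-c0)       # hazard, censored observations
        r = h * yp - Zp @ a                      # standardized residuals
        grad = np.append(Zp.T @ r - Z0.T @ lam, yp.size / h - r @ yp)
        H = np.empty((k + 1, k + 1))
        H[:k, :k] = -(Z0 * (lam * (lam - c0))[:, None]).T @ Z0 - Zp.T @ Zp
        H[:k, k] = H[k, :k] = Zp.T @ yp
        H[k, k] = -yp.size / h ** 2 - yp @ yp
        step = np.linalg.solve(H, grad)
        t = 1.0                                  # step halving keeps h > 0
        while t > 1e-8 and (h - t * step[k] <= 0 or
                            loglik(a - t * step[:k], h - t * step[k]) < loglik(a, h)):
            t /= 2.0
        a, h = a - t * step[:k], h - t * step[k]
        if np.max(np.abs(t * step)) < 1e-9:
            break
    return a / h, 1.0 / h

def procedure_1_1(y1, y2, Z, Z1):
    pi_hat, tau_hat = tobit_mle(y2, Z)               # step (i): all N observations
    sel = y2 > 0
    v_hat = y2[sel] - Z[sel] @ pi_hat                # step (ii): Tobit residuals
    X = np.column_stack([y2[sel], Z1[sel], v_hat])   # step (iii): OLS regressors
    coef = np.linalg.lstsq(X, y1[sel], rcond=None)[0]
    return coef, pi_hat, tau_hat

# Hypothetical design: beta = 0.5, gamma = (1, -1), xi = 0.8, tau_2 = 1.
rng = np.random.default_rng(0)
N = 5000
Z = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
Z1 = Z[:, :2]                                        # z1 excludes the last column of z
v2 = rng.normal(size=N)
u1 = 0.8 * v2 + 0.5 * rng.normal(size=N)
y2_star = Z @ np.array([0.5, 0.5, 1.0]) + v2
y2 = np.maximum(0.0, y2_star)
y1 = np.where(y2 > 0, 0.5 * y2_star + Z1 @ np.array([1.0, -1.0]) + u1, np.nan)

coef, pi_hat, tau_hat = procedure_1_1(y1, y2, Z, Z1)
print("beta, gamma, xi estimates:", np.round(coef, 3))
```

The coefficient vector is ordered as (coefficient on $y_2$, intercept, slope on the included exogenous variable, coefficient on $\hat v_2$). The last column of `Z` is excluded from the structural equation, which is what keeps $y_2$ from being collinear with the other regressors in step (iii). The standard errors from this OLS step would still need the generated regressor adjustment.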

Since we have consistent first-stage estimates, replacing $v_2$ with its estimate does
not affect consistency in the second stage. The correction of standard errors due to the
generated regressor problem is discussed in Wooldridge (1996). Wooldridge’s formula
can be applied here because $\hat v_{i2}$ is the only generated regressor.

It should be noted that there are two separate features of our model. First, there
is an endogeneity problem for $y_2^*$ as an explanatory variable in the structural equation.
Second, there is a sample selection problem because we can only observe $y_1$ when
$y_2 > 0$. By including $\hat v_2$ as an additional regressor in the structural equation, the two
problems are solved at the same time. The endogeneity of $y_2$ is actually the source of the
sample selection problem in our case. If $y_2$ is exogenous, then $u_1$ and $v_2$ are
uncorrelated and the sample selection problem does not exist.

In addition to $\beta$ and $\gamma$, we may want to estimate $\alpha_0$ and $\delta$. Let
$\hat y_{i1} \equiv \hat\beta \hat y_{i2} + z_{i1}\hat\gamma$ for all $N$ observations, where $\hat y_{i2} = z_i\hat\pi_2$ is the predicted
value from the Tobit model in Procedure 1.1 step (i). Estimate the model

$$y_{i2} = \max(0, \alpha_0 \hat y_{i1} + z_{i2}\delta + error_{i2}) \qquad (1.10)$$

by standard Tobit. Then we can get consistent estimators for $\alpha_0$ and $\delta$ also. Since the
relationship between the explanatory variables and the dependent variable is nonlinear,
interest lies in $E(y_{i2} \mid y_{i1}, z_{i2})$ and $E(y_{i2} \mid y_{i1}, z_{i2}, y_{i2} > 0)$. If we let
$x\theta \equiv \alpha_0 y_{i1} + z_{i2}\delta$, we can apply the formulas provided by Wooldridge (1998) directly.
We briefly outline the reasoning behind the formulas below.

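To obtain $E(y_{i2} \mid x)$, first use the law of iterated expectations:

$$E(y_{i2} \mid x) = P(y_{i2} = 0 \mid x) \cdot 0 + P(y_{i2} > 0 \mid x) \cdot E(y_{i2} \mid x, y_{i2} > 0)$$
$$= P(y_{i2} > 0 \mid x) \cdot E(y_{i2} \mid x, y_{i2} > 0) \qquad (1.11)$$

The first term, $P(y_{i2} > 0 \mid x)$, is easy to derive:

$$P(y_{i2} > 0 \mid x) = P(u_2 > -x\theta \mid x) = P\left(\frac{u_2}{\sigma} > -\frac{x\theta}{\sigma}\right) = \Phi(x\theta/\sigma) \qquad (1.12)$$

We can also derive that

$$E(y_{i2} \mid x, y_{i2} > 0) = x\theta + E(u_2 \mid u_2 > -x\theta) = x\theta + \sigma \frac{\phi(x\theta/\sigma)}{\Phi(x\theta/\sigma)}$$

So we can conclude that

$$E(y_{i2} \mid x, y_{i2} > 0) = x\theta + \sigma \frac{\phi(x\theta/\sigma)}{\Phi(x\theta/\sigma)} \qquad (1.13)$$
$$E(y_{i2} \mid x) = \Phi(x\theta/\sigma)\, x\theta + \sigma \phi(x\theta/\sigma) \qquad (1.14)$$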
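
The two conditional means can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function and argument names are chosen here for illustration, with `xtheta` standing for the index $x\theta$ and `sigma` for $\sigma$.

```python
import math

def norm_pdf(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def tobit_cond_mean(xtheta, sigma):
    """E(y | x, y > 0) = x*theta + sigma * phi(c) / Phi(c), with c = x*theta / sigma."""
    c = xtheta / sigma
    return xtheta + sigma * norm_pdf(c) / norm_cdf(c)

def tobit_uncond_mean(xtheta, sigma):
    """E(y | x) = Phi(c) * x*theta + sigma * phi(c), with c = x*theta / sigma."""
    c = xtheta / sigma
    return norm_cdf(c) * xtheta + sigma * norm_pdf(c)

# Illustrative index and scale values (hypothetical).
print(tobit_cond_mean(1.5, 2.0), tobit_uncond_mean(1.5, 2.0))
```

The unconditional mean equals the probability of a positive outcome times the conditional mean, which is exactly the decomposition used in the derivation.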


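where $\sigma$ is the standard deviation of the error, $\phi(\cdot)$ is the standard normal pdf, and
$\Phi(\cdot)$ is the standard normal cdf. The partial effects of variable $x_j$ on (1.13) and (1.14) are:

$$\frac{\partial E(y_{i2} \mid x, y_{i2} > 0)}{\partial x_j} = \theta_j + \theta_j \lambda'(x\theta/\sigma) = \theta_j \left[ 1 - \lambda(x\theta/\sigma) \{ x\theta/\sigma + \lambda(x\theta/\sigma) \} \right] \qquad (1.15)$$

where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$, and

$$\frac{\partial E(y_{i2} \mid x)}{\partial x_j} = \frac{\partial P(y_{i2} > 0 \mid x)}{\partial x_j} \cdot E(y_{i2} \mid x, y_{i2} > 0) + P(y_{i2} > 0 \mid x) \cdot \frac{\partial E(y_{i2} \mid x, y_{i2} > 0)}{\partial x_j}.$$

Since we already know that $P(y_{i2} > 0 \mid x) = \Phi(x\theta/\sigma)$ and
$\partial P(y_{i2} > 0 \mid x)/\partial x_j = (\theta_j/\sigma)\,\phi(x\theta/\sigma)$, we can derive that

$$\frac{\partial E(y_{i2} \mid x)}{\partial x_j} = \frac{\theta_j}{\sigma} \phi(x\theta/\sigma) \cdot E(y_{i2} \mid x, y_{i2} > 0) + \Phi(x\theta/\sigma) \cdot \frac{\partial E(y_{i2} \mid x, y_{i2} > 0)}{\partial x_j} \qquad (1.16)$$

The elasticity of $y_{i2}$ with respect to $x_j$, conditional on $y_{i2} > 0$, is

$$\frac{\partial E(y_{i2} \mid x, y_{i2} > 0)}{\partial x_j} \cdot \frac{x_j}{E(y_{i2} \mid x, y_{i2} > 0)} \qquad (1.17)$$

and the elasticity of $y_{i2}$ with respect to $x_j$, without conditioning on $y_{i2} > 0$, is

$$\frac{\partial E(y_{i2} \mid x)}{\partial x_j} \cdot \frac{x_j}{E(y_{i2} \mid x)} \qquad (1.18)$$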
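
The partial effects and elasticities can be sketched in code as follows. The function names and the numerical inputs are hypothetical. Carrying out the algebra in the unconditional partial effect shows that it simplifies to $\theta_j \Phi(x\theta/\sigma)$; this standard simplification is not stated in the text above and is used here only as a numerical check.

```python
import math

def norm_pdf(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def inv_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)

def pe_conditional(theta_j, xtheta, sigma):
    """Partial effect of x_j on E(y | x, y > 0)."""
    c = xtheta / sigma
    lam = inv_mills(c)
    return theta_j * (1.0 - lam * (c + lam))

def pe_unconditional(theta_j, xtheta, sigma):
    """Partial effect of x_j on E(y | x), built from the product rule."""
    c = xtheta / sigma
    cond_mean = xtheta + sigma * inv_mills(c)
    return ((theta_j / sigma) * norm_pdf(c) * cond_mean
            + norm_cdf(c) * pe_conditional(theta_j, xtheta, sigma))

def elasticity_conditional(theta_j, x_j, xtheta, sigma):
    cond_mean = xtheta + sigma * inv_mills(xtheta / sigma)
    return pe_conditional(theta_j, xtheta, sigma) * x_j / cond_mean

def elasticity_unconditional(theta_j, x_j, xtheta, sigma):
    c = xtheta / sigma
    uncond_mean = norm_cdf(c) * xtheta + sigma * norm_pdf(c)
    return pe_unconditional(theta_j, xtheta, sigma) * x_j / uncond_mean

# Hypothetical values for illustration.
e_cond = elasticity_conditional(0.7, 2.0, 0.4, 1.3)
e_uncond = elasticity_unconditional(0.7, 2.0, 0.4, 1.3)
print(e_cond, e_uncond)
```

With these formulas the conditional elasticity equals the unconditional one multiplied by the factor $1 - \lambda(c)\{c + \lambda(c)\}$, which lies between 0 and 1, so the conditional elasticity is the smaller of the two in absolute value.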

In section III we apply the three-step method derived so far to estimate the wage equation
and the corresponding labor supply equation. We also estimate the wage elasticity of labor
supply.

We can use the same argument to put $y_2^{*2}$ in the structural equation of our model.
The sample selection corrected version of the structural equation becomes:

$$y_1 = \beta y_2^* + \eta y_2^{*2} + z_1\gamma + \xi v_2 + e \qquad (1.19)$$

where $y_2^{*2}$ in this equation is an exogenous explanatory variable. However, unless
$\eta = 0$, (1.19), (1.2) and (1.3) do not lead to a reduced form selection equation
like (1.5). But we can use (1.19) to test $H_0: \eta = 0$; that is, we can test for linearity in the
relationship between $y_1$ and $y_2^*$.

III. Empirical Application

III.1. Estimating the Structural Wage Equation

We now apply the previous methods to consistently estimate a wage offer
equation. The structural equations are:

$$\log(wage) = \beta_0 + \beta_1 hours^* + z_1\gamma + u_1 \qquad (1.20)$$
$$hours = \max(0, \alpha_0 \log(wage) + z_2\delta + u_2) \qquad (1.21)$$

where $hours^*$ are “desired” hours, observed only when $hours > 0$. $z_1$ and $z_2$ are vectors
of exogenous variables as defined in section II. In the notation of section II, take
$y_1 = \log(wage)$ and $y_2^* = hours^*$. We observe the wage only when $hours > 0$. Also, we
maintain Assumptions 1.1.

We can derive the reduced form of (1.21) as

$$hours = \max(0, z\pi_2 + v_2). \qquad (1.22)$$

There are two reasons we include $hours^*$ as an explanatory variable in the wage equation, in contrast to the traditional specification that omits it. First, theoretically labor is a quasi-fixed input that generates quasi-fixed costs to the firm, which means that a lower wage will be paid at low hours of work than at high hours of work; see Oi (1962). There have been few theoretical or empirical studies of the effect of hours worked on wage offers. Rosen (1969) is an example of an early study; Rosen (1976), Larson (1979), and Hausman (1981) are others. Second, Moffitt (1984) argues that working hours is an important explanatory variable that should be included in the wage equation. We apply the current methods to Moffitt's data set to see if this is the case.

From Assumptions 1.1 in section II, we know that:

$$ E(\log(wage) \mid z, v_2) = \beta_0 + \beta_1\,hours^* + z_1\gamma + \xi v_2 \tag{1.23} $$

Adding the term $\xi v_2$ serves not only to test and correct for sample selection bias but also to test and correct for the endogeneity of $hours^*$. Consistent estimators of $\beta_1$ and $\gamma$ can be obtained by the following procedure:

(a). Estimate the reduced form (1.22) by standard Tobit using all N observations. Save the residuals for hours > 0, defined as $\hat{v}_2 = hours - z\hat{\pi}_2$.

(b). Regress log(wage) on 1, $hours^*$, $z_1$, and $\hat{v}_2$ using the observations with hours > 0.

Here we assume that $Var(u_1 \mid v_2)$ is constant, so that the usual t statistic is valid and we obtain a convenient t-test of $\xi$ for sample selection bias/endogeneity of $hours^*$ after the regression in step (b). If the coefficient is significant, including $\hat{v}_2$ has also corrected the problem. The White (1980) heteroskedasticity-robust t-statistic can be used if $Var(u_1 \mid v_2)$ is not constant.
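Steps (a) and (b) can be sketched in Python on simulated data. This is a minimal illustration under stated assumptions, not the code used for the estimates in this chapter: the data-generating values (true $\beta_1 = 0.5$, $\xi = 0.8$), the helper names `tobit_mle`, `simulate`, and `two_step`, and the use of SciPy's BFGS optimizer for the Tobit likelihood are all choices made here for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_mle(y, X):
    """Standard Tobit, left-censored at zero, fitted by maximum likelihood."""
    pos = y > 0
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]            # OLS starting values
    start = np.append(b0, np.log(np.std(y - X @ b0)))

    def neg_loglik(p):
        b, s = p[:-1], np.exp(p[-1])                     # sigma = exp(.) stays positive
        xb = X @ b
        ll = norm.logpdf((y[pos] - xb[pos]) / s).sum() - pos.sum() * np.log(s)
        ll += norm.logcdf(-xb[~pos] / s).sum()
        return -ll / len(y)

    res = minimize(neg_loglik, start, method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])

def simulate(n=5000, seed=0):
    """Hypothetical design: true beta1 = 0.5 and xi = 0.8."""
    rng = np.random.default_rng(seed)
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    v2 = rng.normal(size=n)
    u1 = 0.8 * v2 + 0.5 * rng.normal(size=n)             # E(u1 | v2) = 0.8 * v2
    y2_star = 0.5 + z1 + z2 + v2                         # latent selection variable
    y2 = np.maximum(0.0, y2_star)
    y1 = 1.0 + 0.5 * y2_star + z1 + u1                   # only used where y2 > 0
    return y1, y2, z1, z2

def two_step(y1, y2, z1, z2):
    n = len(y2)
    Z = np.column_stack([np.ones(n), z1, z2])
    pi2_hat, _ = tobit_mle(y2, Z)                        # step (a): all observations
    sel = y2 > 0
    v2_hat = y2[sel] - Z[sel] @ pi2_hat                  # Tobit residuals, selected sample
    X = np.column_stack([np.ones(sel.sum()), y2[sel], z1[sel], v2_hat])
    # step (b): OLS; coefficients ordered as [const, beta1, gamma, xi]
    return np.linalg.lstsq(X, y1[sel], rcond=None)[0]
```

Calling `two_step(*simulate())` returns the step (b) coefficients; in this design the second and fourth entries should be close to 0.5 and 0.8. The usual OLS t statistic on the last coefficient is the test for selection bias/endogeneity described above, valid under the constant-variance assumption.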

We estimate the model using the same data set used by Moffitt (1984). The data set contains a cross section of women drawn from the 1972 wave of the National Longitudinal Survey (NLS) of Older Women. It includes all women in this wave who have valid data for hours of work last week, the hourly wage rate, the flow of asset income, and related characteristics. The variables included in the $z_1$ vector are race, age, years of schooling, and three area variables: the size of the local labor force and the employment fractions in manufacturing and in government in the census region of residence. The $z_2$ vector includes dummies for marital status, age, race, the number of family members, the number of children in the household who are younger than 6, and the number of children in the household who are older than 6. Both $z_1$ and $z_2$ are treated as exogenous explanatory variables. See Table 1.1 for the descriptive statistics. In Table 1.1 the percentage of women working is 48.69%. Eighty percent of the women are married. The average age in the sample is 44.37 and the average education is 10.32 years.

To compare ordinary least squares estimation with our two-step procedure, we estimate the structural equation by ordinary least squares on the selected sample. This corresponds to simply dropping $\hat{v}_2$ as an explanatory variable. We also want to see whether $hours^{*2}$ has any importance for determining the wage offer, so we add $hours^{*2}$ as an additional explanatory variable in the wage equation. The results are shown in Table 1.2.

In Table 1.2, the t-test for $\hat{v}_2$ is significant in column (2), indicating that sample selection bias is present. Comparing the coefficient on educ in columns (1) and (2), that is, without and with $\hat{v}_2$ in the structural equation, the coefficient decreases from .0777 to .0594. The return to education appears to be overestimated if the sample selection bias is not corrected. The coefficient on $hours^*$ also changes from .0036 to .0121 once we add the constructed variable $\hat{v}_2$ to the structural equation. This means that if we treat $hours^*$ as an exogenous variable, the return to working hours is apparently underestimated. In column (3) we add $hours^{*2}$ as an additional regressor in our two-step procedure to see if it is important for determining the wage, as found by Moffitt. However, it is not significant in our estimation.

Since Moffitt used wage instead of log(wage) as the dependent variable in the wage equation, we re-estimate our structural equation with wage as the dependent variable in order to make a direct comparison with Moffitt's maximum likelihood results. The results are in Table 1.3.

In Table 1.3, column (1) is our two-step procedure using wage as the dependent variable. The t-statistic for $\hat{v}_2$ is still significant, as in Table 1.2, which indicates a sample selection problem. The effect of education in this case is 16.06 cents. We also find that the estimated coefficient on $hours^*$ is .0243 and is significant. In column (2) we add $hours^{*2}$ as an additional regressor in the two-step procedure. Just as in Table 1.2, the coefficient on $hours^{*2}$ is not significant. The estimated coefficient on $hours^*$ is also not significant in column (2) after we add $hours^{*2}$. This may come from the multicollinearity between $hours^*$ and $hours^{*2}$. Since Moffitt did not account for the sample selection problem in his study, we put the OLS estimates of the wage equation without $\hat{v}_2$ in column (3) to compare with Moffitt's estimates in column (4). We find that the estimation results are quite similar except for the coefficients on $hours^*$ and $hours^{*2}$. The effect of education is 20 cents in both cases, compared with around 16 cents if the sample selection/endogeneity problem is taken care of.

Returning to Table 1.2 for our original setting, we find, just as Moffitt did, a significant effect of hours of work on the wage. After correcting for the sample selection bias, the effect is more significant.

Moffitt (1984) found a quadratic relationship between wage and hours. As discussed earlier, we can use the same framework to test for $hours^{*2}$ by adding it to the structural equation. Although we put $hours^*$ on the right-hand side of the wage equation, the coefficient on $\hat{v}_2$, which reflects the correlation between $u_1$ and $v_2$, is still significant. This is different from what Moffitt found. In our test, the sample selection bias does exist. Moffitt also found that $hours^{*2}$ has a significant negative effect on wage. After putting $hours^{*2}$ on the right-hand side of the structural equation, we did not find that effect to be significant. In Table 1.3, the t-statistic for $hours^{*2}$ is -.25, which is far from significant. Thus, while the point estimate suggests a decreasing return to hours, as Moffitt found, the effect is not significant.

III.2. Estimating the Labor Supply Equation

After estimating the structural wage equation, we can also estimate the labor supply equation. Since $hours^{*2}$ was not significant, we include only $hours^*$ in the structural equation as an explanatory variable, rather than both $hours^*$ and $hours^{*2}$. The procedure is the one derived in section II. We report the Tobit estimates for the labor supply equation in Table 1.4.

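To derive the wage elasticity of labor supply, let $\alpha_0\log(wage) + z_2\delta = x\theta$. According to equations (1.13) and (1.14), the partial effects of wage on labor supply with and without the condition hours > 0 are

$$ \frac{\partial E(hours \mid \log(wage), z_2; hours > 0)}{\partial wage} = \frac{\alpha_0}{wage}\left[1 - \lambda(x\theta/\sigma)\{x\theta/\sigma + \lambda(x\theta/\sigma)\}\right] \tag{1.24} $$

$$ \frac{\partial E(hours \mid \log(wage), z_2)}{\partial wage} = \frac{\alpha_0}{\sigma\cdot wage}\,\phi(\cdot)\{x\theta + \sigma\lambda(\cdot)\} + \Phi(\cdot)\,\frac{\alpha_0}{wage}\left[1 - \lambda(\cdot)\{x\theta/\sigma + \lambda(\cdot)\}\right] \tag{1.25} $$

where $\phi(\cdot)$, $\Phi(\cdot)$, and $\lambda(\cdot)$ are evaluated at $x\theta/\sigma$.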

The corresponding elasticities are:

1. The wage elasticity conditional on working hours > 0 (we call it elas1):

$$ elas1 = \frac{\partial E(hours \mid \log(wage), z_2; hours > 0)}{\partial wage}\cdot\frac{wage}{x\theta + \sigma\lambda(\cdot)} \tag{1.26} $$

2. The wage elasticity without conditioning on working hours > 0 (we call it elas2):

$$ elas2 = \frac{\partial E(hours \mid \log(wage), z_2)}{\partial wage}\cdot\frac{wage}{\Phi(\cdot)\,x\theta + \sigma\phi(\cdot)} \tag{1.27} $$

Plugging in the mean values of the estimates, (1.24) equals 3.71 and (1.25) equals 5.31. We can also calculate the wage elasticity of labor supply for these two cases. In the case of hours > 0, the elasticity is 0.26. This is quite close to the nonlinear constraint case in Moffitt's study, which is 0.21. If we do not condition on hours > 0, the elasticity is 0.686.

As Moffitt found, Table 1.4 shows that the other coefficients in the labor supply equation are generally of the expected sign but usually of low significance. Married women work less, as do older women and those who have more children.

IV. An Extension of the Model

It is natural to extend our model by relaxing some assumptions to obtain a more robust estimator. Keeping all assumptions in Assumptions 1.1 except (iv), we instead assume $E(u_1 \mid v_2) = \xi_1 v_2 + \xi_2(v_2^2 - \tau_2^2)$. The conditional expectation of the structural equation becomes

$$ E(y_1 \mid z, v_2) = -\xi_2\tau_2^2 + \beta\,y_2^* + z_1\gamma + \xi_1 v_2 + \xi_2 v_2^2 \tag{1.28} $$

Since we still assume the normality of $v_2$, we can get a consistent estimator of $\pi_2$ from a Tobit equation. So $\hat{v}_2$ and $\hat{v}_2^2$ can be used in the procedure we described in section II. The test for sample selection bias/endogeneity is now a test of the joint null $H_0: \xi_1 = 0,\ \xi_2 = 0$. The results are shown in Table 1.5.

In Table 1.5, the estimated return to education is essentially unchanged when $\hat{v}_2^2$ is added to the structural equation as an explanatory variable. The change in the coefficient on $hours^*$ is also very small: it moves from .0121 to .0122. This is not surprising because $\hat{v}_2^2$ is not very significant using the standard t-statistic. We did not find any significant change in the structural equation estimates after allowing for a quadratic in $E(u_1 \mid v_2)$.

V. Models with Additional Endogenous Explanatory Variables

We can add additional endogenous variables to the model in section II. For simplicity, we consider the case of one additional endogenous explanatory variable. Consider the model:

$$ y_1 = \beta_1 y_2^* + \beta_2 y_3 + z_1\gamma + u_1 \tag{1.29} $$
$$ y_2^* = z\pi_2 + v_2 \tag{1.30} $$
$$ y_3 = z\pi_3 + v_3 \tag{1.31} $$

Here $y_1$ is just as before and $y_2 = \max(0, z\pi_2 + v_2)$ is the selection equation. $y_3$ is an endogenous explanatory variable that can be a binary variable, a count variable, or contain both continuous and discrete characteristics. For simplicity, we assume $y_3$ is a scalar. The goal again is to estimate the structural parameters in (1.29).
Assumptions 1.2

(i) $(z, y_2)$ is always observed and $(y_1, y_3)$ is observed when $y_2 > 0$.
(ii) $(u_1, v_2)$ is independent of $z$.
(iii) $v_2 \sim Normal(0, \tau_2^2)$.
(iv) $E(u_1 \mid v_2) = \xi v_2$.
(v) $z\pi_3 = z_1\pi_{31} + z_2\pi_{32}$, $\pi_{32} \neq 0$.

There are two cases of interest in the setup of (1.29) to (1.31). The first is when $y_3$ is always observed. For example, $y_3$ is education and $y_1$ is log(wage), and education is observed regardless of whether the wage is observed. In the second case, $y_3$ is observed only along with $y_1$. An example is when $y_1$ is log(wage) and $y_3$ is a measure of nonwage benefits: we do not observe nonwage benefits when we do not observe the wage.

From the assumptions made above, we can write

$$ y_1 = \beta_1 y_2^* + \beta_2 y_3 + z_1\gamma + \xi v_2 + e_1 \tag{1.32} $$

where $e_1 \equiv u_1 - E(u_1 \mid v_2)$ and $E(e_1 \mid z, v_2) = 0$. Since $e_1$ is not correlated with $y_2^*$, $y_2^*$ is exogenous in (1.32). If $v_2$ were observed, we could estimate (1.32) by 2SLS on the selected sample using instruments $(z, v_2)$. As before, we can estimate $v_2$ when $y_2 > 0$ since $\pi_2$ can be consistently estimated by Tobit of $y_2$ on $z$.

PROCEDURE 1.2

(i) Obtain $\hat{\pi}_2$ from Tobit of $y_{i2}$ on $z_i$ using all observations. Obtain the Tobit residuals $\hat{v}_{i2} = y_{i2} - z_i\hat{\pi}_2$ for $y_{i2} > 0$.

(ii) Using the selected subsample, estimate the equation

$$ y_{i1} = \beta_1 y_{i2} + \beta_2 y_{i3} + z_{i1}\gamma + \xi\hat{v}_{i2} + error_i \tag{1.33} $$

by 2SLS, using instruments $(y_{i2}, z_i, \hat{v}_{i2})$.
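Procedure 1.2 can be sketched in Python as follows. This is an illustrative simulation under assumptions made here, not the author's implementation: the design values (true $\beta_1 = 0.5$, $\beta_2 = 0.7$, $\xi = 0.8$), the two excluded instruments `z2a` and `z2b`, and the helper names are hypothetical. Because $y_{i2} = z_i\hat{\pi}_2 + \hat{v}_{i2}$ holds exactly on the selected sample, the sketch leaves $y_{i2}$ out of the instrument matrix to avoid exact collinearity; the projection is unchanged.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_mle(y, X):
    """Standard Tobit (censored at zero) coefficients by maximum likelihood."""
    pos = y > 0
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]
    start = np.append(b0, np.log(np.std(y - X @ b0)))

    def neg_loglik(p):
        b, s = p[:-1], np.exp(p[-1])
        xb = X @ b
        ll = norm.logpdf((y[pos] - xb[pos]) / s).sum() - pos.sum() * np.log(s)
        ll += norm.logcdf(-xb[~pos] / s).sum()
        return -ll / len(y)

    return minimize(neg_loglik, start, method="BFGS").x[:-1]

def tsls(y, X, W):
    """2SLS: project regressors X on instruments W, then regress y on the fit."""
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

def simulate(n=6000, seed=0):
    """Hypothetical design with beta1 = 0.5, beta2 = 0.7, xi = 0.8."""
    rng = np.random.default_rng(seed)
    z1, z2a, z2b = rng.normal(size=(3, n))
    v2, w3 = rng.normal(size=(2, n))
    v3 = 0.5 * v2 + w3
    u1 = 0.8 * v2 + 0.6 * w3 + 0.5 * rng.normal(size=n)   # E(u1 | v2) = 0.8 * v2
    y2_star = 0.5 + z1 + z2a + v2
    y2 = np.maximum(0.0, y2_star)                          # Tobit selection variable
    y3 = z1 + z2b + v3                                     # endogenous regressor
    y1 = 1.0 + 0.5 * y2_star + 0.7 * y3 + z1 + u1
    Z = np.column_stack([np.ones(n), z1, z2a, z2b])
    return y1, y2, y3, z1, Z

def procedure_1_2(y1, y2, y3, z1, Z):
    pi2_hat = tobit_mle(y2, Z)                   # step (i): Tobit on all observations
    s = y2 > 0
    v2_hat = y2[s] - Z[s] @ pi2_hat
    X = np.column_stack([np.ones(s.sum()), y2[s], y3[s], z1[s], v2_hat])
    W = np.column_stack([Z[s], v2_hat])          # y2 already lies in the span of (Z, v2_hat)
    b_2sls = tsls(y1[s], X, W)                   # step (ii)
    b_ols = np.linalg.lstsq(X, y1[s], rcond=None)[0]   # for comparison only
    return b_2sls, b_ols
```

In this design the OLS coefficient on `y3` (third entry) is pushed upward by the correlation between $v_3$ and $u_1$, while the 2SLS coefficient should be close to 0.7.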

Allowing for $y_3$ means that the structural equation can contain standard forms of endogeneity apart from sample selection. For example, when $y_1$ is the wage offer and $y_2$ is hours worked, $y_3$ might be education. Thus, we allow for self-selection in the model as well as accounting for the sample selection problem. We discuss self-selection in the next chapter.

VI. Conclusion

This chapter has derived a multi-step approach to estimating a Type III Tobit model where the selection variable appears as an endogenous explanatory variable. The multi-step approach has the advantages of being easy to compute and being more robust than MLE. It also provides a simple t-test for sample selection bias at the same time. Computing the asymptotic variance matrices when sample selection is present requires general methods for two-step estimation, as in Newey and McFadden (1994, Handbook of Econometrics).

When we apply the approach to the Moffitt (1984) data, we find evidence of sample selection bias, contrary to what Moffitt finds. After correcting the bias, the average return to education goes from 7.77% to 5.94%, which is similar to earlier findings. It appears that the average return will be overestimated if we do not account for sample selection. The coefficient on $hours^*$ also changes from .36% to 1.21% once we account for the endogeneity of hours in the wage offer equation.

As for labor supply, after our three-step estimation we found that the wage elasticity of labor supply is far smaller when the estimation is conditional on working hours > 0 than when it is not, something that has been reported in earlier literature.

We also showed how to relax the assumption that the structural error has a conditional expectation linear in the selection error. The model can be extended to handle other endogenous explanatory variables in the structural equation. The next chapter considers a more general model with endogenous explanatory variables and sample selection.

 

Table 1.1

Descriptive Statistics for NLS Data Set

Variable    Observations   Mean     Standard    Minimum   Maximum
                                    Deviation
hours       610            17.1     19.06       0         60
wage        297            2.50     1.06        .44       6.62
nowinc      610            12.50    9.32        0         48.08
marital     610            .80      .40         0         1
age         610            44.37    3.35        35        49
nonwhite    610            .26      .44         0         1
famber      610            4.31     2.06        1         11
chiles6     610            .10      .39         0         4
chigre6     610            1.93     1.74        0         7
educ        610            10.32    2.77        2         18
labforce    610            .41      .84         .002      4.59
indfr1      610            .22      .02         .18       .25
indfr2      610            .18      .02         .15       .23

Note: hours is weekly hours worked; wage is the hourly wage rate; nowinc is the flow of asset income; marital is marital status (=1 if married); nonwhite is a binary variable equal to one if nonwhite; famber is the number of family members; chiles6 is the number of kids less than 6 years of age; chigre6 is the number of kids greater than 6 years of age; educ is years of education; labforce is the size of the labor force (in millions); indfr1 is the employment fraction in manufacturing in the census region of residence; indfr2 is the employment fraction in government in the census region of residence.

 

Table 1.2

The Estimation of Different Specifications

              (1)          (2)                (3)
              OLS          OLS + \hat{v}_2    (2) + hours*^2

Constant      -.9093       -.5153             -.0625
              (.5789)      (.7027)            (.7083)
nonwhite      -.1325*      -.1626*            -.1622*
              (.0485)      (.0503)            (.0504)
educ          .0777*       .0594*             .0594*
              (.0078)      (.0115)            (.0116)
labforce      .0923*       .1013*             .1012*
              (.0264)      (.0266)            (.0267)
indfr1        -1.2751*     -.3620             -.3534
              (.6968)      (.8149)            (.8186)
indfr2        -4.2511*     -1.9894            -1.9897
              (1.7563)     (2.0440)           (2.0476)
age           .0015        .0104              .0103
              (.0061)      (.0074)            (.0074)
hours*        .0036*       .0121*             .0110
              (.0019)      (.0044)            (.0090)
hours*^2      -            -                  .00002
                                              (.00014)
\hat{v}_2     -            -.0091*            -.0093*
                           (.0044)            (.0044)
R-Square      .3612        .3711              .3712

Notes: Dependent variable is log hourly wage.
educ represents years of education.
exper represents experience.
\hat{v}_2 is the Tobit residual.
Standard errors in parentheses.
* : significant at the 10% level.

Table 1.3

Two-step and Moffitt's MLE estimation

              (1)               (2)               (3)        (4)
              Two-step          (1) + hours*^2    OLS        Moffitt's
              procedure with                                 estimation
              wage as
              dependent
              variable

Constant      .4269             .3757             -2.04      -1.80
              (1.7936)          (1.8078)          (1.49)     (1.54)
nonwhite      -.1655            -.1676            -.11       -.10
              (.1283)           (.1287)           (.12)      (.12)
educ          .1607*            .1606*            .20*       .20*
              (.0295)           (.0295)           (.02)      (.02)
labforce      .1910*            .1916*            .17*       .17*
              (.0679)           (.0681)           (.06)      (.08)
indfr1        -1.185            -1.2251           -3.02*     -3.59*
              (2.0801)          (2.0894)          (1.78)     (2.05)
indfr2        -5.0859           -5.0847           -9.51*     -10.27*
              (5.2176)          (5.2261)          (4.48)     (5.11)
age           .0285             .0284             .01        .01
              (.0188)           (.0189)           (.02)      (.02)
hours*        .0243*            .0294             .014       .053*
              (.0113)           (.0231)           (.02)      (.02)
hours*^2      -                 -.00009           -.00011    -.00078*
                                (.00035)          (.0003)    (.0002)
\hat{v}_2     -.0182*           -.0181*           -          -
              (.0111)           (.0111)
R-Square      .3243             .3245             .3182      .1694

Notes: Dependent variable is hourly wage.
educ represents years of education.
exper represents experience.
\hat{v}_2 is the Tobit residual.
Standard errors in parentheses.
* : significant at the 10% level.

Table 1.4

Tobit Estimation for the Labor Supply Equation
(dependent variable is hours)

              Coefficient   Standard error   t value
Constant      49.05         23.44            2.09
lwagehat      19.30         6.30             3.06
nowinc        .009          .17              .05
marital       -5.78         4.14             -1.40
age           -1.20         .49              -2.47
race          5.12          4.05             1.27
famber        -.23          1.73             -.13
chiles6       -4.06         4.64             -.88
chigre6       -.73          1.91             -.38

Number of obs = 610
chi2(8) = 24.05
Prob > chi2 = 0.0022
Log Likelihood = -1694.5467
Obs. summary: 313 left-censored observations at hours<=0
              297 uncensored observations

Note: lwagehat is estimated log(wage). nowinc is the flow of asset income. marital is marital status (=1 if married). race is a binary variable equal to one if nonwhite. famber represents the number of family members. chiles6 represents number of kids less than 6 years of age. chigre6 represents number of kids greater than 6 years of age.

Table 1.5

Two-Step Estimation with \hat{v}_2^2 added as an explanatory variable in the structural equation

              Coefficient   Standard error   t value
constant      -.55          .41              -1.347
race          -.15          .05              -3.09
educ          .06           .01              5.36
labforce      .10           .03              3.15
indfr1        -.36          .82              -.45
indfr2        -2.06         2.05             -1.01
age           .01           .007             1.50
hours*        .012          .004             2.93
\hat{v}_2     -.014         .007             -2.03
\hat{v}_2^2   .00009        .0001            .91

Number of obs =
R-squared

Notes: Dependent variable is log hourly wage.
educ represents years of education.
exper represents experience.
\hat{v}_2 is the Tobit residual.

Chapter 2
SELECTION CORRECTION WHEN BOTH SELF-SELECTION BIASES
AND SAMPLE SELECTION BIASES ARE PRESENT IN A RANDOM

COEFFICIENT MODEL

I. Introduction

In chapter one we discussed sample selection correction procedures in models with endogenous variables and a Tobit selection equation. In this chapter, we focus on the situation where there is both self-selection and sample selection. A large literature has focused either on corrections for self-selection bias or on corrections for sample selection bias. However, none of these studies corrects for both sources of bias at the same time.

The problem of self-selection often occurs when evaluating various programs such as job training, welfare, or the returns to education using nonexperimental data. Generally, the issue is that individuals decide whether to participate in a program or to receive some treatment. If the participation decision is related to factors that affect the outcome variable, ignoring the determinants of this decision leads to biased estimates of program impacts. As an example, suppose we are interested in evaluating the benefits of social programs. A common specification is as follows:

$$ y_1 = z_1\beta_1 + \alpha y_2 + u_1, $$

where $y_1$ is an outcome such as earnings, $z_1$ is a vector of exogenous characteristics, and $y_2$ is the program variable, which can be a dummy or a continuous variable. If we are interested in the impact of the program on employed persons, it is sufficient to obtain a random sample of people who are currently working. If we are interested in the impact of the program for the population of eligible persons, but we can only obtain a sample of people who work, the problem of sample selection bias arises. In addition, the variable $y_2$ generally cannot be treated as exogenous if the decision of an individual to attend the program is based on individual self-selection. If we were to treat $y_2$ as exogenous, self-selection bias would also arise. If the sample we obtain represents the population of interest, there is no sample selection problem and only the self-selection problem remains. However, if the sample does not represent the population of interest, we have to address both the sample selection and the self-selection problems.

An early discussion of the self-selection problem was that of Roy (1951), who
discussed the problem of individuals choosing between two professions, hunting and
ﬁshing, based on their productivity in each. The observed distribution of incomes of
hunters and ﬁshermen was determined by these choices. The result showed that the
individuals with better skills go into the profession with higher variance in earnings.

The econometrics of the sample selection problem began with the studies by Gronau (1974), Lewis (1974), and Heckman (1974). Heckman (1976) suggested a two-stage estimation method to address this problem. Subsequently, Lee and Trost (1978) analyzed the self-selection problem in the case of housing demand. About the same time, Lee (1978) corrected for sample selection bias when estimating the return to union membership, and Willis and Rosen (1979) did the same when estimating the rate of return to a college education.

In the foregoing studies, the self-selection problem involved a discrete choice between two alternatives. By contrast, Garen (1984) studied the self-selection problem with a continuous choice variable in the case of the returns to schooling, where unobserved heterogeneity interacts with a continuous endogenous² explanatory variable. In Garen's application, the effect of schooling on wage may depend on the level of ability, and self-selection may cause schooling and ability to be correlated. Garen's model will be discussed in section II.

Angrist and Krueger (1991) also studied the case of the returns to schooling. They explored the self-selection problem in the context of how compulsory school attendance laws affect schooling and earnings. They argued that season of birth is related to educational attainment because of policies regarding the age at which children may first start school and compulsory school attendance laws. People born in the beginning of the year start school at an older age and can drop out after completing less schooling than people born near the end of the year. They use quarter of birth as an instrument for education and apply two-stage least squares (2SLS) to estimate the model. In this chapter, we examine the case that combines both self-selection and sample selection biases with a Tobit selection equation. We also compare the results obtained under different estimation methods: OLS, 2SLS, the method proposed by Garen, and the method we propose.

The remainder of the chapter is organized as follows. Section II develops the basic model and examines the implications of different combinations of self-selection and sample selection. Section III develops a model that extends Garen's procedure to handle both self-selection and sample selection. In section IV we use two different data sets from the labor economics literature to verify the models we proposed in section III. Section V shows consistency of the new procedure we used in section IV. Section VI is the conclusion.

² Endogenous variable here means any explanatory variable correlated with unobservables.

II. A Random Coefficient Model

Garen (1984) considered a model where an endogenous explanatory variable interacts with an unobservable. This kind of model is also called a “random coefficient” model. Specifically,

$$ y_1 = z_1\beta_1 + \alpha_1 y_2 + \gamma_1 a + \gamma_2 y_2 a + u_1 \tag{2.1} $$

where $z_1$ represents the exogenous variables and a constant, and $y_2$ is the variable that is correlated with the unobserved variable $a$. As we will see, the assumptions we impose are most reasonable when $y_2$ is (roughly) continuous. For identification reasons, we need to assume $z$ (which represents all of the exogenous variables) contains at least one element not in $z_1$. The variable $u_1$ is the structural error, where we assume that $E(u_1 \mid z, y_2, a) = 0$. Therefore, (2.1) represents a conditional expectation:

$$ E(y_1 \mid z, y_2, a) = z_1\beta_1 + \alpha_1 y_2 + \gamma_1 a + \gamma_2 y_2 a \tag{2.2} $$

The partial effect of $y_2$ on $E(y_1 \mid z, y_2, a)$ depends on $a$:

$$ \frac{\partial E(y_1 \mid z, y_2, a)}{\partial y_2} = \alpha_1 + \gamma_2 a. $$

Without loss of generality we can assume that $E(a) = 0$. Therefore, the average effect in the population, sometimes called the average treatment effect, is $\alpha_1$. This is the primary parameter of interest.

Garen argued that since $y_2$ is generally correlated with $z$, $y_2 a$ is generally correlated with $z$ even if $E(a \mid z) = 0$. So, if we estimate the equation

$$ y_1 = z_1\beta_1 + \alpha_1 y_2 + e_1 $$

by 2SLS using instruments $z$, the estimates of $\beta_1$ and $\alpha_1$ will not be consistent. Garen suggested an approach to correct the self-selection bias when the model contains an interaction between $y_2$ and the unobservable. The reduced form for $y_2$ is:

$$ y_2 = z\delta_2 + v_2. \tag{2.3} $$

Garen assumed $u_1 \sim Normal(0, \sigma_1^2)$, $a \sim Normal(0, \sigma_a^2)$, and $Cov(u_1, a) = \sigma_{1a}$. Also, $E(a \mid z, v_2)$ does not depend on $z$ and is linear in $v_2$:

$$ E(a \mid z, v_2) = E(a \mid v_2) = \rho_1 v_2. \tag{2.4} $$

Under (2.4), the expected value of (2.2) conditional on $(z, v_2)$ is:

$$ E(y_1 \mid z, v_2) = z_1\beta_1 + \alpha_1 y_2 + \gamma_1 E(a \mid z, v_2) + \gamma_2 y_2 E(a \mid z, v_2) + E(u_1 \mid z, v_2) $$
$$ = z_1\beta_1 + \alpha_1 y_2 + \gamma_1\rho_1 v_2 + \gamma_2\rho_1 y_2 v_2 $$

This suggests a natural two-step procedure proposed by Garen (1984):

1. Regress $y_2$ on $z$ and save the residuals $\hat{v}_2$.
2. Regress $y_1$ on $z_1$, $y_2$, $\hat{v}_2$, and $y_2\hat{v}_2$.

By the standard theory of generated regressors, the second regression produces consistent estimators of $\beta_1$, $\alpha_1$, $\gamma_1\rho_1$, and $\gamma_2\rho_1$.
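The two regressions in Garen's procedure can be sketched in Python with NumPy alone. This is an illustrative simulation, not Garen's code: the design values (true $\alpha_1 = 0.6$, coefficient 0.5 on the residual and 0.2 on its interaction with $y_2$) and the helper names `simulate` and `garen_two_step` are assumptions introduced here.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n=20000, seed=0):
    """Hypothetical design: alpha1 = 0.6, gamma1 = 1.0, gamma2 = 0.4, rho = 0.5."""
    rng = np.random.default_rng(seed)
    z1, z2, v2, w, e = rng.normal(size=(5, n))
    a = 0.5 * v2 + 0.5 * w                  # E(a | z, v2) = 0.5 * v2, linear as in (2.4)
    y2 = 1.0 + z1 + z2 + v2                 # reduced form; z2 is excluded from (2.1)
    y1 = 1.0 + z1 + 0.6 * y2 + 1.0 * a + 0.4 * y2 * a + 0.5 * e
    return y1, y2, z1, z2

def garen_two_step(y1, y2, z1, z2):
    n = len(y1)
    Z = np.column_stack([np.ones(n), z1, z2])
    v2_hat = y2 - Z @ ols(y2, Z)                                    # step 1: residuals
    X = np.column_stack([np.ones(n), z1, y2, v2_hat, y2 * v2_hat])
    # step 2: coefficients ordered [const, z1, alpha1, gamma1*rho, gamma2*rho]
    return ols(y1, X)
```

With these values, `garen_two_step(*simulate())` should give a coefficient on `y2` near 0.6, whereas a plain OLS regression of `y1` on a constant, `z1`, and `y2` is biased upward because `a` is correlated with `y2`.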

In a recent paper, Wooldridge (1997) showed that Garen's claim about the inconsistency of 2SLS is not correct. Wooldridge showed that 2SLS does produce consistent estimators when we ignore the interaction term in Garen's model, under assumptions weaker than those Garen imposed. We briefly describe the reasoning as follows.

Rewrite equation (2.2) as

$$ y_1 = \beta_0 + z_1\beta_1 + \alpha_1 y_2 + a y_2 + u_1 \tag{2.5} $$

and assume $E(u_1 \mid z) = 0$. Because equation (2.5) contains the term $\alpha_1 y_2$, we can assume $E(a) = 0$ without loss of generality. For $y_2$, we can write a linear equation in terms of all exogenous variables as in (2.3):

$$ y_2 = \delta_0 + z\delta_2 + v_2 = \delta_0 + z_1\delta_{21} + z_2\delta_{22} + v_2 \tag{2.6} $$

If $v_2$ is correlated with $a$, then $y_2$ is endogenous and the expected value of the composite error is not zero:

$$ E(a y_2 + u_1) = E(a y_2) = E(a v_2) \neq 0 \tag{2.7} $$

Because of this, Garen argued that the 2SLS estimator of $\alpha_1$ would be inconsistent. However, 2SLS only causes the intercept term $\beta_0$ to be inconsistently estimated; the other parameters are consistently estimated. To see this, we first assume $E(u_1 \mid z) = 0$, as we already did, and we assume that $v_2$ satisfies the zero conditional mean and homoskedasticity assumptions from standard linear regression analysis. That is, $E(v_2 \mid z) = 0$ and $E(v_2^2 \mid z) = \sigma_2^2$. We also need to assume a relationship between $a$ and $v_2$ as before, $E(a \mid z, v_2) = E(a \mid v_2) = \rho_1 v_2$. This means $a$ is conditional mean independent of $z$ given $v_2$, and $E(a \mid v_2)$ is linear. We need the standard identification condition that $\delta_{22} \neq 0$ in (2.6). Now, since

$$ E(a y_2 \mid z) = E[E(a y_2 \mid z, v_2) \mid z] = E[y_2 E(a \mid z, v_2) \mid z] = E(\rho_1 y_2 v_2 \mid z) = \rho_1 E(v_2^2 \mid z) = \rho_1\sigma_2^2 = E(a y_2), $$

we have

$$ y_1 = (\beta_0 + \rho_1\sigma_2^2) + z_1\beta_1 + \alpha_1 y_2 + r_1, \tag{2.8} $$

where $r_1 = [a y_2 - E(a y_2 \mid z)] + u_1$, so $E(r_1 \mid z) = 0$.

Applying 2SLS to (2.8) using a random sample of size $n$, where $z_i$ is the vector of instruments for $y_{i2}$, is consistent and $\sqrt{n}$-asymptotically normal for $\beta_1$ and $\alpha_1$ under standard finite moment conditions. For more details, see Wooldridge (1997).

One point should be noted: although Garen's claim about the inconsistency of 2SLS is incorrect, that does not mean the estimation method he proposed produces inconsistent estimates. Both 2SLS and Garen's method should work under similar assumptions.
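The argument leading to (2.8) can be illustrated with a small simulation in Python. The sketch below is hypothetical: the design values (true $\beta_0 = 2$, $\alpha_1 = 0.6$, $\rho_1 = 0.5$, $\sigma_2^2 = 1$) and the helper names are assumptions made here. Under this design the 2SLS slope on $y_2$ should approach 0.6, while the intercept should approach $\beta_0 + \rho_1\sigma_2^2 = 2.5$ rather than 2.

```python
import numpy as np

def tsls(y, X, W):
    """Two-stage least squares: project X on instruments W, then regress y on the fit."""
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

def simulate(n=20000, seed=1):
    """Hypothetical design matching (2.5)-(2.6): beta0 = 2, beta1 = 1, alpha1 = 0.6."""
    rng = np.random.default_rng(seed)
    z1, z2, v2, w, e = rng.normal(size=(5, n))   # v2: mean zero, homoskedastic given z
    a = 0.5 * v2 + 0.5 * w                       # E(a | z, v2) = 0.5 * v2
    y2 = 1.0 + z1 + z2 + v2                      # z2 is the excluded instrument
    y1 = 2.0 + z1 + 0.6 * y2 + a * y2 + 0.5 * e  # random coefficient on y2
    return y1, y2, z1, z2

def fit_2sls(y1, y2, z1, z2):
    n = len(y1)
    X = np.column_stack([np.ones(n), z1, y2])    # regressors in (2.8)
    W = np.column_stack([np.ones(n), z1, z2])    # instruments: all exogenous variables
    return tsls(y1, X, W)                        # [intercept, beta1, alpha1]
```

The shifted intercept is the only casualty of ignoring the interaction term, which matches the reasoning above.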

III. Garen's Model with Sample Selection

Garen (1984) assumed that a random sample from the underlying population is available. This may not be the case in his example, which is to estimate the average return to education in a wage offer equation. As we discussed in chapter one, using only observations on people who are working can cause a sample selection bias.

In this section we study estimation of Garen's model with a Tobit sample selection mechanism. It can be written as:

$$ y_1 = z_1\beta_1 + \alpha_1 y_2 + \delta_1 y_2 a_1 + u_1 \tag{2.9} $$
$$ y_2 = z\pi_2 + v_2 = z_1\pi_{21} + z_2\pi_{22} + v_2 \tag{2.10} $$
$$ y_3 = \max(0,\ z\pi_3 + v_3), \tag{2.11} $$

where $y_1$ is observed when $y_3 > 0$, $z$ represents the exogenous variables and a constant, and $y_2$ is the endogenous variable that is possibly correlated with the unobserved variable $a_1$. The assumptions below are most suited to the case when $y_2$ is continuous. For identification reasons, we need to assume $z$ (which represents all of the exogenous variables) contains at least one element not in $z_1$.

We have added a Tobit selection equation to Garen's model. An example of this is when $y_1$ is the log of the hourly wage, $y_2$ is years of education, and $y_3$ is weekly or annual hours of work. More details about this example will be given in section IV.
Assumptions 2.1

(i) $(z, y_2, y_3)$ is always observed in the population but $y_1$ is observed only when $y_3 > 0$.
(ii) $(u_1, v_2, v_3, a_1)$ is zero-mean and independent of $z$.
(iii) $v_3 \sim Normal(0, \tau_3^2)$.
(iv) $E(a_1 \mid z, v_2, v_3) = \rho_1 v_2 + \rho_2 v_3$.
(v) $E(u_1 \mid z, v_2, v_3) = \xi_1 v_2 + \xi_2 v_3$.
(vi) The rank condition $\pi_{22} \neq 0$.

Assumption (i) says that $y_1$ is the only variable unobserved due to a possible sample selection problem. Assumptions (ii) and (iii) are fairly standard in this context, but they are restrictive. Assumption (ii) means that the disturbance term in the linear reduced form for $y_2$ (which is $v_2$) is independent of $z$; this restricts the kinds of endogenous variables we can allow. If $y_2$ is a binary or discrete variable, then the assumption is not reasonable. This could be relaxed somewhat, but linearity is still needed.

Assumption (iii) allows $u_1$, $v_2$, and $v_3$ to be arbitrarily correlated, so that endogeneity and sample selection bias can both be present. Assumption (iv) implies that the conditional expectation involving the unobservable is linear. Assumption (v) relaxes the usual joint normality of $(u_1, v_2, v_3)$ by requiring only linearity of a conditional expectation. Assumption (vi) is the standard identification condition: we need a good instrumental variable for $y_2$.

The model is a Type III Tobit model in Amemiya's (1985) taxonomy, but with an endogenous explanatory variable that interacts with an unobservable.

To derive an estimating equation, write

$$ E(y_1 \mid z, v_2, v_3) = z_1\beta_1 + \alpha_1 y_2 + \delta_1 y_2 E(a_1 \mid z, v_2, v_3) + E(u_1 \mid z, v_2, v_3) $$
$$ = z_1\beta_1 + \alpha_1 y_2 + \delta_1 y_2(\rho_1 v_2 + \rho_2 v_3) + \xi_1 v_2 + \xi_2 v_3 $$

by assumptions (ii), (iv), and (v). So we have

$$ E(y_1 \mid z, v_2, v_3) = z_1\beta_1 + \alpha_1 y_2 + \kappa_1 y_2 v_2 + \kappa_2 y_2 v_3 + \xi_1 v_2 + \xi_2 v_3 $$

where $\kappa_1 \equiv \delta_1\rho_1$ and $\kappa_2 \equiv \delta_1\rho_2$. From the discussion in section II of chapter one, we can select the sample on the basis of $(z, v_3)$.

Since $v_2$ and $v_3$ are not observed, we estimate $v_2$ by running OLS on the reduced form (2.10), and we estimate $v_3$ when $y_3 > 0$, since $\pi_3$ can be consistently estimated by Tobit of $y_3$ on $z$. The procedure is:

41

PROCEDURE 2.1

(i) Run OLS of y, on 2 to get 7%,. Obtain the OLS residuals 9, = y, - 272,.

(ii) Obtain if, ﬁom Tobit of y3 on 2 using all observations. Obtain the Tobit
residuals 9, = y3 —27?, for y3 > 0.

(iii) Use OLS on the selected subsample for which we observe y, to estimate the
equation
y, = 2,6, +a,y, +rc,y,9, + rc,y,93 +§,9, +§,93 +errorterm.
We can test for and correct the potential self-selection and sample selection biases at
the same time by Procedure 2.1. An F-test of the joint significance of $y_2\hat v_2$, $y_2\hat v_3$, $\hat v_2$, and
$\hat v_3$ tests for endogeneity of $y_2$ or sample selection. If either $\hat\kappa_2$ or $\hat\xi_2$ is significant, there is
a self-selection problem. If either $\hat\kappa_3$ or $\hat\xi_3$ is significant, there is a sample selection
problem. If we use individual t-statistics or F-statistics for a subset of coefficients, these
should be adjusted for the generated regressors problem. If the coefficients on these
variables are small, the adjustment will not make much of a difference.

Notice that the exogenous explanatory variables $z$ in (2.10) and (2.11) can be the
same: we do not need an exclusion restriction in (2.10) in order for the procedure to work
well. The residuals $\hat v_3$ have separate variation from $z\hat\pi_3$ because of the variation in $y_3$.

IV. Empirical Examples

As in Chapter 1, we use an example from labor economics as the basis for the

empirical applications to follow.


A wage offer and labor supply system is given by

$$ \log(wage^o) = z_1\beta_1 + \alpha_1\, educ + \delta_1\, educ \cdot a_1 + a_1 + u_1 \tag{2.12} $$
$$ educ = z\pi_2 + v_2 \tag{2.13} $$
$$ hours = \max(0, z\pi_3 + v_3) \tag{2.14} $$

where $wage^o$ is the wage offer, observed only when hours > 0, educ is years of education, and
$a_1$ is unobserved ability. Education is correlated with unobserved ability, and there is an
interaction term $educ \cdot a_1$ between education and ability. We assume that hours, $z$, and educ
are always observed. The assumptions are the same as in Section II. We are interested in
estimating the returns to education in this model, and we do so using different estimation
methods, including OLS, 2SLS, the method suggested by Garen, and the method proposed
in Section III.

We first apply these methods to the data set used by Mroz (1987) in his study of
the sensitivity of female labor supply to various assumptions. The data come from the
University of Michigan Panel Study of Income Dynamics (hereafter PSID) for the year
1975. The sample consists of 753 married women, 428 of whom report positive hours
worked during 1975. The exogenous variables include actual labor market experience
(exper), exper², family income other than that earned by the woman, in thousands
(nwifeinc), number of kids less than six (kidslt6), number of kids between six and eighteen
(kidsge6), and age. We impose exclusion restrictions in the wage equation, which includes
only educ, exper, and exper² as explanatory variables. Hours here are measured annually.
The descriptive statistics are summarized in Table 2.1.


We summarize the estimation results for the PSID data set in Table 2.2. In the
OLS estimation, we estimate the model using only the women who work, which
ignores the potential sample selection and self-selection biases. The results are shown in
column (1) of Table 2.2.

The OLS estimate of the return to education using the PSID data set is about
10.75%. However, in our model educ is endogenous and possibly correlated with
unobserved ability, so the OLS estimate may be biased. We therefore check for
self-selection and sample selection biases by examining the other
estimation methods.

To see whether self-selection bias exists, we first run a regression that includes
only $\hat v_2$ as the additional regressor. The results are shown in column (2) of Table 2.2.
The t-statistic for the estimated coefficient on $\hat v_2$ is marginally significant, suggesting
possible self-selection bias. To correct the possible self-selection bias, we use 2SLS and
Garen's method and compare them. The results of the 2SLS and Garen estimations are in
columns (3) and (4) of Table 2.2.

The 2SLS estimate of the return to education is 8.04%. We use parents' education
and husband's education as the instrumental variables for educ. Parents' education levels
are widely believed to be good instruments for educ, although husband's education is
more suspect as an instrument. According to Section II, the estimator is
consistent if only self-selection bias is present. Since $\hat v_2 = educ - z\hat\pi_2$, it can be
shown that the OLS estimates of the return to education obtained when we add $\hat v_2$ as an
additional regressor in an ordinary OLS regression are identical to the 2SLS estimates. Although the
2SLS parameter estimates and those in column (2) are the same, the
procedure of adding $\hat v_2$ as an additional regressor is useful as a test for
self-selection bias.
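The algebraic equivalence between 2SLS and OLS with the first-stage residual added can be checked numerically. The following is a small simulation sketch in plain Python under assumed, made-up data (one instrument, one endogenous regressor); the helper `ols` and all variable names are illustrative and not taken from the original study.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(0)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]   # instrument
a = [random.gauss(0, 1) for _ in range(n)]   # unobserved "ability"
y2 = [1.0 + 0.8 * zi + ai + random.gauss(0, 1) for zi, ai in zip(z, a)]
y1 = [0.5 + 0.1 * y2i + ai + random.gauss(0, 1) for y2i, ai in zip(y2, a)]

# First stage: OLS of y2 on (1, z); keep fitted values and residuals.
pi = ols([[1.0, zi] for zi in z], y2)
y2_fit = [pi[0] + pi[1] * zi for zi in z]
v2_hat = [y2i - f for y2i, f in zip(y2, y2_fit)]

# 2SLS: OLS of y1 on (1, fitted y2).
b_2sls = ols([[1.0, f] for f in y2_fit], y1)
# Control function: OLS of y1 on (1, y2, v2_hat).
b_cf = ols([[1.0, y2i, v] for y2i, v in zip(y2, v2_hat)], y1)
print(b_2sls[1], b_cf[1])  # the two slope estimates on y2
```

The two printed slopes agree up to floating-point error, while the coefficient on the residual in the second regression is what the t-test for self-selection examines.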

We also apply the estimation method proposed by Garen for comparison with
2SLS. The results are shown in column (4), where the return to education is
7.64%. Comparing the estimated returns to education in columns (3)
and (4), we see a difference of only 0.4 percentage points.

To see whether sample selection bias exists in addition to self-selection bias, we
suggest two models. The first includes corrections for both self-selection and sample selection
biases but is linear in the correction terms; we call it Procedure 2.2 hereafter. More details on
Procedure 2.2 are given in Section V. The second is our new method developed above,
which includes the linear terms $\hat v_2$, $\hat v_3$ and the nonlinear terms $educ \cdot \hat v_2$, $educ \cdot \hat v_3$. The
results are shown in columns (5) and (6).
The return to education in column (5) is 7.65%, which is similar to the results from
2SLS and Garen's model. This is reasonable, since the t-value for the estimated
coefficient on $\hat v_3$ is -1.351, which indicates that sample selection bias is not a severe
problem here.

In column (6), the return to education is 7.98%. The result is very close to the
estimates from 2SLS, Garen's model, and the linear combined model proposed above.
From the t-values of the individual coefficient estimates, we reach the same conclusion as before:
there is only a self-selection problem. We also carry out some joint tests for self-selection
and sample selection biases. The results are shown in Table 2.3. They also show
strong evidence of self-selection bias but no sign of sample selection
bias.

We notice that the estimated return to education differs by about 20% between
OLS and the other estimation methods. We conclude that the OLS
estimator of $\alpha_1$ is inconsistent. Also, the 2SLS estimate of $\alpha_1$ is .0804, which is not much
different from Garen's estimate (about 0.076). Sample selection does not appear
to be a problem in this case.

We now apply the same procedure to another, larger data set, which was compiled
from the May 1991 Current Population Survey (CPS) by Daniel Hamermesh. The data set
contains 5634 married women, 3286 of whom report working positive hours during the
week. The hourly wage is weekly earnings divided by weekly hours for women who
worked positive hours; women who did not work have no hourly wage data.
Hours here are measured weekly rather than annually as in the PSID data set. Also, experience
is potential rather than actual experience, and the variables kidslt6 and kidsge6 are binary
indicators. The descriptive statistics are shown in Table 2.4.

As in Table 2.2, we apply all the estimation methods to the CPS data set in
Table 2.5. OLS results are shown in column (1).

In column (1), the estimated return to an additional year of education is 9.90%. To
see whether self-selection bias exists, we add $\hat v_2$ as an additional explanatory variable in
OLS. The results are shown in column (2).

From the t-statistic for the estimated coefficient on $\hat v_2$, it is clear that
self-selection bias exists and needs to be corrected. We find that the return to education is
13.05% in this case and the t-statistic is significant at the 10% level. Although it is widely
believed that the estimated return to education should go down after correcting for
self-selection, a growing body of research supports our empirical result for the CPS data
set that the OLS estimate is biased downward. Angrist and Krueger (1991), in their research
on returns to schooling, reach the same result that OLS estimates are biased downward
relative to instrumental variables estimates.3 David Card (1993) also finds that
instrumental variables estimates exceed OLS estimates and suggests that the result may come
from measurement error in schooling.

The 2SLS results are shown in column (3). As noted above, the 2SLS estimates
are algebraically the same as those obtained by adding $\hat v_2$ as an additional regressor in OLS, so the estimated
coefficients are the same as in the previous column. Since the CPS data set
does not contain other good instruments for educ, we use husband's education
as the only instrumental variable for educ. If no sample selection bias is present, the
estimate should be consistent. For comparison, the estimates from Garen's method are
shown in column (4).

The return to education in Garen's model is 13.07%, which is about the same as the
2SLS estimate. This supports the claim made by Wooldridge that the 2SLS estimator is, under
certain assumptions, consistent even when the endogenous explanatory variable
interacts with unobservables. With the larger data set, the 2SLS and Garen
estimates are closer to each other than in the PSID data set. We also have to check whether
sample selection bias is a problem. To do so, we apply the two methods we
proposed, as we did for the PSID data set. The estimation results that correct
for self-selection and sample selection biases in the linear model are in column (5).

3 Other studies include Ashenfelter and Krueger (1992), Kane and Rouse (1993), and Butcher and
Case (1993). All of the above studies report instrumental variables estimates of the return to schooling that
exceed the ordinary least-squares estimate.

From the t-test of the estimated coefficient on $\hat v_3$, it is clear that there is a sample
selection problem. The t-value is 2.503, which is significant. The return to
education rises slightly from the 2SLS estimate to 13.09%. The estimation that includes the interaction
terms as additional regressors is shown in column (6).

In column (6), the estimated return to education is 13.21%, which is
close to the estimate from the method that includes only $\hat v_2$ and $\hat v_3$ as additional regressors. We
notice from the t-tests that the estimated coefficients on $\hat v_3$ and $educ \cdot \hat v_3$ are both insignificant, which
indicates that multicollinearity is present.

We use F-tests to jointly test the four terms $\hat v_2$, $\hat v_3$, $educ \cdot \hat v_2$, and
$educ \cdot \hat v_3$. The results are shown in Table 2.6. The joint test for the estimated
coefficients on $educ \cdot \hat v_3$ and $\hat v_3$ is marginally significant at the 10% level. This supports our
claim that sample selection bias exists in this case.
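As a rough consistency check on the joint tests of all four correction terms, the F statistic can be approximated from the R-squared values reported in columns (1) and (6) of Tables 2.2 and 2.5. The sketch below assumes the reported tests are ordinary homoskedasticity-based F-tests and that the unrestricted regression has eight parameters (constant, educ, exper, exper², and the four added terms); both are assumptions, not statements from the text.

```python
def f_stat_from_r2(r2_restricted, r2_unrestricted, q, n, k):
    """R-squared form of the F statistic for q exclusion restrictions.

    n is the number of observations and k the number of parameters in the
    unrestricted model; both regressions must share the dependent variable.
    """
    num = (r2_unrestricted - r2_restricted) / q
    den = (1.0 - r2_unrestricted) / (n - k)
    return num / den


# PSID: 428 working women; CPS: 3286 working women.
f_psid = f_stat_from_r2(0.1568, 0.1746, q=4, n=428, k=8)
f_cps = f_stat_from_r2(0.2047, 0.2177, q=4, n=3286, k=8)
print(round(f_psid, 2), round(f_cps, 2))
```

Under these assumptions the values come out near 2.26 and 13.62, close to the 2.27 and 13.64 reported for the four-term joint tests in Tables 2.3 and 2.6; the small gaps are in line with rounding of the published R-squared values.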

It is important to note that the estimated return to education by OLS is about the
same in the two data sets (10.75% for the PSID data set and 9.9% for the CPS data set) before the self-selection and
sample selection problems are controlled for. After correcting for self-selection and
sample selection biases, the return to education decreases to 7.98% in the PSID data set and
increases to 13.21% in the CPS data set. From the estimation results in Tables 2.2 and 2.5,
this OLS bias in the return to education comes mainly from self-selection.

To make an exact comparison with the CPS data set, we redo Procedure 2.1 for the
PSID data set after removing parents' education as instruments for education. This leaves
husband's education as the only instrument for education, just as in the CPS data set. The
results are shown in Table 2.7.

In Table 2.7, neither self-selection nor sample selection appears to be a problem.
The result differs from Table 2.2, where we use parents' education and husband's
education as instruments for education. This indicates that husband's education may
not be a good instrument for education. Comparing the results in Tables 2.5 and 2.7, it
remains the case that the OLS estimates are close but the Procedure 2.1 estimates are very different.

V. The Consistency of Procedure 2.2

When there is only a self-selection problem, both 2SLS and Garen's estimation
method can produce consistent estimates of the return to education under the
assumptions discussed in Section III. We also mentioned in Section III that the OLS
estimates of the return to education are identical to the 2SLS estimates when we add
$\hat v_2$ as an additional regressor in an ordinary OLS regression. This means that both OLS of $y_1$ on
$z_1, y_2, \hat v_2, y_2\hat v_2$ and OLS of $y_1$ on $z_1, y_2, \hat v_2$ yield consistent estimators of $\alpha_1$ (the return to
education).

In Procedure 2.1 we regress $y_1$ on $z_1, y_2, y_2\hat v_2, y_2\hat v_3, \hat v_2, \hat v_3$. It is natural to
suspect that OLS of $y_1$ on $z_1, y_2, \hat v_2, \hat v_3$ can also produce a consistent estimator of $\alpha_1$.
Comparing the results of Procedure 2.2 and Procedure 2.1 in Tables 2.2 and 2.5, we also find
that the estimates of $\alpha_1$ from the two methods are very close.

Our conjecture turns out to be correct, as we now show.

As in Section II, our model is:

$$ y_1 = c_1 + z_1\beta_1 + \alpha_1 y_2 + \delta_1 y_2 a_1 + u_1 \tag{2.9$'$} $$
$$ y_2 = z\pi_2 + v_2 = z_1\pi_{21} + z_2\pi_{22} + v_2 \tag{2.10} $$
$$ y_3 = \max(0, z\pi_3 + v_3) \tag{2.11} $$

where $c_1$ in (2.9)' is the intercept. We also need certain assumptions:

Assumptions 2.2

(i) $E(u_1 \mid z) = 0$

(ii) $E(v_2 \mid z) = 0$, $E(v_2^2 \mid z) = \sigma_2^2$

(iii) $E(v_3 \mid z) = 0$, $Cov(v_2, v_3) = 0$

(iv) $E(a_1 \mid z, v_2, v_3) = E(a_1 \mid v_2, v_3) = \rho_2 v_2 + \rho_3 v_3$

(v) $E(u_1 \mid z, v_2, v_3) = \xi_2 v_2 + \xi_3 v_3$

(vi) $\pi_{22} \neq 0$ in (2.10).

Assumption (i) is an exogeneity assumption for $z$. Assumption (ii) implies that $v_2$
satisfies the zero conditional mean and homoskedasticity assumptions from standard linear
regression analysis. Assumption (iii) means that $v_3$ satisfies the zero conditional mean assumption and
is uncorrelated with $v_2$. Assumption (iv) allows $y_2$ and $y_3$ to be correlated with $a_1$.
Assumption (v) allows $y_2$ and $y_3$ to be correlated with $u_1$. Assumption (vi) is the
standard rank condition for identification.

We can show that by adding only the OLS residuals from equation (2.10) (denoted
$\hat v_2$) and the Tobit residuals from equation (2.11) (denoted $\hat v_3$) as additional explanatory
variables in the structural equation (2.9)', $\beta_1$ and $\alpha_1$ can be consistently estimated. First
we need to show that $E(a_1 y_2 \mid z, v_2, v_3) = E(a_1 y_2)$. Under (2.10), (ii), (iii), (iv), and the
law of iterated expectations,

$$ E(a_1 y_2 \mid z, v_2, v_3) = E[E(a_1 y_2 \mid z, v_2, v_3) \mid z, v_2, v_3] = E[y_2 E(a_1 \mid z, v_2, v_3) \mid z, v_2, v_3] $$
$$ = E[z\pi_2(\rho_2 v_2 + \rho_3 v_3) \mid z, v_2, v_3] + E[v_2(\rho_2 v_2 + \rho_3 v_3) \mid z, v_2, v_3] $$
$$ = E(z\pi_2\rho_2 v_2 \mid z, v_2, v_3) + E(z\pi_2\rho_3 v_3 \mid z, v_2, v_3) + E(\rho_2 v_2 v_2 \mid z, v_2, v_3) + E(\rho_3 v_2 v_3 \mid z, v_2, v_3) $$
$$ = \rho_2 E(v_2^2 \mid z, v_2, v_3) = \rho_2\sigma_2^2 = E(a_1 y_2) $$

Given this result, we can write

$$ y_1 = (c_1 + \delta_1\rho_2\sigma_2^2) + z_1\beta_1 + \alpha_1 y_2 + r_1, \tag{2.12} $$
where $r_1 = [\delta_1 a_1 y_2 - \delta_1 E(a_1 y_2 \mid z, v_2, v_3)] + u_1$, and so
$$ E(r_1 \mid z, v_2, v_3) = E(u_1 \mid z, v_2, v_3) = \xi_2 v_2 + \xi_3 v_3 \tag{2.13} $$
by assumption (v).
It follows that we can write
$$ E(y_1 \mid z, v_2, v_3) = (c_1 + \delta_1\rho_2\sigma_2^2) + z_1\beta_1 + \alpha_1 y_2 + \xi_2 v_2 + \xi_3 v_3. \tag{2.14} $$

Now let $s$ be a binary indicator such that $s = 1[y_3 > 0]$, which is a function of $(z, v_3)$. We
have
$$ E(y_1 \mid z, v_2, v_3, s = 1) = (c_1 + \delta_1\rho_2\sigma_2^2) + z_1\beta_1 + \alpha_1 y_2 + \xi_2 v_2 + \xi_3 v_3. \tag{2.15} $$

Equation (2.14) leads to the following three-step procedure.


PROCEDURE 2.2

(i) Run OLS of $y_2$ on $z$ to get $\hat\pi_2$. Obtain the OLS residuals $\hat v_2 = y_2 - z\hat\pi_2$.

(ii) Obtain $\hat\pi_3$ from Tobit of $y_3$ on $z$ using all observations. Obtain the Tobit
residuals $\hat v_3 = y_3 - z\hat\pi_3$ for $y_3 > 0$.

(iii) Estimate $\beta_1, \alpha_1, \xi_2, \xi_3$ from the OLS regression of

$y_1$ on $z_1, y_2, \hat v_2, \hat v_3$

using the selected subsample.

It turns out that Procedure 2.1 is also unnecessary for consistently estimating $\beta_1$ and $\alpha_1$, just as
Garen's method is in the pure self-selection case. Both Procedure 2.1 and
Procedure 2.2 can estimate the return to education consistently in the application of
Section IV.

VI. Conclusion

This chapter shows how to test for and correct self-selection and sample
selection biases at the same time, especially when an explanatory variable,
such as education, interacts with an unobservable.

The procedure developed in this chapter to test for and correct potential self-selection
and sample selection biases is the following:

1. If self-selection and sample selection problems exist at the same time, use the two
methods provided in the previous sections (Procedures 2.1 and 2.2) to test for and correct
the biases.

2. If only self-selection appears to be a problem, add $\hat v_2$ as an additional regressor and run
OLS. If $\hat v_2$ is significant, so that self-selection bias is present, use Garen's method or
2SLS.

It should be noted that Procedure 2.1 and 2.2 produce very similar results in both
data sets, especially in the larger CPS data set. This supports our claim that the estimation
in Procedure 2.2 is consistent even when there exists an interactive term as a regressor.

It should be mentioned that in this chapter we have avoided full parametric
assumptions in deriving the sample selection and self-selection procedures. Nevertheless,
we have assumed that the selection variable follows a standard Tobit equation and that the
conditional expectation for unobservables is linear. In addition, we have assumed
independence between disturbances and exogenous variables instead of weaker and
preferable zero conditional mean assumptions. In further research, it may be possible to
relax some of the parametric assumptions to make the estimation more robust. Powell
(1984) extends least absolute deviations (LAD) estimation to regression with a
nonnegative dependent variable and gives conditions under which this estimator is
consistent and asymptotically normal. The LAD estimator minimizes the sum of absolute deviations
of the selection variable from its median function under the assumptions that the error
term is continuously distributed with median zero and the median is unique. The normality
restriction on the error term can be relaxed in this setting. It would be interesting to
explore applying LAD to our case.
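To make the LAD idea concrete, the objective for a dependent variable censored at zero can be written as the sum of absolute deviations of the observed outcome from max(0, xβ), the median function under a zero-median error. The Python sketch below only evaluates that objective for a candidate coefficient vector; the function name and the toy data are illustrative assumptions, and the minimization itself (which is not smooth) is not shown.

```python
def clad_objective(beta, X, y):
    """Sum of |y_i - max(0, x_i . beta)| over all observations."""
    total = 0.0
    for xi, yi in zip(X, y):
        index = sum(b * x for b, x in zip(beta, xi))  # linear index x_i . beta
        total += abs(yi - max(0.0, index))            # deviation from the median function
    return total


# Toy data: a constant and one regressor, outcome censored at zero.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0.0, 1.0, 3.0]
loss_exact = clad_objective([-1.0, 2.0], X, y)  # fits every point
loss_other = clad_objective([0.0, 1.0], X, y)   # misses the last point by 1
```

A candidate with a smaller objective value is preferred; in this toy case the first coefficient vector reproduces all three observations.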


The empirical results for the PSID data set show evidence of self-selection
bias but no strong evidence of sample selection bias. The simple OLS
regression using only women who work cannot produce consistent estimates, and the
estimate of the return to education is biased upward. By contrast, 2SLS can provide
consistent estimates when sample selection bias is not a problem.

In the CPS data set, both self-selection bias and sample selection bias exist.
The method we suggested should be used to correct self-selection bias along with
sample selection bias. The OLS estimate of the return to education is biased downward.
This indicates that the OLS bias has no fixed direction.

Also, the estimated return to education by OLS is about the same in the two data sets (10.75% for
the PSID data set and 9.9% for the CPS data set) before the self-selection and sample selection
problems are controlled for. After correcting for self-selection and sample selection
biases, the return to education decreases to 7.98% in the PSID data set and increases to
13.21% in the CPS data set. From the estimation results in Tables 2.2 and 2.5, this OLS bias
in the return to education comes mainly from self-selection. Although it is widely
believed that the estimated return to education should go down after correcting for
sample selection along with self-selection, a growing body of research supports our
empirical result for the CPS data set that the OLS estimate is biased downward. Angrist and
Krueger (1991), in their research on returns to schooling, reach the same result that OLS
estimates are biased downward relative to instrumental variables estimates.4 David Card (1993)
also finds that instrumental variables estimates exceed OLS estimates and
suggests that the result may come from measurement error in schooling.

4 Other studies include Ashenfelter and Krueger (1992), Kane and Rouse (1993), and Butcher and
Case (1993). All of the above studies report instrumental variables estimates of the return to schooling that
exceed the ordinary least-squares estimate.

The method we develop in this chapter is easy to implement in conventional
empirical studies. It can correct self-selection and sample selection biases at the same time.
It is hoped that researchers in many fields might benefit from it when analyzing empirical
questions.


Table 2.1

The Descriptive Statistics for PSID Data Set

Variable    Observations    Mean      Standard     Minimum    Maximum
                                      Deviation
hours       753             740.58    871.31       0          4950
kidslt6     753             .238      .5240        0          3
kidsge6     753             1.35      1.32         0          8
age         753             42.54     8.07         30         60
educ        753             12.29     2.28         5          17
wage        753             2.37      3.24         0          25
husage      753             45.12     8.06         30         60
huseduc     753             12.49     3.02         3          17
motheduc    753             9.25      3.37         0          17
fatheduc    753             8.81      3.57         0          17
exper       753             10.63     8.07         0          45
nwifeinc    753             20.13     11.63        -.03       96
lwage       428             1.19      .72          -2.05      3.22
expersq     753             178.04    249.63       0          2025

Note: hours are annual hours worked. kidslt6 represents number of kids less than 6 years
of age. kidsge6 represents number of kids greater than 6 years of age. wage is hourly wage
rate. educ represents years of education. husage represents age of husband. huseduc is
years of education for husband. motheduc is years of education for mother. fatheduc is
years of education for father. exper is experience. nwifeinc is non-wife income. lwage is
log wage. expersq is experience squared.


Table 2.2

The Estimation of Different Specifications
(PSID Data Set)

                     (1)         (2)         (3)         (4)         (5)         (6)
                     OLS         OLS +       2SLS        Garen       Procedure   Procedure
                                 $\hat v_2$                          2.2         2.1

Constant             -.5220*     -.1869      -.1869      -.1665      -.0790      -.1572
                     (.1986)     (.2836)     (.2854)     (.2829)     (.2888)     (.3256)

educ                 .1075*      .0804*      .0804*      .0764*      .0765*      .0798*
                     (.0141)     (.0216)     (.0217)     (.0217)     (.0216)     (.0245)

exper                .0416*      .0431*      .0431*      .0429*      .0388*      .0388*
                     (.0132)     (.0132)     (.0133)     (.0131)     (.0137)     (.0137)

exper²               -.0008*     -.0009*     -.0009*     -.0009*     -.0008*     -.0008*
                     (.0004)     (.0004)     (.0004)     (.0004)     (.0004)     (.0004)

educ·$\hat v_2$      -           -           -           .0107*      -           .0108*
                                                         (.0056)                 (.0056)

educ·$\hat v_3$      -           -           -           -           -           -.00001
                                                                                 (.00002)

$\hat v_2$           -           .0472*      -           -.0820      .0519*      -.0795
                                 (.0286)                 (.0736)     (.0285)     (.0732)

$\hat v_3$           -           -           -           -           -.00006     .0001
                                                                     (.00004)    (.0002)

R-Square             .1568       .1622       .1495       .1694       .1661       .1746

Notes: Dependent variable is log hourly wage.
educ represents years of education.
exper represents experience.
$\hat v_2 = educ - z\hat\pi_2$
$\hat v_3$ is the Tobit residual.
Standard errors in parentheses.
* : significant at the 10% level.
General 1 (Procedure 2.2, column (5)) represents correction for both self-selection and sample selection biases
in the linear model, without the interaction between years of education and ability.


Table 2.3

Joint tests for self-selection and sample selection biases
(PSID data set)

H0:                                                                           F value    Prob > F

educ·$\hat v_2$ = 0, educ·$\hat v_3$ = 0, $\hat v_2$ = 0, $\hat v_3$ = 0      2.27       0.0612

educ·$\hat v_2$ = 0, $\hat v_2$ = 0                                           3.54       0.0299

educ·$\hat v_3$ = 0, $\hat v_3$ = 0                                           1.32       0.2688

educ·$\hat v_2$ = 0, educ·$\hat v_3$ = 0                                      2.18       0.1145

Notes: educ denotes years of education.
$\hat v_2 = educ - z\hat\pi_2$
$\hat v_3$ is the Tobit residual.


Table 2.4

Descriptive Statistics for Hamermesh Data Set

Variable    Observations    Mean        Standard     Minimum    Maximum
                                        Deviation
hours       5634            20.72       19.40        0          120
kidslt6     5634            .279        .4487        0          1
kidsge6     5634            .308        .4615        0          1
age         5634            39.43       9.99         18         59
educ        5634            12.98       2.62         0          18
wage        3286            10.37       7.03         .03        200
husage      5634            42.45       11.23        19         86
huseduc     5634            13.15       2.98         0          18
exper       5634            20.44       10.45        0          52
nwifeinc    5634            30269.23    27211.58     0          112500
lwage       3286            2.20        .525         -3.40      5.30
expersq     5634            527.04      468.29       0          2704

Note: hours are weekly hours worked. kidslt6 is a dummy variable indicating kids
less than 6 years of age. kidsge6 is a dummy variable indicating kids greater than 6
years of age. wage is hourly wage rate. educ represents years of education. husage
represents age of husband. huseduc is years of education for husband. exper is experience.
nwifeinc is non-wife annual income. lwage is log wage. expersq is experience squared.


Table 2.5

The Estimation of Different Specifications
(CPS Data Set)

                     (1)         (2)         (3)         (4)         (5)         (6)
                     OLS         OLS +       2SLS        Garen       Procedure   Procedure
                                 $\hat v_2$                          2.2         2.1

Constant             .6504*      .2044*      .2044*      .1895*      .1901*      .1629
                     (.0587)     (.0950)     (.0967)     (.0950)     (.0921)     (.1143)

educ                 .0990*      .1305*      .1305*      .1307*      .1309*      .1321*
                     (.0035)     (.0063)     (.0065)     (.0063)     (.0061)     (.0078)

exper                .0198*      .0195*      .0195*      .0195*      .0186*      .0187*
                     (.0033)     (.0033)     (.0033)     (.0033)     (.0033)     (.0033)

exper²               -.0003*     -.0003*     -.0003*     -.0003*     -.0003*     -.0003*
                     (.00008)    (.00008)    (.00008)    (.00008)    (.00008)    (.00008)

educ·$\hat v_2$      -           -           -           .0036*      -           .0031*
                                                         (.0009)                 (.0011)

educ·$\hat v_3$      -           -           -           -           -           -.00008
                                                                                 (.00019)

$\hat v_2$           -           -.0453*     -           -.0912*     .0426*      -.0831*
                                 (.0076)                 (.0143)     (.0073)     (.0163)

$\hat v_3$           -           -           -           -           -.0016*     .0026
                                                                     (.0007)     (.0027)

R-Square             .2047       .2132       .1854       .2166       .2147       .2177

Notes: Dependent variable is log hourly wage.
educ represents years of education.
exper represents experience.
$\hat v_2 = educ - z\hat\pi_2$
$\hat v_3$ is the Tobit residual.
Standard errors in parentheses.
* : significant at the 10% level.
General 1 (Procedure 2.2, column (5)) represents correction for both self-selection and sample selection biases
in the linear model, without the interaction between years of education and ability.


Table 2.6

Joint tests for self-selection and sample selection biases
(CPS data set)

H0:                                                                           F value    Prob > F

educ·$\hat v_2$ = 0, educ·$\hat v_3$ = 0, $\hat v_2$ = 0, $\hat v_3$ = 0      13.64      0.0000

educ·$\hat v_2$ = 0, $\hat v_2$ = 0                                           21.42      0.0000

educ·$\hat v_3$ = 0, $\hat v_3$ = 0                                           2.78       0.0622

educ·$\hat v_2$ = 0, educ·$\hat v_3$ = 0                                      6.34       0.0018

Notes: educ denotes years of education.
$\hat v_2 = educ - z\hat\pi_2$
$\hat v_3$ is the Tobit residual.


Table 2.7

The Estimation of Different Specifications
(PSID Data Set without parents' education as instruments)

                     (1)         (2)         (3)         (4)         (5)         (6)
                     OLS         OLS +       2SLS        Garen       General 1   Procedure
                                 $\hat v_2$                                      2.1

Constant             -.5220*     -.2981      -.2981      -.2625      -.2106      -.2713
                     (.1986)     (.3094)     (.3189)     (.3090)     (.3027)     (.3395)

educ                 .1075*      .0894*      .0894*      .0837*      .0880*      .0895*
                     (.0141)     (.0238)     (.0231)     (.0239)     (.0232)     (.0261)

exper                .0416*      .0426*      .0426*      .0427*      .0375*      .0377*
                     (.0132)     (.0132)     (.0132)     (.0132)     (.0138)     (.0137)

exper²               -.0008*     -.0008*     -.0008*     -.0008*     -.0007*     -.0007*
                     (.0004)     (.0004)     (.0004)     (.0004)     (.0004)     (.0004)

educ·$\hat v_2$      -           -           -           .0104*      -           .0106*
                                                         (.0056)                 (.0055)

educ·$\hat v_3$      -           -           -           -           -           -.00001
                                                                                 (.00002)

$\hat v_2$           -           .0280       -           -.0977      .0293       -.0990
                                 (.0296)                 (.0725)     (.0288)     (.0722)

$\hat v_3$           -           -           -           -           -.00006     .0001
                                                                     (.00004)    (.0002)

R-Square             .1568       .1586       .1536       .1657       .1625       .1712

Notes: Dependent variable is log hourly wage.
educ represents years of education.
exper represents experience.
$\hat v_2 = educ - z\hat\pi_2$
$\hat v_3$ is the Tobit residual.
Standard errors in parentheses.
* : significant at the 10% level.
General 1 represents correction for both self-selection and sample selection biases
in the linear model, without the interaction between years of education and ability.


Chapter 3

MODEL SELECTION TESTS FOR TWO-PART MODELS

I. Introduction

In econometrics, much of the literature on estimation and inference
from a sample of economic data deals with the situation in which the statistical model is
correctly specified. Consequently, in much of econometric practice, it is customary to
assume that the parameterized linear statistical model used for inference is
consistent with the sampling process that generated the sample observations.
In this ideal case, statistical theory provides techniques for obtaining point and interval
estimators of the population parameters and for hypothesis testing. In practice, however,
the possibilities for model misspecification are numerous, and false statistical models are
most likely the rule rather than the exception.

Choosing between two competing models is relatively straightforward when one
model is nested within another. Assuming that standard regularity conditions hold, Wald,
Lagrange multiplier, and likelihood ratio tests (or quasi-LR tests) can be used. See, for
example, Lin and Schmidt (1984). See also Chow (1983, Ch. 9).

When competing models are not strictly nested, meaning that one model cannot be
obtained from the other by imposing appropriate restrictions or as the limit of a suitable
approximation, choosing between them statistically is more difficult. Two very different
approaches have been proposed.

One is based on the work of Cox (1961, 1962), who derived specification tests
that use information about a specific alternative and test whether the null can predict the
performance of the alternative. This procedure assumes in advance that one model is the
true model and tests it against the competing model.

The other approach is an LR-based test developed by Vuong (1989), which adopts
the classical hypothesis testing framework and uses the Kullback-Leibler (1951)
Information Criterion (KLIC), which measures the distance between a given distribution
and the true distribution. Compared with Cox's approach, Vuong's approach does not
assume a true model in advance. Instead, the two competing models are compared on the
same basis.
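For intuition, the basic form of Vuong's LR-based statistic for strictly non-nested models can be computed from the per-observation log-likelihood contributions of the two fitted models. The following Python sketch uses made-up log-likelihood values and an illustrative function name; it omits any adjustment for differing numbers of parameters and is not meant to reproduce the tests developed later in this chapter.

```python
import math


def vuong_statistic(loglik_f, loglik_g):
    """Vuong statistic: LR / (sqrt(n) * omega_hat) for non-nested models f and g."""
    d = [lf - lg for lf, lg in zip(loglik_f, loglik_g)]  # pointwise log-likelihood ratios
    n = len(d)
    lr = sum(d)                                          # log-likelihood ratio statistic
    mean = lr / n
    omega2 = sum(x * x for x in d) / n - mean * mean     # sample variance of the ratios
    return lr / (math.sqrt(n) * math.sqrt(omega2))


# Hypothetical per-observation log-likelihoods under two competing models.
stat = vuong_statistic([-1.0, -1.5, -0.5], [-2.0, -3.5, -3.5])
print(stat)
```

Under the null that the two models are equally close to the true distribution, the statistic is compared with standard normal critical values; large positive values favor the first model and large negative values favor the second.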

In this chapter I explore the use of Vuong's model selection tests in the context
of two competing hurdle, or two-tier, models. In particular, I am interested in models that
explain Tobit-like outcomes: the response variable can be zero with positive probability,
but it is continuous for strictly positive values. Such variables often arise from economic
optimization. Good examples include annual labor supply and the amount of charitable
contributions. For some nontrivial segment of the population, the optimal choice is zero,
a so-called corner solution.

Two-tier models offer more flexibility than the most common model for corner
solution applications: the standard, or Type I, Tobit model. Unlike the standard
Tobit, they can specify a different set of parameters for each tier, which is more
economically reasonable. As an example of a study that tests between a two-tier model and the
standard Tobit model, Lin and Schmidt (1984) studied the two-tier model and developed a
test for selecting between a two-tier model and the standard Tobit model when the models
are nested.

This chapter provides some tests and simulations for selecting among alternatives
to the Tobit specification of censored regression models. Censored regression models are
used when the dependent variable is partly continuous but piles up at one or more points
with positive probability. We can group censored regression models into two categories:
data censoring and corner solution applications. In the first category, there is
a well-defined variable $y^*$ and we are interested in the population regression $E(y^* \mid x)$,
where $x$ represents exogenous variables. If $y^*$ and $x$ were observed for everyone in the
population, we could just apply ordinary least squares or nonlinear least squares. If $y^*$ is
censored above or below some value, a data problem arises. An example is when $y^*$
is the wage rate and the wage is top-coded in the survey: the actual wage rate is
recorded up to some threshold, such as 100 dollars, but above that only the fact that the wage rate
exceeded 100 dollars is recorded.

The second category of censored regression models is the corner
solution application, which appears more often in econometrics. Suppose $y$ is an
observable choice variable of an individual. If $y$ is continuous, we can usually apply
standard procedures such as ordinary least squares. But if $y$ is something like individual
consumption of a particular good, it will be optimal for some individuals not to consume the good at
all, so $y = 0$. Another example is workers deciding whether to
work: it is optimal for some workers not to work at all. In cases of this kind, the
distribution of $y$ in the population is continuous for $y > 0$, but $y$ is zero with positive
probability. Data observability is not the issue in this category of censored regression
models. The focus is on expectations of $y$ such as $E(y \mid x)$ and $E(y \mid x, y > 0)$.

In this chapter we focus on the corner solution category of censored
regression models. As noted above, the main feature of these models is that the
dependent variable is essentially continuous but has a lower bound, say y = 0, and
observations pile up at this point. The fact that the observed outcome of y might be zero
simply reflects the choice of an individual. If we assume E(y | x) = xβ and apply OLS to
a random sample, the OLS estimate of β is consistent if E(y | x) is in fact linear.
However, the predicted value of y can be negative for some combinations of x and β.
Also, a transformation such as log(y) does not work because log(0) is not defined. In fact,
E(y | x) is not linear in many cases. Since the usual regression model cannot be used in
this situation, special treatment is required.
It is well known that the Tobit model was created to deal with censored
regression models. For the population, we can write the standard censored Tobit model as

\[ y^* = x\beta + u, \qquad u \mid x \sim \mathrm{Normal}(0, \sigma^2) \tag{3.1} \]
\[ y = \max(0, y^*) \tag{3.2} \]

For the case of true data censoring, E(y* | x) is of interest and the estimation of
β is what we need. In the case of a corner solution application, we are interested in
E(y | x) or E(y | x, y > 0), which depend on β nonlinearly rather than linearly.
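
To make the nonlinearity concrete, the conditional means implied by (3.1) and (3.2) can be written out. These are the standard Tobit expressions, given here as a supplement rather than taken from the text above. Φ and φ denote the standard normal cdf and pdf, and λ(·), the inverse Mills ratio, is a symbol introduced only for this display.

```latex
\[
E(y \mid x, y > 0) = x\beta + \sigma\,\lambda(x\beta/\sigma),
\qquad \lambda(c) \equiv \phi(c)/\Phi(c),
\]
\[
E(y \mid x) = \Phi(x\beta/\sigma)\,x\beta + \sigma\,\phi(x\beta/\sigma).
\]
```

Both expressions depend on β only through the index xβ/σ and the scale σ, which is why a single parameter vector governs participation and amount in the standard Tobit model.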
However, a major limitation of the Tobit model in handling corner solution applications is
that the standard Tobit model uses the same mechanism to determine both the choice between, say,
buying or not, or working or not, and the amount to buy or the amount to work. If we
define y as the variable to be explained, this means that the Tobit model uses a single
mechanism to determine both the choice between y = 0 and y > 0 and the amount of y once y > 0.
This is restrictive because the Tobit model does not allow the amount
of y, when it is nonzero, to depend on parameters or variables different from those
determining the probability of its being zero.

Several alternatives to the censored Tobit model have been suggested to address this
shortcoming of the standard censored Tobit model. These models allow the decision of
y > 0 versus y = 0 to be separate from the decision about the amount of y given that y > 0. Models
of this kind are called “two-tiered models” or “hurdle models”.

The model suggested by Cragg (1971) is one of the most widely used hurdle
models. Cragg suggested a model, based on a truncated normal distribution, that nests
the usual Tobit model. A major advantage of this model is that it reduces to the
standard Tobit model under a parameter restriction. Because the Tobit model is a special case of the model suggested by
Cragg, it is possible to test the Tobit model against Cragg’s alternative. Fin and Schmidt
(1984) derive the LM test for this purpose.

Other alternatives to the standard Tobit model have also been developed.
A simpler lognormal model is the one most often suggested. Unlike Cragg’s
alternative, which truncates a normal distribution at zero, the lognormal model assumes
that the dependent variable follows a lognormal distribution when it
is greater than zero. With this model it is easier to consistently estimate the expectation of y
conditional on the explanatory variables than with Cragg’s alternative. However, the lognormal model does
not contain the standard Tobit model as a special case obtained by imposing parameter restrictions,
so it is difficult to test the Tobit model against this alternative.

Several methods can be applied to choose among competing models.
White and Olson (1979) evaluated competing models by their mean squared error
of prediction. Vuong (1989) compared the maximized log-likelihood values of the
competing models to select one model over the other.

This chapter uses the likelihood comparison approach to select
between the Cragg model and the lognormal model. We use three different data
sets, along with a computer simulation, to decide which model fits the data better.

Section II discusses model specifications and model selection procedures. Section
III contains the empirical work and results. Section IV is the conclusion.

II. Basic Framework

As discussed in section I, the situation being considered can be divided into
two tiers. First, a particular event may or may not occur at each observation.
If it does not occur, the variable takes the value zero. If it does occur, we move to the
second tier, where the event is associated with a continuous, positive random variable. We
let y denote the variable to be explained. We first examine the Cragg model.

Cragg suggests a normal distribution truncated
at zero in the second tier to ensure that the dependent variable y is positive. We
assume

\[ P(y = 0 \mid x) = 1 - \Phi(x\gamma), \tag{3.3} \]

where Φ(·) is the standard normal cdf and γ is a K×1 vector of parameters. The density
conditional on y > 0 is assumed to be

\[ f(y \mid x, y > 0) = \{\Phi(x\beta/\sigma)\}^{-1}\,[\phi(\{y - x\beta\}/\sigma)/\sigma], \qquad y > 0, \tag{3.4} \]

where x is a 1×K vector of the values of the independent variables, β is a K×1 vector of
parameters, and φ(·) is the standard normal pdf. Equation (3.3) gives the probability
that y is zero, and hence the probability that it is positive. Equation (3.4) is the density of y when y is
greater than zero. The density of y given x is

 

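\[ f(y \mid x; \theta) = \{1 - \Phi(x\gamma)\}^{1[y = 0]}\,\bigl(\Phi(x\gamma)\,\{\Phi(x\beta/\sigma)\}^{-1}\,[\phi(\{y - x\beta\}/\sigma)/\sigma]\bigr)^{1[y > 0]}. \]

Cragg’s model nests the usual Tobit model, which is obtained by imposing the restriction
γ = β/σ.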
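
As a quick check of this nesting claim (a supplementary derivation, not part of the original text), substituting γ = β/σ into the density of y given x cancels the two Φ terms in the second tier. What remains is the standard Tobit density implied by (3.1) and (3.2):

```latex
\[
f(y \mid x; \theta)\big|_{\gamma = \beta/\sigma}
= \{1 - \Phi(x\beta/\sigma)\}^{1[y = 0]}\,
  \bigl[\phi(\{y - x\beta\}/\sigma)/\sigma\bigr]^{1[y > 0]} .
\]
```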
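The log-likelihood function of the truncated normal model for observation i is

\[
\begin{aligned}
l_i(\theta) &= \log f_i(y \mid x; \theta) \\
&= 1[y_i = 0]\,\log\{1 - \Phi(x_i\gamma)\}
 + 1[y_i > 0]\,\bigl\{\log\Phi(x_i\gamma) - \log\Phi(x_i\beta/\sigma) + \log\phi(\{y_i - x_i\beta\}/\sigma) - \log\sigma\bigr\} \\
&= 1[y_i = 0]\,\log\{1 - \Phi(x_i\gamma)\}
 + 1[y_i > 0]\,\bigl\{\log\Phi(x_i\gamma) - \log\Phi(x_i\beta/\sigma) - \tfrac{1}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}(y_i - x_i\beta)^2\bigr\}
\end{aligned}
\tag{3.5}
\]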
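
The per-observation log-likelihood (3.5) translates directly into code. The following is a minimal sketch using only the Python standard library. The function name `cragg_loglik_obs` and its argument layout are illustrative assumptions, not the routine used for the estimates reported in this chapter.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: .cdf is Phi, .pdf is phi


def cragg_loglik_obs(y, x, gamma, beta, sigma):
    """Log-likelihood contribution (3.5) of one observation.

    x, gamma and beta are equal-length sequences of floats; sigma > 0.
    (Hypothetical helper written for illustration only.)
    """
    x_gamma = sum(xj * gj for xj, gj in zip(x, gamma))
    x_beta = sum(xj * bj for xj, bj in zip(x, beta))
    if y == 0:
        # First tier: probability that the hurdle is not crossed.
        return math.log(1.0 - _STD_NORMAL.cdf(x_gamma))
    # Second tier: participation probability times the density of a
    # normal distribution truncated at zero.
    return (
        math.log(_STD_NORMAL.cdf(x_gamma))
        - math.log(_STD_NORMAL.cdf(x_beta / sigma))
        - 0.5 * math.log(2.0 * math.pi * sigma**2)
        - (y - x_beta) ** 2 / (2.0 * sigma**2)
    )
```

Summing this function over the sample gives the objective that a numerical optimizer would maximize over (γ, β, σ). When γ = β/σ, the two log Φ terms cancel and the value coincides with the Tobit log-likelihood contribution.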

As a competitor to Cragg’s model, we assume that, conditional on y > 0, log(y)
follows a normal distribution. Thus,

\[ P(y = 0 \mid x) = 1 - \Phi(x\gamma) \tag{3.6} \]
\[ \log(y) \mid x, y > 0 \sim \mathrm{Normal}(x\beta, \sigma^2). \tag{3.7} \]

Equation (3.6) gives the probability that y is zero and is identical to the first tier of
Cragg’s model. Equation (3.7) says that, conditional on y > 0, y given x follows a
lognormal distribution. The density of y given x can be written as

\[ f(y \mid x; \theta) = \{1 - \Phi(x\gamma)\}^{1[y = 0]}\,\{\Phi(x\gamma)\,\phi[\{\log(y) - x\beta\}/\sigma]/(y\sigma)\}^{1[y > 0]}. \tag{3.8} \]

The log-likelihood function for observation i is

\[
\begin{aligned}
l_i(\theta) &= \log f_i(y \mid x; \theta) \\
&= 1[y_i = 0]\,\log\{1 - \Phi(x_i\gamma)\}
 + 1[y_i > 0]\,\bigl\{\log\Phi(x_i\gamma) - \log(y_i) - \tfrac{1}{2}\log(\sigma^2) - \tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\{\log(y_i) - x_i\beta\}^2/\sigma^2\bigr\}
\end{aligned}
\tag{3.9}
\]
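
For comparison with the truncated normal case, the lognormal log-likelihood (3.9) can be sketched in the same way. The name `lognormal_loglik_obs` and the helper `_dot` are hypothetical and serve only as an illustration under the assumptions (3.6) and (3.7).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def _dot(x, coef):
    """Inner product of two equal-length sequences of floats."""
    return sum(xj * cj for xj, cj in zip(x, coef))


def lognormal_loglik_obs(y, x, gamma, beta, sigma):
    """Log-likelihood contribution (3.9) of one observation (illustrative)."""
    if y == 0:
        # First tier, identical to the first tier of Cragg's model.
        return math.log(1.0 - _STD_NORMAL.cdf(_dot(x, gamma)))
    resid = math.log(y) - _dot(x, beta)
    # Second tier: lognormal density, including the Jacobian term -log(y).
    return (
        math.log(_STD_NORMAL.cdf(_dot(x, gamma)))
        - math.log(y)
        - 0.5 * math.log(sigma**2)
        - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * resid**2 / sigma**2
    )
```

Because the first tier involves only γ and the second only (β, σ), the two parts can be maximized separately: a probit for participation and a regression of log(y) on x for the positive observations.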

Unlike Cragg’s model, this model does not nest the standard Tobit model, but it is easy
to interpret. For example, β_j is the semi-elasticity of y with respect to x_j, conditional on
y > 0. This gives the model a more direct economic interpretation than Cragg’s model.
The expectations E(y | x) and E(y | x, y > 0) also differ between the two models. In the
lognormal model,

\[ E(y \mid x, y > 0) = \exp(x\beta + \sigma^2/2), \qquad E(y \mid x) = \Phi(x\gamma)\exp(x\beta + \sigma^2/2), \]

and these are easily estimated given estimates of β, σ², and γ.
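
The difference between the two sets of conditional means can be illustrated numerically. The sketch below evaluates the lognormal expressions stated above. For Cragg’s model it uses the standard mean of a normal distribution truncated at zero, E(y | x, y > 0) = xβ + σφ(xβ/σ)/Φ(xβ/σ), which is a textbook result added here as an assumption rather than a formula from this chapter. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def _dot(x, coef):
    """Inner product of two equal-length sequences of floats."""
    return sum(xj * cj for xj, cj in zip(x, coef))


def lognormal_hurdle_means(x, gamma, beta, sigma):
    """Return (E[y | x, y > 0], E[y | x]) for the lognormal model."""
    cond_mean = math.exp(_dot(x, beta) + sigma**2 / 2.0)
    return cond_mean, _STD_NORMAL.cdf(_dot(x, gamma)) * cond_mean


def cragg_means(x, gamma, beta, sigma):
    """Return (E[y | x, y > 0], E[y | x]) for the truncated normal model."""
    c = _dot(x, beta) / sigma
    inverse_mills = _STD_NORMAL.pdf(c) / _STD_NORMAL.cdf(c)
    cond_mean = _dot(x, beta) + sigma * inverse_mills
    return cond_mean, _STD_NORMAL.cdf(_dot(x, gamma)) * cond_mean
```

In both models E(y | x) is the participation probability Φ(xγ) times the conditional mean, so the models differ only in how the second tier maps xβ and σ into E(y | x, y > 0).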

Cragg’s model and the lognormal model are not nested, so we cannot use
standard tests. As discussed in the introduction, we could apply Cox’s procedure to
test one model, which is assumed to be true under H0, against the other. For the two-tier
models studied here, this is very complicated and computationally expensive. Vuong
(1989) suggested a likelihood ratio (LR) based test that uses the Kullback-Leibler
(1951) Information Criterion (KLIC), which measures the distance between a given
distribution and the true distribution. If the distance between a specified model and the
true distribution is defined as the minimum of the KLIC over the distributions in the model,
then the “best” model among a collection of competing models is the one that is closest
to the true distribution (see, e.g., Vuong (1989)).

Suppose there are two competing models F_θ and G_γ, with
F_θ = {f(y | x; θ); θ ∈ Θ} and G_γ = {g(y | x; γ); γ ∈ Γ}, where F_θ and G_γ are conditional
models and f and g are the conditional density functions. Here E⁰ denotes expectation with
respect to the true distribution, and θ* and γ* are the pseudo-true parameter values. The
models can be nested, non-nested, or overlapping. The model selection test takes as its null hypothesis that
E⁰[log f(y | x; θ*)] = E⁰[log g(y | x; γ*)], meaning that the two models are equivalent,
against E⁰[log f(y | x; θ*)] > E⁰[log g(y | x; γ*)], meaning that F_θ is better than G_γ, or
against E⁰[log f(y | x; θ*)] < E⁰[log g(y | x; γ*)], meaning that G_γ is better than F_θ.
Although the quantities E⁰[log f(y | x; θ*)] and E⁰[log g(y | x; γ*)] are unknown, they can
be consistently estimated by (1/n) times the log-likelihood evaluated at the pseudo or quasi
maximum likelihood estimator (MLE) (see, e.g., White (1982a)). It follows that (1/n)
times the log-likelihood ratio (LR) statistic is a consistent estimator of
E⁰[log f(y | x; θ*)] - E⁰[log g(y | x; γ*)]. Cox (1961, 1962) and White (1982b) showed
that, if n is the sample size, then n^(-1/2) times the LR statistic, properly centered and
normalized, has a limiting standard normal distribution under the hypothesis that one of the
competing models is correctly specified.


To pick the better model and to derive the tests for model selection, the model
with the minimum KLIC, which measures the distance between the true distribution and a
specified model, should be chosen. For the conditional model F_θ, the KLIC is defined as

\[ \mathrm{KLIC}(H^0_{Y|X}; F_\theta) = E^0[\log h^0(Y_t \mid X_t)] - E^0[\log f(Y_t \mid X_t; \theta_*)], \]

where H⁰_{Y|X} is the true conditional distribution of Y_t given X_t and h⁰(·|·) is the true
conditional density of Y_t given X_t. Since the first term on the right-hand side does not depend
on F_θ, an equivalent measure is E⁰[log f(Y_t | X_t; θ*)], with larger values indicating a
model closer to the true distribution.

 

If we have two competing models, the model that is closest to the true
conditional distribution should be chosen. Vuong therefore develops the following
hypotheses:

\[ H_0: \; E^0\!\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid X_t; \gamma_*)}\right] = 0, \tag{3.10} \]

meaning that F_θ and G_γ are equivalent, against

\[ H_f: \; E^0\!\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid X_t; \gamma_*)}\right] > 0, \tag{3.11} \]

meaning that F_θ is better than G_γ, or

\[ H_g: \; E^0\!\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid X_t; \gamma_*)}\right] < 0, \tag{3.12} \]

meaning that F_θ is worse than G_γ.
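
The link between these hypotheses and the KLIC can be made explicit with a short supplementary derivation (not in the original text). The term involving the true density h⁰ is common to both models, so it cancels when the two KLIC values are differenced:

```latex
\[
\mathrm{KLIC}(H^0_{Y|X}; G_\gamma) - \mathrm{KLIC}(H^0_{Y|X}; F_\theta)
= E^0\!\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid X_t; \gamma_*)}\right].
\]
```

A positive value of this expectation therefore means that F_θ has the smaller KLIC and is closer to the true distribution, and a negative value means the reverse.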

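Since the two competing models we examine (the lognormal model
and the Cragg alternative) are non-nested, the model selection tests for strictly non-nested
models are:

\[ \text{under } H_0: \quad n^{-1/2}\,LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n \xrightarrow{D} N(0, 1) \tag{3.13} \]
\[ \text{under } H_f: \quad n^{-1/2}\,LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n \xrightarrow{a.s.} +\infty \tag{3.14} \]
\[ \text{under } H_g: \quad n^{-1/2}\,LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n \xrightarrow{a.s.} -\infty \tag{3.15} \]

where

\[ LR_n(\hat\theta_n, \hat\gamma_n) \equiv L_n^f(\hat\theta_n) - L_n^g(\hat\gamma_n) = \sum_{t=1}^{n} \log\frac{f(y_t \mid x_t; \hat\theta_n)}{g(y_t \mid x_t; \hat\gamma_n)}, \]

and θ̂_n and γ̂_n are the maximum likelihood estimators of θ* and γ*. Also,
L_n^f(θ̂_n) = sup_{θ ∈ Θ} L_n^f(θ), where L_n^f(θ) is the
conditional log-likelihood function for the model F_θ. A similar definition applies to γ̂_n for
the conditional model G_γ. Finally,

\[ \hat\omega_n \equiv \left\{ \frac{1}{n} \sum_{t=1}^{n} \left[\log\frac{f(y_t \mid x_t; \hat\theta_n)}{g(y_t \mid x_t; \hat\gamma_n)}\right]^2 \right\}^{1/2}. \]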

The tests above provide a useful framework for model selection. We can
choose a critical value c from the standard normal distribution at some significance level.
If the value of n^(-1/2) LR_n(θ̂_n, γ̂_n)/ω̂_n is greater than c, then the model F_θ is preferred. If the
value is smaller than -c, then the model G_γ is preferred. If the value
is between -c and c, then we cannot discriminate between the two
competing models.
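
The decision rule above can be sketched in a few lines of code. With ω̂_n taken as the square root of the average squared log-density ratio, the factors of n cancel and the statistic reduces to the sum of the per-observation differences divided by the square root of the sum of their squares. The function names `vuong_statistic` and `select_model` are illustrative assumptions, and the inputs are taken to be per-observation log-likelihoods already evaluated at each model’s maximum likelihood estimates.

```python
import math


def vuong_statistic(loglik_f, loglik_g):
    """Return n^(-1/2) * LR_n / omega_hat_n.

    loglik_f and loglik_g are equal-length sequences holding the
    per-observation log-likelihoods of models F and G at their MLEs.
    At least one difference must be nonzero.
    """
    diffs = [lf - lg for lf, lg in zip(loglik_f, loglik_g)]
    return sum(diffs) / math.sqrt(sum(d * d for d in diffs))


def select_model(statistic, critical_value=2.58):
    """Apply the two-sided rule with a standard normal critical value."""
    if statistic > critical_value:
        return "F preferred"
    if statistic < -critical_value:
        return "G preferred"
    return "cannot discriminate"
```

For example, `select_model(vuong_statistic(ll_lognormal, ll_cragg))` would report which of the two hurdle models is favored, where `ll_lognormal` and `ll_cragg` are hypothetical lists of fitted log-likelihood contributions.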
III. Empirical Result

To select between the lognormal model and Cragg’s
alternative, we divide this section into two parts. In the first part we run a simulation using
computer-generated data sets to see how well the test works. In the second part we apply
the test introduced in section II to the three data sets used in chapters one and two. We
use the following result under the null hypothesis:

\[ H_0: \quad \frac{\sum_{i=1}^{n} (\log f_i - \log g_i)}{\sqrt{\sum_{i=1}^{n} (\log f_i - \log g_i)^2}} \sim N(0, 1), \]

where log f_i and log g_i are the maximized log-likelihood contributions of observation i
under the two models, to form the test and apply it to the different data sets.

(I) The Simulation Results
The simulation is designed as follows.
We use the computer to generate two kinds of data sets: one
follows a lognormal distribution and the other follows a truncated normal distribution.
(a) Lognormal distribution
The data sets that follow a lognormal distribution are generated by:
(i) Generate X = (1, X_1, X_2),
where 1 is the constant term, X_1 is a binary variable that is zero for around 70% of
observations and one for around 30%, and X_2 is a standard normal random variable.
(ii) Generate z = 1 if Xγ + u ≥ 0, and z = 0 otherwise, where u follows a standard normal
distribution.
(iii) If z = 0 then y = 0. If z = 1, then y = exp(Xβ + v), where
v is generated to be correlated with u such that
v = .3u + .7k, where k follows a normal distribution, k ~ N(0, σ²).

(b) Truncated normal distribution
(i) Follow steps (i) and (ii) in (a).
(ii) If z = 0 then y = 0. If z = 1, then y = Xβ + v, truncated at zero. Again, v is
generated to be correlated with u such that v = .3u + .7k, where k follows a normal
distribution, k ~ N(0, σ²).
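
The data-generating steps in (a) and (b) can be sketched as follows, using only the Python standard library. The text does not say how the truncation at zero in (b)(ii) was carried out. The sketch assumes that k is drawn from its normal distribution restricted to the region where y is positive, which is one possible reading and not necessarily the procedure used for the reported results. The function name, argument names, and the `x2_scale` option (set to 2.0 to double the spread of X_2) are all hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def simulate_two_tier(n, gamma, beta, sigma2, model, x2_scale=1.0, seed=0):
    """Draw n pairs (y, x); model is "lognormal" or "truncated_normal"."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    sample = []
    for _ in range(n):
        x1 = 1.0 if rng.random() < 0.3 else 0.0  # one for about 30% of draws
        x2 = x2_scale * rng.gauss(0.0, 1.0)
        x = (1.0, x1, x2)
        u = rng.gauss(0.0, 1.0)
        x_gamma = sum(a * b for a, b in zip(x, gamma))
        x_beta = sum(a * b for a, b in zip(x, beta))
        if x_gamma + u < 0.0:
            y = 0.0  # first tier: z = 0
        elif model == "lognormal":
            k = rng.gauss(0.0, sigma)
            y = math.exp(x_beta + 0.3 * u + 0.7 * k)
        else:
            # ASSUMED truncation scheme: sample k from N(0, sigma^2)
            # restricted to k > lower, the region in which y > 0,
            # by inverting the normal cdf on the lower tail of -k/sigma.
            lower = -(x_beta + 0.3 * u) / 0.7
            tail = _STD_NORMAL.cdf(-lower / sigma)  # P(k > lower)
            w = tail * (1.0 - rng.random())  # uniform on (0, tail]
            # Guard the endpoints, where inv_cdf is undefined.
            w = min(max(w, 1e-300), 1.0 - 1e-16)
            k = -sigma * _STD_NORMAL.inv_cdf(w)
            y = x_beta + 0.3 * u + 0.7 * k
        sample.append((y, x))
    return sample
```

A call such as `simulate_two_tier(1000, (0, 1, 1), (0, 1, 1), 0.2, "lognormal")` would produce one replication of the first parameter set at the larger sample size.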

For both kinds of data sets, we set the sample size at 200 and 1000. We also
experiment with two sets of true values of γ and β (see footnote 5), along with two values of σ² (.2 and .5). To
see whether the simulation results change when we increase the variation of X_2, we also repeat
the whole simulation after doubling X_2’s variation. The full set of combinations is shown in Table
3.1. There are sixteen combinations for each kind of data set. We
run each combination 200 times to find the pattern.

We choose 2.58 as the critical value, so the significance level is 1%. If the statistic
we calculate is greater than 2.58, then the lognormal model is preferred. If the statistic
is less than -2.58, then Cragg’s truncated normal model is preferred. If the statistic
is between -2.58 and 2.58, then there is no conclusion as to which model is better. The
rejection rates, that is, the rates at which the true model is correctly picked, are shown in Tables 3.2 and 3.3.

In Table 3.2, the true model is the lognormal model. We have the following findings:
(1) The rejection rates decrease when we decrease the sample size from 1000 to 200. This
is reasonable, since a bigger sample should give the test more power to distinguish the
true model from the alternative model.
(2) The rejection rates also decrease when σ² increases from .2 to .5. That is,
the test has more difficulty picking the right model when the variance of the error term in
the equation for y increases.
(3) The rejection rates decrease when the variation of the explanatory variable increases. That is,
the test is less powerful in picking the right model when the variation of the
explanatory variable increases. For the first parameter set with a sample size of 1000 and
σ² = .2, for example, the rate falls from 100% to 56.25%; the second parameter set shows no
such decline.

5 The first set is γ = (0,1,1) and β = (0,1,1). The second set is γ = (0,-1,1) and β = (0,1,1).

In Table 3.3, the true model is Cragg’s truncated normal model. The same test is used
as in Table 3.2. We have the following findings:
(1) The rejection rates decrease as we decrease the sample size from 1000 to 200. This
result is similar to that in Table 3.2.
(2) The rejection rates change little with σ² or with the variation of X_2.
The test is more stable in this case.
(3) The specification γ = (0,-1,1) and β = (0,1,1) may not be a good one when the true
model is Cragg’s truncated normal model. The maximum likelihood procedure fails to
converge when maximizing the log-likelihood function for Cragg’s truncated normal
model.

Comparing the simulation results for the two kinds of models, we notice
that Cragg’s truncated normal model is more robust than the lognormal model, in the sense
that the test seldom fails to pick it when we change the variation of X and σ².
The rejection rates decrease only when we decrease the sample size. This means
that the probability of picking the wrong model is higher when the true model is the
lognormal model.


(II) The Real Data Sets Results

We apply the test described above to the three data sets at hand to see which
model fits better. These are exactly the data sets used in the previous
two chapters: the first is Moffitt’s data set, the second is the PSID data set, and the
third is the CPS data set.

In the case of Moffitt’s data set, we use hours of work per week as the dependent
variable. The independent variables include dummies for marital status, age, race, the
number of family members, the number of children in the household younger than 6,
the number of children in the household older than 6, years of education, the
size of the local labor force, the employment fractions in manufacturing and in government
in the census region of residence, and the flow of asset income. After carrying out the
maximum likelihood procedures for both models, the calculated statistic is -8.10.
Cragg’s truncated normal model is preferred.

For the PSID data set, hours of work is the dependent variable. The explanatory
variables include age, non-wife income, the number of kids under age six, the
number of kids over age six, years of education, experience, and the square of
experience. The statistic is -6.8. Cragg’s truncated normal model is preferred.

For the CPS data set, the dependent variable is hours of work. The explanatory
variables are age, non-wife income, race (a dummy variable), education, experience,
the square of experience, husband’s education, husband’s experience, the number of kids under
age six, and the number of kids over age six. The statistic is -11.87. Again,
Cragg’s truncated normal model is preferred.


IV. Conclusion

In this chapter, we discuss the potential drawbacks of the Tobit model in handling
corner solution applications of censored regression models. We consider two
alternatives to the Tobit model: a lognormal model and
Cragg’s truncated normal model. To choose the model that best fits the data, we apply
Vuong’s LR-based test for model selection.

We first run a simulation and then apply the test to the data sets used in chapters
one and two. The simulation results show that the test often fails to pick the lognormal
model even when the lognormal model is the true model. So if the test shows that the
lognormal model is preferred, the chance of error is very small. On the other hand, if
the test indicates that Cragg’s truncated normal model is preferred, we still have to be careful.

The results for the real data sets show that Cragg’s truncated normal model is
preferred in all three data sets, although, as the simulation suggests, this result should be
interpreted with care.


 

Table 3.1

Combinations of the Simulation

                Sample Size: 1000                   Sample Size: 200
                X_2 ~ N(0,1)      X_2 ~ 2*N(0,1)    X_2 ~ N(0,1)      X_2 ~ 2*N(0,1)
γ = (0,1,1)
β = (0,1,1)
γ = (0,-1,1)
β = (0,1,1)

Table 3.2

The Rejection Rate

True Model: Log Normal Model

Note: Critical value is 2.58

                Sample Size: 1000                   Sample Size: 200
                X ~ N(0,1)        X ~ 2*N(0,1)      X ~ N(0,1)        X ~ 2*N(0,1)
                σ²=.2    σ²=.5    σ²=.2    σ²=.5    σ²=.2    σ²=.5    σ²=.2    σ²=.5
γ = (0,1,1)     100%     98%      56.25%   32.46%   100%     92.5%    27.37%   12.35%
β = (0,1,1)     (200)    (200)    (192)    (191)    (200)    (200)    (179)    (170)
γ = (0,-1,1)    100%     99.5%    100%     100%     100%     99%      100%     100%
β = (0,1,1)     (200)    (200)    (153)    (136)    (200)    (200)    (191)    (190)

 

Table 3.3

The Rejection Rate

True Model: Truncated Normal Model

Note: Critical value is -2.58

                Sample Size: 1000                   Sample Size: 200
                X ~ N(0,1)        X ~ 2*N(0,1)      X ~ N(0,1)        X ~ 2*N(0,1)
                σ²=.2    σ²=.5    σ²=.2    σ²=.5    σ²=.2    σ²=.5    σ²=.2    σ²=.5
γ = (0,1,1)     100%     100%     100%     100%     93%      87.5%    100%     99%
β = (0,1,1)     (200)    (200)    (200)    (200)    (200)    (200)    (200)    (200)
γ = (0,-1,1)    *        *        *        *        *        *        *        *
β = (0,1,1)

Note: * means the estimation does not converge.

 

 

Chapter 4

CONCLUSION

In the previous three chapters, we explored several different topics for a Tobit-type model
that takes the form
y, = 2,7 +17, (4.1)
y, = max (0, a0y, + 2,6 + u,) (4.2)
where equation (4.1) is the structural equation and equation (4.2) is the Tobit selection

equation. 2,, 2, are the vectors of exogenous variables.

The first two chapters concentrate on consistent estimation of equation (4.1). In
chapter one we derive and apply an alternative method to test and correct for sample selection
bias when estimating what is essentially the Type III Tobit model with endogenous explanatory
variables in the structural equation. We apply a multi-step approach to the
Type III Tobit model, which is traditionally estimated by maximum likelihood.
The multi-step approach is easy to compute and more robust than MLE.
It also provides a simple t-test for sample selection bias. However,
obtaining the asymptotic variance matrices under this multi-step approach
may require considerable computational effort.

We apply our procedure to the NLS data set used by Moffitt (1984). We find
evidence of sample selection bias in our estimation, which
differs from what Moffitt claims. After correcting the bias, the average return to
education goes from 7.77% to 5.94%, which is similar to earlier findings. We can say
that the average return will be overestimated if we do not correct for sample selection bias.
The coefficient on hours also changes from 36% to 1.21% when we add the
constructed variable v̂ to the structural equation. This means that the effect of hours on the wage
will be underestimated if we do not take sample selection bias into account.

In chapter two we focus on the situation where there are both self-selection and
sample selection. We show how to test and correct for self-selection and sample
selection biases at the same time, especially when an explanatory variable,
such as education, interacts with unobservables. As in chapter one, we derive a multi-step
procedure that is easier to compute and requires fewer assumptions than
traditional maximum likelihood estimation. The procedure extends Garen’s (1984)
procedure to handle the sample selection problem in addition to the self-selection problem. We
apply the procedure to two data sets. One is the 1975 PSID data set used by Mroz
(1987) in his study of the sensitivity of female labor supply to various assumptions. The
other is the May 1991 CPS data set compiled by Daniel Hamermesh.

The empirical results for the PSID data set show evidence of self-selection
bias but no strong evidence of sample selection bias. Simple OLS regression
using only women who work cannot produce consistent estimates, and the estimate of
the return to education is biased upward. By contrast, 2SLS can provide consistent estimates
when sample selection bias is not a problem.

In the CPS data set, both self-selection bias and sample selection bias exist. The
method we suggest should be used to correct for both biases together.
The OLS estimate of the return to education is biased downward. This
indicates that the direction of the OLS bias is not certain.


Chapter three shifts the focus to equation (4.2). We discuss possible model
misspecification and explore a model selection test for choosing between two competing
models. The potential drawbacks of the Tobit model in handling corner solution
applications of censored regression models are also discussed. There are two
alternatives to the Tobit model: the lognormal model and Cragg’s truncated normal
model. To pick the one that fits the data better, we apply Vuong’s LR-based test for model
selection.

The simulation results show that Vuong’s model selection test can generally
pick the correct model. However, we find that the test errs more easily
when the lognormal model is the true model. So if the test shows that the lognormal model is
preferred, the chance of error is very small. On the other hand, we still have to be
careful if the test indicates that Cragg’s truncated normal model is preferred.

For further research, the normality assumption for the error term in the selection
equation in chapters one and two could be relaxed. Powell (1984) extends least absolute
deviations (LAD) estimation to regression with a non-negative dependent
variable, and gives conditions under which this estimator is consistent and asymptotically
normal. It would be interesting to explore and apply LAD in our setting. In chapter three,
the model selection test assumes that there are only two competing models.
It would be useful to generalize the procedure to the case of many competing
models.


REFERENCES

Amemiya, T. (1985), Advanced Econometrics. Cambridge: Harvard University Press.

Angrist, J. D., and A. B. Krueger (1991), “Does Compulsory School Attendance Affect
Schooling and Earnings?” Quarterly Journal of Economics 106(4), 979-1014.

Ashenfelter, O., and A. B. Krueger (1992), “Estimates of the Economic Return to
Schooling for a New Sample of Twins,” Princeton University Industrial Relations
Section Working Paper #304.

Butcher, K. F., and A. Case (1993), “The Effect of Sibling Composition on Women’s
Education and Earnings,” Unpublished Discussion Paper, Princeton University
Department of Economics.

Card, D. (1993), “Using Geographic Variation in College Proximity to Estimate the
Return to Schooling,” NBER Working Paper No. 4483.

Garen, J. (1984), “The Returns to Schooling: A Selectivity Bias Approach with a
Continuous Choice Variable,” Econometrica 52, 1199-1218.

Chow, G. (1983), Econometrics. New York: McGraw-Hill.

Cox, D. R. (1961), “Tests of Separate Families of Hypotheses,” Proceedings of the Fourth
Berkeley Symposium on Mathematical Statistics and Probability 1, 105-123.

Cox, D. R. (1962), “Further Results on Tests of Separate Families of Hypotheses,” Journal
of the Royal Statistical Society, Series B 24, 406-424.

Cragg, J. (1971), “Some Statistical Models for Limited Dependent Variables with
Application to the Demand for Durable Goods,” Econometrica 39, 829-844.


Fin, T., and P. Schmidt (1984), “A Test of the Tobit Specification Against an Alternative
Suggested by Cragg,” Review of Economics and Statistics 66, 174-177.

Gronau, R. (1974), “Wage Comparisons: A Selectivity Bias,” Journal of Political
Economy 82, 1119-43.

Hausman, J. (1980), “The Effects of Wages, Taxes, and Fixed Costs on Women’s Labor
Force Participation,” Journal of Public Economics 14, 161-94.

Heckman, J. (1974), “Shadow Prices, Market Wages, and Labor Supply,” Econometrica
42, 679-94.

Heckman, J. (1976), “The Common Structure of Statistical Models of Truncation,
Sample Selection and Limited Dependent Variables and a Simple Estimator for
Such Models,” Annals of Economic and Social Measurement 5, 475-92.

Kane, T. J., and C. E. Rouse (1993), “Labor Market Returns to Two- and Four-Year
Colleges: Is a Credit a Credit and Do Degrees Matter?” Princeton University
Industrial Relations Section Working Paper #311.

Kullback, S., and R. A. Leibler (1951), “On Information and Sufficiency,” Annals of
Mathematical Statistics 22, 79-86.

Larson, D. (1979), “Taxes in a Labor Supply Model with Joint Wage-Hours
Determination: Comment,” Econometrica 47, 1311-13.

Lee, L. F., and R. P. Trost (1978), “Estimation of Some Limited Dependent Variable
Models with Application to Housing Demand,” Journal of Econometrics 8, 357-82.

Lee, L. F. (1978), “Unionism and Wage Rate: A Simultaneous Equation Model with
Qualitative and Limited Dependent Variables,” International Economic Review
19, 415-33.


 

Lewis, H. G. (1974), “Comments on Selectivity Biases in Wage Comparisons,” Journal
of Political Economy 82(6), 1145-55.

Moffitt, R. (1984), “The Estimation of a Joint Wage-Hours Labor Supply Model,” Journal
of Labor Economics 2, 550-566.

Mroz, T. A. (1987), “The Sensitivity of an Empirical Model of Married Women’s Hours
of Work to Economic and Statistical Assumptions,” Econometrica 55, 765-799.

Oi, W. (1962), “Labor as a Quasi-fixed Factor,” Journal of Political Economy 70, 538-55.

Powell, J. L. (1984), “Least Absolute Deviations Estimation for the Censored Regression
Model,” Journal of Econometrics 25, 303-325.

Rosen, H. (1976), “Taxes in a Labor Supply Model with Joint Wage-Hours
Determination,” Econometrica 44, 485-507.

Rosen, S. (1969), “On the Interindustry Wage and Hours Structure,” Journal of Political
Economy 77, 249-73.

Roy, A. D. (1951), “Some Thoughts on the Distribution of Earnings,” Oxford Economic
Papers 3, 135-46.

Vella, F. (1992), “Simple Tests for Sample Selection Bias in Censored and Discrete
Choice Models,” Journal of Applied Econometrics 7, 413-421.

Vella, F. (1993), “A Simple Estimator for Simultaneous Models with Censored
Endogenous Variables,” International Economic Review 34, 441-457.

Vuong, Q. (1989), “Likelihood Ratio Tests for Model Selection and Non-Nested
Hypotheses,” Econometrica 57, 307-333.

White, H., and L. Olson (1979), “Determinants of Wage Change on the Job: A Symmetric
Test of Non-Nested Hypotheses,” mimeo, University of Rochester.


White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a
Direct Test for Heteroskedasticity,” Econometrica 48, 817-838.

White, H. (1982a), “Maximum Likelihood Estimation of Misspecified Models,”
Econometrica 50, 1-25.

White, H. (1982b), “Regularity Conditions for Cox’s Test of Non-Nested Hypotheses,”
Journal of Econometrics 19, 301-318.

White, H. (1982c), “Instrumental Variables Regression with Independent Observations,”
Econometrica 50, 483-500.

Willis, R., and S. Rosen (1979), “Education and Self-Selection,” Journal of Political
Economy 87(5, Part 2), 507-36.

Wooldridge, J. M. (1996), “Selection Corrections with a Censored Selection Variable,”
Working Paper, Michigan State University.

Wooldridge, J. M. (1997), “On Two Stage Least Squares Estimation of the Average
Treatment Effect in a Random Coefficient Model,” Working Paper, Michigan
State University.

Wooldridge, J. M. (1998), “Econometric Analysis of Cross Section and Panel Data,”
Manuscript, Michigan State University.
