This is to certify that the dissertation entitled Essays on Sample Selection, Self Selection and Model Selection presented by Ai-Chi Hsu has been accepted towards fulfillment of the requirements for the Ph.D. degree in Economics.

ESSAYS ON SAMPLE SELECTION, SELF SELECTION AND MODEL SELECTION

By

Ai-Chi Hsu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

1998

ABSTRACT

This dissertation analyzes several issues concerning selection corrections with a censored selection variable, including sample selection bias corrections, self-selection bias corrections, and some model selection issues. Several methods are developed to correct sample and self-selection biases. Model selection for hurdle-type models is also discussed.

Chapter 1 develops a sample selection bias correction procedure when the endogenous explanatory variable appearing in the structural equation is also the variable determining selection. Assuming that the selection variable follows a standard Tobit, a simple two-step estimator is consistent and asymptotically normal. The method is applied to a popular data set from the labor economics literature. Some comparisons are also made across different estimation approaches.

Chapter 2 develops a method to deal simultaneously with self-selection and sample selection problems with a Tobit selection equation and a roughly continuous endogenous explanatory variable, as in Garen (1984). This method is applied to two different data sets from the labor economics literature.

Chapter 3 studies model selection for hurdle models, which are often used as extensions of the Tobit model. Two competing non-nested models are considered. One is Cragg's (1971) truncated normal model, which assumes that the dependent variable follows a normal distribution truncated at zero. The other is the log-normal model, which assumes that the dependent variable follows a log-normal distribution. To select one model over the other, Vuong's (1989) approach is applied to see which model fits the data better. The simulation results show that Vuong's test has reasonable power for choosing the correct model.

Dedicated to My Parents and Brother

ACKNOWLEDGEMENTS

First and foremost, I would like to say 'thank you!' to my dissertation advisor, Professor Jeffrey Wooldridge. Professor Wooldridge shared his professional knowledge and valuable time with me. This dissertation would never have been possible without his help and support. I also thank the other committee members, Professor Robert Lalonde and Professor Stephen Woodbury, for their helpful comments on my thesis, especially chapter 2.

There are some friends and fellow graduate students to whom I also want to say thanks. Thanks to Hui-Wen Shih, Cheng-Ping Cheng, Te-Fen Lo, I-Jung Tsai, and the other graduate students I worked with.

I am grateful to my parents and my brother for their support throughout my graduate study. Thanks to them again.

TABLE OF CONTENTS

LIST OF TABLES
INTRODUCTION
CHAPTER 1
SELECTION CORRECTION WITH ENDOGENOUS EXPLANATORY VARIABLES AND A TOBIT SELECTION EQUATION
    I. Introduction
    II. The Basic Model
    III. Empirical Application
        III.1 Estimating the Structural Wage Equation
        III.2 Estimating the Labor Supply Equation
    IV. An Extension of the Model
    V. Models with Additional Endogenous Explanatory Variables
    VI. Conclusion
CHAPTER 2
SELECTION CORRECTIONS WHEN BOTH SELF-SELECTION BIASES AND SAMPLE SELECTION BIASES ARE PRESENT
    I. Introduction
    II. A Random Coefficient Model
    III. Garen's Model with Sample Selection
    IV. Empirical Examples
    V. The Consistency of Procedure 2.2
    VI. Conclusion
CHAPTER 3
MODEL SELECTION TESTS FOR TWO-PART MODELS
    I. Introduction
    II. Basic Framework
    III. Empirical Result
        (i) The Simulation Results
        (ii) The real Data Sets Results
    IV. Conclusion
CHAPTER 4
CONCLUSION
LIST OF REFERENCES

LIST OF TABLES

Chapter 1 SELECTION CORRECTION WITH ENDOGENOUS EXPLANATORY VARIABLES AND A TOBIT SELECTION EQUATION
    Table 1.1: The Descriptive Statistic of NLS Data Set
    Table 1.2: The Estimation of Different Specifications
    Table 1.3: Two step and Moffitt's MLE estimation
    Table 1.4: Tobit Estimation for the Labor Supply Equation
    Table 1.5: Two Step Estimation with assumption relaxed

Chapter 2 SELECTION CORRECTIONS WHEN BOTH SELF-SELECTION BIASES AND SAMPLE SELECTION BIASES ARE PRESENT
    Table 2.1: The Descriptive Statistics for PSID Data Set
    Table 2.2: The Estimation of Different Specifications (PSID Data Set)
    Table 2.3: Joint tests for self-selection and Sample selection biases (PSID data set)
    Table 2.4: Descriptive Statistics for CPS Data Set
    Table 2.5: The Estimation of Different Specifications (CPS Data Set)
    Table 2.6: Joint tests for self-selection and Sample selection biases (CPS data set)
    Table 2.7: The Estimation of Different Specifications (PSID Data Set without Parents' Education as Instruments)

Chapter 3 ALTERNATIVE MODELS SELECTION FOR TOBIT SPECIFICATION
    Table 3.1: Combinations of the Simulation
    Table 3.2: The Rejection Rate (True Model: Log Normal Model)
    Table 3.3: The Rejection Rate (True Model: Truncate Normal Model)

INTRODUCTION

This dissertation analyzes several different topics for selection corrections with a censored selection variable, including sample selection bias corrections, self-selection bias corrections, and some model selection issues.

It is standard in econometrics to assume that a random sample is available from the underlying population of interest. However, this is not always the case. Sometimes, because of the way economic data are collected and the economic behavior of the individuals being sampled, a nonrandom sample is generated. By "sample selection" I mean cases where certain variables cannot be observed for a subset of the population. Thus, sample selection concerns data availability. Self-selection refers to any situation in which one or more explanatory variables are correlated with unobservable factors affecting the outcome equation. An example is education in a wage equation, where education is correlated with unobserved ability that also affects the wage.

The distinction between self-selection and sample selection is not always sharp. For example, people can self-select into the workforce, which leads to a data observability problem (we do not observe the wage offer for those out of the workforce). We will treat this as a sample selection problem. The labels we give to these econometric problems are not crucial, but it is useful to have a way of distinguishing endogeneity of explanatory variables from missing data problems.

In this thesis I propose methods to correct for possible sample selection biases. I study the case when the variable determining selection follows a reduced form Tobit model. Usually, the sample selection issue arises in the context of binary selection information: either we observe the data or we do not. But in some cases more information is available. A leading example is estimating a wage offer equation for working age adults.
Beyond whether a person works or not, data on working hours are also available. It is possible to improve the estimation using this extra information.

Heckman's (1976) method for correcting potential sample selection biases is widely used in empirical studies. However, the method developed in this thesis has two advantages over Heckman's method: identification is easier, and endogenous explanatory variables are easier to handle.

In chapter 1 I develop a sample selection bias correction procedure for the case when an endogenous variable, in particular the Tobit selection variable, is an explanatory variable in the structural equation. The procedure can be used in a variety of studies. For example, it can be used to estimate a wage offer equation for a certain group of people (such as those over age 25), which we use as an application in chapter 1. Working hours is the selection variable in this case. Working hours also appears as an explanatory variable in the structural equation. There is a self-selection component to this example: people self-select into employment, so whether we observe the wage offer depends on the individual's decision. As we said, whether we call an example like this sample selection or self-selection is not very important. However, a typical self-selection issue in economics has some features that distinguish it from the data observability problem.

Chapter 2 continues the reasoning of chapter 1 and develops a procedure that can handle both self-selection and sample selection problems at the same time, yielding consistent estimates of the parameters in a structural equation. The problem of self-selection often occurs when evaluating programs such as job training or welfare, or the returns to education, using nonexperimental data. Generally, the issue is that individuals decide whether to participate in a program or to receive some treatment.
If the participation decision is related to factors that affect the outcome variable, ignoring the determinants of this decision leads to biased estimates of program impacts. We consider the case in which the endogenous explanatory variable is allowed to interact with an unobservable. The reasoning behind the procedure is similar to that of Garen's (1984) procedure, which he used to estimate the return to education when education and unobserved ability interact. However, we extend Garen's work by also accounting for sample selection bias.

Hurdle models are more flexible alternatives to the Tobit model. Chapter 3 studies model selection for hurdle, or two-tier, models. Two competing non-nested models are considered. One is Cragg's truncated normal model; the other is a two-part log-normal model. For choosing between non-nested models, two very different approaches have been proposed. The earliest is based on the work of Cox (1961, 1962), who derived specification tests that use information about a specific alternative and test whether the null can predict the performance of the alternative. This approach assumes that one of the competing models is the true model and tests the hypothesis based on this assumption. The other approach is what Vuong (1989) terms a "model selection approach." Vuong bases tests of non-nested alternatives on an estimate of the Kullback-Leibler (1951) Information Criterion (KLIC), which measures the distance between each model's distribution and the true distribution. In contrast to Cox's approach, Vuong's approach does not assume that one of the models is correct under $H_0$. The null hypothesis is that the models fit the data equally well, and the alternative is that one model fits better. If a model is correctly specified then, asymptotically, it produces the best fit. I apply Vuong's (1989) approach to see which hurdle model fits the data better.
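
The comparison just described can be illustrated with a minimal sketch, assuming each competing model has already been fit by maximum likelihood and its per-observation log-likelihood contributions are available. The function name `vuong_statistic` and the small input arrays are hypothetical and serve only as illustration; the optional correction for differing numbers of parameters is omitted. Under the null that both models fit equally well, the statistic is compared with standard normal critical values.

```python
import math
import numpy as np


def vuong_statistic(loglik_f, loglik_g):
    """Vuong-type statistic for two non-nested models.

    loglik_f, loglik_g: per-observation log-likelihoods of models F and G,
    each evaluated at its own maximum likelihood estimates.
    Returns (z, two-sided p-value from the standard normal distribution).
    """
    d = np.asarray(loglik_f, dtype=float) - np.asarray(loglik_g, dtype=float)
    n = d.size
    omega = d.std()                       # standard deviation of the differences (1/n form)
    z = math.sqrt(n) * d.mean() / omega   # large positive z favors F, large negative z favors G
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical pointwise log-likelihoods, for illustration only.
ll_f = np.array([-1.0, -2.0, -1.5, -1.0])
ll_g = ll_f - np.array([0.5, 0.3, 0.4, 0.6])
z, p = vuong_statistic(ll_f, ll_g)
```

A value of `z` above the chosen critical value would point toward model F, a value below its negative toward model G, and anything in between means the two models cannot be distinguished at that level.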

A small simulation study applies the test to data sets generated from both the truncated normal distribution and the log-normal distribution to see whether Vuong's procedure has power for picking the true model. I also apply the test to three labor data sets.

Chapter 4 contains concluding comments and directions for future research.

Chapter 1

SELECTION CORRECTION WITH ENDOGENOUS EXPLANATORY VARIABLES AND A TOBIT SELECTION EQUATION

I. Introduction

The problem of nonrandom sampling occurs frequently in econometrics. Sample selection problems can arise when a random sample is not available from the underlying population of interest. Because of the way economic data are collected, sometimes we can obtain only a nonrandom sample from the population. If the sample does not represent the population of interest, sample selection biases may arise.

As an example, suppose we are interested in estimating a wage offer equation for women over age 30. By definition, this equation is supposed to represent all women over age 30, whether or not a woman is actually working at the time of the survey. Since we can only observe the wage offer for working women, we actually select our sample on this basis. The sample selection problem arises because data on a key variable, wage, are available only for a clearly defined subset of the population. This is an example of incidental truncation (of the wage offer, in this example), because the wage is missing as a result of the outcome of labor force participation.

One can easily get confused about the distinction between sample selection and self-selection. Consider the above example without the incidental truncation problem. The population is working women, so we do not have a sample selection problem. However, education level might be correlated with unobserved characteristics, such as "ability," that also have a direct effect on the wage. In other words, people 'self-select' their education levels.
This makes the self-selection problem and the sample selection problem different in nature.

In general, the terms "sample selection" and "self-selection" are used to distinguish the nature of the selection problems. Sample selection refers to the data observability problem that arises with selected samples; self-selection refers to the endogeneity of explanatory variables, that is, explanatory variables that are correlated with the error term in the structural equation. There is no strict separation between these problems. Self-selection could be the cause of the sample selection problem. As in our wage offer example, we do not observe the wage offer for people who do not work, so it is a sample selection problem. However, one can also say that whether people work or not comes from their "self-selection."

It is fairly well known that in a linear model with endogenous explanatory variables, if the selection rule is based on exogenous variables, estimation of the population model by two stage least squares (2SLS hereafter) using the selected sample is consistent and asymptotically normal (see, for example, Wooldridge (1996)). If the selection rule is based on endogenous variables, applying 2SLS to estimate the population model using the selected sample will not generally be consistent. Therefore, it is important to have methods for testing and correcting for sample selection with endogenous explanatory variables.

The purpose of this chapter is to derive and apply an alternative method to test and correct for sample selection bias while estimating what is essentially the Type III Tobit model¹ with endogenous explanatory variables, concentrating on the case where the selection variable appears in the structural equation. This makes our model essentially similar to that of Nelson and Olson (1978), and it becomes a special case of the Type IV Tobit model, which is an extension of the Type III Tobit. However, we make fewer assumptions and obtain simple two-step estimators.

A standard Type III Tobit model consists of two equations. One is the structural equation, whose coefficients we want to estimate consistently. The other is the selection equation, which takes the form of a Tobit equation. Unlike the Type II Tobit model, whose selection equation involves only a binary variable (a probit equation), the Type III Tobit model carries more information in the selection equation.

Heckman (1976) is the pioneer of the Type II Tobit model. Heckman's two-step method, which corrects sample selection bias in his labor supply model by including the inverse Mills ratio in the structural equation, has been widely used. The procedure was further extended to a wide class of models by Lee (1978). The drawback of Heckman's procedure is that, if there is not a good exclusion restriction in the structural model, the estimators can be very imprecise in finite samples. Full maximum likelihood methods or semi-parametric methods could be applied to Type II Tobit model estimation. However, as discussed by Wooldridge (1996), none of them is fully satisfactory. If more information is available in the selection equation, as in the Type III Tobit form, the additional information may be used to get more precise estimators.

The traditional method for estimating Type III and IV Tobit models is maximum likelihood, which makes full parametric assumptions; see Amemiya (1985, section 10.8).

¹ See Amemiya (1985, section 10.3).

To detect whether sample selection bias exists, Vella (1992) extended the testing procedures proposed by Heckman (1979) and Vella (1993) to construct a t-test on a constructed variable in an auxiliary equation. However, Vella did not prove the consistency of the estimators of the structural equation parameters under his method. In the case of a Tobit selection equation, Wooldridge (1996) showed that adding the Tobit residuals can produce consistent estimators in the structural equation.
Wooldridge relaxed the basic Type III Tobit model assumption, needed by MLE, that the error terms in the structural equation and the selection equation are bivariate normal. Instead, he proposed the weaker assumption that only the error term in the selection equation is normally distributed and that the two error terms have a linear relationship. Under these modifications of the Type III Tobit model, he proposed a multi-step procedure to estimate the model. At the same time, he also proved that the testing procedure proposed by Vella is not only a test but also a correction for sample selection bias in the cross section linear model case. In other words, he proved the consistency of the estimators of the structural equation parameters after the constructed variable is added as a regressor. In contrast to the computational burden of MLE, the multi-step procedure is much easier to implement, even for models with endogenous explanatory variables (the case we discuss in this chapter). It also requires weaker assumptions than MLE. In other words, it is more robust. Furthermore, it provides a simple test for sample selection bias and corrects the problem at the same time if biases do exist.

In this chapter, the basic approach is a simple two-step method that extends a method proposed by Wooldridge (1996). Wooldridge covers the case of the Type III Tobit model. I focus on the Type III Tobit model with endogenous explanatory variables, especially the case where the dependent variable of the selection equation itself is an explanatory variable in the structural equation (a special case of the Type IV Tobit).
To estimate our model consistently, we adjust the Type III Tobit model by putting the dependent variable of the selection equation on the right hand side of the structural equation as an explanatory variable (a special case of the Type IV Tobit), and we correct for the sample selection bias and the endogeneity of the selection variable at the same time. We also extend the two-step procedure to a three-step procedure in order to estimate the selection equation consistently.

We use the same data set as Moffitt (1984), which contains information on married women in the 1972 wave of the National Longitudinal Survey of Older Women. Moffitt used a maximum likelihood method to estimate the labor supply function in his paper. Compared with the method he used, the method we use here is easier to compute and requires fewer prior assumptions, which means the estimation procedure we use is more robust.

The results of the estimation show that (1) there is sample selection bias that cannot be ignored; (2) the effect of hours on log(wage) appears to be linear; (3) the return to education appears to be overestimated when the sample selection problem is not accounted for; (4) the wage effect is positive and significant in the labor supply equation; and (5) the wage elasticity of labor supply is smaller when we condition on working hours being positive than when we do not.

The basic econometric model is in section II. The empirical results and the comparisons are presented in section III. In section IV we relax the assumption of a linear relationship between the error terms in the structural equation and the selection equation. In section V we extend the basic model to allow additional endogenous explanatory variables in the structural equation. Section VI is the conclusion.

II. The Basic Model

Consider the following structural model:

$$y_1 = \beta y_2^* + z_1\gamma + u_1 \qquad (1.1)$$

$$y_2^* = \alpha_0 y_1 + z_2\delta + u_2 \qquad (1.2)$$

$$y_2 = \max(0, y_2^*) \qquad (1.3)$$

where $y_1, y_2$ are endogenous variables and $y_1$ is observed only when $y_2 > 0$; $z_1, z_2$ are exogenous variables and are always observed; $z_1$ is a $1 \times K_1$ vector with $z_{11} \equiv 1$ and $z_2$ is a $1 \times K_2$ vector with $z_{21} \equiv 1$; $u_1, u_2$ are structural errors; and $y_2^*$ coincides with $y_2$ whenever $y_2 > 0$.

This model belongs to the category of Tobit selection models with endogenous explanatory variables. The basic framework is similar to Nelson and Olson (1978). However, Nelson and Olson assume that $y_1$ is always observed, so there is no sample selection problem, which is different from our specification. We use $y_2^*$ instead of $y_2$ as the explanatory variable in (1.1) so that the reduced form of equation (1.2) can be derived. The reduced form of equation (1.2) can be written as in (1.4), so the general model becomes (1.1), (1.3), and

$$y_2^* = z\pi_2 + v_2 \qquad (1.4)$$

We can combine (1.3) and (1.4) to obtain a reduced form selection equation:

$$y_2 = \max(0, z\pi_2 + v_2) \qquad (1.5)$$

The $1 \times K$ vector $z$ contains the nonredundant elements of $z_1$ and $z_2$. This model is similar to the Type III Tobit model except that we include $y_2^*$ in (1.1) as an explanatory variable.

We first consider a general linear model where selection is determined by the instrumental variables. The population model of interest is

$$y = \beta_1 + \beta_2 x_2 + \dots + \beta_k x_k + u = x\beta + u,$$

where $x_1 \equiv 1$. One or more of the elements of $x$ can be correlated with $u$. We also assume the availability of a $1 \times L$ vector $z$, $L \ge k$, such that $E(u \mid z) = 0$. Naturally, any exogenous elements in $x$ are included in $z$. Under the rank condition for identification, $\operatorname{rank} E(z'x) = k$, $\beta$ could be consistently estimated by two stage least squares if we had a random sample from the population. However, if we can observe data on $(x, y, z)$ for only a subset of the population, consistency depends on the nature of the selection rule.

Let $s$ be a binary selection indicator such that $s = 1$ if $(x, y, z)$ is observed and $s = 0$ otherwise. Assume that $s = h(z)$ for some known function $h(\cdot)$, so that selection is based entirely on the exogenous instrumental variables. Let $(x_i, y_i, z_i)$, $i = 1, 2, \dots, N$, be a random sample from the population, and let $s_i = h(z_i)$. Then the 2SLS estimator using the selected sample can be written as

$$\hat\beta = \left[\left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)'\left(N^{-1}\sum_{i=1}^{N} s_i z_i' z_i\right)^{-1}\left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)\right]^{-1}\left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)'\left(N^{-1}\sum_{i=1}^{N} s_i z_i' z_i\right)^{-1}\left(N^{-1}\sum_{i=1}^{N} s_i z_i' y_i\right).$$

Substituting $y_i = x_i\beta + u_i$ gives

$$\hat\beta = \beta + \left[\left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)'\left(N^{-1}\sum_{i=1}^{N} s_i z_i' z_i\right)^{-1}\left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)\right]^{-1}\left(N^{-1}\sum_{i=1}^{N} s_i z_i' x_i\right)'\left(N^{-1}\sum_{i=1}^{N} s_i z_i' z_i\right)^{-1}\left(N^{-1}\sum_{i=1}^{N} s_i z_i' u_i\right).$$

Since we assume $E(u \mid z) = 0$ and $s_i z_i$ is a function of $z_i$, $E(s_i z_i' u_i) = 0$ by iterated expectations. By the law of large numbers, $\operatorname{plim} \hat\beta = \beta$. More details are given in Wooldridge (1996).

In our setting, selection is determined by a censored selection variable. Let $v$ be a zero mean random variable such that $(z, v)$ determines the sample selection: $s = h(z, v)$, and $(x, y, z, v)$ is observed when $s = 1$. Assuming that $(u, v)$ is independent of $z$ and that $E(u \mid v) = \xi v$, we can write

$$y = x\beta + \xi v + e, \qquad (1.6)$$

where $E(e \mid z, v) = 0$. If we use 2SLS on the selected sample to estimate (1.1), we will not get a consistent estimator of $\beta$. But we can estimate (1.6) by 2SLS using instruments $(z, v)$. We can apply this result to the model (1.1) and (1.5) to estimate $\beta$ and $\gamma$.

To test and correct for possible sample selection bias when estimating (1.1), we need a few assumptions:

Assumptions 1.1
(i) $(z, y_2)$ is always observed, but $y_1$ is observed only when $y_2 > 0$.
(ii) $(u_1, v_2)$ is zero mean and independent of $z$.
(iii) $v_2 \sim \text{Normal}(0, \tau_2^2)$.
(iv) $E(u_1 \mid v_2) = \xi v_2$.

An important assumption in the Type III Tobit model is bivariate normality, which allows the model to be estimated by MLE. Here, however, we need only Assumption 1.1 (iv) instead of the bivariate normality of $(u_1, v_2)$. This allows a fairly broad range of distributions for $u_1$.

Under Assumptions 1.1 (ii) and (iv) we can write

$$u_1 = \xi v_2 + e \qquad (1.7)$$

where $e$ is a disturbance such that $E(e \mid z, v_2) = 0$. Plugging (1.7) into (1.1) gives

$$y_1 = \beta y_2^* + z_1\gamma + \xi v_2 + e \qquad (1.8)$$

Here the test of $H_0: \xi = 0$ is a valid test for the sample selection problem, as it tests whether $u_1$ and $v_2$ are uncorrelated. If $H_0$ is not rejected, $y_2^*$ is exogenous in the structural equation (1.1) and selection is based on exogenous variables (since $z$ is exogenous). Thus, OLS on the selected sample works in that case. On the other hand, $y_2^*$ is endogenous in the structural equation (1.1) if $\xi \ne 0$. However, since $y_2^*$ depends on $(z, v_2)$, it is exogenous in equation (1.8). In fact,

$$E(y_1 \mid z, v_2) = \beta y_2^* + z_1\gamma + \xi v_2 \qquad (1.9)$$

Further, the expectation is the same if we also condition on $v_2 > -z\pi_2$. From Wooldridge (1996), Theorem 2.1, it follows that OLS estimation of (1.9) using the selected sample will be consistent (take $x = (y_2^*, z_1, v_2)$). Because $v_2$ is not observed, it must be estimated. This leads to the following procedure.

PROCEDURE 1.1:
(i) Estimate $y_{i2} = \max(0, z_i\pi_2 + v_{i2})$, $v_{i2} \mid z_i \sim \text{Normal}(0, \tau_2^2)$, by standard Tobit MLE using all $N$ observations.
(ii) Obtain the residuals from step (i) as $\hat v_{i2}$ for $y_{i2} > 0$ by defining $\hat v_{i2} = y_{i2} - z_i\hat\pi_2$, $i = 1, 2, \dots, N_1$.
(iii) Regress $y_{i1}$ on $y_{i2}$, $z_{i1}$, and $\hat v_{i2}$, $i = 1, 2, \dots, N_1$.

Since we have consistent first-stage estimates, replacing $v_2$ with its estimate does not affect consistency in the second stage. The correction of standard errors for the generated regressor problem is discussed in Wooldridge (1996). Wooldridge's formula can be applied here because $\hat v_{i2}$ is the only generated regressor.

It should be noticed that there are two separate features of our model. First, there is an endogeneity problem for $y_2^*$ as an explanatory variable in the structural equation. Second, there is a sample selection problem because we can only observe $y_1$ when $y_2 > 0$. By including $\hat v_2$ as an additional regressor in the structural equation, the two problems are solved at the same time. The endogeneity of $y_2$ actually is the source of the sample selection problem in our case.
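
The three steps of Procedure 1.1 can be sketched in code. What follows is a minimal illustration under stated assumptions, not the implementation used for the empirical results: the function names (`tobit_negloglik`, `two_step`, `simulate`), the simulated design, and all parameter values are hypothetical, and the standard errors of the second step are not corrected for the generated regressor. The sketch assumes that `Z` contains at least one variable excluded from `Z1`, since otherwise the second-step regressors are perfectly collinear.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def tobit_negloglik(params, y, Z):
    """Negative log-likelihood of a standard Tobit censored at zero."""
    pi, log_tau = params[:-1], params[-1]
    tau = np.exp(log_tau)                 # keeps the scale parameter positive
    index = Z @ pi
    pos = y > 0
    ll_pos = norm.logpdf((y[pos] - index[pos]) / tau) - log_tau
    ll_zero = norm.logcdf(-index[~pos] / tau)
    return -(ll_pos.sum() + ll_zero.sum())


def two_step(y1, y2, Z, Z1):
    """Step (i): Tobit MLE; step (ii): residuals for y2 > 0; step (iii): OLS."""
    start = np.append(np.linalg.lstsq(Z, y2, rcond=None)[0], np.log(y2.std()))
    fit = minimize(tobit_negloglik, start, args=(y2, Z), method="BFGS")
    pi_hat = fit.x[:-1]
    sel = y2 > 0                          # selected sample: y1 is observed here
    v_hat = y2[sel] - Z[sel] @ pi_hat     # Tobit residuals
    X = np.column_stack([y2[sel], Z1[sel], v_hat])
    coef = np.linalg.lstsq(X, y1[sel], rcond=None)[0]
    return coef, pi_hat                   # coef = (beta, gamma..., xi)


def simulate(n, seed=0):
    """Hypothetical design: beta = 0.5, gamma = (1.0, 0.3), xi = 0.8."""
    rng = np.random.default_rng(seed)
    a, b = rng.normal(size=n), rng.normal(size=n)
    Z = np.column_stack([np.ones(n), a, b])   # b is excluded from Z1
    Z1 = Z[:, :2]
    v2 = rng.normal(size=n)
    y2_star = Z @ np.array([0.5, 1.0, 1.0]) + v2
    y2 = np.maximum(0.0, y2_star)
    u1 = 0.8 * v2 + 0.5 * rng.normal(size=n)  # E(u1 | v2) = 0.8 * v2
    y1 = 0.5 * y2_star + Z1 @ np.array([1.0, 0.3]) + u1
    y1 = np.where(y2 > 0, y1, np.nan)         # y1 unobserved when y2 = 0
    return y1, y2, Z, Z1


y1, y2, Z, Z1 = simulate(4000)
coef, pi_hat = two_step(y1, y2, Z, Z1)
```

In this simulated design, the last element of `coef` estimates $\xi$; an estimate far from zero relative to its (properly corrected) standard error would signal selection bias and endogeneity of the selection variable.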

Note that if $y_2$ is exogenous, then $u_1$ and $v_2$ are uncorrelated, and the sample selection problem does not arise.

In addition to $\beta$ and $\gamma$, we may want to estimate $\alpha_0$ and $\delta$. Let $\hat y_{i1} \equiv \hat\beta \hat y_{i2} + z_{i1}\hat\gamma$ for all $N$ observations, where $\hat y_{i2} = z_i\hat\pi_2$ is the predicted value from the Tobit model in step (i) of Procedure 1.1. Estimate the model

$$y_{i2} = \max(0, \alpha_0 \hat y_{i1} + z_{i2}\delta + error_{i2}) \qquad (1.10)$$

by standard Tobit. Then we obtain consistent estimators for $\alpha_0$ and $\delta$ as well.

Since the relationship between the explanatory variables and the dependent variable is nonlinear, interest lies in $E(y_{i2} \mid \hat y_{i1}, z_{i2})$ and $E(y_{i2} \mid \hat y_{i1}, z_{i2}, y_{i2} > 0)$. If we let $x\theta \equiv \alpha_0 \hat y_{i1} + z_{i2}\delta$, we can apply the formulas provided by Wooldridge (1998) directly. We briefly outline the reasoning behind the formulas below.

To obtain $E(y_{i2} \mid x)$, first use the law of iterated expectations:

$$E(y_{i2} \mid x) = P(y_{i2} = 0 \mid x) \cdot 0 + P(y_{i2} > 0 \mid x) \cdot E(y_{i2} \mid x, y_{i2} > 0) = P(y_{i2} > 0 \mid x) \cdot E(y_{i2} \mid x, y_{i2} > 0) \qquad (1.11)$$

The first term, $P(y_{i2} > 0 \mid x)$, is easy to derive:

$$P(y_{i2} > 0 \mid x) = P(u_2 > -x\theta \mid x) = P\!\left(\frac{u_2}{\sigma} > -\frac{x\theta}{\sigma}\right) = \Phi(x\theta/\sigma) \qquad (1.12)$$

We can also derive that

$$E(y_{i2} \mid x, y_{i2} > 0) = x\theta + E(u_2 \mid u_2 > -x\theta) = x\theta + \sigma\frac{\phi(x\theta/\sigma)}{\Phi(x\theta/\sigma)}.$$

So we can conclude that

$$E(y_{i2} \mid x, y_{i2} > 0) = x\theta + \sigma\frac{\phi(x\theta/\sigma)}{\Phi(x\theta/\sigma)} \qquad (1.13)$$

$$E(y_{i2} \mid x) = \Phi(x\theta/\sigma)\, x\theta + \sigma\phi(x\theta/\sigma) \qquad (1.14)$$

where $\phi(\cdot)$ is the standard normal pdf and $\Phi(\cdot)$ is the standard normal cdf.

The partial effects of variable $x_j$ on (1.13) and (1.14) are:

$$\frac{\partial E(y_{i2} \mid x, y_{i2} > 0)}{\partial x_j} = \theta_j + \theta_j \lambda'(x\theta/\sigma) = \theta_j\left[1 - \lambda(x\theta/\sigma)\{x\theta/\sigma + \lambda(x\theta/\sigma)\}\right] \qquad (1.15)$$

where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$, and

$$\frac{\partial E(y_{i2} \mid x)}{\partial x_j} = \frac{\partial P(y_{i2} > 0 \mid x)}{\partial x_j} \cdot E(y_{i2} \mid x, y_{i2} > 0) + P(y_{i2} > 0 \mid x) \cdot \frac{\partial E(y_{i2} \mid x, y_{i2} > 0)}{\partial x_j}.$$

Since we already know that $P(y_{i2} > 0 \mid x) = \Phi(x\theta/\sigma)$ and $\partial P(y_{i2} > 0 \mid x)/\partial x_j = (\theta_j/\sigma)\phi(x\theta/\sigma)$, we can derive that

$$\frac{\partial E(y_{i2} \mid x)}{\partial x_j} = \frac{\theta_j}{\sigma}\phi(x\theta/\sigma)\, E(y_{i2} \mid x, y_{i2} > 0) + \Phi(x\theta/\sigma)\, \frac{\partial E(y_{i2} \mid x, y_{i2} > 0)}{\partial x_j} \qquad (1.16)$$

The elasticity of $y_{i2}$ with respect to $x_j$, conditional on $y_{i2} > 0$, is

$$\frac{\partial E(y_{i2} \mid x, y_{i2} > 0)}{\partial x_j} \cdot \frac{x_j}{E(y_{i2} \mid x, y_{i2} > 0)} \qquad (1.17)$$

and the elasticity of $y_{i2}$ with respect to $x_j$, without conditioning on $y_{i2} > 0$, is

$$\frac{\partial E(y_{i2} \mid x)}{\partial x_j} \cdot \frac{x_j}{E(y_{i2} \mid x)} \qquad (1.18)$$

In section III we apply the three-step method derived so far to estimate the wage equation and the corresponding labor supply equation. We also estimate the wage elasticity of labor supply.

We can use the same argument to put $y_2^{*2}$ in the structural equation of our model. The sample selection corrected version of the structural equation becomes:

$$y_1 = \beta y_2^* + \eta y_2^{*2} + z_1\gamma + \xi v_2 + e \qquad (1.19)$$

where $y_2^{*2}$ in this equation is an exogenous explanatory variable. However, except when $\eta = 0$, (1.19), (1.2), and (1.3) do not lead to a reduced form selection equation like (1.5). But we can use (1.19) to test $H_0: \eta = 0$; that is, we can test for linearity in the relationship between $y_1$ and $y_2^*$.

III. Empirical Application

III.1. Estimating the Structural Wage Equation

We now apply the previous methods to consistently estimate a wage offer equation. The structural equations are:

$$\log(wage) = \beta_0 + \beta_1 hours^* + z_1\gamma + u_1 \qquad (1.20)$$

$$hours = \max(0, \alpha_0 \log(wage) + z_2\delta + u_2) \qquad (1.21)$$

where $hours^*$ are "desired" hours, observed only when $hours > 0$, and $z_1$ and $z_2$ are vectors of exogenous variables as defined in section II. In the notation of section II, take $y_1 = \log(wage)$ and $y_2^* = hours^*$. We observe wage only when $hours > 0$. We also maintain Assumptions 1.1. We can derive the reduced form of (1.21) as

$$hours = \max(0, z\pi_2 + v_2). \qquad (1.22)$$

There are two reasons we include $hours^*$ as an explanatory variable in the wage equation, in contrast to the traditional specification that omits it. First, in theory labor is a quasi-fixed input that generates quasi-fixed costs to the firm, which means that a lower wage will be paid at low hours of work than at high hours of work; see Oi (1962). There have been few theoretical or empirical studies of the effect of hours worked on wage offers. Rosen (1969) is an example of an early study. Rosen (1976), Larson (1979), and Hausman (1981) are also examples.
Second, Moffitt (1984) argues that working hours is an important explanatory variable that should be included in the wage equation. We apply the current methods to Moffitt's data set to see if this is the case.

From Assumptions 1.1 in section II, we know that:

$$ E(\log(wage) \mid z, v_2) = \beta_0 + \beta_1\,hours^* + z_1\gamma + \xi v_2 \tag{1.23} $$

Adding the term $\xi v_2$ serves not only to test and correct for sample selection bias but also to test and correct for the endogeneity of hours*. Consistent estimators of $\beta_1$ and $\gamma$ can be obtained by the following procedure:

(a) Estimate (1.22) by standard Tobit using all $N$ observations. Save the residuals for hours > 0 as $\hat v_2$, defined by $\hat v_2 = hours - z\hat\pi_2$ for hours > 0.

(b) Using the observations with hours > 0, regress log(wage) on 1, hours*, $z_1$, and $\hat v_2$.

Here we assume that $Var(u_1 \mid v_2)$ is constant so that the usual t statistic is valid, which gives a convenient t-test on $\hat\xi$ for sample selection bias/endogeneity of hours* after the regression in step (b). If it is significant, the regression in step (b) has also corrected the problem. The White (1980) heteroskedasticity-robust t-statistic can be used if $Var(u_1 \mid v_2)$ is not constant.

We estimate the model using the same data set used by Moffitt (1984). The data set contains a cross section of women drawn from the 1972 wave of the National Longitudinal Survey (NLS) of Older Women. It includes all women in this wave who have valid data for hours of work last week, the hourly wage rate, the flow of asset income, and related characteristics. The variables included in the $z_1$ vector are race, age, years of schooling, and three area variables: the size of the local labor force and the employment fractions in manufacturing and in government in the census region of residence. The $z_2$ vector includes dummies for marital status, age, race, the number of family members, the number of children in the household who are less than 6, and the number of children in the household who are greater than 6. Both $z_1$ and $z_2$ are treated as exogenous explanatory variables.
See Table 1.1 for the descriptive statistics. In Table 1.1 the percentage of women working is 48.69%. Eighty percent of the women are married. The average age in the sample is 44.37 and the average education is 10.32 years.

To compare ordinary least squares estimation with our two step procedure, we also estimate the structural equation by ordinary least squares on the selected sample. This corresponds to simply dropping $\hat v_2$ as an explanatory variable. We also want to see whether hours*² has any importance for determining the wage offer, by adding hours*² as an additional explanatory variable in the wage equation. The results are shown in Table 1.2.

In Table 1.2, the t-test for $\hat v_2$ is significant in column (2), which indicates that sample selection bias is present. Comparing the coefficient on educ in columns (1) and (2), that is, without and with $\hat v_2$ in the structural equation, the coefficient decreases from .0777 to .0594. The return to education appears to be overestimated if the sample selection bias is not corrected. The coefficient on hours* also changes from .0036 to .0121 once we add the constructed variable $\hat v_2$ to the structural equation. This means that if we treat hours* as an exogenous variable, the return to working hours is apparently underestimated. In column (3) we add hours*² as an additional regressor in our two step procedure to see if it is important for determining the wage, as found by Moffitt. However, it is not significant in our estimation.

Since Moffitt used wage instead of log(wage) as the dependent variable in the wage equation, we modify our structural equation to use wage as the dependent variable in order to make a direct comparison with Moffitt's maximum likelihood results. The results are in Table 1.3. In Table 1.3, column (1) is our two step procedure using wage as the dependent variable.
The t-statistic for $\hat v_2$ is still significant, as in Table 1.2, which indicates a sample selection problem. The effect of education in this case is 16.06 cents. We also find that the estimated coefficient on hours* is .0243 and is significant. In column (2) we add hours*² as an additional regressor in the two step procedure. Just as in Table 1.2, we do not find that the coefficient on hours*² is significant. The estimated coefficient on hours* is also not significant in column (2) after we add hours*². This may come from the multicollinearity between hours* and hours*².

Since Moffitt did not account for the sample selection problem in his study, we put the OLS estimates of the wage equation without $\hat v_2$ in column (3) to compare with Moffitt's estimates in column (4). We find that the estimation results are quite similar except for the coefficients on hours* and hours*². The effects of education are 20 cents in both cases, compared with around 16 cents if the sample selection/endogeneity problem is taken care of.

Returning to Table 1.2 for our original setting, we find, just as Moffitt did, a significant effect of hours of work on the wage. After correcting for sample selection bias, the estimated effect is larger. Moffitt (1984) found a quadratic relationship between wage and hours. As discussed earlier, we can use the same framework to test for hours*² by adding it to the structural equation.

Although we put hours* on the right hand side of the wage equation, the coefficient on $\hat v_2$, which reflects the correlation between $u_1$ and $v_2$, is still significant. This is different from what Moffitt found. In our test, the sample selection bias does exist. Also, Moffitt found that hours*² has a significant negative effect on wage. After putting hours*² on the right hand side of the structural equation, we do not find that effect to be significant.
In Table 1.3, the t-statistic for hours*² is -.25, which is far from significant. Thus, while the point estimate suggests a decreasing return to hours, as Moffitt found, the effect is not significant.

III.2 Estimating the Labor Supply Equation

After estimating the structural wage equation, we can also estimate the labor supply equation. Since we did not find hours*² to be significant, we include only hours* in the structural equation as an explanatory variable, rather than both hours* and hours*². The procedure is the one derived in section II. We report the Tobit estimates of the labor supply equation in Table 1.4.

To derive the wage elasticity of labor supply, let $\alpha_0 \log(wage) + z_2\delta = x\theta$. According to equations (1.13) and (1.14), the partial effects of wage on labor supply with and without the condition hours > 0 are

$$ \frac{\partial E(hours \mid \log(wage), z_2;\ hours > 0)}{\partial wage} = \frac{\alpha_0}{wage}\left[1 - \lambda(x\theta/\sigma)\cdot\big(x\theta/\sigma + \lambda(x\theta/\sigma)\big)\right] \tag{1.24} $$

$$ \frac{\partial E(hours \mid \log(wage), z_2)}{\partial wage} = \frac{\alpha_0}{\sigma\,wage}\,\phi(\cdot)\big(x\theta + \sigma\lambda(\cdot)\big) + \Phi(\cdot)\,\frac{\alpha_0}{wage}\big(1 - \lambda(\cdot)(x\theta/\sigma + \lambda(\cdot))\big) \tag{1.25} $$

where $\phi(\cdot)$, $\Phi(\cdot)$ and $\lambda(\cdot)$ are all evaluated at $x\theta/\sigma$. The corresponding elasticities are:

1. The wage elasticity conditional on working hours > 0 (we call it elas1):

$$ elas1 = \frac{\partial E(hours \mid \log(wage), z_2;\ hours > 0)}{\partial wage}\cdot\frac{wage}{x\theta + \sigma\lambda(\cdot)} \tag{1.26} $$

2. The wage elasticity without conditioning on working hours > 0 (we call it elas2):

$$ elas2 = \frac{\partial E(hours \mid \log(wage), z_2)}{\partial wage}\cdot\frac{wage}{\Phi(\cdot)\,x\theta + \sigma\phi(\cdot)} \tag{1.27} $$

Plugging in the mean values and the estimates, (1.24) equals 3.71 and (1.25) equals 5.31. We can also calculate the wage elasticity of labor supply for these two cases. Conditional on hours > 0, the elasticity is 0.26. This is close to the nonlinear constraint case in Moffitt's study, which is 0.21. If we do not condition on hours > 0, the elasticity is 0.686. As Moffitt found, we can see in Table 1.4 that the other coefficients in the labor supply equation are generally of the expected sign but usually of low significance.
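As an illustration of how expressions (1.24) to (1.27) are evaluated, the following Python sketch computes the two partial effects and the two elasticities for a single evaluation point. It is only a sketch under stated assumptions: the function names are hypothetical, and the inputs in the usage lines (in particular `sigma` and the value of the index `xtheta`) are made-up numbers, since the Tobit scale estimate and the evaluation point are not reported in the text. It is not intended to reproduce the reported values 3.71, 5.31, 0.26 or 0.686.

```python
import math


def norm_pdf(c):
    """Standard normal density phi(c)."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)


def norm_cdf(c):
    """Standard normal cdf Phi(c)."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))


def inv_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)


def cond_mean(xtheta, sigma):
    """E(hours | x, hours > 0) as in (1.13)."""
    return xtheta + sigma * inv_mills(xtheta / sigma)


def uncond_mean(xtheta, sigma):
    """E(hours | x) as in (1.14)."""
    c = xtheta / sigma
    return norm_cdf(c) * xtheta + sigma * norm_pdf(c)


def wage_effects(alpha0, wage, xtheta, sigma):
    """Partial effects (1.24), (1.25) and elasticities (1.26), (1.27)."""
    c = xtheta / sigma
    lam = inv_mills(c)
    d_cond = (alpha0 / wage) * (1.0 - lam * (c + lam))            # (1.24)
    d_uncond = ((alpha0 / (sigma * wage)) * norm_pdf(c) * (xtheta + sigma * lam)
                + norm_cdf(c) * d_cond)                           # (1.25)
    elas1 = d_cond * wage / cond_mean(xtheta, sigma)              # (1.26)
    elas2 = d_uncond * wage / uncond_mean(xtheta, sigma)          # (1.27)
    return d_cond, d_uncond, elas1, elas2


# Hypothetical inputs: alpha0 and wage echo Tables 1.4 and 1.1,
# while sigma and the remaining part of the index are invented.
alpha0, wage, sigma = 19.30, 2.50, 25.0
xtheta = alpha0 * math.log(wage) - 10.0
print(wage_effects(alpha0, wage, xtheta, sigma))
```

Algebraically, the two terms in (1.25) collapse to $\alpha_0\,\Phi(x\theta/\sigma)/wage$, which gives a convenient check on the computation; it also implies that elas1 is always smaller than elas2 at the same evaluation point.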
Married women work less, as do older women and those who have more children.

IV. An Extension of the Model

It is natural to extend our model by relaxing some assumptions to obtain a more robust estimator. Keeping all of Assumptions 1.1 except (iv), we instead assume $E(u_1 \mid v_2) = \xi_1 v_2 + \xi_2 (v_2^2 - \tau_2^2)$. The conditional expectation of the structural equation becomes

$$ E(y_1 \mid z, v_2) = -\xi_2\tau_2^2 + \beta y_2^* + z_1\gamma + \xi_1 v_2 + \xi_2 v_2^2 \tag{1.28} $$

Since we still assume normality of $v_2$, we can get a consistent estimator of $\pi_2$ from a Tobit equation. So $\hat v_2$ and $\hat v_2^2$ can be used in Procedure 1.1 described in section II. The test for sample selection bias/endogeneity is now the joint null $H_0: \xi_1 = 0,\ \xi_2 = 0$. The results are shown in Table 1.5.

In Table 1.5, the estimated return to education is essentially unchanged when $\hat v_2^2$ is added to the structural equation as an explanatory variable. The change in the coefficient on hours* is also very small: it changes from .0121 to .0122. This is not surprising because $\hat v_2^2$ is not very significant using the standard t-statistic. We do not find any significant change in the structural equation estimates after allowing for a quadratic in $E(u_1 \mid v_2)$.

V. Models with Additional Endogenous Explanatory Variables

We can add additional endogenous variables to the model of section II. For simplicity, we consider the case of one additional endogenous explanatory variable. Consider the model:

$$ y_1 = \beta_1 y_2^* + \beta_2 y_3 + z_1\gamma + u_1 \tag{1.29} $$

$$ y_2^* = z\pi_2 + v_2 \tag{1.30} $$

$$ y_3 = z\pi_3 + v_3 \tag{1.31} $$

Here $y_1$ is just as before and $y_2 = \max(0, z\pi_2 + v_2)$ is the selection equation. $y_3$ is an endogenous explanatory variable that can be a binary variable, a count variable, or contain both continuous and discrete characteristics. For simplicity, we assume $y_3$ is a scalar. The goal again is to estimate the structural parameters in (1.29).

Assumptions 1.2
(i) $(z, y_2)$ is always observed and $(y_1, y_3)$ is observed when $y_2 > 0$.
(ii) $(u_1, v_2)$ is independent of $z$.
(iii) $v_2 \sim \text{Normal}(0, \tau_2^2)$.
(iv) $E(u_1 \mid v_2) = \xi v_2$.
(v) $z\pi_3 = z_1\pi_{31} + z_2\pi_{32}$, $\pi_{32} \neq 0$.

There are two cases of interest in the setup of (1.29) to (1.31). The first is when $y_3$ is always observed. For example, $y_3$ is education and $y_1$ is log(wage), and education is observed regardless of whether the wage is observed. In the second case, $y_3$ is observed only along with $y_1$. An example is when $y_1$ is log(wage) and $y_3$ is a measure of nonwage benefits. We do not observe nonwage benefits when we do not observe the wage.

From the assumptions made above, we can write

$$ y_1 = \beta_1 y_2^* + \beta_2 y_3 + z_1\gamma + \xi v_2 + e_1 \tag{1.32} $$

where $e_1 \equiv u_1 - E(u_1 \mid v_2)$ and $E(e_1 \mid z, v_2) = 0$. Since $e_1$ is not correlated with $y_2^*$, $y_2^*$ is exogenous in (1.32). If $v_2$ were observed, we could estimate (1.32) by 2SLS on the selected sample using instruments $(z, v_2)$. As before, we can estimate $v_2$ when $y_2 > 0$ since $\pi_2$ can be consistently estimated by Tobit of $y_2$ on $z$.

PROCEDURE 1.2
(i) Obtain $\hat\pi_2$ from Tobit of $y_2$ on $z$ using all observations. Obtain the Tobit residuals $\hat v_{i2} = y_{i2} - z_i\hat\pi_2$ for $y_{i2} > 0$.
(ii) Using the selected subsample, estimate the equation

$$ y_{i1} = \beta_1 y_{i2} + \beta_2 y_{i3} + z_{i1}\gamma + \xi\hat v_{i2} + e_{i1} \tag{1.33} $$

by 2SLS, using instruments $(y_{i2}, z_i, \hat v_{i2})$.

Allowing for $y_3$ means that the structural equation can contain a standard form of endogeneity apart from sample selection. For example, when $y_1$ is the wage offer and $y_2$ is hours worked, $y_3$ might be education. Thus, we allow for self-selection in the model as well as account for the sample selection problem. We discuss self-selection in the next chapter.

VI. Conclusion

This chapter has derived a multi-step approach to estimating a Type III Tobit model where the selection variable appears as an endogenous explanatory variable. The multi-step approach has the advantages of being easy to compute and being more robust than MLE. It also provides a simple t-test for sample selection bias at the same time.
Computing the asymptotic variance matrices when sample selection is present requires general methods for two-step estimation, as in Newey and McFadden (1994, Handbook of Econometrics).

When we apply the approach to the Moffitt (1984) data, we find evidence of sample selection bias, contrary to what Moffitt finds. After correcting the bias, the average return to education goes from 7.77% to 5.94%, which is similar to earlier findings. It appears that the average return will be overestimated if we do not account for sample selection. The coefficient on hours* also changes from .36% to 1.21% once we account for the endogeneity of hours in the wage offer equation.

As for labor supply, after our three step estimation we find that the wage elasticity of labor supply is far smaller when the estimation is conditional on working hours > 0 than when it is not, something that has been reported in the earlier literature.

We also showed how to relax the assumption that the structural error has a conditional expectation linear in the selection error. The model can be extended to handle other endogenous explanatory variables in the structural equation. The next chapter considers a more general model with endogenous explanatory variables and sample selection.

Table 1.1 Descriptive Statistics for NLS Data Set

Variable    Observations   Mean     Standard Deviation   Minimum   Maximum
hours       610            17.1     19.06                0         60
wage        297            2.50     1.06                 .44       6.62
nowinc      610            12.50    9.32                 0         48.08
marital     610            .80      .40                  0         1
age         610            44.37    3.35                 35        49
nonwhite    610            .26      .44                  0         1
famber      610            4.31     2.06                 1         11
chiles6     610            .10      .39                  0         4
chigre6     610            1.93     1.74                 0         7
educ        610            10.32    2.77                 2         18
labforce    610            .41      .84                  .002      4.59
indfr1      610            .22      .02                  .18       .25
indfr2      610            .18      .02                  .15       .23

Note: hours are weekly hours worked, wage is the hourly wage rate, nowinc is the flow of asset income, marital is marital status (=1 if married), nonwhite is a binary variable that equals one if nonwhite.
famber represents the number of family members, chiles6 represents the number of kids less than 6 years of age, chigre6 represents the number of kids greater than 6 years of age, educ represents years of education, labforce is the size of the labor force (in millions), indfr1 is the employment fraction in manufacturing in the census region of residence, and indfr2 is the employment fraction in government in the census region of residence.

Table 1.2 The Estimation of Different Specifications

              (1)          (2)              (3)
              OLS          OLS + v̂2         (2) + hours*²
Constant      -.9093       -.5153           -.0625
              (.5789)      (.7027)          (.7083)
nonwhite      -.1325*      -.1626*          -.1622*
              (.0485)      (.0503)          (.0504)
educ          .0777*       .0594*           .0594*
              (.0078)      (.0115)          (.0116)
labforce      .0923*       .1013*           .1012*
              (.0264)      (.0266)          (.0267)
indfr1        -1.2751*     -.3620           -.3534
              (.6968)      (.8149)          (.8186)
indfr2        -4.2511*     -1.9894          -1.9897
              (1.7563)     (2.0440)         (2.0476)
age           .0015        .0104            .0103
              (.0061)      (.0074)          (.0074)
hours*        .0036*       .0121*           .0110
              (.0019)      (.0044)          (.0090)
hours*²       -            -                .00002
                                            (.00014)
v̂2            -            -.0091*          -.0093*
                           (.0044)          (.0044)
R-Square      .3612        .3711            .3712

Notes: Dependent variable is log hourly wage. educ represents years of education. exper represents experience. v̂2 is the Tobit residual. Standard errors in parentheses. * : significant at the 10% level.

Table 1.3 Two step and Moffitt's MLE estimation

              (1)                  (2)              (3)        (4)
              Two step procedure   (1) + hours*²    OLS        Moffitt's
              with wage as                                     estimation
              dependent variable
Constant      .4269                .3757            -2.04      -1.80
              (1.7936)             (1.8078)         (1.49)     (1.54)
nonwhite      -.1655               -.1676           -.11       -.10
              (.1283)              (.1287)          (.12)      (.12)
educ          .1607*               .1606*           .20*       .20*
              (.0295)              (.0295)          (.02)      (.02)
labforce      .1910*               .1916*           .17*       .17*
              (.0679)              (.0681)          (.06)      (.08)
indfr1        -1.185               -1.2251          -3.02*     -3.59*
              (2.0801)             (2.0894)         (1.78)     (2.05)
indfr2        -5.0859              -5.0847          -9.51*     -10.27*
              (5.2176)             (5.2261)         (4.48)     (5.11)
age           .0285                .0284            .01        .01
              (.0188)              (.0189)          (.02)      (.02)
hours*        .0243*               .0294            .014       .053*
              (.0113)              (.0231)          (.02)      (.02)
hours*²       -                    -.00009          -.00011    -.00078*
                                   (.00035)         (.0003)    (.0002)
v̂2            -.0182*              -.0181*          -          -
              (.0111)              (.0111)
R-Square      .3243                .3245            .3182      .1694

Notes: Dependent variable is hourly wage. educ represents years of education. exper represents experience. v̂2 is the Tobit residual. Standard errors in parentheses. * : significant at the 10% level.

Table 1.4 Tobit Estimation for the Labor Supply Equation (dependent variable is hours)

              Coefficient   Standard error   t value
Constant      49.05         23.44            2.09
lwagehat      19.30         6.30             3.06
nowinc        .009          .17              .05
marital       -5.78         4.14             -1.40
age           -1.20         .49              -2.47
race          5.12          4.05             1.27
famber        -.23          1.73             -.13
chiles6       -4.06         4.64             -.88
chigre6       -.73          1.91             -.38

Number of obs = 610
chi2(8) = 24.05
Prob > chi2 = 0.0022
Log Likelihood = -1694.5467
Obs. summary: 313 left-censored observations at hours<=0; 297 uncensored observations

Note: lwagehat is estimated log(wage). nowinc is the flow of asset income. marital is marital status (=1 if married). race is a binary variable that equals one if nonwhite. famber represents the number of family members. chiles6 represents the number of kids less than 6 years of age. chigre6 represents the number of kids greater than 6 years of age.

Table 1.5 Two Step Estimation with v̂2² added as an explanatory variable in the structural equation

              Coefficient   Standard error   t value
constant      -.55          .41              -1.347
race          -.15          .05              -3.09
educ          .06           .01              5.36
labforce      .10           .03              3.15
indfr1        -.36          .82              -.45
indfr2        -2.06         2.05             -1.01
age           .01           .007             1.50
hours         .012          .004             2.93
v̂2            -.014         .007             -2.03
v̂2²           .00009        .0001            .91

Number of obs =
R-squared =

Notes: Dependent variable is log hourly wage. educ represents years of education. exper represents experience. v̂2 is the Tobit residual.

Chapter 2

SELECTION CORRECTION WHEN BOTH SELF-SELECTION BIASES AND SAMPLE SELECTION BIASES ARE PRESENT IN A RANDOM COEFFICIENT MODEL

I. Introduction

In chapter one we discussed sample selection correction procedures in models with endogenous variables and a Tobit selection equation. In this chapter, we focus on the situation where there is both self-selection and sample selection. A large literature has focused either on corrections for self-selection bias or on corrections for sample selection bias. However, none of these studies corrects for the two sources of bias at the same time.

The problem of self-selection often occurs when evaluating programs such as job training or welfare, or when estimating the returns to education, using nonexperimental data. Generally, the issue is that individuals decide whether to participate in a program or to receive some treatment. If the participation decision is related to factors that affect the outcome variable, ignoring the determinants of this decision leads to biased estimates of program impacts.

As an example, suppose we are interested in evaluating the benefits of social programs. A common specification is as follows:

$$ y_1 = z_1\beta_1 + \alpha y_2 + u_1, $$

where $y_1$ is an outcome such as earnings, $z_1$ is a vector of exogenous characteristics, and $y_2$ is the program variable, which can be a dummy or a continuous variable. If we are interested in the impact of the program on employed persons, it is sufficient to obtain a random sample of people who are currently working. If we are interested in the impact of the program for the population of eligible persons, but we can only obtain a sample of people who work, the problem of sample selection bias arises. In addition, the variable $y_2$ generally cannot be treated as exogenous if the decision of an individual to attend the program is based on individual self-selection. If we were to treat the variable $y_2$ as exogenous, self-selection bias would also arise. If the sample we obtain represents the population we are interested in, there is no sample selection problem and only the self-selection problem remains.
However, if the sample does not represent the population of interest, we have to address both the sample selection and the self-selection problems.

An early discussion of the self-selection problem was that of Roy (1951), who discussed the problem of individuals choosing between two professions, hunting and fishing, based on their productivity in each. The observed distribution of incomes of hunters and fishermen was determined by these choices. The result showed that the individuals with better skills go into the profession with higher variance in earnings.

The econometrics of the sample selection problem began with the studies by Gronau (1974), Lewis (1974), and Heckman (1974). Heckman (1976) suggested a two-stage estimation method to address this problem. Subsequently, Lee and Trost (1978) analyzed the self-selection problem in the case of housing demand. About the same time, Lee (1978) corrected for sample selection bias when estimating the return to union membership, and Willis and Rosen (1979) did the same when estimating the rate of return to a college education.

In the foregoing studies, the self-selection problem involved a discrete choice between two alternatives. By contrast, Garen (1984) studied the self-selection problem with a continuous choice variable in the case of the returns to schooling, where unobserved heterogeneity interacts with a continuous endogenous² explanatory variable. In Garen's application, the effect of schooling on the wage may depend on the level of ability, and self-selection may cause schooling and ability to be correlated. Garen's model will be discussed in section II.

Angrist and Krueger (1991) also studied the returns to schooling. They explored the self-selection problem in the context of how compulsory school attendance laws affect schooling and earnings.
They argued that season of birth is related to educational attainment, because of policies regarding the age at which children may first start school and compulsory school attendance laws. People born at the beginning of the year start school at an older age and can drop out after completing less schooling than people born near the end of the year. They use quarter of birth as an instrument for education and apply two-stage least squares (2SLS) to estimate the model.

In this chapter, we examine the case that combines both self-selection and sample selection biases with a Tobit selection equation. We also compare the results under different estimation methods: OLS, 2SLS, the method proposed by Garen, and the method we propose.

The remainder of the chapter is organized as follows. Section II develops the basic model and examines the implications of different combinations of self-selection and sample selection. Section III develops a model that extends Garen's procedure to handle both self-selection and sample selection. In section IV we use two different data sets from the labor economics literature to illustrate the models proposed in section III. Section V shows consistency of the new procedure used in section IV. Section VI is the conclusion.

² Endogenous variable here means any explanatory variable correlated with unobservables.

II. A Random Coefficient Model

Garen (1984) considered a model where an endogenous explanatory variable interacts with an unobservable. This kind of model is also called a "random coefficient" model. Specifically,

$$ y_1 = z_1\beta_1 + \alpha_1 y_2 + \gamma_1 a + \gamma_2 y_2 a + u_1 \tag{2.1} $$

where $z_1$ represents the exogenous variables and a constant, and $y_2$ is the variable that is correlated with the unobserved variable $a$. As we will see, the assumptions we impose are most reasonable when $y_2$ is (roughly) continuous. For identification reasons, we need to assume that $z$ (which represents all of the exogenous variables) contains at least one element not in $z_1$.
The variable $u_1$ is the structural error, where we assume that $E(u_1 \mid z, y_2, a) = 0$. Therefore, (2.1) represents a conditional expectation:

$$ E(y_1 \mid z, y_2, a) = z_1\beta_1 + \alpha_1 y_2 + \gamma_1 a + \gamma_2 y_2 a \tag{2.2} $$

The partial effect of $y_2$ on $E(y_1 \mid z, y_2, a)$ depends on $a$:

$$ \frac{\partial E(y_1 \mid z, y_2, a)}{\partial y_2} = \alpha_1 + \gamma_2 a. $$

Without loss of generality we can assume that $E(a) = 0$. Therefore, the average effect in the population, sometimes called the average treatment effect, is $\alpha_1$. This is the primary parameter of interest.

Garen argued that since $y_2$ is generally correlated with $z$, $y_2 a$ is generally correlated with $z$ even if $E(a \mid z) = 0$. So, if we estimate the equation

$$ y_1 = z_1\beta_1 + \alpha_1 y_2 + e_1 $$

by 2SLS using instruments $z$, the estimates of $\beta_1$ and $\alpha_1$ will not be consistent. Garen suggested an approach to correct the self-selection bias when the interaction between $y_2$ and the unobservable is present. The reduced form for $y_2$ is:

$$ y_2 = z\theta_2 + v_2. \tag{2.3} $$

Garen assumed $u_1 \sim \text{Normal}(0, \sigma_{u_1}^2)$, $a \sim \text{Normal}(0, \sigma_a^2)$, and $Cov(u_1, a) = \sigma_{u_1 a}$. Also, $E(a \mid z, v_2)$ does not depend on $z$ and is linear in $v_2$:

$$ E(a \mid z, v_2) = E(a \mid v_2) = \rho_2 v_2. \tag{2.4} $$

Under (2.4), the expected value of (2.2) conditional on $(z, v_2)$ is:

$$ E(y_1 \mid z, v_2) = z_1\beta_1 + \alpha_1 y_2 + \gamma_1 E(a \mid z, v_2) + \gamma_2 y_2 E(a \mid z, v_2) + E(u_1 \mid z, v_2) = z_1\beta_1 + \alpha_1 y_2 + \gamma_1\rho_2 v_2 + \gamma_2\rho_2 y_2 v_2 $$

This suggests a natural two-step procedure, proposed by Garen (1984):

1. Regress $y_2$ on $z$ and save the residuals $\hat v_2$.
2. Regress $y_1$ on $z_1$, $y_2$, $\hat v_2$, $y_2\hat v_2$.

By the standard theory of generated regressors, the second regression produces consistent estimators of $\beta_1$, $\alpha_1$, $\gamma_1\rho_2$, and $\gamma_2\rho_2$.

In a recent paper, Wooldridge (1997) showed that Garen's claim about the inconsistency of 2SLS is not correct. Wooldridge showed that 2SLS does produce consistent estimators when we ignore the interaction term in Garen's model, under assumptions weaker than those Garen imposed. We briefly describe the reasoning as follows. Rewrite equation (2.2) as

$$ y_1 = \delta_1 + z_1\beta_1 + \alpha_1 y_2 + a y_2 + u_1 \tag{2.5} $$

and assume $E(u_1 \mid z) = 0$.
Because equation (2.5) contains the term $\alpha_1 y_2$, we can assume $E(a) = 0$ without loss of generality. For $y_2$, we can write a linear equation in terms of all exogenous variables as in (2.3):

$$ y_2 = \delta_2 + z\theta_2 + v_2 = \delta_2 + z_1\theta_{21} + z_2\theta_{22} + v_2 \tag{2.6} $$

If $v_2$ is correlated with $a$, then $y_2$ is endogenous and the expected value of the error term is not zero:

$$ E(a y_2 + u_1) = E(a y_2) = E(a v_2) \neq 0 \tag{2.7} $$

Because of this, Garen argued that the 2SLS estimator of $\alpha_1$ would be inconsistent. However, 2SLS only causes the intercept $\delta_1$ to be inconsistently estimated; the other parameters are consistently estimated.

To see this, we first assume $E(u_1 \mid z) = 0$, as we already did, and we assume that $v_2$ satisfies the zero conditional mean and homoskedasticity assumptions from standard linear regression analysis. That is, $E(v_2 \mid z) = 0$ and $E(v_2^2 \mid z) = \sigma_2^2$. We also need to assume a linear relationship between $a$ and $v_2$ of the same form as in (2.4), here written with coefficient $\rho_1$: $E(a \mid z, v_2) = E(a \mid v_2) = \rho_1 v_2$. This means that $a$ is conditionally mean independent of $z$ given $v_2$ and that $E(a \mid v_2)$ is linear. We need the standard identification condition that $\theta_{22} \neq 0$ in (2.6). Now, since

$$ E(a y_2 \mid z) = E[E(a y_2 \mid z, v_2) \mid z] = E[y_2 E(a \mid z, v_2) \mid z] = E(\rho_1 y_2 v_2 \mid z) = \rho_1 E(v_2^2 \mid z) = \rho_1\sigma_2^2 = E(a y_2), $$

we have

$$ y_1 = (\delta_1 + \rho_1\sigma_2^2) + z_1\beta_1 + \alpha_1 y_2 + r_1, \tag{2.8} $$

where $r_1 = [a y_2 - E(a y_2 \mid z)] + u_1$, so $E(r_1 \mid z) = 0$.

Applying 2SLS to (2.8) using a random sample of size $n$, where $z_2$ is the vector of instruments for $y_2$, gives estimators of $\beta_1$ and $\alpha_1$ that are consistent and $\sqrt{n}$-asymptotically normal under standard finite moment conditions. For more details refer to Wooldridge's paper.

One thing should be noted: although Garen's claim about the inconsistency of 2SLS is incorrect, that does not mean the estimation method he proposed produces inconsistent estimates. Both 2SLS and Garen's method should work under similar assumptions.

III. Garen's Model with Sample Selection

Garen (1984) assumed that a random sample from the underlying population is available.
This may not be the case in his example, which is to estimate the average return to education in a wage offer equation. As we discussed in chapter one, using only observations on people who are working can cause sample selection bias.

In this section we study estimation of Garen's model with a Tobit sample selection mechanism. It can be written as:

$$ y_1 = z_1\beta_1 + \alpha_1 y_2 + \delta_1 y_2 a_1 + u_1 \tag{2.9} $$

$$ y_2 = z\pi_2 + v_2 = z_1\pi_{21} + z_2\pi_{22} + v_2 \tag{2.10} $$

$$ y_3 = \max(0,\ z\pi_3 + v_3), \tag{2.11} $$

where $y_1$ is observed when $y_3 > 0$, $z$ represents the exogenous variables and a constant, and $y_2$ is the endogenous variable that is possibly correlated with the unobserved variable $a_1$. The assumptions below are most suited to the case when $y_2$ is continuous. For identification reasons, we need to assume that $z$ (which represents all of the exogenous variables) contains at least one element not in $z_1$.

We have added a Tobit selection equation to Garen's model. An example is when $y_1$ is the log of the hourly wage, $y_2$ is years of education and $y_3$ is weekly or annual hours of work. More details about this example will be given in section IV.

Assumptions 2.1
(i) $(z, y_2, y_3)$ is always observed in the population but $y_1$ is observed only when $y_3 > 0$.
(ii) $(u_1, v_2, v_3, a_1)$ is zero-mean and independent of $z$.
(iii) $v_3 \sim \text{Normal}(0, \tau_3^2)$.
(iv) $E(a_1 \mid z, v_2, v_3) = \rho_2 v_2 + \rho_3 v_3$.
(v) $E(u_1 \mid z, v_2, v_3) = \xi_2 v_2 + \xi_3 v_3$.
(vi) The rank condition $\pi_{22} \neq 0$.

Assumption (i) says that $y_1$ is the only variable that is unobserved due to a possible sample selection problem. Assumptions (ii) and (iii) are fairly standard in this context, but they are restrictive. Assumption (ii) means that the disturbance term in the linear reduced form for $y_2$ (which is $v_2$) is independent of $z$; this restricts the kinds of endogenous variables we can allow. If $y_2$ is a binary or discrete variable, then the assumption is not reasonable. This could be relaxed somewhat, but linearity is still needed.
Assumption (iii) allows $u_1$, $v_2$, and $v_3$ to be arbitrarily correlated, so that endogeneity and sample selection bias can both be present. Assumption (iv) implies that the conditional expectation involving the unobservable is linear. Assumption (v) relaxes the usual joint normality of $(u_1, v_2, v_3)$ by just requiring linearity of a conditional expectation. Assumption (vi) is the standard identification condition: we need a good instrumental variable for $y_2$.

The model is a Type III Tobit model in Amemiya's (1985) taxonomy, but with an endogenous explanatory variable that interacts with an unobservable. To derive an estimating equation, write

$$ E(y_1 \mid z, v_2, v_3) = z_1\beta_1 + \alpha_1 y_2 + \delta_1 y_2 E(a_1 \mid z, v_2, v_3) + E(u_1 \mid z, v_2, v_3) = z_1\beta_1 + \alpha_1 y_2 + \delta_1 y_2(\rho_2 v_2 + \rho_3 v_3) + \xi_2 v_2 + \xi_3 v_3 $$

by assumptions (ii), (iv), and (v). So we have

$$ E(y_1 \mid z, v_2, v_3) = z_1\beta_1 + \alpha_1 y_2 + \kappa_2 y_2 v_2 + \kappa_3 y_2 v_3 + \xi_2 v_2 + \xi_3 v_3 $$

where $\kappa_2 \equiv \delta_1\rho_2$ and $\kappa_3 \equiv \delta_1\rho_3$. From the discussion in section II of chapter one, we can select the sample on the basis of $(z, v_3)$. Since $v_2$ and $v_3$ are not observed, we estimate $v_2$ by running OLS on equation (2.10), and we estimate $v_3$ when $y_3 > 0$, since $\pi_3$ can be consistently estimated by Tobit of $y_3$ on $z$. The procedure is:

PROCEDURE 2.1
(i) Run OLS of $y_2$ on $z$ to get $\hat\pi_2$. Obtain the OLS residuals $\hat v_2 = y_2 - z\hat\pi_2$.
(ii) Obtain $\hat\pi_3$ from Tobit of $y_3$ on $z$ using all observations. Obtain the Tobit residuals $\hat v_3 = y_3 - z\hat\pi_3$ for $y_3 > 0$.
(iii) Use OLS on the selected subsample, for which we observe $y_1$, to estimate the equation

$$ y_1 = z_1\beta_1 + \alpha_1 y_2 + \kappa_2 y_2\hat v_2 + \kappa_3 y_2\hat v_3 + \xi_2\hat v_2 + \xi_3\hat v_3 + \text{error term}. $$

Procedure 2.1 lets us test for and correct the potential self-selection and sample selection biases at the same time. An F-test of the joint significance of $y_2\hat v_2$, $y_2\hat v_3$, $\hat v_2$, and $\hat v_3$ tests for endogeneity of $y_2$ or sample selection. If either $\hat\kappa_2$ or $\hat\xi_2$ is significant, there is a self-selection problem. If either $\hat\kappa_3$ or $\hat\xi_3$ is significant, there is a sample selection problem.
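The three steps of Procedure 2.1 can be sketched in Python on simulated data. This is a minimal illustration under stated assumptions, not the code used for the empirical results: every function name, parameter value and the data-generating process are hypothetical, the Tobit in step (ii) is fitted with a simple EM iteration as a stand-in for a standard Tobit maximum likelihood routine, and no adjustment for generated regressors is made.

```python
import numpy as np
from math import erfc, sqrt, pi

_erfc = np.vectorize(erfc, otypes=[float])


def norm_cdf(c):
    # Phi(c) computed through erfc so it stays accurate in the lower tail
    return 0.5 * _erfc(-c / sqrt(2.0))


def norm_pdf(c):
    return np.exp(-0.5 * c ** 2) / sqrt(2.0 * pi)


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def tobit_em(X, y, n_iter=300):
    """Tobit with censoring at zero, fitted by EM (stand-in for Tobit MLE)."""
    cens = y <= 0
    beta = ols(X, y)
    sigma = np.std(y - X @ beta)
    for _ in range(n_iter):
        mu = X @ beta
        c = -mu[cens] / sigma                        # standardized censoring point
        r = norm_pdf(c) / norm_cdf(c)
        ystar = y.copy()
        ystar[cens] = mu[cens] - sigma * r           # E[y* | y* <= 0]
        var_c = sigma ** 2 * (1.0 - c * r - r ** 2)  # Var[y* | y* <= 0]
        beta = ols(X, ystar)
        sigma = np.sqrt((np.sum((ystar - X @ beta) ** 2) + var_c.sum()) / len(y))
    return beta, sigma


def simulate(n=4000, seed=0):
    """Hypothetical data satisfying Assumptions 2.1; true alpha_1 is 1.0."""
    rng = np.random.default_rng(seed)
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    Z = np.column_stack([np.ones(n), z1, z2])        # z2 is excluded from (2.9)
    v3 = rng.normal(size=n)
    v2 = 0.3 * v3 + rng.normal(size=n)
    a1 = 0.5 * v2 + 0.3 * v3 + 0.5 * rng.normal(size=n)        # assumption (iv)
    u1 = 0.5 * v2 + 0.4 * v3 + rng.normal(size=n)              # assumption (v)
    y2 = Z @ np.array([1.0, 0.5, 1.0]) + v2                    # (2.10)
    y3 = np.maximum(0.0, Z @ np.array([0.3, 0.5, 0.5]) + v3)   # (2.11)
    y1 = 0.5 + 0.8 * z1 + 1.0 * y2 + 0.5 * y2 * a1 + u1        # (2.9)
    return Z, y1, y2, y3


def procedure_2_1(Z, y1, y2, y3):
    v2_hat = y2 - Z @ ols(Z, y2)                     # step (i): OLS residuals
    pi3_hat, _ = tobit_em(Z, y3)                     # step (ii): Tobit on all obs
    sel = y3 > 0                                     # y1 is used only where y3 > 0
    v3_hat = y3[sel] - Z[sel] @ pi3_hat
    X = np.column_stack([Z[sel, :2], y2[sel],        # step (iii): augmented OLS
                         y2[sel] * v2_hat[sel], y2[sel] * v3_hat,
                         v2_hat[sel], v3_hat])
    return ols(X, y1[sel])   # order: const, z1, alpha_1, kappa_2, kappa_3, xi_2, xi_3


if __name__ == "__main__":
    coef = procedure_2_1(*simulate())
    print("estimated alpha_1:", coef[2])
```

In this simulated design the third element of the returned vector estimates $\alpha_1$, and the last four elements are the coefficients whose significance the tests described above examine.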
If we use individual t-statistics or F-statistics for a subset of coefficients, these should be adjusted for the generated regressors problem. If the coefficients on these variables are small, the adjustment will not make much difference. Notice that the exogenous explanatory variables $z$ in (2.10) and (2.11) can be the same: we do not need an exclusion restriction in (2.10) in order for the procedure to work well. The residuals $\hat v_3$ have separate variation from $z\hat\pi_3$ because of the variation in $y_3$.

IV. Empirical Examples

As in Chapter 1, we use an example from labor economics as the basis for the empirical applications to follow.

A wage offer and labor supply system is given by

$$ \log(wage^o) = z_1\beta_1 + \alpha_1\,educ + \delta_1\,educ\cdot a_1 + a_1 + u_1 \tag{2.12} $$

$$ educ = z\pi_2 + v_2 \tag{2.13} $$

$$ hours = \max(0,\ z\pi_3 + v_3) \tag{2.14} $$

where $wage^o$ is the wage offer, observed only when hours > 0, educ is years of education, and $a_1$ is unobserved ability. Education is correlated with unobserved ability, and there is an interaction term $educ\cdot a_1$ between education and ability. We assume that hours, $z$, and educ are always observed. The assumptions are the same as in section III.

We are interested in estimating the returns to education in this model, and we do so using different estimation methods: OLS, 2SLS, the method suggested by Garen, and the method proposed in section III.

We first apply these methods to the data set used by Mroz (1987) in his study of the sensitivity of female labor supply to various assumptions. The data come from the University of Michigan Panel Study of Income Dynamics (hereafter PSID) for the year 1975. The sample consists of 753 married women, 428 of whom report positive hours worked during 1975. The exogenous variables include actual labor market experience (exper), exper², family income other than that earned by the woman, in thousands (nwifeinc), number of kids less than six (kidslt6), number of kids between six and eighteen (kidsge6), and age.
We make exclusion restrictions in the wage equation, which includes only educ, exper, and exper² as explanatory variables. Hours here are measured annually. The descriptive statistics are summarized in Table 2.1.

We summarize the estimation results for the PSID data set in Table 2.2. In the OLS estimation, we estimate the model using only the women who work, which ignores potential sample selection and self-selection biases. The result is shown in column (1) of Table 2.2. The OLS estimate of the return to education using the PSID data set is about 10.75%. However, in our model educ is endogenous and possibly correlated with unobserved ability, so OLS may be biased. We therefore check for self-selection and sample selection biases by examining the other estimators.

To see whether self-selection bias exists, we first run a regression that includes only $\hat{v}_2$ as an additional regressor. The result is shown in column (2) of Table 2.2. The t-statistic for the estimated coefficient on $\hat{v}_2$ is marginally significant, suggesting possible self-selection bias.

To correct the possible self-selection bias, we use 2SLS and Garen's method and compare them. The results are in columns (3) and (4) of Table 2.2. The 2SLS estimate of the return to education is 8.04%. We use parents' education and husband's education as instrumental variables for educ. Parents' education levels are widely regarded as good instruments for educ, although husband's education is a more questionable instrument. According to section II, the estimator is consistent if only self-selection bias is present. Since $\hat{v}_2 = educ - z\hat{\pi}_2$, it can be proved that the OLS estimates of the returns to education are identical to the 2SLS estimates when we add $\hat{v}_2$ as an additional regressor in an ordinary OLS regression.
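The algebraic equivalence between 2SLS and OLS augmented with the first-stage residual can be checked numerically. The sketch below is a deliberately simplified case, assumed for illustration only: one endogenous regressor, one instrument, no intercept, and made-up data. The function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def iv_slope(y1, y2, z):
    """Just-identified IV (2SLS) estimate without intercept: z'y1 / z'y2."""
    return dot(z, y1) / dot(z, y2)


def control_function_slope(y1, y2, z):
    """Coefficient on y2 from OLS of y1 on (y2, v_hat), v_hat = y2 - pi_hat*z."""
    pi_hat = dot(z, y2) / dot(z, z)                      # first-stage OLS
    v_hat = [y2i - pi_hat * zi for y2i, zi in zip(y2, z)]
    # Solve the 2x2 normal equations by Cramer's rule.
    s22, s2v, svv = dot(y2, y2), dot(y2, v_hat), dot(v_hat, v_hat)
    s2y, svy = dot(y2, y1), dot(v_hat, y1)
    det = s22 * svv - s2v * s2v
    return (s2y * svv - s2v * svy) / det


# Made-up data, used only to illustrate the identity.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [2.1, 3.9, 6.2, 7.8, 10.1]
y1 = [1.0, 2.5, 2.9, 4.2, 5.1]
print(iv_slope(y1, y2, z), control_function_slope(y1, y2, z))
```

The two printed numbers agree up to floating-point error. The coefficient estimates coincide, but the standard errors from the two regressions need not, which is consistent with the differing standard errors in columns (2) and (3) of Table 2.2.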
Although the 2SLS parameter estimates and those in column (2) are the same, adding $\hat{v}_2$ as an additional regressor in OLS is useful as a test for self-selection bias.

We also apply the estimation method proposed by Garen to compare with 2SLS. The results are shown in column (4), where the return to education is 7.64%. Comparing the estimated returns to education in columns (3) and (4), the difference is only 0.4 percentage point.

To see whether sample selection bias exists in addition to self-selection bias, we suggest two models. One is a linear model that corrects for both self-selection and sample selection biases, which we call Procedure 2.2 hereafter. More details on Procedure 2.2 are given in section V. The other is our new method developed above, which includes the linear terms $\hat{v}_2$, $\hat{v}_3$ and the nonlinear terms $educ \cdot \hat{v}_2$, $educ \cdot \hat{v}_3$. The results are shown in columns (5) and (6). The return to education in column (5) is 7.65%, which is similar to the results from 2SLS and Garen's model. This is reasonable, since the t value for the coefficient on $\hat{v}_3$ is -1.351, which indicates that sample selection bias is not a severe problem here. In column (6), the return to education is 7.98%. This is very close to the estimates from 2SLS, Garen's model, and the linear combined model proposed above. From the t value of each estimated coefficient, we reach the same conclusion as before: only a self-selection problem exists.

We also carry out joint tests for self-selection and sample selection biases. The results are shown in Table 2.3. They too show strong evidence of self-selection bias but no sign of sample selection bias. We notice there is about a 20% difference in the estimated return to education between OLS and the other estimation methods.
We can conclude that the OLS estimator of $\alpha_1$ is inconsistent. Also, the 2SLS estimate of $\alpha_1$ is .0804, which is not much different from Garen's estimate (about 0.076). Sample selection does not appear to be a problem in this case.

We now apply the same procedure to another, larger data set, which was compiled from the May 1991 Current Population Survey (CPS) by Daniel Hamermesh. The data set contains 5634 married women, 3286 of whom report working positive hours during the week. The hourly wage is weekly earnings divided by weekly hours for women who worked positive hours; women who did not work have no hourly wage data. Hours here are measured weekly rather than annually as in the PSID data set. Also, experience is potential rather than actual experience, and the variables kidslt6 and kidsge6 are binary indicators. The descriptive statistics are shown in Table 2.4.

As in Table 2.2, we report all the estimation methods for the CPS data set in Table 2.5. OLS results are shown in column (1), where the estimated return to an additional year of education is 9.90%. To see whether self-selection bias exists, we add $\hat{v}_2$ as an additional explanatory variable in OLS. The results are shown in column (2). From the t statistic for the estimated coefficient on $\hat{v}_2$, it is clear that self-selection bias exists and needs to be corrected. We find that the return to education is 13.05% in this case, and the t statistic is significant at the 10% level. Although it is widely believed that the estimated return to education should go down after correcting for self-selection, a growing body of research supports our empirical finding for the CPS data set that the OLS estimate is biased downward.
Angrist and Krueger (1991), in their research on returns to schooling, reach the same result: OLS estimates are biased downward relative to instrumental variables estimates.3 David Card (1993) also finds that instrumental variables estimates exceed the OLS estimate and suggests that the result may come from measurement error in schooling.

The 2SLS results are shown in column (3). As noted, the 2SLS estimates are algebraically the same as those obtained by adding $\hat{v}_2$ as an additional regressor in OLS, so the estimated coefficients are the same as in the previous column. Since our CPS data set does not contain other good instruments for educ, we can use only husband's education as the instrumental variable for educ. If no sample selection bias is present, the estimate should be consistent.

For comparison, the estimates from Garen's method are shown in column (4). The return to education in Garen's model is 13.07%, which is about the same as the 2SLS estimate. This supports the claim made by Wooldridge that the 2SLS estimator is, under certain assumptions, consistent even when we allow the endogenous explanatory variable to interact with unobservables. With the larger data set, the 2SLS and Garen estimates are closer to each other than in the PSID data set.

We also have to check whether sample selection bias is a problem. To do so, we again apply the two methods we proposed, as in the PSID case. The estimation result for the linear model that corrects for both self-selection and sample selection biases is in column (5). From the t-test of the estimated coefficient on $\hat{v}_3$, it is clear that a sample selection problem exists. The t value is 2.503, which is clearly significant.

3 Other studies include Ashenfelter and Krueger (1992), Kane and Rouse (1993), and Butcher and Case (1993). All the above studies report instrumental variables estimates of the return to schooling that exceed the ordinary least-squares estimate.
The return to education rises slightly from the 2SLS estimate, to 13.09%. The estimation that includes the interaction terms as additional regressors is shown in column (6). There, the estimated return to education is 13.21%, which is close to the estimate from the method that includes only $\hat{v}_2$ and $\hat{v}_3$ as additional regressors. The t-tests show that the estimated coefficients on $\hat{v}_3$ and $educ \cdot \hat{v}_3$ are both individually insignificant, which indicates that multicollinearity is present. We use an F-test to jointly test the four terms $\hat{v}_2$, $\hat{v}_3$, $educ \cdot \hat{v}_2$, and $educ \cdot \hat{v}_3$. The results are shown in Table 2.6. The joint test for the coefficients on $educ \cdot \hat{v}_3$ and $\hat{v}_3$ is marginally significant at the 10% level. This supports our claim that sample selection bias exists in this case.

It is important to note that the estimated return to education by OLS is about the same in both data sets (10.75% for the PSID data set and 9.9% for the CPS data set) before the self-selection and sample selection problems are controlled for. After correcting for self-selection and sample selection biases, the return to education decreases to 7.98% in the PSID data set and increases to 13.21% in the CPS data set. From the estimation results in Tables 2.2 and 2.5, this OLS bias in the return to education comes mainly from self-selection.

To make an exact comparison with the CPS data set, we redo Procedure 2.1 for the PSID data set after removing parents' education as instruments for education. This leaves husband's education as the only instrument for education, just as in the CPS data set. The results are shown in Table 2.7. In Table 2.7, neither self-selection nor sample selection appears to be a problem. The result differs from Table 2.2, where we use both parents' education and husband's education as instruments for education. This indicates that husband's education may not be a good instrument for education.
Comparing the results in Tables 2.5 and 2.7, the pattern remains the same: the OLS estimates are close, but the Procedure 2.1 estimates are very different.

V. The Consistency of Procedure 2.2

When there is only a self-selection problem, both 2SLS and Garen's estimation method produce consistent estimates of the return to education under the assumptions discussed in section III. We also mentioned in section III that the OLS estimates of the returns to education are identical to the 2SLS estimates when we add $\hat{v}_2$ as an additional regressor in an ordinary OLS regression. This means that both OLS of $y_1$ on $z_1$, $y_2$, $\hat{v}_2$, $y_2\hat{v}_2$ and OLS of $y_1$ on $z_1$, $y_2$, $\hat{v}_2$ yield consistent estimators of $\alpha_1$ (the return to education).

In Procedure 2.1 we regress $y_1$ on $z_1$, $y_2$, $y_2\hat{v}_2$, $y_2\hat{v}_3$, $\hat{v}_2$, $\hat{v}_3$. It is natural to suspect that OLS of $y_1$ on $z_1$, $y_2$, $\hat{v}_2$, $\hat{v}_3$ also produces a consistent estimator of $\alpha_1$. Comparing the results of Procedure 2.2 and Procedure 2.1 in Tables 2.2 and 2.5, we find that $\hat{\alpha}_1$ is very close across the two methods.

Our conjecture turns out to be correct, as we now show. As in section II, our model is:

$$y_1 = c_1 + z_1\beta_1 + \alpha_1 y_2 + \delta_1 y_2 a_1 + u_1 \qquad (2.9)'$$
$$y_2 = z\pi_2 + v_2 = z_1\pi_{21} + z_2\pi_{22} + v_2 \qquad (2.10)$$
$$y_3 = \max(0,\ z\pi_3 + v_3) \qquad (2.11)$$

where $c_1$ in (2.9)' is the intercept. We also need certain assumptions:

Assumptions 2.2

(i) $E(u_1 \mid z) = 0$
(ii) $E(v_2 \mid z) = 0$, $E(v_2^2 \mid z) = \sigma_2^2$
(iii) $E(v_3 \mid z) = 0$, $Cov(v_2, v_3) = 0$
(iv) $E(a_1 \mid z, v_2, v_3) = E(a_1 \mid v_2, v_3) = \rho_1 v_2 + \rho_2 v_3$
(v) $E(u_1 \mid z, v_2, v_3) = \xi_1 v_2 + \xi_2 v_3$
(vi) $\pi_{22} \neq 0$ in (2.10).

Assumption (i) is an exogeneity assumption for $z$. Assumption (ii) implies that $v_2$ satisfies the zero conditional mean and homoskedasticity assumptions from standard linear regression analysis. Assumption (iii) means that $v_3$ has zero conditional mean and is uncorrelated with $v_2$. Assumption (iv) allows $y_2$ and $y_3$ to be correlated with $a_1$. Assumption (v) allows $y_2$ and $y_3$ to be correlated with $u_1$. Assumption (vi) is the standard rank condition for identification.
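One implication of (2.10) together with assumptions (ii) through (iv) is that the unconditional moment $E(a_1 y_2)$ equals $\rho_1\sigma_2^2$, because $z$ and $v_3$ contribute nothing on average. The simulation below is a rough check of that moment only. Everything in it is an assumption made for illustration: normal errors, a scalar $z$, the parameter values, and the function name are not taken from the text.

```python
import random


def simulate_mean_a1_y2(n, rho1, rho2, sigma2, seed=0):
    """Monte Carlo average of a1*y2 under an assumed data-generating process."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # hypothetical scalar exogenous variable
        v2 = rng.gauss(0.0, sigma2)    # E(v2|z)=0, Var(v2|z)=sigma2**2
        v3 = rng.gauss(0.0, 1.0)       # independent of v2, so Cov(v2,v3)=0
        e = rng.gauss(0.0, 1.0)        # noise in a1 unrelated to (z, v2, v3)
        a1 = rho1 * v2 + rho2 * v3 + e  # linear conditional mean, as in (iv)
        y2 = 1.0 + 0.5 * z + v2         # reduced form with assumed coefficients
        total += a1 * y2
    return total / n


estimate = simulate_mean_a1_y2(100_000, rho1=0.5, rho2=0.3, sigma2=2.0)
print(estimate, "vs. rho1*sigma2^2 =", 0.5 * 2.0 ** 2)
```

With these assumed values the simulated average should be close to 2.0, the value of $\rho_1\sigma_2^2$.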
We can show that by adding only the OLS residuals from equation (2.10) (denoted $\hat{v}_2$) and the Tobit residuals from equation (2.11) (denoted $\hat{v}_3$) as additional explanatory variables in the structural equation (2.9)', $\beta_1$ and $\alpha_1$ can be consistently estimated. First we need to show that $E(a_1 y_2 \mid z, v_2, v_3) = E(a_1 y_2)$. Under (2.10), (ii), (iii), (iv), and the law of iterated expectations,

$$E(a_1 y_2 \mid z, v_2, v_3) = E[E(a_1 y_2 \mid z, v_2, v_3) \mid z, v_2, v_3] = E[y_2\,E(a_1 \mid z, v_2, v_3) \mid z, v_2, v_3]$$
$$= E[z\pi_2(\rho_1 v_2 + \rho_2 v_3) \mid z, v_2, v_3] + E[v_2(\rho_1 v_2 + \rho_2 v_3) \mid z, v_2, v_3]$$
$$= E(z\pi_2\rho_1 v_2 \mid z, v_2, v_3) + E(z\pi_2\rho_2 v_3 \mid z, v_2, v_3) + E(\rho_1 v_2^2 \mid z, v_2, v_3) + E(\rho_2 v_2 v_3 \mid z, v_2, v_3)$$
$$= \rho_1 E(v_2^2 \mid z, v_2, v_3) = \rho_1\sigma_2^2 = E(a_1 y_2).$$

Given this result, we can write

$$y_1 = (c_1 + \delta_1\rho_1\sigma_2^2) + z_1\beta_1 + \alpha_1 y_2 + r_1, \qquad (2.12)$$

where $r_1 = [\delta_1 a_1 y_2 - \delta_1 E(a_1 y_2 \mid z, v_2, v_3)] + u_1$, and so

$$E(r_1 \mid z, v_2, v_3) = E(u_1 \mid z, v_2, v_3) = \xi_1 v_2 + \xi_2 v_3 \qquad (2.13)$$

by assumption (v). It follows that we can write

$$E(y_1 \mid z, v_2, v_3) = (c_1 + \delta_1\rho_1\sigma_2^2) + z_1\beta_1 + \alpha_1 y_2 + \xi_1 v_2 + \xi_2 v_3. \qquad (2.14)$$

Now let $s$ be a binary indicator such that $s = 1[y_3 > 0]$, which is a function of $(z, v_3)$. Then

$$E(y_1 \mid z, v_2, v_3, s = 1) = (c_1 + \delta_1\rho_1\sigma_2^2) + z_1\beta_1 + \alpha_1 y_2 + \xi_1 v_2 + \xi_2 v_3. \qquad (2.15)$$

Equation (2.14) leads to the following three-step procedure.

PROCEDURE 2.2

(i) Run OLS of $y_2$ on $z$ to get $\hat{\pi}_2$. Obtain the OLS residuals $\hat{v}_2 = y_2 - z\hat{\pi}_2$.

(ii) Obtain $\hat{\pi}_3$ from Tobit of $y_3$ on $z$ using all observations. Obtain the Tobit residuals $\hat{v}_3 = y_3 - z\hat{\pi}_3$ for $y_3 > 0$.

(iii) Estimate $\beta_1, \alpha_1, \xi_1, \xi_2$ from the OLS regression of $y_1$ on $z_1, y_2, \hat{v}_2, \hat{v}_3$ using the selected subsample.

It turns out that Procedure 2.1, like Garen's method in the pure self-selection case, is not necessary for consistent estimation of $\beta_1$ and $\alpha_1$. Both Procedure 2.1 and Procedure 2.2 estimate the returns to education consistently in the application of section IV.

VI. Conclusion

This chapter shows how to test and correct for self-selection and sample selection biases at the same time, especially when an explanatory variable, such as education, interacts with an unobservable.
The procedure developed in this chapter to test and correct for potential self-selection and sample selection biases is the following:

1. If self-selection and sample selection problems exist at the same time, use the two methods provided in the previous sections (Procedures 2.1 and 2.2) to test for and correct the biases.

2. If only self-selection appears to be a problem, add $\hat{v}_2$ as an additional regressor and run OLS. If $\hat{v}_2$ is significant, so that self-selection bias is present, use Garen's method or 2SLS.

It should be noted that Procedures 2.1 and 2.2 produce very similar results in both data sets, especially in the larger CPS data set. This supports our claim that the estimator in Procedure 2.2 is consistent even when an interaction term appears as a regressor.

In this chapter we have avoided full parametric assumptions in deriving the sample selection and self-selection procedures. Nevertheless, we have assumed that the selection variable follows a standard Tobit equation and that the conditional expectation for unobservables is linear. In addition, we have assumed independence between disturbances and exogenous variables instead of the weaker and preferable zero conditional mean assumptions.

For further research, it may be possible to relax some of the parametric assumptions to make the estimation more robust. Powell (1984) extends least absolute deviations (LAD) estimation to regression with a nonnegative dependent variable, and gives conditions under which the estimator is consistent and asymptotically normal. The LAD estimator minimizes the sum of absolute deviations of the selection variable from its median function, under the assumptions that the error term is continuously distributed with median zero and that the median is unique. The normality restriction on the error term can be relaxed in this setting. It will be interesting to explore and apply LAD to our case.
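For a censored outcome $y = \max(0, x\beta + u)$ with median-zero $u$, the conditional median of $y$ is $\max(0, x\beta)$, so the censored LAD objective is $\sum_i |y_i - \max(0, x_i\beta)|$. The sketch below only illustrates that objective. It assumes a single regressor, made-up noise-free data, and a crude grid search; none of this is part of the procedures in this chapter, and the function names are hypothetical.

```python
def clad_objective(beta, xs, ys):
    """Censored LAD criterion: sum of |y_i - max(0, x_i * beta)|."""
    return sum(abs(y - max(0.0, x * beta)) for x, y in zip(xs, ys))


def clad_grid_search(xs, ys, grid):
    """Return the grid point with the smallest censored LAD criterion."""
    return min(grid, key=lambda b: clad_objective(b, xs, ys))


# Made-up data generated as y = max(0, 2*x) with no noise.
xs = [-1.0, 0.5, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 4.0, 6.0]
grid = [i * 0.5 for i in range(7)]   # candidate slopes 0.0, 0.5, ..., 3.0
beta_hat = clad_grid_search(xs, ys, grid)
print(beta_hat, clad_objective(beta_hat, xs, ys))
```

In practice the criterion is not smooth in $\beta$, so a serious implementation would use a dedicated optimization routine rather than a grid.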
The empirical results for the PSID data set show evidence of self-selection bias but no strong evidence of sample selection bias. The simple OLS regression using only women who work cannot produce consistent estimates, and the estimated return to education is biased upward. By contrast, 2SLS provides consistent estimates when sample selection bias is not a problem. In the CPS data set, both self-selection and sample selection biases exist. The method we suggest should be used to correct self-selection bias along with sample selection bias. The OLS estimate of the return to education is biased downward. This indicates that the OLS bias has no fixed direction.

Also, the estimated return to education by OLS is about the same in both data sets (10.75% for the PSID data set and 9.9% for the CPS data set) before the self-selection and sample selection problems are controlled for. After correcting for self-selection and sample selection biases, the return to education decreases to 7.98% in the PSID data set and increases to 13.21% in the CPS data set. From the estimation results in Tables 2.2 and 2.5, this OLS bias in the return to education comes mainly from self-selection.

Although it is widely believed that the estimated return to education should go down after correcting for sample selection along with self-selection, a growing body of research supports our empirical finding for the CPS data set that the OLS estimate is biased downward. Angrist and Krueger (1991), in their research on returns to schooling, reach the same result: OLS estimates are biased downward relative to instrumental variables estimates.4 David Card (1993) also finds that instrumental variables estimates exceed the OLS estimate and suggests that the result may come from measurement error in schooling.

The method we develop in this chapter is easy to implement in conventional empirical studies. It can correct self-selection and sample selection biases at the same time. It is hoped that researchers in many fields might benefit from it when analyzing empirical questions.

4 Other studies include Ashenfelter and Krueger (1992), Kane and Rouse (1993), and Butcher and Case (1993). All the above studies report instrumental variables estimates of the return to schooling that exceed the ordinary least-squares estimate.

Table 2.1 Descriptive Statistics for the PSID Data Set

| Variable | Observations | Mean | Standard Deviation | Minimum | Maximum |
|---|---|---|---|---|---|
| hours | 753 | 740.58 | 871.31 | 0 | 4950 |
| kidslt6 | 753 | .238 | .5240 | 0 | 3 |
| kidsge6 | 753 | 1.35 | 1.32 | 0 | 8 |
| age | 753 | 42.54 | 8.07 | 30 | 60 |
| educ | 753 | 12.29 | 2.28 | 5 | 17 |
| wage | 753 | 2.37 | 3.24 | 0 | 25 |
| husage | 753 | 45.12 | 8.06 | 30 | 60 |
| huseduc | 753 | 12.49 | 3.02 | 3 | 17 |
| motheduc | 753 | 9.25 | 3.37 | 0 | 17 |
| fatheduc | 753 | 8.81 | 3.57 | 0 | 17 |
| exper | 753 | 10.63 | 8.07 | 0 | 45 |
| nwifeinc | 753 | 20.13 | 11.63 | -.03 | 96 |
| lwage | 428 | 1.19 | .72 | -2.05 | 3.22 |
| expersq | 753 | 178.04 | 249.63 | 0 | 2025 |

Note: hours is annual hours worked. kidslt6 is the number of kids less than 6 years of age. kidsge6 is the number of kids greater than 6 years of age. wage is the hourly wage rate. educ is years of education. husage is the age of the husband. huseduc is the husband's years of education. motheduc is the mother's years of education. fatheduc is the father's years of education. exper is experience. nwifeinc is non-wife income. lwage is log wage. expersq is experience squared.

Table 2.2 The Estimation of Different Specifications (PSID Data Set)

| | (1) OLS | (2) OLS + $\hat{v}_2$ | (3) 2SLS | (4) Garen | (5) Procedure 2.2 | (6) Procedure 2.1 |
|---|---|---|---|---|---|---|
| Constant | -.5220* (.1986) | -.1869 (.2836) | -.1869 (.2854) | -.1665 (.2829) | -.0790 (.2888) | -.1572 (.3256) |
| educ | .1075* (.0141) | .0804* (.0216) | .0804* (.0217) | .0764* (.0217) | .0765* (.0216) | .0798* (.0245) |
| exper | .0416* (.0132) | .0431* (.0132) | .0431* (.0133) | .0429* (.0131) | .0388* (.0137) | .0388* (.0137) |
| exper² | -.0008* (.0004) | -.0009* (.0004) | -.0009* (.0004) | -.0009* (.0004) | -.0008* (.0004) | -.0008* (.0004) |
| educ·$\hat{v}_2$ | - | - | - | .0107* (.0056) | - | .0108* (.0056) |
| educ·$\hat{v}_3$ | - | - | - | - | - | -.00001 (.00002) |
| $\hat{v}_2$ | - | .0472* (.0286) | - | -.0820 (.0736) | .0519* (.0285) | -.0795 (.0732) |
| $\hat{v}_3$ | - | - | - | - | -.00006 (.00004) | .0001 (.0002) |
| R-Square | .1568 | .1622 | .1495 | .1694 | .1661 | .1746 |

Notes: Dependent variable is log hourly wage. educ is years of education. exper is experience. $\hat{v}_2 = educ - z\hat{\pi}_2$. $\hat{v}_3$ is the Tobit residual. Standard errors in parentheses. * : significant at the 10% level. General I (Procedure 2.2) corrects for both self-selection and sample selection biases in a linear model, without the interaction between years of education and ability.

Table 2.3 Joint Tests for Self-Selection and Sample Selection Biases (PSID Data Set)

| H0 | F value | Prob > F |
|---|---|---|
| educ·$\hat{v}_2$ = 0, educ·$\hat{v}_3$ = 0, $\hat{v}_2$ = 0, $\hat{v}_3$ = 0 | 2.27 | 0.0612 |
| educ·$\hat{v}_2$ = 0, $\hat{v}_2$ = 0 | 3.54 | 0.0299 |
| educ·$\hat{v}_3$ = 0, $\hat{v}_3$ = 0 | 1.32 | 0.2688 |
| educ·$\hat{v}_2$ = 0, educ·$\hat{v}_3$ = 0 | 2.18 | 0.1145 |

Notes: educ denotes years of education. $\hat{v}_2 = educ - z\hat{\pi}_2$. $\hat{v}_3$ is the Tobit residual.

Table 2.4 Descriptive Statistics for the Hamermesh Data Set

| Variable | Observations | Mean | Standard Deviation | Minimum | Maximum |
|---|---|---|---|---|---|
| hours | 5634 | 20.72 | 19.40 | 0 | 120 |
| kidslt6 | 5634 | .279 | .4487 | 0 | 1 |
| kidsge6 | 5634 | .308 | .4615 | 0 | 1 |
| age | 5634 | 39.43 | 9.99 | 18 | 59 |
| educ | 5634 | 12.98 | 2.62 | 0 | 18 |
| wage | 3286 | 10.37 | 7.03 | .03 | 200 |
| husage | 5634 | 42.45 | 11.23 | 19 | 86 |
| huseduc | 5634 | 13.15 | 2.98 | 0 | 18 |
| exper | 5634 | 20.44 | 10.45 | 0 | 52 |
| nwifeinc | 5634 | 30269.23 | 27211.58 | 0 | 112500 |
| lwage | 3286 | 2.20 | .525 | -3.40 | 5.30 |
| expersq | 5634 | 527.04 | 468.29 | 0 | 2704 |

Note: hours is weekly hours worked. kidslt6 is a dummy variable indicating kids less than 6 years of age. kidsge6 is a dummy variable indicating kids greater than 6 years of age. wage is the hourly wage rate. educ is years of education. husage is the age of the husband. huseduc is the husband's years of education. exper is experience. nwifeinc is non-wife annual income. lwage is log wage. expersq is experience squared.

Table 2.5 The Estimation of Different Specifications (CPS Data Set)

| | (1) OLS | (2) OLS + $\hat{v}_2$ | (3) 2SLS | (4) Garen | (5) Procedure 2.2 | (6) Procedure 2.1 |
|---|---|---|---|---|---|---|
| Constant | .6504* (.0587) | .2044* (.0950) | .2044* (.0967) | .1895* (.0950) | .1901* (.0921) | .1629 (.1143) |
| educ | .0990* (.0035) | .1305* (.0063) | .1305* (.0065) | .1307* (.0063) | .1309* (.0061) | .1321* (.0078) |
| exper | .0198* (.0033) | .0195* (.0033) | .0195* (.0033) | .0195* (.0033) | .0186* (.0033) | .0187* (.0033) |
| exper² | -.0003* (.00008) | -.0003* (.00008) | -.0003* (.00008) | -.0003* (.00008) | -.0003* (.00008) | -.0003* (.00008) |
| educ·$\hat{v}_2$ | - | - | - | .0036* (.0009) | - | .0031* (.0011) |
| educ·$\hat{v}_3$ | - | - | - | - | - | -.00008 (.00019) |
| $\hat{v}_2$ | - | -.0453* (.0076) | - | -.0912* (.0143) | .0426* (.0073) | -.0831* (.0163) |
| $\hat{v}_3$ | - | - | - | - | -.0016* (.0007) | .0026 (.0027) |
| R-Square | .2047 | .2132 | .1854 | .2166 | .2147 | .2177 |

Notes: Dependent variable is log hourly wage. educ is years of education. exper is experience. $\hat{v}_2 = educ - z\hat{\pi}_2$. $\hat{v}_3$ is the Tobit residual. Standard errors in parentheses. * : significant at the 10% level. General I (Procedure 2.2) corrects for both self-selection and sample selection biases in a linear model, without the interaction between years of education and ability.

Table 2.6 Joint Tests for Self-Selection and Sample Selection Biases (CPS Data Set)

| H0 | F value | Prob > F |
|---|---|---|
| educ·$\hat{v}_2$ = 0, educ·$\hat{v}_3$ = 0, $\hat{v}_2$ = 0, $\hat{v}_3$ = 0 | 13.64 | 0.0000 |
| educ·$\hat{v}_2$ = 0, $\hat{v}_2$ = 0 | 21.42 | 0.0000 |
| educ·$\hat{v}_3$ = 0, $\hat{v}_3$ = 0 | 2.78 | 0.0622 |
| educ·$\hat{v}_2$ = 0, educ·$\hat{v}_3$ = 0 | 6.34 | 0.0018 |

Notes: educ denotes years of education. $\hat{v}_2 = educ - z\hat{\pi}_2$. $\hat{v}_3$ is the Tobit residual.

Table 2.7 The Estimation of Different Specifications (PSID Data Set without Parents' Education as Instruments)

| | (1) OLS | (2) OLS + $\hat{v}_2$ | (3) 2SLS | (4) Garen | (5) General I | (6) Procedure 2.1 |
|---|---|---|---|---|---|---|
| Constant | -.5220* (.1986) | -.2981 (.3094) | -.2981 (.3189) | -.2625 (.3090) | -.2106 (.3027) | -.2713 (.3395) |
| educ | .1075* (.0141) | .0894* (.0238) | .0894* (.0231) | .0837* (.0239) | .0880* (.0232) | .0895* (.0261) |
| exper | .0416* (.0132) | .0426* (.0132) | .0426* (.0132) | .0427* (.0132) | .0375* (.0138) | .0377* (.0137) |
| exper² | -.0008* (.0004) | -.0008* (.0004) | -.0008* (.0004) | -.0008* (.0004) | -.0007* (.0004) | -.0007* (.0004) |
| educ·$\hat{v}_2$ | - | - | - | .0104* (.0056) | - | .0106* (.0055) |
| educ·$\hat{v}_3$ | - | - | - | - | - | -.00001 (.00002) |
| $\hat{v}_2$ | - | .0280 (.0296) | - | -.0977 (.0725) | .0293 (.0288) | -.0990 (.0722) |
| $\hat{v}_3$ | - | - | - | - | -.00006 (.00004) | .0001 (.0002) |
| R-Square | .1568 | .1586 | .1536 | .1657 | .1625 | .1712 |

Notes: Dependent variable is log hourly wage. educ is years of education. exper is experience. $\hat{v}_2 = educ - z\hat{\pi}_2$. $\hat{v}_3$ is the Tobit residual. Standard errors in parentheses. * : significant at the 10% level. General I corrects for both self-selection and sample selection biases in a linear model, without the interaction between years of education and ability.

Chapter 3

MODEL SELECTION TESTS FOR TWO-PART MODELS

I. Introduction

In econometrics, much of the literature on estimation and inference from a sample of economic data deals with the situation in which the statistical model is correctly specified. Consequently, in much of econometric practice, it is customary to assume that the parameterized linear statistical model used for inference is consistent with the sampling process that generated the sample observations. In this ideal case, statistical theory provides techniques for obtaining point and interval estimators of the population parameters and for hypothesis testing.
In practice, however, the possibilities for model misspecification are numerous, and false statistical models are most likely the rule rather than the exception.

Choosing between two competing models is relatively straightforward when one model is nested within the other. Assuming that standard regularity conditions hold, Wald, Lagrange multiplier, and likelihood ratio tests (or quasi-LR tests) can be used. See, for example, Lin and Schmidt (1984). See also Chow (1983, Ch. 9).

When competing models are not strictly nested, meaning that one model cannot be obtained from the other by imposing appropriate restrictions or as a suitable limiting approximation, choosing between them statistically is more difficult. Two very different approaches have been proposed. One is based on the work of Cox (1961, 1962), who derived specification tests that use information about a specific alternative and test whether the null can predict the performance of the alternative. This procedure tests one model, assumed a priori to be the true model, against the competing model. The other approach is an LR-based test developed by Vuong (1989), which adopts the classical hypothesis testing framework and uses the Kullback-Leibler (1951) Information Criterion (KLIC), which measures the distance between a given distribution and the true distribution. Compared to Cox's approach, Vuong's approach does not assume a true model in advance. Instead, the two competing models are compared on the same basis.

In this chapter I explore the use of Vuong's model selection tests in the context of two competing hurdle, or two-tier, models. In particular, I am interested in models that explain Tobit-like outcomes: the response variable can be zero with positive probability, but it is continuous for strictly positive values. Such variables often arise from economic optimization. Good examples include annual labor supply and the amount of charitable contributions.
For some nontrivial segment of the population, the optimal choice is zero: a so-called corner solution.

Two-tier models offer more flexibility than the most common model for corner solution applications, the standard, or Type I, Tobit model. Unlike the standard Tobit, they allow a different set of parameters for each tier, which is more reasonable economically. As an example of a study that tests between a two-tier model and the standard Tobit model, Lin and Schmidt (1984) studied the two-tier model and developed a test for selecting between a two-tier model and the standard Tobit model when the models are nested.

This chapter provides tests and simulations for selecting among alternatives to the Tobit specification of censored regression models. Censored regression models are used when the dependent variable is partly continuous but piles up at one or more points with positive probability. We can group censored regression models into two categories: data censoring and corner solution applications.

In the first category, there is a well-defined variable $y^*$ and we are interested in the population regression $E(y^* \mid x)$, where $x$ represents exogenous variables. If $y^*$ and $x$ were observed for everyone in the population, we could simply apply ordinary least squares or nonlinear least squares. If $y^*$ is censored above or below some value, a data problem arises. An example is when $y^*$ is the wage rate and the wage is top-coded in the survey: the actual wage rate is recorded up to some threshold, such as $100, but beyond that only the fact that the wage rate exceeded $100 is recorded.

The second category of censored regression models is the corner solution application, which appears more often in econometrics. Suppose $y$ is an observable choice variable of an individual. If $y$ is continuous, then we can usually apply standard procedures such as ordinary least squares.
But if $y$ is something like individual consumption of a particular good, it will be optimal for some individuals not to consume the good at all, so $y = 0$. Another example is workers deciding whether or not to work, where it is optimal for some workers not to work at all. In cases of this kind, the distribution of $y$ in the population is continuous for $y > 0$, but $y$ is zero with positive probability. Data observability is not the issue in this category of censored regression models. The focus is on expectations of $y$ such as $E(y \mid x)$ and $E(y \mid x, y > 0)$.

In this chapter we focus on the corner solution category of censored regression models. As noted, the main feature of these models is that the range of the dependent variable is essentially continuous yet has a lower bound, say $y = 0$, and observations pile up at this point. The fact that the observed outcome of $y$ might be zero simply reflects the choice of an individual. If we assume $E(y \mid x) = x\beta$ and apply OLS to a random sample, then the OLS estimate of $\beta$ will be consistent if $E(y \mid x)$ is in fact linear. However, the predicted value of $y$ can be negative for some combinations of $x$ and $\beta$. Also, a transformation such as $\log(y)$ does not work because $\log(0)$ is not defined. In fact, $E(y \mid x)$ is not linear in many cases.

Since the usual regression model cannot be used in this situation, special treatment is required. The Tobit model was created to deal with censored regression models. For the population, we can write the standard censored Tobit model as

$$y^* = x\beta + u, \qquad u \mid x \sim \text{Normal}(0, \sigma^2) \qquad (3.1)$$
$$y = \max(0, y^*) \qquad (3.2)$$

For the case of true data censoring, $E(y^* \mid x)$ is of interest and estimating $\beta$ is what we need. In the corner solution case, we are interested in $E(y \mid x)$ or $E(y \mid x, y > 0)$, which depend on $\beta$ nonlinearly rather than linearly.
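To make the nonlinearity concrete: under (3.1) and (3.2), the standard results are $E(y \mid x, y > 0) = x\beta + \sigma\lambda(x\beta/\sigma)$ with $\lambda(c) = \phi(c)/\Phi(c)$, and $E(y \mid x) = \Phi(x\beta/\sigma)\,x\beta + \sigma\phi(x\beta/\sigma)$. The short sketch below evaluates both; the function names and the example inputs are chosen here for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def tobit_mean_positive(xb, sigma):
    """E(y | x, y > 0) = xb + sigma * phi(c) / Phi(c), with c = xb / sigma."""
    c = xb / sigma
    return xb + sigma * _STD_NORMAL.pdf(c) / _STD_NORMAL.cdf(c)


def tobit_mean(xb, sigma):
    """E(y | x) = Phi(c) * xb + sigma * phi(c), with c = xb / sigma."""
    c = xb / sigma
    return _STD_NORMAL.cdf(c) * xb + sigma * _STD_NORMAL.pdf(c)


# Both means stay positive even when the index xb is negative.
for xb in (-1.0, 0.0, 1.0):
    print(xb, tobit_mean(xb, 1.0), tobit_mean_positive(xb, 1.0))
```

The two functions are linked by $E(y \mid x) = \Phi(x\beta/\sigma)\,E(y \mid x, y > 0)$, which makes explicit that a single index $x\beta/\sigma$ drives both the participation probability and the conditional amount.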
However, a major limitation of Tobit model to handle comer solution application is that standard Tobit model uses the same mechanism to determine the choice between, say, 66 buying or not or working or not and the amount to buy or the amount to work. If we define y as the variable to be explained, that means Tobit model use only one set of procedure determining the choice between y=0 versus y>0 and the amount of y once y>0. That is not very reasonable because Tobit does not allow the determination of the amount of y when it is not zero to depend on different parameters or variables from those determining the probability of its being zero. Some alternatives to the censored T obit model have been suggested to improve the insufficiency of the standard censored Tobit model. These models allow the decision of y>0 versus y=0 to be separate from the decision of the amount of y given that y>0. These kinds of models are so called “two-tiered models” or “hurdle models”. The model suggested by Cragg (1971) is one of the most widely used hurdle models. Cragg suggested a model that nests the usual Tobit model to form a truncated normal distribution. A major advantage of this model is that it can be easily transferred to standard Tobit model. Because of Tobit model is a special case of the model suggested by Cragg, it is possible to test Tobit model versus Cragg‘s alternative. Fin and Schmidt (1984) derive the LM test for this purpose. There are other alternatives to the standard Tobit model that have been developed. A simpler lognorrnal distribution model is the most suggested one. Unlike Cragg’s alternative that truncate normal distribution at zero, lognonnal distribution model assumes that the dependent variable follows a lognonnal distribution when the dependent variable is greater than zero. The model is easier to consistently estimate the expectation y conditional on explanatory variables than Cragg‘s alternative. 
However, this model does not contain the standard Tobit model as a special case obtained by imposing parameter restrictions, so it is difficult to test the Tobit model against this alternative.

There are some methods that can be applied to test which model should be selected. White and Olson (1979) evaluated competing models by their mean squared error of prediction. Vuong (1989) compared the maximized values of the competing models' likelihood functions to select one model over the other.

This chapter uses the maximized likelihood comparison approach to select between the Cragg model and the lognormal model. We use three different data sets along with computer simulation to decide which model fits the data better. Section II discusses model specifications and model selection procedures. Section III contains the empirical work and results. Section IV is the conclusion.

II. Basic Framework

As we discussed in section I, the situation being considered can be divided into two tiers. First, a particular event may or may not occur for each observation. If it does not occur, the variable takes the value zero. If it does occur, we move to the second tier, where the event is associated with a continuous, positive random variable. We define y as the variable to be explained.

First we examine the Cragg model. Cragg suggests a normal distribution truncated at zero in the second tier, which ensures that the dependent variable y is positive. We assume

$$P(y = 0 \mid x) = 1 - \Phi(x\gamma), \tag{3.3}$$

where $\Phi(\cdot)$ is the standard normal cdf and $\gamma$ is a $K \times 1$ vector of parameters. The density conditional on y > 0 is assumed to be

$$f(y \mid x, y > 0) = [\Phi(x\beta/\sigma)]^{-1}\,[\phi(\{y - x\beta\}/\sigma)/\sigma], \quad y > 0, \tag{3.4}$$

where x is a $1 \times K$ vector of the values of the independent variables, $\beta$ is a $K \times 1$ vector of parameters, and $\phi(\cdot)$ is the standard normal pdf. The first equation gives the probability that y is zero or positive.
The second equation is the density function of y when y is greater than zero. The density of y given x is

$$f(y \mid x; \theta) = \{1 - \Phi(x\gamma)\}^{1[y=0]}\,\{\Phi(x\gamma)\,[\Phi(x\beta/\sigma)]^{-1}\,[\phi(\{y - x\beta\}/\sigma)/\sigma]\}^{1[y>0]}.$$

Cragg's model nests the usual Tobit model under the restriction $\gamma = \beta/\sigma$. The log-likelihood function of the truncated normal model for observation i is

$$
\begin{aligned}
l_i(\theta) &= \log(f_i(y \mid x; \theta)) \\
&= 1[y_i = 0]\log\{1 - \Phi(x_i\gamma)\} + 1[y_i > 0]\left\{\log\Phi(x_i\gamma) - \log\Phi(x_i\beta/\sigma) + \log\phi\!\left(\frac{y_i - x_i\beta}{\sigma}\right) - \log\sigma\right\} \\
&= 1[y_i = 0]\log\{1 - \Phi(x_i\gamma)\} + 1[y_i > 0]\left\{\log\Phi(x_i\gamma) - \log\Phi(x_i\beta/\sigma) - \tfrac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y_i - x_i\beta)^2\right\}
\end{aligned}
\tag{3.5}
$$

As a competitor to Cragg's model, we assume that, conditional on y > 0, log(y) follows a normal distribution. Thus,

$$P(y = 0 \mid x) = 1 - \Phi(x\gamma) \tag{3.6}$$

$$\log(y) \mid x, y > 0 \sim \mathrm{Normal}(x\beta, \sigma^2). \tag{3.7}$$

The first equation gives the probability of y being zero or positive and is identical to Cragg's model. The second equation says that, conditional on y > 0, $y \mid x$ follows a lognormal distribution. The density of y given x can be written as

$$f(y \mid x; \theta) = \{1 - \Phi(x\gamma)\}^{1[y=0]}\,\{\Phi(x\gamma)\,\phi[\{\log(y) - x\beta\}/\sigma]/(y\sigma)\}^{1[y>0]} \tag{3.8}$$

The log-likelihood function for observation i is

$$
\begin{aligned}
l_i(\theta) &= \log(f_i(y \mid x; \theta)) \\
&= 1[y_i = 0]\log\{1 - \Phi(x_i\gamma)\} + 1[y_i > 0]\left\{\log\Phi(x_i\gamma) - \log(y_i) - \tfrac{1}{2}\log(\sigma^2) - \tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}(\log(y_i) - x_i\beta)^2/\sigma^2\right\}
\end{aligned}
\tag{3.9}
$$

Unlike Cragg's model, this model does not nest the standard Tobit model. But it is easy to interpret. For example, $\beta_j$ is the semi-elasticity of y with respect to $x_j$, conditional on y > 0. This gives the model a more direct economic interpretation than Cragg's model.

The expectations $E(y \mid x)$ and $E(y \mid x, y > 0)$ also differ between the two models. In the lognormal model,

$$E(y \mid x, y > 0) = \exp(x\beta + \sigma^2/2), \qquad E(y \mid x) = \Phi(x\gamma)\exp(x\beta + \sigma^2/2),$$

and these are easily estimated given $\hat\beta$, $\hat\sigma^2$, and $\hat\gamma$.

Cragg's model and the lognormal model are not nested, so we cannot use standard tests. As discussed in the introduction, we could apply Cox's procedure to test one model, which is assumed to be true under $H_0$, against the other.
For the two-tier models studied here, this is very complicated and computationally expensive. Vuong (1989) suggested a likelihood ratio (LR) based test that uses the Kullback-Leibler (1951) Information Criterion (KLIC), which measures the distance between a given distribution and the true distribution. If the distance between a specified model and the true distribution is defined as the minimum of the KLIC over the distributions in the model, then the "best" model among a collection of competing models is the model that is closest to the true distribution (see, e.g., Vuong (1989)).

Suppose there are two competing models $F_\theta = \{f(y \mid x; \theta);\ \theta \in \Theta\}$ and $G_\gamma = \{g(y \mid x; \gamma);\ \gamma \in \Gamma\}$, where $F_\theta$ and $G_\gamma$ are conditional models and f and g are the conditional density functions. The models can be nested, non-nested, or overlapping. The model selection test takes as its null hypothesis

$$E^0[\log f(y \mid x; \theta_*)] = E^0[\log g(y \mid x; \gamma_*)],$$

meaning that the two models are equivalent, against

$$E^0[\log f(y \mid x; \theta_*)] > E^0[\log g(y \mid x; \gamma_*)],$$

meaning that $F_\theta$ is better than $G_\gamma$, or against

$$E^0[\log f(y \mid x; \theta_*)] < E^0[\log g(y \mid x; \gamma_*)],$$

meaning that $G_\gamma$ is better than $F_\theta$.

Although the quantities $E^0[\log f(y \mid x; \theta_*)]$ and $E^0[\log g(y \mid x; \gamma_*)]$ are unknown, they can be consistently estimated by (1/n) times the log-likelihood evaluated at the pseudo or quasi maximum likelihood estimator (MLE) (see, e.g., White (1982a)). It follows that (1/n) times the log likelihood ratio (LR) statistic is a consistent estimator of $E^0[\log f(y \mid x; \theta_*)] - E^0[\log g(y \mid x; \gamma_*)]$. Cox (1961, 1962) and White (1982b) showed that, if n is the sample size, then $n^{-1/2}$ times the LR statistic, properly centered and normalized, has a limiting standard normal distribution under the hypothesis that one of the competing models is correctly specified.
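The per-observation log-likelihoods (3.5) and (3.9), and the sample average of their difference that estimates the gap in expected log-likelihoods, can be sketched as follows. This is an illustrative sketch rather than the estimation code used for the results in this chapter: the function names are assumptions, the indices `xg` and `xb` stand for $x_i\gamma$ and $x_i\beta$, and no maximization step is shown.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def loglik_cragg(y, xg, xb, sigma):
    """Log-likelihood contribution (3.5) of Cragg's truncated normal model."""
    if y == 0:
        return math.log(1.0 - norm_cdf(xg))
    return (math.log(norm_cdf(xg))
            - math.log(norm_cdf(xb / sigma))
            - 0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (y - xb) ** 2 / (2.0 * sigma ** 2))


def loglik_lognormal(y, xg, xb, sigma):
    """Log-likelihood contribution (3.9) of the lognormal hurdle model."""
    if y == 0:
        return math.log(1.0 - norm_cdf(xg))
    return (math.log(norm_cdf(xg))
            - math.log(y)
            - 0.5 * math.log(sigma ** 2)
            - 0.5 * math.log(2.0 * math.pi)
            - 0.5 * (math.log(y) - xb) ** 2 / sigma ** 2)


def mean_log_ratio(logf, logg):
    """(1/n) times the LR statistic, given per-observation log densities."""
    n = len(logf)
    return sum(a - b for a, b in zip(logf, logg)) / n
```

In practice the two lists passed to `mean_log_ratio` would hold the log densities evaluated at each model's own maximum likelihood estimates. For extreme index values `norm_cdf` can round to 0 or 1, so a production version would need a numerically safer log-cdf.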
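To pick the better model and to derive the tests for model selection, the model with the minimum KLIC, which measures the distance between the true distribution and a specified model, should be chosen. For the conditional model $F_\theta$, the KLIC is defined as

$$KLIC(H^0_{Y|X}; F_\theta) = E^0[\log h^0(Y_t \mid X_t)] - E^0[\log f(Y_t \mid X_t; \theta_*)],$$

where $H^0_{Y|X}$ is the true conditional distribution of $Y_t$ given $X_t$ and $h^0(\cdot \mid \cdot)$ is the true conditional density of $Y_t$ given $X_t$. Since the first term does not depend on $F_\theta$, an equivalent measure is $E^0[\log f(Y_t \mid X_t; \theta_*)]$, with a larger value indicating a model closer to the truth. If we have two competing models, the model that is closest to the true conditional distribution should be chosen. Vuong therefore develops the following hypotheses:

$$H_0: E^0\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid X_t; \gamma_*)}\right] = 0, \tag{3.10}$$

meaning that $F_\theta$ and $G_\gamma$ are equivalent, against

$$H_f: E^0\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid X_t; \gamma_*)}\right] > 0, \tag{3.11}$$

meaning that $F_\theta$ is better than $G_\gamma$, or

$$H_g: E^0\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid X_t; \gamma_*)}\right] < 0, \tag{3.12}$$

meaning that $F_\theta$ is worse than $G_\gamma$.

Since the two competing models we are going to examine (the lognormal model and the Cragg alternative) are non-nested, the model selection tests for strictly non-nested models are:

$$\text{under } H_0: \quad n^{-1/2} LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n \xrightarrow{D} N(0, 1) \tag{3.13}$$

$$\text{under } H_f: \quad n^{-1/2} LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n \xrightarrow{a.s.} +\infty \tag{3.14}$$

$$\text{under } H_g: \quad n^{-1/2} LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n \xrightarrow{a.s.} -\infty \tag{3.15}$$

where

$$LR_n(\hat\theta_n, \hat\gamma_n) \equiv L^f_n(\hat\theta_n) - L^g_n(\hat\gamma_n) = \sum_{t=1}^{n}\log\frac{f(y_t \mid x_t; \hat\theta_n)}{g(y_t \mid x_t; \hat\gamma_n)},$$

and $\hat\theta_n$ and $\hat\gamma_n$ are the maximum likelihood estimators of $\theta_*$ and $\gamma_*$. Also, $L^f_n(\hat\theta_n) = \sup_{\theta \in \Theta} L^f_n(\theta)$, where $L^f_n(\theta)$ is the conditional log-likelihood function for the model $F_\theta$. A similar definition applies to $\hat\gamma_n$ for the conditional model $G_\gamma$. Finally,

$$\hat\omega_n \equiv \left\{\frac{1}{n}\sum_{t=1}^{n}\left[\log\frac{f(y_t \mid x_t; \hat\theta_n)}{g(y_t \mid x_t; \hat\gamma_n)}\right]^2\right\}^{1/2}.$$

The tests above provide a useful framework for model selection. We can choose a critical value c from the standard normal distribution at some significance level. If the value of $n^{-1/2} LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n$ is greater than c, then the model $F_\theta$ is preferred. If the value of $n^{-1/2} LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n$ is smaller than -c, then the model $G_\gamma$ is preferred.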
If the value of $n^{-1/2} LR_n(\hat\theta_n, \hat\gamma_n)/\hat\omega_n$ is between -c and c, then we cannot discriminate between the two competing models.

III. Empirical Result

To select between the lognormal model and Cragg's alternative, we divide this section into two parts. In the first part we carry out a simulation using computer-generated data sets to see how well the test works. In the second part we apply the test introduced in section II to the three data sets used in chapters one and two. We use the statistic implied by the null hypothesis,

$$H_0: \quad \frac{\sum_{i=1}^{n}(\log f_i - \log g_i)}{\sqrt{\sum_{i=1}^{n}(\log f_i - \log g_i)^2}} \sim N(0, 1),$$

to form the test and apply it to the different data sets.

(I) The Simulation Results

The simulation is designed as follows. We generate two kinds of data sets by computer: one follows the lognormal distribution and the other follows the truncated normal distribution.

(a) Lognormal distribution

The data sets that follow the lognormal distribution are generated as follows:

(i) Generate $X = (1, X_1, X_2)$, where 1 is the constant term, $X_1$ is a binary variable that is zero for around 70% of the observations and one for around 30%, and $X_2$ is a standard normal random variable.

(ii) Generate z = 1 if $X\gamma + u \geq 0$, and z = 0 otherwise, where u follows a standard normal distribution.

(iii) If z = 0 then y = 0. If z = 1, then $y = \exp(X\beta + v)$. Here v is generated to be correlated with u such that $v = .3u + .7k$, where k follows a normal distribution, $k \sim N(0, \sigma^2)$.

(b) Truncated normal distribution

(i) Follow steps (i) and (ii) in (a).

(ii) If z = 0 then y = 0. If z = 1, then $y = X\beta + v$, truncated at zero. Again, v is generated to be correlated with u such that $v = .3u + .7k$, where k follows a normal distribution, $k \sim N(0, \sigma^2)$.

For both kinds of data sets, we set the sample size at 200 and 1000. We also experiment with two sets of true values of $\gamma$ and $\beta$ (see footnote 5), along with two values of $\sigma^2$ (.2 and .5). To see whether the simulation results change when the variation of $X_2$ increases, we also repeat the whole simulation after doubling the variation of $X_2$.
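The test statistic and the lognormal data-generating steps (a)(i)-(iii) described above can be sketched as follows. This is a hedged illustration, not the program used for the reported simulations: all function names, the seed handling, and the returned labels are assumptions. The truncated normal design (b) is left out because the text does not say how the truncation at zero is imposed.

```python
import math
import random


def vuong_statistic(logf, logg):
    """Sum of log-density differences divided by the root of their sum of squares.

    Undefined (division by zero) if the two models give identical log
    densities for every observation.
    """
    d = [a - b for a, b in zip(logf, logg)]
    return sum(d) / math.sqrt(sum(v * v for v in d))


def select_model(stat, c=2.58):
    """Decision rule: 'F' if stat > c, 'G' if stat < -c, else 'undecided'."""
    if stat > c:
        return "F"
    if stat < -c:
        return "G"
    return "undecided"


def dot(x, coef):
    """Inner product of a regressor row and a coefficient vector."""
    return sum(a * b for a, b in zip(x, coef))


def simulate_lognormal(n, gamma, beta, sigma2, x2_scale=1.0, seed=0):
    """Generate n draws (x, y) following steps (a)(i)-(iii).

    x2_scale=2.0 corresponds to the '2*N(0,1)' design for X2.
    """
    rng = random.Random(seed)
    sd_k = math.sqrt(sigma2)
    data = []
    for _ in range(n):
        x1 = 1.0 if rng.random() < 0.3 else 0.0   # one for about 30%
        x2 = x2_scale * rng.gauss(0.0, 1.0)
        x = (1.0, x1, x2)
        u = rng.gauss(0.0, 1.0)
        if dot(x, gamma) + u >= 0.0:              # z = 1
            v = 0.3 * u + 0.7 * rng.gauss(0.0, sd_k)
            y = math.exp(dot(x, beta) + v)
        else:                                     # z = 0
            y = 0.0
        data.append((x, y))
    return data


# Illustrative use with the first parameter set and sigma^2 = .2.
sample = simulate_lognormal(200, (0.0, 1.0, 1.0), (0.0, 1.0, 1.0), 0.2)
```

A full replication would estimate both models by maximum likelihood on each simulated sample, pass the fitted per-observation log densities to `vuong_statistic`, and record the decision from `select_model` across replications.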
All of the combinations are shown in Table 3.1. There are sixteen combinations for each kind of data set. We run each combination 200 times to find the pattern.

Footnote 5: The first set is $\gamma = (0, 1, 1)$ and $\beta = (0, 1, 1)$. The second set is $\gamma = (0, -1, 1)$ and $\beta = (0, 1, 1)$.

We choose 2.58 as the critical value, so the significance level is 1%. If the calculated statistic is greater than 2.58, then the lognormal model is preferred. If the statistic is less than -2.58, then Cragg's truncated normal model is preferred. If the statistic is between -2.58 and 2.58, then there is no conclusion as to which model is better. The rates at which the test correctly selects the true model (the rejection rates) are shown in Tables 3.2 and 3.3.

In Table 3.2, the true model is the lognormal model. We have the following findings:

(1) The rejection rates decrease when we decrease the sample size from 1000 to 200. This is reasonable, since a bigger sample should give the test more power to distinguish the true model from the alternative model.

(2) The rejection rates decrease when $\sigma^2$ increases from .2 to .5. That is, it is harder for the test to pick the right model when the variance of the error term in the equation for y increases.

(3) The rejection rates decrease when the variation of the explanatory variable increases. That is, the test has less power to pick the right model when the variation of the explanatory variable increases.

In Table 3.3, the true model is Cragg's truncated normal model. The same test is used as in Table 3.2. We have the following findings:

(1) The rejection rates decrease as we decrease the sample size from 1000 to 200. This result is similar to the result in Table 3.2.

(2) The rejection rates do not change much when $\sigma^2$ or the variation of $X_2$ changes. The test is more stable in this part of our simulation.

(3) The specification $\gamma = (0, -1, 1)$ and $\beta = (0, 1, 1)$ may not be a good one when the true model is Cragg's truncated normal model.
The maximum likelihood procedure simply does not converge when maximizing the log-likelihood function for Cragg's truncated normal model.

Comparing the simulation results for the two kinds of models, we notice that Cragg's truncated normal model is more robust than the lognormal model, in the sense that the test seldom fails to select it when we change the variation of X and $\sigma^2$. Only when we decrease the sample size do its rejection rates decrease. This means that the possibility of picking the wrong model is higher when the true model is the lognormal model.

(II) The Real Data Sets Results

We run the test described above on the three data sets at hand to see which model fits better. These are exactly the data sets used in the previous two chapters. The first is the Moffitt data set, the second is the PSID data set, and the third is the CPS data set.

For Moffitt's data set, we use hours of work per week as the dependent variable. The independent variables include dummies for marital status, age, race, the number of family members, the number of children in the household who are younger than 6, the number of children in the household who are older than 6, years of education, the size of the local labor force, the employment fractions in manufacturing and in government in the census region of residence, and the flow of asset income. After carrying out the maximum likelihood procedures for both models, the calculated statistic is -8.10. Cragg's truncated normal model is preferred.

For the PSID data set, hours of work is the dependent variable. The explanatory variables include age, non-wife income, the number of kids younger than six, the number of kids older than six, years of education, experience, and the square of experience. The statistic is -6.8. Cragg's truncated normal model is preferred.

For the CPS data set, the dependent variable is hours of work.
The explanatory variables are age, non-wife income, race (a dummy variable), education, experience, the square of experience, husband's education, husband's experience, the number of kids younger than six, and the number of kids older than six. The statistic is -11.87. Again, Cragg's truncated normal model is preferred.

IV. Conclusion

In this chapter, we discuss the potential drawbacks of the Tobit model in handling corner solution applications of censored regression models. We consider two good alternatives to the Tobit model. One is a lognormal model and the other is Cragg's truncated normal model. To choose the model that best fits the data, we apply Vuong's LR based test for model selection. We first carry out a simulation and then apply the test to the data sets used in chapters one and two.

The simulation results show that the test can easily fail to select the lognormal model even when the lognormal model is the true model. So if the test shows that the lognormal model is preferred, the chance of being wrong is very small. On the other hand, if the test indicates that Cragg's truncated normal model is preferred, we still have to be careful. The results for the real data sets show that Cragg's truncated normal model is preferred in all three data sets, although, as the simulation suggests, this result should be interpreted with care.
Table 3.1 Combinations of the Simulation

                             Sample Size: 1000                  Sample Size: 200
                             X2 ~ N(0,1)     X2 ~ 2*N(0,1)      X2 ~ N(0,1)     X2 ~ 2*N(0,1)
γ = (0,1,1),  β = (0,1,1)    σ² = .2, .5     σ² = .2, .5        σ² = .2, .5     σ² = .2, .5
γ = (0,-1,1), β = (0,1,1)    σ² = .2, .5     σ² = .2, .5        σ² = .2, .5     σ² = .2, .5

Table 3.2 The Rejection Rate
True Model: Log Normal Model
Note: Critical value is 2.58

                             Sample Size: 1000                        Sample Size: 200
                             X2 ~ N(0,1)       X2 ~ 2*N(0,1)          X2 ~ N(0,1)       X2 ~ 2*N(0,1)
                             σ²=.2    σ²=.5    σ²=.2     σ²=.5        σ²=.2    σ²=.5    σ²=.2     σ²=.5
γ = (0,1,1),  β = (0,1,1)    100%     98%      56.25%    32.46%       100%     92.5%    27.37%    12.35%
                             (200)    (200)    (192)     (191)        (200)    (200)    (179)     (170)
γ = (0,-1,1), β = (0,1,1)    100%     99.5%    100%      100%         100%     99%      100%      100%
                             (200)    (200)    (153)     (136)        (200)    (200)    (191)     (190)

Table 3.3 The Rejection Rate
True Model: Truncated Normal Model
Note: Critical value is -2.58

                             Sample Size: 1000                        Sample Size: 200
                             X2 ~ N(0,1)       X2 ~ 2*N(0,1)          X2 ~ N(0,1)       X2 ~ 2*N(0,1)
                             σ²=.2    σ²=.5    σ²=.2     σ²=.5        σ²=.2    σ²=.5    σ²=.2     σ²=.5
γ = (0,1,1),  β = (0,1,1)    100%     100%     100%      100%         93%      87.5%    100%      99%
                             (200)    (200)    (200)     (200)        (200)    (200)    (200)     (200)
γ = (0,-1,1), β = (0,1,1)    *        *        *         *            *        *        *         *

Note: * means that the computation does not converge.

Chapter 4

CONCLUSION

In the previous three chapters, we explore several different topics for a Tobit model that takes the form

$$y_1 = z_1\gamma + v_1 \tag{4.1}$$

$$y_2 = \max(0, \alpha_0 y_1 + z_2\delta + u_2) \tag{4.2}$$

where equation (4.1) is the structural equation and equation (4.2) is the Tobit selection equation. $z_1$ and $z_2$ are vectors of exogenous variables.

The first two chapters concentrate on consistent estimation of equation (4.1). In chapter one we derive and apply an alternative method to test for and correct sample selection bias while estimating what is essentially the Type III Tobit model with endogenous explanatory variables in the structural equation. We apply a multi-step approach to the traditional Type III Tobit model, which is usually estimated by maximum likelihood. The multi-step approach has the advantages of being easy to compute and more robust than MLE. It also provides an easy t-test for sample selection bias at the same time.
However, if one needs the asymptotic variance matrices under this multi-step approach, a lot of computational effort may be required.

The NLS data set used by Moffitt (1984) is applied to our procedure. We find evidence of sample selection bias in our estimation, which differs from what Moffitt claims. After correcting the bias, the average return to education goes from 7.77% to 5.94%, which is similar to earlier findings. We can say that the average return will be overestimated if we do not correct the sample selection bias. The coefficient on hours also changes from .36% to 1.21% when we add the constructed variable $\hat{v}$ to the structural equation. This means that the effect of hours on the wage will be underestimated if we do not take the sample selection bias into account.

In chapter two we focus on the situation where there are both self-selection and sample selection. We show how to test and correct for self-selection and sample selection biases at the same time, especially in the case where an explanatory variable, such as education, is interacted with unobservables. As in chapter 1, we derive a multi-step procedure that is easier to compute and needs fewer assumptions than traditional maximum likelihood estimation. The procedure extends Garen's (1984) procedure to handle the sample selection problem in addition to the self-selection problem.

Two data sets are applied to this procedure. One is the PSID data set of 1975 used by Mroz (1987) in his study of the sensitivity of female labor supply to various assumptions. The other is the CPS data set of May 1991 compiled by Daniel Hamermesh. The empirical results for the PSID data set show evidence of self-selection biases but no strong evidence of sample selection biases. Simple OLS regression using only women who work cannot produce consistent estimates, and the estimate of the return to education is biased upward.
By contrast, 2SLS can provide consistent estimates when sample selection bias is not a problem. In the CPS data set, both self-selection biases and sample selection biases exist. The method we suggest should be used to correct self-selection biases along with sample selection biases. The OLS estimate of the return to education is biased downward. This indicates that OLS biases have no definite direction.

Chapter three shifts the focus to equation (4.2). We discuss possible model misspecification and explore a model selection test for choosing between two competing models. The potential drawbacks of the Tobit model in handling corner solution applications of censored regression models are also discussed. There are two alternatives to the Tobit model. One is the lognormal model and the other is Cragg's truncated normal model. To pick the one that fits the data better, Vuong's LR based test for model selection is applied.

The simulation results show that Vuong's model selection test can generally pick the correct model. However, we do find that the test is more likely to pick the wrong model when the lognormal model is the true model. So if the test shows that the lognormal model is preferred, the chance of being wrong is very small. On the other hand, we still have to be careful if the test indicates that Cragg's truncated normal model is preferred.

For further research, the normal distribution assumption for the error term in the selection equation in chapters one and two could be relaxed. Powell (1984) extends least absolute deviations (LAD) estimation to regression with a non-negative dependent variable, and gives conditions under which this estimator is consistent and asymptotically normal. It would be interesting to explore and apply LAD to our case. In chapter three, the model selection test assumes that there are only two competing models. It would be useful to generalize the procedures to the case where there are many competing models.
REFERENCES

Amemiya, T. (1985), Advanced Econometrics. Cambridge: Harvard University Press.

Angrist, J. D., and A. B. Krueger (1991), "Does Compulsory School Attendance Affect Schooling and Earnings?" Quarterly Journal of Economics 106(4), 979-1014.

Ashenfelter, O., and A. B. Krueger (1992), "Estimates of the Economic Return to Schooling for a New Sample of Twins." Princeton University Industrial Relations Section Working Paper #304.

Butcher, K. F., and A. Case (1993), "The Effect of Sibling Composition on Women's Education and Earnings." Unpublished Discussion Paper, Princeton University Department of Economics.

Card, D. (1993), "Using Geographic Variation in College Proximity to Estimate the Return to Schooling." NBER Working Paper No. 4483.

Chow, G. (1983), Econometrics. New York: McGraw-Hill.

Cox, D. R. (1961), "Tests of Separate Families of Hypotheses," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 105-123.

Cox, D. R. (1962), "Further Results on Tests of Separate Families of Hypotheses," Journal of the Royal Statistical Society, Series B, 24, 406-424.

Cragg, J. (1971), "Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods," Econometrica 39, 829-844.

Fin, T., and P. Schmidt (1984), "A Test of the Tobit Specification Against an Alternative Suggested by Cragg," Review of Economics and Statistics 66, 174-177.

Garen, J. (1984), "The Returns to Schooling: A Selectivity Bias Approach with a Continuous Choice Variable," Econometrica 52, 1199-1218.

Gronau, R. (1974), "Wage Comparisons - A Selectivity Bias." Journal of Political Economy 82:1119-43.

Hausman, J. (1980), "The Effects of Wages, Taxes, and Fixed Costs on Women's Labor Force Participation." Journal of Public Economics 14:161-94.

Heckman, J. (1974), "Shadow Prices, Market Wages, and Labor Supply." Econometrica 42:679-94.

Heckman, J. (1976), "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models." Annals of Economic and Social Measurement 5:475-92.

Kane, T. J., and C. E. Rouse (1993), "Labor Market Returns to Two- and Four-Year Colleges: Is a Credit a Credit and Do Degrees Matter?" Princeton University Industrial Relations Section Working Paper #311.

Kullback, S., and R. A. Leibler (1951), "On Information and Sufficiency," Annals of Mathematical Statistics 22, 79-86.

Larson, D. (1979), "Taxes in a Labor Supply Model with Joint Wage-Hours Determination: Comment." Econometrica 47:1311-13.

Lee, L. F., and R. P. Trost (1978), "Estimation of Some Limited Dependent Variable Models with Application to Housing Demand." Journal of Econometrics 8:357-82.

Lee, L. F. (1978), "Unionism and Wage Rates: A Simultaneous Equation Model with Qualitative and Limited Dependent Variables." International Economic Review 19:415-33.

Lewis, H. G. (1974), "Comments on Selectivity Biases in Wage Comparisons." Journal of Political Economy 82(6):1145-55.

Moffitt, R. (1984), "The Estimation of a Joint Wage-Hours Labor Supply Model." Journal of Labor Economics 2:550-566.

Mroz, T. A. (1987), "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica 55, 765-799.

Oi, W. (1962), "Labor as a Quasi-fixed Factor." Journal of Political Economy 70:538-55.

Powell, J. L. (1984), "Least Absolute Deviations Estimation for the Censored Regression Model." Journal of Econometrics 25:303-325.

Rosen, H. (1976), "Taxes in a Labor Supply Model with Joint Wage-Hours Determination." Econometrica 44:485-507.

Rosen, S. (1969), "On the Interindustry Wage and Hours Structure." Journal of Political Economy 77:249-73.

Roy, A. D. (1951), "Some Thoughts on the Distribution of Earnings." Oxford Economic Papers 3:135-46.

Vella, F. (1992), "Simple Tests for Sample Selection Bias in Censored and Discrete Choice Models," Journal of Applied Econometrics 7, 413-421.

Vella, F. (1993), "A Simple Estimator for Simultaneous Models with Censored Endogenous Variables," International Economic Review 34, 441-457.

Vuong, Q. (1989), "Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses," Econometrica 57, 307-333.

White, H., and L. Olson (1979), "Determinants of Wage Change on the Job: A Symmetric Test of Non-Nested Hypotheses," mimeo, University of Rochester.

White, H. (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica 48, 817-838.

White, H. (1982a), "Maximum Likelihood Estimation of Misspecified Models," Econometrica 50, 1-25.

White, H. (1982b), "Regularity Conditions for Cox's Test of Non-Nested Hypotheses," Journal of Econometrics 19, 301-318.

White, H. (1982c), "Instrumental Variables Regression with Independent Observations," Econometrica 50, 483-500.

Willis, R., and S. Rosen (1979), "Education and Self-Selection." Journal of Political Economy 87(5, Part 2):507-36.

Wooldridge, J. M. (1996), "Selection Corrections with a Censored Selection Variable," Working Paper, Michigan State University.

Wooldridge, J. M. (1997), "On Two Stage Least Squares Estimation of the Average Treatment Effect in a Random Coefficient Model," Working Paper, Michigan State University.

Wooldridge, J. M. (1998), "Econometric Analysis of Cross Section and Panel Data," Manuscript, Michigan State University.