ON THE ADEQUACY OF THE SARGAN APPROXIMATION TO THE NORMAL IN ECONOMETRIC MODELS

By

Mohammad-Ali Kafeei

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Economics

1984

ABSTRACT

ON THE ADEQUACY OF THE SARGAN APPROXIMATION TO THE NORMAL IN ECONOMETRIC MODELS

By

Mohammad-Ali Kafeei

The normal distribution is very commonly assumed in econometric models, but it causes some problems in models for which the likelihood function involves the normal c.d.f. Goldfeld and Quandt introduced a class of distributions, called Sargan distributions, to be used as an approximation to the normal; these have certain desirable properties, notably an analytically integrable c.d.f. We investigate the adequacy of these Sargan distributions in approximating the normal. This is done by measuring the "cost" resulting from the use of the Sargan distribution instead of the normal as the assumed distribution of the error terms. The cost is defined in terms of the asymptotic bias (or inconsistency) of the parameter estimates when the errors are actually normal. We prove that this "cost" is model dependent. There is no cost in the linear regression model, but in other models (such as sample selection models) there is a cost (a false distributional assumption causes inconsistency), and the cost goes up with the degree of censoring or truncation. We consider three different models, for both the unrealistic case of a known variance and the realistic case of an unknown variance, considering both first- and second-order univariate Sargan distributions. We also define a bivariate Sargan distribution, examine its properties, and investigate its adequacy as an approximation to the bivariate normal distribution. We provide tables for the bivariate Sargan and normal densities, compare them, and discuss the use of this Sargan distribution in a simple seemingly unrelated regression model.

The overall conclusion of this study is that if one believes that the true distribution of the error terms is normal, it does not pay very much to use the Sargan distribution. Especially in heavily censored (or truncated) samples, the cost is more than the benefits (in terms of computational savings) gained. But the potential computational saving is generally higher in the bivariate (or especially the multivariate) case than in the univariate case, so multivariate Sargan distributions may be worthy of further investigation.

To the memory of my father, and to my mother, brother and sisters.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude and appreciation to Professor Peter J. Schmidt, chairman of my dissertation committee, for his invaluable advice, suggestions, comments and criticism, as well as his encouragement and support, without which this study would not have been possible. I am also grateful to the other members of the committee, Professor Robert H. Rasche, Professor Daniel S. Hamermesh, and Professor Byron W. Brown, for their critical reading and useful comments and suggestions. I also would like to thank Mrs. Terie Snyder for typing the equations of the dissertation. Most of all, I am indebted to the members of my family for their moral and financial support.
TABLE OF CONTENTS

List of Tables

Chapter I    Introduction
    1.1  Historical Background
    1.2  Univariate Sargan Distribution: Definitions and Properties
    1.3  Outline of the Dissertation

Chapter II   Robustness of Normal MLE's to Sargan Errors
    2.1  Introduction
    2.2  The Censored Case (Tobit Model)
    2.3  The Truncated Dependent Variable Model
    2.4  The Binary (Probit) Model
    2.5  Conclusions

Chapter III  Robustness of Sargan MLE's to Normal Errors: First Order Case
    3.1  Introduction
    3.2  Linear Regression Model
    3.3  The Tobit Model
    3.4  The Truncated Dependent Variable Model
    3.5  Conclusions
    Appendix A

Chapter IV   Robustness of Sargan MLE's to Normal Errors: Second Order Case
    4.1  Introduction
    4.2  The Linear Regression Model
    4.3  The Tobit Model
    4.4  The Truncated Regression Model

Chapter V    Bivariate Sargan Distribution
    5.1  Introduction
    5.2  Definitions
    5.3  Density Comparisons
    5.4  A Simple Seemingly Unrelated Regression Model
    5.5  Conclusions
    Appendix A
    Appendix B
    Appendix C

Chapter VI   Conclusions

References

LIST OF TABLES

2.1  Asymptotic Bias of Tobit When True Errors are First Order Sargan
3.1  Asymptotic Bias of Sargan "MLE" When True Errors are N(0,1)
4.1  Asymptotic Bias of Second Order Sargan "MLE" When True Errors are N(0,1)
5.1  Comparison of Bivariate Sargan and Normal Densities (ρ = -.15)
5.2  Comparison of Bivariate Sargan and Normal Densities (ρ = -.10)
5.3  Comparison of Bivariate Sargan and Normal Densities (ρ = -.05)
5.4  Comparison of Bivariate Sargan and Normal Densities (ρ = 0.0)
5.5  Comparison of Bivariate Sargan and Normal Densities (ρ = .05)
5.6  Comparison of Bivariate Sargan and Normal Densities (ρ = .10)
5.7  Comparison of Bivariate Sargan and Normal Densities (ρ = .15)

Chapter 1

Introduction

1.1 Historical Background

The assumption of normality is very common in econometric and statistical work. For example, errors in regression models are often assumed to be normally distributed; hypotheses about means of random variables are tested under the assumption of normality; and in more complicated models, such as disequilibrium models or bivariate probit models, a multivariate normal distribution is assumed. To some extent, the assumption of normality can be justified by reliance on the central limit theorem. However, such justifications are seldom very rigorous, and it is probably fair to say that the frequent use of the normal distribution is in fact due to two other reasons.

First, in many common models the assumption of normality is very convenient. For example, in the linear regression model, the maximum likelihood estimator under normality is least squares, which is simple to calculate. Also, given normality, exact finite sample tests of linear hypotheses are possible. Alternative error distributions would lead to more complicated estimation and testing procedures, which would generally be justified only asymptotically. Second, in many common models inferences based on normality are asymptotically robust to non-normality. Again using the linear regression model as an example, tests which are exact in finite samples given normality are correct asymptotically given any non-normal distribution with finite mean and variance. However, in other models the assumption of normality may not be very convenient.
The cumulative distribution function of the normal distribution cannot be expressed in closed form, and therefore the normality assumption may not be the most convenient in models for which the likelihood function contains the c.d.f. of the error distribution. This is so in a wide class of models involving censoring, truncation or selection of the dependent variable (e.g., the Tobit model). This is especially true in multi-equation models (e.g., a multi-market disequilibrium model), since the c.d.f. of the multivariate normal distribution can be very expensive to evaluate. Such computational considerations led Goldfeld and Quandt (1981) to introduce a class of distributions, which they call "Sargan distributions", to be used as substitutes for the normal distribution in models for which estimation requires evaluation of the c.d.f. of the error distribution. The Sargan c.d.f. can be evaluated analytically, and its density is reasonably close to the normal density. Thus Sargan distributions may be reasonable candidates to use as an approximation to the normal distribution.

An obvious question to ask is how good an approximation to the normal distribution a Sargan distribution provides. One way to answer this question is simply to compare the densities (or c.d.f.'s) of the two distributions. Goldfeld and Quandt (1981, p. 145) provide in their Table 1 a comparison of the densities of N(0,1) and of a second-order Sargan distribution with mean zero, variance one, and the same density at zero as N(0,1). The agreement is reasonably close, except in the tails. Missiakoulis (1983, p. 227) provides in his Table 1 the same results, plus the density of the first-order Sargan distribution with variance equal to one (α = 2). As might be expected, the first-order Sargan distribution does not provide as good an approximation as does the second-order Sargan distribution.1

While such direct comparisons are informative and easy to make, they do not provide good evidence on the relevant statistical question, namely the effect on one's estimates or inferences of the use of such an approximation to the (hypothesized) true error distribution. This is a much harder question, in part because its answer clearly depends on the nature of the model in which the errors appear. For example, an incorrect assumption of normality does not cause bias or inconsistency of the parameter estimates in the linear regression model, but it does cause inconsistency in limited dependent variable models, such as the Tobit model. Conversely, the incorrect assumption that the errors have a Sargan distribution does not cause inconsistency in the linear regression model (as we prove later), but it does cause inconsistency in more complicated models such as the Tobit model. Indeed, it is an unfortunate fact that, for the class of models for which the Sargan assumption is convenient (models for which the likelihood function involves the error c.d.f.), the correctness of the assumed error distribution is vital, even asymptotically.

Some work has been done on the robustness of the normal maximum likelihood estimates to non-normality, in the Tobit model and some related models. Goldberger (1980) and Arabmazar and Schmidt (1982) consider the Tobit model, the truncated version of the Tobit model, and the probit model, and consider the asymptotic bias (inconsistency) which results when the normality assumption is incorrect.
The alternative distributions considered were Student's t, Laplace, and logistic, and the models considered contained only a constant term. The bias is sometimes substantial, so that this problem merits further attention. Missiakoulis (1983) has provided a similar analysis, for the probit model only, of the asymptotic bias resulting from an (incorrect) assumption that the errors have a Sargan distribution, when the normal is in fact correct.

In this thesis we consider some implications of the use of the Sargan distribution in econometric models. The plan of the thesis is given in section 1.3. Section 1.2 first defines Sargan distributions, and gives some of their properties, for later reference.

1.2 Univariate Sargan Distribution: Definitions and Properties

Goldfeld and Quandt (1981) define the family of Sargan densities as (their equation 2.1)

(1.1)  f(x) = (Dα/2) e^{-α|x|} (1 + Σ_{j=1}^{P} γ_j α^j |x|^j),

where α > 0, γ_j ≥ 0 for j = 1,2,...,P, and P is the order of the density. Alternatively, in the notation of equation (1) in Missiakoulis (1983),

(1.2)  f(x) = (Dα/2) e^{-α|x|} Σ_{j=0}^{P} γ_j α^j |x|^j,

with

(1.3)  D = ( Σ_{j=0}^{P} γ_j j! )^{-1}   and   γ_0 = 1.

Note that, as Goldfeld and Quandt and Missiakoulis mention, by setting γ_j = 0 for all j ≥ 1, f(x) reduces to the Laplace density, so the Sargan family is a generalization of the Laplace density. Also, Goldfeld and Quandt (1981, p. 144) note that for f'(x) to be continuous, one must impose γ_1 = 1.

The Sargan density is a symmetric, unimodal density function with x defined on (-∞, ∞). Its moment generating function can be derived as follows:

(1.4)  M(θ) = E(e^{θx}) = ∫_{-∞}^{∞} e^{θx} f(x) dx.

Define

A = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i ∫_{-∞}^{0} e^{(α+θ)x} (-x)^i dx.

Using the fact (see Gröbner and Hofreiter (1958), p. 55) that

(1.5)  ∫_{0}^{∞} x^n e^{-ax} dx = n!/a^{n+1},   a > 0,

we obtain

A = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i i!/(α+θ)^{i+1},

and similarly, if we define B as the corresponding integral over (0, ∞),

B = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i i!/(α-θ)^{i+1}.

Thus

(1.6)  M(θ) = A + B = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i i! [ 1/(α+θ)^{i+1} + 1/(α-θ)^{i+1} ].

Therefore, the rth moment of the Sargan distribution is (θ = 0)

(1.7)  μ_r = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i i! [ (-1)^r (i+1)(i+2)···(i+r) + (i+1)(i+2)···(i+r) ] / α^{i+r+1}
           = (D/(2α^r)) Σ_{i=0}^{P} (i+r)! γ_i [ (-1)^r + 1 ]
           = 0 if r is odd;   (D/α^r) Σ_{i=0}^{P} (i+r)! γ_i if r is even.

Equation (1.7) differs from equation (6) in Missiakoulis (1983) and from equation (2.3) of Goldfeld and Quandt (1981) by a constant factor.

The Sargan c.d.f. can be evaluated analytically as follows. For x < 0,

(1.8)  F(x) = ∫_{-∞}^{x} (Dα/2) e^{αy} Σ_{i=0}^{P} γ_i α^i (-y)^i dy
            = (D/2) e^{αx} Σ_{i=0}^{P} γ_i i! Σ_{j=0}^{i} (-αx)^j / j!,

and for x > 0, by symmetry of the density,

F(x) = 1 - (D/2) e^{-αx} Σ_{i=0}^{P} γ_i i! Σ_{j=0}^{i} (αx)^j / j!.

That is,

(1.9)  F(x) = (D/2) e^{αx} Σ_{i=0}^{P} γ_i i! Σ_{j=0}^{i} (-αx)^j / j!       if x < 0,
       F(x) = 1 - (D/2) e^{-αx} Σ_{i=0}^{P} γ_i i! Σ_{j=0}^{i} (αx)^j / j!   if x > 0.

The special cases of (1.9) for the first-order and second-order Sargan densities are given in chapter 3 (equation 3.18) and chapter 4 (equation 4.13). The first-order and second-order Sargan densities and their moments are as follows.

First-Order Case

(1.10)  f(x) = (α/4) e^{-α|x|} (1 + α|x|)

(1.11)  μ = 0,   var(x) = 4/α²

(1.12)  M(θ) = (1/4) [ α/(α+θ) + α/(α-θ) + α²/(α+θ)² + α²/(α-θ)² ]

Second-Order Case

(1.13)  f(x) = [α/(4(1+γ_2))] e^{-α|x|} (1 + α|x| + γ_2 α² x²)

(1.14)  μ = 0,   var(x) = 4(1+3γ_2) / (α²(1+γ_2))

(1.15)  M(θ) = [1/(4(1+γ_2))] [ α/(α+θ) + α/(α-θ) + α²/(α+θ)² + α²/(α-θ)² + 2γ_2α³/(α+θ)³ + 2γ_2α³/(α-θ)³ ]

Equation (1.15) is different from equation (2.3) of Goldfeld and Quandt (1981), as mentioned before.
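Because the first-order and second-order special cases above are used throughout the rest of the thesis, a short numerical check of these formulas is useful. The sketch below (written here in Python; function names are illustrative only) evaluates the general density (1.2) and c.d.f. (1.9) and confirms the variance formulas (1.11) and (1.14) for two parameter settings used later: α = 2 for the first-order case, and α = √6 with γ_2 = 1/3 for a second-order case (both give variance one; see footnote 2 to this chapter).

```python
import numpy as np
from math import factorial
from scipy.stats import norm

def sargan_pdf(x, alpha, gammas):
    """Sargan density (1.2): f(x) = (D*alpha/2) e^{-alpha|x|} sum_j gamma_j (alpha|x|)^j,
    with gammas[0] = 1 and D = 1 / sum_j gamma_j * j!  (equation (1.3))."""
    D = 1.0 / sum(g * factorial(j) for j, g in enumerate(gammas))
    ax = alpha * np.abs(np.asarray(x, dtype=float))
    poly = sum(g * ax**j for j, g in enumerate(gammas))
    return 0.5 * D * alpha * np.exp(-ax) * poly

def sargan_cdf(x, alpha, gammas):
    """Sargan c.d.f. in the closed form (1.9)."""
    D = 1.0 / sum(g * factorial(j) for j, g in enumerate(gammas))
    x = np.asarray(x, dtype=float)
    ax = alpha * np.abs(x)
    s = sum(g * factorial(i) * sum(ax**j / factorial(j) for j in range(i + 1))
            for i, g in enumerate(gammas))
    tail = 0.5 * D * np.exp(-ax) * s
    return np.where(x < 0, tail, 1.0 - tail)

# Numerical moments on a fine grid: first-order (alpha = 2) and second-order
# (alpha = sqrt(6), gamma_2 = 1/3) should both have variance approximately 1.
grid = np.linspace(-12.0, 12.0, 240_001)
dx = grid[1] - grid[0]
f1 = sargan_pdf(grid, 2.0, [1.0, 1.0])
f2 = sargan_pdf(grid, np.sqrt(6.0), [1.0, 1.0, 1.0 / 3.0])
print("first-order variance :", np.sum(grid**2 * f1) * dx)
print("second-order variance:", np.sum(grid**2 * f2) * dx)

# Pointwise comparison of the second-order density with N(0,1), as in
# Goldfeld and Quandt's Table 1 style comparison.
for z in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"x = {z:3.1f}  Sargan: {sargan_pdf(z, np.sqrt(6.0), [1, 1, 1/3]):.4f}"
          f"  N(0,1): {norm.pdf(z):.4f}")
```

The same two functions can be reused for any order P simply by lengthening the list of γ coefficients, which is the sense in which the Sargan family generalizes the Laplace.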
1.3 Outline of the Dissertation

The purpose of this study is to investigate the adequacy of the Sargan distribution as an approximation to the normal in econometric models. The measure of adequacy is a simple one; namely, the asymptotic bias (inconsistency) which results from the use of an incorrect assumption about the error distribution. Obviously other measures of adequacy of the approximation would be possible, but this one is informative and has been used profitably elsewhere.

The models in which we will investigate this question are all versions of what Heckman (1976) calls sample selection models. Consider a hypothetical linear relationship

y_i* = β'x_i + ε_i,   i = 1,2,...,n,

which would satisfy the usual ideal conditions if y_i* were observed for all i. In the censored regression model, we observe

(1.16)  y_i = max(0, y_i*) = y_i* if y_i* ≥ 0, and y_i = 0 if y_i* < 0,   i = 1,...,n.

(This is the Tobit model if ε is normal.) Observations with y_i* ≥ 0 will be called "complete" observations, while those with y_i = 0 will be called "limit" observations. In the truncated regression model we observe y_i = y_i* if and only if y_i* ≥ 0. In other words, we observe the complete observations but nothing about the limit observations (not even their number). Finally, in the binary regression model we observe

(1.17)  y_i = 1 if y_i* > 0, and y_i = 0 if y_i* < 0,   i = 1,...,n,

so we essentially observe only the sign of y_i*. (This is the probit model if ε is normal.)

In chapter 2 we investigate the robustness of the MLE's which assume normality to Sargan errors, in these three models. This is an extension of Goldberger (1981) and Arabmazar and Schmidt (1982), which we address mostly for the sake of completeness. Our main interest is in the opposite question, the robustness of MLE's which assume a Sargan distribution to normal errors. We investigate this question in the same three models plus the linear regression model. This is done in chapter 3 for first-order Sargan distributions and in chapter 4 for second-order Sargan distributions. In chapter 5 we consider how to generalize the Sargan distribution to the multivariate case. We consider alternative ways of doing so, and define a particular bivariate distribution with Sargan marginal distributions as a bivariate Sargan distribution. We make some comparisons of its density to that of the bivariate normal, and investigate the robustness of the MLE's based on the bivariate Sargan to the bivariate normal in a simple seemingly unrelated regressions model. Finally, chapter 6 contains our conclusions.

Footnotes

1) Missiakoulis also displays the densities of some higher-order Sargan distributions. These approximate the normal less well than the second-order Sargan distribution. This is possible because of the way the parameters of the higher-order distributions were chosen. In particular, whereas the Sargan density of order P contains parameters (α, γ_1, ..., γ_P), Missiakoulis assumes the γ's to be chosen a priori (see his equation (9)), so that for any P there is only one unknown parameter (α). Thus his second-order case is not nested in his third-order case, and so forth. If we allowed the Pth-order case to have P free parameters, and chose these to maximize the quality of the approximation (however defined), obviously we could do no worse by increasing the order of the approximating distribution.
2) Using (1.7) to calculate α when γ_2 = 1/3, P = 2 (second-order Sargan), and variance = 1: μ_2 = variance = D(2 + 6γ_1 + 24γ_2)/α² = 1, which with D = 3/8 gives α = √6 = 2.44949. This is different from the value used by Missiakoulis (1983, p. 227). (Note that γ_1 = 1 is required to ensure the uniqueness of the results.)

Chapter 2

Robustness of Normal MLE's to Sargan Errors

2.1 Introduction

The normal distribution has been assumed to be the true distribution of the error terms in a variety of different models, although in many cases there has not been a very good reason to believe or substantiate this assumption. This popularity of the normal distribution stems partly from computational considerations and partly from the robustness of the estimates based on normality. In certain models, such as the linear regression model, the MLE's based on normality are unbiased, BLUE and consistent even if the distribution of the errors is non-normal; only for testing hypotheses in small samples is the normality assumption crucial.

The casual use of the normality assumption in different models requires more attention. In this chapter we look at this issue in different models with a specific and true non-normal distribution of the error terms, namely first-order Sargan, as defined in chapter 1. This attempt can be seen as an extension of Goldberger (1981) and Arabmazar and Schmidt (1982), who raised the question of the robustness of the normal MLE's to non-normality in the censored (Tobit) and truncated regression models. Arabmazar and Schmidt concluded that "the bias from non-normality can be substantial", especially if the variance of the errors is not known. However, they considered only a few distributions (Laplace, logistic and Student's t), so it is interesting to perform a similar analysis under the new assumption that the actual distribution of the errors is Sargan. As in their work, the measure of robustness to be used here is the asymptotic bias (inconsistency) of the estimates.

In the case of the linear regression model, the normal maximum likelihood estimate (which is OLS) is unbiased and consistent. Therefore, in terms of our analysis, there is no cost in assuming normality in the linear regression case even if in fact that is not so. Goldfeld and Quandt (1981) consider the case of linear regression with the residuals actually being distributed as first-order Sargan, but incorrectly assumed to be normal. One important result from their paper is the fact that efficiency is lost; the relative efficiency of OLS to the Sargan MLE is .84.

In this chapter, I will consider in turn the censored, truncated and binary (probit) dependent variable models. In each section, after defining the model and its estimators, I discuss the results of numerical calculations of the inconsistency of the estimated parameters. (These are tabulated at the end of this chapter.) The chapter ends with some concluding remarks. In order to make the argument tractable and also comparable to Goldberger (1980), Arabmazar and Schmidt (1982), and Missiakoulis (1983), we assume that there is only one regressor, a constant term, in all the models under consideration here.

2.2 The Censored Case (Tobit Model)

The model to be considered here is

(2.1)  y_i* = μ + u_i,   i = 1, ..., n,

where the u_i are i.i.d. with zero mean and variance σ².
By assumption, we only observe y_i = max(0, y_i*), not y_i*. We wish to estimate μ and σ², the mean and variance of y*. The only thing that remains to be specified is the true distribution of u_i. We ask the question of what happens if we assume the errors to be normally distributed when in fact they are distributed as first-order Sargan. In other words, is the Tobit estimator robust to non-normality?

In order to answer this question, we can use equation (10) of Arabmazar and Schmidt (1982), and calculate the asymptotic bias of the Tobit estimates of μ and σ². For this we need the first two truncated moments of the Sargan distribution (as given for the other distributions in their Table 1); these are

(2.2)  E(u | u > -x) = e^{-αx}(αx² + 3x + 3/α) / [4 - e^{-αx}(2 + αx)]   if x ≥ 0,
       E(u | u > -x) = (αx² - 3x + 3/α) / (2 - αx)                       if x < 0,

and

(2.3)  E(u² | u > -x) = [16/α² - e^{-αx}(αx³ + 4x² + 8x/α + 8/α²)] / [4 - e^{-αx}(2 + αx)]   if x ≥ 0,
       E(u² | u > -x) = (-αx³ + 4x² - 8x/α + 8/α²) / (2 - αx)                                 if x < 0.

The estimated biases of the Tobit estimates are given in Table 2.1, for a variety of values of μ, under the heading "censored". The true value of α is assumed to be two, which corresponds to σ² = 1. The results for the truncated version of the Tobit model and also for the binary case (probit), which we will discuss in the next sections of this chapter, are also tabulated in Table 2.1. This table is similar to Tables II-V of Arabmazar and Schmidt (1982). We consider both the unrealistic case in which the variance is assumed to be known (so that the estimated α is set equal to two), and the realistic case in which σ² is unknown, so that μ and σ must both be estimated.

We will consider first the case of known variance. The results and the conclusions drawn from them are basically the same as those of Arabmazar and Schmidt for the t10 distribution. The bias is sometimes substantial, but it is heavily dependent upon the degree of censoring. That is, as the sample becomes largely complete, the asymptotic bias goes quickly to zero. This agrees with the fact (as it must) that there is no asymptotic bias in the case of the linear regression model. For example, for samples that are at least half complete (μ = 0, and above), there is virtually no bias. For samples that are at least 1/4 complete (μ = -1.2 and above), the bias is not substantial in the censored model (though it is so in the truncated dependent variable case, as we will see in the next section).

Letting the variance of the dependent variable be estimated, which is the more realistic case, the results are somewhat different (more pessimistic). Although the asymptotic bias largely disappears as the sample becomes half or more complete, the biases are much larger in this case than in the unrealistic one in which the variance was known. That is, the cost of misspecification of the distribution of the error terms is higher when we do not know the variance.

It is worth noting that in both cases (variance known and unknown), the bias of the normal MLE - which allows for correction for censoring - is much less than the bias of the sample mean. Thus it is better (less costly) to correct the mean than not to correct, even though the correction is biased. This result does not totally agree with the results of Arabmazar and Schmidt (1982). In their study, when the variance was unknown, the uncorrected sample mean was sometimes less biased than the normal MLE.
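The formulas (2.2) and (2.3) follow from integrating the first-order Sargan density (1.10) over (-x, ∞). Before turning to the estimated variance, a quick numerical check of these two expressions against direct numerical integration is sketched below (in Python; the function names are illustrative only, and α = 2 as in Table 2.1).

```python
import numpy as np
from scipy.integrate import quad

ALPHA = 2.0  # first-order Sargan scale used in Table 2.1 (variance 4/alpha^2 = 1)

def f1(u, a=ALPHA):
    """First-order Sargan density, equation (1.10)."""
    return 0.25 * a * np.exp(-a * abs(u)) * (1.0 + a * abs(u))

def truncated_moment_numeric(x, r, a=ALPHA):
    """E(u^r | u > -x) by brute-force numerical integration."""
    num, _ = quad(lambda u: u**r * f1(u, a), -x, np.inf)
    den, _ = quad(lambda u: f1(u, a), -x, np.inf)
    return num / den

def truncated_mean_closed(x, a=ALPHA):
    """Closed form (2.2)."""
    if x >= 0:
        return np.exp(-a * x) * (a * x**2 + 3 * x + 3 / a) / (4 - np.exp(-a * x) * (2 + a * x))
    return (a * x**2 - 3 * x + 3 / a) / (2 - a * x)

def truncated_second_closed(x, a=ALPHA):
    """Closed form (2.3)."""
    if x >= 0:
        num = 16 / a**2 - np.exp(-a * x) * (a * x**3 + 4 * x**2 + 8 * x / a + 8 / a**2)
        return num / (4 - np.exp(-a * x) * (2 + a * x))
    return (-a * x**3 + 4 * x**2 - 8 * x / a + 8 / a**2) / (2 - a * x)

for x in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(f"x = {x:5.2f}   (2.2): {truncated_mean_closed(x):8.4f} vs "
          f"{truncated_moment_numeric(x, 1):8.4f}   (2.3): "
          f"{truncated_second_closed(x):8.4f} vs {truncated_moment_numeric(x, 2):8.4f}")
```

The closed forms and the numerical integrals agree to integration accuracy, which is the only property of (2.2)-(2.3) used in constructing Table 2.1.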
The same kind of conclusions can be derived for the bias of the estimated variance; that is, its bias depends upon the degree of censoring. As the sample becomes half or more complete there is virtually no bias at all.

2.3 The Truncated Dependent Variable Model

We now turn to the case in which we have no information concerning the unobserved observations (limit observations); even the number of limit observations is not known. The model is still as defined in (2.1); however, we observe y_i = y_i* if and only if y_i* is non-negative. Again, as was the case in the last section, we assume we have only one regressor, a constant term, meaning we estimate the mean of the observations only. The question again is: what is the statistical cost of assuming normality of the error terms, when they are in fact distributed as first-order Sargan?

The numerical results are tabulated in Table 2.1 under the heading "truncated". The entries are the solutions to equation (11) of Arabmazar and Schmidt, needing only the already derived truncated moments of u given above in (2.2) and (2.3). The absolute bias is slightly greater than they found for the t10 distribution, in both cases of known and unknown variance. It is considerably greater than in the censored case, although it still goes to zero (very slowly) as the degree of truncation goes to zero. In the unrealistic case of variance known, for example, for samples which are 75 percent complete (μ = 1.2 and above) almost no bias exists. But as the sample becomes less complete (degree of truncation rises) the bias becomes greater and greater. Going to the more realistic case of variance unknown, the bias becomes much larger, especially for heavily truncated samples. Contrary to the censored case, however, when the variance is not known the bias is much larger than the bias of the uncorrected sample mean. (That is, it does not pay to correct for truncation.)

The robustness of the normal MLE to non-normality has also been considered by Goldberger (1980), in the truncated case, under the assumption that the only regressor is a constant term, and that the disturbance variance is known. He assumes that the true distribution of errors is a non-normal symmetric distribution (Student's t, Laplace, logistic), and derives and calculates the asymptotic bias (inconsistency) of the maximum likelihood estimators which assume the distribution of the errors to be normal. His results show that although the bias is large for largely censored samples, it goes to zero as the sample becomes complete.

2.4 The Binary (Probit) Model

In this section we take up the binary dependent variable model, which can be seen as a special case of the Tobit model (see Goldberger (1980)). We only observe two different values for y_i (say, zero and one), according to the rule

(2.4)  y_i = 1 if y_i* > 0, and y_i = 0 if y_i* < 0,   i = 1, ..., n,

where the y_i* are defined in (2.1). Thus only the sign of the dependent variable is observable. The only thing that can be identified is the ratio of the mean and standard deviation (μ/σ), so that if we assume that the variance is known and equal to one (or use σ as a normalization factor), then the mean of the dependent variable (μ) can be estimated. The probability limit of the estimated mean (assuming σ² known and equal to one) is given by (12) of Arabmazar and Schmidt. The asymptotic bias is given in Table 2.1, under the heading "binary", for various values of μ.
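For the binary model the computation is especially simple: with only a constant regressor, the probit "MLE" of μ converges to the value at which the fitted normal probability of a complete observation equals the true probability under the first-order Sargan distribution. The sketch below computes that probability limit and the implied bias; it assumes this is the content of equation (12) of Arabmazar and Schmidt, which is not reproduced in the text, so it should be read as illustrative rather than as their formula.

```python
import numpy as np
from scipy.stats import norm

ALPHA = 2.0  # true first-order Sargan scale, so that sigma^2 = 4/alpha^2 = 1

def sargan1_cdf(x, a=ALPHA):
    """First-order Sargan c.d.f., i.e., (1.9) specialized to P = 1."""
    x = np.asarray(x, dtype=float)
    lower = 0.25 * np.exp(a * x) * (2.0 - a * x)          # x < 0
    upper = 1.0 - 0.25 * np.exp(-a * x) * (2.0 + a * x)   # x >= 0
    return np.where(x < 0, lower, upper)

def probit_plim(mu):
    """Value the normality-based probit estimate of mu converges to:
    Phi(mu_tilde) = P(y* > 0) = 1 - F_Sargan(-mu)."""
    p_complete = 1.0 - sargan1_cdf(-mu)
    return norm.ppf(p_complete)

for mu in np.arange(-2.8, 2.81, 0.4):
    plim = probit_plim(mu)
    print(f"mu = {mu:5.2f}   plim = {plim:7.3f}   asymptotic bias = {plim - mu:7.3f}")
```

As the text notes, the bias computed this way is zero at μ = 0 (where both distributions assign probability one-half to a complete observation) and grows symmetrically as μ moves away from zero in either direction.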
Unlike the censored and truncated cases, the sample does not become completely observed as μ gets positive and large. The bias is smallest when μ = 0, and increases (symmetrically) as μ moves in either direction from zero. The size of the bias is similar to that of the censored case with σ² known, for μ ≤ 0.

2.5 Conclusions

In this chapter we considered the inconsistency (asymptotic bias) of the normal maximum likelihood estimators for the censored, truncated and binary dependent variable models, where the true distribution of the error terms is first-order Sargan. In the case of linear regression with a fully observable dependent variable no bias exists. The normal MLE's (OLS) are consistent (they are in fact BLUE), even if normal is not the true distribution of the errors. In other words, the normal MLE's are robust to non-normality.

The results for the censored and truncated models are to some extent similar to each other and also similar to the results for the other distributions considered by Arabmazar and Schmidt (1982). First of all, the asymptotic bias disappears as μ gets large (i.e., as the sample becomes complete) in both the censored and truncated models, but not in the binary case. When σ is known, the bias for all values of μ is much smaller than when we consider the more realistic case of σ unknown. This is true especially in the truncated model with a large degree of truncation. Secondly, the bias of μ is generally less in the censored model than in the truncated model, implying that knowing the number of limit (unobserved) observations helps in getting a less biased estimate of the mean. This is so especially when μ is very small; that is, where the sample is largely censored. The final conclusion derived from these results is that the maximum likelihood estimator of μ (i.e., the estimator which corrects for censoring or truncation) is usually less biased than the sample mean, suggesting that it pays to correct for censoring or truncation, even if the true error distribution is not normal.

We have only considered a very special and simple case in which there is only one regressor - a constant term - so that we are estimating only the mean of the dependent variable. Thus it is not known how different the results would be in a more general case with more than one regressor. Certainly this would be an interesting extension of this research.

Table 2.1  Asymptotic Bias of Tobit When True Errors are First Order Sargan
[The numerical entries of Table 2.1 (censored, truncated and binary cases, for known and unknown variance, together with the bias of the sample mean) are not legible in the source and are not reproduced here.]
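Since the entries of Table 2.1 are not reproduced, it is worth noting that any censored-case entry can also be approximated without the analytic bias equations, by simulating a large sample of first-order Sargan errors and maximizing the normal (Tobit) likelihood directly. The sketch below does this for the realistic case in which σ is also estimated; the sampling scheme, sample size and starting values are illustrative choices and are not part of the original calculations.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
ALPHA = 2.0  # true first-order Sargan scale; variance = 4/alpha^2 = 1

def draw_sargan1(n, a=ALPHA):
    """Draw from the first-order Sargan density (1.10).  One convenient
    representation: an equal mixture of a symmetrized Exponential(a) and a
    symmetrized Gamma(2, a) variate."""
    shape = rng.choice([1, 2], size=n)           # Gamma shape 1 or 2, each w.p. 1/2
    magnitude = rng.gamma(shape, 1.0 / a)        # rate a  <=>  scale 1/a
    sign = rng.choice([-1.0, 1.0], size=n)
    return sign * magnitude

def neg_tobit_loglik(theta, y):
    """Normal (Tobit) log-likelihood for y = max(0, mu + u), constant term only."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    limit = (y <= 0)
    ll_limit = norm.logcdf(-mu / sigma) * limit.sum()
    ll_complete = np.sum(norm.logpdf((y[~limit] - mu) / sigma) - np.log(sigma))
    return -(ll_limit + ll_complete)

mu_true = -0.8
n = 200_000
y = np.maximum(0.0, mu_true + draw_sargan1(n))

res = minimize(neg_tobit_loglik, x0=np.array([0.0, 0.0]), args=(y,), method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"normal-MLE mu = {mu_hat:.3f} (true mu = {mu_true}),  sigma = {sigma_hat:.3f}")
print(f"approximate asymptotic bias of mu: {mu_hat - mu_true:.3f}")
```

With a sample this large the Monte Carlo estimate is close to the probability limit, so the same device provides a rough check on any of the censored-model bias calculations in this chapter.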
Chapter 3

Robustness of Sargan MLE's to Normal Errors: First Order Case

3.1 Introduction

One of the basic assumptions of the linear regression model, as well as many other models, is the normality of the error terms. But since in a large number of models (e.g., models with censored or limited dependent variables) the likelihood function contains the c.d.f. of the distribution of the error terms, the assumption of normality implies that the normal c.d.f. will appear in the likelihood function. The normal c.d.f. can only be evaluated numerically; the calculations would be simpler if a distribution were assumed whose c.d.f. can be expressed analytically. As we have seen in chapter 1, families of Sargan distributions are possible substitutes for the normal, as Goldfeld and Quandt (1981) have suggested.

In the previous chapter we asked the question of how costly it is to assume normality of the error terms, when they are in fact first-order Sargan. In the next two chapters we ask the reverse question, which we think is more interesting. That is, what happens if the true error distribution is normal, but first- or second-order Sargan is mistakenly assumed?1 In other words, how good an approximation to the normal distribution is a Sargan distribution? We will answer this question by providing some evidence on the relevant statistical questions, namely the effect on one's estimates or inferences of the use of such an approximation to the (hypothesized) true error distribution.

The statistical cost involved is very much dependent upon the nature of the model in question. The misspecification of the errors' distribution (e.g., an incorrect assumption of normality) does not cause bias or inconsistency of the parameter estimates in the linear regression model, but it does cause inconsistency in limited dependent variable models, such as the Tobit model. Missiakoulis (1983) has calculated the asymptotic bias (inconsistency) for the binary (probit) model. We consider three additional models: the linear regression model, the censored regression (Tobit) model, and the truncated regression model. Although these results cannot be generalized, because of the restricted set of models that has been used so far, our results serve to indicate at least to what extent the results are model dependent.

In the next three sections we will present results for three different models, but for the first-order Sargan distribution only. In the next chapter we ask the same question for the second-order Sargan distribution (for all three models). In each section, first the model and its estimators are introduced, and then the numerical calculations follow (tabulated in Table 3.1). Most of our conclusions are similar to the conclusions of the previous chapter. For example, the bias depends strongly on the degree of censoring or truncation, and is generally larger when σ² is unknown than when it is known. However, an interesting result is that it is typically less costly (in terms of asymptotic bias) to assume Sargan errors when the truth is normal than it is to assume normal errors when the truth is Sargan.

3.2 Linear Regression Model

The model considered in this section is the linear regression model,

(3.1)  y_i = Σ_{j=1}^{K} x_ij β_j + u_i,   or   y_i = x_i β + u_i,

where x_i represents a row vector of the regressors, with K elements, and β is a column vector of coefficients with K elements. The error terms u_i are i.i.d. with zero mean and are independent of the regressors.
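The estimator studied in this section maximizes the first-order Sargan log-likelihood given as equation (3.2) below. As a preview, the sketch that follows computes that "Sargan MLE" on a hypothetical simulated data set with normal errors and compares it with OLS; in line with the consistency result proved in this section, the two sets of coefficient estimates are close. The data-generating choices, starting values and function names are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def sargan1_negloglik(params, y, X):
    """Negative first-order Sargan log-likelihood (3.2):
    ln L = n ln(alpha) - n ln 4 - alpha * sum|e_i| + sum ln(1 + alpha|e_i|),
    where e_i = y_i - x_i'beta."""
    log_alpha, beta = params[0], params[1:]
    alpha = np.exp(log_alpha)                 # keeps alpha > 0 during optimization
    e = np.abs(y - X @ beta)
    n = y.size
    ll = n * np.log(alpha) - n * np.log(4.0) - alpha * e.sum() + np.log1p(alpha * e).sum()
    return -ll

# Hypothetical data: normal errors, a constant and one regressor.
rng = np.random.default_rng(1)
n, beta_true = 5000, np.array([1.0, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(size=n)

start = np.concatenate([[np.log(2.0)], np.zeros(2)])
res = minimize(sargan1_negloglik, start, args=(y, X), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-8})
print("Sargan 'MLE' of beta:", res.x[1:])
print("OLS estimate:        ", np.linalg.lstsq(X, y, rcond=None)[0])
```

The closeness of the two estimates on symmetric (here normal) errors is exactly the "no cost in the linear regression model" result established analytically below; the interesting differences arise only once censoring or truncation enters the likelihood.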
What remains to be discussed is the distribution of the errors. In section 2.2 we asked the question previously discussed by Goldfeld and Quandt (1981), namely, what happens if we assume normal errors when in fact the true distribution is first order Sargan. It is obvious that normal MLE will simply bqugg , and that OLS is BLUE, unbiased and consistent, under above conditions. However, Goldfeld and Quandt provided the non-trivial result that the efficiency of OLS relative to the Sargan maximum likelihood estimator is approximately 0.84 . In this section, the opposite question is addressed. Thus the errors are actually normal, but we treat them as first-order 26 Sargan. ( We will ask the same question with the second-order Sargan instead in the next chapter.) We therefore discuss the cost of using first-order Sargan as an approximation for the normal distribution, when the normal distribution is the correct one. The result, as we will show, is that there is no cost (no asymptotic bias) involved at all in assuming first-order Sargan when normal is the true error distribution. Our result indeed is much stronger than just stated, since it says that there is no bias if first-order Sargan is assumed mistakenly as long as the true distribution of the error terms is symmetric around zero. That is, the result hinges on the symmetry of the true distribution of the error terms only and not on its normality. Therefore, Sargan "MLE" is consistent provided that error terms are i.i.d with a symmetric distribution around zero.2 In order to show this result, we begin by forming the Sargan likelihood function. (3.2) inL = n in a - n in 4 - a fly1 - B'Xil + E in [1+aly1 - B'X1|]° ‘ A The Sargan MLE's ( a . B ) satisfy the following first order conditions: -A'x l 611 * l1’1 ‘3 1 (3.3) —--I‘ - §--§|yi-p'x1|+§ , . - a a 1+a|yi-B'X1| ~ 1 x13 binL ‘ “ 3.4 -—1—- - a X - a X . a 1 I ( ) ‘2' 13 ’ 13 1+a(y -e'x) i 1 + a X , 1. X11!- 0 — _ _, 7 l «(y1 8 X1) which are similar to the equations (3.2) in page 146 of Goldfeld and to summation over observations such that: Quandt. Here § refers yi Z B'Xi , and conversely for {- . Now we divide both (3.3) and (3.4) by n (sample size) and take probability limits. Defining Z and B to be the probability limits of a and B respectively, we get; i 2 lyi-B'Xil 1 ~ -p11m—£|y -B'Xl+p1.im ~ ~ n1 1 1 nildulyi-B'Xil (3.5) nzlr: n n ~ ~ - 1 (3.6) alem—E-plim-fi-E§Xij-aplim-leima-:£Xij n 1 X ~ + 1 -a1flfln-—;flhn-— A. ~ n u+ 1 1+dy1-B'Xi) 11 n +Zp11m-Eplim-fi-1-g ~ 1~ xij .- o. l-dy'B'X) 1 1 (Here n+ is the number of terms in § , and similary for n_ ). Deriving some of the probability limits in the above two But a closer look at these equations will equations is not feasible. indicate that to show consistency of B , we do not need to evaluate Rather we will simply show all of these probability limits explicitly. 28 that equations (3.5) and (3.6) are satisfied with Bj - Bj , for some 3. Starting with (3.5), we set Bj = fljfor all j = 1,2,...,K ; luil (3.7) . o - Plim— -u§|i|+Plimn§ all»; 1+a|u 111] 01' (3.8) —- PM +1: [—L‘lL-j- a 1"]ul If u is distributed normally with mean zero and variance oz, the first expected value in (3.8) is easy to evaluate ( a VF27_; ), but the second expected value poses a problem. However, as long as there exists a positive solution 3 to (3.8), its actual value is not of any importance. Next we define; ~ 1 1 1 (3-9) Xj = puma- i xij . plim-n:§_xij - plim—Exj , where the last two equations follow from the the fact that x is independent of u. 
With Bj = 6 j(for all j=l,2,...,K), equation (3.6) can be rewritten as n n_ 11+ (3.10) EPlim-i'i' -aP11m—x -‘EP11m-—Plim-— -—§——x n ‘j j n+ lfloui 13 ~ 11- l 1 +aPlim—Plim—2—x . 0 n r1—1~ fl - -uu1 ’ j‘l’ OOO’K 29 Also note that; n + ~, _ '~, , (3.11) Plim—n- - P (y1 > 5x1) P(ui>6 xi-B X1) n (3.12) Plim—n- - P(y1 O (3 13) plim n+ g; [1 ~ 1 xij] ml ~ jIn ] - P[——1:|u>01i., lion 3 and similary for the similar term involving n_, again because x and u are independent. Then (3.10) becomes; (3.14) E P(u > 0) 'i' -'&' P(u0) 1: [—l—lu > 0] 31'. J J 1% J +32 P(u 0 (3-21) 001.11) '- ecu (@2 " 11L) (1 < 0 zr-e‘“ <2-m) and H is defined as the ratio of the Sargan density function f to the Sargan c.d.f., with 70g 71= l (in order to insure the uniqueness of the 33 MLE in this case). GK:::? xx) (3.22) H(-x) - {egg-3- - menu-mt) K0 Hue-m0 m Also, ; refers to summation over observations such that both m conditions; y 2 0 , y S 2 hold, while ; refers to summation over A observations such that both y 2 O , y S u hold, and m+, m_ represnt the relevent numbers of observations. Now dividing by m and taking probability limits, with Z = Plim 3 , E = Plim a , we get: m m lY'JEI 1 ~ 1 1 (3.23) é-Pum; Z Iii-ulwnrn; .. a 1-1 1-1 1+aly 111] + Plim 1:; C(15) - o - .. m+ ur ~' m+ 1 1 3.24 —- A-am—M— ~ ~ < ) aIPUm m pm m m “Emmi-.1) +Epum§plimal§ ~1 ~ -p11ml:iH(-E) - o - Moi-u) The probability limits can be rewritten as: (3.25) é-Etlm‘ll ‘y>01+EI—,J:’-1‘;',—l y>01+§g§} G('E,I1')-o. Hair-Pl a 34 Errcyiily») - P0>1 -ZP(y>Ely>0) El ~ 1,, I 3%: >00] (3 .26) INN) + E P0> El ,, 1 ~ 1W) -PW .~ - m3“) ° |y01 The probability limits 5 , Z of the Sargan "MLE's" will satisfy these equations. To evaluate the inconsistency that results when the error terms are actually normal (rather than Sargan), we need to evaluate (3.23) and (3.24) for Y's normally distributed with mean u and variance 02. There is no problem in evaluating the required probabilities, nor in calculating E [ IY - 5 II Y 2 0 ] analytically (see appendix A). However, the other three expected values in (3.23) and (3.24) were intractable and had to be calculated by Monte Carlo methods (i.e., simulated). The number of drawings (of y) which we used ranged from 10,000 to 130,000 depending upon the likelihood of the conditioning event - for example, if u = -l ; 3 =3 it will take a lot of drawings to observe y 2 H enough times to calculate the conditional mean accurately. We also calculated some of the more difficult expected values by numerical integration, and got essentially the same results, The equations (3.23) and (3.24) were then solved using the Gauss-Newton method.3 Our results are given in Table 3.1, under the heading ”censored". (The results under ”truncated" will be discussed in the 35 next section.) We first look at the case where the £539 a is known (that is, the Egg; a is taken to equal twq,'which corresponds to a2 = l.) and then consider the case where the variance is not known, where the latter case is presumably the realistic one. The results are qualitatively similar to those of Table 2.1, for the opposite case (Sargan is true but normal is assumed), eSpecially in the rapid disappearance of the asymptotic bias as the degree of censoring falls. That is, the bias is heavily dependent on how complete the samples are. For samples that are at least half complete, there is virtually no bias, and for the samples that are at least 1/4 complete ( u = -l.2 and above), the asymptotic bias is small. 
(However, as we will see in the next section, this is not so in the truncated case, in which the bias is generally larger than in the censored case.) Now we relax the assumption that the variance is known, to see how much of a difference it makes whether the variance is known. The results, as shown in Table 3.1, clearly show that knowing the variance does matter. The bias generally is larger when we estimate the variance of the dependent variable ( 02), than when we assume it.' That is, the results for the case in which the variance is unknown are more pessimistic than in the case of known varaince. Indeed, the difference this makes is substantial, at least for heavily censored samples; the absolute bias is typically two or more times higher when the variance is unknown than when it is known, for samples less than 1/4 complete. Another interesting result is that the bias is typically smaller in Table 3.1 than_in Table 2.1. It is less costly (in terms of absolute asymptotic bias) to assume Sargan when the truth is normal 36 than it is to assume normal when the truth is Sargan. Why this should be so is not obvious. (Note that it is not so when the variance is assumed known.) In the next chapter, we ask the same question again, for the second-order Sargan distribution. That is, we will measure the statistical cost of using second order Sargan as an approximation to normal. Not surprisingly, there is less asymptotic bias involved when we use second-order Sargan than first-order. This may indicate that the second order Sargan distribution is a more adequate approximation to the normal in the econometric models. However, we have selected quite a limited number of models to investigate the cost, so the generality of this conclusion is perhaps questionable. 3.4 The Truncated 2gpendent Variable Model In this section we discuss the truncated version of the Tobit model of the last section. That is, the model is given by (3.16). However, now we observe yi= Y1. if and only if yi* Z 0. Nothing else is observed, not even the number of unobserved observations (limit observations). The results calculated in this section are in many ways similar to those for the censored dependent variable model of the last section. The bias is strongly dependent upon the degree of truncation, going to zero as sample becomes complete, and it is also larger when the error variance is unknown. However, the bias is generally larger in the truncated case than in the censored case. We wish to evaluate the asymptotic bias due to assuming the 37 errors to be first-order Sargan when they are in fact normal. We start with the Sargan likelihood function: 111 m (3.27) m. =- -min4+mxna-a X lyi-uI-i- Z h1[1+a|y1-ul] 1-1 1-1 -m1.nF(u) . As in the previous section, we first find the first order conditions: In “ m ly 'l-‘l A A (3'28) 6121‘ - g' 2 lyi'lll+ {-4—1--mR(a,u) I 0 0a a i=1 i-l 1+a|y1-ul (3.29) “1‘- a (m+-m_>-a ——.—-=-+a_§ . . bu 1+a(y1-u) 1"“(3’1'10 - In V (n) ' 0 where, r films)?!" pi>0 41-m(2+au) (3.30) R(a,u) I 4 1 {MT-Ea “<0 and “Hi aflmkm “)0 (3.31) vcm - fl“) - H ”e PM aCl ‘fl‘l u<0 and other notation is as used in the previous section. Then, dividing by m (sample size) and taking probability limits yields the following equations, which the probability limits 38 3 , 3 will satisfy: (3.32) é-EIIy—El |y>01+Et—JDiL-l y>01 «(5.11) - o a burly-pl (3.33) EPEly>0> - T£P0> - E'P'5ly>0) Bil-71:,— ‘ y>ILy>01 Haw-u) + EP0) E[—— I)! 
may») Imam - WE) - 0 Again, since the normal c.d.f will appear in the expected values in equations (3.32) and (3.33), as in the last section, we are not able to calculate them analytically (except the first one). Therefore, we solve (3.32) and (3.33) numerically for E and H , with the same simulation procedure used in the previous section used again to evaluate the necessary expected values. Our results are given in Table 3.1, under the heading "truncated". We will discuss both the case of known a and the case of unknown a . Generally speaking, in both cases the bias is much larger than in the corresponding censored model. In the truncated model, the asymptotic, bias goes to zero (in both cases) as samples become about 75% complete, compare this to the censored model in which the bias is virtually zero when samples are only half complete. As in the censored model, the results are heavily dependent upon the degree of truncation. For low degrees of truncation, the bias is very small, while at higher degrees of truncation the bias gets very large. Basically the same kind of conclusions can be drawn for the asymptotic bias of a , for the unknown variance case, of course. That is, the bias is larger in the 39 truncated model than in the censored model, and in both cases, the bias of a depends on the degree of truncation (or censoring). Consider first the (realistic) case in which a is treated as unknown. The absolute bias is clearly greater than for the corresponding censored case, as just noted. However, what is most striking is how much smaller the absolute bias is than in the opposite case (Table 2.1), in which we assumed that first-order Sargan was the true distribution of the error terms, but by mistake we used normal instead. The same kind of results were also found in the censored model, though not so strongly. It is less costly (in the truncated model, far less costly) to assume Sargan when the truth is normal than it is to assume normal when the truth is Sargan. Nevertheless, the bias is still substantial except for samples that are substantially untruncated. Next we consider the (unrealistic) case in which a is assumed to be known. In the present case (normal errors with zero mean and variance 1) this means we set E = 2, and then solve (3.33) for 3 . This presents some computational problems, since for any fixed 3 , the left hand side of (3.33) approaches zero as Z approaches - a. In other words, 3 = - a is always a solution. This is not a problem in and of itself as long as another solution exists. However, for 3 S -0.8, apparently no other solution existed. (At least we could not find one, despite our best efforts.) The corresponding entries in Table 3.1 are marked "not available". Those biases which are available ( u 2 -.8) are reasonable, however. 40 3.5 Conclusions In this chapter, we attempt to measure the statistical cost of the use of the first-order Sargan distribution as an approximation to the normal, in econometric models. Since the motivation for the Sargan distribution is really computational ease, it seems reasonable to focus on the case in which the errors are actually normal, but the Sargan "MLE” is used instead. Our measure of the statistical cost of such an approximation is the inconsistency (asymptotic bias) which results. An obvious feature of this ”cost” is that it is model dependent. No asymptotic bias results in the linear regression model. 
As we saw in section 3.2, the unbiaseness of the coefficients hinges only on the symmetry of the distribution of the errors and not on its normality. In the censored regression (Tobit) model, the bias can be substantial, but it is reasonably small for samples that are at least 50% complete. In the truncated regression model, the bias is even larger, and seems large enough to be bothersome except in samples that are substantially (say, 75%) complete. Also the results indicate that knowing the error variance helps, in both the censored and truncated models. The bias is much smaller when we know a than when a is estimated. Finally, comparing these results to those of chapter 2 tells us that the Sargan distribution is a better approximation for the normal than the normal is for the Sargan. That is, the bias is much smaller when the world is normal but we approximate it by Sargan, than when the world is Sargan, but we treat it as normal. 41 Obviously other models could be considered, (e.g., the disequilibrium model, which Goldfeld and Quandt (1981) used to introduce these distributions), but the fact that the adequacy of the approximation is so clearly model dependent is sufficient to argue for caution in the use of these distributions. At least this is so if they are really viewed as approximations to the normal; it is possible to argue that there is no more reason to assume normal errors than to assume Sargan errors. In this chapter we only considered first-order Sargan distribution, since higher order Sargan distributions contain more parameters than the normal. It is the case that a higher order Sargan distribution (e.g., second-order) would perform more adequately, as we will see in the following chapter. 42 Footnotes 1) This is perhaps a strange case to consider, since Sargan errors are unlikely to be assumed in a regression context, being less convenient than normal errors in this case. However, the regression case serves as a useful standard of comparison for the censored and truncated cases to be considered later. 2) Subject to the qualification that certain moments need to exist, as will be made explicit below. 3) Note that this is an expensive undertaking because 3 and 3 change at each iteration, thus requiring fresh calculation of the expected values just discussed at each iteration. Table 3.1 AsymptotIc Blas of Sargon “MLE” When True Errors are N(0,1) * um known) pm unknown) a Ji. P(z >0) Censored Truncated Cbnsored Truncated Censored Truncated -2.8 .003 -O.58 * 0.97 2.81 1.70 2.97 -2.4 .008 -0.32 ' 0.77 2.30 1.32 2.18 -2.0 .023 -O.12 ‘ 0.59 1.97 1.00 1.83 -1.6 .055 0.01 ' 0.46 1.48 0.78 1.28 -1.2 .115 0.09 ’ 0.33 1.15 0.53 0.98 -0.8 .212 0.10 -0.87 0.20 0.86 0.30 0.70 -0.4 .345 0.08 0.19 0.09 0.58 0.08 0.44 0 .500 0.03 0.23 0.02 0.34 -0.08 0.20 0.4 .655 0.00 0.16 0.00 0.17 -0.15 0.04 0.8 .788 0.00 0.08 0.00 0.07 -0.16 -0.05 1.2 .885 0.00 0.03 0.00 0.00 -0.14 -0.14 1.6 .945 0.00 0.01 0.00 -0.02 -0.12 -0.16 2.0 .977 0.00 -0.01 0.00 -0.03 -0.11 -0.18 2.4 .992 0.00 -0.01 0.00 -0.02 -0.11 -0.18 2.8 .997 0.00 -0.01 0.00 -0.01 -0.10 -0.15 5 Not avallable. See text for explanatlon. Appendix A; ~ 1. if E < 0 E {Iy4EI|y>0} - I; 0> dy - I; y£(yly>0) dy-T; I; f°)dY - E 0) 4E Poly>0> E {IV-3190} ' u + o ' m (g) - 3 (Ad) (3) std. normal density a std. normal cdf 0%) it . where m(d) 2. 
ifE>o E {Iy4fil[y>0} .- I; Iy-El f(yly>0> dy Ig'cfity) f(yly>0> dy + 13 (y-z) f(y|y>0)dy E 13 f(yly>0) dy - 13 y:(y|y>0>dy + IE yf()’|¥>0)dy '31: E “3‘50”” (A'z) I; 1 11' mm <") o '°-a Io f0> - 5?;553-I0 f(y>dy - -s?;§5§s- - ¢(M) O @OE:E) [a l r» 0 ~ f )0 - m A. f d I 44 45 N P 0< <") I; ~ I8 yf(Y|>'>0)dy ' w Io yf(>'l0<>'0)dY ' 3%3-1515'1110] Md?) «33—5 (M) ' T [11+O--—:—l “0’ «in J" 11 _“' - .E 2¢(B;£>-¢<-5)} + a “(9.03) 4(a) (11-7) mg) «13) E{ly-El y>0} = +a 7201-13 X1) 48 (Here, as before, 3 = Plim a , and similarly for the other parameters.) we show that fl In order to show the consistency of B , (4.5)-(4.7) are satisfied for B 8 0. Setting 3 = 6, we obtain; (4.3) (4.9) —-—"I +32 1: u 2} . o 1+?2 1+E|u|+3272u 1+2'Ey"2u 1+3u+§272u2 lu>0} '1? (4.10) Ep(u>0)1<' - Er(u<0)3i - Ep(u>0)z { ~ 1-2;;2u ~ +aP(u<0) E { ~ ~2~ 2|u0} X ' E N ”2” 2 1 1+au+a272u (4.11) Plim %— 1-2;;éu |u>o} 1‘6- E { ~ ~2~ - E 'U0} + 1: { | y>0} a 1+aly-ul+a 12(y-u) p(z<0) ~ ~ ~ _ + P(y>0) W (G,Y2,IJ) 0 (4 13) .Zl_.+ E I 32(y4;)2 I >0} +-££ZSQZ.5(Z " ") . 0 ° ~ 2 y P(y>0) 972:” ~ ~ ~ ~2~ 1+12 1+aIy-ul+a 72(y-u) 51 (4.19) EPT£ly>O> - bodily») 1+23?2(y-E) - 3P(y>fily>0) E { N N, ~2~ ~ I y>0. y’fi} 1+a(y-u)+a 72(y-u) .. ~ 1-2‘572(y-E) ~ + aP(y0) E I ~ ~ ~2~ ~ I y>0, y0) T (asYZsp) 0 Note that Plim.EE- - P(y>E I y>0) m- N Plim.-;- - P(y

0) n-m . P(y<0) Plim _m P(y>0) -u(1+au+126 u ) ~ 2+12+au+12(1+au) m-‘b/ ~ N ~ N a w<¢sstF> ' "—'—~'_E‘ 'I F(-u) - -ue“”(1-au+vza :4 ) ~ m p.<0 I (1+12)-e [2+12-au+72(1-au) ] (4.20) 52 V -3<1+?"u+&'2?232) ~ .. ,..., .. m 2 .90 ~ 2+12+au+12(1+au) ~ ~ ~ 0F(-p.)/ 311 I T(a.72.u) - ~ . 4 F(-u) ~ 35 - ~ ~2~2 (4.21) we “wa p > ~<0 u L4<142>-e“"t2+i.-awzu-m21 ' finial) ~ .. .. ,..., ~ m 2 I90 tau-Eva; <1W2>12n2+au+12<1+au> 1 8(3; E) ' 2 1'4 - 2 ' EH” agap(1';;)’ (1+?) .. (4.22) m 2 2 “<0 I4(1+?2)-e““[2+5y'2-334;2(1-;;) ] It is easy to see that evaluating the probabilities is not hard, nor is calculating E { I y- 71 I I y 2 0 } analytically, but the other expected values are not easy to calculate. Thus we have simulated 10,000 to 130,000 observations (depending upon the degree of censoring) from a normal distribution with mean zero and variance one, i.e., equations (4.17)-(4.19) are solved numerically. The results are shown in Table 4.1, under the heading ”censored? There aretwo sets of results for the case of ”variance known", with different values for a and 72 . The column labelled censored (I) is based on the values of a and 12 which correspond to variance equal to l and the fourth cumulant equal to zero, namely a = 3.07638,and ‘YZ = 2.15470, and the column labelled censored (2) is based on values of a and 72 which correspond to 53 variance equal to l and the value of the density function (second order Sargan) equal to .3989 at x= 0, namely a=-2.69500,Yé = 0.68884. Missiakoulis(1983) in his study used 72: 1/3 and a = 2.12132 which as we have shown in chapter 1 would not result in 02: 1. He should have used a = 2.44949 instead. In either case with the variance unknown, the biases are very small. In fact, except for large negative u , there is almost no bias. In the case of " unknown variance", we let both a and 72 vary and be defined within the model. Here the bias is larger than in the known-variance case, but it is still not very large for samples that are at least half complete. Also the results in both cases ( a known and 0 unknown ) are better than the results shown in Table 3.1, for the first order Sargan case. This makes the second order Sargan more attractive than the first order, although of course the bias should not be taken as the only means of selecting an alternative for normal c.d.f. 4.4 The Truncated Regression Model Finally we discuss very briefly the truncated dependent variable case. Here we have the following, log of likelihood function; m (4.23) int = m1na-m1n4-mln(l+72)-a 2 Iyi-uI 1-1 m 2 2 + 2 in [1+aIy1 - uI + a 12(yi-u) ] - m in F (u) i-l We take derivatives of (4.23) with respect to a ,1} , u, set them equal 54 to zero, divide by m and then take probability limits: ~ NV ~2 ly-uI+2a72(y-u) 1 (4.24) -- E {Iy-uI y>0} + E { I y>0} a 1+aIy-pI4-a 12(y-u) - A(;s;293) . 0 ~ 2 -1 ~2 (y-u) (4.25) ~ + a E { ~ ~, ~2~ ~ 2 I y>OI 1+12 1+aly-ul+¢ 72(y-u) - C(Es;zs:) ' 0 (4.26) EPEIy>O> - EP0> A.» ~ 2 ~ ~ 1+2¢72(y-u) ~ -aP(y>uly>0) E { ~ ~ ~2~ ~ 2l y>u.y>0} 1+a(y-u)+a 72(y-u) .1. ~ 2 ~ ~ 1-2a72(y-u) ~ + aP(y0) E { ~ ~ ~2~ ~ 2I y0} 1-a(y-u)+a 72(y-u) - B (3232.3) where (4.27) A(E.?2.E) -I .. a....aQ~Q .2” u<1+am2a u )2 “ ~ u>0 4<1€2>-e'““12+§'2+'&'5+?2<1+’&"4>21 u(1-au+vza u ) ~ .3... a..2 "<0 L Zfizrawzfl-au) 55 I' a. Ee’““<1+‘&'i+§'23232) ~ .1 100 ~~ ~ I 4(1+72)-e [ZflztaufiZUWMJ (4.28) Mmzm) -I 3(1-3'E+E'2?232> ~ .~ ~.,.’ 'v~ 2 u<0 I 2fi2-3W2(1'au) V GIN-m «I» ~ -aue ”(Haw/am) ~ ... 
u>0 4<1+9'Z>-e'“"12+72+&”u+?2(1+3‘5)21 (4.29) «352511) - 1 imam/(142) ~ ~ - ~ ..,3' u<0 I 2+12-au+72(1-au) As in the censored case we have calculated the necessary expected values by simulation, based on 10,000 to 130,000 replications. The results are shown in Table 4.1, and labelled truncated . Two sets of numbers are generated for the case of known variance. As expected, the absolute biases are generally larger in the truncated Iregression model than in the censored regression model, and they are generally larger when the variance is unknown than when it is known. They are generally smaller than in the first-order Sargan case, sometimes substantially. So, at least from the standpoint of bias, it is safer to assume second-order Sargan than first-order Sargan. 56 Footnote 1) Incidently, we consider the second order Sargan distribution to have two free parameters ( a and 72 ). This differs from Missiakoulis (1983) who constrains 72 I=1/3. Table 4.1 Asyuptotlc 81as of Second Order Sargon 'NLE' when True Errors are N(0,1) u (Variance Known) p (Varlanoe Unknown) 2 Censored (1) Censored (2) Truncated (1) Truncated (2) Censored Truncated -2.8 -.07 -.29 ' * .66 2.37 -2.4 .01 -.14 -8.56 ' .45 2.04 -2.0 .05 -.04 -0.91 -12.57 .27 1.72 -1.6 .05 .02 -.08 -2.17 .15 1.20 -1 .2 .03 .04 .21 -.47 .03 .87 -.8 -.01 .04 .23 .01 -.07 .64 -.4 -.04 .02 .09 .09 -.12 .40 0.0 -.06 .02 -.09 .06 .01 .16 .4 .03 .01 -.11 .06 .01 .13 .8 .01 0.0 -.02 .05 0.0 0.7 1.2 0.0 0.0 .03 .02 0.0 .02 1.6 .01 0.0 .03 .01 0.0 .01 2.0 0.0 0.0 .01 0.0 0.0 -.01 2.4 0.0 0.0 0.0 -0.1 0.0 -0.1 2.8 0.0 0.0 -.01 -.01 0.0 -.01 Chapter 5 Bivariate Sargan Distribution 5.1 Introduction So far we have considered only univariate Sargan distributions. In this chapter we extend this study to the bivariate case, in a way which could be generalized to the general n-variate case. If we view sargan distributions as easily computable approximations to the normal, the logic for considering multivariate Sargan distributions is in fact much stronger than for univariate Sargan distributions, since the multivariate normal c.d.f is so much harder to calculate than the univariate normal c.d.f. It is not immediately clear what a bivariate Sargan distribution is. Therefore, I begin by exploring different ways of defining such a bivariate density function, and some of their advantages as well as their shortcomings. Then we will approximate the bivariate density in a specific form. This is done on the basis of Stone's theorem (1962, p. 74), which shows that any function f(x) with certain characteristics can be uniformly approximated by functions of the form e.“ x P(x), P(x) being a polynomial. We will also make some comparisons between the bivariate Sargan densities and their normal counterparts. Finally, we will look at the robustness of the estimators assuming such a bivariate density 58 59 for different models. The plan of this chapter is as follows. In section 5.2 we discuss alternative possible definitions of a bivariate Sargan distribution, and choose one such definition. In section 5.3 we compare the bivariate Sargan density to the bivariate normal density (for various levels of correlation) to see how good an approximation we have. In section 5.4 we provide some evidence on the statistical relevance of the use of such an approximation to the hypothesized (true) error distribution function. 
For example, as in the univariate case, a mistakenly assumed first-order Sargan distribution does not cause any asymptotic bias in the (seemingly unrelated) regression model, so long as the true distribution of the errors is symmetric. (A similarly strong result has already been shown for the univariate case.) Finally, in section 5.5 we conclude that the bivariate Sargan distribution, as we have defined it, is an adequate and reasonable approximation to the bivariate normal distribution, at least when the correlation coefficient is not too large.

5.2 Definitions

There are several possible ways that one can proceed in order to construct a bivariate density (in our case a bivariate Sargan density). We will investigate three ways of doing so, although not in great detail, and discuss their advantages as well as their problems.

i) Linear Transformation Approach

Let us assume that X1, X2 are distributed independently according to the (first-order) Sargan with mean zero and variance one, with parameters α1 and α2. Their joint density function is then

(5.1)  f(x1, x2) = (α1α2/16) exp(−α1|x1| − α2|x2|) [1 + α1|x1|][1 + α2|x2|]

Now we define two other variables Y1, Y2 as linear functions of the X's:

(5.2)  Y1 = a1X1 + a2X2
       Y2 = b1X1 + b2X2

or simply

(5.3)  Y = AX,    A = [ a1  a2 ; b1  b2 ]

The joint density function of Y then becomes

(5.4)  g(y1, y2) = |Δ|⁻¹ f(A⁻¹y)
                 = (α1α2/16) |Δ|⁻¹ exp{ −(α1/|Δ|)|b2y1 − a2y2| − (α2/|Δ|)|a1y2 − b1y1| }
                   × [1 + (α1/|Δ|)|b2y1 − a2y2|][1 + (α2/|Δ|)|a1y2 − b1y1|]

with

(5.5)  Cov(Y) = Σ = AA′,    Δ = a1b2 − a2b1

This construction follows by analogy to the relationship between the bivariate and univariate normal, since the bivariate (or multivariate) normal can be defined as arising from a linear transformation of independent univariate normals. We can accommodate any covariance matrix Σ by appropriate choice of A, and we could also allow for an arbitrary (non-zero) mean vector for Y. However, we dismiss this as a reasonable definition of the bivariate Sargan because its marginals are not univariate Sargan.

ii) "Translation" Approach

This is a general way of constructing a joint density with specified marginals, in this case Sargan marginals. We start by assuming that X1, X2 have a bivariate normal density with correlation coefficient ρ. Let d represent this density function. Now define

(5.6)  Y1 = F⁻¹(Φ(X1))
       Y2 = F⁻¹(Φ(X2))

where F is a univariate Sargan c.d.f. and Φ is the standard normal c.d.f. Then the marginal distributions of Y1 and Y2 are Sargan, by construction. However, the joint density of Y1 and Y2 is messy:

(5.7)  g(y1, y2) = d( Φ⁻¹(F(y1)), Φ⁻¹(F(y2)) ) f(y1) f(y2) / [ φ(Φ⁻¹(F(y1))) φ(Φ⁻¹(F(y2))) ]

where f and φ are the Sargan and standard normal densities. Thus there is no conceivable computational advantage to using such a bivariate Sargan density.

iii) Approximation Methods

Here we define the bivariate Sargan density as an approximation. To do this we make use of a theorem due to Stone (1962, p. 74), that "any continuous real function f, which is defined on the interval 0 ≤ x < ∞ and vanishes at infinity in the sense that lim f(x) = 0 as x → ∞, can be uniformly approximated by functions of the form e^(-x) P(x), where P(x) is a polynomial". Thus we start by assuming the density f(x1, x2) to be a continuous function defined on the intervals x_i ∈ [0, ∞), such that

(5.8)  lim f(x1, x2) = 0 as x1 → ∞, for all x2
       lim f(x1, x2) = 0 as x2 → ∞, for all x1

Following the argument made by Stone (1962, p. 75), we define two new variables ξ1, ξ2 as follows:

(5.9)  ξ_i = exp(−d_i x_i),    i = 1, 2

for arbitrary d1 > 0, d2 > 0. Therefore x_i = −ln ξ_i / d_i, i = 1, 2. Notice that ξ_i ∈ (0, 1], and ξ_i goes to zero as x_i goes to infinity.
Now we define the function e on [0,l]x[0,l], such that ¢ ( 0 , £2)= 0 V £26 (0.1] (5.10) ¢ ( £1: 0): 0 v £15 (011] ¢ (0 , 0)= 0 Then 6 is a continuous function over [0,l]x[0,l], and therefore, it can be uniformly approximated by a polynomial in £1,’£2 (p. 69 of Stone). 35: C00 + C1051 + c0152 (5 11) + c g 2 + 6 a g + c g 2 ° 20 1 11 1 2 02 2 '1' 000 N N-l N + cN051 +CN-l,1§l £2 + "' + c01452 or simply B 5 1 2 03152 51 F'2 0<81+82 K K1 K2 . a I ——-—-——- 3 . —— - — 1 aZKO+K2 0 a1 a2 Similarily, by symmetry of the argument -a Ix I 2A 2 2 (5.23) 3(x2) - --7 (a1K0+K1+a1K2Ix2I) e “1 Again in order for g(x2) to be a Sargan density requires; a K K (5-24) 0.2 - E‘Tlc'é‘x" , which implies K0 - 23-4 1 o 1 2 . “1 (5.23) and (5.24) together imply that (5.25) Kl/ “1= K2/ “2 That is, to have Sargan marginal densities, the above equalities should hold. Imposing the second requirement as (5.26) Kl/ al= Kz/ a2= 9 or K1= Gal ,K2= Oaz 70 9 being some parameter, then A = alaz/ 8 0 . Substituting back the values of the parameters into the bivariate density function and its marginals we get very simple results “1&2 e-aIIxII-azIsz (5.27) f(x1,x2) . -§§- [6(a1Ix1I+a2Ix2I)+K3x1x2] a -a Ix I 1 1 1 3(x1) - -z- e (1+a1Ix1I) (5.28) I a -a Ix 2 2 2 3(x2) . —z- e (1+a2Ix2I) The principal advantage of (5.27) over (5.14) is that it does not imply zero covariance. In appendix A it is shown that cov(x1, x2) depends on the values of the parameters, in particular K3. Explicitly, 2K3 ‘ E X ’x ) s (5.29) cov (X1.X2) ( 1 2 a 2a 26 1 2 _ 2 _ 2 (5.30) _ 2 _ 2 E( X2)- 0 E(X2 ) - 4/ a2 There is still one problem remaining to be resolved, namely that f(xl, x2) in (5.27) need not to be non-negative for all values of 71 x1, x2 and the parameters. We will solve this problem by including another term which involves the cross-product of absolute values of x1 and x2, and therefore rewrite the joint density as follows: e-¢1Ix1I-azlsz f(x1,x = B 0 2) {1 0 not even p I 1/2 can be attained. This seems to be the main shortcoming of the bivariate Sargan distribution as defined in (5.36), and it casts some doubt on its ability to approximate well distributions with large positive correlation. In appendix B we also derive the moment generating 74 function, which implies the following moments. [n!m!+(n+1)!(m+1)!] xoi+ [(n+1)! m! + n! (m+1)!] e n m 2:11 (12 (ROW) if n,m both even ”um I 0 if one is odd and the other one is even (n+1)! (n+1)! K3 2a1n+1 a2“+1 (K0+9) (5 40) if n;m both are odd Also the cumulative distribution function is equal to: (12 e 1 1 I -e-a222[c -a C 2 +a C 2 -C 2 z ] + 4C -2a C 2 8C4 6 1 7 l 2 7 2 5 l 2 4 1 4 1 22(0, 21>0 or 21(0, 22>0 ealzl+a222 3C4 {Cl-alczzl-a2c222+cszlzz} zl,zz K K 2 2 5 6 (5.52) 5 " K5 2 ' K6 2 . 2 I. “1 72 “2 52 “1 v4 “2 v5 2x3 (5.53) cov(x1,x2) - E(X1.X2) ' “‘2"2 6a1 a2 x3 / (1+12><1+§2) (5.54) p ' 53-11723 " ”(1+372)(1+3§2) 78 The main problem with the second-order bivariate Sargan distribution is that it contains a large number of parameters. A bivariate normal with zero mean contains 3 parameters. The first-order bivariate Sargan (5.36) contains 4 parameters. The second-order bivariate Sargan contains 10 parameters (a1, a2, 5, K0, K3, v1, v2, v3, v4, v5). This may be more than is reasonable. The alternative second-order bivariate Sargan density that was considered is the following: .’“1|"1I"“2I"2l f(x1,x2) I A e [KO+K1leI+K2Ix2I+K3x1x2+K4Ix1IIx2I 2 2 2 ~ 2 + stl +K6x2 +K7x1 x2+K8x1 Isz 2 2 2 2 (5'55) + K9x1x2 +K1on1Ix2 +K11x1 x2 I with the following assumptions; 7: M x < K9 _ K10. 
Here again A has to be chosen so that f(x1, x2) integrates to one (equation (5.56)); the required normalization involves

(5.57)  D = a2²(a1²K0 + a1K1 + 2K5) + a2(a1²K2 + a1K4 + 2K8) + 2(a1²K6 + a1K10 + K11)

The problem with this joint density function is that it does not generate second-order Sargan marginals. Rather, the marginals g1(x1) and g2(x2) given in (5.58) and (5.59) are exponentials times quadratics in |x_i| whose coefficients are combinations of the K's and differ with the sign of x_i, so that in general they are not second-order Sargan densities. Also, this density contains even more parameters than (5.47). For both reasons we will not consider it further.

5.3 Density Comparisons

In this section we provide tables of first-order bivariate Sargan and bivariate normal densities. In order to make meaningful comparisons we have imposed some restrictions on both densities. These restrictions are:

(i) means equal to zero, variances equal to one;
(ii) the same correlation coefficient, ρ;
(iii) the same density at x1 = x2 = 0.

These restrictions are sufficient to determine uniquely the bivariate Sargan distribution which we will compare to the bivariate normal. The reason is that, although the Sargan density (5.36) appears to depend on five parameters (a1, a2, K0, K3, θ), it can in fact be written in terms of only four parameters. Let ρ be defined as in (5.39) and define

(5.60)  S0 = f(0, 0) = a1a2K0 / [8(K0 + θ)]

Then

(5.61)  f(x1, x2) = [a1a2 / (8(K0 + θ))] exp(−a1|x1| − a2|x2|) [K0 + a1θ|x1| + a2θ|x2| + K3x1x2 + a1a2K0|x1x2|]

                  = [a1a2 / (8(K0 + θ))] exp(−a1|x1| − a2|x2|) { (K0 + θ)[a1|x1| + a2|x2|]
                    + K0[1 − a1|x1| − a2|x2| + a1a2|x1x2|] + K3x1x2 }

                  = exp(−a1|x1| − a2|x2|) { (a1a2/8)[a1|x1| + a2|x2|]
                    + S0[1 − a1|x1| − a2|x2| + a1a2|x1x2|] + (a1²a2²/4) ρ x1x2 }

The imposition of unitary variances implies a1 = a2 = 2, so that the Sargan distribution is uniquely determined once we pick a value of the correlation (ρ) and of the density at x1 = x2 = 0 (S0). We therefore provide in Tables 5.1-5.7 a comparison of the Sargan and normal densities. Each table corresponds to a different value of ρ. For x1 = {−3.0, −2.5, −2.0, −1.5, −1.0, −0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0} and x2 = {0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0} we provide the normal (top entry) and Sargan (bottom entry) density. (Negative values of x2 are unnecessary because of symmetry; e.g., f(x1, −1.0) = f(−x1, 1.0) and f(x1, x2) = f(x2, x1).)

The agreement between the Sargan and normal densities is rather close. The largest differences are near the turning points of the distribution (|x| between 1.0 and 2.0, say), but even these are not very large. Thus, for the values of ρ considered, the Sargan distribution appears to be a fairly good approximation to the normal. However, it should be noted that the range of ρ which we consider is rather limited. As indicated in the discussion following (5.39) above, we need to restrict the correlation (ρ) to ensure that the Sargan density is non-negative for all x1 and x2. We cannot expect a Sargan density to approximate well a bivariate normal with high correlation, at least not unless we allow negative "density" in the tails of the approximation.

5.4 A Simple Seemingly Unrelated Regression Model

In the previous section we saw that the bivariate Sargan distribution is a reasonably accurate approximation to the bivariate normal.
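As a rough illustration of how the comparisons in Tables 5.1-5.7 can be produced, the following sketch (ours, not the original computations) evaluates the four-parameter form of the bivariate Sargan density in (5.61), with a1 = a2 = 2 and S0 set equal to the bivariate normal density at the origin, alongside the bivariate normal density itself. The (a⁴/4)ρ coefficient on the x1x2 term is our reconstruction of the constant implied by (5.61) and should be read as an assumption.

    # Sketch (ours) of the density comparison behind Tables 5.1-5.7.
    import numpy as np

    def sargan2d(x1, x2, rho, s0, a=2.0):
        # f(x1,x2) from (5.61); the (a**4 / 4) * rho coefficient is an assumption
        ax1, ax2 = a * np.abs(x1), a * np.abs(x2)
        bracket = (a * a / 8.0) * (ax1 + ax2) \
            + s0 * (1.0 - ax1 - ax2 + ax1 * ax2) \
            + (a ** 4 / 4.0) * rho * x1 * x2
        return np.exp(-ax1 - ax2) * bracket

    def normal2d(x1, x2, rho):
        # standard bivariate normal density with correlation rho
        q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
        return np.exp(-0.5 * q) / (2.0 * np.pi * np.sqrt(1.0 - rho * rho))

    rho = -0.15
    s0 = normal2d(0.0, 0.0, rho)     # restriction (iii): equal densities at the origin
    for x1 in (-2.0, -1.0, 0.0, 1.0, 2.0):
        for x2 in (0.0, 1.0, 2.0):
            print(f"x=({x1:+.1f},{x2:+.1f})  normal={normal2d(x1, x2, rho):.3f}"
                  f"  sargan={sargan2d(x1, x2, rho, s0):.3f}")

With ρ = −.15 the printed pairs show the pattern of Table 5.1: identical values at the origin by construction, close agreement in the far tails, and the largest (still modest) discrepancies around the turning points.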
However, this fact does not provide direct evidence on the relevant question of whether estimators based on the bivariate Sargan distribution will be robust to normal errors. This question was addressed in chapters 3 and 4 in the univariate case, for a variety of models, and the answer was (not surprisingly) model dependent. It is reasonable to expect that the same will be true in the bivariate case; the asymptotic bias that results from assuming Sargan errors when' the errors are actually normal will depend on the model. Rather than become involved in the extensive calculations which were done in the univariate case, however, we will simply ask whether this bias is ever zero. It was, in the univariate case, in the linear regression model. 83 Thus it is reasonable to conjecture that this bias may be zero, in the bivariate case, in the seemingly unrelated regression model. For simplicity and tractability we restrict ourselves to only two equations and only one regressor (a constant term), in each equation. We show that the estimates based on the bivariate Sargan distribution are indeed consistent, regardless of the form of the true distribution of error terms, so long as it is symmetric around zero. Thus we consider the simple model Y1= “1+ ‘1 (5.62) Y2: “2+ ‘2 where yi is Txl, ei has mean zero and covariance structure: 0 if tl s E( s ., e .) ti 5] i,j=l,2 0.. if t=s 1] Notice that we have not yet specified the distribution function of the errors. Here, as in chapters 3 and 4, we ask the question of what happens if we assume a bivariate Sargan distribution when in fact the true distribution is something else (such as bivariate normal). We form the (Sargan) log likelihood function; M (5.63) e - MlnB+Im§1 in f(yml-ul, ymz'“2) 84 or £ I MXnB - a1 §| I? ‘U I ‘ a E Iy '9 I mal m 1 2 11131 m2 2 M + “121 in {KO-palerml-ulI+a29Iym2-u2I+K3(ym1‘ul)(ym2'fl2)} (5.64) + “1“2Ko l("11.1""‘1“I’m—”2) I I ' where B is defined as (5.35). To express its derivatives with respect to all parameters, we define: (5'65) D1 ' Ko+“1°(ym1'“1)"'“29("m2"“2)+(K3+“1“2Ko)(ym1'“1)(ymz'pz) (5-66) D2 - K0+a16(ym1-u1)-a29(ym2~u2)+(K3-a1a2Ko)(ym1-u1)(ymz-uz) (5‘67) D3 ' Ko"“1“(’mf‘fi““2“(ymz‘I‘z)+(K3’“1“2Ko)(ym1'“1)(ymz'uz) (5.68) 114 a Ko-ale(yml-p1)-aze(ym2-u2)+(K3+a1a2K0)(ym1-u1)(ymz-uz) 5 o 1 I’m ”1'+“29'ym2‘“2I+K3(ym1'“1)(ym2’“2) + alazxo I(ym1-I11)(ym2-p2) I 85 Then , the first order conditions defining the maximum likelihood estimates are; (5.70) (5.71) (5.72) "" ' “1(M1+-M1-) ' 2 537' ’ “2("2+3”2-) ' as '++ “19+(K3+“1“2Ko)(’m2‘“2) D1 3 1 2 0 D2 +.. - - «19+(K a a K M?1112 #2) -+ ale-(K3-a1a2K0)(ym2-u2) D3 + H “1 e’(K3*‘“1“2Ko)(yuaz"“2) D4 + +§ “29+(K3+“1“2Ko)(ym1'“1) 2 D1 + +E “2“’(K3’“1“2Ko)(ym1““1) D 2 - “E a29+(K3"a1¢12K0)(Ym1‘I11) D 3 +' “2“'(K3+“1“2Ko)(ym1'”1) +2: ‘11-) 4 M M '52" ' r‘mEIIymHI” 1 lg OIyml-pl I+a2KO I(ym1-u1)(ym2-p2) I - mIl DS 0 86 M 8£ M 5.73 ._.. . .._- _ 1X1 elymz'uzI+a1KOI(ym1-u1)(ym2-u2)l - 0 mIl D5 M 1+a - - (5.74) 5% _ Rid-'I Z __1“2I(ym1 Isl)(yu12 u2)I 0 M a I? ' I“ I? w I 6£ ‘M 1 m1 u1 2 m2 2 o mIl 5 M (y '11 )(y -u) (5.76) .5725. .. 2 and 1D m2 2 _ o 3 mIl 5 (where, M ++ is the number of terms in +2 and similarly for §+ , §_ , M .) Dividing these derivatives by M, and taking probability limits we have (denoting the probability limits of ai, "i' K0, K3, 0 by Hi , Hi, K0, R3, 9 respectfully): M M ~ 1+ 1.. (5.77) (11 {Plim T - Plim T } “H 1 ++ (119+(K3‘H11 2 O)(ym2-u2) - Plim T Plim 31—— 2 ~ H D 1 ”4+. 
1 +- “19+(K3-a1a2K0Xym2-p2) - Plim T Plim ii" 2 ~ +- D 2 M ’+;~'(i'3af)(y u) +P11m-i-1Plimfil—2 1 3120 m2 2 -+ D 3 Mn “ ;~'(fi+53~)(y u) +Plim—P11m—Lz 1 3 1 0 m2 2 M M__ 'f)’ l, M Plim (5-78) «2 [Plim T" M ] M++ 1 ++ a29+(K3+a1a2K0)(Ym1 111) - Plim T Plim fi— 2 ~ H D 1 M +- 3"-4 Iym1-u1I M Elyml-ul |+<12K0I yml-ull lymz-uzl + Plim 1 SI M 11131 D5 M 1 N (5.80) :g" Plim‘fi’ 21 IymZ-pzl a2 m. N N ~~ ~ ~ M Glymz'BZI+aIKOIYm1’H1IIymz'P2I +P11m'fi N m']. D5 M 1+3; IV :5 II? -El 1 2 2 (5.81) -:—-1—;+Plimila- X 1 2 “N m Ko+9 m=1 D5 M 3|? 411' It; Iy 411' (5.82) -~——1~+plim% 2 1 “I 1~2 “'2 2' Ko+6 mI1 D5 (5' -T1)(y 5) (5.83) nimfi 2 “1 1 “2 2 - o mIl D 5 Again we have a very messy but (in principal) useful set of equations. To show that ui's are consistent , following the same logic as in the univariate case, we only need to show that the above equations are satisfied by 3i I ui(and some values for U , Hi, 3 , R E3 ). With E i: “i' the second terms in both (5.79) and (5.80) are 0 U easy to evaluate, while other terms are very messy. But the actual values of the parameters : 31' B , P , R0, R3 are not important as long as acceptable solutions to the above (5.79)-(5.83) exists. By 89 acceptable we mean that 3i , R0 , 3 , P , be positive, and K3 be less than alazko . To evaluate (5.77) and (5.78) with Z i: ui, we note that; 141+ Ml_ Plim ( ___ )I P (y1> ul) I P (yl< ul) = P( ___ ) M M because of the symmetry assumption. Finally, it is obvious that the two equations (5.77) and (5.78) hold provided that the joint density of the error terms is symmetric around zero and certain expected values exist: 1 H a18+(K3+ala2Ko)€m2 E {(11 ( 3 1 2 0 2} (5.84) Plim-fi— 2 ~ 5 ++ D1 ++ 1 264i&"§)e — flequa%ka .E{1 31“202}_anfi_1_2 ~ I -- ~ .. n D 4 A ‘ ezaeezin za«:;"n 0 m2 1 3 1 2 o 2 (5.85) Plim-fi-l— 2 1 3~1 2 - E ~ +— D2 +- D2 “1°'(K3'“1“2Ko)“2 1 “ “’(Ka‘fifixokmz I E I P1111!M ~ " -+ D -+ D3 3 The same procedure can be applied to (5.78). Thus all (5.77)-(5.83) are indeed satisfied for ii I “i (and for 3i , U , R0, K3 as being implied by their respective equations (5.79)-(5.83) . The Sargan 90 "MLE" is consistent for “l and “2' This result can also be obtained in cases involving some regressors other than just a constant term. The proof is basically the same, and the result is that the Sargan "MLE‘s" are consistent for the coefficients of all of the regressors. 5.5 Conclusions In this chapter we defined a bivariate Sargan distribution. This was the result of a series of attempts to derive or approximate a bivariate (or multivariate) distribution function with certain properties. Basically, we want a distribution which reasonably approximates the bivariate normal, which has Sargan marginals, and which has an easily computable c.d.f. Our first-order Sargan distribution (5.36) has these properties. We calculated and compared the densities of the bivariate Sargan and bivariate normal distributions to show that this bivariate Sargan density is very close to the bivariate standard normal. (Of course, it has more parameters than the standard normal.)‘ The agreement between the two distributions is in fact very close except around the turning points (x1 and x2 between 1 and 2, say). Finally, we turned to the question of robustness of inferences based on the Sargan density. Here, we looked only at a rather simple model, a seemingly unrelated regression model , with only a constant term in each equation. Of course the MLE's based on a normality assumption are consistent, even if the errors are not normal. 
The more 91 interesting question is the effect on the estimates of assuming Sargan where the true distribution is not Sargan. Here we showed that the Sargan "MLE's" are still asymptotically unbiased, so long as the true distribution of the errors is symmetric around zero. This result can also be obtained in regressions with multiple regressors. For more complicated models, the Sargan MLE‘s will not always be consistent if the errors do not actually have a Sargan distribution. As we found for the univariate case in chapters 3 and 4, the extent of the asymptotic bias must be model dependent. But the above result at least is the basis for the expectation that the bias will depend strongly on the extent of censoring or truncation in the model: with complete observations, there is no bias. TABLE 5.1 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargan I bottom entry) p - —e15 x2 X1 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 -300 0.0 0.0 .001 -2e5 0.0 0.0 .001 .002 .0051 -200 ' .001 .001 .003 .001 .004 .010 .023 -105 .001 .003 .006 .012 .002 .006 .017 .038 .067 -100 .002 .005 .012 .025 .051 .002 .007 .021 .050 .092 .130 -es .004 .010 .021 .045 .090 .156 .002 .007 .021 .051 .097 .142 .161 0.0 .005 .013 .028 .059 .114 .184 .161 .001 .005 .016 .040 .079 .120 .5 .002 .006 .013 .028 .060 .115 .001 .003 .009 .024 .050 1.0 .001 ,002 .006 .013 .029 0.0 .001 .004 .011 1.5 0.0 .001 .002 .006 0.0 0.0 .001 2.0 0.0 0.0 .001 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.2 Comparison of Bivariate Ssrgsn and Normal Densities (Normal I top entry, Sargan I bottom entry) p - -.10 x2 x1 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 “3.0 0.0 0.0 0.0 -205 0.0 0.0 0.0 .001 .004 -200 0.0 .001 .002 .001 .003 .010 .021 -105 .001 .002 .006 .011 .001 .005 .016 .036 .064 “1.0 .002 .005 .011 .023 .047 .002 .007 .021 .049 .089 .127 ".5 .004 .009 .020 .042 .085 .149 .002 .007 .021 .051 .097 .141 .160 0.0 .005 .013 .028 .059 .114 .184 .160 .001 .005 .017 .042 .081 .121 .5 .003 .006 .014 .031 .065 .122 .001 .003 .010 .027 .053 1.0 .001 .003 .007 .015 .032 0.0 .001 .005 .013 1.5 .001 .001 .003 .007 0.0 .001 .002 2.0 0.0 .001 .001 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.3 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargan I bottom entry) p - -e05 \ X2 \ X1 \\ 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 -3e0 0.0 0.0 0.0 -2e5 0.0 0.0 0.0 .001 .004 -200 0.0 .001 .002 .001 .003 .008 .019 -105 .001 .002 .005 .010 .001 .005 .014 .034 .061 -1.0 .002 .004 .010 .021 .043 .002 .007 .020 .047 .087 .126 -es .003 .008 .018 .040 .080 .142 .002 .007 .021 .052 .097 .141 .159 0.0 .005 .013 .028 .059 .114 .184 .159 .001 .006 .018 .044 .083 .122 .5 .003 .007 .015 .034 .070 .129 .001 .004 .012 .029 .056 1.0 .001 .003 .008 .017 .036 0.0 .002 .006 .015 1.5 .001 .002 .004 .008 0.0 .001 .002 2.0 0.0 .001 .002 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.4 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargon I bottom entry) p I 0.0 x2 XI 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 '3.0 0.0 0.0 0.0 -205 0.0 0.0 0.0 .001 .003 “2.0 0.0 .001 .002 .001 .002 .007 .017 -1e5 .001 .002 .004 .009 .001 .004 .013 .031 .059 -100 .002 .004 .009 .019 .039 .002 .006 .019 .046 .085 .124 -es .003 .007 .017 .037 .075 .135 .002 .007 .022 .052 .097 .141 .159 0.0 .005 .013 .028 .059 .114 .184 .159 .002 .006 .019 .046 .085 .124 .5 .003 .007 .017 .037 .075 .135 .001 .004 .013 .031 .059 1.0 .002 .004 .009 .019 .040 .001 .002 .007 .017 1.5 .001 .002 .004 .010 0.0 .001 .003 2.0 0.0 .001 .002 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5 .5 Comparison of Bivariate Sargan and 
Normal Densities (Normal I top entry, Sargan I bottom entry) p I .05 x2 X1 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 ”3.0 0.0 0.0 0.0 -205 0.0 0.0 0.0 .001 .002 -200 0.0 .001 .002 0.0 .002 .006 .015 -105 .001 .002 .004 .008 .001 .004 .014 .029 .056 -1eo .001 .003 .007 .017 .036 .001 .006 .018 .044 .083 .122 -es .003 .007 .015 .034 .070 .129 .002 .007 .021 .052 .097 .141 .159 0.0 .005 .013 .028 .059 .114 .184 .159 .002 .007 .020 .047 .087 .126 .5 .003 .008 .018 .039 .080 .142 .001 .005 .014 .034 .061 1.0 .002 .004 .010 .021 .043 .001 .003 .008 .019 1.5 .001 .002 .005 .010 0.0 .001 .004 2.0 0.0 .001 .002 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.6 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargan I bottom entry) p I .10 x2 111 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 -300 0.0 0.0 0.0 “205 0.0 0.0 0.0 .001 .002 -200 0.0 .001 .001 0.0 .001 .005 .013 '1.5 .001 .001 .003 .007 .001 .003 .010 .027 .053 -100 .001 .003 .007 .015 .032 .001 .005 .017 .042 .081 .121 -es .002 .006 .014 .031 .065 .115 .002 .007 .021 .051 .097 .141 .160 0.0 .005 .013 .028 .059 .114 .184 .160 .002 .007 .021 .049 .089 .128 .5 .004 .009 .020 .042 .085 .149 .001 .005 .016 .036 .067 1.0 .002 .005 .011 .023 .047 .001 .003 .009 .023 1.5 .001 .002 .005 .011 0.0 .001 .005 2.0 0.0 .001 .002 0.0 .001 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.7 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargan I bottom entry) p I .15 x2 X1 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 -300 0.0 0.0 0.0 -2e5 0.0 0.0 0.0 0.0 .001 -200 0.0 0.0 .001 0.0 .001 .004 .011 -1e5 0.0 .001 .002 .006 .001 .003 .009 .024 .050 '1.0 .001 .002 .006 .013 .029 .001 .005 .016 .040 .079 .120 —.5 .002 .006 .013 .028 .060 .115 .002 .007 .021 .051 .097 .142 .161 0.0 .005 .013 .028 .059 .114 .184 .161 .002 .007 .021 .050 .092 .130 .5 .004 .010 .021 .045 .090 .156 .002 .006 .017 .038 .067 1.0 .002 .005 .012 .025 .051 .001 .004 .011 .023 1.5 .001 .003 .006 .012 .001 .002 .005 2.0 .001 .001 .003 0.0 .001 2.5 0.0 .001 0.0 3.0 0.0 APPENDIX A DERIVATION OF THE COVARIANCES Here, we use the following relationship: n -ax . _ n 0 n ay _ n! I; x e dx ( 1) 1.. y e dy “n+1 h>0 (A.1) COV(X,Y) - E(X,Y) - Ito Ii“ xyf(x,y)dxdy (A.2) f(x,y) is defined as in (5.14) a x+a y 11 . IO“ IO“ xy e 1 2 (K0 -K1 x IK2y+K3xy)dxdy 2K 2K x + _l x + __3. 0 azy KO-sz -K3y 1 0 ul 2 al 'Lwye [' 2 '2K 3ldY'—7{——§-—+2—TI “1 “23 “1 “2 “2 1 - 6-5:? {a1a2K0+2a2K1+2a1K2+4K3} (A-3) l 2 a y -a x 12 - I90 ye I; xe 2 (K0 +K1 x -K2y K xy)dxdy 0 “2y K0 2K1 K2y 2K3y '1... ye I7*"5"‘2"‘3 “7 a a a a 1 1 1 1 -1 I —§-§ (a1a2K0+2a2K1+2a1K2+4K3) (A-4) (11 (:2 I I I; e 2 y I2” xe ”2 (K0 IK1x+K2yIK3xy)dxdy 99 100 2 0 a 2 5 a1 1 a2 a2 -1 = -§-3- (a1a2K0+2a2K1+2a1K2+4K3) (A.5) “1 “2 ’“2’ "“1“ 2 2 I - I; e y I; e (Kox+K1x +K2xy+K3x y)dxdy (A.6) 1 - —Tjflflfi%fi%fifl%SM%} “1 “2 Therefore COV(X,Y) - o (A-7) 101 The same process is applied to f(x1,x2) as defined in (5.27): That is: a x +a x R - f0 Iga xlxze 1 1. 2 2{-6alx1-9a2x2-I'K3x1x2)dxldx2 1 —oo [0 x e+a2x2[- 2a19 + azexz + 2K3x2 -o 2 a 3 2 3 1 “1 “1 ] dx2 29 29a2 4K3 4 azaz a2a3+ 3 3 ' 3a3(“1“2°"(3) (“'8’ 12 12 12 “1 2 I + and similarly for other intervals; R2, R3, R we get: 49 R 8 f0 I” x e-a1x1+a2x2(9a x -6 +K )dx dx 2 «01"2 11“2"23"1"212 0 azx2 29x1 eazx2 2K3x2 =1... ‘27???" 
“*2 1 1 1 4x -29 29 3 -4 2 2 2 2+a3 3 3 3 (“1“29'1‘3) (“’9’ “1 “2 “1 “2 1 “2 “1 “2 102 -4 R3 - a 3 3 (alaZO-KB) (A.10) 1 “2 ' R -__é._.( 9+K) (All) 4 a an 3 “1“2 3 - 1 2 Therefore, “1“2 COV(X1,X2) - E(X1,X2) - -§§-(R1+R2+R3+R4) 3 (A.12) APPENDIX B DERIVATION OF MOMENT GENERATING FUNCTION AND CUMULATIVE DISTRIBUTION FUNCTION OF BIVARIATE SARGAN i) Moment generating function: let a a -a Ix l-a Ix I 1 2 1 1 2 2 f(x,y) 3 W (KO'HIIGIXI [+1129]le +K3x1x2+a1a2Kolx1||x2| (3.1) as defined in (5.36) t x +t x M(T) - E(e 1 1 2 2) (a +t )x +(a +t )x 0 0 1 1 1 2 2 2 B1 - f_cf_° e (KO-mlBxl-a29x2+(K3+ala'2K0)x1x2)dxldx2 o e<“2+t2)"2 Ko’“2°"2 “1°'(K3+“1“2Ko)"2 ‘~D a +t 2 1 1 (a1+t1) } dx2 - (a1+t1)Ko+a19 + (a1+t1)a26+(K3+u1a2Ko) (B 2) (a1+t1)2(a2+t2) (a1+t1)2(a2+t2)2 103 104 0 -(a1-t1)xf(a2+t2)x2 I~= E; e (“0+“19x1'“2°x2+(x3 “1“2Ko)“1‘2)“x1“x2 f0 e(a2+t2)x2 KO-azexz + a19+(K3-a1a2K0)x2} dx '~= al-tl (a -t )2 2 1 1 (al-t1)Ko+a19 (cl-t1)a29-(K3-a1a2Ko) 2 + 2 2 (3.3) (cl-t1) (a2+t2) (cl-t1) (a2+t2) (a +t )X '(a -t )x 1 1 1 2 2 2 I; I?” e [KO-a19x1+a29x2+(K3-a1a2K0)x1x2]dx1dx2 -(a2-t2)x2 Ko+a29x2 ale-(K3-a1a2Ko)x2 I; e a +c + 2 }“x2 1 1 (a +t ) 1 1 (a1+t1)Ko+a19 + -(K3-a1a2Ko)+a29(a1+tl) (B 4) 2 2 2 ° (a1+t1) (az-tz) (a1+t1) (az-tz) -(a -t )x -(a -t )x 1 1 1 2 2 2 I; I; e [K0+a16x1+a29x2+(K3+ala2Ko)xlx2]dx1dx2 105 Ko+a29x2 a19+(K3+a1a2K0)x2 -(a2-t2)x2 ' o 1 a -c" + 2 } “x2 1 1 (cl-t1) ‘ (al-t1)Ko+a19 (cl-t1)a265T (K3+a1a2Ko) (B 5) 2 f O (cl-t1) (dz-t2) (cl-t1) (dz-t2) “1“2 M(T) {31+B +B3+34} 8(KO+B) 2 ala2 (a2+t2)[(a1+t1)K0+a16]+a29(a1+t1)+K3+a1a'2KO 8(K0+9) 2 2 (a1+t1) (a2+t2) + (a2+t2)[(a1-tl)Ko+a19]+(a1-t1)aZG-(K3-a1a2Ko) (cl-t1)2(a2+t2)2 +-(“2"2)[(“1+“1)Ko+“1“]'(K3'“1“2Ko)+“2°(“1+t1) 2 2 (a1+t1) (dz-t2) (dz-t2)[(a1-t1)Ko+a16]+a26(a1-tl)+(K3+a1a2Ko)} + 2"""‘2 (“1"1) (“2"2) 106 { K0 + «19 + aze (a +t )(a +t ) 2 2 1 1 2 2 (a1+t1) (a2+t2) (a1+t1)(a2+t2) + “1“2K0+K3 2 2 (a1+t1) (a2+t2) K a 9 a 9 0 1 + 2 (“1"1)(“2+“2) (a1+t1)2(a2+t2) (al-t1)(a2+t2)2 a a K -K + 1 2 ° 3 (B.6) (al-t1)2(a2+t2)2 K a 9 a 9 O + 1 + 2 (“1+‘1)(“2"2) (a1+t1)2(a2-t2) (a1+t1)(a2-t2)2 “1“2K0‘K3 + 2 2 (“1+“1) (“2’“2) K a 9 a 9 0 + 1 + 2 (“1‘“1)(“2"2) (al-t1)2(a2-t2) (al-t1)(a2-t2)2 “1 “2K0+K3 “1 “2 + (al-tl)z(a2-t2)2 8(K0+9) Since, 107 an” 1 (-1)“‘“ nhn! ( ) - or in general (8.7) n m X X n+1 n+1 6X1 axz 1 2 x1 x2 n+m a ( 1 _ (-1)“+“(n-1—1)1(m+J-1)2 (3.8) m i j n+1 n+3 _ _ axlnax2 x x x1 x2 (1 1)!(J 1)! an+1n [M(t)] - alaz { (-1)n+m n! m! K0 btlnotzm 8(K0+9) (“1+t1)n+1 (“2+t2)m+l (-1)“"“ (n+1)! m! ale (a1+tl)n+2(a2+t2)m+1 (-1)“”“ n! (n+1)! (129 (-1)“"“ (n+1)! (n+1)! (a1a2x0m3) + + (“1+t1)n+l(a2+t2)m+2 (a1+t1)n+2(a2+t2)m+2 (--1)"'l n! 11:! K0 (--1)“1 (n+1)! m! ale + + (al-t1)n+1(a2+t2)m+1 (al-t1)n+§(a2+t2)m+1 108 (-1)“' n! (n+1)! a e {-1)“ (n+1)! (n+1)! (a a x -1< ) 2 + 1 2 o 3 n+2 2) n+1 n+2 n+2 ((11 t1) (a2+t (al-tl) (a2+t2) (-1)n n! m! x0 (-1)n (n+1)! m! «16 + + n+1 n+1 n+2 n+1 (a1+t1) (a2 t2) (a1+t1) (a2 t2) {-1)“n: (n+1)! a29 (-1)“(n+1)1 (n+1)! (“1“2K0‘K3) + 1 n+2* + n+2 n+2 (“1+“1)n+ (“2't2) (“1+t1) (“2'“2) (-1)2(“+“)n1 m! K0 (-1)2(“*“)(n+1)1 m1 ale + + (a1_t1)n+1(a2_t2)m+1 (a1_t1)n+§(a2_t2)m+1 ' (-1)2(“+“)n1 (n+1)! a29 (-1)2(“+“)(n+1)1 (n+1)! (a1a2K0+K3) + + (al-t1)n+1(a2-t2)m+2 (al-t1)n+2(a2-t2)m+2 n a a nlle [1+(-1)“+(-1)“+(-1)“+“1 _ a +“mt-0) _ 1 2 { o I"'nm 6t nat m 51K0+95 a n+1a n+1 1 2 1 2 109 +[1+(-l)n+(-1)m+(-1)n+m](n+1)1m1a16 a n+2“ n+1 1 2 [1+(-1)“+(-1)“+(-1)“+“]n1(m+1)!aze a n+la n+2 1 2 + 2(n+m)]a 1(-1>“+“+<-1>“+<-1>“+(-1) lazxo + (n+1)!(m+1)! n+2 n+2 “1 “2 1<-1)“+“-<-1)“—<-1>“+(-1>2‘°+“’1K3(n+1>!(n+1)! 
+ a n+2“ n+2 l 2 Therefore; n! m! Ko+(n+l)! m! O+n! (n+1)! O+(n+1)! (m+1)!K0 p 3 on n m 2a1 a2 (KO+6) if both nlm even [n!m!+(n+l)1(m+1)!]KO+[(n+1)1m!+n!(m+1)!]9 - (3.11) n m 2:11 «2 (K0+9) 110 (n+1)!(m+1)!K 3 nm . n+1 m+l 2a1 a2 (K0+9) If both n,m are odd (3.12) p - 0 otherwise (8.13) 11) Derivation of Cumulative Distribution Function 1) zl,zz>0 1° {0 ea1x1+a2x2[x - 9x -a +(x +a ) ]d d -o -o o “1 1 29x2 3 1“2Ko x1‘2 x1 x2 Ko'“2°"2 + “1°'(K3+“1“2Ko)x2 a 2 1 a1 1 dx 0 “2x2 I...e[ 2 2(K0+9)a1a2+K3 - 2 2 (3.14) “2 0 z1 ’“1x1+“2x2 I = [_OIO e [Ko+a1Oxl-az9x2+(K3-a1a2Ko)x1x2]dx1dx2 111 ' I?“ "‘l’{° 1 1[KO‘HH'fle“1'“29"2"“‘3'“1“2Ko)""1"2 (“3'“1“2Ko)x2 “2‘2 + ] e “1 (“3’“1“2Ko)*2 “2x2} “1 le 9x+ - [KO+9--O z a x +a x o 1 1 1 2 2 I15 I... I... e “‘0 “19x1 “2932+(K3+“1“2Ko)"1"2]“"1“"2 . “121 I —§-§- [WI-alwzzl (3.21) (11 a2 22 z1 “1x1-“2xz IO 1.“ e [Ko’“1°x1+“29x2+(K3’“1“2K0)x1x2]“x1“xz 115 “121'“222 -e ' 2 2 {2"4'K3'“1("4‘K3)z1+("4'K3)“222 ”5z122} a a 1 2 (12 e 1 1 1"‘EF‘7E [2“4 K3 “1("4’K3)z1] “1 “2 F(z1,22) - A(I6+I7) A { .112 224111 21 -8 “1’1 22]+e [4w4-2a w z ]} ' w5‘1 1 a 1 [ZwA-K3-a1(w4-K3)zl+a2(w'4-K3)z2 (8.22) (8.23) APPENDIX C SECOND-ORDER BIVARIATE SARGAN DENSITY Let -a1|x1|-a2|x2| f(x1,x2) - A e [KO+K1|x1|+K2|x2|+K3x1x2+K‘lele 2 2 where a 3a 3 A - 1 2 (c.2) 43 2 2 2 8 a1 a22K0+a1a22K1+a1 a2K2+a1a2K4+2a22K5+2a1 K6 Its marginal densitites are: i) x1>O '“1‘1 '“2‘2 2 2 L1 - e I; e (KO+K1x1+K2x2+(K3+K4)x1x2+K5x1 +K6x2 )dx2 116 117 2 -a x K +K x +K x1 + K2+(K3+K4)x1 2x _ e 1 ll 0 1 1 5 + 6] (C.3) a2 2 a 2 “2 3 a x a x 1 l 0 2 2 2 2 L2 - e I” e (KO-Flel-K2x2+(K3-K4)x1x2+l(5xl +K6x2 )de 2 -a1x1 K0+K1x1+K5x1 KZ-(K3-K4)x1 2K6 - e [ + + ] (C.4) a2 2 a 3 “2 2 ii) and similarly for x1<0, we get: 2Ae_a1x1 2 A(L1+L2) ' '-;-§-— [Ko+).1x1+).2x1 ] if x1>0 2 81(X1) - -3§3:1:-1—[x-1x+xx2] ifx<0 3 0 1 1 2 1 1 a 2 (C.5) where 2 X K +11 K +2K o'“2022 6 *1 ' “22x11“2K1. 1 X;- - alYl (C.6) 118 substituting back into (C.5), -a1|x1| ale 2 2 31“1’ ' 2(1+yl+2y2) (1+“1Y11‘1l+“1 Y2‘1 ) and similarly for x2: -a IX 1 a2e 2 2 2 2 32(‘2) ' 2(1+§1+2§2) (1+“2§1|‘2|+“2 “2‘2 > where 2 alhK2+a1K4 2 a2(a1 Ko+a1K1+2K5) £1 ' 2 “1 K6 a n 2 2 2 62 (a1 K0+a1K1+2K5) (C.7) (0.8) (0.9) (C.10) (C.11) Equations (C.7) and (0.8) are second-order Sargan densities for X1, and X2 respectively. To ensure that first and second partial derivatives are continuous we set 51 - 1, 71 - 1 (see Goldfeld and Quandt (1981)). Equations (C.6) and (C.10) result in: 2 “2 K1+“2Kl. K1 K2 K4 2K+a +21< ' a1 .3) K0 . “-1.-“2+“1“ “2 0 2‘2 6 2K +a K K “1 2 11. K2 K1 4 2 - :12 .> K0 - ?-T+aa a1 KO-i-alKl-I-ZKS 2 1 1 That is; 5.1-5.2. .. ‘5 -32. a a 2 2 1 2 a1 12 a2 :2 Equation (0.11) can be rewritten as: K 8 K6 - 51.- 2K5 0 a 5 a1 a 2 2 1 and from (C.7) we get: “2K5 .. a Y .> K5 _ 1(2- _ 2K6 2 1 2 2 a 2 a2 K0+azK2+2K6 a1 12 2 a2 Using equations (C.14) - (C.16), we get: K K a 2 Y2 a 2 £2 1 2 Then; a ‘1 5 a 2: 5 K _ 1 2 K _ 2 2 3 5 1+72 6 [+52 and, (C.12) (C.13) (C.14) (C.15) (C.16) (C.17) (C.18) 120 Y E 2 2 K4 GIGZKO'HIIGZO (ill-Y;+ 112;) (11112113 (C.19) Y2 ’52 v4 . 1T1}; , v5 . 
TF5; (c.20) v3 I KO + 6 (v4+v5) (C.21) Therefore K1 I alv1 , K2 I azv2 2 v - 5 ( 1 - Y2 ) - x (c.22) 1 1+; 1+7 0 2 2 1 2“2 v2 I 6 (1+Y2 - 1+§2) - K0 (C.23) Using the above relationships, the joint density can be rewritten as “1“2 '“11‘1l'“2"2l f(‘1 "2) ' ’85” “ [Ko+“1"1|‘1 |+“2"2 I‘2l“"3‘1‘2"“1“’2"31‘1‘2l 2 2 2 2 + a1 ovax1 +a2 6v5x2 ] (0.24) The covariance of X1,X2 will therefore be derived as follows: a x a x O 2 2 0 1 1 2 2 MI I f_° x2e f_o x1e [K0 allel a2v2x2+(K3+a1a2v3)x1x2+a1 6V4):1 2 2 + a2 vsox2 ]dx1dx2 121 x +a 2v 6x 2 2[-a1v1+(K3+a1a 2V3)‘2] 0 “2‘2 Ko’“2"2 2 2 5 2 I.” ‘2e { a 2 + a 3 1 1 6a 26v _ 1 4} d a 4 ‘2 1 2x +2 2 +5 25 +2(x +5 ) 5 25v .- “1 o “1 v1 “1 v4 + 2 “1“2‘2 3 1“2V3 + “2 5 (c 25) (14:12 ¢3¢3 (14112 . 1 2 1 2 2 1 and similarly for the other parts, i.e.: 2 2 2 2 -(a1 Ko+2a1 v1+6a1 5v4) 2[-a1a2v2+2(K3-u1a2v3)] 6oz 6v5 M2 ' a 4“ 2 1 a 3 3 ' a a 4 1 2 1 “2 1 2 (C.26) -(a 2K +2a 2v +6a 26v ) 2[-a a v +2(K -a a v )] 6a 26v M - 1 o 1 1_~ 1 4 + 1 2 2 3 1 2 3 _ 2 5 3 4 2 ' 2 '4' “1 “2 “1 “2 “1 “2 (c.27) a 2x +2a 2v +6a 25v 2[a a v +2(x +5 a v )1 6a 25v M - 1 o 1 1 1 4 + 1 2 2 3 1 2 3 + 2 5 4 a 4“ 2 a 3“ 3 a 2a 4 1 2 1 2 1 (C.28) 2K aa 1 2 3 (c.29) C0V(X1,X2) - -§3-'(M1+M2+M3+M4) 5a 2a 2 1 2 and the correlation coefficient is: 2 2 p - 21(3/a1 a2 5 - K3 (1+12)(1+§25 16(1+3yz)(1+3§2) 251525 / (“+312)(1+3§2) _T’Z / “1 “2 (1+72)(1+§2) (3.30) Chapter 6 Conclusions The main objective of this dissertation was to investigate the adequacy of the Sargan distribution as an approximation to the normal in econometric models. The normal distribution is widely assumed, in part because it often leads to simple results. However, in some models the normal distribution is computationally more complicated than the alternative distributions, such as the Sargan distribution, whose c.d.f. can be expressed in closed form. Thus there may be a computational benefit to using an approximation to the normal in certain models, and we ask what the cost might be. In chapters 3 and 4 we showed that the univariate Sargan distribution provides a very close approximation to the normal, in the sense that their densities are quite close to each other. This is especially so if a second or higher-order Sargan distribution is used. Such comparisons have also previously been made by Goldfeld and Quandt (1981) and by Missiakoulis (1983). However, the fact that the densities are close does not necessarily imply that MLE's based on the Sargan distribution will have properties similar to the MLE's based on the normal distribution. Therefore, we have considered a variety of models, and asked what the 122 123 cost is of assuming the errors to have a Sargan distribution, if in fact they have a normal distribution. Our definition of cost is the asymptotic bias (inconsistency) of the resulting estimates. This cost is model dependent; in other words, it is different in different models. There is no cost in the linear regression model. In fact, our result for the linear regression model is stronger: the Sargan MLE's are consistent regardless of the true distribution of the error terms, so long as it is symmetric. However, in the so-called sample selection models, in which the c.d.f. of the errors appears in the likelihood function, there is a cost to an incorrect assumption of the distribution of the errors. How high the cost is depends upon how complete the sample is; the cost is higher when the sample is more highly censored or truncated. 
For example, in the censored regression model the asymptotic bias varies positively with the degree of censoring. As the sample becomes more complete, the bias disappears, which is consistent with our result for the fully observed sample (the linear regression model). The same kind of results occur for the truncated dependent variable model, except that the bias is generally larger in the truncated model than in the censored model. This is consistent with the reasonable intuition that additional information helps in reducing the bias of the estimates. Another result which is consistent with this intuition is that the bias of the estimates is virtually always smaller when the error variance is known than when it is unknown. The bias is sometimes large enough to be a serious problem, but it is relatively minor for samples that are at least 50% complete in the censored case, or 75% complete in the truncated case.

The same kinds of results hold for higher-order Sargan distributions, but the bias is smaller when a higher-order Sargan distribution is assumed. Therefore, as should be expected, it is much safer to approximate the normal distribution by the second-order Sargan distribution than by the first-order Sargan.

Another interesting result is that it is much less costly (in terms of asymptotic bias) to approximate the normal distribution with the Sargan than vice versa. In the linear regression model, there is no bias either way. However, in our more complicated models, the bias caused by assuming Sargan errors when normality is true is much smaller than the bias caused by assuming normality when the Sargan is true. It is not apparent why this should be the case.

The overall conclusion from our study of the univariate Sargan distribution is that one should not use the Sargan distribution if one really believes that the normal distribution is correct. Any computational savings are not worth the cost, in terms of asymptotic bias and the resulting incorrect inferences. On the other hand, while models depending on the univariate normal c.d.f. (e.g., the Tobit model) are not terribly complicated computationally, models that involve the multivariate normal c.d.f. (e.g., a multi-market disequilibrium model) are still very difficult to estimate. Thus a multivariate Sargan distribution might be more valuable, in terms of potential computational savings, than the univariate Sargan distribution.

We have considered a bivariate distribution whose marginals are univariate Sargan, and defined it to be a bivariate Sargan distribution. We have shown that our constructed bivariate Sargan density is very close to the bivariate standard normal density except around the turning points, and it is not very far off at those points either. We have also proved, in the seemingly unrelated regressions model, that the estimates based on the bivariate Sargan density are asymptotically unbiased (consistent) regardless of the true distribution of the error terms, as long as it is symmetric. Presumably this is not so in more complex models. Although we did not investigate such models in detail, this could be done by a straightforward extension of the methods used earlier in the thesis. The main problem with our bivariate Sargan density is the limited possible range of the correlation coefficient, ρ. It is not very interesting to limit attention to bivariate or multivariate Sargan distributions that exhibit almost no correlation, but this is necessary to keep the density non-negative over its entire range.
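A rough numerical check of this last point is easy to set up. The sketch below is ours, not part of the original study, and it relies on the same reconstructed form of (5.61), with the (a⁴/4)ρ coefficient, used in the sketch in chapter 5: for a grid of values of ρ, with S0 matched to the bivariate normal value at the origin, it finds how far from the mean the density first turns negative.

    # Check (ours) of where the reconstructed bivariate Sargan density of (5.61)
    # turns negative; a1 = a2 = 2, so the units of x are standard deviations.
    import numpy as np

    def sargan2d_bracket(x1, x2, rho, s0, a=2.0):
        # sign of the density equals the sign of this bracket
        ax1, ax2 = a * np.abs(x1), a * np.abs(x2)
        return (a * a / 8.0) * (ax1 + ax2) + s0 * (1.0 - ax1 - ax2 + ax1 * ax2) \
            + (a ** 4 / 4.0) * rho * x1 * x2

    grid = np.linspace(-20.0, 20.0, 801)          # +/- 20 standard deviations
    x1, x2 = np.meshgrid(grid, grid)
    for rho in (0.05, 0.15, 0.25, 0.35, 0.45):
        s0 = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho ** 2))
        neg = sargan2d_bracket(x1, x2, rho, s0) < 0.0
        if neg.any():
            r = np.sqrt(x1 ** 2 + x2 ** 2)[neg].min()
            print(f"rho={rho:.2f}: density first negative about {r:.1f} sd from the mean")
        else:
            print(f"rho={rho:.2f}: density non-negative on the +/-20 sd grid")

Under these assumptions the density remains non-negative everywhere as long as |ρ| does not exceed S0 (about .16 here), while for moderately larger |ρ| the first violations occur several standard deviations from the mean; only for substantially larger |ρ| do they move close enough to the mean to be a practical concern.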
An interesting unanswered question is whether ignoring this restriction would cause problems in actual empirical work. For example, it would certainly not matter in any practical sense if the Sargan density were negative ten or twenty standard deviations from the mean.

BIBLIOGRAPHY

Arabmazar, A. and P. Schmidt (1982), "An Investigation of the Robustness of the Tobit Estimator to Non-Normality," Econometrica, 50, 1055-1063.

Goldberger, A. S. (1980), "Abnormal Selection Bias," SSRI Discussion Paper 8006, University of Wisconsin, Madison.

Goldfeld, S. M. and R. E. Quandt (1981), "Econometric Modeling with Non-Normal Disturbances," Journal of Econometrics, 17, 141-155.

Gröbner, W. and N. Hofreiter (1958), Integraltafel, Zweiter Teil: Bestimmte Integrale, Vienna: Springer-Verlag.

Heckman, J. (1976), "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables, and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 5, 475-492.

Johnson, N. L. and S. Kotz (1970), Continuous Univariate Distributions, Vol. 1, New York: Wiley.

Missiakoulis, S. (1983), "Sargan Densities: Which One?," Journal of Econometrics, 23, 223-234.

Stone, M. H. (1962), "A Generalized Weierstrass Approximation Theorem," in Studies in Modern Analysis, Vol. 1, ed. R. Creighton Buck, Englewood Cliffs, New Jersey: Prentice-Hall.