ON THE ADEQUACY OF THE SARGAN APPROXIMATION TO THE NORMAL IN ECONOMETRIC MODELS

By

Mohammad-Ali Kafeei

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Economics

1984

ABSTRACT

ON THE ADEQUACY OF THE SARGAN APPROXIMATION TO THE NORMAL IN ECONOMETRIC MODELS

By

Mohammad-Ali Kafeei

The normal distribution is very commonly assumed in econometric models, but it causes some problems in models for which the likelihood function involves the normal c.d.f. Goldfeld and Quandt introduced a class of distributions, called Sargan distributions, to be used as an approximation to the normal; these have certain desirable properties, notably an analytically integrable c.d.f. We investigate the adequacy of these Sargan distributions in approximating the normal. This is done by measuring the "cost" resulting from the use of the Sargan distribution instead of the normal as the assumed distribution of the error terms. The cost is defined in terms of the asymptotic bias (or inconsistency) of the parameter estimates when the errors are actually normal. We prove that this "cost" is model dependent. There is no cost in the linear regression model, but in other models (such as sample selection models) there is a cost (a false distributional assumption causes inconsistency), and the cost goes up with the degree of censoring or truncation. We consider three different models, for both the unrealistic case of a known variance and the realistic case of an unknown variance, considering both first- and second-order univariate Sargan distributions. We also define a bivariate Sargan distribution, examine its properties, and investigate its adequacy as an approximation to the bivariate normal distribution. We provide tables for the bivariate Sargan and normal densities, compare them, and discuss the use of this Sargan distribution in a simple seemingly unrelated regression model.

The overall conclusion of this study is that if one believes that the true distribution of the error terms is normal, it does not pay very much to use the Sargan distribution. Especially in heavily censored (or truncated) samples, the cost is more than the benefits (in terms of computational savings) gained. But the potential computational saving is generally higher in the bivariate (or especially the multivariate) case than in the univariate case, so multivariate Sargan distributions may be worthy of further investigation.

To the memory of my father, and to my mother, brother and sisters.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude and appreciation to Professor Peter J. Schmidt, chairman of my dissertation committee, for his invaluable advice, suggestions, comments and criticism, as well as his encouragement and support, without which this study would not have been possible. I am also grateful to the other members of the committee, Professor Robert H. Rasche, Professor Daniel S. Hamermesh, and Professor Byron W. Brown, for their critical reading and useful comments and suggestions. I also would like to thank Mrs. Terie Snyder for typing the equations of the dissertation. Most of all, I am indebted to the members of my family for their moral and financial support.
TABLE OF CONTENTS

List of Tables

Chapter I    Introduction
    1.1  Historical Background
    1.2  Univariate Sargan Distribution: Definitions and Properties
    1.3  Outline of the Dissertation

Chapter II   Robustness of Normal MLE's to Sargan Errors
    2.1  Introduction
    2.2  The Censored Case (Tobit Model)
    2.3  The Truncated Dependent Variable Model
    2.4  The Binary (Probit) Model
    2.5  Conclusions

Chapter III  Robustness of Sargan MLE's to Normal Errors: First Order Case
    3.1  Introduction
    3.2  Linear Regression Model
    3.3  The Tobit Model
    3.4  The Truncated Dependent Variable Model
    3.5  Conclusions
    Appendix A

Chapter IV   Robustness of Sargan MLE's to Normal Errors: Second Order Case
    4.1  Introduction
    4.2  The Linear Regression Model
    4.3  The Tobit Model
    4.4  The Truncated Regression Model

Chapter V    Bivariate Sargan Distribution
    5.1  Introduction
    5.2  Definitions
    5.3  Density Comparisons
    5.4  A Simple Seemingly Unrelated Regression Model
    5.5  Conclusions
    Appendix A
    Appendix B
    Appendix C

Chapter VI   Conclusions

References

LIST OF TABLES

2.1  Asymptotic Bias of Tobit When True Errors are First Order Sargan
3.1  Asymptotic Bias of Sargan "MLE" When True Errors are N(0,1)
4.1  Asymptotic Bias of Second Order Sargan "MLE" When True Errors are N(0,1)
5.1  Comparison of Bivariate Sargan and Normal Densities (ρ = -.15)
5.2  Comparison of Bivariate Sargan and Normal Densities (ρ = -.10)
5.3  Comparison of Bivariate Sargan and Normal Densities (ρ = -.05)
5.4  Comparison of Bivariate Sargan and Normal Densities (ρ = 0.0)
5.5  Comparison of Bivariate Sargan and Normal Densities (ρ = .05)
5.6  Comparison of Bivariate Sargan and Normal Densities (ρ = .10)
5.7  Comparison of Bivariate Sargan and Normal Densities (ρ = .15)

Chapter 1

Introduction

1.1 Historical Background

The assumption of normality is very common in econometric and statistical work. For example, errors in regression models are often assumed to be normally distributed; hypotheses about means of random variables are tested under the assumption of normality; and in more complicated models, such as disequilibrium models or bivariate probit models, a multivariate normal distribution is assumed. To some extent, the assumption of normality can be justified by reliance on the central limit theorem. However, such justifications are seldom very rigorous, and it is probably fair to say that the frequent use of the normal distribution is in fact due to two other reasons.

First, in many common models the assumption of normality is very convenient. For example, in the linear regression model, the maximum likelihood estimator under normality is least squares, which is simple to calculate. Also, given normality, exact finite sample tests of linear hypotheses are possible. Alternative error distributions would lead to more complicated estimation and testing procedures, which would generally be justified only asymptotically. Second, in many common models inferences based on normality are asymptotically robust to non-normality. Again using the linear regression model as an example, tests which are exact in finite samples given normality are correct asymptotically given any non-normal distribution with finite mean and variance. However, in other models the assumption of normality may not be very convenient.
The cumulative distribution function of the normal distribution cannot be expressed in closed form, and therefore the normality assumption may not be the most convenient in models for which the likelihood function contains the c.d.f. of the error distribution. This is so in a wide class of models involving censoring, truncation or selection of the dependent variable (e.g., the Tobit model). This is especially true in multi-equation models (e.g., a multi-market disequilibrium model), since the c.d.f. of the multivariate normal distribution can be very expensive to evaluate. Such computational considerations led Goldfeld and Quandt (1981) to introduce a class of distributions, which they call "Sargan distributions", to be used as substitutes for the normal distribution in models for which estimation requires evaluation of the c.d.f. of the error distribution. The Sargan c.d.f. can be evaluated analytically, and its density is reasonably close to the normal density. Thus Sargan distributions may be reasonable candidates to use as an approximation to the normal distribution.

An obvious question to ask is how good an approximation to the normal distribution a Sargan distribution provides. One way to answer this question is simply to compare the densities (or c.d.f.'s) of the two distributions. Goldfeld and Quandt (1981, p. 145) provide in their Table 1 a comparison of the densities of N(0,1) and of a second-order Sargan distribution with mean zero, variance one, and the same density at zero as N(0,1). The agreement is reasonably close, except in the tails. Missiakoulis (1983, p. 227) provides in his Table 1 the same results, plus the density of the first-order Sargan distribution with variance equal to one (α = 2). As might be expected, the first-order Sargan distribution does not provide as good an approximation as does the second-order Sargan distribution.1

While such direct comparisons are informative and easy to make, they do not provide good evidence on the relevant statistical question, namely the effect on one's estimates or inferences of the use of such an approximation to the (hypothesized) true error distribution. This is a much harder question, in part because its answer clearly depends on the nature of the model in which the errors appear. For example, an incorrect assumption of normality does not cause bias or inconsistency of the parameter estimates in the linear regression model, but it does cause inconsistency in limited dependent variable models, such as the Tobit model. Conversely, the incorrect assumption that the errors have a Sargan distribution does not cause inconsistency in the linear regression model (as we prove later), but it does cause inconsistency in more complicated models such as the Tobit model. Indeed, it is an unfortunate fact that, for the class of models for which the Sargan assumption is convenient (models for which the likelihood function involves the error c.d.f.), the correctness of the assumed error distribution is vital, even asymptotically.

Some work has been done on the robustness of the normal maximum likelihood estimates to non-normality, in the Tobit model and some related models. Goldberger (1980) and Arabmazar and Schmidt (1982) consider the Tobit model, the truncated version of the Tobit model, and the probit model, and consider the asymptotic bias (inconsistency) which results when the normality assumption is incorrect.
The alternative distributions considered were Student's t, Laplace, and logistic, and the models considered contained only a constant term. The bias is sometimes substantial, so that this problem merits further attention. Missiakoulis (1983) has provided a similar analysis, for the probit model only, of the asymptotic bias resulting from an (incorrect) assumption that the errors have a Sargan distribution, when the normal is in fact correct.

In this thesis we consider some implications of the use of the Sargan distribution in econometric models. The plan of the thesis is given in section 1.3. Section 1.2 first defines Sargan distributions, and gives some of their properties, for later reference.

1.2 Univariate Sargan Distribution: Definitions and Properties

Goldfeld and Quandt (1981) define the family of Sargan densities as (their equation 2.1)

(1.1)  f(x) = (Dα/2) e^{-α|x|} (1 + Σ_{j=1}^{P} γ_j α^j |x|^j),

where α > 0, γ_j ≥ 0 for j = 1,2,...,P, and P is the order of the density. Alternatively, in the notation of equation (1) in Missiakoulis (1983),

(1.2)  f(x) = (Dα/2) e^{-α|x|} Σ_{j=0}^{P} γ_j α^j |x|^j,

with

(1.3)  D = ( Σ_{j=0}^{P} γ_j j! )^{-1}   and   γ_0 = 1.

Note that, as Goldfeld and Quandt and Missiakoulis mention, by setting γ_j = 0 for all j ≥ 1, f(x) reduces to the Laplace density, so the Sargan family is a generalization of the Laplace density. Also, Goldfeld and Quandt (1981, p. 144) note that for f'(x) to be continuous, one must impose γ_1 = 1.

The Sargan density is a symmetric, unimodal density function with x defined on (-∞, ∞). Its moment generating function can be derived as follows:

(1.4)  M(θ) = E(e^{θx}) = ∫_{-∞}^{∞} e^{θx} f(x) dx.

Define

A = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i ∫_{-∞}^{0} e^{(α+θ)x} (-x)^i dx.

Using the fact (see Gröbner and Hofreiter (1958), p. 55) that

(1.5)  ∫_{0}^{∞} x^n e^{-ax} dx = n!/a^{n+1},   a > 0,

we obtain

A = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i i!/(α+θ)^{i+1},

and similarly, if we define B as the corresponding integral over (0, ∞),

B = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i i!/(α-θ)^{i+1}.

Thus

(1.6)  M(θ) = A + B = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i i! [ 1/(α+θ)^{i+1} + 1/(α-θ)^{i+1} ].

Therefore, the rth moment of the Sargan distribution is (θ = 0)

(1.7)  μ_r = (D/2) Σ_{i=0}^{P} α^{i+1} γ_i i! [ (-1)^r (i+1)(i+2)···(i+r) + (i+1)(i+2)···(i+r) ] / α^{i+r+1}
           = (D/(2α^r)) Σ_{i=0}^{P} (i+r)! γ_i [ (-1)^r + 1 ]
           = 0 if r is odd;   (D/α^r) Σ_{i=0}^{P} (i+r)! γ_i if r is even.

Equation (1.7) differs from equation (6) in Missiakoulis (1983) and from equation (2.3) of Goldfeld and Quandt (1981) by a constant factor.

The Sargan c.d.f. can be evaluated analytically as follows. For x < 0,

(1.8)  F(x) = ∫_{-∞}^{x} (Dα/2) e^{αy} Σ_{i=0}^{P} γ_i α^i (-y)^i dy
            = (D/2) e^{αx} Σ_{i=0}^{P} γ_i i! Σ_{j=0}^{i} (-αx)^j / j!,

and for x > 0, by symmetry of the density,

F(x) = 1 - (D/2) e^{-αx} Σ_{i=0}^{P} γ_i i! Σ_{j=0}^{i} (αx)^j / j!.

That is,

(1.9)  F(x) = (D/2) e^{αx} Σ_{i=0}^{P} γ_i i! Σ_{j=0}^{i} (-αx)^j / j!       if x < 0,
       F(x) = 1 - (D/2) e^{-αx} Σ_{i=0}^{P} γ_i i! Σ_{j=0}^{i} (αx)^j / j!   if x > 0.

The special cases of (1.9) for the first-order and second-order Sargan densities are given in chapter 3 (equation 3.18) and chapter 4 (equation 4.13). The first-order and second-order Sargan densities and their moments are as follows.

First-Order Case

(1.10)  f(x) = (α/4) e^{-α|x|} (1 + α|x|)

(1.11)  μ = 0,   var(x) = 4/α²

(1.12)  M(θ) = (1/4) [ α/(α+θ) + α/(α-θ) + α²/(α+θ)² + α²/(α-θ)² ]

Second-Order Case

(1.13)  f(x) = [α/(4(1+γ_2))] e^{-α|x|} (1 + α|x| + γ_2 α² x²)

(1.14)  μ = 0,   var(x) = 4(1+3γ_2) / (α²(1+γ_2))

(1.15)  M(θ) = [1/(4(1+γ_2))] [ α/(α+θ) + α/(α-θ) + α²/(α+θ)² + α²/(α-θ)² + 2γ_2α³/(α+θ)³ + 2γ_2α³/(α-θ)³ ]

Equation (1.15) is different from equation (2.3) of Goldfeld and Quandt (1981), as mentioned before.
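Because the first-order and second-order special cases above are used throughout the rest of the thesis, a short numerical check of these formulas is useful. The sketch below (written here in Python; function names are illustrative only) evaluates the general density (1.2) and c.d.f. (1.9) and confirms the variance formulas (1.11) and (1.14) for two parameter settings used later: α = 2 for the first-order case, and α = √6 with γ_2 = 1/3 for a second-order case (both give variance one; see footnote 2 to this chapter).

```python
import numpy as np
from math import factorial
from scipy.stats import norm

def sargan_pdf(x, alpha, gammas):
    """Sargan density (1.2): f(x) = (D*alpha/2) e^{-alpha|x|} sum_j gamma_j (alpha|x|)^j,
    with gammas[0] = 1 and D = 1 / sum_j gamma_j * j!  (equation (1.3))."""
    D = 1.0 / sum(g * factorial(j) for j, g in enumerate(gammas))
    ax = alpha * np.abs(np.asarray(x, dtype=float))
    poly = sum(g * ax**j for j, g in enumerate(gammas))
    return 0.5 * D * alpha * np.exp(-ax) * poly

def sargan_cdf(x, alpha, gammas):
    """Sargan c.d.f. in the closed form (1.9)."""
    D = 1.0 / sum(g * factorial(j) for j, g in enumerate(gammas))
    x = np.asarray(x, dtype=float)
    ax = alpha * np.abs(x)
    s = sum(g * factorial(i) * sum(ax**j / factorial(j) for j in range(i + 1))
            for i, g in enumerate(gammas))
    tail = 0.5 * D * np.exp(-ax) * s
    return np.where(x < 0, tail, 1.0 - tail)

# Numerical moments on a fine grid: first-order (alpha = 2) and second-order
# (alpha = sqrt(6), gamma_2 = 1/3) should both have variance approximately 1.
grid = np.linspace(-12.0, 12.0, 240_001)
dx = grid[1] - grid[0]
f1 = sargan_pdf(grid, 2.0, [1.0, 1.0])
f2 = sargan_pdf(grid, np.sqrt(6.0), [1.0, 1.0, 1.0 / 3.0])
print("first-order variance :", np.sum(grid**2 * f1) * dx)
print("second-order variance:", np.sum(grid**2 * f2) * dx)

# Pointwise comparison of the second-order density with N(0,1), as in
# Goldfeld and Quandt's Table 1 style comparison.
for z in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"x = {z:3.1f}  Sargan: {sargan_pdf(z, np.sqrt(6.0), [1, 1, 1/3]):.4f}"
          f"  N(0,1): {norm.pdf(z):.4f}")
```

The same two functions can be reused for any order P simply by lengthening the list of γ coefficients, which is the sense in which the Sargan family generalizes the Laplace.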
1.3 Outline of the Dissertation

The purpose of this study is to investigate the adequacy of the Sargan distribution as an approximation to the normal in econometric models. The measure of adequacy is a simple one; namely, the asymptotic bias (inconsistency) which results from the use of an incorrect assumption about the error distribution. Obviously other measures of adequacy of the approximation would be possible, but this one is informative and has been used profitably elsewhere.

The models in which we will investigate this question are all versions of what Heckman (1976) calls sample selection models. Consider a hypothetical linear relationship

y_i* = β'x_i + ε_i,   i = 1,2,...,n,

which would satisfy the usual ideal conditions if y_i* were observed for all i. In the censored regression model, we observe

(1.16)  y_i = max(0, y_i*) = y_i* if y_i* ≥ 0, and y_i = 0 if y_i* < 0,   i = 1,...,n.

(This is the Tobit model if ε is normal.) Observations with y_i* ≥ 0 will be called "complete" observations, while those with y_i = 0 will be called "limit" observations. In the truncated regression model we observe y_i = y_i* if and only if y_i* ≥ 0. In other words, we observe the complete observations but nothing about the limit observations (not even their number). Finally, in the binary regression model we observe

(1.17)  y_i = 1 if y_i* > 0, and y_i = 0 if y_i* < 0,   i = 1,...,n,

so we essentially observe only the sign of y_i*. (This is the probit model if ε is normal.)

In chapter 2 we investigate the robustness of the MLE's which assume normality to Sargan errors, in these three models. This is an extension of Goldberger (1981) and Arabmazar and Schmidt (1982), which we address mostly for the sake of completeness. Our main interest is in the opposite question, the robustness of MLE's which assume a Sargan distribution to normal errors. We investigate this question in the same three models plus the linear regression model. This is done in chapter 3 for first-order Sargan distributions and in chapter 4 for second-order Sargan distributions. In chapter 5 we consider how to generalize the Sargan distribution to the multivariate case. We consider alternative ways of doing so, and define a particular bivariate distribution with Sargan marginal distributions as a bivariate Sargan distribution. We make some comparisons of its density to that of the bivariate normal, and investigate the robustness of the MLE's based on the bivariate Sargan to the bivariate normal in a simple seemingly unrelated regressions model. Finally, chapter 6 contains our conclusions.

Footnotes

1) Missiakoulis also displays the densities of some higher-order Sargan distributions. These approximate the normal less well than the second-order Sargan distribution. This is possible because of the way the parameters of the higher-order distributions were chosen. In particular, whereas the Sargan density of order P contains parameters (α, γ_1, ..., γ_P), Missiakoulis assumes the γ's to be chosen a priori (see his equation (9)), so that for any P there is only one unknown parameter (α). Thus his second-order case is not nested in his third-order case, and so forth. If we allowed the Pth-order case to have P free parameters, and chose these to maximize the quality of the approximation (however defined), obviously we could do no worse by increasing the order of the approximating distribution.
2) Using (1.7) to calculate α when γ_2 = 1/3, P = 2 (second-order Sargan), and variance = 1: μ_2 = variance = D(2 + 6γ_1 + 24γ_2)/α² = 1, which with D = 3/8 gives α = √6 = 2.44949. This is different from the value used by Missiakoulis (1983, p. 227). (Note that γ_1 = 1 is required to ensure the uniqueness of the results.)

Chapter 2

Robustness of Normal MLE's to Sargan Errors

2.1 Introduction

The normal distribution has been assumed to be the true distribution of the error terms in a variety of different models, although in many cases there has not been a very good reason to believe or substantiate this assumption. This popularity of the normal distribution stems partly from computational considerations and partly from the robustness of the estimates based on normality. In certain models, such as the linear regression model, the MLE's based on normality are unbiased, BLUE and consistent even if the distribution of the errors is non-normal; only for testing hypotheses in small samples is the normality assumption crucial.

The casual use of the normality assumption in different models requires more attention. In this chapter we look at this issue in different models with a specific and true non-normal distribution of the error terms, namely first-order Sargan, as defined in chapter 1. This attempt can be seen as an extension of Goldberger (1981) and Arabmazar and Schmidt (1982), who raised the question of the robustness of the normal MLE's to non-normality in the censored (Tobit) and truncated regression models. Arabmazar and Schmidt concluded that "the bias from non-normality can be substantial", especially if the variance of the errors is not known. However, they considered only a few distributions (Laplace, logistic and Student's t), so it is interesting to perform a similar analysis under the new assumption that the actual distribution of the errors is Sargan. As in their work, the measure of robustness to be used here is the asymptotic bias (inconsistency) of the estimates.

In the case of the linear regression model, the normal maximum likelihood estimate (which is OLS) is unbiased and consistent. Therefore, in terms of our analysis, there is no cost in assuming normality in the linear regression case even if in fact that is not so. Goldfeld and Quandt (1981) consider the case of linear regression with the residuals actually being distributed as first-order Sargan, but incorrectly assumed to be normal. One important result from their paper is the fact that efficiency is lost; the relative efficiency of OLS to the Sargan MLE is .84.

In this chapter, I will consider in turn the censored, truncated and binary (probit) dependent variable models. In each section, after defining the model and its estimators, I discuss the results of numerical calculations of the inconsistency of the estimated parameters. (These are tabulated at the end of this chapter.) The chapter ends with some concluding remarks. In order to make the argument tractable and also comparable to Goldberger (1980), Arabmazar and Schmidt (1982), and Missiakoulis (1983), we assume that there is only one regressor, a constant term, in all the models under consideration here.

2.2 The Censored Case (Tobit Model)

The model to be considered here is

(2.1)  y_i* = μ + u_i,   i = 1, ..., n,

where the u_i are i.i.d. with zero mean and variance σ².
By assumption, we only observe y_i = max(0, y_i*), not y_i*. We wish to estimate μ and σ², the mean and variance of y*. The only thing that remains to be specified is the true distribution of u_i. We ask the question of what happens if we assume the errors to be normally distributed when in fact they are distributed as first-order Sargan. In other words, is the Tobit estimator robust to non-normality?

In order to answer this question, we can use equation (10) of Arabmazar and Schmidt (1982), and calculate the asymptotic bias of the Tobit estimates of μ and σ². For this we need the first two truncated moments of the Sargan distribution (as given for the other distributions in their Table 1); these are

(2.2)  E(u | u > -x) = e^{-αx}(αx² + 3x + 3/α) / [4 - e^{-αx}(2 + αx)]   if x ≥ 0,
       E(u | u > -x) = (αx² - 3x + 3/α) / (2 - αx)                       if x < 0,

and

(2.3)  E(u² | u > -x) = [16/α² - e^{-αx}(αx³ + 4x² + 8x/α + 8/α²)] / [4 - e^{-αx}(2 + αx)]   if x ≥ 0,
       E(u² | u > -x) = (-αx³ + 4x² - 8x/α + 8/α²) / (2 - αx)                                 if x < 0.

The estimated biases of the Tobit estimates are given in Table 2.1, for a variety of values of μ, under the heading "censored". The true value of α is assumed to be two, which corresponds to σ² = 1. The results for the truncated version of the Tobit model and also for the binary case (probit), which we will discuss in the next sections of this chapter, are also tabulated in Table 2.1. This table is similar to Tables II-V of Arabmazar and Schmidt (1982). We consider both the unrealistic case in which the variance is assumed to be known (so that the estimated α is set equal to two), and the realistic case in which σ² is unknown, so that μ and σ must both be estimated.

We will consider first the case of known variance. The results and the conclusions drawn from them are basically the same as those of Arabmazar and Schmidt for the t10 distribution. The bias is sometimes substantial, but it is heavily dependent upon the degree of censoring. That is, as the sample becomes largely complete, the asymptotic bias goes quickly to zero. This agrees with the fact (as it must) that there is no asymptotic bias in the case of the linear regression model. For example, for samples that are at least half complete (μ = 0, and above), there is virtually no bias. For samples that are at least 1/4 complete (μ = -1.2 and above), the bias is not substantial in the censored model (though it is so in the truncated dependent variable case, as we will see in the next section).

Letting the variance of the dependent variable be estimated, which is the more realistic case, the results are somewhat different (more pessimistic). Although the asymptotic bias largely disappears as the sample becomes half or more complete, the biases are much larger in this case than in the unrealistic one in which the variance was known. That is, the cost of misspecification of the distribution of the error terms is higher when we do not know the variance.

It is worth noting that in both cases (variance known and unknown), the bias of the normal MLE - which allows for correction for censoring - is much less than the bias of the sample mean. Thus it is better (less costly) to correct the mean than not to correct, even though the correction is biased. This result does not totally agree with the results of Arabmazar and Schmidt (1982). In their study, when the variance was unknown, the uncorrected sample mean was sometimes less biased than the normal MLE.
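The formulas (2.2) and (2.3) follow from integrating the first-order Sargan density (1.10) over (-x, ∞). Before turning to the estimated variance, a quick numerical check of these two expressions against direct numerical integration is sketched below (in Python; the function names are illustrative only, and α = 2 as in Table 2.1).

```python
import numpy as np
from scipy.integrate import quad

ALPHA = 2.0  # first-order Sargan scale used in Table 2.1 (variance 4/alpha^2 = 1)

def f1(u, a=ALPHA):
    """First-order Sargan density, equation (1.10)."""
    return 0.25 * a * np.exp(-a * abs(u)) * (1.0 + a * abs(u))

def truncated_moment_numeric(x, r, a=ALPHA):
    """E(u^r | u > -x) by brute-force numerical integration."""
    num, _ = quad(lambda u: u**r * f1(u, a), -x, np.inf)
    den, _ = quad(lambda u: f1(u, a), -x, np.inf)
    return num / den

def truncated_mean_closed(x, a=ALPHA):
    """Closed form (2.2)."""
    if x >= 0:
        return np.exp(-a * x) * (a * x**2 + 3 * x + 3 / a) / (4 - np.exp(-a * x) * (2 + a * x))
    return (a * x**2 - 3 * x + 3 / a) / (2 - a * x)

def truncated_second_closed(x, a=ALPHA):
    """Closed form (2.3)."""
    if x >= 0:
        num = 16 / a**2 - np.exp(-a * x) * (a * x**3 + 4 * x**2 + 8 * x / a + 8 / a**2)
        return num / (4 - np.exp(-a * x) * (2 + a * x))
    return (-a * x**3 + 4 * x**2 - 8 * x / a + 8 / a**2) / (2 - a * x)

for x in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(f"x = {x:5.2f}   (2.2): {truncated_mean_closed(x):8.4f} vs "
          f"{truncated_moment_numeric(x, 1):8.4f}   (2.3): "
          f"{truncated_second_closed(x):8.4f} vs {truncated_moment_numeric(x, 2):8.4f}")
```

The closed forms and the numerical integrals agree to integration accuracy, which is the only property of (2.2)-(2.3) used in constructing Table 2.1.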
The same kind of conclusions can be derived for the bias of the estimated variance; that is, its bias depends upon the degree of censoring. As the sample becomes half or more complete there is virtually no bias at all.

2.3 The Truncated Dependent Variable Model

We now turn to the case in which we have no information concerning the unobserved observations (limit observations); even the number of limit observations is not known. The model is still as defined in (2.1); however, we observe y_i = y_i* if and only if y_i* is non-negative. Again, as was the case in the last section, we assume we have only one regressor, a constant term, meaning we estimate the mean of the observations only. The question again is: what is the statistical cost of assuming normality of the error terms, when they are in fact distributed as first-order Sargan?

The numerical results are tabulated in Table 2.1 under the heading "truncated". The entries are the solutions to equation (11) of Arabmazar and Schmidt, needing only the already derived truncated moments of u given above in (2.2) and (2.3). The absolute bias is slightly greater than they found for the t10 distribution, in both cases of known and unknown variance. It is considerably greater than in the censored case, although it still goes to zero (very slowly) as the degree of truncation goes to zero. In the unrealistic case of variance known, for example, for samples which are 75 percent complete (μ = 1.2 and above) almost no bias exists. But as the sample becomes less complete (degree of truncation rises) the bias becomes greater and greater. Going to the more realistic case of variance unknown, the bias becomes much larger, especially for heavily truncated samples. Contrary to the censored case, however, when the variance is not known the bias is much larger than the bias of the uncorrected sample mean. (That is, it does not pay to correct for truncation.)

The robustness of the normal MLE to non-normality has also been considered by Goldberger (1980), in the truncated case, under the assumption that the only regressor is a constant term, and that the disturbance variance is known. He assumes that the true distribution of errors is a non-normal symmetric distribution (Student's t, Laplace, logistic), and derives and calculates the asymptotic bias (inconsistency) of the maximum likelihood estimators which assume the distribution of the errors to be normal. His results show that although the bias is large for largely censored samples, it goes to zero as the sample becomes complete.

2.4 The Binary (Probit) Model

In this section we take up the binary dependent variable model, which can be seen as a special case of the Tobit model (see Goldberger (1980)). We only observe two different values for y_i (say, zero and one), according to the rule

(2.4)  y_i = 1 if y_i* > 0, and y_i = 0 if y_i* < 0,   i = 1, ..., n,

where the y_i* are defined in (2.1). Thus only the sign of the dependent variable is observable. The only thing that can be identified is the ratio of the mean and standard deviation (μ/σ), so that if we assume that the variance is known and equal to one (or use σ as a normalization factor), then the mean of the dependent variable (μ) can be estimated. The probability limit of the estimated mean (assuming σ² known and equal to one) is given by (12) of Arabmazar and Schmidt. The asymptotic bias is given in Table 2.1, under the heading "binary", for various values of μ.
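For the binary model the computation is especially simple: with only a constant regressor, the probit "MLE" of μ converges to the value at which the fitted normal probability of a complete observation equals the true probability under the first-order Sargan distribution. The sketch below computes that probability limit and the implied bias; it assumes this is the content of equation (12) of Arabmazar and Schmidt, which is not reproduced in the text, so it should be read as illustrative rather than as their formula.

```python
import numpy as np
from scipy.stats import norm

ALPHA = 2.0  # true first-order Sargan scale, so that sigma^2 = 4/alpha^2 = 1

def sargan1_cdf(x, a=ALPHA):
    """First-order Sargan c.d.f., i.e., (1.9) specialized to P = 1."""
    x = np.asarray(x, dtype=float)
    lower = 0.25 * np.exp(a * x) * (2.0 - a * x)          # x < 0
    upper = 1.0 - 0.25 * np.exp(-a * x) * (2.0 + a * x)   # x >= 0
    return np.where(x < 0, lower, upper)

def probit_plim(mu):
    """Value the normality-based probit estimate of mu converges to:
    Phi(mu_tilde) = P(y* > 0) = 1 - F_Sargan(-mu)."""
    p_complete = 1.0 - sargan1_cdf(-mu)
    return norm.ppf(p_complete)

for mu in np.arange(-2.8, 2.81, 0.4):
    plim = probit_plim(mu)
    print(f"mu = {mu:5.2f}   plim = {plim:7.3f}   asymptotic bias = {plim - mu:7.3f}")
```

As the text notes, the bias computed this way is zero at μ = 0 (where both distributions assign probability one-half to a complete observation) and grows symmetrically as μ moves away from zero in either direction.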
Unlike the censored and truncated cases, the sample does not become completely observed as μ gets positive and large. The bias is smallest when μ = 0, and increases (symmetrically) as μ moves in either direction from zero. The size of the bias is similar to that of the censored case with σ² known, for μ ≤ 0.

2.5 Conclusions

In this chapter we considered the inconsistency (asymptotic bias) of the normal maximum likelihood estimators for the censored, truncated and binary dependent variable models, where the true distribution of the error terms is first-order Sargan. In the case of linear regression with a fully observable dependent variable no bias exists. The normal MLE's (OLS) are consistent (they are in fact BLUE), even if normal is not the true distribution of the errors. In other words, the normal MLE's are robust to non-normality.

The results for the censored and truncated models are to some extent similar to each other and also similar to the results for the other distributions considered by Arabmazar and Schmidt (1982). First of all, the asymptotic bias disappears as μ gets large (i.e., as the sample becomes complete) in both the censored and truncated models, but not in the binary case. When σ is known, the bias for all values of μ is much smaller than when we consider the more realistic case of σ unknown. This is true especially in the truncated model with a large degree of truncation. Secondly, the bias of μ is generally less in the censored model than in the truncated model, implying that knowing the number of limit (unobserved) observations helps in getting a less biased estimate of the mean. This is so especially when μ is very small; that is, where the sample is largely censored. The final conclusion derived from these results is that the maximum likelihood estimator of μ (i.e., the estimator which corrects for censoring or truncation) is usually less biased than the sample mean, suggesting that it pays to correct for censoring or truncation, even if the true error distribution is not normal.

We have only considered a very special and simple case in which there is only one regressor - a constant term - so that we are estimating only the mean of the dependent variable. Thus it is not known how different the results would be in a more general case with more than one regressor. Certainly this would be an interesting extension of this research.

Table 2.1  Asymptotic Bias of Tobit When True Errors are First Order Sargan
[The numerical entries of Table 2.1 (censored, truncated and binary cases, for known and unknown variance, together with the bias of the sample mean) are not legible in the source and are not reproduced here.]
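Since the entries of Table 2.1 are not reproduced, it is worth noting that any censored-case entry can also be approximated without the analytic bias equations, by simulating a large sample of first-order Sargan errors and maximizing the normal (Tobit) likelihood directly. The sketch below does this for the realistic case in which σ is also estimated; the sampling scheme, sample size and starting values are illustrative choices and are not part of the original calculations.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
ALPHA = 2.0  # true first-order Sargan scale; variance = 4/alpha^2 = 1

def draw_sargan1(n, a=ALPHA):
    """Draw from the first-order Sargan density (1.10).  One convenient
    representation: an equal mixture of a symmetrized Exponential(a) and a
    symmetrized Gamma(2, a) variate."""
    shape = rng.choice([1, 2], size=n)           # Gamma shape 1 or 2, each w.p. 1/2
    magnitude = rng.gamma(shape, 1.0 / a)        # rate a  <=>  scale 1/a
    sign = rng.choice([-1.0, 1.0], size=n)
    return sign * magnitude

def neg_tobit_loglik(theta, y):
    """Normal (Tobit) log-likelihood for y = max(0, mu + u), constant term only."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    limit = (y <= 0)
    ll_limit = norm.logcdf(-mu / sigma) * limit.sum()
    ll_complete = np.sum(norm.logpdf((y[~limit] - mu) / sigma) - np.log(sigma))
    return -(ll_limit + ll_complete)

mu_true = -0.8
n = 200_000
y = np.maximum(0.0, mu_true + draw_sargan1(n))

res = minimize(neg_tobit_loglik, x0=np.array([0.0, 0.0]), args=(y,), method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"normal-MLE mu = {mu_hat:.3f} (true mu = {mu_true}),  sigma = {sigma_hat:.3f}")
print(f"approximate asymptotic bias of mu: {mu_hat - mu_true:.3f}")
```

With a sample this large the Monte Carlo estimate is close to the probability limit, so the same device provides a rough check on any of the censored-model bias calculations in this chapter.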
Chapter 3

Robustness of Sargan MLE's to Normal Errors: First Order Case

3.1 Introduction

One of the basic assumptions of the linear regression model, as well as many other models, is the normality of the error terms. But since in a large number of models (e.g., models with censored or limited dependent variables) the likelihood function contains the c.d.f. of the distribution of the error terms, the assumption of normality implies that the normal c.d.f. will appear in the likelihood function. The normal c.d.f. can only be evaluated numerically; the calculations would be simpler if a distribution were assumed whose c.d.f. can be expressed analytically. As we have seen in chapter 1, families of Sargan distributions are possible substitutes for the normal, as Goldfeld and Quandt (1981) have suggested.

In the previous chapter we asked the question of how costly it is to assume normality of the error terms, when they are in fact first-order Sargan. In the next two chapters we ask the reverse question, which we think is more interesting. That is, what happens if the true error distribution is normal, but first- or second-order Sargan is mistakenly assumed?1 In other words, how good an approximation to the normal distribution is a Sargan distribution? We will answer this question by providing some evidence on the relevant statistical questions, namely the effect on one's estimates or inferences of the use of such an approximation to the (hypothesized) true error distribution.

The statistical cost involved is very much dependent upon the nature of the model in question. The misspecification of the errors' distribution (e.g., an incorrect assumption of normality) does not cause bias or inconsistency of the parameter estimates in the linear regression model, but it does cause inconsistency in limited dependent variable models, such as the Tobit model. Missiakoulis (1983) has calculated the asymptotic bias (inconsistency) for the binary (probit) model. We consider three additional models: the linear regression model, the censored regression (Tobit) model, and the truncated regression model. Although these results cannot be generalized, because of the restricted set of models that has been used so far, our results serve to indicate at least to what extent the results are model dependent.

In the next three sections we will present results for three different models, but for the first-order Sargan distribution only. In the next chapter we ask the same question for the second-order Sargan distribution (for all three models). In each section, first the model and its estimators are introduced, and then the numerical calculations follow (tabulated in Table 3.1). Most of our conclusions are similar to the conclusions of the previous chapter. For example, the bias depends strongly on the degree of censoring or truncation, and is generally larger when σ² is unknown than when it is known. However, an interesting result is that it is typically less costly (in terms of asymptotic bias) to assume Sargan errors when the truth is normal than it is to assume normal errors when the truth is Sargan.

3.2 Linear Regression Model

The model considered in this section is the linear regression model,

(3.1)  y_i = Σ_{j=1}^{K} x_ij β_j + u_i,   or   y_i = x_i β + u_i,

where x_i represents a row vector of the regressors, with K elements, and β is a column vector of coefficients with K elements. The error terms u_i are i.i.d. with zero mean and are independent of the regressors.
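The estimator studied in this section maximizes the first-order Sargan log-likelihood given as equation (3.2) below. As a preview, the sketch that follows computes that "Sargan MLE" on a hypothetical simulated data set with normal errors and compares it with OLS; in line with the consistency result proved in this section, the two sets of coefficient estimates are close. The data-generating choices, starting values and function names are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def sargan1_negloglik(params, y, X):
    """Negative first-order Sargan log-likelihood (3.2):
    ln L = n ln(alpha) - n ln 4 - alpha * sum|e_i| + sum ln(1 + alpha|e_i|),
    where e_i = y_i - x_i'beta."""
    log_alpha, beta = params[0], params[1:]
    alpha = np.exp(log_alpha)                 # keeps alpha > 0 during optimization
    e = np.abs(y - X @ beta)
    n = y.size
    ll = n * np.log(alpha) - n * np.log(4.0) - alpha * e.sum() + np.log1p(alpha * e).sum()
    return -ll

# Hypothetical data: normal errors, a constant and one regressor.
rng = np.random.default_rng(1)
n, beta_true = 5000, np.array([1.0, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(size=n)

start = np.concatenate([[np.log(2.0)], np.zeros(2)])
res = minimize(sargan1_negloglik, start, args=(y, X), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-8})
print("Sargan 'MLE' of beta:", res.x[1:])
print("OLS estimate:        ", np.linalg.lstsq(X, y, rcond=None)[0])
```

The closeness of the two estimates on symmetric (here normal) errors is exactly the "no cost in the linear regression model" result established analytically below; the interesting differences arise only once censoring or truncation enters the likelihood.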
What remains to be discussed is the distribution of the errors. In section 2.2 we asked the question previously discussed by Goldfeld and Quandt (1981), namely, what happens if we assume normal errors when in fact the true distribution is first order Sargan. It is obvious that normal MLE will simply bqugg , and that OLS is BLUE, unbiased and consistent, under above conditions. However, Goldfeld and Quandt provided the non-trivial result that the efficiency of OLS relative to the Sargan maximum likelihood estimator is approximately 0.84 . In this section, the opposite question is addressed. Thus the errors are actually normal, but we treat them as first-order 26 Sargan. ( We will ask the same question with the second-order Sargan instead in the next chapter.) We therefore discuss the cost of using first-order Sargan as an approximation for the normal distribution, when the normal distribution is the correct one. The result, as we will show, is that there is no cost (no asymptotic bias) involved at all in assuming first-order Sargan when normal is the true error distribution. Our result indeed is much stronger than just stated, since it says that there is no bias if first-order Sargan is assumed mistakenly as long as the true distribution of the error terms is symmetric around zero. That is, the result hinges on the symmetry of the true distribution of the error terms only and not on its normality. Therefore, Sargan "MLE" is consistent provided that error terms are i.i.d with a symmetric distribution around zero.2 In order to show this result, we begin by forming the Sargan likelihood function. (3.2) inL = n in a - n in 4 - a fly1 - B'Xil + E in [1+aly1 - B'X1|]° ‘ A The Sargan MLE's ( a . B ) satisfy the following first order conditions: -A'x l 611 * l1’1 ‘3 1 (3.3) —--I‘ - §--§|yi-p'x1|+§ , . - a a 1+a|yi-B'X1| ~ 1 x13 binL ‘ “ 3.4 -—1—- - a X - a X . a 1 I ( ) ‘2' 13 ’ 13 1+a(y -e'x) i 1 + a X , 1. X11!- 0 — _ _, 7 l «(y1 8 X1) which are similar to the equations (3.2) in page 146 of Goldfeld and to summation over observations such that: Quandt. Here § refers yi Z B'Xi , and conversely for {- . Now we divide both (3.3) and (3.4) by n (sample size) and take probability limits. Defining Z and B to be the probability limits of a and B respectively, we get; i 2 lyi-B'Xil 1 ~ -p11m—£|y -B'Xl+p1.im ~ ~ n1 1 1 nildulyi-B'Xil (3.5) nzlr: n n ~ ~ - 1 (3.6) alem—E-plim-fi-E§Xij-aplim-leima-:£Xij n 1 X ~ + 1 -a1flfln-—;flhn-— A. ~ n u+ 1 1+dy1-B'Xi) 11 n +Zp11m-Eplim-fi-1-g ~ 1~ xij .- o. l-dy'B'X) 1 1 (Here n+ is the number of terms in § , and similary for n_ ). Deriving some of the probability limits in the above two But a closer look at these equations will equations is not feasible. indicate that to show consistency of B , we do not need to evaluate Rather we will simply show all of these probability limits explicitly. 28 that equations (3.5) and (3.6) are satisfied with Bj - Bj , for some 3. Starting with (3.5), we set Bj = fljfor all j = 1,2,...,K ; luil (3.7) . o - Plim— -u§|i|+Plimn§ all»; 1+a|u 111] 01' (3.8) —- PM +1: [—L‘lL-j- a 1"]ul If u is distributed normally with mean zero and variance oz, the first expected value in (3.8) is easy to evaluate ( a VF27_; ), but the second expected value poses a problem. However, as long as there exists a positive solution 3 to (3.8), its actual value is not of any importance. Next we define; ~ 1 1 1 (3-9) Xj = puma- i xij . plim-n:§_xij - plim—Exj , where the last two equations follow from the the fact that x is independent of u. 
With Bj = 6 j(for all j=l,2,...,K), equation (3.6) can be rewritten as n n_ 11+ (3.10) EPlim-i'i' -aP11m—x -‘EP11m-—Plim-— -—§——x n ‘j j n+ lfloui 13 ~ 11- l 1 +aPlim—Plim—2—x . 0 n r1—1~ fl - -uu1 ’ j‘l’ OOO’K 29 Also note that; n + ~, _ '~, , (3.11) Plim—n- - P (y1 > 5x1) P(ui>6 xi-B X1) n (3.12) Plim—n- - P(y1 O (3 13) plim n+ g; [1 ~ 1 xij] ml ~ jIn ] - P[——1:|u>01i., lion 3 and similary for the similar term involving n_, again because x and u are independent. Then (3.10) becomes; (3.14) E P(u > 0) 'i' -'&' P(u0) 1: [—l—lu > 0] 31'. J J 1% J +32 P(u 0 (3-21) 001.11) '- ecu (@2 " 11L) (1 < 0 zr-e‘“ <2-m) and H is defined as the ratio of the Sargan density function f to the Sargan c.d.f., with 70g 71= l (in order to insure the uniqueness of the 33 MLE in this case). GK:::? xx) (3.22) H(-x) - {egg-3- - menu-mt) K0 Hue-m0 m Also, ; refers to summation over observations such that both m conditions; y 2 0 , y S 2 hold, while ; refers to summation over A observations such that both y 2 O , y S u hold, and m+, m_ represnt the relevent numbers of observations. Now dividing by m and taking probability limits, with Z = Plim 3 , E = Plim a , we get: m m lY'JEI 1 ~ 1 1 (3.23) é-Pum; Z Iii-ulwnrn; .. a 1-1 1-1 1+aly 111] + Plim 1:; C(15) - o - .. m+ ur ~' m+ 1 1 3.24 —- A-am—M— ~ ~ < ) aIPUm m pm m m “Emmi-.1) +Epum§plimal§ ~1 ~ -p11ml:iH(-E) - o - Moi-u) The probability limits can be rewritten as: (3.25) é-Etlm‘ll ‘y>01+EI—,J:’-1‘;',—l y>01+§g§} G('E,I1')-o. Hair-Pl a 34 Errcyiily») - P0>1 -ZP(y>Ely>0) El ~ 1,, I 3%: >00] (3 .26) INN) + E P0> El ,, 1 ~ 1W) -PW .~ - m3“) ° |y01 The probability limits 5 , Z of the Sargan "MLE's" will satisfy these equations. To evaluate the inconsistency that results when the error terms are actually normal (rather than Sargan), we need to evaluate (3.23) and (3.24) for Y's normally distributed with mean u and variance 02. There is no problem in evaluating the required probabilities, nor in calculating E [ IY - 5 II Y 2 0 ] analytically (see appendix A). However, the other three expected values in (3.23) and (3.24) were intractable and had to be calculated by Monte Carlo methods (i.e., simulated). The number of drawings (of y) which we used ranged from 10,000 to 130,000 depending upon the likelihood of the conditioning event - for example, if u = -l ; 3 =3 it will take a lot of drawings to observe y 2 H enough times to calculate the conditional mean accurately. We also calculated some of the more difficult expected values by numerical integration, and got essentially the same results, The equations (3.23) and (3.24) were then solved using the Gauss-Newton method.3 Our results are given in Table 3.1, under the heading ”censored". (The results under ”truncated" will be discussed in the 35 next section.) We first look at the case where the £539 a is known (that is, the Egg; a is taken to equal twq,'which corresponds to a2 = l.) and then consider the case where the variance is not known, where the latter case is presumably the realistic one. The results are qualitatively similar to those of Table 2.1, for the opposite case (Sargan is true but normal is assumed), eSpecially in the rapid disappearance of the asymptotic bias as the degree of censoring falls. That is, the bias is heavily dependent on how complete the samples are. For samples that are at least half complete, there is virtually no bias, and for the samples that are at least 1/4 complete ( u = -l.2 and above), the asymptotic bias is small. 
(However, as we will see in the next section, this is not so in the truncated case, in which the bias is generally larger than in the censored case.) Now we relax the assumption that the variance is known, to see how much of a difference it makes whether the variance is known. The results, as shown in Table 3.1, clearly show that knowing the variance does matter. The bias generally is larger when we estimate the variance of the dependent variable ( 02), than when we assume it.' That is, the results for the case in which the variance is unknown are more pessimistic than in the case of known varaince. Indeed, the difference this makes is substantial, at least for heavily censored samples; the absolute bias is typically two or more times higher when the variance is unknown than when it is known, for samples less than 1/4 complete. Another interesting result is that the bias is typically smaller in Table 3.1 than_in Table 2.1. It is less costly (in terms of absolute asymptotic bias) to assume Sargan when the truth is normal 36 than it is to assume normal when the truth is Sargan. Why this should be so is not obvious. (Note that it is not so when the variance is assumed known.) In the next chapter, we ask the same question again, for the second-order Sargan distribution. That is, we will measure the statistical cost of using second order Sargan as an approximation to normal. Not surprisingly, there is less asymptotic bias involved when we use second-order Sargan than first-order. This may indicate that the second order Sargan distribution is a more adequate approximation to the normal in the econometric models. However, we have selected quite a limited number of models to investigate the cost, so the generality of this conclusion is perhaps questionable. 3.4 The Truncated 2gpendent Variable Model In this section we discuss the truncated version of the Tobit model of the last section. That is, the model is given by (3.16). However, now we observe yi= Y1. if and only if yi* Z 0. Nothing else is observed, not even the number of unobserved observations (limit observations). The results calculated in this section are in many ways similar to those for the censored dependent variable model of the last section. The bias is strongly dependent upon the degree of truncation, going to zero as sample becomes complete, and it is also larger when the error variance is unknown. However, the bias is generally larger in the truncated case than in the censored case. We wish to evaluate the asymptotic bias due to assuming the 37 errors to be first-order Sargan when they are in fact normal. We start with the Sargan likelihood function: 111 m (3.27) m. =- -min4+mxna-a X lyi-uI-i- Z h1[1+a|y1-ul] 1-1 1-1 -m1.nF(u) . As in the previous section, we first find the first order conditions: In “ m ly 'l-‘l A A (3'28) 6121‘ - g' 2 lyi'lll+ {-4—1--mR(a,u) I 0 0a a i=1 i-l 1+a|y1-ul (3.29) “1‘- a (m+-m_>-a ——.—-=-+a_§ . . bu 1+a(y1-u) 1"“(3’1'10 - In V (n) ' 0 where, r films)?!" pi>0 41-m(2+au) (3.30) R(a,u) I 4 1 {MT-Ea “<0 and “Hi aflmkm “)0 (3.31) vcm - fl“) - H ”e PM aCl ‘fl‘l u<0 and other notation is as used in the previous section. Then, dividing by m (sample size) and taking probability limits yields the following equations, which the probability limits 38 3 , 3 will satisfy: (3.32) é-EIIy—El |y>01+Et—JDiL-l y>01 «(5.11) - o a burly-pl (3.33) EPEly>0> - T£P0> - E'P'5ly>0) Bil-71:,— ‘ y>ILy>01 Haw-u) + EP0) E[—— I)! 
may») Imam - WE) - 0 Again, since the normal c.d.f will appear in the expected values in equations (3.32) and (3.33), as in the last section, we are not able to calculate them analytically (except the first one). Therefore, we solve (3.32) and (3.33) numerically for E and H , with the same simulation procedure used in the previous section used again to evaluate the necessary expected values. Our results are given in Table 3.1, under the heading "truncated". We will discuss both the case of known a and the case of unknown a . Generally speaking, in both cases the bias is much larger than in the corresponding censored model. In the truncated model, the asymptotic, bias goes to zero (in both cases) as samples become about 75% complete, compare this to the censored model in which the bias is virtually zero when samples are only half complete. As in the censored model, the results are heavily dependent upon the degree of truncation. For low degrees of truncation, the bias is very small, while at higher degrees of truncation the bias gets very large. Basically the same kind of conclusions can be drawn for the asymptotic bias of a , for the unknown variance case, of course. That is, the bias is larger in the 39 truncated model than in the censored model, and in both cases, the bias of a depends on the degree of truncation (or censoring). Consider first the (realistic) case in which a is treated as unknown. The absolute bias is clearly greater than for the corresponding censored case, as just noted. However, what is most striking is how much smaller the absolute bias is than in the opposite case (Table 2.1), in which we assumed that first-order Sargan was the true distribution of the error terms, but by mistake we used normal instead. The same kind of results were also found in the censored model, though not so strongly. It is less costly (in the truncated model, far less costly) to assume Sargan when the truth is normal than it is to assume normal when the truth is Sargan. Nevertheless, the bias is still substantial except for samples that are substantially untruncated. Next we consider the (unrealistic) case in which a is assumed to be known. In the present case (normal errors with zero mean and variance 1) this means we set E = 2, and then solve (3.33) for 3 . This presents some computational problems, since for any fixed 3 , the left hand side of (3.33) approaches zero as Z approaches - a. In other words, 3 = - a is always a solution. This is not a problem in and of itself as long as another solution exists. However, for 3 S -0.8, apparently no other solution existed. (At least we could not find one, despite our best efforts.) The corresponding entries in Table 3.1 are marked "not available". Those biases which are available ( u 2 -.8) are reasonable, however. 40 3.5 Conclusions In this chapter, we attempt to measure the statistical cost of the use of the first-order Sargan distribution as an approximation to the normal, in econometric models. Since the motivation for the Sargan distribution is really computational ease, it seems reasonable to focus on the case in which the errors are actually normal, but the Sargan "MLE” is used instead. Our measure of the statistical cost of such an approximation is the inconsistency (asymptotic bias) which results. An obvious feature of this ”cost” is that it is model dependent. No asymptotic bias results in the linear regression model. 
As we saw in section 3.2, the unbiaseness of the coefficients hinges only on the symmetry of the distribution of the errors and not on its normality. In the censored regression (Tobit) model, the bias can be substantial, but it is reasonably small for samples that are at least 50% complete. In the truncated regression model, the bias is even larger, and seems large enough to be bothersome except in samples that are substantially (say, 75%) complete. Also the results indicate that knowing the error variance helps, in both the censored and truncated models. The bias is much smaller when we know a than when a is estimated. Finally, comparing these results to those of chapter 2 tells us that the Sargan distribution is a better approximation for the normal than the normal is for the Sargan. That is, the bias is much smaller when the world is normal but we approximate it by Sargan, than when the world is Sargan, but we treat it as normal. 41 Obviously other models could be considered, (e.g., the disequilibrium model, which Goldfeld and Quandt (1981) used to introduce these distributions), but the fact that the adequacy of the approximation is so clearly model dependent is sufficient to argue for caution in the use of these distributions. At least this is so if they are really viewed as approximations to the normal; it is possible to argue that there is no more reason to assume normal errors than to assume Sargan errors. In this chapter we only considered first-order Sargan distribution, since higher order Sargan distributions contain more parameters than the normal. It is the case that a higher order Sargan distribution (e.g., second-order) would perform more adequately, as we will see in the following chapter. 42 Footnotes 1) This is perhaps a strange case to consider, since Sargan errors are unlikely to be assumed in a regression context, being less convenient than normal errors in this case. However, the regression case serves as a useful standard of comparison for the censored and truncated cases to be considered later. 2) Subject to the qualification that certain moments need to exist, as will be made explicit below. 3) Note that this is an expensive undertaking because 3 and 3 change at each iteration, thus requiring fresh calculation of the expected values just discussed at each iteration. Table 3.1 AsymptotIc Blas of Sargon “MLE” When True Errors are N(0,1) * um known) pm unknown) a Ji. P(z >0) Censored Truncated Cbnsored Truncated Censored Truncated -2.8 .003 -O.58 * 0.97 2.81 1.70 2.97 -2.4 .008 -0.32 ' 0.77 2.30 1.32 2.18 -2.0 .023 -O.12 ‘ 0.59 1.97 1.00 1.83 -1.6 .055 0.01 ' 0.46 1.48 0.78 1.28 -1.2 .115 0.09 ’ 0.33 1.15 0.53 0.98 -0.8 .212 0.10 -0.87 0.20 0.86 0.30 0.70 -0.4 .345 0.08 0.19 0.09 0.58 0.08 0.44 0 .500 0.03 0.23 0.02 0.34 -0.08 0.20 0.4 .655 0.00 0.16 0.00 0.17 -0.15 0.04 0.8 .788 0.00 0.08 0.00 0.07 -0.16 -0.05 1.2 .885 0.00 0.03 0.00 0.00 -0.14 -0.14 1.6 .945 0.00 0.01 0.00 -0.02 -0.12 -0.16 2.0 .977 0.00 -0.01 0.00 -0.03 -0.11 -0.18 2.4 .992 0.00 -0.01 0.00 -0.02 -0.11 -0.18 2.8 .997 0.00 -0.01 0.00 -0.01 -0.10 -0.15 5 Not avallable. See text for explanatlon. Appendix A; ~ 1. if E < 0 E {Iy4EI|y>0} - I; 0> dy - I; y£(yly>0) dy-T; I; f°)dY - E 0) 4E Poly>0> E {IV-3190} ' u + o ' m (g) - 3 (Ad) (3) std. normal density a std. normal cdf 0%) it . where m(d) 2. 
ifE>o E {Iy4fil[y>0} .- I; Iy-El f(yly>0> dy Ig'cfity) f(yly>0> dy + 13 (y-z) f(y|y>0)dy E 13 f(yly>0) dy - 13 y:(y|y>0>dy + IE yf()’|¥>0)dy '31: E “3‘50”” (A'z) I; 1 11' mm <") o '°-a Io f0> - 5?;553-I0 f(y>dy - -s?;§5§s- - ¢(M) O @OE:E) [a l r» 0 ~ f )0 - m A. f d I 44 45 N P 0< <") I; ~ I8 yf(Y|>'>0)dy ' w Io yf(>'l0<>'0)dY ' 3%3-1515'1110] Md?) «33—5 (M) ' T [11+O--—:—l “0’ «in J" 11 _“' - .E 2¢(B;£>-¢<-5)} + a “(9.03) 4(a) (11-7) mg) «13) E{ly-El y>0} = +a 7201-13 X1) 48 (Here, as before, 3 = Plim a , and similarly for the other parameters.) we show that fl In order to show the consistency of B , (4.5)-(4.7) are satisfied for B 8 0. Setting 3 = 6, we obtain; (4.3) (4.9) —-—"I +32 1: u 2} . o 1+?2 1+E|u|+3272u 1+2'Ey"2u 1+3u+§272u2 lu>0} '1? (4.10) Ep(u>0)1<' - Er(u<0)3i - Ep(u>0)z { ~ 1-2;;2u ~ +aP(u<0) E { ~ ~2~ 2|u0} X ' E N ”2” 2 1 1+au+a272u (4.11) Plim %— 1-2;;éu |u>o} 1‘6- E { ~ ~2~ - E 'U0} + 1: { | y>0} a 1+aly-ul+a 12(y-u) p(z<0) ~ ~ ~ _ + P(y>0) W (G,Y2,IJ) 0 (4 13) .Zl_.+ E I 32(y4;)2 I >0} +-££ZSQZ.5(Z " ") . 0 ° ~ 2 y P(y>0) 972:” ~ ~ ~ ~2~ 1+12 1+aIy-ul+a 72(y-u) 51 (4.19) EPT£ly>O> - bodily») 1+23?2(y-E) - 3P(y>fily>0) E { N N, ~2~ ~ I y>0. y’fi} 1+a(y-u)+a 72(y-u) .. ~ 1-2‘572(y-E) ~ + aP(y0) E I ~ ~ ~2~ ~ I y>0, y0) T (asYZsp) 0 Note that Plim.EE- - P(y>E I y>0) m- N Plim.-;- - P(y

0) n-m . P(y<0) Plim _m P(y>0) -u(1+au+126 u ) ~ 2+12+au+12(1+au) m-‘b/ ~ N ~ N a w<¢sstF> ' "—'—~'_E‘ 'I F(-u) - -ue“”(1-au+vza :4 ) ~ m p.<0 I (1+12)-e [2+12-au+72(1-au) ] (4.20) 52 V -3<1+?"u+&'2?232) ~ .. ,..., .. m 2 .90 ~ 2+12+au+12(1+au) ~ ~ ~ 0F(-p.)/ 311 I T(a.72.u) - ~ . 4 F(-u) ~ 35 - ~ ~2~2 (4.21) we “wa p > ~<0 u L4<142>-e“"t2+i.-awzu-m21 ' finial) ~ .. .. ,..., ~ m 2 I90 tau-Eva; <1W2>12n2+au+12<1+au> 1 8(3; E) ' 2 1'4 - 2 ' EH” agap(1';;)’ (1+?) .. (4.22) m 2 2 “<0 I4(1+?2)-e““[2+5y'2-334;2(1-;;) ] It is easy to see that evaluating the probabilities is not hard, nor is calculating E { I y- 71 I I y 2 0 } analytically, but the other expected values are not easy to calculate. Thus we have simulated 10,000 to 130,000 observations (depending upon the degree of censoring) from a normal distribution with mean zero and variance one, i.e., equations (4.17)-(4.19) are solved numerically. The results are shown in Table 4.1, under the heading ”censored? There aretwo sets of results for the case of ”variance known", with different values for a and 72 . The column labelled censored (I) is based on the values of a and 12 which correspond to variance equal to l and the fourth cumulant equal to zero, namely a = 3.07638,and ‘YZ = 2.15470, and the column labelled censored (2) is based on values of a and 72 which correspond to 53 variance equal to l and the value of the density function (second order Sargan) equal to .3989 at x= 0, namely a=-2.69500,Yé = 0.68884. Missiakoulis(1983) in his study used 72: 1/3 and a = 2.12132 which as we have shown in chapter 1 would not result in 02: 1. He should have used a = 2.44949 instead. In either case with the variance unknown, the biases are very small. In fact, except for large negative u , there is almost no bias. In the case of " unknown variance", we let both a and 72 vary and be defined within the model. Here the bias is larger than in the known-variance case, but it is still not very large for samples that are at least half complete. Also the results in both cases ( a known and 0 unknown ) are better than the results shown in Table 3.1, for the first order Sargan case. This makes the second order Sargan more attractive than the first order, although of course the bias should not be taken as the only means of selecting an alternative for normal c.d.f. 4.4 The Truncated Regression Model Finally we discuss very briefly the truncated dependent variable case. Here we have the following, log of likelihood function; m (4.23) int = m1na-m1n4-mln(l+72)-a 2 Iyi-uI 1-1 m 2 2 + 2 in [1+aIy1 - uI + a 12(yi-u) ] - m in F (u) i-l We take derivatives of (4.23) with respect to a ,1} , u, set them equal 54 to zero, divide by m and then take probability limits: ~ NV ~2 ly-uI+2a72(y-u) 1 (4.24) -- E {Iy-uI y>0} + E { I y>0} a 1+aIy-pI4-a 12(y-u) - A(;s;293) . 0 ~ 2 -1 ~2 (y-u) (4.25) ~ + a E { ~ ~, ~2~ ~ 2 I y>OI 1+12 1+aly-ul+¢ 72(y-u) - C(Es;zs:) ' 0 (4.26) EPEIy>O> - EP0> A.» ~ 2 ~ ~ 1+2¢72(y-u) ~ -aP(y>uly>0) E { ~ ~ ~2~ ~ 2l y>u.y>0} 1+a(y-u)+a 72(y-u) .1. ~ 2 ~ ~ 1-2a72(y-u) ~ + aP(y0) E { ~ ~ ~2~ ~ 2I y0} 1-a(y-u)+a 72(y-u) - B (3232.3) where (4.27) A(E.?2.E) -I .. a....aQ~Q .2” u<1+am2a u )2 “ ~ u>0 4<1€2>-e'““12+§'2+'&'5+?2<1+’&"4>21 u(1-au+vza u ) ~ .3... a..2 "<0 L Zfizrawzfl-au) 55 I' a. Ee’““<1+‘&'i+§'23232) ~ .1 100 ~~ ~ I 4(1+72)-e [ZflztaufiZUWMJ (4.28) Mmzm) -I 3(1-3'E+E'2?232> ~ .~ ~.,.’ 'v~ 2 u<0 I 2fi2-3W2(1'au) V GIN-m «I» ~ -aue ”(Haw/am) ~ ... 
u>0 4<1+9'Z>-e'“"12+72+&”u+?2(1+3‘5)21 (4.29) «352511) - 1 imam/(142) ~ ~ - ~ ..,3' u<0 I 2+12-au+72(1-au) As in the censored case we have calculated the necessary expected values by simulation, based on 10,000 to 130,000 replications. The results are shown in Table 4.1, and labelled truncated . Two sets of numbers are generated for the case of known variance. As expected, the absolute biases are generally larger in the truncated Iregression model than in the censored regression model, and they are generally larger when the variance is unknown than when it is known. They are generally smaller than in the first-order Sargan case, sometimes substantially. So, at least from the standpoint of bias, it is safer to assume second-order Sargan than first-order Sargan. 56 Footnote 1) Incidently, we consider the second order Sargan distribution to have two free parameters ( a and 72 ). This differs from Missiakoulis (1983) who constrains 72 I=1/3. Table 4.1 Asyuptotlc 81as of Second Order Sargon 'NLE' when True Errors are N(0,1) u (Variance Known) p (Varlanoe Unknown) 2 Censored (1) Censored (2) Truncated (1) Truncated (2) Censored Truncated -2.8 -.07 -.29 ' * .66 2.37 -2.4 .01 -.14 -8.56 ' .45 2.04 -2.0 .05 -.04 -0.91 -12.57 .27 1.72 -1.6 .05 .02 -.08 -2.17 .15 1.20 -1 .2 .03 .04 .21 -.47 .03 .87 -.8 -.01 .04 .23 .01 -.07 .64 -.4 -.04 .02 .09 .09 -.12 .40 0.0 -.06 .02 -.09 .06 .01 .16 .4 .03 .01 -.11 .06 .01 .13 .8 .01 0.0 -.02 .05 0.0 0.7 1.2 0.0 0.0 .03 .02 0.0 .02 1.6 .01 0.0 .03 .01 0.0 .01 2.0 0.0 0.0 .01 0.0 0.0 -.01 2.4 0.0 0.0 0.0 -0.1 0.0 -0.1 2.8 0.0 0.0 -.01 -.01 0.0 -.01 Chapter 5 Bivariate Sargan Distribution 5.1 Introduction So far we have considered only univariate Sargan distributions. In this chapter we extend this study to the bivariate case, in a way which could be generalized to the general n-variate case. If we view sargan distributions as easily computable approximations to the normal, the logic for considering multivariate Sargan distributions is in fact much stronger than for univariate Sargan distributions, since the multivariate normal c.d.f is so much harder to calculate than the univariate normal c.d.f. It is not immediately clear what a bivariate Sargan distribution is. Therefore, I begin by exploring different ways of defining such a bivariate density function, and some of their advantages as well as their shortcomings. Then we will approximate the bivariate density in a specific form. This is done on the basis of Stone's theorem (1962, p. 74), which shows that any function f(x) with certain characteristics can be uniformly approximated by functions of the form e.“ x P(x), P(x) being a polynomial. We will also make some comparisons between the bivariate Sargan densities and their normal counterparts. Finally, we will look at the robustness of the estimators assuming such a bivariate density 58 59 for different models. The plan of this chapter is as follows. In section 5.2 we discuss alternative possible definitions of a bivariate Sargan distribution, and choose one such definition. In section 5.3 we compare the bivariate Sargan density to the bivariate normal density (for various levels of correlation) to see how good an approximation we have. In section 5.4 we provide some evidence on the statistical relevance of the use of such an approximation to the hypothesized (true) error distribution function. 
For example, as in the univariate case, a mistakenly assumed first-order Sargan distribution does not cause any asymptotic bias in the (seemingly unrelated) regression model, so long as the true distribution of the errors is symmetric. (A similarly strong result has already been shown for the univariate case.) Finally, in section 5.5 we conclude that the bivariate Sargan distribution, as we have defined it, is an adequate and reasonable approximation to the bivariate normal distribution, at least when the correlation coefficient is not too large.

5.2 Definitions

There are several possible ways that one can proceed in order to construct a bivariate density (in our case a bivariate Sargan density). We will investigate three ways of doing so, although not in great detail, and discuss their advantages as well as their problems.

i) Linear Transformation Approach

Let us assume that X1, X2 are distributed independently according to the (first-order) Sargan with mean zero and variance one, with parameters α1 and α2. Their joint density function is then

(5.1)  f(x1, x2) = (α1α2/16) exp(−α1|x1| − α2|x2|) [1 + α1|x1|][1 + α2|x2|]

Now we define two other variables Y1, Y2 as linear functions of the X's:

(5.2)  Y1 = a1X1 + a2X2
       Y2 = b1X1 + b2X2

or simply

(5.3)  Y = AX,    A = [ a1  a2 ; b1  b2 ]

The joint density function of Y then becomes

(5.4)  g(y1, y2) = |Δ|⁻¹ f(A⁻¹y)
                 = (α1α2/16) |Δ|⁻¹ exp{ −(α1/|Δ|)|b2y1 − a2y2| − (α2/|Δ|)|a1y2 − b1y1| }
                   × [1 + (α1/|Δ|)|b2y1 − a2y2|][1 + (α2/|Δ|)|a1y2 − b1y1|]

with

(5.5)  Cov(Y) = Σ = AA′,    Δ = a1b2 − a2b1

This construction follows by analogy to the relationship between the bivariate and univariate normal, since the bivariate (or multivariate) normal can be defined as arising from a linear transformation of independent univariate normals. We can accommodate any covariance matrix Σ by appropriate choice of A, and we could also allow for an arbitrary (non-zero) mean vector for Y. However, we dismiss this as a reasonable definition of the bivariate Sargan because its marginals are not univariate Sargan.

ii) "Translation" Approach

This is a general way of constructing a joint density with specified marginals, in this case Sargan marginals. We start by assuming that X1, X2 have a bivariate normal density with correlation coefficient ρ. Let d represent this density function. Now define

(5.6)  Y1 = F⁻¹(Φ(X1))
       Y2 = F⁻¹(Φ(X2))

where F is a univariate Sargan c.d.f. and Φ is the standard normal c.d.f. Then the marginal distributions of Y1 and Y2 are Sargan, by construction. However, the joint density of Y1 and Y2 is messy:

(5.7)  g(y1, y2) = d( Φ⁻¹(F(y1)), Φ⁻¹(F(y2)) ) f(y1) f(y2) / [ φ(Φ⁻¹(F(y1))) φ(Φ⁻¹(F(y2))) ]

where f and φ are the Sargan and standard normal densities. Thus there is no conceivable computational advantage to using such a bivariate Sargan density.

iii) Approximation Methods

Here we define the bivariate Sargan density as an approximation. To do this we make use of a theorem due to Stone (1962, p. 74), that "any continuous real function f, which is defined on the interval 0 ≤ x < ∞ and vanishes at infinity in the sense that lim f(x) = 0 as x → ∞, can be uniformly approximated by functions of the form e^(-x) P(x), where P(x) is a polynomial". Thus we start by assuming the density f(x1, x2) to be a continuous function defined on the intervals x_i ∈ [0, ∞), such that

(5.8)  lim f(x1, x2) = 0 as x1 → ∞, for all x2
       lim f(x1, x2) = 0 as x2 → ∞, for all x1

Following the argument made by Stone (1962, p. 75), we define two new variables ξ1, ξ2 as follows:

(5.9)  ξ_i = exp(−d_i x_i),    i = 1, 2

for arbitrary d1 > 0, d2 > 0. Therefore x_i = −ln ξ_i / d_i, i = 1, 2. Notice that ξ_i ∈ (0, 1], and ξ_i goes to zero as x_i goes to infinity.
Now we define the function e on [0,l]x[0,l], such that ¢ ( 0 , £2)= 0 V £26 (0.1] (5.10) ¢ ( £1: 0): 0 v £15 (011] ¢ (0 , 0)= 0 Then 6 is a continuous function over [0,l]x[0,l], and therefore, it can be uniformly approximated by a polynomial in £1,’£2 (p. 69 of Stone). 35: C00 + C1051 + c0152 (5 11) + c g 2 + 6 a g + c g 2 ° 20 1 11 1 2 02 2 '1' 000 N N-l N + cN051 +CN-l,1§l £2 + "' + c01452 or simply B 5 1 2 03152 51 F'2 0<81+82 K K1 K2 . a I ——-—-——- 3 . —— - — 1 aZKO+K2 0 a1 a2 Similarily, by symmetry of the argument -a Ix I 2A 2 2 (5.23) 3(x2) - --7 (a1K0+K1+a1K2Ix2I) e “1 Again in order for g(x2) to be a Sargan density requires; a K K (5-24) 0.2 - E‘Tlc'é‘x" , which implies K0 - 23-4 1 o 1 2 . “1 (5.23) and (5.24) together imply that (5.25) Kl/ “1= K2/ “2 That is, to have Sargan marginal densities, the above equalities should hold. Imposing the second requirement as (5.26) Kl/ al= Kz/ a2= 9 or K1= Gal ,K2= Oaz 70 9 being some parameter, then A = alaz/ 8 0 . Substituting back the values of the parameters into the bivariate density function and its marginals we get very simple results “1&2 e-aIIxII-azIsz (5.27) f(x1,x2) . -§§- [6(a1Ix1I+a2Ix2I)+K3x1x2] a -a Ix I 1 1 1 3(x1) - -z- e (1+a1Ix1I) (5.28) I a -a Ix 2 2 2 3(x2) . —z- e (1+a2Ix2I) The principal advantage of (5.27) over (5.14) is that it does not imply zero covariance. In appendix A it is shown that cov(x1, x2) depends on the values of the parameters, in particular K3. Explicitly, 2K3 ‘ E X ’x ) s (5.29) cov (X1.X2) ( 1 2 a 2a 26 1 2 _ 2 _ 2 (5.30) _ 2 _ 2 E( X2)- 0 E(X2 ) - 4/ a2 There is still one problem remaining to be resolved, namely that f(xl, x2) in (5.27) need not to be non-negative for all values of 71 x1, x2 and the parameters. We will solve this problem by including another term which involves the cross-product of absolute values of x1 and x2, and therefore rewrite the joint density as follows: e-¢1Ix1I-azlsz f(x1,x = B 0 2) {1 0 not even p I 1/2 can be attained. This seems to be the main shortcoming of the bivariate Sargan distribution as defined in (5.36), and it casts some doubt on its ability to approximate well distributions with large positive correlation. In appendix B we also derive the moment generating 74 function, which implies the following moments. [n!m!+(n+1)!(m+1)!] xoi+ [(n+1)! m! + n! (m+1)!] e n m 2:11 (12 (ROW) if n,m both even ”um I 0 if one is odd and the other one is even (n+1)! (n+1)! K3 2a1n+1 a2“+1 (K0+9) (5 40) if n;m both are odd Also the cumulative distribution function is equal to: (12 e 1 1 I -e-a222[c -a C 2 +a C 2 -C 2 z ] + 4C -2a C 2 8C4 6 1 7 l 2 7 2 5 l 2 4 1 4 1 22(0, 21>0 or 21(0, 22>0 ealzl+a222 3C4 {Cl-alczzl-a2c222+cszlzz} zl,zz K K 2 2 5 6 (5.52) 5 " K5 2 ' K6 2 . 2 I. “1 72 “2 52 “1 v4 “2 v5 2x3 (5.53) cov(x1,x2) - E(X1.X2) ' “‘2"2 6a1 a2 x3 / (1+12><1+§2) (5.54) p ' 53-11723 " ”(1+372)(1+3§2) 78 The main problem with the second-order bivariate Sargan distribution is that it contains a large number of parameters. A bivariate normal with zero mean contains 3 parameters. The first-order bivariate Sargan (5.36) contains 4 parameters. The second-order bivariate Sargan contains 10 parameters (a1, a2, 5, K0, K3, v1, v2, v3, v4, v5). This may be more than is reasonable. The alternative second-order bivariate Sargan density that was considered is the following: .’“1|"1I"“2I"2l f(x1,x2) I A e [KO+K1leI+K2Ix2I+K3x1x2+K4Ix1IIx2I 2 2 2 ~ 2 + stl +K6x2 +K7x1 x2+K8x1 Isz 2 2 2 2 (5'55) + K9x1x2 +K1on1Ix2 +K11x1 x2 I with the following assumptions; 7: M x < K9 _ K10. 
Here again A has to be chosen so that f(x1, x2) integrates to one (equation (5.56)); the required normalization involves

(5.57)  D = a2²(a1²K0 + a1K1 + 2K5) + a2(a1²K2 + a1K4 + 2K8) + 2(a1²K6 + a1K10 + K11)

The problem with this joint density function is that it does not generate second-order Sargan marginals. Rather, the marginals g1(x1) and g2(x2) given in (5.58) and (5.59) are exponentials times quadratics in |x_i| whose coefficients are combinations of the K's and differ with the sign of x_i, so that in general they are not second-order Sargan densities. Also, this density contains even more parameters than (5.47). For both reasons we will not consider it further.

5.3 Density Comparisons

In this section we provide tables of first-order bivariate Sargan and bivariate normal densities. In order to make meaningful comparisons we have imposed some restrictions on both densities. These restrictions are:

(i) means equal to zero, variances equal to one;
(ii) the same correlation coefficient, ρ;
(iii) the same density at x1 = x2 = 0.

These restrictions are sufficient to determine uniquely the bivariate Sargan distribution which we will compare to the bivariate normal. The reason is that, although the Sargan density (5.36) appears to depend on five parameters (a1, a2, K0, K3, θ), it can in fact be written in terms of only four parameters. Let ρ be defined as in (5.39) and define

(5.60)  S0 = f(0, 0) = a1a2K0 / [8(K0 + θ)]

Then

(5.61)  f(x1, x2) = [a1a2 / (8(K0 + θ))] exp(−a1|x1| − a2|x2|) [K0 + a1θ|x1| + a2θ|x2| + K3x1x2 + a1a2K0|x1x2|]

                  = [a1a2 / (8(K0 + θ))] exp(−a1|x1| − a2|x2|) { (K0 + θ)[a1|x1| + a2|x2|]
                    + K0[1 − a1|x1| − a2|x2| + a1a2|x1x2|] + K3x1x2 }

                  = exp(−a1|x1| − a2|x2|) { (a1a2/8)[a1|x1| + a2|x2|]
                    + S0[1 − a1|x1| − a2|x2| + a1a2|x1x2|] + (a1²a2²/4) ρ x1x2 }

The imposition of unitary variances implies a1 = a2 = 2, so that the Sargan distribution is uniquely determined once we pick a value of the correlation (ρ) and of the density at x1 = x2 = 0 (S0). We therefore provide in Tables 5.1-5.7 a comparison of the Sargan and normal densities. Each table corresponds to a different value of ρ. For x1 = {−3.0, −2.5, −2.0, −1.5, −1.0, −0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0} and x2 = {0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0} we provide the normal (top entry) and Sargan (bottom entry) density. (Negative values of x2 are unnecessary because of symmetry; e.g., f(x1, −1.0) = f(−x1, 1.0) and f(x1, x2) = f(x2, x1).)

The agreement between the Sargan and normal densities is rather close. The largest differences are near the turning points of the distribution (|x| between 1.0 and 2.0, say), but even these are not very large. Thus, for the values of ρ considered, the Sargan distribution appears to be a fairly good approximation to the normal. However, it should be noted that the range of ρ which we consider is rather limited. As indicated in the discussion following (5.39) above, we need to restrict the correlation (ρ) to ensure that the Sargan density is non-negative for all x1 and x2. We cannot expect a Sargan density to approximate well a bivariate normal with high correlation, at least not unless we allow negative "density" in the tails of the approximation.

5.4 A Simple Seemingly Unrelated Regression Model

In the previous section we saw that the bivariate Sargan distribution is a reasonably accurate approximation to the bivariate normal.
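As a rough illustration of how the comparisons in Tables 5.1-5.7 can be produced, the following sketch (ours, not the original computations) evaluates the four-parameter form of the bivariate Sargan density in (5.61), with a1 = a2 = 2 and S0 set equal to the bivariate normal density at the origin, alongside the bivariate normal density itself. The (a⁴/4)ρ coefficient on the x1x2 term is our reconstruction of the constant implied by (5.61) and should be read as an assumption.

    # Sketch (ours) of the density comparison behind Tables 5.1-5.7.
    import numpy as np

    def sargan2d(x1, x2, rho, s0, a=2.0):
        # f(x1,x2) from (5.61); the (a**4 / 4) * rho coefficient is an assumption
        ax1, ax2 = a * np.abs(x1), a * np.abs(x2)
        bracket = (a * a / 8.0) * (ax1 + ax2) \
            + s0 * (1.0 - ax1 - ax2 + ax1 * ax2) \
            + (a ** 4 / 4.0) * rho * x1 * x2
        return np.exp(-ax1 - ax2) * bracket

    def normal2d(x1, x2, rho):
        # standard bivariate normal density with correlation rho
        q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
        return np.exp(-0.5 * q) / (2.0 * np.pi * np.sqrt(1.0 - rho * rho))

    rho = -0.15
    s0 = normal2d(0.0, 0.0, rho)     # restriction (iii): equal densities at the origin
    for x1 in (-2.0, -1.0, 0.0, 1.0, 2.0):
        for x2 in (0.0, 1.0, 2.0):
            print(f"x=({x1:+.1f},{x2:+.1f})  normal={normal2d(x1, x2, rho):.3f}"
                  f"  sargan={sargan2d(x1, x2, rho, s0):.3f}")

With ρ = −.15 the printed pairs show the pattern of Table 5.1: identical values at the origin by construction, close agreement in the far tails, and the largest (still modest) discrepancies around the turning points.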
However, this fact does not provide direct evidence on the relevant question of whether estimators based on the bivariate Sargan distribution will be robust to normal errors. This question was addressed in chapters 3 and 4 in the univariate case, for a variety of models, and the answer was (not surprisingly) model dependent. It is reasonable to expect that the same will be true in the bivariate case; the asymptotic bias that results from assuming Sargan errors when' the errors are actually normal will depend on the model. Rather than become involved in the extensive calculations which were done in the univariate case, however, we will simply ask whether this bias is ever zero. It was, in the univariate case, in the linear regression model. 83 Thus it is reasonable to conjecture that this bias may be zero, in the bivariate case, in the seemingly unrelated regression model. For simplicity and tractability we restrict ourselves to only two equations and only one regressor (a constant term), in each equation. We show that the estimates based on the bivariate Sargan distribution are indeed consistent, regardless of the form of the true distribution of error terms, so long as it is symmetric around zero. Thus we consider the simple model Y1= “1+ ‘1 (5.62) Y2: “2+ ‘2 where yi is Txl, ei has mean zero and covariance structure: 0 if tl s E( s ., e .) ti 5] i,j=l,2 0.. if t=s 1] Notice that we have not yet specified the distribution function of the errors. Here, as in chapters 3 and 4, we ask the question of what happens if we assume a bivariate Sargan distribution when in fact the true distribution is something else (such as bivariate normal). We form the (Sargan) log likelihood function; M (5.63) e - MlnB+Im§1 in f(yml-ul, ymz'“2) 84 or £ I MXnB - a1 §| I? ‘U I ‘ a E Iy '9 I mal m 1 2 11131 m2 2 M + “121 in {KO-palerml-ulI+a29Iym2-u2I+K3(ym1‘ul)(ym2'fl2)} (5.64) + “1“2Ko l("11.1""‘1“I’m—”2) I I ' where B is defined as (5.35). To express its derivatives with respect to all parameters, we define: (5'65) D1 ' Ko+“1°(ym1'“1)"'“29("m2"“2)+(K3+“1“2Ko)(ym1'“1)(ymz'pz) (5-66) D2 - K0+a16(ym1-u1)-a29(ym2~u2)+(K3-a1a2Ko)(ym1-u1)(ymz-uz) (5‘67) D3 ' Ko"“1“(’mf‘fi““2“(ymz‘I‘z)+(K3’“1“2Ko)(ym1'“1)(ymz'uz) (5.68) 114 a Ko-ale(yml-p1)-aze(ym2-u2)+(K3+a1a2K0)(ym1-u1)(ymz-uz) 5 o 1 I’m ”1'+“29'ym2‘“2I+K3(ym1'“1)(ym2’“2) + alazxo I(ym1-I11)(ym2-p2) I 85 Then , the first order conditions defining the maximum likelihood estimates are; (5.70) (5.71) (5.72) "" ' “1(M1+-M1-) ' 2 537' ’ “2("2+3”2-) ' as '++ “19+(K3+“1“2Ko)(’m2‘“2) D1 3 1 2 0 D2 +.. - - «19+(K a a K M?1112 #2) -+ ale-(K3-a1a2K0)(ym2-u2) D3 + H “1 e’(K3*‘“1“2Ko)(yuaz"“2) D4 + +§ “29+(K3+“1“2Ko)(ym1'“1) 2 D1 + +E “2“’(K3’“1“2Ko)(ym1““1) D 2 - “E a29+(K3"a1¢12K0)(Ym1‘I11) D 3 +' “2“'(K3+“1“2Ko)(ym1'”1) +2: ‘11-) 4 M M '52" ' r‘mEIIymHI” 1 lg OIyml-pl I+a2KO I(ym1-u1)(ym2-p2) I - mIl DS 0 86 M 8£ M 5.73 ._.. . .._- _ 1X1 elymz'uzI+a1KOI(ym1-u1)(ym2-u2)l - 0 mIl D5 M 1+a - - (5.74) 5% _ Rid-'I Z __1“2I(ym1 Isl)(yu12 u2)I 0 M a I? ' I“ I? w I 6£ ‘M 1 m1 u1 2 m2 2 o mIl 5 M (y '11 )(y -u) (5.76) .5725. .. 2 and 1D m2 2 _ o 3 mIl 5 (where, M ++ is the number of terms in +2 and similarly for §+ , §_ , M .) Dividing these derivatives by M, and taking probability limits we have (denoting the probability limits of ai, "i' K0, K3, 0 by Hi , Hi, K0, R3, 9 respectfully): M M ~ 1+ 1.. (5.77) (11 {Plim T - Plim T } “H 1 ++ (119+(K3‘H11 2 O)(ym2-u2) - Plim T Plim 31—— 2 ~ H D 1 ”4+. 
1 +- “19+(K3-a1a2K0Xym2-p2) - Plim T Plim ii" 2 ~ +- D 2 M ’+;~'(i'3af)(y u) +P11m-i-1Plimfil—2 1 3120 m2 2 -+ D 3 Mn “ ;~'(fi+53~)(y u) +Plim—P11m—Lz 1 3 1 0 m2 2 M M__ 'f)’ l, M Plim (5-78) «2 [Plim T" M ] M++ 1 ++ a29+(K3+a1a2K0)(Ym1 111) - Plim T Plim fi— 2 ~ H D 1 M +- 3"-4 Iym1-u1I M Elyml-ul |+<12K0I yml-ull lymz-uzl + Plim 1 SI M 11131 D5 M 1 N (5.80) :g" Plim‘fi’ 21 IymZ-pzl a2 m. N N ~~ ~ ~ M Glymz'BZI+aIKOIYm1’H1IIymz'P2I +P11m'fi N m']. D5 M 1+3; IV :5 II? -El 1 2 2 (5.81) -:—-1—;+Plimila- X 1 2 “N m Ko+9 m=1 D5 M 3|? 411' It; Iy 411' (5.82) -~——1~+plim% 2 1 “I 1~2 “'2 2' Ko+6 mI1 D5 (5' -T1)(y 5) (5.83) nimfi 2 “1 1 “2 2 - o mIl D 5 Again we have a very messy but (in principal) useful set of equations. To show that ui's are consistent , following the same logic as in the univariate case, we only need to show that the above equations are satisfied by 3i I ui(and some values for U , Hi, 3 , R E3 ). With E i: “i' the second terms in both (5.79) and (5.80) are 0 U easy to evaluate, while other terms are very messy. But the actual values of the parameters : 31' B , P , R0, R3 are not important as long as acceptable solutions to the above (5.79)-(5.83) exists. By 89 acceptable we mean that 3i , R0 , 3 , P , be positive, and K3 be less than alazko . To evaluate (5.77) and (5.78) with Z i: ui, we note that; 141+ Ml_ Plim ( ___ )I P (y1> ul) I P (yl< ul) = P( ___ ) M M because of the symmetry assumption. Finally, it is obvious that the two equations (5.77) and (5.78) hold provided that the joint density of the error terms is symmetric around zero and certain expected values exist: 1 H a18+(K3+ala2Ko)€m2 E {(11 ( 3 1 2 0 2} (5.84) Plim-fi— 2 ~ 5 ++ D1 ++ 1 264i&"§)e — flequa%ka .E{1 31“202}_anfi_1_2 ~ I -- ~ .. n D 4 A ‘ ezaeezin za«:;"n 0 m2 1 3 1 2 o 2 (5.85) Plim-fi-l— 2 1 3~1 2 - E ~ +— D2 +- D2 “1°'(K3'“1“2Ko)“2 1 “ “’(Ka‘fifixokmz I E I P1111!M ~ " -+ D -+ D3 3 The same procedure can be applied to (5.78). Thus all (5.77)-(5.83) are indeed satisfied for ii I “i (and for 3i , U , R0, K3 as being implied by their respective equations (5.79)-(5.83) . The Sargan 90 "MLE" is consistent for “l and “2' This result can also be obtained in cases involving some regressors other than just a constant term. The proof is basically the same, and the result is that the Sargan "MLE‘s" are consistent for the coefficients of all of the regressors. 5.5 Conclusions In this chapter we defined a bivariate Sargan distribution. This was the result of a series of attempts to derive or approximate a bivariate (or multivariate) distribution function with certain properties. Basically, we want a distribution which reasonably approximates the bivariate normal, which has Sargan marginals, and which has an easily computable c.d.f. Our first-order Sargan distribution (5.36) has these properties. We calculated and compared the densities of the bivariate Sargan and bivariate normal distributions to show that this bivariate Sargan density is very close to the bivariate standard normal. (Of course, it has more parameters than the standard normal.)‘ The agreement between the two distributions is in fact very close except around the turning points (x1 and x2 between 1 and 2, say). Finally, we turned to the question of robustness of inferences based on the Sargan density. Here, we looked only at a rather simple model, a seemingly unrelated regression model , with only a constant term in each equation. Of course the MLE's based on a normality assumption are consistent, even if the errors are not normal. 
The more 91 interesting question is the effect on the estimates of assuming Sargan where the true distribution is not Sargan. Here we showed that the Sargan "MLE's" are still asymptotically unbiased, so long as the true distribution of the errors is symmetric around zero. This result can also be obtained in regressions with multiple regressors. For more complicated models, the Sargan MLE‘s will not always be consistent if the errors do not actually have a Sargan distribution. As we found for the univariate case in chapters 3 and 4, the extent of the asymptotic bias must be model dependent. But the above result at least is the basis for the expectation that the bias will depend strongly on the extent of censoring or truncation in the model: with complete observations, there is no bias. TABLE 5.1 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargan I bottom entry) p - —e15 x2 X1 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 -300 0.0 0.0 .001 -2e5 0.0 0.0 .001 .002 .0051 -200 ' .001 .001 .003 .001 .004 .010 .023 -105 .001 .003 .006 .012 .002 .006 .017 .038 .067 -100 .002 .005 .012 .025 .051 .002 .007 .021 .050 .092 .130 -es .004 .010 .021 .045 .090 .156 .002 .007 .021 .051 .097 .142 .161 0.0 .005 .013 .028 .059 .114 .184 .161 .001 .005 .016 .040 .079 .120 .5 .002 .006 .013 .028 .060 .115 .001 .003 .009 .024 .050 1.0 .001 ,002 .006 .013 .029 0.0 .001 .004 .011 1.5 0.0 .001 .002 .006 0.0 0.0 .001 2.0 0.0 0.0 .001 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.2 Comparison of Bivariate Ssrgsn and Normal Densities (Normal I top entry, Sargan I bottom entry) p - -.10 x2 x1 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 “3.0 0.0 0.0 0.0 -205 0.0 0.0 0.0 .001 .004 -200 0.0 .001 .002 .001 .003 .010 .021 -105 .001 .002 .006 .011 .001 .005 .016 .036 .064 “1.0 .002 .005 .011 .023 .047 .002 .007 .021 .049 .089 .127 ".5 .004 .009 .020 .042 .085 .149 .002 .007 .021 .051 .097 .141 .160 0.0 .005 .013 .028 .059 .114 .184 .160 .001 .005 .017 .042 .081 .121 .5 .003 .006 .014 .031 .065 .122 .001 .003 .010 .027 .053 1.0 .001 .003 .007 .015 .032 0.0 .001 .005 .013 1.5 .001 .001 .003 .007 0.0 .001 .002 2.0 0.0 .001 .001 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.3 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargan I bottom entry) p - -e05 \ X2 \ X1 \\ 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 -3e0 0.0 0.0 0.0 -2e5 0.0 0.0 0.0 .001 .004 -200 0.0 .001 .002 .001 .003 .008 .019 -105 .001 .002 .005 .010 .001 .005 .014 .034 .061 -1.0 .002 .004 .010 .021 .043 .002 .007 .020 .047 .087 .126 -es .003 .008 .018 .040 .080 .142 .002 .007 .021 .052 .097 .141 .159 0.0 .005 .013 .028 .059 .114 .184 .159 .001 .006 .018 .044 .083 .122 .5 .003 .007 .015 .034 .070 .129 .001 .004 .012 .029 .056 1.0 .001 .003 .008 .017 .036 0.0 .002 .006 .015 1.5 .001 .002 .004 .008 0.0 .001 .002 2.0 0.0 .001 .002 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.4 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargon I bottom entry) p I 0.0 x2 XI 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 '3.0 0.0 0.0 0.0 -205 0.0 0.0 0.0 .001 .003 “2.0 0.0 .001 .002 .001 .002 .007 .017 -1e5 .001 .002 .004 .009 .001 .004 .013 .031 .059 -100 .002 .004 .009 .019 .039 .002 .006 .019 .046 .085 .124 -es .003 .007 .017 .037 .075 .135 .002 .007 .022 .052 .097 .141 .159 0.0 .005 .013 .028 .059 .114 .184 .159 .002 .006 .019 .046 .085 .124 .5 .003 .007 .017 .037 .075 .135 .001 .004 .013 .031 .059 1.0 .002 .004 .009 .019 .040 .001 .002 .007 .017 1.5 .001 .002 .004 .010 0.0 .001 .003 2.0 0.0 .001 .002 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5 .5 Comparison of Bivariate Sargan and 
Normal Densities (Normal I top entry, Sargan I bottom entry) p I .05 x2 X1 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 ”3.0 0.0 0.0 0.0 -205 0.0 0.0 0.0 .001 .002 -200 0.0 .001 .002 0.0 .002 .006 .015 -105 .001 .002 .004 .008 .001 .004 .014 .029 .056 -1eo .001 .003 .007 .017 .036 .001 .006 .018 .044 .083 .122 -es .003 .007 .015 .034 .070 .129 .002 .007 .021 .052 .097 .141 .159 0.0 .005 .013 .028 .059 .114 .184 .159 .002 .007 .020 .047 .087 .126 .5 .003 .008 .018 .039 .080 .142 .001 .005 .014 .034 .061 1.0 .002 .004 .010 .021 .043 .001 .003 .008 .019 1.5 .001 .002 .005 .010 0.0 .001 .004 2.0 0.0 .001 .002 0.0 0.0 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.6 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargan I bottom entry) p I .10 x2 111 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 -300 0.0 0.0 0.0 “205 0.0 0.0 0.0 .001 .002 -200 0.0 .001 .001 0.0 .001 .005 .013 '1.5 .001 .001 .003 .007 .001 .003 .010 .027 .053 -100 .001 .003 .007 .015 .032 .001 .005 .017 .042 .081 .121 -es .002 .006 .014 .031 .065 .115 .002 .007 .021 .051 .097 .141 .160 0.0 .005 .013 .028 .059 .114 .184 .160 .002 .007 .021 .049 .089 .128 .5 .004 .009 .020 .042 .085 .149 .001 .005 .016 .036 .067 1.0 .002 .005 .011 .023 .047 .001 .003 .009 .023 1.5 .001 .002 .005 .011 0.0 .001 .005 2.0 0.0 .001 .002 0.0 .001 2.5 0.0 0.0 0.0 3.0 0.0 TABLE 5.7 Comparison of Bivariate Sargan and Normal Densities (Normal I top entry, Sargan I bottom entry) p I .15 x2 X1 3.0 2.5 2.0 1.5 1.0 .5 0 0.0 -300 0.0 0.0 0.0 -2e5 0.0 0.0 0.0 0.0 .001 -200 0.0 0.0 .001 0.0 .001 .004 .011 -1e5 0.0 .001 .002 .006 .001 .003 .009 .024 .050 '1.0 .001 .002 .006 .013 .029 .001 .005 .016 .040 .079 .120 —.5 .002 .006 .013 .028 .060 .115 .002 .007 .021 .051 .097 .142 .161 0.0 .005 .013 .028 .059 .114 .184 .161 .002 .007 .021 .050 .092 .130 .5 .004 .010 .021 .045 .090 .156 .002 .006 .017 .038 .067 1.0 .002 .005 .012 .025 .051 .001 .004 .011 .023 1.5 .001 .003 .006 .012 .001 .002 .005 2.0 .001 .001 .003 0.0 .001 2.5 0.0 .001 0.0 3.0 0.0 APPENDIX A DERIVATION OF THE COVARIANCES Here, we use the following relationship: n -ax . _ n 0 n ay _ n! I; x e dx ( 1) 1.. y e dy “n+1 h>0 (A.1) COV(X,Y) - E(X,Y) - Ito Ii“ xyf(x,y)dxdy (A.2) f(x,y) is defined as in (5.14) a x+a y 11 . IO“ IO“ xy e 1 2 (K0 -K1 x IK2y+K3xy)dxdy 2K 2K x + _l x + __3. 0 azy KO-sz -K3y 1 0 ul 2 al 'Lwye [' 2 '2K 3ldY'—7{——§-—+2—TI “1 “23 “1 “2 “2 1 - 6-5:? {a1a2K0+2a2K1+2a1K2+4K3} (A-3) l 2 a y -a x 12 - I90 ye I; xe 2 (K0 +K1 x -K2y K xy)dxdy 0 “2y K0 2K1 K2y 2K3y '1... ye I7*"5"‘2"‘3 “7 a a a a 1 1 1 1 -1 I —§-§ (a1a2K0+2a2K1+2a1K2+4K3) (A-4) (11 (:2 I I I; e 2 y I2” xe ”2 (K0 IK1x+K2yIK3xy)dxdy 99 100 2 0 a 2 5 a1 1 a2 a2 -1 = -§-3- (a1a2K0+2a2K1+2a1K2+4K3) (A.5) “1 “2 ’“2’ "“1“ 2 2 I - I; e y I; e (Kox+K1x +K2xy+K3x y)dxdy (A.6) 1 - —Tjflflfi%fi%fifl%SM%} “1 “2 Therefore COV(X,Y) - o (A-7) 101 The same process is applied to f(x1,x2) as defined in (5.27): That is: a x +a x R - f0 Iga xlxze 1 1. 2 2{-6alx1-9a2x2-I'K3x1x2)dxldx2 1 —oo [0 x e+a2x2[- 2a19 + azexz + 2K3x2 -o 2 a 3 2 3 1 “1 “1 ] dx2 29 29a2 4K3 4 azaz a2a3+ 3 3 ' 3a3(“1“2°"(3) (“'8’ 12 12 12 “1 2 I + and similarly for other intervals; R2, R3, R we get: 49 R 8 f0 I” x e-a1x1+a2x2(9a x -6 +K )dx dx 2 «01"2 11“2"23"1"212 0 azx2 29x1 eazx2 2K3x2 =1... ‘27???" 
“*2 1 1 1 4x -29 29 3 -4 2 2 2 2+a3 3 3 3 (“1“29'1‘3) (“’9’ “1 “2 “1 “2 1 “2 “1 “2 102 -4 R3 - a 3 3 (alaZO-KB) (A.10) 1 “2 ' R -__é._.( 9+K) (All) 4 a an 3 “1“2 3 - 1 2 Therefore, “1“2 COV(X1,X2) - E(X1,X2) - -§§-(R1+R2+R3+R4) 3 (A.12) APPENDIX B DERIVATION OF MOMENT GENERATING FUNCTION AND CUMULATIVE DISTRIBUTION FUNCTION OF BIVARIATE SARGAN i) Moment generating function: let a a -a Ix l-a Ix I 1 2 1 1 2 2 f(x,y) 3 W (KO'HIIGIXI [+1129]le +K3x1x2+a1a2Kolx1||x2| (3.1) as defined in (5.36) t x +t x M(T) - E(e 1 1 2 2) (a +t )x +(a +t )x 0 0 1 1 1 2 2 2 B1 - f_cf_° e (KO-mlBxl-a29x2+(K3+ala'2K0)x1x2)dxldx2 o e<“2+t2)"2 Ko’“2°"2 “1°'(K3+“1“2Ko)"2 ‘~D a +t 2 1 1 (a1+t1) } dx2 - (a1+t1)Ko+a19 + (a1+t1)a26+(K3+u1a2Ko) (B 2) (a1+t1)2(a2+t2) (a1+t1)2(a2+t2)2 103 104 0 -(a1-t1)xf(a2+t2)x2 I~= E; e (“0+“19x1'“2°x2+(x3 “1“2Ko)“1‘2)“x1“x2 f0 e(a2+t2)x2 KO-azexz + a19+(K3-a1a2K0)x2} dx '~= al-tl (a -t )2 2 1 1 (al-t1)Ko+a19 (cl-t1)a29-(K3-a1a2Ko) 2 + 2 2 (3.3) (cl-t1) (a2+t2) (cl-t1) (a2+t2) (a +t )X '(a -t )x 1 1 1 2 2 2 I; I?” e [KO-a19x1+a29x2+(K3-a1a2K0)x1x2]dx1dx2 -(a2-t2)x2 Ko+a29x2 ale-(K3-a1a2Ko)x2 I; e a +c + 2 }“x2 1 1 (a +t ) 1 1 (a1+t1)Ko+a19 + -(K3-a1a2Ko)+a29(a1+tl) (B 4) 2 2 2 ° (a1+t1) (az-tz) (a1+t1) (az-tz) -(a -t )x -(a -t )x 1 1 1 2 2 2 I; I; e [K0+a16x1+a29x2+(K3+ala2Ko)xlx2]dx1dx2 105 Ko+a29x2 a19+(K3+a1a2K0)x2 -(a2-t2)x2 ' o 1 a -c" + 2 } “x2 1 1 (cl-t1) ‘ (al-t1)Ko+a19 (cl-t1)a265T (K3+a1a2Ko) (B 5) 2 f O (cl-t1) (dz-t2) (cl-t1) (dz-t2) “1“2 M(T) {31+B +B3+34} 8(KO+B) 2 ala2 (a2+t2)[(a1+t1)K0+a16]+a29(a1+t1)+K3+a1a'2KO 8(K0+9) 2 2 (a1+t1) (a2+t2) + (a2+t2)[(a1-tl)Ko+a19]+(a1-t1)aZG-(K3-a1a2Ko) (cl-t1)2(a2+t2)2 +-(“2"2)[(“1+“1)Ko+“1“]'(K3'“1“2Ko)+“2°(“1+t1) 2 2 (a1+t1) (dz-t2) (dz-t2)[(a1-t1)Ko+a16]+a26(a1-tl)+(K3+a1a2Ko)} + 2"""‘2 (“1"1) (“2"2) 106 { K0 + «19 + aze (a +t )(a +t ) 2 2 1 1 2 2 (a1+t1) (a2+t2) (a1+t1)(a2+t2) + “1“2K0+K3 2 2 (a1+t1) (a2+t2) K a 9 a 9 0 1 + 2 (“1"1)(“2+“2) (a1+t1)2(a2+t2) (al-t1)(a2+t2)2 a a K -K + 1 2 ° 3 (B.6) (al-t1)2(a2+t2)2 K a 9 a 9 O + 1 + 2 (“1+‘1)(“2"2) (a1+t1)2(a2-t2) (a1+t1)(a2-t2)2 “1“2K0‘K3 + 2 2 (“1+“1) (“2’“2) K a 9 a 9 0 + 1 + 2 (“1‘“1)(“2"2) (al-t1)2(a2-t2) (al-t1)(a2-t2)2 “1 “2K0+K3 “1 “2 + (al-tl)z(a2-t2)2 8(K0+9) Since, 107 an” 1 (-1)“‘“ nhn! ( ) - or in general (8.7) n m X X n+1 n+1 6X1 axz 1 2 x1 x2 n+m a ( 1 _ (-1)“+“(n-1—1)1(m+J-1)2 (3.8) m i j n+1 n+3 _ _ axlnax2 x x x1 x2 (1 1)!(J 1)! an+1n [M(t)] - alaz { (-1)n+m n! m! K0 btlnotzm 8(K0+9) (“1+t1)n+1 (“2+t2)m+l (-1)“"“ (n+1)! m! ale (a1+tl)n+2(a2+t2)m+1 (-1)“”“ n! (n+1)! (129 (-1)“"“ (n+1)! (n+1)! (a1a2x0m3) + + (“1+t1)n+l(a2+t2)m+2 (a1+t1)n+2(a2+t2)m+2 (--1)"'l n! 11:! K0 (--1)“1 (n+1)! m! ale + + (al-t1)n+1(a2+t2)m+1 (al-t1)n+§(a2+t2)m+1 108 (-1)“' n! (n+1)! a e {-1)“ (n+1)! (n+1)! (a a x -1< ) 2 + 1 2 o 3 n+2 2) n+1 n+2 n+2 ((11 t1) (a2+t (al-tl) (a2+t2) (-1)n n! m! x0 (-1)n (n+1)! m! «16 + + n+1 n+1 n+2 n+1 (a1+t1) (a2 t2) (a1+t1) (a2 t2) {-1)“n: (n+1)! a29 (-1)“(n+1)1 (n+1)! (“1“2K0‘K3) + 1 n+2* + n+2 n+2 (“1+“1)n+ (“2't2) (“1+t1) (“2'“2) (-1)2(“+“)n1 m! K0 (-1)2(“*“)(n+1)1 m1 ale + + (a1_t1)n+1(a2_t2)m+1 (a1_t1)n+§(a2_t2)m+1 ' (-1)2(“+“)n1 (n+1)! a29 (-1)2(“+“)(n+1)1 (n+1)! (a1a2K0+K3) + + (al-t1)n+1(a2-t2)m+2 (al-t1)n+2(a2-t2)m+2 n a a nlle [1+(-1)“+(-1)“+(-1)“+“1 _ a +“mt-0) _ 1 2 { o I"'nm 6t nat m 51K0+95 a n+1a n+1 1 2 1 2 109 +[1+(-l)n+(-1)m+(-1)n+m](n+1)1m1a16 a n+2“ n+1 1 2 [1+(-1)“+(-1)“+(-1)“+“]n1(m+1)!aze a n+la n+2 1 2 + 2(n+m)]a 1(-1>“+“+<-1>“+<-1>“+(-1) lazxo + (n+1)!(m+1)! n+2 n+2 “1 “2 1<-1)“+“-<-1)“—<-1>“+(-1>2‘°+“’1K3(n+1>!(n+1)! 
+ a n+2“ n+2 l 2 Therefore; n! m! Ko+(n+l)! m! O+n! (n+1)! O+(n+1)! (m+1)!K0 p 3 on n m 2a1 a2 (KO+6) if both nlm even [n!m!+(n+l)1(m+1)!]KO+[(n+1)1m!+n!(m+1)!]9 - (3.11) n m 2:11 «2 (K0+9) 110 (n+1)!(m+1)!K 3 nm . n+1 m+l 2a1 a2 (K0+9) If both n,m are odd (3.12) p - 0 otherwise (8.13) 11) Derivation of Cumulative Distribution Function 1) zl,zz>0 1° {0 ea1x1+a2x2[x - 9x -a +(x +a ) ]d d -o -o o “1 1 29x2 3 1“2Ko x1‘2 x1 x2 Ko'“2°"2 + “1°'(K3+“1“2Ko)x2 a 2 1 a1 1 dx 0 “2x2 I...e[ 2 2(K0+9)a1a2+K3 - 2 2 (3.14) “2 0 z1 ’“1x1+“2x2 I = [_OIO e [Ko+a1Oxl-az9x2+(K3-a1a2Ko)x1x2]dx1dx2 111 ' I?“ "‘l’{° 1 1[KO‘HH'fle“1'“29"2"“‘3'“1“2Ko)""1"2 (“3'“1“2Ko)x2 “2‘2 + ] e “1 (“3’“1“2Ko)*2 “2x2} “1 le 9x+ - [KO+9--O z a x +a x o 1 1 1 2 2 I15 I... I... e “‘0 “19x1 “2932+(K3+“1“2Ko)"1"2]“"1“"2 . “121 I —§-§- [WI-alwzzl (3.21) (11 a2 22 z1 “1x1-“2xz IO 1.“ e [Ko’“1°x1+“29x2+(K3’“1“2K0)x1x2]“x1“xz 115 “121'“222 -e ' 2 2 {2"4'K3'“1("4‘K3)z1+("4'K3)“222 ”5z122} a a 1 2 (12 e 1 1 1"‘EF‘7E [2“4 K3 “1("4’K3)z1] “1 “2 F(z1,22) - A(I6+I7) A { .112 224111 21 -8 “1’1 22]+e [4w4-2a w z ]} ' w5‘1 1 a 1 [ZwA-K3-a1(w4-K3)zl+a2(w'4-K3)z2 (8.22) (8.23) APPENDIX C SECOND-ORDER BIVARIATE SARGAN DENSITY Let -a1|x1|-a2|x2| f(x1,x2) - A e [KO+K1|x1|+K2|x2|+K3x1x2+K‘lele 2 2 where a 3a 3 A - 1 2 (c.2) 43 2 2 2 8 a1 a22K0+a1a22K1+a1 a2K2+a1a2K4+2a22K5+2a1 K6 Its marginal densitites are: i) x1>O '“1‘1 '“2‘2 2 2 L1 - e I; e (KO+K1x1+K2x2+(K3+K4)x1x2+K5x1 +K6x2 )dx2 116 117 2 -a x K +K x +K x1 + K2+(K3+K4)x1 2x _ e 1 ll 0 1 1 5 + 6] (C.3) a2 2 a 2 “2 3 a x a x 1 l 0 2 2 2 2 L2 - e I” e (KO-Flel-K2x2+(K3-K4)x1x2+l(5xl +K6x2 )de 2 -a1x1 K0+K1x1+K5x1 KZ-(K3-K4)x1 2K6 - e [ + + ] (C.4) a2 2 a 3 “2 2 ii) and similarly for x1<0, we get: 2Ae_a1x1 2 A(L1+L2) ' '-;-§-— [Ko+).1x1+).2x1 ] if x1>0 2 81(X1) - -3§3:1:-1—[x-1x+xx2] ifx<0 3 0 1 1 2 1 1 a 2 (C.5) where 2 X K +11 K +2K o'“2022 6 *1 ' “22x11“2K1. 1 X;- - alYl (C.6) 118 substituting back into (C.5), -a1|x1| ale 2 2 31“1’ ' 2(1+yl+2y2) (1+“1Y11‘1l+“1 Y2‘1 ) and similarly for x2: -a IX 1 a2e 2 2 2 2 32(‘2) ' 2(1+§1+2§2) (1+“2§1|‘2|+“2 “2‘2 > where 2 alhK2+a1K4 2 a2(a1 Ko+a1K1+2K5) £1 ' 2 “1 K6 a n 2 2 2 62 (a1 K0+a1K1+2K5) (C.7) (0.8) (0.9) (C.10) (C.11) Equations (C.7) and (0.8) are second-order Sargan densities for X1, and X2 respectively. To ensure that first and second partial derivatives are continuous we set 51 - 1, 71 - 1 (see Goldfeld and Quandt (1981)). Equations (C.6) and (C.10) result in: 2 “2 K1+“2Kl. K1 K2 K4 2K+a +21< ' a1 .3) K0 . “-1.-“2+“1“ “2 0 2‘2 6 2K +a K K “1 2 11. K2 K1 4 2 - :12 .> K0 - ?-T+aa a1 KO-i-alKl-I-ZKS 2 1 1 That is; 5.1-5.2. .. ‘5 -32. a a 2 2 1 2 a1 12 a2 :2 Equation (0.11) can be rewritten as: K 8 K6 - 51.- 2K5 0 a 5 a1 a 2 2 1 and from (C.7) we get: “2K5 .. a Y .> K5 _ 1(2- _ 2K6 2 1 2 2 a 2 a2 K0+azK2+2K6 a1 12 2 a2 Using equations (C.14) - (C.16), we get: K K a 2 Y2 a 2 £2 1 2 Then; a ‘1 5 a 2: 5 K _ 1 2 K _ 2 2 3 5 1+72 6 [+52 and, (C.12) (C.13) (C.14) (C.15) (C.16) (C.17) (C.18) 120 Y E 2 2 K4 GIGZKO'HIIGZO (ill-Y;+ 112;) (11112113 (C.19) Y2 ’52 v4 . 1T1}; , v5 . 
TF5; (c.20) v3 I KO + 6 (v4+v5) (C.21) Therefore K1 I alv1 , K2 I azv2 2 v - 5 ( 1 - Y2 ) - x (c.22) 1 1+; 1+7 0 2 2 1 2“2 v2 I 6 (1+Y2 - 1+§2) - K0 (C.23) Using the above relationships, the joint density can be rewritten as “1“2 '“11‘1l'“2"2l f(‘1 "2) ' ’85” “ [Ko+“1"1|‘1 |+“2"2 I‘2l“"3‘1‘2"“1“’2"31‘1‘2l 2 2 2 2 + a1 ovax1 +a2 6v5x2 ] (0.24) The covariance of X1,X2 will therefore be derived as follows: a x a x O 2 2 0 1 1 2 2 MI I f_° x2e f_o x1e [K0 allel a2v2x2+(K3+a1a2v3)x1x2+a1 6V4):1 2 2 + a2 vsox2 ]dx1dx2 121 x +a 2v 6x 2 2[-a1v1+(K3+a1a 2V3)‘2] 0 “2‘2 Ko’“2"2 2 2 5 2 I.” ‘2e { a 2 + a 3 1 1 6a 26v _ 1 4} d a 4 ‘2 1 2x +2 2 +5 25 +2(x +5 ) 5 25v .- “1 o “1 v1 “1 v4 + 2 “1“2‘2 3 1“2V3 + “2 5 (c 25) (14:12 ¢3¢3 (14112 . 1 2 1 2 2 1 and similarly for the other parts, i.e.: 2 2 2 2 -(a1 Ko+2a1 v1+6a1 5v4) 2[-a1a2v2+2(K3-u1a2v3)] 6oz 6v5 M2 ' a 4“ 2 1 a 3 3 ' a a 4 1 2 1 “2 1 2 (C.26) -(a 2K +2a 2v +6a 26v ) 2[-a a v +2(K -a a v )] 6a 26v M - 1 o 1 1_~ 1 4 + 1 2 2 3 1 2 3 _ 2 5 3 4 2 ' 2 '4' “1 “2 “1 “2 “1 “2 (c.27) a 2x +2a 2v +6a 25v 2[a a v +2(x +5 a v )1 6a 25v M - 1 o 1 1 1 4 + 1 2 2 3 1 2 3 + 2 5 4 a 4“ 2 a 3“ 3 a 2a 4 1 2 1 2 1 (C.28) 2K aa 1 2 3 (c.29) C0V(X1,X2) - -§3-'(M1+M2+M3+M4) 5a 2a 2 1 2 and the correlation coefficient is: 2 2 p - 21(3/a1 a2 5 - K3 (1+12)(1+§25 16(1+3yz)(1+3§2) 251525 / (“+312)(1+3§2) _T’Z / “1 “2 (1+72)(1+§2) (3.30) Chapter 6 Conclusions The main objective of this dissertation was to investigate the adequacy of the Sargan distribution as an approximation to the normal in econometric models. The normal distribution is widely assumed, in part because it often leads to simple results. However, in some models the normal distribution is computationally more complicated than the alternative distributions, such as the Sargan distribution, whose c.d.f. can be expressed in closed form. Thus there may be a computational benefit to using an approximation to the normal in certain models, and we ask what the cost might be. In chapters 3 and 4 we showed that the univariate Sargan distribution provides a very close approximation to the normal, in the sense that their densities are quite close to each other. This is especially so if a second or higher-order Sargan distribution is used. Such comparisons have also previously been made by Goldfeld and Quandt (1981) and by Missiakoulis (1983). However, the fact that the densities are close does not necessarily imply that MLE's based on the Sargan distribution will have properties similar to the MLE's based on the normal distribution. Therefore, we have considered a variety of models, and asked what the 122 123 cost is of assuming the errors to have a Sargan distribution, if in fact they have a normal distribution. Our definition of cost is the asymptotic bias (inconsistency) of the resulting estimates. This cost is model dependent; in other words, it is different in different models. There is no cost in the linear regression model. In fact, our result for the linear regression model is stronger: the Sargan MLE's are consistent regardless of the true distribution of the error terms, so long as it is symmetric. However, in the so-called sample selection models, in which the c.d.f. of the errors appears in the likelihood function, there is a cost to an incorrect assumption of the distribution of the errors. How high the cost is depends upon how complete the sample is; the cost is higher when the sample is more highly censored or truncated. 
For example, in the censored regression model the asymptotic bias varies positively with the degree of censoring. As the sample becomes more complete, the bias disappears, which is consistent with our result for the fully observed sample (the linear regression model). The same kind of results occur for the truncated dependent variable model, except that the bias is generally larger in the truncated model than in the censored model. This is consistent with the reasonable intuition that additional information helps in reducing the bias of the estimates. Another result which is consistent with this intuition is that the bias of the estimates is virtually always smaller when the error variance is known than when it is unknown. The bias is sometimes large enough to be a serious problem, but it is relatively minor for samples that are at least 50% complete in the censored case, or 75% complete in the truncated case.

The same kinds of results hold for higher-order Sargan distributions, but the bias is smaller when a higher-order Sargan distribution is assumed. Therefore, as should be expected, it is much safer to approximate the normal distribution by the second-order Sargan distribution than by the first-order Sargan.

Another interesting result is that it is much less costly (in terms of asymptotic bias) to approximate the normal distribution with the Sargan than vice versa. In the linear regression model, there is no bias either way. However, in our more complicated models, the bias caused by assuming Sargan errors when normality is true is much smaller than the bias caused by assuming normality when the Sargan is true. It is not apparent why this should be the case.

The overall conclusion from our study of the univariate Sargan distribution is that one should not use the Sargan distribution if one really believes that the normal distribution is correct. Any computational savings are not worth the cost, in terms of asymptotic bias and the resulting incorrect inferences. On the other hand, while models depending on the univariate normal c.d.f. (e.g., the Tobit model) are not terribly complicated computationally, models that involve the multivariate normal c.d.f. (e.g., a multi-market disequilibrium model) are still very difficult to estimate. Thus a multivariate Sargan distribution might be more valuable, in terms of potential computational savings, than the univariate Sargan distribution.

We have considered a bivariate distribution whose marginals are univariate Sargan, and defined it to be a bivariate Sargan distribution. We have shown that our constructed bivariate Sargan density is very close to the bivariate standard normal density except around the turning points, and it is not very far off at those points either. We have also proved, in the seemingly unrelated regressions model, that the estimates based on the bivariate Sargan density are asymptotically unbiased (consistent) regardless of the true distribution of the error terms, as long as it is symmetric. Presumably this is not so in more complex models. Although we did not investigate such models in detail, this could be done by a straightforward extension of the methods used earlier in the thesis. The main problem with our bivariate Sargan density is the limited possible range of the correlation coefficient, ρ. It is not very interesting to limit attention to bivariate or multivariate Sargan distributions that exhibit almost no correlation, but this is necessary to keep the density non-negative over its entire range.
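A rough numerical check of this last point is easy to set up. The sketch below is ours, not part of the original study, and it relies on the same reconstructed form of (5.61), with the (a⁴/4)ρ coefficient, used in the sketch in chapter 5: for a grid of values of ρ, with S0 matched to the bivariate normal value at the origin, it finds how far from the mean the density first turns negative.

    # Check (ours) of where the reconstructed bivariate Sargan density of (5.61)
    # turns negative; a1 = a2 = 2, so the units of x are standard deviations.
    import numpy as np

    def sargan2d_bracket(x1, x2, rho, s0, a=2.0):
        # sign of the density equals the sign of this bracket
        ax1, ax2 = a * np.abs(x1), a * np.abs(x2)
        return (a * a / 8.0) * (ax1 + ax2) + s0 * (1.0 - ax1 - ax2 + ax1 * ax2) \
            + (a ** 4 / 4.0) * rho * x1 * x2

    grid = np.linspace(-20.0, 20.0, 801)          # +/- 20 standard deviations
    x1, x2 = np.meshgrid(grid, grid)
    for rho in (0.05, 0.15, 0.25, 0.35, 0.45):
        s0 = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho ** 2))
        neg = sargan2d_bracket(x1, x2, rho, s0) < 0.0
        if neg.any():
            r = np.sqrt(x1 ** 2 + x2 ** 2)[neg].min()
            print(f"rho={rho:.2f}: density first negative about {r:.1f} sd from the mean")
        else:
            print(f"rho={rho:.2f}: density non-negative on the +/-20 sd grid")

Under these assumptions the density remains non-negative everywhere as long as |ρ| does not exceed S0 (about .16 here), while for moderately larger |ρ| the first violations occur several standard deviations from the mean; only for substantially larger |ρ| do they move close enough to the mean to be a practical concern.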
An interesting unanswered question is whether ignoring this restriction would cause problems in actual empirical work. For example, it would certainly not matter in any practical sense if the Sargan density were negative ten or twenty standard deviations from the mean.

BIBLIOGRAPHY

Arabmazar, A. and P. Schmidt (1982), "An Investigation of the Robustness of the Tobit Estimator to Non-Normality," Econometrica, 50, 1055-1063.

Goldberger, A. S. (1980), "Abnormal Selection Bias," SSRI Discussion Paper 8006, University of Wisconsin, Madison.

Goldfeld, S. M. and R. E. Quandt (1981), "Econometric Modeling with Non-Normal Disturbances," Journal of Econometrics, 17, 141-155.

Gröbner, W. and N. Hofreiter (1958), Integraltafel, Zweiter Teil: Bestimmte Integrale, Vienna: Springer-Verlag.

Heckman, J. (1976), "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables, and a Simple Estimator for Such Models," Annals of Economic and Social Measurement, 5, 475-492.

Johnson, N. L. and S. Kotz (1970), Continuous Univariate Distributions, Vol. 1, New York: Wiley.

Missiakoulis, S. (1983), "Sargan Densities: Which One?," Journal of Econometrics, 23, 223-234.

Stone, M. H. (1962), "A Generalized Weierstrass Approximation Theorem," in Studies in Modern Analysis, Vol. 1, ed. R. Creighton Buck, Englewood Cliffs, New Jersey: Prentice-Hall.