REGRESSION ANALYSIS WITH SECOND 7-0RDER
AUTOREG‘RESSIVE DISTURBANCES

Thesis for the Degree 0f Ph. EL
MICHIGAN STATE UNIVERSITY
PETER I. SCHMIDT
1970

IIIIIII IIII III/IIIIIIIIIIIIIIIIIIIIIIIIII

310565 57 77

 

IHFQIQ

This is to certify that the
thesis entitled
Regression Analysis with Second-Order;

Autoregressive Disturbances

presented by

Peter J. Schmidt

has been accepted towards fulﬁllment
of the requirements for

Ph . D. degree in Economics

 

 

%@

Major professor

DateNovegber 19, 1970

0-7639

 

www.mmua—Li
LIB AR Y

MlChlgi‘ﬂl State
UnivchltY f

—"w"v-. q cv—ww‘
o

ABSTRACT

REGRESSION ANALYSIS WITH SECOND-ORDER
AUTOREGRESSIVE DISTURBANCES

BY

Peter J. Schmidt

Autocorrelation is present in a regression
equation when the unobservable random disturbances are
not mutually independent over time. In the presence of
autocorrelation, ordinary least squares will lead to
inefficient estimators of the regression coefficients
and to inconsistent estimators of their variances.
Econometricians have therefore developed testing proce-
dures to test for autocorrelation, and estimation pro-
cedures to alleviate the problems which it causes when
it is present. These procedures must of necessity make
some assumption about what types of autocorrelation
might be present. In particular, it has usually been
assumed that the disturbances follow a first-order auto-
regressive scheme.

This study considers autocorrelation in the more
general form of a second-order autoregressive scheme.

The usual testing and estimation procedures are generalized

Peter J. Schmidt

to this case. Finally, the new procedures are compared
to the original procedures in terms of their performance
in the presence of various types of autocorrelation.

The results obtained indicate that these generalized
testing and estimation procedures may be useful, at least
when one does not have strong a priori reasons for
believing the autocorrelation in the sample to be of

first-order form.

REGRESSION ANALYSIS WITH SECOND-ORDER

AUTOREGRESSIVE DISTURBANCES

BY

Peter J. Schmidt

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Economics

1970

("‘1‘“)

Ibr J
LP\

ACKNOWLEDGEMENTS

This study would never have been begun, much less
completed, without the constant encouragement, advice, and
help of Jan Kmenta, my dissertation committee chairman.
James Ramsey and Roy Gilbert also read the entire study
and made many valuable suggestions. I also wish to thank
Phoebus Dhrymes for his comments on Chapter III, and
Richard Henshaw for his comments on earlier versions of
Chapters IV and V.

The research on which this study is based was
supported in part by the Mathematical Social Science Board
Workshop on Lags in Economic Behavior, held at the Univer-
sity of Chicago in the summer of 1970. Marc Nerlove and
G. S. Maddala, co-directors of the workshop, were kind
enough to read a substantial part of an earlier draft and
to suggest improvements.

Any remaining errors are of course my own

responsibility.

ii

TABLE OF CONTENTS

ACKNOWLEDGEMENTS . . . . . . . . . . . .
LIST OF TABLES . . . . . . . . . . . .
Chapter
I. INTRODUCTION . . . . . . . . . . .
1.1 Statement of the Problem . . . . .
1.2 Types of Autocorrelation . . . . .
II. ESTIMATION IN A LINEAR MODEL . . . . .
2.1 Introduction . . . . . . .
2.2 Generalization to the Second- Order Case
2.3 PrOperties of the GLS Estimators . .
2.4 The Experiment . . . . . . . .
2.5 Results . . . . . . . . . . .
2.6 Summary . . . . . . . . . . .
III. ESTIMATION IN A DISTRIBUTED LAG MODEL . .
3.1 Introduction . . . .
3.2 Asymptotic Properties of the Estimators
3.3 Small Sample Properties of the Maximum
Likelihood Estimators . . . . .
3.4 Results of the Experiment . . . . .
3.5 Comments and Summary . . . . . .
IV. TESTING FOR SECOND-ORDER AUTOCORRELATION: A
GENERALIZATION OF THE DURBIN-WATSON TEST .
4.1 Introduction . . . . . . . .
4.2 Calculation of Significance Points . .
4.3 ApproximatiOns . . . . . . . .
4.4 The Bounds Test . . . . . . . .
V. THE POWER OF THE GENERALIZED DURBIN-WATSON

TEST 0 O O O O O I I O C O O C

1 Analytical Results . . . .
2 A Monte Carlo Comparison of the Tests .
3 Summary . . . . . . . . . . .

5.
5.
5.

iii

Page

ii

11

ll
l3
l6
19
22
28

3O

3O
36

4O

44
51

55

55
57
61
64

70
7O

74
87

Chapter

VI.

CONCLUDING REMARKS

REFERENCES . . . . .

iv

Page
90

92

LIST OF TABLES

Variance of 8 . . . . . . . . . .
A 2
Mean of o . . . . . . . . . . .
u
N = 20,’f= 0.25 . . . . . . . . . .
N = 20,‘r= 0.75 . . . . . . . . . .
N = 50, r: 0.75 . . . . . . . . . .
N = 100, T = 0.75 . . . . . . . . .
0.01 level critical values of (d2)L and (d2)U
0.05 level critical values of (d2)L and (d2)U
0.10 level critical values of (d2)L and (d2)U
Number of rejections per 1,000 trials under

the null hypothesis of no autocorrelation

Means and standard deviations of dl and d2
under the null hypothesis . . . . .
Number of rejections under first-order auto-
correlation . . . . . . . . . .

Means and standard deviations of d1 and d2
under first-order autocorrelation . . .

Number of rejections under second-order
autocorrelation . . . . . . . . .

Means and standard deviations of d1 and d2
under second-order autocorrelation . . .

Number of rejections under second-order auto-
correlation (p1 = 0) . . . . . . .

Means and standard deviations of d1 and d2
under second-order autocorrelation (p1 = 0)

Page
23
24
45
46
50
52
67
68

69

77

78

80

81

83

84

85

86

CHAPTER I

INTRODUCTION
1.1 Statement of the Problem1

Consider the linear regression model

yi = lei1 + Bzxiz + ... + BKXiK + ui' l = 1,2,...IN;

(1.1)

where each Bj is a parameter to be estimated; Xi' is the

ith observation on the jth independent variable (regressor);

yi is the ith observation on the dependent variable in the
regression; and ui is a random disturbance. This model

can be rewritten in matrix form:
y = XB + u; (1.2)

where y, B and u are vectors and X is a matrix, defined by

 

 

 

Y1 X11 X12 '°° XlK U1 81
y x x ... x u 8

y = 2 x = 21 22 2K u = 2 B = 2 (1.3)
yN le xN2 "' XNK uN BK

 

 

 

 

 

 

This section closely follows [17], Section 5.4.

This model is said to satisfy the full ideal conditions

 

(FIC)2 if u is stochastically independent of X, if
E(u) = 0,3 if X has rank K i N, and if E(uu') = OZI
(where I is the N-dimensional identity matrix and 02 is a

parameter to be estimated).

On the other hand, autocorrelation is said to be

 

present when Cov(uiuj) # O for some i # j. That is, the
disturbances are said to be autocorrelated if E(uu') = 020,
where 0 is a non-diagonal positive semi-definite matrix.
Autocorrelation is typically considered in the context of
time-series analysis; it is then the case in which the
disturbances are correlated over time. This study will be
concerned with cases in which the non—diagonality of 0 is
the only violation of the PIC; that is, E(u) = 0 and u

is distributed independently of X, but 0 has non-zero

terms off the diagonal.

The ordinary least squares (OLS) estimator of B

 

is defined by

l

E: (x'xf X'y (1.4)

. . 2 .
and an assoc1ated estimator of o is

l .—
52 = X—¥%, where M = I-X(X'X) l

N x'. (1.5)

 

2This terminology is due to [6].

3The symbol 0 will be used to denote an appro-
priately dimensioned matrix or vector of zeroes.

~

The covariance matrix of 8 under the PIC is equal to

l 2

02(X'X)- , and can be estimated by replacing 02 by 5 .
Now, it is well known that, under the FIC, 8 is best
linear unbiased, consistent, and also asymptotically
efficient if the disturbances are Normally distributed;
52 is unbiased, consistent, and also asymptotically
efficient if the disturbances are Normal. In fact, these
desirable properties of the OLS estimators under the PIC
constitute the chief rationale for the use of the OLS
estimation procedure. Unfortunately, however, these
properties do not hold if the disturbances are autocor-
related. In this case the OLS estimator 8 is still
unbiased and consistent, but it is in general no longer
best linear unbiased or asymptotically efficient. The
estimator 52 is in general biased and inconsistent.
Furthermore, the covariance matrix of 8 is no longer
equal to 02()UX)-l.
It is equally well known that these difficulties

could be avoided through the application of generalized

 

least squares (GLS) if the disturbance term covariance

 

matrix 0 were known. With 0 known, the GLS estimator of
B is

l (1.6)

lX'Q- y.

R = (x'n'lx)'

while 02 is estimated by

' 'k _ _
62 = XN¥KXV wherebd* = I—X(X'Q 1X)

1 -l

X'Q (1.7)

The covariance matrix of S is 02(X'0-1X)-1. It is also

sometimes useful to note that since 0 is by assumption
positive semi-definite, 0-1 is also positive semi-definite,
so that there must exist a (not necessarily unique) non-

singular matrix V such that

Q = V'V (1.8)

It should then be clear that if u has covariance matrix
020, Vu will have covariance matrix GZI. Hence if the

regression equation is rewritten
Vy = VXB + u (1.9)

the disturbances now satisfy the FIC, so that the OLS
regression of Vy on VX is appropriate. Indeed, the OLS
regression of Vy on VX is algebraically identical to the
GLS procedure defined above.

The point of using GLS is of course that the GLS
estimators (with 0 known) have the same optimal properties
as do the OLS estimators under the PIC. That is, 8 is
best linear unbiased, consistent, and asymptotically
efficient; 32 is unbiased, consistent, and asymptotically
efficient. It should therefore be clear that autocorrela-

tion is really a problem only in that Q is generally not

known. With 0 unknown, the above GLS procedure cannot be
applied, at least not directly, and the question of what
to do when 0 is unknown (but suspected to be non-diagonal)
is not a trivial one. It is this question to which the

rest of this study will be addressed.

1.2 Types of Autocorrelation

 

When 0 is unknown, an obvious procedure is to find
a consistent estimate of it, and then to use GLS with the
estimate 0 replacing the (unknown) true covariance matrix
Q. The statistical justification for this procedure is
the well-known fact that if GLS is applied with any

consistent estimator 0 used in place of 0, the resulting

 

estimators 8 and 62 will be consistent and asymptotically
efficient.4

Unfortunately, however, it is in general not
possible to estimate 0 consistently. After all, 0 has
%N(N-l) distinct elements, and one can hardly hOpe to
estimate them all from a sample of N observations. It
should therefore be clear that one can hope to proceed
only by putting fairly severe restrictions on 0.
Essentially, what must be done is to make 0 depend on
some fixed number of parameters that does not depend on

the sample size. These parameters can then (hOpefully)

 

4This theorem was proved for a special case in
[52]. However, it holds whenever X is fully independent
of u. For a full discussion see [36].

be consistently estimated, 0 can be constructed, and

GLS can be applied.

In particular, the procedure which has usually

been used is to assume that the disturbances follow a

first-order Markov process:

11- = u. + o i = —m 0.0 N.
1 p1 i-l ei' ’ ’ ’

where —l < pl < l, E(ei) = 0, E(Eiui-s)

_ . . 2 :
E(eiej) — 0 for i # j, and o - Var(ei)

(1.10)

0 for s > 0,

VarIuj)(1 - 012)

E ou2(1 - p12) for all i and j. (This is often referred
to as a first—order autocorrelation scheme.) Then it is
readily verified that Cov(uiuj) = on2 plll_jl. Hence 0

is of the form:

 

N-1
1 01 pl
0 = p l . . pN—Z l
1 1 2
. . l"pi

 

 

Direct multiplication will verify that 0-1

the tridiagonal matrix

(1.11)

is given by

l -01 O 000 O 0
2

"pl 1+0]. "’01 on. O 0
0 -pl l+p12 ... o 0

-1 . . . . .

n = . . . . (1.12)
o o o .. l+p 2 —p
1 1

O 0 0 .. -pl 1

 

 

As noted in the last section, 9—1 can be decomposed as
-l

 

 

0 = V'V. V is in this case given by:5

a 0 0 .. 0 0

-pl 1 0 ... 0 0

21/2
V = O -p. 1 ... O O , where a = (l-pl ) .

0 0 0 ... -pl 1 (1.13)
One last fact to note is that the determinant of 0-1 is
equal to a2 = 1-012.

It is clear that in this case 0 depends on only
one parameter, pl. Given a consistent estimator 51, 0
can be formed and GLS can be applied. If the disturbances
are in fact generated by a first—order Markov process, 0

will be a consistent estimator of Q, and the resulting

GLS estimators will thus be asymptotically efficient.

 

5The first explicit statement of 0_1 and its
decomposition is given in [40].

In general, however, there is no reason to suppose
that a first—order scheme is appropriate. First-order
autocorrelation is generally only an approximation to the
(unknown) type of autocorrelation in the sample, and
it is frequently assumed because it is a particularly
simple way to make estimation possible. Iowever, while
it is true that the form of 0 must be restricted to make
it estimable, it seems rather drastic to make 0 depend
on only one parameter. After all, a more general type
of autocorrelation scheme might provide a reasonable
approximation to more different types of autocorrelation,
and this would seem desirable when 0 is not known a
priori to be of any particular form.

This study will be particularly concerned with
the obvious generalization of the usual first-order pro-
cedures to the case of a second—order autocorrelation

scheme:

ui = plui—l + p2ui_2 + Ei’ (1.14)
where loll + I02] < l and the same assumptions about 8i

are made as in the first-order case, except that here

Var(ui) is equal to

2 2
l - pl - p2 - 201 pz/(l - 92)

 

02(1 - 02)
== 1
l - - 2 - 2 _ 2 + 3 (1.3.5)
D2 D1 D291 02 p2
Then the following facts are readily verified:

E(uu )=020/(l-p) (116)
i i-l u 1 2 ’
E(uu )=02[o +02/(l-o)] (117)
i i-2 u 2 1 2 '

E(uiul s) = plE(ui-1ui-s) + sz‘ui-zui-s)'
s > 2. (1.18)

0 can thus be written out, though this becomes rather
tedious since the terms become horrendous as one moves
away from the diagonal. The useful expression in any

case in 0’1, which is given by

1 -01 -02 0 0 0 0
-o 1+0 2 o (o -1) -o 0 o o
l l l 2 2 "‘
-p o (p -1) 1+0 2+9 2 p (o -1) o o o
2 1 2 1 2 1 2
.-1 = _ _ 2 2
ﬂ 0 92 01(02 1) l+pl +02 ... 0 0 0 (1.19)
O O 0 0 1+p 2+9 2 p (p -l) -o
"' l 2 l 2 2
2
0 O 0 O ... pl(pz-l) 1+0l -01

 

 

10

Note that this is a band diagonal matrix, with five non-

zero bands. One can again decompose 0-1 as V'V, where in

this case V is given by

 

 

a O O 0 . . 0 O 0
b c 0 0 ... 0 0 0
-9 -0 1 o o o 0
V: 2 1 (1.20)
0 -p2 -pl 1 .. O 0 0
o o o 0 ... -p2 -pl 1
and where
a=[l-*2-02(l+p)/(l-p)]l/2
V2 1 2 2
1
_ _ _ a
b — pl [(1 + 02)/(l 02)] (1.21)
»
c= (l - 022)2

It is also clear
a2C2 = 1 _ pl2

Finally,

considering this

that the determinant of Q-l is equal to

2 2 2 2 4
202 - 201 02 — pl 02 + 02 0
it should be repeated that the reason for

second-order scheme is not that there is

necessarily any a priori reason for believing it to be

common in actual data. The point is simply that the

assumption of first-order autocorrelation may be unduly

restrictive.

CHAPTER II

ESTIMATION IN A LINEAR MODEL

2.1 Introduction

 

Consider the linear model defined by (1.1). Then
it is clear, under the assumption of first-order auto-
correlation, that GLS can be applied to get asymptotically
efficient estimates if one can obtain a consistent
estimator 61 with which to construct 0. Several methods
of getting a consistent estimator 51 have been proposed
in the literature. One method, suggested by Durbin,l

is to use for 31 the OLS estimator of the coefficient of

y_l in the transformed equation

+8 +...+BX

Y1 = p1Yi—1 1x1,1 ' Blplxi-1,1 K i,K

BKplxi—1,K + (ui - plui-l)’ i = 2,3,...,N
(2.1)
This estimator is consistent. Hildreth and Lu2 have

suggested a modification of this procedure in which (2.1)

is estimated subject to the constraints that

 

1[10] and [11].

2[241.

ll

12

/\ A
-ijl = -8 pl, j = 1,...,K. (2.2)
Clearly this is a non-linear procedure. The estimators
it yields are in fact the maximum likelihood estimators,
conditional on yl. This procedure gives a consistent

and asymptotically efficient estimate of pl. Finally,

an estimator which will be referred to as the C-0

estimator (after Cochrane and Orcutt3) is the following:

N~~
A g ui i-l
pl = 'EriT:;-— (2.3)

the ﬁi being the OLS residuals from the regression of y
on X. This estimator is consistent.

As noted, if the disturbances do in fact follow
a first-order scheme, each of these estimators is con-
sistent. Hence in this case GLS based on any of the 61
above will yield asyptotically efficient results.

It should also be noted that in actual econometric
practice the usual procedure in applying GLS is not to
actually form 0 or 0-1, but rather to use the equivalent

procedure of forming V and applying OLS to the transformed

3Actually, the estimator defined in their article
differs from the one defined above by a factor of N/(N-l).

l3

equation Vy = VXB + Vu. Notice that disregarding the

first row of V, this amounts to the following regression:
(Y1 ' piYi-i’ = B1(Xi,1 ' plXi-l,l) T °"

+ BK(Xi,K ‘ plxi-1,K) + (“i ’ plui-l)’

i = 2,3,...,N. (2.4)

Disregarding the first row of V is thus equivalent to
discarding the first observation, and this has commonly
been done. This common "approximation" to the actual GLS
procedure (which would include the first observation with
a "weight" of (l - 512)%)jrsclearly asymptotically
equivalent to GLS.

2.2 Generalization to the
Second—Order Case

 

 

In the case of second-order autocorrelation it is
of course necessary to estimate both p1 and oz in order to
form 0. Fortunately, consistent estimators pl and oz can
be obtained by straightforward generalizations of the
procedures of the last section. Durbin's method can be
generalized to this case4 by applying OLS to the trans-

formed equation:

 

4In fact, it was presented in general qth order

form (q any integer) in [11].

14

 

Y1 7 p1Yi-i + p2Yi-2 + B1xi,i 7 BlplXi-l,l 7 B102X1-2,1 +°°'
+ BKxi,K 7 BKplXi-1,K 7 BK°2X1-2,K
+ (ui - plui-l - QZui—Z)’ i = 3,4,...,N. (2.5)
The maximum likelihood procedure (conditional on yl and
y2) is to estimate the above equation subject to the
constraints
B/\ ’8“ 1 l 2 (2 6)
- 0 = '- o . = .0. K; = o o -
Jpq qu' J , r q r
The C-O method can be generalized in at least two ways.
The first would be to estimate pl and oz by the OLS
regression of E. on u. and ﬁ. . A somewhat more
1 1-1 i-2
informative way is to note that if one defines
N
* g uiui—l
pl = _Er______ (2.7)
2 {1.2
1 i
N
* gul i‘2
pl = —N—_- , (208)
z {1.2
1 i
* * COV(uiui-l)
then p1 and p2 are consistent estimators of 2
0u

Cov(uiui_2)
and 2 respectively; from (1.16) and (1.17) it
Cu

 

15

is clear that these expressions are not in general equal
to p1 and oz. However, consistent estimators can be
derived by setting p: and p; equal to their probability
limits given by (1.16) and (1.17) and then solving for

pl and p2. That is, solve the following equations for

p1 and p2:
* A A
01 — pl/(l - oz) (2.9)
*—" “2 1 A 210

The solution is

A * *2 *2
A * A

Once again the estimators 61 and 62 are con-
sistent. Hence if the disturbances do follow a second-
order autocorrelation scheme, GLS (using 0 constructed
from 61 and 52) will yield asymptotically efficient
estimates of B and 02.

Finally, it is once again apt to be computationally
simpler to form V (rather than 0 or 0_l) and to apply OLS

to the transformed equation Vy = VXB+-Vu. Disregarding

the first two rows of V, this amounts to the following:

l6

(Y1 7 p1Yi-1 7 pzyi-z) = BlIXi,1 ' 01Xi_1,l - 92X1-2,1) +...

+ BK(Xi,K 7 p1xi—1,K 7 OZXi-2,K)
7'" (ui — 01111-1 - 0211i-2), l = 3,4,...,N.

(2.13)

Clearly this is asymptotically equivalent to the actual
GLS procedure, which would include the first and second
observations with appropriate weights given by the

elements in the first two rows of V.

2.3 Pronerties of the GLS Estimators

 

Let us denote by GLSl the GLS procedure which
assumes first-order autocorrelation. That is, GLSl means
GLS with 0 formed from 61, with 61 calculated by any of
the methods of section 2.1. Similarly, let GLSZ be the
GLS procedure using 0 formed from 61 and 62, with 61 and
62 calculated by any of the methods of section 2.2. In
this section we will compare the efficiency of OLS, GLSl,
and GLSZ under various specifications of the form of 0.

Consider first the asymptotic properties of the
various estimation procedures. These asymptotic compari-
sons can be easily made since, as noted in section 1.2,
estimation based on a consistent estimator of 0 will yield

asymptotically efficient results, while estimation based

17

on an inconsistent estimator of 0 will in general give
asymptotically inefficient estimates of B and inconsistent
estimates of oz.

Suppose first that the disturbances satisfy the
full ideal conditions (FIC). Clearly OLS will be
asymptotically efficient. But it is also clear that the
PIC are just the special case of a first-order auto-
correlation scheme corresponding to p1 = 0, so that 61
estimated by any method of section 2.1 will be a consistent
estimator of pl = 0. Hence 0 formed from 61 will be a
consistent estimator of 0, and GLSl will also be
asymptotically efficient. Similarly, 61 and 62 estimated
by any of the methods of section 2.2 will both be con-
sistent estimators of zero, so that GLSZ will also give
asymptotically efficient results. In other words, under
the PIC, OLS, GLSl and GLSZ all give equally asymptotically
efficient results.

Suppose next that the disturbances follow a first-
order scheme with 91 # 0. Then by the same type of
reasoning, it is clear that OLS yields asymptotically
inefficient results, while GLSl and GLSZ are both
asymptotically efficient. Finally, if the disturbances
follow a second-order scheme with p2 # 0, only GLSZ will
be asymptotically efficient; OLS and GLSl will both yield

asymptotically inefficient results.

18

This can be summarized by the general statement
that estimation based on the assumption of the "correct"
(true) order or on the assumption of "too high" an order
of autocorrelation (i.e., an order of autocorrelation
greater than the true order) will lead to asymptotically
efficient estimators 8 and 82. On the other hand,
estimation based on the assumption of "too low" an order
of autocorrelation will in general yield asymptotically
inefficient estimates of B and inconsistent estimates of
62. In other words, in infinite samples one should
always prefer a higher order of autocorrelation, since
this will minimize the chance of getting inefficient
estimates.5

Of course, in small samples none of this need be
true. In fact, there are almost no analytic results on
the small sample properties of GLS estimators with an
estimated covariance matrix; questions of this sort are
of necessity usually investigated by Monte Carlo methods.6
The next section will describe a Monte Carlo experiment
which attempts to examine the small sample prOperties of
OLS, GLSl and GLSZ under various specifications of 0. In

particular, the question which we attempt to answer

 

5This discussion has ignored differences in com-
putational costs.

6For an example of an experiment comparing the
small-sample properties of OLS and GLSl, see [20].

l9

concerns the size of the loss that results in small
samples (if in fact any loss does result) from assuming
too low or too high an order of autocorrelation. The
criteria for choosing the superior estimation procedure
will be the variance of the estimator of B and the bias

of the estimator of Ou2'7

2.4 The Experiment

 

The eXperiment was conducted in the context of the

simple regression model:

yi = a + BXi + ui, (2.14)

with a, B and on2 all being taken equal to one. The
generation of the values of X and u is described below;
given X and u, observations on y were created and (2.14)
was estimated in each of three ways:

A. OLS

B. GLSl, 61 estimated by the C—0 method given

by (2.3) of section 2.1.
C. GLSZ, 61 and 62 estimated by the C-O method

given by (2.7) - (2.12) of section 2.2.8

 

7The bias of 8 2 is considered rather than its
variance or mean squarg error because one is typically not
interested in 6u2 per se, but only to make confidence
statements, tests, etcT— It is therefore most important
that 6 2 not be strongly biased in one direction or the
other.

8This perhaps deserves a comment. Asymptotically
it makes no difference how the p's are estimated, as long
as the estimators used are consistent. However, in small

20

In all cases the GLS procedures used were not actually the
true GLS procedures, but rather the common approximation
to GLS of applying OLS to the transformed equations (2.4)
and (2.13), as described above.9 This procedure was
repeated for 100 independent trials under each of a
variety of specifications, and the results of the 100
trials were used to calculate the variance of 8 and the
mean of ouz.

To create observations on X, X1 was taken to be

a N(0,l) deviate from a listing of random deviates pre-

pared by the Rand Corporation.10 The remaining Xi were
generated as follows:
2 h .
Xi = TXi—l + (1.1: ) 5i, l = 2,3,...,N; (2.15)

where 5i is a N(0,1)-deviate independent of X1 and of

previous 6's. T is thus the correlation between successive

 

samples different ways of estimating the 0's may lead

to considerably different results. This is not investi-
gated here because the purpose of this experiment is to
compare OLS and GLS under various types of autocorrela-
tion, and to introduce different versions of GLS based
on different ways of computing the p's would tend to
confuse the issue.

9As just noted, this makes no difference
asymptotically. The approximate procedure is used here
because it is typically used in econometric practice.

1"[42].

21

Xi' Two values of I were used, 0.2 and 0.8, since it is
well known that the properties of the estimators may
depend on the correlation between the Xi. Hence this
correlation was held constant at each of the two levels.
Two values of N (sample size) were considered, 20 and 100.
The results for sample size 20 are designed to show small
sample properties; sample size 100 was included so as to
get an idea of the results with a somewhat larger sample,
and to see if the known asymptotic properties of the
estimators begin to emerge.

Observations on u were obtained by reading the
N(0,l)-deviates £1, €2""'§N' independent of each other
and of the X's, from the Rand listing, and applying
the suitable transformation (to be described) for each
specification. Given u, y could be constructed, and the
various estimation procedures could be applied. First-
and second-order autocorrelation schemes were considered,
with pl and p2 taking on all possible values among 0.0,
0.2, 0.4, 0.6 and 0.8, subject to the restriction that
pl + 02 < 1.

To simulate the null hypothesis of no autocor-
relation, the independent N(0,l)-deviates 5i were simply
left untransformed; that is, 111 = ii for all i. To
simulate first order autocorrelation with parameter pl,

the deviates 5i were transformed as follows:

22

1/
2)2 8-, i=2,3,...,N, (2.16)

u' = p1111-1 + (1791 1

where the factor (1-plz)l/2 is included to ensure that the
ui will have a variance of l for all i.

Finally, second order autocorrelation was
simulated by applying the following transformation to the

so:
1

u1 = 6l
u = p u + (l-p 2)2 e
2 1 l l 2
;,
uJ pluj-l + p2uj_2 + aj2 ej, j = 3,4,...,N (2.17)
where
-3
2 2 2 3 r

aj = l - pl - p2 - 2pl pz 2 0 p2. (2.18)

r:

Again the aj are taken so as to ensure that the ui will

. 11
have constant variance.

2.5 Results

 

Table 2.1 gives the variance of the estimates of 8
under the various specifications of the model, and Table 2.2

llNote two things. First, it is ouz which is being
held constant rather than 02. Second, this scheme is not
precisely the same as that defined in section 1.2, as it
does not start at -w. However, all the covariances con-
verge to those of section 1.2 as i increases, and even with
N=20 the difference should be negligible.

 

23

TABLE 2.l.--Variance of 8.

 

 

 

 

N = 20 N = 100
01 D 2
OLS GLSl GLSZ OLS GLSl GLSZ
0.0 0.0 0.0636* 0.0744 0.0867 0.0105* 0.0113 0.0114
0.2 0 0 0.0711* 0.0787 0.0916 0.0108 0.0108 0.0107*
0.4 0.0 0.0777 0.0720* 0.0820 0.0112 0.0084 0.0083*
0.6 0.0 0.0815 0.0545* 0.0597 0.0117 0.0053 0.0052*
0.8 0.0 0.0712 0.0259* 0.0364 0.0118 0.0023* 0.0023
0.0 0.2 0.0626* 0.0728 0.0817 0.0105 0.0115 0.0102*
T = 0 2 0.0 0.4 0.0610* 0.0710 0.0660 0.0101 0.0111 0.0078*
' 0.0 0.6 0.0580 0.0644 0.0432* 0.0095 0.0103 0.0050*
0.0 0.8 0.0490 0.0460 0.0210* 0.0090 0.0088 0.0023*
0.2 0.2 0.0708* 0.0756 0.0843 0.0107 0.0105 0.0093*
0.4 0.2 0.0771 0.0644* 0.0692 0.0114 0.0074 0.0067*
0.6 0.2 0.0729 0.0373* 0.0381 0.0125 0.0039 0.0036*
0.2 0.4 0.0692 0.0706 0.0658* 0.0108 0.0100 0.0069*
0.4 0.4 0.0722 0.0511 0.0462* 0.0128 0.0065 0.0046*
0.2 0.6 0.0616 0.0580 0.0393* 0.0112 0.0092 0.0042*
0.0 0.0909* 0.1034 0.1179 0.0113* 0.0120 0.0125
0.2 0.1141* 0.1278 0.1450 0.0146* 0.0160 0.0166
0.4 0.1401* 0.1430 0.1614 0.0191 0.0190* 0.0196
0.6 0.1667 0.1359* 0.1553 0.0257 0.0179* 0.0182
0.8 0.1806 0.1011* 0.1187 0.0355 0.0103* 0.0105
0.0 0.2 0.0954* 0.1064 0.1237 0.0136* 0.0143 0.0146
0.0 0.4 0.0983* 0.1075 0.1150 0.0159 0.0166 0.0147*
T = 0.8 0.0 0.6 0.0993 0.1071 0.0929* 0.0181 0.0187 0.0119*
0.0 0.8 0.0891 0.0951 0.0612* 0.0196 0.0201 0.0066*
0.2 0.2 0.1237* 0.1306 0.1482 0.0185* 0.0191 0.0188
0.4 0.2 0.1556 0.1398* 0.1566 0.0261 0.0207 0.0198*
0.6 0.2 0.1818 0.1230* 0.1402 0.0393 0.0146 0.0141*
0.2 0.4 0.1345 0.1329* 0.1372 0.0243 0.0223 0.0176*
0.4 0.4 0.1756 0.1401* 0.1471 0.0414 0.0210 0.0158*
0.2 0.6 0.1412 0.1315 0.1166* 0.0347 0.0248 0.0128*

 

TABLE 2.2.--Mean of an .

2

24

 

 

 

 

N = 20 N = 100

pl p2 OLS GLSl GLSZ OLS GLSl GLSZ
0.0 0.0 1.0086 1.0015* 0.9578 0.9931* 0.9919 0.9891

0.2 0.0 0.9760* 0.9648 0.9176 0.9886* 0.9860 0.9819

0.4 0.0 0.9236* 0.8969 0.8311 0.9802* 0.9726 0.9631

0.6 0.0 0.8354* 0.7744 0.6835 0.9607* 0.9403 0.9186

0.8 0.0 0.6515* 0.5309 0.4456 0.8982* 0.8436 0.7916

0.0 0.2 0.9759* 0.9629 0.9284 0.9898* 0.9882 0.9826
T _ 0.0 0.4 0.9303* 0.9075 0.8577 0.9814* 0.9792 0.9615
- 0.0 0.6 0.8566* 0.8174 0.7320 0.9608* 0.9576 0.9123
0.0 0.8 0.7181* 0.6493 0.4838 0.9046* 0.8990 0.7828
0.2 0.2 0.9268* 0.9090 0.8748 0.9856* 0.9819 0.9703
0.4 0.2 0.8534* 0.8118 0.7641 0.9906 0.9774 0.9453
0.6 0.2 0.7075* 0.6142 0.5538 0.0101* 0.9666 0.8810

0.2 0.4 0.8616* 0.8333 0.7859 0.9942* 0.9879 0.9428
0.4 0.4 0.7654* 0.6998 0.6419 1.1094 1.0798* 0.9091
0.2 0.6 0.7457* 0.7035 0.6244 1.0398 1.0264* 0.8808
0.0 0.0 1.0220 1.0146* 0.9688 0.9940* 0.9927 0.9897
0.2 0.0 0.9820* 0.9693 0.9218 0.9869* 0.9845 0.9802
0.4 0.0 0.9181* 0.8899 0.8265 0.9750* 0.9685 0.9587
0.6 0.0 0.8115* 0.7545 0.6726 0.9502* 0.9333 0.9109
0.8 0.0 0.5999* 0.5022 0.4230 0.8790* 0.8313 0.7783
0.0 0.2 0.9836* 0.9706 0.9317 0.9886* 0.9871 0.9813
T _ 0.0 0.4 0.9315* 0.9101 0.8532 0.9777* 0.9759 0.9583
_ 0.0 0.6 0.8508* 0.8142 0.7142 0.9543* 0.9518 0.9070
0.0 0.8 0.7080* 0.6394 0.4777 0.8961* 0.8919 0.7771
0.2 0.2 0.9231* 0.9044 0.8671 0.9804* 0.9772 0.9655
0.4 0.2 0.8311* 0.7918 0.7478 0.9794* 0.9687 0.9370
0.6 0.2 0.6564* 0.5813 0.5291 0.9883* 0.9506 0.8703
0.2 0.4 0.8444* 0.8180 0.7661 0.9839* 0.9790 0.9347
0.4 0.4 0.7181* 0.6662 0.6147 1.0862 1.0614* 0.8967
0.2 0.6 0.7117* 0.6777 0.6023 1.0207 1.0104* 0.8688

 

25

gives the mean of the estimates of ouz. For each speci-
fication of the model an asterisk (*) marks the estimated
minimum variance estimator of B and the estimated least
biased estimator of ouz.

Consider first the null hypothesis of no auto-
correlation; that is, the case pl = p2 = O. In terms of
the estimates of B, OLS is clearly best, and GLSl is
better than GLSZ. The differences are considerably larger
at sample size 20 than at sample size 100, as the
asymptotic equivalence of these estimators under the PIC
begins to show at the larger sample size. In terms of
the estimates of on2 there is little difference between
the various procedures, though GLSZ seems to give somewhat
inferior estimates when N = 20. Finally, the value of I
does not seem to make much difference in this case.

The next Specification considered is first-order
autocorrelation. Consider first the efficiency of
estimation of B. GLSl clearly dominates GLSZ at sample
size 20, though the difference is not terribly great; at
sample size 100 they appear to be roughly equivalent,
clearly reflecting their asymptotic equivalence in this
case. Both GLSl and GLSZ gave noticeable gains in
efficiency over OLS, except for "small" values of 01. The
minimum value of pl necessary to result in a gain in
efficiency over OLS was smaller for GLSl than for GLSZ,

and for either GLSl or GLSZ it was smaller with N = 100

26

than with N = 20. Also, the efficiency of either GLSl or
GLSZ compared to OLS was greater when r = 0.2 than when

T = 0.8.12 The results were fairly favorable to the use

of GLS in small samples in that QLof roughly 0.4 sufficed
to make GLSl more efficient than OLS, even with a sample

size of only 20, while the "break-even point" for GLSZ

was roughly 0.6.13

In terms of the bias of the estimates of ouz, it
is clear from Table 2 that OLS was markedly superior to
either GLS procedure. It is somewhat troubling that
this was true even with N = 100. It was true that the
OLS estimator of ouz had the expected downward bias, but
it turned out to be actually less biased than the GLS
estimators. In fact, a glance at the rest of Table 2 will
quickly reveal that this was also true for almost all the
other specifications considered.

With respect to these last results, three points
should be made. The first is that they could not hold
asymptotically; apparently even sample size 100 is not
large enough to reveal the asymptotic result in this case.

The second point is that different results might have been

obtained if 62 rather than 6u2 had been considered. The

 

12This should be expected, since it is well known
that the C-0 estimator 61 is more severely biased the
larger the value of I. See, for example, the results in
[20].

l3Asymptotically, of course, either GLSl or GLSZ
would be more efficient than OLS for any 01 # 0, no
matter how small.

27

third is that only the bias has been considered here; it
is quite conceivable that the variance or even the mean
square error of the GLS estimators might be smaller than
that of the OLS estimator. We will return to these
last two points in section 5 of the next chapter; for
now the above results will simply be taken as they are.
The third specification considered was second-
order autocorrelation with pl = 0, a special case of the
general second-order scheme. Considering the variance
of the estimators of B, GLSZ is more efficient than OLS or
GLSl except for "small" values of 02, a small value of 02
being 0.4 or less at sample size 20 and 0.2 or less at
sample size 100. As before, the relative efficiency of
the most efficient estimator is greater with the smaller
value of T. Comparisons of OLS and GLSl shows OLS to be
generally superior, with very few exceptions. The dif-
ference was usually quite small, however. The superiority
of OLS over GLSl was slightly more noticeable in the
samples of size 20; this is reasonable since OLS and
GLSl are asymptotically equivalent in this case.14
The last specification considered is second-order
autocorrelation with both pl and 02 non-zero. Consider
the estimates of B. GLSZ was typically most efficient,

as would be expected, though GLSl does quite well when

 

14This is true since pl = 0. Thus plim 61 = 01/

28

T = 0.8. With N = 100 GLSZ was always more efficient

than GLSl, and GLSZ was more efficient than OLS in all
cases except one. With sample size 20 GLSZ was more
efficient than OLS in all cases except p1 = 02 = 0.2,

and it was also more efficient than GLSl except when

02 = 0.2 and in a few cases when 02 = 0.4 and T = 0.8.
Again it appears that the relative efficiency of the most
efficient estimator is somewhat less with the larger

value of T. Finally, GLSl is typically more efficient
than OLS, especially when pl is large. This is especially

noticeable at sample size 100.

2.6 Summary

 

One implication of the last section is that GLS
unfortunately does not seem to give less biased estimates
of on2 than OLS, even for fairly large sample sizes. As
noted earlier, this point will be considered again in
section 5 of the next chapter.

In terms of the variance of the estimates of 8,
however, GLS performed quite well. This was true even
for samples as small as 20. In particular, the loss of
efficiency in assuming too high an order of autocorrela-
tion was fairly small, while the penalty for assuming
too low an order was in many cases quite large. These

are of course essentially the asymptotic results, and they

showed through quite well in small samples.

29

One implication of these results is that GLSZ might
seem to be a useful procedure, at least if one is primarily
interested in efficient estimation of B. Asymptotically,
there is no loss in using it unnecessarily, and one will
gain by using it if autocorrelation is of second-order
form. Even in small samples the gains from its use may be
substantial, and the loss in using it unnecessarily (for
example, if autocorrelation were of first-order form) is
typically small. This makes the currently almost universal
use of GLSl in cases of suspected autocorrelation seem
perhaps unjustified. After all, there is frequently no
particular reason to suppose that first-order autocorrela-
tion is typically present in real data. The assumption of
first-order autocorrelation is generally just a simplifying
assumption made in order to make estimation possible.
Second-order autocorrelation is a less restrictive
assumption, and a second-order scheme ought to provide a
reasonable approximation to more different types of auto-
correlation than will a first-order scheme. Hence when
autocorrelation is not known a priori to be of first-
order form, GLSZ might be useful.

Finally, it cannot be overemphasized that the
small-sample results obtained here are specific to the
particular model used. Limited evidence is better than
none, however, and these results may at least be useful
in pointing the way for further analytical work in this

area 0

CHAPTER III

ESTIMATION IN A DISTRIBUTED LAG MODEL

3.1 Introduction

 

In the last chapter we have introduced methods of
estimation in a linear model in the context of second-
order autocorrelation of the disturbances, and we
analyzed the properties of the resulting estimators. In
this chapter we will extend these results to the case of

a common type of non-linear model, the distributed lag

model.
The simplest distributed lag model is a model of
the form
yi = 82 xi_j )3 + ui, i = l,2,...,N (3.1)
j—O
where ui, i = 1,2,...,N, is an unobserved random dis—

turbance; Xi is either a fixed (non-stochastic) number
or a random variable independent of the disturbances, with

observed values Xi,...,X B is a parameter to be esti-

N;
mated; A is a parameter to estimated, 0 i A i l; and yi
is the observed dependent variable in the model. Clearly

the model in this form is not amenable to estimation; it

30

31

is usually rewritten in one of two ways. Lagging (3.1)
by one observation, multiplying by l and subtracting

yields

yi = BXi + Ayi-l + (ui - Aui-l)’ i = 2,3,...,N (3.2)

This is the so~called "Koyck transformation."l Alter~

nately, defining

66 , i i-j.
”0 = 82 x_. 13; wi(1) = 2 x. 1 ' (3'3)
j=o 3 i=1 3
(3.1) can be rewritten as
_ i
yi — BWi(A) + n01 + ui. (3.4)

This transformation was suggested by Klein.2

The model as written in (3.2) has a certain amount
of attractiveness since it can apparently be estimated
directly. However, it has long been realized that
ordinary least squares applied to (3.2) will in general
yield inconsistent results, as the disturbance (ui - Aui-l)

is correlated with the regressor yi-l' Koyck3 suggested

a method for obtaining consistent estimates of B and l;

 

1[32].
2Appendix to [28].
3I321.

this method was reinterpreted by Klein4 in an errors in
the variables framework. Liviatan5 has also suggested a
method for obtaining consistent estimates. His procedure
is essentially an instrumental variable one, with xi—l
serving as the instrument for Yi-l'

Both the Koyck—Klein procedure and the Liviatan
procedure are fairly straightforward; their main dis-
advantage is that the resulting estimates are asymptotic"
ally inefficient. Assuming that the disturbances in (3.1)
are normally distributed and meet the classical condi—
tions (the FIC), asymptotically efficient estimates of
B and 1 can be obtained by maximum likelihood estimation.
Following Klein,6 (3.1) is rewritten as (3.4). Then
the log likelihood function is
L = lg log 20 — g log 02 - —£§

20

, i 2

HMZ

Now since the estimator of 02 turns out to be independert
of the estimators of 8 and A (as will be shown in the nex’

section), maximizing L is equivalent to minimizing

 

4[281.
5[341.

6Appendix to [28].

33

L* [yi

ll
HMZ

- 8wi(x) - noxilz (3.6)

with respect to B, A and n0. Clearly the resulting
normal equations will be highly non-linear. However, it
was noted by Dhrymes7 and by Zellner and Geisel8 that if
one knew A, one could form Wi(1) and Xi and calculate

the maximum likelihood estimates 8 and 00 by a simple
regression of yi on Wi(l) and Ai. When A is unknown, the
procedure is to "search" over the admissable range of A,
picking the value of A which minimizes the sum of squared
errors. The resulting values X, 8, and 30 are then the
maximum likelihood estimators; the maximum likelihood
estimator of 02 is the sum of squared errors divided by N.
It is well known that the estimators 8, I, and 32 are
consistent and asymptotically efficient; their asymptotic
covariance matrix is the inverse of the so-called
"information matrix," which will be written out in the
next section.‘ The estimator 00 is not consistent.

If the errors ui are autocorrelated, the procedure
outlined above must of course be modified somewhat. The
usual case considered in the literature is once again the
case in which the ui follow a first-order autocorrelation
scheme. This paper will treat the case of second-order
autocorrelation; the usual results for the first-order

case can be obtained by letting 02 = 0.

 

7181. 8[531.

34

For convenience, define Z1(A)' = [W1(A) ... WN(A)],
22(1)' = [A 12 )3 ... 1”], 2(1) = [zl(1) 22(1)] and
Y' = (8 no). Then the log likelihood function is
L = 2% log 20 - 810g IOZQI- —£§ [y-Z(A)Y]'0-l [y-Z(A)Y].
20
(3.7)

Now letting Z*(A) = VZ(A) and y* = Vy (V defined as in

(1.15)), and recalling that V'V = 0-1, and defining Q =
2

. -l _ 2 _ _ 2 _ 2 2 4
determinant of 0 — 1 pl 2p2 2pl p2 pl p2 + p2 ,

the log likelihood function can be rewritten as

log 20 - % log 02 +35log Q - l2[y*-Z*(MY]'

20

:E
2

 

L:
Iy*-z*(1)y1. (3.8)

Given A, 01 and 02, it is clear that 9 can be
calculated by the least squares regression of y* on Z*(A);
call this estimator $(A,pl,pz). By searching over A and
choosing the A that minimizes the sum of squared errors,
one gets the maximum likelihood estimators of Y and A,
conditional on pl and p2; denote these by §(pl’p2)
andJRol,pz). Also 82(01’02) can be calculated as the
sum of squared errors divided by N. Now substitute

IIDIIOZ) and 82(91'92) for Y and 02 in (3.8) above to get

35

 

_ _ N _ N .2
L(01.92) - 5 log 2“ 7 log 0 (01.02)
. N82(pl.02)
+ 8 log 0 - (3.9)
262(0 0 )
1' 2
which simplifies to
N N 2 —1 N
L(ol,02) = - 7 (log 26 + 1) - 7 16g [0 (pl,pz)Q / 1.

(3.10)

Finally, to calculate the maximum likelihood estimates

one then searches over 01 and 02 and selects those values

A A . . . "2 ‘l/N
01 and 92 that minimize o (pl,p2) Q .

Several comments are in order here. First, since
32(pl,p2) is itself determined by a search over A, what
is required is a three-dimensional search. That is,
given A, 01, and 02, y* and Z*(A) are formed and y* is
regressed<n1Z*(A). The maximum likelihood estimates are
obtained by choosing those values A, 31, and 82 that

/N

minimize the sum of squared errors divided by Q1 .

N

Secondly, since lim Ql/N = l, the division by Q1/ will

N+oo

not affect the results asymptotically. Hence asymptotically

the estimators obtained by merely minimizing the sum of
squared errors are equivalent to the maximum likelihood

estimators. Thirdly, it would clearly make no difference

36

asymptotically if the first two rows of the transforma-
tion matrix V were eliminated.

Finally, this entire procedure is clearly a
straightforward generalization of the usual procedure
for the case of first order autocorrelation, which may

be found in Dhrymes9 or Zellner and Geisel.10

3.2 Asymptotic Properties of
the Estimators

 

 

This section will begin by showing that maximum
likelihood estimators have desirable asymptotic properties
in the case of normally distributed, second-order auto-
regressive disturbances. Ignoring the initial condi-

11

tions, the model can be written

y1 = BWi(A) + ui; ui = plui-l + DZui-Z + ei (3.11)

Now for the moment assume that pl and 02 are known. Then
the log likelihood function (conditional on yl and y2)

can be written in terms of the 6i as follows:

 

_-N _N 2_ 1 *_ * 2
L — —§-log 20 3 log 0 2 Z[yi BWi(A)] , (3.12)
C
9181.
10[531.
11

It is well known that ignoring the initial condi-
tions makes no difference asymptotically. This should be
intuitively clear since the contribution of the initial
value no = E(yo) becomes less and less important as N
increases.

37

'k

*
where yi = y and Wi(A) = Wi(A)

i 7 p1y1-1 7 p2Y1-2
- plWi_1(A) - 02Wi_2(A). But in this case, with pl and
p2 known, the above transformation has removed the auto-
correlation from the disturbance, so that the likelihood
function is written in terms of mutually independent
variables. If B, A, and 02 are now estimated by maximum
likelihood, the consistency and asymptotic efficiency
of the resulting estimates is thus clear. It is
instructive to write out the information matrix

2

3 L .
-E[§§;§§;] (where the 01 are the parameters to be esti

mated, namely 8, A, and 02), which is as follows:

 

 

1 * 2 l * A * A 0
(B) gg-ZWi(X) gj'ZWi( )Ri( )
(A 1 z * 1 * 1 —$ * 1 2 3 13
) 37 Wi( )Ri( ) 02 XRiI ) 0 ( . I
(02) 0 0 N 4
/20
_ ._ i-2
where Ri(A) - xi-l + 2AXi_2 + ... + (i l)A X1, and

* o
Ri(A) = Ri(A) - lei(A) - szi_2(A). Notice that the
matrix is block diagonal, which shows that the estimator
of 02 is asymptotically independent of the estimators of
B and A, as previously claimed. The covariance matrix of

B and.Acan be found by inverting the upper left hand block.

38

We now proceed to the case in which pl and 02 are
not known and must also be estimated by maximum likelihood.

The information matrix is in this case the following:

 

1 * 2 t i
(m zwwi .4
32' 1 I 02 [Wi(A)Ri(A) 0 0 0

l t t l *
( A X W . A . 2
) :7 1‘IKI“ 37zm4M 0 0 0

 

 

 

 

 

'7 f—. 0 (3.14)
0 o 92
l
. 2 2
(02) o 0 Mu 02 Nou 0
02 I-oz ‘32
w% 0 0 0 o 1%

Zo

 

The matrix is again block diagonal, and the
blocks corresponding to B and A, and to oz, are exactly
as they were in the case in which 01 and 92 were known.
Hence the estimates of B, A, and 02 have the same
asymptotic variances as if pl and 02 were known, so that
the maximum likelihood estimators 8, A, and 32 are still
asymptotically efficient in the case where 01 and 02 are
unknown.

The fact that the maximum likelihood estimators
8, A, and 62 are asymptotically just as efficient as
they would be if 01 and 02 were known has important
implications for estimation. Consider first the case of
no autocorrelation of any order, and consider the follow-

ing three estimation procedures:

39

A. Estimation based on the assumption of no
autocorrelation; i.e., p1 and 02 are set

a_priori to zero.

B. Estimation based on the assumption of first
order autocorrelation; i.e., p is set a
priori to zero but pl is estimgted. '—

C. Estimation based on the assumption of second
order autocorrelation; i.e., p1 and 02 are
both estimated.

(These clearly correspond to the procedures OLS, GLSl and
GLSZ in last chapter's linear model.) Then it is clear
from the discussion above that in terms of the asymptotic
efficiency of the estimates of B, A, and 02, all three
estimation procedures are identical. If the autocorrela-
tion in the sample were of first-order form, then pro-
cedure (A) would yield asymptotically inefficient
results, while procedures (B) and (C) would be equally
asymptotically efficient. On the other hand, if the
autocorrelation in the sample were second-order, then
procedures (A) and (B) would yield inefficient results,
while (C) would be asymptotically efficient. In other
words, there is no loss in asymptotic efficiency in
assuming a higher order of autocorrelation than is in
fact present. On the other hand, assuming a lower order
of autocorrelation than is in fact present leads to
asymptotically inefficient results. (Note the close
analogy to the asymptotic results of the last chapter.)

Finally, it should be noted that the asymptotic

efficiency of maximum likelihood estimation is purchased

6 . , IMI. I1): wﬁwwlm... I qwiynﬂﬂnmg
m .. ..
~

40

at a cost-~first, because of the large computational
burden, and, second, because of the possibility of
inefficiency if too low an order of autocorrelation is
specified. The second problem can be alleviated by
choosing a very high order of autocorrelation, but this
will only make the computational burden correspondingly
larger. There is, therefore, perhaps something to be
said for an estimation procedure such as Liviatan's,12
which is not only simple computationally but which also
yields results that are at least consistent regardless

fo the covariance matrix of the disturbances. Of course,
Liviatan's estimators are not as efficient as maximum
likelihood;13 if one's criterion for choosing an estimator
is asymptotic efficiency, then maximum likelihood cannot
be beaten.

3.3 Small Sample Properties of the
Maximum Likelihood Estimators

 

 

The conclusion of the last section is that from the
standpoint of asymptotic efficiency one should always
assume a more general type of autocorrelation (e.g.,
second-order rather than first-order); there is nothing

to lose by doing 50,14 and there is possibly much to gain.

 

12[341.
13For an evaluation of the difference in efficiency,
see [2].

4Unless one is concerned with computational costs.

41

In small samples, however, this may not be true, if only
because one loses degrees of freedom by estimating

redundant p's. In other words, in small samples one must

 

presumably strike some sort of balance between the
possibility of assuming too low an order of autocorrela—
tion (and hence possibly getting badly biased and ineffi-
cient results) and the possibility of assuming too high

an order of autocorrelation (and hence possibly losing
efficiency). It would therefore be desirable to know how
large a loss there is in small sample efficiency when too
high an order of autocorrelation is specified, as well

as how bad the results are which one gets by assuming too
low an order of autocorrelation. Unfortunately, the

small sample prOperties of the estimators in a distributed
lag model have not proven amenable to analytical solution,
so that questions of this type must at present be investi-
gated by Monte Carlo methods. This section will describe
a Monte Carlo experiment which was performed in an attempt
to discover the small sample prOperties of maximum likeli-
hood estimators which assume no autocorrelation, first-
order autocorrelation, and second-order autocorrelation,
when the true type of autocorrelation in the sample is in
turn each of the three types above. Note that this is
essentially the same question that the experiment of the

last chapter attempted to answer for a linear model.

42

The model considered was of the form of equation
(3.4). The values chosen for the parameters were 8 = l,

A = 0.5 and no = 0, so that the structure of the model was

yi wi(0.5) + ui, i = 1,...,N (3.16)

Three values of N were considered--20, 50, and 100.

The sequence Xl,...,X was in each case constructed by

N
reading independent N(0,l) deviates 51,...,€N off a
"random" number generator and then transforming them as

follows:

x1 = 51

2 .
Xi = TXi-l + /l - T 5i, i = 2,3,...,N (3.17)

Each Xi thus has variance one, and the correlation
between successive X's is T. Two values of r were used,
0.25 and 0.75.

Three types of disturbances were simulated—-
classical, first-order Markov, and second-order Markov.
Classical disturbances were simulated by simply taking
independent N(0,l) deviates and leaving them untrans-
formed. Second-order distrubances were simulated by
taking independent N(0,l) deviates $1,...,€N and trans-

forming them as follows:

 

.I .I, H. (yam ....wﬁy r 9.. ...?!

.1 . .. 1

  

43

L11 7-7- O 61
u = [o /(l-o )] u + o [1-0 2/(l-o )2]15 e
2 1 2 1 u 1 2 2
ui = plui-l + p2ui_2 + 8i, 1 = 3,4,...,N (3.18)

First-order disturbances were simulated in the same way

except that in this case = 0.

02
Given the values of pl, 02, I and N, then, ui
and X1 were generated as above, Wi(A) and Ai were formed,
yi was generated, and the parameters were estimated by
maximum likelihood, assuming in turn no autocorrelation,
first-order autocorrelation, and second-order autocorrela-
tion. For each specification of the model, this pro-
cedure was repeated a number of times, and the observed
means, variances, and mean square errors of the
parameters were calculated. The number of repetitions
was 100 when N = 20 and 50 when N = 50 or N = 100.15
This study will concern itself only with the

resulting estimates of B, A, and 02.

 

15The smaller number of repetitions was used at
the larger sample sizes to economize somewhat on computer
time. A possible justification is that since the esti-
mators should be better behaved at the larger sample
sizes (at least those which are consistent), fewer
observations should be required to characterize their
distribution.

44

3.4 Results of the Experiment

 

Tables 3.1 and 3.2 contain the results with
N = 20, with T = 0.25 and 0.75 respectively. Consider
first the case of no autocorrelation; the results are
given in the upper left hand corner of Tables 3.1 and
3.2. (The figures under the headings 0, l, and 2 are
the results obtained by the estimation procedures
assuming no autocorrelation, first-order autocorrelation,
and second-order autocorrelation, respectively. That
is, they correspond to last chapter's headings OLS,
GLSl, and GLSZ.) Note that the estimates of B and A
obtained by each of the estimation procedures are
almost identical. If one loses small sample efficiency
in the estimation of B and A by assuming autocorrela-
tion when none is in fact present, the loss is apparently
very small, even for a sample as small as 20. The
estimates of 02 are all significantly biased downward,
and in the estimation of 02 one does seem to lose some—
what by assuming too high a degree of autocorrelation,
in terms of the mean square error of the estimate.
However, this is apparently due to an increase in the
bias rather than to a larger variance, and the difference,
while statistically significant, is actually numerically
quite small. Finally, the results were rather insensitive

to T, the correlation between successive Xi.

45

 

 

 

 

 

 

 

 

 

TABLE 3.1.--N = 20 T = 0.25.
pl = 0 92 = 0 p1 = .4 02 0 pl = .8 Q2 0
9 1 9 9 1 9 9 1 9
B MEAN 1.006 1.010 1.016 0.979 0.982 0.989 0.761 1.034 1.036
VAR 0.054 0.053 0.052 0.084 0.067 0.066 0.221 0.054 0.057
MSE 0.054 0.053 0.052 0.088 0.067 0.066 0.277 0.056 0.059
A MEAN 0.478 0.466 0.459 0.491 0.473 0.471 0.637 0.507 0.513
VAR 0.035 0.038 0.040 0.050 0.051 0.055 0.111 0.058 0.060
MSE 0.035 0.039 0.041 0.050 0.052 0.055 0.130 0.058 0.060
02 MEAN 0.873 0.854 0.835 0.911 0.753 0.736 1.467 0.794 0.782
VAR 0.088 0.089 0.085 0.088 0.057 0.054 0.683 0.103 0.101
MSE 0.104 0.111 0.112 0.096 0.118 0.124 0.901 0.145 0.148
pl = O 02 = .4 01 = 02 .8 pl = .4 02 = .4
2 1 3 9. 1 E 9. 1 2
B MEAN 0.996 1.016 1.019 0.804 0.813 1.028 0.785 0.982 1.027
VAR 0.067 0.060 0.045 0.179 0.173 0.040 0.230 0.083 0.056
MSE 0.067 0.061 0.045 0.218 0.208 0.041 0.276 0.083 0.057
A MEAN 0.475 0.460 0.461 0.574 0.571 0.486 0.649 0.539 0.516
VAR 0.048 0.047 0.040 0.116 0.111 0.036 0.080 0.064 0.048
MSE 0.049 0.048 0.042 0.121 0.117 0.036 0.101 0.066 0.048
02 MEAN 0.984 0.941 0.764 2.178 2.128 0.778 1.215 0.848 0.718
VAR 0.135 '0.l38 0.069 1.334 1.430 0.088 0.359 0.118 0.053
MSE 0.135 0.141 0.124 2.722 2.702 0.137 0.406 0.141 0.132

 

46

 

 

 

 

 

 

 

 

 

 

TABLE 3.2.--N = 20 I = .75.
pl 0 02 = pl = .4 02 = 0 01 = .8 p2 - 0
2 1 .2 9. 1 3. 9 1 E
B MEAN 0.994 0.994 .995 979 0.972 0.961 0.924 0.990 0.995
VAR 0.036 0.036 .037 073 0.078 0.077 0.139 0.108 0.100
MSE 0.036 0.036 .037 073 0.078 0.079 0.145 0.108 0.100
A MEAN 0.500 0.500 .502 .507 0.512 0.521 0.511 0.493 0.494
VAR 0.011 0.011 .011 .025 0.027 0.026 0.062 0.042 0.040
MSE 0.011 0.011 .011 .025 0.027 0.026 0.062 0.042 0.040
02 MEAN 0.843 0.827 .818 .945 0.830 0.821 1.224 0.684 0.656
VAR 0.092 0.092 .092 .153 0.108 0.106 0.571 0.057 0.054
MSE 0.119 0.121 .125 .156 0.137 0.138 0.619 0.156 0.172
pl 0 02 = .4 01 = 0 02 .8 01 = .4 02 = .4
.0. .1. .2. 9 1 3 9 1 3
B MEAN 0.971 0.969 .977 .948 0.959 1.010 0.745 0.902 0.983
VAR 0.052 0.052 .040 .078 0.074 0.032 0.198 0.190 0.176
MSE 0.053 0.053 .040 .081 0.076 0.032 0.264 0.200 0.177
A MEAN 0.512 0.512 .505 .508 0.498 0.475 0.643 0.525 0.514
VAR 0.017 0.017 .017 .036 0.035 0.014 0.074 0.070 0.076
MSE 0.017 0.017 .017 .036 0.035 0.015 0.095 0.071 0.077
02 MEAN 1.000 0.990 .829 .823 1.764 0.704 1.067 0.853 0.744
VAR 0.152 0.144 .103 .843 0.813 0.059 0.400 0.185 0.093
MSE 0.152 0.144 .132 .521 1.396 0.146 0.405 0.207 0.158

 

47

Consider next the specification of first-order
autocorrelation. Two values of pl were considered, 0.4
and 0.8. The results are given in the two upper right
hand columns of Tables 1 and 2. Notice first that in all
cases the estimators assuming first-order autocorrelation
and those assuming second-order autocorrelation are
extremely close together. Once again there is practically
no real loss in assuming too high a degree of auto-
correlation; the asymptotic equivalence of these two
procedures apparently comes through quite well even at
sample size 20. Next consider the estimators of B and A
only, and compare the estimators which assume no auto-
correlation with tnose which assume either first- or
second-order autocorrelation. With p1 = 0.4 the results
are somewhat mixed, but with 91 = 0.8 the inferiority of
assuming 91 = 0 is apparent. The estimates assuming no
autocorrelation are clearly biased and inefficient.
Finally, consider the estimators of 02. The estimators
assuming no autocorrelation have a considerably larger
variance than those which correctly assume autocorrelation
to be present, expecially when pl is large. Also note
that when autocorrelation is present, estimation assuming

2 16

no autocorrelation leads to an overestimate of o . All

of these conclusions are again true for both values of T.

 

16At least it leads to higher estimates than if
autocorrelation were in fact not present. Since 02 was
somewhat underestimated in the absence of autocorrelation,

 

48

The last specification considered is that of
second-order autocorrelation. The results are given in
the bottom half of Tables 1 and 2. Three cases are
= 0.4; = 0,

considered: p1 = O, = 0.8; and

p2 pl p2

= 0.4. As could be expected, estimation

91 - oz
assuming second-order autocorrelation leads to the best
(in the sense of mean square error) estimates of B, l,
and 02. (It does not always lead to the least biased
estimates of 02; this will be discussed further in the
next section.) Estimation assuming first-order auto-
correlation seems somewhat better than estimation
assuming no autocorrelation, especially in the case in
which pl is non-zero. Also it is once again true that
assuming too low an order of autocorrelation leads to an
overestimate of 02. Finally, the results are again
rather insensitive to the value of T.

This completes the experiment for the case N = 20.
The salient result is that the known asymptotic properties
of the estimators seemed to hold fairly well even for such
a small sample. In particular, understating the order of
autocorrelation leads to very bad results, while over-

stating the order of autocorrelation is essentially

costless. Apparently the only serious problem with

 

the effect of increasing p is to first push the mean of
62 toward the true value before actually leading to an
overestimate.

 

49

assuming too high an order of autocorrelation is that
the resulting estimate of 02 is apt to be more downward
biased than if the correct order of autocorrelation were
assumed.

As a further check on the rate of convergence of
the results to the known asymptotic results, the experi-
ment was partially duplicated with N = 50. Since the
value of T did not seem to materially affect the results
in the previous case, only one value (0.75) was used at
this sample size. The results are given in Table 3.3.
The asymptotic properties of the estimators are in this
case an extremely good guide to the observed properties.
Under the null hypothesis the estimation procedures
based on all three assumptions gave essentially identical
results, and the downward bias of 02 has largely dis-
appeared. In the case of first-order autocorrelation the
estimation procedures which assumed first- and second-
order autocorrelation gave almost identical results,
while estimation based on the assumption of no autocorrela-
tion gave substantially worse results. In the two cases
of second-order autocorrelation, estimation based on
second-order autocorrelation gave the best results, while
estimation based on the assumption of first-order auto-
correlation generally gave better results than estimation

based on the assumption of no autocorrelation. These are

 

 

 

 

 

 

 

 

 

TABLE 3.3.--N = 50 = 0.75.
p1 = 02 = 0 p1 '4 D2 0
9 1 .2. 9. 1 .2.
8 MEAN 1.001 1.000 1.000 1.043 1.059 1.060
VAR 0.026 0.026 0.027 0.032 0.025 0.026
MSE 0.026 0.026 0.027 0.034 0.029 0.029
1 MEAN 0.497 0.497 0.497 0.470 0.463 0.4:-
VAR 0.0086 0.0086 0.0088 0.0101 0.0093 0.0095
MSE 0.0086 0.0086 0.0088 0.0110 0.0106 0.01 9
02 MEAN 0.948 0.946 0.940 1.106 0.963 0.957
VAR 0.030 0.030 0.030 0.062 0.042 0.043
MSE 0.032 0.033 0.033 0.073 0.043 0.045
pl = 02 = .4 pl .4 - .4
9. 1 .2. .9 1 1
B MEAN 0.988 0.987 0.988 0.898 0.974 1.004
VAR 0.032 0.032 0.028 0.128 0.050 0.043
MSE 0.033 0.032 0.028 0.139 0.051 0.043
A MEAN 0.504 0.505 0.500 0.541 0.492 0.483
VAR 0.0106 0.0102 0.0081 0.048 0.029 0.018
MSE 0.0106 0.0102 0.0081 0.050 0.029 0.018
02 MEAN 1.004 1.001 0.869 1.536 0.987 0.831
VAR 0.057 0.056 0.037 0.348 0.076 0.02“
MSE 0.057 0.056 0.054 0.636 0.076 0.05

 

 

51

essentially the same results as were obtained with N = 20;
the difference is that they hold more distinctly in this
case.

The last specification considered is the null
hypothesis with N = 100. The results are given in Table
3.4. As would be expected, all three procedures give
essentially the same results. This was done primarily
as a check of whether the calculations were performed

correctly, and they do seem to have been.

3.5 Comments and Summary

 

In this chapter we have develOped the maximum
likelihood estimators of a distributed lag model when the
disturbances follow a second-order process. It has also
been pointed out that there is no loss in the asymptotic
efficiency of maximum likelihood estimation when one
assumes an order of autocorrelation higher than the true
order, while assuming too low an order leads to
asymptotically inefficient results. These results
parallel those shown in Chapter II in the context of a
linear model, and the implication is the same--
asymptotically, one should always assume a very general
type of autocorrelation.

A Monte Carlo experiment was then performed to
see how well these asymptotic results hold in small

samples. The experiment suggested that, even with samples

52

 

 

TABLE 3.4.--N = 100 r = 0.75.
01 = 0 Q2
9. 1 1
B MEAN 1.0003 1.0005 1.0000
VAR 0.0092 0.0093 0.0092
MSE 0.0092 0.0093 0.0092
A MEAN 0.4991 0.4990 0.4991
VAR 0.0028 0.0028 0.0028
MSE 0.0028 0.0028 0.0028
02 MEAN 0.9668 0.9660 0.9632
VAR 0.0230 0.0229 0.0230
MSE 0.0240 0.0241 0.0243

.........

 

 

53

as small as 20, the asymptotic properties of the estimators
provide a rather good guide to the small sample pr0perties.
That is, even in small samples there is very little loss
in assuming too high an order of autocorrelation, but
there is a considerable loss in assuming too low an
order. This is essentially the same result obtained by
the Monte Carlo experiment of the last chapter, at least
as far as the regression coefficient 8 is concerned.
Recall, however, that in the last chapter it was
found that GLS did not necessarily lead to "better"
estimators of ouz, even when autocorrelation was actually
present. This seemed puzzling, and it was conjectured
at that time that this might have occurred either because
Buz was considered rather than 02, or because the criterion
for choosing the "best" estimator of ouz was just the size
of the bias. It now seems probable that the latter
reason was the cause of the problem. After all, the
experiment of this chapter considered 82, not 6u2, and it
was still common for an estimation procedure which assumed
too low an order of autocorrelation to give the least
biased estimator of 02. However, estimation procedures
which assumed the true order did give the estimators of 02
which typically had the smallest variance and mean square
error. Hence it appears that whether or not estimation
which takes autocorrelation into account (when it is

present) gives "better" estimates of the variance (either

54

02 or cuz) depends on whether one considers the "best"

estimators to be those with the smallest bias or those

with the smallest mean square error.

 

CHAPTER IV

TESTING FOR SECOND-ORDER AUTOCORRELATION:
A GENERALIZATION OF THE DURBIN-WATSON

TEST

4.1 Introduction

 

Because GLS procedures are computationally more
complicated than OLS and because there does seem to be
some loss in small sample efficiency involved in using
GLS under the FIC, it is desirable to be able to test the
null hypothesis of the independence of the disturbances
against the alternative of autocorrelation. In the con-
text of a linear model, the test which has been most
often used to test for autocorrelation is the Durbin-

Watson test,1 for which the test statistic is the ratio

(4.1)

 

the ii being the residuals from the OLS regression of y

 

1[14] and [15].

55

 

56

on X.2 This test was specifically designed to detect

autocorrelation in the form of a first-order scheme. In

fact, the statistic dl can be written

dl é 2(1 - 51), (4.2)

where 61 is the C-0 estimator of 01 given by equation
(2.3). Clearly positive first-order autocorrelation
(p1 > 0) will tend to lead to small values of d1;
negative first-order autocorrelation will tend to lead
to large values. It should also be clear that this test
may not be very effective in detecting types of auto-
correlation other than first-order; an obvious example
of a type of autocorrelation to which it would be
insensitive would be a second—order scheme with 01 = 0.

In order to test for second-order autocorrelation,

we will propose the second-order Durbin-Watson test, to

 

be based on the test statistic

d = (4.3)

 

 

2The Durbin-Watson test is not applicable to a
distributed lag model such as the one presented in the
last chapter. A test which is asymptotically valid in
such a model has been recently suggested in [13].

 

57

This test should be able to detect first- and second-
order autocorrelation. It should also be able to detect
more different types of autocorrelation than the ordinary
first-order test, since the second-order scheme on which
it is based should be able to approximate more different
types of autocorrelation than can a first-order scheme.
In the next chapter we will discuss the power of both
tests against various alternatives; the remainder of
this chapter will be devoted to the consideration of the
distribution of the above statistic under the null
hypothesis of the FIC.

4.2 Calculation of Significance
Points

 

The statistics dl and d2 can each be written in

matrix form as

d. = 1 , 1 = 1,2; (4.4)

 

where u is the vector of least squares residuals and Al

and A2 are NxN matrices defined as follows:

_mmr’

I‘ll"

 

58

 

 

1 -l O O O ... O O O O 0

-l 2 -l O 0 ... 0 O 0 0 0

O -1 2 -l O ... 0 0 0 0 O

0 O -l 2 -l . . O O 0 O 0

A1 = 0 0 0 -l 2 ... O 0 O O 0

O O 0 0 0 ... 0 -l 2 -l O

0 0 0 0 0 ... 0 O -l 2 -1

O O 0 O 0 ... 0 0 0 -l l

2 -l -l 0 O O ... O 0 O 0

-l 3 -l -l 0 0 ... 0 0 0

-l -l 4 -l -l 0 ... O O 0 0

A2 = O -l -l 4 -l -l ... 0 0 O 0
O 0 O 0 O 0 ... 0 -1 -l 4 -

O 0 O 0 0 0 ... 0 0 -l -1

O O O O 0 0 ... 0 0 O -1

 

This is useful since a number of results have been

established in the literature

~

ﬁ'Au

 

test statistic d = , where A is any real, non-

u'ﬁ
singular, symmetric, positive definite matrix. In

particular, define

for the distribution of

 

(4.5)

a

 

59

I-X(X'X)-1X', (4.6)

3
II

and let
Z = MA (4.7)

Then Z has N-K real positive characteristic roots (N and‘
K being the dimensions of the regressor matrix X) and

K zero roots; number the positive roots, in increasing
order, "l’ﬂ2""'ﬂN-K' Durbin and Watson have shown3

that d can be rewritten as follows:

 

 

Z v.20
l 1 1
d = N-K , (4.8)
2
Z v.
1 1

where the vi are independent N(0,l) variables. Now,

following Koerts and Abrahamse,4 one can note that

P(d < d*) =
N-K 2 N-K 2 N-K 2
P[ 2 ﬂ.V. < d* 2 v. ] = P[ 2 n.v. < 0], where
1 1 1 1 1
1 1 1
where ni = Ni — d*. Using a result from Imhof,5 Koerts

and Abrahamse note that

 

3(14].
4[301.

 

5[25].

60

 

N-K
' 1
N-K . 2 1 Sln [4 i arctan (nir)]
g,
l r N (l + n.2r2)4
1
l
o
(4.9)

Numerical integration is feasible since Imhof has pro-

vided the limit of the integrand as r + 0; it is

5 Z ni. He also provides a bound for the truncation

error caused by integrating over the finite range [0,R];
to hold the truncation error to 8 one must take R equal

to

I11 . (4.10)

Thus the exact probability that d lies below any value d*
can be calculated by numerical integration, even though
the form of the distribution of d is not known. This
procedure was deve10ped by Koerts and Abrahamse6 for the
statistic d1; it clearly can also be applied to d2. All
that need be done is to insert the preper A1 in (4.7);

from there on the procedure is the same in each case.

 

6See [30] or [31]. The same procedure was sub-
sequently but independently developed in [41].

 

61

4.3 Approximations

 

The exact procedure of the preceding section has
the drawback of being rather difficult computationally,
so that one might sometimes wish to resort to an approxi—
mation procedure so as to save computational effort.
Henshaw7 and Durbin and Watson8 have given fairly compre—
hensive reviews of the available procedures, so that
only a few brief comments need be made here.

Given that d has been written as in (4.8), the
moments of d are readily computed. In particular, as

noted by Durbin and Watson,9

 

 

1 N-K _
E(d) = N-K : Ni : N (4.11)
and N-K _ 2
2 2 (Ni- N)
Var(d) = 1 (4.12)

(N-K)(N-K+2).

Now it has been proven that the distribution of d is

10 but it is not known how good a

asymptotically normal,
fit the normal distribution would provide in small
samples. In fact, there is some limited evidence to

suggest that the beta distribution may provide a better

 

7[221.
8[16].
9[15].

1°14].

-.u :-

 

62

approximation to the distribution of d,11 and the beta

distribution has generally been used to approximate the
distribution of d. Since it is clear from (4.8) that the

possible range of d is [01,n and since the beta

N_K] .
distribution over a given range is a two parameter dis-
tribution, it is possible to fit a beta distribution
having the same mean, variance and range as the true
distribution of d. This is essentially the procedure of
Henshaw; however, he goes to rather great lengths to
avoid computing the eigenvalues of Z. Where direct
eigenvalue calculation is possible, the following pro-
cedure is somwhat simpler, at least conceptually.

Having calculated Z and its roots, calculate E(d)

and Var(d) from (4.11) and (4.12). Normalize d to the

range [0,1] by defining

 

 

d - 01
x = ﬂ _ N (4.13)
N-K 1
Then clearly
E(d) :- 1T1
E(x) = ﬂ _ n (4.14)
N-K l

and

 

11For some evidence see [43] or [5].

 

63

Var(d)

Var(x) = 2
- 01)

 

(4.15)

("N-K

Assume that x is a beta variable; clearly it has range
[0,1]. It is well known that such a variable with

density

 

37%737' xp’l(1 - X)q'l (4.16)
has

E(x) = 5E5 (4.17)
and

Var(x) = (p+q?%(p+q+l) ° (4.18)

Since E(x) and Var(x) are known from (4.14) and (4.15),
(4.17) and (4.18) can be solved for p and q. The results

can be stated as follows:

 

q = ‘ I————
Var(x)(l + h)3 + h

p = qh (4.19)

64

Given q and p, one can find the critical values of x in
any table of the incomplete beta function12 and get the

corresponding critical value of d from the relation

d = N

a 1 (4.20)

+ xa(1TN_K - 01).

This procedure fits a beta variable of the same
mean and variance as d into the exact range of d; the
only element of approximation is the use of the beta
distribution. Durbin and Watson13 and Theil and Nagar14
have also proposed beta approximations, but each makes
approximations about the mean, variance and range of d
that are replaced here by exact results.

Finally, this procedure also clearly applies to
each of the di defined above. Once again all that need
be done is to insert the proper Ai into (4.7) and from

there on the procedure is identical in each case.

4.4 The Bounds Test

 

Because of the substantial computational burdens
involved in the procedures of either of the last two
sections, it would clearly be desirable to avoid them as
often as possible. Durbin and Watson have provided a

partial solution in the case of the first-order test, by

 

12For example, [39].

131151.
1"'[48].

 

65

tabulating the critical points of statistics dL and dU
whose critical points bound the critical points of d1.15
In this section we will provide similar bounds for the

distributions of the higher order tests defined above.

ﬁ'Aﬁ

 

Again consider the statistic d = , A being one

ﬁ'ﬁ
of the Ai in (4.5). Then A has N-l positive characteristic
roots; number them in increasing order Al'AZ""’AN-l‘

Recall that 01,...,N are the positive roots, in

N-K
increasing order, of Z = MA. Then the basis for the

present procedure is the fact, proved by Durbin and Watson16
that

A1 i Ni 1 Ai+K" i = l,2,...,N-K (4.21)
where K' = K-l = the number of regressors not including

the constant term (which must be present).
It is then natural to define the variables dL

and dU as follows:

 

 

N-K 2 N-K 2
Z A.v. 2 A. , v.
1 1 1 l 1+K 1 4
dL ‘ N-‘R—z—‘t 90 7 N-K 2 ‘ '22)
2 v. 2 v.
1 1 1 1
151151.

 

 

66

Comparing these with (4.8), it is evident that the
distribution of d is bounded by the distribution of dL
and dU‘ Now, given the matrix A, the significance points
of dL and dU can be calculated by the methods of section
4.2 and tabulated. Note that they depend on the matrix
X only in that they depend on N and K'.

The critical points of (d1)L and (dl)U have been
tabulated in Durbin and Watson.17 Tables 4.1 - 4.3 of
this paper contain the significance points of (d2)L and

(d To use the tables, simply compute the value of d

2’0‘
and compare it to the critical points of dL and dU for
the given values of N and K' and the desired alpha level.
If d is less than the alpha level critical point of dL,
the null hypothesis is rejected at that alpha level.

If d is greater than the critical point of dU, the null
hypothesis is accepted at that alpha level. If d falls
between the critical points of dL and dU, the test is
inconclusive; the critical point of d itself can then

be calculated by the methods of sections 4.2 or 4.3.

This procedure again applies to all of the di defined in

this paper.

 

17[141.

 

 

67

 

 

 

 

mm.m ma.m as.m sﬁ.m mv.m o~.m sm.m -.m mm.m mm.m OOH
mm.m ao.m ms.m NH.m Nv.m mH.m mm.m mH.m om.m Hm.m mm
6m.m mo.m 66.m mo.m ov.m oa.m mm.m mH.m sm.m ha.m mm
mm.m 6m.m sv.m ma.~ mm.m mo.m om.m so.m mm.m NH.m _mM
mm.m mm.m ~6.m om.~ vm.m sm.~ m~.m mm.m ma.m 4o.m .mm
6m.m ms.m mv.m ss.~ Hm.m ~m.~ aa.m mm.~ mo.m mm.~ mm
6m.m mm.m ms.m mo.~ mm.m vs.~ 6H.m mm.~ so.m mm.~ _mw
mm.m mm.m ss.m mm.m m~.m om.m mH.m ms.m ma.m ms.m mm
m6.m mm.~ m4.m 66.N sm.m vm.~ oa.m me.~ sa.m os.m mm
-- .. om.m Hm.~ sm.m av.~ so.m om.m sm.m am.m mm
.. .. am.m OH.~ om.m vm.m 4o.m mm.m as.m m¢.~ .mm
.. -- n- .. H4.m mm.H mo.m mH.~ os.m 6N.~ mm
.. I- u- .. am.m m6.a Ha.m om.a Hm.~ mm.a .mm
m .s 4 .m m .x m .x a .m
.oxmov can almov mo mmsam> Hmoﬂpﬂuo H0>0H Ho.o--.a.¢ mummy

68

 

 

 

 

sh.m mm.m sm.m mm.m Ho.m o¢.m om.m ~4.m Hm.m mv.m mm:
ms.m mm.m m6.m sm.m Hm.m sm.m mm.m ov.m om.m Nv.m mm
ms.m sm.m mo.m om.m Hm.m mm.m sm.m mm.m m6.m mm.m mm
ss.m H~.m m6.m s~.m H6.m m~.m mm.m ~m.m 66.m 6m.m mm
ms.m mH.m a6.m AH.m oo.m H~.m mm.m m~.m ss.m Hm.m mm
mm.m mo.m os.m so.m mm.m ma.m mv.m ma.m mm.m v~.m mm
sm.m ma.m ms.m Ho.m am.m so.m sv.m mH.m sm.m aa.m mm .m
mm.m 6m.m ss.m mm.m oo.m oo.m 64.m so.m mm.m 6H.m mm
vm.m ms.m ss.m mm.m Hm.m Hm.m mv.m mm.m am.m so.m mm
.. .. mm.m os.~ mm.m om.m mv.m om.m mm.m mm.~ mm
.. .. Hm.m mm.m mm.m am.m m¢.m ss.m om.m mm.~ _mm
.. -- -- .. 6s.m ms.m m6.m am.m ma.m ss.m mm
.. u- -u .. mm.m NH.~ mm.m mm.m oa.m mm.m mm
m .E v .x m u .x N .M H u .s
.almuv can quuv mo m05H6> Hmoﬂuﬂuo H0>0H mo.o--.m.4 mqm4e

 

 

69

 

 

mm.m s6.m ss.m as.m as.m Hm.m mm.m mm.m Hm.m mm.m ooa
mm.m m¢.m ms.m 66.m as.m mv.m mo.m Hm.m om.m mm.m mm
Em.m mm.m om.m ms.m ms.m mv.m mm.m m¢.m mm.m Hm.m mm
mm.m 6m.m Am.m sm.m ms.m H¢.m mo.m mv.m mm.m ms.m mm
Ha.m mm.m mm.m Hm.m ws.m mm.m oo.m os.m mm.m 66.m mm
om.m mH.m mm.m m~.m ss.m mm.m em.m mm.m sm.m mm.m mm
ma.m ma.m sm.m mH.m ms.m 4m.m mm.m om.m mm.m mm.m mm m
no.6 so.m am.m HH.m 6s.m mH.m m6.m m~.m om.m Hm.m mm
mo.v 4m.~ mm.m mo.m ms.m HH.m mm.m mH.m ms.m 6N.m mm
-- .. am.m Hm.~ Hm.m Ho.m mm.m oa.m m¢.m o~.m mm
.. -- wo.v os.m mm.m mm.~ sm.m ma.m m8.m HH.m mm
.. u- u- .. mm.m am.m sm.m sm.m mm.m mm.~ mm
.. -u -a .. mo.v ov.m vs.m om.m mm.m om.~ mm
m .s v .m m .x N .s H .s
.slmcv 6:8 almcv mo mwsam> Hmoﬂuﬂuo H0>ma oa.o--.m.s mqm<a

CHAPTER V

THE POWER OF THE GENERALIZED

DURBIN-WATSON TEST

5.1 Analytical Results

 

It is well known1 that the power of tests of the
Durbin-Watson type depends not only on the alpha level and

the alternative 0 matrix, but also on the regressor matrix

 

X. That is, the power of the test can be calculated
exactly, but this requires knowledge of a, R, and X.
Unfortunately, the fact that the power function is X-
dependent makes an analytical comparison of the power of
the tests rather intractable. It should not be sur-
prising, therefore, that even for the case of the first-
order Durbin-Watson test there has been practically no
analytical work done comparing its power to that of other
tests. Rather, resort has usually been made to Monte
Carlo methods.

One exception is the recent work by Durbin and
Watson2 in which it is shown that if the columns of X are

linear combinations of the eigenvectors of the matrix Al

 

1For example, see [30] or [31].

2[16].

70

71

ﬁ'A a
= -——l—J, then the first-order
ﬁ'ﬁ

(the matrix such that dl

Durbin-Watson test is a uniformly most powerful invariant
test3 against the alternative hypothesis of first-order
autocorrelation. For arbitrary X matrices the first-
order Durbin-Watson test is a locally most powerful
invariant test in some neighborhood of the null hypothesis.
It would clearly be nice to be able to make
analogous statements about the powers of the second-
order Durbin-Watson tests when the alternative hypothesis
is second-order autocorrelation. Unfortunately this is
not possible. Durbin and Watson's proof for the case of
first-order autocorrelation4 begins by considering the log
likelihood function generated by the first-order Markov

normal variables u1,...,u conditional on no; this is

NI
preportional to the following expression:

N N
* _ 2 2 2 2 2 _
Ll — (l + pl ) i ui - 01 (ul_ + uN ) 2pl g uiui-l’
(5.1)

For this they substitute the closely similar expression

** _ l 2

2 2 2
u. - pl(ul + uN ) 2pl

u.u. .
1 1 1-1

l-‘MZ
NMZ

(5.2)

 

3For a discussion of invariant tests see [33],
Chapter 5.

4[161.

 

72

Then the key result is the following:

**

2
_ I _
L1 — u [(1 pl) I + plAl]u, (5.3)
where I is the identity matrix and Al is as previously
defined. Now in the case of second-order autocorrelation

*
the log likelihood function is proportional to L

2
defined as follows:
* 2 2 N 2 N N
L2 — (l + pl + p2 ) Z ui - 291 Z ului_l 292 Z ului_2
l 2 3
N—l
2 2 2 2
+ p1‘32 g u 1-1 ” (91 + D2 )(“1 + uN )
2 2 2
Unfortunately one can not find suitable constants c1
and c2 such that the following is true:
* J. I 5 5
L2 - u (clI + c2A2)u, ( . )

this becomes evident immediately if one expands the above
expression. Hence the line of reasoning of the Durbin-
Watson proof breaks down in this case, so that their
result can unfortunately not be generalized to the
second-order case. Another way to see this is to note

that the following is true:

 

 

73

L* 1 . 1 2 D192
2 - u [( - pl - 02) I + (01 - —-2-—-)A:L + 02(A2 -Al)]u.

(5.6)

Following the development of Durbin and Watson, one can
now proceed to derive the locally most powerful invariant

test, whose rejection region turns out to be defined as

 

 

 

follows:
9192 ﬁ'Alﬁ ﬁ'(A2 - A1)ﬁ
_ 'k *
(pl -7-) ~'~ + 02 ~.~ < d (d some constant).
u u u u

(5.7)
ﬁ'Azﬁ

Recalling that d2 = , this clearly is not of the form
fi'ﬁ

*
d2 < d , so that the second-order Durbin-Watson test is
not locally most powerful invariant against the alternative
of second—order autocorrelation. Indeed, it is easy to

see that there does not exist a matrix A not containing

 

the parameters 01 and 02 such that the test based on the
. . ﬁ'Aﬁ . . . . .
stat1stic is locally most powerful invariant in this
ﬁ'ﬁ
case.

To summarize these results, there is some
theoretical reason to believe that the first-order Durbin-
Watson test should be more powerful than the second-order

test in the case of first-order autocorrelation, at least

74

for small values of 01. At present it is not possible to
make an analytical comparison of the tests in the cases

of autocorrelation of higher orders. From a purely
intuitive point of view, however, it would seem reasonable
to expect that the first- and second-order Durbin-Watson
tests would be most apprOpriate in testing for autocorrela-
tion of first- and second-order, respectively. The next
section will describe a Monte Carlo experiment which was
performed in an attempt to test this admittedly intuitive
hypothesis.

5.2 A Monte Carlo Comparison
of the Tests

 

 

The purpose of this section is to compare by
Monte Carlo methods the performance of the first- and
second-order Durbin-Watson tests in detecting first- and
second-order autocorrelation. Such a comparison could
be made in at least two different ways. The first would
be, given a regressor matrix X and disturbance term
covariance matrix 0, simply to calculate the power of each
of the tests exactly by the methods of Koerts and
Abrahamse.5 The second possible procedure would be to
generate random deviates having covariance matrix Q, to
use them to generate observations on the dependent
variable y, and to regress y on X and actually carry out

the tests. By repeating this procedure many times with

 

5(301.

75

independent sets of random deviates one should get a good
idea of the distribution of the test statistics. If one
is simply interested in the power of the various tests,
the first procedure would of course be simpler. The
second procedure, however, should more readily reveal
other features of the distribution of the test statistics
(e.g., mean, variance, shape, etc.) which would be
extremely difficult to calculate by Koerts-Abrahamse
methods. This Monte Carlo experiment was therefore run
along the lines of the second procedure outlined above.
In performing the experiment, three different X
matrices were used, of dimension 20 x 6, 35 x 4, and
50 x 2.6 Each consisted of a constant term plus K-l
columns of random digits taken from a listing compiled
by the Rand Corporation.7 For each x matrix, each test
was set at the 0.01, 0.05 and 0.10 alpha levels by
means of the beta approximation described in the last
chapter. The procedure was then to run 1000 independent
trials under each specification of 0, each time con-

structing observations on y, to calculate the observed

 

6Note that different numbers of regressors were
used at each sample size, so that the results are somewhat
more general than if the same type of X matrix had been
used in each case. The price paid for this added gen-
erality is that it-is now not valid to compare the powers
of the tests as sample size increases; this did not seem to
be a significant cost. Without known asymptotic prOper-
ties, it is not terribly informative to watch the
behavior of the tests as sample size increases.

7[42].

76

mean and standard deviation of each test statistic, and
to count the number of rejections by each test at each
alpha level.

Specifically, all elements of the regression
coefficient vector were taken equal to one,8 so that,

given u, y was obtained by

XX.. + u., i = l,2,...,N. (5.8)

y.
l . 1 l
J 3

Observations on u were created by suitable transformations
of random N(0,l) deviates taken from the Rand list, as
indicated below.

The first specification considered is the null
hypothesis of the full ideal conditions (FIC). Here the
random deviates were simply left untransformed. Table 5.1
gives the number of rejections by each test in 1000
independent trials. The number of rejections should
average 10, 50 and 100 with standard deviations of 3.2,
6.9, and 9.5. An asterisk indicates that the number of
rejections obtained was significantly different from the
expected number at the 5% level. It will be noted that
the beta approximation performed quite well; it was
therefore judged good enough to be used in the rest of

the experiment as well. Table 5.2 compares the actual

 

8 . .

This was done purely for convenience, and does
not in any case affect the distribution of the residuals,
so that it makes no difference.

77

TABLE 5.l.--Number of rejections per 1,000 trials under
the null hypothesis of no autocorrelation.

 

Alpha-Level

 

 

0.01 0.05 0.10

d1 15 52 112
N = 20

d2 20* 61 113

d1 18* 81* 145*
N = 35

d2 17 60 113

d1 15 67* 108
N = 50

d 12 46 95

 

78

TABLE 5.2.--Means and standard deviations of dl and d

under the null hypothesis. 2

 

 

 

Mean Standard Deviation
Actual Predicted Actual Predicted
dl 2.130 2.141 .4365 .4308
N = 20 .
d2 3.953 3.973 .5788 .5800
C11 1.955 1.998 .3165 .3167
N = 35
d2 3.982 3.995 .4585 .4502
dl 1.986 1.986 .2778 .2769
N = 50
d 3.948 3.950 .3841 .3901

 

79

mean and standard deviation of the statistics with the
"predicted" mean and standard deviation used in calculat-
ing the beta approximation. The actual values are in all
cases extremely close to the theoretical correct values,
as would be expected in a sample of 1000 trials.

The next specification considered is that of
first-order autocorrelation. For each trial at each
value of pl, the procedure is to take random N(0,l)

deviates €1,82,...,€N and to transform them as follows:

2 9 .
j pl j-l + (l - pl)2€j, j = 2,3,...,N (5.9)

Values of pl considered were 0.2, 0.4, 0.6, and 0.8.
Table 5.3 gives the number of rejections, as before,
while Table 5.4 gives the means and standard deviations
of the test statistics. In general, the results confirm
our expectations. For any test at any alpha level and
sample size, the expected value of the statistic decreases
and the power of the test increases as 01 increases. The
effect on the standard deviation of the statistics is
ambiguous. Also, for any alpha level and sample size,
the first-order test is almost invariably most powerful.
However, the difference in power is generally not very

large.

80

 

 

 

OOOH mam 666 who 66¢ mom «mm ems mmm Hmv Ram HNH 6

om n z
OOOH OOOH OOOH mam was msm mma mmm «a6 6mm mms mom H6
mmm omm Hma Ham amm ops mam HHm smm Rom NHN mm N6

mm n 2
«mm 66m ova mmm EH6 H65 Ems Ohm mH6 666 6Hm omH H6
mam ans mam smm amm 6~m Nov Ham mvH 60m aMH 6m N6

om u z
mmm Hos smm smm mom mam 666 smm omH 66m m6H am H6
OH. mo. Ho. OH. mo. Ho. OH. mo. Ho. OH. mo. Ho. "H0>6H-616H6

6.6 u Ha 6.0 u Ha 6.6 u Ha ~.o u HQ

 

.COanMHwHHOUOUSM HTUHOIUmHHM HGUCD mCOﬂDOTWGH MO H®§le.m.m qumdrH.

81

 

 

 

 

OOOO. OOO.H OONm. NOm.N OOOO. mOO.N 6666. mmm.m
Om
OOOH. smm.O 6OmN. OOO.O NOON. OON.H OHON. ONO.H
ONmm. ONO.N NOOm. NOO.N NNOm. NONm. OOHm. 666.6
mm
NONN. OO6.O mmmN. NHH.H OHON. OOm.H OOHm. ONO.H
mOmm. OON.N HmmO. 6HO.N NHHN. Omm.m 6OOO. OON.m
ON
OOmO. OOO.H OOO6. H6O.H H666. Om6.H 6N66. OOO.H
.6.m 6602 .6.6 6602 .6.m 6602 .6.m 6602
O.O u HO 0.0 HO 6.O u HO N.O HO
.206
Imamuuooousm HmUHOIumHHM 20025 N Hp mo macauma>mp pumpcmum paw mammzln.v.m mqm¢e

82

The third specification considered is second-order
autocorrelation. This is constructed by transforming the

N(0,1) deviates 61’ i = l,2,...,N, as follows:

u1 = 6:1
u = u + (1 - 2)»2
2 D1 1 D1 82
‘6 -_
uj — pluj_l + pzuj-2 + aj 8], J — 3,4,...,N,

(5.10)

where

a. = l - pi - p: - 20102 ?E 0;. (5.11)

Table 5.5 gives the number of rejections for the
six cases in which both pl and 02 are non-zero, while
Table 5.6 gives the means and standard deviations. Note
that for all alpha levels and all sample sizes, the second-
order test is most powerful. This is true even when 01
is large relative to 02. Also note that the means of all
the test statistics are lower, and the standard deviations
higher, than under the null hypothesis.

Table 5.7 gives the number of rejections for the
four cases in which 01 = 0 but 02 # 0, with Table 5.8
giving the means and standard deviations. As might be
expected, the second-order test is most powerful. In

fact, the first-order test shows very little power.

83

 

 

 

 

 

mam vmm Ohm woo mmm Hem How mmm Hmm mhm Hmm Nwm Hem 5mm bah NVF mmm NNV U
cm N Z
OOOH mmm mmm com vvm mow 0mm mom mmh MNN woo Hmm ®w© HHm mmv 0N0 Nam Nom HG
mom ova 0mm 0N0 0mm hon ONm mvn Hmm mom mob mmw Omh wvw wmv vmm mHv mNN NU
mm N Z
5mm 0mm ham 0mm EMF wow wow 00% va va mam ANN mmv ohm mNN mow vvm owH HG
mNh moo mom mvo hem OHv hmm hvv mmm mmm mmv mom mow mmm mom mmm mmm BAH NU
ON N 2
Han vmo Omv Nmm mmv «mm mam Ham VAN hmm vmm mVH mom oam VHH omN th om HU
OH. mo. Ho. OH. mo. Ho. OH. mo. HO. OH. mo. HO. OH. mo. Ho. OH. mo. Ho. uH®>®AIMLQH¢
N.O H NQ V00 H NQ N.O H NQ 0.0 H NQ Woo H NQ N.O H NO
0.0 H HQ v.0 U HQ V00 N HQ N.O H HQ N.O H HQ N.O H HQ
.COHuMHWHHOUOUSM HQUHOIUCOUmm HQUCS WCOHﬂUmﬂmH MO HGQEDleomom mqmﬂe

84

 

 

 

 

Nome. ohm.a whom. mnm.a ommm. wov.m wmvw. omH.N oamm» mho.m mwmv. hHH.M
om N Z
Nmmm. Ham.o mmvm. vmm.o ommm. mma.a mmwv. mwm.H mmmm. mmv.a momm. mom.a
mmmo. mmm.m mama. omv.m vmom. ham.m move. mmw.m mmom. omm.m nmmm. ham.m
mm H z
mmmm. omo.H Hamm. hom.a ovvm. Nmm.H bvwv. H60.H Hmav. Nvm.a mmmm. Hmo.H
mmmm. ohm.m Hwbm. hhh.m mwmh. mwo.m mmmm. mmm.m mmmn. 6HN.m Namo. Nov.m
ON n z
mHHm. th.H mmhm. 6mm.a mmom. mmm.H Hove. mhm.a hamm. mnm.a mmmv. mmm.a
.D.m cmmz .Q.m cmmz .D.m cmmz .Q.m cmwz .D.m cmmz .D.m cmmz
N a

.coﬂumamuuooousm umpHOIpcoomm Hope:

6 666

U mo msoﬂu6H>mp pumpcmum Ucm

mewEll . m . m Mdmde

85

 

 

 

 

666 66O O6O O6O H6O N66 ONO 666 OON 666 OON N6 6
OO u z

N6H O6H OO O6H HOH OO OOH 6NH OO O6H HO N6 H6

H6O 6NO ON6 O6O HO6 O6N 666 N66 6OH OON OOH 6O N6
O6 u z

66H 6OH NO 6OH 6HH 6O OOH OHH 6O N6H O6 O6 H6

6O6 NH6 6OH OO6 OON 66H HON 66H O6 N6H OHH 66 N6
ON u z

O6H 6OH 6O O6H 66 O6 6NH O6 66 OHH OO HN H6
OH. OO. HO. OH. OO. HO. OH. OO. HO. OH. OO. HO. nH0>0Hu666H6

6.0 u NO 0.0 u NO 6.0 u NO N.O u HO

.Ao HOV EOHuMHmHHooousm Hmpuolpcoomm Mona: mcoHuomﬂmu mo HmQEsZII.O.m mumma

86

 

 

 

 

HOHO. OO6.N H666. O66.N OOO6. OON.6 6OH6. NHO.6
OO
OONO. OON.N O666. O6O.N ON66. N6O.N O6N6. NOO.N
N666. 6OH.6 666O. O6N.6 OOHO. 666.6 6OOO. HOO.6
O6
6ONO. HO6.N OO6O. 6NN.N 66N6. 6OO.N 66O6. H6O.N
NOOO. NHN.6 OOOO. HO6.6 OO6O. H6O.6 6OOO. 66O.6
ON
ONOO. O66.N O6OO. 6ON.N HO6O. OHN.N OO66. 6OH.N
.6.6 6602 .6.6 6602 .6.6 6602 .6.6 6602
6.0 n NO O.O n NO 6.O NO N.O u HO
.Ao u OHuMHmuuooousm
Hmpuoupsoomm amps: m 66 mo maowumw>m© pumpcmum U26 m260211.m.m mqm¢a

87

Looking at the table of means and standard deviations,
one can see that in this case the mean of the first-order

test statistic actually increases over its value under

 

the PIC. It is only because the standard deviation also
increases as 02 increases that we obtain more rejections
than under the null hypothesis. In fact, if the increase
in the mean predominated over the increase in the standard
deviation, one could actually get less rejections in this
case than under the null hypothesis. Clearly the first-

order test is not suitable when 01 = 0 and 02 ¢ 0.

5.3 Summary

 

To summarize these results, some general patterns
clearly appear. The first-order test appears to be most
powerful for the case of first-order autocorrelation and
the second-order test most powerful for second-order
autocorrelation. Hence if a test is used which is of
higher order than the true order of the autocorrelation
scheme, some loss of power apparently results relative to
the case in which the test of "correct" order is used.
However, the Monte Carlo evidence suggests that this loss
is fairly small. On the other hand, use of a test of
"too—low" order also forfeits power, and this loss can be
very substantial indeed; this is especially true if the
lower order p's are small. All these results agree with

the intuition expressed in section 5.1. Of course, it

 

88

should be clear that these results are dependent on the
particular X matrices used in the experiment. Certain X
matrices may (or may not) exist for which these conclusions
would not hold.

To the extent that these conclusions are generally
valid, however, they would seem to imply that one should
perhaps be wary of using the ordinary Durbin-Watson test
intﬂuacommon case of testing for autocorrelation which is
not known a priori to be of first-order form. Even if
the autocorrelation in the sample should happen to be of
first-order form, use of d2 rather than (11 would entail
only a fairly small loss of power. On the other hand,
cases do exist for which the use of (11 rather than d2
would entail an almost complete loss of power. To put
the same point somewhat differently, the second-order
Durbin-Watson test is more generally applicable than the
first-order test, and it would appear to be useful in the
general case of testing for autocorrelation of unknown
form.

Finally, it should be noted that other tests have
recently been proposed which are not tied to the idea of
first-order autocorrelation. For example, Durbin9 has
prOposed as test based on the cumulative periodogram of
the residuals which may be useful in detecting auto-

correlation of a general nature; an interesting tOpic

 

9I121.

 

89

for further research would be to compare the power of

this test with the test preposed here.

CHAPTER VI

CONCLUDING REMARKS

As noted in Chapter I, autocorrelation can cause
serious problems in econometric regression analysis.
Econometricians have therefore developed testing pro-
cedures to test for its presence, and estimation pro-
cedures to alleviate the problems which it causes when it
is present. These procedures must of necessity make some
assumptions about what types of autocorrelation might be
present. In particular, the testing and estimation pro-
cedures which have most commonly been used have been
based on the assumption that the autocorrelation in the
sample is of first-order form. If autocorrelation is
present, but not of first-order form, one can only hope
that a first-order scheme is in some sense a reasonable
approximation to the true scheme. If it is not, the
ordinary procedures may not be very appropriate.

Having argued that the usual first-order auto-
correlation scheme is unduly restrictive, this study
then proposed a generalization in the form of a second-
order scheme. The common testing and estimation pro-
cedures were generalized to forms appropriate for this

case. Finally, the new procedures were compared to the

90

91

original procedures in terms of their performance in the
presence of various types of autocorrelation. It was
typically found that the "best" procedures in each case
were those which assumed the true order of autocorrelation.
However, there was a fundamental asymmetry in that the
losses involved in assuming too high an order of auto—
correlation were generally rather small, while the
losses involved in assuming too low an order were often
quite serious. This would seem to imply that when one
does not know a priori what type of autocorrelation is
present, one should proceed under rather general assump-
tions about its form.

Of course, there must be some limit on how general
a process one can assume and still get meaningful results.
(After all, without some restrictions on Q estimation is
literally impossible.) This study does not claim to have
discovered where that limit might lie. However, it does
seem clear that the assumption of second-order auto-
correlation lies well within the permitted range of
generality. It would therefore seem that testing and
estimation procedures based on the assumption of second-
order autocorrelation might often be more apprOpriate
than those which assume first-order autocorrelation, at
least when one does not have a priori knowledge of the

true form of the disturbance term covariance matrix.

REFERENCES

92

10.

REFERENCES

Amemiya, T. "Specification Analysis in the Estimation
of the Parameters of a Simultaneous Equation Model
with Autocorrelated Residuals." Econometrica (1966),
pp. 283-306.

 

Amemiya, T. and W. Fuller. "A Comparative Study of
Alternative Estimators in a Distributed Lag Model."
Econometrica (1967), pp. 509—529.

 

Anderson, R. L. "Distribution on the Serial Correla—
tion Coefficient." Annals of Mathematical Statis-
tics (1943). PP. 1-137

 

Anderson, T. W. "On the Theory of Testing Serial
Correlation." Skandinavisk Aktuarietidskrift
(1948). PP. 88-116.

Anderson, R. L. and T. W. Anderson. "Distribution of
the Circular Correlation Coefficient for Residuals
from a Fitted Fourier Series." Annals of
Mathematical Statistics (1950), pp.‘59-8l.

 

 

Anscombe, F. "Examination of Residuals." Proceedings
of the Fourth Berkely Symposium on MathematiCal
Statistics and Probability, V61. I, UnIVersity of
California Press (1961), pp. 1-36.

 

 

Cochrane, D. and G. H. Orcutt. "Application of Least—
Squares Regression to Relationships Containing
Autocorrelated Error Terms." Journal of the
American Statistical Association (1949), pp. 32-61.

 

 

Dhrymes, P. "Efficient Estimation of Distributed Lags
with Autocorrelated Errors." International
Economic Review (1969), pp. 47-67.

 

 

Dhrymes, P. Distributed Lags: Problems of Formulation
and Estimation (forthcoming).

 

Durbin, J. "The Fitting of Time—Series Models."
Review of the International Statistical Institute
(1960). PP. 233-243.

93

94

11. Durbin, J. "Estimation of Parameters in Time Series
Regression Models." Journal of the Royal
Statistical Society, Series B (1960), pp. 139-153.

12. Durbin, J. "Tests for Serial Correlation in Regression
Analysis Based on the Periodogram of Least Squares

Residuals." Biometrika (1969), pp. 1-15.

13. Durbin, J. "Testing for Serial Correlation in Least
Squares Regression when Some of the Regressors are
Lagged Dependent Variables." Econometrica
(forthcoming).

l4. Durbin, J. and G. S. Watson. "Testing for Serial
Correlation in Least Squares Regression I."

Biometrika (1950), pp. 409-428.

15. Durbin, J. and G. S. Watson. "Testing for Serial
Correlation in Least Squares Regression II."

Biometrika (1951), pp. 159-178.

16. Durbin, J. and G. 8. Watson. "Testing for Serial
Correlation in Least Squares Regression III."

Biometrika (forthcoming).

l7. Goldberger, A. Econometric Theory. New York: Wiley

(1964).
18. Grenander, U. "On the Estimation of Regression
Coefficients in the Case of an Autocorrelated

Disturbance." Annals of Mathematical Statistics

(1954). pp. 252-272.

"A Note on the Serial Correlation Bias

l9. Griliches, Z.
Econometrica,

in Estimates of Distributed Lags."

(1961). pp. 65-73.
20. Griliches, Z. and P. Rao. "Small Sample PrOperties
of Several Two-Stage Regression Methods in the

Context of Autocorrelated Errors." Journal of the
American Statistical Association (1969), pp. 253-272.

21. Hart, B. and J. von Neumann. "Tabulation of the
Probabilities for the Ratio of the Mean Square
Annals

Successive Difference to the Variance."
of Mathematical Statistics (1942), pp. 207—2I4.

22. Henshaw, R. "Testing Single-Equation Least Squares
Regression Models for Autocorrelated Disturbances."

Econometrica (1966), pp. 646-660.

23.

24.

25.

26.

27.

28.

29.

30.

31.

32.

33.

34.

35.

95

Hildreth, C. "Asymptotic Distribution of Maximum
Likelihood Estimators in a Linear Model with
Autoregressive Disturbances." Annals of
Mathematical Statistics (1969), pp. 583-594.

 

Hildreth, C. and J. R. Lu. "Demand Relations with
Autocorrelated Disturbances." Agricultural
Experiment Station Technical Bulletin No. 276,
Michigan State University (1960).

Imhof, P. "Computing the Distribution of Quadratic
Forms in Normal Variables." Biometrika (1961),

 

Johnston, J. Econometric Methods. New York: McGraw—
Hill (1963).

 

Kadiyala, K. R. "Testing for the Independence of
Regression Disturbances." Econometrica (1970),
pp. 97-117.

 

 

Klein, L. "The Estimation of Distributed Lags."
Econometrica (1958), pp. 553-565.

 

Koerts, J. "Some Further Notes on Disturbance
Estimates in Regression Analysis." Journal of
the American Statistical Association (1967),
pp. 169-183.

 

 

Koerts, J. and A. P. J. Abrahamse. “On the Power of
the'BLUS Procedure." Journal of the American

Statistical Association (1953): PP- [227-1236.

Koerts, J. and A. P. J. Abrahamse. On the Theory and
Application of the General Linear Model.
Rotterdam: University of Rotterdam Press (1969).

 

 

Koyck, L. Distributed Lags and Investment Analysis.
Amsterdam: North Holland Publishing Company
(1955).

 

Lehman, E. Testing Statistical Hypotheses. New York:
Wiley (1959).

Liviatan, N. "Consistent Estimation of Distributed
Lags." International Economic Review (1963),
pp. 44-52.

Lyttkens, E. "Standard Errors of Regression Coeffi-
cients by Autocorrelated Residuals." In H. Wold,
Econometric Model Building: Essays on the Causal
Chain Approach. Amsterdam: North HoIland
Publishing Company (1964).

 

36.

37.

38.

39.

40.

41.

44.

45.

46.

47.

48.

96

Maddala, G. S. "Generalized Least Squares with an
Estimated Variance-Covariance Matrix."
Econometrica (forthcoming).

 

Malinvaud, E. Statistical Methods of Econometrics.
Chicago: Rand-McNally (1966).

 

Neumann, J. von. "Distribution of the Ratio of the
Mean-Square Successive Difference to the Variance."
Annals of Mathematical Statistics (1941).

 

Pearson, K. Tables of the Incomplete Beta Function.
Cambridge: Biometrika OffiCeIT1934).

 

Prais, S. J. and C. B. Winsten. "Trend Estimators
and Serial Correlation." Cowles Commission
Discussion Paper No. 383, Chicago (1954).

Press, S. and R. Brooks. "Testing for Serial Correla-
tion in Regression (revised)." Center for
Mathematical Studies in Business and Economics.
University of Chicago, Report 6911 (1969).

RAND Corporation. One Million Random Digits and One
Hundred Thousand Deviates. Santa Monica (1950).

 

 

Rubin, H. "On the Distribution of the Serial
Correlation Coefficient." Annals of Mathematical
Statistics (1945), pp. 211-215.

 

Sargan, J. D. "The Estimation of Economic Relation-
ships using Instrumental Variables." Econometrica
(1958), pp. 393-415.

 

Sargan, J. D. "The Maximum Likelihood Estimation of
Economic Relationships with Autoregressive
Residuals." Econometrica (1961), pp. 414-426.

 

Theil, H. "Analysis of Disturbances in Regression
Analysis." Journal of the American Statistical
Association (1965), pp. 1067-1079.

 

 

Theil, H. "A Simplification of the BLUS Procedure
for Analyzing Regression Disturbances." Journal
of the American Statistical Association (1968),
pp. 242-253.

 

Theil, H. and A. Nagar. "Testing the Independence of
Regression Disturbances." Journal of the American
Statistical Association (1961), pp. 793-806.

 

 

49.

50.

51.

52.

53.

97

Watson, G. S. "Serial Correlation in Regression
Analysis." Biometrika (1955), pp. 327-342.

 

 

Watson, G. S. "Linear Least Squares Regression."
Annals of Mathematical Statistics (1967), pp.
1679-1699.

Wickens, M. R. "The Consistency and Efficiency of

Generalized Least Squares in Simultaneous Equa-
tions Systems with Autocorrelated Errors."
Econometrica (1969), pp. 651-659.

 

Zellner, A. "An Efficient Method of Estimating
Seemingly Unrelated Regressions and Tests for
Aggregation Bias." Journal of the American
Statistical AssociatIOn (1962), pp. 348-368.

 

Zeller, A. and M. Geisel. "Analysis of Distributed
Lag Models with Applications to Consumption
Function Estimation." Econometrica (forthcoming).

 

 

"11111111111ES