REGRESSION ANALYSIS WITH SECOND-ORDER AUTOREGRESSIVE DISTURBANCES

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
PETER J. SCHMIDT
1970

This is to certify that the thesis entitled "Regression Analysis with Second-Order Autoregressive Disturbances" presented by Peter J. Schmidt has been accepted towards fulfillment of the requirements for the Ph.D. degree in Economics. Major professor. Date: November 19, 1970.

ABSTRACT

REGRESSION ANALYSIS WITH SECOND-ORDER AUTOREGRESSIVE DISTURBANCES

By Peter J. Schmidt

Autocorrelation is present in a regression equation when the unobservable random disturbances are not mutually independent over time. In the presence of autocorrelation, ordinary least squares will lead to inefficient estimators of the regression coefficients and to inconsistent estimators of their variances. Econometricians have therefore developed testing procedures to test for autocorrelation, and estimation procedures to alleviate the problems which it causes when it is present. These procedures must of necessity make some assumption about what types of autocorrelation might be present. In particular, it has usually been assumed that the disturbances follow a first-order autoregressive scheme.

This study considers autocorrelation in the more general form of a second-order autoregressive scheme. The usual testing and estimation procedures are generalized to this case. Finally, the new procedures are compared to the original procedures in terms of their performance in the presence of various types of autocorrelation. The results obtained indicate that these generalized testing and estimation procedures may be useful, at least when one does not have strong a priori reasons for believing the autocorrelation in the sample to be of first-order form.
REGRESSION ANALYSIS WITH SECOND-ORDER AUTOREGRESSIVE DISTURBANCES

By Peter J. Schmidt

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Economics

1970

ACKNOWLEDGEMENTS

This study would never have been begun, much less completed, without the constant encouragement, advice, and help of Jan Kmenta, my dissertation committee chairman. James Ramsey and Roy Gilbert also read the entire study and made many valuable suggestions. I also wish to thank Phoebus Dhrymes for his comments on Chapter III, and Richard Henshaw for his comments on earlier versions of Chapters IV and V.

The research on which this study is based was supported in part by the Mathematical Social Science Board Workshop on Lags in Economic Behavior, held at the University of Chicago in the summer of 1970. Marc Nerlove and G. S. Maddala, co-directors of the workshop, were kind enough to read a substantial part of an earlier draft and to suggest improvements. Any remaining errors are of course my own responsibility.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES

Chapter
I. INTRODUCTION
   1.1 Statement of the Problem
   1.2 Types of Autocorrelation
II. ESTIMATION IN A LINEAR MODEL
   2.1 Introduction
   2.2 Generalization to the Second-Order Case
   2.3 Properties of the GLS Estimators
   2.4 The Experiment
   2.5 Results
   2.6 Summary
III. ESTIMATION IN A DISTRIBUTED LAG MODEL
   3.1 Introduction
   3.2 Asymptotic Properties of the Estimators
   3.3 Small Sample Properties of the Maximum Likelihood Estimators
   3.4 Results of the Experiment
   3.5 Comments and Summary
IV. TESTING FOR SECOND-ORDER AUTOCORRELATION: A GENERALIZATION OF THE DURBIN-WATSON TEST
   4.1 Introduction
   4.2 Calculation of Significance Points
   4.3 Approximations
   4.4 The Bounds Test
V. THE POWER OF THE GENERALIZED DURBIN-WATSON TEST
   5.1 Analytical Results
   5.2 A Monte Carlo Comparison of the Tests
   5.3 Summary
VI. CONCLUDING REMARKS

REFERENCES

LIST OF TABLES

Variance of β̂
Mean of σ̂_u²
N = 20, τ = 0.25
N = 20, τ = 0.75
N = 50, τ = 0.75
N = 100, τ = 0.75
0.01 level critical values of (d₂)_L and (d₂)_U
0.05 level critical values of (d₂)_L and (d₂)_U
0.10 level critical values of (d₂)_L and (d₂)_U
Number of rejections per 1,000 trials under the null hypothesis of no autocorrelation
Means and standard deviations of d₁ and d₂ under the null hypothesis
Number of rejections under first-order autocorrelation
Means and standard deviations of d₁ and d₂ under first-order autocorrelation
Number of rejections under second-order autocorrelation
Means and standard deviations of d₁ and d₂ under second-order autocorrelation
Number of rejections under second-order autocorrelation (ρ₁ = 0)
Means and standard deviations of d₁ and d₂ under second-order autocorrelation (ρ₁ = 0)

CHAPTER I

INTRODUCTION

1.1 Statement of the Problem¹

Consider the linear regression model

    y_i = β₁x_{i1} + β₂x_{i2} + … + β_K x_{iK} + u_i,   i = 1,2,…,N;   (1.1)

where each β_j is a parameter to be estimated; x_{ij} is the ith observation on the jth independent variable (regressor); y_i is the ith observation on the dependent variable in the regression; and u_i is a random disturbance.
This model can be rewritten in matrix form:

    y = Xβ + u;   (1.2)

where y, β and u are vectors and X is a matrix, defined by

    y = [y₁, y₂, …, y_N]′,   u = [u₁, u₂, …, u_N]′,   β = [β₁, β₂, …, β_K]′,   X = [x_{ij}], i = 1,…,N, j = 1,…,K.   (1.3)

¹This section closely follows [17], Section 5.4.

This model is said to satisfy the full ideal conditions (FIC)² if u is stochastically independent of X, if E(u) = 0,³ if X has rank K ≤ N, and if E(uu′) = σ²I (where I is the N-dimensional identity matrix and σ² is a parameter to be estimated). On the other hand, autocorrelation is said to be present when Cov(u_i, u_j) ≠ 0 for some i ≠ j. That is, the disturbances are said to be autocorrelated if E(uu′) = σ²Ω, where Ω is a non-diagonal positive semi-definite matrix. Autocorrelation is typically considered in the context of time-series analysis; it is then the case in which the disturbances are correlated over time. This study will be concerned with cases in which the non-diagonality of Ω is the only violation of the FIC; that is, E(u) = 0 and u is distributed independently of X, but Ω has non-zero terms off the diagonal.

The ordinary least squares (OLS) estimator of β is defined by

    β̂ = (X′X)⁻¹X′y   (1.4)

and an associated estimator of σ² is

    s² = y′My / (N − K),   where M = I − X(X′X)⁻¹X′.   (1.5)

²This terminology is due to [6].

³The symbol 0 will be used to denote an appropriately dimensioned matrix or vector of zeroes.

The covariance matrix of β̂ under the FIC is equal to σ²(X′X)⁻¹, and can be estimated by replacing σ² by s². Now, it is well known that, under the FIC, β̂ is best linear unbiased, consistent, and also asymptotically efficient if the disturbances are Normally distributed; s² is unbiased, consistent, and also asymptotically efficient if the disturbances are Normal. In fact, these desirable properties of the OLS estimators under the FIC constitute the chief rationale for the use of the OLS estimation procedure.
Unfortunately, however, these properties do not hold if the disturbances are autocorrelated. In this case the OLS estimator β̂ is still unbiased and consistent, but it is in general no longer best linear unbiased or asymptotically efficient. The estimator s² is in general biased and inconsistent. Furthermore, the covariance matrix of β̂ is no longer equal to σ²(X′X)⁻¹.

It is equally well known that these difficulties could be avoided through the application of generalized least squares (GLS) if the disturbance covariance matrix Ω were known. With Ω known, the GLS estimator of β is

    β̃ = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y,   (1.6)

while σ² is estimated by

    σ̃² = (M*y)′Ω⁻¹(M*y) / (N − K),   where M* = I − X(X′Ω⁻¹X)⁻¹X′Ω⁻¹.   (1.7)

The covariance matrix of β̃ is σ²(X′Ω⁻¹X)⁻¹. It is also sometimes useful to note that since Ω is by assumption positive definite, Ω⁻¹ is also positive definite, so that there must exist a (not necessarily unique) nonsingular matrix V such that

    Ω⁻¹ = V′V.   (1.8)

It should then be clear that if u has covariance matrix σ²Ω, Vu will have covariance matrix σ²I. Hence if the regression equation is rewritten

    Vy = VXβ + Vu,   (1.9)

the disturbances now satisfy the FIC, so that the OLS regression of Vy on VX is appropriate. Indeed, the OLS regression of Vy on VX is algebraically identical to the GLS procedure defined above.

The point of using GLS is of course that the GLS estimators (with Ω known) have the same optimal properties as do the OLS estimators under the FIC. That is, β̃ is best linear unbiased, consistent, and asymptotically efficient; σ̃² is unbiased, consistent, and asymptotically efficient. It should therefore be clear that autocorrelation is really a problem only in that Ω is generally not known. With Ω unknown, the above GLS procedure cannot be applied, at least not directly, and the question of what to do when Ω is unknown (but suspected to be non-diagonal) is not a trivial one. It is this question to which the rest of this study will be addressed.
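The algebraic identity between GLS and OLS on the transformed data is easy to check numerically. The sketch below is an illustration added to this text, not part of the thesis: the data are simulated, the AR(1)-type Ω is hypothetical, and V is taken from the Cholesky factorization of Ω⁻¹ (any V with V′V = Ω⁻¹ would do). It computes the GLS estimator (1.6) directly and then as OLS on (Vy, VX), per (1.9):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 30, 3
X = rng.standard_normal((N, K))
y = rng.standard_normal(N)

# Hypothetical disturbance covariance structure: AR(1)-type Omega.
rho = 0.5
Omega = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

Oinv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)   # (1.6)

# Cholesky factor L satisfies L L' = Omega^{-1}; take V = L', so V'V = Omega^{-1}.
V = np.linalg.cholesky(Oinv).T
beta_trans, *_ = np.linalg.lstsq(V @ X, V @ y, rcond=None)   # OLS of Vy on VX

assert np.allclose(beta_gls, beta_trans)
```

The two estimates agree to machine precision, which is the algebraic equivalence claimed above.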
1.2 Types of Autocorrelation

When Ω is unknown, an obvious procedure is to find a consistent estimate of it, and then to use GLS with the estimate Ω̂ replacing the (unknown) true covariance matrix Ω. The statistical justification for this procedure is the well-known fact that if GLS is applied with any consistent estimator Ω̂ used in place of Ω, the resulting estimators β̂ and σ̂² will be consistent and asymptotically efficient.⁴

Unfortunately, however, it is in general not possible to estimate Ω consistently. After all, Ω has ½N(N−1) distinct off-diagonal elements, and one can hardly hope to estimate them all from a sample of N observations. It should therefore be clear that one can hope to proceed only by putting fairly severe restrictions on Ω. Essentially, what must be done is to make Ω depend on some fixed number of parameters that does not depend on the sample size. These parameters can then (hopefully) be consistently estimated, Ω̂ can be constructed, and GLS can be applied.

⁴This theorem was proved for a special case in [52]. However, it holds whenever X is fully independent of u. For a full discussion see [36].

In particular, the procedure which has usually been used is to assume that the disturbances follow a first-order Markov process:

    u_i = ρ₁u_{i−1} + ε_i,   i = −∞, …, N;   (1.10)

where −1 < ρ₁ < 1, E(ε_i) = 0, E(ε_i u_{i−s}) = 0 for s > 0, E(ε_i ε_j) = 0 for i ≠ j, and σ² ≡ Var(ε_i) = Var(u_j)(1 − ρ₁²) ≡ σ_u²(1 − ρ₁²) for all i and j. (This is often referred to as a first-order autocorrelation scheme.) Then it is readily verified that Cov(u_i, u_j) = σ_u² ρ₁^|i−j|. Hence Ω is of the form:

    Ω = (1/(1 − ρ₁²)) ·
        [ 1          ρ₁         ρ₁²   …   ρ₁^(N−1) ]
        [ ρ₁         1          ρ₁    …   ρ₁^(N−2) ]
        [ ⋮                                ⋮       ]
        [ ρ₁^(N−1)   ρ₁^(N−2)   …          1       ]   (1.11)

Direct multiplication will verify that Ω⁻¹ is given by the tridiagonal matrix

    Ω⁻¹ =
        [ 1     −ρ₁      0      …    0      0   ]
        [ −ρ₁   1+ρ₁²   −ρ₁     …    0      0   ]
        [ 0     −ρ₁     1+ρ₁²   …    0      0   ]
        [ ⋮                          ⋮          ]
        [ 0      0       0      …   1+ρ₁²  −ρ₁  ]
        [ 0      0       0      …   −ρ₁     1   ]   (1.12)

As noted in the last section, Ω⁻¹ can be decomposed as Ω⁻¹ = V′V. V is in this case given by:⁵

    V =
        [ a     0     0   …    0    0 ]
        [ −ρ₁   1     0   …    0    0 ]
        [ 0    −ρ₁    1   …    0    0 ]
        [ ⋮                    ⋮      ]
        [ 0     0     0   …   −ρ₁   1 ],   where a = (1 − ρ₁²)^(1/2).   (1.13)

One last fact to note is that the determinant of Ω⁻¹ is equal to a² = 1 − ρ₁².

It is clear that in this case Ω depends on only one parameter, ρ₁. Given a consistent estimator ρ̂₁, Ω̂ can be formed and GLS can be applied. If the disturbances are in fact generated by a first-order Markov process, Ω̂ will be a consistent estimator of Ω, and the resulting GLS estimators will thus be asymptotically efficient.

⁵The first explicit statement of Ω⁻¹ and its decomposition is given in [40].

In general, however, there is no reason to suppose that a first-order scheme is appropriate. First-order autocorrelation is generally only an approximation to the (unknown) type of autocorrelation in the sample, and it is frequently assumed because it is a particularly simple way to make estimation possible. However, while it is true that the form of Ω must be restricted to make it estimable, it seems rather drastic to make Ω depend on only one parameter. After all, a more general type of autocorrelation scheme might provide a reasonable approximation to more different types of autocorrelation, and this would seem desirable when Ω is not known a priori to be of any particular form.

This study will be particularly concerned with the obvious generalization of the usual first-order procedures to the case of a second-order autocorrelation scheme:

    u_i = ρ₁u_{i−1} + ρ₂u_{i−2} + ε_i,   (1.14)

where |ρ₁| + |ρ₂| < 1 and the same assumptions about ε_i are made as in the first-order case, except that here Var(u_i) is equal to

    σ_u² = σ² / [1 − ρ₁² − ρ₂² − 2ρ₁²ρ₂/(1 − ρ₂)]
         = σ²(1 − ρ₂) / (1 − ρ₂ − ρ₁² − ρ₁²ρ₂ − ρ₂² + ρ₂³).   (1.15)

Then the following facts are readily verified:

    E(u_i u_{i−1}) = σ_u² ρ₁/(1 − ρ₂),   (1.16)

    E(u_i u_{i−2}) = σ_u² [ρ₂ + ρ₁²/(1 − ρ₂)],   (1.17)

    E(u_i u_{i−s}) = ρ₁ E(u_{i−1} u_{i−s}) + ρ₂ E(u_{i−2} u_{i−s}),   s > 2.   (1.18)

Ω can thus be written out, though this becomes rather tedious since the terms become horrendous as one moves away from the diagonal.
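The first-order decomposition can be verified numerically. The sketch below (an illustration added to this text, with an arbitrary ρ₁ and a small N) builds Ω of (1.11), the V of (1.13), and checks both that V′V reproduces the tridiagonal inverse (1.12) and that det(Ω⁻¹) = 1 − ρ₁²:

```python
import numpy as np

def ar1_V(rho, n):
    """Transformation matrix V of (1.13) for a first-order scheme."""
    V = np.eye(n)
    V[0, 0] = np.sqrt(1.0 - rho**2)   # a = (1 - rho1^2)^(1/2)
    for i in range(1, n):
        V[i, i - 1] = -rho
    return V

rho, n = 0.6, 8
# Omega of (1.11): correlations rho^{|i-j|}, scaled by 1/(1 - rho^2).
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / (1 - rho**2)
V = ar1_V(rho, n)

assert np.allclose(V.T @ V, np.linalg.inv(Omega))        # V'V = Omega^{-1}, cf. (1.12)
assert np.isclose(np.linalg.det(V.T @ V), 1 - rho**2)    # det(Omega^{-1}) = 1 - rho1^2
```

Both assertions hold for any |ρ₁| < 1, which is the content of (1.12)-(1.13).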
The useful expression in any case is Ω⁻¹, which is given by

    Ω⁻¹ =
        [ 1      −ρ₁         −ρ₂          0           …    0           0          0    ]
        [ −ρ₁    1+ρ₁²       ρ₁(ρ₂−1)    −ρ₂          …    0           0          0    ]
        [ −ρ₂    ρ₁(ρ₂−1)    1+ρ₁²+ρ₂²   ρ₁(ρ₂−1)    …    0           0          0    ]
        [ 0      −ρ₂         ρ₁(ρ₂−1)    1+ρ₁²+ρ₂²   …    0           0          0    ]
        [ ⋮                                                ⋮                           ]
        [ 0      0           0            0           …   1+ρ₁²+ρ₂²  ρ₁(ρ₂−1)   −ρ₂   ]
        [ 0      0           0            0           …   ρ₁(ρ₂−1)   1+ρ₁²      −ρ₁   ]
        [ 0      0           0            0           …   −ρ₂        −ρ₁         1    ]   (1.19)

Note that this is a band diagonal matrix, with five non-zero bands. One can again decompose Ω⁻¹ as V′V, where in this case V is given by

    V =
        [ a      0     0    0   …    0     0    0 ]
        [ b      c     0    0   …    0     0    0 ]
        [ −ρ₂   −ρ₁    1    0   …    0     0    0 ]
        [ 0     −ρ₂   −ρ₁   1   …    0     0    0 ]
        [ ⋮                          ⋮            ]
        [ 0      0     0    0   …   −ρ₂   −ρ₁   1 ]   (1.20)

and where

    a = [1 − ρ₂² − ρ₁²(1 + ρ₂)/(1 − ρ₂)]^(1/2),
    b = −ρ₁ [(1 + ρ₂)/(1 − ρ₂)]^(1/2),
    c = (1 − ρ₂²)^(1/2).   (1.21)

It is also clear that the determinant of Ω⁻¹ is equal to

    a²c² = 1 − ρ₁² − 2ρ₂² − 2ρ₁²ρ₂ − ρ₁²ρ₂² + ρ₂⁴.

Finally, considering this second-order scheme, it should be repeated that the reason for doing so is not that there is necessarily any a priori reason for believing it to be common in actual data. The point is simply that the assumption of first-order autocorrelation may be unduly restrictive.

CHAPTER II

ESTIMATION IN A LINEAR MODEL

2.1 Introduction

Consider the linear model defined by (1.1). Then it is clear, under the assumption of first-order autocorrelation, that GLS can be applied to get asymptotically efficient estimates if one can obtain a consistent estimator ρ̂₁ with which to construct Ω̂.

Several methods of obtaining a consistent estimator ρ̂₁ have been proposed in the literature. One method, suggested by Durbin,¹ is to use for ρ̂₁ the OLS estimator of the coefficient of y_{i−1} in the transformed equation

    y_i = ρ₁y_{i−1} + β₁X_{i,1} − β₁ρ₁X_{i−1,1} + … + β_K X_{i,K} − β_K ρ₁X_{i−1,K} + (u_i − ρ₁u_{i−1}),   i = 2,3,…,N.   (2.1)

This estimator is consistent. Hildreth and Lu² have suggested a modification of this procedure in which (2.1) is estimated subject to the constraints that

¹[10] and [11].

²[24].
The estimators it yields are in fact the maximum likelihood estimators, conditional on yl. This procedure gives a consistent and asymptotically efficient estimate of pl. Finally, an estimator which will be referred to as the C-0 estimator (after Cochrane and Orcutt3) is the following: N~~ A g ui i-l pl = 'EriT:;-— (2.3) the fii being the OLS residuals from the regression of y on X. This estimator is consistent. As noted, if the disturbances do in fact follow a first-order scheme, each of these estimators is con- sistent. Hence in this case GLS based on any of the 61 above will yield asyptotically efficient results. It should also be noted that in actual econometric practice the usual procedure in applying GLS is not to actually form 0 or 0-1, but rather to use the equivalent procedure of forming V and applying OLS to the transformed 3Actually, the estimator defined in their article differs from the one defined above by a factor of N/(N-l). l3 equation Vy = VXB + Vu. Notice that disregarding the first row of V, this amounts to the following regression: (Y1 ' piYi-i’ = B1(Xi,1 ' plXi-l,l) T °" + BK(Xi,K ‘ plxi-1,K) + (“i ’ plui-l)’ i = 2,3,...,N. (2.4) Disregarding the first row of V is thus equivalent to discarding the first observation, and this has commonly been done. This common "approximation" to the actual GLS procedure (which would include the first observation with a "weight" of (l - 512)%)jrsclearly asymptotically equivalent to GLS. 2.2 Generalization to the Second—Order Case In the case of second-order autocorrelation it is of course necessary to estimate both p1 and oz in order to form 0. Fortunately, consistent estimators pl and oz can be obtained by straightforward generalizations of the procedures of the last section. Durbin's method can be generalized to this case4 by applying OLS to the trans- formed equation: 4In fact, it was presented in general qth order form (q any integer) in [11]. 
    y_i = ρ₁y_{i−1} + ρ₂y_{i−2} + β₁X_{i,1} − β₁ρ₁X_{i−1,1} − β₁ρ₂X_{i−2,1} + … + β_K X_{i,K} − β_K ρ₁X_{i−1,K} − β_K ρ₂X_{i−2,K} + (u_i − ρ₁u_{i−1} − ρ₂u_{i−2}),   i = 3,4,…,N.   (2.5)

The maximum likelihood procedure (conditional on y₁ and y₂) is to estimate the above equation subject to the constraints that the coefficient of X_{i−q,j} equal −β̂_j ρ̂_q, j = 1,…,K; q = 1,2.   (2.6)

The C-O method can be generalized in at least two ways. The first would be to estimate ρ₁ and ρ₂ by the OLS regression of û_i on û_{i−1} and û_{i−2}. A somewhat more informative way is to note that if one defines

    ρ₁* = Σ_{i=2}^{N} û_i û_{i−1} / Σ_{i=1}^{N} û_i²,   (2.7)

    ρ₂* = Σ_{i=3}^{N} û_i û_{i−2} / Σ_{i=1}^{N} û_i²,   (2.8)

then ρ₁* and ρ₂* are consistent estimators of Cov(u_i, u_{i−1})/σ_u² and Cov(u_i, u_{i−2})/σ_u² respectively; from (1.16) and (1.17) it is clear that these expressions are not in general equal to ρ₁ and ρ₂. However, consistent estimators can be derived by setting ρ₁* and ρ₂* equal to their probability limits given by (1.16) and (1.17) and then solving for ρ₁ and ρ₂. That is, solve the following equations for ρ̂₁ and ρ̂₂:

    ρ₁* = ρ̂₁/(1 − ρ̂₂),   (2.9)

    ρ₂* = ρ̂₂ + ρ̂₁²/(1 − ρ̂₂).   (2.10)

The solution is

    ρ̂₁ = ρ₁*(1 − ρ₂*)/(1 − ρ₁*²),   (2.11)

    ρ̂₂ = (ρ₂* − ρ₁*²)/(1 − ρ₁*²).   (2.12)

Once again the estimators ρ̂₁ and ρ̂₂ are consistent. Hence if the disturbances do follow a second-order autocorrelation scheme, GLS (using Ω̂ constructed from ρ̂₁ and ρ̂₂) will yield asymptotically efficient estimates of β and σ².

Finally, it is once again apt to be computationally simpler to form V̂ (rather than Ω̂ or Ω̂⁻¹) and to apply OLS to the transformed equation Vy = VXβ + Vu. Disregarding the first two rows of V, this amounts to the following:

    (y_i − ρ̂₁y_{i−1} − ρ̂₂y_{i−2}) = β₁(X_{i,1} − ρ̂₁X_{i−1,1} − ρ̂₂X_{i−2,1}) + … + β_K(X_{i,K} − ρ̂₁X_{i−1,K} − ρ̂₂X_{i−2,K}) + (u_i − ρ̂₁u_{i−1} − ρ̂₂u_{i−2}),   i = 3,4,…,N.   (2.13)

Clearly this is asymptotically equivalent to the actual GLS procedure, which would include the first and second observations with appropriate weights given by the elements in the first two rows of V.
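The generalized C-O calculation can be sketched compactly. The function names below are ours, not the thesis's: compute ρ₁* and ρ₂* from OLS residuals per (2.7)-(2.8), solve (2.9)-(2.10) via (2.11)-(2.12), and quasi-difference the data per (2.13). The final assertions check the solution formulas against the forward map:

```python
import numpy as np

def solve_rhos(r1_star, r2_star):
    """Solve (2.9)-(2.10) for rho1, rho2 -- i.e., apply (2.11)-(2.12)."""
    rho2 = (r2_star - r1_star**2) / (1.0 - r1_star**2)
    rho1 = r1_star * (1.0 - rho2)    # equivalent to r1*(1 - r2*)/(1 - r1*^2)
    return rho1, rho2

def c_o_second_order(u_hat):
    """Generalized C-O estimates from OLS residuals, per (2.7)-(2.12)."""
    s0 = np.dot(u_hat, u_hat)
    r1_star = np.dot(u_hat[1:], u_hat[:-1]) / s0    # (2.7)
    r2_star = np.dot(u_hat[2:], u_hat[:-2]) / s0    # (2.8)
    return solve_rhos(r1_star, r2_star)

def gls2_data(y, X, rho1, rho2):
    """Quasi-difference per (2.13), discarding the first two observations."""
    yt = y[2:] - rho1 * y[1:-1] - rho2 * y[:-2]
    Xt = X[2:] - rho1 * X[1:-1] - rho2 * X[:-2]
    return yt, Xt

# Check (2.11)-(2.12) against (2.9)-(2.10) at a sample point:
r1, r2 = solve_rhos(0.5, 0.4)
assert np.isclose(r1, 0.4) and np.isclose(r2, 0.2)
assert np.isclose(0.5, r1 / (1 - r2))               # (2.9)
assert np.isclose(0.4, r2 + r1**2 / (1 - r2))       # (2.10)
```

The round trip confirms the algebra: ρ₁* = 0.5, ρ₂* = 0.4 corresponds exactly to ρ̂₁ = 0.4, ρ̂₂ = 0.2.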
2.3 Properties of the GLS Estimators

Let us denote by GLS1 the GLS procedure which assumes first-order autocorrelation. That is, GLS1 means GLS with Ω̂ formed from ρ̂₁, with ρ̂₁ calculated by any of the methods of section 2.1. Similarly, let GLS2 be the GLS procedure using Ω̂ formed from ρ̂₁ and ρ̂₂, with ρ̂₁ and ρ̂₂ calculated by any of the methods of section 2.2. In this section we will compare the efficiency of OLS, GLS1, and GLS2 under various specifications of the form of Ω.

Consider first the asymptotic properties of the various estimation procedures. These asymptotic comparisons can be easily made since, as noted in section 1.2, estimation based on a consistent estimator of Ω will yield asymptotically efficient results, while estimation based
Finally, if the disturbances follow a second-order scheme with p2 # 0, only GLSZ will be asymptotically efficient; OLS and GLSl will both yield asymptotically inefficient results. 18 This can be summarized by the general statement that estimation based on the assumption of the "correct" (true) order or on the assumption of "too high" an order of autocorrelation (i.e., an order of autocorrelation greater than the true order) will lead to asymptotically efficient estimators 8 and 82. On the other hand, estimation based on the assumption of "too low" an order of autocorrelation will in general yield asymptotically inefficient estimates of B and inconsistent estimates of 62. In other words, in infinite samples one should always prefer a higher order of autocorrelation, since this will minimize the chance of getting inefficient estimates.5 Of course, in small samples none of this need be true. In fact, there are almost no analytic results on the small sample properties of GLS estimators with an estimated covariance matrix; questions of this sort are of necessity usually investigated by Monte Carlo methods.6 The next section will describe a Monte Carlo experiment which attempts to examine the small sample prOperties of OLS, GLSl and GLSZ under various specifications of 0. In particular, the question which we attempt to answer 5This discussion has ignored differences in com- putational costs. 6For an example of an experiment comparing the small-sample properties of OLS and GLSl, see [20]. l9 concerns the size of the loss that results in small samples (if in fact any loss does result) from assuming too low or too high an order of autocorrelation. The criteria for choosing the superior estimation procedure will be the variance of the estimator of B and the bias of the estimator of Ou2'7 2.4 The Experiment The eXperiment was conducted in the context of the simple regression model: yi = a + BXi + ui, (2.14) with a, B and on2 all being taken equal to one. 
The generation of the values of X and u is described below; given X and u, observations on y were created and (2.14) was estimated in each of three ways:

A. OLS.
B. GLS1, with ρ̂₁ estimated by the C-O method given by (2.3) of section 2.1.
C. GLS2, with ρ̂₁ and ρ̂₂ estimated by the C-O method given by (2.7)-(2.12) of section 2.2.⁸

⁷The bias of σ̂_u² is considered rather than its variance or mean square error because one is typically not interested in σ̂_u² per se, but only to make confidence statements, tests, etc. It is therefore most important that σ̂_u² not be strongly biased in one direction or the other.

⁸This perhaps deserves a comment. Asymptotically it makes no difference how the ρ's are estimated, as long as the estimators used are consistent. However, in small samples different ways of estimating the ρ's may lead to considerably different results. This is not investigated here because the purpose of this experiment is to compare OLS and GLS under various types of autocorrelation, and to introduce different versions of GLS based on different ways of computing the ρ's would tend to confuse the issue.

In all cases the GLS procedures used were not actually the true GLS procedures, but rather the common approximation to GLS of applying OLS to the transformed equations (2.4) and (2.13), as described above.⁹ This procedure was repeated for 100 independent trials under each of a variety of specifications, and the results of the 100 trials were used to calculate the variance of β̂ and the mean of σ̂_u².

⁹As just noted, this makes no difference asymptotically. The approximate procedure is used here because it is typically used in econometric practice.

To create observations on X, X₁ was taken to be a N(0,1) deviate from a listing of random deviates prepared by the Rand Corporation.¹⁰

¹⁰[42].

The remaining X_i were generated as follows:

    X_i = τX_{i−1} + (1 − τ²)^(1/2) δ_i,   i = 2,3,…,N;   (2.15)

where δ_i is a N(0,1) deviate independent of X₁ and of previous δ's. τ is thus the correlation between successive
The approximate procedure is used here because it is typically used in econometric practice. 1"[42]. 21 Xi' Two values of I were used, 0.2 and 0.8, since it is well known that the properties of the estimators may depend on the correlation between the Xi. Hence this correlation was held constant at each of the two levels. Two values of N (sample size) were considered, 20 and 100. The results for sample size 20 are designed to show small sample properties; sample size 100 was included so as to get an idea of the results with a somewhat larger sample, and to see if the known asymptotic properties of the estimators begin to emerge. Observations on u were obtained by reading the N(0,l)-deviates £1, €2""'§N' independent of each other and of the X's, from the Rand listing, and applying the suitable transformation (to be described) for each specification. Given u, y could be constructed, and the various estimation procedures could be applied. First- and second-order autocorrelation schemes were considered, with pl and p2 taking on all possible values among 0.0, 0.2, 0.4, 0.6 and 0.8, subject to the restriction that pl + 02 < 1. To simulate the null hypothesis of no autocor- relation, the independent N(0,l)-deviates 5i were simply left untransformed; that is, 111 = ii for all i. To simulate first order autocorrelation with parameter pl, the deviates 5i were transformed as follows: 22 1/ 2)2 8-, i=2,3,...,N, (2.16) u' = p1111-1 + (1791 1 where the factor (1-plz)l/2 is included to ensure that the ui will have a variance of l for all i. Finally, second order autocorrelation was simulated by applying the following transformation to the so: 1 u1 = 6l u = p u + (l-p 2)2 e 2 1 l l 2 ;, uJ pluj-l + p2uj_2 + aj2 ej, j = 3,4,...,N (2.17) where -3 2 2 2 3 r aj = l - pl - p2 - 2pl pz 2 0 p2. (2.18) r: Again the aj are taken so as to ensure that the ui will . 11 have constant variance. 
2.5 Results

Table 2.1 gives the variance of the estimates of β under the various specifications of the model, and Table 2.2 gives the mean of the estimates of σ_u². For each specification of the model an asterisk (*) marks the estimated minimum variance estimator of β and the estimated least biased estimator of σ_u².

¹¹Note two things. First, it is σ_u² which is being held constant rather than σ². Second, this scheme is not precisely the same as that defined in section 1.2, as it does not start at −∞. However, all the covariances converge to those of section 1.2 as i increases, and even with N = 20 the difference should be negligible.

TABLE 2.1.--Variance of β̂.

                          N = 20                        N = 100
  ρ₁    ρ₂      OLS      GLS1     GLS2        OLS      GLS1     GLS2

  τ = 0.2
  0.0   0.0   0.0636*  0.0744   0.0867     0.0105*  0.0113   0.0114
  0.2   0.0   0.0711*  0.0787   0.0916     0.0108   0.0108   0.0107*
  0.4   0.0   0.0777   0.0720*  0.0820     0.0112   0.0084   0.0083*
  0.6   0.0   0.0815   0.0545*  0.0597     0.0117   0.0053   0.0052*
  0.8   0.0   0.0712   0.0259*  0.0364     0.0118   0.0023*  0.0023
  0.0   0.2   0.0626*  0.0728   0.0817     0.0105   0.0115   0.0102*
  0.0   0.4   0.0610*  0.0710   0.0660     0.0101   0.0111   0.0078*
  0.0   0.6   0.0580   0.0644   0.0432*    0.0095   0.0103   0.0050*
  0.0   0.8   0.0490   0.0460   0.0210*    0.0090   0.0088   0.0023*
  0.2   0.2   0.0708*  0.0756   0.0843     0.0107   0.0105   0.0093*
  0.4   0.2   0.0771   0.0644*  0.0692     0.0114   0.0074   0.0067*
  0.6   0.2   0.0729   0.0373*  0.0381     0.0125   0.0039   0.0036*
  0.2   0.4   0.0692   0.0706   0.0658*    0.0108   0.0100   0.0069*
  0.4   0.4   0.0722   0.0511   0.0462*    0.0128   0.0065   0.0046*
  0.2   0.6   0.0616   0.0580   0.0393*    0.0112   0.0092   0.0042*

  τ = 0.8
  0.0   0.0   0.0909*  0.1034   0.1179     0.0113*  0.0120   0.0125
  0.2   0.0   0.1141*  0.1278   0.1450     0.0146*  0.0160   0.0166
  0.4   0.0   0.1401*  0.1430   0.1614     0.0191   0.0190*  0.0196
  0.6   0.0   0.1667   0.1359*  0.1553     0.0257   0.0179*  0.0182
  0.8   0.0   0.1806   0.1011*  0.1187     0.0355   0.0103*  0.0105
  0.0   0.2   0.0954*  0.1064   0.1237     0.0136*  0.0143   0.0146
  0.0   0.4   0.0983*  0.1075   0.1150     0.0159   0.0166   0.0147*
  0.0   0.6   0.0993   0.1071   0.0929*    0.0181   0.0187   0.0119*
  0.0   0.8   0.0891   0.0951   0.0612*    0.0196   0.0201   0.0066*
  0.2   0.2   0.1237*  0.1306   0.1482     0.0185*  0.0191   0.0188
  0.4   0.2   0.1556   0.1398*  0.1566     0.0261   0.0207   0.0198*
  0.6   0.2   0.1818   0.1230*  0.1402     0.0393   0.0146   0.0141*
  0.2   0.4   0.1345   0.1329*  0.1372     0.0243   0.0223   0.0176*
  0.4   0.4   0.1756   0.1401*  0.1471     0.0414   0.0210   0.0158*
  0.2   0.6   0.1412   0.1315   0.1166*    0.0347   0.0248   0.0128*

TABLE 2.2.--Mean of σ̂_u².

                          N = 20                        N = 100
  ρ₁    ρ₂      OLS      GLS1     GLS2        OLS      GLS1     GLS2

  τ = 0.2
  0.0   0.0   1.0086   1.0015*  0.9578     0.9931*  0.9919   0.9891
  0.2   0.0   0.9760*  0.9648   0.9176     0.9886*  0.9860   0.9819
  0.4   0.0   0.9236*  0.8969   0.8311     0.9802*  0.9726   0.9631
  0.6   0.0   0.8354*  0.7744   0.6835     0.9607*  0.9403   0.9186
  0.8   0.0   0.6515*  0.5309   0.4456     0.8982*  0.8436   0.7916
  0.0   0.2   0.9759*  0.9629   0.9284     0.9898*  0.9882   0.9826
  0.0   0.4   0.9303*  0.9075   0.8577     0.9814*  0.9792   0.9615
  0.0   0.6   0.8566*  0.8174   0.7320     0.9608*  0.9576   0.9123
  0.0   0.8   0.7181*  0.6493   0.4838     0.9046*  0.8990   0.7828
  0.2   0.2   0.9268*  0.9090   0.8748     0.9856*  0.9819   0.9703
  0.4   0.2   0.8534*  0.8118   0.7641     0.9906   0.9774   0.9453
  0.6   0.2   0.7075*  0.6142   0.5538     1.0101*  0.9666   0.8810
  0.2   0.4   0.8616*  0.8333   0.7859     0.9942*  0.9879   0.9428
  0.4   0.4   0.7654*  0.6998   0.6419     1.1094   1.0798*  0.9091
  0.2   0.6   0.7457*  0.7035   0.6244     1.0398   1.0264*  0.8808

  τ = 0.8
  0.0   0.0   1.0220   1.0146*  0.9688     0.9940*  0.9927   0.9897
  0.2   0.0   0.9820*  0.9693   0.9218     0.9869*  0.9845   0.9802
  0.4   0.0   0.9181*  0.8899   0.8265     0.9750*  0.9685   0.9587
  0.6   0.0   0.8115*  0.7545   0.6726     0.9502*  0.9333   0.9109
  0.8   0.0   0.5999*  0.5022   0.4230     0.8790*  0.8313   0.7783
  0.0   0.2   0.9836*  0.9706   0.9317     0.9886*  0.9871   0.9813
  0.0   0.4   0.9315*  0.9101   0.8532     0.9777*  0.9759   0.9583
  0.0   0.6   0.8508*  0.8142   0.7142     0.9543*  0.9518   0.9070
  0.0   0.8   0.7080*  0.6394   0.4777     0.8961*  0.8919   0.7771
  0.2   0.2   0.9231*  0.9044   0.8671     0.9804*  0.9772   0.9655
  0.4   0.2   0.8311*  0.7918   0.7478     0.9794*  0.9687   0.9370
  0.6   0.2   0.6564*  0.5813   0.5291     0.9883*  0.9506   0.8703
  0.2   0.4   0.8444*  0.8180   0.7661     0.9839*  0.9790   0.9347
  0.4   0.4   0.7181*  0.6662   0.6147     1.0862   1.0614*  0.8967
  0.2   0.6   0.7117*  0.6777   0.6023     1.0207   1.0104*  0.8688
Consider first the null hypothesis of no autocorrelation; that is, the case ρ₁ = ρ₂ = 0. In terms of the estimates of β, OLS is clearly best, and GLS1 is better than GLS2. The differences are considerably larger at sample size 20 than at sample size 100, as the asymptotic equivalence of these estimators under the FIC begins to show at the larger sample size. In terms of the estimates of σ_u² there is little difference between the various procedures, though GLS2 seems to give somewhat inferior estimates when N = 20. Finally, the value of τ does not seem to make much difference in this case.

The next specification considered is first-order autocorrelation. Consider first the efficiency of estimation of β. GLS1 clearly dominates GLS2 at sample size 20, though the difference is not terribly great; at sample size 100 they appear to be roughly equivalent, clearly reflecting their asymptotic equivalence in this case. Both GLS1 and GLS2 gave noticeable gains in efficiency over OLS, except for "small" values of ρ₁. The minimum value of ρ₁ necessary to result in a gain in efficiency over OLS was smaller for GLS1 than for GLS2, and for either GLS1 or GLS2 it was smaller with N = 100 than with N = 20. Also, the efficiency of either GLS1 or GLS2 compared to OLS was greater when τ = 0.2 than when τ = 0.8.¹² The results were fairly favorable to the use of GLS in small samples in that a ρ₁ of roughly 0.4 sufficed to make GLS1 more efficient than OLS, even with a sample size of only 20, while the "break-even point" for GLS2 was roughly 0.6.¹³

In terms of the bias of the estimates of σ_u², it is clear from Table 2.2 that OLS was markedly superior to either GLS procedure. It is somewhat troubling that this was true even with N = 100. It was true that the OLS estimator of σ_u² had the expected downward bias, but it turned out to be actually less biased than the GLS estimators.

¹²This should be expected, since it is well known that the C-O estimator ρ̂₁ is more severely biased the larger the value of τ. See, for example, the results in [20].

¹³Asymptotically, of course, either GLS1 or GLS2 would be more efficient than OLS for any ρ₁ ≠ 0, no matter how small.
In fact, a glance at the rest of Table 2.2 will quickly reveal that this was also true for almost all the other specifications considered. With respect to these last results, three points should be made. The first is that they could not hold asymptotically; apparently even sample size 100 is not large enough to reveal the asymptotic result in this case. The second point is that different results might have been obtained if σ̂² rather than σ̂ᵤ² had been considered. The third is that only the bias has been considered here; it is quite conceivable that the variance or even the mean square error of the GLS estimators might be smaller than that of the OLS estimator. We will return to these last two points in section 5 of the next chapter; for now the above results will simply be taken as they are.

12 This should be expected, since it is well known that the C-O estimator ρ̂₁ is more severely biased the larger the value of τ. See, for example, the results in [20].

13 Asymptotically, of course, either GLS1 or GLS2 would be more efficient than OLS for any ρ₁ ≠ 0, no matter how small.

The third specification considered was second-order autocorrelation with ρ₁ = 0, a special case of the general second-order scheme. Considering the variance of the estimators of β, GLS2 is more efficient than OLS or GLS1 except for "small" values of ρ₂, a small value of ρ₂ being 0.4 or less at sample size 20 and 0.2 or less at sample size 100. As before, the relative efficiency of the most efficient estimator is greater with the smaller value of τ. Comparison of OLS and GLS1 shows OLS to be generally superior, with very few exceptions. The difference was usually quite small, however. The superiority of OLS over GLS1 was slightly more noticeable in the samples of size 20; this is reasonable since OLS and GLS1 are asymptotically equivalent in this case.¹⁴

The last specification considered is second-order autocorrelation with both ρ₁ and ρ₂ non-zero. Consider the estimates of β.
GLS2 was typically most efficient, as would be expected, though GLS1 does quite well when τ = 0.8. With N = 100 GLS2 was always more efficient than GLS1, and GLS2 was more efficient than OLS in all cases except one. With sample size 20 GLS2 was more efficient than OLS in all cases except ρ₁ = ρ₂ = 0.2, and it was also more efficient than GLS1 except when ρ₂ = 0.2 and in a few cases when ρ₂ = 0.4 and τ = 0.8. Again it appears that the relative efficiency of the most efficient estimator is somewhat less with the larger value of τ. Finally, GLS1 is typically more efficient than OLS, especially when ρ₁ is large. This is especially noticeable at sample size 100.

14 This is true since ρ₁ = 0. Thus plim ρ̂₁ = ρ₁/(1 − ρ₂) = 0.

2.6 Summary

One implication of the last section is that GLS unfortunately does not seem to give less biased estimates of σᵤ² than OLS, even for fairly large sample sizes. As noted earlier, this point will be considered again in section 5 of the next chapter.

In terms of the variance of the estimates of β, however, GLS performed quite well. This was true even for samples as small as 20. In particular, the loss of efficiency in assuming too high an order of autocorrelation was fairly small, while the penalty for assuming too low an order was in many cases quite large. These are of course essentially the asymptotic results, and they showed through quite well in small samples.

One implication of these results is that GLS2 might seem to be a useful procedure, at least if one is primarily interested in efficient estimation of β. Asymptotically, there is no loss in using it unnecessarily, and one will gain by using it if autocorrelation is of second-order form. Even in small samples the gains from its use may be substantial, and the loss in using it unnecessarily (for example, if autocorrelation were of first-order form) is typically small. This makes the currently almost universal use of GLS1 in cases of suspected autocorrelation seem perhaps unjustified.
After all, there is frequently no particular reason to suppose that first-order autocorrelation is typically present in real data. The assumption of first-order autocorrelation is generally just a simplifying assumption made in order to make estimation possible. Second-order autocorrelation is a less restrictive assumption, and a second-order scheme ought to provide a reasonable approximation to more different types of autocorrelation than will a first-order scheme. Hence when autocorrelation is not known a priori to be of first-order form, GLS2 might be useful.

Finally, it cannot be overemphasized that the small-sample results obtained here are specific to the particular model used. Limited evidence is better than none, however, and these results may at least be useful in pointing the way for further analytical work in this area.

CHAPTER III

ESTIMATION IN A DISTRIBUTED LAG MODEL

3.1 Introduction

In the last chapter we introduced methods of estimation in a linear model in the context of second-order autocorrelation of the disturbances, and we analyzed the properties of the resulting estimators. In this chapter we will extend these results to the case of a common type of non-linear model, the distributed lag model.

The simplest distributed lag model is a model of the form

    y_i = β Σ_{j=0}^{∞} λ^j X_{i-j} + u_i ,    i = 1,2,...,N              (3.1)

where u_i, i = 1,2,...,N, is an unobserved random disturbance; X_i is either a fixed (non-stochastic) number or a random variable independent of the disturbances, with observed values X₁,...,X_N; β is a parameter to be estimated; λ is a parameter to be estimated, 0 ≤ λ ≤ 1; and y_i is the observed dependent variable in the model. Clearly the model in this form is not amenable to estimation; it is usually rewritten in one of two ways.
Lagging (3.1) by one observation, multiplying by λ and subtracting yields

    y_i = βX_i + λy_{i-1} + (u_i − λu_{i-1}),    i = 2,3,...,N            (3.2)

This is the so-called "Koyck transformation."¹ Alternately, defining

    η₀ = β Σ_{j=0}^{∞} λ^j X_{-j} ;    W_i(λ) = Σ_{j=1}^{i} λ^{i-j} X_j ,    (3.3)

(3.1) can be rewritten as

    y_i = βW_i(λ) + η₀λ^i + u_i .                                         (3.4)

This transformation was suggested by Klein.²

The model as written in (3.2) has a certain amount of attractiveness since it can apparently be estimated directly. However, it has long been realized that ordinary least squares applied to (3.2) will in general yield inconsistent results, as the disturbance (u_i − λu_{i-1}) is correlated with the regressor y_{i-1}. Koyck³ suggested a method for obtaining consistent estimates of β and λ;

1 [32].
2 Appendix to [28].
3 [32].

this method was reinterpreted by Klein⁴ in an errors in the variables framework. Liviatan⁵ has also suggested a method for obtaining consistent estimates. His procedure is essentially an instrumental variable one, with X_{i-1} serving as the instrument for y_{i-1}.

Both the Koyck-Klein procedure and the Liviatan procedure are fairly straightforward; their main disadvantage is that the resulting estimates are asymptotically inefficient. Assuming that the disturbances in (3.1) are normally distributed and meet the classical conditions (the FIC), asymptotically efficient estimates of β and λ can be obtained by maximum likelihood estimation. Following Klein,⁶ (3.1) is rewritten as (3.4). Then the log likelihood function is

    L = −(N/2) log 2π − (N/2) log σ² − (1/2σ²) Σ_{i=1}^{N} [y_i − βW_i(λ) − η₀λ^i]² .    (3.5)

Now since the estimator of σ² turns out to be independent of the estimators of β and λ (as will be shown in the next section), maximizing L is equivalent to minimizing

    L* = Σ_{i=1}^{N} [y_i − βW_i(λ) − η₀λ^i]²                              (3.6)

with respect to β, λ and η₀. Clearly the resulting normal equations will be highly non-linear.

4 [28].
5 [34].
6 Appendix to [28].
However, it was noted by Dhrymes⁷ and by Zellner and Geisel⁸ that if one knew λ, one could form W_i(λ) and λ^i and calculate the maximum likelihood estimates β̂ and η̂₀ by a simple regression of y_i on W_i(λ) and λ^i. When λ is unknown, the procedure is to "search" over the admissible range of λ, picking the value of λ which minimizes the sum of squared errors. The resulting values λ̂, β̂, and η̂₀ are then the maximum likelihood estimators; the maximum likelihood estimator of σ² is the sum of squared errors divided by N. It is well known that the estimators β̂, λ̂, and σ̂² are consistent and asymptotically efficient; their asymptotic covariance matrix is the inverse of the so-called "information matrix," which will be written out in the next section. The estimator η̂₀ is not consistent.

If the errors u_i are autocorrelated, the procedure outlined above must of course be modified somewhat. The usual case considered in the literature is once again the case in which the u_i follow a first-order autocorrelation scheme. This paper will treat the case of second-order autocorrelation; the usual results for the first-order case can be obtained by letting ρ₂ = 0.

7 [8].
8 [53].

For convenience, define Z₁(λ)′ = [W₁(λ) ... W_N(λ)], Z₂(λ)′ = [λ λ² λ³ ... λ^N], Z(λ) = [Z₁(λ) Z₂(λ)] and γ′ = (β η₀). Then the log likelihood function is

    L = −(N/2) log 2π − (1/2) log |σ²Ω| − (1/2σ²)[y − Z(λ)γ]′Ω⁻¹[y − Z(λ)γ].    (3.7)

Now letting Z*(λ) = VZ(λ) and y* = Vy (V defined as in (1.15)), and recalling that V′V = Ω⁻¹, and defining Q = determinant of Ω⁻¹ = 1 − ρ₁² − 2ρ₂² − 2ρ₁²ρ₂ − ρ₁²ρ₂² + ρ₂⁴, the log likelihood function can be rewritten as

    L = −(N/2) log 2π − (N/2) log σ² + (1/2) log Q − (1/2σ²)[y* − Z*(λ)γ]′[y* − Z*(λ)γ].    (3.8)

Given λ, ρ₁ and ρ₂, it is clear that γ̂ can be calculated by the least squares regression of y* on Z*(λ); call this estimator γ̂(λ,ρ₁,ρ₂).
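In the classical case (Ω = I, so that y* = y and Z*(λ) = Z(λ)), the conditional search over λ just described can be sketched as follows. This is an illustrative reconstruction, not the thesis's own program; the function names and the grid of trial λ values are assumptions, and the regression of y_i on W_i(λ) and λ^i follows (3.4).

```python
def ols(X, y):
    # Least squares of y on the columns of X: solve the normal equations
    # (X'X)b = X'y by Gaussian elimination; return (coefficients, SSE).
    n, k = len(y), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         + [sum(X[i][p] * y[i] for i in range(n))] for p in range(k)]
    for p in range(k):                       # elimination with partial pivoting
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for c in range(p, k + 1):
                A[r][c] -= f * A[p][c]
    b = [0.0] * k
    for p in range(k - 1, -1, -1):
        b[p] = (A[p][k] - sum(A[p][c] * b[c] for c in range(p + 1, k))) / A[p][p]
    sse = sum((y[i] - sum(X[i][c] * b[c] for c in range(k))) ** 2 for i in range(n))
    return b, sse

def klein_search(y, x, grid):
    # For each trial lambda, form W_i(lambda) and lambda^i, regress y on them
    # per (3.4), and keep the lambda minimizing the sum of squared errors.
    best = None
    for lam in grid:
        W, w = [], 0.0
        for xi in x:
            w = lam * w + xi                 # W_i(lam) = x_i + lam * W_{i-1}(lam)
            W.append(w)
        X = [[W[i], lam ** (i + 1)] for i in range(len(y))]
        (beta, eta0), sse = ols(X, y)
        if best is None or sse < best[0]:
            best = (sse, lam, beta, eta0)
    return best                              # (sse, lambda-hat, beta-hat, eta0-hat)
```

With autocorrelated disturbances the same routine would simply be applied to the transformed data y* and Z*(λ) for each trial (ρ₁, ρ₂), which is the three-dimensional search discussed below.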
By searching over λ and choosing the λ that minimizes the sum of squared errors, one gets the maximum likelihood estimators of γ and λ, conditional on ρ₁ and ρ₂; denote these by γ̂(ρ₁,ρ₂) and λ̂(ρ₁,ρ₂). Also σ̂²(ρ₁,ρ₂) can be calculated as the sum of squared errors divided by N. Now substitute γ̂(ρ₁,ρ₂) and σ̂²(ρ₁,ρ₂) for γ and σ² in (3.8) above to get

    L(ρ₁,ρ₂) = −(N/2) log 2π − (N/2) log σ̂²(ρ₁,ρ₂) + (1/2) log Q − Nσ̂²(ρ₁,ρ₂)/[2σ̂²(ρ₁,ρ₂)],    (3.9)

which simplifies to

    L(ρ₁,ρ₂) = −(N/2)(log 2π + 1) − (N/2) log [σ̂²(ρ₁,ρ₂) Q^{−1/N}].    (3.10)

Finally, to calculate the maximum likelihood estimates one then searches over ρ₁ and ρ₂ and selects those values ρ̂₁ and ρ̂₂ that minimize σ̂²(ρ₁,ρ₂) Q^{−1/N}.

Several comments are in order here. First, since σ̂²(ρ₁,ρ₂) is itself determined by a search over λ, what is required is a three-dimensional search. That is, given λ, ρ₁, and ρ₂, y* and Z*(λ) are formed and y* is regressed

(ρ₁ > 0) will tend to lead to small values of d₁; negative first-order autocorrelation will tend to lead to large values. It should also be clear that this test may not be very effective in detecting types of autocorrelation other than first-order; an obvious example of a type of autocorrelation to which it would be insensitive would be a second-order scheme with ρ₁ = 0.

In order to test for second-order autocorrelation, we will propose the second-order Durbin-Watson test, to be based on the test statistic

    d₂ = [ Σ_{i=2}^{N} (û_i − û_{i-1})² + Σ_{i=3}^{N} (û_i − û_{i-2})² ] / Σ_{i=1}^{N} û_i² .    (4.3)

2 The Durbin-Watson test is not applicable to a distributed lag model such as the one presented in the last chapter. A test which is asymptotically valid in such a model has been recently suggested in [13].

This test should be able to detect first- and second-order autocorrelation. It should also be able to detect more different types of autocorrelation than the ordinary first-order test, since the second-order scheme on which it is based should be able to approximate more different types of autocorrelation than can a first-order scheme.
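Both statistics are easy to compute directly from the least squares residuals. In the sketch below, the form of the d₂ numerator (first- plus second-lag squared differences) is a reconstruction of (4.3) consistent with the matrix A₂ given in section 4.2, and the cross-check verifies that the two ways of writing the statistic agree:

```python
def dw_stats(u):
    # d1 and the (reconstructed) second-order statistic d2 from residuals u.
    denom = sum(ui * ui for ui in u)
    num1 = sum((u[i] - u[i - 1]) ** 2 for i in range(1, len(u)))
    num2 = sum((u[i] - u[i - 2]) ** 2 for i in range(2, len(u)))
    return num1 / denom, (num1 + num2) / denom

def build_A2(n):
    # The N x N matrix A2 of section 4.2 (n >= 4): main diagonal
    # 2, 3, 4, ..., 4, 3, 2 and -1 in the first and second off-diagonals.
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4
        for j in (i - 2, i - 1, i + 1, i + 2):
            if 0 <= j < n:
                A[i][j] = -1
    A[0][0] = A[n - 1][n - 1] = 2
    A[1][1] = A[n - 2][n - 2] = 3
    return A

def quad_form_d(u, A):
    # The general form (4.5): d = u'Au / u'u.
    n = len(u)
    num = sum(A[i][j] * u[i] * u[j] for i in range(n) for j in range(n))
    return num / sum(ui * ui for ui in u)
```

The agreement of `dw_stats` with `quad_form_d(u, build_A2(n))` is exactly the identity û′A₂û = Σ(û_i − û_{i-1})² + Σ(û_i − û_{i-2})².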
In the next chapter we will discuss the power of both tests against various alternatives; the remainder of this chapter will be devoted to the consideration of the distribution of the above statistic under the null hypothesis of the FIC.

4.2 Calculation of Significance Points

The statistics d₁ and d₂ can each be written in matrix form as

    d_i = û′A_iû / û′û ,    i = 1,2;                                      (4.4)

where û is the vector of least squares residuals and A₁ and A₂ are N×N matrices defined as follows:

    A₁ =   1 -1  0  0 ...  0  0  0
          -1  2 -1  0 ...  0  0  0
           0 -1  2 -1 ...  0  0  0
           .  .  .  .      .  .  .
           0  0  0  0 ... -1  2 -1
           0  0  0  0 ...  0 -1  1

    A₂ =   2 -1 -1  0 ...  0  0  0  0
          -1  3 -1 -1 ...  0  0  0  0
          -1 -1  4 -1 ...  0  0  0  0
           .  .  .  .      .  .  .  .
           0  0  0  0 ... -1  4 -1 -1
           0  0  0  0 ... -1 -1  3 -1
           0  0  0  0 ...  0 -1 -1  2

This is useful since a number of results have been established in the literature for the distribution of a test statistic

    d = û′Aû / û′û ,                                                      (4.5)

where A is any real, non-singular, symmetric, positive definite matrix. In particular, define

    M = I − X(X′X)⁻¹X′,                                                   (4.6)

and let

    Z = MA.                                                               (4.7)

Then Z has N−K real positive characteristic roots (N and K being the dimensions of the regressor matrix X) and K zero roots; number the positive roots, in increasing order, π₁, π₂, ..., π_{N−K}. Durbin and Watson have shown³ that d can be rewritten as follows:

    d = Σ_{i=1}^{N−K} π_i v_i² / Σ_{i=1}^{N−K} v_i² ,                      (4.8)

where the v_i are independent N(0,1) variables. Now, following Koerts and Abrahamse,⁴ one can note that

    P(d < d*) = P[ Σ_{i=1}^{N−K} π_i v_i² < d* Σ_{i=1}^{N−K} v_i² ] = P[ Σ_{i=1}^{N−K} n_i v_i² < 0 ],

where n_i = π_i − d*. Using a result from Imhof,⁵ Koerts and Abrahamse note that

    P[ Σ_{i=1}^{N−K} n_i v_i² < 0 ] = 1/2 − (1/π) ∫₀^∞ sin[ (1/2) Σ_{i=1}^{N−K} arctan(n_i r) ] / { r Π_{i=1}^{N−K} (1 + n_i²r²)^{1/4} } dr .    (4.9)

Numerical integration is feasible since Imhof has provided the limit of the integrand as r → 0; it is (1/2) Σ n_i.

3 [14].
4 [30].
5 [25].
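Evaluating (4.9) numerically can be sketched as follows. The quadrature rule (composite Simpson), the truncation point, and the function name are illustrative choices, not those of the thesis:

```python
import math

def imhof_prob(n, upper=200.0, steps=40001):
    # P( sum_i n_i * v_i^2 < 0 ) for independent N(0,1) v_i, by numerically
    # integrating Imhof's formula (4.9) over [0, upper] with Simpson's rule.
    def integrand(r):
        if r == 0.0:
            return 0.5 * sum(n)              # Imhof's limit as r -> 0
        theta = 0.5 * sum(math.atan(ni * r) for ni in n)
        rho = 1.0
        for ni in n:
            rho *= (1.0 + ni * ni * r * r) ** 0.25
        return math.sin(theta) / (r * rho)
    h = upper / (steps - 1)                  # steps must be odd for Simpson
    acc = integrand(0.0) + integrand(upper)
    for k in range(1, steps - 1):
        acc += (4 if k % 2 else 2) * integrand(k * h)
    return 0.5 - (acc * h / 3.0) / math.pi
```

In practice one sets n_i = π_i − d*, with the π_i the positive roots of MA; the truncation bound (4.10) below can guide the choice of `upper`.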
He also provides a bound for the truncation error caused by integrating over the finite range [0,R]; to hold the truncation error to ε one must take R equal to

    R = { ε π [(N−K)/2] Π_{i=1}^{N−K} |n_i|^{1/2} }^{−2/(N−K)} .           (4.10)

Thus the exact probability that d lies below any value d* can be calculated by numerical integration, even though the form of the distribution of d is not known. This procedure was developed by Koerts and Abrahamse⁶ for the statistic d₁; it clearly can also be applied to d₂. All that need be done is to insert the proper A_i in (4.7); from there on the procedure is the same in each case.

6 See [30] or [31]. The same procedure was subsequently but independently developed in [41].

4.3 Approximations

The exact procedure of the preceding section has the drawback of being rather difficult computationally, so that one might sometimes wish to resort to an approximation procedure so as to save computational effort. Henshaw⁷ and Durbin and Watson⁸ have given fairly comprehensive reviews of the available procedures, so that only a few brief comments need be made here.

Given that d has been written as in (4.8), the moments of d are readily computed. In particular, as noted by Durbin and Watson,⁹

    E(d) = (1/(N−K)) Σ_{i=1}^{N−K} π_i = π̄                                (4.11)

and

    Var(d) = 2 Σ_{i=1}^{N−K} (π_i − π̄)² / [(N−K)(N−K+2)] .                 (4.12)

Now it has been proven that the distribution of d is asymptotically normal,¹⁰ but it is not known how good a fit the normal distribution would provide in small samples. In fact, there is some limited evidence to suggest that the beta distribution may provide a better approximation to the distribution of d,¹¹ and the beta distribution has generally been used to approximate the distribution of d. Since it is clear from (4.8) that the possible range of d is [π₁, π_{N−K}], and since the beta distribution over a given range is a two-parameter distribution, it is possible to fit a beta distribution having the same mean, variance and range as the true distribution of d.

7 [22].
8 [16].
9 [15].
10 [4].
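The moment matching just described can be sketched as follows. The closed-form solution for p and q, with the intermediate quantity h = E(x)/(1 − E(x)), is a reconstruction of (4.19) from (4.17)-(4.18); the function names are illustrative:

```python
def beta_fit(pis):
    # Fit a beta distribution to d by matching the mean, variance and range
    # implied by the positive roots pi_1, ..., pi_{N-K} (eqs. 4.11-4.19).
    m = len(pis)
    mean = sum(pis) / m                                            # E(d), (4.11)
    var = 2.0 * sum((p - mean) ** 2 for p in pis) / (m * (m + 2))  # Var(d), (4.12)
    lo, hi = min(pis), max(pis)
    ex = (mean - lo) / (hi - lo)                                   # E(x), (4.14)
    vx = var / (hi - lo) ** 2                                      # Var(x), (4.15)
    h = ex / (1.0 - ex)
    q = h / (vx * (1.0 + h) ** 3) - 1.0 / (1.0 + h)                # (4.19)
    p = q * h
    return p, q, lo, hi

def critical_value(x_alpha, lo, hi):
    # Map a beta critical value x_alpha (from incomplete beta tables)
    # back to the scale of d, per (4.20).
    return lo + x_alpha * (hi - lo)
```

By construction the fitted beta variable reproduces E(x) and Var(x) exactly; only the beta shape itself is an approximation.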
This is essentially the procedure of Henshaw; however, he goes to rather great lengths to avoid computing the eigenvalues of Z. Where direct eigenvalue calculation is possible, the following procedure is somewhat simpler, at least conceptually. Having calculated Z and its roots, calculate E(d) and Var(d) from (4.11) and (4.12). Normalize d to the range [0,1] by defining

    x = (d − π₁) / (π_{N−K} − π₁) .                                       (4.13)

Then clearly

    E(x) = (E(d) − π₁) / (π_{N−K} − π₁)                                   (4.14)

and

    Var(x) = Var(d) / (π_{N−K} − π₁)² .                                   (4.15)

Assume that x is a beta variable; clearly it has range [0,1]. It is well known that such a variable with density

    f(x) = [Γ(p+q) / Γ(p)Γ(q)] x^{p−1}(1 − x)^{q−1}                        (4.16)

has

    E(x) = p/(p+q)                                                        (4.17)

and

    Var(x) = pq / [(p+q)²(p+q+1)] .                                       (4.18)

Since E(x) and Var(x) are known from (4.14) and (4.15), (4.17) and (4.18) can be solved for p and q. The results can be stated as follows: with h = E(x)/[1 − E(x)],

    q = h / [Var(x)(1 + h)³] − 1/(1 + h) ,    p = qh .                     (4.19)

Given q and p, one can find the critical values of x in any table of the incomplete beta function¹² and get the corresponding critical value of d from the relation

    d_α = π₁ + x_α(π_{N−K} − π₁) .                                         (4.20)

This procedure fits a beta variable of the same mean and variance as d into the exact range of d; the only element of approximation is the use of the beta distribution. Durbin and Watson¹³ and Theil and Nagar¹⁴ have also proposed beta approximations, but each makes approximations about the mean, variance and range of d that are replaced here by exact results. Finally, this procedure also clearly applies to each of the d_i defined above. Once again all that need be done is to insert the proper A_i into (4.7) and from there on the procedure is identical in each case.

11 For some evidence see [43] or [5].

4.4 The Bounds Test

Because of the substantial computational burdens involved in the procedures of either of the last two sections, it would clearly be desirable to avoid them as often as possible.
Durbin and Watson have provided a partial solution in the case of the first-order test, by tabulating the critical points of statistics d_L and d_U whose critical points bound those of d₁.¹⁵ In this section we will provide similar bounds for the distributions of the higher order tests defined above.

12 For example, [39].
13 [15].
14 [48].

Again consider the statistic d = û′Aû/û′û, A being one of the A_i in (4.5). Then A has N−1 positive characteristic roots; number them in increasing order λ₁, λ₂, ..., λ_{N−1}. Recall that π₁, ..., π_{N−K} are the positive roots, in increasing order, of Z = MA. Then the basis for the present procedure is the fact, proved by Durbin and Watson,¹⁶ that

    λ_i ≤ π_i ≤ λ_{i+K′} ,    i = 1,2,...,N−K                              (4.21)

where K′ = K−1 = the number of regressors not including the constant term (which must be present). It is then natural to define the variables d_L and d_U as follows:

    d_L = Σ_{i=1}^{N−K} λ_i v_i² / Σ_{i=1}^{N−K} v_i² ,    d_U = Σ_{i=1}^{N−K} λ_{i+K′} v_i² / Σ_{i=1}^{N−K} v_i² .    (4.22)

15 [15].

Comparing these with (4.8), it is evident that the distribution of d is bounded by the distributions of d_L and d_U. Now, given the matrix A, the significance points of d_L and d_U can be calculated by the methods of section 4.2 and tabulated. Note that they depend on the matrix X only in that they depend on N and K′. The critical points of (d₁)_L and (d₁)_U have been tabulated in Durbin and Watson.¹⁷ Tables 4.1 - 4.3 of this paper contain the significance points of (d₂)_L and (d₂)_U.

To use the tables, simply compute the value of d₂ and compare it to the critical points of d_L and d_U for the given values of N and K′ and the desired alpha level. If d is less than the alpha level critical point of d_L, the null hypothesis is rejected at that alpha level. If d is greater than the critical point of d_U, the null hypothesis is accepted at that alpha level.
If d falls between the critical points of d_L and d_U, the test is inconclusive; the critical point of d itself can then be calculated by the methods of sections 4.2 or 4.3. This procedure again applies to all of the d_i defined in this paper.

17 [14].

TABLE 4.1.--0.01 level critical values of (d₂)_L and (d₂)_U [tabulated entries illegible in this copy]
TABLE 4.2.--0.05 level critical values of (d₂)_L and (d₂)_U [tabulated entries illegible in this copy]

TABLE 4.3.--0.10 level critical values of (d₂)_L and (d₂)_U [tabulated entries illegible in this copy]

TABLE 5.3.--Number of rejections under first-order autocorrelation [tabulated entries illegible in this copy]

TABLE 5.4.--Means and standard deviations of d₁ and d₂ under first-order autocorrelation [tabulated entries illegible in this copy]

The third specification considered is second-order autocorrelation. This is constructed by transforming the N(0,1) deviates ε_i, i = 1,2,...,N, as follows:

    u₁ = ε₁
    u₂ = ρ₁u₁ + (1 − ρ₁²)^{1/2} ε₂
    u_j = ρ₁u_{j-1} + ρ₂u_{j-2} + a_j ε_j ,    j = 3,4,...,N,              (5.10)

where

    a_j = [ 1 − ρ₁² − ρ₂² − 2ρ₁²ρ₂/(1 − ρ₂) ]^{1/2} .                      (5.11)

Table 5.5 gives the number of rejections for the six cases in which both ρ₁ and ρ₂ are non-zero, while Table 5.6 gives the means and standard deviations. Note that for all alpha levels and all sample sizes, the second-order test is most powerful.
This is true even when ρ₁ is large relative to ρ₂. Also note that the means of all the test statistics are lower, and the standard deviations higher, than under the null hypothesis.

Table 5.7 gives the number of rejections for the four cases in which ρ₁ = 0 but ρ₂ ≠ 0, with Table 5.8 giving the means and standard deviations. As might be expected, the second-order test is most powerful. In fact, the first-order test shows very little power.

TABLE 5.5.--Number of rejections under second-order autocorrelation [tabulated entries illegible in this copy]

TABLE 5.6.--Means and standard deviations of d₁ and d₂ under second-order autocorrelation [tabulated entries illegible in this copy]

TABLE 5.7.--Number of rejections under second-order autocorrelation (ρ₁ = 0) [tabulated entries illegible in this copy]

TABLE 5.8.--Means and standard deviations of d₁ and d₂ under second-order autocorrelation (ρ₁ = 0) [tabulated entries illegible in this copy]

Looking at the table of means and standard deviations, one can see that in this case the mean of the first-order test statistic actually increases over its value under the FIC. It is only because the standard deviation also increases as ρ₂ increases that we obtain more rejections than under the null hypothesis. In fact, if the increase in the mean predominated over the increase in the standard deviation, one could actually get fewer rejections in this case than under the null hypothesis. Clearly the first-order test is not suitable when ρ₁ = 0 and ρ₂ ≠ 0.

5.3 Summary

To summarize these results, some general patterns clearly appear. The first-order test appears to be most powerful for the case of first-order autocorrelation and the second-order test most powerful for second-order autocorrelation. Hence if a test is used which is of higher order than the true order of the autocorrelation scheme, some loss of power apparently results relative to the case in which the test of "correct" order is used.
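The disturbance-generation scheme (5.10)-(5.11) used in these experiments can be sketched as follows. The scaling factor is the reconstruction given in (5.11), chosen so that a stationary second-order process has unit variance; note that with ρ₂ = 0 it reduces to the familiar first-order factor (1 − ρ₁²)^{1/2}. Function names are illustrative:

```python
import math, random

def ar2_scale(rho1, rho2):
    # a_j of (5.11): innovation scale giving the stationary AR(2) process
    # u_j = rho1*u_{j-1} + rho2*u_{j-2} + a_j*e_j unit variance.
    return math.sqrt(1.0 - rho1 ** 2 - rho2 ** 2
                     - 2.0 * rho1 ** 2 * rho2 / (1.0 - rho2))

def ar2_disturbances(n, rho1, rho2, rng=random):
    # Transform N(0,1) deviates e_1,...,e_N per (5.10); assumes n >= 2.
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [e[0], rho1 * e[0] + math.sqrt(1.0 - rho1 ** 2) * e[1]]
    a = ar2_scale(rho1, rho2)
    for j in range(2, n):
        u.append(rho1 * u[j - 1] + rho2 * u[j - 2] + a * e[j])
    return u
```

The scale in (5.11) agrees with the Yule-Walker value of the innovation variance for a unit-variance AR(2) process, σ_ε² = 1 − ρ₁γ₁ − ρ₂γ₂ with γ₁ = ρ₁/(1 − ρ₂) and γ₂ = ρ₁γ₁ + ρ₂.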
However, the Monte Carlo evidence suggests that this loss is fairly small. On the other hand, use of a test of "too-low" order also forfeits power, and this loss can be very substantial indeed; this is especially true if the lower order ρ's are small. All these results agree with the intuition expressed in section 5.1. Of course, it should be clear that these results are dependent on the particular X matrices used in the experiment. Certain X matrices may (or may not) exist for which these conclusions would not hold.

To the extent that these conclusions are generally valid, however, they would seem to imply that one should perhaps be wary of using the ordinary Durbin-Watson test in the common case of testing for autocorrelation which is not known a priori to be of first-order form. Even if the autocorrelation in the sample should happen to be of first-order form, use of d₂ rather than d₁ would entail only a fairly small loss of power. On the other hand, cases do exist for which the use of d₁ rather than d₂ would entail an almost complete loss of power. To put the same point somewhat differently, the second-order Durbin-Watson test is more generally applicable than the first-order test, and it would appear to be useful in the general case of testing for autocorrelation of unknown form.

Finally, it should be noted that other tests have recently been proposed which are not tied to the idea of first-order autocorrelation. For example, Durbin⁹ has proposed a test based on the cumulative periodogram of the residuals which may be useful in detecting autocorrelation of a general nature; an interesting topic for further research would be to compare the power of this test with the test proposed here.

9 [12].

CHAPTER VI

CONCLUDING REMARKS

As noted in Chapter I, autocorrelation can cause serious problems in econometric regression analysis.
Econometricians have therefore developed testing procedures to test for its presence, and estimation procedures to alleviate the problems which it causes when it is present. These procedures must of necessity make some assumptions about what types of autocorrelation might be present. In particular, the testing and estimation procedures which have most commonly been used have been based on the assumption that the autocorrelation in the sample is of first-order form. If autocorrelation is present, but not of first-order form, one can only hope that a first-order scheme is in some sense a reasonable approximation to the true scheme. If it is not, the ordinary procedures may not be very appropriate.

Having argued that the usual first-order autocorrelation scheme is unduly restrictive, this study then proposed a generalization in the form of a second-order scheme. The common testing and estimation procedures were generalized to forms appropriate for this case. Finally, the new procedures were compared to the original procedures in terms of their performance in the presence of various types of autocorrelation.

It was typically found that the "best" procedures in each case were those which assumed the true order of autocorrelation. However, there was a fundamental asymmetry in that the losses involved in assuming too high an order of autocorrelation were generally rather small, while the losses involved in assuming too low an order were often quite serious. This would seem to imply that when one does not know a priori what type of autocorrelation is present, one should proceed under rather general assumptions about its form.

Of course, there must be some limit on how general a process one can assume and still get meaningful results. (After all, without some restrictions on Ω estimation is literally impossible.) This study does not claim to have discovered where that limit might lie.
However, it does seem clear that the assumption of second-order autocorrelation lies well within the permitted range of generality. It would therefore seem that testing and estimation procedures based on the assumption of second-order autocorrelation might often be more appropriate than those which assume first-order autocorrelation, at least when one does not have a priori knowledge of the true form of the disturbance term covariance matrix.

REFERENCES

1. Amemiya, T. "Specification Analysis in the Estimation of the Parameters of a Simultaneous Equation Model with Autocorrelated Residuals." Econometrica (1966), pp. 283-306.

2. Amemiya, T. and W. Fuller. "A Comparative Study of Alternative Estimators in a Distributed Lag Model." Econometrica (1967), pp. 509-529.

3. Anderson, R. L. "Distribution of the Serial Correlation Coefficient." Annals of Mathematical Statistics (1943), pp. 1-13.

4. Anderson, T. W. "On the Theory of Testing Serial Correlation." Skandinavisk Aktuarietidskrift (1948), pp. 88-116.

5. Anderson, R. L. and T. W. Anderson. "Distribution of the Circular Correlation Coefficient for Residuals from a Fitted Fourier Series." Annals of Mathematical Statistics (1950), pp. 59-81.

6. Anscombe, F. "Examination of Residuals." Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I, University of California Press (1961), pp. 1-36.

7. Cochrane, D. and G. H. Orcutt. "Application of Least-Squares Regression to Relationships Containing Autocorrelated Error Terms." Journal of the American Statistical Association (1949), pp. 32-61.

8. Dhrymes, P. "Efficient Estimation of Distributed Lags with Autocorrelated Errors." International Economic Review (1969), pp. 47-67.

9. Dhrymes, P. Distributed Lags: Problems of Formulation and Estimation (forthcoming).

10. Durbin, J. "The Fitting of Time-Series Models." Review of the International Statistical Institute (1960), pp. 233-243.
"Estimation of Parameters in Time Series Regression Models." Journal of the Royal Statistical Society, Series B (1960), pp. 139-153. 12. Durbin, J. "Tests for Serial Correlation in Regression Analysis Based on the Periodogram of Least Squares Residuals." Biometrika (1969), pp. 1-15. 13. Durbin, J. "Testing for Serial Correlation in Least Squares Regression when Some of the Regressors are Lagged Dependent Variables." Econometrica (forthcoming). l4. Durbin, J. and G. S. Watson. "Testing for Serial Correlation in Least Squares Regression I." Biometrika (1950), pp. 409-428. 15. Durbin, J. and G. S. Watson. "Testing for Serial Correlation in Least Squares Regression II." Biometrika (1951), pp. 159-178. 16. Durbin, J. and G. 8. Watson. "Testing for Serial Correlation in Least Squares Regression III." Biometrika (forthcoming). l7. Goldberger, A. Econometric Theory. New York: Wiley (1964). 18. Grenander, U. "On the Estimation of Regression Coefficients in the Case of an Autocorrelated Disturbance." Annals of Mathematical Statistics (1954). pp. 252-272. "A Note on the Serial Correlation Bias l9. Griliches, Z. Econometrica, in Estimates of Distributed Lags." (1961). pp. 65-73. 20. Griliches, Z. and P. Rao. "Small Sample PrOperties of Several Two-Stage Regression Methods in the Context of Autocorrelated Errors." Journal of the American Statistical Association (1969), pp. 253-272. 21. Hart, B. and J. von Neumann. "Tabulation of the Probabilities for the Ratio of the Mean Square Annals Successive Difference to the Variance." of Mathematical Statistics (1942), pp. 207—2I4. 22. Henshaw, R. "Testing Single-Equation Least Squares Regression Models for Autocorrelated Disturbances." Econometrica (1966), pp. 646-660. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 95 Hildreth, C. "Asymptotic Distribution of Maximum Likelihood Estimators in a Linear Model with Autoregressive Disturbances." Annals of Mathematical Statistics (1969), pp. 583-594. Hildreth, C. and J. R. Lu. 
"Demand Relations with Autocorrelated Disturbances." Agricultural Experiment Station Technical Bulletin No. 276, Michigan State University (1960). Imhof, P. "Computing the Distribution of Quadratic Forms in Normal Variables." Biometrika (1961), Johnston, J. Econometric Methods. New York: McGraw— Hill (1963). Kadiyala, K. R. "Testing for the Independence of Regression Disturbances." Econometrica (1970), pp. 97-117. Klein, L. "The Estimation of Distributed Lags." Econometrica (1958), pp. 553-565. Koerts, J. "Some Further Notes on Disturbance Estimates in Regression Analysis." Journal of the American Statistical Association (1967), pp. 169-183. Koerts, J. and A. P. J. Abrahamse. “On the Power of the'BLUS Procedure." Journal of the American Statistical Association (1953): PP- [227-1236. Koerts, J. and A. P. J. Abrahamse. On the Theory and Application of the General Linear Model. Rotterdam: University of Rotterdam Press (1969). Koyck, L. Distributed Lags and Investment Analysis. Amsterdam: North Holland Publishing Company (1955). Lehman, E. Testing Statistical Hypotheses. New York: Wiley (1959). Liviatan, N. "Consistent Estimation of Distributed Lags." International Economic Review (1963), pp. 44-52. Lyttkens, E. "Standard Errors of Regression Coeffi- cients by Autocorrelated Residuals." In H. Wold, Econometric Model Building: Essays on the Causal Chain Approach. Amsterdam: North HoIland Publishing Company (1964). 36. 37. 38. 39. 40. 41. 44. 45. 46. 47. 48. 96 Maddala, G. S. "Generalized Least Squares with an Estimated Variance-Covariance Matrix." Econometrica (forthcoming). Malinvaud, E. Statistical Methods of Econometrics. Chicago: Rand-McNally (1966). Neumann, J. von. "Distribution of the Ratio of the Mean-Square Successive Difference to the Variance." Annals of Mathematical Statistics (1941). Pearson, K. Tables of the Incomplete Beta Function. Cambridge: Biometrika OffiCeIT1934). Prais, S. J. and C. B. Winsten. "Trend Estimators and Serial Correlation." 
Cowles Commission Discussion Paper No. 383, Chicago (1954). Press, S. and R. Brooks. "Testing for Serial Correla- tion in Regression (revised)." Center for Mathematical Studies in Business and Economics. University of Chicago, Report 6911 (1969). RAND Corporation. One Million Random Digits and One Hundred Thousand Deviates. Santa Monica (1950). Rubin, H. "On the Distribution of the Serial Correlation Coefficient." Annals of Mathematical Statistics (1945), pp. 211-215. Sargan, J. D. "The Estimation of Economic Relation- ships using Instrumental Variables." Econometrica (1958), pp. 393-415. Sargan, J. D. "The Maximum Likelihood Estimation of Economic Relationships with Autoregressive Residuals." Econometrica (1961), pp. 414-426. Theil, H. "Analysis of Disturbances in Regression Analysis." Journal of the American Statistical Association (1965), pp. 1067-1079. Theil, H. "A Simplification of the BLUS Procedure for Analyzing Regression Disturbances." Journal of the American Statistical Association (1968), pp. 242-253. Theil, H. and A. Nagar. "Testing the Independence of Regression Disturbances." Journal of the American Statistical Association (1961), pp. 793-806. 49. 50. 51. 52. 53. 97 Watson, G. S. "Serial Correlation in Regression Analysis." Biometrika (1955), pp. 327-342. Watson, G. S. "Linear Least Squares Regression." Annals of Mathematical Statistics (1967), pp. 1679-1699. Wickens, M. R. "The Consistency and Efficiency of Generalized Least Squares in Simultaneous Equa- tions Systems with Autocorrelated Errors." Econometrica (1969), pp. 651-659. Zellner, A. "An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias." Journal of the American Statistical AssociatIOn (1962), pp. 348-368. Zeller, A. and M. Geisel. "Analysis of Distributed Lag Models with Applications to Consumption Function Estimation." Econometrica (forthcoming). "11111111111ES
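The generalized testing procedure referred to above extends the Durbin-Watson statistic to the second-order case. One natural lag-2 analogue of that statistic can be sketched as follows; this is offered only as an illustration of the general idea, not as the exact statistic derived in this study:

```python
import numpy as np

def dw_statistics(e):
    """Durbin-Watson d and an illustrative lag-2 analogue, from residuals e.

    d  = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2
    d2 = sum_t (e_t - e_{t-2})^2 / sum_t e_t^2
    Each is approximately 2(1 - r_k), where r_k is the lag-k residual
    autocorrelation, so values near 2 indicate no autocorrelation at that lag.
    """
    e = np.asarray(e, dtype=float)
    denom = np.sum(e ** 2)
    d = np.sum(np.diff(e) ** 2) / denom
    d2 = np.sum((e[2:] - e[:-2]) ** 2) / denom
    return d, d2

# For serially independent residuals, both statistics should be near 2.
rng = np.random.default_rng(1)
d, d2 = dw_statistics(rng.normal(size=2000))
print(round(d, 3), round(d2, 3))
```

Values of d or d2 well below 2 would suggest positive autocorrelation at the corresponding lag; the distribution theory needed to turn such statistics into a formal test is the subject of the chapter on testing.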