ESTIMATION WITH PANEL DATA

By

Kyung So Im

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

1994

ABSTRACT

ESTIMATION WITH PANEL DATA

By

Kyung So Im

This dissertation studies standard panel data models with repeated observations for large cross sections. In Chapter 2, we compare the 3SLS estimator and the generalized IV estimator and derive some equivalence results. We also obtain redundancy conditions for models in which the regressors are strictly exogenous; block diagonality of the optimal GMM weighting matrix turns out to be crucial for some instruments to be superfluous. In addition, we propose some GMM estimators that are computationally simple and asymptotically no less efficient than GLS. Chapter 3 covers weakly exogenous models. If the instruments are weakly exogenous and the errors are serially correlated, the currently used moment conditions appear to lead to inconsistent estimators in general. The source of the serial correlation is crucial in determining the set of orthogonality conditions. We also suggest reduced lists of instruments, in several useful models, that produce nearly efficient estimators. In Chapter 4, we derive asymptotic variances of estimators when the moment conditions from covariance restrictions are used.
As nonlinear optimization is not necessary to estimate these variances, this result, in practice, would motivate people to use covariance restrictions more frequently. We also detail when the moment conditions from covariance restrictions are redundant in several popular models. An interesting result is that the instrumental variables can be useful even when they are not correlated with the regressors. We also argue that the moment conditions from covariance restrictions are useful always unless the GLS efficiency is reached. To the memory of my parents iv ACRNO'LHDGHBNTS Many people have assisted me in getting through the graduate program. Although there are too many names to mention them all, I am happy to have a chance to express my gratitude. Were it not for the aid of Professor Jeffrey Wooldridge, the dissertation committee chair, I would still be idling my time away. He provided careful advice on every aspects of my thesis with exceeding intelligence and consistent patience. I was a demanding student, but he always split his tight schedule and found time for listening to my questions. Especially, his careful correction on the final draft of Chapter 2 is appreciated deeply. I am grateful to the other two committee members, Professors Peter Schmidt and Ching-Fan Chung, for their time and efforts. The detailed and critical comments by Professor Schmidt on the early drafts of Chapter 2 and 4 improved the quality of this thesis substantially, and will be invaluable lessons through my career. The wonderful lectures from many faculty and the assistance of the departmental staff are appreciated. I must thank Professor Richard Baillie for his careful reading and comments on Chapter 2 and 4, and Professor Paul Chen for his advice and encourgement. It was my fortune to have known Dongin Lee, my four year officemate. We talked on whatever occured to us. Without him, this course should have been much more arduous. Joonsoo Lee's guidance was also quite helpful. My greatest luck in graduate school was meeting and marrying my wife, Eunhee Kuh. She sacrificed her own study and brought a lovely girl and a healthy boy into this wonderful world. Words cannot be found to express my thanks to her. My baby brother Kyung Tae and big sister Hye Kyung have provided thorough support throughout the course. And I am also indebted to my other brothers Kyung Rae, Kyung Ho and Kyung Hun for their encouragement. Thank you all who prayed for me. vi TABLE OF CONTENTS CHAPTER 1. INTRODUCTION ................................. CHAPTER 2. ESTIMATION USING PANEL DATA UNDER STRICT EXOGENEITY 1. INTRODUCTION OOOOOOOOOOOOCOOOOOOOOOOO ........ 00.... 2. 3SLS, GIV, AND REDUNDANCY CONDITIONS 2.1. PRELIMINARIES ................................ 2.2. EFFICIENCY COMPARISON OF 3SLS AND GIV ........ 2.3. NUNERICAL EQUIVALENCE OF BSLS AND GIV ........ 2.4. ALGEBRAIC REDUNDANCY OF INSTRUMENTS IN BSLS .. 3. MODEL WHERE THE REGRESSORS ARE UNCORRELATED WITH THE ERRORS 3.1. UNRESTRICTED COVARIANCE MATRIX .... ...... ..... 3.2. DIAGONAL COVARIANCE MATRIX ................... 3.3. RANDOM EFFECTS STRUCTURE ..................... 3.4. A GENERALIZATION OF THE RANDOM EFFECTS ASSUMPTION ................................... 4. MODEL WHERE THE REGRESSORS ARE CORRELATED WITH THE TIME CONSTANT ERROR COMPONENTS .................... 4.1. A "FIXED EFFECTS" TYPE MODEL ................. 4.2. HAUSMAN AND TAYLOR MODEL ..................... 4.3. HT MODEL WITH SERIALLY CORRELATED TIME-VARYING ERRORS .......... ...... ....................... 5. 
CONCLUSION
    APPENDIX 1
    APPENDIX 2

CHAPTER 3. ESTIMATION USING PANEL DATA UNDER WEAK EXOGENEITY
    1. INTRODUCTION
    2. SERIAL CORRELATION AND CONSISTENCY OF ESTIMATORS
        2.1. MOMENT CONDITIONS UNDER WEAK EXOGENEITY
        2.2. SERIAL CORRELATION AND MOMENT CONDITIONS
    3. ESTIMATION WITH THE BMS ASSUMPTION
    4. NEARLY EFFICIENT ESTIMATION
    5. CONCLUSION

CHAPTER 4. INFORMATION FROM COVARIANCE RESTRICTIONS IN PANEL DATA MODELS
    1. INTRODUCTION
    2. PRELIMINARIES
        2.1. REDUNDANCY CONDITIONS FOR MOMENT RESTRICTIONS
        2.2. SCALAR COVARIANCE AND THE ASYMPTOTIC VARIANCE OF GMM
        2.3. RANDOM EFFECTS COVARIANCE AND THE ASYMPTOTIC VARIANCE OF GMM
    3. STRICTLY EXOGENOUS MODELS
        3.1. GENERAL RESULTS ON NONREDUNDANCY UNDER IDEAL CONDITIONS
        3.2. STRICTLY EXOGENOUS MODEL: RANDOM EFFECTS COVARIANCE
        3.3. STRICTLY EXOGENOUS MODEL: FIXED EFFECTS TYPE
        3.4. STRICTLY EXOGENOUS MODEL: SCALAR COVARIANCE
    4. WEAKLY EXOGENOUS MODELS
        4.1. WEAKLY EXOGENOUS MODEL: DIAGONAL COVARIANCE
        4.2. WEAKLY EXOGENOUS MODEL: RANDOM EFFECTS COVARIANCE
    5. CONCLUSION

CHAPTER 5. CONCLUDING REMARKS

LIST OF REFERENCES

CHAPTER ONE

INTRODUCTION

This dissertation deals with linear panel data models with repeated time observations for large cross sections. A basic model is

(1)   y_it = x_it β + u_it,   t = 1,...,T,

where x_it is a 1 x k vector and β is a k x 1 parameter vector of primary interest. Let x_i = (x_i1', ..., x_iT')', with y_i and u_i similarly defined. Allowing for time-constant unobserved individual effects, which may in many instances bias the estimators obtained from single cross section data, we write

(2)   u_it = φ_i + ε_it,   t = 1,...,T.

Thus, φ_i is the time-constant error component and ε_it is the idiosyncratic error. We assume there is a T x h matrix of instrumental variables w_i; these instruments are suggested by various assumptions. We are interested in the case where the φ_i are treated as random, and not as fixed parameters to be estimated. We do consider the case where φ_i is correlated with some or all of the regressors; for many applications this is an important feature.

This chapter provides a summary of the main results contained in the subsequent chapters, and links them to previous studies. The following chapters are essentially independent of each other, and can be read separately. In each chapter we define the relevant notation. Whenever we refer to theorems, equations or assumptions contained in other chapters, we specify the chapter number. References are gathered at the end of the thesis.

In Chapter 2, we are primarily concerned with estimation in models where the regressors are strictly exogenous with respect to the idiosyncratic errors:

(3)   E(x_i ⊗ ε_i) = 0.

But before dealing with specific models, we compare 3SLS and generalized IV (GIV). This comparison appears in Bowden and Turkington (1984, p. 72) and White (1984, pp. 83-105; 1986), but a general result has not yet been established. We assume

(4)   E(u_i u_i' | w_i) = E(u_i u_i') ≡ Σ.
The 3SLS and GIV estimators are defined as

β̂_3SLS = [X'W(W'ΩW)^{-1}W'X]^{-1} X'W(W'ΩW)^{-1}W'Y,

and

β̂_GIV = [X'Ω^{-1}W(W'Ω^{-1}W)^{-1}W'Ω^{-1}X]^{-1} X'Ω^{-1}W(W'Ω^{-1}W)^{-1}W'Ω^{-1}Y,

where (Y, X, W) are the data matrices stacking (y_i, x_i, w_i), i = 1,...,N, and Ω = I_N ⊗ Σ. 3SLS and GIV utilize the instruments w_i and Σ^{-1}w_i, respectively. We show that in general neither estimator dominates the other, and that the two are numerically identical when the time periods use common instruments or when Σ is diagonal. Thus, the well-known equivalence result between OLS and GLS in the SUR model follows as a corollary of our result when w_i = x_i.

We then turn to specific models and provide several reduced lists of instrumental variables that lead to fully efficient estimators under several different assumptions. Asymptotically, there is no reason to reduce the set of instruments, since GMM never loses asymptotic efficiency by adding orthogonality conditions. However, GMM based on a restricted instrument set is not only computationally simpler but could also have better finite sample properties.

The unobserved effects model is standard if we add the assumption of the random effects covariance matrix,

(5)   Σ = σ_ε² I_T + σ_φ² e_T e_T',

where E(ε_it²) = σ_ε², t = 1,...,T, E(φ_i²) = σ_φ², I_T is the T x T identity matrix, and e_T is the T x 1 vector of ones. If x_i and φ_i are not correlated with each other, the model is the popular random effects model, and the random effects estimator (GLS) is the most efficient. We show that 3SLS utilizing the instrumental variables (PX, QX) is GLS, and propose GMM using (PX, QX) as the instruments, where PX and QX denote the NT x k meaned and demeaned versions of the data matrix X. If assumption (4) does not hold - for example, in the presence of heteroskedasticity and/or serial correlation in the ε_it conditional on w_i - GMM using the instruments (PX, QX) is generally more efficient than the random effects estimator.

Hausman and Taylor (1981) (HT hereafter) allowed some of x_i to be correlated with φ_i, and showed how the coefficients on the time-constant variables are identified. Subsequently, Amemiya and MaCurdy (1986) and Breusch, Mizon and Schmidt (1989) (BMS henceforth) developed more efficient estimators under some additional assumptions. We argue that the optimal weighting matrix E(w_i'u_i u_i'w_i) needs to be block diagonal in order for GMM based on a reduced list of instrumental variables to be fully efficient. This unifying theme provides the intuition behind the redundancy results established by BMS, Ahn and Schmidt (1992), and many of the theorems in this thesis, including the previous result that 3SLS using (PX, QX) is GLS in the random effects model.

If Σ is of the random effects form, then Σ can be expressed as aP_T + bQ_T for some scalars a and b. Thus, if w_i can be decomposed into (P_T w_1i, Q_T w_2i), where w_1i is generally constructed from the regressors that are not correlated with φ_i (an important exception is the instrument set suggested by BMS) and w_2i is usually based on all of the time-varying regressors, then, provided assumption (4) holds, the optimal weighting matrix becomes block diagonal simply because P_T Q_T = 0. We show a redundancy result through an example when w_2i = L ⊗ x_i°, which
This also explains the well-known lemma showing GLS in the random effects model is a linear combination of the within and the between estimators, and generalizes this lemma to the case when the optimal weighting matrix is block diagonal. We show that 2 = aPT+bQT =9 PTzzQT = 0, but the converse is not true. P.2Qt = 0 is sufficient to make the optimal weighting matrix block diagonal, provided wi== (Prwnerwzi) and assumption (4) holds. Another important case when the optimal weighting matrix become block diagonal is when (6) 2 = diag(0$1"'la$)l that is, when there are no time-constant unobserved effects and the errors are serially uncorrelated. Under the assumptions (4) and (6), E(w{unfivq) is block diagonal with the t-th block 0§E(wi'twit) , where wit is the instruments for the t-th period equation. If wit 3 xit for t = 1, --,T, BSLS utilizing the instruments diag(x“,---,xfl) is GLS. This covariance matrix is especially revelant for the rational expectations models where (pi does not present and the errors are necessarily serially uncorrelated. 6 Assumption (5) is now almost standard in the panel data literature. But, in general, there are neither a priori grounds nor any technical reasons that justify this form of 2. Therefore, we consider the case when the idiosyncratic errors are serially correlated. Ahn and Schmidt (1991) showed that BSLS using all the instruments Ifinfi is GLS. We provide a simpler proof than theirs, and reconsider the HT model allowing for the idiosyncratic errors to be serially correlated in an arbitrary manner. Some of the instrumental variables turn out to be redundant, but the number is smaller than under (5). Also, we show that GIV is not consistent unless the equi-correlation assumption of EMS holds, but when the BMS assumption holds, GIV can reduce the number of instrumental variables substantially. We also consider the model when all of the regressors are correlated with ¢i° If the idiosyncratic errors are arbitrarily correlated, the model is very similar to the fixed effects model with arbitrary intertemporal covariance considered by Kiefer (1980). We show that the several estimators, including the Kiefer's estimator (GLS in demeaned equation using a generalized inverse of Q12QT) , GLS in the differenced equations, and GLS in the demeaned equations after deleting any one equation, are numerically identical. Thereby, generalized inverting in this case is an unnecessary complication. In Chapter 3, we study the models where the regressors 7 are weakly exogenous to the idiosycratic errors. Thus in place of (3) we have the assumption (7) E(x&e“) = O, s 2 t. Dynamic models and rational expectations models are the leading examples of weakly exogenous but not strictly exogenous models. However, weakly exogenous models would be suitable in broader applications. We are primarily concerned with the consistency of estimators when the errors are serially correlated, and with the consistency of the usual standard errors of the BSLS estimators when certain moment conditions are used. Also, some nearly efficient estimators based on some reduced lists of instruments are proposed We ask a basic question whether the moment conditions in (7) are valid when the idiosyncratic errors are serially correlated. In dynamic models, it now is well known that no moment conditions exist between the lagged dependent variables and the disturbances if the idiosyncratic errors are arbitrarily serially correlated. 
We show that a similar relation exists between the general weakly exogenous regressors and the errors unless the time-varying errors contain two components; the serially correlated components to which the regressors are strictly exogenous and the serially uncorrelated components to which the regressors are weakly exogenous. 8 Keane and Runkle (1992) proposed ZSLS upon the forward filtered equations when the errors are serially correlated, adapting a suggestion by Hayashi and Sims (1983) in a pure time series context. Schmidt, Ahn and Wyhowski (1992) (SAW henceforth), in a comment on Keane and Runkle, provided the maximal sets of the instrumental variables in several weakly exogenous models, and showed that the Keane and Runkle estimator is numerically identical to BSLS when all the instruments are used. Hayashi and Sims (1983) and SAW indicated that eliminating the serial correlations by forward filtering is justified only when "the serial correlations in the errors are independent of the current and lagged values of the instrumental variables". But, if the Keane and Runkle estimator is inconsistent, so is 3SLS since they are the same. Thereby, the requirement for vindicating forward filtering noted by SAW and Hayashi and Sims indeed is needed for the moment conditions in (7) to be valid. Wooldridge (1993) showed that the usual standard errors for the nonlinear BSLS estimators in hedonic pricing models are not consistent, and derived a condition for the usual BSLS standard errors to be consistent. Ahn (1990) obtained a similar result in the dynamic model when certain instruments are used. If BMS's equi-correlation assumption that E(xi't¢i) are the same over t holds, we have (T-1)k additional instruments. We show that the usual 3SLS 9 standard errors are not consistent if these instruments are used. Applying BMS's condition to dynamic models, we have the condition that E(yi't¢i) are the same over t, which is implied by the stationarity of {(Yit¢i) :t=0, - - - ,T} suggested by Arellano and Bover (1990). This condition implies T-1 instruments, and the usual 3SLS standard errors are not consistent if these instruments are used. In fact, the result obtained by Ahn (1990) is based on the instrumental variables obtained from the moment conditions E[(yit-ayit_1)¢i] being the same over t, which is weaker than EMS condition (or Arellano and Bover), but the structures of the instruments from these conditions are quite similar. Thus, Ahn's result is closely related to ours. As we argued above, all the instrumental variables are useful unless the optimal weighting matrix is block diagonal in general. When the instruments are weakly exogenous the diagonal would be the only structure of 2 that makes the optimal weighting matrices block diagonal. Thus, any attempts to find some reduced lists of instruments that produce the fully efficient estimators may not be fruitful. But, it is practically useful to find some reduced set of instruments. In many applications, the weakly and the strictly exogenous regressors exist together in a model. A leading example is the dynamic model with strictly exogenous regressors considered by many. Letting X1 and X2 be weakly and strictly exogenous regressors, respectively, we suggest 10 instruments (R,n4Xé),‘where R includes all the instruments between X1 and the errors. In Chapter 4, we study the moment conditions from covariance restrictions, which are essentially nonlinear in the parameters. 
Covariance restrictions are rarely used in practice in standard models, perhaps because people are reluctant to utilize a priori restrictions that will cause inconsistency of the estimators when they are false. Another important reason would be the computational burden of numerical optimization. But if covariance restrictions bring non-trivial efficiency gains, the computational burden is secondary. We show how to consistently estimate the asymptotic variances of the nonlinear GMM estimators that use the moment conditions from covariance restrictions without numerical optimization. If the efficiency gain from adding the moment conditions from covariance restrictions is non-trivial, then it is worth doing the numerical optimization. Testing whether the covariance restrictions are valid is straightforward, so we can get around the possible inconsistency problem. This result applies to general simultaneous equations models as well as to panel data models.

Next, we find when the moment conditions from covariance restrictions are redundant. We consider the scalar and the random effects covariance matrices in strictly and weakly exogenous models. In the strictly exogenous model with a scalar covariance matrix, where OLS is efficient under the standard set of assumptions, it turns out that the moment conditions from covariance restrictions are useful unless certain third moment conditions on the errors are met. In this case, the nonlinear GMM estimator is equivalent to a linear GMM estimator using instruments made of residuals from an initial consistent estimator, and these instruments are not correlated with the regressors. Therefore, what we find is that instruments can be useful even when they are not correlated with the regressors. This deviates from convention. In fact, the efficiency gains in this case follow from the correlation between the instruments and the squared error sequence {u_i²}. In other words, the additional instrumental variables from covariance restrictions necessarily rely on heteroskedasticity to be useful. This relates to the results obtained by Cragg (1983) and Chamberlain (1982) that additional instruments (other than the regressors) can be useful, under heteroskedasticity of unknown form, in standard regression models where all the regressors are valid instruments.

The asymptotic variance of the estimator when we use instruments w_i is (we consider here a single equation case for simplicity)

(8)   [E(x_i'w_i){E(u_i² w_i'w_i)}^{-1}E(w_i'x_i)]^{-1}.

Let w_i = (x_i, z_i), where z_i is the set of additional instruments, and suppose E(z_i'x_i) = 0 (this is not necessary but simplifies the algebra). Then (8) becomes

E(x_i'x_i)^{-1} {E(u_i² x_i'x_i) - E(u_i² x_i'z_i)[E(u_i² z_i'z_i)]^{-1}E(u_i² z_i'x_i)} E(x_i'x_i)^{-1},

which is strictly smaller than the OLS variance as long as E(u_i² x_i'z_i) ≠ 0. Thus, z_i contributes by explaining u_i².

Another general result we obtain in Chapter 4 is that the information from covariance restrictions is useful whenever GLS efficiency is not reached. Together with the above arguments, it becomes clear that GLS efficiency is necessary but not sufficient for the moment conditions from covariance restrictions to be redundant. In the Hausman and Taylor (1981) model, GLS is not consistent. But the regression is separated into two orthogonal spaces, as we argued above, and GLS efficiency is reached in the deviation space.
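A small simulation sketch makes the variance comparison in (8) concrete (the design below is ours and purely illustrative): the extra instrument z_i is uncorrelated with x_i, yet it lowers the asymptotic variance because E(u_i² x_i'z_i) ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000                              # large n: sample moments approximate expectations
x = rng.normal(size=n)                   # regressor
z = rng.normal(size=n)                   # extra instrument, independent of x
u = (1.0 + x * z) * rng.normal(size=n)   # E(u|x,z) = 0, but Var(u|x,z) depends on x*z

Exx = np.mean(x * x)
# Variance from (8) with instruments w = x only (heteroskedasticity-robust OLS variance):
V_x = np.mean(u**2 * x * x) / Exx**2
# Variance from the partitioned form of (8) with w = (x, z) and E(xz) close to 0:
adj = np.mean(u**2 * x * z)**2 / np.mean(u**2 * z * z)
V_w = (np.mean(u**2 * x * x) - adj) / Exx**2

print(V_x, V_w, V_w < V_x)               # V_w is noticeably smaller than V_x
```

Here z helps only because the conditional variance of u depends on x*z, which is exactly the sense in which the additional instruments "explain u_i²".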
As was conjectured by Ahn and Schmidt (1992), the instrumental variables from the random effects covariance restrictions are in the deviation space, and they are redundant if certain higher conditions on the error are satisfied. In weakly exogenous models, GLS is not consistent unless 2 is diagonal. It seems that covariance restrictions are useful if 2 is not diagonal, and they can be redundant if 2 is diagonal. We show these through several examples that quite often appear in applications. If there present unobserved effects ¢H in weakly exogenous model, the moment conditions from covariance are useful because 2 is not 13 diagonal. This holds whether ¢H is correlated with the regressors or not. CHAPTER 2 ESTIMATION WITH PANEL DATA UNDER STRICT EXOGENEITY 1. INTRODUCTION This chapter has two purposes. The first is to establish equivalences between certain three stage least squares (3SLS) and generalized instrumental variables (GIV) estimators in several panel data models with strictly exogenous regressors. The second purpose is to find minimal sets of nonredundant instruments for BSLS under various assumptions that have been used in the panel data literature. Extensions of the standard assumptions are also considered. The 3SLS estimator considered in this paper appears in Amemiya (1977, equation 5.4), Hausman, Newey and Taylor (1987), and Schmidt (1990, equation 5), and is the generalized method of moments estimator (GMM) under standard assumptions. The GIV estimator has been considered by White (1984, pp. 85-105; 1986), Bowden and Turkington (1984, pp. 68-72), Schmidt (1990), and many others. In the models we consider, there will always be an estimator that is asymptotically no less efficient than BSLS and GIV, namely the cum estimator using all orthogonality conditions and an unrestricted weighting matrix. Thus, in 14 15 terms of achieving asymptotic efficiency there is really no need to weed out redundant orthogonality conditions. As a practical matter, though, it is very useful to know whether, under certain assumptions, the list of nonredundant instruments is shorter than the list of all possible instruments. For example, in a panel data model with 10 strictly exogenous time-varying regressors and six time periods, the total number of instruments is 360. This can cause computational problems for CHM, especially when the cross section dimension is small. Even if there is no computational issue with CNN, there might be good statistical reasons for using estimators based on fewer orthogonality conditions. As an illustration, consider a result obtained in section 3.3. There, it is shown that a BSLS estimator based on a reduced instrument list is, under the usual random effects assumptions, equivalent to the random effects (GLS) estimator. This 3SLS estimator is based on many fewer instruments than an unstructured GMM analysis would be. This result implies that, if we then compute the CNN estimator with Optimal weighting matrix using the restricted BSLS instruments, we obtain an estimator no less efficient than random effects. In addition, if the random effects assumption fails, then the estimator is generally more efficient than random effects. And while this GMM estimator based on restricted instruments is generally less efficient than the full GMM 16 estimator if the random effects assumptions fail, it has computational advantages and could very well have better finite sample properties. Thus, the redundancy conditions do have practical applications. 
Throughout this chapter (and in later chapters) we focus on general unobserved effects panel data models, where the time constant unobserved effect may or may not be correlated with some or all of the observed regressors. We cover models with serially uncorrelated idiosyncratic errors as well as models where the idiosyncratic errors are allowed to be serially correlated with time-varying variances. Such a setup captures the flavor of both random effects and fixed effects-type specifications. In a random effects framework, the key assumption is that the observed regressors are uncorrelated with the unobserved effects. For many fixed effects applications, the key feature is that the unobservable effects can be correlated with some or all of the regressors. We consider both cases in what follows, and this is sufficient for the vast majority of applications. The strict fixed effects framework, which assumes that the unobserved effects are constants that differ across individual, is not treated here. See Hsiao (1986, pp. 41-49) for a discussion of the conceptual issues underlying the fixed versus random effects dichotomy. Section 2 contains some general results concerning the equivalence between 3SLS and GIV. These are extensions of 17 the well-known equivalences between OLS and GLS under certain conditions in the seemingly unrelated regressions model, and of the equivalence between 3SLS and ZSLS in simultaneous equations models. Section 3 studies panel data models where the regressors are uncorrelated with the composite errors. In a general model the equivalence of the BSLS estimator using all orthogonality conditions and GLS is established, giving a different proof of a result of Ahn and Schmidt (1991). A new result showing the equivalence of a BSLS estimator with reduced instrument set and GLS is presented in section 3.3. Section 4 turns to models where the time-constant unobserved effects are potentially correlated with some or all of the regressors. Several estimators are shown to be identical for estimating the parameters in the unobserved effects analog of Kiefer's (1980) fixed effects model. We also study the Hausman and Taylor (1981) (HT hereafter) model under more general assumptions. Hausman and Taylor showed how the coefficients on the time constant regressors could be identified when the time constant regressors are correlated with the unobserved effects in a model with serially independent idiosyncratic errors. Efficient estimation in the HT model was considered further by Breusch, Mizon and Schmidt (1989) (EMS hereafter) under some additional assumptions. These results are extended to allow for arbitrary serial correlations in the idiosyncratic 18 errors . 2. 3SLS, GIV, AND REDUNDANCY CONDITIONS 2.1. W Consider a linear panel data model, yi==)gfi + ui, (2.1) where yi = (yfl, - - - ,y") ' and xi = (xi'1, - - - ,xi'T) ' of dimension Txl and Txk, respectively. {(yvag):i=1,-- ,N} is an i.i.d. random sequence. Throughout the paper, for any Txp matrix mi, M 5 (m{,-- , mg)' of dimension NTXp, where N is the number of observations. Thus, the matrix M is the stacked matrix of nu for i = 1,-- ,N with the i-th block mi. In the sample, Y = X6 + U. Most of the results we discuss in this chapter deal with algebraic equivalences of various estimators and have nothing to do with statistical properties such as consistency and asymptotic normality. Nevertheless, one would probably not use the estimators unless certain assumptions are satisfied. 
It is useful to set out some assumptions that typically underlie method of moments-type estimators. Because we are studying both 3SLS and GIV we make assumptions that traditionally underlie application of —z~fl' - 19 these methods. To consistently estimate 6 appearing in (2.1) we assume that there is a set of Txh instruments, wg, that are appropriately orthogonal to “1' 4A fairly standard set of assumptions is ASSUMPTION 2.1: (a) E(w{u1) = 0. (b) E(wi'2'1ui) = 0, where 2 E(unfi) is nonsingular. ASSUMPTION 2.2: (a) E(wi'xi) has full column rank and E(w‘.'zwi) is positive definite. (b) E(wi'2"xi) has full column rank and E(wi'2wi) is positive definite. ASSUMPTION 2.3: (a) E(wi'uiui'wi) = E(wi'2wi). -1 -1 __ -1 (b) E(w{2 Inugz v“) — E(w;2 W})° Assumption 2.1(a) and 2.2(a) ensure that the BSLS estimator is consistent under standard regularity conditions. As a practical matter, Assumption 2.2(a) implies that the BSLS estimator exists with probability approaching one (as the sample size grows); for what follows, we just assume the estimator exists for any sample. Assumption 2.3(a) is the weakest assumption that guarantees that the usual formula for the asymptotic variance of BSLS is valid. Assumptions 2.1(b) and 2.2(b) imply consistency of the GIV; Assumption 20 2.3(b) implies that its asymptotic variance matrix is of a relatively simple form. Note that a sufficient condition for both parts of Assumption 2.3 is E(uiui' Iwi) = E(uiui') = 2. When we study 3SLS in later sections, we typically have in mind assumptions such as Assumption 2.1(a), 2.2(a), and 2.3(a) . The key will be to find instruments wi that satisfy these conditions under more primitive assumptions about the models at hand. Assumption 2.1(a) is critical and dictates the choice of wg. .Assumption 2.2(a) can be viewed as a regularity condition. Assumption 2.3(a) cannot be guaranteed a priori, but it is useful as a starting point. In practice, one needs a consistent estimator of 2 to perform 3SLS or GIV. Nothing is lost in the following analysis by assuming 2 is known because it is consistently estimable in the models we deal with. 2.2. Efficiency Comparison of 3SLS and GIV Given the data matrices x, W, and Y, and defining a E 1&92, the 3SLS estimator is defined to be 2535“ = [x'W(w'nW)‘1w'x1'1x'W(w'QW)"w'y. Equivalently, this is the GMM estimator based on the orthogonality condition E(W'U)=0 using weighting matrix (W'nW)4. The GIV estimator first transforms (2.1) to spherical disturbances by premultiplying by ZYVZ, and then 21 uses 21'1’zwi as the instruments. This gives 35”, = [X'fl"W(W'n"W)‘1w'n‘1x1‘1x'n"W(w'n"W)"w'n'1y. Under Assumptions 2.1-2.3, we have the following asymptotic variances: Avar/N(E,SLs-m = [E(x;wi){E(w:2wi) }"E(w:xi)1“. and Avar/N(3mv-fi) = [E(xi')3'1wi) {E(wi'2’1wi) )‘1E(wi'2'1xi) 1". Rather than compare the asymptotic variances, it is easier to work with the estimates of the asymptotic variances that these formulas imply, which are VA A - ' ' ‘1 I '1 — I "U2 -1/2 -1 armssLs) — [x W(W (W) W X] — (x n PmVZW)“ X) , (2.2a) and var-($6M = [x'n"W(w'n°‘W)"w'n“X]" = (X'n'1/2P(n-1/2W)0'1’2X)('1, b) 2.2 where P(.) denotes the projection onto the columns of (-). Efficiency comparison of the two estimators is ranking the two idempotent matrices of the same rank P(Q“QW) and Pm‘VZW) , which is not possible without further information on X, W, and n. The problem is equally stated as finding the optimal 6 which maximizes Pm‘W) , which does not seem to be possible in general. 
An important well-known special case is when W = x, in which case 361V is the GLS estimator and 333:5» is the OLS estimator. However, when st, general dominance of one 22 estimator over the other has not been established. Bowden and Turkington (1984, p. 72) argued that there would be no clear dominance of one estimator over the other without further information on X and W when X and W are of the same order. White (1984, pp. 83-105: 1986) showed that the GIV estimator is no less efficient than the BSLS estimator if (I'VZW is the optimal set of instruments in the transformed equation multiplied by n'VZ. White's proof is based on the fact that the GIV estimator is the ZSLS estimator on the transformed equation and the ZSLS estimator is the most efficient when the covariance matrix is scalar and the optimal instruments are used. However, as we can see in (2.2), the BSLS estimator is also a ZSLS estimator in the transformed equation with the scalar covariance matrix. The difference is in the instruments to be used; GIV uses (TVZW, whereas 3SLS uses QWW. It is not clear which instrument set is optimal without further information on X, W and a. Before turning to algebraic equivalence results it should be noted that the efficiency issue is unambiguous if we strengthen the assumptions as in Chamberlain (1987). He shows that if (i) E(uilwi) = o and (ii) E(uiui'lwi) = 2:, then the most efficient estimator that ignores second moment information is the BSLS estimator using the instruments E"E(xilwi). Unfortunately, the condition E(uilwi) = 0 is too strong for several of the panel data applications we have in 23 mind. 2.3. Numerical Equivalence of 3SLS and GIV We now turn to the algebraic equivalence of 3SLS and GIV. Therefore, Assumptions 2.1-2.3 are not needed: we only need for the estimators to exist along with assumptions about W or E, to be given below. As is well known, for nonsingular n, OLS = GLS iff there exists nonsingular R such that 94x = XR. Essentially the same relationship holds between the BSLS and the GIV estimators for given instruments W. From a CNN viewpoint, the estimators are invariant to any nonsingular transformation of the orthogonality conditions as was pointed out by Schmidt, Ahn and Wyhowski (1992) (SAW hereafter). THEOREM 2.1: In model (2.1), if there exists nonsingular B such that n"w = WB, then Ems = 36“. PROOF: x'n"W(w'n"W)"w'n"x = X‘WB(B'W'QWB)"B'W'X = x'W(w'QW)"w'x. I The most widely known special case of Theorem 2.1 is the SUR model with either common regressors or diagonal E, in which case OLS = GLS. We now extend these results to show 3SLS and GIV are equivalent under analogous assumptions. 24 We first consider the common instrument case, that is, where the same set of instruments is used for all t. This is given by the following assumption. ASSUMPTION 2.4: wi = ITow‘i’, where w? is a 1xq vector. As we will see, Assumption 2.4 is applicable to many panel data models with strictly exogenous regressors since the regressors in each time period are orthogonal to errors in all time periods. THEOREM 2.2: In model (2.1), when common instruments are used, that is wi = ITow‘i’, the 381.8 and the GIV estimators are the same. PROOF; From Theorem 2.1, it is sufficient to show that E'1wi = >3"(I,ew$) = 2"ow‘i’ = (119w?) (3-181,) 2 wiB. I Just as with Theorem 2.1, no statistical assumptions are imposed. Since the proof is based on each observation, technically the results holds in the samples of size no smaller than h. 
It is worth emphasizing that the common instruments w‘i’ do not have to be of the form (wi1,- - o, w”) . Theorem 2.2 holds no matter what w? is. Theorem 2.2 still holds after we replace 2‘ for E" for any scalar 6. Hence, there are infinite sets of instruments that generate the same estimator. The same result has been provided by SAW (1992) when FW} is used as instruments, where F is the forward filtering matrix; this corresponds to 25 the case 6 = -1/2. A wide class of models implies common instruments. Standard panel data models where the instruments are strictly exogenous to errors, standard simultaneous models, and the usual SUR models are examples. Thus, the GIV estimators are the same as the BSLS estimators in these models. We now turn to the case of diagonal 2, where the instruments are essentially unrestricted. ASSUMPTION 2.5: 2 = diag(a§, - .-,o$). THEOREM 2.3: For any 1xht vectors wit, define w‘- = diag(wi1, -,wn). Then, under Assumption 2.5, BSLS and GIV are the same. PROOF: 2'1wi = diag(o;2wi1, - - - ,ofw”) = diag(wi1o;2, - - - ,wiTa;2) E wiB, where, B E diag(Ig1®o;2, Igzoaéz, - - - ,IgToaf) of dimension gxg, and I9t is the identity matrix of dimension 9}. I It turns out that the theorems in this section can be used to derive some of the results for the specific panel data models we turn to next. In cases where the comparision is between BSLS using one set of instruments and GIV using another set, direct arguments are easier. 26 2.4. Algebraic Redundancy or Instruments in BSLS In section 4 we use a general result on redundancy of instruments for 3SLS. The following is the algebraic equivalance analog of White (1984, Proposition 4.50). THEOREM 2.4: Let [a = [x1111(1111'12111)"111'x1'1x'w1 (w1'nw,)"w1'y and 5 = [x'mw'nm"w'x1‘1x'mw'nm'1w'y, where w = (111,142). Then A 6 = 19 1f wz'x = wgnw1(w,'nw1) 111111111. PROOF: Appendix 2. I Similarly, the two GIV estimators using instruments WW1 and n“(w1,w2) are numerically identical if w'n"x = WOW (WOW )‘1w'n’1X 2 2 1 1 1 1 ° 3. MODEL WHERE THE REGRESSORS ARE UNCORRELATED WITH THE ERRORS 3.1. Unrestricted Covariance Matrix We now consider model (2.1) under the assumption that each element of X} is orthogonal to each element of ui; thus, we have in mind that E(xi®ui) = o. (3.1) Under (3.1), for each t the instruments can be taken to be 27 all nonredundant elements of x? E (xi1,~--,x"). (3.2) Thus, we want to analyze the BSLS estimator under the following assumption. ASSUMPTION 3.1: E(wi'ui) = 0, where wi = Irow‘i’, and W? contains all nonredundant elements of x?. If there are no time constant elements in xit then w?==)¢. Typically, w‘i’ will have fewer than Tk elements since x” Often contains at least a constant, if not other time constant variables. Any time constant variables only appear once in w?. In this subsection we place no restrictions on the variance matrix 2, which puts us exactly in the situation of Theorem 2.2. Ahn and Schmidt (1991) showed that the 3SLS estimator using all of the instruments IToxi° is the GLS estimator. We can restate their finding with a simpler proof. THEOREM 3.1: Under Assumption 3.1, the 3SLS estimator is the GLS estimator. PROOF: It follows immediately from Theorem 2.2 that BSLS = GIV using the same set Of instruments. But the GIV estimator using instruments Ignfi is the GLS estimator since 28 3.2. Diagonal Covariange Matrix If the errors uit are serially uncorrelated over time, we have Assumption 2.5. Let x: = diag(xi1, - . - ,x”) . 
THEOREM 3.2: Under Assumption 2.5, the 3SLS estimator using instruments x: is the GLS estimator. PROOF: 38LS=GIV from Theorem 2.3. Let 52‘; = 24/22:; and ii = E'Wxi. Since 2 is diagonal, x’i' = diag(a;1xi1, ~ - - ,O}1x") . Thus, P(;(-)X = X. The result follows immediately. I The definition of x: leaves the instruments for each equation entirely unrestricted. In fact, because E(x:Hn) = 0 is sufficient for consistency of BSLS under Assumption 2.5, the strict exogeneity condition (3.1) is not needed for consistency. Of course we are only proving algebraic equivalence results here anyway. Recalling the conclusion of Theorem 3.1, Theorem 3.2 is seen to be a redundancy result. Theorem 3.1 showed that, without any restrictions on 2, the 3SLS estimator using wi== Itow‘i’, where w? contains all nonredundant elements of x‘i’, equals the GLS estimator. Together, Theorem 3.1 and Theorem 3.2 show that, under Assumption 2.5, 3SLS using instruments w} is the same as 3SLS using instruments xi. Without Assumption 2.5 this redundancy does not necessarily hold. In the SUR model where fi's are different across the equations, the regressor itself is xi. OLS = GLS if 2 is 29 diagonal. But in the panel data models with the same 6 across time periods, GLS is strictly more efficient than OLS. Of course, if 0: = 02 for t = 1,---,T, OLS = GLS. 3.3. gendon Effects Structure In many panel data applications 2 is entirely unrestricted as in section 3.1. Further, it is essentially never diagonal in unobserved components models. We now turn to the popular random effects model. To study the random effects setup we need to introduce some notation similar to that used by EMS (1989). The instruments ITox‘i’ are equivalent to the instruments (eT,L)Ox‘i’, since all of the columns in (eI,L) are linear combinations of the columns in I} and both are of the same column rank, where eT denotes the Tx1 vector of ones and L denotes the Tx(T-1) differencing matrix The instruments etox‘i’ and Lox? are in the space spanned by eT and L, and of dimension Tka and TxT(T-1)k, respectively. For the same reason, the instruments x: are equivalent - .. _ T - _ - to (xi,xi) , where xi = 4% 21x1: and xi = (X11'X1I' - ~, xiH-xi) . t: 30 That is, X? and (wai) preserve the same information. Since xn-xi is the negative sum of the rest of the terms (xH-xi, -,xiT_1-§i) , xn-xi is trimmed away to avoid the singularity problem without losing any information. Let 2” = xH-xg, then iii = (53“, - - ~ , 52m) . The dimensions of ii and iii are 1xk and 1x(T-1)k. In summary, the instruments Itox‘i’ are equivalent to; (L,e,)®x? or Leo-(“521) or (L,eT)®(§i,§i). In the sample, (X°,X,X) 9L and (X°,X,3'()181eT stand for the stacked instruments m(xg,§i,§i) and eT®(x“?,xi,xi), respectively, where -x11x12---x"- 'X11x12"'x11-1' -x1- x21 x22 X21 x21 X22 x214 x2 x°- it: ’= . - .1 - '° ” °° _ _ - _ x111 X142 x," X111 x112 X1114 X11 In the standard random effects model each uit can be written as u.it =¢i + Git! t: 1'...’T, where ¢3 is the time-constant unobserved effect and the sit are the idiosyncratic errors. We assume that 491 and 6it have zero means, are uncorrelated for all t, and that {en:t=1,-o ,T} is an uncorrelated sequence with constant variance 0:. The variance of ¢1 is 0:. These assumptions lead to a well known form for 2. 31 . _ 2 2 ASSUMPTION 3.2. 2 — O‘II + a‘eTeT'. In applying random effects it is assumed that E(x;¢e)== 0 and E(xi®ei) = 0, so that the strict exogeneity condition (3.1) holds. 
Thus, the set of potential instruments is exactly as in section 3.1: the nonredundant elements of x? can be used as instruments for each time period t. Section 3.2 showed how the number of instruments can be reduced when 2 is diagonal. The next result shows that one can get by with many fewer instruments in the random effects model. The proof is much simplified by writing 2 under Assumption 3.2 as E = PT + bQT, where PT = eT(e1'e,)‘1eT' = %e,e;, QT = L(L'L)4L' = IT-Iy, and b is a positive scalar. To see how to do this, note that _ 2 2 _ 2 2 _ 2 2 2 2 - afiIT + o‘eTeT' — OGIT + TagT — (ae+Ta')PT + OeQT E PT + bQT. Of-l-TO: (sum of each column in 2) is assumed to be one without loss Of generality, and b a of. Note further that (PT + bQT)'1 = PT + £01, that holds since two projection matrices PT and QT span the two orthogonal bases eT and L. Let p = INOPT, Q = INQQT. THEOREM 3.3: Under Assumption 3.2, the 3SLS estimator using instruments (Ppg,Qpn) is the GLS estimator. PROOF: X'P ] X'P " X1(PX,QX) [[ X'Q ](P+bQ) (PX,QX)] [X'Q = X'PX + %X'QX = X'(P+bQ)'1X. I 32 Another way to say this is that BSLS using all the instruments is the same as 3SLS using (qu,Qng). Since both BSLS and GLS need a consistent estimator of E and GLS is computationally simpler, one might think 3SLS is not so useful in practice. But in fact Theorem 3.3 is quite useful. Recall that 3SLS is a GMM estimator using a restrictive weighting matrix. Under Assumption 3.2, 3SLS is asymptotically equivalent to GMM using instruments (Ppg,ng) which is robust in the presence of the conditional heteroskedasticity and/or the conditional serial correlation (White, 1980, 1982; Hansen, 1982). If Assumption 2.3(a) is violated, for example, if there exists conditional heteroskedasticity or serial correlation, the inference based upon the GLS estimator is not valid. And while the robust variance estimator of the GLS estimator can be reported as A A 4 4 N 4A A -1 -1 -1 Var(fiGLs)=(X'n X) (.21 xi'z uiuiz“. xi)(X'n X) 1- (Wooldridge, 1992), this is not even necessarily smaller than that of the OLS estimator if A2.3 is violated. On the other hand, the GMM estimator using the instruments (qu ,qu) is the most efficient (among the estimators based on these instruments), and Theorem 3.3 tells us it is no less efficient than GLS, and it is more efficient than GLS if Assumption 3.2 fails. Further, the GMM estimator using the instruments (Ppg,qu) is no less efficient than the OLS 33 estimator whether Assumption 2.3(a) holds or not, since the instruments (ng,Qgg) are equivalent to the instruments (xi,Ptxi) , and the additional instruments PIX: are not redundant upon the instruments xg. This GMM estimator has a lot fewer instruments than the GMM estimator based on all orthogonality conditions. The representation 2: = PT-i-bQT has many other uses as well. For example, it leads to a straightforward proof that the GLS estimator can be written as a convex combination of the between and within estimators. Let 3b and 3" be the between and the within estimators, then Ems = [x' (P+%Q)X]'1X' (P+%Q)Y = [x' (P+%Q)X]'1X'PY+%[X' (P+%Q)X]'1X'QY (x'I>x+f‘5x'QX)"x'Pfo,D + (X'PX+%X'QX)'%X'QX£3H. Theorems 3.2 and 3.3 suggest that the minimal set of instruments depends on the structure of error covariance. It appears that the minimal set of instruments depends on the block diagonality of the Optimal weighting matrix E(w;2w3). In the model with a diagonal error covariance, the optimal weighting matrix is a block diagonal with the t- th block O§E(x‘i"x“?) 
, for which only xit is non-redundant since (1;2E(x‘i"x‘i’)‘1 meets the regressor xit only. Recall that the instruments 119x? are equivalent to (e,ox§’,Lex$) . In the random effect model, the Optimal weighting matrix becomes block diagonal between the two 34 blocks eT'eTO E(x“?'x‘i’) and bL'LoE(x‘i"x‘i’) . In the sample, the first block corresponds to the regression 4 [X'P(X°®IT) (INOPT)X] X'P(XO®IT) (IN®PT)Y. Thus, replacing PX for X°®eT produces the same result. For the same reason, nothing differs whether we use QX or x%u; for the regression in the space spanned by L. It would be worth noting that if 2 is diagonal the optimal weighting matrix becomes block digonal even when the instruments are weakly exogenous to the errors, but when 2=PT+bQT the optimal weighting matrix is not block diagonal upon the weakly exogenous instruments. This would be the reason why Ahn and Schmidt (1992) get the result that all of the instruments are not redundant in the dynamic panel data model with the random effect covariance structure. In dynamic model, the instruments corresponding to the lagged dependent variables are weakly exogenous. Details are in Chapter 3. We end section 3 with another model where the optimal weighting matrix is block diagonal. 3.4. A Generalization of the Random Effects Assumption Note that E = PT+bQT is sufficient but not necessary for the block diagonality of the optimal weighting matrix upon the instruments (eTox‘i’,Lox‘i’) . E = PT+bQT => PTEQT = 0, 35 but not vice versa (Lemma A1.2 in Appendix 1). Even though it seems unlikely in applications for )3 to satisfy PTEQT = 0 but not be of the form PT+bQT, theoretically it is worth looking into this case in greater details. The properties of E which satisfies PT'L‘QT = 0 are collected in Appendix 1. Because the nonredundant set of instruments depends in different ways on time-constant and time-varying regressors, we now explicitly separate the two. Write Yit = xitfi + Zi‘y + uit' t = 1:"‘1T, (3.3) where xit is 1xk and zi is 1xg; note that zi'can include a constant. Note that uit need not be separated into a time constant and time varying errors for stating the results of this section. For consistent estimation of B and 1 by, say, GLS, in addition to (3.1) we would now need the condition E(zgmn) = 0. Interestingly, if 2 satisfies certain conditions, the 3SLS estimator with a reduced set of instruments is the GLS estimator. The condition on 2 is formally stated as ASSUMPTION 3.3: PTEIQT = 0. As we mentioned earlier, if z is of the random effects form then it satisfies Assumption 3.3, but the converse is not true. 36 THEOREM 3.4: In model (3.1) under Assumption 3.3, 3SLS using the instruments (arm-(“etozwmxn is GLS. PROOF: Appendix 2. I Theorem 3.4 shows that the (T-1)(2k+g) instruments (mxiflozweroxi) are redundant when PTEQT = 0. Redundancy of eto'xi is rather obvious. The regression is separated into the two orthogonal spaces and the error covariance is idempotent in the space spanned by es. On the other hand, the intuition behind why the instruments L®(xi,zi) are redundant is not that obvious. Note that ii, the time constant component of time-varying instruments x3, behaves just like the time-constant instruments zi. In the previous subsection, if the error covariance is of the random effects structure then the GLS estimator is a convex combination of the between and the within estimators. 
Similarly, when the error covariance satisfies Assumption 3.3, the GLS estimator is a convex combination of the between estimator and the GLS estimator on the differenced data. To show this, let A = x1 (INOE'1)X, and note that the GLS estimator on the differenced data is Em = [X'(IN®L(L'2L)’1L'}X]'1X'{IN®L(L'2L)'1L'}Y, and 13,.»3'1PT = éP, (Lemma A1.5 in Appendix 1) . Then, am A'1x' (1,392")! = A'1[X'(IN®PTE'1PT)Y + x' (INGQTZ'1QT)Y] = A’1‘;X'PY + A'1X'(I"®L(L'2L)'1L')Y A"%(X'PX)fib + A'1[X'(INOL(L'2L)'1L')X]§GLS. 37 4. MODEL NHERE THE REGRESSORS ARE CORRELATED WITH THE TIME CONSTANT ERROR COMPONENTS In this section we consider two models where the time-constant unobserved effect may be correlated with some or all of the regressors. In the first model all regressors are time-varying and possibly correlated with the unobserved effect. If in addition the idiosyncratic errors are assumed to be serially uncorrelated with time-constant variance, this effectively corresponds to the traditional fixed effects model. When the variance-covariance matrix of the idiosyncratic disturbances is unrestricted we get the unobserved effects analog of Kiefer's (1980) fixed effect model. In the general case we derive the equivalence of several estimators that are suggested by the structure of the model. In sections 4.2 and 4.3 we study the HT model, where some regressors are assumed to be orthogonal to the unobserved effect. The original HT model assumed i.i.d. idiosyncratic errors. We cover this case in section 4.2 and in section 4.3 derive new redundancy results when the variance-covariance matrix of the idiosyncratic errors is unrestricted. 38 4.1. A "Fixed Erfects" Tyne Model The model can be written as yit = xitfi + 4’1 + an = xitfl + uit, t = 1,---,T. (4.1) Now the orthogonality condition underlying the analysis is E(xgug) = 0, (4.2) which is a strict exogeneity condition but allows xit and Oi to be arbitrarily correlated. This arbitrary relationship between xi and 4’1 gives (4.1) a fixed effects flavor. Under (4.2), only coefficients on time-varying regressors are identified; thus, for this subsection, x contains only M time-varying regressors. Under (4.2) the valid instruments for estimating 6 are given by _ 0 where recall that x‘i’ a (xi1,- - -,x”) is a 1ka row vector and L is the Tx(T-l) differencing matrix (see section 3.3). Thus, w; is Tx(T-1)k. Not surprisingly, a reduced set of instruments is available under standard assumptions. Under Assumption 3.2, that is E = OEIT + aie,e,', the within (or "fixed effects") estimator is known to be efficient (provided Assumption 2.3(a) holds with wg== Loxfi). Thus, it seems natural that other efficient estimators would 39 be the same as fixed effects. THEOREM 4.1: In model (4.1) under Assumption 3.2, the BSLS estimators using the instruments Lox? and thi and the GLS estimators on the demeaned and differenced data are the within estimator. PROOF: The two BSLS estimators using the instrumental variables Lox? and eri are the within estimators since x' (X°®L) [X°'X°®L(PT+bQT)]'1(X°®L) 'x = gx'ox. And the GLS estimators on the demeaned and on the differenced data are the within estimator because xi'QTQIQTXi = xi'eri = xi'L(L'L)-1L'in where Q; denotes the generalized inverse of QT. II If we allow E(eie;) to have an unrestricted form, then 2 = E(uiui') is also unrestricted and this effectively gives the setup of Kiefer (1980). We can apply Theorem 2.4 to show that some instruments used in 3SLS are redundant. 
THEOREM 4.2: In model (4.1), the two 3SLS estimators using the instruments (LoiEwLoxi) and Lexi are numerically identical. Thus, Lexi are redundant. PROOF: From Theorem 2.4, it is sufficient to show that (XOL) 1): = (XOL) ' (Inez) (XOL) [(XOL) ' (Inez) (XOL) ]'1(XoL) 0:. But, the RHS is (X'OL'E)[P(x)®L(L'£L)'1L']X = (x'eL'z) (IN®L(L'2L)’1L')(P(g)®Qr)X = (XOL) 'x. The last 40 equality holds from Lemma A2.1 in Appendix 2. I Theorem 4.2 shows that among the T(T-1)k instruments Lox$, the (T-1)k instruments Lon-ci are redundant. As before, this result can be used to construct a GMM estimator that is no less efficient than 3SLS when Assumptions 2.1(a), 2.2(a), and 2.3(a) hold with w3== Lox?; if, in particular, Assumption 2.3(a) should fail, this GMM estimator is more efficient than 3SLS, and it adds no more orthogonality conditions. Other estimators under these assumptions also suggest themselves. One can apply GLS on the first differenced or demeaned equations. Kiefer (1980) proposed GLS using the demeaned data using (QIEQTY', the generalized inverse of error covariance on the demeaned data. It seems clear that no information is lost by deleting any one equation in the demeaned data, since any one equation is the negative sum of the rest of the equations. Also demeaning and differencing preserve the same information. There are several estimators that are numerically identical. Let 335133 381.8 estimator using the instruments L®Xi in the original data, Kiefer's estimator, b K 1: ‘0 GLS estimator in the demeaned equation after deleting b o 3 any one equation, GLS estimator in the differenced equation, El 0 1! 33s”); 3SLS estimator in the differenced equation using all 41 of the instruments amv’ GIV estimator in the differenced equation using all of the instruments. THEOREM 4'3: fissLs = 31:1: = Bun = fior = fi3SLS = ficw' PROOF: Appendix 2. I 4.2. nausnan and Taylor Model J The HT model is the model (3.3) where xi== (xfi,xfi) and zi = (z1i,22i). Thus, yi = XML + XZiBZ + (eTozHM1 + (eTozmM2 + 4’191 + 6i. (4.3) The dimensions of xm, x2“, z“. and z2i are 1xk“ 1xk2, 1xq1 and 1xg2, where k=k1+k2 and g=g1+gz. fi=(fi1',fiz') ' and 12(1f,15)'. Assumptions 2.1 - 2.3 and Assumption 3.2 that the error covariance is that of the random effect model are assumed in the HT model. The distinctive feature of the HT model lies in the assumptions E(x”®ug ==0, (4.4) E(xa®ei)== 0, (4.5) E(zfioug ==0, (4.6) E(zfioei)== 0, (4.7) E(x2i't¢i) is the same for t = 1,~--,T. (4.8) The conditions (4.4)-(4.8) determine the instruments 42 available in the model. (4.4) implies the Ti’k1 instruments Iroxgi. (4.5) implies the T(T-1)k2 instruments Loxgi. (4.6) implies the Tg1 instruments ITozfi, and (4.7) implies the (T-1)g2 instruments LozZi. (4.8) adds the (T--1)k2 instruments eTOSEZi. The condition (4.8) and the additional instruments 9195221 were proposed by BMS. Together, we have [T2k1+(T"’--1)k2 +Tg1+(T-1)g2] instruments wi = (Ipxfi’wLoxgi, Itozfi,Loz2i,eTo§2i) , which are equally represented by [Lo(x‘1’i, xgi'z1i'ZZi)'eT®(xc1,i' S:‘21'219 ] ° For the model to be identified, the number of instruments in the space spanned by eT should not be smaller than the number of the time constant regressors, that is, Tk1-1-(T-1)k.‘,+g1 2 g1-1-g2 should hold. Under the random effects covariance structure in Assumption 3.2, EMS and Ahn and Schmidt (1992) showed that the minimal instruments needed for the most efficient 3SLS estimator are [thi,eTo(x‘1’i,x2i,z1i)]. 
All the instruments in the space spanned by eT are not redundant, but only the k instruments thi are not redundant among the T(T-1)k+(T-1)g instruments Lo(x‘i’,zi) in the space spanned by L. Since 2 = P,-1-bQT and all the instruments belong exclusively either to the space spanned by eT or to the space spanned by L, the regression is separated into the two orthogonal spaces. It is entirely valid to find the minimal set of instruments in each space separately. In the space spanned by L, the error covariance QIEQT = bQT is a scalar idempotent, thus it is not 43 surprising that the instruments (2.xi are sufficient to reach the GLS efficiency. In the space spanned by e,, the error covariance is idempotent but eTo(x1i,22i) are correlated with 45,, thus it fits our intuition that all the instruments in the space spanned by eT are not redundant. THEOREM 4.4: In model (4.3) under Assumption 3.2, the 3SLS estimator using the instruments [QTxi,eTo(x§’i,§2i,z1i)] is a convex combination of the within estimator and the ZSLS estimator using the instruments eT®(x‘1’i,x2i,z1i) . PROOF: Let di = (x‘1’i,x2i,z1i) . In the sample the instruments are (QX,DoeT) . Let R = (X,ZoeT) , the regressors. Note that QR = QX. Let bX'QX o '1 X'QX A a R' x,Do = (Q e7”: 0 D'DoeT'eT ] [ (DoeT) 'R ] .. 1 A _ -1 .1. _ -1 A _1 A 3351.5 ' A (bX'QY+R'P(D®er)Y) - bA X'QXB" + A R'P(DoeT)R323Ls° I THEOREM 4.5: In model (4.4) under Assumption 3.2, the BSLS estimator using the instruments (QTxi,etodi) is the GIV estimator using the same set of instruments. PROOF: By Theorem 2.1, it is sufficient to show that (p,+f;Q,)(Q,xi,eTodi) = (Jb*QTxi,etodi) = (QTxi,eT®di)B, where B is the Tk1> PIEQT = 0. A counter-example which satisfies P1291 = 0 but not of the form 2 = aPT + bQT is sufficient for the proof. Suppose 3 1 0 2 = [ 1 4 -1 ]. O -1 5 2 is symmetric positive definite and the sum of each row is 4, but 2 is not of the form aPT + bQT, because not all of its diagonal elements are equal and not all of its off diagonal elements are equal. I ' ”"1 49 LEMMA A1.3: E = aPT + bQT e PTZQT = 0 when T = 2. PROOF: Equality of the off diagonal terms is guaranteed from the symmetry of E and the equality of diagonal terms is enforced from Lemma A1.1. Thus 2 is of the form P1 + bQT. I LEMMA A1.4: If PIEQT = 0 and either all of the off diagonals in 2 are equal or all the diagonal terms in 2 are the same, then 2: = aPT + bQT. PROOF: If the off diagonal terms are equal, all of the diagonal terms should be the same each other for PTEQT = 0 to hold from Lemma A1.1. Thus the two statements 2 = aPT-+ bQT and PTZQT = 0 are equivalent when the off diagonal terms of E are the same. To prove the statement that the equal diagonal elements of E and PTEQT = 0 implies )3 = aPI + err mathematical induction is used. When T = 3, imposing the equality of the diagonal terms and from the symmetry of 2, 2 is expressed as a on GB 013 023 a From Lemma A1.1, a12+a13 = O12+023 = O13+Oz3, hence, 012 = 013 = 08, Suppose the Off diagonals are the same. If the diagonal terms are the same for T = t > 3, then for T = t+1, the (t+1)—th off diagonal terms are forced to be the same (Lemma A1.1). I 50 o _ _ -1 .- LEMMA A1.5. If PIEQ1 — 0, then P.,ZIPT — aPI, PTZ PT - 1P a 1' and e1(e,'2e,)"e1' = P.2'1PT, where a is the sum of each column of 2.". PROOF: PIZJPT = e,(eT'eT)"eT'Ee,(eT'eT)"eT'= gPT, where s = et'zzeT is the sum of all elements of 2, thus % = a. Hence, the first result follows. (P,>:"P,) (P,2P,) = PT. Thus, (P.2T‘P,)aPT = P1 and PTE'1PI = éPT. 
The third result follows trivially. I LEMMA A1.6: If PTZQT = 0, then QT2'1QT = L(L'2L)"L'. PROOF: It is sufficient to show that QTZ'1QT2 = L(L'2L)'1L'2. 4 _ 4 _ - _ - Note that Q: Q72 — QTZ‘ ‘2."QI — 0., Since PTZIQT - 0 =: Q12 — QTZIQT = ZQT. Thus, what we need to show is that L(L'2L)'1L'2 = QT. But, L(L'EL)"L'2(P,+QT) = L(L'ZL)'1L'EQT, since P,EQt = o .. L'Ee, = o and L(L'EL)"L'EQT = L(L'EL)"L'2L(L'L)"L' = Q I T. LEMMA 111.7: PTEQT = o e PTE"QT = o. PROOF: PTZQT = o e E = PTZPT + QTZQI e PTE = P.2PT and QTZ = QTBQT. Post-multiplying by E", P1r = PTEPTZ'1 = PTEPTE'1PT = 2PT'2T‘PT and pre-multiplying by 2", we have 2:"PT = PT2'1PT, which is the condition we are looking for. Exactly the same procedure shows that '2'i‘1QT = Q.2"Q.. Given sufficiency, necessity is obvious. I 51 In fact, 2" = (PTZIPI + QIEQI)" = (P.2"P, + QTanT) for any integer n, which implies 2" = (PJI‘PT + QTE'1QT) if PTZIQT = 0. 52 APPENDIX 2 LEMMA A2.1: (P(x)oQT)X = (P(R)OQT)QX = ox. PROOF: Note that when k = 1, X = vec(X°') and X = X°QT. vec(Q,X°') = QX, which is valid when k > 1 by applying this argument to each of the regressors separately. In fact, X are the first (T-1)k columns of X°QT, but P(X)X°Qt = X°Q,, simply because the projection of QT after deleting any one column of QT is still QT. I PROOF of Theorem 2.4: 25 = 3 if x'w1(w1'nw1)"w1'x = X'W(W'flW)'1W'X. But, '1 W2'flW2 WZ'flW1] [WZ'X] X'W(W'nW)'1W'X = (x'w2 X'W1)[ w1'nw2 w; aw1 w; x 4 _ 4 4 _ 4 4 x'wzn wz'x x'wzo wz'nw1(w1'nw1) w1'x x'w1(w1'nw1) w1'nwzo wz'x + x'w1(w1'nw1)“w,'x + x'w1(w1'nw1)"w;{211213411150111 (w1'nw1)"w1'x, from the partitioned inverse lemma. Thus the condition is x'wzn"w,_'x - x'WZD'1w2'nw,(w1'nw,)"w,'x - x'w,(w1'nw,)"w1'nwzn"w2'x + x'w1(w1'nw1)‘1w1'nwzo'1w2'nw1(w1'nw1)“w1'x = 0, which is A'D‘1A, where A = wz'x — wgnw1(w1'nw,)“w,'x and D = 11219112 - W2'nW1(W1'flW1)"W1'nW2, a nonsingular positive definite. Thus A'D'1A = 0, iff A = o. I 53 PROOF of Theorem 3.4: Let H=(X,Z). It is sufficient to show that X'XOL'ZL ]‘[ (XOL) ' (XoL,HoeT)[ J (X,Z®e.) H ' Hoe; ‘2.."eI (HoeT) = (1“82'1)(X,ZoeT). The LHS is [P(R)OL(L'2L)'1L'](X,Z®eT) + [P(H)oe,(e;ze,)"e;] (x,2oe,) = [IueL(L'EL)'1L']X + [Inoe,(e;2eT)"e;](x,ZoeT) = (INoQTE‘1QT)x + (I'OPTZ'1PT)(X,Z®eT) = (Iuoz'1)(X,ZoeT). The first equality follows from Lemma A2.1 and the second equality follows from Lemmas A1.5 and A1.6. I PROOF of Theorem 4.3: Ems = Ems = 76'6“, in the differenced equation from Theorem 2.1 and Theorem 3.1. EELS = 333“ since X'(XOL)(X'XOL'ZL)'1(X®L)'X = x' [P(g)@L(L'ZL)'1L']X = x'[INoL(L'EL)"L'][P(x)oQT]x = X'(INOL) (INOL'ZL)'1(IN®L) 'x. The last equality follows from Lemma A2.1. To show fifi.= 30,, it is sufficient to show that L(L'2L)"L' = (9,29,)‘2 But: C212£2114(I:"231:)"L'QTEQT = 0,20,, L(L'):L)"L'QTEQ.L(L'EL)"L'= L(L'EL)"L' and QTZQTL(L'2L)'1L'= L(L'2L)'1L'QT2QT = or Thus L(L'EL)'1L' is the unique generalized inverse of QTF.QT (Theil, 1971, pp 269). To prove 25,, = Bo". Let ii 3'1 0 , ] and (QTXQTV = [ ] xi QTxi=[ o o where ii denotes the first T-l rows and x? denotes the last row of QTi. Hence, we are looking at the case when the 54 last row of the demeaned data is deleted. Then, 1 * ”1 £1 81 0 {‘i "I ’1" xiQT(QT2QT) QTXi = [xi 'xi 1 ][ c] = X15 xi' 0 0 xi Deleting any one other row instead of the last row of the demeaned data makes no difference. I PROOF of Theorem 4.6: Let R = (x,z6e,), H = (322,22) and G = (xg,22,z,). Note (HoL)'R = (HoL)'X. From Theorem 2.4 it is sufficient to show that (HoL)'(INo2)(GoIT)[(GoIT)'(Iuoz)(co1.)]"(co1,) 'R = (HoL) 'R. 
The LHS is (HoL)'(P(G)oIT)'R = (HoI.)'(P(G)oL)'X = (HoL)'X (Lemma A2.1). I PROOF of Theorem 4.7: Let R = (X,ZoeT) and G = (X?,XZ,Z1). For the 3SLS estimator using the instruments (GOIT), (GOIT) [ (GOIT) ' (Inez) (GOIT) ]'1(G®IT) ' (X,Z®eT) = (P(G)ozq)(X,Zoefi. And, for the GIV estimator using the instruments (QX,Goefi XIQ '1 x1 (I oE")(Qx,Goe ) (I o2")(Qx,Goe ) Q (I o2")(x,zoe ) " I (Gee ) " ' (Gee ) " ' T T [ (I'82"QT)XD'1X ' (INOQT2‘1) (luoz"Q,)xn“x' (P(G)®QTZ'1eT(eT'E'1e1)'1e1'2'1) (P(G)82'1e1(e{2'1e1) '1eT'2'1QT) XD'1X' (Iqu,E") + P(G)OZ'1eT(eT'2'1eT)‘1eT'2'1 + (P(G)®2'1e1(e1'2'1e7)’1eT'2'1QT)XD'1X' (P(G)8Q,2'1et(e1'2'1et)'1e7'2'1) ] ° (xlzee‘f) I 55 where D = x' (INeQT2'1Q,)x - x'[P(G)eQ,2"e,(eT'z"eT)"e;2"Q,]x. Let A n [x' (InsQ12'1)-X'(P(G)®QTZ'1eT(eT'2'1e,)'1eT'2'1)](X,ZoeT) . A -— [x' (INeQ,2“Q,) -x' (P(G)oQTE'1eT(eT'2'1eI)'1eT'2'1Q7) 1 (x,Zee,) + [x'(IueQT2“P,)-x'(p(G)eQ,2"e,(e;z"e,)"e;2"PT)1(X,Zee,)] = D. Thus, the 1st and the 2nd terms add up to (IN®2'1QT)X and the 3rd and the 5th terms add up -(P(G)92"er(er'2’1eT)"eT'2'1Q,)X. Together, we have (1'82'1QT)X - (P(G)92"eT(eT'Z'1eT)"et'z‘ng + IT [P(G)oz"e1(e;z'1e,)"eT'E'H(X,Z®e,). For the regressor ZoeT, I" [P(G)92'1eT(eT'Z'1eT)'1eT'2'1](ZoeT) = (9(G)e2")(2ee,), and for x, [Iner‘or- P(G)®2'1eT(eT'Z'1eT)"e;2'1QT + p(G)e2"eT(e;2"eT)"egzfljx = (1.32"ng + (P(G)®2'1PT)X = (P(G)ez“)x. I CHAPTER 3 BBTIHATION USING PANEL DATA UNDER WEAK BXOGBNBITY 1. INTRODUCTION In this chapter we study linear panel data models where the regressors are only weakly exogenous. The primary concern is with the consistency of estimators when the errors are serially correlated, and with the consistency of the usual standard errors of 3SLS (appropriately defined, see Chapter 2) estimators when certain instruments are used. We also discuss how to construct some reduced lists of instrumental variables that would lead nearly efficient estimators. A leading example of the weakly exogenous model is the dynamic model with lagged dependent variables. Anderson and Hsiao (1981), Bargava and Sargan (1983), Holtz-Eakin, Newey and Rosen (1988), Arellano and Bover (1990), Arellano and Bond (1991), Ahn (1990), and Ahn and Schmidt have studied efficient estimation in dynamic models. The rational expectations model is an another important example. Using panel data to test the rational expectations hypothesis has lead to renewed interest in studying weak exogeneity in panel data models (Zeldes, 1989; Kean and Runkle, 1990; Runkle, 1991). Generally, there is a growing realization 56 57 both in time series and panel data contexts that many regressors in general models would be only weakly exogenous to the errors. However, only a few study exist that deal with the general weak exogeneity (Keane and Runkle (1992) and comments). An important feature of weak exogeneity is that different instruments are available for each period so that T GLS transformation, in general, will bring the inconsistency ‘ of the resulting estimators (Schmidt, 1990). An important exceptional case is when E is diagonal (Chapter 2, Theorem 2.3). When 2 is diagonal, the redundancy result of Theorem 3.2 of Chapter 2 also applies to weakly exogenous case. Consequently, a general result that GMM using all the moment conditions is the best specially has a force in weakly exogeneous case with non-diagonal covariance matrix (Ahn and Schmidt, 1992; comments on Keane and Runkle, 1992). 
Schmidt, Ahn, and Wyhowski (1992) (SAW henceforth) provide lists of instrumental variables for each of several weakly exogenous models. The structure of the weakly exogenous instruments provided by SAW is quite similar to the structure of the instruments for the lagged dependent variables in dynamic models. It is now well known that no moment conditions between the lagged dependent variables and the disturbances exist unless the covariance matrix is somehow restricted. We ask a basic question: are the a priori population moment conditions currently suggested in weakly exogenous models valid when the idiosyncratic errors are serially correlated? Generally, it is likely that there is a link between covariance restrictions and orthogonality conditions, as there is in dynamic models. We investigate this in the next section.

If the variance of the disturbances conditional on the instrumental variables equals the unconditional variance, the usual 3SLS standard errors are in general consistent. Wooldridge (1993) showed that the usual standard errors from nonlinear 3SLS in hedonic pricing models are not consistent even when the conditional variances of the errors are constant. A similar result was obtained by Ahn (1990) in dynamic panel data models. SAW suggested that the equi-correlation assumption of Breusch, Mizon, and Schmidt (1989) (BMS hereafter) can hold in weakly exogenous models. We show, in section 3, that the usual 3SLS standard errors are not consistent if the instrumental variables implied by the BMS assumption are used, under any assumptions that are plausible for weakly exogenous models. We also link this to the result obtained by Ahn (1990) in dynamic models.

Keane and Runkle (1992) proposed 2SLS after forward filtering the equations in weakly exogenous panel data models with serially correlated errors, adapting a suggestion by Hayashi and Sims (1983) made in a pure time series context. But it was shown by SAW that the Keane and Runkle estimator is numerically identical to 3SLS when all of the instruments are used, so that forward filtering is an unnecessary complication. However, it remains an interesting question how forward filtering can reduce the list of instruments, since in many instances using all of the instruments is not even feasible. Keane and Runkle provided evidence through an example that forward filtering can bring non-trivial efficiency gains, though this does not generalize to other cases. The arguments of Chapter 2 comparing 3SLS and GIV apply here. The main idea of Keane and Runkle is that forward filtering whitens the errors, so that applying the instruments (without transformation) to the forward-filtered equations should yield better estimators. However, 3SLS itself can be written as 2SLS on the forward-filtered equations using a suitably transformed instrument list rather than W, so what Keane and Runkle suggest amounts to using W in place of the transformed instruments on the forward-filtered equations. To compare the two, we need to compare P(W) with the projection onto the transformed instrument list, where P(.) denotes the projection onto the columns of (.). This comparison is, in general, not entirely clear; see Chapter 2 for more details. In the presence of heteroskedasticity, both 3SLS and the Keane and Runkle estimator are less efficient than GMM based on the same instruments, and the comparison between the two GMM estimators based on the two instrument lists is even more ambiguous.

However, in finite samples, GMM estimators based on a huge number of instrumental variables might not have desirable properties.
For example, in finite samples the standard errors can grow as instruments are added, and this is more likely to happen as the number of instruments approaches the number of observations. Thus, it is of practical use to find reduced lists of instruments that generate estimators with desirable properties. In section 4, we suggest some weighted sums of the given long lists of instrumental variables which do not generate fully efficient estimators but should lead to nearly efficient ones. We should note, however, that the usefulness of applying these reduced lists of instrumental variables remains to be seen. Section 5 concludes.

2. SERIAL CORRELATION AND CONSISTENCY OF ESTIMATORS

This section shows that the moment conditions in weakly exogenous models are in general restricted by the structure of the covariance matrix. Before doing this, we review previous results on the moment conditions for weakly exogenous regressors, in particular those provided by SAW.

2.1. Moment Conditions under Weak Exogeneity

The model we consider is

y_it = x_it β + u_it, t = 1,...,T.   (2.1)

Let y_i = (y_i1,...,y_iT)', x_i = (x_i1',...,x_iT')' and u_i = (u_i1,...,u_iT)', of dimensions T x 1, T x k and T x 1, respectively, so that (2.1) can equally be expressed as y_i = x_i β + u_i. {(y_i, x_i): i = 1,...,N} is an i.i.d. sequence and the fourth moments of (y_it, x_it) exist. Let Σ = E(u_i u_i'). If there are no unobserved individual effects that are correlated with the regressors, weak exogeneity is defined by the assumption

ASSUMPTION 2.1: E(x_it' u_is) = 0, 1 ≤ t ≤ s ≤ T.

This implies the ½T(T+1)k instruments diag(x_i1^o, x_i2^o,...,x_iT^o), where x_it^o = (x_i1,...,x_it), t = 1,...,T. Note that we have placed no restrictions on Σ. Also, Assumption 2.1 allows for unobserved effects that are uncorrelated with the regressors.

When we introduce unobserved fixed effects that are correlated with the regressors, the errors become composites of a time-constant and a time-idiosyncratic component, u_it = φ_i + ε_it, t = 1,...,T. The assumption that corresponds to weak exogeneity in the presence of explicit unobserved effects is

ASSUMPTION 2.2: E(x_it' ε_is) = 0, 1 ≤ t ≤ s ≤ T.

Under Assumption 2.2, φ_i is allowed to be arbitrarily correlated with the regressors. Under Assumption 2.2 the coefficients on time-constant variables are not identified, so we simply assume that all the regressors are time-varying. It is usual, under Assumption 2.2, to estimate the parameters after differencing. In the differenced equations the errors are Δu_it ≡ u_it - u_i,t+1 = ε_it - ε_i,t+1, t = 1,...,T-1, and the orthogonality conditions are

E(x_it^o' Δu_is) = 0, t ≤ s = 1,...,T-1.   (2.2)

Thus, we have the instrumental variables diag(x_i1^o,...,x_i,T-1^o), with ½T(T-1)k columns, in the differenced equations. Let w_i = diag(x_i1^o, x_i2^o,...,x_i,T-1^o). Applying w_i to the differenced equations amounts to applying the instrumental variables Lw_i to the original equations before differencing (SAW), where L is the T x (T-1) differencing matrix (for the definition of L, see SAW or Chapter 2). While applying the instruments Lw_i to the original equations and applying w_i to the differenced equations yield numerically identical estimators, using Lw_i in the original equations has important advantages, both for the identification of coefficients on time-constant variables if they exist and for the efficiency of all of the estimators, whenever we have some instrumental variables that are not in the space spanned by L. The construction of these instrument sets is sketched below.
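A minimal sketch (my own code; the names and dimensions are illustrative) of the two equivalent instrument representations just described: the block-diagonal w_i applied to the differenced equations and its levels counterpart Lw_i, whose moment functions coincide.

```python
import numpy as np

def sequential_instruments(x_i):
    """Build w_i = diag(x_i1^o, ..., x_i,T-1^o) for the T-1 differenced
    equations, where x_it^o = (x_i1, ..., x_it) is 1 x t*k; x_i is T x k."""
    T, k = x_i.shape
    w = np.zeros((T - 1, k * T * (T - 1) // 2))
    col = 0
    for t in range(1, T):                       # block for differenced equation t
        w[t - 1, col:col + t * k] = x_i[:t].reshape(-1)
        col += t * k
    return w

rng = np.random.default_rng(0)
T, k = 4, 2
x_i = rng.normal(size=(T, k))
u_i = rng.normal(size=(T, 1))

# T x (T-1) differencing matrix L: (L'u_i)_t = u_it - u_i,t+1
L = np.zeros((T, T - 1))
for t in range(T - 1):
    L[t, t], L[t + 1, t] = 1.0, -1.0

w_i = sequential_instruments(x_i)               # (T-1) x T(T-1)k/2
Lw_i = L @ w_i                                  # the same instruments in the levels equations

# identical moment functions: (Lw_i)'u_i = w_i'(L'u_i), i.e. w_i applied to differences
print(w_i.shape, np.allclose(Lw_i.T @ u_i, w_i.T @ (L.T @ u_i)))
```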
2.2. Serial Correlation and Moment Conditions

It is usually assumed that {ε_it: t = 1,...,T} is an uncorrelated sequence with constant variance that is also uncorrelated with φ_i. In this case Σ takes the random effects covariance structure

Σ = σ_ε² I_T + σ_φ² e_T e_T',   (2.3)

where I_T is the T x T identity matrix and e_T is the T x 1 vector of ones. But there is no reason to think that (2.3) holds universally. In GMM, the restriction (2.3) makes no difference unless we utilize the moment conditions implied by (2.3), namely the moment conditions from covariance restrictions. Σ is estimable in the models we deal with, and imposing restrictions like (2.3) without testing can be too limiting.

In dynamic models, the set of instrumental variables corresponding to the lagged dependent variables relies heavily on whether the idiosyncratic errors are serially correlated (Ahn and Schmidt, 1992; Arellano and Bond, 1991). Nevertheless, serial correlation of the idiosyncratic errors in weakly exogenous models has often been presumed to have nothing to do with the set of instrumental variables (Runkle (1991) is an exception; he expressed the concern that the usual instrumental variables might not be valid if the time-varying errors are serially correlated). We consider whether the instruments Lw_i (or Assumption 2.2, which generally rests on a priori reasoning) remain valid when the idiosyncratic errors are serially correlated in the model with fixed effects. Although we study the case in which the regressors are correlated with the unobserved individual effects, the results apply equally to the model with no fixed effects. Although it is unnecessary to impose any parametric form on the serial correlation, for simplicity we do so, and consider two examples that are simple and of particular interest in practice: AR(1) and MA(1) errors.

For the AR(1) case, suppose

ε_it = ψ ε_i,t-1 + ζ_it for some non-zero constant ψ,   (2.4a)
E(ζ_it x_it^o') = 0,   (2.4b)
E(ζ_it φ_i) = E(ζ_it ε_i,t-1^o) = 0,   (2.4c)

for t = 1,...,T. Under weak exogeneity the regressors are correlated with the lagged errors, so that x_it is correlated with ε_i,t-j for j > 0; thus

E(x_it' ε_i,t-j) ≠ 0, j > 0.   (2.5)

We now examine whether the moment conditions of Assumption 2.2 are valid given (2.4) and (2.5). But

E(x_it' ε_it) = ψ E(x_it' ε_i,t-1) + E(x_it' ζ_it) = ψ E(x_it' ε_i,t-1) ≠ 0,

and the same argument shows that E(x_it' ε_i,t+j) ≠ 0, j > 0. Thus, given (2.4), Assumption 2.2 is at odds with condition (2.5), just as in dynamic models. Condition (2.4), in fact, implies a set of orthogonality conditions that are not linear in the parameters. Ignoring (2.4c), the covariance restrictions, (2.4b) implies

E[(Δu_it - ψ Δu_i,t-1) x_it^o'] = 0, t = 2,...,T-1,

since (u_it - u_i,t+1) - ψ(u_i,t-1 - u_it) = ζ_it - ζ_i,t+1. These are [½T(T-1) - 1]k moment conditions, so that, compared with Lw_i, the number of moment conditions lost to AR(1) serial correlation is k.

We next consider the case in which the idiosyncratic errors follow an MA(1) process, so that

ε_it = η_it - ρ η_i,t-1,   (2.6a)
E(η_it x_it^o') = 0,   (2.6b)
E(η_it η_i,t-1^o) = E(η_it φ_i) = 0,   (2.6c)

for t = 1,...,T. Then

E(x_it' Δu_it) = E[x_it'(η_it - ρη_i,t-1 - η_i,t+1 + ρη_it)] = -ρ E(x_it' η_i,t-1) ≠ 0.

Thus, if we use the instruments Lw_i, the resulting estimator will be inconsistent in this case. However, from (2.6b) we have the ½(T-1)(T-2)k moment conditions

E(x_it^o' Δu_i,t+j) = 0, j ≥ 1.   (2.7)

These are linear in β, and we have the instruments

[    0           0        ...       0         ]
[  x_i1^o        0        ...       0         ]
[ -x_i1^o      x_i2^o     ...       0         ]
[    0        -x_i2^o     ...       0         ]
[    .           .         .    x_i,T-2^o     ]
[    0           0        ...  -x_i,T-2^o     ]

(each block x_it^o enters the levels equations t+1 and t+2 with opposite signs, so applying this matrix to the levels equations reproduces the differenced moments in (2.7)). Comparing this with Lw_i, the number of instruments removed by the MA(1) serial correlation is (T-1)k. Both failures, and the corresponding valid moments, are easy to see in a small simulation, sketched below.
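The following Monte Carlo check uses a data generating process of my own choosing (ψ, ρ and the feedback coefficient lam are illustrative values, not taken from the text): under AR(1) or MA(1) idiosyncratic errors with weakly exogenous regressors, the usual differenced moment E(x_it' Δu_it) is clearly nonzero, while the adjusted moments derived above are approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N, psi, rho, lam = 200_000, 0.5, 0.5, 1.0

# AR(1): eps_it = psi*eps_i,t-1 + zeta_it, with x_it = lam*zeta_i,t-1 + noise
zeta = rng.normal(size=(N, 5))
eps = np.zeros((N, 5))
for t in range(1, 5):
    eps[:, t] = psi * eps[:, t - 1] + zeta[:, t]
x = lam * zeta[:, :-1] + rng.normal(size=(N, 4))      # x_i1, ..., x_i4
du = eps[:, 1:4] - eps[:, 2:5]                        # du_it = eps_it - eps_i,t+1 (phi_i would cancel)
print("AR(1):",
      np.mean(x[:, 1] * du[:, 1]),                        # E(x_i2 du_i2) != 0
      np.mean(x[:, 1] * (du[:, 1] - psi * du[:, 0])))     # E[x_i2 (du_i2 - psi*du_i1)] = 0

# MA(1): eps_it = eta_it - rho*eta_i,t-1, with x_it = lam*eta_i,t-1 + noise
eta = rng.normal(size=(N, 6))
eps = eta[:, 1:] - rho * eta[:, :-1]                  # eps_i1, ..., eps_i5
x = lam * eta[:, :-2] + rng.normal(size=(N, 4))       # x_i1, ..., x_i4 depend on eta_i0, ..., eta_i3
du = eps[:, :4] - eps[:, 1:5]
print("MA(1):",
      np.mean(x[:, 1] * du[:, 1]),                    # E(x_i2 du_i2) = -rho*lam != 0
      np.mean(x[:, 1] * du[:, 2]))                    # E(x_i2 du_i3) = 0, as in (2.7)
```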
The instruments that becomes invalid is L[diag(xH,---,x"4)], which corresponds to the statement in (2.7) that E(xi'tniH) e 0. However, it is interesting to note that the condition E(xfinnwfi ==0 can be valid, giving an alternative explanation for the serial correlation. As appears in rational expectations literature, time lags until the shocks are observed by individuals can cause serial correlation. If this is the case, it is quite possible that the one period lagged errors are uncorrelated with the current regressors though the errors follow MA(1), thus the moment condition E(xinnwn ==0 can be plausible. Therefore, allowing for time lags until shock is observed, it would not be necessary to reduce the set of instruments. We showed that the set of instrumental variables in weakly exogenous model is closely connected to the structure 67 of 2 through a couple of examples. However, we can give a different interpretation for serial correlation of the idiosyncratic errors. Suppose that serial correlation is caused by some omitted variables that are uncorrelated with the regressors of all periods (like time constant-errors in the random effects model) and that the error components to which the regressors are weakly exogenous are serially uncorrelated. Then.1mg is valid and 2 is unrestricted. In this case, the errors should be of the form + e“ _ _ S ‘u. _ ¢i'+ 6n - ¢i'+ 6 n! u t = l,---,T, (2.11) where the regressors are strictly exogenous to the serially correlated errors 6% and weakly exogenous to the intertemporally uncorrelated errors ‘3' This distinguishes general weakly exogenous models from dynamic models. In dynamic models, there can not exist the error components to which the regressors are strictly exogenous. A familiar example is the unoberved individual effects ¢H° The correlation between ¢i and the lagged dependent variables are guaranteed in dynamic models, but it is not necessarily the case in general weakly exogenous models. SAW and Hayashi and Sims (1983) pointed out that eliminating serial correlations by forward filtering requires a similar situation. We quote SAW (p. 11), "Forward filtering requires that the serial correlations in the errors do not depend on the values current and lagged 68 values of the instruments." As was noted earlier, SAW showed that the Keane and Runkle estimator, the ZSLS estimator based on forwarded filtered equations, is numerically identical to BSLS if all the instruments Lwi are used. Thus, whenever the Keane and Runkle estimator is inconsistent, so is BSLS. The requirement for vindicating forward filtering noted by SAW l and Hayashi and Sims indeed is needed for currently utilized “ instrumental variables to be valid.‘ Finally, we note on the relationship between the moment conditions for lagged dependent variables and the structure of 2 in the dynamic model considered by Ahn and Schmidt and many others. For simplicity, let xit = Yn4r t = 1, --,T. We start with the assumption that E(eity‘i’m) = 0, t = 1,~--,T, (2.12) which implies that {e“:t=1,---,T} is serially uncorrelated. We do not impose the homoskedasticity restriction of {6"3t=1,- -,T} so that E(ea) s E(ei), t e s. Assumption (2.12) alone implies the set of instruments L g'that are usually used in dynamic models, but, as was noted by Ahn and Schmidt, we need an additional assumption E(eitcp‘.) are the same, t = 1,---,T, (2.13) in order to have the restricted covariance matrix where all of the off-diagonal elements are the same. 
The T-2 69 additional moment conditions suggested by Ahn and Schmidt, E(uitum1 - uimumz) = 0, t = 1, - -~,T-2, (2.14) along with the instruments 1mg, encompass all of the moment conditions implied by conditions (2.12) and (2.13). Thus, the Hausman test (1978) and the GMM test (Hansen, 1982), given the instruments LM}, of testing the validity of the moment conditions (2.14) essentially test whether the condition (2.13) holds. When (2.13) is violated, the covariance matrix still will be restricted as long as {e“:t=1,- -,T) is serially uncorrelated. The off-diagonal part of 2 has %T(T-1) possibly distinctive elements, but these are composed of the T+1 elements E(¢gen), t = 1,. -,T, and 0:, and so there should be the %T(T-3)-1 restrictions. But, these add no useful moment conditions given the instruments Lwi, since Lwi stands for the %(T-1) (T-2) moment conditions from covariance restrictions when condition (2.13) holds, and %(T-1)(T-2) > %T(T-3)-1. 3. ESTIMATION USING THE BMS ASSUMPTION Wooldridge (1993) showed that the usual standard errors based on N3SLS in hedonic pricing models are not consistent, and he derived a condition for the usual 3SLS standard errors to be valid. Ahn (1990) obtained a similar result in 70 dynamic panel data models when certain moment conditions from covariance restrictions are used. These results essentially show that some moment conditions necessarily cause heteroskedasticity in the models considered by Wooldridge (1993) and Ahn (1990). It was suggested by SAW that the equi-correlation assumption of BMS can hold in weakly exogenous models with unobserved effects, and then (T-1)k instruments are added. This section shows that the usual 3SLS standard errors are not consistent if these instruments are used. Also, we apply this result to the dynamic case and link it to the result obtained by Ahn (1990). We assume that the covariance matrix is of the random effects form to avoid the consistency arguments of the previous section. For the usual 3SLS standard errors to be consistent when the instruments Lwi are used, we need the assumption of no heteroskedasticity ASSUMPTION 3.1: E(wi'L'uiui'Lwi) = E(wi'L'ZLwi) . Wooldridge (1993, Example 5.2) showed that Assumption 3.1 is satisfied under the assumptions E(eitlxgt,e$t,1,¢i) = o (3.1a) and E(egtlxfim‘i’bwcpi) = 02 (3.1b) (l that are plausible and used quite often in the rational 71 expectations models. To see this, note that E(wi'L'uiui'Lwi) has elements E(x‘i’t'AuitAuisxgs) = E[x‘i’t'(eit-eit+1)(sis-eis+1)x“?s], t,s = 1,---,T-1, and so the result follows immediately by the law of iterated expectations. The assumption suggested by BMS is ASSUMPTION 3.2: E(x&¢w) is the same, t = 1,-~ ,T, which, given Assumption 2.2, is expressed as E(xtuit - anfi%rn) = O, t = 1, -o,T-1. (3.2) I It is convenient to see the orthogonality conditions implied by this through the moment matrix - I ... . , Xnun xnun E : I . (3.3) h ' . . . ' d xnu“ Xnun Under Assumptions 2.2 and 3.2, all of the elements in the upper triangular of the moment matrix are the same, while Assumption 2.2 only implies that the elements in each row of the upper triangular of (3.3) are the same. Thus, Assumption 3.2 adds the (T-1)k moment conditions that the upper triangular elements of (3.3) are the same across the rows. 
The moment conditions of (3.2) describes these and we have instrumental variables 72 'xn xn iT-1 —xiT _ A notable distinction between hi and Lwi is that hi is not in the space spanned by L and it is useful for identification of the parameters on the time-constant variables. For the usual BSLS standard errors to be consistent when the instruments hi are used, the conditions E(hi'uiui'hi) E(hi'Zhi) (3.4a) and E(wi'L'uiui'hi) E(wi'L'Zhi) (3.4b) should be satisfied. Now, we show that condition (3.4) can not hold under any plausible assumptions for weakly exogenous models with fixed effects. To this end, we will show that the equalities for the first kxk blocks of (3.2a) and (3.2b) do not hold. The first kxk block of E(hgzh1)== E[hi' (ofIT+aieTeT' )hi] is afE(xi'1xi1 + Xi'zxiz) + aiE[(xi1-xi2)'(xi1-xi2)] (3.5a) And for E(hi'uiui'hi) , we have 73 I 2 - I - I I 2 E(xnunx" xi1ui1ui2xi2 xizuizunxn + xiZuiZXiZ) ... 2 I _ I I 2 I “' E[e"xi1xi1 ei16i2(xi1xi2 + xizxn) + eizxizxizl 2 + E[¢i(xi1-xi2) ' (xi1-xi2)] (3-5b) ' .- I U ' + E[2¢i€i1xi1xi1 ¢i(5i1+‘i2) (xi1xi2+xi2xi1) + 2¢i€i2xi2xi23 For the equality between (3.5a) and (3.5b) to hold, three conditions should be met: (i) E[ei1ei2(xi'1xi2 + xi'zxi1)] in the first term of (3.5b) is zero, (ii) E(¢§Axi'1Axi1) == 03E(Axi'1Axi1) for equality of the second terms in (3.5a) and (3.5b), (iii) the last term of (3.5b) is zero. Condition (1) holds under Assumption 3.1. Condition (ii) holds if we are willing to assume E(¢§|Axi1,.~,Axi,_1) = oi, (3.6) which is a strong assumption, but still is plausible along with Assumption 3.2. Condition (iii) is a different matter. From the assumptions in (3.1), the last term of (3.5b) becomes E[¢i (€i1+€i2) (xi'1xi2+xi'2xi1)] = E[¢i€i1(xi'1xi2+xi2xi1) 1' which, however, never becomes zero under any assumptions that are plausible for weakly exogenous models with fixed effects. Next we compare the first kxk block of E(ng'unqrn) 74 and E(wa'Zhi). For E(ng'Zhi),'we have E[wi'L' (OEIT+aieTe;)hi] = a§E(wi'L'hi) = afE(xi'1xi1 + xi'1xi2) , and from the assumption (3.1) , E(wi'L'uiui'h‘.) becomes OEEWGXH + Xi'1xi2) + E(¢i€i1xi'1xi2) r the second term of which is not zero in weakly exogenous cases with fixed effects. Therefore, the usual BSLS standard errors are not consistent if the instruments hi are used. We now apply this result to dynamic models and compare it with the result obtained by Ahn (1990). For simplicity, we focus on a simple AR(1) dynamic model with no exogenous regressors, so xit = Yn4l t = 1,-- ,T. The covariance matrix is assumed to be of the random effects form, and we keep the assumption (3.1) of conditional moment conditions. The BMS assumption, in this case, tells that E(Yit¢i) is the same for t = O,---,T, (3.7) which is implied by the stationarity of {(Yit¢i) :t=0- - - ,T}, the assumption suggested by Arellano and Bover (1990). The moment conditions are E(yituit+1 - yit+1uit+2) = 0' t = 0' ' --,T-2, (3’8) and we have the set of the instruments (waln). Note that all of the elements in the upper triangular of the moment 75 matrix (3.3) are the same, and so the number of moment conditions is %T(T+1)-1. This covers the moment conditions from the random effects covariance restrictions (except for those from the restriction of equal diagonals in 2). Nothing essentially differs from the previous model, and the usual standard errors from 3SLS are not consistent if h.i is L used as instruments. 
Without the stationarity of {y_it φ_i: t = 0,...,T}, there are ½T(T-1) - 1 moment conditions from the equal off-diagonal restriction of Σ, and the T-1 conditions E(y_i,t-1 Δu_it) = 0, t = 1,...,T-1. Together, we have ½T(T+1) - 2 conditions, one fewer than under the stationarity of {y_it φ_i: t = 0,...,T}. These comprise the ½T(T-1) moment conditions E(w_i'L'u_i) = 0 and the T-2 conditions (2.14) suggested by Ahn and Schmidt. The conditions in (2.14) are essentially nonlinear in the parameters. Ahn and Schmidt showed that, given equal diagonal elements of Σ, these additional moment conditions can be represented as conditions that are linear in the parameters,

E(y_it Δu_it - y_i,t+1 Δu_i,t+1) = 0, t = 1,...,T-2.   (3.9)

These are, in fact, linear combinations of the conditions in (2.14), the moment conditions from the restriction of equal diagonals, and E(w_i'L'u_i) = 0. Ahn (1990) showed that the usual 3SLS standard errors are not consistent when the instrumental variables implied by (3.9) are used, which is closely related to the result we obtained: the structure of the instruments from (3.9) is quite similar to that of h_i. The conventional nonlinear 3SLS methods do not generalize to implement the moment conditions from covariance restrictions, so there is no point in comparing GMM and nonlinear 3SLS when the moment conditions (2.14) are used.

4. NEARLY EFFICIENT ESTIMATION

GMM using all of the moment conditions leads to the fully efficient estimator in large samples. In many panel data sets (for example, when the data are constructed from the PSID), there is a trade-off between N and T in applications: as T increases, the size of the cross section N shrinks. Further, as T increases, the number of moment conditions grows on the order of T². Thereby, when T is relatively large (say 6 to 10), situations can arise in which it is not even feasible to use all of the moment conditions, and finding shorter lists of instruments is of practical importance. In this section we propose several reduced lists of instruments that should lead to nearly efficient estimators. However, we cannot say "how near," since the efficiency of the resulting estimators depends on too many factors to be sorted out clearly. The estimators we propose are intended only as possible estimators among many, and they need to be compared with other possible estimators in practice. The arguments apply equally to the estimator proposed by Keane and Runkle (1992).

In many cases some of the regressors in weakly exogenous models are, in fact, strictly exogenous. A leading example is the dynamic panel data model with additional strictly exogenous regressors considered by many authors. Suppose x_it = (x_1it, x_2it) for t = 1,...,T and β = (β_1', β_2')' in model (2.1), where x_1it is weakly exogenous and x_2it is strictly exogenous to the errors. The dimensions of x_1it and x_2it are 1 x k_1 and 1 x k_2, where k = k_1 + k_2. We first consider a simple model in which none of the regressors is correlated with the time-constant error φ_i. We have

ASSUMPTION 4.1: E(x_1it' u_is) = 0, 1 ≤ t ≤ s ≤ T.

ASSUMPTION 4.2: E(x_2i^o ⊗ u_i) = 0.

The set of instrumental variables implied by Assumption 4.1 is w_1i = diag(x_1i1^o, x_1i2^o,...,x_1iT^o). Assumption 4.2 implies the instruments w_2i = I_T ⊗ x_2i^o. Let w_i = (w_1i, w_2i), of column dimension ½T(T+1)k_1 + T²k_2; the sketch below illustrates how quickly this count grows with T. We further assume that there is no conditional heteroskedasticity, so

ASSUMPTION 4.3: E(w_i'u_iu_i'w_i) = E(w_i'Σw_i).
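To make the feasibility concern concrete, here is a small count (my own arithmetic from the column dimensions stated above, with illustrative choices k_1 = 2 and k_2 = 3) of the number of columns in w_i for moderate T; with a cross section of a few hundred units, the full list quickly becomes impractical.

```python
# columns of w_i = (w_1i, w_2i): T(T+1)k1/2 from w_1i plus T^2*k2 from w_2i
def n_instruments(T, k1, k2):
    return T * (T + 1) * k1 // 2 + T * T * k2

for T in (4, 6, 8, 10):
    print(T, n_instruments(T, k1=2, k2=3))   # 68, 150, 264, 410 moment conditions
```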
Under Assumptions 4.1 - 4.3 and if E is diagonal so that if there are no unobserved individual effects and the errors are intertemporally uncorrelated, the reduced set of instruments diag(x",xn,---,xn) generate the fully efficient estimator (Chapter 2). No result exists that finds some 78 reduced set of instruments that lead to the fully efficient estimator when 2 is not diagonal (Ahn and Schmidt, Chapter 2, Chapter 4). Here we allow momentarily that E to be unrestricted, but Assumptions 4.1 - 4.3 hold. These assumptions rules out time constant unobservables correlated with the regressors, but the result obtained under these assumptions will be generalized to more practical models. For the rest of the paper, for any Txp matrix mi, M E (m{,o--,mu)' of dimension NTxp. Thus, the matrix M is the stacked matrix of mi for i = 1, - - - ,N with the i-th block mi. Let n = Inez. Then, W = (W1,W2) , note that W2 is equally represented by X3811. (Chapter 2) . If all the regressors are strictly exogenous, the GLS estimator is fully efficient. Nevertheless, E(x1i'2'1ui) # 0, and 24x1i are not valid instruments. However, 24x2i are still valid. It would be natural to consider the property of the 3SLS estimator using the instruments zi = (w1i,2'1x2i) , the column dimension of which is %T(T+1)k14-15. Estimators are defined as ,8 = [x'wm'nm‘1w'x1'1x'wm'nwr‘w'y and f3= [x'z(z'QZ)"z'X]"x'z(z'QZ)"z'Y We will compare the variances of 8 and 3. Let P(.) denote the projection onto the columns of (-). Then, 79 we have -1 __ -1/2 -1/2 __ -1 W(W'nW) W'X2 — n P(n"zz,,xgo2"2)“ X2 — 9 X2, (4.2a) and Z(z'I22)“z'x2 = IT‘X2 (4.2b) since amuz'nm"z'r11/2r2'1/Z‘x2 = P(01/2W1'n-1/2x2)fl'1/2X2 = n'VZxZ. From (4.2), it follows that [ )(1'W(w'rzw)"w'x1 x1'n'1x2 ]" I [X'W(W'nW)'1W'X]'1 = (4.3a) 4 4 x50 x1 xz'n x2 and [x'2(z'02)“z'x1“ = [ X1'Z(Z'DZ)'1Z'X1 x1'n"x,_ " 0 (403b) -1 -1 x50 x1 xz'n x2 The difference between these two arises from the difference between X1'W(W'flW)'1W'X1 and x1'2(z'r22)"z'x1. We know that E is more efficient than 3. However, the difference in efficiency might not be substantial, since both W and Z include W1 which provides direct information for X1. This result depends on Assumption 4.3 of no conditional heteroskedasticity, and asymptotically the GMM estimator using the instruments Z could be dominated by GMM using simpler instruments (W,X§) in the presence of heteroskedasticity. This is the reason why the instruments 2 is limited to serve as only a choice to be compared with many other constructable set of instruments. If the covariance matrix is of the random effects form 80 and the unobserved effects are not correlated with the regressors, it is better to use (W1,PX2,QX2) than (W1,n"XZ) as instruments whether there is heteroskedasticity or not, since (2")(2 = aPXZ-I-bQX2 and we never worse off by using (PX2,QXZ) instead of aPX2+bQXZ, where P = INGJT‘eTeT' and Q = I“.- P. See Chapter 2 for the result of n'1 = aP+bQ. In this case, (PXZ,QX2) explains x1 better than aPXz-I-bQXZ. We omit complicated algebra comparing the performance of two sets of instrumental variables (W1,PX2,QX2) and (W1,n"x2) I since it is intuitively clear. In the case when the unobserved individual effects are correlated with all of the regressors and the covariance matrix is of the random effects form, we have the instruments ri = L-diag(x‘1’i1,x§’i2, - - - ,x‘1’im) and Loxgi. In this case, the reduced set of instruments (R,QX§)*would produce a nearly efficient estimator, since P(R,QX2)XZ = P(R,x‘2’oL)XZ = QXZ. 
The same algebra compares the variances of estimator from these instruments, and produces a result like (4.3). The above arguments also apply to more general models where only some of the regressors are correlated with the fixed effects. We study the dynamic version of the Hausman and Taylor model considered by Ahn and Schmidt: Yit = ayit-1 + X“:31 + XZitfiZ + z1i71 + zzflz + ¢i "' ‘itl (4'4) for t = 1,...,T, All the regressors but the lagged dependent variable are strictly exogenous to the 81 idiosyncratic errors, and only (xfit,za) is correlated with unobserved fixed effects ¢H' The covariance matrix takes the random effects form. Then, we have the set of instruments I‘ = [R1, (X1,z1oe,)oI,, (Xz,zzoeT)oL], where R1 includes all the instruments between the lagged dependent variables and the disturbances. Note that (Z1,Zt.,)oeT is the time-constant regressors. The reduced lists of instrumental variables F = (R1,QX,PX1,Z1®eT) would produce a nearly efficient estimator, since PU‘) (X1,Z1oe,) = P0,.) (x1,Z1oeT) = (X1,z1oe,) and P(r)(X2,ZZ) = P(F)(x2,zz). Hence, the reasoning is the same as the previous cases. Note that (QX,X1oeT,Z1oeT) is the reduced lists of instruments in the static Hausman and Taylor model, that produces the fully efficient estimator when there is no heteroskedasticity. For more details, see Ahn and Schmidt (1992). 5. CONCLUSION We showed that the moment conditions that are currently utilized in weakly exogenous models may not be valid in some cases if the idiosyncratic errors are serially correlated. Were the serial correlations of the idiosycratic errors detected from an initial stage estimator of 3, testing exogeneity of the instruments would be constructive. Difficulty arises when 2 is entirely unrestricted. Then, identification of 3 becomes a serious problem unless there 82 are a sufficient number of strictly exogenous instrumental variables. However, testing the structure of 2 would lead to some nice specification tests that are not doable in single equation models, which we leave for further study. Though we prove that GMM should be used when we estimate using the BMS assumption, the reason why there necessarily arises heteroskedasticity problem is not so clear. But, the condition derived by Wooldrige (1993) provides a partial answer for this. He essentially shows that weak exogeneity is minimally required for the usual 3SLS standard errors to be consistent. Any instruments wit for the t-th equation that satisfy E(e",eru1,---,enlwk)== 0 do not raise this problem, but the instruments from the BMS assumption and the instruments suggested by Ahn (1990) necessarily relate to disturbances across equations and thereby violate the Wooldridge's condition. CHAPTER 4 INFORMATION PROM COVARIANCE RESTRICTIONS IN PANEL DATA MODELS 1. INTRODUCTION In this chapter, we study the orthogonality conditions from covariance restrictions. The main purpose is to find whether the covariance restrictions are useful for more efficient estimation in several panel data models. We focus on the restrictions from scalar and random effects covariance matrices, but the results can be extended to more general restrictions. Also, we derive the asymptotic variances of generalized method of moment (GMM) estimators that use the moment conditions from covariance restrictions. Covariance restrictions have largely been studied in the context of simultaneous equations models, and identification has been the major concern. 
For efficiency, Rothenberg and Leenders (1964) showed that the exploitation of covariance restrictions lowers the Cramer-Rao bound in standard simultaneous equations models when the errors are normally distributed. Hausman, Newey and Taylor (1987) proposed augmented 3SLS as a handy way to realize the efficiency gains from covariance restrictions.

For panel data models, previous studies have focused on simple dynamic models. No results exist for static models like the Hausman and Taylor (1981) model (HT henceforth) or for general weakly exogenous models. The random effects covariance structure, which has become an almost standard assumption in panel data analysis, implies a set of orthogonality conditions. The instrumental variables for lagged dependent variables considered by Anderson and Hsiao (1982), Holtz-Eakin, Newey and Rosen (1988), Arellano and Bond (1991), and Ahn and Schmidt are based on the orthogonality conditions implied by the random effects covariance matrix.

Covariance restrictions are rarely used in practice, except in dynamic panel data models and triangular simultaneous equations models, where they are crucial for identification. This may be due to a reluctance to utilize a priori restrictions that cause inconsistency of the estimators when they are false. Another important reason would be the computational burden of numerical optimization, which in general is required to realize the efficiency gains from covariance restrictions. But if covariance restrictions bring non-trivial efficiency gains, the computational burden is secondary. Therefore, it would be useful to have an easy way, without computing nonlinear estimators, to assess the possible efficiency gain from adding the moment conditions implied by covariance restrictions. We show how to consistently estimate the asymptotic variances of the nonlinear GMM estimators that incorporate covariance restrictions without numerical optimization. By comparing the two variance estimates, with and without the covariance restrictions, we can see the possible efficiency gain from adding those moment conditions. If the gain is non-trivial, it is worth doing the numerical optimization. Once we have a nonlinear GMM estimator, it is straightforward to apply the Hausman test (Hausman, 1978) or the GMM test (Hansen, 1982) to test whether the covariance restrictions used are valid. Thus, it is not hard to get around the possible inconsistency of estimators based on false restrictions.

In section 2, we study a general model and give some preliminary results used throughout the chapter. The asymptotic variances of GMM estimators that use the orthogonality conditions from covariance restrictions are derived, and it is shown how they can be estimated consistently without numerical optimization. We also provide conditions under which linear GMM estimators using the residuals as instrumental variables are asymptotically identical to the nonlinear GMM estimator. In sections 3 and 4 we study covariance restrictions in specific models and derive conditions under which the moment conditions from covariance restrictions are redundant. Section 3 deals with models where the regressors are strictly exogenous to the time-varying errors. It turns out that certain moment conditions from covariance restrictions are useful unless some third moment conditions - essentially symmetry conditions - on the errors are met. We cover three models: the model with a scalar covariance matrix, the random effects model, and a fixed effects type model where the unobserved individual effect is correlated with the regressors. Section 4 studies weakly exogenous models. We argue that the orthogonality conditions from covariance restrictions can be redundant when the covariance matrix is diagonal, but that whenever the covariance matrix is not diagonal they are essentially always useful. Section 5 concludes.

2. PRELIMINARIES

2.1. Redundancy Conditions for Moment Restrictions

We study the linear panel data model

y_it = x_it β + u_it, t = 1,...,T,   (2.1)

where {(y_it, x_it): i = 1,...,N} is an i.i.d. random sequence. Let y_i = (y_i1,...,y_iT)', x_i = (x_i1',...,x_iT')' and u_i = (u_i1,...,u_iT)', of dimensions T x 1, T x k and T x 1, respectively, so that (2.1) is equally expressed as y_i = x_i β + u_i. Let Σ ≡ E(u_i u_i'), a T x T nonsingular matrix. We assume that the fourth moments of (y_it, x_it) exist. Throughout the chapter, for any T x p matrix m_i, M ≡ (m_1',...,m_N')' of dimension NT x p, where N is the number of cross-section observations. There is a set of T x h observable instrumental variables w_i that satisfy

ASSUMPTION 2.1: E(w_i'u_i) = 0.

ASSUMPTION 2.2: E(w_i'x_i) has full column rank and E(w_i'w_i) is positive definite.

Assumption 2.2 is a regularity condition that ensures identification, and it is assumed for the rest of the paper without being stated further. Throughout the paper, E[g_1i(β)] = 0 denotes an initial set of moment conditions and E[g_2i(β)] = 0 denotes the additional moment conditions from covariance restrictions. Our major concern is whether the additional moment conditions E[g_2i(β)] = 0 are redundant given the conditions E[g_1i(β)] = 0. Let g_i(β) = [g_1i(β)', g_2i(β)']'. If we use the orthogonality conditions E[g_i(β)] = 0, GMM solves the problem

min_β Q_N(β) = [N^(-1) Σ_i g_i(β)]' A_N^(-1) [N^(-1) Σ_i g_i(β)].

It is well known that the best choice of weighting matrix is A_N^(-1), where A_N is a consistent estimator of A ≡ E[g_i(β)g_i(β)'], so we take A_N = N^(-1) Σ_i g_i(β~)g_i(β~)', where β~ is a preliminary consistent estimator of β (e.g., Hansen, 1982). Then

Avar √N(β^_GMM - β) = [D'A^(-1)D]^(-1),   (2.2)

where D ≡ E[∂g_i(β)/∂β']. Let A_jl = E[g_ji(β)g_li(β)'], j,l = 1,2, and D = (D_1', D_2')', where D_j ≡ E[∂g_ji(β)/∂β'], j = 1,2. Then the asymptotic variance of the GMM estimator that uses only E[g_1i(β)] = 0 is

Avar √N(β^_1 - β) = [D_1'A_11^(-1)D_1]^(-1).   (2.3)

From (2.2) and (2.3) it is seen that interest centers on the difference between D'A^(-1)D and D_1'A_11^(-1)D_1. The former is no smaller than the latter, since GMM never becomes worse asymptotically when orthogonality conditions are added. Thus, the information from covariance restrictions is useful unless D'A^(-1)D = D_1'A_11^(-1)D_1. Schmidt (1991) shows that this equality holds if and only if

D_2 = A_21 A_11^(-1) D_1.   (2.4)

We will use this condition at several points in the remainder of the chapter.

2.2. Scalar Covariance and the Asymptotic Variance of GMM

We begin by considering the moment conditions from the scalar covariance matrix.

ASSUMPTION 2.3: Σ = σ² I_T.

Assumption 2.3 says that the off-diagonal elements of Σ are zero and that its diagonal elements are equal. The number of off-diagonal elements in Σ is T(T-1), but by symmetry the upper triangle of Σ duplicates the lower triangle. Thus, the condition of zero off-diagonals implies the ½T(T-1) orthogonality conditions

E(u_is u_it) = 0, s > t = 1,...,T-1.   (2.5)

The moment conditions (2.5) can be expressed as E(b_1i'u_i) = 0 or E(b_2i'u_i) = 0, where b_1i and b_2i have one column for each pair (t,s) with s > t: the column of b_1i indexed by (t,s) has u_is in row t and zeros elsewhere, so that row t of b_1i contains the block (u_i,t+1,...,u_iT) and its last row is zero; the column of b_2i indexed by (t,s) has u_it in row s and zeros elsewhere, so that row s of b_2i contains the block (u_i1,...,u_i,s-1) and its first row is zero. In either case the column indexed by (t,s) of b_1i'u_i and of b_2i'u_i equals the product u_is u_it. The dimension of both b_1i and b_2i is T x ½T(T-1). (These matrices are constructed explicitly in the sketch following (2.7c).)

The condition of equal diagonals implies the T-1 moment conditions

E(u_it² - u_i,t+1²) = 0, t = 1,...,T-1.   (2.6)

We express (2.6) as E(c_i'u_i) = 0, where c_i is the T x (T-1) matrix whose t-th column has u_it in row t and -u_i,t+1 in row t+1:

c_i = [  u_i1                         ]
      [ -u_i2    u_i2                 ]
      [         -u_i3     .           ]
      [                    .  u_i,T-1 ]
      [                      -u_iT    ]

The moment conditions in (2.5) and (2.6) contain different information and will be considered separately. We derive the asymptotic variance of the GMM estimator expressed in terms of x_i, w_i, b_1i, b_2i and c_i. First,

D_1 = -E(w_i'x_i).   (2.7a)

For the moment conditions in (2.5), or E(b_1i'u_i) = 0, the elements of D_2 are -E(u_is x_it + u_it x_is), s > t = 1,...,T-1. Thus, D_2 can be written as

D_2 = -E[(b_1i + b_2i)'x_i].   (2.7b)

For the moment conditions E(c_i'u_i) = 0, the elements of D_2 are -2E(u_it x_it - u_i,t+1 x_i,t+1), t = 1,...,T-1, so

D_2 = -2E(c_i'x_i).   (2.7c)
0 0 — u” o 0 ha = um um ' ' “n “n “n4 ‘ The dimension of both b1i and b2i is TX%T(T-1). The condition of equal diagonals implies the T-1 moment conditions 90 mu?t - ufim) = o, t = l,---,T-l. (2.6) We express (2.6) as E(ci'ui) = 0, where P ui1 '1 "uiz uiZ ci = -ui3 . uiT-1 _ _u _ W The moment conditions in (2.5) and (2.6) contain different information, and will be considered separately. We derive the asymptotic variance of the GMM estimator expressed in terms of xi, wi, b“, has and ci. First, D1 = -E(wi'xi). (2.7a) For the moment conditions in (2.5), or E(b1'iui) = 0, the elements of D2 are -E(uisxit + uitxis), s > t = 1,---,T-1. Thus, it follows that D2 can be written as D2 = -E[(b1i+b2i) 'xi]. (2.7b) For the moment conditions E(ci'ui) = 0, we have the elements of D2 '2E(uitxit - uit+1xit+1) ' t = 1' ' ' ' 'T-l' so D2 = —2E(ci'xi). (2.7c) 91 Consider first the estimator that uses the moment conditions E(wi'ui) = 0 and E(b1'iui) = 0. Then from (2.2), Aver/M5“, - fl) = I I I I 1 I w. uiuiw, wiuiuib1i ] E wixi ] (b1i+b2i) 'xi (2.8a) -1 E[xi'wi ,xi' (b1i+b2i) ]E[ I . ' I b1iuiui Wi bi1uiui bu Similarly for the GMM estimator that uses the moment conditions E(wi'ui) = 0 and E(ci'ui) = 0, we have Aver/1W;m2 - fl) = wi' uiui' wi wi' uiui' ci '1 wi' xi '1 E[xi'wi,2xi'ci]E[ ] E[ ] (2.8b) ci'uiui'w‘. ci'uiui'ci 2ci'xi The equations (2.8a) and (2.8b) are useful in practice. They are consistently estimated using residuals G“ in place of disturbances u”, t = 1,---,T, where G" is based on a consistent estimator of B from the initial instruments WI° For a proof of consistency, See White (1984, pp. 135-138). Define b”, b2i and Si to be b", b2i and ci after replacing Git for uit for t = 1,---,T, and let w: = (wwbfi). Then, the ratio between the corresponding diagonal elements of the two estimators (standard errors) of the asymptotic variances N A A [X'W(.§:1wi'u‘.ui'wi)"W'X]'1 1- and A A N * A A * _1 A A -1 [X' (W,B1+B2) (121 wi 'uiui'w.) (B1+B2,W) 'X] will provide guidance about whether it is worth trying to 92 use the moment conditions from zero off-diagonal covariance restrictions through numerical optimization. Similar arguments apply for the equal diagonal restrictions. Were they available, b“, b2i and ci could serve themselves as instrumental variables. In the equation (2.8a), it is not hard to see that if E(bfixg) = 0 the asymptotic variance of the nonlinear GMM estimator becomes the asymptotic variance of the linear GMM estimator using b1i as instruments. Similarly, if E(bfixg) = 0, the linear GMM estimator using b2i as instrumental variables is asymptotically identical to the nonlinear GMM estimator. Also equation (2.8b) shows that when E(c{xg) = 0, the nonlinear GMM estimator is the same asymptotically as the linear GMM estimator using'cg as instrumental variables. It is interesting to ask what will happen if we use b", 8a and 8, as instruments instead of b", b2i and ci. As shortly will be shown, there is an interesting correspondence. If E(ngg) = 0, there asymptotically is no difference between using b1i and b" as instruments. Similarly, if E(bfixg)== 0 and E(ci'xi) = 0, we lose nothing by doing linear GMM using instruments BZi and Si, respectively. We now verify these assertations. If we use b", bfi or A cg as instruments the resulting estimators are consistent since I N A plimfiixl uisuit = E(u.suit), s,t = 1, - --,T. I 93 For the limiting distributions of the resulting estimators not to be affected by replacing b1i by b”, the two random N N variables X G u. and 2 u u. 
for s > t, should have fii-l is It 71’31-1 is It' the same limiting distribution. But, 6,8 = uis - xis(§ - fl) and N N N A 26.11. = fun. ~1zu.x./N(p-p). ‘7 afii-l Is It film-1 Is It N1_1 It 18 Because JN(§ - B) = Op(l), the limiting distributions of the is two GMM estimators using 5" and using b1i are the same provided N plimgiizlu = E(u.txis) = 0, s > t, itxis . so when E(bz'ixi) = 0. Similarly, if E(b1'ixi) = 0, replacing b2i by SZI do not affect the limiting distribution of the estimators. For 8,, since N A A #121 (uituit ' uit+1uit+1) N 2 2 l N A = #121 (uit ‘ uit+1) ' N121 (uitxit ' uit+1xit+1)’/N(fi " 3) I if E(unx“) = 0, t = 1, °-HP. t+1, t = 1,---,T-2. (2.13a) And the second set is E[(uit-um1)ui1] = o, t = 2,---,T-l. (2.13b) The moment conditions (2.13) are equally expressed as either E(h1'iui) = 0 or E(hz'iui) = 0, where h1i and h2i are ' “a u“ “n 0 ‘ "um ”um um ”um u" um 'uM 7“” “um um h1i = “W 'un -ufl “n _ O ..ui1 .4 and 97 ' 0 Ann Ann "' Aun4 1 0 0 0 0 Au n ha 3 I Ann Ann L ' . . . J Aui1 Ann-2 0 0 where Auit a u.-«1 The left and right blocks of h1i and it it+1' h2i correspond to the moment conditions in (2.13a) and (2.13b), respectively. To derive the asymptotic variance of the nonlinear GMM estimator, we follow the same path as we did in the last sub-section. Since au.Au. —6mfiTLL = -(uisAxit + xisAuit) I it follows that ahlu. “EfiIJ'= ’(hn+ha)'xw and we have 02 = -E[(h1i+h2i) 'xi]. The first derivatives of the moment conditions from the random effects covariance matrix is quite similar to that from the scalar covariance, which is the case because the moment conditions from the random effects covariance are some linear combination of those from the scalar covariance. Thus, the results we obtained for the scalar covariance restriction equally applies to the random effects covariance 98 restriction. If we use the orthogonality conditions E(w;ufi = 0 and E(h1'iui) = 0, then Mar/M56... - fl) = I I I I '1 I w.unnvq wiunntni ] E wixi ] I (h1i+h2i) 'xi (2.14) 4 [E[xi'wi,xi' (h1i+hi2) ]E[ I I I I 1H9%uiwi 1%fi%uihn The linear GMM estimator using the instrumentals h” (hfi) is asymptotically identical to the nonlinear GMM if E(hfixk) = 0 (E(h1'ixi) = 0). Applying the redundancy condition (2.4), the moment conditions from the restriction of equal off-diagonals are redundant iff E[(h1i+h2i) 'xi] = E(h1'iuiui'wi) [E(wi'uiui'wi) ]'1E(wi'xi) , (2.15) which is an analogy of the condition (2.10a) 3. STRICTLY EXOGENOUS MODELS In this section, we find the conditions when the moment conditions from covariance restrictions are redundant in the models where the regressors are strictly exogenous to the time-varying errors. We study the scalar and the random effects covariance matrices. Before considering redundancy, we present a theorem that provides intuition for our later discussion. 99 3.1. genera; Resuits on Nonredundemcy umder Ideal Conditions In the model yi = xiii + ui, we assume: ASSUMPTION 3.1: E(uilxi) = o, ASSUMPTION 3.2: E(uiui'lxi) = 07-1,. Assumptions 3.1 and 3.2 are "ideal" conditions. OLS is BLUE under these assumptions (along with nonsingularity of X'X matrix). Chamberlain (1987) showed that, ignoring the moment conditions from (3.1b) below, if all the instrumental variables wi that include xi satisfy E(uilwi) = OI (3.13) E(uiui' IWI) = 021,, (3.1b) the optimal set of instruments is E(xilwi) = xi, and OLS is the most efficient. 
Condition (3.1) is stronger than Assumptions 3.1 and 3.2, and Chamberlain's result allows that there would be nonredundant instrumental variables other than xi under Assumptions 3.1 and 3.2. We write it down more explicitly. THEOREM 3.1: In model (2.1) under Assumptions 3.1 - 3.2, suppose there are instrumental variables afi'of dimension qu such that (i) E(agui) = o and (ii) E(ai'uiui'xi) e 02E(ai'xi). Then GMM using the instruments (xifim) is more efficient than OLS. 100 PROOF: It is sufficient to show that a.i is not redundant. From (2.5), given the initial instruments in ai are redundant iff E(ai'xi) = E(ai'uiui'xi) [E(xi'uiui'xi) ]'1E(xi'xi) , (3.2) which holds iff E(ai'uiui'xi) = 02E(ai'xi). I Theorem 3.1 holds even when the errors are normally distributed and when the regressors are independent of the errors, but applies only to large samples. Generally GMM should be used to realize the efficency gain from the additional instruments ai. The idea underlying Theorem 3.1 is suggested in Cragg (1983) and Chamberlain (1982). They showed that there can exist nonredundant instrumental variables in addition to the regressors in the presense of conditional heteroskedasticity of unknown form, even when all the regressors are valid instruments. Cragg's estimator is a GMM estimator with more instrumental variables that are correlated with the conditional error covariance. The efficiency gain in Chamberlain's optimal minimum distance estimator has a similar interpretation. Note that Theorem 3.1 shows that, to be useful, the additional instrumental variables ai do not have to be correlated with the regressors xi. Instruments that are uncorrelated with the regressors appear frequently in the models we study subsequently. 101 3.2. 55:19:12 Exogemoms Model; Scalar Covariance Assumption 3.1 is stronger than needed. The weakest assumption with strictly exogenous regressors is ASSUMPTION 3.3: E(wi'ui) = 0, where wi = ITsx‘i’ and x“? = new - - - .xin- The choice of instruments in Assumption 3.3 simply means that xit is uncorrelated with u“, all t,s = 1, --,T. In this section, we find when the moment conditions from the scalar covariance of Assumption 2.3 are redundant, given the initial instruments ITox‘i’. Under Assumption 3.3, E(b1'ixi) = E(bz'ixi) = E(ci'xi) = 0, so from (2.9) the linear GMM estimators using either 8” or $2, and (ii as instruments has the same limiting distribution as the nonlinear GMM. Thus, we treat b”, b2i and ci as being available. Applying the condition (2.10a) to see if b1i is redundant or not, given the initial instruments Imnfi, we get E(b1'.u.ui'(I,ox$)][E(uiui'oxg'x‘i’)1"E[(1Tox‘;) 'xi] = o. (3.3) I I If there exists no conditional heteroskedasticity, thus if the assumption ASSUMPTION 3.4: E(wi'uiui'wi) = E(wi'Zwi) holds, then combined with the scalar covariance assumption 2 102 = 01H” the weighting matrix which is in the middle of the LHS of the equation (3.3) becomes aZIroE(x“?'x‘i’) . Thus, OLS is efficient and only k instruments xi are useful among Tzk instruments I ox? since P o X = X. Otherwise all the T I (X 81,) instruments Ignfi are useful (Chapter 2). Under Assumption 3.4, the equation (3.3) becomes E(b1'iuiui'xi) = 0. (3.4a) This is no more than the redundancy condition of b1i on the initial instruments xi. E(b1'iuiui'xi) contains the elements T 21 E(uisuituifx") , s > t = 1, - - - ,T-l. T: A sufficient condition for (3.4a) is E(uisufitxit) = E(uisufitxis) = E(uisuituifxifl = 0, s s t s 1’. 
3.2. Strictly Exogenous Model: Scalar Covariance

Assumption 3.1 is stronger than needed. The weakest assumption with strictly exogenous regressors is

ASSUMPTION 3.3: E(w_i'u_i) = 0, where w_i = I_T⊗x_i° and x_i° = (x_i1,···,x_iT).

The choice of instruments in Assumption 3.3 simply means that x_it is uncorrelated with u_is for all t,s = 1,···,T. In this section, we find when the moment conditions from the scalar covariance of Assumption 2.3 are redundant, given the initial instruments I_T⊗x_i°. Under Assumption 3.3, E(b_1i'x_i) = E(b_2i'x_i) = E(c_i'x_i) = 0, so from (2.9) the linear GMM estimators using either b̂_1i or b̂_2i, together with ĉ_i, as instruments have the same limiting distribution as the nonlinear GMM. Thus, we treat b_1i, b_2i and c_i as being available.

Applying condition (2.10a) to see whether b_1i is redundant, given the initial instruments I_T⊗x_i°, we get

E[b_1i'u_iu_i'(I_T⊗x_i°)][E(u_iu_i'⊗x_i°'x_i°)]^{-1}E[(I_T⊗x_i°)'x_i] = 0.   (3.3)

If there is no conditional heteroskedasticity, that is, if the assumption

ASSUMPTION 3.4: E(w_i'u_iu_i'w_i) = E(w_i'Σw_i)

holds, then, combined with the scalar covariance assumption Σ = σ²I_T, the weighting matrix in the middle of the LHS of equation (3.3) becomes σ²I_T⊗E(x_i°'x_i°). Thus OLS is efficient, and among the T²k instruments I_T⊗x_i° only the k instruments x_i are useful, since P_{(X°⊗I_T)}X = X. Otherwise all the instruments I_T⊗x_i° are useful (Chapter 2). Under Assumption 3.4, equation (3.3) becomes

E(b_1i'u_iu_i'x_i) = 0.   (3.4a)

This is no more than the redundancy condition of b_1i on the initial instruments x_i. E(b_1i'u_iu_i'x_i) contains the elements

Σ_{τ=1}^T E(u_is u_it u_iτ x_iτ), s > t = 1,···,T−1.

A sufficient condition for (3.4a) is

E(u_is u_it² x_it) = E(u_is² u_it x_is) = E(u_is u_it u_iτ x_iτ) = 0, s ≠ t ≠ τ.   (3.4b)

There are other situations in which (3.4a) holds, but they are not very intuitive. If Assumption 3.4 is violated, condition (3.3) holds if

E[b_1i'u_iu_i'(I_T⊗x_i°)] = 0.   (3.5a)

Condition (3.5a) is stronger than (3.3), but it would be very unusual for (3.3) to hold without (3.5a). Because E[b_1i'u_iu_i'(I_T⊗x_i°)] has elements E(u_is u_it u_iτ x_i°), s > t = 1,···,T−1 and τ = 1,···,T, condition (3.3) holds if

E(u_is² u_it x_i°) = 0 and E(u_is u_it u_iτ x_i°) = 0, s ≠ t ≠ τ.   (3.5b)

Condition (3.5b) is stronger than (3.4b), but as long as the strict exogeneity assumption holds the two conditions are quite similar. A sufficient condition that ensures (3.5b) is E(u_it u_is|x_i°, u_iτ) = E(u_it u_is), τ ≠ s,t (including s = t), which is met if the errors are independent over time. This condition rules out, in particular, ARCH-type behavior in the panel data context.

A constructive way to understand what condition (3.5b) represents is that, for b_1i to be redundant, there should be no conditional heteroskedasticity when b_1i is used as an instrument. Otherwise, though uncorrelated with the regressors, b_1i becomes useful by explaining the second moments of the errors. Thus, the reason why b_1i can be useful is the same as the reason why the additional instrumental variables a_i of the previous subsection can be useful even when they are uncorrelated with the regressors.

We follow the same procedure to find the redundancy condition for c_i. From (2.10b), when Assumption 3.4 holds c_i is redundant iff E(c_i'u_iu_i'x_i) = 0, or equivalently

Σ_{τ=1}^T E[(u_it² − u_{it+1}²)u_iτ x_iτ] = 0, t = 1,···,T−1.   (3.6a)

Given the condition that b_1i is redundant, this becomes

E(u_it³ x_it − u_{it+1}³ x_{it+1}) = 0, t = 1,···,T−1.   (3.6b)

A sufficient condition for (3.6b) is

E(u_it³ x_it) = 0, t = 1,···,T,   (3.6c)

which demands, at a minimum, symmetry of the error distribution. If Assumption 3.4 does not hold, c_i is redundant if

E[c_i'u_iu_i'(I_T⊗x_i°)] = 0,   (3.7a)

or

E[(u_it² − u_{it+1}²)u_iτ x_i°] = 0, t = 1,···,T−1, τ = 1,···,T.   (3.7b)

Given (3.5a), condition (3.7b) becomes

E(u_it³ x_i°) = 0, t = 1,···,T−1.   (3.7c)

It is interesting to note that conditions (3.5) and (3.7) are usually assumed in the literature on covariance restrictions in simultaneous equations models, namely:

ASSUMPTION 3.5: E[(u_iu_i'⊗u_i)w_i] = 0.

Examples are Hausman, Newey and Taylor (1987) and Arellano (1989). In particular, the standard errors from the augmented 3SLS estimator proposed by Hausman, Newey and Taylor are not consistent when Assumption 3.5 is violated (Section 4 of their paper). To show why Assumption 3.5 ensures conditions (3.5) and (3.7), we define selection matrices S_j of dimension [½T(T+1)−1]×T² such that

Σ = σ²I_T  ⟺  S_j vec(Σ) = 0

(Magnus and Neudecker, 1980). Thus, the matrix S_j selects elements of vec(Σ). In our model, S_j[vec(u_iu_i')] = (b_ji, c_i)'u_i, for j = 1,2. Since vec(u_iu_i') = (I_T⊗u_i)u_i, conditions (3.5a) and (3.7a) are equally expressed as

S_1E[(I_T⊗u_i)u_iu_i'(I_T⊗x_i°)] = S_1E[(u_iu_i'⊗u_i)(I_T⊗x_i°)] = 0

from Assumption 3.5. Thus, Assumption 3.5 is sufficient to ensure that the moment conditions from covariance restrictions are redundant when the regressors are strictly exogenous.

Allowing for individual effects is a primary reason why people use panel data models, and the model studied in this section rarely appears in panel data applications. We now turn to more widely applicable models.
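The role of the selection matrices S_j can be made concrete with a small sketch (my own illustration; the construction below is one way to build such a matrix for T = 3, not necessarily the exact matrix used in the thesis): the rows pick the off-diagonal elements of vec(Σ) and the differences of adjacent diagonal elements, so that S vec(Σ) = 0 exactly when Σ = σ²I_3 for a symmetric Σ.

```python
import numpy as np

# One way to build a selection matrix S with S @ vec(Sigma) = 0 iff Sigma = sigma^2 * I_3
# for symmetric Sigma; its dimension is (T(T+1)/2 - 1) x T^2 = 5 x 9 when T = 3.
T = 3
rows = []

# off-diagonal elements (upper triangle is enough, by symmetry): Sigma[s, t] = 0 for s < t
for s in range(T):
    for t in range(s + 1, T):
        r = np.zeros(T * T)
        r[t * T + s] = 1.0          # picks Sigma[s, t] in column-major vec ordering
        rows.append(r)

# equal diagonals: Sigma[t, t] - Sigma[t+1, t+1] = 0
for t in range(T - 1):
    r = np.zeros(T * T)
    r[t * T + t] = 1.0
    r[(t + 1) * T + (t + 1)] = -1.0
    rows.append(r)

S = np.array(rows)
vec = lambda A: A.flatten(order="F")                  # column-major vec()

print(S.shape)                                        # (5, 9)
print(np.allclose(S @ vec(2.5 * np.eye(T)), 0))       # True: scalar covariance passes
Sigma_re = 2.5 * np.eye(T) + 1.0                      # random effects covariance: fails
print(S @ vec(Sigma_re))                              # nonzero entries for the off-diagonals
```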
3.3. Strictly Exogenous Model: Random Effects Covariance

In this section, we consider the covariance restrictions in the popular random effects model, in which u_it = φ_i + ε_it. Thus the errors are composites of the time-constant φ_i and the time-varying ε_it, and all the regressors are exogenous with respect to φ_i as well as to ε_it, t = 1,···,T. We have the initial instruments I_T⊗x_i°, and the set of moment conditions from the random effects covariance is a subset of the moment conditions from the scalar covariance. Thus, the moment conditions from the random effects covariance matrix are redundant as long as higher conditional moment conditions on the errors, such as Assumption 3.5, are satisfied.

We focus on the higher moment conditions involving the time-constant error φ_i, which usually is thought of as arising from omitted unobserved variables that are invariant over the time periods in question. The additional moment conditions are E(h_1i'u_i) = 0 or E(h_2i'u_i) = 0, and E(c_i'u_i) = 0. Since E(h_1i'x_i) = E(h_2i'x_i) = E(c_i'x_i) = 0, we handle h_1i, h_2i and c_i as if they were available. From condition (2.15), h_1i is redundant if

E[h_1i'u_iu_i'(I_T⊗x_i°)] = 0,   (3.9a)

which is equally stated as

(i) E(u_is Δu_it u_iτ x_i°) = 0, s − 1 > t = 1,···,T−2, and (ii) E(u_i1 Δu_it u_iτ x_i°) = 0, t = 2,···,T−1, for τ = 1,···,T.   (3.9b)

The first and second conditions in (3.9b) correspond to the left and right blocks of h_1i. Because the moment conditions are all of the same form, not much is lost by examining only the (1,1) element of E[h_1i'u_iu_i'(I_T⊗x_i°)], that is, the condition E(u_i3 Δu_i1 u_i1 x_i°) = 0. Then we have

E{[φ_i²(ε_i1 − ε_i2) + φ_i(ε_i1² − ε_i1ε_i2 + ε_i1ε_i3 − ε_i2ε_i3) + (ε_i1²ε_i3 − ε_i1ε_i2ε_i3)]x_i°} = 0.   (3.9c)

Condition (3.9c) is met if (i) E(ε_i|x_i, φ_i) = 0 and (ii) E(ε_itε_is|x_i, φ_i, ε_iτ) = E(ε_itε_is), s,t ≠ τ (including s = t).

For the moment conditions from equal diagonals, we apply condition (3.7b), that E[(u_it² − u_{it+1}²)u_iτ x_i°] = 0, t = 1,···,T−1, τ = 1,···,T. Consider the case t = τ = 1; then the condition becomes

E{[2(ε_i1 − ε_i2)φ_i² + (3ε_i1² − ε_i2² − 2ε_i1ε_i2)φ_i + ε_i1³ − ε_i1ε_i2²]x_i°} = 0.   (3.10)

Equality in this equation holds if we add the condition E(ε_i1³x_i°) = 0 to (3.9). One notable point is that condition (3.7c), that E(u_it³x_i°) = 0, does not necessarily apply in this case. Note that in (3.10), φ_i³ is differenced away.

3.4. Strictly Exogenous Model: Fixed Effects Type

We now allow for arbitrary correlations between the regressors and the time-constant error φ_i. The usual assumption is

ASSUMPTION 3.6: E(x_i⊗ε_i) = 0.

Note that we are still working with the random effects covariance matrix. Assumption 3.6 implies a set of instrumental variables L⊗x_i°, where L is the T×(T−1) differencing operator (Chapter 2; Ahn and Schmidt). We assume that there are no time-constant variables in x_it, to ensure that β and Σ are identified. Ahn and Schmidt showed that the moment conditions from the random effects covariance restrictions are redundant under Assumption 3.6. Their reasoning is that the moment conditions from the random effects covariance restrictions add information only through the regression corresponding to the instruments that lie in the space spanned by L, where the regression already attains the GLS efficiency. Their finding is plausible under a certain set of assumptions, which we now detail.

E(h_2i'x_i) = 0 under Assumption 3.6, so the linear GMM using ĥ_1i as instruments is asymptotically identical to the nonlinear GMM, but using the instruments ĥ_2i would in general lead to a less efficient estimator than the nonlinear GMM, since E(h_1i'x_i) ≠ 0. Thus, h_1i will be considered as being available. Note that h_1i is in the space spanned by L, in the sense that Q_T h_1i = h_1i, where Q_T = L(L'L)^{-1}L'.
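A quick numerical sketch (my own, assuming L is the usual T×(T−1) first-difference operator with columns e_t − e_{t+1}) confirms the two facts used here: Q_T = L(L'L)^{-1}L' is just the within (demeaning) projection, and Q_T h_1i = h_1i because each column of h_1i sums to zero.

```python
import numpy as np

T = 3
# assume L is the usual T x (T-1) differencing operator with columns e_t - e_{t+1}
L = np.zeros((T, T - 1))
for t in range(T - 1):
    L[t, t], L[t + 1, t] = 1.0, -1.0

Q_T = L @ np.linalg.inv(L.T @ L) @ L.T          # projection onto the column space of L
within = np.eye(T) - np.ones((T, T)) / T        # demeaning (within) transformation
print(np.allclose(Q_T, within))                 # True

# columns of h_1i sum to zero, e.g. the T = 3 instruments (u_i3, -u_i3, 0)' and (0, u_i1, -u_i1)'
u = np.array([0.7, -1.2, 0.4])                  # any error vector
h1 = np.column_stack([[u[2], -u[2], 0.0], [0.0, u[0], -u[0]]])
print(np.allclose(Q_T @ h1, h1))                # True: h_1i lies in the space spanned by L
```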
If Assumption 3.4 of no conditional heteroskedasticity is met, so that

E[(L⊗x_i°)'u_iu_i'(L⊗x_i°)] = σ_ε²L'L⊗E(x_i°'x_i°),

the only relevant instrumental variables are Q_T x_i, and OLS on the demeaned equations is efficient. It does not matter whether we use Q_T x_i or L⊗x_i° as the initial instruments, since P_{(X°⊗L)}X = QX, where Q = I_N⊗Q_T. Applying the redundancy condition (2.15), we have

σ_ε²E(h_1i'x_i) = E(h_1i'u_iu_i'Q_T x_i).   (3.11)

To simplify this condition, consider the example T = 3. Then h_1i^a = (u_i3, −u_i3, 0)' and h_1i^b = (0, u_i1, −u_i1)', which are pretty much of the same sort, in the sense that one is redundant if the other is. We consider h_1i^a only. Since

E(h_1i^{a'}x_i) = E[φ_i(x_i1 − x_i2)]

and

E(h_1i^{a'}u_iu_i'Q_T x_i) = E[(ε_i3 + φ_i)(ε_i1 − ε_i2) Σ_{τ=1}^3 ε_iτ(x_iτ − x̄_i)],

the equality in (3.11) holds under

ASSUMPTION 3.7: E(ε_itε_is|x_i, φ_i, ε_iτ) = E(ε_itε_is), s,t ≠ τ.

Note that Assumption 3.7 includes the case t = s. For the restriction of equal diagonals of Σ, we have the orthogonality conditions E(c_i'u_i) = 0, and the redundancy condition becomes

2σ_ε²E(c_i'x_i) = E(c_i'u_iu_i'Q_T x_i).   (3.12)

Consider the simplest case, T = 2. Then the LHS of (3.12) becomes 2σ_ε²E[φ_i(x_i1 − x_i2)], and for the RHS of (3.12) we have

E[(ε_i1² − ε_i2²) Σ_{τ=1}^2 ε_iτ(x_iτ − x̄_i)] + E[2φ_i(ε_i1 − ε_i2) Σ_{τ=1}^2 ε_iτ(x_iτ − x̄_i)].

For the two sides to be equal, it generally requires

ASSUMPTION 3.8: E(ε_it³|x_i) = 0, t = 1,···,T,

as well as Assumption 3.7. Recall that Assumptions 3.7 and 3.8 are quite similar to the conditions we derived for the random effects model. They are also quite similar to the redundancy conditions in equations (3.5) and (3.7) for the moment conditions from the scalar covariance matrix, if we regard φ_i as a regressor. Allowing for correlation between the individual effects and the regressors does not appreciably alter the redundancy condition for the moment conditions from the random effects covariance restrictions. The intuition provided by Ahn and Schmidt (1992) is plausible, but it is interesting to note that there are estimators more efficient than GLS when certain conditional third moment conditions on the errors are violated.

Throughout this section we have studied redundancy conditions for covariance restrictions in models where the GLS efficiency is attained. In what follows, we turn to models where the regressors are only weakly exogenous with respect to the errors.

4. WEAKLY EXOGENOUS MODELS

As we noted in Chapter 3, dynamic models and rational expectations models are typical weakly exogenous models. There is a growing concern that many regressors in standard panel data models may be only weakly exogenous with respect to the time-varying errors. We will not work on dynamic models explicitly, since there is a large body of previous work, much of which studies covariance restrictions; for listings of references, see Chapter 3. We study the models under assumptions that are usual in rational expectations models, but our results apply also to standard models in which some of the regressors are weakly exogenous, and to general dynamic models. In general, there is no point in arguing whether covariance restrictions are useful in dynamic models, because the covariance restrictions coincide with the instruments for the lagged dependent regressor. But there is at least one model (probably the only model) that draws our interest, which we study first.
4.1. Weakly Exogenous Model: Diagonal Covariance

We first study the weakly exogenous panel data model with sequential conditional moment restrictions of the type in Chamberlain (1992), but with no individual effects. The model is a typical rational expectations model as it appears in panel data applications. The diagonal covariance matrix rarely appears in standard panel data models. Nevertheless, it frequently is assumed in rational expectations models, as the hypothesis itself implies. Further, many tests have failed to reject the null of no individual effects (Keane and Runkle, 1990; 1992; Runkle, 1991). Other important models that can have a diagonal covariance matrix are dynamic models. Most of the previous studies on dynamic models concern the random effects covariance. However, it has been observed that allowing for rich dynamics diminishes the importance of individual effects (e.g., Holtz-Eakin, 1988).

We continue to consider model (2.1), and assume

ASSUMPTION 4.1: E(u_it|x_it°, u_{i,t−1}°) = 0, t = 1,···,T,

where x_it° = (x_i1,···,x_it) and u_it° = (u_i1,···,u_it). Assumption 4.1 implies many instrumental variables, and utilizing every moment condition is not feasible. We restrict our attention to the second moment conditions of (x_i, u_i); thus we have

E(u_it x_it°) = 0, t = 1,···,T,   (4.1)

and

E(u_it u_is) = 0, s ≠ t.   (4.2)

(4.1) implies the set of instruments that appears in the panel data literature (e.g., Schmidt, Ahn and Wyhowski, 1992), and (4.2) is the covariance restriction that the off-diagonals of Σ are zero. It is usual to assume

ASSUMPTION 4.2: E(u_it²|x_it°, u_{i,t−1}°) = σ_t², t = 1,···,T.

This excludes conditional heteroskedasticity. Under Assumptions 4.1 and 4.2, GMM using the instrumental variables x_i* = diag(x_i1,···,x_iT) is asymptotically identical to GLS, and no other instruments are useful if we ignore the higher moment conditions on the errors (Chapter 2). That is to say, among the tk instruments for the t-th period equation, only the k instruments x_it are useful and the others are redundant, and any other functions of x_it° are also redundant.

We now ask whether the moment conditions (4.2) are redundant under Assumptions 4.1 and 4.2, given the initial instruments x_i*. Note that E(b_1i'x_i) = 0, but E(b_2i'x_i) ≠ 0. Thus, the linear GMM using the instruments b̂_2i is asymptotically identical to the nonlinear GMM, and we treat b_2i as if it were known.

THEOREM 4.1: In model (2.1) under Assumptions 4.1 − 4.2, the orthogonality conditions in (4.2) are redundant, given the initial instrumental variables x_i*.

PROOF: We apply the redundancy condition (2.10a), which becomes

E(b_2i'x_i) = E(b_2i'u_iu_i'x_i*)[E(x_i*'u_iu_i'x_i*)]^{-1}E(x_i*'x_i).   (4.3)

E(b_2i'u_iu_i'x_i*) has elements E(u_it u_is u_iτ x_iτ), s > t = 1,···,T−1, τ = 1,···,T. When τ = s,

E(u_it u_is² x_is) = E[u_it E(u_is²|u_it, x_is)x_is] = σ_s²E(u_it x_is) ≠ 0.

When τ = t,

E(u_it² u_is x_it) = E[u_it² E(u_is|u_it, x_it)x_it] = 0

from Assumption 4.1. For τ ≠ s,t, E(u_it u_is u_iτ x_iτ) = 0. Thus, the non-zero elements of E(b_2i'u_iu_i'x_i*) are σ_s²E(u_it x_is) for s > t. For simplicity, we show the rest for T = 3. Then

E(b_2i'u_iu_i'x_i*) = [ 0   σ_2²E(u_i1x_i2)   0
                        0   0                 σ_3²E(u_i1x_i3)
                        0   0                 σ_3²E(u_i2x_i3) ]

and [E(x_i*'u_iu_i'x_i*)]^{-1}E(x_i*'x_i) = (σ_1^{-2}, σ_2^{-2}, σ_3^{-2})'⊗I_k. Thus, the RHS of condition (4.3) becomes the stacked vector of E(u_it x_is) for t < s = 2,3, which is exactly the LHS of condition (4.3). The argument is more tedious for general T. ∎

Even if the regressors are only weakly exogenous, GLS is consistent since Σ is diagonal, and it is efficient under Assumption 4.2.
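A small simulation sketch (entirely my own design; the feedback equation, parameter values and names are assumptions for illustration) shows the two sides of this result: with a weakly exogenous x_it and a diagonal error covariance, pooled OLS, and hence GLS, remains consistent, while adding a time-constant error component that feeds into future x_it breaks consistency, which is the case taken up in the next subsection.

```python
import numpy as np

# Weakly exogenous regressor: x_it responds to last period's error u_{i,t-1}.
# Case 1: u_it i.i.d. over t (diagonal Sigma)     -> pooled OLS consistent.
# Case 2: u_it = phi_i + eps_it (random effects)  -> pooled OLS inconsistent.
rng = np.random.default_rng(2)
beta, N, T = 1.0, 50_000, 5

def simulate(random_effects: bool) -> float:
    phi = rng.normal(size=N) if random_effects else np.zeros(N)
    x = np.zeros((N, T))
    u = np.zeros((N, T))
    u_lag = np.zeros(N)
    for t in range(T):
        x[:, t] = 0.5 * (x[:, t - 1] if t > 0 else 0.0) + 0.8 * u_lag + rng.normal(size=N)
        u[:, t] = phi + rng.normal(size=N)
        u_lag = u[:, t]
    y = beta * x + u
    return (x.ravel() @ y.ravel()) / (x.ravel() @ x.ravel())   # pooled OLS

print("diagonal Sigma, pooled OLS :", simulate(random_effects=False))  # close to 1.0
print("random effects, pooled OLS :", simulate(random_effects=True))   # biased away from 1.0
```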
This result also applies to dynamic models with one or more lagged dependent variables. As an example, consider a simple AR(1) dynamic model in which y_{i,t−1} is the only regressor. Then GLS is equivalent to 3SLS using the instruments diag(y_i0,···,y_{i,T−1}). The moment conditions to be used are E(y_{i,t−1}u_it) = 0, t = 1,···,T, which are equivalent to E(u_{i,t−1}u_it) = 0, t = 2,···,T, and E(y_i0u_i1) = 0. Thus, among the ½T(T−1) zero off-diagonal restrictions, only the (T−1) restrictions that correspond to the second moments between regressors and errors are useful; the rest are redundant. Similar arguments apply when more than one lagged dependent variable appears as a regressor.

Theorem 4.1 depends heavily on the diagonality of Σ. If Σ is not diagonal and the regressors are weakly exogenous, GLS is not consistent (Schmidt, 1990), and all the covariance restrictions would be useful in general. We study a special case of this in the next section.

4.2. Weakly Exogenous Model: Random Effects Covariance

We now combine the individual effects with weak exogeneity of the regressors with respect to the idiosyncratic errors. From Ahn and Schmidt, and Section 3.3, we know that the moment conditions from the random effects error covariance are useful unless the GLS efficiency is reached in the space spanned by L. In other words, for the orthogonality conditions from the random effects covariance to be redundant, GLS in the differenced equations should be at least consistent, or equivalently the instruments L⊗x_i° should be valid in the original equations before differencing. In this sense, the model with a diagonal covariance matrix studied in the last subsection is an exception: there the instruments L⊗x_i° are not valid, but the GLS efficiency (not in the space spanned by L) is reached because the covariance matrix is diagonal, so that the optimal weighting matrix becomes block diagonal. We argue, throughout this section, that whenever the GLS efficiency is not reached, the moment conditions from covariance restrictions are useful. There probably is a nice and simple proof of this statement, but we could not provide it. Thus, we only offer a heuristic discussion through a couple of examples that look useful in applications.

The model we deal with in this section is (2.1) with the random effects error structure. Once we allow for the time-constant error φ_i, Assumption 4.1 is not plausible. Instead, we assume

ASSUMPTION 4.3: E(ε_it|x_it°, ε_{i,t−1}°, φ_i) = 0, t = 1,···,T.

Assumption 4.3 is standard in rational expectations models that allow for arbitrary correlations between the regressors and the time-constant unobservable φ_i. As in the previous subsection, we consider only the orthogonality conditions from the second moments of (x_it, u_it). This yields the familiar sequential set of instruments w_i, block diagonal with the block for each equation built from x_it° = (x_i1,···,x_it); for more details, see Chapter 3 or Schmidt, Ahn and Wyhowski (1992). Also, Assumption 4.3 implies that the off-diagonals of Σ are all equal. We add the no-conditional-heteroskedasticity assumption

ASSUMPTION 4.4: E(ε_it²|x_it°, ε_{i,t−1}°, φ_i) = σ_ε², t = 1,···,T.

This assumption is stronger than usual: it is generally allowed that E(ε_it²) ≠ E(ε_is²), t ≠ s. Assuming that they are the same, along with Assumption 4.3, leads to the random effects covariance matrix. But, as will shortly become clear, the restriction of equal diagonals of Σ does not alter the results we obtain. The moment conditions from equal diagonals are not our concern anyway; our interest centers on whether the equal off-diagonal restrictions of the covariance matrix are useful under Assumptions 4.3 and 4.4.
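For concreteness, here is one common way to stack the sequential instrument set w_i described above into a block-diagonal matrix, with block t equal to x_it° = (x_i1,···,x_it). This is my own sketch of the generic construction; the exact arrangement used in the thesis, including whether the blocks apply to the level or the differenced equations, is the one detailed in Chapter 3 and in Schmidt, Ahn and Wyhowski (1992).

```python
import numpy as np
from scipy.linalg import block_diag

def sequential_instruments(x_i: np.ndarray) -> np.ndarray:
    """Stack one common form of the sequential instrument set for a single individual.

    x_i : (T, k) array of regressors for individual i.
    Returns a block-diagonal matrix whose t-th block is x_i^{o t} = (x_i1, ..., x_it),
    i.e. all regressors dated t or earlier, flattened into a row.
    """
    T = x_i.shape[0]
    blocks = [x_i[: t + 1].ravel()[None, :] for t in range(T)]   # 1 x (t+1)k row blocks
    return block_diag(*blocks)

# toy example: T = 3 periods, k = 2 regressors
x_i = np.arange(6, dtype=float).reshape(3, 2)
W_i = sequential_instruments(x_i)
print(W_i.shape)      # (3, 12): one row per period, 1k + 2k + 3k = 12 columns
print(W_i)
```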
The orthogonality conditions from the equal off-diagonals of the random effects covariance matrix are E(h_1i'u_i) = 0. To simplify the discussion, we consider T = 3. We focus on the moment condition E(h_1i^{a'}u_i) = 0 and ask whether it is redundant. We lose no generality from this simplification, since the moment conditions E(h_1i^{a'}u_i) = 0 and E(h_1i^{b'}u_i) = 0 represent the same sort of equal off-diagonal restrictions on the covariance matrix. The moment condition E(h_1i^{a'}u_i) = 0, given the initial instruments w_i, is redundant iff

E[(h_1i^a + h_2i^a)'x_i] = E(h_1i^{a'}u_iu_i'w_i)[E(w_i'u_iu_i'w_i)]^{-1}E(w_i'x_i).   (4.4)

Recall that h_1i^a = (u_i3, −u_i3, 0)' and h_2i^a = (0, 0, Δu_i1)'. Thus, the LHS becomes E[φ_i(x_i1 − x_i2)] + E[(ε_i1 − ε_i2)x_i3]. Straightforward algebra using the matrix inverse lemma and the identity E(x_i1°'x_i2°)[E(x_i2°'x_i2°)]^{-1}E(x_i2°'x_i1°) = E(x_i1°'x_i1°) shows that the RHS of condition (4.4) is a combination of terms involving E(φ_i x_it) and the linear projections of x_i3 on x_i1° and x_i2°. Equality between the two sides does not hold unless E(φ_i x_it) = 0 and E(ε_i1 x_i3) = E(ε_i2 x_i3) = 0, conditions that make both sides zero. Thus, the moment condition E(h_1i^{a'}u_i) = 0 is not redundant under Assumptions 4.3 and 4.4.

Nonlinear optimization is necessary in general to implement the moment condition E(h_1i'u_i) = 0, since both E(h_1i^{a'}x_i) and E(h_2i^{a'}x_i) are non-zero. However, E(h_2i^{b'}x_i) = 0, so that the linear GMM estimator using ĥ_1i^b as instrumental variables is asymptotically equivalent to the nonlinear GMM that uses the same moment condition. Note that we constructed the ½T(T−1)−1 instruments in h_1i (or h_2i) by first equalizing the elements in each row of (2.13a), and then constructing the T−1 instruments of the second block by equalizing the elements in the columns. It is not hard to see that the column dimensions of the two blocks are reversed if we instead equalize all the elements in each column of (2.13a) first. Thus, ½T(T−1)−1 moment conditions can be implemented without numerical optimization.
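The "two arrangements" device used throughout this discussion can be seen in a few lines of code (a sketch of the general idea only; the exact h_1i and h_2i layouts in the thesis may group the conditions differently): a product condition such as E[u_is(u_it − u_{it+1})] = 0 can be written as E(h'u_i) = 0 either by placing ±u_is at the two differenced positions or by placing the difference u_it − u_{it+1} at position s, and the two versions are identical as functions of u_i but have different correlations with x_i.

```python
import numpy as np

# Two ways of writing the quadratic moment u_is * (u_it - u_{it+1}) as h'u (sketch, T = 4).
rng = np.random.default_rng(3)
u = rng.normal(size=4)
s, t = 3, 0                     # the condition u_i4 * (u_i1 - u_i2), in 0-based indexing

h_level_at_diff = np.zeros(4)   # arrangement 1: the level error u_is sits at the differenced positions
h_level_at_diff[t] = u[s]
h_level_at_diff[t + 1] = -u[s]

h_diff_at_level = np.zeros(4)   # arrangement 2: the difference u_it - u_{it+1} sits at position s
h_diff_at_level[s] = u[t] - u[t + 1]

print(h_level_at_diff @ u, h_diff_at_level @ u)                     # identical numbers
print(np.isclose(h_level_at_diff @ u, u[s] * (u[t] - u[t + 1])))    # True
```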
One might still suspect that the efficiency gains from the random effects covariance restrictions come from allowing correlations between the regressors and the time-constant error φ_i, rather than from the covariance structure itself. To provide a firmer sense that the covariance structure plays an important role in the redundancy of the moment conditions from covariance restrictions, we consider one more model, under a quite strong set of assumptions, which would not be of much practical use. We add the assumptions

ASSUMPTION 4.5: E(φ_i|x_i, ε_i) = 0;

ASSUMPTION 4.6: E(φ_i²|x_i, ε_i) = σ_φ².

Assumption 4.5 is like the random effects assumption in the strictly exogenous model, and Assumption 4.6 is an assumption of no conditional heteroskedasticity. Obviously, Assumptions 4.5 and 4.6 exclude dynamic models. Now, the only difference between the model under Assumptions 4.3 − 4.6 considered here and the model under Assumptions 4.1 − 4.2 considered in the last subsection is in the covariance structure. The initial set of instruments is the same, that is, w_i = diag(x_i1°, x_i2°,···,x_iT°). For simplicity, we again consider the case T = 3 and the moment condition E(h_1i^{a'}u_i) = 0. Condition (4.4) is the redundancy condition, given the initial instruments w_i. The LHS of (4.4) is E[(h_1i^a + h_2i^a)'x_i] = E[(ε_i1 − ε_i2)x_i3]. For the RHS of (4.4), we have

E(h_1i^{a'}u_iu_i'w_i) = [0   0   (σ_φ² + σ_ε²)E{(ε_i1 − ε_i2)x_i3°}],

E(w_i'x_i) = [E(x_i1°'x_i1), E(x_i2°'x_i2), E(x_i3°'x_i3)]',

and

E(w_i'u_iu_i'w_i) = σ_ε²E(w_i'w_i) + σ_φ²E(w_i'e_T e_T'w_i),

where e_T is the T-vector of ones. Though it is onerous to invert E(w_i'u_iu_i'w_i), it is not hard to see that the equality in (4.4) holds when the off-diagonal blocks of E(w_i'u_iu_i'w_i) are zero, since P(x_it°)x_it = x_it, t = 1,2,3; that is the case we considered in the last section. Thus, the covariance restrictions become useful once we allow for the time-constant error. Note again that the only difference between this model and the model studied in the last section is in the structure of the covariance matrix. Given weak exogeneity of the regressors, the appearance of the time-constant error breaks the block diagonality of the optimal weighting matrix and makes GLS inconsistent.

In many of the rational expectations models, MA(1) serial correlation of the errors has been detected (e.g., Keane and Runkle, 1990; 1992; Runkle, 1991). As we discussed in Chapter 3, we do not have to shrink the set of instruments if the serial correlation is caused by the time lag in observing past shocks. Then the set of instruments in those models and in the model we are dealing with here is the same, apart from the instruments derived from covariance restrictions. However, GLS is not consistent under the MA(1) error structure anyway. Thus, we conjecture that covariance restrictions are useful in those models. Also, numerical optimization will not be necessary to realize the efficiency gains from covariance restrictions in those models.

5. CONCLUSION

GMM offers a new perspective on instrumental variables: to be useful, they do not have to be correlated with the regressors, as long as they are correlated with the squared error sequence {u_it²}. It is interesting to ask how much estimators can be improved by using such instruments. Finding that kind of instrumental variable outside the models we are interested in would be unusual but, as Sections 3 and 4 show, residuals generated from initial consistent estimators can play that role, and generally they are useful when the error distribution is not symmetric.

From Section 4, we know that diagonality of the covariance matrix is crucial for GLS to be consistent and efficient when the regressors are weakly exogenous. However, there are models, though not of practical importance, where GLS is consistent but covariance restrictions are always useful. Suppose a model where the regressors are only contemporaneously uncorrelated with the errors (known as the contemporaneously uncorrelated model) and Σ is diagonal; then GLS is consistent. But conditional heteroskedasticity is guaranteed in this case, more instruments generally are useful if they exist, and covariance restrictions are useful as well. This argument is directly related to Wooldridge (1993) and Chapter 3.

We considered only the redundancy of the moment conditions from the second moments of the errors, and it turns out that the conditional third moment conditions of the errors are crucial. If the covariance restrictions are not useful because the conditional third moment conditions of the errors are met, then those third moment conditions in turn become a new set of moment conditions, and they, to be redundant, would require a certain set of conditional fourth moment conditions of the errors, and so on. Though we do not pursue the redundancy of higher moments here, we conjecture that unless the errors are conditionally normal, higher moment conditions will matter at some point. For example, consider the moment conditions E(ε_it³) = 0, t = 1,···,T. It is not hard to show that these moment conditions are not redundant unless E(ε_it⁴) = 3σ⁴, which holds when the errors are drawn from a normal distribution.
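The point about normality can be checked directly (a minimal sketch of my own; the centered chi-square alternative is just one convenient non-normal choice): for normal errors the third moment is zero and the fourth equals 3σ⁴, so the higher-order conditions carry no extra information, while a skewed error violates both.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

normal = rng.normal(scale=2.0, size=n)
skewed = rng.chisquare(df=3, size=n) - 3.0       # centered chi-square(3): skewed, heavy-tailed

for name, e in [("normal", normal), ("centered chi-square", skewed)]:
    s2 = e.var()
    print(f"{name:20s}  E(e^3) = {np.mean(e**3):8.3f}   E(e^4) - 3*sigma^4 = {np.mean(e**4) - 3 * s2**2:8.3f}")
```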
CHAPTER FIVE

CONCLUDING REMARKS

Finding additional moment conditions, and the conditions under which certain moment conditions are superfluous, in panel data models has been an important branch of research; Hausman and Taylor (1981), Amemiya and MaCurdy (1986), Breusch, Mizon and Schmidt (1989), Anderson and Hsiao (1981), Holtz-Eakin, Newey and Rosen (1988), Arellano and Bover (1990), Schmidt, Ahn and Wyhowski (1992), and Ahn and Schmidt are examples of contributions. The results in this thesis unify and extend results in several of these papers. One consequence of the analysis is the emergence of some new estimators that either exploit redundancy results or exploit new useful orthogonality conditions.

Another important line of research is specification testing. In most applications, people presume that the covariance matrix takes the random effects form, and many existing tests that are suitable for the panel data framework focus on testing whether the time-constant unobserved effects are correlated with the explanatory variables (Chamberlain, 1982; Holtz-Eakin, 1986; Jakubson, 1991). However, as we discussed in Chapter 3, in weakly exogenous models it is highly probable that the moment conditions depend on the structure of the covariance matrix, and, as we showed in Chapter 2, in strictly exogenous models the redundancy of moment conditions hinges heavily on the structure of the covariance matrix. Thus, testing the structure of covariance matrices will be quite useful and necessary in many cases.

While the over-identification test (Sargan, 1958; Hansen, 1982) and the Hausman test (Hausman, 1978) are directly applicable for testing the covariance structure, these tests require the estimators that use the moment conditions from covariance restrictions and therefore, in general, will involve numerical optimization. There might be simpler ways of testing the structure of the covariance matrix. Arellano and Bond (1991) devised a test statistic for the null σ_st = 0, s ≠ t, in dynamic models, where σ_st is the (s,t) element of the covariance matrix of the differenced errors. This direct test can be generalized to general weakly exogenous models, and also to strictly exogenous models. Many more test statistics could be devised; for example, it would be useful to have a simple statistic that jointly tests the null that the covariance matrix is of the random effects form.

In addition, these direct tests of covariance structure, combined with the Hausman test or with the GMM over-identification test, would lead to even sharper conclusions. For example, suppose we obtain a conflicting result in the rational expectations model considered in Chapter 3: the null of MA(1) serial correlation of the time-varying errors cannot be rejected by the direct test, but the Hausman test cannot reject the hypothesis that the instruments are valid, even though those instruments are supposed to be invalid in the presence of MA(1) serial correlation. Then this result leads to the conclusion that the serial correlation is due to the time lag until the shock is observed. We leave these topics for future work.

REFERENCES

Ahn, S.C. (1990), "Three Essays on Share Contracts, Labor Supply, and the Estimation of Models for Dynamic Panel Data," unpublished Ph.D. dissertation, Michigan State University.
Ahn, S.C. and P. Schmidt (1991), "Generalized Least Squares Estimation and Specification Test for Panel Data Models," unpublished manuscript.

Ahn, S.C. and P. Schmidt (1992), "Efficient Estimation of Models for Panel Data," Journal of Econometrics, forthcoming.

Ahn, S.C. and P. Schmidt (1993), "A Separability Result for GMM Estimation, with Application to GLS Prediction and Conditional Moment Tests," Econometric Reviews, forthcoming.

Amemiya, T. (1977), "The Maximum Likelihood and the Nonlinear Three-Stage Least Squares Estimator in the General Nonlinear Simultaneous Equation Model," Econometrica, 45, 955-968.

Amemiya, T. (1985), Advanced Econometrics, Basil Blackwell.

Amemiya, T. and T.E. MaCurdy (1986), "Instrumental-Variable Estimation of an Error-Components Model," Econometrica, 54, 869-880.

Anderson, T. and C. Hsiao (1981), "Estimation of Dynamic Models with Error Components," Journal of the American Statistical Association, 76, 598-606.

Arellano, M. and S. Bond (1991), "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations," Review of Economic Studies, 58, 277-297.

Arellano, M. and O. Bover (1990), "Another Look at the Instrumental Variable Estimation of Error-Components Models," Review of Economic Studies, forthcoming.

Bhargava, A. and J.D. Sargan (1983), "Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods," Econometrica, 51, 1635-1659.

Bowden, R.J. and D.A. Turkington (1984), Instrumental Variables, New York, Cambridge University Press.

Breusch, T.S., G.E. Mizon and P. Schmidt (1989), "Efficient Estimation Using Panel Data," Econometrica, 57, 695-701.

Chamberlain, G. (1982), "Multivariate Regression Models for Panel Data," Journal of Econometrics, 18, 5-46.

Chamberlain, G. (1987), "Asymptotic Efficiency in Estimation with Conditional Moment Restrictions," Journal of Econometrics, 34, 305-334.

Chamberlain, G. (1992a), "Comment: Sequential Moment Restrictions in Panel Data," Journal of Business and Economic Statistics, 10, 20-26.

Chamberlain, G. (1992b), "Efficiency Bounds for Semiparametric Regression," Econometrica, 60, 567-596.

Cragg, J.G. (1983), "More Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form," Econometrica, 51, 751-763.

Hansen, L.P. (1982), "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029-1054.

Hausman, J.A. (1978), "Specification Tests in Econometrics," Econometrica, 46, 1251-1272.

Hausman, J.A. and W.E. Taylor (1981), "Panel Data and Unobservable Individual Effects," Econometrica, 49, 1377-1398.

Hausman, J.A., W.K. Newey and W.E. Taylor (1987), "Efficient Estimation of Simultaneous Equation Models with Covariance Restrictions," Econometrica, 55, 849-874.

Hayashi, F. and C. Sims (1983), "Nearly Efficient Estimation of Time Series Models with Predetermined, but Not Exogenous, Instruments," Econometrica, 51, 783-792.

Holtz-Eakin, D.W. (1986), "Testing for Individual Effects in Autoregressive Models," Journal of Econometrics, 39, 297-307.

Holtz-Eakin, D., W. Newey and H.S. Rosen (1988), "Estimating Vector Autoregressions with Panel Data," Econometrica, 56, 1371-1396.

Hsiao, C. (1986), Analysis of Panel Data, New York, Cambridge University Press.

Jakubson, G. (1991), "Estimation and Testing of the Union Wage Effect Using Panel Data," Review of Economic Studies, 58, 971-991.
Runkle (1990), "Testing The Rationality of Price Forecasts: New Evidence form Panel Data," American Eeonemie Beyiew, 80, 714-735. Keane, M.P. and D.E. Runkle (1992), "On The Estimation of Panel Data Models with Serial Correlation When Instruments Are Not Strictly Exogenous," Somrnal of Snsiness and Economic Statisties, 10, 1-9. Kiefer, N.M. (1980), ”Estimation of Fixed Effects Models for Time Series of Cross Sections with Arbitrary Intertemporal Covariance," Jou nal of Econ m trics, 14, 195-202. Newey, W.K. and D. McFadden (1993), "Estimation in Large Samples," Handbook of Economet 'cs, Vol.4, forthcoming. Rothenberg, T.J. and C.T. Leenders (1964): "Efficient Estimation of Simultaneous Equations Systems," Econometrica, 32, 57-76. Runkle, D.E. (1991), "Liquidity Constraints and The Permanent Income Hypothesis: Evidence from Panel Data," Sournal of Monetary Economics, 97, 73-98. Sargan, J.D. (1958), "The Estimation of Economic Relations Using Instrumental Variables," Econometri a, 26, 393- 415. Schmidt, P. (1990), "Three-Stage Least Squares with Different Instruments for Different Equations," Journal er Econometrics, 43, 389-394. Schmidt, P. (1990), Lecture notes. Schmidt, P., S.C. Ahn and D. Wyhowski (1992), "Comment," Semrnel of Susiness ang Eeonomie Stetisties, 10, 10-14. Theil, H. (1971), Erineipies ef Sgongmetries, John Wiley & Sons. White, H. (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and A Direct Test for Heteroskedasticity," Econometrica, 48, 817-838. 129 White, H. (1982), "Instrumental Variables Regression with Independent Observations," Econometrica, 50, 483-499. White, H. (1984), Asymptotic Theory for Econometricians, Orlando, Academic Press. White, H. (1986), "Instrumental Variables Analogs of Generalized Least Squares Estimators," Advances in Statistical Analysis and Statistical Com utin , 1, 173- 227. Wooldridge, J.M. (1992), "System Estimation Procedures," Lecture notes. Wooldridge, J.M. (1993), "Estimating Systems of Equations with Different Instruments for Different Equations," unpublished manuscript. HICHIG‘IN STATE UNIV. LIBRRRIE ES