.L‘ H- i“ «mam Id!!! .2090st .“‘V‘ eon-name! H nnnrwnu M" nun-v- III-Il'lqlll ,.y v uuuniu up— 1;... any punts“,- «sun... U" : nn‘yg‘! “3‘ THESIS Illllllllllll‘IIIHIHIIIIIIIIHIIUlllllf‘illlHlllllHlllll 31293 01417 2666 This is to certify that the dissertation entitled IMPROVED GENERALIZED METHOD OF MOMENTS ESTIMATORS presented by HAILONG QIAN has been accepted towards fulfillment of the requirements for PhD degree in _Ecgnnmics_ 1&3er Major professor Date July 22, 1995 MSU is an Affirmative Action/Equal Opportunity Institution 0- 12771 LIBRARY Mlchlgan State University PLACE ll RETURN BOX to remove thb checkout from your record. TO AVOID FINES return on or before date due. DATE DUE DATE DUE DATE DUE MSU le An Nflrmetlve Action/Equal Opponunlty Inetltulon W ”39.1 INIPROVED GENERALIZED METHOD OF MOMENTS ESTIMATORS HAILONG QIAN A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Economics 1995 ABSTRACT Improved Generalized Method of Moments Estimators By Hailong Qian This thesis introduces a new method to improve Generalized Method of Moments estimators, given extra observable information. Monte Carlo simulation for a simple model with intercept only confirms the accuracy of the asymptotic results obtained in this thesis even when the sample size is quite small The three-stage least squares estimator of a system of equations is shown to be asymptotically equivalent to an iterative two-stage least squares estimator applied to each equation, augmented with the residuals fiom the other equations. ACKNOWLEDGENIENTS It is impossible for me to express sufficient gratitude to my major professor, Peter Schmidt, who masterly taught me econometrics, suggested this dissertation topic, carefully guided me through each step of my research, meticulously read and corrected my drafts time after time, and kindly offered me research assistantships for Summer, 1993, and the 1994-95 academic year. Without Professor Schmidt's valuable help, it would have been impossible for me to complete this dissertation. I am also extremely gratefiil to Professors Richard Baillie, Ching-Fan Chung, Robert Rasche and Jeffrey Wooldridge for their valuable help with this dissertation. I consider myself extremely fortunate to have the opportunity to associate with such an excellent group of econometricians. Their intelligence, diligence and success have inspired me so much. I am heavily indebted to Professor Daniel Suits and Mrs. Adelaide Suits. It was Professor Suits who helped get me admitted into this program, and most of all helped me to get a scholarship from the Ford Foundation for my first year here. Without their help, I would not have been here in the first place, and everything else would not be possible either. I also owe a debt of special gratitude to Mrs. Dottie Schmidt both for her help with my spoken English and for her extreme kindness. Her generous help given to me and numerous foreign students and scholars attending MSU makes our otherwise homesick lives more enjoyable. It is also my great pleasure to express my appreciation to Dan Hansen, Bill Horrace, Young min Kwon, Hyung- Seung Lee, Paul Loetscher, Jess Reaser, Leslie Schenk and Jen Tracey for all the help and entertainment they have given me in the last five years. Their freindship has made my life here enjoyable. I would also like to express my thanks to Ms. Ann F eldman and Terie Snyder for their help in many ways in the last five years. I owe my greatest debt to my family. Even though my parents are uneducated farmers, through example they taught me the virtues of diligence, self-sufliciency, decency and dignity. Though they often could not afford to buy what they needed, their sacrifice for my education and their wishes for me to have a better life were unconditional I also want to thank my four older brothers from my heart for all the things they have done for me. Finally, I gratefully acknowledge the scholarship for my first year here from the US. Committee on Economics Education and Research in China sponsored by the Ford Foundation. TABLE OF CONTENTS LIST OF TABLES ...................................................................................................... viii CHAPTER 1: INTRODUCTION ................................................................................. 1 CHAPTER 2: IMPROVED GMM ESTIMATORS FOR THE LINEAR REGRESSION MODEL ................................................................................................. 4 2.1. Introduction ............................................................................................... 4 2.2. GM with Moment Conditions Not Containing Unknown Parameters ....... 6 2.3. The Linear Model ....................................................................................... 8 2.4. Monte Carlo Results ................................................................................. 16 2.5. Concluding Remarks ................................................................................ 20 CHAPTER 3: IMPROVED GMM AND 3SLS ESTIMATORS FOR SYSTEM OF EQUATIONS ....................................................................................... 23 3.1. Introduction .............................................................................................. 23 3.2. Model and Notation .................................................................................. 24 3.3. Improved GMM estimators ....................................................................... 25 3.4. Concluding Remarks ................................................................................. 42 CHAPTER 4: IMPROVED GMM ESTIMATORS FOR SYSTEM OF NONLINEAR EQUATIONS ...................................................................................... 44 4.1. Introduction ............................................................................................. 44 4.2. Improved GMM Estimators ..................................................................... 44 4.3. Conchrding Remarks ................................................................................ 57 CHAPTER 5: THE ASYMPTOTIC EQUIVALENCE BETWEEN THE ITERATED IMPROVED ZSLS ESTIMATOR AND THE BSLS ESTIMATOR ..... 58 5.1. Introduction ............................................................................................ 58 5.2. Improved ZSLS Estimator ...................................................................... 59 5.3. Iterated 12SLS Estimator ......................................................................... 63 5.4. The Convergence and Asymptotic Efliciency of the Iterated IZSLS Estimators ............................................................................................... 71 CHAPTER 6: CONCLUSION ................................................................................... 97 REFERENCES ........................................................................................................... 99 LIST OF TABLES TABLE 1 ..................................................................................................................... 21 TABLE 2 ..................................................................................................................... 22 CHAPTER] INTRODUCTION Suppose that we have a set of moment conditions EN)l (y: ,60 )] = 0 which identify the unknown parameter 90, so that the generalized method of moments (GMM) estimation of 60 is feasible. However, suppose that we also have available a set of additional moment conditions E[¢2(y: )] = 0, where 62 is observable because it depends only on the observed data y: . Then the question is how to utilize these additional moment conditions in a simple way to improve the estimation of 60. This is possible when (I), is correlated with m. Problems of this type have been considered previously by Imbens (1992, 1993) and Imbens and Lancaster ( 1994). Imbens (1992, footnote 3) considered estimation of no = E(yt ). The sample mean, based on the moment condition E(yt — po) = O, is less efficient than the GM estimate based on the moment conditions E[(yt — 110 ), ut ]' = 0, if ut is observable, with E(ut ) = 0 and cov[(yt — Ho ), 11,] ¢ 0. Imbens (1992) and Imbens and Lancaster (1994) analyze some other specific problems that lead to GMM estimation with additional moment conditions that do not depend on the parameters of interest. In this dissertation, we prove that the usual GMM estimator of 90, say 6 , using the moment conditions E[¢1(y: ,60 )] = 0 and weighting matrix 0:: = aim... EIT'W 2.1.4». o: emu-“2 2.1m (yI.eo)1'}“. can be improved by using the observed extra moment conditions E[¢2(y: )] = 0. Specifically, we prove that the usual GMM estimator 6 is no more efficient than the augmented GMM (AGMM) estimator, say 6 , defined as the GM estimator of 90 using the moment conditions E[¢(Y:,Oo)] = E[¢l(Y:a90)'s¢2(Y:)'I= O and weighting matrix 2 C c ‘1 . .. c“ = L” C12] ={1im-r_... EIT'W 2.1m. ,eomi‘” 21m. ,eon'r‘. 21 22 We further show that the AGMM estimator 6 is numerically the same as the improved GMM (IGMM) estimator, say 6 , defined as the GMM estimator using the moment conditions E[¢.(y:.eo)— Cncii‘l’z (y: )1 = o and weighting matrix C“ =(C11— C12C2;C21)-1- The structure of the dissertation is as follows. In chapter 2, we first provide a brief general treatment of GMM estimation with additional moment conditions not containing unknown parameters. We then give some more detailed results for the linear regression model In the case of the linear regression model with conditional homoskedasticity and uncorrelatedness, we show that the IGMM estimate is an improved ZSLS (IV) estimate using as a new set of instruments the part of the original instruments that is orthogonal to the observed extra variables, whereas the usual GMM estimate is just an ordinary ZSLS (IV) estimate using the original set of instruments. We also provide some other estimators that can be written in closed form and that are asymptotically equivalent to the IGMM estimator. For the special case of a simple regression model with intercept only, we provide some Monte Carlo evidence on the finite sample performance of some specific improved estimators. For this simple model, the eficiency gains predicted by asymptotic theory are realized even for quite small sample sizes. In chapter 3, we extend the general results on improved GM to the case of a system of linear equations. Under the assumptions of conditional homoskedasticity and uncorrelatedness, we obtain explicit expressions for several asymptotically equally efficient improved GMM estimators. While the usual GMM estimator is just an ordinary 3SLS estimator, we prove that the IGMM is an improved 3SLS estimator. The improved 3SLS estimator differs from the usual 3SLS estimator in two ways. First, the covariance matrix of the residuals of the projection of the original model disturbances onto the observed 3 extra variables is used as the relevant error covariance matrix. Second, it uses as its instruments the part of original instruments orthogonal to the observed extra variables. In chapter 4, we extend the IGMM results from chapter 3 to the case of a system of nonlinear equations. Under suitable regularity conditions and some "high-level" assumptions, we show that essentially the same results as those in chapter 3 still hold for this case. In chapter 5, we further extend the improved GMM idea of previous chapters to the case where the extra variables are not observed but consistently estimated. We investigate this problem in the context of a system of linear equations. We show that 3SLS applied to the entire equation system is asymptotically equivalent to iterated ZSLS applied to each equation, augmented by the residuals from the other equations. This result generalizes a result ofTelser (1964) for the case of seemingly unrelated regressions. It also provides an interesting example of a setting in which the improved GMM estimator arises natually as an efficient estimator. The final chapter concludes the dissertation with some brief comments on firrther possible work in this line of research. CHAPTER2 INIPROVED GMM ESTIMATORS FOR THE LINEAR REGRESSION MODEL 2.1. Introduction In this chapter, we provide (in section 2.2) a brief general treatment of GMM estimation with additional moment conditions not containing unknown parameters. We also give (in section 2.3) some more detailed results for the linear regression model. Specifically, we consider the standard regression model (2.1) yt=xt'B+8t, t=1,2, ...... , T, with instruments zt satisfying E(z,et ) = 0. These moment conditions are the basis of GM estimation of B ; under a conditional homoscedasticity assumption for at , the GM estimator is the usual instrumental variables (IV) estimator. If we also have available a vector of observable variables 111 that are uncorrelated with 2.1 but correlated with a, , the additional moment conditions E(ut 69 2t ) = 0 will improve the efficiency of estimation of [3. This principle applies in linear or nonlinear models, but in the linear case we obtain very simple explicit results for the improved estimators. We believe that these results are empirically relevant, notably in the estimation of rational expectations models. In many empirical rational expectations models, the orthogonath conditions used in estimation assert that a forecast error, written as a fimction of data and parameters, is uncorrelated with variables in the information set at the 5 time the forecast was made. Thus a, is the error made in forecasting some variable at time t, based on information available at time t- l, and zt consists of information available at time t-l, so that it is uncorrelated with at. In this setting, ut can be the observable (ex post) error in the forecast of a set of variables at time t based on information available at time t- 1. As a specific example, suppose that s is a spot exchange rate at time t and ft is the one period forward rate. Many papers have tested the unbiasedness hypothesis that f,_1 = E( st | OH ), where OH is the information set at time t-l. Thus we should have a = 0 and B = l in the regression model (2.2) st =or+13fH +st. When SI and f,_] contain unit roots but are cointegrated, the above regression is often replaced by a regression in stationary variables: (23) (St - St—1)= ‘1 +B(ft—l _ St—1)+8t where again or = O and B = 1 under the unbiasedness hypothesis. Because the forecast error at is uncorrelated with variables in QM, (2.2) or (2.3) can be estimated by GM or IV, where the instruments zt are variables in OH. This is a standard applied econometric excercise. However, the esthate can be improved by using other observable variables 0., that are correlated with the forecast error a, but uncorrelated with 2,. Such variables will typically be forecast errors in other related variables. An obvious example would be the change in a security price from time t-l to t. We might reasonably expect a, and ut to be correlated if spot exchange rates and security prices respond to the same unforecastable economic shocks. For Imbens's model of the estimation of the sample mean, we provide (in section 6 2.4) some Monte Carlo evidence on the finite sample performance of some specific improved estimators. For this simple model, the eficiency gains predicted by asymptotic theory are realized even for quite small sample sizes. The final section concludes the chapter with some comments. 2.2. GMM With Moment Conditions Not Containing Unknown Parameters Let 90 be a K x1 vector ofparameters to be estimated, and y:, t = 1, 2, T, be observed data. Suppose that the following moment conditions hold: (2.4) E¢(y:,90) = E[¢‘(Y:’?°)] = o, t = 1, T, ¢2(Yt) where (bl is N X], with N 2 K, and 62 is Hx 1. We want to compare GMM based on 4)] only with GMM based on d) =(¢1',¢2')'. Note that 4), does not depend on 60. Define the following notation: 9 .. (25A) Me) = [M )] = 1 Emma) ¢T2 Tt=1 (2 SB) C= C“ C" = limlT-Etb (9 )o (e r] . C21 C22 T"°° T O T O (2.50) D=[D‘]= lim M. 0 T—roo 60' (The block "zero" in D arises because ¢T2 does not depend on 9 .) For identification of 90 7 we require D1 to be of fill] column rank. In the case that the y: are iid, C = V[¢(y: ,60)] and D = E[a¢(y:,90)/ae']. Let 6 denote the GM estimator of 90 based on the moment conditions (1)1 only, using weighting matrix C; 11; and let 6 denote the augmented GMM (AGMM) estimator of 90 based on the moment conditions ¢ = (¢1',¢2')', using weighting matrix C"1. Under suitable regularity conditions, standard GMM results indicate that these estimators are consistent, with .AV[JT(é —eo)] = (ole-:1)l )-1 and Avwia‘) — 90)] = (lye-11))-l =(D1'C11Dl)"1, where C11 = (CH - CHCZCZIYl is the block of C"1 corresponding to 4)] (i e., the upper left block). For discussions of regularity conditions, see, e. g., Hansen (1982) or Gallant and White (1988). The AGMM estimator is efficient relative to the GMM estimator, since (Dl'C,"1‘D1)’l —(D1'C“D1)'1 is positive semidefinite. In fact, a little algebra reveals that we‘ll)1 - lapel-1‘1), = (0,1031), )'(c22 - encgcu)‘l - (C21C{11D1), so that the condition for no gain ill efficiency is Clel'I'D1 = 0. There is no efficiency gain when C2] = 0 (t1)l and 92 are uncorrelated); when C21 at O, the AGMM estimator is generally (but not necessarily) strictly better than the GM estimator. We can also write the augmented GMM estimator as follows. Consider the moment conditions (2.6) Baronet)—Cocaoztytn=o, which essentially deal with the residuals fiom a regression of ti), on (1)2. If V[¢(yi,90)] = C. then Vim: .90) - Cacaottyt )1 = Cl. — 0.20302. = (6‘)“. With this motivation, we define the improved GMM (IGMM) estimator, say 6, as the GM estimator using the moment conditions (2.6) and weighting matrix C11 = (C11 — c,,c;;c,,)“. It is then not difficult to show that the IGMM estimator 6 and the AGMM estimator 6 are the same. This can be seen by noting that 6 satisfies the first order condition (2-7A) DTl(é).Cll[¢Tl (é) “ Clzcii‘i’rzl = 0 where DT1(6) = 64)“ (9) / 69'; 6 satisfies the first order condition (2713) Datéwlon(é)+Drr(é)'c”om = 0. But (2.7A) and (2.7B) are seen to be the same with the substitution C12 = —C”C12C;21 in (2.7B). The above discussion treats the weighting matrix C as known. Assuming suitable regularity conditions, the superiority of the AGMM or IGMM estimator to the GMM estimator will still hold asymptotically if C is replaced by a consistent estimate C. The numerical equivalence of the AGMM and IGMM estimators would require that the same estimate C be used for both estimators. 2.3. The Linear Regression Model In this section we will apply the general results of the previous section to the case of the linear regression model For this case we can give an explicit formula for the IGMM estimator. When the errors are conditionally homoskedastic, further simplications are possible and the IGMM estimator is related to some previous results. The model considered in this section is as given in equation (2.1) above, which we rewrite slightly as (2.8) yt = xt'00+8,, t= l, 2, ...... , T, where yt is the dependent variable , xt is a K x1 vector of explanatory variables, at is the disturbance term, and 60 is the parameter vector to be estimated. Suppose that we have available an M x1 vector of instruments zt satisfying M .2 K and E(z,et ) = 0, so that the GM estimation of 90 based on the moment conditions E(z,e,) = 0 is feasible. However, suppose that we also have available an L x] observable vector ut satisfying E(ut ®z,)= 0 and E(utet) at 0. The observable data vector is y: = (yt,x,',zt',u,')', and in our previous notation we have moment conditions E¢( y: ,90) = 0, with (29A) ¢1(Y:’90)= no. — x390) (2.913) My?) = u. ®z. = (It emut. Asamatter ofnotation, let Z=(zl,---,zr)'; X=(x,,---,xT)'; a =(81,--°,8T)'; U=(u1,---,uT)'; y=(Y1’”'SYT)'; u(j) =(llJ-1,"',llij-)'f01‘j= 1’ m, L; and n" =(u(1)',~--,u(L)')'= vec(U). Then straightforward calculation yields (2.10A) on = T"z'(y - x9) (2.1013) on = T“(IL ®Z')n‘ = T“veo(z' U) (2.100) DT,(0) = —T“Z' x Using the first order condition (2.7A) above, with these substitutions we arrive at the IGMM estimator (2.11) 6 = (X'ZCHZ'X)’1X'ZC”[Z'y — c,,c;;vee(z U)]. To proceed further, we need to put more structure on C. This is possible under the assumption of no conditional heteroskedasticity or autocorrelation: suppose that, 10 conditional on Q, = {zt ;s,_1,u,_1,z,_,;...}, the (s,,u,')' are mutually uncorrelated, zt is stationary, and that (2.12) V([8‘]|z,)sz=[6: 2w]. u, 2 £118 uu Then (2.13) c = Holy:.eo)o(yt.eo)'= 2®E(ztzt') for which a consistent estimate is (2.14) f: = i cor-lz'z, where E is any consistent estimate on. Then C11 = 65"‘(T’1Z'Z)’1 with as = (63 — 2 12-12,, )-1, cl, = $3,, oar-122,6” = $3,, er'z'z, C120}; = (Emil-1 )®IM. With these substitutions in (2.11), and noting that 6“ cancels, we obtain (2.15) 6 = (x' szy‘ X'Z(Z'Z)" {Z'y - [(233) ® IM]vec(Z' U)}, where PZ = Z(Z'Z)’1 Z'. More generally, if A is any matrix, we will define PA as the projection onto A, so that PA = A(A'A)'1A' if A has firll cohrmn rank. Similarly, we define M A = 1— PA. Obviously the first term in this expression is just (X' PZX)‘l X'sz, the IV (ZSLS) estimator, which is GMM based on 411, given the assumption of no conditional heteroskedasticity. Using the matrix fact vec(BC) = (C'®I)vec(B), (2.15) can be rewritten as (2.16) 6 = (x' PZX)’1 x' 2(z'Z)"[z'y — z' U232“, ]. 11 It is reasonable to consider Em, = T'IU' U, in, = T“1U‘(y — X6), where 6 is any consistent estimate of 9. Then (2.16) becomes (2.17) 6 = (X'PZX)'1X' Pz[y — PU(y - xé)]. Finally, while (2.17) is defined for any consistent estimate 6, we may as well consider 6 = 6. Then (2.17) implies (X' PZX)6 = x' sz — x' PZPUy + x' PZPUxé; solving for 6 , we obtain (2.18) 6 = (x'PZMnyl x' PZMUy. The IGMM estimator 6 is very similar to an estimator considered by Schmidt (1986, 1988): (2-19) 6 = (X'P[MUZ]X)-1X'P[Mul]y' This is IV of the regression equation (2.8), using as instruments MUZ, the part of Z orthogonal to U. Schmidt also notes that '6. can be derived as IV of the augmented equation (2.20) yt = xt'Oo +ut'g +vt using (Z, U) as instruments. Equation (2.20) is instructive because, speaking loosely, the effect of adding the variable n1 is to reduce the relevant variance fiom o: to 03 = 0'2 - elu — 2‘1 2‘. This result is closely related to the result of Wooldridge (1993), who 2- as 2w an 118' essentially considers the case xt = zt (in our notation). 12 To be more precise about the sense in which 6 and of). dominate the simple IV estimator, and to exhibit some other asymptotically efficient estimators, we make some more explicit assumptions. To make the asymptotic theory as simple as possible, we will make the following "high level" assumptions. F-A)O( X7 AXE Axu- . 1 A2" A22 0 0 . (A2.1) phm :r—[X,Z,8,U]‘[X,Z,8,U] = A“ 0 of: Zen exrsts. _A,,x 0 2m Emu (A2.2) Axx , Au and 2‘.“ are nonsingular; AZ, is of full column rank. (A2.3a) T‘1’2vec[Z'(s,U)]—> N[O,‘I’]. (A2.3b) ‘11 = zeAu. These high-level assumptions are derivable from various sets of more basic assumptions. For example, in the rational expectations context, define et = (a, ,u,‘)' and let Q, be the information (sub)set Q, = {z,;z,_l,e,_,;z,_2,et_2; ...... }. Then (A2.1)-(A2.3) follow from the assumptions that E(et |Q,) = 0, V(et IQ, ) = Z, and xt and z, are covariance stationary. Let 6 be the usual IV estimator using Z as instruments: 6 =(X'PZX)’1X'PZy. Under (A2. l)-(A2.3) we have the standard result: (2.21) mil—90)»N10.oZ(A..zA;z‘A.,.)“l; this is consistent with the general GMM result AV(6) = (Dl'Cl’llD1 )’I presented earlier. We now turn to the IGMM estimator 6 , Schmidt's estimator '6', and the following additional estimators (2.22A) 6 = (X'MUPZMUX)" X'MUPZMUy 13 (2.22B) 6 = (X' PZX)” X' My — Uh) where A = £32m. In practice, (2.22B) will require a consistent estimate of A. We wish to show that the estimators 6 , '6', 6 and 6 are asymptotically equivalent, with asymptotic variance matrix oilu(szA;zlAzx)‘l, where as above oiln = 0': — 28,12,121”. This is consistent with the result of Schmidt (1988, Appendix C.4) for .9". Comparing to the asymptotic variance matrix of 6 in (2.21) above, the inequality 0;“ S 0‘: establishes the asymptotic efficiency of 6 , '6', 6 and 6 relative to 6. LEMMA 2.1: plim T’1X'MUZ = plim T'1x'z = An plim T‘lz'MUz = plim T‘1Z'Z = A,z Proof: plim T‘1x' MUZ = plim [T‘1X'Z — T“x' U(T"1U'U)“T"U'Z] = A,z —A,,,2;;-o = An, and similarly for plim T'IZ'MUZ. LEMMA 2.2: plim T'IX'PIMUZIX = plim T‘IX'MUPZMUX = plim T'IX' PZMUX = plim r‘x' P2X = A,,A;,‘A,x Proof. plim T’1X'PIMUZ,X = plim Tlx'MUzqtlim T"z'MUZ)“plim T‘IZ'MUX = szAgl‘A,x using Lemma 2.1. The proofs for plim T’1X'MUPZMUX and plim T'IX'PZMUX are similar. LEMMA 2.3: plim T‘IX' lemme = plim T’IX'MUPZMUe = plimT‘IX'PZMUa 14 = plimT“x'PZ(y—- Uh) = plimT'lX'PZs = 0 Proof: plimT’IX'PlMUZIS = szAgzl-plimT'T MUS using Lemma 2.1. But plim T‘IZ'MUs = plimr“z'e — plimT’IZ‘ U(plimT’1U'U)'lplim T'IU's = 0—0-1232“, = 0 since plim T'IZ'a = O and plim T’IZ' U = O. The proofs for the other cases are similar. Lemmas 2.2 and 2.3 imply that the estimators 6 , '6', B and 6 are consistent. For example, (2.23) plimBO=90+[AXZA;AZX]’1»O =90 using Lemmas 2.2 and 2.3; similar simple arguments apply to the other estimators. It is interesting in Lemma 2.3 that the orthogonality of 8 with (MUZ) occurs because a is orthogonal to Z Ed Z is orthogonal to U. LEMMA 2.4: T'I’ZX'PIMUZIS,T‘1/2X'MUPZMU8,TWX'PZMUS and T"“2X'Pz(y — U7») each converge in distribution to N[0,o:rquzA;zlAzx ]. Proof: We will give the proof for T‘“2 X' PlMoZIS . The other proofs are quite similar. (2.24) T‘l/ZX'PIMUZIe = T’1/2X'MUZ(Z'MUZ)"Z'MU8 = (T’lx'MUZ)(T-‘z'MUZ)"T'mz'MUe = AXZA2(T‘”2Z'MU8)+OP(1). So we consider T"’z' MUe = T‘1’2Z'[I—U(U'U)'l U']e 15 = T‘mZ‘e—(T‘ml‘ U)(T“U' U)"(T‘1U'e) — ——r'1’1z'(e 112,3,2 ,,)+o (l) Combining expressions, we have (2.25) T‘1’1x'PIM ZIn: A A‘1'r'1’1z'(e— U2;u2 ue)+o (1) But (2.26) T‘1’1z'(s - U232“) = T‘mvec[Z'(e — U232“, )] = T’mvec{Z'(s,U)|:_ 2_ _12 :|} 0.11118 ={[1,— —2 Z"]®IM}T"/2vec[Z'(s,U)]. w 1.111 But according to assumption (A2.3) above, T"mvec[Z‘ (e, U)] —) N[O,2 ®An]. Therefore (2.27) T-1’1z'(e- 112,112 us)—al~l(o,13) where (2.28) B={[l,—2,u 2‘1]®IM}(£®AZZ ){[1,—2,,2;,1,]®1M)' ={[1 -2 su2,1,,1]2[1, -2 2;,12Z,I}®A = (as _ 2"'tarzt—rllrzur: )‘Azz= Azz O.slu Using (2.27)-(2.28) in (2.25), we conclude (2.29) T’WX'PlM 218 —+ N[0, AHA; zz.A;,1ltt,,]— — N[0, oi‘quzA‘zzlAzx]. OeluA THEOREM 2.1: JT(6 -90)a s/T(6.—90), JT(6 -60) and «Edi-90) each converge in distribution to N[0,(r:hl (AXZAEAZXYl ]. 16 Proof. Ji(°t'1°— 90) = (T'IX'PIMUZIXYI T‘1’1x'PlMUZIe. Then using Lemmas 2.2 and 2.4, fi(.9"-90) —) N[O, A], with A = (A...A;.'A...)“1 -oit(A..A;;Aa)-(A..A;Aar1 = 0:111 (sz‘AE‘qzx).l - The proofs for the other estimators are essentially identical. I Thus the asymptotic variance matrix of each of the above estimators is cit“ (p lim T'IX' P2X)'l. As noted above, this is less than the corresponding asymptotic variance matrix for the ordinary IV estimator, 02(plim T'1X'PZX)'1, so long as 28“ at 0. To achieve an efficiency gain, the additional variables 11 must be uncorrelated with the instruments z an_d correlated with the errors a. The estimator 6 in (2.22B) is infeasible because it depends on A = 2'12 We “11 118' can define a feasible version of it, say (2.30) (i = (X'PZX)‘1X'PZ(y - Ui), where it is a consistent estimate of A. Specifically, it = (U'U)'1U'§ with é = y - X6, where 6 is any consistent estimate of 60. It is easy to show that 6 is consistent and has the same asymptotic distribution as 6 (and, therefore, the same asymptotic distribution as 6, '6 and 6 ). 2.4. Monte Carlo Results In the previous section we have considered four improved IV (UV) or improved GMM estimators. Each is consistent and asymptotically more efficient than the usual 17 IV/GMM estimator 6 = (X'PZX)"1X' sz. The asymptotic efficiency gain for each of the HV estimators over the usual IV estimator is ZNEQEW -(plim T "IX'PZXY‘, which obviously depends on the strength of the correlation between 8 and u. A natural question to ask is whether our IIV/IGMM estimators are still more efficient than the usual IV or ordinary GMM estimator in finite samples. In order to answer this question, we performed a Monte Carlo simulation on a very simple model. In the simulation we considered our IIV estimators and also some estimators of Imbens (1993) that are similar to GMM estimators. Our simulation plan is as follows. The assumed regression model is: (2.31) y, =90+e,, t= 1,2, ...... , T, where 60 is a scalar parameter and a, is iid N(O,1). Thus we are estimating 60 = E(y, ). Further we assume that we observe a random variable u,, which is also iid N(O,1). Let p denote the correlation between a, and 11,. This simple model has also been considered by Imbens (1993). An efficiency gain is possible here because the mean of u, is known to be zero. Our DGP is therefore as follows: (2.32A) y, = 1+8,, (2323) “t = pet + l-p’nr, where a, is iid N(O,l), n, is also iid N(O,1) and a, is independent of 11,. Thus 60 = 1. Our results do not depend on this choice of 60, nor do they depend on the choice of the variance of a, and 11, equal to one. The following six estimators of 60 are considered ill our simulation: (1) Sample mean (61): 61 = y. This is the GMM estimator based on E(y, —60) = 0. 18 (2) Infeasible GMM (6,): 62 = argmin,{¢T(e)'C'1¢T(e)} = y — pii, where _ _ l p e, ¢r(9)=(y-9, u)',andC=[ ]=V([ ]). p 1 at (3) Feasible GM (6,): 63 = argnnine {¢T(9)'6‘1¢T(9)) = y — on, where -—- — A 1 T A A. e 6 . A _é ¢T(9)=(y-9, u)';C=¥Ze,e, =[.” .12]wrthe,=[yt 1]; and 1:1 C21 C22 11, 6: 6:21 / 6:22- (4) 11v estimator (6,): 64 = (i'Mui)“i'Muy, where i a (1, l)'.,,,. (5) Imbens's first estimator (6 5): 6 5 is the pseudo maximum likelihood (PML) estimator defined by Imbens (1993) as the first part of the solution to g(926) = 2?:1 p(Yt ,u,,6,5) = 0’ where p(y,u,6,8) =(1y+—s:’ 1:511) and 5 is an artificial parameter. (6) Imbens's third estimator (66): 66 is defined by Imbens (1993) as the first part of the sohrtion to g(e,6.tt) = 2.1.16(y.,u..e,6,n) = o. where "156.11.935.10 = ((y-9)eXP(u -5u),u-eXP(11-8u),1- eXP(11-5u))'- Notice that in our special case of a regression model with only intercept, some estimators that are different in general become identical. The infeasible GMM estimator l9 (6,) is the same as the infeasible IIV/IGMM estimator 6 = (x' sz)“l x' P,(y — uh) = y — pfi defined in (2.22B) above. The feasible GMM estimator (63) is the same as the feasrhle IIV/IGMM estimator 6 =(x'P,x)-1x'P,(y — ui.) defined in (2.30) above, where 71: 2,32,, = (U'U)’1[U'(y - 61)] = 6,, /6,,, provided that the initial consistent estimator for e, is 6, in both cases. The three IIV/IGMM estimators 6 =(x'P,MUx)‘1x'P,MUy defined in (2.18), '6' = (x'nMU,,x)'1x'P,MU,,y defined in (2.19) and 6 =(x'MUPzMUx)“1x'MUPzMUy defined in (2.22A) are the same and equal to 6, = (i'MUi)"i'MUy, when x = z = i = (l, m, l)'Tx,. The second estimator ofImbens (1993), defined as the first part ofthe solution to g(6,8) = Z,=,p(y,,u,,6,6) = 0 with p(y,u,6,8) = ((y -6)(1-5u), u(1— 8u))', is also the same as the first three HV/IGMM estimators (equal to 6,). This leaves us with the six distinct estimators listed above. 6, = y is unbiased and var(6,) = UT. 6, = 7— on is unbiased and var(6,) = (1 — p2 ) / T. For the remaining four estimators, finite sample properties are unknown, but the estimators are consistent and their asymptotic variance is (l - p2 ) / T. Our simulation results are based on 20,000 replications. The simulations were performed in GAUSS 2.0 and used its random number generator. Table 1 gives the means of the six distinct estimators, while Table 2 gives their mean squared errors (MSE). In each case the estimators are nearly unbiased and MSE is nearly the same as variance. For convenience we actually present MSE multiplied by sample size (T), and the asymptotic variance of «E (6 —6) is given as the value for T = 00. For the sample mean 6,, T-MSEl should equal 1.0 apart fiom sampling error for all values of T and p , and deviations from unity in the first cohrmrl labelled T-MSE1 give an indication of the sample variability in the experiment. Similarly, for the infeasible GMM estimator 6,, T-MSE2 should equal (1- p2) apart from sampling error for all T and p. For the other estimators T-MSE should converge to (l — p2) for large T. The result in Table 2 are in close agreement with the asymptotic theory, and the agreement is very close for T .>_ 50. T-MSE is nearly equal to its asymptotic value 20 (1— p2) for all estimators, all values of p, and all sample sizes except T = 25 and occasionally T = 50. The IIV/IGMM estimators are better than the sample mean in all cases except p =.l and T = 25 or T = 50; as expected, the size of the efliciency gain depends on p. For this simple model, at least, the difl‘erences among the various IIV/IGMM estimators are quite small. As might be expected, the infeasible GMM estimator (6,) is usually the best. The IIV estimator 6 4 (also equal to Imbens's second estimator) is somewhat better than Imbens's first and third estimators (6 5 and 66). The feasible GMM estimator (63) seems to be slightly better than the IIV estimator when p is small, and slightly worse when p is larger. However, we repeat that the finite sample differences among the asymptotically equivalent estimators are quite small The main message of the simulations is that we can indeed improve on the usual IV estimator in finite samples, and asymptotic theory is a reliable guide to the variability of these improved estimators. At least this is so in the simple model we have considered. 2.5. Concluding Remarks In this chapter we have shown how to improve on ordinary GMM (IV or ZSLS) estimators, given observable extra variables which are uncorrelated with the instruments but correlated with the error in the equation being estimated. The difference between the improved ZSLS (IV) estimators and the ordinary ZSLS (IV) estimators is that the projection matrix PZ in ordinary ZSLS (IV) is replaced by PIMUZI’ MUPZMU, or PZMU, so that the 2SLS "fitted values" are constructed differently. For example, 6. uses MUZ, the part of Z orthogonal to U, as the regressors in the "first stage" regression, whereas the ordinary 2SLS estimator 6 just uses Z. 21 TABLE 1 Means of Alternative Estimators .1 i o. o. o. o. o. o. 25 .9998 .9998 .9992 .9992 .9992 .9992 50 .9996 .9996 .9994 .9994 .9995 .9994 100 .9999 1.0000 1.0001 1.0001 1.0001 1.0000 200 .9996 .9997 .9997 .9997 .9997 .9997 500 .9999 .9999 .9999 .9999 .9999 .9999 00 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 25 .9998 .9998 .9993 .9993 .9942 .9993 50 .9996 .9996 .9995 .9995 .9996 .9995 100 .9999 1.0002 1.0002 1.0002 1.0003 1.0003 200 .9996 .9998 .9998 .9998 .9998 .9998 500 .9999 .9999 .9999 .9999 .9999 .9999 “3 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 25 .9998 .9998 .9995 .9996 .9975 .9996 50 .9996 .9996 .9997 .9997 .9998 .9997 100 .9999 1.0003 1.0003 1.0004 1.0004 1.0004 200 .9996 1.0000 .9999 .9999 .9999 .9999 500 .9999 1.0000 1 .0000 1.0000 1 .0000 l .0000 00 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 25 .9998 .9998 .9997 .9999 .9993 .9999 50 .9996 .9997 .9998 .9999 .9998 .9998 100 .9999 1.0004 1.0004 1.0004 1.0004 1.0004 200 .9996 1.0001 1.0000 1.0000 1.0000 1.0000 500 .9999 1.0000 1.0000 1.0000 1.0000 1.0000 00 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 25 .9998 .9999 .9997 1.0000 1.0001 1.0000 50 .9996 .9999 .9999 1.0000 .9995 1.0000 100 .9999 1.0003 1.0003 1.0003 1.0003 1.0003 200 .9996 1.0001 1.0001 1.0001 1.0001 1.0001 500 .9999 1.0000 1.0000 1.0000 1.0000 1.0000 00 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 T‘ 25 50 100 200 500 00 25 50 100 200 500 25 50 100 200 500 25 50 100 200 500 25 50 100 200 500 22 TYUBLIEZ Mean Square Errors of Alternative Estimators .9965 .9985 1.0073 ‘L0008 L0120 L0000 .9965 .9985 1.0073 1.0008 L0120 1.0000 .9965 .9985 1.0073 1.0008 L0120 LOOOO .9965 .9985 L0073 L0008 L0120 L0000 .9965 .9985 1.0073 L0008 L0120 L0000 T-MSE, T-MSE, .9970 .9945 .9953 .9888 L0003 .9900 .9350 .9249 .9111 .9051 .9163 .9100 .7831 .7699 .7478 .7433 .7524 .7500 .5370 .5266 .5068 .5043 .5099 .5100 .1983 .1955 .1887 .1881 .1896 .1900 L0345 1.0123 L0047 .9937 L0027 .9900 .9684 .9406 .9196 .9094 .9180 .9100 .8113 .7829 .7556 .7468 .7535 .7500 .5615 .5372 .5130 .5068 .5109 .5100 .2199 .2035 .1918 .1893 .1902 .1900 L0447 L0147 1.0052 .9938 L0027 .9900 .9775 .9432 .9198 .9094 .9180 .9100 .8156 .7846 .7556 .7468 .7535 .7500 .5580 .5365 .5130 .5068 .5108 .5100 .2066 .1995 .1911 .1891 .1901 .1900 T-MSE, T-MSE5 L0586 L0181 1,0063 .9940 1.0025 .9900 .9963 .9469 .9207 .9096 .9180 .9100 .8188 .7873 .7564 .7468 .7535 .7500 .5600 .5377 .5136 .5068 .5110 .5100 .2192 .2003 .1913 .1892 .1900 .1900 T-MSE6 L0470 L0157 L0055 .9938 1.0025 .9900 .9781 .9439 .9200 .9098 .9180 .9100 .8159 .7849 .7558 .7468 .7535 .7500 .5592 .5367 .5132 .5068 .5110 .5100 .2069 .1998 .1912 .1890 .1900 .1900 CHAPTER3 IMPROVED GMM AND 3SLS ESTMATORS FOR SYSTEM OF EQUATIONS 3.1. Introduction In section 2.2 of Chapter 2 we defined the improved GMM (IGMM) estimator as the GM estimator using moment conditions E[¢l(y: ,60) — €1,034), (y: )] = 0 and weighting matrix C11 = (Cll — CHCQC21 )"1. In the definition, we intentionally did not specify the functional forms of d), and (112 , nor did we require the observations {o(y,‘,e) = (o,(y:,e)',o,(y;')' )', t = 1,2,...) to be conditionally homoskedastic or serially uncorrelated so long as they satisfy suitable regularity conditions. In section 2.3 of Chapter 2 we applied the general results of IGMM estimator to the case of the linear regression model. Assuming conditional homoskedasticity and serial uncorrelation, and imposing the regularity conditions (A.2.1)-(A.2.3), we obtained an explicit formula for the IGMM estimator and related it to other previously known estimators, such as the estimator of Schmidt (1988). In this chapter we will provide a similar analysis for a system of linear equations. We will first set up the model and make ' 'gh-level" assumptions of regularity conditions. Under these assumptions we derive an explicit formula for the IGMM estimator and several other asymptotically equivalent estimators, and demonstrate the efficiency of these estimators relative to the usual three-stage least sqares (3 SLS) estimator. 23 Ill. 24 3.2. Model and Notation The model considered in this chaper is (3.1) ytg = x,,,'60g +8,8, g =1, 2, ...... , G; t =1, 2, ...... , T, where yt8 is the dependent variable of equation g at observation t, x,g is the R, X] vector of explanatory variables of equation g at observation t, 90,, is the K8 x1 unknown parameter vector of equation g, and at, is the model disturbance of equation g at observation t. We assume that in general cov(x,g,s,g) at O for g = 1, 2, ..., G. We define the following notation: ytl x11. 91 811 (3.2A) y,= 5 ,X,= ,6: 3 ,8,= 3 ,t=1,2,...,T; YIg xlg (3.2B) ya) = : , X“) = YTg ng. Y(l) X0) 8(1) (3.2C) ytt- = E , Xe = .'. , as = E , y(G) X(G) 8(G) a,‘ (3.2D) 8 = ‘ sT' Then (3.1) can be rewritten as 25 (3.3A) y, = x,9, +e,, t = 1, 2, ...... , T, 01' as (3.3B) y. = x.e, +e. 3.3. Improved GMM Estimators Suppose that we have available an M x 1 vector of instruments z, satisfying the moment conditions E(e, 8) z,) = O, with E[(IG ® 2, )X,] having full column rank. Suppose that we also have available an L x 1 vector of observable variables 11, satisfying E(u, ® z,) = 0 and E(u,e,') = 2“, ¢ 0. Then the additional moment conditions E(u, ® 2,) = 0 will help us to improve the efficiency of estimation of 60. Using the notation of Chapter 2, we have (My: 13)] _ [(16 8’ Zr )(yt - X60] 3.4 La: , _ ( ) "y ) [tum (heme. where the observed data vectoris y:=(yr';Xr,',...,x,G';u,';z,1)'. Then 111(9)]=12o(yt,9)=[ on r106 e Z'Xy. — Xe) T t=l (3'5) ¢r(9) =[ T‘1(IL ®Z')u. where Z=(z,,...,zr)' and u. =vec(U)E(u(,)',...,u(L)‘)' with U=(u,,...,uT)'. Then _ C11 C12 _ . . , (36A) C‘lca €22}ng Baronet-(9.11 26 T—>oo = m E r10, ®Z')8.8.'(IG ®Z) T"(IG ®Z')8..ua'(1L ®Z) T"(IL o'oz')n.e..'(1G 8 Z) T“(IL ® Z)u..u.'(1L ® Z) (3.68) 1),, = 544$? = —%(1G ®Z')X. . Substituting (3.5) and (3.6) into the first order condition (2.7A) in Chapter 2 for the IGMM estimator 6 , and replacing C“, C12 and 0;,1 by consistent estimates C", C,, and C; respectively, we arrive at: (3.7) [x.'(1G ®Z)]C“[(IG 82')(y. — x.6)- 6,,6;,1(lL ®Z')u...] = 0. Solving for 6 , we obtain (3.8) 6 = [X.'(IG o3>Z)611(IG ®Z')X.]" x.'(lG ®Z)C” {(1. e2'1y. - 6.26310. ®Z')u.1. In order to simplify the above expression further, we need to put more structure on C. This is possible under the assumption of conditional homoskedasticity and lmcorrelatedness. Suppose that conditional on Q, = {z, ;e,_, ,u,_1 ,z,_, ;...} , the (s,',u,')' are mutually uncorrelated and that: 2. 39 V 81 _ 2&8 2w (. ) {JIM—[2 2 ] “8 uu 27 Then C = £®E(z,z,') and C = E®T‘1Z'Z is a consistent estimate ofC, where E is any consistent estimate of E. For the moment we will treat 2 as known, for simplicity. Therefore we have (3.10A) 611 = 2.. ®(T"Z'Z)'l with 288 = (2,, — 2,,2;,1,2,,,)-1 (3.1013) 6,, = 2,,,, ®T"Z‘Z (310(3) 6;; = 2,1,1, ®(T"‘Z'Z)" (3.10D) 6,,63 = 2,2,3, 81 1M. Substituting (3.10A) and (3. 10D) into (3.8), we get (3.11) 6 = [x.'(2“ ®Pz)X.]" x.'[2“ ®Z(Z'Z)"1] {(IG @2')Y* ..(Zsuzl-Rll ® IM )(IL @Z')u.j. Noticing that (3.12) (2,,2;,1, ® 1,, )(1L ®Z')u. =(2,,2;,1, ®Z')vec(U) = vec(z'U2,‘,,1,2,,) = (1G 8z'U)vec(2;,1,2,, ) = (IG 82')(IG ®U)ve0(2{;l£aa)a and substituting into (3.11), we obtain an explicit formula for the IGMM estimator 6: (3.13) 6 = [x.'(2'i ®PZ)X.]‘1X.'(E“ 8P,)[y. -(lG sum where A = vec(Eme) a (A,',A,',-~-,AG')'. Thus, for i = 1, 2, ..., G, A, = 2,1,1, times the ith column of 2“,; equivalently, A, = (p1imT’1U'U)‘1plimT'lU's(,). 28 We can compare the IGMM estimator in (3.13) with the usual GM (3 SLS) estimator (3.14) 6 = [x.'(2;,1 8P,)x.]‘1x.'(2;,1 8P,)y. based on moment conditions E[¢t,(y: ,60)] = 0. We see that the only difference between the 3SLS estimator and the IGMM estimator is that 2;: and y. in (3.14) are replaced by 2" and [ya - (1G 18 U))t] respectively. It is interesting to notice that [Ye — (16 ® U)},] = [(y(,) — UA)’ ,:--,(y(G) - UA)‘ ]' is just a vector of residuals from the linear projection of ya) onto U. Thus the IGMM estimator 6 in (3.13) can also be regarded as a purged GMM (PGMM) estimator. We will now consider a specific form for a consistent estimate of 1.. Define A (3.15) 5. (h,',~--,5.G')' with it, =(T'1U'U)‘1T"U'(y(g) — X(g)6g), where 6g is any consistent estimate of 68, for g =1,..., G. Then (U'U)_1U'(Y(l) - x(1)é(l) ) (3.16) (1G 8 0))". = (1G 8 U) . (U.U)_1U.(Y(G) ’ X((i)9(o)) = (I. e was ®(U'U)"U')(ya — x6) :08 ®PUXY--Xté) where 6 = (6,',...,6G')'. Substituting the above expression into (3.13), we get (3.17) 6 =lx.'(2°‘ elem-8:12“ eerie—(Is ePa)(y.-x.é)l. 29 This expression still depends on 2“, and we will discuss its consistent estimation later. While (3.17) is defined for any consistent estimate 6, we may as well consider the special case that 6 = 6. Then (3.17) implies (3.18) [8'02“ mantle = xstzs many. -(I. ePUXy. - x611. Solving for 6 , we obtain (3.19) 6 =[x.'(2*=1 ®PZMU)X.]" x.'(2°i ®PZMU )y.. This is an obvious generalization of the single-equation IGMM estimator 6 of Chapter 2. In order to be more precise about the sense in which the IGMM estimator 6 dominates the usual GMM (3 SLS) estimator 6, and to introduce some other equally asymptotically eflicient estimators, we make some more explicit assumptions. To make the asymptotics as simple as possible, we will make the following "high level" assumptions: X0). . 1X ' (A3.l) phm-T— (2‘31 [x,,, xm, z e U] 8| _ U. A PAll A16 A12 A18 Alu 1 = AG] ' A66 A62 A68 AGu - Azl . A26 Au 0 0 exrsts. A81 ° A86 0 288 281.1 _Au1 ° AuG 0 2118 2‘ou 288 2w 0 (A32) All, 2,“, and 2 = [Z 2 :l are nonsmgular; Azg hasfull column rank for g =1, 2, ..., G. (A3 3) 1 (1 ®Z‘) 8‘ —>N(0 28A ) ' fl G+L ll. 9 22 ' As was the case in Chapter 2, these high-level assumptions are derivable from various sets of more basic assumptions. For example, let e, = (8,',u, ' )' and Q, = {z,;z,_1 ,e,_,;z,_2 ,e,_,; ...... } ; then (A3.1)-(A3.3) follow from the assumptions that E(e,|Q, ) = O, V(e,|Q,) = )2, and X, and z, are covariance stationary. It is well known that under (A3. 1)-(A3.3), the usual GMM (3 SLS) estimator 6 defined in (3.14) has the following asymptotic variance: " - 1 . -1 1—1 . —1 -l —1 (3.20) AV[Ji(e-0,)]=[phm¥x. (2,, ®PZ)X., = [A (2,, 8A,, )A] A where A = plim%(1G ®Z')X. = 21 Anti We now wish to show that several estimators are asymptotically equally efficient, and that they are efficient relative to the 3SLS estimator. One such estimator is the IGMM estimator 6 defined in (3.19). The other such estimator is the PGMM estimator 31 defined in (3.13) with A known. In order to distinguish the PGMM estimator from the IGMM estimator, we now denote the PGMM estimator by 6. We will also consider the following two additional estimators (3-21A) .9. = [Xa'Ole‘ ® PIMpzl)an_1 Xe'aw ® PlMuzl )Yt (3.2113) 6=[x.'(2°18M P M )X.]'1X.'(2“®M P M )y.. U Z U U Z U We will show that 6 , 6 , '6' and 6 are asymptotically equivalent, with asymptotic variance matrix equal to (3.22) [plim%X.'(£as ®PZ)X.]“ =[A'(2'1a ®A;,‘)A]‘1 a B“ . Comparing to the asymptotic variance matrix of 6 in (3.20) above, the fact that the matrix {[A'(2;,1 ®A;z‘ )A]" — [A'(2“ 8A,,1 )A]'1) is positive semidefinite (shown later in Theorem 3.3) establishes the asymptotic efficiency of 6 , 6 , .6. and 6 relative to 6. We now turn to a rigorous proof of these results. LEMMA 3.1: plim T‘1z'MUx,,, = phm'r‘1z'x,,, = A,,, for g = 1, 2, G. plimT'IZ'MUZ = plimT“1z'z = A,z Proof: The proof is similar to the proof of Lemma 2.1 of Chapter 2. For example, plim T"1z'M,,x,,, = plim[r-1z'x,,, -(r‘1z'U)(r"1U'U)'1(T‘1U'x,,,)1 = A, -O-2"A =A,,, uuug using (A3. 1) and (A32). LEMMA 3.2: plim T'1x,,,,' PZMUX“, = plimT’IXm' P[Mnle(8) 32 - -l l __ - -1 r _ phmT xm sz = A,,A;,1A,, (a) forh, g =1, 2, ..., G. Proof. The proof is essentially the same as the proof of Lemma 2.2 of Chapter 2. For example, plim T‘1x,,,, 'PZMUX,,, = plim[r‘1x(,,,'z](r“1z'2)‘1[T‘1z'MUx,,,] =A,,,A;1A,g using (A31), (A32) and Lemma 3.1. LEMMA 3.3: plim[T'1X.'(Z“ 8 P,MU )x.] = plim[r’1x.'(2i*= 8 Pmuz, )x.] = plim[T‘1X.'(Z“ 8M,P,MU)X.] = plim[r-1x.'(2“ 8P,)x.] = A12“ 8A;)A a B, and B isnonsingular. Proof: Let 288 =(611')G,G. Then phm[T'1x.'(2i‘ 8 PIMUZ, )X,] I 11 16 x0) 0 PIMUZI G PIMUZJ X0) 0 —-1 O C O =phmT Gl GG X(G). ‘3 PIMUZI ‘7 PIMUZJ X(G) x '611P x x '61GP x (1) 1Mu21 <1) (i) [Moll (G) O -1 O O C =phmT ' ° G1 I GG xtG)'° PlMuZlXU) X(G)O PlMule(G) 33 OllAlen-zlAzl GIGAlegAnG = E E 5 (usingLemma3.2) OGIAGzAgAzl CGGAGzAgAnG A,, 611A;,1 61%;,1 A,, AGz (rGlAg,1 060A; A,G =A’(Z“®A;)A using the definition of A in (3.20). The probability limits for the other cases involve essentially the same arguments. Finally, B = AXE“ ®A;z1)A is nonsingular bacause 2‘. , Em, and A2, are nonsingular, which implies 2“ ®A;,l nonsingular, and because A has fill] column rank (see (A3.2)). LEMMA 3.4: plim T'1x(,,,'P,MUe,,) = plimT"1X(h) 'P,MUZ,e,,, = plimT‘1x,,,,'MUP,M,,e,,) = phmT'Xm'P,(e,,, - Uh) = plimT‘1x,,,,'Pze,,, = 0 where h, g =1, 2, ..., G. Proof. The proof is similar to the proof of Lemma 2.3 of Chapter 2. For example, plim T“X(,,) 'MUPZMUe,,, = plim[T’1X(h)'MUZ](T"'Z'Z)" [T'1z'MUe,,,] = A,, ;,1-plimT‘1z'MUe,,, using Lemma 3.1. But plim T‘1z'MUe(,, = plim'r'1z'e,,) - plim('r‘1z'U)(T"1U'U)‘1[T‘1U'e,,, ] = 0—0-2,;,1,2,,,,, = 0 using (A3. 1) and (A32), where 2“,, is the g-th column of 2“,. Therefore 34 p lim T‘1x,,,,' MUPZMUe,,, = A,,,A;,1 -0 = 0. The proofs for the other cases are similar. LEMMA 3.5: plim[T’1X.'(X“ 8P,MU )e.) = plim[T-1x.'(>:“ ® P[M,,Z])891 = phm[r‘1x.'(2“ 8 MUPZMU)8.] = plim{T"1x.'(2“ 8 PZ )[e. -(1G 8U)7.]} = plim[r'1x.'(2;,1 8P,)e.] = 0 Proof. Let 2"" = (oi’l)GxG as above. Then plim[r'1x.'(2** 811%,, )e.] 11 16 xtn' 0 PIMUZI 0 PIMUZI 8(1) . -1 . . . . =phmT GI GG ‘ xtel' 5 PIMUZ1 0 PIMUZI 8(9) G 1 lg Zs=lx(1) 0' PiMUZ18(8) o -1 e =phmT - G I Gs Zs=lx(G) 1’ PIMUZISm G 1 ° 1 r EPIC gth'r x0) PIMUZIEm) G G3 ' -1 28=10 PMT X(9)'P1Mo213(s) 23:1618 '0 = 5 (using Lemma 3.4) 26 10,63 .0 g: 35 = 0. The proofs for the other cases are similar. THEOREM 3.1: The improved GMM estimators 6 , 6 , '6. and 6 are consistent. Proof: We will give the proof for 6. The other proofs are quite similar. plim6 = plim[x.'(2'as 8 P,M,,)X.]'1 x.'(2“ 8P,M,,)y. = plim[x.'(2“ 8P,M,,)X.]'1x.'(21° 8P,M,,)(x..6O +e.) = e, +[plimx.'(2' ®PZMU)X.]'1[PlimX.'(2“ 8P,M,,)e.] = 0, + B" -0 (using Lemmas 3.3 and 3.5) = 0,. LEMMA 3.6: T190, ®Z'MU)8. -> N[O, (2...)-1 8A,] where 2“ = (2,, - 2,2;2mr1. Proof: T‘1’1z'MUe,,, = T“1’1z'e,,, - (T-1’1z'U)(T‘1U' U)“ (T-1U'e,,, ) = T-1’1Ze,,, - (T'1’1z'U)h, +o,(1) = T""2Z'(8(g) — UA8)+op(l) for g =1, 2, ..., G, where A, = 2,1,1, -E(u,e,g). Then T—1’2z' MUS“) T‘1’1(IG ®Z'MU)8. = I r'1’1z'MUe,G, T'mZ'(8(,) — U711) = 5 +0p (1) 36 = "1‘1”(1, 8z')[e. —(1G ®UE,‘,,1,)vec(2,,8)]+op(l) (using the definition of A8 = 2,1,1, -E(u,8,,,)) = T'1’1 (1G 8 Z')[vec(8) -vec(U2;,§2,,, )] + op (l) = T"1’2(1, 8 Z')vec[8 - U2;,1,2,,,]+ op (1) = T'mvec[Z'(8 — U2;,1,2,, )] +op (1) = T"”2vec{Z'(8,U)[_21_?z ]} + 0p (1) 11111.18 = T1” (11.. ,—2..£;.1.lein )veo12'(e.U)l +o, (1) =T‘1”([Ie.—£...2‘11®IM)(IG..el)veo(e.U) +o,,(1) = (11. ,—2..2;.11®IM ){T‘1’1tiou e Z')[:‘ ]} # +op(1). 8e But according to assumption (A33) above, T‘”2 (IG+L 81 Z')[ ]-> N(O, 2 ®A,z ). 6 Therefore T‘1’2(IG ® Z'M,J )8. —> N(O, W) w=(11..,—2..2;.11®Iu)(2®A..)([_2{?2 191M) =03... -E.a2.12a.)®Azz = (2"1'1)‘l 8A,, LEMMA 3.7: T‘1’1x.'(2“ 8 P,M,,)e., T‘1’2X.'(2“ 8 11mm»... T’mX.'(Z“ <8MUPZMU )8. and T'mX.'(£“ ® PZ )[8. —(IG ® U)A] each converge in distribution to l~l[o,A'(2as 8A;,1)A] with A = diag(A,,, ...... , A,,,) as defined in (3.20). 37 Proof. We give the proof for T‘WX.'(E“ ® MUPZMU )8.. The proofs for the other cases are quite similar. T'1’1x.'(21‘ 8 MUPZMU )e. X(,)' 611MUPZMU 61‘1MUPZMU em =T'"2 ' E I E E X(G)' OGIMUpzMU “° OGGMUpzMU 8(G) G 1 ___ T—1/2 E G r G Zs=1X(G) 5 BMUPZM06® 1g -l/2 3:10" T x,,)'MU P ,Mue,,, G -1/2 86-,6 8T x,G,'MUP,MUe,,, 11-,6 68,,(T‘1xm,'M Z)(T"Z'Z)"[T"’ZZ'MUS(8)] lg —1 -1/2 +o,,(1) (using Lemma 3.1) [:z§-,618(T*1x,,,'MUZ)(T‘1z'Z)‘1[T‘1’12'Mue,,,] L: 21-, 11-,6GBAG,A;,1 T'1’1z'MUe,,) ‘Alz’qul (2“ 81M )[T‘1’1(lG 8z'MU)e.]+op (1) AGzA-z-zi = A'(1G 8A,,1 (2“ ®IM)[T“’2(IG 8z'MU)e.]+o,(1) = A'(2°‘ ®A;,‘)[T‘"2(IG ®Z'MU)8..]+0,,(1). 38 But according to Lemma 3.6: T‘1/2(IG ®Z'MU)8. —> N[O, (2111)”1 8A,,]. Then T‘1’2x.'(2“ ®MUPZMU)8. -) N[O, V] where v = A12“ 8A; )[(2* )-1 8A,,1[A'(2i‘ 8A; )]'= A'(2“ @AZ )A which is the same as the matrix B defined ill (3.22) above. THEOREM 3.2: JT(6 —60), JT(6 —60), «fl—"(6660) and JT(6 —60) each converge in distribution to N(O, 13-1) with B = A12“ 8A;)A = plimT’IX.'(2“ ®PZ)X.. Proof: We will give the proof for JT (6 — 60). The proofs for the other cases are essentially identical. JT(6 -e,)=[T‘1x.'(2°‘ 8 PZMU )x.]‘l T‘1’1x.'(2* 8 PZMU )e. = B‘1-T‘1’2X.'(2“ 8 P,M,,)e. +op (1) using Lemma 3.3. But according to Lemma 3.7: T‘1’2X.'(2“ 8 PZMU )e. —> N(O, 13). Therefore JT(6—00)-)N(O, A) A = B“B(B’1)'=(B")'= B“. THEOREM 3.3: The improved GMM estimators: 6 , 6 , .6. and 6 are asymptotically efficient relative to the 3 SLS estimator 6. They are strictly more efficient than the 3 SLS estimator if 2,, at 0. Proof: From Theorem 3.2, the asymptotic variance matrix of each of the IGMM estimators is B“1 = [A'(2“ ®A;zl )A]'1. The asymptotic variance matrix of the 3SLS estimator is Q’1 = [A'(§.‘.,;,,1 (8 A; )A]'l. We wish to show that (Q'1 — B") is positive 39 semidefinite (psd), and positive definite (pd) when 2,, at O. This is equivalent to showing that (B - Q) ispsd, and pd when 2,, at 0. But B—Q=A'[(E“ — 2;,1 )8A;,1]A. When 2,, = 0, 25‘ = 2;: and B - Q = 0; there is no efficiency gain for IGMM relative to 3SLS. But when 2,, at 0, 2“ - 2;: ispd;A;z1 ispd; this implies that (22';8 — 23;,,‘)®A”ul is pd; and B - Q is pd because (2“ - 2;)8A; is pd and A has full cohlmn rank. Theorem 3.3 is best understood by the intuitive explanation of the following theorem THEOREM 3.4: Consider the augmented system (3.23) y,g = x,g'60g +u,'Ag +v,g, g =1, 2, ..., G; t = l, 2, ..., T, where A, = [E(u,u,')]’l E(u,8,g) is the linear projection coefficient of 8,1, onto u,, and v,g = 8,8 - u, ' A3 is the error in the linear projection. Define 6m as the estimator of 60 = (001',...,6,,G')' when the system (3.23) is estimated by 3SLS, using (z,',u,')' as instruments. Then 6 m is numerically the same as the IGMM estimator 6. defined in (3.21A) above. Proof: Let v, = (v,,,...,v,G )'. Then 811 "ut'll ;‘l' E(Stlut ')[E(ut.ut )1" v,= 5 =8,- 5 u,=8,- : . u, 816 ’ut.7‘G AG. E(StGut')[E(ut 'ut ”—1 40 = e, — 2,,2;,,1,u,. Therefore V(v,)=2,,-2 2‘12 =(2i‘)-1. “W118 As before, equation (3.23) can be rewritten as 9 (3.24) y. = x.e, +(lG 8 U)A+v. = [x.,(IG 8U)][ flaw. , where A = (A,',...,AG')' and V(v.) = (2%)-1 ®IT. Applying 3SLS formula to (3.24) using (Z, U) as instruments, we obtain: é3SLS _ x‘. as —l (3.25) [ism] - {[06 My}: eiiz,u,)lx.,as 9U») x.' ,, . (IG®U)' (2 @Plzmblc. Because P1211] = PU +PIMUZ] and PIZJJIU = U, (3.25) becomes (2“®U')X. X“®U'U Aasrs (3.25) [6m ] = [x4281 8(PU +P,,,UZ, )]x. x.'(2“ 8U)]‘ .1 X512“ ®(PU +P[M,,z])]yti (2“ 8U')y. ' Using the partitioned inverse rule, we get (3.26) 6,8,, = E"~X..'[21"®(PU +P,MUZ, )]y. — E‘1BD‘1-(2i‘ 8U')y. 41 where B = x.'(2as 80), D = 2‘11 ®U'U, and E = A—BD’IC with A = x42“ 8(PU +P,MU,,)]X. and c = (2“ 8U')x.. But (3.27) E = A — BD'IC = x.'[2“ 8(PU +PIMUZ] )]x. -x.'(2as ®U)(£“ 8U'U)‘1(2i‘ 8U')x. = X362“ ®(Pn +P[M,,Z] )1-(2“ ®PU)}X* = x.'(2“ ®1’[MUZ,)X. , and (3.28) BD"1 = X.'(2“ 8U)(2i‘ ®U'U)“ = x..'[IG ®U(U'U)“]. Substituting (3.27) and (3.28) into (3.26), we obtain: (3.30) 6,8,, = [x.'(2“ 811W, )x.1’1 -{x.'[2“ ®(Pn +P[MUZ] )]y. - Xe'llo ®U(U'U)'l 1(2"8 69 U')ya} = [x.'(2“ ®P[M,,z] )x.]‘1x.'{[2‘*‘ ®(Pn +P[M,,z] )1-(2“ ® PU)}Y* = [x.'(2'i's 811W, )x.]“1 x.'(2“ 811mg)» =6. Equation (3.23) is instructive because, speaking loosely, the effect of adding the variable 11, is to reduce the relevant variance of 8, from V(a,) = 2,, to V(8, lu, ) = 28,, - 2 2“2 Obviously, this result is a direct extension of the similar 8111111118. result of Schmidt (1986), and is also closely related to the result of Wooldridge (1993). 42 2 2 Our discussion so far has assumed that the covariance matrix 2 = [2m 8"] is 118 known. Since we generally do not know 2 , the estimators 6 , 6 , .6. and 6 are infeasible. However we can define feasible versions of them, say (3.31A) 6,. = [x.‘(2'as 8 P,M,,)x..]‘1x.'(2as ®PZMU)y. (3.318) 6,. = [x.'(2“ 8P,)x.]‘1x.'(2as 8P,)[y. —(1G 8 U)A] (3.31C) '6'F =[x.'(21*= 811MHz, )x.]'1 x.'(2"=a 811mm)» (3.31D) 6, = [)(.'(2118 8MUPZMU)X.]‘1X.'(2“ 8MUPZMU)y. where E“ and A are consistent estimates of 2‘.“ and A respectively. Specifically, 2“ = (E - 28,232,, )‘1 and A = vec(EEEm) with 2,, = T‘lé'é, 2,,'= 28,, = T'E'U and 2,, = T-1U'U, where 6 = (€,,---,€,)'= (y, — x,6, yT —x,6)' with 6 being any consistent estimate of 60; for example, the usual ZSLS estimator. Then it is not difficult to show that 61., 61., 6F and 6F have the same asymptotic distribution as 6 , 6, '6' and 6. 3.4. Concluding Remarks This chapter provided some improved versions of 3SLS, and extended the results in Chapter 2 on improved versions of IV (ZSLS). The improved 3SLS estimators differ fiom the usual 3SLS estimator in two ways. The first difl‘erence is that the projection matrix PZ in 3SLS is replaced by PZMU, MUPZMU, or PIMUZI’ so that the 3SLS "fitted values" are constructed differently. For example, '6' uses MUZ, the part of Z orthogonal to U, as the regressors in the "first stage" regressions, whereas the 3SLS estimator 6 just uses Z. This is exactly the same as the difference between the improved and ordinary 43 2SLS estimators in Chapter 2. However, there is a second difference between the usual and improved 3SLS estimators that did not arise in the case of 2SLS. Where 3SLS estimator uses 2;, the improved 3SLS estimator uses 2“ = (2,, - 28,232,, )’1. Thus 3SLS estimator uses the inverse of V(8, ), while the improved 3SLS estimtors use the inverse of V(8, lu, ), in the final "stage" of estimation. CHAPTER4 INIPROVED GMM ESTIMATORS FOR SYSTEM OF NONLINEAR EQUATIONS 4.1. Introduction In this chapter we will extend the results of Chapter 3 to the case of a system of nonlinear equations. The structure of the chapter is as follows. In the next section, we will first define several improved GMM (IGMM) estimators, and then show that these IGMM estimators are asymptotically equally eflicient and eflicient relative to the usual GMM estimator. The final section gives some concluding remarks. 4.2. Improved GMM Estimators The model considered in this chaper is (4.1) f(y:8,60g) = 8,8, g =1, 2, ..., G; t =1, 2, ..., T, with y; = (y,g,Y,g',x,8' )', where yt8 is the dependent variable of equation g at observation t, Ytg is the t"1 observation on the Mg x 1 vector of other endogenous variables included in equation g, xt8 is the N8 x 1 vector of exogenous variables of equation g at observation t, 608 is the K8 x1 unknown parameter vector of equation g, and at, is the model disturbance of equation g at observation t. 44 45 Suppose that we have available an M x 1 vector of instruments z, satisfying the moment conditions E[(lG 8 z, )(e,,,...,etG )'1 = 0, and E[z,af(y:,,6,,)/60,'] has full column rank, g = 1, 2, ..., G. Then the usual GMM estimation of(60,',...,606')' ill (4.1) is feasible. However, suppose that we also have available an L x1 vector of observable variables 11, satisfying E[(IL ® 2, )u,] = O and E(u,8,') = 2us at 0. Then the additional moment conditions E[(IL ® z, )u,] = 0 will help us to improve the efficiency of the estimation of(60,',...,606')'. We define the following notation: 91 (42A) 6: 5 ; 0G Yil “3’11 :91) 311 (4.28) y,‘= E , f(y,‘,e)= E , e, = 5 , t= l, ...,T; 311.6 “Yio 966 1' 8to YIg f(YIgaeg) 813 (4.2C) y(8) = E , f(8)(y(g),68)= 5 , 8(8): 5 , g= 1, ..., G; YTg f(YTgreg) 8T3 ' 8: , U: : , Z: 3 ST' uT' zT' f‘(1)(Y(ml)o91) 8(1) (4.2D) f.(6)= E , u..=vec(U), 86=V60(8)= E ; f(G) ()’(o) :96) 8(o) 8f ‘ ,0 (4.2E) l),(8,) = “11%” 11), l, ., G; 608' D1(91) (4.2F) D(6) = , 06(96) (4.2G) P,( = X(X'X)“ x', M,( = I— Px for any matrix x with full column rank. Then (4.1) can be rewritten as (43A) f(y.’,90) = 8., t = l, 2, T, 01‘ (4.38) f,,,(y;,,,8,,) = 8(8), g = 1, 2, G. Using the notation of Chapter 2, we have . ¢1(Y:’e) (IG®zt)f(Y:’e) 4.4 ,,e = , = , ( ) My ) [4.0.11 [ (1.92mi. 1 Then _ 96(9) ___1_T . _ T"(IG®Z')f.(6) (4-5) ¢r(9)-[ 4m :I-Tt>=21¢(yt,9)-[ T_,(IL®Z,)u‘ ] Therefore (46A) c= C“ C” =1im1T-Bo (9)1» (9 )'1 . C21 C22 T"°° T 0 T O 47 T—eoo - Hm E T“(Io ®Z‘)etet'(lo <8 Z) T‘1(IG 82')e..u..'(1L 82) T_1(IL ®Z')u.8.'(lG ® Z) T‘1(IL ® Z')u,,u,,'(1L ® 2) _ acute) = 1 6MB) _ 1 (4.68) 1),, _ 69, $0,, 82')—-—(367 - TOG ®Z')D(6). According to the general results of Chapter 2, we know that the augmented GMM (AGMM) estimator (6 ) using moment conditions E{(IG+L <8 2, )[f(y: ,60)' ,u,']'} = 0 and weighting matrix C“ is numerically equivalent to the IGMM estimator (6 ) using the moment conditions E[(IG ® 2, )f(y,' ,60 ) - CuC; (IL ® 2, )u,] = 0 and weighting matrix 611 = (6,, — 6,,6;;6,, )'1, since both 6 and 6 satisfy the same first order condition (2.7A) ill Chapter 2. Under suitable regularity conditions, standard GMM results also tell us that both 6 and 6 are consistent. For discussions of regularity conditions, see, e. g., Hansen (1982), Gallant and White (1988) or Amemiya (1985). Substituting (4.5) and (4.6) into the first order condition (2.7A) in Chapter 2 for the IGMM estimator 6 , and replacing C“, C12 and C3; by consistent estimates C”, 612 and 6;,1 respectively, we arrive at: (4.7) 8(6)'(1G 8 2)611[(lG 8 Z')f.(6) — 6,,6;,1(1L 8 z')u.] = 0. In order to simplify (4.7) further, we need to put more structure on C (or C). To do so, and to allow us to investigate the asymptotic properties of our estimators, we make the following "high-level" assumptions: (A4. 1) (A42) (A43) (4.8A) 48 PD1(901)' l D 6 ' phm; 6‘21“) [Drteah Date...) 2 e U] 8| _ U. .1 PAH A16 A12 A18 Aluw = AG] A66 A62 A68 AGu ' All . A29 Au 0 0 670818. A81 ' A86 0 2% 2w __Aul AuG 0 2us: zuuj Z 2 A2,, 2,,“ and 2 = [2“ 2“] are nonsingular; AZ, has full column rank 118 1111 for g =1, 2, ..., G. 1 . ‘/—T-(lG+L ®Z‘)[:] —> N(O, 28A,,). C= 2®E(z,z,')= 2®AE Note that (A43), and hence (4.8A), implicitly reflect an assumption of no conditional heteroskedasticity. Furthermore, (4.88) 6 6,, 6,, 2,,8T'1z'z 2,,8T‘12'2 49 is a consistent estimate of C, where i is any consistent estimate of 2. For the moment we will treat 2‘. as known, for simplicity. Therefore we have (4.9A) (“3“ = (Cl, —é,,é;§é,, )“ = z68 ®(T"‘Z'Z)“ with )3“ = (2,, - 28,23,sz (4.9B) 6126;; = 2&2; 09111- Substituting (4.9A) and (4.9B) into (4.7), we get (4.10) D(é)'(1G ®Z)[E“ ®(T“Z'Z)“]~ {(16 @2')f.(é)—(2w2;3, «81,, )(IL ®Z')u.] = o. Noticing that (4.11) (2&2; 49 1M )(IL @2011. = (2&2; ®Z')vec(U) = vec(Z' US$21“) =(1G <8 z'U)vec(>:;3,2u,) = (IG ®Z‘)(IG ®U)vec(2;},2,,), and substituting this expression into (4.10), we get (4.12) D(é)'(1G azure <2a>(z'2)"1 1(1G @Z')[f.(é) —(1G 49 um = 0 or (4.13) D(5)'(2“ ®PZ)[f.(é)—(IG ®U)A]=O where A = vec(BfiZm) a (Al',kz',---,AG')'. Thus, fori= l, 2, ..., G, Xi = 2;; times the i‘h column of 2“,; equivalently, x, = (plimT’lU' U)’1p1imT'1U'em. 50 It is well known that the first order condition for the usual GMM estimator ((3) using moment conditions E[¢1(y: .90 )] = 13((1G a 2, my: ,eo)] = o and weighting matrix 2;: is (4.14) 1)(é)'(2:;,1 a Pz)f.(é) = 0. Comparing (4.14) with (4.13), we see that the only difi‘erence between the first order condition of the usual GMM estimator and the first order condition of the IGMM estimator is that E: and f.(9) in (4.14) are replaced by E“ and [f.(9)- (IG ®U))t] respectively. It is interesting to notice that [f. (90) - (IG ® UM] = [(r(1,(y;,, ,eo, ) - mt),- - -,(f(G) (yzg, .90G ) - van is just a vector of residuals from the linear projection of fm)(y:g) ,Oog) onto U. Thus the IGMM estimator é implicitly determined by (4. 13) can also be regarded as a purged GMM (PGMM) estimator, as in Chapter 3. Under assumptions (A4.1)-(A4.3), it is not difficult to show that the usual GMM estimator é implicitly determined by (4. 14) has the following asymptotic variance: (4.15) Av1fi(é-eo)1 =1pM%D(eo)'(2: ®Pz mm. )1" = 14112;: ®A2>A1“ where AZ] (4.16) A = phm—;(IG ez')1)(90) = Ad} has full column rank because of assumption (A4.2). We now wish to consider the IGMM estimator 6 and the following three more estimators 51 (4.17A) ’6' = atgmin, {f.(0)'(2“ a qMUZ,)f.(e)} (4.1713) 6 = argmine{f.(9)'(2“ @MUPZMU)f.(9)} (417(3) 6' = otgmine {f.(9)'(2“ ®PZMU)f.(O)}. We will show that these estimators are asymptotically equally eficient, with asymptotic variance matrix equal to (4.13) [plim%D(90)'(Z“ tsp, )D(90)]“ = [A'()3"' 69A; )A]" e B". We will presume that the estimators are consistent, because each can be written as a GM estimator that exploits valid moment conditions. Comparing (4.18) with the asymptotic variance matrix of 6 in (4.15) above, the fact that the matrix {[A'(z;,1 em;l )A]" — [A'(2“ eAg )A]“} is positive semidefinite (shown later in Theorem 4.2) establishes the asymptotic eficiency of 6 , '9', 5 and 6 relative to 5. We now turn to a proof of these results. LEMMA 4.1: plimT’IZ'MUZ= phmT“Z'Z=A,,; plimr‘12'MU1),(90i ) = plimr’12'D,(90i ) = A,, for i = 1, 2, G. Proof. The proof is similar to the proof of Lemma 3.1 of Chapter 3. For example, plimT’IZ'MUDi (90,) = pfim[T-IZ'D1(90i )-(T"Z'U)(T"'U'U)"(T"1U'Di(9ot ))1 = A, - 0.2;},Au, = A”, where the second equality is implied by (A4.]). 52 LEMMA 4.2: plim T’lDi (90i )'I’IMUmDJ- (90].) = plim T“D,(9Oi )'MUPZMUDJ-(Ooj ) = pMT—lDi(901 )'PZMUDj(90j) = pMT—lDi(90i)'PZDj(90j) : AizA—ulAq' for i,j=1, 2, ..., G. Proof: We will give the proof for plimT'lDi(90i )'MUPZMUDjwoj) = AizA ;Azj. The proofs for the other cases are quite similar P lim T-lDi (9m )' MUPZMUDj (901') = pfimlT—1D1(901)'MUZ](T—1Z.Z)-l[T—IZ.MUDj(90j )1 = AgAgA, using (A4.1), (A42) and Lemma 4.1. LEMMA 4.3: plim[T”lD(90)'(Z‘8 ® EMUZ,)D(60)] = pIimIT"D(Go)'(E“ ® MonMo)D(Oo)] = pIimIT"D(Go)'(2“ ® PzMU)D(90)] = plimlT“D(90)'(Z“ 8’ Pz)D(Bo)1 = A12“ ®A;z‘)A a B, and B is nonsingular. Proof: Let 2” = (oii)G,(,. Then P limITIDwo )'(Zw ® P[MUZ] )D(90 )] D1 (901 )' GIIP[MUZ] ' ‘ ' GIGP[MUZ] = plimT-l '. : : Do (906 )' O'Glpmuzl ' ° ’ OGGP[MUZ] 53 Dt(9m) Do(9oo) D1(901)'0“P|MUZJD1(901) D1(901).61GP[MUZ]DG(BOG) =phmT-1 s s s Dowoo )'OmleUlei (901) Do (900 )'GGGPlMUZJDG (906) 011A12A2A21 GIGAlegAi} = 5 E E (usingLemma4.2) OGlAGzAgAzl GGGAGzAZAzG Alz oI‘A; o‘GA; AZ, AGZ 061A; "' 066A; ’ Ad} A12“ ®A;)A a B. using the definition of A in (4.16). The probability limits for the other cases involve essentially the same arguments. Finally, 13 = AXE“ (SA; )A is nonsingular bacause Z , Em, and Azz are nonsingular, which implies 2‘.“ ®A;z1 nonsingular, and because A has full cohrmn rank (see (A4.2)). LEMMA 4.4: plimr"(1G ®Z'MU)D(90) = plimT“ (1G @Z')D(00) = A. Proof: plimT‘1(IG ®Z'MU )D(90) 54 Z'MU D1(901) = plimT’1 . Z'Mu Do (900) plimT—IZ'MUD1(901) plim T—IZ'MUDG (90G) plimT"Z'Dt(9ot) = (using Lemma 4.1) plimT'IZ'DG (GOG) 21 = (using (A4.1)) A 26 using the definition of A in (4.16). LEMMA 4.5: T'1/2(IG ®Z')[e. —(1G e um and T‘1’2(IG ® z'MU )e. each converge in distribution to N[O, (2“ )-1 ®Au] with 2“ = (2,, — zwzggzue)". Proof. See the proof of Lemma 3.6 in Chapter 3. THEOREM 3.1: JT(§—90)r flay—90), «E(é—Bo) and JT(é—90) each convergein distribution to N(O,B‘1), where B = A'(2‘.“ am; )A with A = ding(A,, ,...,A,G) as defined in (4.16). 55 Proof: We will give the proof for $11.6. - 90 ). The proofs for the other cases are essentially identical. Using the definition in (4. 17A), we know that '9' satisfies the following first order condition: (4.19) 1)('é')'(2“ e qMUZ,)f.('é') = 0. Using the first order Taylor expansion formula, we have (4.20) r('é') = f(90 ) +D(e‘ )(‘é’ -90) = e. +D(e‘ )(‘é’-90) where 9' is between 60 and ‘9‘. Substituting (4.20) into (4.19), we get (4.21) D('é')(z“ e PIMUZ] )D(9‘)(é‘-90)+D(é')'(2“ a PM,] )e. = o. (4.21) is equivalent to (4.22) T1/2('é'- 90) = —1T-ID(6')'(2“ ® IiM.n>D(e')1-‘ -T-“D('é')'(2“ ®P1Mo21)8'- Because .9. is consistent, (4.22) can be rewritten as (4-23) T"2 (ii—Go) = -[p1imT"D(eo)'(>:“ e PIMUZ1)D(90)]—l - .T-1/2D(9o “2% ® PIMUZ] )e, +0p (1) = —B—1 .'1“—1/21)(9o “Zea <8 PIMUZI )3, +0p (1) using Lemma 4.3. But (4.24) T‘V’me0 )'(>:88 a PIMUZ, )e. 56 = r“‘”D(eo)'[2“ ®MUZ(Z'MUZ)‘1Z'MU]8. = T"”D(90)'(IG ca MUZ)[Z“ <22>(Z'MUZ)“](IG ®z'MU)c. =[T‘1D(90)'(IG ®MUZ)][2“ ®(T"Z'MUZ)“]- ~T‘1’2(IG ® z'MU )e. = A12“ 49A; )-T'“2(IG ®Z'MU)8. +op(1) using Lemmas 4.4 and 4.1. Substituting (4.24) into (4.23), we arrive at (4.25) T"2('é'-eo) = —B“A'(E“ ®A;ZI)-T‘”2(IG ®Z'MU)8. +01, (1) —+ N(0,W) using Lemma 4.5, where (4.26) w = B“A'(Z§“ ®A;Z')-[(2“ )"1 ®An]-[B"A'(2“ @A; )]' = 13“A'(2m ®A;)-[(£“)“ ®Au]-(£“ @A; )AB" = 13‘1 -A'(Z“ ®A;z‘)A-B"‘ = B’1 ~B-B'1 (using the definition ofB in (4.18)) = B“. THEOREM 4.2: The improved GMM estimators: é , '9', 5 and 6 are asymptotically emcient relative to the usual GMM estimator 6. They are strictly more efiicient than the usual GMM estimator (3 if 2“,, at 0. Proof: See the proof of Theorem 3.3 in Chapter 3. 57 2 2 Our discussion so far has assumed that the covariance matrix 2 = [£68 8"] is ‘18 Z uu known. Since we generally do not know 2‘. , the estimators é , '6', 5 and ii are infeasible. However we can define feasible versions of them, say (4.27A) 6,. = argmin, {[f.(9)— (1G a U)X]'(i“ <8 1),)[f..(9)-(1G e Um} (4.2713) '6', = argmin, {f..(e)'(2as a PlMUZ] mm» (4.270) 6, = urgmine {f.(9)'(i“ ® MUPZMU )f.(e)} (4.27D) 6,. = urgmine {f.(9)'(2“ ® PZMU )f.(e)} where if.“ and it are consistent estimates of 25‘ and A respectively. Specifically, 2“ = (2,, — 2,52,,12,)-l and i: vec(2;3,2,,) with 2,, = r—‘e'e, 2,: 2,, = r—‘é'u and 2,, = T‘lU'U, where E = (a, ,---,éT )'= [f(y;,é), my; ,é)]' withé being any consistent estimate of 60; for example, the usual GMM estimator. Then it is not difficult to show that 61;, 6}, 6r and 6,, have the same asymptotic distribution as (3 , '6', 5 and 5. 4.3. Concluding Remarks In this chapter, we have generalized the results of the previous chapter for 3SLS- type estimators to the case of a nonlinear model We have simplified the analysis by making high-level assumptions, and by not giving a rigorous proof of consistency. Given these simplifications, the extension from linear to nonlinear BSLS is fairly straightforward. As in the linear case, the improved estimators use a difiermt projection matrix than the usual nonlinear 3SLS (e. g. the projection onto MUZ instead of onto Z), and they use the inverse of V(st |u,) instead of V(e,) in weighting equations. CHAPTER 5 THE ASYMPTOTIC EQUIVALENCE BETWEEN THE ITERATED INIPROVED ZSLS ESTMATOR AND THE 3SLS ESTINIATOR 5.1. Introduction In Chapters 2, 3 and 4, we showed that we can improve on the usual IV (ZSLS) and 3SLS estimators provided that we have available an extra vector of observable variables 111 which is uncorrelated with the instruments and correlated with the disturbances of the model being estimated. In this chapter, we will extend the improved IV (ZSLS) idea to the case when the extra information ut is consistenthi estimated instead of observed. Because the asymptotic distribution of the estimated 111 depends on the the model structure from which it is estimated, we choose to consider our extension in the context of a system of linear equations. It is well known that the only diflerence between equation-by-equation ZSLS and BSLS is that the 3SLS estimator utilizes information about the relationships among the disturbances of difi‘erent equations, but 2SLS does not. Then a natural question one wants to ask is whether it is possible for us, on one hand, to still keep the simplicity of the equation-by—equation 2SLS estimator; on the other hand, to also utilize the information contained in the covariance structure of the model disturbances, such that the new equation-by-equation estimator has the same asymptotic efficiency as the BSLS estimator applied to the entire equation system Telser (1964) has addressed essentially the same question in the context of a seemingly uncorrelated equation (SURE) system: 58 59 (5.1) ytg = x,,'eo, +e,,, g = 1, 2, G; t = 1, 2, T, where ytg is the dependent variable of equation g at observation t, xtg is the K8 x1 vector of explanatory variables of equation g at observation t, at, is the model disturbance of equation g at observation t, 6,8 is the K8 x1 unknown parameter vector of equation g, and xt8 is strictly exogenous. Telser proved that the iterated LS estimators of the augmented equations (5.2) y, = xtg'eo, +e,(,)'7.o, +v,,, g = 1, 2, G; t = 1, 2, T, where at“) = (8,1,...,a,,8_1,a,_g,l,...,etG)', log = [E(e,(g)s,(g)')]‘1E(s,(g)atg) and vtg = at8 - “9'20? are asymptotically as efficient as the SURE estimator of the system (5.1). It turns out that the analogous result is also true for a simultaneous equation system The rest of the chapter is organized as follows. Section 5.2 defines the improved ZSLS (IZSLS) estimator. Section 5.3 describes the iterated IZSLS estimators. Section 5.4 proves that the iterated IZSLS estimators converge to a limit, and that their limit is asymptotically as eflicient as the 3SLS estimator. 5.2. Improved ZSIS Estimator The model considered in this chapter is (5.3) ytg = x,'e,, +stg, g = l, 2, ..., G; t = l, 2, ..., T, 60 where ytg is the dependent variable of equation g at observation t, xtg is the K8 x1 vector of explanatory variables of equation g at observation t, 8,8 is the model disturbance of equation g at observation t, and 903 is the Kg x1 unknown parameter vector of equation g. We assume that in general cov(x,g,etg) at O for all g. Suppose that we have available an M x1 vector ofinstruments zt satisfying E(ztstg) = 0, with E(ztxtg') having fiJll column rank for g = l, 2, ..., G. We define the following notation: YIg xlg 813 zl (5.4A) y,= : ,Xg= : ,88= : ,Z= : ; YTg ng' 8T3 ZT. 91 (5.4B) 9 = E , a = [81,...,EG], a. = vec(e); 9o (5.4C) 8“,) = (8,1,...,et’g_1,e,,g+1,...,e,G)'; 31(3). (5.4D) 8(8) = : =(81,...,88_] 388+1’”"SG); 8m). (5.4E) PZ = Z(Z'Z)“Z‘ provided that 22 is nonsingular; (SAP) )2, = sz8 for g = 1, 2, G; (5.4G) L(h|W) = W20 = the linear projection of h onto W, with 20 = [E(W'W)]'1E(W'h) defined for any s x 1 vector h and any sx m matrix W as long as E(W‘W) is nonsingular. 61 It is important to note the distinction between a, (T observations for the error of equation g) and 8(8) (T observations for the errors of all the equations except equation g). With this notation, (5.3) can be rewritten as (5.5) yg = x8908 +88, g :1, 2, ..., G. We make the following "high level" assumptions. (A5.1) (As.3) plim— Xl' [X1,...,XG ,Z,81,...,SG] A16 A12 Axe.“ AXEJG AGG AGz Axe,Gl Axa,GG A2G A22 0 0 exists. AflJG 0 511 0' lG Asx,GG 0 CG] GGG j T'”2 (1G ez')e. —+ N(o,2eA,,). 62 (ASA) T‘1/28(i)'vi —) N(O,Qi) With Vi = 8i - L(8i 8(3)) for i = 1, 2, ..., G. 0'“ Ola—1 51,1“ 016 q._ 0-._ ._ o’._. c._ . Define 2,, = ‘ I" ' I" l ‘ 1"“ ‘ LG 1= 1, 2, ..., G. 01+” ' ” o'i+1.i—1 °i+1,i +1 "' 0i+1,G _ CG] oGJ—l oG,i+1 6G6 _ Assumption (A5.2) implies that 2n is nonsingular for all i These high-level assumptions are derivable from various sets of more basic assumptions. For example, in a time series context, define et = (8,1,...,e,G)' and ‘I’t = {z,;z,_1,e,_l;z,_2 ,e,_2 ;...}. Then (A5.1)-(A5.4) follow from the assumptions that E(e,|‘l’,) = 0, V(e,|‘l’,) = E, (xu',...,x,G',z,')' is covariance stationary, and the fourth moments of et exist. We now define our IZSLS estimator as the IV estimator of the augmented equation (5.6) yg = X890g +s(g)kog +vg, g = l, 2, ..., G, using (Eyew) as instruments, where (5.7A) 103 =[E(8(8)18(8))]‘1E(8(8)'88)_=_(kogl,n,,)\,og’g_l,xog,g+1,o.., 1086 )1 (5.7B) v8 = a, — L(ag|e(g)) = as -—s(g))tog. 63 If we knew 8(8), then as pointed out in Chapter 3, adding 8(8) to (5.6) reduces the variance of the model disturbance in (5.6) from V(sg) to V(88|8(8) ), and IV applied to (5.6) would be more eficient than IV applied to (5.3). However, the IV estimation of (5.6) is infeasible because we do not observe 8“,). In the next section we will define the iterated feasible IZSLS estimator. 5.3. Iterated IZSLS Estimator Our iterated feasrble IZSLS estimator of 90 = (601',...,OOG’)' in (5.3) is defined as follows. TKO—mill): Apply the usual IV (ZSLS) estimation method to (5.3) equation by equation, using Z as instruments, to get the initial consistent estimate 6(0) = [6A)1(0)',...,éG (0)']' of 90. Then we estimate the model disturbance a, by (5.8) e,(0) = y, -x,é,(0) for g =1, 2, ..., G. Round 1: For equation 1, apply the usual IV estimation method to (5-9) Y1 = X1901+8(1)(1)}~01+V1(1) using pt, ,8(,) (1)] as instruments, where 2.0, is defined in (5.7A), and (5-10A) 3(1)“) = [32(0)a---a50(0)] (5-103) V10) = Y1 ‘ X1901— 3(1)(1)7~01 = 81" 8(1)(l))~01~ 64 Denote the IV estimate of (5.9) by [61(1)',il(1)']'. We then update the estimate of a, by (5.11) e,(1)=y,—x,é,(1). For myation 2: Apply the usual IV estimation method to (5-12) Y2 = X2902 +3(2)(1)}~02 +V2(1) using [222 ,e(2)(1)] as instruments, where Am is defined in (5.7A), and (5-13A) 8(2)“) = [81(l)a33(0)a---aeo(0)] (5.13B) v2(1)= 82 —8(2)(1))toz. Denote the resulting IV estimate of (5. 12) by [62(1) , 22(1)]2 We then update the estimate of 82 by (5.14) 32(1)=Y2 4962(1). Generally, in Round 1 for equa_tiqn_g: We apply the usual IV estimation method to (5.15) y8 = X8908 +8(g)(1)7cog +vg(l) using [x,,e(,)(1)] as instruments, where 2.0, is defined in (5.7A), and (5.16A) 8(g)(l) = [81(1),...,88_l(1),8g+1(0),...,80(0)] (5.16B) vg(1) = 3:; -e(g)(l)7tog. 65 Denote the resulting IV estimate of(5.15) by [68(1), 28(1)}. We then update the estimate of as by (5.17) e,(1) = y8 — x8680). Round 2: For equation 1, apply the usual IV estimation method to (5-18) Y1 = x1901+3(1)(2)7~01+"1(2) using [22, ,8“) (2)] as instruments, where 9.0, is defined in (5.7A), and (5-19) 8(1)(2)=[82(1)a---a80(1)] (5-20) V1(2) = Y1 — X1901 " 5(1)(2))~01 = 81 ‘ 8(1) (2)101- Denote the IV estimate of(5. 18) by [61(2)',il(2)']'. We then update the estimate of 81 by (5.21) 81(2) = y1 — x,é,(2). For equation 2: Apply the usual IV estimation method to (5-22) Y2 = X2902 +8(2)(:’3)7~02 +"2(2) using [39 ,s(2)(2)] as instruments, where 202 is defined in (5.7A), and (523) 8(2)(2)=[81(2)’83(1)r°"80(1)] 66 (5.24) v2(2) = 82 — 8(2)(2)7\.02. Denote the resulting IV estimate of (5.22) by [02(2), 712(2)}. We then update the estimate of a, by (5.25) 82(2) = y, — x,é,(2). Generally, in Round 2 for equation g: We apply the usual IV estimation method to (5.26) yg = Xg90g +i»:(g)(2)?to8 +vg(2) using [233(c)(2)] as instruments, where 208 is defined in (5.7A), and (5-27) 8(9(2) = [81(2)a---rag—1(2)a8g+1(1)a-~-936(1)] (5.28) vg(2) = a, - 8(g)(2))tog. Denote the resulting IV estimate of (5.26) by [08(2) , 28(2)']'. We then update the estimate of 88 by (5.29) e,(2) = y, —x,é,(2). Further rounds continue in the same fashion. Generally, in Rormd n for equgtion g, we apply the usual IV estimation method to (5.30) y8 = X3903 +e(g)(n)}t0g +vg(n) using [5(8,s(g)(n)] as instruments, where 208 is defined in (5.7A), and 67 (5.3 1A) 8(g)(n) = [81(n),...,eg_1(n),ag+1(n — l),...,sG(n — 1)] (5.31B) vg(n) = as — 8(g)(n)}tog. Denote the resulting IV estimate by [03(n)',ig(n)']'. We then update the estimate of a, by (5.32) e,(n) = y8 — xgé,(n). Thus, we have finished describing the iterative procedure of the feasible IZSLS estimator. This process is essentially the same as defined by Telser (1964) for the SURE model, except that IV is used here where OLS was used in the SURE model We may note that other similar iterative processes are also possible. For example, for equation g in rormd n, we could augmented the equation and instruments set by 8Z8)(n) = [81(11 — 1),...,88_1(n —1),8g+1(n — 1),...,8G(n - 1)] instead of 8(8) (11) as in (5.31A); this amounts to using the estimates of a from round n-l to estimate all equations in rormd n. This would change our algebra but not any of our conclusions since, if the interative processes based on 8(g)(n) and 8:901) both converge, they obviously converge to the same limit. We now show that the iterated IZSLS estimators defined above are consistent and asymptotically normal in every rormd of iteration. LEMMA 5.1: T"2[é,(n) - 90,1» N[o,n,(n)], where Q,(n) is a finite positive definite (pd) matrix, forn = 0,1, 2, ..., and g =1, 2, ..., G. 68 Proof: We will only prove the case of G = 2. The proof for general G is essentially the same. When n = o, é,(0) is just the usual 2SLS estimate ciao, in (5.3). Then it is well known that (5.33) T1/2[é,(0)—90,] —> N[o,o,,(A,,A-,}A,,)-l] using (A5.1)-(A5.3), for every g. When n = 1, using the definition that [08(l)',5tg(l)']' is the usual IV estimate of (5.9) using [3(1,a(,)( 1)] as instruments, we have (5 34) T1/2[él(1)-601] _ T‘lil'xl T'lil'8(1)(l) 4 T‘mil'vlfl) . T1/2[i1(1)_}\'01] — T_]8(1)(1)'X1 T_18(1)(1)'8(l)(1) T-1/28(1)(1)'v1(1) ' But (5.35) Tfig'xl = (T-‘x,'Z)(T"'z' Z)"(T”‘Z'X, ) : Aleir:z1Azl +0p(l) (5.36) T"x,'e(l,(1) = T“x,'e,(0) (using the definition of 8(1)(1) in (5.10A)) = T92. 'iya - 2962(0)] = T-lx, '[(y, — x,90, ) + x, (9,, - é,(0))] = Tame, — x,(é‘),(0)-eo2 )] = T‘1il'ez -(T-1x,'x,)[é,(0)-e,,] = (T'IXI'ZXT'll' Z)'1(T"Z' e,) —(T-lx,'Z)(T-1z' Z)-1(T-1z' x2 )[é,(0)—90,] :Al' ;.0p(1)—AIZA; 22‘0p(1)+0p(1) 69 (using (A5. 1) and the consistency of 02 (0)) = 0p( 1) (5.37) T'1em(l)'x1 = T“‘e,(0)'xl = T“[ya — Xaéamn'X. = T“[82 — Xa(éz(9)-eo)1'xr = T-le,'x1 -(é,(0)-902 )'(T-lx2'x1) = A8X,21—0p(1).A21+0p(1) (using (A5. 1) and the consistency of 02(0)) = Aam +op(1) (5-38) T_]8(1)(1)"3(1)(1) = T_1[32 " X2 (62(0)‘902 )“52 " X2(éz(0)"902 )] = 022 +op(1) using (A5. 1) and the consistency of 02(0). Combining (5.35)-(5.38), we have T-lx 'x T-lx' 1 A A-IA o , (539) —1 1' l -1 l 8(l)'( ) =I: lz zz zl +0p(1)- T 8(0(1) X1 T 8(1)(1)3(1)(1) Aex,21 022 Because (5-40) V1“) = Y1 _X1901 _8(1)(1)x01 3 Y1 ‘ X1901 " 82(0))”01 (using (5'10A» = 81 — [Y2 " Xzézmnxm = 31_[82 — X2(éz(0) _902)])\v019 70 (5.41) T‘mit'v. (1) = T_1/2X1'{31"32}V01 autism—902110.} = Twigs1 —T"’2x,'e,xm +T“’25‘<.'Xzié. ((1)—9.211.. = A A“ .T'mz'e, —A A" ~(T‘1/2Z'82 )lm lzzz lzzz +A12A221Az2 “Tm [62(0) “9021' 7V01"' 0p (1) using (A5.1)-(A5.2). Combining (A53), (5.33) and (5.41), we see that T‘1’2x,'v1 (1) is asymptotically normal with mean zero. Similarly (5.42) T‘1/28(D(1)'v,(1) = T‘1’282(0)' {e., 4.22.0, +x,[é,(0)-e,,]lm} = T-1/2{g2 _x2[(),(o)—902 ]}'{e1 —t:2)tm +X2 [92(0)_902]7L01} = T-me,t(a, —82101)-{Tm[éz(0)-9os]}'[T-1X2'(81 - an...” +(T-‘e,'x,)-T"’[€1,(0)—e,,]7t01 — {Tm 162(0)-9a1}'(T“X.'X. )ié.(9)-9a1>~u = T'mez'e. mi»...)—{T"2162(0)—9..1}'(A..,.. "'Axeazxm) +A,,,22 -T‘”[é,(0)-90,]i.01 +o,(1). Then combining (A5.4), (5.33) and (5.42), we see that T_1/28(1)(1)'V1(1) is asymptotically normal with mean zero. Substituting (5.39), (5.41) and (5.42) into (5.34), we have proved that T1’2[01(1)- 001] is asymptotically normal with zero mean. Thus T”2 [01(1) — 001] —) N[0,Q1 (1)], where 91(1) is the corresponding asymptotic covariance matrix. (For our present purposes we do not need to evaluate 01(1).) 71 The proof that T"2 [02 (1) — 002 ] —> N[0,Q2 (1)] is essentially the same as the proof given above. Then, using the inductive method, we can prove that T"2 [6,(n) - 90,1» N[o,o,(n)] for all g and n. 5.4. The Convergence and Asymptotic Efficiency of the Iterated IZSLS Estimators In this section, we give a convergence result for the iterated IZSLS estimator, and show that it has the same asymptotic efficiency as 3SLS. Since [08(n)',5tg(n)']' is the usual IV estimate of (5.30) using [5(8,e(g)(n)] as instruments, we have (5.43) X8.X8 x8'8(8)(n) 9801) = XS'YS . 3(n)'8(s)(n) 7‘s“) 8(e)(n)'ys Because (5.44) y8 = X390g +88 2 X8908 +8(3)A'08 +Vg (USing (5.7B)) = [Xg,8(g)](903'3x0g.).+vgs then )2 'y )2 ' 903] 5.45 8 3 = 8 x, ( > [WW8] [,(g)(n).]{t ,e,,1[,08 e.,} 72 = Xs'xa xs'em) [9%]4. xs'vs . 8(8)(n)'x8 8(11.)(11)'8(11) A'08 8(3)(n)'vs Substituting (5.45) into (5.43), we get (5.46) [ x,'x, x,'e(,,(n) [64m] 8(s)(n)'xg 8(a)(n)'8(s)(n) 9:801) : xs'xs xa'em) [9%]+ xs'va . 8(a)(n)'xg 8(a)(n)'8(s) )‘Os 8(g>(n)"’s In order to examine the convergence of[08(n)',ig(n)']' as n —> 00, we define the following notation: x 'x x ' (5.47A) 130 =[ 8. 8 8.803)] 8(11) X2. 8(8) 8(a) (5.4713) B = XB'XS X8.8(8)(n) 8(s)(n)'xs 8(8)(n).8(8)(n) (5470 C =[ XS'XB x,'a(,) ] s(g)(n)'X, 8(s)(n)'8N[0, Amp J. ,, ,1. — A,[T“2dJ -(n)]+o, (1) (using (A5.1)) —>N[0, A- J.er.(n)A, ] using Lemma 5.1. (2) T-“zx,'xJ.dJ.(n) = (T‘lx,'2)(T-‘z' Z)’1(T"Z‘XJ.)[T1/2dJ-(n)] = A,A;,‘A ,.T“’dJ.(n)+oJ, (1) (using (A5.1)) —> N[O, A,A;,1A,J.QJ(n)AJ.,A;,‘A,.] using Lemma 5.1. (3) T‘mai'XJ-dJ-(n) = (T-‘e.'xJ.)[T”2d.(n)] =A,, ,de. J(n)+o, (1) (using (A5.1)) —+ N[O, A,,J.Q J(n)A,, ,] 8O usingLemma 5.1. LEMMA 5.5: Each element of T‘1/2(AC, - AB,) converges in distribution to a normal distribution with mean zero as the sample size T—) co. Proof. Using the expressions for AB, in (5.47F) and AC, ill (5.47G), we get --l/2A l (5.63) T—U2(AC,-AB,)= 0 -1/2 . T Xg 2,201) . J 0 T 8(,)D,(n)—T D,(n) D,(n) But -l/2A I (5.64) T X, D,(n) = T'l’zi,'[xld1(n),...,X,_1d,_,(n),X,,1d,,l(n —1),...,deG(n —1)]. Then using Lemma 5.4, part (2), we see that each column of T’mi,'D,(n) converges ill distribution to a multivariate normal distribution with mean zero as T —> 90. Similarly, (5.65) T'1/28(,J'D,(n) = T'“2 8-1 [X1d1(n),...,X,_1d,_1(n),X,+ld,+l(n—1),...,XGdG(n-1)] = (Pr—“2%'dej(n.))(G—l)x(G-l) 81 where i,j=1,...,(g-1),(g+l),..., G, and 11‘ = n ifj < g otherwise n' = n - 1. Then, using Lemma 5.4, part (3), we see that each element of T'1/28(,J 'D,(n) is asymptotically normal with mean zero. Finally, (5.66) T'mD,(n)'D,(n) = (T'mdtfllt )'Xt'delea ))(G-1)x(G—l) using the definition of D,(n) in (5.47B), where n], ll2 = n or n - 1. But, as T—> 60, (5-67) dethlt )'Xt 'dejflla ) = di(n1)'(T—lxi 'Xj)[de,-(n2 )1 = 0p(1) using Lemma 5.1 and (A5. 1). Substituting (5.67) into (5.66), we obtain (5.68) T'mD,(n)'D,(n) = o,(l) as T —) 00. Therefore, substituting (5.64), (5.65) and (5.68) into (5.63), we prove that each element of T’m(AC, — AB,) converges in distribution to a normal distribution with mean zero asthe sample size T—)oo. LEMMA 5.6: Both Twig“, and T“1/28(,J(n)'v, converge ill distribution to multivariate normal distributions with mean zero as the sample size T —> 00. Proof: Using the definition of v, in (5.7B), we have (5.69) T“’2x,'v8 = T‘1’2x,'[e, - 2,80,] = (T"X,'Z)(T"Z' Z)"‘ ~T’U2Z'[a, — 8(,))to,] = —A,,A;,l -T“’2Z'e)to, +o,(1), 82 where we define (5,70) 71,, = (8,1,...,l,,,_,,—1,>.,,,,,,...,>.,,G )' with 7.0, for j = 1, g - 1, g + 1, Gdefined in (5.7A). (Note that 5.0, is not an estimate.) But —l/2 " __ —l/2 . " (5.71) T 282,, — vec(T Z 820,) = r“2 (51,691,, )vec(Z'i-:) = T“2 ('22,, '81,, )(1G 8 Z')vec(8) = (i,,'<8IM)-T‘“2(IG ®Z')vec(8) -> N(O,Q.) using (A53), where a. = (i0,'®1M)(2®A,,)(ioJJ ®IM) = (i,,'2i,,)-A,,. Combining (5.69) and (5.71), we see that -l/2A I (5.72) T x, v,J —>N(0,Q) ° -1 --1 ~ I ~ -1 WIth Q = A,,An ll. -A,,A,, = (1.0, 220,)-(A,,A,,A,, ). Similarly, using the expresion for 8“,) (n) in (5.50), we have (5.73) T'ms(,J(n)'v, = T‘"2 [e(,) —D,(n)]‘v, _ 1/2 -l/2 I But (5.74) T‘1’2a(,'v, —> N(0,Q,) 83 using (A5.4). T-Uzd] (11)le 1v8 T-l/2dg_1 (nyxg—l ng (5.75) T'mD (n)'v = _ J J 8 g T ll2dg+1(n_l) xg+l V 8 _ T'de(n —1)'XG'V, J using the definition of D,(n) in (5.47E). The expression ill (5.75) converges to a multivariate normal distribution with mean zero because, for any i and 11‘ = n or n- 1, T'1’2d,(n")'x,'v,J = [T"2d,(n‘)]'-T“x,'[e, — 2,30,] (using the definition of v, in (5.7B)) = - 1T"2d. (n‘ )1'-T"X: '81., (using the definition of X0, in (5.70)) = 1T‘”d.(n‘>1'iA.... WA 1%, +0, (1) using (A5.l). Then, according to Lemma 5.1, T‘mdi (n')'X, 'v, converges to a normal distribution with mean zero. Substituting (5.74) and (5.75) into (5.73), we prove that T“ma(,)(n)'v, converges ill distribution to a multivariate normal distribution with mean zero. THEOREM 5.1: (i) For any g = l, 2, G, both Tm[0,(n)—00,] and T”2 [i,(n) — 20,] converge ill distribution to multivariate normal distributions with mean zero. (ii) T“’[é,(n)-e,,]=(A,,A;,‘A,,)-‘[T*“2x,'1),(n).i.,, +T—“2x,'v,]+o,(1). 84 Proof: (i) Applying Lemmas 5.3, 5.5 and 5.6 to (5.56), the result follows immediately. (ii) Substituting Lemma 5.3 and (5.63) into (5.56), we obtain 1/2 " _ (5.76) Tmfligm 908] T [xg(n)—A'Og] _A,,A“Az, 0 "{o T“”i,'D,(n) [90,] AM, 2 o T’1/28(,J'D,(n)-T'mD,(n)'D,(n) 2.0, +[ T-1/2X' ,v, 8]}+°p(1)- T-1/28(8)(n)lv Then, using the partitioned inverse rule, we obtain (5.77) T1’2[0,(n)—00,]=(A,,A;21A,,)‘1[T‘1/2i,'D,(n)-Ao, +T-1’2i,'v,]+o, (1). We will rewrite (5.77) slightly, as (5.78) T1/2[é,(n)-90,] = (T-lx,'x,)-1[T-V2x,'D,(n)20, +T-1’2)“(,'v,]+op (l). Prermlltiplying (5.78) by (vigor, ), we obtain (5.79) (T-li,'x,)-TW[é,(n)—eo,] = T-1/2x,'D,(n)-l,, +T-1’2x8'v, +oJ,(1) 85 01' (5.80) (T-lxg'x,).T1/2d,(n) = T-1/2x,'D,(n).7.0, +T‘1’25(,'v, +oJ, (1) using the definition of d,(n) in (5.47D). Substituting (5.47E) into (5.80), we obtain (5.81) (T-lx,'x,)-T1/2d,(n) = T-lnig'lxldl(n):---axg—ldg—l(n):xg+1dg+l(n _ 1))-":XGdG (ll —1)]A‘Og +T'1’25(,'v, +o,(l) . 8-1 G a = T-1/2X,'{J§1x,d,(n)x,, + JZJXJ-dJ-(n—1)}uo,J-}+T‘1’2X,'v, +o,(l) J=8+ using the definition of 20, in (5.7A). This can also be rewritten as (5.82) (T-lx,')"(,)-T1I2d,(n) g—l A G A = 2(T‘1X,'XJ-)T"2dJ-(n))t0,- + z (T-lx,'xJ.)T1/2dJ.(n—1)>.O, j=1 j=g+l +T—l/2xgovg+op(l) ,_1 A A G a .. = E(T-txgtxjfltndjwno, +j=§+J(T—IXB'XJ )Tl/sz.(n— 1))to, +T-l/2Xg'v8 +01) (1) using the fact that )2ng = (sz,)'xJ. = (PZX,)'(PZXJ.) = 52352,. Define ' 0 0 0 0 7,02,52,66, 0 0 0 (5.83A) L=T-l 2.03,)? '22, 7.0,,x3'x, 0 0 y .8. x) Q- y § X G. tax, >’ § >4 Q- cox) >2 .8 ‘3 ...x’ o- >4) ‘1’ 86 (5.83B) 0 Aolle'xz 7*0132‘1'7‘3 A01,ci—1Z(1'Xo-l )‘01le 9ft; 0 0 2.023X2 'X3 A02,6-1X2'Xc—1 Amoxz 'XG U = '1‘.1 E . i ' 0 0 0 0 )‘0,G—I,GXG—l '26 _0 0 0 0 0 - Pi] - a )2, _J ..J .. (5.83C)X= ,D=T XX - XG .1 (1101) v1 (5.83D) d(n) = E , V: ‘ dG(n) VG With this notation, (5.82) for g = 1, ..., G can be expressed in matrix form as (5.84) D - T1’2d(n) = L - T1’2d(n) + U-T1’2d(n — 1)+ T'l’zi'v + 0p (1). Solving for T1’2d(n), we obtain (5.85) T1’2d(n) = (D — L)‘1U-T1’2d(n — 1) +(D — L)‘1 ~T‘1/2X'v + o, (1) forn =1, 2, The iteration procedure defined in (5.85), apart from the 0p (1) term, is just the Gauss- Seidel iteration method (see Varga, 1962). We now wish to show that this iterative process converges to a limit, say d', and that AV(de') equals the asymptotic variance 87 of the usual BSLS estimator of 00 in (5.3). In order to prove these results, we first need to establish Lemmas 5.7 and 5.8. Define (5.86A) V = (v1,...,vG) with v,, i = 1, 2, ..., G defined in (5.7B) (5.86B) c = E(T’18' V) a (c, )G)“, (5.86C) A = (i,,,...,i,,) with 720,, i = 1, 2, G defined in (5.70). LEMMA 5.7: (l) —V=8A; 011 (2) C = ispd; cGG (3) —C = 2A; (4) L— D+U = T-lx'm'el, )2. Proof: (1) Using (5.70) and the definitions of a and 8(,) in (5.4B) and (5.4D), equation (5.7B) can be rewritten as (5.87) —v = 810,. Because (5.87) holds for all g, we can stack the equations together as (5.88) —V = 8A. (2) and (3): Because v, = a, — L(8, a“) ), then E(sm 'v, ) = 0 for any i. Therefore c,- = T‘1E(8,'vJ-) = 0 for i ¢ j. Premultiplying (5.88) by T'le', we get 88 (5.89) —T“e'v = T‘le'eA. Taking expectations of both sides of ( 5.89), we obtain (5.90) -c = 2A using (5.86B) and (A5. 1). Equation (5.90) can be rewritten as (5.91) —2“C = A. Denoting 2‘1 = (oij )GxG, then oii > 0 for allibecause 2"1 = (oij )6,6 is pd. Comparing the diagonal elements on both sides of (5.91), we get (5.92) --o“cii = —1 using (5.70), (5.86C) and the diagonality ofC. Then cii = 1/ oii > 0 for all i. (4) Using the definitions of L, D, U, and )2 in (5.83) and A in (5.86C), we can easily verify that L— D+U = T-lx'(A'eIT)x. LEMMA 5.8: All the eigenvalues of(D — L)"1 U equal zero. Proof. A x1. cllIT (5.93) X'(C®I,) = 89 CIIIK, x1. cGGIKO XG. III 01 )5) where Ki is the number ofcolumns in X, for i = l, 2, ..., G. According to Lemma 5.7, part (2), c, > 0 for all i. So we can define cillzlx, (5.94) C. = cgélxo Then (5.95) C = C3 . Using Lemma 5.7, part (4), (5.96) L - D + U = T-lx'm'el, )i = T-lx'((—2-IC)'8>IT))‘( (using Lemma 5.7, part (3)) = 44221081, )(2-1 81,)22 = 446212-181”)? (using (5.93)) = -T-lcz)"('(2-l elm? 90 using (5.95). Then (5.97) C:‘(D — L— U)C. = C.[T")('(2'1®IT)5(]C. a M using (9.96). Because both C. and T“)“('(2‘l ® IT))( are pd, then the matrix M defined ill (5.97) is also pd. Therefore there exists a nonsingular matrix, say P, such that (5.98) P‘IMP = lK where K = 20:, K,. Suppose x is any eigenvalue of the matrix (D — L)”1 U, then it satisfies (5.99) [xiK —(D—- L)“ U| = 0 where |A| = det(A). Using the facts that |A ~B| = |A| |B| and (D - L) is nonsingular, (5.99) is equivalent to (5.100) |x(D—L)- U| = 0, which is also equivalent to (5.101) :‘[x(D—L)—U]C. =0 C. because ¢ 0. Substituting (5.97) into (5. 101), we get (5.102) ‘xM+(x-1)C:1UC.. = 0. 91 Because |P| =1: 0, (5. 102) is also equivalent to (5.103) |P“[xM +(x—1)CI1UC.]PI = 0. Substituting (5.98) into (103), we obtain (5.104) IxIK +(x— 1)(c.1>)‘l U(C.P)| = 0. Using the fact that all the eigenvalues of a strict upper triangular matrix are zero, we conlude that all the eigenvalues of U are zero, since U is a strict upper triangular matrx. Because (C.P)‘1U(C.P) conjugates with U, then all the eigenvahles of (C..P)’1 U(C.P) are zero. Next we use the facts that ifan H x H matrix Q has eigenvalues p1, ..., on, then (1) for any scalar a, the eigenvalues ofmatrix orQ are up], ..., con; and (2) for any scalar or, the eigenvalues ofmatrix (011H +Q) are (or +p1), ..., (a +pH). From this we can conclude that all the eigenvalues of the matrix [xIK +(x —1)(C.P)'1U(C.P)] are equal to x. Then (5.104) is equivalent to (5.105) xK = 0 because the determinant of a matrix equals the product of its eigenvalues. Solving for x, we get x = 0 with multiplicity K. Therefore we have proved that all the eigenvalues of the matrix (D— L)‘1U are equal to zero. Define d’ to be the limit of the iterative process (5.106) Tl/2d(n) = (D— L)-1 U-T1/2d(u — 1)+(1)- L)-1.T-1/2x'v, 92 which is the same as (5.85) except for an 0p (1) term Because all of the eigenvalues of (D — L)‘1 U equal zero, d' exists and the iterative process (5. 106) reaches (1" in no more than K iterations. Furthermore, since (5.85) and (5.106) difi‘er only by an o,(1) term, the probability that the process (5.85) has a limit (in n) approaches one as T —> co; and the limit of this iterated 12SLS estimator has the same asymptotic distribution as d’. We now proceed to show that d‘ (and hence the iterated IZSLS estimator) has the same asymptotic distribution as the 3SLS estimator. THEOREM 5.2: de‘ —> N(0,W), with w = [A'(2"' 8A; )A]", where A: diag(A,1,...,A,fi). Proof. Since (1' is the limit of the process (5.106), it satisfies (5.107) TWd‘ = (D— L)‘1U-T"2d‘ +(D— L)--1 .T-Wx'v. Solving for de‘, we get (5.108) TWd' = —(L+U— D)-1-T-1/2x'v = —[T-15('(A'<8>I,))“(]-1-T-1/25('v, using Lemma 5.7, part (4). But (5.109) T-‘x'(A'®1,)5‘( = T1 (A'®IT) 93 = [A'®(T"Z'Z)“]- T"‘xG'z T"z'xl T"z'xG =A'(A'®A;,1)A+o,(1) (using (A5.1)) = —A'(C2“ 8A;)A+o,(1) (using Lemma 5.7, part (3) and the diagonality of C) = —A'(C®IM)(2" 8A;)A +oJ,(l). Similarly (5.110) T‘mx'v = T"“2 T'mil'vl 1 -l/2A r T XG vG (T—lxl rZ)(T-lZlZ)-l . T—l/2 Z'V] (T-IXG IZ)(T—lZlZ)—l . T—l/ZZIVG 94 A —1 ‘T-1/2Z.V1 1 22 = 5 + 0p ( 1) (using (A5.1)) AGZA'Z',‘ -T‘“’z'vG = A'(IG ®A;Z‘)-T‘“2 (IG ®Z‘)vec(V)+o,(l) (using the definition of v in (5.83D) and V in (5.86A)) = —A'(1G 8A;_,1).T-1/2(1G eZ')vec(sA)+op(l) (using Lemma 5.7, part (1)) = —A'(IG <8Ag.l )-T“”(Io ®Z')(A'®Ir )vec(8)+°r(1) = —A'(IG ®A221 )(A'®IM ) 'T-W (16 ® Z')Ve°(8) + 013(1) —> N(0,Wt) using (A5.3), where (5.111) w1 = A'(IG <8A;,§)(A'erM)-(28»A,,).[A'(1G ®A;z‘ )(A'®IM)]' = A'(A'2A®A;,‘)A = A'[(—2"C)'2(—2“C)<8>A',1 ]A (using Lemma 5.7, part (3)) = A'(C2-IC®A;,1)A = A'(C®IM)(2" ®A;,1)(C®IM)A. Combining (5.108)-(5.111), we obtain (5.112) de‘ = —[T")“<'(A'®l,)ik)‘l me'v = [A'((:<81M)(2'1 8>A;,‘)A]‘l .T‘1’2x'v+o,(1) —> N(0,w‘) (5.113) w’ =[A'(C®IM)(2‘1<8A‘,‘z‘)A]’l -wl . 95 °{[A'(C ®IM)(E_1 ®AZ )1‘11—I }' = [A'(C®Iu)(2" 94;).«1‘1 - -A'(C®IM)(2“ 8A;)(ceIM)A. -{[A'(C GNMXY1 @142 )A1'1}' = [A'(C®IM)(2_1®A;)A]—l - ~A'(C®IM)(2“ 8A;)(C®IM)A. -[A'(2" 8A; )(C ®IM )Ar‘. But cllIM A21 (5.114) (C®IM )A = ' cGGIM cllAzl CGGAZG A21 c111K A26 5 A .("3 where Ki = the number ofcolumns ill X,, i = l, 2, ..., G. Substituting (5.114) into (5.113), we get (5.115) w‘ =[(AC)'(2'1®A;Z')A]’l ~(AC)'(2“ ®A;)AC-[A'(2“®A;)AC]"1 =[A'(2"@AQMT‘C"-CA'(2“(29A;)AC-C"1[A'(2'1®A;,')A]’1 96 =[A'(2“<8>A;21)A]‘1 =w. Because W = [A'(2‘l (8 A; )A]‘1 is just the asymptotic variance of the the usual 3SLS estimator of 00 ill (5.3), theorem 5.2 tells us that the iterated 12SLS estimator defined in section 5.3 of this chapter does indeed have the same asymptotic eficiency as the usual 3SLS estimator. CHAPTER 6 CONCLUSION In this dissertation we have shown how to improve on standard GMM estimators, given observable extra information. For linear models, and for certain kinds of nonlinear models, the additional information consists of variables that are uncorrelated with the instruments but correlated with the error(s) of the equation(s) being estimated. We believe that these results are empirically relevant, notably in the estimation of rational expectations models. An obvious further research question is the size of the efficiency gain that can be obtained in actual empirical work. Here the relevant issue is the strength of the correlation between the forecast errors in related series. We have also considered the case that the extra moment conditions involve parameters that need to be estimated, and we discussed the 3SLS problem in detail More generally, we could consider GMM based on the moment conditions ',0 ,0 (6.1) 9=E10(y:,9.)1=E[¢‘(y: °‘ ”)1, (1’2 (Yt £01,902) and using the weighting matrix C‘l, where C11 C12] (6.2) C = “HIT—>00 Eli'l"°d’T(9017902)(1>T(901’902 )'1 E [C C 21 22 T .. .. ... = T—l Z¢(Y:ael :92 ) Let 9 = (01.99;). be the t=l and where (1).,(91’92) = [¢Tl(91r92 )] ¢r2(91.92 ) corresponding GMM estimates. An alternative to this (standard) GMM treatment is to consider an iterative procedure, and this is feasible if 4), identifies 001 with 002 given, and 97 98 6, identifies 9,, with 0,1 given. Then, iii), and 0, are any initial estimates, we can consider the GM problems (6.3A) (6.3B) where (6.4A) (6.4B) (6.4C) (6.4D) 6l = ”Emilie, {¢112(91262 )'C11¢1|2(91562 )} 62 = argmine,{¢2|1(ét.92)'C22¢2|1(9t.92)} ¢1|2(91’éz ) = 1911(91 aéz)" C12C2i¢T2 (61 ’62 )1 ¢2|1(éla92)=1¢T2(éla92)"C21C1i¢rl(élaéz)1 C“ =(C11_C12C221C21)—1 C22 = (C22 - C21C1‘11C12 )'1. This yields new estimates 0,, 0,, and the iterative process can be continued ill an obvious fashion. We conjecture that the limit of this iterative process is asymptotically equivalent to the GM estimate 0. This result, if true, is a considerable generalization of our results of Chapter 5 on 3SLS and iterated improved ZSLS. REFERENCES REFERENCES [1] Amemiya, T. (1985), Advanced Econometrics, Cambridge, Mass: Harvard University Press. [2] Gallant, AR. and H. White (1988), A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models, Oxford: Basil Blackwell [3] Hansen, LP. (1982), "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029-1054. [4] Imbens, G.W. (1992), "An Efiicient Method of Moments Estimator for Discrete Choice Models with Choice-Based Sampling," Econometrica, 60, 1187- 12 14. [5] Imbens, G.W. (1993 ), "A New Approach to Generalized Method of Moments Estimation," unpublished manuscript, Harvard University. [6] Imbens, G.W. and T. Lancaster (1994), "Combining Micro and Macro Data in Microeconometric Models," Review of Economic Studies, 61, 65 5-680. [7] Schmidt, P. (1986), "A Curious IV Result," unpublished manuscript, Michigan State University. [8] Schmidt, P. (1988), "Estimation of A Fixed-Effect Cobb-Douglas System Using Panel Data, " Journal of Econometrics, 37, 36 1-3 80. [9] Telser, LG. (1964), "Iterative Estimation of a Set of Linear Regression Equations," American Statistical Association Journal, September, 845-862. [10] Varga, R S. (1962), Matrix Iterative Analysis, Englewood Clifi’s, New Jersey: Prentice-Hall, Inc. [1 l] Wooldridge, J.M. (1993), "Efficient Estimation with Orthogonal Regressors," Econometric Theory, 9, 687. 99 nICHIan sraTE UNIV. LIBRARIES 11111111111|11111111111111111111111 31293014172666