This is to certify that the dissertation entitled THREE ESSAYS ON ECONOMETRICS presented by CHIROK HAN has been accepted towards fulfillment of the requirements for the Ph.D. degree in ECONOMICS.

Date: June 14, 2001

THREE ESSAYS ON ECONOMETRICS

By Chirok Han

AN ABSTRACT OF A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Economics

2001

Professor Peter J. Schmidt

ABSTRACT

This dissertation contains three unrelated essays in econometric theory. The first chapter considers Generalized Method of Moments-type estimators for which the criterion function that is minimized is not the “standard” quadratic distance measure but a general $L_p$ distance measure. It is shown that the resulting estimators are root-n consistent but not, in general, asymptotically normally distributed, and we derive the limit distribution of these estimators. In addition, we prove that it is not possible to obtain estimators that are more efficient than the “usual” $L_2$-GMM estimators by considering $L_p$-GMM estimators. We also consider the choice of the weight matrix for $L_p$-GMM estimators.

The second chapter is concerned with the asymptotic properties of instrumental variable estimators with irrelevant instruments. The estimator is neither consistent nor asymptotically normal, but converges in distribution to a random variable which depends on the covariance of the regressors and the error term.
The density of the asymptotic distribution is calculated, and it is shown that the mean of the asymptotic distribution is equal to the probability limit of the OLS estimator.

The last chapter is an extension of Ahn, Lee and Schmidt (2001) to allow a parametric function for time-varying coefficients on the individual effects. It is shown that the main results of Ahn, Lee and Schmidt (2001) hold for our model, too. Least squares is consistent, given white noise errors, but less efficient than a GMM estimator.

ACKNOWLEDGMENTS

I would first like to thank Professor Peter J. Schmidt, the chairperson of my dissertation committee. He taught my first econometrics course, and his enthusiasm and knowledge of this subject inspired me to pursue my present studies. I am grateful that he kindly acted as the chairperson of my dissertation committee and guided me through the course of my studies and the writing of this dissertation. I would also like to thank Professor Robert M. de Jong, who taught me a great deal about methodological details and helpfully guided me in writing the first chapter of this thesis. I would also like to express my gratitude to Professor Jeffrey M. Wooldridge for his timely comments and insightful critiques. Without my committee members’ help, this thesis would not have been possible, although I am responsible for any remaining mistakes.

I am greatly indebted to my wife, Youngmi, for her love, support, and patience. My four-year-old son, Yoojeong, also helped by his constant love, which is a continuing great source of joy. My special thanks also go to my parents and my parents-in-law, who have supported and encouraged me in various ways with unconditional love and care.

I wish to thank my friends for their time, friendship and support. Some of them are Hoon Kim, Seok Hyeon Kim, Douglas Harris, Hong Peng Ong, and Neil Megan. I would also like to thank all the individual Korean students in the Department of Economics.
I would like to extend a special word of thanks to the staff in the Department of Economics, especially Margaret Lynch, Linda Wirick and Pamela Dorton, who skillfully helped me to handle many administrative items.

Finally, I acknowledge with gratitude Reverend Hyo Nam Hwang, Reverend Jung Kee Lee, and all the members of the Lansing Korean United Methodist Church for their concern, help and friendship. Thanks and glory to God.

TABLE OF CONTENTS

Chapter 1 The Properties of $L_p$-GMM Estimators
1.1 Introduction
1.2 Main theorem
1.3 Efficiency of $L_2$-GMM
1.4 Further remarks on weight matrices
1.5 Conclusion
1.A Mathematical Appendix

Chapter 2 The Asymptotic Distribution of the Instrumental Variable Estimators When the Instruments Are Not Correlated with the Regressors
2.1 Introduction
2.2 The limit distribution
2.3 The relationship with the OLS estimator
2.4 Conclusion
2.A Proof of Theorem 2.4

Chapter 3 Estimation of a Panel Data Model with Parametric Temporal Variation in Individual Effects
3.1 Introduction
3.2 The model and assumptions
3.3 GMM under the Orthogonality Assumption
3.4 GMM under the Orthogonality and Covariance Assumptions
3.5 Least Squares
3.6 Conclusion
3.A The asymptotic variance of the GMM estimator
3.B The asymptotic variance of the CLS estimator

Bibliography

Chapter 1

The Properties of $L_p$-GMM Estimators

1.1 Introduction

Since Lars Peter Hansen’s (1982) original formulation, Generalized Method of Moments (GMM) estimation has become an extremely important and popular estimation technique in economics. This is because economic theory usually implies moment conditions that are exploited in the GMM technique, while economic theory is typically uninformative about the exact stochastic structure of economic processes. GMM estimation provides an estimator when a certain set of moment conditions $E g(y, \theta_0) = 0$ is a priori known to hold. When the number of moment conditions exceeds the number of parameters, we cannot hope to obtain an estimator by setting the empirical equivalent $\bar g(\theta)$ of our moment condition equal to zero; instead we need to make $\bar g(\theta)$ as close to zero as possible in some sense. The usual GMM formulation minimizes a quadratic measure of distance. Hansen (1982) established the large sample properties of these GMM estimators under mild regularity conditions.

The above exposition raises the natural questions of what happens if distance measures other than a quadratic one are used and whether those other distance measures can give better estimators. The answer to the latter question is no, as Chamberlain (1987) has shown that the optimal GMM (in the usual sense) estimators attain the efficiency bound.
Apart from this general remark on the efficiency of optimal GMM estimators, there have been attempts, such as Manski (1983) and Newey (1988), to treat directly the use of non-quadratic measures of distance between population and empirical moments. Those articles state results implying that, under mild assumptions, estimators that minimize a general discrepancy function are consistent and asymptotically normally distributed. Based on these results, Newey (1988) concludes that (under regularity conditions) estimators using two different measures of distance are asymptotically equivalent if the corresponding Hessian matrices are asymptotically equal. This implies that it is impossible to obtain better estimators by modifying the quadratic criterion function, given the assumptions of that paper. This conclusion gives a direct argument for the use of a quadratic distance measure, besides Chamberlain’s general argument.

However, when considering $L_p$-GMM as defined below, it turns out that only the $L_2$ norm satisfies the assumptions of Manski (1983) and Newey (1988), and other values of $p$ in $[1, \infty)$ do not. The problems are the following. When $p = 1$, the $L_p$ norm is not differentiable at 0; when $p \in (1, 2)$, it is continuously differentiable but not twice differentiable at 0; when $p \in (2, \infty)$, it is continuously twice differentiable, but the Hessian matrix evaluated at the true parameter becomes zero (and therefore singular). Therefore, the papers by Newey and Manski have no implications for $L_p$-GMM for values of $p$ other than 2.

When considering $L_p$-GMM, it turns out that the “standard” asymptotic framework fails. The least absolute deviations type asymptotic framework does not directly apply either. Linton (1999) has recently pointed out, in an example in Econometric Theory, that the estimator minimizing the $L_1$ distance of the sample moments from zero can have a non-normal limit distribution.
In this chapter, we will establish the limit distribution of general $L_p$-GMM estimators, and we show that $L_p$-GMM estimators are root-n consistent, but in general need not have an asymptotically normal distribution. In addition, we prove a theorem that shows that $L_p$-GMM estimators cannot be more efficient than $L_2$-GMM estimators, thereby strengthening Newey’s conclusion to $L_p$-GMM estimators. Finally, we discuss the problem of finding the optimal weight matrix for $L_p$-GMM estimators.

Section 1.2 defines our estimator and gives the main theorem for consistency and asymptotic distribution, whose proof is given in the Mathematical Appendix (Section 1.A). Section 1.3 discusses the efficiency of $L_2$-GMM among all $L_p$-GMM estimators. Section 1.4 describes the problem of the selection of the weight matrix. In addition, this section gives some interesting results for the cases $p = 1$ and $p = 3$, including Linton’s (1999) example. The conclusions section (Section 1.5) is followed by a Mathematical Appendix in which all the proofs are gathered.

1.2 Main theorem

In this section, we state the main result of this chapter, on which the remainder of our discussion is based. Let $y_1, y_2, \ldots$ be a sequence of i.i.d. random vectors in $\mathbb{R}^m$. Let $g(y_i, \theta)$ be a set of $q$ moment conditions with parameter $\theta \in \Theta \subset \mathbb{R}^k$, that is, let $g(y_i, \theta)$ be a random vector in $\mathbb{R}^q$ that satisfies

$$E g(y_i, \theta_0) = 0. \tag{1.1}$$

Let $\bar g(\theta) = n^{-1} \sum_{i=1}^n g(y_i, \theta)$. The $L_p$ norm $\|\cdot\|_p$ is defined as

$$\|x\|_p = \Big( \sum_{i=1}^q |x_i|^p \Big)^{1/p} \tag{1.2}$$

for $p \in [1, \infty)$. The $L_p$-GMM estimator $\hat\theta_n$ is assumed to satisfy

$$\|\bar g(\hat\theta_n)\|_p = \inf_{\theta \in \Theta} \|\bar g(\theta)\|_p. \tag{1.3}$$

Let $|x| = \max_{i,j} |x_{ij}|$ if $x$ is a $k_1 \times k_2$ matrix. Let $\Omega = E g(y_i, \theta_0) g(y_i, \theta_0)'$ and $D = E (\partial/\partial\theta') g(y_i, \theta_0)$. The regularity assumption below will be needed to establish our results:

Assumption 1.1.
(i) $\Theta$ is a compact and convex subset of $\mathbb{R}^k$;
(ii) $\theta_0$ is an interior point of $\Theta$;
(iii) $E g(y_i, \theta) = 0$ if and only if $\theta = \theta_0$, i.e., $\theta_0$ uniquely satisfies the moment conditions;
(iv) $g(y, \theta)$ is continuous in $\theta$ for each $y \in \mathbb{R}^m$, and is measurable for each $\theta \in \Theta$;
(v) $E \sup_{\theta \in \Theta} |g(y_i, \theta)| < \infty$;
(vi) $\Omega = E g(y_i, \theta_0) g(y_i, \theta_0)'$ is finite;
(vii) $g(y, \theta)$ has a first derivative with respect to $\theta$ which is continuous in $\theta \in \Theta$ for each $y \in \mathbb{R}^m$ and measurable for each $\theta \in \Theta$, $E \sup_{\theta \in \Theta} |(\partial/\partial\theta') g(y_i, \theta)| < \infty$, and $D$ is of full column rank;
(viii) $\|x + D\xi\|_p$ achieves its minimum at a unique point $\xi$ in $\mathbb{R}^k$ for each $x \in \mathbb{R}^q$.

Note that part (viii) of Assumption 1.1 is nonstandard and far from innocent in the $L_1$ case. Consider for example sequences of random variables $y_{1i}$ and $y_{2i}$ that are independent of each other and are $N(0, 1)$ distributed, and consider the $L_1$-GMM estimator that minimizes

$$|\bar y_1 - \theta| + |\bar y_2 - \theta|. \tag{1.4}$$

Part (viii) of Assumption 1.1 will not hold in this case, because any value in the interval $[\min(\bar y_1, \bar y_2), \max(\bar y_1, \bar y_2)]$ will minimize the criterion function. Therefore, our result does not establish the limit distribution of the $L_p$-GMM estimator for this case. However, if we consider the weighted criterion function

$$|\bar y_1 - \theta| + c\,|\bar y_2 - \theta| \tag{1.5}$$

for any $c \in [0, \infty)$ except for $c = 1$, part (viii) of Assumption 1.1 will be satisfied.

The following theorem now summarizes the asymptotic properties of $L_p$-GMM estimators. Note that we do not yet explicitly consider weight matrices at this point, but such a treatment can be easily done with the result below at hand.

Theorem 1.2. Let $Y$ be a random vector in $\mathbb{R}^q$ distributed $N(0, \Omega)$. Then under Assumption 1.1, $\hat\theta_n \to \theta_0$ a.s., and

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} \operatorname*{argmin}_{\xi \in \mathbb{R}^k} \|Y + D\xi\|_p. \tag{1.6}$$

The proof of this theorem, like all the proofs of this chapter, can be found in the Appendix (Section 1.A). As a special case of the above theorem, the usual $L_2$-GMM estimator can be considered.
$\|Y + D\xi\|_2^2 = (Y + D\xi)'(Y + D\xi)$ is minimized by $\hat\xi = -(D'D)^{-1}D'Y$, so applying Theorem 1.2, we get

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} -(D'D)^{-1}D'Y \sim N[0, (D'D)^{-1}D'\Omega D(D'D)^{-1}], \tag{1.7}$$

which coincides with the usual analysis. In the examples below, we will show that for general values of $p$, normality need not result for the $L_p$-GMM estimator. We will be able to establish, though, that the limit distribution is symmetric around 0 and possesses finite second moments.

1.3 Efficiency of $L_2$-GMM

In this section and in the remainder of this chapter, we consider $L_p$-GMM estimation with a weight matrix $W$, i.e., $L_p$-GMM estimators that minimize the distance from zero of a “weighted” average of moment conditions, $\|W \bar g(\theta)\|_p$, where $W$ is a $q \times q$ nonrandom and nonsingular matrix. It is straightforward to extend our analysis to the case of estimated matrices $W$, and we will not pursue that issue here. Clearly, whenever $E g(y_i, \theta_0) = 0$ we will have $E W g(y_i, \theta_0) = 0$, and therefore our previous analysis applies. Below, we will keep using the notations $Y$, $D$, and $\Omega$ defined previously.

Let $\hat\xi$ minimize $\|W(Y + D\xi)\|_p$. Applying Theorem 1.2, we see that $\hat\theta_n$, which minimizes $\|W \bar g(\theta)\|_p$, is strongly consistent (since $W g(y_i, \theta_0)$ is also a set of legitimate moment conditions) and $n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} \hat\xi$.

To facilitate the efficiency discussion, we need to show asymptotic unbiasedness of $L_p$-GMM estimators. This is established by noting that the limiting distribution of $n^{1/2}(\hat\theta_n - \theta_0)$ is symmetric and has a finite second moment. The following theorem states the unbiasedness result:

Theorem 1.3. Under Assumption 1.1, $L_p$-GMM estimators are asymptotically unbiased.

Because of the asymptotic unbiasedness of our estimators, we can compare weighted $L_p$-GMM estimators by their asymptotic variances. This property is crucial to prove the following theorem, which states that optimal $L_2$-GMM estimators are asymptotically efficient among the class of weighted $L_p$-GMM estimators.

Theorem 1.4.
Under Assumption 1.1, an optimal $L_2$-GMM estimator is asymptotically efficient among the class of weighted $L_p$-GMM estimators, i.e., the asymptotic variance of an optimal $L_2$-GMM estimator is less than or equal to that of any weighted $L_p$-GMM estimator.

The above theorem shows that the central message of Newey (1988), that there is no potential for efficiency improvement by considering discrepancy functions other than the quadratic one, can be extended to $L_p$-GMM estimators. Basically, Theorem 1.4 is obtained by noting that the expression for the limit distribution can be viewed as a finite sample estimation problem in its own right, for which the Cramér-Rao lower bound applies.

1.4 Further remarks on weight matrices

In this section, we will discuss various issues involving the choice of the weight matrix $W$ and discuss several examples. We will not be able to prove optimality of a particular nonsingular weight matrix for general $L_p$-GMM, but instead we will sketch some of the issues below.

It is well known that the optimal weight matrix $W$ for $p = 2$ satisfies $W\Omega W' = I$ (or $W'W = \Omega^{-1}$). This result can also be obtained easily using our first theorem, for

$$\|W(Y + D\xi)\|_2^2 = (Y + D\xi)'W'W(Y + D\xi) \tag{1.8}$$

is minimized by

$$\hat\xi = -(D'W'WD)^{-1}D'W'WY \tag{1.9}$$

and its variance is minimized when $W'W = \Omega^{-1}$. Therefore the optimal $L_2$-GMM estimator has the asymptotic distribution

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} N[0, (D'\Omega^{-1}D)^{-1}]. \tag{1.10}$$

Can this efficiency be attained for $p$ other than 2? In general, the answer is yes. It can be achieved for general $p$ by weighting cleverly. Consider

$$W^* = (\Omega^{-1}D \;\vdots\; W_2)' \tag{1.11}$$

where $W_2$ is of size $q \times (q - k)$, chosen to be orthogonal to $D$, i.e., $W_2'D = 0$, and chosen such that $W^*$ is nonsingular. This weight matrix always exists when $q > k$.[1] Then

$$\|W^*(Y + D\xi)\|_p^p = \left\| \begin{pmatrix} D'\Omega^{-1}(Y + D\xi) \\ W_2'(Y + D\xi) \end{pmatrix} \right\|_p^p = \|D'\Omega^{-1}Y + D'\Omega^{-1}D\xi\|_p^p + \|W_2'Y\|_p^p \tag{1.12}$$

is minimized by $\hat\xi = -(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}Y \sim N[0, (D'\Omega^{-1}D)^{-1}]$ for any $p \ge 1$. So the $W^*$-weighted $L_p$-GMM estimator $\hat\theta_n$ with $W^*$ chosen as in Equation (1.11) has the asymptotic distribution (1.10), and therefore the weight matrix $W^*$ is optimal for any $p$.

[1] This weight matrix needs $W_2'D = 0$ and $|W^*| \ne 0$, so there are $(q - k)k + 1$ restrictions. But $W^*$ has $q(q - k)$ free parameters. The number of parameters is greater than or equal to the number of restrictions when $q > k$.

For $p = 2$, there are two different types of optimal weights. One is given by (1.11) (say, the $D'\Omega^{-1}$ type) and the other is characterized by $W\Omega W' = I$ (note that a scalar multiple of an optimal weight matrix is again optimal). In general, each of these neither implies nor is implied by the other, but they give one and the same asymptotic distribution. Furthermore, the optimal weight of the second type is not unique, since any orthogonal transformation of an optimal weight is again optimal. (When $W\Omega W' = I$, $V = HW$ also satisfies $V\Omega V' = I$ provided $H'H = HH' = I$.) This is, of course, because the $W$-weighted $L_2$ distance $\|Wx\|_2 = (x'W'Wx)^{1/2}$ depends only on the product $W'W$ but not on $W$ itself. But when $p \ne 2$, two different orthogonal transformations $W$ and $V$ of $\Omega^{-1/2}$ are not expected to give equivalent asymptotic distributions, even though both $W\Omega W' = I$ and $V\Omega V' = I$ hold. Here are a few examples.

(i) Our first example is for $p = 1$, $q = 2$, and $k = 1$. Suppose that it is known that $(y_{1i}, y_{2i})$ is i.i.d. across $i$ with mean $(\theta_0, 2\theta_0)$ and covariance $I$. Then the moment condition is $E(y_{1i} - \theta_0, y_{2i} - 2\theta_0)' = 0$, and therefore $D = -(1\;\, 2)'$ and $\Omega = I$. Consider two weight matrices:

$$W = I \quad \text{and} \quad V = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}. \tag{1.13}$$

It can be seen that $V$ is an optimal weight matrix here for $p \in [1, \infty)$, since the same limit distribution as for optimal $L_2$-GMM is obtained using $V$. In the case $W = I$, the $W$-weighted $L_1$-GMM estimator can be obtained by minimizing the criterion function

$$|\bar y_1 - \theta| + |\bar y_2 - 2\theta|, \tag{1.14}$$

and the minimizer equals $(1/2)\bar y_2$.
This implies that the rescaled and centered $W$-weighted $L_1$-GMM estimator is asymptotically distributed as $N(0, 1/4)$, while the rescaled and centered $V$-weighted $L_p$-GMM estimator is asymptotically distributed as $N(0, 1/5)$.

(ii) Here is a more interesting example for $p = 1$, $q = 3$, and $k = 1$. Suppose that $y_{1i}$, $y_{2i}$ and $y_{3i}$ are mutually independent and i.i.d. across $i$, with mean $\theta_0$ and variance 1. Then the moment condition is $E(y_{1i} - \theta_0, y_{2i} - \theta_0, y_{3i} - \theta_0)' = 0$, implying that $D = -(1, 1, 1)'$ and $\Omega = I$. Consider the two weight matrices $W = I$ and

$$V = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \\ -2/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6} \end{pmatrix}. \tag{1.15}$$

Again, $V$ can be shown to be an optimal weight matrix in this example. In the case of $W = I$, we are minimizing the criterion function

$$|\bar y_1 - \theta| + |\bar y_2 - \theta| + |\bar y_3 - \theta|. \tag{1.16}$$

Note that both $W$ and $V$ are chosen to be orthogonal. The $W$-weighted $L_1$-GMM estimator (after centering and scaling) converges in distribution to $\operatorname{argmin}_\xi \|N(0, I_3) + D\xi\|_1$. The minimizing argument $\hat\xi$, which is the (unique) median of three independent standard normal random variables, has distribution

$$P(\hat\xi \le x) = 6 \int_{-\infty}^{x} \Phi(t)[1 - \Phi(t)]\phi(t)\,dt = \Phi(x)^2[3 - 2\Phi(x)] \tag{1.17}$$

(see Linton (1999)). This distribution is not normal, and simulations for three standard normals illustrate that the density of the median has a sharper center and thicker tails than a (properly rescaled) normal ($N(0, 2/3)$). The result of using $V$ as the weight matrix is different. We have $VD = (-\sqrt{3}\;\, 0\;\, 0)'$, and the $V$-weighted $L_1$-GMM estimator (after centering and scaling) converges in distribution to

$$\operatorname*{argmin}_\xi \|Z + VD\xi\|_1 = \operatorname*{argmin}_\xi \{ |Z_1 - \sqrt{3}\,\xi| + |Z_2| + |Z_3| \}$$

where $Z = (Z_1, Z_2, Z_3)' \sim N(0, I)$. Note that basically, this optimal weight matrix eliminates two out of the three absolute value elements of the criterion function of Equation (1.16). The solution $\hat\xi$ is distributed $N(0, 1/3)$.
This example shows that two weight matrices $W$ and $V$ that satisfy $W\Omega W' = I$ and $V\Omega V' = I$ can give asymptotics that differ not only in variance but also in the type of limit distribution, since one distribution is non-normal while the other is normal.

(iii) The only tractable example for $p > 2$ that we could find is the following. Consider the above case of $q = 3$, $k = 1$, $\Omega = I$, and $D = -(1\;\, 1\;\, 1)'$. The weight matrix $V$ of (1.15) is again optimal, and the $V$-weighted $L_3$-GMM estimator is asymptotically normal. In the case when the weight is $W = I$, so that the objective function to be minimized is

$$|\bar y_1 - \theta|^3 + |\bar y_2 - \theta|^3 + |\bar y_3 - \theta|^3, \tag{1.18}$$

the $W$-weighted $L_3$-GMM estimator (after centering and rescaling) converges in distribution to $\hat\xi = \operatorname{argmin}_\xi \|Y + D\xi\|_3$ where $Y = (Y_1, Y_2, Y_3)' \sim N(0, I_3)$. Let $(Y_{(1)}, Y_{(2)}, Y_{(3)})$ be the order statistic of $(Y_1, Y_2, Y_3)$, and $(\delta_{(1)}, \delta_{(2)}, \delta_{(3)})$ be the order statistic of $(|Y_1 - Y_2|, |Y_2 - Y_3|, |Y_3 - Y_1|)$. Then it turns out that

$$\hat\xi = \bar Y - \operatorname{sgn}(Y_{(1)} + Y_{(3)} - 2Y_{(2)}) \left[ \tfrac{2}{3}(\delta_{(3)} + \delta_{(2)}) - (2\delta_{(3)}\delta_{(2)})^{1/2} \right] = Y_{(2)} - \operatorname{sgn}(Y_{(1)} + Y_{(3)} - 2Y_{(2)}) \left[ \delta_{(3)} - (2\delta_{(3)}\delta_{(2)})^{1/2} \right] \tag{1.19}$$

where $\operatorname{sgn}(a) = 1\{a > 0\} - 1\{a < 0\}$ and $\bar Y = (Y_1 + Y_2 + Y_3)/3$. In simulations, this distribution cannot be distinguished from a normal.

The natural question now arises whether we can get optimality by a nonsingular weight matrix $W$ satisfying $W\Omega W' = I$. In short, the answer is yes provided $D'\Omega^{-1}D$ equals a scalar or a scalar matrix (a scalar times the identity matrix). The question here is whether we can construct $(\Omega^{-1}D \;\vdots\; W_2)'$ (where $W_2'D = 0$) by an orthogonal transformation of $\Omega^{-1/2}$, that is, whether there exists an orthogonal matrix $H$ of size $q \times q$ such that $H\Omega^{-1/2} = \lambda(\Omega^{-1}D \;\vdots\; W_2)'$ and $W_2'D = 0$. If such a matrix $H$ exists, it will have the form $H = \lambda(\Omega^{-1/2}D \;\vdots\; \Omega^{1/2}W_2)'$ for which (a) $W_2'D = 0$, (b) $H$ is nonsingular, i.e., $|H| \ne 0$, and (c) $HH' = I$, i.e.,

$$HH' = \lambda^2 \begin{pmatrix} D'\Omega^{-1}D & D'W_2 \\ W_2'D & W_2'\Omega W_2 \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}. \tag{1.20}$$

(a) imposes $(q - k)k$ restrictions, and (b) imposes 1 extra restriction.
When $k = 1$, (c) is equivalent to $W_2'\Omega W_2 = (D'\Omega^{-1}D)I_{q-1}$ due to (a), which imposes $(q - 1)(q - 2)/2$ more restrictions. Therefore, when $k = 1$, we have $q(q - 1) + 1$ free parameters (for $W_2$ and $\lambda$) and $(q - 1) + 1 + (q - 1)(q - 2)/2 = q(q - 1)/2 + 1$ restrictions. So the number of parameters to be set is greater than or equal to the number of restrictions, whence we conclude that we can find $W_2$ satisfying (a), (b), and (c). When $k > 1$, (c) cannot be satisfied unless $D'\Omega^{-1}D$ is a scalar matrix, but if $D'\Omega^{-1}D$ is so, $W_2$ and $\lambda$ satisfying (a), (b), and (c) can be found. The $V$ matrices in the examples above are constructed in this way and are optimal for those problems. Note that this rule does not depend on the specific value of $p$, and that the reason the weight $W$ satisfying $W\Omega W' = I$ is optimal for $p = 2$ does not lie in the fact that the optimal weight of type $D'\Omega^{-1}$ can be obtained by an orthogonal transformation of $\Omega^{-1/2}$, but in the specific properties of the $L_2$ distance.

1.5 Conclusion

In this chapter we derived an abstract expression for the limit distribution of estimators which minimize the $L_p$ distance between population moments and sample moments, as follows:

$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} \operatorname*{argmin}_{\xi \in \mathbb{R}^k} \|Y + D\xi\|_p \tag{1.21}$$

where $Y \sim N[0, E g(y_i, \theta_0) g(y_i, \theta_0)']$ and $D = E(\partial/\partial\theta') g(y_i, \theta_0)$. This asymptotic representation allows a generalization of the well-known GMM framework of Hansen (1982) towards the $L_p$ distance. As mentioned in the introduction, Manski (1983) and Newey (1988) generalized GMM to allow an arbitrary distance (or, more generally, discrepancy) function. Unfortunately, they need second order differentiability of the distance function and nonsingularity of a Hessian matrix evaluated at the true parameter. Only the $L_2$ distance satisfies these conditions among all $L_p$ distances.

However, our analysis cannot give an explicit form for the asymptotic distribution, but only allows the above abstract representation in terms of the argmin functional.
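Although no explicit form is available in general, the argmin representation is easy to simulate. The sketch below is an illustration added here, not part of the original analysis: it takes the setting of examples (ii) and (iii) ($q = 3$, $k = 1$, $\Omega = I$, $D = -(1, 1, 1)'$, $W = I$), draws $Y \sim N(0, I_3)$, and computes the minimizer of $\sum_i |Y_i - \xi|^p$ for $p = 1$ (the sample median) and $p = 3$ (the closed form (1.19)). The function names, the ternary-search minimizer used as a cross-check, and the number of replications are all assumptions made for this illustration.

```python
import math
import random
import statistics


def lp_argmin(y, p, tol=1e-10):
    """Minimise sum_i |y_i - xi|**p over scalar xi by ternary search.

    The objective is convex for p >= 1 and its minimiser lies in
    [min(y), max(y)], so ternary search on that interval is valid.
    """
    lo, hi = min(y), max(y)
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1 = sum(abs(v - m1) ** p for v in y)
        f2 = sum(abs(v - m2) ** p for v in y)
        if f1 <= f2:
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def l3_closed_form(y):
    """Closed-form minimiser of sum_i |y_i - xi|**3 for three points, as in (1.19)."""
    a, b, c = sorted(y)                 # order statistics Y(1), Y(2), Y(3)
    d3 = c - a                          # largest pairwise distance, delta(3)
    d2 = max(b - a, c - b)              # second largest, delta(2)
    t = a + c - 2.0 * b
    sgn = (t > 0) - (t < 0)
    return b - sgn * (d3 - math.sqrt(2.0 * d3 * d2))


def simulate_variances(reps=20000, seed=0):
    """Monte Carlo variances of the limit variable for p = 1 and p = 3 (W = I)."""
    rng = random.Random(seed)
    s1 = s3 = q1 = q3 = 0.0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(3)]
        x1 = statistics.median(y)       # p = 1: median of three normals
        x3 = l3_closed_form(y)          # p = 3: closed form
        s1 += x1
        q1 += x1 * x1
        s3 += x3
        q3 += x3 * x3
    var1 = q1 / reps - (s1 / reps) ** 2
    var3 = q3 / reps - (s3 / reps) ** 2
    return var1, var3


if __name__ == "__main__":
    v1, v3 = simulate_variances()
    print("variance, p = 1:", v1)
    print("variance, p = 3:", v3)
    print("optimal bound (D' Omega^{-1} D)^{-1}:", 1.0 / 3.0)
```

Both simulated variances can be compared with the optimal value $1/3$ attained by the $V$-weighted estimators of Section 1.4; `lp_argmin` can be used to check the closed form numerically on individual draws.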
Nonetheless, our method directly supports the result of Chamberlain (1987) that the optimal $L_2$-GMM estimator is efficient among the class of $L_p$-GMM estimators. Interestingly, our analysis reduced the analysis of efficiency issues of $L_p$-GMM estimators to the analysis of the small sample properties of estimators minimizing the $L_p$ distance between $Y$ and $-D\xi$, i.e., $\operatorname{argmin}_{\xi \in \mathbb{R}^k} \|Y + D\xi\|_p$.

As a final remark, note that it is interesting to consider potential robustness properties of the $L_p$-GMM procedure. The asymptotic results presented in this chapter all rely on central limit theorems and the existence of second moments, so in this sense, we probably should not expect the $L_p$-GMM method to have robustness properties of any type. However, since the objective function in the case of $L_1$-GMM effectively puts less weight on “outlier” moments, one might expect that $L_1$-GMM may be less vulnerable to the inclusion of an incorrect moment condition than “standard” $L_2$-GMM estimators. No attempt will be made in this chapter, however, to formalize this intuition.

1.A Mathematical Appendix

In order to establish the theorems, we will need several results that will be stated as lemmas. Lemma 1.5 is used to prove Theorem 1.3 (the asymptotic unbiasedness of $L_p$-GMM estimators).

Lemma 1.5. Let a random vector $Y$ in $\mathbb{R}^q$ with finite $q$ have a normal distribution. Let $D$ be a real nonrandom matrix of size $q \times k$ ($q \ge k$) with full column rank. Then for any $p \in [1, \infty)$,

$$\hat\xi = \operatorname*{argmin}_{\xi \in \mathbb{R}^k} \|Y + D\xi\|_p \tag{1.22}$$

will have a well-defined finite covariance matrix.

Proof. First note that, because $D$ has full column rank under Assumption 1.1,

$$\hat\xi'\hat\xi \le \hat\xi'D'D\hat\xi / \lambda_{\min}(D'D) \le (\|Y + D\hat\xi\|_2 + \|Y\|_2)^2 / \lambda_{\min}(D'D) \le [\ldots]$$

[...] $\|E g(y_i, \theta)\|_p$ a.s. uniformly on $\Theta$, since $\|\cdot\|_p$ is continuous. Finally, this uniform convergence result and the uniqueness of $\theta_0$ by condition (iii) of Assumption 1.1 give the stated result by Theorem 4.2.1 of Bierens (1994).
□

To prove the main assertion of Theorem 1.2, we will use Theorem 2.7 of Kim and Pollard (1990). We restate Kim and Pollard’s theorem as our next lemma.

Lemma 1.7. Let $Q, Q_1, Q_2, \ldots$ be real-valued random processes on $\mathbb{R}^k$ with continuous paths, and let $\hat\xi_n$ be random vectors in $\mathbb{R}^k$, such that
(i) $Q(\xi) \to \infty$ as $|\xi| \to \infty$;
(ii) $Q(\cdot)$ achieves its minimum at a unique point in $\mathbb{R}^k$;
(iii) $Q_n$ converges weakly to $Q$ on any set $C = [-M, M]^k$;
(iv) $\hat\xi_n = O_p(1)$;
(v) $\hat\xi_n$ minimizes $Q_n(\xi)$.
Then $\hat\xi_n \xrightarrow{d} \operatorname{argmin}_{\xi \in \mathbb{R}^k} Q(\xi)$.

Proof. See Theorem 2.7 of Kim and Pollard (1990). □

To apply Kim and Pollard’s theorem and show that its conditions are satisfied in our situation, we need the following three lemmas. For these lemmas, we need the following definitions. Define

$$\hat\xi_n = n^{1/2}(\hat\theta_n - \theta_0), \quad \text{for } n = 1, 2, \ldots, \tag{1.25}$$

where $\hat\theta_n$ is the $L_p$-GMM estimator. Define $\mathbb{R}^q$-valued random functions $h, h_1, h_2, \ldots$ by

$$h_n(\xi) = \begin{cases} n^{1/2}\bar g(\theta_0 + \xi n^{-1/2}), & \text{if } \theta_0 + \xi n^{-1/2} \in \Theta \\ n^{1/2}\bar g(\bar\theta_n), & \text{otherwise} \end{cases} \tag{1.26}$$

where $\bar\theta_n = \operatorname{argmax}_{\theta \in \Theta} \|\bar g(\theta)\|_p$, for $n = 1, 2, \ldots$, and $h(\xi) = Y + D\xi$ where $Y$ is an $\mathbb{R}^q$-valued random vector distributed $N(0, \Omega)$. Let

$$Q_n(\xi) = \|h_n(\xi)\|_p \quad \text{and} \quad Q(\xi) = \|h(\xi)\|_p. \tag{1.27}$$

The lemmas that we need for the proof of our central result are then the following:

Lemma 1.8. Suppose the conditions of Assumption 1.1 are satisfied. Then $n^{1/2}(\hat\theta_n - \theta_0) = O_p(1)$.

Proof. The Taylor expansion of $\bar g(\hat\theta_n)$ around $\theta = \theta_0$ implies that

$$\bar g(\hat\theta_n) = \bar g(\theta_0) + (\partial/\partial\theta')\bar g(\tilde\theta_n)(\hat\theta_n - \theta_0), \tag{1.28}$$

where $\tilde\theta_n$ is a mean value in between $\hat\theta_n$ and $\theta_0$. From the above Taylor series, from the triangle inequality for the $L_p$ norm, and from the fact that $\hat\theta_n$ minimizes $\|\bar g(\theta)\|_p$, we have

$$\|n^{1/2}(\partial/\partial\theta')\bar g(\tilde\theta_n)(\hat\theta_n - \theta_0)\|_p \le \|n^{1/2}\bar g(\hat\theta_n)\|_p + \|n^{1/2}\bar g(\theta_0)\|_p \le 2\|n^{1/2}\bar g(\theta_0)\|_p. \tag{1.29}$$

But condition (vi) of Theorem 1.2 implies that $n^{1/2}\bar g(\theta_0)$ converges in distribution by the central limit theorem, and therefore is $O_p(1)$. Therefore,

$$\|n^{1/2}(\partial/\partial\theta')\bar g(\tilde\theta_n)(\hat\theta_n - \theta_0)\|_p = O_p(1). \tag{1.30}$$

Condition (vii) of Theorem 1.2 implies that $(\partial/\partial\theta')\bar g(\theta)$ follows a strong uniform law of large numbers, which combined with the consistency of $\hat\theta_n$ implies that $(\partial/\partial\theta')\bar g(\tilde\theta_n) \xrightarrow{a.s.} E(\partial/\partial\theta')g(y_i, \theta_0) = D$. Now let $\hat D = (\partial/\partial\theta')\bar g(\tilde\theta_n)$. Then it follows that $\hat D'\hat D \xrightarrow{a.s.} D'D$. Since $D'D$ is strictly positive definite, $\hat D'\hat D$ becomes strictly positive definite for $n$ large enough. Therefore, for $n$ large enough,

$$n(\hat\theta_n - \theta_0)'(\hat\theta_n - \theta_0) \le n(\hat\theta_n - \theta_0)'\hat D'\hat D(\hat\theta_n - \theta_0)/\hat\lambda_{\min}, \tag{1.31}$$

where $\hat\lambda_{\min}$ is the smallest eigenvalue of $\hat D'\hat D$. Because $\hat\lambda_{\min} \xrightarrow{a.s.} \lambda_{\min} > 0$, where $\lambda_{\min}$ is the smallest eigenvalue of $D'D$, we get $\hat\lambda_{\min} \ge 0.5\lambda_{\min}$ eventually (for $n$ large enough) almost surely. Therefore, as $n$ increases, the right hand side of (1.31) eventually becomes less than $4n(\hat\theta_n - \theta_0)'\hat D'\hat D(\hat\theta_n - \theta_0)/\lambda_{\min}$. By Equation (1.30) and because of the equivalence of $L_p$ and $L_q$ norms for $p, q \in [1, \infty)$, this expression is $O_p(1)$, which completes the proof. □

Lemma 1.9. Consider random functions $Q, Q_1, Q_2, \ldots$ defined by (1.27). Under Assumption 1.1, the finite-dimensional distributions of $Q_n$ converge to the finite-dimensional distributions of $Q$.

Proof. With fixed $\xi$, condition (ii) of Assumption 1.1 ($\theta_0$ is an interior point of $\Theta$) ensures that $\theta_0 + \xi n^{-1/2}$ belongs to $\Theta$ for $n$ large enough. When this happens, by the Taylor expansion,

$$h_n(\xi) = n^{1/2}\bar g(\theta_0 + \xi n^{-1/2}) = n^{1/2}\bar g(\theta_0) + (\partial/\partial\theta')\bar g(\theta_0 + \tilde\xi n^{-1/2})\xi, \tag{1.32}$$

with $\tilde\xi$ lying in between $\xi$ and 0. Condition (vi) of Assumption 1.1 (finiteness of the second moment of $g(y_i, \theta_0)$) implies that $n^{1/2}\bar g(\theta_0) \xrightarrow{d} Y$, and condition (vii) of Theorem 1.2 implies that $(\partial/\partial\theta')\bar g(\theta_0 + \tilde\xi n^{-1/2})\xi \to D\xi$ a.s., similarly to the proof of Lemma 1.8.

To conclude the proof and show the convergence of the finite-dimensional distributions of $h_n$ to those of $h$, we can use the Cramér-Wold device (see for example Billingsley (1968), Theorem 7.7), which states that

$$(h_n(\xi_1)', \ldots, h_n(\xi_r)')' \xrightarrow{d} (h(\xi_1)', \ldots, h(\xi_r)')' \tag{1.33}$$

if and only if

$$\sum_{j=1}^r \lambda_j' h_n(\xi_j) \xrightarrow{d} \sum_{j=1}^r \lambda_j' h(\xi_j) \tag{1.34}$$

for each $\lambda_1 \in \mathbb{R}^q, \ldots, \lambda_r \in \mathbb{R}^q$.
And (1.34) is easily shown using the result of the first part of this proof. Finally, note that since $\|\cdot\|_p$ is continuous, the finite-dimensional distributions of $Q_n = \|h_n\|_p$ converge to those of $Q = \|h\|_p$ by the continuous mapping theorem. □

Lemma 1.10. Under Assumption 1.1, $Q_n(\cdot)$ defined by Equation (1.27) is stochastically equicontinuous on any set $\Xi = [-M, M]^k$.

Proof. Using the triangle inequality for $\|\cdot\|_p$, we have

$$|Q_n(\xi_1) - Q_n(\xi_2)| = \big|\, \|h_n(\xi_1)\|_p - \|h_n(\xi_2)\|_p \big| \le \|h_n(\xi_1) - h_n(\xi_2)\|_p = \|(\partial/\partial\theta')\bar g(\theta_0 + \tilde\xi_1 n^{-1/2})\xi_1 - (\partial/\partial\theta')\bar g(\theta_0 + \tilde\xi_2 n^{-1/2})\xi_2\|_p \tag{1.35}$$

where $\tilde\xi_i$ lies in between $\xi_i$ and 0 for $i = 1, 2$. By the strong uniform law of large numbers for $(\partial/\partial\theta')\bar g(\theta)$ and the convergence to zero of $\tilde\xi_i n^{-1/2}$ uniformly over all $\xi_1$ and $\xi_2$,

$$\sup_{\xi_1 \in \Xi,\, |\xi_1 - \xi_2| < \delta} \|(\partial/\partial\theta')\bar g(\theta_0 + \tilde\xi_1 n^{-1/2})\xi_1 - (\partial/\partial\theta')\bar g(\theta_0 + \tilde\xi_2 n^{-1/2})\xi_2\|_p \xrightarrow{a.s.} \sup_{\xi_1 \in \Xi,\, |\xi_1 - \xi_2| < \delta} \|D(\xi_1 - \xi_2)\|_p \tag{1.36}$$

under the conditions of Assumption 1.1. Therefore, by nonsingularity of $D'D$, it follows that for all $\eta > 0$

$$\lim_{\delta \to 0} \limsup_{n \to \infty} P\Big( \sup_{\xi_1 \in \Xi,\, |\xi_1 - \xi_2| < \delta} |Q_n(\xi_1) - Q_n(\xi_2)| > \eta \Big) = 0, \tag{1.37}$$

which is the stochastic equicontinuity condition. □

Proof of Theorem 1.2. The strong consistency result of this theorem is proven in Lemma 1.6. For the proof of the main assertion of this theorem, we will show that for the $Q_n$, $Q$ and $\hat\xi_n$ as defined above, all the conditions of Lemma 1.7 are implied by the conditions of Theorem 1.2. First, note that $Q_n$, $Q$, and $\hat\xi_n$, defined by (1.25) and (1.27), satisfy conditions (i)-(v) of Lemma 1.7 under the conditions of Theorem 1.2. Condition (v) of Lemma 1.7 is guaranteed by the definitions of $\hat\theta_n$, $\hat\xi_n$, and $Q_n$. It is also not difficult to see that condition (i) of Lemma 1.7 is trivially satisfied since $D$ is of full column rank. And condition (ii) of Lemma 1.7 is just what is assumed in condition (viii) of Theorem 1.2. The weak convergence condition is verified by showing stochastic equicontinuity and finite-dimensional convergence, which together with compactness of the parameter space is well known to imply weak convergence.
Lemmas 1.8, 1.9, and 1.10 therefore ensure that the conditions of Lemma 1.7 are all implied by the conditions of Theorem 1.2, and therefore convergence in distribution of our estimator is proven by invoking Lemma 1.7. □

Proof of Theorem 1.3. By Lemma 1.5, $\hat\xi$ has a finite mean. And by Theorem 1.2, $n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} \hat\xi = \arg\min_{\xi}\|Y + D\xi\|_p$, where $Y \sim N(0, \Omega)$ and $D = E(\partial/\partial\theta')g(y_i, \theta_0)$. From the symmetry of $Y$, it follows that $\|Y + D\xi\|_p$ is distributed identically to $\|Y + D(-\xi)\|_p$, which implies identical distributions of $\hat\xi$ and $-\hat\xi$. Therefore, the mean of $\hat\xi$ is $0$. □

Proof of Theorem 1.4. Let $\hat\theta_n$ be the $W$-weighted $L_p$-GMM estimator. By Theorem 1.2,

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} \arg\min_{\xi}\|W(Y + D\xi)\|_p. \tag{1.38}$$

So the problem here is to show that $\hat\xi_2 = \arg\min_{\xi}\|\Omega^{-1/2}(Y + D\xi)\|_2$ has smaller variance than any other $\hat\xi_p = \arg\min_{\xi}\|W(Y + D\xi)\|_p$. Now, let us view the minimization problem $\min_{\xi}\|Y + D\xi\|_p$ as generating estimators $\hat\xi_p$ of the unknown parameter $\xi$, where $Y \sim N(-D\xi, \Omega)$ with known $\Omega$. The result of Theorem 1.3 now states that all $L_p$-GMM estimators will be asymptotically unbiased, and the argument can be easily extended to show global unbiasedness of $\hat\xi_p$ for $\xi$ (as required for the application of the Cramér-Rao lower variance bound). The likelihood function

$$L(Y, D; \xi) = (2\pi)^{-q/2}|\Omega|^{-1/2}\exp\{-(1/2)(Y + D\xi)'\Omega^{-1}(Y + D\xi)\} \tag{1.39}$$

satisfies all the required regularity conditions for the Cramér-Rao inequality (see Theil (1971), p. 384). It now follows that the asymptotic distribution of the optimal $L_2$-GMM estimator,

$$\arg\min_{\xi\in\mathbb{R}^k}\|\Omega^{-1/2}(Y + D\xi)\|_2 = -(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}Y, \tag{1.40}$$

attains the Cramér-Rao variance lower bound of $(D'\Omega^{-1}D)^{-1}$, since it equals the maximum likelihood estimator. The result then follows.
□

Chapter 2

The Asymptotic Distribution of the Instrumental Variable Estimators When the Instruments Are Not Correlated with the Regressors

2.1 Introduction

A number of recent papers, including Bound, Jaeger and Baker (1995) and Staiger and Stock (1997), have considered instrumental variable (IV) estimators when the instruments are weak, in the sense that the correlation between the instruments and the regressors is low. In this chapter, we consider the extreme case that the instruments are completely irrelevant. In this case we can prove the following interesting result: the mean of the asymptotic distribution of the IV estimator is the same as the probability limit of the OLS estimator. Thus, as might be expected, irrelevant instruments do not remove the least squares bias.

To be specific, consider the linear model $y = X\beta + \varepsilon$ (in matrix notation), where $\varepsilon$ is a $T \times 1$ random vector with mean zero, $X$ is a $T \times K$ random matrix of regressors, and $\beta$ is a $K \times 1$ parameter. It is well known that when $X_t$ is correlated with $\varepsilon_t$, the ordinary least squares (OLS) estimator is not consistent. More specifically, under the regularity conditions that ensure the convergence of the statistics $T^{-1}X'X$ and $T^{-1}X'\varepsilon$ in probability, the OLS estimator converges in probability as $T \to \infty$ to $\beta_0 + (EX_tX_t')^{-1}EX_t\varepsilon_t$, which is different from $\beta_0$, the true parameter, unless $EX_t\varepsilon_t = 0$.

To obtain a consistent estimator, one possibility is instrumental variable estimation. Good instruments $Z$ ($T \times L$) are those which satisfy: (i) $T^{-1}Z'Z$ converges in probability to a nonrandom, nonsingular matrix; (ii) $T^{-1}Z'X$ converges in probability to a nonrandom matrix with full column rank; (iii) $T^{-1/2}Z'\varepsilon$ converges in distribution to a normal random vector with zero mean. When the instruments are good, the IV estimator is consistent and asymptotically normal. Here we are concerned with the case that condition (ii) fails.
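The result announced in this introduction can be illustrated numerically. The following is a minimal Monte Carlo sketch, not part of the original analysis. It assumes a single regressor with no exogenous regressors, jointly normal $(X_t, \varepsilon_t)$ with unit variances and correlation `rho`, and instruments drawn independently of both; the function name `simulate_irrelevant_iv` and all parameter values are hypothetical choices made for illustration. Under these assumptions the OLS probability limit is `beta0 + rho`, and the average of the IV estimates across replications should be close to the same value when `L > K`.

```python
import numpy as np

def simulate_irrelevant_iv(T=200, L=8, rho=0.5, beta0=1.0, reps=2000, seed=0):
    """Simulate OLS and IV (2SLS) estimates when the instruments are irrelevant.

    Illustrative assumptions: K = 1, no exogenous regressors, unit variances.
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    iv = np.empty(reps)
    ols = np.empty(reps)
    for r in range(reps):
        # (x_t, e_t) jointly normal with correlation rho: x is endogenous.
        xe = rng.standard_normal((T, 2)) @ chol.T
        x, e = xe[:, 0], xe[:, 1]
        y = beta0 * x + e
        # Instruments are independent of both x and e, hence irrelevant.
        Z = rng.standard_normal((T, L))
        coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
        x_hat = Z @ coef                      # P_Z x
        iv[r] = (x_hat @ y) / (x_hat @ x)     # (x'P_Z x)^{-1} x'P_Z y
        ols[r] = (x @ y) / (x @ x)
    return iv, ols

if __name__ == "__main__":
    iv, ols = simulate_irrelevant_iv()
    print("OLS probability limit (beta0 + rho):", 1.0 + 0.5)
    print("mean of OLS estimates:", ols.mean())
    print("mean of IV estimates: ", iv.mean())
    print("std of IV estimates:  ", iv.std())
```

The IV estimates stay widely dispersed however large `T` is, which reflects the lack of a probability limit; only their center coincides with the OLS limit.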
Suppose that $L \ge K$, so that there are enough instruments, but the instruments ($Z$) are not strongly correlated with the regressors ($X$). Specifically, let the reduced form for $X$ be:

$$X = Z\Pi + V. \tag{2.1}$$

Staiger and Stock (1997) consider the case that $\Pi = \Pi_T = C/\sqrt{T}$, with $C$ an $L \times K$ matrix of constants. They call this the case of weak instruments. In this case the correlation between $X_t$ and $Z_t$ is of order $T^{-1/2}$, and condition (ii) fails. Staiger and Stock show that with weak instruments $\hat\beta_{IV}$, the IV estimator, does not have a probability limit but rather $\hat\beta_{IV} - \beta_0$ converges to a non-normal random variable. The mean of the asymptotic distribution of $\hat\beta_{IV} - \beta_0$ is non-zero, so that with weak instruments there is asymptotic bias. This bias is in the same direction as the bias of OLS.

In this chapter we consider the case of irrelevant instruments, which are uncorrelated with the regressors. This is a special case of Staiger and Stock, corresponding to $C = 0$ so that $\Pi = 0$ in the reduced form (2.1) for all $T$. In this case we show that the mean of the asymptotic distribution of $(\hat\beta_{IV} - \beta_0)$ is the same as $(\operatorname{plim}\hat\beta_{OLS} - \beta_0)$, the asymptotic bias of the OLS estimator.

2.2 The limit distribution

Consider a linear model in matrix notation

$$y = X^{\circ}\beta + W\gamma + \varepsilon, \tag{2.2}$$

where $y$ and $X^{\circ}$ are respectively a $T \times 1$ vector of dependent variables and a $T \times K$ matrix of the endogenous regressors, $W$ is a $T \times G$ matrix of exogenous regressors, the first column of which is a vector of ones, $\varepsilon$ is the vector of errors, and $\beta$ and $\gamma$ are the parameters to be estimated. Consider a $T \times L$ random matrix $Z^{\circ}$ of "instruments." For any matrix $A$ with full column rank, let $P_A = A(A'A)^{-1}A'$. Let $X = (I - P_W)X^{\circ}$ and $Z = (I - P_W)Z^{\circ}$. Thus $X$ is the part of the endogenous regressors not explained by the exogenous regressors, and similarly $Z$ is the part of the "instruments" not explained by the exogenous regressors. We make the following "high level" assumptions.

Assumption 2.1.
$T^{-1}X'X$, $T^{-1}\varepsilon'\varepsilon$, and $T^{-1}Z'Z$ converge in probability to finite, nonrandom, nonsingular matrices, and $T^{-1}X'\varepsilon$ converges to a nonrandom matrix.

Let $\Sigma = \operatorname{plim} T^{-1}(X, \varepsilon)'(X, \varepsilon)$. It has submatrices $\Sigma_{XX}$, $\Sigma_{X\varepsilon}$, and $\sigma_{\varepsilon\varepsilon}$, which are the probability limits of $T^{-1}X'X$, $T^{-1}X'\varepsilon$, and $T^{-1}\varepsilon'\varepsilon$, respectively. Also let $\Omega = \operatorname{plim} T^{-1}Z'Z$. Assumption 2.1 can be regarded as the implication of a law of large numbers under more primitive assumptions on the sequences. For example, when the sequence $(\varepsilon_t, X_t^{\circ\prime}, W_t', Z_t^{\circ\prime})'$ is i.i.d. and its second moment exists, $\Sigma_{XX} = EX_t^{\circ}X_t^{\circ\prime} - EX_t^{\circ}W_t'(EW_tW_t')^{-1}EW_tX_t^{\circ\prime}$, $\Sigma_{X\varepsilon} = EX_t^{\circ}\varepsilon_t - EX_t^{\circ}W_t'(EW_tW_t')^{-1}EW_t\varepsilon_t = EX_t^{\circ}\varepsilon_t$, $\sigma_{\varepsilon\varepsilon} = E\varepsilon_t^2$, and $\Omega = EZ_t^{\circ}Z_t^{\circ\prime} - EZ_t^{\circ}W_t'(EW_tW_t')^{-1}EW_tZ_t^{\circ\prime}$.

Let $\rho = \Sigma_{XX}^{-1/2}\Sigma_{X\varepsilon}\sigma_{\varepsilon\varepsilon}^{-1/2}$, which is a multivariate correlation coefficient. A key assumption is the irrelevance of $Z$ as instruments for $X$, as follows.

Assumption 2.2. $T^{-1/2}Z'(X, \varepsilon) \xrightarrow{d} \Omega^{1/2}(\xi\Sigma_{XX}^{1/2},\ \eta\sigma_{\varepsilon\varepsilon}^{1/2})$, where $\operatorname{vec}(\xi, \eta)$ is a multivariate centered normal with $E\operatorname{vec}(\xi)\operatorname{vec}(\xi)' = I$, $E\eta\eta' = I$ and $E\operatorname{vec}(\xi)\eta' = \rho\otimes I$.

Note that Assumption 2.2 implies $T^{-1}Z'X \xrightarrow{p} 0$, which may agree with an intuitive definition of irrelevant instruments. Also, this assumption can be regarded as the implication of a central limit theorem under more primitive assumptions, as above.

Now let $\hat\beta_{IV}$ be the estimate of $\beta$ in equation (2.2), when estimation is by IV using $(Z^{\circ}, W)$ as instruments. It is readily shown that

$$\hat\beta_{IV} - \beta_0 = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'\varepsilon. \tag{2.3}$$

By dividing $Z'Z$ by $T$, and $Z'X$ and $Z'\varepsilon$ by $T^{1/2}$, we observe that $\hat\beta_{IV} - \beta_0$ is a function $\varphi$ of $(T^{-1}Z'Z,\ T^{-1/2}Z'X,\ T^{-1/2}Z'\varepsilon)$, where $\varphi: \mathbb{R}^{L\times L}\times\mathbb{R}^{L\times K}\times\mathbb{R}^{L\times 1}\to\mathbb{R}^{K\times 1}$ is defined by $\varphi(\Omega, A, b) = (A'\Omega^{-1}A)^{-1}A'\Omega^{-1}b$. Obviously, $\varphi$ is measurable and is almost surely continuous in the limit. Here continuity is assured by the nonsingularity of the limit of $T^{-1}Z'Z$ and the almost sure full column rank of the limit of $T^{-1/2}Z'X$. Therefore, we apply the continuous mapping theorem to get the following result.

Theorem 2.3. Under Assumptions 2.1 and 2.2,

$$\hat\beta_{IV} \xrightarrow{d} \beta_{asy} \equiv \beta_0 + \Sigma_{XX}^{-1/2}(\xi'\xi)^{-1}\xi'\eta\,\sigma_{\varepsilon\varepsilon}^{1/2}, \tag{2.4a}$$

or equivalently,

$$\hat\delta \equiv \Sigma_{XX}^{1/2}(\hat\beta_{IV} - \beta_0)\sigma_{\varepsilon\varepsilon}^{-1/2} \xrightarrow{d} \delta_{asy} \equiv (\xi'\xi)^{-1}\xi'\eta. \tag{2.4b}$$

We note that the result in (2.4a) is the same as equation (2.5) of Staiger and Stock (1997, p. 562) when $C = 0$ (and therefore $\lambda = 0$ in their (2.3a) and (2.3b)). We now calculate the density of $\delta_{asy}$, as follows.

Theorem 2.4. Under Assumptions 2.1 and 2.2, the density of $\delta_{asy}$ is

$$f(d) = C_{K,L}\,(1-\rho'\rho)^{-K/2}\,\bigl[1 + (1-\rho'\rho)^{-1}(d-\rho)'(d-\rho)\bigr]^{-(L+1)/2}, \tag{2.5}$$

where $C_{K,L} = \pi^{-K/2}\,\Gamma\bigl(\tfrac{L+1}{2}\bigr)\,\Gamma\bigl(\tfrac{L-K+1}{2}\bigr)^{-1}$.

Proof. See Appendix. □

Given $K$ (the dimension of $X_t$) and $L$ (the dimension of $Z_t$), the density depends upon $\rho$ only. As is mentioned in Phillips (1980), this density is similar to the multivariate $t$ distribution. The first moment of $\delta_{asy}$ exists as long as $L$ is strictly greater than $K$, and more generally its integer moments exist up to the degree of over-identification. (See Phillips (1980, p. 870).)

2.3 The relationship with the OLS estimator

We are now in a position to prove our main result.

Theorem 2.5. Suppose $L > K$. Then under Assumptions 2.1 and 2.2, the mean of $\beta_{asy}$ is equal to the probability limit of the OLS estimator.

Proof. We observe that the density of $\delta_{asy}$, defined in (2.4b), is symmetric around $\rho$, the correlation coefficient of the endogenous regressors and the error. Furthermore, if $L > K$, the mean of $\delta_{asy}$ exists. Therefore, if $L > K$, $E\delta_{asy} = \rho$. Then

$$E[\beta_{asy}] = \beta_0 + \Sigma_{XX}^{-1/2}\,E\delta_{asy}\,\sigma_{\varepsilon\varepsilon}^{1/2} = \beta_0 + \Sigma_{XX}^{-1/2}\rho\,\sigma_{\varepsilon\varepsilon}^{1/2} = \beta_0 + \Sigma_{XX}^{-1}\Sigma_{X\varepsilon} = \operatorname{plim}\hat\beta_{OLS}. \tag{2.6}$$

□

An alternative proof that does not depend on the exact form of the density of $\delta_{asy}$ is as follows. When the mean of $\delta_{asy} = (\xi'\xi)^{-1}\xi'\eta$ exists,

$$E(\xi'\xi)^{-1}\xi'\eta = E\bigl[(\xi'\xi)^{-1}\xi'E(\eta\mid\xi)\bigr] \tag{2.7}$$

by the law of iterated expectations. But since $E\operatorname{vec}(\xi)\operatorname{vec}(\xi)'$, $E\operatorname{vec}(\xi)\eta'$, and $E\eta\eta'$ are respectively equal to $I\otimes I$, $\rho\otimes I$, and $I$,

$$E[\eta\mid\operatorname{vec}(\xi)] = (\rho'\otimes I)(I\otimes I)^{-1}\operatorname{vec}\xi = \xi\rho. \tag{2.8}$$

(For the operations involved with the Kronecker product and vec operators, see Magnus and Neudecker (1988, Ch. 2).) Hence, $E(\xi'\xi)^{-1}\xi'\eta = \rho$. It follows that $E\beta_{asy} = \beta_0 + \Sigma_{XX}^{-1/2}\,E\delta_{asy}\,\sigma_{\varepsilon\varepsilon}^{1/2} = \beta_0 + \Sigma_{XX}^{-1}\Sigma_{X\varepsilon}$, which is equal to the probability limit of the OLS estimator, as in the original proof. □

2.4 Conclusion

In this chapter, we answered some questions about the IV estimator using irrelevant instruments in linear models. We saw that the IV estimator is not consistent but converges to a nondegenerate distribution which is similar to a multivariate $t$ distribution. When the number of instruments (excluding the exogenous regressors) is strictly greater than the number of endogenous regressors, the mean of the asymptotic distribution exists and is equal to the probability limit of the OLS estimator.

2.A Proof of Theorem 2.4

First, observe that the rows of the $L\times(K+1)$ matrix $(\xi, \eta)$ are a random sample from $N(0, J)$, where $J = \begin{pmatrix} I & \rho \\ \rho' & 1\end{pmatrix}$. Thus, $(\xi,\eta)'(\xi,\eta)$ has a $(K+1)$-dimensional central Wishart distribution with $L$ degrees of freedom and covariance matrix $J$. When $L \ge K+1$, its density at the point $\xi'\xi = B_1$, $\xi'\eta = b_2$, and $\eta'\eta = b_3$ is

$$g(B_1, b_2, b_3) = 2^{-L(K+1)/2}\,\Gamma_{K+1}\bigl(\tfrac{L}{2}\bigr)^{-1}(1-\rho'\rho)^{-L/2}\,|B|^{(L-K-2)/2}\exp\{-\tfrac12\operatorname{tr}J^{-1}B\}, \tag{2.9}$$

where $B = \begin{pmatrix} B_1 & b_2 \\ b_2' & b_3\end{pmatrix}$ and $\Gamma_n$ is the multivariate gamma function defined as

$$\Gamma_n(a) = \pi^{n(n-1)/4}\prod_{i=1}^{n}\Gamma\bigl(a - \tfrac{i-1}{2}\bigr). \tag{2.10}$$

(See Johnson and Kotz (1972, p. 162).) Following Phillips (1980), consider the one-to-one mapping $\theta$ on the set of $(K+1)$-dimensional, real, symmetric, positive definite matrices defined as

$$\theta: \begin{pmatrix} B_1 & b_2 \\ b_2' & b_3\end{pmatrix} \mapsto \begin{pmatrix} B_1 & B_1^{-1}b_2 \\ b_2'B_1^{-1} & b_3 - b_2'B_1^{-1}b_2\end{pmatrix}. \tag{2.11}$$

Then the inverse $\theta^{-1}$ is

$$\theta^{-1}: \begin{pmatrix} A_1 & d \\ d' & a_3\end{pmatrix} \mapsto \begin{pmatrix} A_1 & A_1d \\ d'A_1 & a_3 + d'A_1d\end{pmatrix}, \tag{2.12}$$

whose Jacobian turns out to be $|A_1|$.
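Therefore, by the change-of-variable technique, the density of the symmetric random matrix, which is defined such that the upper-left $K\times K$ diagonal block is $\xi'\xi$, the lower-right $1\times 1$ diagonal block is $\eta'\eta - \eta'\xi(\xi'\xi)^{-1}\xi'\eta$, and the upper-right $K\times 1$ off-diagonal block is $\delta_{asy} = (\xi'\xi)^{-1}\xi'\eta$, evaluated at the point such that

$$\xi'\xi = A_1,\quad (\xi'\xi)^{-1}\xi'\eta = d,\quad\text{and}\quad \eta'\eta - \eta'\xi(\xi'\xi)^{-1}\xi'\eta = a_3, \tag{2.13}$$

where $A_1$ is symmetric, positive definite and $a_3$ is positive, becomes

$$h(A_1, d, a_3) = g(A_1, A_1d, a_3 + d'A_1d)\cdot|A_1| = 2^{-L(K+1)/2}\,\Gamma_{K+1}\bigl(\tfrac{L}{2}\bigr)^{-1}(1-\rho'\rho)^{-L/2}\cdot H_1(A_1)\cdot H_2(a_3), \tag{2.14}$$

where

$$H_1(S) = |S|^{(L-K)/2}\exp\bigl\{-\tfrac12\operatorname{tr}S\,[I + (1-\rho'\rho)^{-1}(d-\rho)(d-\rho)']\bigr\} \tag{2.15}$$

and

$$H_2(a) = a^{(L-K-2)/2}\exp\bigl\{-\tfrac12(1-\rho'\rho)^{-1}a\bigr\}. \tag{2.16}$$

The factorization uses $|B| = |A_1|\,a_3$ and $\operatorname{tr}J^{-1}B = \operatorname{tr}A_1[I + (1-\rho'\rho)^{-1}(d-\rho)(d-\rho)'] + (1-\rho'\rho)^{-1}a_3$. The density of $\delta_{asy}$ is obtained by integrating out $a_3$ and $A_1$. For the first,

$$\int_0^{\infty}H_2(a)\,da = 2^{(L-K)/2}\,\Gamma\bigl(\tfrac{L-K}{2}\bigr)(1-\rho'\rho)^{(L-K)/2}. \tag{2.17}$$

For the second, the normalizing constant of the Wishart density gives, for any positive definite $K\times K$ matrix $\Sigma_0$ and $n > K-1$,

$$\int|S|^{(n-K-1)/2}\exp\{-\tfrac12\operatorname{tr}\Sigma_0^{-1}S\}\,dS = 2^{nK/2}\,\Gamma_K\bigl(\tfrac{n}{2}\bigr)|\Sigma_0|^{n/2}, \tag{2.18}$$

where the integral is taken over all symmetric, positive definite $K\times K$ matrices. Thus, with $n = L+1$, we have the evaluation

$$\int H_1(S)\,dS = 2^{K(L+1)/2}\,\Gamma_K\bigl(\tfrac{L+1}{2}\bigr)\,\bigl|I + (1-\rho'\rho)^{-1}(d-\rho)(d-\rho)'\bigr|^{-(L+1)/2}. \tag{2.19}$$

The desired density (2.5) is obtained by combining Equations (2.14), (2.17), and (2.19), noting that $|I + (1-\rho'\rho)^{-1}(d-\rho)(d-\rho)'| = 1 + (1-\rho'\rho)^{-1}(d-\rho)'(d-\rho)$.

Chapter 3

Estimation of a Panel Data Model with Parametric Temporal Variation in Individual Effects

3.1 Introduction

In this chapter we consider the model:

$$y_{it} = X_{it}'\beta + Z_i'\gamma + \lambda_t(\theta)\alpha_i + \varepsilon_{it},\quad i = 1,\ldots,N,\ t = 1,\ldots,T. \tag{3.1}$$

We treat $T$ as fixed, so that "asymptotic" means as $N\to\infty$. The distinctive feature of the model is the interaction between the time-varying parametric function $\lambda_t(\theta)$ and the individual effect $\alpha_i$. We consider the case that the $\alpha_i$ are "fixed effects," as will be discussed in more detail below. In this case estimation may be non-trivial due to the "incidental parameters problem" that the number of $\alpha$'s grows with sample size; see, for example, Chamberlain (1980).

Models of this form have been proposed and used in the literature on frontier production functions (measurement of the efficiency of production). For example, Kumbhakar (1990) proposed the case that $\lambda_t(\theta) = [1 + \exp(\theta_1t + \theta_2t^2)]^{-1}$, and Battese and Coelli (1992) proposed the case that $\lambda_t(\theta) = \exp[-\theta(t - T)]$. Both of these papers considered random effects models in which $\alpha_i$ is independent of $X$ and $Z$.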
In fact, both of these papers proposed specific (truncated normal) distributions for the $\alpha_i$, with estimation by maximum likelihood. The aim of the present chapter is to provide a fixed-effects treatment of models of this type.

There is also a literature on the case that the $\lambda_t$ themselves are treated as parameters. That is, the model becomes:

$$y_{it} = X_{it}'\beta + Z_i'\gamma + \lambda_t\alpha_i + \varepsilon_{it},\quad i = 1,\ldots,N,\ t = 1,\ldots,T. \tag{3.2}$$

This corresponds to using a set of dummy variables for time rather than a parametric function $\lambda_t(\theta)$, and now $\lambda_t\alpha_i$ is just the product of fixed time and individual effects. This model has been considered by Kiefer (1980), Holtz-Eakin, Newey and Rosen (1988), Lee (1991), Chamberlain (1992), Lee and Schmidt (1993) and Ahn, Lee and Schmidt (2001), among others. Lee (1991) and Lee and Schmidt (1993) have applied this model to the frontier production function problem, in order to avoid having to assume a specific parametric function $\lambda_t(\theta)$. Another motivation for the model is that a fixed-effects version allows one to control for unobservables (e.g. macro events) that are the same for each individual, but to which different individuals may react differently.

Ahn, Lee and Schmidt (2001) establish some interesting results for the estimation of model (3.2). A generalized method of moments (GMM) estimator of the type considered by Holtz-Eakin, Newey and Rosen (1988) is consistent given exogeneity assumptions on the regressors $X$ and $Z$. Least squares applied to (3.2), treating the $\alpha_i$ as fixed parameters, is consistent provided that the regressors are strictly exogenous and that the errors $\varepsilon_{it}$ are white noise. The requirement of white noise errors for consistency of least squares is unusual, and is a reflection of the incidental parameters problem. Furthermore, if the errors are white noise, then a GMM estimator that incorporates the white noise assumption dominates least squares, in the sense of being asymptotically more efficient. This is also a somewhat unusual result, since in the usual linear model with normal errors, the moment conditions implied by the white noise assumption would not add to the efficiency of estimation.

The results of Ahn, Lee and Schmidt apply only to the case that the $\lambda_t$ are unrestricted, and therefore do not apply to the model (3.1). However, in this chapter we show that essentially the same results do hold for the model (3.1). This enables us to use a parametric function $\lambda_t(\theta)$, and to test the validity of this assumption, while maintaining only weak assumptions on the $\alpha_i$. This may be very useful, especially in the frontier production function setting. Applications using unrestricted $\lambda_t$ have yielded temporal patterns of efficiency that seem unreasonably variable and in need of smoothing, which a parametric function can accomplish.

The plan of the chapter is as follows. Section 3.2 restates the model and lists our assumptions. Section 3.3 considers GMM estimation under basic exogeneity assumptions, while Section 3.4 considers GMM when we add the conditions implied by white noise errors. Section 3.5 considers least squares estimation and the sense in which it is dominated by GMM. Finally, Section 3.6 contains some concluding remarks.

3.2 The model and assumptions

The model is given in equation (3.1) above. We can rewrite it in matrix form, as follows. Let $y_i = (y_{i1},\ldots,y_{iT})'$, $X_i = (X_{i1},\ldots,X_{iT})'$, and $\varepsilon_i = (\varepsilon_{i1},\ldots,\varepsilon_{iT})'$. Thus $y_i$ is $T\times 1$, $X_i$ is $T\times K$, $\varepsilon_i$ is $T\times 1$, $\beta$ is $K\times 1$, $\gamma$ is $g\times 1$, and $\alpha_i$ is a scalar. (In this chapter, all the vectors are column vectors, and the data matrices are "vertically tall.") Define a function $\lambda:\Theta\to\mathbb{R}^T$, where $\Theta$ is a compact subset of $\mathbb{R}^p$, such that $\lambda(\theta) = (\lambda_1(\theta),\ldots,\lambda_T(\theta))'$. Note that $T$ is fixed. In matrix form, our model is:

$$y_i = X_i\beta + 1_TZ_i'\gamma + \lambda(\theta)\alpha_i + \varepsilon_i,\quad i = 1,\ldots,N. \tag{3.3}$$

$\lambda(\theta)$ must be normalized in some way, such as $\lambda(\theta)'\lambda(\theta)\equiv 1$ or $\lambda_1(\theta)\equiv 1$, to rule out trivial failure of identification arising from $\lambda(\theta) = 0$ or scalar multiplications of $\lambda(\theta)$. Here we choose the normalization $\lambda_1(\theta)\equiv 1$.

Let $W_i = (X_{i1}',\ldots,X_{iT}', Z_i')'$. We make the following "orthogonality" and "covariance" assumptions.

Assumption 3.1 (Orthogonality). $E(W_i', \alpha_i)'\varepsilon_i' = 0$.

Assumption 3.2 (Covariance). $E\varepsilon_i\varepsilon_i' = \sigma_\varepsilon^2I_T$.

Assumption 3.1 says that $\varepsilon_{it}$ is uncorrelated with $\alpha_i$, $Z_i$, and $X_{i1},\ldots,X_{iT}$, and therefore contains an assumption of strict exogeneity of the regressors. Note that it does not restrict the correlation between $\alpha_i$ and $[Z_i, X_{i1},\ldots,X_{iT}]$, so that we are in the fixed-effects framework. Assumption 3.2 asserts that the errors are white noise.

We also assume the following regularity conditions.

Assumption 3.3 (Regularity). (i) $(W_i', \alpha_i, \varepsilon_i')'$ is independently and identically distributed over $i$; (ii) $\varepsilon_i$ has finite fourth moment, and $E\varepsilon_i = 0$; (iii) $(W_i', \alpha_i)'$ has finite nonsingular second moment matrix; (iv) $EW_i(Z_i', \alpha_i)$ is of full column rank; (v) $\lambda(\theta)$ is twice continuously differentiable in $\theta$.

The first four of these conditions correspond to assumptions (BA.1)-(BA.4) of Ahn, Lee and Schmidt (2001), who give some explanation. Condition (v) is new, and self-explanatory.

3.3 GMM under the Orthogonality Assumption

Let $u_{it} = u_{it}(\beta,\gamma) = y_{it} - X_{it}'\beta - Z_i'\gamma$, and $u_i = u_i(\beta,\gamma) = (u_{i1},\ldots,u_{iT})'$. Since $u_{it} = \lambda_t(\theta)\alpha_i + \varepsilon_{it}$, it follows that $u_{it} - \lambda_t(\theta)u_{i1} = \varepsilon_{it} - \lambda_t(\theta)\varepsilon_{i1}$, which does not depend on $\alpha_i$. This is a sort of generalized within transformation to remove the individual effects. The Orthogonality Assumption (Assumption 3.1) then implies the following moment conditions:

$$EW_i[u_{it}(\beta,\gamma) - \lambda_t(\theta)u_{i1}(\beta,\gamma)] = 0,\quad t = 2,\ldots,T. \tag{3.4}$$

These moment conditions can be written in matrix form, as follows. Define $G(\theta) = [-\lambda_*(\theta), I_{T-1}]'$, where $\lambda_* = (\lambda_2,\ldots,\lambda_T)'$.
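The generalized within transformation defined above can be checked numerically. The sketch below is illustrative only: the temporal pattern $\lambda_t(\theta) = \exp(\theta(t-1))$ is a hypothetical choice made so that the normalization $\lambda_1(\theta)\equiv 1$ holds, and the function names `lam` and `G` are not from the text. It verifies that $G(\theta)'\lambda(\theta) = 0$, so that premultiplying $u_i = \lambda(\theta)\alpha_i + \varepsilon_i$ by $G(\theta)'$ removes the individual effect.

```python
import numpy as np

def lam(theta, T):
    """Hypothetical pattern lambda_t(theta) = exp(theta * (t - 1)); lambda_1 = 1."""
    t = np.arange(1, T + 1)
    return np.exp(theta * (t - 1))

def G(theta, T):
    """G(theta) = [-lambda_*(theta), I_{T-1}]', a T x (T-1) matrix."""
    lam_star = lam(theta, T)[1:]            # (lambda_2, ..., lambda_T)
    return np.vstack([-lam_star, np.eye(T - 1)])

if __name__ == "__main__":
    T, theta, alpha_i = 5, 0.3, 2.5         # illustrative values
    rng = np.random.default_rng(0)
    eps_i = rng.standard_normal(T)
    u_i = lam(theta, T) * alpha_i + eps_i   # u_i = lambda(theta) alpha_i + eps_i
    Gm = G(theta, T)
    # G'lambda = 0, so G'u_i depends on eps_i only.
    print(np.allclose(Gm.T @ lam(theta, T), 0.0))
    print(np.allclose(Gm.T @ u_i, Gm.T @ eps_i))
    # Element t-1 of G'u_i is u_it - lambda_t u_i1, as in (3.4).
    print(np.allclose(Gm.T @ u_i, u_i[1:] - lam(theta, T)[1:] * u_i[0]))
```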
The generalized within transformation corresponds to multiplication by $G(\theta)'$, and the moment conditions (3.4) can equivalently be written as follows:

$$Eb_{1i}(\beta,\gamma,\theta) = E[G(\theta)'u_i(\beta,\gamma)\otimes W_i] = 0. \tag{3.5}$$

(This corresponds to equation (7) of Ahn, Lee and Schmidt (2001), but looks slightly different because our $W_i$ is a column vector whereas theirs is a row vector.) This is a set of $(T-1)(TK+g)$ moment conditions.

Some further analysis is needed to establish that (3.5) contains all of the moment conditions implied by the Orthogonality Assumption. Let $\Sigma_{WW} = EW_iW_i'$, $\Sigma_{W\alpha} = EW_i\alpha_i$, and $\sigma_\alpha^2 = E\alpha_i^2$. Given the model (3.3), the Orthogonality Assumption holds if and only if the following moment conditions hold:

$$E[u_i(\beta,\gamma)\otimes W_i - \lambda(\theta)\otimes\Sigma_{W\alpha}] = 0. \tag{3.6}$$

We could use these moment conditions as the basis for GMM estimation. Alternatively, we can remove the parameter $\Sigma_{W\alpha}$ by applying a nonsingular linear transformation to (3.6) in such a way that the transformed set of moment conditions is separated into two subsets, where the first subset does not contain $\Sigma_{W\alpha}$ and the second subset is exactly identified for $\Sigma_{W\alpha}$, given $(\beta,\gamma,\theta)$. The following transformation accomplishes this:

$$\begin{pmatrix} G'\otimes I_d \\ \lambda'\otimes I_d\end{pmatrix}E[u_i\otimes W_i - \lambda\otimes\Sigma_{W\alpha}] = 0, \tag{3.7}$$

where $d\equiv TK+g$ for notational simplicity; similarly, $G$, $\lambda$ and $u_i$ are shortened expressions for $G(\theta)$, $\lambda(\theta)$ and $u_i(\beta,\gamma)$. This is a nonsingular transformation, since $(G, \lambda)$ is nonsingular, and therefore GMM based on (3.7) is asymptotically equivalent to GMM based on (3.6). Now split (3.7) into its two parts:

$$E(G'u_i\otimes W_i) = 0 \tag{3.8}$$

$$E(\lambda'u_i)W_i - (\lambda'\lambda)\Sigma_{W\alpha} = 0. \tag{3.9}$$

Here (3.9) is exactly identified for $\Sigma_{W\alpha}$, given $\beta$, $\gamma$ and $\theta$, in the sense that the number of moment conditions in (3.9) is the same as the dimension of $\Sigma_{W\alpha}$. Also $\Sigma_{W\alpha}$ does not appear in (3.8).
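It follows (e.g., Ahn and Schmidt (1995), Theorem 1) that the GMM estimates of $\beta$, $\gamma$ and $\theta$ from (3.8) alone are the same as the GMM estimates of $\beta$, $\gamma$ and $\theta$ if we use both (3.8) and (3.9), and estimate the full set of parameters $(\beta,\gamma,\theta,\Sigma_{W\alpha})$. But (3.8) is the same as (3.5), which establishes that (3.5) contains all the useful information about $\beta$, $\gamma$ and $\theta$ implied by the Orthogonality Assumption.

Let $\bar b_1(\beta,\gamma,\theta) = N^{-1}\sum_{i=1}^{N}b_{1i}(\beta,\gamma,\theta)$. Then the optimal GMM estimator of $\beta$, $\gamma$, and $\theta$ based on the Orthogonality Assumption solves the problem

$$\min_{\beta,\gamma,\theta}\ N\,\bar b_1(\beta,\gamma,\theta)'V_{11}^{-1}\bar b_1(\beta,\gamma,\theta), \tag{3.10}$$

where $V_{11} = Eb_{1i}b_{1i}'$ evaluated at the true parameters. As usual, $V_{11}$ can be replaced by any consistent estimate. A standard estimate would be

$$\hat V_{11} = \frac1N\sum_{i=1}^{N}b_{1i}(\tilde\beta,\tilde\gamma,\tilde\theta)\,b_{1i}(\tilde\beta,\tilde\gamma,\tilde\theta)', \tag{3.11}$$

where $(\tilde\beta,\tilde\gamma,\tilde\theta)$ is an initial consistent estimate of $(\beta,\gamma,\theta)$ such as GMM using an identity weighting matrix. Under certain regularity conditions (Hansen (1982), Assumption 3) the resulting GMM estimator is $\sqrt N$-consistent and asymptotically normal.

To express the asymptotic variance of the GMM estimator analytically, we need a little more notation. Let $S_X$ be the $T(TK+g)\times K$ selection matrix such that $X_i = (I_T\otimes W_i)'S_X$, and let $S_Z$ be the $T(TK+g)\times g$ selection matrix such that $1_TZ_i' = (I_T\otimes W_i)'S_Z$. $S_X$ and $S_Z$ have the following forms:

$$S_X = \bigl(I_K\ 0\ \cdots\ 0\ 0_{K\times g}\ \big|\ 0\ I_K\ \cdots\ 0\ 0_{K\times g}\ \big|\ \cdots\ \big|\ 0\ 0\ \cdots\ I_K\ 0_{K\times g}\bigr)' \tag{3.12}$$

$$S_Z = \bigl(0_{g\times K}\ \cdots\ 0_{g\times K}\ I_g\ \big|\ \cdots\ \big|\ 0_{g\times K}\ \cdots\ 0_{g\times K}\ I_g\bigr)'. \tag{3.13}$$

With $B_1 = E\,\partial b_{1i}/\partial(\beta',\gamma',\theta')$ evaluated at the true parameters and $\Lambda_* = \partial\lambda_*(\theta_0)/\partial\theta'$, the asymptotic variance of the GMM estimator is $(B_1'V_{11}^{-1}B_1)^{-1}$, where

$$B_1 = -\bigl[(G'\otimes\Sigma_{WW})S_X,\ (G'\otimes\Sigma_{WW})S_Z,\ \Lambda_*\otimes\Sigma_{W\alpha}\bigr]. \tag{3.14}$$

This result can be obtained either by direct calculation, or by applying the chain rule to $B_1$ calculated in Ahn, Lee and Schmidt (2001, p. 251). This asymptotic variance form is obtained from the Orthogonality Assumption only and does not need any further assumption.

A practical problem with this GMM procedure is that it is based on a rather large set of moment conditions. Some considerable simplifications are possible if we make the following assumption of no conditional heteroskedasticity (NCH) of $\varepsilon_i$:

$$E(\varepsilon_i\varepsilon_i'\mid W_i) = \Sigma_{\varepsilon\varepsilon}. \tag{NCH}$$

Under the NCH assumption,

$$V_{11} = E[G(\theta_0)'\varepsilon_i\varepsilon_i'G(\theta_0)\otimes W_iW_i'] = G(\theta_0)'\Sigma_{\varepsilon\varepsilon}G(\theta_0)\otimes\Sigma_{WW}. \tag{3.15}$$

$\Sigma_{WW}$ can be consistently estimated by $\hat\Sigma_{WW} = N^{-1}\sum_{i=1}^{N}W_iW_i'$. Also, for any sequence $(\beta_N,\gamma_N)$ that converges in probability to $(\beta_0,\gamma_0)$, we have

$$\frac1N\sum_{i=1}^{N}u_i(\beta_N,\gamma_N)u_i(\beta_N,\gamma_N)' \xrightarrow{p} \Sigma_{\varepsilon\varepsilon} + \sigma_\alpha^2\lambda(\theta_0)\lambda(\theta_0)'. \tag{3.16}$$

Since $G(\theta)'\lambda(\theta) = 0$, for any initial consistent estimate $(\tilde\beta,\tilde\gamma,\tilde\theta)$,

$$G(\tilde\theta)'\Bigl[N^{-1}\sum_{i=1}^{N}u_i(\tilde\beta,\tilde\gamma)u_i(\tilde\beta,\tilde\gamma)'\Bigr]G(\tilde\theta) \tag{3.17}$$

will consistently estimate $G(\theta_0)'\Sigma_{\varepsilon\varepsilon}G(\theta_0)$. Thus it is easy to construct a consistent estimate of $V_{11}$ as given in (3.15).

In order to consistently estimate the asymptotic variance under NCH, we need to estimate $\Sigma_{WW}$, $\Sigma_{W\alpha}$, and $G'\Sigma_{\varepsilon\varepsilon}G$. Estimation of $\Sigma_{WW}$ and $G'\Sigma_{\varepsilon\varepsilon}G$ was discussed above. We can obtain an estimate of $\Sigma_{W\alpha}$ from the GMM problem (3.7). A direct algebraic calculation gives us that

$$\hat\Sigma_{W\alpha} = \Bigl\{\frac1N\sum_{i=1}^{N}W_i\tilde u_i'\tilde\lambda - \frac1N\sum_{i=1}^{N}W_i\bigl[\widehat{\lambda'\Sigma_{\varepsilon\varepsilon}G}\,(\widehat{G'\Sigma_{\varepsilon\varepsilon}G})^{-1}\tilde G'\tilde u_i\bigr]\Bigr\}\Big/(\tilde\lambda'\tilde\lambda), \tag{3.18}$$

where $\tilde u_i = u_i(\tilde\beta,\tilde\gamma)$, $\tilde\lambda = \lambda(\tilde\theta)$, $\tilde G = G(\tilde\theta)$, and $\widehat{\lambda'\Sigma_{\varepsilon\varepsilon}G}$ is a consistent estimate of $\lambda'\Sigma_{\varepsilon\varepsilon}G$, one possibility of which is $N^{-1}\sum_{i=1}^{N}\tilde\lambda'\tilde u_i\tilde u_i'\tilde G$.

Finally, under the NCH assumption, the set of moment conditions (3.5) can be converted into an exactly identified set of moment conditions that yield an asymptotically equivalent GMM estimate. Specifically, we can replace the moment conditions $Eb_{1i} = 0$ by the moment conditions $EB_1'V_{11}^{-1}b_{1i} = 0$. Routine calculation using the forms of $B_1$, $V_{11}$ and $b_{1i}$ yields the explicit expression:

$$EX_i'G(G'\Sigma_{\varepsilon\varepsilon}G)^{-1}G'u_i = 0 \tag{3.19a}$$

$$EZ_i1_T'G(G'\Sigma_{\varepsilon\varepsilon}G)^{-1}G'u_i = 0 \tag{3.19b}$$

$$E\,\Sigma_{W\alpha}'\Sigma_{WW}^{-1}W_i\cdot\Lambda_*'(G'\Sigma_{\varepsilon\varepsilon}G)^{-1}G'u_i = 0. \tag{3.19c}$$

These three sets of moment conditions respectively correspond to (21a), (21b), and (21c) of Ahn, Lee and Schmidt (2001, p. 229). We can replace the nuisance parameters $\Sigma_{\varepsilon\varepsilon}$, $\Sigma_{W\alpha}$, and $\Sigma_{WW}$ by consistent estimates, as given above (based on some initial consistent GMM estimates of $\beta$, $\gamma$ and $\theta$).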
The point of this simplification is that we have drastically reduced the set of moment conditions: there are $(T-1)(TK+g)$ moment conditions in $b_{1i}$ (equation (3.5)) but only $K+g+p$ moment conditions in (3.19).

We note that this is a stronger result than the corresponding result (Proposition 1, p. 229) of Ahn, Lee and Schmidt (2001). In order to reach essentially the same conclusion on the reduction of the number of moment conditions, they impose the assumption that $\varepsilon_i$ is independent of $(W_i, \alpha_i)$, a much stronger assumption than our NCH assumption.

3.4 GMM under the Orthogonality and Covariance Assumptions

In this section we continue to maintain the Orthogonality Assumption (Assumption 3.1), but now we add the Covariance Assumption (Assumption 3.2), which asserts that $E\varepsilon_i\varepsilon_i' = \sigma_\varepsilon^2I_T$. Clearly the Covariance Assumption holds if and only if

$$E(u_iu_i') = \sigma_\alpha^2\lambda\lambda' + \sigma_\varepsilon^2I_T. \tag{3.20}$$

Condition (3.20) contains $T(T+1)/2$ distinct moment conditions. It also contains the two nuisance parameters $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$, and so it should imply $T(T+1)/2 - 2$ moment conditions for the estimation of $\beta$, $\gamma$ and $\theta$. These are in addition to the moment conditions (3.5) implied by the Orthogonality Assumption.

To write these moment conditions explicitly, we need to define some notation. Let $H = \operatorname{diag}(H_2, H_3, \ldots, H_T)$, with $H_t$ equal to the $T\times(T-t)$ matrix of the last $T-t$ columns (the $(t+1)$th through $T$th columns) of $I_T$ for $t < T$, and with $H_T$ equal to a $T\times(T-2)$ matrix of the second through $(T-1)$-th columns of $I_T$.¹ Then we can write the distinct moment conditions implied by the Orthogonality and Covariance Assumptions as follows:

$$Eb_{1i} = E(G'u_i\otimes W_i) = 0 \tag{3.21a}$$

$$Eb_{2i} = EH'(G'u_i\otimes u_i) = 0 \tag{3.21b}$$

$$Eb_{3i} = E(G'u_i\otimes\lambda'u_i) = 0. \tag{3.21c}$$

(In these expressions, $G$ is short for $G(\theta)$, $\lambda$ is short for $\lambda(\theta)$, and $u_i$ is short for $u_i(\beta,\gamma)$.)
The moment conditions $b_{1i}$ in (3.21a) are exactly the same as those in (3.5) of the previous section, and follow from the Orthogonality Assumption.

The moment conditions $b_{2i}$ in (3.21b) correspond to those in equation (12) of Ahn, Lee and Schmidt (2001). Note that it is not the case that $E(G'u_i\otimes u_i) = 0$. Rather, looking at a typical element of this product, we have $E(u_{it} - \lambda_tu_{i1})u_{is}$, which equals zero for $s\ne t$ and $s\ne 1$. The selection matrix $H'$ picks out the logically distinct products of expectation zero, the number of which equals $T(T-1)/2 - 1$. The selection matrix $H$ plays the same role as the definition of the $U^{\circ}$ matrices plays in Ahn, Lee and Schmidt (2001). We note that the moment conditions $b_{2i}$ follow from the non-autocorrelation of the $\varepsilon_{it}$; homoskedasticity would not be needed.

The $(T-1)$ moment conditions in $b_{3i}$ in (3.21c) correspond to those in equation (13) of Ahn, Lee and Schmidt (2001). They assert that, for $t = 2,\ldots,T$, $E(u_{it} - \lambda_tu_{i1})\bigl(\sum_{s=1}^{T}\lambda_su_{is}\bigr) = 0$, and their validity depends on both the non-autocorrelation and the homoskedasticity of the $\varepsilon_{it}$.

¹ For any matrix $B$ with $T$ rows, $H_t'B$ selects the last $T-t$ rows of $B$ for $t < T$, and $H_T'B$ selects the second through $(T-1)$-th rows of $B$. For any matrix $B$ with $T$ columns, $BH_t$ selects the last $T-t$ columns of $B$ for $t < T$, and $BH_T$ selects the second through $(T-1)$-th columns of $B$.

Some further analysis may be useful to establish that (3.21b) and (3.21c) represent all of the useful implications of the Covariance Assumption. We begin with the implication (3.20) of the Covariance Assumption, which we rewrite as

$$E(u_i\otimes u_i) = \sigma_\alpha^2(\lambda\otimes\lambda) + \sigma_\varepsilon^2\operatorname{vec}I_T. \tag{3.22}$$

Now, let $S$ be the $T^2\times T(T+1)/2$ selection matrix such that, for a $T\times 1$ vector $u$, $\operatorname{vech}(uu') = S'(u\otimes u)$, where "vech" is the vector of distinct elements. Then

$$ES'(u_i\otimes u_i) = S'[\sigma_\alpha^2(\lambda\otimes\lambda) + \sigma_\varepsilon^2\operatorname{vec}I_T] \tag{3.23}$$

contains the distinct moment conditions.
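The role of the selection matrix $S$ can be made concrete with a small construction. The sketch below is illustrative: the function name `vech_selection` and the ordering of the distinct elements (lower triangle, taken column by column) are assumptions, since the text does not fix an ordering. It builds $S$ so that $S'(u\otimes u) = \operatorname{vech}(uu')$ and checks that applying $S'$ to the right-hand side of (3.22) returns the distinct elements of $\sigma_\alpha^2\lambda\lambda' + \sigma_\varepsilon^2I_T$.

```python
import numpy as np

def vech_selection(T):
    """Return S (T^2 x T(T+1)/2) such that S.T @ kron(u, u) == vech(u u').

    Assumed ordering: lower-triangular elements, column by column.
    """
    pairs = [(r, c) for c in range(T) for r in range(c, T)]
    S = np.zeros((T * T, len(pairs)))
    for k, (r, c) in enumerate(pairs):
        # Element (r, c) of u u' sits at position c*T + r of kron(u, u).
        S[c * T + r, k] = 1.0
    return S

if __name__ == "__main__":
    T = 3
    S = vech_selection(T)
    u = np.array([1.0, 2.0, 3.0])
    print(S.T @ np.kron(u, u))          # distinct elements of u u'

    # Right-hand side of (3.22) for illustrative parameter values.
    lam = np.array([1.0, 1.2, 1.5])
    sig_a2, sig_e2 = 2.0, 0.5
    rhs = sig_a2 * np.kron(lam, lam) + sig_e2 * np.eye(T).reshape(-1)
    full = sig_a2 * np.outer(lam, lam) + sig_e2 * np.eye(T)
    lower = np.array([full[r, c] for c in range(T) for r in range(c, T)])
    print(np.allclose(S.T @ rhs, lower))
```

With $T = 3$ there are $T(T+1)/2 = 6$ distinct conditions, of which $T(T+1)/2 - 2 = 4$ remain for estimating $\beta$, $\gamma$ and $\theta$ once the two variance parameters are accounted for.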
Now we transform the moment conditions (3.23) by multiplying them by a nonsingular matrix, in such a way that (i) the first $T(T+1)/2 - 2$ transformed moment conditions are those given in (3.21b) and (3.21c); and (ii) the last two moment conditions are exactly identified for the nuisance parameters ($\sigma_\alpha^2$ and $\sigma_\varepsilon^2$), given the other parameters. This will imply that the last two moment conditions are redundant for the estimation of $\beta$, $\gamma$ and $\theta$, and thus that (3.21b) and (3.21c) contain all of the useful information implied by the Covariance Assumption for estimation of $\beta$, $\gamma$ and $\theta$.

To exhibit the transformation, let $G_t$ be the $(t-1)$th column of $G$; let $e_t$ equal the $t$th column of $I_{T-2}$ and $e_T$ equal the last column of $I_T$; and define

$$(H_T^{*})' = \bigl[-\lambda_TH_T',\ e_1e_T',\ \ldots,\ e_{T-2}e_T',\ 0_{(T-2)\times T}\bigr]. \tag{3.24}$$

($H_T$ was defined above.) Then

$$[G_2\otimes H_2,\ \ldots,\ G_{T-1}\otimes H_{T-1},\ H_T^{*}]'S\cdot S'(u_i\otimes u_i) = H'(G'\otimes I_T)(u_i\otimes u_i), \tag{3.25}$$

which is the same as $b_{2i}$ in (3.21b). Also, let $J_1^{*} = I_T - \lambda\lambda'$ and let $J_t^{*}$, $t = 2,\ldots,T$, be equal to $\operatorname{diag}\{0_{t\times t}, \lambda_tI_{T-t}\}$ plus a $T\times T$ matrix with zero elements except for the $t$th row, which is $\lambda'$. Then

$$H_1'[J_1^{*},\ \ldots,\ J_T^{*}]S\cdot S'(u_i\otimes u_i) = (\lambda'\otimes G')(u_i\otimes u_i), \tag{3.26}$$

which is equal to $b_{3i}$ in (3.21c).

The point of the above argument is that the transformations preceding $S'(u_i\otimes u_i)$ in (3.25) and (3.26), stacked vertically, construct a $[T(T+1)/2 - 2]\times T(T+1)/2$ matrix of full row rank, and yield the moment conditions $b_{2i}$ and $b_{3i}$. The remaining two moment conditions that determine the nuisance parameters are

$$E\begin{pmatrix}u_{i1}^2 \\ u_{i2}u_{i1}\end{pmatrix} = \begin{pmatrix}\sigma_\alpha^2 + \sigma_\varepsilon^2 \\ \lambda_2\sigma_\alpha^2\end{pmatrix} \tag{3.27}$$

and must be linearly independent of the others (since they involve $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ while the others do not).

The asymptotic variance of the GMM estimate is complicated because it depends on the moments of $\varepsilon_{it}$ up to fourth order. However, we can simplify things with the following "conditional independence of the moments up to fourth order" (CIM4) assumption:

Conditional on $(W_i, \alpha_i)$, $\varepsilon_{it}$ is independent over $t = 1, 2, \ldots, T$, with mean zero, and with second, third and fourth moments that do not depend on $(W_i, \alpha_i)$ or on $t$. (CIM4)

This is a strong assumption; it implies the Orthogonality Assumption, the Covariance Assumption, the NCH assumption, and more. In Appendix A, we calculate the asymptotic variance matrix of the GMM estimate based on (3.21) under the assumption (CIM4).

Let $\Lambda = \partial\lambda(\theta_0)/\partial\theta'$ and note that $\Lambda_* = G'\Lambda$. Given assumption (CIM4), the moment conditions (3.19), which are asymptotically equivalent to (3.21a), can be simplified as follows:

$$EX_i'P_Gu_i = 0 \tag{3.28a}$$

$$EZ_i1_T'P_Gu_i = 0 \tag{3.28b}$$

$$E\,\Sigma_{W\alpha}'\Sigma_{WW}^{-1}W_i\cdot\Lambda'P_Gu_i = 0. \tag{3.28c}$$

That is, in place of the large set of moment conditions (3.21a), (3.21b) and (3.21c), we can use the reduced set of moment conditions consisting of (3.28), (3.21b) and (3.21c).

A final simplification arises if, conditional on $(W_i, \alpha_i)$, $\varepsilon_{it}$ is i.i.d. normal. In this case, (3.21b) can be shown to be redundant given (3.21a) and (3.21c). (See Proposition 4 of Ahn, Lee and Schmidt (2001, p. 231).) Hence, in that case, the GMM estimator using the moment conditions (3.28) and (3.21c) is efficient.

3.5 Least Squares

In this section we consider the concentrated least squares (CLS) estimation of the model. We treat the $\alpha_i$ as parameters to be estimated, so this is a true "fixed effects" treatment. We can consider the following least squares problem:

$$\min_{\beta,\gamma,\theta,\alpha_1,\ldots,\alpha_N}\ N^{-1}\sum_{i=1}^{N}[y_i - X_i\beta - 1_TZ_i'\gamma - \lambda(\theta)\alpha_i]'[y_i - X_i\beta - 1_TZ_i'\gamma - \lambda(\theta)\alpha_i]. \tag{3.29}$$

Solving for $\alpha_1,\ldots,\alpha_N$ first, we get

$$\hat\alpha_i(\beta,\gamma,\theta) = [\lambda(\theta)'\lambda(\theta)]^{-1}\lambda(\theta)'u_i(\beta,\gamma),\quad i = 1,\ldots,N, \tag{3.30}$$

where $u_i(\beta,\gamma) = y_i - X_i\beta - 1_TZ_i'\gamma$ as before. Then the estimates $\hat\beta_{LS}$, $\hat\gamma_{LS}$, and $\hat\theta_{LS}$ minimizing (3.29) are equal to the minimizers of the sum of the squared concentrated residuals

$$C(\beta,\gamma,\theta) = N^{-1}\sum_{i=1}^{N}C_i(\beta,\gamma,\theta) = N^{-1}\sum_{i=1}^{N}u_i(\beta,\gamma)'M_{\lambda(\theta)}u_i(\beta,\gamma), \tag{3.31}$$

with $M_\lambda = I_T - \lambda(\lambda'\lambda)^{-1}\lambda'$, which is obtained by replacing $\alpha_i$ in (3.29) with (3.30).
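The concentration step in (3.29) through (3.31) can be sketched in a few lines. This is an illustration under stated assumptions, not the estimation procedure itself: the residuals `u` are taken as given for some fixed $(\beta,\gamma)$, `lam` is any fixed value of $\lambda(\theta)$, and the function name `concentrated_objective` is hypothetical. The code computes $\hat\alpha_i$ from (3.30), evaluates the least squares objective (3.29) at $\hat\alpha_i$, and confirms that it coincides with the quadratic form in $M_\lambda$ of (3.31).

```python
import numpy as np

def concentrated_objective(u, lam):
    """u: N x T array of residuals u_i(beta, gamma); lam: length-T vector lambda(theta)."""
    alpha_hat = u @ lam / (lam @ lam)               # (3.30), one value per individual
    resid = u - np.outer(alpha_hat, lam)            # u_i - lambda * alpha_hat_i
    direct = np.mean(np.sum(resid ** 2, axis=1))    # (3.29) evaluated at alpha_hat
    M = np.eye(len(lam)) - np.outer(lam, lam) / (lam @ lam)
    concentrated = np.mean(np.einsum("it,ts,is->i", u, M, u))   # (3.31)
    return alpha_hat, direct, concentrated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, T = 50, 5                                    # illustrative sizes
    lam = np.exp(0.2 * np.arange(T))                # hypothetical lambda(theta), lambda_1 = 1
    alpha = rng.standard_normal(N)
    u = np.outer(alpha, lam) + 0.1 * rng.standard_normal((N, T))
    alpha_hat, direct, concentrated = concentrated_objective(u, lam)
    print(np.isclose(direct, concentrated))
    print(np.max(np.abs(alpha_hat - alpha)))        # small when the noise is small
```

In an actual CLS fit, the value returned as `concentrated` would be minimized over $(\beta,\gamma,\theta)$ by a numerical optimizer, with `u` and `lam` recomputed at each trial value.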
Because (3.31) is a concentrated criterion, we call $\hat\beta_{LS}$, $\hat\gamma_{LS}$ and $\hat\theta_{LS}$ the concentrated least squares (CLS) estimator. Since $G'\lambda = 0$, we have $M_\lambda G = G$ and therefore $M_\lambda = P_G = G(G'G)^{-1}G'$. So the first order conditions of the CLS estimation become

$$ -\frac{1}{2} \begin{pmatrix} \partial C/\partial\beta \\ \partial C/\partial\gamma \\ \partial C/\partial\theta \end{pmatrix} = \frac{1}{N} \sum_{i=1}^{N} \begin{pmatrix} X_i' P_G u_i \\ Z_i 1_T' P_G u_i \\ \Lambda' P_G u_i u_i' \lambda (\lambda'\lambda)^{-1} \end{pmatrix} = 0. \tag{3.32} $$

Interpreting (3.32) as sample moment conditions, we can construct the corresponding (exactly identified) implicit population moment conditions:

$$ E X_i' P_G u_i = 0 \tag{3.33a} $$
$$ E Z_i 1_T' P_G u_i = 0 \tag{3.33b} $$
$$ E \Lambda' P_G u_i u_i' \lambda (\lambda'\lambda)^{-1} = 0. \tag{3.33c} $$

That is, the CLS estimator is asymptotically equivalent to the GMM estimator based on (3.33).

The moment conditions (3.33a) and (3.33b) are satisfied under the Orthogonality Assumption. However, this is not true of (3.33c). The moment conditions (3.33c) require the Covariance Assumption to be valid (unless we make very specific and unusual assumptions about the form of $\lambda$ and its relationship to the error variance matrix). Thus, the consistency of the CLS estimator requires both the Orthogonality Assumption and the Covariance Assumption. This is a rather striking result, since the consistency of least squares does not usually require restrictions on the second moments of the errors, and is a reflection of the incidental parameters problem.

We would generally believe that least squares should be efficient when the errors are i.i.d. normal. However, similarly to the result in Ahn, Lee and Schmidt (2001), this is not true in the present case. The efficient GMM estimator under the Orthogonality and Covariance Assumptions uses the moment conditions (3.21), while the CLS estimator uses only a subset of these. This can be seen most explicitly in the case that, conditional on $(W_i, \alpha_i)$, the $\varepsilon_{it}$ are i.i.d. normal. Then (3.21b) is redundant and (3.21a) can be replaced by (3.28), so that the efficient GMM estimator is based on (3.28a), (3.28b), (3.28c) and (3.21c).
The CLS estimator is based on (3.33a), which is the same as (3.28a); (3.33b), which is the same as (3.28b); and (3.33c), which is a subset of (3.21c).² So the inefficiency of CLS lies in its failure to use the moment conditions (3.28c) and in its failure to use all of the moment conditions in (3.21c). The latter failure did not arise in the Ahn, Lee and Schmidt (2001) analysis (see footnote 2).

In Appendix B, we calculate the asymptotic variance matrix of the CLS estimator, under the “conditional independence of the moments up to fourth order” (CIM4) assumption of Section 3.4.

² The moment conditions (3.33c) are equivalent to $E\Lambda'G(G'G)^{-1}b_{3i} = 0$. When the number of parameters in $\theta$ is less than $T - 1$, the transformation $\Lambda'G(G'G)^{-1}$ loses information. This will be so in most parametric models for $\lambda(\theta)$, though it is not true in the model of Ahn, Lee and Schmidt (2001).

3.6 Conclusion

In this chapter we have considered a panel data model with parametrically time-varying coefficients on the individual effects. Following Ahn, Lee and Schmidt (2001), we have enumerated the moment conditions implied by alternative sets of assumptions on the model. We have shown explicitly that our sets of moment conditions capture all of the useful information contained in our assumptions, so that the corresponding GMM estimators exploit these assumptions efficiently.

We have also considered concentrated least squares estimation. Here the incidental parameters problem is relevant because we are treating the fixed effects as parameters to be estimated. An interesting result is that the consistency of the least squares estimator requires both exogeneity assumptions and the assumption that the errors are white noise. Furthermore, given the white noise assumption, the least squares estimator is inefficient, because it fails to exploit all of the moment conditions that are available.
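We show how the GMM estimation problem can be simplified under some additional assumptions, including the assumption of no conditional heteroskedasticity and a stronger conditional independence assumption. Under these assumptions we also give explicit expressions for the variance matrices of the GMM and least squares estimators.

APPENDIX

In this Appendix we derive the asymptotic variances of the efficient GMM estimator and the CLS estimator. We make the “conditional independence of the moments up to fourth order” (CIM4) assumption of Section 3.4.

3.A The asymptotic variance of the GMM estimator

Under the Orthogonality and Covariance Assumptions, the moment conditions we have are $b_{1i} = G'u_i \otimes W_i$, $b_{2i} = H'(G'u_i \otimes u_i)$ and $b_{3i} = (\lambda'\lambda)^{-1}\lambda'u_i \otimes G'u_i$. Let $\delta = (\beta', \gamma', \theta')'$. Let $B_j = -E(\partial b_{ji}/\partial\delta')$ for $j = 1, 2, 3$, evaluated at the true parameters. Let $V_{jk} = E\,b_{ji}b_{ki}'$ for $j, k = 1, 2, 3$, evaluated at the true parameters. Define $\kappa_3 = E\varepsilon_{it}^3/\sigma_\varepsilon^3$ and $\kappa_4 = E(\varepsilon_{it}^4 - 3\sigma_\varepsilon^4)/\sigma_\varepsilon^4$. Let $\mu_W = EW_i$; $\Phi = \Phi(\theta) = \lambda_*\lambda_*' + \mathrm{diag}(\lambda_2, \ldots, \lambda_T)$; and $\Phi_* = \lambda_*\lambda_*' + \mathrm{diag}(\lambda_2^2, \ldots, \lambda_T^2)$, where $\lambda_* = (\lambda_2, \ldots, \lambda_T)'$. After some algebra, we get

$$ V_{11} = \sigma_\varepsilon^2 (G'G \otimes \Sigma_{WW}) \tag{3.34} $$
$$ V_{12} = \sigma_\varepsilon^2 (G'G \otimes \Sigma_{W\alpha}\lambda') H \tag{3.35} $$
$$ V_{13} = \sigma_\varepsilon^2 \left[ G'G \otimes \Sigma_{W\alpha} + \frac{\kappa_3\sigma_\varepsilon}{\lambda'\lambda} (\Phi \otimes \mu_W) \right] \tag{3.36} $$
$$ V_{22} = \sigma_\varepsilon^2 H' [G'G \otimes (\sigma_\alpha^2 \lambda\lambda' + \sigma_\varepsilon^2 I_T)] H \tag{3.37} $$
$$ V_{23} = \sigma_\varepsilon^2 H' \left\{ \left[ \left( \sigma_\alpha^2 + \frac{\sigma_\varepsilon^2}{\lambda'\lambda} \right) G'G + \frac{\kappa_3\sigma_\varepsilon\mu_\alpha}{\lambda'\lambda} \Phi \right] \otimes \lambda \right\} \tag{3.38} $$
$$ V_{33} = \sigma_\varepsilon^2 \left\{ \left( \sigma_\alpha^2 + \frac{\sigma_\varepsilon^2}{\lambda'\lambda} \right) G'G + \frac{2\kappa_3\sigma_\varepsilon\mu_\alpha}{\lambda'\lambda} \Phi + \frac{\kappa_4\sigma_\varepsilon^2}{(\lambda'\lambda)^2} \Phi_* \right\} \tag{3.39} $$

and

$$ B_1 = [(G \otimes \Sigma_{WW})'S_X, \; (G \otimes \Sigma_{WW})'S_Z, \; \Lambda_* \otimes \Sigma_{W\alpha}] \tag{3.40} $$
$$ B_2 = H'(I_{T-1} \otimes \lambda) [(G \otimes \Sigma_{W\alpha})'S_X, \; (G \otimes \Sigma_{W\alpha})'S_Z, \; \sigma_\alpha^2 \Lambda_*] \tag{3.41} $$
$$ B_3 = [(G \otimes \Sigma_{W\alpha})'S_X, \; (G \otimes \Sigma_{W\alpha})'S_Z, \; \sigma_\alpha^2 \Lambda_*]. \tag{3.42} $$

With these results, the variance-covariance of the GMM estimator is

$$ \mathrm{cov}\,\sqrt{N}(\hat\delta - \delta) = \left[ (B_1', B_2', B_3') \begin{pmatrix} V_{11} & V_{12} & V_{13} \\ V_{12}' & V_{22} & V_{23} \\ V_{13}' & V_{23}' & V_{33} \end{pmatrix}^{-1} \begin{pmatrix} B_1 \\ B_2 \\ B_3 \end{pmatrix} \right]^{-1}. \tag{3.43} $$

3.B The asymptotic variance of the CLS estimator

By the standard Taylor series expansion technique, we find that the asymptotic variance will be equal to $A_0^{-1} B_0 A_0^{-1}$ where

$$ A_0 = E \frac{\partial^2 C_i}{\partial\delta\,\partial\delta'} \quad \text{and} \quad B_0 = E \frac{\partial C_i}{\partial\delta} \frac{\partial C_i}{\partial\delta'} \tag{3.44} $$

evaluated at the true parameter. Let us calculate each of them. Let $\Lambda = \partial\lambda(\theta_0)/\partial\theta' = (0_{p \times 1}, \Lambda_*')'$. $B_0$ is the same as in Ahn, Lee and Schmidt (2001, p. 253). Let $\Psi = G(G'G)^{-1}\Phi(G'G)^{-1}G'$; $\Psi_* = G(G'G)^{-1}\Phi_*(G'G)^{-1}G'$; and $\mu_\alpha = E\alpha_i$. Then

$$ E \frac{\partial C_i}{\partial\beta} \frac{\partial C_i}{\partial\beta'} = 4\sigma_\varepsilon^2 S_X'(P_G \otimes \Sigma_{WW}) S_X \tag{3.45} $$
$$ E \frac{\partial C_i}{\partial\beta} \frac{\partial C_i}{\partial\gamma'} = 4\sigma_\varepsilon^2 S_X'(P_G \otimes \Sigma_{WW}) S_Z \tag{3.46} $$
$$ E \frac{\partial C_i}{\partial\beta} \frac{\partial C_i}{\partial\theta'} = 4\sigma_\varepsilon^2 S_X' \left[ P_G \otimes \Sigma_{W\alpha} + \frac{\kappa_3\sigma_\varepsilon}{\lambda'\lambda} (\Psi \otimes \mu_W) \right] \Lambda \tag{3.47} $$
$$ E \frac{\partial C_i}{\partial\gamma} \frac{\partial C_i}{\partial\gamma'} = 4\sigma_\varepsilon^2 S_Z'(P_G \otimes \Sigma_{WW}) S_Z \tag{3.48} $$
$$ E \frac{\partial C_i}{\partial\gamma} \frac{\partial C_i}{\partial\theta'} = 4\sigma_\varepsilon^2 S_Z' \left[ P_G \otimes \Sigma_{W\alpha} + \frac{\kappa_3\sigma_\varepsilon}{\lambda'\lambda} (\Psi \otimes \mu_W) \right] \Lambda \tag{3.49} $$
$$ E \frac{\partial C_i}{\partial\theta} \frac{\partial C_i}{\partial\theta'} = 4\sigma_\varepsilon^2 \Lambda' \left\{ \left( \sigma_\alpha^2 + \frac{\sigma_\varepsilon^2}{\lambda'\lambda} \right) P_G + \frac{2\kappa_3\sigma_\varepsilon\mu_\alpha}{\lambda'\lambda} \Psi + \frac{\kappa_4\sigma_\varepsilon^2}{(\lambda'\lambda)^2} \Psi_* \right\} \Lambda. \tag{3.50} $$

$A_0$ is obtained from the following.

$$ E \frac{\partial^2 C_i}{\partial\beta\,\partial\delta'} = 2[S_X'(P_G \otimes \Sigma_{WW})S_X, \; S_X'(P_G \otimes \Sigma_{WW})S_Z, \; S_X'(P_G \otimes \Sigma_{W\alpha})\Lambda] \tag{3.51} $$
$$ E \frac{\partial^2 C_i}{\partial\gamma\,\partial\delta'} = 2[S_Z'(P_G \otimes \Sigma_{WW})S_X, \; S_Z'(P_G \otimes \Sigma_{WW})S_Z, \; S_Z'(P_G \otimes \Sigma_{W\alpha})\Lambda] \tag{3.52} $$
$$ E \frac{\partial^2 C_i}{\partial\theta\,\partial\delta'} = 2[\Lambda'(P_G \otimes \Sigma_{W\alpha})'S_X, \; \Lambda'(P_G \otimes \Sigma_{W\alpha})'S_Z, \; \sigma_\alpha^2 \Lambda' P_G \Lambda] $$

Bibliography

Ahn, S. C., Y. H. Lee, and P. Schmidt (2001) ‘GMM Estimation of Linear Panel Data Models with Time-varying Individual Effects.’ Journal of Econometrics 101, 219–255.
Ahn, S. C. and P. Schmidt (1995) ‘A Separability Result for GMM Estimation, with Applications to GLS Prediction and Conditional Moment Tests.’ Econometric Reviews 14, 19–34.
Battese, G. E., and T. J. Coelli (1992) ‘Frontier Production Functions, Technical Efficiency and Panel Data: with Application to Paddy Farms in India.’ Journal of Productivity Analysis 3, 153–170.
Bierens, H. J. (1994) Topics in Advanced Econometrics, Cambridge University Press.
Billingsley, P. (1968) Convergence of Probability Measures, Wiley.
Bound, J., D. A. Jaeger, and R. M.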
Baker (1995) ‘Problems with Instrumental Variables Estimation When the Correlation between the Instruments and the Endogeneous Explanatory Variable is Weak (in Applications and Case Studies).’ Journal of the American Statistical Association 90, 443–450.
Chamberlain, G. C. (1980) ‘Analysis of Covariance with Qualitative Data.’ Review of Economic Studies 47, 225–238.
Chamberlain, G. (1987) ‘Asymptotic Efficiency in Estimation with Conditional Moment Restrictions.’ Journal of Econometrics 34, 305–334.
Chamberlain, G. C. (1992) ‘Efficiency Bounds for Semiparametric Regression.’ Econometrica 60, 567–596.
Davidson, J. (1994) Stochastic Limit Theory, Oxford University Press.
Hansen, L. P. (1982) ‘Large Sample Properties of Generalized Method of Moments Estimators.’ Econometrica 50, 1029–1054.
Holtz-Eakin, D., W. Newey, and H. S. Rosen (1988) ‘Estimating Vector Autoregressions with Panel Data.’ Econometrica 56, 1371–1395.
James, A. T. (1964) ‘Distributions of Matrix Variates and Latent Roots Derived from Normal Samples.’ Annals of Mathematical Statistics 35, 475–501.
Johnson, N. L., and S. Kotz (1972) Distributions in Statistics: Continuous Multivariate Distributions, John Wiley & Sons.
Kiefer, N. M. (1980) ‘A Time Series-Cross Sectional Model with Fixed Effects with an Intertemporal Factor Structure.’ Unpublished Manuscript, Cornell University.
Kim, J., and D. Pollard (1990) ‘Cube Root Asymptotics.’ The Annals of Statistics 18, 191–219.
Kumbhakar, S. C. (1990) ‘Production Frontiers, Panel Data and Time-varying Technical Inefficiency.’ Journal of Econometrics 46, 201–212.
Lee, Y. H. (1991) ‘Panel Data Models with Multiplicative Individual and Time Effects: Applications to Compensation and Frontier Production Functions.’ Unpublished Ph.D. Dissertation, Department of Economics, Michigan State University.
Lee, Y. H., and P.
Schmidt (1993) ‘A Production Frontier Model with Flexible Temporal Variation in Technical Inefficiency.’ In The Measurement of Productive Efficiency: Techniques and Applications, edited by H. Fried, C. A. K. Lovell, and S. Schmidt, Oxford University Press.
Linton, O. (1999) ‘Problem.’ Econometric Theory 15, 151.
Magnus, J. R., and H. Neudecker (1988) Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons.
Manski, C. F. (1983) ‘Closest Empirical Distribution Estimation.’ Econometrica 51, 305–319.
Newey, W. K. (1988) ‘Asymptotic Equivalence of Closest Moments and GMM Estimators.’ Econometric Theory 4, 336–340.
Phillips, P. C. B. (1980) ‘The Exact Distribution of Instrumental Variable Estimators in an Equation Containing n + 1 Endogenous Variables.’ Econometrica 48, 861–878.
Staiger, D., and J. H. Stock (1997) ‘Instrumental Variables Regression with Weak Instruments.’ Econometrica 65, 557–586.
Theil, H. (1971) Principles of Econometrics, Wiley.