
This is to certify that the

dissertation entitled

THREE ESSAYS ON ECONOMETRICS

presented by

CHIROK HAN

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in ECONOMICS

 


Major professor

Date JUNE 14, 2001

 


THREE ESSAYS ON ECONOMETRICS
By

Chirok Han

AN ABSTRACT OF A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

2001

Professor Peter J. Schmidt

ABSTRACT

THREE ESSAYS ON ECONOMETRICS
By

Chirok Han

This dissertation contains three unrelated essays in econometric theory.

The first chapter considers Generalized Method of Moments-type estimators that minimize a criterion function other than the "standard" quadratic distance measure, namely a general Lp distance measure. It is shown that the resulting estimators are root-n consistent but not, in general, asymptotically normally distributed, and we derive the limit distribution of these estimators. In addition, we prove that considering Lp-GMM estimators cannot yield estimators that are more efficient than the "usual" L2-GMM estimators. We also consider the choice of the weight matrix for Lp-GMM estimators.

The second chapter is concerned with the asymptotic properties of instrumental variable estimators with irrelevant instruments. The estimator is neither consistent nor asymptotically normal, but converges in distribution to a random variable that depends on the covariance of the regressors and the error term. The density of the asymptotic distribution is calculated, and it is shown that the mean of the asymptotic distribution is equal to the probability limit of the OLS estimator.

The last chapter extends Ahn, Lee and Schmidt (2001) to allow a parametric function for time-varying coefficients on the individual effects. It is shown that the main results of Ahn, Lee and Schmidt (2001) also hold for our model. Least squares is consistent, given white noise errors, but less efficient than a GMM estimator.

ACKNOWLEDGMENTS

I would first like to thank Professor Peter J. Schmidt, the chairperson of my dissertation committee. He taught my first econometrics course, and his enthusiasm for and knowledge of the subject inspired me to pursue my present studies. I am grateful that he kindly acted as the chairperson of my dissertation committee and guided me throughout my studies and the writing of this dissertation. I would also like to thank Professor Robert M. de Jong, who taught me a great deal about methodological details and helpfully guided me in writing the first chapter of this thesis. I also wish to express my gratitude to Professor Jeffrey M. Wooldridge for his timely comments and insightful critiques. Without the help of my committee members this thesis would not have been possible, although I am responsible for any remaining mistakes.

I am greatly indebted to my wife, Youngmi, for her love, support, and patience. My
four-year-old son, Yoojeong, also helped by his constant love, which is a continuing great
source of joy. My special thanks also go to my parents and my parents-in-law, who have
supported and encouraged me in various ways with unconditional love and care.

I wish to thank my friends for their time, friendship and support. Some of them are
Hoon Kim, Seok Hyeon Kim, Douglas Harris, Hong Peng Ong, and Neil Megan. I also
would like to thank all the individual Korean students in the Department of Economics.

I would like to extend a special word of thanks to the staff in the Department of Economics, especially Margaret Lynch, Linda Wirick and Pamela Dorton, who skillfully helped me to handle many administrative items.

Finally, I acknowledge with gratitude Reverend Hyo Nam Hwang, Reverend Jung Kee Lee, and all the members of the Lansing Korean United Methodist Church for their concern, help and friendship. Thanks and glory to God.

TABLE OF CONTENTS

Chapter 1 The Properties of Lp-GMM Estimators
1.1 Introduction
1.2 Main theorem
1.3 Efficiency of L2-GMM
1.4 Further remarks on weight matrices
1.5 Conclusion
1.A Mathematical Appendix

Chapter 2 The Asymptotic Distribution of the Instrumental Variable Estimators When the Instruments Are Not Correlated with the Regressors
2.1 Introduction
2.2 The limit distribution
2.3 The relationship with the OLS estimator
2.4 Conclusion
2.A Proof of Theorem 2.4

Chapter 3 Estimation of a Panel Data Model with Parametric Temporal Variation in Individual Effects
3.1 Introduction
3.2 The model and assumptions
3.3 GMM under the Orthogonality Assumption
3.4 GMM under the Orthogonality and Covariance Assumptions
3.5 Least Squares
3.6 Conclusion
3.A The asymptotic variance of the GMM estimator
3.B The asymptotic variance of the OLS estimator

Bibliography

Chapter 1
The Properties of Lp-GMM Estimators

1.1 Introduction

Since Lars Peter Hansen's (1982) original formulation, Generalized Method of Moments (GMM) estimation has become an extremely important and popular estimation technique in economics. The reason is that economic theory usually implies moment conditions, which the GMM technique exploits, while it is typically uninformative about the exact stochastic structure of economic processes. GMM provides an estimator when a certain set of moment conditions $E g(y, \theta_0) = 0$ is known a priori to hold. When the number of moment conditions exceeds the number of parameters, we cannot hope to obtain an estimator by setting the empirical counterpart $\bar g(\theta)$ of the moment condition equal to zero. Instead, we need to make $\bar g(\theta)$ as close to zero as possible in some sense. The usual GMM formulation minimizes a quadratic measure of distance. Hansen (1982) established the large sample properties of these GMM estimators under mild regularity conditions.

The above exposition raises two natural questions: what happens if distance measures other than a quadratic one are used, and can those other distance measures give better estimators? The answer to the second question is no, since Chamberlain (1987) has shown that the optimal GMM estimators (in the usual sense) attain the efficiency bound. Apart from this general remark on the efficiency of optimal GMM estimators, there have been attempts, such as Manski (1983) and Newey (1988), to treat directly the use of non-quadratic measures of distance between population and empirical moments. These articles state results implying that, under mild assumptions, estimators that minimize a general discrepancy function are consistent and asymptotically normally distributed. Based on these results, Newey (1988) concludes that, under regularity conditions, estimators using two different measures of distance are asymptotically equivalent if the corresponding Hessian matrices are asymptotically equal. Under the assumptions of that paper, it is therefore impossible to obtain better estimators by modifying the quadratic criterion function. This conclusion gives a direct argument for using a quadratic distance measure, in addition to Chamberlain's general argument.
However, when considering Lp-GMM as defined below, it turns out that only the L2 norm satisfies the assumptions of Manski (1983) and Newey (1988). Other values of $p$ in $[1, \infty)$ do not. The problems concern the $p$-th power of the Lp norm, $\|x\|_p^p = \sum_i |x_i|^p$, and are as follows:
- When $p = 1$, it is not differentiable at 0.
- When $p \in (1, 2)$, it is continuously differentiable but not twice differentiable at 0.
- When $p \in (2, \infty)$, it is twice continuously differentiable, but the Hessian matrix evaluated at the true parameter becomes zero, and therefore singular.

The papers by Newey and Manski therefore have no implications for Lp-GMM when $p$ is not 2. For Lp-GMM, the "standard" asymptotic framework fails, and the least absolute deviations type framework does not directly apply either. Linton (1999) recently pointed out, in an example in Econometric Theory, that the estimator minimizing the L1 distance of the sample moments from zero can have a non-normal limit distribution. In this chapter, we establish the limit distribution of general Lp-GMM estimators. We show that Lp-GMM estimators are root-n consistent but need not, in general, have an asymptotically normal distribution. In addition, we prove a theorem showing that Lp-GMM estimators cannot be more efficient than L2-GMM estimators, thereby extending Newey's conclusion to Lp-GMM estimators. Finally, we discuss the problem of finding the optimal weight matrix for Lp-GMM estimators.
Section 1.2 defines our estimator and gives the main theorem on consistency and asymptotic distribution. Its proof is given in the Mathematical Appendix (Section 1.A). Section 1.3 discusses the efficiency of L2-GMM among all Lp-GMM estimators. Section 1.4 describes the problem of selecting the weight matrix and gives some interesting results for the cases $p = 1$ and $p = 3$, including Linton's (1999) example. The conclusions (Section 1.5) are followed by the Mathematical Appendix, which gathers all the proofs.

1.2 Main theorem

In this section, we state the main result of this chapter, on which the remainder of our discussion is based. Let $y_1, y_2, \ldots$ be a sequence of i.i.d. random vectors in $\mathbb{R}^m$. Let $g(y_i, \theta)$ be a set of $q$ moment conditions with parameter $\theta \in \Theta \subset \mathbb{R}^k$. That is, $g(y_i, \theta)$ is a random vector in $\mathbb{R}^q$ that satisfies

$$E g(y_i, \theta_0) = 0. \tag{1.1}$$

Let $\bar g(\theta) = n^{-1} \sum_{i=1}^n g(y_i, \theta)$. The Lp norm $\|\cdot\|_p$ is defined as

$$\|x\|_p = \Big( \sum_{i=1}^q |x_i|^p \Big)^{1/p} \tag{1.2}$$

for $p \in [1, \infty)$. The Lp-GMM estimator $\hat\theta_n$ is assumed to satisfy

$$\|\bar g(\hat\theta_n)\|_p = \inf_{\theta \in \Theta} \|\bar g(\theta)\|_p. \tag{1.3}$$

For a $k_1 \times k_2$ matrix $x$, let $|x| = \max_{i,j} |x_{ij}|$. Let $\Omega = E g(y_i, \theta_0) g(y_i, \theta_0)'$ and $D = E (\partial/\partial\theta') g(y_i, \theta_0)$. Our results require the following regularity assumption:
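Definition (1.3) can be computed directly in a simple case. The sketch below assumes NumPy and a hypothetical model in which $q = 3$ measurements share a common mean. In that model $k = 1$ and $g(y_i, \theta) = y_i - \theta\,\iota$ with $\iota = (1, 1, 1)'$. Several choices are illustrative only and are not part of the original analysis: the helper names `gbar` and `lp_gmm`, the parameter space $\Theta = [-10, 10]$, and the use of ternary search. Ternary search is valid here only because $\bar g$ is affine in $\theta$, which makes the criterion convex.

```python
import numpy as np

def gbar(theta, y):
    """Sample moments for the hypothetical model E(y_ji - theta) = 0, j = 1, ..., q."""
    return y.mean(axis=0) - theta            # shape (q,)

def lp_gmm(y, p, lo, hi, iters=200):
    """Minimize ||gbar(theta)||_p over Theta = [lo, hi] by ternary search.

    Valid here because gbar is affine in theta, so the criterion is convex.
    """
    def crit(t):
        return np.linalg.norm(gbar(t, y), ord=p)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if crit(m1) > crit(m2):
            lo = m1                          # a minimizer lies in [m1, hi]
        else:
            hi = m2                          # a minimizer lies in [lo, m2]
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
theta0, n, q = 1.0, 2000, 3
y = theta0 + rng.standard_normal((n, q))     # y_ji i.i.d. N(theta0, 1)
for p in (1, 2, 3):
    print(p, lp_gmm(y, p, lo=-10.0, hi=10.0))
```

In this model the L1 estimate is the median of the three sample means and the L2 estimate is their average.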

Assumption 1.1.
(i) $\Theta$ is a compact and convex subset of $\mathbb{R}^k$;
(ii) $\theta_0$ is an interior point of $\Theta$;
(iii) $E g(y_i, \theta) \neq 0$ if $\theta \neq \theta_0$, i.e., $\theta_0$ uniquely satisfies the moment conditions;
(iv) $g(y, \theta)$ is continuous in $\theta$ for each $y \in \mathbb{R}^m$, and is measurable for each $\theta \in \Theta$;
(v) $E \sup_{\theta \in \Theta} |g(y_i, \theta)| < \infty$;
(vi) $\Omega = E g(y_i, \theta_0) g(y_i, \theta_0)'$ is finite;
(vii) $g(y, \theta)$ has a first derivative with respect to $\theta$ that is continuous in $\theta \in \Theta$ for each $y \in \mathbb{R}^m$ and measurable for each $\theta \in \Theta$, $E \sup_{\theta \in \Theta} |(\partial/\partial\theta') g(y_i, \theta)| < \infty$, and $D$ is of full column rank;
(viii) $\|x + D\xi\|_p$ achieves its minimum at a unique point $\xi$ in $\mathbb{R}^k$ for each $x \in \mathbb{R}^q$.

Note that part (viii) of Assumption 1.1 is nonstandard and far from innocent in the L1 case. Consider, for example, sequences of random variables $y_{1i}$ and $y_{2i}$ that are independent of each other and $N(0, 1)$ distributed, and consider the L1-GMM estimator that minimizes

$$|\bar y_1 - \theta| + |\bar y_2 - \theta|. \tag{1.4}$$

Part (viii) of Assumption 1.1 does not hold in this case, because any value in the interval $[\min(\bar y_1, \bar y_2), \max(\bar y_1, \bar y_2)]$ minimizes the criterion function. Therefore, our result does not establish the limit distribution of the Lp-GMM estimator in this case. Part (viii) is satisfied, however, for the weighted criterion function

$$|\bar y_1 - \theta| + c\,|\bar y_2 - \theta| \tag{1.5}$$

with any $c \in [0, \infty)$ other than $c = 1$.
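The pure-Python sketch below shows numerically how part (viii) fails in the unweighted case and is restored by the weighting in (1.5). It evaluates the limit criterion $|x_1 - \xi| + c|x_2 - \xi|$ on a grid and lists every grid point that attains the minimum. The values $x_1 = -1$ and $x_2 = 0.5$ are hypothetical stand-ins for realizations of $\bar y_1$ and $\bar y_2$. The grid and the tolerance are arbitrary.

```python
def l1_criterion(xi, x1, x2, c):
    """Limit L1 criterion |x1 - xi| + c * |x2 - xi|, cf. (1.4) and (1.5)."""
    return abs(x1 - xi) + c * abs(x2 - xi)

def grid_minimizers(x1, x2, c, tol=1e-9):
    """Return every grid point whose criterion value is within tol of the grid minimum."""
    grid = [-2.0 + 0.01 * j for j in range(401)]     # step 0.01 on [-2, 2]
    values = [l1_criterion(xi, x1, x2, c) for xi in grid]
    best = min(values)
    return [xi for xi, v in zip(grid, values) if v - best < tol]

x1, x2 = -1.0, 0.5                  # hypothetical realizations of the two sample means
flat = grid_minimizers(x1, x2, c=1.0)
unique = grid_minimizers(x1, x2, c=2.0)
print(len(flat), min(flat), max(flat))   # a whole interval of minimizers
print(unique)                            # a single minimizer
```

With $c = 1$, every grid point in $[-1, 0.5]$ is a minimizer. With $c = 2$, the unique minimizer is $x_2$, the point with the larger weight.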
The following theorem summarizes the asymptotic properties of Lp-GMM estimators. We do not yet consider weight matrices explicitly at this point, but the extension is straightforward once the result below is available.

Theorem 1.2. Let $Y$ be a random vector in $\mathbb{R}^q$ distributed $N(0, \Omega)$. Then, under Assumption 1.1, $\hat\theta_n \to \theta_0$ a.s., and

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} \arg\min_{\xi \in \mathbb{R}^k} \|Y + D\xi\|_p. \tag{1.6}$$
The proof of this theorem, like all the proofs of this chapter, can be found in the Appendix (Section 1.A). The usual L2-GMM estimator is a special case of the above theorem. Since $\|Y + D\xi\|_2^2 = (Y + D\xi)'(Y + D\xi)$ is minimized by $\xi = -(D'D)^{-1}D'Y$, Theorem 1.2 gives

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} -(D'D)^{-1}D'Y \sim N\big[0,\ (D'D)^{-1}D'\Omega D(D'D)^{-1}\big], \tag{1.7}$$

which coincides with the usual analysis.
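Equation (1.7) can be checked by simulating the limit problem directly. The sketch below assumes NumPy. It draws $Y \sim N(0, \Omega)$, solves $\arg\min_\xi \|Y + D\xi\|_2$ by least squares for every draw, and compares the Monte Carlo covariance with the sandwich formula $(D'D)^{-1}D'\Omega D(D'D)^{-1}$. The design matrix `D`, the covariance `Omega`, the seed and the number of draws are all hypothetical choices.

```python
import numpy as np

D = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # hypothetical q x k = 3 x 2
Omega = np.diag([1.0, 2.0, 3.0])                      # hypothetical moment covariance

rng = np.random.default_rng(1)
R = 200_000
Y = rng.multivariate_normal(np.zeros(3), Omega, size=R)   # rows: draws of Y ~ N(0, Omega)

# argmin_xi ||Y + D xi||_2 for every draw = least-squares fit of -Y on D.
xi = np.linalg.lstsq(D, -Y.T, rcond=None)[0].T            # shape (R, 2)

DtD_inv = np.linalg.inv(D.T @ D)
V_theory = DtD_inv @ D.T @ Omega @ D @ DtD_inv            # covariance in (1.7)
V_mc = np.cov(xi, rowvar=False)
print(np.round(V_theory, 3))
print(np.round(V_mc, 3))
```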
In the examples below, we show that for general values of $p$ the Lp-GMM estimator need not be asymptotically normal. We will nevertheless be able to establish that the limit distribution is symmetric around 0 and has finite second moments.

1.3 Efficiency of L2-GMM

In this section and in the remainder of this chapter, we consider Lp-GMM estimation with a weight matrix $W$. That is, we consider Lp-GMM estimators that minimize the distance from zero of the "weighted" average of moment conditions, $\|W \bar g(\theta)\|_p$, where $W$ is a $q \times q$ nonrandom and nonsingular matrix. Extending our analysis to estimated matrices $W$ is straightforward, and we do not pursue that issue here. Clearly, whenever $E g(y_i, \theta_0) = 0$ we also have $E W g(y_i, \theta_0) = 0$, so our previous analysis applies. Below, we keep the notation $Y$, $D$, and $\Omega$ defined previously.

Let $\xi$ minimize $\|W(Y + D\xi)\|_p$. Applying Theorem 1.2, we see that $\hat\theta_n$, which minimizes $\|W \bar g(\theta)\|_p$, is strongly consistent, since $W g(y_i, \theta_0)$ is also a set of legitimate moment conditions. We also see that $n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} \xi$.

To facilitate the efficiency discussion, we need to show that Lp-GMM estimators are asymptotically unbiased. This follows because the limiting distribution of $n^{1/2}(\hat\theta_n - \theta_0)$ is symmetric around zero and has a finite second moment, so its mean is zero. The following theorem states the unbiasedness result:

Theorem 1.3. Under Assumption 1.1, Lp-GMM estimators are asymptotically unbiased.

Because our estimators are asymptotically unbiased, we can compare weighted Lp-GMM estimators by their asymptotic variances. This property is crucial to the proof of the following theorem, which states that optimal L2-GMM estimators are asymptotically efficient among the class of weighted Lp-GMM estimators.

Theorem 1.4. Under Assumption 1.1, an optimal L2-GMM estimator is asymptotically efficient among the class of weighted Lp-GMM estimators. That is, the asymptotic variance of an optimal L2-GMM estimator is less than or equal to that of any weighted Lp-GMM estimator.

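The above theorem shows that the central message of Newey (1988), that considering discrepancy functions other than quadratic ones offers no potential for efficiency improvement, extends to Lp-GMM estimators. Basically, Theorem 1.4 follows from viewing the expression for the limit distribution as a finite sample estimation problem in its own right, to which the Cramér-Rao lower bound applies. In this problem, one observes $Z = Y - D\xi_0 \sim N(-D\xi_0, \Omega)$ for an unknown $\xi_0 \in \mathbb{R}^k$ and estimates $\xi_0$ by $\arg\min_\xi \|W(Z + D\xi)\|_p$. This estimator equals $\xi_0$ plus $\arg\min_\xi \|W(Y + D\xi)\|_p$, so by the symmetry argument above it is unbiased. Its variance therefore cannot fall below the Cramér-Rao bound $(D'\Omega^{-1}D)^{-1}$.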
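Theorem 1.4 reduces the efficiency question to a comparison of limit problems, which is easy to explore by simulation when $k = 1$. The sketch below assumes NumPy. It minimizes $\|W(Y + D\xi)\|_p^p$ draw by draw with a vectorized ternary search, which is valid because the criterion is convex in $\xi$ for $p \geq 1$. It then compares the resulting variances with the optimal L2-GMM bound $(D'\Omega^{-1}D)^{-1} = 1/3$. The function name `limit_argmin`, the design $D = -(1, 1, 1)'$, $\Omega = I$, $W = I$ and the simulation sizes are illustrative assumptions.

```python
import numpy as np

def limit_argmin(a, b, p, iters=100):
    """Row-wise argmin over xi of sum_j |a[r, j] + b[j] * xi|**p (case k = 1).

    a: (R, q) array whose rows are draws of W Y; b: (q,) vector W D.
    Ternary search is valid because the criterion is convex in xi when p >= 1.
    """
    nz = b != 0
    roots = -a[:, nz] / b[nz]                      # minimizer of each nonconstant term
    lo, hi = roots.min(axis=1), roots.max(axis=1)  # the sum has a minimizer in here

    def crit(xi):
        return np.sum(np.abs(a + b * xi[:, None]) ** p, axis=1)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        right = crit(m1) > crit(m2)                # then a minimizer lies in [m1, hi]
        lo, hi = np.where(right, m1, lo), np.where(right, hi, m2)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(2)
q, R = 3, 100_000
D = -np.ones(q)                                    # hypothetical design: D = -(1, 1, 1)'
Y = rng.standard_normal((R, q))                    # Omega = I
W = np.eye(q)
bound = 1.0 / (D @ D)                              # (D' Omega^{-1} D)^{-1} = 1/3 here

for p in (1, 2, 3):
    xi = limit_argmin(Y @ W.T, W @ D, p)
    print(f"p = {p}: variance {xi.var():.4f} vs bound {bound:.4f}")
```

In this design, $W = I$ already satisfies $W\Omega W' = I$, so the L2 variance sits at the bound. The L1 limit, which is the median of three standard normals, has a visibly larger variance.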

1.4 Further remarks on weight matrices

In this section, we will discuss various issues involving the choice of the weight matrix
W and discuss several examples. We will not be able to prove optimality of a particular
nonsingular weight matrix for general Lp-GMM, but instead we will sketch some of the
issues below.

It is well known that the optimal weight matrix $W$ for $p = 2$ satisfies $W\Omega W' = I$ (or $W'W = \Omega^{-1}$). This result can also be obtained easily from our first theorem. The criterion

$$\|W(Y + D\xi)\|_2^2 = (Y + D\xi)'W'W(Y + D\xi) \tag{1.8}$$

is minimized by

$$\xi = -(D'W'WD)^{-1}D'W'WY, \tag{1.9}$$

and the variance of this minimizer is smallest when $W'W = \Omega^{-1}$. Therefore, the optimal L2-GMM estimator has the asymptotic distribution

$$n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} N\big[0,\ (D'\Omega^{-1}D)^{-1}\big]. \tag{1.10}$$
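The claim that the variance of (1.9) is minimized at $W'W = \Omega^{-1}$ can be checked numerically. The sketch below assumes NumPy. It computes the asymptotic variance $A\Omega A'$, where $A = (D'W'WD)^{-1}D'W'W$. It then checks two things: that this variance equals $(D'\Omega^{-1}D)^{-1}$ for a Cholesky-based optimal weight, and that it exceeds that bound, in the positive semidefinite sense, for an arbitrary nonsingular weight. `D`, `Omega` and the arbitrary weight `W_arb` are randomly generated hypothetical inputs.

```python
import numpy as np

def l2_limit_variance(W, D, Omega):
    """Covariance of xi = -A Y in (1.9), with A = (D'W'WD)^{-1} D'W'W and Y ~ N(0, Omega)."""
    WtW = W.T @ W
    A = np.linalg.solve(D.T @ WtW @ D, D.T @ WtW)
    return A @ Omega @ A.T

rng = np.random.default_rng(3)
q, k = 4, 2
D = rng.standard_normal((q, k))                    # hypothetical full-rank D
L = rng.standard_normal((q, q))
Omega = L @ L.T + q * np.eye(q)                    # hypothetical positive definite Omega

bound = np.linalg.inv(D.T @ np.linalg.solve(Omega, D))    # (D' Omega^{-1} D)^{-1}
W_opt = np.linalg.cholesky(np.linalg.inv(Omega)).T        # satisfies W'W = Omega^{-1}
W_arb = rng.standard_normal((q, q))                       # an arbitrary nonsingular weight

gap = l2_limit_variance(W_arb, D, Omega) - bound
print(np.allclose(l2_limit_variance(W_opt, D, Omega), bound))  # optimal weight attains the bound
print(np.linalg.eigvalsh(gap).min())                            # >= 0 up to rounding
```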

Can this efficiency be attained for $p$ other than 2? In general, the answer is yes: it can be achieved for general $p$ by weighting cleverly. Consider

$$W^* = (\Omega^{-1}D \,\vdots\, W_2)', \tag{1.11}$$

where $W_2$ is of size $q \times (q - k)$ and is chosen so that it is orthogonal to $D$ (i.e., $W_2'D = 0$) and $W^*$ is nonsingular. Such a weight matrix always exists when $q > k$.¹ Then

$$\|W^*(Y + D\xi)\|_p^p = \left\| \begin{pmatrix} D'\Omega^{-1}(Y + D\xi) \\ W_2'(Y + D\xi) \end{pmatrix} \right\|_p^p = \|D'\Omega^{-1}Y + D'\Omega^{-1}D\xi\|_p^p + \|W_2'Y\|_p^p \tag{1.12}$$

is minimized by $\xi = -(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}Y \sim N[0, (D'\Omega^{-1}D)^{-1}]$ for any $p \geq 1$. So the $W^*$-weighted Lp-GMM estimator $\hat\theta_n$, with $W^*$ chosen as in Equation (1.11), has the asymptotic distribution (1.10). The weight matrix $W^*$ is therefore optimal for any $p$.

¹This weight matrix requires $W_2'D = 0$ and $|W^*| \neq 0$, which amounts to $(q - k)k + 1$ restrictions. But $W^*$ has $q(q - k)$ free parameters, namely the entries of $W_2$. The number of parameters is greater than or equal to the number of restrictions when $q > k$.
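The sketch below, which assumes NumPy, builds one concrete version of the weight matrix $W^*$ in (1.11). Taking $W_2$ to be the left singular vectors of $D$ that span the orthogonal complement of its column space is our own choice. Any $W_2$ with $W_2'D = 0$ and $W^*$ nonsingular would do. The code checks, for several values of $p$, that at $\xi = -(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}Y$ the criterion in (1.12) reduces to $\|W_2'Y\|_p$ and that nearby values of $\xi$ do no better. `D`, `Omega` and the seed are hypothetical.

```python
import numpy as np

def w_star(D, Omega):
    """W* = (Omega^{-1} D ; W2)' as in (1.11), with W2'D = 0.

    W2 is taken as the last q - k left singular vectors of D (one choice among many).
    """
    q, k = D.shape
    U, _, _ = np.linalg.svd(D)          # full q x q matrix U by default
    W2 = U[:, k:]                       # columns orthogonal to the column space of D
    return np.hstack([np.linalg.solve(Omega, D), W2]).T

rng = np.random.default_rng(4)
q, k = 4, 2
D = rng.standard_normal((q, k))         # hypothetical full-rank D
L = rng.standard_normal((q, q))
Omega = L @ L.T + q * np.eye(q)         # hypothetical positive definite Omega
Ws = w_star(D, Omega)

Y = rng.multivariate_normal(np.zeros(q), Omega)
OiD = np.linalg.solve(Omega, D)                          # Omega^{-1} D
xi_gls = -np.linalg.solve(D.T @ OiD, OiD.T @ Y)          # -(D'Omega^{-1}D)^{-1} D'Omega^{-1} Y

def crit(xi, p):
    return np.linalg.norm(Ws @ (Y + D @ xi), ord=p)

for p in (1, 1.5, 3):
    nearby = min(crit(xi_gls + 0.1 * rng.standard_normal(k), p) for _ in range(200))
    print(p, crit(xi_gls, p), "<=", nearby)
```

With this choice, $W^*$ is automatically nonsingular. If $a'D'\Omega^{-1} + b'W_2' = 0$, post-multiplying by $D$ gives $a'D'\Omega^{-1}D = 0$, so $a = 0$. Then $b = 0$ as well, because $W_2$ has orthonormal columns.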
For $p = 2$, there are two different types of optimal weights. One is given by (1.11), which we may call the $D'\Omega^{-1}$ type. The other is characterized by $W\Omega W' = I$. Note that a scalar multiple of an optimal weight matrix is again optimal. In general, neither type implies the other, but both give one and the same asymptotic distribution. Furthermore, the optimal weight of the second type is not unique, since any orthogonal transformation of an optimal weight is again optimal: when $W\Omega W' = I$, $V = HW$ also satisfies $V\Omega V' = I$ provided $H'H = HH' = I$. The reason is that the $W$-weighted L2 distance $\|Wx\|_2 = (x'W'Wx)^{1/2}$ depends only on the product $W'W$ and not on $W$ itself.

When $p \neq 2$, however, two different orthogonal transformations $W$ and $V$ of $\Omega^{-1/2}$ cannot be expected to give equivalent asymptotic distributions, even though both $W\Omega W' = I$ and $V\Omega V' = I$ hold. Here are a few examples.
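A few lines of NumPy show that an orthogonal transformation preserves the weighted L2 distance but not the L1 distance. In the sketch below, $\Omega = I$ and $W = I$ are hypothetical choices, as are the random orthogonal $H$ (obtained from a QR decomposition) and the test vector $x$.

```python
import numpy as np

rng = np.random.default_rng(5)
q = 3
W = np.eye(q)                                      # with Omega = I, W Omega W' = I
H, _ = np.linalg.qr(rng.standard_normal((q, q)))   # a random orthogonal matrix
V = H @ W                                          # V Omega V' = H H' = I as well

x = rng.standard_normal(q)
print(np.linalg.norm(W @ x, 2), np.linalg.norm(V @ x, 2))   # equal: depends on W'W only
print(np.linalg.norm(W @ x, 1), np.linalg.norm(V @ x, 1))   # generally different
```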

(i) Our first example is for $p = 1$, $q = 2$, and $k = 1$. Suppose it is known that $(y_{1i}, y_{2i})$ is i.i.d. across $i$ with mean $(\theta_0, 2\theta_0)$ and covariance $I$. Then the moment condition is $E(y_{1i} - \theta_0,\ y_{2i} - 2\theta_0)' = 0$, and therefore $D = -(1\ 2)'$ and $\Omega = I$. Consider two weight matrices: $W = I$ and

$$V = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}. \tag{1.13}$$

$V$ is an optimal weight matrix here for every $p \in [1, \infty)$, since using $V$ yields the same limit distribution as optimal L2-GMM. In the case $W = I$, the $W$-weighted L1-GMM estimator can be obtained by minimizing the criterion function

$$|\bar y_1 - \theta| + |\bar y_2 - 2\theta|, \tag{1.14}$$

and the minimizer equals $(1/2)\bar y_2$, because the second term, $2|\bar y_2/2 - \theta|$, carries twice the weight of the first. This implies that the rescaled and centered $W$-weighted L1-GMM estimator is asymptotically distributed as $N(0, 1/4)$. The rescaled and centered $V$-weighted Lp-GMM estimator, by contrast, is asymptotically distributed as $N(0, 1/5)$.

(ii) Here is a more interesting example for $p = 1$, $q = 3$, and $k = 1$. Suppose that $y_{1i}$, $y_{2i}$, and $y_{3i}$ are mutually independent and i.i.d. across $i$, each with mean $\theta_0$ and variance 1. Then the moment condition is

$$E(y_{1i} - \theta_0,\ y_{2i} - \theta_0,\ y_{3i} - \theta_0)' = 0,$$

implying that $D = -(1, 1, 1)'$ and $\Omega = I$. Consider the two weight matrices $W = I$ and

$$V = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \\ -2/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6} \end{pmatrix}. \tag{1.15}$$

Again, $V$ can be shown to be an optimal weight matrix in this example. The case $W = I$ corresponds to minimizing the criterion function

$$|\bar y_1 - \theta| + |\bar y_2 - \theta| + |\bar y_3 - \theta|. \tag{1.16}$$

Note that both $W$ and $V$ are orthogonal. After centering and scaling, the $W$-weighted L1-GMM estimator converges in distribution to $\arg\min_\xi \|N(0, I_3) + D\xi\|_1$. The minimizing argument $\xi$ is the (unique) median of three independent standard normal random variables and has distribution

$$P(\xi \leq x) = 6 \int_{-\infty}^{x} \Phi(t)[1 - \Phi(t)]\phi(t)\,dt = \Phi(x)^2[3 - 2\Phi(x)] \tag{1.17}$$

(see Linton (1999)). This distribution is not normal. Simulations for three standard normals illustrate that the density of the median has a sharper center and thicker tails than a (properly rescaled) normal, $N(0, 2/3)$.

Using $V$ as the weight matrix gives a different result. We have $VD = (-\sqrt{3}\ 0\ 0)'$. After centering and scaling, the $V$-weighted L1-GMM estimator converges in distribution to

$$\arg\min_\xi \|Z + VD\xi\|_1 = \arg\min_\xi \{|Z_1 - \sqrt{3}\xi| + |Z_2| + |Z_3|\},$$

where $Z = VY = (Z_1, Z_2, Z_3)' \sim N(0, I)$. In effect, this optimal weight matrix makes two of the three absolute-value terms in the criterion function (1.16) free of $\xi$. The solution $\xi = Z_1/\sqrt{3}$ is distributed $N(0, 1/3)$. This example shows that two weight matrices $W$ and $V$ satisfying $W\Omega W' = I$ and $V\Omega V' = I$ can give asymptotics that differ not only in variance but also in the type of limit distribution: one distribution is non-normal, while the other is normal.

(iii) The only tractable example for $p > 2$ that we could find is the following. Consider the above case of $q = 3$, $k = 1$, $\Omega = I$, and $D = -(1\ 1\ 1)'$. The weight matrix $V$ of (1.15) is again optimal, and the $V$-weighted L3-GMM estimator is asymptotically normal. When the weight is $W = I$, the objective function to be minimized is

$$|\bar y_1 - \theta|^3 + |\bar y_2 - \theta|^3 + |\bar y_3 - \theta|^3. \tag{1.18}$$

After centering and rescaling, the $W$-weighted L3-GMM estimator converges in distribution to $\tilde\xi = \arg\min_\xi \|Y + D\xi\|_3$, where $Y = (Y_1, Y_2, Y_3)' \sim N(0, I_3)$. Let $(Y_{(1)}, Y_{(2)}, Y_{(3)})$ be the order statistics of $(Y_1, Y_2, Y_3)$, and let $(\delta_{(1)}, \delta_{(2)}, \delta_{(3)})$ be the order statistics of $(|Y_1 - Y_2|, |Y_2 - Y_3|, |Y_3 - Y_1|)$. Then it turns out that

$$\begin{aligned} \tilde\xi &= \bar Y - \operatorname{sgn}(Y_{(1)} + Y_{(3)} - 2Y_{(2)}) \Big[ \tfrac{2}{3}(\delta_{(3)} + \delta_{(2)}) - (2\delta_{(3)}\delta_{(2)})^{1/2} \Big] \\ &= Y_{(2)} - \operatorname{sgn}(Y_{(1)} + Y_{(3)} - 2Y_{(2)}) \Big[ \delta_{(3)} - (2\delta_{(3)}\delta_{(2)})^{1/2} \Big], \end{aligned} \tag{1.19}$$

where $\operatorname{sgn}(a) = 1\{a > 0\} - 1\{a < 0\}$ and $\bar Y = (Y_1 + Y_2 + Y_3)/3$. In simulations, this distribution cannot be distinguished from a normal.
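Two of the closed forms in these examples can be checked numerically. The sketch below assumes NumPy and the standard library's `math.erf`. It first compares the empirical distribution of the median of three standard normals with (1.17). It then compares both lines of (1.19) with a direct minimization of $\sum_j |Y_j - \xi|^3$, computed by bisection on the derivative, which is increasing in $\xi$. The seed, sample sizes and evaluation points are arbitrary.

```python
import math
import numpy as np

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Example (ii): the W = I limit is the median of three N(0, 1) draws; compare with (1.17).
rng = np.random.default_rng(6)
med = np.median(rng.standard_normal((200_000, 3)), axis=1)
for x in (-1.0, 0.0, 0.5, 1.5):
    print(x, (med <= x).mean(), Phi(x) ** 2 * (3 - 2 * Phi(x)))

# Example (iii): both lines of (1.19) versus a direct minimization of sum_j |Y_j - xi|^3.
def xi_closed_form(y):
    a, b, c = np.sort(y)                                           # Y_(1) <= Y_(2) <= Y_(3)
    d = np.sort(np.abs([y[0] - y[1], y[1] - y[2], y[2] - y[0]]))   # delta_(1), delta_(2), delta_(3)
    s = np.sign(a + c - 2 * b)                                     # sgn(Y_(1) + Y_(3) - 2 Y_(2))
    form1 = y.mean() - s * ((2 / 3) * (d[2] + d[1]) - np.sqrt(2 * d[2] * d[1]))
    form2 = b - s * (d[2] - np.sqrt(2 * d[2] * d[1]))
    return form1, form2

def xi_numeric(y, iters=200):
    """Bisection on the increasing derivative (up to a factor 3) sum_j |xi - y_j| (xi - y_j)."""
    lo, hi = y.min(), y.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(np.abs(mid - y) * (mid - y)) > 0:
            hi = mid                      # derivative positive: minimizer is to the left
        else:
            lo = mid
    return 0.5 * (lo + hi)

for y in rng.standard_normal((5, 3)):
    print(xi_closed_form(y), xi_numeric(y))
```

Both lines of (1.19) come from solving the first-order condition on the side of $Y_{(2)}$ with the larger gap, and the sign term selects that side.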

A natural question now arises: can optimality be obtained with a nonsingular weight matrix $W$ satisfying $W\Omega W' = I$? In short, the answer is yes, provided $D'\Omega^{-1}D$ equals a scalar or a scalar matrix (a scalar times the identity matrix). The question is whether $(\Omega^{-1}D \,\vdots\, W_2)'$, with $W_2'D = 0$, can be constructed by an orthogonal transformation of $\Omega^{-1/2}$. That is, we ask whether there exists an orthogonal $q \times q$ matrix $H$ such that $H\Omega^{-1/2} = \lambda(\Omega^{-1}D \,\vdots\, W_2)'$ and $W_2'D = 0$. If such an $H$ exists, it has the form $H = \lambda(\Omega^{-1/2}D \,\vdots\, \Omega^{1/2}W_2)'$ and satisfies:
(a) $W_2'D = 0$;
(b) $H$ is nonsingular, i.e., $|H| \neq 0$;
(c) $HH' = I$, i.e.,

$$HH' = \lambda^2 \begin{pmatrix} D'\Omega^{-1}D & D'W_2 \\ W_2'D & W_2'\Omega W_2 \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}. \tag{1.20}$$

Condition (a) imposes $(q - k)k$ restrictions, and (b) imposes 1 extra restriction. When $k = 1$, (c) is equivalent to $W_2'\Omega W_2 = (D'\Omega^{-1}D)I_{q-1}$ because of (a), and this imposes $(q - 1)(q - 2)/2$ more restrictions. Therefore, when $k = 1$, there are $q(q - 1) + 1$ free parameters (for $W_2$ and $\lambda$) and $(q - 1) + 1 + (q - 1)(q - 2)/2 = q(q - 1)/2 + 1$ restrictions. Since the number of parameters is greater than or equal to the number of restrictions, we conclude that $W_2$ satisfying (a), (b), and (c) can be found. When $k > 1$, (c) cannot be satisfied unless $D'\Omega^{-1}D$ is a scalar matrix. If $D'\Omega^{-1}D$ is a scalar matrix, however, $W_2$ and $\lambda$ satisfying (a), (b), and (c) can be found. The $V$ matrices in the examples above are constructed in this way and are optimal for those problems.
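For $k = 1$, the construction just described can be carried out mechanically. The sketch below assumes NumPy, and the function name `orthogonal_optimal_weight` is hypothetical. It takes the symmetric square root $\Omega^{-1/2}$ and completes the unit vector $\Omega^{-1/2}D/\|\Omega^{-1/2}D\|$ to an orthogonal matrix $H$ using a complete QR decomposition. It then returns $V = H\Omega^{-1/2}$. By construction, $V\Omega V' = HH' = I$, and every row of $V$ after the first is orthogonal to $D$. For Example (ii), the first row of (1.15) is reproduced up to sign. The other two rows may form a different, equally valid, orthonormal basis of the complement.

```python
import numpy as np

def orthogonal_optimal_weight(D, Omega):
    """Case k = 1: return V = H Omega^{-1/2}, where H is orthogonal with first row parallel
    to (Omega^{-1/2} D)', so that V Omega V' = I and V D = (+/-sqrt(D'Omega^{-1}D), 0, ..., 0)'."""
    evals, evecs = np.linalg.eigh(Omega)
    om_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric Omega^{-1/2}
    u = om_inv_half @ D                                       # q x 1
    Q, _ = np.linalg.qr(u, mode="complete")                   # Q[:, 0] = +/- u / ||u||
    return Q.T @ om_inv_half                                  # H = Q'

D = -np.ones((3, 1))                  # Example (ii): D = -(1, 1, 1)', Omega = I
V = orthogonal_optimal_weight(D, np.eye(3))
print(np.round(V, 4))
print(np.round((V @ D).ravel(), 4))
```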
Note that this rule does not depend on the specific value of $p$. Moreover, the weight $W$ satisfying $W\Omega W' = I$ is optimal for $p = 2$ because of the specific properties of the L2 distance, not because the optimal weight of type $D'\Omega^{-1}$ can be obtained by an orthogonal transformation of $\Omega^{-1/2}$.

1.5 Conclusion

In this chapter we derived the following abstract expression for the limit distribution of estimators that minimize the Lp distance between population moments and sample moments:

$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} \arg\min_{\xi \in \mathbb{R}^k} \|Y + D\xi\|_p, \tag{1.21}$$

where $Y \sim N[0, E g(y_i, \theta_0) g(y_i, \theta_0)']$ and $D = E(\partial/\partial\theta') g(y_i, \theta_0)$. This asymptotic representation generalizes the well-known GMM framework of Hansen (1982) to the Lp distance. As mentioned in the introduction, Manski (1983) and Newey (1988) generalized GMM to allow an arbitrary distance (or, more generally, discrepancy) function. Unfortunately, their results require second order differentiability of the distance function and nonsingularity of the Hessian matrix evaluated at the true parameter. Among all Lp distances, only the L2 distance satisfies these conditions.

Our analysis, however, cannot give an explicit form for the asymptotic distribution. It only yields the above abstract representation in terms of the argmin functional. Nonetheless, our method directly supports the result of Chamberlain (1987) that the optimal L2-GMM estimator is efficient among the class of Lp-GMM estimators. Interestingly, our analysis reduces the efficiency issues of Lp-GMM estimators to the small sample properties of estimators that minimize the Lp distance between $Y$ and $-D\xi$, i.e., $\arg\min_{\xi \in \mathbb{R}^k} \|Y + D\xi\|_p$.

As a final remark, the potential robustness properties of the Lp-GMM procedure are an interesting topic. The asymptotic results presented in this chapter all rely on central limit theorems and the existence of second moments, so in this sense we probably should not expect the Lp-GMM method to have robustness properties of any type. However, since the L1-GMM objective function effectively puts less weight on "outlier" moments, one might expect L1-GMM to be less vulnerable than "standard" L2-GMM estimators to the inclusion of an incorrect moment condition. This chapter makes no attempt to formalize this intuition.

1.A Mathematical Appendix

To establish the theorems, we need several results, which are stated as lemmas. Lemma 1.5 is used to prove Theorem 1.3 (the asymptotic unbiasedness of Lp-GMM estimators).

Lemma 1.5. Let a random vector $Y$ in $\mathbb{R}^q$, with $q$ finite, have a normal distribution. Let $D$ be a real nonrandom matrix of size $q \times k$ ($q \geq k$) with full column rank. Then, for any $p \in [1, \infty)$,

$$\hat\xi = \arg\min_{\xi \in \mathbb{R}^k} \|Y + D\xi\|_p \tag{1.22}$$

has a well-defined finite covariance matrix.

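Proof. First note that, because $D$ has full column rank, $D'D$ is positive definite, so that

$$\begin{aligned} \hat\xi'\hat\xi &\leq \hat\xi'D'D\hat\xi / \lambda_{\min}(D'D) \\ &\leq (\|Y + D\hat\xi\|_2 + \|Y\|_2)^2 / \lambda_{\min}(D'D) \\ &\leq c_{p/2}^{2/p} (\|Y + D\hat\xi\|_p + \|Y\|_p)^2 / \lambda_{\min}(D'D) \\ &\leq 4 c_{p/2}^{2/p} \|Y\|_p^2 / \lambda_{\min}(D'D). \end{aligned} \tag{1.23}$$

The four inequalities are justified as follows.
- The first follows from the full column rank of $D$.
- The second is the triangle inequality.
- The third is the inequality
$$\Big(\sum_{i=1}^q b_i^2\Big)^{1/2} \leq c_{p/2}^{1/p} \Big(\sum_{i=1}^q |b_i|^p\Big)^{1/p}, \tag{1.24}$$
which is a consequence of Loève's $c_r$ inequality (see Davidson (1994, p. 140), Equation (9.53)), with $c_r = 1$ for $r \leq 1$ and $c_r = q^{r-1}$ for $r > 1$.
- The fourth follows from the fact that $\|Y + D\xi\|_p$ is minimized at $\xi = \hat\xi$, so that $\|Y + D\hat\xi\|_p \leq \|Y\|_p$.

The result then follows because all moments of the normal distribution are finite. □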
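The constant in (1.24) can be made concrete. With Loève's $c_r$ constant, $c_{p/2}^{1/p}$ equals 1 for $p \leq 2$ and $q^{1/2 - 1/p}$ for $p > 2$. The sketch below assumes NumPy, and the choices of $q = 5$, the values of $p$ and the random vectors are arbitrary. It checks the inequality on random draws. It also confirms that the bound is attained by a unit vector when $p \leq 2$ and by the vector of ones when $p > 2$.

```python
import numpy as np

def c_r(r, q):
    """Constant in Loeve's c_r inequality |sum_i a_i|^r <= c_r * sum_i |a_i|^r (q terms)."""
    return 1.0 if r <= 1 else q ** (r - 1)

rng = np.random.default_rng(8)
q = 5
for p in (1.0, 1.5, 2.0, 3.0, 6.0):
    bound = c_r(p / 2, q) ** (1 / p)                    # constant in (1.24)
    worst = max(np.linalg.norm(b, 2) / np.linalg.norm(b, p)
                for b in rng.standard_normal((10_000, q)))
    print(p, worst, "<=", bound)
```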

The first step towards the proof of Theorem 1.2 is to prove strong consistency of Lp-GMM estimators, which can be done by invoking several theorems from Bierens (1994).

Lemma 1.6. Under Assumption 1.1, the Lp-GMM estimator $\hat\theta_n$ is strongly consistent.

Proof. First, conditions (i) and (iv) of Assumption 1.1 ensure the existence and measurability of $\hat\theta_n$ by Theorem 1.6.1 of Bierens (1994). These conditions, together with condition (v) of Assumption 1.1, imply that $\bar g(\theta)$ converges to $E g(y_i, \theta)$ almost surely, uniformly on $\Theta$, by Theorem 2.7.5 of Bierens (1994). Hence $\|\bar g(\theta)\|_p \to \|E g(y_i, \theta)\|_p$ a.s. uniformly on $\Theta$, since $\big|\,\|a\|_p - \|b\|_p\big| \leq \|a - b\|_p$ and all norms on $\mathbb{R}^q$ are equivalent. Finally, this uniform convergence result and the uniqueness of $\theta_0$ by condition (iii) of Assumption 1.1 give the stated result by Theorem 4.2.1 of Bierens (1994). □

To prove the main assertion of Theorem 1.2, we use Theorem 2.7 of Kim and Pollard (1990), which we restate as our next lemma.

Lemma 1.7. Let $Q, Q_1, Q_2, \ldots$ be real-valued random processes on $\mathbb{R}^k$ with continuous paths, and let $\hat\xi_n$ be random vectors in $\mathbb{R}^k$, such that
(i) $Q(\xi) \to \infty$ as $|\xi| \to \infty$;
(ii) $Q(\cdot)$ achieves its minimum at a unique point in $\mathbb{R}^k$;
(iii) $Q_n$ converges weakly to $Q$ on any set $C = [-M, M]^k$;
(iv) $\hat\xi_n = O_p(1)$;
(v) $\hat\xi_n$ minimizes $Q_n(\xi)$.
Then $\hat\xi_n \xrightarrow{d} \arg\min_{\xi \in \mathbb{R}^k} Q(\xi)$.

Proof. See Theorem 2.7 of Kim and Pollard (1990).

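To apply Kim and Pollard's theorem, we must show that its conditions hold in our situation, which requires the following three lemmas. These lemmas use the following definitions. Define

$$\hat\xi_n = n^{1/2}(\hat\theta_n - \theta_0), \quad n = 1, 2, \ldots, \tag{1.25}$$

where $\hat\theta_n$ is the Lp-GMM estimator. Define $\mathbb{R}^q$-valued random functions $h, h_1, h_2, \ldots$ by

$$h_n(\xi) = \begin{cases} n^{1/2}\bar g(\theta_0 + \xi n^{-1/2}), & \text{if } \theta_0 + \xi n^{-1/2} \in \Theta, \\ n^{1/2}\bar g(\bar\theta_n), & \text{otherwise,} \end{cases} \tag{1.26}$$

where $\bar\theta_n = \arg\max_{\theta \in \Theta} \|\bar g(\theta)\|_p$ for $n = 1, 2, \ldots$, and by $h(\xi) = Y + D\xi$, where $Y$ is an $\mathbb{R}^q$-valued random vector distributed $N(0, \Omega)$. Let

$$Q_n(\xi) = \|h_n(\xi)\|_p \quad \text{and} \quad Q(\xi) = \|h(\xi)\|_p. \tag{1.27}$$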
The lemmas needed for the proof of our central result are the following.

Lemma 1.8. Suppose the conditions of Assumption 1.1 are satisfied. Then $n^{1/2}(\hat\theta_n - \theta_0) = O_p(1)$.

Proof. The Taylor expansion of $\bar g(\theta)$ around $\theta = \theta_0$ implies that

$$\bar g(\hat\theta_n) = \bar g(\theta_0) + (\partial/\partial\theta')\bar g(\tilde\theta_n)(\hat\theta_n - \theta_0), \tag{1.28}$$

where $\tilde\theta_n$ is a mean value between $\hat\theta_n$ and $\theta_0$. From this Taylor expansion, the triangle inequality for the Lp norm, and the fact that $\hat\theta_n$ minimizes $\|\bar g(\theta)\|_p$, we have

$$\|n^{1/2}(\partial/\partial\theta')\bar g(\tilde\theta_n)(\hat\theta_n - \theta_0)\|_p \leq \|n^{1/2}\bar g(\hat\theta_n)\|_p + \|n^{1/2}\bar g(\theta_0)\|_p \leq 2\|n^{1/2}\bar g(\theta_0)\|_p. \tag{1.29}$$

But condition (vi) of Assumption 1.1 implies, by the central limit theorem, that $n^{1/2}\bar g(\theta_0)$ converges in distribution, so it is $O_p(1)$. Therefore,

$$\|n^{1/2}(\partial/\partial\theta')\bar g(\tilde\theta_n)(\hat\theta_n - \theta_0)\|_p = O_p(1). \tag{1.30}$$

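Condition (vii) of Assumption 1.1 implies that $(\partial/\partial\theta')\bar g(\theta)$ obeys a strong uniform law of large numbers. Combined with the consistency of $\hat\theta_n$, this implies that $(\partial/\partial\theta')\bar g(\tilde\theta_n) \to E(\partial/\partial\theta')g(y_i, \theta_0) = D$ a.s. Now let $\tilde D_n = (\partial/\partial\theta')\bar g(\tilde\theta_n)$. It follows that $\tilde D_n'\tilde D_n \to D'D$ a.s. Since $D'D$ is strictly positive definite, $\tilde D_n'\tilde D_n$ is also strictly positive definite for $n$ large enough. Therefore, for $n$ large enough,

$$n(\hat\theta_n - \theta_0)'(\hat\theta_n - \theta_0) \leq n(\hat\theta_n - \theta_0)'\tilde D_n'\tilde D_n(\hat\theta_n - \theta_0)/\tilde\lambda_{\min}, \tag{1.31}$$

where $\tilde\lambda_{\min}$ is the smallest eigenvalue of $\tilde D_n'\tilde D_n$. Let $\lambda_{\min}$ be the smallest eigenvalue of $D'D$. Because $\tilde\lambda_{\min} \to \lambda_{\min} > 0$ a.s., we have $\tilde\lambda_{\min} \geq 0.5\lambda_{\min}$ eventually (for $n$ large enough) almost surely. Therefore, as $n$ increases, the right-hand side of (1.31) is eventually at most $4n(\hat\theta_n - \theta_0)'\tilde D_n'\tilde D_n(\hat\theta_n - \theta_0)/\lambda_{\min}$. By Equation (1.30), and because all norms on $\mathbb{R}^q$ are equivalent, this expression is $O_p(1)$, which completes the proof. □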

Lemma 1.9. Consider the random functions $Q, Q_1, Q_2, \ldots$ defined by (1.27). Under Assumption 1.1, the finite-dimensional distributions of $Q_n$ converge to those of $Q$.

Proof. For fixed $\xi$, condition (ii) of Assumption 1.1 ($\theta_0$ is an interior point of $\Theta$) ensures that $\theta_0 + \xi n^{-1/2}$ belongs to $\Theta$ for $n$ large enough. In that case, the Taylor expansion gives

$$h_n(\xi) = n^{1/2}\bar g(\theta_0 + \xi n^{-1/2}) = n^{1/2}\bar g(\theta_0) + (\partial/\partial\theta')\bar g(\theta_0 + \tilde\xi n^{-1/2})\xi, \tag{1.32}$$

with $\tilde\xi$ lying between $\xi$ and 0. Condition (vi) of Assumption 1.1 (finiteness of the second moment of $g(y_i, \theta_0)$) implies that $n^{1/2}\bar g(\theta_0) \xrightarrow{d} Y$. As in the proof of Lemma 1.8, condition (vii) of Assumption 1.1 implies that $(\partial/\partial\theta')\bar g(\theta_0 + \tilde\xi n^{-1/2})\xi \to D\xi$ a.s.

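To conclude the proof, we show that the finite-dimensional distributions of $h_n$ converge to those of $h$. For this we use the Cramér-Wold device (see, for example, Billingsley (1968), Theorem 7.7), which states that

$$(h_n(\xi_1)', \ldots, h_n(\xi_r)')' \xrightarrow{d} (h(\xi_1)', \ldots, h(\xi_r)')' \tag{1.33}$$

if and only if

$$\sum_{j=1}^r \lambda_j' h_n(\xi_j) \xrightarrow{d} \sum_{j=1}^r \lambda_j' h(\xi_j) \tag{1.34}$$

for each $\lambda_1, \ldots, \lambda_r \in \mathbb{R}^q$. Condition (1.34) follows from the first part of this proof. By (1.32), $\sum_j \lambda_j' h_n(\xi_j) = (\sum_j \lambda_j)' n^{1/2}\bar g(\theta_0) + \sum_j \lambda_j' D\xi_j + o(1)$ a.s. By Slutsky's theorem, this converges in distribution to $(\sum_j \lambda_j)'Y + \sum_j \lambda_j' D\xi_j = \sum_j \lambda_j' h(\xi_j)$.

Finally, since $\|\cdot\|_p$ is continuous, the continuous mapping theorem implies that the finite-dimensional distributions of $Q_n = \|h_n\|_p$ converge to those of $Q = \|h\|_p$. □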

Lemma 1.10. Under Assumption 1.1, $Q_n(\cdot)$ defined by Equation (1.27) is stochastically equicontinuous on any set $E = [-M, M]^k$.

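Proof. Using the triangle inequality for $\|\cdot\|_p$, we have

$$\begin{aligned} |Q_n(\xi_1) - Q_n(\xi_2)| &= \big|\,\|h_n(\xi_1)\|_p - \|h_n(\xi_2)\|_p\big| \\ &\le \|h_n(\xi_1) - h_n(\xi_2)\|_p \\ &= \|(\partial/\partial\theta')\bar g(\theta_0 + \bar\xi_1 n^{-1/2})\,\xi_1 - (\partial/\partial\theta')\bar g(\theta_0 + \bar\xi_2 n^{-1/2})\,\xi_2\|_p, \end{aligned} \tag{1.35}$$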

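where $\bar\xi_i$ lies between $\xi_i$ and 0 for $i = 1, 2$. By the strong uniform law of large numbers for $(\partial/\partial\theta')\bar g(\theta)$ and the convergence to zero of $\bar\xi_i n^{-1/2}$ uniformly over all $\xi_1$ and $\xi_2$ in $E$,

$$\sup_{\xi_1,\xi_2\in E,\ |\xi_1-\xi_2|<\delta}\|h_n(\xi_1) - h_n(\xi_2)\|_p \xrightarrow{a.s.} \sup_{\xi_1,\xi_2\in E,\ |\xi_1-\xi_2|<\delta}\|D(\xi_1-\xi_2)\|_p \tag{1.36}$$

under the conditions of Assumption 1.1. Therefore, since the right-hand side of (1.36) is bounded by a constant multiple of $\delta$, it follows that for all $\eta > 0$

$$\lim_{\delta\to 0}\,\limsup_{n\to\infty}\,P\Big(\sup_{\xi_1,\xi_2\in E,\ |\xi_1-\xi_2|<\delta}|Q_n(\xi_1) - Q_n(\xi_2)| > \eta\Big) = 0, \tag{1.37}$$

which is the stochastic equicontinuity condition. □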

Proof of Theorem 1.2. The strong consistency result of this theorem is proven in Lemma 1.6. For the main assertion of the theorem, we show that, for $Q_n$, $Q$ and $\xi_n$ as defined by (1.25) and (1.27), all the conditions of Lemma 1.7 are implied by the conditions of Theorem 1.2. Condition (v) of Lemma 1.7 is guaranteed by the definitions of $\hat\theta_n$, $\xi_n$, and $Q_n$. Condition (i) of Lemma 1.7 is trivially satisfied since $D$ has full column rank, and condition (ii) of Lemma 1.7 is exactly what condition (viii) of Theorem 1.2 assumes. The weak convergence condition is verified by showing stochastic equicontinuity and finite-dimensional convergence, which together with compactness of the parameter space is well known to imply weak convergence. Lemmas 1.8, 1.9, and 1.10 therefore ensure that the conditions of Lemma 1.7 are all implied by the conditions of Theorem 1.2, and convergence in distribution of our estimator follows by invoking Lemma 1.7. □

Proof of Theorem 1.3. By Lemma 1.5, $\xi$ has a finite mean. By Theorem 1.2, $n^{1/2}(\hat\theta_n-\theta_0) \xrightarrow{d} \xi = \arg\min_{s\in\mathbb{R}^k}\|Y + Ds\|_p$, where $Y \sim N(0,\Omega)$ and $D = E(\partial/\partial\theta')g(y_i,\theta_0)$. From the symmetry of $Y$, the random function $s \mapsto \|Y + Ds\|_p$ is distributed identically to $s \mapsto \|Y + D(-s)\|_p$, which implies identical distributions of $\xi$ and $-\xi$. Therefore, the mean of $\xi$ is 0. □
Proof of Theorem 1.4. Let $\hat\theta_n$ be the $W$-weighted $L_p$-GMM estimator. By Theorem 1.2,

$$n^{1/2}(\hat\theta_n-\theta_0) \xrightarrow{d} \arg\min_{s}\|W(Y + Ds)\|_p. \tag{1.38}$$

So the problem here is to show that $\xi_2 = \arg\min_{s}\|\Omega^{-1/2}(Y + Ds)\|_2$ has smaller variance than any other $\xi_p = \arg\min_{s}\|W(Y + Ds)\|_p$. Now let us view the minimization problem $\min_{s}\|Y + Ds\|_p$ as generating estimators $\hat\xi_p$ of the unknown parameter $\xi$, where $Y \sim N(-D\xi, \Omega)$ with $\Omega$ known. The result of Theorem 1.3 now states that all $L_p$-GMM estimators are asymptotically unbiased, and the argument easily extends to show global unbiasedness of $\hat\xi_p$ for $\xi$ (as required for the application of the Cramér–Rao lower variance bound). The likelihood function

$$L(Y, D; \xi) = (2\pi)^{-\dim(Y)/2}|\Omega|^{-1/2}\exp\{-(1/2)(Y + D\xi)'\Omega^{-1}(Y + D\xi)\} \tag{1.39}$$

satisfies all the regularity conditions required for the Cramér–Rao inequality (see Theil (1971), p. 384). It now follows that the asymptotic distribution of the optimal $L_2$-GMM estimator

$$\arg\min_{s\in\mathbb{R}^k}\|\Omega^{-1/2}(Y + Ds)\|_2 = -(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}Y \tag{1.40}$$

attains the Cramér–Rao variance lower bound of $(D'\Omega^{-1}D)^{-1}$, since it equals the maximum likelihood estimator. The result then follows. □
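To see Theorems 1.3 and 1.4 at work in the simplest setting, the Python/NumPy sketch below draws from the limit $\xi_p = \arg\min_s\|Y + Ds\|_p$ for a scalar parameter, assuming $\Omega = I$, $W = I$ and a hypothetical $D = (1,1,1)'$. With these choices the $L_1$ minimizer is minus the median of the three components of $Y$ and the $L_2$ minimizer is minus their mean, so both limits should be centred at zero while the $L_1$ limit has the larger variance (about 0.45 against 1/3). The convex objective is minimized by a vectorized ternary search; that is an illustrative device, not the estimation method of the chapter.

```python
import numpy as np

def lp_gmm_limit_draws(d, p, n_draws=4000, seed=0, bound=20.0, iters=100):
    """Monte Carlo draws of xi_p = argmin_s ||Y + d*s||_p with Y ~ N(0, I), scalar s."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    Y = rng.standard_normal((n_draws, d.size))   # Omega = I and W = I (assumptions)

    def objective(s):
        # p-th power of the L_p norm, row by row; it has the same minimizer as the norm
        return np.sum(np.abs(Y + np.outer(s, d)) ** p, axis=1)

    lo = np.full(n_draws, -bound)
    hi = np.full(n_draws, bound)
    for _ in range(iters):
        # Ternary search on each row; valid because the objective is convex in s
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        left = objective(m1) < objective(m2)
        hi = np.where(left, m2, hi)
        lo = np.where(left, lo, m1)
    return 0.5 * (lo + hi)

d = [1.0, 1.0, 1.0]
xi_1 = lp_gmm_limit_draws(d, p=1)   # L1: minus the median of Y's components
xi_2 = lp_gmm_limit_draws(d, p=2)   # L2: minus the mean of Y's components
print(xi_1.mean(), xi_2.mean())     # both near 0 (Theorem 1.3)
print(xi_1.var(), xi_2.var())       # L1 variance exceeds L2 variance (Theorem 1.4)
```

Both calls use the same seed and hence the same draws of $Y$, which makes the variance comparison less noisy.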


Chapter 2

The Asymptotic Distribution of the
Instrumental Variable Estimators When
the Instruments Are Not Correlated
with the Regressors

2.1 Introduction

A number of recent papers, including Bound, Jaeger and Baker (1995) and Staiger and
Stock (1997), have considered instrumental variable (IV) estimators when the instruments
are weak, in the sense that the correlation between the instruments and the regressors is low.
In this chapter, we consider the extreme case that the instruments are completely irrelevant.
In this case we can prove the following interesting result: the mean of the asymptotic
distribution of the IV estimator is the same as the probability limit of the OLS estimator.
Thus, as might be expected, irrelevant instruments do not remove the least squares bias.

To be specific, consider the linear model $y = X\beta + \varepsilon$ (in matrix notation), where $\varepsilon$ is a $T\times 1$ random vector with mean zero, $X$ is a $T\times K$ random matrix of regressors, and $\beta$ is a $K\times 1$ parameter. It is well known that when $X_t$ is correlated with $\varepsilon_t$, the ordinary least squares (OLS) estimator is not consistent. More specifically, under regularity conditions that ensure the convergence in probability of $T^{-1}X'X$ and $T^{-1}X'\varepsilon$, the OLS estimator converges in probability as $T\to\infty$ to $\beta_0 + (EX_tX_t')^{-1}EX_t\varepsilon_t$, which differs from $\beta_0$, the true parameter, unless $EX_t\varepsilon_t = 0$.

To obtain a consistent estimator, one possibility is instrumental variable estimation. Good instruments $Z$ ($T\times L$) are those which satisfy:
(i) $T^{-1}Z'Z$ converges in probability to a nonrandom, nonsingular matrix;
(ii) $T^{-1}Z'X$ converges in probability to a nonrandom matrix with full column rank;
(iii) $T^{-1/2}Z'\varepsilon$ converges in distribution to a normal random vector with zero mean.

When the instruments are good, the IV estimator is consistent and asymptotically normal. Here we are concerned with the case that condition (ii) fails. Suppose that $L \ge K$, so that there are enough instruments, but the instruments ($Z$) are not strongly correlated with the regressors ($X$). Specifically, let the reduced form for $X$ be:

$$X = Z\Pi + V. \tag{2.1}$$

Staiger and Stock (1997) consider the case that $\Pi = \Pi_T = C/\sqrt{T}$, with $C$ an $L\times K$ matrix of constants. They call this the case of weak instruments. In this case the correlation between $X_t$ and $Z_t$ is of order $T^{-1/2}$, and condition (ii) fails. Staiger and Stock show that with weak instruments $\hat\beta_{IV}$, the IV estimator, does not have a probability limit; rather, $\hat\beta_{IV} - \beta_0$ converges in distribution to a non-normal random variable. The mean of the asymptotic distribution of $\hat\beta_{IV} - \beta_0$ is nonzero, so that with weak instruments there is asymptotic bias. This bias is in the same direction as the bias of OLS.

In this chapter we consider the case of irrelevant instruments, which are uncorrelated with the regressors. This is a special case of Staiger and Stock, corresponding to $C = 0$, so that $\Pi = 0$ in the reduced form (2.1) for all $T$. In this case we show that the mean of the asymptotic distribution of $(\hat\beta_{IV} - \beta_0)$ is the same as $(\operatorname{plim}\hat\beta_{OLS} - \beta_0)$, the asymptotic bias of the OLS estimator.

2.2 The limit distribution

Consider a linear model in matrix notation

$$y = X^\circ\beta + W\gamma + \varepsilon \tag{2.2}$$

where $y$ and $X^\circ$ are respectively a $T\times 1$ vector of dependent variables and a $T\times K$ matrix of endogenous regressors, $W$ is a $T\times G$ matrix of exogenous regressors, the first column of which is a vector of ones, $\varepsilon$ is the vector of errors, and $\beta$ and $\gamma$ are the parameters to be estimated.

Consider a $T\times L$ random matrix $Z^\circ$ of "instruments." For any matrix $A$ with full column rank, let $P_A = A(A'A)^{-1}A'$. Let $X = (I - P_W)X^\circ$ and $Z = (I - P_W)Z^\circ$. Thus $X$ is the part of the endogenous regressors not explained by the exogenous regressors, and similarly $Z$ is the part of the "instruments" not explained by the exogenous regressors.

We make the following “high level” assumptions.

Assumption 2.1. $T^{-1}X'X$, $T^{-1}\varepsilon'\varepsilon$, and $T^{-1}Z'Z$ converge in probability to finite, nonrandom, nonsingular matrices, and $T^{-1}X'\varepsilon$ converges in probability to a nonrandom matrix.

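Let $\Sigma = \operatorname{plim}T^{-1}(X,\varepsilon)'(X,\varepsilon)$. It has submatrices $\Sigma_{XX}$, $\Sigma_{X\varepsilon}$, and $\sigma_{\varepsilon\varepsilon}$, which are the probability limits of $T^{-1}X'X$, $T^{-1}X'\varepsilon$, and $T^{-1}\varepsilon'\varepsilon$, respectively. Also let $\Omega = \operatorname{plim}T^{-1}Z'Z$. Assumption 2.1 can be regarded as the implication of a law of large numbers under more primitive assumptions on the sequences. For example, when the sequence $(\varepsilon_t, X_t^{\circ\prime}, Z_t^{\circ\prime})'$ is i.i.d. and its second moment exists, $\Sigma_{XX} = EX_t^\circ X_t^{\circ\prime} - EX_t^\circ W_t'(EW_tW_t')^{-1}EW_tX_t^{\circ\prime}$, $\Sigma_{X\varepsilon} = EX_t^\circ\varepsilon_t - EX_t^\circ W_t'(EW_tW_t')^{-1}EW_t\varepsilon_t = EX_t^\circ\varepsilon_t$, $\sigma_{\varepsilon\varepsilon} = E\varepsilon_t^2$, and $\Omega = EZ_t^\circ Z_t^{\circ\prime} - EZ_t^\circ W_t'(EW_tW_t')^{-1}EW_tZ_t^{\circ\prime}$.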

Let $\rho = \Sigma_{XX}^{-1/2}\Sigma_{X\varepsilon}\sigma_{\varepsilon\varepsilon}^{-1/2}$, which is a multivariate correlation coefficient. A key assumption is the irrelevance of $Z$ as instruments for $X$, as follows.

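Assumption 2.2. $T^{-1/2}Z'(X,\varepsilon) \xrightarrow{d} \Omega^{1/2}(\xi\Sigma_{XX}^{1/2},\ \eta\sigma_{\varepsilon\varepsilon}^{1/2})$, where $(\mathrm{vec}(\xi)', \eta')'$ is a multivariate centered normal with $E\,\mathrm{vec}(\xi)\mathrm{vec}(\xi)' = I$, $E\eta\eta' = I$ and $E\,\mathrm{vec}(\xi)\eta' = \rho\otimes I$.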

Note that Assumption 2.2 implies $T^{-1}Z'X \xrightarrow{p} 0$, which agrees with an intuitive definition of irrelevant instruments. Also, this assumption can be regarded as the implication of a central limit theorem under more primitive assumptions, as above.

Now let $\hat\beta_{IV}$ be the estimate of $\beta$ in equation (2.2) when estimation is by IV using $(Z^\circ, W)$ as instruments. It is readily shown that

$$\hat\beta_{IV} - \beta_0 = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'\varepsilon. \tag{2.3}$$

By dividing $Z'Z$ by $T$, and $Z'X$ and $Z'\varepsilon$ by $T^{1/2}$, we observe that $\hat\beta_{IV} - \beta_0$ is a function $\varphi$ of $(T^{-1}Z'Z, T^{-1/2}Z'X, T^{-1/2}Z'\varepsilon)$, where $\varphi: \mathbb{R}^{L\times L}\times\mathbb{R}^{L\times K}\times\mathbb{R}^{L\times 1} \to \mathbb{R}^{K\times 1}$ is defined by $\varphi(\Omega, A, b) = (A'\Omega^{-1}A)^{-1}A'\Omega^{-1}b$. Obviously, $\varphi$ is measurable and is almost surely continuous at the limit. Here continuity is assured by the nonsingularity of the limit of $T^{-1}Z'Z$ and the almost sure full column rank of the limit of $T^{-1/2}Z'X$. Therefore, we can apply the continuous mapping theorem to obtain the following result.

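Theorem 2.3. Under Assumptions 2.1 and 2.2,

$$\hat\beta_{IV} \xrightarrow{d} \beta_{asy} = \beta_0 + \Sigma_{XX}^{-1/2}(\xi'\xi)^{-1}\xi'\eta\,\sigma_{\varepsilon\varepsilon}^{1/2}, \tag{2.4a}$$

or equivalently,

$$\delta = \Sigma_{XX}^{1/2}(\hat\beta_{IV} - \beta_0)\sigma_{\varepsilon\varepsilon}^{-1/2} \xrightarrow{d} \delta_{asy} = (\xi'\xi)^{-1}\xi'\eta. \tag{2.4b}$$

We note that the result in (2.4a) is the same as equation (2.5) of Staiger and Stock (1997, p. 562) when $C = 0$ (and therefore $\lambda = 0$ in their equations (2.3a) and (2.3b)).

We now calculate the density of $\delta_{asy}$, as follows.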

Theorem 2.4. Under Assumptions 2.1 and 2.2, the density of $\delta_{asy}$ is

$$f(d) = C_{K,L}\,(1-\rho'\rho)^{-K/2}\,\big|I_K + (1-\rho'\rho)^{-1}(d-\rho)(d-\rho)'\big|^{-(L+1)/2} \tag{2.5}$$

where $C_{K,L} = \pi^{-K/2}\,\Gamma\big(\tfrac{L+1}{2}\big)\,\Gamma\big(\tfrac{L-K+1}{2}\big)^{-1}$.

Proof. See Appendix. □

Given $K$ (the dimension of $X_t$) and $L$ (the dimension of $Z_t$), the density depends upon $\rho$ only. As is mentioned in Phillips (1980), this density is similar to the multivariate $t$ distribution. The first moment of $\delta_{asy}$ exists as long as $L$ is strictly greater than $K$, and more generally its integer moments exist up to the degree of over-identification (see Phillips (1980, p. 870)).
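Combining (2.14), (2.17) and (2.19) and using $|I + ab'| = 1 + b'a$, the density of $\delta_{asy}$ can be written as $f(d) = C_{K,L}(1-\rho'\rho)^{-K/2}[1 + (d-\rho)'(d-\rho)/(1-\rho'\rho)]^{-(L+1)/2}$ with $C_{K,L} = \pi^{-K/2}\Gamma(\frac{L+1}{2})/\Gamma(\frac{L-K+1}{2})$. Because $f$ depends on $d$ only through $\|d-\rho\|$, it can be integrated in spherical coordinates. The Python/NumPy sketch below does this as a numerical check that the density integrates to one; the values of $K$, $L$ and $\rho'\rho$, the truncation radius and the grid size are illustrative assumptions, not quantities from the text.

```python
import math
import numpy as np

def log_c(K, L):
    """log C_{K,L}, with C_{K,L} = pi^{-K/2} Gamma((L+1)/2) / Gamma((L-K+1)/2)."""
    return (-0.5 * K * math.log(math.pi)
            + math.lgamma((L + 1) / 2) - math.lgamma((L - K + 1) / 2))

def density_from_q(q, K, L, s):
    """f expressed through q = (d - rho)'(d - rho) and s = 1 - rho'rho."""
    q = np.asarray(q, dtype=float)
    return np.exp(log_c(K, L) - 0.5 * K * math.log(s)
                  - 0.5 * (L + 1) * np.log1p(q / s))

def iv_limit_density(d, rho, L):
    """Density of delta_asy at the point d (d and rho are length-K vectors)."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    rho = np.atleast_1d(np.asarray(rho, dtype=float))
    diff = d - rho
    return float(density_from_q(diff @ diff, rho.size, L, 1.0 - rho @ rho))

def total_mass(K, L, rho_sq, r_max=200.0, n=400_001):
    """Integrate f over R^K in spherical coordinates centred at rho (trapezoid rule)."""
    r = np.linspace(0.0, r_max, n)
    sphere = 2.0 * math.pi ** (K / 2) / math.gamma(K / 2)   # area of the unit sphere in R^K
    g = sphere * r ** (K - 1) * density_from_q(r ** 2, K, L, 1.0 - rho_sq)
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(r)))

for K in (1, 2, 3):
    print(K, total_mass(K, L=K + 2, rho_sq=0.25))   # each should be close to 1
```

The check uses $L = K + 2$, so the tails decay fast enough for truncation at `r_max` to be harmless; in the exactly identified case $L = K$ the density still integrates to one, but its mean does not exist.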
2.3 The relationship with the OLS estimator

We are now in a position to prove our main result.

Theorem 2.5. Suppose $L > K$. Then under Assumptions 2.1 and 2.2, the mean of $\beta_{asy}$ is equal to the probability limit of the OLS estimator.

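Proof. We observe that the density (2.5) of $\delta_{asy}$, defined in (2.4b), is symmetric around $\rho$, the correlation coefficient of the endogenous regressors and the error. Furthermore, if $L > K$, the mean of $\delta_{asy}$ exists. Therefore, if $L > K$, $E\delta_{asy} = \rho$. Then

$$\begin{aligned} E\beta_{asy} &= \beta_0 + \Sigma_{XX}^{-1/2}E\delta_{asy}\,\sigma_{\varepsilon\varepsilon}^{1/2} \\ &= \beta_0 + \Sigma_{XX}^{-1/2}\rho\,\sigma_{\varepsilon\varepsilon}^{1/2} \\ &= \beta_0 + \Sigma_{XX}^{-1}\Sigma_{X\varepsilon} \\ &= \operatorname{plim}\hat\beta_{OLS}. \end{aligned} \tag{2.6}$$

□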

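An alternative proof that does not depend on the exact form of the density of $\delta_{asy}$ is as follows. When the mean of $\delta_{asy} = (\xi'\xi)^{-1}\xi'\eta$ exists,

$$E(\xi'\xi)^{-1}\xi'\eta = E\big[(\xi'\xi)^{-1}\xi'E(\eta\mid\mathrm{vec}(\xi))\big] \tag{2.7}$$

by the law of iterated expectations. But since $E\,\mathrm{vec}(\xi)\mathrm{vec}(\xi)'$, $E\,\mathrm{vec}(\xi)\eta'$, and $E\eta\eta'$ are respectively equal to $I\otimes I$, $\rho\otimes I$, and $I$,

$$E[\eta\mid\mathrm{vec}(\xi)] = (\rho'\otimes I)(I\otimes I)^{-1}\mathrm{vec}(\xi) = \xi\rho. \tag{2.8}$$

(For the operations involving the Kronecker product and vec operators, see Magnus and Neudecker (1988, Ch. 2).) Hence, $E(\xi'\xi)^{-1}\xi'\eta = E(\xi'\xi)^{-1}\xi'\xi\rho = \rho$. It follows that $E\beta_{asy} = \beta_0 + \Sigma_{XX}^{-1/2}E\delta_{asy}\,\sigma_{\varepsilon\varepsilon}^{1/2} = \beta_0 + \Sigma_{XX}^{-1}\Sigma_{X\varepsilon}$, which is equal to the probability limit of the OLS estimator, as in the original proof. □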
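A small Monte Carlo experiment illustrates Theorem 2.5. The Python/NumPy sketch below assumes a single endogenous regressor ($K = 1$), no exogenous regressors $W$ (so no partialling out is needed), standard normal $X_t$ and $\varepsilon_t$ with correlation $\rho$, and $L$ instruments drawn independently of $(X_t, \varepsilon_t)$. Under these assumptions $\operatorname{plim}\hat\beta_{OLS} = \beta_0 + \rho$, and the average IV estimate across replications should be close to that value rather than to $\beta_0$. All parameter values are illustrative.

```python
import numpy as np

def simulate_iv_and_ols(T=200, L=4, rho=0.5, beta=1.0, reps=5000, seed=0):
    """Replications of IV with irrelevant instruments, and of OLS, in y = x*beta + e."""
    rng = np.random.default_rng(seed)
    iv = np.empty(reps)
    ols = np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal(T)
        e = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(T)  # corr(x, e) = rho
        Z = rng.standard_normal((T, L))      # instruments independent of (x, e)
        y = beta * x + e
        coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
        px = Z @ coef                        # P_Z x, the first-stage fitted values
        iv[r] = (px @ y) / (px @ x)          # (x'P_Z x)^{-1} x'P_Z y
        ols[r] = (x @ y) / (x @ x)
    return iv, ols

iv, ols = simulate_iv_and_ols()
print(iv.mean(), ols.mean())   # both near beta + rho = 1.5, not beta = 1.0
```

With $L = K = 1$ (exact identification) the mean of the limit distribution does not exist, and the replication average of the IV estimates would not settle down.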

2.4 Conclusion

In this chapter, we answered some questions about the IV estimator using irrelevant instruments in linear models. We saw that the IV estimator is not consistent but converges to a nondegenerate distribution which is similar to a multivariate $t$ distribution. When the number of instruments (excluding the exogenous regressors) is strictly greater than the number of endogenous regressors, the mean of the asymptotic distribution exists and is equal to the probability limit of the OLS estimator.

2.A Proof of Theorem 2.4

First, observe that the rows of the $L\times(K+1)$ matrix $(\xi, \eta)$ are a random sample from $N(0, J)$, where $J = \begin{pmatrix} I & \rho \\ \rho' & 1\end{pmatrix}$. Thus, $(\xi,\eta)'(\xi,\eta)$ has a $(K+1)$-dimensional central Wishart distribution with $L$ degrees of freedom and covariance matrix $J$. When $L \ge K + 1$, its density at the point $\xi'\xi = B_1$, $\xi'\eta = b_2$, and $\eta'\eta = b_3$ is

$$g(B_1, b_2, b_3) = 2^{-L(K+1)/2}\,\Gamma_{K+1}\big(\tfrac{L}{2}\big)^{-1}(1-\rho'\rho)^{-L/2}\times|B|^{(L-K-2)/2}\exp\{-\tfrac12\operatorname{tr}J^{-1}B\} \tag{2.9}$$

where $B = \begin{pmatrix} B_1 & b_2 \\ b_2' & b_3\end{pmatrix}$ and $\Gamma_n$ is the multivariate gamma function defined as

$$\Gamma_n(a) = \pi^{n(n-1)/4}\prod_{i=1}^{n}\Gamma\big(a - \tfrac{i-1}{2}\big). \tag{2.10}$$

(See Johnson and Kotz (1972, p. 162).)

Following Phillips (1980), consider the one-to-one mapping $\psi$ on the set of $(K+1)$-dimensional, real, symmetric, positive definite matrices defined as

$$\psi: \begin{pmatrix} B_1 & b_2 \\ b_2' & b_3\end{pmatrix} \mapsto \begin{pmatrix} B_1 & B_1^{-1}b_2 \\ b_2'B_1^{-1} & b_3 - b_2'B_1^{-1}b_2\end{pmatrix}. \tag{2.11}$$

Then the inverse $\psi^{-1}$ is

$$\psi^{-1}: \begin{pmatrix} A_1 & d \\ d' & a_3\end{pmatrix} \mapsto \begin{pmatrix} A_1 & A_1d \\ d'A_1 & a_3 + d'A_1d\end{pmatrix}, \tag{2.12}$$

whose Jacobian turns out to be $|A_1|$. Therefore, by the change-of-variables technique, the density of the symmetric random matrix whose upper-left $K\times K$ diagonal block is $\xi'\xi$, whose lower-right $1\times 1$ diagonal block is $\eta'\eta - \eta'\xi(\xi'\xi)^{-1}\xi'\eta$, and whose upper-right $K\times 1$ off-diagonal block is $\delta_{asy} = (\xi'\xi)^{-1}\xi'\eta$, evaluated at the point such that

$$\xi'\xi = A_1,\quad (\xi'\xi)^{-1}\xi'\eta = d,\quad\text{and}\quad \eta'\eta - \eta'\xi(\xi'\xi)^{-1}\xi'\eta = a_3, \tag{2.13}$$

where $A_1$ is symmetric, positive definite and $a_3$ is positive, becomes

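$$\begin{aligned} h(A_1, d, a_3) &= g(A_1, A_1d, a_3 + d'A_1d)\cdot|A_1| \\ &= 2^{-L(K+1)/2}\,\Gamma_{K+1}\big(\tfrac{L}{2}\big)^{-1}(1-\rho'\rho)^{-L/2}\cdot H_1(A_1)\cdot H_3(a_3) \end{aligned} \tag{2.14}$$

where

$$H_1(S) = |S|^{(L-K)/2}\exp\{-\tfrac12\operatorname{tr}S[I + (1-\rho'\rho)^{-1}(d-\rho)(d-\rho)']\} \tag{2.15}$$

and

$$H_3(x) = x^{(L-K)/2-1}\exp\{-\tfrac12 x(1-\rho'\rho)^{-1}\}. \tag{2.16}$$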

The density of $\delta_{asy}$ at $d$ is obtained by integrating out $A_1$ (symmetric and positive definite) and $a_3$ (positive) from (2.14). From the definition of the $\Gamma(\cdot)$ function, the integral of $H_3(a_3)$ in (2.14) over all positive $a_3$ is equal to

$$\int_0^\infty H_3(x)\,dx = 2^{(L-K)/2}(1-\rho'\rho)^{(L-K)/2}\,\Gamma\big(\tfrac{L-K}{2}\big). \tag{2.17}$$

The integral of the matrix-argument function $H_1(S)$ over all symmetric, positive definite matrices is obtained from the results in James (1964). Equations (25), (26), and (28) of James (1964, pp. 479–480) imply that for any real symmetric positive definite $K\times K$ matrix $D$,

$$\int_{S>0}|S|^{a-\frac12(K+1)}\exp\{-\operatorname{tr}SD\}\,dS = \Gamma_K(a)\,|D|^{-a} \tag{2.18}$$

where the integral is taken over all symmetric, positive definite $K\times K$ matrices. Thus, we have the evaluation

$$\int_{S>0}H_1(S)\,dS = 2^{K(L+1)/2}\,\Gamma_K\big(\tfrac{L+1}{2}\big)\times\big|I + (1-\rho'\rho)^{-1}(d-\rho)(d-\rho)'\big|^{-(L+1)/2}. \tag{2.19}$$

The desired density (2.5) is obtained by combining Equations (2.14), (2.17), and (2.19).

Chapter 3

Estimation of a Panel Data Model with

Parametric Temporal Variation in
Individual Effects

3.1 Introduction

In this chapter we consider the model:

$$y_{it} = X_{it}'\beta + Z_i'\gamma + \lambda_t(\theta)\alpha_i + \varepsilon_{it},\quad i = 1, \ldots, N,\ t = 1, \ldots, T. \tag{3.1}$$

We treat $T$ as fixed, so that "asymptotic" means as $N\to\infty$. The distinctive feature of the model is the interaction between the time-varying parametric function $\lambda_t(\theta)$ and the individual effect $\alpha_i$. We consider the case that the $\alpha_i$ are "fixed effects," as will be discussed in more detail below. In this case estimation may be nontrivial due to the "incidental parameters problem" that the number of $\alpha$'s grows with sample size; see, for example, Chamberlain (1980).

Models of this form have been proposed and used in the literature on frontier production functions (measurement of the efficiency of production). For example, Kumbhakar (1990) proposed the case that $\lambda_t(\theta) = [1 + \exp(\theta_1t + \theta_2t^2)]^{-1}$, and Battese and Coelli (1992) proposed the case that $\lambda_t(\theta) = \exp[-\theta(t - T)]$. Both of these papers considered random effects models in which $\alpha_i$ is independent of $X$ and $Z$. In fact, both of these papers proposed specific (truncated normal) distributions for the $\alpha_i$, with estimation by maximum likelihood. The aim of the present chapter is to provide a fixed-effects treatment of models of this type.
There is also a literature on the case that the $\lambda_t$ themselves are treated as parameters. That is, the model becomes:

$$y_{it} = X_{it}'\beta + Z_i'\gamma + \lambda_t\alpha_i + \varepsilon_{it},\quad i = 1, \ldots, N,\ t = 1, \ldots, T. \tag{3.2}$$

This corresponds to using a set of dummy variables for time rather than a parametric function $\lambda_t(\theta)$, and now $\lambda_t\alpha_i$ is just the product of fixed time and individual effects. This model
has been considered by Kiefer (1980), Holtz-Eakin, Newey and Rosen (1988), Lee (1991),
Chamberlain (1992), Lee and Schmidt (1993) and Ahn, Lee and Schmidt (2001), among
others. Lee (1991) and Lee and Schmidt (1993) have applied this model to the frontier pro-
duction function problem, in order to avoid having to assume a speciﬁc parametric function
At(6). Another motivation for the model is that a ﬁxed-effects version allows one to control
for unobservables (e.g. macro events) that are the same for each individual, but to which
different individuals may react differently.

Ahn, Lee and Schmidt (2001) establish some interesting results for the estimation of
model (3.2). A generalized method of moments (GMM) estimator of the type considered
by Holtz-Eakin, Newey and Rosen (1988) is consistent given exogeneity assumptions on
the regressors $X$ and $Z$. Least squares applied to (3.2), treating the $\alpha_i$ as fixed parameters, is consistent provided that the regressors are strictly exogenous and that the errors $\varepsilon_{it}$ are white noise. The requirement of white noise errors for consistency of least squares is unusual, and is a reflection of the incidental parameters problem. Furthermore, if the errors are white noise, then a GMM estimator that incorporates the white noise assumption dominates least squares, in the sense of being asymptotically more efficient. This is also a somewhat unusual result, since in the usual linear model with normal errors, the moment conditions implied by the white noise assumption would not add to the efficiency of estimation.

The results of Ahn, Lee and Schmidt apply only to the case that the $\lambda_t$ are unrestricted, and therefore do not apply to the model (3.1). However, in this chapter we show that essentially the same results do hold for the model (3.1). This enables us to use a parametric function $\lambda_t(\theta)$, and to test the validity of this assumption, while maintaining only weak assumptions on the $\alpha_i$. This may be very useful, especially in the frontier production function setting. Applications using unrestricted $\lambda_t$ have yielded temporal patterns of efficiency that seem unreasonably variable and in need of smoothing, which a parametric function can accomplish.

The plan of the chapter is as follows. Section 3.2 restates the model and lists our assumptions. Section 3.3 considers GMM estimation under basic exogeneity assumptions, while Section 3.4 considers GMM when we add the conditions implied by white noise errors. Section 3.5 considers least squares estimation and the sense in which it is dominated by GMM. Finally, Section 3.6 contains some concluding remarks.

3.2 The model and assumptions

The model is given in equation (3.1) above. We can rewrite it in matrix form, as follows. Let $y_i = (y_{i1}, \ldots, y_{iT})'$, $X_i = (X_{i1}, \ldots, X_{iT})'$, and $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$. Thus $y_i$ is $T\times 1$, $X_i$ is $T\times K$, $\varepsilon_i$ is $T\times 1$, $\beta$ is $K\times 1$, $\gamma$ is $g\times 1$, and $\alpha_i$ is a scalar. (In this chapter, all vectors are column vectors, and the data matrices are "vertically tall.") Define a function $\lambda: \Theta\to\mathbb{R}^T$, where $\Theta$ is a compact subset of $\mathbb{R}^p$, such that $\lambda(\theta) = (\lambda_1(\theta), \ldots, \lambda_T(\theta))'$. Note that $T$ is fixed. In matrix form, our model is:

$$y_i = X_i\beta + 1_TZ_i'\gamma + \lambda(\theta)\alpha_i + \varepsilon_i,\quad i = 1, \ldots, N. \tag{3.3}$$

$\lambda(\theta)$ must be normalized in some way, such as $\lambda(\theta)'\lambda(\theta) \equiv 1$ or $\lambda_1(\theta) \equiv 1$, to rule out trivial failures of identification arising from $\lambda(\theta) = 0$ or from scalar multiples of $\lambda(\theta)$. Here we choose the normalization $\lambda_1(\theta) \equiv 1$.
Let $W_i = (X_{i1}', \ldots, X_{iT}', Z_i')'$. We make the following "orthogonality" and "covariance" assumptions.
Assumption 3.1 (Orthogonality). $E(W_i', \alpha_i)'\varepsilon_i' = 0$.
Assumption 3.2 (Covariance). $E\varepsilon_i\varepsilon_i' = \sigma_\varepsilon^2I_T$.

Assumption 3.1 says that $\varepsilon_{it}$ is uncorrelated with $\alpha_i$, $Z_i$, and $X_{i1}, \ldots, X_{iT}$, and therefore contains an assumption of strict exogeneity of the regressors. Note that it does not restrict the correlation between $\alpha_i$ and $[Z_i, X_{i1}, \ldots, X_{iT}]$, so that we are in the fixed-effects framework. Assumption 3.2 asserts that the errors are white noise.

We also assume the following regularity conditions.
Assumption 3.3 (Regularity).
(i) $(W_i', \alpha_i, \varepsilon_i')'$ is independently and identically distributed over $i$;
(ii) $\varepsilon_i$ has finite fourth moment, and $E\varepsilon_i = 0$;
(iii) $(W_i', \alpha_i)'$ has a finite nonsingular second moment matrix;
(iv) $E\,W_i(Z_i', \alpha_i)$ is of full column rank;
(v) $\lambda(\theta)$ is twice continuously differentiable in $\theta$.

The first four of these conditions correspond to assumptions (BA.1)–(BA.4) of Ahn, Lee and Schmidt (2001), who give some explanation. Condition (v) is new, and self-explanatory.

3.3 GMM under the Orthogonality Assumption

Let $u_{it} = u_{it}(\beta,\gamma) = y_{it} - X_{it}'\beta - Z_i'\gamma$, and $u_i = u_i(\beta,\gamma) = (u_{i1}, \ldots, u_{iT})'$. Since $u_{it} = \lambda_t(\theta)\alpha_i + \varepsilon_{it}$, it follows that $u_{it} - \lambda_t(\theta)u_{i1} = \varepsilon_{it} - \lambda_t(\theta)\varepsilon_{i1}$, which does not depend on $\alpha_i$. This is a sort of generalized within transformation to remove the individual effects. The Orthogonality Assumption (Assumption 3.1) then implies the following moment conditions:

$$E[W_i(u_{it}(\beta,\gamma) - \lambda_t(\theta)u_{i1}(\beta,\gamma))] = 0,\quad t = 2, \ldots, T. \tag{3.4}$$

These moment conditions can be written in matrix form, as follows. Define $G(\theta) = [-\lambda_*(\theta), I_{T-1}]'$, where $\lambda_* = (\lambda_2, \ldots, \lambda_T)'$. The generalized within transformation corresponds to multiplication by $G(\theta)'$, and the moment conditions (3.4) can equivalently be written as follows:

$$Eb_{1i}(\beta,\gamma,\theta) = E[G(\theta)'u_i(\beta,\gamma)\otimes W_i] = 0. \tag{3.5}$$

(This corresponds to equation (7) of Ahn, Lee and Schmidt (2001), but looks slightly different because our $W_i$ is a column vector whereas theirs is a row vector.) This is a set of $(T-1)(TK+g)$ moment conditions.
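The generalized within transformation and the moment vector $b_{1i}$ in (3.5) are straightforward to compute. The Python/NumPy sketch below uses a hypothetical specification $\lambda_t(\theta) = \exp[\theta(t-1)]$, chosen only because it satisfies the normalization $\lambda_1(\theta) \equiv 1$, and simulated data in which $\alpha_i$ is correlated with the regressors (fixed effects). It evaluates $N^{-1}\sum_i G(\theta)'u_i\otimes W_i$ at the true $\theta$ and at a wrong one; all data-generating values are arbitrary.

```python
import numpy as np

def lam(theta, T):
    """Hypothetical lambda(theta) = (exp(theta*(t-1)))_{t=1..T}; note lambda_1 = 1."""
    return np.exp(theta * np.arange(T))

def G_matrix(theta, T):
    """G(theta) = [-lambda_*(theta), I_{T-1}]', a T x (T-1) matrix with G'lambda = 0."""
    return np.vstack([-lam(theta, T)[1:], np.eye(T - 1)])

def b1_bar(beta, gamma, theta, y, X, Z):
    """Sample mean of b_1i = G(theta)'u_i kron W_i; y is (N,T), X is (N,T,K), Z is (N,g)."""
    N, T, K = X.shape
    u = y - X @ beta - (Z @ gamma)[:, None]          # u_it = y_it - X_it'beta - Z_i'gamma
    Gu = u @ G_matrix(theta, T)                      # row i holds (G'u_i)'
    W = np.hstack([X.reshape(N, T * K), Z])          # W_i = (X_i1', ..., X_iT', Z_i')'
    b1 = np.einsum("ia,ib->iab", Gu, W).reshape(N, -1)  # row i is (G'u_i kron W_i)'
    return b1.mean(axis=0)

rng = np.random.default_rng(1)
N, T, K, g = 20_000, 4, 1, 1
beta, gamma, theta = np.array([1.0]), np.array([0.5]), 0.2
X = rng.standard_normal((N, T, K))
Z = rng.standard_normal((N, g))
alpha = 0.5 * X.mean(axis=(1, 2)) + rng.standard_normal(N)   # fixed effect, correlated with X
eps = rng.standard_normal((N, T))
y = X @ beta + (Z @ gamma)[:, None] + lam(theta, T) * alpha[:, None] + eps

print(b1_bar(beta, gamma, theta, y, X, Z).size)            # (T-1)(TK+g) = 15
print(np.abs(b1_bar(beta, gamma, theta, y, X, Z)).max())   # small at the true theta
print(np.abs(b1_bar(beta, gamma, 0.6, y, X, Z)).max())     # clearly larger at a wrong theta
```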

Some further analysis is needed to establish that (3.5) contains all of the moment conditions implied by the Orthogonality Assumption. Let $\Sigma_{WW} = EW_iW_i'$, $\Sigma_{W\alpha} = EW_i\alpha_i$, and $\sigma_\alpha^2 = E\alpha_i^2$. Given the model (3.3), the Orthogonality Assumption holds if and only if the following moment conditions hold:

$$E[u_i(\beta,\gamma)\otimes W_i - \lambda(\theta)\otimes\Sigma_{W\alpha}] = 0. \tag{3.6}$$

We could use these moment conditions as the basis for GMM estimation. Alternatively, we can remove the parameter $\Sigma_{W\alpha}$ by applying a nonsingular linear transformation to (3.6) in such a way that the transformed set of moment conditions is separated into two subsets, where the first subset does not contain $\Sigma_{W\alpha}$ and the second subset is exactly identified for $\Sigma_{W\alpha}$, given $(\beta,\gamma,\theta)$. The following transformation accomplishes this:

$$\begin{bmatrix} G'\otimes I_d \\ \lambda'\otimes I_d\end{bmatrix}E[u_i\otimes W_i - \lambda\otimes\Sigma_{W\alpha}] = 0, \tag{3.7}$$

where $d \equiv TK + g$ for notational simplicity; similarly, $G$, $\lambda$ and $u_i$ are shortened expressions for $G(\theta)$, $\lambda(\theta)$ and $u_i(\beta,\gamma)$. This is a nonsingular transformation, since $(G, \lambda)$ is nonsingular, and therefore GMM based on (3.7) is asymptotically equivalent to GMM based on (3.6). Now split (3.7) into its two parts:

$$E(G'u_i\otimes W_i) = 0 \tag{3.8}$$

$$E(\lambda'u_i)W_i - (\lambda'\lambda)\Sigma_{W\alpha} = 0. \tag{3.9}$$

Here (3.9) is exactly identified for $\Sigma_{W\alpha}$, given $\beta$, $\gamma$ and $\theta$, in the sense that the number of moment conditions in (3.9) is the same as the dimension of $\Sigma_{W\alpha}$. Also $\Sigma_{W\alpha}$ does not appear in (3.8). It follows (e.g., Ahn and Schmidt (1995), Theorem 1) that the GMM estimates of $\beta$, $\gamma$ and $\theta$ from (3.8) alone are the same as the GMM estimates of $\beta$, $\gamma$ and $\theta$ if we use both (3.8) and (3.9) and estimate the full set of parameters $(\beta, \gamma, \theta, \Sigma_{W\alpha})$. But (3.8) is the same as (3.5), which establishes that (3.5) contains all the useful information about $\beta$, $\gamma$ and $\theta$ implied by the Orthogonality Assumption.

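Let $\bar b_1(\beta,\gamma,\theta) = N^{-1}\sum_{i=1}^N b_{1i}(\beta,\gamma,\theta)$. Then the optimal GMM estimator $\hat\beta$, $\hat\gamma$, and $\hat\theta$ based on the Orthogonality Assumption solves the problem

$$\min_{\beta,\gamma,\theta}\ N\,\bar b_1(\beta,\gamma,\theta)'V_{11}^{-1}\bar b_1(\beta,\gamma,\theta) \tag{3.10}$$

where $V_{11} = Eb_{1i}b_{1i}'$ evaluated at the true parameters. As usual, $V_{11}$ can be replaced by any consistent estimate. A standard estimate would be

$$\hat V_{11} = \frac1N\sum_{i=1}^N b_{1i}(\tilde\beta,\tilde\gamma,\tilde\theta)\,b_{1i}(\tilde\beta,\tilde\gamma,\tilde\theta)', \tag{3.11}$$

where $(\tilde\beta,\tilde\gamma,\tilde\theta)$ is an initial consistent estimate of $(\beta,\gamma,\theta)$, such as GMM using an identity weighting matrix. Under certain regularity conditions (Hansen (1982), Assumption 3) the resulting GMM estimator is $\sqrt N$-consistent and asymptotically normal.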
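To express the asymptotic variance of the GMM estimator analytically, we need a little more notation. Let $S_X$ be the $T(TK+g)\times K$ selection matrix such that $X_i = (I_T\otimes W_i)'S_X$, and let $S_Z$ be the $T(TK+g)\times g$ selection matrix such that $1_TZ_i' = (I_T\otimes W_i)'S_Z$. $S_X$ and $S_Z$ have the following forms:

$$S_X = (I_K\ O\ \cdots\ O\ O_{K\times g}\ \mid\ O\ I_K\ \cdots\ O\ O_{K\times g}\ \mid\ \cdots\ \mid\ O\ \cdots\ O\ I_K\ O_{K\times g})' \tag{3.12}$$

$$S_Z = (O_{g\times K}\ \cdots\ O_{g\times K}\ I_g\ \mid\ \cdots\ \mid\ O_{g\times K}\ \cdots\ O_{g\times K}\ I_g)' = 1_T\otimes(O_{g\times TK},\ I_g)' \tag{3.13}$$

where $O$'s without dimension subscript stand for $O_{K\times K}$. Define $\Lambda_* = \partial\lambda_*(\theta_0)/\partial\theta'$.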
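The selection matrices in (3.12) and (3.13) can be built with Kronecker products and checked against their defining identities $X_i = (I_T\otimes W_i)'S_X$ and $1_TZ_i' = (I_T\otimes W_i)'S_Z$. This is a Python/NumPy sketch; the dimensions and the random $X_i$, $Z_i$ are arbitrary test values.

```python
import numpy as np

def selection_matrices(T, K, g):
    """S_X (T(TK+g) x K) and S_Z (T(TK+g) x g) as in (3.12)-(3.13)."""
    blocks = []
    for t in range(T):
        e_t = np.zeros((T, 1))
        e_t[t] = 1.0
        # block t of S_X: I_K in the t-th K-row slot of W_i, zeros for the Z_i part
        blocks.append(np.vstack([np.kron(e_t, np.eye(K)), np.zeros((g, K))]))
    S_X = np.vstack(blocks)
    # S_Z = 1_T kron (O_{g x TK}, I_g)'
    S_Z = np.kron(np.ones((T, 1)), np.vstack([np.zeros((T * K, g)), np.eye(g)]))
    return S_X, S_Z

T, K, g = 3, 2, 2
rng = np.random.default_rng(0)
X_i = rng.standard_normal((T, K))
Z_i = rng.standard_normal(g)
W_i = np.concatenate([X_i.ravel(), Z_i])       # (X_i1', ..., X_iT', Z_i')'
S_X, S_Z = selection_matrices(T, K, g)
M = np.kron(np.eye(T), W_i[:, None]).T         # (I_T kron W_i)', a T x T(TK+g) matrix
print(np.allclose(M @ S_X, X_i), np.allclose(M @ S_Z, np.outer(np.ones(T), Z_i)))
```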

The variance of the asymptotic distribution of the GMM estimates of $\beta$, $\gamma$ and $\theta$ equals $(B_1'V_{11}^{-1}B_1)^{-1}$, where $V_{11} = Eb_{1i}b_{1i}'$ as above and

$$B_1 = [(G\otimes\Sigma_{WW})'S_X,\ (G\otimes\Sigma_{WW})'S_Z,\ \Lambda_*\otimes\Sigma_{W\alpha}]. \tag{3.14}$$

This result can be obtained either by direct calculation or by applying the chain rule to $B_1$ calculated in Ahn, Lee and Schmidt (2001, p. 251). This asymptotic variance form is obtained from the Orthogonality Assumption only and does not need any further assumption.

A practical problem with this GMM procedure is that it is based on a rather large set of moment conditions. Considerable simplifications are possible if we make the following assumption of no conditional heteroskedasticity (NCH) of $\varepsilon_i$:

$$E(\varepsilon_i\varepsilon_i'\mid W_i) = \Sigma_{\varepsilon\varepsilon}. \tag{NCH}$$

Under the NCH assumption,

$$V_{11} = E[G(\theta_0)'\varepsilon_i\varepsilon_i'G(\theta_0)\otimes W_iW_i'] = G(\theta_0)'\Sigma_{\varepsilon\varepsilon}G(\theta_0)\otimes\Sigma_{WW}. \tag{3.15}$$

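$\Sigma_{WW}$ can be consistently estimated by $\hat\Sigma_{WW} = N^{-1}\sum_{i=1}^N W_iW_i'$. Also, for any sequence $(\beta_N, \gamma_N)$ that converges in probability to $(\beta_0, \gamma_0)$, we have

$$\frac1N\sum_{i=1}^N u_i(\beta_N,\gamma_N)\,u_i(\beta_N,\gamma_N)' \xrightarrow{p} \Sigma_{\varepsilon\varepsilon} + \sigma_\alpha^2\lambda(\theta_0)\lambda(\theta_0)'. \tag{3.16}$$

Since $G(\theta)'\lambda(\theta) = 0$, for any initial consistent estimate $(\tilde\beta, \tilde\gamma, \tilde\theta)$,

$$G(\tilde\theta)'\Big[N^{-1}\sum_{i=1}^N u_i(\tilde\beta,\tilde\gamma)\,u_i(\tilde\beta,\tilde\gamma)'\Big]G(\tilde\theta) \tag{3.17}$$

will consistently estimate $G(\theta_0)'\Sigma_{\varepsilon\varepsilon}G(\theta_0)$. Thus it is easy to construct a consistent estimate of $V_{11}$ as given in (3.15).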
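The key step in (3.16) and (3.17) is that $G(\theta)'\lambda(\theta) = 0$, so the individual effects drop out of $G'u_i$ and the $\sigma_\alpha^2\lambda\lambda'$ term in (3.16) is annihilated. The Python/NumPy sketch below illustrates this with simulated residuals evaluated at the true $(\beta, \gamma)$; the functional form $\lambda_t(\theta) = \exp[\theta(t-1)]$ is a hypothetical choice satisfying $\lambda_1 \equiv 1$, and $\Sigma_{\varepsilon\varepsilon} = I_T$, $\sigma_\alpha^2 = 4$ are arbitrary.

```python
import numpy as np

def lam(theta, T):
    """Hypothetical lambda_t(theta) = exp(theta*(t-1)), so lambda_1 = 1."""
    return np.exp(theta * np.arange(T))

def G_matrix(theta, T):
    """G(theta) = [-lambda_*(theta), I_{T-1}]'."""
    return np.vstack([-lam(theta, T)[1:], np.eye(T - 1)])

rng = np.random.default_rng(2)
N, T, theta = 20_000, 4, 0.2
alpha = 2.0 * rng.standard_normal(N)          # sigma_alpha^2 = 4
eps = rng.standard_normal((N, T))             # Sigma_ee = I_T
u = lam(theta, T) * alpha[:, None] + eps      # u_i = lambda*alpha_i + eps_i at the true (beta, gamma)

G = G_matrix(theta, T)
S_uu = u.T @ u / N                            # N^{-1} sum u_i u_i', cf. (3.16)
estimate = G.T @ S_uu @ G                     # the estimator in (3.17)
print(np.allclose(u @ G, eps @ G))            # G'u_i = G'eps_i: alpha_i drops out
print(np.round(estimate - G.T @ G, 2))        # near zero, since G'Sigma_ee G = G'G here
```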
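In order to consistently estimate the asymptotic variance under NCH, we need to estimate $\Sigma_{WW}$, $\Sigma_{W\alpha}$, and $G'\Sigma_{\varepsilon\varepsilon}G$. Estimation of $\Sigma_{WW}$ and $G'\Sigma_{\varepsilon\varepsilon}G$ was discussed above. We can obtain an estimate of $\Sigma_{W\alpha}$ from the GMM problem (3.7). A direct algebraic calculation gives us that

$$\hat\Sigma_{W\alpha} = \Big[\frac1N\sum_{i=1}^N W_i\hat\lambda'\hat u_i - \frac1N\sum_{i=1}^N W_i\,\widehat{\lambda'\Sigma_{\varepsilon\varepsilon}G}\,\big(\widehat{G'\Sigma_{\varepsilon\varepsilon}G}\big)^{-1}\hat G'\hat u_i\Big]\Big/(\hat\lambda'\hat\lambda) \tag{3.18}$$

where $\hat u_i = u_i(\hat\beta,\hat\gamma)$, $\hat\lambda = \lambda(\hat\theta)$, $\hat G = G(\hat\theta)$, $\widehat{G'\Sigma_{\varepsilon\varepsilon}G}$ is a consistent estimate of $G'\Sigma_{\varepsilon\varepsilon}G$ as in (3.17), and $\widehat{\lambda'\Sigma_{\varepsilon\varepsilon}G}$ is a consistent estimate of $\lambda'\Sigma_{\varepsilon\varepsilon}G$, one possibility of which is $N^{-1}\sum_{i=1}^N\hat\lambda'\hat u_i\hat u_i'\hat G$.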

Finally, under the NCH assumption, the set of moment conditions (3.5) can be converted into an exactly identified set of moment conditions that yield an asymptotically equivalent GMM estimate. Specifically, we can replace the moment conditions $Eb_{1i} = 0$ by the moment conditions $EB_1'V_{11}^{-1}b_{1i} = 0$. Routine calculation using the forms of $B_1$, $V_{11}$ and $b_{1i}$ yields the explicit expression:

$$EX_i'G(G'\Sigma_{\varepsilon\varepsilon}G)^{-1}G'u_i = 0 \tag{3.19a}$$

$$EZ_i1_T'G(G'\Sigma_{\varepsilon\varepsilon}G)^{-1}G'u_i = 0 \tag{3.19b}$$

$$E\,\Sigma_{W\alpha}'\Sigma_{WW}^{-1}W_i\cdot\Lambda_*'(G'\Sigma_{\varepsilon\varepsilon}G)^{-1}G'u_i = 0. \tag{3.19c}$$

These three sets of moment conditions respectively correspond to (21a), (21b), and (21c) of Ahn, Lee and Schmidt (2001, p. 229). We can replace the nuisance parameters $\Sigma_{\varepsilon\varepsilon}$, $\Sigma_{W\alpha}$, and $\Sigma_{WW}$ by consistent estimates, as given above (based on some initial consistent GMM estimates of $\beta$, $\gamma$ and $\theta$). The point of this simplification is that we have drastically reduced the set of moment conditions: there are $(T-1)(TK+g)$ moment conditions in $b_{1i}$ (equation (3.5)) but only $K + g + p$ moment conditions in (3.19).

We note that this is a stronger result than the corresponding result (Proposition 1, p. 229) of Ahn, Lee and Schmidt (2001). In order to reach essentially the same conclusion on the reduction of the number of moment conditions, they impose the assumption that $\varepsilon_i$ is independent of $(W_i, \alpha_i)$, a much stronger assumption than our NCH assumption.

3.4 GMM under the Orthogonality and Covariance Assumptions

In this section we continue to maintain the Orthogonality Assumption (Assumption 3.1), but now we add the Covariance Assumption (Assumption 3.2), which asserts that $E\varepsilon_i\varepsilon_i' = \sigma_\varepsilon^2I_T$.

Clearly the Covariance Assumption holds if and only if

$$E(u_iu_i') = \sigma_\alpha^2\lambda\lambda' + \sigma_\varepsilon^2I_T. \tag{3.20}$$

Condition (3.20) contains $T(T+1)/2$ distinct moment conditions. It also contains the two nuisance parameters $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$, and so it should imply $T(T+1)/2 - 2$ moment conditions for the estimation of $\beta$, $\gamma$ and $\theta$. These are in addition to the moment conditions (3.5) implied by the Orthogonality Assumption.
To write these moment conditions explicitly, we need to define some notation. Let $H = \mathrm{diag}(H_2, H_3, \ldots, H_T)$, with $H_t$ equal to the $T\times(T-t)$ matrix of the last $T-t$ columns (the $(t+1)$th through $T$th columns) of $I_T$ for $t < T$, and with $H_T$ equal to the $T\times(T-2)$ matrix of the second through $(T-1)$th columns of $I_T$.¹ Then we can write the distinct moment conditions implied by the Orthogonality and Covariance Assumptions as follows:

$$Eb_{1i} = E(G'u_i\otimes W_i) = 0 \tag{3.21a}$$

$$Eb_{2i} = EH'(G'u_i\otimes u_i) = 0 \tag{3.21b}$$

$$Eb_{3i} = E(G'u_i\otimes\lambda'u_i) = 0. \tag{3.21c}$$

(In these expressions, $G$ is short for $G(\theta)$, $\lambda$ is short for $\lambda(\theta)$, and $u_i$ is short for $u_i(\beta,\gamma)$.)

The moment conditions $b_{1i}$ in (3.21a) are exactly the same as those in (3.5) of the previous section, and follow from the Orthogonality Assumption.

The moment conditions $b_{2i}$ in (3.21b) correspond to those in equation (12) of Ahn, Lee and Schmidt (2001). Note that it is not the case that $E(G'u_i\otimes u_i) = 0$. Rather, looking at a typical element of this product, we have $E(u_{it} - \lambda_tu_{i1})u_{is}$, which equals zero for $s \ne t$ and $s \ne 1$. The selection matrix $H'$ picks out the logically distinct products of expectation zero, the number of which equals $T(T-1)/2 - 1$. The selection matrix $H$ plays the same role as the definition of the matrices $U_s$ plays in Ahn, Lee and Schmidt (2001). We note that the moment conditions $b_{2i}$ follow from the non-autocorrelation of the $\varepsilon_{it}$; homoskedasticity would not be needed.
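The selection matrix $H$ and the moments $b_{2i}$ can be checked numerically. The Python/NumPy sketch below builds $H$ from its definition, confirms that it has $T(T-1)/2 - 1$ columns, and evaluates the sample mean of $H'(G'u_i\otimes u_i)$ on simulated data. It assumes the hypothetical form $\lambda_t(\theta) = \exp[\theta(t-1)]$, white-noise $\varepsilon_{it}$ with unit variance and $\alpha_i$ uncorrelated with $\varepsilon_i$; these are illustrative choices only. It also shows that an element not selected by $H'$, namely $E(u_{i2} - \lambda_2u_{i1})u_{i2}$, is not zero (it equals $\sigma_\varepsilon^2 = 1$ here).

```python
import numpy as np

def lam(theta, T):
    """Hypothetical lambda_t(theta) = exp(theta*(t-1)), so lambda_1 = 1."""
    return np.exp(theta * np.arange(T))

def G_matrix(theta, T):
    """G(theta) = [-lambda_*(theta), I_{T-1}]'."""
    return np.vstack([-lam(theta, T)[1:], np.eye(T - 1)])

def H_matrix(T):
    """H = diag(H_2, ..., H_T): H_t = last T-t columns of I_T (t < T),
    H_T = columns 2, ..., T-1 of I_T (1-based column numbers)."""
    I = np.eye(T)
    blocks = [I[:, t:] for t in range(2, T)]   # t = 2..T-1
    blocks.append(I[:, 1:T - 1])               # t = T
    H = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:                           # place the blocks along the diagonal
        H[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return H

rng = np.random.default_rng(3)
N, T, theta = 20_000, 4, 0.2
u = lam(theta, T) * rng.standard_normal(N)[:, None] + rng.standard_normal((N, T))
Gu = u @ G_matrix(theta, T)                                # rows (G'u_i)'
kron_rows = np.einsum("ia,ib->iab", Gu, u).reshape(N, -1)  # rows (G'u_i kron u_i)'
b2 = kron_rows @ H_matrix(T)                               # rows (H'(G'u_i kron u_i))'
print(H_matrix(T).shape[1], T * (T - 1) // 2 - 1)          # both 5 for T = 4
print(np.abs(b2.mean(axis=0)).max())                       # small: E b_2i = 0
print(kron_rows.mean(axis=0)[1])                           # near 1: not a valid moment
```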

The $(T-1)$ moment conditions in $b_{3i}$ in (3.21c) correspond to those in equation (13) of Ahn, Lee and Schmidt (2001). They assert that, for $t = 2, \ldots, T$, $E\big[(u_{it} - \lambda_tu_{i1})\big(\sum_{s=1}^T\lambda_su_{is}\big)\big] = 0$, and their validity depends on both the non-autocorrelation and the homoskedasticity of the $\varepsilon_{it}$.

¹For any matrix $B$ with $T$ rows, $H_t'B$ selects the last $T-t$ rows of $B$ for $t < T$, and $H_T'B$ selects the second through $(T-1)$th rows of $B$. For any matrix $B$ with $T$ columns, $BH_t$ selects the last $T-t$ columns of $B$ for $t < T$, and $BH_T$ selects the second through $(T-1)$th columns of $B$.

Some further analysis may be useful to establish that (3.21b) and (3.21c) represent all of the useful implications of the Covariance Assumption. We begin with the implication (3.20) of the Covariance Assumption, which we rewrite as

$$E(u_i\otimes u_i) = \sigma_\alpha^2(\lambda\otimes\lambda) + \sigma_\varepsilon^2\,\mathrm{vec}\,I_T. \tag{3.22}$$

Now, let $S$ be the $T^2\times T(T+1)/2$ selection matrix such that, for a $T\times 1$ vector $u$, $\mathrm{vech}(uu') = S'(u\otimes u)$, where "vech" is the vector of distinct elements. Then

$$ES'(u_i\otimes u_i) = S'[\sigma_\alpha^2(\lambda\otimes\lambda) + \sigma_\varepsilon^2\,\mathrm{vec}\,I_T] \tag{3.23}$$

contains the distinct moment conditions.
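A selection matrix $S$ with $\mathrm{vech}(uu') = S'(u\otimes u)$ can be built directly from index arithmetic, since element $iT + j$ (0-based) of $u\otimes u$ is $u_iu_j$. The Python/NumPy sketch below is one such construction. Because $u_iu_j$ appears twice in $u\otimes u$, the choice of which copy to select is an assumption of this sketch (it takes the copy from the block of the smaller index); either choice gives the same $S'(u\otimes u)$.

```python
import numpy as np

def vech(A):
    """Stack the lower-triangular part of A column by column."""
    T = A.shape[0]
    return np.array([A[i, j] for j in range(T) for i in range(j, T)])

def vech_selection(T):
    """T^2 x T(T+1)/2 selection matrix S with S'(u kron u) = vech(uu')."""
    pairs = [(i, j) for j in range(T) for i in range(j, T)]   # (row i, column j), i >= j
    S = np.zeros((T * T, len(pairs)))
    for k, (i, j) in enumerate(pairs):
        S[j * T + i, k] = 1.0     # (u kron u)[j*T + i] = u_j * u_i = u_i * u_j
    return S

T = 4
u = np.random.default_rng(4).standard_normal(T)
S = vech_selection(T)
print(S.shape, np.allclose(S.T @ np.kron(u, u), vech(np.outer(u, u))))
```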

Now we transform the moment conditions (3.23) by multiplying them by a nonsingular matrix, chosen so that (i) the first $T(T+1)/2 - 2$ transformed moment conditions are those given in (3.21b) and (3.21c); and (ii) the last two moment conditions exactly identify the nuisance parameters ($\sigma_\alpha^2$ and $\sigma_\varepsilon^2$), given the other parameters. This implies that the last two moment conditions are redundant for the estimation of $\beta$, $\gamma$ and $\theta$. Hence (3.21b) and (3.21c) contain all of the useful information implied by the Covariance Assumption for the estimation of $\beta$, $\gamma$ and $\theta$.

To exhibit the transformation, let $G_t$ be the $(t-1)$th column of $G$; let $e_t^*$ equal the $t$th column of $I_{T-2}$ and $e_T$ equal the last column of $I_T$; and define
(HF), = I‘ATHIV 9:65;. ..., (ii—28’2"» 0(T—2)x:rl- (324)
($H_T$ was defined above.) Then
[as 8 H2, GT_1® HT_1, 1177’s - s’(u,- 8 in) = H’(G’ 8 IT)(u,: 8 u,), (3.25)

which is the same as in 02,- in (3.21b). Also, let Ji" = IT — AA’ and J*,t = 2,. . . ,T, is

equal to diatg{0tx t, MIT-4} plus a T x T matrix with zero elements except for the tth row


which is $\lambda'$. Then
HﬂJf, . . . , .1715 - S’(u,- 8 u.) = (x’ 8 o’)(u,- 8 1...), (3.26)

which is equal to $b_{3i}$ in (3.21c).

The point of the above argument is as follows. The transformations preceding $S'(u_i \otimes u_i)$ in (3.25) and (3.26), stacked vertically, form a $[T(T+1)/2 - 2] \times T(T+1)/2$ matrix of full row rank, and they yield the moment conditions $b_{2i}$ and $b_{3i}$. The remaining two moment conditions, which determine the nuisance parameters, are

$$E\begin{pmatrix} u_{i1}^2 \\ u_{i2}u_{i1} \end{pmatrix} = \begin{pmatrix} \sigma_\alpha^2 + \sigma_\varepsilon^2 \\ \lambda_2\sigma_\alpha^2 \end{pmatrix}, \qquad (3.27)$$

and they must be linearly independent of the others, since they involve $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ while the others do not.
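For given $\lambda_2$, the two equations in (3.27) can be solved uniquely for the nuisance parameters. This assumes $\lambda_2 \neq 0$, a condition not stated in the text. Unique solvability is the sense in which these two conditions are exactly identified:

$$\sigma_\alpha^2 = \frac{E(u_{i2}u_{i1})}{\lambda_2}, \qquad \sigma_\varepsilon^2 = E(u_{i1}^2) - \frac{E(u_{i2}u_{i1})}{\lambda_2}.$$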

The asymptotic variance of the GMM estimator is complicated because it depends on the moments of $\varepsilon_{it}$ up to fourth order. However, it simplifies considerably under the following “conditional independence of the moments up to fourth order” (CIM4) assumption:

Conditional on $(W_i, \alpha_i)$, $\varepsilon_{it}$ is independent over $t = 1, 2, \ldots, T$, with mean zero, and with second, third and fourth moments that do not depend on $(W_i, \alpha_i)$ or on $t$. (CIM4)

This is a strong assumption. It implies the Orthogonality Assumption, the Covariance Assumption, the NCH assumption, and more. In Appendix 3.A, we calculate the asymptotic variance matrix of the GMM estimator based on (3.21) under assumption (CIM4).

Let $\Lambda = \partial\lambda(\theta)/\partial\theta'$ and note that $\Lambda_* = G'\Lambda$. Given assumption (CIM4), the moment conditions (3.19), which are asymptotically equivalent to (3.21a), can be simplified as follows:

$$E X_i'P_G u_i = 0 \qquad (3.28a)$$
$$E Z_i 1_T' P_G u_i = 0 \qquad (3.28b)$$
E$(,,.azf,.lwtv'. - A’PGu, = 0. (3.28c)

That is, in place of the large set of moment conditions (3.21a), (3.21b) and (3.21c), we can use the reduced set of moment conditions consisting of (3.28), (3.21b) and (3.21c).

A final simplification arises if, conditional on $(W_i, \alpha_i)$, $\varepsilon_{it}$ is i.i.d. normal. In this case, (3.21b) can be shown to be redundant given (3.21a) and (3.21c) (see Proposition 4 of Ahn, Lee and Schmidt (2001, p. 231)). Hence, in that case, the GMM estimator using the moment conditions (3.28) and (3.21c) is efficient.

3.5 Least Squares

In this section we consider the concentrated least squares (CLS) estimation of the model. We treat the $\alpha_i$ as parameters to be estimated, so this is a true “fixed effects” treatment. Consider the least squares problem

$$\min_{\beta,\gamma,\theta,\alpha_1,\ldots,\alpha_N}\; N^{-1}\sum_{i=1}^{N}[y_i - X_i\beta - 1_TZ_i'\gamma - \lambda(\theta)\alpha_i]'[y_i - X_i\beta - 1_TZ_i'\gamma - \lambda(\theta)\alpha_i]. \qquad (3.29)$$

Solving for $\alpha_1, \ldots, \alpha_N$ first, we get

$$\hat\alpha_i(\beta,\gamma,\theta) = [\lambda(\theta)'\lambda(\theta)]^{-1}\lambda(\theta)'u_i(\beta,\gamma), \qquad i = 1, \ldots, N, \qquad (3.30)$$

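where $u_i(\beta,\gamma) = y_i - X_i\beta - 1_TZ_i'\gamma$ as before. Then the estimates $\hat\beta_{LS}$, $\hat\gamma_{LS}$ and $\hat\theta_{LS}$ minimizing (3.29) are equal to the minimizers of the sum of the squared concentrated residuals

$$C(\beta,\gamma,\theta) = N^{-1}\sum_{i=1}^{N} C_i(\beta,\gamma,\theta) = N^{-1}\sum_{i=1}^{N} u_i(\beta,\gamma)'M_{\lambda(\theta)}u_i(\beta,\gamma), \qquad (3.31)$$

which is obtained by replacing $\alpha_i$ in (3.29) with (3.30). Since (3.31) is the concentrated least squares criterion, we call $\hat\beta_{LS}$, $\hat\gamma_{LS}$ and $\hat\theta_{LS}$ the concentrated least squares (CLS) estimator.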
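The concentration step can be checked numerically for a single individual. The sketch uses an arbitrary illustrative $\lambda$ and a random vector standing in for $u_i(\beta,\gamma)$, neither of which comes from the text. It confirms that plugging (3.30) into (3.29) gives the quadratic form in (3.31), and that no other value of $\alpha_i$ gives a smaller residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5
lam = np.array([1.0, 0.9, 0.8, 0.7, 0.6])   # illustrative value of lambda(theta)
u = rng.normal(size=T)                        # stands in for u_i(beta, gamma)

# (3.30): least squares coefficient of u on lambda
alpha_hat = (lam @ u) / (lam @ lam)
ssr_at_alpha_hat = np.sum((u - lam * alpha_hat) ** 2)

# summand of (3.31): u' M_lambda u, M_lambda = I - lambda (lambda'lambda)^{-1} lambda'
M_lam = np.eye(T) - np.outer(lam, lam) / (lam @ lam)
C_i = u @ M_lam @ u

# no other alpha on a grid around alpha_hat does better
alphas = np.linspace(alpha_hat - 1.0, alpha_hat + 1.0, 201)
ssr_grid = np.array([np.sum((u - lam * a) ** 2) for a in alphas])
print(np.isclose(ssr_at_alpha_hat, C_i), ssr_grid.min() >= C_i - 1e-12)
```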
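Since $G'\lambda = 0$ and the $T-1$ columns of $G$ are linearly independent, the columns of $G$ span the orthogonal complement of $\lambda$. Therefore $M_\lambda = P_G = G(G'G)^{-1}G'$, and the first order conditions of the CLS estimation become

$$\begin{pmatrix} \partial C/\partial\beta \\ \partial C/\partial\gamma \\ \partial C/\partial\theta \end{pmatrix} = -\frac{2}{N}\sum_{i=1}^{N}\begin{pmatrix} X_i'P_Gu_i \\ Z_i1_T'P_Gu_i \\ \Lambda'P_Gu_iu_i'\lambda(\lambda'\lambda)^{-1} \end{pmatrix} = 0. \qquad (3.32)$$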
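A numerical sketch can confirm both $M_\lambda = P_G$ and the last block of (3.32). It assumes a hypothetical one-parameter model $\lambda(\theta) = (1, \theta, \ldots, \theta^{T-1})'$. It also takes $G' = [-\lambda_*, I_{T-1}]$ with $\lambda_* = (\lambda_2, \ldots, \lambda_T)'$, which is one choice satisfying $G'\lambda = 0$. The function names are illustrative only. The derivative identity holds for any vector $u_i$, not just at the true parameters, so an arbitrary vector is used.

```python
import numpy as np


def lam_of(theta: float, T: int) -> np.ndarray:
    """Hypothetical parametric lambda(theta) = (1, theta, ..., theta^(T-1))'."""
    return theta ** np.arange(T)


def dlam_of(theta: float, T: int) -> np.ndarray:
    """Lambda = d lambda / d theta' for the hypothetical model (T x 1, p = 1)."""
    t = np.arange(T)
    return (t * theta ** np.maximum(t - 1, 0))[:, None]


def G_of(lam: np.ndarray) -> np.ndarray:
    """T x (T-1) matrix with G' = [-lambda_*, I_{T-1}], so that G'lambda = 0."""
    return np.vstack([-lam[1:][None, :], np.eye(lam.size - 1)])


def C_i(theta: float, u: np.ndarray) -> float:
    """Concentrated criterion u' M_lambda(theta) u for one individual."""
    lam = lam_of(theta, u.size)
    M = np.eye(u.size) - np.outer(lam, lam) / (lam @ lam)
    return float(u @ M @ u)


T, theta = 5, 0.7
u = np.random.default_rng(3).normal(size=T)
lam, Lam = lam_of(theta, T), dlam_of(theta, T)
G = G_of(lam)
P_G = G @ np.linalg.solve(G.T @ G, G.T)
M_lam = np.eye(T) - np.outer(lam, lam) / (lam @ lam)

# last block of (3.32) for one individual, before averaging over i
analytic = -2.0 * Lam.T @ P_G @ np.outer(u, u) @ lam / (lam @ lam)
h = 1e-6
numeric = (C_i(theta + h, u) - C_i(theta - h, u)) / (2 * h)   # central difference
print(np.allclose(M_lam, P_G), analytic.item(), numeric)
```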

Interpreting (3.32) as sample moment conditions, we can construct the corresponding (ex-

actly identiﬁed) implicit population moment conditions:

$$E X_i'P_Gu_i = 0 \qquad (3.33a)$$
$$E Z_i1_T'P_Gu_i = 0 \qquad (3.33b)$$
$$E\,\Lambda'P_Gu_iu_i'\lambda(\lambda'\lambda)^{-1} = 0 \qquad (3.33c)$$

That is, the CLS estimator is asymptotically equivalent to the GMM estimator based on
(3.33).
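To see which assumptions (3.33c) relies on, take expectations using two facts. The first is $P_G\lambda = G(G'G)^{-1}G'\lambda = 0$. The second is the second-moment implication of the Covariance Assumption, $E(u_iu_i') = \sigma_\alpha^2\lambda\lambda' + \sigma_\varepsilon^2 I_T$, which is the matrix form of (3.22):

$$\begin{aligned}
E\,\Lambda'P_Gu_iu_i'\lambda(\lambda'\lambda)^{-1}
&= \Lambda'P_G(\sigma_\alpha^2\lambda\lambda' + \sigma_\varepsilon^2 I_T)\lambda(\lambda'\lambda)^{-1}\\
&= \sigma_\alpha^2\,\Lambda'(P_G\lambda) + \sigma_\varepsilon^2\,\Lambda'(P_G\lambda)(\lambda'\lambda)^{-1} = 0 .
\end{aligned}$$

Now write instead $E(u_iu_i') = \sigma_\alpha^2\lambda\lambda' + \Omega$ for an arbitrary symmetric $\Omega$, a notation introduced only for this illustration. The same steps leave $\Lambda'P_G\Omega\lambda(\lambda'\lambda)^{-1}$, which vanishes for $\Omega = \sigma_\varepsilon^2 I_T$ but not in general.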

The moment conditions (3.33a) and (3.33b) are satisfied under the Orthogonality Assumption. However, this is not true of (3.33c). Its validity requires the Covariance Assumption, unless we make very specific and unusual assumptions about the form of $\lambda$ and its relationship to the error variance matrix. Thus, the consistency of the CLS estimator requires both the Orthogonality Assumption and the Covariance Assumption. This is a rather striking result, since the consistency of least squares does not usually require restrictions on the second moments of the errors. It is a reflection of the incidental parameters problem.
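The consistency point can be illustrated with a population-level sketch. It assumes a hypothetical model with no regressors, so that only $\theta$ is estimated and the Orthogonality Assumption holds trivially. It further assumes $\lambda(\theta) = (1, \theta, \theta^2, \theta^3)'$, $\theta_0 = 0.8$ and $\sigma_\alpha^2 = 1$; none of these choices comes from the text. Write $\Sigma = E(u_iu_i')$. Then $E[u_i'M_\lambda u_i] = \operatorname{tr}\Sigma - \lambda'\Sigma\lambda/\lambda'\lambda$, and under standard regularity conditions its minimizer is the probability limit of the CLS estimate of $\theta$. With white-noise errors the grid minimizer recovers $\theta_0$. It does not when the errors are serially uncorrelated but have variances rising over $t$, which violates the homoskedasticity part of the Covariance Assumption.

```python
import numpy as np

T, THETA0, S2_ALPHA = 4, 0.8, 1.0


def lam_of(theta: float) -> np.ndarray:
    """Hypothetical lambda(theta) = (1, theta, theta^2, theta^3)'."""
    return theta ** np.arange(T)


def cls_plim(Omega: np.ndarray, grid: np.ndarray) -> float:
    """Grid minimiser of the population CLS criterion E[u_i' M_lambda u_i]."""
    lam0 = lam_of(THETA0)
    Sigma = S2_ALPHA * np.outer(lam0, lam0) + Omega   # E(u_i u_i') at the truth
    # E[u' M_lambda u] = tr(Sigma) - lambda' Sigma lambda / lambda'lambda
    crit = [np.trace(Sigma) - lam @ Sigma @ lam / (lam @ lam)
            for lam in map(lam_of, grid)]
    return float(grid[int(np.argmin(crit))])


grid = np.linspace(0.5, 1.5, 1001)
theta_white = cls_plim(0.5 * np.eye(T), grid)                 # white-noise errors
theta_het = cls_plim(np.diag([1.0, 2.0, 3.0, 4.0]), grid)     # variances rising in t
print(theta_white, theta_het)
```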

We would generally expect least squares to be efficient when the errors are i.i.d. normal. However, as in Ahn, Lee and Schmidt (2001), this is not true in the present case. The efficient GMM estimator under the Orthogonality and Covariance Assumptions uses the moment conditions (3.21), while the CLS estimator uses only a subset of these. This can be seen most explicitly in the case that, conditional on $(W_i, \alpha_i)$, the $\varepsilon_{it}$ are i.i.d. normal. Then (3.21b) is redundant and (3.21a) can be replaced by (3.28), so that the efficient GMM estimator is based on (3.28a), (3.28b), (3.28c) and (3.21c). The CLS estimator is based on three sets of conditions. The first is (3.33a), which is the same as (3.28a). The second is (3.33b), which is the same as (3.28b). The third is (3.33c), which consists of linear combinations of the moment conditions in (3.21c).2 So the inefficiency of CLS has two sources: its failure to use the moment conditions (3.28c), and its failure to use all of the moment conditions in (3.21c). The latter failure did not arise in the Ahn, Lee and Schmidt (2001) analysis (see footnote 2).
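The equivalence stated in footnote 2, and the resulting loss of information, can be checked numerically. This sketch reuses the hypothetical one-parameter model $\lambda(\theta) = (1, \theta, \ldots, \theta^{T-1})'$ with $G' = [-\lambda_*, I_{T-1}]$, together with the form of $b_{3i}$ given in Appendix 3.A. With $p = 1 < T - 1 = 4$, the $p \times (T-1)$ matrix $\Lambda'G(G'G)^{-1}$ collapses the $T-1$ moments in $b_{3i}$ into a single combination.

```python
import numpy as np

rng = np.random.default_rng(2)
T, theta = 5, 0.8
t = np.arange(T)
lam = theta ** t                                      # hypothetical lambda(theta), p = 1
Lam = (t * theta ** np.maximum(t - 1, 0))[:, None]     # T x p derivative matrix
G = np.vstack([-lam[1:][None, :], np.eye(T - 1)])      # G' = [-lambda_*, I_{T-1}]
P_G = G @ np.linalg.solve(G.T @ G, G.T)

u = rng.normal(size=T)
# b_3i = (lambda'lambda)^{-1} lambda'u_i kron G'u_i (a scalar times a vector)
b3 = (lam @ u) / (lam @ lam) * (G.T @ u)
m33c = Lam.T @ P_G @ np.outer(u, u) @ lam / (lam @ lam)   # summand of (3.33c)
A = Lam.T @ G @ np.linalg.inv(G.T @ G)                     # p x (T-1) transformation

print(np.allclose(A @ b3, m33c), A.shape, np.linalg.matrix_rank(A))
```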

In Appendix 3.B, we calculate the asymptotic variance matrix of the CLS estimator under the “conditional independence of the moments up to fourth order” (CIM4) assumption of Section 3.4.

3.6 Conclusion

In this chapter we have considered a panel data model with parametrically time-varying co-
efﬁcients on the individual effects. Following Ahn, Lee and Schmidt (2001), we have enu-
merated the moment conditions implied by alternative sets of assumptions on the model.
We have shown explicitly that our sets of moment conditions capture all of the useful infor-
mation contained in our assumptions, so that the corresponding GMM estimators exploit
these assumptions efﬁciently.

We have also considered concentrated least squares estimation. Here the incidental parameters problem is relevant because we are treating the fixed effects as parameters to be estimated. An interesting result is that the consistency of the least squares estimator requires both exogeneity assumptions and the assumption that the errors are white noise. Furthermore, given the white noise assumption, the least squares estimator is inefficient, because it fails to exploit all of the moment conditions that are available.

2 The moment conditions (3.33c) are equivalent to $E\,\Lambda'G(G'G)^{-1}b_{3i} = 0$. When the number of parameters in $\theta$ is less than $T-1$, the transformation $\Lambda'G(G'G)^{-1}$ loses information. This will be so in most parametric models for $\lambda(\theta)$, though it is not true in the model of Ahn, Lee and Schmidt (2001).

We have also shown how the GMM estimation problem can be simplified under some additional assumptions, including the assumption of no conditional heteroskedasticity and a stronger conditional independence assumption. Under these assumptions we also give explicit expressions for the variance matrices of the GMM and least squares estimators.

APPENDIX

In this Appendix we derive the asymptotic variances of the efficient GMM estimator and the CLS estimator. We make the “conditional independence of the moments up to fourth order” (CIM4) assumption of Section 3.4.

3.A The asymptotic variance of the GMM estimator

Under the Orthogonality and Covariance Assumptions, the moment conditions we have are $b_{1i} = G'u_i \otimes W_i$, $b_{2i} = H'(G'u_i \otimes u_i)$ and $b_{3i} = (\lambda'\lambda)^{-1}\lambda'u_i \otimes G'u_i$. Let $\delta = (\beta', \gamma', \theta')'$. Let $B_j = -E(\partial b_{ji}/\partial\delta')$ for $j = 1, 2, 3$, evaluated at the true parameters. Let $V_{jk} = E\,b_{ji}b_{ki}'$ for $j, k = 1, 2, 3$, evaluated at the true parameters. Define $\kappa_3 = E\varepsilon_{it}^3/\sigma_\varepsilon^3$
and 834 = E(e;.1, — 8o§)/o§. Let 7...; = EW,;<1> = (1)(6) = m; + diag()t2, . . . , AT); and

(1).. 2 MM, + diag()\%, . . . , Age), where A... 2 (A2, . . . , AT)'. After some algebra, we get

$$V_{11} = \sigma_\varepsilon^2(G'G \otimes \Sigma_{WW}) \qquad (3.34)$$
$$V_{12} = \sigma_\varepsilon^2(G'G \otimes \Sigma_{W\alpha}\lambda')H \qquad (3.35)$$
V13 = a? [G’G 8 Ewa+ :7: —(<I> 8 71.47)] (3.36)
$$V_{22} = \sigma_\varepsilon^2 H'[G'G \otimes (\sigma_\alpha^2\lambda\lambda' + \sigma_\varepsilon^2 I_T)]H \qquad (3.37)$$
V23— — 05 H’ { [(0-0, 2 +32%) G’G+ TIA/“(DI 8 A} (3.38)
7 2 2 02 I "4
1:33:05 {(0a+ +:\T-—,—i\)GG+2:—;—:ua<l>+ “002 ~———~—2...(I> } (3.39)
and
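$$B_1 = [(G \otimes \Sigma_{WW})'S_X,\ (G \otimes \Sigma_{WW})'S_Z,\ \Lambda_* \otimes \Sigma_{W\alpha}] \qquad (3.40)$$
$$B_2 = H'(I_{T-1} \otimes \lambda)[(G \otimes \Sigma_{W\alpha})'S_X,\ (G \otimes \Sigma_{W\alpha})'S_Z,\ \sigma_\alpha^2\Lambda_*] \qquad (3.41)$$
$$B_3 = [(G \otimes \Sigma_{W\alpha})'S_X,\ (G \otimes \Sigma_{W\alpha})'S_Z,\ \sigma_\alpha^2\Lambda_*] \qquad (3.42)$$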


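With these results, the asymptotic variance-covariance matrix of the GMM estimator is

$$\operatorname{cov}\,\sqrt{N}(\hat\delta - \delta) = \left\{(B_1', B_2', B_3')\begin{pmatrix} V_{11} & V_{12} & V_{13} \\ V_{12}' & V_{22} & V_{23} \\ V_{13}' & V_{23}' & V_{33} \end{pmatrix}^{-1}\begin{pmatrix} B_1 \\ B_2 \\ B_3 \end{pmatrix}\right\}^{-1}. \qquad (3.43)$$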
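The following is a generic sketch of how (3.43) is evaluated once the blocks are known. The $B_j$ and $V_{jk}$ here are random placeholders, not the expressions (3.34)–(3.42), and the helper name is illustrative. The last line illustrates a standard fact: with efficient weighting, dropping moment conditions cannot reduce the asymptotic variance.

```python
import numpy as np


def gmm_avar(B_blocks, V_blocks):
    """Efficient-GMM asymptotic variance {B' V^{-1} B}^{-1} as in (3.43).

    B_blocks: list of arrays B_j (m_j x k); V_blocks: nested list of V_jk (m_j x m_k).
    """
    B = np.vstack(B_blocks)                 # (B_1', B_2', B_3')'
    V = np.block(V_blocks)                  # full moment covariance matrix
    return np.linalg.inv(B.T @ np.linalg.solve(V, B))


rng = np.random.default_rng(4)
m, k = [3, 2, 2], 2
R = rng.normal(size=(sum(m), sum(m)))
V = R @ R.T + np.eye(sum(m))                # an arbitrary positive definite V
cuts = np.cumsum([0] + m)
V_blocks = [[V[cuts[j]:cuts[j + 1], cuts[l]:cuts[l + 1]] for l in range(3)]
            for j in range(3)]
B_blocks = [rng.normal(size=(mj, k)) for mj in m]

full = gmm_avar(B_blocks, V_blocks)
only_first = gmm_avar(B_blocks[:1], [[V_blocks[0][0]]])       # uses b_1i alone
print(np.linalg.eigvalsh(only_first - full).min() >= -1e-10)  # more moments never hurt
```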

3.B The asymptotic variance of the CLS estimator

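By the standard Taylor series expansion technique, we find that the asymptotic variance is equal to $A_0^{-1}B_0A_0^{-1}$, where

$$A_0 = E\frac{\partial^2 C_i}{\partial\delta\,\partial\delta'} \quad\text{and}\quad B_0 = E\frac{\partial C_i}{\partial\delta}\frac{\partial C_i}{\partial\delta'}, \qquad (3.44)$$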
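The following generic sketch shows how (3.44) becomes a computable variance: the expectations are replaced by sample averages of per-individual gradients and Hessians. The helper and the least squares usage example are illustrative assumptions. In the CLS application, the rows would be the individual gradients $\partial C_i/\partial\delta$ and Hessians $\partial^2 C_i/\partial\delta\,\partial\delta'$.

```python
import numpy as np


def sandwich(grads: np.ndarray, hessians: np.ndarray) -> np.ndarray:
    """Sample analogue of A_0^{-1} B_0 A_0^{-1} in (3.44).

    grads: N x k array whose rows are dC_i/d delta; hessians: N x k x k array of
    d^2 C_i / d delta d delta', both evaluated at the estimate.
    """
    A0 = hessians.mean(axis=0)
    B0 = grads.T @ grads / grads.shape[0]
    A0_inv = np.linalg.inv(A0)
    return A0_inv @ B0 @ A0_inv


# Usage: C_i = (y_i - x_i'b)^2, for which the sandwich is White's robust variance
rng = np.random.default_rng(5)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N) * (1.0 + np.abs(X[:, 1]))
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
grads = -2.0 * X * e[:, None]                      # dC_i/db = -2 x_i e_i
hessians = 2.0 * X[:, :, None] * X[:, None, :]     # d2C_i/db db' = 2 x_i x_i'
Sxx_inv = np.linalg.inv(X.T @ X / N)
white = Sxx_inv @ (X.T @ (X * e[:, None] ** 2) / N) @ Sxx_inv
print(np.allclose(sandwich(grads, hessians), white))
```

The factors of 2 in the gradient and 4 in its outer product cancel between $A_0^{-1}$ and $B_0$, which is why the scaling of the criterion does not affect the variance.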

evaluated at the true parameter. Let us calculate each of them. Let A 2 BA(90)/86’ 2
(0px1, A;)'. B0 is the same as in Ahn, Lee and Schmidt (2001, p. 253). Let ‘1' 2

C(G’G)-1<1>- (am-105x11... = G(G’G)-1<I>.(G'G)-1G’; and In = Ea). Then

     

     

   

 

   

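$$E\frac{\partial C_i}{\partial\beta}\frac{\partial C_i}{\partial\beta'} = 4\sigma_\varepsilon^2 S_X'(P_G \otimes \Sigma_{WW})S_X \qquad (3.45)$$
$$E\frac{\partial C_i}{\partial\beta}\frac{\partial C_i}{\partial\gamma'} = 4\sigma_\varepsilon^2 S_X'(P_G \otimes \Sigma_{WW})S_Z \qquad (3.46)$$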
30- 30 2
65’ 6—9} = 4085} [PG 8 XXX/0+ A, A01! 8 pm] A (3.47)
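$$E\frac{\partial C_i}{\partial\gamma}\frac{\partial C_i}{\partial\gamma'} = 4\sigma_\varepsilon^2 S_Z'(P_G \otimes \Sigma_{WW})S_Z \qquad (3.48)$$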
BC OC 2
E a 672' — -04 5'2 [PG8EWQ+ :IA(\II8)IW)] A (3.49)
60,: ac,:_ 02 I 02 K14 }
__ _ \II A. .
E09 89’ 5A 40{(2 NA PG+2:I:IIQ‘II+(———AI/\)2 4 (350)
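$A_0$ is obtained from the following.
$$E\frac{\partial^2 C_i}{\partial\beta\,\partial\delta'} = 2[S_X'(P_G \otimes \Sigma_{WW})S_X,\ S_X'(P_G \otimes \Sigma_{WW})S_Z,\ S_X'(P_G \otimes \Sigma_{W\alpha})\Lambda] \qquad (3.51)$$
$$E\frac{\partial^2 C_i}{\partial\gamma\,\partial\delta'} = 2[S_Z'(P_G \otimes \Sigma_{WW})S_X,\ S_Z'(P_G \otimes \Sigma_{WW})S_Z,\ S_Z'(P_G \otimes \Sigma_{W\alpha})\Lambda] \qquad (3.52)$$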
620i I I I I 2 I


Bibliography

Ahn, S. C., Y. H. Lee, and P. Schmidt (2001) ‘GMM Estimation of Linear Panel Data Models with Time-varying Individual Effects.’ Journal of Econometrics 101, 219–255.

Ahn, S. C., and P. Schmidt (1995) ‘A Separability Result for GMM Estimation, with Applications to GLS Prediction and Conditional Moment Tests.’ Econometric Reviews 14, 19–34.

Battese, G. E., and T. J. Coelli (1992) ‘Frontier Production Functions, Technical Efficiency and Panel Data: with Application to Paddy Farms in India.’ Journal of Productivity Analysis 3, 153–170.

Bierens, H. J. (1994) Topics in Advanced Econometrics, Cambridge University Press.

Billingsley, P. (1968) Convergence of Probability Measures, Wiley.

Bound, J., D. A. Jaeger, and R. M. Baker (1995) ‘Problems with Instrumental Variables Estimation When the Correlation between the Instruments and the Endogeneous Explanatory Variable is Weak (in Applications and Case Studies).’ Journal of the American Statistical Association 90, 443–450.

Chamberlain, G. C. (1980) ‘Analysis of Covariance with Qualitative Data.’ Review of Economic Studies 47, 225–238.

Chamberlain, G. (1987) ‘Asymptotic Efficiency in Estimation with Conditional Moment Restrictions.’ Journal of Econometrics 34, 305–334.

Chamberlain, G. C. (1992) ‘Efficiency Bounds for Semiparametric Regression.’ Econometrica 60, 567–596.

Davidson, J. (1994) Stochastic Limit Theory, Oxford University Press.

Hansen, L. P. (1982) ‘Large Sample Properties of Generalized Method of Moments Estimators.’ Econometrica 50, 1029–1054.

Holtz-Eakin, D., W. Newey, and H. S. Rosen (1988) ‘Estimating Vector Autoregressions with Panel Data.’ Econometrica 56, 1371–1395.

James, A. T. (1964) ‘Distributions of Matrix Variates and Latent Roots Derived from Normal Samples.’ Annals of Mathematical Statistics 35, 475–501.

Johnson, N. L., and S. Kotz (1972) Distributions in Statistics: Continuous Multivariate Distributions, John Wiley & Sons.

Kiefer, N. M. (1980) ‘A Time Series-Cross Sectional Model with Fixed Effects with an Intertemporal Factor Structure.’ Unpublished Manuscript, Cornell University.

Kim, J., and D. Pollard (1990) ‘Cube Root Asymptotics.’ The Annals of Statistics 18, 191–219.

Kumbhakar, S. C. (1990) ‘Production Frontiers, Panel Data and Time-varying Technical Inefficiency.’ Journal of Econometrics 46, 201–212.

Lee, Y. H. (1991) ‘Panel Data Models with Multiplicative Individual and Time Effects: Applications to Compensation and Frontier Production Functions.’ Unpublished Ph.D. Dissertation, Department of Economics, Michigan State University.

Lee, Y. H., and P. Schmidt (1993) ‘A Production Frontier Model with Flexible Temporal Variation in Technical Inefficiency.’ In The Measurement of Productive Efficiency: Techniques and Applications, edited by H. Fried, C. A. K. Lovell, and S. Schmidt, Oxford University Press.

Linton, O. (1999) ‘Problem.’ Econometric Theory 15, 151.

Magnus, J. R., and H. Neudecker (1988) Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons.

Manski, C. F. (1983) ‘Closest Empirical Distribution Estimation.’ Econometrica 51, 305–319.

Newey, W. K. (1988) ‘Asymptotic Equivalence of Closest Moments and GMM Estimators.’ Econometric Theory 4, 336–340.

Phillips, P. C. B. (1980) ‘The Exact Distribution of Instrumental Variable Estimators in an Equation Containing n + 1 Endogenous Variables.’ Econometrica 48, 861–878.

Staiger, D., and J. H. Stock (1997) ‘Instrumental Variables Regression with Weak Instruments.’ Econometrica 65, 557–586.

Theil, H. (1971) Principles of Econometrics, Wiley.
