ESTIMATION WITH PANEL DATA

By

Kyung So Im

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

1994

ABSTRACT

ESTIMATION WITH PANEL DATA

By

Kyung So Im

This dissertation studies standard panel data models with repeated observations for large cross sections. In Chapter 2, we compare the 3SLS estimator and the generalized IV estimator and derive some equivalence results. We also obtain redundancy conditions for models in which the regressors are strictly exogenous; block diagonality of the optimal GMM weighting matrix turns out to be crucial for some instruments to be superfluous. In addition, we propose some GMM estimators that are computationally simple and asymptotically no less efficient than GLS. Chapter 3 covers weakly exogenous models. If the instruments are weakly exogenous and the errors are serially correlated, the currently used moment conditions appear to lead to inconsistent estimators in general. The source of the serial correlation is crucial in determining the set of orthogonality conditions. We also suggest reduced lists of instruments, in several useful models, that produce nearly efficient estimators. In Chapter 4, we derive asymptotic variances of estimators when the moment conditions from covariance restrictions are used.
As nonlinear optimization is not necessary to estimate these variances, this result, in practice, would motivate people to use covariance restrictions more frequently. We also detail when the moment conditions from covariance restrictions are redundant in several popular models. An interesting result is that the instrumental variables can be useful even when they are not correlated with the regressors. We also argue that the moment conditions from covariance restrictions are useful always unless the GLS efficiency is reached. To the memory of my parents iv ACRNO'LHDGHBNTS Many people have assisted me in getting through the graduate program. Although there are too many names to mention them all, I am happy to have a chance to express my gratitude. Were it not for the aid of Professor Jeffrey Wooldridge, the dissertation committee chair, I would still be idling my time away. He provided careful advice on every aspects of my thesis with exceeding intelligence and consistent patience. I was a demanding student, but he always split his tight schedule and found time for listening to my questions. Especially, his careful correction on the final draft of Chapter 2 is appreciated deeply. I am grateful to the other two committee members, Professors Peter Schmidt and Ching-Fan Chung, for their time and efforts. The detailed and critical comments by Professor Schmidt on the early drafts of Chapter 2 and 4 improved the quality of this thesis substantially, and will be invaluable lessons through my career. The wonderful lectures from many faculty and the assistance of the departmental staff are appreciated. I must thank Professor Richard Baillie for his careful reading and comments on Chapter 2 and 4, and Professor Paul Chen for his advice and encourgement. It was my fortune to have known Dongin Lee, my four year officemate. We talked on whatever occured to us. Without him, this course should have been much more arduous. Joonsoo Lee's guidance was also quite helpful. My greatest luck in graduate school was meeting and marrying my wife, Eunhee Kuh. She sacrificed her own study and brought a lovely girl and a healthy boy into this wonderful world. Words cannot be found to express my thanks to her. My baby brother Kyung Tae and big sister Hye Kyung have provided thorough support throughout the course. And I am also indebted to my other brothers Kyung Rae, Kyung Ho and Kyung Hun for their encouragement. Thank you all who prayed for me. vi TABLE OF CONTENTS CHAPTER 1. INTRODUCTION ................................. CHAPTER 2. ESTIMATION USING PANEL DATA UNDER STRICT EXOGENEITY 1. INTRODUCTION OOOOOOOOOOOOCOOOOOOOOOOO ........ 00.... 2. 3SLS, GIV, AND REDUNDANCY CONDITIONS 2.1. PRELIMINARIES ................................ 2.2. EFFICIENCY COMPARISON OF 3SLS AND GIV ........ 2.3. NUNERICAL EQUIVALENCE OF BSLS AND GIV ........ 2.4. ALGEBRAIC REDUNDANCY OF INSTRUMENTS IN BSLS .. 3. MODEL WHERE THE REGRESSORS ARE UNCORRELATED WITH THE ERRORS 3.1. UNRESTRICTED COVARIANCE MATRIX .... ...... ..... 3.2. DIAGONAL COVARIANCE MATRIX ................... 3.3. RANDOM EFFECTS STRUCTURE ..................... 3.4. A GENERALIZATION OF THE RANDOM EFFECTS ASSUMPTION ................................... 4. MODEL WHERE THE REGRESSORS ARE CORRELATED WITH THE TIME CONSTANT ERROR COMPONENTS .................... 4.1. A "FIXED EFFECTS" TYPE MODEL ................. 4.2. HAUSMAN AND TAYLOR MODEL ..................... 4.3. HT MODEL WITH SERIALLY CORRELATED TIME-VARYING ERRORS .......... ...... ....................... 5. 
CONCLUSION
    APPENDIX 1
    APPENDIX 2

CHAPTER 3. ESTIMATION USING PANEL DATA UNDER WEAK EXOGENEITY
    1. INTRODUCTION
    2. SERIAL CORRELATION AND CONSISTENCY OF ESTIMATORS
        2.1. MOMENT CONDITIONS UNDER WEAK EXOGENEITY
        2.2. SERIAL CORRELATION AND MOMENT CONDITIONS
    3. ESTIMATION WITH THE BMS ASSUMPTION
    4. NEARLY EFFICIENT ESTIMATION
    5. CONCLUSION

CHAPTER 4. INFORMATION FROM COVARIANCE RESTRICTIONS IN PANEL DATA MODELS
    1. INTRODUCTION
    2. PRELIMINARIES
        2.1. REDUNDANCY CONDITIONS FOR MOMENT RESTRICTIONS
        2.2. SCALAR COVARIANCE AND THE ASYMPTOTIC VARIANCE OF GMM
        2.3. RANDOM EFFECTS COVARIANCE AND THE ASYMPTOTIC VARIANCE OF GMM
    3. STRICTLY EXOGENOUS MODELS
        3.1. GENERAL RESULTS ON NONREDUNDANCY UNDER IDEAL CONDITIONS
        3.2. STRICTLY EXOGENOUS MODEL: RANDOM EFFECTS COVARIANCE
        3.3. STRICTLY EXOGENOUS MODEL: FIXED EFFECTS TYPE
        3.4. STRICTLY EXOGENOUS MODEL: SCALAR COVARIANCE
    4. WEAKLY EXOGENOUS MODELS
        4.1. WEAKLY EXOGENOUS MODEL: DIAGONAL COVARIANCE
        4.2. WEAKLY EXOGENOUS MODEL: RANDOM EFFECTS COVARIANCE
    5. CONCLUSION

CHAPTER 5. CONCLUDING REMARKS

LIST OF REFERENCES

CHAPTER ONE

INTRODUCTION

This dissertation deals with linear panel data models with repeated time observations for large cross sections. A basic model is

(1)   y_it = x_it β + u_it,   t = 1,...,T,

where x_it is a 1 x k vector and β is a k x 1 parameter vector of primary interest. Let x_i = (x_i1', ..., x_iT')', with y_i and u_i similarly defined. Allowing for time-constant unobserved individual effects, which may in many instances bias the estimators obtained from single cross section data, we write

(2)   u_it = φ_i + ε_it,   t = 1,...,T.

Thus, φ_i is the time-constant error component and ε_it is the idiosyncratic error. We assume there is a T x h matrix of instrumental variables w_i; these instruments are suggested by various assumptions. We are interested in the case where the φ_i are treated as random, and not as fixed parameters to be estimated. We do consider the case where φ_i is correlated with some or all of the regressors; for many applications this is an important feature.

This chapter provides a summary of the main results contained in the subsequent chapters, and links them to previous studies. The following chapters are essentially independent of each other, and can be read separately. In each chapter we define the relevant notation. Whenever we refer to theorems, equations or assumptions contained in other chapters, we specify the chapter number. References are gathered at the end of the thesis.

In Chapter 2, we are primarily concerned with estimation in models where the regressors are strictly exogenous with respect to the idiosyncratic errors:

(3)   E(x_i ⊗ ε_i) = 0.

But before dealing with specific models, we compare 3SLS and generalized IV (GIV). This comparison appears in Bowden and Turkington (1984, p. 72) and White (1984, pp. 83-105; 1986), but a general result has not yet been established. We assume

(4)   E(u_i u_i' | w_i) = E(u_i u_i') ≡ Σ.
The 3SLS and GIV estimators are defined as

β̂_3SLS = [X'W(W'ΩW)^{-1}W'X]^{-1} X'W(W'ΩW)^{-1}W'Y,

and

β̂_GIV = [X'Ω^{-1}W(W'Ω^{-1}W)^{-1}W'Ω^{-1}X]^{-1} X'Ω^{-1}W(W'Ω^{-1}W)^{-1}W'Ω^{-1}Y,

where (Y, X, W) are the data matrices stacking (y_i, x_i, w_i), i = 1,...,N, and Ω = I_N ⊗ Σ. 3SLS and GIV utilize the instruments w_i and Σ^{-1}w_i, respectively. We show that in general neither estimator dominates the other, and that the two are numerically identical when the time periods use common instruments or when Σ is diagonal. Thus, the well-known equivalence result between OLS and GLS in the SUR model follows as a corollary of our result when w_i = x_i.

We then turn to specific models and provide several reduced lists of instrumental variables that lead to fully efficient estimators under several different assumptions. Asymptotically, there is no reason to reduce the set of instruments, since GMM never loses asymptotic efficiency by adding orthogonality conditions. However, GMM based on a restricted instrument set is not only computationally simpler but could also have better finite sample properties.

The unobserved effects model is standard if we add the assumption of the random effects covariance matrix,

(5)   Σ = σ_ε² I_T + σ_φ² e_T e_T',

where E(ε_it²) = σ_ε², t = 1,...,T, E(φ_i²) = σ_φ², I_T is the T x T identity matrix, and e_T is the T x 1 vector of ones. If x_i and φ_i are not correlated with each other, the model is the popular random effects model, and the random effects estimator (GLS) is the most efficient. We show that 3SLS utilizing the instrumental variables (PX, QX) is GLS, and propose GMM using (PX, QX) as the instruments, where PX and QX denote the NT x k meaned and demeaned versions of the data matrix X. If assumption (4) does not hold - for example, in the presence of heteroskedasticity and/or serial correlation in the ε_it conditional on w_i - GMM using the instruments (PX, QX) is generally more efficient than the random effects estimator.

Hausman and Taylor (1981) (HT hereafter) allowed some of x_i to be correlated with φ_i, and showed how the coefficients on the time-constant variables are identified. Subsequently, Amemiya and MaCurdy (1986) and Breusch, Mizon and Schmidt (1989) (BMS henceforth) developed more efficient estimators under some additional assumptions. We argue that the optimal weighting matrix E(w_i'u_i u_i'w_i) needs to be block diagonal in order for GMM based on a reduced list of instrumental variables to be fully efficient. This unifying theme provides the intuition behind the redundancy results established by BMS, Ahn and Schmidt (1992), and many of the theorems in this thesis, including the previous result that 3SLS using (PX, QX) is GLS in the random effects model.

If Σ is of the random effects form, then Σ can be expressed as aP_T + bQ_T for some scalars a and b. Thus, if w_i can be decomposed into (P_T w_1i, Q_T w_2i), where w_1i is generally constructed from the regressors that are not correlated with φ_i (an important exception is the instrument set suggested by BMS) and w_2i is usually based on all of the time-varying regressors, then, provided assumption (4) holds, the optimal weighting matrix becomes block diagonal simply because P_T Q_T = 0. We show a redundancy result through an example when w_2i = L ⊗ x_i°, which
This also explains the well-known lemma showing GLS in the random effects model is a linear combination of the within and the between estimators, and generalizes this lemma to the case when the optimal weighting matrix is block diagonal. We show that 2 = aPT+bQT =9 PTzzQT = 0, but the converse is not true. P.2Qt = 0 is sufficient to make the optimal weighting matrix block diagonal, provided wi== (Prwnerwzi) and assumption (4) holds. Another important case when the optimal weighting matrix become block diagonal is when (6) 2 = diag(0$1"'la$)l that is, when there are no time-constant unobserved effects and the errors are serially uncorrelated. Under the assumptions (4) and (6), E(w{unfivq) is block diagonal with the t-th block 0§E(wi'twit) , where wit is the instruments for the t-th period equation. If wit 3 xit for t = 1, --,T, BSLS utilizing the instruments diag(x“,---,xfl) is GLS. This covariance matrix is especially revelant for the rational expectations models where (pi does not present and the errors are necessarily serially uncorrelated. 6 Assumption (5) is now almost standard in the panel data literature. But, in general, there are neither a priori grounds nor any technical reasons that justify this form of 2. Therefore, we consider the case when the idiosyncratic errors are serially correlated. Ahn and Schmidt (1991) showed that BSLS using all the instruments Ifinfi is GLS. We provide a simpler proof than theirs, and reconsider the HT model allowing for the idiosyncratic errors to be serially correlated in an arbitrary manner. Some of the instrumental variables turn out to be redundant, but the number is smaller than under (5). Also, we show that GIV is not consistent unless the equi-correlation assumption of EMS holds, but when the BMS assumption holds, GIV can reduce the number of instrumental variables substantially. We also consider the model when all of the regressors are correlated with ¢i° If the idiosyncratic errors are arbitrarily correlated, the model is very similar to the fixed effects model with arbitrary intertemporal covariance considered by Kiefer (1980). We show that the several estimators, including the Kiefer's estimator (GLS in demeaned equation using a generalized inverse of Q12QT) , GLS in the differenced equations, and GLS in the demeaned equations after deleting any one equation, are numerically identical. Thereby, generalized inverting in this case is an unnecessary complication. In Chapter 3, we study the models where the regressors 7 are weakly exogenous to the idiosycratic errors. Thus in place of (3) we have the assumption (7) E(x&e“) = O, s 2 t. Dynamic models and rational expectations models are the leading examples of weakly exogenous but not strictly exogenous models. However, weakly exogenous models would be suitable in broader applications. We are primarily concerned with the consistency of estimators when the errors are serially correlated, and with the consistency of the usual standard errors of the BSLS estimators when certain moment conditions are used. Also, some nearly efficient estimators based on some reduced lists of instruments are proposed We ask a basic question whether the moment conditions in (7) are valid when the idiosyncratic errors are serially correlated. In dynamic models, it now is well known that no moment conditions exist between the lagged dependent variables and the disturbances if the idiosyncratic errors are arbitrarily serially correlated. 
We show that a similar relation exists between the general weakly exogenous regressors and the errors unless the time-varying errors contain two components; the serially correlated components to which the regressors are strictly exogenous and the serially uncorrelated components to which the regressors are weakly exogenous. 8 Keane and Runkle (1992) proposed ZSLS upon the forward filtered equations when the errors are serially correlated, adapting a suggestion by Hayashi and Sims (1983) in a pure time series context. Schmidt, Ahn and Wyhowski (1992) (SAW henceforth), in a comment on Keane and Runkle, provided the maximal sets of the instrumental variables in several weakly exogenous models, and showed that the Keane and Runkle estimator is numerically identical to BSLS when all the instruments are used. Hayashi and Sims (1983) and SAW indicated that eliminating the serial correlations by forward filtering is justified only when "the serial correlations in the errors are independent of the current and lagged values of the instrumental variables". But, if the Keane and Runkle estimator is inconsistent, so is 3SLS since they are the same. Thereby, the requirement for vindicating forward filtering noted by SAW and Hayashi and Sims indeed is needed for the moment conditions in (7) to be valid. Wooldridge (1993) showed that the usual standard errors for the nonlinear BSLS estimators in hedonic pricing models are not consistent, and derived a condition for the usual BSLS standard errors to be consistent. Ahn (1990) obtained a similar result in the dynamic model when certain instruments are used. If BMS's equi-correlation assumption that E(xi't¢i) are the same over t holds, we have (T-1)k additional instruments. We show that the usual 3SLS 9 standard errors are not consistent if these instruments are used. Applying BMS's condition to dynamic models, we have the condition that E(yi't¢i) are the same over t, which is implied by the stationarity of {(Yit¢i) :t=0, - - - ,T} suggested by Arellano and Bover (1990). This condition implies T-1 instruments, and the usual 3SLS standard errors are not consistent if these instruments are used. In fact, the result obtained by Ahn (1990) is based on the instrumental variables obtained from the moment conditions E[(yit-ayit_1)¢i] being the same over t, which is weaker than EMS condition (or Arellano and Bover), but the structures of the instruments from these conditions are quite similar. Thus, Ahn's result is closely related to ours. As we argued above, all the instrumental variables are useful unless the optimal weighting matrix is block diagonal in general. When the instruments are weakly exogenous the diagonal would be the only structure of 2 that makes the optimal weighting matrices block diagonal. Thus, any attempts to find some reduced lists of instruments that produce the fully efficient estimators may not be fruitful. But, it is practically useful to find some reduced set of instruments. In many applications, the weakly and the strictly exogenous regressors exist together in a model. A leading example is the dynamic model with strictly exogenous regressors considered by many. Letting X1 and X2 be weakly and strictly exogenous regressors, respectively, we suggest 10 instruments (R,n4Xé),‘where R includes all the instruments between X1 and the errors. In Chapter 4, we study the moment conditions from covariance restrictions, which are essentially nonlinear in the parameters. 
Covariance restrictions are rarely used in practice in standard models, perhaps because people are reluctant to utilize a priori restrictions that will cause inconsistency of the estimators when they are false. Another important reason would be the computational burden of numerical optimization. But if covariance restrictions bring non-trivial efficiency gains, the computational burden is secondary. We show how to consistently estimate the asymptotic variances of the nonlinear GMM estimators that use the moment conditions from covariance restrictions without numerical optimization. If the efficiency gain from adding the moment conditions from covariance restrictions is non-trivial, then it is worth doing the numerical optimization. Testing whether the covariance restrictions are valid is straightforward, so we can get around the possible inconsistency problem. This result applies to general simultaneous equations models as well as to panel data models.

Next, we find when the moment conditions from covariance restrictions are redundant. We consider the scalar and the random effects covariance matrices in strictly and weakly exogenous models. In the strictly exogenous model with a scalar covariance matrix, where OLS is efficient under the standard set of assumptions, it turns out that the moment conditions from covariance restrictions are useful unless certain third moment conditions on the errors are met. In this case, the nonlinear GMM estimator is equivalent to a linear GMM estimator using instruments made of residuals from an initial consistent estimator, and these instruments are not correlated with the regressors. Therefore, what we find is that instruments can be useful even when they are not correlated with the regressors. This deviates from convention. In fact, the efficiency gains in this case follow from the correlation between the instruments and the squared error sequence {u_i²}. In other words, the additional instrumental variables from covariance restrictions necessarily rely on heteroskedasticity to be useful. This relates to the results obtained by Cragg (1983) and Chamberlain (1982) that additional instruments (other than the regressors) can be useful, under heteroskedasticity of unknown form, in standard regression models where all the regressors are valid instruments.

The asymptotic variance of the estimator when we use instruments w_i is (we consider here a single equation case for simplicity)

(8)   [E(x_i'w_i){E(u_i² w_i'w_i)}^{-1}E(w_i'x_i)]^{-1}.

Let w_i = (x_i, z_i), where z_i is the set of additional instruments, and suppose E(z_i'x_i) = 0 (this is not necessary but simplifies the algebra). Then (8) becomes

E(x_i'x_i)^{-1} {E(u_i² x_i'x_i) - E(u_i² x_i'z_i)[E(u_i² z_i'z_i)]^{-1}E(u_i² z_i'x_i)} E(x_i'x_i)^{-1},

which is strictly smaller than the OLS variance as long as E(u_i² x_i'z_i) ≠ 0. Thus, z_i contributes by explaining u_i².

Another general result we obtain in Chapter 4 is that the information from covariance restrictions is useful whenever GLS efficiency is not reached. Together with the above arguments, it becomes clear that GLS efficiency is necessary but not sufficient for the moment conditions from covariance restrictions to be redundant. In the Hausman and Taylor (1981) model, GLS is not consistent. But the regression is separated into two orthogonal spaces, as we argued above, and GLS efficiency is reached in the deviation space.
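A small simulation sketch makes the variance comparison in (8) concrete (the design below is ours and purely illustrative): the extra instrument z_i is uncorrelated with x_i, yet it lowers the asymptotic variance because E(u_i² x_i'z_i) ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000                              # large n: sample moments approximate expectations
x = rng.normal(size=n)                   # regressor
z = rng.normal(size=n)                   # extra instrument, independent of x
u = (1.0 + x * z) * rng.normal(size=n)   # E(u|x,z) = 0, but Var(u|x,z) depends on x*z

Exx = np.mean(x * x)
# Variance from (8) with instruments w = x only (heteroskedasticity-robust OLS variance):
V_x = np.mean(u**2 * x * x) / Exx**2
# Variance from the partitioned form of (8) with w = (x, z) and E(xz) close to 0:
adj = np.mean(u**2 * x * z)**2 / np.mean(u**2 * z * z)
V_w = (np.mean(u**2 * x * x) - adj) / Exx**2

print(V_x, V_w, V_w < V_x)               # V_w is noticeably smaller than V_x
```

Here z helps only because the conditional variance of u depends on x*z, which is exactly the sense in which the additional instruments "explain u_i²".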
As was conjectured by Ahn and Schmidt (1992), the instrumental variables from the random effects covariance restrictions are in the deviation space, and they are redundant if certain higher conditions on the error are satisfied. In weakly exogenous models, GLS is not consistent unless 2 is diagonal. It seems that covariance restrictions are useful if 2 is not diagonal, and they can be redundant if 2 is diagonal. We show these through several examples that quite often appear in applications. If there present unobserved effects ¢H in weakly exogenous model, the moment conditions from covariance are useful because 2 is not 13 diagonal. This holds whether ¢H is correlated with the regressors or not. CHAPTER 2 ESTIMATION WITH PANEL DATA UNDER STRICT EXOGENEITY 1. INTRODUCTION This chapter has two purposes. The first is to establish equivalences between certain three stage least squares (3SLS) and generalized instrumental variables (GIV) estimators in several panel data models with strictly exogenous regressors. The second purpose is to find minimal sets of nonredundant instruments for BSLS under various assumptions that have been used in the panel data literature. Extensions of the standard assumptions are also considered. The 3SLS estimator considered in this paper appears in Amemiya (1977, equation 5.4), Hausman, Newey and Taylor (1987), and Schmidt (1990, equation 5), and is the generalized method of moments estimator (GMM) under standard assumptions. The GIV estimator has been considered by White (1984, pp. 85-105; 1986), Bowden and Turkington (1984, pp. 68-72), Schmidt (1990), and many others. In the models we consider, there will always be an estimator that is asymptotically no less efficient than BSLS and GIV, namely the cum estimator using all orthogonality conditions and an unrestricted weighting matrix. Thus, in 14 15 terms of achieving asymptotic efficiency there is really no need to weed out redundant orthogonality conditions. As a practical matter, though, it is very useful to know whether, under certain assumptions, the list of nonredundant instruments is shorter than the list of all possible instruments. For example, in a panel data model with 10 strictly exogenous time-varying regressors and six time periods, the total number of instruments is 360. This can cause computational problems for CHM, especially when the cross section dimension is small. Even if there is no computational issue with CNN, there might be good statistical reasons for using estimators based on fewer orthogonality conditions. As an illustration, consider a result obtained in section 3.3. There, it is shown that a BSLS estimator based on a reduced instrument list is, under the usual random effects assumptions, equivalent to the random effects (GLS) estimator. This 3SLS estimator is based on many fewer instruments than an unstructured GMM analysis would be. This result implies that, if we then compute the CNN estimator with Optimal weighting matrix using the restricted BSLS instruments, we obtain an estimator no less efficient than random effects. In addition, if the random effects assumption fails, then the estimator is generally more efficient than random effects. And while this GMM estimator based on restricted instruments is generally less efficient than the full GMM 16 estimator if the random effects assumptions fail, it has computational advantages and could very well have better finite sample properties. Thus, the redundancy conditions do have practical applications. 
Throughout this chapter (and in later chapters) we focus on general unobserved effects panel data models, where the time constant unobserved effect may or may not be correlated with some or all of the observed regressors. We cover models with serially uncorrelated idiosyncratic errors as well as models where the idiosyncratic errors are allowed to be serially correlated with time-varying variances. Such a setup captures the flavor of both random effects and fixed effects-type specifications. In a random effects framework, the key assumption is that the observed regressors are uncorrelated with the unobserved effects. For many fixed effects applications, the key feature is that the unobservable effects can be correlated with some or all of the regressors. We consider both cases in what follows, and this is sufficient for the vast majority of applications. The strict fixed effects framework, which assumes that the unobserved effects are constants that differ across individual, is not treated here. See Hsiao (1986, pp. 41-49) for a discussion of the conceptual issues underlying the fixed versus random effects dichotomy. Section 2 contains some general results concerning the equivalence between 3SLS and GIV. These are extensions of 17 the well-known equivalences between OLS and GLS under certain conditions in the seemingly unrelated regressions model, and of the equivalence between 3SLS and ZSLS in simultaneous equations models. Section 3 studies panel data models where the regressors are uncorrelated with the composite errors. In a general model the equivalence of the BSLS estimator using all orthogonality conditions and GLS is established, giving a different proof of a result of Ahn and Schmidt (1991). A new result showing the equivalence of a BSLS estimator with reduced instrument set and GLS is presented in section 3.3. Section 4 turns to models where the time-constant unobserved effects are potentially correlated with some or all of the regressors. Several estimators are shown to be identical for estimating the parameters in the unobserved effects analog of Kiefer's (1980) fixed effects model. We also study the Hausman and Taylor (1981) (HT hereafter) model under more general assumptions. Hausman and Taylor showed how the coefficients on the time constant regressors could be identified when the time constant regressors are correlated with the unobserved effects in a model with serially independent idiosyncratic errors. Efficient estimation in the HT model was considered further by Breusch, Mizon and Schmidt (1989) (EMS hereafter) under some additional assumptions. These results are extended to allow for arbitrary serial correlations in the idiosyncratic 18 errors . 2. 3SLS, GIV, AND REDUNDANCY CONDITIONS 2.1. W Consider a linear panel data model, yi==)gfi + ui, (2.1) where yi = (yfl, - - - ,y") ' and xi = (xi'1, - - - ,xi'T) ' of dimension Txl and Txk, respectively. {(yvag):i=1,-- ,N} is an i.i.d. random sequence. Throughout the paper, for any Txp matrix mi, M 5 (m{,-- , mg)' of dimension NTXp, where N is the number of observations. Thus, the matrix M is the stacked matrix of nu for i = 1,-- ,N with the i-th block mi. In the sample, Y = X6 + U. Most of the results we discuss in this chapter deal with algebraic equivalences of various estimators and have nothing to do with statistical properties such as consistency and asymptotic normality. Nevertheless, one would probably not use the estimators unless certain assumptions are satisfied. 
It is useful to set out some assumptions that typically underlie method of moments-type estimators. Because we are studying both 3SLS and GIV we make assumptions that traditionally underlie application of —z~fl' - 19 these methods. To consistently estimate 6 appearing in (2.1) we assume that there is a set of Txh instruments, wg, that are appropriately orthogonal to “1' 4A fairly standard set of assumptions is ASSUMPTION 2.1: (a) E(w{u1) = 0. (b) E(wi'2'1ui) = 0, where 2 E(unfi) is nonsingular. ASSUMPTION 2.2: (a) E(wi'xi) has full column rank and E(w‘.'zwi) is positive definite. (b) E(wi'2"xi) has full column rank and E(wi'2wi) is positive definite. ASSUMPTION 2.3: (a) E(wi'uiui'wi) = E(wi'2wi). -1 -1 __ -1 (b) E(w{2 Inugz v“) — E(w;2 W})° Assumption 2.1(a) and 2.2(a) ensure that the BSLS estimator is consistent under standard regularity conditions. As a practical matter, Assumption 2.2(a) implies that the BSLS estimator exists with probability approaching one (as the sample size grows); for what follows, we just assume the estimator exists for any sample. Assumption 2.3(a) is the weakest assumption that guarantees that the usual formula for the asymptotic variance of BSLS is valid. Assumptions 2.1(b) and 2.2(b) imply consistency of the GIV; Assumption 20 2.3(b) implies that its asymptotic variance matrix is of a relatively simple form. Note that a sufficient condition for both parts of Assumption 2.3 is E(uiui' Iwi) = E(uiui') = 2. When we study 3SLS in later sections, we typically have in mind assumptions such as Assumption 2.1(a), 2.2(a), and 2.3(a) . The key will be to find instruments wi that satisfy these conditions under more primitive assumptions about the models at hand. Assumption 2.1(a) is critical and dictates the choice of wg. .Assumption 2.2(a) can be viewed as a regularity condition. Assumption 2.3(a) cannot be guaranteed a priori, but it is useful as a starting point. In practice, one needs a consistent estimator of 2 to perform 3SLS or GIV. Nothing is lost in the following analysis by assuming 2 is known because it is consistently estimable in the models we deal with. 2.2. Efficiency Comparison of 3SLS and GIV Given the data matrices x, W, and Y, and defining a E 1&92, the 3SLS estimator is defined to be 2535“ = [x'W(w'nW)‘1w'x1'1x'W(w'QW)"w'y. Equivalently, this is the GMM estimator based on the orthogonality condition E(W'U)=0 using weighting matrix (W'nW)4. The GIV estimator first transforms (2.1) to spherical disturbances by premultiplying by ZYVZ, and then 21 uses 21'1’zwi as the instruments. This gives 35”, = [X'fl"W(W'n"W)‘1w'n‘1x1‘1x'n"W(w'n"W)"w'n'1y. Under Assumptions 2.1-2.3, we have the following asymptotic variances: Avar/N(E,SLs-m = [E(x;wi){E(w:2wi) }"E(w:xi)1“. and Avar/N(3mv-fi) = [E(xi')3'1wi) {E(wi'2’1wi) )‘1E(wi'2'1xi) 1". Rather than compare the asymptotic variances, it is easier to work with the estimates of the asymptotic variances that these formulas imply, which are VA A - ' ' ‘1 I '1 — I "U2 -1/2 -1 armssLs) — [x W(W (W) W X] — (x n PmVZW)“ X) , (2.2a) and var-($6M = [x'n"W(w'n°‘W)"w'n“X]" = (X'n'1/2P(n-1/2W)0'1’2X)('1, b) 2.2 where P(.) denotes the projection onto the columns of (-). Efficiency comparison of the two estimators is ranking the two idempotent matrices of the same rank P(Q“QW) and Pm‘VZW) , which is not possible without further information on X, W, and n. The problem is equally stated as finding the optimal 6 which maximizes Pm‘W) , which does not seem to be possible in general. 
An important well-known special case is when W = x, in which case 361V is the GLS estimator and 333:5» is the OLS estimator. However, when st, general dominance of one 22 estimator over the other has not been established. Bowden and Turkington (1984, p. 72) argued that there would be no clear dominance of one estimator over the other without further information on X and W when X and W are of the same order. White (1984, pp. 83-105: 1986) showed that the GIV estimator is no less efficient than the BSLS estimator if (I'VZW is the optimal set of instruments in the transformed equation multiplied by n'VZ. White's proof is based on the fact that the GIV estimator is the ZSLS estimator on the transformed equation and the ZSLS estimator is the most efficient when the covariance matrix is scalar and the optimal instruments are used. However, as we can see in (2.2), the BSLS estimator is also a ZSLS estimator in the transformed equation with the scalar covariance matrix. The difference is in the instruments to be used; GIV uses (TVZW, whereas 3SLS uses QWW. It is not clear which instrument set is optimal without further information on X, W and a. Before turning to algebraic equivalence results it should be noted that the efficiency issue is unambiguous if we strengthen the assumptions as in Chamberlain (1987). He shows that if (i) E(uilwi) = o and (ii) E(uiui'lwi) = 2:, then the most efficient estimator that ignores second moment information is the BSLS estimator using the instruments E"E(xilwi). Unfortunately, the condition E(uilwi) = 0 is too strong for several of the panel data applications we have in 23 mind. 2.3. Numerical Equivalence of 3SLS and GIV We now turn to the algebraic equivalence of 3SLS and GIV. Therefore, Assumptions 2.1-2.3 are not needed: we only need for the estimators to exist along with assumptions about W or E, to be given below. As is well known, for nonsingular n, OLS = GLS iff there exists nonsingular R such that 94x = XR. Essentially the same relationship holds between the BSLS and the GIV estimators for given instruments W. From a CNN viewpoint, the estimators are invariant to any nonsingular transformation of the orthogonality conditions as was pointed out by Schmidt, Ahn and Wyhowski (1992) (SAW hereafter). THEOREM 2.1: In model (2.1), if there exists nonsingular B such that n"w = WB, then Ems = 36“. PROOF: x'n"W(w'n"W)"w'n"x = X‘WB(B'W'QWB)"B'W'X = x'W(w'QW)"w'x. I The most widely known special case of Theorem 2.1 is the SUR model with either common regressors or diagonal E, in which case OLS = GLS. We now extend these results to show 3SLS and GIV are equivalent under analogous assumptions. 24 We first consider the common instrument case, that is, where the same set of instruments is used for all t. This is given by the following assumption. ASSUMPTION 2.4: wi = ITow‘i’, where w? is a 1xq vector. As we will see, Assumption 2.4 is applicable to many panel data models with strictly exogenous regressors since the regressors in each time period are orthogonal to errors in all time periods. THEOREM 2.2: In model (2.1), when common instruments are used, that is wi = ITow‘i’, the 381.8 and the GIV estimators are the same. PROOF; From Theorem 2.1, it is sufficient to show that E'1wi = >3"(I,ew$) = 2"ow‘i’ = (119w?) (3-181,) 2 wiB. I Just as with Theorem 2.1, no statistical assumptions are imposed. Since the proof is based on each observation, technically the results holds in the samples of size no smaller than h. 
It is worth emphasizing that the common instruments w‘i’ do not have to be of the form (wi1,- - o, w”) . Theorem 2.2 holds no matter what w? is. Theorem 2.2 still holds after we replace 2‘ for E" for any scalar 6. Hence, there are infinite sets of instruments that generate the same estimator. The same result has been provided by SAW (1992) when FW} is used as instruments, where F is the forward filtering matrix; this corresponds to 25 the case 6 = -1/2. A wide class of models implies common instruments. Standard panel data models where the instruments are strictly exogenous to errors, standard simultaneous models, and the usual SUR models are examples. Thus, the GIV estimators are the same as the BSLS estimators in these models. We now turn to the case of diagonal 2, where the instruments are essentially unrestricted. ASSUMPTION 2.5: 2 = diag(a§, - .-,o$). THEOREM 2.3: For any 1xht vectors wit, define w‘- = diag(wi1, -,wn). Then, under Assumption 2.5, BSLS and GIV are the same. PROOF: 2'1wi = diag(o;2wi1, - - - ,ofw”) = diag(wi1o;2, - - - ,wiTa;2) E wiB, where, B E diag(Ig1®o;2, Igzoaéz, - - - ,IgToaf) of dimension gxg, and I9t is the identity matrix of dimension 9}. I It turns out that the theorems in this section can be used to derive some of the results for the specific panel data models we turn to next. In cases where the comparision is between BSLS using one set of instruments and GIV using another set, direct arguments are easier. 26 2.4. Algebraic Redundancy or Instruments in BSLS In section 4 we use a general result on redundancy of instruments for 3SLS. The following is the algebraic equivalance analog of White (1984, Proposition 4.50). THEOREM 2.4: Let [a = [x1111(1111'12111)"111'x1'1x'w1 (w1'nw,)"w1'y and 5 = [x'mw'nm"w'x1‘1x'mw'nm'1w'y, where w = (111,142). Then A 6 = 19 1f wz'x = wgnw1(w,'nw1) 111111111. PROOF: Appendix 2. I Similarly, the two GIV estimators using instruments WW1 and n“(w1,w2) are numerically identical if w'n"x = WOW (WOW )‘1w'n’1X 2 2 1 1 1 1 ° 3. MODEL WHERE THE REGRESSORS ARE UNCORRELATED WITH THE ERRORS 3.1. Unrestricted Covariance Matrix We now consider model (2.1) under the assumption that each element of X} is orthogonal to each element of ui; thus, we have in mind that E(xi®ui) = o. (3.1) Under (3.1), for each t the instruments can be taken to be 27 all nonredundant elements of x? E (xi1,~--,x"). (3.2) Thus, we want to analyze the BSLS estimator under the following assumption. ASSUMPTION 3.1: E(wi'ui) = 0, where wi = Irow‘i’, and W? contains all nonredundant elements of x?. If there are no time constant elements in xit then w?==)¢. Typically, w‘i’ will have fewer than Tk elements since x” Often contains at least a constant, if not other time constant variables. Any time constant variables only appear once in w?. In this subsection we place no restrictions on the variance matrix 2, which puts us exactly in the situation of Theorem 2.2. Ahn and Schmidt (1991) showed that the 3SLS estimator using all of the instruments IToxi° is the GLS estimator. We can restate their finding with a simpler proof. THEOREM 3.1: Under Assumption 3.1, the 3SLS estimator is the GLS estimator. PROOF: It follows immediately from Theorem 2.2 that BSLS = GIV using the same set Of instruments. But the GIV estimator using instruments Ignfi is the GLS estimator since 28 3.2. Diagonal Covariange Matrix If the errors uit are serially uncorrelated over time, we have Assumption 2.5. Let x: = diag(xi1, - . - ,x”) . 
THEOREM 3.2: Under Assumption 2.5, the 3SLS estimator using instruments x: is the GLS estimator. PROOF: 38LS=GIV from Theorem 2.3. Let 52‘; = 24/22:; and ii = E'Wxi. Since 2 is diagonal, x’i' = diag(a;1xi1, ~ - - ,O}1x") . Thus, P(;(-)X = X. The result follows immediately. I The definition of x: leaves the instruments for each equation entirely unrestricted. In fact, because E(x:Hn) = 0 is sufficient for consistency of BSLS under Assumption 2.5, the strict exogeneity condition (3.1) is not needed for consistency. Of course we are only proving algebraic equivalence results here anyway. Recalling the conclusion of Theorem 3.1, Theorem 3.2 is seen to be a redundancy result. Theorem 3.1 showed that, without any restrictions on 2, the 3SLS estimator using wi== Itow‘i’, where w? contains all nonredundant elements of x‘i’, equals the GLS estimator. Together, Theorem 3.1 and Theorem 3.2 show that, under Assumption 2.5, 3SLS using instruments w} is the same as 3SLS using instruments xi. Without Assumption 2.5 this redundancy does not necessarily hold. In the SUR model where fi's are different across the equations, the regressor itself is xi. OLS = GLS if 2 is 29 diagonal. But in the panel data models with the same 6 across time periods, GLS is strictly more efficient than OLS. Of course, if 0: = 02 for t = 1,---,T, OLS = GLS. 3.3. gendon Effects Structure In many panel data applications 2 is entirely unrestricted as in section 3.1. Further, it is essentially never diagonal in unobserved components models. We now turn to the popular random effects model. To study the random effects setup we need to introduce some notation similar to that used by EMS (1989). The instruments ITox‘i’ are equivalent to the instruments (eT,L)Ox‘i’, since all of the columns in (eI,L) are linear combinations of the columns in I} and both are of the same column rank, where eT denotes the Tx1 vector of ones and L denotes the Tx(T-1) differencing matrix The instruments etox‘i’ and Lox? are in the space spanned by eT and L, and of dimension Tka and TxT(T-1)k, respectively. For the same reason, the instruments x: are equivalent - .. _ T - _ - to (xi,xi) , where xi = 4% 21x1: and xi = (X11'X1I' - ~, xiH-xi) . t: 30 That is, X? and (wai) preserve the same information. Since xn-xi is the negative sum of the rest of the terms (xH-xi, -,xiT_1-§i) , xn-xi is trimmed away to avoid the singularity problem without losing any information. Let 2” = xH-xg, then iii = (53“, - - ~ , 52m) . The dimensions of ii and iii are 1xk and 1x(T-1)k. In summary, the instruments Itox‘i’ are equivalent to; (L,e,)®x? or Leo-(“521) or (L,eT)®(§i,§i). In the sample, (X°,X,X) 9L and (X°,X,3'()181eT stand for the stacked instruments m(xg,§i,§i) and eT®(x“?,xi,xi), respectively, where -x11x12---x"- 'X11x12"'x11-1' -x1- x21 x22 X21 x21 X22 x214 x2 x°- it: ’= . - .1 - '° ” °° _ _ - _ x111 X142 x," X111 x112 X1114 X11 In the standard random effects model each uit can be written as u.it =¢i + Git! t: 1'...’T, where ¢3 is the time-constant unobserved effect and the sit are the idiosyncratic errors. We assume that 491 and 6it have zero means, are uncorrelated for all t, and that {en:t=1,-o ,T} is an uncorrelated sequence with constant variance 0:. The variance of ¢1 is 0:. These assumptions lead to a well known form for 2. 31 . _ 2 2 ASSUMPTION 3.2. 2 — O‘II + a‘eTeT'. In applying random effects it is assumed that E(x;¢e)== 0 and E(xi®ei) = 0, so that the strict exogeneity condition (3.1) holds. 
Thus, the set of potential instruments is exactly as in section 3.1: the nonredundant elements of x? can be used as instruments for each time period t. Section 3.2 showed how the number of instruments can be reduced when 2 is diagonal. The next result shows that one can get by with many fewer instruments in the random effects model. The proof is much simplified by writing 2 under Assumption 3.2 as E = PT + bQT, where PT = eT(e1'e,)‘1eT' = %e,e;, QT = L(L'L)4L' = IT-Iy, and b is a positive scalar. To see how to do this, note that _ 2 2 _ 2 2 _ 2 2 2 2 - afiIT + o‘eTeT' — OGIT + TagT — (ae+Ta')PT + OeQT E PT + bQT. Of-l-TO: (sum of each column in 2) is assumed to be one without loss Of generality, and b a of. Note further that (PT + bQT)'1 = PT + £01, that holds since two projection matrices PT and QT span the two orthogonal bases eT and L. Let p = INOPT, Q = INQQT. THEOREM 3.3: Under Assumption 3.2, the 3SLS estimator using instruments (Ppg,Qpn) is the GLS estimator. PROOF: X'P ] X'P " X1(PX,QX) [[ X'Q ](P+bQ) (PX,QX)] [X'Q = X'PX + %X'QX = X'(P+bQ)'1X. I 32 Another way to say this is that BSLS using all the instruments is the same as 3SLS using (qu,Qng). Since both BSLS and GLS need a consistent estimator of E and GLS is computationally simpler, one might think 3SLS is not so useful in practice. But in fact Theorem 3.3 is quite useful. Recall that 3SLS is a GMM estimator using a restrictive weighting matrix. Under Assumption 3.2, 3SLS is asymptotically equivalent to GMM using instruments (Ppg,ng) which is robust in the presence of the conditional heteroskedasticity and/or the conditional serial correlation (White, 1980, 1982; Hansen, 1982). If Assumption 2.3(a) is violated, for example, if there exists conditional heteroskedasticity or serial correlation, the inference based upon the GLS estimator is not valid. And while the robust variance estimator of the GLS estimator can be reported as A A 4 4 N 4A A -1 -1 -1 Var(fiGLs)=(X'n X) (.21 xi'z uiuiz“. xi)(X'n X) 1- (Wooldridge, 1992), this is not even necessarily smaller than that of the OLS estimator if A2.3 is violated. On the other hand, the GMM estimator using the instruments (qu ,qu) is the most efficient (among the estimators based on these instruments), and Theorem 3.3 tells us it is no less efficient than GLS, and it is more efficient than GLS if Assumption 3.2 fails. Further, the GMM estimator using the instruments (Ppg,qu) is no less efficient than the OLS 33 estimator whether Assumption 2.3(a) holds or not, since the instruments (ng,Qgg) are equivalent to the instruments (xi,Ptxi) , and the additional instruments PIX: are not redundant upon the instruments xg. This GMM estimator has a lot fewer instruments than the GMM estimator based on all orthogonality conditions. The representation 2: = PT-i-bQT has many other uses as well. For example, it leads to a straightforward proof that the GLS estimator can be written as a convex combination of the between and within estimators. Let 3b and 3" be the between and the within estimators, then Ems = [x' (P+%Q)X]'1X' (P+%Q)Y = [x' (P+%Q)X]'1X'PY+%[X' (P+%Q)X]'1X'QY (x'I>x+f‘5x'QX)"x'Pfo,D + (X'PX+%X'QX)'%X'QX£3H. Theorems 3.2 and 3.3 suggest that the minimal set of instruments depends on the structure of error covariance. It appears that the minimal set of instruments depends on the block diagonality of the Optimal weighting matrix E(w;2w3). In the model with a diagonal error covariance, the optimal weighting matrix is a block diagonal with the t- th block O§E(x‘i"x“?) 
, for which only xit is non-redundant since (1;2E(x‘i"x‘i’)‘1 meets the regressor xit only. Recall that the instruments 119x? are equivalent to (e,ox§’,Lex$) . In the random effect model, the Optimal weighting matrix becomes block diagonal between the two 34 blocks eT'eTO E(x“?'x‘i’) and bL'LoE(x‘i"x‘i’) . In the sample, the first block corresponds to the regression 4 [X'P(X°®IT) (INOPT)X] X'P(XO®IT) (IN®PT)Y. Thus, replacing PX for X°®eT produces the same result. For the same reason, nothing differs whether we use QX or x%u; for the regression in the space spanned by L. It would be worth noting that if 2 is diagonal the optimal weighting matrix becomes block digonal even when the instruments are weakly exogenous to the errors, but when 2=PT+bQT the optimal weighting matrix is not block diagonal upon the weakly exogenous instruments. This would be the reason why Ahn and Schmidt (1992) get the result that all of the instruments are not redundant in the dynamic panel data model with the random effect covariance structure. In dynamic model, the instruments corresponding to the lagged dependent variables are weakly exogenous. Details are in Chapter 3. We end section 3 with another model where the optimal weighting matrix is block diagonal. 3.4. A Generalization of the Random Effects Assumption Note that E = PT+bQT is sufficient but not necessary for the block diagonality of the optimal weighting matrix upon the instruments (eTox‘i’,Lox‘i’) . E = PT+bQT => PTEQT = 0, 35 but not vice versa (Lemma A1.2 in Appendix 1). Even though it seems unlikely in applications for )3 to satisfy PTEQT = 0 but not be of the form PT+bQT, theoretically it is worth looking into this case in greater details. The properties of E which satisfies PT'L‘QT = 0 are collected in Appendix 1. Because the nonredundant set of instruments depends in different ways on time-constant and time-varying regressors, we now explicitly separate the two. Write Yit = xitfi + Zi‘y + uit' t = 1:"‘1T, (3.3) where xit is 1xk and zi is 1xg; note that zi'can include a constant. Note that uit need not be separated into a time constant and time varying errors for stating the results of this section. For consistent estimation of B and 1 by, say, GLS, in addition to (3.1) we would now need the condition E(zgmn) = 0. Interestingly, if 2 satisfies certain conditions, the 3SLS estimator with a reduced set of instruments is the GLS estimator. The condition on 2 is formally stated as ASSUMPTION 3.3: PTEIQT = 0. As we mentioned earlier, if z is of the random effects form then it satisfies Assumption 3.3, but the converse is not true. 36 THEOREM 3.4: In model (3.1) under Assumption 3.3, 3SLS using the instruments (arm-(“etozwmxn is GLS. PROOF: Appendix 2. I Theorem 3.4 shows that the (T-1)(2k+g) instruments (mxiflozweroxi) are redundant when PTEQT = 0. Redundancy of eto'xi is rather obvious. The regression is separated into the two orthogonal spaces and the error covariance is idempotent in the space spanned by es. On the other hand, the intuition behind why the instruments L®(xi,zi) are redundant is not that obvious. Note that ii, the time constant component of time-varying instruments x3, behaves just like the time-constant instruments zi. In the previous subsection, if the error covariance is of the random effects structure then the GLS estimator is a convex combination of the between and the within estimators. 
Similarly, when the error covariance satisfies Assumption 3.3, the GLS estimator is a convex combination of the between estimator and the GLS estimator on the differenced data. To show this, let A = x1 (INOE'1)X, and note that the GLS estimator on the differenced data is Em = [X'(IN®L(L'2L)’1L'}X]'1X'{IN®L(L'2L)'1L'}Y, and 13,.»3'1PT = éP, (Lemma A1.5 in Appendix 1) . Then, am A'1x' (1,392")! = A'1[X'(IN®PTE'1PT)Y + x' (INGQTZ'1QT)Y] = A’1‘;X'PY + A'1X'(I"®L(L'2L)'1L')Y A"%(X'PX)fib + A'1[X'(INOL(L'2L)'1L')X]§GLS. 37 4. MODEL NHERE THE REGRESSORS ARE CORRELATED WITH THE TIME CONSTANT ERROR COMPONENTS In this section we consider two models where the time-constant unobserved effect may be correlated with some or all of the regressors. In the first model all regressors are time-varying and possibly correlated with the unobserved effect. If in addition the idiosyncratic errors are assumed to be serially uncorrelated with time-constant variance, this effectively corresponds to the traditional fixed effects model. When the variance-covariance matrix of the idiosyncratic disturbances is unrestricted we get the unobserved effects analog of Kiefer's (1980) fixed effect model. In the general case we derive the equivalence of several estimators that are suggested by the structure of the model. In sections 4.2 and 4.3 we study the HT model, where some regressors are assumed to be orthogonal to the unobserved effect. The original HT model assumed i.i.d. idiosyncratic errors. We cover this case in section 4.2 and in section 4.3 derive new redundancy results when the variance-covariance matrix of the idiosyncratic errors is unrestricted. 38 4.1. A "Fixed Erfects" Tyne Model The model can be written as yit = xitfi + 4’1 + an = xitfl + uit, t = 1,---,T. (4.1) Now the orthogonality condition underlying the analysis is E(xgug) = 0, (4.2) which is a strict exogeneity condition but allows xit and Oi to be arbitrarily correlated. This arbitrary relationship between xi and 4’1 gives (4.1) a fixed effects flavor. Under (4.2), only coefficients on time-varying regressors are identified; thus, for this subsection, x contains only M time-varying regressors. Under (4.2) the valid instruments for estimating 6 are given by _ 0 where recall that x‘i’ a (xi1,- - -,x”) is a 1ka row vector and L is the Tx(T-l) differencing matrix (see section 3.3). Thus, w; is Tx(T-1)k. Not surprisingly, a reduced set of instruments is available under standard assumptions. Under Assumption 3.2, that is E = OEIT + aie,e,', the within (or "fixed effects") estimator is known to be efficient (provided Assumption 2.3(a) holds with wg== Loxfi). Thus, it seems natural that other efficient estimators would 39 be the same as fixed effects. THEOREM 4.1: In model (4.1) under Assumption 3.2, the BSLS estimators using the instruments Lox? and thi and the GLS estimators on the demeaned and differenced data are the within estimator. PROOF: The two BSLS estimators using the instrumental variables Lox? and eri are the within estimators since x' (X°®L) [X°'X°®L(PT+bQT)]'1(X°®L) 'x = gx'ox. And the GLS estimators on the demeaned and on the differenced data are the within estimator because xi'QTQIQTXi = xi'eri = xi'L(L'L)-1L'in where Q; denotes the generalized inverse of QT. II If we allow E(eie;) to have an unrestricted form, then 2 = E(uiui') is also unrestricted and this effectively gives the setup of Kiefer (1980). We can apply Theorem 2.4 to show that some instruments used in 3SLS are redundant. 
THEOREM 4.2: In model (4.1), the two 3SLS estimators using the instruments (LoiEwLoxi) and Lexi are numerically identical. Thus, Lexi are redundant. PROOF: From Theorem 2.4, it is sufficient to show that (XOL) 1): = (XOL) ' (Inez) (XOL) [(XOL) ' (Inez) (XOL) ]'1(XoL) 0:. But, the RHS is (X'OL'E)[P(x)®L(L'£L)'1L']X = (x'eL'z) (IN®L(L'2L)’1L')(P(g)®Qr)X = (XOL) 'x. The last 40 equality holds from Lemma A2.1 in Appendix 2. I Theorem 4.2 shows that among the T(T-1)k instruments Lox$, the (T-1)k instruments Lon-ci are redundant. As before, this result can be used to construct a GMM estimator that is no less efficient than 3SLS when Assumptions 2.1(a), 2.2(a), and 2.3(a) hold with w3== Lox?; if, in particular, Assumption 2.3(a) should fail, this GMM estimator is more efficient than 3SLS, and it adds no more orthogonality conditions. Other estimators under these assumptions also suggest themselves. One can apply GLS on the first differenced or demeaned equations. Kiefer (1980) proposed GLS using the demeaned data using (QIEQTY', the generalized inverse of error covariance on the demeaned data. It seems clear that no information is lost by deleting any one equation in the demeaned data, since any one equation is the negative sum of the rest of the equations. Also demeaning and differencing preserve the same information. There are several estimators that are numerically identical. Let 335133 381.8 estimator using the instruments L®Xi in the original data, Kiefer's estimator, b K 1: ‘0 GLS estimator in the demeaned equation after deleting b o 3 any one equation, GLS estimator in the differenced equation, El 0 1! 33s”); 3SLS estimator in the differenced equation using all 41 of the instruments amv’ GIV estimator in the differenced equation using all of the instruments. THEOREM 4'3: fissLs = 31:1: = Bun = fior = fi3SLS = ficw' PROOF: Appendix 2. I 4.2. nausnan and Taylor Model J The HT model is the model (3.3) where xi== (xfi,xfi) and zi = (z1i,22i). Thus, yi = XML + XZiBZ + (eTozHM1 + (eTozmM2 + 4’191 + 6i. (4.3) The dimensions of xm, x2“, z“. and z2i are 1xk“ 1xk2, 1xq1 and 1xg2, where k=k1+k2 and g=g1+gz. fi=(fi1',fiz') ' and 12(1f,15)'. Assumptions 2.1 - 2.3 and Assumption 3.2 that the error covariance is that of the random effect model are assumed in the HT model. The distinctive feature of the HT model lies in the assumptions E(x”®ug ==0, (4.4) E(xa®ei)== 0, (4.5) E(zfioug ==0, (4.6) E(zfioei)== 0, (4.7) E(x2i't¢i) is the same for t = 1,~--,T. (4.8) The conditions (4.4)-(4.8) determine the instruments 42 available in the model. (4.4) implies the Ti’k1 instruments Iroxgi. (4.5) implies the T(T-1)k2 instruments Loxgi. (4.6) implies the Tg1 instruments ITozfi, and (4.7) implies the (T-1)g2 instruments LozZi. (4.8) adds the (T--1)k2 instruments eTOSEZi. The condition (4.8) and the additional instruments 9195221 were proposed by BMS. Together, we have [T2k1+(T"’--1)k2 +Tg1+(T-1)g2] instruments wi = (Ipxfi’wLoxgi, Itozfi,Loz2i,eTo§2i) , which are equally represented by [Lo(x‘1’i, xgi'z1i'ZZi)'eT®(xc1,i' S:‘21'219 ] ° For the model to be identified, the number of instruments in the space spanned by eT should not be smaller than the number of the time constant regressors, that is, Tk1-1-(T-1)k.‘,+g1 2 g1-1-g2 should hold. Under the random effects covariance structure in Assumption 3.2, EMS and Ahn and Schmidt (1992) showed that the minimal instruments needed for the most efficient 3SLS estimator are [thi,eTo(x‘1’i,x2i,z1i)]. 
All the instruments in the space spanned by eT are not redundant, but only the k instruments thi are not redundant among the T(T-1)k+(T-1)g instruments Lo(x‘i’,zi) in the space spanned by L. Since 2 = P,-1-bQT and all the instruments belong exclusively either to the space spanned by eT or to the space spanned by L, the regression is separated into the two orthogonal spaces. It is entirely valid to find the minimal set of instruments in each space separately. In the space spanned by L, the error covariance QIEQT = bQT is a scalar idempotent, thus it is not 43 surprising that the instruments (2.xi are sufficient to reach the GLS efficiency. In the space spanned by e,, the error covariance is idempotent but eTo(x1i,22i) are correlated with 45,, thus it fits our intuition that all the instruments in the space spanned by eT are not redundant. THEOREM 4.4: In model (4.3) under Assumption 3.2, the 3SLS estimator using the instruments [QTxi,eTo(x§’i,§2i,z1i)] is a convex combination of the within estimator and the ZSLS estimator using the instruments eT®(x‘1’i,x2i,z1i) . PROOF: Let di = (x‘1’i,x2i,z1i) . In the sample the instruments are (QX,DoeT) . Let R = (X,ZoeT) , the regressors. Note that QR = QX. Let bX'QX o '1 X'QX A a R' x,Do = (Q e7”: 0 D'DoeT'eT ] [ (DoeT) 'R ] .. 1 A _ -1 .1. _ -1 A _1 A 3351.5 ' A (bX'QY+R'P(D®er)Y) - bA X'QXB" + A R'P(DoeT)R323Ls° I THEOREM 4.5: In model (4.4) under Assumption 3.2, the BSLS estimator using the instruments (QTxi,etodi) is the GIV estimator using the same set of instruments. PROOF: By Theorem 2.1, it is sufficient to show that (p,+f;Q,)(Q,xi,eTodi) = (Jb*QTxi,etodi) = (QTxi,eT®di)B, where B is the Tk1> PIEQT = 0. A counter-example which satisfies P1291 = 0 but not of the form 2 = aPT + bQT is sufficient for the proof. Suppose 3 1 0 2 = [ 1 4 -1 ]. O -1 5 2 is symmetric positive definite and the sum of each row is 4, but 2 is not of the form aPT + bQT, because not all of its diagonal elements are equal and not all of its off diagonal elements are equal. I ' ”"1 49 LEMMA A1.3: E = aPT + bQT e PTZQT = 0 when T = 2. PROOF: Equality of the off diagonal terms is guaranteed from the symmetry of E and the equality of diagonal terms is enforced from Lemma A1.1. Thus 2 is of the form P1 + bQT. I LEMMA A1.4: If PIEQT = 0 and either all of the off diagonals in 2 are equal or all the diagonal terms in 2 are the same, then 2: = aPT + bQT. PROOF: If the off diagonal terms are equal, all of the diagonal terms should be the same each other for PTEQT = 0 to hold from Lemma A1.1. Thus the two statements 2 = aPT-+ bQT and PTZQT = 0 are equivalent when the off diagonal terms of E are the same. To prove the statement that the equal diagonal elements of E and PTEQT = 0 implies )3 = aPI + err mathematical induction is used. When T = 3, imposing the equality of the diagonal terms and from the symmetry of 2, 2 is expressed as a on GB 013 023 a From Lemma A1.1, a12+a13 = O12+023 = O13+Oz3, hence, 012 = 013 = 08, Suppose the Off diagonals are the same. If the diagonal terms are the same for T = t > 3, then for T = t+1, the (t+1)—th off diagonal terms are forced to be the same (Lemma A1.1). I 50 o _ _ -1 .- LEMMA A1.5. If PIEQ1 — 0, then P.,ZIPT — aPI, PTZ PT - 1P a 1' and e1(e,'2e,)"e1' = P.2'1PT, where a is the sum of each column of 2.". PROOF: PIZJPT = e,(eT'eT)"eT'Ee,(eT'eT)"eT'= gPT, where s = et'zzeT is the sum of all elements of 2, thus % = a. Hence, the first result follows. (P,>:"P,) (P,2P,) = PT. Thus, (P.2T‘P,)aPT = P1 and PTE'1PI = éPT. 
The third result follows trivially. I LEMMA A1.6: If PTZQT = 0, then QT2'1QT = L(L'2L)"L'. PROOF: It is sufficient to show that QTZ'1QT2 = L(L'2L)'1L'2. 4 _ 4 _ - _ - Note that Q: Q72 — QTZ‘ ‘2."QI — 0., Since PTZIQT - 0 =: Q12 — QTZIQT = ZQT. Thus, what we need to show is that L(L'2L)'1L'2 = QT. But, L(L'EL)"L'2(P,+QT) = L(L'ZL)'1L'EQT, since P,EQt = o .. L'Ee, = o and L(L'EL)"L'EQT = L(L'EL)"L'2L(L'L)"L' = Q I T. LEMMA 111.7: PTEQT = o e PTE"QT = o. PROOF: PTZQT = o e E = PTZPT + QTZQI e PTE = P.2PT and QTZ = QTBQT. Post-multiplying by E", P1r = PTEPTZ'1 = PTEPTE'1PT = 2PT'2T‘PT and pre-multiplying by 2", we have 2:"PT = PT2'1PT, which is the condition we are looking for. Exactly the same procedure shows that '2'i‘1QT = Q.2"Q.. Given sufficiency, necessity is obvious. I 51 In fact, 2" = (PTZIPI + QIEQI)" = (P.2"P, + QTanT) for any integer n, which implies 2" = (PJI‘PT + QTE'1QT) if PTZIQT = 0. 52 APPENDIX 2 LEMMA A2.1: (P(x)oQT)X = (P(R)OQT)QX = ox. PROOF: Note that when k = 1, X = vec(X°') and X = X°QT. vec(Q,X°') = QX, which is valid when k > 1 by applying this argument to each of the regressors separately. In fact, X are the first (T-1)k columns of X°QT, but P(X)X°Qt = X°Q,, simply because the projection of QT after deleting any one column of QT is still QT. I PROOF of Theorem 2.4: 25 = 3 if x'w1(w1'nw1)"w1'x = X'W(W'flW)'1W'X. But, '1 W2'flW2 WZ'flW1] [WZ'X] X'W(W'nW)'1W'X = (x'w2 X'W1)[ w1'nw2 w; aw1 w; x 4 _ 4 4 _ 4 4 x'wzn wz'x x'wzo wz'nw1(w1'nw1) w1'x x'w1(w1'nw1) w1'nwzo wz'x + x'w1(w1'nw1)“w,'x + x'w1(w1'nw1)"w;{211213411150111 (w1'nw1)"w1'x, from the partitioned inverse lemma. Thus the condition is x'wzn"w,_'x - x'WZD'1w2'nw,(w1'nw,)"w,'x - x'w,(w1'nw,)"w1'nwzn"w2'x + x'w1(w1'nw1)‘1w1'nwzo'1w2'nw1(w1'nw1)“w1'x = 0, which is A'D‘1A, where A = wz'x — wgnw1(w1'nw,)“w,'x and D = 11219112 - W2'nW1(W1'flW1)"W1'nW2, a nonsingular positive definite. Thus A'D'1A = 0, iff A = o. I 53 PROOF of Theorem 3.4: Let H=(X,Z). It is sufficient to show that X'XOL'ZL ]‘[ (XOL) ' (XoL,HoeT)[ J (X,Z®e.) H ' Hoe; ‘2.."eI (HoeT) = (1“82'1)(X,ZoeT). The LHS is [P(R)OL(L'2L)'1L'](X,Z®eT) + [P(H)oe,(e;ze,)"e;] (x,2oe,) = [IueL(L'EL)'1L']X + [Inoe,(e;2eT)"e;](x,ZoeT) = (INoQTE‘1QT)x + (I'OPTZ'1PT)(X,Z®eT) = (Iuoz'1)(X,ZoeT). The first equality follows from Lemma A2.1 and the second equality follows from Lemmas A1.5 and A1.6. I PROOF of Theorem 4.3: Ems = Ems = 76'6“, in the differenced equation from Theorem 2.1 and Theorem 3.1. EELS = 333“ since X'(XOL)(X'XOL'ZL)'1(X®L)'X = x' [P(g)@L(L'ZL)'1L']X = x'[INoL(L'EL)"L'][P(x)oQT]x = X'(INOL) (INOL'ZL)'1(IN®L) 'x. The last equality follows from Lemma A2.1. To show fifi.= 30,, it is sufficient to show that L(L'2L)"L' = (9,29,)‘2 But: C212£2114(I:"231:)"L'QTEQT = 0,20,, L(L'):L)"L'QTEQ.L(L'EL)"L'= L(L'EL)"L' and QTZQTL(L'2L)'1L'= L(L'2L)'1L'QT2QT = or Thus L(L'EL)'1L' is the unique generalized inverse of QTF.QT (Theil, 1971, pp 269). To prove 25,, = Bo". Let ii 3'1 0 , ] and (QTXQTV = [ ] xi QTxi=[ o o where ii denotes the first T-l rows and x? denotes the last row of QTi. Hence, we are looking at the case when the 54 last row of the demeaned data is deleted. Then, 1 * ”1 £1 81 0 {‘i "I ’1" xiQT(QT2QT) QTXi = [xi 'xi 1 ][ c] = X15 xi' 0 0 xi Deleting any one other row instead of the last row of the demeaned data makes no difference. I PROOF of Theorem 4.6: Let R = (x,z6e,), H = (322,22) and G = (xg,22,z,). Note (HoL)'R = (HoL)'X. From Theorem 2.4 it is sufficient to show that (HoL)'(INo2)(GoIT)[(GoIT)'(Iuoz)(co1.)]"(co1,) 'R = (HoL) 'R. 
The LHS is (HoL)'(P(G)oIT)'R = (HoI.)'(P(G)oL)'X = (HoL)'X (Lemma A2.1). I PROOF of Theorem 4.7: Let R = (X,ZoeT) and G = (X?,XZ,Z1). For the 3SLS estimator using the instruments (GOIT), (GOIT) [ (GOIT) ' (Inez) (GOIT) ]'1(G®IT) ' (X,Z®eT) = (P(G)ozq)(X,Zoefi. And, for the GIV estimator using the instruments (QX,Goefi XIQ '1 x1 (I oE")(Qx,Goe ) (I o2")(Qx,Goe ) Q (I o2")(x,zoe ) " I (Gee ) " ' (Gee ) " ' T T [ (I'82"QT)XD'1X ' (INOQT2‘1) (luoz"Q,)xn“x' (P(G)®QTZ'1eT(eT'E'1e1)'1e1'2'1) (P(G)82'1e1(e{2'1e1) '1eT'2'1QT) XD'1X' (Iqu,E") + P(G)OZ'1eT(eT'2'1eT)‘1eT'2'1 + (P(G)®2'1e1(e1'2'1e7)’1eT'2'1QT)XD'1X' (P(G)8Q,2'1et(e1'2'1et)'1e7'2'1) ] ° (xlzee‘f) I 55 where D = x' (INeQT2'1Q,)x - x'[P(G)eQ,2"e,(eT'z"eT)"e;2"Q,]x. Let A n [x' (InsQ12'1)-X'(P(G)®QTZ'1eT(eT'2'1e,)'1eT'2'1)](X,ZoeT) . A -— [x' (INeQ,2“Q,) -x' (P(G)oQTE'1eT(eT'2'1eI)'1eT'2'1Q7) 1 (x,Zee,) + [x'(IueQT2“P,)-x'(p(G)eQ,2"e,(e;z"e,)"e;2"PT)1(X,Zee,)] = D. Thus, the 1st and the 2nd terms add up to (IN®2'1QT)X and the 3rd and the 5th terms add up -(P(G)92"er(er'2’1eT)"eT'2'1Q,)X. Together, we have (1'82'1QT)X - (P(G)92"eT(eT'Z'1eT)"et'z‘ng + IT [P(G)oz"e1(e;z'1e,)"eT'E'H(X,Z®e,). For the regressor ZoeT, I" [P(G)92'1eT(eT'Z'1eT)'1eT'2'1](ZoeT) = (9(G)e2")(2ee,), and for x, [Iner‘or- P(G)®2'1eT(eT'Z'1eT)"e;2'1QT + p(G)e2"eT(e;2"eT)"egzfljx = (1.32"ng + (P(G)®2'1PT)X = (P(G)ez“)x. I CHAPTER 3 BBTIHATION USING PANEL DATA UNDER WEAK BXOGBNBITY 1. INTRODUCTION In this chapter we study linear panel data models where the regressors are only weakly exogenous. The primary concern is with the consistency of estimators when the errors are serially correlated, and with the consistency of the usual standard errors of 3SLS (appropriately defined, see Chapter 2) estimators when certain instruments are used. We also discuss how to construct some reduced lists of instrumental variables that would lead nearly efficient estimators. A leading example of the weakly exogenous model is the dynamic model with lagged dependent variables. Anderson and Hsiao (1981), Bargava and Sargan (1983), Holtz-Eakin, Newey and Rosen (1988), Arellano and Bover (1990), Arellano and Bond (1991), Ahn (1990), and Ahn and Schmidt have studied efficient estimation in dynamic models. The rational expectations model is an another important example. Using panel data to test the rational expectations hypothesis has lead to renewed interest in studying weak exogeneity in panel data models (Zeldes, 1989; Kean and Runkle, 1990; Runkle, 1991). Generally, there is a growing realization 56 57 both in time series and panel data contexts that many regressors in general models would be only weakly exogenous to the errors. However, only a few study exist that deal with the general weak exogeneity (Keane and Runkle (1992) and comments). An important feature of weak exogeneity is that different instruments are available for each period so that T GLS transformation, in general, will bring the inconsistency ‘ of the resulting estimators (Schmidt, 1990). An important exceptional case is when E is diagonal (Chapter 2, Theorem 2.3). When 2 is diagonal, the redundancy result of Theorem 3.2 of Chapter 2 also applies to weakly exogenous case. Consequently, a general result that GMM using all the moment conditions is the best specially has a force in weakly exogeneous case with non-diagonal covariance matrix (Ahn and Schmidt, 1992; comments on Keane and Runkle, 1992). 
Schmidt, Ahn, and Wyhowski (1992) (SAW henceforth) provide lists of instrumental variables for each of several weakly exogenous models. The structure of the weakly exogenous instruments provided by SAW is quite similar to the structure of the instruments for the lagged dependent variables in dynamic models. It is now well known that no moment conditions between the lagged dependent variables and the disturbances exist unless the covariance matrix is somehow restricted. We ask a basic question: are the a priori population moment conditions currently suggested in weakly exogenous models valid when the idiosyncratic errors are serially correlated? Generally, it is likely that there is a link between covariance restrictions and orthogonality conditions, as there is in dynamic models. We investigate this in the next section.

If the variance of the disturbances conditional on the instrumental variables equals the unconditional variance, the usual 3SLS standard errors are in general consistent. Wooldridge (1993) showed that the usual standard errors from nonlinear 3SLS in hedonic pricing models are not consistent even when the conditional variances of the errors are constant. A similar result was obtained by Ahn (1990) in dynamic panel data models. SAW suggested that the equi-correlation assumption of Breusch, Mizon, and Schmidt (1989) (BMS hereafter) can hold in weakly exogenous models. We show, in section 3, that the usual 3SLS standard errors are not consistent if the instrumental variables implied by the BMS assumption are used, under any assumptions that are plausible for weakly exogenous models. We also link this to the result obtained by Ahn (1990) in dynamic models.

Keane and Runkle (1992) proposed 2SLS after forward filtering the equations in weakly exogenous panel data models with serially correlated errors, adapting a suggestion by Hayashi and Sims (1983) made in a pure time series context. But it was shown by SAW that the Keane and Runkle estimator is numerically identical to 3SLS when all of the instruments are used, so that forward filtering is an unnecessary complication. However, it remains an interesting question how forward filtering can reduce the list of instruments, since in many instances using all of the instruments is not even feasible. Keane and Runkle provided evidence through an example that forward filtering can bring non-trivial efficiency gains, though this does not generalize to other cases. The arguments of Chapter 2 comparing 3SLS and GIV apply here. The main idea of Keane and Runkle is that forward filtering whitens the errors, so that applying the instruments (without transformation) to the forward-filtered equations should yield better estimators. However, 3SLS itself can be written as 2SLS on the forward-filtered equations using a suitably transformed instrument list rather than W, so what Keane and Runkle suggest amounts to using W in place of the transformed instruments on the forward-filtered equations. To compare the two, we need to compare P(W) with the projection onto the transformed instrument list, where P(.) denotes the projection onto the columns of (.). This comparison is, in general, not entirely clear; see Chapter 2 for more details. In the presence of heteroskedasticity, both 3SLS and the Keane and Runkle estimator are less efficient than GMM based on the same instruments, and the comparison between the two GMM estimators based on the two instrument lists is even more ambiguous.

However, in finite samples, GMM estimators based on a huge number of instrumental variables might not have desirable properties.
For example, in finite samples the standard errors can grow as instruments are added, and this is more likely to happen as the number of instruments approaches the number of observations. Thus, it is of practical use to find reduced lists of instruments that generate estimators with desirable properties. In section 4, we suggest some weighted sums of the given long lists of instrumental variables which do not generate fully efficient estimators but should lead to nearly efficient ones. We should note, however, that the usefulness of applying these reduced lists of instrumental variables remains to be seen. Section 5 concludes.

2. SERIAL CORRELATION AND CONSISTENCY OF ESTIMATORS

This section shows that the moment conditions in weakly exogenous models are in general restricted by the structure of the covariance matrix. Before doing this, we review previous results on the moment conditions for weakly exogenous regressors, in particular those provided by SAW.

2.1. Moment Conditions under Weak Exogeneity

The model we consider is

y_it = x_it β + u_it, t = 1,...,T.   (2.1)

Let y_i = (y_i1,...,y_iT)', x_i = (x_i1',...,x_iT')' and u_i = (u_i1,...,u_iT)', of dimensions T x 1, T x k and T x 1, respectively, so that (2.1) can equally be expressed as y_i = x_i β + u_i. {(y_i, x_i): i = 1,...,N} is an i.i.d. sequence and the fourth moments of (y_it, x_it) exist. Let Σ = E(u_i u_i'). If there are no unobserved individual effects that are correlated with the regressors, weak exogeneity is defined by the assumption

ASSUMPTION 2.1: E(x_it' u_is) = 0, 1 ≤ t ≤ s ≤ T.

This implies the ½T(T+1)k instruments diag(x_i1^o, x_i2^o,...,x_iT^o), where x_it^o = (x_i1,...,x_it), t = 1,...,T. Note that we have placed no restrictions on Σ. Also, Assumption 2.1 allows for unobserved effects that are uncorrelated with the regressors.

When we introduce unobserved fixed effects that are correlated with the regressors, the errors become composites of a time-constant and a time-idiosyncratic component, u_it = φ_i + ε_it, t = 1,...,T. The assumption that corresponds to weak exogeneity in the presence of explicit unobserved effects is

ASSUMPTION 2.2: E(x_it' ε_is) = 0, 1 ≤ t ≤ s ≤ T.

Under Assumption 2.2, φ_i is allowed to be arbitrarily correlated with the regressors. Under Assumption 2.2 the coefficients on time-constant variables are not identified, so we simply assume that all the regressors are time-varying. It is usual, under Assumption 2.2, to estimate the parameters after differencing. In the differenced equations the errors are Δu_it ≡ u_it - u_i,t+1 = ε_it - ε_i,t+1, t = 1,...,T-1, and the orthogonality conditions are

E(x_it^o' Δu_is) = 0, t ≤ s = 1,...,T-1.   (2.2)

Thus, we have the instrumental variables diag(x_i1^o,...,x_i,T-1^o), with ½T(T-1)k columns, in the differenced equations. Let w_i = diag(x_i1^o, x_i2^o,...,x_i,T-1^o). Applying w_i to the differenced equations amounts to applying the instrumental variables Lw_i to the original equations before differencing (SAW), where L is the T x (T-1) differencing matrix (for the definition of L, see SAW or Chapter 2). While applying the instruments Lw_i to the original equations and applying w_i to the differenced equations yield numerically identical estimators, using Lw_i in the original equations has important advantages, both for the identification of coefficients on time-constant variables if they exist and for the efficiency of all of the estimators, whenever we have some instrumental variables that are not in the space spanned by L. The construction of these instrument sets is sketched below.
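A minimal sketch (my own code; the names and dimensions are illustrative) of the two equivalent instrument representations just described: the block-diagonal w_i applied to the differenced equations and its levels counterpart Lw_i, whose moment functions coincide.

```python
import numpy as np

def sequential_instruments(x_i):
    """Build w_i = diag(x_i1^o, ..., x_i,T-1^o) for the T-1 differenced
    equations, where x_it^o = (x_i1, ..., x_it) is 1 x t*k; x_i is T x k."""
    T, k = x_i.shape
    w = np.zeros((T - 1, k * T * (T - 1) // 2))
    col = 0
    for t in range(1, T):                       # block for differenced equation t
        w[t - 1, col:col + t * k] = x_i[:t].reshape(-1)
        col += t * k
    return w

rng = np.random.default_rng(0)
T, k = 4, 2
x_i = rng.normal(size=(T, k))
u_i = rng.normal(size=(T, 1))

# T x (T-1) differencing matrix L: (L'u_i)_t = u_it - u_i,t+1
L = np.zeros((T, T - 1))
for t in range(T - 1):
    L[t, t], L[t + 1, t] = 1.0, -1.0

w_i = sequential_instruments(x_i)               # (T-1) x T(T-1)k/2
Lw_i = L @ w_i                                  # the same instruments in the levels equations

# identical moment functions: (Lw_i)'u_i = w_i'(L'u_i), i.e. w_i applied to differences
print(w_i.shape, np.allclose(Lw_i.T @ u_i, w_i.T @ (L.T @ u_i)))
```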
2.2. Serial Correlation and Moment Conditions

It is usually assumed that {ε_it: t = 1,...,T} is an uncorrelated sequence with constant variance that is also uncorrelated with φ_i. In this case Σ takes the random effects covariance structure

Σ = σ_ε² I_T + σ_φ² e_T e_T',   (2.3)

where I_T is the T x T identity matrix and e_T is the T x 1 vector of ones. But there is no reason to think that (2.3) holds universally. In GMM, the restriction (2.3) makes no difference unless we utilize the moment conditions implied by (2.3), namely the moment conditions from covariance restrictions. Σ is estimable in the models we deal with, and imposing restrictions like (2.3) without testing can be too limiting.

In dynamic models, the set of instrumental variables corresponding to the lagged dependent variables relies heavily on whether the idiosyncratic errors are serially correlated (Ahn and Schmidt, 1992; Arellano and Bond, 1991). Nevertheless, serial correlation of the idiosyncratic errors in weakly exogenous models has often been presumed to have nothing to do with the set of instrumental variables (Runkle (1991) is an exception; he expressed the concern that the usual instrumental variables might not be valid if the time-varying errors are serially correlated). We consider whether the instruments Lw_i (or Assumption 2.2, which generally rests on a priori reasoning) remain valid when the idiosyncratic errors are serially correlated in the model with fixed effects. Although we study the case in which the regressors are correlated with the unobserved individual effects, the results apply equally to the model with no fixed effects. Although it is unnecessary to impose any parametric form on the serial correlation, for simplicity we do so, and consider two examples that are simple and of particular interest in practice: AR(1) and MA(1) errors.

For the AR(1) case, suppose

ε_it = ψ ε_i,t-1 + ζ_it for some non-zero constant ψ,   (2.4a)
E(ζ_it x_it^o') = 0,   (2.4b)
E(ζ_it φ_i) = E(ζ_it ε_i,t-1^o) = 0,   (2.4c)

for t = 1,...,T. Under weak exogeneity the regressors are correlated with the lagged errors, so that x_it is correlated with ε_i,t-j for j > 0; thus

E(x_it' ε_i,t-j) ≠ 0, j > 0.   (2.5)

We now examine whether the moment conditions of Assumption 2.2 are valid given (2.4) and (2.5). But

E(x_it' ε_it) = ψ E(x_it' ε_i,t-1) + E(x_it' ζ_it) = ψ E(x_it' ε_i,t-1) ≠ 0,

and the same argument shows that E(x_it' ε_i,t+j) ≠ 0, j > 0. Thus, given (2.4), Assumption 2.2 is at odds with condition (2.5), just as in dynamic models. Condition (2.4), in fact, implies a set of orthogonality conditions that are not linear in the parameters. Ignoring (2.4c), the covariance restrictions, (2.4b) implies

E[(Δu_it - ψ Δu_i,t-1) x_it^o'] = 0, t = 2,...,T-1,

since (u_it - u_i,t+1) - ψ(u_i,t-1 - u_it) = ζ_it - ζ_i,t+1. These are [½T(T-1) - 1]k moment conditions, so that, compared with Lw_i, the number of moment conditions lost to AR(1) serial correlation is k.

We next consider the case in which the idiosyncratic errors follow an MA(1) process, so that

ε_it = η_it - ρ η_i,t-1,   (2.6a)
E(η_it x_it^o') = 0,   (2.6b)
E(η_it η_i,t-1^o) = E(η_it φ_i) = 0,   (2.6c)

for t = 1,...,T. Then

E(x_it' Δu_it) = E[x_it'(η_it - ρη_i,t-1 - η_i,t+1 + ρη_it)] = -ρ E(x_it' η_i,t-1) ≠ 0.

Thus, if we use the instruments Lw_i, the resulting estimator will be inconsistent in this case. However, from (2.6b) we have the ½(T-1)(T-2)k moment conditions

E(x_it^o' Δu_i,t+j) = 0, j ≥ 1.   (2.7)

These are linear in β, and we have the instruments

[    0           0        ...       0         ]
[  x_i1^o        0        ...       0         ]
[ -x_i1^o      x_i2^o     ...       0         ]
[    0        -x_i2^o     ...       0         ]
[    .           .         .    x_i,T-2^o     ]
[    0           0        ...  -x_i,T-2^o     ]

(each block x_it^o enters the levels equations t+1 and t+2 with opposite signs, so applying this matrix to the levels equations reproduces the differenced moments in (2.7)). Comparing this with Lw_i, the number of instruments removed by the MA(1) serial correlation is (T-1)k. Both failures, and the corresponding valid moments, are easy to see in a small simulation, sketched below.
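The following Monte Carlo check uses a data generating process of my own choosing (ψ, ρ and the feedback coefficient lam are illustrative values, not taken from the text): under AR(1) or MA(1) idiosyncratic errors with weakly exogenous regressors, the usual differenced moment E(x_it' Δu_it) is clearly nonzero, while the adjusted moments derived above are approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N, psi, rho, lam = 200_000, 0.5, 0.5, 1.0

# AR(1): eps_it = psi*eps_i,t-1 + zeta_it, with x_it = lam*zeta_i,t-1 + noise
zeta = rng.normal(size=(N, 5))
eps = np.zeros((N, 5))
for t in range(1, 5):
    eps[:, t] = psi * eps[:, t - 1] + zeta[:, t]
x = lam * zeta[:, :-1] + rng.normal(size=(N, 4))      # x_i1, ..., x_i4
du = eps[:, 1:4] - eps[:, 2:5]                        # du_it = eps_it - eps_i,t+1 (phi_i would cancel)
print("AR(1):",
      np.mean(x[:, 1] * du[:, 1]),                        # E(x_i2 du_i2) != 0
      np.mean(x[:, 1] * (du[:, 1] - psi * du[:, 0])))     # E[x_i2 (du_i2 - psi*du_i1)] = 0

# MA(1): eps_it = eta_it - rho*eta_i,t-1, with x_it = lam*eta_i,t-1 + noise
eta = rng.normal(size=(N, 6))
eps = eta[:, 1:] - rho * eta[:, :-1]                  # eps_i1, ..., eps_i5
x = lam * eta[:, :-2] + rng.normal(size=(N, 4))       # x_i1, ..., x_i4 depend on eta_i0, ..., eta_i3
du = eps[:, :4] - eps[:, 1:5]
print("MA(1):",
      np.mean(x[:, 1] * du[:, 1]),                    # E(x_i2 du_i2) = -rho*lam != 0
      np.mean(x[:, 1] * du[:, 2]))                    # E(x_i2 du_i3) = 0, as in (2.7)
```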
The instruments that becomes invalid is L[diag(xH,---,x"4)], which corresponds to the statement in (2.7) that E(xi'tniH) e 0. However, it is interesting to note that the condition E(xfinnwfi ==0 can be valid, giving an alternative explanation for the serial correlation. As appears in rational expectations literature, time lags until the shocks are observed by individuals can cause serial correlation. If this is the case, it is quite possible that the one period lagged errors are uncorrelated with the current regressors though the errors follow MA(1), thus the moment condition E(xinnwn ==0 can be plausible. Therefore, allowing for time lags until shock is observed, it would not be necessary to reduce the set of instruments. We showed that the set of instrumental variables in weakly exogenous model is closely connected to the structure 67 of 2 through a couple of examples. However, we can give a different interpretation for serial correlation of the idiosyncratic errors. Suppose that serial correlation is caused by some omitted variables that are uncorrelated with the regressors of all periods (like time constant-errors in the random effects model) and that the error components to which the regressors are weakly exogenous are serially uncorrelated. Then.1mg is valid and 2 is unrestricted. In this case, the errors should be of the form + e“ _ _ S ‘u. _ ¢i'+ 6n - ¢i'+ 6 n! u t = l,---,T, (2.11) where the regressors are strictly exogenous to the serially correlated errors 6% and weakly exogenous to the intertemporally uncorrelated errors ‘3' This distinguishes general weakly exogenous models from dynamic models. In dynamic models, there can not exist the error components to which the regressors are strictly exogenous. A familiar example is the unoberved individual effects ¢H° The correlation between ¢i and the lagged dependent variables are guaranteed in dynamic models, but it is not necessarily the case in general weakly exogenous models. SAW and Hayashi and Sims (1983) pointed out that eliminating serial correlations by forward filtering requires a similar situation. We quote SAW (p. 11), "Forward filtering requires that the serial correlations in the errors do not depend on the values current and lagged 68 values of the instruments." As was noted earlier, SAW showed that the Keane and Runkle estimator, the ZSLS estimator based on forwarded filtered equations, is numerically identical to BSLS if all the instruments Lwi are used. Thus, whenever the Keane and Runkle estimator is inconsistent, so is BSLS. The requirement for vindicating forward filtering noted by SAW l and Hayashi and Sims indeed is needed for currently utilized “ instrumental variables to be valid.‘ Finally, we note on the relationship between the moment conditions for lagged dependent variables and the structure of 2 in the dynamic model considered by Ahn and Schmidt and many others. For simplicity, let xit = Yn4r t = 1, --,T. We start with the assumption that E(eity‘i’m) = 0, t = 1,~--,T, (2.12) which implies that {e“:t=1,---,T} is serially uncorrelated. We do not impose the homoskedasticity restriction of {6"3t=1,- -,T} so that E(ea) s E(ei), t e s. Assumption (2.12) alone implies the set of instruments L g'that are usually used in dynamic models, but, as was noted by Ahn and Schmidt, we need an additional assumption E(eitcp‘.) are the same, t = 1,---,T, (2.13) in order to have the restricted covariance matrix where all of the off-diagonal elements are the same. 
The T-2 69 additional moment conditions suggested by Ahn and Schmidt, E(uitum1 - uimumz) = 0, t = 1, - -~,T-2, (2.14) along with the instruments 1mg, encompass all of the moment conditions implied by conditions (2.12) and (2.13). Thus, the Hausman test (1978) and the GMM test (Hansen, 1982), given the instruments LM}, of testing the validity of the moment conditions (2.14) essentially test whether the condition (2.13) holds. When (2.13) is violated, the covariance matrix still will be restricted as long as {e“:t=1,- -,T) is serially uncorrelated. The off-diagonal part of 2 has %T(T-1) possibly distinctive elements, but these are composed of the T+1 elements E(¢gen), t = 1,. -,T, and 0:, and so there should be the %T(T-3)-1 restrictions. But, these add no useful moment conditions given the instruments Lwi, since Lwi stands for the %(T-1) (T-2) moment conditions from covariance restrictions when condition (2.13) holds, and %(T-1)(T-2) > %T(T-3)-1. 3. ESTIMATION USING THE BMS ASSUMPTION Wooldridge (1993) showed that the usual standard errors based on N3SLS in hedonic pricing models are not consistent, and he derived a condition for the usual 3SLS standard errors to be valid. Ahn (1990) obtained a similar result in 70 dynamic panel data models when certain moment conditions from covariance restrictions are used. These results essentially show that some moment conditions necessarily cause heteroskedasticity in the models considered by Wooldridge (1993) and Ahn (1990). It was suggested by SAW that the equi-correlation assumption of BMS can hold in weakly exogenous models with unobserved effects, and then (T-1)k instruments are added. This section shows that the usual 3SLS standard errors are not consistent if these instruments are used. Also, we apply this result to the dynamic case and link it to the result obtained by Ahn (1990). We assume that the covariance matrix is of the random effects form to avoid the consistency arguments of the previous section. For the usual 3SLS standard errors to be consistent when the instruments Lwi are used, we need the assumption of no heteroskedasticity ASSUMPTION 3.1: E(wi'L'uiui'Lwi) = E(wi'L'ZLwi) . Wooldridge (1993, Example 5.2) showed that Assumption 3.1 is satisfied under the assumptions E(eitlxgt,e$t,1,¢i) = o (3.1a) and E(egtlxfim‘i’bwcpi) = 02 (3.1b) (l that are plausible and used quite often in the rational 71 expectations models. To see this, note that E(wi'L'uiui'Lwi) has elements E(x‘i’t'AuitAuisxgs) = E[x‘i’t'(eit-eit+1)(sis-eis+1)x“?s], t,s = 1,---,T-1, and so the result follows immediately by the law of iterated expectations. The assumption suggested by BMS is ASSUMPTION 3.2: E(x&¢w) is the same, t = 1,-~ ,T, which, given Assumption 2.2, is expressed as E(xtuit - anfi%rn) = O, t = 1, -o,T-1. (3.2) I It is convenient to see the orthogonality conditions implied by this through the moment matrix - I ... . , Xnun xnun E : I . (3.3) h ' . . . ' d xnu“ Xnun Under Assumptions 2.2 and 3.2, all of the elements in the upper triangular of the moment matrix are the same, while Assumption 2.2 only implies that the elements in each row of the upper triangular of (3.3) are the same. Thus, Assumption 3.2 adds the (T-1)k moment conditions that the upper triangular elements of (3.3) are the same across the rows. 
The moment conditions of (3.2) describes these and we have instrumental variables 72 'xn xn iT-1 —xiT _ A notable distinction between hi and Lwi is that hi is not in the space spanned by L and it is useful for identification of the parameters on the time-constant variables. For the usual BSLS standard errors to be consistent when the instruments hi are used, the conditions E(hi'uiui'hi) E(hi'Zhi) (3.4a) and E(wi'L'uiui'hi) E(wi'L'Zhi) (3.4b) should be satisfied. Now, we show that condition (3.4) can not hold under any plausible assumptions for weakly exogenous models with fixed effects. To this end, we will show that the equalities for the first kxk blocks of (3.2a) and (3.2b) do not hold. The first kxk block of E(hgzh1)== E[hi' (ofIT+aieTeT' )hi] is afE(xi'1xi1 + Xi'zxiz) + aiE[(xi1-xi2)'(xi1-xi2)] (3.5a) And for E(hi'uiui'hi) , we have 73 I 2 - I - I I 2 E(xnunx" xi1ui1ui2xi2 xizuizunxn + xiZuiZXiZ) ... 2 I _ I I 2 I “' E[e"xi1xi1 ei16i2(xi1xi2 + xizxn) + eizxizxizl 2 + E[¢i(xi1-xi2) ' (xi1-xi2)] (3-5b) ' .- I U ' + E[2¢i€i1xi1xi1 ¢i(5i1+‘i2) (xi1xi2+xi2xi1) + 2¢i€i2xi2xi23 For the equality between (3.5a) and (3.5b) to hold, three conditions should be met: (i) E[ei1ei2(xi'1xi2 + xi'zxi1)] in the first term of (3.5b) is zero, (ii) E(¢§Axi'1Axi1) == 03E(Axi'1Axi1) for equality of the second terms in (3.5a) and (3.5b), (iii) the last term of (3.5b) is zero. Condition (1) holds under Assumption 3.1. Condition (ii) holds if we are willing to assume E(¢§|Axi1,.~,Axi,_1) = oi, (3.6) which is a strong assumption, but still is plausible along with Assumption 3.2. Condition (iii) is a different matter. From the assumptions in (3.1), the last term of (3.5b) becomes E[¢i (€i1+€i2) (xi'1xi2+xi'2xi1)] = E[¢i€i1(xi'1xi2+xi2xi1) 1' which, however, never becomes zero under any assumptions that are plausible for weakly exogenous models with fixed effects. Next we compare the first kxk block of E(ng'unqrn) 74 and E(wa'Zhi). For E(ng'Zhi),'we have E[wi'L' (OEIT+aieTe;)hi] = a§E(wi'L'hi) = afE(xi'1xi1 + xi'1xi2) , and from the assumption (3.1) , E(wi'L'uiui'h‘.) becomes OEEWGXH + Xi'1xi2) + E(¢i€i1xi'1xi2) r the second term of which is not zero in weakly exogenous cases with fixed effects. Therefore, the usual BSLS standard errors are not consistent if the instruments hi are used. We now apply this result to dynamic models and compare it with the result obtained by Ahn (1990). For simplicity, we focus on a simple AR(1) dynamic model with no exogenous regressors, so xit = Yn4l t = 1,-- ,T. The covariance matrix is assumed to be of the random effects form, and we keep the assumption (3.1) of conditional moment conditions. The BMS assumption, in this case, tells that E(Yit¢i) is the same for t = O,---,T, (3.7) which is implied by the stationarity of {(Yit¢i) :t=0- - - ,T}, the assumption suggested by Arellano and Bover (1990). The moment conditions are E(yituit+1 - yit+1uit+2) = 0' t = 0' ' --,T-2, (3’8) and we have the set of the instruments (waln). Note that all of the elements in the upper triangular of the moment 75 matrix (3.3) are the same, and so the number of moment conditions is %T(T+1)-1. This covers the moment conditions from the random effects covariance restrictions (except for those from the restriction of equal diagonals in 2). Nothing essentially differs from the previous model, and the usual standard errors from 3SLS are not consistent if h.i is L used as instruments. 
Without the stationarity of {y_it φ_i: t = 0,...,T}, there are ½T(T-1) - 1 moment conditions from the equal off-diagonal restriction of Σ, and the T-1 conditions E(y_i,t-1 Δu_it) = 0, t = 1,...,T-1. Together, we have ½T(T+1) - 2 conditions, one fewer than under the stationarity of {y_it φ_i: t = 0,...,T}. These comprise the ½T(T-1) moment conditions E(w_i'L'u_i) = 0 and the T-2 conditions (2.14) suggested by Ahn and Schmidt. The conditions in (2.14) are essentially nonlinear in the parameters. Ahn and Schmidt showed that, given equal diagonal elements of Σ, these additional moment conditions can be represented as conditions that are linear in the parameters,

E(y_it Δu_it - y_i,t+1 Δu_i,t+1) = 0, t = 1,...,T-2.   (3.9)

These are, in fact, linear combinations of the conditions in (2.14), the moment conditions from the restriction of equal diagonals, and E(w_i'L'u_i) = 0. Ahn (1990) showed that the usual 3SLS standard errors are not consistent when the instrumental variables implied by (3.9) are used, which is closely related to the result we obtained: the structure of the instruments from (3.9) is quite similar to that of h_i. The conventional nonlinear 3SLS methods do not generalize to implement the moment conditions from covariance restrictions, so there is no point in comparing GMM and nonlinear 3SLS when the moment conditions (2.14) are used.

4. NEARLY EFFICIENT ESTIMATION

GMM using all of the moment conditions leads to the fully efficient estimator in large samples. In many panel data sets (for example, when the data are constructed from the PSID), there is a trade-off between N and T in applications: as T increases, the size of the cross section N shrinks. Further, as T increases, the number of moment conditions grows on the order of T². Thereby, when T is relatively large (say 6 to 10), situations can arise in which it is not even feasible to use all of the moment conditions, and finding shorter lists of instruments is of practical importance. In this section we propose several reduced lists of instruments that should lead to nearly efficient estimators. However, we cannot say "how near," since the efficiency of the resulting estimators depends on too many factors to be sorted out clearly. The estimators we propose are intended only as possible estimators among many, and they need to be compared with other possible estimators in practice. The arguments apply equally to the estimator proposed by Keane and Runkle (1992).

In many cases some of the regressors in weakly exogenous models are, in fact, strictly exogenous. A leading example is the dynamic panel data model with additional strictly exogenous regressors considered by many authors. Suppose x_it = (x_1it, x_2it) for t = 1,...,T and β = (β_1', β_2')' in model (2.1), where x_1it is weakly exogenous and x_2it is strictly exogenous to the errors. The dimensions of x_1it and x_2it are 1 x k_1 and 1 x k_2, where k = k_1 + k_2. We first consider a simple model in which none of the regressors is correlated with the time-constant error φ_i. We have

ASSUMPTION 4.1: E(x_1it' u_is) = 0, 1 ≤ t ≤ s ≤ T.

ASSUMPTION 4.2: E(x_2i^o ⊗ u_i) = 0.

The set of instrumental variables implied by Assumption 4.1 is w_1i = diag(x_1i1^o, x_1i2^o,...,x_1iT^o). Assumption 4.2 implies the instruments w_2i = I_T ⊗ x_2i^o. Let w_i = (w_1i, w_2i), of column dimension ½T(T+1)k_1 + T²k_2; the sketch below illustrates how quickly this count grows with T. We further assume that there is no conditional heteroskedasticity, so

ASSUMPTION 4.3: E(w_i'u_iu_i'w_i) = E(w_i'Σw_i).
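To make the feasibility concern concrete, here is a small count (my own arithmetic from the column dimensions stated above, with illustrative choices k_1 = 2 and k_2 = 3) of the number of columns in w_i for moderate T; with a cross section of a few hundred units, the full list quickly becomes impractical.

```python
# columns of w_i = (w_1i, w_2i): T(T+1)k1/2 from w_1i plus T^2*k2 from w_2i
def n_instruments(T, k1, k2):
    return T * (T + 1) * k1 // 2 + T * T * k2

for T in (4, 6, 8, 10):
    print(T, n_instruments(T, k1=2, k2=3))   # 68, 150, 264, 410 moment conditions
```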
Under Assumptions 4.1 - 4.3 and if E is diagonal so that if there are no unobserved individual effects and the errors are intertemporally uncorrelated, the reduced set of instruments diag(x",xn,---,xn) generate the fully efficient estimator (Chapter 2). No result exists that finds some 78 reduced set of instruments that lead to the fully efficient estimator when 2 is not diagonal (Ahn and Schmidt, Chapter 2, Chapter 4). Here we allow momentarily that E to be unrestricted, but Assumptions 4.1 - 4.3 hold. These assumptions rules out time constant unobservables correlated with the regressors, but the result obtained under these assumptions will be generalized to more practical models. For the rest of the paper, for any Txp matrix mi, M E (m{,o--,mu)' of dimension NTxp. Thus, the matrix M is the stacked matrix of mi for i = 1, - - - ,N with the i-th block mi. Let n = Inez. Then, W = (W1,W2) , note that W2 is equally represented by X3811. (Chapter 2) . If all the regressors are strictly exogenous, the GLS estimator is fully efficient. Nevertheless, E(x1i'2'1ui) # 0, and 24x1i are not valid instruments. However, 24x2i are still valid. It would be natural to consider the property of the 3SLS estimator using the instruments zi = (w1i,2'1x2i) , the column dimension of which is %T(T+1)k14-15. Estimators are defined as ,8 = [x'wm'nm‘1w'x1'1x'wm'nwr‘w'y and f3= [x'z(z'QZ)"z'X]"x'z(z'QZ)"z'Y We will compare the variances of 8 and 3. Let P(.) denote the projection onto the columns of (-). Then, 79 we have -1 __ -1/2 -1/2 __ -1 W(W'nW) W'X2 — n P(n"zz,,xgo2"2)“ X2 — 9 X2, (4.2a) and Z(z'I22)“z'x2 = IT‘X2 (4.2b) since amuz'nm"z'r11/2r2'1/Z‘x2 = P(01/2W1'n-1/2x2)fl'1/2X2 = n'VZxZ. From (4.2), it follows that [ )(1'W(w'rzw)"w'x1 x1'n'1x2 ]" I [X'W(W'nW)'1W'X]'1 = (4.3a) 4 4 x50 x1 xz'n x2 and [x'2(z'02)“z'x1“ = [ X1'Z(Z'DZ)'1Z'X1 x1'n"x,_ " 0 (403b) -1 -1 x50 x1 xz'n x2 The difference between these two arises from the difference between X1'W(W'flW)'1W'X1 and x1'2(z'r22)"z'x1. We know that E is more efficient than 3. However, the difference in efficiency might not be substantial, since both W and Z include W1 which provides direct information for X1. This result depends on Assumption 4.3 of no conditional heteroskedasticity, and asymptotically the GMM estimator using the instruments Z could be dominated by GMM using simpler instruments (W,X§) in the presence of heteroskedasticity. This is the reason why the instruments 2 is limited to serve as only a choice to be compared with many other constructable set of instruments. If the covariance matrix is of the random effects form 80 and the unobserved effects are not correlated with the regressors, it is better to use (W1,PX2,QX2) than (W1,n"XZ) as instruments whether there is heteroskedasticity or not, since (2")(2 = aPXZ-I-bQX2 and we never worse off by using (PX2,QXZ) instead of aPX2+bQXZ, where P = INGJT‘eTeT' and Q = I“.- P. See Chapter 2 for the result of n'1 = aP+bQ. In this case, (PXZ,QX2) explains x1 better than aPXz-I-bQXZ. We omit complicated algebra comparing the performance of two sets of instrumental variables (W1,PX2,QX2) and (W1,n"x2) I since it is intuitively clear. In the case when the unobserved individual effects are correlated with all of the regressors and the covariance matrix is of the random effects form, we have the instruments ri = L-diag(x‘1’i1,x§’i2, - - - ,x‘1’im) and Loxgi. In this case, the reduced set of instruments (R,QX§)*would produce a nearly efficient estimator, since P(R,QX2)XZ = P(R,x‘2’oL)XZ = QXZ. 
The same algebra compares the variances of estimator from these instruments, and produces a result like (4.3). The above arguments also apply to more general models where only some of the regressors are correlated with the fixed effects. We study the dynamic version of the Hausman and Taylor model considered by Ahn and Schmidt: Yit = ayit-1 + X“:31 + XZitfiZ + z1i71 + zzflz + ¢i "' ‘itl (4'4) for t = 1,...,T, All the regressors but the lagged dependent variable are strictly exogenous to the 81 idiosyncratic errors, and only (xfit,za) is correlated with unobserved fixed effects ¢H' The covariance matrix takes the random effects form. Then, we have the set of instruments I‘ = [R1, (X1,z1oe,)oI,, (Xz,zzoeT)oL], where R1 includes all the instruments between the lagged dependent variables and the disturbances. Note that (Z1,Zt.,)oeT is the time-constant regressors. The reduced lists of instrumental variables F = (R1,QX,PX1,Z1®eT) would produce a nearly efficient estimator, since PU‘) (X1,Z1oe,) = P0,.) (x1,Z1oeT) = (X1,z1oe,) and P(r)(X2,ZZ) = P(F)(x2,zz). Hence, the reasoning is the same as the previous cases. Note that (QX,X1oeT,Z1oeT) is the reduced lists of instruments in the static Hausman and Taylor model, that produces the fully efficient estimator when there is no heteroskedasticity. For more details, see Ahn and Schmidt (1992). 5. CONCLUSION We showed that the moment conditions that are currently utilized in weakly exogenous models may not be valid in some cases if the idiosyncratic errors are serially correlated. Were the serial correlations of the idiosycratic errors detected from an initial stage estimator of 3, testing exogeneity of the instruments would be constructive. Difficulty arises when 2 is entirely unrestricted. Then, identification of 3 becomes a serious problem unless there 82 are a sufficient number of strictly exogenous instrumental variables. However, testing the structure of 2 would lead to some nice specification tests that are not doable in single equation models, which we leave for further study. Though we prove that GMM should be used when we estimate using the BMS assumption, the reason why there necessarily arises heteroskedasticity problem is not so clear. But, the condition derived by Wooldrige (1993) provides a partial answer for this. He essentially shows that weak exogeneity is minimally required for the usual 3SLS standard errors to be consistent. Any instruments wit for the t-th equation that satisfy E(e",eru1,---,enlwk)== 0 do not raise this problem, but the instruments from the BMS assumption and the instruments suggested by Ahn (1990) necessarily relate to disturbances across equations and thereby violate the Wooldridge's condition. CHAPTER 4 INFORMATION PROM COVARIANCE RESTRICTIONS IN PANEL DATA MODELS 1. INTRODUCTION In this chapter, we study the orthogonality conditions from covariance restrictions. The main purpose is to find whether the covariance restrictions are useful for more efficient estimation in several panel data models. We focus on the restrictions from scalar and random effects covariance matrices, but the results can be extended to more general restrictions. Also, we derive the asymptotic variances of generalized method of moment (GMM) estimators that use the moment conditions from covariance restrictions. Covariance restrictions have largely been studied in the context of simultaneous equations models, and identification has been the major concern. 
For efficiency, Rothenberg and Leenders (1964) showed that the exploitation of covariance restrictions lowers the Cramer-Rao bound in standard simultaneous equations models when the errors are normally distributed. Hausman, Newey and Taylor (1987) proposed augmented 3SLS as a handy way to realize the efficiency gains from covariance restrictions.

For panel data models, previous studies have focused on simple dynamic models. No results exist for static models like the Hausman and Taylor (1981) model (HT henceforth) or for general weakly exogenous models. The random effects covariance structure, which has become an almost standard assumption in panel data analysis, implies a set of orthogonality conditions. The instrumental variables for lagged dependent variables considered by Anderson and Hsiao (1982), Holtz-Eakin, Newey and Rosen (1988), Arellano and Bond (1991), and Ahn and Schmidt are based on the orthogonality conditions implied by the random effects covariance matrix.

Covariance restrictions are rarely used in practice, except in dynamic panel data models and triangular simultaneous equations models, where they are crucial for identification. This may be due to a reluctance to utilize a priori restrictions that cause inconsistency of the estimators when they are false. Another important reason would be the computational burden of numerical optimization, which in general is required to realize the efficiency gains from covariance restrictions. But if covariance restrictions bring non-trivial efficiency gains, the computational burden is secondary. Therefore, it would be useful to have an easy way, without computing nonlinear estimators, to assess the possible efficiency gain from adding the moment conditions implied by covariance restrictions. We show how to consistently estimate the asymptotic variances of the nonlinear GMM estimators that incorporate covariance restrictions without numerical optimization. By comparing the two variance estimates, with and without the covariance restrictions, we can see the possible efficiency gain from adding those moment conditions. If the gain is non-trivial, it is worth doing the numerical optimization. Once we have a nonlinear GMM estimator, it is straightforward to apply the Hausman test (Hausman, 1978) or the GMM test (Hansen, 1982) to test whether the covariance restrictions used are valid. Thus, it is not hard to get around the possible inconsistency of estimators based on false restrictions.

In section 2, we study a general model and give some preliminary results used throughout the chapter. The asymptotic variances of GMM estimators that use the orthogonality conditions from covariance restrictions are derived, and it is shown how they can be estimated consistently without numerical optimization. We also provide conditions under which linear GMM estimators using the residuals as instrumental variables are asymptotically identical to the nonlinear GMM estimator. In sections 3 and 4 we study covariance restrictions in specific models and derive conditions under which the moment conditions from covariance restrictions are redundant. Section 3 deals with models where the regressors are strictly exogenous to the time-varying errors. It turns out that certain moment conditions from covariance restrictions are useful unless some third moment conditions - essentially symmetry conditions - on the errors are met. We cover three models: the model with a scalar covariance matrix, the random effects model, and a fixed effects type model where the unobserved individual effect is correlated with the regressors. Section 4 studies weakly exogenous models. We argue that the orthogonality conditions from covariance restrictions can be redundant when the covariance matrix is diagonal, but that whenever the covariance matrix is not diagonal they are essentially always useful. Section 5 concludes.

2. PRELIMINARIES

2.1. Redundancy Conditions for Moment Restrictions

We study the linear panel data model

y_it = x_it β + u_it, t = 1,...,T,   (2.1)

where {(y_it, x_it): i = 1,...,N} is an i.i.d. random sequence. Let y_i = (y_i1,...,y_iT)', x_i = (x_i1',...,x_iT')' and u_i = (u_i1,...,u_iT)', of dimensions T x 1, T x k and T x 1, respectively, so that (2.1) is equally expressed as y_i = x_i β + u_i. Let Σ ≡ E(u_i u_i'), a T x T nonsingular matrix. We assume that the fourth moments of (y_it, x_it) exist. Throughout the chapter, for any T x p matrix m_i, M ≡ (m_1',...,m_N')' of dimension NT x p, where N is the number of cross-section observations. There is a set of T x h observable instrumental variables w_i that satisfy

ASSUMPTION 2.1: E(w_i'u_i) = 0.

ASSUMPTION 2.2: E(w_i'x_i) has full column rank and E(w_i'w_i) is positive definite.

Assumption 2.2 is a regularity condition that ensures identification, and it is assumed for the rest of the paper without being stated further. Throughout the paper, E[g_1i(β)] = 0 denotes an initial set of moment conditions and E[g_2i(β)] = 0 denotes the additional moment conditions from covariance restrictions. Our major concern is whether the additional moment conditions E[g_2i(β)] = 0 are redundant given the conditions E[g_1i(β)] = 0. Let g_i(β) = [g_1i(β)', g_2i(β)']'. If we use the orthogonality conditions E[g_i(β)] = 0, GMM solves the problem

min_β Q_N(β) = [N^(-1) Σ_i g_i(β)]' A_N^(-1) [N^(-1) Σ_i g_i(β)].

It is well known that the best choice of weighting matrix is A_N^(-1), where A_N is a consistent estimator of A ≡ E[g_i(β)g_i(β)'], so we take A_N = N^(-1) Σ_i g_i(β~)g_i(β~)', where β~ is a preliminary consistent estimator of β (e.g., Hansen, 1982). Then

Avar √N(β^_GMM - β) = [D'A^(-1)D]^(-1),   (2.2)

where D ≡ E[∂g_i(β)/∂β']. Let A_jl = E[g_ji(β)g_li(β)'], j,l = 1,2, and D = (D_1', D_2')', where D_j ≡ E[∂g_ji(β)/∂β'], j = 1,2. Then the asymptotic variance of the GMM estimator that uses only E[g_1i(β)] = 0 is

Avar √N(β^_1 - β) = [D_1'A_11^(-1)D_1]^(-1).   (2.3)

From (2.2) and (2.3) it is seen that interest centers on the difference between D'A^(-1)D and D_1'A_11^(-1)D_1. The former is no smaller than the latter, since GMM never becomes worse asymptotically when orthogonality conditions are added. Thus, the information from covariance restrictions is useful unless D'A^(-1)D = D_1'A_11^(-1)D_1. Schmidt (1991) shows that this equality holds if and only if

D_2 = A_21 A_11^(-1) D_1.   (2.4)

We will use this condition at several points in the remainder of the chapter.

2.2. Scalar Covariance and the Asymptotic Variance of GMM

We begin by considering the moment conditions from the scalar covariance matrix.

ASSUMPTION 2.3: Σ = σ² I_T.

Assumption 2.3 says that the off-diagonal elements of Σ are zero and that its diagonal elements are equal. The number of off-diagonal elements in Σ is T(T-1), but by symmetry the upper triangle of Σ duplicates the lower triangle. Thus, the condition of zero off-diagonals implies the ½T(T-1) orthogonality conditions

E(u_is u_it) = 0, s > t = 1,...,T-1.   (2.5)

The moment conditions (2.5) can be expressed as E(b_1i'u_i) = 0 or E(b_2i'u_i) = 0, where b_1i and b_2i have one column for each pair (t,s) with s > t: the column of b_1i indexed by (t,s) has u_is in row t and zeros elsewhere, so that row t of b_1i contains the block (u_i,t+1,...,u_iT) and its last row is zero; the column of b_2i indexed by (t,s) has u_it in row s and zeros elsewhere, so that row s of b_2i contains the block (u_i1,...,u_i,s-1) and its first row is zero. In either case the column indexed by (t,s) of b_1i'u_i and of b_2i'u_i equals the product u_is u_it. The dimension of both b_1i and b_2i is T x ½T(T-1). (These matrices are constructed explicitly in the sketch following (2.7c).)

The condition of equal diagonals implies the T-1 moment conditions

E(u_it² - u_i,t+1²) = 0, t = 1,...,T-1.   (2.6)

We express (2.6) as E(c_i'u_i) = 0, where c_i is the T x (T-1) matrix whose t-th column has u_it in row t and -u_i,t+1 in row t+1:

c_i = [  u_i1                         ]
      [ -u_i2    u_i2                 ]
      [         -u_i3     .           ]
      [                    .  u_i,T-1 ]
      [                      -u_iT    ]

The moment conditions in (2.5) and (2.6) contain different information and will be considered separately. We derive the asymptotic variance of the GMM estimator expressed in terms of x_i, w_i, b_1i, b_2i and c_i. First,

D_1 = -E(w_i'x_i).   (2.7a)

For the moment conditions in (2.5), or E(b_1i'u_i) = 0, the elements of D_2 are -E(u_is x_it + u_it x_is), s > t = 1,...,T-1. Thus, D_2 can be written as

D_2 = -E[(b_1i + b_2i)'x_i].   (2.7b)

For the moment conditions E(c_i'u_i) = 0, the elements of D_2 are -2E(u_it x_it - u_i,t+1 x_i,t+1), t = 1,...,T-1, so

D_2 = -2E(c_i'x_i).   (2.7c)
0 0 — u” o 0 ha = um um ' ' “n “n “n4 ‘ The dimension of both b1i and b2i is TX%T(T-1). The condition of equal diagonals implies the T-1 moment conditions 90 mu?t - ufim) = o, t = l,---,T-l. (2.6) We express (2.6) as E(ci'ui) = 0, where P ui1 '1 "uiz uiZ ci = -ui3 . uiT-1 _ _u _ W The moment conditions in (2.5) and (2.6) contain different information, and will be considered separately. We derive the asymptotic variance of the GMM estimator expressed in terms of xi, wi, b“, has and ci. First, D1 = -E(wi'xi). (2.7a) For the moment conditions in (2.5), or E(b1'iui) = 0, the elements of D2 are -E(uisxit + uitxis), s > t = 1,---,T-1. Thus, it follows that D2 can be written as D2 = -E[(b1i+b2i) 'xi]. (2.7b) For the moment conditions E(ci'ui) = 0, we have the elements of D2 '2E(uitxit - uit+1xit+1) ' t = 1' ' ' ' 'T-l' so D2 = —2E(ci'xi). (2.7c) 91 Consider first the estimator that uses the moment conditions E(wi'ui) = 0 and E(b1'iui) = 0. Then from (2.2), Aver/M5“, - fl) = I I I I 1 I w. uiuiw, wiuiuib1i ] E wixi ] (b1i+b2i) 'xi (2.8a) -1 E[xi'wi ,xi' (b1i+b2i) ]E[ I . ' I b1iuiui Wi bi1uiui bu Similarly for the GMM estimator that uses the moment conditions E(wi'ui) = 0 and E(ci'ui) = 0, we have Aver/1W;m2 - fl) = wi' uiui' wi wi' uiui' ci '1 wi' xi '1 E[xi'wi,2xi'ci]E[ ] E[ ] (2.8b) ci'uiui'w‘. ci'uiui'ci 2ci'xi The equations (2.8a) and (2.8b) are useful in practice. They are consistently estimated using residuals G“ in place of disturbances u”, t = 1,---,T, where G" is based on a consistent estimator of B from the initial instruments WI° For a proof of consistency, See White (1984, pp. 135-138). Define b”, b2i and Si to be b", b2i and ci after replacing Git for uit for t = 1,---,T, and let w: = (wwbfi). Then, the ratio between the corresponding diagonal elements of the two estimators (standard errors) of the asymptotic variances N A A [X'W(.§:1wi'u‘.ui'wi)"W'X]'1 1- and A A N * A A * _1 A A -1 [X' (W,B1+B2) (121 wi 'uiui'w.) (B1+B2,W) 'X] will provide guidance about whether it is worth trying to 92 use the moment conditions from zero off-diagonal covariance restrictions through numerical optimization. Similar arguments apply for the equal diagonal restrictions. Were they available, b“, b2i and ci could serve themselves as instrumental variables. In the equation (2.8a), it is not hard to see that if E(bfixg) = 0 the asymptotic variance of the nonlinear GMM estimator becomes the asymptotic variance of the linear GMM estimator using b1i as instruments. Similarly, if E(bfixg) = 0, the linear GMM estimator using b2i as instrumental variables is asymptotically identical to the nonlinear GMM estimator. Also equation (2.8b) shows that when E(c{xg) = 0, the nonlinear GMM estimator is the same asymptotically as the linear GMM estimator using'cg as instrumental variables. It is interesting to ask what will happen if we use b", 8a and 8, as instruments instead of b", b2i and ci. As shortly will be shown, there is an interesting correspondence. If E(ngg) = 0, there asymptotically is no difference between using b1i and b" as instruments. Similarly, if E(bfixg)== 0 and E(ci'xi) = 0, we lose nothing by doing linear GMM using instruments BZi and Si, respectively. We now verify these assertations. If we use b", bfi or A cg as instruments the resulting estimators are consistent since I N A plimfiixl uisuit = E(u.suit), s,t = 1, - --,T. I 93 For the limiting distributions of the resulting estimators not to be affected by replacing b1i by b”, the two random N N variables X G u. and 2 u u. 
for s > t, should have fii-l is It 71’31-1 is It' the same limiting distribution. But, 6,8 = uis - xis(§ - fl) and N N N A 26.11. = fun. ~1zu.x./N(p-p). ‘7 afii-l Is It film-1 Is It N1_1 It 18 Because JN(§ - B) = Op(l), the limiting distributions of the is two GMM estimators using 5" and using b1i are the same provided N plimgiizlu = E(u.txis) = 0, s > t, itxis . so when E(bz'ixi) = 0. Similarly, if E(b1'ixi) = 0, replacing b2i by SZI do not affect the limiting distribution of the estimators. For 8,, since N A A #121 (uituit ' uit+1uit+1) N 2 2 l N A = #121 (uit ‘ uit+1) ' N121 (uitxit ' uit+1xit+1)’/N(fi " 3) I if E(unx“) = 0, t = 1, °-HP. t+1, t = 1,---,T-2. (2.13a) And the second set is E[(uit-um1)ui1] = o, t = 2,---,T-l. (2.13b) The moment conditions (2.13) are equally expressed as either E(h1'iui) = 0 or E(hz'iui) = 0, where h1i and h2i are ' “a u“ “n 0 ‘ "um ”um um ”um u" um 'uM 7“” “um um h1i = “W 'un -ufl “n _ O ..ui1 .4 and 97 ' 0 Ann Ann "' Aun4 1 0 0 0 0 Au n ha 3 I Ann Ann L ' . . . J Aui1 Ann-2 0 0 where Auit a u.-«1 The left and right blocks of h1i and it it+1' h2i correspond to the moment conditions in (2.13a) and (2.13b), respectively. To derive the asymptotic variance of the nonlinear GMM estimator, we follow the same path as we did in the last sub-section. Since au.Au. —6mfiTLL = -(uisAxit + xisAuit) I it follows that ahlu. “EfiIJ'= ’(hn+ha)'xw and we have 02 = -E[(h1i+h2i) 'xi]. The first derivatives of the moment conditions from the random effects covariance matrix is quite similar to that from the scalar covariance, which is the case because the moment conditions from the random effects covariance are some linear combination of those from the scalar covariance. Thus, the results we obtained for the scalar covariance restriction equally applies to the random effects covariance 98 restriction. If we use the orthogonality conditions E(w;ufi = 0 and E(h1'iui) = 0, then Mar/M56... - fl) = I I I I '1 I w.unnvq wiunntni ] E wixi ] I (h1i+h2i) 'xi (2.14) 4 [E[xi'wi,xi' (h1i+hi2) ]E[ I I I I 1H9%uiwi 1%fi%uihn The linear GMM estimator using the instrumentals h” (hfi) is asymptotically identical to the nonlinear GMM if E(hfixk) = 0 (E(h1'ixi) = 0). Applying the redundancy condition (2.4), the moment conditions from the restriction of equal off-diagonals are redundant iff E[(h1i+h2i) 'xi] = E(h1'iuiui'wi) [E(wi'uiui'wi) ]'1E(wi'xi) , (2.15) which is an analogy of the condition (2.10a) 3. STRICTLY EXOGENOUS MODELS In this section, we find the conditions when the moment conditions from covariance restrictions are redundant in the models where the regressors are strictly exogenous to the time-varying errors. We study the scalar and the random effects covariance matrices. Before considering redundancy, we present a theorem that provides intuition for our later discussion. 99 3.1. genera; Resuits on Nonredundemcy umder Ideal Conditions In the model yi = xiii + ui, we assume: ASSUMPTION 3.1: E(uilxi) = o, ASSUMPTION 3.2: E(uiui'lxi) = 07-1,. Assumptions 3.1 and 3.2 are "ideal" conditions. OLS is BLUE under these assumptions (along with nonsingularity of X'X matrix). Chamberlain (1987) showed that, ignoring the moment conditions from (3.1b) below, if all the instrumental variables wi that include xi satisfy E(uilwi) = OI (3.13) E(uiui' IWI) = 021,, (3.1b) the optimal set of instruments is E(xilwi) = xi, and OLS is the most efficient. 
Condition (3.1) is stronger than Assumptions 3.1 and 3.2, and Chamberlain's result allows that there would be nonredundant instrumental variables other than xi under Assumptions 3.1 and 3.2. We write it down more explicitly. THEOREM 3.1: In model (2.1) under Assumptions 3.1 - 3.2, suppose there are instrumental variables afi'of dimension qu such that (i) E(agui) = o and (ii) E(ai'uiui'xi) e 02E(ai'xi). Then GMM using the instruments (xifim) is more efficient than OLS. 100 PROOF: It is sufficient to show that a.i is not redundant. From (2.5), given the initial instruments in ai are redundant iff E(ai'xi) = E(ai'uiui'xi) [E(xi'uiui'xi) ]'1E(xi'xi) , (3.2) which holds iff E(ai'uiui'xi) = 02E(ai'xi). I Theorem 3.1 holds even when the errors are normally distributed and when the regressors are independent of the errors, but applies only to large samples. Generally GMM should be used to realize the efficency gain from the additional instruments ai. The idea underlying Theorem 3.1 is suggested in Cragg (1983) and Chamberlain (1982). They showed that there can exist nonredundant instrumental variables in addition to the regressors in the presense of conditional heteroskedasticity of unknown form, even when all the regressors are valid instruments. Cragg's estimator is a GMM estimator with more instrumental variables that are correlated with the conditional error covariance. The efficiency gain in Chamberlain's optimal minimum distance estimator has a similar interpretation. Note that Theorem 3.1 shows that, to be useful, the additional instrumental variables ai do not have to be correlated with the regressors xi. Instruments that are uncorrelated with the regressors appear frequently in the models we study subsequently. 101 3.2. 55:19:12 Exogemoms Model; Scalar Covariance Assumption 3.1 is stronger than needed. The weakest assumption with strictly exogenous regressors is ASSUMPTION 3.3: E(wi'ui) = 0, where wi = ITsx‘i’ and x“? = new - - - .xin- The choice of instruments in Assumption 3.3 simply means that xit is uncorrelated with u“, all t,s = 1, --,T. In this section, we find when the moment conditions from the scalar covariance of Assumption 2.3 are redundant, given the initial instruments ITox‘i’. Under Assumption 3.3, E(b1'ixi) = E(bz'ixi) = E(ci'xi) = 0, so from (2.9) the linear GMM estimators using either 8” or $2, and (ii as instruments has the same limiting distribution as the nonlinear GMM. Thus, we treat b”, b2i and ci as being available. Applying the condition (2.10a) to see if b1i is redundant or not, given the initial instruments Imnfi, we get E(b1'.u.ui'(I,ox$)][E(uiui'oxg'x‘i’)1"E[(1Tox‘;) 'xi] = o. (3.3) I I If there exists no conditional heteroskedasticity, thus if the assumption ASSUMPTION 3.4: E(wi'uiui'wi) = E(wi'Zwi) holds, then combined with the scalar covariance assumption 2 102 = 01H” the weighting matrix which is in the middle of the LHS of the equation (3.3) becomes aZIroE(x“?'x‘i’) . Thus, OLS is efficient and only k instruments xi are useful among Tzk instruments I ox? since P o X = X. Otherwise all the T I (X 81,) instruments Ignfi are useful (Chapter 2). Under Assumption 3.4, the equation (3.3) becomes E(b1'iuiui'xi) = 0. (3.4a) This is no more than the redundancy condition of b1i on the initial instruments xi. E(b1'iuiui'xi) contains the elements T 21 E(uisuituifx") , s > t = 1, - - - ,T-l. T: A sufficient condition for (3.4a) is E(uisufitxit) = E(uisufitxis) = E(uisuituifxifl = 0, s s t s 1’. 
3.2. Strictly Exogenous Model: Scalar Covariance

Assumption 3.1 is stronger than needed. The weakest assumption with strictly exogenous regressors is

ASSUMPTION 3.3: E(w_i'u_i) = 0, where w_i = I_T⊗x_i° and x_i° = (x_i1,···,x_iT).

The choice of instruments in Assumption 3.3 simply means that x_it is uncorrelated with u_is for all t,s = 1,···,T. In this section, we find when the moment conditions from the scalar covariance of Assumption 2.3 are redundant, given the initial instruments I_T⊗x_i°. Under Assumption 3.3, E(b_1i'x_i) = E(b_2i'x_i) = E(c_i'x_i) = 0, so from (2.9) the linear GMM estimators using either b̂_1i or b̂_2i, together with ĉ_i, as instruments have the same limiting distribution as the nonlinear GMM. Thus, we treat b_1i, b_2i and c_i as being available.

Applying condition (2.10a) to see whether b_1i is redundant, given the initial instruments I_T⊗x_i°, we get

E[b_1i'u_iu_i'(I_T⊗x_i°)][E(u_iu_i'⊗x_i°'x_i°)]^{-1}E[(I_T⊗x_i°)'x_i] = 0.   (3.3)

If there is no conditional heteroskedasticity, that is, if the assumption

ASSUMPTION 3.4: E(w_i'u_iu_i'w_i) = E(w_i'Σw_i)

holds, then, combined with the scalar covariance assumption Σ = σ²I_T, the weighting matrix in the middle of the LHS of equation (3.3) becomes σ²I_T⊗E(x_i°'x_i°). Thus OLS is efficient, and among the T²k instruments I_T⊗x_i° only the k instruments x_i are useful, since P_{(X°⊗I_T)}X = X. Otherwise all the instruments I_T⊗x_i° are useful (Chapter 2). Under Assumption 3.4, equation (3.3) becomes

E(b_1i'u_iu_i'x_i) = 0.   (3.4a)

This is no more than the redundancy condition of b_1i on the initial instruments x_i. E(b_1i'u_iu_i'x_i) contains the elements

Σ_{τ=1}^T E(u_is u_it u_iτ x_iτ), s > t = 1,···,T−1.

A sufficient condition for (3.4a) is

E(u_is u_it² x_it) = E(u_is² u_it x_is) = E(u_is u_it u_iτ x_iτ) = 0, s ≠ t ≠ τ.   (3.4b)

There are other situations in which (3.4a) holds, but they are not very intuitive. If Assumption 3.4 is violated, condition (3.3) holds if

E[b_1i'u_iu_i'(I_T⊗x_i°)] = 0.   (3.5a)

Condition (3.5a) is stronger than (3.3), but it would be very unusual for (3.3) to hold without (3.5a). Because E[b_1i'u_iu_i'(I_T⊗x_i°)] has elements E(u_is u_it u_iτ x_i°), s > t = 1,···,T−1 and τ = 1,···,T, condition (3.3) holds if

E(u_is² u_it x_i°) = 0 and E(u_is u_it u_iτ x_i°) = 0, s ≠ t ≠ τ.   (3.5b)

Condition (3.5b) is stronger than (3.4b), but as long as the strict exogeneity assumption holds the two conditions are quite similar. A sufficient condition that ensures (3.5b) is E(u_it u_is|x_i°, u_iτ) = E(u_it u_is), τ ≠ s,t (including s = t), which is met if the errors are independent over time. This condition rules out, in particular, ARCH-type behavior in the panel data context.

A constructive way to understand what condition (3.5b) represents is that, for b_1i to be redundant, there should be no conditional heteroskedasticity when b_1i is used as an instrument. Otherwise, though uncorrelated with the regressors, b_1i becomes useful by explaining the second moments of the errors. Thus, the reason why b_1i can be useful is the same as the reason why the additional instrumental variables a_i of the previous subsection can be useful even when they are uncorrelated with the regressors.

We follow the same procedure to find the redundancy condition for c_i. From (2.10b), when Assumption 3.4 holds c_i is redundant iff E(c_i'u_iu_i'x_i) = 0, or equivalently

Σ_{τ=1}^T E[(u_it² − u_{it+1}²)u_iτ x_iτ] = 0, t = 1,···,T−1.   (3.6a)

Given the condition that b_1i is redundant, this becomes

E(u_it³ x_it − u_{it+1}³ x_{it+1}) = 0, t = 1,···,T−1.   (3.6b)

A sufficient condition for (3.6b) is

E(u_it³ x_it) = 0, t = 1,···,T,   (3.6c)

which demands, at a minimum, symmetry of the error distribution. If Assumption 3.4 does not hold, c_i is redundant if

E[c_i'u_iu_i'(I_T⊗x_i°)] = 0,   (3.7a)

or

E[(u_it² − u_{it+1}²)u_iτ x_i°] = 0, t = 1,···,T−1, τ = 1,···,T.   (3.7b)

Given (3.5a), condition (3.7b) becomes

E(u_it³ x_i°) = 0, t = 1,···,T−1.   (3.7c)

It is interesting to note that conditions (3.5) and (3.7) are usually assumed in the literature on covariance restrictions in simultaneous equations models, namely:

ASSUMPTION 3.5: E[(u_iu_i'⊗u_i)w_i] = 0.

Examples are Hausman, Newey and Taylor (1987) and Arellano (1989). In particular, the standard errors from the augmented 3SLS estimator proposed by Hausman, Newey and Taylor are not consistent when Assumption 3.5 is violated (Section 4 of their paper). To show why Assumption 3.5 ensures conditions (3.5) and (3.7), we define selection matrices S_j of dimension [½T(T+1)−1]×T² such that

Σ = σ²I_T  ⟺  S_j vec(Σ) = 0

(Magnus and Neudecker, 1980). Thus, the matrix S_j selects elements of vec(Σ). In our model, S_j[vec(u_iu_i')] = (b_ji, c_i)'u_i, for j = 1,2. Since vec(u_iu_i') = (I_T⊗u_i)u_i, conditions (3.5a) and (3.7a) are equally expressed as

S_1E[(I_T⊗u_i)u_iu_i'(I_T⊗x_i°)] = S_1E[(u_iu_i'⊗u_i)(I_T⊗x_i°)] = 0

from Assumption 3.5. Thus, Assumption 3.5 is sufficient to ensure that the moment conditions from covariance restrictions are redundant when the regressors are strictly exogenous.

Allowing for individual effects is a primary reason why people use panel data models, and the model studied in this section rarely appears in panel data applications. We now turn to more widely applicable models.
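The role of the selection matrices S_j can be made concrete with a small sketch (my own illustration; the construction below is one way to build such a matrix for T = 3, not necessarily the exact matrix used in the thesis): the rows pick the off-diagonal elements of vec(Σ) and the differences of adjacent diagonal elements, so that S vec(Σ) = 0 exactly when Σ = σ²I_3 for a symmetric Σ.

```python
import numpy as np

# One way to build a selection matrix S with S @ vec(Sigma) = 0 iff Sigma = sigma^2 * I_3
# for symmetric Sigma; its dimension is (T(T+1)/2 - 1) x T^2 = 5 x 9 when T = 3.
T = 3
rows = []

# off-diagonal elements (upper triangle is enough, by symmetry): Sigma[s, t] = 0 for s < t
for s in range(T):
    for t in range(s + 1, T):
        r = np.zeros(T * T)
        r[t * T + s] = 1.0          # picks Sigma[s, t] in column-major vec ordering
        rows.append(r)

# equal diagonals: Sigma[t, t] - Sigma[t+1, t+1] = 0
for t in range(T - 1):
    r = np.zeros(T * T)
    r[t * T + t] = 1.0
    r[(t + 1) * T + (t + 1)] = -1.0
    rows.append(r)

S = np.array(rows)
vec = lambda A: A.flatten(order="F")                  # column-major vec()

print(S.shape)                                        # (5, 9)
print(np.allclose(S @ vec(2.5 * np.eye(T)), 0))       # True: scalar covariance passes
Sigma_re = 2.5 * np.eye(T) + 1.0                      # random effects covariance: fails
print(S @ vec(Sigma_re))                              # nonzero entries for the off-diagonals
```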
3.3. Strictly Exogenous Model: Random Effects Covariance

In this section, we consider the covariance restrictions in the popular random effects model, in which u_it = φ_i + ε_it. Thus the errors are composites of the time-constant φ_i and the time-varying ε_it, and all the regressors are exogenous with respect to φ_i as well as to ε_it, t = 1,···,T. We have the initial instruments I_T⊗x_i°, and the set of moment conditions from the random effects covariance is a subset of the moment conditions from the scalar covariance. Thus, the moment conditions from the random effects covariance matrix are redundant as long as higher conditional moment conditions on the errors, such as Assumption 3.5, are satisfied.

We focus on the higher moment conditions involving the time-constant error φ_i, which usually is thought of as arising from omitted unobserved variables that are invariant over the time periods in question. The additional moment conditions are E(h_1i'u_i) = 0 or E(h_2i'u_i) = 0, and E(c_i'u_i) = 0. Since E(h_1i'x_i) = E(h_2i'x_i) = E(c_i'x_i) = 0, we handle h_1i, h_2i and c_i as if they were available. From condition (2.15), h_1i is redundant if

E[h_1i'u_iu_i'(I_T⊗x_i°)] = 0,   (3.9a)

which is equally stated as

(i) E(u_is Δu_it u_iτ x_i°) = 0, s − 1 > t = 1,···,T−2, and (ii) E(u_i1 Δu_it u_iτ x_i°) = 0, t = 2,···,T−1, for τ = 1,···,T.   (3.9b)

The first and second conditions in (3.9b) correspond to the left and right blocks of h_1i. Because the moment conditions are all of the same form, not much is lost by examining only the (1,1) element of E[h_1i'u_iu_i'(I_T⊗x_i°)], that is, the condition E(u_i3 Δu_i1 u_i1 x_i°) = 0. Then we have

E{[φ_i²(ε_i1 − ε_i2) + φ_i(ε_i1² − ε_i1ε_i2 + ε_i1ε_i3 − ε_i2ε_i3) + (ε_i1²ε_i3 − ε_i1ε_i2ε_i3)]x_i°} = 0.   (3.9c)

Condition (3.9c) is met if (i) E(ε_i|x_i, φ_i) = 0 and (ii) E(ε_itε_is|x_i, φ_i, ε_iτ) = E(ε_itε_is), s,t ≠ τ (including s = t).

For the moment conditions from equal diagonals, we apply condition (3.7b), that E[(u_it² − u_{it+1}²)u_iτ x_i°] = 0, t = 1,···,T−1, τ = 1,···,T. Consider the case t = τ = 1; then the condition becomes

E{[2(ε_i1 − ε_i2)φ_i² + (3ε_i1² − ε_i2² − 2ε_i1ε_i2)φ_i + ε_i1³ − ε_i1ε_i2²]x_i°} = 0.   (3.10)

Equality in this equation holds if we add the condition E(ε_i1³x_i°) = 0 to (3.9). One notable point is that condition (3.7c), that E(u_it³x_i°) = 0, does not necessarily apply in this case. Note that in (3.10), φ_i³ is differenced away.

3.4. Strictly Exogenous Model: Fixed Effects Type

We now allow for arbitrary correlations between the regressors and the time-constant error φ_i. The usual assumption is

ASSUMPTION 3.6: E(x_i⊗ε_i) = 0.

Note that we are still working with the random effects covariance matrix. Assumption 3.6 implies a set of instrumental variables L⊗x_i°, where L is the T×(T−1) differencing operator (Chapter 2; Ahn and Schmidt). We assume that there are no time-constant variables in x_it, to ensure that β and Σ are identified. Ahn and Schmidt showed that the moment conditions from the random effects covariance restrictions are redundant under Assumption 3.6. Their reasoning is that the moment conditions from the random effects covariance restrictions add information only through the regression corresponding to the instruments that lie in the space spanned by L, where the regression already attains the GLS efficiency. Their finding is plausible under a certain set of assumptions, which we now detail.

E(h_2i'x_i) = 0 under Assumption 3.6, so the linear GMM using ĥ_1i as instruments is asymptotically identical to the nonlinear GMM, but using the instruments ĥ_2i would in general lead to a less efficient estimator than the nonlinear GMM, since E(h_1i'x_i) ≠ 0. Thus, h_1i will be considered as being available. Note that h_1i is in the space spanned by L, in the sense that Q_T h_1i = h_1i, where Q_T = L(L'L)^{-1}L'.
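A quick numerical sketch (my own, assuming L is the usual T×(T−1) first-difference operator with columns e_t − e_{t+1}) confirms the two facts used here: Q_T = L(L'L)^{-1}L' is just the within (demeaning) projection, and Q_T h_1i = h_1i because each column of h_1i sums to zero.

```python
import numpy as np

T = 3
# assume L is the usual T x (T-1) differencing operator with columns e_t - e_{t+1}
L = np.zeros((T, T - 1))
for t in range(T - 1):
    L[t, t], L[t + 1, t] = 1.0, -1.0

Q_T = L @ np.linalg.inv(L.T @ L) @ L.T          # projection onto the column space of L
within = np.eye(T) - np.ones((T, T)) / T        # demeaning (within) transformation
print(np.allclose(Q_T, within))                 # True

# columns of h_1i sum to zero, e.g. the T = 3 instruments (u_i3, -u_i3, 0)' and (0, u_i1, -u_i1)'
u = np.array([0.7, -1.2, 0.4])                  # any error vector
h1 = np.column_stack([[u[2], -u[2], 0.0], [0.0, u[0], -u[0]]])
print(np.allclose(Q_T @ h1, h1))                # True: h_1i lies in the space spanned by L
```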
If Assumption 3.4 of no conditional heteroskedasticity is met, so that

E[(L⊗x_i°)'u_iu_i'(L⊗x_i°)] = σ_ε²L'L⊗E(x_i°'x_i°),

the only relevant instrumental variables are Q_T x_i, and OLS on the demeaned equations is efficient. It does not matter whether we use Q_T x_i or L⊗x_i° as the initial instruments, since P_{(X°⊗L)}X = QX, where Q = I_N⊗Q_T. Applying the redundancy condition (2.15), we have

σ_ε²E(h_1i'x_i) = E(h_1i'u_iu_i'Q_T x_i).   (3.11)

To simplify this condition, consider the example T = 3. Then h_1i^a = (u_i3, −u_i3, 0)' and h_1i^b = (0, u_i1, −u_i1)', which are pretty much of the same sort, in the sense that one is redundant if the other is. We consider h_1i^a only. Since

E(h_1i^{a'}x_i) = E[φ_i(x_i1 − x_i2)]

and

E(h_1i^{a'}u_iu_i'Q_T x_i) = E[(ε_i3 + φ_i)(ε_i1 − ε_i2) Σ_{τ=1}^3 ε_iτ(x_iτ − x̄_i)],

the equality in (3.11) holds under

ASSUMPTION 3.7: E(ε_itε_is|x_i, φ_i, ε_iτ) = E(ε_itε_is), s,t ≠ τ.

Note that Assumption 3.7 includes the case t = s. For the restriction of equal diagonals of Σ, we have the orthogonality conditions E(c_i'u_i) = 0, and the redundancy condition becomes

2σ_ε²E(c_i'x_i) = E(c_i'u_iu_i'Q_T x_i).   (3.12)

Consider the simplest case, T = 2. Then the LHS of (3.12) becomes 2σ_ε²E[φ_i(x_i1 − x_i2)], and for the RHS of (3.12) we have

E[(ε_i1² − ε_i2²) Σ_{τ=1}^2 ε_iτ(x_iτ − x̄_i)] + E[2φ_i(ε_i1 − ε_i2) Σ_{τ=1}^2 ε_iτ(x_iτ − x̄_i)].

For the two sides to be equal, it generally requires

ASSUMPTION 3.8: E(ε_it³|x_i) = 0, t = 1,···,T,

as well as Assumption 3.7. Recall that Assumptions 3.7 and 3.8 are quite similar to the conditions we derived for the random effects model. They are also quite similar to the redundancy conditions in equations (3.5) and (3.7) for the moment conditions from the scalar covariance matrix, if we regard φ_i as a regressor. Allowing for correlation between the individual effects and the regressors does not appreciably alter the redundancy condition for the moment conditions from the random effects covariance restrictions. The intuition provided by Ahn and Schmidt (1992) is plausible, but it is interesting to note that there are estimators more efficient than GLS when certain conditional third moment conditions on the errors are violated.

Throughout this section we have studied redundancy conditions for covariance restrictions in models where the GLS efficiency is attained. In what follows, we turn to models where the regressors are only weakly exogenous with respect to the errors.

4. WEAKLY EXOGENOUS MODELS

As we noted in Chapter 3, dynamic models and rational expectations models are typical weakly exogenous models. There is a growing concern that many regressors in standard panel data models may be only weakly exogenous with respect to the time-varying errors. We will not work on dynamic models explicitly, since there is a large body of previous work, much of which studies covariance restrictions; for listings of references, see Chapter 3. We study the models under assumptions that are usual in rational expectations models, but our results apply also to standard models in which some of the regressors are weakly exogenous, and to general dynamic models. In general, there is no point in arguing whether covariance restrictions are useful in dynamic models, because the covariance restrictions coincide with the instruments for the lagged dependent regressor. But there is at least one model (probably the only model) that draws our interest, which we study first.
4.1. Weakly Exogenous Model: Diagonal Covariance

We first study the weakly exogenous panel data model with sequential conditional moment restrictions of the type in Chamberlain (1992), but with no individual effects. The model is a typical rational expectations model as it appears in panel data applications. The diagonal covariance matrix rarely appears in standard panel data models. Nevertheless, it frequently is assumed in rational expectations models, as the hypothesis itself implies. Further, many tests have failed to reject the null of no individual effects (Keane and Runkle, 1990; 1992; Runkle, 1991). Other important models that can have a diagonal covariance matrix are dynamic models. Most of the previous studies on dynamic models concern the random effects covariance. However, it has been observed that allowing for rich dynamics diminishes the importance of individual effects (e.g., Holtz-Eakin, 1988).

We continue to consider model (2.1), and assume

ASSUMPTION 4.1: E(u_it|x_it°, u_{i,t−1}°) = 0, t = 1,···,T,

where x_it° = (x_i1,···,x_it) and u_it° = (u_i1,···,u_it). Assumption 4.1 implies many instrumental variables, and utilizing every moment condition is not feasible. We restrict our attention to the second moment conditions of (x_i, u_i); thus we have

E(u_it x_it°) = 0, t = 1,···,T,   (4.1)

and

E(u_it u_is) = 0, s ≠ t.   (4.2)

(4.1) implies the set of instruments that appears in the panel data literature (e.g., Schmidt, Ahn and Wyhowski, 1992), and (4.2) is the covariance restriction that the off-diagonals of Σ are zero. It is usual to assume

ASSUMPTION 4.2: E(u_it²|x_it°, u_{i,t−1}°) = σ_t², t = 1,···,T.

This excludes conditional heteroskedasticity. Under Assumptions 4.1 and 4.2, GMM using the instrumental variables x_i* = diag(x_i1,···,x_iT) is asymptotically identical to GLS, and no other instruments are useful if we ignore the higher moment conditions on the errors (Chapter 2). That is to say, among the tk instruments for the t-th period equation, only the k instruments x_it are useful and the others are redundant, and any other functions of x_it° are also redundant.

We now ask whether the moment conditions (4.2) are redundant under Assumptions 4.1 and 4.2, given the initial instruments x_i*. Note that E(b_1i'x_i) = 0, but E(b_2i'x_i) ≠ 0. Thus, the linear GMM using the instruments b̂_2i is asymptotically identical to the nonlinear GMM, and we treat b_2i as if it were known.

THEOREM 4.1: In model (2.1) under Assumptions 4.1 − 4.2, the orthogonality conditions in (4.2) are redundant, given the initial instrumental variables x_i*.

PROOF: We apply the redundancy condition (2.10a), which becomes

E(b_2i'x_i) = E(b_2i'u_iu_i'x_i*)[E(x_i*'u_iu_i'x_i*)]^{-1}E(x_i*'x_i).   (4.3)

E(b_2i'u_iu_i'x_i*) has elements E(u_it u_is u_iτ x_iτ), s > t = 1,···,T−1, τ = 1,···,T. When τ = s,

E(u_it u_is² x_is) = E[u_it E(u_is²|u_it, x_is)x_is] = σ_s²E(u_it x_is) ≠ 0.

When τ = t,

E(u_it² u_is x_it) = E[u_it² E(u_is|u_it, x_it)x_it] = 0

from Assumption 4.1. For τ ≠ s,t, E(u_it u_is u_iτ x_iτ) = 0. Thus, the non-zero elements of E(b_2i'u_iu_i'x_i*) are σ_s²E(u_it x_is) for s > t. For simplicity, we show the rest for T = 3. Then

E(b_2i'u_iu_i'x_i*) = [ 0   σ_2²E(u_i1x_i2)   0
                        0   0                 σ_3²E(u_i1x_i3)
                        0   0                 σ_3²E(u_i2x_i3) ]

and [E(x_i*'u_iu_i'x_i*)]^{-1}E(x_i*'x_i) = (σ_1^{-2}, σ_2^{-2}, σ_3^{-2})'⊗I_k. Thus, the RHS of condition (4.3) becomes the stacked vector of E(u_it x_is) for t < s = 2,3, which is exactly the LHS of condition (4.3). The argument is more tedious for general T. ∎

Even if the regressors are only weakly exogenous, GLS is consistent since Σ is diagonal, and it is efficient under Assumption 4.2.
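A small simulation sketch (entirely my own design; the feedback equation, parameter values and names are assumptions for illustration) shows the two sides of this result: with a weakly exogenous x_it and a diagonal error covariance, pooled OLS, and hence GLS, remains consistent, while adding a time-constant error component that feeds into future x_it breaks consistency, which is the case taken up in the next subsection.

```python
import numpy as np

# Weakly exogenous regressor: x_it responds to last period's error u_{i,t-1}.
# Case 1: u_it i.i.d. over t (diagonal Sigma)     -> pooled OLS consistent.
# Case 2: u_it = phi_i + eps_it (random effects)  -> pooled OLS inconsistent.
rng = np.random.default_rng(2)
beta, N, T = 1.0, 50_000, 5

def simulate(random_effects: bool) -> float:
    phi = rng.normal(size=N) if random_effects else np.zeros(N)
    x = np.zeros((N, T))
    u = np.zeros((N, T))
    u_lag = np.zeros(N)
    for t in range(T):
        x[:, t] = 0.5 * (x[:, t - 1] if t > 0 else 0.0) + 0.8 * u_lag + rng.normal(size=N)
        u[:, t] = phi + rng.normal(size=N)
        u_lag = u[:, t]
    y = beta * x + u
    return (x.ravel() @ y.ravel()) / (x.ravel() @ x.ravel())   # pooled OLS

print("diagonal Sigma, pooled OLS :", simulate(random_effects=False))  # close to 1.0
print("random effects, pooled OLS :", simulate(random_effects=True))   # biased away from 1.0
```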
This result also applies to dynamic models with one or more lagged dependent variables. As an example, consider a simple AR(1) dynamic model in which y_{i,t−1} is the only regressor. Then GLS is equivalent to 3SLS using the instruments diag(y_i0,···,y_{i,T−1}). The moment conditions to be used are E(y_{i,t−1}u_it) = 0, t = 1,···,T, which are equivalent to E(u_{i,t−1}u_it) = 0, t = 2,···,T, and E(y_i0u_i1) = 0. Thus, among the ½T(T−1) zero off-diagonal restrictions, only the (T−1) restrictions that correspond to the second moments between regressors and errors are useful; the rest are redundant. Similar arguments apply when more than one lagged dependent variable appears as a regressor.

Theorem 4.1 depends heavily on the diagonality of Σ. If Σ is not diagonal and the regressors are weakly exogenous, GLS is not consistent (Schmidt, 1990), and all the covariance restrictions would be useful in general. We study a special case of this in the next section.

4.2. Weakly Exogenous Model: Random Effects Covariance

We now combine the individual effects with weak exogeneity of the regressors with respect to the idiosyncratic errors. From Ahn and Schmidt, and Section 3.3, we know that the moment conditions from the random effects error covariance are useful unless the GLS efficiency is reached in the space spanned by L. In other words, for the orthogonality conditions from the random effects covariance to be redundant, GLS in the differenced equations should be at least consistent, or equivalently the instruments L⊗x_i° should be valid in the original equations before differencing. In this sense, the model with a diagonal covariance matrix studied in the last subsection is an exception: there the instruments L⊗x_i° are not valid, but the GLS efficiency (not in the space spanned by L) is reached because the covariance matrix is diagonal, so that the optimal weighting matrix becomes block diagonal. We argue, throughout this section, that whenever the GLS efficiency is not reached, the moment conditions from covariance restrictions are useful. There probably is a nice and simple proof of this statement, but we could not provide it. Thus, we only offer a heuristic discussion through a couple of examples that look useful in applications.

The model we deal with in this section is (2.1) with the random effects error structure. Once we allow for the time-constant error φ_i, Assumption 4.1 is not plausible. Instead, we assume

ASSUMPTION 4.3: E(ε_it|x_it°, ε_{i,t−1}°, φ_i) = 0, t = 1,···,T.

Assumption 4.3 is standard in rational expectations models that allow for arbitrary correlations between the regressors and the time-constant unobservable φ_i. As in the previous subsection, we consider only the orthogonality conditions from the second moments of (x_it, u_it). This yields the familiar sequential set of instruments w_i, block diagonal with the block for each equation built from x_it° = (x_i1,···,x_it); for more details, see Chapter 3 or Schmidt, Ahn and Wyhowski (1992). Also, Assumption 4.3 implies that the off-diagonals of Σ are all equal. We add the no-conditional-heteroskedasticity assumption

ASSUMPTION 4.4: E(ε_it²|x_it°, ε_{i,t−1}°, φ_i) = σ_ε², t = 1,···,T.

This assumption is stronger than usual: it is generally allowed that E(ε_it²) ≠ E(ε_is²), t ≠ s. Assuming that they are the same, along with Assumption 4.3, leads to the random effects covariance matrix. But, as will shortly become clear, the restriction of equal diagonals of Σ does not alter the results we obtain. The moment conditions from equal diagonals are not our concern anyway; our interest centers on whether the equal off-diagonal restrictions of the covariance matrix are useful under Assumptions 4.3 and 4.4.
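For concreteness, here is one common way to stack the sequential instrument set w_i described above into a block-diagonal matrix, with block t equal to x_it° = (x_i1,···,x_it). This is my own sketch of the generic construction; the exact arrangement used in the thesis, including whether the blocks apply to the level or the differenced equations, is the one detailed in Chapter 3 and in Schmidt, Ahn and Wyhowski (1992).

```python
import numpy as np
from scipy.linalg import block_diag

def sequential_instruments(x_i: np.ndarray) -> np.ndarray:
    """Stack one common form of the sequential instrument set for a single individual.

    x_i : (T, k) array of regressors for individual i.
    Returns a block-diagonal matrix whose t-th block is x_i^{o t} = (x_i1, ..., x_it),
    i.e. all regressors dated t or earlier, flattened into a row.
    """
    T = x_i.shape[0]
    blocks = [x_i[: t + 1].ravel()[None, :] for t in range(T)]   # 1 x (t+1)k row blocks
    return block_diag(*blocks)

# toy example: T = 3 periods, k = 2 regressors
x_i = np.arange(6, dtype=float).reshape(3, 2)
W_i = sequential_instruments(x_i)
print(W_i.shape)      # (3, 12): one row per period, 1k + 2k + 3k = 12 columns
print(W_i)
```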
The orthogonality conditions from the equal off-diagonals of the random effects covariance matrix are E(h_1i'u_i) = 0. To simplify the discussion, we consider T = 3. We focus on the moment condition E(h_1i^{a'}u_i) = 0 and ask whether it is redundant. We lose no generality from this simplification, since the moment conditions E(h_1i^{a'}u_i) = 0 and E(h_1i^{b'}u_i) = 0 represent the same sort of equal off-diagonal restrictions on the covariance matrix. The moment condition E(h_1i^{a'}u_i) = 0, given the initial instruments w_i, is redundant iff

E[(h_1i^a + h_2i^a)'x_i] = E(h_1i^{a'}u_iu_i'w_i)[E(w_i'u_iu_i'w_i)]^{-1}E(w_i'x_i).   (4.4)

Recall that h_1i^a = (u_i3, −u_i3, 0)' and h_2i^a = (0, 0, Δu_i1)'. Thus, the LHS becomes E[φ_i(x_i1 − x_i2)] + E[(ε_i1 − ε_i2)x_i3]. Straightforward algebra using the matrix inverse lemma and the identity E(x_i1°'x_i2°)[E(x_i2°'x_i2°)]^{-1}E(x_i2°'x_i1°) = E(x_i1°'x_i1°) shows that the RHS of condition (4.4) is a combination of terms involving E(φ_i x_it) and the linear projections of x_i3 on x_i1° and x_i2°. Equality between the two sides does not hold unless E(φ_i x_it) = 0 and E(ε_i1 x_i3) = E(ε_i2 x_i3) = 0, conditions that make both sides zero. Thus, the moment condition E(h_1i^{a'}u_i) = 0 is not redundant under Assumptions 4.3 and 4.4.

Nonlinear optimization is necessary in general to implement the moment condition E(h_1i'u_i) = 0, since both E(h_1i^{a'}x_i) and E(h_2i^{a'}x_i) are non-zero. However, E(h_2i^{b'}x_i) = 0, so that the linear GMM estimator using ĥ_1i^b as instrumental variables is asymptotically equivalent to the nonlinear GMM that uses the same moment condition. Note that we constructed the ½T(T−1)−1 instruments in h_1i (or h_2i) by first equalizing the elements in each row of (2.13a), and then constructing the T−1 instruments of the second block by equalizing the elements in the columns. It is not hard to see that the column dimensions of the two blocks are reversed if we instead equalize all the elements in each column of (2.13a) first. Thus, ½T(T−1)−1 moment conditions can be implemented without numerical optimization.
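The "two arrangements" device used throughout this discussion can be seen in a few lines of code (a sketch of the general idea only; the exact h_1i and h_2i layouts in the thesis may group the conditions differently): a product condition such as E[u_is(u_it − u_{it+1})] = 0 can be written as E(h'u_i) = 0 either by placing ±u_is at the two differenced positions or by placing the difference u_it − u_{it+1} at position s, and the two versions are identical as functions of u_i but have different correlations with x_i.

```python
import numpy as np

# Two ways of writing the quadratic moment u_is * (u_it - u_{it+1}) as h'u (sketch, T = 4).
rng = np.random.default_rng(3)
u = rng.normal(size=4)
s, t = 3, 0                     # the condition u_i4 * (u_i1 - u_i2), in 0-based indexing

h_level_at_diff = np.zeros(4)   # arrangement 1: the level error u_is sits at the differenced positions
h_level_at_diff[t] = u[s]
h_level_at_diff[t + 1] = -u[s]

h_diff_at_level = np.zeros(4)   # arrangement 2: the difference u_it - u_{it+1} sits at position s
h_diff_at_level[s] = u[t] - u[t + 1]

print(h_level_at_diff @ u, h_diff_at_level @ u)                     # identical numbers
print(np.isclose(h_level_at_diff @ u, u[s] * (u[t] - u[t + 1])))    # True
```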
One might still suspect that the efficiency gains from the random effects covariance restrictions come from allowing correlations between the regressors and the time-constant error φ_i, rather than from the covariance structure itself. To provide a firmer sense that the covariance structure plays an important role in the redundancy of the moment conditions from covariance restrictions, we consider one more model, under a quite strong set of assumptions, which would not be of much practical use. We add the assumptions

ASSUMPTION 4.5: E(φ_i|x_i, ε_i) = 0;

ASSUMPTION 4.6: E(φ_i²|x_i, ε_i) = σ_φ².

Assumption 4.5 is like the random effects assumption in the strictly exogenous model, and Assumption 4.6 is an assumption of no conditional heteroskedasticity. Obviously, Assumptions 4.5 and 4.6 exclude dynamic models. Now, the only difference between the model under Assumptions 4.3 − 4.6 considered here and the model under Assumptions 4.1 − 4.2 considered in the last subsection is in the covariance structure. The initial set of instruments is the same, that is, w_i = diag(x_i1°, x_i2°,···,x_iT°). For simplicity, we again consider the case T = 3 and the moment condition E(h_1i^{a'}u_i) = 0. Condition (4.4) is the redundancy condition, given the initial instruments w_i. The LHS of (4.4) is E[(h_1i^a + h_2i^a)'x_i] = E[(ε_i1 − ε_i2)x_i3]. For the RHS of (4.4), we have

E(h_1i^{a'}u_iu_i'w_i) = [0   0   (σ_φ² + σ_ε²)E{(ε_i1 − ε_i2)x_i3°}],

E(w_i'x_i) = [E(x_i1°'x_i1), E(x_i2°'x_i2), E(x_i3°'x_i3)]',

and

E(w_i'u_iu_i'w_i) = σ_ε²E(w_i'w_i) + σ_φ²E(w_i'e_T e_T'w_i),

where e_T is the T-vector of ones. Though it is onerous to invert E(w_i'u_iu_i'w_i), it is not hard to see that the equality in (4.4) holds when the off-diagonal blocks of E(w_i'u_iu_i'w_i) are zero, since P(x_it°)x_it = x_it, t = 1,2,3; that is the case we considered in the last section. Thus, the covariance restrictions become useful once we allow for the time-constant error. Note again that the only difference between this model and the model studied in the last section is in the structure of the covariance matrix. Given weak exogeneity of the regressors, the appearance of the time-constant error breaks the block diagonality of the optimal weighting matrix and makes GLS inconsistent.

In many of the rational expectations models, MA(1) serial correlation of the errors has been detected (e.g., Keane and Runkle, 1990; 1992; Runkle, 1991). As we discussed in Chapter 3, we do not have to shrink the set of instruments if the serial correlation is caused by the time lag in observing past shocks. Then the set of instruments in those models and in the model we are dealing with here is the same, apart from the instruments derived from covariance restrictions. However, GLS is not consistent under the MA(1) error structure anyway. Thus, we conjecture that covariance restrictions are useful in those models. Also, numerical optimization will not be necessary to realize the efficiency gains from covariance restrictions in those models.

5. CONCLUSION

GMM offers a new perspective on instrumental variables: to be useful, they do not have to be correlated with the regressors, as long as they are correlated with the squared error sequence {u_it²}. It is interesting to ask how much estimators can be improved by using such instruments. Finding that kind of instrumental variable outside the models we are interested in would be unusual but, as Sections 3 and 4 show, residuals generated from initial consistent estimators can play that role, and generally they are useful when the error distribution is not symmetric.

From Section 4, we know that diagonality of the covariance matrix is crucial for GLS to be consistent and efficient when the regressors are weakly exogenous. However, there are models, though not of practical importance, where GLS is consistent but covariance restrictions are always useful. Suppose a model where the regressors are only contemporaneously uncorrelated with the errors (known as the contemporaneously uncorrelated model) and Σ is diagonal; then GLS is consistent. But conditional heteroskedasticity is guaranteed in this case, more instruments generally are useful if they exist, and covariance restrictions are useful as well. This argument is directly related to Wooldridge (1993) and Chapter 3.

We considered only the redundancy of the moment conditions from the second moments of the errors, and it turns out that the conditional third moment conditions of the errors are crucial. If the covariance restrictions are not useful because the conditional third moment conditions of the errors are met, then those third moment conditions in turn become a new set of moment conditions, and they, to be redundant, would require a certain set of conditional fourth moment conditions of the errors, and so on. Though we do not pursue the redundancy of higher moments here, we conjecture that unless the errors are conditionally normal, higher moment conditions will matter at some point. For example, consider the moment conditions E(ε_it³) = 0, t = 1,···,T. It is not hard to show that these moment conditions are not redundant unless E(ε_it⁴) = 3σ⁴, which holds when the errors are drawn from a normal distribution.
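The point about normality can be checked directly (a minimal sketch of my own; the centered chi-square alternative is just one convenient non-normal choice): for normal errors the third moment is zero and the fourth equals 3σ⁴, so the higher-order conditions carry no extra information, while a skewed error violates both.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

normal = rng.normal(scale=2.0, size=n)
skewed = rng.chisquare(df=3, size=n) - 3.0       # centered chi-square(3): skewed, heavy-tailed

for name, e in [("normal", normal), ("centered chi-square", skewed)]:
    s2 = e.var()
    print(f"{name:20s}  E(e^3) = {np.mean(e**3):8.3f}   E(e^4) - 3*sigma^4 = {np.mean(e**4) - 3 * s2**2:8.3f}")
```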
CHAPTER FIVE

CONCLUDING REMARKS

Finding additional moment conditions, and the conditions under which certain moment conditions are superfluous, in panel data models has been an important branch of research; Hausman and Taylor (1981), Amemiya and MaCurdy (1986), Breusch, Mizon and Schmidt (1989), Anderson and Hsiao (1981), Holtz-Eakin, Newey and Rosen (1988), Arellano and Bover (1990), Schmidt, Ahn and Wyhowski (1992), and Ahn and Schmidt are examples of contributions. The results in this thesis unify and extend results in several of these papers. One consequence of the analysis is the emergence of some new estimators that either exploit redundancy results or exploit new useful orthogonality conditions.

Another important line of research is specification testing. In most applications, people presume that the covariance matrix takes the random effects form, and many existing tests that are suitable for the panel data framework focus on testing whether the time-constant unobserved effects are correlated with the explanatory variables (Chamberlain, 1982; Holtz-Eakin, 1986; Jakubson, 1991). However, as we discussed in Chapter 3, in weakly exogenous models it is highly probable that the moment conditions depend on the structure of the covariance matrix, and, as we showed in Chapter 2, in strictly exogenous models the redundancy of moment conditions hinges heavily on the structure of the covariance matrix. Thus, testing the structure of covariance matrices will be quite useful and necessary in many cases.

While the over-identification test (Sargan, 1958; Hansen, 1982) and the Hausman test (Hausman, 1978) are directly applicable for testing the covariance structure, these tests require the estimators that use the moment conditions from covariance restrictions and therefore, in general, will involve numerical optimization. There might be simpler ways of testing the structure of the covariance matrix. Arellano and Bond (1991) devised a test statistic for the null σ_st = 0, s ≠ t, in dynamic models, where σ_st is the (s,t) element of the covariance matrix of the differenced errors. This direct test can be generalized to general weakly exogenous models, and also to strictly exogenous models. Many more test statistics could be devised; for example, it would be useful to have a simple statistic that jointly tests the null that the covariance matrix is of the random effects form.

In addition, these direct tests of covariance structure, combined with the Hausman test or with the GMM over-identification test, would lead to even sharper conclusions. For example, suppose we obtain a conflicting result in the rational expectations model considered in Chapter 3: the null of MA(1) serial correlation of the time-varying errors cannot be rejected by the direct test, but the Hausman test cannot reject the hypothesis that the instruments are valid, even though those instruments are supposed to be invalid in the presence of MA(1) serial correlation. Then this result leads to the conclusion that the serial correlation is due to the time lag until the shock is observed. We leave these topics for future work.

REFERENCES

Ahn, S.C. (1990), "Three Essays on Share Contracts, Labor Supply, and the Estimation of Models for Dynamic Panel Data," unpublished Ph.D. dissertation, Michigan State University.
Ahn, S.C. and P. Schmidt (1991), "Generalized Least Squares Estimation and Specification Test for Panel Data Models," unpublished manuscript.

Ahn, S.C. and P. Schmidt (1992), "Efficient Estimation of Models for Panel Data," Journal of Econometrics, forthcoming.

Ahn, S.C. and P. Schmidt (1993), "A Separability Result for GMM Estimation, with Application to GLS Prediction and Conditional Moment Tests," Econometric Reviews, forthcoming.

Amemiya, T. (1977), "The Maximum Likelihood and the Nonlinear Three-Stage Least Squares Estimator in the General Nonlinear Simultaneous Equation Model," Econometrica, 45, 955-968.

Amemiya, T. (1985), Advanced Econometrics, Basil Blackwell.

Amemiya, T. and T.E. MaCurdy (1986), "Instrumental-Variable Estimation of an Error-Components Model," Econometrica, 54, 869-880.

Anderson, T. and C. Hsiao (1981), "Estimation of Dynamic Models with Error Components," Journal of the American Statistical Association, 76, 598-606.

Arellano, M. and S. Bond (1991), "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations," Review of Economic Studies, 58, 277-297.

Arellano, M. and O. Bover (1990), "Another Look at the Instrumental Variable Estimation of Error-Components Models," Review of Economic Studies, forthcoming.

Bhargava, A. and J.D. Sargan (1983), "Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods," Econometrica, 51, 1635-1659.

Bowden, R.J. and D.A. Turkington (1984), Instrumental Variables, New York, Cambridge University Press.

Breusch, T.S., G.E. Mizon and P. Schmidt (1989), "Efficient Estimation Using Panel Data," Econometrica, 57, 695-701.

Chamberlain, G. (1982), "Multivariate Regression Models for Panel Data," Journal of Econometrics, 18, 5-46.

Chamberlain, G. (1987), "Asymptotic Efficiency in Estimation with Conditional Moment Restrictions," Journal of Econometrics, 34, 305-334.

Chamberlain, G. (1992a), "Comment: Sequential Moment Restrictions in Panel Data," Journal of Business and Economic Statistics, 10, 20-26.

Chamberlain, G. (1992b), "Efficiency Bounds for Semiparametric Regression," Econometrica, 60, 567-596.

Cragg, J.G. (1983), "More Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form," Econometrica, 51, 751-763.

Hansen, L.P. (1982), "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029-1054.

Hausman, J.A. (1978), "Specification Tests in Econometrics," Econometrica, 46, 1251-1272.

Hausman, J.A. and W.E. Taylor (1981), "Panel Data and Unobservable Individual Effects," Econometrica, 49, 1377-1398.

Hausman, J.A., W.K. Newey and W.E. Taylor (1987), "Efficient Estimation of Simultaneous Equation Models with Covariance Restrictions," Econometrica, 55, 849-874.

Hayashi, F. and C. Sims (1983), "Nearly Efficient Estimation of Time Series Models with Predetermined, but Not Exogenous, Instruments," Econometrica, 51, 783-792.

Holtz-Eakin, D.W. (1986), "Testing for Individual Effects in Autoregressive Models," Journal of Econometrics, 39, 297-307.

Holtz-Eakin, D., W. Newey and H.S. Rosen (1988), "Estimating Vector Autoregressions with Panel Data," Econometrica, 56, 1371-1396.

Hsiao, C. (1986), Analysis of Panel Data, New York, Cambridge University Press.

Jakubson, G. (1991), "Estimation and Testing of the Union Wage Effect Using Panel Data," Review of Economic Studies, 58, 971-991.
Runkle (1990), "Testing The Rationality of Price Forecasts: New Evidence form Panel Data," American Eeonemie Beyiew, 80, 714-735. Keane, M.P. and D.E. Runkle (1992), "On The Estimation of Panel Data Models with Serial Correlation When Instruments Are Not Strictly Exogenous," Somrnal of Snsiness and Economic Statisties, 10, 1-9. Kiefer, N.M. (1980), ”Estimation of Fixed Effects Models for Time Series of Cross Sections with Arbitrary Intertemporal Covariance," Jou nal of Econ m trics, 14, 195-202. Newey, W.K. and D. McFadden (1993), "Estimation in Large Samples," Handbook of Economet 'cs, Vol.4, forthcoming. Rothenberg, T.J. and C.T. Leenders (1964): "Efficient Estimation of Simultaneous Equations Systems," Econometrica, 32, 57-76. Runkle, D.E. (1991), "Liquidity Constraints and The Permanent Income Hypothesis: Evidence from Panel Data," Sournal of Monetary Economics, 97, 73-98. Sargan, J.D. (1958), "The Estimation of Economic Relations Using Instrumental Variables," Econometri a, 26, 393- 415. Schmidt, P. (1990), "Three-Stage Least Squares with Different Instruments for Different Equations," Journal er Econometrics, 43, 389-394. Schmidt, P. (1990), Lecture notes. Schmidt, P., S.C. Ahn and D. Wyhowski (1992), "Comment," Semrnel of Susiness ang Eeonomie Stetisties, 10, 10-14. Theil, H. (1971), Erineipies ef Sgongmetries, John Wiley & Sons. White, H. (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and A Direct Test for Heteroskedasticity," Econometrica, 48, 817-838. 129 White, H. (1982), "Instrumental Variables Regression with Independent Observations," Econometrica, 50, 483-499. White, H. (1984), Asymptotic Theory for Econometricians, Orlando, Academic Press. White, H. (1986), "Instrumental Variables Analogs of Generalized Least Squares Estimators," Advances in Statistical Analysis and Statistical Com utin , 1, 173- 227. Wooldridge, J.M. (1992), "System Estimation Procedures," Lecture notes. Wooldridge, J.M. (1993), "Estimating Systems of Equations with Different Instruments for Different Equations," unpublished manuscript. HICHIG‘IN STATE UNIV. LIBRRRIE ES