This is to certify that the dissertation entitled

INSTRUMENTAL-VARIABLE ESTIMATION OF A PANEL DATA MODEL

presented by

Donald J. Wyhowski

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Economics.

Major professor

Date: July 15, 1988

INSTRUMENTAL-VARIABLE ESTIMATION OF A PANEL DATA MODEL

By

Donald J. Wyhowski

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

1988
ABSTRACT

INSTRUMENTAL-VARIABLE ESTIMATION OF A PANEL DATA MODEL

By

Donald J. Wyhowski

This dissertation involves the estimation of a linear regression model in the presence of panel data. My research develops appropriate econometric techniques for such models, under differing assumptions about the correlation between the explanatory variables and the (unobserved) effects. My three major contributions are as follows.

First, I have extended the analysis of Hausman and Taylor (1981) to a model containing individual and time effects correlated with some or all of the regressors, under the assumption of large N and small T. I consider random individual and time effects, and allow the regressors to be correlated or not with either or both types of effects.

Second, I have extended the analysis of Hausman and Taylor to a single equation in a simultaneous equations system; that is, to a regression model in which some of the regressors are correlated with the random noise component of the error. I propose 2SLS estimators based on instrument sets proposed by Hausman and Taylor, Amemiya and MaCurdy (1986), and Breusch, Mizon, and Schmidt (1987).

Third, the dissertation proposes full-information (3SLS) estimators for a simultaneous equations system with random individual effects correlated with some or all of the exogenous variables. These estimators are shown to reduce to the usual fixed-effects treatment if all exogenous variables are correlated with the effects, and to reduce to an estimator previously proposed by Baltagi (1981) if none of the exogenous variables are correlated with the effects. I also consider the case in which some exogenous variables may be correlated with the effects in some equations but not in others, so that the available instrument set varies from equation to equation.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to the contributors to this dissertation. Peter Schmidt, the Committee Chairman, offered encouragement and helpful suggestions at all stages of my research, and without his contribution this dissertation would not have been possible. I would also like to thank the other members of the committee, who read this dissertation in its entirety and provided useful comments.

TABLE OF CONTENTS

Chapter
1. Introduction
2. Individual Effects but no Time Effects
   2.1 Introduction
   2.2 Geometry
   2.3 Fixed Effects
   2.4 Random Effects not Correlated with the Regressors
       2.4.1 Within Estimation
       2.4.2 Generalized Least Squares Estimation
       2.4.3 Weighted Least Squares Estimation
   2.5 Random Effects Correlated with the Regressors
   2.6 Variance Estimation when the Random Effects are not Correlated with the Regressors
       2.6.1 Counting Rules for Identification
       2.6.2 Estimation of q and r
   2.7 Variance Estimation when the Random Effects are Correlated with the Regressors
       2.7.1 Necessary Conditions for the Existence of d_IV
       2.7.2 Consistency of d_IV
       2.7.3 A Consistent Estimate of r
   2.8 Conclusions
3. Individual and Time Effects
   3.1 Introduction
   3.2 Geometry
   3.3 Fixed Effects
   3.4 Random Effects, not Correlated with the Regressors
       3.4.1 Within Estimation
       3.4.2 Generalized Least Squares Estimation
       3.4.3 Weighted Least Squares Estimation
   3.5 Random Effects, Correlated with the Regressors
       3.5.1 Weighted Least Squares Estimation
       3.5.2 Counting Rules for Identification
   3.6 Random Effects when T is Fixed
       3.6.1 Random Effects not Correlated with the Regressors
       3.6.2 Random Effects Correlated with the Regressors
       3.6.3 An Alternative Approach to Estimation
   3.7 Variance Estimation when the Random Effects are not Correlated with the Regressors
   3.8 Variance Estimation when the Random Effects are Correlated with the Regressors
       3.8.1 Necessary Conditions for the Existence of d_IV and c_IV
       3.8.2 Consistency of d_IV and c_IV
       3.8.3 Consistent Estimates of q and r
   3.9 Conclusions
4. Simultaneous Equations with Effects
   4.1 Introduction
   4.2 Single Equation Estimation
       4.2.1 Two-Stage Least Squares
       4.2.2 An Orthogonality Derivation of the 2SLS Estimator
       4.2.3 Baltagi's Error-Component Two-Stage Least Squares Estimator
   4.3 System Estimation
       4.3.1 Three-Stage Least Squares
       4.3.2 Instrumental Variables Estimation
       4.3.3 3SLS with Different Instruments
   4.4 Special Cases
   4.5 Conclusions
5. Conclusion
Bibliography

CHAPTER 1

Introduction

In this thesis, we consider the estimation of a linear regression model using panel data. Following the usual practice in the literature, we assume that the data consist of T time-series observations on each of N individuals. Models using panel data present the possibility that some of the explanatory variables may be constant over either of the two indices (t or i), and that such variables may be unobservable. Such unobservable time-invariant and individual-invariant variables are called individual effects and time effects, respectively. Our research will develop appropriate econometric techniques for panel data models, under differing assumptions about the correlation between the explanatory variables and the (unobserved) effects.

It is commonly argued (e.g., Theil (1972), p. 104) that the stochastic disturbance in the usual regression model reflects the joint influence of the variables not included in the model. In the case of panel data, the individual effects would represent the influences of those neglected variables which are time-invariant, and similarly the time effects would represent the influences of those neglected variables which are individual-invariant. Clearly, least squares applied to a model including either type of effect will be biased if these neglected variables are correlated with the included regressors, and we therefore will distinguish different treatments of the model which vary according to the nature of the correlation between the regressors and the effects.

The literature on panel data has covered separately models with individual effects and models with individual and time effects.
One strand of the literature has assumed the effects to be fixed or, more or less equivalently, to be correlated with all of the regressors. The point of the model then is to remove the potential bias caused by correlation of the regressors with omitted time-invariant or individual-invariant variables. A second strand of the literature has viewed the effects as being random and uncorrelated with the regressors. This direction of thought includes the textbook treatment of the error-components model as well as the work of Baltagi (1981). A third direction of thought assumes the effects to be random but allows for the possibility of correlation between the effects and some of the regressors. Recent papers by Hausman and Taylor (1981), Amemiya and MaCurdy (1986), and Breusch, Mizon and Schmidt (1987) have considered the case in which the individual effects (the time-invariant error component) are correlated with explanatory variables, and have proposed different instrumental variables estimators. However, with the exception of Amemiya and MaCurdy, none of these papers considers the case in which some of the explanatory variables are endogenous (in the sense of being correlated with the noise component of the error, as well as with the individual effect). Furthermore, Amemiya and MaCurdy consider only limited information (2SLS) estimation, and their model is restrictive in some ways that ours is not.

The first concern of this dissertation is to extend the analysis for the case when effects are allowed to be correlated with some of the regressors to the case when time effects are present as well as individual effects. The HT, AM, and BMS articles all consider a model in which there are individual effects but no time effects, so that the error has only two components; that is, their error term is of the form s_it = u_i + e_it, where u_i is the individual effect and e_it is the random noise. As pointed out above, the earlier literature on panel data also considered prominently the case in which the error contains a time effect, v_t, as well, so that the error, s_it = u_i + v_t + e_it, contains three components. In this dissertation, we extend the results of HT, AM, and BMS to the three-component case. This leads to different sets of allowable instruments than they use, and to some interesting results on how many and what kinds of exogeneity assumptions must be made to estimate the model. The analysis is done mostly under the assumption that both the number of individuals (N) and the number of time periods (T) are large, so that asymptotic properties of the estimators are derived as both N and T approach infinity. However, we will include a separate treatment of the case in which N is large but T is small, the common assumption in the two-component case. This leads to some novel estimators in the three-component case.

The second concern of this dissertation is the problem of simultaneity. We consider the usual simultaneous equations model, but with panel data and with an unobservable individual effect in every structural equation.
The basic point of view in this thesis, motivated by an argument given earlier by Breusch, Mizon and Schmidt (1985), is that all variables correlated with the noise should also be correlated with the individual effects, but not conversely. This is a natural extension of the point of view in Hausman and Taylor, and it can be justified by consideration of a system in which every structural equation contains individual effects. It leads to a classification of the explanatory variables into three types: endogenous, meaning correlated with noise and individual effects; singly exogenous, meaning uncorrelated with noise but possibly correlated with individual effects; and doubly exogenous, meaning uncorrelated with both individual effects and noise. Several estimators are derived, which are natural generalizations both of the HT estimators and of the usual two-stage least squares estimator.

Third, this dissertation generalizes the estimators from the single-equation literature just cited to full information (3SLS) estimators. These estimators reduce to the fixed effects estimators when all exogenous variables are correlated with the effects, and they reduce to previous estimators for the random effects model when none of the exogenous variables are correlated with the effects. In addition, we discuss the case in which different variables are correlated with the effects in different structural equations.

The plan of this thesis is as follows. In chapter two, we survey the existing literature, review the geometry which is used in our subsequent analysis, and introduce a new approach for the analysis of regression models with panel data. This approach proves to be useful in the analysis of models with both individual effects and time effects, the topic of chapter three. We then consider the fixed effects model, in which the individual effects are treated as fixed parameters to be estimated; the random effects model, in which the individual effects are treated as random and uncorrelated with the regressors; and the model of HT, in which the individual effects are treated as random but potentially correlated with the regressors. We also consider the problem of consistent estimation of the variances of the noise and the individual effects. Such estimates are necessary to implement the generalized least squares estimators considered there.

In chapter three, we extend the linear regression model considered in chapter two to include unobservable individual-invariant time effects; we then apply the HT method of instrumental variables estimation to this extended model and derive the resulting estimator. The analysis of the regression models considered in that chapter is done using the approach introduced in chapter two. We then consider the fixed effects model, in which the individual and time effects are treated as fixed parameters to be estimated, and the random effects model, in which both the individual and time effects are treated as random and uncorrelated with the regressors. We also consider an extended version of HT, in which both the individual and time effects are treated as random but potentially correlated with the regressors.
Since many currently available panel data sets are characterized by having many cross-sectional observations but only relatively few time periods, we then consider the previous two models for the case when N is large but T is fixed. Finally, we consider the problem of consistent estimation of the variances of the noise, the individual effects, and the time effects. Such estimates are necessary to implement the feasible weighted least squares estimator considered.

In chapter four, we consider the usual simultaneous equations model, but with panel data and with an unobservable individual effect in each structural equation. We then consider a natural extension of the HT model by allowing some of the explanatory variables to be correlated with the individual effects. We apply the HT method of instrumental variables estimation, derive the resulting limited information (2SLS) and full information (3SLS) estimators, and discuss their relative efficiency. In addition, we provide a survey of the current literature on simultaneous equations with effects and translate previous estimators into the notation of this thesis. Finally, we consider an interesting problem which arises for the linear simultaneous equations model with effects when some of the variables are correlated with the individual effects; namely, the instruments need not be the same for every equation.

In chapter five, we summarize our results and make suggestions for future directions of research.

CHAPTER 2

Individual Effects but no Time Effects

2.1 Introduction

In this chapter, we consider the estimation of a linear regression model using panel data. Following the usual practice in the literature, we assume that the data consist of T time-series observations on each of N individuals; we distinguish regressors which vary over time and individuals from those which vary over individuals but are time-invariant; and we assume the presence of unobservable, time-invariant individual effects as well as the usual statistical noise. In chapter 3, we will extend this model to include unobservable time effects.

We write the model to be considered in this chapter as

  (2.1.1)  y_it = X_it B + Z_i D + u_i + e_it,   i = 1,...,N; t = 1,...,T,

where y_it is the dependent variable, X_it is a vector (of dimension 1 x g) of time-varying explanatory variables, Z_i is a vector (of dimension 1 x k) of time-invariant explanatory variables, and B and D are vectors of parameters to be estimated. The errors e_it are iid with mean zero and variance σ_e². The individual effects u_i are unobservable, and various assumptions about them will be made. However, in all cases they will be treated as time-invariant.

It is commonly argued (e.g., Theil (1972), p. 104) that the stochastic disturbance in the usual regression model reflects the joint influence of variables not included in the model. In the case of panel data, the individual effects (our u_i) would represent the influences of those neglected variables which are time-invariant.
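To fix ideas, the following minimal simulation generates data from model (2.1.1), with the individual effect u_i built from a neglected time-invariant variable that also enters the observed regressors. All dimensions, parameter values, and the correlation mechanism are illustrative rather than taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5            # illustrative panel dimensions
beta, delta = 1.0, 0.5   # illustrative coefficients on X and Z

# A neglected time-invariant variable a_i drives the individual effect u_i
# and is also correlated with the observed regressors X and Z.
a = rng.normal(size=N)
u = 0.8 * a + rng.normal(size=N)            # individual effect u_i
Z = a + rng.normal(size=N)                  # time-invariant regressor Z_i
X = a[:, None] + rng.normal(size=(N, T))    # time-varying regressor X_it
e = rng.normal(size=(N, T))                 # statistical noise e_it

# y_it = X_it*beta + Z_i*delta + u_i + e_it  (here g = k = 1)
y = beta * X + delta * Z[:, None] + u[:, None] + e
```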
Clearly, least squares (of y on X and Z) will be biased if these neglected variables are correlated with the included regressors, and we therefore will distinguish different treatments of the model which vary according to the nature of the correlation between the individual effects and the regressors.

The plan of this chapter is as follows. In section 2.2 we review the geometry which is used in our subsequent analyses. We then consider the estimation of the model under various assumptions. In section 2.3 we consider the fixed effects model, in which the individual effects are treated as fixed parameters to be estimated. The point of this model is to remove the potential bias caused by correlation of the regressors with omitted time-invariant variables. In section 2.4 we consider the random effects model, in which the individual effects are treated as random and uncorrelated with the regressors. Under these assumptions there is no problem of bias, and efficiency of estimation is our central concern. In section 2.5 we consider the model of Hausman and Taylor (1981), in which the individual effects are treated as random but potentially correlated with the regressors. Finally, in sections 2.6 and 2.7 we consider the problem of consistent estimation of the variances of the noise and the individual effects. Such estimates are necessary to implement the generalized least squares estimators considered in sections 2.4 and 2.5.

This chapter does not contain any new estimators. However, it provides a survey of the existing literature, and it introduces a new approach for the analysis of regression models with panel data. This approach will prove to be useful in the analysis of models with both individual effects and time effects, as we will see in chapter 3.

2.2 Geometry

A useful fact, and one to be used throughout the remainder of this chapter, is that the original equation (2.1.1) can be equivalently written as the two orthogonal equations

  (2.2.2)  (y_it - y_i.) = (X_it - X_i.)B + (e_it - e_i.)

  (2.2.3)  y_i. = X_i.B + Z_i D + u_i + e_i.,

where i = 1,...,N; t = 1,...,T; y_i. = (1/T) Σ_{t=1}^{T} y_it, X_i. = (1/T) Σ_{t=1}^{T} X_it, and e_i. = (1/T) Σ_{t=1}^{T} e_it. Equation (2.2.3) expresses the data in terms of its individual averages over time, while equation (2.2.2) expresses the data in terms of its deviations around the mean for each individual.

Writing equation (2.1.1) in matrix form, we have

  (2.2.4)  y = XB + ZD + u + e,

where y, u, and e denote (NT x 1) vectors, and X and Z denote (NT x g) and (NT x k) matrices, respectively. Following the convention of Hausman and Taylor (1981), the observations are ordered first by individual and then by time, so that u and each column of Z are (NT x 1) vectors consisting of N blocks, with each block containing T identical entries.

To achieve the same decomposition as was accomplished above, we define the two orthogonal projections

  (2.2.5)  P = I_N ⊗ (j_T j_T')/T   and   Q = I_NT - P,

where j_T = (1,...,1)' is a vector of ones, having dimension (T x 1). The transformation P determines the means for each of the individual groups and repeats each of these N observations T times. The transformation Q transforms each observation into the difference between itself and its respective individual group mean.
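A minimal numpy sketch of these two projections may help; the data vector is ordered by individual and then by time, as above, and the dimensions are illustrative.

```python
import numpy as np

N, T = 4, 3                                  # small illustrative dimensions
j_T = np.ones((T, 1))

P = np.kron(np.eye(N), j_T @ j_T.T / T)      # individual means, each repeated T times
Q = np.eye(N * T) - P                        # deviations from individual means

# P and Q are idempotent, mutually orthogonal, and sum to the identity.
assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)
assert np.allclose(P @ Q, np.zeros((N * T, N * T)))

y = np.random.default_rng(1).normal(size=N * T)   # ordered by individual, then time
y_bar = y.reshape(N, T).mean(axis=1)              # y_i. for each individual
assert np.allclose((P @ y).reshape(N, T), y_bar[:, None])
```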
Explicitly, the (i,t) elements of Py and Qy can be written as

  (2.2.6)  (Py)_it = y_i.   and   (Qy)_it = y_it - y_i.,

respectively. Since Z contains variables that are constant across all time-series observations for a given individual, QZ = 0. The elements of the columns of Z are, on the other hand, unaffected by the transformation P; that is, PZ = Z. Analogous results hold true for the individual effects u; i.e., Qu = 0 and Pu = u. Thus, the original equation (2.2.4) can now be written equivalently as the two orthogonal equations

  (2.2.7)  Qy = QXB + Qe

  (2.2.8)  Py = PXB + ZD + u + Pe.

2.3 Fixed Effects

In this section, we discuss the estimation of the linear regression equation (2.2.4) when the individual-specific effects are treated as fixed constants. The standard approach is to use individual dummy variables as regressors and then to apply least squares. This yields the following estimator for B:

  (2.3.1)  b_W = (X'QX)^{-1}X'Qy.

The estimator b_W is the familiar within-group estimator; it uses only the variation within each group. This estimator is sometimes called the covariance estimator, since the regression just described is in fact the usual analysis of covariance. The estimator is unbiased, and it is consistent as either N or T (or both) approaches infinity. These are all well-known results; see, for example, Judge et al. (1985, p. 329).

A problem with this estimation procedure is that it is not possible to obtain estimates of the coefficients of the time-invariant regressors (Z). Any time-invariant regressor is perfectly collinear with the individual dummy variables; equivalently, it is removed by the transformation of the data to deviations from individual means. If the original model contained no time-invariant regressors, the estimated coefficients of the individual dummy variables are

  (2.3.2)  u_W = Py - PXb_W,

and these estimates of the individual effects are consistent as T approaches infinity. If the original model contained time-invariant regressors, then u_W defined above is interpreted as an estimate of (ZD + u) rather than of just u.

An equivalent derivation of the within estimator b_W is to define it as the least squares estimator in equation (2.2.2), ignoring (2.2.3). Similarly, the estimator u_W is least squares applied to (2.2.3), after setting B = b_W and ignoring the time-invariant variables Z. Using only one part of equation (2.1.1), namely equation (2.2.2), when estimating B has the advantage of being computationally more convenient than estimating the whole of equation (2.1.1). This approach also makes explicit the statement that b_W ignores the between-group variation; i.e., it ignores the cross-sectional variation in equation (2.2.3).

2.4 Random Effects not Correlated with the Regressors

In the previous section, we discussed the estimation of a linear regression model when the individual effects (the u_i) are treated as fixed constants. In this section, we treat the individual effects similarly to the way we treat the error term e_it; we assume the u_i to be random variables. The N individuals are now to be interpreted as being drawn from some larger population, and so the effects u_i can be viewed as a random sample from some distribution. We assume specifically that the u_i are iid with mean zero and variance σ_u². We also assume that X and Z are uncorrelated with u.
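The within estimator of section 2.3 uses the data only through the Q transformation, so it is computed the same way whichever view of the u_i is taken; a minimal sketch, continuing the construction above (shapes and parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, g = 100, 6, 2
X = rng.normal(size=(N * T, g))
u = rng.normal(size=N)                       # fixed or random: b_W is unchanged
y = X @ np.array([1.0, -0.5]) + np.repeat(u, T) + rng.normal(size=N * T)

Q = np.eye(N * T) - np.kron(np.eye(N), np.ones((T, T)) / T)
Xq, yq = Q @ X, Q @ y
b_w = np.linalg.solve(Xq.T @ Xq, Xq.T @ yq)  # equation (2.3.1): (X'QX)^{-1}X'Qy
```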
The model is written as

  (2.4.1)  y_it = X_it B + Z_i D + u_i + e_it = X_it B + Z_i D + s_it,   i = 1,...,N; t = 1,...,T.

The variance of y_it, conditional on X_it and Z_i, is

  (2.4.2)  var(y_it) = var(s_it) = σ_u² + σ_e².

The variances σ_u² and σ_e² are sometimes called variance components; each is itself a variance as well as a component of the error variance, var(s_it). Similarly, the errors u_i and e_it are sometimes called error components. Therefore, this model is often referred to as an error-components or variance-components model.

The presence of the random effects u_i in the disturbance term results in correlation among the errors of the same cross-sectional unit, although the errors from different cross-sectional units are independent. This can be made explicit if we let s_i denote the (T x 1) error vector (s_i1,...,s_iT)'. The covariance matrix of s_i is then the matrix

  (2.4.3)  Cov(s_i) ≡ Σ_i = σ_e² I_T + σ_u² (j_T j_T'),

where j_T = (1,...,1)' is a (T x 1) vector of ones. Thus, this correlation of the errors at the individual level is constant over time and is identical for all individuals.

2.4.1 Within Estimation

The within-group estimator can be used regardless of whether the u_i's are viewed as fixed constants or as random variables. The within estimator of B can be viewed as least squares applied to equation (2.2.2), and the individual effects do not appear in this equation. So, whether the u_i are treated as nonstochastic or stochastic, the estimator b_W is still unbiased and consistent. However, as pointed out by Hsiao (1986), the within estimator is inefficient when the effects are random and uncorrelated with the regressors.

2.4.2 Generalized Least Squares Estimation

As was shown above, since the s_it in different time periods but for the same individual all contain u_i, the errors in the equation

  (2.4.4)  y_it = X_it B + Z_i D + u_i + e_it = X_it B + Z_i D + s_it,   i = 1,...,N; t = 1,...,T,

are autocorrelated. Efficient estimation requires that we use the generalized least squares method. Following Hausman and Taylor (1981), we write

  (2.4.5)  S ≡ Cov(s) = Cov(u + e) = σ_e² I_NT + Tσ_u² P,

where σ_e² = var(e_it) and σ_u² = var(u_i). Since P and Q are both idempotent and orthogonal, it follows that, up to a factor of proportionality,

  (2.4.6)  S^{-1} = Q + c²P,

where c² = σ_e²/(σ_e² + Tσ_u²). Now, if we rewrite equation (2.2.4) as

  (2.4.7)  y = XB + ZD + u + e = RA + s,

where R = (X, Z) and A = (B', D')', and if we assume that σ_e² and σ_u² are known, the generalized least squares estimator of A from equation (2.4.7) is simply

  (2.4.8)  a_GLS = (R'S^{-1}R)^{-1}R'S^{-1}y.

Equivalently, the GLS estimator is ordinary least squares of (S^{-1/2}y) on (S^{-1/2}R). Again following Hausman and Taylor (1981), we can note that, again up to a factor of proportionality,

  (2.4.9)  S^{-1/2} = Q + cP = I_NT - (1 - c)P,

so that S^{-1/2}y = y - (1 - c)Py (and similarly for R). This transformation is what Hausman and Taylor call "(1 - c) differences." For example,

  (2.4.10)  (S^{-1/2}y)_it = y_it - (1 - c)y_i.,

and this differs from the within transformation to the extent that c ≠ 0.

2.4.3 Weighted Least Squares Estimation

As an alternative approach to the generalized least squares estimator of A, consider the equations which result from the decomposition of equation (2.4.7).
These orthogonal equations can be written as

  (2.4.11)  Qy = QRA + Qs

  (2.4.12)  Py = PRA + Ps.

First, we consider the following lemma, due to Mundlak (1978b), which is to be used throughout much of this thesis. It concerns the need to correct for the failure of the covariance matrix associated with the disturbance term to satisfy the ideal conditions.

Lemma (2.1): Suppose y = XB + s satisfies the ideal conditions except that Cov(y) = Cov(s) = S. Let M be an idempotent matrix other than the identity matrix, such that MSM = λM for some scalar λ > 0, and let y* = My and X* = MX. Consider the class of estimators of the form b_H = (X*'H^{-1}X*)^{-1}X*'H^{-1}y*, where H is any positive definite matrix. Then the least squares estimator b_I = (X*'X*)^{-1}X*'y*, obtained by setting H = I, is the minimum variance unbiased estimator of B within this class.

The point of the Lemma is as follows. We have y = XB + s and Cov(s) = S. The best (GLS) estimator of B certainly involves S. However, if we transform the equation by an idempotent matrix M:

  (2.4.13)  (My) = (MX)B + (Ms),

the best (minimum variance unbiased) estimator of B from this transformed equation is just OLS, which does not depend on S. This is relevant in the present context because we are dealing with equations transformed by the idempotent matrices P and Q.

We note that the covariance matrices associated with the errors in (2.4.11) and (2.4.12) may be written as

  (2.4.14)  Cov(Qs) = QSQ = qQ

and

  (2.4.15)  Cov(Ps) = PSP = rP,

respectively, where q = σ_e² and r = σ_e² + Tσ_u². Each of these two covariance matrices is of the form of a constant times an idempotent matrix. These two constants may be made the same by multiplying equations (2.4.11) and (2.4.12) by the weights 1/q* and 1/r*, respectively, where q* = q^{1/2} and r* = r^{1/2}. Moreover, it follows from Lemma (2.1) that least squares applied to the system so weighted yields the minimum variance unbiased estimator within the class containing all least squares estimators of the parameter vector A based on any further transformation of these equations or, in fact, of the original equation (2.4.7).

We will refer to the least squares estimator of A from the system of orthogonal equations

  (2.4.16)  (1/q*)Qy = (1/q*)QRA + (1/q*)Qs

  (2.4.17)  (1/r*)Py = (1/r*)PRA + (1/r*)Ps

as the weighted least squares estimator of A. This estimator may be written as

  (2.4.18)  a_WLS = (R'QR/q + R'PR/r)^{-1}(R'Q/q + R'P/r)y.

The decomposition of the original equation by the transformations P and Q has the effect of isolating the correlations found in the non-scalar covariance matrix S of its error vector within the corresponding orthogonal subspaces. Since these transformations are orthogonal and their sum is the identity matrix, equation (2.4.7) is said to have been reduced by the pair (Q, P) into the two orthogonal equations (2.4.11) and (2.4.12). Since this pair of equations contains exactly the same information as the original equation, we would expect the minimum variance unbiased estimator from the two equations to be equivalent to the generalized least squares estimator from the original equation. This result is stated in the following theorem.

Theorem (2.2): The weighted least squares estimator, a_WLS, is equal to the generalized least squares estimator, a_GLS.
Proof: The generalized least squares estimator of A from the equation y = RA + s, where Cov(s) = S, can be written as

  a_GLS = (R'S^{-1}R)^{-1}(R'S^{-1}y)
        = (R'{Q + c²P}R)^{-1}(R'{Q + c²P}y)
        = (R'{Q + (q/r)P}R)^{-1}(R'{Q + (q/r)P}y)   since c² = q/r
        = (R'{(1/q)Q + (1/r)P}R)^{-1}(R'{(1/q)Q + (1/r)P}y)
        = (R'QR/q + R'PR/r)^{-1}(R'Q/q + R'P/r)y.

Therefore a_WLS = a_GLS. Q.E.D.

Now least squares applied to equation (2.4.12) is called the between-group estimator; explicitly, it is a_B = (R'PR)^{-1}R'Py. It utilizes the cross-sectional variation in the individual means. Recall that the within-group estimator can be viewed as least squares applied to equation (2.4.11); it utilizes the variation within the individual groups. As Maddala (1971) has shown, the generalized least squares estimator can be viewed as an efficient combination of the within-group estimator and the between-group estimator. The optimal weights for the two different sets of variation are the constants used to normalize each of the equations, i.e., the reciprocals of the variances q = σ_e² and r = σ_e² + Tσ_u² for the respective equations (2.4.11) and (2.4.12).

The following two theorems concern alternative estimation procedures which yield the weighted least squares estimator defined above. For the first such derivation, consider again the equations resulting from the decomposition of the original equation (2.4.7). These orthogonal equations can be written as

  (2.4.19)  y* = R*A + s*,

where

  y* = [ Py ]    R* = [ PR ]    s* = [ P(e + u) ]
       [ Qy ],        [ QR ],        [ Qe       ].

Let

  (2.4.20)  S* ≡ Cov(s*) = [ rP   0  ]
                            [ 0   qQ ],

so S* denotes the singular covariance matrix associated with the error term of the above system. It is well known that any idempotent matrix is its own generalized inverse, and therefore the generalized inverse of the singular matrix S* is

  (2.4.21)  S*^+ = [ (1/r)P    0     ]
                   [   0     (1/q)Q  ].

Applying generalized least squares to (2.4.19), using the generalized inverse of the error covariance matrix, we arrive again at the weighted least squares estimator; this is stated formally in the following theorem.

Theorem (2.3): The generalized least squares estimator of A from equation (2.4.19) equals the weighted least squares estimator of A.

Proof: The generalized least squares estimator of A from equation (2.4.19) can be written as

  a_GLS* = (R*'S*^+R*)^{-1}R*'S*^+y* = (R'QR/q + R'PR/r)^{-1}(R'Q/q + R'P/r)y.

Thus a_WLS = a_GLS*. Q.E.D.

A second derivation follows the lines of Fuller and Battese (1974). We note that Cov(s) ≡ S = qQ + rP, and we consider the transformation of the original equation (2.4.7) using the matrix S^{-1/2}, where S^{-1/2} = (1/q*)Q + (1/r*)P, q* = q^{1/2}, and r* = r^{1/2}. The transformed equation can be written as

  (2.4.22)  S^{-1/2}y = S^{-1/2}RA + S^{-1/2}(s) = S^{-1/2}RA + S^{-1/2}(e + u).

Thus, using the Fuller and Battese expression for the covariance of the error term s, we have the following theorem.

Theorem (2.4): The least squares estimator of A from equation (2.4.22) is equal to the weighted least squares estimator of A.
Proof: The decomposition of equation (2.4.22) can be written as

  (2.4.23)  QS^{-1/2}y = QS^{-1/2}RA + QS^{-1/2}(e)

  (2.4.24)  PS^{-1/2}y = PS^{-1/2}RA + PS^{-1/2}(e + u).

The least squares estimator of A from this system is

  a_FB = (R'S^{-1/2}QS^{-1/2}R + R'S^{-1/2}PS^{-1/2}R)^{-1}(R'S^{-1/2}QS^{-1/2} + R'S^{-1/2}PS^{-1/2})y.

Since S^{-1/2} = Q/q* + P/r*, we have S^{-1/2}QS^{-1/2} = Q/q and S^{-1/2}PS^{-1/2} = P/r, so that

  a_FB = (R'QR/q + R'PR/r)^{-1}(R'Q/q + R'P/r)y.

As shown in equation (2.4.18), this is the weighted least squares estimator of A. Thus a_WLS = a_FB. Q.E.D.

2.5 Random Effects Correlated with the Regressors

In some applications of the error-components model, there may be reason to believe that the individual-specific unobservable effects found in the error term may, in fact, be correlated with some or all of the included explanatory variables. If we take the view suggested earlier, that the random effects represent omitted individual-specific variables, this correlation would seem inevitable. When there is correlation between the random effects and the explanatory variables, the generalized least squares estimator is biased and inconsistent. Indeed, Mundlak (1978a) takes the extreme view that such correlation is always present in the error-components model, and therefore rejects the generalized least squares estimator in favor of the within estimator.

However, Mundlak (1978a) considers the case in which the effects are correlated with all of the regressors. We consider instead the case treated by Hausman and Taylor (1981), in which the effects are correlated with some of the regressors. To consider this case, we first need to introduce some notation. Consider the equation

  (2.5.1)  y_it = (X1_it, X2_it)B + (Z1_i, Z2_i)D + u_i + e_it,

where X1_it represents the (1 x g1) vector of time-varying explanatory variables and Z1_i represents the (1 x k1) vector of time-invariant explanatory variables, both of which are assumed to be uncorrelated with both errors, u_i and e_it. The (1 x g2) vector of time-varying explanatory variables, X2_it, and the (1 x k2) vector of time-invariant explanatory variables, Z2_i, are both assumed to be correlated with u_i but uncorrelated with e_it. As before, both the random noise component, e_it, and the individual effects, u_i, are iid as well as independent of one another. The matrix form of equation (2.5.1) can be written as

  (2.5.2)  y = (X1, X2)B + (Z1, Z2)D + u + e = RA + s,

where y, u, and e are (NT x 1); X is (NT x g), g = g1 + g2; and Z is (NT x k), k = k1 + k2.

Now the method of instrumental variables has traditionally been viewed as the response to the problem of regressors correlated with the equation's disturbance term. In the present context, Hausman and Taylor (1981) propose an interesting variation on the usual instrumental variables estimator. Unlike the usual approach, their estimator is based on a set of instruments made up of regressors already present in the equation being estimated. First, they multiply equation (2.5.2) by S^{-1/2} = (Cov(s))^{-1/2} to transform the error term so that it has a scalar covariance matrix.
The transformed equation is simply

  (2.5.3)  S^{-1/2}y = S^{-1/2}(X1, X2)B + S^{-1/2}(Z1, Z2)D + S^{-1/2}s = S^{-1/2}RA + S^{-1/2}s.

Second, they use as their instruments the set H = (Q, X1, Z1), and derive what they consider to be the efficient instrumental variables estimator of A from equation (2.5.3). If we define for any matrix M the projection onto the column space of M as P[M] (so that P[M] = M(M'M)^{-1}M' when M has full column rank), the Hausman and Taylor estimator of A can be written as

  (2.5.4)  a_HT = (R'S^{-1/2}P[H]S^{-1/2}R)^{-1}(R'S^{-1/2}P[H]S^{-1/2}y).

The Hausman-Taylor instrument set is cumbersome because H is not of full column rank. We can evaluate P[H] using the following lemma:

Lemma (2.5): P[H] = Q + P[(PX1, Z1)].

However, while this solves the problem of calculating the estimator, it is not very satisfactory in helping us to understand why the estimator is efficient. Perhaps a somewhat more intuitive approach to the estimation of equation (2.5.2) is to decompose it into the two orthogonal equations

  (2.5.5)  Qy = (QX1, QX2)B + Qe

  (2.5.6)  Py = (PX1, PX2)B + (Z1, Z2)D + P(e + u).

Since Qu = 0, there is no problem of correlation between errors and regressors in (2.5.5). Furthermore, (PX1, Z1) can readily be seen to be the largest available set of variables in equation (2.5.6) which have been assumed to be uncorrelated with the random effects. Projecting equation (2.5.6) onto the column space of (PX1, Z1), we have the set of orthogonal equations

  (2.5.7)  Qy = QRA + Q(e + u)

  (2.5.8)  P1Py = P1PRA + P1P(e + u),

where P1 = P[(PX1, Z1)]. The covariance matrices associated with the errors in equations (2.5.7) and (2.5.8) can be written as

  (2.5.9)  Cov(Q(e + u)) = Cov(Qe) = qQ

and

  (2.5.10)  Cov(P1P(e + u)) = rP1 = rP[(PX1, Z1)],

respectively. We note that each of these two covariance matrices has the form of a constant times an idempotent matrix. Thus, Lemma (2.1) implies that any further attempt at diagonalizing the covariance matrix in either equation would not improve the efficiency of the resulting estimator. Using the weights q and r, the weighted least squares estimator of A from equations (2.5.7) and (2.5.8) becomes

  (2.5.11)  a_WIV = {R'(1/q)QR + R'(1/r)P[(PX1, Z1)]R}^{-1}{R'(1/q)Qy + R'(1/r)P[(PX1, Z1)]y}.

Using Lemma (2.5), a_WIV can be rewritten as

  (2.5.12)  a_WIV = (R'((1/q)Q + (1/r)P[(PX1, Z1)])R)^{-1}(R'((1/q)Q + (1/r)P[(PX1, Z1)])y).

We have now proved the following theorem.

Theorem (2.6): The Hausman and Taylor estimator of A equals weighted least squares applied to equations (2.5.7) and (2.5.8).

2.6 Variance Estimation when the Random Effects are not Correlated with the Regressors

When discussing the generalized least squares estimation procedure, we implicitly assumed that the variance components, σ_e² and σ_u², were known. In practice this is not the case; the variance components are usually unknown and therefore must be estimated. When estimates of the variance components are used in place of the actual values, we have an example of feasible generalized least squares.
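To make the mechanics concrete, here is a minimal sketch of feasible GLS computed through the (1 - c)-differences of section 2.4.2, with q and r estimated from the within and between residuals exactly as developed in the remainder of this section; the function name and all shapes are illustrative.

```python
import numpy as np

def fgls_error_components(y, X, Z, N, T):
    """Feasible GLS for y = XB + ZD + u + e, data ordered by individual then time."""
    P = np.kron(np.eye(N), np.ones((T, T)) / T)
    Q = np.eye(N * T) - P
    R = np.hstack([X, Z])
    g, k = X.shape[1], Z.shape[1]

    # Within regression (2.6.1): estimate q = var(e) from its residuals.
    Xq, yq = Q @ X, Q @ y
    b_w = np.linalg.solve(Xq.T @ Xq, Xq.T @ yq)
    q_hat = np.sum((yq - Xq @ b_w) ** 2) / (N * (T - 1) - g)

    # Between regression (2.6.2): estimate r = var(e) + T*var(u) from its residuals.
    Rp, yp = P @ R, P @ y
    a_b = np.linalg.lstsq(Rp, yp, rcond=None)[0]
    r_hat = np.sum((yp - Rp @ a_b) ** 2) / (N - g - k)

    # GLS by OLS on the (1 - c)-differenced data, c = (q/r)^{1/2}.
    c = np.sqrt(q_hat / r_hat)
    y_s = y - (1.0 - c) * (P @ y)
    R_s = R - (1.0 - c) * (P @ R)
    return np.linalg.solve(R_s.T @ R_s, R_s.T @ y_s)
```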
Under mild regularity conditions, Fuller and Battese (1973) have shown that the feasible generalized least squares estimator is consistent and has the same asymptotic distribution as the generalized least squares estimator with known variance components. This result holds true for either large N or large T. Swamy and Mehta (1979) caution that, if the estimator of σ_u² is unreliable, say because σ_u² is close to zero or N is small, the feasible generalized least squares estimator may also be unreliable. Taylor (1980), on the other hand, has shown that the difference between the covariance matrices of the within estimator and of the feasible generalized least squares estimator is nonnegative definite for even moderate sizes of either N or T. This suggests that, in practice, the feasible generalized least squares estimator may be more efficient than the within estimator.

Efficiency in the estimation of the variance components and its subsequent effect on the efficiency of the feasible generalized least squares estimator has been discussed by Amemiya (1971). Similarly, papers by Maddala and Mount (1973) and Taylor (1980) have shown that using more efficient estimates of the variance components need not lead to a gain in the efficiency of the estimates of the regression coefficients.

In the following discussion, we rewrite equations (2.4.11) and (2.4.12) as

  (2.6.1)  Qy = R1A1 + Qs,

where R1 = (QX), A1 = B, and rank(R1) = g; and

  (2.6.2)  Py = R2A2 + Ps,

where R2 = (PX, Z), A2 = (B', D')', and rank(R2) = g + k.

If feasible weighted least squares is to be implemented instead of the equivalent feasible generalized least squares procedure, the weights q and r are the parameters we need to estimate. One approach to estimating these weights is to estimate q = σ_e² using residuals from equation (2.6.1) and r = σ_e² + Tσ_u² using residuals from equation (2.6.2). The groundwork for such an approach is laid by Maddala (1971), Swamy (1971), and Arora (1973). We now proceed to show that estimators so defined are both unbiased and consistent. In addition, we find the necessary conditions for identification of the model.

We define the sum of squared residuals from equation (2.6.1) as

  (2.6.3)  SSE1 = (Qy - R1a1)'(Qy - R1a1),

where the residuals have been computed using the least squares estimates of the coefficients in equation (2.6.1), namely

  (2.6.4)  a1 = (R1'R1)^{-1}R1'y.

And we define the sum of squared residuals from equation (2.6.2) as

  (2.6.5)  SSE2 = (Py - R2a2)'(Py - R2a2),

where the least squares estimates of the coefficients in equation (2.6.2) are given as

  (2.6.6)  a2 = (R2'R2)^{-1}R2'y.

2.6.1 Counting Rules for Identification

To ensure that the parameters in the model are identified requires that the parameters in each of the two equations, (2.6.1) and (2.6.2), separately be identified. Thus, the necessary conditions for the identification of the model are that

  (2.6.7)  g + k < N   and   g < N(T - 1).

Since the second condition follows from the first, the necessary condition for identification of the model can be more succinctly written as

  (2.6.8)  g + k < N.

2.6.2 Estimation of q and r

Theorem (2.7): Let s1² = SSE1/[N(T - 1) - g], and let s2² = SSE2/[N - g - k].
Then s1² is an unbiased estimator of q, and s2² is an unbiased estimator of r.

Proof: Let P1 represent the projection onto the column space of the regressors in equation (2.6.1); i.e., P1 = P[R1] = R1(R1'R1)^{-1}R1'. Then QP1 = P1Q = P1, P1R1 = R1, P1' = P1, and P1P = 0. First we write the residual from equation (2.6.1) as

  Qy - P1Qy = R1A1 + Qs - P1R1A1 - P1Qs = Qs - P1s = (Q - P1)s.

Then we form the expression

  SSE1 = (Qy - P1Qy)'(Qy - P1Qy) = s'(Q - P1)'(Q - P1)s = s'(Q - P1Q - QP1 + P1QP1)s = s'(Q - P1)s.

Taking the expectation of SSE1, we write

  E{SSE1} = E{s'(Q - P1)s}
          = E{trace((Q - P1)ss')}   since trace(AB) = trace(BA) when both products are defined and square
          = trace((Q - P1)E{ss'}) = trace((Q - P1)(qQ + rP))   since E{ss'} ≡ S = qQ + rP
          = q trace(Q - P1)   since P1P = 0 and QP = 0
          = q rank(Q - P1)   since trace(A) = rank(A) if A is idempotent
          = q [rank(Q) - rank(R1)] = q [N(T - 1) - g].

Thus E{s1²} = q.

Now, let P2 represent the projection onto the column space of the regressors in equation (2.6.2); i.e., P2 = P[R2] = R2(R2'R2)^{-1}R2'. Then PP2 = P2P = P2, P2R2 = R2, P2' = P2, and P2Q = 0. First we write the residual from equation (2.6.2) as

  Py - P2Py = R2A2 + Ps - P2R2A2 - P2Ps = (P - P2)s.

Then we form the expression

  SSE2 = s'(P - P2)'(P - P2)s = s'(P - P2)s.

Taking the expectation of SSE2, we write

  E{SSE2} = E{trace((P - P2)ss')} = trace((P - P2)(qQ + rP))
          = r trace(P - P2)   since P2Q = 0 and QP = 0
          = r rank(P - P2) = r [rank(P) - rank(R2)] = r [N - g - k].

Thus E{s2²} = r. Q.E.D.

Theorem (2.8): Let s1² = SSE1/[N(T - 1) - g] and s2² = SSE2/[N - g - k]. Then s1² is a consistent estimator of q as N or T gets large, and s2² is a consistent estimator of r = σ_e² + Tσ_u² as N gets large.

Proof:

  plim s1² = plim SSE1/[rank(Q) - rank(R1)] = plim SSE1/N(T - 1)
           = plim s'(Q - P1)s/N(T - 1) = plim s'Qs/N(T - 1) - plim s'P1s/N(T - 1).

The last term is zero, since

  s'P1s/N(T - 1) = [s'R1/N(T - 1)][R1'R1/N(T - 1)]^{-1}[R1's/N(T - 1)]

and R1's/N(T - 1) → 0 as N(T - 1) → ∞ (as either N → ∞ or T → ∞). The first term equals σ_e² because, using standard results (e.g., Rao (1973, p. 185)) on the distribution of idempotent quadratic forms in normals, s'Qs is distributed as σ_e² χ²_{N(T-1)}.

  plim s2² = plim SSE2/[rank(P) - rank(R2)] = plim SSE2/N
           = plim s'(P - P2)s/N = plim s'Ps/N - plim s'P2s/N.

The last term is zero, since s'P2s/N = [s'R2/N][R2'R2/N]^{-1}[R2's/N] and R2's/N → 0 as N → ∞. The first term equals r = σ_e² + Tσ_u² because, by the same standard results, s'Ps is distributed as r χ²_N. Q.E.D.

2.7 Variance Estimation when the Random Effects are Correlated with the Regressors

So far we have considered variance estimation for the feasible weighted least squares estimator only. We now consider the model of section 2.5, in which some of the regressors are correlated with the individual effects.
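For reference, here is a minimal sketch of the weighted instrumental variables estimator (2.5.12) of that model, taking the weights q and r as given; the function name and all shapes are illustrative.

```python
import numpy as np

def a_wiv(y, X1, X2, Z1, Z2, q, r, N, T):
    """Weighted IV estimator (2.5.12) with instruments (PX1, Z1), given q and r."""
    P = np.kron(np.eye(N), np.ones((T, T)) / T)
    Q = np.eye(N * T) - P
    R = np.hstack([X1, X2, Z1, Z2])
    H = np.hstack([P @ X1, Z1])
    P1 = H @ np.linalg.pinv(H)               # projection P[(PX1, Z1)]
    K = Q / q + P1 / r
    return np.linalg.solve(R.T @ K @ R, R.T @ K @ y)
```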
Once again we will need to estimate the variance components σ_e² and σ_u², since they are needed to implement the Hausman and Taylor instrumental variables estimator (or the equivalent weighted instrumental variables estimator). The estimate of σ_e² based on the within residuals, discussed in section 2.6, is still consistent in this model. However, the estimate of r = σ_e² + Tσ_u² which was discussed in section 2.6 is not consistent, since it was based on the residuals from least squares applied to (2.6.2), and this least squares estimator is inconsistent when regressors and effects are correlated.

We therefore turn our attention to the problem of finding consistent estimates of B and D. Then, using these consistent estimates of A2 = (B', D')', we derive a consistent estimate of r. The background for this approach is the work of Hausman and Taylor (1981), who suggest the estimate of r which we discuss here. However, they do not give a rigorous proof that it is consistent.

The following assumptions will be made.

Assumption (2.9): Let H = (PX1, Z1). Then we assume that

(i) plim X'Qe/N = 0 as N → ∞;
(ii) plim H'P(e + u)/N = 0 as N → ∞;
(iii) plim (X'QX)/N is finite and nonsingular as N → ∞;
(iv) plim (H'Z)/N is finite as N → ∞;
(v) plim (H'X)/N is finite as N → ∞.

Even after the introduction of X2 and Z2, regressors assumed correlated with the effects, the within estimator is a consistent estimator of B; no correlation exists between the disturbance and the regressors in equation (2.5.5). So the problem of finding a consistent estimator of A2 is reduced to finding a consistent estimator of D. The following regression equation will be used in deriving such an estimator.

Lemma (2.10): Let d* = P(y - Xb_W). Then

  (2.7.1)  d* = ZD + (P - PX(X'QX)^{-1}X'Q)s.

Proof:

  d* = P(y - Xb_W) = Py - PXb_W = Py - PX(X'QX)^{-1}X'Qy
     = P(XB + ZD + s) - PX(X'QX)^{-1}X'Q(XB + ZD + s)
     = PXB + ZD + Ps - PX(X'QX)^{-1}X'QXB - PX(X'QX)^{-1}X'Qs
     = ZD + Ps - PX(X'QX)^{-1}X'Qs
     = ZD + (P - PX(X'QX)^{-1}X'Q)s,

using QZ = 0 and PZ = Z. Q.E.D.

Since part of Z is correlated with the error term, least squares applied to equation (2.7.1) does not yield a consistent estimator of D. But, using H = (PX1, Z1) as a set of instruments, the instrumental variables estimator of D is defined as

  (2.7.2)  d_IV = (Z'P[H]Z)^{-1}Z'P[H]d*,

where P[H] = H(H'H)^{-1}H'. It is interesting to note that using d** = (y - Xb_W) instead of d* = P(y - Xb_W) would not increase the efficiency of the estimator d_IV. Indeed, since P[H]P = PP[H] = P[H], Z1'Q = 0, and the first-order conditions (i.e., the "normal equations") defining b_W imply that X1'Q(y - Xb_W) = 0, we have

  (2.7.3)  H'P(y - Xb_W) = H'(y - Xb_W);

thus the same estimator would result if we used d** in place of d*.

Given the estimator d_IV, the question is whether this estimator is, indeed, a consistent estimator of D. But first, we consider the conditions necessary to assure that d_IV exists.

2.7.1 Necessary Conditions for the Existence of d_IV

A necessary condition for the existence of d_IV is that the rank of H be at least as large as the rank of Z; that is, there must be at least as many instruments as regressors. This requires g1 + k1 ≥ k, or g1 ≥ k2, as noted by Hausman and Taylor (1981). Intuitively, PX1 is serving as instruments for Z2, and so there must be at least as many variables in X1 as in Z2.

2.7.2 Consistency of d_IV

Lemma (2.11)
: Given Assumption (2.9),

(1) plim Z'P1P(e + u)/N = 0 as N → ∞;
(2) plim Z'P1Z/N is finite and nonsingular as N → ∞;
(3) plim Z'P1X/N is finite as N → ∞.

Lemma (2.11) can be easily proved by noting that P1 = P[H] = H(H'H)^{-1}H', where H = (PX1, Z1).

Theorem (2.12): The instrumental variables estimator d_IV is a consistent estimator of D as N → ∞.

Proof: First, rewrite d_IV as

  d_IV = (Z'P1Z)^{-1}Z'P1d*
       = (Z'P1Z)^{-1}Z'P1(ZD + (P - PX(X'QX)^{-1}X'Q)s)
       = D + (Z'P1Z)^{-1}Z'P1Ps - (Z'P1Z)^{-1}Z'P1PX(X'QX)^{-1}X'Qs
       = D + (Z'P1Z/N)^{-1}{Z'P1Ps/N} - (Z'P1Z/N)^{-1}{Z'P1PX/N}(X'QX/N)^{-1}{X'Qs/N}.

By assumption, plim X'Qs/N = plim X'Qe/N = 0 as N → ∞ (note Qu = 0), and plim (X'QX)/N is finite and nonsingular as N → ∞. Using Lemma (2.11), it follows that plim Z'P1P(e + u)/N = 0, plim (Z'P1Z)/N is finite and nonsingular, and plim (Z'P1PX)/N is finite, as N → ∞. Thus

  plim d_IV = D + {finite}{0} - {finite}{finite}{finite}{0} = D as N → ∞. Q.E.D.

2.7.3 A Consistent Estimate of r

Using as consistent estimates of A2 = (B', D')' the estimators b_W and d_IV, we will now form a vector of residuals. We will then show that the sum of the squared elements of this residual vector, divided by N, is a consistent estimator of r = σ_e² + Tσ_u².

Lemma (2.13): Let Residual = Py - PXb_W - PZd_IV. Then

  Residual = P(e + u) - PX(X'QX)^{-1}X'Qe - PZ(Z'P1Z)^{-1}Z'P1P(e + u) + PZ(Z'P1Z)^{-1}Z'P1PX(X'QX)^{-1}X'Qe.

Proof:

  Residual = Py - PXb_W - PZd_IV
           = Py - PX(X'QX)^{-1}X'Qy - PZ(Z'P1Z)^{-1}Z'P1d*
           = P(XB + ZD + s) - PX(X'QX)^{-1}X'Q(XB + ZD + s)
             - PZ(Z'P1Z)^{-1}Z'P1(ZD + (P - PX(X'QX)^{-1}X'Q)(e + u))
           = PXB + ZD + P(e + u) - PXB - PX(X'QX)^{-1}X'Qe - ZD
             - PZ(Z'P1Z)^{-1}Z'P1P(e + u) + PZ(Z'P1Z)^{-1}Z'P1PX(X'QX)^{-1}X'Qe
           = P(e + u) - PX(X'QX)^{-1}X'Qe - PZ(Z'P1Z)^{-1}Z'P1P(e + u)
             + PZ(Z'P1Z)^{-1}Z'P1PX(X'QX)^{-1}X'Qe,

using QZ = 0, PZ = Z, and Qu = 0, so that X'Q(e + u) = X'Qe. Q.E.D.

We now define a consistent estimator for r. Define SSE* as the sum of squared residuals from Lemma (2.13); our estimator is SSE*/N, where

  (2.7.4)  SSE* = (Residual)'(Residual).

Theorem (2.14): plim SSE*/N = r = σ_e² + Tσ_u² as N → ∞.

Proof: Write the residual of Lemma (2.13) as Residual = A1 - A2 - A3 + A4, where

  A1 = P(e + u),
  A2 = PX(X'QX)^{-1}X'Qe,
  A3 = PZ(Z'P1Z)^{-1}Z'P1P(e + u),
  A4 = PZ(Z'P1Z)^{-1}Z'P1PX(X'QX)^{-1}X'Qe.

Then SSE* is the sum of the sixteen cross-products Ai'Aj, and taking the probability limit of SSE*/N as N gets large is equivalent to taking the probability limit of the sum of these sixteen terms. Evaluation of these terms shows that the first term, A1'A1/N, has probability limit equal to r, and that each of the remaining fifteen terms has probability limit equal to zero, all limits being taken as N → ∞.

First,

  plim A1'A1/N = plim (e + u)'P(e + u)/N = plim e'Pe/N + 2 plim e'Pu/N + plim u'Pu/N.

Consider these term by term. First,

  e'Pe/N = T Σ_{i=1}^{N} e_i.²/N.

Each term e_i.² has a mean of σ_e²/T, and the terms are independent. Therefore e'Pe/N → T(σ_e²/T) = σ_e² as N → ∞. Second,

  u'Pu/N = T Σ_{i=1}^{N} u_i²/N → Tσ_u² as N → ∞.
We now define a consistent estimator for r. Define SSE* as the sum of squared residuals defined in Lemma (2.13); our estimator is just SSE*/N, where

(2.7.4) SSE* = (Residual)'(Residual).

Theorem (2.14): plim SSE*/N = r = σ_e² + Tσ_u² as N → ∞.

Proof: First, using Lemma (2.13), SSE* can be written as the sum of sixteen terms:

SSE* = (e + u)'P(e + u)
 - (e + u)'PX(X'QX)⁻¹X'Qe
 - (e + u)'PZ(Z'P1Z)⁻¹Z'P1P(e + u)
 + (e + u)'PZ(Z'P1Z)⁻¹Z'P1PX(X'QX)⁻¹X'Qe
 - e'QX(X'QX)⁻¹X'P(e + u)
 + e'QX(X'QX)⁻¹X'PX(X'QX)⁻¹X'Qe
 + e'QX(X'QX)⁻¹X'PZ(Z'P1Z)⁻¹Z'P1P(e + u)
 - e'QX(X'QX)⁻¹X'PZ(Z'P1Z)⁻¹Z'P1PX(X'QX)⁻¹X'Qe
 - (e + u)'PP1Z(Z'P1Z)⁻¹Z'P(e + u)
 + (e + u)'PP1Z(Z'P1Z)⁻¹Z'PX(X'QX)⁻¹X'Qe
 + (e + u)'PP1Z(Z'P1Z)⁻¹Z'PZ(Z'P1Z)⁻¹Z'P1P(e + u)
 - (e + u)'PP1Z(Z'P1Z)⁻¹Z'PZ(Z'P1Z)⁻¹Z'P1PX(X'QX)⁻¹X'Qe
 + e'QX(X'QX)⁻¹X'PP1Z(Z'P1Z)⁻¹Z'P(e + u)
 - e'QX(X'QX)⁻¹X'PP1Z(Z'P1Z)⁻¹Z'PX(X'QX)⁻¹X'Qe
 - e'QX(X'QX)⁻¹X'PP1Z(Z'P1Z)⁻¹Z'PZ(Z'P1Z)⁻¹Z'P1P(e + u)
 + e'QX(X'QX)⁻¹X'PP1Z(Z'P1Z)⁻¹Z'PZ(Z'P1Z)⁻¹Z'P1PX(X'QX)⁻¹X'Qe.

Now, from the above expression, taking the probability limit of SSE*/N as N gets large is equivalent to taking the probability limit of the sum of these sixteen terms, each divided by N. Evaluation of the sixteen terms shows that the first term has a probability limit equal to r and that the remaining fifteen terms each have a probability limit equal to zero, with all limits being taken as N → ∞. These probability limits are evaluated below.

1) plim (e + u)'P(e + u)/N = plim e'Pe/N + 2 plim e'Pu/N + plim u'Pu/N.
Consider these term by term. First,
e'Pe/N = T Σ_{i=1}^{N} ē_i.²/N.
Each term ē_i.² has a mean of σ_e²/T, and the terms are independent. Therefore, e'Pe/N → T(σ_e²/T) = σ_e² as N → ∞. Second,
u'Pu/N = T Σ_{i=1}^{N} u_i²/N → Tσ_u² as N → ∞.
Third,
e'Pu/N = T Σ_{i=1}^{N} ē_i.u_i/N → 0 as N → ∞,
because e and u are uncorrelated. Therefore, (e + u)'P(e + u)/N → σ_e² + Tσ_u² = r as N → ∞.

2) plim (e + u)'PX(X'QX)⁻¹X'Qe/N
 = plim{(e + u)'PX/N} · plim(X'QX/N)⁻¹ · plim{X'Qe/N} = 0.
3) plim (e + u)'PZ(Z'P1Z)⁻¹Z'P1P(e + u)/N
 = plim{(e + u)'PZ/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'P1P(e + u)/N} = 0.
4) plim (e + u)'PZ(Z'P1Z)⁻¹Z'P1PX(X'QX)⁻¹X'Qe/N
 = plim{(e + u)'PZ/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'P1PX/N} · plim(X'QX/N)⁻¹ · plim{X'Qe/N} = 0.
5) plim e'QX(X'QX)⁻¹X'P(e + u)/N
 = plim{e'QX/N} · plim(X'QX/N)⁻¹ · plim{X'P(e + u)/N} = 0.
6) plim e'QX(X'QX)⁻¹X'PX(X'QX)⁻¹X'Qe/N
 = plim{e'QX/N} · plim(X'QX/N)⁻¹ · plim{X'PX/N} · plim(X'QX/N)⁻¹ · plim{X'Qe/N} = 0.
7) plim e'QX(X'QX)⁻¹X'PZ(Z'P1Z)⁻¹Z'P1P(e + u)/N
 = plim{e'QX/N} · plim(X'QX/N)⁻¹ · plim{X'PZ/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'P1P(e + u)/N} = 0.
8) plim e'QX(X'QX)⁻¹X'PZ(Z'P1Z)⁻¹Z'P1PX(X'QX)⁻¹X'Qe/N
 = plim{e'QX/N} · plim(X'QX/N)⁻¹ · plim{X'PZ/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'P1PX/N} · plim(X'QX/N)⁻¹ · plim{X'Qe/N} = 0.
9) plim (e + u)'PP1Z(Z'P1Z)⁻¹Z'P(e + u)/N
 = plim{(e + u)'PP1Z/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'P(e + u)/N} = 0.
10) plim (e + u)'PP1Z(Z'P1Z)⁻¹Z'PX(X'QX)⁻¹X'Qe/N
 = plim{(e + u)'PP1Z/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'PX/N} · plim(X'QX/N)⁻¹ · plim{X'Qe/N} = 0.
11) plim (e + u)'PP1Z(Z'P1Z)⁻¹Z'PZ(Z'P1Z)⁻¹Z'P1P(e + u)/N
 = plim{(e + u)'PP1Z/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'PZ/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'P1P(e + u)/N} = 0.
12) plim (e + u)'PP1Z(Z'P1Z)⁻¹Z'PZ(Z'P1Z)⁻¹Z'P1PX(X'QX)⁻¹X'Qe/N
 = a product of finite probability limits with the final factor plim{X'Qe/N} = 0; hence 0.
13) plim e'QX(X'QX)⁻¹X'PP1Z(Z'P1Z)⁻¹Z'P(e + u)/N
 = plim{e'QX/N} · plim(X'QX/N)⁻¹ · plim{X'PP1Z/N} · plim(Z'P1Z/N)⁻¹ · plim{Z'P(e + u)/N} = 0.
14) plim e'QX(X'QX)⁻¹X'PP1Z(Z'P1Z)⁻¹Z'PX(X'QX)⁻¹X'Qe/N
 = a product of finite probability limits with the factor plim{X'Qe/N} = 0; hence 0.
15) plim e'QX(X'QX)⁻¹X'PP1Z(Z'P1Z)⁻¹Z'PZ(Z'P1Z)⁻¹Z'P1P(e + u)/N
 = a product of finite probability limits with the factor plim{Z'P1P(e + u)/N} = 0; hence 0.
16) plim e'QX(X'QX)⁻¹X'PP1Z(Z'P1Z)⁻¹Z'PZ(Z'P1Z)⁻¹Z'P1PX(X'QX)⁻¹X'Qe/N
 = a product of finite probability limits with the factor plim{X'Qe/N} = 0; hence 0.

All limits are taken as N → ∞, and the zero and finiteness of the individual factors follow from Assumption (2.9) and Lemma (2.11). Q.E.D.
2.8 Conclusions

In this chapter, we have considered a linear regression model which contains unobserved individual effects. Given panel data, this model may be estimated in a variety of ways, depending on what is assumed about the correlation between the regressors and the effects. We have given a survey of the literature, tidying up a few loose ends, and we have introduced the analytical framework to be used in the rest of the thesis. In the next chapter, we will extend the analysis of this chapter to a model which contains unobservable time effects as well as individual effects.

CHAPTER 3

Individual and Time Effects

3.1 Introduction

In this chapter, we extend the linear regression model considered in the previous chapter to include unobservable time effects. We again assume that the data consist of T time-series observations on each of N individuals; we distinguish regressors which vary over time and individuals from those that are either time-invariant or individual-invariant; and we now assume the presence of unobservable time-invariant individual effects, unobservable individual-invariant time effects, and the usual statistical noise. We write the model to be considered in this chapter as

(3.1.1) y_it = X_itB + W_tC + Z_iD + u_i + v_t + e_it, i = 1,...,N; t = 1,...,T,

where y_it is the dependent variable, X_it is a vector (of dimension 1 x g) of explanatory variables which vary both over time and over individuals, Z_i is a vector (of dimension 1 x k) of time-invariant explanatory variables, W_t is a vector (of dimension 1 x h) of individual-invariant explanatory variables, and B, C, and D are vectors of parameters to be estimated. The errors e_it are iid with mean zero and variance σ_e². Both the individual effects u_i and the time effects v_t are unobservable, and various assumptions about them will be made. However, in all cases the individual effects will be treated as time-invariant and the time effects will be treated as individual-invariant.

The plan of this chapter is as follows. In section 3.2 we review the geometry which is used in subsequent analyses. We then consider the estimation of the model under various assumptions. In section 3.3 we consider the fixed effects model, in which the individual and time effects are treated as fixed parameters to be estimated. The point of this model is to remove the potential bias caused by correlation of the regressors with the omitted individual-invariant and time-invariant variables. In section 3.4 we consider the random effects model, in which the individual and time effects are treated as random and uncorrelated with the regressors. Under these assumptions there is no problem of bias, and efficiency of estimation is our central concern. In section 3.5 we consider an extended version of the model of Hausman and Taylor (1981), in which the effects are treated as random but potentially correlated with the regressors. Since many currently available panel data sets are characterized by having many observations on individuals but only relatively few time periods, in section 3.6 we consider the previous two models for the case when N is large but T is fixed. Finally, in sections 3.7 and 3.8 we consider the problem of consistent estimation of the variances of the noise, the individual effects, and the time effects. Such estimates are necessary to implement the feasible weighted least squares estimators considered in sections 3.4 and 3.5.
This chapter applies the Hausman and Taylor method of instrumental variables estimation to the panel data model extended to include both individual and time effects, and derives the resulting estimator. In addition, it provides a survey of the existing literature on this extended model. The analysis of the regression models considered in this chapter is done using the approach introduced in chapter 2.

3.2 Geometry

A useful fact, and one to be used throughout the remainder of this chapter, is that equation (3.1.1) can be written, equivalently, as the four orthogonal equations

(3.2.1) (y_it - y_i. - y_.t + y_..) = (X_it - X_i. - X_.t + X_..)B + (e_it - e_i. - e_.t + e_..)
(3.2.2) (y_i. - y_..) = (X_i. - X_..)B + (Z_i - Z_.)D + (u_i - u_.) + (e_i. - e_..)
(3.2.3) (y_.t - y_..) = (X_.t - X_..)B + (W_t - W_.)C + (v_t - v_.) + (e_.t - e_..)
(3.2.4) y_.. = X_..B + W_.C + Z_.D + u_. + v_. + e_..

where y_i. = (1/T)Σ_{t=1}^{T} y_it, y_.t = (1/N)Σ_{i=1}^{N} y_it, and y_.. = (1/NT)Σ_{i=1}^{N}Σ_{t=1}^{T} y_it. Equation (3.2.2) expresses the data in terms of its individual averages over time with the grand mean subtracted, while equation (3.2.3) expresses the data in terms of its averages over individuals for each period of time with the grand mean subtracted. Equation (3.2.1) expresses the data in terms of its deviations around both the mean for each individual and the mean for each time period, with the grand mean added; equation (3.2.4) expresses the data in terms of its grand mean.

Writing equation (3.1.1) in matrix form we have

(3.2.5) y = XB + WC + ZD + u + v + e

where y, u, v, and e denote (NT x 1) dimensioned vectors; and X, W, and Z denote (NT x g), (NT x h), and (NT x k) dimensioned matrices, respectively. Again, following the convention of Hausman and Taylor (1981), the observations are ordered first by individuals and then by time, so that v and each column of W are (NT x 1) dimensioned vectors consisting of N blocks, with each block containing the same T entries. To achieve the same decomposition as was accomplished above, we define the same four symmetric, idempotent, mutually orthogonal matrices used by Fuller and Battese (1974). These orthogonal projections are

(3.2.6) Q1 = I_NT - Q2 - Q3 - Q4
(3.2.7) Q2 = (I_N ⊗ j_T j_T'/T) - (j_NT j_NT'/NT)
(3.2.8) Q3 = (j_N j_N'/N ⊗ I_T) - (j_NT j_NT'/NT)
(3.2.9) Q4 = (j_N j_N'/N ⊗ j_T j_T'/T) = (j_NT j_NT'/NT)

where j_T = (1,...,1)' is (T x 1). The transformation Q4 determines the grand mean for the NT observations, repeated NT times. The transformation Q2 determines the means for each of the individual groups, subtracts the grand mean, and repeats these N observations T times; the transformation Q3 determines the means for each of the time periods, subtracts the grand mean, and repeats these T observations N times. The transformation Q1 transforms each observation into the difference between itself and both its respective individual group mean and time mean, and then adds the grand mean. Explicitly, the (i,t) elements of Q1y, Q2y, Q3y, and Q4y can be written as

(3.2.10) (Q1y)_it = y_it - y_i. - y_.t + y_..
(3.2.11) (Q2y)_it = y_i. - y_..
(3.2.12) (Q3y)_it = y_.t - y_..
(3.2.13) (Q4y)_it = y_..

respectively. Since W contains variables that are constant across all individual observations for a given time period, Q1W = 0. Similarly, Q2W = 0. The elements of the columns of W are, on the other hand, expressed as deviations from their respective grand means by the transformation Q3. Analogous results hold true for the time effects v; i.e. Q1v = 0 and Q2v = 0. Likewise, since Z contains variables that are constant across all time period observations for a given individual, Q1Z = 0 and Q3Z = 0. And similarly, Q1u = 0 and Q3u = 0.
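To make these projections concrete, here is a small illustrative sketch in Python/numpy (an editorial addition, not part of the original text; the dimensions and names are our own) constructing (3.2.6)-(3.2.9) and checking idempotency, mutual orthogonality, and the scalar forms (3.2.10)-(3.2.12):

    import numpy as np

    N, T = 4, 3
    jT, jN, jNT = np.ones((T, 1)), np.ones((N, 1)), np.ones((N * T, 1))
    Q4 = jNT @ jNT.T / (N * T)                        # (3.2.9)
    Q2 = np.kron(np.eye(N), jT @ jT.T / T) - Q4       # (3.2.7)
    Q3 = np.kron(jN @ jN.T / N, np.eye(T)) - Q4       # (3.2.8)
    Q1 = np.eye(N * T) - Q2 - Q3 - Q4                 # (3.2.6)

    rng = np.random.default_rng(1)
    y = rng.normal(size=N * T)                 # ordered by individual, then time
    yi = y.reshape(N, T).mean(axis=1)          # y_i.
    yt = y.reshape(N, T).mean(axis=0)          # y_.t
    yg = y.mean()                              # y_..

    for Qa in (Q1, Q2, Q3, Q4):
        assert np.allclose(Qa @ Qa, Qa)                    # idempotent
    assert np.allclose(Q2 @ Q3, np.zeros((N * T, N * T)))  # mutually orthogonal
    assert np.allclose(Q2 @ y, np.repeat(yi - yg, T))      # (3.2.11)
    assert np.allclose(Q3 @ y, np.tile(yt - yg, N))        # (3.2.12)
    assert np.allclose(Q1 @ y, y - np.repeat(yi, T) - np.tile(yt, N) + yg)  # (3.2.10)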
Thus, the original equation (3.2.5) can now be written equivalently as the four orthogonal equations

(3.2.14) Q1y = Q1XB + Q1e
(3.2.15) Q2y = Q2XB + Q2ZD + Q2(u + e)
(3.2.16) Q3y = Q3XB + Q3WC + Q3(v + e)
(3.2.17) Q4y = Q4XB + Q4WC + Q4ZD + Q4(u + v + e)

3.3 Fixed Effects

In this section, we discuss the estimation of the linear regression equation (3.2.5) when both the individual-specific effects and the time-specific effects are treated as fixed constants. The standard approach is to use dummy variables for individuals and for time periods as regressors, and then to apply least squares. This yields the following estimator for B:

(3.3.1) b_w = (X'Q1X)⁻¹X'Q1y.

The estimator b_w is the familiar within-group estimator; it uses only the variation within each individual group and each time period. The estimator is unbiased, and it is consistent as either N or T (or both) approaches infinity. These are all well-known results; for example, see Judge et al. (1985, p. 338).

A problem with this estimation procedure is that it is not possible to obtain estimates of either the coefficients of the time-invariant regressors (Z) or the coefficients of the individual-invariant regressors (W). The time-invariant regressors are perfectly collinear with the individual dummy variables and the individual-invariant regressors are perfectly collinear with the time dummy variables; equivalently, they are removed by the transformation of the data by the matrix Q1. If the original model contained no time-invariant regressors, the estimated coefficients of the individual dummy variables are

(3.3.2) u_w = Q2y - Q2Xb_w,

and these estimates of the individual effects are consistent as T approaches infinity. If the original model contained time-invariant regressors, then u_w defined above is interpreted as an estimate of (Q2ZD + Q2u) rather than of just u. Similarly, if the original model contained no individual-invariant regressors, the estimated coefficients of the time period dummy variables are

(3.3.3) v_w = Q3y - Q3Xb_w,

and these estimates of the time effects are consistent as N approaches infinity. If the original model contained individual-invariant regressors, then v_w defined above is interpreted as an estimate of (Q3WC + Q3v) rather than of just v.

An equivalent derivation of the within estimator b_w is to define it as the least squares estimator in equation (3.2.14), ignoring equations (3.2.15), (3.2.16), and (3.2.17). Similarly, the estimator u_w is least squares applied to (3.2.15), after setting B = b_w and ignoring the time-invariant variables Z. And the estimator v_w is least squares applied to (3.2.16), after setting B = b_w and ignoring the individual-invariant variables W. Using only one part of equation (3.2.5), namely equation (3.2.1), when estimating B has the advantage of being computationally more convenient than estimating the whole of equation (3.2.5). This approach also makes explicit the statement that b_w ignores the between-group variation and the between-time-period variation; i.e. it ignores the cross-sectional variation in equation (3.2.15) and the time series variation in equation (3.2.16).
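As a brief illustrative sketch (again an editorial addition in Python/numpy, with a hypothetical data-generating process of our own), the two-way within estimator (3.3.1) and the effect estimates (3.3.2)-(3.3.3) can be computed directly from the Q transformations:

    import numpy as np

    rng = np.random.default_rng(2)
    N, T, g = 50, 6, 2
    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q3 = np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q4
    Q1 = np.eye(N * T) - Q2 - Q3 - Q4

    X = rng.normal(size=(N * T, g))
    u = np.repeat(rng.normal(size=N), T)       # individual effects
    v = np.tile(rng.normal(size=T), N)         # time effects
    y = X @ np.ones(g) + u + v + rng.normal(size=N * T)   # true B = (1, 1)'

    bw = np.linalg.solve(X.T @ Q1 @ X, X.T @ Q1 @ y)   # (3.3.1): two-way within estimator
    uw = Q2 @ y - Q2 @ X @ bw                          # (3.3.2): estimated individual effects
    vw = Q3 @ y - Q3 @ X @ bw                          # (3.3.3): estimated time effects
    print(bw)                                          # close to (1, 1)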
3.4 Random Effects not Correlated with the Regressors

In the previous section, we discussed the estimation of a linear regression model when the individual effects (the u_i) and the time effects (the v_t) are treated as fixed constants. In this section, we treat the individual and time effects similarly to the way we treat the error term e_it; we assume both the u_i and the v_t to be random variables uncorrelated with the regressors. The N individuals are now interpreted as being drawn from some larger population, and so the effects u_i can be viewed as a random sample from some distribution. Similarly, the T time periods are now interpreted as being drawn from some larger population, and so the effects v_t can be viewed as a random sample from some distribution. We assume specifically that the u_i are iid with mean zero and variance σ_u², the v_t are iid with mean zero and variance σ_v², and the u_i and v_t are assumed to be uncorrelated with each other as well as with e_it. We also assume that X, Z, and W are uncorrelated with both u and v. The model is written as

(3.4.1) y_it = X_itB + Z_iD + W_tC + u_i + v_t + e_it = X_itB + Z_iD + W_tC + s_it, i = 1,...,N; t = 1,...,T.

The variance of y_it, conditional on X_it, Z_i, and W_t, is

(3.4.2) var(y_it) = var(s_it) = σ_u² + σ_v² + σ_e².

Therefore, this model is often referred to as the generalized error-components or generalized variance-components model. The presence of the random effects u_i and v_t in the disturbance term results in correlation among the errors for a given individual as well as among the errors for a given time period. This can be made explicit if we let s_i denote the (T x 1) dimensioned error vector (s_i1,...,s_iT)'. The covariance matrix of s_i is then the matrix

(3.4.3) Cov(s_i) = σ_u²(j_T j_T') + σ_v²I_T + σ_e²I_T

where j_T = (1,...,1)' is a (T x 1) vector of 1's. Furthermore, the covariance between s_i and s_j, for i ≠ j, is given by the matrix

(3.4.4) Exp(s_i s_j') = σ_v²I_T.

3.4.1 Within Estimation

The within-group estimator can be used regardless of whether the u_i's and v_t's are viewed as fixed constants or as random variables. The within estimator of B can be viewed as least squares applied to equation (3.2.14), and neither the individual effects nor the time effects appear in that equation. So, whether the u_i and v_t are treated as nonstochastic or stochastic, the estimator b_w is still unbiased and consistent. However, as pointed out by Judge et al. (1985), the within estimator is inefficient when the effects are random and uncorrelated with the regressors.

3.4.2 Generalized Least Squares Estimation

As was shown above, since the s_it in different time periods but for the same individual both contain u_i, the errors in the equation

(3.4.5) y_it = X_itB + Z_iD + W_tC + u_i + v_t + e_it = X_itB + Z_iD + W_tC + s_it, i = 1,...,N; t = 1,...,T

are autocorrelated; and since the s_it in the same time period but for different individuals both contain v_t, the errors are also correlated across individuals. Efficient estimation requires that we use the generalized least squares method. Following Fuller and Battese (1974), we write

(3.4.6) S = Cov(s) = pQ1 + qQ2 + rQ3 + kQ4

where p = σ_e², q = (σ_e² + Tσ_u²), r = (σ_e² + Nσ_v²), and k = (σ_e² + Tσ_u² + Nσ_v²). Since the four matrices Q1, Q2, Q3, and Q4 are idempotent and mutually orthogonal, it follows that

S⁻¹ = (1/p)Q1 + (1/q)Q2 + (1/r)Q3 + (1/k)Q4.

Now, if we rewrite equation (3.2.5) as

(3.4.7) y = XB + ZD + WC + u + v + e = RA + s

where R = (X, Z, W) and A = (B', D', C')', and if we assume that σ_u², σ_v², and σ_e² are known, the generalized least squares estimator of A from equation (3.4.7) is simply

(3.4.8) a_GLS = (R'S⁻¹R)⁻¹R'S⁻¹y.

Equivalently, the GLS estimator is ordinary least squares of (S^(-1/2)y) on (S^(-1/2)R).
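The spectral form (3.4.6) and its inverse can be verified numerically. The following sketch (an editorial illustration in Python/numpy, with assumed variance values of our own choosing) builds Cov(s) both directly from the error components and from the Q decomposition, and checks that the two agree and that the claimed inverse is correct:

    import numpy as np

    N, T = 40, 5
    p, su2, sv2 = 1.0, 0.5, 0.25          # sigma_e^2, sigma_u^2, sigma_v^2 (assumed)
    q, r, k = p + T * su2, p + N * sv2, p + T * su2 + N * sv2

    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q3 = np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q4
    Q1 = np.eye(N * T) - Q2 - Q3 - Q4

    S = p * Q1 + q * Q2 + r * Q3 + k * Q4                    # (3.4.6), spectral form
    S_direct = (p * np.eye(N * T)
                + su2 * np.kron(np.eye(N), np.ones((T, T)))  # u_i repeated over t
                + sv2 * np.kron(np.ones((N, N)), np.eye(T))) # v_t repeated over i
    assert np.allclose(S, S_direct)                          # same covariance matrix
    S_inv = Q1 / p + Q2 / q + Q3 / r + Q4 / k
    assert np.allclose(S_inv @ S, np.eye(N * T))             # the claimed inverse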
Fuller and Battese (1974, p. 77) show that, up to a factor of proportionality,

(3.4.9) S^(-1/2) = I_NT - (1-c2)Q2 - (1-c3)Q3 - (1-c4)Q4

where c2 = (p/q)^(1/2), c3 = (p/r)^(1/2), and c4 = (p/k)^(1/2), so that the GLS estimator can more conveniently be calculated using the transformed variables

(3.4.10) S^(-1/2)y = y - (1-c2)Q2y - (1-c3)Q3y - (1-c4)Q4y

(and similarly for R). For example,

(3.4.11) (S^(-1/2)y)_it = y_it - (1-c2)y_i. - (1-c3)y_.t + (1-c2-c3+c4)y_..

and this differs from the within transformation to the extent that the scalars c2, c3, and c4 are nonzero. As pointed out by Hsiao (1986), the GLS estimator converges to the within estimator when N → ∞, T → ∞, and the ratio of N over T tends to a nonzero constant. It can be shown that c2 tends to zero as T gets large, that c3 tends to zero as N gets large, and that c4 tends to zero as T gets large and the ratio of N over T is bounded from above. The GLS estimator is consistent, as pointed out by Judge et al. (1985), when both N → ∞ and T → ∞; it is not consistent as N → ∞ for T fixed or as T → ∞ for N fixed. The case when N → ∞ for T fixed will be discussed in more detail in section 3.6.1.

3.4.3 Weighted Least Squares Estimation

As an alternative approach to the generalized least squares estimator of A, consider the equations which result from the decomposition of equation (3.4.7). These orthogonal equations can be written as

(3.4.12) Q1y = Q1XB + Q1e
(3.4.13) Q2y = Q2XB + Q2ZD + Q2(u + e)
(3.4.14) Q3y = Q3XB + Q3WC + Q3(v + e)
(3.4.15) Q4y = Q4XB + Q4ZD + Q4WC + Q4(u + v + e)

We note that the covariance matrices associated with the errors in the above equations may be written (respectively) as

(3.4.16) Cov(Q1s) = Q1SQ1 = pQ1
(3.4.17) Cov(Q2s) = Q2SQ2 = qQ2
(3.4.18) Cov(Q3s) = Q3SQ3 = rQ3
(3.4.19) Cov(Q4s) = Q4SQ4 = kQ4

Each of these four covariance matrices is of the form of a constant times an idempotent matrix. These four constants may be equated by multiplying equations (3.4.12), (3.4.13), (3.4.14), and (3.4.15) by the weights (1/p), (1/q), (1/r), and (1/k), respectively. Moreover, it follows from Lemma (2.1) that least squares applied to the system so weighted yields the best (minimum variance) unbiased estimator within the class containing all least squares estimators of the parameter vector A from any further transformation of these equations or, in fact, of the original equation (3.4.7). We will refer to the least squares estimator of A from the system of orthogonal equations

(3.4.20) (1/p)Q1y = (1/p)Q1XB + (1/p)Q1s
(3.4.21) (1/q)Q2y = (1/q)Q2XB + (1/q)Q2ZD + (1/q)Q2s
(3.4.22) (1/r)Q3y = (1/r)Q3XB + (1/r)Q3WC + (1/r)Q3s
(3.4.23) (1/k)Q4y = (1/k)Q4XB + (1/k)Q4ZD + (1/k)Q4WC + (1/k)Q4s

as the weighted least squares estimator of A. This estimator may be written as

(3.4.24) a_WLS = (R'Q1R/p + R'Q2R/q + R'Q3R/r + R'Q4R/k)⁻¹(R'Q1/p + R'Q2/q + R'Q3/r + R'Q4/k)y.

The decomposition of the original equation by the transformations Q1, Q2, Q3, and Q4 has the effect of isolating the correlations found in the non-block-diagonal covariance matrix of its error vector, S, to the particular orthogonal spaces. Since these transformations are orthogonal, and their sum is the identity matrix, equation (3.4.7) is said to have been reduced by the quadruple (Q1, Q2, Q3, Q4) into the four orthogonal equations (3.4.12), (3.4.13), (3.4.14), and (3.4.15). Since these four equations contain exactly the same information as the original equation, we would expect that the minimum variance unbiased estimator from the four equations would be equivalent to the generalized least squares estimator from the original equation. This result is stated in the following theorem.

Theorem (3.1): The weighted least squares estimator, a_WLS, is equal to the generalized least squares estimator, a_GLS.

Proof: The generalized least squares estimator of A from equation (3.4.7) can be rewritten as

(3.4.25) a_GLS = (R'S⁻¹R)⁻¹R'S⁻¹y
 = (R'[(1/p)Q1 + (1/q)Q2 + (1/r)Q3 + (1/k)Q4]R)⁻¹R'[(1/p)Q1 + (1/q)Q2 + (1/r)Q3 + (1/k)Q4]y
 = (R'Q1R/p + R'Q2R/q + R'Q3R/r + R'Q4R/k)⁻¹(R'Q1/p + R'Q2/q + R'Q3/r + R'Q4/k)y.

Therefore, a_WLS = a_GLS. Q.E.D.
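Theorem (3.1) is easy to confirm numerically. In the sketch below (an editorial illustration in Python/numpy; the design and names are our own), a_GLS is computed by brute-force inversion of S while a_WLS uses the weighted Q decomposition, and the two agree to machine precision:

    import numpy as np

    rng = np.random.default_rng(4)
    N, T = 30, 6
    p, su2, sv2 = 1.0, 0.5, 0.25
    q, r, k = p + T * su2, p + N * sv2, p + T * su2 + N * sv2
    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q3 = np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q4
    Q1 = np.eye(N * T) - Q2 - Q3 - Q4

    X = rng.normal(size=(N * T, 2))
    Z = np.repeat(rng.normal(size=(N, 1)), T, axis=0)     # time-invariant
    W = np.tile(rng.normal(size=(T, 1)), (N, 1))          # individual-invariant
    R = np.hstack([X, Z, W])
    s = (np.repeat(rng.normal(0, np.sqrt(su2), N), T)
         + np.tile(rng.normal(0, np.sqrt(sv2), T), N)
         + rng.normal(0, np.sqrt(p), N * T))
    y = R @ np.ones(4) + s

    S = p * Q1 + q * Q2 + r * Q3 + k * Q4
    Si = np.linalg.inv(S)
    a_gls = np.linalg.solve(R.T @ Si @ R, R.T @ Si @ y)   # (3.4.8)
    Wm = Q1 / p + Q2 / q + Q3 / r + Q4 / k                # weighted normal equations
    a_wls = np.linalg.solve(R.T @ Wm @ R, R.T @ Wm @ y)   # (3.4.24)
    assert np.allclose(a_gls, a_wls)                      # Theorem (3.1)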
Now, least squares applied to equation (3.4.13) is called the between-individual estimator, a2 = (R'Q2R)⁻¹R'Q2y; it utilizes the variation between individuals. Least squares applied to equation (3.4.14) is called the between-time-period estimator, a3 = (R'Q3R)⁻¹R'Q3y; it utilizes the variation between time periods. Recall that the within estimator can be viewed as least squares applied to equation (3.4.12); it utilizes the residual variation. Maddala (1971) claims that the generalized least squares estimator can be viewed as an efficient combination of the above three estimators. The optimal weights for the three different sets of variation are the constants used to normalize each of the equations; i.e., the reciprocals of the variances p = σ_e², q = σ_e² + Tσ_u², and r = σ_e² + Nσ_v² for the respective equations (3.4.12), (3.4.13), and (3.4.14). Since the weighted least squares estimator has been shown to be equivalent to the generalized least squares estimator, this would imply that equation (3.4.15) may be dropped and the weighted least squares estimator computed using the remaining three equations only. Indeed, equation (3.4.15) only determines the constant term and, therefore, dropping this equation and omitting the constant term leaves the estimates of the other coefficients unchanged. We prove this in the next theorem.

Theorem (3.2): Weighted least squares applied to the set of equations (3.4.20), (3.4.21), (3.4.22), and (3.4.23) is equivalent to weighted least squares applied to the first three equations only.

Proof: We rewrite equation (3.4.7) as

(3.4.26) y = RA + s = R1A1 + R2A2 + s

where R1 = (1,...,1)' is a (NT x 1) vector of 1's and A1 is the constant term. From Schmidt (1983), the generalized least squares estimator of A2 is

(3.4.27) a2 = (R2'MR2)⁻¹R2'My

where M = S⁻¹ - S⁻¹R1(R1'S⁻¹R1)⁻¹R1'S⁻¹. But for our S⁻¹ and R1, a straightforward calculation shows

(3.4.28) M = S⁻¹ - (1/k)Q4 = (1/p)Q1 + (1/q)Q2 + (1/r)Q3.

Q.E.D.

3.5 Random Effects Correlated with the Regressors

In some applications of the error-components model, there may be reason to believe that either the individual-specific or the time-specific unobservable effects found in the error term may, in fact, be correlated with some of the included explanatory variables. If we take the view suggested earlier, that the random effects represent omitted individual-specific and time-specific variables, this correlation would seem inevitable. When there is correlation between the random effects and the explanatory variables, the generalized least squares estimator is biased and inconsistent. We consider the case in which the effects are correlated with some of the regressors. To consider this case, we first need to introduce some notation.
Consider the equation

(3.5.1) y_it = (X1_it, X2_it, X3_it, X4_it)B + (Z1_i, Z2_i)D + (W1_t, W2_t)C + u_i + v_t + e_it

where X1_it represents the (1 x g1) dimensioned vector of time- and individual-varying explanatory variables, Z1_i represents the (1 x k1) dimensioned vector of time-invariant explanatory variables, and W1_t represents the (1 x h1) dimensioned vector of individual-invariant explanatory variables, all of which are assumed to be uncorrelated with the three errors u_i, v_t, and e_it. The (1 x g2) dimensioned vector of time- and individual-varying explanatory variables, X2_it, and the (1 x k2) dimensioned vector of time-invariant explanatory variables, Z2_i, are both assumed to be correlated with u_i but uncorrelated with v_t and e_it. The (1 x g3) dimensioned vector of time- and individual-varying explanatory variables, X3_it, and the (1 x h2) dimensioned vector of individual-invariant explanatory variables, W2_t, are both assumed to be correlated with v_t but uncorrelated with u_i and e_it. Finally, the (1 x g4) dimensioned vector of time- and individual-varying explanatory variables, X4_it, is assumed to be correlated with u_i and v_t but uncorrelated with e_it. As before, the random noise component e_it, the individual effects u_i, and the time effects v_t are iid as well as independent of one another.

We note in passing that the variables X, which vary over both individuals and time, may be correlated or not with both the individual effects u_i and the time effects v_t. Thus there are four possible kinds of X's. However, the variables Z are time-invariant, and cannot possibly be correlated with the time effects; there are only two kinds of Z's, correlated or not with the individual effects. Similarly, the variables W are individual-invariant, and cannot possibly be correlated with the individual effects; there are only two kinds of W's, correlated or not with the time effects.

The matrix form of equation (3.5.1) can be written as

(3.5.2) y = (X1, X2, X3, X4)B + (Z1, Z2)D + (W1, W2)C + u + v + e

where y, u, v, and e are (NT x 1); X is (NT x g), g = g1 + g2 + g3 + g4; Z is (NT x k), k = k1 + k2; and W is (NT x h), h = h1 + h2.

3.5.1 Weighted Least Squares Estimation

We decompose equation (3.5.2) into the three orthogonal equations

(3.5.3) Q1y = Q1X1B1 + Q1X2B2 + Q1X3B3 + Q1X4B4 + Q1e
(3.5.4) Q2y = Q2X1B1 + Q2X2B2 + Q2X3B3 + Q2X4B4 + Q2Z1D1 + Q2Z2D2 + Q2(e + u)
(3.5.5) Q3y = Q3X1B1 + Q3X2B2 + Q3X3B3 + Q3X4B4 + Q3W1C1 + Q3W2C2 + Q3(e + v)

Now, since Q1v = 0 and Q1u = 0, there is no correlation between errors and regressors in (3.5.3). Furthermore,

(3.5.6) H2 = [Q2X1, Q2X3, Q2Z1]

can readily be seen to be the largest available set of variables in equation (3.5.4) which have been assumed uncorrelated with the individual effects. Likewise,

(3.5.7) H3 = [Q3X1, Q3X2, Q3W1]

can readily be seen to be the largest available set of variables in equation (3.5.5) which have been assumed uncorrelated with the time effects. Projecting equation (3.5.4) onto the column space of H2 and projecting equation (3.5.5) onto the column space of H3, we have the set of orthogonal equations

(3.5.8) Q1y = Q1RA + Q1e
(3.5.9) P2Q2y = P2Q2RA + P2Q2(e + u)
(3.5.10) P3Q3y = P3Q3RA + P3Q3(e + v)

where P2 = P[H2] and P3 = P[H3]. The covariance matrices associated with the errors in equations (3.5.8), (3.5.9), and (3.5.10) can be written as

(3.5.11) Cov(Q1e) = pQ1,
(3.5.12) Cov(P2Q2(e + u)) = qP2, and
(3.5.13) Cov(P3Q3(e + v)) = rP3,

respectively.
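The instrument sets (3.5.6)-(3.5.7) and their projections are easy to build in code. The following sketch (an editorial illustration in Python/numpy; the helper proj and all dimensions are our own hypothetical choices) constructs H2 and H3 from placeholder component matrices and verifies that P2 and P3 live inside the Q2 and Q3 spaces and are mutually orthogonal:

    import numpy as np

    def proj(A):
        # orthogonal projection onto the column space of A
        return A @ np.linalg.solve(A.T @ A, A.T)

    rng = np.random.default_rng(5)
    N, T = 30, 5
    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q3 = np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q4
    Q1 = np.eye(N * T) - Q2 - Q3 - Q4

    X1 = rng.normal(size=(N * T, 2))                    # g1 = 2
    X2 = rng.normal(size=(N * T, 1))                    # placeholder for the u-correlated block
    X3 = rng.normal(size=(N * T, 1))                    # placeholder for the v-correlated block
    Z1 = np.repeat(rng.normal(size=(N, 1)), T, axis=0)  # k1 = 1
    W1 = np.tile(rng.normal(size=(T, 1)), (N, 1))       # h1 = 1

    H2 = np.hstack([Q2 @ X1, Q2 @ X3, Q2 @ Z1])         # (3.5.6)
    H3 = np.hstack([Q3 @ X1, Q3 @ X2, Q3 @ W1])         # (3.5.7)
    P2, P3 = proj(H2), proj(H3)
    assert np.allclose(P2 @ Q2, P2) and np.allclose(P3 @ Q3, P3)          # inside Q2, Q3
    assert np.allclose(P2 @ P3, 0 * P2) and np.allclose(Q1 @ P2, 0 * P2)  # orthogonality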
We note that each of these three covariance matrices has the form of a constant times an idempotent matrix. Thus, Lemma (2.1) would imply that any further attempt at diagonalizing the covariance matrix in any of the equations would not improve the efficiency of the resulting estimator. Using the weights p, q, and r, the weighted least squares estimator of A from equations (3.5.8), (3.5.9), and (3.5.10) becomes

(3.5.14) a_WLS = {(1/p)R'Q1R + (1/q)R'P2R + (1/r)R'P3R}⁻¹{(1/p)R'Q1 + (1/q)R'P2 + (1/r)R'P3}y.

It is possible to derive the estimator (3.5.14) without decomposing the equation into orthogonal spaces, as follows. First, we multiply equation (3.5.2) by S^(-1/2) to transform the error term so that it has a scalar covariance matrix. The transformed equation is simply

(3.5.15) S^(-1/2)y = S^(-1/2)RA + S^(-1/2)s.

Second, we note that the maximal set of available instruments for equation (3.5.2) may be written as

(3.5.16) H* = [Q1, Q2X1, Q2X3, Q2Z1, Q3X1, Q3X2, Q3W1].

We then follow the path of Hausman and Taylor (1981), by estimating (3.5.15) using IV with instrument set H*. This yields

(3.5.17) a_IV = {R'S^(-1/2)P*S^(-1/2)R}⁻¹{R'S^(-1/2)P*S^(-1/2)}y

where P* = P[H*]. We can evaluate P[H*] using the following Lemma.

Lemma (3.3): P[H*] = Q1 + P[H2] + P[H3] ≡ Q1 + P2 + P3.

The efficient instrumental variables estimator of A from equation (3.5.2) can then be written as

(3.5.18) a_IV = {R'S^(-1/2)(Q1 + P2 + P3)S^(-1/2)R}⁻¹{R'S^(-1/2)(Q1 + P2 + P3)S^(-1/2)}y
             = {R'((1/p)Q1 + (1/q)P2 + (1/r)P3)R}⁻¹{R'((1/p)Q1 + (1/q)P2 + (1/r)P3)}y.

But this is simply the weighted least squares estimator. We have therefore proved the following theorem.

Theorem (3.4): The efficient instrumental variables estimator of A equals weighted least squares applied to equations (3.5.8), (3.5.9), and (3.5.10).

3.5.2 Counting Rules for Identification

Following Hausman and Taylor, and corresponding to the familiar rank condition, we have the theorem:

Theorem (3.5): A necessary and sufficient condition that the vector of parameters A be identified in equation (3.5.2) is that the matrix R'P*R be nonsingular.

Corresponding to the order condition, we have the following theorem:

Theorem (3.6): A necessary condition for the identification of A in equation (3.5.2) is that (i) g1 + g3 ≥ k2 and (ii) g1 + g2 ≥ h2.

Proof: Since P*R = (Q1 + P2 + P3)(X, Z, W) = (Q1X, 0, 0) + (P2X, P2Z, 0) + (P3X, 0, P3W) = (P*X, P2Z, P3W), a necessary condition for the matrix R'P*R to be nonsingular is that rank(P*X) = g, rank(P2Z) = k, and rank(P3W) = h. Now rank(P2Z) ≤ min{rank(P2), rank(Z)}, and rank(P2) = g1 + g3 + k1. Similarly, rank(P3) = g1 + g2 + h1. Thus, a necessary condition for identification is that g1 + g3 + k1 ≥ k and g1 + g2 + h1 ≥ h; that is, g1 + g3 ≥ k2 and g1 + g2 ≥ h2. Q.E.D.

Therefore, insuring that the parameters of the model are identified requires that the parameters in each of the three equations (3.5.3), (3.5.4), and (3.5.5) separately be identified.
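Continuing the illustrative sketch given after (3.5.13) (same Q matrices, component blocks, and proj helper; again an editorial addition with a data-generating process of our own), the estimator (3.5.14) can be computed once the effects and the correlated regressors are added; the counting rules of Theorem (3.6) hold in this design (g1 + g3 = 3 ≥ k2 = 1 and g1 + g2 = 3 ≥ h2 = 1):

    # Continuing the previous sketch: add effects and correlated regressors,
    # then compute the weighted IV estimator (3.5.14) = (3.5.18).
    ui, vt = rng.normal(size=N), rng.normal(size=T)
    u, v = np.repeat(ui, T), np.tile(vt, N)
    X2 = X2 + u[:, None]                       # now correlated with u
    X3 = X3 + v[:, None]                       # now correlated with v
    X4 = rng.normal(size=(N * T, 1)) + u[:, None] + v[:, None]
    Z2 = np.repeat(rng.normal(size=(N, 1)) + ui[:, None], T, axis=0)
    W2 = np.tile(rng.normal(size=(T, 1)) + vt[:, None], (N, 1))
    R = np.hstack([X1, X2, X3, X4, Z1, Z2, W1, W2])
    y = R @ np.ones(R.shape[1]) + u + v + rng.normal(size=N * T)

    p, q, r = 1.0, 1.0 + T, 1.0 + N            # sigma_e^2 = sigma_u^2 = sigma_v^2 = 1 here
    H2 = np.hstack([Q2 @ X1, Q2 @ X3, Q2 @ Z1])   # rebuilt, since X2 and X3 changed above
    H3 = np.hstack([Q3 @ X1, Q3 @ X2, Q3 @ W1])
    P2, P3 = proj(H2), proj(H3)
    A = R.T @ (Q1 / p + P2 / q + P3 / r)
    a_wls = np.linalg.solve(A @ R, A @ y)      # (3.5.14); consistent as both N and T grow
    print(a_wls)

Note that Q2X3 remains a legitimate instrument even after X3 is shifted by v, because Q2 annihilates the time effects; the analogous remark applies to Q3X2 and u.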
Theorem (3.7): Given the rank condition of Theorem (3.6), weighted least squares applied to equations (3.5.8), (3.5.9), and (3.5.10) is a consistent estimator for A.

Proof: Weighted least squares applied to equations (3.5.8), (3.5.9), and (3.5.10) can be written as

a_WLS = {(1/p)R'Q1R + (1/q)R'P2R + (1/r)R'P3R}⁻¹{(1/p)R'Q1 + (1/q)R'P2 + (1/r)R'P3}y
 = A + {(1/p)R'Q1R + (1/q)R'P2R + (1/r)R'P3R}⁻¹{(1/p)R'Q1 + (1/q)R'P2 + (1/r)R'P3}(u + v + e).

Since the estimator exists, lim {(1/p)R'Q1R + (1/q)R'P2R + (1/r)R'P3R}⁻¹ is finite as both N → ∞ and T → ∞. Next, consider

{(1/p)R'Q1 + (1/q)R'P2 + (1/r)R'P3}(u + v + e)/NT
 = (1/p)R'Q1(u + v + e)/NT + (1/q)(R'H2/NT)(H2'H2/NT)⁻¹H2'(u + v + e)/NT
   + (1/r)(R'H3/NT)(H3'H3/NT)⁻¹H3'(u + v + e)/NT,

where

R'Q1(u + v + e) = R'Q1e,
H2'(u + v + e) = [X1'Q2(u + e); X3'Q2(u + e); Z1'Q2(u + e)], and
H3'(u + v + e) = [X1'Q3(v + e); X2'Q3(v + e); W1'Q3(v + e)].

As we can easily show,
plim X'Q1e/NT = 0 as N → ∞ or T → ∞;
plim X1'Q2(u + e)/NT = 0, plim X3'Q2(u + e)/NT = 0, and plim Z1'Q2(u + e)/NT = 0 as N → ∞; and
plim X1'Q3(v + e)/NT = 0, plim X2'Q3(v + e)/NT = 0, and plim W1'Q3(v + e)/NT = 0 as T → ∞.

Therefore, plim R'Q1(u + v + e)/NT = 0 as N → ∞ or T → ∞, plim H2'(u + v + e)/NT = 0 as N → ∞, and plim H3'(u + v + e)/NT = 0 as T → ∞. Since the estimator exists, lim (R'H2/NT)(H2'H2/NT)⁻¹ is finite as N → ∞ and lim (R'H3/NT)(H3'H3/NT)⁻¹ is finite as T → ∞. Thus,

plim [(1/p)R'Q1(u + v + e)/NT + (1/q)(R'H2/NT)(H2'H2/NT)⁻¹H2'(u + v + e)/NT
      + (1/r)(R'H3/NT)(H3'H3/NT)⁻¹H3'(u + v + e)/NT] = 0

as both N → ∞ and T → ∞. It follows that plim a_WLS = A as both N → ∞ and T → ∞. Q.E.D.

The weighted least squares estimator is not consistent as N → ∞ for T fixed, or as T → ∞ for N fixed. The case when N → ∞ for T fixed will be discussed in more detail in section 3.6.2.

3.6 Random Effects when T is Fixed

In the previous two sections, we have derived GLS and IV estimators which are useful only when both N and T are large. In this section, we will be concerned with the case in which N is large and T is small. This is the situation most common in panel or longitudinal data.

3.6.1 Random Effects not Correlated with the Regressors

For the present we will assume that the regressors (X, Z, and W) are all uncorrelated with the error components e, u, and v. Unbiased estimation is still possible in the case of small (or fixed) T. The problem which does arise is the inability to obtain consistent estimates from applying either least squares or generalized least squares to the above equation. To see this, consider equation (3.4.7) multiplied by S^(-1/2). We then have

(3.6.1) S^(-1/2)y = S^(-1/2)XB + S^(-1/2)ZD + S^(-1/2)WC + S^(-1/2)(e + u + v)

where S^(-1/2) = Q1/p* + Q2/q* + Q3/r* + Q4/k*, p* = (σ_e²)^(1/2), q* = (σ_e² + Tσ_u²)^(1/2), r* = (σ_e² + Nσ_v²)^(1/2), and k* = (σ_e² + Tσ_u² + Nσ_v²)^(1/2). Evaluating the probability limit of the cross product of the transformed regressor S^(-1/2)X and the transformed error component S^(-1/2)v, we find that

(3.6.2) plim X'S⁻¹v/NT = plim X'(Q1/p + Q2/q + Q3/r + Q4/k)v/NT
 = plim X'(Q3v/r + Q4v/k)/NT, since Q1v = Q2v = 0,
 = plim (1/r)X'(j_N j_N'/N ⊗ I_T)v/NT - plim (1/r)X'Q4v/NT + plim (1/k)X'Q4v/NT.

This probability limit does not equal zero as N → ∞, since

(3.6.3) plim (1/r)X'(j_N j_N'/N ⊗ I_T)v/NT = plim (1/r) Σ_{t=1}^{T} (Σ_{i=1}^{N} X_it/N)'v_t/T = 0

only as T → ∞. Therefore, we have the problem of the regressors being correlated with one of the error components, in the sense that their cross-moments have a nonzero probability limit for the case when only N → ∞.
Thus, for the case of fixed T, the generalized least squares estimator of the coefficient vectors in equation (3.4.7) will not be consistent. Furthermore, exactly the same problem arises with ordinary least squares. For example, X'v/NT has a nonzero probability limit as N → ∞ with T fixed. Only if both N and T → ∞ will ordinary least squares be consistent.

A proposed solution to this problem is to apply weighted least squares to a subset of the equations in the decomposition of equation (3.4.7). Unfortunately, the coefficient vector C is then no longer estimable. The weighted least squares estimator we derive is for the vector of coefficients (B', D')'. Consider the decomposition of equation

(3.6.4) y = XB + ZD + WC + u + v + e

into the four orthogonal equations

(3.6.5) Q1y = Q1XB + Q1e
(3.6.6) Q2y = Q2XB + Q2ZD + Q2(u + e)
(3.6.7) Q3y = Q3XB + Q3WC + Q3(v + e)
(3.6.8) Q4y = Q4XB + Q4ZD + Q4WC + Q4(u + v + e)

Using Theorem (3.2), we know that equation (3.6.8) may be dropped without affecting the estimation of the remaining equations. In addition, equation (3.6.7) must be dropped, for it is the source of the present problem. That is, it is equation (3.6.7) from which comes the matrix of cross-moments that has a nonzero probability limit unless T gets large. Thus, the estimator of (B', D')' will be derived by applying weighted least squares to the two remaining equations, namely equations (3.6.5) and (3.6.6). Let R = (X, Z). Then the weighted least squares estimator of (B', D')' from equations (3.6.5) and (3.6.6) can be written as

(3.6.9) (b_WLS', d_WLS')' = (R'Q1R/p + R'Q2R/q)⁻¹(R'Q1/p + R'Q2/q)y.

Assumption (3.5): lim (R'Q1R/p + R'Q2R/q)/NT as N → ∞ is finite and nonsingular.

Theorem (3.6): The weighted least squares estimator of (B', D')' from equations (3.6.5) and (3.6.6) is consistent as N → ∞, with T fixed.

Proof: The weighted least squares estimator can be written as

(b_WLS', d_WLS')' = (R'Q1R/p + R'Q2R/q)⁻¹(R'Q1/p + R'Q2/q)y
 = (B', D')' + {((1/p)R'Q1R/NT + (1/q)R'Q2R/NT)⁻¹((1/p)R'Q1/NT + (1/q)R'Q2/NT)(u + e)}.

Consider the second term. First, ((1/p)R'Q1R/NT + (1/q)R'Q2R/NT)⁻¹ is by assumption finite as N gets large. Next, we can write ((1/p)R'Q1 + (1/q)R'Q2)(u + e) = (1/p)R'Q1e + (1/q)R'Q2(u + e), where

(3.6.10) R'Q1e = [X'Q1e; 0] and
(3.6.11) R'Q2(u + e) = [X'Q2(u + e); Z'Q2(u + e)].

As we can easily show,

(3.6.12) plim X'Q1e/NT = 0 as N → ∞ or T → ∞,
(3.6.13) plim X'Q2(u + e)/NT = 0 as N → ∞, and
(3.6.14) plim Z'Q2(u + e)/NT = 0 as N → ∞.

Thus, plim R'Q1e/NT = plim R'Q2(u + e)/NT = 0 as N → ∞. Hence, plim {(1/p)R'Q1e/NT + (1/q)R'Q2(u + e)/NT} = 0 as N → ∞. Therefore, plim (b_WLS', d_WLS')' = (B', D')' as N → ∞. Q.E.D.
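A minimal numerical sketch of the fixed-T estimator (3.6.9), again an editorial illustration in Python/numpy under an assumed data-generating process of our own (σ_e² = σ_u² = 1, N large, T = 4 fixed); note that W drops out of both retained equations, so C is not estimated:

    import numpy as np

    rng = np.random.default_rng(6)
    N, T = 400, 4                              # N large, T small and fixed
    p, q = 1.0, 1.0 + T                        # sigma_e^2 = sigma_u^2 = 1 assumed known
    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q1 = np.eye(N * T) - np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q2
    # (using Q3 + Q4 = (j j'/N (x) I_T), so Q1 = I - Q2 - Q3 - Q4)

    X = rng.normal(size=(N * T, 2))
    Z = np.repeat(rng.normal(size=(N, 1)), T, axis=0)
    W = np.tile(rng.normal(size=(T, 1)), (N, 1))
    u = np.repeat(rng.normal(size=N), T)
    v = np.tile(rng.normal(size=T), N)
    y = X @ np.ones(2) + Z @ np.ones(1) + W @ np.ones(1) + u + v + rng.normal(size=N * T)

    R = np.hstack([X, Z])                      # W drops out; C is not estimable here
    A = R.T @ (Q1 / p + Q2 / q)
    bd = np.linalg.solve(A @ R, A @ y)         # (3.6.9); consistent as N -> oo with T fixed
    print(bd)                                  # close to (1, 1, 1)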
3.6.2 Random Effects Correlated with the Regressors

We again consider the case when T is fixed but now, in addition, we consider the case in which the effects are correlated with some of the regressors. To this end, we first re-introduce some notation. Consider the equation

(3.6.15) y = XB + ZD + WC + u + v + e

where we again assume that X = (X1, X2, X3, X4), Z = (Z1, Z2), and W = (W1, W2). That is, X1, Z1, and W1 denote (NT x g1), (NT x k1), and (NT x h1) dimensioned matrices, respectively, all assumed to be uncorrelated with e, u, and v; X2 and Z2 denote (NT x g2) and (NT x k2) dimensioned matrices, respectively, both assumed to be correlated with u but uncorrelated with e and v; X3 and W2 denote (NT x g3) and (NT x h2) dimensioned matrices, respectively, both assumed to be correlated with v but uncorrelated with e and u; and X4 denotes a (NT x g4) dimensioned matrix which is assumed to be correlated with both u and v but uncorrelated with e.

Now, not only is the weighted least squares estimator of A biased, but so is the weighted least squares estimator of (B', D')' derived in section 3.6.1. This bias is due to the presence of regressors which are assumed correlated with the equation's error term. One approach to consistent estimation of (B', D')' is to apply the instrumental variables, weighted least squares method to the equations (3.6.5) and (3.6.6). In this section we will derive such an estimator and show it to be consistent.

First we consider the equations

(3.6.16) Q1y = Q1(X1, X2, X3, X4)B + Q1e
(3.6.17) Q2y = Q2(X1, X2, X3, X4)B + Q2(Z1, Z2)D + Q2(u + e)

Since Q1u = Q1v = 0 and Q1W = 0, there is no problem of correlation between errors and regressors in (3.6.16). Furthermore, as we show in appendix A, the set

(3.6.18) H0 = [Q2X1, Q2X3, Q2Z1]

contains legitimate instruments for equation (3.6.17). It can readily be seen that H0 is the largest available set of variables in equation (3.6.17) which have been assumed uncorrelated with the individual effects found in that equation. Projecting equation (3.6.17) onto the column space of H0, we have the set of orthogonal equations

(3.6.19) Q1y = Q1(X1, X2, X3, X4)B + Q1e
(3.6.20) P2y = P2(X1, X2, X3, X4)B + P2(Z1, Z2)D + P2(u + e)

where P2 = P[H0]. The covariance matrices associated with the errors in equations (3.6.19) and (3.6.20) can be written as

(3.6.21) Cov(Q1e) = pQ1 and
(3.6.22) Cov(P2(u + e)) = qP2,

respectively. We note that each of these two covariance matrices has the form of a constant times an idempotent matrix. Thus, Lemma (2.1) would imply that any further attempt at diagonalizing the covariance matrix in either equation (3.6.19) or (3.6.20) would not improve the efficiency of the resulting estimator. Using the weights p and q, the weighted least squares estimator of (B', D')' from equations (3.6.19) and (3.6.20) becomes

(3.6.23) (b_IV', d_IV')' = {R'Q1R/p + R'P2R/q}⁻¹{R'Q1y/p + R'P2y/q}

where R = (X1, X2, X3, X4, Z1, Z2). We first derive the necessary conditions for the existence of the above estimator, and then show it to be consistent for fixed T. Corresponding to the order condition, we have the following theorem:

Theorem (3.7): A necessary condition for the weighted least squares estimator of (B', D')' from equations (3.6.19) and (3.6.20) to exist is that g1 + g3 ≥ k2.

Proof: The existence of the IV estimator depends on the matrix

[Q1R; P2R] = [Q1X1, Q1X2, Q1X3, Q1X4, 0, 0; P2X1, P2X2, P2X3, P2X4, P2Z1, P2Z2]

being of full column rank. For this matrix to be of full column rank it is necessary that (P2Z1, P2Z2) be of full column rank. Since P2Z1 = Q2Z1 and P2Z2 lies in the span of H0 = [Q2X1, Q2X3, Q2Z1], it follows that a necessary condition for the existence of the estimator is that rank(X1, X3) ≥ rank(Z2); or that g1 + g3 ≥ k2. Q.E.D.
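The estimator (3.6.23) can be sketched numerically as follows (an editorial illustration in Python/numpy; the data-generating process and names are our own, with g1 = 2, g3 = 1, k2 = 1, so the order condition of the theorem above holds):

    import numpy as np

    def proj(A):
        return A @ np.linalg.solve(A.T @ A, A.T)

    rng = np.random.default_rng(7)
    N, T = 400, 4
    p, q = 1.0, 1.0 + T                                   # sigma_e^2 = sigma_u^2 = 1
    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q1 = np.eye(N * T) - np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q2

    ui, vt = rng.normal(size=N), rng.normal(size=T)
    u, v = np.repeat(ui, T), np.tile(vt, N)
    X1 = rng.normal(size=(N * T, 2))
    X2 = rng.normal(size=(N * T, 1)) + u[:, None]
    X3 = rng.normal(size=(N * T, 1)) + v[:, None]
    X4 = rng.normal(size=(N * T, 1)) + u[:, None] + v[:, None]
    Z1 = np.repeat(rng.normal(size=(N, 1)), T, axis=0)
    Z2 = np.repeat(rng.normal(size=(N, 1)) + ui[:, None], T, axis=0)
    R = np.hstack([X1, X2, X3, X4, Z1, Z2])
    y = R @ np.ones(R.shape[1]) + u + v + rng.normal(size=N * T)

    H0 = np.hstack([Q2 @ X1, Q2 @ X3, Q2 @ Z1])           # (3.6.18); g1 + g3 = 3 >= k2 = 1
    P2 = proj(H0)
    A = R.T @ (Q1 / p + P2 / q)
    bd_iv = np.linalg.solve(A @ R, A @ y)                 # (3.6.23); consistent for fixed T
    print(bd_iv)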
Theorem (3.8): Given the rank condition of Theorem (3.7), weighted least squares applied to equations (3.6.19) and (3.6.20) is a consistent estimator for (B', D')' when T is fixed.

Proof: Weighted least squares applied to equations (3.6.19) and (3.6.20) can be written as

(b_IV', d_IV')' = {R'Q1R/p + R'P2R/q}⁻¹{R'Q1/p + R'P2/q}y
 = (B', D')' + {R'Q1R/p + R'P2R/q}⁻¹{R'Q1/p + R'P2/q}(u + e).

Since the estimator exists, lim {R'Q1R/p + R'P2R/q}⁻¹ is finite as N → ∞. Next, consider

{R'Q1/p + R'P2/q}(u + e)/NT
 = (1/p)R'Q1(u + e)/NT + (1/q)R'P2(u + e)/NT
 = (1/p)R'Q1e/NT + (1/q)(R'H0/NT)(H0'H0/NT)⁻¹(H0'(u + e)/NT)

where

R'Q1e = [X'Q1e; 0] and
H0'(u + e) = [X1'Q2(u + e); X3'Q2(u + e); Z1'Q2(u + e)].

As we can easily show, plim X'Q1e/NT = 0 as N → ∞ or T → ∞; and plim X1'Q2(u + e)/NT = 0, plim X3'Q2(u + e)/NT = 0, and plim Z1'Q2(u + e)/NT = 0 as N → ∞. Therefore, plim R'Q1e/NT = plim H0'(u + e)/NT = 0 as N → ∞. Since the estimator exists, lim (R'H0/NT)(H0'H0/NT)⁻¹ is finite as N → ∞. Thus,

plim {(1/p)R'Q1e/NT + (1/q)(R'H0/NT)(H0'H0/NT)⁻¹(H0'(u + e)/NT)} = 0 as N → ∞.

It follows that plim (b_IV', d_IV')' = (B', D')' as N → ∞. Q.E.D.

3.6.3 An Alternative Approach

The above approach to estimating (B', D')' is an extension of the analytical method used throughout this chapter. Instead, we could follow a naive extension of the analytical method used in chapter 2. In the simple model of chapter 2, when random individual effects are present, consistency of the least squares estimator requires N → ∞. There the problem was that we had regressors correlated with one of the error components, in the sense that their cross-moments have a nonzero probability limit for the case when only T → ∞. This is because the effect of the random component u_i can be averaged out only in the direction of that component. That is, probability limits of terms like

X'Pu/NT = Σ_{i=1}^{N} (Σ_{t=1}^{T} X_it/T)'u_i/N

equal zero only as N → ∞. A solution to this problem was to construct a transformation P which determines the means for each of the individual groups and repeats these N observations T times. The within transformation, Q = I_NT - P, then transforms each observation into the difference between itself and its respective individual group mean. Premultiplying equation (2.4.7) by the within transformation eliminated the individual random effects, and so the need for N → ∞; least squares applied to the transformed equation turns out to be consistent as T → ∞.

Now, however, we are interested not in eliminating individual effects but rather in eliminating time effects, and with them the need for T → ∞; so we construct a projection similar to P but in the other direction. To this end, we define

(3.6.24) P~ = (j_N j_N'/N ⊗ I_T) and Q~ = I_NT - P~

where j_N = (1,...,1)' is (N x 1). The transformation P~ determines the means for each of the time periods and repeats each of these T observations N times. The transformation Q~ transforms each observation into the difference between itself and its respective time period mean. Explicitly, the (i,t) elements of P~y and Q~y can be written as

(3.6.25) (P~y)_it = y_.t and (Q~y)_it = y_it - y_.t,

respectively. We note that in terms of our previous notation, P~ = Q3 + Q4 and Q~ = Q1 + Q2. Since W contains variables that are constant across all individual observations for a given time period, Q~W = 0. The elements of the columns of W are, on the other hand, unaffected by the transformation P~; that is, P~W = W. Analogous results hold for the time effects v; that is, Q~v = 0 and P~v = v.
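A short sketch (an editorial illustration in Python/numpy; names ours) confirming the decomposition P~ = Q3 + Q4, Q~ = Q1 + Q2 and the annihilation properties just stated:

    import numpy as np

    N, T = 5, 3
    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q3 = np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q4
    Q1 = np.eye(N * T) - Q2 - Q3 - Q4

    Ptil = np.kron(np.full((N, N), 1.0 / N), np.eye(T))   # (3.6.24): time-period means
    Qtil = np.eye(N * T) - Ptil
    assert np.allclose(Ptil, Q3 + Q4) and np.allclose(Qtil, Q1 + Q2)

    W = np.tile(np.arange(1.0, T + 1).reshape(T, 1), (N, 1))   # an individual-invariant column
    v = np.tile(np.array([0.3, -0.1, 0.8]), N)                 # a time effect (T = 3 here)
    assert np.allclose(Qtil @ W, 0 * W) and np.allclose(Ptil @ W, W)
    assert np.allclose(Qtil @ v, 0 * v) and np.allclose(Ptil @ v, v)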
Thus, the original equation (3.6.4) can now be written equivalently as the two orthogonal equations

(3.6.26) P~y = P~XB + P~ZD + P~WC + P~(u + v + e)
(3.6.27) Q~y = Q~XB + Q~ZD + Q~(u + e)

Equation (3.6.27) represents the original model after being purged of the time effects. Of course, the coefficients of the individual-invariant regressors cannot be estimated, but OLS applied to equation (3.6.27) yields a consistent estimator of (B', D')'.

Assumption (3.9): lim R'Q~R/NT as N → ∞ is finite and nonsingular.

Theorem (3.10): The least squares estimator of (B', D')' from equation (3.6.27) is consistent as N → ∞.

Proof: The least squares estimator can be written as

(b_OLS', d_OLS')' = (R'Q~R)⁻¹R'Q~y = (B', D')' + (R'Q~R)⁻¹R'Q~(u + e).

Consider the second term. First, (R'Q~R/NT)⁻¹ is by assumption finite as N gets large. Next, we can write R'Q~(u + e) = R'Q1(u + e) + R'Q2(u + e), where

(3.6.28) R'Q1(u + e) = [X'Q1e; 0] and
(3.6.29) R'Q2(u + e) = [X'Q2(u + e); Z'Q2(u + e)].

As we can easily show,

(3.6.30) plim X'Q1e/NT = 0 as N → ∞ or T → ∞,
(3.6.31) plim X'Q2(u + e)/NT = 0 as N → ∞, and
(3.6.32) plim Z'Q2(u + e)/NT = 0 as N → ∞.

Thus, plim R'Q1(u + e)/NT = plim R'Q2(u + e)/NT = 0 as N → ∞. Hence, plim {R'Q~(u + e)/NT} = 0 as N → ∞. Therefore, plim (b_OLS', d_OLS')' = (B', D')' as N → ∞. Q.E.D.

The OLS estimator from equation (3.6.27) can be viewed as an unweighted version of the WLS estimator from equations (3.6.5) and (3.6.6), as we can see by comparing the following with equation (3.6.9):

(b_OLS', d_OLS')' = (R'Q~R)⁻¹R'Q~y = (R'Q1R + R'Q2R)⁻¹(R'Q1 + R'Q2)y.

Since WLS weights the two equations (3.6.5) and (3.6.6) optimally, we would expect weighted least squares to be efficient relative to least squares. This is shown in the following theorem.

Theorem (3.11): The weighted least squares estimator of (B', D')' from equations (3.6.5) and (3.6.6) is asymptotically efficient (as N → ∞) relative to the least squares estimator from equation (3.6.27). If p and q are known, then weighted least squares is also efficient relative to least squares in finite samples.

Proof: We prove the finite sample case; the other case is similar. Let Ω ≡ Cov(u + e) = pQ1 + qQ2, so that Ω⁻¹ = (1/p)Q1 + (1/q)Q2 (on the space spanned by Q1 and Q2). Then

Cov((b_WLS', d_WLS')')
 = (R'Q1R/p + R'Q2R/q)⁻¹(R'Q1/p + R'Q2/q)(pQ1 + qQ2)(Q1R/p + Q2R/q)(R'Q1R/p + R'Q2R/q)⁻¹
 = (R'Q1R/p + R'Q2R/q)⁻¹(R'Q1R/p + R'Q2R/q)(R'Q1R/p + R'Q2R/q)⁻¹
 = (R'Q1R/p + R'Q2R/q)⁻¹ = (R'(Q1/p + Q2/q)R)⁻¹ = (R'Ω⁻¹R)⁻¹.

And

Cov((b_OLS', d_OLS')') = (R'Q~R)⁻¹R'Q~ΩQ~R(R'Q~R)⁻¹.

Now, to show that Cov((b_OLS', d_OLS')') - Cov((b_WLS', d_WLS')') is positive semidefinite, it is sufficient to show that R'Ω⁻¹R - R'Q~R(R'Q~ΩQ~R)⁻¹R'Q~R is positive semidefinite. But the latter expression can be written as

R'Ω⁻¹R - R'Q~R(R'Q~ΩQ~R)⁻¹R'Q~R
 = R'Ω^(-1/2)[I - Ω^(1/2)Q~R(R'Q~ΩQ~R)⁻¹R'Q~Ω^(1/2)]Ω^(-1/2)R
 = R'Ω^(-1/2)[I - D(D'D)⁻¹D']Ω^(-1/2)R,

where D = Ω^(1/2)Q~R and Ω^(±1/2) = p^(±1/2)Q1 + q^(±1/2)Q2. This is a quadratic form in an idempotent matrix; hence, our expression is positive semidefinite. Q.E.D.
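The covariance comparison of Theorem (3.11) can be checked numerically. In this sketch (an editorial illustration in Python/numpy; design and tolerances are our own), both covariance matrices are formed explicitly and the smallest eigenvalue of their difference is confirmed to be (numerically) nonnegative:

    import numpy as np

    rng = np.random.default_rng(8)
    N, T = 60, 4
    p, q = 1.0, 1.0 + T
    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q1 = np.eye(N * T) - np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q2
    Qtil = Q1 + Q2

    X = rng.normal(size=(N * T, 2))
    Z = np.repeat(rng.normal(size=(N, 1)), T, axis=0)
    R = np.hstack([X, Z])

    Omega = p * Q1 + q * Q2                                # Cov(u + e) on the Q~ space
    cov_wls = np.linalg.inv(R.T @ (Q1 / p + Q2 / q) @ R)   # (R' Omega^-1 R)^-1
    G = np.linalg.inv(R.T @ Qtil @ R)
    cov_ols = G @ (R.T @ Qtil @ Omega @ Qtil @ R) @ G
    # Theorem (3.11): the difference should be positive semidefinite
    assert np.linalg.eigvalsh(cov_ols - cov_wls).min() > -1e-8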
A similar approach can be applied to the case when T is fixed and, in addition, the effects are correlated with some of the regressors. Using the notation of section 3.6.2, we consider the equation

(3.6.37) y = (X1, X2, X3, X4)B + (Z1, Z2)D + (W1, W2)C + u + v + e.

Here again, X1, Z1, and W1 denote (NT x g1), (NT x k1), and (NT x h1) dimensioned matrices, respectively, all assumed to be uncorrelated with e, u, and v; X2 and Z2 denote (NT x g2) and (NT x k2) dimensioned matrices, respectively, both assumed to be correlated with u but uncorrelated with e and v; X3 and W2 denote (NT x g3) and (NT x h2) dimensioned matrices, respectively, both assumed to be correlated with v but uncorrelated with e and u; and X4 denotes a (NT x g4) dimensioned matrix which is assumed to be correlated with both u and v but uncorrelated with e.

As shown in section 3.6.2, the weighted least squares estimator of (B', D')' derived in section 3.6.1 is biased due to the presence of regressors assumed correlated with the equation's error term. An alternative approach to consistent estimation of (B', D')' is to transform equation (3.6.37) by Q~ and then apply the instrumental variables method. In the remainder of this section we will derive two such estimators and discuss their consistency and relative efficiency.

First we consider the decomposition of equation (3.6.37) into the two orthogonal equations

(3.6.38) P~y = P~(X1, X2, X3, X4)B + P~(Z1, Z2)D + P~(W1, W2)C + P~(u + v + e)
(3.6.39) Q~y = Q~(X1, X2, X3, X4)B + Q~(Z1, Z2)D + Q~(u + e)

Since Q~v = 0 and Q~W = 0, time effects are eliminated from equation (3.6.39), but there still exists a problem of correlation between the errors and the regressors X2, X4, and Z2. The largest set of legitimate instruments for equation (3.6.39) would appear to be

(3.6.40) H~ = [Q~X1, Q~X3, Q~Z1].

Unfortunately, by comparing H~ to the list of instruments used for the instrumental variables estimator given in equation (3.6.23), it can be seen that H~ is not the largest set of instruments available. Although not apparent in equation (3.6.39), both Q1X2 and Q1X4 are available instruments being excluded. Following White (1984, section IV.3), the efficiency of an instrumental variables estimator is not decreased by adding more instruments. Hence, an instrumental variables estimator using the instrument set H~ would not lead to a more efficient estimator than the instrumental variables estimator given in equation (3.6.23).

It is interesting to note that the existence of the instrumental variables estimator using H~ depends on the matrix (P[H~]X, P[H~]Z) being of full column rank. It follows that a necessary condition for the existence of that estimator is that rank(X1, X3, Z1) ≥ rank(X) and rank(X1, X3, Z1) ≥ rank(Z); or that k1 ≥ g2 + g4 and g1 + g3 ≥ k2. Thus, not only is it necessary to have enough X1's and X3's to identify the coefficients of the Z2's, but now we must also have enough Z1's available to identify the coefficients of the X2's and X4's.

If instead of the instrument set H~ we use the instrument set

(3.6.41) H+ = [Q~X1, Q1X2, Q~X3, Q1X4, Q2Z1],

we would be using the same list of instruments used in the instrumental variables estimator given in equation (3.6.23). Projecting equation (3.6.39) onto the column space of H+, we have the equation

(3.6.42) P+y = P+(X1, X2, X3, X4)B + P+(Z1, Z2)D + P+(u + e).

It can be shown that P+ = P1 + P2, where P1 = P[Q1X], P2 = P[H0], and H0 is the instrument set given in equation (3.6.18). The least squares estimator of (B', D')' from equation (3.6.42) can be written as

(3.6.43) (b+IV', d+IV')' = (R'P+R)⁻¹R'P+y

where R = (X1, X2, X3, X4, Z1, Z2).
We first derive the necessary conditions for the existence of the above estimator, and then show it to be consistent for fixed T. Corresponding to the order condition, we have the following theorem:

Theorem (3.12): A necessary condition for the least squares estimator of (B', D')' from equation (3.6.42) to exist is that g1 + g3 ≥ k2.

Proof: The existence of the least squares estimator from equation (3.6.42) depends on the matrix P+R being of full column rank. Since P+R = P1R + P2R = (P+X, P2Z), it follows that a necessary condition for the existence of the estimator is that rank(P2Z) = k. But P2Z lies in the span of H0 = [Q2X1, Q2X3, Q2Z1], so this requires g1 + g3 + k1 ≥ k; or that g1 + g3 ≥ k2. Q.E.D.

Theorem (3.13): Given the rank condition of Theorem (3.12), least squares applied to equation (3.6.42) is a consistent estimator for (B', D')' when T is fixed.

Proof: Least squares applied to equation (3.6.42) can be written as

(b+IV', d+IV')' = {R'P+R}⁻¹R'P+y = (B', D')' + {R'P+R}⁻¹R'P+(u + e).

Since the estimator exists, lim {R'P+R}⁻¹ is finite as N → ∞. Next, consider

R'P+(u + e)/NT = R'P1(u + e)/NT + R'P2(u + e)/NT
 = R'Q1(u + e)/NT + (R'H0/NT)(H0'H0/NT)⁻¹(H0'(u + e)/NT),

where we use R'P1 = R'Q1, and where

R'Q1(u + e) = [X'Q1e; 0] and
H0'(u + e) = [X1'Q2(u + e); X3'Q2(u + e); Z1'Q2(u + e)].

As we can easily show, plim X'Q1e/NT = 0 as N → ∞ or T → ∞; and plim X1'Q2(u + e)/NT = 0, plim X3'Q2(u + e)/NT = 0, and plim Z1'Q2(u + e)/NT = 0 as N → ∞. Therefore, plim R'Q1(u + e)/NT = plim H0'(u + e)/NT = 0 as N → ∞. Since the estimator exists, lim (R'H0/NT)(H0'H0/NT)⁻¹ is finite as N → ∞. Thus,

plim {R'Q1(u + e)/NT + (R'H0/NT)(H0'H0/NT)⁻¹(H0'(u + e)/NT)} = 0 as N → ∞.

It follows that plim (b+IV', d+IV')' = (B', D')' as N → ∞. Q.E.D.

The least squares estimator in equation (3.6.43) can be viewed as an unweighted version of the weighted least squares estimator from equations (3.6.19) and (3.6.20), as follows:

(b+IV', d+IV')' = (R'P+R)⁻¹R'P+y = (R'P1R + R'P2R)⁻¹(R'P1 + R'P2)y
 = (R'Q1R(R'Q1R)⁻¹R'Q1R + R'H0(H0'H0)⁻¹H0'R)⁻¹(R'Q1R(R'Q1R)⁻¹R'Q1 + R'H0(H0'H0)⁻¹H0')y
 = (R'Q1R + R'P2R)⁻¹(R'Q1 + R'P2)y.

Since the weighted least squares estimator weights the equations (3.6.19) and (3.6.20) optimally, we would expect weighted least squares to be efficient relative to ordinary least squares. This is shown in the following theorem.

Theorem (3.14): The weighted least squares estimator of (B', D')' from equations (3.6.19) and (3.6.20) is asymptotically (as N → ∞) efficient relative to the least squares estimator from equation (3.6.43). If p and q are known, then weighted least squares is also efficient relative to least squares in finite samples.

Proof: We prove the finite sample case; the other case is similar. Again, let Ω ≡ Cov(u + e) = pQ1 + qQ2 and Ω⁻¹ = (1/p)Q1 + (1/q)Q2. Then

Cov((b_IV', d_IV')')
 = (R'Q1R/p + R'P2R/q)⁻¹(R'Q1/p + R'P2/q)(pQ1 + qQ2)(Q1R/p + P2R/q)(R'Q1R/p + R'P2R/q)⁻¹
 = (R'Q1R/p + R'P2R/q)⁻¹(R'Q1R/p + R'P2R/q)(R'Q1R/p + R'P2R/q)⁻¹
 = (R'Q1R/p + R'P2R/q)⁻¹ = (R'(Q1/p + P2/q)R)⁻¹ = (R'P+Ω⁻¹P+R)⁻¹.

And

Cov((b+IV', d+IV')') = (R'P+R)⁻¹R'P+ΩP+R(R'P+R)⁻¹.

Now, to show that Cov((b+IV', d+IV')') - Cov((b_IV', d_IV')') is positive semidefinite, it is sufficient to show that R'P+Ω⁻¹P+R - R'P+R(R'P+ΩP+R)⁻¹R'P+R is positive semidefinite. But the latter expression can be written as

R'P+Ω⁻¹P+R - R'P+R(R'P+ΩP+R)⁻¹R'P+R
 = R'P+Ω^(-1/2)[I - Ω^(1/2)P+R(R'P+ΩP+R)⁻¹R'P+Ω^(1/2)]Ω^(-1/2)P+R
 = R'P+Ω^(-1/2)[I - D(D'D)⁻¹D']Ω^(-1/2)P+R,

where D = Ω^(1/2)P+R, which can be seen to be a quadratic form in an idempotent matrix; hence, our expression is positive semidefinite. Q.E.D.
3.7 Variance Estimation when the Random Effects are not Correlated with the Regressors

When discussing the generalized least squares estimator, we have implicitly assumed that the variance components σ_e², σ_u², and σ_v² were known. In practice this is not the case; the variance components are usually unknown and, therefore, must be estimated. When estimates of the variance components are used in place of the actual values, we have an example of feasible generalized least squares. Under mild regularity conditions, Fuller and Battese (1973) have shown that the feasible generalized least squares estimator is consistent and has the same asymptotic distribution as the generalized least squares estimator with known variance components. This result holds true for either large N or large T. Swamy and Arora (1972) caution that, for small samples, the feasible generalized least squares estimator could have larger variances than either the least squares estimator, if the variance components σ_u² and σ_v² are small, or the within estimator, if σ_u² and σ_v² are very large. Efficiency in the estimation of the variance components, and its subsequent effect on the efficiency of feasible generalized least squares, has been discussed by Amemiya (1971).

In the following discussion, we rewrite equations (3.4.12), (3.4.13), and (3.4.14) as

(3.7.1) Q1y = R1A1 + Q1s, where R1 = (Q1X), A1 = B, and rank(R1) = g;
(3.7.2) Q2y = R2A2 + Q2s, where R2 = (Q2X, Q2Z), A2 = (B', D')', and rank(R2) = g + k;
(3.7.3) Q3y = R3A3 + Q3s, where R3 = (Q3X, Q3W), A3 = (B', C')', and rank(R3) = g + h.

If feasible weighted least squares is to be implemented instead of the equivalent feasible generalized least squares procedure, the weights p, q, and r are the parameters we need to estimate. One approach to estimating these weights is to estimate p = σ_e² using residuals from equation (3.7.1), q = σ_e² + Tσ_u² using residuals from equation (3.7.2), and r = σ_e² + Nσ_v² using residuals from equation (3.7.3). The groundwork for such an approach is laid by Maddala (1971), Nerlove (1971), and Swamy and Arora (1972). We now proceed to show that estimators so defined are both unbiased and consistent.

We define the sum of squared residuals from equation (3.7.1) as

(3.7.4) SSE1 = (Q1y - R1a1)'(Q1y - R1a1)

where the residuals have been computed using the least squares estimates of the coefficients in equation (3.7.1), namely

(3.7.5) a1 = (R1'R1)⁻¹R1'y.

We also define the sum of squared residuals from equation (3.7.2) as

(3.7.6) SSE2 = (Q2y - R2a2)'(Q2y - R2a2)

where the least squares estimates of the coefficients in equation (3.7.2) are given as

(3.7.7) a2 = (R2'R2)⁻¹R2'y.

And we define the sum of squared residuals from equation (3.7.3) as

(3.7.8) SSE3 = (Q3y - R3a3)'(Q3y - R3a3)

where the least squares estimates of the coefficients in equation (3.7.3) are given as

(3.7.9) a3 = (R3'R3)⁻¹R3'y.
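Before stating the formal results, the three variance-component estimators can be sketched numerically as follows (an editorial illustration in Python/numpy; the data-generating process, the helper sse, and all names are our own hypothetical choices). The degrees-of-freedom corrections anticipate the theorem below.

    import numpy as np

    rng = np.random.default_rng(9)
    N, T = 80, 10
    p, su2, sv2 = 1.0, 0.5, 0.25
    g, k, h = 2, 1, 1
    Q4 = np.full((N * T, N * T), 1.0 / (N * T))
    Q2 = np.kron(np.eye(N), np.full((T, T), 1.0 / T)) - Q4
    Q3 = np.kron(np.full((N, N), 1.0 / N), np.eye(T)) - Q4
    Q1 = np.eye(N * T) - Q2 - Q3 - Q4

    X = rng.normal(size=(N * T, g))
    Z = np.repeat(rng.normal(size=(N, k)), T, axis=0)
    W = np.tile(rng.normal(size=(T, h)), (N, 1))
    s = (np.repeat(rng.normal(0, np.sqrt(su2), N), T)
         + np.tile(rng.normal(0, np.sqrt(sv2), T), N)
         + rng.normal(0, np.sqrt(p), N * T))
    y = X @ np.ones(g) + Z @ np.ones(k) + W @ np.ones(h) + s

    def sse(Qj, Rj):
        # least squares on the transformed equation, then sum of squared residuals
        aj = np.linalg.lstsq(Rj, Qj @ y, rcond=None)[0]
        res = Qj @ y - Rj @ aj
        return res @ res

    s1 = sse(Q1, Q1 @ X) / ((N - 1) * (T - 1) - g)                  # estimates p
    s2 = sse(Q2, np.hstack([Q2 @ X, Q2 @ Z])) / ((N - 1) - g - k)   # estimates q
    s3 = sse(Q3, np.hstack([Q3 @ X, Q3 @ W])) / ((T - 1) - g - h)   # estimates r
    print(s1, s2, s3, "targets:", p, p + T * su2, p + N * sv2)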
First we write the residual vector from equation (3.7.1) as

Residual1 = Q1y − P1Q1y = R1A1 + Q1s − P1R1A1 − P1Q1s = R1A1 − R1A1 + Q1s − P1s = (Q1 − P1)s.

We then form

SSE1 = (Q1y − Q1P1y)'(Q1y − Q1P1y) = s'(Q1 − P1Q1 − Q1P1 + P1Q1P1)s = s'(Q1 − P1)s.

Taking the expectation of SSE1,

E{SSE1} = E{s'(Q1 − P1)s}
= E{trace[(Q1 − P1)ss']}    [since trace(AB) = trace(BA) when both products are defined and square]
= trace[(Q1 − P1)E{ss'}]
= trace[(Q1 − P1)(pQ1 + qQ2 + rQ3 + kQ4)]
= p·trace(Q1 − P1)
= p·rank(Q1 − P1)    [since trace(A) = rank(A) when A is idempotent]
= p{rank(Q1) − rank(R1)} = p{(N−1)(T−1) − g}.

Thus E{s1²} = p.

Now let P2 = P[R2] = R2(R2'R2)^{-1}R2' be the projection onto the column space of the regressors in equation (3.7.2); P2 satisfies Q2P2 = P2Q2 = P2, P2R2 = R2, P2' = P2, and P2 is orthogonal to Q1, Q3, and Q4. Exactly the same algebra gives Residual2 = (Q2 − P2)s, SSE2 = s'(Q2 − P2)s, and

E{SSE2} = trace[(Q2 − P2)(pQ1 + qQ2 + rQ3 + kQ4)] = q{rank(Q2) − rank(R2)} = q{(N−1) − g − k},

so that E{s2²} = q.

Finally, let P3 = P[R3] = R3(R3'R3)^{-1}R3', which satisfies Q3P3 = P3Q3 = P3, P3R3 = R3, P3' = P3, and is orthogonal to Q1, Q2, and Q4. Then Residual3 = (Q3 − P3)s, SSE3 = s'(Q3 − P3)s, and

E{SSE3} = trace[(Q3 − P3)(pQ1 + qQ2 + rQ3 + kQ4)] = r{rank(Q3) − rank(R3)} = r{(T−1) − g − h},

so that E{s3²} = r. Q.E.D.
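As an editorial illustration of Theorem (3.9), the following sketch simulates a two-way panel and computes s1², s2², and s3² with the degrees-of-freedom corrections above. All data-generating choices (unit coefficients, the particular sizes) are assumptions of the sketch, not of the text.

    import numpy as np

    rng = np.random.default_rng(1)
    N, T, g, k, h = 50, 6, 2, 1, 1
    iN, iT = np.ones((N, 1)), np.ones((T, 1))
    JN, JT = iN @ iN.T / N, iT @ iT.T / T
    # observations ordered individual-major: kron(individual part, time part)
    Q2 = np.kron(np.eye(N) - JN, JT)          # between-individual variation
    Q3 = np.kron(JN, np.eye(T) - JT)          # between-time variation
    Q4 = np.kron(JN, JT)                      # overall mean
    Q1 = np.eye(N * T) - Q2 - Q3 - Q4         # two-way within

    p_true, su2, sv2 = 1.0, 0.5, 0.8          # sigma_e^2, sigma_u^2, sigma_v^2
    X = rng.normal(size=(N * T, g))
    Z = np.kron(rng.normal(size=(N, k)), iT)  # time-invariant regressor
    W = np.kron(iN, rng.normal(size=(T, h)))  # individual-invariant regressor
    u = np.kron(rng.normal(size=(N, 1)) * np.sqrt(su2), iT)
    v = np.kron(iN, rng.normal(size=(T, 1)) * np.sqrt(sv2))
    e = rng.normal(size=(N * T, 1)) * np.sqrt(p_true)
    y = X.sum(axis=1, keepdims=True) + Z + W + u + v + e   # unit coefficients

    def sse(Qy, Rm):
        # sum of squared least squares residuals from the projected regression
        a, *_ = np.linalg.lstsq(Rm, Qy, rcond=None)
        r = Qy - Rm @ a
        return float(r.T @ r)

    s1 = sse(Q1 @ y, Q1 @ X) / ((N - 1) * (T - 1) - g)                  # estimates p
    s2 = sse(Q2 @ y, np.hstack([Q2 @ X, Q2 @ Z])) / ((N - 1) - g - k)   # estimates q
    s3 = sse(Q3 @ y, np.hstack([Q3 @ X, Q3 @ W])) / ((T - 1) - g - h)   # estimates r
    print(s1, s2, s3)   # roughly p, p + T*su2, p + N*sv2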
Theorem (3.10): Let s1², s2², and s3² be defined as in Theorem (3.9). Then (a) s1² is a consistent estimator of p as N or T → ∞; (b) s2² is a consistent estimator of q = σ_e² + Tσ_u² as N → ∞; and (c) s3² is a consistent estimator of r = σ_e² + Nσ_v² as T → ∞.

Proof: (a) plim s1² = plim SSE1/{rank(Q1) − rank(R1)} = plim SSE1/(N−1)(T−1) = plim s'Q1s/(N−1)(T−1) − plim s'P1s/(N−1)(T−1). The last term is zero, since

s'P1s/(N−1)(T−1) = [s'R1/(N−1)(T−1)][R1'R1/(N−1)(T−1)]^{-1}[R1's/(N−1)(T−1)]

and R1's/(N−1)(T−1) → 0 as (N−1)(T−1) → ∞ (as either N → ∞ or T → ∞). The first term equals σ_e², because s'Q1s can be shown to be distributed as σ_e²·χ²_{(N−1)(T−1)}, using standard results (e.g., Rao (1973, p. 185)) on the distribution of idempotent quadratic forms in normals.

(b) plim s2² = plim SSE2/{rank(Q2) − rank(R2)} = plim SSE2/(N−1) = plim s'Q2s/(N−1) − plim s'P2s/(N−1). The last term is zero, since s'P2s/(N−1) = [s'R2/(N−1)][R2'R2/(N−1)]^{-1}[R2's/(N−1)] and R2's/(N−1) → 0 as N → ∞. The first term equals q = σ_e² + Tσ_u², because s'Q2s can be shown to be distributed as q·χ²_{N−1}, by the same standard results.

(c) plim s3² = plim SSE3/{rank(Q3) − rank(R3)} = plim SSE3/(T−1) = plim s'Q3s/(T−1) − plim s'P3s/(T−1). The last term is zero, since s'P3s/(T−1) = [s'R3/(T−1)][R3'R3/(T−1)]^{-1}[R3's/(T−1)] and R3's/(T−1) → 0 as T → ∞. The first term equals r = σ_e² + Nσ_v², because s'Q3s can be shown to be distributed as r·χ²_{T−1}. Q.E.D.

3.8 Variance Estimation when the Random Effects are Correlated with the Regressors

So far we have considered variance estimation for the feasible weighted least squares estimator only. We now turn our attention to the model of section 3.5, in which some of the regressors are correlated with the random effects. Once again we need to estimate the weights p, q, and r, since they are needed to implement the weighted instrumental variables estimator. The estimate of p based on the within residuals, discussed in section 3.7, is still consistent in this model. However, the estimates of q = σ_e² + Tσ_u² and r = σ_e² + Nσ_v² discussed in section 3.7 are not consistent, since they were based on the residuals from least squares applied to (3.5.3) and (3.5.4), and these least squares estimators are inconsistent when regressors are correlated with either equation's error term.

We therefore turn our attention to the problem of finding consistent estimates of B, D, and C. Then, using these consistent estimates of A2 = (B', D')' and A3 = (B', C')', we derive consistent estimates of q and r. The background for this approach is the work of Hausman and Taylor (1981), who suggest the estimate of q discussed in section 2.7; however, they give no rigorous proof that it is consistent, nor do they discuss the estimation of r.

The following assumptions will be made.

Assumption (3.11): Let H2 = [Q2X1, Q2X3, Z1] and H3 = [Q3X1, Q3X2, W1]. Then we assume that
(i) plim X'Q1e/(N−1)(T−1) = 0 as either N → ∞ or T → ∞;
(ii.a) plim H2'Q2(u + e)/N = 0 as N → ∞;
(ii.b) plim H3'Q3(v + e)/T = 0 as T → ∞;
(iii) plim X'Q1X/(N−1)(T−1) is finite and nonsingular as either N → ∞ or T → ∞;
(iv.a) plim H2'Z1/N is finite as N → ∞;
(iv.b) plim H3'W1/T is finite as T → ∞;
(v.a) plim H2'X/N is finite as N → ∞;
(v.b) plim H3'X/T is finite as T → ∞.

Even after the introduction of X2, X3, X4, Z2, and W2 (regressors assumed correlated with the effects), the within estimator is still a consistent estimator of B, since no correlation exists between the disturbance and the regressors in equation (3.5.3). So the problem of finding a consistent estimator of A is reduced to finding consistent estimators of D and C. The two regression equations introduced in the following lemma will be used in deriving such estimators.
Lemma (3.12): Let f2* = Q2(y − Xb_W) and f3* = Q3(y − Xb_W). Then

(3.8.1) f2* = ZD + (Q2 − Q2X(X'Q1X)^{-1}X'Q1)s

and

(3.8.2) f3* = WC + (Q3 − Q3X(X'Q1X)^{-1}X'Q1)s.

Proof: Using b_W = (X'Q1X)^{-1}X'Q1y,

f2* = Q2(y − Xb_W) = Q2y − Q2X(X'Q1X)^{-1}X'Q1y
= Q2(XB + ZD + WC + s) − Q2X(X'Q1X)^{-1}X'Q1(XB + ZD + WC + s)
= Q2(XB + ZD + WC + s) − Q2X(X'Q1X)^{-1}X'(Q1XB + Q1s)    [since Q1Z = 0 and Q1W = 0]
= Q2XB + Q2ZD + Q2s − Q2XB − Q2X(X'Q1X)^{-1}X'Q1s    [since Q2W = 0]
= ZD + (Q2 − Q2X(X'Q1X)^{-1}X'Q1)s,    [since Q2Z = Z]

and, by the same algebra with Q3 in place of Q2 (using Q3Z = 0 and Q3W = W),

f3* = Q3(y − Xb_W) = WC + (Q3 − Q3X(X'Q1X)^{-1}X'Q1)s. Q.E.D.

Since part of Z is correlated with the error term, least squares applied to equation (3.8.1) does not yield a consistent estimator of D. Likewise, since part of W is correlated with the error term, least squares applied to equation (3.8.2) does not yield a consistent estimator of C. But, using H2 = (Q2X1, Q2X3, Z1) as a set of instruments, the instrumental variable estimator of D from equation (3.8.1) is defined as

(3.8.3) d_IV = (Z'P[H2]Z)^{-1}Z'P[H2]f2*.

Similarly, using H3 = (Q3X1, Q3X2, W1) as a set of instruments, the instrumental variable estimator of C from equation (3.8.2) is defined as

(3.8.4) c_IV = (W'P[H3]W)^{-1}W'P[H3]f3*.

It is interesting to note that using f3** = (y − Xb_W) instead of f3* = Q3(y − Xb_W) would not increase the efficiency of the estimator c_IV. Indeed, since P[H3]Q3 = Q3P[H3] = P[H3], W1'Q1 = 0, and the first-order conditions (i.e., the "normal equations") defining b_W imply that (X1'Q1, X2'Q1)(y − Xb_W) = 0, we have

(3.8.5) H3'Q3(y − Xb_W) = H3'(y − Xb_W);

thus the same estimator would result if we used f3** in place of f3*.

Given the estimators d_IV and c_IV, the next question is whether these estimators are indeed consistent estimates of D and C. But first we consider the conditions necessary to ensure that both d_IV and c_IV exist.

3.8.1 Necessary Conditions for the Existence of d_IV and c_IV

A necessary condition for the existence of d_IV is that the rank of H2 be at least as large as the rank of Z; that is, there must be at least as many instruments as regressors. This requires g1 + g3 + k1 ≥ k, or g1 + g3 ≥ k2. Intuitively, Q2X1 and Q2X3 are serving as instruments for Z2, and so there must be at least as many variables in X1 and X3 as in Z2. Similarly, a necessary condition for the existence of c_IV is that the rank of H3 be at least as large as the rank of W; this requires g1 + g2 + h1 ≥ h, or g1 + g2 ≥ h2. Here, Q3X1 and Q3X2 are serving as instruments for W2, and so there must be at least as many variables in X1 and X2 as in W2.

The fact that f2* and f3* are calculated from the within-groups residuals suggests that if b_W is not fully efficient, then d_IV and c_IV may not be fully efficient either.

3.8.2 Consistency of d_IV and c_IV

Lemma (3.13): Given Assumption (3.11),
(1.a) plim Z'P2Q2(u + e)/N = 0 as N → ∞;
(1.b) plim W'P3Q3(v + e)/T = 0 as T → ∞;
(2.a) plim Z'P2Z/N is finite and nonsingular as N → ∞;
(2.b) plim W'P3W/T is finite and nonsingular as T → ∞;
(3.a) plim Z'P2X/N is finite as N → ∞;
(3.b) plim W'P3X/T is finite as T → ∞.

Lemma (3.13) is easily proved by noting that P2 = P[H2] = H2(H2'H2)^{-1}H2', where H2 = (Q2X1, Q2X3, Z1), and that P3 = P[H3] = H3(H3'H3)^{-1}H3', where H3 = (Q3X1, Q3X2, W1).
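Before turning to consistency, a computational sketch of (3.8.3) and (3.8.4), for illustration only: the caller supplies the projections and the instrument matrices H2 and H3 exactly as defined in Assumption (3.11) (for example H2 = np.hstack([Q2 @ X1, Q2 @ X3, Z1])); the function names are hypothetical.

    import numpy as np

    def proj(A):
        # P[A] = A(A'A)^{-1}A'
        return A @ np.linalg.solve(A.T @ A, A.T)

    def within_b(y, X, Q1):
        # within estimator b_W = (X'Q1X)^{-1}X'Q1y
        return np.linalg.solve(X.T @ Q1 @ X, X.T @ Q1 @ y)

    def d_iv(y, X, Z, Q1, Q2, H2):
        # d_IV = (Z'P[H2]Z)^{-1} Z'P[H2] f2*, with f2* = Q2(y - X b_W)
        f2 = Q2 @ (y - X @ within_b(y, X, Q1))
        P2 = proj(H2)
        return np.linalg.solve(Z.T @ P2 @ Z, Z.T @ P2 @ f2)

    def c_iv(y, X, W, Q1, Q3, H3):
        # c_IV = (W'P[H3]W)^{-1} W'P[H3] f3*, with f3* = Q3(y - X b_W)
        f3 = Q3 @ (y - X @ within_b(y, X, Q1))
        P3 = proj(H3)
        return np.linalg.solve(W.T @ P3 @ W, W.T @ P3 @ f3)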
Theorem (3.14): The instrumental variable estimator d_IV is a consistent estimator of D as N gets large, and the instrumental variable estimator c_IV is a consistent estimator of C as T gets large.

Proof: First, rewrite d_IV as

d_IV = (Z'P2Z)^{-1}Z'P2f2*
= (Z'P2Z)^{-1}Z'P2(ZD + (Q2 − Q2X(X'Q1X)^{-1}X'Q1)s)
= D + (Z'P2Z)^{-1}Z'P2Q2s − (Z'P2Z)^{-1}Z'P2Q2X(X'Q1X)^{-1}X'Q1s
= D + (Z'P2Z/N)^{-1}{Z'P2Q2(e + u)/N} − (Z'P2Z/N)^{-1}{Z'P2Q2X/N}(X'Q1X/N)^{-1}{X'Q1e/N},

using Q2s = Q2(e + u) and Q1s = Q1e. By Assumption (3.11), plim X'Q1e/N = 0 as N → ∞, and plim X'Q1X/N is finite and nonsingular as N → ∞. Using Lemma (3.13), plim Z'P2Q2(e + u)/N = 0, plim Z'P2Z/N is finite and nonsingular, and plim Z'P2Q2X/N is finite as N → ∞. Thus

plim d_IV = D + {finite}{0} − {finite}{finite}{finite}{0} = D.

Next, rewrite c_IV as

c_IV = (W'P3W)^{-1}W'P3f3*
= (W'P3W)^{-1}W'P3(WC + (Q3 − Q3X(X'Q1X)^{-1}X'Q1)s)
= C + (W'P3W/T)^{-1}{W'P3Q3(e + v)/T} − (W'P3W/T)^{-1}{W'P3Q3X/T}(X'Q1X/T)^{-1}{X'Q1e/T}.

By Assumption (3.11), plim X'Q1e/T = 0 as T → ∞, and plim X'Q1X/T is finite and nonsingular as T → ∞. Using Lemma (3.13), plim W'P3Q3(e + v)/T = 0, plim W'P3W/T is finite and nonsingular, and plim W'P3Q3X/T is finite as T → ∞. Thus

plim c_IV = C + {finite}{0} − {finite}{finite}{finite}{0} = C. Q.E.D.

3.8.3 Consistent Estimation of q and r

Using the estimators b_W and d_IV as a consistent estimate of A2 = (B', D')', we will now form a vector of residuals. We will then show that the sum of the squared terms of this residual vector, divided by N, is a consistent estimator of q = σ_e² + Tσ_u². Similarly, using b_W and c_IV as a consistent estimate of A3 = (B', C')', we will form a vector of residuals and show that the sum of the squared terms of that vector, divided by T, is a consistent estimator of r = σ_e² + Nσ_v². Define

(3.8.6) Residual2 = Q2y − Q2Xb_W − Q2Zd_IV

and

(3.8.7) Residual3 = Q3y − Q3Xb_W − Q3Wc_IV.

Lemma (3.15):

Residual2 = Q2(e + u) − Q2X(X'Q1X)^{-1}X'Q1e − Q2Z(Z'P2Z)^{-1}Z'P2Q2(e + u) + Q2Z(Z'P2Z)^{-1}Z'P2Q2X(X'Q1X)^{-1}X'Q1e,

Residual3 = Q3(e + v) − Q3X(X'Q1X)^{-1}X'Q1e − Q3W(W'P3W)^{-1}W'P3Q3(e + v) + Q3W(W'P3W)^{-1}W'P3Q3X(X'Q1X)^{-1}X'Q1e.

Proof: First,

Residual2 = Q2y − Q2Xb_W − Q2Zd_IV
= Q2y − Q2X(X'Q1X)^{-1}X'Q1y − Q2Z(Z'P2Z)^{-1}Z'P2f2*
= Q2{XB + ZD + WC + s} − Q2X(X'Q1X)^{-1}X'Q1{XB + ZD + WC + s} − Q2Z(Z'P2Z)^{-1}Z'P2{ZD + (Q2 − Q2X(X'Q1X)^{-1}X'Q1)(e + u)}
= Q2XB + ZD + Q2(e + u) − Q2XB − Q2X(X'Q1X)^{-1}X'Q1e − ZD − Q2Z(Z'P2Z)^{-1}Z'P2Q2(e + u) + Q2Z(Z'P2Z)^{-1}Z'P2Q2X(X'Q1X)^{-1}X'Q1e
= Q2(e + u) − Q2X(X'Q1X)^{-1}X'Q1e − Q2Z(Z'P2Z)^{-1}Z'P2Q2(e + u) + Q2Z(Z'P2Z)^{-1}Z'P2Q2X(X'Q1X)^{-1}X'Q1e.

By the same algebra, with Q3, W, c_IV, P3, and (e + v) in place of Q2, Z, d_IV, P2, and (e + u),

Residual3 = Q3(e + v) − Q3X(X'Q1X)^{-1}X'Q1e − Q3W(W'P3W)^{-1}W'P3Q3(e + v) + Q3W(W'P3W)^{-1}W'P3Q3X(X'Q1X)^{-1}X'Q1e. Q.E.D.
We now define consistent estimators for both q and r. Using the definitions found in Lemma (3.15), we define SSE2* as the sum of squared residual terms found in Residual2 and SSE3* as the sum of squared residual terms found in Residual3:

(3.8.8) SSE2* = (Residual2)'(Residual2)

(3.8.9) SSE3* = (Residual3)'(Residual3).

Our estimators for q and r are then SSE2*/N and SSE3*/T, respectively.

Theorem (3.16): plim SSE2*/N = σ_e² + Tσ_u² as N → ∞, and plim SSE3*/T = σ_e² + Nσ_v² as T → ∞.

Proof: Using Lemma (3.15), SSE2* can be written as

SSE2* = (e + u)'Q2(e + u)
− (e + u)'Q2X(X'Q1X)^{-1}X'Q1e
− (e + u)'Q2Z(Z'P2Z)^{-1}Z'P2Q2(e + u)
+ (e + u)'Q2Z(Z'P2Z)^{-1}Z'P2Q2X(X'Q1X)^{-1}X'Q1e
− e'Q1X(X'Q1X)^{-1}X'Q2(e + u)
+ e'Q1X(X'Q1X)^{-1}X'Q2X(X'Q1X)^{-1}X'Q1e
+ e'Q1X(X'Q1X)^{-1}X'Q2Z(Z'P2Z)^{-1}Z'P2Q2(e + u)
− e'Q1X(X'Q1X)^{-1}X'Q2Z(Z'P2Z)^{-1}Z'P2Q2X(X'Q1X)^{-1}X'Q1e
− (e + u)'Q2P2Z(Z'P2Z)^{-1}Z'Q2(e + u)
+ (e + u)'Q2P2Z(Z'P2Z)^{-1}Z'Q2X(X'Q1X)^{-1}X'Q1e
+ (e + u)'Q2P2Z(Z'P2Z)^{-1}Z'Q2Z(Z'P2Z)^{-1}Z'P2Q2(e + u)
− (e + u)'Q2P2Z(Z'P2Z)^{-1}Z'Q2Z(Z'P2Z)^{-1}Z'P2Q2X(X'Q1X)^{-1}X'Q1e
+ e'Q1X(X'Q1X)^{-1}X'Q2P2Z(Z'P2Z)^{-1}Z'Q2(e + u)
− e'Q1X(X'Q1X)^{-1}X'Q2P2Z(Z'P2Z)^{-1}Z'Q2X(X'Q1X)^{-1}X'Q1e
− e'Q1X(X'Q1X)^{-1}X'Q2P2Z(Z'P2Z)^{-1}Z'Q2Z(Z'P2Z)^{-1}Z'P2Q2(e + u)
+ e'Q1X(X'Q1X)^{-1}X'Q2P2Z(Z'P2Z)^{-1}Z'Q2Z(Z'P2Z)^{-1}Z'P2Q2X(X'Q1X)^{-1}X'Q1e.

Taking the probability limit of SSE2*/N as N gets large is therefore equivalent to taking the probability limit of the sum of sixteen terms. The first term has probability limit q, and the remaining fifteen each have probability limit zero, all limits being taken as N → ∞.

For the first term,

plim (e + u)'Q2(e + u)/N = plim e'Q2e/N + 2 plim e'Q2u/N + plim u'Q2u/N.

Consider these term by term. First, e'Q2e/N = T Σ_{i=1}^{N} ē_i·²/N (up to the overall-mean correction, which is asymptotically negligible); each ē_i·² has mean σ_e²/T, and the terms are independent, so e'Q2e/N → σ_e² as N → ∞. Second, u'Q2u/N = T Σ_{i=1}^{N} u_i²/N → Tσ_u² as N → ∞. Third, e'Q2u/N = T Σ_{i=1}^{N} ē_i·u_i/N → 0 as N → ∞, because e and u are uncorrelated. Therefore plim (e + u)'Q2(e + u)/N = σ_e² + Tσ_u² = q.

Each of the remaining fifteen terms, divided by N, factors into a product of probability limits. By Assumption (3.11) and Lemma (3.13), every factor is finite, and each product contains at least one factor equal to zero: plim X'Q1e/N = 0 (or its transpose plim e'Q1X/N = 0) in terms 2, 4 through 8, 10, and 12 through 16, and plim Z'P2Q2(e + u)/N = 0 (or its transpose plim (e + u)'Q2P2Z/N = 0) in terms 3, 9, and 11. For example, the second term gives

plim (e + u)'Q2X(X'Q1X)^{-1}X'Q1e/N = plim {(e + u)'Q2X/N} · plim (X'Q1X/N)^{-1} · plim {X'Q1e/N} = {finite}{finite}{0} = 0,

and terms 3 through 16 are evaluated in exactly the same way. Therefore plim SSE2*/N = σ_e² + Tσ_u² = q.
Next, SSE3* = (Residual3)'(Residual3) expands, by Lemma (3.15), into the sixteen exactly analogous terms, with (e + v), Q3, W, P3, and division by T in place of (e + u), Q2, Z, P2, and division by N. For the first term,

plim (e + v)'Q3(e + v)/T = plim e'Q3e/T + 2 plim e'Q3v/T + plim v'Q3v/T.

First, e'Q3e/T = N Σ_{t=1}^{T} ē·t²/T (again up to the asymptotically negligible overall-mean correction); each ē·t² has mean σ_e²/N, and the terms are independent, so e'Q3e/T → σ_e² as T → ∞. Second, v'Q3v/T = N Σ_{t=1}^{T} v_t²/T → Nσ_v² as T → ∞. Third, e'Q3v/T = N Σ_{t=1}^{T} ē·t v_t/T → 0 as T → ∞, because e and v are uncorrelated. Therefore plim (e + v)'Q3(e + v)/T = σ_e² + Nσ_v² = r.

Each of the remaining fifteen terms, divided by T, again factors into a product of probability limits; by Assumption (3.11) and Lemma (3.13) every factor is finite, and each product contains at least one zero factor, either plim X'Q1e/T = 0 (or its transpose plim e'Q1X/T = 0) or plim W'P3Q3(e + v)/T = 0 (or its transpose plim (e + v)'Q3P3W/T = 0). For example,

plim (e + v)'Q3X(X'Q1X)^{-1}X'Q1e/T = plim {(e + v)'Q3X/T} · plim (X'Q1X/T)^{-1} · plim {X'Q1e/T} = {finite}{finite}{0} = 0,

and the other terms are evaluated in exactly the same way. Hence plim SSE3*/T = σ_e² + Nσ_v² = r. Q.E.D.
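For completeness, a sketch of the resulting feasible estimates: given b_W, d_IV, and c_IV (for example from the functions sketched earlier), q and r are estimated by SSE2*/N and SSE3*/T as in Theorem (3.16). The function names are illustrative only.

    import numpy as np

    def q_hat(y, X, Z, b_w, d_iv_est, Q2, N):
        # Residual2 = Q2(y - X b_W - Z d_IV);  q_hat = SSE2*/N
        r2 = Q2 @ (y - X @ b_w - Z @ d_iv_est)
        return float(r2.T @ r2) / N

    def r_hat(y, X, W, b_w, c_iv_est, Q3, T):
        # Residual3 = Q3(y - X b_W - W c_IV);  r_hat = SSE3*/T
        r3 = Q3 @ (y - X @ b_w - W @ c_iv_est)
        return float(r3.T @ r3) / T

Combined with the within-based estimate s1² of p from section 3.7, these estimates yield (SSE2*/N − s1²)/T as an estimate of σ_u² and (SSE3*/T − s1²)/N as an estimate of σ_v².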
3.9 Conclusions

In this chapter, we have considered a linear regression model which contains unobservable time effects as well as individual effects. Given panel data, this model may be estimated in a variety of ways, depending on what is assumed about the correlation between the regressors and the effects. We have given a survey of the literature; we have introduced Hausman-Taylor-like estimators for the coefficients of the linear regression when the effects are assumed to be random and correlated with some of the regressors; and we have introduced estimators for the variances of the different error components. We have also introduced estimators for the above model that are consistent as N → ∞ for fixed T. These estimators may be useful because a common problem with panel data is that N is large but T is small. In the next chapter, we consider the linear simultaneous equations model with effects.

CHAPTER 4

Simultaneous Equations with Effects

4.1 Introduction

In this chapter, we consider a linear simultaneous equations model with individual effects. Within this context we investigate the problem of simultaneity, defined as the case in which some of the explanatory variables are correlated with the noise component of the error. We assume that for each of the M structural equations the data again consist of T time-series observations on each of N individuals; we distinguish regressors which vary over time and individuals from those which vary over individuals but are time-invariant; and we assume the presence of unobservable, time-invariant individual effects as well as the usual statistical noise. We will refer to a variable as endogenous if it is correlated with the noise and exogenous if it is uncorrelated with the noise.
We write the model to be considered in this chapter as a set of M simultaneous equations:

(4.1.1) y_itg = Y_itg D_g + X_itg B_g + Z_ig C_g + u_ig + e_itg,  i = 1,...,N; t = 1,...,T; g = 1,...,M,

where there are M equations determining the M endogenous variables y_it1, ..., y_itM; Y_itg is a vector (of dimension 1 × H_g) of endogenous explanatory variables; X_itg is a vector (of dimension 1 × G_g) of exogenous variables which vary both over time and individuals; Z_ig is a vector (of dimension 1 × K_g) of time-invariant exogenous variables; and D_g, B_g, and C_g are vectors to be estimated. The individual effects u_ig are unobservable and will be treated as time-invariant.

Writing each of the M simultaneous equations in matrix form, we have

(4.1.2) y_g = Y_g D_g + X_g B_g + Z_g C_g + u_g + e_g,

where y_g, u_g, and e_g denote (NT × 1) vectors; Y_g denotes the (NT × H_g) matrix of endogenous variables; and X_g and Z_g denote the (NT × G_g) and (NT × K_g) matrices of exogenous variables, respectively. Again following the convention of Hausman and Taylor, the observations are ordered first by individuals and then by time, so that u_g and each column of Z_g are (NT × 1) vectors consisting of T blocks, with each block containing the same N entries.

Rewrite equation (4.1.2) as

(4.1.3) y_g = R_g A_g + u_g + e_g,

where R_g = [Y_g, X_g, Z_g] and A_g = (D_g', B_g', C_g')'. Now consider the set of all M equations

(4.1.4) y† = R†A† + s†,

where y† = (y_1', ..., y_M')', s† = (s_1', ..., s_M')', A† = (A_1', ..., A_M')', and R† = diag(R_1, ..., R_M) is block diagonal.

We make the usual assumptions about the error terms. That is, we assume

(4.1.5) u_i· is iid N(0, Σ_u), and

(4.1.6) e_it· is iid N(0, Σ_e),

where Σ_u and Σ_e are both (M × M) positive definite matrices. In addition, we assume the e's are uncorrelated with both the u's and with the (exogenous) X's and Z's.

For a single equation, say the first equation, the covariance structure is

(4.1.7) S11 ≡ Cov(u1 + e1) = Σ_e,11 I_NT + Σ_u,11 (TP) = Σ_e,11 Q + (Σ_e,11 + TΣ_u,11)P = σ1²Q + σ2²P,

where Q and P are the same two idempotent matrices given in chapter 2, σ1² = Σ_e,11, and σ2² = Σ_e,11 + TΣ_u,11; and so

(4.1.8) S11^{-1} = (1/σ1²)Q + (1/σ2²)P

and

(4.1.9) S11^{-1/2} = (1/σ1)Q + (1/σ2)P.

And for the system, the covariance structure is

(4.1.10) S = Cov(u† + e†) = (Σ_e ⊗ I_NT) + (Σ_u ⊗ (TP)) = (Σ1 ⊗ Q) + (Σ2 ⊗ P),

where Σ1 = Σ_e and Σ2 = Σ_e + TΣ_u.

Throughout this chapter we consider a natural extension of the Hausman and Taylor model to a linear simultaneous equations model with random effects, by allowing some of the explanatory variables to be correlated with the individual effects. The plan of this chapter is as follows. In section 4.2 we consider the estimation of the coefficients of a single linear equation from a simultaneous equations model. In section 4.3 we consider the estimation of the coefficients of a system of simultaneous equations. An interesting problem arises for the linear simultaneous equations model with random effects when some of the explanatory variables are correlated with the individual random effects: the instruments need not be the same for every equation. This is the topic discussed in section 4.4. We summarize our results in section 4.5.

This chapter applies the Hausman and Taylor method of instrumental variables estimation to the simultaneous equations panel data model, derives the resulting estimators, and discusses their relative efficiency. In addition, it provides a survey of the current literature on simultaneous equations with effects and translates those estimators into the notation of this thesis.
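A small numerical illustration of the covariance structure (4.1.7)-(4.1.9), assuming the block ordering described above; the sizes are arbitrary and for illustration only.

    import numpy as np

    N, T = 4, 3                                    # illustrative sizes
    s1, s2 = 1.0, 2.0                              # sigma_1 and sigma_2 (standard deviations)
    P = np.kron(np.ones((T, T)) / T, np.eye(N))    # projection onto individual means
    Q = np.eye(N * T) - P

    S11 = s1**2 * Q + s2**2 * P                    # (4.1.7)
    S11_inv_half = (1 / s1) * Q + (1 / s2) * P     # (4.1.9): QP = 0 and both are idempotent
    # check that the transformed error covariance is scalar:
    assert np.allclose(S11_inv_half @ S11 @ S11_inv_half, np.eye(N * T))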
4.2 Estimation of a Single Equation

Let us now turn to the problem of estimating the coefficients of a single equation, say the first equation. That is, we wish to estimate the equation

(4.2.1) y1 = R1A1 + (u1 + e1).

This is a generalization of the estimation problem considered in chapter 2, in the sense that, in addition to the "inside" instruments (i.e., instruments from within the equation itself), we now have available instruments from "outside" the equation.

We now need to introduce some notation, but first we must agree on the types of explanatory variables permitted. Amemiya and MaCurdy (1986) have considered a simultaneous equations model with random effects correlated with the endogenous variables, but in a somewhat non-standard way. The basic point of view in this thesis is that all variables correlated with the noise should also be correlated with the individual effects, but not conversely. That is, only exogenous variables can be uncorrelated with the individual effects. This point of view can be justified by consideration of a system in which every structural equation contains unobserved individual effects. By standard algebra such a system would imply a reduced form in which each reduced form equation has an individual effect which is a linear combination of the individual effects in the structural equations. It therefore follows that every endogenous variable will be correlated with the individual effect in every equation, just as it is correlated with every structural error term. Thus all endogenous variables must be correlated with the effects.

On the other hand, if we follow a natural extension of the point of view in Hausman and Taylor, there are two possible kinds of exogenous variables: those uncorrelated with and those possibly correlated with the individual effects. That is, if we let X and Z represent the matrices of all time-varying and time-invariant exogenous variables, respectively, we can write X and Z as

(4.2.2) X = [X(1), X(2)]

(4.2.3) Z = [Z(1), Z(2)],

where X(1) and Z(1) represent the doubly exogenous variables, meaning variables uncorrelated with the individual effects as well as the noise, and X(2) and Z(2) represent the singly exogenous variables, meaning variables uncorrelated with the noise but possibly correlated with the individual effects.

It is important to note that X(1) is not the same as X1. X1 is the matrix of time-varying exogenous variables that appear in the first equation, and since X1 may consist of doubly as well as singly exogenous variables, it may have elements in both X(1) and X(2). On the other hand, X contains both the doubly and singly exogenous variables from every equation, not just the first, so both X(1) and X(2) may contain elements not in X1. A similar relationship holds between Z1, Z(1), Z(2), and Z.

It is an important observation, which will be used later, that each instrument set considered in this chapter is of the form

(4.2.4) H = [QX, PE],

where the set E will vary. Given this form we can evaluate P[H] using the following lemma.

Lemma (4.1): P[H] = P[QX] + P[PE].

Proof: Since Q and P are idempotent and QP = 0, the off-diagonal blocks of H'H vanish, so

P[H] = (QX, PE)[X'QX, 0; 0, E'PE]^{-1}(QX, PE)' = QX(X'QX)^{-1}X'Q + PE(E'PE)^{-1}E'P = P[QX] + P[PE]. Q.E.D.
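Lemma (4.1) is easily verified numerically; the following sketch (with arbitrary, illustrative sizes and matrices) checks that the projection splits as claimed.

    import numpy as np

    rng = np.random.default_rng(2)
    N, T = 4, 3
    P = np.kron(np.ones((T, T)) / T, np.eye(N))
    Q = np.eye(N * T) - P
    X = rng.normal(size=(N * T, 3))
    E = rng.normal(size=(N * T, 2))

    def proj(A):
        return A @ np.linalg.solve(A.T @ A, A.T)

    H = np.hstack([Q @ X, P @ E])
    # the off-diagonal blocks of H'H vanish because QP = 0, so the projection splits
    assert np.allclose(proj(H), proj(Q @ X) + proj(P @ E))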
The obvious generalization of the analysis of Hausman and Taylor would be to choose E = [X(1), Z(1)], so that the instrument set is

(4.2.5) H = [QX, X(1), Z(1)].

But we could also consider E = [X*(1), Z(1)], which is essentially the instrument set suggested by Amemiya and MaCurdy. As explained by Breusch, Mizon, and Schmidt (1987), the matrix X*(1) displays each variable separately for t = 1, 2, ..., T. That is, for any panel data matrix S with typical row s_it (of dimension 1 × L), the corresponding matrix S* (of dimension NT × LT) is defined by

(4.2.6) typical row of S* = (s_i1, s_i2, ..., s_iT), the same for every t,

so that each column of S appears in S* once for each time period. This leads to the instrument set

(4.2.7) H_AM = [QX, X*(1), Z(1)].

A third possibility is E = [X*(1), Z(1), (QX(2))*], which implies the instrument set

(4.2.8) H_BMS = [QX, X*(1), Z(1), (QX(2))*]

suggested by BMS. For our purposes the list of instruments given in (4.2.5) will suffice, since the algebra in the other cases is the same.

(4.2.1) Two-Stage Least Squares

We derive the two-stage least squares estimator as follows. First, we multiply equation (4.2.1) by S11^{-1/2} to transform the error to a scalar covariance matrix. The transformed equation is simply

(4.2.9) S11^{-1/2}y1 = S11^{-1/2}R1A1 + S11^{-1/2}(u1 + e1).

We then follow the path of Hausman and Taylor, estimating (4.2.9) by IV with instrument set H. This yields the following definition.

Definition (4.2): The two-stage least squares (2SLS) estimator of A1 from equation (4.2.1) is the instrumental variables estimator of equation (4.2.9), using the instrument set H. Explicitly,

(4.2.10) a1,2SLS = [R1'S11^{-1/2}P[H]S11^{-1/2}R1]^{-1} R1'S11^{-1/2}P[H]S11^{-1/2}y1.

It is an interesting detail that although we have transformed equation (4.2.1), we have used the untransformed instruments H. Following White (1984, section IV.3), the optimal IV estimator is derived by transforming the equation to be estimated so that its error covariance is scalar (as we have done), and then using whatever instruments are optimal. Thus, in general, the question of whether H or S11^{-1/2}H is preferable depends on which instrument set better explains the endogenous variables contained in S11^{-1/2}R1. As Breusch, Mizon, and Schmidt point out, however, in the present context transforming the instruments by S11^{-1/2} makes no difference: either instrument set leads to the same estimator. This is implied by the following lemma.

Lemma (4.3): Given H = [QX, PE] as defined in (4.2.4) and S11^{-1/2} as defined in (4.1.9),

(4.2.11) P[H] = P[S11^{-1/2}H].

Proof:

P[S11^{-1/2}H] = P[(1/σ1)QX, (1/σ2)PE]
= P[(1/σ1)QX] + P[(1/σ2)PE]
= {(1/σ1)QX}{(1/σ1²)X'QX}^{-1}{(1/σ1)QX}' + {(1/σ2)PE}{(1/σ2²)E'PE}^{-1}{(1/σ2)PE}'
= QX(X'QX)^{-1}X'Q + PE(E'PE)^{-1}E'P
= P[QX] + P[PE] = P[H]. Q.E.D.

(4.2.2) Interpretation as an Instrumental Variables Estimator

Following Hausman, Newey, and Taylor (1987), we consider an interpretation of the 2SLS estimator implied by the instrument-residual orthogonality condition plim f1/NT = 0, where

(4.2.12) f1 = H'S11^{-1/2}(y1 − R1A1).

Now the covariance structure of f1 is

(4.2.13) Cov(f1) ≡ C1 = H'S11^{-1/2}S11S11^{-1/2}H = H'H.

The instrumental variables estimator (also known as the "generalized method of moments" estimator) is then the solution to the problem of minimizing, with respect to A1, the quadratic distance of f1 from zero:

(4.2.14) f1'C1^{-1}f1 = (y1 − R1A1)'W1(y1 − R1A1),

where

(4.2.15) W1 = S11^{-1/2}H(H'H)^{-1}H'S11^{-1/2} = S11^{-1/2}P[H]S11^{-1/2}

is a quadratic form. The solution can be written as

(4.2.16) a1,IV = [R1'W1R1]^{-1}R1'W1y1 = [R1'S11^{-1/2}P[H]S11^{-1/2}R1]^{-1} R1'S11^{-1/2}P[H]S11^{-1/2}y1.

It can readily be seen that a1,IV is equal to the 2SLS estimator of A1 given in (4.2.10).
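A sketch of Definition (4.2) as a computation: given R1, y1, the instrument matrix H, and S11^{-1/2} from (4.1.9), the 2SLS estimator is IV applied to the transformed equation with untransformed instruments (by Lemma (4.3), transforming the instruments as well would change nothing). Function and argument names are illustrative only.

    import numpy as np

    def proj(A):
        return A @ np.linalg.solve(A.T @ A, A.T)

    def two_sls(y1, R1, H, S11_inv_half):
        # a1,2SLS = [R1' S^{-1/2} P[H] S^{-1/2} R1]^{-1} R1' S^{-1/2} P[H] S^{-1/2} y1
        Rt = S11_inv_half @ R1            # transformed regressors
        yt = S11_inv_half @ y1            # transformed dependent variable
        PH = proj(H)                      # untransformed instruments
        return np.linalg.solve(Rt.T @ PH @ Rt, Rt.T @ PH @ yt)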
It is an interesting result that S11^{-1/2} in the orthogonality condition (4.2.12) is superfluous. To see this, consider the simpler orthogonality condition plim f2/NT = 0, where

(4.2.17) f2 = H'(y1 − R1A1).

Noting that

(4.2.18) Cov(f2) ≡ C2 = H'S11H,

the problem of minimizing with respect to A1 the quadratic distance of f2 from zero,

(4.2.19) f2'C2^{-1}f2 = (y1 − R1A1)'W2(y1 − R1A1),

where W2 = H(H'S11H)^{-1}H' is a quadratic form, yields the solution

(4.2.20) a2,IV = [R1'W2R1]^{-1}R1'W2y1.

Now we can write W2 as

(4.2.21) W2 = H(H'S11H)^{-1}H'
= (QX, PE){(QX, PE)'(σ1²Q + σ2²P)(QX, PE)}^{-1}(QX, PE)'
= (QX, PE)[σ1²X'QX, 0; 0, σ2²E'PE]^{-1}(QX, PE)'
= (1/σ1²)QX(X'QX)^{-1}X'Q + (1/σ2²)PE(E'PE)^{-1}E'P
= (1/σ1²)P[QX] + (1/σ2²)P[PE].

On the other hand, W1 given in (4.2.15) can be written as

(4.2.22) W1 = S11^{-1/2}H(H'H)^{-1}H'S11^{-1/2}
= [(1/σ1)Q + (1/σ2)P] P[H] [(1/σ1)Q + (1/σ2)P]
= (1/σ1²)P[QX] + (1/σ2²)P[PE].

Therefore W1 = W2 and the two estimators are the same. Substituting W1 from (4.2.22) into the 2SLS estimator given in (4.2.16), we can rewrite the estimator as

(4.2.23) a1,2SLS = [(1/σ1²)(QR1)'P[QX](QR1) + (1/σ2²)(PR1)'P[PE](PR1)]^{-1} × [(1/σ1²)(QR1)'P[QX](Qy1) + (1/σ2²)(PR1)'P[PE](Py1)].

The same line of proof would show that the same estimator results from the orthogonality conditions

(4.2.24) f3 = H'S11^{-1}(y1 − R1A1).

This is in any case obvious, because it corresponds to transforming both the equation and the instruments by S11^{-1/2}, which we have shown above to be the same as transforming only the equation and not the instruments.

(4.2.3) Baltagi's Error-Component Two-Stage Least Squares Estimator

Baltagi (1981) considers a simultaneous equations model with effects which, in addition to individual effects, contains time effects as well. In contrast to the model considered in this chapter, Baltagi's does not distinguish between doubly and singly exogenous variables; implicitly he assumes that only doubly exogenous variables exist among the explanatory variables. In Baltagi's notation the error-component two-stage least squares (EC2SLS) estimator can be written as

(4.2.25) a1,EC2SLS = {Σ_{h=1}^{3} (Z1^{(h)})'P[X^{(h)}]Z1^{(h)}/σ11^{(h)}}^{-1} {Σ_{h=1}^{3} (Z1^{(h)})'P[X^{(h)}]y1^{(h)}/σ11^{(h)}}.

On the other hand, the 2SLS estimator given in (4.2.23) can again be written as

(4.2.26) a1,2SLS = [(1/σ1²)(QR1)'P[QX](QR1) + (1/σ2²)(PR1)'P[PE](PR1)]^{-1} × [(1/σ1²)(QR1)'P[QX](Qy1) + (1/σ2²)(PR1)'P[PE](Py1)].

This is "essentially" Baltagi's estimator translated into our notation. We use the word "essentially" because we do not include time effects in our model. If we assume only individual effects, the translation is as follows: Baltagi's Z1 is our R1, his δ1 is our A1, his X is our (X, Z), his Z1^{(1)} is our PR1, his X^{(1)} is our PX, his σ11^{(1)} is our σ2², and both his σ11^{(2)} and σ11^{(3)} are our σ1². Since time effects are not present, the distinction between X^{(2)} and X^{(3)} is irrelevant, so in Baltagi's notation X^{(2)} + X^{(3)} is our QX; similarly, his Z1^{(2)} + Z1^{(3)} is equal to our QR1. Therefore Baltagi's EC2SLS estimator can be written in our notation as

(4.2.27) a1,EC2SLS = [(1/σ1²)(QR1)'P[QX](QR1) + (1/σ2²)(PR1)'P[P(X, Z)](PR1)]^{-1} × [(1/σ1²)(QR1)'P[QX](Qy1) + (1/σ2²)(PR1)'P[P(X, Z)](Py1)].

It is easily seen that this estimator is the same as a1,2SLS when E = (X, Z); that is, when there are no singly exogenous variables.
4.3 System Estimation

In section 4.2 we discussed "single-equation" methods of estimation, in the sense that the estimators there operated on each equation separately. This section discusses "systems" methods of estimation, which estimate all equations jointly. The motivation for considering joint estimation is, of course, that the joint estimates are generally more (asymptotically) efficient than the single-equation procedures.

Again, let us consider the set of all M equations

(4.3.1) y† = R†A† + s†,

where, as before, y† = (y1', ..., yM')', s† = (s1', ..., sM')', A† = (A1', ..., AM')', and R† = diag(R1, ..., RM). Note that the covariance matrix of s† is

(4.3.2) S = Cov(u† + e†) = (Σ_e ⊗ I_NT) + (Σ_u ⊗ (TP)) = (Σ1 ⊗ Q) + (Σ2 ⊗ P),

(4.3.3) S^{-1} = (Σ1^{-1} ⊗ Q) + (Σ2^{-1} ⊗ P),

(4.3.4) S^{-1/2} = (Σ1^{-1/2} ⊗ Q) + (Σ2^{-1/2} ⊗ P),

where Σ1 = Σ_e and Σ2 = Σ_e + TΣ_u; Q and P are, again, the two idempotent matrices used before.

Recall that X and Z represent the matrices of all time-varying and time-invariant exogenous variables, respectively, and that we can write X and Z as

(4.3.5) X = [X(1), X(2)]

(4.3.6) Z = [Z(1), Z(2)],

where X(1) and Z(1) represent the doubly exogenous variables, meaning uncorrelated with the individual effects as well as the noise, and X(2) and Z(2) represent the singly exogenous variables, meaning uncorrelated with the noise but possibly correlated with the individual effects. Note that the decompositions in equations (4.3.5) and (4.3.6) are made without reference to a particular equation. This is because we are assuming that we have the same instruments in every equation: if a variable is doubly exogenous in one equation, then it is doubly exogenous in every equation, and likewise, if a variable is singly exogenous in one equation, then it is singly exogenous in every equation. We will consider the more complicated case, in which the instruments may differ from equation to equation, in section 4.4. Finally, recall that our instrument set is of the form H = [QX, PE], where E = (X(1), Z(1)).

(4.3.1) Three-Stage Least Squares

We derive the three-stage least squares estimator as follows. First, we multiply equation (4.3.1) by S^{-1/2} to transform the error to a scalar covariance matrix. The transformed equation is simply

(4.3.7) S^{-1/2}y† = S^{-1/2}R†A† + S^{-1/2}s†.

We then follow the path of Hausman and Taylor, estimating (4.3.7) by IV with instrument set (I ⊗ H). This yields the following definition.

Definition (4.4): The three-stage least squares (3SLS) estimator of A† from equation (4.3.1) is the instrumental variables estimator of equation (4.3.7), using the instrument set (I ⊗ H). Explicitly,

(4.3.8) a_3SLS = [R†'((Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]))R†]^{-1} R†'((Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]))y†.

(4.3.2) Instrumental Variables Estimation

Following Hausman, Newey, and Taylor (1987), we consider an interpretation of the 3SLS estimator implied by the instrument-residual orthogonality condition plim f†1/NT = 0, where

(4.3.9) f†1 = (I ⊗ H')(y† − R†A†),

whose blocks are H'(y_g − R_g A_g), g = 1, ..., M. The covariance structure of f†1 is

(4.3.10) Cov(f†1) ≡ C†1 = (I ⊗ H')S(I ⊗ H) = (Σ1 ⊗ H'QH) + (Σ2 ⊗ H'PH).

To assist in the simplification of the estimators considered below, we need the following lemma.

Lemma (4.5): Suppose T1 and T2 are positive definite, nonsingular matrices and H = [QX, PE]. Then

(4.3.11) {(T1 ⊗ H'QH) + (T2 ⊗ H'PH)}^{-1} = T1^{-1} ⊗ (H'QH)^{-1} + T2^{-1} ⊗ (H'PH)^{-1},

the inverses of the singular blocks H'QH and H'PH being understood as generalized inverses.

Proof: Using Baltagi's lemma (Baltagi (1980), p. 1548), it is sufficient to show that (H'QH)(H'PH) = 0. But

(H'QH)(H'PH) = [X'QX, 0; 0, 0][0, 0; 0, E'PE] = 0. Q.E.D.
As before, the instrumental variables estimator (also known as the "generalized method of moments" estimator) is the solution to the problem of minimizing with respect to A† the quadratic distance of f†1 from zero,

(4.3.12) f†1'C†1^{-1}f†1 = (y† − R†A†)'W†(y† − R†A†),

where

(4.3.13) W† = (I ⊗ H)C†1^{-1}(I ⊗ H')

is a quadratic form. By Lemma (4.5), C†1^{-1} can be written as

(4.3.14) C†1^{-1} = Σ1^{-1} ⊗ (H'QH)^{-1} + Σ2^{-1} ⊗ (H'PH)^{-1},

so we can rewrite W† as

(4.3.15) W† = (I ⊗ H)(Σ1^{-1} ⊗ (H'QH)^{-1} + Σ2^{-1} ⊗ (H'PH)^{-1})(I ⊗ H')
= (Σ1^{-1} ⊗ H(H'QH)^{-1}H') + (Σ2^{-1} ⊗ H(H'PH)^{-1}H')
= (Σ1^{-1} ⊗ P[QH]) + (Σ2^{-1} ⊗ P[PH])
= (Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]).

The solution can be written as

(4.3.16) a† = [R†'W†R†]^{-1}R†'W†y† = [R†'((Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]))R†]^{-1} R†'((Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]))y†.

It can readily be seen that a† is equal to the 3SLS estimator of A† given in (4.3.8).

An alternative estimator can be derived from the instrument-residual orthogonality conditions given in (4.3.9) if, in place of the quadratic form W†, we use instead

(4.3.17) W2 = (I ⊗ H)[(diag(Σ1) ⊗ H'QH) + (diag(Σ2) ⊗ H'PH)]^{-1}(I ⊗ H')
= diag(Σ1)^{-1} ⊗ P[QH] + diag(Σ2)^{-1} ⊗ P[PH],

where diag(Σ1) and diag(Σ2) are the diagonal matrices whose diagonal entries are the diagonal entries of Σ1 and Σ2, respectively. Thus, we do not take account of the fact that the covariance structure of f†1 is (Σ1 ⊗ H'QH) + (Σ2 ⊗ H'PH) rather than (diag(Σ1) ⊗ H'QH) + (diag(Σ2) ⊗ H'PH). This yields the estimator

(4.3.18) a†2 = [R†'W2R†]^{-1}R†'W2y†.

Since R†, diag(Σ1)^{-1} ⊗ P[QH], and diag(Σ2)^{-1} ⊗ P[PH] are block diagonal, and since (1/σ1²)P[QH] + (1/σ2²)P[PH] = S11^{-1}P[H] = S11^{-1/2}P[H]S11^{-1/2}, we can rewrite (4.3.18) as

(4.3.19) a†2 = diag([R1'S11^{-1/2}P[H]S11^{-1/2}R1]^{-1}, ..., [RM'SMM^{-1/2}P[H]SMM^{-1/2}RM]^{-1}) × (R1'S11^{-1/2}P[H]S11^{-1/2}y1; ...; RM'SMM^{-1/2}P[H]SMM^{-1/2}yM),

which can be seen to be 2SLS applied to each equation separately. Since W2 is a suboptimal weighting matrix, this is one way of proving 3SLS efficient relative to 2SLS.

Still another estimator can be derived if we consider, instead of (4.3.9), the instrument-residual orthogonality conditions plim f†3/NT = 0, where

(4.3.20) f†3 = (I ⊗ H')S^{-1}(y† − R†A†).

The covariance structure of f†3 is written as

(4.3.21) C†3 ≡ Cov(f†3) = (I ⊗ H')S^{-1}(I ⊗ H).

It is an interesting result that S^{-1} in the orthogonality condition given above is superfluous. To see this, consider the problem of minimizing with respect to A† the quadratic distance of f†3 from zero,

(4.3.22) f†3'C†3^{-1}f†3 = (y† − R†A†)'W3(y† − R†A†),

using the quadratic form

(4.3.23) W3 = S^{-1}(I ⊗ H)[(I ⊗ H')S^{-1}(I ⊗ H)]^{-1}(I ⊗ H')S^{-1}.

The solution to this problem yields the estimator

(4.3.24) a†3 = [R†'W3R†]^{-1}R†'W3y†.

Now, using Lemma (4.5), we can write C†3^{-1} as

(4.3.25) C†3^{-1} = {(I ⊗ H')S^{-1}(I ⊗ H)}^{-1} = {(Σ1^{-1} ⊗ H'QH) + (Σ2^{-1} ⊗ H'PH)}^{-1} = Σ1 ⊗ (H'QH)^{-1} + Σ2 ⊗ (H'PH)^{-1}.

Then W3 can be written as

(4.3.26) W3 = S^{-1}(I ⊗ H)C†3^{-1}(I ⊗ H')S^{-1}
= S^{-1}(I ⊗ H){Σ1 ⊗ (H'QH)^{-1} + Σ2 ⊗ (H'PH)^{-1}}(I ⊗ H')S^{-1}
= Σ1^{-1}Σ1Σ1^{-1} ⊗ H(H'QH)^{-1}H' + Σ2^{-1}Σ2Σ2^{-1} ⊗ H(H'PH)^{-1}H'
= (Σ1^{-1} ⊗ P[QH]) + (Σ2^{-1} ⊗ P[PH])
= (Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]).

Therefore, comparing W3 given above to W† given in (4.3.15), it is clear that W3 = W†, so the two problems are the same and the presence of S^{-1} in the orthogonality condition of (4.3.20) is irrelevant.
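Using the form (4.3.15) of the weighting matrix, the 3SLS estimator (4.3.8) can be sketched in a few lines. Here R_star and y_star denote the block-diagonal regressor matrix R† and the stacked dependent variable y†, and all names are illustrative; this is a sketch, not a production implementation.

    import numpy as np

    def proj(A):
        return A @ np.linalg.solve(A.T @ A, A.T)

    def three_sls(y_star, R_star, QX, PE, Sig1, Sig2):
        # W = Sig1^{-1} kron P[QX] + Sig2^{-1} kron P[PE], as in (4.3.15)
        W = np.kron(np.linalg.inv(Sig1), proj(QX)) + np.kron(np.linalg.inv(Sig2), proj(PE))
        # a_3SLS = (R'WR)^{-1} R'Wy, as in (4.3.16)
        return np.linalg.solve(R_star.T @ W @ R_star, R_star.T @ W @ y_star)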
(4.3.3) Special Cases

Consider the model

(4.3.27) y† = R†A† + s†.

We showed in section 4.3.2 that the 3SLS estimator can be interpreted as an IV estimator using the instrument set (I ⊗ H), where H = [QX, PE] and E = (X(1), Z(1)) contains the doubly exogenous variables present in the model.

For our first special case, suppose that there are no doubly exogenous variables; i.e., all exogenous variables are correlated with the individual effects. Then the set E is empty and we should obtain fixed effects. Suppose also that there are no time-invariant variables Z, since estimation of their coefficients would now be impossible. Our estimator becomes

(4.3.28) a_3SLS = [R†'(Σ1^{-1} ⊗ P[QX])R†]^{-1} R†'(Σ1^{-1} ⊗ P[QX])y†.

An alternative approach to fixed effects estimation would be to derive the 3SLS estimator by first premultiplying equation (4.3.27) by (I ⊗ Q), a system-wide within transformation, yielding

(4.3.29) (I ⊗ Q)y† = (I ⊗ Q)R†A† + (I ⊗ Q)s†,

and then using the instrument set H = (QX). This fixed effects (within) estimator becomes

(4.3.30) a†W = [R†'(Σ1^{-1} ⊗ P[QX])R†]^{-1} R†'(Σ1^{-1} ⊗ P[QX])y†,

which can be seen to be equal to our 3SLS estimator when there are no doubly exogenous variables.

Estimation of the panel-data simultaneous equations model with fixed effects has been considered by Cornwell and Schmidt (1987). They show that in a simultaneous equations model in which the same exogenous variables in each equation have coefficients which vary over individuals, the MLE, the conditional MLE, and the marginal MLE coincide. This is obviously a more general model than the one being considered here, but their model does simplify to a fixed-effects version of the simultaneous equations model with individual effects. In effect, they show that the MLE, CMLE, and MMLE coincide in a simultaneous equations model with fixed effects. Their results imply that, just as in the single-equation case, the coefficients of the explanatory variables that vary over time and individuals are determined by the "within" component of the likelihood, and the coefficients of the time-invariant or individual-invariant explanatory variables are determined by the appropriate "between" component of the likelihood.

For our second special case, suppose there are no singly exogenous variables, so that all the exogenous variables are assumed uncorrelated with the individual effects. This is the Baltagi case; that is, the case in which H = [QX, PE] with E = (X, Z). In Baltagi's notation, his error-component three-stage least squares (EC3SLS) estimator can be written as

(4.3.31) a_EC3SLS = {Σ_{h=1}^{3} (Z1^{(h)})'(Σ^{(h)} ⊗ P[X^{(h)}])Z1^{(h)}}^{-1} {Σ_{h=1}^{3} (Z1^{(h)})'(Σ^{(h)} ⊗ P[X^{(h)}])y^{(h)}}.

On the other hand, the 3SLS estimator given in (4.3.8) can again be written as

(4.3.32) a_3SLS = [R†'((Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]))R†]^{-1} R†'((Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]))y†.

This is essentially Baltagi's estimator translated into our notation. That is, if we assume only individual effects, note that his Σ^{(1)} is our Σ2^{-1} and that his Σ^{(2)} and Σ^{(3)} are our Σ1^{-1}, and further use the translation of section 4.2.3; then Baltagi's EC3SLS estimator in our notation is

(4.3.33) a_EC3SLS = [R†'((Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[P(X, Z)]))R†]^{-1} R†'((Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[P(X, Z)]))y†,

which is the same as a_3SLS when E = (X, Z).

4.4 Systems with Different Instruments

We now allow different instruments to exist in different equations. To this end we need to introduce some notation.
Let H1 = [QX, PE1] be the instrument set for equation 1, H2 = [QX, PE2] the instrument set for equation 2, and so on. Note that, as before, each instrument set is of the form H = [QX, PE], but the E's differ across equations. This is because they contain variables that are doubly exogenous only with respect to each particular equation: in this section a variable which is doubly exogenous for one equation may not be doubly exogenous in another.

Recall that in section 4.2 we derived the 2SLS estimator for the first equation by considering the instrument-residual orthogonality condition based on

(4.4.1) f1 = H1'(y1 − R1A1).

Using

(4.4.2) C1 ≡ Cov(f1) = H1'S11H1,

the solution to the problem of minimizing the quadratic distance of f1 from zero,

(4.4.3) f1'(H1'S11H1)^{-1}f1 = (y1 − R1A1)'H1(H1'S11H1)^{-1}H1'(y1 − R1A1) = (y1 − R1A1)'((1/σ1²)P[QX] + (1/σ2²)P[PE1])(y1 − R1A1),

yields the estimator

(4.4.4) a1,2SLS = [R1'W1R1]^{-1}R1'W1y1 = [R1'((1/σ1²)P[QX] + (1/σ2²)P[PE1])R1]^{-1} R1'((1/σ1²)P[QX] + (1/σ2²)P[PE1])y1,

where

(4.4.5) W1 = H1C1^{-1}H1' = H1(H1'S11H1)^{-1}H1',

and the covariance structure for this 2SLS estimator is given by

(4.4.6) Cov(a1,2SLS) = [R1'W1R1]^{-1} = [R1'((1/σ1²)P[QX] + (1/σ2²)P[PE1])R1]^{-1}.

Now we derive the joint 2SLS estimator: a system estimator with 2SLS applied to each equation separately. Let

(4.4.7) H† = diag(H1, ..., HM).

Then we write the instrument-residual orthogonality conditions as plim f†1/NT = 0, where

(4.4.8) f†1 = H†'(y† − R†A†).

Although the covariance structure is

(4.4.9) C† = H†'SH†,

we use instead the suboptimal weighting matrix

(4.4.10) W†1 = H†(H†'blkdiag(S)H†)^{-1}H†',

where

(4.4.11) blkdiag(S) = diag(S11, ..., SMM).

We minimize the quadratic distance

(4.4.12) f†1'(H†'blkdiag(S)H†)^{-1}f†1,

which yields the joint 2SLS estimator

(4.4.13) a†2SLS = [R†'W†1R†]^{-1}R†'W†1y†.

Because R†, A†, and blkdiag(S) are block diagonal, we have

(4.4.14) a†2SLS = diag([R1'H1(H1'S11H1)^{-1}H1'R1]^{-1}, ..., [RM'HM(HM'SMMHM)^{-1}HM'RM]^{-1}) × (R1'H1(H1'S11H1)^{-1}H1'y1; ...; RM'HM(HM'SMMHM)^{-1}HM'yM),

which can be seen as 2SLS applied to each equation separately. And

(4.4.15) Cov(a†2SLS) = [R†'W†1R†]^{-1}R†'W†1SW†1'R†[R†'W†1R†]^{-1} = [R†'H†(H†'blkdiag(S)H†)^{-1}H†'R†]^{-1} R†'W†1SW†1'R† [R†'H†(H†'blkdiag(S)H†)^{-1}H†'R†]^{-1}.

Now consider again the instrument-residual orthogonality conditions given in (4.4.8). Using the correct covariance structure, the problem of minimizing the quadratic distance of f†1 from zero,

(4.4.16) f†1'C†^{-1}f†1 = (y† − R†A†)'W†2(y† − R†A†),

where

(4.4.17) W†2 = H†C†^{-1}H†' = H†(H†'SH†)^{-1}H†',

yields the 3SLS estimator

(4.4.18) a†3SLS = [R†'W†2R†]^{-1}R†'W†2y† = [R†'H†(H†'SH†)^{-1}H†'R†]^{-1} R†'H†(H†'SH†)^{-1}H†'y†,

with covariance matrix

(4.4.19) Cov(a†3SLS) = (R†'W†2R†)^{-1} = (R†'H†(H†'SH†)^{-1}H†'R†)^{-1}.

It is a standard result that this estimator is efficient relative to the joint 2SLS estimator given above. And when H† = (I ⊗ H), it is easy to show that the 3SLS estimator given in (4.4.18) simplifies to the 3SLS estimator given in section 4.3.

Theorem (4.6): When H† = (I ⊗ H), the 3SLS estimator given in (4.4.18) reduces to the 3SLS estimator given in (4.3.8).

Proof: Note that when H† = (I ⊗ H), where H = [QX, PE], the weighting matrix in (4.4.17), using Lemma (4.5), reduces to

(4.4.20) W†2 = H†[H†'SH†]^{-1}H†' = (I ⊗ H)[(I ⊗ H')S(I ⊗ H)]^{-1}(I ⊗ H')
= (I ⊗ H)((Σ1^{-1} ⊗ (H'QH)^{-1}) + (Σ2^{-1} ⊗ (H'PH)^{-1}))(I ⊗ H')
= (Σ1^{-1} ⊗ P[QH]) + (Σ2^{-1} ⊗ P[PH])
= (Σ1^{-1} ⊗ P[QX]) + (Σ2^{-1} ⊗ P[PE]).

Since this is the same weighting matrix used in (4.3.8), the result follows. Q.E.D.
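A sketch of (4.4.18) with equation-specific instruments, assuming SciPy is available for the block-diagonal construction; H_list and R_list hold the per-equation instrument and regressor matrices, and S is the stacked error covariance. All names are illustrative, and this is a sketch rather than a production implementation.

    import numpy as np
    from scipy.linalg import block_diag

    def three_sls_diff_instruments(y_star, R_list, H_list, S):
        H = block_diag(*H_list)            # H-dagger: diag(H1, ..., HM)
        R = block_diag(*R_list)            # R-dagger: diag(R1, ..., RM)
        C = H.T @ S @ H                    # covariance of the stacked moments (4.4.9)
        W = H @ np.linalg.solve(C, H.T)    # W = H C^{-1} H', as in (4.4.17)
        return np.linalg.solve(R.T @ W @ R, R.T @ W @ y_star)

Passing H_list = [H] * M with a common H reproduces the section 4.3 estimator, which is the content of Theorem (4.6).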
The question we now ask is whether our 3SLS estimator for this more general model, which allows the instruments to vary across equations, is efficient. We ask whether the instrument-residual orthogonality conditions given in (4.4.8) can be mixed using the cross-equation covariances as weights. Consider a positive definite matrix C (of dimension M × M) and the vector

(4.4.21) f†3 = H†'(C ⊗ I_NT)(y† − R†A†),

whose g-th block is Σ_{j=1}^{M} c_gj H_g'(y_j − R_j A_j). Thus premultiplying the instruments H† by a matrix of the form (C ⊗ I) would "mix" the equations (unless C were diagonal) and introduce cross-products like H1'(y_j − R_j A_j), whose probability limit we have implicitly assumed is not zero for at least one j = 1, ..., M. (If not, then we are in the special case H1 = H2 = ... = HM.) Thus f†3 does not represent true instrument-residual orthogonality conditions, so even the consistency of any resulting estimator would be in doubt. Therefore the orthogonality conditions f†3 would not lead to an improved estimator.

The question of whether we can improve the 3SLS estimator derived from (4.4.8) must therefore be addressed by searching among estimators derived from transformations of (4.4.8) which do not create new and illegitimate cross-products. We pursue this line of reasoning in the remainder of this section.

There are two ways to order the instrument-residual orthogonality conditions given by (4.4.8). We can order first by residuals and then by instruments (the method used so far), or first by instruments and then by residuals. We will address the question of transforming the orthogonality conditions ordered in each of the two ways, and consider the effect, if any, on the resulting GMM estimator. First, we need to introduce some notation. Let

(4.4.22) H = [h1, ..., hL]

be the set of all instruments, where L denotes the total number of instruments. Then define

(4.4.23) U† = diag(U1, ..., UM)

to be a selection matrix with HU_i = H_i: each matrix U_i (of dimension L × m_i) selects from H the instruments orthogonal to the residual s_i, and m_i equals the number of instruments orthogonal to residual s_i. We can now write

(4.4.24) (I_M ⊗ H)U† = H†.

It follows that the sum m1 + ... + mM is the total number of instrument-residual orthogonality conditions found in (4.4.8), and that the matrix U† is of dimension ML × (m1 + ... + mM). We can now write (4.4.8) as

(4.4.25) f†4 = U†'vec(H's) = U†'(I_M ⊗ H')vec(s) = (H1's1; ...; HM'sM),

where s denotes the matrix of residuals (s1, ..., sM), and the covariance matrix of f†4 as

(4.4.26) C†4 ≡ Cov(f†4) = E{U†'vec(H's)vec(H's)'U†} = U†'(I_M ⊗ H')(Σ1 ⊗ Q + Σ2 ⊗ P)(I_M ⊗ H)U† = U†'(Σ1 ⊗ H'QH + Σ2 ⊗ H'PH)U†,

since E{vec(s)vec(s)'} = Σ1 ⊗ Q + Σ2 ⊗ P. The quadratic distance of f†4 from zero can be written as

(4.4.27) f†4'C†4^{-1}f†4 = vec(H's)'U†C†4^{-1}U†'vec(H's).

Now define matrices T_i (i = 1, ..., M), where T_i is a positive definite square matrix of order m_i. Then

(4.4.28) T† = diag(T1, ..., TM)

is a positive definite square matrix of order equal to the total number of restrictions. We can then transform the orthogonality conditions in (4.4.25) by T† and write them as

(4.4.29) f̃†4 = T†'U†'vec(H's) = T†'U†'(I_M ⊗ H')vec(s) = (T1'U1'H's1; ...; TM'UM'H'sM) = (T1'H1's1; ...; TM'HM'sM).

We should note that each block T_i'H_i's_i is a mix of the cross-products between the instruments in H_i and the residual s_i. Since every instrument in H_i is orthogonal to residual s_i, T_i'H_i's_i represents a mixing of only legitimate instrument-residual orthogonality conditions: in this mixing, no cross-products with nonzero probability limit are introduced.
Now consider

(4.4.30)  C̃*_4 ≡ Cov(f̃*_4) = E{ T*′U*′vec(H′s)vec(H′s)′U*T* }
          = E{ T*′U*′(I_M ⊗ H′)vec(s)vec(s)′(I_M ⊗ H)U*T* }
          = T*′C*_4T*.

We then write the quadratic distance to zero of f̃*_4 as

(4.4.31)  f̃*_4′(C̃*_4)⁻¹f̃*_4 = vec(H′s)′U*T*[T*′C*_4T*]⁻¹T*′U*′vec(H′s)
          = vec(H′s)′U*T*T*⁻¹C*_4⁻¹(T*′)⁻¹T*′U*′vec(H′s)
          = vec(H′s)′U*C*_4⁻¹U*′vec(H′s)
          = vec(s)′(I_M ⊗ H)U*C*_4⁻¹U*′(I_M ⊗ H′)vec(s),

which is the same as in (4.4.27). Thus the GMM estimators derived from f*_4 and f̃*_4 are the same. Therefore, mixing the instrument-residual orthogonality conditions having a common residual has no effect on the resulting estimator.

We next consider mixing the orthogonality conditions in (4.4.8) within subgroups having a common instrument, but first we need to introduce some additional notation. Let

(4.4.32)  s = ( s_1, . . . , s_M )

be the matrix (of dimension NT × M) containing the residuals. Then define

(4.4.33)  s(i) = sV_i,  i = 1, . . . , L,

as the matrices (of dimension NT × l_i) containing only those residuals assumed orthogonal to the instrument h_i; V_i is the matrix which selects the l_i residuals from the list in (4.4.32), where l_i denotes the number of residuals orthogonal to instrument h_i. Note that there are as many s(i)'s as there are instruments. The matrix containing all the selection matrices can be written as

(4.4.34)  V* = diag(V_1, . . . , V_L).

We can now rearrange the instrument-residual orthogonality conditions found in (4.4.25) first by instrument and then by residual. The orthogonality conditions reordered in such a manner can be written as

(4.4.35)  f*_5 = V*′vec(s′H) = V*′(I_L ⊗ s′)vec(H).

It should be pointed out that this rearrangement has in no way affected the orthogonality conditions; the same instrument-residual orthogonality conditions contained in (4.4.25) are still found in (4.4.35), but now in a different order. The covariance structure of f*_5 is written

(4.4.36)  C*_5 ≡ Cov(f*_5) = E{ V*′vec(s′H)vec(s′H)′V* }
          = E{ V*′(H′ ⊗ I_M)vec(s′)vec(s′)′(H ⊗ I_M)V* }
          = V*′(H′ ⊗ I_M)(Q ⊗ Σ_1 + P ⊗ Σ_2)(H ⊗ I_M)V*
          = V*′(H′QH ⊗ Σ_1 + H′PH ⊗ Σ_2)V*,

since E{ vec(s′)vec(s′)′ } = Q ⊗ Σ_1 + P ⊗ Σ_2. So the quadratic distance from f*_5 to zero can be written as

(4.4.37)  f*_5′C*_5⁻¹f*_5 = vec(s′H)′V*C*_5⁻¹V*′vec(s′H).

Now define the matrices T_i (i = 1, . . . , L), where T_i is a positive definite square matrix of order l_i, so that

(4.4.38)  T* = diag(T_1, . . . , T_L)

is a positive definite square matrix of order equal to the total number of restrictions. We can then transform the orthogonality conditions in (4.4.35) by T* and write them as

(4.4.39)  f̃*_5 = T*′V*′vec(s′H) = ( T_1′V_1′s′h_1, . . . , T_L′V_L′s′h_L )′.

We note that each block, T_i′V_i′s′h_i, is a mixing of the instrument-residual orthogonality conditions for only a single instrument h_i. That is, we are mixing the cross-products of s(i) and h_i, which are all assumed to have probability limit equal to zero. Thus we have not introduced illegitimate instrument-residual cross-products whose probability limit may be nonzero. Now consider

(4.4.40)  C̃*_5 ≡ Cov(f̃*_5) = E{ T*′V*′vec(s′H)vec(s′H)′V*T* }
          = E{ T*′V*′(H′ ⊗ I_M)vec(s′)vec(s′)′(H ⊗ I_M)V*T* }
          = T*′C*_5T*.

We then write the quadratic distance to zero of f̃*_5 as

(4.4.41)  f̃*_5′(C̃*_5)⁻¹f̃*_5 = vec(s′H)′V*T*[T*′C*_5T*]⁻¹T*′V*′vec(s′H)
          = vec(s′H)′V*T*T*⁻¹C*_5⁻¹(T*′)⁻¹T*′V*′vec(s′H)
          = vec(s′H)′V*C*_5⁻¹V*′vec(s′H).

Comparing the above quadratic distance to that given in (4.4.37), we find that, as long as the T_i's are nonsingular so that T*⁻¹ exists, transforming the orthogonality conditions in (4.4.35) by T* will have no effect on the resulting GMM estimator.
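The invariance argument in (4.4.31) and (4.4.41) reduces to a simple fact about GMM quadratic forms: replacing the conditions f by T′f and the covariance C by T′CT leaves f′C⁻¹f unchanged for any nonsingular T. A quick numerical check with simulated placeholder numbers (my own illustration, not from the text) makes this concrete.

```python
# Check that f' C^{-1} f is invariant to nonsingular transformations of f.
import numpy as np

rng = np.random.default_rng(2)
m = 6                                   # number of orthogonality conditions
f = rng.normal(size=m)                  # stacked moment conditions (placeholder)
A = rng.normal(size=(m, m))
C = A @ A.T + m * np.eye(m)             # a positive definite covariance

T = rng.normal(size=(m, m))             # nonsingular with probability one
f_t = T.T @ f                           # transformed conditions, as in (4.4.29)
C_t = T.T @ C @ T                       # transformed covariance, as in (4.4.30)

d1 = f @ np.linalg.solve(C, f)          # f' C^{-1} f
d2 = f_t @ np.linalg.solve(C_t, f_t)    # identical quadratic distance
assert np.isclose(d1, d2)
```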
In summary, when transforming the instrument-residual orthogonality conditions in the case where the instruments differ across residuals, we must restrict ourselves to transformations which do not create new and illegitimate cross-products. Unfortunately, when we do so, it turns out that there is no gain from such transformations; we simply get back the 3SLS estimator.

4.5 Conclusions

This chapter applies the Hausman and Taylor method of instrumental-variables estimation to the simultaneous equations panel data model, and derives the resulting estimators. Throughout, we attempt to improve our instrumental-variables estimator by transforming the errors so as to change their covariance structure, or by transforming the instruments so as to improve their explanatory power for the endogenous variables. We consider a natural extension of the Hausman-Taylor model to a linear simultaneous equations model with random effects, allowing the effects to be potentially correlated with some of the regressors. We then consider the effect on our instrumental-variables estimator when the instrument sets are not the same for each equation in the system.

CHAPTER 5

Conclusion

In this thesis, I have considered the specification and estimation of linear models in the presence of panel data. The previous literature on this topic can be organized according to the following four distinctions: first, the nature of the model, such as a single-equation versus a simultaneous-equations model; second, whether there are assumed to be individual and time effects, or just one or the other; third, whether the effects are assumed to be fixed or random and, if they are random, whether they are assumed to be correlated with some or all of the explanatory variables; and fourth, whether the asymptotic properties of the estimators depend on a large number of individuals (large N), a large number of time periods (large T), or both. Existing papers cover some but not all of the possible combinations of these assumptions, and the basic purpose of this thesis is to fill in some of the more obvious gaps in the literature by considering plausible and important combinations of assumptions not previously considered. However, another purpose of the thesis is to advance a particular mathematical framework for the analysis and to demonstrate its usefulness.

There are three substantive contributions of the thesis. The first is to extend the analysis of Hausman and Taylor (1981) to a model containing individual and time effects correlated with some or all of the regressors, under the assumption of large N and small T. I consider random individual and time effects, and allow the regressors to be correlated or not with either or both types of effects. The analysis is similar to that of Hausman and Taylor, but it is algebraically more complicated because there are more different types of exogeneity assumptions to consider. It should also be noted that all previous treatments of models with both individual and time effects assume large N and large T. I consider this case in detail, but I also consider separately the case of large N and small T (as assumed by Hausman and Taylor).

The second contribution of the thesis is to extend the analysis of Hausman and Taylor to a single equation in a simultaneous equations system; that is, to a regression model in which some of the regressors are correlated with the random noise component of the error. This case has previously been analyzed by Amemiya and MaCurdy (1986), but in an unsatisfactory way.
I follow Hausman and Taylor and Amemiya and MaCurdy in considering random individual effects (no time effects) which may be correlated with some or all of the exogenous regressors, and in assuming large N and small T. I propose 2SLS estimators based on the instrument sets proposed by Hausman and Taylor, Amemiya and MaCurdy, and Breusch, Mizon, and Schmidt (1987).

The third contribution of the thesis is to propose full-information (3SLS) estimators for a simultaneous equations system with random individual effects correlated with some or all of the exogenous variables. These estimators are shown to reduce to the usual fixed-effects treatment if all exogenous variables are correlated with the effects, and to reduce to an estimator previously proposed by Baltagi (1981) if none of the exogenous variables are correlated with the effects. I also consider the case in which some exogenous variables may be correlated with the effects in some equations but not in others, so that the available instrument set varies from equation to equation.

The line of research followed in this dissertation can be extended in a straightforward fashion by considering additional new combinations of the assumptions underlying previous work. One obvious and interesting task would be to analyze a simultaneous equations model when there are both individual and time effects that may be correlated with the exogenous variables. A second possible topic of future research is to consider single-equation models in which the random noise component of the error has a non-scalar covariance matrix. Finally, although this direction of research is less clearly defined, I hope to extend the analyses of this dissertation to nonlinear models.

BIBLIOGRAPHY

Amemiya, T., and T. E. MaCurdy. "Instrumental-Variable Estimation of an Error-Components Model." Econometrica 45 (1986): 869-881.

Arnold, S. F. The Theory of Linear Models and Multivariate Analysis. New York: Wiley, 1981.

Baltagi, B. H. "On Seemingly Unrelated Regressions with Error Components." Econometrica 48 (1981): 1547-1551.

Baltagi, B. H. "Simultaneous Equations with Error Components." Journal of Econometrics 17 (1981): 189-200.

Breusch, T. S., G. E. Mizon, and P. Schmidt. "Some Results on Panel Data." Unpublished manuscript.

Cornwell, C., and P. Schmidt. "Models for which the MLE and the Conditional MLE Coincide." Unpublished manuscript.

Fuller, W., and G. Battese. "Estimation of Linear Models with Crossed-Error Structure." Journal of Econometrics 2 (1974): 67-78.

Halmos, P. R. Finite-Dimensional Vector Spaces. Princeton: Van Nostrand, 1958.

Hausman, J. A., W. K. Newey, and W. E. Taylor. "Efficient Estimation and Identification of Simultaneous Equation Models with Covariance Restrictions." Econometrica 55 (1987): 849-874.

Hausman, J. A., and W. E. Taylor. "Panel Data and Unobservable Individual Effects." Econometrica 49 (1981): 1377-1398.

Judge, G. G., et al. The Theory and Practice of Econometrics. New York: Wiley, 1980.

Maddala, G. S. "The Use of Variance Components Models in Pooling Cross Section and Time Series Data." Econometrica 39 (1971): 341-358.

Mundlak, Y. "On the Pooling of Time Series and Cross Section Data." Econometrica 46 (1978): 69-85.

Mundlak, Y. "Models with Variable Coefficients: Integration and Extension." Annales de l'INSEE 30/31 (1978): 483-510.

Nerlove, M. "A Note on Error Components Models." Econometrica 39 (1971): 383-396.

Rao, C. R. Linear Statistical Inference and Its Applications. New York: Wiley, 1973.

Schmidt, P. Econometrics. New York: Marcel Dekker, 1976.
"A Note on a Fixed Effects Model with Arbitrary Interpersonal Covariance." logyggl_o£_§oogoggggiog 22 (1983): 391-393. Swamy, P. and S. Arora. "The Exact Finite Sample Properties of the Estimators of Coefficients in the Error . Components Regression Models." Eoogoggigiog 40 (1972): 261-275. White, H. Asymgotic Theory for Economeiyiciggg. New York: Academic Press, 1984. R ”Ti)fi)))()()()((iflilflil'lMQRMI(Es