ABSTRACT

IMPROVING THE COMPUTATION OF SIMULTANEOUS STOCHASTIC LINEAR EQUATIONS ESTIMATES

By William Lewis Ruble

As a part of a much larger statistical package for a Control Data Corporation 6500 computer, the writer participated in the development of a set of simultaneous stochastic linear equations routines including direct least squares (DLS), two-stage least squares (2SLS), limited information single equation maximum likelihood (LIML), the Zellner-Aitken estimator (ZA), three-stage least squares (3SLS), linearized maximum likelihood (LML), subsystem maximum likelihood (SML), and full information maximum likelihood (FIML).

This paper summarizes (1) computational approaches used, (2) many relationships between methods, (3) [to a very limited extent] some of the properties of these methods, and (4) forms of user control cards which may be used to specify and control the computation of problems. Computational techniques such as standardization of variables to reduce rounding error are noted.

Some of the computational approaches used and some of the relationships among estimators noted in this paper were derived by the writer. Of the computational approaches which are presented for the first time (as far as the writer is aware), the following are probably the most noteworthy:

(1) The use of direct orthogonalization in the calculations for many of the simultaneous stochastic linear equations methods. In addition to reducing rounding error, the use of direct orthogonalization eliminates some of the problems of multicollinearity among predetermined variables in the equation system. Also, the matrix of predetermined variables need not have full column rank.

(2) The development of a method for imposing arbitrary linear restrictions on coefficients which:

(a) Allows the restrictions to be specified directly to the computer without prior solving out or conversion.
(b) Provides a means of imposing arbitrary linear restrictions upon FIML and SML coefficients.

(c) May be applied in essentially the same way to DLS, ZA, SML, FIML, LML, and 3SLS.

(d) Is adapted to methods requiring iteration to a solution.

(e) Allows redundant restrictions to be imposed on coefficients. The number of independent restrictions is calculated as a by-product of the computational procedure.

(f) Detects inconsistent restrictions.

(g) May be used to calculate restricted coefficients even though a unique solution for a method does not exist in the absence of the restrictions.

Relationships among methods which are shown for the first time in this paper include:

(1) For the special case of a system of equations in which only one jointly dependent variable occurs in each equation, the following computational procedures lead to the same coefficients:

(a) FIML.
(b) Iteratively applying ZA.
(c) The Telser method of iteratively estimating each equation by DLS.

(2) For the general case in which more than one jointly dependent variable is permitted per equation and at least one equation is over-identified, the following computational procedures do not lead to the same coefficients:

(a) FIML.
(b) Iteratively applying 3SLS (I3SLS).

(3) Iteratively applying LIML (ILIML) leads to FIML estimates in the general case (multiple jointly dependent variables occurring in one or more stochastic equations).1 The Telser method of iteratively estimating each equation by DLS may be considered a special case of ILIML; hence, IDLS is a maximum likelihood method. (A direct derivation of IDLS as a maximum likelihood method for the special case of one jointly dependent variable per equation is also given.)

In the derivation of the likelihood function for a system of equations for the application of FIML and for a subsystem of equations for the application of SML, identity equations are explicitly recognized.
It is shown that the identity equations need not be used to eliminate jointly dependent variables from the stochastic equations in order to express the likelihood function or to apply the FIML and SML estimation procedures.2

1The ILIML procedure was proposed to the writer by Professor Herman Rubin.

2T. J. Rothenberg and C. T. Leenders, "Efficient Estimation of Simultaneous Equation Systems", Econometrica, XXXII, No. 1-2 (January-April, 1964), 57-76 have already shown that it is unnecessary to use identity equations to eliminate jointly dependent variables from the stochastic equations; however, a slightly different approach to showing this is taken in this paper. Professor Herman Rubin informed the writer that it is unnecessary to use identity equations to eliminate jointly dependent variables for SML; however, the writer is not aware of any reference to this in the literature.

IMPROVING THE COMPUTATION OF SIMULTANEOUS STOCHASTIC LINEAR EQUATIONS ESTIMATES

By
William Lewis Ruble

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Agricultural Economics

1968

ACKNOWLEDGEMENTS

The writer appreciates the help he has received from a large number of people during the early phases of projects leading to this manuscript as well as in the final preparation of this manuscript. Dr. William A. Cromarty served as the writer's academic advisor during his first year of graduate work at MSU and introduced the writer to the general area of simultaneous stochastic linear equations estimation. Professor Clifford Hildreth then served as the writer's academic advisor for three years during which time the writer formulated many of the computational procedures noted in this manuscript.
Also, during this period, Professor Herman Rubin gave the writer a number of special lectures on the blackboard of his office in an effort to impart more understanding of the techniques involved and to suggest computational procedures not a part of the literature.

Professor Robert L. Gustafson served as chairman of the writer's thesis committee and spent a very large amount of time reading through two drafts of this manuscript and suggesting many substantive and editorial improvements which have been incorporated. The other members of the writer's thesis committee--Professors Lester B. Henderscheid, Jan Kmenta, Kenneth J. Arnold, and James H. Stapleton--also spent a large amount of time reading drafts of this manuscript and suggesting improvements which have been incorporated. Mrs. Janet Eyster read the manuscript in semi-final form and caught quite a few editorial and typographical errors.

Mrs. Marylyn Donaldson assisted the writer by calculating problems to test out a number of conjectures regarding special techniques to reduce rounding error. Mrs. Noralee Barnes did an exemplary job of typing this manuscript and Mrs. Barbara Bray, Mr. Tim Walters, and Mrs. Dona Smith assisted the writer to get the computer results in their final form for photographing and reproducing.

That it has been possible to accomplish this project at all is largely due to the support and encouragement the writer has received from Professor Lawrence L. Boger, Chairman of the Department of Agricultural Economics. The writer also appreciates the cooperation and assistance he has received during his employment in the Agricultural Experiment Station and the cooperation he has received from members of the MSU Computer Laboratory.

Finally, the writer greatly appreciates the patience, understanding, and encouragement of his wife, Margie, during the long duration of his graduate student career.
Although a large number of people have made contributions, the writer assumes full responsibility for errors which may remain in the manuscript.

TABLE OF CONTENTS

Chapter                                                            Page

PART I. SINGLE EQUATION METHODS

I. INTRODUCTION
   A. Background and Purpose ....................................... 1
   B. Some Asymptotic Properties of Estimators ..................... 8
   C. Basic Model ................................................. 15
      1. System of structural equations ........................... 15
      2. Single structural equation ............................... 24
      3. Statistical assumptions .................................. 31
      4. Reduced form equations ................................... 35
   D. Orthogonality Relationships ................................. 39
      1. Notation expressing orthogonality relationships .......... 39
      2. Computation of matrices of the form [Z1'Z2]⊥Z3 and
         [Z1'Z2] by direct orthogonalization ...................... 55

II. COEFFICIENT ESTIMATION
   A. Basic Double k-class Model and Summary of Methods ........... 62
   B. Methods Which Are Both h-class and Single k-class ........... 74
      1. Direct least squares (DLS) ............................... 74
      2. Two-stage least squares (2SLS) ........................... 75
   C. Additional Single k-class Methods ........................... 79
      1. Limited information single equation maximum
         likelihood (LIML) ........................................ 79
      2. Nagar's unbiased to O(T-1) in probability k (UBK) ........ 85
      3. Nagar's minimum second moment k (MSM) .................... 87
   D. Methods Requiring rk X = K - n .............................. 89
      1. Indirect least squares (ILS) ............................. 91
      2. The instrumental variables estimator (IV) ................ 95
   E. No Predetermined Variables in an Equation .................. 101
   F. Only One Jointly Dependent Variable in an Equation ......... 102
   G. Selection of Instruments ................................... 103

III. DISTURBANCE VARIANCE AND COEFFICIENT VARIANCE-COVARIANCE ESTIMATION
   A. Disturbance Variance Estimation ............................ 114
   B. Coefficient Variance-Covariance Estimation ................. 120
      1. Double k-class .......................................... 120
      2. Alternative estimate for LIML ........................... 123
      3. Nagar's unbiased to O(T-2) in probability estimates ..... 124
   C. Coefficient Standard Errors and t-ratios ................... 125

IV. GENERALIZED LEAST SQUARES
   A. Unrestricted Generalized Least Squares (GLS) ............... 129
   B. Restricted Generalized Least Squares (RGLS) ................ 131
      1. Computation of Q and q .................................. 134
      2. Relationship to another restriction formula ............. 142
   C. Restrictions Imposed on Direct Least Squares
      Coefficients ............................................... 149
   D. Restrictions Imposed on Two-stage Least Squares
      Coefficients ............................................... 152

PART II. SIMULTANEOUS EQUATIONS METHODS

V. FULL INFORMATION MAXIMUM LIKELIHOOD (FIML)
   A. Properties of the Full Information Maximum Likelihood
      Estimator .................................................. 158
   B. Derivation of the Likelihood Function to be Maximized ...... 163
   C. Computational Procedure .................................... 172
      1. A maximization procedure for functions non-linear
         in the parameters ....................................... 172
      2. The vector of partial derivatives for FIML .............. 179
      3. Metrics for FIML ........................................ 182
      4. Step size to use at each iteration ...................... 190
      5. Convergence criteria .................................... 199
   D. Estimated Disturbance Variance-covariance Matrix ........... 202
   E. Estimated Coefficient Variance-covariance Matrix ........... 205
   F. Arbitrary Linear Restrictions Imposed on the
      Coefficients ............................................... 208
      1. Illustration of linear restrictions on
         coefficients--Klein's model I ........................... 209
      2. Computational formulas .................................. 218
   G. Linearized Maximum Likelihood (LML) ........................ 223

VI. LIMITED INFORMATION SUBSYSTEM MAXIMUM LIKELIHOOD (SML)
   A. Only Zero and Normalization Restrictions Imposed on
      Coefficients ............................................... 225
      1. Derivation of the likelihood function to be
         maximized ............................................... 229
      2. Computational formulas .................................. 242
   B. Arbitrary Linear Restrictions Imposed on Coefficients ...... 245
   C. Using Instrumental Variables in SML Estimation ............. 246
   D. SML Estimation when rk X ≥ T - M + 1 ....................... 247
   E. Iterative Limited Information Single Equation Maximum
      Likelihood (ILIML) ......................................... 248

VII. ZELLNER-AITKEN ESTIMATOR (ZA)
   A. Only Zero and Normalization Restrictions Imposed on
      Coefficients ............................................... 263
   B. An Alternate Computational Procedure ....................... 272
   C. Arbitrary Linear Restrictions Imposed on Coefficients ...... 274
   D. Iterative Zellner-Aitken Estimator (IZA) ................... 279
      1. Only zero and normalization restrictions imposed
         on coefficients
      2. Arbitrary linear restrictions imposed on
         coefficients
   E. Iterative Direct Least Squares (IDLS or Telser Method) ..... 287

VIII. THREE-STAGE LEAST SQUARES (3SLS)
   A. Only Zero and Normalization Restrictions Imposed on
      Coefficients ............................................... 295
   B. An Alternate Computational Procedure ....................... 305
   C. 3SLS Estimation when rk X = T
   D. Arbitrary Linear Restrictions Imposed on Coefficients
   E. Iterative Three-stage Least Squares (I3SLS) ................ 315
      1. Only zero and normalization restrictions imposed
         on coefficients
      2. Arbitrary linear restrictions imposed on
         coefficients

PART III. ADDITIONAL PROGRAMMING CONSIDERATIONS

IX. ADDITIONAL PROGRAMMING CONSIDERATIONS ....................... 327
   A. Rounding Error ............................................. 329
      1. Single vs. double precision
      2. Standardization of variables
         a. Deviations from means
         b. Uniform scaling
         c. Improving the estimates of sums, means, and the
            standardized moment matrix
         d. Adjustments if no overall constant coefficient
      3. Use of simultaneous equations solutions
      4. Direct orthogonalization ................................ 354
      5. Iterative techniques .................................... 355
   B. Free Field Interpretive Parameters ......................... 357
   C. Data Transformation Section ................................ 366
   D. Coefficients Pool .......................................... 368
   E. Special Files .............................................. 370
      1. Data files .............................................. 370
      2. Intermediate storage files .............................. 371
      3. Matrix storage files .................................... 372
   F. Incorporation of y and G Directly into the Sums of
      Squares and Cross-products Matrix .......................... 373
   G. Estimated Values of Normalizing Jointly Dependent
      Variables, Residuals, and Related Statistics ............... 375
   H. Weighting of Observations .................................. 377
   J. Checks Against Errors ...................................... 379
   K. Computer Output ............................................ 383

Appendix
   A. COMPUTATION OF [Z1'Z2]⊥Z3 AND THE RANK OF Z3 AS AN
      INTERMEDIATE STEP IN THE COMPUTATION OF [Z1'Z2]⊥[Z3:Z4] ... 423
   B. COMPUTATION BY DIRECT ORTHOGONALIZATION OF A MOMENT
      MATRIX OF VARIABLES EACH OF WHICH IS ORTHOGONAL TO A
      DIFFERENT SUBSET OF VARIABLES .............................. 427
   C. TENTATIVE PROOFS REGARDING THE CONSISTENCY OF
      β̂(k1,k2) AND σ̂²(k1,k2) .................................... 439

BIBLIOGRAPHY .................................................... 448

LIST OF ABBREVIATIONS USED FOR ESTIMATORS

                                                                 Page
DLS      direct least squares ..................................... 74
GLS      Aitken's generalized least squares ...................... 129
FIML     full information maximum likelihood ..................... 158
IDLS     iterative DLS ........................................... 287
ILIML    iterative LIML .......................................... 248
ILS      indirect least squares ................................... 91
IV       instrumental variables estimator ......................... 95
IZA      iterative ZA ............................................ 279
I3SLS    iterative 3SLS .......................................... 315
LIML     limited information single equation maximum likelihood ... 79
LML      linearized maximum likelihood ........................... 223
MSM      minimum second moment k .................................. 87
RDLS     restricted DLS (arbitrary linear restrictions imposed
         on coefficients) ........................................ 149
RDLSME   restricted DLS in which arbitrary linear restrictions
         are imposed on coefficients in separate equations ....... 275
RGLS     restricted GLS (arbitrary linear restrictions imposed
         on coefficients) ........................................ 131
RZA      restricted ZA (arbitrary linear restrictions imposed
         on coefficients) ........................................ 274
R2SLS    restricted 2SLS (arbitrary linear restrictions imposed
         on coefficients) ........................................ 152
R2SLSME  restricted 2SLS in which arbitrary linear restrictions
         are imposed on coefficients in separate equations ....... 313
R3SLS    restricted 3SLS (arbitrary linear restrictions imposed
         on coefficients)
SML      limited information subsystem maximum likelihood ........ 225
UBK      unbiased to O(T-1) in probability k ...................... 85
ZA       Zellner-Aitken estimator ................................ 263
2SLS     two-stage least squares .................................. 75
3SLS     three-stage least squares ............................... 295

PART I

SINGLE EQUATION METHODS

CHAPTER I

INTRODUCTION
A. Background and Purpose

In 1963 the writer started development of a system of computer routines (presently called the AES STAT system) designed to calculate simultaneous stochastic linear equations estimates, direct least squares (including stepwise variations), analysis of variance and covariance, some "basic" statistics such as simple correlations, and the plotting of data and functions.1 Emphasis from the start has been on developing only a few major routines which can compute by a number of methods on each routine. Additional flexibility has been obtained by incorporating considerable facility for the manipulation and transformation of data in the computer and for the manipulation of coefficients between methods of estimation (the estimated coefficients from one method often provide the starting coefficients for other methods). The parameters (instructions to the routines prepared by the user for the calculation of a particular problem) are of the same form for all of the AES STAT routines.2

1A computer routine is a set of instructions to a computer to accomplish a given calculation. The terms computer routine and computer program are used interchangeably.

2The general form of the control cards for the AES STAT system and some of the details on the development of the AES STAT system are given in chapter IX.

In the process of programming the simultaneous stochastic linear equations methods, the writer found it necessary to refer to many different articles and books for computational formulas and other specific aspects of the methods. Many of the computational approaches were desirable approaches for computation on a hand calculator but were not well adapted for use on the computer. In adapting computational approaches for use on the computer, emphasis has been placed on reducing rounding error, increasing flexibility, and providing automatic decision branching in the computer in the solution of problems requiring iteration.
The purposes of this paper are to: (1) summarize computational formulas for a number of simultaneous stochastic linear equations methods, (2) summarize some of the relationships among these methods, and (3) to a limited extent summarize some of the properties of these methods.1

1Methods presented in this paper have already been incorporated into the AES STAT system with the following exceptions: (1) The method of imposing arbitrary linear restrictions has not yet been implemented as it was derived by the writer in the process of writing this paper. (2) The limited information subsystem maximum likelihood (SML) method has not been programmed, yet; however, it is planned for incorporation into the system shortly. (3) Nagar's minimum second moment k (MSM) is not available in the system. (4) The orthogonalization method described in appendix B is not available in the system.

Some of the computational approaches used and some of the relationships among methods noted in this paper were derived by the writer. Of the computational approaches which are presented in this paper for the first time (at least as far as the writer is aware), the following are probably the most noteworthy:

(1) The use of direct orthogonalization in the calculations for the simultaneous stochastic linear equations methods. In addition to reducing rounding error, the use of direct orthogonalization eliminates some of the problems of multicollinearity among predetermined variables in the equation system. Also, the matrix of predetermined variables in the system need not have full column rank.

(2) The development of a method for imposing arbitrary linear restrictions on coefficients which:

(a) Allows the restrictions to be specified directly to the computer without prior solving out or conversion.
(b) Provides a means of imposing arbitrary linear restrictions directly upon full information maximum likelihood (FIML) coefficients.1

1A summary of abbreviations used in this paper for estimators (e.g., FIML) follows the table of contents.

(c) May be applied in essentially the same way to direct least squares (DLS), the Zellner-Aitken estimator (ZA), limited information subsystem maximum likelihood (SML), full information maximum likelihood (FIML), linearized maximum likelihood (LML), and three-stage least squares (3SLS).

(d) Is adapted to methods requiring iteration to a solution.

(e) Allows redundant restrictions to be imposed on coefficients. The number of independent restrictions is calculated as a by-product of the computational procedure.

(f) Detects inconsistent restrictions.

(g) May be used to calculate restricted coefficients even though a unique solution for a method does not exist in the absence of the restrictions. (E.g., direct least squares may be applied directly to problems in which the matrix of explanatory variables has less than full column rank provided sufficient restrictions are placed on the coefficients. Thus, the step of eliminating linearly dependent explanatory variables from the equation before obtaining a solution is saved.)
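In present-day terms, the essentials of such a restriction method can be sketched with a null-space parameterization of the restricted least squares problem. The sketch below is only an illustration of the idea, not the thesis's own routine (which was written in FORTRAN and COMPASS for a CDC machine); the function name, the use of the singular value decomposition, and the tolerance handling are the sketch's own assumptions.

```python
import numpy as np

def restricted_ls(X, y, R, r, tol=1e-10):
    """Least squares of y on X subject to the linear restrictions R b = r.

    Redundant rows of R are harmless: the number of independent
    restrictions is rank(R), obtained as a by-product.  Inconsistent
    restrictions are detected by checking that a particular solution
    of R b = r actually satisfies the restrictions.
    """
    # Particular solution of R b = r (SVD-based, so a singular R is fine).
    b0, *_ = np.linalg.lstsq(R, r, rcond=None)
    rank_R = np.linalg.matrix_rank(R, tol=tol)
    if not np.allclose(R @ b0, r, atol=1e-8):
        raise ValueError("inconsistent restrictions: R b = r has no solution")
    # Orthonormal basis N for the null space of R: every feasible b = b0 + N t.
    _, _, Vt = np.linalg.svd(R)
    N = Vt[rank_R:].T
    # Unrestricted least squares in the reduced coordinates t.
    t, *_ = np.linalg.lstsq(X @ N, y - X @ b0, rcond=None)
    return b0 + N @ t, rank_R
```

The rank of R emerges as a by-product (feature (e)), an infeasible system R b = r is flagged (feature (f)), and a rank-deficient X is acceptable so long as X restricted to the feasible subspace has full column rank (feature (g)).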
Relationships among methods which are shown for the first time in this paper include:

1The use of the method of imposing restrictions suggested in this paper together with the method for computing 3SLS and ZA suggested in this paper (starting the computations with an identity matrix as the estimated disturbance variance-covariance matrix) provide an easily used procedure for imposing linear restrictions across equations on the coefficients used to obtain disturbance variance-covariance estimates in the calculation of 3SLS or ZA estimates. As a result, if sufficient restrictions on coefficients across equations occur, unique 3SLS or ZA estimates may exist even though unique two-stage least squares (2SLS) and DLS estimates do not exist.

(1) For the special case of a system of equations in which only one jointly dependent variable occurs in each equation, the following computational procedures lead to the same coefficients:

(a) FIML.
(b) Iteratively applying ZA.1
(c) The Telser method of iteratively estimating each equation by DLS.2

(2) For the general case in which more than one jointly dependent variable is permitted in each equation, the following computational procedures lead to the same coefficients:

(a) FIML.
(b) An estimation procedure in which limited information single equation maximum likelihood (LIML) estimation is used iteratively to estimate the coefficients of each equation.3

(3) For the general case in which more than one jointly dependent variable is permitted per equation and at least one equation in the system is over-identified, the following computational procedures do not lead to the same coefficients:

(a) FIML.
(b) Iterative 3SLS (I3SLS).

1The Zellner-Aitken (ZA) estimator is given in Zellner [1962] and in Chapter VII of this paper.

2The Telser estimator is given in Telser [1964] and in section VII.E of this paper.
3The iterative LIML computational procedure (ILIML) was proposed to the writer by Professor Herman Rubin in 1963 and the relationship of the method to FIML was shown to the writer by Professor Rubin at that time. Since DLS may be regarded as a particular case of LIML, the IDLS (Telser) method may be regarded as a particular case of the ILIML method.

In the derivation of the likelihood function for a system of equations for the application of FIML (chapter V) or for a subsystem of equations for the application of SML (chapter VI), identity equations are explicitly carried. It is shown in this paper that the identity equations need not be solved out to express the likelihood or to apply the FIML and SML estimation procedures.1

Detailed user descriptions have been developed for the AES STAT system except for the simultaneous stochastic linear equations portion. It is intended that this paper will serve as a basic reference to the computational procedures used in the simultaneous stochastic linear equations portion and that detailed user descriptions will soon be written for this portion as well.2

1Rothenberg and Leenders [1964] have already shown that it is unnecessary to use identity equations to eliminate jointly dependent variables for FIML; however, a slightly different approach to showing this is taken in this paper. Professor Herman Rubin informed the writer that it is unnecessary to solve out identity equations for SML; however, the writer is not aware of any reference to this in the literature.

2The form of the parameters to the system (control cards to the system to compute a given problem) and the form of the output is discussed and illustrated in part III of this paper.

The AES STAT system has been programmed on a Control Data Corporation (CDC) 3600 computer.
Although 3600 FORTRAN is the primary language used, extensive use is made of COMPASS assembly language subroutines and features of the DRUM SCOPE executive system; hence, in its present form the AES STAT system is very difficult to convert to another computer system. Installation of a CDC 6500 computer at MSU is planned for late 1968 and the AES STAT system will then be converted to the CDC 6500 computer. In the process of conversion to the CDC 6500 computer, a number of assembly language subroutines will be replaced by FORTRAN subroutines, thereby making conversion to another large scale computer system more feasible. In any event, the computational approaches suggested here, including especially the orthogonalization procedures, involve basic computational procedures which can be programmed for any computer as easily as less accurate but better known procedures.

Since this paper concentrates on computational procedures, properties of the estimators receive only cursory treatment; also many of the proofs are given in algorithmic form; that is, the computational method which is described provides the proof of the property claimed.

B. Some Asymptotic Properties of Estimators

In this paper, most of the properties noted for particular estimators are asymptotic properties rather than small sample properties.1 The estimators are, however, used in estimation in which the number of observations in a given sample is finite and usually fairly small. Although it is hoped that the asymptotic properties mentioned give a guide to the comparable small sample properties, it must be realized that in particular cases a given ranking of estimators based on an asymptotic property may be reversed in samples of the size used in the usual application of the estimator. Also, asymptotic properties of an estimator may give a good guide to properties of that estimator for a sample size of, say, 100 or larger, but a very poor guide for a sample size of, say, 10.

1In this paper, T denotes the number of observations in a sample. We say that an estimator has a given asymptotic property if there exists a sample size T0 such that for all T > T0, the property holds within a given measure of closeness.
Since many readers are not familiar with many of the terms2 used in this paper, we will note some distinctions. In what follows, let θ denote a parameter of a probability distribution and θ̂ an estimator of θ.1 Let T denote sample size. θ is independent of T whereas θ̂ may not be independent of T. The properties of θ̂ for a given sample size T are those which θ̂ has in repeated samples of sample size T -- not the estimated value of θ̂ for a given sample (e.g., θ̂ may be the direct least squares estimator applied to a coefficient of a given equation--not the particular estimate obtained by applying direct least squares to a given sample). Some distinctions follow:

(1) θ̂ is an unbiased estimator of a parameter θ if Eθ̂ = θ, where E denotes expected value.

(2) θ̂ is an asymptotically unbiased estimator of θ if lim(T→∞) Eθ̂ = θ.

Unbiased implies asymptotically unbiased, since if Eθ̂ = θ for all T, lim(T→∞) Eθ̂ = lim(T→∞) θ = θ.

Asymptotically unbiased does not imply unbiased. For example, let θ = 1 and Eθ̂ = 1 + 1/T. Then lim(T→∞) Eθ̂ = lim(T→∞) (1 + 1/T) = 1, so that θ̂ is asymptotically unbiased. θ̂ is not unbiased, however, since Eθ̂ = 1 + 1/T ≠ 1.

1The properties given here derive from the properties of θ̂ as a random variable. It is more common in the statistical literature to use X_n in place of θ̂ so that the results are not limited to estimators and the location in the sequence is specifically noted. Here n would be T -- the sample size. A more complete notation would be the use of θ̂_T in place of θ̂ to explicitly recognize sample size.

2See Goldberger [1964] for a more extensive treatment of the unbiased, asymptotically unbiased, and consistent properties.
More particularly, θ̂ is an unbiased estimator of a parameter θ ∈ Θ if Eθ̂ = θ for all θ ∈ Θ. For simplicity, in the remainder of this section, we will drop the mention of the class Θ to which θ belongs and the mention of the requirement that the defined property holds for all members of that class.

As T increases, θ̂ converges in probability to θ [i.e., converges stochastically--written plim(T→∞) θ̂ = θ, or equivalently plim(θ̂ - θ) = 0] if for every ε > 0, lim(T→∞) Prob[abs(θ̂ - θ) < ε] = 1, where Prob denotes the probability of the expression within the brackets.1 Another way to say the same thing is that plim(T→∞) θ̂ = θ if for any ε > 0 and η > 0, however small, there is some T'(ε,η) such that for all T > T'(ε,η), Prob[abs(θ̂ - θ) < ε] exceeds 1 - η.2 We can also say that plim(T→∞) θ̂ = θ if the probability distribution for θ̂ collapses about the single point θ as T→∞, i.e., if the mean square deviation of θ̂ from θ goes to zero as T→∞.3

(3) An estimator whose probability limit is a finite parameter (plim(T→∞) θ̂ = θ) is said to be a consistent estimator of that parameter.

Asymptotic unbiasedness does not imply consistency, since the probability distribution may not converge to a single point in the limit. As an example, suppose that for any T, θ̂ has the distribution:4

Prob(θ̂ = 1) = 1/2
Prob(θ̂ = 2) = 1/2

1abs denotes absolute value.

2Kendall and Stuart [1961], p. 3.

3The mean square deviation of θ̂ from θ is E(θ̂ - θ)² = Var(θ̂) + (Eθ̂ - θ)².

4It is, of course, very unusual to define a distribution of θ̂ not containing θ as a parameter; however, using such distributions as examples permits construction of exceedingly simple examples.

Then Eθ̂ = (1/2)(2) + (1/2)(1) = 3/2 and lim(T→∞) Eθ̂ = 3/2, so that if θ = 3/2 then θ̂ is both unbiased and asymptotically unbiased. θ̂ is not consistent since plim θ̂ ≠ 3/2.
(The distribution does not concentrate on the point 3/2 as T→∞; in fact, in this example there is zero probability that θ̂ = 3/2 even in an infinitely large sample.)

Consistency does not imply asymptotic unbiasedness. For example, let θ̂ have the distribution:1

Prob(θ̂ = 0) = (T - 1)/T
Prob(θ̂ = T²) = 1/T

Then, since θ̂ concentrates at 0 as T becomes large, if θ = 0, then θ̂ is a consistent estimator. On the other hand, θ̂ cannot be an asymptotically unbiased estimator of any θ since Eθ̂ = 0·[(T - 1)/T] + T²·[1/T] = T.

1This example was suggested by Professor Kenneth J. Arnold.

In discussing asymptotic properties it is often useful to use the "big O" and "little o" notation to give a magnitude or speed of convergence. In using this notation, it is important to distinguish between whether the magnitude is related to θ̂ - θ (and therefore is an order of magnitude of consistency) or whether the magnitude is related to Eθ̂ - θ (and therefore is an order of magnitude of asymptotic unbiasedness).

(4) Let f(T) denote a positive valued function of T such as 1/T or 1/T². Then θ̂ - θ is Op(f(T)) [which is read θ̂ - θ is "big O" of f(T) in probability], or (equivalently) θ̂ - θ is of probability order f(T), or (equivalently) θ̂ - θ is of the same order of magnitude as f(T) in probability as T→∞, if any of the following equivalent conditions hold:1

(a) There exists a positive c independent of T such that plim(T→∞)[(1/f(T))·abs(θ̂ - θ)] ≤ c. (c can be a very large number and still be independent of T.)

(b) For any δ > 0, however small, there exists a positive constant c(δ) independent of T such that lim(T→∞) Prob[(1/f(T))·abs(θ̂ - θ) ≤ c(δ)] ≥ 1 - δ, or (equivalently) lim(T→∞) Prob[abs(θ̂ - θ) ≤ c(δ)·f(T)] ≥ 1 - δ.

(c) There exists a positive constant c independent of T such that for any positive η, however small, there exists T'(η) such that for every T > T'(η), Prob[(1/f(T))·abs(θ̂ - θ) < c] > 1 - η.
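The consistent-but-not-asymptotically-unbiased counterexample above lends itself to a quick numerical check. The following sketch (an illustration in modern code, not part of the thesis) samples from the two-point distribution Prob(θ̂ = 0) = (T - 1)/T, Prob(θ̂ = T²) = 1/T; the fraction of draws near zero approaches 1 while the sample average of θ̂ tracks T, mirroring plim θ̂ = 0 and Eθ̂ = T.

```python
import numpy as np

rng = np.random.default_rng(0)

def theta_hat(T, size):
    # Draws from the two-point distribution used in the text:
    # theta_hat = 0 with probability (T - 1)/T, and T**2 with probability 1/T.
    return np.where(rng.random(size) < 1.0 / T, float(T) ** 2, 0.0)

for T in (10, 100, 1000):
    draws = theta_hat(T, 200_000)
    # Prob(|theta_hat - 0| < eps) -> 1, so theta_hat is consistent for theta = 0,
    # yet E[theta_hat] = T**2 * (1/T) = T diverges: not asymptotically unbiased.
    coverage = np.mean(np.abs(draws) < 0.5)
    print(T, round(coverage, 3), round(draws.mean(), 1))
```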
    ¹Some of the "big O" and "little o" conditions which follow are given in Mann and Wald [1943b].

Most commonly f(T) will be T^(−1/2), T^(−1), T^(−3/2), or T^(−2) (i.e., 1/√T, 1/T, 1/T^(3/2), or 1/T²).

(5) θ̂ − θ is o_p(f(T)) (which is read "θ̂ − θ is little o of f(T) in probability"), or (equivalently) θ̂ − θ is of probability order smaller than f(T), or (equivalently) θ̂ − θ is of a smaller order of magnitude than f(T) in probability as T→∞, if any of the following equivalent conditions hold:

    (a) plim_{T→∞}[(1/f(T))·abs(θ̂ − θ)] = 0.

    (b) For any positive ε and η, however small, there exists T'_{ε,η} such that for every T > T'_{ε,η},
            Prob[(1/f(T))·abs(θ̂ − θ) < ε] > 1 − η.

θ̂ − θ is o_p(f(T)) implies that θ̂ − θ is O_p(f(T)), but the reverse implication does not necessarily hold.

(6) To define the order of magnitude of Eθ̂ rather than θ̂, merely replace θ̂ by Eθ̂ in (4) and (5) above. For example: Eθ̂ − θ is O_p(f(T)) [i.e., Eθ̂ − θ is of the same order of magnitude as f(T) in probability as T→∞] if there exists a positive c independent of T such that
    plim_{T→∞}[(1/f(T))·abs(Eθ̂ − θ)] ≤ c.

The order of magnitude of f(T) gives the order of magnitude of the stated convergence. Thus, if θ̂ − θ is O_p(T^(−2)) then θ̂ − θ is O_p(T^(−3/2)), O_p(T^(−1)), O_p(T^(−1/2)), etc., and θ̂ is consistent. (θ̂ is consistent if θ̂ − θ is O_p(T^(−n)) with n > 0.) On the other hand, θ̂ may be consistent but θ̂ − θ not O_p(T^(−1/2)), or θ̂ − θ may be O_p(T^(−1/2)) but not O_p(T^(−2)). Also, the order of magnitude of θ̂ − θ does not imply anything regarding the order of asymptotic unbiasedness except for specified distributions.

Similarly, if Eθ̂ − θ is O_p(T^(−2)) then Eθ̂ − θ is O_p(T^(−3/2)), O_p(T^(−1)), O_p(T^(−1/2)), etc., and θ̂ is asymptotically unbiased. (θ̂ is asymptotically unbiased if Eθ̂ − θ is O_p(T^(−n)) with n > 0.) On the other hand, θ̂ may be asymptotically unbiased but Eθ̂ − θ not O_p(T^(−1/2)), or Eθ̂ − θ may be O_p(T^(−1/2)) but not O_p(T^(−2)).
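A concrete (and, again, invented) illustration of probability order: the sample mean of T iid Uniform(0, 1) draws deviates from its mean θ = 1/2 at rate O_p(T^(−1/2)), so the scaled deviation √T·abs(θ̂ − θ) remains bounded in probability as T grows.

```python
import numpy as np

# Sketch (ours): theta_hat = mean of T Uniform(0,1) draws, theta = 1/2.
# Then theta_hat - theta is O_p(T^{-1/2}): the scaled deviations
# sqrt(T) * |theta_hat - theta| stay bounded in probability, their 0.95
# quantile settling near 1.96 * sigma with sigma = 1/sqrt(12) ~ 0.57.
rng = np.random.default_rng(1)
theta = 0.5

quantiles = {}
for T in (100, 10_000, 100_000):
    # 200 independent replications of the estimator at sample size T.
    reps = np.array([rng.random(T).mean() for _ in range(200)]) - theta
    quantiles[T] = np.quantile(np.sqrt(T) * np.abs(reps), 0.95)
    print(T, round(quantiles[T], 3))
```

The printed quantiles do not grow with T, which is exactly the boundedness-in-probability condition (a) above with f(T) = T^(−1/2).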
Also, the order of magnitude of Eθ̂ − θ does not imply anything regarding the order of magnitude of consistency except for specified distributions.

The above properties are some of the properties referred to in this paper and do not, of course, include many properties of estimators which are important in estimation. The emphasis on asymptotic properties is dictated by the present state of our knowledge of the small-sample properties of simultaneous stochastic equations estimators.

In the remainder of this paper, plim_{T→∞} A will be shortened to plim A, i.e., the T→∞ will be understood.

C. Basic Model

1. System of structural equations

In part I, the estimation of a single equation from a complete system of equations will be considered. The complete system, which consists of M stochastic equations and G − M identity equations (G − M may be zero), may be expressed as:¹,²,³

    (I.1)    Y Γ' + X B' + [U ⋮ 0] = 0
            T×G G×G  T×A A×G  T×M T×(G−M)  T×G

or

    (I.2)    Z α' + [U ⋮ 0] = 0
            T×(G+A) (G+A)×G  T×G  T×G

The same model is also often written in transposed form as:

    (I.3)    Γ Y'_[t] + B X'_[t] + [U_[t] ⋮ 0]' = 0'_[t]    (t = 1,...,T)
            G×G G×1   G×A A×1    M×1 over (G−M)×1   G×1

or

    (I.4)    α Z'_[t] + [U_[t] ⋮ 0]' = 0'_[t]    (t = 1,...,T)
            G×(G+A) (G+A)×1   M×1 over (G−M)×1   G×1

where [t] denotes the tth observation and:

Y   is the T×G matrix of T sample observations taken on the G jointly dependent variables in the system.

    ¹The dimensions of each matrix are listed below the matrix in many of the matrix equations given in this paper. In this paper, ' (prime) denotes transpose.
    ²An identity equation is an equation containing known coefficients and no disturbance (i.e., a disturbance vector of all zeros).
    ³The notation used in this paper was designed to meet the following requirements: 1) It should be consistent with the notation commonly used for direct least squares; in particular, the signs of the coefficients should not have to be reversed to make them comparable to direct least squares coefficients.
X   is the T×A matrix of T sample observations taken on the A predetermined variables in the system.

Z   = [Y ⋮ X] is the T×(G+A) matrix of T sample observations taken on all G + A variables in the system.

U   is the T×M matrix containing the T unobserved structural disturbances for each of the M equations containing disturbances.

Γ   is the G×G matrix of population coefficients of the G jointly dependent variables. Each row of Γ (or column of Γ') contains the population coefficients corresponding to a particular equation, and each column of Γ (or row of Γ') contains the population coefficients corresponding to a particular jointly dependent variable.

    2) The coefficients for a structural equation are expressed as a row of the coefficient matrix for the system. Similarly, the coefficients for a reduced form equation are expressed as a row of the coefficient matrix for the reduced form.
    3) An observation is a row in an observation matrix.
    4) Identity equations are explicitly recognized in the notation.
    5) Within the above limitations, the notation should be patterned after that of Theil [1961] and Zellner and Theil [1962], since this appears to be the most prevalently used simultaneous stochastic equations notation now appearing in the literature.

B   is the G×A matrix of population coefficients of the A predetermined variables. Each row of B (or column of B') contains the population coefficients corresponding to a particular equation, and each column of B (or row of B') contains the population coefficients corresponding to a particular predetermined variable.

α   = [Γ ⋮ B] is the G×(G+A) matrix of population coefficients of all G equations for all of the G + A variables in the system.

The term "jointly dependent" will be treated as synonymous with the term "endogenous". Jointly dependent variables are random variables assumed to be contemporaneously correlated with the disturbances. They are assumed to be generated within (endogenous to) the system of equations.
Predetermined variables are the remaining variables in the system. They are either (1) exogenous variables, i.e., variables which are assumed to be generated outside the system of equations and therefore independent of the disturbances, or (2) lagged values of jointly dependent variables which, due to their lag, are contemporaneously independent of the disturbances.¹ In some parts of this paper the predetermined variables will be assumed to consist of "fixed" or non-stochastic variables only. In these cases, the set of predetermined variables must be restricted to exogenous variables only, since lagged values of jointly dependent variables are stochastic and not "fixed".

    ¹Contemporaneously independent is defined in statistical assumption (3) of section I.C.3.

Klein's model I of the United States economy will be used to illustrate the above notation and some subsequent notation which will be introduced. Klein's model I is a complete system of equations (the number of jointly dependent variables equals the number of equations),¹,² which may be written as an 8-equation system, the first 3 equations containing disturbances and the remaining 5 equations consisting of identity equations. These equations are:³

    (I.5a) Consumption:      C  = α₀^[1] + α₁^[1] P + α₂^[1] W + α₃^[1] P₋₁ + u₁
    (I.5b) Investment:       I  = α₀^[2] + α₁^[2] P + α₂^[2] P₋₁ + α₃^[2] K₋₁ + u₂
    (I.5c) Private wage:     W₁ = α₀^[3] + α₁^[3] E + α₂^[3] E₋₁ + α₃^[3] t + u₃
    (I.5d) Product:          Y + R = C + I + G
    (I.5e) Income:           Y = P + W
    (I.5f) Capital:          K = K₋₁ + I
    (I.5g) Wages:            W = W₁ + W₂
    (I.5h) Private product:  E = Y + R − W₂

    ¹Also implied by the term "complete system" is the recognition that no lagged jointly dependent variable occurs in the system without the corresponding (non-lagged) jointly dependent variable also occurring in the system.
    ³Although the notation has been changed slightly, the explanation of each equation has been taken almost verbatim from Goldberger [1964], pp. 303-304.
    ²Klein's model I is given in Klein [1950].

The first equation is a consumption function which describes consumption (C) linearly in terms of profits (P), profits lagged one year (P₋₁), and the total wage bill (W). The second equation is an investment equation describing net investment (I) linearly in terms of profits, lagged profits, and capital stock at the beginning of the year (K₋₁). The third equation is a demand-for-labor equation which describes the private wage bill (W₁) linearly in terms of private product (E), private product lagged by one year (E₋₁), and time (t) measured in calendar years. The five identity equations complete the system. The additional variables in the identity equations are national income (Y), indirect taxes (R), government expenditure on goods and services (G), capital stock at the end of the year (K), the private wage bill (W₁), and the government wage bill (W₂).

The variables C, P, W, I, W₁, E, Y, and K are designated as jointly dependent within the system, and the variables P₋₁, K₋₁, E₋₁, t, R, G, and W₂ are designated as predetermined to the system.¹ It is convenient to consider one additional predetermined variable, X₀, a variable which assumes the value 1.0 for all observations. Thus, X₀ is the variable whose coefficient is α₀^[1] in the first equation, α₀^[2] in the second equation, and α₀^[3] in the third equation.

    ¹G is also used in this paper to denote the number of jointly dependent variables in the system, and K is also used to denote the number of instrumental variables in the X_I matrix (a matrix which is defined further on); however, the particular uses of G and K should be clear from their contexts.

As a digression, we will note why certain of the variables defined by identity equations are considered to be jointly dependent rather than predetermined. As indicated in the wages equation, W = W₁ + W₂. In formulating the model, W₁ was designated as jointly dependent and W₂ as predetermined.
Since W is composed of one part (W₁) not contemporaneously independent of the disturbance, W is also not contemporaneously independent of the disturbance and must be designated jointly dependent. Similarly for K, since K = K₋₁ + I with I designated as jointly dependent. Similarly also for E, since E = Y + R − W₂ and Y is jointly dependent.

If the annual data from 1921 through 1941 are used in the model, T = 21. Also, given the above designation of variables, G = 8, A = 8, and M = 3. Regarding each of the variables such as C as a 21×1 vector of observed values, the matrices of equations (I.1) and (I.2) may be defined as:

    (I.6)    Z = [Y ⋮ X] = [C, P, W, I, W₁, E, Y, K ⋮ X₀, P₋₁, K₋₁, E₋₁, t, R, G, W₂]
                            \___ jointly dependent ___/ \______ predetermined ______/
                                      21×8                          21×8

    (I.7)    α = [Γ ⋮ B] =
             8×16

                  C      P      W      I     W₁     E      Y      K      X₀     P₋₁    K₋₁    E₋₁    t      R      G      W₂
      Eq. 1 [ −1     α₁^[1] α₂^[1]  0      0      0      0      0   ⋮  α₀^[1] α₃^[1]  0      0      0      0      0      0  ] Eq. 1
      Eq. 2 [  0     α₁^[2]  0     −1      0      0      0      0   ⋮  α₀^[2] α₂^[2] α₃^[2]  0      0      0      0      0  ] Eq. 2
      Eq. 3 [  0      0      0      0     −1     α₁^[3]  0      0   ⋮  α₀^[3]  0      0     α₂^[3] α₃^[3]  0      0      0  ] Eq. 3
      Eq. 4 [  1      0      0      1      0      0     −1      0   ⋮   0      0      0      0      0     −1      1      0  ] Eq. 4
      Eq. 5 [  0      1      1      0      0      0     −1      0   ⋮   0      0      0      0      0      0      0      0  ] Eq. 5
      Eq. 6 [  0      0      0      1      0      0      0     −1   ⋮   0      0      1      0      0      0      0      0  ] Eq. 6
      Eq. 7 [  0      0     −1      0      1      0      0      0   ⋮   0      0      0      0      0      0      0      1  ] Eq. 7
      Eq. 8 [  0      0      0      0      0     −1      1      0   ⋮   0      0      0      0      0      1      0     −1  ] Eq. 8

The part within the brackets constitutes the α matrix. The equation number corresponding to each row is given on each side of the matrix, and the variable corresponding to each column is given above the matrix.

Notice that in the α matrix, a coefficient of −1 has been assigned to any variable listed to the left of the equality sign in equations (I.5a) through (I.5h). This is equivalent to assigning a coefficient of 1 in the equation and then transcribing the coefficient and variable to the same side of the equality sign as the remaining variables.

The coefficients of any equation may be multiplied by a scalar without changing the meaning of the equation.
To avoid this indeterminacy we will follow the normalization rule given above, i.e., in each of the equations the coefficient of one of the jointly dependent variables will be assigned the value −1.¹

    ¹Many normalization rules have been used in the past to avoid indeterminacy in each equation. One way is to specify that a coefficient corresponding to one jointly dependent variable in each equation be set to 1. A second way is to normalize each equation such that the resulting estimated disturbance variance-covariance matrix has 1's for diagonal elements. For our purpose here, however, we will find it most convenient to select a variable in each equation as the normalizing variable and set its coefficient to −1. If −1 is used as the normalizing coefficient, the resulting coefficients may be compared directly with coefficients from methods such as direct least squares (DLS) or two-stage least squares (2SLS); however, if 1 is used as the normalizing coefficient, the sign of each coefficient must be reversed before comparing the coefficient with a comparable coefficient from DLS or 2SLS. Since it is as easy to use −1 for normalizing as to use 1, it seems highly desirable to do so and avoid the reflection in coefficient signs.

Of all of the estimation procedures discussed in this paper, only in the case of limited information single equation maximum likelihood (LIML), limited information subsystem maximum likelihood (SML), linearized maximum likelihood (LML), and full information maximum likelihood (FIML) will it make no substantive difference in the estimated coefficients which jointly dependent variable is chosen as the normalizing variable. For the remainder of the procedures, selection of a different normalizing variable will make a substantive change in the resulting estimated coefficients.¹

Only coefficients of the form α_j^[μ] must be estimated. All of the remaining coefficients in the α matrix are assumed known.
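The structure of the α = [Γ ⋮ B] matrix can be assembled mechanically from equations (I.5a) through (I.5h). The sketch below is our own (the column ordering and the placeholder value 0.1 standing in for every unknown structural coefficient are assumptions, not estimates): it enters −1 for each normalizing variable and the known ±1 coefficients of the identities.

```python
import numpy as np

# One consistent layout of the 8x16 matrix alpha = [Gamma : B] implied by
# (I.5a)-(I.5h).  Column order and the placeholder 0.1 are our assumptions.
dep = ["C", "P", "W", "I", "W1", "E", "Y", "K"]          # jointly dependent
pre = ["X0", "P-1", "K-1", "E-1", "t", "R", "G", "W2"]   # predetermined
col = {v: j for j, v in enumerate(dep + pre)}

alpha = np.zeros((8, 16))

def fill(eq, coefs):
    # Enter the coefficients of one structural equation into row `eq`.
    for var, val in coefs.items():
        alpha[eq, col[var]] = val

a = 0.1  # placeholder for each unknown alpha coefficient
fill(0, {"C": -1, "P": a, "W": a, "X0": a, "P-1": a})     # (I.5a) consumption
fill(1, {"I": -1, "P": a, "X0": a, "P-1": a, "K-1": a})   # (I.5b) investment
fill(2, {"W1": -1, "E": a, "X0": a, "E-1": a, "t": a})    # (I.5c) private wage
fill(3, {"Y": -1, "C": 1, "I": 1, "R": -1, "G": 1})       # (I.5d) product
fill(4, {"Y": -1, "P": 1, "W": 1})                        # (I.5e) income
fill(5, {"K": -1, "I": 1, "K-1": 1})                      # (I.5f) capital
fill(6, {"W": -1, "W1": 1, "W2": 1})                      # (I.5g) wages
fill(7, {"E": -1, "Y": 1, "R": 1, "W2": -1})              # (I.5h) private product

Gamma, B = alpha[:, :8], alpha[:, 8:]
# Every row carries exactly one -1, on its normalizing jointly dependent
# variable, and the identity rows (4-8) contain only known 0 / +-1 entries.
```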
    ¹In the special case of a just-identified equation (see section II.D), it will also make no substantive difference for many methods, since the same estimated coefficients are obtained by many of the single equation procedures as are obtained for LIML.

2. Single structural equation

In part I, estimation of only a single equation from among the M stochastic equations will be considered, although for many of the procedures some of the information contained in the remaining M − 1 stochastic equations and the G − M identity equations will be taken into account by the computational method. For most of the "single equation" estimating methods which are considered, account is taken of the structure of the particular equation to be estimated plus additional "instrumental variables." (The predetermined variables in the entire system, including predetermined variables in identity equations, usually comprise the instrumental variables.) No account is taken of jointly dependent variables inside the system of equations but outside the equation being estimated, or of which particular equations the instrumental variables occur in (if the instrumental variables are predetermined variables in the system).

For ease of expressing computational formulas, some additional notation regarding a single equation from the system of equations will be recorded. The μth equation may be written separately as:

    (I.8)    y_μ = Y_μ γ_μ + X_μ β_μ + u_μ
            T×1   T×m_μ m_μ×1   T×L_μ L_μ×1   T×1

or

    (I.9)    y_μ = [Y_μ ⋮ X_μ] [γ_μ' ⋮ β_μ']' + u_μ

or

    (I.10)   y_μ = Z_μ δ_μ + u_μ
            T×1   T×n_μ n_μ×1   T×1

where:

y_μ  is the vector of T sample observations taken on the normalizing jointly dependent variable (i.e., the jointly dependent variable assigned a coefficient of −1) for the μth equation.

Y_μ  is the T×m_μ matrix of T sample observations taken on the remaining m_μ jointly dependent variables in the μth equation.
We will refer to jointly dependent variables other than the normalizing jointly dependent variable as "explanatory" jointly dependent variables.

X_μ  is the T×L_μ matrix of T sample observations taken on the L_μ predetermined variables in the μth equation.

Z_μ  = [Y_μ ⋮ X_μ] is the T×n_μ matrix of T sample observations taken on all n_μ (= m_μ + L_μ) explanatory variables in the equation; that is, on all variables in the equation except the normalizing variable, y_μ.

u_μ  is the vector of T unobserved structural disturbances.

γ_μ  is the m_μ×1 vector of population coefficients of the m_μ explanatory jointly dependent variables in the μth equation. (The elements of γ_μ may be obtained from the μth row of Γ by deleting the normalizing coefficient and the coefficients which are known to be zero.)

β_μ  is the L_μ×1 vector of population coefficients of the L_μ predetermined variables in the μth equation. (The elements of β_μ are the non-zero elements of the μth row of B.)

δ_μ  = [γ_μ' ⋮ β_μ']' is the n_μ×1 vector of population coefficients of the n_μ (= m_μ + L_μ) explanatory variables in the equation. (The elements of δ_μ may be obtained from the μth row of α by deleting the normalizing coefficient and the coefficients which are known to be zero.)

Sometimes it will be desirable that the normalizing variable be included as part of an observation matrix, or the normalizing coefficient as part of a coefficient vector. To do this the following additional notation is used:

₊Y_μ  = [y_μ ⋮ Y_μ] is the T×(m_μ + 1) matrix of T sample observations taken on all m_μ + 1 jointly dependent variables in the μth equation.

₊Z_μ  = [y_μ ⋮ Z_μ] = [y_μ ⋮ Y_μ ⋮ X_μ] = [₊Y_μ ⋮ X_μ] is the T×(n_μ + 1) matrix of T sample observations taken on all n_μ + 1 variables in the μth equation.

₊γ_μ  = [−1 ⋮ γ_μ']' is the (m_μ + 1)×1 vector of population coefficients of all m_μ + 1 jointly dependent variables in the μth equation.
₊δ_μ  = [₊γ_μ' ⋮ β_μ']' = [−1 ⋮ γ_μ' ⋮ β_μ']' = [−1 ⋮ δ_μ']' is the (n_μ + 1)×1 vector of population coefficients of all n_μ + 1 variables in the μth equation.

Using the above additional notation, the μth equation may also be written as:

    (I.11)   ₊Y_μ ₊γ_μ + X_μ β_μ + u_μ = 0
            T×(m_μ+1) (m_μ+1)×1   T×L_μ L_μ×1   T×1   T×1

or

    (I.12)   ₊Z_μ ₊δ_μ + u_μ = 0
            T×(n_μ+1) (n_μ+1)×1   T×1   T×1

The estimation of a particular equation will often be accomplished by essentially a two-stage procedure in which the jointly dependent variables in the equation are first adjusted by a set of "instrumental variables" (we will refer to a set of instrumental variables as a set of "instruments")¹ which consist of the predetermined variables in the equation plus additional instruments, usually the additional predetermined variables in the system. The coefficients are then estimated from the adjusted jointly dependent variables and the predetermined variables in the equation. We will refer to a matrix of instruments as X_I. (The I denotes instruments.) Thus,

    (I.13)   X_I = [X_μ ⋮ X_μ**]
            T×K   T×L_μ  T×(K−L_μ)

    ¹The use of the term instruments follows Fisher [1965].

Strictly speaking, X_I, X**, and K should be written as X_{I_μ}, X_μ**, and K_μ, respectively, since they may be different for each equation, μ; however, since the equation referred to will be clear from the context, we will simplify the notation by writing X_{I_μ} as X_I, X_μ** as X**, and K_μ as K.¹

During discussion of the single equation techniques (part I), we will drop the subscript μ when such will not be confusing. Thus, y_μ, Y_μ, Z_μ, u_μ, ₊Y_μ, ₊Z_μ, and X_μ** will usually be shortened to y, Y, Z, u, ₊Y, ₊Z, and X**, respectively. Also, γ_μ, β_μ, δ_μ, ₊γ_μ, and ₊δ_μ will usually be shortened to γ, β, δ, ₊γ, and ₊δ, respectively, and m_μ, L_μ, n_μ, and K_μ will usually be shortened to m, L, n, and K, respectively.

The consumption equation (I.5a) of Klein's model I will be used to illustrate the above notation.
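The bookkeeping behind this notation is purely positional, and can be scripted. The sketch below is illustrative only (the data are fabricated; the column names follow the Klein model I listing above): it pulls y₁, Y₁, X₁, Z₁, ₊Z₁, and X_I for the consumption equation out of a 21×16 observation matrix Z by variable name.

```python
import numpy as np

# Hypothetical data with the shapes of Klein's model I: T = 21 observations
# on 8 jointly dependent and 8 predetermined variables (random numbers here).
T = 21
names = ["C", "P", "W", "I", "W1", "E", "Y", "K",
         "X0", "P-1", "K-1", "E-1", "t", "R", "G", "W2"]
rng = np.random.default_rng(2)
Z = rng.random((T, 16))

def columns(*vars):
    # Select named columns of Z, preserving order.
    return Z[:, [names.index(v) for v in vars]]

y1 = columns("C")                       # normalizing variable, T x 1
Y1 = columns("P", "W")                  # explanatory jointly dependent, m1 = 2
X1 = columns("X0", "P-1")               # predetermined in the equation, L1 = 2
Z1 = np.hstack([Y1, X1])                # T x n1 with n1 = m1 + L1 = 4
plusZ1 = np.hstack([y1, Z1])            # +Z1: T x (n1 + 1)
X_I = columns("X0", "P-1", "K-1", "E-1", "t", "R", "G", "W2")  # instruments, K = 8
```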
Since the consumption equation is the first equation, μ will be 1 and:

    y₁ = C,    Y₁ = [P, W],    X₁ = [X₀, P₋₁],    Z₁ = [P, W, X₀, P₋₁],
    ₊Y₁ = [C, P, W],    ₊Z₁ = [C, P, W, X₀, P₋₁],    X** = [K₋₁, E₋₁, t, R, G, W₂].

    ¹In 3SLS estimation X_I is again used to denote a set of instruments, but in this case X_I includes as instruments the predetermined variables in all equations of the subsystem being estimated plus any additional instruments desired. For 3SLS estimation the same X_I matrix is used by all equations in the subsystem being estimated.

If all of the predetermined variables in the system are used as instrumental variables for adjusting the jointly dependent variables, then X_I coincides with X except possibly for a renumbering of variables, i.e.,

    X_I = [X₁ ⋮ X**] = [X₀, P₋₁ ⋮ K₋₁, E₋₁, t, R, G, W₂].

The population coefficient vectors are:

    γ₁ = [α₁^[1], α₂^[1]]',    β₁ = [α₀^[1], α₃^[1]]',    δ₁ = [α₁^[1], α₂^[1], α₀^[1], α₃^[1]]',
    ₊γ₁ = [−1, α₁^[1], α₂^[1]]',    ₊δ₁ = [−1, α₁^[1], α₂^[1], α₀^[1], α₃^[1]]'.

Thus, m₁ = 2, L₁ = 2, n₁ = 4, and K = 8. Equations (I.8) through (I.12) become:

    (I.8')   C = [P, W] [α₁^[1], α₂^[1]]' + [X₀, P₋₁] [α₀^[1], α₃^[1]]' + u₁

    (I.9')   C = [P, W ⋮ X₀, P₋₁] [α₁^[1], α₂^[1], α₀^[1], α₃^[1]]' + u₁

    (I.10')  C = Z₁ δ₁ + u₁

    (I.11')  [C ⋮ P, W] [−1, α₁^[1], α₂^[1]]' + [X₀, P₋₁] [α₀^[1], α₃^[1]]' + u₁ = 0

    (I.12')  [C ⋮ P, W ⋮ X₀, P₋₁] [−1, α₁^[1], α₂^[1], α₀^[1], α₃^[1]]' + u₁ = 0

3. Statistical assumptions

Following are the statistical assumptions which will be made in this paper unless noted otherwise:

(1) If estimating the μth structural equation by a single equation method, the μth equation is identifiable by the a priori restrictions on the values of the coefficients in the equation.¹ If estimating by a multiple equations method, all equations in the subset of equations being estimated are identifiable by the a priori restrictions on the values of the coefficients in the subsystem.²

    ¹See Goldberger [1964], pp. 306-318, Johnston [1963], pp. 240-252, or Koopmans and Hood [1953], pp. 135-142 for a discussion of identification.
    ²In addition to the usual order condition imposed on a single equation (i.e., the usual counting rule K_μ** ≥ m_μ given below), the assumption here is that observationally equivalent structures for the subsystem being estimated do not occur. Johnston [1963], p. 252 states: "If K_μ** ≥ m_μ, the parameters of a relation are identifiable. Our practical estimation procedure may then be influenced by whether K_μ** = m_μ or K_μ** > m_μ. In the former case rk Π̂_Δ** will, apart from a freakish statistical accident, be equal to m_μ." (Johnston used K** in place of K_μ**, GΔ − 1 in place of m_μ, and ρ in place of rk.) Unfortunately, Johnston's statement seems stronger than is warranted. rk Π̂_Δ** < m_μ has probably occurred much more often than is generally realized, but has not been apparent when estimating by single equation methods; however, such a situation is more likely to become apparent when estimating by multiple equation techniques. Koopmans, Rubin, and Leipnik [1950], pp. 78-80 present an additional examination (in addition to the usual counting rule for a single equation) which may be readily performed to detect observationally equivalent structures.

In the early chapters on single equation methods (chapters II and III) only a priori restrictions that certain coefficients are zero are used; therefore, during this part, identifiability implies n_μ ≤ rk X ≤ A and that the ₊Z_μ matrix has full column rank.¹ (rk X is used to denote the rank of X.) In the last chapter on single equation methods (chapter IV), more general linear restrictions are considered. For that chapter, n_μ may be greater than A and ₊Z_μ need not have full column rank. Except for showing some relationships, in no part of this paper will the usual assumption that X has full column rank (i.e., the assumption that rk X = A) be made, since the computational procedures given in this paper automatically handle the more general situation of rk X < A.

(2) The T×M matrix of disturbances of the first M equations,

    U = [u₁ ... u_M], with tth row U_[t] = [U_t1 ... U_tM],
= u ...u [T] _Tl TM has a multivariate distribution with 60 = 0, dUfi U = 2 t] [t] I g I x for all t and auEt]U[t'] O for t # t , 2 being an MIM positive semi-definite matrix. Thus, 2 is the population variance-covariance matrix of disturbances. When estimating S A is equivalent to m S K** since n = m +-L and A = L K$* . (Goldberger [19643, ppH 306-318 anh JoflhstonPEI963], pp. dfio-zsfi use the latter order condition.) S rk X is imposed since we are permitting X to have less thaanull column rank in ‘ this paper. rk.X S A always holds since the rank of any matrix is less than or equal to the number of columns (and rows) in the matrix. (3) (4) 33 a subset of structural equations (including the entire set of equations) by a multiple equations technique, the stronger assumption that 2 is positive definite will be required so that the determinant of 2 will be greater than zero and the inverse of 2 will exist. The restriction 6Ufit]U[t] = 2 for all t implies that the disturbance variance-covariance matrix is constant for all observations. The restriction 6Uft]U[t,] = O for t # t' implies that there exists no serial correlation between observations of the disturbance elements. The nth diagonal element of 2 will be denoted Oi , i.e., Var u“ = ofi . For notational simplicity of will be written simply as 02 during discussion of the single equation methods. The above assumptions regarding U imply that under general conditions plim(l/T)U'U = 2.1 The TXA matrix of predetermined variables, X, has the pro- perty plim(l/T)X'X = 6(1/T)X'X = QXX’ a finite positive semi-definite matrix. Also, the variables in U are con- temporaneously independent of the variables in X; that is, U[t] is statistically independent of X[t.] for all t, t' with t 2 t'.2 This implies that plim(l/T)X'U = 0.3 det T $ 0, hence F51 exists. (det denotes determinant.) plim. T—O 1Goldberger [1964], p. 300. As noted earlier plim denotes 2This assumption holds by definition when X includes only exogenous variables. 
    It allows inclusion of lagged jointly dependent variables in X provided there is no serial correlation in the disturbances.
    ³See Christ [1966], pp. 377, 378 and footnote 70 of p. 439.

(5) In some of the computational methods, jointly dependent variables will be adjusted by a T×K matrix of instrumental variables which we will denote X_I (the subscript I denoting instruments). These instrumental variables will be assumed to have essentially the same properties as the predetermined variables in the system. (Usually X_I consists of the set of predetermined variables in the system.) In particular, we will assume that plim(1/T)X_I'U = 0 (where 0 is a K×M matrix) and that plim(1/T)X_I'X_I = E(1/T)X_I'X_I = Q_{X_I X_I}, a K×K positive semi-definite matrix. (We will not, in general, assume that X_I has full column rank.) We will also assume that plim(1/T)X_I'X = E(1/T)X_I'X = Q_{X_I X}, a K×A matrix.

4. Reduced form equations

Often after estimating the coefficients of the structural equations in a system, it is desired that these be used for predictive purposes; however, (1) multiple jointly dependent variables may occur in an equation (thereby requiring that values be assumed for the "explanatory" jointly dependent variables as well as the predetermined variables), (2) it may not be obvious which equation to use in the prediction of a particular jointly dependent variable, and (3) a given equation will not reflect the repercussions of the assumed levels of all predetermined variables in the system. As a result, it is often desired that the structural equations be solved for a set of "reduced form" equations, each reduced form equation containing one jointly dependent variable and all of the predetermined variables in the system (except that the a priori restrictions on the structural equations may imply that certain of the coefficients of the reduced form equations are zero, also).
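The passage from structural to reduced form, derived in what follows, can be previewed numerically. The two-equation system below is invented purely for illustration (one stochastic equation, one identity; all coefficients are made up and do not come from Klein's model):

```python
import numpy as np

# Invented system (G = 2, M = 1, A = 2), normalized with -1 on the left-hand
# variable of each equation:
#   stochastic: -y1 + 0.5*y2 + 1.0*x1 + u1 = 0
#   identity:    y1 - 1.0*y2 + 1.0*x2      = 0
Gamma = np.array([[-1.0, 0.5],
                  [ 1.0, -1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Sigma = np.array([[2.0]])            # Var(u1); the identity has no disturbance

Gamma_inv = np.linalg.inv(Gamma)     # det(Gamma) = 0.5 != 0, so this exists
Pi = -Gamma_inv @ B                  # reduced form coefficients Pi = -Gamma^{-1} B

# Reduced form disturbance covariance: Gamma^{-1} [[Sigma, 0], [0, 0]] Gamma^{-1}'
padded = np.zeros((2, 2))
padded[:1, :1] = Sigma
Omega = Gamma_inv @ padded @ Gamma_inv.T

# Solving the two equations by hand gives y1 = 2*x1 + x2 + 2*u1 and
# y2 = 2*x1 + 2*x2 + 2*u1, so Pi should equal [[2, 1], [2, 2]] and Omega
# should equal 4 * Var(u1) = 8 in every cell.
```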
An additional reason for calculating reduced form coefficients comes from the calculation of elasticities. Certain direct elasticities between variables should be based on the coefficients of the structural equations; however, in many cases, when specifying elasticities between two variables, the relationship between these variables after taking account of all repercussions in the system is desired. Such elasticities should be based on the reduced form coefficients.

The reduced form equations may be derived by premultiplying (I.3) by Γ⁻¹ or by postmultiplying (I.1) by (Γ⁻¹)'. Premultiplying (I.3) by Γ⁻¹ we have:

    (I.14)   Γ⁻¹ Γ Y'_[t] + Γ⁻¹ B X'_[t] + Γ⁻¹ [U_[t] ⋮ 0]' = 0
            G×G G×G G×1   G×G G×A A×1   G×G G×1   G×1

or

    (I.15)   Y'_[t] = −Γ⁻¹ B X'_[t] − Γ⁻¹ [U_[t] ⋮ 0]'
            G×1   G×G G×A A×1   G×G G×1

or

    (I.16)   Y'_[t] = Π X'_[t] + V'_[t]
            G×1   G×A A×1   G×1

In terms of the entire observation matrices for Y, X, and V, (I.16) may also be written as:

    (I.17)   Y' = Π X' + V'
            G×T   G×A A×T   G×T

or

    (I.18)   Y = X Π' + V
            T×G   T×A A×G   T×G

where in (I.16) through (I.18)

    (I.19)   Π = −Γ⁻¹ B

is the G×A matrix of coefficients of the reduced form equations, in which each row of Π gives the coefficients corresponding to the predetermined variables of a single reduced form equation. Each reduced form equation contains only one jointly dependent variable, and this variable is written to the left of the equality sign. Also,

    V = −[U ⋮ 0](Γ⁻¹)' = [v₁ ... v_G], with tth row V_[t] = [V_t1 ... V_tG],

is the T×G matrix of reduced form disturbances.

Some of the statistical characteristics of the reduced form matrices which follow from their relationships to the structural equations and the statistical assumptions regarding the structural equations are:

    (I.20)   EV = E[−U ⋮ 0](Γ⁻¹)' = [−EU ⋮ 0](Γ⁻¹)' = [0 ⋮ 0](Γ⁻¹)' = 0

    (I.21)   EV'_[t]V_[t'] = E (Γ⁻¹)[U_[t] ⋮ 0]'[U_[t'] ⋮ 0](Γ⁻¹)'
                           = (Γ⁻¹) [ EU'_[t]U_[t']   0 ] (Γ⁻¹)'
                                   [       0          0 ]
Thus,

    (I.22)   EV'_[t]V_[t] = (Γ⁻¹) [ Σ  0 ] (Γ⁻¹)'  ≝  Ω,
                                  [ 0  0 ]

a G×G positive semi-definite matrix which is fixed for all t (≝ denotes that we are defining Ω as this matrix), and

    (I.23)   EV'_[t]V_[t'] = (Γ⁻¹) [ 0  0 ] (Γ⁻¹)' = 0    for t ≠ t'.
                                   [ 0  0 ]

Also, under general conditions,

    (I.24)   plim(1/T)V'V = Ω

    (I.25)   plim(1/T)X'V = −plim(1/T)X'[U ⋮ 0](Γ⁻¹)' = −[plim(1/T)X'U ⋮ 0](Γ⁻¹)' = [0 ⋮ 0](Γ⁻¹)' = 0

    (I.26)   plim(1/T)X_I'V = −plim(1/T)X_I'[U ⋮ 0](Γ⁻¹)' = [0 ⋮ 0](Γ⁻¹)' = 0

D. Orthogonality Relationships

1. Notation expressing orthogonality relationships

We can shorten the expression of many formulas which are given in this paper, while giving a better idea of the computational use of these formulas, by introducing some additional notation at the outset.

Two variables are said to be orthogonal if their sum of cross-products is zero; that is, z₁ is orthogonal to z₂ if z₁'z₂ = 0. The extension of the concept of orthogonality to matrices whose columns are variables is quite straightforward. Let Z₁ be a T×N₁ matrix of variables and Z₂ be a T×N₂ matrix of variables. Then the columns of Z₁ are said to be orthogonal to the columns of Z₂ if the matrix Z₁'Z₂ = 0, where 0 is an N₁×N₂ matrix. (The ijth element of the matrix 0 is the sum of cross-products of the ith variable of Z₁ and the jth variable of Z₂. The ijth element is, of course, zero since the entire matrix of sums of cross-products is zero.) In this paper each column in a matrix of variables is a variable; thus, for this paper we will shorten our definition of orthogonality between the variables in two matrices to: Z₁ is orthogonal to Z₂ if Z₁'Z₂ = 0.

It is often convenient to divide a vector into two components: (1) the part of the vector within the space spanned by (the variables of) another matrix and (2) the part of the vector orthogonal to the other matrix (i.e., outside the space spanned
by the other matrix). Thus, given a T×1 vector, y, and a T×N₁ matrix, X, y may be separated into:

    (I.27)   y = y_‖X + y_⊥X
            T×1   T×1   T×1

where:

y_‖X  is the part of y which is in the space spanned by X,¹

y_⊥X  is the part of y which is orthogonal to X.² (y_⊥X is the part of y outside the space spanned by the variables in X.)

    ¹y_‖X is the projection of y onto the space spanned by the variables in X.
    ²y_⊥X is the projection of y onto the space orthogonal to X.

The ⊥ (perpendicular) symbol is used to denote orthogonality, because if two vectors, y and x, are geometrically at right angles to each other, then y'x = 0 (i.e., the vectors are orthogonal). The use of the ‖ (parallel) symbol to represent "within the space spanned by" may be justified by a similar geometrical argument.

Extension of this notation to matrices of variables is straightforward. Thus, a T×G matrix of variables Y (Y = [y₁ ... y_G]) may be partitioned as:

    (I.28)   Y = Y_‖X + Y_⊥X
            T×G   T×G   T×G

where:

Y_‖X  is the part of Y in the space spanned by X. Y_‖X = {[y₁]_‖X ... [y_G]_‖X}; that is, Y_‖X is merely the matrix obtained by calculating the part of each variable in Y which is in the space spanned by the variables in X.
To show this we note that (assuming that X has full column rank) the usual least squares solution for the vector of estimated coefficients is given by:

(1.29)   π̂ = (X'X)^-1 X'y ;

the T×1 vector of predicted values for y is given by

(1.30)   y_‖X = ŷ = Xπ̂ = X(X'X)^-1 X'y ;

and the T×1 vector of residuals is given by

(1.31)   y_⊥X = v̂ = y - ŷ = y - Xπ̂ = y - X(X'X)^-1 X'y .

Notice that in the usual least squares calculations, y is divided into a part ŷ (or y_‖X) within the space spanned by X and a part v̂ (or y_⊥X) orthogonal to X. That v̂ is orthogonal to X is easily demonstrated:

(1.32)   X'v̂ = X'(y - X(X'X)^-1 X'y) = X'y - X'X(X'X)^-1 X'y = X'y - X'y = 0 .

Since Y_‖X and Y_⊥X were defined as the parts of each variable in Y in the space spanned by X and orthogonal to X, respectively, the calculation of Y_‖X and Y_⊥X may also be regarded as least squares calculations. Let us consider G equations with separate dependent variables, y1 ... yG, but the same independent variables, x1 ... xA, for each equation. Let Y = [y1 ... yG] and X = [x1 ... xA] as before. The least squares solutions for the coefficients of the G equations are given by the matrix

(1.33)   Π̂' = [(X'X)^-1 X'y1 ... (X'X)^-1 X'yG] = (X'X)^-1 X'Y ;

the T×G matrix of predicted values for Y is given by:

(1.34)   Y_‖X = Ŷ = [ŷ1 ... ŷG] = [Xπ̂1 ... Xπ̂G] = [X(X'X)^-1 X'y1 ... X(X'X)^-1 X'yG] = X(X'X)^-1 X'Y ;

and the T×G matrix of residuals for Y is given by:

(1.35)   Y_⊥X = V̂ = [v̂1 ... v̂G] = [y1 - ŷ1 ... yG - ŷG] = [y1 - X(X'X)^-1 X'y1 ... yG - X(X'X)^-1 X'yG] = Y - X(X'X)^-1 X'Y .

If X does not have full column rank, the X'X matrix will be singular, the inverse of X'X will not exist, and unique least squares coefficients will not exist for any of the G equations.^1 Even though the least squares coefficients are not unique, the least squares predicted values for Y and the residuals for each equation are unique and can be readily calculated.
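These formulas can be checked numerically. The sketch below (illustrative only; the variable names are the writer's, and NumPy stands in for the routines described in the text) computes Y_‖X and Y_⊥X by least squares, verifies the orthogonality result (1.32), and then repeats the computation with a rank-deficient X to show that the predicted values remain unique even though the coefficients do not:

```python
import numpy as np

rng = np.random.default_rng(0)
T, A, G = 50, 3, 2
X = rng.standard_normal((T, A))              # T x A matrix of independent variables
Y = rng.standard_normal((T, G))              # T x G matrix of dependent variables

Pi_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (X'X)^-1 X'Y, as in (1.33)
Y_par  = X @ Pi_hat                          # Y_||X = X(X'X)^-1 X'Y, as in (1.34)
Y_perp = Y - Y_par                           # Y_perp_X = Y - Y_||X, as in (1.35)

orth_ok = np.allclose(X.T @ Y_perp, 0)       # residuals orthogonal to X, as in (1.32)

# Rank-deficient X: the coefficients are no longer unique, but Y_||X is.
X_def = np.hstack([X, X[:, :1]])             # last column duplicates the first
Pi_any = np.linalg.lstsq(X_def, Y, rcond=None)[0]
same_projection = np.allclose(X_def @ Pi_any, Y_par)
```

`np.linalg.lstsq` returns one particular set of least squares coefficients for the singular case; any other valid set would reproduce the same Y_‖X, which is the point of the passage above.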
This illustrates a desirable characteristic of Y_‖X and Y_⊥X -- even though X is not of full column rank, Y_‖X and Y_⊥X are unique and can be readily calculated.^2

^1 A set of least squares coefficients can be obtained by putting enough restrictions on the coefficients of each equation; e.g., by setting certain of the coefficients to zero, thereby, in effect, omitting variables from the X matrix.

^2 A direct orthogonalization procedure such as the Gram-Schmidt orthogonalization procedure is probably the most accurate method available for the calculation of residuals (Y_⊥X) and predicted values of Y (Y_‖X may be calculated as Y - Y_⊥X); however, if very many observations occur it will be considerably more efficient to obtain a set of least squares coefficients for each column of Y and then use these coefficients to calculate Y_‖X and Y_⊥X. This may be done by selecting variables from the X matrix in such a way that the subset of variables selected has full column rank which is the same as the rank of the original X matrix, and then using this smaller set of variables in place of the original X matrix in the calculation of least squares coefficients, predicted values of Y (Y_‖X), and residuals (Y_⊥X). The selection of a subset of variables having full column rank and having the same rank as X may be built into the inversion routine in the manner noted in section I.D.2.

Although less computer time will generally be required if a set of least squares coefficients is calculated as indicated above and Y_‖X and Y_⊥X are calculated from these coefficients, it is desirable that matrices of the form Y_‖X'Y_‖X, Y_⊥X'Y_⊥X, Y_‖X'Z2, or Y_⊥X'Z2 be calculated by direct orthogonalization, since matrices of this form may be calculated directly from moment matrices instead of from the observation matrix. (A method for doing so is given in section I.D.2.)

In this paper, we will make extensive use of matrices of the form Y_‖X'Y_‖X and Y_⊥X'Y_⊥X, which we will denote as [Y'Y]_‖X and [Y'Y]_⊥X, respectively.
Thus, [Y'Y]_‖X is the moment matrix (i.e., the sums of squares and cross-products matrix) of the part of Y in the space spanned by the columns of X, and [Y'Y]_⊥X is the moment matrix of the part of Y orthogonal to X. If X has full column rank, then [Y'Y]_‖X may be expressed as:

(1.36)   [Y'Y]_‖X = Y_‖X'Y_‖X = [Y'X(X'X)^-1 X'][X(X'X)^-1 X'Y] = Y'X(X'X)^-1 X'Y ,

and [Y'Y]_⊥X may be expressed as:

(1.37)   [Y'Y]_⊥X = Y_⊥X'Y_⊥X = [Y' - Y'X(X'X)^-1 X'][Y - X(X'X)^-1 X'Y]
                  = Y'Y - 2Y'X(X'X)^-1 X'Y + Y'X(X'X)^-1 X'X(X'X)^-1 X'Y
                  = Y'Y - Y'X(X'X)^-1 X'Y = Y'Y - [Y'Y]_‖X .

Although Y'X(X'X)^-1 X'Y is the usual computational formula given for [Y'Y]_‖X and Y'Y - Y'X(X'X)^-1 X'Y is the usual computational formula given for [Y'Y]_⊥X, these formulas will not be used in this paper, except possibly for the purpose of a derivation. Instead, the [Y'Y]_⊥X matrix will be calculated by direct orthogonalization from the Y'Y, Y'X, and X'X matrices by a computational scheme given in section I.D.2; hence the use of the form [Y'Y]_⊥X rather than Y_⊥X'Y_⊥X. Calculation by this direct orthogonalization method has the advantages of being more accurate, requiring less computer time, and requiring less computer storage. Also, calculation by direct orthogonalization has the additional advantage that [Y'Y]_⊥X is easily calculated when X has less than full column rank. ([Y'Y]_⊥X is unique even though X has less than full column rank.) The [Y'Y]_‖X matrix will be calculated as Y'Y - [Y'Y]_⊥X rather than by the computational formula Y'X(X'X)^-1 X'Y for the same reasons that [Y'Y]_⊥X is calculated by direct orthogonalization.

More general matrices of the form [Z1'Z2]_‖Z3 and [Z1'Z2]_⊥Z3 will also be calculated, where Z1 is a T×N1 matrix of variables, Z2 is a T×N2 matrix of variables, and Z3 is a T×N3 matrix of variables, the variables in any of these matrices being jointly dependent, predetermined, or some jointly dependent and others predetermined within the same matrix.
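The point that [Y'Y]_⊥X can be obtained from moment matrices alone, without revisiting the T observations, can be illustrated as follows (an illustrative sketch; here the moment-matrix route uses the textbook formula (1.37) rather than the elimination scheme of section I.D.2):

```python
import numpy as np

rng = np.random.default_rng(1)
T, A, G = 40, 3, 2
X = rng.standard_normal((T, A))
Y = rng.standard_normal((T, G))

# Observation-matrix route: moment matrix of the least squares residuals.
Y_perp = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
YY_perp_obs = Y_perp.T @ Y_perp

# Moment-matrix route, as in (1.37): only Y'Y, X'Y, and X'X are needed.
YY, XY, XX = Y.T @ Y, X.T @ Y, X.T @ X
YY_perp_mom = YY - XY.T @ np.linalg.solve(XX, XY)

routes_agree = np.allclose(YY_perp_obs, YY_perp_mom)

# [Y'Y]_||X then follows as Y'Y - [Y'Y]_perp_X, with no second pass over the data.
par_from_perp = np.allclose(YY - YY_perp_mom, XY.T @ np.linalg.solve(XX, XY))
```

Once the moment matrices have been accumulated in one pass over the observations, all of the quantities above are computed from small A×A and A×G arrays, which is the storage advantage the text describes.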
(Some of the variables in one matrix may be repeated in the other two.) Although [Z1'Z2]_⊥Z3 = [Z1]_⊥Z3'[Z2]_⊥Z3, [Z1'Z2]_⊥Z3 will not be computed in this way but will instead be extracted from the moment matrix of the part of Z = [Z1 : Z2] orthogonal to Z3, i.e., [Z1'Z2]_⊥Z3 may be extracted as the upper right hand block of:

   [Z'Z]_⊥Z3 = [Z1'Z1  Z1'Z2]      = [[Z1'Z1]_⊥Z3  [Z1'Z2]_⊥Z3]
               [Z2'Z1  Z2'Z2]_⊥Z3    [[Z2'Z1]_⊥Z3  [Z2'Z2]_⊥Z3]

which is computed directly from the Z'Z, Z'Z3, and Z3'Z3 matrices. Any variables common to Z1 and Z2 need not be repeated in Z. The row and column of [Z'Z]_⊥Z3 corresponding to any variable of Z1 or Z2 which also occurs in Z3 will be zero, since no part of that variable is orthogonal to Z3. [Z1'Z2]_‖Z3 is calculated simply as Z1'Z2 - [Z1'Z2]_⊥Z3.^1

^1 If the reader has difficulty with the concept of orthogonality, he may regard a matrix [Z1'Z2]_⊥Z3 as merely a matrix that is calculated from the matrix Z1'Z2 (and has the same number of rows and columns as Z1'Z2) through use of the matrices Z3'Z3 and Z3'Z (where Z contains all variables in Z1 and Z2) by a standard computational procedure given in section I.D.2 (which we will call the direct orthogonalization procedure). The matrix [Z1'Z2]_‖Z3 may be regarded (and calculated) as Z1'Z2 - [Z1'Z2]_⊥Z3.

The remainder of this section is devoted to deriving some fundamental relationships which will be useful in verifying various results throughout this paper; however, these relationships are not required to apply the formulas which are given in this paper.^2

^2 Some readers will find it of interest to remember that [Z1'Z2]_‖Z3 and [Z1'Z2]_⊥Z3 can be calculated by direct least squares by using the variables in Z1 and Z2 as dependent variables and the variables in Z3 as independent variables (the maximum number of linearly independent variables in Z3 being used as the set of independent variables if Z3 has less than full column rank).
The matrices [Z1]_‖Z3 and [Z2]_‖Z3 are the matrices of predicted values of the variables in Z1 and Z2, respectively; the matrices [Z1]_⊥Z3 and [Z2]_⊥Z3 are the matrices of residuals of the variables in Z1 and Z2, respectively; [Z1'Z2]_‖Z3 = [Z1]_‖Z3'[Z2]_‖Z3, and [Z1'Z2]_⊥Z3 = [Z1]_⊥Z3'[Z2]_⊥Z3. The use of the direct orthogonalization procedure merely saves computer time and provides a more accurate calculation of the desired matrices.

In the remainder of this section let Z1, Z2, Z3, and Z4 be any T×N1, T×N2, T×N3, and T×N4 matrices of variables, respectively, the variables in any of these matrices being jointly dependent, predetermined, or some jointly dependent and some predetermined. Variables in any of these matrices may also occur in any of the other matrices. We will assume only that Z1, Z2, Z3, and Z4 have rank N1*, N2*, N3*, and N4*, respectively; i.e., we will assume that any of the matrices may have less than full column rank. In showing algebraically that each of the claimed relationships holds, we will often use matrices Z1*, Z2*, Z3*, and Z4*, where Z1* is a T×N1* matrix of variables extracted from the Z1 matrix such that Z1* has full column rank and every variable in Z1 can be expressed as a linear combination of variables in Z1*. (It is always possible to extract such a matrix. A method for doing so is given in section I.D.2.) Z2*, Z3*, and Z4* are constructed from Z2, Z3, and Z4 in the same fashion.

Following are some additional relationships which will be helpful in deriving computational formulas.
(1) If x is a variable in the matrix Z3, or if x is in the space of Z3 (i.e., x may be expressed as a linear combination of the columns of Z3), then

(1.38)   [x]_‖Z3 = x ;

hence

(1.39)   [x]_‖Z3' Z2 = x'Z2 ;

or if X1 is a matrix of variables which are also contained in Z3, then:

(1.40)   [X1]_‖Z3' Z2 = X1'Z2

and in particular,

(1.41)   [X1'X1]_‖Z3 = [X1]_‖Z3'[X1]_‖Z3 = X1'X1 .

Also (continuing to let X1 be a submatrix of Z3, or at least in the space of Z3):

(1.42)   [X1]_⊥Z3 = 0   [where 0 is a T×(number of variables in X1) matrix];

therefore, the matrix of sums of cross-products of [X1]_⊥Z3 with any other matrix of variables is zero, i.e.,

(1.43)   [X1]_⊥Z3' Z2 = 0'Z2 = 0

and, in particular,

(1.44)   [X1'X1]_⊥Z3 = [X1]_⊥Z3'[X1]_⊥Z3 = 0

and

(1.45)   [X1'Z2]_⊥Z3 = [X1]_⊥Z3'[Z2]_⊥Z3 = 0'[Z2]_⊥Z3 = 0 .

(2)  (1.46)   [Z1]_‖Z3 = Z3*(Z3*'Z3*)^-1 Z3*'Z1 ,

where Z3*(Z3*'Z3*)^-1 Z3*' is called the projection matrix for the space Z3.^1 (A matrix P is a projection matrix for a space Z3 if PZ1 = [Z1]_‖Z3 for any matrix of variables, Z1.)

^1 Z3* is defined on page 47.

(3)  (1.47)   [Z1]_⊥Z3 = Z1 - [Z1]_‖Z3 = Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1 = [I - Z3*(Z3*'Z3*)^-1 Z3*']Z1 ,

where I - Z3*(Z3*'Z3*)^-1 Z3*' is called the projection matrix for the space orthogonal to Z3. (A matrix P is a projection matrix for the space orthogonal to Z3 if PZ1 = [Z1]_⊥Z3 for any matrix of variables, Z1.)

(4) Z1*(Z1*'Z1*)^-1 Z1*' is symmetric and idempotent. (A matrix, P, is symmetric if P = P' and idempotent if PP = P.) That Z1*(Z1*'Z1*)^-1 Z1*' is symmetric and idempotent is easily verified:^1

(1.48)   (Z1*(Z1*'Z1*)^-1 Z1*')' = (Z1*')'((Z1*'Z1*)^-1)'Z1*' = Z1*(Z1*'Z1*)^-1 Z1*'

and

(1.49)   [Z1*(Z1*'Z1*)^-1 Z1*'][Z1*(Z1*'Z1*)^-1 Z1*'] = Z1*(Z1*'Z1*)^-1 Z1*' ,

since the interior part Z1*'Z1*(Z1*'Z1*)^-1 in (1.49) is an identity matrix.

(5) I - Z1*(Z1*'Z1*)^-1 Z1*' is symmetric and idempotent. Again this is easily verified:
(1.50)   [I - Z1*(Z1*'Z1*)^-1 Z1*']' = I - Z1*(Z1*'Z1*)^-1 Z1*'

and

(1.51)   [I - Z1*(Z1*'Z1*)^-1 Z1*'][I - Z1*(Z1*'Z1*)^-1 Z1*']
         = I - 2Z1*(Z1*'Z1*)^-1 Z1*' + Z1*(Z1*'Z1*)^-1 Z1*'Z1*(Z1*'Z1*)^-1 Z1*'
         = I - Z1*(Z1*'Z1*)^-1 Z1*' .

^1 For any three matrices A, B, and C of compatible dimensions, (ABC)' = C'B'A'. Also, for any matrix A, (A')' = A.

(6) The projection matrices Z1*(Z1*'Z1*)^-1 Z1*' and I - Z1*(Z1*'Z1*)^-1 Z1*' are mutually orthogonal, since

(1.52)   [Z1*(Z1*'Z1*)^-1 Z1*'][I - Z1*(Z1*'Z1*)^-1 Z1*'] = Z1*(Z1*'Z1*)^-1 Z1*' - Z1*(Z1*'Z1*)^-1 Z1*' = 0 .

(7) [Z1]_‖Z3'[Z2]_⊥Z3 = 0 (where 0 is an N1×N2 matrix), since the variables in [Z1]_‖Z3 may be expressed as linear combinations of the variables in Z3 and the variables in [Z2]_⊥Z3 are orthogonal to the variables in Z3. In particular, from (1.46), (1.47), and (1.52) we have that:

(1.53)   [Z1]_‖Z3'[Z1]_⊥Z3 = Z1'[Z3*(Z3*'Z3*)^-1 Z3*'][I - Z3*(Z3*'Z3*)^-1 Z3*']Z1 = Z1'0Z1 = 0 .

(8)  (1.54)   Z1'[Z2]_‖Z3 = [Z1]_‖Z3'[Z2]_‖Z3 = [Z1]_‖Z3' Z2 .

This comes from the idempotency of Z3*(Z3*'Z3*)^-1 Z3*' [see (1.49)] as follows:

(1.55)   [Z1]_‖Z3'[Z2]_‖Z3 = [Z1'Z3*(Z3*'Z3*)^-1 Z3*'][Z3*(Z3*'Z3*)^-1 Z3*'Z2]
         = Z1'Z3*(Z3*'Z3*)^-1 Z3*'Z2 = [Z1]_‖Z3' Z2  or  Z1'[Z2]_‖Z3 .

(9) Similarly,

(1.56)   Z1'[Z2]_⊥Z3 = [Z1]_⊥Z3'[Z2]_⊥Z3 = [Z1]_⊥Z3' Z2 .

This comes from the idempotency of I - Z3*(Z3*'Z3*)^-1 Z3*' [see (1.51)] as follows:

(1.57)   [Z1'Z2]_⊥Z3 = [Z1]_⊥Z3'[Z2]_⊥Z3 = Z1'[I - Z3*(Z3*'Z3*)^-1 Z3*'][I - Z3*(Z3*'Z3*)^-1 Z3*']Z2
         = Z1'[I - Z3*(Z3*'Z3*)^-1 Z3*']Z2 = [Z1]_⊥Z3' Z2  or  Z1'[Z2]_⊥Z3 .

(10) Let A1 be any N1×p1 matrix. Then using (1.46) we obtain:

(1.58)   [Z1A1]_‖Z3 = Z3*(Z3*'Z3*)^-1 Z3*'Z1A1 = [Z1]_‖Z3 A1 ,

and using (1.47) we obtain:

(1.59)   [Z1A1]_⊥Z3 = [I - Z3*(Z3*'Z3*)^-1 Z3*']Z1A1 = [Z1]_⊥Z3 A1 .

(11) Thus, letting A1 be any N1×p1 matrix and A2 be any N2×p2 matrix we obtain:

(1.60)   [A1'Z1'Z2A2]_‖Z3 = [Z1A1]_‖Z3'[Z2A2]_‖Z3 = A1'[Z1]_‖Z3'[Z2]_‖Z3 A2 = A1'[Z1'Z2]_‖Z3 A2

and

(1.61)   [A1'Z1'Z2A2]_⊥Z3 = [Z1A1]_⊥Z3'[Z2A2]_⊥Z3 = A1'[Z1]_⊥Z3'[Z2]_⊥Z3 A2 = A1'[Z1'Z2]_⊥Z3 A2 .

(12) Let A1 be any N1×p1 matrix and A2 be any N2×p2 matrix.
Then from (1.58) we obtain:

(1.62)   [Z1A1 + Z2A2]_‖Z3 = [[Z1 : Z2][A1; A2]]_‖Z3 = [Z1 : Z2]_‖Z3 [A1; A2] = [Z1]_‖Z3 A1 + [Z2]_‖Z3 A2 .

Similarly, from (1.59) we obtain:

(1.63)   [Z1A1 + Z2A2]_⊥Z3 = [Z1]_⊥Z3 A1 + [Z2]_⊥Z3 A2 .

(13) If Z3'Z4 = 0 (i.e., Z3 is orthogonal to Z4),

(1.64)   [Z1]_⊥[Z3 : Z4] = ([Z1]_⊥Z3)_⊥Z4 = ([Z1]_⊥Z4)_⊥Z3 .

That (1.64) holds for Z3'Z4 = 0 can be seen by writing out each of the terms and observing that they are the same:

   [Z1]_⊥[Z3 : Z4] = Z1 - [Z3* : Z4*]{[Z3* : Z4*]'[Z3* : Z4*]}^-1 [Z3* : Z4*]'Z1 .

However, for Z3'Z4 = 0,

   {[Z3* : Z4*]'[Z3* : Z4*]}^-1 = [Z3*'Z3*  0; 0  Z4*'Z4*]^-1 = [(Z3*'Z3*)^-1  0; 0  (Z4*'Z4*)^-1] ;

hence, [Z1]_⊥[Z3 : Z4] becomes:

   Z1 - [Z3* : Z4*] [(Z3*'Z3*)^-1  0; 0  (Z4*'Z4*)^-1] [Z3*'Z1; Z4*'Z1]
   = Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1 - Z4*(Z4*'Z4*)^-1 Z4*'Z1 .

On the other hand,

   ([Z1]_⊥Z3)_⊥Z4 = [Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1]_⊥Z4
   = [Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1] - Z4*(Z4*'Z4*)^-1 Z4*'Z1 + Z4*(Z4*'Z4*)^-1 Z4*'Z3*(Z3*'Z3*)^-1 Z3*'Z1
   = Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1 - Z4*(Z4*'Z4*)^-1 Z4*'Z1 ,

since Z4*'Z3* = 0. Similarly, ([Z1]_⊥Z4)_⊥Z3 becomes the same. (1.64) does not in general hold for Z3'Z4 ≠ 0, however.

(14) Since [Z3 : [Z4]_⊥Z3] spans the same space as [Z3 : Z4] and since Z3'[Z4]_⊥Z3 = 0 [see (1.45)], applying (1.64) we have:

(1.65)   [Z1]_⊥[Z3 : Z4] = ([Z1]_⊥Z3)_⊥[Z4]_⊥Z3 .

(15) ([Z1'Z2]_‖Z3)_⊥Z4 is defined as ([Z1]_‖Z3)_⊥Z4'([Z2]_‖Z3)_⊥Z4 .

(16) From (1.54) we obtain:

(1.66)   ([Z1'Z2]_‖Z3)_⊥Z4 = ([Z1]_‖Z3)_⊥Z4'([Z2]_‖Z3)_⊥Z4 = [Z1]_‖Z3'([Z2]_‖Z3)_⊥Z4 = ([Z1]_‖Z3)_⊥Z4'[Z2]_‖Z3 .

(17) However:

(1.67)   ([Z1'Z2]_‖Z3)_⊥Z4 ≠ (Z1'[Z2]_‖Z3)_⊥Z4  nor  ([Z1]_‖Z3'Z2)_⊥Z4 ,

since writing out the matrices involved shows that the two sides differ in general. Thus, we can perform transformations of the form (1.54) on the outermost ‖ operator, only, e.g., as in (1.66).
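Several of the relationships above lend themselves to a quick numerical check. The sketch below (illustrative only; NumPy least squares stands in for the routines of section I.D.2) verifies the projection-matrix properties (1.48), (1.49), and (1.52), and then the order-invariance claim (1.64), including its failure when Z3 and Z4 are not orthogonal:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 40
Z3 = rng.standard_normal((T, 3))             # full column rank (almost surely)
P = Z3 @ np.linalg.solve(Z3.T @ Z3, Z3.T)    # projection matrix for the space Z3
M = np.eye(T) - P                            # projection onto the orthogonal space

symmetric  = np.allclose(P, P.T)             # (1.48)
idempotent = np.allclose(P @ P, P)           # (1.49); (1.50)-(1.51) hold likewise for M
mutually_orthogonal = np.allclose(P @ M, 0)  # (1.52)

def perp(Z1, Z3):
    """[Z1]_perp_Z3 via least squares residuals (rank deficiency allowed)."""
    return Z1 - Z3 @ np.linalg.lstsq(Z3, Z1, rcond=None)[0]

# (1.64): with Z3'Z4 = 0 the perp operators may be applied in either order.
Z4 = perp(rng.standard_normal((T, 2)), Z3)   # by construction Z3'Z4 = 0
Z1 = rng.standard_normal((T, 2))
joint = perp(Z1, np.hstack([Z3, Z4]))
order_invariant = (np.allclose(perp(perp(Z1, Z3), Z4), joint) and
                   np.allclose(perp(perp(Z1, Z4), Z3), joint))

# With Z3'Z4c != 0 the stepwise result differs from the joint one in general.
Z4c = rng.standard_normal((T, 2))
fails_when_correlated = not np.allclose(perp(perp(Z1, Z3), Z4c),
                                        perp(Z1, np.hstack([Z3, Z4c])))
```

Constructing Z4 as the residual of a random matrix on Z3 is exactly the device of relationship (14): [Z4]_⊥Z3 is orthogonal to Z3 by (1.45), so (1.64) applies.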
Additional examples showing permissible transformations based on the outermost operator only follow:

(1.68)   [Z1]_‖Z3'[Z2]_⊥Z4 = ([Z1]_‖Z3)_⊥Z4'[Z2]_⊥Z4 = ([Z1]_‖Z3'Z2)_⊥Z4 ≠ ([Z1'Z2]_‖Z3)_⊥Z4

(1.69)   [Z1]_‖Z3'([Z2]_⊥Z4)_‖Z3 = Z1'([Z2]_⊥Z4)_‖Z3 ≠ ([Z1'Z2]_⊥Z4)_‖Z3

Similarly, (1.66) through (1.69) hold true if the ‖ operator is replaced by the ⊥ operator. In particular, the following may be easily verified by writing out the matrices involved in the same manner as for (1.66) through (1.69):

(1.70)   ([Z1'Z2]_⊥Z3)_⊥Z4 = ([Z1]_⊥Z3)_⊥Z4'([Z2]_⊥Z3)_⊥Z4

(1.71)   ([Z1'Z2]_⊥Z3)_⊥Z4 = [Z1]_⊥Z3'([Z2]_⊥Z3)_⊥Z4 = ([Z1]_⊥Z3)_⊥Z4'[Z2]_⊥Z3

(1.72)   ([Z1'Z2]_⊥Z3)_⊥Z4 ≠ (Z1'[Z2]_⊥Z3)_⊥Z4  nor  ([Z1]_⊥Z3'Z2)_⊥Z4

(1.73)   [Z1]_⊥Z3'[Z2]_⊥Z4 = ([Z1]_⊥Z3'Z2)_⊥Z4 ≠ ([Z1'Z2]_⊥Z3)_⊥Z4

(1.74)   [Z1]_⊥Z3'([Z2]_⊥Z4)_⊥Z3 = Z1'([Z2]_⊥Z4)_⊥Z3 ≠ ([Z1'Z2]_⊥Z4)_⊥Z3

(18) The direct relationship [Z1'Z2]_‖Z3 + [Z1'Z2]_⊥Z3 = Z1'Z2 holds for the outermost ‖ and ⊥ operators, only. For example, the following may be easily verified by writing out the matrices involved in the same manner as for (1.66) through (1.69):

(1.75)   ([Z1'Z2]_‖Z3)_‖Z4 + ([Z1'Z2]_‖Z3)_⊥Z4 = [Z1'Z2]_‖Z3

(1.76)   ([Z1'Z2]_⊥Z3)_‖Z4 + ([Z1'Z2]_⊥Z3)_⊥Z4 = [Z1'Z2]_⊥Z3 ;

however:

(1.77)   ([Z1'Z2]_‖Z3)_⊥Z4 + ([Z1'Z2]_⊥Z3)_⊥Z4 ≠ [Z1'Z2]_⊥Z4

(1.78)   ([Z1'Z2]_‖Z3)_‖Z4 + ([Z1'Z2]_⊥Z3)_‖Z4 ≠ [Z1'Z2]_‖Z4

2. Computation of matrices of the form [Z1'Z2]_⊥Z3 and [Z1'Z2]_‖Z3 by direct orthogonalization^1

The procedure given in this section is very general in that it may be used to calculate matrices of the form [Z1'Z2]_⊥Z3 in which:

(1) Z1, Z2, and Z3 contain jointly dependent or predetermined variables, or both.

(2) Variables in any of the matrices may also occur in the other two as well. (If z1 is contained in both Z1 and Z3, the row of [Z1'Z2]_⊥Z3 corresponding to z1 will be zero at the completion of the orthogonalization.
Similarily if 22 is d 0 contained in both 22 an 23, the column of [lezllz3 corresponding to will be zero at the completion of the z2 orthogonalization.) (3) 21, Z , and 2 may have less than full column rank. 2 3 In this paper, 21 and 22 will most commonly be Y, the matrix of jointly dependent variables in a system of equations; ¥A’ the matrix of jointly dependent variables in a subsystem of the equations; or +3“, the matrix of jointly dependent variables in a single equation. 23 will most commonly be X a matrix of I, instrumental variables; X”, the matrix of predetermined variables 1The orthogonalization method outlined here is very well known among mathematicians and statisticians; however, oddly enough the writer has never seen reference to its use in the field of econometrics for which it would seem to have considerable application. 56 in a single equation; or X, the matrix of predetermined variables in the system. Thus, the orthogonalization procedure outlined in this section will be most commonly used to calculate matrices of I ' ' ' Matrices of the form [2122]"23 are calculated as [zizz] ' [zi221123 Rather than use the more common formula [Zi22]123 = 2122 - ZiZ3(ZéZ3)-12522, we will calculate [ZlZZJiZ3 by direct orthog- onalization thereby eliminating the requirement that (2523)"1 when 3 23 has less than full column rank. Even if 23 has full column rank, calculation of [2i exists, i.e., thereby permitting calculation of [2122112 221123 by direct orthogonalization is advantageous from the fact that (1) fewer computer locations may be conveniently used to compute EZizzjlza’ (2) fewer arithmetic operations are required thereby saving computer time, (3) [ziZZJlZ3 may be computed to a higher degree of accuracy, and (4) rk z is calculated as a byproduct of the computational pro- 3 cedure. A computational procedure for calculating [lezllz by 3 direct orthogonalization follows:1 (1) Let Z be a TXN matrix containing all of the variables or Z . 
If desired, Z could be defined as Z = [Z1 : Z2]; however, there is no need to repeat variables common to both Z1 and Z2.^1 If Z1 = Z2, then Z = Z1 = Z2. Z may contain variables in addition to those in Z1 and Z2 if desired. Calculate the moment matrix (sums of squares and cross-products matrix) of [Z3 : Z], i.e., calculate:

(1.79)   [Z3 : Z]'[Z3 : Z] = [Z3'Z3  Z3'Z]      (N3×N3  N3×N)
                             [Z'Z3   Z'Z ]      (N×N3   N×N )

^1 Repeating variables in the Z matrix causes no computational difficulty.

(2) Do elementary row operations on the rows of the matrix until the first N3 columns are reduced to zeros below the diagonal. (It doesn't matter whether the diagonal elements are set to 1 or not.) This is equivalent to starting a forward solution of the Doolittle inversion procedure but stopping after the N3th row. The above matrix (1.79) will have become

(1.80)   [A11  A12       ]      (N3×N3  N3×N)
         [0    [Z'Z]_⊥Z3 ]      (N×N3   N×N )

where A11 contains zeros below the diagonal and the results of the elementary row operations on and above the diagonal. A12 merely contains the results of the elementary row operations.

(3) [Z1'Z2]_⊥Z3 is a submatrix occupying the same elements of [Z'Z]_⊥Z3 as Z1'Z2 occupied of Z'Z.

To increase accuracy, it is advisable to rearrange the first N3 rows and columns at each step so that the largest diagonal element from among the remaining diagonal elements of the Z3'Z3 matrix is used as the pivot at each step. (This will not affect a row or a column of [Z'Z]_⊥Z3; therefore, there is no requirement that track be kept of which rows and columns are switched. On the other hand, the information as to which diagonal elements have served as pivots can be used to derive a minimum subset of predetermined variables spanning the space of the columns of Z3.)
If the largest diagonal element becomes smaller than a preset or precalculated value, ε > 0, the procedure is stopped, since all of the values to be reduced to zero will already be within ε of zero and [Z'Z]_⊥Z3 will already be the moment matrix of the part of Z orthogonal to Z3.^1 The number of columns of Z3 already operated on before the largest remaining diagonal element became less than ε is the rank of Z3. The predetermined variables corresponding to the diagonal elements used as pivots constitute a basis spanning the same space as the columns of Z3. Each of the remaining predetermined variables may be expressed as a linear combination of this set of rk Z3 variables.^2

Since (1) no use will be made of the matrices A11 and

^1 It is noted further on that only the triangular part of [Z3 : Z]'[Z3 : Z] need be formed and operated on; hence, [Z1'Z2]_⊥Z3 is extracted from a triangular matrix representing the symmetric matrix [Z'Z]_⊥Z3.

^2 The procedure outlined above also provides the starting point of a procedure for getting a set of least squares coefficients for one or more equations even if the matrix of independent variables has less than full column rank. Let Z = Z3 Π' + V [with dimensions T×N = (T×N3)(N3×N) + (T×N)] be a set of equations for which a set of least squares coefficients is desired. Assume that rk Z3 = N3*. Then at the completion of the orthogonalization procedure outlined above [i.e., (1.80)], N3* diagonal elements will have been selected and the remaining diagonal elements will have become less than ε. The calculation of a set of least squares coefficients is completed by:

(1) Setting the last N3 - N3* rows of [A11 : A12] to zero. (These elements will already be approximately zero, but they will not, in general, be exactly zero due to rounding error.)

(2) Dividing each of the elements on or above the diagonal of the first N3* rows of [A11 : A12] by the diagonal element for the row. (The diagonal elements of the first N3* rows will then be 1.)
(3) Performing a back solution in the usual Doolittle manner, i.e., by reducing all elements above the diagonal elements of the first N3* columns to 0.

(4) Rearranging the first N3 rows into their original order (in terms of the Z3 matrix).

A set of least squares coefficients (Π̂') is then given by the N3×N matrix in the position originally occupied by Z3'Z in (1.79). Estimated values of Z (i.e., [Z]_‖Z3) and residuals for Z (i.e., [Z]_⊥Z3) calculated through use of the Π̂' matrix calculated in this manner will be the same as the estimated values of Z and residuals calculated through use of any of the many possible sets of least squares coefficients. (Even though the least squares coefficients are not unique, the estimated values of the dependent variable and the residuals are unique -- the same estimated values being obtained from any set of least squares coefficients.)

A12, (2) the initial matrix is symmetric, and (3) the [Z'Z]_⊥Z3 matrix is symmetric, all elements on one side of the diagonal need not be formed or operated on; that is, only a triangular matrix need be formed and all operations may be performed on this triangular matrix, thereby saving computer memory.

Verification That the Computational Procedure Produces [Z'Z]_⊥Z3^1

That the matrix labeled [Z'Z]_⊥Z3 is indeed the matrix Z'Z - Z'Z3(Z3'Z3)^-1 Z3'Z if Z3 has full column rank [hence, (Z3'Z3)^-1 exists] can be readily demonstrated as follows: Performing the above elementary row operations is equivalent to premultiplying (1.79) by a nonsingular matrix [E11 0; E21 I] such that:

   [E11  0] [Z3'Z3  Z3'Z]   [A11  A12]
   [E21  I] [Z'Z3   Z'Z ] = [0    A22]

Thus:

   E21 Z3'Z3 + Z'Z3 = 0   or   E21 = -Z'Z3(Z3'Z3)^-1

and:

   E21 Z3'Z + Z'Z = A22 .

Substituting for E21 into the last equation we get:

   -Z'Z3(Z3'Z3)^-1 Z3'Z + Z'Z = A22

or

(1.81)   A22 = Z'Z - Z'Z3(Z3'Z3)^-1 Z3'Z = [Z'Z]_⊥Z3 .

^1 The proof for Z3 of full column rank was suggested by Professor Robert L. Gustafson.
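The elimination procedure of (1.79)-(1.80), with largest-diagonal pivoting and the ε rank test, can be sketched as follows. This is an illustrative reimplementation, not the CDC 6500 routine; for clarity it operates on the full square moment matrix rather than the triangular storage described above:

```python
import numpy as np

def direct_orthogonalize(Z3Z3, Z3Z, ZZ, eps=1e-8):
    """Reduce the bordered moment matrix (1.79) by elementary row operations
    until the Z3 block is exhausted; the lower right block is then
    [Z'Z]_perp_Z3 as in (1.80), and rank(Z3) falls out as a by-product."""
    N3 = Z3Z3.shape[0]
    M = np.block([[Z3Z3, Z3Z], [Z3Z.T, ZZ]]).astype(float)
    remaining = set(range(N3))
    rank = 0
    while remaining:
        p = max(remaining, key=lambda i: M[i, i])   # largest-diagonal pivoting
        if M[p, p] < eps:                           # remaining Z3 columns are
            break                                   # within eps of dependence
        remaining.discard(p)
        rank += 1
        pivot_row = M[p].copy()
        for i in list(remaining) + list(range(N3, M.shape[0])):
            M[i] -= (M[i, p] / pivot_row[p]) * pivot_row
    return M[N3:, N3:], rank

# Demonstration with a rank-deficient Z3 (4 columns, rank 3).
rng = np.random.default_rng(3)
T = 60
Z3 = rng.standard_normal((T, 3))
Z3 = np.hstack([Z3, Z3[:, :1] + Z3[:, 1:2]])    # 4th column is a linear combination
Z = rng.standard_normal((T, 2))

ZZ_perp, rank = direct_orthogonalize(Z3.T @ Z3, Z3.T @ Z, Z.T @ Z)

# Check against the observation-matrix route, as in (1.83).
Z_perp = Z - Z3 @ np.linalg.lstsq(Z3, Z, rcond=None)[0]
agrees = np.allclose(ZZ_perp, Z_perp.T @ Z_perp)
```

Each elimination step clears one pivot column in the rows not yet pivoted, so after rk Z3 steps the lower right block equals Z'Z - Z'Z3*(Z3*'Z3*)^-1 Z3*'Z with Z3* the pivot columns, which by (1.83) is [Z'Z]_⊥Z3 even when Z3 is rank deficient.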
The same set of manipulations may also be used to show that the matrix labeled [Z'Z]_⊥Z3 is indeed that matrix even in the case of a Z3 having rank N3* < N3 (i.e., in the case of a Z3 having less than full column rank). If rk Z3 = N3*, row operations are performed on the columns corresponding to N3* of the variables in Z3 before the diagonal elements corresponding to the remaining Z3 variables become less than ε (for a suitable choice of ε). The orthogonalization stops at this point. This is equivalent to performing row operations on the following submatrix of (1.79) (letting Z3* be a submatrix of Z3 containing the variables corresponding to the N3* diagonal elements used as pivots):

(1.82)   [Z3*'Z3*  Z3*'Z]      (N3*×N3*  N3*×N)
         [Z'Z3*    Z'Z  ]      (N×N3*    N×N  )

The same derivation may now be performed on (1.82) as was performed with (1.79) -- the only difference in the intermediate matrices obtained is that Z3* will occur in place of Z3 wherever Z3 presently occurs. Thus (1.81) becomes:

(1.83)   A22 = Z'Z - Z'Z3*(Z3*'Z3*)^-1 Z3*'Z .

But this is [Z'Z]_⊥Z3 [see (1.57)]. Thus, the desired matrix is obtained even in the case of Z3 having less than full column rank.

CHAPTER II

COEFFICIENT ESTIMATION

A. Basic Double k-class Model and Summary of Methods

The basic single equation procedure presented in this paper is the double k-class model -- the model developed in this chapter.^1 Variance-covariance formulas for the double k-class model are given in Chapter III, and a method for directly imposing restrictions on direct least squares and two-stage least squares coefficients is given in Chapter IV.

In this chapter, the double k-class computational formula for the coefficients of an equation is first given, some matrices are defined, and a derivation of the double k-class formula is presented.
Specific members of the double k-class model such as direct least squares (DLS), two-stage least squares (2SLS), and limited information single equation maximum likelihood (LIML) are summarized and then presented in more detail. Variations of the double k-class model including instrumental variables techniques are presented. Finally, a discussion of selection of instruments -- especially the maximum number of instruments which will be effective -- is presented.

The μth equation in a system of equations was defined in (1.8). If we drop the subscript μ from many of the matrices,

^1 The double k-class model is given in Nagar [1962] and Theil [1961], p. 354.

we may write the equation as:

(II.1)   y  =  Y γ + X_μ β + u
        T×1  T×m m×1 T×L L×1 T×1

The computational formula for double k-class estimated coefficients can be written as:^1

^1 Of the double k-class estimators discussed, only in the case of LIML does it make no substantive difference in the estimated coefficients which jointly dependent variable is chosen as the normalizing variable. For the remaining procedures, a change in the selected normalizing variable will change the resulting coefficients by more than just a trivial division of all coefficients by the negative of the coefficient of the variable chosen as the normalizing variable. Regarding this effect, Fisher [1965], p. 604 states:

"It can be argued that limited-information maximum likelihood has the desirable property of treating all included endogenous variables in an equation symmetrically; indeed, Chow has shown that it is a natural generalization of ordinary least squares in the absence of a theoretically given normalization rule" [footnote reference deleted].
"On the other hand, such an argument seems rather weak, since normalization rules are in fact generally present in practice, each equation of the model being naturally asso- ciated with that particular endogenous variable which is determined by the decision-makers whose behavior is repre- sented by the equation. The normalization rules are in a real sense part of the specification of the model, and the model is not completely Specified unless every endogenous variable appears (at least implicitly) in exactly one equa- tion in normalized form. For example, it is not enough to have price equating supply and demand, equations should also be present which explain pure quotations by sellers and buyers and which describe the equilibrating process. (For most purposes, of course, such additional equations can remain in the back of the model builder's mind, although the rules for choosing instrumental variables given below may sometimes re- quire that they be made explicit.) "Thus, symmetry may be positively undesirable in a well- specified model where one feels relatively certain as to appropriate normalization, although it may be desirable if one wishes to remain agnostic as to appropriate normal- ization." 64 'Y'Y-kIEY'Y]LX Y' x; 1(1' y- -k ZEY' lex A. :Y9 R k = « 1’ 2 ‘3 __k ,k X'Y x'x x' 1 2 .- u u u uy (11.2) 8 where k1 and k2 are scalars which determine the particular double k-class member. X1 is the matrix of instrumental variables used to adjust the jointly dependent variables in the equation. 
XI includes all of the predetermined variables in the equation plus additional instruments--the additional instruments usually being (but not restricted to being) all or some of the additional predetermined variables in the system.1 [Y'YJLX is the moment matrix of the part of Y orthogonal 1 +4 4 1The basic double k-class method is usually given with the entire matrix of predetermined variables in the system, X, being used as the matrix of instruments, ; however, in practice some of the predetermined variables in the system are often omitted from the X matrix, predetermined variables are sometimes linearly combined (e.g., by use of principal components), etc. Rather than use the X matrix in our notation and then continually point out variations, it seems more fruitful to merely designate the matrix of variables used to adjust the jointly dependent variables as a matrix of instruments. Since the particular instruments used to adjust the jointly dependent variables have a considerable effect on the coefficients obtained it is imperative that the particular instruments used be listed when reporting results. We will return to the problem of selecting instruments in section 11.6. 65 1 ' I _ I to X . Since [Y YJLX - Y I = y' y [see (1.56)], [Y'Y]1X Y J. I xI J7x1 LXI I may be regarded as either (1) the vector of sums of cross-products of the part of Y orthogonal to X with the part of y orthogonal to I X1 or (2) the vector of sums of cross-products of the part of Y ortho- gonal to XI with y. Thus, computationally, X may be regarded as the I matrix of instruments used to adjust Y or the matrix of instruments used to adjust [y E Y]. In light of the equivalence of [Y'y]ix and I [YJIX y, (11.2) may also be written as: I I _ I I 'IP‘I I '1 (Y Y k1[Y Y]Lx Y x” Y szlx 1 1 (11.3) 6k1,k2 = y ; X'Y X'X X' r u u ‘- H -J however, for actual computations (11.2) should be used since [Y'YJ1X 1 may be computed by direct orthogonalization in the manner indicated in section I.D.2. 
^2 The orthogonalization notation used here and the concept of orthogonalization are discussed in section I.D. A method to compute [Y'Y]_⊥X_I and [Y'y]_⊥X_I by direct orthogonalization is given in section I.D.2. During the calculation of [Y'Y]_⊥X_I and [Y'y]_⊥X_I it is recommended that rk X_I be calculated and, if LIML coefficients are to be calculated, that the [+Y'+Y]_⊥X_μ matrix be saved. A method for calculating rk X_I and [+Y'+Y]_⊥X_μ as an intermediate step in the calculation of [Y'Y]_⊥X_I and [Y'y]_⊥X_I is given in appendix A.

A tentative proof that δ̂_{k1,k2} is a consistent estimator of δ, given the statistical assumptions of section I.C.3 and the assumption plim(k1 - 1) = plim(k2 - 1) = 0, is given in appendix C.^3

^3 Members of the double k-class family for which plim(k1 - 1) = plim(k2 - 1) = 0 include 2SLS, LIML, Nagar's unbiased to O(T^-1) in probability estimator (UBK), and Nagar's minimum second moment estimator (MSM). plim(k1 - 1) = plim(k2 - 1) = -1 for DLS; hence, DLS is not shown to be consistent.

(II.2) is not the most general form of the double k-class formula, since it may be desired that different sets of instrumental variables be used in the adjustment of the different jointly dependent variables. Let Y = [y1 ... ym], and let X_I^1 denote the set of instruments used to adjust y1, X_I^2 denote the set of instruments used to adjust y2, ..., X_I^m denote the set of instruments used to adjust ym, and X_I^0 denote the set of instruments used to adjust the normalizing jointly dependent variable, y. (X_I^0 may be null, i.e., it may be desired that y not be adjusted.) Then, the following matrix could be used in (II.2) in place of the [Y'Y]_⊥X_I matrix:

(II.4)   [ [y1'y1]_⊥X_I^1              [y1]_⊥X_I^1'[y2]_⊥X_I^2   ...   [y1]_⊥X_I^1'[ym]_⊥X_I^m ]
         [ ...                         ...                             ...                     ]
         [ [ym]_⊥X_I^m'[y1]_⊥X_I^1    ...                              [ym'ym]_⊥X_I^m          ]

and the following vector could be used in place of the [Y'y]_⊥X_I vector:

(II.5)   [ [y1]_⊥X_I^1'[y]_⊥X_I^0 ]
         [ ...                    ]
         [ [ym]_⊥X_I^m'[y]_⊥X_I^0 ]

A method of computing (II.4) and (II.5) by direct orthogonalization is given in appendix B.
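As an illustrative sketch of (II.2) on hypothetical data (here [Y'Y]_⊥X_I is computed from observations rather than by the direct orthogonalization the text recommends), setting k1 = k2 = 0 must reproduce DLS, and k1 = k2 = 1 reproduces the textbook 2SLS estimator:

```python
import numpy as np

def double_k_class(y, Y, X_mu, X_I, k1, k2):
    """Double k-class coefficients as in (II.2); returns [gamma; beta]."""
    W = np.column_stack([Y, y])
    W_perp = W - X_I @ np.linalg.lstsq(X_I, W, rcond=None)[0]
    YY_perp = W_perp[:, :-1].T @ W_perp[:, :-1]          # [Y'Y]_perp_XI
    Yy_perp = W_perp[:, :-1].T @ W_perp[:, -1]           # [Y'y]_perp_XI
    A = np.block([[Y.T @ Y - k1 * YY_perp, Y.T @ X_mu],
                  [X_mu.T @ Y,             X_mu.T @ X_mu]])
    b = np.concatenate([Y.T @ y - k2 * Yy_perp, X_mu.T @ y])
    return np.linalg.solve(A, b)

rng = np.random.default_rng(4)
T = 200
X = rng.standard_normal((T, 4))                          # predetermined variables
Y = X @ rng.standard_normal((4, 2)) + 0.1 * rng.standard_normal((T, 2))
y = Y @ [0.5, -0.3] + X[:, :2] @ [1.0, 2.0] + 0.1 * rng.standard_normal(T)
X_mu, X_I = X[:, :2], X                                  # instruments: all of X

# k1 = k2 = 0: direct least squares of y on [Y : X_mu].
dls = double_k_class(y, Y, X_mu, X_I, 0.0, 0.0)
ols = np.linalg.lstsq(np.column_stack([Y, X_mu]), y, rcond=None)[0]
dls_matches = np.allclose(dls, ols)

# k1 = k2 = 1: two-stage least squares via the adjusted-regressor formula.
tsls = double_k_class(y, Y, X_mu, X_I, 1.0, 1.0)
Zm = np.column_stack([Y, X_mu])
Zm_hat = X_I @ np.linalg.lstsq(X_I, Zm, rcond=None)[0]
tsls_matches = np.allclose(tsls, np.linalg.solve(Zm_hat.T @ Zm, Zm_hat.T @ y))
```

The k1 = k2 = 1 equivalence follows because Y'Y - [Y'Y]_⊥X_I = [Y'Y]_‖X_I and Y'y - [Y'y]_⊥X_I = [Y]_‖X_I'y, so (II.2) collapses to the usual 2SLS normal equations.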
The suggestion that separate instruments be used for the adjustment of each jointly dependent variable for 2SLS is made by Franklin Fisher.¹ He points out that such a procedure introduces certain problems of inconsistency, depending on the assumptions of the model²; however, he then argues effectively as to why the inconsistency introduced is likely to be low.

We will not consider properties of the double k-class estimator if separate instruments are used in the adjustment of each jointly dependent variable. We merely note that if it is desired that a separate set of instruments be used in the adjustment of each jointly dependent variable, a computational method for doing this is given in appendix B. In section II.G we will return to the problem of selecting instruments and note that, due to the use of direct orthogonalization, it may be feasible to merely note the separate instruments which the researcher would prefer to use in the adjustment of each jointly dependent variable and then use all of these instruments in the X_I matrix; that is, adjust all of the jointly dependent variables in the equation by the same set of instruments. If this approach is taken, computational form (II.2) is used.

If 2SLS is the basic computational method used, it is often suggested that asymptotic efficiency will be increased

    ¹Fisher [1965], pp. 602-603, 625-633. Fisher points out (p. 603) that "if different predetermined variables are used in replacing each included endogenous variable (as suggested below), it is not clear how limited-information maximum likelihood carries over to such cases."

    ²Fisher [1965], p. 631.
in the estimation of δ if each explanatory jointly dependent variable is adjusted only by the predetermined variables in the reduced form equation in which this jointly dependent variable occurs, rather than adjusting all explanatory jointly dependent variables by the entire set of predetermined variables in the system.¹ (Assuming that not all predetermined variables occur with non-zero coefficients in each of the reduced form equations being estimated,) this leads, of course, to the use of (II.4) in place of [Y'Y]_{⊥X_I} and of (II.5) in place of [Y'y]_{⊥X_I} (except that y is usually not adjusted).

    ¹Predetermined variables will tend to occur with readily recognized zero coefficients in reduced form equations in systems which are recursive with respect to the coefficients. 2SLS (modified by II.4 and II.5) is still more appropriate than recursive system estimation methods if it is not assumed that the off-diagonal elements of the disturbance variance-covariance matrix are zero.

Derivation of (II.2)²

If the coefficients of (II.1) are estimated by DLS, then, due to the fact that some of the explanatory variables are contemporaneously correlated with the disturbance of the equation, the DLS coefficients do not possess even the property of consistency.³ To take account of the occurrence of explanatory variables which are contemporaneously correlated with the disturbance, let us first rewrite the equation in an adjusted form and then apply DLS to the adjusted equation.

    ²The derivation in this section parallels the derivation of the single k-class method by Chow [1964], pp. 546-548.

    ³Except possibly in special recursive models. See Fisher [1965], pp. 592, 593.
As a first step let us divide the variation of each variable in Y and y into two parts -- that part of the variation which is in the space spanned by the instruments and that part which is orthogonal to the instruments, i.e., we will divide Y and y into:

(II.6)  Y = Y_{∥X_I} + Y_{⊥X_I} ,

(II.7)  y = y_{∥X_I} + y_{⊥X_I} .

Since the instruments are assumed to be asymptotically uncorrelated with the disturbances of all equations, the columns of Y_{∥X_I} and y_{∥X_I}, which lie in the space spanned by X_I, are also asymptotically uncorrelated with the disturbances of all equations, and in particular asymptotically uncorrelated with the disturbance of the μth equation, u. To adjust Y and y for the part asymptotically correlated with the disturbance, let us subtract a constant, g₁, times Y_{⊥X_I} from Y and (recognizing the special role played by the normalizing variable) subtract another constant, g₂, times y_{⊥X_I} from y.¹ If g₁Y_{⊥X_I} and g₂y_{⊥X_I} are subtracted from Y and y respectively, equation (II.1) becomes:

(II.8)  [y - g₂y_{⊥X_I}] = [Y - g₁Y_{⊥X_I}]γ + X_μβ + [u + g₁Y_{⊥X_I}γ - g₂y_{⊥X_I}]

or

(II.9)  [y - g₂y_{⊥X_I}] = [Y - g₁Y_{⊥X_I} ⋮ X_μ]δ + [u + g₁Y_{⊥X_I}γ - g₂y_{⊥X_I}] .

    ¹It might be argued that it is unnecessary to adjust y, the normalizing jointly dependent variable. If y is not adjusted, one obtains Theil's h-class, which contains DLS and 2SLS as particular cases. Some readers may consider the adjustment of y more justified if this adjustment is regarded as the result of a two step process as follows: First, g₁Y_{⊥X_I} is subtracted from Y, giving us:

    y = [Y - g₁Y_{⊥X_I}]γ + X_μβ + [u + g₁Y_{⊥X_I}γ] ,

    and then g₂y_{⊥X_I} is subtracted from the disturbance to (possibly) make the resulting disturbance more homogeneous or (possibly) to reduce the asymptotic correlation between the disturbance and the explanatory variables, [Y - g₁Y_{⊥X_I} ⋮ X_μ], in the equation. If this approach is used, we obtain:

    y = [Y - g₁Y_{⊥X_I}]γ + X_μβ + g₂y_{⊥X_I} + [u + g₁Y_{⊥X_I}γ - g₂y_{⊥X_I}] ,

    which may be rewritten as (II.8). If it is felt that the same adjustment should be made to all jointly dependent variables, then g₁ = g₂ and the single k-class estimators are obtained, of which LIML is an example.

Applying DLS to the adjusted equation (II.9), the double k-class estimator of δ may be written as:

(II.10)  \hat{\delta}_{k_1,k_2} = \{[Y - g₁Y_{⊥X_I} ⋮ X_μ]'[Y - g₁Y_{⊥X_I} ⋮ X_μ]\}^{-1}[Y - g₁Y_{⊥X_I} ⋮ X_μ]'[y - g₂y_{⊥X_I}]

or

(II.11)  \hat{\delta}_{k_1,k_2} = \begin{bmatrix} [Y - g₁Y_{⊥X_I}]'[Y - g₁Y_{⊥X_I}] & [Y - g₁Y_{⊥X_I}]'X_μ \\ X_μ'[Y - g₁Y_{⊥X_I}] & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y - g₁Y_{⊥X_I}]'[y - g₂y_{⊥X_I}] \\ X_μ'[y - g₂y_{⊥X_I}] \end{bmatrix} .

However, [Y - g₁Y_{⊥X_I}]'X_μ = Y'X_μ - g₁Y'_{⊥X_I}X_μ = Y'X_μ - g₁·0 = Y'X_μ,¹ and similarly X_μ'[y - g₂y_{⊥X_I}] = X_μ'y. Also, since Y'_{⊥X_I}Y = Y'_{⊥X_I}Y_{⊥X_I} = [Y'Y]_{⊥X_I} and Y'_{⊥X_I}y = [Y'y]_{⊥X_I} [see (I.56)],

[Y - g₁Y_{⊥X_I}]'[Y - g₁Y_{⊥X_I}] = Y'Y - 2g₁[Y'Y]_{⊥X_I} + g₁²[Y'Y]_{⊥X_I} = Y'Y - (2g₁ - g₁²)[Y'Y]_{⊥X_I}

and

[Y - g₁Y_{⊥X_I}]'[y - g₂y_{⊥X_I}] = Y'y - (g₁ + g₂ - g₁g₂)[Y'y]_{⊥X_I} ,

and (II.11) may be rewritten as:

(II.12)  \hat{\delta}_{k_1,k_2} = \begin{bmatrix} Y'Y - (2g₁ - g₁²)[Y'Y]_{⊥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y - (g₁ + g₂ - g₁g₂)[Y'y]_{⊥X_I} \\ X_μ'y \end{bmatrix} .

The basic estimating formula (II.2) is obtained from (II.12) by letting k₁ = 2g₁ - g₁² and k₂ = g₁ + g₂ - g₁g₂.

    ¹Y'_{⊥X_I}X_μ = 0, since X_μ is in the space spanned by X_I and Y_{⊥X_I} is orthogonal to X_I. See (I.38) and (I.53).

Summary of Methods. If y is left unadjusted (i.e., g₂ is set to zero) and if g₁ is set to 1 - h, Theil's h-class model is obtained as a particular case of the above double k-class model, since k₁ becomes 2g₁ - g₁² = 2(1 - h) - (1 - h)² = 1 - h² and k₂ becomes g₁ + g₂ - g₁g₂ = 1 - h.¹ Theil's h-class formula may be written:

(II.13)  \hat{\delta}_h = \begin{bmatrix} Y'Y - (1 - h²)[Y'Y]_{⊥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y - (1 - h)[Y'y]_{⊥X_I} \\ X_μ'y \end{bmatrix} .

As with the formula for any double k-class member, Y'_{⊥X_I}y may be substituted for [Y'y]_{⊥X_I} in the right hand side vector, since [Y'y]_{⊥X_I} = Y'_{⊥X_I}y.² If the same basic adjustment is made to all jointly dependent variables (i.e., if g₁ and g₂ are restricted to the same scalar value g), the single k-class model with k = 2g - g² is obtained.

    ¹The h-class model may be found in Theil [1961], pp. 353-354 and Nagar [1962], p. 171.

    ²[Y'y]_{⊥X_I} = Y'_{⊥X_I}y_{⊥X_I} = Y'_{⊥X_I}[y - y_{∥X_I}] = Y'_{⊥X_I}y - Y'_{⊥X_I}y_{∥X_I} = Y'_{⊥X_I}y - 0 = Y'_{⊥X_I}y.
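The mapping k₁ = 2g₁ - g₁², k₂ = g₁ + g₂ - g₁g₂ just derived can be checked numerically: DLS applied to the adjusted equation (II.9) and the double k-class formula (II.2) should give identical coefficients. A sketch with assumed (arbitrary) data and adjustment constants:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 150
XI = rng.normal(size=(T, 5))           # instruments
Xmu = XI[:, :2]                         # predetermined variables in the equation
Y = XI @ rng.normal(size=(5, 2)) + rng.normal(size=(T, 2))
y = Y @ [0.3, -0.2] + Xmu @ [1.0, 0.5] + rng.normal(size=T)

resid = lambda M: M - XI @ np.linalg.lstsq(XI, M, rcond=None)[0]
Yp, yp = resid(Y), resid(y)

g1, g2 = 0.4, 0.7                       # arbitrary adjustment constants
# DLS applied to the adjusted equation (II.9)
Zt = np.column_stack([Y - g1 * Yp, Xmu])
d_adj = np.linalg.lstsq(Zt, y - g2 * yp, rcond=None)[0]

# Double k-class formula (II.2) with k1 = 2 g1 - g1^2, k2 = g1 + g2 - g1 g2
k1, k2 = 2 * g1 - g1 ** 2, g1 + g2 - g1 * g2
A = np.block([[Y.T @ Y - k1 * (Y.T @ Yp), Y.T @ Xmu],
              [Xmu.T @ Y, Xmu.T @ Xmu]])
b = np.concatenate([Y.T @ y - k2 * (Y.T @ yp), Xmu.T @ y])
d_k = np.linalg.solve(A, b)
assert np.allclose(d_adj, d_k)
```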
The single k-class formula may be written:¹

(II.14)  \hat{\delta}_k = \begin{bmatrix} Y'Y - k[Y'Y]_{⊥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y - k[Y'y]_{⊥X_I} \\ X_μ'y \end{bmatrix} .

If both g₁ and g₂ are set to zero, the DLS estimation procedure with k₁ = k₂ = 0 is obtained. DLS may be regarded as either a k-class or an h-class member. If g₁ is set to 1, even though g₂ is taken as any value, we have k₁ = 2g₁ - g₁² = 1 and k₂ = g₁ + g₂ - g₁g₂ = 1 + g₂ - g₂ = 1; therefore, setting g₁ to 1 automatically gives k₁ = k₂ = 1 -- the 2SLS estimator. Thus, 2SLS may be regarded as the single k-class member with k = 1 or the h-class member with h = 0.

The LIML estimator is the particular single k-class member in which k equals the smallest eigenvalue (characteristic root) of the matrix [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ}.² This eigenvalue will be greater than 1 except for the particular case rk X_I = n,³ in which case the eigenvalue will be 1. An eigenvalue greater than 1 implies a g which must be expressed as a complex number containing an imaginary part (k = 2g - g² gives g = 1 ± √(1 - k), which is complex for k > 1). The eigenvalue itself will, of course, be real.

    ¹The k-class model may be found in Theil [1961], pp. 231-237.

    ²The LIML estimator is discussed in greater detail in section II.C.1.

    ³n = m + L is the number of "explanatory" variables in the equation.

Two other particular single k-class members which will be considered further on are members suggested by Nagar, which we will refer to as unbiased to O(T⁻¹) in probability k (UBK) and minimum second moment (MSM).

B. Methods Which are Both h-class and Single k-class

1. Direct least squares (DLS)

As indicated earlier, direct least squares (DLS) coefficients may be obtained by setting g₁ and g₂ to 0, which implies a k of 0 and an h of 1. (II.14) becomes:

(II.15)  \hat{\delta}_{DLS} = \begin{bmatrix} Y'Y & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y \\ X_μ'y \end{bmatrix} .

In estimating by DLS, all of the jointly dependent variables except the normalizing variable are treated the same as the predetermined variables.
If more than one jointly dependent variable occurs in the equation, the DLS estimated coefficients are not even consistent, since jointly dependent variables which are not even asymptotically uncorrelated with the disturbance are used as independent variables. Even in this case, however, DLS coefficients have some desirable properties, such as small dispersion of coefficients about their expected values and finite sample coefficient variance-covariance matrices.¹

    ¹See Fisher [1965], pp. 591-592, 604-605.

2. Two-stage least squares (2SLS)¹

As indicated earlier, if g₁ equals 1, g₂ may be anything without changing the resulting coefficients, which we call the two-stage least squares (2SLS) coefficients. If g₂ is considered to be 0, 2SLS may be considered the h-class member with h = 0. If g₂ is considered to be 1, 2SLS may be considered the k-class member with k = 1. The 2SLS estimating formula becomes:²

(II.16)  \hat{\delta}_{2SLS} = \begin{bmatrix} Y'Y - [Y'Y]_{⊥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y - [Y'y]_{⊥X_I} \\ X_μ'y \end{bmatrix} .

Since [Y'Y]_{∥X_I} = Y'Y - [Y'Y]_{⊥X_I} and [Y'y]_{∥X_I} = Y'y - [Y'y]_{⊥X_I}, (II.16) may also be written as:

(II.17)  \hat{\delta}_{2SLS} = \begin{bmatrix} [Y'Y]_{∥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y'y]_{∥X_I} \\ X_μ'y \end{bmatrix} .

    ¹Basic references on 2SLS include Theil [1961], pp. 228-240, 336-344, and Basmann [1957]. 2SLS is referred to as the generalized classical linear (GCL) estimator by Basmann. (Basmann derived 2SLS at approximately the same time as but independently of Theil.) More recently, Basmann extended his GCL estimator to a partial system or full system estimator.

    ²Theil and Basmann used the X matrix as the matrix of instruments, X_I.

Derivation of 2SLS as a Two-Stage Process³

As a first stage, let us calculate the predicted value of each explanatory jointly dependent variable (each variable in Y) by DLS, using the variables in X_I as explanatory variables in the DLS calculations.

    ³Theil [1961], pp. 228-230 derives 2SLS as a two-stage procedure.
The resulting matrix of predicted values of the variables in Y (often denoted Ŷ) is exactly the matrix Y_{∥X_I}, as noted in section I.D.1. Since X_I is assumed asymptotically uncorrelated with u, Y_{∥X_I} will be asymptotically uncorrelated with u also.

As a second stage, let us substitute Y_{∥X_I} for Y in (II.1) and estimate the vector of coefficients, δ, by DLS, i.e., let us apply DLS using y as the dependent variable and the variables in the matrix [Y_{∥X_I} ⋮ X_μ] as explanatory variables. We get:

(II.18)  \hat{\delta}_{2nd\ stage} = \begin{bmatrix} Y'_{∥X_I}Y_{∥X_I} & Y'_{∥X_I}X_μ \\ X_μ'Y_{∥X_I} & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'_{∥X_I}y \\ X_μ'y \end{bmatrix} .

However, Y'_{∥X_I}Y_{∥X_I} = [Y'Y]_{∥X_I} (by definition of [Y'Y]_{∥X_I}), Y'_{∥X_I}y = [Y'y]_{∥X_I} [see (I.54)], and, since all variables in X_μ are also contained in X_I, Y'_{∥X_I}X_μ = [Y'X_μ]_{∥X_I} = Y'X_μ [see (I.54) and (I.40)]; hence, (II.18) is equivalent to (II.16) and (II.17). Although δ̂_2SLS calculated in two steps (i.e., by actually calculating Y_{∥X_I}) is algebraically the same as the computational formula for δ̂_2SLS given in (II.16), the computational formula given in (II.16) should result in less rounding error, since the computations are more direct.

Derivation of 2SLS as an Instrumental Variables Estimator Technique

In section II.D.2, 2SLS is derived as an instrumental variables estimator technique.

Derivation of 2SLS as an Application of Generalized Least Squares

In section IV.D, 2SLS is derived as an application of Aitken's generalized least squares.

Derivation of 2SLS as the Least Variance Difference

Consider linearly combining the jointly dependent variables in the equation into a single jointly dependent variable, y*, by postmultiplying these variables by a vector of coefficients, i.e., consider the calculation of y* = [y ⋮ Y]\begin{bmatrix} 1 \\ -γ* \end{bmatrix} .
Let the residual sum of squares from regressing y* on the predetermined variables in the equation be denoted û'û (i.e., û'û = [y*'y*]_{⊥X_μ}) and the residual sum of squares from regressing y* on all of the instruments be denoted v̂'v̂ (i.e., v̂'v̂ = [y*'y*]_{⊥X_I}). Then û'û - v̂'v̂ will be minimized if γ̂_2SLS is used as γ*.¹ Also, if γ̂_2SLS is used as γ*, the DLS coefficients obtained by regressing y* on the predetermined variables in the equation will be β̂_2SLS.²

    ¹Basmann [1960a], pp. 100-102.

    ²Ibid.

The above least variance difference (LVD) property is intuitively desirable, since it causes the jointly dependent variables to be linearly combined such that the instruments which are specified a priori as being outside the equation add as little as possible to the explanation of the combined dependent variable (y*). Such an intuitively desirable property can, however, be easily over-emphasized. The LVR (least variance ratio) property of LIML (limited information single equation maximum likelihood) estimates would seem to be as appealing. LVR estimates may be derived in the same manner as LVD estimates, except that instead of selecting γ* to minimize û'û - v̂'v̂, γ* is selected to minimize û'û/v̂'v̂, which is equivalent to minimizing (û'û - v̂'v̂)/v̂'v̂. If γ̂_LIML is used as γ*, then û'û/v̂'v̂ will be a minimum.¹ Also, β̂_LIML will be the DLS coefficients obtained by regressing y* (calculated by using γ̂_LIML as γ*) on the predetermined variables in the equation.²

    ¹û'û/v̂'v̂ = [û'û/v̂'v̂] - 1 + 1 = [(û'û - v̂'v̂)/v̂'v̂] + 1. Since û'û/v̂'v̂ and (û'û - v̂'v̂)/v̂'v̂ differ by the additive constant 1, they are minimized by the same values of γ*. Also, k_LIML = û'û/v̂'v̂. See Koopmans and Hood [1953], pp. 166-169.

    ²Ibid.

C. Additional Single k-class Methods

1.
Limited information single equation maximum likelihood (LIML)¹

If k_LIML is calculated as the smallest eigenvalue (characteristic root) of the matrix [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ}, then the limited information single equation maximum likelihood (LIML) coefficients may be calculated by the usual single k-class formula.² (As noted earlier, ₊Y = [y ⋮ Y]; hence, [₊Y'₊Y]_{⊥X_I} and [₊Y'₊Y]_{⊥X_μ} are the moment matrices of the parts of the jointly dependent variables in the equation orthogonal to the instruments and to the predetermined variables in the equation, respectively. Computational formulas for these matrices are given in section I.D.2 and appendix B.)³

If the matrix X is used as the matrix of instruments, X_I, and it is assumed that the matrix of disturbances of the system has the multivariate normal distribution, then the LIML estimates are maximum likelihood estimates given the limited amount of information used (the jointly dependent variables in the equation, the predetermined variables in the system, and which predetermined variables have zero coefficients in the equation).¹ If the matrix X is not used as the matrix of instruments, then the resulting coefficients are not, strictly speaking, the usual limited information maximum likelihood coefficients, since the predetermined variables in a matrix of instruments have been substituted for the predetermined variables in the system.

    ¹Basic references for the LIML estimator are Anderson and Rubin [1949], Koopmans and Hood [1953], pp. 162-170, and Chernoff and Divinsky [1953], pp. 240-246.

    ²See Theil [1961], p. 231. The computational equivalence of k_LIML as defined above and the more commonly expressed formulas for the calculation of k_LIML is noted further on in this section. That the matrix [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ} is not a symmetric matrix must be taken into account in extracting k_LIML.

    ³[₊Y'₊Y]_{⊥X_μ} may be calculated as an intermediate step of the calculation of [₊Y'₊Y]_{⊥X_I} in the manner noted in appendix A.
LIML estimation utilizes the same information as 2SLS, and the LIML coefficients have the same asymptotic coefficient variance-covariance matrix as the 2SLS coefficients.² As noted at the end of section II.B.2, LIML coefficients may be derived as the coefficients with the least variance ratio (LVR).

If rk X_I = n, then k_LIML = 1 = k_2SLS; therefore, the coefficients for LIML and 2SLS coincide.³ If rk X_I > n, then k_LIML > 1.⁴ If rk X_I < n, a singular matrix is encountered during estimation (a unique solution does not exist).

    ¹Koopmans and Hood [1953], pp. 166-170.

    ²Theil [1961], p. 232.

    ³Theil [1961], p. 232. As noted in section II.B.2, k_LIML = LVR = û'û/v̂'v̂, where û'û and v̂'v̂ are as defined in section II.B.2. If rk X_I = n, then there are effectively only m = n - L instruments in addition to the predetermined variables in the equation. The m + 1 jointly dependent variables may, therefore, be combined into a single jointly dependent variable in such a way that û'û = v̂'v̂. See section II.D for additional detail showing the equivalence of 2SLS and LIML for the case rk X_I = n.

    ⁴If rk X_I > n, then û'û ≠ v̂'v̂, since there are effectively more than m instruments in addition to the predetermined variables in the equation; hence, k_LIML = û'û/v̂'v̂ > 1. See Koopmans and Hood [1953], pp. 171-175.

Usual LIML Formulas

The LIML formula given above is not the most common formula for LIML. To see the relationship to more commonly quoted LIML formulas, we will first note some relationships between eigenvalues (characteristic roots) and eigenvectors. Let A and B be n×n symmetric positive definite matrices.¹ Then the determinantal² equation

(II.19)  det(A - c_iB) = 0

has n solutions, c₁ ... c_n, of which some of the c_i (the eigenvalues) may be duplicates (i.e., there may be only m distinct roots, with m ≤ n).

    ¹The matrix B in this section is any n×n positive definite symmetric matrix -- not the matrix of coefficients of predetermined variables as in other sections of this paper.

    ²det denotes determinant.
Since a determinantal equation is not changed by multiplying both sides by a constant, the determinantal equation

(II.20)  det(B^{-1})·det(A - c_iB) = det(B^{-1}A - c_iI) = 0

has the same eigenvalue solutions (the same c_i) as det(A - c_iB) = 0. (II.19) may also be converted to another problem -- the calculation of the eigenvalues of the equation

(II.21)  (A - c_iB)d_i = 0 ,

where associated with each eigenvalue, c_i, is an n×1 eigenvector, d_i. Premultiplying (II.21) by d_i' we have:

(II.22)  d_i'(A - c_iB)d_i = d_i'Ad_i - c_i d_i'Bd_i = 0

or

(II.23)  d_i'Ad_i = c_i d_i'Bd_i

or

(II.24)  c_i = (d_i'Ad_i)/(d_i'Bd_i) .

That is, each eigenvalue, c_i, must meet relationship (II.24) with its associated eigenvector, d_i. Similarly, from either (II.20) or (II.21) we can derive that

(II.25)  (B^{-1}A - c_iI)d_i = 0 ;

thus,

(II.26)  d_i'(B^{-1}A - c_iI)d_i = 0

and

(II.27)  c_i = (d_i'(B^{-1}A)d_i)/(d_i'd_i) .

For LIML estimation A = [₊Y'₊Y]_{⊥X_μ} and B = [₊Y'₊Y]_{⊥X_I}; the minimum c_i from any of the above formulations becomes k_LIML, and the corresponding d_i becomes ₊γ̂_LIML.¹ β̂_LIML may be calculated as β̂_LIML = -[X_μ'X_μ]^{-1}X_μ'₊Y ₊γ̂_LIML. In the formula which is given in this paper, the smallest eigenvalue of [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ} becomes k_LIML, which is substituted into the general k-class formula to calculate δ̂_LIML.

Another equivalent formulation is to use [₊Y'₊Y]_{⊥X_μ} - [₊Y'₊Y]_{⊥X_I} as A and [₊Y'₊Y]_{⊥X_I} as B. (This is the formulation of Anderson and Rubin [1949] and Chernoff and Divinsky [1953].) The minimum c_i becomes k_LIML - 1, but the associated eigenvector d_i is still ₊γ̂_LIML.

    ¹Koopmans and Hood [1953], pp. 170-173.
The eigenvalues of A^{-1}B are the reciprocals of the eigenvalues of B^{-1}A; hence, instead of extracting the smallest eigenvalue of [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ}, the largest eigenvalue of [₊Y'₊Y]_{⊥X_μ}^{-1}[₊Y'₊Y]_{⊥X_I} may be extracted and then k_LIML calculated as 1 divided by this eigenvalue.

Neither A^{-1}B nor B^{-1}A is symmetric; hence, a non-symmetric eigenvalue computer subroutine is required to extract the desired eigenvalues. Computational procedures for extracting eigenvalues of a matrix of the special form A^{-1}B, with A positive definite and B positive semi-definite, are available, or a computational scheme for more general non-symmetric matrices may be used.

Extraction of an eigenvalue as k_LIML and substitution of k_LIML into the usual k-class formula in order to calculate the LIML coefficients makes it unnecessary to calculate the corresponding eigenvector while the root is calculated, thereby saving computer time. Use of the eigenvector corresponding to k_LIML as ₊γ̂_LIML and then calculating β̂_LIML as -[X_μ'X_μ]^{-1}X_μ'₊Y ₊γ̂_LIML also requires special programming, thereby again giving incentive to calculate LIML coefficients through use of the k-class formula. (In addition, as is noted in Chapter III, calculation of the estimated coefficient variance-covariance matrix does not require special programming if the general k-class coefficient variance-covariance formula is used.)

The two smallest eigenvalues provide information. The smallest is used as k_LIML, and the first two smallest eigenvalues may be used in a test of identifiability of a structural equation.¹ The closeness of the second smallest eigenvalue to the smallest eigenvalue gives an indication of the "explosiveness" of the resulting LIML coefficients, as noted in Klein and Nakamura [1962], pp. 294-295.
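The eigenvalue computation can be illustrated with a modern substitute for the special-form routines mentioned above: because A and B are symmetric and B is positive definite, whitening by the Cholesky factor of B reduces A d = c B d to an ordinary symmetric eigenproblem, so no general non-symmetric routine is needed. A sketch with simulated data (not the dissertation's procedure; all names are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 300
XI = rng.normal(size=(T, 6))            # instruments; here rk X_I > n
Xmu = XI[:, :2]                          # predetermined variables in the equation
Y = XI @ rng.normal(size=(6, 2)) + rng.normal(size=(T, 2))
y = Y @ [0.6, -0.4] + Xmu @ [1.0, 0.5] + rng.normal(size=T)
Yplus = np.column_stack([y, Y])          # +Y = [y : Y]

def perp_moments(W, V):
    """[V'V]_{perp W}: moment matrix of the part of V orthogonal to W."""
    R = V - W @ np.linalg.lstsq(W, V, rcond=None)[0]
    return R.T @ R

A = perp_moments(Xmu, Yplus)             # [+Y'+Y]_{perp X_mu}
B = perp_moments(XI, Yplus)              # [+Y'+Y]_{perp X_I}

# A d = c B d with A, B symmetric and B positive definite: whiten with the
# Cholesky factor of B, then use an ordinary symmetric eigenvalue routine.
L = np.linalg.cholesky(B)
M = np.linalg.solve(L, np.linalg.solve(L, A).T)   # L^{-1} A L^{-T} (A symmetric)
k_liml = np.linalg.eigvalsh(M)[0]                  # smallest root = k_LIML

# Substitute k_LIML into the single k-class formula (II.14).
resid = lambda V: V - XI @ np.linalg.lstsq(XI, V, rcond=None)[0]
Yp, yp = resid(Y), resid(y)
Ak = np.block([[Y.T @ Y - k_liml * (Y.T @ Yp), Y.T @ Xmu],
               [Xmu.T @ Y, Xmu.T @ Xmu]])
bk = np.concatenate([Y.T @ y - k_liml * (Y.T @ yp), Xmu.T @ y])
delta_liml = np.linalg.solve(Ak, bk)
```

Since the equation here is over-identified (rk X_I > n), k_LIML comes out strictly greater than 1, as the text predicts.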
If Λ > n, no solution exists.¹ For a solution to exist, Λ - n of the equations given in (II.31) must be thrown away (i.e., Λ - n of the rows of [X'X]^{-1} must be deleted), thereby ignoring information and making the actual estimates obtained depend on the particular rows of [X'X]^{-1} deleted. Equivalently, Λ - n of the predetermined variables in the system can be ignored, thereby forcing the number of predetermined variables used in the estimation to equal n. This gives rise to a different set of ILS coefficients for each different set of predetermined variables ignored.

    ¹Assuming X has full column rank, an equation in which Λ = n is just-identified by the counting rule for identification, and an equation in which Λ > n is over-identified by the counting rule for identification. The case of just-identification is also referred to as the case of "minimum requisite information" and the case of over-identification as the case of "extra information." Λ ≥ n is a necessary but not sufficient condition for the population coefficients to be identifiable. See Koopmans and Hood [1953], pp. 135-142.

Further Derivation Assuming Λ = n

Let us assume that for the μth equation, Λ = n. Given our assumption that X has full column rank, this implies that Z_μ'X is square. Let us further assume that Z_μ'X is non-singular. A set of simultaneous equations may be premultiplied by a non-singular matrix without changing the solution. Premultiplying (II.31) by Z_μ'X we get:

(II.32)  Z_μ'X[X'X]^{-1}X'Y γ̂_ILS + Z_μ'X_μ β̂_ILS = Z_μ'X[X'X]^{-1}X'y .

Since

Z_μ'X[X'X]^{-1}X'Y = [Z_μ'Y]_{∥X} = \begin{bmatrix} [Y'Y]_{∥X} \\ X_μ'Y \end{bmatrix}

and

Z_μ'X[X'X]^{-1}X'y = [Z_μ'y]_{∥X} = \begin{bmatrix} [Y'y]_{∥X} \\ X_μ'y \end{bmatrix} ,

(II.32) may be rewritten as:¹

(II.33)  \begin{bmatrix} [Y'Y]_{∥X} \\ X_μ'Y \end{bmatrix} γ̂_ILS + \begin{bmatrix} Y'X_μ \\ X_μ'X_μ \end{bmatrix} β̂_ILS = \begin{bmatrix} [Y'y]_{∥X} \\ X_μ'y \end{bmatrix}

or

(II.34)  \begin{bmatrix} [Y'Y]_{∥X} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix} δ̂_ILS = \begin{bmatrix} [Y'y]_{∥X} \\ X_μ'y \end{bmatrix} .
Therefore,

(II.35)  δ̂_ILS = \begin{bmatrix} [Y'Y]_{∥X} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y'y]_{∥X} \\ X_μ'y \end{bmatrix} .

However, (II.35) is the 2SLS computational formula (II.17) in which X is used as the matrix of instruments, X_I. Thus, ILS coefficients may be computed through use of the 2SLS formula.

    ¹[X_μ'Y]_{∥X} = X_μ'Y and [X_μ'y]_{∥X} = X_μ'y, since the variables in X_μ are contained in X. See (I.42).

2. The instrumental variables estimator (IV)

In our general double k-class methods, we have been adjusting the jointly dependent variables in the equation by a matrix of instrumental variables. In this section, we will consider the calculation of coefficients by a computational method (fairly widely used before the k-class methods such as LIML and 2SLS were devised) called the instrumental variables (IV) estimating method.¹ In this method, the same number of instruments as there are variables in the equation are used. We will show that, on the one hand, the IV coefficients may be calculated by the 2SLS computational method and that, on the other hand, 2SLS coefficients may be derived as a particular case of the IV method. The choice of instruments for 2SLS and IV should apparently be based on the same criteria, except that only n instruments can be chosen for the IV method.²

Let a single equation from a system of equations be written as

(II.36)  y = Z_μδ + u ,

where all of the matrices and vectors have the same dimension and meaning as before [see (I.10) and (II.1)]. Let X_IV be a matrix of n instrumental variables (the same number of instrumental variables as there are columns in Z_μ) and assume that X_IV'Z_μ is nonsingular (hence, that X_IV and Z_μ have full column rank and that correlation exists between the variables in X_IV and the variables in Z_μ).

    ¹Goldberger [1964] contains a detailed treatment of the instrumental variables method.

    ²Choice of instruments is discussed in section II.G.
Premultiplying (II.36) by X_IV' we get:

(II.37)  X_IV'y = X_IV'Z_μδ + X_IV'u .

If we let the estimating equations for δ̂_IV be

(II.38)  X_IV'Z_μ δ̂_IV = X_IV'y
         (n×n)   (n×1)   (n×1) ,

we get:

(II.39)  δ̂_IV = [X_IV'Z_μ]^{-1}X_IV'y .

If the variables in X_IV are contemporaneously independent of u (thus, plim (1/T)X_IV'u = 0) and correlated with Z_μ, so that plim (1/T)X_IV'Z_μ = Q_{X_IV Z_μ} exists and is non-singular, then δ̂_IV is a consistent estimate of δ.¹

    ¹A proof of the consistency of δ̂_IV is given in Goldberger [1964], p. 285.

Calculation of IV Problems on a 2SLS Computer Routine

It is not necessary to develop a computer program to calculate IV estimates, since δ̂_IV may be calculated on a double k-class computer routine as a 2SLS problem. Let us premultiply equation (II.38) by the non-singular square matrix Z_μ'X_IV[X_IV'X_IV]^{-1} (premultiplication by a non-singular matrix will not change the solution), so that the estimating equations become:

(II.40)  Z_μ'X_IV[X_IV'X_IV]^{-1}X_IV'Z_μ δ̂_IV = Z_μ'X_IV[X_IV'X_IV]^{-1}X_IV'y

or

(II.41)  [Z_μ'Z_μ]_{∥X_IV} δ̂_IV = [Z_μ'y]_{∥X_IV} .

Then:

(II.42)  δ̂_IV = [Z_μ'Z_μ]_{∥X_IV}^{-1}[Z_μ'y]_{∥X_IV} .

Comparison of (II.42) with (II.17) or (II.35) shows that (II.42) may be computed on a 2SLS routine by merely treating Z_μ as Y (i.e., by treating all of the variables in the equation as if they were jointly dependent variables) and by using X_IV as the matrix of instruments, X_I. (Only the upper left hand submatrix will remain in the matrix to be inverted, and only the upper subvector will remain in the right hand side vector.) The variables in X_μ are treated as jointly dependent in the computation only -- not in the interpretation of results.

The calculation of δ̂_IV as a special 2SLS problem also provides a convenient way of calculating the estimated coefficient variance-covariance matrix, as is noted in the section on coefficient variance-covariance estimation.
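The equivalence of the direct IV solution (II.39) and its 2SLS-routine form (II.42) can be checked numerically; the data below are an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 100
XIV = rng.normal(size=(T, 3))            # n = 3 instruments
Z = XIV @ rng.normal(size=(3, 3)) + 0.5 * rng.normal(size=(T, 3))  # Z_mu, n columns
y = Z @ [1.0, -1.0, 0.5] + rng.normal(size=T)

# (II.39): direct IV solution of the estimating equations X_IV' Z delta = X_IV' y
d_iv = np.linalg.solve(XIV.T @ Z, XIV.T @ y)

# (II.42): the same problem in 2SLS form -- project Z onto span(X_IV) first
Zhat = XIV @ np.linalg.lstsq(XIV, Z, rcond=None)[0]
d_2sls_form = np.linalg.solve(Zhat.T @ Z, Zhat.T @ y)
assert np.allclose(d_iv, d_2sls_form)
```

Here Zhat.T @ Z is [Z_μ'Z_μ]_{∥X_IV} and Zhat.T @ y is [Z_μ'y]_{∥X_IV}, i.e., all variables in the equation are treated as jointly dependent, exactly as the text describes.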
If some of the predetermined variables in the equation are used as their own instruments (or are even linear combinations of variables in X_IV), these predetermined variables need not be treated as jointly dependent for computational purposes. Let Z_μ = [Z_μ* ⋮ X_μ*], where the variables are rearranged in the equation to permit listing the predetermined variables which serve as instruments last. These variables comprise the X_μ* matrix. (II.42) may be rewritten as:

(II.43)  δ̂_IV = \begin{bmatrix} [Z_μ*'Z_μ*]_{∥X_IV} & [Z_μ*'X_μ*]_{∥X_IV} \\ [X_μ*'Z_μ*]_{∥X_IV} & [X_μ*'X_μ*]_{∥X_IV} \end{bmatrix}^{-1} \begin{bmatrix} [Z_μ*'y]_{∥X_IV} \\ [X_μ*'y]_{∥X_IV} \end{bmatrix} .

Since X_μ* is in the space spanned by X_IV, [Z_μ*'X_μ*]_{∥X_IV} = Z_μ*'X_μ*, [X_μ*'X_μ*]_{∥X_IV} = X_μ*'X_μ*, and [X_μ*'y]_{∥X_IV} = X_μ*'y [see (I.39)]; therefore, (II.43) may be rewritten as:

(II.44)  δ̂_IV = \begin{bmatrix} [Z_μ*'Z_μ*]_{∥X_IV} & Z_μ*'X_μ* \\ X_μ*'Z_μ* & X_μ*'X_μ* \end{bmatrix}^{-1} \begin{bmatrix} [Z_μ*'y]_{∥X_IV} \\ X_μ*'y \end{bmatrix} .

Comparison of (II.44) with (II.17) shows that (II.44) is the computational formula for 2SLS in which the variables in X_μ* are treated as predetermined variables in the equation, the variables in Z_μ* are treated as explanatory jointly dependent variables in the equation (for computational purposes -- not for interpretation of results), and the variables in X_IV are treated as predetermined variables in the system. If (as is the usual practice) all of the predetermined variables in the equation are used as instruments, then Z_μ* = Y, X_μ* = X_μ, and (II.44) becomes:

(II.45)  δ̂_IV = \begin{bmatrix} [Y'Y]_{∥X_IV} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y'y]_{∥X_IV} \\ X_μ'y \end{bmatrix} ,

the usual 2SLS computational formula in which X_IV is used as the matrix of instruments, X_I.

The above computational methods do not require that predetermined variables in the equation be selected as instruments; however, most criteria for selecting instruments make the predetermined variables in the equation prime candidates as instruments.
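The simplification in (II.44) -- predetermined variables that serve as their own instruments need not be treated as jointly dependent -- can likewise be checked numerically; the data are again an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 120
W = rng.normal(size=(T, 3))                  # candidate instruments outside the equation
Xmu = rng.normal(size=(T, 2))                # predetermined variables in the equation
XIV = np.column_stack([W[:, :1], Xmu])       # n = 3 instruments; X_mu as its own instruments
Zstar = W @ rng.normal(size=(3, 1)) + rng.normal(size=(T, 1))  # "jointly dependent" column
Z = np.column_stack([Zstar, Xmu])            # Z_mu = [Z_mu* : X_mu*]
y = Z @ [0.5, 1.0, -1.0] + rng.normal(size=T)

# (II.39): direct IV
d_iv = np.linalg.solve(XIV.T @ Z, XIV.T @ y)

# (II.44): only Z_mu* is projected onto span(X_IV); X_mu* enters untouched
Zhat = XIV @ np.linalg.lstsq(XIV, Zstar, rcond=None)[0]
A = np.block([[Zhat.T @ Zstar, Zstar.T @ Xmu],
              [Xmu.T @ Zstar, Xmu.T @ Xmu]])
b = np.concatenate([Zhat.T @ y, Xmu.T @ y])
d_mixed = np.linalg.solve(A, b)
assert np.allclose(d_iv, d_mixed)
```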
As noted above, not selecting a predetermined variable in an equation as an instrument has the same computational effect (assuming that the variable is not in the space spanned by the instruments) as if the variable had been reclassified as a jointly dependent variable.

If Λ = n, then (as in double k-class estimation) the instruments are usually taken to be the predetermined variables in the system, so that X_IV = X_I = X and (II.45) becomes the usual 2SLS computational procedure. If Λ > n, unique IV estimates do not exist if the instruments are selected from among the Λ predetermined variables in the system, since the number of predetermined variables in the system is greater than the number of instruments to be selected. On the other hand, a unique set of m instruments can be calculated from the Λ predetermined variables (this set of instruments consists of the variables in Y_{∥X}) such that if X_IV = [Y_{∥X} ⋮ X_μ], 2SLS estimates (with X_I = X) are obtained.

Derivation of 2SLS as an IV Method

We have been treating IV as a special case of 2SLS, at least computationally. It is interesting to note that even if, in 2SLS estimation, the matrix of instruments, X_I, is of rank greater than n (hence, there are more instruments in the X_I matrix than variables in the equation), 2SLS may be considered to be a particular case of IV.¹ To demonstrate this, we use the variables in the matrix [Y_{∥X_I} ⋮ X_μ] as instruments.² (II.39) becomes:

(II.46)  δ̂_IV = \{[Y_{∥X_I} ⋮ X_μ]'[Y ⋮ X_μ]\}^{-1}[Y_{∥X_I} ⋮ X_μ]'y = \begin{bmatrix} Y'_{∥X_I}Y & Y'_{∥X_I}X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'_{∥X_I}y \\ X_μ'y \end{bmatrix} .

Since Y'_{∥X_I}Y = [Y'Y]_{∥X_I}, Y'_{∥X_I}y = [Y'y]_{∥X_I}, and Y'_{∥X_I}X_μ = [Y'X_μ]_{∥X_I} = Y'X_μ [see (I.54) and (I.40)], the above may be rewritten as:

(II.47)  δ̂_IV = \begin{bmatrix} [Y'Y]_{∥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y'y]_{∥X_I} \\ X_μ'y \end{bmatrix} = δ̂_2SLS .

The variables in Y_{∥X_I} serve as valid instruments since they are linear combinations of variables in X_I.

    ¹Goldberger [1964], p. 332.

    ²Y_{∥X_I} is the part of Y in the space spanned by X_I.

E.
No Predetermined Variables in an Equation

All of the formulas for estimating coefficients given earlier remain valid even if no predetermined variables occur in the equation to be estimated. In the LIML computations, [₊Y'₊Y]_{⊥X_μ} (the moment matrix of the part of ₊Y orthogonal to the columns of X_μ) becomes ₊Y'₊Y.

F. Only One Jointly Dependent Variable in an Equation

If only one jointly dependent variable occurs in an equation, all of the double k-class methods become the same as DLS, since equation (II.2) becomes simply:

(II.48)  δ̂_{k₁,k₂} = [X_μ'X_μ]^{-1}X_μ'y .

G. Selection of Instruments

In this section we will discuss the selection of instruments for double k-class methods, limiting our discussion for the sake of simplicity to 2SLS for most of the section. Instruments selected for double k-class methods should have the same basic characteristics as instruments selected for the IV estimator, except that it seems less desirable to form linear combinations of possible instruments for the double k-class members, since the number of instruments to be used is not restricted to a given number -- the number of explanatory variables in the equation -- as is the case with the IV method. We will not make explicit further reference to the IV method, since it is most fruitful to regard (and calculate) IV coefficients as the special case of 2SLS in which there are the same number of instruments as explanatory variables in the equation.¹

We will assume that the predetermined variables in the equation are among the instruments in the X_I matrix. For consistency in double k-class estimation we will assume that the matrix of instruments is contemporaneously independent of the disturbance of the equation being estimated (see appendix C).

    ¹See section II.D.2.
This suggests that the predetermined variables in the system are prime candidates for choices as instruments; however, cases may arise where it is desirable that not all of the predetermined variables in the system be used as instruments, or that additional variables which are not predetermined variables in the system be used as instruments.1 If all of the assumptions made at the start of this paper strictly held, one would be led toward the use of all of the predetermined variables in the system as the set of instruments; however, since these assumptions are likely to hold only imperfectly in practice, it may be desirable that given instruments be eliminated from the X_I matrix. For example, if the disturbances of an equation are "slightly" serially correlated, one would surely question the use of lagged jointly dependent variables outside the equation but in the system as instruments.

1See section II.D.2.

1See Fisher [1965] and Goldberger [1964] for expositions on the choice of instruments.

It has been usual (where possible) to use all of the predetermined variables in the system as instruments, the justification for using all of them being based on the fact that the unrestricted reduced form equations corresponding to the jointly dependent variables contain all of the predetermined variables in the system. Examination of many systems (especially those in which the matrix of coefficients of the structural equations has a recursive structure) often discloses that certain of the predetermined variables occur with zero coefficients in the reduced form equation corresponding to a given jointly dependent variable. Often a separate set of instruments is then used in the adjustment of each jointly dependent variable. (If a separate set of instruments is used for each jointly dependent variable, the formulas given for the double k-class estimators can be modified in the manner noted in section II.A and appendix B.)
Alternatively, only those predetermined variables which have zero coefficients in the reduced form equations corresponding to all of the jointly dependent variables may be eliminated.

Fisher takes a causal approach to the selection of instruments which leads him to the examination of the structural equations rather than the reduced form equations as the basis for choice of instruments, the key structural equation for each explanatory jointly dependent variable being the one in which that variable occurs as the normalizing variable. (Fisher takes the approach that in a fully specified system, each jointly dependent variable will occur as the normalizing variable in one structural equation.) This also leads him to the possibility of using a separate set of instruments to adjust each explanatory jointly dependent variable. Examination of alternative assumptions made in the block recursive systems which Fisher examines leads him to question (in some cases) the use of lagged jointly dependent variables and to suggest the use of lagged exogenous variables as instruments.

A model may also be examined for partitioning into subsystems.
After partitioning, the instruments in the estimation of each equation in a subsystem are based on the predetermined variables in that subsystem. One basis for partitioning is the degree of correlation of the disturbances (estimated in some manner or assumed a priori). Also, Hannan [1967] gives a method for subdividing a system of equations of a special form into non-intersecting "maximal" subsystems.

The minimum number of instruments which can be selected for 2SLS is n (at least m instruments must be selected in addition to the L predetermined variables in the equation), since otherwise the [Y_{|X_I} \,\vdots\, X_\mu] matrix has rank less than n and, therefore, the matrix

\begin{bmatrix} [Y'Y]_{|X_I} & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}

is singular, i.e., unique 2SLS coefficients do not exist.1 There is no maximum number of instruments which may be chosen if direct orthogonalization is used as suggested in this paper; however, if rk X_I = T, all double k-class coefficients become the same as the DLS coefficients.2

1This is also the minimum number of instruments for LIML, since if the number of instruments is less than or equal to n, the LVR (least variance ratio) for LIML is (see sections II.B.2 and II.C.1) \hat u'\hat u/\tilde u'\tilde u = 1 (since the jointly dependent variables may be combined such that \hat u'\hat u is no larger than \tilde u'\tilde u); hence, LIML becomes the same as 2SLS, for which (in this case) unique coefficients do not exist. (Depending on the computational formula for k_{LIML}, arbitrarily large numbers may be encountered during the computation of k_{LIML} if the number of instruments is less than n.)

2In this case, the particular instruments chosen will have no effect on the coefficients obtained (provided rk X_I = T), since the DLS solution is obtained in any case.

This comes about because if rk X_I = T, then Y_{\perp X_I} = 0, since every variable in the observation matrix falls in the space spanned by X_I (i.e., every variable may be expressed as an exact linear combination of the variables in X_I); hence, the part of any jointly dependent variable orthogonal to X_I is zero. Thus, [Y'Y]_{\perp X_I} = Y_{\perp X_I}'Y_{\perp X_I} = 0'0 = 0, [Y'y]_{\perp X_I} = Y_{\perp X_I}'y_{\perp X_I} = 0'y_{\perp X_I} = 0, and the general double k-class formula (II.2) becomes:1

(II.49)  \hat\delta_{k_1,k_2} = \begin{bmatrix} Y'Y - k_1 \cdot 0 & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}^{-1} \begin{bmatrix} Y'y - k_2 \cdot 0 \\ X_\mu'y \end{bmatrix}
                              = \begin{bmatrix} Y'Y & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}^{-1} \begin{bmatrix} Y'y \\ X_\mu'y \end{bmatrix} = \hat\delta_{DLS}

Thus, when rk X_I = T, all double k-class methods give estimated coefficients which coincide with the DLS coefficients.

The fact that all double k-class coefficients coincide with DLS coefficients when rk X_I = T does not destroy the consistency of given double k-class methods. All that is indicated is that there are insufficient observations to distinguish the double k-class members from each other based on the estimated coefficients obtained.2

1In this formula and the remainder of this section we will assume that only zero restrictions are imposed on the coefficients and that the matrix [Y \,\vdots\, X_\mu] has full column rank (this implies that n \le T).

2This was pointed out to the writer by Professor Robert L. Gustafson. Consistency is an asymptotic property, and a small number of observations in a given sample certainly does not affect an asymptotic property. If the number of instruments used in the estimation is fixed (e.g., the number of predetermined variables in the system is fixed for a given model; hence, if X is used as X_I, then the number of variables in X_I is fixed), then (if the double k-class formulas are followed; that is, a switch is not made to the DLS formula) as T increases, at some point there will be sufficient observations that rk X_I < T and the coefficients of the double k-class members will not coincide with DLS coefficients.

Historical Perspective

The formulas presently quoted in the econometric literature for the calculation of [Y'Y]_{|X_I} and [Y'y]_{|X_I} are

[Y'Y]_{|X_I} = Y'X_I(X_I'X_I)^{-1}X_I'Y   and   [Y'y]_{|X_I} = Y'X_I(X_I'X_I)^{-1}X_I'y.
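The double k-class algebra above--the general formula (II.2), its one-jointly-dependent-variable reduction (II.48), the singularity when fewer than n independent instruments are chosen, and the collapse to DLS in (II.49) when rk X_I = T--can all be exercised numerically. A minimal sketch with simulated, hypothetical data (variable names follow the text; the projection is computed with a pseudoinverse rather than the paper's orthogonalization routine):

```python
import numpy as np

def double_k_class(y, Y, X_mu, X_I, k1, k2):
    """General double k-class formula (II.2); a sketch, data hypothetical."""
    P = X_I @ np.linalg.pinv(X_I)                 # projection onto span(X_I)
    Y_perp, y_perp = Y - P @ Y, y - P @ y         # parts orthogonal to X_I
    m, L = Y.shape[1], X_mu.shape[1]
    A = np.empty((m + L, m + L))
    A[:m, :m] = Y.T @ Y - k1 * (Y_perp.T @ Y_perp)
    A[:m, m:] = Y.T @ X_mu
    A[m:, :m] = X_mu.T @ Y
    A[m:, m:] = X_mu.T @ X_mu
    b = np.concatenate([Y.T @ y - k2 * (Y_perp.T @ y_perp), X_mu.T @ y])
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
T = 25
X_mu = rng.normal(size=(T, 2))
Y = rng.normal(size=(T, 1))
y = rng.normal(size=T)

# (II.48): only the normalized jointly dependent variable occurs (Y empty),
# so every (k1, k2) pair returns the DLS coefficients
empty_Y = np.zeros((T, 0))
dls = np.linalg.lstsq(X_mu, y, rcond=None)[0]
for k1, k2 in [(0.0, 0.0), (1.0, 1.0), (0.4, 2.5)]:
    assert np.allclose(double_k_class(y, empty_Y, X_mu, X_mu, k1, k2), dls)

# fewer than n instruments: the matrix inverted for 2SLS (k = 1) is singular
Y_hat = X_mu @ np.linalg.pinv(X_mu) @ Y           # here X_I = X_mu, only 2 < n = 3
A_sing = np.block([[Y_hat.T @ Y_hat, Y_hat.T @ X_mu],
                   [X_mu.T @ Y_hat,  X_mu.T @ X_mu]])
assert np.linalg.matrix_rank(A_sing) < A_sing.shape[0]

# (II.49): with rk X_I = T, every double k-class member collapses to DLS
X_full = rng.normal(size=(T, T))                  # T independent instruments
Z = np.column_stack([Y, X_mu])
dls_full = np.linalg.lstsq(Z, y, rcond=None)[0]
assert np.allclose(double_k_class(y, Y, X_mu, X_full, 1.0, 1.0), dls_full)
assert np.allclose(double_k_class(y, Y, X_mu, X_full, 0.3, 1.7), dls_full)
```

With rk X_I = T the k-terms multiply (numerically) zero moment matrices, so the choice of k_1 and k_2 is immaterial, exactly as (II.49) asserts.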
As a result of focusing on these formulas, the problem occurring when rk X_I = T has been expressed as one of how to select or construct a set of predetermined variables, say X_a, which captures the maximum effect of the predetermined variables in the system but whose sums of squares and cross-products matrix, X_a'X_a, is nonsingular.1 Initially the solution to the problem was based on the omission of sufficient variables from the X_I matrix that the inverse of the resulting X_a'X_a matrix could be calculated accurately. The variables retained in the X_a matrix (in addition to the predetermined variables in the equation) were selected such that the resulting X_a matrix captured as much effect of the original X_I matrix as possible.

1As an example, Kloek and Mennes [1960], p. 46 state: "The table also shows, however, that \Lambda may easily grow in excess of the number of observations T on which the estimation is based. This is a serious problem, for it implies that the matrix of sums of squares and cross products of the predetermined variables (X'X) is singular; and the inverse of this matrix is needed for the estimation of the reduced-form disturbances, which are auxiliary in the estimation of the parameters of the structural equations."

Kloek and Mennes took a different approach to the restriction of the space of predetermined variables. They suggested linearly combining the predetermined variables in the system into a set of fewer predetermined variables by calculating principal components of predetermined variables. In their famous article they elaborated four methods of calculating principal components.1 The most elaborate of these methods is the method which Kloek and Mennes refer to as method 2.
In method 2, the moment matrix of [X^{**}]_{\perp X_\mu} (the part of the predetermined variables in the system with zero coefficients in the equation, orthogonalized to the predetermined variables with non-zero coefficients in the equation) is first calculated, and then a predesignated number of principal components of the [X^{**}]_{\perp X_\mu} matrix are calculated. These principal components plus X_\mu are then used in place of the X_I matrix in double k-class estimation.

1Kloek and Mennes [1960].

Regarding Restriction of the Space of Instrumental Variables

The calculation of [Y'Y]_{\perp X_I} and [Y'y]_{\perp X_I} by direct orthogonalization (hence the calculation of [Y'Y]_{|X_I} as Y'Y - [Y'Y]_{\perp X_I} and [Y'y]_{|X_I} as Y'y - [Y'y]_{\perp X_I}) changes the focus of the problem away from one of eliminating sufficient multicollinearity that an inverse (or a set of reduced form coefficients) can be accurately calculated, since [Y'Y]_{|X_I} and [Y'y]_{|X_I} are already unique and automatically calculated by the computational method given in this paper in even the most extreme cases of multicollinearity among the instruments. The problem now becomes one of whether the subspace of the instruments should be restricted so that the solution obtained will not coincide with the DLS solution.

There is often a good basis for not using all of the predetermined variables in the system as variables in the X_I matrix, provided this is done because of characteristics of the data (e.g., some degree of serial correlation in the disturbances of an equation may imply that certain lagged jointly dependent variables would not serve as desirable instruments) or of the model (e.g., the reduced form equations corresponding to the explanatory jointly dependent variables do not contain a set of the predetermined variables). There does not, however, seem to be a good basis for restricting the space of instruments merely to cause the resulting coefficients to differ from the DLS coefficients.
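The point about direct orthogonalization can be illustrated concretely. In the sketch below (hypothetical data), [Y'Y]_{|X_I} is computed once by the inverse-based textbook formula and once as Y'Y - [Y'Y]_{\perp X_I}, with the orthogonal part obtained from a least-squares solve (a stand-in here for the appendix A orthogonalization) that never forms (X_I'X_I)^{-1}. When an instrument is duplicated, X_I'X_I is exactly singular, yet the orthogonalization route still returns the same, unique moment matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 30
X_I = rng.normal(size=(T, 3))
Y = rng.normal(size=(T, 2))

# textbook route: [Y'Y]_{|X_I} = Y'X_I (X_I'X_I)^{-1} X_I'Y
inv_route = Y.T @ X_I @ np.linalg.inv(X_I.T @ X_I) @ X_I.T @ Y

# orthogonalization route: [Y'Y]_{|X_I} = Y'Y - [Y'Y]_{perp X_I}
def proj_part(Y, X):
    Y_perp = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]   # no (X'X)^{-1} formed
    return Y.T @ Y - Y_perp.T @ Y_perp

assert np.allclose(inv_route, proj_part(Y, X_I))

# duplicate an instrument: X_I'X_I becomes singular, but the projection part
# of Y is unchanged and still uniquely computable
X_dup = np.column_stack([X_I, X_I[:, 0]])
assert np.linalg.matrix_rank(X_dup.T @ X_dup) < X_dup.shape[1]
assert np.allclose(proj_part(Y, X_dup), proj_part(Y, X_I))
```

The duplicated column changes nothing because the projection depends only on the space spanned by the instruments, not on the particular (possibly rank-deficient) matrix that spans it.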
On the other hand, since there will surely be researchers who will desire that the space of instruments be restricted so that the resulting coefficients do not coincide with DLS coefficients1 (and since some effective arguments for restricting the space may be forthcoming in the future), a few additional remarks regarding how the space might be restricted will be made.2

1As noted before, an estimator is not made inconsistent just because the resulting coefficients do not differ from DLS coefficients. Also, by now it should be obvious that asymptotic properties are a poor guide as to how to proceed when the problem is the result of a small number of observations.

2Even though DLS does not use any information from the remainder of the system, there may still be considerable advantage to carefully specifying the remainder of the system even if it is known ahead of time that, given the number of observations available, rk X will equal T. If a complete system is specified, FIML (full information maximum likelihood) may be applied even if rk X = T. (rk X = T presents no difficulty in FIML estimation, since the matrix of coefficients of jointly dependent variables is recognized in a special fashion, but the jointly dependent variables themselves are not adjusted by a set of instruments. The confusion which exists regarding this point is clarified in footnote 1 of page 203.) The FIML solution will not, in general, coincide with the DLS solution even in the case rk X_I = T.

The change in focus of the problem away from the prevention of a large degree of multicollinearity considerably reduces the desirability of using principal components to restrict the space spanned by the instruments. If it is decided that the space spanned by the instruments is to be restricted, then it is much more straightforward to do this by merely eliminating certain of the initial instruments considered for the analysis.
In that way the coefficients obtained can be related to the particular set of instruments used, whereas it is very difficult to evaluate the effect of using as instruments in the computation a set of principal components of a larger set of instruments. The best guide as to which instruments to retain would seem to be to rank (or group) the instruments according to their desirability as instruments in the estimation of the equation.1

1See Fisher [1965] for some procedures.

One of the advantages claimed for the use of principal components in restricting the space of the instruments is that the use of principal components is less arbitrary than the selection of instruments to retain--one merely decides on the principal component method to use (which might as well be method 2) and how many principal components to retain (the number to retain is quite arbitrary); however, there are certainly easier and more informative methods (in terms of evaluating the results) to accomplish even this advantage. As an example (in addition to the methods of Fisher [1965]), the orthogonalization procedure described in appendix A can be modified so that c instruments1 (in addition to the L predetermined variables in the equation) can be selected from among a prespecified set of instruments so that no instrument selected is an exact linear combination of instruments previously selected. The modification consists of merely (1) incorporating the effect of the variables in X_\mu, and then (2) stopping the orthogonalization procedure after c diagonal elements have been selected as pivots from among those corresponding to instruments not in the equation. After c pivots have been used, the part of the matrix corresponding to {}_{+}Y'{}_{+}Y will have become [{}_{+}Y'{}_{+}Y]_{\perp[X_\mu \,\vdots\, X_1 \,\vdots\, \cdots \,\vdots\, X_c]}, where X_1 through X_c are the instruments corresponding to the c pivots selected.
Using this matrix in place of [{}_{+}Y'{}_{+}Y]_{\perp X_I} will result in the same coefficients as if X_1 through X_c were initially listed as the only instruments in addition to the variables in X_\mu.2

1c could be the number of principal components which would have been selected if principal components had been used. c must be greater than or equal to m, since otherwise (as noted earlier) the matrix inverted in computing 2SLS or LIML estimated coefficients is singular.

2This method is somewhat similar to principal components method 2 in that, by first incorporating the variables of X_\mu, the first pivot selected from among the part of the matrix corresponding to the variables in X^{**} is the largest diagonal element of the [X^{**}{}'X^{**}]_{\perp X_\mu} matrix (the moment matrix from which principal components are calculated if method 2 is used). If the variable corresponding to the first pivot selected from the [X^{**}{}'X^{**}]_{\perp X_\mu} matrix is denoted as X_1, then the second pivot is selected as the largest diagonal element of the [X^{**}{}'X^{**}]_{\perp[X_\mu \,\vdots\, X_1]} matrix, and so on until c pivots (and hence c variables) have been selected in the orthogonalization procedure.

Assuming that the space of instruments is to be restricted, both the predesignation of the variables to be treated as instruments and the use of an automatic method to select a set of variables have the distinct advantage over the use of principal components in that it is clear as to the exact instruments used in the computations; hence, it is possible to more fully evaluate the results obtained. Ease of computation is a second advantage. Predesignation of the variables to use has an advantage over the automatic selection computational method in that more judgment may be used in the selection of instruments.
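The automatic largest-pivot selection just described can be sketched as follows (hypothetical data; a simplified stand-in for the appendix A routine): the candidates are first orthogonalized to X_\mu, and at each step the column with the largest remaining diagonal element (sum of squares) is chosen and swept out of the rest, so an exact linear combination of already-chosen instruments can never be selected.

```python
import numpy as np

def select_instruments(X_mu, X_cand, c, tol=1e-10):
    """Greedy largest-pivot selection of up to c instruments from X_cand,
    after sweeping out X_mu (sketch; names hypothetical)."""
    # part of the candidate instruments orthogonal to X_mu
    R = X_cand - X_mu @ np.linalg.lstsq(X_mu, X_cand, rcond=None)[0]
    selected = []
    for _ in range(c):
        d = (R * R).sum(axis=0)          # current diagonal of the moment matrix
        j = int(np.argmax(d))
        if d[j] < tol:                   # every remaining candidate is (numerically)
            break                        # a linear combination of those already chosen
        q = R[:, j] / np.sqrt(d[j])
        R = R - np.outer(q, q @ R)       # sweep the chosen pivot out of all columns
        selected.append(j)
    return selected

rng = np.random.default_rng(1)
T = 40
X_mu = rng.normal(size=(T, 2))
base = rng.normal(size=(T, 3))
# the fourth candidate duplicates the first: it can never be selected twice
X_cand = np.column_stack([base, base[:, 0]])
picked = select_instruments(X_mu, X_cand, c=4)
assert len(picked) == 3                  # only 3 independent candidates exist
assert len(set(picked)) == len(picked)
```

Requesting c = 4 pivots from a candidate set of rank 3 stops after three selections, which mirrors the text's guarantee that no selected instrument is an exact linear combination of its predecessors.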
The two methods may, of course, be combined by (1) not listing predetermined variables in the system which are clearly not desired, (2) selecting the pivots corresponding to all of the predetermined variables clearly desired, and (3) letting the automatic procedure select the remaining variables (up to a given prespecified number) through choice of the largest pivot at each step.

The methods for selecting instruments given in Fisher [1965] tend to lead to a separate set of instruments to adjust each jointly dependent variable in the equation. Whereas the use of this approach is certainly feasible, it would seem that a more straightforward approach would be to select a set of instruments corresponding to each explanatory jointly dependent variable as Fisher suggests and then use all of the instruments selected for any jointly dependent variable in the X_I matrix, thereby again adjusting all of the explanatory jointly dependent variables by the same set of instruments.1 In addition to the computations being easier to perform, interpretation of results will be simpler.2,3

1Such a procedure is feasible if direct orthogonalization is used to prevent multicollinearity from making the computations unreliable.

2On the other hand, if it is known that a predetermined variable has no relationship to a jointly dependent variable (i.e., the coefficient corresponding to the predetermined variable is zero in the reduced form equation containing the jointly dependent variable), it is possible that asymptotic efficiency will be increased by not adjusting all explanatory jointly dependent variables by the same set of instruments.

3Again the writer would like to emphasize that in the latter part of this section he is attempting to give methods which will give results which may be more readily interpreted and more easily computed than the methods commonly advocated in the case where it has already been determined that the space of the instruments is to be limited.
The writer currently prefers estimates which coincide with DLS estimates rather than resorting to a restriction of the space of instruments merely to prevent the estimates obtained from coinciding with DLS estimates.

CHAPTER III

DISTURBANCE VARIANCE AND COEFFICIENT VARIANCE-COVARIANCE ESTIMATION

In this chapter we will first discuss estimation of the disturbance variance for all double k-class members satisfying plim(k_1 - 1) = plim(k_2 - 1) = 0, and then discuss coefficient variance-covariance estimation for the double k-class estimators satisfying plim \sqrt{T}(k_1 - 1) = plim \sqrt{T}(k_2 - 1) = 0 (which still includes 2SLS, LIML, UBK, and MSM). Finally, estimation of coefficient "t-ratios" is discussed.

A. Disturbance Variance Estimation

For any double k-class member for which plim(k_1 - 1) = plim(k_2 - 1) = 0, a consistent estimate of \sigma^2, the disturbance variance of the \mu-th equation, is given by:1

(III.1)  \hat\sigma^2_{k_1,k_2} = \hat u_{k_1,k_2}'\hat u_{k_1,k_2}/(T - n)

where \hat u_{k_1,k_2}'\hat u_{k_1,k_2} is the usual residual sum of squares, the residual being given by \hat u_{k_1,k_2} = y - Z_\mu\hat\delta_{k_1,k_2}.2

1To the writer's knowledge no formula (except the one given in Nagar [1962]) exists in the literature for the estimated disturbance variance of a double k-class estimator. (III.1) is consistent with the usual formula for single k-class estimators. A tentative proof of the consistency of \hat\sigma^2_{k_1,k_2} calculated by (III.1) is given in appendix C.

2\hat u'\hat u for any linear estimator may be calculated more simply, accurately, and with less computer time if calculated as {}_{+}\hat\delta'[{}_{+}Z_\mu'\,{}_{+}Z_\mu]{}_{+}\hat\delta rather than by calculating \hat u and then calculating the sum of squares of \hat u. That the two are mathematically equivalent is easily verified: \hat u'\hat u = [y - Z_\mu\hat\delta]'[y - Z_\mu\hat\delta] = [{}_{+}Z_\mu\,{}_{+}\hat\delta]'[{}_{+}Z_\mu\,{}_{+}\hat\delta] = {}_{+}\hat\delta'[{}_{+}Z_\mu'\,{}_{+}Z_\mu]{}_{+}\hat\delta. For notational simplicity, however, we will continue to write the residual sum of squares as \hat u'\hat u rather than its computational form {}_{+}\hat\delta'[{}_{+}Z_\mu'\,{}_{+}Z_\mu]{}_{+}\hat\delta.
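The computational identity for \hat u'\hat u noted above is easy to verify numerically. A minimal sketch (hypothetical data; the sign convention placing -1 on the normalizing variable in {}_{+}\hat\delta is assumed here): the quadratic form in the moment matrix reproduces the residual sum of squares without ever forming the T residuals.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 40, 3
Z_mu = rng.normal(size=(T, n))                    # explanatory variables
y = Z_mu @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=T)
delta = np.linalg.lstsq(Z_mu, y, rcond=None)[0]   # any linear estimator's coefficients

u = y - Z_mu @ delta                              # residuals formed explicitly
rss_direct = u @ u

plus_Z = np.column_stack([y, Z_mu])               # +Z_mu = [y : Z_mu]
plus_delta = np.concatenate([[-1.0], delta])      # -1 on the normalizing variable (assumed)
rss_moment = plus_delta @ (plus_Z.T @ plus_Z) @ plus_delta

assert np.allclose(rss_direct, rss_moment)
sigma2_hat = rss_moment / (T - n)                 # (III.1) with the T - n denominator
assert sigma2_hat > 0
```

Once {}_{+}Z_\mu'\,{}_{+}Z_\mu has been accumulated, the residual sum of squares for any candidate coefficient vector is an (n+1)-dimensional quadratic form rather than a pass over T observations.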
All of the specific double k-class members previously discussed meet the above plim requirement except DLS. This includes 2SLS, LIML, UBK, and MSM. Although the above does not provide a consistent estimate of \sigma^2 for DLS when multiple jointly dependent variables occur in the equation, it does agree with the usual formula used for calculating \hat\sigma^2_{DLS} (which is almost surely biased downward when multiple jointly dependent variables occur in the equation).

The consistency of the estimator will not be changed if any denominator, d, with plim(d/T) = 1, is used in place of T - n. Thus, T may also be used as a denominator instead of T - n.

Regarding the Appropriate "Degrees of Freedom" for the Denominator

When estimating \sigma^2 by DLS, it is usual to use the following formula:

(III.2)  \hat\sigma^2_{DLS} = \hat u_{DLS}'\hat u_{DLS}/(T - n)

If only one jointly dependent variable occurs in the equation, (III.2) provides a consistent estimate of \sigma^2. If multiple jointly dependent variables occur in the equation, it is usual to continue to use (III.2), although this estimate is almost surely biased downward.1 It would seem desirable to develop an estimate which takes account of the occurrence of multiple jointly dependent variables in the equation; however, the writer is not aware of any results of work in this area. The use of rk X_I in an adjustment for "degrees of freedom" (e.g., a degrees of freedom of T - rk X_I) would not seem to be a fruitful approach for DLS, since X_I has no effect on the DLS coefficients.

1Monte Carlo results of Cragg support this. See Cragg [1966] and Cragg [1967].

On the other hand, when estimating by other double k-class members, the set of instruments affects the size of \hat\sigma^2_{k_1,k_2}, since the instruments are actually used in the estimation procedure. Given a particular equation, the use of additional instruments will tend to reduce \hat u_{k_1,k_2}'\hat u_{k_1,k_2}; hence, the question immediately arises--should we reflect this by changing the "degrees of freedom" used as the denominator to something like T - \Lambda, T - rk X_I, or even T - m - rk X_I? All of these denominators will give consistent estimates of \sigma^2 under the assumptions plim(k_1 - 1) = 0 and plim(k_2 - 1) = 0.

By way of a partial answer, let us consider what happens to the \hat u_{k_1,k_2}'\hat u_{k_1,k_2} of a given double k-class member calculated for a given equation, \mu, and a fixed sample of size T if additional instruments are added to a given X_I matrix. If the additional instruments are not linear combinations of instruments already in the X_I matrix, the rank of the new X_I will increase and \hat u_{k_1,k_2}'\hat u_{k_1,k_2} will tend to decrease. If sufficient predetermined variables are added that rk X_I = T (rk X_I cannot exceed T), \hat\delta_{k_1,k_2} becomes equal to \hat\delta_{DLS} and \hat u_{k_1,k_2}'\hat u_{k_1,k_2} becomes equal to \hat u_{DLS}'\hat u_{DLS}. Thus, if the maximum number of predetermined variables which can affect the estimation are classified as being in the system, \hat u_{k_1,k_2}'\hat u_{k_1,k_2} will be decreased to \hat u_{DLS}'\hat u_{DLS}. The use of T - rk X_I as the degrees of freedom is clearly inappropriate in this case, since T - rk X_I = 0 and, therefore, \hat\sigma^2_{k_1,k_2} would be arbitrarily large.

The above suggests that although some adjustment based on the number of predetermined variables might be appropriate, the use of T - rk X_I does not seem appropriate. The use of T - rk X_I seems inappropriate also from the standpoint that it completely ignores the number of actual coefficients estimated--a factor which would seem to be of considerably more importance in any "degrees of freedom" adjustment than rk X_I.
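A small numerical illustration of this point (hypothetical data): since DLS minimizes the residual sum of squares, the 2SLS \hat u'\hat u can never fall below it, and once enough independent instruments are added that rk X_I = T, the two coincide and T - rk X_I = 0.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20
X_mu = rng.normal(size=(T, 2))
Y = rng.normal(size=(T, 1))
Z = np.column_stack([Y, X_mu])
y = Z @ np.array([0.8, 1.0, -1.0]) + rng.normal(size=T)

def rss_2sls(X_I):
    Y_hat = X_I @ np.linalg.lstsq(X_I, Y, rcond=None)[0]   # first-stage fit
    W = np.column_stack([Y_hat, X_mu])
    delta = np.linalg.solve(W.T @ W, W.T @ y)
    u = y - Z @ delta                     # residuals use the original Y
    return u @ u

dls = np.linalg.lstsq(Z, y, rcond=None)[0]
rss_dls = ((y - Z @ dls) ** 2).sum()

X_small = np.column_stack([X_mu, rng.normal(size=(T, 1))])    # 3 instruments
X_full = np.column_stack([X_mu, rng.normal(size=(T, T - 2))]) # rk X_I = T

assert rss_2sls(X_small) >= rss_dls - 1e-8     # DLS minimizes u'u
assert np.allclose(rss_2sls(X_full), rss_dls)  # coincide when rk X_I = T
```

With rk X_I = T, dividing this residual sum of squares by T - rk X_I = 0 would indeed make the variance estimate arbitrarily large, as the text observes.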
That the use of T - n as the "degrees of freedom" adjustment gives satisfactory results in at least some cases for 2SLS and LIML is indicated by some of Cragg's Monte Carlo results.1

1Cragg's conclusions noted in section III.C pertain to estimated "t-ratios"; however, \hat\sigma is used in this estimation.

Some Work by Nagar

Nagar derived the bias to O(T^{-1}) in probability of (1/T)\hat u_k'\hat u_k as:2

(III.3)  (1/T)\,\mathcal{E}(\hat u_k'\hat u_k) - \sigma^2 = -\sigma^2\{2\kappa - [\Lambda - n - 1]\operatorname{tr}(QC_1) - \operatorname{tr}(QC) + (\kappa n/T)\}

where k is assumed nonstochastic with k - 1 = O(T^{-1}) in probability; \kappa is related to k by the relation k = 1 + (\kappa/T), \kappa being assumed to be non-stochastic and independent of T; and the predetermined variables are assumed to be nonstochastic.1 Q, C, and C_1 are matrices which must be estimated (see section II.C.3). \mathcal{E} denotes expected value.

2Nagar [1961]. The comparable formula for double k-class members is given in Nagar [1962].

1These are the same assumptions as were made in section II.C.3 (MSM estimation). k nonstochastic means in this case that k is independent of any stochastic variable (in particular, k is independent of u). Although 2SLS, UBK, and MSM meet this requirement, LIML does not.

It might be suggested that \sigma^2 be estimated by the formula (1/T)\hat u_k'\hat u_k and then (III.3) be used to adjust for bias; however, this does not seem to be a fruitful approach. First of all, the formula for bias contains \sigma^2 as a parameter. Thus, we are in the position of estimating the bias of a particular estimate of \sigma^2 with \sigma^2 itself being a parameter. It is possible to manipulate the above formula to eliminate \sigma^2 from the right hand side; however, the resulting formula is still not very helpful in adjusting for bias.

Especially noteworthy is the fact that unbiasedness is only one desirable characteristic (actually the above formula only gives an asymptotic estimate of bias to O(T^{-1}) in probability). Another important characteristic is dispersion, especially dispersion about the true value. The above formula is likely to give a large dispersion about any value (let alone the true value), since it contains traces of certain matrices which must be estimated, and the estimates of these traces (at least those suggested by Nagar) vary substantially in magnitude
The above formula is likely to give a large dispersion about any value (let alone the true value), since it contains traces of certain matrices which must be estimated, and the estimates of these traces (at least those suggested by Nagar) vary substantially in magnitude 1These are the same assumptions as were made in section 11.0. 3 (MSM estimation). k nonstochastic means in this case that k is independent of any stochastic variable (in particular, k is independent of u). Although ZSLS, UBK, and MSM meet this require- ment, LIML does not. 119 in response to only small changes in the data or model. Also, the particular estimates of these matrices may add a substantial error to our estimate of the bias.1 A Common Ouwighx in ammg aéswazm As noted in section II.B.2, ZSLS may be estimated as a two stage process in which Y is calculated in the first stage, IXI and in the second stage, YIX is Substituted for Y in the cal- I culation of 528LS° Often, the error sum of squares from the second step is then used as in the calculation of A ' A “zsisuzsis EZSLS and , therefore, the calculation of coefficient standard errors is based on YIX instead of the original Y. This is I generally regarded as a less desirable estimate than the estimator we have given, namely that based on u = y - Y] - Xpfi, or some ZSLS equivalent formula. Thus, after calculating SZSLS’ 3'6 should be calculated by using the original Y in place of YIx or by I the formula in the footnote on page 114.2 1Some estimates for these matrices are suggested in.Nagar [1959]. 2 A. A The UZSLSUZSLS A either larger or smaller than the u'fi obtained by DLS at the second stage. obtained through recalculating may be 120 . . 1 B. Coefficient Variance-Covariance Estimation 1. 
1. Double k-class

1Coefficient variance-covariance estimation for the double k-class estimators which we have considered other than DLS is complicated by the fact that the small sample sampling variance may be infinite in some cases. Fisher [1965], p. 605 states: "The principal point that has emerged on small sample properties of limited-information estimators is that the sampling variances involved are infinite, at least in some cases. Such a conclusion is borne out both from the analytic work that has been accomplished to date and by the results of the Monte Carlo experiments that have been performed." (Fisher's limited-information estimators include all of the specific k-class estimators treated in this paper except DLS. Fisher gives a number of analytical and Monte Carlo references.)

This does not imply that coefficient variance-covariance estimation is futile in finite samples. Basmann [1961], p. 621 states: "It is appropriate to mention that even though the exact distribution function F(x) of some estimator fails to possess moments of lower order (say) a variance, it is still possible in many cases to approximate F(x) by a distribution function G(x) that does possess (say) a variance and even possesses moments of still higher order. Thus A. L. Nagar has made an important contribution to econometric statistics by working out formulas for the bias and moment matrices of approximate distributions for Theil's k-class estimators," [reference to Nagar [1959] deleted]. "The reader will easily satisfy himself that Nagar's approximations do not depend on the exact distributions possessing a finite variance. Examine the exact frequency function exhibited in Figure 1 below. This frequency function does not possess a finite variance. Consider the approximation obtained by truncating the exact frequency function at the points v = -3, v = 3."

For any double k-class member for which plim \sqrt{T}(k_1 - 1) = plim \sqrt{T}(k_2 - 1) = 0, a consistent estimate of Var(\hat\delta_{k_1,k_2}) is
The approxi- mate frequency function obtained in this way possesses finite moments of all orders. The approximate distribution will be an excellent one, indeed." 121 given by: u u l ‘1 ry Y-k1[Y yjlxl Y xn «2 ) = 0 k1,k2 k1,k2 (111.4) v3r(8 X'Y x'x c u u u *2 where 0k k is calculated by (111.1) or any other formula with A l’ 2 plim O = O (e.g., T could be used in the denominator in k1,k2 place of T - n if desired). (111.4) does not provide a consistent estimate of Var(6DLS) when multiple jointly dependent variables occur in the equation even ) though it does agree with the usual formula for calculating Var(6DLS 1To the writer's knowledge no formula (except the one given in Nagar [1962]) exists in the literature for the estimated coefficient variance-covariance matrix of the double k-class estimator. If k = k , (111.4) becomes the usual estimated coefficient variance- covariance formula for the single k-class estimator. Christ [1966], p. 445 states that the asymptotic coefficient variance-covariance matrix of the double k-class estimator is the same as the asymptotic coefficient variance-covariance matrix of the 2313 estimator provided pun/"f(k1 - 1) = p1im./T(k2 - 1) = 0, but does not give a formula for Var(6k k ) . Let the formula given in l’ 2 (111.4) be denoted A. In appendix C it is tentatively shown [under the slightly less restrictive assumption plim(k - 1) = plim(k - l) = 0] that (l/T)plim T A equals the asymptotic coefficient variance covariance matrix of the ZSLS estimator; hence, under the assumption plim/f(k1 - 1) = p1im/T(k2 - 1) = O, A [i.e., the formula given in (111.4)] is a consistent estimate of Var(8k k ) . LIML, ZSLS, UBK, and MSM all meet this plim requirement. 1 2 122 (since k1 = O for DLS). 
Cragg's Monte Carlo results indicate that this estimate of Var(\hat\delta_{DLS}) has a substantial downward bias.1

Conversion of an instrumental variables problem to a 2SLS problem in the manner indicated in section II.D.2 also provides a convenient method of estimating Var(\hat\delta_{IV}), since (III.4) will give the same result as the usual IV formula.2

1Cragg's Monte Carlo results are considered in section III.C in the discussion of t-ratios.

2See Goldberger [1964], pp. 286, 332 for the usual formula--\hat\sigma^2[X_{IV}'Z_\mu]^{-1}[X_{IV}'X_{IV}][Z_\mu'X_{IV}]^{-1}--which Goldberger notes is consistent. That it is equivalent to the 2SLS formula for the converted IV problem may be noted from \hat\sigma^2[X_{IV}'Z_\mu]^{-1}[X_{IV}'X_{IV}][Z_\mu'X_{IV}]^{-1} = \hat\sigma^2[Z_\mu'X_{IV}(X_{IV}'X_{IV})^{-1}X_{IV}'Z_\mu]^{-1} = \hat\sigma^2[Z_\mu'Z_\mu]_{|X_{IV}}^{-1} [see (I.36)], which is \hat\sigma^2 times the matrix inverted in the calculation of 2SLS if the IV problem is converted to a 2SLS problem (see II.39). If (as is usual) the variables in X_\mu are also contained in X_{IV}, \hat\sigma^2[Z_\mu'Z_\mu]_{|X_{IV}}^{-1} becomes the more familiar

\hat\sigma^2\begin{bmatrix} [Y'Y]_{|X_{IV}} & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}^{-1}

[see (II.45)].

2. Alternative estimate for LIML

An asymptotic coefficient variance-covariance matrix for LIML has been derived by Rubin as:1

(III.5)  \widehat{Var}(\hat\delta_{LIML}) = \hat\sigma^2_{LIML}\begin{bmatrix} Y'Y - k[Y'Y]_{\perp X_I} + [k/(k-1)]ff' & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}^{-1}

where k is the smallest eigenvalue of [{}_{+}Y'{}_{+}Y]_{\perp X_I}^{-1}[{}_{+}Y'{}_{+}Y]_{\perp X_\mu} as before, {}_{+}f \equiv [{}_{+}Y'{}_{+}Y]_{\perp X_I}\,{}_{+}\hat\gamma_{LIML}, and f is the same as {}_{+}f except that the element corresponding to the normalizing variable is deleted, i.e., f \equiv [Y'{}_{+}Y]_{\perp X_I}\,{}_{+}\hat\gamma_{LIML}.

(III.5) is given for completeness. (III.4) would seem to be as desirable and has the further advantages:

(1) Var(\hat\delta_{LIML}) estimated by formula (III.4) provides an estimate which can be compared more readily to the Var(\hat\delta_k) of other k-class members.

(2) (III.4) may be obtained as a by-product of the calculation of \hat\delta_{LIML}, whereas (III.5) requires special programming.

It should be noted that Var(\hat\delta_{LIML}) by formula (III.4) \neq Var(\hat\delta_{LIML}) by formula (III.5).1

1Personal conference.
The estimate given in Chernoff and Divinsky [1953], p. 245 is the same as (III.5), as may be seen by writing down the formulas for inverting the matrix given in (III.5) in parts and noting that the result is exactly the same as the formula given by Chernoff and Divinsky.

3. Nagar's unbiased to O(T⁻²) in probability estimates

Nagar derived the moment matrix of δ̂_k − δ and δ̂_{k1,k2} − δ to O(T⁻²) in probability. The estimated coefficient variance-covariance matrix could be based on the formulas derived by Nagar. (Most of the matrices in his formulas are based on population parameters, but estimates of these matrices of the type which he suggests in the calculation of δ̂_MSM could be used.) Those interested in following this approach are referred to Nagar's articles.¹

In evaluating whether to follow the approach of estimating a number of population matrices and substituting them into the formulas, one should be reminded that:

(1) Assumptions additional to those which we have specified are imposed in the derivation of Nagar's formulas.

(2) Although the resulting formulas are of a higher order of unbiasedness (assuming that the actual population parameters are available) than the usual formulas, they are still asymptotic.

(3) Unbiasedness is only one desirable property. Dispersion, especially about the true value, is a property which is also very important. Nagar's derivations are in terms of certain population matrices and traces of other population matrices. The estimation of these matrices and traces and their substitution into Nagar's formulas are likely to add greatly to the dispersion of the estimated coefficient variance-covariance matrix.

¹Nagar [1959], and Nagar [1962].
C. Coefficient Standard Errors and t-ratios

The square roots of the diagonal elements of the estimated coefficient variance-covariance matrix (i.e., the square roots of the estimated coefficient variances) are often used as approximate coefficient standard errors, and the ratios of the coefficients to the square roots of the estimated coefficient variances are often used as approximate coefficient t-ratios; however, very little information is available on how well these computed values serve as approximate standard errors and approximate t-ratios.

Cragg [1966] and [1967] reported the results of a Monte Carlo experiment involving DLS, 2SLS, UBK, LIML, 3SLS, and FIML. The coefficient matrix for the basic model was as follows:¹

[ −1    γ12   γ13   β11   β12   0     0     β15   0     0
  γ21   −1    0     β21   0     β23   0     β25   0     β27
  0     γ32   −1    β31   0     β33   β34   0     β36   0   ]

¹Cragg [1967], p. 94. The large number of abandoned samples casts considerable doubt on the meaningfulness of the results; however, the obviously very large amount of rounding error which was encountered surely had a considerably smaller effect on the calculation of the single equation estimates than on the 3SLS and FIML estimates. From Cragg's description of the FIML results, it would appear that they should be totally ignored due to excessive rounding error in their computation. In addition to the large number of abandoned samples, a number of FIML estimates were retained even though their computed coefficient variances were negative. This indicates either convergence to a saddle point instead of a local maximum for many problems (convergence of FIML is discussed in chapter V) or that a high degree of rounding error was encountered (or both). (The writer is not suggesting that additional samples should have been eliminated based on the FIML results, but that the entire set of FIML results should have been ignored.)
The results reported in Cragg [1967] consisted of 26 experiments, each containing 50 samples, the experiments being summarized in the following table:¹

EXPERIMENTS CONDUCTED

Experiment   Special Features*                          Abandoned samples
 1           None                                        1
 2           Disturbance set 2                           0
 3           Disturbance set 3                           1
 4           Exogenous variable set 2                    0
 5           Exogenous variable set 3                    1
 6           Structure 2                                 0
 7           Structure 3                                 8
 8           Structure 4                                 1
 9           Structure 5                                 1
10           Structure 6                                 0
11           Structure 7                                 0
12           Structure 8                                 0
13           Values of Σ 25% those of Table 1            3
14           Values of Σ 4 times those of Table 1        3
15           Values of Σ 9 times those of Table 1        4
16           Values of Σ 16 times those of Table 1       6
17           Values of Σ 25 times those of Table 1      10
18           35 observations                             0
19           50 observations                             0
20           70 observations                             0
21           Multicollinearity 1                         1
22           Multicollinearity 2                         0
23           Multicollinearity 3                         2
24           Multicollinearity 4                         6
25           Multicollinearity 5                         8
26           Multicollinearity 6                         7

*Unless otherwise noted, twenty observations were used with no specially introduced multicollinearity, with exogenous variable data set 1, structural disturbance set 1, and structure 1.

¹Cragg [1967], p. 94.

A formula equivalent to (III.4) was used as the formula for estimating the coefficient variance-covariance matrix; that is, a degrees of freedom adjustment of T − 5 was used for all samples. Approximate t-ratios were calculated as the coefficients divided by the square roots of the diagonal elements of the coefficient variance-covariance matrices. In reporting the results of the 26 experiments Cragg states:

"The adequacy of the standard errors was investigated by examining the ratios of the deviations of the coefficients from the true values to their standard errors, which we call for simplicity the t ratios. It is sometimes supposed that t ratios are distributed as Student's t. In investigating this supposition there were two difficulties: (1) should the standard errors be adjusted for 'lost degrees of freedom' and (2) what is the appropriate number of degrees of freedom for the t distribution.
After examining some of the data it appeared that the standard errors of a coefficient in a particular structural equation should be adjusted for the number of coefficients to be estimated in that equation." [reference to footnote deleted] "The most appropriate number of degrees of freedom appeared to be the number of observations minus the number of coefficients to be estimated in the equation in which the coefficient fell. The hypothesis investigated was that not more than five per cent of the t ratios would fall outside the ninety-five per cent confidence intervals for roughly five per cent of the coefficients in most of the experiments. The number of the consistent t ratios falling outside the interval was significantly higher than five per cent for only one or two coefficients in most experiments and quite often there was none."¹

Also, Cragg states: "The DLS standard errors were not apt to be reliable for making inferences about the true values of the structural coefficients. Much more frequently than for the consistent methods, the number of DLS t ratios falling outside the ninety-five per cent confidence intervals was significantly greater than five per cent of the total number of estimates of a coefficient."²

As one of his conclusions, Cragg states: "Usually use of the standard errors of the consistent methods would lead to reliable inferences, but this was not always the case. The standard errors of DLS were not useful for making inference about the true values of the coefficients."³

Cragg reported some additional experiments in which the model noted above was modified to examine the effect of (1) errors in the exogenous variables, (2) stochastic coefficients, and (3) heteroskedastic and autocorrelated disturbances. Results similar to those noted above are reported for these additional experiments.⁴

¹Cragg [1967], p. 101.
²Cragg [1967], p. 102.
³Cragg [1967], p. 109.
⁴See Cragg [1966].

CHAPTER IV

GENERALIZED LEAST SQUARES
A. Unrestricted Generalized Least Squares (GLS)

The generalized least squares model (also called the Aitken model)¹ may be expressed as:

(IV.1)  y = X δ + u     (y and u are T×1, X is T×n, δ is n×1)

where the same statistical assumptions are made as were made in Chapter I except that:

(1) Ɛ(uu') = Σ̇, where Σ̇ is a T×T positive definite matrix known except for a multiplicative constant. If (IV.1) represents a single stochastic equation from a system of stochastic equations, then the row dimension of (IV.1) is simply the number of observations T, and Ɛ(uu') = Σ̇ is a loosening of the assumption made earlier in this paper that Ɛ(uu') = σ²I. (The dot is used above the Σ to insure that the Σ̇ matrix is not confused with the Σ matrix, which is the M×M disturbance variance-covariance matrix for a system of M equations.)

(2) X is a matrix of variables assumed fixed rather than merely contemporaneously uncorrelated with u. The X matrix is not the same as the X matrix of preceding chapters. If (IV.1) represents a single equation from a system of equations, then the row dimension is T and the X matrix is the same as the X_μ or Z_μ matrix. In part II of this paper we will, at times, rewrite an entire set of M stochastic equations in the form (IV.1). In this case the row dimension becomes MT, y and u become MT×1 vectors, X becomes an MT×(Σ_{μ=1..M} n_μ) matrix constructed from the X_μ matrices and matrices of zeros, and Σ̇ becomes an MT×MT matrix. (The ZA [Zellner-Aitken] and the 3SLS [three-stage least squares] models will be derived as modifications of the GLS model.)

(3) In the GLS model, X is assumed to have full column rank. This assumption will be relaxed in the next section when we consider the RGLS (restricted generalized least squares) model.

The GLS estimator is:

(IV.2)  δ̂_GLS = [X'Σ̇⁻¹X]⁻¹[X'Σ̇⁻¹y]

and the variance of δ̂_GLS is given by:

(IV.3)  Var(δ̂_GLS) = [X'Σ̇⁻¹X]⁻¹.

Under the above statistical assumptions, δ̂_GLS is the minimum variance linear unbiased estimator.

¹Aitken [1934-35], pp. 42-43.
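To make (IV.2) and (IV.3) concrete, the following is a minimal numpy sketch. The function name and the Cholesky-whitening implementation are choices of this sketch, not a procedure taken from the text; whitening merely avoids forming Σ̇⁻¹ explicitly, which is numerically kinder, and reduces GLS to an ordinary least-squares solve on transformed data.

```python
import numpy as np

def gls(X, y, Sigma):
    # GLS estimator (IV.2): [X' Sigma^-1 X]^-1 [X' Sigma^-1 y].
    # Computed by whitening with a Cholesky factor L (Sigma = L L')
    # rather than inverting Sigma directly.
    L = np.linalg.cholesky(Sigma)
    Xw = np.linalg.solve(L, X)          # L^-1 X
    yw = np.linalg.solve(L, y)          # L^-1 y
    delta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    var = np.linalg.inv(Xw.T @ Xw)      # Var(delta_GLS) = [X' Sigma^-1 X]^-1, (IV.3)
    return delta, var
```

With Σ̇ = σ²I the same routine collapses to ordinary (direct) least squares, which is how DLS reappears as a special case later in the chapter.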
Quite a few applications of the GLS model will be made in the remainder of this paper; however, some of the assumptions will not be met for these applications. Even in these applications, although δ̂_GLS will no longer be best linear unbiased, δ̂_GLS may still have desirable properties. Even though some of the assumptions are not met, we will still refer to estimates using (IV.2) and (IV.3) as GLS estimates.

B. Restricted Generalized Least Squares (RGLS)

The restricted generalized least squares (RGLS) model is the same as the GLS model except that the following restrictions are imposed on the coefficients:

(IV.4)  R δ = r     (R is N_R×n, δ is n×1, r is N_R×1)

where R is an N_R×n matrix of known elements and r is an N_R×1 vector of known elements.¹ In the GLS model, X is assumed to have full column rank. This assumption is relaxed in the RGLS model. A corresponding necessary (but not sufficient) condition for the RGLS model is that rk X + rk R ≥ n.² R need not have full row rank; that is, rk R may be less than N_R (i.e., redundant restrictions are permitted in R).

The RGLS estimator is given by:³

(IV.5)  δ̂_RGLS = Q[Q'(X'Σ̇⁻¹X)Q]⁻¹Q'[(X'Σ̇⁻¹y) − (X'Σ̇⁻¹X)q] + q

or, equivalently, by (IV.6) and (IV.7) below.

¹N_R may be greater than, equal to, or less than n. If N_R > n, then (1) some of the restrictions are redundant, (2) some of the restrictions are inconsistent, and/or (3) δ is restricted to a fixed set of coefficients. The computational procedure given in this chapter detects but allows redundancy. Inconsistency is also detected.

²rk R of the coefficients in δ may be solved for in terms of the remaining n − rk R coefficients. If rk X is not greater than or equal to n − rk R, then the remaining n − rk R coefficients (and, therefore, the rk R coefficients) will not be unique.

³To the writer's knowledge, these forms of the RGLS estimator have not appeared in the literature.
A more common formula will be presented later (IV.23), and some advantages of (IV.5), or of (IV.6) and (IV.7), over (IV.23) will be discussed there. The proofs of (IV.5), (IV.6), and (IV.7) are given after Q and q are defined (via their computation).

(IV.6)  δ̂(1)_RGLS = [Q'(X'Σ̇⁻¹X)Q]⁻¹Q'[(X'Σ̇⁻¹y) − (X'Σ̇⁻¹X)q]

and

(IV.7)  δ̂(2)_RGLS = Q₂δ̂(1)_RGLS + q₂

where:

Q₂ is a rk R×(n − rk R) matrix derived from R by reducing R to essentially a row echelon form by a series of row operations, then forming Q₂ as the negative of the resulting row echelon matrix (possibly rearranged slightly).

Q is an n×(n − rk R) matrix formed as Q₂ augmented by an (n − rk R)×(n − rk R) identity matrix (and the rows possibly rearranged).

q₂ is a rk R×1 vector derived from r by performing the row operations on the augmented matrix [R ⋮ r] instead of on R alone.

q is an n×1 vector formed as q₂ augmented by an (n − rk R)×1 vector of zeros (and the elements possibly rearranged to conform with the rearrangement of the rows of Q).

δ̂(1)_RGLS is composed of the elements of δ̂_RGLS corresponding to the (n − rk R)×1 vector of zeros added to q₂ in forming q.

δ̂(2)_RGLS is composed of the remaining rk R elements of δ̂_RGLS.

The use of (IV.6) and (IV.7) is equivalent to (1) solving (IV.4) for rk R of the coefficients in terms of the remaining n − rk R coefficients; (2) substituting this solution for the rk R coefficients into (IV.1), and rewriting (i.e., redefining variables) so that in effect an unrestricted GLS model with n − rk R coefficients to be estimated is obtained; (3) estimating these n − rk R coefficients by the GLS formula (IV.2); and (4) substituting these estimates back into the solution of step (1) to obtain the estimates of the rk R coefficients which were there "solved out." The use of formula (IV.5) is equivalent to the use of the two separate formulas (IV.6) and (IV.7). Computational procedures for forming Q, q, Q₂, and q₂ are given in the next section.
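Given Q, q, Q₂, and q₂, the estimator is a short computation; the following numpy sketch follows (IV.5) through (IV.8). The function name and return convention are this sketch's own assumptions, not notation from the text.

```python
import numpy as np

def rgls(X, y, Sigma, Q, q, Q2=None, q2=None):
    # Restricted GLS via the Q formula.  With A = X' Sigma^-1 X and
    # b = X' Sigma^-1 y, (IV.6) estimates the n - rk R free coefficients
    # delta1, (IV.7) recovers the solved-out ones, and stacking the two
    # reproduces (IV.5): delta = Q delta1 + q.
    Si = np.linalg.inv(Sigma)
    A = X.T @ Si @ X
    b = X.T @ Si @ y
    # (IV.6): the largest system solved is only of order n - rk R
    delta1 = np.linalg.solve(Q.T @ A @ Q, Q.T @ (b - A @ q))
    delta = Q @ delta1 + q                                # (IV.5)
    delta2 = None if Q2 is None else Q2 @ delta1 + q2     # (IV.7)
    var = Q @ np.linalg.inv(Q.T @ A @ Q) @ Q.T            # (IV.8)
    return delta, delta1, delta2, var
```

For example, the single restriction δ1 = δ2 on a two-coefficient model corresponds to Q = [1; 1], q = 0, Q₂ = [1], q₂ = 0, and the routine then fits one free coefficient and copies it into both slots.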
The precise definition of these matrices is given by their computational procedure.

The variance-covariance matrix for δ̂_RGLS is:¹

(IV.8)  Var(δ̂_RGLS) = Q[Q'(X'Σ̇⁻¹X)Q]⁻¹Q'

¹A derivation of (IV.8) is given after the derivation of δ̂_RGLS.

1. Computation of Q and q

First of all, the augmented matrix [R ⋮ r] is formed. Any row operation which is performed on R will be performed on r as well. The matrix Q and the vector q may be formed as follows:¹

1st Series of Steps -- Reduction of R to Row Echelon Form

(1a) Let abs R_{i1,j1} be the largest element in absolute value of R. If abs R_{i1,j1} is less than ε₁, go to step m. (ε₁ is explained in step m.) Otherwise switch rows 1 and i1 of the augmented matrix and columns 1 and j1 of this matrix so that the largest element occurs in column 1 of row 1. Record the order of the new rows and columns in terms of the order of the original rows and columns of R.

(1b) Perform row operations on the resulting augmented matrix to reduce the first column to column 1 of an N_R×N_R identity matrix. Denote the resulting augmented matrix as [I(1) ⋮ R(1) ⋮ r(1)], where I(1) is the first column of an N_R×N_R identity matrix.

¹As indicated in Part III (Programming Considerations), to reduce rounding error, all variables should be normalized so that all elements of the Z'Z matrix will be of comparable magnitude and unaffected by the multiplication of any variable by a positive constant such as a power of 10 (i.e., unaffected by shifting the decimal point of the variable). Normalization such that the variables inherent in the Z'Z matrix all have length 1, or their deviations from means have length 1, is suggested. Thus, a step which should precede the first step outlined in this section is to normalize the columns of R and r to take account of the normalization of variables. The scaling of variables up or down by a user will then have no effect on the normalized R matrix.
If the elements in the rows of R and r differ greatly in magnitude, the R matrix and r vector should also be normalized row-wise to reduce rounding error. (Multiplication of any row of [R ⋮ r] by a constant does not change the restriction.)

(2a) Let abs R_{i2,j2} be the largest element in absolute value in rows 2 through N_R of R(1). If abs R_{i2,j2} is less than ε₂, go to step m. Otherwise switch rows 2 and i2 of the augmented matrix and columns 2 and j2 of this matrix so that this element occurs in column 2 of row 2. Record the order of the new rows and columns in terms of the order of the original rows and columns of R.

(2b) Perform row operations on the resulting augmented matrix to reduce the second column to column 2 of an N_R×N_R identity matrix. Denote the resulting augmented matrix as [I(2) ⋮ R(2) ⋮ r(2)], where I(2) is the first two columns of an N_R×N_R identity matrix.

In general, at the kth step:

(ka) Let abs R_{ik,jk} be the largest element in absolute value of R(k−1). If abs R_{ik,jk} is less than ε_k, go to step m. Otherwise, switch rows k and ik of the augmented matrix and columns k and jk of this matrix so that this element occurs in column k of row k. Record the order of the new rows and columns in terms of the order of the original rows and columns of R.

(kb) Perform row operations on the resulting augmented matrix to reduce the kth column to column k of an N_R×N_R identity matrix. Denote the augmented matrix as [I(k) ⋮ R(k) ⋮ r(k)], where I(k) is the first k columns of an N_R×N_R identity matrix.

(m) The procedure is continued until either (1) all N_R rows have been treated (i.e., I(k) is an N_R×N_R identity matrix), in which case R has full row rank, i.e., rk R = N_R; or (2) at the mth step, abs R_{im,jm} < ε_m, in which case rk R = m − 1. Let us assume the latter, which we will call step m.
Thus, at step m we have:¹

(IV.9)  [I(m−1) ⋮ R(m−1) ⋮ r(m−1)] =

        [ I                  A1                     b1
          rk R×rk R          rk R×(n−rk R)          rk R×1

          0                  A2                     b2
          (N_R−rk R)×rk R    (N_R−rk R)×(n−rk R)    (N_R−rk R)×1 ]

(If rk R = N_R, the matrix [0 ⋮ A2 ⋮ b2] will not occur.)

If no rounding error occurred, and if the remaining rows were exact linear combinations of the preceding m − 1 rows, then abs R_{im,jm} would be zero. We must, however, allow for the possibility of rounding error; hence, we can detect R having less than full row rank only if we consider an ε_k which is greater than zero at each step. A preset ε₁ = ε₂ = ··· = ε_{N_R} can be assumed or calculated before the procedure is started, or an ε_k can be calculated at each step to reflect the number of operations performed.

¹During the calculation of [I(m−1) ⋮ R(m−1)], columns were rearranged; therefore, the column corresponding to a given coefficient in δ will have been moved. Suppose that the coefficients of δ are now rearranged so that each coefficient will be in the same order as its corresponding column of [I(m−1) ⋮ R(m−1)]. Let us designate δ in its rearranged order as δ*. Let us also rearrange the columns of R so that they are in the same order as their corresponding coefficients in δ* (the same order as the columns of [I(m−1) ⋮ R(m−1)]), and let us delete from R and r the rows (if any) corresponding to [0 ⋮ A2 ⋮ b2]. Let us designate the new matrix obtained as R* and the new vector as r*. Then the original set of restrictions could be rewritten as R*δ* = r*, where, partitioning conformably, R₁* is the rk R×rk R matrix formed from the first rk R columns of R*. By our method of calculation, [I ⋮ A1 ⋮ b1] = (R₁*)⁻¹[R₁* ⋮ R₂* ⋮ r*]; i.e., A1 = (R₁*)⁻¹R₂* and b1 = (R₁*)⁻¹r*.

rk R < N_R may occur in either of two cases:

(1) The remaining N_R − m + 1 restrictions are implied by (are linear combinations of) the preceding m − 1 restrictions treated; therefore, they can be ignored. In this case, b2 will be approximately a vector of zeros.
(If no rounding error occurred, b2 would be exactly a vector of zeros; however, due to rounding error we should compare the absolute value of the elements of b2 with some positive constant.)

(2) At least one of the remaining N_R − m + 1 restrictions is inconsistent with the preceding m − 1 restrictions treated; hence, action should be taken by the user to remove the inconsistency. Inconsistency is detected by comparing the absolute value of the elements of b2 with a small positive constant: abs (b2)_i > ε_{(b2)i} implies that row i of [0 ⋮ A2 ⋮ b2] is inconsistent with the rows of [I ⋮ A1 ⋮ b1]. Since we recorded the row number of the original matrix R corresponding to each row of these augmented matrices, the set of equations (in terms of the row numbers of R) with which this equation is inconsistent can be noted so that the set of restrictions can be corrected by the user. Even after finding one i for which abs (b2)_i > ε_{(b2)i}, the remaining (b2)_i should be checked so that all inconsistencies are noted for correction.

2nd Series of Steps -- Formation of Q and q from A1 and b1

Form a matrix Q* and a vector q* as:

(IV.10)  Q* = [ −A1 ; I ]   (rk R×(n−rk R) over (n−rk R)×(n−rk R); Q* is n×(n−rk R))

         q* = [ b1 ; 0 ]    (rk R×1 over (n−rk R)×1; q* is n×1)

In forming the matrix [I(m−1) ⋮ R(m−1)] we rearranged the columns of R, noting their revised order in terms of their original order. If the rows of Q* and q* are considered to be in the revised order (the same order as the columns of [I(m−1) ⋮ R(m−1)]), then the matrices Q and q can be formed from Q* and q* by rearranging the rows of Q* and q* so that they are in the same order as the original columns of R. Thus,

Q = Q* with the rows arranged to the original order of the columns of R.
q = q* with the rows arranged to the original order of the columns of R.

The columns of Q need not be rearranged, since they correspond to the rows of R, and the order of the rows of R is of no consequence.
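The two series of steps above can be sketched compactly in numpy. This is a sketch under assumptions: the function name, the single tolerance `eps` (standing in for the per-step thresholds ε_k), and the return convention are this sketch's own; the column normalization recommended in the footnote is omitted, and the flagged inconsistent rows are reported by their position in the reduced matrix rather than by the recorded original row numbers.

```python
import numpy as np

def build_Q_q(R, r, eps=1e-10):
    # Reduce [R | r] by Gauss-Jordan with full pivoting (steps 1a..kb),
    # then form Q and q (2nd series of steps).
    R = np.array(R, dtype=float)
    r = np.array(r, dtype=float).ravel()
    NR, n = R.shape
    A = np.hstack([R, r[:, None]])      # augmented matrix [R | r]
    cols = list(range(n))               # original position of each column
    m = 0                               # rank found so far (rk R)
    for k in range(min(NR, n)):
        sub = np.abs(A[k:, k:n])
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] < eps:             # step m reached: rk R = k
            break
        A[[k, k + i]] = A[[k + i, k]]           # row switch
        A[:, [k, k + j]] = A[:, [k + j, k]]     # column switch
        cols[k], cols[k + j] = cols[k + j], cols[k]
        A[k] /= A[k, k]                         # reduce column k to
        for t in range(NR):                     # column k of an identity
            if t != k:
                A[t] -= A[t, k] * A[k]
        m = k + 1
    A1, b1 = A[:m, m:n], A[:m, n]
    b2 = A[m:, n]                       # ~0 for redundant rows
    inconsistent = [t for t in range(NR - m) if abs(b2[t]) > eps]
    # 2nd series: Q* = [-A1 ; I], q* = [b1 ; 0], then undo the column
    # rearrangement row-wise so Q, q match the original coefficient order.
    Qstar = np.vstack([-A1, np.eye(n - m)])
    qstar = np.concatenate([b1, np.zeros(n - m)])
    Q, q = np.zeros((n, n - m)), np.zeros(n)
    for pos, c in enumerate(cols):
        Q[c], q[c] = Qstar[pos], qstar[pos]
    return Q, q, m, inconsistent
```

For any consistent set of restrictions the output satisfies RQ = 0 and Rq = r, so every δ = Qδ(1) + q obeys Rδ = r; a redundant row simply lowers the reported rank, and an inconsistent row is flagged for correction.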
Computation of Q₂ and q₂

The above procedure also gives a method of separating out n − rk R coefficients which may be estimated directly, with the remaining rk R coefficient estimates then solved for from these estimates. Let Q₂ be formed from Q, and q₂ be formed from q, by deleting the (n − rk R)×(n − rk R) identity submatrix of Q and the corresponding rows of q.¹ Let δ(1) be the coefficients corresponding to the rows of the identity submatrix which were deleted from the Q matrix and let δ(2) be the coefficients corresponding to the rows of Q₂; then δ̂(1)_RGLS may be calculated directly by (IV.6), and δ̂(2)_RGLS may be calculated by (IV.7) from Q₂, q₂, and δ̂(1)_RGLS.

In the calculation of Q and q, it is not actually necessary to search for the largest element in the remaining submatrix at any step. The rows could be taken in turn and the first non-zero element encountered used for R_{ik,jk}; however, the extra searching and non-sequential selection of rows will reduce rounding error for many problems.

Proof of (IV.5), (IV.6), and (IV.7)

The method of deriving Q, Q₂, q, and q₂ forms most of the proof of the formulas. In the first computational method, we reduced the [R ⋮ r] matrix to the reduced augmented matrix [I A1 b1 ; 0 A2 b2], where A2 and b2 are within rounding error of zero (assuming no inconsistency in the original set of restrictions). Only row operations were used in reducing the [R ⋮ r] matrix; therefore, the above reduced augmented matrix incorporates the full set of restrictions. In fact, if A2 and b2 are within rounding error of zero, the full set of restrictions is contained in the [I ⋮ A1 ⋮ b1] matrix, which contains rk R rows. Thus, the restrictions may be expressed as:

(IV.11)  [I ⋮ A1]δ* = b1

¹Q₂ may be formed directly from −A1, and q₂ may be formed directly from b1, by rearranging the rows of −A1 and b1, respectively.
where the * denotes that the coefficients in the δ vector have been rearranged into the same order as the columns of [I ⋮ A1]. For notational convenience, let us respecify the order of variables in the original problem so that they are in the same order as the columns of [I ⋮ A1]. Thus, under the renumbering, δ* = δ and b1 = q₂. (IV.11) may be rewritten as:

(IV.12)  [I ⋮ A1]δ = [I ⋮ A1][δ(2) ; δ(1)] = δ(2) + A1δ(1) = q₂

or

(IV.13)  δ(2) = −A1δ(1) + q₂ = Q₂δ(1) + q₂

since Q₂ = −A1 under our revised numbering of variables. Our basic model is now:

(IV.14)  y = [X2 ⋮ X1][δ(2) ; δ(1)] + u

subject to

(IV.15)  δ(2) = Q₂δ(1) + q₂

or, substituting for δ(2) into (IV.14):

(IV.16)  y = [X2 ⋮ X1][Q₂δ(1) + q₂ ; δ(1)] + u

or

(IV.17)  y − X2q₂ = [X2 ⋮ X1][Q₂ ; I]δ(1) + u

or

(IV.18)  y − Xq = XQδ(1) + u

since q = [q₂ ; 0] and Q = [Q₂ ; I] under our revised numbering. Applying the GLS formula to (IV.18) [letting y − Xq and XQ be the y and X, respectively, of (IV.2)] we get:

(IV.19)  δ̂(1) = [Q'X'Σ̇⁻¹XQ]⁻¹[Q'X'Σ̇⁻¹(y − Xq)] = [Q'X'Σ̇⁻¹XQ]⁻¹[Q'X'Σ̇⁻¹y − Q'X'Σ̇⁻¹Xq]

which is exactly formula (IV.6). If we replace δ(1) in (IV.15) by δ̂(1), we have δ̂(2) expressed as in formula (IV.7). Further,

(IV.20)  δ̂ = [δ̂(2) ; δ̂(1)] = [Q₂δ̂(1) + q₂ ; δ̂(1)] = Qδ̂(1) + q;

hence, substituting for δ̂(1) we get:

δ̂ = Q[Q'X'Σ̇⁻¹XQ]⁻¹[Q'X'Σ̇⁻¹y − Q'X'Σ̇⁻¹Xq] + q,

which is exactly (IV.5).

To derive the variance formula (IV.8) we note that (by IV.20):¹

(IV.21)  Var(δ̂) = Q[Var(δ̂(1))]Q'

But by the GLS formula for variance (IV.3), substituting XQ from (IV.18) for the X in (IV.1):

(IV.22)  Var(δ̂(1)) = [Q'X'Σ̇⁻¹XQ]⁻¹

Substituting this into (IV.21) we get (IV.8).

¹If y = Ax + b with A a matrix of fixed elements and b a vector of fixed elements, then Var(y) = A[Var(x)]A'.
2. Relationship to another restriction formula

If the further assumptions that R has full row rank and X has full column rank are imposed, an alternative formula for δ̂_RGLS is given by:

(IV.23)  δ̂_RGLS = δ̂_GLS − (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹[Rδ̂_GLS − r]

In order to show the relationship between this and our previous formula for δ̂_RGLS, we will derive (IV.23) from (IV.5) using the above additional assumptions (rk R = N_R and rk X = n).¹

Derivation of (IV.23)

Since rk R = N_R, we may reorder columns and variables if necessary and partition R as [R1 ⋮ R2] with R1 square and nonsingular. Then application of row operations to reduce R = [R1 ⋮ R2] to [I ⋮ A1] is equivalent to premultiplying R by a non-singular matrix C such that CR1 = I. Then,

(IV.24)  RQ = C⁻¹CRQ = C⁻¹[I ⋮ A1]Q = C⁻¹[−A1 + A1] = C⁻¹·0 = 0.

Thus, the columns of Q are orthogonal to the rows of R.² Before going further we will find it useful to establish the following lemma:

(IV.25) Lemma: Let E and F be square symmetric idempotent matrices of order p with EF = FE = 0, rk E = n, and rk F = p − n; then E + F = I.

Proof: Let E + F = G.

¹A method for deriving (IV.23) using Lagrangian multipliers is given in Stroud and Zellner [1962]. Another derivation of (IV.23) is given in Chipman and Rao [1964a]. The Ψ matrix and α vector of Chipman and Rao are the same as our R matrix and r vector, respectively. Our Q matrix satisfies the requirements for their G matrix and our q vector satisfies the requirements for their Gα vector. (Chipman and Rao do not present an actual computational scheme for Ψ, G, or α.) Using these substitutions, the essence of (IV.5) is contained as an intermediate step of Chipman and Rao's derivation of (IV.23). Since the proof of (IV.5) given in this paper differs from Chipman and Rao's, the derivation of (IV.23) from (IV.5) differs from Chipman and Rao's also.
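Before completing the algebraic derivation, the claimed equivalence can be checked numerically. The sketch below implements both formulas; the function names are illustrative, and the Q and q in the usage example are a hand-built pair matching the single restriction R = [1 −1 0], r = 0 (i.e., δ1 = δ2). When rk R = N_R and rk X = n, the two routines agree to rounding error.

```python
import numpy as np

def rgls_R_formula(X, y, Sigma, R, r):
    # The R formula (IV.23): the unrestricted GLS estimate adjusted
    # toward the restrictions; requires rk R = N_R and rk X = n.
    Si = np.linalg.inv(Sigma)
    Ainv = np.linalg.inv(X.T @ Si @ X)              # (X' Sigma^-1 X)^-1
    d_gls = Ainv @ (X.T @ Si @ y)                   # unrestricted GLS
    adj = np.linalg.solve(R @ Ainv @ R.T, R @ d_gls - r)
    return d_gls - Ainv @ R.T @ adj

def rgls_Q_formula(X, y, Sigma, Q, q):
    # The Q formula (IV.5), restated here so the two can be compared;
    # it never needs an inverse larger than order n - rk R.
    Si = np.linalg.inv(Sigma)
    A = X.T @ Si @ X
    d1 = np.linalg.solve(Q.T @ A @ Q, Q.T @ (X.T @ Si @ y - A @ q))
    return Q @ d1 + q
```

Note that the Q formula also runs when R has redundant rows or X is rank deficient, situations in which the matrix R(X'Σ̇⁻¹X)⁻¹R' of the R formula becomes singular.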
Then G is symmetric idempotent and has rank n + (p − n) = p.³ Thus, G has full row and column rank; hence, G⁻¹ exists. Then

E + F = G = GI = G(GG⁻¹) = GG(G⁻¹) = GG⁻¹ = I.

(End of Proof of Lemma)

From (IV.5) we have:

(IV.26)  δ̂_RGLS = Q[Q'X'Σ̇⁻¹XQ]⁻¹[Q'X'Σ̇⁻¹y − Q'X'Σ̇⁻¹Xq] + q
               = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)[(X'Σ̇⁻¹X)⁻¹X'Σ̇⁻¹y − q] + q
               = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)[δ̂_GLS − q] + q⁴

Let

E = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)   and   F = (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R.

Then E and F are symmetric idempotent with EF = FE = 0, since:

(1) EE = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X) = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X) = E

(2) FF = (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R(X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R = (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R = F

(3) EF = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)(X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R = 0, since Q'R' = 0.

(4) FE = (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹RQ[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X) = 0, since RQ = 0.

To apply the previous lemma, we need to derive the ranks of E and F. In the process of deriving these ranks we will point out two additional assumptions about rank that need to be made [see (4) and (8) below]. Those willing to assume that rk E = n − rk R and rk F = rk R may skip the derivation of the ranks of E and F in (1) through (10) below.

(1) Σ̇ was assumed positive definite, hence of rank T.

(2) X was assumed to have full column rank (rk X = n).

(3) Since Σ̇ is positive definite, Σ̇⁻¹ will be positive definite; therefore, Σ̇⁻¹ can be expressed as Σ̇⁻¹ = P'P where P is non-singular.

²By the definition of orthogonal complement given in Koopmans, Rubin and Leipnik [1950], p. 89, Q is the orthogonal complement of R; that is, Q has full column rank and satisfies RQ = 0.

³For symmetric idempotent matrices E and F: (E + F)² = E + F if and only if EF = FE = 0. Also, EF = FE = 0 implies rk(E + F) = rk E + rk F. See Chipman and Rao [1964b].

⁴Each inserted expression of the form BB⁻¹ in (IV.26) is a matrix times its inverse, hence an identity matrix which may be suppressed.
Thus, rk(X'Σ̇⁻¹X) = rk(X'P'PX) = rk(PX) = rk X = n.¹

(4) rk(XQ) ≤ min(rk X, rk Q) = n − rk R. We will assume that rk(XQ) = n − rk R.²

(5) rk[Q'X'Σ̇⁻¹XQ] = rk[Q'X'P'PXQ] = rk(PXQ) = rk(XQ) = n − rk R.

(6) rk(Q[Q'X'Σ̇⁻¹XQ]⁻¹Q') = rk[Q'X'Σ̇⁻¹XQ]⁻¹ = n − rk R. The first equality comes from, first, rk(Q[Q'X'Σ̇⁻¹XQ]⁻¹Q') cannot exceed n − rk R, since the rank of a product of matrices cannot exceed the rank of any matrix in the product; second, rk(Q[Q'X'Σ̇⁻¹XQ]⁻¹Q') cannot be less than n − rk R, since partitioning Q into Q₂ and the identity submatrix shows that Q[Q'X'Σ̇⁻¹XQ]⁻¹Q' contains [Q'X'Σ̇⁻¹XQ]⁻¹ itself as a submatrix, and the rank of any matrix is greater than or equal to the rank of any submatrix in the matrix. The second equality comes from (5).

(7) rk E = rk(Q[Q'X'Σ̇⁻¹XQ]⁻¹Q') = n − rk R. The first equality comes from (X'Σ̇⁻¹X) being a non-singular matrix, and the second equality comes from (6).

(8) (IV.23) assumes that R(X'Σ̇⁻¹X)⁻¹R' is non-singular; i.e., that [R(X'Σ̇⁻¹X)⁻¹R']⁻¹ exists. This implies that rk[R(X'Σ̇⁻¹X)⁻¹R'] = rk R.

(9) rk(R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R) = rk R. To see this, let B1 = [R(X'Σ̇⁻¹X)⁻¹R']⁻¹. Then R'B1R = [C⁻¹CR]'B1[C⁻¹CR] = (CR)'(C⁻¹)'B1C⁻¹(CR), where C is the matrix used in (IV.24). Let B2 = (C⁻¹)'B1C⁻¹. Then rk B2 = rk B1 = rk R, since (C⁻¹)' and C⁻¹ are non-singular. Now (CR)'B2(CR) = [I ⋮ A1]'B2[I ⋮ A1] = [B2, B2A1 ; A1'B2, A1'B2A1], which has the same rank as its submatrix B2 for the same reason as the explanation in (6).

(10) rk F = rk(R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R) = rk R. The first equality comes from (X'Σ̇⁻¹X)⁻¹ being a non-singular matrix, and the second equality comes from (9).

¹For any matrix A, rk(A'A) = rk(AA') = rk A = rk A'. Also, if B1 and B2 are any non-singular matrices of order compatible with A (A may be rectangular), rk(B1A) = rk(AB2) = rk A.

²rk(XQ) = rk(X2Q₂ + X1), where rk X1 = n − rk R.
Since E and F are square idempotent matrices of order n with EF = FE = 0, rk E = n − rk R, and rk F = rk R, then by the above lemma (IV.25), E + F = I, or E = I − F. Substituting I − F for E into (IV.26) we have:

δ̂_RGLS = [I − (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R][δ̂_GLS − q] + q
       = [δ̂_GLS − q] − (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹[Rδ̂_GLS − Rq] + q
       = δ̂_GLS − (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹[Rδ̂_GLS − r]

(since r = Rδ̂_RGLS = R[Qδ̂(1) + q] = RQδ̂(1) + Rq = 0·δ̂(1) + Rq = Rq), which is exactly equation (IV.23).

Advantages of the Q Formula over the R Formula

Formula (IV.5) (which we will call the Q formula) has the following advantages over formula (IV.23) (which we will call the R formula):

(1) X need not have full column rank if the Q formula is used but must have full column rank for the R formula. Thus, the Q formula may permit calculation of δ̂_RGLS when a unique unrestricted δ̂_GLS does not exist; to use the R formula, however,
(6) Use of the Q formula permits a unified treatment of restrictions on coefficients for DLS, ZA, IZA (iterative Zellner-Aitken), 3SLS, I3SLS (iterative three-stage least squares), SML (limited information subsystem maximum likelihood), LML (linearized maximum likelihood), and FIML (full information maximum likelihood), as will become evident as each of these methods is discussed.

The Q formula has the disadvantage that the Q matrix and q vector must be calculated, but this is only a small task which can be accomplished rapidly on a computer.

C. Restrictions Imposed on Direct Least Squares Coefficients

DLS estimation is the particular case of GLS estimation in which Σ is assumed to be of the form σ²I, which is known except for a multiplicative constant (σ²) as required by the GLS model. Thus, if the model is written as:

(IV.27) y = X_μδ + u ,

the DLS estimator is

(IV.28) δ̂_DLS = [X_μ'X_μ]⁻¹X_μ'y

and the variance of δ̂_DLS is given by

(IV.29) Var(δ̂_DLS) = σ²[X_μ'X_μ]⁻¹ .

Assuming that X_μ contains only fixed variables, an unbiased estimate of σ² is¹

(IV.30) σ̂² = û'_DLS û_DLS/(T − n) .

If the restrictions

(IV.31) Rδ = r   (R being N_R×n, δ being n×1, and r being N_R×1)

are imposed on the coefficients, the RDLS (restricted DLS) model is the particular case of the RGLS model in which Σ is assumed to be of the form σ²I. Thus, the restricted DLS solution may be obtained by substituting X_μ for X and σ²I for Σ in RGLS formula (IV.5), which gives:

(IV.32) δ̂_RDLS = Q{[Q'(X_μ'X_μ)Q]⁻¹Q'[X_μ'y − (X_μ'X_μ)q]} + q

where Q and q are calculated from R and r in the manner given previously for the RGLS model. Q is an n×(n − rk R) matrix, and q is an n×1 vector.

¹Goldberger [1964], p. 268 states that the variables in X may be stochastic provided they are distributed independently of u.
As with the GLS model, the use of (IV.32) obtains the same result as if (1) rk R coefficients are solved out in terms of the remaining n − rk R coefficients, (2) the n − rk R coefficients are estimated by DLS, and (3) the rk R coefficients which were solved out are calculated from the n − rk R coefficients which were estimated directly. Provided [Q'X_μ'X_μQ]⁻¹ exists [this inverse will exist if rk(X_μQ) = n − rk R], the RDLS estimates are unique. The solution obtained is the solution which minimizes (by choice of δ) (y − X_μδ)'(y − X_μδ) subject to Rδ = r.

¹The multiplicative constant σ² cancels out, since

δ̂_RDLS = Q{[Q'(X_μ'(1/σ²)X_μ)Q]⁻¹Q'[X_μ'(1/σ²)y − (X_μ'(1/σ²)X_μ)q]} + q
        = Q{σ²(1/σ²)[Q'(X_μ'X_μ)Q]⁻¹Q'[X_μ'y − (X_μ'X_μ)q]} + q
        = Q{[Q'(X_μ'X_μ)Q]⁻¹Q'[X_μ'y − (X_μ'X_μ)q]} + q .

²Calculation of the Q matrix and q vector also gives a means of separating out the rk R coefficients, δ̂⁽²⁾_RDLS, which may be calculated from the remaining n − rk R "unrestricted" coefficients, δ̂⁽¹⁾_RDLS. Thus, the following pair of formulas are together equivalent to (IV.32):

δ̂⁽¹⁾_RDLS = [Q'(X_μ'X_μ)Q]⁻¹Q'[X_μ'y − (X_μ'X_μ)q]
δ̂⁽²⁾_RDLS = Q₂δ̂⁽¹⁾_RDLS + q₂

where Q₂ and q₂ are the subparts of Q and q noted earlier.

The variance-covariance matrix for δ̂_RDLS is

(IV.33) Var(δ̂_RDLS) = σ²Q[Q'(X_μ'X_μ)Q]⁻¹Q'

with an unbiased estimate of σ² being

(IV.34) σ̂²_RDLS = û'_RDLS û_RDLS/(T − n + rk R)

provided X_μ is a matrix of "fixed" variables.¹

If we relax our assumptions to those of the double k-class model, permitting, in particular, jointly dependent variables in the matrix X_μ, then the RDLS estimates will have the same properties as the DLS estimates noted in the double k-class section. Although the formulas within this section no longer provide unbiased or even consistent estimates, δ̂_RDLS will still be the δ̂ which minimizes û'û (subject to the restrictions, of course), and σ̂²_RDLS will still be the estimate of σ² which would be obtained if the restrictions were solved out and then the usual DLS formulas applied.

¹Or the variables in X_μ are distributed independently of u.
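As a concrete sketch (hypothetical data; the restriction equates the first two coefficients), the Q formula (IV.32), the R formula (IV.23) specialized to Σ = σ²I, and the solve-out procedure all give the same restricted estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 25, 3
X = rng.standard_normal((T, n))                 # X_mu, full column rank here
y = X @ np.array([2.0, 2.0, -1.0]) + 0.1 * rng.standard_normal(T)

R = np.array([[1.0, -1.0, 0.0]])                # restriction b1 - b2 = 0
r = np.array([0.0])
q = np.linalg.pinv(R) @ r                       # a particular solution of R q = r (here q = 0)
_, _, Vt = np.linalg.svd(R)
Q = Vt[1:].T                                    # columns span the null space of R

XtX, Xty = X.T @ X, X.T @ y

# Q formula (IV.32)
d_q = Q @ np.linalg.solve(Q.T @ XtX @ Q, Q.T @ (Xty - XtX @ q)) + q

# R formula (IV.23) with Sigma = s^2 I (s^2 cancels)
d_dls = np.linalg.solve(XtX, Xty)
C = np.linalg.inv(XtX)
d_r = d_dls - C @ R.T @ np.linalg.solve(R @ C @ R.T, R @ d_dls - r)

# solving the restriction out: y = b1 (x1 + x2) + b3 x3 + u
Xs = np.column_stack([X[:, 0] + X[:, 1], X[:, 2]])
b = np.linalg.solve(Xs.T @ Xs, Xs.T @ y)
d_solved = np.array([b[0], b[0], b[1]])

assert np.allclose(d_q, d_r) and np.allclose(d_q, d_solved)

u = y - X @ d_q
s2 = (u @ u) / (T - n + 1)                      # unbiased estimate (IV.34), rk R = 1
print(d_q.round(3), round(s2, 4))
```

The same pattern with a general positive definite Σ reproduces the RGLS comparison of the previous section.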
D. Restrictions Imposed on Two-stage Least Squares Coefficients

Derivation of 2SLS as a GLS Method

In the 2SLS model used earlier, we considered an equation of the form:

(IV.35) y = Z_μδ + u

with Var(u) = σ²I. If (IV.35) is premultiplied by X_I', we have¹

(IV.36) X_I'y = X_I'Z_μδ + X_I'u

with Var(X_I'u) = X_I'[Var(u)]X_I = X_I'[σ²I]X_I = σ²X_I'X_I, provided that we assume that the X_I matrix contains "fixed" variables only. If GLS is applied to (IV.36) by using X_I'y as the GLS y, X_I'Z_μ as the GLS X, and X_I'u as the GLS u in the GLS computational formula (IV.2), the following GLS estimator is obtained:²

(IV.37) δ̂ = [Z_μ'X_I(σ²X_I'X_I)⁻¹X_I'Z_μ]⁻¹[Z_μ'X_I(σ²X_I'X_I)⁻¹X_I'y]
          = [Z_μ'X_I(X_I'X_I)⁻¹X_I'Z_μ]⁻¹[Z_μ'X_I(X_I'X_I)⁻¹X_I'y] .

But, by (I.36), this is

δ̂ = [ [Y]'_∥X_I [Y]_∥X_I , [Y]'_∥X_I [X_μ]_∥X_I ; [X_μ]'_∥X_I [Y]_∥X_I , [X_μ]'_∥X_I [X_μ]_∥X_I ]⁻¹ [ [Y]'_∥X_I y ; [X_μ]'_∥X_I y ]

and, since the variables in X_μ are contained in X_I, this is [see (I.40)]:

δ̂ = δ̂_2SLS = [ [Y]'_∥X_I [Y]_∥X_I , [Y]'_∥X_I X_μ ; X_μ'[Y]_∥X_I , X_μ'X_μ ]⁻¹ [ [Y]'_∥X_I y ; X_μ'y ] .

Thus, 2SLS may be derived by an application of GLS. Not all of the assumptions of the GLS model are met; hence, δ̂_2SLS does not have all of the desirable properties that δ̂_GLS has. Particularly affecting the properties of 2SLS is the fact that the m jointly dependent variables in the X_I'Y matrix (a submatrix of X_I'Z_μ) are asymptotically correlated with the disturbance X_I'u.

¹As before, we assume that the matrix of instruments, X_I, includes the variables in X_μ.

²This derivation of 2SLS is given in Zellner and Theil [1962], p. 56.

Calculation of 2SLS When Restrictions are Imposed on the Coefficients

Since 2SLS may be derived as a case of GLS, it is natural to question whether restrictions can be imposed on the coefficients and a restricted 2SLS estimator derived as an application of RGLS. If we denote:

(IV.38) A = [ [Y]'_∥X_I [Y]_∥X_I , [Y]'_∥X_I X_μ ; X_μ'[Y]_∥X_I , X_μ'X_μ ] ,  b = [ [Y]'_∥X_I y ; X_μ'y ] ,
then the GLS-2SLS solution (IV.37) is:

(IV.39) δ̂ = A⁻¹b .

If we impose the restrictions

(IV.40) Rδ = r

on the coefficients of (IV.35), then the RGLS solution corresponding to (IV.39) [we will denote this solution as the R2SLS solution] is:¹

(IV.41) δ̂_R2SLS = Q{[Q'AQ]⁻¹Q'[b − Aq]} + q

where Q and q are calculated from R and r in the same way as in the usual RGLS model. Q is an n×(n − rk R) matrix and q is an n×1 vector.

However, the coefficients obtained through use of the computational method given above may differ from the coefficients obtained if the restrictions, (IV.40), are used to reduce the number of coefficients to be estimated and then 2SLS is used to estimate the remaining coefficients. One reason that the coefficients may differ is that if the usual procedure of using the restrictions to reduce the number of coefficients to be estimated is followed, predetermined variables are often linearly combined with jointly dependent variables, and the newly constructed variables (those that are linear combinations of jointly dependent and predetermined variables) are then labeled jointly dependent. The predetermined variables which are linearly combined with jointly dependent variables then no longer occur in the X_μ matrix. Since these variables are no longer in the X_μ matrix, unless they are specifically entered into the X** matrix (X_I = [X_μ : X**]), the space spanned by the X_I matrix formed after the coefficients are solved into the equation is likely to be a proper subspace of the space spanned by the X_I matrix which would have been formed before the coefficients were solved into the equation. Hence, [Y]_∥X_I and [y]_∥X_I are likely to change.

¹Var(δ̂_R2SLS) = σ²Q[Q'AQ]⁻¹Q', with a consistent estimate of σ² being given by:

σ̂²_R2SLS = û'_R2SLS û_R2SLS/(T − n + rk R) .

If T were substituted for T − n + rk R in the denominator, the resulting σ̂²_R2SLS would still be consistent.
Another reason why the resulting coefficients may differ is that in using restrictions to reduce the number of coefficients, a set of predetermined variables may be linearly combined into a single predetermined variable; hence, the space spanned by X_I may again change.

To make the above remarks clearer, we will illustrate the effect of a restriction on two coefficients which effectively linearly combines a jointly dependent and a predetermined variable if the usual procedure is applied. Suppose that the equation to be estimated is:

(IV.42) y₁ = α₁y₂ + α₂x₂ + α₃x₃ + u

and the restriction

(IV.43) α₁ − α₂ = 0

is imposed on the coefficients in the equation. Then, y = y₁, Y = y₂, and X_μ = [x₂, x₃]. Suppose also that the R2SLS coefficients are calculated with the matrix of instruments being

(IV.44) X_I = [X_μ : X**] = [x₂, x₃ : x₄, x₅, x₆]

and that (for simplicity) X_I has full column rank (i.e., that none of the variables in X_I can be expressed as a linear combination of the remaining variables in X_I). Let the coefficients obtained through application of the R2SLS formulas be denoted [α̂₁]_R2SLS, [α̂₂]_R2SLS, and [α̂₃]_R2SLS.

As an alternative means of estimating the coefficients, let us use the usual procedure of using the restrictions to reduce the number of coefficients to be estimated. We get:

(IV.45) y₁ = α₁(y₂ + x₂) + α₃x₃ + u .

Thus, y = y₁, Y = y₂ + x₂, X_μ = x₃, and if X** is left unchanged, then

(IV.46) X_I = [X_μ : X**] = [x₃ : x₄, x₅, x₆] .

Application of the usual 2SLS formulas to (IV.45) will give a solution which we will designate the 2SLSR solution. [α̂₁]_2SLSR is not in general equal to [α̂₁]_R2SLS, and [α̂₃]_2SLSR is not in general equal to [α̂₃]_R2SLS, since the space of X_I has been restricted by omitting x₂.
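A numerical sketch of this example is given below (the data for x₂, …, x₆ and the disturbances are hypothetical). It applies the R2SLS formula (IV.41) to (IV.42) with the instruments (IV.44), and compares the result with 2SLS applied to the solved-out equation (IV.45) when x₂ is kept among the instruments; in that case the two agree, which is the first of the conditions noted next.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 50
x2, x3, x4, x5, x6 = (rng.standard_normal(T) for _ in range(5))
ytwo = x4 + 0.5 * x5 - x6 + rng.standard_normal(T)        # plays the role of y2
y1 = 0.7 * ytwo + 0.7 * x2 - 1.2 * x3 + rng.standard_normal(T)

def proj(M):
    """Projection matrix onto the column space of M."""
    return M @ np.linalg.solve(M.T @ M, M.T)

# R2SLS on (IV.42) with the instruments (IV.44): X_I = [x2, x3 : x4, x5, x6]
Z = np.column_stack([ytwo, x2, x3])
P = proj(np.column_stack([x2, x3, x4, x5, x6]))
A, b = Z.T @ P @ Z, Z.T @ P @ y1                           # the A and b of (IV.38)
R = np.array([[1.0, -1.0, 0.0]])                           # (IV.43): a1 - a2 = 0, so q = 0
_, _, Vt = np.linalg.svd(R)
Q = Vt[1:].T                                               # columns span the null space of R
d_r2sls = Q @ np.linalg.solve(Q.T @ A @ Q, Q.T @ b)        # (IV.41) with q = 0

# 2SLS on the solved-out equation (IV.45), keeping x2 among the instruments
Z2 = np.column_stack([ytwo + x2, x3])
P2 = proj(np.column_stack([x3, x4, x5, x6, x2]))           # same span as X_I above
d_2slsr = np.linalg.solve(Z2.T @ P2 @ Z2, Z2.T @ P2 @ y1)

assert np.allclose(d_r2sls, [d_2slsr[0], d_2slsr[0], d_2slsr[1]])
print("restricted coefficients:", d_r2sls.round(3))
```

Dropping x₂ from the second instrument list, as in (IV.46), makes the two solutions differ, which is the point of the example.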
The coefficients obtained in estimating (IV.42) subject to (IV.43) by the R2SLS formula (IV.41) would be the same as the coefficients obtained from estimating (IV.45) if:

(1) x₂ were added to X** in estimating (IV.45) (i.e., the X_I given in (IV.46) were changed to [x₃ : x₄, x₅, x₆, x₂]); therefore, the R2SLS coefficients would be obtained in both cases;¹ or if

(2) x₂ were listed as jointly dependent instead of predetermined and omitted from X_I (i.e., the X_I given in (IV.44) were changed to [x₃ : x₄, x₅, x₆]); therefore, the 2SLSR coefficients would be obtained in both cases.

The preceding does not imply that either the δ̂_R2SLS solution or the δ̂_2SLSR solution is incorrect. It merely shows the importance of the particular instruments selected.

¹The R2SLS coefficients would be obtained because [y₂ + x₂]_∥X_I = [y₂]_∥X_I + [x₂]_∥X_I [see (I.62)], which (if x₂ is contained in X_I) equals [y₂]_∥X_I + x₂ [see (I.38)].

PART II

MULTIPLE EQUATIONS METHODS

CHAPTER V

FULL INFORMATION MAXIMUM LIKELIHOOD (FIML)

A. Properties of the Full Information Maximum Likelihood Estimator

The full information maximum likelihood (FIML) estimator which is considered in this section is maximum likelihood if, in addition to the basic statistical assumptions of this paper (section I.C.3), we add the assumption that the matrix of disturbances, U, has the multivariate normal distribution. If the matrix of disturbances is not normally distributed, then the FIML computational formulas given in this section give estimates which have been termed quasi-maximum likelihood estimates.¹ Quasi-maximum likelihood estimates may still possess some desirable properties.

¹See Koopmans and Hood [1953], pp. 144-147.

The FIML estimator is a "full information" estimator in the sense that account is taken of all structural equations in the system (including identity equations) in deriving estimates of the population coefficients. In the single equation techniques of Part I, consideration was given to the structure of only a single equation at a time.
For some of these single equation techniques the predetermined variables in equations other than the equation being estimated were used, but no account was taken of the structure of the remaining equations. In the FIML method the coefficients of all stochastic equations are estimated simultaneously, a distinction being made between jointly dependent and predetermined variables in each equation, and explicit account being taken of all structural coefficient restrictions and any identity equations which may complete the system.¹

The FIML method may only be applied if the number of jointly dependent variables in the system equals the number of equations including identity equations.

Recording the particular variables which are said to occur in an equation is equivalent to restricting to zero the coefficients of the equation corresponding to the remaining variables in the system. The coefficient of one jointly dependent variable in the equation is also restricted to −1 to provide a normalization rule (see section I.C.1).² Initially, these are the only types of restriction that will be permitted on the coefficients; however, in section V.F FIML estimation will be generalized to take account of arbitrary linear restrictions imposed on the coefficients. In no place in this paper is consideration given to FIML estimation with restrictions imposed on the covariance matrix of the disturbances. In particular, not considered is the much simpler computational method called full-information diagonal (often abbreviated to FID) which is obtained by assuming that all off-diagonal elements of the disturbance variance-covariance matrix are zero. Thus, the FIML procedure developed in this paper is the full-information non-diagonal procedure (often denoted elsewhere as FIND).

¹Actually, the coefficients of predetermined variables in the identity equations are not used explicitly in the computation of the structural coefficients; however, if they were not known, the equation would not be a true identity equation. Thus, an identity equation which contains no jointly dependent variable adds no information to the system and may therefore be deleted. Also, the specification of the model is not changed if all predetermined variables in an identity equation are multiplied by their respective known coefficients and combined into a single predetermined variable.

²As noted in section I.C.1, for FIML estimation it makes no substantive difference which jointly dependent variable is singled out as the normalizing jointly dependent variable. In section V.F the normalization rule is also generalized.

Although considerable progress has been made in developing FIML procedures which permit estimation of equations in which coefficients enter in a non-linear fashion,¹ consideration in this paper will be restricted to estimation of a system in which coefficients enter each equation in a linear fashion.

In this chapter, we will assume that the system is identified. This means that each equation must be not only just-identified or over-identified in the single equation sense usually treated, but in a multiple equation sense as well.² Although some additional requirements must be met over and above those required for identification for single equation estimation, the single equation identification rules applied separately to each equation provide a good starting point.

A misconception fairly generally held is that there must be more observations in the sample than the number of coefficients in the system to be estimated. This is certainly not true.
Provided that there are sufficient observations that the estimated disturbance variance-covariance matrix is not singular (there must be more observations than the number of equations with disturbances), only sufficient observations that DLS can be applied to each equation separately is all that is in general required.³

¹Eisenpress has made considerable progress in this area. See Eisenpress and Greenstadt [1964].

²Identification is treated in the multiple equations sense in Koopmans, Rubin, and Leipnik [1950].

³That the estimated disturbance variance-covariance matrix, S, will be singular if the number of observations is less than the number of equations is easily shown. As noted further on in (V.19), S = (1/T)α̂_I(Z'Z)α̂_I' = (1/T)Û'Û; hence, rk S = rk Û. If T < M, then rk Û and therefore rk S will be less than M; i.e., S will be singular. Since S is calculated in the same manner for limited information subsystem maximum likelihood (SML), the Zellner-Aitken estimator (ZA), and three-stage least squares (3SLS), the requirement that there not be fewer observations than stochastic equations applies to these methods as well.

(At this point the reader may want to review the notation developed in section I.C for expressing a system of equations, including the notation related to the reduced form equations.)

Another property which is certainly worth noting applies to the reduced form coefficients estimated from the FIML structural coefficients. Let, as before, the reduced form equations be expressed as Y = XΠ' + V. Let Π̂ be any estimate of Π which is not inconsistent with the restrictions imposed on the coefficients of the structural equations. Also, let V̂ = Y − XΠ̂' be the matrix of reduced form residuals and (1/T)V̂'V̂ be the estimated variance-covariance matrix of the reduced form disturbances. Then if Π̂_FIML is calculated from the estimated FIML coefficients of the structural equations, i.e., Π̂_FIML = −Γ̂⁻¹_FIML B̂_FIML, the resulting det((1/T)V̂'_FIML V̂_FIML) will be less than or equal to the det((1/T)V̂'V̂) obtained from any set of reduced form coefficients which are not inconsistent with the restrictions imposed on the coefficients of the structural equations.⁴ Thus, of the estimating procedures which take the full information of the system into account, the FIML method gives the minimum estimated generalized variance of the disturbances of the reduced form equations.⁵ For this reason, it is common to refer to FIML estimates as least generalized variance (LGV) estimates. The LGV property does not, of course, rely on an assumption of normality. It is a property similar to the least squares property for a single equation.

⁴For this to be a meaningful statement we must assume that identity equations have been incorporated into the system by solving out one jointly dependent variable for each identity equation (thereby imposing less convenient restrictions, as will be noted further on). Otherwise,

det((1/T)V̂'V̂) = det((1/T){[Û : 0]Γ̂'⁻¹}'{[Û : 0]Γ̂'⁻¹}) = det((1/T)Γ̂⁻¹[ Û'Û , 0 ; 0 , 0 ]Γ̂'⁻¹) = 0

for any method meeting the restrictions of the structural equations. (Use of identity equations to solve out jointly dependent variables is necessary to explicitly specify properties only. In the computational procedure which is presented, the identity equations are explicitly recognized by the computational procedure rather than used to eliminate jointly dependent variables from the system.)

⁵See Goldberger [1964], pp. 352-354.

B. Derivation of the Likelihood Function to be Maximized

Before continuing, let us designate what is meant by maximizing the likelihood function.
Our equation system is a system of M equations containing disturbances and G − M identity equations, which may be written as

(V.1) Zα' + [U : 0] = 0

or equivalently as

(V.2) αZ' + [U : 0]' = 0' .

The matrix of coefficients, α, may be subdivided into the matrix of coefficients of jointly dependent variables, Γ, and the matrix of coefficients of predetermined variables, B. A further subdivision can be made on the basis of whether the coefficients are coefficients of stochastic equations or coefficients of identity equations (hence are known constants). Thus, α may be subdivided as:

(V.3) α = [Γ : B] = [ α_I ; α_II ] = [ Γ_I B_I ; Γ_II B_II ]

where α_I = [Γ_I : B_I] represents the coefficients of the M stochastic equations and α_II = [Γ_II : B_II] represents the coefficients of the G − M identity equations.

As a step in the derivation of the likelihood function, we will use the G − M identity equations to temporarily eliminate G − M jointly dependent variables from the system. (The eliminated variables will be reentered into the system at a later step.) Suppose we divide our jointly dependent variables into two groups, Y = [Y₁ : Y₂], where Y₂ contains the G − M jointly dependent variables to be temporarily eliminated from the system and Y₁ contains the remaining M jointly dependent variables. To reflect our subdivisions, we may rewrite (V.2) as:

(V.4) Γ₁₁Y₁' + Γ₁₂Y₂' + B_I X' + U' = 0'

(V.5) Γ₂₁Y₁' + Γ₂₂Y₂' + B_II X' = 0'

where the Γ matrix has been further subdivided to reflect the division of jointly dependent variables into those which are to be temporarily eliminated and those which will remain, i.e.,

(V.6) Γ = [ Γ_I ; Γ_II ] = [ Γ₁₁ Γ₁₂ ; Γ₂₁ Γ₂₂ ]

(Γ being G×G; Γ₁₁ being M×M, Γ₁₂ being M×(G−M), Γ₂₁ being (G−M)×M, and Γ₂₂ being (G−M)×(G−M)). We can assume that Γ₂₂ is non-singular, since if it were singular we could merely select a different set of jointly dependent variables to be temporarily eliminated, thereby rearranging the columns of Γ until a nonsingular Γ₂₂ is obtained.¹ Solving (V.5) for Y₂' we obtain:

(V.7) Γ₂₂Y₂' = −Γ₂₁Y₁' − B_II X' ,

hence,

(V.8) Y₂' = −Γ₂₂⁻¹Γ₂₁Y₁' − Γ₂₂⁻¹B_II X' .

¹Γ was assumed to be nonsingular in section I.C.3, assumption (4).
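This elimination step can be sketched numerically. The code below uses a hypothetical three-equation system (two stochastic equations and one identity, y₃ = y₁ + x), generates data satisfying the full system, and checks both the reduced system Γ*Y₁' + B*X' + U' = 0 obtained by the substitution carried out next, and the determinant factorization det Γ* = det Γ/det Γ₂₂ used further on.

```python
import numpy as np

rng = np.random.default_rng(7)
T, M, G = 10, 2, 3                      # two stochastic equations, one identity
# rows of [Gamma : B] are equations; columns of Gamma follow [y1, y2, y3]
# eq1: -y1 + 0.5 y2 + 2 x + u1 = 0
# eq2: -y2 + 0.4 y3 - x + u2 = 0
# identity: y1 - y3 + x = 0            (i.e., y3 = y1 + x)
Gamma = np.array([[-1.0, 0.5, 0.0],
                  [0.0, -1.0, 0.4],
                  [1.0, 0.0, -1.0]])
B = np.array([[2.0], [-1.0], [1.0]])

x = rng.standard_normal((T, 1))
U = rng.standard_normal((T, M))
# solve the full system Y Gamma' + X B' + [U : 0] = 0 for Y
Y = -(x @ B.T + np.column_stack([U, np.zeros(T)])) @ np.linalg.inv(Gamma.T)

G11, G12 = Gamma[:M, :M], Gamma[:M, M:]
G21, G22 = Gamma[M:, :M], Gamma[M:, M:]
BI, BII = B[:M], B[M:]
Gstar = G11 - G12 @ np.linalg.inv(G22) @ G21     # coefficients after eliminating y3
Bstar = BI - G12 @ np.linalg.inv(G22) @ BII

Y1 = Y[:, :M]
assert np.allclose(Y1 @ Gstar.T + x @ Bstar.T + U, 0)   # the reduced system holds
assert np.isclose(np.linalg.det(Gstar),
                  np.linalg.det(Gamma) / np.linalg.det(G22))
print("reduced system and determinant factorization verified")
```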
(V.7) FZZYZ -F21Y1 BIIX , hence, -1 -1 c ___ _ v _ v (v.8) Y2 F22F21Y1 FZZBIIX F was assumed to be nonsingular in section I.C.3, assumption (4). 165 Substituting (v.8) into (v.4), we obtain: -1 -1 . |_ I" I_ F v I I: I (V 9) F11Y1 F12 22r21Y1 F12 22311x + BIX + U 0 or (v.10) F* Yi - B* x' + U' = 0' MXM.MXT MXA AXT MXT MXT or (v 11) 0* z' + U' = 0' 1 MX(M+A) (M+A)XT MXT MXT -l * g - F ' X ' o o where F F11 12F22F21 is the square M M matrix of coeffiCients of the remaining jointly dependent variables. -1 * = - F F ' x ' . . . B B1 12 22BII is the M A matrix of coefficients of predetermined variables, 0* = [F* E 8*] is the MX(M + A) matrix of coefficients of all the variables remaining in the system, and z =[Y15X] If we assume that the TXM matrix of disturbances of the sto- chastic equations, f‘. '. T. -‘ , ’U[1] gull ”ins I ll UCTlJ u11 '°' uTM has a multivariate normal distribution with 6U = 0, 6Ufit]U[t] = 2 for all t and 6UEC]U[t] = 0 for trt', 2 being a positive definite matrix, we can write the density function for U as: T T/2 .1 -1 2 exp(-— )3 U 2 U' ) 2t=1 [t] [t] (v.12) f1(U,ZD = (2n)'T/2det’ 166 We can convert this density function to the likelihood function by using (V.ll) to transform the system from U to Z1 and 0*, taking account of the Jacobian of the transformation. Also, the logarithm of the likelihood function is easier to work with in this case than the actual likelihood function. (Since the logarithmic transformation is a strictly increasing one, the logarithm of the likelihood will be maximized at the same point as the actual likelihood.) After these transformations are made, our "logarithmic likelihood function" may be written: 1. 
2 logEdet 2] + T logEabS(det I“*)] _ Z _ (V.l3) f2(Zl,a*,ZD - -2 log 2U 21 a*'Z-lo'*zi t—l [t] [t] Nh—I Ipqra If (V.l3) is maximized first with respect to ‘2 (thereby con- centrating the function onto 0* and 21), we get the relation: A A l A = — M ' *' (V.l4) Z T (Z121)o Substituting 2 from (v.14) into (v.13) and dividing by T/2, the following function is obtained: (v.15) f3(&*,zl) = c1 + logEdet2f*] - log[det(%'&*(ZiZl)&*')] 1See Koopmans and Hood [1953], pp. 143-160, 190-191. For a more detailed discussion of the ”stepwise" maximization procedure see Koopmans and Hood [1953], pp. 160, 161, 191. 167 1 where c1 is a constant. 2 2 1 2 2 2 *1: = However, det (F ) det (F11- F12F22F21) det F/det F22 and 1 . . g 1 I u .3 . 59*(lel)a* EQI(Z Z)crI , hence, (V.lS) may be rewritten as (v.16) f4(&,z) = c - logEdetzrzz] + iogEdeth‘] - log[det(%&I[Z'Z]d/i)], 1 T 2 1 c1=T glong-EZZI &*'z lazi) t=1 [t] [t] = -1og 2n - 3(ltr{—zla*'[&*(ziz1)&*']1&*zi}) = -log 2n - —tr{-a*2121&*'[a’*ZiZI&*']-1} l l l 1 = - - ‘_’t — g -1 2T1 . -. = a. D '- log 2n T r{TI} og T 1 log 2n T where tr denotes trace and we have used the relationship tr(AB) = tr(BA) for any matrices A and B provided AB and BA are defined (i.e., provided the number of rows of A equals the number of columns of B and the number of rows of 8 equals the number of columns of A). I A Since det 0 I = det I = 1 for any matrix A and since (det B)-(det C) =“det(BC) for any square matrices B and C of the same order, we have I I"IZFZZI 11 1“12 det F = det ' det ' I 21 I22 irr'lr 1“ rrr‘l o ' 12 22 11 12 11 12 22r21 = det = det F O I I‘2]. 22 0 F2 = - . _ f _ detU‘11 r12r22r21] det F22, hence, detU‘11 12F 221F21] det F/det F 3U = -Zla*' = -Zai by (V.ll) and (v.4); therefore I = I I lezloe“ orIZ Za/I 22 . 168 or since F22 is a known matrix (coefficients of identity equations 2 . are assumed known), logEdet F is merely a constant; therefore, 22] (v.16) may be written as: (v.17) f5(&,2) = c 2 + logEdetzf] - 108[d8t(%&1[2'z]&£)] . 1 . . 
where c2 is a constant. Note that T is the matrix of coefficients of all G jointly dependent variables in the 9 equations (including identity equations) whereas 01 is the matrix of coefficients of all (C + A) yariables but only for the M stochastic equations. To calculate FIML estimates we will select &1 2 such that the concentrated logarithmic likelihood function (V.l7) is a maximum given our particular sample 2 and the restrictions which we have imposed on a (some elements of GI are assumed to be zero, others are assumed to be -1 for normalization, and all of the elements of aII are assumed known). Notice that it is not necessary to use the identity equations to eliminate jointly dependent variables in order to write down the concentrated logarithmic likelihood function. (We temporarily eliminated some jointly dependent variables in the derivation only.) The computational procedure which will be used to maximize f5(a,Z) also will not require that the identity equations be used to eliminate jointly dependent variables. 1 _ 2 c2 — c1 - logEdet T22] Sz ' = . = ' d d Since a 01 a1 (Q11 13 assume known), only 01 nee be estimated to complete the estimation of & 3Rothenberg and Leenders [1964], pp. 72, 73 first showed that it is unnecessary to use identity equations to solve out jointly dependent variables to maximize the likelihood function. 169 The matrix (l/T)&I(Z'Z)&i is used repeatedly in the elaboration of the computational procedure which follows; hence, it will prove con- venient to denote this matrix as S. If we use as an estimate of U the TXM matrix (V. 18) U = '20. , we can write S as: (v.19) s = 3: = ;r1-& MXM 1 Let a” be the vector of estimated coefficients of the uth equation which are not restricted to either zero or -1, and let +au be defined as: -1 (v.20) a -[ ] , + u a” th , i.e. is the vector of non-zero coefficients of the u equation ’ +8“ including the normalizing coefficient. 
Then S may also be defined as S = [s_μμ'] with:

(V.21) s_μμ' = (1/T) ₊a'_μ(₊Z'_μ ₊Z_μ')₊a_μ'   (since û_μ = −₊Z_μ ₊a_μ, ₊Z_μ being the matrix of observations on the variables occurring in the μth equation).

Let us also group all of the unrestricted coefficients in the system into a single vector of coefficients and denote this vector as:

(V.22) a = [ a₁ ; a₂ ; … ; a_M ] ,

i.e., a is the vector containing all of the unrestricted coefficients in α̂, all unrestricted coefficients of the first equation (first row of α̂) being listed first as the vector a₁, then the unrestricted coefficients of the second equation as the vector a₂, and finally the unrestricted coefficients of the Mth equation as the vector a_M. (The coefficients of the G − M identity equations are all known, i.e., restricted; hence, they are not included in the a vector.)

Since Z is fixed for any given sample, we choose the unrestricted coefficients of α̂ such that f₅(α̂, Z) is a maximum; however, for any given structure, the only elements of α̂ which are allowed to vary are the elements of the vector a. Thus, for an assumed structure and a given sample, f₅(α̂, Z) may be considered a function of the vector a only, i.e.,

(V.23) f(a) = f₅(α̂, Z) = c₂ + log(det²Γ̂) − log(det S) .

Another function which will be maximized when f(a) is maximized is the function¹

(V.24) f*(a) = f₅*(a, Z) = det²Γ̂/det S .

We cannot readily maximize either f(a) or f*(a) by setting partial derivatives equal to zero and solving for a, since the partial derivatives are complicated nonlinear functions of the elements of a. Some iterative procedure is required, and the iterative procedure proposed in this paper is outlined in the next section.

¹The logarithm of f*(a) differs from f(a) only by a constant, since log[det²Γ̂/det S] = log(det²Γ̂) − log(det S).
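To make (V.23) concrete, the sketch below evaluates f(a) (up to the constant c₂) for a hypothetical two-equation system with no identity equations (G = M = 2, one predetermined variable); the function is larger at coefficients near the data-generating values than at a poor guess.

```python
import numpy as np

def concentrated_loglik(alpha_I, Gamma, Z):
    """f(a) of (V.23) up to the constant c2: log(det^2 Gamma) - log(det S)."""
    T = Z.shape[0]
    S = alpha_I @ (Z.T @ Z) @ alpha_I.T / T          # (V.19)
    return np.log(np.linalg.det(Gamma) ** 2) - np.log(np.linalg.det(S))

# hypothetical system:  y1 = g*y2 + b1*x + u1,  y2 = b2*x + u2
rng = np.random.default_rng(8)
T = 200
x = rng.standard_normal(T)
y2 = 0.5 * x + rng.standard_normal(T)
y1 = 0.8 * y2 - 1.0 * x + rng.standard_normal(T)
Z = np.column_stack([y1, y2, x])                     # Z = [Y : X]

def coeffs(a):
    """alpha_I and Gamma for a = (g, b1, b2); the -1's are the normalization."""
    g, b1, b2 = a
    alpha_I = np.array([[-1.0, g, b1], [0.0, -1.0, b2]])
    return alpha_I, alpha_I[:, :2]                   # with G = M, Gamma is the Y block

f_true = concentrated_loglik(*coeffs((0.8, -1.0, 0.5)), Z)
f_bad = concentrated_loglik(*coeffs((0.0, 0.0, 0.0)), Z)
assert f_true > f_bad        # the likelihood prefers coefficients near the truth
print(round(f_true, 3), round(f_bad, 3))
```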
The first and second partial derivatives of f(a) play a key role in the direction in a-space that a is changed at each step of the maximization procedure; however, it is convenient to base the amount of change in a given direction on the function f*(a). 172 C. Computational Procedure 1. .A maximization procedure for functions non-linear in the parameters Let us first consider maximizing a function f(a) (with a being an n dimensional vector) and assume that (l) f(a) and its derivatives are sufficiently ”well behaved“ to permit use of a Taylor expansion of sufficiently high order in approximating f(a) about the starting estimates and subsequent points1 and (2) f(a) has only a single maximum with no additional local maxima in the region which is considered.2 f(a) may have saddle points in the n dimensional space of7ma without causing difficulty. As the procedure is outlined for the n dimensional case, it will be illustrated graphically for the 2 dimensional (n = 2) case. Further on, in applying the procedure M being described, the n (= 2 n ) n=l U will become our vector a, i.e., a will consist of the vector of all element vector a defined in (v.22) unrestricted coefficients in the system which are to be estimated; however, at this point, a is merely any parameter vector. 1The use of a Taylor expansion in deriving some properties of the maximization procedure outlined here follows Crockett and Chernoff [1955]. Crockett and Chernoff [1955], p. 34 state that: "For most arguments the use of third or fourth order expansions will suffice." ZAt this point f(a) may be any function which meets these assump- tions. In sections V.C.2 and V.C.3 the procedure will be Specialized to the maximization of the f(a) given by (V 23). It is an unanswered question as to whether for the f(a) given by (v.23) multiple local maxima may occur in a region which has a positive probability of being entered through choice of starting coefficients. 
For a number of problems, the writer has used a set of starting coefficients and calculated the coefficients which maximize the f(a) of (V.23), and then assumed a set of starting coefficients in which all coefficients varied widely from the original set of starting coefficients. In all cases the same maximum was reached. This is encouraging but does not, of course, show that for many problems (or even these problems) multiple local maxima do not occur in regions of interest.

[Figures V.25 and V.26: contour maps of f(a) in the (a₁, a₂) plane. Both show the same contour lines and the maximum a_max f(a); in Figure V.25 an arrow marks the point with the highest f(a) at distance g from a⁽¹⁾ using the Euclidean metric (a circle), while in Figure V.26 the distance is measured using the 𝓜 metric (an ellipse).]

In Figure V.26 the contour lines represent the same function as the contour lines for Figure V.25. (Each contour line is a locus of points having the same value of the function f(a).) The outermost contour line has the lowest value of f(a), and the innermost one the highest value. The maximum value of f(a) occurs at a_max f(a).

Suppose that we start at an initial point a⁽¹⁾ and consider all points which lie a distance of exactly g units from the point a⁽¹⁾. Next, suppose that the direction d is chosen such that f(a⁽¹⁾ + gd), at the distance g from a⁽¹⁾, is the maximum value which can be reached. (I.e., the locus of all points a distance of g from a⁽¹⁾ is the surface of a sphere of radius g centered at the point a⁽¹⁾. The arrow is drawn through the point on the surface of this sphere [circle in the case of Figure V.25] with the maximum f(a). The angle of the arrow indicates the direction d.)

If we continue in the direction d given by the arrow in Figure V.25, we will pass quite a way to the left of a_max f(a). Suppose that instead of considering the locus of points on the circle, we had considered the locus of points on an ellipse (ellipsoid in the case of an n-dimensional vector a) of approximately the same shape as the contour lines.
In Figure V.26 an arrow has been drawn through the point with maximum f(a) on the surface of the ellipse. Notice that the arrow now points much more closely toward the maximum.

An ellipsoid may be traced out instead of a sphere by merely making our concept of distance more general. Let the distance g between any other point, a, and a⁽¹⁾ be defined as:

(V.27) g = √{(a − a⁽¹⁾)'𝓜(a − a⁽¹⁾)}

where 𝓜 is a positive definite matrix. Then all points at a distance g from a⁽¹⁾ will lie on the surface of an ellipsoid instead of a sphere. In Figure V.26 the matrix 𝓜 is such that the resulting ellipsoid is approximately the same shape as the contour lines, while in Figure V.25 the identity matrix has been used as the matrix 𝓜 and therefore a circle has been traced out. The matrix 𝓜 is termed a metric. The Euclidean metric is represented by 𝓜 = I, as in Figure V.25.

Suppose that in selecting the direction to move, an arbitrarily small distance g is selected by letting g approach 0. Then, assuming a given metric 𝓜, it can be shown that the direction in which f(a) increases the most rapidly from the point a⁽¹⁾ is given by d = 𝓜⁻¹l*⁽¹⁾, where l*⁽¹⁾ is the partial derivative of f(a) evaluated at the point a⁽¹⁾, i.e.,

(V.28) l*⁽¹⁾ = ∂f(a)/∂a |_{a=a⁽¹⁾} = [ ∂f(a)/∂a₁ ; … ; ∂f(a)/∂a_n ]_{a=a⁽¹⁾} .

Crockett and Chernoff show that for any positive definite matrix 𝓜 and choice of h sufficiently small, f(a⁽¹⁾ + h·𝓜⁻¹l*⁽¹⁾) > f(a⁽¹⁾), provided l*⁽¹⁾ ≠ 0.¹ Although any positive definite matrix 𝓜 can be used for the metric, and for h sufficiently small the direction will be sufficiently good that the function will increase (the function will stay the same if we are already at the maximum), some metrics will obviously be better choices than others; e.g., compare the two metrics used in Figures V.25 and V.26.
If f(a) is expanded about the point a^(1) in a Taylor expansion, the following is obtained:

(V.29)   f(a) = f(a^(1)) + [a - a^(1)]' l*^(1) + (1/2)[a - a^(1)]' L*^(1) [a - a^(1)] + higher order terms

where l*^(1) is the n×1 vector of first partial derivatives of f(a) with respect to the elements of a evaluated at a^(1), as before, and L*^(1) is the n×n matrix of second partial derivatives of f(a) with respect to the elements of a evaluated at the point a^(1).

For any function f(a) with continuous first and second partial derivatives, a local maximum occurs at any point satisfying ∂f(a)/∂a = 0 if ∂²f(a)/∂a∂a' is a negative definite matrix at that point.

If the partial derivative of the Taylor expansion given in (V.29) is taken with respect to a, ignoring all higher order terms, we get:

(V.30)   ∂f(a)/∂a = l*^(1) + L*^(1)[a - a^(1)]

Setting ∂f(a)/∂a to zero and solving for a, we get:

(V.31)   [a - a^(1)] = -(L*^(1))^(-1) l*^(1)

or

(V.32)   a = a^(1) - (L*^(1))^(-1) l*^(1)

Also, taking the partial derivative of (V.30) with respect to a, we get:

(V.33)   ∂²f(a)/∂a∂a' = L*^(1)

Since the second partial derivative must be negative definite for the point to be a local maximum, we get the additional condition that L*^(1) must be negative definite, or -L*^(1) must be positive definite, for the point to represent a local maximum.

To summarize, if it is assumed that consideration of only the first three terms of a Taylor expansion about a^(1) will give a sufficiently close approximation to f(a), then the following holds in a sufficiently small region of a^(1):

(V.34)   a^max f(a) = a^(1) - (L*^(1))^(-1) l*^(1)

Multiplying the metric and the partial derivative by T will not change the result.

¹Crockett and Chernoff [1955], p. 35. If a^(1) is the maximum, then l*^(1) = 0 and f(a^(1) + h·𝓜^(-1) l*^(1)) = f(a^(1) + h·𝓜^(-1)·0) = f(a^(1)); i.e., no movement is made away from the maximum.
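[Illustrative sketch, not part of the original routines: for a function that is exactly quadratic, the single step (V.32) lands exactly on the maximum. The matrix H, the vector b, and the starting point below are hypothetical.]

```python
import numpy as np

# Hypothetical strictly concave quadratic f(a) = c + b'a - 0.5 a'Ha (H pos. def.),
# so the matrix of second partials is L* = -H (negative definite everywhere).
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

a1 = np.array([5.0, -5.0])          # arbitrary starting point a^(1)
l_star = b - H @ a1                 # first partials evaluated at a^(1)
L_star = -H                         # second partials (constant for a quadratic)

# (V.32): a = a^(1) - (L*)^{-1} l*, exact in the quadratic case.
a_max = a1 - np.linalg.solve(L_star, l_star)

# The maximum satisfies b - Ha = 0, i.e. a = H^{-1} b.
assert np.allclose(a_max, np.linalg.solve(H, b))
```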
Thus, (V.34) is equivalent to:¹

(V.35)   a^max f(a) = a^(1) + (L^(1))^(-1) l^(1)

where L^(1) = -T·L*^(1) and l^(1) = T·l*^(1).

Since a^(1) will usually be some distance from a^max f(a), the procedure indicated by formula (V.35) will be modified as follows:

(1) If L^(1) is not positive definite, it is adjusted to form another matrix, |L^(1)|, which is positive definite, and |L^(1)| is used as the metric.²

(2) Instead of using a step size of 1 as (V.34) implies, a step size of h^(1) is used; i.e., a^(1) + h^(1)·|L^(1)|^(-1) l^(1) is used.³

(3) A check will be made that, given the step size h^(1), f(a^(1) + h^(1)·|L^(1)|^(-1) l^(1)) is indeed greater than f(a^(1)).

(4) A series of steps will be taken, with the direction and step size recomputed at each step, until a^max f(a) is reached.⁴

Instead of recalculating the metric each iteration, the same metric may be used for a number of iterations, only the vector of partial derivatives being recalculated each iteration. The writer has not compared the total time required for convergence if the same metric is used for a number of iterations with the total time required for convergence if a new metric is calculated each iteration. Up to now, the writer has always recalculated the metric each iteration (and has obtained rapid convergence on all problems attempted).

¹If the variables were not normalized in the manner indicated in Part III before computation begins, multiplication by T might be expected to increase rounding error; however, due to the normalization used, rounding error will not be increased.

²A method for forming the |L^(1)| matrix from the L^(1) matrix is given in section V.C.3.

³Determination of the step size h^(1) is discussed in section V.C.4.

⁴Determination of when convergence or maximization is achieved is discussed in section V.C.5.

2.
The vector of partial derivatives for FIML

In expressing the vector of partial derivatives and some alternative metrics for FIML, a number of intermediate matrices will be calculated from a given vector of coefficients, a, and the sample values of the variables, Z (more particularly, from the Z′Z matrix), given the assumed structure. The Z′Z matrix and the assumed structure do not, of course, change between iterations. Only the vector of coefficients changes (given Z and the assumed structure, f(a) is a function of a only). We will use a superscript on the a vector to indicate that a particular set of coefficients is used (e.g., a^(i-1)). No superscript will be used on the intermediate matrices calculated from the a^(i-1) vector and the Z′Z matrix, since it will be obvious from the formulas given which intermediate matrices change whenever a new vector of coefficients is used. The matrix Γ̂ will be treated as an intermediate matrix of this form (i.e., Γ̂ will not be superscripted), since it is formed from (1) the elements of the coefficient vector a corresponding to jointly dependent variables in the system, (2) the coefficients of the identity equations corresponding to jointly dependent variables in the system, (3) zeros, and (4) -1's.

In computing the direction for the ith iteration, T·∂f(a)/∂a evaluated at a^(i-1) (i.e., with the coefficients obtained from the preceding iteration used as a) is the right hand side term by which the metric inverse is first multiplied.¹ This term may be written as:

(V.36)   l^(i-1) = T·∂f(a)/∂a = T·[∂f(a)/∂a_1, ..., ∂f(a)/∂a_n]′   evaluated at a = a^(i-1)

For FIML, the part of the right hand side corresponding to the unrestricted coefficients of the μth equation may be written as:

(V.37)   l_μ^(i-1) = T·∂f(a)/∂a_μ evaluated at a = a^(i-1) = T·[γ̂^(μ|μ); 0] + Σ_{μ′=1}^{M} s^{μμ′} Z′_μ û_{μ′}

where

γ̂^(μ|μ) is the part of the μth column of Γ̂^(-1) corresponding to the unrestricted coefficients of the jointly dependent variables of the μth equation. (Γ̂ is a G×G matrix; that is, the coefficients of jointly dependent variables in the identity equations are included in the Γ̂ matrix. Only a part of Γ̂^(-1) is used: the part corresponding to the unrestricted coefficients of the stochastic equations. γ̂^(μ|μ) is an m_μ×1 vector.)²

0 is an l_μ×1 column vector of zeros corresponding to the predetermined variables in the μth equation.

s^{μμ′} is the element of the μth row and the μ′th column of S^(-1).

Z′_μ Û = [Z′_μ û_1 ⋯ Z′_μ û_M] = [(Z′_μ Z_{+1}) a_{+1}^(i-1) ⋯ (Z′_μ Z_{+M}) a_{+M}^(i-1)]³

S is an M×M matrix corresponding to the M stochastic equations.

Notice that, given the assumed structure and a set of sample values of the variables, Z, the matrices and vectors given above are all calculated from a given vector of coefficients a or from intermediate matrices calculated through use of the vector a. A set of starting coefficients, such as those from 2SLS, LIML, DLS, or 3SLS, or merely a set of "assumed" coefficients, may serve as starting coefficients in calculating the first metric and right hand side.⁴ The coefficients from the (i-1)st iteration are used in calculating the metric and right hand side for the ith iteration.

¹See Rothenberg and Leenders [1964], pp. 61 and 63-64 for a derivation of the vector of partial derivatives. Rothenberg and Leenders' notation differs slightly, since their logarithmic likelihood function is 1/2 of the f(a) given above; i.e., their logarithmic likelihood function is (1/2)[c_1 + log(det Γ̂)² - log(det S)] = k′ + (1/2)·2·log[abs(det Γ̂)] - (1/2)·log(det S), where k′ = (1/2)c_1. Also, the vector of partial derivatives and the metric have been multiplied by T in this paper, as noted in the conversion from (V.34) to (V.35).

²The rather unique notation γ̂^(μ|μ) should seem more justified when the γ̂^(μ′|μ) and γ̂^(μ|μ′) vectors are defined during the formation of the L metric (section V.C.3).

³û_μ = Z_{+μ} a_{+μ}^(i-1), where a_{+μ} = [-1; a_μ] and Z_{+μ} = [y_μ ⋮ Z_μ], as before.

⁴As noted in section V.C.1, the question of whether multiple local maxima occur in the region of interest is an unanswered question. To date the writer is not aware of a problem in which two sets of starting coefficients have led to different local maxima. (If rounding error were large, any problem might appear to have many local maxima.)

3. Metrics for FIML

A rationale for suggesting the use of -T·∂²f(a)/∂a∂a′ as the metric to use in determining direction was derived earlier. We will call this metric the L metric.¹ Often a^(i-1), the vector of unrestricted coefficients to use in calculating the metric and the vector of partials for the ith iteration, will be sufficiently far away from a^max f(a) that L^(i-1) is not positive definite. For this reason, we will consider three other metrics: the O metric, the R metric, and the |L| metric. Of these metrics we will use only the |L| metric in our FIML iterations. The formulas for the O and R metrics are given so that we can draw correspondences to other methods involving one or more iterations, such as the Zellner-Aitken estimator (ZA), iteration on the ZA estimator (IZA), 3SLS, and I3SLS. (All of these methods are discussed in subsequent chapters.) The O and R metrics are used during early iterations in some FIML computational schemes, but not in the computational scheme given in this paper.

The metrics are most easily defined by dividing them into blocks which correspond to the division of the coefficient vector a according to the equations from which a came. Let 𝓜 be an arbitrary metric subdivided as:

(V.38)   𝓜 = [𝓜_11 ⋯ 𝓜_1M; ⋮ ⋱ ⋮; 𝓜_M1 ⋯ 𝓜_MM]

where 𝓜_{μμ′} is the block of the metric whose n_μ rows correspond to the unrestricted coefficients of the μth equation and whose n_{μ′} columns correspond to the unrestricted coefficients of the μ′th equation.

¹-∂²f(a)/∂a∂a′ is the usual Newton metric.
In this section, we will omit from the metric the iteration superscript specifying the set of coefficients used as the vector a in evaluating the metric.

The μμ′th block of the O metric is:¹

(V.39)   O_{μμ′} = s^{μμ′} Z′_μ Z_{μ′}

The O metric is the -T·∂²f(a)/∂a∂a′ matrix derived if the log(det Γ̂)² term of (V.23) is ignored in defining f(a). [The log(det Γ̂)² term will not appear in f(a) if the Jacobian of the transformation from U to a and Z is 1, or if it is ignored.]

The μμ′th block of the R metric is:²

(V.40)   R_{μμ′} = s^{μμ′} [Z′_μ Z_{μ′}]_∥X
                 = s^{μμ′} [ [Y′_μ Y_{μ′}]_∥X   Y′_μ X_{μ′} ;  X′_μ Y_{μ′}   X′_μ X_{μ′} ]
                 = s^{μμ′} [ Y′_μ Y_{μ′} - [Y′_μ Y_{μ′}]_⊥X   Y′_μ X_{μ′} ;  X′_μ Y_{μ′}   X′_μ X_{μ′} ]

(Since [Y′_μ X_{μ′}]_∥X = Y′_μ X_{μ′}, [X′_μ Y_{μ′}]_∥X = X′_μ Y_{μ′}, and [X′_μ X_{μ′}]_∥X = X′_μ X_{μ′}, since X_μ and X_{μ′} are submatrices of X [the matrix of predetermined variables in the system]; see (1.54) and (1.40).)

If X has full column rank, then (V.40) may also be written as:

(V.41)   R_{μμ′} = s^{μμ′} Z′_μ X (X′X)^(-1) X′ Z_{μ′} ;

however, (V.40) is a preferable computational formula, since [Y′Y]_⊥X may be calculated by the orthogonalization procedure of section I.D.2, [Y′Y]_∥X calculated as Y′Y - [Y′Y]_⊥X, and all of the matrices of the form [Y′_μ Y_{μ′}]_∥X extracted as submatrices of [Y′Y]_∥X.

The R metric was derived by Rubin as a matrix whose inverse is asymptotically the same as L^(-1)(a^max f(a)), L^(-1)(a^max f(a)) being the asymptotic maximum likelihood estimator of the FIML coefficient variance-covariance matrix.³

The μμ′th block of L for the FIML estimator is:⁴

(V.42)   L_{μμ′} = T·H_{μμ′} + s^{μμ′} Z′_μ Z_{μ′} - (1/T)·Z′_μ Û F_{μμ′} Û′ Z_{μ′}

where:

Z′_μ Û = [Z′_μ û_1 ⋯ Z′_μ û_M] = [(Z′_μ Z_{+1}) a_{+1} ⋯ (Z′_μ Z_{+M}) a_{+M}]⁵

F_{μμ′} = [s^{1μ′} ⋯ s^{Mμ′}]′ [s^{μ1} ⋯ s^{μM}] + s^{μμ′} S^(-1)

H_{μμ′} = [ γ̂^(μ′|μ) γ̂^(μ|μ′)′   0 ;  0   0 ]

with the blocks of H_{μμ′} of order m_μ×m_{μ′}, m_μ×l_{μ′}, l_μ×m_{μ′}, and l_μ×l_{μ′},

γ̂^(μ′|μ) being a vector containing the m_μ elements of the μ′th column of Γ̂^(-1) corresponding to the m_μ unrestricted coefficients of the jointly dependent variables of equation μ (Γ̂ is a G×G matrix, since the coefficients of jointly dependent variables in the identity equations are included in the Γ̂ matrix), and

γ̂^(μ|μ′) being a vector containing the m_{μ′} elements of the μth column of Γ̂^(-1) corresponding to the m_{μ′} unrestricted coefficients of the jointly dependent variables of equation μ′.

The O and R metrics are always positive definite or positive semi-definite. (Under some circumstances the O and R metrics will be singular, and therefore positive semi-definite.) The L metric will generally not be positive definite except close to the maximum; that is, the L metric will generally not be positive definite when iteration is started.⁶

Chernoff and Divinsky recommended use of the O metric initially, the R metric for a number of iterations, and finally the L metric when the estimated coefficient vector, a, becomes close to the coefficients which maximize the likelihood.⁷ They also suggested some guides for switching from one metric to another.

Eisenpress wrote the first large-scale FIML computer routine available for general use and has calculated a wide variety of problems with it.⁸ Initially he programmed it using the O, R, and L metrics as suggested by Chernoff and Divinsky. After considerable experimentation with speed of convergence (and whether convergence would occur at all, for that matter), he, along with John Greenstadt, devised the |L| metric, which he then used as the only metric.

L can be expressed as L = EΛE′, where Λ is a diagonal matrix with the eigenvalues of L forming the diagonal elements and E is a matrix whose columns are eigenvectors of L, the eigenvectors being in the same order as their corresponding eigenvalues on the diagonal of Λ. (Any symmetric matrix may be decomposed in this fashion.) Let |L| be defined as:

(V.43)   |L| = E|Λ|E′

where |Λ| is the same as Λ except that the absolute values of the eigenvalues replace the actual eigenvalues.

As with any non-singular symmetric matrix, L^(-1) may be formed directly as:

(V.44)   L^(-1) = [EΛE′]^(-1) = (E′)^(-1) Λ^(-1) E^(-1) = E Λ^(-1) E′   (since E^(-1) = E′).

¹The O metric changes each iteration, since S is calculated from the vector of coefficients and, therefore, s^{μμ′} changes each iteration.

²The R metric changes each iteration, since S is calculated from the vector of coefficients and, therefore, s^{μμ′} changes each iteration.

³Its use is suggested in Rubin [1948] and Chernoff and Divinsky [1953]. Later yet, the R matrix became important as the matrix inverted in the calculation of 3SLS. By showing that Var(δ̂_3SLS) asymptotically equals Var(δ̂_FIML), Zellner and Theil [1962] and others have also shown that asymptotically R^(-1) equals L^(-1), since the usual asymptotic Var(δ̂_3SLS) is R^(-1) and the usual asymptotic Var(δ̂_FIML) is L^(-1).

⁴See Rothenberg and Leenders [1964], pp. 63-65 for a derivation of L_{μμ′}. Rothenberg and Leenders' logarithmic likelihood function is 1/2 of the logarithmic likelihood being maximized here. Also, the vector of partial derivatives and the metric have been multiplied by T in this paper (see footnote 1, p. 179). The L matrix changes each iteration, since the vector of coefficients is used in the calculation of the intermediate matrices given above.

⁵a_{+μ} = [-1; a_μ^(i-1)] and Z_{+μ} = [y_μ ⋮ Z_μ], as before.

⁶A notable exception is Klein's model I. Although it is not positive definite when starting from the 2SLS estimates, only a few iterations are usually required to move the estimates into a region in which the L metric is positive definite. As a result of being so well conditioned, Klein's model I is somewhat misleading, since many procedures which would have difficulty converging for many problems which may be encountered will converge easily for Klein's model I, the model on which they are usually tested.

⁷Chernoff and Divinsky [1953].

⁸Personal conversation. Also, Eisenpress [date unknown].
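[Illustrative sketch, not part of the original routines: the eigenvalue decomposition and absolute-value reconstruction of (V.43), together with the direct formation of |L|^(-1), can be demonstrated numerically. The indefinite matrix below is hypothetical.]

```python
import numpy as np

# Hypothetical symmetric matrix standing in for an L metric that is
# NOT positive definite (it has one negative eigenvalue).
L = np.array([[2.0, 0.0, 3.0],
              [0.0, 1.0, 0.0],
              [3.0, 0.0, 2.0]])

lam, E = np.linalg.eigh(L)          # L = E diag(lam) E'
assert (lam < 0).any()              # L is indefinite here

# (V.43): |L| = E |diag(lam)| E', replacing eigenvalues by absolute values.
L_abs = E @ np.diag(np.abs(lam)) @ E.T
# Inverse formed directly from the same decomposition: E |diag(lam)|^{-1} E'.
L_abs_inv = E @ np.diag(1.0 / np.abs(lam)) @ E.T

# |L| is positive definite by construction, and L_abs_inv is its inverse.
assert (np.linalg.eigvalsh(L_abs) > 0).all()
assert np.allclose(L_abs @ L_abs_inv, np.eye(3))
```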
Thus, |L|^(-1) may also be formed directly as:

(V.45)   |L|^(-1) = E|Λ^(-1)|E′

where |Λ^(-1)| is formed by replacing each diagonal element of Λ by 1 divided by its absolute value.¹ Assuming that L is not singular, |L| will be positive definite by its method of calculation.

Eisenpress has examined the effect of setting negative eigenvalues positive and recombining the matrices. The basic directions (the axis system of the metric being formed) are established by the eigenvectors, with the eigenvalues determining the distance along each axis. The negative eigenvalues indicate a movement along an axis away from the maximum. Setting them positive results in movement along the axes (given by the eigenvectors) in the correct direction. Since the inverse of a metric corresponds to using the inverse of the eigenvalues, a large eigenvalue corresponds to a small movement along an eigenvector axis and a small eigenvalue corresponds to a large movement. Since negative eigenvalues are generally small in absolute value, setting them positive results in switching the direction from a large negative direction to a large positive direction. Eisenpress feels that this is correct based on the geometry of the situation. Convergence has been much more rapid when this is done than in some earlier experiments in which he substituted zero for the inverse of the negative eigenvalues (thereby moving along the eigenvectors only in the directions corresponding to positive eigenvalues).²

Since L^(-1) and |L|^(-1) coincide if all eigenvalues are positive, and since direct inversion of L to form L^(-1) is faster than forming |L|^(-1) in the manner indicated above, the writer has modified the procedure as follows:

¹Since all off-diagonal elements of |Λ^(-1)| are zero, |L|^(-1) can be formed more efficiently by forming the ijth element of |L|^(-1) as Σ_{k=1}^{n} (1/|λ_k|)·e_ik·e_jk, where λ_k is the kth eigenvalue of Λ and E = [e_ij].
(1) Use the |L|^(-1) metric until 3 consecutive iterations have been performed in which all eigenvalues have been found to be positive.

(2) Use the L^(-1) metric, forming the metric through direct inversion of L, for 5 iterations.

(3) Use |L|^(-1) for an iteration. If all eigenvalues are positive, start with (2) again; otherwise start with (1) again.

(4) When convergence to a^max f(a) appears to have occurred, check whether all eigenvalues of L are positive. If any are negative, convergence has actually been to a saddle point. Switch to the |L| metric and start with (1) again so that movement will be past the saddle point toward the maximum. All eigenvalues of L will be positive at a local maximum.

It is not necessarily true that once a positive definite region for L is entered, L will continue to be positive definite. A non-positive definite region may again be encountered; e.g., one or more saddle points may be encountered as movement is made toward the maximum. No difficulty should be encountered in moving past a non-positive definite region if a reversion to the |L| metric is made.

²The writer hopes that he has not misrepresented Eisenpress and Greenstadt's developments in any of the above.

4. Step size to use at each iteration

Earlier it was indicated that in the FIML convergence procedure, the coefficients for an iteration, a^(i), are to be calculated from the coefficients of the previous iteration as:

(V.46)   a^(i) = a^(i-1) + h^(i) d^(i)

where d^(i) = |L^(i-1)|^(-1) l^(i-1) and h^(i) is the step size for the iteration. So far we have covered only the direction in which we will move from a set of coefficients in the space of unrestricted coefficients at any one iteration. The distance we move at any iteration can also be very important. Consider the situation given by Figure V.47.

[Figure V.47: contour plot of the likelihood with a long narrow ridge; successive steps zigzag across the ridge.]

Assume that a^max f(a) is the point of maximum likelihood and that the contour lines represent points with equal value of the likelihood function.
Assuming that direction d_1 is taken with step size h_1 for the first iteration, d_2 is taken with step size h_2 for the second iteration, etc., we may easily end up spending many iterations trying to move up the long narrow ridge (or we could even move down the ridge) due to using a series of step sizes which are too large. Figure V.47 illustrates this. On the other hand, if for each step we could vary the step size such that we land somewhere near the top of the same ridge, our situation could be as in Figure V.48.

[Figure V.48: contour plot in which each step lands near the top of the ridge.]

If too small a step is taken, the above may take many steps. Thus, our step size as well as our direction takes on considerable importance.

Following is a scheme which may be used to determine the step size, h^(i), to be used for an iteration:

(1) For the usual iteration, a trial step size of h_1^(i) is tried; i.e., f*(a^(i-1) + h_1^(i) d^(i)) is calculated. (Determination of a starting value, h_1^(i), is discussed further on.)

(a) If f*(a^(i-1) + h_1^(i) d^(i)) > f*(a^(i-1)), a trial step size twice as large, i.e., 2h_1^(i), is tried. If f*(a^(i-1) + 2h_1^(i) d^(i)) > f*(a^(i-1) + h_1^(i) d^(i)), a step size twice as large again, i.e., 4h_1^(i), is tried. This process is continued until, at the jth time a step size twice as large is tried, f*(a^(i-1) + 2^j h_1^(i) d^(i)) ≤ f*(a^(i-1) + 2^(j-1) h_1^(i) d^(i)). At that time a quadratic approximation is used to calculate a step size which we will call h_2. If f*(a^(i-1) + h_2 d^(i)) > f*(a^(i-1) + 2^(j-1) h_1^(i) d^(i)), h_2 is used as the step size, h^(i), for the iteration. Otherwise, 2^(j-1) h_1^(i) is used as the step size, h^(i), for the iteration.

(b) On the other hand, if f*(a^(i-1) + h_1^(i) d^(i)) ≤ f*(a^(i-1)), a trial step size half as large, i.e., (1/2)h_1^(i), is tried. If f*(a^(i-1) + (1/2)h_1^(i) d^(i)) ≤ f*(a^(i-1)), a step size half as large again, i.e., (1/4)h_1^(i), is tried. This process is continued until, at the jth time a step size half as large is tried, either f*(a^(i-1) + (1/2)^j h_1^(i) d^(i)) ≥ f*(a^(i-1)) or (1/2)^j h_1^(i) < ε_h.
(aa) If f*(a^(i-1) + (1/2)^j h_1^(i) d^(i)) > f*(a^(i-1)), a quadratic approximation is used to calculate a step size which we will call h_2. If f*(a^(i-1) + h_2 d^(i)) > f*(a^(i-1) + (1/2)^j h_1^(i) d^(i)), h_2 is used as the step size, h^(i), for the iteration. Otherwise (1/2)^j h_1^(i) is used as the step size, h^(i), for the iteration.

(bb) If f*(a^(i-1) + (1/2)^j h_1^(i) d^(i)) = f*(a^(i-1)), (1/2)^j h_1^(i) is used as the step size, h^(i), for the iteration.

(cc) If (1/2)^j h_1^(i) < ε_h, a negative step, -(1/2)^j h_1^(i), is tried.

(aaa) If f*(a^(i-1) - (1/2)^j h_1^(i) d^(i)) > f*(a^(i-1)), the trial step size is doubled [from -(1/2)^j h_1^(i)], redoubled, etc., in the negative direction in the same manner as was done in the positive direction in step (1a) above. When a step size such that f*(a^(i-1) - 2^k (1/2)^j h_1^(i) d^(i)) < f*(a^(i-1) - 2^(k-1) (1/2)^j h_1^(i) d^(i)) is reached, a quadratic approximation is used to calculate a step size, h_2. If f*(a^(i-1) + h_2 d^(i)) > f*(a^(i-1) - 2^(k-1) (1/2)^j h_1^(i) d^(i)), h_2 is used as the step size, h^(i), for the iteration. Otherwise, -2^(k-1) (1/2)^j h_1^(i) is used as the step size, h^(i), for the iteration. If f*(a^(i-1) - 2^k (1/2)^j h_1^(i) d^(i)) = f*(a^(i-1) - 2^(k-1) (1/2)^j h_1^(i) d^(i)), then -2^k (1/2)^j h_1^(i) is used as the step size, h^(i), for the iteration.

(bbb) If f*(a^(i-1) - (1/2)^j h_1^(i) d^(i)) ≤ f*(a^(i-1)), (1/2)^(j+1) h_1^(i) is used as the step size, h^(i), for the iteration.

(2) If (a) the absolute value of all elements of l are less than a preassigned epsilon, ε_l, i.e., the partial derivative convergence criterion (discussed farther on in the section on convergence criteria) has been met; (b) L^(i-1) is positive definite; and (c) a positive step size was used the previous iteration, then an initial step size of 1 is tried.¹ If f*(a^(i-1) + d^(i)) ≥ f*(a^(i-1)), the trial step size is doubled, redoubled, etc., and a quadratic approximation tried, as for the usual iteration. If f*(a^(i-1) + d^(i)) < f*(a^(i-1)), a step size of 1 is used even though it leads to a slightly lower likelihood value.
The reason for imposing a step size of at least 1 is that a^(i-1) is apparently very close to the maximum. Given the small size of the elements of l^(i-1), a small step size would lead to almost no movement. If the step size of 1 should be larger than optimal, the next iteration can easily readjust the direction and size, since the L^(i-1) metric is very powerful close to the maximum. If the point is not close to the maximum, but the elements of l are quite small due to the likelihood being almost flat, then again a large step is desirable so that a large movement will be made.

¹l rather than l* is compared to an epsilon due to the normalization of variables used in the computation. (Normalization of variables is discussed in chapter IX.)

The Initial Step Size, h_1^(i)

The derivation of a maximization procedure based on the Taylor expansion given in section V.C.1 suggests a step size of 1 as an optimal step size; however, this argument is based on the Taylor expansion about a^(i-1) being a sufficiently good approximation to the likelihood function. If a^(i-1) is not very close to a^max f(a), then the Taylor expansion is unlikely to be a good approximation to the likelihood function. There are other arguments for a step size of 1 as being optimal, but most of these also founder on some such assumption. It is generally advantageous to allow the step size to vary even though the L metric is used and the current region is one in which the L metric is positive definite. Only when a^(i-1) is virtually at the maximum does it seem somewhat desirable to limit the step size to 1, and even there the rules given previously for step size allow the step size to be greater than 1.

In applying the step size rules given previously, a step size of less than 1 is selected a far greater proportion of the steps than a step size of 1 in the FIML problems calculated by the writer. (An exception was Klein's model I, where the average step size selected was .9.) As a result, the writer has currently set h_1^(i) to .5.

At first glance, it might seem that h_1^(i) might be set to the step size used the preceding iteration, or some proportion of this step size. This would be a very undesirable choice for h_1^(i), however, as it turns out in practice that large step sizes tend to be followed by small ones, and vice-versa. If any "rule" is to be selected, it should probably make h_1^(i) inversely related to h^(i-1), but the relationship is really too tenuous to be relied on. It is quite possible that a variable h_1^(i) which is better than any fixed h_1 could be calculated based on the eigenvalues of the L metric for those iterations in which the eigenvalues and eigenvectors of the metric are calculated; however, the writer has not attempted to develop such a rule.

Minimum Step Size, Negative Step Size, and ε_h

Currently an ε_h of .001 is being used by the writer. As noted earlier, when the trial step value becomes less than ε_h, a negative step of the same size is tried. If this gives a likelihood higher than the likelihood for the previous iteration, searching goes on in the negative direction for a yet higher likelihood value. If the negative step does not give a higher likelihood value, half of the previous positive step is used as the step size. If a negative step is selected, the |L| metric is automatically used for at least the next 3 iterations. It is expected that a negative step will be selected only in rare pathological cases. It has been programmed into the procedure as a matter of interest, to see whether such cases arise, rather than in the expectation that it will provide a key element in the iteration scheme.

The selection of an ε_h of .001 is, of course, quite arbitrary. In selecting an ε_h, it is necessary to weigh the desirability of selecting a step size small enough that the likelihood is increased, or at least not decreased, against the extra time it takes to calculate the likelihood value at any given step size and the fact that if too small a step is taken, a^(i) will almost coincide with a^(i-1). (This follows the old army adage, "Do something, even if it is wrong!")

Quadratic Approximation

The quadratic approximation referred to previously consists of calculating the second degree polynomial which fits three equally spaced points exactly.² If a^(i-1) is the value of a at the start of the iteration, h* is the step size with f*(a^(i-1) + h*d^(i)) ≥ f*(a^(i-1)), and f*(a^(i-1) + 2h*d^(i)) ≤ f*(a^(i-1) + h*d^(i)), then the point a^(i-1) + h**d^(i) will be the maximum of the quadratic function which goes through the three points a^(i-1), a^(i-1) + h*d^(i), and a^(i-1) + 2h*d^(i) when h** is calculated as:

(V.49)   h** = h*·[1 + (f*(a^(i-1)) - f*(a^(i-1) + 2h*d^(i))) / (2{f*(a^(i-1) + 2h*d^(i)) + f*(a^(i-1)) - 2f*(a^(i-1) + h*d^(i))})]

If in calculating (V.49) the denominator is zero, h** is set to 2h* if f*(a^(i-1) + 2h*d^(i)) = f*(a^(i-1)) [and therefore f*(a^(i-1) + h*d^(i)) will also equal f*(a^(i-1))], and h** is set to h* if f*(a^(i-1) + 2h*d^(i)) ≠ f*(a^(i-1)). Formula (V.49) holds even if h* is negative. As noted earlier, if f*(a^(i-1) + h*d^(i)) > f*(a^(i-1) + h**d^(i)), h* is used as the step size rather than h**.

In problems calculated to date by the writer, the use of the quadratic approximation has not been a powerful procedure in the selection of a step size. For some iterations it has given a step with a higher likelihood value, and for some iterations it has not. In fitting the quadratic, the f*(a) which we are using is only a monotonic function of the likelihood function. It is very possible that better results would be obtained if some other monotonic function of the likelihood function were used in deriving trial step sizes. Alternatively, instead of the quadratic approximation, some other approximating function could be used. In any event, until a better approximation is devised, the actual quadratic approximation calculation is trivial, and the number of times it does lead to slightly better step sizes would appear to justify the additional time required to calculate the likelihood value at the new point, a^(i-1) + h**d^(i).

"Local" methods could be derived for calculating an optimal step size based on, say, the eigenvalues of the metric, the ratios of elements of d^(i), etc.; however, these methods require assumptions regarding the shape of the likelihood function. (An assumption often made in deriving a local method is that the likelihood function is approximately quadratic in the given direction, d^(i), an assumption which does not appear justified.) As noted earlier, such local methods may be helpful for establishing an initial trial step, h_1^(i), for an iteration, but they do not seem desirable as final determinants of the step size to be used for an iteration.

On the other hand, the suggested step size procedure outlined earlier is a "global" method in that no assumptions are required regarding the shape of the likelihood, except that it has only a single peak in the region under consideration and that there is no higher peak.

²Koopmans, Rubin, and Leipnik [1950], p. 172 attribute the use of a quadratic approximation in the calculation of FIML problems to a suggestion by John von Neumann.

5. Convergence criteria

Following are some requirements which could be imposed in determining convergence:

(1) All eigenvalues of L^(i-1) must be positive for the iteration.

(2) Partial derivative convergence criterion:

(V.50)   max_j(abs l_[j]^(i)) ≤ ε_l

i.e., the absolute value of all elements of the right hand side vector must be less than or equal to a preassigned constant.

(3) Coefficient convergence criterion:

(V.51)   max_j(abs(d_[j]^(i) / a_[j]^(i))) ≤ ε_a

where d_[j]^(i) is the jth element of the direction vector, a_[j]^(i) is the jth element of a^(i), and ε_a is a preassigned constant. If a step size of 1 were imposed, a^(i) - a^(i-1) = (a^(i-1) + d^(i)) - a^(i-1) = d^(i); therefore, (V.51) is equivalent to requiring that, with a step size of 1, the absolute proportional change would be less than or equal to ε_a for all coefficients.

(4) Likelihood convergence criterion:

(V.52)   f*(a^(i-1) + d^(i)) / f*(a^(i-1)) ≤ ε_f*

i.e., if a step size of 1 were imposed, the ratio of the resulting likelihood to the likelihood for the preceding iteration is less than or equal to a preassigned constant.

If a user desired to iterate until each iteration produced a statistically insignificant change in the coefficients, then a stopping criterion based on the relative sizes of f*(a^(i)) and f*(a^(i-1)) might be considered. It should be recognized, however, that for many problems, coefficients differing considerably from those which maximize the likelihood function may not differ statistically from those which maximize the likelihood function, let alone from those of the following iteration; hence, the coefficients derived through use of the likelihood convergence criterion may differ considerably from the maximum likelihood coefficients. Since it is the coefficients which maximize the likelihood function which are desired, not coefficients on a fairly flat surface away from the maximum, the likelihood convergence criterion does not appear to be a very fruitful criterion for convergence.

Up to now, in the FIML section of the AES STAT system (which is discussed in chapter IX), (1) and either "(2) and (3)" or (4) have been imposed for all problems.
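[Illustrative sketch, not part of the original routines: the doubling search of step (1a) in section 4, together with the quadratic refinement (V.49), can be expressed as follows. This is a minimal one-dimensional version; the function and starting values are hypothetical, and the halving and negative-step branches of the full scheme are omitted.]

```python
def step_size(f, a, d, h1=0.5, max_doubles=20):
    """Double the trial step while f keeps rising along direction d, then
    fit a quadratic through the last three equally spaced points (V.49)
    and keep the refined step only if it gives a higher f value."""
    h = h1
    while f(a + 2 * h * d) > f(a + h * d) and h < h1 * 2 ** max_doubles:
        h *= 2
    f0, f1, f2 = f(a), f(a + h * d), f(a + 2 * h * d)
    denom = 2 * (f2 + f0 - 2 * f1)
    h2 = 2 * h if denom == 0 else h * (1 + (f0 - f2) / denom)   # (V.49)
    return h2 if f(a + h2 * d) > f(a + h * d) else h

# On a one-dimensional concave quadratic the refinement lands exactly
# on the maximizer (here, a + h*d = 3).
f = lambda a: -(a - 3.0) ** 2
h = step_size(f, 0.0, 1.0, h1=0.5)
assert abs(h * 1.0 - 3.0) < 1e-9
```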
If none of the preassigned constants were supplied by the user, the problem iterated to the maximum number of iterations specified by the user. If ε_l but not ε_a was specified, ε_a was set to ε_l. If ε_a but not ε_l was specified, ε_l was set to 100·ε_a. An ε_l of .0000000001 and an ε_a of .000000000001 have worked well for a number of problems computed; however, both ε_l and ε_a are quite arbitrary. Close to the maximum, convergence usually becomes very rapid, so that a small ε_a or ε_l takes little additional computer time. On the other hand, if rounding error becomes severe, an extremely small ε_a or ε_l may be difficult to attain. The FIML section is being changed so that if the user specifies neither ε_l nor ε_a, both will be automatically set to .0000000001 by the FIML section; hence, even if no convergence criterion is specified, iteration will no longer be merely to the maximum number of iterations specified.

D. Estimated Disturbance Variance-covariance Matrix

The maximum likelihood estimate of the disturbance variance-covariance matrix used in FIML estimation is the S matrix, with the uu' element of S calculated as:¹

(V.53)   s_uu' = (1/T) û'_u û_u' ,

where û'_u = â'_u Z' and â_u is the part of â_max f(a) corresponding to the coefficients of equation u.

In the single equation procedures previously discussed, the S matrix contained only a single element, and so a "degrees of freedom" of T − n_u was suggested as a possible denominator in calculating s_uu for the uth equation, where n_u is the number of "explanatory" jointly dependent variables plus the number of predetermined variables. A similar adjustment can be made in the calculation of the S matrix for FIML if an estimator more compatible with single equation techniques is desired.
A more compatible estimator would be to use (V.53), but substitute √(T − n_u)·√(T − n_u') for T in the formula.² The denominator for the uth diagonal element will then be T − n_u, as in the single equation procedures, and the denominators for the off-diagonal elements will be such that the S matrix will still be positive definite.

¹See (V.18) and (V.21).

²The substitution of √(T − n_u)·√(T − n_u') for T was suggested to the writer by Professor Arnold Zellner as an alternative to the use of T in the calculation of ZA estimates. Professor Zellner neither endorsed nor disparaged this adjustment for ZA. He merely listed it as an alternative.

It seems desirable that the maximum likelihood estimate of S (using the denominator of T) be used during iteration until convergence is complete, so that â_FIML will indeed be the maximum likelihood estimate of the coefficients. √(T − n_u)·√(T − n_u') would only be used in the denominator when printing out the estimated disturbance variance-covariance matrix corresponding to the FIML coefficients. The adjustment could also be used when printing out the estimated disturbance variance-covariance matrix corresponding to the coefficients from which the iterations were started, or to intermediate-stage coefficients. s_uu' calculated by using T and s_uu' calculated by using √(T − n_u)·√(T − n_u') are asymptotically equivalent.

Often we are interested in the relative sizes of the estimated covariances (i.e., relative to the corresponding variances). In this case, the estimated disturbance variance-covariance matrix normalized so that 1's appear on the diagonal is useful. This matrix may be defined as:

(V.54)   S_N = D_S S D_S ,

where

D_S = diag( 1/√s_11 , 1/√s_22 , ... , 1/√s_MM ) ,

i.e., S_N is calculated by dividing each row and column of S by the square root of the corresponding diagonal element of S.
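The two denominators of (V.53) and the normalization (V.54) can be sketched as follows. This is an illustrative rendering under stated assumptions (residuals arranged as a T×M array, one column per equation; the vector of n_u values supplied by the caller), not the original program.

```python
import numpy as np

def s_matrix(resids, n_unrestricted=None):
    """S matrix from a T x M array of residuals, one column per equation.

    With n_unrestricted=None, uses the ML denominator T as in (V.53);
    otherwise divides element uu' by sqrt(T - n_u) * sqrt(T - n_u'),
    the degrees-of-freedom variant discussed in the text."""
    T, M = resids.shape
    cross = resids.T @ resids                 # u_hat_u' u_hat_u' for all pairs
    if n_unrestricted is None:
        return cross / T
    root = np.sqrt(T - np.asarray(n_unrestricted, dtype=float))
    return cross / np.outer(root, root)       # still positive definite

def normalize(S):
    """(V.54): divide each row and column of S by the square root of the
    corresponding diagonal element, giving 1's on the diagonal."""
    d = 1.0 / np.sqrt(np.diag(S))
    return S * np.outer(d, d)
```

As the text notes, the normalized matrix is the same whichever denominator is used, since the row and column scalings cancel; the test below checks this numerically.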
S_N has the advantage of being independent of the scale of the normalizing variables--1's on the diagonal provide a convenient normalization. S_N is the same whether T or √(T − n_u)·√(T − n_u') is used in the denominator in the calculation of the elements of S. The element in row u and column u' of S_N is:

(V.55)   s_uu' / √(s_uu · s_u'u') ,

the estimated simple correlation between the disturbance in equation u and the disturbance in equation u'.

E. Estimated Coefficient Variance-covariance Matrix

The maximum likelihood estimate of the coefficient variance-covariance matrix for FIML is:¹

(V.56)   Var(â) = [ −T ∂²f(a)/∂a ∂a' ]⁻¹ evaluated at a = â_max f(a) = [L_max f(a)]⁻¹ .

The estimated variances of the individual coefficients are given by the diagonal elements. The square roots of these diagonal elements are often used as asymptotic standard errors.

The elements of Var(â) could be adjusted in the same manner as the estimated disturbance variance-covariance matrix to provide an estimate more compatible with the usual single equation estimates. Let L⁻¹_uu' be the n_u × n_u' block of [L_max f(a)]⁻¹ with rows corresponding to unrestricted coefficients of equation u and columns corresponding to unrestricted coefficients of equation u'. Then, if L⁻¹_uu' were multiplied by T / (√(T − n_u)·√(T − n_u')), the resulting estimated coefficient variance-covariance matrix would still be positive definite and asymptotically the same as the one given in (V.56), but more compatible with the estimated variance-covariance matrix given for the single equation methods.² This adjustment should not be used during iteration to the FIML solution--only during printing out of the estimated coefficient variance-covariance matrix after â_FIML has been calculated.

¹Chernoff and Divinsky [1953], p. 259. This is the "information matrix" of Kendall and Stuart [1961], pp. 28, 54-55.

Often we are interested in the relative size of the estimated covariances.
In this case, the estimated coefficient variance-covariance matrix normalized so that 1's appear on the diagonal is useful. This matrix may be defined as:

(V.57)   Var_N(â) = D_C Var(â) D_C ,

where

D_C = diag( 1/√Var(â_(1)) , 1/√Var(â_(2)) , ... , 1/√Var(â_(n)) ) ,

i.e., Var_N(â) is calculated by dividing each row and column of Var(â) by the square root of the corresponding diagonal element of Var(â). Var_N(â) has the advantage of being independent of the scale of the variables; 1's on the diagonal provide a convenient normalization. Var_N(â) is the same whether T or √(T − n_u)·√(T − n_u') is used in the denominator in the calculation of Var(â). The element in row i and column j of Var_N(â) is the estimated simple correlation between â_i and â_j.

²Computationally, the adjustment may be accomplished by (1) multiplying each row by √(T/(T − n_u)), where n_u is the number of unrestricted coefficients in the equation to which the coefficient corresponding to the row relates, and (2) multiplying each column by √(T/(T − n_u')), where n_u' is the number of unrestricted coefficients in the equation to which the coefficient corresponding to the column relates.

F. Arbitrary Linear Restrictions Imposed on the Coefficients

The iterative FIML procedure given in the previous section is designed to maximize f(a) with respect to a; that is, to adjust the elements of a until f(a) is the highest possible, where f(a) is defined in (V.23). In this section we will consider the problem:

(V.58)   max_a f(a)

subject to:

(V.59)   R a = r ,   with R: NR×n, a: n×1, r: NR×1 .

The vector a is the same vector as before--it contains all of the non-zero non-normalization elements of the rows of α which correspond to stochastic equations in the system. In this section, NR additional arbitrary linear restrictions are imposed on the n = Σ_{μ=1}^{M} n_μ coefficients of a.
The equations need not be identifiable in the absence of the additional restrictions; however, they must be identifiable after the restrictions are imposed.

1. Illustration of linear restrictions on coefficients--Klein's model I

The placing of linear restrictions on coefficients is very straightforward and hardly needs illustrating. This example will, however, be used to demonstrate the effect of using identity equations to solve out jointly dependent variables in the system, since this is a technique that is commonly referred to but apparently not very well understood.¹ In particular, we will note how the use of identity equations to solve out jointly dependent variables leads to the imposition of restrictions on coefficients, and how these restrictions may be expressed by the R matrix and r vector mentioned above.

¹This example should convince most readers of the desirability of explicitly using the identity equations in the computational procedure instead of using the identity equations to eliminate jointly dependent variables before commencing computation.

Equations (I.5a) through (I.5h) present Klein's model I as an 8-equation model (3 stochastic equations and 5 identity equations) containing 8 jointly dependent variables (C, P, W₁, I, W, E, Y, and K).² Often, the model is written as containing 3 stochastic equations and 3 identity equations by implicitly carrying restrictions on certain coefficients rather than listing the last two identity equations. (The last two identity equations may be considered as having been solved into the stochastic equations, as compared to our previous formulation of the model.) Following is an expression of Klein's model I as a 6-equation model:

²The definitions of the variables in Klein's model I follow (I.5h).

(V.60a) Consumption:   C = α_0^[1] + α_1^[1] P + α_2^[1] (W₁ + W₂) + α_3^[1] P₋₁ + u₁

(V.60b) Investment:   I = α_0^[2] + α_1^[2] P + α_2^[2] P₋₁ + α_3^[2] K₋₁ + u₂

(V.60c) Private Wage:
W₁ = α_0^[3] + α_1^[3] (Y + R − W₂) + α_2^[3] (Y + R − W₂)₋₁ + α_3^[3] t + u₃

(V.60d) Product:   Y + R = C + I + G

(V.60e) Income:   Y = P + W₁ + W₂

(V.60f) Capital:   K = K₋₁ + I

As the model is written above, it cannot be calculated by the FIML method, since it still contains 8 jointly dependent variables (C, P, W₁, W₁ + W₂, I, Y + R − W₂, Y, and K) but only 6 equations. It may be computed by FIML if we rewrite the first and third equations as:

(V.61a)   C = α_0^[1] + α_1^[1] P + α_21^[1] W₁ + α_22^[1] W₂ + α_3^[1] P₋₁ + u₁

(V.61b)   W₁ = α_0^[3] + α_11^[3] Y + α_12^[3] (R − W₂) + α_2^[3] (Y + R − W₂)₋₁ + α_3^[3] t + u₃

and impose the restrictions:

α_21^[1] = α_22^[1] ,   α_11^[3] = α_12^[3] .

Thus, in eliminating the two identity equations, we have solved out two of the original eight jointly dependent variables, namely W (= W₁ + W₂) and E (= Y + R − W₂), and imposed restrictions on certain coefficients. At this point, we may write our problem as one of:

max_a f(a)   subject to:   R a = r ,

where a is the 14×1 vector of the coefficients of (V.61a), (V.60b), and (V.61b), R is a 2×14 matrix whose first row contains a 1 in the column of α_21^[1] and a −1 in the column of α_22^[1], and whose second row contains a 1 in the column of α_11^[3] and a −1 in the column of α_12^[3] (zeros elsewhere), and r is a 2×1 vector of zeros.

Now let us use the three remaining identity equations to solve out the jointly dependent variables Y, I, and C.¹ From (V.60e) we have:

(V.62e)   Y = P + W₁ + W₂ .

Rewriting (V.60f) we obtain:

(V.62f)   I = K − K₋₁ .

Substituting (V.62e) and (V.62f) into (V.60d) and rewriting it in terms of C we obtain:

(V.62d)   C = P + W₁ + W₂ + R − K + K₋₁ − G .

Substituting C, Y, and I as expressed by (V.62d), (V.62e), and (V.62f), we get the following expression of the model:

¹These are the same variables solved out by Chernoff and Divinsky [1953].
(V.63a)   P + W₁ + W₂ + R − K + K₋₁ − G = α_0^[1] + α_1^[1] P + α_21^[1] W₁ + α_22^[1] W₂ + α_3^[1] P₋₁ + u₁

(V.63b)   K − K₋₁ = α_0^[2] + α_1^[2] P + α_2^[2] P₋₁ + α_3^[2] K₋₁ + u₂

(V.63c)   W₁ = α_0^[3] + α_11^[3] (P + W₁ + W₂) + α_12^[3] (R − W₂) + α_2^[3] (Y + R − W₂)₋₁ + α_3^[3] t + u₃

subject to:

α_21^[1] = α_22^[1] ,   α_11^[3] = α_12^[3] .

Again, the model as expressed cannot be computed by FIML, since it has 5 jointly dependent variables (P + W₁ + W₂ + R − K + K₋₁ − G, P, W₁, K − K₋₁, and P + W₁ + W₂) and only three equations. It may be rewritten in the following manner to make it amenable to computation by FIML:

(V.64a)   K = −α_0^[1] + (1 − α_1^[1]) P + (1 − α_21^[1]) W₁ + (1 − α_22^[1]) W₂ − α_3^[1] P₋₁ + α_4^[1] (G − R − K₋₁) − u₁

(V.64b)   K = α_0^[2] + α_1^[2] P + α_2^[2] P₋₁ + (α_3^[2] + 1) K₋₁ + u₂

(V.64c)   W₁ = α_0^[3] + α_11^[3] P + α_12^[3] W₁ + α_13^[3] R + α_2^[3] (Y + R − W₂)₋₁ + α_3^[3] t + u₃

In terms of the a vector, the model (V.64a)-(V.64c) contains 16 listed coefficients. For the first equation it is convenient to define starred coefficients, e.g., α_1^[1]* = 1 − α_1^[1], α_21^[1]* = 1 − α_21^[1], and α_22^[1]* = 1 − α_22^[1], so that (V.64a) is linear in the listed coefficients. The four restrictions are then: α_21^[1]* = α_22^[1]* (1 restriction); α_4^[1] = −1, the coefficient of (G − R − K₋₁) (1 restriction); and α_11^[3] = α_12^[3] = α_13^[3] (2 restrictions). Expressed as R a = r, R is a 4×16 matrix--each equality restriction places a 1 and a −1 in the columns of the paired coefficients--and r contains a −1 in the row fixing α_4^[1] and zeros elsewhere.

After converging to the FIML values for a, the desired coefficients which have not been calculated directly are recovered as:

α_1^[1] = 1 − α_1^[1]* ,   α_21^[1] = 1 − α_21^[1]* ,   etc.

In our last formulation, we used an explicit normalization for each equation. For the first equation, it would probably have been slightly more convenient not to save out a normalizing coefficient, but instead use the R matrix and r vector to impose a normalization on the coefficients.
Rewritten in this manner, the first equation becomes:

(V.65a)   α_0^[1] + (α_1^[1] − 1) P + (α_21^[1] − 1) W₁ + α_4^[1] K + (α_22^[1] − 1) W₂ + α_3^[1] P₋₁ + α_5^[1] (G − R − K₋₁) + u₁ = 0 ,

and the second and third equations are unchanged.¹ The restrictions are now: (α_21^[1] − 1) = (α_22^[1] − 1) (1 restriction); α_4^[1] = 1 and α_5^[1] = 1 (2 restrictions), the first of which serves as the implicit normalization of the equation; and α_11^[3] = α_12^[3] = α_13^[3] (2 restrictions)--5 restrictions in all on the 17 listed coefficients. The corresponding R matrix is 5×17, and r contains a 1 in each of the rows fixing α_4^[1] and α_5^[1] and zeros elsewhere.

¹Since α_1^[1] differs from the listed coefficient (α_1^[1] − 1) only by a constant, its variance and standard error are the same as those of the listed coefficient; similarly for the other shifted coefficients. The ratios of coefficients to asymptotic standard errors (often called asymptotic t-ratios) will, of course, have to be recalculated, since â_1^[1] and â_1^[1] − 1 differ while their standard errors coincide.

Summary of Klein Model I Formulations

The number of coefficients to be estimated in each of the formulations of Klein's model I may be summarized as follows:

  Model expressed              Number of    Number of      Number of      Number of
  by equations                 identity     coefficients   restrictions   "unrestricted"
                               equations                                  coefficients
  (I.5a)-(I.5h)                    5            12              0              12
  (V.61a), (V.60b), (V.61b),
  (V.60d), (V.60e), (V.60f)        3            14              2              12
  (V.64a)-(V.64c)                  0            16              4              12
  (V.65a), (V.64b), (V.64c)        0            17              5              12

Thus, the dimensionality of the unrestricted coefficient space does not change as identity equations are used to solve out jointly dependent variables.
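The two-restriction formulation above can be sketched concretely. The code below is illustrative only: the ordering of the 14 coefficients (consumption equation first, then investment, then wage) and hence the column positions in R are assumptions, since the exact ordering used in the original program is not shown.

```python
import numpy as np

n = 14                 # coefficients of (V.61a), (V.60b), (V.61b)
R = np.zeros((2, n))
r = np.zeros(2)

# Assumed ordering: consumption (a0, a1, a21, a22, a3) in columns 0-4,
# investment in columns 5-8, wage (c0, c11, c12, c2, c3) in columns 9-13.
R[0, 2], R[0, 3] = 1.0, -1.0    # alpha_21^[1] - alpha_22^[1] = 0
R[1, 10], R[1, 11] = 1.0, -1.0  # alpha_11^[3] - alpha_12^[3] = 0

# Any coefficient vector obeying the two equalities satisfies R a = r:
a = np.arange(1.0, n + 1.0)
a[3] = a[2]      # impose alpha_22^[1] = alpha_21^[1]
a[11] = a[10]    # impose alpha_12^[3] = alpha_11^[3]
```

Since the two rows of R are linearly independent, the number of independent restrictions (rk R = 2) falls out of the computation, matching the summary table.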
Since the number of iterations required for convergence is related to the dimensionality of the coefficient space, all of the above expressions would require a comparable number of iterations. On the other hand, as we will note from the formulas of the next section, the formal listing of the identity equations, rather than using them to solve out jointly dependent variables, involves less cumbersome calculations each iteration, thereby taking less total computer time. Explicit listing of identity equations also has the advantages: (1) it provides a more convenient way to formulate the problem (at least in Klein's model I) than solving out the identity equations, (2) all of the coefficients are derived directly, (3) estimated variances and t-ratios are calculated directly, and (4) calculation of reduced form coefficients is simpler due to the direct calculation of the coefficients.

Restrictions on coefficients may arise from contexts other than using identity equations to eliminate jointly dependent variables. In these cases it will likely be more convenient to impose the restrictions directly on the coefficients in the form R a = r than to attempt to create additional jointly dependent variables and identity equations to impose the restrictions. The above illustration dealt with restrictions imposed on the coefficients of a single equation at a time; however, the computational formulas which follow are equally applicable to restrictions which cut across equations.

2. Computational formulas

In chapter IV we noted a procedure for transforming an R matrix and r vector into a Q matrix and q vector with certain special properties relative to the R matrix and r vector. The procedure gave a method of separating the vector of coefficients, a, into:
(V.66)   a* = [ a_(1) ; a_(2) ] ,   with a*: n×1, a_(1): (n − rk R)×1, a_(2): (rk R)×1 ,

where the n − rk R coefficients in the a_(1) vector are estimated directly and the rk R coefficients in the a_(2) vector are calculated from the a_(1) vector. a* is the same as a except that the coefficients may have been rearranged. The FIML computational formulas which take account of the additional restrictions are:

(V.67)   a_(1)^(i) = a_(1)^(i-1) + h^(i) |Q' L^(i-1) Q|⁻¹ Q' l^(i-1) ,
         with a_(1): (n − rk R)×1, h^(i): 1×1, Q' L^(i-1) Q: (n − rk R)×(n − rk R), Q' l^(i-1): (n − rk R)×1 ,

(V.68)   a_(2)^(i) = Q₂ a_(1)^(i) + q₂ ,
         with Q₂: (rk R)×(n − rk R), q₂: (rk R)×1 ,

where:

L^(i-1) is T times the negative of the matrix of second partial derivatives of f(a) with respect to a, as before. (The additional restrictions are ignored in defining L^(i-1).)

l^(i-1) is T times the vector of first partial derivatives of f(a) with respect to a, as before. (The additional restrictions are ignored in defining l.)

Q, Q₂, q, and q₂ are calculated from R and r as in section IV.B.1.¹

a^(i) and a^(i-1) denote the values of the coefficients for iterations i and i − 1, respectively.

The | | lines around |Q' L^(i-1) Q| denote the positive definite matrix constructed from Q' L^(i-1) Q in the same manner that |L^(i-1)| is constructed from L^(i-1).

Derivations are given in the following sub-section. (V.67) and (V.68) may be combined into the single formula:

(V.69)   a^(i) = Q [ a_(1)^(i-1) + h^(i) |Q' L^(i-1) Q|⁻¹ Q' l^(i-1) ] + q ,
         with a^(i): n×1, Q: n×(n − rk R), q: n×1 .

When convergence has been achieved, the estimated coefficient variance-covariance matrix is given by:

(V.70)   asymptotic Var(â_FIML) = Q [Q' L_max f(a) Q]⁻¹ Q' ,

where max f(a) is the restricted maximum satisfying R a = r. The S matrix is calculated by formula (V.21), except that the restricted coefficients (V.69) obtained from the preceding iteration are used instead of the corresponding unrestricted coefficients in calculating the S matrix.

¹E.g., the Q matrix is the orthogonal complement of the R matrix.
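A minimal sketch of the restricted update (V.67)-(V.69) follows. It assumes L is already positive definite (so no repair of |·| is needed), builds Q as a null-space basis of R and q as a particular solution of R q = r via an SVD and least squares--one way to obtain matrices with the properties required of section IV.B.1, not necessarily the document's own construction--and takes l to be the gradient-type vector of the text.

```python
import numpy as np

def q_parameterization(R, r):
    """Return Q, q such that {a : R a = r} = {Q a1 + q : a1 free}."""
    U, s, Vt = np.linalg.svd(R)
    rank = int(np.sum(s > 1e-10 * s.max()))
    Q = Vt[rank:].T                            # null-space basis, n x (n - rk R)
    q = np.linalg.lstsq(R, r, rcond=None)[0]   # particular solution of R q = r
    return Q, q

def restricted_step(a1, L, l, Q, q, h=1.0):
    """(V.69): a = Q [a1 + h (Q' L Q)^{-1} Q' l] + q."""
    a1_new = a1 + h * np.linalg.solve(Q.T @ L @ Q, Q.T @ l)
    return Q @ a1_new + a1_new * 0 + q if False else (Q @ a1_new + q, a1_new)
```

(The test exercises the step on a quadratic objective with maximum at m, for which L = P and l = P(m − a); a single unit step then reaches the restricted maximum, where Q' l vanishes.)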
Any restriction may be imposed only on the coefficients of a single equation, or it may cut across equations. As in section IV.B, the restrictions need not be linearly independent, i.e., R need not have full row rank.

It is usually convenient and saving of computer time to list only the non-zero non-normalizing coefficients of the stochastic equations in the a vector; therefore, the above procedure is outlined with this in mind. It should be noted, however, that any coefficients of α may be listed in the a vector and then restricted to −1, 0, or some other value by the R matrix and r vector. For some problems an implicit normalization of coefficients is more convenient than the explicit normalization which we have used--setting a coefficient to −1. Use of the R matrix and r vector allows for an implicit normalization--all that is required is that some normalization be imposed by the restrictions so that the coefficients of an equation cannot take on an infinite number of values due to lack of a normalization rule.¹

¹An implicit normalization is used for the first equation of Klein's model I in the last version of the "restrictions" example of section V.F.1.

Derivation of (V.66) through (V.70)

(V.68) is given by the method of calculating the Q matrix and q vector. (See section IV.B.1.) From (V.68), we derive that:

(V.71)   a^(i) = [ a_(1)^(i) ; a_(2)^(i) ] = [ I ; Q₂ ] a_(1)^(i) + [ 0 ; q₂ ] ,

or

(V.72)   a^(i) = Q a_(1)^(i) + q .

In the following paragraphs, we omit the superscripts (i) for simplicity. Since the coefficients of a are a function of the coefficients of a_(1) (a_(1) is a sub-vector of a), we can maximize f(a) directly with respect to a_(1). Let a_[i] denote the ith element of a, and a_(1)[j] denote the jth element of a_(1). Then from (V.72) we get:

(V.73)   ∂a_[i] / ∂a_(1)[j] = Q_ij ,

where Q_ij is the ijth element of Q. Now consider the vector of T times the partial derivatives, T ∂f(a)/∂a_(1). Its jth element is

T ∂f(a)/∂a_(1)[j] = Σ_{i=1}^{n} T (∂f(a)/∂a_[i]) (∂a_[i]/∂a_(1)[j]) = Σ_{i=1}^{n} l_[i] Q_ij ,   (j = 1, ...,
n − rk R) ,

where l_[i] is the ith element of l. Hence,

(V.74)   T ∂f(a)/∂a_(1) = Q' l .

Next consider the matrix of (−T) times the second-order partial derivatives, −T ∂²f(a)/∂a_(1) ∂a_(1)'. The element in row j, column k is (minus) the derivative of the jth element above with respect to a_(1)[k], i.e.,

−T ∂²f(a) / ∂a_(1)[j] ∂a_(1)[k] = Σ_{m=1}^{n} Σ_{i=1}^{n} (−T ∂²f(a)/∂a_[m] ∂a_[i]) Q_mj Q_ik = Q_j' L Q_k ,

where Q_j and Q_k are columns j and k of Q. Thus,

(V.75)   −T ∂²f(a) / ∂a_(1) ∂a_(1)' = Q' L Q .

Taking account of the new matrix of second derivatives and the new vector of partials, we can now iterate to a maximum, basing our iteration on the n − rk R coefficients which we have separated out. Thus, (V.46) becomes:

(V.76)   a_(1)^(i) = a_(1)^(i-1) + h^(i) [ −T ∂²f(a)/∂a_(1) ∂a_(1)' ]⁻¹ [ T ∂f(a)/∂a_(1) ] , evaluated at a^(i-1) .

Substituting (V.74) and (V.75) into (V.76), we get (V.67). (V.70) follows from (V.72) by a common variance relationship--if y = Ax + b, with A a matrix of fixed elements and b a vector of fixed elements, then Var(y) = A [Var(x)] A'.

G. Linearized Maximum Likelihood (LML)

The linearized maximum likelihood (LML) method is a complete system estimation method recently proposed by Rothenberg and Leenders.¹ LML estimates and asymptotic coefficient variances and covariances may be calculated as:

(V.77)   a_LML = a^(1) + [L^(1)]⁻¹ l^(1)

(V.78)   asymptotic Var(a_LML) = [L^(1)]⁻¹ ,

where L^(1) and l^(1) are calculated from the starting estimates a^(1).² That is, the LML estimates are the estimates obtained at the end of the first iteration of the FIML iteration procedure, provided the following restrictions are imposed on the iteration:

(1) L^(1) is not converted to a positive definite matrix even though it is not positive definite. (Usually L^(1) will not be positive definite the first iteration.)

(2) A step size of 1 is automatically imposed.

¹Rothenberg and Leenders [1964].

²A "degrees of freedom" adjustment may be made in asymptotic Var(a_LML) in the same manner as for FIML.
The restriction formulas given for FIML may be applied to LML as well without changing the basic LML properties.

Provided the starting estimates for LML (a^(1)) are consistent estimates of the underlying coefficients a, with a^(1) − a = O(T^(-1/2)) in probability, and also provided that certain other regularity and existence conditions hold, the LML coefficients have the same asymptotic distribution as the corresponding FIML coefficients, i.e., L^(1) is asymptotically the same as L_max f(a). This holds under a wide range of conditions, including restrictions imposed on the disturbance variance-covariance matrix.

Coefficients from 2SLS, LIML, or other methods meeting the consistency and probability requirements may be used as starting coefficients for the calculation of a_LML; however, the LML coefficients obtained if a^(1) = a_2SLS will not, of course, equal the LML coefficients obtained if a^(1) = a_LIML. DLS estimates do not meet the consistency requirements; therefore, they cannot be used as starting estimates if the properties of LML coefficients are to be preserved.¹

The LML coefficients may have a lower likelihood value than the starting estimates. This is because:

(1) L^(1) will in general not be positive definite; hence, our assurance that for a step size, h, sufficiently small, the calculation of a^(i) = a^(i-1) + h^(i) [L^(i-1)]⁻¹ l^(i-1) leads to a higher likelihood value will not in general hold.

(2) Even if L^(1) is positive definite for a particular problem, the imposition of a step size of 1 may result in too large a step in the particular direction given by [L^(1)]⁻¹ l^(1), with the result that a movement is made to a lower likelihood value.

Diagonal elements of [L^(1)]⁻¹ may be negative, giving estimates of asymptotic coefficient variances which are negative.
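The one-step character of LML makes it easy to sketch. The function below is an illustrative rendering of (V.77)-(V.78) only; L1 and l1 stand for the matrix and vector of (scaled) second and first partials evaluated at the starting estimates, which are assumed to be supplied by the FIML machinery described earlier.

```python
import numpy as np

def lml(a_start, L1, l1):
    """One LML step: a_LML = a_start + L1^{-1} l1, Var = L1^{-1}.

    Deliberately does NOT repair L1 into a positive definite matrix
    and fixes the step size at 1, as the text requires; consequently
    diagonal elements of the returned covariance may be negative."""
    L1_inv = np.linalg.inv(L1)
    a_lml = a_start + L1_inv @ l1       # (V.77)
    var_lml = L1_inv                    # (V.78)
    return a_lml, var_lml
```

The test uses an indefinite L1 to exhibit the negative "variance" phenomenon the text warns about.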
¹DLS estimates may be used as starting estimates for FIML, since iteration proceeds to the maximum of the likelihood anyway. The FIML coefficients obtained will, of course, coincide with the FIML coefficients obtained from starting from 2SLS or LIML coefficients (assuming that multiple maxima do not occur in the regions of the coefficients).

CHAPTER VI

LIMITED INFORMATION SUBSYSTEM MAXIMUM LIKELIHOOD (SML)

A. Only Zero and Normalization Restrictions Imposed on Coefficients

Limited information subsystem maximum likelihood (SML) is a partial system maximum likelihood method which permits the simultaneous estimation of a subset of the equations in a system of equations.¹ In FIML estimation we distinguished between two kinds of equations--the

¹The name limited information subsystem maximum likelihood comes from taking account of the structure of only a subsystem of the whole system in the estimation procedure. A basic reference for the SML method is Koopmans and Hood [1953]. Chernoff and Divinsky [1953] refer to the method as the LIS (limited information subsystem) method and give computational formulas. The computation of a particular SML problem is also given by Chernoff and Divinsky. We will use the abbreviation SML rather than LIS to emphasize the maximum likelihood character of SML, which distinguishes SML from other "limited information subsystem" methods such as 3SLS (three-stage least squares) and I3SLS (iterative 3SLS).

Hannan [1967] recently derived an SML method (seemingly unaware of the prior existence of the more general SML method given by Koopmans and Hood [1953] which is discussed in this chapter) and showed the relationship between it and canonical correlation. As Chow and Ray-Chaudhuri [1967] make clear, Hannan's method is only applicable as a very special case of the general SML procedure given by Koopmans and Hood. (The special case may be stated in our notation (pp. 226 et seq.) as follows.
Let α_I1 = (Γ_I1  B_I1), where α_I1 is M₁×(G + A), Γ_I1 is M₁×G, and B_I1 is M₁×A (M₁ being the number of equations in the subsystem). Now imagine rearranging variables so that all G₁ dependent variables in the subsystem occur first among the dependent variables in the whole system, and all A₁ predetermined variables in the subsystem occur first among the predetermined variables in the whole system, so we can write (Γ_I1  B_I1) = (Γ_Δ  0  B_*  0), where Γ_Δ is M₁×G₁ and B_* is M₁×A₁. Now consider the matrix α_Δ* = (Γ_Δ  B_*), which is M₁×(G₁ + A₁). The special case that Hannan treats is that in which, a priori, there are exactly M₁ − 1 zeroes in each row of α_Δ*.)

M stochastic equations and the G − M identity equations. Thus, in FIML estimation we subdivided the system of equations into [see (V.3), (V.4), and (V.5)]:

(VI.1)   α_I Z' + U' = 0' ,   with α_I: M×(G+A), Z': (G+A)×T, U': M×T

(VI.2)   α_II Z' = 0' ,   with α_II: (G−M)×(G+A)

In SML estimation we make a further division of the M stochastic equations into (1) the M₁ stochastic equations for which we specify the structure and (2) the remaining M₂ equations for which we estimate no coefficients and specify only any predetermined variables occurring in the M₂ equations which do not already occur in the M₁ equations. Thus, in deriving the SML computational formulas, we will find it useful to subdivide the stochastic equations into two subsystems. The complete system may be written as:

(VI.3)   α_I1 Z' + U₁' = 0' ,   with α_I1: M₁×(G+A), U₁': M₁×T

(VI.4)   α_I2 Z' + U₂' = 0' ,   with α_I2: M₂×(G+A), U₂': M₂×T

(VI.5)   α_II Z' = 0' ,   with α_II: (G−M)×(G+A)

In SML estimation, only the structure of the first M₁ stochastic equations, given by (VI.3) (subsystem I1), is specified. Also, the predetermined variables in the entire system (including those which are in subsystem I2--the remaining M₂ stochastic equations--and subsystem II--the G − M identity equations) are specified (insofar as the researcher is able to specify additional predetermined variables in subsystems I2 and II).
In the derivation of the concentrated likelihood function which follows, we will see that the same likelihood function is obtained whether some or all of the identity equations are used to solve out jointly dependent variables in subsystem I1 or I2, or whether the identity equations are merely ignored (except that additional predetermined variables which occur in the identity equations are included in the set of predetermined variables recognized as being in the system in applying the computational procedure).

Let G₁ be the number of jointly dependent variables occurring in subsystem I1. Then (as will be shown further on), if G₁ equals M₁ and the rank of the matrix of predetermined variables in the entire system is less than T − M₁ + 1, the SML coefficients will coincide with the FIML coefficients obtained by applying the FIML computational procedure to the M₁ equations only.¹ This holds whether the M₁ equations constitute the entire system or whether they form a subsystem of the entire system, with additional predetermined variables in subsystems I2 and II occurring only with zero coefficients in the equations of subsystem I1. On the other hand, if subsystem I1 consists of only a single stochastic equation (i.e., the structure of only a single equation is specified), the resulting SML coefficients for the equation coincide with the usual LIML coefficients for the equation.

¹rk X < T − M₁ + 1 is necessary for computation of SML coefficients, as is shown in section VI.D; however, this condition is not a requirement for FIML computation, since the X matrix is not used in the adjustment of jointly dependent variables in FIML estimation.
Thus (provided rk X < T − M₁ + 1), both FIML and LIML estimation may be considered to be particular cases of SML estimation.¹ However, since the SML computational procedure is more cumbersome than the FIML computational procedure, it is more fruitful to calculate problems in which the number of jointly dependent variables in the system or subsystem to be estimated equals the number of equations by the FIML computational procedure rather than the SML computational procedure. Similarly, if only a single stochastic equation is specified, it is more fruitful to use the much simpler LIML computational procedure than the SML computational procedure. The SML procedure may be applied, however, to the many cases in which the structure of more than one stochastic equation is specified but the entire system is not specified.

The SML method is sometimes referred to as the least generalized variance ratio (LGVR) method, since the coefficients obtained minimize the ratio of two generalized variances.² This is not as powerful a property as the LGV property of FIML, even though the two properties coincide if the number of jointly dependent variables equals the number of equations.

¹Provided the matrix of instruments in the LIML estimation is taken to be X (the matrix of predetermined variables for the system), or provided the matrix of instruments is used in place of X in the SML calculations. A proof of the equivalence of LIML and SML in the case of only a single equation occurring in the subsystem being estimated is contained in Koopmans and Hood [1953], pp. 166-173.

²See Koopmans and Hood [1953], pp. 170-171.

1. Derivation of the likelihood function to be maximized

The Likelihood To Be Maximized. Before indicating SML computational formulas, the function to be maximized by the SML procedure will be indicated. As a step in the derivation of the likelihood function, we will use the G − M identity equations to temporarily eliminate G − M jointly dependent variables from the system.
(The eliminated variables will be reentered into the system at a later step.) Suppose that the jointly dependent variables are divided into two groups, Y = [Y₁ : Y₂], where Y₂ contains the G − M jointly dependent variables to be temporarily eliminated from the system and Y₁ contains the M jointly dependent variables which will remain. To reflect our subdivisions we may rewrite (VI.3) through (VI.5) as:¹

(VI.6) Γ₁₁,₁Y₁' + Γ₁₁,₂Y₂' + B₁₁X' + U₁' = 0

(VI.7) Γ₁₂,₁Y₁' + Γ₁₂,₂Y₂' + B₁₂X' + U₂' = 0

(VI.8) Γ₂₁Y₁' + Γ₂₂Y₂' + B_I X' = 0

¹The derivation of the likelihood function to be maximized follows Koopmans and Hood [1953] and relies on that reference for some of the details of the derivation. The derivation given in this section differs from the one given in Koopmans and Hood primarily in that identity equations are explicitly treated in the derivation and it is shown that the same likelihood function is obtained as if the identity equations had been ignored (except for using the predetermined variables from the identity equations in the computational method). Professor Herman Rubin informed the writer that the same result is obtained whether some or all of the identity equations are used to eliminate jointly dependent variables from the stochastic equations whose structure is specified; however, the derivation of the likelihood function in a manner which shows that this is the case (given in this paper) was developed by the writer.

Up to this point the matrix of coefficients has been implicitly or explicitly subdivided in the following ways:

(VI.9) α = [Γ : B] = [α₁₁ ; α₁₂ ; α_I]

where α is G×(G+A), Γ is G×G, B is G×A, α₁₁ is M₁×(G+A), α₁₂ is M₂×(G+A), and α_I is (G−M)×(G+A);

 = [Γ₁₁ B₁₁ ; Γ₁₂ B₁₂ ; Γ_I B_I]

with row blocks of M₁, M₂, and G−M rows and column blocks of G and A columns;

 = [Γ₁₁,₁ Γ₁₁,₂ B₁₁ ; Γ₁₂,₁ Γ₁₂,₂ B₁₂ ; Γ₂₁ Γ₂₂ B_I]

with column blocks of M, G−M, and A columns. After the identity equations (VI.8) are used to eliminate Y₂, the equations of subsystem 11 may be written in terms of Z₁ = [Y₁ : X] as:

(VI.12) α*₁₁Z₁' = −U₁'

where α*₁₁ is M₁×(M+A), Z₁' is (M+A)×T, and U₁' is M₁×T.
(VI.13) α*₁₂Z₁' = −U₂'

where α*₁₂ is M₂×(M+A), Z₁' is (M+A)×T, and U₂' is M₂×T, and where

Γ*₁₁ = Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁
Γ*₁₂ = Γ₁₂,₁ − Γ₁₂,₂Γ₂₂⁻¹Γ₂₁
B*₁₁ = B₁₁ − Γ₁₁,₂Γ₂₂⁻¹B_I
B*₁₂ = B₁₂ − Γ₁₂,₂Γ₂₂⁻¹B_I

and

α* = [Γ* : B*] = [α*₁₁ ; α*₁₂] = [Γ*₁₁ B*₁₁ ; Γ*₁₂ B*₁₂]

In the derivation of the FIML likelihood function, we assumed that U has the multivariate normal density function given by (V.12) and derived the following intermediate likelihood function (V.13):

(VI.14) f₂(Z₁, α*, Σ) = −(MT/2) log 2π − (T/2) log[det Σ] + T log[abs(det Γ*)] − (1/2) Σ_{t=1}^{T} z₁[t] α*' Σ⁻¹ α* z₁[t]'

Subdividing Σ to represent the subdivision of the stochastic equations into two groups, we define:

(VI.15) Σ = [Σ₁₁,₁₁ Σ₁₁,₁₂ ; Σ₁₂,₁₁ Σ₁₂,₁₂]

where Σ is M×M, Σ₁₁,₁₁ is M₁×M₁, Σ₁₁,₁₂ is M₁×M₂, Σ₁₂,₁₁ is M₂×M₁, and Σ₁₂,₁₂ is M₂×M₂; Σ₁₁,₁₁ consists of the disturbance variance-covariance matrix of the equations in subsystem 11 (whose structure has been specified), Σ₁₁,₁₂ and Σ₁₂,₁₁ (= Σ₁₁,₁₂') consist of disturbance covariances between the equations of subsystem 11 and subsystem 12, and Σ₁₂,₁₂ consists of the disturbance variance-covariance matrix of the equations of subsystem 12.

Let us maximize f₂(Z₁, α*, Σ) first with respect to α*₁₂, Σ₁₁,₁₂, Σ₁₂,₁₁, and Σ₁₂,₁₂, taking no account of restrictions imposed on these matrices by the model; in particular, taking no account of the structure of α₁₂, e.g., ignoring the restrictions which would be imposed if account were taken that certain elements of α₁₂ are zero, other elements are −1, and, since α*₁₂ = α₁₂ − Γ₁₂,₂Γ₂₂⁻¹[Γ₂₁ : B_I] (α₁₂ here understood with the columns for Y₂ deleted), these restrictions imply restrictions on α*₁₂. The following logarithmic concentrated likelihood function is obtained:¹
(VI.16) g₂(Z₁, α*₁₁, Σ₁₁,₁₁) = c₃ + (T/2) log[det((1/T)Γ*₁₁[Y₁'Y₁]⊥X Γ*₁₁')] − (T/2) log[det Σ₁₁,₁₁] − (1/2) Σ_{t=1}^{T} z₁[t] α*₁₁' Σ₁₁,₁₁⁻¹ α*₁₁ z₁[t]'

where [Y₁'Y₁]⊥X is the moment matrix of the part of each jointly dependent variable orthogonal to the predetermined variables in the entire system (including predetermined variables in the identity equations and the part of the structure not specified). The [Y₁'Y₁]⊥X matrix is the same as the [+Y'+Y]⊥X matrix of the double k-class estimators except that, instead of one row and column for each jointly dependent variable in equation μ, [Y₁'Y₁]⊥X contains one row and column for each jointly dependent variable not temporarily eliminated from the system, including the normalizing variables. (Also, we are using X as the matrix of instruments, X_I.)

¹Koopmans and Hood [1953], pp. 192-195.

The usually quoted formula for calculating [Y₁'Y₁]⊥X is:²

(VI.17) [Y₁'Y₁]⊥X = Y₁'Y₁ − Y₁'X(X'X)⁻¹X'Y₁ = Y₁'(I − X(X'X)⁻¹X')Y₁;

however, there is considerable advantage to using direct orthogonalization to calculate [Y₁'Y₁]⊥X in the same manner as in calculating [+Y'+Y]⊥X rather than by using the above formula. (For one thing, X need not have full column rank when direct orthogonalization is used.)

If g₂(Z₁, α̂*₁₁, Σ̂₁₁,₁₁) is maximized with respect to Σ₁₁,₁₁ (thereby concentrating the function onto α̂*₁₁ and Z₁), the following relation is obtained:³

(VI.18) Σ̂₁₁,₁₁ = (1/T) α̂*₁₁ [Z₁'Z₁] α̂*₁₁'

Substituting for Σ̂₁₁,₁₁ into (VI.16) and dividing by T/2, we get the function:⁴

(VI.19) g₃(α̂*₁₁, Z₁) = c₄ + log[det((1/T)Γ̂*₁₁[Y₁'Y₁]⊥X Γ̂*₁₁')] − log[det((1/T)α̂*₁₁[Z₁'Z₁]α̂*₁₁')]

c₄ is a constant.

²Except, possibly, for a factor of T, [Y₁'Y₁]⊥X is the W matrix given in Koopmans and Hood [1953] and Chernoff and Divinsky [1953].

³Hood and Koopmans [1953], p. 166.

⁴Hood and Koopmans [1953], p. 195.
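The computational point just made, that direct orthogonalization avoids both the explicit inverse in (VI.17) and the full-column-rank requirement on X, can be illustrated with a short sketch. The code below is illustrative only (the dissertation's routines were written for a CDC 6500, not shown here), and it uses a least-squares residual computation in place of a literal orthogonalization pass; all function and variable names are assumptions of this sketch, not the original program's.

```python
import numpy as np

def moment_perp(Y, X):
    """[Y'Y] orthogonal to X via direct residual computation:
    regress each column of Y on X (a rank-deficient X is allowed,
    since lstsq effectively drops dependent columns) and form the
    moment matrix of the residuals."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ coef          # part of Y orthogonal to the column space of X
    return R.T @ R

rng = np.random.default_rng(0)
T, A, G = 40, 5, 3
X = rng.standard_normal((T, A))
Y = rng.standard_normal((T, G))

# Textbook formula (VI.17): requires X'X to be invertible
W_formula = Y.T @ Y - Y.T @ X @ np.linalg.inv(X.T @ X) @ X.T @ Y
W_direct = moment_perp(Y, X)
assert np.allclose(W_formula, W_direct)

# Rank-deficient X: (VI.17) breaks down, the residual route does not,
# and the answer depends only on the space spanned by X
X_dup = np.hstack([X, X[:, :1]])          # duplicated column: rank 5, not 6
assert np.allclose(moment_perp(Y, X_dup), W_direct)
```

Because only the column space of X matters, multicollinearity among the predetermined variables leaves the result unchanged, which is the property claimed for the direct-orthogonalization approach.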
However,¹ ²

(VI.20) det((1/T)Γ̂*₁₁[Y₁'Y₁]⊥X Γ̂*₁₁') = det((1/T)Γ̂₁₁[Y'Y]⊥X Γ̂₁₁') = det((1/T)Γ̂_A[Y_A'Y_A]⊥X Γ̂_A')

where:

[Y'Y]⊥X is the G×G moment matrix of the part of the G jointly dependent variables in the system orthogonal to the A predetermined variables in the entire system. ([Y'Y]⊥X is the same as [Y₁'Y₁]⊥X but expanded to include all jointly dependent variables in the system.)

[Y_A'Y_A]⊥X is the G₁×G₁ moment matrix of the part of the G₁ jointly dependent variables in subsystem 11 only (the M₁ stochastic equations for which the structure is specified) orthogonal to the A predetermined variables in the entire system.

Γ_A is the M₁×G₁ matrix of coefficients of the G₁ jointly dependent variables which occur in the M₁ stochastic equations for which the structure is specified.

¹c₄ = (2/T)(c₃ − (1/2) Σ_{t=1}^{T} z₁[t] α̂*₁₁' Σ̂₁₁,₁₁⁻¹ α̂*₁₁ z₁[t]')
 = (2/T)(c₃ − (1/2) tr{Σ̂₁₁,₁₁⁻¹ α̂*₁₁[Z₁'Z₁]α̂*₁₁'})
 = (2/T)(c₃ − (1/2) tr{T·I_{M₁}}) = (2/T)c₃ − M₁

where tr denotes trace and we have used the relationship tr(AB) = tr(BA) for any matrices A and B provided AB and BA are defined (i.e., provided the number of rows of A equals the number of columns of B and the number of rows of B equals the number of columns of A).

²The proofs of (VI.20) and (VI.21) follow (VI.21).

If the variables in the Y matrix were rearranged to Y = [Y_A : Y_B], where Y_A contains the G₁ jointly dependent variables which are included in the M₁ equations for which the structure is specified and Y_B contains the remaining G − G₁ jointly dependent variables in the system, and if the columns of Γ₁₁ were rearranged into the same order, then

Γ₁₁ = [Γ_A : 0]

where Γ₁₁ is M₁×G, Γ_A is M₁×G₁, and 0 is M₁×(G−G₁). Also,

(VI.21) det((1/T)α̂*₁₁[Z₁'Z₁]α̂*₁₁') = det((1/T)α̂₁₁[Z'Z]α̂₁₁') = det((1/T)α̂_A[Z_A'Z_A]α̂_A')

where:

Z_A is the T×(G₁+A₁) matrix of variables in the M₁ stochastic equations for which the structure is specified.
(Z_A is the Z matrix with all variables which do not occur in the stochastic equations with structure specified deleted.)

α_A is the M₁×(G₁+A₁) matrix of coefficients of the G₁ + A₁ variables which occur in the M₁ stochastic equations for which the structure is specified. If variables were rearranged as required:

α₁₁ = [α_A : 0]

where α₁₁ is M₁×(G+A), α_A is M₁×(G₁+A₁), and 0 is M₁×[(G−G₁)+(A−A₁)], with

α_A = [Γ_A : B_A]

where Γ_A is M₁×G₁ and B_A is M₁×A₁.

Showing (VI.20)

From (VI.8) we obtain:

(VI.22) Y₂' = −Γ₂₂⁻¹Γ₂₁Y₁' − Γ₂₂⁻¹B_I X';

hence,

(VI.23) [Y]⊥X = [Y₁ : Y₂]⊥X = [Y₁ : −Y₁Γ₂₁'(Γ₂₂⁻¹)' − XB_I'(Γ₂₂⁻¹)']⊥X

However, by (1.59), [−Y₁Γ₂₁'(Γ₂₂⁻¹)']⊥X = −[Y₁]⊥X Γ₂₁'(Γ₂₂⁻¹)', and by (1.59) and (1.42), [−XB_I'(Γ₂₂⁻¹)']⊥X = −[X]⊥X B_I'(Γ₂₂⁻¹)' = −0·B_I'(Γ₂₂⁻¹)' = 0. Hence,

(VI.24) [Y]⊥X = [[Y₁]⊥X : −[Y₁]⊥X Γ₂₁'(Γ₂₂⁻¹)'],

and [Y'Y]⊥X becomes:

(VI.25) [Y'Y]⊥X = [Y]⊥X'[Y]⊥X =
[ [Y₁'Y₁]⊥X , −[Y₁'Y₁]⊥X Γ₂₁'(Γ₂₂⁻¹)' ;
 −Γ₂₂⁻¹Γ₂₁[Y₁'Y₁]⊥X , Γ₂₂⁻¹Γ₂₁[Y₁'Y₁]⊥X Γ₂₁'(Γ₂₂⁻¹)' ]

If Γ₁₁ is correspondingly subdivided in the manner of (VI.9), we have:

(VI.26) det{Γ₁₁[Y'Y]⊥X Γ₁₁'}
 = det{[Γ₁₁,₁ : Γ₁₁,₂] [Y'Y]⊥X [Γ₁₁,₁ : Γ₁₁,₂]'}
 = det{Γ₁₁,₁[Y₁'Y₁]⊥XΓ₁₁,₁' − Γ₁₁,₁[Y₁'Y₁]⊥XΓ₂₁'(Γ₂₂⁻¹)'Γ₁₁,₂' − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁[Y₁'Y₁]⊥XΓ₁₁,₁' + Γ₁₁,₂Γ₂₂⁻¹Γ₂₁[Y₁'Y₁]⊥XΓ₂₁'(Γ₂₂⁻¹)'Γ₁₁,₂'}
 = det{[Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁][Y₁'Y₁]⊥XΓ₁₁,₁' − [Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁][Y₁'Y₁]⊥XΓ₂₁'(Γ₂₂⁻¹)'Γ₁₁,₂'}
 = det{[Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁][Y₁'Y₁]⊥X[Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁]'}
 = det{Γ*₁₁[Y₁'Y₁]⊥XΓ*₁₁'}

The second equality of (VI.20) involves a mere rewriting to eliminate from [Y'Y]⊥X variables for which no corresponding jointly dependent variable occurs in the set of stochastic equations whose structure is specified.
If the variables are rearranged in the manner noted in the definition of Y_A above, we have:

(VI.27) Γ₁₁[Y'Y]⊥X Γ₁₁' = [Γ_A : 0] [ [Y_A'Y_A]⊥X , [Y_A'Y_B]⊥X ; [Y_B'Y_A]⊥X , [Y_B'Y_B]⊥X ] [Γ_A : 0]' = Γ_A[Y_A'Y_A]⊥XΓ_A'

Showing (VI.21)

By (VI.3) and (VI.12), α₁₁Z' = −U₁' = α*₁₁Z₁'; hence,

α₁₁[Z'Z]α₁₁' = α*₁₁[Z₁'Z₁]α*₁₁'

The second equality of (VI.21) involves a mere rewriting to eliminate from Z variables which do not occur in the stochastic equations whose structure is specified. If the variables are rearranged in the manner noted in the definition of α_A, we have:

(VI.28) α₁₁[Z'Z]α₁₁' = [α_A : 0] [ Z_A'Z_A , Z_A'Z_B ; Z_B'Z_A , Z_B'Z_B ] [α_A : 0]' = α_A[Z_A'Z_A]α_A'

Continuing the Derivation of the SML Likelihood Function

Substituting (VI.20) and (VI.21) into (VI.19) we have:

(VI.29) g₄(α̂_A, Z) = c₄ + log[det((1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A')] − log[det((1/T)α̂_A[Z_A'Z_A]α̂_A')]

which is the concentrated likelihood function which we will maximize in the calculation of SML estimates. Note that Γ_A and α_A are the coefficient matrices of the M₁ stochastic equations before using the identity equations to temporarily eliminate the G − M jointly dependent variables. Thus, the elimination of the G − M jointly dependent variables is convenient only for the derivation of the concentrated likelihood function and is unnecessary for the expression of the concentrated likelihood function or the computation of the SML coefficients.
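The "mere rewriting" steps in (VI.20), (VI.21), (VI.27), and (VI.28), in which variables occurring only with zero coefficients are deleted, can be checked numerically. A minimal sketch follows; it assumes nothing beyond the zero-padding structure Γ₁₁ = [Γ_A : 0], and the dimensions and numbers are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
M1, G1, G = 3, 4, 7                       # M1 equations; G1 of the G variables occur

F_A = rng.standard_normal((M1, G1))       # coefficients of the variables that occur
Gamma11 = np.hstack([F_A, np.zeros((M1, G - G1))])   # Gamma11 = [Gamma_A : 0]

Ytmp = rng.standard_normal((20, G))
W = Ytmp.T @ Ytmp                         # stands in for the G x G moment matrix [Y'Y] perp X

# Padding with zero columns leaves the quadratic form, and hence its
# determinant, unchanged: [F 0] W [F 0]' = F W_AA F'
lhs = np.linalg.det(Gamma11 @ W @ Gamma11.T)
rhs = np.linalg.det(F_A @ W[:G1, :G1] @ F_A.T)
assert np.isclose(lhs, rhs)
```

The same argument applies to (VI.21) and (VI.28) with α₁₁ = [α_A : 0] and the Z'Z moment matrix in place of W.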
The predetermined variables in the entire system (including those in the part of the system which was not specified and in the identity equations) are used in the calculation of the [Y'Y]⊥X matrix.¹

¹Thus, Chernoff and Divinsky [1953] would have obtained the same resulting coefficients and considerably simplified their computation of SML coefficients for Klein's model (b) (the same model as Klein's model I except that G and R are classified as jointly dependent instead of predetermined) if they had used the stochastic equations which were specified directly instead of using the identity equations to eliminate jointly dependent variables before commencing their iteration procedure.

The matrices (1/T)α̂_A[Z_A'Z_A]α̂_A' and (1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A' are used repeatedly in the elaboration of the SML computational procedure which follows; hence, it will prove convenient to denote these matrices as simply S and T, i.e.,

(VI.30) S = Σ̂₁₁,₁₁ = (1/T)α̂_A[Z_A'Z_A]α̂_A'

and

(VI.31) T = (1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A'

(The matrix T should not be confused with the number of observations, T.) Let a = [a₁ ; … ; a_M₁] be the vector of unrestricted estimated coefficients of α̂_A (a is formed from α̂_A in the same way that the a vector was formed from α̂ for FIML estimation [(V.20) and (V.22)]). Then S¹

¹The preceding derivation showing that it matters not whether the identity equations are ignored in the computational procedure or used to eliminate jointly dependent variables from the subsystem being estimated assumes that the same X matrix is used in either case; that is, the X matrix is taken as the predetermined variables in the system before using the identity equations to eliminate jointly dependent variables.
If instead of this approach the identity equations are used to eliminate jointly dependent variables from the subsystem being estimated, and then the matrix of predetermined variables in the system is constructed as the predetermined variables in the newly modified subsystem being estimated plus the predetermined variables in the remaining stochastic equations, the X matrix constructed in this manner (we will call this matrix the "new" X matrix) will in general not coincide with the original X matrix, since some predetermined variables are likely to have been linearly combined with jointly dependent variables and the combined variables labeled jointly dependent. Since the space spanned by the new X matrix is likely to be smaller than the space spanned by the original X matrix, [Y_A'Y_A]⊥X will in general be changed and, therefore, the coefficients obtained will, in general, be different.

and T may also be defined as S = [s_μμ'] and T = [t_μμ'] with (see V.21):

(VI.31a) s_μμ' = (1/T) â'_+μ[Z'_+μZ_+μ']â_+μ' = (1/T) û'_μû_μ'

(VI.31b) t_μμ' = (1/T) â'_+μ[Z'_+μZ_+μ']⊥X â_+μ'

We choose the unrestricted coefficients of α̂_A such that g₄(α̂_A, Z) is a maximum; however, since Z is fixed for any given sample, for any given structure the only elements of α̂_A which are allowed to vary are the elements of the vector a. Thus, for an assumed structure and a given sample, g₄(α̂_A, Z) may be considered a function of the vector a only, i.e.,

(VI.32) g(a) = g₄(α̂_A, Z) = c₄ + log(det T) − log(det S)

Another function which will be maximized when g(a) is maximized is the function

(VI.33) g*(a) = det T / det S

If the number of jointly dependent variables in the equations being estimated equals the number of equations being estimated, then Γ̂_A is square and det(Γ̂_A[Y_A'Y_A]⊥XΓ̂_A') = det Γ̂_A · det[Y_A'Y_A]⊥X · det Γ̂_A' = (det Γ̂_A)² det[Y_A'Y_A]⊥X. Hence,

log[det T] = log[det((1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A')] = log[det((1/T)[Y_A'Y_A]⊥X)] + log[(det Γ̂_A)²].
Substituting into (VI.32) we get

g(a) = c₅ + log[(det Γ̂_A)²] − log[det S]   (c₅ = c₄ + log[det((1/T)[Y_A'Y_A]⊥X)]),

the form of the logarithmic likelihood function for FIML [see (V.17)].¹ Thus, for this case, maximization of g(a) using the SML procedure will result in the same a as maximization of the same equations using the FIML procedure.²

¹This manipulation assumes that det T ≠ 0. If rk X ≥ T − G₁ + 1 (where G₁ is the number of jointly dependent variables in the equations being estimated), then det T = 0; hence, log[det T] = log[0], which is not defined. This case is discussed in more detail in section VI.D. rk X ≥ T − G₁ + 1 causes no difficulty in FIML estimation. Klein and Nakamura [1962], pp. 295-297, were evidently considering the calculation of FIML estimates as a particular case of SML in their comments on the effect of multicollinearity on FIML estimation relative to the effect of multicollinearity on other estimation procedures. This way of regarding FIML estimation may be undesirable both computationally and conceptually. It may be undesirable computationally, since the SML computations are more severe and artificially impose problems when rk X is large relative to the number of observations. It may be undesirable conceptually, since the FIML estimator has properties not possessed by the SML estimator (except as the two coincide); e.g., the FIML least generalized variance property (section V.A) is more powerful than the SML least ratio of two generalized variances property (section VI.A). Their conclusion that FIML is more sensitive to multicollinearity among the predetermined variables in the system than 2SLS and LIML does not follow from their arguments. They have given very good arguments as to why one would expect SML to be more sensitive to multicollinearity among the predetermined variables in the system than 2SLS or LIML.

²This is noted in Koopmans and Hood [1953], footnote 73, p. 165.

2.
Computational formulas

The same computational procedure as was used for FIML is used for SML. Only the actual vector of partial derivatives and matrix of second partial derivatives differ from the FIML vector of partial derivatives and matrix of second partial derivatives. Also, g*(a) is used in place of f*(a) in determining the step size.

The vector of starting estimates, a⁽⁰⁾, is arbitrary, as in FIML. The starting estimates may be derived from single equation techniques such as DLS, 2SLS, and LIML, from multiple equation techniques such as 3SLS or I3SLS, or in some other manner. The coefficients for iteration i are calculated from the coefficients for iteration (i−1) by (V.48), i.e.,

(VI.34) a⁽ⁱ⁾ = a⁽ⁱ⁻¹⁾ + λ⁽ⁱ⁾d⁽ⁱ⁾

where d⁽ⁱ⁾ = [ℋ⁽ⁱ⁻¹⁾]⁻¹ l⁽ⁱ⁻¹⁾ and λ⁽ⁱ⁾ is the step size for the iteration. In what follows, we will omit the superscript giving the iteration number from the a⁽ⁱ⁻¹⁾, d⁽ⁱ⁾, and λ⁽ⁱ⁾ vectors and the ℋ⁽ⁱ⁻¹⁾ matrix.

In SML estimation, the μμ'th block of the ℋ matrix is:

(VI.35) ℋ_μμ' = −(1/2) ∂²g(a)/∂a_μ∂a'_μ' = s^μμ' Z'_μZ_μ' − (1/T) Z'_μÛ₁F_μμ'Û₁'Z_μ' − t^μμ' [Z'_μZ_μ']⊥X + (1/T) [Z'_μÛ₁]⊥X G_μμ' [Û₁'Z_μ']⊥X

where Û₁ is the T×M₁ matrix of residuals, F_μμ' is the matrix formed from the elements of S⁻¹ as in the corresponding FIML formula, and G_μμ' is the analogous matrix formed from the elements of T⁻¹; and the μth block of the right hand side vector is:¹

(VI.36) l_μ = (1/2) ∂g(a)/∂a_μ = Σ_{μ'=1}^{M₁} s^μμ' Z'_μû_μ' − Σ_{μ'=1}^{M₁} t^μμ' [Z'_μû_μ']⊥X

where:

s^μμ' is the element of the μth row and μ'th column of S⁻¹ [see (VI.30) and (VI.31a)].

t^μμ' is the element of the μth row and μ'th column of T⁻¹ [see (VI.31) and (VI.31b)].

¹(VI.35) and (VI.36) may be derived by noting that the term arising from log(det S) is the same for FIML and SML: Σ_{μ'=1}^{M₁} s^μμ' Z'_μû_μ', the first term of (VI.36). The negative of the partial derivative of the first term of (VI.36) with respect to a_μ' gives s^μμ' Z'_μZ_μ' − (1/T) Z'_μÛ₁F_μμ'Û₁'Z_μ', the first two terms of (VI.35). To derive the second term of (VI.36) and the last two terms of (VI.35), note that
T = (1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A' = (1/T)α̂_A([Y_A]⊥X : 0)'([Y_A]⊥X : 0)α̂_A' = (1/T)α̂_A[Z_A'Z_A]⊥Xα̂_A'; hence, the same partial derivatives as for log(det S) (with S = (1/T)α̂_A[Z_A'Z_A]α̂_A') are obtained except that [Z]⊥X is substituted for Z in each of the terms. Except for the lack of use of direct orthogonalization, essentially the same basic formulas are given in Chernoff and Divinsky [1953], pp. 261-263.

The μth block of the right hand side vector may also be written as

l_μ = Z'_μÛ₁s^μ − [Z'_μÛ₁]⊥X t^μ

where Û₁ = [û₁ : … : û_M₁] is the T×M₁ matrix of residuals, û_μ = Z_+μâ_+μ, s^μ = [s^1μ … s^M₁μ]' is the μth column of S⁻¹, and t^μ = [t^1μ … t^M₁μ]' is the μth column of T⁻¹.

A "degrees of freedom" adjustment can be made in the estimated SML disturbance variance-covariance and the estimated SML coefficient variance-covariance matrices in the same manner as for FIML. These matrices can also be normalized in the manner suggested for FIML. (See sections V.D and V.E.) â_SML and ℋ_SML are, of course, used in place of â_FIML and ℋ_FIML in the calculation of these matrices.

B. Arbitrary Linear Restrictions Imposed on Coefficients

The iterative procedure given in the previous section is designed to maximize g(a) with respect to the vector of coefficients, a. In this section the problem is to:

(VI.37) max_a g(a)

subject to:

(VI.38) Ra = r

where R is NR×n, a is n×1, and r is NR×1, and where the R matrix and the r vector are as defined for restricting a in FIML estimation. The Q and Q₂ matrices and the q and q₂ vectors are calculated from the R matrix and the r vector in the same manner as for FIML, and the computation gives a means of separating out a vector of n − rk R "unrestricted" coefficients, a(1), and a vector of rk R coefficients, a(2), which may be calculated from the a(1) vector. The computational formulas for FIML, (V.67) through (V.70), are applicable for SML as well, except that the ℋ
matrix and l vector for SML are used in place of the ℋ matrix and l vector for FIML.

C. Using Instrumental Variables in SML Estimation

In deriving the logarithmic likelihood function and in the presentation of formulas for the SML estimator, the jointly dependent variables in the equations being estimated were adjusted by all of the predetermined variables in the system; i.e., the matrix X was used in the calculation of [Y_A'Y_A]⊥X. A more general approach would be to adjust the jointly dependent variables in the equations being estimated by a matrix of instruments, X_I, where X_I contains all of the predetermined variables in the stochastic equations being estimated plus a set of additional instrumental variables.¹ Thus, the matrix [Y_A'Y_A]⊥X_I could be substituted for the matrix [Y_A'Y_A]⊥X in the above likelihood formulas and, therefore, in the SML coefficient formulas derived from the likelihood formulas. If such a substitution is made, the resulting SML coefficients will not be the same as the coefficients obtained through use of the [Y_A'Y_A]⊥X matrix and cannot be described as the coefficients which maximize the likelihood function given the structure of the equations estimated and the predetermined variables in the system.

¹Selection of instrumental variables is discussed in section II.G.

D. SML Estimation when rk X ≥ T − M₁ + 1

If rk X ≥ T − M₁ + 1 (where M₁ is the number of equations in the system being estimated), then the T matrix will be singular; hence, the formulas given previously cannot be used to compute SML estimates.¹ In some cases, a matrix of instrumental variables, X_I, could be substituted for the X matrix as noted in section VI.C, with the number of instruments restricted so that rk X_I < T − M₁ + 1; however, the properties of the SML coefficients obtained from such a substitution have not been examined.
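The linear restrictions Ra = r of section B above are handled by separating out n − rk R free coefficients, counting the independent restrictions, and detecting inconsistency. The sketch below is not the dissertation's Q and Q₂ construction of (V.67) through (V.70); it shows the same separation using a singular value decomposition, and every name in it is assumed for illustration:

```python
import numpy as np

def restrict(R, r, tol=1e-10):
    """Parameterize {a : R a = r} as a = a0 + N b with b free.
    Redundant rows of R are harmless: the number of independent
    restrictions falls out as rank(R). Inconsistent restrictions
    are detected by comparing ranks of R and the augmented matrix."""
    R = np.atleast_2d(R)
    rank_R = np.linalg.matrix_rank(R, tol)
    rank_aug = np.linalg.matrix_rank(np.hstack([R, r.reshape(-1, 1)]), tol)
    if rank_aug > rank_R:
        raise ValueError("inconsistent restrictions")
    a0, *_ = np.linalg.lstsq(R, r, rcond=None)   # a particular solution
    _, s, Vt = np.linalg.svd(R)
    N = Vt[rank_R:].T                            # basis of the null space of R
    return a0, N, rank_R

# Three rows, but the third is the sum of the first two:
# only two independent restrictions on four coefficients
R = np.array([[1., 1., 0., 0.],
              [0., 0., 1., -1.],
              [1., 1., 1., -1.]])
r = np.array([1., 0., 1.])
a0, N, k = restrict(R, r)
assert k == 2 and N.shape == (4, 2)
assert np.allclose(R @ (a0 + N @ np.ones(2)), r)   # any choice of b satisfies R a = r
```

Maximization of g(a) can then proceed over the reduced vector b alone, which is the role played by the n − rk R "unrestricted" coefficients a(1) in the text.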
There is especially a question as to whether or how much the statistical efficiency in the estimation of the SML coefficients is decreased if the X space is restricted.²

¹det T = det{(1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A'} = (1/T^M₁) det{[Y_AΓ̂_A']'⊥X[Y_AΓ̂_A']⊥X} [the last equality comes from (1.80)]; hence the T×M₁ matrix [Y_AΓ̂_A']⊥X must have full column rank if det T is not to be zero. But the space orthogonal to X has rank T − rk X; hence, if [Y_AΓ̂_A']⊥X is to have full column rank, T − rk X must be greater than or equal to M₁; i.e., rk X ≤ T − M₁. Thus, det T = 0 if rk X ≥ T − M₁ + 1.

²If the number of jointly dependent variables in the subsystem being estimated equals the number of equations being estimated, restriction of the X space will lead to the same estimates as would have been obtained if FIML had been applied only to that subsystem (see section VI.A); however, in this case it is more efficient (with respect to computer time and capacity requirements) to use the FIML computational method directly. (rk X ≥ T − M₁ + 1 presents no difficulty in FIML estimation, since the jointly dependent variables are not adjusted in the FIML estimation procedure. Instead, the coefficients of the jointly dependent variables are taken explicitly into account.)

E. Iterative Limited Information Single Equation Maximum Likelihood (ILIML)

In the case of only a single jointly dependent variable per equation, Professor Lester Telser proposed an iterative DLS estimation procedure which estimates the coefficients of one equation at a time based on the predetermined variables in that equation plus the residuals from each of the remaining equations.¹ The new coefficients for an equation are used to estimate new residuals for the equation which are, in turn, used to estimate new coefficients for the remaining equations.
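The Telser iteration just described, for the one-jointly-dependent-variable case, can be sketched as follows. The data, the two-equation layout, and all names are assumed purely for illustration (DLS is ordinary least squares here), and a fixed iteration count stands in for a convergence test:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
X = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])
# Two equations, one jointly dependent variable each, with correlated
# disturbances (which is what makes the iteration worthwhile)
U = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=T)
y1 = X @ np.array([1.0, 2.0, 0.0]) + U[:, 0]   # eq. 1 uses X columns 0, 1
y2 = X @ np.array([0.5, 0.0, -1.0]) + U[:, 1]  # eq. 2 uses X columns 0, 2
eqs = [(y1, X[:, [0, 1]]), (y2, X[:, [0, 2]])]

def ols(y, Z):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b

resid = [y - Z @ ols(y, Z) for y, Z in eqs]    # DLS starting residuals
for it in range(25):
    for m, (y, Z) in enumerate(eqs):
        # other equations' residuals enter as additional "predetermined" variables
        others = np.column_stack([resid[j] for j in range(len(eqs)) if j != m])
        b = ols(y, np.column_stack([Z, others]))
        # new residuals ignore the coefficients on the appended residuals
        resid[m] = y - Z @ b[:Z.shape[1]]

coef1 = ols(eqs[0][0], np.column_stack([eqs[0][1], resid[1]]))[:2]
assert coef1.shape == (2,) and np.all(np.isfinite(coef1))
```

Replacing the per-equation OLS fit with a LIML fit (residuals appended to both the predetermined set and the instrument set) gives the ILIML generalization developed in the remainder of this section.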
In chapter VII of this paper it is shown that Telser's iterative procedure leads to FIML estimates in the special case of a single jointly dependent variable per equation. In this section, we will demonstrate that the same procedure, but using LIML instead of DLS, leads to FIML estimates in the general case (multiple jointly dependent variables permitted in each equation).² In the case of an incomplete system, the method leads to SML estimates.

¹Telser [1964], pp. 845-862.

²The method (ILIML) given in this section was proposed to the writer by Professor Herman Rubin in June, 1963. The key steps of the proof of the increase in the likelihood function at each iteration were indicated to the writer by Professor Rubin at that time.

Derivation of the ILIML Method

As noted in (VI.33), the function

g*(a) = det T / det S = det{(1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A'} / det{(1/T)α̂_A[Z_A'Z_A]α̂_A'}

is maximized by the same coefficients that maximize the likelihood function. Suppose that we partition the matrix of coefficients such that the coefficients for one equation are distinguished from the coefficients for the remaining equations. Without loss of generality we can assume that we are distinguishing the first equation from the remaining equations whose structure is specified, since the equations can be rearranged. Thus, let α_A and Γ_A be subdivided as:

(VI.39) α_A = [α₁ ; α₂] and Γ_A = [Γ₁ ; Γ₂]

where α₁ is 1×(G₁+A₁), Γ₁ is 1×G₁, α₂ is (M₁−1)×(G₁+A₁), and Γ₂ is (M₁−1)×G₁; then g*(a) becomes:

(VI.40) g*(a) = det[ Γ₁[Y_A'Y_A]⊥XΓ₁' , Γ₁[Y_A'Y_A]⊥XΓ₂' ; Γ₂[Y_A'Y_A]⊥XΓ₁' , Γ₂[Y_A'Y_A]⊥XΓ₂' ] / det[ α₁[Z_A'Z_A]α₁' , α₁[Z_A'Z_A]α₂' ; α₂[Z_A'Z_A]α₁' , α₂[Z_A'Z_A]α₂' ]

or

(VI.41) g*(a) = {Γ₁[Y_A'Y_A]⊥XΓ₁' − Γ₁[Y_A'Y_A]⊥XΓ₂'(Γ₂[Y_A'Y_A]⊥XΓ₂')⁻¹Γ₂[Y_A'Y_A]⊥XΓ₁'} det(Γ₂[Y_A'Y_A]⊥XΓ₂') / [{α₁[Z_A'Z_A]α₁' − α₁[Z_A'Z_A]α₂'(α₂[Z_A'Z_A]α₂')⁻¹α₂[Z_A'Z_A]α₁'} det(α₂[Z_A'Z_A]α₂')]

(VI.41) is derived from (VI.40) by the determinantal relationship shown in footnote 3 of page 167.
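The determinantal relationship used in passing from (VI.40) to (VI.41), namely that the determinant of a partitioned symmetric matrix with a 1×1 leading block equals (a − bD⁻¹b') det D, can be verified numerically. A minimal sketch with arbitrary numbers:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
M = M @ M.T                                # symmetric positive definite test matrix

a, b, D = M[0, 0], M[0:1, 1:], M[1:, 1:]   # 1x1 block, 1x(n-1) border, (n-1)x(n-1) block
lhs = np.linalg.det(M)
rhs = (a - b @ np.linalg.solve(D, b.T)).item() * np.linalg.det(D)
assert np.isclose(lhs, rhs)
```

In (VI.41) the scalar a − bD⁻¹b' is exactly the first factor of the numerator (and of the denominator), which is why the "det" operator can be dropped from those 1×1 quantities in the next step.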
The "det" operator may be omitted from the numerator and denominator of the first term of the product since each is a le matrix. The second term of (V1.41) consists only of matrices fixed for any sample ([YAYAllx and ZAZA) and coefficient matrices from equa- tions other than equation 1. Thus for a given & the second term 2, may be considered to be a positive constant in the function to be 250 maximized at this point and we may consider the problem to be one of selecting & to minimize:1 1 (v1.42) g**(a) . -1A alezAal alezAazmzzAzAaz) (IrzzAonz1 _ A A A A A A .1A A I ‘ I‘l _ I I I I I I FlEYAYAle 1 11”»,th .LXF2(F2[YAYA.]_1XF2) FZEYAYA] Lxr1 The problem has been switched from maximizing a function to minimizing the reciprocal of the function (dz being temporarily held fixed) to make the correspondence with LIML easier and to make the derivation of some partial derivatives slightly simpler. Let 0 be the TX(M1 - 1) matrix of residuals of equation 2 2 through M1 and assume that 02 has full column rank.2- Thus, A =_ AI=_f"I_ AI (V1.43) U2 ZAQZ YA 2 XAB2 Then the numerator of (V1.42) may be written as: A A A A -1A A A A I _ I I I I= I A I (V1.44) 0:112:52A ZAU2(U2U2) UZZA]OI1 c1r1[zAzAjwch1 [see (1.37)] . That the denominator of (V1.42) may be written as: A I A A I (“'45) Iaways.)ilIxEuzllh may be seen as follows: 1 '1 ; det(leYAYAJfoé) g*(a) is.” . . . g**(a) det0122A2Adz 2 A lA'A A A I A as 1 . = 2 If 211,11. T£U1U11 is assumed nonsingu ar, then U1 [ul U2] and, therefore, U must already have full Column rank. 2 3E¥A¥AJL1X502]. is the part of the moment matrix of VA orthogonal to both X and . U2 . 251 [UZJLX E [‘YAr2 ‘ XAB2 1x T '[YAjixr2 ’ EXAJLXB2 [by (1‘63)] . -[ijlxiLé since [xAjlJLx - 0 [see (1.42)] - Thus, [YAJLEXEBZJ ' [YA]L(X![62]1X) ' {[YA]L([GZJLX)}LX [see (1.65)] . {[YAJI. ('[YAJfo‘p }J-X A A I I I'I ‘1 A I ' {YA - {-[YAJLXF2)(FZEYAYAJLXFZ) (‘rzhAJlanhx [see (1.47)] " A I AI '1“ I . [YAL-X - {[YALXLXI‘é(FZEYAYALXFZ) F2[YA]LXYA [see (1.62)]. 
However, {[Y_A]⊥X}⊥X = [Y_A]⊥X and [Y_A]'⊥XY_A = [Y_A'Y_A]⊥X [see (1.56)]; hence,

(VI.46) [Y_A]⊥[X:Û₂] = [Y_A]⊥X − [Y_A]⊥XΓ̂₂'(Γ̂₂[Y_A'Y_A]⊥XΓ̂₂')⁻¹Γ̂₂[Y_A'Y_A]⊥X

[Y_A'Y_A]⊥[X:Û₂] may be written as Y_A'[Y_A]⊥[X:Û₂]. Using (VI.46), we have

(VI.47) [Y_A'Y_A]⊥[X:Û₂] = Y_A'[Y_A]⊥[X:Û₂] = Y_A'[Y_A]⊥X − Y_A'[Y_A]⊥XΓ̂₂'(Γ̂₂[Y_A'Y_A]⊥XΓ̂₂')⁻¹Γ̂₂[Y_A'Y_A]⊥X = [Y_A'Y_A]⊥X − [Y_A'Y_A]⊥XΓ̂₂'(Γ̂₂[Y_A'Y_A]⊥XΓ̂₂')⁻¹Γ̂₂[Y_A'Y_A]⊥X

Hence, Γ₁[Y_A'Y_A]⊥[X:Û₂]Γ₁' is the denominator of (VI.42) as claimed in (VI.45), and (VI.42) becomes:

(VI.48) g**(a) = α₁[Z_A'Z_A]⊥Û₂α₁' / Γ₁[Y_A'Y_A]⊥[X:Û₂]Γ₁'

But in the notation of part I of this paper (since rows and columns of [Z_A'Z_A]⊥Û₂ and [Y_A'Y_A]⊥[X:Û₂] corresponding to coefficients restricted to be zero in α̂₁ and Γ̂₁ may be deleted), (VI.48) may be rewritten as:

(VI.49) g**(a) = [+γ̂₁' β̂₁'] [ +Y₁'+Y₁ , +Y₁'X₁ ; X₁'+Y₁ , X₁'X₁ ]⊥Û₂ [+γ̂₁ ; β̂₁] / +γ̂₁'[+Y₁'+Y₁]⊥[X:Û₂]+γ̂₁

We may minimize g**(a) first with respect to β̂₁, thereby concentrating the function on +γ̂₁. First notice that the numerator of (VI.49) may be rewritten as:

(VI.50) [+γ̂₁' β̂₁'] [ +Y₁'+Y₁ , +Y₁'X₁ ; X₁'+Y₁ , X₁'X₁ ]⊥Û₂ [+γ̂₁ ; β̂₁] = +γ̂₁'[+Y₁'+Y₁]⊥Û₂+γ̂₁ + 2+γ̂₁'[+Y₁'X₁]⊥Û₂β̂₁ + β̂₁'[X₁'X₁]⊥Û₂β̂₁

Taking the partial derivative of g**(a) with respect to β̂₁ and setting the partial derivative to zero, we have:

(VI.51) (1/d){2[X₁'+Y₁]⊥Û₂+γ̂₁ + 2[X₁'X₁]⊥Û₂β̂₁} = 0

where d is the denominator of g**(a) in (VI.49). Solving for β̂₁ we have:¹

(VI.52) β̂₁ = −[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂+γ̂₁

Substituting for β̂₁ into the numerator of g**(a) [i.e., substituting into (VI.50)], we have:

(VI.53) +γ̂₁'[+Y₁'+Y₁]⊥Û₂+γ̂₁ − 2+γ̂₁'[+Y₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂+γ̂₁
 + +γ̂₁'[+Y₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂+γ̂₁
 = +γ̂₁'[+Y₁'+Y₁]⊥Û₂+γ̂₁ − +γ̂₁'[+Y₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂+γ̂₁
 = +γ̂₁'[+Y₁'+Y₁]⊥[X₁:Û₂]+γ̂₁²

where [+Y₁'+Y₁]⊥[X₁:Û₂] denotes the moment matrix of the part of +Y₁ orthogonal to both X₁ and Û₂.

¹The matrix of second partial derivatives of the numerator of g**(a) with respect to β̂₁ is 2[X₁'X₁]⊥Û₂, a positive definite matrix if det([X₁'X₁]⊥Û₂) ≠ 0 and d ≠ 0 (since d cannot be negative); hence, under these conditions the second order condition for the value of β̂₁ given by (VI.52) to minimize (VI.49) is met.

²That the last equality in (VI.53) holds may be seen by writing out [+Y₁]⊥[X₁:Û₂] in the manner of (VI.46); hence,

[+Y₁'+Y₁]⊥[X₁:Û₂] = +Y₁'[+Y₁]⊥[X₁:Û₂] [see (1.56)] = [+Y₁'+Y₁]⊥Û₂ − [+Y₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂

Thus, (VI.49) becomes:

(VI.54) g**(a)|min β̂₁ = +γ̂₁'[+Y₁'+Y₁]⊥[X₁:Û₂]+γ̂₁ / +γ̂₁'[+Y₁'+Y₁]⊥[X:Û₂]+γ̂₁

Comparison of (VI.54) with the alternative formulations of the LIML problem (section II.C.1) shows that the +γ̂₁ which minimizes g**(a) is the eigenvector, d₁, corresponding to the smallest eigenvalue, c₁, of A⁻¹B with A = [+Y₁'+Y₁]⊥[X:Û₂] and B = [+Y₁'+Y₁]⊥[X₁:Û₂]. Further comparison with section II.C.1 shows that the eigenvalue is the value k_LIML in a LIML problem with +Y₁ being the matrix of jointly dependent variables in the equation, [X₁ : Û₂] being the matrix of predetermined variables in the equation, and [X : Û₂] being the matrix of instruments, X_I. The +γ̂₁ which minimizes g**(a)|min β̂₁ may be calculated either as (1) the eigenvector, d₁, corresponding to the minimum eigenvalue, k_LIML, or (2) by substituting k_LIML into the usual k-class formula, thereby calculating γ̂₁, β̂₁, and M₁ − 1 additional coefficients corresponding to the variables in Û₂. (The M₁ − 1 coefficients corresponding to the variables in Û₂ are then ignored in further calculations.)
In summary, what we have shown is that, given a set of coefficients for all equations except equation 1, the coefficients of equation 1 which will maximize the likelihood function are the LIML solution of equation 1 modified by including the residuals of equations 2 through M₁ as additional predetermined variables in equation 1. If the coefficients of equations 2 through M₁ are assumed fixed, the coefficients of equation 1 (γ̂₁ and β̂₁ only; the coefficients corresponding to Û₂ are ignored) estimated in this manner will increase the likelihood function over the value of the likelihood function obtained by the original LIML coefficients for equation 1 (assuming that the original LIML coefficients did not already maximize the likelihood function considering the coefficients of equations 2 through M₁ as fixed).

Now, let us define the calculation of new coefficients for equation 1 as step 1 of an iteration and calculate the residuals for that equation (ignoring the coefficients corresponding to the residuals for equations 2 through M₁ in the calculation of the residuals for equation 1).
Let us define the calculation of new coefficients for equation 2 as step 2 of an iteration and calculate these coefficients as the LIML solution obtained when the new residuals from equation 1 and the residuals from equations 3 through M₁ are included as additional predetermined variables in equation 2. The new coefficients for equation 2 will increase the likelihood function over the value of the likelihood function at the end of step 1.

If the procedure is continued through step M₁ of the iteration, the new coefficients for equation M₁ will, in general, increase the likelihood function over the previous coefficients for equation M₁.

Let us define a new iteration as starting with the calculation of new coefficients for equation 1, using the new residuals of equations 2 through M₁ obtained from steps 2 through M₁. Clearly the new coefficients obtained will give a likelihood value greater than or equal to the likelihood which was obtained at step 1 of the previous iteration, since the likelihood will have been increased (or at least not decreased) at each step of the previous iteration. The likelihood at any one step of an iteration will fail to be strictly higher than the likelihood at the same step of the previous iteration only if the coefficients for all equations maximize the likelihood assuming that the coefficients of the remaining equations are held constant.

Summary of the ILIML Computational Method

(1) LIML may be used to estimate starting coefficients and residuals separately for each of the M₁ equations.¹

(2) Step 1 of an iteration consists of estimating new LIML coefficients for equation 1 by using the residuals from equations 2 through M₁ as additional predetermined variables in the equation. A new vector of residuals is estimated for equation 1 from the new coefficients of equation 1, the coefficients corresponding to the residuals of equations 2 through M₁ being ignored in the calculation of the vector of residuals for equation 1.
The new vector of residuals replaces the old vector of residuals in steps 2 through M₁ (i.e., in the calculation of new coefficients for equations 2 through M₁). Step 2 of an iteration consists of estimating new LIML coefficients for equation 2 by using the residuals from equations 1 and 3 through M₁ as additional predetermined variables in the equation. A new vector of residuals is estimated for equation 2, and this vector represents equation 2 in the calculation of new coefficients for the other equations until step 2 of the following iteration. New coefficients and new residuals are estimated for each equation in turn until new coefficients have been calculated for the M₁th equation.

¹As with FIML, other starting estimates such as DLS or 2SLS estimates could be used.

(3) A new iteration is calculated in the same manner, using the residuals from the preceding iteration in the calculation of new coefficients for each equation in turn.

(4) Iteration continues until all coefficients converge. The convergence criteria given for FIML in section V.C.5 are applicable for this computational method of calculating FIML and SML coefficients as well.

Increasing Computational Efficiency of ILIML

The iterative procedure may be made considerably more efficient in computer time, and also more accurate with respect to rounding error, if the residuals are not explicitly calculated. (A large number of multiplications and additions are required to calculate the residuals and form the sums of cross-products of the residuals with the other variables in each equation.) All moment matrices needed for the calculations may be calculated directly from the vector of coefficients and the moment matrix of all variables in the M₁ equations through use of a number of alternative formulas, the optimal formulas to use depending on the amount of special programming which the user of the ILIML method is willing to do.
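In the special case of one jointly dependent variable per equation, each LIML step of the summary above reduces to a DLS (ordinary least squares) regression, and ILIML reduces to the Telser/IDLS procedure noted earlier. A minimal sketch under that assumption (a simulated two-equation system; the data and names are hypothetical, and the code is ours rather than the package's routine):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
# Two equations, one jointly dependent variable each, correlated disturbances.
X1 = rng.normal(size=(T, 2))
X2 = rng.normal(size=(T, 3))
U = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=T)
ys = [X1 @ np.array([1.0, -2.0]) + U[:, 0],
      X2 @ np.array([0.5, 1.5, -1.0]) + U[:, 1]]
Xs = [X1, X2]

def ols(y, X):
    return np.linalg.solve(X.T @ X, X.T @ y)

# Starting coefficients and residuals from separate DLS fits.
betas = [ols(y, X) for y, X in zip(ys, Xs)]
resid = [y - X @ b for y, X, b in zip(ys, Xs, betas)]

for iteration in range(200):
    max_change = 0.0
    for m in range(2):                                 # step m of the iteration
        Xaug = np.column_stack([Xs[m], resid[1 - m]])  # other equation's residuals
        coef = ols(ys[m], Xaug)                        # as extra "predetermined" variables
        new_beta = coef[:Xs[m].shape[1]]               # coefficient on the residual is ignored
        max_change = max(max_change, np.max(np.abs(new_beta - betas[m])))
        betas[m] = new_beta
        resid[m] = ys[m] - Xs[m] @ new_beta            # new residuals replace the old at once
    if max_change < 1e-10:                             # coefficient convergence criterion
        break
```

At convergence the coefficients are, by the argument of this section, maximum likelihood estimates for the two-equation system.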
Following are some relationships which may be worked into a procedure.

From (VI.54), we note that the k value for estimating the coefficients of equation μ may be derived as the smallest eigenvalue of

    ([+Y_μ'+Y_μ]⊥[X:Û₋μ])⁻¹ [+Y_μ'+Y_μ]⊥[X_μ:Û₋μ]

where Û₋μ denotes the residuals for all equations except the μth equation. If desired, the eigenvector corresponding to the smallest root may be used as +γ̂_μ , and δ̂_μ may be calculated by formula (VI.52).

[+Y_μ'+Y_μ]⊥[X:Û₋μ] and [+Y_μ'+Y_μ]⊥[X_μ:Û₋μ] may be calculated by direct orthogonalization using the orthogonalization procedure given in appendix A.

Let Z be the matrix of all of the jointly dependent variables in the first M₁ equations and all of the predetermined variables in the system. Then Z'û_μ = −[Z'Z]+δ_μ and û_μ'û_{μ'} = +δ_μ'[Z'Z]+δ_{μ'} .

Remember that one column of the Û₋μ matrix changes each time a new set of coefficients is calculated for an equation, and that the Û₋μ matrix for equation 2 differs from the Û₋μ matrix for equation 1 in that it includes residuals from equation 1 instead of residuals from equation 2.

Since any equation with rk X = n_μ does not affect the final FIML or SML coefficients of the remaining equations of a system of equations, some additional computational efficiency may be obtained by omitting all equations with rk X = n_μ from the iterative procedure until convergence has been obtained (maximum likelihood estimates have been obtained) for all equations with rk X > n_μ .¹ The maximum likelihood coefficients for an equation with rk X = n_μ may then be directly calculated as a LIML problem which contains as the only "extra" predetermined variables in that equation the residuals (calculated from

¹An equation with rk X = n_μ is usually referred to as a just-identified equation and an equation with rk X > n_μ is usually referred to as an over-identified equation. (See section II.D.)
the converged coefficients) of the equations for which rk X > n_μ .

Comparison with FIML and SML Methods

At each step of each iteration in the ILIML method, the coefficients of only one equation are modified; therefore, it would seem likely that situations such as illustrated in Figure VI.55 could arise.

Figure VI.55. [Likelihood contours in the plane of a coefficient from equation μ (horizontal axis) and a coefficient from equation μ' (vertical axis); graphic not reproduced.]

¹A variation of this technique may also be used in the SML procedure. Convergence of an SML routine may first be obtained for those equations with rk X > n_μ and the system then enlarged to contain also the equations with rk X = n_μ . The two procedures may also be mixed, SML being used to obtain maximum likelihood coefficients for the equations for which rk X > n_μ . The maximum likelihood coefficients for an equation with rk X = n_μ may then be calculated as a LIML problem which contains as the only "extra" predetermined variables in that equation the residuals (calculated from the converged coefficients) of the equations for which rk X > n_μ .

Since the coefficients from one equation are selected to maximize the likelihood function given the coefficients of the other equations, movement at a single step is only in the space of the coefficients of a single equation. Thus (as in Figure VI.55), very little change in the coefficients may result at any one step if movement is up a ridge which is not parallel to the space of coefficients of one of the equations. It is even quite conceivable that so little change would be made in the coefficients of any equation that it will be thought that the coefficients have converged while actually a considerable distance from the maximum of the likelihood function. On the other hand, for the situation in Figure VI.55, movement to the maximum would be very rapid for the FIML and SML computational procedures given earlier, since all coefficients are simultaneously adjusted during any one step of the procedure.
The question of whether multiple local maxima occur in critical regions of the likelihood function for systems of equations estimated by FIML and SML is still open. It should be noted that the ILIML method is as likely as the FIML or SML method to move to a local maximum rather than the global maximum (assuming that it moves to a local maximum at all). However, added to this possibility is a considerably more serious one: convergence may be to a point on a ridge. This is much more serious since (1) ridges which could cause difficulty are surely more likely to occur, and (2) convergence will be to a non-unique point even within the region. (E.g., if estimates result in movement to a point b on the ridge just above a stable point a on the ridge, movement will then not be toward a. Instead, b will be considered a point of convergence, or movement farther up the ridge will occur.)

In general, the larger the number of equations and the larger the total number of coefficients in the entire system being estimated, the slower ILIML convergence is likely to be, since in these cases the coefficients of any one equation are likely to span less of the total coefficient space. It is conceivable that convergence may actually be faster [computer time-wise] for the ILIML method than for the FIML and SML methods in very small models, due to the simple computations performed; however, if such cases do occur, any computer time saved through use of the ILIML method would surely be trivial, since FIML and SML will also converge rapidly in these cases. On the other hand, for the large majority of problems to be calculated, the use of the FIML or SML method may save a very large amount of computer time over the use of the ILIML method.
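The ridge behaviour of Figure VI.55 can be reproduced with a two-coefficient caricature. The toy below is ours (it is not one of the estimators): the log-likelihood is replaced by a concave quadratic whose ridge lies at 45° to the coordinate axes, so one-coefficient-at-a-time maximization (the ILIML pattern) creeps along the ridge, while a single Newton step in both coefficients at once (the FIML/SML pattern) lands exactly on the maximum:

```python
import numpy as np

# f(x, y) = -(x + y)^2 - 0.01 (x - y)^2 has its maximum at (0, 0)
# and a long, gentle ridge along the line x = -y.
H = -2.0 * np.array([[1.01, 0.99],
                     [0.99, 1.01]])          # Hessian of f (constant)

def coordinate_sweep(z):
    """Maximize over each coordinate in turn, holding the other fixed."""
    x, y = z
    x = -(0.99 / 1.01) * y                   # argmax over x given y
    y = -(0.99 / 1.01) * x                   # argmax over y given x
    return np.array([x, y])

z0 = np.array([0.0, 1.0])
z_coord = z0.copy()
for _ in range(10):                          # ten full coordinate sweeps
    z_coord = coordinate_sweep(z_coord)

# One Newton step adjusting both coefficients simultaneously:
z_newton = z0 - np.linalg.solve(H, H @ z0)   # gradient of a quadratic is H z
```

After ten sweeps the coordinate iterate is still nearly as far from the maximum as its starting point (each sweep shrinks the components only by the factor (0.99/1.01)², about 0.96, and early movement is absorbed into sliding onto the ridge), whereas the simultaneous step reaches the maximum exactly. With a loose convergence criterion the coordinate method would be declared converged far from the maximum.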
In only 11 iterations a coefficient convergence criterion of .000 000 000 001 was satisfied when Klein's model I was calculated by the FIML procedure; however, when Klein's model I was calculated by the ILIML procedure, a coefficient convergence criterion of only .00001 was satisfied after 327 iterations. Convergence was occurring very slowly at this point, with approximately 100 iterations being required per extra digit of convergence.¹ (The FIML and ILIML coefficients agreed up to the accuracy calculated by the ILIML procedure.)

¹Klein's model I, which contains only three stochastic equations, is given in section I.C.1, and the FIML solution to Klein's model I is given in the reproduced computer output of section IX.K. The coefficient convergence criterion is defined in section V.C.5.

The primary advantages of the ILIML method over the FIML or SML method would appear to be:

(1) Ease of programming. (However, in general, the more advantage taken of programming relationships to reduce computer time in computation by the ILIML method, the less advantageous the ILIML method is from this standpoint.)

(2) With a given sized computer memory, it may be possible to calculate much larger problems by means of the ILIML method. (However, in general, the larger the problem being calculated, the less favorably the ILIML method may be expected to perform, both with respect to speed of convergence and whether convergence becomes so slow that the problem appears converged when it is short of the maximum of the likelihood function.)

On balance, the ILIML method would not appear to warrant an investment in programming. Instead it seems desirable that FIML and SML be programmed by the formulas given previously.

CHAPTER VII

ZELLNER-AITKEN ESTIMATOR (ZA)

A.
Only Zero and Normalization Restrictions Imposed on Coefficients

The Zellner-Aitken estimator (ZA) is a multiple equations method which may be applied when each structural equation contains only a single jointly dependent variable.¹ If all of the equations in the system are of this form and ZA is applied to the complete system, then ZA is a complete system method.² ZA may also be applied to only part of the system, in which case ZA may be regarded as a partial system method.

The ZA estimating equations can be derived as an application of GLS (in which an estimated disturbance variance-covariance matrix is substituted for the actual disturbance variance-covariance matrix) as follows. Assuming that there are M equations in the system, the μth equation being of the form:

(VII.1)    y_μ = X_μβ_μ + u_μ ,

then the entire system can be written as:

(VII.2)    y₁ = X₁β₁ + 0β₂ + ⋯ + 0β_M + u₁
            ⋮
           y_M = 0β₁ + ⋯ + X_Mβ_M + u_M

¹Zellner proposed the "efficient estimating procedure" which we are calling the Zellner-Aitken estimator in the article Zellner [1962].

²If the complete system contains only one jointly dependent variable per equation, then the structural equations and reduced form equations coincide. These reduced form equations will, of course, not be unrestricted but will incorporate all of the a priori restrictions of the structural equations.

or, if we define the following matrices and vectors:

    y = [y₁' ⋯ y_M']'  (MT×1),    X = diag(X₁, …, X_M)  (MT×n),
    β = [β₁' ⋯ β_M']'  (n×1),     u = [u₁' ⋯ u_M']'  (MT×1),

where n = Σ_{μ=1}^{M} n_μ (n_μ is the number of explanatory variables in the μth equation), we can write the entire system as:

(VII.3)    y = Xβ + u

Initially we will make the statistical assumptions given at the start of this paper (section I.C.3). One of the assumptions implies that X_μ for each equation has full column rank, which in turn implies that X has full column rank. This assumption will be relaxed in a later section of this chapter when restrictions on the coefficients are permitted.
Let U = [u₁ ⋯ u_M] be the T×M matrix of disturbances, with column μ the disturbances for equation μ and row t the disturbances for observation t, designated U_[t]. Then the assumptions of section I.C.3 that EU = 0, EU_[t]'U_[t] = Σ (M×M) for all t, and EU_[t]'U_[t'] = 0 for all t ≠ t' imply that:

(VII.4)    Eu = 0    (MT×1)

and

(VII.5)    Eu_μu_{μ'}' = σ_μμ' I

where I is T×T and σ_μμ' is the element in row μ and column μ' of Σ. (VII.5) is commonly expressed for the whole covariance matrix of u by the "Kronecker product" ⊗ as:¹

(VII.6)    Euu' = Σ ⊗ I = [ σ₁₁I  ⋯  σ₁M I
                               ⋮        ⋮
                            σ_M1 I ⋯  σ_MM I ]    (MT×MT)

¹If A is any p×q matrix and B is any r×s matrix, the Kronecker product A⊗B is defined as the (p·r)×(q·s) matrix

    A⊗B = [ a₁₁B  ⋯  a₁q B
               ⋮        ⋮
            a_p1 B ⋯  a_pq B ]

For A and B square, symmetric, and nonsingular matrices, A⊗B is also square, symmetric, and nonsingular, and [A⊗B]⁻¹ = A⁻¹⊗B⁻¹. This can be seen by (1) premultiplying A⊗B by A⁻¹⊗B⁻¹ and observing that the identity matrix is obtained and (2) postmultiplying A⊗B by A⁻¹⊗B⁻¹ and observing that the identity matrix is obtained; hence A⁻¹⊗B⁻¹ is the inverse of A⊗B. Let us assume that A is p×p and B is q×q. The ij-th block of [A⁻¹⊗B⁻¹][A⊗B] is Σ_{k=1}^{p} (a^{ik}B⁻¹)(a_{kj}B) = [Σ_{k=1}^{p} a^{ik}a_{kj}]·I, where I is q×q. But Σ_k a^{ik}a_{kj} = 1 if i = j and 0 if i ≠ j; hence the product is the pq×pq identity matrix. Similarly, [A⊗B][A⁻¹⊗B⁻¹] is the pq×pq identity matrix. Hence, [A⊗B]⁻¹ = [A⁻¹⊗B⁻¹].
We will designate Euu' as Σ*  (MT×MT).

If we treat y as the dependent variable, X as the matrix of predetermined variables, and u as the vector of disturbances, and substitute into GLS formulas (V.12) and (V.13), we get:

(VII.7)    β̂_GLS = [X'Σ*⁻¹X]⁻¹[X'Σ*⁻¹y]    (n×1)

(VII.8)    asymptotic Var(β̂_GLS) = [X'Σ*⁻¹X]⁻¹    (n×n)

If we had assumed that X, the matrix of predetermined variables, were fixed in repeated samples, then Var(β̂_GLS) would be a small sample estimate rather than an asymptotic estimate. Also, β̂_GLS would be best linear unbiased (assuming that we knew Σ).

Due to the particular form of Σ*, X, and y, β̂_GLS and asymptotic Var(β̂_GLS) may be written out in more detail:

    Σ*⁻¹ = [Σ⊗I]⁻¹ = Σ⁻¹⊗I = [ σ^{11}I  ⋯  σ^{1M}I
                                   ⋮         ⋮
                               σ^{M1}I  ⋯  σ^{MM}I ]

where I is a T×T identity matrix, and Σ = [σ_μμ'] and Σ⁻¹ = [σ^{μμ'}] are M×M matrices. Thus:

(VII.9)    β̂_GLS = [ σ^{11}X₁'X₁  ⋯  σ^{1M}X₁'X_M     ]⁻¹ [ Σ_{μ=1}^{M} σ^{1μ}X₁'y_μ  ]
                   [     ⋮                ⋮           ]    [           ⋮              ]
                   [ σ^{M1}X_M'X₁ ⋯  σ^{MM}X_M'X_M    ]    [ Σ_{μ=1}^{M} σ^{Mμ}X_M'y_μ ]

and

(VII.10)   asymptotic Var(β̂_GLS) = [ σ^{11}X₁'X₁  ⋯  σ^{1M}X₁'X_M
                                         ⋮               ⋮
                                     σ^{M1}X_M'X₁ ⋯  σ^{MM}X_M'X_M ]⁻¹

Since Σ is unknown, the following estimate of Σ is substituted into the formulas:¹

¹The S matrix will be singular if the number of equations exceeds the number of observations. Let Û = [û₁ ⋯ û_M], where the û_μ are the T×1 vectors of DLS residuals. Then S = (1/T)Û'Û (M×M); hence, rk S = rk Û. If T < M, then rk Û, and therefore rk S, will be less than M; i.e., S will be singular. This is not in conflict with our derivation in section VII.B of DLS as the special case of ZA in which there is zero correlation between residuals across equations, since that derivation holds only for nonsingular S. Even though Σ is assumed nonsingular, a particular estimate of Σ may still be singular (and, as shown above, if M > T then S is singular).
(VII.11)    Σ̂* def= S ⊗ I ,    S = [s_μμ']

with s_μμ' = (1/T)û_μ^DLS'û_{μ'}^DLS = (1/T) +δ̂_μ^DLS'[+Z_μ'+Z_{μ'}]+δ̂_{μ'}^DLS , +δ̂_μ^DLS being the vector of DLS coefficients (including the normalizing coefficient, −1) for equation μ and +Z_μ = [y_μ ⋮ X_μ].

Substituting S into (VII.9) and (VII.10) we get:

(VII.12)    β̂_ZA = (Θ^(1))⁻¹ρ^(1)

as the ZA estimate of the vector of coefficients of the M equations, where:¹

¹T is being used as the divisor rather than some other divisor so that IZA (iterative ZA) coefficients can be derived as maximum likelihood coefficients further on. Two alternative divisors suggested by Professor Zellner follow:

(1)    s*_μμ' = û_μ^DLS'û_{μ'}^DLS / (T − n_μ − n_{μ'} + k_μμ')

where k_μμ' = tr{[X_μ'X_μ]⁻¹X_μ'X_{μ'}[X_{μ'}'X_{μ'}]⁻¹X_{μ'}'X_μ} = min(n_μ, n_{μ'})·p̄²_μμ' , p̄_μμ' being Hooper's trace correlation coefficient, which measures the correlation between the sets of variables X_μ and X_{μ'}. As indicated in Stroud and Zellner [1962], S* estimated by this formula provides an unbiased estimate of Σ if X is assumed to be fixed in repeated samples. Although unbiased, S* would likely have more variance about Σ due to the evaluation of the trace terms. In a discussion with the writer, Professor Zellner suggested that he did not favor the use of formula (1), since S* = [s*_μμ'] is not necessarily positive definite. Rather, he favored dividing by T or using the following estimate, which is not unbiased but provides a positive definite S matrix:

(2)    s**_μμ' = û_μ^DLS'û_{μ'}^DLS / (√(T − n_μ)·√(T − n_{μ'}))
Also, (VII.13) asymptotic Var(EZA) = (9(1))-1 Even if X is assumed to be fixed for repeated samples, 62A is no longer best linear unbiased and G?(1))-1 provides only as asymptotic estimate of Var(fizA) due to the use of an estimate of 2 in the GLS formulas.1 A "degrees of freedom” adjustment can be made in the estimated ZA coefficient variance-covariance matrix [i e., asymptotic Var(bzA)] in the same manner as for FIML (section V E). The estimated ZA coeffi- cient variance-covariance matrix can also be normalized in the manner suggested for FIML. If an estimate of the disturbance variance-covariance matrix is desired, it seems desirable to utilize the estimated ZA coefficients to calculate a new 2 instead of merely using the 8 matrix calculated from the DLS coefficients. Utilizing the ZA coefficients the estimated 1See Zellner [1962] for a proof of (VII.13). 270 disturbance variance-covariance matrix becomes: . ZA (v11.14) 22A [Suu'] ZA _ .ze'.zA *ze' , "ZA "ZA with an“. (l/T)uu u“. - (1/I)+sLL [+2L +2“)”. , +5“ being the vector of ZA coefficients (including the normalizing coefficient, -1) for equation u and V+ZH I [yu 2 2p] . A "degrees of freedom” adjustment may be made in the fiZA matrix and the §ZA matrix normalized may be computed in the same manner as for FIML (section V.D.). The ZA estimates and the DLS estimates coincide if either: (1) The same predetermined variables occur in all equations,1 or (2) There is zero correlation between the residuals from DLS for all 2 pairs of DLS equations, i.e., S is a diagonal matrix. 1That ZA and DLS estimates coincide if the same predetermined vari- ables occur in all equations can be seen as follows. Let X I X1 I X2 I - I X . Then T fl M M 1“ 2 s X'y u-1 u A -l . , -l ' 82A - [s s(x x>] M 2 SMHX'y t-l " 'r However, [S-IG(X'X)]-1 I [S-l]-10(X'X)- I SG(X'X)-1; hence, the vector of coefficients for the it equation becomes: r-M n 2 smx')’ u-1 ” ~(1) -l . -1 ' 52A - [911(X'X) .-. 51M(X X) ] M m z s X'y .u-1 5 M M . 
        = Σ_{μ'=1}^{M} Σ_{μ=1}^{M} s_iμ' s^{μ'μ}(X̄'X̄)⁻¹X̄'y_μ = Σ_{μ=1}^{M} { (X̄'X̄)⁻¹X̄'y_μ Σ_{μ'=1}^{M} s_iμ' s^{μ'μ} }

However, Σ_{μ'=1}^{M} s_iμ' s^{μ'μ} = 0 if i ≠ μ and 1 if i = μ; hence, β̂_ZA^(i) = (X̄'X̄)⁻¹X̄'y_i = β̂_DLS^(i). Goldberger [1964], pp. 248 and 263, also contains a proof of (1).

²A proof of (2) is given in the next section (section VII.B).

It is sometimes suggested that ZA be applied to the unrestricted reduced form equations to improve their efficiency over DLS applied to the unrestricted reduced form equations; however, there will be no improvement, due to (1) above. On the other hand, it is often the case that structural equation coefficient restrictions imply readily recognized restrictions on the coefficients of reduced form equations. If these restrictions are taken into account in estimating the reduced form equations, the ZA coefficients will not in general coincide with the DLS coefficients obtained from the reduced form equations, even if the restrictions are taken into account in the DLS estimates. In section VII.C a computational method for taking into account general linear restrictions on coefficients in ZA estimation will be presented. This computational method may prove helpful in direct estimation of reduced form coefficients, since structural equation restrictions may imply restrictions on the reduced form coefficients of a more general form than the special case that certain reduced form coefficients are zero. The computational method given in section VII.C is sufficiently general to take account of restrictions which cut across reduced form equations.¹

¹It should not be forgotten that FIML may also be applied directly to reduced form equations and that FIML coefficients will not coincide with DLS coefficients in the same case that ZA coefficients do not coincide with DLS coefficients. Some relationships between FIML and ZA coefficients are discussed in section VII.D.1.

B.
An Alternate Computational Procedure

Since (2) above (i.e., S a diagonal matrix) suggests an alternative method for calculating β̂_ZA, we will verify (2). In this case S⁻¹ = diag(1/s₁₁, …, 1/s_MM), and the formula for ZA becomes:

    β̂_ZA = [ (1/s₁₁)X₁'X₁  ⋯  0
                  ⋮             ⋮
              0  ⋯  (1/s_MM)X_M'X_M ]⁻¹ [ (1/s₁₁)X₁'y₁ ; ⋮ ; (1/s_MM)X_M'y_M ]

         = [ s₁₁(X₁'X₁)⁻¹ ⋯ 0 ; ⋮ ; 0 ⋯ s_MM(X_M'X_M)⁻¹ ] [ (1/s₁₁)X₁'y₁ ; ⋮ ; (1/s_MM)X_M'y_M ]

         = [ (X₁'X₁)⁻¹X₁'y₁ ; ⋮ ; (X_M'X_M)⁻¹X_M'y_M ] = [ β̂₁^DLS ; ⋮ ; β̂_M^DLS ]

In the above verification the diagonal elements of S cancelled out; hence, if any diagonal matrix had been used in place of S, the same result would have been obtained. In particular, an identity matrix could be used for S. This gives an alternative method for calculating ZA coefficients:

(1) Start the computation by using an identity matrix in place of the S matrix. (No initial coefficients are required, since they are used only in the calculation of the S matrix.) Calculate Θ^(0) and ρ^(0) using the same formulas as for Θ^(1) and ρ^(1), except that the identity matrix is used for the S matrix. Then:

    β̂_DLS = (Θ^(0))⁻¹ρ^(0)

(2) Calculate S from β̂_DLS in the same way as given before and use it to calculate a new Θ^(1) matrix and ρ^(1) vector. Then:

    β̂_ZA = (Θ^(1))⁻¹ρ^(1)

as before.

Whereas the alternate computational method given above takes slightly more computer time at the 0th iteration (for a large scale computer the additional time is hardly measurable), it is simpler to program, especially if provision is made to iterate on the ZA estimates in the manner indicated further on. Another advantage of starting in this manner will be noted in the discussion of the calculation of restricted ZA coefficients (section VII.C).
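The alternate procedure can be sketched directly from (VII.12): build Θ and ρ from any S matrix, starting with S = I (which reproduces the stacked DLS estimates), and then rebuild them from the DLS residual covariance matrix. The code below is an illustrative Python/NumPy sketch on simulated data, not the package's routine:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100
Xs = [rng.normal(size=(T, 2)), rng.normal(size=(T, 3))]      # X_mu, one per equation
U = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=T)
ys = [Xs[0] @ np.array([1.0, -1.0]) + U[:, 0],
      Xs[1] @ np.array([2.0, 0.5, -0.5]) + U[:, 1]]
M = len(ys)
splits = np.cumsum([X.shape[1] for X in Xs])[:-1]            # equation boundaries

def za_step(S):
    """One application of (VII.12): beta = Theta^{-1} rho for a given S."""
    Sinv = np.linalg.inv(S)
    Theta = np.block([[Sinv[i, j] * (Xs[i].T @ Xs[j]) for j in range(M)]
                      for i in range(M)])
    rho = np.concatenate([sum(Sinv[i, j] * (Xs[i].T @ ys[j]) for j in range(M))
                          for i in range(M)])
    return np.linalg.solve(Theta, rho)

beta0 = za_step(np.eye(M))            # 0th iteration: S = I gives the DLS estimates
b_eq = np.split(beta0, splits)
resid = np.column_stack([ys[i] - Xs[i] @ b_eq[i] for i in range(M)])
S = resid.T @ resid / T               # the S matrix of (VII.11)
beta_za = za_step(S)                  # the ZA estimates of (VII.12)
```

Note that the 0th iteration and the ZA iteration run through identical code; only the S matrix changes, which is why this organization is simple to program and to iterate further.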
A disadvantage of starting with an identity matrix in place of the S matrix is that it imposes DLS estimates as the starting estimates for the ZA procedure, whereas it may be desired that other estimates be used as starting estimates for some or all of the equations; however, there seems little tendency to use estimates other than DLS estimates as starting estimates for ZA.

C. Arbitrary Linear Restrictions Imposed on Coefficients

As was noted earlier, the ZA formulas may be derived as an application of the GLS method. In like manner, restricted ZA (RZA) estimates, in which linear restrictions are imposed on the coefficients being estimated, may be derived as an application of the RGLS method. If a set of N_R restrictions given by

(VII.15)    R β = r
            (N_R×n)(n×1) = (N_R×1)

is imposed on the ZA model (y = Xβ + u with assumptions as stated previously), the RZA formulas are given by:¹

(VII.16)    β̂_RZA = Q{[Q'Θ^(1)Q]⁻¹Q'[ρ^(1) − Θ^(1)q]} + q    (n×1)

where Q is n×(n − rk R), [Q'Θ^(1)Q] is (n − rk R)×(n − rk R), and q is n×1, and

(VII.17)    asymptotic Var(β̂_RZA) = Q[Q'Θ^(1)Q]⁻¹Q'    (n×n)

(VII.16) and (VII.17) are derived by substituting the ZA matrix Θ^(1) and the ZA vector ρ^(1) into RGLS formulas (IV.5) and (IV.8). Q and q are calculated from R and r by the computational procedure of section IV.B.1.

¹Calculation of the Q matrix and q vector also gives a means of separating out rk R coefficients, β̂_RZA^(1), which may be calculated from the remaining n − rk R "unrestricted" coefficients, β̂_RZA^(2). Thus, the following pair of formulas are together equivalent to (VII.16):

    β̂_RZA^(2) = [Q'Θ^(1)Q]⁻¹Q'[ρ^(1) − Θ^(1)q]    ((n − rk R)×1)

    β̂_RZA^(1) = Q₂β̂_RZA^(2) + q₂    (rk R×1)

where Q₂ and q₂ are the subparts of Q and q noted in section IV.B.

In the calculation of the Θ^(1) matrix and ρ^(1) vector, the S matrix corresponding to the restricted DLS rather than the unrestricted DLS estimates should be used.
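The Q matrix and q vector of (VII.16) can be sketched as follows: q is any particular solution of Rβ = r, the columns of Q span the null space of R, and every β = Qθ + q then satisfies the restrictions exactly, so an unrestricted minimization over θ yields the restricted estimator. The illustration below is ours (with Θ = X'X and ρ = X'y as in restricted least squares; the same algebra applies with the ZA Θ^(1) and ρ^(1)), and it deliberately includes a redundant restriction, so R does not have full row rank:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 60, 4
X = rng.normal(size=(T, n))
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(size=T)

# Restrictions R beta = r: beta_1 = beta_2 and beta_3 + beta_4 = 7,
# plus a redundant copy of the first restriction.
R = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0,  0.0, 1.0, 1.0],
              [2.0, -2.0, 0.0, 0.0]])
r = np.array([0.0, 7.0, 0.0])

q = np.linalg.pinv(R) @ r              # a particular solution of R q = r
_, sv, Vt = np.linalg.svd(R)
rank = int((sv > 1e-10).sum())         # number of independent restrictions
Q = Vt[rank:].T                        # columns span the null space of R

Theta, rho = X.T @ X, X.T @ y
theta = np.linalg.solve(Q.T @ Theta @ Q, Q.T @ (rho - Theta @ q))
beta_r = Q @ theta + q                 # restricted estimate, in the form of (VII.16)
```

The rank of R falls out of the singular value decomposition as a by-product, inconsistent restrictions would show up as Rq ≠ r, and nothing requires R to have full row rank, in line with the properties claimed for the Q, q procedure.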
Imposing Restrictions which Cut Across Equations on the Coefficients Used in Estimating the S Matrix

It seems clearly desirable that restrictions which do not cut across equations be imposed on the DLS coefficients used to estimate the S matrix. It would also seem desirable to impose restrictions which do cut across equations on the DLS coefficients as well. (They will then not be DLS coefficients for separate equations, but coefficients closely related to DLS.) A scheme for doing so follows.

Let β̂_DLSME be the set of coefficients which minimize û'û = (y − Xβ̂)'(y − Xβ̂). Then β̂_DLSME = β̂_DLS (n×1, n = Σ_{μ=1}^{M} n_μ); that is, the estimator is the same as that obtained when each equation is estimated separately by DLS.

Let us now change the problem to one of:

(VII.18)    min_β̂ (y − Xβ̂)'(y − Xβ̂)    subject to:    Rβ̂ = r

Then the resulting solution will be the restricted DLS solution if no restrictions cut across equations, and a solution which we will call the RDLSME (Restricted DLS Multiple Equations) solution if one or more restrictions cut across equations.

The RDLSME solution ignores the covariances between the disturbances of separate equations, as does the restricted DLS solution; however, it does take account of all restrictions on the coefficients which cut across equations, whereas the restricted DLS solution does not.

RDLSME coefficients may be automatically calculated and then used to calculate the S matrix used to estimate the ZA coefficients by imposing restrictions on the alternate method given earlier for the calculation of the ZA estimates; that is:

(1) Start the computation by using an identity matrix in place of the S matrix. Calculate Θ^(0) and ρ^(0) using the same formulas as for Θ^(1) and ρ^(1), except that the identity matrix is used for the S matrix:

    β̂_RDLSME = Q[Q'Θ^(0)Q]⁻¹Q'[ρ^(0) − Θ^(0)q] + q

(2) Calculate S from β̂_RDLSME and use it to calculate a new Θ^(1) matrix and ρ^(1) vector. Then calculate β̂_RZA by formula (VII.16) as before.¹

In addition to being simple to program, the use of the 0th iteration method given above has the advantage that unique DLS coefficients need not exist, provided the restrictions which cut across equations provide sufficient restrictions on the coefficients of all equations that

¹If restrictions which cut across equations are ignored in the estimation of the S matrix, the resulting RZA coefficients will, of course, not coincide with the RZA coefficients obtained through taking account of the restrictions.
Then calculate EZA by formula (VII.16) as before.1 In addition to being simple to program, the use of the 0th iter- ation method given above has the advantage that unique DLS coefficients need not exist provided the restrictions which cut across equations pro- vide sufficient restrictions on the coefficients of all equations that 1If restrictions which cut across equations are ignored in the estimation of the S matrix, the resulting RZA coefficients will, of course, not coincide with the RZA coefficients obtained through taking account of the restrictions. 277 A unique 8 estimates exist. Remionohip to A Common RZA Forunufia In applying all of the RZA formulas given so far, the R matrix need not have full row rank and the X“ matrix for each equation need not have full column rank. If the additional requirements that (l) the R matrix has full row rank and (2) X” has full column rank for each equation are imposed, substitution of w9(1) and P(1) into RGLS formula (IV.23) gives the following alternative RZA formula:2 (WHO) 6 =‘é ->'R>"[R<9(”1"RJ [Ram r] RZA ZA As with our other formulas, restricted DLS rather than unrestricted DLS estimates should be used to estimate S when calculating EZA ; lAs an example, suppose that a system of 4 equations contains only a single jointly dependent variable y in each equation and that each of the 4 equations contain the same 5 predetermined variables, x1 -°- x5: plus 6 additional predetermined variables not contained in any of the other equations. Let 6(1) be the coefficient corresponding to x me 3 (1) <2) (3) (4) 3 quation i and assume that Bj = B] = BJ = Bj j - l,...,5 . If we assume that the data consists of only 10 observations, unique DLS coefficients do not exist since each equation contains 11 variables but there are only 10 observations. Also, rk X1 = 10 (the for maximum rank) implies a vector of residuals of all zeros for each equation; hence, S - %fi'fi = 0 where 0 is a 4X4 matrix. 
On the other hand, the restrictions on coefficients given above impose 15 independent restrictions which cut across equations; hence, it is probable that the coefficient space is sufficiently restricted that the RDLSME and RZA coefficients are unique.

²This formula is given in Stroud and Zellner [1962], p. 10.

otherwise, information which is assumed to hold is ignored in the calculation of the S matrix. If both the unrestricted β̂_ZA and β̂_RZA are desired, they should be calculated separately. The advantages of the formulas which use the Q matrix and q vector over the formula which uses the R matrix and r vector are given in section IV.B.2 for RGLS, but are applicable to RZA as well.

D. Iterative Zellner-Aitken Estimator (IZA)

1. Only zero and normalization restrictions imposed on coefficients

If the application of the ZA method to the DLS estimates results in an increase in statistical efficiency, the question naturally arises as to whether the ZA coefficients should then be used to estimate a new S matrix, new ZA estimates calculated, and so on. If the process is continued, it may be hypothesized that the coefficients will converge; that is, that the proportional change in all coefficients in any one iteration will become less than a small preassigned constant, ε > 0. If the coefficients converge, let us call the result the IZA (iterative ZA) coefficients.

It is shown in this section that if the IZA procedure converges, the IZA and FIML coefficients will coincide; that is, if to the other statistical assumptions we add normality of the disturbances, then the converged IZA coefficient estimates are maximum likelihood estimates.

The FIML Procedure

First, let us examine the FIML estimating procedure in the case of only one jointly dependent variable per equation.
Since the matrix of coefficients of the jointly dependent variables, Γ, takes on a particularly simple form, the identity matrix, the likelihood function in the form f*(α) [given in (V.24)] becomes

  det²Γ/det S = det²I/det S = 1/det S .

Thus, maximization of the likelihood function implies the minimization of det S.¹

¹This may also be derived by noting that FIML estimates minimize the estimated variance-covariance matrix of the restricted reduced form, and that in the ZA system the structural equations may also be regarded as the reduced form equations in which account is taken of all coefficient restrictions in the model.

For any equation μ, Y_μ becomes y_μ (Y_μ• is empty) and Z_μ becomes X_μ. Substituting into equations (V.37) and (V.42) we get:¹

(VII.21)  l_μ^(i−1) = [∂ log f*/∂δ_μ] evaluated at δ̂^(i−1) = Σ_{μ′=1}^{M} s^{μμ′} X′_μ û_μ′

and

(VII.22)  −∂² log f*/∂δ_μ ∂δ′_μ′ evaluated at δ̂^(i−1) = s^{μμ′} X′_μ X_μ′ − (1/T) X′_μ Û F^{μμ′} Û′X_μ′

where F^{μμ′} is the M×M matrix whose (ν,ν′) element is s^{μμ′}s^{νν′} + s^{νμ′}s^{μν′}.

At the ith iteration, the new coefficient vector δ̂^(i) is formed from the old coefficient vector δ̂^(i−1) as:

(VII.23)  δ̂^(i) = δ̂^(i−1) + h^(i) d^(i)

where h^(i) is the step size and the direction, d^(i), is formed by d^(i) = [Ĥ^(i−1)]⁻¹ l^(i−1) with

  l^(i−1) = [l₁^(i−1)′ ··· l_M^(i−1)′]′

and Ĥ^(i−1) the matrix whose (μ,μ′) block is given by (VII.22).

¹Since Γ = I, det²Γ = det²I, which is a constant; hence, terms derived from det²Γ drop out of ∂ log f*/∂δ and ∂² log f*/∂δ ∂δ′. Thus, the corresponding term is deleted from l_μ in (V.37) and from the (μ,μ′) block of the metric in (V.42).

The IZA Procedure

Now let us examine the IZA iterative procedure. At the ith iteration:

  δ̂^(i) = (Θ^(i−1))⁻¹ [ Σ_{μ′} s^{1μ′} X′₁ y_μ′ ; ··· ; Σ_{μ′} s^{Mμ′} X′_M y_μ′ ]

where

  Θ^(i−1) = [ s^{11} X′₁X₁   ···  s^{1M} X′₁X_M
                  ⋮                    ⋮
              s^{M1} X′_M X₁  ···  s^{MM} X′_M X_M ] ,

δ̂^(i−1) being used in calculating the S matrix which is in turn used to calculate Θ^(i−1) and the right-hand-side vector for the current δ̂^(i). The increment added to δ̂^(i−1) to form δ̂^(i) is:

(VII.24)  δ̂^(i) − δ̂^(i−1) = (Θ^(i−1))⁻¹ [ Σ_{μ′} s^{μμ′} X′_μ y_μ′ ]_{μ=1,...,M} − (Θ^(i−1))⁻¹ Θ^(i−1) δ̂^(i−1)

          = (Θ^(i−1))⁻¹ [ Σ_{μ′} s^{μμ′} X′_μ (y_μ′ − X_μ′ δ̂_μ′^(i−1)) ]_{μ=1,...,M}

          = (Θ^(i−1))⁻¹ [ Σ_{μ′} s^{μμ′} X′_μ û_μ′^(i−1) ]_{μ=1,...,M}

i.e.,  δ̂^(i) − δ̂^(i−1) = 1·(Θ^(i−1))⁻¹ l^(i−1) .

Thus, for the special case of one jointly dependent variable per equation, the IZA procedure is exactly the same as the FIML procedure except that: (1) a step size of 1 is used for each iteration, and (2) the Θ metric is used instead of the Ĥ metric.¹

If the formula for the Θ metric given in this chapter is compared to the formula for the Θ metric given in the FIML chapter (V.39), they will be seen to be the same, i.e., the Θ metric of the IZA procedure is the initial metric used by Chernoff and Divinsky in their FIML computational procedure. (The Θ metric was used as the initial metric due to its "safe" characteristics. For one thing, it is positive definite, thereby insuring that if a sufficiently small step size is chosen, there will of necessity be an increase in the likelihood function for l^(i) ≠ 0.)

¹The l vector and Ĥ metric are the right hand side vector [defined by (V.37)] and metric [defined by (V.42) and (V.43)] for FIML.

Since no jointly dependent variables other than the normalizing variable occur in any equation, the Ê metric given by formula (V.40) also reduces to the Θ metric. (As noted in section V.C.3, the Ê metric is a metric which was developed to be asymptotically the same as the Ĥ metric.) Thus, the Θ metric given here is the same as the metrics used for the bulk of the iterations by Chernoff and Divinsky. Chernoff and Divinsky only shifted to the more powerful Ĥ metric close to the maximum.
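One IZA pass as just described (estimate S from the current residuals, then solve the stacked equations with the Θ metric) can be sketched as follows. This is a minimal numpy sketch with illustrative names and tolerance, not the AES STAT implementation, and it assumes each X_μ and the Θ matrix have full rank.

```python
import numpy as np

def iza(ys, Xs, tol=1e-9, max_iter=200):
    """Iterative Zellner-Aitken (iterated SUR) sketch for a system in
    which each equation contains a single jointly dependent variable:
        y_mu = X_mu beta_mu + u_mu .
    Each pass re-estimates S from the current residuals and solves the
    stacked equations with the metric Theta = [s^{mu mu'} X_mu' X_mu'].
    Iteration stops when the proportional coefficient change is small."""
    T, M = ys[0].shape[0], len(ys)
    # 0th iteration: equation-by-equation DLS (ordinary least squares)
    betas = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)]
    for _ in range(max_iter):
        U = np.column_stack([y - X @ b for y, X, b in zip(ys, Xs, betas)])
        S_inv = np.linalg.inv(U.T @ U / T)              # [s^{mu mu'}]
        # assemble the Theta metric and the right-hand-side vector
        theta = np.block([[S_inv[i, j] * Xs[i].T @ Xs[j] for j in range(M)]
                          for i in range(M)])
        rhs = np.concatenate([sum(S_inv[i, j] * Xs[i].T @ ys[j]
                                  for j in range(M)) for i in range(M)])
        new = np.linalg.solve(theta, rhs)
        old = np.concatenate(betas)
        splits = np.cumsum([X.shape[1] for X in Xs])[:-1]
        betas = np.split(new, splits)
        if np.max(np.abs(new - old) / np.maximum(np.abs(old), 1e-12)) < tol:
            break
    return betas
```

A useful check of the sketch: when every equation contains the same X matrix, the stacked solve reduces to equation-by-equation DLS, so the procedure converges immediately to the DLS estimates.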
(In addition to the advantages of the more powerful metric, Ĥ⁻¹ provides a maximum likelihood estimate of the estimated coefficient variance-covariance matrix rather than one which can only be shown to be asymptotically the same, i.e., Ê⁻¹, which in the case of one jointly dependent variable per equation is the same as Θ⁻¹.)

Since the likelihood function for the ZA model is of a considerably simpler form than the likelihood function to be maximized in the general FIML case, the Θ metric (which in this case coincides with the Ê metric) should be adequate in most cases for convergence. On the other hand, the Ĥ metric which we developed earlier should still prove the more powerful in terms of the number of iterations required for convergence. Also, convergence could surely be speeded if a variable step size were computed in a fashion such as the one given in the FIML section. If a step size of 1 is imposed, it is conceivable [though unlikely because of the particularly simple form of f*(α)] that the coefficients may diverge from the maximum or cycle in some fashion, since an increase in the likelihood at each iteration is only guaranteed for a sufficiently small step.

Due to the more powerful metric used in the FIML procedure, total computing time may be expected to be less if the FIML procedure of section V.C is used rather than the IZA procedure.¹ There would be even more advantage if the FIML formulas were modified by formulas (VII.21) and (VII.22) to take advantage of the simple form of Γ. Finally (if the disturbances are assumed to be normally distributed), the FIML procedure does provide a maximum likelihood estimate of the coefficient variance-covariance matrix rather than one which is only asymptotically the same as the maximum likelihood estimate. On the other hand, the IZA procedure is easier to program.
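The variable-step-size idea mentioned above can be sketched in a few lines. Since only one jointly dependent variable occurs per equation, det S falls as the likelihood rises, so trial step sizes can be compared through det S alone. The trial grid and names below are illustrative assumptions, not the scheme of section V.C.

```python
import numpy as np

def choose_step(delta_old, direction, det_S_of,
                trial_steps=(1.0, 0.5, 0.25, 0.125)):
    """Pick the step size h for one iteration by evaluating det S at a
    few trial values of h along the current direction and keeping the
    trial step that gives the smallest det S.  `det_S_of` maps a
    coefficient vector to det S for the implied residuals."""
    best_h, best_val = None, np.inf
    for h in trial_steps:
        val = det_S_of(delta_old + h * direction)
        if val < best_val:
            best_h, best_val = h, val
    return best_h
```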
Convergence would be speeded up considerably (at only a small cost in additional programming) if the step size were varied by using a scheme such as that given in the FIML section. If the IZA procedure is used, d^(i) can be calculated by (1) calculating the new coefficients which would be obtained at the ith iteration if the IZA procedure were used [i.e., solving the ZA estimating equations δ̃^(i) = (Θ^(i−1))⁻¹ [Σ_{μ′} s^{μμ′} X′_μ y_μ′]], and (2) calculating d^(i) as d^(i) = δ̃^(i) − δ̂^(i−1). New coefficients for the iteration could then be calculated as δ̂^(i−1) + h^(i) d^(i), the final step size h^(i) being based on det S for trial values of h^(i).²

¹It is conceivable that there exist some exceedingly simple models in which the IZA procedure requires less computer time; however, these problems will surely be encountered rarely, and even in these cases the amount of computer time saved by the IZA procedure will be hardly measurable. On the other hand, for the majority of problems encountered, the FIML procedure should result in considerably faster convergence in computer time than the IZA procedure. Also, the FIML procedure may be expected to converge for some problems for which the IZA procedure does not converge.

²Since only a single jointly dependent variable occurs in each equation, det S is minimized at the maximum of the likelihood.

A "degrees of freedom" adjustment may be made in the estimated disturbance variance-covariance and the estimated coefficient variance-covariance matrices which correspond to the IZA coefficients in the same manner as for FIML (sections V.D and V.E). If the IZA coefficients are to be maximum likelihood estimates, the adjustment to the disturbance variance-covariance matrix is made only after the coefficients have converged, i.e., a divisor of T is used during iteration.

2. Arbitrary linear restrictions imposed on coefficients

Restrictions may be imposed at each iteration of the computation of the IZA coefficients in the same manner as for ZA.
When convergence has been obtained, the resulting coefficients will satisfy the restrictions and will be the same as would be obtained if the restrictions were used in the FIML computational procedure.

E. Iterative Direct Least Squares (IDLS or Telser Method)

For the particular case of a system of equations in which each equation contains only one jointly dependent variable, Professor Lester Telser proposed a multiple equations computational procedure in which DLS is used as the primary computational procedure but in an iterative fashion.¹ In each step of the IDLS (iterative DLS) procedure the coefficients of only a single equation are calculated by DLS, except that the residuals from all of the other stochastic equations in the system are included as extra explanatory variables; that is, the explanatory variables for the DLS calculation are taken to be the predetermined variables for the equation being estimated and the residuals from all of the other stochastic equations in the system. Only the coefficients of the predetermined variables of the equation are used. The coefficients corresponding to the residuals are ignored.

¹Telser [1964].

The IDLS procedure may be considered to be a special case of the ILIML procedure given in section VI.E, since DLS may be considered to be the particular case of LIML in which only one jointly dependent variable occurs in each equation. Thus, the computational procedure summarized on pages 256 and 257 for ILIML is the same as the IDLS computational procedure with LIML substituted for DLS as the basic computational method.

Derivation of the IDLS Method²

²Telser [1964] uses a different approach in his derivation of the IDLS method.

Let us add normality of the disturbances to the statistical assumptions previously made. Then, since the IDLS method is a particular
case of the ILIML method, the derivation of the ILIML method is sufficient to show that at any one step in the procedure, the coefficients of an equation are selected to maximize the likelihood function, provided we consider the coefficients of the remaining equations in the system as fixed during that step. (This implies that if the coefficients of all other equations are maximum likelihood coefficients, then the single step will result in maximum likelihood coefficients being estimated for that equation as well.) Thus, it may be expected that, in general, if enough steps are taken (enough steps may be a very large number of steps in some cases), convergence will be to the maximum of the likelihood function. Particular cases similar to that posed following Figure VI.55 of section VI.E may arise where movement to a point short of the maximum may occur, and from that point no further movement occurs or movement is so slow that it is thought that convergence has occurred.

Since the restriction to one jointly dependent variable per equation considerably simplifies the likelihood function, the remainder of this section will be devoted to giving a simpler derivation of the IDLS method than the one given for ILIML (however, the basic steps in the derivation are the same as for ILIML).

Let us separate out the first equation from the remainder of the system and select coefficients for that equation which maximize the likelihood function, assuming that the coefficients of the other equations are fixed. Thus, we will divide the M×(M + Λ) matrix of coefficients α into two parts: α₁, the 1×(M + Λ) matrix of coefficients of the first equation, and α₂, the (M − 1)×(M + Λ) matrix of coefficients of the remaining M − 1 equations:

(VII.26)  α = [ α₁ ; α₂ ] ,  α₁ of order 1×(M + Λ),  α₂ of order (M − 1)×(M + Λ).

Let S be the estimated disturbance variance-covariance matrix corresponding to a set of estimated coefficients α̂ as before. Then, since Û = −Zα̂′,

(VII.27)  S = (1/T)Û′Û = (1/T)α̂Z′Zα̂′ .

Selecting coefficients to maximize the likelihood function is equivalent to selecting coefficients to minimize det S (see section VII.D.1). In the same manner as for ILIML (section VI.E), we may factor det S into:

(VII.28)  det S = (1/T^M) (α̂₁[Z′Z]_⊥Û₂ α̂₁′) det(α̂₂[Z′Z]α̂₂′)

where Û₂ = −Zα̂₂′ is the T×(M − 1) matrix of residuals of equations 2 through M and [Z′Z]_⊥Û₂ is the moment matrix of the part of Z orthogonal to Û₂.

Let us subdivide Z as Z = [y₁ : X₁ : Z₁**], where y₁ is the T×1 vector of observed values of the jointly dependent variable of equation 1, X₁ is the T×n₁ matrix of predetermined variables in equation 1, and Z₁** is the T×(M + Λ − n₁ − 1) matrix of variables outside equation 1 but in the system. Then α̂₁ may be correspondingly subdivided as α̂₁ = [−1 : β̂₁′ : 0′], where β̂₁ is the n₁×1 vector of coefficients of predetermined variables in the first equation, and 0 is the vector of coefficients of the variables outside equation 1. α̂₁[Z′Z]_⊥Û₂ α̂₁′ may now be written as:

(VII.29)  α̂₁[Z′Z]_⊥Û₂ α̂₁′ = [−1 : β̂₁′ : 0′] [Z′Z]_⊥Û₂ [−1 : β̂₁′ : 0′]′
        = [y₁′y₁]_⊥Û₂ − 2β̂₁′[X₁′y₁]_⊥Û₂ + β̂₁′[X₁′X₁]_⊥Û₂ β̂₁ .

Substituting (VII.29) into (VII.28), taking the partial derivative of det S with respect to β̂₁, and setting the partial derivative to zero, we have:

(VII.30)  ∂ det S/∂β̂₁ = (1/T^M) det(α̂₂[Z′Z]α̂₂′) (−2[X₁′y₁]_⊥Û₂ + 2[X₁′X₁]_⊥Û₂ β̂₁) = 0

or, solving for the minimizing value of β̂₁, we have:¹

(VII.31)  β̂₁ = [X₁′X₁]⁻¹_⊥Û₂ [X₁′y₁]_⊥Û₂ .

¹∂² det S/∂β̂₁∂β̂₁′ = (2/T^M) det(α̂₂[Z′Z]α̂₂′)[X₁′X₁]_⊥Û₂ , a positive definite matrix provided [X₁′X₁]⁻¹_⊥Û₂ exists and det(α̂₂[Z′Z]α̂₂′) ≠ 0; therefore, the second order condition for β̂₁ to minimize det S is met.

But the solution given above is the least squares solution for the coefficients corresponding to X₁ which would be obtained if y₁ were used as the dependent variable and the variables in the matrix [X₁ : Û₂] were used as predetermined variables in the equation. This may be seen as follows:

Let γ̂₁ be the (M − 1)×1 vector of least squares coefficients corresponding to the variables in Û₂. Then the least squares solution for [β̂₁′ : γ̂₁′]′ is:¹

(VII.32)  [ β̂₁ ; γ̂₁ ] = [[X₁ : Û₂]′[X₁ : Û₂]]⁻¹ [X₁ : Û₂]′ y₁

        = [ X₁′X₁  X₁′Û₂ ; Û₂′X₁  Û₂′Û₂ ]⁻¹ [ X₁′y₁ ; Û₂′y₁ ]

        = [ [X₁′X₁]⁻¹_⊥Û₂ [X₁′y₁]_⊥Û₂ ; [Û₂′Û₂]⁻¹_⊥X₁ [Û₂′y₁]_⊥X₁ ] .

¹The formula given here for the inverse of a partitioned matrix is derived in Faddeeva [1959], pp. 102-103, except that she writes [X₁′X₁]_⊥Û₂ in the form X₁′X₁ − X₁′Û₂(Û₂′Û₂)⁻¹Û₂′X₁. Also, to save an inversion, Faddeeva doesn't treat X₁′X₁ and Û₂′Û₂ uniformly but instead writes a different term for [Û₂′Û₂]_⊥X₁.

Thus, β̂₁ = [X₁′X₁]⁻¹_⊥Û₂ [X₁′y₁]_⊥Û₂ as claimed; however, [as derived in (VII.31)] this is the solution which maximizes the likelihood assuming that the coefficients of the remaining equations are fixed. (If the remaining coefficients are maximum likelihood coefficients, the solution given by (VII.32) gives maximum likelihood coefficients for this equation as well.)

New residuals may now be calculated for equation 1 using the new coefficient vector β̂₁ but not including the coefficients in γ̂₁, and new coefficients can be estimated for each equation in turn, using the new residuals calculated in previous steps as additional predetermined variables in the equation in the same manner as for ILIML.
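The single IDLS step just derived (regress y₁ on its own predetermined variables together with the residuals of the other equations, keeping only the coefficients on X₁) can be sketched as follows; a minimal numpy sketch with illustrative names.

```python
import numpy as np

def idls_step(y1, X1, U2):
    """One step of the Telser/IDLS procedure: regress y_1 on the
    equation's own predetermined variables X_1 together with the
    residuals U2 of the other stochastic equations, and keep only the
    coefficients on X_1; the residual coefficients are discarded.
    This is the beta_1 block of the least squares solution (VII.32)."""
    W = np.column_stack([X1, U2])
    coefs = np.linalg.lstsq(W, y1, rcond=None)[0]
    return coefs[:X1.shape[1]]          # beta_1 only
```

By the partitioned-inverse argument above, the returned coefficients equal [X₁′X₁]⁻¹_⊥Û₂ [X₁′y₁]_⊥Û₂, the value given by (VII.31).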
After estimating new coefficients for equation M, a new iteration is started by again estimating new coefficients for equation 1. Iteration continues until all coefficients converge.

Summary of the IDLS Computational Method

The summary of the ILIML computational method (pp. 256-257 of this paper) summarizes the IDLS computational method as well if DLS is used instead of LIML.

Increasing Efficiency of IDLS Computation

Except for the notes regarding eigenvalues and eigenvectors, the suggestions for increasing efficiency for the ILIML method are applicable to the IDLS method as well. In particular, the residuals need not be calculated, since:

(VII.33)  Z′û_μ = −[Z′ ₊Z_μ] ₊δ̂_μ

and

(VII.34)  û′_μ û_μ′ = ₊δ̂′_μ [₊Z′_μ ₊Z_μ′] ₊δ̂_μ′

where ₊Z_μ = [y_μ : Z_μ], ₊δ̂_μ = [−1 ; δ̂_μ], and Z is the matrix of variables in the system (or subsystem) being estimated.

In like manner to the ILIML method, any equation with rk X = n_μ does not affect the converged FIML, IZA, or IDLS coefficients; hence, some additional computational efficiency may be obtained by omitting all equations with rk X = n_μ from the iterative procedure until convergence has been obtained (maximum likelihood estimates have been obtained) for all equations with rk X > n_μ.¹

For the model we are now considering (only one jointly dependent variable per equation), rk X = n_μ will occur for the μth equation only if it is assumed that all of the predetermined variables occurring in any equation in the system being estimated occur also in the μth equation with non-zero coefficients. The maximum likelihood coefficients for an equation containing all of the predetermined variables in the system may then be directly calculated by including as the only "extra" predetermined variables in that equation the residuals (calculated from the converged coefficients) of all equations for which rk X > n_μ.²
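The residual cross products of formula (VII.34) can be taken straight from moment matrices, so the T×1 residual vectors never have to be formed. A minimal sketch, with illustrative names:

```python
import numpy as np

def residual_cross_product(moment, dp_mu, dp_nu):
    """Residual cross product from moment matrices alone, per (VII.34):
    with +Z_mu = [y_mu : Z_mu] and +delta_mu = (-1, delta_mu')', the
    cross product u_mu' u_nu equals +delta_mu' [+Z_mu' +Z_nu] +delta_nu.
    `moment` is the cross-moment matrix +Z_mu' +Z_nu."""
    return dp_mu @ moment @ dp_nu
```

Because û_μ = −₊Z_μ ₊δ̂_μ, the quadratic form above reproduces û′_μû_ν exactly, which is what makes the moment-matrix shortcut valid.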
¹An equation with rk X = n_μ is usually referred to as a just-identified equation, and an equation with rk X > n_μ is usually referred to as an over-identified equation.

²This technique may also be used in the FIML and IZA procedures. Convergence may first be obtained (in a FIML or IZA routine) for those equations which do not contain all of the predetermined variables in the system, and the system then enlarged to contain also the equations each of which contains all of the predetermined variables in the system. The IDLS procedure may also be combined with the FIML or IZA procedure by using FIML or IZA to calculate the coefficients of those equations which do not contain all of the predetermined variables in the system. The coefficients of an equation which contains all of the predetermined variables in the system may then be directly calculated as a DLS problem which contains as extra predetermined variables the residuals (calculated from the converged coefficients) of the equations which do not contain all of the predetermined variables in the system.

Comparison with FIML and IZA Methods

The remarks comparing convergence of the ILIML method with convergence of the FIML and SML methods (pp. 259-262) are applicable for comparing the IDLS method with the FIML and IZA methods as well. (The IZA method [unlike the IDLS method] is similar to the FIML method in that adjustments are made to all coefficients in any step of the convergence procedure; hence, convergence will occur to at least a local maximum if convergence occurs.) Situations in which convergence will fall short of the maximum of the likelihood function may occur for IDLS in the same manner as for ILIML.¹

¹Klein's Model I was modified by reclassifying all explanatory jointly dependent variables in each of the 3 stochastic equations as predetermined.
(The normalizing jointly dependent variable for each stochastic equation became the only jointly dependent variable in the equation, and the identity equations were deleted from the system.) The coefficients of the modified model were then estimated by FIML, IZA, and IDLS using a coefficient convergence criterion (see section V.C.5) of .000 000 001. FIML required 6 iterations to converge, IZA required 46 iterations to converge, and IDLS required 64 iterations to converge. (The estimates obtained coincided, of course, for the 3 methods.)

The Monte-Carlo experiment reported in Kmenta and Gilbert [1967] was calculated on the AES STAT system. In this experiment the FIML, IZA, and IDLS estimates coincided for all samples for which all three methods were calculated. Even in the simple 2 equation models, the FIML computational procedure was much more powerful than the IZA and IDLS procedures, the FIML procedure requiring about 8 iterations for each problem and the IZA and IDLS procedures requiring about 23 iterations. (A coefficient convergence criterion of only .000 001 was used. Had a smaller convergence criterion been used, only a few additional iterations would have been required for the FIML procedure, since it is powerful close to the maximum of the likelihood. On the other hand, the IDLS convergence procedure does not converge faster as the maximum is approached; hence, a number of additional iterations would have been required for IDLS.) The number of iterations required for convergence for IDLS was highly variable. In the 2 equation model, IDLS sometimes required more and sometimes fewer iterations than IZA, but never as few iterations as FIML. In a 4 equation model, there was a much greater advantage to using FIML over IZA or IDLS than in the 2 equation model. Also, the number of iterations for IDLS became much higher than for IZA. (The iteration results reported in this footnote are not reported in Kmenta and Gilbert [1967].)
CHAPTER VIII

THREE-STAGE LEAST SQUARES (3SLS)

A. Only Zero and Normalization Restrictions Imposed on Coefficients

Three-stage least squares (3SLS)¹ is usually thought of as a method for estimating the coefficients of a complete system of equations, and its properties are usually compared with the properties of the FIML estimator. It should be noted, however, that the 3SLS estimation procedure may also be applied to a subsystem of equations, utilizing the structure of the subsystem being estimated plus additional instruments (usually the predetermined variables in the remainder of the system); hence, it is often more fruitful to compare the properties of the 3SLS estimator with the properties of the SML estimator. Also, as with SML estimation (and unlike FIML estimation), jointly dependent variables are adjusted by a matrix of variables contemporaneously independent of the disturbances of the equations being estimated. As a result, if the rank of the matrix of variables used in the adjustment of the jointly dependent variables equals the number of observations, the special adjustment has no effect. (This is shown in section VIII.C.)

Regarding identity equations, Zellner and Theil recommend that they simply be deleted from the three-stage procedure. In a footnote, they go on to say: "It is sometimes recommended that such equations be eliminated by a substitution of variables. This is superfluous and makes the computations more complicated than necessary."² Predetermined variables in identity equations do serve as prime candidates as instruments for the X_I matrix which is used to adjust jointly dependent

¹The basic article on 3SLS is Zellner and Theil [1962].

²Zellner and Theil [1962], p. 63.
variables in the two-stage and three-stage procedures.¹

The 3SLS estimating equations can be derived as an application of GLS in which (1) an estimated disturbance variance-covariance matrix is substituted for the actual disturbance variance-covariance matrix, and (2) stochastic variables are included in the GLS X matrix. Following is a derivation.² As in Part I, let the μth structural equation of the system or subsystem being estimated be

(VIII.1)  y_μ = Z_μ δ_μ + u_μ
          T×1  (T×n_μ)(n_μ×1)  T×1

¹The X_I matrix is defined further on in this section.

²Since the 2SLS coefficients which are starting estimates for 3SLS may also be derived as an application of GLS, 3SLS might be said to be derivable as an application of the GLS procedure twice. This is the approach used in Zellner and Theil [1962]. The derivation in this paper follows Zellner and Theil's derivation except that: (1) Instead of restricting the matrix of variables used to adjust the jointly dependent variables to the entire matrix of predetermined variables in the system (X), the jointly dependent variables are adjusted by a matrix of instrumental variables, X_I, with X_I containing all of the predetermined variables in the subsystem being estimated plus additional instruments. X may of course be used as the matrix X_I. A careful reading of the derivations given in Zellner and Theil [1962] will disclose that none of the properties which they claim for the 3SLS estimator will be affected by this substitution, provided we make the same assumptions regarding the variables in X_I that they make regarding the variables in X: that the variables are fixed. (2) Zellner and Theil assumed that the X matrix has full column rank. We will assume that the X_I matrix has full column rank for ease of deriving 3SLS as a GLS procedure, but present a computational procedure for which X_I may have less than full column rank.
where y_μ is the vector of observations of the normalizing jointly dependent variable in the equation; Z_μ = [Y_μ : X_μ] is the matrix of explanatory variables in the equation (the T×m_μ submatrix Y_μ is the matrix of explanatory jointly dependent variables in the equation, and the T×t_μ submatrix X_μ is the matrix of predetermined variables in the equation); δ_μ = [γ_μ ; β_μ] is the vector of population coefficients of the explanatory variables of the μth equation (γ_μ is the m_μ×1 subvector of the population coefficients of the explanatory jointly dependent variables, and β_μ is the t_μ×1 subvector of population coefficients of the predetermined variables); and u_μ is the vector of disturbances of the μth equation.

Let X_I (the subscript I denotes instruments) be a T×K matrix of instrumental variables containing the predetermined variables in the system or subsystem being estimated plus possibly additional instrumental variables. The discussion of the selection of instruments in section II.G for the X_I matrix of the double k-class estimators is applicable to instrumental variables used in the X_I matrix for 3SLS as well. The predetermined variables in the remainder of the system (if 3SLS is applied to a subsystem of the entire system) and the predetermined variables in identity equations should certainly be considered as candidates for inclusion as instruments in the X_I matrix.¹ When reporting results of 3SLS estimation, the particular instruments included in the X_I matrix should be reported along with the 3SLS coefficients obtained, since the particular instruments included in the X_I matrix affect the 3SLS coefficients obtained.

¹It is usual to use the X matrix (the matrix of predetermined variables in the system) as X_I (the matrix of instrumental variables).
In what follows we will assume that X_I consists of "fixed" variables only.¹ Initially we will assume that X_I has full column rank for ease of deriving the computational formulas; however (as is noted further on in the derivation), an X_I of less than full column rank presents no difficulty if the formulas which are presented in this paper are used. X_I must have rank at least equal to the maximum number of explanatory variables in any equation in the system or subsystem being estimated.² (Even this lesser restriction will be relaxed in section VIII.D when we consider general linear restrictions on coefficients.)

¹Assuming that X_I contains "fixed" variables follows Zellner and Theil [1962]. This is a restrictive assumption, since it excludes lagged jointly dependent variables from occurring as predetermined variables in the subsystem being estimated. The assumption that X_I contains "fixed" variables was apparently made by Zellner and Theil for convenience in deriving the 3SLS estimator and deriving properties regarding this estimator. It is common to use 3SLS even if equations contain lagged jointly dependent variables. In his derivation of the 3SLS estimator, Goldberger [1964], p. 347, states without proof that: "For convenience we assume that all predetermined variables are exogenous variables distributed independently of the disturbances; the results however, carry over to the general case." (The assumptions regarding predetermined variables in section I.C.3 of this paper follow the assumptions of Goldberger's "general case". In particular, lagged jointly dependent variables are permitted as predetermined variables in Goldberger's general case.) The discussion in Fisher [1965] regarding the use of lagged jointly dependent variables as instruments would appear applicable for instruments used in the X_I matrix in addition to the predetermined variables in the system or subsystem being estimated.
²The assumption made further on in the derivation [following (VIII.4)] that the matrices X_I′Z_μ have full column rank implies that rk X_I ≥ n_μ for μ = 1,...,M. When we consider general linear restrictions on coefficients, the matrices X_I′Z_μ need not have full column rank (provided the coefficient space is sufficiently restricted that unique 3SLS coefficients exist); hence, the requirement rk X_I ≥ n_μ may be relaxed somewhat in that section.

As in our derivation of 2SLS as a GLS method (section IV.D), let us premultiply each equation in the system or subsystem being estimated by the same matrix, the transpose of the X_I matrix defined above. The μth equation becomes:

(VIII.2)  X_I′y_μ = X_I′Z_μ δ_μ + X_I′u_μ
          K×1     (K×n_μ)(n_μ×1)   K×1

The entire system can be written as:

(VIII.3)  X_I′y₁ = X_I′Z₁δ₁ + ··· + 0·δ_M + X_I′u₁
              ⋮
          X_I′y_M = 0·δ₁ + ··· + X_I′Z_Mδ_M + X_I′u_M

or, if we define the following matrices and vectors:

  ỹ = [X_I′y₁ ; ··· ; X_I′y_M]  (MK×1),   X̃ = diag(X_I′Z₁, ..., X_I′Z_M)  (MK×n),
  δ = [δ₁ ; ··· ; δ_M]  (n×1),   ũ = [X_I′u₁ ; ··· ; X_I′u_M]  (MK×1),

where n = Σ_{μ=1}^{M} n_μ, we can write the entire system as:

(VIII.4)  ỹ = X̃ δ + ũ .
          MK×1  (MK×n)(n×1)  MK×1

Initially we will assume that the matrices X_I′Z_μ have full column rank, which implies that the matrix X̃ has full column rank; however, this assumption will be relaxed in section VIII.D when formulas for imposing general linear restrictions on coefficients are presented.

Let ũ_μ = X_I′u_μ, μ = 1,...,M. Then, since X_I is fixed, 𝔼ũ_μ = 𝔼X_I′u_μ = X_I′𝔼u_μ = X_I′·0 = 0; hence, the entire vector 𝔼ũ = 0.

Let U = [u₁ ··· u_M] be the T×M matrix of disturbances, with column μ the disturbances for equation μ and row t the disturbances for observation t, designated U_[t]. Then from the assumptions of section I.C.3 that 𝔼U_[t] = 0, 𝔼U_[t]′U_[t] = Σ = [σ_μμ′] (M×M) for all t, and 𝔼U_[t]′U_[t′] = 0 for t′ ≠ t, plus the assumption that X_I is fixed, we obtain:

(VIII.5)  𝔼ũ_μũ_μ′′ = 𝔼X_I′u_μu_μ′′X_I = X_I′[𝔼u_μu_μ′′]X_I = σ_μμ′ X_I′X_I
                                                              (K×K)

for μ, μ′ = 1,...,M. The latter set of relations is expressed in terms of the entire covariance matrix of ũ by means of the Kronecker product as:

(VIII.6)  𝔼ũũ′ = Σ ⊗ [X_I′X_I] .
          MK×MK  (M×M) (K×K)

We will designate 𝔼ũũ′ as Σ* (MK×MK).

The matrix X̃ does not consist of fixed variables only, since some of the variables contain submatrices of the form X_I′Y_μ, with Y_μ jointly dependent, so that (even if the Σ matrix were known), if we used X̃ as the GLS X in deriving the 3SLS estimator, the resulting coefficients would not have all of the GLS properties; however, the GLS derivation is used primarily as a means of suggesting the 3SLS estimator as an estimator with potentially desirable properties. Properties of the 3SLS estimator are derived and proved in Zellner and Theil [1962] after the computational formulas are established.

If we treat ỹ as the GLS y, X̃ as the GLS X, and ũ as the GLS u, GLS formula (IV.2) becomes:¹

(VIII.7)  δ̂_GLS = [X̃′Σ*⁻¹X̃]⁻¹ X̃′Σ*⁻¹ỹ ;

however, Σ*⁻¹ = [Σ ⊗ (X_I′X_I)]⁻¹ = Σ⁻¹ ⊗ [X_I′X_I]⁻¹, i.e.,

  Σ*⁻¹ = [ σ^{11}[X_I′X_I]⁻¹  ···  σ^{1M}[X_I′X_I]⁻¹
               ⋮                       ⋮
           σ^{M1}[X_I′X_I]⁻¹  ···  σ^{MM}[X_I′X_I]⁻¹ ] .

¹See footnote 1 of page 265 for a definition of the Kronecker product and a proof that for A and B square, symmetric, and nonsingular, [A⊗B] is also nonsingular and [A⊗B]⁻¹ = A⁻¹⊗B⁻¹.

Thus, (VIII.7) may be rewritten as:

(VIII.8)  δ̂_GLS = [ σ^{11}[Z′₁Z₁]_∥X_I   ···  σ^{1M}[Z′₁Z_M]_∥X_I
                        ⋮                        ⋮
                    σ^{M1}[Z′_M Z₁]_∥X_I ···  σ^{MM}[Z′_M Z_M]_∥X_I ]⁻¹ [ Σ_{μ=1}^{M} σ^{1μ}[Z′₁y_μ]_∥X_I
                                                                             ⋮
                                                                         Σ_{μ=1}^{M} σ^{Mμ}[Z′_M y_μ]_∥X_I ] .

That Z′_μ X_I(X_I′X_I)⁻¹X_I′Z_μ′ = [Z′_μ Z_μ′]_∥X_I (the cross product of the part of Z_μ in the space spanned by X_I with the part of Z_μ′ in the space spanned by X_I),
Since Σ is unknown, we will substitute the following estimate of Σ into the formula:¹

(VIII.9)    Σ̂ = S = [s_{μμ'}] ,    s_{μμ'} = (1/T) û_μ'û_{μ'} = (1/T) ₊δ̂_μ' [₊Z_μ'₊Z_{μ'}] ₊δ̂_{μ'} ,

with ₊δ̂_μ being the vector of 2SLS coefficients for equation μ (including the normalizing coefficient, -1) and ₊Z_μ = [y_μ ⋮ Z_μ].

Substituting S into (VIII.8) for Σ, we get the 3SLS estimator:

(VIII.10)   δ̂_3SLS = (E^(1))⁻¹ d^(1) ,

where, with S⁻¹ = [s^{μμ'}],

(VIII.11)   E^(1) = [s^{μν} [Z_μ'Z_ν]_|X_I]    and    d^(1) = [Σ_{ν=1}^{M} s^{μν} [Z_μ'y_ν]_|X_I] .

Let Y_A be the matrix of jointly dependent variables in the M equations being estimated by 3SLS, let X_A be the matrix of predetermined variables in the M equations being estimated by 3SLS, and let Z_A = [Y_A ⋮ X_A]. Then E^(1) and d^(1) can be computed efficiently by forming the matrix

(VIII.12)   [Z_A'Z_A]_|X_I = [ [Y_A'Y_A]_|X_I    Y_A'X_A ]
                             [    X_A'Y_A        X_A'X_A ]

(this matrix may be formed in triangular form since it is symmetric) and then extracting the submatrices [Z_μ'Z_ν]_|X_I used in E^(1) and the subvectors [Z_μ'y_ν]_|X_I used in d^(1) from [Z_A'Z_A]_|X_I. That [Z_A'X_A]_|X_I = Z_A'X_A and [X_A'X_A]_|X_I = X_A'X_A follows from our definition of X_I as containing the matrix X_A [see (I.40) and (I.41)].

¹The S matrix will be singular if the number of equations exceeds the number of observations. Let Û = [û₁ ⋯ û_M] (T×M), where the û_μ are the 2SLS residual vectors. Then S = (1/T)Û'Û; hence, rk S = rk Û. If T < M, then rk Û, and therefore rk S, will be less than M; i.e., S will be singular. This is not in conflict with our derivation in section VIII.B of 2SLS as the special case of 3SLS in which there is zero correlation between residuals across equations, since this derivation holds only for nonsingular S. Even though Σ is assumed nonsingular, a particular estimate of Σ may still be singular, and (as shown above) if M ≥ T, Σ̂ = S is singular.
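In present-day matrix notation, the computation (VIII.9)-(VIII.11) can be sketched as follows. The fragment below is an illustrative Python/NumPy reconstruction, not the AES STAT routines (which were written for the CDC 3600); the names `two_sls` and `three_sls` are invented for the sketch, and the explicit projection X_I(X_I'X_I)⁺X_I' stands in for the [·]_|X_I adjustment.

```python
import numpy as np

def two_sls(y, Z, X):
    """2SLS for one equation: regress y on Z using the instruments X.
    P is the projection on the space spanned by X; the pseudo-inverse
    keeps it well defined even if X lacks full column rank."""
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    return np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)

def three_sls(ys, Zs, X):
    """3SLS per (VIII.9)-(VIII.11): 2SLS residuals give S, then the
    stacked system E delta = d is solved.  ys and Zs are lists with
    one entry (y_mu, Z_mu) per equation; X holds the instruments."""
    T, M = ys[0].shape[0], len(ys)
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    U = np.column_stack([y - Z @ two_sls(y, Z, X) for y, Z in zip(ys, Zs)])
    S_inv = np.linalg.inv(U.T @ U / T)        # elements s^{mu nu}
    # E^(1) blocks: s^{mu nu} [Z_mu' Z_nu]_|X_I ; d^(1) analogous
    E = np.block([[S_inv[i, j] * Zs[i].T @ P @ Zs[j] for j in range(M)]
                  for i in range(M)])
    d = np.concatenate([sum(S_inv[i, j] * Zs[i].T @ P @ ys[j]
                            for j in range(M)) for i in range(M)])
    return np.linalg.solve(E, d)
```

With a single equation the stacked solve reduces to the 2SLS coefficients, since the scalar s¹¹ cancels out of the solve.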
[Y_A'Y_A]_|X_I is computed as [Y_A'Y_A] − [Y_A'Y_A]_⊥X_I, with [Y_A'Y_A]_⊥X_I computed by direct orthogonalization in the manner given in section I.D.2. Note that [Z_A'Z_A]_|X_I is unique and easily computed even for an X_I having less than full column rank.

If X̄ had consisted only of fixed variables and Σ were known, so that Σ rather than its estimate S were used in the calculation of E^(1) and d^(1), then by GLS formula (IV.3), Var(δ̂) = (E^(1))⁻¹. As shown in Zellner and Theil [1962], even though X̄ contains non-fixed variables and S is used instead of Σ,

(VIII.13)   asymptotic Var(δ̂_3SLS) = (E^(1))⁻¹ .

A "degrees of freedom" adjustment can be made in the estimated 3SLS coefficient variance-covariance matrix [i.e., asymptotic Var(δ̂_3SLS)] in the same manner as for FIML (section V.E). The estimated 3SLS coefficient variance-covariance matrix can also be normalized in the manner suggested for FIML.

If an estimate of the disturbance variance-covariance matrix Σ is desired, it seems desirable to utilize the estimated 3SLS coefficients to calculate a new Σ̂ instead of merely using the S matrix calculated from the 2SLS coefficients. Utilizing the 3SLS coefficients, the estimated disturbance variance-covariance matrix becomes:

(VIII.14)   Σ̂_3SLS = [s_{μμ'}^3SLS] ,    s_{μμ'}^3SLS = (1/T) ₊δ̂_μ^3SLS' [₊Z_μ'₊Z_{μ'}] ₊δ̂_{μ'}^3SLS ,

with ₊δ̂_μ^3SLS being the vector of 3SLS coefficients for equation μ (including the normalizing coefficient, -1) and ₊Z_μ = [y_μ ⋮ Z_μ]. A "degrees of freedom" adjustment may be made in the Σ̂_3SLS matrix, and the Σ̂_3SLS matrix may be normalized, in the same manner as for FIML (section V.D).

B. An Alternate Computational Procedure

3SLS estimates will be the same as 2SLS estimates if the S matrix is a diagonal matrix, i.e., if there is zero correlation between the 2SLS residuals of each pair of equations.¹ This is easily verified by writing out the 3SLS estimating equations.

¹Zellner and Theil [1962], p. 58, note this special case.
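The one-pass formation of (VIII.12) described earlier in this section can be sketched as follows. This is an illustrative Python/NumPy fragment (the name `projected_cross_products` is invented); an SVD basis of the space spanned by X_I stands in for the direct orthogonalization of section I.D.2. Because X_A lies in that space, only the Y_A'Y_A block requires adjustment, and the result is well defined even when X_I has less than full column rank.

```python
import numpy as np

def projected_cross_products(Y_A, X_A, X_I, rtol=1e-10):
    """Form [Z_A'Z_A]_|X_I of (VIII.12).  The X_A blocks are raw cross
    products because X_A lies in the space spanned by X_I; only the
    Y_A'Y_A block is adjusted by the projection on that space."""
    # orthonormal basis of span(X_I), of whatever rank X_I happens to have
    U, s, _ = np.linalg.svd(X_I, full_matrices=False)
    Q = U[:, s > rtol * s[0]]
    YP = Q.T @ Y_A                      # coordinates of the projected Y_A
    top = np.hstack([YP.T @ YP, Y_A.T @ X_A])
    bottom = np.hstack([X_A.T @ Y_A, X_A.T @ X_A])
    return np.vstack([top, bottom])
```

Since the basis Q depends only on the space spanned by X_I, adding linearly dependent columns to X_I leaves the result unchanged, matching the uniqueness claim above.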
In this case, S⁻¹ = diag(1/s₁₁, …, 1/s_MM), so that E^(1) is block diagonal and the 3SLS formula separates by equations; the μth block gives

    δ̂_μ^3SLS = [(1/s_μμ)[Z_μ'Z_μ]_|X_I]⁻¹ (1/s_μμ)[Z_μ'y_μ]_|X_I
             = ([Z_μ'Z_μ]_|X_I)⁻¹ [Z_μ'y_μ]_|X_I = δ̂_μ^2SLS .

In the above verification the diagonal elements of S cancelled out; hence, if any diagonal matrix had been used in place of S, the same result would have been obtained. In particular, an identity matrix could be used for S. This gives a basis for an alternate method for calculating 3SLS coefficients which is essentially the same as the alternate method for calculating ZA coefficients:

(1) Start the computation by using an identity matrix in place of the S matrix. (No initial coefficients are required, since they are used only in the calculation of the S matrix.) Calculate E^(0) and d^(0) using the same formulas as for E^(1) and d^(1) except that an identity matrix is used for the initial S matrix. Then:

        δ̂_2SLS = [δ̂_1^2SLS ; ⋯ ; δ̂_M^2SLS] = (E^(0))⁻¹ d^(0) .

(2) Calculate S from δ̂_2SLS in the way given earlier in this chapter and use S to calculate a new E^(1) matrix and d^(1) vector. Then:

        δ̂_3SLS = (E^(1))⁻¹ d^(1)

as before.

Whereas the alternate computational method given above takes slightly more computer time at the 0th iteration (for a large scale computer, the additional time required is hardly measurable), it is simpler to program--especially if provision is made to iterate on the 3SLS estimates in the manner indicated farther on. Another advantage of starting in this manner will be noted in the discussion of the calculation of restricted 3SLS coefficients.
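The two-step procedure above can be sketched as follows (an illustrative Python/NumPy fragment with invented names, not the AES STAT code): step (1) runs the stacked solve with S = I and so reproduces the 2SLS coefficients, and step (2) reuses exactly the same solve with the estimated S.

```python
import numpy as np

def stacked_solve(ys, Zs, X, S_inv):
    """One pass of the stacked estimator for a given S^{-1}: build the
    E matrix and d vector of (VIII.11) and solve E delta = d."""
    M = len(ys)
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    E = np.block([[S_inv[i, j] * Zs[i].T @ P @ Zs[j] for j in range(M)]
                  for i in range(M)])
    d = np.concatenate([sum(S_inv[i, j] * Zs[i].T @ P @ ys[j]
                            for j in range(M)) for i in range(M)])
    return np.linalg.solve(E, d)

def three_sls_alt(ys, Zs, X):
    """Alternate method: the 0th pass with S = I yields the stacked
    2SLS coefficients; their residuals give S for the 3SLS pass."""
    T, M = ys[0].shape[0], len(ys)
    delta0 = stacked_solve(ys, Zs, X, np.eye(M))       # = 2SLS, stacked
    splits = np.cumsum([Z.shape[1] for Z in Zs])[:-1]
    parts = np.split(delta0, splits)
    U = np.column_stack([y - Z @ b for y, Z, b in zip(ys, Zs, parts)])
    return stacked_solve(ys, Zs, X, np.linalg.inv(U.T @ U / T))
```

As the verification above shows, any diagonal matrix could be passed as `S_inv` in the 0th pass without changing `delta0`.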
A disadvantage of starting with an identity matrix in place of the S matrix is that it imposes 2SLS estimates as the starting estimates of the 3SLS procedure, whereas it may be desired that other estimates be used as starting estimates for some or all of the equations; however, there seems little tendency to use estimates other than 2SLS estimates as starting estimates for 3SLS.¹

¹LIML estimates or other similar estimates meeting certain consistency requirements (DLS estimates do not meet these requirements) could be substituted for the 2SLS estimates in the 3SLS procedure without changing the proof of the derivation of the asymptotic moment matrix of 3SLS in Zellner and Theil's article [1962]; however, it is assumed that 3SLS estimates are based on 2SLS estimates unless stated otherwise. Use of estimates other than 2SLS estimates would, of course, change the resulting 3SLS estimates obtained.

C. 3SLS Estimation when rk X_I = T

In section II.G it was noted that if rk X_I = T, the estimated coefficients for all double k-class estimators become the same as the DLS coefficients. In similar fashion for 3SLS estimation, if rk X_I = T, δ̂_3SLS = δ̂_ZA; that is, the 3SLS coefficients obtained will be the same coefficients as if the explanatory jointly dependent variables of each equation were misclassified as predetermined and ZA applied. The 3SLS coefficients obtained will not, of course, have the same properties as ZA coefficients.¹

That δ̂_3SLS = δ̂_ZA is easily seen. Let Z_A be the matrix of variables in the subsystem being estimated; then rk X_I = T implies that [Z_A]_|X_I = Z_A, since all variables are in the space spanned by X_I. E^(1) becomes [s^{μν} Z_μ'Z_ν] and d^(1) becomes [Σ_{ν=1}^{M} s^{μν} Z_μ'y_ν]; hence, δ̂_3SLS = (E^(1))⁻¹ d^(1) = δ̂_ZA, as can be verified by comparison with ZA formula (VII.12).
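The collapse is immediate once one notes that when rk X_I = T the projection on the space spanned by X_I is the identity, so every adjusted moment equals the raw moment. A minimal numerical check (illustrative Python/NumPy, not part of the original computations):

```python
import numpy as np

# With T observations and rk X_I = T, the columns of X_I span the whole
# T-dimensional space, so the projection used in [.]_|X_I is the identity
# matrix and every adjusted cross product equals the raw cross product.
np.random.seed(0)
T = 6
X_I = np.random.randn(T, T + 2)        # T rows, more than T columns
P = X_I @ np.linalg.pinv(X_I.T @ X_I) @ X_I.T
assert np.allclose(P, np.eye(T))
```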
¹It may be recalled that SML estimation is so affected by the rank of the X_I matrix that if rk X_I ≥ T − M + 1, the SML computations cannot be performed (at least the SML formulas given in Chapter VI cannot be used without some modification). On the other hand, FIML estimation is not affected by the rank of the X_I matrix, since jointly dependent variables are not adjusted by the X_I matrix. (Instead, the matrix of coefficients of the jointly dependent variables is used directly in the estimation procedure.)

The fact that δ̂_3SLS coincides with δ̂_ZA computed with jointly dependent variables misclassified does not destroy the consistency of δ̂_3SLS.¹ All that is indicated is that there are insufficient observations to distinguish δ̂_3SLS estimates from δ̂_ZA estimates (unless the space of X_I is restricted in some fashion).

The discussion of whether the space spanned by X_I should be restricted (and methods for restricting the space of X_I) in the case of the double k-class methods (section II.G) is applicable to 3SLS as well, except that if the subsystem being estimated contains very many predetermined variables, the rank of the predetermined variables in the subsystem being estimated may already equal T.

Let the matrix of predetermined variables in the subsystem being estimated be denoted X_A. It seems undesirable to restrict the subspace of X_I in a manner such that X_A is not in the space spanned by X_I. If the space of X_I is restricted such that X_A is not in the space spanned by X_I, the 3SLS formulas given previously are not valid, since [see (VIII.12)] [X_A'Y_A]_|X_I ≠ X_A'Y_A and [X_A'X_A]_|X_I ≠ X_A'X_A. To take account of these non-equalities, any predetermined variable outside the space spanned by X_I must be adjusted in the same manner as the jointly dependent variables in the computational formulas.

¹Consistency is an asymptotic property, and a small number of observations in a given sample certainly does not affect an asymptotic property.
¹If the number of instruments used in the estimation is fixed (e.g., the number of predetermined variables in the system is fixed for a given model, so that if X is used as X_I, X_I will be fixed), then (if the δ̂_3SLS formula is followed, that is, a switch is not made to the δ̂_ZA formula) as T increases, at some point there will be sufficient observations that rk X_I < T, and δ̂_3SLS will not coincide with δ̂_ZA.

This adjustment, however, is the same as misclassifying these variables as jointly dependent in the original model; i.e., it has the same computational effect as a change in the model in response to the small number of observations.

D. Arbitrary Linear Restrictions Imposed on Coefficients

As noted earlier, the 3SLS formulas may be derived as an application of the GLS method. In like manner, we may derive restricted 3SLS (R3SLS) estimates (in which arbitrary linear restrictions are imposed on the coefficients) as an application of the RGLS method. If the set of N_R restrictions given by:

(VIII.15)   R δ = r        (R: N_R×n, δ: n×1, r: N_R×1)

is imposed on the 3SLS model (ȳ = X̄δ + ū with assumptions as stated previously), the R3SLS formulas are given by:¹

(VIII.16)   δ̂_R3SLS = Q {[Q'E^(1)Q]⁻¹ Q'[d^(1) − E^(1)q]} + q ,
            (Q: n×(n − rk R);  [Q'E^(1)Q]: (n − rk R)×(n − rk R);  q: n×1)

(VIII.17)   asymptotic Var(δ̂_R3SLS) = Q [Q'E^(1)Q]⁻¹ Q' .

(VIII.16) and (VIII.17) are derived by substituting the 3SLS matrix E^(1) and the 3SLS vector d^(1) into RGLS formulas (IV.5) and (IV.8). Q and q are calculated from R and r by the computational procedure given in section IV.B.1.

¹Calculation of the Q matrix and q vector also gives a means of separating out rk R coefficients, δ_(2), which may be calculated from the remaining n − rk R "unrestricted" coefficients, δ_(1).
Thus, the following pair of formulas are together equivalent to (VIII.16):

    δ̂_(1)^R3SLS = [Q'E^(1)Q]⁻¹ Q'[d^(1) − E^(1)q]        ((n − rk R)×1)

    δ̂_(2)^R3SLS = Q₂ δ̂_(1)^R3SLS + q₂                    (rk R×1, with Q₂: rk R×(n − rk R))

where Q₂ and q₂ are the subparts of Q and q noted in Chapter IV.

The R matrix need not have full row rank, and the E^(1) matrix may be singular (the Z_μ matrices may have less than full column rank). Although the R matrix and r vector can contain restrictions imposed on the coefficients of only a single equation and/or restrictions which cut across equations, it should be noted that the same answer will not in general be obtained if restrictions which affect the coefficients of a single equation are solved into that equation as is obtained if these restrictions are listed in the R matrix and r vector and R3SLS is applied. The reasons that the resulting coefficients may be different are outlined for R2SLS in section IV.D, but are equally applicable to R3SLS. Restrictions which cut across equations cannot, of course, be solved into a single equation; hence, they must be imposed through use of a procedure such as the R3SLS computational procedure.

In calculating the E^(1) matrix and d^(1) vector, it seems desirable that the S matrix corresponding to the restricted 2SLS estimates rather than the unrestricted 2SLS estimates be used.

¹If R is assumed to have full row rank and the E^(1) matrix is assumed to have full column rank, then the formula

    δ̂_R3SLS = δ̂_3SLS − (E^(1))⁻¹ R' [R (E^(1))⁻¹ R']⁻¹ [R δ̂_3SLS − r]

may be used. (This formula is implied by Zellner and Theil [1962], p. 78, in their remark that restrictions may be applied to 3SLS estimates.) The advantages of the Q, q method over the R, r method for GLS given in section IV.B.2 are applicable to 3SLS as well.
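The Q, q computation of (VIII.16) can be sketched as follows. This illustrative Python/NumPy fragment (the name `restricted_solve` is invented) uses an SVD null-space basis in place of the procedure of section IV.B.1; any Q with RQ = 0 and full column rank serves. It also reproduces two properties noted above: the number of independent restrictions falls out of the computation, and inconsistent restrictions are detected.

```python
import numpy as np

def restricted_solve(E, d, R, r, tol=1e-10):
    """Solve E delta = d subject to R delta = r by reparameterizing
    delta = Q theta + q with R Q = 0 and R q = r, so the restrictions
    hold identically.  R may contain redundant rows, and E need only
    be nonsingular on the null space of R."""
    U, s, Vt = np.linalg.svd(R)               # full_matrices=True: Vt is n x n
    rk = int(np.sum(s > tol * max(s[0], 1.0)))  # number of independent restrictions
    Q = Vt[rk:].T                             # orthonormal basis of null(R)
    q = np.linalg.pinv(R) @ r                 # least-squares solution of R q = r
    if not np.allclose(R @ q, r, atol=1e-8):  # inconsistent restrictions leave R q != r
        raise ValueError("inconsistent restrictions")
    theta = np.linalg.solve(Q.T @ E @ Q, Q.T @ (d - E @ q))
    return Q @ theta + q, rk
```

Redundant restrictions simply lower the count `rk` without affecting the solution, which is the behavior the R3SLS procedure requires.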
Imposing Restrictions Which Cut Across Equations on the Coefficients Used to Calculate the S Matrix

It seems clearly desirable that restrictions which do not cut across equations be imposed on the 2SLS coefficients used to calculate the S matrix. It would also seem desirable to impose restrictions which do cut across equations on the 2SLS coefficients as well. (They will then not be 2SLS coefficients for separate equations, but coefficients closely related to 2SLS.) This may be done in the same way as was indicated in the ZA procedure, i.e., by using the alternate 3SLS computational method noted earlier (section VIII.B):

(1) Start the computations by using an identity matrix in place of the S matrix. Calculate E^(0) and d^(0) using the same formulas as for E^(1) and d^(1) except that the identity matrix is used in place of the S matrix. Then:

(VIII.18)   δ̂_R2SLSME = Q [Q'E^(0)Q]⁻¹ Q'[d^(0) − E^(0)q] + q .

(2) Calculate S from δ̂_R2SLSME and use it to calculate a new E^(1) matrix and d^(1) vector. Then calculate δ̂_R3SLS by formula (VIII.16) as before.¹

In addition to being simple to program, the use of the 0th iteration method has the advantage that unique 2SLS estimates (which take account of restrictions which do not cut across equations) need not exist, provided there is sufficient identification that unique R3SLS estimates exist.²

¹δ̂_R3SLS calculated in this manner will not be the same as δ̂_R3SLS calculated by ignoring the restrictions which cut across equations in the calculation of the S matrix.

²It may be recalled that R2SLS as defined in section IV.D can incorporate restrictions which do not cut across equations, whereas R2SLSME as in (VIII.18) can also incorporate restrictions which cut across equations. Unique estimates of the latter may exist even when unique estimates of the former do not. An example of this similar to the example for ZA (footnote 1 of page 277) could be constructed.

E. Iterative Three-Stage Least Squares (I3SLS)
1. Only zero and normalization restrictions imposed on coefficients

In their concluding remarks, Zellner and Theil state: "The three-stage least squares estimator implies in general a new estimator of [σ_{μμ'}] which differs from [s_{μμ'}]. One can then set up a new stage based on this estimator rather than [s_{μμ'}] and proceed iteratively. No report on this method can be made as yet, but we hope to come back to it in the future."¹

Such a procedure has been referred to as multi-stage least squares; however, this name does not seem very appropriate, since going on to subsequent iterations is not the same as going on to subsequent stages, as from the DLS 1st stage, to the 2SLS 2nd stage, to the 3SLS 3rd stage. Instead we will refer to the above procedure as iterative three-stage least squares (I3SLS).

Madansky examines the question and concludes that iteration is not worthwhile "in the sense that there is no improvement in the asymptotic variance of the estimator. On the other hand, the effect of such iteration on the finite sample variance (or even on the finite-sample bias) of the estimator is still an open question."²

ZA may be regarded as the particular case of 3SLS in which there is only one jointly dependent variable per equation. As we saw earlier, IZA estimates coincide with FIML estimates. The question then arises: "Will iteration on 3SLS estimates lead to SML estimates if applied to a partial system, or to FIML estimates if applied to a complete system (assuming that the iterations on the 3SLS estimates converge, so that we may refer to them as I3SLS estimates)?" The answers to both questions are, in general, no, provided that rk X_I > n_μ for at least one equation in the system.¹ Thus, occurrence of the "explanatory" jointly dependent variables in the case of 3SLS and I3SLS has a considerable effect on the properties of 3SLS and I3SLS relative to ZA and IZA.

¹Zellner and Theil [1962], p. 78.

²Madansky [1964], p. 55.
First of all, let us show that I3SLS and SML estimates do not in general coincide. To do so, assume that the partial system to be estimated consists of only a single equation with rk X_I > n_μ for that equation. If a third stage is applied to the single equation, the two-stage estimates are again obtained, for the 3-stage formula becomes:

    δ̂_3SLS = [s¹¹ [Z₁'Z₁]_|X_I]⁻¹ [s¹¹ [Z₁'y₁]_|X_I] = ([Z₁'Z₁]_|X_I)⁻¹ [Z₁'y₁]_|X_I = δ̂_2SLS .

Continued iteration will give only the 2SLS estimates at each iteration; that is, the system clearly converges to the 2SLS estimates.

Hood and Koopmans show that the SML estimates for a system with a single equation are the LIML estimates.² Thus, in this case the I3SLS and SML estimates will coincide only if the 2SLS and LIML estimates coincide, which will not in general occur for an equation for which rk X_I > n_μ.

The question might still arise--is there some peculiarity about a complete system which would make I3SLS estimates the same as FIML estimates? We can easily answer this in the negative by extending the above special case.

Zellner and Theil show that in the estimation of the coefficients of a system of equations by 3SLS, any equation for which rk X_I = n_μ can be omitted in the calculation of the 3SLS coefficients of the remaining equations.¹ Since their proof makes no use of where the original estimates are derived, it applies to any iteration; hence, the equations for which rk X_I = n_μ may be omitted from the system in the calculation of the I3SLS coefficients of the equations for which rk X_I > n_μ; i.e., the iterations may be performed on the equations for which rk X_I > n_μ only.

¹If rk X_I > n_μ for an equation, the equation is usually termed "over-identified," and if rk X_I = n_μ for an equation, the equation is usually termed "just-identified" (see section II.B). If rk X_I = n_μ for all equations in the system, all of the following estimators (plus other estimators) give the same estimated coefficients: FIML, SML, 3SLS, I3SLS, LIML, and 2SLS.

²Koopmans and Hood [1953], pp. 170-171.
Now, consider a complete system in which rk X_I > n_μ for only one equation in the system and rk X_I = n_μ for the remainder of the equations. Then the I3SLS estimates of that one equation will be the 2SLS estimates, which will not, in general, be the same as the FIML estimates of that equation. Thus, I3SLS estimates will not, in general, be the same as FIML estimates.

¹Zellner and Theil [1962], pp. 63-68. Actually their proof is not complete, as they only show that the E^(1) matrix can be subdivided. They do not show the effect of multiplying the subdivided E^(1) matrix by a similarly subdivided d^(1) vector; however, some additional manipulation shows that their conclusion is correct.

The I3SLS Procedure

To see how the computation of the I3SLS estimates differs at each iteration from the computation of SML and FIML estimates, let us first note the change in the coefficient vector from one iteration to the next in the I3SLS procedure.¹ At the ith iteration of the I3SLS procedure:

    δ̂^(i) = (E^(i−1))⁻¹ d^(i−1) ,

where E^(i−1) and d^(i−1) are calculated in the same manner [(VIII.10) and (VIII.11)] as E^(1) and d^(1) except that δ̂^(i−1) is used instead of δ̂^(1) in calculating the S matrix. The increment added to δ̂^(i−1) to form δ̂^(i) is:

    δ̂^(i) − δ̂^(i−1) = (E^(i−1))⁻¹ m^(i−1) ,    where    m^(i−1) = d^(i−1) − E^(i−1) δ̂^(i−1) .

The condition m = 0 will imply that the maximum has been reached in the region. In checking possibilities for a function of interest being maximized or minimized, m need not be the vector of partial derivatives of that function itself; it is only necessary that the function f**, of which m is the vector of partial derivatives, be a strictly increasing or decreasing function of the function of interest.

¹Many assume that the relationship between FIML and 3SLS has been at least largely explicated in Chow [1964], pp. 548-550; however, Chow's derivation of 3SLS as a minimization of a particular determinant is incorrect from a couple of standpoints. (1) Chow computes his estimate of σ_{μμ'} as:

    (1/T) ([y_μ]_|X_I − [Y_μ]_|X_I γ_μ − X_μ β_μ)' ([y_{μ'}]_|X_I − [Y_{μ'}]_|X_I γ_{μ'} − X_{μ'} β_{μ'}) .
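Written in the increment form above, the iteration can be sketched as follows (an illustrative Python/NumPy fragment with invented names, not the AES STAT code); the size of m^(i−1) serves as the convergence criterion.

```python
import numpy as np

def i3sls(ys, Zs, X, tol=1e-10, max_iter=200):
    """Iterative 3SLS: re-estimate S from the current residuals and
    re-solve until the increment vanishes.  The update is written as
    delta_new = delta + E^{-1} m with m = d - E delta, so m -> 0 is
    the convergence criterion."""
    T, M = ys[0].shape[0], len(ys)
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    splits = np.cumsum([Z.shape[1] for Z in Zs])[:-1]
    delta = np.concatenate([np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
                            for y, Z in zip(ys, Zs)])   # 2SLS start
    for _ in range(max_iter):
        parts = np.split(delta, splits)
        U = np.column_stack([y - Z @ b for y, Z, b in zip(ys, Zs, parts)])
        S_inv = np.linalg.inv(U.T @ U / T)
        E = np.block([[S_inv[i, j] * Zs[i].T @ P @ Zs[j] for j in range(M)]
                      for i in range(M)])
        d = np.concatenate([sum(S_inv[i, j] * Zs[i].T @ P @ ys[j]
                                for j in range(M)) for i in range(M)])
        m_vec = d - E @ delta
        delta = delta + np.linalg.solve(E, m_vec)
        if np.max(np.abs(m_vec)) < tol:
            break
    return delta
```

For a single over-identified equation the increment is zero at the 2SLS start, so the iterations stay at 2SLS, which is the behavior established above.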
The SML and FIML Formulas

The vector for SML which is comparable with the vector m^(i−1) of I3SLS is the vector given in (VI.36), written out in blocks, one block per equation, as (VIII.24). The bottom part of each block of (VIII.24) corresponds to the predetermined variables of the equation and is the same as for I3SLS. It is only the top part of each block (which corresponds to the jointly dependent variables of the equation) that differs: there the adjusted cross products [Y_μ'û_ν]_|X_I are combined through the elements of 𝒯⁻¹ rather than S⁻¹. Further, if S⁻¹ were substituted for 𝒯⁻¹ in the SML formula, even the top parts of corresponding blocks would be the same. (𝒯 = (1/T) ₊Δ_A' [Z_A'Z_A]_⊥X_I ₊Δ_A, where [Z_A'Z_A]_⊥X_I is the moment matrix of the part of Z_A orthogonal to X_I, while S = (1/T) ₊Δ_A' [Z_A'Z_A] ₊Δ_A.)

In the special case that the number of jointly dependent variables in the system equals the number of equations, the SML estimates become FIML estimates, and the vector corresponding to m becomes (from V.37) the vector whose μth block is

    [ T γ̂^{μ|μ} + Σ_{ν=1}^{M} s^{μν} Y_μ'û_ν ]
    [       Σ_{ν=1}^{M} s^{μν} X_μ'û_ν       ]

where γ̂^{μ|μ} is formed from Γ̂⁻¹ as indicated following (V.37). The number of observations, T, in the FIML expression should not be confused with the 𝒯 matrix in the SML expression.

Notice that the blocks of the right hand side corresponding to predetermined variables coincide for I3SLS and FIML, but the blocks of the right hand side corresponding to jointly dependent variables differ for the two methods.
For I3SLS, the jointly dependent variables are adjusted to variables which are asymptotically uncorrelated with the disturbances, whereas for FIML, the jointly dependent variables are left unadjusted and the special nature of the jointly dependent variables is taken into account through the use of elements of the inverse of the matrix of estimated coefficients of the jointly dependent variables. [This adjustment is derived as the appropriate adjustment through recognition of the Jacobian of the transformation of the likelihood from U to δ and Σ--see (V.12) and (V.13).]

Determining additional characteristics of the extreme point to which I3SLS converges would be very helpful in determining whether such iteration is worthwhile.¹ ² If it were determined that I3SLS estimation is worthwhile and we knew a definite function being maximized or minimized, the speed of convergence could be considerably increased. As a start toward increasing the speed of convergence, the step size could be made more optimal at each iteration. Possibly a more efficient metric could also be devised.

¹Convergence was obtained for some problems and appeared to be slowly occurring in all other I3SLS problems computed on the AES STAT system.

²If we define I2SLS as the same procedure as ILIML (see section VI.E) except that 2SLS is used as the basic computational scheme rather than LIML, then I2SLS estimates apparently do not coincide with I3SLS estimates. Using a coefficient convergence criterion (see section V.C.5) of .000 000 000 1, 61 iterations were required to calculate I2SLS estimates for Klein's model I. The coefficients obtained were:

              C       P       W     CONSTANT    P₋₁
    Eq 1     -1     .2013   .7821   16.0510    .1282

              I       P     CONSTANT   P₋₁     K₋₁
    Eq 2     -1     .1050   22.9420   .6253   -.1680

              W₁      E     CONSTANT   E₋₁      t
    Eq 3     -1     .3578    2.5265   .2130    .1815

Klein's model I is given in section I.C.4, and the 2SLS, 3SLS, and I3SLS solutions to Klein's model I are given in the reproduced computer output of section IX.K.
2. Arbitrary linear restrictions imposed on coefficients

Restrictions may be imposed at each iteration of the computation of the I3SLS coefficients in the same manner as for 3SLS.

PART III

ADDITIONAL PROGRAMMING CONSIDERATIONS

CHAPTER IX

ADDITIONAL PROGRAMMING CONSIDERATIONS

The previous 8 chapters contain basic formulas for the computation of simultaneous stochastic equations methods. The way that these formulas are actually programmed on the computer has a very large effect on the actual coefficients obtained, due to: (1) rounding error in performing the computations, and (2) misspecification of the particular equation system to the computer. (I.e., the computer solves the equation system which is specified to it. This is very commonly not the equation system which the user thinks that he is specifying to it.) In this chapter we will consider programming procedures which may considerably reduce rounding error as compared to the programming practices usually used. The user control cards used in the stochastic equations portion of the AES STAT package of computer programs¹ will be presented to illustrate a form of user control cards which results in considerably fewer errors and provides much more flexibility in the

¹Except for a few subroutines and some recent modifications, the simultaneous stochastic linear equations part of the AES STAT system was programmed by the writer as a Department of Agricultural Economics research project. Although the programming was spaced out over a few years, the actual programming required approximately one year of actual programming time from the writer. The remainder of the system (the DLS methods and the methods other than the stochastic simultaneous equations portion) was developed as part of the writer's Agricultural Experiment Station programming responsibilities.
Although the writer commenced programming the AES STAT system as the only programmer, his staff has been increased by the addition of approximately one-half of a programmer's time each year. Presently there are three full-time programmers besides the writer expanding the system (primarily the analysis of variance and covariance and the least squares portions), writing user descriptions, consulting with members of the university in the use of the routines, and developing other routines not a part of this system (i.e., with a different form of parameters). Programmers who have worked on the AES STAT system at one time or another include Mr. Donald P. Kiel, Miss Mary E. Rafter, Mrs. Barbara Bray, Mrs. Marylyn A. Donaldson, Mr. Richard J. Martz, Mrs. Sara J. Paulson, Mr. Peter M. Schwinn, Mr. Frederick J. Ball, Mr. Tim Walters, and Mr. John Geweke.

range of problems which may be calculated than the user control cards used for most simultaneous stochastic equations computer programs.¹ To illustrate the control cards, the control cards required for calculating 2SLS, LIML, 3SLS, I3SLS, LML, and FIML for Klein's model I on the AES STAT system will be given along with the computer results obtained.
For example, if teletypes or typewriters are used to specify a problem to a computer operating in a time sharing mode, a system of conversational controls should be devised to replace the form of control statements of the AES STAT system. The computer should query the user regarding aspects of the problem and after each user response, check for inconsistencies. The computer should notify the user of an inconsistency as quickly as it is detected, allow the user to correct the particular inconsistency detected, and proceed with the calculation of the problem. 327 A. Rounding Error " . . . extreme care and precaution with accuracy of computation is not a fruitless and vein search for superfluous digits beyond the number significant in the original input. In spite of the fact that we have only two or three digits of significance in our original input variables and want only two or three-digit coefficients as an end re- sult, we may have to carry out intermediate calculations to a very large number of places. The intermediate stages of equations systems methods of estimation are quite intricate, and if we do not carry out all of our results to many places and use the most accurate arithmetic procedures, we may find that our giant machines are spinning out masses of meaningless figures."1 Longley E1957] reported the results of calculating the DLS co- efficients from data consisting of 16 observations with 6 independent variables and 8 dependent variables on the DLS routines which he felt were the most commonly used DLS routines in the 0.8. Longley's re- sults were startling. The most accurate routine had only 4 or 5 digits accuracy in the resulting coefficients, one routine had no digits correct for some coefficients, and one routine even had some signs of 1R1ein and Nakamura [1962], pp. 298-299. 328 coefficients incorrect. Freund [1963] reported the results of calculating a small DLS problem on a series of computer routines and also obtained widely vary- ing results. 
If a large amount of rounding error is commonly encountered on small DLS problems, think how meaningless the results from many of eVen the most simple of the simultaneous stochastic equations estimating procedures must be. It is the purpose of this section to suggest com- putational methods for reducing rounding error so that meaningful re- sults can be obtained. 1Longley calculated the "correct“ coefficients on a hand calculator-- carrying 15 digits at each step-~and reported 8 places after the decimal point (plus from four to no places before the decimal point) in the re- sulting coefficients. The writer ran Longley's problems on the AES STAT system without making any special provision to reduce rounding error and obtained exactly the same coefficients as was reported by Longley, except for a couple of the coefficients corresponding to an overall constant in which the AES STAT overall constants disagreed at the ninth significant digit. (Subsequent experimentation with the data on the AES STAT system indicated that the AES STAT coefficients were correct to at least 15 places. Planned modification of the method of forming sums of squares and cross-products will raise the accuracy of the AES STAT system even further.) 329 1. Single vs. double precision Most large scale computers currently being produced have both single precision floating point arithmetic and double precision float- ing point arithmetic built into the computer. The AES STAT system operates on the Control Data Corporation 3600 Computer which has a 48- bit word. 
Each floating point number is represented as follows in the CDC 3600 computer:

    Single precision--each word of storage contains one number:

        | sign bit | 11 exponent bits | 36 mantissa bits (about 10 decimal digits) |

    Double precision--two consecutive words of storage contain one number:

        | sign bit | 11 exponent bits | 84 mantissa bits (about 24 decimal digits) |

Each number is carried in the mantissa to a base of 2 rather than 10 and is normalized so that "leading zeros" are not carried in the number. The location of the point (to the base 2) is given by the exponent bits.

Twice as many computer memory words are required to carry a matrix in double precision form as to carry the matrix in single precision form. Also, for most computers, arithmetic operations take somewhat longer if they are performed in double precision than if they are performed in single precision. As a result of the requirement of more storage if calculations are carried in double precision, and to a lesser extent as a result of the additional time required to calculate in double precision, many simultaneous stochastic equations routines are programmed in single precision.¹
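On a present-day machine the same contrast appears between IEEE single precision (24 mantissa bits, about 7 decimal digits) and IEEE double precision (53 mantissa bits, about 16 decimal digits); the digit counts differ from the CDC 3600's roughly 10 and 24, but the behavior is the same. An illustrative Python/NumPy check:

```python
import numpy as np

# 1.1 has no terminating base-2 expansion, so its single- and
# double-precision roundings are different numbers.  Widening the
# single-precision value afterward cannot restore the lost digits:
# the error is frozen in at single-precision size.
x32 = np.float32(1.1)
x64 = np.float64(1.1)
widened = np.float64(x32)              # single converted to double
assert widened != x64                  # not the same number
assert abs(widened - 1.1) > 1e-9       # error of single-precision size
assert float(x64) == 1.1               # x64 is exactly the double 1.1
# small integers are exact in single precision, so they widen exactly
assert np.float64(np.float32(252.0)) == 252.0
```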
If there is insufficient capacity for an operation, then use of supplementary storage devices such as magnetic drums, disks, disk packs, or magnetic tapes in performing a series of operations is preferable to using single precision in order to provide sufficient storage.

Many programmers perform operations which they consider will not lead to a large amount of rounding error in single precision (e.g., a programmer may form the matrix of sums of squares and cross-products in single precision), convert the result to double precision, and then perform the more formidable computations which follow in double precision. Although little or no additional rounding error may result from such a practice in particular instances, these instances are far more rare than is generally realized.

¹The simultaneous stochastic equations portion of the AES STAT system is programmed entirely in double precision.

The reason why this is generally an undesirable practice is that only a small proportion of the possible numbers convert evenly to a power of 2 in the computer. Thus, even simple numbers like 1.1, 1.2, 1.3, 1.4, 1.6, etc. have a different representation in single precision than in double precision; that is, the double precision representation is not merely the single precision representation with a word of zeros attached. Thus, (in the case of the CDC 3600) the initial numbers will be accurate to 10 digits whether carried in single precision or double precision converted from single precision.
Since rounding error starts from the last significant digit, rounding error will start to build up from 10 digits out instead of 24 digits out even if subsequent operations are performed in double precision.¹

One place that this is especially harmful is in the formation of a sums of squares and cross-products matrix with subsequent inversion of part of the matrix and multiplication by another part of the matrix in order to perform simple calculations such as obtaining direct least squares coefficients. Many people seem to regard the formation of a sums of squares and cross-products matrix as an operation in which little rounding error occurs (since the basic calculation is so simple); hence, they perform this operation in single precision. Since inversion of a matrix is considered to be a complicated procedure involving a lot of rounding error, the single precision matrix of sums of squares and cross-products is then converted to double precision and the inverse calculated. Unless the sums of squares and cross-products matrix is formed from variables with special characteristics (such as that all variables contain integral numbers only), the rounding error battle has been lost even before inversion is started, since the rounding error will commence from somewhat less than 10 digits out during inversion rather than somewhat less than 24 digits as could have been obtained.

¹An exception is an integral number such as 1, 12, 252, etc. An integral number is represented exactly by a single precision number; therefore, conversion to double precision of positive integers implies merely adding on a word of zeros. However, if division of one number by another is performed in single precision, the result will usually involve rounding to 10 places, with subsequent conversion to double precision giving only 10 place accuracy.
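The conversion pitfall just described is easy to demonstrate. In the sketch below (Python/NumPy as a stand-in for the CDC 3600's word formats), widening a single precision 1.1 to double precision keeps the single precision rounding error rather than recovering the lost digits:

```python
# Sketch: float32 -> float64 widening plays the role of the CDC 3600's
# single -> double conversion.  Widening copies the already-rounded value.
import numpy as np

x = 1.1                                  # not exactly representable in base 2
single = np.float32(x)                   # rounded to ~7 decimal digits
widened = np.float64(single)             # exact copy of the *rounded* value
true_double = np.float64(x)              # rounded to ~16 decimal digits

print(float(widened))      # not 1.1 -- the single-precision error is retained
print(float(true_double))  # prints 1.1
```

The widened value differs from the true double precision value in the eighth significant digit, exactly as the text's "10 digits out instead of 24 digits out" argument predicts (scaled to these word sizes).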
In the case of a computer such as the IBM 360, which has only a 32-bit word (approximately 7 digits if floating point arithmetic is used), the use of single precision for some operations will have an even more pronounced effect on rounding error.

Multiplication of two matrices or of a matrix and a vector is often regarded as a safe operation as compared to inversion, but one should be reminded that the formation of an element of the product requires many multiplications and additions if the matrices are large. If the numbers vary substantially in size, rounding error can be severe in even such a simple operation.

2. Standardization of variables

a. Deviations from means

The size of the mean of a variable has considerable effect on rounding error. Consider the following two variables:

        x1              x2
    100001.2763        1.2763
    100002.7816        2.7816
    100001.1471        1.1471
    100003.0278        3.0278

If x1 is used as an explanatory variable in a stochastic equation containing an overall constant coefficient and the coefficients of the equation are estimated by any of the simultaneous stochastic linear equations methods, the coefficient corresponding to x1 will be the same as the coefficient which would be obtained if x1 were replaced by x2; only the overall constant coefficient would change if x1 were replaced by x2. However, sums of squares and cross-products can be calculated more accurately for x2 than for x1. Let us assume that the mantissa of each floating point number can contain 10 digits. Then each squared observation of x1 will contain no digits to the right of the decimal point, whereas each squared observation of x2 will contain 9 digits to the right of the decimal point. Similarly, cross-products of x2 with other variables will contain more digits to the right of the decimal point than cross-products of x1 with other variables.

We may regard x2 as having been formed by subtracting 100000 from each observation of x1. Even more accuracy would be obtained
if x2 were formed by subtracting the mean of x1 from x1, since there would be even less superfluous information to the left of the decimal. The new x2 would again give the same coefficient (except for less rounding error in its calculation) as either x1 or the original x2. (Again, the overall constant coefficient would change.)

So far, only the effects of the actual squaring and cross-product operations have been considered. Summing operations in the formation of the sums of squares, sums of cross-products, and sums of the original variables also contribute to rounding error, and again less rounding error would be produced during these summing operations if x2 were used instead of x1.

To illustrate the effect of summing on rounding error we will use a different example. To simplify our evaluation we will assume that each floating point number contains exactly 5 decimal digits in its mantissa and that its exponent is carried to the base 10. Consider forming the sum of a variable, x3, consisting of 100 observations with the first 50 observations assuming the value 10001 and the last 50 observations assuming the value 10003. The sum of the first 9 observations is 90009. When the 10th observation is added, 10001·10¹ is obtained (i.e., the mantissa of the number is 10001 and the exponent is 1). When the 11th observation is added we get 10001·10¹ + 10001·10⁰ = 10001·10¹ + 1000·10¹ = 11001·10¹. (To add two positive floating point numbers their exponents are first equalized. This is accomplished by dividing the mantissa of the smaller number by 10^a where a is the difference between the exponents of the two numbers.) In like fashion, as each of the observations which follow is added into the sum, the 1 or 3 is lost from the right hand side due to the equalization of exponents. Thus, after adding in the 99th observation we have 99001·10¹.
When the 100th observation is added we get 99001·10¹ + 10001·10⁰ = 99001·10¹ + 1000·10¹ = 10000·10², whereas the exact sum is 10002·10². If x4 were formed as x_t4 = x_t3 − 10000, x4 would be 1 for the first 50 observations and 3 for the last 50 observations. The sum of x4 would be accurately obtained as 200 (which is represented as 20000·10⁻²). The effect of summing on rounding error has been illustrated for a variable, but a similar effect is obtained if x3 is the square of a variable or the cross-product of two variables during the formation of a sums of squares and cross-products matrix.

Let us assume that an overall constant coefficient appears in each stochastic equation in the subsystem to be estimated by any of the simultaneous stochastic linear equations methods. (We will consider the case of a stochastic equation containing no overall constant coefficient further on.) Let M be the matrix of sums of squares and cross-products of all variables in a problem, that is:

(IX.1)    M = [m_ij] ,    m_ij = Σ_{t=1}^T x_ti x_tj ,

and let A be the matrix of sums of squares and cross-products of the deviations of all variables in the problem from their means, that is,

(IX.2)    A = [a_ij] ,    a_ij = Σ_{t=1}^T (x_ti − x̄_i)(x_tj − x̄_j) = Σ_{t=1}^T x_ti x_tj − x̄_i Σ_{t=1}^T x_tj .

[For a given problem, the two definitions of a_ij may not coincide due to rounding error; however, we are only defining the matrix A at this point. In section IX.A.2.c we will consider a more accurate formula than either of the formulas for a_ij in (IX.2).] Also, let us define the variable x0 as a variable which assumes the value 1 for all observations.
Then the same estimates (except for rounding error) are obtained if (1) the variable x0 is explicitly included in each equation and the M matrix is used as the Z'Z matrix in the computational formulas given in parts I and II of this paper, as if (2) the variable x0 is omitted from each equation, the A matrix is used as the Z'Z matrix in the computational formulas of parts I and II, and the overall constant coefficient for each equation is calculated as:

(IX.3)    â_u0 = ȳ_u − z̄'_u â_u ,

where â_u0 is the overall constant coefficient for equation u (the coefficient corresponding to variable x0), ȳ_u is the mean of the normalizing variable for the uth equation, z̄'_u is a row vector of means of the explanatory variables of the uth equation (not including x0), and â_u is the vector of coefficients of the uth equation (not including the normalizing coefficient, −1, and â_u0).

If the A matrix is used as the Z'Z matrix, then the disturbance variance-covariance matrix, the coefficient variance-covariance matrix, and statistics calculated from these matrices are calculated by the formulas of parts I and II except that (1) the constant coefficient is not included in the formulas and (2) if a "degrees of freedom" adjustment is made, the "degrees of freedom" should take account of the implicit overall constant coefficient. For example, the disturbance variance-covariance matrix may be estimated as:

    S = (1/T) Â'AÂ ;

however, since A is calculated from deviations from means, the overall constant coefficients are omitted from the Â matrix.
A degrees of freedom adjustment of T/√((T−n_u)(T−n_u′)) can be made in the S matrix as before, but here n_u and n_u′ each include the overall constant coefficient (i.e., if n_u and n_u′ reflect the number of explanatory variables not including x0, they are incremented by 1 in adjusting for degrees of freedom).¹

¹It is convenient to take this adjustment into account in the subroutines which output the variance-covariance matrices and related statistics rather than in the main computational section of the program. This is easily accomplished by setting a variable in COMMON to 0 if the M matrix is used and 1 if the A matrix is used and subtracting this variable from the "degrees of freedom" whenever a degrees of freedom calculation is made. It is also convenient to calculate overall constant coefficients by (IX.3) in the coefficient output subroutines, rather than in the main program. An overall constant coefficient is calculated and printed out along with the other coefficients whenever the aforementioned variable in COMMON is 1.

b. Uniform scaling

It is sometimes (incorrectly) thought that if all numbers are carried in floating point form, the scaling of a variable up or down will have little substantive effect on the calculations, since the main effect of the scaling is the changing of exponents which designate where the decimal point occurs for each observation of the variable. The scaling of a variable does have a considerable effect on rounding error due to the effect of addition and subtraction of floating point numbers.

To add two positive floating point numbers, the exponents are first equalized. This is accomplished by shifting the mantissa of the smaller number to the right (dividing it by powers of 2) and adding to the exponent of this number until the exponents are equalized.
What is left of the mantissa of the smaller number is added to the mantissa of the larger number, and the resulting number is then normalized to eliminate leading zeros. Subtraction is performed in a similar manner. Thus, when addition and subtraction are performed, the sizes of the numbers involved are important. Since addition and subtraction are an integral part of the operations for all simultaneous equations calculations, rounding error is affected by the scaling of variables.

The basic information used from a set of variables is often contained in the matrix of sums of squares and cross-products of the deviations of the variables from their means [the A matrix (IX.2) of the previous section]. If some variables are scaled high and some low, the size of the elements of the A matrix may vary greatly in magnitude. Thus, if a typical element of the deviation from the mean of one variable is about 1000 and a typical element of the deviation from the mean of another variable is about .01, the diagonal element of the A matrix of the first variable will tend to be about (1000)²/(.01)² = 10,000,000,000 times as large as the diagonal element of the second variable. Control of rounding error in addition and subtraction is very difficult with such widely differing magnitudes of variables.

The above does not imply that the user of a computer program must try to scale all of his variables to the same magnitude, as this may impose a considerable burden on the user, both in setting up his data and in interpreting his results. Also, the magnitude of variables created in the computer by a prior transformation or editing procedure may be hard to predict. The above considerations suggest that the computer program should be sophisticated enough to handle automatically the scaling of data (and subsequent descaling of results) for the user. Automatic uniform scaling of variables is easily accomplished.
One method of accomplishing uniform scaling is to form the A matrix and then normalize the A matrix so that each diagonal element is 1. This is accomplished by multiplying the elements of row i and the elements of column i by d_i where

(IX.4)    d_i = 1/√(a_ii) .

In matrix notation the operation may be represented as

(IX.5)    A* = DAD

where A* is the A matrix normalized to 1 on the diagonals and D is a diagonal matrix whose ith diagonal element is d_i.

Let x*_ti = d_i(x_ti − x̄_i). Then (ignoring rounding error) A* is the matrix of sums of squares and cross-products of the x*_i variables. A characteristic of the A* matrix is that each x*_i variable has length 1--a very convenient normalization of variables.¹ Since x*_i has mean zero and length 1, it is often referred to as a standardized variable. The A* matrix is the usual simple (Pearson product-moment) correlations matrix.

¹The length of a vector, x_i, is defined as √(x_i'x_i) = √(Σ_{t=1}^T x_ti²).

All calculations are performed on the A* matrix in the same manner as if it were the A matrix. Thus, many statistics such as coefficient estimates are carried in normalized form in the computer, thereby reducing rounding error in many calculations involving the coefficients. Only when statistics are printed out must those statistics which are affected by a change of scale in the variables be denormalized. It is usually convenient to do the denormalization in output subroutines, thereby leaving the coefficients (or other statistics) in normalized form in the computer in case they are used for subsequent calculations.²

One might pose the question--why not instead normalize the A matrix by multiplying rows and columns by powers of 10? The reasons are (1) it is as easy to normalize in the fashion indicated as by a
power of 10, since a power of 10 is no more convenient than a number such as 1/√(Σ_{t=1}^T (x_ti − x̄_i)²) for the computer, (2) the knowledge that each standardized variable has length 1 (the diagonal elements of the A* matrix are 1) is convenient in deriving certain computational formulas, and (3) the statistics affected by scaling variables will be completely denormalized before printing them out anyway.¹

The normalization (or standardization) which we are imposing on the coefficients through normalization of the variables² should not be confused with the normalization imposed on the coefficients of an equation by setting the coefficient of the "normalizing variable" to −1. The normalization to −1 is required for determinateness of the coefficients, whereas the normalization that we have been discussing in this section results in coefficients which are compatible with the normalized variables (and are therefore independent of scale of the original variables).

²The AES STAT package uses one output subroutine for printing coefficients estimated by single equation methods, one output subroutine for printing coefficients estimated by multiple equations methods, one output subroutine for variance-covariance matrices (either disturbance or coefficient, and in either denormalized form or further normalized so that 1's appear on the diagonal of the variance-covariance matrix as well), and one output subroutine to calculate and print estimated coefficient standard errors, coefficients divided by coefficient standard errors, and coefficient variances. All denormalizations are handled in the output subroutines, thereby leaving the normalized coefficients and variance-covariance matrices unmodified in the computer.
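The normalization of (IX.4)-(IX.5) can be sketched as follows (Python/NumPy rather than the AES STAT FORTRAN; the data and names are illustrative only). The sketch confirms that A* has a unit diagonal and is exactly the simple correlation matrix, regardless of how wildly the columns are scaled:

```python
# Sketch of (IX.4)-(IX.5): A is the moment matrix of deviations from means,
# d_i = 1/sqrt(a_ii), and A* = DAD is the correlation matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) * [1000.0, 0.01, 5.0, 1.0]  # wildly scaled columns

dev = X - X.mean(axis=0)             # deviations from means
A = dev.T @ dev                      # the A matrix of (IX.2)
d = 1.0 / np.sqrt(np.diag(A))        # normalization elements (IX.4)
D = np.diag(d)
A_star = D @ A @ D                   # (IX.5): unit diagonal, scale-free

print(np.diag(A_star))               # all 1's
print(np.round(A_star, 4))           # the simple correlation matrix
```

Because A* is unaffected by the column scalings, every statistic computed from it is "uniformly scaled" in the sense of the text, and only the output stage needs the d_i to denormalize.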
If normalization of the A matrix is accomplished in the computer before computation of statistics is started, all computed statistics (before printing them out) will be uniformly scaled; that is, scaling of variables up or down by the user will have no effect on the basic A* matrix and, therefore, on the intermediate matrices and final statistics calculated. This uniform scale characteristic is very convenient in determining the rank of a matrix, whether a matrix being inverted is singular, whether certain variables are linear combinations of other variables during orthogonalization, the degree of convergence of an iterative procedure, etc.

¹Normalization by a power of 2 is not advantageous either, since special programming would be required to adjust the exponent directly, and the extra operations would in general require more time than the use of direct multiplication for normalization and denormalization.

²In DLS estimation the normalized coefficients are often referred to as beta weights.

Zellner and Thornber state:

    "A simple and relatively inexpensive check which we recommend is to perform calculations several times with the raw data scaled differently each run. Should the resulting sets of estimates differ, it is probable that computational errors are a problem. When faced with such results, an investigator might resort to higher precision arithmetic and/or ponder whether the information in his sample is adequate to provide even moderately good estimates of all of the parameters of his model."¹
The writer concurs with the importance which Zellner and Thornber place on proper scaling under the assumption that the routine will not automatically scale the variables by normalizing the A matrix; however, even if several arbitrary scalings of the raw data are tried for a problem, none are likely to give as good a result as the normalization to length 1 of all deviations from means of variables which will be automatically accomplished by the procedure outlined earlier. Thus, although by normalizing the A matrix we have given up being able to affect the calculation by rescaling variables, we can expect to end up ahead rather than behind, and considerably more conveniently (from the standpoint of the user).²

¹Zellner and Thornber [1966], p. 728.

²Also, to implement the suggestion of Zellner and Thornber, one must decide how much to rescale individual variables. If the alternative scalings are sufficiently pathological, differing estimates will surely be obtained.

c. Improving the estimates of sums, means, and the standardized moment matrix

In section IX.A.2.a we noted that if the mean of a variable is subtracted from each observation of the variable, a computer word can contain more meaningful information regarding the sum of the variable, the sum of squares of the variable, and the sum of cross-products of the variable with another variable. Let us also note the effect of rounding error on the calculation of means and a procedure for improving the accuracy of computed means, sums, and the standardized moment matrix.

First, let us define the set of original variables as the x_i and define a set of variables, the y_i, corresponding to the x_i as:

(IX.6)    y_ti = x_ti − m_i ,    t = 1,...,T

where m_i is a constant subtracted from each observation of x_i in the formation of the corresponding variable y_i. A desirable choice for an m_i is the mean of the corresponding x_i, but we will not restrict an m_i to be an exact mean of the corresponding x_i.
For example, an m_i might be a grossly inaccurate estimate of the mean of its corresponding x_i. From (IX.6) we obtain:

(IX.7)    x_ti = y_ti + m_i ,    t = 1,...,T ;

hence:

(IX.8)    Σ_{t=1}^T x_ti = Σ_{t=1}^T y_ti + T m_i

(IX.9)    x̄_i = (1/T) Σ_{t=1}^T x_ti = ȳ_i + m_i .

If m_i is an approximation to the mean of x_i, we can expect to compute x̄_i much more accurately as ȳ_i + m_i than as (1/T) Σ_{t=1}^T x_ti, and we can expect to form the sum more accurately as Σ_{t=1}^T x_ti = Σ_{t=1}^T y_ti + T m_i, i.e., by (IX.8).

The arithmetic mean of the x_ti is, by definition, x̄_i = (1/T) Σ_{t=1}^T x_ti. However, in calculating the mean by this definitional formula, we often do not get x̄_i exactly. Let m_i be the number thus actually obtained as an approximation to the mean, and let e_i be the rounding error, so we have m_i − x̄_i = e_i, where m_i is a known (calculated) quantity but, in general, the exact values of x̄_i and e_i may not be known. Now consider another approximation to x̄_i, say m*_i, obtained as follows: Compute

(1) y_ti = x_ti − m_i, t = 1,...,T;

(2) an approximation to the mean of the y_ti, obtained by applying the definitional formula ȳ_i = (1/T) Σ_{t=1}^T y_ti; let q_i be the result of this calculation, and f_i the corresponding rounding error, so we have q_i − ȳ_i = f_i;

and (3) m*_i = q_i + m_i.

The resulting error, i.e., the difference between m*_i and x̄_i, is

    m*_i − x̄_i = q_i + m_i − x̄_i
               = ȳ_i + f_i + m_i − x̄_i
               = x̄_i − m_i + f_i + m_i − x̄_i
               = f_i .

Thus, m*_i will be a better approximation to x̄_i than m_i is whenever the magnitude of f_i is less than that of e_i. Now the magnitude of a rounding error is, on the average, roughly proportional to the magnitude of the quantity being rounded off. (For example, in the decimal system, if n significant digits are carried throughout, the average magnitude of rounding error is roughly .5 × 10⁻ⁿ times the magnitude of the number rounded off.)
Since ȳ_i is always close to zero, its magnitude is generally much smaller than that of x̄_i (unless x̄_i is already very close to zero); hence the magnitude of f_i is generally much smaller than that of e_i, and m*_i is generally a much better approximation to x̄_i than is m_i.

To illustrate the improvement which may be obtained through use of (IX.8) and (IX.9), we will return to one of the examples in section IX.A.2.a. In this example we used a variable, x3, consisting of 100 observations, the first 50 observations assuming the value 10001 and the last 50 observations assuming the value 10003. We also assumed that each floating point number contains exactly 5 decimal digits and that its exponent is carried to the base 10. We then computed the sum of the variable to be (Σ_{t=1}^T x_t3)_1st pass = 10000·10². Thus,

    m_3 = (Σ_{t=1}^T x_t3)_1st pass / T = 10000·10² / 100 = 10000

    y_t3 = x_t3 − m_3 = 10001 − 10000 = 1  for t = 1,...,50
                        10003 − 10000 = 3  for t = 51,...,100

    Σ_{t=1}^{100} y_t3 = 200

    ȳ_3 = 2

    (Σ_{t=1}^T x_t3)_2nd pass = Σ_{t=1}^T y_t3 + T m_3 = 200 + 100·10000 = 1000200

    x̄_3 = ȳ_3 + m_3 = 2 + 10000 = 10002 .

Thus, by computing the sum and mean of x3 in two passes, we obtained the more accurate sum 1000200 and the more accurate mean 10002.

In a Monte Carlo study, Neely [1966] used an m_i of (1/T) Σ_{t=1}^T x_ti and compared computations of the means of variables through use of the formulas ȳ_i + m_i, (1/T) Σ_{t=1}^T x_ti, and some additional formulas. Neely found that the formula ȳ_i + m_i performed at least as well as the other formulas he tried and was superior to the direct computation (1/T) Σ_{t=1}^T x_ti.

In section IX.A.2.a we noted that the simple correlations matrix A* is a convenient normalized moment matrix to substitute for the Z'Z matrix in the formulas given in parts I and II and that this matrix has desirable properties for such a substitution. The same A* is obtained if the y_i are substituted for the x_i, except that the A* matrix for the y_i will in general be formed more accurately.
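The two-pass computation of (IX.6)-(IX.9) can be sketched by reproducing the x3 example above in simulated 5-significant-digit decimal arithmetic (Python's decimal module standing in for the hypothetical 5-digit machine; add5 and the other names are illustrative only):

```python
# Sketch of the two-pass scheme: a 5-digit decimal context plays the role
# of the text's 5-digit mantissa with a base-10 exponent.
from decimal import Decimal, Context, ROUND_HALF_EVEN

ctx = Context(prec=5, rounding=ROUND_HALF_EVEN)   # 5 significant digits

def add5(a, b):
    return ctx.add(a, b)                          # rounded 5-digit addition

x3 = [Decimal(10001)] * 50 + [Decimal(10003)] * 50

# First pass: the exact sum 1000200 is unrepresentable in 5 digits.
s1 = Decimal(0)
for v in x3:
    s1 = add5(s1, v)
print(s1)            # 1.0000E+6, i.e. 10000*10**2 as in the text

# Second pass: subtract the first-pass mean m3 and sum the small residuals.
m3 = ctx.divide(s1, Decimal(100))                 # = 10000, the first-pass mean
s_y = Decimal(0)
for v in x3:
    s_y = add5(s_y, ctx.subtract(v, m3))          # residuals are 1 or 3
total = add5(s_y, ctx.multiply(Decimal(100), m3)) # (IX.8): 200 + 1000000
mean = add5(ctx.divide(s_y, Decimal(100)), m3)    # (IX.9): 2 + 10000
print(total)         # 1.0002E+6, the exact sum 1000200
print(mean)          # 10002
```

The residuals are small enough that their sum is exact, so the second pass recovers the sum 1000200 and the mean 10002 that one-pass accumulation lost.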
To denormalize the statistics affected by the normalization used, we must note what normalization elements [see (IX.2) and (IX.4)] are implied by the substitution of the y_i for the x_i. If we had used the x_i directly, we would use 1/√(Σ_{t=1}^T x_ti² − x̄_i Σ_{t=1}^T x_ti) as the normalization elements. Now note that

(IX.10)    Σ_{t=1}^T x_ti² − x̄_i Σ_{t=1}^T x_ti = Σ_{t=1}^T (x_ti − x̄_i)²
                                                = Σ_{t=1}^T [(y_ti + m_i) − (ȳ_i + m_i)]²
                                                = Σ_{t=1}^T (y_ti − ȳ_i)²
                                                = Σ_{t=1}^T y_ti² − ȳ_i Σ_{t=1}^T y_ti ;

hence, we can use the same denormalization elements as if the y_i had been the original variables (the x_i).

In his paper, Neely [1966] also compared computations of simple correlations matrices through use of the y_i [with an m_i of (1/T) Σ_{t=1}^T x_ti] with computations through direct use of the x_i and with computations using other formulas. Neely found that the usual simple correlation formula, but using the y_i in place of the x_i, performed at least as well as the other formulas he tried and was superior to the direct use of the x_i.

The preceding implies that, in general, accuracy can be increased by making two passes through the data.¹ In the first pass, an approximation of the mean of each variable is obtained (this approximation is improved the second pass), and in the second pass the sums of squares and cross-products matrix of the deviations of the variables from their approximate means is formed and the sums of the deviations of the variables from their approximate means are obtained.² Finally, (1) the simple correlations matrix is formed from this newly formed sums of squares and cross-products matrix (ignoring the fact that the means and sums of the variables from which it was formed are approximately zero) and (2) the approximate sums and the approximate means of the x_i variables are adjusted by (IX.8) and (IX.9) to provide more accurate computations of these quantities.
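The identity (IX.10) is easy to check numerically. In the sketch below (Python/NumPy; the data are illustrative only), the denormalization elements computed from the x's and from the y's agree:

```python
# Numerical check of (IX.10): sums of squares of deviations are the same
# whether computed from the x's or from y = x - m for any constant m.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10000.0, scale=2.0, size=500)
m = 9998.0                      # a rough approximation to the mean
y = x - m

lhs = np.sum(x**2) - x.mean() * np.sum(x)        # from the x's
rhs = np.sum(y**2) - y.mean() * np.sum(y)        # from the y's
direct = np.sum((x - x.mean())**2)               # the middle member of (IX.10)
print(lhs, rhs, direct)                          # all agree up to rounding
```

Note that the left-hand computation subtracts two numbers near 5·10¹⁰ to obtain a result near 2·10³, which is exactly the cancellation the y-based computation avoids; the identity says the d_i may nevertheless be taken from either form.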
Provision should be made for the user to specify a set of approximate means, and he should be allowed to specify that only one pass be made through the data. For example, it is planned that the AES STAT system will be modified so that two passes are made through the data unless the user directs that only a single pass be made through use of the ONEPASS code on the SSCP card. The user will also be permitted to specify his own approximate means (m_i) for any variables during that pass. Use of the ONEPASS code and specification of no approximate means by the user implies that only a single pass will be made through the data with all m_i equal to zero, i.e., the usual simple correlation formulas will be used.

¹Longley [1967] contains a report on rounding error by many well-known DLS routines. Longley documents the improvement in accuracy obtained through making two passes through the data and the desirability of using accurately computed simple correlation matrices and accurately computed denormalization elements in the computation of DLS problems.

²It is convenient to incorporate the extra variable x0 (a variable assuming the value 1 for all observations) whenever a sums of squares and cross-products matrix is formed. The sums of the variables used in forming the matrix are then automatically calculated as the elements corresponding to x0.

d. Adjustments if no overall constant coefficient

If no overall constant coefficient is to be included in an equation, the sums of squares and cross-products matrix [the M matrix defined by (IX.1)], normalized in some fashion, is used as Z'Z in the formulas given in parts I and II of this paper rather than the simple correlations matrix [the A* matrix defined by (IX.5), (IX.4), and (IX.2)].
From (IX.2) we recall that:

(IX.11)    a_ij = m_ij − x̄_i Σ_{t=1}^T x_tj ;

hence, m_ij may be formed as:

(IX.12)    m_ij = a_ij + x̄_i Σ_{t=1}^T x_tj .

If the x̄_i and the Σ_{t=1}^T x_tj are normalized by the same normalization as is used for the A matrix in forming the A* matrix [i.e., d_i x̄_i and d_j Σ_{t=1}^T x_tj are formed, where the d_i are the normalization elements defined by (IX.4)], then the M matrix normalized (say M*) may be formed directly from the A* matrix as:

(IX.13)    m*_ij = a*_ij + (d_i x̄_i)(d_j Σ_{t=1}^T x_tj) ,

where M* = [m*_ij]. This normalization is based on the deviations of each variable from its mean having length 1 rather than each variable having length 1; hence, the elements on the diagonal of M* will not be 1 as is the case for A*. One could do an additional normalization by multiplying the ith row and the ith column by c_i, where c_i = 1/√(m*_ii), so that each variable will have length 1 (1's will appear on the diagonal of the M* matrix); however, if this is done, statistics affected by the normalization must be denormalized based on a normalization of c_i d_i rather than d_i. If a special normalization is used for the M* matrix and some estimates are based on the A* matrix and other estimates are based on the M* matrix, additional bookkeeping must be kept by the computer routine regarding which normalization was used to compute a set of statistics used in a later step (e.g., DLS coefficients may be used as starting estimates for FIML), or the statistics based on one of the normalizations must be renormalized as they are stored. The advantages of using a special normalization for the M* matrix would not, in general, appear to warrant the additional programming required.
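Formula (IX.13) can be sketched as follows (Python/NumPy; data and names illustrative only): M* is recovered from A* plus the normalized means and sums, and matches the result of forming M = X'X directly and normalizing it with the same d_i:

```python
# Sketch of (IX.13): recover the normalized raw-moment matrix M* from the
# correlation matrix A* without ever forming M = X'X directly.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=[5.0, -3.0, 50.0], scale=[1.0, 0.5, 10.0], size=(40, 3))

sums = X.sum(axis=0)
means = X.mean(axis=0)
dev = X - means
A = dev.T @ dev                                  # (IX.2)
d = 1.0 / np.sqrt(np.diag(A))                    # (IX.4)
A_star = np.diag(d) @ A @ np.diag(d)             # (IX.5)

# (IX.13): m*_ij = a*_ij + (d_i * mean_i)(d_j * sum_j)
M_star = A_star + np.outer(d * means, d * sums)

print(np.round(M_star, 4))
```

Only the normalized means d_i x̄_i and normalized sums d_j Σ_t x_tj need be carried alongside A*, which is why the text can extract just the needed portion of M* for a given phase of a problem.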
Rather than form the M* matrix directly, or for that matter form it indirectly once and for all and hold it in the computer memory or on some auxiliary storage device, it is much more convenient (and in many cases, saving of computer time) to form from the A* matrix the portion of the M* matrix needed to calculate a given phase of a problem.

It is planned that the AES STAT system will be revised so that it is assumed that an overall constant coefficient is to be included in each equation unless the user specifies a NOCON (for no overall constant coefficient) code on the relevant STAT control card. For example, k-class estimates will be based on the A* matrix and an overall constant coefficient will automatically be calculated (hence, the x0 variable need not be specified on a given K card) unless the NOCON code appears on the K card.¹ If the NOCON code appears on the K card, only the part of the A* matrix corresponding to variables appearing on the K card will be extracted from the A* matrix. The M* matrix corresponding to these variables will then be formed in the manner given by (IX.13) and used as the Z'Z matrix in the calculation of the k-class problem.

¹The K card is discussed in sections IX.3 and X.x.

3. Use of simultaneous equations solutions

Klein and Nakamura state, "In the evaluation of Y'X(X'X)⁻¹X'Y we find it more efficient and accurate, from a computational point of view, to calculate (X'X)⁻¹X'Y (as the solution vectors of sets of simultaneous equations) and then premultiply by Y'X. Criteria for success are judged by the positive definiteness and symmetry of [Y'Y]·x. Also inconsistencies in subsequent dependent calculations indicate arithmetic errors at this stage."¹ Calculation of Y'X(X'X)⁻¹X'Y as Y'Y − [Y'Y]·x, where [Y'Y]·x is obtained by direct orthogonalization (see section I.D.2), is more accurate yet; however, Klein and Nakamura's basic point is well taken.
If the result of an inverse times a matrix or vector is desired, this result can be computed more accurately by treating the calculation as a "solution to a set of simultaneous equations" rather than by actually forming the inverse and performing the matrix or vector multiplication. A slight problem presents itself when the inverse of the matrix is desired in its own right, such as for the calculation of variance-covariance estimates; however, this problem can be resolved by using a subroutine which calculates the inverse and the simultaneous equations solution at the same time.

1Klein and Nakamura [1962], p. 287. Klein and Nakamura used the notation M_yx M_xx^-1 M_xy instead of Y'X(X'X)^-1X'Y, M_xx^-1 M_xy instead of (X'X)^-1X'Y, M_yx instead of Y'X, and M_Δ instead of the [Y'Y]⊥X used in the above quote.

For either inversion or simultaneous solution, the selection of the largest element of the remaining submatrix to be operated on at each step as the pivot for the step considerably reduces rounding error for many problems.1

1For positive definite or positive semi-definite matrices, the largest element of the remaining matrix to be operated on at each step will occur on the diagonal of the remaining sub-matrix.

4. Direct orthogonalization

The use of direct orthogonalization in various phases of the computation has been emphasized throughout this paper. (A method for accomplishing direct orthogonalization is given in section I.D.2.) Here we note that direct orthogonalization may considerably reduce rounding error as compared to calculation by the usual method, e.g., by Y'Y − Y'X(X'X)^-1X'Y. Also, matrices of the form Y'X(X'X)^-1X'Y may be calculated more accurately as Y'Y − [Y'Y]⊥X.

The rearrangement of rows and columns at each step so that the pivot is selected as the largest element of the remaining sub-matrix (this largest element will occur on the diagonal of the remaining sub-matrix) will considerably reduce rounding error for many problems.
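Klein and Nakamura's point, and the orthogonalization identity just noted, can be illustrated with a small numerical sketch. The data are fabricated for illustration, and np.linalg.solve merely stands in for the simultaneous-equations subroutine described in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
Y = rng.standard_normal((50, 2))
XtX, XtY = X.T @ X, X.T @ Y

B_inverse = np.linalg.inv(XtX) @ XtY   # form the inverse, then multiply
B_solve = np.linalg.solve(XtX, XtY)    # solve the equations X'X B = X'Y
assert np.allclose(B_inverse, B_solve)

# Y'X(X'X)^-1 X'Y, obtained by premultiplying the solution by Y'X:
cross = XtY.T @ B_solve
assert np.allclose(cross, cross.T)                 # symmetry check
assert np.all(np.linalg.eigvalsh(cross) > -1e-8)   # positive semi-definite

# The same matrix via residuals: with R = Y - X B, R'R = [Y'Y] after
# sweeping out X, so Y'Y - R'R reproduces the cross matrix.
R = Y - X @ B_solve
assert np.allclose(Y.T @ Y - R.T @ R, cross)
```

The symmetry and positive semi-definiteness assertions correspond to the "criteria for success" in the Klein-Nakamura quotation.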
5. Iterative techniques

There seems to be a fairly widely held view that rounding error builds up as iteration continues in the calculation of certain solutions. In particular, it is often thought that, even though rounding error is small in the computations performed for a single iteration, FIML coefficients may be subject to quite a bit of rounding error if quite a few iterations are required for convergence. Now, continued iteration may cause difficulty in some iterative procedures, and formulas could certainly be devised which would cause rounding error to build up in the calculation of FIML coefficients; however, rounding error will not build up during iteration if the formulas given in this paper are used. This is because during a single iteration, all calculations are based only on a matrix of sums of squares and cross-products (which does not change as iteration progresses) and on the set of coefficients obtained from the last iteration. In order to move to a higher likelihood, adjustments to these coefficients are calculated and added to the coefficients, and a new iteration is started. If the increment for a coefficient is in the wrong direction, the procedure itself will correct the coefficient at some later step (assuming that rounding error is small for the computations performed in any single iteration), as the likelihood will not be maximized until the coefficient is corrected. Thus, assuming that convergence is obtained to a desired degree of accuracy, the particular coefficients obtained are a function of the matrix of sums of squares and cross-products and the assumed structure (neither of which change as iteration progresses) rather than the intermediate coefficients and matrices obtained in the process of iterating.
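The self-correcting character of such an iteration can be demonstrated with a small sketch. The scheme below is not the FIML procedure itself, only an analogous iteration in which every step uses the fixed cross-products X'X and X'y plus the current coefficient vector; a deliberately wrong increment injected mid-way is repaired by later iterations, since the fixed point depends only on the cross-products:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = rng.standard_normal(200)
XtX, Xty = X.T @ X, X.T @ y          # fixed; never recomputed while iterating

beta = np.zeros(3)
step = 1.0 / np.linalg.eigvalsh(XtX).max()   # step size small enough to converge
for it in range(2000):
    beta = beta + step * (Xty - XtX @ beta)  # adjustment toward the solution
    if it == 500:
        beta[0] += 10.0                      # simulate an increment in the
                                             # wrong direction at one iteration

# The perturbation has been corrected; beta solves the normal equations.
assert np.allclose(beta, np.linalg.solve(XtX, Xty), atol=1e-6)
```

The final coefficients depend only on XtX and Xty, not on the erroneous intermediate value, which is the point made in the paragraph above.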
1If the likelihood has multiple local maxima, the particular intermediate coefficients will have an influence on which maximum is obtained; however, given that we end up on a particular peak, continued iteration will not cause the coefficients which maximize the likelihood in that region to be estimated less accurately.

B. Free Field Interpretive Parameters

A very common way to give instructions to a computer routine is by a series of parameter cards containing information punched by the user in essentially the following form:

Card 1:
  Cols. 5- 7:  Number of rows in the matrix
  Cols. 8-10:  Number of columns in the matrix
  etc.

Cards 2-10:
  Cols. 4- 5:  Number of elements on the card
  Cols. 7- 8:  First variable number
  Cols. 9-10:  First column number
  Cols. 11-20: First element
  Cols. 21-22: Second variable number
  Cols. 23-24: Second column number
  Cols. 25-34: Second element
  etc.

Card 11:
  Col. 5:  Punch a 1 if 3SLS estimates are to be calculated from LIML coefficients. Leave blank if 3SLS estimates are to be calculated from 2SLS coefficients.
  etc.

The use of this type of parameter card causes an unnecessarily large number of errors. Here a number of cards must be prepared with particular information going in particular columns, the cards differing somewhat from each other, and no parameter being expressed in a form natural to the user. It is very easy (and common) to get some of the information punched in the wrong columns. Now, some of the mispunched information may be detected by the routine as being impossible. This case may cause computer time to be wasted and results to be delayed; however, this is not the most serious problem. The serious problem is that with this type of parameter card, it is very easy to mispunch a card in such a way that the parameters still make sense to the routine. An answer is obtained, but it is the answer to a different problem than the user intended. Often the user will then go ahead and use the results.
One might be comforted by the thought that the probability is very low that (1) a card will be mispunched by a careful user who double-checks everything, and (2) if a mispunch occurs, it will appear correct to the routine, and (3) if the mispunch occurs and the routine accepts it, the answer will be within a range that a user skilled in the method will accept as correct. If so, one is falsely comforted. The probability is quite high.1

The more usual situation is one in which a user with little knowledge of the importance of checking his parameters and data calculates a problem by one or more of the simultaneous stochastic equations estimating procedures. Often the user has had little experience in interpreting results from the methods (after all, we have to start sometime) and so he expects almost anything. The probability of getting an answer to a different problem than the one he actually has in mind (using the above type of parameter cards) is even higher than for a user who is aware of the need for double checking and has had much experience in checking results. Often the user makes a comparison of a series of models. In this case the probability must approach 1 that he will get some incorrect results due merely to mispunched parameters.

1The writer has had occasion to take results of problems calculated by researchers who are far above average in carefulness of preparing cards and skill in interpreting results, but he has rarely obtained the same answer on his routines. Further checking usually disclosed an incorrect number punched or a code or number punched in the wrong column(s). The researchers had already reported their incorrect results.

One way to reduce the probability of misspecifying the parameters is to use an interpretive form of parameter card that is free-field, i.e., no numbers have to be punched in particular columns.
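A minimal sketch of free-field interpretation follows, in Python rather than the package's FORTRAN. The card syntax is modeled loosely on the AES STAT examples discussed below; the parsing rules and function name are the writer's illustrative assumptions, not the actual AES STAT scanner:

```python
import re

def parse_card(card):
    """Parse a free-field control card of the hypothetical form
    NAME(args)CODE,CODE=value,...  Blanks are permitted anywhere and no
    field is tied to particular columns; an unrecognizable card is an
    error the routine can detect, unlike a number in the wrong column."""
    card = card.replace(" ", "")                    # blanks permitted anywhere
    m = re.match(r"([A-Z]+)\(([^)]*)\)(.*)", card)
    if m is None:
        raise ValueError("unrecognized card: " + card)
    name, args, rest = m.groups()
    codes = {}
    for tok in filter(None, rest.split(",")):
        key, _, val = tok.partition("=")            # e.g. MAXE=8 vs. bare TRANS
        codes[key] = val if val else True
    return name, args.split(","), codes

name, args, codes = parse_card("SSCP (1 - 15) TRANS, RES, MAXE = 8")
assert name == "SSCP" and codes["MAXE"] == "8" and codes["TRANS"] is True
```

A mispunched code such as TRENS would simply be an unknown key, which the routine can flag, illustrating point (2) of the list further on.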
In the accompanying listing of computer control cards and following explanatory paragraphs, interpretive free-field instructions to the computer are illustrated with the parameters and data which would be used to estimate Klein's model I by DLS, 2SLS, LIML, 3SLS, I3SLS, LML, and FIML on the AES STAT system. (A number of other methods are available in the AES STAT system, but this should be adequate to illustrate the parameters.) The results are reproduced in the last section (section IX.K) of this chapter.

[Facsimile listing of the AES STAT control cards, transformation cards, and data for Klein's model I; not legible in this copy.]

The SSCP card causes variables X1 through X15 to be formed into a sums of squares and cross-products matrix. Certain basic statistics regarding the variables and pairs of variables are automatically calculated and printed out. The TRANS code causes the data to be transformed as indicated by the transformation cards which are discussed in the next section. The RES code causes the transformed data from X1 through X15 and X20 to be saved for later calculation and labeling of residuals. The MAXE=8 (maximum equation number is equation 8) code sets up a coefficients pool for temporary storage and retrieval of coefficients. Quite a few additional codes are permitted on the SSCP
card to control data input, cause matrices to be printed out, etc. The SSCP card, like all of the other control cards, is completely free-field, blanks being permitted anywhere on the card.

The first K card sets up the basic matrices for any double k-class member for equation 1 with X1, X4, and X5 as jointly dependent variables; X0 and X12 as predetermined variables in the equation; and X7 through X10, X12, and X13 as additional instrumental variables. The codes following the K card specify the particular k-class members to be calculated. The 2SLS code designates that 2SLS coefficients are to be calculated and the LIML code designates that LIML coefficients are to be calculated. (Any specific k, double k, or h-class member or series of members can be calculated by specifying k, k1 and k2, or h. UBK coefficients can also be calculated by using a UBK code.) The particular statistics to be printed out are designated by codes (e.g., C for coefficients, STE for standard errors) following the codes telling which estimators to calculate. The SCE=1 code following the 2SLS code designates that the 2SLS coefficients are to be stored in a coefficients pool and labeled equation 1. The other two K cards are similar to the first in specifying statistics to be calculated for equations 2 and 3.
The READC cards are used to input the identity equations, and the 3SLS and FIML cards are used to calculate the 3SLS, I3SLS, LML, and FIML estimates. For iterating on the 3SLS coefficients in the calculation of the I3SLS coefficients, various stopping criteria have been specified. In this case iteration will stop when the first of the following occurs:

(1) The proportional change in all coefficients is less than .0000000001 (COG=1.0-10 code).

(2) The number of iterations exceeds 200 (MI=200 code).

(3) Iteration proceeds for more than 10 minutes (TIMEL=10 code).

In the computer output given further on it can be noted that the 1st criterion was satisfied in 42 iterations, with the total time being 12 seconds. The NTH=20 code causes the statistics listed after the code to be printed out each 20th iteration. In this way, the progress of iteration can be examined for problems requiring a large number of iterations.

The convergence or stopping criteria given for 3SLS are available on FIML, with all of the additional stopping criteria listed in section V.C.5 available for FIML as well. The NTH=10 code causes certain statistics specified by codes to be printed out each 10th iteration. Additional information regarding the form of the 3SLS and FIML cards is given in a later section (section IX.D) on the "coefficients pool."

The codes (and the entire form of the control cards) are quite arbitrary. The important features of the control cards just discussed are:

(1) The control cards are free-field; hence, there is no need to punch particular information in particular columns.

(2) An individual code may be mispunched just as a number may be mispunched into an incorrect column in the form of parameters given earlier; however, with the form of the codes given here, a mispunched code may be readily detected by the computer routine, whereas there may be no way that a number punched in the wrong column can be detected by the computer routine.
(3) The control cards are open ended. If a new feature is added to the routine, a new form of control card or a new code may easily be entered to permit the user to use the new feature. (With the other type of control cards, insufficient free columns may be available; hence, additional control cards may be required; however, this may cause trouble to those who don't know about the requirement of the additional control cards. Also, a set of control cards used to calculate a problem prior to the change can no longer be used without updating the set.)

No matter what form of control cards is used, they should be printed out as they are encountered to provide a permanent record of the particular control cards used to calculate the problem.

C. Data Transformation Section

It is very easy to provide a lot of facility for transforming and editing data by providing a call to a subroutine (to be specified by the user) after each observation is read into the computer. Convenience to the user is enhanced by selecting carefully the arguments transferred to the subroutine and the variables provided in COMMON blocks to the user.

In the parameter cards given above for calculating Klein's model I, the FTN, SUBROUTINE, COMMON, DOUBLE PRECISION, RETURN, and END cards are described to the non-programmer user as "transformation form cards" which must be inserted whenever a transformation subroutine is used, and copies of these cards are kept on hand at all times by the MSU Computer Laboratory. The user merely helps himself to the prepunched form cards instead of punching his own. A manual giving simple transformations is available to the non-programmer user to help him accomplish his transformation and editing. (A user with knowledge of programming recognizes that the full power of the FORTRAN compiler is available in accomplishing the transformations.)

The transformation cards are inserted between the DOUBLE PRECISION form card and the RETURN form card.
The first transformation card in our illustration creates X14 as X4 + X5; the second transformation card creates X15 as X2 + X12; and the last transformation card creates X20 as the raw observation number (NR) plus 1920. (X20 is used further on to number each residual by the year of the particular observation.)

The variables on the COMMON card aid in such things as dropping observations, stopping data input, branching to particular sections if multiple SSCP cards are used, lagging variables, calculating moving averages, and printing out data labeled by observation number. Additional COMMON cards are available to assist the user in particular transformation tasks; however, they are rarely required.

The transformation section also provides an open ended method of reading data from miscellaneous storage devices.

D. Coefficients Pool

In the AES STAT system, a coefficients pool is automatically established within the computer into which coefficients from any single or multiple equations estimating procedure may be stored. Also stored with the coefficients are a record of the method creating the coefficients, which coefficients pertain to jointly dependent variables and which to predetermined variables, and the particular variable numbers of the coefficients.1

Various control cards are available for retrieving the coefficients and making calculations based on them. In particular, they are retrieved for the 3SLS and FIML calculations in the illustration. As examples of the types of parameters available, the SCE=1 (save coefficients as equation 1) code following the 2SLS code on the first K card designates that the 2SLS coefficients from the first equation are to be stored as equation 1. Similarly, the SCE codes on the remainder of the K (k-class estimates) cards and on the READC (read coefficients) cards are used to assign coefficients to equation numbers.
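The idea of the pool can be sketched as a small data structure. The field names and the dummy coefficient values below are illustrative assumptions, not the AES STAT internals:

```python
# A minimal "coefficients pool": each stored equation keeps its
# coefficients together with the method that produced them and the
# variable numbers of the jointly dependent and predetermined variables.
pool = {}

def store(eq_no, method, jointly_dep, predet, coefs):
    """Store an equation's coefficients and their classification."""
    pool[eq_no] = {"method": method, "jointly_dependent": jointly_dep,
                   "predetermined": predet, "coefficients": coefs}

def retrieve(eq_no):
    """Retrieve a stored equation for use in a later step (e.g. 3SLS)."""
    return pool[eq_no]

# e.g., an SCE=1 code storing 2SLS results as equation 1 (dummy values):
store(1, "2SLS", jointly_dep=[1, 4, 5], predet=[0, 12],
      coefs=[1.0, 0.5, -0.5, 0.1, 0.2])
assert retrieve(1)["method"] == "2SLS"
assert retrieve(1)["predetermined"] == [0, 12]
```

Later steps, such as retrieving starting estimates for 3SLS or FIML, then need only the equation number, which is what the SCE codes supply.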
READC cards are used to input identity equations for Klein's model I; however, they could be used to input any set of coefficients.

The numbers following the "3SLS(" but before the first "/" on the 3SLS card give the equation numbers of coefficients to use as starting estimates for the 3SLS estimating procedure and, by implication, the structure of the 3SLS system. The numbers between slashes designate identity equations in the system. These identity equations are not used by the 3SLS estimating procedure and could be omitted except that the RF (reduced form coefficients) code designates that reduced form coefficients are to be calculated and printed and the RFRES (reduced form residuals) code designates that reduced form residuals are to be calculated and printed; hence, the identity equations are required to complete the system so that reduced form coefficients and residuals can be calculated.

1Coefficients from the pool may be reclassified or the equation renormalized at any time through the use of RECL cards.

The variable numbers following the slash are instrumental variables to be used in the 3SLS estimation procedure in addition to the predetermined variables in the subsystem being estimated (equations 1 through 3).

If the SCOE code had been used after the 3SLS code, the 3SLS coefficients would have replaced the 2SLS coefficients in the coefficients pool and therefore the FIML procedure would have started from the 3SLS coefficients instead of the 2SLS coefficients.

E. Special Files

1. Data files

In the AES STAT system, two data files are available under full control of both the program and the user. Either raw or transformed data may be stored in these files by user control codes. If sufficient capacity is available in the main core memory, these files will be carried in memory; otherwise the routine automatically establishes them on the magnetic drum or (at the user's option) on magnetic tape.
Any number of additional data files may be established by the user in which he exercises primary control through use of FORTRAN statements and executive functions in the transformation subroutine.

2. Intermediate storage files

Two intermediate storage files are presently used by the AES STAT package to (1) store information not needed for the immediate calculation (as examples, the coefficients pool, and the full sums of squares and cross-products matrix when only a small part of this matrix is required for a particular calculation) and (2) provide additional storage capacity. These files are automatically established within memory by the AES STAT package if sufficient capacity exists, and they are created on the magnetic drum if insufficient capacity exists. User options also permit establishing these files on a magnetic drum or on magnetic tapes.

3. Matrix storage files

Not presently in the package but to be added are codes which will establish and write particular matrices into files in a manner such that other packages or extensions to the AES STAT system can readily reference them. For example, a simple correlation matrix could be created by the AES STAT system and then used by a factor analysis package. As another use, a simple correlation matrix or matrix of sums of squares and cross-products could be stored in a file and retrieved and computation continued at a later date instead of requiring the AES STAT routine to again start the calculation of a problem from the set of data.

F. Incorporation of ŷ and û Directly into the Sums of Squares and Cross-products Matrix

Let

(IX.14)  ŷ_μ = Z_μ δ̂_μ

and

(IX.15)  û_μ = y_μ − ŷ_μ = y_μ − Z_μ δ̂_μ ,

where Z_μ = [Y_μ : X_μ] and δ̂_μ is the vector of estimated coefficients. Also, let a sums of squares and cross-products matrix be defined as Z'Z, where Z includes the variables in Z_μ. Suppose that it is desired that the ŷ variable defined by (IX.14) be added as an extra row and column to the Z'Z matrix.
It is more accurate, as well as saving of computer time, to accomplish this directly rather than by calculating ŷ, forming [Z : ŷ], and then forming [Z : ŷ]'[Z : ŷ].

To incorporate ŷ directly into the sums of squares and cross-products matrix, we form the [Z : ŷ]'[Z : ŷ] matrix by forming Z'ŷ as [Z'Z_μ]δ̂_μ and ŷ'ŷ as δ̂_μ'[Z_μ'Z_μ]δ̂_μ. The new sums of squares and cross-products matrix is:

    [ Z'Z   Z'ŷ ]
    [ ŷ'Z   ŷ'ŷ ]

Similarly, û may be added as an extra row and column by forming Z'û as Z'y_μ − [Z'Z_μ]δ̂_μ and û'û as y_μ'y_μ − 2δ̂_μ'[Z_μ'y_μ] + δ̂_μ'[Z_μ'Z_μ]δ̂_μ. Thus, the new matrix is:

    [ Z'Z   Z'û ]
    [ û'Z   û'û ]

In the AES STAT package, incorporation of the ŷ or û corresponding to any set of coefficients directly into the sums of squares and cross-products matrix is accomplished by the REDO control card. For example, the following REDO card will retrieve the coefficients of equation 5 (they may have been created by any method or read directly into the computer), incorporate ŷ into the SSCP matrix as variable 10, and incorporate û into the SSCP matrix as variable 14:1

    REDO(5)YHSSCP=10,USSCP=14

1ŷ and û may also be added directly into the transformed data file if desired. For example, a YHTD=10 code used on a REDO card will incorporate ŷ into the transformed data as variable 10 and a UTD=14 code used on a REDO card will incorporate û into the transformed data as variable 14. The variable number of a YHSSCP code need not be the same as the variable number of a YHTD code even though they are used on the same REDO card.

G. Estimated Values of Normalizing Jointly Dependent Variables, Residuals, and Related Statistics

The calculation of structural and reduced form estimated values (of the normalizing jointly dependent variable) and residuals for each observation as a user option seems very important in a simultaneous stochastic equations package.
A feature of the AES STAT package which has proved quite convenient (in cases where a logarithmic transformation has been used in the creation of the normalizing jointly dependent variable) is the option of calculating the anti-logs of the actual and estimated dependent variable for each observation and the difference between these values for each observation. If the dependent variable has been transformed to logarithms, then these three calculated values correspond to the original variable, the estimated value of the original variable, and the residual in its original or natural form. This can prove very helpful in analyzing the results. All three statistics are printed for each observation as a user option in addition to or in place of the regular actual value, estimated value, and residual in logarithmic form.

It is also desirable that a new SSE (sum of squares of error) and R2 be calculated automatically from the residuals in anti-log form whenever they are calculated (a user option in the AES STAT package permits the new SSE and R2 to be calculated even if the residuals in anti-log form are not actually printed). The SSE and R2 so obtained can then be compared to the SSE and R2 obtained from estimating the model in natural numbers (i.e., from estimating the model in some sort of linear form, quadratic form, etc.).1

When calculating residuals, it is simple to calculate the Durbin-Watson statistic,

    [ Σ_{t=2}^{T} (û_t − û_{t-1})² ] / [ Σ_{t=1}^{T} û_t² ] ,

and print it out. The user can then use it if it is applicable. (The Theil-Nagar statistic is the same as the Durbin-Watson statistic, and the Von Neumann-Hart statistic may be readily calculated by the user as T/(T − 1) times the Durbin-Watson statistic.)2

When printing out residuals it is often helpful to the user if each residual is labeled by one or more numeric or alphabetic variables. In the computer output given further on, each residual is labeled by the actual year to which it applies.
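The Durbin-Watson calculation, and the Von Neumann-Hart conversion just noted, are simple enough to sketch directly (the residuals below are illustrative dummy values):

```python
import numpy as np

def durbin_watson(u):
    """Durbin-Watson statistic:
    sum_{t=2}^T (u_t - u_{t-1})^2 / sum_{t=1}^T u_t^2."""
    u = np.asarray(u, dtype=float)
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

u = np.array([0.5, -0.2, 0.3, -0.4, 0.1])   # illustrative residuals
dw = durbin_watson(u)
T = len(u)
von_neumann_hart = dw * T / (T - 1)          # conversion noted in the text
assert 0.0 <= dw <= 4.0                      # DW always lies in [0, 4]
```

Values near 2 indicate little first-order autocorrelation in the residuals; the user judges applicability, as the text says.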
1It would be preferable to generalize the residual subroutine slightly to permit calculation of any function of the dependent variable and the estimated value of the dependent variable, since the dependent variable may be transformed in other ways besides just the logarithmic transformation.

2Basic references regarding these statistics are: Theil and Nagar [1961], Durbin and Watson [1950], Durbin and Watson [1951], and Hart [1942].

H. Weighting of Observations

It is very easy to provide for weighting of observations as a standard part of the simultaneous equations package. By weighting of observations we mean that the ijth element of the sums of squares and cross-products matrix is formed as:

(IX.16)  Σ_{t=1}^{T} c_t x_ti x_tj ,

where the c_t are weights designated by the user.1 (If all of the c_t are 1, we have the usual unweighted sums of squares and cross-products matrix.) For the weights to give the correct "degrees of freedom" for estimates of variance or statistical tests, the sum of the weights should be T; however, this imposes an unnecessary burden on the user. All that is necessary is that the user specify the relative weights. The relative weights will be automatically adjusted so that their sum is T if the sums of squares and cross-products matrix is formed and then the entire matrix is multiplied by T / Σ_{t=1}^{T} c_t, i.e., the ijth element of the matrix used for further calculations is formed as:

(IX.17)  m_ij = (T / Σ_{t=1}^{T} c_t) Σ_{t=1}^{T} c_t x_ti x_tj .

1Weights may be based on the number of observations in substrata of the sample as compared to the population to which inference is desired, the inverses of some estimates of variance related to the observations, the length of time since some base point, etc. Klein [1953], pp. 293, 305-313, Goldberger [1964], pp. 235-236, 239-241, 245, and Johnston [1963], pp. 207-211 contain material on weighting of observations, mostly in adjusting for heteroskedasticity.
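Formulas (IX.16) and (IX.17) can be sketched as follows; the function name and data are illustrative assumptions:

```python
import numpy as np

def weighted_sscp(X, c):
    """Weighted sums of squares and cross-products per (IX.16), with the
    relative weights rescaled so they sum to T, as in (IX.17)."""
    c = np.asarray(c, dtype=float)
    T = len(c)
    M = (X.T * c) @ X            # m_ij = sum_t c_t x_ti x_tj  -- (IX.16)
    return (T / c.sum()) * M     # multiply entire matrix by T / sum c_t

X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, 4.0]])
c = [2.0, 1.0, 3.0]              # relative weights; need not sum to T
M = weighted_sscp(X, c)

# With all weights equal, the usual unweighted matrix is recovered:
assert np.allclose(weighted_sscp(X, [1.0, 1.0, 1.0]), X.T @ X)
assert np.allclose(M, M.T)
```

The rescaling step is exactly the automatic adjustment described above: the user supplies only relative weights, and the "degrees of freedom" come out right because the adjusted weights sum to T.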
In the AES STAT series it has been convenient to form and carry the sums of the variables (either original or deviations from means) by including an extra variable, x0 (a variable taking on the value 1 for all observations), when the sums of squares and cross-products matrix is formed. m_00 will then be Σ_{t=1}^{T} c_t and m_0j will be Σ_{t=1}^{T} c_t x_tj. If the entire sums of squares and cross-products matrix is then multiplied by T / Σ_{t=1}^{T} c_t, the weighted sums of the variables will have been correctly adjusted as well. The same adjustment procedure is used whether the variables (except x0) are actual or deviations from means. The simple correlations matrix (A*) may then be formed as in section IX.A.2.c and the problem calculated.

In the AES STAT series, weighting of observations is automatically imposed through use of the WOB code on the SSCP card. If a WOB=10 code is used on an SSCP card, each observation is weighted by the value of X10 for that observation. (X10 may be punched on cards along with the rest of the data or calculated in the data transformation section [see section IX.C].) The sums of squares and cross-products matrix is then automatically adjusted by T / Σ_{t=1}^{T} c_t before further calculations are made.

J. Checks Against Errors

As many cross-checks against errors as possible should be built into a computer routine. Following are examples of cross-checks against errors built into the AES STAT package:

(1) The error sum of squares is calculated and printed out for any coefficients read into the computer through use of the READC card. If a set of coefficients represents an identity equation, the error sum of squares should be zero. Due to punching errors in the data, the requirement that the error sum of squares be zero is often not met.
Unfortunately, users are usually lax about checking that the error sum of squares for each identity equation is zero, so we plan to revise this check so that it is performed by the computer as each 3SLS and FIML system is being set up for calculation. Thus, if this check is not met for an equation specified to be an identity equation, the 3SLS or FIML system will not be calculated and a message to the user will be printed.

(2) The ranks of matrices are calculated in the process of orthogonalization and inversion. If in a later step the method requires that certain rank requirements be met, this check is performed by the computer and a message printed out if some requirement is not met. As an example, rk X_μ and rk X_I are calculated during the calculation of the [Y_μ'Y_μ]⊥X_I matrix for k-class estimation. Assuming that linear restrictions are not imposed on the coefficients, if rk X_I < n_μ, unique 2SLS and LIML coefficients do not exist;1 if rk X_I = n_μ, 2SLS, LIML, and many other estimators coincide; and if rk X_I > n_μ (the most common case encountered), 2SLS and LIML estimates do not in general coincide. The user is notified as to which case is encountered, and if rk X_I > n_μ, rk X_I − n_μ is printed out.

1The equation is under-identified.

Where possible a complete check for a given condition should be made and action taken by the computer routine rather than merely noting conditions on the computer output and relying on the user to notice the condition and take action when required. Many users merely skim through the output, giving almost no thought to messages printed out. Only by not calculating part or all of his results can you obtain a user's attention to many types of errors. (The analog of this is the proverbial farmer who bats his mule over the head with a 2x4 to get his attention.)
For minor errors, we have used the practice of printing out many asterisks in conjunction with error messages in the hope that the user will be sufficiently curious to look up the error. (Most error messages in the package are given a number. A separate manual then explains what is wrong for each error number and, for many error numbers, steps the user can take to rectify the error.) Also, the total number of minor errors is prominently printed out at the end of the user's results.

To assist in checking out the control cards and data before calculation is actually started, a SCAN card is available. When a SCAN card is used, the data are read into the computer and some basic statistics (but not a sums of squares and cross-products matrix) are calculated. All control cards and codes are then checked for consistency.

1 The equation is just-identified.
2 The equation is over-identified.

For example, dummy coefficients are saved in the coefficients pool so that the routine can check that starting coefficients have been specified for 3SLS and FIML. After all errors have been corrected, the SCAN card is removed from the deck and the problem calculated. Considerable computer time is saved when the SCAN card is used to detect errors before actually calculating the problem.

Some errors must be detected by the user, with the computer routine merely printing out statistics to aid in the detection. Data errors are usually of this nature and cause much difficulty as a result. Following are some statistics printed out by the AES STAT package to aid the user in detecting data errors:

(1) First raw observation, i.e., the first observation as it was read from cards or from a file. (Usually a mispunched format card can be detected by checking the first raw observation.)

(2) First transformed observation, i.e., the value of each variable listed on the SSCP card for the first observation incorporated into the problem.
(An incorrectly specified transformation can often be detected by checking the first transformed observation.)

(3) Number of observations read, number of observations dropped, and number of observations in the problem. (Considerable editing of data is often performed in the transformation section. The number of observations in the problem may depend on the transformations and the data itself, since observations may be deleted by the transformation section.)

(4) Sums of the raw observations. If the sums of the raw variables are obtained from the basic data source, these sums can serve as a check on transcribing as well as data punching. (Transcribing errors are likely to be a much larger source of data errors than punching errors, especially when card punching is verified by repunching the cards on a card verifier.) It is usually most convenient to use a hand adding machine to get the sums, with the individual sums being kept on the adding machine tape. A sum need not be obtained twice, since it is checked against the computer output. If a sum does not agree with the sum on the computer output, the error is easily traced by comparing the adding machine tape to a listing of the data.

(5) Minimum value encountered in the data for each transformed variable.

(6) Maximum value encountered in the data for each transformed variable.

(7) Means of the transformed variables. Although the exact means are usually unknown, their magnitudes should be known. A mean of the wrong magnitude may reflect many possible errors.

(8) A list of variables which are constant and a list of variables which are zero for all observations. Usually a variable which is constant or zero for all observations reflects a transformation error or a failure to provide a transformation.

Some of the above checks may seem cumbersome; however, the ease with which errors are made and the effect of errors on results make the returns to such checking extremely high.
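Checks (5) through (8) amount to simple per-variable screening statistics. A minimal sketch in NumPy, with hypothetical names and not the AES STAT output format:

```python
import numpy as np

def screening_report(X, names):
    """Minimum, maximum, and mean of each transformed variable, plus
    flags for variables that are constant or zero for all observations
    (usually a transformation error, as noted in the text)."""
    report = {}
    for j, name in enumerate(names):
        col = X[:, j]
        report[name] = {
            "min": float(col.min()),
            "max": float(col.max()),
            "mean": float(col.mean()),
            "constant": bool(np.all(col == col[0])),
            "all_zero": bool(np.all(col == 0.0)),
        }
    return report

data = np.array([[1.0, 5.0, 0.0],
                 [2.0, 5.0, 0.0],
                 [3.0, 5.0, 0.0]])
report = screening_report(data, ["Y", "CONST_VAR", "ZERO_VAR"])
```

Here CONST_VAR would be flagged as constant and ZERO_VAR as zero for all observations, the two symptoms singled out in check (8); a mean or extreme value of the wrong magnitude in the other columns would point at checks (5) through (7).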
Those familiar with the statistical methods given in this paper will surely agree that a single mispunched data element may have a very drastic effect on the statistics calculated.

K. Computer Output

Following is a reproduction of the computer output generated by the parameters and data for Klein's Model I listed earlier. Some handwritten notes have been added to call attention to particular points in the output. Also, some blank lines and mostly irrelevant material have been removed to save reproducing costs.

Due to the particular method of output used, no number which occurs on the printed output contains more than 9 significant digits. This holds even though given numbers might contain 23 significant digits while in the computer. Following are some numbers occurring in the output:

Number Printed Out      Number Should Be Interpreted As
1133.89999998           1133.90000
37275.86999989          37275.8700
20.2782089394           20.2782089
1.5261866857            1.52618669

In comparing estimated variances, covariances, and statistics computed from estimated variances (such as standard errors and coefficients divided by their standard errors) in the computer output with reported results from other sources, it should be recalled that a "degrees of freedom" adjustment of T^2 / [(T - n_u)(T - n_u')] is used in most of these statistics in the output, whereas for results reported elsewhere no "degrees of freedom" adjustment, or a different degrees of freedom adjustment, is often made. For example, in results reported in Goldberger [1964], no "degrees of freedom" adjustment is made. In some cases coefficient variance-covariance estimates have been printed out both with and without the degrees of freedom adjustment to aid in comparing results with other sources. For variances, covariances, and related statistics, the computer routine prints out an indication of the denominator used in the computation.

Total execution time was about 40 seconds.1

1 About 2.5 minutes of "set-up" time was also required.
The "set-up" time will be reduced drastically if many of the subroutines are transferred to the regular library file. Current charges for the CDC 3600 computer are $330 per hour plus $.01 per page of printed output. Thus, due to the large set-up time presently required, the total cost of calculating this problem on the CDC 3600 computer was about $18.00 ($3.67 for execution time plus $13.75 for set-up time plus $.65 for printed output).

[Pages 385-396: facsimile reproduction of the annotated computer output for Klein's Model I; the printout is not legible in this transcription and is omitted.]
oc»~a..... .cnona.... ..aaacoa... anc.ac.... a anaccooo... ca..omo... cacaoco... occaaoao... onaonooo.o oaaooaao... nac:.aa.... “manage... onconooo.oo accsccna... anaoca.... I I I guard! o-occa . «a socnonoo.-. uacncoo... a..cc.c...o nuance...- c....c..... «...-cc... c oucnsooo.o ..:c.a.... a...cc.... accnoce.... nocnn.°... .ac.:.a.... ca...cac... auanona... ....cco.... cnaonca.... a.......... ~.c»~..... >oun¢¢> .000 9.0014 s an cnccuaoc... aoaoa..... caucouoo.-u cap-...... scone-....- nuances... c nun-coo... c~.ancoo.. ococoa9... occonaoo... ou.«vooo.o a..nnna.... aoccacna..- ccauaoa... Novuosoocou ”acne-ac... oncoococ... ccancua... >oo.:c> :mnu OOH CROdCCdNOOHO I. ”d can 4 m an» ac a sqm" auoacc : >u 2:33 4 26 : oucacc : . a:cau:ou -:_a au-oca a a:caa:ou oa-acc c cacaca : . at.» about; - brcpuzou a:.a cacocJ a a:caa:oo . a auoacJ : ...-c: c a.» ‘ . c . accaazoo .ccso - ouoaca u aw-OCJ racrnzo . auoo.ac:a». awo¢u>zou. :oac:.:ocwa wxa ac nuns....:.a...1.:.aauo a: can. an; mocha n w>uvcnah~ auo¢w>couu awn-c4 a aacaazou an . «coo... aacnc.. ......a coca»... ......a w 0 .nvsn... 0.00.... «oven... 00....“ auaacc : «a coon».- anon».- anaa... «noon... ......a ..4 macaw n u».>c:up. awo:m::oo. awoocc : an nova“... anoa... caoca.-o ...-... cocoa... coco-.n .138... 1.15.. 533.191....” .:.4 ~c...cc~...a .acaaa ......aac..~ a: a o n a a ......c .aucc.. nucca... ......a uncan.. . _ ......u own—ac::o: a.:ac: :ouocca muzcacaam_o «a a a .u a ...»...a.. ~c.~n..:.. oaannann... «cc-nan... «gonna...- acaoaana.a baa-ac: aoo.:c: uoacacaauaa wrap .c ......a azcamzou : awooca : axcauaou a a o c. «a o n o cocoa... n...a.. accna.. .acua.. canon... .ccca.. acnno... «cc...- .cn.n... n.a.... cou.... acacn.. .c.c... c..~.... .nan~.. accac... «can... ooaa..c accaa.. «...-u. aacnanc non.~.. .cnacuco . voaanuco an.a.... .a..u.u annua... aaanu.u. noacu.. ccuc..u ocoon.- «econ . nncn . coco . n no .u o. no a ......a caaco... .nca .. nac.~.. a acc... nacca... ......a non-.... manna... .nac .. .naan.. 
......c ounc..c. can: ... ccona... ...-... anccc... nanac... ......a canon... on....u awn—chcoa c.¢pc: pouocc’ sumo CON” «an on .. . Wflunmnu_ ...c- a: 4” ...... an - 0 Au...» ......” .3 liar acaa auoocg a azcan:oo GUOO¢J I oaooca ruchuxo auooca Q radpntoo I ‘10 090.3} 3.». u a..." 3\...c .3. mac»: :.4 on. nmmwuwwmumu wozcxu smou x‘x ca.« «ouncmc.a nag-no... ocoooeo.° «aaaa . cxxca run a pan to». mac». .:-a. cocoocoao ow 50 turn u— wardxu awou uwc xct ooooceoao Ivan ax: an. ac: 99009.9.0 Ix—J ago «ac—mum aux—4 wt.» so .01 macaw. :ua. gap» or 2.: I... 33rd ...-+434 "Hm—“3s, lot—6. 04+ .3 3335!. 1:33? Illlva a: ‘0¢lb. 0*.tu Cd—Jdutg *‘U‘CO‘O‘ i=‘«0m1 ha I .accu m>ooc oa .oacu w>ooc ca .ozcu m>omc o» .oacu w>oac a. .acco .nscu nuzouwa on.~ u:.a a:w::au aucc mu:— oosoaxoo caca «a . uaac u:. v on mu 0 a:_a awoaca u zo_ac:2.a:oo. :o.ac:2_a:ou. 2°.»c:z_azou. zo_ac:2.azoo_ Joaazoo acaa. 4o:»:oo acau. awmt<4w pzmcxzu .o.. - ....ooo.. oa......a ......... .a on a_:_4 m:.. «a». a.4 4c_a:c: anon azax: ac:.:aa_ a4m>m4 mozwacm>zoo :zua. :3, 3a a ca 9 n c ,c nuaocc : _ awaecc : azcamzon : s u a :aanauaaa :. «cc: : s cc nu «u «a o a : m:_a nmaocc a amoeca a nuns-4 : aacauzoo :u..a. u:.a:a .. .ucc: on:: ma c n a a c a a m a: a o a : :mauau u:.a:a :. ..cc: ooo:u uacmc xocu ow>cu panaoa>u¢c uacmc :ocu aw>cu >4m3o_:u¢. uocwc toes om>cu >4m:o.>w¢c uncuc toss am>cn >4maou>w¢g uacmc nous om>cn a4m:o.>u¢: can“ xocu ow>cn >4uao_>u¢t aamu :o:u aw>cu aaaao_>u¢: - mama tog: ow>cu >Jmnou>mxg .‘u mug-ummrctavuw 92—12—0w0. on no: guano .- oa co. asaou .: a. co: aaaou .c co. .caoo "a a. co. ..aoo c an ac. .aaoo .» .- co: ccaoo .a on so: nuuoo .u dfln'DONC O U - man—x..ao..aoou:.:.4ux.a.uua~:.4 mamou.uam.::.o~.:z.mwc.:.:u»a.:u:u.au>u.u>u40:9Ju.4:_a uncaura: aoao.u>u.:o>o.o>a.u:4:4 . .....c:a.~.a.c:~: _ Juno: :_w4:..a4u~ :o:: ¢:_a:caa 4:.: .4:4 : Ha ammo: :_waa..a4a~ :o:: czaaacaa 4:.: .J:4 _ 411 u:.a sumac; w aacanaou u awoocc a nun-c4 : on an o a me an ocncoaoo.a «a.a~..... cacao-a... accuacco... 
cvcsco .... «noon-.... .c as». caocoao... acac~na.... accanacc... oncnmucc... ccoa-.... an aaoac; . nococcnn.a canocaa.... «cancnco... acanauac.. . aacumpmo on:.ou.... muons-.... osccuaou... c u c..a~a.... no..an..... ac ococca a ounuaouo.o ac coo-c4 : azcaazou a awoocc : axcanaoo : : o v an a n v . ‘3 cacona~..° ocoaoaoo... cacac..... aaanac9... oaocaoco... acnuaa.... .a u:.a can. nauoaano.o occonuoo..o oaancaaa..u cnoo~o°... onsaaeao.oo nauaoaa... an ouooca a a. nocaaoon.n ~oao~c~a.°- napooan... conconoa..u onouoao... anonnoo... . azcanroo o naocnooa.e. «annowoo.. aoncocao.. o~coa~a... ccaocaoo.. ooccoao.... c u nocsn «neacacu.o. canonnoa.. ocaoaaoc... cacnano.... emu-no...- nanua....o- an cmcaca : - nonooanc.e nmnaanna.eo can-ccoo.. «couuca...o anacoeoa... commune...- ca au-acc : auacaanu.nn nsoooc.a... ..aooca.... acacc~a~.a ocan~aa.... cccaonna.. . aacaa:oo oaooca~... cocanno.... amoaccc.... ancooao°.. «cocouo... c : .acuuaa... .....a..... .~.nn...... unmanaoc... an oucocg : .a.~..a..~ ocunann.... uncooca.... . a:can:ou canoaa.... on...«..... n : ovnnuuu... c ; :oac:_:o:mn wxa ac nmma....:.a...a.:.a.ao a: emu...) I .noo:.4w:.4 :::.ac: qua—:cu:_4. ana_:ac: ,ou.cc: aaoo .nmucuna-c.. unuaoucan~.. acacooaccn.u aansauncnn.. ....unac...cu an o as.» sumac; u aacaazoo a a: . .o aoaucaau ncooamcuau... .2329... oaoccu..c~.~n «SS-«Ra... a=a-«.....co £1.98 a «a v aaoocc : awouca : aacamaou . . «sung. ~ soaacao- .oowa.u c...onn..~.. ..nnonoaac.ca cascam.osa.a ocnvcnuaaa.. .....u.....c. actua- nuoocc : aacau:ou : a . o c zeaacaca , x c aaasi. Aoanrogr.kwa.u PBS 38: .552: 3259...:— vrfluT L a 2: gaggiloclctd 44.4 - altfi W‘\ a»: n .c 1411.0003qu 98.“ n \tfiulpmrlm flue «2.3%... ”2.2... 83.2.11. :3. a 5*...- illlllwl 4. mo: .6 u o c: c m :22. an: a Eu 3: mac: 5: calls anti.“ ..oocos.n ca so gm»: .. warctu unou unc ac: ~..oaav.~ spam as: mac ac: cuoooon.o snag can .xano..u42Juuqaxccaas.a.s..s¢ ucyuaxauccc nay (it... ...... J. :«...¢I.i$.:...t .u 5.5... 3.... 
a...sn.a .muc 2.: .:~: .nc...a.a .ac:..9.:»u: so acc:-_u .a «a». .81 .1... ...J .3 ...... 33.5... .... .... r43...» 23...... an...” “2 I to. d . .. v 0. >19. . 90 "u“.ununm “3...”? ..nwu.....~.c 3......w..h~.o...n.o..o.n.. $1.3m. ...: .. . .. . . 3 . . . ......8..¢...1...¢ ”.57... “Wong!!! “u“.“udn «3...... Sana.“ “Hanna?“ "unnumuum $3.3m ...... .... . uals... . 8. . . . 2.3.... .33.... .....- .u a . . . . a. .. I do . . on u on. a . so. a. .8 .. 2:... .411... ... .... 1. .... "gnu.“ gunman ”nuns ...”...H. 2...... ”Bea...” 1.3".” A" m.:c:u .38 ac: HMS»; .....Mfi: .“n:.u..u. gain“: $553.5. 11...“... a . m c. .... coconvd.0 O 03 0" 3’. 3..., ”s! .916 .83”me . 3.» 3. \ ... .. .o ...m gig}. he. ..c xc: $2.33.... .3. .:: ... ac: 39.2... ...... 3. . . . In .‘!31. .u v i‘. -"I..’3I-.‘”--I- --."-- -- - --.-..‘O. H at P. «...-.... ’24-" .3...‘ d .... 93 note-83¢.- I I U”.— UI— .. gun It Menu. ”I‘M“ 01.. «.14. 3.3.1... 8...... .....BH 23...?! a 4... ...u... ...u... ..."..2 . . ...): 1.. 3.1.38.3... .JIJ . g 21.1.... 1...... 8...... ....uucuun . ._. 358 38:33.... .522: 3.2.3:... 3.3.56: :32: 3...... 8.3.5.... . . .. Jena .- u u o .. . J 334...... 3.33.”. ...»...n... . ... 1m. 3...... .. ......t”. . . . c.5153. .3 .c 3.2.3:.523:.:5 ... ...... .22... . . o ......» \ x \I 3.. 3...... .551: 5.2.3:... ra.:.c: 3...... 8:35.... u:.. . A .... a...c~.. 33...... . . .m..cu.a ....cua. ...... .. . 2.2:... 32...... .... ...... . .....S... ...-.23.... ...nm...... "uuuuuuuhu “unmunuunu 3 ...... .....3... 2.2.3.... 3.3.3.... 33...... n. . “manna." £33.... 32.33. 25.2.... ,. . ......o... 3.32.... .. ....c. : 33...... 3 ....c. . .35“... g 533...... 24...... : . c . .392... 3.1.2... 33...... . n . A "mumnuumnn "mace“.uuu. 3.38.... Human?" “unnuuuunun “nunmuuuuu ...“ 38““: 3mm. ...... . . ......a... 8.3.2... .... . ... . . c 32.3.... .322... ......o... a... S... a . . 33...... . .:c..:. 2.22.... 3.22... 83.3.... 83...??? 3.3.... 3.38.... . u 4...: 32...... ...... 2... 3 . . . 3.33.". ......3... 
.. ...... a :33... ...: ...... rams.» ”mum”. mug...” new“... a, ...... . ......u... 23.2.... 22.2.... 33”.“... ..“M...... . 3.5:... 33...... 33.3.... 3.32.... 3.33.... n. ....c. u .212... 9.233.. 3.23.... . 2......» 2...... . 3.32.... .. : . 332:3... w... .c 3.... . .. .. I I 3...... . . . ' I I 58:38.... .552: 22:31.: I .. a a- 1:2: 3...: ..3 613 555..5..5 555..55.. 555.55..5 555..55.. wuzczu smou :.: 555.55..5 555.555.. 555.555.. .55.555.5 555.5.5.5 wuquu 5wou x4: «outdoo.~ 555.5.5.5 555..5..5 555.555.. .55.555.. muZ¢xu swou x... 55..555.5 555..55.5 555...... 555.555.5 555..55.5 555...... muzcxu uwou x‘x ..5.55... .55.555.. .55.55... 555.555.5 5.9.5.: .55.555.5 .55.555.5. .55.5.5.5 .55.:55.. 5.3.55: 555..55.5 moo-555.: 555.555.5 woo-505.» 5.9.5c: 555.5...5 555.....5 555.555.5u 555.5...5 555..55.5 5.0.5:: ....555.5 5.5.55.5 555.555.5 555.555.. 555.555.. 555.5...5. 555.555.5 555.55... .55.555.5 555..55.5 555.5.5.5 555.55..5 55:c:5 .555 5c: 5.5...: o (0.35.5.3. 3.50...» .13 ‘05?!- .. .5 .555 5. mo:cxu .555 55. 5c: .5 .o 55.5 5. 555cxu .555 55. ac: .55.555.. 55..555.. 555.555.. 55..555.5 .5....5.. :.5 :w: .5 .5 .555 5. 55:c:o 5555 55c 5c: 5.....5.5 .55c :.: .5...55.5 .:.: ......5.5 5.1.385... .4 1.1.... .2... 5. 3.. 55...55.5 ..5.5.... 5....5..5 5....55.5 55...55.5 5:15: pm: 555.5...5 55......5 55....5.5 555.555.. 55....5.. 5 555 55....... 5.5.55... 555.5...5 555...... 5....5... 55..5.5.5 . . .5..5.... 5.5.55... 55..5...5 .....5..5 555.55... 555...... 555.555.. 5...555.5 5.5 :w: 5::ca 555 5 555 5.55 .5 .5 ...5 .. 55:c:u .555 55c 5c: 555.....5 .... 5:: 55c 5.: F 5.044593 J 1530-: PI... to US- a 55..5.... 555...... 555.555.. 5.5.5.... 555.555.. 555..55.5 . 5.5.5.... 555.555.. 555..55.5 555.555.. 5.5.5.... 555..55.. 555.55... 555.555.. «.5 5m: «xtc5 555 a 5mg 5555 555.5.5.5 ..55 5:: 5c .c: a 33...... ... 1+8... 5». 5. .... . 555.5.... .55.5.... 555.5.5.. 555.55... 555...... 555.555.. .5 5.5.5.... 555.5.5.. 555...5.5 555.555.. 555.55... .55.5.5.. 555.555.5 555.555.5 5.. 
1w: «xxc: 5mg a 5m: to». 555.555.5 .555 5:: 5c .c: .u 3.2.1.... I» 53...... 5. .... . 555.55... 555.55..» 555.555.. 555..5... 555.5...5 55......5 5 5.... 5 .5..555.. 5.5.5.5.. 55...55.. ....55... .55.55... 555..55.5 555.555.. 555.555.. 55....... 555.5.5.5 555..55.5 5.5.555.5 5.. :w: «xxca 5mg 5 5m: 5555 3.... .....55.5 555.55... 5.0.5.5.n 5.59 .555. 5.55. 5.5.. 55... 55555 :5.. oaoooss.s .254 :40 .55.. 5.55. 5.... 55... ....5 :... B588 ouocnhs.s can; 040 .55.. 5.5.. 555.. .55.. 55c». :55. 5....5... .554 555 .55.. .555. 55555 5.... .5... ....5 :55. .555 5:: 55c 5c: 555.555.. .:54 555 .5c:..5.:55: .5 5...... .5 ..5.. .555. 5.... 55... 5.... .5... 55c55 :55. .0... 5.5.nno.5 55 5a 5555 5. mazcxu swou an. ac: 555.555.. :54u 5:: an. 54: 05......» 55.4 94° .55...5.5 .55c :.: .5....5.5 .:.: .....5..5 .:.:..o.:55: 55 54c...5 .. :55. . Q..-4.8.. .4 3.. 52.5. ...... 5.5. ; . 55.. 5 5. .55 .5 555..55.. 555.555.. 555..55.5 55..555 5 .. 55“..5n.. 55....5.5. 555.555.5 555.5...5 55....5”. . . . 5.... 5 555.5...5 555.555.. 55..555.. 555.555.5 55..5.5 5 555.... 5 55.5.5 555.. mozcxu smou 5.: 5.5.5:: «.5 :w: «xxca 5mg 5 5m: 5555 55.5 555..5... .5 .5 ...u .. wuzcxu .555 55c :c: 5...5.5.5 .... .:: 55. 5c: .5...5... ...: 555 55..5... .... :.: .:.: ...5..5.5 .5.:..u.:.5: .5 55c.5.5 .5 .... .....52 ......3 3...... . .... .... ...... 50(305863 d‘ 2‘05.‘ 9.935008 3 0.11. v *o‘dlv .88.... ......... .83.... ...-8... ...-4.... 8.8.... ...: .. ..8..... 88.8.. .83.... 888... .8...... 88...... 8... .. .88.... 88...... 88...... 888... .83.... .88.... ...... .. .83.... 28...... .83.... 888... .85.... 8.8.... ..... .. .88.... 88...... .88.... 888... .85.... 888... ...: .. .88.... 88...... .88.... 888... .85.... 88.8.. ...: 8.. ...... .... .2. 8...... 8.. ...z ...... :8 . ... .... ...... .... 616 3.!”- E a. ..o a»; .. .52.... .300 ... x... .5 .5. 2... mm. x... .83.... ...... 3. Juan .00 0000...... u 8...... .... 2:. g :.:. .88.... .88... .... .. 3;... ... ._.: ...l . 38 a I... .13 ...... .72.... 
8 40. “in,” Us esofitoauwlh “I.B.:hz—“uv go...‘ {:.:-1...". O. . 13 d . vs. 3... .... 6 ...)... ... 4 2......8 8.3.1. n.eooo.oo.oo..3. v. :53. u 83.3.88 .....M...... .....u...... .....u...... ......88... . . ...: ...... . ......zou . .. . ...—...... 1... , .98“. J. 2 P...» 88.83.... ............ ............. 88.88.... 8888...... 3...“... o .. i d... u. .. o v a V .... 1. 1.... _ ... ...... x ...... . .21....» . _ 0.8.»... 0: 0+ 93 3.4.3.1. m 22:30- “#3....” «1:. 81.1.. 1.... ...... 8.8.3.8.. .....n....... .....M...... .....n....... 8.8«88... .. .... ... .18 0.8:... J... ...... . 3...... a . u . .....ao- .... ..z 0.2. 443. .o a... 1.2. m.zw.u....ou o. 3.481. ... 1...... ... 5. . 88:... .88.... .83.... 888... .8...... 8... .. .83.... 88...... 88...... 888... ..8..... . . ...: .. 888... .88.... .88.... 888... .8...... 888... 8... .. 888... .8...... .88.... 888... ..8..... 888... 8... .. 3...... am... .8. .73.... «.4 2.2 3:... .3 a 5. am». ...: an: ...-....» a. .o .... .. may}. .53 no. x... 3.3.... .5. ...: ... x... .83.... 3.3 go o 3.15.. ... 3:... t... .. .... .... .83.... .88.... .88.... 888... .8...... 8... . 88...... 38...... .88.... 88.8... ......... .. ... ...... . noalfiBN.O 0000005.." DdoOOhh.h d000d05.0 NdOId'O.N cocoooo.d ”CDC. 0 888... .88.... .88.... 888... .8...... 888. . 8... . 3...... .... x... .52... .... .... ...... ... . ... .... 8... a... ......u; a. .3 .u... i 37.30 .38 no. a... 28...... .5. a... no. a... ...-...... on... 9.3 1015 au»uzou. no»<:_;ozwa n¢ an»: ._.:ch .u.4 x‘z 05:. 44:. a».¢w»zou. «snovnovnuoo nu wx_» onoaoouuvu... «fl auooc; x «...nnonon.. «a auoo¢4 ; _.-.a ac: out. 44:. auacw>aouwu naouunn~.n on wt.» consvnco.~o «a auoo«4 u on..oo.a.a «a owuocq a .oosvvsc.v nu amo¢¢4 u “sovuon~.~ «« amou‘a ; «on.o~on.n o .2«»mzou ’ cauvsuco.« uchozou a 80—»(30w "macaw”..n bxz¢ruzou o sung“.«°.o. unsunnn~.o. «nousuuv.oa naouusne.e vuounoo~.o. ooonsuno.u anos..o~.~uu .8—4 x‘x ouz— 4435 ounce; a nu «onuauoa.a nuunonoo.. 
t c .ncoc-o.oo ooavnnuo.ou cannuooc.u. «nonvovo.a ousoonuo.ev «ossomon... oumoocno.n. oaoaono~.. couczuxozmo nuo¢w>zou. amuu<4 w an ocuonuoo.a onuoovoo.o & c .nonssmo.o. cnvonano.ou oswmoaoo.«. oaoooaoo.o “woman"..o. onco~vsn.on nouonooc.vc sovaouso.° bzdbnzou a ..nowo.n.. nuoonou... onunouan... nwaocq t «« vcuo-ooo.. ccnomooo.o nmwusnnn.o .oqoonao... canosooo.o ~osnnca«.° ~9....oo.° ooooavnu... ~..n.a.... u o osouovoo.o. nanoonoo.ao ovosusc~.oo omsuoooo.o pchmzou o .uuo~noa.o noouuona.o anouanc~.~ ccnnco.~.a. ovaooooa.o «Kooooo~.n nooooooo.nu ono~.~...o. noooavua.u s>omsooo.u~ m...» at ammah'-il-l pzcumzou c communou.o sounnvoo.. noooouso.uu awoo«4 ; an nausvoo... coonsuuo.a «ounce»... anaconda... no~o~ooo.o oooouavu.. o..uo~.~.. conuoosn..o oncscuu«.. u . «onoonoo.ou v~co~ooo.oo uoonqnan... ~ounnuu... pzouc¢<> awao¢4 g «a «uncomoo.o moovowoo.o «oncomoe.o smouncoo.oo «unosmoo.o 3 a auvucooo.o nuconuoo.o “mason.... cocoouoo.o. ooosnuoo.o nononooo.o counsmvo.ou oesoosuo.o. coononoo.o nososos..o ounccuoo.. co~.z_xo;mo mgr u< ow¢a....z.»...s.z.~.uo pa omua.. ouo¢w>zouv I I :-x_¢»¢x >oo.¢¢> 9000‘; s «a oououno... oom~0¢oo.o «nonuuon.o cannon“...- ououvnoo.. nvuoooou.o c oesooouo.ou oononuuo... nouunooo... osnonon... nounnune... oo..~vo«.o. a..nvoon.uo guanosnv.o ..uouoou... snaosooo.~. ”montage... n».~usnn.. smou omoo¢4 ; «« «sun-no... coon-«u... .«sowoon.a ancosvu.... oonuuvao.. sscnv«~«.. v conssouo... couscouo... on..nxou.n. «canon...- nasuvouo.-u nauooona... o~.°on~s.ao oo‘annon.. ouuuou. ... snscnns .n. u».unnu...u osnvvouv.. umno « ..Od'OdNOOHQ '0" at.» omoa¢4 - pr¢pnrom o-oocJ g aueo¢a . at.» auoo«4 - prcpnzou - ouco<4 u auoa«4 .. pxgpazoo n oueaca Q ~2¢hatou t J-suu n35 : :5 at.» ouoa<4 - ua.».zoo u amoaca g owoo.4 ; as.“ a 24.: as; o no. u .13: praunaoo u:u u o assoc; u av 15 nuoacJ >2¢rnzo . auoo¢4 . pa.»nzoo t 617 .x.4 x¢x ouz. 443a nuo¢w>zou. .x.4 ac: our. 4425 nau¢m>zou. awao<4 w hchmzoa nn 9 macs... «no.... .99..." covnn.o oecaa.u .x.4 x¢x out. 44:. 
nucmm>zou. ao»¢t a: n voooov-.~ uzcwmzou nu-.... «anvo.au o.-~.. «...... oosnn... ....».e .99..." fifi~¢>cx «j n one...” oucuawos.v osooooss.nu c nw-uca . o «a cause... «nae... noose... oo..n.. «o.na.a. canon.. .~.uo.. oon~n... ocean... vcosv.o c~..a.o. name... cauoo.ou ossoa.o ......“ «acne... ...-o.u nw~_4¢:coz hzchmzoo - «soon.- ounnn.. naoan.. guano... nooon.. «son... canon.- ansoo... mung... ...-..u >ou.¢m. muz.a¢=yu.n >ou...» muz¢acaunun n~¢>¢t toucca’ bwou . o a a .«c.... o.~.~.. ....... .u..... ......n « amononoa.. ...usuo~.c ousouoon.~ my.» .a ......u a . a . anon... assoc... can»... vuogn... cask... swan»... .n.c.... ooon... «on»... an..n... no~n¢.. «no.5... .n~..... mamas... .noon... ...»... non.».. snun.... ....u.. n.-.... ......u ...-.... ......u «an “I!” on 02-» at.» onco¢4 - pzcsnzom ouco¢4 s auao¢4 . pchnroo c nuoo<4 c wraparoo : \ u db. .4“, ov 0 11.7 n bl? vuuosnov.o Q C coonoonn.o . I cusooon~.a . C vouo~n~°.° on «tap aaon-o~.o a“ at.» vonuooae... on at.» sawovnv~.o a” us.» .x.4 ucx ouz. 4J2. aneam»zou. covsuono.o nu amooda m ”envooon.. nu amend; w snvsuwno.o. a awoo44 m snoonso~.a an awuuca m oocvvuo.... «a auoc<4 a v3oaon~c.oo «a awoo<4 x aaomsoso... Nu awoa¢4 x .Noooooo... «a awuc.4 x vnooonho.o s a: awnnouon.. «a awaoca . cousouso.o a a: nuvnooo~.o . «u auso<4 n sounoooo... s a: nosonoco.o «a awaaca c nooooaoo.o a «a nucoooon.. «a auea«4 ; mumou xcog owuaouc .~.nuuon... . o ~a.-u.n..u ....uaou ....» .¢-a .o nouoonca.. . . ...«n"°°.ou ».<~.2oo ....» .‘ua .. ...v 33.2... 4t: . . Loam-hu¢ 3.3:...“ J van .,¢.~,oo ....» .‘nn .~ ...~.o.... . . onsoous...~ u....2ou ..... ...a .« ‘19 vuuosnov.o . c ~ounonmn.o. - c .noonoco.o - - coon-onn.. . C voucunuo.o .u wt.» ~nons..~.° .u w:.» «nonsoou.a .u at.» noonsno~.. cu at.» covsuo~..o nu awoo<4 a o.«..n~n.° nu awou¢4 w o.«s.n~n.o nu cocoa; . nonoooon.. nu ouooaa u wanna-90.. «a awouca x ocoonuoa... «a awoo<4 x ooovnue.... «u auoo¢4 a ou.uouu.... «a auooca a ca...ns... u «a nunnou.n.. «a nuoo¢4 ; ooovoos... 
h N: nouonoo... nu awoo¢4 a «command... 5 «a oouonoo... «a anonda I ons~.o~...u s a: nuvn-..~.. «a auucca n a: nu vd onennuun... . - u..-uon..u ~2.»u2ou o.c<> .‘un «unvnnao.o saunauoo.nn uv car‘s—raw o > .moohun..~ ona.»nn..n annuaoon. m anncucco.n anonooo,.~ ~oc.«.~u.n «onunuoc.o snoooo...a ) nwp4xuuuu .3.) u.: ouz. 44:. um.xw>zou. oaoooans.-n .x >m.. . ¢.um acou~n-.oo mm: mm oooooooo.o mwc 13m «wouooon.u onuso~ou.~ coconndn. u nauovoao. u. “annouon. a. ..vuouno.~ acnsoonn.o. cannsoo~.«. «amonsav.o sso~u~¢e.¢. coonsanu.no “amuse...“- on.~n~ou.u ooaovooo.~ chooonoo.u oovvo~nc.u csnnano«.« swvomcoo.o. nonoomvu.o onnsosao.o onouvuou.~. > owr¢x~>um . > x: ~u~n.o'~.fl ...m .x.a nanoooou.onn .» x‘y: . ..mm asqoanoa.mu a~\~«59:.an vanavooo.od nuacvouo.s« Kenna“... nu omn“oovo ..u acnsoonn. cu ouonsonn.nu nooovwoh.ou ~59~o~oo.~« ooon~«n«.n« “onwooon.ou uvnfiosoo.oa «nsonaoo.o. omnunvoo.sn ownnosou.o« o-vocno.o« \uco~cov.ou noounvnu.ou vcowomoo.nu unn«ouom.vu > ompcxupna m ( .x.4 x‘x on:— 443u uuoxu>zou. «t 03.3.5: 911...! .... .. ca Wit-o: p . .oocao...u “ canons..." ....ouec... oopooaou. v . an ouonu. o a. snow. «9 9999909.. n . .5 ..o.o.¢ «o no... on ooooooog.» . ”cocoa... a unannouo. a» oooo.o.o.n o~cono~u.o vsnncnuo. on 9999999..“ . ~.u.~.~».n ......uo. »» coaceoo..u . no.~so.c.« nonaunnu. on oooooo°~.o. . oa~ovn...~. .ouocno...n .u .x . p . > a.».x.»nm . > » om..x.»au “HM“ . ...3. .3: xx... noonucus.o «no—vaus.~n« anonc»«~.« «a . .u .mz. . c.um p... .z. a cooooov~.svno. . mT¢anoos-n.oou conuno~v.uv 5n n .0 3.. an ..3 3va2 23.. . > ococooos..nn u o..oooo.o > :2» . mac :3» o . aoaoooon.n~ . annoo«.«.~ ooccaoon..o ooocooou.a~ . ~o..v-».« ..nnnun..no ocoeoooo.o« . kaoooq~n.~ noonaoso.on .oooaoon.nu . ono”n«.o.~. enouauon.oo ooooooon.~u . ukn.un.o.o o~ouoaoo.on oooeoooo.nn . coonona~.~ venoooov.nn ooocoooo..« . u~..na...o. "usuooos.flo ooocooon.~« . n~soon...o. nusooomn.oc oooo.oo~.~u . .o...n.~.e. o.so.nms.oc ooocoooe.s . onwnnnoo.n. onwnanoc.«n ocooooo..ud . ann.no.o.n. 
uno.»..n..n goooooo..na . ouu.«n.o.~. ou~¢~noo.5n oocoooos.«~ . o~uaoo~o.~ «hoonomn. an oooeoooa.a~ . unocoanm.~ sonnnuvs. en oooooooo.on . a«o.~».n.« souosouo. n oaoooooo.ou . -v.o..n.u osnoaaas. an aooeao°«.o~ . Kooouoao.u nnousoon.dn ooocooov.ou a~o~o~oa.o usososov.on oooo.oo..o« . n-~o..n.« osn~.ooo.sv ooocoooo.ou . maasnnvo.o ocuuovno.ov ooocoo...~u . u.n....a.~. «onoooov... . . > u » an».z..am » aw..:.» . . . . eu.s %WJ322.33. .........~v ..o. a. . ..oo o. ...a......n .........sn ...o.....§» ...o...~.un .o.»...~..~ m ._.m upmw ..«u.».... a. >ua > :3» .oaoooos... 9999.....no ooooooco.uo oaooooou.sn .eooooos.on ..oooo.~.sn ocooooon.un .....oou... .cooooon.o. .ooaooeo.nv oooo......» ooouoo.°.nn cocooooo.sn cocoooon.sn ..ooooou..n .oooooou.nn .o.oooo..~n ...ooo....n ..eoooo~.oo .ooooo...nv ..oooo...uo a - .....nnuu ION“. O ON. ”NQCGOROO on." .3muw anon ouou anon anon anon .=i~u $94M. was». o¢u1§ ‘21 s».nn»~q. v. . m ; . r . : 3 A ... 3.: ...:3 :uuuunmama 32.2»... n . 332:3: 2.2:»: :83»... - . .3 put; o c.»u . p... .a.n «c onavouhanon ”nouvddnnvnflfl N . ooo»os»..»»s» . »w¢ »» .» 2.»: . ..»» p». u ~.~mmn.m»..» "nan“.noucou»» o»....»»..o»o~ ..a..ooa.a ooooooo».»»~» " . . w >. >»» . . »w¢ 1:» > 2:» . .»uu.un». o.»»oo»».n»» . > :3» m»»o°»~..» o-o.».~».~o . . oucooo . . o.»voa~».~ a».»»»s».n. .o.»...n.oo . »oc.~a-.. o»»~.»s..oo ......on.n» » « :.::an 3333...: o co .2 . 3:2: - 9.333.: :33»... n '3 noncoonc... «oovoouo.oo onoouunm.”u . onwaunmanu ~.~»»»~...v ......9».“" u“ ”mu“ 3:33... . no a o v as a . mafia ”Magnum “ “fix.“ “was?" a "a“ «oncomuo.«o . ownuonov.° and 5 mm. omuo».~«.~. mmmuumwu.wm "uuuuoucnca . o.»»u»»o... economno.mm ”unannnunom »» on.» sun~nooa.ou. ~u»~»»o».v» .oaoouon."u . "»m~m«...». onvuodo».»~ cocoon-a.»~ n“ "mu“ ounovoov.00 ounc'ooo.on a c v O «CO-'0 ovnonnvounn . . cocooo¢.nn . Doooeooo ON «d ND.” oonoowom ». oonoo~oc.o» a a o . . a»..o~»n N. «aoooKn~.~n .aooooe . campus»... . o c on» a. . ~....~.~.~. a». o a. 
n .n a» an.» ocooonon.» "umwmmnn.mu "onouoaonso . on»».»»o.n «comamm~."m "uuuuuumnwm .» on.» sncofioc».~ nvnoonno.on cocoanuu.co . afiooo....« «mouuoon.on .ooooaec. n a «nun oooo-~«.¢ «caessso.o» .ooco o».»o . nooooa-o.o summaouv.vn ...»...c.mn » cu.» unhoovuoon QOVanhn.nm 0990.90 .5“ . 0N00n0~d.a Ohnnvnhk.nn .ooooooo.nn a nu.“ u»...».n.~ u.m»uou...o .9...°.~.~» . u.»»~..n.» onoo~°-..» ...-...»..n . cu.» .u»oo.»..c. aunoov»~.9» eooaonou.mm . Mom-oovuw n»»~snno.s~ ......»n.»~ m “MN“ > owp¢x_»»w . > > nw»¢x_»»m .o a avast . .mucwnoo.su ...-...».nu “HE " p 3255» . > > 35:3» fim mm . n » no «a.» .x.4 ..x o»:. 44:. uuoxm»aou. . a. )m.‘ 0 Co.‘ hflhm WIMB NOBDOHOO O . Nonnnflbflodvd GOONHOQnod HONHHOOOQO .% a. . .c swag u c.»» p4.» .2.» an undue gnu-...».os oooooo~».~»~ . . 33.233 . 4 9‘ ...»...... .....a.».o~ . . . »w¢ 1:» > :2» . ..mnnonun. ...»ooo..»so . . p :3» u.»o.n»c.» »»..o.»..» . .o...».».v . . announoo.. » . noosuo- v can u s .s . .u»»s»»».. mounmunm.m uuuuuuunnn . .uo.-~».. ...M~Mnu.~m uuuuuuuu.»o as «v.» u».~u.~».». ».»~«.-.» ...». ».» . o»~..»~».u ~oso».m~..c .....o...»w .u ...u ”ononumu.. n..»..s..« .o.o.""”.m. . n.m.uo.o.w. ..n.»..»..v .9.......Mc 0“ "Mn“ soon» .u . . o o «as. «node n . .s».»»n»..u “Munumnn.uu u»uoo.°»”~ . owouonav.. «saoounn.mu uuuuuuumuu" »» “non . o..»»oo~.». coauvons.»u ...uu..».». . ...-gag»... coo».»»o.»n ......»n.on »» on.» “DdKOCdVOOO fi..NHfl..-'l .6 ODD-HI . Q“Q.Ouh.oflb HdCQOflNVQBH COCB... - a“ an.“ anomanco.~. «....on»... u a can... . .cnonaoo... .vnoqnou.»n ......»n.cn » an.» oovnnoouJ. octagon.» ouguueou.nu . «3.33.? 3303...... 0:33». «a an: ape-.555.» .n»»»- . o. .» . u...»~.~.~. u...»~o».¢o ......» .on a» «no» a a ......»u » . onu».»...» ».».».n~.~ ..uo .» .n.» u . v ...-...» n0 0 on.“ 422 ..o..=um.» «:.::a mzoa mu». .» a; so »zo.»a-».o a»... ...» aazouu» o.».o~ »m»::.x n »¢:oa . .ux.» o-uncJ» "muowww ”Mung“ nmwngux u manor . “ cmxpoo . s o o »m z z »¢ o, a A .ma.o» . s P wood» 3 3 .3 013.39 K”:- 3:8»... 2...: $52... a 239. 
a c 3: 0..“ 515;; 51—». .- 93.... 9.2.. 31.433 on .013. +0» mozouwn 00“.: $52... a 250.. a o a: I s mozouuu eon.sn umpax—x « a a 09-930: an: uww nos. 0..."... Iou+§u0 30¢ v a u z x o .tcur 91.0 1:! to 980 .a¢«u Jochzou pa»». 2:. »o can u u»»»»~nn.».» o».s»..~.» s».».n»o.» . ~u.»¢-v.ovv snoooon~.» «doc-no... .c >»-. . ..»» ...» .1.» - . .c .mu. . a.»» pc.» .3.» «c . .m»..».».os »»a~»»~..o.~» o»».oou~.»o»o»o . a~noo-a.von .usn»~oo.o.n~ s.»oo»»».»».n» n». »» .» 3.»: . ..»» >»» . »wc »» .> 2.»: o >.»» >»» ......o... .........5»~o . .......... ..ooooos.»-» »»¢ :2» p gs» u »u¢ :3» p :3» . no»..nn..n a»...oo».».~ ..oooooc.o.~ . ~«u».n~o.» ooo.».so.o~ ......»n.»» «a «v.» oo.»»n»o.. anaooo.».».~ .oooeoon.co~ . o.».o=~».~ u».»oo~s.»s ...o...».¢~ .u ...» ononsn»... ~o».~oo~.°.~ .ooeooom..o~ on».n~nc.n oo~uo~oo.co ....»oao.oo »» .»»u «uo~».~».». »«»~»o~..n.~ poaooooo.»o» . a.».o»«..». no»ooonu.so ...»oo.».»» o» ca.» oo»..»~».. «..noon..«.~ .oocooo....~ . nuo.»oa~.. owonoaoh..o ...»...o.»» u» kn.» osoonnnu.» .maoooso.oon .ooaoo...oo. . ».~.o.»..n nwhnonon.o» ...»o....»» a» on.» osocns~».o. ..oo»s-.o»» .oooooo...ou . ~o»o»n«..»o ~onoa»»n.v» ...eooon.nn »» nu.» u1co 8:23.? 3:23....» 32.22.»: . 3:23.? 83.3...» 2332.3. 3 :3 4th; u»«»..».... .nauoc...~o~ .cocoooc.~o~ o~o.».~..o. .«oo».so.»o .....o.».». »» n».» 5.4 ¢ 3:32.... 0382...»: 2333.2: . 2232.3. 2223..» 3323.: 3 ...»: «Maw: s»»»°n.o.~. .nonon.».o.~ .aooooon.»a~ . ounovo..... .«n»......» ...c...»..» a» um.» 3:33.». 2.20.9.3... 3232...: . 2.9.338. 2»..~3.~o 3322.3 3 .2» 43.9 escoo5s5.~ annadnmo.~«~ oooooaa».»«~ . aa»~n~»..v «ov~¢~o«.~o .aoooooo.no » ouou ~5on~ncu.. -o¢s~»¢.»o~ ocoeoooo.oa~ cocoonoa.» conuncoc.oo ...»ooo...» o .uo» vvonooss.. o»».»»~..oo~ .ooaeooo.~o~ . ~»~noan».~ ncsoaocu.on .aooooon.»o ~ »~»» scuuuono.» «no»»oon.~o~ .ouoooo..».~ . o.»~onn..~ «nvsuoo»..» ...ooaon... . .~»» oovuonnu.» ounouvoo.oo» .aaaeooc.»o» . snooaoou.~ nvnoonnn.on cocoooo».on » on.» nsosnouo.. owowocoo.~»» oceaooo».~o. . ooo»-~».. 
«oo°.-~.»n ...oooo...» . ca.» nunwonua.« ~»v»uv~o.so» .o.ca.o».oou . ~n~oov~o.n no~oamsu.an .ceoeoov.nn n nuoa ooosvonc.» ».o~».o..n.» .o.»...»..¢. . a»...q.n.~ o.»»..«s.oc ...ooau»... a «u». unnoaooo.~o. snnooooo.co» .coooooo.~o» . uuuoocno... o»»»»v»~.»o ...».c.»..¢ » «m.» p car‘s—pm» . > p nm»«x_»»w _ mm» .x . p . » au»¢:_»»w . y > nw..:_»»w .o» .a a > no . . nnHU 3 .2.» a»: o»:— 443. uaoxu>aou. ” » 4 c a a _ » u c x g a u a w u a a » a . APPENDICES APPENDIX A COMPUTATION OF [2'2 1 AND THE RANK OF 2 As AN 1 2 123 3 INTERMEDIATE STEP IN THE COMPUTATION OF [z'z ] . l 2 l[23:24] The method given in this appendix is very general in that it may be used to calculate a matrix of the form [2 and the rank of 'z ] .l 1 2 23 2 as an intermediate step in the computation of a matrix of the form 3 [leZ] in which: 1(23524] (1) Z Z 1, 2, 23, and Z4 contain jointly dependent or predetermined variables or both. (2) Variables in any of the matrices may occur in any or all of the other three matrices as well. (3) Z Z , 23, and Z may have less than full column rank. 1’ 2 4 In this paper, 21 and 22 will most commonly be +YU (the matrix of jointly dependent variables in the nth equation), 23 will most commonly be X“ (the matrix of predetermined variables in the O nth equation), and [23 : 24] will most commonly be XI (a matrix of instruments containing X“) or X (the matrix of predetermined vari- ables in the entire system). are calcu- Matrices of the form [ZiZ and [2'22] 1 : 2 .23 1 "[23-24] ' - 0' I _ I . i. 1. lated as [2122] [21221123 and [Z122] [ZIZZJLEZ3zz4]’ respect ve y A computational procedure for calculating [Zl221lz and rk 23 3 . ' . as an intermediate step in the calculation of [21221L[23;24] follows. (1) Let Z be a TXN matrix containing all of the variables which occur in Z1 or 22. 
If desired, Z could be defined as 22] ; however, there is no need to repeat variables I Z - [21 : 423 common to both Z1 424 and Z .1 2 If Z = Z 1 then 2 , Z = Z Z may contain variables in addition to those in Z1 desired. and Z 1 = Z2 ‘ 2 If Calculate the moment matrix (sums of squares and cross-products matrix) of [Z3 5 Z4 5 Z] , i.e., calculate: r-z'z 2‘2 2'2“ 3 3 3 4 3 X X 2 : ' : I = v v v . (A.l) [z3 . z4 . z] [23 . z4 . z] 2423 z4z4 z4z X X X N4 N3 N4 N4 N4 N O t i z 23 z 24 z z X X X LN N3 N NA N‘N (Required computer capacity may be reduced by forming only the upper or lower triangular part of the above matrix, since all operations which follow may be performed on only a triangular part.) (2) Calculate Pt '1 2424 242 t ’ t = (A.2) {[24 . z] [24 . 2]}i23 I 1 L2 2“ 2 $23 Ez'z] [2'2] 7 [21' [z] [21' z 7 4.4123 4 iz3 4123 4iz 4i23123 E2 241123 [2 231234 L. 21 Z3[Zl+]l Z3 2.1.2321 234 = {[2 L ?z }'{[z] ?z } 4 23 «L23 4.123 .L23 1Repeating variables in the Z matrix causes no computational difficulty. 425 in the manner given by section I.D.2.l The matrix given by (A.l) will have been transformed to: r- 1 A11 A12 A13 N3XN3 N3XN4 N3XN I I 0A-3) 0 [2424Jl23 [2421;23 X X X N4 N3 N4 N4 N4 N 0 [2'2 ] [2'2] 4 123 .LZ3 X X X .F N3 N N4 N N . The rank of Z is the number of diagonal elements used as 3 pivots (number of columns treated) before the maximum diagonal element becomes less than a (see section I.D.2). If 2 a Y , Z = X , and LIML coefficients are to be cal- + u 3 u culated, then the [Z'lez3 = [ Y' Y +‘H +Dpjlxp matrix should be saved aside at this point, since it is a basic matrix used in LIML calculations. (3) The computation of [Z'Z] is completed by performing l[23:Z4] elementary row operations on the matrix given by (A.2) which is a 2 submatrix of (A.3) in the manner given by section I.D.2; that is, do elementary row operations on the matrix given by (A.2) until the first N columns are reduced to zeros below the diagonal. 
4 (It is advisable to select the pivot as the largest diagonal 1Do elementary row operations on the matrix given by (A.l) until the first N columns are reduced to zeros below the diagonal. Thus, the pivot element for each step is selected from among the first N columns, only. (It is advisable to select the pivot for each step as the largest diagonal element to reduce rounding error.) 2 I.e., calculate [21 Z ] = [Z'Z] g in the z3 123 L([z4]iz3) i[23.24] manner given by section I.D.2. 426 element at each step to reduce rounding error.) The matrix given by (A.1) will have become: r- a A11 A12 A13 X X X N3 N3 N3 N4 N3 N (A.4) 0 A22 A23 X X X N4 N3 N4 N4 N4 N o 0 [2'2] , l[23.24] X X X N N3 N N4 N N J b The rank of [Z is the number of diagonal elements 3. 4 23 used as pivots (number of columns treated) before the maximum diagonal element of A22 becomes less than 6 (see section I.D. 2) . Also (A.5) rk[z3 : 24] - rk 23 + rk [24]123 APPENDIX B COMPUTATION BY DIRECT ORTHOGONALIZATION OF A MOMENT MATRIX OF VARIABLES EACH OF WHICH IS ORTHOGONAL TO A DIFFERENT SUBSET OF VARIABLES The computation by direct orthogonalization of a moment matrix of the form: (301) {[y11lzl ... [ymjiz }’{[y1]lzl ... [ymjlz } = 1 [yljlzltyljlzl "' [ylllZIEYmJLZm EVE ---[ ':[ L’ymlllm yljiz1 ym Lzm ymjlsz by direct orthogonalization will be illustrated by showing how to lA moment matrix of the form ' I c a {[YIJIZI [le'zml {[ylllzl [ym1'zm} may be calculated as [yl '°' yml'iyl "' yml - {Eyllizl "' [lelz }'{Cy1]izl '°° [ymliz }. 
compute a moment matrix of this form for m = 3.^{1,2}

(1) Let y_1, y_2, and y_3 be T×1 vectors; let Z_1 be a T×N_1 matrix of variables, Z_2 be a T×N_2 matrix of variables, and Z_3 be a T×N_3 matrix of variables.^3 Calculate the moment matrix (sums of squares and cross-products matrix) of [Z_1, Z_2, Z_3, y_1, y_2, y_3], i.e., calculate:

1 The matrix noted in (B.1) could be calculated by first calculating the m T×1 vectors [y_1]_{⊥Z_1}, ..., [y_m]_{⊥Z_m} in the manner indicated in footnotes 1 and 2 of page 43 (i.e., as m sets of residuals from m least squares calculations) and then forming the matrix of sums of squares and cross-products of these calculated vectors (residuals). A more accurate method of computation (which also requires less computer time) is to (1) calculate the m sets of least squares coefficients which give the m [y_i]_{⊥Z_i} vectors in the manner noted in footnote 1 of page 59, i.e., calculate a*_i = [Z*_i'Z*_i]^{-1}Z*_i'y_i for i = 1, ..., m (where Z*_i is a matrix of variables from Z_i with rk Z*_i = rk Z_i; see p. 47); (2) form a moment matrix, Z'Z, using all of the y_i and all of the variables occurring in any Z*_i (Z may be expanded to include all of the variables in the Z_i, if desired); (3) form a_i from a*_i by rearranging the coefficients of a*_i to the same order as Z'Z, inserting a "−1" into the a_i vector in the position corresponding to y_i, and inserting 0's into the positions corresponding to all of the remaining variables of Z'Z. The ij-th element of the desired moment matrix is then calculated as [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j} = a_i'Z'Z a_j. The method given in this appendix is more accurate than either of the above methods and requires slightly less computer time than the second method (which in turn requires considerably less computer time than the first method).

2 A verification that the computational procedure produces the correct matrix is given at the end of this appendix.
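The a_i'Z'Z a_j method described in the footnote can be sketched as follows. This is a toy check under illustrative assumptions (two variables, small random data; the names `a_vector`, `W`, `M` are not from the thesis), verifying that the quadratic form in the single moment matrix reproduces the residual cross-product.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 40
Z1, Z2 = rng.normal(size=(T, 2)), rng.normal(size=(T, 3))
y1, y2 = rng.normal(size=T), rng.normal(size=T)

# one moment matrix over every distinct variable: [Z1, Z2, y1, y2]
W = np.column_stack([Z1, Z2, y1, y2])
M = W.T @ W

def a_vector(z_cols, y_col):
    """Step (3) of the footnote: least squares coefficients of y_i on Z_i,
    a "-1" in the y_i position, zeros elsewhere."""
    Zi = W[:, z_cols]
    a = np.zeros(W.shape[1])
    a[z_cols] = np.linalg.solve(Zi.T @ Zi, Zi.T @ W[:, y_col])
    a[y_col] = -1.0
    return a

a1 = a_vector([0, 1], 5)        # y1 adjusted for Z1
a2 = a_vector([2, 3, 4], 6)     # y2 adjusted for Z2

# a_i' (Z'Z) a_j equals the residual cross-product [y_i]'_{perp Zi}[y_j]_{perp Zj}
r1 = y1 - Z1 @ np.linalg.lstsq(Z1, y1, rcond=None)[0]
r2 = y2 - Z2 @ np.linalg.lstsq(Z2, y2, rcond=None)[0]
assert np.isclose(a1 @ M @ a2, r1 @ r2)
```

The sign works out because W a_i = Z_i b_i − y_i is the negative of the residual, and the two negatives cancel in the quadratic form.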
3 In this appendix, y_1, y_2, and y_3 denote any variables (not necessarily jointly dependent variables), and Z_1, Z_2, and Z_3 denote matrices of variables, not the explanatory variables of equations 1, 2, and 3. Z_1, Z_2, and Z_3 may contain variables in common, or the variables in any two of the matrices may be linearly independent. The matrices Z_1, Z_2, and Z_3 need not have full column rank.

(B.2)  [Z_1, Z_2, Z_3, y_1, y_2, y_3]'[Z_1, Z_2, Z_3, y_1, y_2, y_3]  =

       [ Z_1'Z_1 | Z_1'Z_2   Z_1'Z_3   Z_1'y_1   Z_1'y_2   Z_1'y_3 ]
       [---------+-------------------------------------------------]
       [ Z_2'Z_1 | Z_2'Z_2   Z_2'Z_3   Z_2'y_1   Z_2'y_2   Z_2'y_3 ]
       [ Z_3'Z_1 | Z_3'Z_2   Z_3'Z_3   Z_3'y_1   Z_3'y_2   Z_3'y_3 ]
       [ y_1'Z_1 | y_1'Z_2   y_1'Z_3   y_1'y_1   y_1'y_2   y_1'y_3 ]
       [ y_2'Z_1 | y_2'Z_2   y_2'Z_3   y_2'y_1   y_2'y_2   y_2'y_3 ]
       [ y_3'Z_1 | y_3'Z_2   y_3'Z_3   y_3'y_1   y_3'y_2   y_3'y_3 ]

(Required computer capacity may be reduced by forming only the upper or lower triangular part of the above matrix, since all operations which follow may be performed on only a triangular part.)

(2) Let us designate the part of (B.2) below and to the right of the dashed lines as (B.3). (B.3) is saved aside at this point for use in later calculations.

(3) Calculate {[Z_2, Z_3, y_1, y_2, y_3]'[Z_2, Z_3, y_1, y_2, y_3]}_{⊥Z_1} in the manner given in section I.D.2.^1 The matrix given by (B.2) will have been transformed to:

(B.4)  [ A_11   A_12      A_13      A_{1,y_1}   A_{1,y_2}   A_{1,y_3} ]
       [ 0      Z_2'Z_2   Z_2'Z_3   Z_2'y_1     Z_2'y_2     Z_2'y_3   ]
       [ 0      Z_3'Z_2   Z_3'Z_3   Z_3'y_1     Z_3'y_2     Z_3'y_3   ]
       [ 0      y_1'Z_2   y_1'Z_3   y_1'y_1     y_1'y_2     y_1'y_3   ]
       [ 0      y_2'Z_2   y_2'Z_3   y_2'y_1     y_2'y_2     y_2'y_3   ]
       [ 0      y_3'Z_2   y_3'Z_3   y_3'y_1     y_3'y_2     y_3'y_3   ]

1 Do elementary row operations on the matrix given by (B.2) until the first N_1 columns are reduced to zeros below the diagonal. Thus, the pivot element for each step is selected from among the first N_1 columns, only. (It is advisable to select the pivot for each step as the largest diagonal element to reduce rounding error.)

where A_11, A_12, etc.
stand for matrices of no further interest to us, occupying the positions where Z_1'Z_1, Z_1'Z_2, etc. occurred before. The entire lower right-hand submatrix is the moment matrix of the part of the variables inside it orthogonal to Z_1 (e.g., the submatrix in the Z_3'y_1 position is [Z_3'y_1]_{⊥Z_1}). rk Z_1 is the number of pivots used (see section I.D.2).

(4) Retrieve (B.3) and replace the row and column of (B.3) corresponding to y_1 by the corresponding elements of (B.4). (B.3) will have become:

(B.5)  [ Z_2'Z_2          | Z_2'Z_3            [Z_2'y_1]_{⊥Z_1}   Z_2'y_2            Z_2'y_3          ]
       [------------------+--------------------------------------------------------------------------]
       [ Z_3'Z_2          | Z_3'Z_3            [Z_3'y_1]_{⊥Z_1}   Z_3'y_2            Z_3'y_3          ]
       [ [y_1'Z_2]_{⊥Z_1} | [y_1'Z_3]_{⊥Z_1}   [y_1'y_1]_{⊥Z_1}   [y_1'y_2]_{⊥Z_1}   [y_1'y_3]_{⊥Z_1} ]
       [ y_2'Z_2          | y_2'Z_3            [y_2'y_1]_{⊥Z_1}   y_2'y_2            y_2'y_3          ]
       [ y_3'Z_2          | y_3'Z_3            [y_3'y_1]_{⊥Z_1}   y_3'y_2            y_3'y_3          ]

Let us designate the part of (B.5) below and to the right of the dashed lines as (B.6). (B.6) is saved aside at this point so that it may be used for later calculations. (B.2) will already have been overwritten, and we are finished with (B.3) and (B.4).

(5) Calculate {(Z_3, [y_1]_{⊥Z_1}, y_2, y_3)'(Z_3, [y_1]_{⊥Z_1}, y_2, y_3)}_{⊥Z_2} in the manner given in section I.D.2; that is, perform elementary row operations on the (B.5) matrix until all elements below the diagonal of the first N_2 columns are reduced to zeros. Thus, the pivot element for each step is selected from the first N_2 columns. (As before, it is advisable to rearrange rows and columns at each step to improve accuracy.) After all of the first N_2 diagonal elements have been used as pivots, or the largest diagonal element has become less than ε, the matrix will have been transformed to:

(B.7)  [ A_22   A_23               A_{2,y_1}     A_{2,y_2}                   A_{2,y_3}         ]
       [ 0      [Z_3'Z_3]_{⊥Z_2}   A_{3,y_1}     [Z_3'y_2]_{⊥Z_2}            [Z_3'y_3]_{⊥Z_2}  ]
       [ 0      A_{y_1,3}          A_{y_1,y_1}   [y_1]'_{⊥Z_1}[y_2]_{⊥Z_2}   A_{y_1,y_3}       ]
       [ 0      [y_2'Z_3]_{⊥Z_2}   A_{y_2,y_1}   [y_2'y_2]_{⊥Z_2}            [y_2'y_3]_{⊥Z_2}  ]
       [ 0      [y_3'Z_3]_{⊥Z_2}   A_{y_3,y_1}   [y_3'y_2]_{⊥Z_2}            [y_3'y_3]_{⊥Z_2}  ]

where the A_{ij} are submatrices of no further interest to us. As before, rk Z_2 is the number of pivots used.
(6) Retrieve (B.6) and replace the row and column of (B.6) corresponding to y_2 by the corresponding elements of (B.7). (B.6) will have become:

(B.8)  [ Z_3'Z_3          | [Z_3'y_1]_{⊥Z_1}            [Z_3'y_2]_{⊥Z_2}            Z_3'y_3          ]
       [------------------+-------------------------------------------------------------------------]
       [ [y_1'Z_3]_{⊥Z_1} | [y_1'y_1]_{⊥Z_1}            [y_1]'_{⊥Z_1}[y_2]_{⊥Z_2}   [y_1'y_3]_{⊥Z_1} ]
       [ [y_2'Z_3]_{⊥Z_2} | [y_2]'_{⊥Z_2}[y_1]_{⊥Z_1}   [y_2'y_2]_{⊥Z_2}            [y_2'y_3]_{⊥Z_2} ]
       [ y_3'Z_3          | [y_3'y_1]_{⊥Z_1}            [y_3'y_2]_{⊥Z_2}            y_3'y_3          ]

Let us designate the part of (B.8) below and to the right of the dashed lines as (B.9). (B.9) is saved aside at this point for further calculations. (B.2) through (B.7) are not used for further calculations.

(7) Calculate {([y_1]_{⊥Z_1}, [y_2]_{⊥Z_2}, y_3)'([y_1]_{⊥Z_1}, [y_2]_{⊥Z_2}, y_3)}_{⊥Z_3} in the manner given in section I.D.2; that is, perform elementary row operations on the (B.8) matrix until all elements below the diagonal of the first N_3 columns are reduced to zeros. Thus, the pivot element for each step is selected from the first N_3 columns. (As before, it is advisable to rearrange rows and columns at each step to improve accuracy.) After all of the first N_3 diagonal elements have been used as pivots, or the largest diagonal element has become less than ε, the (B.8) matrix will have been transformed to:

(B.10)  [ A_33   A_{3,y_1}                   A_{3,y_2}                   A_{3,y_3}                 ]
        [ 0      A_{y_1,y_1}                 A_{y_1,y_2}                 [y_1]'_{⊥Z_1}[y_3]_{⊥Z_3} ]
        [ 0      A_{y_2,y_1}                 A_{y_2,y_2}                 [y_2]'_{⊥Z_2}[y_3]_{⊥Z_3} ]
        [ 0      [y_3]'_{⊥Z_3}[y_1]_{⊥Z_1}   [y_3]'_{⊥Z_3}[y_2]_{⊥Z_2}   [y_3'y_3]_{⊥Z_3}          ]

where the A_{ij} are submatrices of no further interest to us. As before, rk Z_3 is the number of pivots used.

(8) Retrieve (B.9) and replace the row and column of (B.9) corresponding to y_3 by the corresponding elements of (B.10). (B.9) will have become:

(B.11)  [ [y_1'y_1]_{⊥Z_1}            [y_1]'_{⊥Z_1}[y_2]_{⊥Z_2}   [y_1]'_{⊥Z_1}[y_3]_{⊥Z_3} ]
        [ [y_2]'_{⊥Z_2}[y_1]_{⊥Z_1}   [y_2'y_2]_{⊥Z_2}            [y_2]'_{⊥Z_2}[y_3]_{⊥Z_3} ]
        [ [y_3]'_{⊥Z_3}[y_1]_{⊥Z_1}   [y_3]'_{⊥Z_3}[y_2]_{⊥Z_2}   [y_3'y_3]_{⊥Z_3}          ]

which is the desired moment matrix.

Modifications and Generalizations of the Preceding Procedure

The preceding procedure can be modified and generalized in several ways.
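Before turning to those generalizations, the whole m = 3 procedure, including the save-and-replace bookkeeping of steps (2) through (8), can be sketched. This is a simplified stand-in for the section I.D.2 routine (sequential pivoting, no diagonal search), with all names illustrative; the final matrix is checked against residuals computed directly.

```python
import numpy as np

def sweep_leading(M, ncols, eps=1e-10):
    """Elementary row operations that eliminate the leading `ncols`
    columns of the moment matrix M from all other rows (a simplified
    stand-in for the pivoting routine of section I.D.2)."""
    A = np.array(M, dtype=float)
    for p in range(ncols):
        if A[p, p] < eps:
            break
        for i in range(A.shape[0]):
            if i != p:
                A[i] = A[i] - (A[i, p] / A[p, p]) * A[p]
    return A

rng = np.random.default_rng(2)
T, m = 60, 3
Zs = [rng.normal(size=(T, n)) for n in (2, 3, 2)]    # Z1, Z2, Z3
ys = [rng.normal(size=T) for _ in range(m)]          # y1, y2, y3

W = np.column_stack(Zs + ys)
M = W.T @ W                          # the full moment matrix, as in (B.2)
for i, Z in enumerate(Zs):
    n = Z.shape[1]
    saved = M[n:, n:].copy()         # set the trailing block aside (B.3/B.6/B.9)
    A = sweep_leading(M, n)          # orthogonalize with respect to Z_i
    k = A.shape[0] - m + i           # position of y_i in the swept matrix
    s = saved.shape[0] - m + i       # position of y_i in the saved block
    saved[s, :] = A[k, n:]           # replace the y_i row and column by the
    saved[:, s] = A[n:, k]           # adjusted row and column (B.5/B.8/B.11)
    M = saved

# the result should equal the moment matrix of the three residual vectors
R = np.column_stack([y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
                     for Z, y in zip(Zs, ys)])
assert np.allclose(M, R.T @ R)
```

Each pass discards the swept matrix except for the one adjusted row and column, which is exactly the economy the save-and-replace steps buy on a machine with limited storage.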
Following are some of them:

(1) Only the upper triangular or lower triangular part of the initial moment matrix need be formed, and all of the calculations given previously can be performed within this triangular part of the matrix, thereby saving computer storage.

(2) The procedure given for m = 3 may, of course, be used for m matrices Z_i and corresponding y_i. Thus [Z_1 ... Z_m, y_1 ... y_m]'[Z_1 ... Z_m, y_1 ... y_m] is formed, and then an orthogonalization is performed for each Z_i. Before orthogonalizing with respect to a given Z_i, the part of the moment matrix corresponding to Z_{i+1} ... Z_m, y_1 ... y_m (say M_{ii}) is saved. After orthogonalizing with respect to Z_i, M_{ii} is retrieved and the row and column corresponding to y_i are replaced by the corresponding row and column of the just-orthogonalized matrix. The desired matrix is obtained by replacing the m-th row and column of M_{m-1,m-1} by the m-th row and column of the orthogonalized matrix of the m-th step.

(3) If it is desired that some of the jointly dependent variables (say y_1°, ..., y_n°) not be adjusted, i.e., that a moment matrix of the form {[y_1]_{⊥Z_1} ... [y_m]_{⊥Z_m}, y_1° ... y_n°}'{[y_1]_{⊥Z_1} ... [y_m]_{⊥Z_m}, y_1° ... y_n°} be formed, this can readily be accomplished by starting with [Z_1 ... Z_m, y_1 ... y_m, y_1° ... y_n°]'[Z_1 ... Z_m, y_1 ... y_m, y_1° ... y_n°] and then stopping after m orthogonalizations as before. (This is correct because [y_i'y_j°]_{⊥Z_i} = [y_i]'_{⊥Z_i}y_j° [see (I.56)]. Of the jointly dependent variables of an equation, the normalizing variable is the most likely one not to be specially adjusted, i.e., to be designated a y_i°.)

(4) More than one y_i may be adjusted by the same Z_i. For example, if {[y_1]_{⊥Z_1}, [y_2]_{⊥Z_2}, [y_3]_{⊥Z_2}, [y_4]_{⊥Z_4}}'{[y_1]_{⊥Z_1}, [y_2]_{⊥Z_2}, [y_3]_{⊥Z_2}, [y_4]_{⊥Z_4}} is desired, this can be accomplished by starting with [Z_1, Z_2, Z_4,
y_1, y_2, y_3, y_4]'[Z_1, Z_2, Z_4, y_1, y_2, y_3, y_4] and (letting the matrix saved just before orthogonalizing by Z_2 be denoted M_11, and the matrix obtained by orthogonalizing by Z_2 be denoted P_22) replacing the rows and columns corresponding to both y_2 and y_3 of the M_11 matrix by the corresponding elements of P_22. As a last step, the orthogonalization by Z_4 would be as usual.

(5) Usually there will be a set of variables common to all of the Z_i matrices. (For example, the variables in the matrix X_μ will usually be contained as instruments in all of the matrices of instruments used for an equation.) If so, we can orthogonalize with respect to the set of common variables initially and then omit them from the Z_i matrices. The following example illustrates this: Suppose the variables in the T×N_0 matrix Z_0 had been common to Z_1, Z_2, and Z_3 in the example given previously. Then we could form the moment matrix

        [Z_0, Z_1⁻, Z_2⁻, Z_3⁻, y_1, y_2, y_3]'[Z_0, Z_1⁻, Z_2⁻, Z_3⁻, y_1, y_2, y_3]

where Z_1⁻ is the matrix of N_1 − N_0 variables of Z_1 not in Z_0, Z_2⁻ is the matrix of N_2 − N_0 variables of Z_2 not in Z_0, and Z_3⁻ is the matrix of N_3 − N_0 variables of Z_3 not in Z_0. This matrix would then be orthogonalized with respect to Z_0, giving us:

        M = {[Z_1⁻, Z_2⁻, Z_3⁻, y_1, y_2, y_3]'[Z_1⁻, Z_2⁻, Z_3⁻, y_1, y_2, y_3]}_{⊥Z_0} .

The (N_1+N_2+N_3−3N_0+3)×(N_1+N_2+N_3−3N_0+3) matrix M is then substituted for (B.2). All calculations then proceed the same as before (steps 2 through 8) until the desired matrix (B.11) is obtained. The only difference in the procedure is that in place of the N_1 rows and columns corresponding to Z_1, we have N_1 − N_0 rows and columns corresponding to [Z_1⁻]_{⊥Z_0}; in place of the N_2 rows and columns corresponding to Z_2, we have N_2 − N_0 rows and columns corresponding to [Z_2⁻]_{⊥Z_0}; and in place of the N_3 rows and columns corresponding to Z_3, we have N_3 − N_0 rows and columns corresponding to [Z_3⁻]_{⊥Z_0}.
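This preliminary orthogonalization by a common block, and the rank relation it relies on, can be illustrated numerically (names illustrative; note that the Z_i need not have full column rank, which the example builds in deliberately):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 30
Z0 = rng.normal(size=(T, 2))                     # variables common to every Z_i
# Z1_minus: variables of Z1 not in Z0; its first column is deliberately a
# linear combination of Z0, so Z1 does not have full column rank
Z1m = np.column_stack([Z0 @ np.array([1.0, 2.0]), rng.normal(size=(T, 2))])
Z1 = np.hstack([Z0, Z1m])

P0 = Z0 @ np.linalg.pinv(Z0)                     # projection on the common block
Z1m_perp = (np.eye(T) - P0) @ Z1m                # [Z1_minus] orthogonal to Z0

rk = np.linalg.matrix_rank
assert rk(Z1) == rk(Z0) + rk(Z1m_perp)           # rk Z1 = rk Z0 + rk [Z1m]_perp
assert rk(Z1) == 4                               # 2 + 2: the redundant column drops out
```

The redundant column of Z1m becomes (numerically) zero after the projection, so the pivoting routine would simply skip it, exactly as the ε test in section I.D.2 is meant to do.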
Also, rk Z_1 = rk Z_0 + rk [Z_1⁻]_{⊥Z_0}, rk Z_2 = rk Z_0 + rk [Z_2⁻]_{⊥Z_0}, and rk Z_3 = rk Z_0 + rk [Z_3⁻]_{⊥Z_0}.

Verification that the Procedure Produces the Desired Matrix

That (B.4) and any submatrix of the form [Z_i'y_j]_{⊥Z_k}, [y_i'Z_j]_{⊥Z_k}, or [y_i'y_j]_{⊥Z_k} are as claimed may be verified by comparing the calculations producing them with the calculations given in section I.D.2. (The calculations given in section I.D.2 are verified at the end of section I.D.2.) We will now verify that the elements [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j} are as claimed.

The relevant submatrices used in computing [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j} are (assuming i < j):

(B.12)  [ Z_j'Z_j            Z_j'y_j          ]        N_j×N_j   N_j×1
        [ [y_i'Z_j]_{⊥Z_i}   [y_i'y_j]_{⊥Z_i} ]        1×N_j     1×1

Let us initially assume that Z_j has full column rank. Performing elementary row operations on (B.12) to reduce the first N_j columns to zero below the diagonal is equivalent to premultiplying (B.12) by a nonsingular matrix such that:

        [ E_11   0 ] [ Z_j'Z_j            Z_j'y_j          ]     [ A_11   A_12 ]
        [ E_21   I ] [ [y_i'Z_j]_{⊥Z_i}   [y_i'y_j]_{⊥Z_i} ]  =  [ 0      A_22 ]

Thus:

        E_21[Z_j'Z_j] + [y_i'Z_j]_{⊥Z_i} = 0 ,   so   E_21 = −[y_i'Z_j]_{⊥Z_i}[Z_j'Z_j]^{-1} ,

and:

        E_21[Z_j'y_j] + [y_i'y_j]_{⊥Z_i} = A_22 .

Substituting for E_21 in the last equation, we get:

(B.13)  A_22 = −[y_i'Z_j]_{⊥Z_i}[Z_j'Z_j]^{-1}[Z_j'y_j] + [y_i'y_j]_{⊥Z_i} .

Since [y_i'Z_j]_{⊥Z_i} = y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']Z_j and [y_i'y_j]_{⊥Z_i} = y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']y_j, where Z*_i is a subset of the variables in Z_i, Z*_i having full column rank which is the same as the rank of Z_i (see section I.D.1), we may rewrite (B.13) as:

(B.14)  A_22 = −y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']Z_j(Z_j'Z_j)^{-1}[Z_j'y_j]
               + y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']y_j
             = y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i'][I − Z_j(Z_j'Z_j)^{-1}Z_j']y_j
             = [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j}   [see (I.47)].

Thus, in the case of a Z_j having full column rank, the desired element is obtained.

In the case of a Z_j having rank N*_j < N_j, row operations are performed on the columns corresponding to N*_j of the variables in Z_j before the diagonal elements corresponding to the remaining variables become less than ε. (The orthogonalization stops at this point.)
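Before completing the rank-deficient case, the full-column-rank verification (B.12) through (B.14) can be checked numerically (a toy check with illustrative names, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 25
Zi, Zj = rng.normal(size=(T, 2)), rng.normal(size=(T, 3))
yi, yj = rng.normal(size=T), rng.normal(size=T)

ri = yi - Zi @ np.linalg.lstsq(Zi, yi, rcond=None)[0]    # [y_i] perp Z_i
rj = yj - Zj @ np.linalg.lstsq(Zj, yj, rcond=None)[0]    # [y_j] perp Z_j

# (B.13): A22 after sweeping the Z_j block out of the (B.12) submatrix;
# note [y_i' Z_j]_{perp Z_i} = ri' Z_j and [y_i' y_j]_{perp Z_i} = ri' yj
A22 = ri @ yj - (ri @ Zj) @ np.linalg.solve(Zj.T @ Zj, Zj.T @ yj)

# (B.14): the same number is the fully adjusted cross-product
assert np.isclose(A22, ri @ rj)
```

The check works because the elimination step applies the Z_j projection to a vector that is already orthogonal to Z_i, which is precisely the chain of identities in (B.14).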
This is equivalent to performing row operations on the following submatrix of (B.12) (letting Z*_j be a submatrix of Z_j containing the variables corresponding to the N*_j diagonal elements used as pivots):

(B.15)  [ Z*_j'Z*_j           Z*_j'y_j         ]        N*_j×N*_j   N*_j×1
        [ [y_i'Z*_j]_{⊥Z_i}   [y_i'y_j]_{⊥Z_i} ]        1×N*_j      1×1

The same derivation may now be performed on (B.15) as was performed with (B.12); the only difference in the intermediate matrices obtained is that Z*_j will occur in place of Z_j wherever Z_j presently occurs. Thus, (B.14) becomes:

(B.16)  A_22 = −y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']Z*_j(Z*_j'Z*_j)^{-1}[Z*_j'y_j]
               + y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']y_j
             = [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j}   [see (I.47)].

Thus, the desired element is obtained even in the case of a Z_j having less than full column rank.

APPENDIX C

TENTATIVE PROOFS REGARDING THE CONSISTENCY OF δ̂_{k_1,k_2} AND δ*_{k_1,k_2}

Consistency and the concept of a probability limit (plim) are discussed in section I.B. For a discussion of the matrix algebra of the plim operator, see Goldberger [1964], especially pp. 115-120, and Christ [1966].^1

In this appendix we will need to distinguish between the matrices Y and Y_μ. As in our initial notation (section I.C), Y will refer to the T×G matrix of all of the jointly dependent variables in the system, and Y_μ will refer to the T×m_μ submatrix of Y corresponding to the m_μ "explanatory" jointly dependent variables in equation μ. Since it will cause no notational conflict, the y_μ vector (the T×1 submatrix of Y corresponding to the normalizing jointly dependent variable in equation μ) will be written as y, and the u_μ vector (the T×1 vector of disturbances of the μ-th equation) will be written as u. Also, m_μ, L_μ, and Λ_μ will be shortened to m, L, and Λ, respectively.

From preceding assumptions or derivations we have:

(C.1)  plim(1/T)U'U = Σ where U is T×M and Σ is M×M.^2 The diagonal element of Σ corresponding to the μ-th equation is σ²; hence, plim(1/T)u'u = σ².
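The content of (C.1) can be illustrated by a small seeded simulation (purely illustrative, not part of the proofs; σ² = 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 2.0
for T in (100, 1_000, 100_000):
    u = rng.normal(scale=np.sqrt(sigma2), size=T)
    print(T, u @ u / T)          # the sample moment settles near sigma^2

est = u @ u / T                  # value at the largest T
assert abs(est - sigma2) < 0.05
```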
(C.2)  plim(1/T)X'X = Ω_XX where X is T×Λ and Ω_XX is Λ×Λ.^3 The submatrix of X consisting of the predetermined variables in the μ-th equation is the T×L matrix X_μ; hence, plim(1/T)X_μ'X_μ = Ω_{X_μX_μ}, an L×L matrix.

1 As before in this paper, plim stands for plim as T → ∞.

2 Assumption 2, section I.C.3.

3 Assumption 3, section I.C.3.

(C.3)  plim(1/T)X'U = 0 where 0 is a Λ×M matrix.^1 Thus, plim(1/T)X_μ'U = 0 (with 0 an L×M matrix).

(C.4)  plim(1/T)X'V = 0 where V is T×G and, therefore, 0 is Λ×G.^2

(C.5)  plim(1/T)X_I'X_I = Ω_{X_IX_I} where X_I is a T×K matrix of instruments and Ω_{X_IX_I} is K×K.^3 We will assume that the variables in X_μ are contained in X_I (hence, Ω_{X_μX_μ} is a submatrix of Ω_{X_IX_I} as well as a submatrix of Ω_XX).

Let rk Ω_{X_IX_I} = ρ. Since for all T, (1/T)X_I'X_I is a moment matrix, Ω_{X_IX_I} is a moment matrix, and there exists a nonsingular ρ×ρ submatrix Ω_{X*_IX*_I} = plim(1/T)X*_I'X*_I, where X*_I is a matrix of variables from X_I. If Ω_{X_IX_I} is nonsingular (ρ = K), then X*_I is the entire matrix X_I and Ω_{X*_IX*_I} = Ω_{X_IX_I}. We will assume that rk X*_I = rk X_I = ρ for all T sufficiently large; however, X*_I and X_I may have rank less than ρ for small T.

(C.6)  plim(1/T)X_I'X = Ω_{X_IX} with Ω_{X_IX} a K×Λ matrix.^4 (Ω_{X_IX_μ} is a submatrix of Ω_{X_IX} as well as a submatrix of Ω_{X_IX_I}.)

(C.7)  plim(1/T)X_I'U = 0 where 0 is a K×M matrix.^5 Thus, plim(1/T)X*_I'U = 0 (with 0 a rk X_I × M matrix).

(C.8)  plim(1/T)X_I'V = 0 where 0 is K×G.^6

1 Assumption 3, section I.C.3.

2 Follows from (C.3). See (I.25).

3 Assumption 5, section I.C.3.

4 Assumption 5, section I.C.3.

5 Assumption 5, section I.C.3.

6 Follows from (C.7). See (I.26).
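Before the remaining derivations, the double-k-class-type estimator δ̂_{k_1,k_2} studied in this appendix can be sketched. The data-generating details below are illustrative assumptions; the check uses only the algebraic identity that k_1 = k_2 = 1 reproduces two-stage least squares when the variables in X_μ are contained in X_I.

```python
import numpy as np

rng = np.random.default_rng(6)
T, L, K = 200, 2, 4
XI = rng.normal(size=(T, K))                    # instruments; X_mu is contained in X_I
Xmu = XI[:, :L]
Ymu = XI @ rng.normal(size=(K, 1)) + 0.5 * rng.normal(size=(T, 1))   # one "explanatory" variable
y = Ymu[:, 0] * 1.5 + Xmu @ np.array([2.0, -1.0]) + rng.normal(size=T)

PI = XI @ np.linalg.pinv(XI)                    # projection on the instruments
perp = np.eye(T) - PI                           # [.]_{perp X_I} moments via residuals

def delta_hat(k1, k2):
    """The double-k-class form: moment matrix with k1-adjusted Y'Y block,
    right-hand vector with k2-adjusted Y'y block."""
    top = np.hstack([Ymu.T @ Ymu - k1 * (Ymu.T @ perp @ Ymu), Ymu.T @ Xmu])
    bot = np.hstack([Xmu.T @ Ymu, Xmu.T @ Xmu])
    rhs = np.concatenate([Ymu.T @ y - k2 * (Ymu.T @ perp @ y), Xmu.T @ y])
    return np.linalg.solve(np.vstack([top, bot]), rhs)

# k1 = k2 = 1 is two-stage least squares
W = np.hstack([PI @ Ymu, Xmu])
tsls = np.linalg.solve(W.T @ np.hstack([Ymu, Xmu]), W.T @ y)
assert np.allclose(delta_hat(1.0, 1.0), tsls)
```

The identity holds because Y_μ'Y_μ − [Y_μ'Y_μ]_{⊥X_I} = Y_μ'P_I Y_μ and P_I X_μ = X_μ, so with both k's equal to one the normal equations collapse to those of 2SLS.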
H I H H In addition to assumptions (0.1) through (0.9) we will require 5‘ the assumption: ; - -1 Q 0 * * * * YHXIQXIXIQXIYH Y X H H (0.11) is nonsingular. L. 0X Y 0XIX D H H H; It will be convenient for us to derive the plim of 1/T times the sums of cross-product of certain matrices with Y before commenc- ing the main derivations of this appendix (0.12) plim(l/T)Y'U - p1im(1/T)[XH' + v]'u p1im(1/T)HX'U + p1im(1/T)V'U H'plim(1/T)X'U + plim(l/T>(-1)r‘1[u z OJ'U p1im(1/T)U'U 2 -1 -1 def 3 - H 0 - T 0 - -F [g] = OYU . . I = x . . We will use plim(l/r)Y u QYHU , an m l submatrix of OYU 1Follows from the relationship between V and U. See (1.22). 2See (C.22). 2 3"def" denotes that we are defining -F-1[;] as OYU . 442 plim(i/T)[xn' + v]'xI (0.13) p1im(1/T)Y'X I = leim(l/T)X'XI + plim(l/T)V'XI def = + O = m = . I.mXXI XXI QYXI . . I = o X . We Will use plim(l/T)YuXI prI (an m K submatrix of OYXI), . g = . plim(l/T)YpXu OYuxu (an mXL submatrix of QYXI)’ and plim(l/T)Y¢X¥ = OYux*I (an mXp submatrix of OYXI). (0.14) p1im(l/T)Y Y- —p11m(1/T)[xn' + v] [xn' + v] = plim(i/T)[Hx'xn' + nx'v + v'xn' + v'v] = H[p11m(i/T)x'x]n' + leim(1/T)X'V + [plim(i/T)v'x]n' + p1im(1/T)V'V = nnxxn' + n-o + o-n' +-nVV = nnxxn' *‘vidgfinYY We will use plim(l/T)Y;Yu = QYHYH , a submatrix of Q YY ' Theoeem (C.75): If plim(kl - 1) = plim(k2 - 1) = 0, then u u -1 I I _ I Yu Yu- HEY Y uJiXI Y x Yuy k2[Yuy]lx 8 = k1’k2 x'Y x'x x'y u u u u u 1 is a Consistent estimator of 6 . Phoofi 06 Theotem (0.75): ‘ -1 I _k Y I flEYuYulLXI Yuxu Yu- 2Eu1LxI (0.16) 6k1,k2 - 6 = y - 5 X'Y X'X X' n u u u 1Since [Y'ptyhxI = [YH J'X Iy [see (1.56)], the right hand vector '-k 2[Y t] may be written as Y“ “lxl Y xI u "I mh‘ 'PE 443 d ' I E X 6 + : (an Since y [Yp #1 U) .1 Y'Y -k Y'Y Y'x '-k Y ' u u- 1[ u u-JLXI u D Yu 2[ uJiXI ‘ {[Y f X ]5 + u} - 5 X'Y x'x x' “‘ “‘ u u» u- p u . 
c = v (and Since [Yfillleu [YuYfillXI [see (1.56)] and [YH-J‘L'XIXH- -= ”11"“th - 0 [see (1.56) and (1.45)]): -1 Y'Y -k [Y'Y Y'x Y'Y -1< Y'Y Y'x ms nu 1 uujixl up nu- ZEHH'JJ-XI up 1 =- 5 x'Y x'x X'Y x'x I u u u u u u u u u. -1 'Y -k Y'Y Y'X '-1< Y ' u u- 1[ u uJLXI u 1» Yu 2[ uJLXI + u - 5 x'Y x'x x' u u u u = A6 + d where: ' -1 Y'Y -k Y'Y Y'x Y'Y -k Y'Y Y'x uu 1[HH]iXI up» up ZEHMLXI up (0.17) A .. - I x'Y x'x X'Y x'x u u u u u- and -1 Y'Y -k Y'Y Y'x y'-k Y ' u- » 1E1» HJlXI u u u- 2E MJLXI (C.18) d = u . x'Y x'x x' u u u 1b In evaluating plim A and plim d, we will first find it con- venient to evaluate plim(l/T){YJYP - kltY;Yul-XI} and . v __ k v _ plim(l/T)prYu ZEYfiYHJlXI} (C.19) plim(l/T)[Y;Yp]ixI a plim(l/T){Y;Y - [YdYu] } and for T u- IXI sufficiently large [see (C.5)] and using (1.55): 444 -1 I 11 ' - ' * *' * ' p m(1/T)[YuYu] plitn(l/'1‘)YMXI(XI XI) Xf Y“ . nY Y - [plim(l/T)YfiX‘f][p11m(l/T) (Xsf'xaf)]'1[p11m(1/I)xaf'Y J u u n B “Y“ Y ‘ 0Y x* alliX’VaxakY ° I I I I u Therefore, (C.20) plim(l/T){Y'Y - R Y' a u p 1[ uYHJlXI} 11 1/1: Y'Y - 11 k 9 “>111; p m 1- plim(l/T)[YL1Y“]J_ x 1 -1 -1 = - 1 - [n - 0 ] =