ABSTRACT

IMPROVING THE COMPUTATION OF SIMULTANEOUS STOCHASTIC LINEAR EQUATIONS ESTIMATES

By William Lewis Ruble

As a part of a much larger statistical package for a Control Data Corporation 6500 computer, the writer participated in the development of a set of simultaneous stochastic linear equations routines including direct least squares (DLS), two-stage least squares (2SLS), limited information single equation maximum likelihood (LIML), the Zellner-Aitken estimator (ZA), three-stage least squares (3SLS), linearized maximum likelihood (LML), subsystem maximum likelihood (SML), and full information maximum likelihood (FIML).

This paper summarizes (1) computational approaches used, (2) many relationships between methods, (3) [to a very limited extent] some of the properties of these methods, and (4) forms of user control cards which may be used to specify and control the computation of problems. Computational techniques such as standardization of variables to reduce rounding error are noted.

Some of the computational approaches used and some of the relationships among estimators noted in this paper were derived by the writer. Of the computational approaches which are presented for the first time (as far as the writer is aware), the following are probably the most noteworthy:

(1) The use of direct orthogonalization in the calculations for many of the simultaneous stochastic linear equations methods. In addition to reducing rounding error, the use of direct orthogonalization eliminates some of the problems of multicollinearity among predetermined variables in the equation system. Also, the matrix of predetermined variables need not have full column rank.

(2) The development of a method for imposing arbitrary linear restrictions on coefficients which:

(a) Allows the restrictions to be specified directly to the computer without prior solving out or conversion.
(b) Provides a means of imposing arbitrary linear restrictions upon FIML and SML coefficients.

(c) May be applied in essentially the same way to DLS, ZA, SML, FIML, LML, and 3SLS.

(d) Is adapted to methods requiring iteration to a solution.

(e) Allows redundant restrictions to be imposed on coefficients. The number of independent restrictions is calculated as a by-product of the computational procedure.

(f) Detects inconsistent restrictions.

(g) May be used to calculate restricted coefficients even though a unique solution for a method does not exist in the absence of the restrictions.

Relationships among methods which are shown for the first time in this paper include:

(1) For the special case of a system of equations in which only one jointly dependent variable occurs in each equation, the following computational procedures lead to the same coefficients:

(a) FIML.
(b) Iteratively applying ZA.
(c) The Telser method of iteratively estimating each equation by DLS.

(2) For the general case in which more than one jointly dependent variable is permitted per equation and at least one equation is over-identified, the following computational procedures do not lead to the same coefficients:

(a) FIML.
(b) Iteratively applying 3SLS (I3SLS).

(3) Iteratively applying LIML (ILIML) leads to FIML estimates in the general case (multiple jointly dependent variables occurring in one or more stochastic equations).1 The Telser method of iteratively estimating each equation by DLS may be considered a special case of ILIML; hence, IDLS is a maximum likelihood method. (A direct derivation of IDLS as a maximum likelihood method for the special case of one jointly dependent variable per equation is also given.)

In the derivation of the likelihood function for a system of equations for the application of FIML and for a subsystem of equations for the application of SML, identity equations are explicitly recognized.
It is shown that the identity equations need not be used to eliminate jointly dependent variables from the stochastic equations in order to express the likelihood function or to apply the FIML and SML estimation procedures.2

1The ILIML procedure was proposed to the writer by Professor Herman Rubin.

2T. J. Rothenberg and C. T. Leenders, "Efficient Estimation of Simultaneous Equation Systems", Econometrica, XXXII, No. 1-2 (January-April, 1964), 57-76 have already shown that it is unnecessary to use identity equations to eliminate jointly dependent variables from the stochastic equations; however, a slightly different approach to showing this is taken in this paper. Professor Herman Rubin informed the writer that it is unnecessary to use identity equations to eliminate jointly dependent variables for SML; however, the writer is not aware of any reference to this in the literature.

IMPROVING THE COMPUTATION OF SIMULTANEOUS STOCHASTIC LINEAR EQUATIONS ESTIMATES

By
William Lewis Ruble

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Agricultural Economics

1968

ACKNOWLEDGEMENTS

The writer appreciates the help he has received from a large number of people during the early phases of projects leading to this manuscript as well as in the final preparation of this manuscript. Dr. William A. Cromarty served as the writer's academic advisor during his first year of graduate work at MSU and introduced the writer to the general area of simultaneous stochastic linear equations estimation. Professor Clifford Hildreth then served as the writer's academic advisor for three years during which time the writer formulated many of the computational procedures noted in this manuscript.
Also, during this period, Professor Herman Rubin gave the writer a number of special lectures on the blackboard of his office in an effort to impart more understanding of the techniques involved and to suggest computational procedures not a part of the literature.

Professor Robert L. Gustafson served as chairman of the writer's thesis committee and spent a very large amount of time reading through two drafts of this manuscript and suggesting many substantive and editorial improvements which have been incorporated. The other members of the writer's thesis committee--Professors Lester B. Henderscheid, Jan Kmenta, Kenneth J. Arnold, and James H. Stapleton--also spent a large amount of time reading drafts of this manuscript and suggesting improvements which have been incorporated. Mrs. Janet Eyster read the manuscript in semi-final form and caught quite a few editorial and typographical errors.

Mrs. Marylyn Donaldson assisted the writer by calculating problems to test out a number of conjectures regarding special techniques to reduce rounding error. Mrs. Noralee Barnes did an exemplary job of typing this manuscript and Mrs. Barbara Bray, Mr. Tim Walters, and Mrs. Dona Smith assisted the writer to get the computer results in their final form for photographing and reproducing.

That it has been possible to accomplish this project at all is largely due to the support and encouragement the writer has received from Professor Lawrence L. Boger, Chairman of the Department of Agricultural Economics. The writer also appreciates the cooperation and assistance he has received during his employment in the Agricultural Experiment Station and the cooperation he has received from members of the MSU Computer Laboratory.

Finally, the writer greatly appreciates the patience, understanding, and encouragement of his wife, Margie, during the long duration of his graduate student career.
Although a large number of people have made contributions, the writer assumes full responsibility for errors which may remain in the manuscript.

TABLE OF CONTENTS

Chapter                                                            Page

PART I. SINGLE EQUATION METHODS

I. INTRODUCTION
   A. Background and Purpose ....................................... 1
   B. Some Asymptotic Properties of Estimators ..................... 8
   C. Basic Model ................................................. 15
      1. System of structural equations ........................... 15
      2. Single structural equation ............................... 24
      3. Statistical assumptions .................................. 31
      4. Reduced form equations ................................... 35
   D. Orthogonality Relationships ................................. 39
      1. Notation expressing orthogonality relationships .......... 39
      2. Computation of matrices of the form [Z1'Z2]⊥Z3 and
         [Z1'Z2] by direct orthogonalization ...................... 55

II. COEFFICIENT ESTIMATION
   A. Basic Double k-class Model and Summary of Methods ........... 62
   B. Methods Which Are Both h-class and Single k-class ........... 74
      1. Direct least squares (DLS) ............................... 74
      2. Two-stage least squares (2SLS) ........................... 75
   C. Additional Single k-class Methods ........................... 79
      1. Limited information single equation maximum
         likelihood (LIML) ........................................ 79
      2. Nagar's unbiased to O(T-1) in probability k (UBK) ........ 85
      3. Nagar's minimum second moment k (MSM) .................... 87
   D. Methods Requiring rk X = K - n .............................. 89
      1. Indirect least squares (ILS) ............................. 91
      2. The instrumental variables estimator (IV) ................ 95
   E. No Predetermined Variables in an Equation .................. 101
   F. Only One Jointly Dependent Variable in an Equation ......... 102
   G. Selection of Instruments ................................... 103

III. DISTURBANCE VARIANCE AND COEFFICIENT VARIANCE-COVARIANCE ESTIMATION
   A. Disturbance Variance Estimation ............................ 114
   B. Coefficient Variance-Covariance Estimation ................. 120
      1. Double k-class .......................................... 120
      2. Alternative estimate for LIML ........................... 123
      3. Nagar's unbiased to O(T-2) in probability estimates ..... 124
   C. Coefficient Standard Errors and t-ratios ................... 125

IV. GENERALIZED LEAST SQUARES
   A. Unrestricted Generalized Least Squares (GLS) ............... 129
   B. Restricted Generalized Least Squares (RGLS) ................ 131
      1. Computation of Q and q .................................. 134
      2. Relationship to another restriction formula ............. 142
   C. Restrictions Imposed on Direct Least Squares
      Coefficients ............................................... 149
   D. Restrictions Imposed on Two-stage Least Squares
      Coefficients ............................................... 152

PART II. SIMULTANEOUS EQUATIONS METHODS

V. FULL INFORMATION MAXIMUM LIKELIHOOD (FIML)
   A. Properties of the Full Information Maximum Likelihood
      Estimator .................................................. 158
   B. Derivation of the Likelihood Function to be Maximized ...... 163
   C. Computational Procedure .................................... 172
      1. A maximization procedure for functions non-linear
         in the parameters ....................................... 172
      2. The vector of partial derivatives for FIML .............. 179
      3. Metrics for FIML ........................................ 182
      4. Step size to use at each iteration ...................... 190
      5. Convergence criteria .................................... 199
   D. Estimated Disturbance Variance-covariance Matrix ........... 202
   E. Estimated Coefficient Variance-covariance Matrix ........... 205
   F. Arbitrary Linear Restrictions Imposed on the
      Coefficients ............................................... 208
      1. Illustration of linear restrictions on
         coefficients--Klein's model I ........................... 209
      2. Computational formulas .................................. 218
   G. Linearized Maximum Likelihood (LML) ........................ 223

VI. LIMITED INFORMATION SUBSYSTEM MAXIMUM LIKELIHOOD (SML)
   A. Only Zero and Normalization Restrictions Imposed on
      Coefficients ............................................... 225
      1. Derivation of the likelihood function to be
         maximized ............................................... 229
      2. Computational formulas .................................. 242
   B. Arbitrary Linear Restrictions Imposed on Coefficients ...... 245
   C. Using Instrumental Variables in SML Estimation ............. 246
   D. SML Estimation when rk X ≥ T - M + 1 ....................... 247
   E. Iterative Limited Information Single Equation Maximum
      Likelihood (ILIML) ......................................... 248

VII. ZELLNER-AITKEN ESTIMATOR (ZA)
   A. Only Zero and Normalization Restrictions Imposed on
      Coefficients ............................................... 263
   B. An Alternate Computational Procedure ....................... 272
   C. Arbitrary Linear Restrictions Imposed on Coefficients ...... 274
   D. Iterative Zellner-Aitken Estimator (IZA) ................... 279
      1. Only zero and normalization restrictions imposed
         on coefficients
      2. Arbitrary linear restrictions imposed on
         coefficients
   E. Iterative Direct Least Squares (IDLS or Telser Method) ..... 287

VIII. THREE-STAGE LEAST SQUARES (3SLS)
   A. Only Zero and Normalization Restrictions Imposed on
      Coefficients ............................................... 295
   B. An Alternate Computational Procedure ....................... 305
   C. 3SLS Estimation when rk X = T
   D. Arbitrary Linear Restrictions Imposed on Coefficients
   E. Iterative Three-stage Least Squares (I3SLS) ................ 315
      1. Only zero and normalization restrictions imposed
         on coefficients
      2. Arbitrary linear restrictions imposed on
         coefficients

PART III. ADDITIONAL PROGRAMMING CONSIDERATIONS

IX. ADDITIONAL PROGRAMMING CONSIDERATIONS ....................... 327
   A. Rounding Error ............................................. 329
      1. Single vs. double precision
      2. Standardization of variables
         a. Deviations from means
         b. Uniform scaling
         c. Improving the estimates of sums, means, and the
            standardized moment matrix
         d. Adjustments if no overall constant coefficient
      3. Use of simultaneous equations solutions
      4. Direct orthogonalization ................................ 354
      5. Iterative techniques .................................... 355
   B. Free Field Interpretive Parameters ......................... 357
   C. Data Transformation Section ................................ 366
   D. Coefficients Pool .......................................... 368
   E. Special Files .............................................. 370
      1. Data files .............................................. 370
      2. Intermediate storage files .............................. 371
      3. Matrix storage files .................................... 372
   F. Incorporation of y and G Directly into the Sums of
      Squares and Cross-products Matrix .......................... 373
   G. Estimated Values of Normalizing Jointly Dependent
      Variables, Residuals, and Related Statistics ............... 375
   H. Weighting of Observations .................................. 377
   J. Checks Against Errors ...................................... 379
   K. Computer Output ............................................ 383

Appendix
   A. COMPUTATION OF [Z1'Z2]⊥Z3 AND THE RANK OF Z3 AS AN
      INTERMEDIATE STEP IN THE COMPUTATION OF [Z1'Z2]⊥[Z3:Z4] ... 423
   B. COMPUTATION BY DIRECT ORTHOGONALIZATION OF A MOMENT
      MATRIX OF VARIABLES EACH OF WHICH IS ORTHOGONAL TO A
      DIFFERENT SUBSET OF VARIABLES .............................. 427
   C. TENTATIVE PROOFS REGARDING THE CONSISTENCY OF
      β̂(k1,k2) AND σ̂²(k1,k2) .................................... 439

BIBLIOGRAPHY .................................................... 448

LIST OF ABBREVIATIONS USED FOR ESTIMATORS

                                                                 Page
DLS      direct least squares ..................................... 74
GLS      Aitken's generalized least squares ...................... 129
FIML     full information maximum likelihood ..................... 158
IDLS     iterative DLS ........................................... 287
ILIML    iterative LIML .......................................... 248
ILS      indirect least squares ................................... 91
IV       instrumental variables estimator ......................... 95
IZA      iterative ZA ............................................ 279
I3SLS    iterative 3SLS .......................................... 315
LIML     limited information single equation maximum likelihood ... 79
LML      linearized maximum likelihood ........................... 223
MSM      minimum second moment k .................................. 87
RDLS     restricted DLS (arbitrary linear restrictions imposed
         on coefficients) ........................................ 149
RDLSME   restricted DLS in which arbitrary linear restrictions
         are imposed on coefficients in separate equations ....... 275
RGLS     restricted GLS (arbitrary linear restrictions imposed
         on coefficients) ........................................ 131
RZA      restricted ZA (arbitrary linear restrictions imposed
         on coefficients) ........................................ 274
R2SLS    restricted 2SLS (arbitrary linear restrictions imposed
         on coefficients) ........................................ 152
R2SLSME  restricted 2SLS in which arbitrary linear restrictions
         are imposed on coefficients in separate equations ....... 313
R3SLS    restricted 3SLS (arbitrary linear restrictions imposed
         on coefficients)
SML      limited information subsystem maximum likelihood ........ 225
UBK      unbiased to O(T-1) in probability k ...................... 85
ZA       Zellner-Aitken estimator ................................ 263
2SLS     two-stage least squares .................................. 75
3SLS     three-stage least squares ............................... 295

PART I

SINGLE EQUATION METHODS

CHAPTER I

INTRODUCTION
A. Background and Purpose

In 1963 the writer started development of a system of computer routines (presently called the AES STAT system) designed to calculate simultaneous stochastic linear equations estimates, direct least squares (including stepwise variations), analysis of variance and covariance, some "basic" statistics such as simple correlations, and the plotting of data and functions.1 Emphasis from the start has been on developing only a few major routines which can compute by a number of methods on each routine. Additional flexibility has been obtained by incorporating considerable facility for the manipulation and transformation of data in the computer and for the manipulation of coefficients between methods of estimation (the estimated coefficients from one method often provide the starting coefficients for other methods). The parameters (instructions to the routines prepared by the user for the calculation of a particular problem) are of the same form for all of the AES STAT routines.2

1A computer routine is a set of instructions to a computer to accomplish a given calculation. The terms computer routine and computer program are used interchangeably.

2The general form of the control cards for the AES STAT system and some of the details on the development of the AES STAT system are given in chapter IX.

In the process of programming the simultaneous stochastic linear equations methods, the writer found it necessary to refer to many different articles and books for computational formulas and other specific aspects of the methods. Many of the computational approaches were desirable approaches for computation on a hand calculator but were not well adapted for use on the computer. In adapting computational approaches for use on the computer, emphasis has been placed on reducing rounding error, increasing flexibility, and providing automatic decision branching in the computer in the solution of problems requiring iteration.
The purposes of this paper are to: (1) summarize computational formulas for a number of simultaneous stochastic linear equations methods, (2) summarize some of the relationships among these methods, and (3) to a limited extent summarize some of the properties of these methods.1

1Methods presented in this paper have already been incorporated into the AES STAT system with the following exceptions: (1) The method of imposing arbitrary linear restrictions has not yet been implemented as it was derived by the writer in the process of writing this paper. (2) The limited information subsystem maximum likelihood (SML) method has not been programmed, yet; however, it is planned for incorporation into the system shortly. (3) Nagar's minimum second moment k (MSM) is not available in the system. (4) The orthogonalization method described in appendix B is not available in the system.

Some of the computational approaches used and some of the relationships among methods noted in this paper were derived by the writer. Of the computational approaches which are presented in this paper for the first time (at least as far as the writer is aware), the following are probably the most noteworthy:

(1) The use of direct orthogonalization in the calculations for the simultaneous stochastic linear equations methods. In addition to reducing rounding error, the use of direct orthogonalization eliminates some of the problems of multicollinearity among predetermined variables in the equation system. Also, the matrix of predetermined variables in the system need not have full column rank.

(2) The development of a method for imposing arbitrary linear restrictions on coefficients which:

(a) Allows the restrictions to be specified directly to the computer without prior solving out or conversion.
(b) Provides a means of imposing arbitrary linear restrictions directly upon full information maximum likelihood (FIML) coefficients.1

1A summary of abbreviations used in this paper for estimators (e.g., FIML) follows the table of contents.

(c) May be applied in essentially the same way to direct least squares (DLS), the Zellner-Aitken estimator (ZA), limited information subsystem maximum likelihood (SML), full information maximum likelihood (FIML), linearized maximum likelihood (LML), and three-stage least squares (3SLS).

(d) Is adapted to methods requiring iteration to a solution.

(e) Allows redundant restrictions to be imposed on coefficients. The number of independent restrictions is calculated as a by-product of the computational procedure.

(f) Detects inconsistent restrictions.

(g) May be used to calculate restricted coefficients even though a unique solution for a method does not exist in the absence of the restrictions. (E.g., direct least squares may be applied directly to problems in which the matrix of explanatory variables has less than full column rank provided sufficient restrictions are placed on the coefficients. Thus, the step of eliminating linearly dependent explanatory variables from the equation before obtaining a solution is saved.)
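In present-day terms, the essentials of such a restriction method can be sketched with a null-space parameterization of the restricted least squares problem. The sketch below is only an illustration of the idea, not the thesis's own routine (which was written in FORTRAN and COMPASS for a CDC machine); the function name, the use of the singular value decomposition, and the tolerance handling are the sketch's own assumptions.

```python
import numpy as np

def restricted_ls(X, y, R, r, tol=1e-10):
    """Least squares of y on X subject to the linear restrictions R b = r.

    Redundant rows of R are harmless: the number of independent
    restrictions is rank(R), obtained as a by-product.  Inconsistent
    restrictions are detected by checking that a particular solution
    of R b = r actually satisfies the restrictions.
    """
    # Particular solution of R b = r (SVD-based, so a singular R is fine).
    b0, *_ = np.linalg.lstsq(R, r, rcond=None)
    rank_R = np.linalg.matrix_rank(R, tol=tol)
    if not np.allclose(R @ b0, r, atol=1e-8):
        raise ValueError("inconsistent restrictions: R b = r has no solution")
    # Orthonormal basis N for the null space of R: every feasible b = b0 + N t.
    _, _, Vt = np.linalg.svd(R)
    N = Vt[rank_R:].T
    # Unrestricted least squares in the reduced coordinates t.
    t, *_ = np.linalg.lstsq(X @ N, y - X @ b0, rcond=None)
    return b0 + N @ t, rank_R
```

The rank of R emerges as a by-product (feature (e)), an infeasible system R b = r is flagged (feature (f)), and a rank-deficient X is acceptable so long as X restricted to the feasible subspace has full column rank (feature (g)).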
Relationships among methods which are shown for the first time in this paper include:

1The use of the method of imposing restrictions suggested in this paper together with the method for computing 3SLS and ZA suggested in this paper (starting the computations with an identity matrix as the estimated disturbance variance-covariance matrix) provide an easily used procedure for imposing linear restrictions across equations on the coefficients used to obtain disturbance variance-covariance estimates in the calculation of 3SLS or ZA estimates. As a result, if sufficient restrictions on coefficients across equations occur, unique 3SLS or ZA estimates may exist even though unique two-stage least squares (2SLS) and DLS estimates do not exist.

(1) For the special case of a system of equations in which only one jointly dependent variable occurs in each equation, the following computational procedures lead to the same coefficients:

(a) FIML.
(b) Iteratively applying ZA.1
(c) The Telser method of iteratively estimating each equation by DLS.2

(2) For the general case in which more than one jointly dependent variable is permitted in each equation, the following computational procedures lead to the same coefficients:

(a) FIML.
(b) An estimation procedure in which limited information single equation maximum likelihood (LIML) estimation is used iteratively to estimate the coefficients of each equation.3

(3) For the general case in which more than one jointly dependent variable is permitted per equation and at least one equation in the system is over-identified, the following computational procedures do not lead to the same coefficients:

(a) FIML.
(b) Iterative 3SLS (I3SLS).

1The Zellner-Aitken (ZA) estimator is given in Zellner [1962] and in Chapter VII of this paper.

2The Telser estimator is given in Telser [1964] and in section VII.E of this paper.
3The iterative LIML computational procedure (ILIML) was proposed to the writer by Professor Herman Rubin in 1963 and the relationship of the method to FIML was shown to the writer by Professor Rubin at that time. Since DLS may be regarded as a particular case of LIML, the IDLS (Telser) method may be regarded as a particular case of the ILIML method.

In the derivation of the likelihood function for a system of equations for the application of FIML (chapter V) or for a subsystem of equations for the application of SML (chapter VI), identity equations are explicitly carried. It is shown in this paper that the identity equations need not be solved out to express the likelihood or to apply the FIML and SML estimation procedures.1

Detailed user descriptions have been developed for the AES STAT system except for the simultaneous stochastic linear equations portion. It is intended that this paper will serve as a basic reference to the computational procedures used in the simultaneous stochastic linear equations portion and that detailed user descriptions will soon be written for this portion as well.2

1Rothenberg and Leenders [1964] have already shown that it is unnecessary to use identity equations to eliminate jointly dependent variables for FIML; however, a slightly different approach to showing this is taken in this paper. Professor Herman Rubin informed the writer that it is unnecessary to solve out identity equations for SML; however, the writer is not aware of any reference to this in the literature.

2The form of the parameters to the system (control cards to the system to compute a given problem) and the form of the output is discussed and illustrated in part III of this paper.

The AES STAT system has been programmed on a Control Data Corporation (CDC) 3600 computer.
Although 3600 FORTRAN is the primary language used, extensive use is made of COMPASS assembly language subroutines and features of the DRUM SCOPE executive system; hence, in its present form the AES STAT system is very difficult to convert to another computer system. Installation of a CDC 6500 computer at MSU is planned for late 1968 and the AES STAT system will then be converted to the CDC 6500 computer. In the process of conversion to the CDC 6500 computer, a number of assembly language subroutines will be replaced by FORTRAN subroutines, thereby making conversion to another large scale computer system more feasible. In any event, the computational approaches suggested here, including especially the orthogonalization procedures, involve basic computational procedures which can be programmed for any computer as easily as less accurate but better known procedures.

Since this paper concentrates on computational procedures, properties of the estimators receive only cursory treatment; also many of the proofs are given in algorithmic form; that is, the computational method which is described provides the proof of the property claimed.

B. Some Asymptotic Properties of Estimators

In this paper, most of the properties noted for particular estimators are asymptotic properties rather than small sample properties.1 The estimators are, however, used in estimation in which the number of observations in a given sample is finite and usually fairly small. Although it is hoped that the asymptotic properties mentioned give a guide to the comparable small sample properties, it must be realized that in particular cases a given ranking of estimators based on an asymptotic property may be reversed in samples of the size used in the usual application of the estimator. Also, asymptotic properties of an estimator may give a good guide to properties of that estimator for a sample size of, say, 100 or larger, but a very poor guide for a sample size of, say, 10.

1In this paper, T denotes the number of observations in a sample. We say that an estimator has a given asymptotic property if there exists a sample size T0 such that for all T > T0, the property holds within a given measure of closeness.
Since many readers are not familiar with many of the terms2 used in this paper, we will note some distinctions. In what follows, let θ denote a parameter of a probability distribution and θ̂ an estimator of θ.1 Let T denote sample size. θ is independent of T whereas θ̂ may not be independent of T. The properties of θ̂ for a given sample size T are those which θ̂ has in repeated samples of sample size T -- not the estimated value of θ̂ for a given sample (e.g., θ̂ may be the direct least squares estimator applied to a coefficient of a given equation--not the particular estimate obtained by applying direct least squares to a given sample). Some distinctions follow:

(1) θ̂ is an unbiased estimator of a parameter θ if Eθ̂ = θ, where E denotes expected value.

(2) θ̂ is an asymptotically unbiased estimator of θ if lim(T→∞) Eθ̂ = θ.

Unbiased implies asymptotically unbiased, since if Eθ̂ = θ for all T, lim(T→∞) Eθ̂ = lim(T→∞) θ = θ.

Asymptotically unbiased does not imply unbiased. For example, let θ = 1 and Eθ̂ = 1 + 1/T. Then lim(T→∞) Eθ̂ = lim(T→∞) (1 + 1/T) = 1, so that θ̂ is asymptotically unbiased. θ̂ is not unbiased, however, since Eθ̂ = 1 + 1/T ≠ 1.

1The properties given here derive from the properties of θ̂ as a random variable. It is more common in the statistical literature to use X_n in place of θ̂ so that the results are not limited to estimators and the location in the sequence is specifically noted. Here n would be T -- the sample size. A more complete notation would be the use of θ̂_T in place of θ̂ to explicitly recognize sample size.

2See Goldberger [1964] for a more extensive treatment of the unbiased, asymptotically unbiased, and consistent properties.
More particularly, θ̂ is an unbiased estimator of a parameter θ ∈ Θ if Eθ̂ = θ for all θ ∈ Θ. For simplicity, in the remainder of this section, we will drop the mention of the class Θ to which θ belongs and the mention of the requirement that the defined property holds for all members of that class.

As T increases, θ̂ converges in probability to θ [i.e., converges stochastically--written plim(T→∞) θ̂ = θ, or equivalently plim(θ̂ - θ) = 0] if for every ε > 0, lim(T→∞) Prob[abs(θ̂ - θ) < ε] = 1, where Prob denotes the probability of the expression within the brackets.1 Another way to say the same thing is that plim(T→∞) θ̂ = θ if for any ε > 0 and η > 0, however small, there is some T'(ε,η) such that for all T > T'(ε,η), Prob[abs(θ̂ - θ) < ε] exceeds 1 - η.2 We can also say that plim(T→∞) θ̂ = θ if the probability distribution for θ̂ collapses about the single point θ as T→∞, i.e., if the mean square deviation of θ̂ from θ goes to zero as T→∞.3

(3) An estimator whose probability limit is a finite parameter (plim(T→∞) θ̂ = θ) is said to be a consistent estimator of that parameter.

Asymptotic unbiasedness does not imply consistency, since the probability distribution may not converge to a single point in the limit. As an example, suppose that for any T, θ̂ has the distribution:4

Prob(θ̂ = 1) = 1/2
Prob(θ̂ = 2) = 1/2

1abs denotes absolute value.

2Kendall and Stuart [1961], p. 3.

3The mean square deviation of θ̂ from θ is E(θ̂ - θ)² = Var(θ̂) + (Eθ̂ - θ)².

4It is, of course, very unusual to define a distribution of θ̂ not containing θ as a parameter; however, using such distributions as examples permits construction of exceedingly simple examples.

Then Eθ̂ = (1/2)(2) + (1/2)(1) = 3/2 and lim(T→∞) Eθ̂ = 3/2, so that if θ = 3/2 then θ̂ is both unbiased and asymptotically unbiased. θ̂ is not consistent since plim θ̂ ≠ 3/2.
(The distribution does not concentrate on the point 3/2 as T→∞; in fact, in this example there is zero probability that θ̂ = 3/2 even in an infinitely large sample.)

Consistency does not imply asymptotic unbiasedness. For example, let θ̂ have the distribution:1

Prob(θ̂ = 0) = (T - 1)/T
Prob(θ̂ = T²) = 1/T

Then, since θ̂ concentrates at 0 as T becomes large, if θ = 0, then θ̂ is a consistent estimator. On the other hand, θ̂ cannot be an asymptotically unbiased estimator of any θ since Eθ̂ = 0·[(T - 1)/T] + T²·[1/T] = T.

1This example was suggested by Professor Kenneth J. Arnold.

In discussing asymptotic properties it is often useful to use the "big O" and "little o" notation to give a magnitude or speed of convergence. In using this notation, it is important to distinguish between whether the magnitude is related to θ̂ - θ (and therefore is an order of magnitude of consistency) or whether the magnitude is related to Eθ̂ - θ (and therefore is an order of magnitude of asymptotic unbiasedness).

(4) Let f(T) denote a positive valued function of T such as 1/T or 1/T². Then θ̂ - θ is Op(f(T)) [which is read θ̂ - θ is "big O" of f(T) in probability], or (equivalently) θ̂ - θ is of probability order f(T), or (equivalently) θ̂ - θ is of the same order of magnitude as f(T) in probability as T→∞, if any of the following equivalent conditions hold:1

(a) There exists a positive c independent of T such that plim(T→∞)[(1/f(T))·abs(θ̂ - θ)] ≤ c. (c can be a very large number and still be independent of T.)

(b) For any δ > 0, however small, there exists a positive constant c(δ) independent of T such that lim(T→∞) Prob[(1/f(T))·abs(θ̂ - θ) ≤ c(δ)] ≥ 1 - δ, or (equivalently) lim(T→∞) Prob[abs(θ̂ - θ) ≤ c(δ)·f(T)] ≥ 1 - δ.

(c) There exists a positive constant c independent of T such that for any positive η, however small, there exists T'(η) such that for every T > T'(η), Prob[(1/f(T))·abs(θ̂ - θ) < c] > 1 - η.
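The consistent-but-not-asymptotically-unbiased counterexample above lends itself to a quick numerical check. The following sketch (an illustration in modern code, not part of the thesis) samples from the two-point distribution Prob(θ̂ = 0) = (T - 1)/T, Prob(θ̂ = T²) = 1/T; the fraction of draws near zero approaches 1 while the sample average of θ̂ tracks T, mirroring plim θ̂ = 0 and Eθ̂ = T.

```python
import numpy as np

rng = np.random.default_rng(0)

def theta_hat(T, size):
    # Draws from the two-point distribution used in the text:
    # theta_hat = 0 with probability (T - 1)/T, and T**2 with probability 1/T.
    return np.where(rng.random(size) < 1.0 / T, float(T) ** 2, 0.0)

for T in (10, 100, 1000):
    draws = theta_hat(T, 200_000)
    # Prob(|theta_hat - 0| < eps) -> 1, so theta_hat is consistent for theta = 0,
    # yet E[theta_hat] = T**2 * (1/T) = T diverges: not asymptotically unbiased.
    coverage = np.mean(np.abs(draws) < 0.5)
    print(T, round(coverage, 3), round(draws.mean(), 1))
```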
    ¹Some of the "big O" and "little o" conditions which follow are given in Mann and Wald [1943b].

Most commonly f(T) will be T^(−1/2), T^(−1), T^(−3/2), or T^(−2) (i.e., 1/√T, 1/T, 1/T^(3/2), or 1/T²).

(5) θ̂ − θ is o_p(f(T)) (which is read "θ̂ − θ is little o of f(T) in probability"), or (equivalently) θ̂ − θ is of probability order smaller than f(T), or (equivalently) θ̂ − θ is of a smaller order of magnitude than f(T) in probability as T→∞, if any of the following equivalent conditions hold:

    (a) plim_{T→∞}[(1/f(T))·abs(θ̂ − θ)] = 0.

    (b) For any positive ε and η, however small, there exists T'_{ε,η} such that for every T > T'_{ε,η},
            Prob[(1/f(T))·abs(θ̂ − θ) < ε] > 1 − η.

θ̂ − θ is o_p(f(T)) implies that θ̂ − θ is O_p(f(T)), but the reverse implication does not necessarily hold.

(6) To define the order of magnitude of Eθ̂ rather than θ̂, merely replace θ̂ by Eθ̂ in (4) and (5) above. For example: Eθ̂ − θ is O_p(f(T)) [i.e., Eθ̂ − θ is of the same order of magnitude as f(T) in probability as T→∞] if there exists a positive c independent of T such that
    plim_{T→∞}[(1/f(T))·abs(Eθ̂ − θ)] ≤ c.

The order of magnitude of f(T) gives the order of magnitude of the stated convergence. Thus, if θ̂ − θ is O_p(T^(−2)) then θ̂ − θ is O_p(T^(−3/2)), O_p(T^(−1)), O_p(T^(−1/2)), etc., and θ̂ is consistent. (θ̂ is consistent if θ̂ − θ is O_p(T^(−n)) with n > 0.) On the other hand, θ̂ may be consistent but θ̂ − θ not O_p(T^(−1/2)), or θ̂ − θ may be O_p(T^(−1/2)) but not O_p(T^(−2)). Also, the order of magnitude of θ̂ − θ does not imply anything regarding the order of asymptotic unbiasedness except for specified distributions.

Similarly, if Eθ̂ − θ is O_p(T^(−2)) then Eθ̂ − θ is O_p(T^(−3/2)), O_p(T^(−1)), O_p(T^(−1/2)), etc., and θ̂ is asymptotically unbiased. (θ̂ is asymptotically unbiased if Eθ̂ − θ is O_p(T^(−n)) with n > 0.) On the other hand, θ̂ may be asymptotically unbiased but Eθ̂ − θ not O_p(T^(−1/2)), or Eθ̂ − θ may be O_p(T^(−1/2)) but not O_p(T^(−2)).
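A concrete (and, again, invented) illustration of probability order: the sample mean of T iid Uniform(0, 1) draws deviates from its mean θ = 1/2 at rate O_p(T^(−1/2)), so the scaled deviation √T·abs(θ̂ − θ) remains bounded in probability as T grows.

```python
import numpy as np

# Sketch (ours): theta_hat = mean of T Uniform(0,1) draws, theta = 1/2.
# Then theta_hat - theta is O_p(T^{-1/2}): the scaled deviations
# sqrt(T) * |theta_hat - theta| stay bounded in probability, their 0.95
# quantile settling near 1.96 * sigma with sigma = 1/sqrt(12) ~ 0.57.
rng = np.random.default_rng(1)
theta = 0.5

quantiles = {}
for T in (100, 10_000, 100_000):
    # 200 independent replications of the estimator at sample size T.
    reps = np.array([rng.random(T).mean() for _ in range(200)]) - theta
    quantiles[T] = np.quantile(np.sqrt(T) * np.abs(reps), 0.95)
    print(T, round(quantiles[T], 3))
```

The printed quantiles do not grow with T, which is exactly the boundedness-in-probability condition (a) above with f(T) = T^(−1/2).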
Also, the order of magnitude of Eθ̂ − θ does not imply anything regarding the order of magnitude of consistency except for specified distributions.

The above properties are some of the properties referred to in this paper and do not, of course, include many properties of estimators which are important in estimation. The emphasis on asymptotic properties is dictated by the present state of our knowledge of the small-sample properties of simultaneous stochastic equations estimators.

In the remainder of this paper, plim_{T→∞} A will be shortened to plim A, i.e., the T→∞ will be understood.

C. Basic Model

1. System of structural equations

In part I, the estimation of a single equation from a complete system of equations will be considered. The complete system, which consists of M stochastic equations and G − M identity equations (G − M may be zero), may be expressed as:¹,²,³

    (I.1)    Y Γ' + X B' + [U ⋮ 0] = 0
            T×G G×G  T×A A×G  T×M T×(G−M)  T×G

or

    (I.2)    Z α' + [U ⋮ 0] = 0
            T×(G+A) (G+A)×G  T×G  T×G

The same model is also often written in transposed form as:

    (I.3)    Γ Y'_[t] + B X'_[t] + [U_[t] ⋮ 0]' = 0'_[t]    (t = 1,...,T)
            G×G G×1   G×A A×1    M×1 over (G−M)×1   G×1

or

    (I.4)    α Z'_[t] + [U_[t] ⋮ 0]' = 0'_[t]    (t = 1,...,T)
            G×(G+A) (G+A)×1   M×1 over (G−M)×1   G×1

where [t] denotes the tth observation and:

Y   is the T×G matrix of T sample observations taken on the G jointly dependent variables in the system.

    ¹The dimensions of each matrix are listed below the matrix in many of the matrix equations given in this paper. In this paper, ' (prime) denotes transpose.
    ²An identity equation is an equation containing known coefficients and no disturbance (i.e., a disturbance vector of all zeros).
    ³The notation used in this paper was designed to meet the following requirements: 1) It should be consistent with the notation commonly used for direct least squares; in particular, the signs of the coefficients should not have to be reversed to make them comparable to direct least squares coefficients.
X   is the T×A matrix of T sample observations taken on the A predetermined variables in the system.

Z   = [Y ⋮ X] is the T×(G+A) matrix of T sample observations taken on all G + A variables in the system.

U   is the T×M matrix containing the T unobserved structural disturbances for each of the M equations containing disturbances.

Γ   is the G×G matrix of population coefficients of the G jointly dependent variables. Each row of Γ (or column of Γ') contains the population coefficients corresponding to a particular equation, and each column of Γ (or row of Γ') contains the population coefficients corresponding to a particular jointly dependent variable.

    2) The coefficients for a structural equation are expressed as a row of the coefficient matrix for the system. Similarly, the coefficients for a reduced form equation are expressed as a row of the coefficient matrix for the reduced form.
    3) An observation is a row in an observation matrix.
    4) Identity equations are explicitly recognized in the notation.
    5) Within the above limitations, the notation should be patterned after that of Theil [1961] and Zellner and Theil [1962], since this appears to be the most prevalently used simultaneous stochastic equations notation now appearing in the literature.

B   is the G×A matrix of population coefficients of the A predetermined variables. Each row of B (or column of B') contains the population coefficients corresponding to a particular equation, and each column of B (or row of B') contains the population coefficients corresponding to a particular predetermined variable.

α   = [Γ ⋮ B] is the G×(G+A) matrix of population coefficients of all G equations for all of the G + A variables in the system.

The term "jointly dependent" will be treated as synonymous with the term "endogenous". Jointly dependent variables are random variables assumed to be contemporaneously correlated with the disturbances. They are assumed to be generated within (endogenous to) the system of equations.
Predetermined variables are the remaining variables in the system. They are either (1) exogenous variables, i.e., variables which are assumed to be generated outside the system of equations and therefore independent of the disturbances, or (2) lagged values of jointly dependent variables which, due to their lag, are contemporaneously independent of the disturbances.¹ In some parts of this paper the predetermined variables will be assumed to consist of "fixed" or non-stochastic variables only. In these cases, the set of predetermined variables must be restricted to exogenous variables only, since lagged values of jointly dependent variables are stochastic and not "fixed".

    ¹Contemporaneously independent is defined in statistical assumption (3) of section I.C.3.

Klein's model I of the United States economy will be used to illustrate the above notation and some subsequent notation which will be introduced. Klein's model I is a complete system of equations (the number of jointly dependent variables equals the number of equations),¹,² which may be written as an 8-equation system, the first 3 equations containing disturbances and the remaining 5 equations consisting of identity equations. These equations are:³

    (I.5a) Consumption:      C  = α₀^[1] + α₁^[1] P + α₂^[1] W + α₃^[1] P₋₁ + u₁
    (I.5b) Investment:       I  = α₀^[2] + α₁^[2] P + α₂^[2] P₋₁ + α₃^[2] K₋₁ + u₂
    (I.5c) Private wage:     W₁ = α₀^[3] + α₁^[3] E + α₂^[3] E₋₁ + α₃^[3] t + u₃
    (I.5d) Product:          Y + R = C + I + G
    (I.5e) Income:           Y = P + W
    (I.5f) Capital:          K = K₋₁ + I
    (I.5g) Wages:            W = W₁ + W₂
    (I.5h) Private product:  E = Y + R − W₂

    ¹Also implied by the term "complete system" is the recognition that no lagged jointly dependent variable occurs in the system without the corresponding (non-lagged) jointly dependent variable also occurring in the system.
    ³Although the notation has been changed slightly, the explanation of each equation has been taken almost verbatim from Goldberger [1964], pp. 303-304.
    ²Klein's model I is given in Klein [1950].

The first equation is a consumption function which describes consumption (C) linearly in terms of profits (P), profits lagged one year (P₋₁), and the total wage bill (W). The second equation is an investment equation describing net investment (I) linearly in terms of profits, lagged profits, and capital stock at the beginning of the year (K₋₁). The third equation is a demand-for-labor equation which describes the private wage bill (W₁) linearly in terms of private product (E), private product lagged by one year (E₋₁), and time (t) measured in calendar years. The five identity equations complete the system. The additional variables in the identity equations are national income (Y), indirect taxes (R), government expenditure on goods and services (G), capital stock at the end of the year (K), the private wage bill (W₁), and the government wage bill (W₂).

The variables C, P, W, I, W₁, E, Y, and K are designated as jointly dependent within the system, and the variables P₋₁, K₋₁, E₋₁, t, R, G, and W₂ are designated as predetermined to the system.¹ It is convenient to consider one additional predetermined variable, X₀, a variable which assumes the value 1.0 for all observations. Thus, X₀ is the variable whose coefficient is α₀^[1] in the first equation, α₀^[2] in the second equation, and α₀^[3] in the third equation.

    ¹G is also used in this paper to denote the number of jointly dependent variables in the system, and K is also used to denote the number of instrumental variables in the X_I matrix (a matrix which is defined further on); however, the particular uses of G and K should be clear from their contexts.

As a digression, we will note why certain of the variables defined by identity equations are considered to be jointly dependent rather than predetermined. As indicated in the wages equation, W = W₁ + W₂. In formulating the model, W₁ was designated as jointly dependent and W₂ as predetermined.
Since W is composed of one part (W₁) not contemporaneously independent of the disturbance, W is also not contemporaneously independent of the disturbance and must be designated jointly dependent. Similarly for K, since K = K₋₁ + I with I designated as jointly dependent. Similarly also for E, since E = Y + R − W₂ and Y is jointly dependent.

If the annual data from 1921 through 1941 are used in the model, T = 21. Also, given the above designation of variables, G = 8, A = 8, and M = 3. Regarding each of the variables such as C as a 21×1 vector of observed values, the matrices of equations (I.1) and (I.2) may be defined as:

    (I.6)    Z = [Y ⋮ X] = [C, P, W, I, W₁, E, Y, K ⋮ X₀, P₋₁, K₋₁, E₋₁, t, R, G, W₂]
                            \___ jointly dependent ___/ \______ predetermined ______/
                                      21×8                          21×8

    (I.7)    α = [Γ ⋮ B] =
             8×16

                  C      P      W      I     W₁     E      Y      K      X₀     P₋₁    K₋₁    E₋₁    t      R      G      W₂
      Eq. 1 [ −1     α₁^[1] α₂^[1]  0      0      0      0      0   ⋮  α₀^[1] α₃^[1]  0      0      0      0      0      0  ] Eq. 1
      Eq. 2 [  0     α₁^[2]  0     −1      0      0      0      0   ⋮  α₀^[2] α₂^[2] α₃^[2]  0      0      0      0      0  ] Eq. 2
      Eq. 3 [  0      0      0      0     −1     α₁^[3]  0      0   ⋮  α₀^[3]  0      0     α₂^[3] α₃^[3]  0      0      0  ] Eq. 3
      Eq. 4 [  1      0      0      1      0      0     −1      0   ⋮   0      0      0      0      0     −1      1      0  ] Eq. 4
      Eq. 5 [  0      1      1      0      0      0     −1      0   ⋮   0      0      0      0      0      0      0      0  ] Eq. 5
      Eq. 6 [  0      0      0      1      0      0      0     −1   ⋮   0      0      1      0      0      0      0      0  ] Eq. 6
      Eq. 7 [  0      0     −1      0      1      0      0      0   ⋮   0      0      0      0      0      0      0      1  ] Eq. 7
      Eq. 8 [  0      0      0      0      0     −1      1      0   ⋮   0      0      0      0      0      1      0     −1  ] Eq. 8

The part within the brackets constitutes the α matrix. The equation number corresponding to each row is given on each side of the matrix, and the variable corresponding to each column is given above the matrix.

Notice that in the α matrix, a coefficient of −1 has been assigned to any variable listed to the left of the equality sign in equations (I.5a) through (I.5h). This is equivalent to assigning a coefficient of 1 in the equation and then transcribing the coefficient and variable to the same side of the equality sign as the remaining variables.

The coefficients of any equation may be multiplied by a scalar without changing the meaning of the equation.
To avoid this indeterminacy we will follow the normalization rule given above, i.e., in each of the equations the coefficient of one of the jointly dependent variables will be assigned the value −1.¹

    ¹Many normalization rules have been used in the past to avoid indeterminacy in each equation. One way is to specify that a coefficient corresponding to one jointly dependent variable in each equation be set to 1. A second way is to normalize each equation such that the resulting estimated disturbance variance-covariance matrix has 1's for diagonal elements. For our purpose here, however, we will find it most convenient to select a variable in each equation as the normalizing variable and set its coefficient to −1. If −1 is used as the normalizing coefficient, the resulting coefficients may be compared directly with coefficients from methods such as direct least squares (DLS) or two-stage least squares (2SLS); however, if 1 is used as the normalizing coefficient, the sign of each coefficient must be reversed before comparing the coefficient with a comparable coefficient from DLS or 2SLS. Since it is as easy to use −1 for normalizing as to use 1, it seems highly desirable to do so and avoid the reflection in coefficient signs.

Of all of the estimation procedures discussed in this paper, only in the case of limited information single equation maximum likelihood (LIML), limited information subsystem maximum likelihood (SML), linearized maximum likelihood (LML), and full information maximum likelihood (FIML) will it make no substantive difference in the estimated coefficients which jointly dependent variable is chosen as the normalizing variable. For the remainder of the procedures, selection of a different normalizing variable will make a substantive change in the resulting estimated coefficients.¹

Only coefficients of the form α_j^[μ] must be estimated. All of the remaining coefficients in the α matrix are assumed known.
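The structure of the α = [Γ ⋮ B] matrix can be assembled mechanically from equations (I.5a) through (I.5h). The sketch below is our own (the column ordering and the placeholder value 0.1 standing in for every unknown structural coefficient are assumptions, not estimates): it enters −1 for each normalizing variable and the known ±1 coefficients of the identities.

```python
import numpy as np

# One consistent layout of the 8x16 matrix alpha = [Gamma : B] implied by
# (I.5a)-(I.5h).  Column order and the placeholder 0.1 are our assumptions.
dep = ["C", "P", "W", "I", "W1", "E", "Y", "K"]          # jointly dependent
pre = ["X0", "P-1", "K-1", "E-1", "t", "R", "G", "W2"]   # predetermined
col = {v: j for j, v in enumerate(dep + pre)}

alpha = np.zeros((8, 16))

def fill(eq, coefs):
    # Enter the coefficients of one structural equation into row `eq`.
    for var, val in coefs.items():
        alpha[eq, col[var]] = val

a = 0.1  # placeholder for each unknown alpha coefficient
fill(0, {"C": -1, "P": a, "W": a, "X0": a, "P-1": a})     # (I.5a) consumption
fill(1, {"I": -1, "P": a, "X0": a, "P-1": a, "K-1": a})   # (I.5b) investment
fill(2, {"W1": -1, "E": a, "X0": a, "E-1": a, "t": a})    # (I.5c) private wage
fill(3, {"Y": -1, "C": 1, "I": 1, "R": -1, "G": 1})       # (I.5d) product
fill(4, {"Y": -1, "P": 1, "W": 1})                        # (I.5e) income
fill(5, {"K": -1, "I": 1, "K-1": 1})                      # (I.5f) capital
fill(6, {"W": -1, "W1": 1, "W2": 1})                      # (I.5g) wages
fill(7, {"E": -1, "Y": 1, "R": 1, "W2": -1})              # (I.5h) private product

Gamma, B = alpha[:, :8], alpha[:, 8:]
# Every row carries exactly one -1, on its normalizing jointly dependent
# variable, and the identity rows (4-8) contain only known 0 / +-1 entries.
```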
    ¹In the special case of a just-identified equation (see section II.D), it will also make no substantive difference for many methods, since the same estimated coefficients are obtained by many of the single equation procedures as are obtained for LIML.

2. Single structural equation

In part I, estimation of only a single equation from among the M stochastic equations will be considered, although for many of the procedures some of the information contained in the remaining M − 1 stochastic equations and the G − M identity equations will be taken into account by the computational method. For most of the "single equation" estimating methods which are considered, account is taken of the structure of the particular equation to be estimated plus additional "instrumental variables." (The predetermined variables in the entire system, including predetermined variables in identity equations, usually comprise the instrumental variables.) No account is taken of jointly dependent variables inside the system of equations but outside the equation being estimated, or of which particular equations the instrumental variables occur in (if the instrumental variables are predetermined variables in the system).

For ease of expressing computational formulas, some additional notation regarding a single equation from the system of equations will be recorded. The μth equation may be written separately as:

    (I.8)    y_μ = Y_μ γ_μ + X_μ β_μ + u_μ
            T×1   T×m_μ m_μ×1   T×L_μ L_μ×1   T×1

or

    (I.9)    y_μ = [Y_μ ⋮ X_μ] [γ_μ' ⋮ β_μ']' + u_μ

or

    (I.10)   y_μ = Z_μ δ_μ + u_μ
            T×1   T×n_μ n_μ×1   T×1

where:

y_μ  is the vector of T sample observations taken on the normalizing jointly dependent variable (i.e., the jointly dependent variable assigned a coefficient of −1) for the μth equation.

Y_μ  is the T×m_μ matrix of T sample observations taken on the remaining m_μ jointly dependent variables in the μth equation.
We will refer to jointly dependent variables other than the normalizing jointly dependent variable as "explanatory" jointly dependent variables.

X_μ  is the T×L_μ matrix of T sample observations taken on the L_μ predetermined variables in the μth equation.

Z_μ  = [Y_μ ⋮ X_μ] is the T×n_μ matrix of T sample observations taken on all n_μ (= m_μ + L_μ) explanatory variables in the equation; that is, on all variables in the equation except the normalizing variable, y_μ.

u_μ  is the vector of T unobserved structural disturbances.

γ_μ  is the m_μ×1 vector of population coefficients of the m_μ explanatory jointly dependent variables in the μth equation. (The elements of γ_μ may be obtained from the μth row of Γ by deleting the normalizing coefficient and the coefficients which are known to be zero.)

β_μ  is the L_μ×1 vector of population coefficients of the L_μ predetermined variables in the μth equation. (The elements of β_μ are the non-zero elements of the μth row of B.)

δ_μ  = [γ_μ' ⋮ β_μ']' is the n_μ×1 vector of population coefficients of the n_μ (= m_μ + L_μ) explanatory variables in the equation. (The elements of δ_μ may be obtained from the μth row of α by deleting the normalizing coefficient and the coefficients which are known to be zero.)

Sometimes it will be desirable that the normalizing variable be included as part of an observation matrix, or the normalizing coefficient as part of a coefficient vector. To do this the following additional notation is used:

₊Y_μ  = [y_μ ⋮ Y_μ] is the T×(m_μ + 1) matrix of T sample observations taken on all m_μ + 1 jointly dependent variables in the μth equation.

₊Z_μ  = [y_μ ⋮ Z_μ] = [y_μ ⋮ Y_μ ⋮ X_μ] = [₊Y_μ ⋮ X_μ] is the T×(n_μ + 1) matrix of T sample observations taken on all n_μ + 1 variables in the μth equation.

₊γ_μ  = [−1 ⋮ γ_μ']' is the (m_μ + 1)×1 vector of population coefficients of all m_μ + 1 jointly dependent variables in the μth equation.
₊δ_μ  = [₊γ_μ' ⋮ β_μ']' = [−1 ⋮ γ_μ' ⋮ β_μ']' = [−1 ⋮ δ_μ']' is the (n_μ + 1)×1 vector of population coefficients of all n_μ + 1 variables in the μth equation.

Using the above additional notation, the μth equation may also be written as:

    (I.11)   ₊Y_μ ₊γ_μ + X_μ β_μ + u_μ = 0
            T×(m_μ+1) (m_μ+1)×1   T×L_μ L_μ×1   T×1   T×1

or

    (I.12)   ₊Z_μ ₊δ_μ + u_μ = 0
            T×(n_μ+1) (n_μ+1)×1   T×1   T×1

The estimation of a particular equation will often be accomplished by essentially a two-stage procedure in which the jointly dependent variables in the equation are first adjusted by a set of "instrumental variables" (we will refer to a set of instrumental variables as a set of "instruments")¹ which consist of the predetermined variables in the equation plus additional instruments, usually the additional predetermined variables in the system. The coefficients are then estimated from the adjusted jointly dependent variables and the predetermined variables in the equation. We will refer to a matrix of instruments as X_I. (The I denotes instruments.) Thus,

    (I.13)   X_I = [X_μ ⋮ X_μ**]
            T×K   T×L_μ  T×(K−L_μ)

    ¹The use of the term instruments follows Fisher [1965].

Strictly speaking, X_I, X**, and K should be written as X_{I_μ}, X_μ**, and K_μ, respectively, since they may be different for each equation, μ; however, since the equation referred to will be clear from the context, we will simplify the notation by writing X_{I_μ} as X_I, X_μ** as X**, and K_μ as K.¹

During discussion of the single equation techniques (part I), we will drop the subscript μ when such will not be confusing. Thus, y_μ, Y_μ, Z_μ, u_μ, ₊Y_μ, ₊Z_μ, and X_μ** will usually be shortened to y, Y, Z, u, ₊Y, ₊Z, and X**, respectively. Also, γ_μ, β_μ, δ_μ, ₊γ_μ, and ₊δ_μ will usually be shortened to γ, β, δ, ₊γ, and ₊δ, respectively, and m_μ, L_μ, n_μ, and K_μ will usually be shortened to m, L, n, and K, respectively.

The consumption equation (I.5a) of Klein's model I will be used to illustrate the above notation.
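The bookkeeping behind this notation is purely positional, and can be scripted. The sketch below is illustrative only (the data are fabricated; the column names follow the Klein model I listing above): it pulls y₁, Y₁, X₁, Z₁, ₊Z₁, and X_I for the consumption equation out of a 21×16 observation matrix Z by variable name.

```python
import numpy as np

# Hypothetical data with the shapes of Klein's model I: T = 21 observations
# on 8 jointly dependent and 8 predetermined variables (random numbers here).
T = 21
names = ["C", "P", "W", "I", "W1", "E", "Y", "K",
         "X0", "P-1", "K-1", "E-1", "t", "R", "G", "W2"]
rng = np.random.default_rng(2)
Z = rng.random((T, 16))

def columns(*vars):
    # Select named columns of Z, preserving order.
    return Z[:, [names.index(v) for v in vars]]

y1 = columns("C")                       # normalizing variable, T x 1
Y1 = columns("P", "W")                  # explanatory jointly dependent, m1 = 2
X1 = columns("X0", "P-1")               # predetermined in the equation, L1 = 2
Z1 = np.hstack([Y1, X1])                # T x n1 with n1 = m1 + L1 = 4
plusZ1 = np.hstack([y1, Z1])            # +Z1: T x (n1 + 1)
X_I = columns("X0", "P-1", "K-1", "E-1", "t", "R", "G", "W2")  # instruments, K = 8
```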
Since the consumption equation is the first equation, μ will be 1 and:

    y₁ = C,    Y₁ = [P, W],    X₁ = [X₀, P₋₁],    Z₁ = [P, W, X₀, P₋₁],
    ₊Y₁ = [C, P, W],    ₊Z₁ = [C, P, W, X₀, P₋₁],    X** = [K₋₁, E₋₁, t, R, G, W₂].

    ¹In 3SLS estimation X_I is again used to denote a set of instruments, but in this case X_I includes as instruments the predetermined variables in all equations of the subsystem being estimated plus any additional instruments desired. For 3SLS estimation the same X_I matrix is used by all equations in the subsystem being estimated.

If all of the predetermined variables in the system are used as instrumental variables for adjusting the jointly dependent variables, then X_I coincides with X except possibly for a renumbering of variables, i.e.,

    X_I = [X₁ ⋮ X**] = [X₀, P₋₁ ⋮ K₋₁, E₋₁, t, R, G, W₂].

The population coefficient vectors are:

    γ₁ = [α₁^[1], α₂^[1]]',    β₁ = [α₀^[1], α₃^[1]]',    δ₁ = [α₁^[1], α₂^[1], α₀^[1], α₃^[1]]',
    ₊γ₁ = [−1, α₁^[1], α₂^[1]]',    ₊δ₁ = [−1, α₁^[1], α₂^[1], α₀^[1], α₃^[1]]'.

Thus, m₁ = 2, L₁ = 2, n₁ = 4, and K = 8. Equations (I.8) through (I.12) become:

    (I.8')   C = [P, W] [α₁^[1], α₂^[1]]' + [X₀, P₋₁] [α₀^[1], α₃^[1]]' + u₁

    (I.9')   C = [P, W ⋮ X₀, P₋₁] [α₁^[1], α₂^[1], α₀^[1], α₃^[1]]' + u₁

    (I.10')  C = Z₁ δ₁ + u₁

    (I.11')  [C ⋮ P, W] [−1, α₁^[1], α₂^[1]]' + [X₀, P₋₁] [α₀^[1], α₃^[1]]' + u₁ = 0

    (I.12')  [C ⋮ P, W ⋮ X₀, P₋₁] [−1, α₁^[1], α₂^[1], α₀^[1], α₃^[1]]' + u₁ = 0

3. Statistical assumptions

Following are the statistical assumptions which will be made in this paper unless noted otherwise:

(1) If estimating the μth structural equation by a single equation method, the μth equation is identifiable by the a priori restrictions on the values of the coefficients in the equation.¹ If estimating by a multiple equations method, all equations in the subset of equations being estimated are identifiable by the a priori restrictions on the values of the coefficients in the subsystem.²

    ¹See Goldberger [1964], pp. 306-318, Johnston [1963], pp. 240-252, or Koopmans and Hood [1953], pp. 135-142 for a discussion of identification.
    ²In addition to the usual order condition imposed on a single equation (i.e., the usual counting rule K_μ** ≥ m_μ given below), the assumption here is that observationally equivalent structures for the subsystem being estimated do not occur. Johnston [1963], p. 252 states: "If K_μ** ≥ m_μ, the parameters of a relation are identifiable. Our practical estimation procedure may then be influenced by whether K_μ** = m_μ or K_μ** > m_μ. In the former case rk Π̂_Δ** will, apart from a freakish statistical accident, be equal to m_μ." (Johnston used K** in place of K_μ**, GΔ − 1 in place of m_μ, and ρ in place of rk.) Unfortunately, Johnston's statement seems stronger than is warranted. rk Π̂_Δ** < m_μ has probably occurred much more often than is generally realized, but has not been apparent when estimating by single equation methods; however, such a situation is more likely to become apparent when estimating by multiple equation techniques. Koopmans, Rubin, and Leipnik [1950], pp. 78-80 present an additional examination (in addition to the usual counting rule for a single equation) which may be readily performed to detect observationally equivalent structures.

In the early chapters on single equation methods (chapters II and III) only a priori restrictions that certain coefficients are zero are used; therefore, during this part, identifiability implies n_μ ≤ rk X ≤ A and that the ₊Z_μ matrix has full column rank.¹ (rk X is used to denote the rank of X.) In the last chapter on single equation methods (chapter IV), more general linear restrictions are considered. For that chapter, n_μ may be greater than A and ₊Z_μ need not have full column rank. Except for showing some relationships, in no part of this paper will the usual assumption that X has full column rank (i.e., the assumption that rk X = A) be made, since the computational procedures given in this paper automatically handle the more general situation of rk X < A.

(2) The T×M matrix of disturbances of the first M equations,

    U = [u₁ ... u_M], with tth row U_[t] = [U_t1 ... U_tM],
= u ...u [T] _Tl TM has a multivariate distribution with 60 = 0, dUfi U = 2 t] [t] I g I x for all t and auEt]U[t'] O for t # t , 2 being an MIM positive semi-definite matrix. Thus, 2 is the population variance-covariance matrix of disturbances. When estimating S A is equivalent to m S K** since n = m +-L and A = L K$* . (Goldberger [19643, ppH 306-318 anh JoflhstonPEI963], pp. dfio-zsfi use the latter order condition.) S rk X is imposed since we are permitting X to have less thaanull column rank in ‘ this paper. rk.X S A always holds since the rank of any matrix is less than or equal to the number of columns (and rows) in the matrix. (3) (4) 33 a subset of structural equations (including the entire set of equations) by a multiple equations technique, the stronger assumption that 2 is positive definite will be required so that the determinant of 2 will be greater than zero and the inverse of 2 will exist. The restriction 6Ufit]U[t] = 2 for all t implies that the disturbance variance-covariance matrix is constant for all observations. The restriction 6Uft]U[t,] = O for t # t' implies that there exists no serial correlation between observations of the disturbance elements. The nth diagonal element of 2 will be denoted Oi , i.e., Var u“ = ofi . For notational simplicity of will be written simply as 02 during discussion of the single equation methods. The above assumptions regarding U imply that under general conditions plim(l/T)U'U = 2.1 The TXA matrix of predetermined variables, X, has the pro- perty plim(l/T)X'X = 6(1/T)X'X = QXX’ a finite positive semi-definite matrix. Also, the variables in U are con- temporaneously independent of the variables in X; that is, U[t] is statistically independent of X[t.] for all t, t' with t 2 t'.2 This implies that plim(l/T)X'U = 0.3 det T $ 0, hence F51 exists. (det denotes determinant.) plim. T—O 1Goldberger [1964], p. 300. As noted earlier plim denotes 2This assumption holds by definition when X includes only exogenous variables. 
    It allows inclusion of lagged jointly dependent variables in X provided there is no serial correlation in the disturbances.
    ³See Christ [1966], pp. 377, 378 and footnote 70 of p. 439.

(5) In some of the computational methods, jointly dependent variables will be adjusted by a T×K matrix of instrumental variables which we will denote X_I (the subscript I denoting instruments). These instrumental variables will be assumed to have essentially the same properties as the predetermined variables in the system. (Usually X_I consists of the set of predetermined variables in the system.) In particular, we will assume that plim(1/T)X_I'U = 0 (where 0 is a K×M matrix) and that plim(1/T)X_I'X_I = E(1/T)X_I'X_I = Q_{X_I X_I}, a K×K positive semi-definite matrix. (We will not, in general, assume that X_I has full column rank.) We will also assume that plim(1/T)X_I'X = E(1/T)X_I'X = Q_{X_I X}, a K×A matrix.

4. Reduced form equations

Often after estimating the coefficients of the structural equations in a system, it is desired that these be used for predictive purposes; however, (1) multiple jointly dependent variables may occur in an equation (thereby requiring that values be assumed for the "explanatory" jointly dependent variables as well as the predetermined variables), (2) it may not be obvious which equation to use in the prediction of a particular jointly dependent variable, and (3) a given equation will not reflect the repercussions of the assumed levels of all predetermined variables in the system. As a result, it is often desired that the structural equations be solved for a set of "reduced form" equations, each reduced form equation containing one jointly dependent variable and all of the predetermined variables in the system (except that the a priori restrictions on the structural equations may imply that certain of the coefficients of the reduced form equations are zero, also).
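The passage from structural to reduced form, derived in what follows, can be previewed numerically. The two-equation system below is invented purely for illustration (one stochastic equation, one identity; all coefficients are made up and do not come from Klein's model):

```python
import numpy as np

# Invented system (G = 2, M = 1, A = 2), normalized with -1 on the left-hand
# variable of each equation:
#   stochastic: -y1 + 0.5*y2 + 1.0*x1 + u1 = 0
#   identity:    y1 - 1.0*y2 + 1.0*x2      = 0
Gamma = np.array([[-1.0, 0.5],
                  [ 1.0, -1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Sigma = np.array([[2.0]])            # Var(u1); the identity has no disturbance

Gamma_inv = np.linalg.inv(Gamma)     # det(Gamma) = 0.5 != 0, so this exists
Pi = -Gamma_inv @ B                  # reduced form coefficients Pi = -Gamma^{-1} B

# Reduced form disturbance covariance: Gamma^{-1} [[Sigma, 0], [0, 0]] Gamma^{-1}'
padded = np.zeros((2, 2))
padded[:1, :1] = Sigma
Omega = Gamma_inv @ padded @ Gamma_inv.T

# Solving the two equations by hand gives y1 = 2*x1 + x2 + 2*u1 and
# y2 = 2*x1 + 2*x2 + 2*u1, so Pi should equal [[2, 1], [2, 2]] and Omega
# should equal 4 * Var(u1) = 8 in every cell.
```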
An additional reason for calculating reduced form coefficients comes from the calculation of elasticities. Certain direct elasticities between variables should be based on the coefficients of the structural equations; however, in many cases, when specifying elasticities between two variables, the relationship between these variables after taking account of all repercussions in the system is desired. Such elasticities should be based on the reduced form coefficients.

The reduced form equations may be derived by premultiplying (I.3) by Γ⁻¹ or by postmultiplying (I.1) by (Γ⁻¹)'. Premultiplying (I.3) by Γ⁻¹ we have:

    (I.14)   Γ⁻¹ Γ Y'_[t] + Γ⁻¹ B X'_[t] + Γ⁻¹ [U_[t] ⋮ 0]' = 0
            G×G G×G G×1   G×G G×A A×1   G×G G×1   G×1

or

    (I.15)   Y'_[t] = −Γ⁻¹ B X'_[t] − Γ⁻¹ [U_[t] ⋮ 0]'
            G×1   G×G G×A A×1   G×G G×1

or

    (I.16)   Y'_[t] = Π X'_[t] + V'_[t]
            G×1   G×A A×1   G×1

In terms of the entire observation matrices for Y, X, and V, (I.16) may also be written as:

    (I.17)   Y' = Π X' + V'
            G×T   G×A A×T   G×T

or

    (I.18)   Y = X Π' + V
            T×G   T×A A×G   T×G

where in (I.16) through (I.18)

    (I.19)   Π = −Γ⁻¹ B

is the G×A matrix of coefficients of the reduced form equations, in which each row of Π gives the coefficients corresponding to the predetermined variables of a single reduced form equation. Each reduced form equation contains only one jointly dependent variable, and this variable is written to the left of the equality sign. Also,

    V = −[U ⋮ 0](Γ⁻¹)' = [v₁ ... v_G], with tth row V_[t] = [V_t1 ... V_tG],

is the T×G matrix of reduced form disturbances.

Some of the statistical characteristics of the reduced form matrices which follow from their relationships to the structural equations and the statistical assumptions regarding the structural equations are:

    (I.20)   EV = E[−U ⋮ 0](Γ⁻¹)' = [−EU ⋮ 0](Γ⁻¹)' = [0 ⋮ 0](Γ⁻¹)' = 0

    (I.21)   EV'_[t]V_[t'] = E (Γ⁻¹)[U_[t] ⋮ 0]'[U_[t'] ⋮ 0](Γ⁻¹)'
                           = (Γ⁻¹) [ EU'_[t]U_[t']   0 ] (Γ⁻¹)'
                                   [       0          0 ]
Thus,

    (I.22)   EV'_[t]V_[t] = (Γ⁻¹) [ Σ  0 ] (Γ⁻¹)'  ≝  Ω,
                                  [ 0  0 ]

a G×G positive semi-definite matrix which is fixed for all t (≝ denotes that we are defining Ω as this matrix), and

    (I.23)   EV'_[t]V_[t'] = (Γ⁻¹) [ 0  0 ] (Γ⁻¹)' = 0    for t ≠ t'.
                                   [ 0  0 ]

Also, under general conditions,

    (I.24)   plim(1/T)V'V = Ω

    (I.25)   plim(1/T)X'V = −plim(1/T)X'[U ⋮ 0](Γ⁻¹)' = −[plim(1/T)X'U ⋮ 0](Γ⁻¹)' = [0 ⋮ 0](Γ⁻¹)' = 0

    (I.26)   plim(1/T)X_I'V = −plim(1/T)X_I'[U ⋮ 0](Γ⁻¹)' = [0 ⋮ 0](Γ⁻¹)' = 0

D. Orthogonality Relationships

1. Notation expressing orthogonality relationships

We can shorten the expression of many formulas which are given in this paper, while giving a better idea of the computational use of these formulas, by introducing some additional notation at the outset.

Two variables are said to be orthogonal if their sum of cross-products is zero; that is, z₁ is orthogonal to z₂ if z₁'z₂ = 0. The extension of the concept of orthogonality to matrices whose columns are variables is quite straightforward. Let Z₁ be a T×N₁ matrix of variables and Z₂ be a T×N₂ matrix of variables. Then the columns of Z₁ are said to be orthogonal to the columns of Z₂ if the matrix Z₁'Z₂ = 0, where 0 is an N₁×N₂ matrix. (The ijth element of the matrix 0 is the sum of cross-products of the ith variable of Z₁ and the jth variable of Z₂. The ijth element is, of course, zero since the entire matrix of sums of cross-products is zero.) In this paper each column in a matrix of variables is a variable; thus, for this paper we will shorten our definition of orthogonality between the variables in two matrices to: Z₁ is orthogonal to Z₂ if Z₁'Z₂ = 0.

It is often convenient to divide a vector into two components: (1) the part of the vector within the space spanned by (the variables of) another matrix and (2) the part of the vector orthogonal to the other matrix (i.e., outside the space spanned
by the other matrix). Thus, given a T×1 vector, y, and a T×N₁ matrix, X, y may be separated into:

    (I.27)   y = y_‖X + y_⊥X
            T×1   T×1   T×1

where:

y_‖X  is the part of y which is in the space spanned by X,¹

y_⊥X  is the part of y which is orthogonal to X.² (y_⊥X is the part of y outside the space spanned by the variables in X.)

    ¹y_‖X is the projection of y onto the space spanned by the variables in X.
    ²y_⊥X is the projection of y onto the space orthogonal to X.

The ⊥ (perpendicular) symbol is used to denote orthogonality, because if two vectors, y and x, are geometrically at right angles to each other, then y'x = 0 (i.e., the vectors are orthogonal). The use of the ‖ (parallel) symbol to represent "within the space spanned by" may be justified by a similar geometrical argument.

Extension of this notation to matrices of variables is straightforward. Thus, a T×G matrix of variables Y (Y = [y₁ ... y_G]) may be partitioned as:

    (I.28)   Y = Y_‖X + Y_⊥X
            T×G   T×G   T×G

where:

Y_‖X  is the part of Y in the space spanned by X. Y_‖X = {[y₁]_‖X ... [y_G]_‖X}; that is, Y_‖X is merely the matrix obtained by calculating the part of each variable in Y which is in the space spanned by the variables in X.
To show this we note that (assuming that X has full column rank) the usual least squares solution for the vector of estimated coefficients is given by:

(1.29)   π̂ = (X'X)^-1 X'y ;

the T×1 vector of predicted values for y is given by

(1.30)   y_‖X = ŷ = Xπ̂ = X(X'X)^-1 X'y ;

and the T×1 vector of residuals is given by

(1.31)   y_⊥X = v̂ = y - ŷ = y - Xπ̂ = y - X(X'X)^-1 X'y .

Notice that in the usual least squares calculations, y is divided into a part ŷ (or y_‖X) within the space spanned by X and a part v̂ (or y_⊥X) orthogonal to X. That v̂ is orthogonal to X is easily demonstrated:

(1.32)   X'v̂ = X'(y - X(X'X)^-1 X'y) = X'y - X'X(X'X)^-1 X'y = X'y - X'y = 0 .

Since Y_‖X and Y_⊥X were defined as the parts of each variable in Y in the space spanned by X and orthogonal to X, respectively, the calculation of Y_‖X and Y_⊥X may also be regarded as least squares calculations. Let us consider G equations with separate dependent variables, y1 ... yG, but the same independent variables, x1 ... xA, for each equation. Let Y = [y1 ... yG] and X = [x1 ... xA] as before. The least squares solutions for the coefficients of the G equations are given by the matrix

(1.33)   Π̂' = [(X'X)^-1 X'y1 ... (X'X)^-1 X'yG] = (X'X)^-1 X'Y ;

the T×G matrix of predicted values for Y is given by:

(1.34)   Y_‖X = Ŷ = [ŷ1 ... ŷG] = [Xπ̂1 ... Xπ̂G] = [X(X'X)^-1 X'y1 ... X(X'X)^-1 X'yG] = X(X'X)^-1 X'Y ;

and the T×G matrix of residuals for Y is given by:

(1.35)   Y_⊥X = V̂ = [v̂1 ... v̂G] = [y1 - ŷ1 ... yG - ŷG] = [y1 - X(X'X)^-1 X'y1 ... yG - X(X'X)^-1 X'yG] = Y - X(X'X)^-1 X'Y .

If X does not have full column rank, the X'X matrix will be singular, the inverse of X'X will not exist, and unique least squares coefficients will not exist for any of the G equations.^1 Even though the least squares coefficients are not unique, the least squares predicted values for Y and the residuals for each equation are unique and can be readily calculated.
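These formulas can be checked numerically. The sketch below (illustrative only; the variable names are the writer's, and NumPy stands in for the routines described in the text) computes Y_‖X and Y_⊥X by least squares, verifies the orthogonality result (1.32), and then repeats the computation with a rank-deficient X to show that the predicted values remain unique even though the coefficients do not:

```python
import numpy as np

rng = np.random.default_rng(0)
T, A, G = 50, 3, 2
X = rng.standard_normal((T, A))              # T x A matrix of independent variables
Y = rng.standard_normal((T, G))              # T x G matrix of dependent variables

Pi_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (X'X)^-1 X'Y, as in (1.33)
Y_par  = X @ Pi_hat                          # Y_||X = X(X'X)^-1 X'Y, as in (1.34)
Y_perp = Y - Y_par                           # Y_perp_X = Y - Y_||X, as in (1.35)

orth_ok = np.allclose(X.T @ Y_perp, 0)       # residuals orthogonal to X, as in (1.32)

# Rank-deficient X: the coefficients are no longer unique, but Y_||X is.
X_def = np.hstack([X, X[:, :1]])             # last column duplicates the first
Pi_any = np.linalg.lstsq(X_def, Y, rcond=None)[0]
same_projection = np.allclose(X_def @ Pi_any, Y_par)
```

`np.linalg.lstsq` returns one particular set of least squares coefficients for the singular case; any other valid set would reproduce the same Y_‖X, which is the point of the passage above.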
This illustrates a desirable characteristic of Y_‖X and Y_⊥X -- even though X is not of full column rank, Y_‖X and Y_⊥X are unique and can be readily calculated.^2

^1 A set of least squares coefficients can be obtained by putting enough restrictions on the coefficients of each equation; e.g., by setting certain of the coefficients to zero, thereby, in effect, omitting variables from the X matrix.

^2 A direct orthogonalization procedure such as the Gram-Schmidt orthogonalization procedure is probably the most accurate method available for the calculation of residuals (Y_⊥X) and predicted values of Y (Y_‖X may be calculated as Y - Y_⊥X); however, if very many observations occur it will be considerably more efficient to obtain a set of least squares coefficients for each column of Y and then use these coefficients to calculate Y_‖X and Y_⊥X. This may be done by selecting variables from the X matrix in such a way that the subset of variables selected has full column rank which is the same as the rank of the original X matrix, and then using this smaller set of variables in place of the original X matrix in the calculation of least squares coefficients, predicted values of Y (Y_‖X), and residuals (Y_⊥X). The selection of a subset of variables having full column rank and having the same rank as X may be built into the inversion routine in the manner noted in section I.D.2.

Although less computer time will generally be required if a set of least squares coefficients is calculated as indicated above and Y_‖X and Y_⊥X are calculated from these coefficients, it is desirable that matrices of the form Y_‖X'Y_‖X, Y_⊥X'Y_⊥X, Y_‖X'Z2, or Y_⊥X'Z2 be calculated by direct orthogonalization, since matrices of this form may be calculated directly from moment matrices instead of from the observation matrix. (A method for doing so is given in section I.D.2.)

In this paper, we will make extensive use of matrices of the form Y_‖X'Y_‖X and Y_⊥X'Y_⊥X, which we will denote as [Y'Y]_‖X and [Y'Y]_⊥X, respectively.
Thus, [Y'Y]_‖X is the moment matrix (i.e., the sums of squares and cross-products matrix) of the part of Y in the space spanned by the columns of X, and [Y'Y]_⊥X is the moment matrix of the part of Y orthogonal to X. If X has full column rank, then [Y'Y]_‖X may be expressed as:

(1.36)   [Y'Y]_‖X = Y_‖X'Y_‖X = [Y'X(X'X)^-1 X'][X(X'X)^-1 X'Y] = Y'X(X'X)^-1 X'Y ,

and [Y'Y]_⊥X may be expressed as:

(1.37)   [Y'Y]_⊥X = Y_⊥X'Y_⊥X = [Y' - Y'X(X'X)^-1 X'][Y - X(X'X)^-1 X'Y]
                  = Y'Y - 2Y'X(X'X)^-1 X'Y + Y'X(X'X)^-1 X'X(X'X)^-1 X'Y
                  = Y'Y - Y'X(X'X)^-1 X'Y = Y'Y - [Y'Y]_‖X .

Although Y'X(X'X)^-1 X'Y is the usual computational formula given for [Y'Y]_‖X and Y'Y - Y'X(X'X)^-1 X'Y is the usual computational formula given for [Y'Y]_⊥X, these formulas will not be used in this paper, except possibly for the purpose of a derivation. Instead, the [Y'Y]_⊥X matrix will be calculated by direct orthogonalization from the Y'Y, Y'X, and X'X matrices by a computational scheme given in section I.D.2; hence the use of the form [Y'Y]_⊥X rather than Y_⊥X'Y_⊥X. Calculation by this direct orthogonalization method has the advantages of being more accurate, requiring less computer time, and requiring less computer storage. Also, calculation by direct orthogonalization has the additional advantage that [Y'Y]_⊥X is easily calculated when X has less than full column rank. ([Y'Y]_⊥X is unique even though X has less than full column rank.) The [Y'Y]_‖X matrix will be calculated as Y'Y - [Y'Y]_⊥X rather than by the computational formula Y'X(X'X)^-1 X'Y for the same reasons that [Y'Y]_⊥X is calculated by direct orthogonalization.

More general matrices of the form [Z1'Z2]_‖Z3 and [Z1'Z2]_⊥Z3 will also be calculated, where Z1 is a T×N1 matrix of variables, Z2 is a T×N2 matrix of variables, and Z3 is a T×N3 matrix of variables, the variables in any of these matrices being jointly dependent, predetermined, or some jointly dependent and others predetermined within the same matrix.
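The point that [Y'Y]_⊥X can be obtained from moment matrices alone, without revisiting the T observations, can be illustrated as follows (an illustrative sketch; here the moment-matrix route uses the textbook formula (1.37) rather than the elimination scheme of section I.D.2):

```python
import numpy as np

rng = np.random.default_rng(1)
T, A, G = 40, 3, 2
X = rng.standard_normal((T, A))
Y = rng.standard_normal((T, G))

# Observation-matrix route: moment matrix of the least squares residuals.
Y_perp = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
YY_perp_obs = Y_perp.T @ Y_perp

# Moment-matrix route, as in (1.37): only Y'Y, X'Y, and X'X are needed.
YY, XY, XX = Y.T @ Y, X.T @ Y, X.T @ X
YY_perp_mom = YY - XY.T @ np.linalg.solve(XX, XY)

routes_agree = np.allclose(YY_perp_obs, YY_perp_mom)

# [Y'Y]_||X then follows as Y'Y - [Y'Y]_perp_X, with no second pass over the data.
par_from_perp = np.allclose(YY - YY_perp_mom, XY.T @ np.linalg.solve(XX, XY))
```

Once the moment matrices have been accumulated in one pass over the observations, all of the quantities above are computed from small A×A and A×G arrays, which is the storage advantage the text describes.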
(Some of the variables in one matrix may be repeated in the other two.) Although [Z1'Z2]_⊥Z3 = [Z1]_⊥Z3'[Z2]_⊥Z3, [Z1'Z2]_⊥Z3 will not be computed in this way but will instead be extracted from the moment matrix of the part of Z = [Z1 : Z2] orthogonal to Z3, i.e., [Z1'Z2]_⊥Z3 may be extracted as the upper right hand block of:

   [Z'Z]_⊥Z3 = [Z1'Z1  Z1'Z2]      = [[Z1'Z1]_⊥Z3  [Z1'Z2]_⊥Z3]
               [Z2'Z1  Z2'Z2]_⊥Z3    [[Z2'Z1]_⊥Z3  [Z2'Z2]_⊥Z3]

which is computed directly from the Z'Z, Z'Z3, and Z3'Z3 matrices. Any variables common to Z1 and Z2 need not be repeated in Z. The row and column of [Z'Z]_⊥Z3 corresponding to any variable of Z1 or Z2 which also occurs in Z3 will be zero, since no part of that variable is orthogonal to Z3. [Z1'Z2]_‖Z3 is calculated simply as Z1'Z2 - [Z1'Z2]_⊥Z3.^1

^1 If the reader has difficulty with the concept of orthogonality, he may regard a matrix [Z1'Z2]_⊥Z3 as merely a matrix that is calculated from the matrix Z1'Z2 (and has the same number of rows and columns as Z1'Z2) through use of the matrices Z3'Z3 and Z3'Z (where Z contains all variables in Z1 and Z2) by a standard computational procedure given in section I.D.2 (which we will call the direct orthogonalization procedure). The matrix [Z1'Z2]_‖Z3 may be regarded (and calculated) as Z1'Z2 - [Z1'Z2]_⊥Z3.

The remainder of this section is devoted to deriving some fundamental relationships which will be useful in verifying various results throughout this paper; however, these relationships are not required to apply the formulas which are given in this paper.^2

^2 Some readers will find it of interest to remember that [Z1'Z2]_‖Z3 and [Z1'Z2]_⊥Z3 can be calculated by direct least squares by using the variables in Z1 and Z2 as dependent variables and the variables in Z3 as independent variables (the maximum number of linearly independent variables in Z3 being used as the set of independent variables if Z3 has less than full column rank).
The matrices [Z1]_‖Z3 and [Z2]_‖Z3 are the matrices of predicted values of the variables in Z1 and Z2, respectively; the matrices [Z1]_⊥Z3 and [Z2]_⊥Z3 are the matrices of residuals of the variables in Z1 and Z2, respectively; [Z1'Z2]_‖Z3 = [Z1]_‖Z3'[Z2]_‖Z3, and [Z1'Z2]_⊥Z3 = [Z1]_⊥Z3'[Z2]_⊥Z3. The use of the direct orthogonalization procedure merely saves computer time and provides a more accurate calculation of the desired matrices.

In the remainder of this section let Z1, Z2, Z3, and Z4 be any T×N1, T×N2, T×N3, and T×N4 matrices of variables, respectively, the variables in any of these matrices being jointly dependent, predetermined, or some jointly dependent and some predetermined. Variables in any of these matrices may also occur in any of the other matrices. We will assume only that Z1, Z2, Z3, and Z4 have rank N1*, N2*, N3*, and N4*, respectively; i.e., we will assume that any of the matrices may have less than full column rank. In showing algebraically that each of the claimed relationships holds, we will often use matrices Z1*, Z2*, Z3*, and Z4*, where Z1* is a T×N1* matrix of variables extracted from the Z1 matrix such that Z1* has full column rank and every variable in Z1 can be expressed as a linear combination of variables in Z1*. (It is always possible to extract such a matrix. A method for doing so is given in section I.D.2.) Z2*, Z3*, and Z4* are constructed from Z2, Z3, and Z4 in the same fashion.

Following are some additional relationships which will be helpful in deriving computational formulas.
(1) If x is a variable in the matrix Z3, or if x is in the space of Z3 (i.e., x may be expressed as a linear combination of the columns of Z3), then

(1.38)   [x]_‖Z3 = x ;

hence

(1.39)   [x]_‖Z3' Z2 = x'Z2 ;

or if X1 is a matrix of variables which are also contained in Z3, then:

(1.40)   [X1]_‖Z3' Z2 = X1'Z2

and in particular,

(1.41)   [X1'X1]_‖Z3 = [X1]_‖Z3'[X1]_‖Z3 = X1'X1 .

Also (continuing to let X1 be a submatrix of Z3, or at least in the space of Z3):

(1.42)   [X1]_⊥Z3 = 0   [where 0 is a T×(number of variables in X1) matrix];

therefore, the matrix of sums of cross-products of [X1]_⊥Z3 with any other matrix of variables is zero, i.e.,

(1.43)   [X1]_⊥Z3' Z2 = 0'Z2 = 0

and, in particular,

(1.44)   [X1'X1]_⊥Z3 = [X1]_⊥Z3'[X1]_⊥Z3 = 0

and

(1.45)   [X1'Z2]_⊥Z3 = [X1]_⊥Z3'[Z2]_⊥Z3 = 0'[Z2]_⊥Z3 = 0 .

(2)  (1.46)   [Z1]_‖Z3 = Z3*(Z3*'Z3*)^-1 Z3*'Z1 ,

where Z3*(Z3*'Z3*)^-1 Z3*' is called the projection matrix for the space Z3.^1 (A matrix P is a projection matrix for a space Z3 if PZ1 = [Z1]_‖Z3 for any matrix of variables, Z1.)

^1 Z3* is defined on page 47.

(3)  (1.47)   [Z1]_⊥Z3 = Z1 - [Z1]_‖Z3 = Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1 = [I - Z3*(Z3*'Z3*)^-1 Z3*']Z1 ,

where I - Z3*(Z3*'Z3*)^-1 Z3*' is called the projection matrix for the space orthogonal to Z3. (A matrix P is a projection matrix for the space orthogonal to Z3 if PZ1 = [Z1]_⊥Z3 for any matrix of variables, Z1.)

(4) Z1*(Z1*'Z1*)^-1 Z1*' is symmetric and idempotent. (A matrix, P, is symmetric if P = P' and idempotent if PP = P.) That Z1*(Z1*'Z1*)^-1 Z1*' is symmetric and idempotent is easily verified:^1

(1.48)   (Z1*(Z1*'Z1*)^-1 Z1*')' = (Z1*')'((Z1*'Z1*)^-1)'Z1*' = Z1*(Z1*'Z1*)^-1 Z1*'

and

(1.49)   [Z1*(Z1*'Z1*)^-1 Z1*'][Z1*(Z1*'Z1*)^-1 Z1*'] = Z1*(Z1*'Z1*)^-1 Z1*' ,

since the interior part Z1*'Z1*(Z1*'Z1*)^-1 in (1.49) is an identity matrix.

(5) I - Z1*(Z1*'Z1*)^-1 Z1*' is symmetric and idempotent. Again this is easily verified:
(1.50)   [I - Z1*(Z1*'Z1*)^-1 Z1*']' = I - Z1*(Z1*'Z1*)^-1 Z1*'

and

(1.51)   [I - Z1*(Z1*'Z1*)^-1 Z1*'][I - Z1*(Z1*'Z1*)^-1 Z1*']
         = I - 2Z1*(Z1*'Z1*)^-1 Z1*' + Z1*(Z1*'Z1*)^-1 Z1*'Z1*(Z1*'Z1*)^-1 Z1*'
         = I - Z1*(Z1*'Z1*)^-1 Z1*' .

^1 For any three matrices A, B, and C of compatible dimensions, (ABC)' = C'B'A'. Also, for any matrix A, (A')' = A.

(6) The projection matrices Z1*(Z1*'Z1*)^-1 Z1*' and I - Z1*(Z1*'Z1*)^-1 Z1*' are mutually orthogonal, since

(1.52)   [Z1*(Z1*'Z1*)^-1 Z1*'][I - Z1*(Z1*'Z1*)^-1 Z1*'] = Z1*(Z1*'Z1*)^-1 Z1*' - Z1*(Z1*'Z1*)^-1 Z1*' = 0 .

(7) [Z1]_‖Z3'[Z2]_⊥Z3 = 0 (where 0 is an N1×N2 matrix), since the variables in [Z1]_‖Z3 may be expressed as linear combinations of the variables in Z3 and the variables in [Z2]_⊥Z3 are orthogonal to the variables in Z3. In particular, from (1.46), (1.47), and (1.52) we have that:

(1.53)   [Z1]_‖Z3'[Z1]_⊥Z3 = Z1'[Z3*(Z3*'Z3*)^-1 Z3*'][I - Z3*(Z3*'Z3*)^-1 Z3*']Z1 = Z1'0Z1 = 0 .

(8)  (1.54)   Z1'[Z2]_‖Z3 = [Z1]_‖Z3'[Z2]_‖Z3 = [Z1]_‖Z3' Z2 .

This comes from the idempotency of Z3*(Z3*'Z3*)^-1 Z3*' [see (1.49)] as follows:

(1.55)   [Z1]_‖Z3'[Z2]_‖Z3 = [Z1'Z3*(Z3*'Z3*)^-1 Z3*'][Z3*(Z3*'Z3*)^-1 Z3*'Z2]
         = Z1'Z3*(Z3*'Z3*)^-1 Z3*'Z2 = [Z1]_‖Z3' Z2  or  Z1'[Z2]_‖Z3 .

(9) Similarly,

(1.56)   Z1'[Z2]_⊥Z3 = [Z1]_⊥Z3'[Z2]_⊥Z3 = [Z1]_⊥Z3' Z2 .

This comes from the idempotency of I - Z3*(Z3*'Z3*)^-1 Z3*' [see (1.51)] as follows:

(1.57)   [Z1'Z2]_⊥Z3 = [Z1]_⊥Z3'[Z2]_⊥Z3 = Z1'[I - Z3*(Z3*'Z3*)^-1 Z3*'][I - Z3*(Z3*'Z3*)^-1 Z3*']Z2
         = Z1'[I - Z3*(Z3*'Z3*)^-1 Z3*']Z2 = [Z1]_⊥Z3' Z2  or  Z1'[Z2]_⊥Z3 .

(10) Let A1 be any N1×p1 matrix. Then using (1.46) we obtain:

(1.58)   [Z1A1]_‖Z3 = Z3*(Z3*'Z3*)^-1 Z3*'Z1A1 = [Z1]_‖Z3 A1 ,

and using (1.47) we obtain:

(1.59)   [Z1A1]_⊥Z3 = [I - Z3*(Z3*'Z3*)^-1 Z3*']Z1A1 = [Z1]_⊥Z3 A1 .

(11) Thus, letting A1 be any N1×p1 matrix and A2 be any N2×p2 matrix we obtain:

(1.60)   [A1'Z1'Z2A2]_‖Z3 = [Z1A1]_‖Z3'[Z2A2]_‖Z3 = A1'[Z1]_‖Z3'[Z2]_‖Z3 A2 = A1'[Z1'Z2]_‖Z3 A2

and

(1.61)   [A1'Z1'Z2A2]_⊥Z3 = [Z1A1]_⊥Z3'[Z2A2]_⊥Z3 = A1'[Z1]_⊥Z3'[Z2]_⊥Z3 A2 = A1'[Z1'Z2]_⊥Z3 A2 .

(12) Let A1 be any N1×p1 matrix and A2 be any N2×p2 matrix.
Then from (1.58) we obtain:

(1.62)   [Z1A1 + Z2A2]_‖Z3 = [[Z1 : Z2][A1; A2]]_‖Z3 = [Z1 : Z2]_‖Z3 [A1; A2] = [Z1]_‖Z3 A1 + [Z2]_‖Z3 A2 .

Similarly, from (1.59) we obtain:

(1.63)   [Z1A1 + Z2A2]_⊥Z3 = [Z1]_⊥Z3 A1 + [Z2]_⊥Z3 A2 .

(13) If Z3'Z4 = 0 (i.e., Z3 is orthogonal to Z4),

(1.64)   [Z1]_⊥[Z3 : Z4] = ([Z1]_⊥Z3)_⊥Z4 = ([Z1]_⊥Z4)_⊥Z3 .

That (1.64) holds for Z3'Z4 = 0 can be seen by writing out each of the terms and observing that they are the same:

   [Z1]_⊥[Z3 : Z4] = Z1 - [Z3* : Z4*]{[Z3* : Z4*]'[Z3* : Z4*]}^-1 [Z3* : Z4*]'Z1 .

However, for Z3'Z4 = 0,

   {[Z3* : Z4*]'[Z3* : Z4*]}^-1 = [Z3*'Z3*  0; 0  Z4*'Z4*]^-1 = [(Z3*'Z3*)^-1  0; 0  (Z4*'Z4*)^-1] ;

hence, [Z1]_⊥[Z3 : Z4] becomes:

   Z1 - [Z3* : Z4*] [(Z3*'Z3*)^-1  0; 0  (Z4*'Z4*)^-1] [Z3*'Z1; Z4*'Z1]
   = Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1 - Z4*(Z4*'Z4*)^-1 Z4*'Z1 .

On the other hand,

   ([Z1]_⊥Z3)_⊥Z4 = [Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1]_⊥Z4
   = [Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1] - Z4*(Z4*'Z4*)^-1 Z4*'Z1 + Z4*(Z4*'Z4*)^-1 Z4*'Z3*(Z3*'Z3*)^-1 Z3*'Z1
   = Z1 - Z3*(Z3*'Z3*)^-1 Z3*'Z1 - Z4*(Z4*'Z4*)^-1 Z4*'Z1 ,

since Z4*'Z3* = 0. Similarly, ([Z1]_⊥Z4)_⊥Z3 becomes the same. (1.64) does not in general hold for Z3'Z4 ≠ 0, however.

(14) Since [Z3 : [Z4]_⊥Z3] spans the same space as [Z3 : Z4] and since Z3'[Z4]_⊥Z3 = 0 [see (1.45)], applying (1.64) we have:

(1.65)   [Z1]_⊥[Z3 : Z4] = ([Z1]_⊥Z3)_⊥[Z4]_⊥Z3 .

(15) ([Z1'Z2]_‖Z3)_⊥Z4 is defined as ([Z1]_‖Z3)_⊥Z4'([Z2]_‖Z3)_⊥Z4 .

(16) From (1.54) we obtain:

(1.66)   ([Z1'Z2]_‖Z3)_⊥Z4 = ([Z1]_‖Z3)_⊥Z4'([Z2]_‖Z3)_⊥Z4 = [Z1]_‖Z3'([Z2]_‖Z3)_⊥Z4 = ([Z1]_‖Z3)_⊥Z4'[Z2]_‖Z3 .

(17) However:

(1.67)   ([Z1'Z2]_‖Z3)_⊥Z4 ≠ (Z1'[Z2]_‖Z3)_⊥Z4  nor  ([Z1]_‖Z3'Z2)_⊥Z4 ,

since writing out the matrices involved shows that the two sides differ in general. Thus, we can perform transformations of the form (1.54) on the outermost ‖ operator, only, e.g., as in (1.66).
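Several of the relationships above lend themselves to a quick numerical check. The sketch below (illustrative only; NumPy least squares stands in for the routines of section I.D.2) verifies the projection-matrix properties (1.48), (1.49), and (1.52), and then the order-invariance claim (1.64), including its failure when Z3 and Z4 are not orthogonal:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 40
Z3 = rng.standard_normal((T, 3))             # full column rank (almost surely)
P = Z3 @ np.linalg.solve(Z3.T @ Z3, Z3.T)    # projection matrix for the space Z3
M = np.eye(T) - P                            # projection onto the orthogonal space

symmetric  = np.allclose(P, P.T)             # (1.48)
idempotent = np.allclose(P @ P, P)           # (1.49); (1.50)-(1.51) hold likewise for M
mutually_orthogonal = np.allclose(P @ M, 0)  # (1.52)

def perp(Z1, Z3):
    """[Z1]_perp_Z3 via least squares residuals (rank deficiency allowed)."""
    return Z1 - Z3 @ np.linalg.lstsq(Z3, Z1, rcond=None)[0]

# (1.64): with Z3'Z4 = 0 the perp operators may be applied in either order.
Z4 = perp(rng.standard_normal((T, 2)), Z3)   # by construction Z3'Z4 = 0
Z1 = rng.standard_normal((T, 2))
joint = perp(Z1, np.hstack([Z3, Z4]))
order_invariant = (np.allclose(perp(perp(Z1, Z3), Z4), joint) and
                   np.allclose(perp(perp(Z1, Z4), Z3), joint))

# With Z3'Z4c != 0 the stepwise result differs from the joint one in general.
Z4c = rng.standard_normal((T, 2))
fails_when_correlated = not np.allclose(perp(perp(Z1, Z3), Z4c),
                                        perp(Z1, np.hstack([Z3, Z4c])))
```

Constructing Z4 as the residual of a random matrix on Z3 is exactly the device of relationship (14): [Z4]_⊥Z3 is orthogonal to Z3 by (1.45), so (1.64) applies.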
Additional examples showing permissible transformations based on the outermost operator only follow:

(1.68)   [Z1]_‖Z3'[Z2]_⊥Z4 = ([Z1]_‖Z3)_⊥Z4'[Z2]_⊥Z4 = ([Z1]_‖Z3'Z2)_⊥Z4 ≠ ([Z1'Z2]_‖Z3)_⊥Z4

(1.69)   [Z1]_‖Z3'([Z2]_⊥Z4)_‖Z3 = Z1'([Z2]_⊥Z4)_‖Z3 ≠ ([Z1'Z2]_⊥Z4)_‖Z3

Similarly, (1.66) through (1.69) hold true if the ‖ operator is replaced by the ⊥ operator. In particular, the following may be easily verified by writing out the matrices involved in the same manner as for (1.66) through (1.69):

(1.70)   ([Z1'Z2]_⊥Z3)_⊥Z4 = ([Z1]_⊥Z3)_⊥Z4'([Z2]_⊥Z3)_⊥Z4

(1.71)   ([Z1'Z2]_⊥Z3)_⊥Z4 = [Z1]_⊥Z3'([Z2]_⊥Z3)_⊥Z4 = ([Z1]_⊥Z3)_⊥Z4'[Z2]_⊥Z3

(1.72)   ([Z1'Z2]_⊥Z3)_⊥Z4 ≠ (Z1'[Z2]_⊥Z3)_⊥Z4  nor  ([Z1]_⊥Z3'Z2)_⊥Z4

(1.73)   [Z1]_⊥Z3'[Z2]_⊥Z4 = ([Z1]_⊥Z3'Z2)_⊥Z4 ≠ ([Z1'Z2]_⊥Z3)_⊥Z4

(1.74)   [Z1]_⊥Z3'([Z2]_⊥Z4)_⊥Z3 = Z1'([Z2]_⊥Z4)_⊥Z3 ≠ ([Z1'Z2]_⊥Z4)_⊥Z3

(18) The direct relationship [Z1'Z2]_‖Z3 + [Z1'Z2]_⊥Z3 = Z1'Z2 holds for the outermost ‖ and ⊥ operators, only. For example, the following may be easily verified by writing out the matrices involved in the same manner as for (1.66) through (1.69):

(1.75)   ([Z1'Z2]_‖Z3)_‖Z4 + ([Z1'Z2]_‖Z3)_⊥Z4 = [Z1'Z2]_‖Z3

(1.76)   ([Z1'Z2]_⊥Z3)_‖Z4 + ([Z1'Z2]_⊥Z3)_⊥Z4 = [Z1'Z2]_⊥Z3 ;

however:

(1.77)   ([Z1'Z2]_‖Z3)_⊥Z4 + ([Z1'Z2]_⊥Z3)_⊥Z4 ≠ [Z1'Z2]_⊥Z4

(1.78)   ([Z1'Z2]_‖Z3)_‖Z4 + ([Z1'Z2]_⊥Z3)_‖Z4 ≠ [Z1'Z2]_‖Z4

2. Computation of matrices of the form [Z1'Z2]_⊥Z3 and [Z1'Z2]_‖Z3 by direct orthogonalization^1

The procedure given in this section is very general in that it may be used to calculate matrices of the form [Z1'Z2]_⊥Z3 in which:

(1) Z1, Z2, and Z3 contain jointly dependent or predetermined variables, or both.

(2) Variables in any of the matrices may also occur in the other two as well. (If z1 is contained in both Z1 and Z3, the row of [Z1'Z2]_⊥Z3 corresponding to z1 will be zero at the completion of the orthogonalization.
Similarily if 22 is d 0 contained in both 22 an 23, the column of [lezllz3 corresponding to will be zero at the completion of the z2 orthogonalization.) (3) 21, Z , and 2 may have less than full column rank. 2 3 In this paper, 21 and 22 will most commonly be Y, the matrix of jointly dependent variables in a system of equations; ¥A’ the matrix of jointly dependent variables in a subsystem of the equations; or +3“, the matrix of jointly dependent variables in a single equation. 23 will most commonly be X a matrix of I, instrumental variables; X”, the matrix of predetermined variables 1The orthogonalization method outlined here is very well known among mathematicians and statisticians; however, oddly enough the writer has never seen reference to its use in the field of econometrics for which it would seem to have considerable application. 56 in a single equation; or X, the matrix of predetermined variables in the system. Thus, the orthogonalization procedure outlined in this section will be most commonly used to calculate matrices of I ' ' ' Matrices of the form [2122]"23 are calculated as [zizz] ' [zi221123 Rather than use the more common formula [Zi22]123 = 2122 - ZiZ3(ZéZ3)-12522, we will calculate [ZlZZJiZ3 by direct orthog- onalization thereby eliminating the requirement that (2523)"1 when 3 23 has less than full column rank. Even if 23 has full column rank, calculation of [2i exists, i.e., thereby permitting calculation of [2122112 221123 by direct orthogonalization is advantageous from the fact that (1) fewer computer locations may be conveniently used to compute EZizzjlza’ (2) fewer arithmetic operations are required thereby saving computer time, (3) [ziZZJlZ3 may be computed to a higher degree of accuracy, and (4) rk z is calculated as a byproduct of the computational pro- 3 cedure. A computational procedure for calculating [lezllz by 3 direct orthogonalization follows:1 (1) Let Z be a TXN matrix containing all of the variables or Z . 
If desired, Z could be defined as Z = [Z1 : Z2]; however, there is no need to repeat variables common to both Z1 and Z2.^1 If Z1 = Z2, then Z = Z1 = Z2. Z may contain variables in addition to those in Z1 and Z2 if desired. Calculate the moment matrix (sums of squares and cross-products matrix) of [Z3 : Z], i.e., calculate:

(1.79)   [Z3 : Z]'[Z3 : Z] = [Z3'Z3  Z3'Z]      (N3×N3  N3×N)
                             [Z'Z3   Z'Z ]      (N×N3   N×N )

^1 Repeating variables in the Z matrix causes no computational difficulty.

(2) Do elementary row operations on the rows of the matrix until the first N3 columns are reduced to zeros below the diagonal. (It doesn't matter whether the diagonal elements are set to 1 or not.) This is equivalent to starting a forward solution of the Doolittle inversion procedure but stopping after the N3th row. The above matrix (1.79) will have become

(1.80)   [A11  A12       ]      (N3×N3  N3×N)
         [0    [Z'Z]_⊥Z3 ]      (N×N3   N×N )

where A11 contains zeros below the diagonal and the results of the elementary row operations on and above the diagonal. A12 merely contains the results of the elementary row operations.

(3) [Z1'Z2]_⊥Z3 is a submatrix occupying the same elements of [Z'Z]_⊥Z3 as Z1'Z2 occupied of Z'Z.

To increase accuracy, it is advisable to rearrange the first N3 rows and columns at each step so that the largest diagonal element from among the remaining diagonal elements of the Z3'Z3 matrix is used as the pivot at each step. (This will not affect a row or a column of [Z'Z]_⊥Z3; therefore, there is no requirement that track be kept of which rows and columns are switched. On the other hand, the information as to which diagonal elements have served as pivots can be used to derive a minimum subset of predetermined variables spanning the space of the columns of Z3.)
If the largest diagonal element becomes smaller than a preset or precalculated value, ε > 0, the procedure is stopped, since all of the values to be reduced to zero will already be within ε of zero and [Z'Z]_⊥Z3 will already be the moment matrix of the part of Z orthogonal to Z3.^1 The number of columns of Z3 already operated on before the largest remaining diagonal element became less than ε is the rank of Z3. The predetermined variables corresponding to the diagonal elements used as pivots constitute a basis spanning the same space as the columns of Z3. Each of the remaining predetermined variables may be expressed as a linear combination of this set of rk Z3 variables.^2

Since (1) no use will be made of the matrices A11 and

^1 It is noted further on that only the triangular part of [Z3 : Z]'[Z3 : Z] need be formed and operated on; hence, [Z1'Z2]_⊥Z3 is extracted from a triangular matrix representing the symmetric matrix [Z'Z]_⊥Z3.

^2 The procedure outlined above also provides the starting point of a procedure for getting a set of least squares coefficients for one or more equations even if the matrix of independent variables has less than full column rank. Let Z = Z3 Π' + V [with dimensions T×N = (T×N3)(N3×N) + (T×N)] be a set of equations for which a set of least squares coefficients is desired. Assume that rk Z3 = N3*. Then at the completion of the orthogonalization procedure outlined above [i.e., (1.80)], N3* diagonal elements will have been selected and the remaining diagonal elements will have become less than ε. The calculation of a set of least squares coefficients is completed by:

(1) Setting the last N3 - N3* rows of [A11 : A12] to zero. (These elements will already be approximately zero, but they will not, in general, be exactly zero due to rounding error.)

(2) Dividing each of the elements on or above the diagonal of the first N3* rows of [A11 : A12] by the diagonal element for the row. (The diagonal elements of the first N3* rows will then be 1.)
(3) Performing a back solution in the usual Doolittle manner, i.e., by reducing all elements above the diagonal elements of the first N3* columns to 0.

(4) Rearranging the first N3 rows into their original order (in terms of the Z3 matrix).

A set of least squares coefficients (Π̂') is then given by the N3×N matrix in the position originally occupied by Z3'Z in (1.79). Estimated values of Z (i.e., [Z]_‖Z3) and residuals for Z (i.e., [Z]_⊥Z3) calculated through use of the Π̂' matrix calculated in this manner will be the same as the estimated values of Z and residuals calculated through use of any of the many possible sets of least squares coefficients. (Even though the least squares coefficients are not unique, the estimated values of the dependent variable and the residuals are unique -- the same estimated values being obtained from any set of least squares coefficients.)

A12, (2) the initial matrix is symmetric, and (3) the [Z'Z]_⊥Z3 matrix is symmetric, all elements on one side of the diagonal need not be formed or operated on; that is, only a triangular matrix need be formed and all operations may be performed on this triangular matrix, thereby saving computer memory.

Verification That the Computational Procedure Produces [Z'Z]_⊥Z3^1

That the matrix labeled [Z'Z]_⊥Z3 is indeed the matrix Z'Z - Z'Z3(Z3'Z3)^-1 Z3'Z if Z3 has full column rank [hence, (Z3'Z3)^-1 exists] can be readily demonstrated as follows: Performing the above elementary row operations is equivalent to premultiplying (1.79) by a nonsingular matrix [E11 0; E21 I] such that:

   [E11  0] [Z3'Z3  Z3'Z]   [A11  A12]
   [E21  I] [Z'Z3   Z'Z ] = [0    A22]

Thus:

   E21 Z3'Z3 + Z'Z3 = 0   or   E21 = -Z'Z3(Z3'Z3)^-1

and:

   E21 Z3'Z + Z'Z = A22 .

Substituting for E21 into the last equation we get:

   -Z'Z3(Z3'Z3)^-1 Z3'Z + Z'Z = A22

or

(1.81)   A22 = Z'Z - Z'Z3(Z3'Z3)^-1 Z3'Z = [Z'Z]_⊥Z3 .

^1 The proof for Z3 of full column rank was suggested by Professor Robert L. Gustafson.
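The elimination procedure of (1.79)-(1.80), with largest-diagonal pivoting and the ε rank test, can be sketched as follows. This is an illustrative reimplementation, not the CDC 6500 routine; for clarity it operates on the full square moment matrix rather than the triangular storage described above:

```python
import numpy as np

def direct_orthogonalize(Z3Z3, Z3Z, ZZ, eps=1e-8):
    """Reduce the bordered moment matrix (1.79) by elementary row operations
    until the Z3 block is exhausted; the lower right block is then
    [Z'Z]_perp_Z3 as in (1.80), and rank(Z3) falls out as a by-product."""
    N3 = Z3Z3.shape[0]
    M = np.block([[Z3Z3, Z3Z], [Z3Z.T, ZZ]]).astype(float)
    remaining = set(range(N3))
    rank = 0
    while remaining:
        p = max(remaining, key=lambda i: M[i, i])   # largest-diagonal pivoting
        if M[p, p] < eps:                           # remaining Z3 columns are
            break                                   # within eps of dependence
        remaining.discard(p)
        rank += 1
        pivot_row = M[p].copy()
        for i in list(remaining) + list(range(N3, M.shape[0])):
            M[i] -= (M[i, p] / pivot_row[p]) * pivot_row
    return M[N3:, N3:], rank

# Demonstration with a rank-deficient Z3 (4 columns, rank 3).
rng = np.random.default_rng(3)
T = 60
Z3 = rng.standard_normal((T, 3))
Z3 = np.hstack([Z3, Z3[:, :1] + Z3[:, 1:2]])    # 4th column is a linear combination
Z = rng.standard_normal((T, 2))

ZZ_perp, rank = direct_orthogonalize(Z3.T @ Z3, Z3.T @ Z, Z.T @ Z)

# Check against the observation-matrix route, as in (1.83).
Z_perp = Z - Z3 @ np.linalg.lstsq(Z3, Z, rcond=None)[0]
agrees = np.allclose(ZZ_perp, Z_perp.T @ Z_perp)
```

Each elimination step clears one pivot column in the rows not yet pivoted, so after rk Z3 steps the lower right block equals Z'Z - Z'Z3*(Z3*'Z3*)^-1 Z3*'Z with Z3* the pivot columns, which by (1.83) is [Z'Z]_⊥Z3 even when Z3 is rank deficient.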
The same set of manipulations may also be used to show that the matrix labeled [Z'Z]_⊥Z3 is indeed that matrix even in the case of a Z3 having rank N3* < N3 (i.e., in the case of a Z3 having less than full column rank). If rk Z3 = N3*, row operations are performed on the columns corresponding to N3* of the variables in Z3 before the diagonal elements corresponding to the remaining Z3 variables become less than ε (for a suitable choice of ε). The orthogonalization stops at this point. This is equivalent to performing row operations on the following submatrix of (1.79) (letting Z3* be a submatrix of Z3 containing the variables corresponding to the N3* diagonal elements used as pivots):

(1.82)   [Z3*'Z3*  Z3*'Z]      (N3*×N3*  N3*×N)
         [Z'Z3*    Z'Z  ]      (N×N3*    N×N  )

The same derivation may now be performed on (1.82) as was performed with (1.79) -- the only difference in the intermediate matrices obtained is that Z3* will occur in place of Z3 wherever Z3 presently occurs. Thus (1.81) becomes:

(1.83)   A22 = Z'Z - Z'Z3*(Z3*'Z3*)^-1 Z3*'Z .

But this is [Z'Z]_⊥Z3 [see (1.57)]. Thus, the desired matrix is obtained even in the case of Z3 having less than full column rank.

CHAPTER II

COEFFICIENT ESTIMATION

A. Basic Double k-class Model and Summary of Methods

The basic single equation procedure presented in this paper is the double k-class model -- the model developed in this chapter.^1 Variance-covariance formulas for the double k-class model are given in Chapter III, and a method for directly imposing restrictions on direct least squares and two-stage least squares coefficients is given in Chapter IV.

In this chapter, the double k-class computational formula for the coefficients of an equation is first given, some matrices are defined, and a derivation of the double k-class formula is presented.
Specific members of the double k-class model such as direct least squares (DLS), two-stage least squares (2SLS), and limited information single equation maximum likelihood (LIML) are summarized and then presented in more detail. Variations of the double k-class model including instrumental variables techniques are presented. Finally, a discussion of selection of instruments -- especially the maximum number of instruments which will be effective -- is presented.

The μth equation in a system of equations was defined in (1.8). If we drop the subscript μ from many of the matrices,

^1 The double k-class model is given in Nagar [1962] and Theil [1961], p. 354.

we may write the equation as:

(II.1)   y  =  Y γ + X_μ β + u
        T×1  T×m m×1 T×L L×1 T×1

The computational formula for double k-class estimated coefficients can be written as:^1

^1 Of the double k-class estimators discussed, only in the case of LIML does it make no substantive difference in the estimated coefficients which jointly dependent variable is chosen as the normalizing variable. For the remaining procedures, a change in the selected normalizing variable will change the resulting coefficients by more than just a trivial division of all coefficients by the negative of the coefficient of the variable chosen as the normalizing variable. Regarding this effect, Fisher [1965], p. 604 states:

"It can be argued that limited-information maximum likelihood has the desirable property of treating all included endogenous variables in an equation symmetrically; indeed, Chow has shown that it is a natural generalization of ordinary least squares in the absence of a theoretically given normalization rule" [footnote reference deleted].
"On the other hand, such an argument seems rather weak, since normalization rules are in fact generally present in practice, each equation of the model being naturally asso- ciated with that particular endogenous variable which is determined by the decision-makers whose behavior is repre- sented by the equation. The normalization rules are in a real sense part of the specification of the model, and the model is not completely Specified unless every endogenous variable appears (at least implicitly) in exactly one equa- tion in normalized form. For example, it is not enough to have price equating supply and demand, equations should also be present which explain pure quotations by sellers and buyers and which describe the equilibrating process. (For most purposes, of course, such additional equations can remain in the back of the model builder's mind, although the rules for choosing instrumental variables given below may sometimes re- quire that they be made explicit.) "Thus, symmetry may be positively undesirable in a well- specified model where one feels relatively certain as to appropriate normalization, although it may be desirable if one wishes to remain agnostic as to appropriate normal- ization." 64 'Y'Y-kIEY'Y]LX Y' x; 1(1' y- -k ZEY' lex A. :Y9 R k = « 1’ 2 ‘3 __k ,k X'Y x'x x' 1 2 .- u u u uy (11.2) 8 where k1 and k2 are scalars which determine the particular double k-class member. X1 is the matrix of instrumental variables used to adjust the jointly dependent variables in the equation. 
XI includes all of the predetermined variables in the equation plus additional instruments--the additional instruments usually being (but not restricted to being) all or some of the additional predetermined variables in the system.1 [Y'YJLX is the moment matrix of the part of Y orthogonal 1 +4 4 1The basic double k-class method is usually given with the entire matrix of predetermined variables in the system, X, being used as the matrix of instruments, ; however, in practice some of the predetermined variables in the system are often omitted from the X matrix, predetermined variables are sometimes linearly combined (e.g., by use of principal components), etc. Rather than use the X matrix in our notation and then continually point out variations, it seems more fruitful to merely designate the matrix of variables used to adjust the jointly dependent variables as a matrix of instruments. Since the particular instruments used to adjust the jointly dependent variables have a considerable effect on the coefficients obtained it is imperative that the particular instruments used be listed when reporting results. We will return to the problem of selecting instruments in section 11.6. 65 1 ' I _ I to X . Since [Y YJLX - Y I = y' y [see (1.56)], [Y'Y]1X Y J. I xI J7x1 LXI I may be regarded as either (1) the vector of sums of cross-products of the part of Y orthogonal to X with the part of y orthogonal to I X1 or (2) the vector of sums of cross-products of the part of Y ortho- gonal to XI with y. Thus, computationally, X may be regarded as the I matrix of instruments used to adjust Y or the matrix of instruments used to adjust [y E Y]. In light of the equivalence of [Y'y]ix and I [YJIX y, (11.2) may also be written as: I I _ I I 'IP‘I I '1 (Y Y k1[Y Y]Lx Y x” Y szlx 1 1 (11.3) 6k1,k2 = y ; X'Y X'X X' r u u ‘- H -J however, for actual computations (11.2) should be used since [Y'YJ1X 1 may be computed by direct orthogonalization in the manner indicated in section I.D.2. 
^2 The orthogonalization notation used here and the concept of orthogonalization are discussed in section I.D. A method to compute [Y'Y]_⊥X_I and [Y'y]_⊥X_I by direct orthogonalization is given in section I.D.2. During the calculation of [Y'Y]_⊥X_I and [Y'y]_⊥X_I it is recommended that rk X_I be calculated and, if LIML coefficients are to be calculated, that the [+Y'+Y]_⊥X_μ matrix be saved. A method for calculating rk X_I and [+Y'+Y]_⊥X_μ as an intermediate step in the calculation of [Y'Y]_⊥X_I and [Y'y]_⊥X_I is given in appendix A.

A tentative proof that δ̂_{k1,k2} is a consistent estimator of δ, given the statistical assumptions of section I.C.3 and the assumption plim(k1 - 1) = plim(k2 - 1) = 0, is given in appendix C.^3

^3 Members of the double k-class family for which plim(k1 - 1) = plim(k2 - 1) = 0 include 2SLS, LIML, Nagar's unbiased to O(T^-1) in probability estimator (UBK), and Nagar's minimum second moment estimator (MSM). plim(k1 - 1) = plim(k2 - 1) = -1 for DLS; hence, DLS is not shown to be consistent.

(II.2) is not the most general form of the double k-class formula, since it may be desired that different sets of instrumental variables be used in the adjustment of the different jointly dependent variables. Let Y = [y1 ... ym], and let X_I^1 denote the set of instruments used to adjust y1, X_I^2 denote the set of instruments used to adjust y2, ..., X_I^m denote the set of instruments used to adjust ym, and X_I^0 denote the set of instruments used to adjust the normalizing jointly dependent variable, y. (X_I^0 may be null, i.e., it may be desired that y not be adjusted.) Then, the following matrix could be used in (II.2) in place of the [Y'Y]_⊥X_I matrix:

(II.4)   [ [y1'y1]_⊥X_I^1              [y1]_⊥X_I^1'[y2]_⊥X_I^2   ...   [y1]_⊥X_I^1'[ym]_⊥X_I^m ]
         [ ...                         ...                             ...                     ]
         [ [ym]_⊥X_I^m'[y1]_⊥X_I^1    ...                              [ym'ym]_⊥X_I^m          ]

and the following vector could be used in place of the [Y'y]_⊥X_I vector:

(II.5)   [ [y1]_⊥X_I^1'[y]_⊥X_I^0 ]
         [ ...                    ]
         [ [ym]_⊥X_I^m'[y]_⊥X_I^0 ]

A method of computing (II.4) and (II.5) by direct orthogonalization is given in appendix B.
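As an illustrative sketch of (II.2) on hypothetical data (here [Y'Y]_⊥X_I is computed from observations rather than by the direct orthogonalization the text recommends), setting k1 = k2 = 0 must reproduce DLS, and k1 = k2 = 1 reproduces the textbook 2SLS estimator:

```python
import numpy as np

def double_k_class(y, Y, X_mu, X_I, k1, k2):
    """Double k-class coefficients as in (II.2); returns [gamma; beta]."""
    W = np.column_stack([Y, y])
    W_perp = W - X_I @ np.linalg.lstsq(X_I, W, rcond=None)[0]
    YY_perp = W_perp[:, :-1].T @ W_perp[:, :-1]          # [Y'Y]_perp_XI
    Yy_perp = W_perp[:, :-1].T @ W_perp[:, -1]           # [Y'y]_perp_XI
    A = np.block([[Y.T @ Y - k1 * YY_perp, Y.T @ X_mu],
                  [X_mu.T @ Y,             X_mu.T @ X_mu]])
    b = np.concatenate([Y.T @ y - k2 * Yy_perp, X_mu.T @ y])
    return np.linalg.solve(A, b)

rng = np.random.default_rng(4)
T = 200
X = rng.standard_normal((T, 4))                          # predetermined variables
Y = X @ rng.standard_normal((4, 2)) + 0.1 * rng.standard_normal((T, 2))
y = Y @ [0.5, -0.3] + X[:, :2] @ [1.0, 2.0] + 0.1 * rng.standard_normal(T)
X_mu, X_I = X[:, :2], X                                  # instruments: all of X

# k1 = k2 = 0: direct least squares of y on [Y : X_mu].
dls = double_k_class(y, Y, X_mu, X_I, 0.0, 0.0)
ols = np.linalg.lstsq(np.column_stack([Y, X_mu]), y, rcond=None)[0]
dls_matches = np.allclose(dls, ols)

# k1 = k2 = 1: two-stage least squares via the adjusted-regressor formula.
tsls = double_k_class(y, Y, X_mu, X_I, 1.0, 1.0)
Zm = np.column_stack([Y, X_mu])
Zm_hat = X_I @ np.linalg.lstsq(X_I, Zm, rcond=None)[0]
tsls_matches = np.allclose(tsls, np.linalg.solve(Zm_hat.T @ Zm, Zm_hat.T @ y))
```

The k1 = k2 = 1 equivalence follows because Y'Y - [Y'Y]_⊥X_I = [Y'Y]_‖X_I and Y'y - [Y'y]_⊥X_I = [Y]_‖X_I'y, so (II.2) collapses to the usual 2SLS normal equations.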
The suggestion that separate instruments be used for the adjustment of each jointly dependent variable for 2SLS is made by Franklin Fisher.¹ He points out that such a procedure introduces certain problems of inconsistency, depending on the assumptions of the model²; however, he then argues effectively as to why the inconsistency introduced is likely to be low.

We will not consider properties of the double k-class estimator if separate instruments are used in the adjustment of each jointly dependent variable. We merely note that if it is desired that a separate set of instruments be used in the adjustment of each jointly dependent variable, a computational method for doing this is given in appendix B. In section II.G we will return to the problem of selecting instruments and note that, due to the use of direct orthogonalization, it may be feasible to merely note the separate instruments which the researcher would prefer to use in the adjustment of each jointly dependent variable and then use all of these instruments in the X_I matrix; that is, adjust all of the jointly dependent variables in the equation by the same set of instruments. If this approach is taken, computational form (II.2) is used.

If 2SLS is the basic computational method used, it is often suggested that asymptotic efficiency will be increased

    ¹Fisher [1965], pp. 602-603, 625-633. Fisher points out (p. 603) that "if different predetermined variables are used in replacing each included endogenous variable (as suggested below), it is not clear how limited-information maximum likelihood carries over to such cases."

    ²Fisher [1965], p. 631.
in the estimation of δ if each explanatory jointly dependent variable is adjusted only by the predetermined variables in the reduced form equation in which this jointly dependent variable occurs, rather than adjusting all explanatory jointly dependent variables by the entire set of predetermined variables in the system.¹ (Assuming that not all predetermined variables occur with non-zero coefficients in each of the reduced form equations being estimated,) this leads, of course, to the use of (II.4) in place of [Y'Y]_{⊥X_I} and of (II.5) in place of [Y'y]_{⊥X_I} (except that y is usually not adjusted).

    ¹Predetermined variables will tend to occur with readily recognized zero coefficients in reduced form equations in systems which are recursive with respect to the coefficients. 2SLS (modified by II.4 and II.5) is still more appropriate than recursive system estimation methods if it is not assumed that the off-diagonal elements of the disturbance variance-covariance matrix are zero.

Derivation of (II.2)²

If the coefficients of (II.1) are estimated by DLS, then, due to the fact that some of the explanatory variables are contemporaneously correlated with the disturbance of the equation, the DLS coefficients do not possess even the property of consistency.³ To take account of the occurrence of explanatory variables which are contemporaneously correlated with the disturbance, let us first rewrite the equation in an adjusted form and then apply DLS to the adjusted equation.

    ²The derivation in this section parallels the derivation of the single k-class method by Chow [1964], pp. 546-548.

    ³Except possibly in special recursive models. See Fisher [1965], pp. 592, 593.
As a first step let us divide the variation of each variable in Y and y into two parts -- that part of the variation which is in the space spanned by the instruments and that part which is orthogonal to the instruments, i.e., we will divide Y and y into:

(II.6)  Y = Y_{∥X_I} + Y_{⊥X_I} ,

(II.7)  y = y_{∥X_I} + y_{⊥X_I} .

Since the instruments are assumed to be asymptotically uncorrelated with the disturbances of all equations, the columns of Y_{∥X_I} and y_{∥X_I}, which lie in the space spanned by X_I, are also asymptotically uncorrelated with the disturbances of all equations, and in particular asymptotically uncorrelated with the disturbance of the μth equation, u. To adjust Y and y for the part asymptotically correlated with the disturbance, let us subtract a constant, g₁, times Y_{⊥X_I} from Y and (recognizing the special role played by the normalizing variable) subtract another constant, g₂, times y_{⊥X_I} from y.¹ If g₁Y_{⊥X_I} and g₂y_{⊥X_I} are subtracted from Y and y respectively, equation (II.1) becomes:

(II.8)  [y - g₂y_{⊥X_I}] = [Y - g₁Y_{⊥X_I}]γ + X_μβ + [u + g₁Y_{⊥X_I}γ - g₂y_{⊥X_I}]

or

(II.9)  [y - g₂y_{⊥X_I}] = [Y - g₁Y_{⊥X_I} ⋮ X_μ]δ + [u + g₁Y_{⊥X_I}γ - g₂y_{⊥X_I}] .

    ¹It might be argued that it is unnecessary to adjust y, the normalizing jointly dependent variable. If y is not adjusted, one obtains Theil's h-class, which contains DLS and 2SLS as particular cases. Some readers may consider the adjustment of y more justified if this adjustment is regarded as the result of a two step process as follows: First, g₁Y_{⊥X_I} is subtracted from Y, giving us:

    y = [Y - g₁Y_{⊥X_I}]γ + X_μβ + [u + g₁Y_{⊥X_I}γ] ,

    and then g₂y_{⊥X_I} is subtracted from the disturbance to (possibly) make the resulting disturbance more homogeneous or (possibly) to reduce the asymptotic correlation between the disturbance and the explanatory variables, [Y - g₁Y_{⊥X_I} ⋮ X_μ], in the equation. If this approach is used, we obtain:

    y = [Y - g₁Y_{⊥X_I}]γ + X_μβ + g₂y_{⊥X_I} + [u + g₁Y_{⊥X_I}γ - g₂y_{⊥X_I}] ,

    which may be rewritten as (II.8). If it is felt that the same adjustment should be made to all jointly dependent variables, then g₁ = g₂ and the single k-class estimators are obtained, of which LIML is an example.

Applying DLS to the adjusted equation (II.9), the double k-class estimator of δ may be written as:

(II.10)  \hat{\delta}_{k_1,k_2} = \{[Y - g₁Y_{⊥X_I} ⋮ X_μ]'[Y - g₁Y_{⊥X_I} ⋮ X_μ]\}^{-1}[Y - g₁Y_{⊥X_I} ⋮ X_μ]'[y - g₂y_{⊥X_I}]

or

(II.11)  \hat{\delta}_{k_1,k_2} = \begin{bmatrix} [Y - g₁Y_{⊥X_I}]'[Y - g₁Y_{⊥X_I}] & [Y - g₁Y_{⊥X_I}]'X_μ \\ X_μ'[Y - g₁Y_{⊥X_I}] & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y - g₁Y_{⊥X_I}]'[y - g₂y_{⊥X_I}] \\ X_μ'[y - g₂y_{⊥X_I}] \end{bmatrix} .

However, [Y - g₁Y_{⊥X_I}]'X_μ = Y'X_μ - g₁Y'_{⊥X_I}X_μ = Y'X_μ - g₁·0 = Y'X_μ,¹ and similarly X_μ'[y - g₂y_{⊥X_I}] = X_μ'y. Also, since Y'_{⊥X_I}Y = Y'_{⊥X_I}Y_{⊥X_I} = [Y'Y]_{⊥X_I} and Y'_{⊥X_I}y = [Y'y]_{⊥X_I} [see (I.56)],

[Y - g₁Y_{⊥X_I}]'[Y - g₁Y_{⊥X_I}] = Y'Y - 2g₁[Y'Y]_{⊥X_I} + g₁²[Y'Y]_{⊥X_I} = Y'Y - (2g₁ - g₁²)[Y'Y]_{⊥X_I}

and

[Y - g₁Y_{⊥X_I}]'[y - g₂y_{⊥X_I}] = Y'y - (g₁ + g₂ - g₁g₂)[Y'y]_{⊥X_I} ,

and (II.11) may be rewritten as:

(II.12)  \hat{\delta}_{k_1,k_2} = \begin{bmatrix} Y'Y - (2g₁ - g₁²)[Y'Y]_{⊥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y - (g₁ + g₂ - g₁g₂)[Y'y]_{⊥X_I} \\ X_μ'y \end{bmatrix} .

The basic estimating formula (II.2) is obtained from (II.12) by letting k₁ = 2g₁ - g₁² and k₂ = g₁ + g₂ - g₁g₂.

    ¹Y'_{⊥X_I}X_μ = 0, since X_μ is in the space spanned by X_I and Y_{⊥X_I} is orthogonal to X_I. See (I.38) and (I.53).

Summary of Methods. If y is left unadjusted (i.e., g₂ is set to zero) and if g₁ is set to 1 - h, Theil's h-class model is obtained as a particular case of the above double k-class model, since k₁ becomes 2g₁ - g₁² = 2(1 - h) - (1 - h)² = 1 - h² and k₂ becomes g₁ + g₂ - g₁g₂ = 1 - h.¹ Theil's h-class formula may be written:

(II.13)  \hat{\delta}_h = \begin{bmatrix} Y'Y - (1 - h²)[Y'Y]_{⊥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y - (1 - h)[Y'y]_{⊥X_I} \\ X_μ'y \end{bmatrix} .

As with the formula for any double k-class member, Y'_{⊥X_I}y may be substituted for [Y'y]_{⊥X_I} in the right hand side vector, since [Y'y]_{⊥X_I} = Y'_{⊥X_I}y.² If the same basic adjustment is made to all jointly dependent variables (i.e., if g₁ and g₂ are restricted to the same scalar value g), the single k-class model with k = 2g - g² is obtained.

    ¹The h-class model may be found in Theil [1961], pp. 353-354 and Nagar [1962], p. 171.

    ²[Y'y]_{⊥X_I} = Y'_{⊥X_I}y_{⊥X_I} = Y'_{⊥X_I}[y - y_{∥X_I}] = Y'_{⊥X_I}y - Y'_{⊥X_I}y_{∥X_I} = Y'_{⊥X_I}y - 0 = Y'_{⊥X_I}y.
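The mapping k₁ = 2g₁ - g₁², k₂ = g₁ + g₂ - g₁g₂ just derived can be checked numerically: DLS applied to the adjusted equation (II.9) and the double k-class formula (II.2) should give identical coefficients. A sketch with assumed (arbitrary) data and adjustment constants:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 150
XI = rng.normal(size=(T, 5))           # instruments
Xmu = XI[:, :2]                         # predetermined variables in the equation
Y = XI @ rng.normal(size=(5, 2)) + rng.normal(size=(T, 2))
y = Y @ [0.3, -0.2] + Xmu @ [1.0, 0.5] + rng.normal(size=T)

resid = lambda M: M - XI @ np.linalg.lstsq(XI, M, rcond=None)[0]
Yp, yp = resid(Y), resid(y)

g1, g2 = 0.4, 0.7                       # arbitrary adjustment constants
# DLS applied to the adjusted equation (II.9)
Zt = np.column_stack([Y - g1 * Yp, Xmu])
d_adj = np.linalg.lstsq(Zt, y - g2 * yp, rcond=None)[0]

# Double k-class formula (II.2) with k1 = 2 g1 - g1^2, k2 = g1 + g2 - g1 g2
k1, k2 = 2 * g1 - g1 ** 2, g1 + g2 - g1 * g2
A = np.block([[Y.T @ Y - k1 * (Y.T @ Yp), Y.T @ Xmu],
              [Xmu.T @ Y, Xmu.T @ Xmu]])
b = np.concatenate([Y.T @ y - k2 * (Y.T @ yp), Xmu.T @ y])
d_k = np.linalg.solve(A, b)
assert np.allclose(d_adj, d_k)
```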
The single k-class formula may be written:¹

(II.14)  \hat{\delta}_k = \begin{bmatrix} Y'Y - k[Y'Y]_{⊥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y - k[Y'y]_{⊥X_I} \\ X_μ'y \end{bmatrix} .

If both g₁ and g₂ are set to zero, the DLS estimation procedure with k₁ = k₂ = 0 is obtained. DLS may be regarded as either a k-class or an h-class member. If g₁ is set to 1, even though g₂ is taken as any value, we have k₁ = 2g₁ - g₁² = 1 and k₂ = g₁ + g₂ - g₁g₂ = 1 + g₂ - g₂ = 1; therefore, setting g₁ to 1 automatically gives k₁ = k₂ = 1 -- the 2SLS estimator. Thus, 2SLS may be regarded as the single k-class member with k = 1 or the h-class member with h = 0.

The LIML estimator is the particular single k-class member in which k equals the smallest eigenvalue (characteristic root) of the matrix [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ}.² This eigenvalue will be greater than 1 except for the particular case rk X_I = n,³ in which case the eigenvalue will be 1. An eigenvalue greater than 1 implies a g which must be expressed as a complex number containing an imaginary part (k = 2g - g² gives g = 1 ± √(1 - k), which is complex for k > 1). The eigenvalue itself will, of course, be real.

    ¹The k-class model may be found in Theil [1961], pp. 231-237.

    ²The LIML estimator is discussed in greater detail in section II.C.1.

    ³n = m + L is the number of "explanatory" variables in the equation.

Two other particular single k-class members which will be considered further on are members suggested by Nagar, which we will refer to as unbiased to O(T⁻¹) in probability k (UBK) and minimum second moment (MSM).

B. Methods Which are Both h-class and Single k-class

1. Direct least squares (DLS)

As indicated earlier, direct least squares (DLS) coefficients may be obtained by setting g₁ and g₂ to 0, which implies a k of 0 and an h of 1. (II.14) becomes:

(II.15)  \hat{\delta}_{DLS} = \begin{bmatrix} Y'Y & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y \\ X_μ'y \end{bmatrix} .

In estimating by DLS, all of the jointly dependent variables except the normalizing variable are treated the same as the predetermined variables.
If more than one jointly dependent variable occurs in the equation, the DLS estimated coefficients are not even consistent, since jointly dependent variables which are not even asymptotically uncorrelated with the disturbance are used as independent variables. Even in this case, however, DLS coefficients have some desirable properties, such as small dispersion of coefficients about their expected values and finite sample coefficient variance-covariance matrices.¹

    ¹See Fisher [1965], pp. 591-592, 604-605.

2. Two-stage least squares (2SLS)¹

As indicated earlier, if g₁ equals 1, g₂ may be anything without changing the resulting coefficients, which we call the two-stage least squares (2SLS) coefficients. If g₂ is considered to be 0, 2SLS may be considered the h-class member with h = 0. If g₂ is considered to be 1, 2SLS may be considered the k-class member with k = 1. The 2SLS estimating formula becomes:²

(II.16)  \hat{\delta}_{2SLS} = \begin{bmatrix} Y'Y - [Y'Y]_{⊥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'y - [Y'y]_{⊥X_I} \\ X_μ'y \end{bmatrix} .

Since [Y'Y]_{∥X_I} = Y'Y - [Y'Y]_{⊥X_I} and [Y'y]_{∥X_I} = Y'y - [Y'y]_{⊥X_I}, (II.16) may also be written as:

(II.17)  \hat{\delta}_{2SLS} = \begin{bmatrix} [Y'Y]_{∥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y'y]_{∥X_I} \\ X_μ'y \end{bmatrix} .

    ¹Basic references on 2SLS include Theil [1961], pp. 228-240, 336-344, and Basmann [1957]. 2SLS is referred to as the generalized classical linear (GCL) estimator by Basmann. (Basmann derived 2SLS at approximately the same time as but independently of Theil.) More recently, Basmann extended his GCL estimator to a partial system or full system estimator.

    ²Theil and Basmann used the X matrix as the matrix of instruments, X_I.

Derivation of 2SLS as a Two-Stage Process³

As a first stage, let us calculate the predicted value of each explanatory jointly dependent variable (each variable in Y) by DLS, using the variables in X_I as explanatory variables in the DLS calculations.

    ³Theil [1961], pp. 228-230 derives 2SLS as a two-stage procedure.
The resulting matrix of predicted values of the variables in Y (often denoted Ŷ) is exactly the matrix Y_{∥X_I}, as noted in section I.D.1. Since X_I is assumed asymptotically uncorrelated with u, Y_{∥X_I} will be asymptotically uncorrelated with u also.

As a second stage, let us substitute Y_{∥X_I} for Y in (II.1) and estimate the vector of coefficients, δ, by DLS, i.e., let us apply DLS using y as the dependent variable and the variables in the matrix [Y_{∥X_I} ⋮ X_μ] as explanatory variables. We get:

(II.18)  \hat{\delta}_{2nd\ stage} = \begin{bmatrix} Y'_{∥X_I}Y_{∥X_I} & Y'_{∥X_I}X_μ \\ X_μ'Y_{∥X_I} & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'_{∥X_I}y \\ X_μ'y \end{bmatrix} .

However, Y'_{∥X_I}Y_{∥X_I} = [Y'Y]_{∥X_I} (by definition of [Y'Y]_{∥X_I}), Y'_{∥X_I}y = [Y'y]_{∥X_I} [see (I.54)], and, since all variables in X_μ are also contained in X_I, Y'_{∥X_I}X_μ = [Y'X_μ]_{∥X_I} = Y'X_μ [see (I.54) and (I.40)]; hence, (II.18) is equivalent to (II.16) and (II.17). Although δ̂_2SLS calculated in two steps (i.e., by actually calculating Y_{∥X_I}) is algebraically the same as the computational formula for δ̂_2SLS given in (II.16), the computational formula given in (II.16) should result in less rounding error, since the computations are more direct.

Derivation of 2SLS as an Instrumental Variables Estimator Technique

In section II.D.2, 2SLS is derived as an instrumental variables estimator technique.

Derivation of 2SLS as an Application of Generalized Least Squares

In section IV.D, 2SLS is derived as an application of Aitken's generalized least squares.

Derivation of 2SLS as the Least Variance Difference

Consider linearly combining the jointly dependent variables in the equation into a single jointly dependent variable, y*, by postmultiplying these variables by a vector of coefficients, i.e., consider the calculation of y* = [y ⋮ Y]\begin{bmatrix} 1 \\ -γ* \end{bmatrix} .
Let the residual sum of squares from regressing y* on the predetermined variables in the equation be denoted û'û (i.e., û'û = [y*'y*]_{⊥X_μ}) and the residual sum of squares from regressing y* on all of the instruments be denoted v̂'v̂ (i.e., v̂'v̂ = [y*'y*]_{⊥X_I}). Then û'û - v̂'v̂ will be minimized if γ̂_2SLS is used as γ*.¹ Also, if γ̂_2SLS is used as γ*, the DLS coefficients obtained by regressing y* on the predetermined variables in the equation will be β̂_2SLS.²

    ¹Basmann [1960a], pp. 100-102.

    ²Ibid.

The above least variance difference (LVD) property is intuitively desirable, since it causes the jointly dependent variables to be linearly combined such that the instruments which are specified a priori as being outside the equation add as little as possible to the explanation of the combined dependent variable (y*). Such an intuitively desirable property can, however, be easily over-emphasized. The LVR (least variance ratio) property of LIML (limited information single equation maximum likelihood) estimates would seem to be as appealing. LVR estimates may be derived in the same manner as LVD estimates, except that instead of selecting γ* to minimize û'û - v̂'v̂, γ* is selected to minimize û'û/v̂'v̂, which is equivalent to minimizing (û'û - v̂'v̂)/v̂'v̂. If γ̂_LIML is used as γ*, then û'û/v̂'v̂ will be a minimum.¹ Also, β̂_LIML will be the DLS coefficients obtained by regressing y* (calculated by using γ̂_LIML as γ*) on the predetermined variables in the equation.²

    ¹û'û/v̂'v̂ = [û'û/v̂'v̂] - 1 + 1 = [(û'û - v̂'v̂)/v̂'v̂] + 1. Since û'û/v̂'v̂ and (û'û - v̂'v̂)/v̂'v̂ differ by the additive constant 1, they are minimized by the same values of γ*. Also, k_LIML = û'û/v̂'v̂. See Koopmans and Hood [1953], pp. 166-169.

    ²Ibid.

C. Additional Single k-class Methods

1.
Limited information single equation maximum likelihood (LIML)¹

If k_LIML is calculated as the smallest eigenvalue (characteristic root) of the matrix [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ}, then the limited information single equation maximum likelihood (LIML) coefficients may be calculated by the usual single k-class formula.² (As noted earlier, ₊Y = [y ⋮ Y]; hence, [₊Y'₊Y]_{⊥X_I} and [₊Y'₊Y]_{⊥X_μ} are the moment matrices of the parts of the jointly dependent variables in the equation orthogonal to the instruments and to the predetermined variables in the equation, respectively. Computational formulas for these matrices are given in section I.D.2 and appendix B.)³

If the matrix X is used as the matrix of instruments, X_I, and it is assumed that the matrix of disturbances of the system has the multivariate normal distribution, then the LIML estimates are maximum likelihood estimates given the limited amount of information used (the jointly dependent variables in the equation, the predetermined variables in the system, and which predetermined variables have zero coefficients in the equation).¹ If the matrix X is not used as the matrix of instruments, then the resulting coefficients are not, strictly speaking, the usual limited information maximum likelihood coefficients, since the predetermined variables in a matrix of instruments have been substituted for the predetermined variables in the system.

    ¹Basic references for the LIML estimator are Anderson and Rubin [1949], Koopmans and Hood [1953], pp. 162-170, and Chernoff and Divinsky [1953], pp. 240-246.

    ²See Theil [1961], p. 231. The computational equivalence of k_LIML as defined above and the more commonly expressed formulas for the calculation of k_LIML is noted further on in this section. That the matrix [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ} is not a symmetric matrix must be taken into account in extracting k_LIML.

    ³[₊Y'₊Y]_{⊥X_μ} may be calculated as an intermediate step of the calculation of [₊Y'₊Y]_{⊥X_I} in the manner noted in appendix A.
LIML estimation utilizes the same information as 2SLS, and the LIML coefficients have the same asymptotic coefficient variance-covariance matrix as the 2SLS coefficients.² As noted at the end of section II.B.2, LIML coefficients may be derived as the coefficients with the least variance ratio (LVR).

If rk X_I = n, then k_LIML = 1 = k_2SLS; therefore, the coefficients for LIML and 2SLS coincide.³ If rk X_I > n, then k_LIML > 1.⁴ If rk X_I < n, a singular matrix is encountered during estimation (a unique solution does not exist).

    ¹Koopmans and Hood [1953], pp. 166-170.

    ²Theil [1961], p. 232.

    ³Theil [1961], p. 232. As noted in section II.B.2, k_LIML = LVR = û'û/v̂'v̂, where û'û and v̂'v̂ are as defined in section II.B.2. If rk X_I = n, then there are effectively only m = n - L instruments in addition to the predetermined variables in the equation. The m + 1 jointly dependent variables may, therefore, be combined into a single jointly dependent variable in such a way that û'û = v̂'v̂. See section II.D for additional detail showing the equivalence of 2SLS and LIML for the case rk X_I = n.

    ⁴If rk X_I > n, then û'û ≠ v̂'v̂, since there are effectively more than m instruments in addition to the predetermined variables in the equation; hence, k_LIML = û'û/v̂'v̂ > 1. See Koopmans and Hood [1953], pp. 171-175.

Usual LIML Formulas

The LIML formula given above is not the most common formula for LIML. To see the relationship to more commonly quoted LIML formulas, we will first note some relationships between eigenvalues (characteristic roots) and eigenvectors. Let A and B be n×n symmetric positive definite matrices.¹ Then the determinantal² equation

(II.19)  det(A - c_iB) = 0

has n solutions, c₁ ... c_n, of which some of the c_i (the eigenvalues) may be duplicates (i.e., there may be only m distinct roots, with m ≤ n).

    ¹The matrix B in this section is any n×n positive definite symmetric matrix -- not the matrix of coefficients of predetermined variables as in other sections of this paper.

    ²det denotes determinant.
Since a determinantal equation is not changed by multiplying both sides by a constant, the determinantal equation

(II.20)  det(B^{-1})·det(A - c_iB) = det(B^{-1}A - c_iI) = 0

has the same eigenvalue solutions (the same c_i) as det(A - c_iB) = 0. (II.19) may also be converted to another problem -- the calculation of the eigenvalues of the equation

(II.21)  (A - c_iB)d_i = 0 ,

where associated with each eigenvalue, c_i, is an n×1 eigenvector, d_i. Premultiplying (II.21) by d_i' we have:

(II.22)  d_i'(A - c_iB)d_i = d_i'Ad_i - c_i d_i'Bd_i = 0

or

(II.23)  d_i'Ad_i = c_i d_i'Bd_i

or

(II.24)  c_i = (d_i'Ad_i)/(d_i'Bd_i) .

That is, each eigenvalue, c_i, must meet relationship (II.24) with its associated eigenvector, d_i. Similarly, from either (II.20) or (II.21) we can derive that

(II.25)  (B^{-1}A - c_iI)d_i = 0 ;

thus,

(II.26)  d_i'(B^{-1}A - c_iI)d_i = 0

and

(II.27)  c_i = (d_i'(B^{-1}A)d_i)/(d_i'd_i) .

For LIML estimation A = [₊Y'₊Y]_{⊥X_μ} and B = [₊Y'₊Y]_{⊥X_I}; the minimum c_i from any of the above formulations becomes k_LIML, and the corresponding d_i becomes ₊γ̂_LIML.¹ β̂_LIML may be calculated as β̂_LIML = -[X_μ'X_μ]^{-1}X_μ'₊Y ₊γ̂_LIML. In the formula which is given in this paper, the smallest eigenvalue of [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ} becomes k_LIML, which is substituted into the general k-class formula to calculate δ̂_LIML.

Another equivalent formulation is to use [₊Y'₊Y]_{⊥X_μ} - [₊Y'₊Y]_{⊥X_I} as A and [₊Y'₊Y]_{⊥X_I} as B. (This is the formulation of Anderson and Rubin [1949] and Chernoff and Divinsky [1953].) The minimum c_i becomes k_LIML - 1, but the associated eigenvector d_i is still ₊γ̂_LIML.

    ¹Koopmans and Hood [1953], pp. 170-173.
The eigenvalues of A^{-1}B are the reciprocals of the eigenvalues of B^{-1}A; hence, instead of extracting the smallest eigenvalue of [₊Y'₊Y]_{⊥X_I}^{-1}[₊Y'₊Y]_{⊥X_μ}, the largest eigenvalue of [₊Y'₊Y]_{⊥X_μ}^{-1}[₊Y'₊Y]_{⊥X_I} may be extracted and then k_LIML calculated as 1 divided by this eigenvalue.

Neither A^{-1}B nor B^{-1}A is symmetric; hence, a non-symmetric eigenvalue computer subroutine is required to extract the desired eigenvalues. Computational procedures for extracting eigenvalues of a matrix of the special form A^{-1}B, with A positive definite and B positive semi-definite, are available, or a computational scheme for more general non-symmetric matrices may be used.

Extraction of an eigenvalue as k_LIML and substitution of k_LIML into the usual k-class formula in order to calculate the LIML coefficients makes it unnecessary to calculate the corresponding eigenvector while the root is calculated, thereby saving computer time. Use of the eigenvector corresponding to k_LIML as ₊γ̂_LIML and then calculating β̂_LIML as -[X_μ'X_μ]^{-1}X_μ'₊Y ₊γ̂_LIML also requires special programming, thereby again giving incentive to calculate LIML coefficients through use of the k-class formula. (In addition, as is noted in Chapter III, calculation of the estimated coefficient variance-covariance matrix does not require special programming if the general k-class coefficient variance-covariance formula is used.)

The two smallest eigenvalues provide information. The smallest is used as k_LIML, and the first two smallest eigenvalues may be used in a test of identifiability of a structural equation.¹ The closeness of the second smallest eigenvalue to the smallest eigenvalue gives an indication of the "explosiveness" of the resulting LIML coefficients, as noted in Klein and Nakamura [1962], pp. 294-295.
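The eigenvalue computation can be illustrated with a modern substitute for the special-form routines mentioned above: because A and B are symmetric and B is positive definite, whitening by the Cholesky factor of B reduces A d = c B d to an ordinary symmetric eigenproblem, so no general non-symmetric routine is needed. A sketch with simulated data (not the dissertation's procedure; all names are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 300
XI = rng.normal(size=(T, 6))            # instruments; here rk X_I > n
Xmu = XI[:, :2]                          # predetermined variables in the equation
Y = XI @ rng.normal(size=(6, 2)) + rng.normal(size=(T, 2))
y = Y @ [0.6, -0.4] + Xmu @ [1.0, 0.5] + rng.normal(size=T)
Yplus = np.column_stack([y, Y])          # +Y = [y : Y]

def perp_moments(W, V):
    """[V'V]_{perp W}: moment matrix of the part of V orthogonal to W."""
    R = V - W @ np.linalg.lstsq(W, V, rcond=None)[0]
    return R.T @ R

A = perp_moments(Xmu, Yplus)             # [+Y'+Y]_{perp X_mu}
B = perp_moments(XI, Yplus)              # [+Y'+Y]_{perp X_I}

# A d = c B d with A, B symmetric and B positive definite: whiten with the
# Cholesky factor of B, then use an ordinary symmetric eigenvalue routine.
L = np.linalg.cholesky(B)
M = np.linalg.solve(L, np.linalg.solve(L, A).T)   # L^{-1} A L^{-T} (A symmetric)
k_liml = np.linalg.eigvalsh(M)[0]                  # smallest root = k_LIML

# Substitute k_LIML into the single k-class formula (II.14).
resid = lambda V: V - XI @ np.linalg.lstsq(XI, V, rcond=None)[0]
Yp, yp = resid(Y), resid(y)
Ak = np.block([[Y.T @ Y - k_liml * (Y.T @ Yp), Y.T @ Xmu],
               [Xmu.T @ Y, Xmu.T @ Xmu]])
bk = np.concatenate([Y.T @ y - k_liml * (Y.T @ yp), Xmu.T @ y])
delta_liml = np.linalg.solve(Ak, bk)
```

Since the equation here is over-identified (rk X_I > n), k_LIML comes out strictly greater than 1, as the text predicts.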
If Λ > n, no solution exists.¹ For a solution to exist, Λ - n of the equations given in (II.31) must be thrown away (i.e., Λ - n of the rows of [X'X]^{-1} must be deleted), thereby ignoring information and making the actual estimates obtained depend on the particular rows of [X'X]^{-1} deleted. Equivalently, Λ - n of the predetermined variables in the system can be ignored, thereby forcing the number of predetermined variables used in the estimation to equal n. This gives rise to a different set of ILS coefficients for each different set of predetermined variables ignored.

    ¹Assuming X has full column rank, an equation in which Λ = n is just-identified by the counting rule for identification, and an equation in which Λ > n is over-identified by the counting rule for identification. The case of just-identification is also referred to as the case of "minimum requisite information" and the case of over-identification as the case of "extra information." Λ ≥ n is a necessary but not sufficient condition for the population coefficients to be identifiable. See Koopmans and Hood [1953], pp. 135-142.

Further Derivation Assuming Λ = n

Let us assume that for the μth equation, Λ = n. Given our assumption that X has full column rank, this implies that Z_μ'X is square. Let us further assume that Z_μ'X is non-singular. A set of simultaneous equations may be premultiplied by a non-singular matrix without changing the solution. Premultiplying (II.31) by Z_μ'X we get:

(II.32)  Z_μ'X[X'X]^{-1}X'Y γ̂_ILS + Z_μ'X_μ β̂_ILS = Z_μ'X[X'X]^{-1}X'y .

Since

Z_μ'X[X'X]^{-1}X'Y = [Z_μ'Y]_{∥X} = \begin{bmatrix} [Y'Y]_{∥X} \\ X_μ'Y \end{bmatrix}

and

Z_μ'X[X'X]^{-1}X'y = [Z_μ'y]_{∥X} = \begin{bmatrix} [Y'y]_{∥X} \\ X_μ'y \end{bmatrix} ,

(II.32) may be rewritten as:¹

(II.33)  \begin{bmatrix} [Y'Y]_{∥X} \\ X_μ'Y \end{bmatrix} γ̂_ILS + \begin{bmatrix} Y'X_μ \\ X_μ'X_μ \end{bmatrix} β̂_ILS = \begin{bmatrix} [Y'y]_{∥X} \\ X_μ'y \end{bmatrix}

or

(II.34)  \begin{bmatrix} [Y'Y]_{∥X} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix} δ̂_ILS = \begin{bmatrix} [Y'y]_{∥X} \\ X_μ'y \end{bmatrix} .
Therefore,

(II.35)  δ̂_ILS = \begin{bmatrix} [Y'Y]_{∥X} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y'y]_{∥X} \\ X_μ'y \end{bmatrix} .

However, (II.35) is the 2SLS computational formula (II.17) in which X is used as the matrix of instruments, X_I. Thus, ILS coefficients may be computed through use of the 2SLS formula.

    ¹[X_μ'Y]_{∥X} = X_μ'Y and [X_μ'y]_{∥X} = X_μ'y, since the variables in X_μ are contained in X. See (I.42).

2. The instrumental variables estimator (IV)

In our general double k-class methods, we have been adjusting the jointly dependent variables in the equation by a matrix of instrumental variables. In this section, we will consider the calculation of coefficients by a computational method (fairly widely used before the k-class methods such as LIML and 2SLS were devised) called the instrumental variables (IV) estimating method.¹ In this method, the same number of instruments as there are variables in the equation are used. We will show that, on the one hand, the IV coefficients may be calculated by the 2SLS computational method and that, on the other hand, 2SLS coefficients may be derived as a particular case of the IV method. The choice of instruments for 2SLS and IV should apparently be based on the same criteria, except that only n instruments can be chosen for the IV method.²

Let a single equation from a system of equations be written as

(II.36)  y = Z_μδ + u ,

where all of the matrices and vectors have the same dimension and meaning as before [see (I.10) and (II.1)]. Let X_IV be a matrix of n instrumental variables (the same number of instrumental variables as there are columns in Z_μ) and assume that X_IV'Z_μ is nonsingular (hence, that X_IV and Z_μ have full column rank and that correlation exists between the variables in X_IV and the variables in Z_μ).

    ¹Goldberger [1964] contains a detailed treatment of the instrumental variables method.

    ²Choice of instruments is discussed in section II.G.
Premultiplying (II.36) by X_IV' we get:

(II.37)  X_IV'y = X_IV'Z_μδ + X_IV'u .

If we let the estimating equations for δ̂_IV be

(II.38)  X_IV'Z_μ δ̂_IV = X_IV'y
         (n×n)   (n×1)   (n×1) ,

we get:

(II.39)  δ̂_IV = [X_IV'Z_μ]^{-1}X_IV'y .

If the variables in X_IV are contemporaneously independent of u (thus, plim (1/T)X_IV'u = 0) and correlated with Z_μ, so that plim (1/T)X_IV'Z_μ = Q_{X_IV Z_μ} exists and is non-singular, then δ̂_IV is a consistent estimate of δ.¹

    ¹A proof of the consistency of δ̂_IV is given in Goldberger [1964], p. 285.

Calculation of IV Problems on a 2SLS Computer Routine

It is not necessary to develop a computer program to calculate IV estimates, since δ̂_IV may be calculated on a double k-class computer routine as a 2SLS problem. Let us premultiply equation (II.38) by the non-singular square matrix Z_μ'X_IV[X_IV'X_IV]^{-1} (premultiplication by a non-singular matrix will not change the solution), so that the estimating equations become:

(II.40)  Z_μ'X_IV[X_IV'X_IV]^{-1}X_IV'Z_μ δ̂_IV = Z_μ'X_IV[X_IV'X_IV]^{-1}X_IV'y

or

(II.41)  [Z_μ'Z_μ]_{∥X_IV} δ̂_IV = [Z_μ'y]_{∥X_IV} .

Then:

(II.42)  δ̂_IV = [Z_μ'Z_μ]_{∥X_IV}^{-1}[Z_μ'y]_{∥X_IV} .

Comparison of (II.42) with (II.17) or (II.35) shows that (II.42) may be computed on a 2SLS routine by merely treating Z_μ as Y (i.e., by treating all of the variables in the equation as if they were jointly dependent variables) and by using X_IV as the matrix of instruments, X_I. (Only the upper left hand submatrix will remain in the matrix to be inverted, and only the upper subvector will remain in the right hand side vector.) The variables in X_μ are treated as jointly dependent in the computation only -- not in the interpretation of results.

The calculation of δ̂_IV as a special 2SLS problem also provides a convenient way of calculating the estimated coefficient variance-covariance matrix, as is noted in the section on coefficient variance-covariance estimation.
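The equivalence of the direct IV solution (II.39) and its 2SLS-routine form (II.42) can be checked numerically; the data below are an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 100
XIV = rng.normal(size=(T, 3))            # n = 3 instruments
Z = XIV @ rng.normal(size=(3, 3)) + 0.5 * rng.normal(size=(T, 3))  # Z_mu, n columns
y = Z @ [1.0, -1.0, 0.5] + rng.normal(size=T)

# (II.39): direct IV solution of the estimating equations X_IV' Z delta = X_IV' y
d_iv = np.linalg.solve(XIV.T @ Z, XIV.T @ y)

# (II.42): the same problem in 2SLS form -- project Z onto span(X_IV) first
Zhat = XIV @ np.linalg.lstsq(XIV, Z, rcond=None)[0]
d_2sls_form = np.linalg.solve(Zhat.T @ Z, Zhat.T @ y)
assert np.allclose(d_iv, d_2sls_form)
```

Here Zhat.T @ Z is [Z_μ'Z_μ]_{∥X_IV} and Zhat.T @ y is [Z_μ'y]_{∥X_IV}, i.e., all variables in the equation are treated as jointly dependent, exactly as the text describes.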
If some of the predetermined variables in the equation are used as their own instruments (or are even linear combinations of variables in X_IV), these predetermined variables need not be treated as jointly dependent for computational purposes. Let Z_μ = [Z_μ* ⋮ X_μ*], where the variables are rearranged in the equation to permit listing the predetermined variables which serve as instruments last. These variables comprise the X_μ* matrix. (II.42) may be rewritten as:

(II.43)  δ̂_IV = \begin{bmatrix} [Z_μ*'Z_μ*]_{∥X_IV} & [Z_μ*'X_μ*]_{∥X_IV} \\ [X_μ*'Z_μ*]_{∥X_IV} & [X_μ*'X_μ*]_{∥X_IV} \end{bmatrix}^{-1} \begin{bmatrix} [Z_μ*'y]_{∥X_IV} \\ [X_μ*'y]_{∥X_IV} \end{bmatrix} .

Since X_μ* is in the space spanned by X_IV, [Z_μ*'X_μ*]_{∥X_IV} = Z_μ*'X_μ*, [X_μ*'X_μ*]_{∥X_IV} = X_μ*'X_μ*, and [X_μ*'y]_{∥X_IV} = X_μ*'y [see (I.39)]; therefore, (II.43) may be rewritten as:

(II.44)  δ̂_IV = \begin{bmatrix} [Z_μ*'Z_μ*]_{∥X_IV} & Z_μ*'X_μ* \\ X_μ*'Z_μ* & X_μ*'X_μ* \end{bmatrix}^{-1} \begin{bmatrix} [Z_μ*'y]_{∥X_IV} \\ X_μ*'y \end{bmatrix} .

Comparison of (II.44) with (II.17) shows that (II.44) is the computational formula for 2SLS in which the variables in X_μ* are treated as predetermined variables in the equation, the variables in Z_μ* are treated as explanatory jointly dependent variables in the equation (for computational purposes -- not for interpretation of results), and the variables in X_IV are treated as predetermined variables in the system. If (as is the usual practice) all of the predetermined variables in the equation are used as instruments, then Z_μ* = Y, X_μ* = X_μ, and (II.44) becomes:

(II.45)  δ̂_IV = \begin{bmatrix} [Y'Y]_{∥X_IV} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y'y]_{∥X_IV} \\ X_μ'y \end{bmatrix} ,

the usual 2SLS computational formula in which X_IV is used as the matrix of instruments, X_I.

The above computational methods do not require that predetermined variables in the equation be selected as instruments; however, most criteria for selecting instruments make the predetermined variables in the equation prime candidates as instruments.
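The simplification in (II.44) -- predetermined variables that serve as their own instruments need not be treated as jointly dependent -- can likewise be checked numerically; the data are again an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 120
W = rng.normal(size=(T, 3))                  # candidate instruments outside the equation
Xmu = rng.normal(size=(T, 2))                # predetermined variables in the equation
XIV = np.column_stack([W[:, :1], Xmu])       # n = 3 instruments; X_mu as its own instruments
Zstar = W @ rng.normal(size=(3, 1)) + rng.normal(size=(T, 1))  # "jointly dependent" column
Z = np.column_stack([Zstar, Xmu])            # Z_mu = [Z_mu* : X_mu*]
y = Z @ [0.5, 1.0, -1.0] + rng.normal(size=T)

# (II.39): direct IV
d_iv = np.linalg.solve(XIV.T @ Z, XIV.T @ y)

# (II.44): only Z_mu* is projected onto span(X_IV); X_mu* enters untouched
Zhat = XIV @ np.linalg.lstsq(XIV, Zstar, rcond=None)[0]
A = np.block([[Zhat.T @ Zstar, Zstar.T @ Xmu],
              [Xmu.T @ Zstar, Xmu.T @ Xmu]])
b = np.concatenate([Zhat.T @ y, Xmu.T @ y])
d_mixed = np.linalg.solve(A, b)
assert np.allclose(d_iv, d_mixed)
```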
As noted above, not selecting a predetermined variable in an equation as an instrument has the same computational effect (assuming that the variable is not in the space spanned by the instruments) as if the variable had been reclassified as a jointly dependent variable.

If Λ = n, then (as in double k-class estimation) the instruments are usually taken to be the predetermined variables in the system, so that X_IV = X_I = X and (II.45) becomes the usual 2SLS computational procedure. If Λ > n, unique IV estimates do not exist if the instruments are selected from among the Λ predetermined variables in the system, since the number of predetermined variables in the system is greater than the number of instruments to be selected. On the other hand, a unique set of m instruments can be calculated from the Λ predetermined variables (this set of instruments consists of the variables in Y_{∥X}) such that if X_IV = [Y_{∥X} ⋮ X_μ], 2SLS estimates (with X_I = X) are obtained.

Derivation of 2SLS as an IV Method

We have been treating IV as a special case of 2SLS, at least computationally. It is interesting to note that even if, in 2SLS estimation, the matrix of instruments, X_I, is of rank greater than n (hence, there are more instruments in the X_I matrix than variables in the equation), 2SLS may be considered to be a particular case of IV.¹ To demonstrate this, we use the variables in the matrix [Y_{∥X_I} ⋮ X_μ] as instruments.² (II.39) becomes:

(II.46)  δ̂_IV = \{[Y_{∥X_I} ⋮ X_μ]'[Y ⋮ X_μ]\}^{-1}[Y_{∥X_I} ⋮ X_μ]'y = \begin{bmatrix} Y'_{∥X_I}Y & Y'_{∥X_I}X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} Y'_{∥X_I}y \\ X_μ'y \end{bmatrix} .

Since Y'_{∥X_I}Y = [Y'Y]_{∥X_I}, Y'_{∥X_I}y = [Y'y]_{∥X_I}, and Y'_{∥X_I}X_μ = [Y'X_μ]_{∥X_I} = Y'X_μ [see (I.54) and (I.40)], the above may be rewritten as:

(II.47)  δ̂_IV = \begin{bmatrix} [Y'Y]_{∥X_I} & Y'X_μ \\ X_μ'Y & X_μ'X_μ \end{bmatrix}^{-1} \begin{bmatrix} [Y'y]_{∥X_I} \\ X_μ'y \end{bmatrix} = δ̂_2SLS .

The variables in Y_{∥X_I} serve as valid instruments since they are linear combinations of variables in X_I.

    ¹Goldberger [1964], p. 332.

    ²Y_{∥X_I} is the part of Y in the space spanned by X_I.

E.
No Predetermined Variables in an Equation

All of the formulas for estimating coefficients given earlier remain valid even if no predetermined variables occur in the equation to be estimated. In the LIML computations, [₊Y'₊Y]_{⊥X_μ} (the moment matrix of the part of ₊Y orthogonal to the columns of X_μ) becomes ₊Y'₊Y.

F. Only One Jointly Dependent Variable in an Equation

If only one jointly dependent variable occurs in an equation, all of the double k-class methods become the same as DLS, since equation (II.2) becomes simply:

(II.48)  δ̂_{k₁,k₂} = [X_μ'X_μ]^{-1}X_μ'y .

G. Selection of Instruments

In this section we will discuss the selection of instruments for double k-class methods, limiting our discussion for the sake of simplicity to 2SLS for most of the section. Instruments selected for double k-class methods should have the same basic characteristics as instruments selected for the IV estimator, except that it seems less desirable to form linear combinations of possible instruments for the double k-class members, since the number of instruments to be used is not restricted to a given number -- the number of explanatory variables in the equation -- as is the case with the IV method. We will not make explicit further reference to the IV method, since it is most fruitful to regard (and calculate) IV coefficients as the special case of 2SLS in which there are the same number of instruments as explanatory variables in the equation.¹

We will assume that the predetermined variables in the equation are among the instruments in the X_I matrix. For consistency in double k-class estimation we will assume that the matrix of instruments is contemporaneously independent of the disturbance of the equation being estimated (see appendix C).

    ¹See section II.D.2.
This suggests that the predetermined variables in the system are prime candidates for choices as instruments; however, cases may arise where it is desirable that not all of the predetermined variables in the system be used as instruments, or that additional variables which are not predetermined variables in the system be used as instruments.1 If all of the assumptions made at the start of this paper strictly held, one would be led toward the use of all of the predetermined variables in the system as the set of instruments; however, since these assumptions are likely to hold only imperfectly in practice, it may be desirable that given instruments be eliminated from the X_I matrix. For example, if the disturbances of an equation are "slightly" serially correlated, one would surely question the use of lagged jointly dependent variables outside the equation but in the system as instruments.

1See section II.D.2.

1See Fisher [1965] and Goldberger [1964] for expositions on the choice of instruments.

It has been usual (where possible) to use all of the predetermined variables in the system as instruments, the justification for using all of them being based on the fact that the unrestricted reduced form equations corresponding to the jointly dependent variables contain all of the predetermined variables in the system. Examination of many systems (especially those in which the matrix of coefficients of the structural equations has a recursive structure) often discloses that certain of the predetermined variables occur with zero coefficients in the reduced form equation corresponding to a given jointly dependent variable. Often a separate set of instruments is then used in the adjustment of each jointly dependent variable. (If a separate set of instruments is used for each jointly dependent variable, the formulas given for the double k-class estimators can be modified in the manner noted in section II.A and appendix B.)
Alternatively, only those predetermined variables which have zero coefficients in the reduced form equations corresponding to all of the jointly dependent variables may be eliminated.

Fisher takes a causal approach to the selection of instruments which leads him to the examination of the structural equations rather than the reduced form equations as the basis for choice of instruments, the key structural equation for each explanatory jointly dependent variable being the one in which that variable occurs as the normalizing variable. (Fisher takes the approach that in a fully specified system, each jointly dependent variable will occur as the normalizing variable in one structural equation.) This also leads him to the possibility of using a separate set of instruments to adjust each explanatory jointly dependent variable. Examination of alternative assumptions made in the block recursive systems which Fisher examines leads him to question (in some cases) the use of lagged jointly dependent variables and to suggest the use of lagged exogenous variables as instruments.

A model may also be examined for partitioning into subsystems.
After partitioning, the instruments in the estimation of each equation in a subsystem are based on the predetermined variables in that subsystem. One basis for partitioning is the degree of correlation of the disturbances (estimated in some manner or assumed a priori). Also, Hannan [1967] gives a method for subdividing a system of equations of a special form into non-intersecting "maximal" subsystems.

The minimum number of instruments which can be selected for 2SLS is n (at least m instruments must be selected in addition to the L predetermined variables in the equation), since otherwise the [Y_{|X_I} \,\vdots\, X_\mu] matrix has rank less than n and, therefore, the matrix

\begin{bmatrix} [Y'Y]_{|X_I} & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}

is singular, i.e., unique 2SLS coefficients do not exist.1 There is no maximum number of instruments which may be chosen if direct orthogonalization is used as suggested in this paper; however, if rk X_I = T, all double k-class coefficients become the same as the DLS coefficients.2

1This is also the minimum number of instruments for LIML, since if the number of instruments is less than or equal to n, the LVR (least variance ratio) for LIML is (see sections II.B.2 and II.C.1) \hat u'\hat u/\tilde u'\tilde u = 1 (since the jointly dependent variables may be combined such that \hat u'\hat u is no larger than \tilde u'\tilde u); hence, LIML becomes the same as 2SLS, for which (in this case) unique coefficients do not exist. (Depending on the computational formula for k_{LIML}, arbitrarily large numbers may be encountered during the computation of k_{LIML} if the number of instruments is less than n.)

2In this case, the particular instruments chosen will have no effect on the coefficients obtained (provided rk X_I = T), since the DLS solution is obtained in any case.

This comes about because if rk X_I = T, then Y_{\perp X_I} = 0, since every variable in the observation matrix falls in the space spanned by X_I (i.e., every variable may be expressed as an exact linear combination of the variables in X_I); hence, the part of any jointly dependent variable orthogonal to X_I is zero. Thus, [Y'Y]_{\perp X_I} = Y_{\perp X_I}'Y_{\perp X_I} = 0'0 = 0, [Y'y]_{\perp X_I} = Y_{\perp X_I}'y_{\perp X_I} = 0'y_{\perp X_I} = 0, and the general double k-class formula (II.2) becomes:1

(II.49)  \hat\delta_{k_1,k_2} = \begin{bmatrix} Y'Y - k_1 \cdot 0 & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}^{-1} \begin{bmatrix} Y'y - k_2 \cdot 0 \\ X_\mu'y \end{bmatrix}
                              = \begin{bmatrix} Y'Y & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}^{-1} \begin{bmatrix} Y'y \\ X_\mu'y \end{bmatrix} = \hat\delta_{DLS}

Thus, when rk X_I = T, all double k-class methods give estimated coefficients which coincide with the DLS coefficients.

The fact that all double k-class coefficients coincide with DLS coefficients when rk X_I = T does not destroy the consistency of given double k-class methods. All that is indicated is that there are insufficient observations to distinguish the double k-class members from each other based on the estimated coefficients obtained.2

1In this formula and the remainder of this section we will assume that only zero restrictions are imposed on the coefficients and that the matrix [Y \,\vdots\, X_\mu] has full column rank (this implies that n \le T).

2This was pointed out to the writer by Professor Robert L. Gustafson. Consistency is an asymptotic property, and a small number of observations in a given sample certainly does not affect an asymptotic property. If the number of instruments used in the estimation is fixed (e.g., the number of predetermined variables in the system is fixed for a given model; hence, if X is used as X_I, then the number of variables in X_I is fixed), then (if the double k-class formulas are followed; that is, a switch is not made to the DLS formula) as T increases, at some point there will be sufficient observations that rk X_I < T and the coefficients of the double k-class members will not coincide with DLS coefficients.

Historical Perspective

The formulas presently quoted in the econometric literature for the calculation of [Y'Y]_{|X_I} and [Y'y]_{|X_I} are

[Y'Y]_{|X_I} = Y'X_I(X_I'X_I)^{-1}X_I'Y   and   [Y'y]_{|X_I} = Y'X_I(X_I'X_I)^{-1}X_I'y.
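The double k-class algebra above--the general formula (II.2), its one-jointly-dependent-variable reduction (II.48), the singularity when fewer than n independent instruments are chosen, and the collapse to DLS in (II.49) when rk X_I = T--can all be exercised numerically. A minimal sketch with simulated, hypothetical data (variable names follow the text; the projection is computed with a pseudoinverse rather than the paper's orthogonalization routine):

```python
import numpy as np

def double_k_class(y, Y, X_mu, X_I, k1, k2):
    """General double k-class formula (II.2); a sketch, data hypothetical."""
    P = X_I @ np.linalg.pinv(X_I)                 # projection onto span(X_I)
    Y_perp, y_perp = Y - P @ Y, y - P @ y         # parts orthogonal to X_I
    m, L = Y.shape[1], X_mu.shape[1]
    A = np.empty((m + L, m + L))
    A[:m, :m] = Y.T @ Y - k1 * (Y_perp.T @ Y_perp)
    A[:m, m:] = Y.T @ X_mu
    A[m:, :m] = X_mu.T @ Y
    A[m:, m:] = X_mu.T @ X_mu
    b = np.concatenate([Y.T @ y - k2 * (Y_perp.T @ y_perp), X_mu.T @ y])
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
T = 25
X_mu = rng.normal(size=(T, 2))
Y = rng.normal(size=(T, 1))
y = rng.normal(size=T)

# (II.48): only the normalized jointly dependent variable occurs (Y empty),
# so every (k1, k2) pair returns the DLS coefficients
empty_Y = np.zeros((T, 0))
dls = np.linalg.lstsq(X_mu, y, rcond=None)[0]
for k1, k2 in [(0.0, 0.0), (1.0, 1.0), (0.4, 2.5)]:
    assert np.allclose(double_k_class(y, empty_Y, X_mu, X_mu, k1, k2), dls)

# fewer than n instruments: the matrix inverted for 2SLS (k = 1) is singular
Y_hat = X_mu @ np.linalg.pinv(X_mu) @ Y           # here X_I = X_mu, only 2 < n = 3
A_sing = np.block([[Y_hat.T @ Y_hat, Y_hat.T @ X_mu],
                   [X_mu.T @ Y_hat,  X_mu.T @ X_mu]])
assert np.linalg.matrix_rank(A_sing) < A_sing.shape[0]

# (II.49): with rk X_I = T, every double k-class member collapses to DLS
X_full = rng.normal(size=(T, T))                  # T independent instruments
Z = np.column_stack([Y, X_mu])
dls_full = np.linalg.lstsq(Z, y, rcond=None)[0]
assert np.allclose(double_k_class(y, Y, X_mu, X_full, 1.0, 1.0), dls_full)
assert np.allclose(double_k_class(y, Y, X_mu, X_full, 0.3, 1.7), dls_full)
```

With rk X_I = T the k-terms multiply (numerically) zero moment matrices, so the choice of k_1 and k_2 is immaterial, exactly as (II.49) asserts.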
As a result of focusing on these formulas, the problem occurring when rk X_I = T has been expressed as one of how to select or construct a set of predetermined variables, say X_a, which captures the maximum effect of the predetermined variables in the system but whose sums of squares and cross-products matrix, X_a'X_a, is nonsingular.1 Initially the solution to the problem was based on the omission of sufficient variables from the X_I matrix that the inverse of the resulting X_a'X_a matrix could be calculated accurately. The variables retained in the X_a matrix (in addition to the predetermined variables in the equation) were selected such that the resulting X_a matrix captured as much effect of the original X_I matrix as possible.

1As an example, Kloek and Mennes [1960], p. 46 state: "The table also shows, however, that \Lambda may easily grow in excess of the number of observations T on which the estimation is based. This is a serious problem, for it implies that the matrix of sums of squares and cross products of the predetermined variables (X'X) is singular; and the inverse of this matrix is needed for the estimation of the reduced-form disturbances, which are auxiliary in the estimation of the parameters of the structural equations."

Kloek and Mennes took a different approach to the restriction of the space of predetermined variables. They suggested linearly combining the predetermined variables in the system into a set of fewer predetermined variables by calculating principal components of predetermined variables. In their famous article they elaborated four methods of calculating principal components.1 The most elaborate of these methods is the method which Kloek and Mennes refer to as method 2.
In method 2, the moment matrix of [X^{**}]_{\perp X_\mu} (the part of the predetermined variables in the system with zero coefficients in the equation, orthogonalized to the predetermined variables with non-zero coefficients in the equation) is first calculated, and then a predesignated number of principal components of the [X^{**}]_{\perp X_\mu} matrix are calculated. These principal components plus X_\mu are then used in place of the X_I matrix in double k-class estimation.

1Kloek and Mennes [1960].

Regarding Restriction of the Space of Instrumental Variables

The calculation of [Y'Y]_{\perp X_I} and [Y'y]_{\perp X_I} by direct orthogonalization (hence the calculation of [Y'Y]_{|X_I} as Y'Y - [Y'Y]_{\perp X_I} and [Y'y]_{|X_I} as Y'y - [Y'y]_{\perp X_I}) changes the focus of the problem away from one of eliminating sufficient multicollinearity that an inverse (or a set of reduced form coefficients) can be accurately calculated, since [Y'Y]_{|X_I} and [Y'y]_{|X_I} are already unique and automatically calculated by the computational method given in this paper in even the most extreme cases of multicollinearity among the instruments. The problem now becomes one of whether the subspace of the instruments should be restricted so that the solution obtained will not coincide with the DLS solution.

There is often a good basis for not using all of the predetermined variables in the system as variables in the X_I matrix, provided this is done because of characteristics of the data (e.g., some degree of serial correlation in the disturbances of an equation may imply that certain lagged jointly dependent variables would not serve as desirable instruments) or of the model (e.g., the reduced form equations corresponding to the explanatory jointly dependent variables do not contain a set of the predetermined variables). There does not, however, seem to be a good basis for restricting the space of instruments merely to cause the resulting coefficients to differ from the DLS coefficients.
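The point about direct orthogonalization can be illustrated concretely. In the sketch below (hypothetical data), [Y'Y]_{|X_I} is computed once by the inverse-based textbook formula and once as Y'Y - [Y'Y]_{\perp X_I}, with the orthogonal part obtained from a least-squares solve (a stand-in here for the appendix A orthogonalization) that never forms (X_I'X_I)^{-1}. When an instrument is duplicated, X_I'X_I is exactly singular, yet the orthogonalization route still returns the same, unique moment matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 30
X_I = rng.normal(size=(T, 3))
Y = rng.normal(size=(T, 2))

# textbook route: [Y'Y]_{|X_I} = Y'X_I (X_I'X_I)^{-1} X_I'Y
inv_route = Y.T @ X_I @ np.linalg.inv(X_I.T @ X_I) @ X_I.T @ Y

# orthogonalization route: [Y'Y]_{|X_I} = Y'Y - [Y'Y]_{perp X_I}
def proj_part(Y, X):
    Y_perp = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]   # no (X'X)^{-1} formed
    return Y.T @ Y - Y_perp.T @ Y_perp

assert np.allclose(inv_route, proj_part(Y, X_I))

# duplicate an instrument: X_I'X_I becomes singular, but the projection part
# of Y is unchanged and still uniquely computable
X_dup = np.column_stack([X_I, X_I[:, 0]])
assert np.linalg.matrix_rank(X_dup.T @ X_dup) < X_dup.shape[1]
assert np.allclose(proj_part(Y, X_dup), proj_part(Y, X_I))
```

The duplicated column changes nothing because the projection depends only on the space spanned by the instruments, not on the particular (possibly rank-deficient) matrix that spans it.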
On the other hand, since there will surely be researchers who will desire that the space of instruments be restricted so that the resulting coefficients do not coincide with DLS coefficients1 (and since some effective arguments for restricting the space may be forthcoming in the future), a few additional remarks regarding how the space might be restricted will be made.2

1As noted before, an estimator is not made inconsistent just because the resulting coefficients do not differ from DLS coefficients. Also, by now it should be obvious that asymptotic properties are a poor guide as to how to proceed when the problem is the result of a small number of observations.

2Even though DLS does not use any information from the remainder of the system, there may still be considerable advantage to carefully specifying the remainder of the system even if it is known ahead of time that, given the number of observations available, rk X will equal T. If a complete system is specified, FIML (full information maximum likelihood) may be applied even if rk X = T. (rk X = T presents no difficulty in FIML estimation, since the matrix of coefficients of jointly dependent variables is recognized in a special fashion, but the jointly dependent variables themselves are not adjusted by a set of instruments. The confusion which exists regarding this point is clarified in footnote 1 of page 203.) The FIML solution will not, in general, coincide with the DLS solution even in the case rk X_I = T.

The change in focus of the problem away from the prevention of a large degree of multicollinearity considerably reduces the desirability of using principal components to restrict the space spanned by the instruments. If it is decided that the space spanned by the instruments is to be restricted, then it is much more straightforward to do this by merely eliminating certain of the initial instruments considered for the analysis.
In that way the coefficients obtained can be related to the particular set of instruments used, whereas it is very difficult to evaluate the effect of using as instruments in the computation a set of principal components of a larger set of instruments. The best guide as to which instruments to retain would seem to be to rank (or group) the instruments according to their desirability as instruments in the estimation of the equation.1

1See Fisher [1965] for some procedures.

One of the advantages claimed for the use of principal components in restricting the space of the instruments is that the use of principal components is less arbitrary than the selection of instruments to retain--one merely decides on the principal component method to use (which might as well be method 2) and how many principal components to retain (the number to retain is quite arbitrary); however, there are certainly easier and more informative methods (in terms of evaluating the results) to accomplish even this advantage. As an example (in addition to the methods of Fisher [1965]), the orthogonalization procedure described in appendix A can be modified so that c instruments1 (in addition to the L predetermined variables in the equation) can be selected from among a prespecified set of instruments so that no instrument selected is an exact linear combination of instruments previously selected. The modification consists of merely (1) incorporating the effect of the variables in X_\mu, and then (2) stopping the orthogonalization procedure after c diagonal elements have been selected as pivots from among those corresponding to instruments not in the equation. After c pivots have been used, the part of the matrix corresponding to {}_{+}Y'{}_{+}Y will have become [{}_{+}Y'{}_{+}Y]_{\perp[X_\mu \,\vdots\, X_1 \,\vdots\, \cdots \,\vdots\, X_c]}, where X_1 through X_c are the instruments corresponding to the c pivots selected.
Using this matrix in place of [{}_{+}Y'{}_{+}Y]_{\perp X_I} will result in the same coefficients as if X_1 through X_c were initially listed as the only instruments in addition to the variables in X_\mu.2

1c could be the number of principal components which would have been selected if principal components had been used. c must be greater than or equal to m, since otherwise (as noted earlier) the matrix inverted in computing 2SLS or LIML estimated coefficients is singular.

2This method is somewhat similar to principal components method 2 in that, by first incorporating the variables of X_\mu, the first pivot selected from among the part of the matrix corresponding to the variables in X^{**} is the largest diagonal element of the [X^{**}{}'X^{**}]_{\perp X_\mu} matrix (the moment matrix from which principal components are calculated if method 2 is used). If the variable corresponding to the first pivot selected from the [X^{**}{}'X^{**}]_{\perp X_\mu} matrix is denoted as X_1, then the second pivot is selected as the largest diagonal element of the [X^{**}{}'X^{**}]_{\perp[X_\mu \,\vdots\, X_1]} matrix, and so on until c pivots (and hence c variables) have been selected in the orthogonalization procedure.

Assuming that the space of instruments is to be restricted, both the predesignation of the variables to be treated as instruments and the use of an automatic method to select a set of variables have the distinct advantage over the use of principal components in that it is clear as to the exact instruments used in the computations; hence, it is possible to more fully evaluate the results obtained. Ease of computation is a second advantage. Predesignation of the variables to use has an advantage over the automatic selection computational method in that more judgment may be used in the selection of instruments.
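The automatic largest-pivot selection just described can be sketched as follows (hypothetical data; a simplified stand-in for the appendix A routine): the candidates are first orthogonalized to X_\mu, and at each step the column with the largest remaining diagonal element (sum of squares) is chosen and swept out of the rest, so an exact linear combination of already-chosen instruments can never be selected.

```python
import numpy as np

def select_instruments(X_mu, X_cand, c, tol=1e-10):
    """Greedy largest-pivot selection of up to c instruments from X_cand,
    after sweeping out X_mu (sketch; names hypothetical)."""
    # part of the candidate instruments orthogonal to X_mu
    R = X_cand - X_mu @ np.linalg.lstsq(X_mu, X_cand, rcond=None)[0]
    selected = []
    for _ in range(c):
        d = (R * R).sum(axis=0)          # current diagonal of the moment matrix
        j = int(np.argmax(d))
        if d[j] < tol:                   # every remaining candidate is (numerically)
            break                        # a linear combination of those already chosen
        q = R[:, j] / np.sqrt(d[j])
        R = R - np.outer(q, q @ R)       # sweep the chosen pivot out of all columns
        selected.append(j)
    return selected

rng = np.random.default_rng(1)
T = 40
X_mu = rng.normal(size=(T, 2))
base = rng.normal(size=(T, 3))
# the fourth candidate duplicates the first: it can never be selected twice
X_cand = np.column_stack([base, base[:, 0]])
picked = select_instruments(X_mu, X_cand, c=4)
assert len(picked) == 3                  # only 3 independent candidates exist
assert len(set(picked)) == len(picked)
```

Requesting c = 4 pivots from a candidate set of rank 3 stops after three selections, which mirrors the text's guarantee that no selected instrument is an exact linear combination of its predecessors.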
The two methods may, of course, be combined by (1) not listing predetermined variables in the system which are clearly not desired, (2) selecting the pivots corresponding to all of the predetermined variables clearly desired, and (3) letting the automatic procedure select the remaining variables (up to a given prespecified number) through choice of the largest pivot at each step.

The methods for selecting instruments given in Fisher [1965] tend to lead to a separate set of instruments to adjust each jointly dependent variable in the equation. Whereas the use of this approach is certainly feasible, it would seem that a more straightforward approach would be to select a set of instruments corresponding to each explanatory jointly dependent variable as Fisher suggests and then use all of the instruments selected for any jointly dependent variable in the X_I matrix, thereby again adjusting all of the explanatory jointly dependent variables by the same set of instruments.1 In addition to the computations being easier to perform, interpretation of results will be simpler.2,3

1Such a procedure is feasible if direct orthogonalization is used to prevent multicollinearity from making the computations unreliable.

2On the other hand, if it is known that a predetermined variable has no relationship to a jointly dependent variable (i.e., the coefficient corresponding to the predetermined variable is zero in the reduced form equation containing the jointly dependent variable), it is possible that asymptotic efficiency will be increased by not adjusting all explanatory jointly dependent variables by the same set of instruments.

3Again the writer would like to emphasize that in the latter part of this section he is attempting to give methods which will give results which may be more readily interpreted and more easily computed than the methods commonly advocated in the case where it has already been determined that the space of the instruments is to be limited.
The writer currently prefers estimates which coincide with DLS estimates rather than resorting to a restriction of the space of instruments merely to prevent the estimates obtained from coinciding with DLS estimates.

CHAPTER III

DISTURBANCE VARIANCE AND COEFFICIENT VARIANCE-COVARIANCE ESTIMATION

In this chapter we will first discuss estimation of the disturbance variance for all double k-class members satisfying plim(k_1 - 1) = plim(k_2 - 1) = 0, and then discuss coefficient variance-covariance estimation for the double k-class estimators satisfying plim \sqrt{T}(k_1 - 1) = plim \sqrt{T}(k_2 - 1) = 0 (which still includes 2SLS, LIML, UBK, and MSM). Finally, estimation of coefficient "t-ratios" is discussed.

A. Disturbance Variance Estimation

For any double k-class member for which plim(k_1 - 1) = plim(k_2 - 1) = 0, a consistent estimate of \sigma^2, the disturbance variance of the \mu-th equation, is given by:1

(III.1)  \hat\sigma^2_{k_1,k_2} = \hat u_{k_1,k_2}'\hat u_{k_1,k_2}/(T - n)

where \hat u_{k_1,k_2}'\hat u_{k_1,k_2} is the usual residual sum of squares, the residual being given by \hat u_{k_1,k_2} = y - Z_\mu\hat\delta_{k_1,k_2}.2

1To the writer's knowledge no formula (except the one given in Nagar [1962]) exists in the literature for the estimated disturbance variance of a double k-class estimator. (III.1) is consistent with the usual formula for single k-class estimators. A tentative proof of the consistency of \hat\sigma^2_{k_1,k_2} calculated by (III.1) is given in appendix C.

2\hat u'\hat u for any linear estimator may be calculated more simply, accurately, and with less computer time if calculated as {}_{+}\hat\delta'[{}_{+}Z_\mu'\,{}_{+}Z_\mu]{}_{+}\hat\delta rather than by calculating \hat u and then calculating the sum of squares of \hat u. That the two are mathematically equivalent is easily verified: \hat u'\hat u = [y - Z_\mu\hat\delta]'[y - Z_\mu\hat\delta] = [{}_{+}Z_\mu\,{}_{+}\hat\delta]'[{}_{+}Z_\mu\,{}_{+}\hat\delta] = {}_{+}\hat\delta'[{}_{+}Z_\mu'\,{}_{+}Z_\mu]{}_{+}\hat\delta. For notational simplicity, however, we will continue to write the residual sum of squares as \hat u'\hat u rather than its computational form {}_{+}\hat\delta'[{}_{+}Z_\mu'\,{}_{+}Z_\mu]{}_{+}\hat\delta.
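The computational identity for \hat u'\hat u noted above is easy to verify numerically. A minimal sketch (hypothetical data; the sign convention placing -1 on the normalizing variable in {}_{+}\hat\delta is assumed here): the quadratic form in the moment matrix reproduces the residual sum of squares without ever forming the T residuals.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 40, 3
Z_mu = rng.normal(size=(T, n))                    # explanatory variables
y = Z_mu @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=T)
delta = np.linalg.lstsq(Z_mu, y, rcond=None)[0]   # any linear estimator's coefficients

u = y - Z_mu @ delta                              # residuals formed explicitly
rss_direct = u @ u

plus_Z = np.column_stack([y, Z_mu])               # +Z_mu = [y : Z_mu]
plus_delta = np.concatenate([[-1.0], delta])      # -1 on the normalizing variable (assumed)
rss_moment = plus_delta @ (plus_Z.T @ plus_Z) @ plus_delta

assert np.allclose(rss_direct, rss_moment)
sigma2_hat = rss_moment / (T - n)                 # (III.1) with the T - n denominator
assert sigma2_hat > 0
```

Once {}_{+}Z_\mu'\,{}_{+}Z_\mu has been accumulated, the residual sum of squares for any candidate coefficient vector is an (n+1)-dimensional quadratic form rather than a pass over T observations.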
All of the specific double k-class members previously discussed meet the above plim requirement except DLS. This includes 2SLS, LIML, UBK, and MSM. Although the above does not provide a consistent estimate of \sigma^2 for DLS when multiple jointly dependent variables occur in the equation, it does agree with the usual formula used for calculating \hat\sigma^2_{DLS} (which is almost surely biased downward when multiple jointly dependent variables occur in the equation).

The consistency of the estimator will not be changed if any denominator, d, with plim(d/T) = 1, is used in place of T - n. Thus, T may also be used as a denominator instead of T - n.

Regarding the Appropriate "Degrees of Freedom" for the Denominator

When estimating \sigma^2 by DLS, it is usual to use the following formula:

(III.2)  \hat\sigma^2_{DLS} = \hat u_{DLS}'\hat u_{DLS}/(T - n)

If only one jointly dependent variable occurs in the equation, (III.2) provides a consistent estimate of \sigma^2. If multiple jointly dependent variables occur in the equation, it is usual to continue to use (III.2), although this estimate is almost surely biased downward.1 It would seem desirable to develop an estimate which takes account of the occurrence of multiple jointly dependent variables in the equation; however, the writer is not aware of any results of work in this area. The use of rk X_I in an adjustment for "degrees of freedom" (e.g., a degrees of freedom of T - rk X_I) would not seem to be a fruitful approach for DLS, since X_I has no effect on the DLS coefficients.

1Monte Carlo results of Cragg support this. See Cragg [1966] and Cragg [1967].

On the other hand, when estimating by other double k-class members, the set of instruments affects the size of \hat\sigma^2_{k_1,k_2}, since the instruments are actually used in the estimation procedure. Given a particular equation, the use of additional instruments will tend to reduce \hat u_{k_1,k_2}'\hat u_{k_1,k_2}; hence, the question immediately arises--should we reflect this by changing the "degrees of freedom" used as the denominator to something like T - \Lambda, T - rk X_I, or even T - m - rk X_I? All of these denominators will give consistent estimates of \sigma^2 under the assumptions plim(k_1 - 1) = 0 and plim(k_2 - 1) = 0.

By way of a partial answer, let us consider what happens to the \hat u_{k_1,k_2}'\hat u_{k_1,k_2} of a given double k-class member calculated for a given equation, \mu, and a fixed sample of size T if additional instruments are added to a given X_I matrix. If the additional instruments are not linear combinations of instruments already in the X_I matrix, the rank of the new X_I will increase and \hat u_{k_1,k_2}'\hat u_{k_1,k_2} will tend to decrease. If sufficient predetermined variables are added that rk X_I = T (rk X_I cannot exceed T), \hat\delta_{k_1,k_2} becomes equal to \hat\delta_{DLS} and \hat u_{k_1,k_2}'\hat u_{k_1,k_2} becomes equal to \hat u_{DLS}'\hat u_{DLS}. Thus, if the maximum number of predetermined variables which can affect the estimation are classified as being in the system, \hat u_{k_1,k_2}'\hat u_{k_1,k_2} will be decreased to \hat u_{DLS}'\hat u_{DLS}. The use of T - rk X_I as the degrees of freedom is clearly inappropriate in this case, since T - rk X_I = 0 and, therefore, \hat\sigma^2_{k_1,k_2} would be arbitrarily large.

The above suggests that although some adjustment based on the number of predetermined variables might be appropriate, the use of T - rk X_I does not seem appropriate. The use of T - rk X_I seems inappropriate also from the standpoint that it completely ignores the number of actual coefficients estimated--a factor which would seem to be of considerably more importance in any "degrees of freedom" adjustment than rk X_I.
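A small numerical illustration of this point (hypothetical data): since DLS minimizes the residual sum of squares, the 2SLS \hat u'\hat u can never fall below it, and once enough independent instruments are added that rk X_I = T, the two coincide and T - rk X_I = 0.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20
X_mu = rng.normal(size=(T, 2))
Y = rng.normal(size=(T, 1))
Z = np.column_stack([Y, X_mu])
y = Z @ np.array([0.8, 1.0, -1.0]) + rng.normal(size=T)

def rss_2sls(X_I):
    Y_hat = X_I @ np.linalg.lstsq(X_I, Y, rcond=None)[0]   # first-stage fit
    W = np.column_stack([Y_hat, X_mu])
    delta = np.linalg.solve(W.T @ W, W.T @ y)
    u = y - Z @ delta                     # residuals use the original Y
    return u @ u

dls = np.linalg.lstsq(Z, y, rcond=None)[0]
rss_dls = ((y - Z @ dls) ** 2).sum()

X_small = np.column_stack([X_mu, rng.normal(size=(T, 1))])    # 3 instruments
X_full = np.column_stack([X_mu, rng.normal(size=(T, T - 2))]) # rk X_I = T

assert rss_2sls(X_small) >= rss_dls - 1e-8     # DLS minimizes u'u
assert np.allclose(rss_2sls(X_full), rss_dls)  # coincide when rk X_I = T
```

With rk X_I = T, dividing this residual sum of squares by T - rk X_I = 0 would indeed make the variance estimate arbitrarily large, as the text observes.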
That the use of T - n as the "degrees of freedom" adjustment gives satisfactory results in at least some cases for 2SLS and LIML is indicated by some of Cragg's Monte Carlo results.1

1Cragg's conclusions noted in section III.C pertain to estimated "t-ratios"; however, \hat\sigma is used in this estimation.

Some Work by Nagar

Nagar derived the bias to O(T^{-1}) in probability of (1/T)\hat u_k'\hat u_k as:2

(III.3)  (1/T)\,\mathcal{E}(\hat u_k'\hat u_k) - \sigma^2 = -\sigma^2\{2\kappa - [\Lambda - n - 1]\operatorname{tr}(QC_1) - \operatorname{tr}(QC) + (\kappa n/T)\}

where k is assumed nonstochastic with k - 1 = O(T^{-1}) in probability; \kappa is related to k by the relation k = 1 + (\kappa/T), \kappa being assumed to be non-stochastic and independent of T; and the predetermined variables are assumed to be nonstochastic.1 Q, C, and C_1 are matrices which must be estimated (see section II.C.3). \mathcal{E} denotes expected value.

2Nagar [1961]. The comparable formula for double k-class members is given in Nagar [1962].

1These are the same assumptions as were made in section II.C.3 (MSM estimation). k nonstochastic means in this case that k is independent of any stochastic variable (in particular, k is independent of u). Although 2SLS, UBK, and MSM meet this requirement, LIML does not.

It might be suggested that \sigma^2 be estimated by the formula (1/T)\hat u_k'\hat u_k and then (III.3) be used to adjust for bias; however, this does not seem to be a fruitful approach. First of all, the formula for bias contains \sigma^2 as a parameter. Thus, we are in the position of estimating the bias of a particular estimate of \sigma^2 with \sigma^2 itself being a parameter. It is possible to manipulate the above formula to eliminate \sigma^2 from the right hand side; however, the resulting formula is still not very helpful in adjusting for bias.

Especially noteworthy is the fact that unbiasedness is only one desirable characteristic (actually the above formula only gives an asymptotic estimate of bias to O(T^{-1}) in probability). Another important characteristic is dispersion, especially dispersion about the true value. The above formula is likely to give a large dispersion about any value (let alone the true value), since it contains traces of certain matrices which must be estimated, and the estimates of these traces (at least those suggested by Nagar) vary substantially in magnitude
The above formula is likely to give a large dispersion about any value (let alone the true value), since it contains traces of certain matrices which must be estimated, and the estimates of these traces (at least those suggested by Nagar) vary substantially in magnitude 1These are the same assumptions as were made in section 11.0. 3 (MSM estimation). k nonstochastic means in this case that k is independent of any stochastic variable (in particular, k is independent of u). Although ZSLS, UBK, and MSM meet this require- ment, LIML does not. 119 in response to only small changes in the data or model. Also, the particular estimates of these matrices may add a substantial error to our estimate of the bias.1 A Common Ouwighx in ammg aéswazm As noted in section II.B.2, ZSLS may be estimated as a two stage process in which Y is calculated in the first stage, IXI and in the second stage, YIX is Substituted for Y in the cal- I culation of 528LS° Often, the error sum of squares from the second step is then used as in the calculation of A ' A “zsisuzsis EZSLS and , therefore, the calculation of coefficient standard errors is based on YIX instead of the original Y. This is I generally regarded as a less desirable estimate than the estimator we have given, namely that based on u = y - Y] - Xpfi, or some ZSLS equivalent formula. Thus, after calculating SZSLS’ 3'6 should be calculated by using the original Y in place of YIx or by I the formula in the footnote on page 114.2 1Some estimates for these matrices are suggested in.Nagar [1959]. 2 A. A The UZSLSUZSLS A either larger or smaller than the u'fi obtained by DLS at the second stage. obtained through recalculating may be 120 . . 1 B. Coefficient Variance-Covariance Estimation 1. 
1. Double k-class

1Coefficient variance-covariance estimation for the double k-class estimators which we have considered other than DLS is complicated by the fact that the small sample sampling variance may be infinite in some cases. Fisher [1965], p. 605 states: "The principal point that has emerged on small sample properties of limited-information estimators is that the sampling variances involved are infinite, at least in some cases. Such a conclusion is borne out both from the analytic work that has been accomplished to date and by the results of the Monte Carlo experiments that have been performed." (Fisher's limited-information estimators include all of the specific k-class estimators treated in this paper except DLS. Fisher gives a number of analytical and Monte Carlo references.)

This does not imply that coefficient variance-covariance estimation is futile in finite samples. Basmann [1961], p. 621 states: "It is appropriate to mention that even though the exact distribution function F(x) of some estimator fails to possess moments of lower order (say) a variance, it is still possible in many cases to approximate F(x) by a distribution function G(x) that does possess (say) a variance and even possesses moments of still higher order. Thus A. L. Nagar has made an important contribution to econometric statistics by working out formulas for the bias and moment matrices of approximate distributions for Theil's k-class estimators," [reference to Nagar [1959] deleted]. "The reader will easily satisfy himself that Nagar's approximations do not depend on the exact distributions possessing a finite variance. Examine the exact frequency function exhibited in Figure 1 below. This frequency function does not possess a finite variance. Consider the approximation obtained by truncating the exact frequency function at the points v = -3, v = 3."

For any double k-class member for which plim \sqrt{T}(k_1 - 1) = plim \sqrt{T}(k_2 - 1) = 0, a consistent estimate of Var(\hat\delta_{k_1,k_2}) is
The approxi- mate frequency function obtained in this way possesses finite moments of all orders. The approximate distribution will be an excellent one, indeed." 121 given by: u u l ‘1 ry Y-k1[Y yjlxl Y xn «2 ) = 0 k1,k2 k1,k2 (111.4) v3r(8 X'Y x'x c u u u *2 where 0k k is calculated by (111.1) or any other formula with A l’ 2 plim O = O (e.g., T could be used in the denominator in k1,k2 place of T - n if desired). (111.4) does not provide a consistent estimate of Var(6DLS) when multiple jointly dependent variables occur in the equation even ) though it does agree with the usual formula for calculating Var(6DLS 1To the writer's knowledge no formula (except the one given in Nagar [1962]) exists in the literature for the estimated coefficient variance-covariance matrix of the double k-class estimator. If k = k , (111.4) becomes the usual estimated coefficient variance- covariance formula for the single k-class estimator. Christ [1966], p. 445 states that the asymptotic coefficient variance-covariance matrix of the double k-class estimator is the same as the asymptotic coefficient variance-covariance matrix of the 2313 estimator provided pun/"f(k1 - 1) = p1im./T(k2 - 1) = 0, but does not give a formula for Var(6k k ) . Let the formula given in l’ 2 (111.4) be denoted A. In appendix C it is tentatively shown [under the slightly less restrictive assumption plim(k - 1) = plim(k - l) = 0] that (l/T)plim T A equals the asymptotic coefficient variance covariance matrix of the ZSLS estimator; hence, under the assumption plim/f(k1 - 1) = p1im/T(k2 - 1) = O, A [i.e., the formula given in (111.4)] is a consistent estimate of Var(8k k ) . LIML, ZSLS, UBK, and MSM all meet this plim requirement. 1 2 122 (since k1 = O for DLS). 
Cragg's Monte Carlo results indicate that this estimate of Var(\hat\delta_{DLS}) has a substantial downward bias.1

Conversion of an instrumental variables problem to a 2SLS problem in the manner indicated in section II.D.2 also provides a convenient method of estimating Var(\hat\delta_{IV}), since (III.4) will give the same result as the usual IV formula.2

1Cragg's Monte Carlo results are considered in section III.C in the discussion of t-ratios.

2See Goldberger [1964], pp. 286, 332 for the usual formula--\hat\sigma^2[X_{IV}'Z_\mu]^{-1}[X_{IV}'X_{IV}][Z_\mu'X_{IV}]^{-1}--which Goldberger notes is consistent. That it is equivalent to the 2SLS formula for the converted IV problem may be noted from \hat\sigma^2[X_{IV}'Z_\mu]^{-1}[X_{IV}'X_{IV}][Z_\mu'X_{IV}]^{-1} = \hat\sigma^2[Z_\mu'X_{IV}(X_{IV}'X_{IV})^{-1}X_{IV}'Z_\mu]^{-1} = \hat\sigma^2[Z_\mu'Z_\mu]_{|X_{IV}}^{-1} [see (I.36)], which is \hat\sigma^2 times the matrix inverted in the calculation of 2SLS if the IV problem is converted to a 2SLS problem (see II.39). If (as is usual) the variables in X_\mu are also contained in X_{IV}, \hat\sigma^2[Z_\mu'Z_\mu]_{|X_{IV}}^{-1} becomes the more familiar

\hat\sigma^2\begin{bmatrix} [Y'Y]_{|X_{IV}} & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}^{-1}

[see (II.45)].

2. Alternative estimate for LIML

An asymptotic coefficient variance-covariance matrix for LIML has been derived by Rubin as:1

(III.5)  \widehat{Var}(\hat\delta_{LIML}) = \hat\sigma^2_{LIML}\begin{bmatrix} Y'Y - k[Y'Y]_{\perp X_I} + [k/(k-1)]ff' & Y'X_\mu \\ X_\mu'Y & X_\mu'X_\mu \end{bmatrix}^{-1}

where k is the smallest eigenvalue of [{}_{+}Y'{}_{+}Y]_{\perp X_I}^{-1}[{}_{+}Y'{}_{+}Y]_{\perp X_\mu} as before, {}_{+}f \equiv [{}_{+}Y'{}_{+}Y]_{\perp X_I}\,{}_{+}\hat\gamma_{LIML}, and f is the same as {}_{+}f except that the element corresponding to the normalizing variable is deleted, i.e., f \equiv [Y'{}_{+}Y]_{\perp X_I}\,{}_{+}\hat\gamma_{LIML}.

(III.5) is given for completeness. (III.4) would seem to be as desirable and has the further advantages:

(1) Var(\hat\delta_{LIML}) estimated by formula (III.4) provides an estimate which can be compared more readily to the Var(\hat\delta_k) of other k-class members.

(2) (III.4) may be obtained as a by-product of the calculation of \hat\delta_{LIML}, whereas (III.5) requires special programming.

It should be noted that Var(\hat\delta_{LIML}) by formula (III.4) \neq Var(\hat\delta_{LIML}) by formula (III.5).1

1Personal conference.
The estimate given in Chernoff and Divinsky [1953], p. 245 is the same as (III.5), as may be seen by writing down the formulas for inverting the matrix given in (III.5) in parts and noting that the result is exactly the same as the formula given by Chernoff and Divinsky.

3. Nagar's unbiased to O(T⁻²) in probability estimates

Nagar derived the moment matrix of δ̂_k − δ and δ̂_{k1,k2} − δ to O(T⁻²) in probability. The estimated coefficient variance-covariance matrix could be based on the formulas derived by Nagar. (Most of the matrices in his formulas are based on population parameters, but estimates of these matrices of the type which he suggests in the calculation of δ̂_MSM could be used.) Those interested in following this approach are referred to Nagar's articles.¹

In evaluating whether to follow the approach of estimating a number of population matrices and substituting them into the formulas, one should be reminded that:

(1) Assumptions additional to those which we have specified are imposed in the derivation of Nagar's formulas.

(2) Although the resulting formulas are of a higher order of unbiasedness (assuming that the actual population parameters are available) than the usual formulas, they are still asymptotic.

(3) Unbiasedness is only one desirable property. Dispersion, especially about the true value, is a property which is also very important. Nagar's derivations are in terms of certain population matrices and traces of other population matrices. The estimation of these matrices and traces and their substitution into Nagar's formulas are likely to add greatly to the dispersion of the estimated coefficient variance-covariance matrix.

¹Nagar [1959], and Nagar [1962].
C. Coefficient Standard Errors and t-ratios

The square roots of the diagonal elements of the estimated coefficient variance-covariance matrix (i.e., the square roots of the estimated coefficient variances) are often used as approximate coefficient standard errors, and the ratios of the coefficients to the square roots of the estimated coefficient variances are often used as approximate coefficient t-ratios; however, very little information is available on how well these computed values serve as approximate standard errors and approximate t-ratios.

Cragg [1966] and [1967] reported the results of a Monte Carlo experiment involving DLS, 2SLS, UBK, LIML, 3SLS, and FIML. The coefficient matrix for the basic model was as follows:¹

[ −1    γ12   γ13   β11   β12   0     0     β15   0     0
  γ21   −1    0     β21   0     β23   0     β25   0     β27
  0     γ32   −1    β31   0     β33   β34   0     β36   0   ]

¹Cragg [1967], p. 94. The large number of abandoned samples casts considerable doubt on the meaningfulness of the results; however, the obviously very large amount of rounding error which was encountered surely had a considerably smaller effect on the calculation of the single equation estimates than on the 3SLS and FIML estimates. From Cragg's description of the FIML results, it would appear that they should be totally ignored due to excessive rounding error in their computation. In addition to the large number of abandoned samples, a number of FIML estimates were retained even though their computed coefficient variances were negative. This indicates either convergence to a saddle point instead of a local maximum for many problems (convergence of FIML is discussed in chapter V) or that a high degree of rounding error was encountered (or both). (The writer is not suggesting that additional samples should have been eliminated based on the FIML results, but that the entire set of FIML results should have been ignored.)
The results reported in Cragg [1967] consisted of 26 experiments, each containing 50 samples, the experiments being summarized in the following table:¹

EXPERIMENTS CONDUCTED

Experiment   Special Features*                          Abandoned samples
 1           None                                        1
 2           Disturbance set 2                           0
 3           Disturbance set 3                           1
 4           Exogenous variable set 2                    0
 5           Exogenous variable set 3                    1
 6           Structure 2                                 0
 7           Structure 3                                 8
 8           Structure 4                                 1
 9           Structure 5                                 1
10           Structure 6                                 0
11           Structure 7                                 0
12           Structure 8                                 0
13           Values of Σ 25% those of Table 1            3
14           Values of Σ 4 times those of Table 1        3
15           Values of Σ 9 times those of Table 1        4
16           Values of Σ 16 times those of Table 1       6
17           Values of Σ 25 times those of Table 1      10
18           35 observations                             0
19           50 observations                             0
20           70 observations                             0
21           Multicollinearity 1                         1
22           Multicollinearity 2                         0
23           Multicollinearity 3                         2
24           Multicollinearity 4                         6
25           Multicollinearity 5                         8
26           Multicollinearity 6                         7

*Unless otherwise noted, twenty observations were used with no specially introduced multicollinearity, with exogenous variable data set 1, structural disturbance set 1, and structure 1.

¹Cragg [1967], p. 94.

A formula equivalent to (III.4) was used as the formula for estimating the coefficient variance-covariance matrix; that is, a degrees of freedom adjustment of T − 5 was used for all samples. Approximate t-ratios were calculated as the coefficients divided by the square roots of the diagonal elements of the coefficient variance-covariance matrices. In reporting the results of the 26 experiments Cragg states:

"The adequacy of the standard errors was investigated by examining the ratios of the deviations of the coefficients from the true values to their standard errors, which we call for simplicity the t ratios. It is sometimes supposed that t ratios are distributed as Student's t. In investigating this supposition there were two difficulties: (1) should the standard errors be adjusted for 'lost degrees of freedom' and (2) what is the appropriate number of degrees of freedom for the t distribution.
After examining some of the data it appeared that the standard errors of a coefficient in a particular structural equation should be adjusted for the number of coefficients to be estimated in that equation." [reference to footnote deleted] "The most appropriate number of degrees of freedom appeared to be the number of observations minus the number of coefficients to be estimated in the equation in which the coefficient fell. The hypothesis investigated was that not more than five per cent of the t ratios would fall outside the ninety-five per cent confidence intervals for roughly five per cent of the coefficients in most of the experiments. The number of the consistent t ratios falling outside the interval was significantly higher than five per cent for only one or two coefficients in most experiments and quite often there was none."¹

Also, Cragg states: "The DLS standard errors were not apt to be reliable for making inferences about the true values of the structural coefficients. Much more frequently than for the consistent methods, the number of DLS t ratios falling outside the ninety-five per cent confidence intervals was significantly greater than five per cent of the total number of estimates of a coefficient."²

As one of his conclusions, Cragg states: "Usually use of the standard errors of the consistent methods would lead to reliable inferences, but this was not always the case. The standard errors of DLS were not useful for making inference about the true values of the coefficients."³

Cragg reported some additional experiments in which the model noted above was modified to examine the effect of (1) errors in the exogenous variables, (2) stochastic coefficients, and (3) heteroskedastic and autocorrelated disturbances. Results similar to those noted above are reported for these additional experiments.⁴

¹Cragg [1967], p. 101.
²Cragg [1967], p. 102.
³Cragg [1967], p. 109.
⁴See Cragg [1966].

CHAPTER IV

GENERALIZED LEAST SQUARES
A. Unrestricted Generalized Least Squares (GLS)

The generalized least squares model (also called the Aitken model)¹ may be expressed as:

(IV.1)  y = X δ + u     (y and u are T×1, X is T×n, δ is n×1)

where the same statistical assumptions are made as were made in Chapter I except that:

(1) Ɛ(uu') = Σ̇, where Σ̇ is a T×T positive definite matrix known except for a multiplicative constant. If (IV.1) represents a single stochastic equation from a system of stochastic equations, then the row dimension of (IV.1) is simply the number of observations T, and Ɛ(uu') = Σ̇ is a loosening of the assumption made earlier in this paper that Ɛ(uu') = σ²I. (The dot is used above the Σ to insure that the Σ̇ matrix is not confused with the Σ matrix, which is the M×M disturbance variance-covariance matrix for a system of M equations.)

(2) X is a matrix of variables assumed fixed rather than merely contemporaneously uncorrelated with u. The X matrix is not the same as the X matrix of preceding chapters. If (IV.1) represents a single equation from a system of equations, then the row dimension is T and the X matrix is the same as the X_μ or Z_μ matrix. In part II of this paper we will, at times, rewrite an entire set of M stochastic equations in the form (IV.1). In this case the row dimension becomes MT, y and u become MT×1 vectors, X becomes an MT×(Σ_{μ=1..M} n_μ) matrix constructed from the X_μ matrices and matrices of zeros, and Σ̇ becomes an MT×MT matrix. (The ZA [Zellner-Aitken] and the 3SLS [three-stage least squares] models will be derived as modifications of the GLS model.)

(3) In the GLS model, X is assumed to have full column rank. This assumption will be relaxed in the next section when we consider the RGLS (restricted generalized least squares) model.

The GLS estimator is:

(IV.2)  δ̂_GLS = [X'Σ̇⁻¹X]⁻¹[X'Σ̇⁻¹y]

and the variance of δ̂_GLS is given by:

(IV.3)  Var(δ̂_GLS) = [X'Σ̇⁻¹X]⁻¹.

Under the above statistical assumptions, δ̂_GLS is the minimum variance linear unbiased estimator.

¹Aitken [1934-35], pp. 42-43.
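To make (IV.2) and (IV.3) concrete, the following is a minimal numpy sketch. The function name and the Cholesky-whitening implementation are choices of this sketch, not a procedure taken from the text; whitening merely avoids forming Σ̇⁻¹ explicitly, which is numerically kinder, and reduces GLS to an ordinary least-squares solve on transformed data.

```python
import numpy as np

def gls(X, y, Sigma):
    # GLS estimator (IV.2): [X' Sigma^-1 X]^-1 [X' Sigma^-1 y].
    # Computed by whitening with a Cholesky factor L (Sigma = L L')
    # rather than inverting Sigma directly.
    L = np.linalg.cholesky(Sigma)
    Xw = np.linalg.solve(L, X)          # L^-1 X
    yw = np.linalg.solve(L, y)          # L^-1 y
    delta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    var = np.linalg.inv(Xw.T @ Xw)      # Var(delta_GLS) = [X' Sigma^-1 X]^-1, (IV.3)
    return delta, var
```

With Σ̇ = σ²I the same routine collapses to ordinary (direct) least squares, which is how DLS reappears as a special case later in the chapter.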
Quite a few applications of the GLS model will be made in the remainder of this paper; however, some of the assumptions will not be met for these applications. Even in these applications, although δ̂_GLS will no longer be best linear unbiased, δ̂_GLS may still have desirable properties. Even though some of the assumptions are not met, we will still refer to estimates using (IV.2) and (IV.3) as GLS estimates.

B. Restricted Generalized Least Squares (RGLS)

The restricted generalized least squares (RGLS) model is the same as the GLS model except that the following restrictions are imposed on the coefficients:

(IV.4)  R δ = r     (R is N_R×n, δ is n×1, r is N_R×1)

where R is an N_R×n matrix of known elements and r is an N_R×1 vector of known elements.¹ In the GLS model, X is assumed to have full column rank. This assumption is relaxed in the RGLS model. A corresponding necessary (but not sufficient) condition for the RGLS model is that rk X + rk R ≥ n.² R need not have full row rank; that is, rk R may be less than N_R (i.e., redundant restrictions are permitted in R).

The RGLS estimator is given by:³

(IV.5)  δ̂_RGLS = Q[Q'(X'Σ̇⁻¹X)Q]⁻¹Q'[(X'Σ̇⁻¹y) − (X'Σ̇⁻¹X)q] + q

or, equivalently, by (IV.6) and (IV.7) below.

¹N_R may be greater than, equal to, or less than n. If N_R > n, then (1) some of the restrictions are redundant, (2) some of the restrictions are inconsistent, and/or (3) δ is restricted to a fixed set of coefficients. The computational procedure given in this chapter detects but allows redundancy. Inconsistency is also detected.

²rk R of the coefficients in δ may be solved for in terms of the remaining n − rk R coefficients. If rk X is not greater than or equal to n − rk R, then the remaining n − rk R coefficients (and, therefore, the rk R coefficients) will not be unique.

³To the writer's knowledge, these forms of the RGLS estimator have not appeared in the literature.
A more common formula will be presented later (IV.23), and some advantages of (IV.5), or of (IV.6) and (IV.7), over (IV.23) will be discussed there. The proofs of (IV.5), (IV.6), and (IV.7) are given after Q and q are defined (via their computation).

(IV.6)  δ̂(1)_RGLS = [Q'(X'Σ̇⁻¹X)Q]⁻¹Q'[(X'Σ̇⁻¹y) − (X'Σ̇⁻¹X)q]

and

(IV.7)  δ̂(2)_RGLS = Q₂δ̂(1)_RGLS + q₂

where:

Q₂ is a rk R×(n − rk R) matrix derived from R by reducing R to essentially a row echelon form by a series of row operations, then forming Q₂ as the negative of the resulting row echelon matrix (possibly rearranged slightly).

Q is an n×(n − rk R) matrix formed as Q₂ augmented by an (n − rk R)×(n − rk R) identity matrix (and the rows possibly rearranged).

q₂ is a rk R×1 vector derived from r by performing the row operations on the augmented matrix [R ⋮ r] instead of on R alone.

q is an n×1 vector formed as q₂ augmented by an (n − rk R)×1 vector of zeros (and the elements possibly rearranged to conform with the rearrangement of the rows of Q).

δ̂(1)_RGLS is composed of the elements of δ̂_RGLS corresponding to the (n − rk R)×1 vector of zeros added to q₂ in forming q.

δ̂(2)_RGLS is composed of the remaining rk R elements of δ̂_RGLS.

The use of (IV.6) and (IV.7) is equivalent to (1) solving (IV.4) for rk R of the coefficients in terms of the remaining n − rk R coefficients; (2) substituting this solution for the rk R coefficients into (IV.1), and rewriting (i.e., redefining variables) so that in effect an unrestricted GLS model with n − rk R coefficients to be estimated is obtained; (3) estimating these n − rk R coefficients by the GLS formula (IV.2); and (4) substituting these estimates back into the solution of step (1) to obtain the estimates of the rk R coefficients which were there "solved out." The use of formula (IV.5) is equivalent to the use of the two separate formulas (IV.6) and (IV.7). Computational procedures for forming Q, q, Q₂, and q₂ are given in the next section.
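Given Q, q, Q₂, and q₂, the estimator is a short computation; the following numpy sketch follows (IV.5) through (IV.8). The function name and return convention are this sketch's own assumptions, not notation from the text.

```python
import numpy as np

def rgls(X, y, Sigma, Q, q, Q2=None, q2=None):
    # Restricted GLS via the Q formula.  With A = X' Sigma^-1 X and
    # b = X' Sigma^-1 y, (IV.6) estimates the n - rk R free coefficients
    # delta1, (IV.7) recovers the solved-out ones, and stacking the two
    # reproduces (IV.5): delta = Q delta1 + q.
    Si = np.linalg.inv(Sigma)
    A = X.T @ Si @ X
    b = X.T @ Si @ y
    # (IV.6): the largest system solved is only of order n - rk R
    delta1 = np.linalg.solve(Q.T @ A @ Q, Q.T @ (b - A @ q))
    delta = Q @ delta1 + q                                # (IV.5)
    delta2 = None if Q2 is None else Q2 @ delta1 + q2     # (IV.7)
    var = Q @ np.linalg.inv(Q.T @ A @ Q) @ Q.T            # (IV.8)
    return delta, delta1, delta2, var
```

For example, the single restriction δ1 = δ2 on a two-coefficient model corresponds to Q = [1; 1], q = 0, Q₂ = [1], q₂ = 0, and the routine then fits one free coefficient and copies it into both slots.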
The precise definition of these matrices is given by their computational procedure.

The variance-covariance matrix for δ̂_RGLS is:¹

(IV.8)  Var(δ̂_RGLS) = Q[Q'(X'Σ̇⁻¹X)Q]⁻¹Q'

¹A derivation of (IV.8) is given after the derivation of δ̂_RGLS.

1. Computation of Q and q

First of all, the augmented matrix [R ⋮ r] is formed. Any row operation which is performed on R will be performed on r as well. The matrix Q and the vector q may be formed as follows:¹

1st Series of Steps -- Reduction of R to Row Echelon Form

(1a) Let abs R_{i1,j1} be the largest element in absolute value of R. If abs R_{i1,j1} is less than ε₁, go to step m. (ε₁ is explained in step m.) Otherwise switch rows 1 and i1 of the augmented matrix and columns 1 and j1 of this matrix so that the largest element occurs in column 1 of row 1. Record the order of the new rows and columns in terms of the order of the original rows and columns of R.

(1b) Perform row operations on the resulting augmented matrix to reduce the first column to column 1 of an N_R×N_R identity matrix. Denote the resulting augmented matrix as [I(1) ⋮ R(1) ⋮ r(1)], where I(1) is the first column of an N_R×N_R identity matrix.

¹As indicated in Part III (Programming Considerations), to reduce rounding error, all variables should be normalized so that all elements of the Z'Z matrix will be of comparable magnitude and unaffected by the multiplication of any variable by a positive constant such as a power of 10 (i.e., unaffected by shifting the decimal point of the variable). Normalization such that the variables inherent in the Z'Z matrix all have length 1, or their deviations from means have length 1, is suggested. Thus, a step which should precede the first step outlined in this section is to normalize the columns of R and r to take account of the normalization of variables. The scaling of variables up or down by a user will then have no effect on the normalized R matrix.
If the elements in the rows of R and r differ greatly in magnitude, the R matrix and r vector should also be normalized row-wise to reduce rounding error. (Multiplication of any row of [R ⋮ r] by a constant does not change the restriction.)

(2a) Let abs R_{i2,j2} be the largest element in absolute value in rows 2 through N_R of R(1). If abs R_{i2,j2} is less than ε₂, go to step m. Otherwise switch rows 2 and i2 of the augmented matrix and columns 2 and j2 of this matrix so that this element occurs in column 2 of row 2. Record the order of the new rows and columns in terms of the order of the original rows and columns of R.

(2b) Perform row operations on the resulting augmented matrix to reduce the second column to column 2 of an N_R×N_R identity matrix. Denote the resulting augmented matrix as [I(2) ⋮ R(2) ⋮ r(2)], where I(2) is the first two columns of an N_R×N_R identity matrix.

In general, at the kth step:

(ka) Let abs R_{ik,jk} be the largest element in absolute value of R(k−1). If abs R_{ik,jk} is less than ε_k, go to step m. Otherwise, switch rows k and ik of the augmented matrix and columns k and jk of this matrix so that this element occurs in column k of row k. Record the order of the new rows and columns in terms of the order of the original rows and columns of R.

(kb) Perform row operations on the resulting augmented matrix to reduce the kth column to column k of an N_R×N_R identity matrix. Denote the augmented matrix as [I(k) ⋮ R(k) ⋮ r(k)], where I(k) is the first k columns of an N_R×N_R identity matrix.

(m) The procedure is continued until either (1) all N_R rows have been treated (i.e., I(k) is an N_R×N_R identity matrix), in which case R has full row rank, i.e., rk R = N_R; or (2) at the mth step, abs R_{im,jm} < ε_m, in which case rk R = m − 1. Let us assume the latter, which we will call step m.
Thus, at step m we have:¹

(IV.9)  [I(m−1) ⋮ R(m−1) ⋮ r(m−1)] =

        [ I                  A1                     b1
          rk R×rk R          rk R×(n−rk R)          rk R×1

          0                  A2                     b2
          (N_R−rk R)×rk R    (N_R−rk R)×(n−rk R)    (N_R−rk R)×1 ]

(If rk R = N_R, the matrix [0 ⋮ A2 ⋮ b2] will not occur.)

If no rounding error occurred, and if the remaining rows were exact linear combinations of the preceding m − 1 rows, then abs R_{im,jm} would be zero. We must, however, allow for the possibility of rounding error; hence, we can detect R having less than full row rank only if we consider an ε_k which is greater than zero at each step. A preset ε₁ = ε₂ = ··· = ε_{N_R} can be assumed or calculated before the procedure is started, or an ε_k can be calculated at each step to reflect the number of operations performed.

¹During the calculation of [I(m−1) ⋮ R(m−1)], columns were rearranged; therefore, the column corresponding to a given coefficient in δ will have been moved. Suppose that the coefficients of δ are now rearranged so that each coefficient will be in the same order as its corresponding column of [I(m−1) ⋮ R(m−1)]. Let us designate δ in its rearranged order as δ*. Let us also rearrange the columns of R so that they are in the same order as their corresponding coefficients in δ* (the same order as the columns of [I(m−1) ⋮ R(m−1)]), and let us delete from R and r the rows (if any) corresponding to [0 ⋮ A2 ⋮ b2]. Let us designate the new matrix obtained as R* and the new vector as r*. Then the original set of restrictions could be rewritten as R*δ* = r*, where, partitioning conformably, R₁* is the rk R×rk R matrix formed from the first rk R columns of R*. By our method of calculation, [I ⋮ A1 ⋮ b1] = (R₁*)⁻¹[R₁* ⋮ R₂* ⋮ r*]; i.e., A1 = (R₁*)⁻¹R₂* and b1 = (R₁*)⁻¹r*.

rk R < N_R may occur in either of two cases:

(1) The remaining N_R − m + 1 restrictions are implied by (are linear combinations of) the preceding m − 1 restrictions treated; therefore, they can be ignored. In this case, b2 will be approximately a vector of zeros.
(If no rounding error occurred, b2 would be exactly a vector of zeros; however, due to rounding error we should compare the absolute value of the elements of b2 with some positive constant.)

(2) At least one of the remaining N_R − m + 1 restrictions is inconsistent with the preceding m − 1 restrictions treated; hence, action should be taken by the user to remove the inconsistency. Inconsistency is detected by comparing the absolute value of the elements of b2 with a small positive constant: abs (b2)_i > ε_{(b2)i} implies that row i of [0 ⋮ A2 ⋮ b2] is inconsistent with the rows of [I ⋮ A1 ⋮ b1]. Since we recorded the row number of the original matrix R corresponding to each row of these augmented matrices, the set of equations (in terms of the row numbers of R) with which this equation is inconsistent can be noted so that the set of restrictions can be corrected by the user. Even after finding one i for which abs (b2)_i > ε_{(b2)i}, the remaining (b2)_i should be checked so that all inconsistencies are noted for correction.

2nd Series of Steps -- Formation of Q and q from A1 and b1

Form a matrix Q* and a vector q* as:

(IV.10)  Q* = [ −A1 ; I ]   (rk R×(n−rk R) over (n−rk R)×(n−rk R); Q* is n×(n−rk R))

         q* = [ b1 ; 0 ]    (rk R×1 over (n−rk R)×1; q* is n×1)

In forming the matrix [I(m−1) ⋮ R(m−1)] we rearranged the columns of R, noting their revised order in terms of their original order. If the rows of Q* and q* are considered to be in the revised order (the same order as the columns of [I(m−1) ⋮ R(m−1)]), then the matrices Q and q can be formed from Q* and q* by rearranging the rows of Q* and q* so that they are in the same order as the original columns of R. Thus,

Q = Q* with the rows arranged to the original order of the columns of R.
q = q* with the rows arranged to the original order of the columns of R.

The columns of Q need not be rearranged, since they correspond to the rows of R, and the order of the rows of R is of no consequence.
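The two series of steps above can be sketched compactly in numpy. This is a sketch under assumptions: the function name, the single tolerance `eps` (standing in for the per-step thresholds ε_k), and the return convention are this sketch's own; the column normalization recommended in the footnote is omitted, and the flagged inconsistent rows are reported by their position in the reduced matrix rather than by the recorded original row numbers.

```python
import numpy as np

def build_Q_q(R, r, eps=1e-10):
    # Reduce [R | r] by Gauss-Jordan with full pivoting (steps 1a..kb),
    # then form Q and q (2nd series of steps).
    R = np.array(R, dtype=float)
    r = np.array(r, dtype=float).ravel()
    NR, n = R.shape
    A = np.hstack([R, r[:, None]])      # augmented matrix [R | r]
    cols = list(range(n))               # original position of each column
    m = 0                               # rank found so far (rk R)
    for k in range(min(NR, n)):
        sub = np.abs(A[k:, k:n])
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] < eps:             # step m reached: rk R = k
            break
        A[[k, k + i]] = A[[k + i, k]]           # row switch
        A[:, [k, k + j]] = A[:, [k + j, k]]     # column switch
        cols[k], cols[k + j] = cols[k + j], cols[k]
        A[k] /= A[k, k]                         # reduce column k to
        for t in range(NR):                     # column k of an identity
            if t != k:
                A[t] -= A[t, k] * A[k]
        m = k + 1
    A1, b1 = A[:m, m:n], A[:m, n]
    b2 = A[m:, n]                       # ~0 for redundant rows
    inconsistent = [t for t in range(NR - m) if abs(b2[t]) > eps]
    # 2nd series: Q* = [-A1 ; I], q* = [b1 ; 0], then undo the column
    # rearrangement row-wise so Q, q match the original coefficient order.
    Qstar = np.vstack([-A1, np.eye(n - m)])
    qstar = np.concatenate([b1, np.zeros(n - m)])
    Q, q = np.zeros((n, n - m)), np.zeros(n)
    for pos, c in enumerate(cols):
        Q[c], q[c] = Qstar[pos], qstar[pos]
    return Q, q, m, inconsistent
```

For any consistent set of restrictions the output satisfies RQ = 0 and Rq = r, so every δ = Qδ(1) + q obeys Rδ = r; a redundant row simply lowers the reported rank, and an inconsistent row is flagged for correction.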
Computation of Q₂ and q₂

The above procedure also gives a method of separating out n − rk R coefficients which may be estimated directly, with the remaining rk R coefficient estimates then solved for from these estimates. Let Q₂ be formed from Q, and q₂ be formed from q, by deleting the (n − rk R)×(n − rk R) identity submatrix of Q and the corresponding rows of q.¹ Let δ(1) be the coefficients corresponding to the rows of the identity submatrix which were deleted from the Q matrix and let δ(2) be the coefficients corresponding to the rows of Q₂; then δ̂(1)_RGLS may be calculated directly by (IV.6), and δ̂(2)_RGLS may be calculated by (IV.7) from Q₂, q₂, and δ̂(1)_RGLS.

In the calculation of Q and q, it is not actually necessary to search for the largest element in the remaining submatrix at any step. The rows could be taken in turn and the first non-zero element encountered used for R_{ik,jk}; however, the extra searching and non-sequential selection of rows will reduce rounding error for many problems.

Proof of (IV.5), (IV.6), and (IV.7)

The method of deriving Q, Q₂, q, and q₂ forms most of the proof of the formulas. In the first computational method, we reduced the [R ⋮ r] matrix to the reduced augmented matrix [I A1 b1 ; 0 A2 b2], where A2 and b2 are within rounding error of zero (assuming no inconsistency in the original set of restrictions). Only row operations were used in reducing the [R ⋮ r] matrix; therefore, the above reduced augmented matrix incorporates the full set of restrictions. In fact, if A2 and b2 are within rounding error of zero, the full set of restrictions is contained in the [I ⋮ A1 ⋮ b1] matrix, which contains rk R rows. Thus, the restrictions may be expressed as:

(IV.11)  [I ⋮ A1]δ* = b1

¹Q₂ may be formed directly from −A1, and q₂ may be formed directly from b1, by rearranging the rows of −A1 and b1, respectively.
where the * denotes that the coefficients in the δ vector have been rearranged into the same order as the columns of [I ⋮ A1]. For notational convenience, let us respecify the order of variables in the original problem so that they are in the same order as the columns of [I ⋮ A1]. Thus, under the renumbering, δ* = δ and b1 = q₂. (IV.11) may be rewritten as:

(IV.12)  [I ⋮ A1]δ = [I ⋮ A1][δ(2) ; δ(1)] = δ(2) + A1δ(1) = q₂

or

(IV.13)  δ(2) = −A1δ(1) + q₂ = Q₂δ(1) + q₂

since Q₂ = −A1 under our revised numbering of variables. Our basic model is now:

(IV.14)  y = [X2 ⋮ X1][δ(2) ; δ(1)] + u

subject to

(IV.15)  δ(2) = Q₂δ(1) + q₂

or, substituting for δ(2) into (IV.14):

(IV.16)  y = [X2 ⋮ X1][Q₂δ(1) + q₂ ; δ(1)] + u

or

(IV.17)  y − X2q₂ = [X2 ⋮ X1][Q₂ ; I]δ(1) + u

or

(IV.18)  y − Xq = XQδ(1) + u

since q = [q₂ ; 0] and Q = [Q₂ ; I] under our revised numbering. Applying the GLS formula to (IV.18) [letting y − Xq and XQ be the y and X, respectively, of (IV.2)] we get:

(IV.19)  δ̂(1) = [Q'X'Σ̇⁻¹XQ]⁻¹[Q'X'Σ̇⁻¹(y − Xq)] = [Q'X'Σ̇⁻¹XQ]⁻¹[Q'X'Σ̇⁻¹y − Q'X'Σ̇⁻¹Xq]

which is exactly formula (IV.6). If we replace δ(1) in (IV.15) by δ̂(1), we have δ̂(2) expressed as in formula (IV.7). Further,

(IV.20)  δ̂ = [δ̂(2) ; δ̂(1)] = [Q₂δ̂(1) + q₂ ; δ̂(1)] = Qδ̂(1) + q;

hence, substituting for δ̂(1) we get:

δ̂ = Q[Q'X'Σ̇⁻¹XQ]⁻¹[Q'X'Σ̇⁻¹y − Q'X'Σ̇⁻¹Xq] + q,

which is exactly (IV.5).

To derive the variance formula (IV.8) we note that (by IV.20):¹

(IV.21)  Var(δ̂) = Q[Var(δ̂(1))]Q'

But by the GLS formula for variance (IV.3), substituting XQ from (IV.18) for the X in (IV.1):

(IV.22)  Var(δ̂(1)) = [Q'X'Σ̇⁻¹XQ]⁻¹

Substituting this into (IV.21) we get (IV.8).

¹If y = Ax + b with A a matrix of fixed elements and b a vector of fixed elements, then Var(y) = A[Var(x)]A'.
2. Relationship to another restriction formula

If the further assumptions that R has full row rank and X has full column rank are imposed, an alternative formula for δ̂_RGLS is given by:

(IV.23)  δ̂_RGLS = δ̂_GLS − (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹[Rδ̂_GLS − r]

In order to show the relationship between this and our previous formula for δ̂_RGLS, we will derive (IV.23) from (IV.5) using the above additional assumptions (rk R = N_R and rk X = n).¹

Derivation of (IV.23)

Since rk R = N_R, we may reorder columns and variables if necessary and partition R as [R1 ⋮ R2] with R1 square and nonsingular. Then application of row operations to reduce R = [R1 ⋮ R2] to [I ⋮ A1] is equivalent to premultiplying R by a non-singular matrix C such that CR1 = I. Then,

(IV.24)  RQ = C⁻¹CRQ = C⁻¹[I ⋮ A1]Q = C⁻¹[−A1 + A1] = C⁻¹·0 = 0.

Thus, the columns of Q are orthogonal to the rows of R.² Before going further we will find it useful to establish the following lemma:

(IV.25) Lemma: Let E and F be square symmetric idempotent matrices of order p with EF = FE = 0, rk E = n, and rk F = p − n; then E + F = I.

Proof: Let E + F = G.

¹A method for deriving (IV.23) using Lagrangian multipliers is given in Stroud and Zellner [1962]. Another derivation of (IV.23) is given in Chipman and Rao [1964a]. The Ψ matrix and α vector of Chipman and Rao are the same as our R matrix and r vector, respectively. Our Q matrix satisfies the requirements for their G matrix and our q vector satisfies the requirements for their Gα vector. (Chipman and Rao do not present an actual computational scheme for Ψ, G, or α.) Using these substitutions, the essence of (IV.5) is contained as an intermediate step of Chipman and Rao's derivation of (IV.23). Since the proof of (IV.5) given in this paper differs from Chipman and Rao's, the derivation of (IV.23) from (IV.5) differs from Chipman and Rao's also.
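Before completing the algebraic derivation, the claimed equivalence can be checked numerically. The sketch below implements both formulas; the function names are illustrative, and the Q and q in the usage example are a hand-built pair matching the single restriction R = [1 −1 0], r = 0 (i.e., δ1 = δ2). When rk R = N_R and rk X = n, the two routines agree to rounding error.

```python
import numpy as np

def rgls_R_formula(X, y, Sigma, R, r):
    # The R formula (IV.23): the unrestricted GLS estimate adjusted
    # toward the restrictions; requires rk R = N_R and rk X = n.
    Si = np.linalg.inv(Sigma)
    Ainv = np.linalg.inv(X.T @ Si @ X)              # (X' Sigma^-1 X)^-1
    d_gls = Ainv @ (X.T @ Si @ y)                   # unrestricted GLS
    adj = np.linalg.solve(R @ Ainv @ R.T, R @ d_gls - r)
    return d_gls - Ainv @ R.T @ adj

def rgls_Q_formula(X, y, Sigma, Q, q):
    # The Q formula (IV.5), restated here so the two can be compared;
    # it never needs an inverse larger than order n - rk R.
    Si = np.linalg.inv(Sigma)
    A = X.T @ Si @ X
    d1 = np.linalg.solve(Q.T @ A @ Q, Q.T @ (X.T @ Si @ y - A @ q))
    return Q @ d1 + q
```

Note that the Q formula also runs when R has redundant rows or X is rank deficient, situations in which the matrix R(X'Σ̇⁻¹X)⁻¹R' of the R formula becomes singular.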
Then G is symmetric idempotent and has rank n + (p − n) = p.³ Thus, G has full row and column rank; hence, G⁻¹ exists. Then

E + F = G = GI = G(GG⁻¹) = GG(G⁻¹) = GG⁻¹ = I.

(End of Proof of Lemma)

From (IV.5) we have:

(IV.26)  δ̂_RGLS = Q[Q'X'Σ̇⁻¹XQ]⁻¹[Q'X'Σ̇⁻¹y − Q'X'Σ̇⁻¹Xq] + q
               = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)[(X'Σ̇⁻¹X)⁻¹X'Σ̇⁻¹y − q] + q
               = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)[δ̂_GLS − q] + q⁴

Let

E = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)   and   F = (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R.

Then E and F are symmetric idempotent with EF = FE = 0, since:

(1) EE = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X) = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X) = E

(2) FF = (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R(X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R = (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R = F

(3) EF = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X)(X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R = Q[Q'X'Σ̇⁻¹XQ]⁻¹Q'R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R = 0, since Q'R' = 0.

(4) FE = (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹RQ[Q'X'Σ̇⁻¹XQ]⁻¹Q'(X'Σ̇⁻¹X) = 0, since RQ = 0.

To apply the previous lemma, we need to derive the ranks of E and F. In the process of deriving these ranks we will point out two additional assumptions about rank that need to be made [see (4) and (8) below]. Those willing to assume that rk E = n − rk R and rk F = rk R may skip the derivation of the ranks of E and F in (1) through (10) below.

(1) Σ̇ was assumed positive definite, hence of rank T.

(2) X was assumed to have full column rank (rk X = n).

(3) Since Σ̇ is positive definite, Σ̇⁻¹ will be positive definite; therefore, Σ̇⁻¹ can be expressed as Σ̇⁻¹ = P'P where P is non-singular.

²By the definition of orthogonal complement given in Koopmans, Rubin and Leipnik [1950], p. 89, Q is the orthogonal complement of R; that is, Q has full column rank and satisfies RQ = 0.

³For symmetric idempotent matrices E and F: (E + F)² = E + F if and only if EF = FE = 0. Also, EF = FE = 0 implies rk(E + F) = rk E + rk F. See Chipman and Rao [1964b].

⁴Each inserted expression of the form BB⁻¹ in (IV.26) is a matrix times its inverse, hence an identity matrix which may be suppressed.
Thus, rk(X'Σ̇⁻¹X) = rk(X'P'PX) = rk(PX) = rk X = n.¹

(4) rk(XQ) ≤ min(rk X, rk Q) = n − rk R. We will assume that rk(XQ) = n − rk R.²

(5) rk[Q'X'Σ̇⁻¹XQ] = rk[Q'X'P'PXQ] = rk(PXQ) = rk(XQ) = n − rk R.

(6) rk(Q[Q'X'Σ̇⁻¹XQ]⁻¹Q') = rk[Q'X'Σ̇⁻¹XQ]⁻¹ = n − rk R. The first equality comes from, first, rk(Q[Q'X'Σ̇⁻¹XQ]⁻¹Q') cannot exceed n − rk R, since the rank of a product of matrices cannot exceed the rank of any matrix in the product; second, rk(Q[Q'X'Σ̇⁻¹XQ]⁻¹Q') cannot be less than n − rk R, since partitioning Q into Q₂ and the identity submatrix shows that Q[Q'X'Σ̇⁻¹XQ]⁻¹Q' contains [Q'X'Σ̇⁻¹XQ]⁻¹ itself as a submatrix, and the rank of any matrix is greater than or equal to the rank of any submatrix in the matrix. The second equality comes from (5).

(7) rk E = rk(Q[Q'X'Σ̇⁻¹XQ]⁻¹Q') = n − rk R. The first equality comes from (X'Σ̇⁻¹X) being a non-singular matrix, and the second equality comes from (6).

(8) (IV.23) assumes that R(X'Σ̇⁻¹X)⁻¹R' is non-singular; i.e., that [R(X'Σ̇⁻¹X)⁻¹R']⁻¹ exists. This implies that rk[R(X'Σ̇⁻¹X)⁻¹R'] = rk R.

(9) rk(R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R) = rk R. To see this, let B1 = [R(X'Σ̇⁻¹X)⁻¹R']⁻¹. Then R'B1R = [C⁻¹CR]'B1[C⁻¹CR] = (CR)'(C⁻¹)'B1C⁻¹(CR), where C is the matrix used in (IV.24). Let B2 = (C⁻¹)'B1C⁻¹. Then rk B2 = rk B1 = rk R, since (C⁻¹)' and C⁻¹ are non-singular. Now (CR)'B2(CR) = [I ⋮ A1]'B2[I ⋮ A1] = [B2, B2A1 ; A1'B2, A1'B2A1], which has the same rank as its submatrix B2 for the same reason as the explanation in (6).

(10) rk F = rk(R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R) = rk R. The first equality comes from (X'Σ̇⁻¹X)⁻¹ being a non-singular matrix, and the second equality comes from (9).

¹For any matrix A, rk(A'A) = rk(AA') = rk A = rk A'. Also, if B1 and B2 are any non-singular matrices of order compatible with A (A may be rectangular), rk(B1A) = rk(AB2) = rk A.

²rk(XQ) = rk(X2Q₂ + X1), where rk X1 = n − rk R.
Since E and F are square idempotent matrices of order n with EF = FE = 0, rk E = n − rk R, and rk F = rk R, then by the above lemma (IV.25), E + F = I, or E = I − F. Substituting I − F for E into (IV.26) we have:

δ̂_RGLS = [I − (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹R][δ̂_GLS − q] + q
       = [δ̂_GLS − q] − (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹[Rδ̂_GLS − Rq] + q
       = δ̂_GLS − (X'Σ̇⁻¹X)⁻¹R'[R(X'Σ̇⁻¹X)⁻¹R']⁻¹[Rδ̂_GLS − r]

(since r = Rδ̂_RGLS = R[Qδ̂(1) + q] = RQδ̂(1) + Rq = 0·δ̂(1) + Rq = Rq), which is exactly equation (IV.23).

Advantages of the Q Formula over the R Formula

Formula (IV.5) (which we will call the Q formula) has the following advantages over formula (IV.23) (which we will call the R formula):

(1) X need not have full column rank if the Q formula is used but must have full column rank for the R formula. Thus, the Q formula may permit calculation of δ̂_RGLS when a unique unrestricted δ̂_GLS does not exist; to use the R formula, however,
(6) Use of the Q formula permits a unified treatment of restrictions on coefficients for DLS, ZA, IZA (iterative Zellner-Aitken), 3SLS, I3SLS (iterative three-stage least squares), SML (limited information subsystem maximum likelihood), LML (linearized maximum likelihood), and FIML (full information maximum likelihood), as will become evident as each of these methods is discussed.

The Q formula has the disadvantage that the Q matrix and q vector must be calculated, but this is only a small task which can be accomplished rapidly on a computer.

C. Restrictions Imposed on Direct Least Squares Coefficients

DLS estimation is the particular case of GLS estimation in which Σ is assumed to be of the form σ²I, which is known except for a multiplicative constant (σ²) as required by the GLS model. Thus, if the model is written as:

(IV.27) y = X_μδ + u ,

the DLS estimator is

(IV.28) δ̂_DLS = [X_μ'X_μ]⁻¹X_μ'y

and the variance of δ̂_DLS is given by

(IV.29) Var(δ̂_DLS) = σ²[X_μ'X_μ]⁻¹ .

Assuming that X_μ contains only fixed variables, an unbiased estimate of σ² is¹

(IV.30) σ̂² = û'_DLS û_DLS/(T − n) .

If the restrictions

(IV.31) Rδ = r   (R being N_R×n, δ being n×1, and r being N_R×1)

are imposed on the coefficients, the RDLS (restricted DLS) model is the particular case of the RGLS model in which Σ is assumed to be of the form σ²I. Thus, the restricted DLS solution may be obtained by substituting X_μ for X and σ²I for Σ in RGLS formula (IV.5), which gives:

(IV.32) δ̂_RDLS = Q{[Q'(X_μ'X_μ)Q]⁻¹Q'[X_μ'y − (X_μ'X_μ)q]} + q

where Q and q are calculated from R and r in the manner given previously for the RGLS model. Q is an n×(n − rk R) matrix, and q is an n×1 vector.

¹Goldberger [1964], p. 268 states that the variables in X may be stochastic provided they are distributed independently of u.
As with the GLS model, the use of (IV.32) obtains the same result as if (1) rk R coefficients are solved out in terms of the remaining n − rk R coefficients, (2) the n − rk R coefficients are estimated by DLS, and (3) the rk R coefficients which were solved out are calculated from the n − rk R coefficients which were estimated directly. Provided [Q'X_μ'X_μQ]⁻¹ exists [this inverse will exist if rk(X_μQ) = n − rk R], the RDLS estimates are unique. The solution obtained is the solution which minimizes (by choice of δ) (y − X_μδ)'(y − X_μδ) subject to Rδ = r.

¹The multiplicative constant σ² cancels out, since

δ̂_RDLS = Q{[Q'(X_μ'(1/σ²)X_μ)Q]⁻¹Q'[X_μ'(1/σ²)y − (X_μ'(1/σ²)X_μ)q]} + q
        = Q{σ²(1/σ²)[Q'(X_μ'X_μ)Q]⁻¹Q'[X_μ'y − (X_μ'X_μ)q]} + q
        = Q{[Q'(X_μ'X_μ)Q]⁻¹Q'[X_μ'y − (X_μ'X_μ)q]} + q .

²Calculation of the Q matrix and q vector also gives a means of separating out the rk R coefficients, δ̂⁽²⁾_RDLS, which may be calculated from the remaining n − rk R "unrestricted" coefficients, δ̂⁽¹⁾_RDLS. Thus, the following pair of formulas are together equivalent to (IV.32):

δ̂⁽¹⁾_RDLS = [Q'(X_μ'X_μ)Q]⁻¹Q'[X_μ'y − (X_μ'X_μ)q]
δ̂⁽²⁾_RDLS = Q₂δ̂⁽¹⁾_RDLS + q₂

where Q₂ and q₂ are the subparts of Q and q noted earlier.

The variance-covariance matrix for δ̂_RDLS is

(IV.33) Var(δ̂_RDLS) = σ²Q[Q'(X_μ'X_μ)Q]⁻¹Q'

with an unbiased estimate of σ² being

(IV.34) σ̂²_RDLS = û'_RDLS û_RDLS/(T − n + rk R)

provided X_μ is a matrix of "fixed" variables.¹

If we relax our assumptions to those of the double k-class model, permitting, in particular, jointly dependent variables in the matrix X_μ, then the RDLS estimates will have the same properties as the DLS estimates noted in the double k-class section. Although the formulas within this section no longer provide unbiased or even consistent estimates, δ̂_RDLS will still be the δ̂ which minimizes û'û (subject to the restrictions, of course), and σ̂²_RDLS will still be the estimate of σ² which would be obtained if the restrictions were solved out and then the usual DLS formulas applied.

¹Or the variables in X_μ are distributed independently of u.
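As a concrete sketch (hypothetical data; the restriction equates the first two coefficients), the Q formula (IV.32), the R formula (IV.23) specialized to Σ = σ²I, and the solve-out procedure all give the same restricted estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 25, 3
X = rng.standard_normal((T, n))                 # X_mu, full column rank here
y = X @ np.array([2.0, 2.0, -1.0]) + 0.1 * rng.standard_normal(T)

R = np.array([[1.0, -1.0, 0.0]])                # restriction b1 - b2 = 0
r = np.array([0.0])
q = np.linalg.pinv(R) @ r                       # a particular solution of R q = r (here q = 0)
_, _, Vt = np.linalg.svd(R)
Q = Vt[1:].T                                    # columns span the null space of R

XtX, Xty = X.T @ X, X.T @ y

# Q formula (IV.32)
d_q = Q @ np.linalg.solve(Q.T @ XtX @ Q, Q.T @ (Xty - XtX @ q)) + q

# R formula (IV.23) with Sigma = s^2 I (s^2 cancels)
d_dls = np.linalg.solve(XtX, Xty)
C = np.linalg.inv(XtX)
d_r = d_dls - C @ R.T @ np.linalg.solve(R @ C @ R.T, R @ d_dls - r)

# solving the restriction out: y = b1 (x1 + x2) + b3 x3 + u
Xs = np.column_stack([X[:, 0] + X[:, 1], X[:, 2]])
b = np.linalg.solve(Xs.T @ Xs, Xs.T @ y)
d_solved = np.array([b[0], b[0], b[1]])

assert np.allclose(d_q, d_r) and np.allclose(d_q, d_solved)

u = y - X @ d_q
s2 = (u @ u) / (T - n + 1)                      # unbiased estimate (IV.34), rk R = 1
print(d_q.round(3), round(s2, 4))
```

The same pattern with a general positive definite Σ reproduces the RGLS comparison of the previous section.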
D. Restrictions Imposed on Two-stage Least Squares Coefficients

Derivation of 2SLS as a GLS Method

In the 2SLS model used earlier, we considered an equation of the form:

(IV.35) y = Z_μδ + u

with Var(u) = σ²I. If (IV.35) is premultiplied by X_I', we have¹

(IV.36) X_I'y = X_I'Z_μδ + X_I'u

with Var(X_I'u) = X_I'[Var(u)]X_I = X_I'[σ²I]X_I = σ²X_I'X_I, provided that we assume that the X_I matrix contains "fixed" variables only. If GLS is applied to (IV.36) by using X_I'y as the GLS y, X_I'Z_μ as the GLS X, and X_I'u as the GLS u in the GLS computational formula (IV.2), the following GLS estimator is obtained:²

(IV.37) δ̂ = [Z_μ'X_I(σ²X_I'X_I)⁻¹X_I'Z_μ]⁻¹[Z_μ'X_I(σ²X_I'X_I)⁻¹X_I'y]
          = [Z_μ'X_I(X_I'X_I)⁻¹X_I'Z_μ]⁻¹[Z_μ'X_I(X_I'X_I)⁻¹X_I'y] .

But, by (I.36), this is

δ̂ = [ [Y]'_∥X_I [Y]_∥X_I , [Y]'_∥X_I [X_μ]_∥X_I ; [X_μ]'_∥X_I [Y]_∥X_I , [X_μ]'_∥X_I [X_μ]_∥X_I ]⁻¹ [ [Y]'_∥X_I y ; [X_μ]'_∥X_I y ]

and, since the variables in X_μ are contained in X_I, this is [see (I.40)]:

δ̂ = δ̂_2SLS = [ [Y]'_∥X_I [Y]_∥X_I , [Y]'_∥X_I X_μ ; X_μ'[Y]_∥X_I , X_μ'X_μ ]⁻¹ [ [Y]'_∥X_I y ; X_μ'y ] .

Thus, 2SLS may be derived by an application of GLS. Not all of the assumptions of the GLS model are met; hence, δ̂_2SLS does not have all of the desirable properties that δ̂_GLS has. Particularly affecting the properties of 2SLS is the fact that the m jointly dependent variables in the X_I'Y matrix (a submatrix of X_I'Z_μ) are asymptotically correlated with the disturbance X_I'u.

¹As before, we assume that the matrix of instruments, X_I, includes the variables in X_μ.

²This derivation of 2SLS is given in Zellner and Theil [1962], p. 56.

Calculation of 2SLS When Restrictions are Imposed on the Coefficients

Since 2SLS may be derived as a case of GLS, it is natural to question whether restrictions can be imposed on the coefficients and a restricted 2SLS estimator derived as an application of RGLS. If we denote:

(IV.38) A = [ [Y]'_∥X_I [Y]_∥X_I , [Y]'_∥X_I X_μ ; X_μ'[Y]_∥X_I , X_μ'X_μ ] ,  b = [ [Y]'_∥X_I y ; X_μ'y ] ,
then the GLS-2SLS solution (IV.37) is:

(IV.39) δ̂ = A⁻¹b .

If we impose the restrictions

(IV.40) Rδ = r

on the coefficients of (IV.35), then the RGLS solution corresponding to (IV.39) [we will denote this solution as the R2SLS solution] is:¹

(IV.41) δ̂_R2SLS = Q{[Q'AQ]⁻¹Q'[b − Aq]} + q

where Q and q are calculated from R and r in the same way as in the usual RGLS model. Q is an n×(n − rk R) matrix and q is an n×1 vector.

However, the coefficients obtained through use of the computational method given above may differ from the coefficients obtained if the restrictions, (IV.40), are used to reduce the number of coefficients to be estimated and then 2SLS is used to estimate the remaining coefficients. One reason that the coefficients may differ is that if the usual procedure of using the restrictions to reduce the number of coefficients to be estimated is followed, predetermined variables are often linearly combined with jointly dependent variables, and the newly constructed variables (those that are linear combinations of jointly dependent and predetermined variables) are then labeled jointly dependent. The predetermined variables which are linearly combined with jointly dependent variables then no longer occur in the X_μ matrix. Since these variables are no longer in the X_μ matrix, unless they are specifically entered into the X** matrix (X_I = [X_μ : X**]), the space spanned by the X_I matrix formed after the coefficients are solved into the equation is likely to be a proper subspace of the space spanned by the X_I matrix which would have been formed before the coefficients were solved into the equation. Hence, [Y]_∥X_I and [y]_∥X_I are likely to change.

¹Var(δ̂_R2SLS) = σ²Q[Q'AQ]⁻¹Q', with a consistent estimate of σ² being given by:

σ̂²_R2SLS = û'_R2SLS û_R2SLS/(T − n + rk R) .

If T were substituted for T − n + rk R in the denominator, the resulting σ̂²_R2SLS would still be consistent.
Another reason why the resulting coefficients may differ is that in using restrictions to reduce the number of coefficients, a set of predetermined variables may be linearly combined into a single predetermined variable; hence, the space spanned by X_I may again change.

To make the above remarks clearer, we will illustrate the effect of a restriction on two coefficients which effectively linearly combines a jointly dependent and a predetermined variable if the usual procedure is applied. Suppose that the equation to be estimated is:

(IV.42) y₁ = α₁y₂ + α₂x₂ + α₃x₃ + u

and the restriction

(IV.43) α₁ − α₂ = 0

is imposed on the coefficients in the equation. Then, y = y₁, Y = y₂, and X_μ = [x₂, x₃]. Suppose also that the R2SLS coefficients are calculated with the matrix of instruments being

(IV.44) X_I = [X_μ : X**] = [x₂, x₃ : x₄, x₅, x₆]

and that (for simplicity) X_I has full column rank (i.e., that none of the variables in X_I can be expressed as a linear combination of the remaining variables in X_I). Let the coefficients obtained through application of the R2SLS formulas be denoted [α̂₁]_R2SLS, [α̂₂]_R2SLS, and [α̂₃]_R2SLS.

As an alternative means of estimating the coefficients, let us use the usual procedure of using the restrictions to reduce the number of coefficients to be estimated. We get:

(IV.45) y₁ = α₁(y₂ + x₂) + α₃x₃ + u .

Thus, y = y₁, Y = y₂ + x₂, X_μ = x₃, and if X** is left unchanged, then

(IV.46) X_I = [X_μ : X**] = [x₃ : x₄, x₅, x₆] .

Application of the usual 2SLS formulas to (IV.45) will give a solution which we will designate the 2SLSR solution. [α̂₁]_2SLSR is not in general equal to [α̂₁]_R2SLS, and [α̂₃]_2SLSR is not in general equal to [α̂₃]_R2SLS, since the space of X_I has been restricted by omitting x₂.
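A numerical sketch of this example is given below (the data for x₂, …, x₆ and the disturbances are hypothetical). It applies the R2SLS formula (IV.41) to (IV.42) with the instruments (IV.44), and compares the result with 2SLS applied to the solved-out equation (IV.45) when x₂ is kept among the instruments; in that case the two agree, which is the first of the conditions noted next.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 50
x2, x3, x4, x5, x6 = (rng.standard_normal(T) for _ in range(5))
ytwo = x4 + 0.5 * x5 - x6 + rng.standard_normal(T)        # plays the role of y2
y1 = 0.7 * ytwo + 0.7 * x2 - 1.2 * x3 + rng.standard_normal(T)

def proj(M):
    """Projection matrix onto the column space of M."""
    return M @ np.linalg.solve(M.T @ M, M.T)

# R2SLS on (IV.42) with the instruments (IV.44): X_I = [x2, x3 : x4, x5, x6]
Z = np.column_stack([ytwo, x2, x3])
P = proj(np.column_stack([x2, x3, x4, x5, x6]))
A, b = Z.T @ P @ Z, Z.T @ P @ y1                           # the A and b of (IV.38)
R = np.array([[1.0, -1.0, 0.0]])                           # (IV.43): a1 - a2 = 0, so q = 0
_, _, Vt = np.linalg.svd(R)
Q = Vt[1:].T                                               # columns span the null space of R
d_r2sls = Q @ np.linalg.solve(Q.T @ A @ Q, Q.T @ b)        # (IV.41) with q = 0

# 2SLS on the solved-out equation (IV.45), keeping x2 among the instruments
Z2 = np.column_stack([ytwo + x2, x3])
P2 = proj(np.column_stack([x3, x4, x5, x6, x2]))           # same span as X_I above
d_2slsr = np.linalg.solve(Z2.T @ P2 @ Z2, Z2.T @ P2 @ y1)

assert np.allclose(d_r2sls, [d_2slsr[0], d_2slsr[0], d_2slsr[1]])
print("restricted coefficients:", d_r2sls.round(3))
```

Dropping x₂ from the second instrument list, as in (IV.46), makes the two solutions differ, which is the point of the example.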
The coefficients obtained in estimating (IV.42) subject to (IV.43) by the R2SLS formula (IV.41) would be the same as the coefficients obtained from estimating (IV.45) if:

(1) x₂ were added to X** in estimating (IV.45) (i.e., the X_I given in (IV.46) were changed to [x₃ : x₄, x₅, x₆, x₂]); therefore, the R2SLS coefficients would be obtained in both cases;¹ or if

(2) x₂ were listed as jointly dependent instead of predetermined and omitted from X_I (i.e., the X_I given in (IV.44) were changed to [x₃ : x₄, x₅, x₆]); therefore, the 2SLSR coefficients would be obtained in both cases.

The preceding does not imply that either the δ̂_R2SLS solution or the δ̂_2SLSR solution is incorrect. It merely shows the importance of the particular instruments selected.

¹The R2SLS coefficients would be obtained because [y₂ + x₂]_∥X_I = [y₂]_∥X_I + [x₂]_∥X_I [see (I.62)], which (if x₂ is contained in X_I) equals [y₂]_∥X_I + x₂ [see (I.38)].

PART II

MULTIPLE EQUATIONS METHODS

CHAPTER V

FULL INFORMATION MAXIMUM LIKELIHOOD (FIML)

A. Properties of the Full Information Maximum Likelihood Estimator

The full information maximum likelihood (FIML) estimator which is considered in this section is maximum likelihood if, in addition to the basic statistical assumptions of this paper (section I.C.3), we add the assumption that the matrix of disturbances, U, has the multivariate normal distribution. If the matrix of disturbances is not normally distributed, then the FIML computational formulas given in this section give estimates which have been termed quasi-maximum likelihood estimates.¹ Quasi-maximum likelihood estimates may still possess some desirable properties.

¹See Koopmans and Hood [1953], pp. 144-147.

The FIML estimator is a "full information" estimator in the sense that account is taken of all structural equations in the system (including identity equations) in deriving estimates of the population coefficients. In the single equation techniques of Part I, consideration was given to the structure of only a single equation at a time.
For some of these single equation techniques the predetermined variables in equations other than the equation being estimated were used, but no account was taken of the structure of the remaining equations. In the FIML method the coefficients of all stochastic equations are estimated simultaneously, a distinction being made between jointly dependent and predetermined variables in each equation, and explicit account being taken of all structural coefficient restrictions and any identity equations which may complete the system.¹

The FIML method may only be applied if the number of jointly dependent variables in the system equals the number of equations including identity equations.

Recording the particular variables which are said to occur in an equation is equivalent to restricting to zero the coefficients of the equation corresponding to the remaining variables in the system. The coefficient of one jointly dependent variable in the equation is also restricted to −1 to provide a normalization rule (see section I.C.1).² Initially, these are the only types of restriction that will be permitted on the coefficients; however, in section V.F FIML estimation will be generalized to take account of arbitrary linear restrictions imposed on the coefficients. In no place in this paper is consideration given to FIML estimation with restrictions imposed on the covariance matrix of the disturbances. In particular, not considered is the much simpler computational method called full-information diagonal (often abbreviated to FID) which is obtained by assuming that all off-diagonal elements of the disturbance variance-covariance matrix are zero. Thus, the FIML procedure developed in this paper is the full-information non-diagonal procedure (often denoted elsewhere as FIND).

¹Actually, the coefficients of predetermined variables in the identity equations are not used explicitly in the computation of the structural coefficients; however, if they were not known, the equation would not be a true identity equation. Thus, an identity equation which contains no jointly dependent variable adds no information to the system and may therefore be deleted. Also, the specification of the model is not changed if all predetermined variables in an identity equation are multiplied by their respective known coefficients and combined into a single predetermined variable.

²As noted in section I.C.1, for FIML estimation it makes no substantive difference which jointly dependent variable is singled out as the normalizing jointly dependent variable. In section V.F the normalization rule is also generalized.

Although considerable progress has been made in developing FIML procedures which permit estimation of equations in which coefficients enter in a non-linear fashion,¹ consideration in this paper will be restricted to estimation of a system in which coefficients enter each equation in a linear fashion.

In this chapter, we will assume that the system is identified. This means that each equation must be not only just-identified or over-identified in the single equation sense usually treated, but in a multiple equation sense as well.² Although some additional requirements must be met over and above those required for identification for single equation estimation, the single equation identification rules applied separately to each equation provide a good starting point.

A misconception fairly generally held is that there must be more observations in the sample than the number of coefficients in the system to be estimated. This is certainly not true.
Provided that there are sufficient observations that the estimated disturbance variance-covariance matrix is not singular (there must be more observations than the number of equations with disturbances), only sufficient observations that DLS can be applied to each equation separately is all that is in general required.³

¹Eisenpress has made considerable progress in this area. See Eisenpress and Greenstadt [1964].

²Identification is treated in the multiple equations sense in Koopmans, Rubin, and Leipnik [1950].

³That the estimated disturbance variance-covariance matrix, S, will be singular if the number of observations is less than the number of equations is easily shown. As noted further on in (V.19), S = (1/T)α̂_I(Z'Z)α̂_I' = (1/T)Û'Û; hence, rk S = rk Û. If T < M, then rk Û and therefore rk S will be less than M; i.e., S will be singular. Since S is calculated in the same manner for limited information subsystem maximum likelihood (SML), the Zellner-Aitken estimator (ZA), and three-stage least squares (3SLS), the requirement that there not be fewer observations than stochastic equations applies to these methods as well.

(At this point the reader may want to review the notation developed in section I.C for expressing a system of equations, including the notation related to the reduced form equations.)

Another property which is certainly worth noting applies to the reduced form coefficients estimated from the FIML structural coefficients. Let, as before, the reduced form equations be expressed as Y = XΠ' + V. Let Π̂ be any estimate of Π which is not inconsistent with the restrictions imposed on the coefficients of the structural equations. Also, let V̂ = Y − XΠ̂' be the matrix of reduced form residuals and (1/T)V̂'V̂ be the estimated variance-covariance matrix of the reduced form disturbances. Then if Π̂_FIML is calculated from the estimated FIML coefficients of the structural equations, i.e., Π̂_FIML = −Γ̂⁻¹_FIML B̂_FIML, the resulting det((1/T)V̂'_FIML V̂_FIML) will be less than or equal to the det((1/T)V̂'V̂) obtained from any set of reduced form coefficients which are not inconsistent with the restrictions imposed on the coefficients of the structural equations.⁴ Thus, of the estimating procedures which take the full information of the system into account, the FIML method gives the minimum estimated generalized variance of the disturbances of the reduced form equations.⁵ For this reason, it is common to refer to FIML estimates as least generalized variance (LGV) estimates. The LGV property does not, of course, rely on an assumption of normality. It is a property similar to the least squares property for a single equation.

⁴For this to be a meaningful statement we must assume that identity equations have been incorporated into the system by solving out one jointly dependent variable for each identity equation (thereby imposing less convenient restrictions, as will be noted further on). Otherwise,

det((1/T)V̂'V̂) = det((1/T){[Û : 0]Γ̂'⁻¹}'{[Û : 0]Γ̂'⁻¹}) = det((1/T)Γ̂⁻¹[ Û'Û , 0 ; 0 , 0 ]Γ̂'⁻¹) = 0

for any method meeting the restrictions of the structural equations. (Use of identity equations to solve out jointly dependent variables is necessary to explicitly specify properties only. In the computational procedure which is presented, the identity equations are explicitly recognized by the computational procedure rather than used to eliminate jointly dependent variables from the system.)

⁵See Goldberger [1964], pp. 352-354.

B. Derivation of the Likelihood Function to be Maximized

Before continuing, let us designate what is meant by maximizing the likelihood function.
Our equation system is a system of M equations containing disturbances and G − M identity equations, which may be written as

(V.1) Zα' + [U : 0] = 0

or equivalently as

(V.2) αZ' + [U : 0]' = 0' .

The matrix of coefficients, α, may be subdivided into the matrix of coefficients of jointly dependent variables, Γ, and the matrix of coefficients of predetermined variables, B. A further subdivision can be made on the basis of whether the coefficients are coefficients of stochastic equations or coefficients of identity equations (hence are known constants). Thus, α may be subdivided as:

(V.3) α = [Γ : B] = [ α_I ; α_II ] = [ Γ_I B_I ; Γ_II B_II ]

where α_I = [Γ_I : B_I] represents the coefficients of the M stochastic equations and α_II = [Γ_II : B_II] represents the coefficients of the G − M identity equations.

As a step in the derivation of the likelihood function, we will use the G − M identity equations to temporarily eliminate G − M jointly dependent variables from the system. (The eliminated variables will be reentered into the system at a later step.) Suppose we divide our jointly dependent variables into two groups, Y = [Y₁ : Y₂], where Y₂ contains the G − M jointly dependent variables to be temporarily eliminated from the system and Y₁ contains the remaining M jointly dependent variables. To reflect our subdivisions, we may rewrite (V.2) as:

(V.4) Γ₁₁Y₁' + Γ₁₂Y₂' + B_I X' + U' = 0'

(V.5) Γ₂₁Y₁' + Γ₂₂Y₂' + B_II X' = 0'

where the Γ matrix has been further subdivided to reflect the division of jointly dependent variables into those which are to be temporarily eliminated and those which will remain, i.e.,

(V.6) Γ = [ Γ_I ; Γ_II ] = [ Γ₁₁ Γ₁₂ ; Γ₂₁ Γ₂₂ ]

(Γ being G×G; Γ₁₁ being M×M, Γ₁₂ being M×(G−M), Γ₂₁ being (G−M)×M, and Γ₂₂ being (G−M)×(G−M)). We can assume that Γ₂₂ is non-singular, since if it were singular we could merely select a different set of jointly dependent variables to be temporarily eliminated, thereby rearranging the columns of Γ until a nonsingular Γ₂₂ is obtained.¹ Solving (V.5) for Y₂' we obtain:

(V.7) Γ₂₂Y₂' = −Γ₂₁Y₁' − B_II X' ,

hence,

(V.8) Y₂' = −Γ₂₂⁻¹Γ₂₁Y₁' − Γ₂₂⁻¹B_II X' .

¹Γ was assumed to be nonsingular in section I.C.3, assumption (4).
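This elimination step can be sketched numerically. The code below uses a hypothetical three-equation system (two stochastic equations and one identity, y₃ = y₁ + x), generates data satisfying the full system, and checks both the reduced system Γ*Y₁' + B*X' + U' = 0 obtained by the substitution carried out next, and the determinant factorization det Γ* = det Γ/det Γ₂₂ used further on.

```python
import numpy as np

rng = np.random.default_rng(7)
T, M, G = 10, 2, 3                      # two stochastic equations, one identity
# rows of [Gamma : B] are equations; columns of Gamma follow [y1, y2, y3]
# eq1: -y1 + 0.5 y2 + 2 x + u1 = 0
# eq2: -y2 + 0.4 y3 - x + u2 = 0
# identity: y1 - y3 + x = 0            (i.e., y3 = y1 + x)
Gamma = np.array([[-1.0, 0.5, 0.0],
                  [0.0, -1.0, 0.4],
                  [1.0, 0.0, -1.0]])
B = np.array([[2.0], [-1.0], [1.0]])

x = rng.standard_normal((T, 1))
U = rng.standard_normal((T, M))
# solve the full system Y Gamma' + X B' + [U : 0] = 0 for Y
Y = -(x @ B.T + np.column_stack([U, np.zeros(T)])) @ np.linalg.inv(Gamma.T)

G11, G12 = Gamma[:M, :M], Gamma[:M, M:]
G21, G22 = Gamma[M:, :M], Gamma[M:, M:]
BI, BII = B[:M], B[M:]
Gstar = G11 - G12 @ np.linalg.inv(G22) @ G21     # coefficients after eliminating y3
Bstar = BI - G12 @ np.linalg.inv(G22) @ BII

Y1 = Y[:, :M]
assert np.allclose(Y1 @ Gstar.T + x @ Bstar.T + U, 0)   # the reduced system holds
assert np.isclose(np.linalg.det(Gstar),
                  np.linalg.det(Gamma) / np.linalg.det(G22))
print("reduced system and determinant factorization verified")
```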
(V.7) FZZYZ -F21Y1 BIIX , hence, -1 -1 c ___ _ v _ v (v.8) Y2 F22F21Y1 FZZBIIX F was assumed to be nonsingular in section I.C.3, assumption (4). 165 Substituting (v.8) into (v.4), we obtain: -1 -1 . |_ I" I_ F v I I: I (V 9) F11Y1 F12 22r21Y1 F12 22311x + BIX + U 0 or (v.10) F* Yi - B* x' + U' = 0' MXM.MXT MXA AXT MXT MXT or (v 11) 0* z' + U' = 0' 1 MX(M+A) (M+A)XT MXT MXT -l * g - F ' X ' o o where F F11 12F22F21 is the square M M matrix of coeffiCients of the remaining jointly dependent variables. -1 * = - F F ' x ' . . . B B1 12 22BII is the M A matrix of coefficients of predetermined variables, 0* = [F* E 8*] is the MX(M + A) matrix of coefficients of all the variables remaining in the system, and z =[Y15X] If we assume that the TXM matrix of disturbances of the sto- chastic equations, f‘. '. T. -‘ , ’U[1] gull ”ins I ll UCTlJ u11 '°' uTM has a multivariate normal distribution with 6U = 0, 6Ufit]U[t] = 2 for all t and 6UEC]U[t] = 0 for trt', 2 being a positive definite matrix, we can write the density function for U as: T T/2 .1 -1 2 exp(-— )3 U 2 U' ) 2t=1 [t] [t] (v.12) f1(U,ZD = (2n)'T/2det’ 166 We can convert this density function to the likelihood function by using (V.ll) to transform the system from U to Z1 and 0*, taking account of the Jacobian of the transformation. Also, the logarithm of the likelihood function is easier to work with in this case than the actual likelihood function. (Since the logarithmic transformation is a strictly increasing one, the logarithm of the likelihood will be maximized at the same point as the actual likelihood.) After these transformations are made, our "logarithmic likelihood function" may be written: 1. 
2 logEdet 2] + T logEabS(det I“*)] _ Z _ (V.l3) f2(Zl,a*,ZD - -2 log 2U 21 a*'Z-lo'*zi t—l [t] [t] Nh—I Ipqra If (V.l3) is maximized first with respect to ‘2 (thereby con- centrating the function onto 0* and 21), we get the relation: A A l A = — M ' *' (V.l4) Z T (Z121)o Substituting 2 from (v.14) into (v.13) and dividing by T/2, the following function is obtained: (v.15) f3(&*,zl) = c1 + logEdet2f*] - log[det(%'&*(ZiZl)&*')] 1See Koopmans and Hood [1953], pp. 143-160, 190-191. For a more detailed discussion of the ”stepwise" maximization procedure see Koopmans and Hood [1953], pp. 160, 161, 191. 167 1 where c1 is a constant. 2 2 1 2 2 2 *1: = However, det (F ) det (F11- F12F22F21) det F/det F22 and 1 . . g 1 I u .3 . 59*(lel)a* EQI(Z Z)crI , hence, (V.lS) may be rewritten as (v.16) f4(&,z) = c - logEdetzrzz] + iogEdeth‘] - log[det(%&I[Z'Z]d/i)], 1 T 2 1 c1=T glong-EZZI &*'z lazi) t=1 [t] [t] = -1og 2n - 3(ltr{—zla*'[&*(ziz1)&*']1&*zi}) = -log 2n - —tr{-a*2121&*'[a’*ZiZI&*']-1} l l l 1 = - - ‘_’t — g -1 2T1 . -. = a. D '- log 2n T r{TI} og T 1 log 2n T where tr denotes trace and we have used the relationship tr(AB) = tr(BA) for any matrices A and B provided AB and BA are defined (i.e., provided the number of rows of A equals the number of columns of B and the number of rows of 8 equals the number of columns of A). I A Since det 0 I = det I = 1 for any matrix A and since (det B)-(det C) =“det(BC) for any square matrices B and C of the same order, we have I I"IZFZZI 11 1“12 det F = det ' det ' I 21 I22 irr'lr 1“ rrr‘l o ' 12 22 11 12 11 12 22r21 = det = det F O I I‘2]. 22 0 F2 = - . _ f _ detU‘11 r12r22r21] det F22, hence, detU‘11 12F 221F21] det F/det F 3U = -Zla*' = -Zai by (V.ll) and (v.4); therefore I = I I lezloe“ orIZ Za/I 22 . 168 or since F22 is a known matrix (coefficients of identity equations 2 . are assumed known), logEdet F is merely a constant; therefore, 22] (v.16) may be written as: (v.17) f5(&,2) = c 2 + logEdetzf] - 108[d8t(%&1[2'z]&£)] . 1 . . 
where c2 is a constant. Note that T is the matrix of coefficients of all G jointly dependent variables in the 9 equations (including identity equations) whereas 01 is the matrix of coefficients of all (C + A) yariables but only for the M stochastic equations. To calculate FIML estimates we will select &1 2 such that the concentrated logarithmic likelihood function (V.l7) is a maximum given our particular sample 2 and the restrictions which we have imposed on a (some elements of GI are assumed to be zero, others are assumed to be -1 for normalization, and all of the elements of aII are assumed known). Notice that it is not necessary to use the identity equations to eliminate jointly dependent variables in order to write down the concentrated logarithmic likelihood function. (We temporarily eliminated some jointly dependent variables in the derivation only.) The computational procedure which will be used to maximize f5(a,Z) also will not require that the identity equations be used to eliminate jointly dependent variables. 1 _ 2 c2 — c1 - logEdet T22] Sz ' = . = ' d d Since a 01 a1 (Q11 13 assume known), only 01 nee be estimated to complete the estimation of & 3Rothenberg and Leenders [1964], pp. 72, 73 first showed that it is unnecessary to use identity equations to solve out jointly dependent variables to maximize the likelihood function. 169 The matrix (l/T)&I(Z'Z)&i is used repeatedly in the elaboration of the computational procedure which follows; hence, it will prove con- venient to denote this matrix as S. If we use as an estimate of U the TXM matrix (V. 18) U = '20. , we can write S as: (v.19) s = 3: = ;r1-& MXM 1 Let a” be the vector of estimated coefficients of the uth equation which are not restricted to either zero or -1, and let +au be defined as: -1 (v.20) a -[ ] , + u a” th , i.e. is the vector of non-zero coefficients of the u equation ’ +8“ including the normalizing coefficient. 
Then S may also be defined as S = [s_μμ'] with:

(V.21) s_μμ' = (1/T) ₊a'_μ(₊Z'_μ ₊Z_μ')₊a_μ'   (since û_μ = −₊Z_μ ₊a_μ, ₊Z_μ being the matrix of observations on the variables occurring in the μth equation).

Let us also group all of the unrestricted coefficients in the system into a single vector of coefficients and denote this vector as:

(V.22) a = [ a₁ ; a₂ ; … ; a_M ] ,

i.e., a is the vector containing all of the unrestricted coefficients in α̂, all unrestricted coefficients of the first equation (first row of α̂) being listed first as the vector a₁, then the unrestricted coefficients of the second equation as the vector a₂, and finally the unrestricted coefficients of the Mth equation as the vector a_M. (The coefficients of the G − M identity equations are all known, i.e., restricted; hence, they are not included in the a vector.)

Since Z is fixed for any given sample, we choose the unrestricted coefficients of α̂ such that f₅(α̂, Z) is a maximum; however, for any given structure, the only elements of α̂ which are allowed to vary are the elements of the vector a. Thus, for an assumed structure and a given sample, f₅(α̂, Z) may be considered a function of the vector a only, i.e.,

(V.23) f(a) = f₅(α̂, Z) = c₂ + log(det²Γ̂) − log(det S) .

Another function which will be maximized when f(a) is maximized is the function¹

(V.24) f*(a) = f₅*(a, Z) = det²Γ̂/det S .

We cannot readily maximize either f(a) or f*(a) by setting partial derivatives equal to zero and solving for a, since the partial derivatives are complicated nonlinear functions of the elements of a. Some iterative procedure is required, and the iterative procedure proposed in this paper is outlined in the next section.

¹The logarithm of f*(a) differs from f(a) only by a constant, since log[det²Γ̂/det S] = log(det²Γ̂) − log(det S).
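To make (V.23) concrete, the sketch below evaluates f(a) (up to the constant c₂) for a hypothetical two-equation system with no identity equations (G = M = 2, one predetermined variable); the function is larger at coefficients near the data-generating values than at a poor guess.

```python
import numpy as np

def concentrated_loglik(alpha_I, Gamma, Z):
    """f(a) of (V.23) up to the constant c2: log(det^2 Gamma) - log(det S)."""
    T = Z.shape[0]
    S = alpha_I @ (Z.T @ Z) @ alpha_I.T / T          # (V.19)
    return np.log(np.linalg.det(Gamma) ** 2) - np.log(np.linalg.det(S))

# hypothetical system:  y1 = g*y2 + b1*x + u1,  y2 = b2*x + u2
rng = np.random.default_rng(8)
T = 200
x = rng.standard_normal(T)
y2 = 0.5 * x + rng.standard_normal(T)
y1 = 0.8 * y2 - 1.0 * x + rng.standard_normal(T)
Z = np.column_stack([y1, y2, x])                     # Z = [Y : X]

def coeffs(a):
    """alpha_I and Gamma for a = (g, b1, b2); the -1's are the normalization."""
    g, b1, b2 = a
    alpha_I = np.array([[-1.0, g, b1], [0.0, -1.0, b2]])
    return alpha_I, alpha_I[:, :2]                   # with G = M, Gamma is the Y block

f_true = concentrated_loglik(*coeffs((0.8, -1.0, 0.5)), Z)
f_bad = concentrated_loglik(*coeffs((0.0, 0.0, 0.0)), Z)
assert f_true > f_bad        # the likelihood prefers coefficients near the truth
print(round(f_true, 3), round(f_bad, 3))
```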
The first and second partial derivatives of f(a) play a key role in the direction in a-space that a is changed at each step of the maximization procedure; however, it is convenient to base the amount of change in a given direction on the function f*(a). 172 C. Computational Procedure 1. .A maximization procedure for functions non-linear in the parameters Let us first consider maximizing a function f(a) (with a being an n dimensional vector) and assume that (l) f(a) and its derivatives are sufficiently ”well behaved“ to permit use of a Taylor expansion of sufficiently high order in approximating f(a) about the starting estimates and subsequent points1 and (2) f(a) has only a single maximum with no additional local maxima in the region which is considered.2 f(a) may have saddle points in the n dimensional space of7ma without causing difficulty. As the procedure is outlined for the n dimensional case, it will be illustrated graphically for the 2 dimensional (n = 2) case. Further on, in applying the procedure M being described, the n (= 2 n ) n=l U will become our vector a, i.e., a will consist of the vector of all element vector a defined in (v.22) unrestricted coefficients in the system which are to be estimated; however, at this point, a is merely any parameter vector. 1The use of a Taylor expansion in deriving some properties of the maximization procedure outlined here follows Crockett and Chernoff [1955]. Crockett and Chernoff [1955], p. 34 state that: "For most arguments the use of third or fourth order expansions will suffice." ZAt this point f(a) may be any function which meets these assump- tions. In sections V.C.2 and V.C.3 the procedure will be Specialized to the maximization of the f(a) given by (V 23). It is an unanswered question as to whether for the f(a) given by (v.23) multiple local maxima may occur in a region which has a positive probability of being entered through choice of starting coefficients. 
For a number of problems, the writer has used a set of starting coefficients and calculated the coefficients which maximize the f(a) of (V.23), and then assumed a set of starting coefficients in which all coefficients varied widely from the original set of starting coefficients. In all cases the same maximum was reached. This is encouraging but does not, of course, show that for many problems (or even these problems) multiple local maxima do not occur in regions of interest.

[Figures V.25 and V.26: contour maps of f(a) in the (a₁, a₂) plane. Both show the same contour lines and the maximum a_max f(a); in Figure V.25 an arrow marks the point with the highest f(a) at distance g from a⁽¹⁾ using the Euclidean metric (a circle), while in Figure V.26 the distance is measured using the 𝓜 metric (an ellipse).]

In Figure V.26 the contour lines represent the same function as the contour lines for Figure V.25. (Each contour line is a locus of points having the same value of the function f(a).) The outermost contour line has the lowest value of f(a), and the innermost one the highest value. The maximum value of f(a) occurs at a_max f(a).

Suppose that we start at an initial point a⁽¹⁾ and consider all points which lie a distance of exactly g units from the point a⁽¹⁾. Next, suppose that the direction d is chosen such that f(a⁽¹⁾ + gd), at the distance g from a⁽¹⁾, is the maximum value which can be reached. (I.e., the locus of all points a distance of g from a⁽¹⁾ is the surface of a sphere of radius g centered at the point a⁽¹⁾. The arrow is drawn through the point on the surface of this sphere [circle in the case of Figure V.25] with the maximum f(a). The angle of the arrow indicates the direction d.)

If we continue in the direction d given by the arrow in Figure V.25, we will pass quite a way to the left of a_max f(a). Suppose that instead of considering the locus of points on the circle, we had considered the locus of points on an ellipse (ellipsoid in the case of an n-dimensional vector a) of approximately the same shape as the contour lines.
In Figure V.26 an arrow has been drawn through the point with maximum f(a) on the surface of the ellipse. Notice that the arrow now points much more closely toward the maximum.

An ellipsoid may be traced out instead of a sphere by merely making our concept of distance more general. Let the distance g between any other point, a, and a⁽¹⁾ be defined as:

(V.27) g = √{(a − a⁽¹⁾)'𝓜(a − a⁽¹⁾)}

where 𝓜 is a positive definite matrix. Then all points at a distance g from a⁽¹⁾ will lie on the surface of an ellipsoid instead of a sphere. In Figure V.26 the matrix 𝓜 is such that the resulting ellipsoid is approximately the same shape as the contour lines, while in Figure V.25 the identity matrix has been used as the matrix 𝓜 and therefore a circle has been traced out. The matrix 𝓜 is termed a metric. The Euclidean metric is represented by 𝓜 = I, as in Figure V.25.

Suppose that in selecting the direction to move, an arbitrarily small distance g is selected by letting g approach 0. Then, assuming a given metric 𝓜, it can be shown that the direction in which f(a) increases the most rapidly from the point a⁽¹⁾ is given by d = 𝓜⁻¹l*⁽¹⁾, where l*⁽¹⁾ is the partial derivative of f(a) evaluated at the point a⁽¹⁾, i.e.,

(V.28) l*⁽¹⁾ = ∂f(a)/∂a |_{a=a⁽¹⁾} = [ ∂f(a)/∂a₁ ; … ; ∂f(a)/∂a_n ]_{a=a⁽¹⁾} .

Crockett and Chernoff show that for any positive definite matrix 𝓜 and choice of h sufficiently small, f(a⁽¹⁾ + h·𝓜⁻¹l*⁽¹⁾) > f(a⁽¹⁾), provided l*⁽¹⁾ ≠ 0.¹ Although any positive definite matrix 𝓜 can be used for the metric, and for h sufficiently small the direction will be sufficiently good that the function will increase (the function will stay the same if we are already at the maximum), some metrics will obviously be better choices than others; e.g., compare the two metrics used in Figures V.25 and V.26.
If f(a) is expanded about the point a^(1) in a Taylor expansion, the following is obtained:

(V.29)   f(a) = f(a^(1)) + [a - a^(1)]' l*^(1) + (1/2)[a - a^(1)]' L*^(1) [a - a^(1)] + higher order terms

where l*^(1) is the n×1 vector of first partial derivatives of f(a) with respect to the elements of a evaluated at a^(1), as before, and L*^(1) is the n×n matrix of second partial derivatives of f(a) with respect to the elements of a evaluated at the point a^(1).

For any function f(a) with continuous first and second partial derivatives, a local maximum occurs at any point satisfying ∂f(a)/∂a = 0 if ∂²f(a)/∂a∂a' is a negative definite matrix at that point.

If the partial derivative of the Taylor expansion given in (V.29) is taken with respect to a, ignoring all higher order terms, we get:

(V.30)   ∂f(a)/∂a = l*^(1) + L*^(1)[a - a^(1)]

Setting ∂f(a)/∂a to zero and solving for a, we get:

(V.31)   [a - a^(1)] = -(L*^(1))^(-1) l*^(1)

or

(V.32)   a = a^(1) - (L*^(1))^(-1) l*^(1)

Also, taking the partial derivative of (V.30) with respect to a, we get:

(V.33)   ∂²f(a)/∂a∂a' = L*^(1)

Since the second partial derivative must be negative definite for the point to be a local maximum, we get the additional condition that L*^(1) must be negative definite, or -L*^(1) must be positive definite, for the point to represent a local maximum.

To summarize, if it is assumed that consideration of only the first three terms of a Taylor expansion about a^(1) will give a sufficiently close approximation to f(a), then the following holds in a sufficiently small region of a^(1):

(V.34)   a^max f(a) = a^(1) - (L*^(1))^(-1) l*^(1)

Multiplying the metric and the partial derivative by T will not change the result.

¹Crockett and Chernoff [1955], p. 35. If a^(1) is the maximum, then l*^(1) = 0 and f(a^(1) + h·𝓜^(-1) l*^(1)) = f(a^(1) + h·𝓜^(-1)·0) = f(a^(1)); i.e., no movement is made away from the maximum.
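[Illustrative sketch, not part of the original routines: for a function that is exactly quadratic, the single step (V.32) lands exactly on the maximum. The matrix H, the vector b, and the starting point below are hypothetical.]

```python
import numpy as np

# Hypothetical strictly concave quadratic f(a) = c + b'a - 0.5 a'Ha (H pos. def.),
# so the matrix of second partials is L* = -H (negative definite everywhere).
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

a1 = np.array([5.0, -5.0])          # arbitrary starting point a^(1)
l_star = b - H @ a1                 # first partials evaluated at a^(1)
L_star = -H                         # second partials (constant for a quadratic)

# (V.32): a = a^(1) - (L*)^{-1} l*, exact in the quadratic case.
a_max = a1 - np.linalg.solve(L_star, l_star)

# The maximum satisfies b - Ha = 0, i.e. a = H^{-1} b.
assert np.allclose(a_max, np.linalg.solve(H, b))
```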
Thus, (V.34) is equivalent to:¹

(V.35)   a^max f(a) = a^(1) + (L^(1))^(-1) l^(1)

where L^(1) = -T·L*^(1) and l^(1) = T·l*^(1).

Since a^(1) will usually be some distance from a^max f(a), the procedure indicated by formula (V.35) will be modified as follows:

(1) If L^(1) is not positive definite, it is adjusted to form another matrix, |L^(1)|, which is positive definite, and |L^(1)| is used as the metric.²

(2) Instead of using a step size of 1 as (V.34) implies, a step size of h^(1) is used; i.e., a^(1) + h^(1)·|L^(1)|^(-1) l^(1) is used.³

(3) A check will be made that, given the step size h^(1), f(a^(1) + h^(1)·|L^(1)|^(-1) l^(1)) is indeed greater than f(a^(1)).

(4) A series of steps will be taken, with the direction and step size recomputed at each step, until a^max f(a) is reached.⁴

Instead of recalculating the metric each iteration, the same metric may be used for a number of iterations, only the vector of partial derivatives being recalculated each iteration. The writer has not compared the total time required for convergence if the same metric is used for a number of iterations with the total time required for convergence if a new metric is calculated each iteration. Up to now, the writer has always recalculated the metric each iteration (and has obtained rapid convergence on all problems attempted).

¹If the variables were not normalized in the manner indicated in Part III before computation begins, multiplication by T might be expected to increase rounding error; however, due to the normalization used, rounding error will not be increased.

²A method for forming the |L^(1)| matrix from the L^(1) matrix is given in section V.C.3.

³Determination of the step size h^(1) is discussed in section V.C.4.

⁴Determination of when convergence or maximization is achieved is discussed in section V.C.5.

2.
The vector of partial derivatives for FIML

In expressing the vector of partial derivatives and some alternative metrics for FIML, a number of intermediate matrices will be calculated from a given vector of coefficients, a, and the sample values of the variables, Z (more particularly, from the Z′Z matrix), given the assumed structure. The Z′Z matrix and the assumed structure do not, of course, change between iterations. Only the vector of coefficients changes (given Z and the assumed structure, f(a) is a function of a only). We will use a superscript on the a vector to indicate that a particular set of coefficients is used (e.g., a^(i-1)). No superscript will be used on the intermediate matrices calculated from the a^(i-1) vector and the Z′Z matrix, since it will be obvious from the formulas given which intermediate matrices change whenever a new vector of coefficients is used. The matrix Γ̂ will be treated as an intermediate matrix of this form (i.e., Γ̂ will not be superscripted), since it is formed from (1) the elements of the coefficient vector a corresponding to jointly dependent variables in the system, (2) the coefficients of the identity equations corresponding to jointly dependent variables in the system, (3) zeros, and (4) -1's.

In computing the direction for the ith iteration, T·∂f(a)/∂a evaluated at a^(i-1) (i.e., with the coefficients obtained from the preceding iteration used as a) is the right hand side term by which the metric inverse is first multiplied.¹ This term may be written as:

(V.36)   l^(i-1) = T·∂f(a)/∂a = T·[∂f(a)/∂a_1, ..., ∂f(a)/∂a_n]′   evaluated at a = a^(i-1)

For FIML, the part of the right hand side corresponding to the unrestricted coefficients of the μth equation may be written as:

(V.37)   l_μ^(i-1) = T·∂f(a)/∂a_μ evaluated at a = a^(i-1) = T·[γ̂^(μ|μ); 0] + Σ_{μ′=1}^{M} s^{μμ′} Z′_μ û_{μ′}

where

γ̂^(μ|μ) is the part of the μth column of Γ̂^(-1) corresponding to the unrestricted coefficients of the jointly dependent variables of the μth equation. (Γ̂ is a G×G matrix; that is, the coefficients of jointly dependent variables in the identity equations are included in the Γ̂ matrix. Only a part of Γ̂^(-1) is used: the part corresponding to the unrestricted coefficients of the stochastic equations. γ̂^(μ|μ) is an m_μ×1 vector.)²

0 is an l_μ×1 column vector of zeros corresponding to the predetermined variables in the μth equation.

s^{μμ′} is the element of the μth row and the μ′th column of S^(-1).

Z′_μ Û = [Z′_μ û_1 ⋯ Z′_μ û_M] = [(Z′_μ Z_{+1}) a_{+1}^(i-1) ⋯ (Z′_μ Z_{+M}) a_{+M}^(i-1)]³

S is an M×M matrix corresponding to the M stochastic equations.

Notice that, given the assumed structure and a set of sample values of the variables, Z, the matrices and vectors given above are all calculated from a given vector of coefficients a or from intermediate matrices calculated through use of the vector a. A set of starting coefficients, such as those from 2SLS, LIML, DLS, or 3SLS, or merely a set of "assumed" coefficients, may serve as starting coefficients in calculating the first metric and right hand side.⁴ The coefficients from the (i-1)st iteration are used in calculating the metric and right hand side for the ith iteration.

¹See Rothenberg and Leenders [1964], pp. 61 and 63-64 for a derivation of the vector of partial derivatives. Rothenberg and Leenders' notation differs slightly, since their logarithmic likelihood function is 1/2 of the f(a) given above; i.e., their logarithmic likelihood function is (1/2)[c_1 + log(det Γ̂)² - log(det S)] = k′ + (1/2)·2·log[abs(det Γ̂)] - (1/2)·log(det S), where k′ = (1/2)c_1. Also, the vector of partial derivatives and the metric have been multiplied by T in this paper, as noted in the conversion from (V.34) to (V.35).

²The rather unique notation γ̂^(μ|μ) should seem more justified when the γ̂^(μ′|μ) and γ̂^(μ|μ′) vectors are defined during the formation of the L metric (section V.C.3).

³û_μ = Z_{+μ} a_{+μ}^(i-1), where a_{+μ} = [-1; a_μ] and Z_{+μ} = [y_μ ⋮ Z_μ], as before.

⁴As noted in section V.C.1, the question of whether multiple local maxima occur in the region of interest is an unanswered question. To date the writer is not aware of a problem in which two sets of starting coefficients have led to different local maxima. (If rounding error were large, any problem might appear to have many local maxima.)

3. Metrics for FIML

A rationale for suggesting the use of -T·∂²f(a)/∂a∂a′ as the metric to use in determining direction was derived earlier. We will call this metric the L metric.¹ Often a^(i-1), the vector of unrestricted coefficients to use in calculating the metric and the vector of partials for the ith iteration, will be sufficiently far away from a^max f(a) that L^(i-1) is not positive definite. For this reason, we will consider three other metrics: the O metric, the R metric, and the |L| metric. Of these metrics we will use only the |L| metric in our FIML iterations. The formulas for the O and R metrics are given so that we can draw correspondences to other methods involving one or more iterations, such as the Zellner-Aitken estimator (ZA), iteration on the ZA estimator (IZA), 3SLS, and I3SLS. (All of these methods are discussed in subsequent chapters.) The O and R metrics are used during early iterations in some FIML computational schemes, but not in the computational scheme given in this paper.

The metrics are most easily defined by dividing them into blocks which correspond to the division of the coefficient vector a according to the equations from which a came. Let 𝓜 be an arbitrary metric subdivided as:

(V.38)   𝓜 = [𝓜_11 ⋯ 𝓜_1M; ⋮ ⋱ ⋮; 𝓜_M1 ⋯ 𝓜_MM]

where 𝓜_{μμ′} is the block of the metric whose n_μ rows correspond to the unrestricted coefficients of the μth equation and whose n_{μ′} columns correspond to the unrestricted coefficients of the μ′th equation.

¹-∂²f(a)/∂a∂a′ is the usual Newton metric.
In this section, we will omit from the metric the iteration superscript specifying the set of coefficients used as the vector a in evaluating the metric.

The μμ′th block of the O metric is:¹

(V.39)   O_{μμ′} = s^{μμ′} Z′_μ Z_{μ′}

The O metric is the -T·∂²f(a)/∂a∂a′ matrix derived if the log(det Γ̂)² term of (V.23) is ignored in defining f(a). [The log(det Γ̂)² term will not appear in f(a) if the Jacobian of the transformation from U to a and Z is 1, or if it is ignored.]

The μμ′th block of the R metric is:²

(V.40)   R_{μμ′} = s^{μμ′} [Z′_μ Z_{μ′}]_∥X
                 = s^{μμ′} [ [Y′_μ Y_{μ′}]_∥X   Y′_μ X_{μ′} ;  X′_μ Y_{μ′}   X′_μ X_{μ′} ]
                 = s^{μμ′} [ Y′_μ Y_{μ′} - [Y′_μ Y_{μ′}]_⊥X   Y′_μ X_{μ′} ;  X′_μ Y_{μ′}   X′_μ X_{μ′} ]

(Since [Y′_μ X_{μ′}]_∥X = Y′_μ X_{μ′}, [X′_μ Y_{μ′}]_∥X = X′_μ Y_{μ′}, and [X′_μ X_{μ′}]_∥X = X′_μ X_{μ′}, since X_μ and X_{μ′} are submatrices of X [the matrix of predetermined variables in the system]; see (1.54) and (1.40).)

If X has full column rank, then (V.40) may also be written as:

(V.41)   R_{μμ′} = s^{μμ′} Z′_μ X (X′X)^(-1) X′ Z_{μ′} ;

however, (V.40) is a preferable computational formula, since [Y′Y]_⊥X may be calculated by the orthogonalization procedure of section I.D.2, [Y′Y]_∥X calculated as Y′Y - [Y′Y]_⊥X, and all of the matrices of the form [Y′_μ Y_{μ′}]_∥X extracted as submatrices of [Y′Y]_∥X.

The R metric was derived by Rubin as a matrix whose inverse is asymptotically the same as L^(-1)(a^max f(a)), L^(-1)(a^max f(a)) being the asymptotic maximum likelihood estimator of the FIML coefficient variance-covariance matrix.³

The μμ′th block of L for the FIML estimator is:⁴

(V.42)   L_{μμ′} = T·H_{μμ′} + s^{μμ′} Z′_μ Z_{μ′} - (1/T)·Z′_μ Û F_{μμ′} Û′ Z_{μ′}

where:

Z′_μ Û = [Z′_μ û_1 ⋯ Z′_μ û_M] = [(Z′_μ Z_{+1}) a_{+1} ⋯ (Z′_μ Z_{+M}) a_{+M}]⁵

F_{μμ′} = [s^{1μ′} ⋯ s^{Mμ′}]′ [s^{μ1} ⋯ s^{μM}] + s^{μμ′} S^(-1)

H_{μμ′} = [ γ̂^(μ′|μ) γ̂^(μ|μ′)′   0 ;  0   0 ]

with the blocks of H_{μμ′} of order m_μ×m_{μ′}, m_μ×l_{μ′}, l_μ×m_{μ′}, and l_μ×l_{μ′},

γ̂^(μ′|μ) being a vector containing the m_μ elements of the μ′th column of Γ̂^(-1) corresponding to the m_μ unrestricted coefficients of the jointly dependent variables of equation μ (Γ̂ is a G×G matrix, since the coefficients of jointly dependent variables in the identity equations are included in the Γ̂ matrix), and

γ̂^(μ|μ′) being a vector containing the m_{μ′} elements of the μth column of Γ̂^(-1) corresponding to the m_{μ′} unrestricted coefficients of the jointly dependent variables of equation μ′.

The O and R metrics are always positive definite or positive semi-definite. (Under some circumstances the O and R metrics will be singular, and therefore positive semi-definite.) The L metric will generally not be positive definite except close to the maximum; that is, the L metric will generally not be positive definite when iteration is started.⁶

Chernoff and Divinsky recommended use of the O metric initially, the R metric for a number of iterations, and finally the L metric when the estimated coefficient vector, a, becomes close to the coefficients which maximize the likelihood.⁷ They also suggested some guides for switching from one metric to another.

Eisenpress wrote the first large-scale FIML computer routine available for general use and has calculated a wide variety of problems with it.⁸ Initially he programmed it using the O, R, and L metrics as suggested by Chernoff and Divinsky. After considerable experimentation with speed of convergence (and whether convergence would occur at all, for that matter), he, along with John Greenstadt, devised the |L| metric, which he then used as the only metric.

L can be expressed as L = EΛE′, where Λ is a diagonal matrix with the eigenvalues of L forming the diagonal elements and E is a matrix whose columns are eigenvectors of L, the eigenvectors being in the same order as their corresponding eigenvalues on the diagonal of Λ. (Any symmetric matrix may be decomposed in this fashion.) Let |L| be defined as:

(V.43)   |L| = E|Λ|E′

where |Λ| is the same as Λ except that the absolute values of the eigenvalues replace the actual eigenvalues.

As with any non-singular symmetric matrix, L^(-1) may be formed directly as:

(V.44)   L^(-1) = [EΛE′]^(-1) = (E′)^(-1) Λ^(-1) E^(-1) = E Λ^(-1) E′   (since E^(-1) = E′).

¹The O metric changes each iteration, since S is calculated from the vector of coefficients and, therefore, s^{μμ′} changes each iteration.

²The R metric changes each iteration, since S is calculated from the vector of coefficients and, therefore, s^{μμ′} changes each iteration.

³Its use is suggested in Rubin [1948] and Chernoff and Divinsky [1953]. Later yet, the R matrix became important as the matrix inverted in the calculation of 3SLS. By showing that Var(δ̂_3SLS) asymptotically equals Var(δ̂_FIML), Zellner and Theil [1962] and others have also shown that asymptotically R^(-1) equals L^(-1), since the usual asymptotic Var(δ̂_3SLS) is R^(-1) and the usual asymptotic Var(δ̂_FIML) is L^(-1).

⁴See Rothenberg and Leenders [1964], pp. 63-65 for a derivation of L_{μμ′}. Rothenberg and Leenders' logarithmic likelihood function is 1/2 of the logarithmic likelihood being maximized here. Also, the vector of partial derivatives and the metric have been multiplied by T in this paper (see footnote 1, p. 179). The L matrix changes each iteration, since the vector of coefficients is used in the calculation of the intermediate matrices given above.

⁵a_{+μ} = [-1; a_μ^(i-1)] and Z_{+μ} = [y_μ ⋮ Z_μ], as before.

⁶A notable exception is Klein's model I. Although it is not positive definite when starting from the 2SLS estimates, only a few iterations are usually required to move the estimates into a region in which the L metric is positive definite. As a result of being so well conditioned, Klein's model I is somewhat misleading, since many procedures which would have difficulty converging for many problems which may be encountered will converge easily for Klein's model I, the model on which they are usually tested.

⁷Chernoff and Divinsky [1953].

⁸Personal conversation. Also, Eisenpress [date unknown].
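[Illustrative sketch, not part of the original routines: the eigenvalue decomposition and absolute-value reconstruction of (V.43), together with the direct formation of |L|^(-1), can be demonstrated numerically. The indefinite matrix below is hypothetical.]

```python
import numpy as np

# Hypothetical symmetric matrix standing in for an L metric that is
# NOT positive definite (it has one negative eigenvalue).
L = np.array([[2.0, 0.0, 3.0],
              [0.0, 1.0, 0.0],
              [3.0, 0.0, 2.0]])

lam, E = np.linalg.eigh(L)          # L = E diag(lam) E'
assert (lam < 0).any()              # L is indefinite here

# (V.43): |L| = E |diag(lam)| E', replacing eigenvalues by absolute values.
L_abs = E @ np.diag(np.abs(lam)) @ E.T
# Inverse formed directly from the same decomposition: E |diag(lam)|^{-1} E'.
L_abs_inv = E @ np.diag(1.0 / np.abs(lam)) @ E.T

# |L| is positive definite by construction, and L_abs_inv is its inverse.
assert (np.linalg.eigvalsh(L_abs) > 0).all()
assert np.allclose(L_abs @ L_abs_inv, np.eye(3))
```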
Thus, |L|^(-1) may also be formed directly as:

(V.45)   |L|^(-1) = E|Λ^(-1)|E′

where |Λ^(-1)| is formed by replacing each diagonal element of Λ by 1 divided by its absolute value.¹ Assuming that L is not singular, |L| will be positive definite by its method of calculation.

Eisenpress has examined the effect of setting negative eigenvalues positive and recombining the matrices. The basic directions (the axis system of the metric being formed) are established by the eigenvectors, with the eigenvalues determining the distance along each axis. The negative eigenvalues indicate a movement along an axis away from the maximum. Setting them positive results in movement along the axes (given by the eigenvectors) in the correct direction. Since the inverse of a metric corresponds to using the inverse of the eigenvalues, a large eigenvalue corresponds to a small movement along an eigenvector axis and a small eigenvalue corresponds to a large movement. Since negative eigenvalues are generally small in absolute value, setting them positive results in switching the direction from a large negative direction to a large positive direction. Eisenpress feels that this is correct based on the geometry of the situation. Convergence has been much more rapid when this is done than in some earlier experiments in which he substituted zero for the inverse of the negative eigenvalues (thereby moving along the eigenvectors only in the directions corresponding to positive eigenvalues).²

Since L^(-1) and |L|^(-1) coincide if all eigenvalues are positive, and since direct inversion of L to form L^(-1) is faster than forming |L|^(-1) in the manner indicated above, the writer has modified the procedure as follows:

¹Since all off-diagonal elements of |Λ^(-1)| are zero, |L|^(-1) can be formed more efficiently by forming the ijth element of |L|^(-1) as Σ_{k=1}^{n} (1/|λ_k|)·e_ik·e_jk, where λ_k is the kth eigenvalue of Λ and E = [e_ij].
(1) Use the |L|^(-1) metric until 3 consecutive iterations have been performed in which all eigenvalues have been found to be positive.

(2) Use the L^(-1) metric, forming the metric through direct inversion of L, for 5 iterations.

(3) Use |L|^(-1) for an iteration. If all eigenvalues are positive, start with (2) again; otherwise start with (1) again.

(4) When convergence to a^max f(a) appears to have occurred, check whether all eigenvalues of L are positive. If any are negative, convergence has actually been to a saddle point. Switch to the |L| metric and start with (1) again so that movement will be past the saddle point toward the maximum. All eigenvalues of L will be positive at a local maximum.

It is not necessarily true that once a positive definite region for L is entered, L will continue to be positive definite. A non-positive definite region may again be encountered; e.g., one or more saddle points may be encountered as movement is made toward the maximum. No difficulty should be encountered in moving past a non-positive definite region if a reversion to the |L| metric is made.

²The writer hopes that he has not misrepresented Eisenpress and Greenstadt's developments in any of the above.

4. Step size to use at each iteration

Earlier it was indicated that in the FIML convergence procedure, the coefficients for an iteration, a^(i), are to be calculated from the coefficients of the previous iteration as:

(V.46)   a^(i) = a^(i-1) + h^(i) d^(i)

where d^(i) = |L^(i-1)|^(-1) l^(i-1) and h^(i) is the step size for the iteration. So far we have covered only the direction in which we will move from a set of coefficients in the space of unrestricted coefficients at any one iteration. The distance we move at any iteration can also be very important. Consider the situation given by Figure V.47.

[Figure V.47: contour plot of the likelihood with a long narrow ridge; successive steps zigzag across the ridge.]

Assume that a^max f(a) is the point of maximum likelihood and that the contour lines represent points with equal value of the likelihood function.
Assuming that direction d_1 is taken with step size h_1 for the first iteration, d_2 is taken with step size h_2 for the second iteration, etc., we may easily end up spending many iterations trying to move up the long narrow ridge (or we could even move down the ridge) due to using a series of step sizes which are too large. Figure V.47 illustrates this. On the other hand, if for each step we could vary the step size such that we land somewhere near the top of the same ridge, our situation could be as in Figure V.48.

[Figure V.48: contour plot in which each step lands near the top of the ridge.]

If too small a step is taken, the above may take many steps. Thus, our step size as well as our direction takes on considerable importance.

Following is a scheme which may be used to determine the step size, h^(i), to be used for an iteration:

(1) For the usual iteration, a trial step size of h_1^(i) is tried; i.e., f*(a^(i-1) + h_1^(i) d^(i)) is calculated. (Determination of a starting value, h_1^(i), is discussed further on.)

(a) If f*(a^(i-1) + h_1^(i) d^(i)) > f*(a^(i-1)), a trial step size twice as large, i.e., 2h_1^(i), is tried. If f*(a^(i-1) + 2h_1^(i) d^(i)) > f*(a^(i-1) + h_1^(i) d^(i)), a step size twice as large again, i.e., 4h_1^(i), is tried. This process is continued until, at the jth time a step size twice as large is tried, f*(a^(i-1) + 2^j h_1^(i) d^(i)) ≤ f*(a^(i-1) + 2^(j-1) h_1^(i) d^(i)). At that time a quadratic approximation is used to calculate a step size which we will call h_2. If f*(a^(i-1) + h_2 d^(i)) > f*(a^(i-1) + 2^(j-1) h_1^(i) d^(i)), h_2 is used as the step size, h^(i), for the iteration. Otherwise, 2^(j-1) h_1^(i) is used as the step size, h^(i), for the iteration.

(b) On the other hand, if f*(a^(i-1) + h_1^(i) d^(i)) ≤ f*(a^(i-1)), a trial step size half as large, i.e., (1/2)h_1^(i), is tried. If f*(a^(i-1) + (1/2)h_1^(i) d^(i)) ≤ f*(a^(i-1)), a step size half as large again, i.e., (1/4)h_1^(i), is tried. This process is continued until, at the jth time a step size half as large is tried, either f*(a^(i-1) + (1/2)^j h_1^(i) d^(i)) ≥ f*(a^(i-1)) or (1/2)^j h_1^(i) < ε_h.
(aa) If f*(a^(i-1) + (1/2)^j h_1^(i) d^(i)) > f*(a^(i-1)), a quadratic approximation is used to calculate a step size which we will call h_2. If f*(a^(i-1) + h_2 d^(i)) > f*(a^(i-1) + (1/2)^j h_1^(i) d^(i)), h_2 is used as the step size, h^(i), for the iteration. Otherwise (1/2)^j h_1^(i) is used as the step size, h^(i), for the iteration.

(bb) If f*(a^(i-1) + (1/2)^j h_1^(i) d^(i)) = f*(a^(i-1)), (1/2)^j h_1^(i) is used as the step size, h^(i), for the iteration.

(cc) If (1/2)^j h_1^(i) < ε_h, a negative step, -(1/2)^j h_1^(i), is tried.

(aaa) If f*(a^(i-1) - (1/2)^j h_1^(i) d^(i)) > f*(a^(i-1)), the trial step size is doubled [from -(1/2)^j h_1^(i)], redoubled, etc., in the negative direction in the same manner as was done in the positive direction in step (1a) above. When a step size such that f*(a^(i-1) - 2^k (1/2)^j h_1^(i) d^(i)) < f*(a^(i-1) - 2^(k-1) (1/2)^j h_1^(i) d^(i)) is reached, a quadratic approximation is used to calculate a step size, h_2. If f*(a^(i-1) + h_2 d^(i)) > f*(a^(i-1) - 2^(k-1) (1/2)^j h_1^(i) d^(i)), h_2 is used as the step size, h^(i), for the iteration. Otherwise, -2^(k-1) (1/2)^j h_1^(i) is used as the step size, h^(i), for the iteration. If f*(a^(i-1) - 2^k (1/2)^j h_1^(i) d^(i)) = f*(a^(i-1) - 2^(k-1) (1/2)^j h_1^(i) d^(i)), then -2^k (1/2)^j h_1^(i) is used as the step size, h^(i), for the iteration.

(bbb) If f*(a^(i-1) - (1/2)^j h_1^(i) d^(i)) ≤ f*(a^(i-1)), (1/2)^(j+1) h_1^(i) is used as the step size, h^(i), for the iteration.

(2) If (a) the absolute value of all elements of l are less than a preassigned epsilon, ε_l, i.e., the partial derivative convergence criterion (discussed farther on in the section on convergence criteria) has been met; (b) L^(i-1) is positive definite; and (c) a positive step size was used the previous iteration, then an initial step size of 1 is tried.¹ If f*(a^(i-1) + d^(i)) ≥ f*(a^(i-1)), the trial step size is doubled, redoubled, etc., and a quadratic approximation tried, as for the usual iteration. If f*(a^(i-1) + d^(i)) < f*(a^(i-1)), a step size of 1 is used even though it leads to a slightly lower likelihood value.
The reason for imposing a step size of at least 1 is that a^(i-1) is apparently very close to the maximum. Given the small size of the elements of l^(i-1), a small step size would lead to almost no movement. If the step size of 1 should be larger than optimal, the next iteration can easily readjust the direction and size, since the L^(i-1) metric is very powerful close to the maximum. If the point is not close to the maximum, but the elements of l are quite small due to the likelihood being almost flat, then again a large step is desirable so that a large movement will be made.

¹l rather than l* is compared to an epsilon due to the normalization of variables used in the computation. (Normalization of variables is discussed in chapter IX.)

The Initial Step Size, h_1^(i)

The derivation of a maximization procedure based on the Taylor expansion given in section V.C.1 suggests a step size of 1 as an optimal step size; however, this argument is based on the Taylor expansion about a^(i-1) being a sufficiently good approximation to the likelihood function. If a^(i-1) is not very close to a^max f(a), then the Taylor expansion is unlikely to be a good approximation to the likelihood function. There are other arguments for a step size of 1 as being optimal, but most of these also founder on some such assumption. It is generally advantageous to allow the step size to vary even though the L metric is used and the current region is one in which the L metric is positive definite. Only when a^(i-1) is virtually at the maximum does it seem somewhat desirable to limit the step size to 1, and even there the rules given previously for step size allow the step size to be greater than 1.

In applying the step size rules given previously, a step size of less than 1 is selected a far greater proportion of the steps than a step size of 1 in the FIML problems calculated by the writer. (An exception was Klein's model I, where the average step size selected was .9.) As a result, the writer has currently set h_1^(i) to .5.

At first glance, it might seem that h_1^(i) might be set to the step size used the preceding iteration, or some proportion of this step size. This would be a very undesirable choice for h_1^(i), however, as it turns out in practice that large step sizes tend to be followed by small ones, and vice-versa. If any "rule" is to be selected, it should probably make h_1^(i) inversely related to h^(i-1), but the relationship is really too tenuous to be relied on. It is quite possible that a variable h_1^(i) which is better than any fixed h_1 could be calculated based on the eigenvalues of the L metric for those iterations in which the eigenvalues and eigenvectors of the metric are calculated; however, the writer has not attempted to develop such a rule.

Minimum Step Size, Negative Step Size, and ε_h

Currently an ε_h of .001 is being used by the writer. As noted earlier, when the trial step value becomes less than ε_h, a negative step of the same size is tried. If this gives a likelihood higher than the likelihood for the previous iteration, searching goes on in the negative direction for a yet higher likelihood value. If the negative step does not give a higher likelihood value, half of the previous positive step is used as the step size. If a negative step is selected, the |L| metric is automatically used for at least the next 3 iterations. It is expected that a negative step will be selected only in rare pathological cases. It has been programmed into the procedure as a matter of interest, to see whether such cases arise, rather than in the expectation that it will provide a key element in the iteration scheme.

The selection of an ε_h of .001 is, of course, quite arbitrary. In selecting an ε_h, it is necessary to weigh the desirability of selecting a step size small enough that the likelihood is increased, or at least not decreased, against the extra time it takes to calculate the likelihood value at any given step size and the fact that if too small a step is taken, a^(i) will almost coincide with a^(i-1). (This follows the old army adage, "Do something, even if it is wrong!")

Quadratic Approximation

The quadratic approximation referred to previously consists of calculating the second degree polynomial which fits three equally spaced points exactly.² If a^(i-1) is the value of a at the start of the iteration, h* is the step size with f*(a^(i-1) + h*d^(i)) ≥ f*(a^(i-1)), and f*(a^(i-1) + 2h*d^(i)) ≤ f*(a^(i-1) + h*d^(i)), then the point a^(i-1) + h**d^(i) will be the maximum of the quadratic function which goes through the three points a^(i-1), a^(i-1) + h*d^(i), and a^(i-1) + 2h*d^(i) when h** is calculated as:

(V.49)   h** = h*·[1 + (f*(a^(i-1)) - f*(a^(i-1) + 2h*d^(i))) / (2{f*(a^(i-1) + 2h*d^(i)) + f*(a^(i-1)) - 2f*(a^(i-1) + h*d^(i))})]

If in calculating (V.49) the denominator is zero, h** is set to 2h* if f*(a^(i-1) + 2h*d^(i)) = f*(a^(i-1)) [and therefore f*(a^(i-1) + h*d^(i)) will also equal f*(a^(i-1))], and h** is set to h* if f*(a^(i-1) + 2h*d^(i)) ≠ f*(a^(i-1)). Formula (V.49) holds even if h* is negative. As noted earlier, if f*(a^(i-1) + h*d^(i)) > f*(a^(i-1) + h**d^(i)), h* is used as the step size rather than h**.

In problems calculated to date by the writer, the use of the quadratic approximation has not been a powerful procedure in the selection of a step size. For some iterations it has given a step with a higher likelihood value, and for some iterations it has not. In fitting the quadratic, the f*(a) which we are using is only a monotonic function of the likelihood function. It is very possible that better results would be obtained if some other monotonic function of the likelihood function were used in deriving trial step sizes. Alternatively, instead of the quadratic approximation, some other approximating function could be used. In any event, until a better approximation is devised, the actual quadratic approximation calculation is trivial, and the number of times it does lead to slightly better step sizes would appear to justify the additional time required to calculate the likelihood value at the new point, a^(i-1) + h**d^(i).

"Local" methods could be derived for calculating an optimal step size based on, say, the eigenvalues of the metric, the ratios of elements of d^(i), etc.; however, these methods require assumptions regarding the shape of the likelihood function. (An assumption often made in deriving a local method is that the likelihood function is approximately quadratic in the given direction, d^(i), an assumption which does not appear justified.) As noted earlier, such local methods may be helpful for establishing an initial trial step, h_1^(i), for an iteration, but they do not seem desirable as final determinants of the step size to be used for an iteration.

On the other hand, the suggested step size procedure outlined earlier is a "global" method in that no assumptions are required regarding the shape of the likelihood, except that it has only a single peak in the region under consideration and that there is no higher peak.

²Koopmans, Rubin, and Leipnik [1950], p. 172 attribute the use of a quadratic approximation in the calculation of FIML problems to a suggestion by John von Neumann.

5. Convergence criteria

Following are some requirements which could be imposed in determining convergence:

(1) All eigenvalues of L^(i-1) must be positive for the iteration.

(2) Partial derivative convergence criterion:

(V.50)   max_j(abs l_[j]^(i)) ≤ ε_l

i.e., the absolute value of all elements of the right hand side vector must be less than or equal to a preassigned constant.

(3) Coefficient convergence criterion:

(V.51)   max_j(abs(d_[j]^(i) / a_[j]^(i))) ≤ ε_a

where d_[j]^(i) is the jth element of the direction vector, a_[j]^(i) is the jth element of a^(i), and ε_a is a preassigned constant. If a step size of 1 were imposed, a^(i) - a^(i-1) = (a^(i-1) + d^(i)) - a^(i-1) = d^(i); therefore, (V.51) is equivalent to requiring that, with a step size of 1, the absolute proportional change would be less than or equal to ε_a for all coefficients.

(4) Likelihood convergence criterion:

(V.52)   f*(a^(i-1) + d^(i)) / f*(a^(i-1)) ≤ ε_f*

i.e., if a step size of 1 were imposed, the ratio of the resulting likelihood to the likelihood for the preceding iteration is less than or equal to a preassigned constant.

If a user desired to iterate until each iteration produced a statistically insignificant change in the coefficients, then a stopping criterion based on the relative sizes of f*(a^(i)) and f*(a^(i-1)) might be considered. It should be recognized, however, that for many problems, coefficients differing considerably from those which maximize the likelihood function may not differ statistically from those which maximize the likelihood function, let alone from those of the following iteration; hence, the coefficients derived through use of the likelihood convergence criterion may differ considerably from the maximum likelihood coefficients. Since it is the coefficients which maximize the likelihood function which are desired, not coefficients on a fairly flat surface away from the maximum, the likelihood convergence criterion does not appear to be a very fruitful criterion for convergence.

Up to now, in the FIML section of the AES STAT system (which is discussed in chapter IX), (1) and either "(2) and (3)" or (4) have been imposed for all problems.
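[Illustrative sketch, not part of the original routines: the doubling search of step (1a) in section 4, together with the quadratic refinement (V.49), can be expressed as follows. This is a minimal one-dimensional version; the function and starting values are hypothetical, and the halving and negative-step branches of the full scheme are omitted.]

```python
def step_size(f, a, d, h1=0.5, max_doubles=20):
    """Double the trial step while f keeps rising along direction d, then
    fit a quadratic through the last three equally spaced points (V.49)
    and keep the refined step only if it gives a higher f value."""
    h = h1
    while f(a + 2 * h * d) > f(a + h * d) and h < h1 * 2 ** max_doubles:
        h *= 2
    f0, f1, f2 = f(a), f(a + h * d), f(a + 2 * h * d)
    denom = 2 * (f2 + f0 - 2 * f1)
    h2 = 2 * h if denom == 0 else h * (1 + (f0 - f2) / denom)   # (V.49)
    return h2 if f(a + h2 * d) > f(a + h * d) else h

# On a one-dimensional concave quadratic the refinement lands exactly
# on the maximizer (here, a + h*d = 3).
f = lambda a: -(a - 3.0) ** 2
h = step_size(f, 0.0, 1.0, h1=0.5)
assert abs(h * 1.0 - 3.0) < 1e-9
```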
If none of the preassigned constants were supplied by the user, the problem iterated to the maximum number of iterations specified by the user. If ε_l but not ε_a was specified, ε_a was set to ε_l. If ε_a but not ε_l was specified, ε_l was set to 100·ε_a. An ε_l of .0000000001 and an ε_a of .000000000001 have worked well for a number of problems computed; however, both ε_l and ε_a are quite arbitrary. Close to the maximum, convergence usually becomes very rapid, so that a small ε_a or ε_l takes little additional computer time. On the other hand, if rounding error becomes severe, an extremely small ε_a or ε_l may be difficult to attain. The FIML section is being changed so that if the user specifies neither ε_l nor ε_a, both will be automatically set to .0000000001 by the FIML section; hence, even if no convergence criterion is specified, iteration will no longer be merely to the maximum number of iterations specified.

D. Estimated Disturbance Variance-covariance Matrix

The maximum likelihood estimate of the disturbance variance-covariance matrix used in FIML estimation is the S matrix, with the uu' element of S calculated as:¹

(V.53)   s_uu' = (1/T) û'_u û_u' ,

where û'_u = â'_u Z' and â_u is the part of â_max f(a) corresponding to the coefficients of equation u.

In the single equation procedures previously discussed, the S matrix contained only a single element, and so a "degrees of freedom" of T − n_u was suggested as a possible denominator in calculating s_uu for the uth equation, where n_u is the number of "explanatory" jointly dependent variables plus the number of predetermined variables. A similar adjustment can be made in the calculation of the S matrix for FIML if an estimator more compatible with single equation techniques is desired.
A more compatible estimator would be to use (V.53), but substitute √(T − n_u)·√(T − n_u') for T in the formula.² The denominator for the uth diagonal element will then be T − n_u, as in the single equation procedures, and the denominators for the off-diagonal elements will be such that the S matrix will still be positive definite.

¹See (V.18) and (V.21).

²The substitution of √(T − n_u)·√(T − n_u') for T was suggested to the writer by Professor Arnold Zellner as an alternative to the use of T in the calculation of ZA estimates. Professor Zellner neither endorsed nor disparaged this adjustment for ZA. He merely listed it as an alternative.

It seems desirable that the maximum likelihood estimate of S (using the denominator of T) be used during iteration until convergence is complete, so that â_FIML will indeed be the maximum likelihood estimate of the coefficients. √(T − n_u)·√(T − n_u') would only be used in the denominator when printing out the estimated disturbance variance-covariance matrix corresponding to the FIML coefficients. The adjustment could also be used when printing out the estimated disturbance variance-covariance matrix corresponding to the coefficients from which the iterations were started, or to intermediate-stage coefficients. s_uu' calculated by using T and s_uu' calculated by using √(T − n_u)·√(T − n_u') are asymptotically equivalent.

Often we are interested in the relative sizes of the estimated covariances (i.e., relative to the corresponding variances). In this case, the estimated disturbance variance-covariance matrix normalized so that 1's appear on the diagonal is useful. This matrix may be defined as:

(V.54)   S_N = D_S S D_S ,

where

D_S = diag( 1/√s_11 , 1/√s_22 , ... , 1/√s_MM ) ,

i.e., S_N is calculated by dividing each row and column of S by the square root of the corresponding diagonal element of S.
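The two denominators of (V.53) and the normalization (V.54) can be sketched as follows. This is an illustrative rendering under stated assumptions (residuals arranged as a T×M array, one column per equation; the vector of n_u values supplied by the caller), not the original program.

```python
import numpy as np

def s_matrix(resids, n_unrestricted=None):
    """S matrix from a T x M array of residuals, one column per equation.

    With n_unrestricted=None, uses the ML denominator T as in (V.53);
    otherwise divides element uu' by sqrt(T - n_u) * sqrt(T - n_u'),
    the degrees-of-freedom variant discussed in the text."""
    T, M = resids.shape
    cross = resids.T @ resids                 # u_hat_u' u_hat_u' for all pairs
    if n_unrestricted is None:
        return cross / T
    root = np.sqrt(T - np.asarray(n_unrestricted, dtype=float))
    return cross / np.outer(root, root)       # still positive definite

def normalize(S):
    """(V.54): divide each row and column of S by the square root of the
    corresponding diagonal element, giving 1's on the diagonal."""
    d = 1.0 / np.sqrt(np.diag(S))
    return S * np.outer(d, d)
```

As the text notes, the normalized matrix is the same whichever denominator is used, since the row and column scalings cancel; the test below checks this numerically.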
S_N has the advantage of being independent of the scale of the normalizing variables--1's on the diagonal provide a convenient normalization. S_N is the same whether T or √(T − n_u)·√(T − n_u') is used in the denominator in the calculation of the elements of S. The element in row u and column u' of S_N is:

(V.55)   s_uu' / √(s_uu · s_u'u') ,

the estimated simple correlation between the disturbance in equation u and the disturbance in equation u'.

E. Estimated Coefficient Variance-covariance Matrix

The maximum likelihood estimate of the coefficient variance-covariance matrix for FIML is:¹

(V.56)   Var(â) = [ −T ∂²f(a)/∂a ∂a' ]⁻¹ evaluated at a = â_max f(a) = [L_max f(a)]⁻¹ .

The estimated variances of the individual coefficients are given by the diagonal elements. The square roots of these diagonal elements are often used as asymptotic standard errors.

The elements of Var(â) could be adjusted in the same manner as the estimated disturbance variance-covariance matrix to provide an estimate more compatible with the usual single equation estimates. Let L⁻¹_uu' be the n_u × n_u' block of [L_max f(a)]⁻¹ with rows corresponding to unrestricted coefficients of equation u and columns corresponding to unrestricted coefficients of equation u'. Then, if L⁻¹_uu' were multiplied by T / (√(T − n_u)·√(T − n_u')), the resulting estimated coefficient variance-covariance matrix would still be positive definite and asymptotically the same as the one given in (V.56), but more compatible with the estimated variance-covariance matrix given for the single equation methods.² This adjustment should not be used during iteration to the FIML solution--only during printing out of the estimated coefficient variance-covariance matrix after â_FIML has been calculated.

¹Chernoff and Divinsky [1953], p. 259. This is the "information matrix" of Kendall and Stuart [1961], pp. 28, 54-55.

Often we are interested in the relative size of the estimated covariances.
In this case, the estimated coefficient variance-covariance matrix normalized so that 1's appear on the diagonal is useful. This matrix may be defined as:

(V.57)   Var_N(â) = D_C Var(â) D_C ,

where

D_C = diag( 1/√Var(â_(1)) , 1/√Var(â_(2)) , ... , 1/√Var(â_(n)) ) ,

i.e., Var_N(â) is calculated by dividing each row and column of Var(â) by the square root of the corresponding diagonal element of Var(â). Var_N(â) has the advantage of being independent of the scale of the variables; 1's on the diagonal provide a convenient normalization. Var_N(â) is the same whether T or √(T − n_u)·√(T − n_u') is used in the denominator in the calculation of Var(â). The element in row i and column j of Var_N(â) is the estimated simple correlation between â_i and â_j.

²Computationally, the adjustment may be accomplished by (1) multiplying each row by √(T/(T − n_u)), where n_u is the number of unrestricted coefficients in the equation to which the coefficient corresponding to the row relates, and (2) multiplying each column by √(T/(T − n_u')), where n_u' is the number of unrestricted coefficients in the equation to which the coefficient corresponding to the column relates.

F. Arbitrary Linear Restrictions Imposed on the Coefficients

The iterative FIML procedure given in the previous section is designed to maximize f(a) with respect to a; that is, to adjust the elements of a until f(a) is the highest possible, where f(a) is defined in (V.23). In this section we will consider the problem:

(V.58)   max_a f(a)

subject to:

(V.59)   R a = r ,   with R: NR×n, a: n×1, r: NR×1 .

The vector a is the same vector as before--it contains all of the non-zero non-normalization elements of the rows of α which correspond to stochastic equations in the system. In this section, NR additional arbitrary linear restrictions are imposed on the n = Σ_{μ=1}^{M} n_μ coefficients of a.
The equations need not be identifiable in the absence of the additional restrictions; however, they must be identifiable after the restrictions are imposed.

1. Illustration of linear restrictions on coefficients--Klein's model I

The placing of linear restrictions on coefficients is very straightforward and hardly needs illustrating. This example will, however, be used to demonstrate the effect of using identity equations to solve out jointly dependent variables in the system, since this is a technique that is commonly referred to but apparently not very well understood.¹ In particular, we will note how the use of identity equations to solve out jointly dependent variables leads to the imposition of restrictions on coefficients, and how these restrictions may be expressed by the R matrix and r vector mentioned above.

¹This example should convince most readers of the desirability of explicitly using the identity equations in the computational procedure instead of using the identity equations to eliminate jointly dependent variables before commencing computation.

Equations (I.5a) through (I.5h) present Klein's model I as an 8-equation model (3 stochastic equations and 5 identity equations) containing 8 jointly dependent variables (C, P, W₁, I, W, E, Y, and K).² Often, the model is written as containing 3 stochastic equations and 3 identity equations by implicitly carrying restrictions on certain coefficients rather than listing the last two identity equations. (The last two identity equations may be considered as having been solved into the stochastic equations, as compared to our previous formulation of the model.) Following is an expression of Klein's model I as a 6-equation model:

²The definitions of the variables in Klein's model I follow (I.5h).

(V.60a) Consumption:   C = α_0^[1] + α_1^[1] P + α_2^[1] (W₁ + W₂) + α_3^[1] P₋₁ + u₁

(V.60b) Investment:   I = α_0^[2] + α_1^[2] P + α_2^[2] P₋₁ + α_3^[2] K₋₁ + u₂

(V.60c) Private Wage:
W₁ = α_0^[3] + α_1^[3] (Y + R − W₂) + α_2^[3] (Y + R − W₂)₋₁ + α_3^[3] t + u₃

(V.60d) Product:   Y + R = C + I + G

(V.60e) Income:   Y = P + W₁ + W₂

(V.60f) Capital:   K = K₋₁ + I

As the model is written above, it cannot be calculated by the FIML method, since it still contains 8 jointly dependent variables (C, P, W₁, W₁ + W₂, I, Y + R − W₂, Y, and K) but only 6 equations. It may be computed by FIML if we rewrite the first and third equations as:

(V.61a)   C = α_0^[1] + α_1^[1] P + α_21^[1] W₁ + α_22^[1] W₂ + α_3^[1] P₋₁ + u₁

(V.61b)   W₁ = α_0^[3] + α_11^[3] Y + α_12^[3] (R − W₂) + α_2^[3] (Y + R − W₂)₋₁ + α_3^[3] t + u₃

and impose the restrictions:

α_21^[1] = α_22^[1] ,   α_11^[3] = α_12^[3] .

Thus, in eliminating the two identity equations, we have solved out two of the original eight jointly dependent variables, namely W (= W₁ + W₂) and E (= Y + R − W₂), and imposed restrictions on certain coefficients. At this point, we may write our problem as one of:

max_a f(a)   subject to:   R a = r ,

where a is the 14×1 vector of the coefficients of (V.61a), (V.60b), and (V.61b), R is a 2×14 matrix whose first row contains a 1 in the column of α_21^[1] and a −1 in the column of α_22^[1], and whose second row contains a 1 in the column of α_11^[3] and a −1 in the column of α_12^[3] (zeros elsewhere), and r is a 2×1 vector of zeros.

Now let us use the three remaining identity equations to solve out the jointly dependent variables Y, I, and C.¹ From (V.60e) we have:

(V.62e)   Y = P + W₁ + W₂ .

Rewriting (V.60f) we obtain:

(V.62f)   I = K − K₋₁ .

Substituting (V.62e) and (V.62f) into (V.60d) and rewriting it in terms of C we obtain:

(V.62d)   C = P + W₁ + W₂ + R − K + K₋₁ − G .

Substituting C, Y, and I as expressed by (V.62d), (V.62e), and (V.62f), we get the following expression of the model:

¹These are the same variables solved out by Chernoff and Divinsky [1953].
(V.63a)   P + W₁ + W₂ + R − K + K₋₁ − G = α_0^[1] + α_1^[1] P + α_21^[1] W₁ + α_22^[1] W₂ + α_3^[1] P₋₁ + u₁

(V.63b)   K − K₋₁ = α_0^[2] + α_1^[2] P + α_2^[2] P₋₁ + α_3^[2] K₋₁ + u₂

(V.63c)   W₁ = α_0^[3] + α_11^[3] (P + W₁ + W₂) + α_12^[3] (R − W₂) + α_2^[3] (Y + R − W₂)₋₁ + α_3^[3] t + u₃

subject to:

α_21^[1] = α_22^[1] ,   α_11^[3] = α_12^[3] .

Again, the model as expressed cannot be computed by FIML, since it has 5 jointly dependent variables (P + W₁ + W₂ + R − K + K₋₁ − G, P, W₁, K − K₋₁, and P + W₁ + W₂) and only three equations. It may be rewritten in the following manner to make it amenable to computation by FIML:

(V.64a)   K = −α_0^[1] + (1 − α_1^[1]) P + (1 − α_21^[1]) W₁ + (1 − α_22^[1]) W₂ − α_3^[1] P₋₁ + α_4^[1] (G − R − K₋₁) − u₁

(V.64b)   K = α_0^[2] + α_1^[2] P + α_2^[2] P₋₁ + (α_3^[2] + 1) K₋₁ + u₂

(V.64c)   W₁ = α_0^[3] + α_11^[3] P + α_12^[3] W₁ + α_13^[3] R + α_2^[3] (Y + R − W₂)₋₁ + α_3^[3] t + u₃

In terms of the a vector, the model (V.64a)-(V.64c) contains 16 listed coefficients. For the first equation it is convenient to define starred coefficients, e.g., α_1^[1]* = 1 − α_1^[1], α_21^[1]* = 1 − α_21^[1], and α_22^[1]* = 1 − α_22^[1], so that (V.64a) is linear in the listed coefficients. The four restrictions are then: α_21^[1]* = α_22^[1]* (1 restriction); α_4^[1] = −1, the coefficient of (G − R − K₋₁) (1 restriction); and α_11^[3] = α_12^[3] = α_13^[3] (2 restrictions). Expressed as R a = r, R is a 4×16 matrix--each equality restriction places a 1 and a −1 in the columns of the paired coefficients--and r contains a −1 in the row fixing α_4^[1] and zeros elsewhere.

After converging to the FIML values for a, the desired coefficients which have not been calculated directly are recovered as:

α_1^[1] = 1 − α_1^[1]* ,   α_21^[1] = 1 − α_21^[1]* ,   etc.

In our last formulation, we used an explicit normalization for each equation. For the first equation, it would probably have been slightly more convenient not to save out a normalizing coefficient, but instead use the R matrix and r vector to impose a normalization on the coefficients.
Rewritten in this manner, the first equation becomes:

(V.65a)   α_0^[1] + (α_1^[1] − 1) P + (α_21^[1] − 1) W₁ + α_4^[1] K + (α_22^[1] − 1) W₂ + α_3^[1] P₋₁ + α_5^[1] (G − R − K₋₁) + u₁ = 0 ,

and the second and third equations are unchanged.¹ The restrictions are now: (α_21^[1] − 1) = (α_22^[1] − 1) (1 restriction); α_4^[1] = 1 and α_5^[1] = 1 (2 restrictions), the first of which serves as the implicit normalization of the equation; and α_11^[3] = α_12^[3] = α_13^[3] (2 restrictions)--5 restrictions in all on the 17 listed coefficients. The corresponding R matrix is 5×17, and r contains a 1 in each of the rows fixing α_4^[1] and α_5^[1] and zeros elsewhere.

¹Since α_1^[1] differs from the listed coefficient (α_1^[1] − 1) only by a constant, its variance and standard error are the same as those of the listed coefficient; similarly for the other shifted coefficients. The ratios of coefficients to asymptotic standard errors (often called asymptotic t-ratios) will, of course, have to be recalculated, since â_1^[1] and â_1^[1] − 1 differ while their standard errors coincide.

Summary of Klein Model I Formulations

The number of coefficients to be estimated in each of the formulations of Klein's model I may be summarized as follows:

  Model expressed              Number of    Number of      Number of      Number of
  by equations                 identity     coefficients   restrictions   "unrestricted"
                               equations                                  coefficients
  (I.5a)-(I.5h)                    5            12              0              12
  (V.61a), (V.60b), (V.61b),
  (V.60d), (V.60e), (V.60f)        3            14              2              12
  (V.64a)-(V.64c)                  0            16              4              12
  (V.65a), (V.64b), (V.64c)        0            17              5              12

Thus, the dimensionality of the unrestricted coefficient space does not change as identity equations are used to solve out jointly dependent variables.
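The two-restriction formulation above can be sketched concretely. The code below is illustrative only: the ordering of the 14 coefficients (consumption equation first, then investment, then wage) and hence the column positions in R are assumptions, since the exact ordering used in the original program is not shown.

```python
import numpy as np

n = 14                 # coefficients of (V.61a), (V.60b), (V.61b)
R = np.zeros((2, n))
r = np.zeros(2)

# Assumed ordering: consumption (a0, a1, a21, a22, a3) in columns 0-4,
# investment in columns 5-8, wage (c0, c11, c12, c2, c3) in columns 9-13.
R[0, 2], R[0, 3] = 1.0, -1.0    # alpha_21^[1] - alpha_22^[1] = 0
R[1, 10], R[1, 11] = 1.0, -1.0  # alpha_11^[3] - alpha_12^[3] = 0

# Any coefficient vector obeying the two equalities satisfies R a = r:
a = np.arange(1.0, n + 1.0)
a[3] = a[2]      # impose alpha_22^[1] = alpha_21^[1]
a[11] = a[10]    # impose alpha_12^[3] = alpha_11^[3]
```

Since the two rows of R are linearly independent, the number of independent restrictions (rk R = 2) falls out of the computation, matching the summary table.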
Since the number of iterations required for convergence is related to the dimensionality of the coefficient space, all of the above expressions would require a comparable number of iterations. On the other hand, as we will note from the formulas of the next section, the formal listing of the identity equations, rather than using them to solve out jointly dependent variables, involves less cumbersome calculations each iteration, thereby taking less total computer time. Explicit listing of identity equations also has the advantages: (1) it provides a more convenient way to formulate the problem (at least in Klein's model I) than solving out the identity equations, (2) all of the coefficients are derived directly, (3) estimated variances and t-ratios are calculated directly, and (4) calculation of reduced form coefficients is simpler due to the direct calculation of the coefficients.

Restrictions on coefficients may arise from contexts other than using identity equations to eliminate jointly dependent variables. In these cases it will likely be more convenient to impose the restrictions directly on the coefficients in the form R a = r than to attempt to create additional jointly dependent variables and identity equations to impose the restrictions. The above illustration dealt with restrictions imposed on the coefficients of a single equation at a time; however, the computational formulas which follow are equally applicable to restrictions which cut across equations.

2. Computational formulas

In chapter IV we noted a procedure for transforming an R matrix and r vector into a Q matrix and q vector with certain special properties relative to the R matrix and r vector. The procedure gave a method of separating the vector of coefficients, a, into:
(V.66)   a* = [ a_(1) ; a_(2) ] ,   with a*: n×1, a_(1): (n − rk R)×1, a_(2): (rk R)×1 ,

where the n − rk R coefficients in the a_(1) vector are estimated directly and the rk R coefficients in the a_(2) vector are calculated from the a_(1) vector. a* is the same as a except that the coefficients may have been rearranged. The FIML computational formulas which take account of the additional restrictions are:

(V.67)   a_(1)^(i) = a_(1)^(i-1) + h^(i) |Q' L^(i-1) Q|⁻¹ Q' l^(i-1) ,
         with a_(1): (n − rk R)×1, h^(i): 1×1, Q' L^(i-1) Q: (n − rk R)×(n − rk R), Q' l^(i-1): (n − rk R)×1 ,

(V.68)   a_(2)^(i) = Q₂ a_(1)^(i) + q₂ ,
         with Q₂: (rk R)×(n − rk R), q₂: (rk R)×1 ,

where:

L^(i-1) is T times the negative of the matrix of second partial derivatives of f(a) with respect to a, as before. (The additional restrictions are ignored in defining L^(i-1).)

l^(i-1) is T times the vector of first partial derivatives of f(a) with respect to a, as before. (The additional restrictions are ignored in defining l.)

Q, Q₂, q, and q₂ are calculated from R and r as in section IV.B.1.¹

a^(i) and a^(i-1) denote the values of the coefficients for iterations i and i − 1, respectively.

The | | lines around |Q' L^(i-1) Q| denote the positive definite matrix constructed from Q' L^(i-1) Q in the same manner that |L^(i-1)| is constructed from L^(i-1).

Derivations are given in the following sub-section. (V.67) and (V.68) may be combined into the single formula:

(V.69)   a^(i) = Q [ a_(1)^(i-1) + h^(i) |Q' L^(i-1) Q|⁻¹ Q' l^(i-1) ] + q ,
         with a^(i): n×1, Q: n×(n − rk R), q: n×1 .

When convergence has been achieved, the estimated coefficient variance-covariance matrix is given by:

(V.70)   asymptotic Var(â_FIML) = Q [Q' L_max f(a) Q]⁻¹ Q' ,

where max f(a) is the restricted maximum satisfying R a = r. The S matrix is calculated by formula (V.21), except that the restricted coefficients (V.69) obtained from the preceding iteration are used instead of the corresponding unrestricted coefficients in calculating the S matrix.

¹E.g., the Q matrix is the orthogonal complement of the R matrix.
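A minimal sketch of the restricted update (V.67)-(V.69) follows. It assumes L is already positive definite (so no repair of |·| is needed), builds Q as a null-space basis of R and q as a particular solution of R q = r via an SVD and least squares--one way to obtain matrices with the properties required of section IV.B.1, not necessarily the document's own construction--and takes l to be the gradient-type vector of the text.

```python
import numpy as np

def q_parameterization(R, r):
    """Return Q, q such that {a : R a = r} = {Q a1 + q : a1 free}."""
    U, s, Vt = np.linalg.svd(R)
    rank = int(np.sum(s > 1e-10 * s.max()))
    Q = Vt[rank:].T                            # null-space basis, n x (n - rk R)
    q = np.linalg.lstsq(R, r, rcond=None)[0]   # particular solution of R q = r
    return Q, q

def restricted_step(a1, L, l, Q, q, h=1.0):
    """(V.69): a = Q [a1 + h (Q' L Q)^{-1} Q' l] + q."""
    a1_new = a1 + h * np.linalg.solve(Q.T @ L @ Q, Q.T @ l)
    return Q @ a1_new + a1_new * 0 + q if False else (Q @ a1_new + q, a1_new)
```

(The test exercises the step on a quadratic objective with maximum at m, for which L = P and l = P(m − a); a single unit step then reaches the restricted maximum, where Q' l vanishes.)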
Any restriction may be imposed only on the coefficients of a single equation, or it may cut across equations. As in section IV.B, the restrictions need not be linearly independent, i.e., R need not have full row rank.

It is usually convenient and saving of computer time to list only the non-zero non-normalizing coefficients of the stochastic equations in the a vector; therefore, the above procedure is outlined with this in mind. It should be noted, however, that any coefficients of α may be listed in the a vector and then restricted to −1, 0, or some other value by the R matrix and r vector. For some problems an implicit normalization of coefficients is more convenient than the explicit normalization which we have used--setting a coefficient to −1. Use of the R matrix and r vector allows for an implicit normalization--all that is required is that some normalization be imposed by the restrictions so that the coefficients of an equation cannot take on an infinite number of values due to lack of a normalization rule.¹

¹An implicit normalization is used for the first equation of Klein's model I in the last version of the "restrictions" example of section V.F.1.

Derivation of (V.66) through (V.70)

(V.68) is given by the method of calculating the Q matrix and q vector. (See section IV.B.1.) From (V.68), we derive that:

(V.71)   a^(i) = [ a_(1)^(i) ; a_(2)^(i) ] = [ I ; Q₂ ] a_(1)^(i) + [ 0 ; q₂ ] ,

or

(V.72)   a^(i) = Q a_(1)^(i) + q .

In the following paragraphs, we omit the superscripts (i) for simplicity. Since the coefficients of a are a function of the coefficients of a_(1) (a_(1) is a sub-vector of a), we can maximize f(a) directly with respect to a_(1). Let a_[i] denote the ith element of a, and a_(1)[j] denote the jth element of a_(1). Then from (V.72) we get:

(V.73)   ∂a_[i] / ∂a_(1)[j] = Q_ij ,

where Q_ij is the ijth element of Q. Now consider the vector of T times the partial derivatives, T ∂f(a)/∂a_(1). Its jth element is

T ∂f(a)/∂a_(1)[j] = Σ_{i=1}^{n} T (∂f(a)/∂a_[i]) (∂a_[i]/∂a_(1)[j]) = Σ_{i=1}^{n} l_[i] Q_ij ,   (j = 1, ...,
n − rk R) ,

where l_[i] is the ith element of l. Hence,

(V.74)   T ∂f(a)/∂a_(1) = Q' l .

Next consider the matrix of (−T) times the second-order partial derivatives, −T ∂²f(a)/∂a_(1) ∂a_(1)'. The element in row j, column k is (minus) the derivative of the jth element above with respect to a_(1)[k], i.e.,

−T ∂²f(a) / ∂a_(1)[j] ∂a_(1)[k] = Σ_{m=1}^{n} Σ_{i=1}^{n} (−T ∂²f(a)/∂a_[m] ∂a_[i]) Q_mj Q_ik = Q_j' L Q_k ,

where Q_j and Q_k are columns j and k of Q. Thus,

(V.75)   −T ∂²f(a) / ∂a_(1) ∂a_(1)' = Q' L Q .

Taking account of the new matrix of second derivatives and the new vector of partials, we can now iterate to a maximum, basing our iteration on the n − rk R coefficients which we have separated out. Thus, (V.46) becomes:

(V.76)   a_(1)^(i) = a_(1)^(i-1) + h^(i) [ −T ∂²f(a)/∂a_(1) ∂a_(1)' ]⁻¹ [ T ∂f(a)/∂a_(1) ] , evaluated at a^(i-1) .

Substituting (V.74) and (V.75) into (V.76), we get (V.67). (V.70) follows from (V.72) by a common variance relationship--if y = Ax + b, with A a matrix of fixed elements and b a vector of fixed elements, then Var(y) = A [Var(x)] A'.

G. Linearized Maximum Likelihood (LML)

The linearized maximum likelihood (LML) method is a complete system estimation method recently proposed by Rothenberg and Leenders.¹ LML estimates and asymptotic coefficient variances and covariances may be calculated as:

(V.77)   a_LML = a^(1) + [L^(1)]⁻¹ l^(1)

(V.78)   asymptotic Var(a_LML) = [L^(1)]⁻¹ ,

where L^(1) and l^(1) are calculated from the starting estimates a^(1).² That is, the LML estimates are the estimates obtained at the end of the first iteration of the FIML iteration procedure, provided the following restrictions are imposed on the iteration:

(1) L^(1) is not converted to a positive definite matrix even though it is not positive definite. (Usually L^(1) will not be positive definite the first iteration.)

(2) A step size of 1 is automatically imposed.

¹Rothenberg and Leenders [1964].

²A "degrees of freedom" adjustment may be made in asymptotic Var(a_LML) in the same manner as for FIML.
The restriction formulas given for FIML may be applied to LML as well without changing the basic LML properties.

Provided the starting estimates for LML (a^(1)) are consistent estimates of the underlying coefficients a, with a^(1) − a = O(T^(-1/2)) in probability, and also provided that certain other regularity and existence conditions hold, the LML coefficients have the same asymptotic distribution as the corresponding FIML coefficients, i.e., L^(1) is asymptotically the same as L_max f(a). This holds under a wide range of conditions, including restrictions imposed on the disturbance variance-covariance matrix.

Coefficients from 2SLS, LIML, or other methods meeting the consistency and probability requirements may be used as starting coefficients for the calculation of a_LML; however, the LML coefficients obtained if a^(1) = a_2SLS will not, of course, equal the LML coefficients obtained if a^(1) = a_LIML. DLS estimates do not meet the consistency requirements; therefore, they cannot be used as starting estimates if the properties of LML coefficients are to be preserved.¹

The LML coefficients may have a lower likelihood value than the starting estimates. This is because:

(1) L^(1) will in general not be positive definite; hence, our assurance that for a step size, h, sufficiently small, the calculation of a^(i) = a^(i-1) + h^(i) [L^(i-1)]⁻¹ l^(i-1) leads to a higher likelihood value will not in general hold.

(2) Even if L^(1) is positive definite for a particular problem, the imposition of a step size of 1 may result in too large a step in the particular direction given by [L^(1)]⁻¹ l^(1), with the result that a movement is made to a lower likelihood value.

Diagonal elements of [L^(1)]⁻¹ may be negative, giving estimates of asymptotic coefficient variances which are negative.
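The one-step character of LML makes it easy to sketch. The function below is an illustrative rendering of (V.77)-(V.78) only; L1 and l1 stand for the matrix and vector of (scaled) second and first partials evaluated at the starting estimates, which are assumed to be supplied by the FIML machinery described earlier.

```python
import numpy as np

def lml(a_start, L1, l1):
    """One LML step: a_LML = a_start + L1^{-1} l1, Var = L1^{-1}.

    Deliberately does NOT repair L1 into a positive definite matrix
    and fixes the step size at 1, as the text requires; consequently
    diagonal elements of the returned covariance may be negative."""
    L1_inv = np.linalg.inv(L1)
    a_lml = a_start + L1_inv @ l1       # (V.77)
    var_lml = L1_inv                    # (V.78)
    return a_lml, var_lml
```

The test uses an indefinite L1 to exhibit the negative "variance" phenomenon the text warns about.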
¹DLS estimates may be used as starting estimates for FIML, since iteration proceeds to the maximum of the likelihood anyway. The FIML coefficients obtained will, of course, coincide with the FIML coefficients obtained from starting from 2SLS or LIML coefficients (assuming that multiple maxima do not occur in the regions of the coefficients).

CHAPTER VI

LIMITED INFORMATION SUBSYSTEM MAXIMUM LIKELIHOOD (SML)

A. Only Zero and Normalization Restrictions Imposed on Coefficients

Limited information subsystem maximum likelihood (SML) is a partial system maximum likelihood method which permits the simultaneous estimation of a subset of the equations in a system of equations.¹ In FIML estimation we distinguished between two kinds of equations--the

¹The name limited information subsystem maximum likelihood comes from taking account of the structure of only a subsystem of the whole system in the estimation procedure. A basic reference for the SML method is Koopmans and Hood [1953]. Chernoff and Divinsky [1953] refer to the method as the LIS (limited information subsystem) method and give computational formulas. The computation of a particular SML problem is also given by Chernoff and Divinsky. We will use the abbreviation SML rather than LIS to emphasize the maximum likelihood character of SML, which distinguishes SML from other "limited information subsystem" methods such as 3SLS (three-stage least squares) and I3SLS (iterative 3SLS).

Hannan [1967] recently derived an SML method (seemingly unaware of the prior existence of the more general SML method given by Koopmans and Hood [1953] which is discussed in this chapter) and showed the relationship between it and canonical correlation. As Chow and Ray-Chaudhuri [1967] make clear, Hannan's method is only applicable as a very special case of the general SML procedure given by Koopmans and Hood. (The special case may be stated in our notation (pp. 226 et seq.) as follows.
Let α_I1 = (Γ_I1  B_I1), where α_I1 is M₁×(G + A), Γ_I1 is M₁×G, and B_I1 is M₁×A (M₁ being the number of equations in the subsystem). Now imagine rearranging variables so that all G₁ dependent variables in the subsystem occur first among the dependent variables in the whole system, and all A₁ predetermined variables in the subsystem occur first among the predetermined variables in the whole system, so we can write (Γ_I1  B_I1) = (Γ_Δ  0  B_*  0), where Γ_Δ is M₁×G₁ and B_* is M₁×A₁. Now consider the matrix α_Δ* = (Γ_Δ  B_*), which is M₁×(G₁ + A₁). The special case that Hannan treats is that in which, a priori, there are exactly M₁ − 1 zeroes in each row of α_Δ*.)

M stochastic equations and the G − M identity equations. Thus, in FIML estimation we subdivided the system of equations into [see (V.3), (V.4), and (V.5)]:

(VI.1)   α_I Z' + U' = 0' ,   with α_I: M×(G+A), Z': (G+A)×T, U': M×T

(VI.2)   α_II Z' = 0' ,   with α_II: (G−M)×(G+A)

In SML estimation we make a further division of the M stochastic equations into (1) the M₁ stochastic equations for which we specify the structure and (2) the remaining M₂ equations for which we estimate no coefficients and specify only any predetermined variables occurring in the M₂ equations which do not already occur in the M₁ equations. Thus, in deriving the SML computational formulas, we will find it useful to subdivide the stochastic equations into two subsystems. The complete system may be written as:

(VI.3)   α_I1 Z' + U₁' = 0' ,   with α_I1: M₁×(G+A), U₁': M₁×T

(VI.4)   α_I2 Z' + U₂' = 0' ,   with α_I2: M₂×(G+A), U₂': M₂×T

(VI.5)   α_II Z' = 0' ,   with α_II: (G−M)×(G+A)

In SML estimation, only the structure of the first M₁ stochastic equations, given by (VI.3) (subsystem I1), is specified. Also, the predetermined variables in the entire system (including those which are in subsystem I2--the remaining M₂ stochastic equations--and subsystem II--the G − M identity equations) are specified (insofar as the researcher is able to specify additional predetermined variables in subsystems I2 and II).
In the derivation of the concentrated likelihood function which follows, we will see that the same likelihood function is obtained whether some or all of the identity equations are used to solve out jointly dependent variables in subsystem I1 or I2, or whether the identity equations are merely ignored (except that additional predetermined variables which occur in the identity equations are included in the set of predetermined variables recognized as being in the system in applying the computational procedure).

Let G₁ be the number of jointly dependent variables occurring in subsystem I1. Then (as will be shown further on), if G₁ equals M₁ and the rank of the matrix of predetermined variables in the entire system is less than T − M₁ + 1, the SML coefficients will coincide with the FIML coefficients obtained by applying the FIML computational procedure to the M₁ equations only.¹ This holds whether the M₁ equations constitute the entire system or whether they form a subsystem of the entire system, with additional predetermined variables in subsystems I2 and II occurring only with zero coefficients in the equations of subsystem I1. On the other hand, if subsystem I1 consists of only a single stochastic equation (i.e., the structure of only a single equation is specified), the resulting SML coefficients for the equation coincide with the usual LIML coefficients for the equation.

¹rk X < T − M₁ + 1 is necessary for computation of SML coefficients, as is shown in section VI.D; however, this condition is not a requirement for FIML computation, since the X matrix is not used in the adjustment of jointly dependent variables in FIML estimation.
Thus (provided rk X < T − M₁ + 1), both FIML and LIML estimation may be considered to be particular cases of SML estimation.¹ However, since the SML computational procedure is more cumbersome than the FIML computational procedure, it is more fruitful to calculate problems in which the number of jointly dependent variables in the system or subsystem to be estimated equals the number of equations by the FIML computational procedure rather than the SML computational procedure. Similarly, if only a single stochastic equation is specified, it is more fruitful to use the much simpler LIML computational procedure than the SML computational procedure. The SML procedure may be applied, however, to the many cases in which the structure of more than one stochastic equation is specified but the entire system is not specified.

The SML method is sometimes referred to as the least generalized variance ratio (LGVR) method, since the coefficients obtained minimize the ratio of two generalized variances.² This is not as powerful a property as the LGV property of FIML, even though the two properties coincide if the number of jointly dependent variables equals the number of equations.

¹Provided the matrix of instruments in the LIML estimation is taken to be X (the matrix of predetermined variables for the system), or provided the matrix of instruments is used in place of X in the SML calculations. A proof of the equivalence of LIML and SML in the case of only a single equation occurring in the subsystem being estimated is contained in Koopmans and Hood [1953], pp. 166-173.

²See Koopmans and Hood [1953], pp. 170-171.

1. Derivation of the likelihood function to be maximized

The Likelihood To Be Maximized. Before indicating SML computational formulas, the function to be maximized by the SML procedure will be indicated. As a step in the derivation of the likelihood function, we will use the G − M identity equations to temporarily eliminate G − M jointly dependent variables from the system.
(The eliminated variables will be reentered into the system at a later step.) Suppose that the jointly dependent variables are divided into two groups, Y = [Y₁ : Y₂], where Y₂ contains the G − M jointly dependent variables to be temporarily eliminated from the system and Y₁ contains the M jointly dependent variables which will remain. To reflect our subdivisions we may rewrite (VI.3) through (VI.5) as:¹

(VI.6) Γ₁₁,₁Y₁' + Γ₁₁,₂Y₂' + B₁₁X' + U₁' = 0

(VI.7) Γ₁₂,₁Y₁' + Γ₁₂,₂Y₂' + B₁₂X' + U₂' = 0

(VI.8) Γ₂₁Y₁' + Γ₂₂Y₂' + B_I X' = 0

¹The derivation of the likelihood function to be maximized follows Koopmans and Hood [1953] and relies on that reference for some of the details of the derivation. The derivation given in this section differs from the one given in Koopmans and Hood primarily in that identity equations are explicitly treated in the derivation and it is shown that the same likelihood function is obtained as if the identity equations had been ignored (except for using the predetermined variables from the identity equations in the computational method). Professor Herman Rubin informed the writer that the same result is obtained whether some or all of the identity equations are used to eliminate jointly dependent variables from the stochastic equations whose structure is specified; however, the derivation of the likelihood function in a manner which shows that this is the case (given in this paper) was developed by the writer.

Up to this point the matrix of coefficients has been implicitly or explicitly subdivided in the following ways:

(VI.9) α = [Γ : B] = [α₁₁ ; α₁₂ ; α_I]

where α is G×(G+A), Γ is G×G, B is G×A, α₁₁ is M₁×(G+A), α₁₂ is M₂×(G+A), and α_I is (G−M)×(G+A);

 = [Γ₁₁ B₁₁ ; Γ₁₂ B₁₂ ; Γ_I B_I]

with row blocks of M₁, M₂, and G−M rows and column blocks of G and A columns;

 = [Γ₁₁,₁ Γ₁₁,₂ B₁₁ ; Γ₁₂,₁ Γ₁₂,₂ B₁₂ ; Γ₂₁ Γ₂₂ B_I]

with column blocks of M, G−M, and A columns. After the identity equations (VI.8) are used to eliminate Y₂, the equations of subsystem 11 may be written in terms of Z₁ = [Y₁ : X] as:

(VI.12) α*₁₁Z₁' = −U₁'

where α*₁₁ is M₁×(M+A), Z₁' is (M+A)×T, and U₁' is M₁×T.
(VI.13) α*₁₂Z₁' = −U₂'

where α*₁₂ is M₂×(M+A), Z₁' is (M+A)×T, and U₂' is M₂×T, and where

Γ*₁₁ = Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁
Γ*₁₂ = Γ₁₂,₁ − Γ₁₂,₂Γ₂₂⁻¹Γ₂₁
B*₁₁ = B₁₁ − Γ₁₁,₂Γ₂₂⁻¹B_I
B*₁₂ = B₁₂ − Γ₁₂,₂Γ₂₂⁻¹B_I

and

α* = [Γ* : B*] = [α*₁₁ ; α*₁₂] = [Γ*₁₁ B*₁₁ ; Γ*₁₂ B*₁₂]

In the derivation of the FIML likelihood function, we assumed that U has the multivariate normal density function given by (V.12) and derived the following intermediate likelihood function (V.13):

(VI.14) f₂(Z₁, α*, Σ) = −(MT/2) log 2π − (T/2) log[det Σ] + T log[abs(det Γ*)] − (1/2) Σ_{t=1}^{T} z₁[t] α*' Σ⁻¹ α* z₁[t]'

Subdividing Σ to represent the subdivision of the stochastic equations into two groups, we define:

(VI.15) Σ = [Σ₁₁,₁₁ Σ₁₁,₁₂ ; Σ₁₂,₁₁ Σ₁₂,₁₂]

where Σ is M×M, Σ₁₁,₁₁ is M₁×M₁, Σ₁₁,₁₂ is M₁×M₂, Σ₁₂,₁₁ is M₂×M₁, and Σ₁₂,₁₂ is M₂×M₂; Σ₁₁,₁₁ consists of the disturbance variance-covariance matrix of the equations in subsystem 11 (whose structure has been specified), Σ₁₁,₁₂ and Σ₁₂,₁₁ (= Σ₁₁,₁₂') consist of disturbance covariances between the equations of subsystem 11 and subsystem 12, and Σ₁₂,₁₂ consists of the disturbance variance-covariance matrix of the equations of subsystem 12.

Let us maximize f₂(Z₁, α*, Σ) first with respect to α*₁₂, Σ₁₁,₁₂, Σ₁₂,₁₁, and Σ₁₂,₁₂, taking no account of restrictions imposed on these matrices by the model; in particular, taking no account of the structure of α₁₂, e.g., ignoring the restrictions which would be imposed if account were taken that certain elements of α₁₂ are zero, other elements are −1, and, since α*₁₂ = α₁₂ − Γ₁₂,₂Γ₂₂⁻¹[Γ₂₁ : B_I] (α₁₂ here understood with the columns for Y₂ deleted), these restrictions imply restrictions on α*₁₂. The following logarithmic concentrated likelihood function is obtained:¹
(VI.16) g₂(Z₁, α*₁₁, Σ₁₁,₁₁) = c₃ + (T/2) log[det((1/T)Γ*₁₁[Y₁'Y₁]⊥X Γ*₁₁')] − (T/2) log[det Σ₁₁,₁₁] − (1/2) Σ_{t=1}^{T} z₁[t] α*₁₁' Σ₁₁,₁₁⁻¹ α*₁₁ z₁[t]'

where [Y₁'Y₁]⊥X is the moment matrix of the part of each jointly dependent variable orthogonal to the predetermined variables in the entire system (including predetermined variables in the identity equations and the part of the structure not specified). The [Y₁'Y₁]⊥X matrix is the same as the [+Y'+Y]⊥X matrix of the double k-class estimators except that, instead of one row and column for each jointly dependent variable in equation μ, [Y₁'Y₁]⊥X contains one row and column for each jointly dependent variable not temporarily eliminated from the system, including the normalizing variables. (Also, we are using X as the matrix of instruments, X_I.)

¹Koopmans and Hood [1953], pp. 192-195.

The usually quoted formula for calculating [Y₁'Y₁]⊥X is:²

(VI.17) [Y₁'Y₁]⊥X = Y₁'Y₁ − Y₁'X(X'X)⁻¹X'Y₁ = Y₁'(I − X(X'X)⁻¹X')Y₁;

however, there is considerable advantage to using direct orthogonalization to calculate [Y₁'Y₁]⊥X in the same manner as in calculating [+Y'+Y]⊥X rather than by using the above formula. (For one thing, X need not have full column rank when direct orthogonalization is used.)

If g₂(Z₁, α̂*₁₁, Σ̂₁₁,₁₁) is maximized with respect to Σ₁₁,₁₁ (thereby concentrating the function onto α̂*₁₁ and Z₁), the following relation is obtained:³

(VI.18) Σ̂₁₁,₁₁ = (1/T) α̂*₁₁ [Z₁'Z₁] α̂*₁₁'

Substituting for Σ̂₁₁,₁₁ into (VI.16) and dividing by T/2, we get the function:⁴

(VI.19) g₃(α̂*₁₁, Z₁) = c₄ + log[det((1/T)Γ̂*₁₁[Y₁'Y₁]⊥X Γ̂*₁₁')] − log[det((1/T)α̂*₁₁[Z₁'Z₁]α̂*₁₁')]

c₄ is a constant.

²Except, possibly, for a factor of T, [Y₁'Y₁]⊥X is the W matrix given in Koopmans and Hood [1953] and Chernoff and Divinsky [1953].

³Hood and Koopmans [1953], p. 166.

⁴Hood and Koopmans [1953], p. 195.
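The computational point just made, that direct orthogonalization avoids both the explicit inverse in (VI.17) and the full-column-rank requirement on X, can be illustrated with a short sketch. The code below is illustrative only (the dissertation's routines were written for a CDC 6500, not shown here), and it uses a least-squares residual computation in place of a literal orthogonalization pass; all function and variable names are assumptions of this sketch, not the original program's.

```python
import numpy as np

def moment_perp(Y, X):
    """[Y'Y] orthogonal to X via direct residual computation:
    regress each column of Y on X (a rank-deficient X is allowed,
    since lstsq effectively drops dependent columns) and form the
    moment matrix of the residuals."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ coef          # part of Y orthogonal to the column space of X
    return R.T @ R

rng = np.random.default_rng(0)
T, A, G = 40, 5, 3
X = rng.standard_normal((T, A))
Y = rng.standard_normal((T, G))

# Textbook formula (VI.17): requires X'X to be invertible
W_formula = Y.T @ Y - Y.T @ X @ np.linalg.inv(X.T @ X) @ X.T @ Y
W_direct = moment_perp(Y, X)
assert np.allclose(W_formula, W_direct)

# Rank-deficient X: (VI.17) breaks down, the residual route does not,
# and the answer depends only on the space spanned by X
X_dup = np.hstack([X, X[:, :1]])          # duplicated column: rank 5, not 6
assert np.allclose(moment_perp(Y, X_dup), W_direct)
```

Because only the column space of X matters, multicollinearity among the predetermined variables leaves the result unchanged, which is the property claimed for the direct-orthogonalization approach.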
However,¹ ²

(VI.20) det((1/T)Γ̂*₁₁[Y₁'Y₁]⊥X Γ̂*₁₁') = det((1/T)Γ̂₁₁[Y'Y]⊥X Γ̂₁₁') = det((1/T)Γ̂_A[Y_A'Y_A]⊥X Γ̂_A')

where:

[Y'Y]⊥X is the G×G moment matrix of the part of the G jointly dependent variables in the system orthogonal to the A predetermined variables in the entire system. ([Y'Y]⊥X is the same as [Y₁'Y₁]⊥X but expanded to include all jointly dependent variables in the system.)

[Y_A'Y_A]⊥X is the G₁×G₁ moment matrix of the part of the G₁ jointly dependent variables in subsystem 11 only (the M₁ stochastic equations for which the structure is specified) orthogonal to the A predetermined variables in the entire system.

Γ_A is the M₁×G₁ matrix of coefficients of the G₁ jointly dependent variables which occur in the M₁ stochastic equations for which the structure is specified.

¹c₄ = (2/T)(c₃ − (1/2) Σ_{t=1}^{T} z₁[t] α̂*₁₁' Σ̂₁₁,₁₁⁻¹ α̂*₁₁ z₁[t]')
 = (2/T)(c₃ − (1/2) tr{Σ̂₁₁,₁₁⁻¹ α̂*₁₁[Z₁'Z₁]α̂*₁₁'})
 = (2/T)(c₃ − (1/2) tr{T·I_{M₁}}) = (2/T)c₃ − M₁

where tr denotes trace and we have used the relationship tr(AB) = tr(BA) for any matrices A and B provided AB and BA are defined (i.e., provided the number of rows of A equals the number of columns of B and the number of rows of B equals the number of columns of A).

²The proofs of (VI.20) and (VI.21) follow (VI.21).

If the variables in the Y matrix were rearranged to Y = [Y_A : Y_B], where Y_A contains the G₁ jointly dependent variables which are included in the M₁ equations for which the structure is specified and Y_B contains the remaining G − G₁ jointly dependent variables in the system, and if the columns of Γ₁₁ were rearranged into the same order, then

Γ₁₁ = [Γ_A : 0]

where Γ₁₁ is M₁×G, Γ_A is M₁×G₁, and 0 is M₁×(G−G₁). Also,

(VI.21) det((1/T)α̂*₁₁[Z₁'Z₁]α̂*₁₁') = det((1/T)α̂₁₁[Z'Z]α̂₁₁') = det((1/T)α̂_A[Z_A'Z_A]α̂_A')

where:

Z_A is the T×(G₁+A₁) matrix of variables in the M₁ stochastic equations for which the structure is specified.
(Z_A is the Z matrix with all variables which do not occur in the stochastic equations with structure specified deleted.)

α_A is the M₁×(G₁+A₁) matrix of coefficients of the G₁ + A₁ variables which occur in the M₁ stochastic equations for which the structure is specified. If variables were rearranged as required:

α₁₁ = [α_A : 0]

where α₁₁ is M₁×(G+A), α_A is M₁×(G₁+A₁), and 0 is M₁×[(G−G₁)+(A−A₁)], with

α_A = [Γ_A : B_A]

where Γ_A is M₁×G₁ and B_A is M₁×A₁.

Showing (VI.20)

From (VI.8) we obtain:

(VI.22) Y₂' = −Γ₂₂⁻¹Γ₂₁Y₁' − Γ₂₂⁻¹B_I X';

hence,

(VI.23) [Y]⊥X = [Y₁ : Y₂]⊥X = [Y₁ : −Y₁Γ₂₁'(Γ₂₂⁻¹)' − XB_I'(Γ₂₂⁻¹)']⊥X

However, by (1.59), [−Y₁Γ₂₁'(Γ₂₂⁻¹)']⊥X = −[Y₁]⊥X Γ₂₁'(Γ₂₂⁻¹)', and by (1.59) and (1.42), [−XB_I'(Γ₂₂⁻¹)']⊥X = −[X]⊥X B_I'(Γ₂₂⁻¹)' = −0·B_I'(Γ₂₂⁻¹)' = 0. Hence,

(VI.24) [Y]⊥X = [[Y₁]⊥X : −[Y₁]⊥X Γ₂₁'(Γ₂₂⁻¹)'],

and [Y'Y]⊥X becomes:

(VI.25) [Y'Y]⊥X = [Y]⊥X'[Y]⊥X =
[ [Y₁'Y₁]⊥X , −[Y₁'Y₁]⊥X Γ₂₁'(Γ₂₂⁻¹)' ;
 −Γ₂₂⁻¹Γ₂₁[Y₁'Y₁]⊥X , Γ₂₂⁻¹Γ₂₁[Y₁'Y₁]⊥X Γ₂₁'(Γ₂₂⁻¹)' ]

If Γ₁₁ is correspondingly subdivided in the manner of (VI.9), we have:

(VI.26) det{Γ₁₁[Y'Y]⊥X Γ₁₁'}
 = det{[Γ₁₁,₁ : Γ₁₁,₂] [Y'Y]⊥X [Γ₁₁,₁ : Γ₁₁,₂]'}
 = det{Γ₁₁,₁[Y₁'Y₁]⊥XΓ₁₁,₁' − Γ₁₁,₁[Y₁'Y₁]⊥XΓ₂₁'(Γ₂₂⁻¹)'Γ₁₁,₂' − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁[Y₁'Y₁]⊥XΓ₁₁,₁' + Γ₁₁,₂Γ₂₂⁻¹Γ₂₁[Y₁'Y₁]⊥XΓ₂₁'(Γ₂₂⁻¹)'Γ₁₁,₂'}
 = det{[Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁][Y₁'Y₁]⊥XΓ₁₁,₁' − [Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁][Y₁'Y₁]⊥XΓ₂₁'(Γ₂₂⁻¹)'Γ₁₁,₂'}
 = det{[Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁][Y₁'Y₁]⊥X[Γ₁₁,₁ − Γ₁₁,₂Γ₂₂⁻¹Γ₂₁]'}
 = det{Γ*₁₁[Y₁'Y₁]⊥XΓ*₁₁'}

The second equality of (VI.20) involves a mere rewriting to eliminate from [Y'Y]⊥X variables for which no corresponding jointly dependent variable occurs in the set of stochastic equations whose structure is specified.
If the variables are rearranged in the manner noted in the definition of Y_A above, we have:

(VI.27) Γ₁₁[Y'Y]⊥X Γ₁₁' = [Γ_A : 0] [ [Y_A'Y_A]⊥X , [Y_A'Y_B]⊥X ; [Y_B'Y_A]⊥X , [Y_B'Y_B]⊥X ] [Γ_A : 0]' = Γ_A[Y_A'Y_A]⊥XΓ_A'

Showing (VI.21)

By (VI.3) and (VI.12), α₁₁Z' = −U₁' = α*₁₁Z₁'; hence,

α₁₁[Z'Z]α₁₁' = α*₁₁[Z₁'Z₁]α*₁₁'

The second equality of (VI.21) involves a mere rewriting to eliminate from Z variables which do not occur in the stochastic equations whose structure is specified. If the variables are rearranged in the manner noted in the definition of α_A, we have:

(VI.28) α₁₁[Z'Z]α₁₁' = [α_A : 0] [ Z_A'Z_A , Z_A'Z_B ; Z_B'Z_A , Z_B'Z_B ] [α_A : 0]' = α_A[Z_A'Z_A]α_A'

Continuing the Derivation of the SML Likelihood Function

Substituting (VI.20) and (VI.21) into (VI.19) we have:

(VI.29) g₄(α̂_A, Z) = c₄ + log[det((1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A')] − log[det((1/T)α̂_A[Z_A'Z_A]α̂_A')]

which is the concentrated likelihood function which we will maximize in the calculation of SML estimates. Note that Γ_A and α_A are the coefficient matrices of the M₁ stochastic equations before using the identity equations to temporarily eliminate the G − M jointly dependent variables. Thus, the elimination of the G − M jointly dependent variables is convenient only for the derivation of the concentrated likelihood function and is unnecessary for the expression of the concentrated likelihood function or the computation of the SML coefficients.
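The "mere rewriting" steps in (VI.20), (VI.21), (VI.27), and (VI.28), in which variables occurring only with zero coefficients are deleted, can be checked numerically. A minimal sketch follows; it assumes nothing beyond the zero-padding structure Γ₁₁ = [Γ_A : 0], and the dimensions and numbers are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
M1, G1, G = 3, 4, 7                       # M1 equations; G1 of the G variables occur

F_A = rng.standard_normal((M1, G1))       # coefficients of the variables that occur
Gamma11 = np.hstack([F_A, np.zeros((M1, G - G1))])   # Gamma11 = [Gamma_A : 0]

Ytmp = rng.standard_normal((20, G))
W = Ytmp.T @ Ytmp                         # stands in for the G x G moment matrix [Y'Y] perp X

# Padding with zero columns leaves the quadratic form, and hence its
# determinant, unchanged: [F 0] W [F 0]' = F W_AA F'
lhs = np.linalg.det(Gamma11 @ W @ Gamma11.T)
rhs = np.linalg.det(F_A @ W[:G1, :G1] @ F_A.T)
assert np.isclose(lhs, rhs)
```

The same argument applies to (VI.21) and (VI.28) with α₁₁ = [α_A : 0] and the Z'Z moment matrix in place of W.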
The predetermined variables in the entire system (including those in the part of the system which was not specified and in the identity equations) are used in the calculation of the [Y'Y]⊥X matrix.¹

¹Thus, Chernoff and Divinsky [1953] would have obtained the same resulting coefficients and considerably simplified their computation of SML coefficients for Klein's model (b) (the same model as Klein's model I except that G and R are classified as jointly dependent instead of predetermined) if they had used the stochastic equations which were specified directly instead of using the identity equations to eliminate jointly dependent variables before commencing their iteration procedure.

The matrices (1/T)α̂_A[Z_A'Z_A]α̂_A' and (1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A' are used repeatedly in the elaboration of the SML computational procedure which follows; hence, it will prove convenient to denote these matrices as simply S and T, i.e.,

(VI.30) S = Σ̂₁₁,₁₁ = (1/T)α̂_A[Z_A'Z_A]α̂_A'

and

(VI.31) T = (1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A'

(The matrix T should not be confused with the number of observations, T.) Let a = [a₁ ; … ; a_M₁] be the vector of unrestricted estimated coefficients of α̂_A (a is formed from α̂_A in the same way that the a vector was formed from α̂ for FIML estimation [(V.20) and (V.22)]). Then S¹

¹The preceding derivation showing that it matters not whether the identity equations are ignored in the computational procedure or used to eliminate jointly dependent variables from the subsystem being estimated assumes that the same X matrix is used in either case; that is, the X matrix is taken as the predetermined variables in the system before using the identity equations to eliminate jointly dependent variables.
If instead of this approach the identity equations are used to eliminate jointly dependent variables from the subsystem being estimated, and then the matrix of predetermined variables in the system is constructed as the predetermined variables in the newly modified subsystem being estimated plus the predetermined variables in the remaining stochastic equations, the X matrix constructed in this manner (we will call this matrix the "new" X matrix) will in general not coincide with the original X matrix, since some predetermined variables are likely to have been linearly combined with jointly dependent variables and the combined variables labeled jointly dependent. Since the space spanned by the new X matrix is likely to be smaller than the space spanned by the original X matrix, [Y_A'Y_A]⊥X will in general be changed and, therefore, the coefficients obtained will, in general, be different.

and T may also be defined as S = [s_μμ'] and T = [t_μμ'] with (see V.21):

(VI.31a) s_μμ' = (1/T) â'_+μ[Z'_+μZ_+μ']â_+μ' = (1/T) û'_μû_μ'

(VI.31b) t_μμ' = (1/T) â'_+μ[Z'_+μZ_+μ']⊥X â_+μ'

We choose the unrestricted coefficients of α̂_A such that g₄(α̂_A, Z) is a maximum; however, since Z is fixed for any given sample, for any given structure the only elements of α̂_A which are allowed to vary are the elements of the vector a. Thus, for an assumed structure and a given sample, g₄(α̂_A, Z) may be considered a function of the vector a only, i.e.,

(VI.32) g(a) = g₄(α̂_A, Z) = c₄ + log(det T) − log(det S)

Another function which will be maximized when g(a) is maximized is the function

(VI.33) g*(a) = det T / det S

If the number of jointly dependent variables in the equations being estimated equals the number of equations being estimated, then Γ̂_A is square and det(Γ̂_A[Y_A'Y_A]⊥XΓ̂_A') = det Γ̂_A · det[Y_A'Y_A]⊥X · det Γ̂_A' = (det Γ̂_A)² det[Y_A'Y_A]⊥X. Hence,

log[det T] = log[det((1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A')] = log[det((1/T)[Y_A'Y_A]⊥X)] + log[(det Γ̂_A)²].
Substituting into (VI.32) we get

g(a) = c₅ + log[(det Γ̂_A)²] − log[det S]   (c₅ = c₄ + log[det((1/T)[Y_A'Y_A]⊥X)]),

the form of the logarithmic likelihood function for FIML [see (V.17)].¹ Thus, for this case, maximization of g(a) using the SML procedure will result in the same a as maximization of the same equations using the FIML procedure.²

¹This manipulation assumes that det T ≠ 0. If rk X ≥ T − G₁ + 1 (where G₁ is the number of jointly dependent variables in the equations being estimated), then det T = 0; hence, log[det T] = log[0], which is not defined. This case is discussed in more detail in section VI.D. rk X ≥ T − G₁ + 1 causes no difficulty in FIML estimation. Klein and Nakamura [1962], pp. 295-297, were evidently considering the calculation of FIML estimates as a particular case of SML in their comments on the effect of multicollinearity on FIML estimation relative to the effect of multicollinearity on other estimation procedures. This way of regarding FIML estimation may be undesirable both computationally and conceptually. It may be undesirable computationally, since the SML computations are more severe and artificially impose problems when rk X is large relative to the number of observations. It may be undesirable conceptually, since the FIML estimator has properties not possessed by the SML estimator (except as the two coincide); e.g., the FIML least generalized variance property (section V.A) is more powerful than the SML least ratio of two generalized variances property (section VI.A). Their conclusion that FIML is more sensitive to multicollinearity among the predetermined variables in the system than 2SLS and LIML does not follow from their arguments. They have given very good arguments as to why one would expect SML to be more sensitive to multicollinearity among the predetermined variables in the system than 2SLS or LIML.

²This is noted in Koopmans and Hood [1953], footnote 73, p. 165.

2.
Computational formulas

The same computational procedure as was used for FIML is used for SML. Only the actual vector of partial derivatives and matrix of second partial derivatives differ from the FIML vector of partial derivatives and matrix of second partial derivatives. Also, g*(a) is used in place of f*(a) in determining the step size.

The vector of starting estimates, a⁽⁰⁾, is arbitrary, as in FIML. The starting estimates may be derived from single equation techniques such as DLS, 2SLS, and LIML, from multiple equation techniques such as 3SLS or I3SLS, or in some other manner. The coefficients for iteration i are calculated from the coefficients for iteration (i−1) by (V.48), i.e.,

(VI.34) a⁽ⁱ⁾ = a⁽ⁱ⁻¹⁾ + λ⁽ⁱ⁾d⁽ⁱ⁾

where d⁽ⁱ⁾ = [ℋ⁽ⁱ⁻¹⁾]⁻¹ l⁽ⁱ⁻¹⁾ and λ⁽ⁱ⁾ is the step size for the iteration. In what follows, we will omit the superscript giving the iteration number from the a⁽ⁱ⁻¹⁾, d⁽ⁱ⁾, and λ⁽ⁱ⁾ vectors and the ℋ⁽ⁱ⁻¹⁾ matrix.

In SML estimation, the μμ'th block of the ℋ matrix is:

(VI.35) ℋ_μμ' = −(1/2) ∂²g(a)/∂a_μ∂a'_μ' = s^μμ' Z'_μZ_μ' − (1/T) Z'_μÛ₁F_μμ'Û₁'Z_μ' − t^μμ' [Z'_μZ_μ']⊥X + (1/T) [Z'_μÛ₁]⊥X G_μμ' [Û₁'Z_μ']⊥X

where Û₁ is the T×M₁ matrix of residuals, F_μμ' is the matrix formed from the elements of S⁻¹ as in the corresponding FIML formula, and G_μμ' is the analogous matrix formed from the elements of T⁻¹; and the μth block of the right hand side vector is:¹

(VI.36) l_μ = (1/2) ∂g(a)/∂a_μ = Σ_{μ'=1}^{M₁} s^μμ' Z'_μû_μ' − Σ_{μ'=1}^{M₁} t^μμ' [Z'_μû_μ']⊥X

where:

s^μμ' is the element of the μth row and μ'th column of S⁻¹ [see (VI.30) and (VI.31a)].

t^μμ' is the element of the μth row and μ'th column of T⁻¹ [see (VI.31) and (VI.31b)].

¹(VI.35) and (VI.36) may be derived by noting that the term arising from log(det S) is the same for FIML and SML: Σ_{μ'=1}^{M₁} s^μμ' Z'_μû_μ', the first term of (VI.36). The negative of the partial derivative of the first term of (VI.36) with respect to a_μ' gives s^μμ' Z'_μZ_μ' − (1/T) Z'_μÛ₁F_μμ'Û₁'Z_μ', the first two terms of (VI.35). To derive the second term of (VI.36) and the last two terms of (VI.35), note that
T = (1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A' = (1/T)α̂_A([Y_A]⊥X : 0)'([Y_A]⊥X : 0)α̂_A' = (1/T)α̂_A[Z_A'Z_A]⊥Xα̂_A'; hence, the same partial derivatives as for log(det S) (with S = (1/T)α̂_A[Z_A'Z_A]α̂_A') are obtained except that [Z]⊥X is substituted for Z in each of the terms. Except for the lack of use of direct orthogonalization, essentially the same basic formulas are given in Chernoff and Divinsky [1953], pp. 261-263.

The μth block of the right hand side vector may also be written as

l_μ = Z'_μÛ₁s^μ − [Z'_μÛ₁]⊥X t^μ

where Û₁ = [û₁ : … : û_M₁] is the T×M₁ matrix of residuals, û_μ = Z_+μâ_+μ, s^μ = [s^1μ … s^M₁μ]' is the μth column of S⁻¹, and t^μ = [t^1μ … t^M₁μ]' is the μth column of T⁻¹.

A "degrees of freedom" adjustment can be made in the estimated SML disturbance variance-covariance and the estimated SML coefficient variance-covariance matrices in the same manner as for FIML. These matrices can also be normalized in the manner suggested for FIML. (See sections V.D and V.E.) â_SML and ℋ_SML are, of course, used in place of â_FIML and ℋ_FIML in the calculation of these matrices.

B. Arbitrary Linear Restrictions Imposed on Coefficients

The iterative procedure given in the previous section is designed to maximize g(a) with respect to the vector of coefficients, a. In this section the problem is to:

(VI.37) max_a g(a)

subject to:

(VI.38) Ra = r

where R is NR×n, a is n×1, and r is NR×1, and where the R matrix and the r vector are as defined for restricting a in FIML estimation. The Q and Q₂ matrices and the q and q₂ vectors are calculated from the R matrix and the r vector in the same manner as for FIML, and the computation gives a means of separating out a vector of n − rk R "unrestricted" coefficients, a(1), and a vector of rk R coefficients, a(2), which may be calculated from the a(1) vector. The computational formulas for FIML, (V.67) through (V.70), are applicable for SML as well, except that the ℋ
matrix and l vector for SML are used in place of the ℋ matrix and l vector for FIML.

C. Using Instrumental Variables in SML Estimation

In deriving the logarithmic likelihood function and in the presentation of formulas for the SML estimator, the jointly dependent variables in the equations being estimated were adjusted by all of the predetermined variables in the system; i.e., the matrix X was used in the calculation of [Y_A'Y_A]⊥X. A more general approach would be to adjust the jointly dependent variables in the equations being estimated by a matrix of instruments, X_I, where X_I contains all of the predetermined variables in the stochastic equations being estimated plus a set of additional instrumental variables.¹ Thus, the matrix [Y_A'Y_A]⊥X_I could be substituted for the matrix [Y_A'Y_A]⊥X in the above likelihood formulas and, therefore, in the SML coefficient formulas derived from the likelihood formulas. If such a substitution is made, the resulting SML coefficients will not be the same as the coefficients obtained through use of the [Y_A'Y_A]⊥X matrix and cannot be described as the coefficients which maximize the likelihood function given the structure of the equations estimated and the predetermined variables in the system.

¹Selection of instrumental variables is discussed in section II.G.

D. SML Estimation when rk X ≥ T − M₁ + 1

If rk X ≥ T − M₁ + 1 (where M₁ is the number of equations in the system being estimated), then the T matrix will be singular; hence, the formulas given previously cannot be used to compute SML estimates.¹ In some cases, a matrix of instrumental variables, X_I, could be substituted for the X matrix as noted in section VI.C, with the number of instruments restricted so that rk X_I < T − M₁ + 1; however, the properties of the SML coefficients obtained from such a substitution have not been examined.
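The linear restrictions Ra = r of section B above are handled by separating out n − rk R free coefficients, counting the independent restrictions, and detecting inconsistency. The sketch below is not the dissertation's Q and Q₂ construction of (V.67) through (V.70); it shows the same separation using a singular value decomposition, and every name in it is assumed for illustration:

```python
import numpy as np

def restrict(R, r, tol=1e-10):
    """Parameterize {a : R a = r} as a = a0 + N b with b free.
    Redundant rows of R are harmless: the number of independent
    restrictions falls out as rank(R). Inconsistent restrictions
    are detected by comparing ranks of R and the augmented matrix."""
    R = np.atleast_2d(R)
    rank_R = np.linalg.matrix_rank(R, tol)
    rank_aug = np.linalg.matrix_rank(np.hstack([R, r.reshape(-1, 1)]), tol)
    if rank_aug > rank_R:
        raise ValueError("inconsistent restrictions")
    a0, *_ = np.linalg.lstsq(R, r, rcond=None)   # a particular solution
    _, s, Vt = np.linalg.svd(R)
    N = Vt[rank_R:].T                            # basis of the null space of R
    return a0, N, rank_R

# Three rows, but the third is the sum of the first two:
# only two independent restrictions on four coefficients
R = np.array([[1., 1., 0., 0.],
              [0., 0., 1., -1.],
              [1., 1., 1., -1.]])
r = np.array([1., 0., 1.])
a0, N, k = restrict(R, r)
assert k == 2 and N.shape == (4, 2)
assert np.allclose(R @ (a0 + N @ np.ones(2)), r)   # any choice of b satisfies R a = r
```

Maximization of g(a) can then proceed over the reduced vector b alone, which is the role played by the n − rk R "unrestricted" coefficients a(1) in the text.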
There is especially a question as to whether or how much the statistical efficiency in the estimation of the SML coefficients is decreased if the X space is restricted.²

¹det T = det{(1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A'} = (1/T^M₁) det{[Y_AΓ̂_A']'⊥X[Y_AΓ̂_A']⊥X} [the last equality comes from (1.80)]; hence the T×M₁ matrix [Y_AΓ̂_A']⊥X must have full column rank if det T is not to be zero. But the space orthogonal to X has rank T − rk X; hence, if [Y_AΓ̂_A']⊥X is to have full column rank, T − rk X must be greater than or equal to M₁; i.e., rk X ≤ T − M₁. Thus, det T = 0 if rk X ≥ T − M₁ + 1.

²If the number of jointly dependent variables in the subsystem being estimated equals the number of equations being estimated, restriction of the X space will lead to the same estimates as would have been obtained if FIML had been applied only to that subsystem (see section VI.A); however, in this case it is more efficient (with respect to computer time and capacity requirements) to use the FIML computational method directly. (rk X ≥ T − M₁ + 1 presents no difficulty in FIML estimation, since the jointly dependent variables are not adjusted in the FIML estimation procedure. Instead, the coefficients of the jointly dependent variables are taken explicitly into account.)

E. Iterative Limited Information Single Equation Maximum Likelihood (ILIML)

In the case of only a single jointly dependent variable per equation, Professor Lester Telser proposed an iterative DLS estimation procedure which estimates the coefficients of one equation at a time based on the predetermined variables in that equation plus the residuals from each of the remaining equations.¹ The new coefficients for an equation are used to estimate new residuals for the equation which are, in turn, used to estimate new coefficients for the remaining equations.
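The Telser iteration just described, for the one-jointly-dependent-variable case, can be sketched as follows. The data, the two-equation layout, and all names are assumed purely for illustration (DLS is ordinary least squares here), and a fixed iteration count stands in for a convergence test:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
X = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])
# Two equations, one jointly dependent variable each, with correlated
# disturbances (which is what makes the iteration worthwhile)
U = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=T)
y1 = X @ np.array([1.0, 2.0, 0.0]) + U[:, 0]   # eq. 1 uses X columns 0, 1
y2 = X @ np.array([0.5, 0.0, -1.0]) + U[:, 1]  # eq. 2 uses X columns 0, 2
eqs = [(y1, X[:, [0, 1]]), (y2, X[:, [0, 2]])]

def ols(y, Z):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b

resid = [y - Z @ ols(y, Z) for y, Z in eqs]    # DLS starting residuals
for it in range(25):
    for m, (y, Z) in enumerate(eqs):
        # other equations' residuals enter as additional "predetermined" variables
        others = np.column_stack([resid[j] for j in range(len(eqs)) if j != m])
        b = ols(y, np.column_stack([Z, others]))
        # new residuals ignore the coefficients on the appended residuals
        resid[m] = y - Z @ b[:Z.shape[1]]

coef1 = ols(eqs[0][0], np.column_stack([eqs[0][1], resid[1]]))[:2]
assert coef1.shape == (2,) and np.all(np.isfinite(coef1))
```

Replacing the per-equation OLS fit with a LIML fit (residuals appended to both the predetermined set and the instrument set) gives the ILIML generalization developed in the remainder of this section.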
In chapter VII of this paper it is shown that Telser's iterative procedure leads to FIML estimates in the special case of a single jointly dependent variable per equation. In this section, we will demonstrate that the same procedure, but using LIML instead of DLS, leads to FIML estimates in the general case (multiple jointly dependent variables permitted in each equation).² In the case of an incomplete system, the method leads to SML estimates.

¹Telser [1964], pp. 845-862.

²The method (ILIML) given in this section was proposed to the writer by Professor Herman Rubin in June, 1963. The key steps of the proof of the increase in the likelihood function at each iteration were indicated to the writer by Professor Rubin at that time.

Derivation of the ILIML Method

As noted in (VI.33), the function

g*(a) = det T / det S = det{(1/T)Γ̂_A[Y_A'Y_A]⊥XΓ̂_A'} / det{(1/T)α̂_A[Z_A'Z_A]α̂_A'}

is maximized by the same coefficients that maximize the likelihood function. Suppose that we partition the matrix of coefficients such that the coefficients for one equation are distinguished from the coefficients for the remaining equations. Without loss of generality we can assume that we are distinguishing the first equation from the remaining equations whose structure is specified, since the equations can be rearranged. Thus, let α_A and Γ_A be subdivided as:

(VI.39) α_A = [α₁ ; α₂] and Γ_A = [Γ₁ ; Γ₂]

where α₁ is 1×(G₁+A₁), Γ₁ is 1×G₁, α₂ is (M₁−1)×(G₁+A₁), and Γ₂ is (M₁−1)×G₁; then g*(a) becomes:

(VI.40) g*(a) = det[ Γ₁[Y_A'Y_A]⊥XΓ₁' , Γ₁[Y_A'Y_A]⊥XΓ₂' ; Γ₂[Y_A'Y_A]⊥XΓ₁' , Γ₂[Y_A'Y_A]⊥XΓ₂' ] / det[ α₁[Z_A'Z_A]α₁' , α₁[Z_A'Z_A]α₂' ; α₂[Z_A'Z_A]α₁' , α₂[Z_A'Z_A]α₂' ]

or

(VI.41) g*(a) = {Γ₁[Y_A'Y_A]⊥XΓ₁' − Γ₁[Y_A'Y_A]⊥XΓ₂'(Γ₂[Y_A'Y_A]⊥XΓ₂')⁻¹Γ₂[Y_A'Y_A]⊥XΓ₁'} det(Γ₂[Y_A'Y_A]⊥XΓ₂') / [{α₁[Z_A'Z_A]α₁' − α₁[Z_A'Z_A]α₂'(α₂[Z_A'Z_A]α₂')⁻¹α₂[Z_A'Z_A]α₁'} det(α₂[Z_A'Z_A]α₂')]

(VI.41) is derived from (VI.40) by the determinantal relationship shown in footnote 3 of page 167.
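The determinantal relationship used in passing from (VI.40) to (VI.41), namely that the determinant of a partitioned symmetric matrix with a 1×1 leading block equals (a − bD⁻¹b') det D, can be verified numerically. A minimal sketch with arbitrary numbers:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
M = M @ M.T                                # symmetric positive definite test matrix

a, b, D = M[0, 0], M[0:1, 1:], M[1:, 1:]   # 1x1 block, 1x(n-1) border, (n-1)x(n-1) block
lhs = np.linalg.det(M)
rhs = (a - b @ np.linalg.solve(D, b.T)).item() * np.linalg.det(D)
assert np.isclose(lhs, rhs)
```

In (VI.41) the scalar a − bD⁻¹b' is exactly the first factor of the numerator (and of the denominator), which is why the "det" operator can be dropped from those 1×1 quantities in the next step.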
The "det" operator may be omitted from the numerator and denominator of the first term of the product since each is a le matrix. The second term of (V1.41) consists only of matrices fixed for any sample ([YAYAllx and ZAZA) and coefficient matrices from equa- tions other than equation 1. Thus for a given & the second term 2, may be considered to be a positive constant in the function to be 250 maximized at this point and we may consider the problem to be one of selecting & to minimize:1 1 (v1.42) g**(a) . -1A alezAal alezAazmzzAzAaz) (IrzzAonz1 _ A A A A A A .1A A I ‘ I‘l _ I I I I I I FlEYAYAle 1 11”»,th .LXF2(F2[YAYA.]_1XF2) FZEYAYA] Lxr1 The problem has been switched from maximizing a function to minimizing the reciprocal of the function (dz being temporarily held fixed) to make the correspondence with LIML easier and to make the derivation of some partial derivatives slightly simpler. Let 0 be the TX(M1 - 1) matrix of residuals of equation 2 2 through M1 and assume that 02 has full column rank.2- Thus, A =_ AI=_f"I_ AI (V1.43) U2 ZAQZ YA 2 XAB2 Then the numerator of (V1.42) may be written as: A A A A -1A A A A I _ I I I I= I A I (V1.44) 0:112:52A ZAU2(U2U2) UZZA]OI1 c1r1[zAzAjwch1 [see (1.37)] . That the denominator of (V1.42) may be written as: A I A A I (“'45) Iaways.)ilIxEuzllh may be seen as follows: 1 '1 ; det(leYAYAJfoé) g*(a) is.” . . . g**(a) det0122A2Adz 2 A lA'A A A I A as 1 . = 2 If 211,11. T£U1U11 is assumed nonsingu ar, then U1 [ul U2] and, therefore, U must already have full Column rank. 2 3E¥A¥AJL1X502]. is the part of the moment matrix of VA orthogonal to both X and . U2 . 251 [UZJLX E [‘YAr2 ‘ XAB2 1x T '[YAjixr2 ’ EXAJLXB2 [by (1‘63)] . -[ijlxiLé since [xAjlJLx - 0 [see (1.42)] - Thus, [YAJLEXEBZJ ' [YA]L(X![62]1X) ' {[YA]L([GZJLX)}LX [see (1.65)] . {[YAJI. ('[YAJfo‘p }J-X A A I I I'I ‘1 A I ' {YA - {-[YAJLXF2)(FZEYAYAJLXFZ) (‘rzhAJlanhx [see (1.47)] " A I AI '1“ I . [YAL-X - {[YALXLXI‘é(FZEYAYALXFZ) F2[YA]LXYA [see (1.62)]. 
However, {[Y_A]⊥X}⊥X = [Y_A]⊥X and [Y_A]'⊥XY_A = [Y_A'Y_A]⊥X [see (1.56)]; hence,

(VI.46) [Y_A]⊥[X:Û₂] = [Y_A]⊥X − [Y_A]⊥XΓ̂₂'(Γ̂₂[Y_A'Y_A]⊥XΓ̂₂')⁻¹Γ̂₂[Y_A'Y_A]⊥X

[Y_A'Y_A]⊥[X:Û₂] may be written as Y_A'[Y_A]⊥[X:Û₂]. Using (VI.46), we have

(VI.47) [Y_A'Y_A]⊥[X:Û₂] = Y_A'[Y_A]⊥[X:Û₂] = Y_A'[Y_A]⊥X − Y_A'[Y_A]⊥XΓ̂₂'(Γ̂₂[Y_A'Y_A]⊥XΓ̂₂')⁻¹Γ̂₂[Y_A'Y_A]⊥X = [Y_A'Y_A]⊥X − [Y_A'Y_A]⊥XΓ̂₂'(Γ̂₂[Y_A'Y_A]⊥XΓ̂₂')⁻¹Γ̂₂[Y_A'Y_A]⊥X

Hence, Γ₁[Y_A'Y_A]⊥[X:Û₂]Γ₁' is the denominator of (VI.42) as claimed in (VI.45), and (VI.42) becomes:

(VI.48) g**(a) = α₁[Z_A'Z_A]⊥Û₂α₁' / Γ₁[Y_A'Y_A]⊥[X:Û₂]Γ₁'

But in the notation of part I of this paper (since rows and columns of [Z_A'Z_A]⊥Û₂ and [Y_A'Y_A]⊥[X:Û₂] corresponding to coefficients restricted to be zero in α̂₁ and Γ̂₁ may be deleted), (VI.48) may be rewritten as:

(VI.49) g**(a) = [+γ̂₁' β̂₁'] [ +Y₁'+Y₁ , +Y₁'X₁ ; X₁'+Y₁ , X₁'X₁ ]⊥Û₂ [+γ̂₁ ; β̂₁] / +γ̂₁'[+Y₁'+Y₁]⊥[X:Û₂]+γ̂₁

We may minimize g**(a) first with respect to β̂₁, thereby concentrating the function on +γ̂₁. First notice that the numerator of (VI.49) may be rewritten as:

(VI.50) [+γ̂₁' β̂₁'] [ +Y₁'+Y₁ , +Y₁'X₁ ; X₁'+Y₁ , X₁'X₁ ]⊥Û₂ [+γ̂₁ ; β̂₁] = +γ̂₁'[+Y₁'+Y₁]⊥Û₂+γ̂₁ + 2+γ̂₁'[+Y₁'X₁]⊥Û₂β̂₁ + β̂₁'[X₁'X₁]⊥Û₂β̂₁

Taking the partial derivative of g**(a) with respect to β̂₁ and setting the partial derivative to zero, we have:

(VI.51) (1/d){2[X₁'+Y₁]⊥Û₂+γ̂₁ + 2[X₁'X₁]⊥Û₂β̂₁} = 0

where d is the denominator of g**(a) in (VI.49). Solving for β̂₁ we have:¹

(VI.52) β̂₁ = −[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂+γ̂₁

Substituting for β̂₁ into the numerator of g**(a) [i.e., substituting into (VI.50)], we have:

(VI.53) +γ̂₁'[+Y₁'+Y₁]⊥Û₂+γ̂₁ − 2+γ̂₁'[+Y₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂+γ̂₁
 + +γ̂₁'[+Y₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂+γ̂₁
 = +γ̂₁'[+Y₁'+Y₁]⊥Û₂+γ̂₁ − +γ̂₁'[+Y₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂+γ̂₁
 = +γ̂₁'[+Y₁'+Y₁]⊥[X₁:Û₂]+γ̂₁²

where [+Y₁'+Y₁]⊥[X₁:Û₂] denotes the moment matrix of the part of +Y₁ orthogonal to both X₁ and Û₂.

¹The matrix of second partial derivatives of the numerator of g**(a) with respect to β̂₁ is 2[X₁'X₁]⊥Û₂, a positive definite matrix if det([X₁'X₁]⊥Û₂) ≠ 0 and d ≠ 0 (since d cannot be negative); hence, under these conditions the second order condition for the value of β̂₁ given by (VI.52) to minimize (VI.49) is met.

²That the last equality in (VI.53) holds may be seen by writing out [+Y₁]⊥[X₁:Û₂] in the manner of (VI.46); hence,

[+Y₁'+Y₁]⊥[X₁:Û₂] = +Y₁'[+Y₁]⊥[X₁:Û₂] [see (1.56)] = [+Y₁'+Y₁]⊥Û₂ − [+Y₁'X₁]⊥Û₂[X₁'X₁]⊥Û₂⁻¹[X₁'+Y₁]⊥Û₂

Thus, (VI.49) becomes:

(VI.54) g**(a)|min β̂₁ = +γ̂₁'[+Y₁'+Y₁]⊥[X₁:Û₂]+γ̂₁ / +γ̂₁'[+Y₁'+Y₁]⊥[X:Û₂]+γ̂₁

Comparison of (VI.54) with the alternative formulations of the LIML problem (section II.C.1) shows that the +γ̂₁ which minimizes g**(a) is the eigenvector, d₁, corresponding to the smallest eigenvalue, c₁, of A⁻¹B with A = [+Y₁'+Y₁]⊥[X:Û₂] and B = [+Y₁'+Y₁]⊥[X₁:Û₂]. Further comparison with section II.C.1 shows that the eigenvalue is the value k_LIML in a LIML problem with +Y₁ being the matrix of jointly dependent variables in the equation, [X₁ : Û₂] being the matrix of predetermined variables in the equation, and [X : Û₂] being the matrix of instruments, X_I. The +γ̂₁ which minimizes g**(a)|min β̂₁ may be calculated either as (1) the eigenvector, d₁, corresponding to the minimum eigenvalue, k_LIML, or (2) by substituting k_LIML into the usual k-class formula, thereby calculating γ̂₁, β̂₁, and M₁ − 1 additional coefficients corresponding to the variables in Û₂. (The M₁ − 1 coefficients corresponding to the variables in Û₂ are then ignored in further calculations.)
In summary, what we have shown is that, given a set of coefficients for all equations except equation 1, the coefficients of equation 1 which will maximize the likelihood function are the LIML solution of equation 1 modified by including the residuals of equations 2 through M₁ as additional predetermined variables in equation 1. If the coefficients of equations 2 through M₁ are assumed fixed, the coefficients of equation 1 (γ̂₁ and β̂₁ only; the coefficients corresponding to Û₂ are ignored) estimated in this manner will increase the likelihood function over the value of the likelihood function obtained by the original LIML coefficients for equation 1 (assuming that the original LIML coefficients did not already maximize the likelihood function considering the coefficients of equations 2 through M₁ as fixed).

Now, let us define the calculation of new coefficients for equation 1 as step 1 of an iteration and calculate the residuals for that equation (ignoring the coefficients corresponding to the residuals for equations 2 through M₁ in the calculation of the residuals for equation 1).
Let us define the calculation of new coefficients for equation 2 as step 2 of an iteration and calculate these coefficients as the LIML solution obtained when the new residuals from equation 1 and the residuals from equations 3 through M₁ are included as additional predetermined variables in equation 2. The new coefficients for equation 2 will increase the likelihood function over the value of the likelihood function at the end of step 1.

If the procedure is continued through step M₁ of the iteration, the new coefficients for equation M₁ will, in general, increase the likelihood function over the previous coefficients for equation M₁.

Let us define a new iteration as starting with the calculation of new coefficients for equation 1, using the new residuals of equations 2 through M₁ obtained from steps 2 through M₁. Clearly the new coefficients obtained will give a likelihood value greater than or equal to the likelihood which was obtained at step 1 of the previous iteration, since the likelihood will have been increased (or at least not decreased) at each step of the previous iteration. The likelihood at any one step of an iteration will fail to be strictly higher than the likelihood at the same step of the previous iteration only if the coefficients for all equations maximize the likelihood assuming that the coefficients of the remaining equations are held constant.

Summary of the ILIML Computational Method

(1) LIML may be used to estimate starting coefficients and residuals separately for each of the M₁ equations.¹

(2) Step 1 of an iteration consists of estimating new LIML coefficients for equation 1 by using the residuals from equations 2 through M₁ as additional predetermined variables in the equation. A new vector of residuals is estimated for equation 1 from the new coefficients of equation 1, the coefficients corresponding to the residuals of equations 2 through M₁ being ignored in the calculation of the vector of residuals for equation 1.
The new vector of residuals replaces the old vector of residuals in steps 2 through M₁ (i.e., in the calculation of new coefficients for equations 2 through M₁). Step 2 of an iteration consists of estimating new LIML coefficients for equation 2 by using the residuals from equations 1 and 3 through M₁ as additional predetermined variables in the equation. A new vector of residuals is estimated for equation 2, and this vector represents equation 2 in the calculation of new coefficients for the other equations until step 2 of the following iteration. New coefficients and new residuals are estimated for each equation in turn until new coefficients have been calculated for the M₁th equation.

¹As with FIML, other starting estimates such as DLS or 2SLS estimates could be used.

(3) A new iteration is calculated in the same manner, using the residuals from the preceding iteration in the calculation of new coefficients for each equation in turn.

(4) Iteration continues until all coefficients converge. The convergence criteria given for FIML in section V.C.5 are applicable for this computational method of calculating FIML and SML coefficients as well.

Increasing Computational Efficiency of ILIML

The iterative procedure may be made considerably more efficient in computer time, and also more accurate with respect to rounding error, if the residuals are not explicitly calculated. (A large number of multiplications and additions are required to calculate the residuals and form the sums of cross-products of the residuals with the other variables in each equation.) All moment matrices needed for the calculations may be calculated directly from the vector of coefficients and the moment matrix of all variables in the M₁ equations through use of a number of alternative formulas, the optimal formulas to use depending on the amount of special programming which the user of the ILIML method is willing to do.
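In the special case of one jointly dependent variable per equation, each LIML step of the summary above reduces to a DLS (ordinary least squares) regression, and ILIML reduces to the Telser/IDLS procedure noted earlier. A minimal sketch under that assumption (a simulated two-equation system; the data and names are hypothetical, and the code is ours rather than the package's routine):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
# Two equations, one jointly dependent variable each, correlated disturbances.
X1 = rng.normal(size=(T, 2))
X2 = rng.normal(size=(T, 3))
U = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=T)
ys = [X1 @ np.array([1.0, -2.0]) + U[:, 0],
      X2 @ np.array([0.5, 1.5, -1.0]) + U[:, 1]]
Xs = [X1, X2]

def ols(y, X):
    return np.linalg.solve(X.T @ X, X.T @ y)

# Starting coefficients and residuals from separate DLS fits.
betas = [ols(y, X) for y, X in zip(ys, Xs)]
resid = [y - X @ b for y, X, b in zip(ys, Xs, betas)]

for iteration in range(200):
    max_change = 0.0
    for m in range(2):                                 # step m of the iteration
        Xaug = np.column_stack([Xs[m], resid[1 - m]])  # other equation's residuals
        coef = ols(ys[m], Xaug)                        # as extra "predetermined" variables
        new_beta = coef[:Xs[m].shape[1]]               # coefficient on the residual is ignored
        max_change = max(max_change, np.max(np.abs(new_beta - betas[m])))
        betas[m] = new_beta
        resid[m] = ys[m] - Xs[m] @ new_beta            # new residuals replace the old at once
    if max_change < 1e-10:                             # coefficient convergence criterion
        break
```

At convergence the coefficients are, by the argument of this section, maximum likelihood estimates for the two-equation system.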
Following are some relationships which may be worked into a procedure.

From (VI.54), we note that the k value for estimating the coefficients of equation μ may be derived as the smallest eigenvalue of

    ([+Y_μ'+Y_μ]⊥[X:Û₋μ])⁻¹ [+Y_μ'+Y_μ]⊥[X_μ:Û₋μ]

where Û₋μ denotes the residuals for all equations except the μth equation. If desired, the eigenvector corresponding to the smallest root may be used as +γ̂_μ , and δ̂_μ may be calculated by formula (VI.52).

[+Y_μ'+Y_μ]⊥[X:Û₋μ] and [+Y_μ'+Y_μ]⊥[X_μ:Û₋μ] may be calculated by direct orthogonalization using the orthogonalization procedure given in appendix A.

Let Z be the matrix of all of the jointly dependent variables in the first M₁ equations and all of the predetermined variables in the system. Then Z'û_μ = −[Z'Z]+δ_μ and û_μ'û_{μ'} = +δ_μ'[Z'Z]+δ_{μ'} .

Remember that one column of the Û₋μ matrix changes each time a new set of coefficients is calculated for an equation, and that the Û₋μ matrix for equation 2 differs from the Û₋μ matrix for equation 1 in that it includes residuals from equation 1 instead of residuals from equation 2.

Since any equation with rk X = n_μ does not affect the final FIML or SML coefficients of the remaining equations of a system of equations, some additional computational efficiency may be obtained by omitting all equations with rk X = n_μ from the iterative procedure until convergence has been obtained (maximum likelihood estimates have been obtained) for all equations with rk X > n_μ .¹ The maximum likelihood coefficients for an equation with rk X = n_μ may then be directly calculated as a LIML problem which contains as the only "extra" predetermined variables in that equation the residuals (calculated from

¹An equation with rk X = n_μ is usually referred to as a just-identified equation and an equation with rk X > n_μ is usually referred to as an over-identified equation. (See section II.D.)
the converged coefficients) of the equations for which rk X > n_μ .

Comparison with FIML and SML Methods

At each step of each iteration in the ILIML method, the coefficients of only one equation are modified; therefore, it would seem likely that situations such as illustrated in Figure VI.55 could arise.

Figure VI.55. [Likelihood contours in the plane of a coefficient from equation μ (horizontal axis) and a coefficient from equation μ' (vertical axis); graphic not reproduced.]

¹A variation of this technique may also be used in the SML procedure. Convergence of an SML routine may first be obtained for those equations with rk X > n_μ and the system then enlarged to contain also the equations with rk X = n_μ . The two procedures may also be mixed, SML being used to obtain maximum likelihood coefficients for the equations for which rk X > n_μ . The maximum likelihood coefficients for an equation with rk X = n_μ may then be calculated as a LIML problem which contains as the only "extra" predetermined variables in that equation the residuals (calculated from the converged coefficients) of the equations for which rk X > n_μ .

Since the coefficients from one equation are selected to maximize the likelihood function given the coefficients of the other equations, movement at a single step is only in the space of the coefficients of a single equation. Thus (as in Figure VI.55), very little change in the coefficients may result at any one step if movement is up a ridge which is not parallel to the space of coefficients of one of the equations. It is even quite conceivable that so little change would be made in the coefficients of any equation that it will be thought that the coefficients have converged while actually a considerable distance from the maximum of the likelihood function. On the other hand, for the situation in Figure VI.55, movement to the maximum would be very rapid for the FIML and SML computational procedures given earlier, since all coefficients are simultaneously adjusted during any one step of the procedure.
The question of whether multiple local maxima occur in critical regions of the likelihood function for systems of equations estimated by FIML and SML is still open. It should be noted that the ILIML method is as likely as the FIML or SML method to move to a local maximum rather than the global maximum (assuming that it moves to a local maximum at all). However, added to this possibility is a considerably more serious one: convergence may be to a point on a ridge. This is much more serious since (1) ridges which could cause difficulty are surely more likely to occur, and (2) convergence will be to a non-unique point even within the region. (E.g., if estimates result in movement to a point b on the ridge just above a stable point a on the ridge, movement will then not be toward a. Instead, b will be considered a point of convergence, or movement farther up the ridge will occur.)

In general, the larger the number of equations and the larger the total number of coefficients in the entire system being estimated, the slower ILIML convergence is likely to be, since in these cases the coefficients of any one equation are likely to span less of the total coefficient space. It is conceivable that convergence may actually be faster [computer time-wise] for the ILIML method than for the FIML and SML methods in very small models, due to the simple computations performed; however, if such cases do occur, any computer time saved through use of the ILIML method would surely be trivial, since FIML and SML will also converge rapidly in these cases. On the other hand, for the large majority of problems to be calculated, the use of the FIML or SML method may save a very large amount of computer time over the use of the ILIML method.
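The ridge behaviour of Figure VI.55 can be reproduced with a two-coefficient caricature. The toy below is ours (it is not one of the estimators): the log-likelihood is replaced by a concave quadratic whose ridge lies at 45° to the coordinate axes, so one-coefficient-at-a-time maximization (the ILIML pattern) creeps along the ridge, while a single Newton step in both coefficients at once (the FIML/SML pattern) lands exactly on the maximum:

```python
import numpy as np

# f(x, y) = -(x + y)^2 - 0.01 (x - y)^2 has its maximum at (0, 0)
# and a long, gentle ridge along the line x = -y.
H = -2.0 * np.array([[1.01, 0.99],
                     [0.99, 1.01]])          # Hessian of f (constant)

def coordinate_sweep(z):
    """Maximize over each coordinate in turn, holding the other fixed."""
    x, y = z
    x = -(0.99 / 1.01) * y                   # argmax over x given y
    y = -(0.99 / 1.01) * x                   # argmax over y given x
    return np.array([x, y])

z0 = np.array([0.0, 1.0])
z_coord = z0.copy()
for _ in range(10):                          # ten full coordinate sweeps
    z_coord = coordinate_sweep(z_coord)

# One Newton step adjusting both coefficients simultaneously:
z_newton = z0 - np.linalg.solve(H, H @ z0)   # gradient of a quadratic is H z
```

After ten sweeps the coordinate iterate is still nearly as far from the maximum as its starting point (each sweep shrinks the components only by the factor (0.99/1.01)², about 0.96, and early movement is absorbed into sliding onto the ridge), whereas the simultaneous step reaches the maximum exactly. With a loose convergence criterion the coordinate method would be declared converged far from the maximum.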
In only 11 iterations a coefficient convergence criterion of .000 000 000 001 was satisfied when Klein's model I was calculated by the FIML procedure; however, when Klein's model I was calculated by the ILIML procedure, a coefficient convergence criterion of only .00001 was satisfied after 327 iterations. Convergence was occurring very slowly at this point, with approximately 100 iterations being required per extra digit of convergence.¹ (The FIML and ILIML coefficients agreed up to the accuracy calculated by the ILIML procedure.)

¹Klein's model I, which contains only three stochastic equations, is given in section I.C.1, and the FIML solution to Klein's model I is given in the reproduced computer output of section IX.K. The coefficient convergence criterion is defined in section V.C.5.

The primary advantages of the ILIML method over the FIML or SML method would appear to be:

(1) Ease of programming. (However, in general, the more advantage taken of programming relationships to reduce computer time in computation by the ILIML method, the less advantageous the ILIML method is from this standpoint.)

(2) With a given sized computer memory, it may be possible to calculate much larger problems by means of the ILIML method. (However, in general, the larger the problem being calculated, the less favorably the ILIML method may be expected to perform, both with respect to speed of convergence and whether convergence becomes so slow that the problem appears converged when it is short of the maximum of the likelihood function.)

On balance, the ILIML method would not appear to warrant an investment in programming. Instead it seems desirable that FIML and SML be programmed by the formulas given previously.

CHAPTER VII

ZELLNER-AITKEN ESTIMATOR (ZA)

A.
Only Zero and Normalization Restrictions Imposed on Coefficients

The Zellner-Aitken estimator (ZA) is a multiple equations method which may be applied when each structural equation contains only a single jointly dependent variable.¹ If all of the equations in the system are of this form and ZA is applied to the complete system, then ZA is a complete system method.² ZA may also be applied to only part of the system, in which case ZA may be regarded as a partial system method.

The ZA estimating equations can be derived as an application of GLS (in which an estimated disturbance variance-covariance matrix is substituted for the actual disturbance variance-covariance matrix) as follows. Assuming that there are M equations in the system, the μth equation being of the form:

(VII.1)    y_μ = X_μβ_μ + u_μ ,

then the entire system can be written as:

(VII.2)    y₁ = X₁β₁ + 0β₂ + ⋯ + 0β_M + u₁
            ⋮
           y_M = 0β₁ + ⋯ + X_Mβ_M + u_M

¹Zellner proposed the "efficient estimating procedure" which we are calling the Zellner-Aitken estimator in the article Zellner [1962].

²If the complete system contains only one jointly dependent variable per equation, then the structural equations and reduced form equations coincide. These reduced form equations will, of course, not be unrestricted but will incorporate all of the a priori restrictions of the structural equations.

or, if we define the following matrices and vectors:

    y = [y₁' ⋯ y_M']'  (MT×1),    X = diag(X₁, …, X_M)  (MT×n),
    β = [β₁' ⋯ β_M']'  (n×1),     u = [u₁' ⋯ u_M']'  (MT×1),

where n = Σ_{μ=1}^{M} n_μ (n_μ is the number of explanatory variables in the μth equation), we can write the entire system as:

(VII.3)    y = Xβ + u

Initially we will make the statistical assumptions given at the start of this paper (section I.C.3). One of the assumptions implies that X_μ for each equation has full column rank, which in turn implies that X has full column rank. This assumption will be relaxed in a later section of this chapter when restrictions on the coefficients are permitted.
Let U = [u₁ ⋯ u_M] be the T×M matrix of disturbances, with column μ the disturbances for equation μ and row t the disturbances for observation t, designated U_[t]. Then the assumptions of section I.C.3 that EU = 0, EU_[t]'U_[t] = Σ (M×M) for all t, and EU_[t]'U_[t'] = 0 for all t ≠ t' imply that:

(VII.4)    Eu = 0    (MT×1)

and

(VII.5)    Eu_μu_{μ'}' = σ_μμ' I

where I is T×T and σ_μμ' is the element in row μ and column μ' of Σ. (VII.5) is commonly expressed for the whole covariance matrix of u by the "Kronecker product" ⊗ as:¹

(VII.6)    Euu' = Σ ⊗ I = [ σ₁₁I  ⋯  σ₁M I
                               ⋮        ⋮
                            σ_M1 I ⋯  σ_MM I ]    (MT×MT)

¹If A is any p×q matrix and B is any r×s matrix, the Kronecker product A⊗B is defined as the (p·r)×(q·s) matrix

    A⊗B = [ a₁₁B  ⋯  a₁q B
               ⋮        ⋮
            a_p1 B ⋯  a_pq B ]

For A and B square, symmetric, and nonsingular matrices, A⊗B is also square, symmetric, and nonsingular, and [A⊗B]⁻¹ = A⁻¹⊗B⁻¹. This can be seen by (1) premultiplying A⊗B by A⁻¹⊗B⁻¹ and observing that the identity matrix is obtained and (2) postmultiplying A⊗B by A⁻¹⊗B⁻¹ and observing that the identity matrix is obtained; hence A⁻¹⊗B⁻¹ is the inverse of A⊗B. Let us assume that A is p×p and B is q×q. The ij-th block of [A⁻¹⊗B⁻¹][A⊗B] is Σ_{k=1}^{p} (a^{ik}B⁻¹)(a_{kj}B) = [Σ_{k=1}^{p} a^{ik}a_{kj}]·I, where I is q×q. But Σ_k a^{ik}a_{kj} = 1 if i = j and 0 if i ≠ j; hence the product is the pq×pq identity matrix. Similarly, [A⊗B][A⁻¹⊗B⁻¹] is the pq×pq identity matrix. Hence, [A⊗B]⁻¹ = [A⁻¹⊗B⁻¹].
We will designate Euu' as Σ*  (MT×MT).

If we treat y as the dependent variable, X as the matrix of predetermined variables, and u as the vector of disturbances, and substitute into GLS formulas (V.12) and (V.13), we get:

(VII.7)    β̂_GLS = [X'Σ*⁻¹X]⁻¹[X'Σ*⁻¹y]    (n×1)

(VII.8)    asymptotic Var(β̂_GLS) = [X'Σ*⁻¹X]⁻¹    (n×n)

If we had assumed that X, the matrix of predetermined variables, were fixed in repeated samples, then Var(β̂_GLS) would be a small sample estimate rather than an asymptotic estimate. Also, β̂_GLS would be best linear unbiased (assuming that we knew Σ).

Due to the particular form of Σ*, X, and y, β̂_GLS and asymptotic Var(β̂_GLS) may be written out in more detail:

    Σ*⁻¹ = [Σ⊗I]⁻¹ = Σ⁻¹⊗I = [ σ^{11}I  ⋯  σ^{1M}I
                                   ⋮         ⋮
                               σ^{M1}I  ⋯  σ^{MM}I ]

where I is a T×T identity matrix, and Σ = [σ_μμ'] and Σ⁻¹ = [σ^{μμ'}] are M×M matrices. Thus:

(VII.9)    β̂_GLS = [ σ^{11}X₁'X₁  ⋯  σ^{1M}X₁'X_M     ]⁻¹ [ Σ_{μ=1}^{M} σ^{1μ}X₁'y_μ  ]
                   [     ⋮                ⋮           ]    [           ⋮              ]
                   [ σ^{M1}X_M'X₁ ⋯  σ^{MM}X_M'X_M    ]    [ Σ_{μ=1}^{M} σ^{Mμ}X_M'y_μ ]

and

(VII.10)   asymptotic Var(β̂_GLS) = [ σ^{11}X₁'X₁  ⋯  σ^{1M}X₁'X_M
                                         ⋮               ⋮
                                     σ^{M1}X_M'X₁ ⋯  σ^{MM}X_M'X_M ]⁻¹

Since Σ is unknown, the following estimate of Σ is substituted into the formulas:¹

¹The S matrix will be singular if the number of equations exceeds the number of observations. Let Û = [û₁ ⋯ û_M], where the û_μ are the T×1 vectors of DLS residuals. Then S = (1/T)Û'Û (M×M); hence, rk S = rk Û. If T < M, then rk Û, and therefore rk S, will be less than M; i.e., S will be singular. This is not in conflict with our derivation in section VII.B of DLS as the special case of ZA in which there is zero correlation between residuals across equations, since that derivation holds only for nonsingular S. Even though Σ is assumed nonsingular, a particular estimate of Σ may still be singular (and, as shown above, if M > T then S is singular).
(VII.11)    Σ̂* def= S ⊗ I ,    S = [s_μμ']

with s_μμ' = (1/T)û_μ^DLS'û_{μ'}^DLS = (1/T) +δ̂_μ^DLS'[+Z_μ'+Z_{μ'}]+δ̂_{μ'}^DLS , +δ̂_μ^DLS being the vector of DLS coefficients (including the normalizing coefficient, −1) for equation μ and +Z_μ = [y_μ ⋮ X_μ].

Substituting S into (VII.9) and (VII.10) we get:

(VII.12)    β̂_ZA = (Θ^(1))⁻¹ρ^(1)

as the ZA estimate of the vector of coefficients of the M equations, where:¹

¹T is being used as the divisor rather than some other divisor so that IZA (iterative ZA) coefficients can be derived as maximum likelihood coefficients further on. Two alternative divisors suggested by Professor Zellner follow:

(1)    s*_μμ' = û_μ^DLS'û_{μ'}^DLS / (T − n_μ − n_{μ'} + k_μμ')

where k_μμ' = tr{[X_μ'X_μ]⁻¹X_μ'X_{μ'}[X_{μ'}'X_{μ'}]⁻¹X_{μ'}'X_μ} = min(n_μ, n_{μ'})·p̄²_μμ' , p̄_μμ' being Hooper's trace correlation coefficient, which measures the correlation between the sets of variables X_μ and X_{μ'}. As indicated in Stroud and Zellner [1962], S* estimated by this formula provides an unbiased estimate of Σ if X is assumed to be fixed in repeated samples. Although unbiased, S* would likely have more variance about Σ due to the evaluation of the trace terms. In a discussion with the writer, Professor Zellner suggested that he did not favor the use of formula (1), since S* = [s*_μμ'] is not necessarily positive definite. Rather, he favored dividing by T or using the following estimate, which is not unbiased but provides a positive definite S matrix:

(2)    s**_μμ' = û_μ^DLS'û_{μ'}^DLS / (√(T − n_μ)·√(T − n_{μ'}))
Also, (VII.13) asymptotic Var(EZA) = (9(1))-1 Even if X is assumed to be fixed for repeated samples, 62A is no longer best linear unbiased and G?(1))-1 provides only as asymptotic estimate of Var(fizA) due to the use of an estimate of 2 in the GLS formulas.1 A "degrees of freedom” adjustment can be made in the estimated ZA coefficient variance-covariance matrix [i e., asymptotic Var(bzA)] in the same manner as for FIML (section V E). The estimated ZA coeffi- cient variance-covariance matrix can also be normalized in the manner suggested for FIML. If an estimate of the disturbance variance-covariance matrix is desired, it seems desirable to utilize the estimated ZA coefficients to calculate a new 2 instead of merely using the 8 matrix calculated from the DLS coefficients. Utilizing the ZA coefficients the estimated 1See Zellner [1962] for a proof of (VII.13). 270 disturbance variance-covariance matrix becomes: . ZA (v11.14) 22A [Suu'] ZA _ .ze'.zA *ze' , "ZA "ZA with an“. (l/T)uu u“. - (1/I)+sLL [+2L +2“)”. , +5“ being the vector of ZA coefficients (including the normalizing coefficient, -1) for equation u and V+ZH I [yu 2 2p] . A "degrees of freedom” adjustment may be made in the fiZA matrix and the §ZA matrix normalized may be computed in the same manner as for FIML (section V.D.). The ZA estimates and the DLS estimates coincide if either: (1) The same predetermined variables occur in all equations,1 or (2) There is zero correlation between the residuals from DLS for all 2 pairs of DLS equations, i.e., S is a diagonal matrix. 1That ZA and DLS estimates coincide if the same predetermined vari- ables occur in all equations can be seen as follows. Let X I X1 I X2 I - I X . Then T fl M M 1“ 2 s X'y u-1 u A -l . , -l ' 82A - [s s(x x>] M 2 SMHX'y t-l " 'r However, [S-IG(X'X)]-1 I [S-l]-10(X'X)- I SG(X'X)-1; hence, the vector of coefficients for the it equation becomes: r-M n 2 smx')’ u-1 ” ~(1) -l . -1 ' 52A - [911(X'X) .-. 51M(X X) ] M m z s X'y .u-1 5 M M . 
        = Σ_{μ'=1}^{M} Σ_{μ=1}^{M} s_iμ' s^{μ'μ}(X̄'X̄)⁻¹X̄'y_μ = Σ_{μ=1}^{M} { (X̄'X̄)⁻¹X̄'y_μ Σ_{μ'=1}^{M} s_iμ' s^{μ'μ} }

However, Σ_{μ'=1}^{M} s_iμ' s^{μ'μ} = 0 if i ≠ μ and 1 if i = μ; hence, β̂_ZA^(i) = (X̄'X̄)⁻¹X̄'y_i = β̂_DLS^(i). Goldberger [1964], pp. 248 and 263, also contains a proof of (1).

²A proof of (2) is given in the next section (section VII.B).

It is sometimes suggested that ZA be applied to the unrestricted reduced form equations to improve their efficiency over DLS applied to the unrestricted reduced form equations; however, there will be no improvement, due to (1) above. On the other hand, it is often the case that structural equation coefficient restrictions imply readily recognized restrictions on the coefficients of reduced form equations. If these restrictions are taken into account in estimating the reduced form equations, the ZA coefficients will not in general coincide with the DLS coefficients obtained from the reduced form equations, even if the restrictions are taken into account in the DLS estimates. In section VII.C a computational method for taking into account general linear restrictions on coefficients in ZA estimation will be presented. This computational method may prove helpful in direct estimation of reduced form coefficients, since structural equation restrictions may imply restrictions on the reduced form coefficients of a more general form than the special case that certain reduced form coefficients are zero. The computational method given in section VII.C is sufficiently general to take account of restrictions which cut across reduced form equations.¹

¹It should not be forgotten that FIML may also be applied directly to reduced form equations and that FIML coefficients will not coincide with DLS coefficients in the same case that ZA coefficients do not coincide with DLS coefficients. Some relationships between FIML and ZA coefficients are discussed in section VII.D.1.

B.
An Alternate Computational Procedure

Since (2) above (i.e., S a diagonal matrix) suggests an alternative method for calculating β̂_ZA, we will verify (2). In this case S⁻¹ = diag(1/s₁₁, …, 1/s_MM), and the formula for ZA becomes:

    β̂_ZA = [ (1/s₁₁)X₁'X₁  ⋯  0
                  ⋮             ⋮
              0  ⋯  (1/s_MM)X_M'X_M ]⁻¹ [ (1/s₁₁)X₁'y₁ ; ⋮ ; (1/s_MM)X_M'y_M ]

         = [ s₁₁(X₁'X₁)⁻¹ ⋯ 0 ; ⋮ ; 0 ⋯ s_MM(X_M'X_M)⁻¹ ] [ (1/s₁₁)X₁'y₁ ; ⋮ ; (1/s_MM)X_M'y_M ]

         = [ (X₁'X₁)⁻¹X₁'y₁ ; ⋮ ; (X_M'X_M)⁻¹X_M'y_M ] = [ β̂₁^DLS ; ⋮ ; β̂_M^DLS ]

In the above verification the diagonal elements of S cancelled out; hence, if any diagonal matrix had been used in place of S, the same result would have been obtained. In particular, an identity matrix could be used for S. This gives an alternative method for calculating ZA coefficients:

(1) Start the computation by using an identity matrix in place of the S matrix. (No initial coefficients are required, since they are used only in the calculation of the S matrix.) Calculate Θ^(0) and ρ^(0) using the same formulas as for Θ^(1) and ρ^(1), except that the identity matrix is used for the S matrix. Then:

    β̂_DLS = (Θ^(0))⁻¹ρ^(0)

(2) Calculate S from β̂_DLS in the same way as given before and use it to calculate a new Θ^(1) matrix and ρ^(1) vector. Then:

    β̂_ZA = (Θ^(1))⁻¹ρ^(1)

as before.

Whereas the alternate computational method given above takes slightly more computer time at the 0th iteration (for a large scale computer the additional time is hardly measurable), it is simpler to program, especially if provision is made to iterate on the ZA estimates in the manner indicated further on. Another advantage of starting in this manner will be noted in the discussion of the calculation of restricted ZA coefficients (section VII.C).
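The alternate procedure can be sketched directly from (VII.12): build Θ and ρ from any S matrix, starting with S = I (which reproduces the stacked DLS estimates), and then rebuild them from the DLS residual covariance matrix. The code below is an illustrative Python/NumPy sketch on simulated data, not the package's routine:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100
Xs = [rng.normal(size=(T, 2)), rng.normal(size=(T, 3))]      # X_mu, one per equation
U = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=T)
ys = [Xs[0] @ np.array([1.0, -1.0]) + U[:, 0],
      Xs[1] @ np.array([2.0, 0.5, -0.5]) + U[:, 1]]
M = len(ys)
splits = np.cumsum([X.shape[1] for X in Xs])[:-1]            # equation boundaries

def za_step(S):
    """One application of (VII.12): beta = Theta^{-1} rho for a given S."""
    Sinv = np.linalg.inv(S)
    Theta = np.block([[Sinv[i, j] * (Xs[i].T @ Xs[j]) for j in range(M)]
                      for i in range(M)])
    rho = np.concatenate([sum(Sinv[i, j] * (Xs[i].T @ ys[j]) for j in range(M))
                          for i in range(M)])
    return np.linalg.solve(Theta, rho)

beta0 = za_step(np.eye(M))            # 0th iteration: S = I gives the DLS estimates
b_eq = np.split(beta0, splits)
resid = np.column_stack([ys[i] - Xs[i] @ b_eq[i] for i in range(M)])
S = resid.T @ resid / T               # the S matrix of (VII.11)
beta_za = za_step(S)                  # the ZA estimates of (VII.12)
```

Note that the 0th iteration and the ZA iteration run through identical code; only the S matrix changes, which is why this organization is simple to program and to iterate further.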
A disadvantage of starting with an identity matrix in place of the S matrix is that it imposes DLS estimates as the starting estimates for the ZA procedure, whereas it may be desired that other estimates be used as starting estimates for some or all of the equations; however, there seems little tendency to use estimates other than DLS estimates as starting estimates for ZA.

C. Arbitrary Linear Restrictions Imposed on Coefficients

As was noted earlier, the ZA formulas may be derived as an application of the GLS method. In like manner, restricted ZA (RZA) estimates, in which linear restrictions are imposed on the coefficients being estimated, may be derived as an application of the RGLS method. If a set of N_R restrictions given by

(VII.15)    R β = r
            (N_R×n)(n×1) = (N_R×1)

is imposed on the ZA model (y = Xβ + u with assumptions as stated previously), the RZA formulas are given by:¹

(VII.16)    β̂_RZA = Q{[Q'Θ^(1)Q]⁻¹Q'[ρ^(1) − Θ^(1)q]} + q    (n×1)

where Q is n×(n − rk R), [Q'Θ^(1)Q] is (n − rk R)×(n − rk R), and q is n×1, and

(VII.17)    asymptotic Var(β̂_RZA) = Q[Q'Θ^(1)Q]⁻¹Q'    (n×n)

(VII.16) and (VII.17) are derived by substituting the ZA matrix Θ^(1) and the ZA vector ρ^(1) into RGLS formulas (IV.5) and (IV.8). Q and q are calculated from R and r by the computational procedure of section IV.B.1.

¹Calculation of the Q matrix and q vector also gives a means of separating out rk R coefficients, β̂_RZA^(1), which may be calculated from the remaining n − rk R "unrestricted" coefficients, β̂_RZA^(2). Thus, the following pair of formulas are together equivalent to (VII.16):

    β̂_RZA^(2) = [Q'Θ^(1)Q]⁻¹Q'[ρ^(1) − Θ^(1)q]    ((n − rk R)×1)

    β̂_RZA^(1) = Q₂β̂_RZA^(2) + q₂    (rk R×1)

where Q₂ and q₂ are the subparts of Q and q noted in section IV.B.

In the calculation of the Θ^(1) matrix and ρ^(1) vector, the S matrix corresponding to the restricted DLS rather than the unrestricted DLS estimates should be used.
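The Q matrix and q vector of (VII.16) can be sketched as follows: q is any particular solution of Rβ = r, the columns of Q span the null space of R, and every β = Qθ + q then satisfies the restrictions exactly, so an unrestricted minimization over θ yields the restricted estimator. The illustration below is ours (with Θ = X'X and ρ = X'y as in restricted least squares; the same algebra applies with the ZA Θ^(1) and ρ^(1)), and it deliberately includes a redundant restriction, so R does not have full row rank:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 60, 4
X = rng.normal(size=(T, n))
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(size=T)

# Restrictions R beta = r: beta_1 = beta_2 and beta_3 + beta_4 = 7,
# plus a redundant copy of the first restriction.
R = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0,  0.0, 1.0, 1.0],
              [2.0, -2.0, 0.0, 0.0]])
r = np.array([0.0, 7.0, 0.0])

q = np.linalg.pinv(R) @ r              # a particular solution of R q = r
_, sv, Vt = np.linalg.svd(R)
rank = int((sv > 1e-10).sum())         # number of independent restrictions
Q = Vt[rank:].T                        # columns span the null space of R

Theta, rho = X.T @ X, X.T @ y
theta = np.linalg.solve(Q.T @ Theta @ Q, Q.T @ (rho - Theta @ q))
beta_r = Q @ theta + q                 # restricted estimate, in the form of (VII.16)
```

The rank of R falls out of the singular value decomposition as a by-product, inconsistent restrictions would show up as Rq ≠ r, and nothing requires R to have full row rank, in line with the properties claimed for the Q, q procedure.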
Imposing Restrictions which Cut Across Equations on the Coefficients Used in Estimating the S Matrix

It seems clearly desirable that restrictions which do not cut across equations be imposed on the DLS coefficients used to estimate the S matrix. It would also seem desirable to impose restrictions which do cut across equations on the DLS coefficients as well. (They will then not be DLS coefficients for separate equations, but coefficients closely related to DLS.) A scheme for doing so follows.

Let β̂_DLSME be the set of coefficients which minimize û'û = (y − Xβ̂)'(y − Xβ̂). Then β̂_DLSME = β̂_DLS (n×1, n = Σ_{μ=1}^{M} n_μ); that is, the estimator is the same as that obtained when each equation is estimated separately by DLS.

Let us now change the problem to one of:

(VII.18)    min_β̂ (y − Xβ̂)'(y − Xβ̂)    subject to:    Rβ̂ = r

Then the resulting solution will be the restricted DLS solution if no restrictions cut across equations, and a solution which we will call the RDLSME (Restricted DLS Multiple Equations) solution if one or more restrictions cut across equations.

The RDLSME solution ignores the covariances between the disturbances of separate equations, as does the restricted DLS solution; however, it does take account of all restrictions on the coefficients which cut across equations, whereas the restricted DLS solution does not.

RDLSME coefficients may be automatically calculated and then used to calculate the S matrix used to estimate the ZA coefficients by imposing restrictions on the alternate method given earlier for the calculation of the ZA estimates; that is:

(1) Start the computation by using an identity matrix in place of the S matrix. Calculate Θ^(0) and ρ^(0) using the same formulas as for Θ^(1) and ρ^(1), except that the identity matrix is used for the S matrix:

    β̂_RDLSME = Q[Q'Θ^(0)Q]⁻¹Q'[ρ^(0) − Θ^(0)q] + q

(2) Calculate S from β̂_RDLSME and use it to calculate a new Θ^(1) matrix and ρ^(1) vector. Then calculate β̂_RZA by formula (VII.16) as before.¹

In addition to being simple to program, the use of the 0th iteration method given above has the advantage that unique DLS coefficients need not exist, provided the restrictions which cut across equations provide sufficient restrictions on the coefficients of all equations that

¹If restrictions which cut across equations are ignored in the estimation of the S matrix, the resulting RZA coefficients will, of course, not coincide with the RZA coefficients obtained through taking account of the restrictions.
Then calculate EZA by formula (VII.16) as before.1 In addition to being simple to program, the use of the 0th iter- ation method given above has the advantage that unique DLS coefficients need not exist provided the restrictions which cut across equations pro- vide sufficient restrictions on the coefficients of all equations that 1If restrictions which cut across equations are ignored in the estimation of the S matrix, the resulting RZA coefficients will, of course, not coincide with the RZA coefficients obtained through taking account of the restrictions. 277 A unique 8 estimates exist. Remionohip to A Common RZA Forunufia In applying all of the RZA formulas given so far, the R matrix need not have full row rank and the X“ matrix for each equation need not have full column rank. If the additional requirements that (l) the R matrix has full row rank and (2) X” has full column rank for each equation are imposed, substitution of w9(1) and P(1) into RGLS formula (IV.23) gives the following alternative RZA formula:2 (WHO) 6 =‘é ->'R>"[R<9(”1"RJ [Ram r] RZA ZA As with our other formulas, restricted DLS rather than unrestricted DLS estimates should be used to estimate S when calculating EZA ; lAs an example, suppose that a system of 4 equations contains only a single jointly dependent variable y in each equation and that each of the 4 equations contain the same 5 predetermined variables, x1 -°- x5: plus 6 additional predetermined variables not contained in any of the other equations. Let 6(1) be the coefficient corresponding to x me 3 (1) <2) (3) (4) 3 quation i and assume that Bj = B] = BJ = Bj j - l,...,5 . If we assume that the data consists of only 10 observations, unique DLS coefficients do not exist since each equation contains 11 variables but there are only 10 observations. Also, rk X1 = 10 (the for maximum rank) implies a vector of residuals of all zeros for each equation; hence, S - %fi'fi = 0 where 0 is a 4X4 matrix. 
On the other hand, the restrictions on coefficients given above impose 15 independent restrictions which cut across equations; hence, it is probable that the coefficient space is sufficiently restricted that the RDLSME and RZA coefficients are unique.

²This formula is given in Stroud and Zellner [1962], p. 10.

otherwise, information which is assumed to hold is ignored in the calculation of the S matrix. If both the unrestricted β̂_ZA and β̂_RZA are desired, they should be calculated separately. The advantages of the formulas which use the Q matrix and q vector over the formula which uses the R matrix and r vector are given in section IV.B.2 for RGLS, but are applicable to RZA as well.

D. Iterative Zellner-Aitken Estimator (IZA)

1. Only zero and normalization restrictions imposed on coefficients

If the application of the ZA method to the DLS estimates results in an increase in statistical efficiency, the question naturally arises as to whether the ZA coefficients should then be used to estimate a new S matrix, new ZA estimates calculated, and so on. If the process is continued, it may be hypothesized that the coefficients will converge; that is, that the proportional change in all coefficients in any one iteration will become less than a small preassigned constant, ε > 0. If the coefficients converge, let us call the result the IZA (iterative ZA) coefficients.

It is shown in this section that if the IZA procedure converges, the IZA and FIML coefficients will coincide; that is, if to the other statistical assumptions we add normality of the disturbances, then the converged IZA coefficient estimates are maximum likelihood estimates.

The FIML Procedure

First, let us examine the FIML estimating procedure in the case of only one jointly dependent variable per equation.
Since the matrix of coefficients of the jointly dependent variables, Γ, takes on a particularly simple form, the identity matrix, the likelihood function in the form f*(α) [given in (V.24)] becomes

  det²Γ/det S = det²I/det S = 1/det S .

Thus, maximization of the likelihood function implies the minimization of det S.¹

¹This may also be derived by noting that FIML estimates minimize the estimated variance-covariance matrix of the restricted reduced form, and that in the ZA system the structural equations may also be regarded as the reduced form equations in which account is taken of all coefficient restrictions in the model.

For any equation μ, Y_μ becomes y_μ (Y_μ• is empty) and Z_μ becomes X_μ. Substituting into equations (V.37) and (V.42) we get:¹

(VII.21)  l_μ^(i−1) = [∂ log f*/∂δ_μ] evaluated at δ̂^(i−1) = Σ_{μ′=1}^{M} s^{μμ′} X′_μ û_μ′

and

(VII.22)  −∂² log f*/∂δ_μ ∂δ′_μ′ evaluated at δ̂^(i−1) = s^{μμ′} X′_μ X_μ′ − (1/T) X′_μ Û F^{μμ′} Û′X_μ′

where F^{μμ′} is the M×M matrix whose (ν,ν′) element is s^{μμ′}s^{νν′} + s^{νμ′}s^{μν′}.

At the ith iteration, the new coefficient vector δ̂^(i) is formed from the old coefficient vector δ̂^(i−1) as:

(VII.23)  δ̂^(i) = δ̂^(i−1) + h^(i) d^(i)

where h^(i) is the step size and the direction, d^(i), is formed by d^(i) = [Ĥ^(i−1)]⁻¹ l^(i−1) with

  l^(i−1) = [l₁^(i−1)′ ··· l_M^(i−1)′]′

and Ĥ^(i−1) the matrix whose (μ,μ′) block is given by (VII.22).

¹Since Γ = I, det²Γ = det²I, which is a constant; hence, terms derived from det²Γ drop out of ∂ log f*/∂δ and ∂² log f*/∂δ ∂δ′. Thus, the corresponding term is deleted from l_μ in (V.37) and from the (μ,μ′) block of the metric in (V.42).

The IZA Procedure

Now let us examine the IZA iterative procedure. At the ith iteration:

  δ̂^(i) = (Θ^(i−1))⁻¹ [ Σ_{μ′} s^{1μ′} X′₁ y_μ′ ; ··· ; Σ_{μ′} s^{Mμ′} X′_M y_μ′ ]

where

  Θ^(i−1) = [ s^{11} X′₁X₁   ···  s^{1M} X′₁X_M
                  ⋮                    ⋮
              s^{M1} X′_M X₁  ···  s^{MM} X′_M X_M ] ,

δ̂^(i−1) being used in calculating the S matrix which is in turn used to calculate Θ^(i−1) and the right-hand-side vector for the current δ̂^(i). The increment added to δ̂^(i−1) to form δ̂^(i) is:

(VII.24)  δ̂^(i) − δ̂^(i−1) = (Θ^(i−1))⁻¹ [ Σ_{μ′} s^{μμ′} X′_μ y_μ′ ]_{μ=1,...,M} − (Θ^(i−1))⁻¹ Θ^(i−1) δ̂^(i−1)

          = (Θ^(i−1))⁻¹ [ Σ_{μ′} s^{μμ′} X′_μ (y_μ′ − X_μ′ δ̂_μ′^(i−1)) ]_{μ=1,...,M}

          = (Θ^(i−1))⁻¹ [ Σ_{μ′} s^{μμ′} X′_μ û_μ′^(i−1) ]_{μ=1,...,M}

i.e.,  δ̂^(i) − δ̂^(i−1) = 1·(Θ^(i−1))⁻¹ l^(i−1) .

Thus, for the special case of one jointly dependent variable per equation, the IZA procedure is exactly the same as the FIML procedure except that: (1) a step size of 1 is used for each iteration, and (2) the Θ metric is used instead of the Ĥ metric.¹

If the formula for the Θ metric given in this chapter is compared to the formula for the Θ metric given in the FIML chapter (V.39), they will be seen to be the same, i.e., the Θ metric of the IZA procedure is the initial metric used by Chernoff and Divinsky in their FIML computational procedure. (The Θ metric was used as the initial metric due to its "safe" characteristics. For one thing, it is positive definite, thereby insuring that if a sufficiently small step size is chosen, there will of necessity be an increase in the likelihood function for l^(i) ≠ 0.)

¹The l vector and Ĥ metric are the right hand side vector [defined by (V.37)] and metric [defined by (V.42) and (V.43)] for FIML.

Since no jointly dependent variables other than the normalizing variable occur in any equation, the Ê metric given by formula (V.40) also reduces to the Θ metric. (As noted in section V.C.3, the Ê metric is a metric which was developed to be asymptotically the same as the Ĥ metric.) Thus, the Θ metric given here is the same as the metrics used for the bulk of the iterations by Chernoff and Divinsky. Chernoff and Divinsky only shifted to the more powerful Ĥ metric close to the maximum.
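One IZA pass as just described (estimate S from the current residuals, then solve the stacked equations with the Θ metric) can be sketched as follows. This is a minimal numpy sketch with illustrative names and tolerance, not the AES STAT implementation, and it assumes each X_μ and the Θ matrix have full rank.

```python
import numpy as np

def iza(ys, Xs, tol=1e-9, max_iter=200):
    """Iterative Zellner-Aitken (iterated SUR) sketch for a system in
    which each equation contains a single jointly dependent variable:
        y_mu = X_mu beta_mu + u_mu .
    Each pass re-estimates S from the current residuals and solves the
    stacked equations with the metric Theta = [s^{mu mu'} X_mu' X_mu'].
    Iteration stops when the proportional coefficient change is small."""
    T, M = ys[0].shape[0], len(ys)
    # 0th iteration: equation-by-equation DLS (ordinary least squares)
    betas = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)]
    for _ in range(max_iter):
        U = np.column_stack([y - X @ b for y, X, b in zip(ys, Xs, betas)])
        S_inv = np.linalg.inv(U.T @ U / T)              # [s^{mu mu'}]
        # assemble the Theta metric and the right-hand-side vector
        theta = np.block([[S_inv[i, j] * Xs[i].T @ Xs[j] for j in range(M)]
                          for i in range(M)])
        rhs = np.concatenate([sum(S_inv[i, j] * Xs[i].T @ ys[j]
                                  for j in range(M)) for i in range(M)])
        new = np.linalg.solve(theta, rhs)
        old = np.concatenate(betas)
        splits = np.cumsum([X.shape[1] for X in Xs])[:-1]
        betas = np.split(new, splits)
        if np.max(np.abs(new - old) / np.maximum(np.abs(old), 1e-12)) < tol:
            break
    return betas
```

A useful check of the sketch: when every equation contains the same X matrix, the stacked solve reduces to equation-by-equation DLS, so the procedure converges immediately to the DLS estimates.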
(In addition to the advantages of the more powerful metric, Ĥ⁻¹ provides a maximum likelihood estimate of the estimated coefficient variance-covariance matrix rather than one which can only be shown to be asymptotically the same, i.e., Ê⁻¹, which in the case of one jointly dependent variable per equation is the same as Θ⁻¹.)

Since the likelihood function for the ZA model is of a considerably simpler form than the likelihood function to be maximized in the general FIML case, the Θ metric (which in this case coincides with the Ê metric) should be adequate in most cases for convergence. On the other hand, the Ĥ metric which we developed earlier should still prove the more powerful in terms of the number of iterations required for convergence. Also, convergence could surely be speeded if a variable step size were computed in a fashion such as the one given in the FIML section. If a step size of 1 is imposed, it is conceivable [though unlikely because of the particularly simple form of f*(α)] that the coefficients may diverge from the maximum or cycle in some fashion, since an increase in the likelihood at each iteration is only guaranteed for a sufficiently small step.

Due to the more powerful metric used in the FIML procedure, total computing time may be expected to be less if the FIML procedure of section V.C is used rather than the IZA procedure.¹ There would be even more advantage if the FIML formulas were modified by formulas (VII.21) and (VII.22) to take advantage of the simple form of Γ. Finally (if the disturbances are assumed to be normally distributed), the FIML procedure does provide a maximum likelihood estimate of the coefficient variance-covariance matrix rather than one which is only asymptotically the same as the maximum likelihood estimate. On the other hand, the IZA procedure is easier to program.
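The variable-step-size idea mentioned above can be sketched in a few lines. Since only one jointly dependent variable occurs per equation, det S falls as the likelihood rises, so trial step sizes can be compared through det S alone. The trial grid and names below are illustrative assumptions, not the scheme of section V.C.

```python
import numpy as np

def choose_step(delta_old, direction, det_S_of,
                trial_steps=(1.0, 0.5, 0.25, 0.125)):
    """Pick the step size h for one iteration by evaluating det S at a
    few trial values of h along the current direction and keeping the
    trial step that gives the smallest det S.  `det_S_of` maps a
    coefficient vector to det S for the implied residuals."""
    best_h, best_val = None, np.inf
    for h in trial_steps:
        val = det_S_of(delta_old + h * direction)
        if val < best_val:
            best_h, best_val = h, val
    return best_h
```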
Convergence would be speeded up considerably (at only a small cost in additional programming) if the step size were varied by using a scheme such as that given in the FIML section. If the IZA procedure is used, d^(i) can be calculated by (1) calculating the new coefficients which would be obtained at the ith iteration if the IZA procedure were used [i.e., solving the ZA estimating equations δ̃^(i) = (Θ^(i−1))⁻¹ [Σ_{μ′} s^{μμ′} X′_μ y_μ′]], and (2) calculating d^(i) as d^(i) = δ̃^(i) − δ̂^(i−1). New coefficients for the iteration could then be calculated as δ̂^(i−1) + h^(i) d^(i), the final step size h^(i) being based on det S for trial values of h^(i).²

¹It is conceivable that there exist some exceedingly simple models in which the IZA procedure requires less computer time; however, these problems will surely be encountered rarely, and even in these cases the amount of computer time saved by the IZA procedure will be hardly measurable. On the other hand, for the majority of problems encountered, the FIML procedure should result in considerably faster convergence in computer time than the IZA procedure. Also, the FIML procedure may be expected to converge for some problems for which the IZA procedure does not converge.

²Since only a single jointly dependent variable occurs in each equation, det S is minimized at the maximum of the likelihood.

A "degrees of freedom" adjustment may be made in the estimated disturbance variance-covariance and the estimated coefficient variance-covariance matrices which correspond to the IZA coefficients in the same manner as for FIML (sections V.D and V.E). If the IZA coefficients are to be maximum likelihood estimates, the adjustment to the disturbance variance-covariance matrix is made only after the coefficients have converged, i.e., a divisor of T is used during iteration.

2. Arbitrary linear restrictions imposed on coefficients

Restrictions may be imposed at each iteration of the computation of the IZA coefficients in the same manner as for ZA.
When convergence has been obtained, the resulting coefficients will satisfy the restrictions and will be the same as would be obtained if the restrictions were used in the FIML computational procedure.

E. Iterative Direct Least Squares (IDLS or Telser Method)

For the particular case of a system of equations in which each equation contains only one jointly dependent variable, Professor Lester Telser proposed a multiple equations computational procedure in which DLS is used as the primary computational procedure but in an iterative fashion.¹ In each step of the IDLS (iterative DLS) procedure the coefficients of only a single equation are calculated by DLS, except that the residuals from all of the other stochastic equations in the system are included as extra explanatory variables; that is, the explanatory variables for the DLS calculation are taken to be the predetermined variables for the equation being estimated and the residuals from all of the other stochastic equations in the system. Only the coefficients of the predetermined variables of the equation are used. The coefficients corresponding to the residuals are ignored.

¹Telser [1964].

The IDLS procedure may be considered to be a special case of the ILIML procedure given in section VI.E, since DLS may be considered to be the particular case of LIML in which only one jointly dependent variable occurs in each equation. Thus, the computational procedure summarized on pages 256 and 257 for ILIML is the same as the IDLS computational procedure with LIML substituted for DLS as the basic computational method.

Derivation of the IDLS Method²

²Telser [1964] uses a different approach in his derivation of the IDLS method.

Let us add normality of the disturbances to the statistical assumptions previously made. Then, since the IDLS method is a particular
case of the ILIML method, the derivation of the ILIML method is sufficient to show that at any one step in the procedure, the coefficients of an equation are selected to maximize the likelihood function, provided we consider the coefficients of the remaining equations in the system as fixed during that step. (This implies that if the coefficients of all other equations are maximum likelihood coefficients, then the single step will result in maximum likelihood coefficients being estimated for that equation as well.) Thus, it may be expected that, in general, if enough steps are taken (enough steps may be a very large number of steps in some cases), convergence will be to the maximum of the likelihood function. Particular cases similar to that posed following Figure VI.55 of section VI.E may arise where movement to a point short of the maximum may occur, and from that point no further movement occurs or movement is so slow that it is thought that convergence has occurred.

Since the restriction to one jointly dependent variable per equation considerably simplifies the likelihood function, the remainder of this section will be devoted to giving a simpler derivation of the IDLS method than the one given for ILIML (however, the basic steps in the derivation are the same as for ILIML).

Let us separate out the first equation from the remainder of the system and select coefficients for that equation which maximize the likelihood function, assuming that the coefficients of the other equations are fixed. Thus, we will divide the M×(M + Λ) matrix of coefficients α into two parts: α₁, the 1×(M + Λ) matrix of coefficients of the first equation, and α₂, the (M − 1)×(M + Λ) matrix of coefficients of the remaining M − 1 equations:

(VII.26)  α = [ α₁ ; α₂ ] ,  α₁ of order 1×(M + Λ),  α₂ of order (M − 1)×(M + Λ).

Let S be the estimated disturbance variance-covariance matrix corresponding to a set of estimated coefficients α̂ as before. Then, since Û = −Zα̂′,

(VII.27)  S = (1/T)Û′Û = (1/T)α̂Z′Zα̂′ .

Selecting coefficients to maximize the likelihood function is equivalent to selecting coefficients to minimize det S (see section VII.D.1). In the same manner as for ILIML (section VI.E), we may factor det S into:

(VII.28)  det S = (1/T^M) (α̂₁[Z′Z]_⊥Û₂ α̂₁′) det(α̂₂[Z′Z]α̂₂′)

where Û₂ = −Zα̂₂′ is the T×(M − 1) matrix of residuals of equations 2 through M and [Z′Z]_⊥Û₂ is the moment matrix of the part of Z orthogonal to Û₂.

Let us subdivide Z as Z = [y₁ : X₁ : Z₁**], where y₁ is the T×1 vector of observed values of the jointly dependent variable of equation 1, X₁ is the T×n₁ matrix of predetermined variables in equation 1, and Z₁** is the T×(M + Λ − n₁ − 1) matrix of variables outside equation 1 but in the system. Then α̂₁ may be correspondingly subdivided as α̂₁ = [−1 : β̂₁′ : 0′], where β̂₁ is the n₁×1 vector of coefficients of predetermined variables in the first equation, and 0 is the vector of coefficients of the variables outside equation 1. α̂₁[Z′Z]_⊥Û₂ α̂₁′ may now be written as:

(VII.29)  α̂₁[Z′Z]_⊥Û₂ α̂₁′ = [−1 : β̂₁′ : 0′] [Z′Z]_⊥Û₂ [−1 : β̂₁′ : 0′]′
        = [y₁′y₁]_⊥Û₂ − 2β̂₁′[X₁′y₁]_⊥Û₂ + β̂₁′[X₁′X₁]_⊥Û₂ β̂₁ .

Substituting (VII.29) into (VII.28), taking the partial derivative of det S with respect to β̂₁, and setting the partial derivative to zero, we have:

(VII.30)  ∂ det S/∂β̂₁ = (1/T^M) det(α̂₂[Z′Z]α̂₂′) (−2[X₁′y₁]_⊥Û₂ + 2[X₁′X₁]_⊥Û₂ β̂₁) = 0

or, solving for the minimizing value of β̂₁, we have:¹

(VII.31)  β̂₁ = [X₁′X₁]⁻¹_⊥Û₂ [X₁′y₁]_⊥Û₂ .

¹∂² det S/∂β̂₁∂β̂₁′ = (2/T^M) det(α̂₂[Z′Z]α̂₂′)[X₁′X₁]_⊥Û₂ , a positive definite matrix provided [X₁′X₁]⁻¹_⊥Û₂ exists and det(α̂₂[Z′Z]α̂₂′) ≠ 0; therefore, the second order condition for β̂₁ to minimize det S is met.

But the solution given above is the least squares solution for the coefficients corresponding to X₁ which would be obtained if y₁ were used as the dependent variable and the variables in the matrix [X₁ : Û₂] were used as predetermined variables in the equation. This may be seen as follows:

Let γ̂₁ be the (M − 1)×1 vector of least squares coefficients corresponding to the variables in Û₂. Then the least squares solution for [β̂₁′ : γ̂₁′]′ is:¹

(VII.32)  [ β̂₁ ; γ̂₁ ] = [[X₁ : Û₂]′[X₁ : Û₂]]⁻¹ [X₁ : Û₂]′ y₁

        = [ X₁′X₁  X₁′Û₂ ; Û₂′X₁  Û₂′Û₂ ]⁻¹ [ X₁′y₁ ; Û₂′y₁ ]

        = [ [X₁′X₁]⁻¹_⊥Û₂ [X₁′y₁]_⊥Û₂ ; [Û₂′Û₂]⁻¹_⊥X₁ [Û₂′y₁]_⊥X₁ ] .

¹The formula given here for the inverse of a partitioned matrix is derived in Faddeeva [1959], pp. 102-103, except that she writes [X₁′X₁]_⊥Û₂ in the form X₁′X₁ − X₁′Û₂(Û₂′Û₂)⁻¹Û₂′X₁. Also, to save an inversion, Faddeeva doesn't treat X₁′X₁ and Û₂′Û₂ uniformly but instead writes a different term for [Û₂′Û₂]_⊥X₁.

Thus, β̂₁ = [X₁′X₁]⁻¹_⊥Û₂ [X₁′y₁]_⊥Û₂ as claimed; however, [as derived in (VII.31)] this is the solution which maximizes the likelihood assuming that the coefficients of the remaining equations are fixed. (If the remaining coefficients are maximum likelihood coefficients, the solution given by (VII.32) gives maximum likelihood coefficients for this equation as well.)

New residuals may now be calculated for equation 1 using the new coefficient vector β̂₁ but not including the coefficients in γ̂₁, and new coefficients can be estimated for each equation in turn, using the new residuals calculated in previous steps as additional predetermined variables in the equation in the same manner as for ILIML.
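The single IDLS step just derived (regress y₁ on its own predetermined variables together with the residuals of the other equations, keeping only the coefficients on X₁) can be sketched as follows; a minimal numpy sketch with illustrative names.

```python
import numpy as np

def idls_step(y1, X1, U2):
    """One step of the Telser/IDLS procedure: regress y_1 on the
    equation's own predetermined variables X_1 together with the
    residuals U2 of the other stochastic equations, and keep only the
    coefficients on X_1; the residual coefficients are discarded.
    This is the beta_1 block of the least squares solution (VII.32)."""
    W = np.column_stack([X1, U2])
    coefs = np.linalg.lstsq(W, y1, rcond=None)[0]
    return coefs[:X1.shape[1]]          # beta_1 only
```

By the partitioned-inverse argument above, the returned coefficients equal [X₁′X₁]⁻¹_⊥Û₂ [X₁′y₁]_⊥Û₂, the value given by (VII.31).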
After estimating new coefficients for equation M, a new iteration is started by again estimating new coefficients for equation 1. Iteration continues until all coefficients converge.

Summary of the IDLS Computational Method

The summary of the ILIML computational method (pp. 256-257 of this paper) summarizes the IDLS computational method as well if DLS is used instead of LIML.

Increasing Efficiency of IDLS Computation

Except for the notes regarding eigenvalues and eigenvectors, the suggestions for increasing efficiency for the ILIML method are applicable to the IDLS method as well. In particular, the residuals need not be calculated, since:

(VII.33)  Z′û_μ = −[Z′ ₊Z_μ] ₊δ̂_μ

and

(VII.34)  û′_μ û_μ′ = ₊δ̂′_μ [₊Z′_μ ₊Z_μ′] ₊δ̂_μ′

where ₊Z_μ = [y_μ : Z_μ], ₊δ̂_μ = [−1 ; δ̂_μ], and Z is the matrix of variables in the system (or subsystem) being estimated.

In like manner to the ILIML method, any equation with rk X = n_μ does not affect the converged FIML, IZA, or IDLS coefficients; hence, some additional computational efficiency may be obtained by omitting all equations with rk X = n_μ from the iterative procedure until convergence has been obtained (maximum likelihood estimates have been obtained) for all equations with rk X > n_μ.¹

For the model we are now considering (only one jointly dependent variable per equation), rk X = n_μ will occur for the μth equation only if it is assumed that all of the predetermined variables occurring in any equation in the system being estimated occur also in the μth equation with non-zero coefficients. The maximum likelihood coefficients for an equation containing all of the predetermined variables in the system may then be directly calculated by including as the only "extra" predetermined variables in that equation the residuals (calculated from the converged coefficients) of all equations for which rk X > n_μ.²
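The residual cross products of formula (VII.34) can be taken straight from moment matrices, so the T×1 residual vectors never have to be formed. A minimal sketch, with illustrative names:

```python
import numpy as np

def residual_cross_product(moment, dp_mu, dp_nu):
    """Residual cross product from moment matrices alone, per (VII.34):
    with +Z_mu = [y_mu : Z_mu] and +delta_mu = (-1, delta_mu')', the
    cross product u_mu' u_nu equals +delta_mu' [+Z_mu' +Z_nu] +delta_nu.
    `moment` is the cross-moment matrix +Z_mu' +Z_nu."""
    return dp_mu @ moment @ dp_nu
```

Because û_μ = −₊Z_μ ₊δ̂_μ, the quadratic form above reproduces û′_μû_ν exactly, which is what makes the moment-matrix shortcut valid.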
¹An equation with rk X = n_μ is usually referred to as a just-identified equation, and an equation with rk X > n_μ is usually referred to as an over-identified equation.

²This technique may also be used in the FIML and IZA procedures. Convergence may first be obtained (in a FIML or IZA routine) for those equations which do not contain all of the predetermined variables in the system, and the system then enlarged to contain also the equations each of which contains all of the predetermined variables in the system. The IDLS procedure may also be combined with the FIML or IZA procedure by using FIML or IZA to calculate the coefficients of those equations which do not contain all of the predetermined variables in the system. The coefficients of an equation which contains all of the predetermined variables in the system may then be directly calculated as a DLS problem which contains as extra predetermined variables the residuals (calculated from the converged coefficients) of the equations which do not contain all of the predetermined variables in the system.

Comparison with FIML and IZA Methods

The remarks comparing convergence of the ILIML method with convergence of the FIML and SML methods (pp. 259-262) are applicable for comparing the IDLS method with the FIML and IZA methods as well. (The IZA method [unlike the IDLS method] is similar to the FIML method in that adjustments are made to all coefficients in any step of the convergence procedure; hence, convergence will occur to at least a local maximum if convergence occurs.) Situations in which convergence will fall short of the maximum of the likelihood function may occur for IDLS in the same manner as for ILIML.¹

¹Klein's Model I was modified by reclassifying all explanatory jointly dependent variables in each of the 3 stochastic equations as predetermined.
(The normalizing jointly dependent variable for each stochastic equation became the only jointly dependent variable in the equation, and the identity equations were deleted from the system.) The coefficients of the modified model were then estimated by FIML, IZA, and IDLS using a coefficient convergence criterion (see section V.C.5) of .000 000 001. FIML required 6 iterations to converge, IZA required 46 iterations to converge, and IDLS required 64 iterations to converge. (The estimates obtained coincided, of course, for the 3 methods.)

The Monte-Carlo experiment reported in Kmenta and Gilbert [1967] was calculated on the AES STAT system. In this experiment the FIML, IZA, and IDLS estimates coincided for all samples for which all three methods were calculated. Even in the simple 2 equation models, the FIML computational procedure was much more powerful than the IZA and IDLS procedures, the FIML procedure requiring about 8 iterations for each problem and the IZA and IDLS procedures requiring about 23 iterations. (A coefficient convergence criterion of only .000 001 was used. Had a smaller convergence criterion been used, only a few additional iterations would have been required for the FIML procedure, since it is powerful close to the maximum of the likelihood. On the other hand, the IDLS convergence procedure does not converge faster as the maximum is approached; hence, a number of additional iterations would have been required for IDLS.) The number of iterations required for convergence for IDLS was highly variable. In the 2 equation model, IDLS sometimes required more and sometimes fewer iterations than IZA, but never as few iterations as FIML. In a 4 equation model, there was a much greater advantage to using FIML over IZA or IDLS than in the 2 equation model. Also, the number of iterations for IDLS became much higher than for IZA. (The iteration results reported in this footnote are not reported in Kmenta and Gilbert [1967].)
CHAPTER VIII

THREE-STAGE LEAST SQUARES (3SLS)

A. Only Zero and Normalization Restrictions Imposed on Coefficients

Three-stage least squares (3SLS)¹ is usually thought of as a method for estimating the coefficients of a complete system of equations, and its properties are usually compared with the properties of the FIML estimator. It should be noted, however, that the 3SLS estimation procedure may also be applied to a subsystem of equations, utilizing the structure of the subsystem being estimated plus additional instruments (usually the predetermined variables in the remainder of the system); hence, it is often more fruitful to compare the properties of the 3SLS estimator with the properties of the SML estimator. Also, as with SML estimation (and unlike FIML estimation), jointly dependent variables are adjusted by a matrix of variables contemporaneously independent of the disturbances of the equations being estimated. As a result, if the rank of the matrix of variables used in the adjustment of the jointly dependent variables equals the number of observations, the special adjustment has no effect. (This is shown in section VIII.C.)

Regarding identity equations, Zellner and Theil recommend that they simply be deleted from the three-stage procedure. In a footnote, they go on to say: "It is sometimes recommended that such equations be eliminated by a substitution of variables. This is superfluous and makes the computations more complicated than necessary."² Predetermined variables in identity equations do serve as prime candidates as instruments for the X_I matrix which is used to adjust jointly dependent

¹The basic article on 3SLS is Zellner and Theil [1962].

²Zellner and Theil [1962], p. 63.
variables in the two-stage and three-stage procedures.¹

The 3SLS estimating equations can be derived as an application of GLS in which (1) an estimated disturbance variance-covariance matrix is substituted for the actual disturbance variance-covariance matrix, and (2) stochastic variables are included in the GLS X matrix. Following is a derivation.² As in Part I, let the μth structural equation of the system or subsystem being estimated be

(VIII.1)  y_μ = Z_μ δ_μ + u_μ
          T×1  (T×n_μ)(n_μ×1)  T×1

¹The X_I matrix is defined further on in this section.

²Since the 2SLS coefficients which are starting estimates for 3SLS may also be derived as an application of GLS, 3SLS might be said to be derivable as an application of the GLS procedure twice. This is the approach used in Zellner and Theil [1962]. The derivation in this paper follows Zellner and Theil's derivation except that: (1) Instead of restricting the matrix of variables used to adjust the jointly dependent variables to the entire matrix of predetermined variables in the system (X), the jointly dependent variables are adjusted by a matrix of instrumental variables, X_I, with X_I containing all of the predetermined variables in the subsystem being estimated plus additional instruments. X may of course be used as the matrix X_I. A careful reading of the derivations given in Zellner and Theil [1962] will disclose that none of the properties which they claim for the 3SLS estimator will be affected by this substitution, provided we make the same assumptions regarding the variables in X_I that they make regarding the variables in X: that the variables are fixed. (2) Zellner and Theil assumed that the X matrix has full column rank. We will assume that the X_I matrix has full column rank for ease of deriving 3SLS as a GLS procedure, but present a computational procedure for which X_I may have less than full column rank.
where y_μ is the vector of observations of the normalizing jointly dependent variable in the equation; Z_μ = [Y_μ : X_μ] is the matrix of explanatory variables in the equation (the T×m_μ submatrix Y_μ is the matrix of explanatory jointly dependent variables in the equation, and the T×t_μ submatrix X_μ is the matrix of predetermined variables in the equation); δ_μ = [γ_μ ; β_μ] is the vector of population coefficients of the explanatory variables of the μth equation (γ_μ is the m_μ×1 subvector of the population coefficients of the explanatory jointly dependent variables, and β_μ is the t_μ×1 subvector of population coefficients of the predetermined variables); and u_μ is the vector of disturbances of the μth equation.

Let X_I (the subscript I denotes instruments) be a T×K matrix of instrumental variables containing the predetermined variables in the system or subsystem being estimated plus possibly additional instrumental variables. The discussion of the selection of instruments in section II.G for the X_I matrix of the double k-class estimators is applicable to instrumental variables used in the X_I matrix for 3SLS as well. The predetermined variables in the remainder of the system (if 3SLS is applied to a subsystem of the entire system) and the predetermined variables in identity equations should certainly be considered as candidates for inclusion as instruments in the X_I matrix.¹ When reporting results of 3SLS estimation, the particular instruments included in the X_I matrix should be reported along with the 3SLS coefficients obtained, since the particular instruments included in the X_I matrix affect the 3SLS coefficients obtained.

¹It is usual to use the X matrix (the matrix of predetermined variables in the system) as X_I (the matrix of instrumental variables).
In what follows we will assume that X_I consists of "fixed" variables only.¹ Initially we will assume that X_I has full column rank for ease of deriving the computational formulas; however (as is noted further on in the derivation), an X_I of less than full column rank presents no difficulty if the formulas which are presented in this paper are used. X_I must have rank at least equal to the maximum number of explanatory variables in any equation in the system or subsystem being estimated.² (Even this lesser restriction will be relaxed in section VIII.D when we consider general linear restrictions on coefficients.)

¹Assuming that X_I contains "fixed" variables follows Zellner and Theil [1962]. This is a restrictive assumption, since it excludes lagged jointly dependent variables from occurring as predetermined variables in the subsystem being estimated. The assumption that X_I contains "fixed" variables was apparently made by Zellner and Theil for convenience in deriving the 3SLS estimator and deriving properties regarding this estimator. It is common to use 3SLS even if equations contain lagged jointly dependent variables. In his derivation of the 3SLS estimator, Goldberger [1964], p. 347, states without proof that: "For convenience we assume that all predetermined variables are exogenous variables distributed independently of the disturbances; the results however, carry over to the general case." (The assumptions regarding predetermined variables in section I.C.3 of this paper follow the assumptions of Goldberger's "general case". In particular, lagged jointly dependent variables are permitted as predetermined variables in Goldberger's general case.) The discussion in Fisher [1965] regarding the use of lagged jointly dependent variables as instruments would appear applicable for instruments used in the X_I matrix in addition to the predetermined variables in the system or subsystem being estimated.
²The assumption made further on in the derivation [following (VIII.4)] that the matrices X_I′Z_μ have full column rank implies that rk X_I ≥ n_μ for μ = 1,...,M. When we consider general linear restrictions on coefficients, the matrices X_I′Z_μ need not have full column rank (provided the coefficient space is sufficiently restricted that unique 3SLS coefficients exist); hence, the requirement rk X_I ≥ n_μ may be relaxed somewhat in that section.

As in our derivation of 2SLS as a GLS method (section IV.D), let us premultiply each equation in the system or subsystem being estimated by the same matrix, the transpose of the X_I matrix defined above. The μth equation becomes:

(VIII.2)  X_I′y_μ = X_I′Z_μ δ_μ + X_I′u_μ
          K×1     (K×n_μ)(n_μ×1)   K×1

The entire system can be written as:

(VIII.3)  X_I′y₁ = X_I′Z₁δ₁ + ··· + 0·δ_M + X_I′u₁
              ⋮
          X_I′y_M = 0·δ₁ + ··· + X_I′Z_Mδ_M + X_I′u_M

or, if we define the following matrices and vectors:

  ỹ = [X_I′y₁ ; ··· ; X_I′y_M]  (MK×1),   X̃ = diag(X_I′Z₁, ..., X_I′Z_M)  (MK×n),
  δ = [δ₁ ; ··· ; δ_M]  (n×1),   ũ = [X_I′u₁ ; ··· ; X_I′u_M]  (MK×1),

where n = Σ_{μ=1}^{M} n_μ, we can write the entire system as:

(VIII.4)  ỹ = X̃ δ + ũ .
          MK×1  (MK×n)(n×1)  MK×1

Initially we will assume that the matrices X_I′Z_μ have full column rank, which implies that the matrix X̃ has full column rank; however, this assumption will be relaxed in section VIII.D when formulas for imposing general linear restrictions on coefficients are presented.

Let ũ_μ = X_I′u_μ, μ = 1,...,M. Then, since X_I is fixed, 𝔼ũ_μ = 𝔼X_I′u_μ = X_I′𝔼u_μ = X_I′·0 = 0; hence, the entire vector 𝔼ũ = 0.

Let U = [u₁ ··· u_M] be the T×M matrix of disturbances, with column μ the disturbances for equation μ and row t the disturbances for observation t, designated U_[t]. Then from the assumptions of section I.C.3 that 𝔼U_[t] = 0, 𝔼U_[t]′U_[t] = Σ = [σ_μμ′] (M×M) for all t, and 𝔼U_[t]′U_[t′] = 0 for t′ ≠ t, plus the assumption that X_I is fixed, we obtain:

(VIII.5)  𝔼ũ_μũ_μ′′ = 𝔼X_I′u_μu_μ′′X_I = X_I′[𝔼u_μu_μ′′]X_I = σ_μμ′ X_I′X_I
                                                              (K×K)

for μ, μ′ = 1,...,M. The latter set of relations is expressed in terms of the entire covariance matrix of ũ by means of the Kronecker product as:

(VIII.6)  𝔼ũũ′ = Σ ⊗ [X_I′X_I] .
          MK×MK  (M×M) (K×K)

We will designate 𝔼ũũ′ as Σ* (MK×MK).

The matrix X̃ does not consist of fixed variables only, since some of the variables contain submatrices of the form X_I′Y_μ, with Y_μ jointly dependent, so that (even if the Σ matrix were known), if we used X̃ as the GLS X in deriving the 3SLS estimator, the resulting coefficients would not have all of the GLS properties; however, the GLS derivation is used primarily as a means of suggesting the 3SLS estimator as an estimator with potentially desirable properties. Properties of the 3SLS estimator are derived and proved in Zellner and Theil [1962] after the computational formulas are established.

If we treat ỹ as the GLS y, X̃ as the GLS X, and ũ as the GLS u, GLS formula (IV.2) becomes:¹

(VIII.7)  δ̂_GLS = [X̃′Σ*⁻¹X̃]⁻¹ X̃′Σ*⁻¹ỹ ;

however, Σ*⁻¹ = [Σ ⊗ (X_I′X_I)]⁻¹ = Σ⁻¹ ⊗ [X_I′X_I]⁻¹, i.e.,

  Σ*⁻¹ = [ σ^{11}[X_I′X_I]⁻¹  ···  σ^{1M}[X_I′X_I]⁻¹
               ⋮                       ⋮
           σ^{M1}[X_I′X_I]⁻¹  ···  σ^{MM}[X_I′X_I]⁻¹ ] .

¹See footnote 1 of page 265 for a definition of the Kronecker product and a proof that for A and B square, symmetric, and nonsingular, [A⊗B] is also nonsingular and [A⊗B]⁻¹ = A⁻¹⊗B⁻¹.

Thus, (VIII.7) may be rewritten as:

(VIII.8)  δ̂_GLS = [ σ^{11}[Z′₁Z₁]_∥X_I   ···  σ^{1M}[Z′₁Z_M]_∥X_I
                        ⋮                        ⋮
                    σ^{M1}[Z′_M Z₁]_∥X_I ···  σ^{MM}[Z′_M Z_M]_∥X_I ]⁻¹ [ Σ_{μ=1}^{M} σ^{1μ}[Z′₁y_μ]_∥X_I
                                                                             ⋮
                                                                         Σ_{μ=1}^{M} σ^{Mμ}[Z′_M y_μ]_∥X_I ] .

That Z′_μ X_I(X_I′X_I)⁻¹X_I′Z_μ′ = [Z′_μ Z_μ′]_∥X_I (the cross product of the part of Z_μ in the space spanned by X_I with the part of Z_μ′ in the space spanned by X_I),
Since Σ is unknown, we will substitute the following estimate of Σ into the formula:¹

(VIII.9)    Σ̂ = S = [s_{μμ'}] ,    s_{μμ'} = (1/T) û_μ'û_{μ'} = (1/T) ₊δ̂_μ' [₊Z_μ'₊Z_{μ'}] ₊δ̂_{μ'} ,

with ₊δ̂_μ being the vector of 2SLS coefficients for equation μ (including the normalizing coefficient, -1) and ₊Z_μ = [y_μ ⋮ Z_μ].

Substituting S into (VIII.8) for Σ, we get the 3SLS estimator:

(VIII.10)   δ̂_3SLS = (E^(1))⁻¹ d^(1) ,

where, with S⁻¹ = [s^{μμ'}],

(VIII.11)   E^(1) = [s^{μν} [Z_μ'Z_ν]_|X_I]    and    d^(1) = [Σ_{ν=1}^{M} s^{μν} [Z_μ'y_ν]_|X_I] .

Let Y_A be the matrix of jointly dependent variables in the M equations being estimated by 3SLS, let X_A be the matrix of predetermined variables in the M equations being estimated by 3SLS, and let Z_A = [Y_A ⋮ X_A]. Then E^(1) and d^(1) can be computed efficiently by forming the matrix

(VIII.12)   [Z_A'Z_A]_|X_I = [ [Y_A'Y_A]_|X_I    Y_A'X_A ]
                             [    X_A'Y_A        X_A'X_A ]

(this matrix may be formed in triangular form since it is symmetric) and then extracting the submatrices [Z_μ'Z_ν]_|X_I used in E^(1) and the subvectors [Z_μ'y_ν]_|X_I used in d^(1) from [Z_A'Z_A]_|X_I. That [Z_A'X_A]_|X_I = Z_A'X_A and [X_A'X_A]_|X_I = X_A'X_A follows from our definition of X_I as containing the matrix X_A [see (I.40) and (I.41)].

¹The S matrix will be singular if the number of equations exceeds the number of observations. Let Û = [û₁ ⋯ û_M] (T×M), where the û_μ are the 2SLS residual vectors. Then S = (1/T)Û'Û; hence, rk S = rk Û. If T < M, then rk Û, and therefore rk S, will be less than M; i.e., S will be singular. This is not in conflict with our derivation in section VIII.B of 2SLS as the special case of 3SLS in which there is zero correlation between residuals across equations, since this derivation holds only for nonsingular S. Even though Σ is assumed nonsingular, a particular estimate of Σ may still be singular, and (as shown above) if M ≥ T, Σ̂ = S is singular.
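In present-day matrix notation, the computation (VIII.9)-(VIII.11) can be sketched as follows. The fragment below is an illustrative Python/NumPy reconstruction, not the AES STAT routines (which were written for the CDC 3600); the names `two_sls` and `three_sls` are invented for the sketch, and the explicit projection X_I(X_I'X_I)⁺X_I' stands in for the [·]_|X_I adjustment.

```python
import numpy as np

def two_sls(y, Z, X):
    """2SLS for one equation: regress y on Z using the instruments X.
    P is the projection on the space spanned by X; the pseudo-inverse
    keeps it well defined even if X lacks full column rank."""
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    return np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)

def three_sls(ys, Zs, X):
    """3SLS per (VIII.9)-(VIII.11): 2SLS residuals give S, then the
    stacked system E delta = d is solved.  ys and Zs are lists with
    one entry (y_mu, Z_mu) per equation; X holds the instruments."""
    T, M = ys[0].shape[0], len(ys)
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    U = np.column_stack([y - Z @ two_sls(y, Z, X) for y, Z in zip(ys, Zs)])
    S_inv = np.linalg.inv(U.T @ U / T)        # elements s^{mu nu}
    # E^(1) blocks: s^{mu nu} [Z_mu' Z_nu]_|X_I ; d^(1) analogous
    E = np.block([[S_inv[i, j] * Zs[i].T @ P @ Zs[j] for j in range(M)]
                  for i in range(M)])
    d = np.concatenate([sum(S_inv[i, j] * Zs[i].T @ P @ ys[j]
                            for j in range(M)) for i in range(M)])
    return np.linalg.solve(E, d)
```

With a single equation the stacked solve reduces to the 2SLS coefficients, since the scalar s¹¹ cancels out of the solve.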
[Y_A'Y_A]_|X_I is computed as [Y_A'Y_A] − [Y_A'Y_A]_⊥X_I, with [Y_A'Y_A]_⊥X_I computed by direct orthogonalization in the manner given in section I.D.2. Note that [Z_A'Z_A]_|X_I is unique and easily computed even for an X_I having less than full column rank.

If X̄ had consisted only of fixed variables and Σ were known, so that Σ rather than its estimate S were used in the calculation of E^(1) and d^(1), then by GLS formula (IV.3), Var(δ̂) = (E^(1))⁻¹. As shown in Zellner and Theil [1962], even though X̄ contains non-fixed variables and S is used instead of Σ,

(VIII.13)   asymptotic Var(δ̂_3SLS) = (E^(1))⁻¹ .

A "degrees of freedom" adjustment can be made in the estimated 3SLS coefficient variance-covariance matrix [i.e., asymptotic Var(δ̂_3SLS)] in the same manner as for FIML (section V.E). The estimated 3SLS coefficient variance-covariance matrix can also be normalized in the manner suggested for FIML.

If an estimate of the disturbance variance-covariance matrix Σ is desired, it seems desirable to utilize the estimated 3SLS coefficients to calculate a new Σ̂ instead of merely using the S matrix calculated from the 2SLS coefficients. Utilizing the 3SLS coefficients, the estimated disturbance variance-covariance matrix becomes:

(VIII.14)   Σ̂_3SLS = [s_{μμ'}^3SLS] ,    s_{μμ'}^3SLS = (1/T) ₊δ̂_μ^3SLS' [₊Z_μ'₊Z_{μ'}] ₊δ̂_{μ'}^3SLS ,

with ₊δ̂_μ^3SLS being the vector of 3SLS coefficients for equation μ (including the normalizing coefficient, -1) and ₊Z_μ = [y_μ ⋮ Z_μ]. A "degrees of freedom" adjustment may be made in the Σ̂_3SLS matrix, and the Σ̂_3SLS matrix may be normalized, in the same manner as for FIML (section V.D).

B. An Alternate Computational Procedure

3SLS estimates will be the same as 2SLS estimates if the S matrix is a diagonal matrix, i.e., if there is zero correlation between the 2SLS residuals of each pair of equations.¹ This is easily verified by writing out the 3SLS estimating equations.

¹Zellner and Theil [1962], p. 58, note this special case.
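The one-pass formation of (VIII.12) described earlier in this section can be sketched as follows. This is an illustrative Python/NumPy fragment (the name `projected_cross_products` is invented); an SVD basis of the space spanned by X_I stands in for the direct orthogonalization of section I.D.2. Because X_A lies in that space, only the Y_A'Y_A block requires adjustment, and the result is well defined even when X_I has less than full column rank.

```python
import numpy as np

def projected_cross_products(Y_A, X_A, X_I, rtol=1e-10):
    """Form [Z_A'Z_A]_|X_I of (VIII.12).  The X_A blocks are raw cross
    products because X_A lies in the space spanned by X_I; only the
    Y_A'Y_A block is adjusted by the projection on that space."""
    # orthonormal basis of span(X_I), of whatever rank X_I happens to have
    U, s, _ = np.linalg.svd(X_I, full_matrices=False)
    Q = U[:, s > rtol * s[0]]
    YP = Q.T @ Y_A                      # coordinates of the projected Y_A
    top = np.hstack([YP.T @ YP, Y_A.T @ X_A])
    bottom = np.hstack([X_A.T @ Y_A, X_A.T @ X_A])
    return np.vstack([top, bottom])
```

Since the basis Q depends only on the space spanned by X_I, adding linearly dependent columns to X_I leaves the result unchanged, matching the uniqueness claim above.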
In this case, S⁻¹ = diag(1/s₁₁, …, 1/s_MM), so that E^(1) is block diagonal and the 3SLS formula separates by equations; the μth block gives

    δ̂_μ^3SLS = [(1/s_μμ)[Z_μ'Z_μ]_|X_I]⁻¹ (1/s_μμ)[Z_μ'y_μ]_|X_I
             = ([Z_μ'Z_μ]_|X_I)⁻¹ [Z_μ'y_μ]_|X_I = δ̂_μ^2SLS .

In the above verification the diagonal elements of S cancelled out; hence, if any diagonal matrix had been used in place of S, the same result would have been obtained. In particular, an identity matrix could be used for S. This gives a basis for an alternate method for calculating 3SLS coefficients which is essentially the same as the alternate method for calculating ZA coefficients:

(1) Start the computation by using an identity matrix in place of the S matrix. (No initial coefficients are required, since they are used only in the calculation of the S matrix.) Calculate E^(0) and d^(0) using the same formulas as for E^(1) and d^(1) except that an identity matrix is used for the initial S matrix. Then:

        δ̂_2SLS = [δ̂_1^2SLS ; ⋯ ; δ̂_M^2SLS] = (E^(0))⁻¹ d^(0) .

(2) Calculate S from δ̂_2SLS in the way given earlier in this chapter and use S to calculate a new E^(1) matrix and d^(1) vector. Then:

        δ̂_3SLS = (E^(1))⁻¹ d^(1)

as before.

Whereas the alternate computational method given above takes slightly more computer time at the 0th iteration (for a large scale computer, the additional time required is hardly measurable), it is simpler to program--especially if provision is made to iterate on the 3SLS estimates in the manner indicated farther on. Another advantage of starting in this manner will be noted in the discussion of the calculation of restricted 3SLS coefficients.
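The two-step procedure above can be sketched as follows (an illustrative Python/NumPy fragment with invented names, not the AES STAT code): step (1) runs the stacked solve with S = I and so reproduces the 2SLS coefficients, and step (2) reuses exactly the same solve with the estimated S.

```python
import numpy as np

def stacked_solve(ys, Zs, X, S_inv):
    """One pass of the stacked estimator for a given S^{-1}: build the
    E matrix and d vector of (VIII.11) and solve E delta = d."""
    M = len(ys)
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    E = np.block([[S_inv[i, j] * Zs[i].T @ P @ Zs[j] for j in range(M)]
                  for i in range(M)])
    d = np.concatenate([sum(S_inv[i, j] * Zs[i].T @ P @ ys[j]
                            for j in range(M)) for i in range(M)])
    return np.linalg.solve(E, d)

def three_sls_alt(ys, Zs, X):
    """Alternate method: the 0th pass with S = I yields the stacked
    2SLS coefficients; their residuals give S for the 3SLS pass."""
    T, M = ys[0].shape[0], len(ys)
    delta0 = stacked_solve(ys, Zs, X, np.eye(M))       # = 2SLS, stacked
    splits = np.cumsum([Z.shape[1] for Z in Zs])[:-1]
    parts = np.split(delta0, splits)
    U = np.column_stack([y - Z @ b for y, Z, b in zip(ys, Zs, parts)])
    return stacked_solve(ys, Zs, X, np.linalg.inv(U.T @ U / T))
```

As the verification above shows, any diagonal matrix could be passed as `S_inv` in the 0th pass without changing `delta0`.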
A disadvantage of starting with an identity matrix in place of the S matrix is that it imposes 2SLS estimates as the starting estimates of the 3SLS procedure, whereas it may be desired that other estimates be used as starting estimates for some or all of the equations; however, there seems little tendency to use estimates other than 2SLS estimates as starting estimates for 3SLS.¹

¹LIML estimates or other similar estimates meeting certain consistency requirements (DLS estimates do not meet these requirements) could be substituted for the 2SLS estimates in the 3SLS procedure without changing the proof of the derivation of the asymptotic moment matrix of 3SLS in Zellner and Theil's article [1962]; however, it is assumed that 3SLS estimates are based on 2SLS estimates unless stated otherwise. Use of estimates other than 2SLS estimates would, of course, change the resulting 3SLS estimates obtained.

C. 3SLS Estimation when rk X_I = T

In section II.G it was noted that if rk X_I = T, the estimated coefficients for all double k-class estimators become the same as the DLS coefficients. In similar fashion for 3SLS estimation, if rk X_I = T, δ̂_3SLS = δ̂_ZA; that is, the 3SLS coefficients obtained will be the same coefficients as if the explanatory jointly dependent variables of each equation were misclassified as predetermined and ZA applied. The 3SLS coefficients obtained will not, of course, have the same properties as ZA coefficients.¹

That δ̂_3SLS = δ̂_ZA is easily seen. Let Z_A be the matrix of variables in the subsystem being estimated; then rk X_I = T implies that [Z_A]_|X_I = Z_A, since all variables are in the space spanned by X_I. E^(1) becomes [s^{μν} Z_μ'Z_ν] and d^(1) becomes [Σ_{ν=1}^{M} s^{μν} Z_μ'y_ν]; hence, δ̂_3SLS = (E^(1))⁻¹ d^(1) = δ̂_ZA, as can be verified by comparison with ZA formula (VII.12).
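The collapse is immediate once one notes that when rk X_I = T the projection on the space spanned by X_I is the identity, so every adjusted moment equals the raw moment. A minimal numerical check (illustrative Python/NumPy, not part of the original computations):

```python
import numpy as np

# With T observations and rk X_I = T, the columns of X_I span the whole
# T-dimensional space, so the projection used in [.]_|X_I is the identity
# matrix and every adjusted cross product equals the raw cross product.
np.random.seed(0)
T = 6
X_I = np.random.randn(T, T + 2)        # T rows, more than T columns
P = X_I @ np.linalg.pinv(X_I.T @ X_I) @ X_I.T
assert np.allclose(P, np.eye(T))
```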
¹It may be recalled that SML estimation is so affected by the rank of the X_I matrix that if rk X_I ≥ T − M + 1, the SML computations cannot be performed (at least the SML formulas given in Chapter VI cannot be used without some modification). On the other hand, FIML estimation is not affected by the rank of the X_I matrix, since jointly dependent variables are not adjusted by the X_I matrix. (Instead, the matrix of coefficients of the jointly dependent variables is used directly in the estimation procedure.)

The fact that δ̂_3SLS coincides with δ̂_ZA computed with jointly dependent variables misclassified does not destroy the consistency of δ̂_3SLS.¹ All that is indicated is that there are insufficient observations to distinguish δ̂_3SLS estimates from δ̂_ZA estimates (unless the space of X_I is restricted in some fashion).

The discussion of whether the space spanned by X_I should be restricted (and methods for restricting the space of X_I) in the case of the double k-class methods (section II.G) is applicable to 3SLS as well, except that if the subsystem being estimated contains very many predetermined variables, the rank of the predetermined variables in the subsystem being estimated may already equal T.

Let the matrix of predetermined variables in the subsystem being estimated be denoted X_A. It seems undesirable to restrict the subspace of X_I in a manner such that X_A is not in the space spanned by X_I. If the space of X_I is restricted such that X_A is not in the space spanned by X_I, the 3SLS formulas given previously are not valid, since [see (VIII.12)] [X_A'Y_A]_|X_I ≠ X_A'Y_A and [X_A'X_A]_|X_I ≠ X_A'X_A. To take account of these non-equalities, any predetermined variable outside the space spanned by X_I must be adjusted in the same manner as the jointly dependent variables in the computational formulas.

¹Consistency is an asymptotic property, and a small number of observations in a given sample certainly does not affect an asymptotic property.
¹If the number of instruments used in the estimation is fixed (e.g., the number of predetermined variables in the system is fixed for a given model, so that if X is used as X_I, X_I will be fixed), then (if the δ̂_3SLS formula is followed, that is, a switch is not made to the δ̂_ZA formula) as T increases, at some point there will be sufficient observations that rk X_I < T, and δ̂_3SLS will not coincide with δ̂_ZA.

This adjustment, however, is the same as misclassifying these variables as jointly dependent in the original model; i.e., it has the same computational effect as a change in the model in response to the small number of observations.

D. Arbitrary Linear Restrictions Imposed on Coefficients

As noted earlier, the 3SLS formulas may be derived as an application of the GLS method. In like manner, we may derive restricted 3SLS (R3SLS) estimates (in which arbitrary linear restrictions are imposed on the coefficients) as an application of the RGLS method. If the set of N_R restrictions given by:

(VIII.15)   R δ = r        (R: N_R×n, δ: n×1, r: N_R×1)

is imposed on the 3SLS model (ȳ = X̄δ + ū with assumptions as stated previously), the R3SLS formulas are given by:¹

(VIII.16)   δ̂_R3SLS = Q {[Q'E^(1)Q]⁻¹ Q'[d^(1) − E^(1)q]} + q ,
            (Q: n×(n − rk R);  [Q'E^(1)Q]: (n − rk R)×(n − rk R);  q: n×1)

(VIII.17)   asymptotic Var(δ̂_R3SLS) = Q [Q'E^(1)Q]⁻¹ Q' .

(VIII.16) and (VIII.17) are derived by substituting the 3SLS matrix E^(1) and the 3SLS vector d^(1) into RGLS formulas (IV.5) and (IV.8). Q and q are calculated from R and r by the computational procedure given in section IV.B.1.

¹Calculation of the Q matrix and q vector also gives a means of separating out rk R coefficients, δ_(2), which may be calculated from the remaining n − rk R "unrestricted" coefficients, δ_(1).
Thus, the following pair of formulas are together equivalent to (VIII.16):

    δ̂_(1)^R3SLS = [Q'E^(1)Q]⁻¹ Q'[d^(1) − E^(1)q]        ((n − rk R)×1)

    δ̂_(2)^R3SLS = Q₂ δ̂_(1)^R3SLS + q₂                    (rk R×1, with Q₂: rk R×(n − rk R))

where Q₂ and q₂ are the subparts of Q and q noted in Chapter IV.

The R matrix need not have full row rank, and the E^(1) matrix may be singular (the Z_μ matrices may have less than full column rank). Although the R matrix and r vector can contain restrictions imposed on the coefficients of only a single equation and/or restrictions which cut across equations, it should be noted that the same answer will not in general be obtained if restrictions which affect the coefficients of a single equation are solved into that equation as is obtained if these restrictions are listed in the R matrix and r vector and R3SLS is applied. The reasons that the resulting coefficients may be different are outlined for R2SLS in section IV.D, but are equally applicable to R3SLS. Restrictions which cut across equations cannot, of course, be solved into a single equation; hence, they must be imposed through use of a procedure such as the R3SLS computational procedure.

In calculating the E^(1) matrix and d^(1) vector, it seems desirable that the S matrix corresponding to the restricted 2SLS estimates rather than the unrestricted 2SLS estimates be used.

¹If R is assumed to have full row rank and the E^(1) matrix is assumed to have full column rank, then the formula

    δ̂_R3SLS = δ̂_3SLS − (E^(1))⁻¹ R' [R (E^(1))⁻¹ R']⁻¹ [R δ̂_3SLS − r]

may be used. (This formula is implied by Zellner and Theil [1962], p. 78, in their remark that restrictions may be applied to 3SLS estimates.) The advantages of the Q, q method over the R, r method for GLS given in section IV.B.2 are applicable to 3SLS as well.
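The Q, q computation of (VIII.16) can be sketched as follows. This illustrative Python/NumPy fragment (the name `restricted_solve` is invented) uses an SVD null-space basis in place of the procedure of section IV.B.1; any Q with RQ = 0 and full column rank serves. It also reproduces two properties noted above: the number of independent restrictions falls out of the computation, and inconsistent restrictions are detected.

```python
import numpy as np

def restricted_solve(E, d, R, r, tol=1e-10):
    """Solve E delta = d subject to R delta = r by reparameterizing
    delta = Q theta + q with R Q = 0 and R q = r, so the restrictions
    hold identically.  R may contain redundant rows, and E need only
    be nonsingular on the null space of R."""
    U, s, Vt = np.linalg.svd(R)               # full_matrices=True: Vt is n x n
    rk = int(np.sum(s > tol * max(s[0], 1.0)))  # number of independent restrictions
    Q = Vt[rk:].T                             # orthonormal basis of null(R)
    q = np.linalg.pinv(R) @ r                 # least-squares solution of R q = r
    if not np.allclose(R @ q, r, atol=1e-8):  # inconsistent restrictions leave R q != r
        raise ValueError("inconsistent restrictions")
    theta = np.linalg.solve(Q.T @ E @ Q, Q.T @ (d - E @ q))
    return Q @ theta + q, rk
```

Redundant restrictions simply lower the count `rk` without affecting the solution, which is the behavior the R3SLS procedure requires.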
Imposing Restrictions Which Cut Across Equations on the Coefficients Used to Calculate the S Matrix

It seems clearly desirable that restrictions which do not cut across equations be imposed on the 2SLS coefficients used to calculate the S matrix. It would also seem desirable to impose restrictions which do cut across equations on the 2SLS coefficients as well. (They will then not be 2SLS coefficients for separate equations, but coefficients closely related to 2SLS.) This may be done in the same way as was indicated in the ZA procedure, i.e., by using the alternate 3SLS computational method noted earlier (section VIII.B):

(1) Start the computations by using an identity matrix in place of the S matrix. Calculate E^(0) and d^(0) using the same formulas as for E^(1) and d^(1) except that the identity matrix is used in place of the S matrix. Then:

(VIII.18)   δ̂_R2SLSME = Q [Q'E^(0)Q]⁻¹ Q'[d^(0) − E^(0)q] + q .

(2) Calculate S from δ̂_R2SLSME and use it to calculate a new E^(1) matrix and d^(1) vector. Then calculate δ̂_R3SLS by formula (VIII.16) as before.¹

In addition to being simple to program, the use of the 0th iteration method has the advantage that unique 2SLS estimates (which take account of restrictions which do not cut across equations) need not exist, provided there is sufficient identification that unique R3SLS estimates exist.²

¹δ̂_R3SLS calculated in this manner will not be the same as δ̂_R3SLS calculated by ignoring the restrictions which cut across equations in the calculation of the S matrix.

²It may be recalled that R2SLS as defined in section IV.D can incorporate restrictions which do not cut across equations, whereas R2SLSME as in (VIII.18) can also incorporate restrictions which cut across equations. Unique estimates of the latter may exist even when unique estimates of the former do not. An example of this similar to the example for ZA (footnote 1 of page 277) could be constructed.

E. Iterative Three-Stage Least Squares (I3SLS)
1. Only zero and normalization restrictions imposed on coefficients

In their concluding remarks, Zellner and Theil state: "The three-stage least squares estimator implies in general a new estimator of [σ_{μμ'}] which differs from [s_{μμ'}]. One can then set up a new stage based on this estimator rather than [s_{μμ'}] and proceed iteratively. No report on this method can be made as yet, but we hope to come back to it in the future."¹

Such a procedure has been referred to as multi-stage least squares; however, this name does not seem very appropriate, since going on to subsequent iterations is not the same as going on to subsequent stages, as from the DLS 1st stage, to the 2SLS 2nd stage, to the 3SLS 3rd stage. Instead we will refer to the above procedure as iterative three-stage least squares (I3SLS).

Madansky examines the question and concludes that iteration is not worthwhile "in the sense that there is no improvement in the asymptotic variance of the estimator. On the other hand, the effect of such iteration on the finite sample variance (or even on the finite-sample bias) of the estimator is still an open question."²

ZA may be regarded as the particular case of 3SLS in which there is only one jointly dependent variable per equation. As we saw earlier, IZA estimates coincide with FIML estimates. The question then arises: "Will iteration on 3SLS estimates lead to SML estimates if applied to a partial system, or to FIML estimates if applied to a complete system (assuming that the iterations on the 3SLS estimates converge, so that we may refer to them as I3SLS estimates)?" The answers to both questions are, in general, no, provided that rk X_I > n_μ for at least one equation in the system.¹ Thus, occurrence of the "explanatory" jointly dependent variables in the case of 3SLS and I3SLS has a considerable effect on the properties of 3SLS and I3SLS relative to ZA and IZA.

¹Zellner and Theil [1962], p. 78.

²Madansky [1964], p. 55.
First of all, let us show that I3SLS and SML estimates do not in general coincide. To do so, assume that the partial system to be estimated consists of only a single equation with rk X_I > n_μ for that equation. If a third stage is applied to the single equation, the two-stage estimates are again obtained, for the 3-stage formula becomes:

    δ̂_3SLS = [s¹¹ [Z₁'Z₁]_|X_I]⁻¹ [s¹¹ [Z₁'y₁]_|X_I] = ([Z₁'Z₁]_|X_I)⁻¹ [Z₁'y₁]_|X_I = δ̂_2SLS .

Continued iteration will give only the 2SLS estimates at each iteration; that is, the system clearly converges to the 2SLS estimates.

Hood and Koopmans show that the SML estimates for a system with a single equation are the LIML estimates.² Thus, in this case the I3SLS and SML estimates will coincide only if the 2SLS and LIML estimates coincide, which will not in general occur for an equation for which rk X_I > n_μ.

The question might still arise--is there some peculiarity about a complete system which would make I3SLS estimates the same as FIML estimates? We can easily answer this in the negative by extending the above special case.

Zellner and Theil show that in the estimation of the coefficients of a system of equations by 3SLS, any equation for which rk X_I = n_μ can be omitted in the calculation of the 3SLS coefficients of the remaining equations.¹ Since their proof makes no use of where the original estimates are derived, it applies to any iteration; hence, the equations for which rk X_I = n_μ may be omitted from the system in the calculation of the I3SLS coefficients of the equations for which rk X_I > n_μ; i.e., the iterations may be performed on the equations for which rk X_I > n_μ only.

¹If rk X_I > n_μ for an equation, the equation is usually termed "over-identified," and if rk X_I = n_μ for an equation, the equation is usually termed "just-identified" (see section II.B). If rk X_I = n_μ for all equations in the system, all of the following estimators (plus other estimators) give the same estimated coefficients: FIML, SML, 3SLS, I3SLS, LIML, and 2SLS.

²Koopmans and Hood [1953], pp. 170-171.
Now, consider a complete system in which rk X_I > n_μ for only one equation in the system and rk X_I = n_μ for the remainder of the equations. Then the I3SLS estimates of that one equation will be the 2SLS estimates, which will not, in general, be the same as the FIML estimates of that equation. Thus, I3SLS estimates will not, in general, be the same as FIML estimates.

¹Zellner and Theil [1962], pp. 63-68. Actually their proof is not complete, as they only show that the E^(1) matrix can be subdivided. They do not show the effect of multiplying the subdivided E^(1) matrix by a similarly subdivided d^(1) vector; however, some additional manipulation shows that their conclusion is correct.

The I3SLS Procedure

To see how the computation of the I3SLS estimates differs at each iteration from the computation of SML and FIML estimates, let us first note the change in the coefficient vector from one iteration to the next in the I3SLS procedure.¹ At the ith iteration of the I3SLS procedure:

    δ̂^(i) = (E^(i−1))⁻¹ d^(i−1) ,

where E^(i−1) and d^(i−1) are calculated in the same manner [(VIII.10) and (VIII.11)] as E^(1) and d^(1) except that δ̂^(i−1) is used instead of δ̂^(1) in calculating the S matrix. The increment added to δ̂^(i−1) to form δ̂^(i) is:

    δ̂^(i) − δ̂^(i−1) = (E^(i−1))⁻¹ m^(i−1) ,    where    m^(i−1) = d^(i−1) − E^(i−1) δ̂^(i−1) .

The condition m = 0 will imply that the maximum has been reached in the region. In checking possibilities for a function of interest being maximized or minimized, m need not be the vector of partial derivatives of that function itself; it is only necessary that the function f**, of which m is the vector of partial derivatives, be a strictly increasing or decreasing function of the function of interest.

¹Many assume that the relationship between FIML and 3SLS has been at least largely explicated in Chow [1964], pp. 548-550; however, Chow's derivation of 3SLS as a minimization of a particular determinant is incorrect from a couple of standpoints. (1) Chow computes his estimate of σ_{μμ'} as:

    (1/T) ([y_μ]_|X_I − [Y_μ]_|X_I γ_μ − X_μ β_μ)' ([y_{μ'}]_|X_I − [Y_{μ'}]_|X_I γ_{μ'} − X_{μ'} β_{μ'}) .
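Written in the increment form above, the iteration can be sketched as follows (an illustrative Python/NumPy fragment with invented names, not the AES STAT code); the size of m^(i−1) serves as the convergence criterion.

```python
import numpy as np

def i3sls(ys, Zs, X, tol=1e-10, max_iter=200):
    """Iterative 3SLS: re-estimate S from the current residuals and
    re-solve until the increment vanishes.  The update is written as
    delta_new = delta + E^{-1} m with m = d - E delta, so m -> 0 is
    the convergence criterion."""
    T, M = ys[0].shape[0], len(ys)
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    splits = np.cumsum([Z.shape[1] for Z in Zs])[:-1]
    delta = np.concatenate([np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
                            for y, Z in zip(ys, Zs)])   # 2SLS start
    for _ in range(max_iter):
        parts = np.split(delta, splits)
        U = np.column_stack([y - Z @ b for y, Z, b in zip(ys, Zs, parts)])
        S_inv = np.linalg.inv(U.T @ U / T)
        E = np.block([[S_inv[i, j] * Zs[i].T @ P @ Zs[j] for j in range(M)]
                      for i in range(M)])
        d = np.concatenate([sum(S_inv[i, j] * Zs[i].T @ P @ ys[j]
                                for j in range(M)) for i in range(M)])
        m_vec = d - E @ delta
        delta = delta + np.linalg.solve(E, m_vec)
        if np.max(np.abs(m_vec)) < tol:
            break
    return delta
```

For a single over-identified equation the increment is zero at the 2SLS start, so the iterations stay at 2SLS, which is the behavior established above.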
The SML and FIML Formulas

The vector for SML which is comparable with the vector m^(i−1) of I3SLS is the vector given in (VI.36), written out in blocks, one block per equation, as (VIII.24). The bottom part of each block of (VIII.24) corresponds to the predetermined variables of the equation and is the same as for I3SLS. It is only the top part of each block (which corresponds to the jointly dependent variables of the equation) that differs: there the adjusted cross products [Y_μ'û_ν]_|X_I are combined through the elements of 𝒯⁻¹ rather than S⁻¹. Further, if S⁻¹ were substituted for 𝒯⁻¹ in the SML formula, even the top parts of corresponding blocks would be the same. (𝒯 = (1/T) ₊Δ_A' [Z_A'Z_A]_⊥X_I ₊Δ_A, where [Z_A'Z_A]_⊥X_I is the moment matrix of the part of Z_A orthogonal to X_I, while S = (1/T) ₊Δ_A' [Z_A'Z_A] ₊Δ_A.)

In the special case that the number of jointly dependent variables in the system equals the number of equations, the SML estimates become FIML estimates, and the vector corresponding to m becomes (from V.37) the vector whose μth block is

    [ T γ̂^{μ|μ} + Σ_{ν=1}^{M} s^{μν} Y_μ'û_ν ]
    [       Σ_{ν=1}^{M} s^{μν} X_μ'û_ν       ]

where γ̂^{μ|μ} is formed from Γ̂⁻¹ as indicated following (V.37). The number of observations, T, in the FIML expression should not be confused with the 𝒯 matrix in the SML expression.

Notice that the blocks of the right hand side corresponding to predetermined variables coincide for I3SLS and FIML, but the blocks of the right hand side corresponding to jointly dependent variables differ for the two methods.
For I3SLS, the jointly dependent variables are adjusted to variables which are asymptotically uncorrelated with the disturbances, whereas for FIML, the jointly dependent variables are left unadjusted and the special nature of the jointly dependent variables is taken into account through the use of elements of the inverse of the matrix of estimated coefficients of the jointly dependent variables. [This adjustment is derived as the appropriate adjustment through recognition of the Jacobian of the transformation of the likelihood from U to δ and Σ--see (V.12) and (V.13).]

Determining additional characteristics of the extreme point to which I3SLS converges would be very helpful in determining whether such iteration is worthwhile.¹ ² If it were determined that I3SLS estimation is worthwhile and we knew a definite function being maximized or minimized, the speed of convergence could be considerably increased. As a start toward increasing the speed of convergence, the step size could be made more optimal at each iteration. Possibly a more efficient metric could also be devised.

¹Convergence was obtained for some problems and appeared to be slowly occurring in all other I3SLS problems computed on the AES STAT system.

²If we define I2SLS as the same procedure as ILIML (see section VI.E) except that 2SLS is used as the basic computational scheme rather than LIML, then I2SLS estimates apparently do not coincide with I3SLS estimates. Using a coefficient convergence criterion (see section V.C.5) of .000 000 000 1, 61 iterations were required to calculate I2SLS estimates for Klein's model I. The coefficients obtained were:

              C       P       W     CONSTANT    P₋₁
    Eq 1     -1     .2013   .7821   16.0510    .1282

              I       P     CONSTANT   P₋₁     K₋₁
    Eq 2     -1     .1050   22.9420   .6253   -.1680

              W₁      E     CONSTANT   E₋₁      t
    Eq 3     -1     .3578    2.5265   .2130    .1815

Klein's model I is given in section I.C.4, and the 2SLS, 3SLS, and I3SLS solutions to Klein's model I are given in the reproduced computer output of section IX.K.
2. Arbitrary linear restrictions imposed on coefficients

Restrictions may be imposed at each iteration of the computation of the I3SLS coefficients in the same manner as for 3SLS.

PART III

ADDITIONAL PROGRAMMING CONSIDERATIONS

CHAPTER IX

ADDITIONAL PROGRAMMING CONSIDERATIONS

The previous 8 chapters contain basic formulas for the computation of simultaneous stochastic equations methods. The way that these formulas are actually programmed on the computer has a very large effect on the actual coefficients obtained, due to: (1) rounding error in performing the computations, and (2) misspecification of the particular equation system to the computer. (I.e., the computer solves the equation system which is specified to it. This is very commonly not the equation system which the user thinks that he is specifying to it.) In this chapter we will consider programming procedures which may considerably reduce rounding error as compared to the programming practices usually used. The user control cards used in the stochastic equations portion of the AES STAT package of computer programs¹ will be presented to illustrate a form of user control cards which results in considerably fewer errors and provides much more flexibility in the

¹Except for a few subroutines and some recent modifications, the simultaneous stochastic linear equations part of the AES STAT system was programmed by the writer as a Department of Agricultural Economics research project. Although the programming was spaced out over a few years, the actual programming required approximately one year of actual programming time from the writer. The remainder of the system (the DLS methods and the methods other than the stochastic simultaneous equations portion) was developed as part of the writer's Agricultural Experiment Station programming responsibilities.
Although the writer commenced programming the AES STAT system as the only programmer, his staff has been increased by the addition of approximately one-half of a programmer's time each year. Presently there are three full-time programmers besides the writer expanding the system (primarily the analysis of variance and covariance and the least squares portions), writing user descriptions, consulting with members of the university in the use of the routines, and developing other routines not a part of this system (i.e., with a different form of parameters). Programmers who have worked on the AES STAT system at one time or another include Mr. Donald P. Kiel, Miss Mary E. Rafter, Mrs. Barbara Bray, Mrs. Marylyn A. Donaldson, Mr. Richard J. Martz, Mrs. Sara J. Paulson, Mr. Peter M. Schwinn, Mr. Frederick J. Ball, Mr. Tim Walters, and Mr. John Geweke.

range of problems which may be calculated than the user control cards used for most simultaneous stochastic equations computer programs.¹ To illustrate the control cards, the control cards required for calculating 2SLS, LIML, 3SLS, I3SLS, LML, and FIML for Klein's model I on the AES STAT system will be given along with the computer results obtained.
For example, if teletypes or typewriters are used to specify a problem to a computer operating in a time sharing mode, a system of conversational controls should be devised to replace the form of control statements of the AES STAT system. The computer should query the user regarding aspects of the problem and after each user response, check for inconsistencies. The computer should notify the user of an inconsistency as quickly as it is detected, allow the user to correct the particular inconsistency detected, and proceed with the calculation of the problem. 327 A. Rounding Error " . . . extreme care and precaution with accuracy of computation is not a fruitless and vein search for superfluous digits beyond the number significant in the original input. In spite of the fact that we have only two or three digits of significance in our original input variables and want only two or three-digit coefficients as an end re- sult, we may have to carry out intermediate calculations to a very large number of places. The intermediate stages of equations systems methods of estimation are quite intricate, and if we do not carry out all of our results to many places and use the most accurate arithmetic procedures, we may find that our giant machines are spinning out masses of meaningless figures."1 Longley E1957] reported the results of calculating the DLS co- efficients from data consisting of 16 observations with 6 independent variables and 8 dependent variables on the DLS routines which he felt were the most commonly used DLS routines in the 0.8. Longley's re- sults were startling. The most accurate routine had only 4 or 5 digits accuracy in the resulting coefficients, one routine had no digits correct for some coefficients, and one routine even had some signs of 1R1ein and Nakamura [1962], pp. 298-299. 328 coefficients incorrect. Freund [1963] reported the results of calculating a small DLS problem on a series of computer routines and also obtained widely vary- ing results. 
If a large amount of rounding error is commonly encountered on small DLS problems, think how meaningless the results from many of eVen the most simple of the simultaneous stochastic equations estimating procedures must be. It is the purpose of this section to suggest com- putational methods for reducing rounding error so that meaningful re- sults can be obtained. 1Longley calculated the "correct“ coefficients on a hand calculator-- carrying 15 digits at each step-~and reported 8 places after the decimal point (plus from four to no places before the decimal point) in the re- sulting coefficients. The writer ran Longley's problems on the AES STAT system without making any special provision to reduce rounding error and obtained exactly the same coefficients as was reported by Longley, except for a couple of the coefficients corresponding to an overall constant in which the AES STAT overall constants disagreed at the ninth significant digit. (Subsequent experimentation with the data on the AES STAT system indicated that the AES STAT coefficients were correct to at least 15 places. Planned modification of the method of forming sums of squares and cross-products will raise the accuracy of the AES STAT system even further.) 329 1. Single vs. double precision Most large scale computers currently being produced have both single precision floating point arithmetic and double precision float- ing point arithmetic built into the computer. The AES STAT system operates on the Control Data Corporation 3600 Computer which has a 48- bit word. 
Each floating point number is represented as follows in the CDC 3600 computer:

    Single precision--each word of storage contains one number:

        | sign bit | 11 exponent bits | 36 mantissa bits (about 10 decimal digits) |

    Double precision--two consecutive words of storage contain one number:

        | sign bit | 11 exponent bits | 84 mantissa bits (about 24 decimal digits) |

Each number is carried in the mantissa to a base of 2 rather than 10 and is normalized so that "leading zeros" are not carried in the number. The location of the point (to the base 2) is given by the exponent bits.

Twice as many computer memory words are required to carry a matrix in double precision form as to carry the matrix in single precision form. Also, for most computers, arithmetic operations take somewhat longer if they are performed in double precision than if they are performed in single precision. As a result of the requirement of more storage if calculations are carried in double precision, and to a lesser extent as a result of the additional time required to calculate in double precision, many simultaneous stochastic equations routines are programmed in single precision.¹
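On a present-day machine the same contrast appears between IEEE single precision (24 mantissa bits, about 7 decimal digits) and IEEE double precision (53 mantissa bits, about 16 decimal digits); the digit counts differ from the CDC 3600's roughly 10 and 24, but the behavior is the same. An illustrative Python/NumPy check:

```python
import numpy as np

# 1.1 has no terminating base-2 expansion, so its single- and
# double-precision roundings are different numbers.  Widening the
# single-precision value afterward cannot restore the lost digits:
# the error is frozen in at single-precision size.
x32 = np.float32(1.1)
x64 = np.float64(1.1)
widened = np.float64(x32)              # single converted to double
assert widened != x64                  # not the same number
assert abs(widened - 1.1) > 1e-9       # error of single-precision size
assert float(x64) == 1.1               # x64 is exactly the double 1.1
# small integers are exact in single precision, so they widen exactly
assert np.float64(np.float32(252.0)) == 252.0
```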
If there is insufficient capacity for an operation, then use of supplementary storage devices such as magnetic drums, disks, disk packs, or magnetic tapes in performing a series of operations is preferable to using single precision in order to provide sufficient storage.

Many programmers perform operations which they consider will not lead to a large amount of rounding error in single precision (e.g., a programmer may form the matrix of sums of squares and cross-products in single precision), convert the result to double precision, and then perform the more formidable computations which follow in double precision. Although little or no additional rounding error may result from such a practice in particular instances, these instances are far more rare than is generally realized.

¹The simultaneous stochastic equations portion of the AES STAT system is programmed entirely in double precision.

The reason why this is generally an undesirable practice is that only a small proportion of the possible numbers convert evenly to a power of 2 in the computer. Thus, even simple numbers like 1.1, 1.2, 1.3, 1.4, 1.6, etc. have a different representation in single precision than in double precision; that is, the double precision representation is not merely the single precision representation with a word of zeros attached. Thus, (in the case of the CDC 3600) the initial numbers will be accurate to 10 digits whether carried in single precision or double precision converted from single precision.
Since rounding error starts from the last significant digit, rounding error will start to build up from 10 digits out instead of 24 digits out even if subsequent operations are performed in double precision.¹

One place that this is especially harmful is in the formation of a sums of squares and cross-products matrix with subsequent inversion of part of the matrix and multiplication by another part of the matrix in order to perform simple calculations such as obtaining direct least squares coefficients. Many people seem to regard the formation of a sums of squares and cross-products matrix as an operation in which little rounding error occurs (since the basic calculation is so simple); hence, they perform this operation in single precision. Since inversion of a matrix is considered to be a complicated procedure involving a lot of rounding error, the single precision matrix of sums of squares and cross-products is then converted to double precision and the inverse calculated. Unless the sums of squares and cross-products matrix is formed from variables with special characteristics (such as that all variables contain integral numbers only), the rounding error battle has been lost even before inversion is started, since the rounding error will commence from somewhat less than 10 digits out during inversion rather than somewhat less than 24 digits as could have been obtained.

¹An exception is an integral number such as 1, 12, 252, etc. An integral number is represented exactly by a single precision number; therefore, conversion to double precision of positive integers implies merely adding on a word of zeros. However, if division of one number by another is performed in single precision, the result will usually involve rounding to 10 places, with subsequent conversion to double precision giving only 10 place accuracy.
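The conversion pitfall just described is easy to demonstrate. In the sketch below (Python/NumPy as a stand-in for the CDC 3600's word formats), widening a single precision 1.1 to double precision keeps the single precision rounding error rather than recovering the lost digits:

```python
# Sketch: float32 -> float64 widening plays the role of the CDC 3600's
# single -> double conversion.  Widening copies the already-rounded value.
import numpy as np

x = 1.1                                  # not exactly representable in base 2
single = np.float32(x)                   # rounded to ~7 decimal digits
widened = np.float64(single)             # exact copy of the *rounded* value
true_double = np.float64(x)              # rounded to ~16 decimal digits

print(float(widened))      # not 1.1 -- the single-precision error is retained
print(float(true_double))  # prints 1.1
```

The widened value differs from the true double precision value in the eighth significant digit, exactly as the text's "10 digits out instead of 24 digits out" argument predicts (scaled to these word sizes).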
In the case of a computer such as the IBM 360, which has only a 32-bit word (approximately 7 digits if floating point arithmetic is used), the use of single precision for some operations will have an even more pronounced effect on rounding error.

Multiplication of two matrices or of a matrix and a vector is often regarded as a safe operation as compared to inversion, but one should be reminded that the formation of an element of the product requires many multiplications and additions if the matrices are large. If the numbers vary substantially in size, rounding error can be severe in even such a simple operation.

2. Standardization of variables

a. Deviations from means

The size of the mean of a variable has considerable effect on rounding error. Consider the following two variables:

        x1              x2
    100001.2763        1.2763
    100002.7816        2.7816
    100001.1471        1.1471
    100003.0278        3.0278

If x1 is used as an explanatory variable in a stochastic equation containing an overall constant coefficient and the coefficients of the equation are estimated by any of the simultaneous stochastic linear equations methods, the coefficient corresponding to x1 will be the same as the coefficient which would be obtained if x1 were replaced by x2; only the overall constant coefficient would change if x1 were replaced by x2. However, sums of squares and cross-products can be calculated more accurately for x2 than for x1. Let us assume that the mantissa of each floating point number can contain 10 digits. Then each squared observation of x1 will contain no digits to the right of the decimal point, whereas each squared observation of x2 will contain 9 digits to the right of the decimal point. Similarly, cross-products of x2 with other variables will contain more digits to the right of the decimal point than cross-products of x1 with other variables.

We may regard x2 as having been formed by subtracting 100000 from each observation of x1. Even more accuracy would be obtained
if x2 were formed by subtracting the mean of x1 from x1, since there would be even less superfluous information to the left of the decimal. The new x2 would again give the same coefficient (except for less rounding error in its calculation) as either x1 or the original x2. (Again, the overall constant coefficient would change.)

So far, only the effects of the actual squaring and cross-product operations have been considered. Summing operations in the formation of the sums of squares, sums of cross-products, and sums of the original variables also contribute to rounding error, and again less rounding error would be produced during these summing operations if x2 were used instead of x1.

To illustrate the effect of summing on rounding error we will use a different example. To simplify our evaluation we will assume that each floating point number contains exactly 5 decimal digits in its mantissa and that its exponent is carried to the base 10. Consider forming the sum of a variable, x3, consisting of 100 observations with the first 50 observations assuming the value 10001 and the last 50 observations assuming the value 10003. The sum of the first 9 observations is 90009. When the 10th observation is added, 10001·10¹ is obtained (i.e., the mantissa of the number is 10001 and the exponent is 1). When the 11th observation is added we get 10001·10¹ + 10001·10⁰ = 10001·10¹ + 1000·10¹ = 11001·10¹. (To add two positive floating point numbers their exponents are first equalized. This is accomplished by dividing the mantissa of the smaller number by 10^a where a is the difference between the exponents of the two numbers.) In like fashion, as each of the observations which follow is added into the sum, the 1 or 3 is lost from the right hand side due to the equalization of exponents. Thus, after adding in the 99th observation we have 99001·10¹.
When the 100th observation is added we get 99001·10¹ + 10001·10⁰ = 99001·10¹ + 1000·10¹ = 10000·10², whereas the exact sum is 10002·10². If x4 were formed as x_t4 = x_t3 − 10000, x4 would be 1 for the first 50 observations and 3 for the last 50 observations. The sum of x4 would be accurately obtained as 200 (which is represented as 20000·10⁻²). The effect of summing on rounding error has been illustrated for a variable, but a similar effect is obtained if x3 is the square of a variable or the cross-product of two variables during the formation of a sums of squares and cross-products matrix.

Let us assume that an overall constant coefficient appears in each stochastic equation in the subsystem to be estimated by any of the simultaneous stochastic linear equations methods. (We will consider the case of a stochastic equation containing no overall constant coefficient further on.) Let M be the matrix of sums of squares and cross-products of all variables in a problem, that is:

(IX.1)    M = [m_ij] ,    m_ij = Σ_{t=1}^T x_ti x_tj ,

and let A be the matrix of sums of squares and cross-products of the deviations of all variables in the problem from their means, that is,

(IX.2)    A = [a_ij] ,    a_ij = Σ_{t=1}^T (x_ti − x̄_i)(x_tj − x̄_j) = Σ_{t=1}^T x_ti x_tj − x̄_i Σ_{t=1}^T x_tj .

[For a given problem, the two definitions of a_ij may not coincide due to rounding error; however, we are only defining the matrix A at this point. In section IX.A.2.c we will consider a more accurate formula than either of the formulas for a_ij in (IX.2).] Also, let us define the variable x0 as a variable which assumes the value 1 for all observations.
Then the same estimates (except for rounding error) are obtained if (1) the variable x0 is explicitly included in each equation and the M matrix is used as the Z'Z matrix in the computational formulas given in parts I and II of this paper, as if (2) the variable x0 is omitted from each equation, the A matrix is used as the Z'Z matrix in the computational formulas of parts I and II, and the overall constant coefficient for each equation is calculated as:

(IX.3)    â_u0 = ȳ_u − z̄'_u â_u ,

where â_u0 is the overall constant coefficient for equation u (the coefficient corresponding to variable x0), ȳ_u is the mean of the normalizing variable for the uth equation, z̄'_u is a row vector of means of the explanatory variables of the uth equation (not including x0), and â_u is the vector of coefficients of the uth equation (not including the normalizing coefficient, −1, and â_u0).

If the A matrix is used as the Z'Z matrix, then the disturbance variance-covariance matrix, the coefficient variance-covariance matrix, and statistics calculated from these matrices are calculated by the formulas of parts I and II except that (1) the constant coefficient is not included in the formulas and (2) if a "degrees of freedom" adjustment is made, the "degrees of freedom" should take account of the implicit overall constant coefficient. For example, the disturbance variance-covariance matrix may be estimated as:

    S = (1/T) Â'AÂ ;

however, since A is calculated from deviations from means, the overall constant coefficients are omitted from the Â matrix.
A degrees of freedom adjustment of T/√((T−n_u)(T−n_u′)) can be made in the S matrix as before, but here n_u and n_u′ each include the overall constant coefficient (i.e., if n_u and n_u′ reflect the number of explanatory variables not including x0, they are incremented by 1 in adjusting for degrees of freedom).¹

¹It is convenient to take this adjustment into account in the subroutines which output the variance-covariance matrices and related statistics rather than in the main computational section of the program. This is easily accomplished by setting a variable in COMMON to 0 if the M matrix is used and 1 if the A matrix is used and subtracting this variable from the "degrees of freedom" whenever a degrees of freedom calculation is made. It is also convenient to calculate overall constant coefficients by (IX.3) in the coefficient output subroutines, rather than in the main program. An overall constant coefficient is calculated and printed out along with the other coefficients whenever the aforementioned variable in COMMON is 1.

b. Uniform scaling

It is sometimes (incorrectly) thought that if all numbers are carried in floating point form, the scaling of a variable up or down will have little substantive effect on the calculations, since the main effect of the scaling is the changing of exponents which designate where the decimal point occurs for each observation of the variable. The scaling of a variable does have a considerable effect on rounding error due to the effect of addition and subtraction of floating point numbers.

To add two positive floating point numbers, the exponents are first equalized. This is accomplished by shifting the mantissa of the smaller number to the right (dividing it by powers of 2) and adding to the exponent of this number until the exponents are equalized.
What is left of the mantissa of the smaller number is added to the mantissa of the larger number, and the resulting number is then normalized to eliminate leading zeros. Subtraction is performed in a similar manner. Thus, when addition and subtraction are performed, the sizes of the numbers involved are important. Since addition and subtraction are an integral part of the operations for all simultaneous equations calculations, rounding error is affected by the scaling of variables.

The basic information used from a set of variables is often contained in the matrix of sums of squares and cross-products of the deviations of the variables from their means [the A matrix (IX.2) of the previous section]. If some variables are scaled high and some low, the size of the elements of the A matrix may vary greatly in magnitude. Thus, if a typical element of the deviation from the mean of one variable is about 1000 and a typical element of the deviation from the mean of another variable is about .01, the diagonal element of the A matrix of the first variable will tend to be about (1000)²/(.01)² = 10,000,000,000 times as large as the diagonal element of the second variable. Control of rounding error in addition and subtraction is very difficult with such widely differing magnitudes of variables.

The above does not imply that the user of a computer program must try to scale all of his variables to the same magnitude, as this may impose a considerable burden on the user, both in setting up his data and in interpreting his results. Also, the magnitude of variables created in the computer by a prior transformation or editing procedure may be hard to predict. The above considerations suggest that the computer program should be sophisticated enough to handle automatically the scaling of data (and subsequent descaling of results) for the user. Automatic uniform scaling of variables is easily accomplished.
One method of accomplishing uniform scaling is to form the A matrix and then normalize the A matrix so that each diagonal element is 1. This is accomplished by multiplying the elements of row i and the elements of column i by d_i where

(IX.4)    d_i = 1/√(a_ii) .

In matrix notation the operation may be represented as

(IX.5)    A* = DAD

where A* is the A matrix normalized to 1 on the diagonals and D is a diagonal matrix whose ith diagonal element is d_i.

Let x*_ti = d_i(x_ti − x̄_i). Then (ignoring rounding error) A* is the matrix of sums of squares and cross-products of the x*_i variables. A characteristic of the A* matrix is that each x*_i variable has length 1--a very convenient normalization of variables.¹ Since x*_i has mean zero and length 1, it is often referred to as a standardized variable. The A* matrix is the usual simple (Pearson product-moment) correlations matrix.

¹The length of a vector, x_i, is defined as √(x_i'x_i) = √(Σ_{t=1}^T x_ti²).

All calculations are performed on the A* matrix in the same manner as if it were the A matrix. Thus, many statistics such as coefficient estimates are carried in normalized form in the computer, thereby reducing rounding error in many calculations involving the coefficients. Only when statistics are printed out must those statistics which are affected by a change of scale in the variables be denormalized. It is usually convenient to do the denormalization in output subroutines, thereby leaving the coefficients (or other statistics) in normalized form in the computer in case they are used for subsequent calculations.²

One might pose the question--why not instead normalize the A matrix by multiplying rows and columns by powers of 10? The reasons are (1) it is as easy to normalize in the fashion indicated as by a
power of 10, since a power of 10 is no more convenient than a number such as 1/√(Σ_{t=1}^T (x_ti − x̄_i)²) for the computer, (2) the knowledge that each standardized variable has length 1 (the diagonal elements of the A* matrix are 1) is convenient in deriving certain computational formulas, and (3) the statistics affected by scaling variables will be completely denormalized before printing them out anyway.¹

The normalization (or standardization) which we are imposing on the coefficients through normalization of the variables² should not be confused with the normalization imposed on the coefficients of an equation by setting the coefficient of the "normalizing variable" to −1. The normalization to −1 is required for determinateness of the coefficients, whereas the normalization that we have been discussing in this section results in coefficients which are compatible with the normalized variables (and are therefore independent of scale of the original variables).

²The AES STAT package uses one output subroutine for printing coefficients estimated by single equation methods, one output subroutine for printing coefficients estimated by multiple equations methods, one output subroutine for variance-covariance matrices (either disturbance or coefficient, and in either denormalized form or further normalized so that 1's appear on the diagonal of the variance-covariance matrix as well), and one output subroutine to calculate and print estimated coefficient standard errors, coefficients divided by coefficient standard errors, and coefficient variances. All denormalizations are handled in the output subroutines, thereby leaving the normalized coefficients and variance-covariance matrices unmodified in the computer.
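The normalization of (IX.4)-(IX.5) can be sketched as follows (Python/NumPy rather than the AES STAT FORTRAN; the data and names are illustrative only). The sketch confirms that A* has a unit diagonal and is exactly the simple correlation matrix, regardless of how wildly the columns are scaled:

```python
# Sketch of (IX.4)-(IX.5): A is the moment matrix of deviations from means,
# d_i = 1/sqrt(a_ii), and A* = DAD is the correlation matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) * [1000.0, 0.01, 5.0, 1.0]  # wildly scaled columns

dev = X - X.mean(axis=0)             # deviations from means
A = dev.T @ dev                      # the A matrix of (IX.2)
d = 1.0 / np.sqrt(np.diag(A))        # normalization elements (IX.4)
D = np.diag(d)
A_star = D @ A @ D                   # (IX.5): unit diagonal, scale-free

print(np.diag(A_star))               # all 1's
print(np.round(A_star, 4))           # the simple correlation matrix
```

Because A* is unaffected by the column scalings, every statistic computed from it is "uniformly scaled" in the sense of the text, and only the output stage needs the d_i to denormalize.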
If normalization of the A matrix is accomplished in the computer before computation of statistics is started, all computed statistics (before printing them out) will be uniformly scaled; that is, scaling of variables up or down by the user will have no effect on the basic A* matrix and, therefore, on the intermediate matrices and final statistics calculated. This uniform scale characteristic is very convenient in determining the rank of a matrix, whether a matrix being inverted is singular, whether certain variables are linear combinations of other variables during orthogonalization, the degree of convergence of an iterative procedure, etc.

¹Normalization by a power of 2 is not advantageous either, since special programming would be required to adjust the exponent directly, and the extra operations would in general require more time than the use of direct multiplication for normalization and denormalization.

²In DLS estimation the normalized coefficients are often referred to as beta weights.

Zellner and Thornber state:

    "A simple and relatively inexpensive check which we recommend is to perform calculations several times with the raw data scaled differently each run. Should the resulting sets of estimates differ, it is probable that computational errors are a problem. When faced with such results, an investigator might resort to higher precision arithmetic and/or ponder whether the information in his sample is adequate to provide even moderately good estimates of all of the parameters of his model."¹
The writer concurs with the importance which Zellner and Thornber place on proper scaling under the assumption that the routine will not automatically scale the variables by normalizing the A matrix; however, even if several arbitrary scalings of the raw data are tried for a problem, none are likely to give as good a result as the normalization to length 1 of all deviations from means of variables which will be automatically accomplished by the procedure outlined earlier. Thus, although by normalizing the A matrix we have given up being able to affect the calculation by rescaling variables, we can expect to end up ahead rather than behind, and considerably more conveniently (from the standpoint of the user).²

¹Zellner and Thornber [1966], p. 728.

²Also, to implement the suggestion of Zellner and Thornber, one must decide how much to rescale individual variables. If the alternative scalings are sufficiently pathological, differing estimates will surely be obtained.

c. Improving the estimates of sums, means, and the standardized moment matrix

In section IX.A.2.a we noted that if the mean of a variable is subtracted from each observation of the variable, a computer word can contain more meaningful information regarding the sum of the variable, the sum of squares of the variable, and the sum of cross-products of the variable with another variable. Let us also note the effect of rounding error on the calculation of means and a procedure for improving the accuracy of computed means, sums, and the standardized moment matrix.

First, let us define the set of original variables as the x_i and define a set of variables, the y_i, corresponding to the x_i as:

(IX.6)    y_ti = x_ti − m_i ,    t = 1,...,T

where m_i is a constant subtracted from each observation of x_i in the formation of the corresponding variable y_i. A desirable choice for an m_i is the mean of the corresponding x_i, but we will not restrict an m_i to be an exact mean of the corresponding x_i.
For example, an m_i might be a grossly inaccurate estimate of the mean of its corresponding x_i. From (IX.6) we obtain:

(IX.7)    x_ti = y_ti + m_i ,    t = 1,...,T ;

hence:

(IX.8)    Σ_{t=1}^T x_ti = Σ_{t=1}^T y_ti + T m_i

(IX.9)    x̄_i = (1/T) Σ_{t=1}^T x_ti = ȳ_i + m_i .

If m_i is an approximation to the mean of x_i, we can expect to compute x̄_i much more accurately as ȳ_i + m_i than as (1/T) Σ_{t=1}^T x_ti, and we can expect to form the sum more accurately as Σ_{t=1}^T x_ti = Σ_{t=1}^T y_ti + T m_i, i.e., by (IX.8).

The arithmetic mean of the x_ti is, by definition, x̄_i = (1/T) Σ_{t=1}^T x_ti. However, in calculating the mean by this definitional formula, we often do not get x̄_i exactly. Let m_i be the number thus actually obtained as an approximation to the mean, and let e_i be the rounding error, so we have m_i − x̄_i = e_i, where m_i is a known (calculated) quantity but, in general, the exact values of x̄_i and e_i may not be known. Now consider another approximation to x̄_i, say m*_i, obtained as follows: Compute

(1) y_ti = x_ti − m_i, t = 1,...,T;

(2) an approximation to the mean of the y_ti, obtained by applying the definitional formula ȳ_i = (1/T) Σ_{t=1}^T y_ti; let q_i be the result of this calculation, and f_i the corresponding rounding error, so we have q_i − ȳ_i = f_i;

and (3) m*_i = q_i + m_i.

The resulting error, i.e., the difference between m*_i and x̄_i, is

    m*_i − x̄_i = q_i + m_i − x̄_i
               = ȳ_i + f_i + m_i − x̄_i
               = x̄_i − m_i + f_i + m_i − x̄_i
               = f_i .

Thus, m*_i will be a better approximation to x̄_i than m_i is whenever the magnitude of f_i is less than that of e_i. Now the magnitude of a rounding error is, on the average, roughly proportional to the magnitude of the quantity being rounded off. (For example, in the decimal system, if n significant digits are carried throughout, the average magnitude of rounding error is roughly .5 × 10⁻ⁿ times the magnitude of the number rounded off.)
Since ȳ_i is always close to zero, its magnitude is generally much smaller than that of x̄_i (unless x̄_i is already very close to zero); hence the magnitude of f_i is generally much smaller than that of e_i, and m*_i is generally a much better approximation to x̄_i than is m_i.

To illustrate the improvement which may be obtained through use of (IX.8) and (IX.9), we will return to one of the examples in section IX.A.2.a. In this example we used a variable, x3, consisting of 100 observations, the first 50 observations assuming the value 10001 and the last 50 observations assuming the value 10003. We also assumed that each floating point number contains exactly 5 decimal digits and that its exponent is carried to the base 10. We then computed the sum of the variable to be (Σ_{t=1}^T x_t3)_1st pass = 10000·10². Thus,

    m_3 = (Σ_{t=1}^T x_t3)_1st pass / T = 10000·10² / 100 = 10000

    y_t3 = x_t3 − m_3 = 10001 − 10000 = 1  for t = 1,...,50
                        10003 − 10000 = 3  for t = 51,...,100

    Σ_{t=1}^{100} y_t3 = 200

    ȳ_3 = 2

    (Σ_{t=1}^T x_t3)_2nd pass = Σ_{t=1}^T y_t3 + T m_3 = 200 + 100·10000 = 1000200

    x̄_3 = ȳ_3 + m_3 = 2 + 10000 = 10002 .

Thus, by computing the sum and mean of x3 in two passes, we obtained the more accurate sum 1000200 and the more accurate mean 10002.

In a Monte Carlo study, Neely [1966] used an m_i of (1/T) Σ_{t=1}^T x_ti and compared computations of the means of variables through use of the formulas ȳ_i + m_i, (1/T) Σ_{t=1}^T x_ti, and some additional formulas. Neely found that the formula ȳ_i + m_i performed at least as well as the other formulas he tried and was superior to the direct computation (1/T) Σ_{t=1}^T x_ti.

In section IX.A.2.a we noted that the simple correlations matrix A* is a convenient normalized moment matrix to substitute for the Z'Z matrix in the formulas given in parts I and II and that this matrix has desirable properties for such a substitution. The same A* is obtained if the y_i are substituted for the x_i, except that the A* matrix for the y_i will in general be formed more accurately.
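The two-pass computation of (IX.6)-(IX.9) can be sketched by reproducing the x3 example above in simulated 5-significant-digit decimal arithmetic (Python's decimal module standing in for the hypothetical 5-digit machine; add5 and the other names are illustrative only):

```python
# Sketch of the two-pass scheme: a 5-digit decimal context plays the role
# of the text's 5-digit mantissa with a base-10 exponent.
from decimal import Decimal, Context, ROUND_HALF_EVEN

ctx = Context(prec=5, rounding=ROUND_HALF_EVEN)   # 5 significant digits

def add5(a, b):
    return ctx.add(a, b)                          # rounded 5-digit addition

x3 = [Decimal(10001)] * 50 + [Decimal(10003)] * 50

# First pass: the exact sum 1000200 is unrepresentable in 5 digits.
s1 = Decimal(0)
for v in x3:
    s1 = add5(s1, v)
print(s1)            # 1.0000E+6, i.e. 10000*10**2 as in the text

# Second pass: subtract the first-pass mean m3 and sum the small residuals.
m3 = ctx.divide(s1, Decimal(100))                 # = 10000, the first-pass mean
s_y = Decimal(0)
for v in x3:
    s_y = add5(s_y, ctx.subtract(v, m3))          # residuals are 1 or 3
total = add5(s_y, ctx.multiply(Decimal(100), m3)) # (IX.8): 200 + 1000000
mean = add5(ctx.divide(s_y, Decimal(100)), m3)    # (IX.9): 2 + 10000
print(total)         # 1.0002E+6, the exact sum 1000200
print(mean)          # 10002
```

The residuals are small enough that their sum is exact, so the second pass recovers the sum 1000200 and the mean 10002 that one-pass accumulation lost.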
To denormalize the statistics affected by the normalization used, we must note what normalization elements [see (IX.2) and (IX.4)] are implied by the substitution of the y_i for the x_i. If we had used the x_i directly, we would use 1/√(Σ_{t=1}^T x_ti² − x̄_i Σ_{t=1}^T x_ti) as the normalization elements. Now note that

(IX.10)    Σ_{t=1}^T x_ti² − x̄_i Σ_{t=1}^T x_ti = Σ_{t=1}^T (x_ti − x̄_i)²
                                                = Σ_{t=1}^T [(y_ti + m_i) − (ȳ_i + m_i)]²
                                                = Σ_{t=1}^T (y_ti − ȳ_i)²
                                                = Σ_{t=1}^T y_ti² − ȳ_i Σ_{t=1}^T y_ti ;

hence, we can use the same denormalization elements as if the y_i had been the original variables (the x_i).

In his paper, Neely [1966] also compared computations of simple correlations matrices through use of the y_i [with an m_i of (1/T) Σ_{t=1}^T x_ti] with computations through direct use of the x_i and with computations using other formulas. Neely found that the usual simple correlation formula, but using the y_i in place of the x_i, performed at least as well as the other formulas he tried and was superior to the direct use of the x_i.

The preceding implies that, in general, accuracy can be increased by making two passes through the data.¹ In the first pass, an approximation of the mean of each variable is obtained (this approximation is improved the second pass), and in the second pass the sums of squares and cross-products matrix of the deviations of the variables from their approximate means is formed and the sums of the deviations of the variables from their approximate means are obtained.² Finally, (1) the simple correlations matrix is formed from this newly formed sums of squares and cross-products matrix (ignoring the fact that the means and sums of the variables from which it was formed are approximately zero) and (2) the approximate sums and the approximate means of the x_i variables are adjusted by (IX.8) and (IX.9) to provide more accurate computations of these quantities.
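The identity (IX.10) is easy to check numerically. In the sketch below (Python/NumPy; the data are illustrative only), the denormalization elements computed from the x's and from the y's agree:

```python
# Numerical check of (IX.10): sums of squares of deviations are the same
# whether computed from the x's or from y = x - m for any constant m.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10000.0, scale=2.0, size=500)
m = 9998.0                      # a rough approximation to the mean
y = x - m

lhs = np.sum(x**2) - x.mean() * np.sum(x)        # from the x's
rhs = np.sum(y**2) - y.mean() * np.sum(y)        # from the y's
direct = np.sum((x - x.mean())**2)               # the middle member of (IX.10)
print(lhs, rhs, direct)                          # all agree up to rounding
```

Note that the left-hand computation subtracts two numbers near 5·10¹⁰ to obtain a result near 2·10³, which is exactly the cancellation the y-based computation avoids; the identity says the d_i may nevertheless be taken from either form.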
Provision should be made for the user to specify a set of approximate means, and he should be allowed to specify that only one pass be made through the data. For example, it is planned that the AES STAT system will be modified so that two passes are made through the data unless the user directs that only a single pass be made through use of the ONEPASS code on the SSCP card. The user will also be permitted to specify his own approximate means (m_i) for any variables during that pass. Use of the ONEPASS code and specification of no approximate means by the user implies that only a single pass will be made through the data with all m_i equal to zero, i.e., the usual simple correlation formulas will be used.

¹Longley [1967] contains a report on rounding error by many well-known DLS routines. Longley documents the improvement in accuracy obtained through making two passes through the data and the desirability of using accurately computed simple correlation matrices and accurately computed denormalization elements in the computation of DLS problems.

²It is convenient to incorporate the extra variable x0 (a variable assuming the value 1 for all observations) whenever a sums of squares and cross-products matrix is formed. The sums of the variables used in forming the matrix are then automatically calculated as the elements corresponding to x0.

d. Adjustments if no overall constant coefficient

If no overall constant coefficient is to be included in an equation, the sums of squares and cross-products matrix [the M matrix defined by (IX.1)], normalized in some fashion, is used as Z'Z in the formulas given in parts I and II of this paper rather than the simple correlations matrix [the A* matrix defined by (IX.5), (IX.4), and (IX.2)].
From (IX.2) we recall that:

(IX.11)    a_ij = m_ij − x̄_i Σ_{t=1}^T x_tj ;

hence, m_ij may be formed as:

(IX.12)    m_ij = a_ij + x̄_i Σ_{t=1}^T x_tj .

If the x̄_i and the Σ_{t=1}^T x_tj are normalized by the same normalization as is used for the A matrix in forming the A* matrix [i.e., d_i x̄_i and d_j Σ_{t=1}^T x_tj are formed, where the d_i are the normalization elements defined by (IX.4)], then the M matrix normalized (say M*) may be formed directly from the A* matrix as:

(IX.13)    m*_ij = a*_ij + (d_i x̄_i)(d_j Σ_{t=1}^T x_tj) ,

where M* = [m*_ij]. This normalization is based on the deviations of each variable from its mean having length 1 rather than each variable having length 1; hence, the elements on the diagonal of M* will not be 1 as is the case for A*. One could do an additional normalization by multiplying the ith row and the ith column by c_i, where c_i = 1/√(m*_ii), so that each variable will have length 1 (1's will appear on the diagonal of the M* matrix); however, if this is done, statistics affected by the normalization must be denormalized based on a normalization of c_i d_i rather than d_i. If a special normalization is used for the M* matrix and some estimates are based on the A* matrix and other estimates are based on the M* matrix, additional bookkeeping must be kept by the computer routine regarding which normalization was used to compute a set of statistics used in a later step (e.g., DLS coefficients may be used as starting estimates for FIML), or the statistics based on one of the normalizations must be renormalized as they are stored. The advantages of using a special normalization for the M* matrix would not, in general, appear to warrant the additional programming required.
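Formula (IX.13) can be sketched as follows (Python/NumPy; data and names illustrative only): M* is recovered from A* plus the normalized means and sums, and matches the result of forming M = X'X directly and normalizing it with the same d_i:

```python
# Sketch of (IX.13): recover the normalized raw-moment matrix M* from the
# correlation matrix A* without ever forming M = X'X directly.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=[5.0, -3.0, 50.0], scale=[1.0, 0.5, 10.0], size=(40, 3))

sums = X.sum(axis=0)
means = X.mean(axis=0)
dev = X - means
A = dev.T @ dev                                  # (IX.2)
d = 1.0 / np.sqrt(np.diag(A))                    # (IX.4)
A_star = np.diag(d) @ A @ np.diag(d)             # (IX.5)

# (IX.13): m*_ij = a*_ij + (d_i * mean_i)(d_j * sum_j)
M_star = A_star + np.outer(d * means, d * sums)

print(np.round(M_star, 4))
```

Only the normalized means d_i x̄_i and normalized sums d_j Σ_t x_tj need be carried alongside A*, which is why the text can extract just the needed portion of M* for a given phase of a problem.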
Rather than form the M* matrix directly, or for that matter form it indirectly once and for all and hold it in the computer memory or on some auxiliary storage device, it is much more convenient (and in many cases, saving of computer time) to form from the A* matrix the portion of the M* matrix needed to calculate a given phase of a problem.

It is planned that the AES STAT system will be revised so that it is assumed that an overall constant coefficient is to be included in each equation unless the user specifies a NOCON (for no overall constant coefficient) code on the relevant STAT control card. For example, k-class estimates will be based on the A* matrix and an overall constant coefficient will automatically be calculated (hence, the x0 variable need not be specified on a given K card) unless the NOCON code appears on the K card.¹ If the NOCON code appears on the K card, only the part of the A* matrix corresponding to variables appearing on the K card will be extracted from the A* matrix. The M* matrix corresponding to these variables will then be formed in the manner given by (IX.13) and used as the Z'Z matrix in the calculation of the k-class problem.

¹The K card is discussed in sections IX.3 and X.x.

3. Use of simultaneous equations solutions

Klein and Nakamura state, "In the evaluation of Y'X(X'X)⁻¹X'Y we find it more efficient and accurate, from a computational point of view, to calculate (X'X)⁻¹X'Y (as the solution vectors of sets of simultaneous equations) and then premultiply by Y'X. Criteria for success are judged by the positive definiteness and symmetry of [Y'Y]·x. Also inconsistencies in subsequent dependent calculations indicate arithmetic errors at this stage."¹ Calculation of Y'X(X'X)⁻¹X'Y as Y'Y − [Y'Y]·x, where [Y'Y]·x is obtained by direct orthogonalization (see section I.D.2), is more accurate yet; however, Klein and Nakamura's basic point is well taken.
If the result of an inverse times a matrix or vector is desired, this result can be computed more accurately by treating the calculation as a "solution to a set of simultaneous equations" rather than by actually forming the inverse and performing the matrix or vector multiplication. A slight problem presents itself when the inverse of the matrix is desired in its own right, such as for the calculation of variance-covariance estimates; however, this problem can be resolved by using a subroutine which calculates the inverse and the simultaneous equations solution at the same time.

1Klein and Nakamura [1962], p. 287. Klein and Nakamura used the notation M_yx M_xx^-1 M_xy instead of Y'X(X'X)^-1X'Y, M_xx^-1 M_xy instead of (X'X)^-1X'Y, M_yx instead of Y'X, and M_Δ instead of the [Y'Y]⊥X used in the above quote.

For either inversion or simultaneous solution, the selection of the largest element of the remaining submatrix to be operated on at each step as the pivot for the step considerably reduces rounding error for many problems.1

1For positive definite or positive semi-definite matrices, the largest element of the remaining matrix to be operated on at each step will occur on the diagonal of the remaining sub-matrix.

4. Direct orthogonalization

The use of direct orthogonalization in various phases of the computation has been emphasized throughout this paper. (A method for accomplishing direct orthogonalization is given in section I.D.2.) Here we note that direct orthogonalization may considerably reduce rounding error as compared to calculation by the usual method, e.g., by Y'Y − Y'X(X'X)^-1X'Y. Also, matrices of the form Y'X(X'X)^-1X'Y may be calculated more accurately as Y'Y − [Y'Y]⊥X.

The rearrangement of rows and columns at each step so that the pivot is selected as the largest element of the remaining sub-matrix (this largest element will occur on the diagonal of the remaining sub-matrix) will considerably reduce rounding error for many problems.
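Klein and Nakamura's point, and the orthogonalization identity just noted, can be illustrated with a small numerical sketch. The data are fabricated for illustration, and np.linalg.solve merely stands in for the simultaneous-equations subroutine described in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
Y = rng.standard_normal((50, 2))
XtX, XtY = X.T @ X, X.T @ Y

B_inverse = np.linalg.inv(XtX) @ XtY   # form the inverse, then multiply
B_solve = np.linalg.solve(XtX, XtY)    # solve the equations X'X B = X'Y
assert np.allclose(B_inverse, B_solve)

# Y'X(X'X)^-1 X'Y, obtained by premultiplying the solution by Y'X:
cross = XtY.T @ B_solve
assert np.allclose(cross, cross.T)                 # symmetry check
assert np.all(np.linalg.eigvalsh(cross) > -1e-8)   # positive semi-definite

# The same matrix via residuals: with R = Y - X B, R'R = [Y'Y] after
# sweeping out X, so Y'Y - R'R reproduces the cross matrix.
R = Y - X @ B_solve
assert np.allclose(Y.T @ Y - R.T @ R, cross)
```

The symmetry and positive semi-definiteness assertions correspond to the "criteria for success" in the Klein-Nakamura quotation.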
5. Iterative techniques

There seems to be a fairly widely held view that rounding error builds up as iteration continues in the calculation of certain solutions. In particular, it is often thought that, even though rounding error is small in the computations performed for a single iteration, FIML coefficients may be subject to quite a bit of rounding error if quite a few iterations are required for convergence. Now, continued iteration may cause difficulty in some iterative procedures, and formulas could certainly be devised which would cause rounding error to build up in the calculation of FIML coefficients; however, rounding error will not build up during iteration if the formulas given in this paper are used. This is because during a single iteration, all calculations are based only on a matrix of sums of squares and cross-products (which does not change as iteration progresses) and on the set of coefficients obtained from the last iteration. In order to move to a higher likelihood, adjustments to these coefficients are calculated and added to the coefficients, and a new iteration is started. If the increment for a coefficient is in the wrong direction, the procedure itself will correct the coefficient at some later step (assuming that rounding error is small for the computations performed in any single iteration), as the likelihood will not be maximized until the coefficient is corrected. Thus, assuming that convergence is obtained to a desired degree of accuracy, the particular coefficients obtained are a function of the matrix of sums of squares and cross-products and the assumed structure (neither of which change as iteration progresses) rather than the intermediate coefficients and matrices obtained in the process of iterating.
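The self-correcting character of such an iteration can be demonstrated with a small sketch. The scheme below is not the FIML procedure itself, only an analogous iteration in which every step uses the fixed cross-products X'X and X'y plus the current coefficient vector; a deliberately wrong increment injected mid-way is repaired by later iterations, since the fixed point depends only on the cross-products:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = rng.standard_normal(200)
XtX, Xty = X.T @ X, X.T @ y          # fixed; never recomputed while iterating

beta = np.zeros(3)
step = 1.0 / np.linalg.eigvalsh(XtX).max()   # step size small enough to converge
for it in range(2000):
    beta = beta + step * (Xty - XtX @ beta)  # adjustment toward the solution
    if it == 500:
        beta[0] += 10.0                      # simulate an increment in the
                                             # wrong direction at one iteration

# The perturbation has been corrected; beta solves the normal equations.
assert np.allclose(beta, np.linalg.solve(XtX, Xty), atol=1e-6)
```

The final coefficients depend only on XtX and Xty, not on the erroneous intermediate value, which is the point made in the paragraph above.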
1If the likelihood has multiple local maxima, the particular intermediate coefficients will have an influence on which maximum is obtained; however, given that we end up on a particular peak, continued iteration will not cause the coefficients which maximize the likelihood in that region to be estimated less accurately.

B. Free Field Interpretive Parameters

A very common way to give instructions to a computer routine is by a series of parameter cards containing information punched by the user in essentially the following form:

Card 1:
  Cols. 5- 7:  Number of rows in the matrix
  Cols. 8-10:  Number of columns in the matrix
  etc.

Cards 2-10:
  Cols. 4- 5:  Number of elements on the card
  Cols. 7- 8:  First variable number
  Cols. 9-10:  First column number
  Cols. 11-20: First element
  Cols. 21-22: Second variable number
  Cols. 23-24: Second column number
  Cols. 25-34: Second element
  etc.

Card 11:
  Col. 5:  Punch a 1 if 3SLS estimates are to be calculated from LIML coefficients. Leave blank if 3SLS estimates are to be calculated from 2SLS coefficients.
  etc.

The use of this type of parameter card causes an unnecessarily large number of errors. Here a number of cards must be prepared with particular information going in particular columns, the cards differing somewhat from each other, and no parameter being expressed in a form natural to the user. It is very easy (and common) to get some of the information punched in the wrong columns. Now, some of the mispunched information may be detected by the routine as being impossible. This case may cause computer time to be wasted and results to be delayed; however, this is not the most serious problem. The serious problem is that with this type of parameter card, it is very easy to mispunch a card in such a way that the parameters still make sense to the routine. An answer is obtained, but it is the answer to a different problem than the user intended. Often the user will then go ahead and use the results.
One might be comforted by the thought that the probability is very low that (1) a card will be mispunched by a careful user who double-checks everything, and (2) if a mispunch occurs, it will appear correct to the routine, and (3) if the mispunch occurs and the routine accepts it, the answer will be within a range that a user skilled in the method will accept as correct. If so, one is falsely comforted. The probability is quite high.1

The more usual situation is one in which a user with little knowledge of the importance of checking his parameters and data calculates a problem by one or more of the simultaneous stochastic equations estimating procedures. Often the user has had little experience in interpreting results from the methods (after all, we have to start sometime) and so he expects almost anything. The probability of getting an answer to a different problem than the one he actually has in mind (using the above type of parameter cards) is even higher than for a user who is aware of the need for double checking and has had much experience in checking results. Often the user makes a comparison of a series of models. In this case the probability must approach 1 that he will get some incorrect results due merely to mispunched parameters.

1The writer has had occasion to take results of problems calculated by researchers who are far above average in carefulness of preparing cards and skill in interpreting results, but he has rarely obtained the same answer on his routines. Further checking usually disclosed an incorrect number punched or a code or number punched in the wrong column(s). The researchers had already reported their incorrect results.

One way to reduce the probability of misspecifying the parameters is to use an interpretive form of parameter card that is free-field, i.e., no numbers have to be punched in particular columns.
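A minimal sketch of free-field interpretation follows, in Python rather than the package's FORTRAN. The card syntax is modeled loosely on the AES STAT examples discussed below; the parsing rules and function name are the writer's illustrative assumptions, not the actual AES STAT scanner:

```python
import re

def parse_card(card):
    """Parse a free-field control card of the hypothetical form
    NAME(args)CODE,CODE=value,...  Blanks are permitted anywhere and no
    field is tied to particular columns; an unrecognizable card is an
    error the routine can detect, unlike a number in the wrong column."""
    card = card.replace(" ", "")                    # blanks permitted anywhere
    m = re.match(r"([A-Z]+)\(([^)]*)\)(.*)", card)
    if m is None:
        raise ValueError("unrecognized card: " + card)
    name, args, rest = m.groups()
    codes = {}
    for tok in filter(None, rest.split(",")):
        key, _, val = tok.partition("=")            # e.g. MAXE=8 vs. bare TRANS
        codes[key] = val if val else True
    return name, args.split(","), codes

name, args, codes = parse_card("SSCP (1 - 15) TRANS, RES, MAXE = 8")
assert name == "SSCP" and codes["MAXE"] == "8" and codes["TRANS"] is True
```

A mispunched code such as TRENS would simply be an unknown key, which the routine can flag, illustrating point (2) of the list further on.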
In the accompanying listing of computer control cards and following explanatory paragraphs, interpretive free-field instructions to the computer are illustrated with the parameters and data which would be used to estimate Klein's model I by DLS, 2SLS, LIML, 3SLS, I3SLS, LML, and FIML on the AES STAT system. (A number of other methods are available in the AES STAT system, but this should be adequate to illustrate the parameters.) The results are reproduced in the last section (section IX.K) of this chapter.

[Facsimile listing of the AES STAT control cards, transformation cards, and data for Klein's model I; not legible in this copy.]

The SSCP card causes variables X1 through X15 to be formed into a sums of squares and cross-products matrix. Certain basic statistics regarding the variables and pairs of variables are automatically calculated and printed out. The TRANS code causes the data to be transformed as indicated by the transformation cards which are discussed in the next section. The RES code causes the transformed data from X1 through X15 and X20 to be saved for later calculation and labeling of residuals. The MAXE=8 (maximum equation number is equation 8) code sets up a coefficients pool for temporary storage and retrieval of coefficients. Quite a few additional codes are permitted on the SSCP
card to control data input, cause matrices to be printed out, etc. The SSCP card, like all of the other control cards, is completely free-field, blanks being permitted anywhere on the card.

The first K card sets up the basic matrices for any double k-class member for equation 1 with X1, X4, and X5 as jointly dependent variables; X0 and X12 as predetermined variables in the equation; and X7 through X10, X12, and X13 as additional instrumental variables. The codes following the K card specify the particular k-class members to be calculated. The 2SLS code designates that 2SLS coefficients are to be calculated and the LIML code designates that LIML coefficients are to be calculated. (Any specific k, double k, or h-class member or series of members can be calculated by specifying k, k1 and k2, or h. UBK coefficients can also be calculated by using a UBK code.) The particular statistics to be printed out are designated by codes (e.g., C for coefficients, STE for standard errors) following the codes telling which estimators to calculate. The SCE=1 code following the 2SLS code designates that the 2SLS coefficients are to be stored in a coefficients pool and labeled equation 1. The other two K cards are similar to the first in specifying statistics to be calculated for equations 2 and 3.
The READC cards are used to input the identity equations, and the 3SLS and FIML cards are used to calculate the 3SLS, I3SLS, LML, and FIML estimates. For iterating on the 3SLS coefficients in the calculation of the I3SLS coefficients, various stopping criteria have been specified. In this case iteration will stop when the first of the following occurs:

(1) The proportional change in all coefficients is less than .0000000001 (COG=1.0-10 code).

(2) The number of iterations exceeds 200 (MI=200 code).

(3) Iteration proceeds for more than 10 minutes (TIMEL=10 code).

In the computer output given further on it can be noted that the 1st criterion was satisfied in 42 iterations, with the total time being 12 seconds. The NTH=20 code causes the statistics listed after the code to be printed out each 20th iteration. In this way, the progress of iteration can be examined for problems requiring a large number of iterations.

The convergence or stopping criteria given for 3SLS are available on FIML, with all of the additional stopping criteria listed in section V.C.5 available for FIML as well. The NTH=10 code causes certain statistics specified by codes to be printed out each 10th iteration. Additional information regarding the form of the 3SLS and FIML cards is given in a later section (section IX.D) on the "coefficients pool."

The codes (and the entire form of the control cards) are quite arbitrary. The important features of the control cards just discussed are:

(1) The control cards are free-field; hence, there is no need to punch particular information in particular columns.

(2) An individual code may be mispunched just as a number may be mispunched into an incorrect column in the form of parameters given earlier; however, with the form of the codes given here, a mispunched code may be readily detected by the computer routine, whereas there may be no way that a number punched in the wrong column can be detected by the computer routine.
(3) The control cards are open ended. If a new feature is added to the routine, a new form of control card or a new code may easily be entered to permit the user to use the new feature. (With the other type of control cards, insufficient free columns may be available; hence, additional control cards may be required; however, this may cause trouble to those who don't know about the requirement of the additional control cards. Also, a set of control cards used to calculate a problem prior to the change can no longer be used without updating the set.)

No matter what form of control cards is used, they should be printed out as they are encountered to provide a permanent record of the particular control cards used to calculate the problem.

C. Data Transformation Section

It is very easy to provide a lot of facility for transforming and editing data by providing a call to a subroutine (to be specified by the user) after each observation is read into the computer. Convenience to the user is enhanced by selecting carefully the arguments transferred to the subroutine and the variables provided in COMMON blocks to the user.

In the parameter cards given above for calculating Klein's model I, the FTN, SUBROUTINE, COMMON, DOUBLE PRECISION, RETURN, and END cards are described to the non-programmer user as "transformation form cards" which must be inserted whenever a transformation subroutine is used, and copies of these cards are kept on hand at all times by the MSU Computer Laboratory. The user merely helps himself to the prepunched form cards instead of punching his own. A manual giving simple transformations is available to the non-programmer user to help him accomplish his transformation and editing. (A user with knowledge of programming recognizes that the full power of the FORTRAN compiler is available in accomplishing the transformations.)

The transformation cards are inserted between the DOUBLE PRECISION form card and the RETURN form card.
The first transformation card in our illustration creates X14 as X4 + X5; the second transformation card creates X15 as X2 + X12; and the last transformation card creates X20 as the raw observation number (NR) plus 1920. (X20 is used further on to number each residual by the year of the particular observation.)

The variables on the COMMON card aid in such things as dropping observations, stopping data input, branching to particular sections if multiple SSCP cards are used, lagging variables, calculating moving averages, and printing out data labeled by observation number. Additional COMMON cards are available to assist the user in particular transformation tasks; however, they are rarely required.

The transformation section also provides an open ended method of reading data from miscellaneous storage devices.

D. Coefficients Pool

In the AES STAT system, a coefficients pool is automatically established within the computer into which coefficients from any single or multiple equations estimating procedure may be stored. Also stored with the coefficients are a record of the method creating the coefficients, which coefficients pertain to jointly dependent variables and which to predetermined variables, and the particular variable numbers of the coefficients.1

Various control cards are available for retrieving the coefficients and making calculations based on them. In particular, they are retrieved for the 3SLS and FIML calculations in the illustration. As examples of the types of parameters available, the SCE=1 (save coefficients as equation 1) code following the 2SLS code on the first K card designates that the 2SLS coefficients from the first equation are to be stored as equation 1. Similarly, the SCE codes on the remainder of the K (k-class estimates) cards and on the READC (read coefficients) cards are used to assign coefficients to equation numbers.
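The idea of the pool can be sketched as a small data structure. The field names and the dummy coefficient values below are illustrative assumptions, not the AES STAT internals:

```python
# A minimal "coefficients pool": each stored equation keeps its
# coefficients together with the method that produced them and the
# variable numbers of the jointly dependent and predetermined variables.
pool = {}

def store(eq_no, method, jointly_dep, predet, coefs):
    """Store an equation's coefficients and their classification."""
    pool[eq_no] = {"method": method, "jointly_dependent": jointly_dep,
                   "predetermined": predet, "coefficients": coefs}

def retrieve(eq_no):
    """Retrieve a stored equation for use in a later step (e.g. 3SLS)."""
    return pool[eq_no]

# e.g., an SCE=1 code storing 2SLS results as equation 1 (dummy values):
store(1, "2SLS", jointly_dep=[1, 4, 5], predet=[0, 12],
      coefs=[1.0, 0.5, -0.5, 0.1, 0.2])
assert retrieve(1)["method"] == "2SLS"
assert retrieve(1)["predetermined"] == [0, 12]
```

Later steps, such as retrieving starting estimates for 3SLS or FIML, then need only the equation number, which is what the SCE codes supply.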
READC cards are used to input identity equations for Klein's model I; however, they could be used to input any set of coefficients.

The numbers following the "3SLS(" but before the first "/" on the 3SLS card give the equation numbers of coefficients to use as starting estimates for the 3SLS estimating procedure and, by implication, the structure of the 3SLS system. The numbers between slashes designate identity equations in the system. These identity equations are not used by the 3SLS estimating procedure and could be omitted except that the RF (reduced form coefficients) code designates that reduced form coefficients are to be calculated and printed and the RFRES (reduced form residuals) code designates that reduced form residuals are to be calculated and printed; hence, the identity equations are required to complete the system so that reduced form coefficients and residuals can be calculated.

1Coefficients from the pool may be reclassified or the equation renormalized at any time through the use of RECL cards.

The variable numbers following the slash are instrumental variables to be used in the 3SLS estimation procedure in addition to the predetermined variables in the subsystem being estimated (equations 1 through 3).

If the SCOE code had been used after the 3SLS code, the 3SLS coefficients would have replaced the 2SLS coefficients in the coefficients pool and therefore the FIML procedure would have started from the 3SLS coefficients instead of the 2SLS coefficients.

E. Special Files

1. Data files

In the AES STAT system, two data files are available under full control of both the program and the user. Either raw or transformed data may be stored in these files by user control codes. If sufficient capacity is available in the main core memory, these files will be carried in memory; otherwise the routine automatically establishes them on the magnetic drum or (at the user's option) on magnetic tape.
Any number of additional data files may be established by the user in which he exercises primary control through use of FORTRAN statements and executive functions in the transformation subroutine.

2. Intermediate storage files

Two intermediate storage files are presently used by the AES STAT package to (1) store information not needed for the immediate calculation (as examples, the coefficients pool, and the full sums of squares and cross-products matrix when only a small part of this matrix is required for a particular calculation) and (2) provide additional storage capacity. These files are automatically established within memory by the AES STAT package if sufficient capacity exists, and they are created on the magnetic drum if insufficient capacity exists. User options also permit establishing these files on a magnetic drum or on magnetic tapes.

3. Matrix storage files

Not presently in the package but to be added are codes which will establish and write particular matrices into files in a manner such that other packages or extensions to the AES STAT system can readily reference them. For example, a simple correlation matrix could be created by the AES STAT system and then used by a factor analysis package. As another use, a simple correlation matrix or matrix of sums of squares and cross-products could be stored in a file and retrieved and computation continued at a later date instead of requiring the AES STAT routine to again start the calculation of a problem from the set of data.

F. Incorporation of ŷ and û Directly into the Sums of Squares and Cross-products Matrix

Let

(IX.14)  ŷ_μ = Z_μ δ̂_μ

and

(IX.15)  û_μ = y_μ − ŷ_μ = y_μ − Z_μ δ̂_μ ,

where Z_μ = [Y_μ : X_μ] and δ̂_μ is the vector of estimated coefficients. Also, let a sums of squares and cross-products matrix be defined as Z'Z, where Z includes the variables in Z_μ. Suppose that it is desired that the ŷ variable defined by (IX.14) be added as an extra row and column to the Z'Z matrix.
It is more accurate, as well as saving of computer time, to accomplish this directly rather than by calculating ŷ, forming [Z : ŷ], and then forming [Z : ŷ]'[Z : ŷ].

To incorporate ŷ directly into the sums of squares and cross-products matrix, we form the [Z : ŷ]'[Z : ŷ] matrix by forming Z'ŷ as [Z'Z_μ]δ̂_μ and ŷ'ŷ as δ̂_μ'[Z_μ'Z_μ]δ̂_μ. The new sums of squares and cross-products matrix is:

    [ Z'Z   Z'ŷ ]
    [ ŷ'Z   ŷ'ŷ ]

Similarly, û may be added as an extra row and column by forming Z'û as Z'y_μ − [Z'Z_μ]δ̂_μ and û'û as y_μ'y_μ − 2δ̂_μ'[Z_μ'y_μ] + δ̂_μ'[Z_μ'Z_μ]δ̂_μ. Thus, the new matrix is:

    [ Z'Z   Z'û ]
    [ û'Z   û'û ]

In the AES STAT package, incorporation of the ŷ or û corresponding to any set of coefficients directly into the sums of squares and cross-products matrix is accomplished by the REDO control card. For example, the following REDO card will retrieve the coefficients of equation 5 (they may have been created by any method or read directly into the computer), incorporate ŷ into the SSCP matrix as variable 10, and incorporate û into the SSCP matrix as variable 14:1

    REDO(5)YHSSCP=10,USSCP=14

1ŷ and û may also be added directly into the transformed data file if desired. For example, a YHTD=10 code used on a REDO card will incorporate ŷ into the transformed data as variable 10 and a UTD=14 code used on a REDO card will incorporate û into the transformed data as variable 14. The variable number of a YHSSCP code need not be the same as the variable number of a YHTD code even though they are used on the same REDO card.

G. Estimated Values of Normalizing Jointly Dependent Variables, Residuals, and Related Statistics

The calculation of structural and reduced form estimated values (of the normalizing jointly dependent variable) and residuals for each observation as a user option seems very important in a simultaneous stochastic equations package.
A feature of the AES STAT package which has proved quite convenient (in cases where a logarithmic transformation has been used in the creation of the normalizing jointly dependent variable) is the option of calculating the anti-logs of the actual and estimated dependent variable for each observation and the difference between these values for each observation. If the dependent variable has been transformed to logarithms, then these three calculated values correspond to the original variable, the estimated value of the original variable, and the residual in its original or natural form. This can prove very helpful in analyzing the results. All three statistics are printed for each observation as a user option in addition to or in place of the regular actual value, estimated value, and residual in logarithmic form.

It is also desirable that a new SSE (sum of squares of error) and R2 be calculated automatically from the residuals in anti-log form whenever they are calculated (a user option in the AES STAT package permits the new SSE and R2 to be calculated even if the residuals in anti-log form are not actually printed). The SSE and R2 so obtained can then be compared to the SSE and R2 obtained from estimating the model in natural numbers (i.e., from estimating the model in some sort of linear form, quadratic form, etc.).1

When calculating residuals, it is simple to calculate the Durbin-Watson statistic,

    [ Σ_{t=2}^{T} (û_t − û_{t-1})² ] / [ Σ_{t=1}^{T} û_t² ] ,

and print it out. The user can then use it if it is applicable. (The Theil-Nagar statistic is the same as the Durbin-Watson statistic, and the Von Neumann-Hart statistic may be readily calculated by the user as T/(T − 1) times the Durbin-Watson statistic.)2

When printing out residuals it is often helpful to the user if each residual is labeled by one or more numeric or alphabetic variables. In the computer output given further on, each residual is labeled by the actual year to which it applies.
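The Durbin-Watson calculation, and the Von Neumann-Hart conversion just noted, are simple enough to sketch directly (the residuals below are illustrative dummy values):

```python
import numpy as np

def durbin_watson(u):
    """Durbin-Watson statistic:
    sum_{t=2}^T (u_t - u_{t-1})^2 / sum_{t=1}^T u_t^2."""
    u = np.asarray(u, dtype=float)
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

u = np.array([0.5, -0.2, 0.3, -0.4, 0.1])   # illustrative residuals
dw = durbin_watson(u)
T = len(u)
von_neumann_hart = dw * T / (T - 1)          # conversion noted in the text
assert 0.0 <= dw <= 4.0                      # DW always lies in [0, 4]
```

Values near 2 indicate little first-order autocorrelation in the residuals; the user judges applicability, as the text says.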
1It would be preferable to generalize the residual subroutine slightly to permit calculation of any function of the dependent variable and the estimated value of the dependent variable, since the dependent variable may be transformed in other ways besides just the logarithmic transformation.

2Basic references regarding these statistics are: Theil and Nagar [1961], Durbin and Watson [1950], Durbin and Watson [1951], and Hart [1942].

H. Weighting of Observations

It is very easy to provide for weighting of observations as a standard part of the simultaneous equations package. By weighting of observations we mean that the ijth element of the sums of squares and cross-products matrix is formed as:

(IX.16)  Σ_{t=1}^{T} c_t x_ti x_tj ,

where the c_t are weights designated by the user.1 (If all of the c_t are 1, we have the usual unweighted sums of squares and cross-products matrix.) For the weights to give the correct "degrees of freedom" for estimates of variance or statistical tests, the sum of the weights should be T; however, this imposes an unnecessary burden on the user. All that is necessary is that the user specify the relative weights. The relative weights will be automatically adjusted so that their sum is T if the sums of squares and cross-products matrix is formed and then the entire matrix is multiplied by T / Σ_{t=1}^{T} c_t, i.e., the ijth element of the matrix used for further calculations is formed as:

(IX.17)  m_ij = (T / Σ_{t=1}^{T} c_t) Σ_{t=1}^{T} c_t x_ti x_tj .

1Weights may be based on the number of observations in substrata of the sample as compared to the population to which inference is desired, the inverses of some estimates of variance related to the observations, the length of time since some base point, etc. Klein [1953], pp. 293, 305-313, Goldberger [1964], pp. 235-236, 239-241, 245, and Johnston [1963], pp. 207-211 contain material on weighting of observations, mostly in adjusting for heteroskedasticity.
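Formulas (IX.16) and (IX.17) can be sketched as follows; the function name and data are illustrative assumptions:

```python
import numpy as np

def weighted_sscp(X, c):
    """Weighted sums of squares and cross-products per (IX.16), with the
    relative weights rescaled so they sum to T, as in (IX.17)."""
    c = np.asarray(c, dtype=float)
    T = len(c)
    M = (X.T * c) @ X            # m_ij = sum_t c_t x_ti x_tj  -- (IX.16)
    return (T / c.sum()) * M     # multiply entire matrix by T / sum c_t

X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, 4.0]])
c = [2.0, 1.0, 3.0]              # relative weights; need not sum to T
M = weighted_sscp(X, c)

# With all weights equal, the usual unweighted matrix is recovered:
assert np.allclose(weighted_sscp(X, [1.0, 1.0, 1.0]), X.T @ X)
assert np.allclose(M, M.T)
```

The rescaling step is exactly the automatic adjustment described above: the user supplies only relative weights, and the "degrees of freedom" come out right because the adjusted weights sum to T.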
In the AES STAT series it has been convenient to form and carry the sums of the variables (either original or deviations from means) by including an extra variable, x0 (a variable taking on the value 1 for all observations), when the sums of squares and cross-products matrix is formed. m_00 will then be Σ_{t=1}^{T} c_t and m_0j will be Σ_{t=1}^{T} c_t x_tj. If the entire sums of squares and cross-products matrix is then multiplied by T / Σ_{t=1}^{T} c_t, the weighted sums of the variables will have been correctly adjusted as well. The same adjustment procedure is used whether the variables (except x0) are actual or deviations from means. The simple correlations matrix (A*) may then be formed as in section IX.A.2.c and the problem calculated.

In the AES STAT series, weighting of observations is automatically imposed through use of the WOB code on the SSCP card. If a WOB=10 code is used on an SSCP card, each observation is weighted by the value of X10 for that observation. (X10 may be punched on cards along with the rest of the data or calculated in the data transformation section [see section IX.C].) The sums of squares and cross-products matrix is then automatically adjusted by T / Σ_{t=1}^{T} c_t before further calculations are made.

J. Checks Against Errors

As many cross-checks against errors as possible should be built into a computer routine. Following are examples of cross-checks against errors built into the AES STAT package:

(1) The error sum of squares is calculated and printed out for any coefficients read into the computer through use of the READC card. If a set of coefficients represents an identity equation, the error sum of squares should be zero. Due to punching errors in the data, the requirement that the error sum of squares be zero is often not met.
Unfortunately, users are usually lax about checking that the error sum of squares for each identity equation is zero, so we plan to revise this check so that it is performed by the computer as each 3SLS and FIML system is being set up for calculation. Thus, if this check is not met for an equation specified to be an identity equation, the 3SLS or FIML system will not be calculated and a message to the user will be printed.

(2) The ranks of matrices are calculated in the process of orthogonalization and inversion. If in a later step the method requires that certain rank requirements be met, this check is performed by the computer and a message printed out if some requirement is not met. As an example, rk X_μ and rk X_I are calculated during the calculation of the [Y_μ'Y_μ]⊥X_I matrix for k-class estimation. Assuming that linear restrictions are not imposed on the coefficients, if rk X_I < n_μ, unique 2SLS and LIML coefficients do not exist;1 if rk X_I = n_μ, 2SLS, LIML, and many other estimators coincide; and if rk X_I > n_μ (the most common case encountered), 2SLS and LIML estimates do not in general coincide. The user is notified as to which case is encountered, and if rk X_I > n_μ, rk X_I − n_μ is printed out.

1The equation is under-identified.

Where possible a complete check for a given condition should be made and action taken by the computer routine rather than merely noting conditions on the computer output and relying on the user to notice the condition and take action when required. Many users merely skim through the output, giving almost no thought to messages printed out. Only by not calculating part or all of his results can you obtain a user's attention to many types of errors. (The analog of this is the proverbial farmer who bats his mule over the head with a 2x4 to get his attention.)
For minor errors, we have used the practice of printing out many asterisks in conjunction with error messages in the hope that the user will be sufficiently curious to look up the error. (Most error messages in the package are given a number. A separate manual then explains what is wrong for each error number and, for many error numbers, steps the user can take to rectify the error.) Also, the total number of minor errors is prominently printed out at the end of the user's results.

To assist in checking out the control cards and data before calculation is actually started, a SCAN card is available. When a SCAN card is used, the data are read into the computer and some basic statistics (but not a sums of squares and cross-products matrix) are calculated. All control cards and codes are then checked for consistency.

1 The equation is just-identified.
2 The equation is over-identified.

For example, dummy coefficients are saved in the coefficients pool so that the routine can check that starting coefficients have been specified for 3SLS and FIML. After all errors have been corrected, the SCAN card is removed from the deck and the problem calculated. Considerable computer time is saved when the SCAN card is used to detect errors before actually calculating the problem.

Some errors must be detected by the user, with the computer routine merely printing out statistics to aid in the detection. Data errors are usually of this nature and cause much difficulty as a result. Following are some statistics printed out by the AES STAT package to aid the user in detecting data errors:

(1) First raw observation, i.e., the first observation as it was read from cards or from a file. (Usually a mispunched format card can be detected by checking the first raw observation.)

(2) First transformed observation, i.e., the value of each variable listed on the SSCP card for the first observation incorporated into the problem.
(An incorrectly specified transformation can often be detected by checking the first transformed observation.)

(3) Number of observations read, number of observations dropped, and number of observations in the problem. (Considerable editing of data is often performed in the transformation section. The number of observations in the problem may depend on the transformations and the data itself, since observations may be deleted by the transformation section.)

(4) Sums of the raw observations. If the sums of the raw variables are obtained from the basic data source, these sums can serve as a check on transcribing as well as data punching. (Transcribing errors are likely to be a much larger source of data errors than punching errors, especially when card punching is verified by repunching the cards on a card verifier.) It is usually most convenient to use a hand adding machine to get the sums, with the individual sums being kept on the adding machine tape. A sum need not be obtained twice, since it is checked against the computer output. If a sum does not agree with the sum on the computer output, the error is easily traced by comparing the adding machine tape to a listing of the data.

(5) Minimum value encountered in the data for each transformed variable.

(6) Maximum value encountered in the data for each transformed variable.

(7) Means of the transformed variables. Although the exact means are usually unknown, their magnitudes should be known. A mean of the wrong magnitude may reflect many possible errors.

(8) A list of variables which are constant and a list of variables which are zero for all observations. Usually a variable which is constant or zero for all observations reflects a transformation error or a failure to provide a transformation.

Some of the above checks may seem cumbersome; however, the ease with which errors are made and the effect of errors on results make the returns to such checking extremely high.
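Checks (5) through (8) amount to simple per-variable screening statistics. A minimal sketch in NumPy, with hypothetical names and not the AES STAT output format:

```python
import numpy as np

def screening_report(X, names):
    """Minimum, maximum, and mean of each transformed variable, plus
    flags for variables that are constant or zero for all observations
    (usually a transformation error, as noted in the text)."""
    report = {}
    for j, name in enumerate(names):
        col = X[:, j]
        report[name] = {
            "min": float(col.min()),
            "max": float(col.max()),
            "mean": float(col.mean()),
            "constant": bool(np.all(col == col[0])),
            "all_zero": bool(np.all(col == 0.0)),
        }
    return report

data = np.array([[1.0, 5.0, 0.0],
                 [2.0, 5.0, 0.0],
                 [3.0, 5.0, 0.0]])
report = screening_report(data, ["Y", "CONST_VAR", "ZERO_VAR"])
```

Here CONST_VAR would be flagged as constant and ZERO_VAR as zero for all observations, the two symptoms singled out in check (8); a mean or extreme value of the wrong magnitude in the other columns would point at checks (5) through (7).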
Those familiar with the statistical methods given in this paper will surely agree that a single mispunched data element may have a very drastic effect on the statistics calculated.

K. Computer Output

Following is a reproduction of the computer output generated by the parameters and data for Klein's Model I listed earlier. Some handwritten notes have been added to call attention to particular points in the output. Also, some blank lines and mostly irrelevant material have been removed to save reproducing costs.

Due to the particular method of output used, no number which occurs on the printed output contains more than 9 significant digits. This holds even though given numbers might contain 23 significant digits while in the computer. Following are some numbers occurring in the output:

Number Printed Out      Number Should Be Interpreted As
1133.89999998           1133.90000
37275.86999989          37275.8700
20.2782089394           20.2782089
1.5261866857            1.52618669

In comparing estimated variances, covariances, and statistics computed from estimated variances (such as standard errors and coefficients divided by their standard errors) in the computer output with reported results from other sources, it should be recalled that a "degrees of freedom" adjustment of T^2 / [(T - n_u)(T - n_u')] is used in most of these statistics in the output, whereas for results reported elsewhere no "degrees of freedom" adjustment, or a different degrees of freedom adjustment, is often made. For example, in results reported in Goldberger [1964], no "degrees of freedom" adjustment is made. In some cases coefficient variance-covariance estimates have been printed out both with and without the degrees of freedom adjustment to aid in comparing results with other sources. For variances, covariances, and related statistics, the computer routine prints out an indication of the denominator used in the computation.

Total execution time was about 40 seconds.1

1 About 2.5 minutes of "set-up" time was also required.
The "set-up" time will be reduced drastically if many of the subroutines are transferred to the regular library file. Current charges for the CDC 3600 computer are $330 per hour plus $.01 per page of printed output. Thus, due to the large set-up time presently required, the total cost of calculating this problem on the CDC 3600 computer was about $18.00 ($3.67 for execution time plus $13.75 for set-up time plus $.65 for printed output).

[Pages 385-396: facsimile reproduction of the annotated computer output for Klein's Model I; the printout is not legible in this transcription and is omitted.]
oc»~a..... .cnona.... ..aaacoa... anc.ac.... a anaccooo... ca..omo... cacaoco... occaaoao... onaonooo.o oaaooaao... nac:.aa.... “manage... onconooo.oo accsccna... anaoca.... I I I guard! o-occa . «a socnonoo.-. uacncoo... a..cc.c...o nuance...- c....c..... «...-cc... c oucnsooo.o ..:c.a.... a...cc.... accnoce.... nocnn.°... .ac.:.a.... ca...cac... auanona... ....cco.... cnaonca.... a.......... ~.c»~..... >oun¢¢> .000 9.0014 s an cnccuaoc... aoaoa..... caucouoo.-u cap-...... scone-....- nuances... c nun-coo... c~.ancoo.. ococoa9... occonaoo... ou.«vooo.o a..nnna.... aoccacna..- ccauaoa... Novuosoocou ”acne-ac... oncoococ... ccancua... >oo.:c> :mnu OOH CROdCCdNOOHO I. ”d can 4 m an» ac a sqm" auoacc : >u 2:33 4 26 : oucacc : . a:cau:ou -:_a au-oca a a:caa:ou oa-acc c cacaca : . at.» about; - brcpuzou a:.a cacocJ a a:caa:oo . a auoacJ : ...-c: c a.» ‘ . c . accaazoo .ccso - ouoaca u aw-OCJ racrnzo . auoo.ac:a». awo¢u>zou. :oac:.:ocwa wxa ac nuns....:.a...1.:.aauo a: can. an; mocha n w>uvcnah~ auo¢w>couu awn-c4 a aacaazou an . «coo... aacnc.. ......a coca»... ......a w 0 .nvsn... 0.00.... «oven... 00....“ auaacc : «a coon».- anon».- anaa... «noon... ......a ..4 macaw n u».>c:up. awo:m::oo. awoocc : an nova“... anoa... caoca.-o ...-... cocoa... coco-.n .138... 1.15.. 533.191....” .:.4 ~c...cc~...a .acaaa ......aac..~ a: a o n a a ......c .aucc.. nucca... ......a uncan.. . _ ......u own—ac::o: a.:ac: :ouocca muzcacaam_o «a a a .u a ...»...a.. ~c.~n..:.. oaannann... «cc-nan... «gonna...- acaoaana.a baa-ac: aoo.:c: uoacacaauaa wrap .c ......a azcamzou : awooca : axcauaou a a o c. «a o n o cocoa... n...a.. accna.. .acua.. canon... .ccca.. acnno... «cc...- .cn.n... n.a.... cou.... acacn.. .c.c... c..~.... .nan~.. accac... «can... ooaa..c accaa.. «...-u. aacnanc non.~.. .cnacuco . voaanuco an.a.... .a..u.u annua... aaanu.u. noacu.. ccuc..u ocoon.- «econ . nncn . coco . n no .u o. no a ......a caaco... .nca .. nac.~.. a acc... nacca... ......a non-.... manna... .nac .. .naan.. 
......c ounc..c. can: ... ccona... ...-... anccc... nanac... ......a canon... on....u awn—chcoa c.¢pc: pouocc’ sumo CON” «an on .. . Wflunmnu_ ...c- a: 4” ...... an - 0 Au...» ......” .3 liar acaa auoocg a azcan:oo GUOO¢J I oaooca ruchuxo auooca Q radpntoo I ‘10 090.3} 3.». u a..." 3\...c .3. mac»: :.4 on. nmmwuwwmumu wozcxu smou x‘x ca.« «ouncmc.a nag-no... ocoooeo.° «aaaa . cxxca run a pan to». mac». .:-a. cocoocoao ow 50 turn u— wardxu awou uwc xct ooooceoao Ivan ax: an. ac: 99009.9.0 Ix—J ago «ac—mum aux—4 wt.» so .01 macaw. :ua. gap» or 2.: I... 33rd ...-+434 "Hm—“3s, lot—6. 04+ .3 3335!. 1:33? Illlva a: ‘0¢lb. 0*.tu Cd—Jdutg *‘U‘CO‘O‘ i=‘«0m1 ha I .accu m>ooc oa .oacu w>ooc ca .ozcu m>omc o» .oacu w>oac a. .acco .nscu nuzouwa on.~ u:.a a:w::au aucc mu:— oosoaxoo caca «a . uaac u:. v on mu 0 a:_a awoaca u zo_ac:2.a:oo. :o.ac:2_a:ou. 2°.»c:z_azou. zo_ac:2.azoo_ Joaazoo acaa. 4o:»:oo acau. awmt<4w pzmcxzu .o.. - ....ooo.. oa......a ......... .a on a_:_4 m:.. «a». a.4 4c_a:c: anon azax: ac:.:aa_ a4m>m4 mozwacm>zoo :zua. :3, 3a a ca 9 n c ,c nuaocc : _ awaecc : azcamzon : s u a :aanauaaa :. «cc: : s cc nu «u «a o a : m:_a nmaocc a amoeca a nuns-4 : aacauzoo :u..a. u:.a:a .. .ucc: on:: ma c n a a c a a m a: a o a : :mauau u:.a:a :. ..cc: ooo:u uacmc xocu ow>cu panaoa>u¢c uacmc :ocu aw>cu >4m3o_:u¢. uocwc toes om>cu >4m:o.>w¢c uncuc toss am>cn >4maou>w¢g uacmc nous om>cn a4m:o.>u¢: can“ xocu ow>cn >4uao_>u¢t aamu :o:u aw>cu aaaao_>u¢: - mama tog: ow>cu >Jmnou>mxg .‘u mug-ummrctavuw 92—12—0w0. on no: guano .- oa co. asaou .: a. co: aaaou .c co. .caoo "a a. co. ..aoo c an ac. .aaoo .» .- co: ccaoo .a on so: nuuoo .u dfln'DONC O U - man—x..ao..aoou:.:.4ux.a.uua~:.4 mamou.uam.::.o~.:z.mwc.:.:u»a.:u:u.au>u.u>u40:9Ju.4:_a uncaura: aoao.u>u.:o>o.o>a.u:4:4 . .....c:a.~.a.c:~: _ Juno: :_w4:..a4u~ :o:: ¢:_a:caa 4:.: .4:4 : Ha ammo: :_waa..a4a~ :o:: czaaacaa 4:.: .J:4 _ 411 u:.a sumac; w aacanaou u awoocc a nun-c4 : on an o a me an ocncoaoo.a «a.a~..... cacao-a... accuacco... 
cvcsco .... «noon-.... .c as». caocoao... acac~na.... accanacc... oncnmucc... ccoa-.... an aaoac; . nococcnn.a canocaa.... «cancnco... acanauac.. . aacumpmo on:.ou.... muons-.... osccuaou... c u c..a~a.... no..an..... ac ococca a ounuaouo.o ac coo-c4 : azcaazou a awoocc : axcanaoo : : o v an a n v . ‘3 cacona~..° ocoaoaoo... cacac..... aaanac9... oaocaoco... acnuaa.... .a u:.a can. nauoaano.o occonuoo..o oaancaaa..u cnoo~o°... onsaaeao.oo nauaoaa... an ouooca a a. nocaaoon.n ~oao~c~a.°- napooan... conconoa..u onouoao... anonnoo... . azcanroo o naocnooa.e. «annowoo.. aoncocao.. o~coa~a... ccaocaoo.. ooccoao.... c u nocsn «neacacu.o. canonnoa.. ocaoaaoc... cacnano.... emu-no...- nanua....o- an cmcaca : - nonooanc.e nmnaanna.eo can-ccoo.. «couuca...o anacoeoa... commune...- ca au-acc : auacaanu.nn nsoooc.a... ..aooca.... acacc~a~.a ocan~aa.... cccaonna.. . aacaa:oo oaooca~... cocanno.... amoaccc.... ancooao°.. «cocouo... c : .acuuaa... .....a..... .~.nn...... unmanaoc... an oucocg : .a.~..a..~ ocunann.... uncooca.... . a:can:ou canoaa.... on...«..... n : ovnnuuu... c ; :oac:_:o:mn wxa ac nmma....:.a...a.:.a.ao a: emu...) I .noo:.4w:.4 :::.ac: qua—:cu:_4. ana_:ac: ,ou.cc: aaoo .nmucuna-c.. unuaoucan~.. acacooaccn.u aansauncnn.. ....unac...cu an o as.» sumac; u aacaazoo a a: . .o aoaucaau ncooamcuau... .2329... oaoccu..c~.~n «SS-«Ra... a=a-«.....co £1.98 a «a v aaoocc : awouca : aacamaou . . «sung. ~ soaacao- .oowa.u c...onn..~.. ..nnonoaac.ca cascam.osa.a ocnvcnuaaa.. .....u.....c. actua- nuoocc : aacau:ou : a . o c zeaacaca , x c aaasi. Aoanrogr.kwa.u PBS 38: .552: 3259...:— vrfluT L a 2: gaggiloclctd 44.4 - altfi W‘\ a»: n .c 1411.0003qu 98.“ n \tfiulpmrlm flue «2.3%... ”2.2... 83.2.11. :3. a 5*...- illlllwl 4. mo: .6 u o c: c m :22. an: a Eu 3: mac: 5: calls anti.“ ..oocos.n ca so gm»: .. warctu unou unc ac: ~..oaav.~ spam as: mac ac: cuoooon.o snag can .xano..u42Juuqaxccaas.a.s..s¢ ucyuaxauccc nay (it... ...... J. :«...¢I.i$.:...t .u 5.5... 3.... 
a...sn.a .muc 2.: .:~: .nc...a.a .ac:..9.:»u: so acc:-_u .a «a». .81 .1... ...J .3 ...... 33.5... .... .... r43...» 23...... an...” “2 I to. d . .. v 0. >19. . 90 "u“.ununm “3...”? ..nwu.....~.c 3......w..h~.o...n.o..o.n.. $1.3m. ...: .. . .. . . 3 . . . ......8..¢...1...¢ ”.57... “Wong!!! “u“.“udn «3...... Sana.“ “Hanna?“ "unnumuum $3.3m ...... .... . uals... . 8. . . . 2.3.... .33.... .....- .u a . . . . a. .. I do . . on u on. a . so. a. .8 .. 2:... .411... ... .... 1. .... "gnu.“ gunman ”nuns ...”...H. 2...... ”Bea...” 1.3".” A" m.:c:u .38 ac: HMS»; .....Mfi: .“n:.u..u. gain“: $553.5. 11...“... a . m c. .... coconvd.0 O 03 0" 3’. 3..., ”s! .916 .83”me . 3.» 3. \ ... .. .o ...m gig}. he. ..c xc: $2.33.... .3. .:: ... ac: 39.2... ...... 3. . . . In .‘!31. .u v i‘. -"I..’3I-.‘”--I- --."-- -- - --.-..‘O. H at P. «...-.... ’24-" .3...‘ d .... 93 note-83¢.- I I U”.— UI— .. gun It Menu. ”I‘M“ 01.. «.14. 3.3.1... 8...... .....BH 23...?! a 4... ...u... ...u... ..."..2 . . ...): 1.. 3.1.38.3... .JIJ . g 21.1.... 1...... 8...... ....uucuun . ._. 358 38:33.... .522: 3.2.3:... 3.3.56: :32: 3...... 8.3.5.... . . .. Jena .- u u o .. . J 334...... 3.33.”. ...»...n... . ... 1m. 3...... .. ......t”. . . . c.5153. .3 .c 3.2.3:.523:.:5 ... ...... .22... . . o ......» \ x \I 3.. 3...... .551: 5.2.3:... ra.:.c: 3...... 8:35.... u:.. . A .... a...c~.. 33...... . . .m..cu.a ....cua. ...... .. . 2.2:... 32...... .... ...... . .....S... ...-.23.... ...nm...... "uuuuuuuhu “unmunuunu 3 ...... .....3... 2.2.3.... 3.3.3.... 33...... n. . “manna." £33.... 32.33. 25.2.... ,. . ......o... 3.32.... .. ....c. : 33...... 3 ....c. . .35“... g 533...... 24...... : . c . .392... 3.1.2... 33...... . n . A "mumnuumnn "mace“.uuu. 3.38.... Human?" “unnuuuunun “nunmuuuuu ...“ 38““: 3mm. ...... . . ......a... 8.3.2... .... . ... . . c 32.3.... .322... ......o... a... S... a . . 33...... . .:c..:. 2.22.... 3.22... 83.3.... 83...??? 3.3.... 3.38.... . u 4...: 32...... ...... 2... 3 . . . 3.33.". ......3... 
.. ...... a :33... ...: ...... rams.» ”mum”. mug...” new“... a, ...... . ......u... 23.2.... 22.2.... 33”.“... ..“M...... . 3.5:... 33...... 33.3.... 3.32.... 3.33.... n. ....c. u .212... 9.233.. 3.23.... . 2......» 2...... . 3.32.... .. : . 332:3... w... .c 3.... . .. .. I I 3...... . . . ' I I 58:38.... .552: 22:31.: I .. a a- 1:2: 3...: ..3 613 555..5..5 555..55.. 555.55..5 555..55.. wuzczu smou :.: 555.55..5 555.555.. 555.555.. .55.555.5 555.5.5.5 wuquu 5wou x4: «outdoo.~ 555.5.5.5 555..5..5 555.555.. .55.555.. muZ¢xu swou x... 55..555.5 555..55.5 555...... 555.555.5 555..55.5 555...... muzcxu uwou x‘x ..5.55... .55.555.. .55.55... 555.555.5 5.9.5.: .55.555.5 .55.555.5. .55.5.5.5 .55.:55.. 5.3.55: 555..55.5 moo-555.: 555.555.5 woo-505.» 5.9.5c: 555.5...5 555.....5 555.555.5u 555.5...5 555..55.5 5.0.5:: ....555.5 5.5.55.5 555.555.5 555.555.. 555.555.. 555.5...5. 555.555.5 555.55... .55.555.5 555..55.5 555.5.5.5 555.55..5 55:c:5 .555 5c: 5.5...: o (0.35.5.3. 3.50...» .13 ‘05?!- .. .5 .555 5. mo:cxu .555 55. 5c: .5 .o 55.5 5. 555cxu .555 55. ac: .55.555.. 55..555.. 555.555.. 55..555.5 .5....5.. :.5 :w: .5 .5 .555 5. 55:c:o 5555 55c 5c: 5.....5.5 .55c :.: .5...55.5 .:.: ......5.5 5.1.385... .4 1.1.... .2... 5. 3.. 55...55.5 ..5.5.... 5....5..5 5....55.5 55...55.5 5:15: pm: 555.5...5 55......5 55....5.5 555.555.. 55....5.. 5 555 55....... 5.5.55... 555.5...5 555...... 5....5... 55..5.5.5 . . .5..5.... 5.5.55... 55..5...5 .....5..5 555.55... 555...... 555.555.. 5...555.5 5.5 :w: 5::ca 555 5 555 5.55 .5 .5 ...5 .. 55:c:u .555 55c 5c: 555.....5 .... 5:: 55c 5.: F 5.044593 J 1530-: PI... to US- a 55..5.... 555...... 555.555.. 5.5.5.... 555.555.. 555..55.5 . 5.5.5.... 555.555.. 555..55.5 555.555.. 5.5.5.... 555..55.. 555.55... 555.555.. «.5 5m: «xtc5 555 a 5mg 5555 555.5.5.5 ..55 5:: 5c .c: a 33...... ... 1+8... 5». 5. .... . 555.5.... .55.5.... 555.5.5.. 555.55... 555...... 555.555.. .5 5.5.5.... 555.5.5.. 555...5.5 555.555.. 555.55... .55.5.5.. 555.555.5 555.555.5 5.. 
1w: «xxc: 5mg a 5m: to». 555.555.5 .555 5:: 5c .c: .u 3.2.1.... I» 53...... 5. .... . 555.55... 555.55..» 555.555.. 555..5... 555.5...5 55......5 5 5.... 5 .5..555.. 5.5.5.5.. 55...55.. ....55... .55.55... 555..55.5 555.555.. 555.555.. 55....... 555.5.5.5 555..55.5 5.5.555.5 5.. :w: «xxca 5mg 5 5m: 5555 3.... .....55.5 555.55... 5.0.5.5.n 5.59 .555. 5.55. 5.5.. 55... 55555 :5.. oaoooss.s .254 :40 .55.. 5.55. 5.... 55... ....5 :... B588 ouocnhs.s can; 040 .55.. 5.5.. 555.. .55.. 55c». :55. 5....5... .554 555 .55.. .555. 55555 5.... .5... ....5 :55. .555 5:: 55c 5c: 555.555.. .:54 555 .5c:..5.:55: .5 5...... .5 ..5.. .555. 5.... 55... 5.... .5... 55c55 :55. .0... 5.5.nno.5 55 5a 5555 5. mazcxu swou an. ac: 555.555.. :54u 5:: an. 54: 05......» 55.4 94° .55...5.5 .55c :.: .5....5.5 .:.: .....5..5 .:.:..o.:55: 55 54c...5 .. :55. . Q..-4.8.. .4 3.. 52.5. ...... 5.5. ; . 55.. 5 5. .55 .5 555..55.. 555.555.. 555..55.5 55..555 5 .. 55“..5n.. 55....5.5. 555.555.5 555.5...5 55....5”. . . . 5.... 5 555.5...5 555.555.. 55..555.. 555.555.5 55..5.5 5 555.... 5 55.5.5 555.. mozcxu smou 5.: 5.5.5:: «.5 :w: «xxca 5mg 5 5m: 5555 55.5 555..5... .5 .5 ...u .. wuzcxu .555 55c :c: 5...5.5.5 .... .:: 55. 5c: .5...5... ...: 555 55..5... .... :.: .:.: ...5..5.5 .5.:..u.:.5: .5 55c.5.5 .5 .... .....52 ......3 3...... . .... .... ...... 50(305863 d‘ 2‘05.‘ 9.935008 3 0.11. v *o‘dlv .88.... ......... .83.... ...-8... ...-4.... 8.8.... ...: .. ..8..... 88.8.. .83.... 888... .8...... 88...... 8... .. .88.... 88...... 88...... 888... .83.... .88.... ...... .. .83.... 28...... .83.... 888... .85.... 8.8.... ..... .. .88.... 88...... .88.... 888... .85.... 888... ...: .. .88.... 88...... .88.... 888... .85.... 88.8.. ...: 8.. ...... .... .2. 8...... 8.. ...z ...... :8 . ... .... ...... .... 616 3.!”- E a. ..o a»; .. .52.... .300 ... x... .5 .5. 2... mm. x... .83.... ...... 3. Juan .00 0000...... u 8...... .... 2:. g :.:. .88.... .88... .... .. 3;... ... ._.: ...l . 38 a I... .13 ...... .72.... 
8 40. “in,” Us esofitoauwlh “I.B.:hz—“uv go...‘ {:.:-1...". O. . 13 d . vs. 3... .... 6 ...)... ... 4 2......8 8.3.1. n.eooo.oo.oo..3. v. :53. u 83.3.88 .....M...... .....u...... .....u...... ......88... . . ...: ...... . ......zou . .. . ...—...... 1... , .98“. J. 2 P...» 88.83.... ............ ............. 88.88.... 8888...... 3...“... o .. i d... u. .. o v a V .... 1. 1.... _ ... ...... x ...... . .21....» . _ 0.8.»... 0: 0+ 93 3.4.3.1. m 22:30- “#3....” «1:. 81.1.. 1.... ...... 8.8.3.8.. .....n....... .....M...... .....n....... 8.8«88... .. .... ... .18 0.8:... J... ...... . 3...... a . u . .....ao- .... ..z 0.2. 443. .o a... 1.2. m.zw.u....ou o. 3.481. ... 1...... ... 5. . 88:... .88.... .83.... 888... .8...... 8... .. .83.... 88...... 88...... 888... ..8..... . . ...: .. 888... .88.... .88.... 888... .8...... 888... 8... .. 888... .8...... .88.... 888... ..8..... 888... 8... .. 3...... am... .8. .73.... «.4 2.2 3:... .3 a 5. am». ...: an: ...-....» a. .o .... .. may}. .53 no. x... 3.3.... .5. ...: ... x... .83.... 3.3 go o 3.15.. ... 3:... t... .. .... .... .83.... .88.... .88.... 888... .8...... 8... . 88...... 38...... .88.... 88.8... ......... .. ... ...... . noalfiBN.O 0000005.." DdoOOhh.h d000d05.0 NdOId'O.N cocoooo.d ”CDC. 0 888... .88.... .88.... 888... .8...... 888. . 8... . 3...... .... x... .52... .... .... ...... ... . ... .... 8... a... ......u; a. .3 .u... i 37.30 .38 no. a... 28...... .5. a... no. a... ...-...... on... 9.3 1015 au»uzou. no»<:_;ozwa n¢ an»: ._.:ch .u.4 x‘z 05:. 44:. a».¢w»zou. «snovnovnuoo nu wx_» onoaoouuvu... «fl auooc; x «...nnonon.. «a auoo¢4 ; _.-.a ac: out. 44:. auacw>aouwu naouunn~.n on wt.» consvnco.~o «a auoo«4 u on..oo.a.a «a owuocq a .oosvvsc.v nu amo¢¢4 u “sovuon~.~ «« amou‘a ; «on.o~on.n o .2«»mzou ’ cauvsuco.« uchozou a 80—»(30w "macaw”..n bxz¢ruzou o sung“.«°.o. unsunnn~.o. «nousuuv.oa naouusne.e vuounoo~.o. ooonsuno.u anos..o~.~uu .8—4 x‘x ouz— 4435 ounce; a nu «onuauoa.a nuunonoo.. 
t c .ncoc-o.oo ooavnnuo.ou cannuooc.u. «nonvovo.a ousoonuo.ev «ossomon... oumoocno.n. oaoaono~.. couczuxozmo nuo¢w>zou. amuu<4 w an ocuonuoo.a onuoovoo.o & c .nonssmo.o. cnvonano.ou oswmoaoo.«. oaoooaoo.o “woman"..o. onco~vsn.on nouonooc.vc sovaouso.° bzdbnzou a ..nowo.n.. nuoonou... onunouan... nwaocq t «« vcuo-ooo.. ccnomooo.o nmwusnnn.o .oqoonao... canosooo.o ~osnnca«.° ~9....oo.° ooooavnu... ~..n.a.... u o osouovoo.o. nanoonoo.ao ovosusc~.oo omsuoooo.o pchmzou o .uuo~noa.o noouuona.o anouanc~.~ ccnnco.~.a. ovaooooa.o «Kooooo~.n nooooooo.nu ono~.~...o. noooavua.u s>omsooo.u~ m...» at ammah'-il-l pzcumzou c communou.o sounnvoo.. noooouso.uu awoo«4 ; an nausvoo... coonsuuo.a «ounce»... anaconda... no~o~ooo.o oooouavu.. o..uo~.~.. conuoosn..o oncscuu«.. u . «onoonoo.ou v~co~ooo.oo uoonqnan... ~ounnuu... pzouc¢<> awao¢4 g «a «uncomoo.o moovowoo.o «oncomoe.o smouncoo.oo «unosmoo.o 3 a auvucooo.o nuconuoo.o “mason.... cocoouoo.o. ooosnuoo.o nononooo.o counsmvo.ou oesoosuo.o. coononoo.o nososos..o ounccuoo.. co~.z_xo;mo mgr u< ow¢a....z.»...s.z.~.uo pa omua.. ouo¢w>zouv I I :-x_¢»¢x >oo.¢¢> 9000‘; s «a oououno... oom~0¢oo.o «nonuuon.o cannon“...- ououvnoo.. nvuoooou.o c oesooouo.ou oononuuo... nouunooo... osnonon... nounnune... oo..~vo«.o. a..nvoon.uo guanosnv.o ..uouoou... snaosooo.~. ”montage... n».~usnn.. smou omoo¢4 ; «« «sun-no... coon-«u... .«sowoon.a ancosvu.... oonuuvao.. sscnv«~«.. v conssouo... couscouo... on..nxou.n. «canon...- nasuvouo.-u nauooona... o~.°on~s.ao oo‘annon.. ouuuou. ... snscnns .n. u».unnu...u osnvvouv.. umno « ..Od'OdNOOHQ '0" at.» omoa¢4 - pr¢pnrom o-oocJ g aueo¢a . at.» auoo«4 - prcpnzou - ouco<4 u auoa«4 .. pxgpazoo n oueaca Q ~2¢hatou t J-suu n35 : :5 at.» ouoa<4 - ua.».zoo u amoaca g owoo.4 ; as.“ a 24.: as; o no. u .13: praunaoo u:u u o assoc; u av 15 nuoacJ >2¢rnzo . auoo¢4 . pa.»nzoo t 617 .x.4 x¢x ouz. 443a nuo¢w>zou. .x.4 ac: our. 4425 nau¢m>zou. awao<4 w hchmzoa nn 9 macs... «no.... .99..." covnn.o oecaa.u .x.4 x¢x out. 44:. 
nucmm>zou. ao»¢t a: n voooov-.~ uzcwmzou nu-.... «anvo.au o.-~.. «...... oosnn... ....».e .99..." fifi~¢>cx «j n one...” oucuawos.v osooooss.nu c nw-uca . o «a cause... «nae... noose... oo..n.. «o.na.a. canon.. .~.uo.. oon~n... ocean... vcosv.o c~..a.o. name... cauoo.ou ossoa.o ......“ «acne... ...-o.u nw~_4¢:coz hzchmzoo - «soon.- ounnn.. naoan.. guano... nooon.. «son... canon.- ansoo... mung... ...-..u >ou.¢m. muz.a¢=yu.n >ou...» muz¢acaunun n~¢>¢t toucca’ bwou . o a a .«c.... o.~.~.. ....... .u..... ......n « amononoa.. ...usuo~.c ousouoon.~ my.» .a ......u a . a . anon... assoc... can»... vuogn... cask... swan»... .n.c.... ooon... «on»... an..n... no~n¢.. «no.5... .n~..... mamas... .noon... ...»... non.».. snun.... ....u.. n.-.... ......u ...-.... ......u «an “I!” on 02-» at.» onco¢4 - pzcsnzom ouco¢4 s auao¢4 . pchnroo c nuoo<4 c wraparoo : \ u db. .4“, ov 0 11.7 n bl? vuuosnov.o Q C coonoonn.o . I cusooon~.a . C vouo~n~°.° on «tap aaon-o~.o a“ at.» vonuooae... on at.» sawovnv~.o a” us.» .x.4 ucx ouz. 4J2. aneam»zou. covsuono.o nu amooda m ”envooon.. nu amend; w snvsuwno.o. a awoo44 m snoonso~.a an awuuca m oocvvuo.... «a auoc<4 a v3oaon~c.oo «a awoo<4 x aaomsoso... Nu awoa¢4 x .Noooooo... «a awuc.4 x vnooonho.o s a: awnnouon.. «a awaoca . cousouso.o a a: nuvnooo~.o . «u auso<4 n sounoooo... s a: nosonoco.o «a awaaca c nooooaoo.o a «a nucoooon.. «a auea«4 ; mumou xcog owuaouc .~.nuuon... . o ~a.-u.n..u ....uaou ....» .¢-a .o nouoonca.. . . ...«n"°°.ou ».<~.2oo ....» .‘ua .. ...v 33.2... 4t: . . Loam-hu¢ 3.3:...“ J van .,¢.~,oo ....» .‘nn .~ ...~.o.... . . onsoous...~ u....2ou ..... ...a .« ‘19 vuuosnov.o . c ~ounonmn.o. - c .noonoco.o - - coon-onn.. . C voucunuo.o .u wt.» ~nons..~.° .u w:.» «nonsoou.a .u at.» noonsno~.. cu at.» covsuo~..o nu awoo<4 a o.«..n~n.° nu awou¢4 w o.«s.n~n.o nu cocoa; . nonoooon.. nu ouooaa u wanna-90.. «a awouca x ocoonuoa... «a awoo<4 x ooovnue.... «u auoo¢4 a ou.uouu.... «a auooca a ca...ns... u «a nunnou.n.. «a nuoo¢4 ; ooovoos... 
h N: nouonoo... nu awoo¢4 a «command... 5 «a oouonoo... «a anonda I ons~.o~...u s a: nuvn-..~.. «a auucca n a: nu vd onennuun... . - u..-uon..u ~2.»u2ou o.c<> .‘un «unvnnao.o saunauoo.nn uv car‘s—raw o > .moohun..~ ona.»nn..n annuaoon. m anncucco.n anonooo,.~ ~oc.«.~u.n «onunuoc.o snoooo...a ) nwp4xuuuu .3.) u.: ouz. 44:. um.xw>zou. oaoooans.-n .x >m.. . ¢.um acou~n-.oo mm: mm oooooooo.o mwc 13m «wouooon.u onuso~ou.~ coconndn. u nauovoao. u. “annouon. a. ..vuouno.~ acnsoonn.o. cannsoo~.«. «amonsav.o sso~u~¢e.¢. coonsanu.no “amuse...“- on.~n~ou.u ooaovooo.~ chooonoo.u oovvo~nc.u csnnano«.« swvomcoo.o. nonoomvu.o onnsosao.o onouvuou.~. > owr¢x~>um . > x: ~u~n.o'~.fl ...m .x.a nanoooou.onn .» x‘y: . ..mm asqoanoa.mu a~\~«59:.an vanavooo.od nuacvouo.s« Kenna“... nu omn“oovo ..u acnsoonn. cu ouonsonn.nu nooovwoh.ou ~59~o~oo.~« ooon~«n«.n« “onwooon.ou uvnfiosoo.oa «nsonaoo.o. omnunvoo.sn ownnosou.o« o-vocno.o« \uco~cov.ou noounvnu.ou vcowomoo.nu unn«ouom.vu > ompcxupna m ( .x.4 x‘x on:— 443u uuoxu>zou. «t 03.3.5: 911...! .... .. ca Wit-o: p . .oocao...u “ canons..." ....ouec... oopooaou. v . an ouonu. o a. snow. «9 9999909.. n . .5 ..o.o.¢ «o no... on ooooooog.» . ”cocoa... a unannouo. a» oooo.o.o.n o~cono~u.o vsnncnuo. on 9999999..“ . ~.u.~.~».n ......uo. »» coaceoo..u . no.~so.c.« nonaunnu. on oooooo°~.o. . oa~ovn...~. .ouocno...n .u .x . p . > a.».x.»nm . > » om..x.»au “HM“ . ...3. .3: xx... noonucus.o «no—vaus.~n« anonc»«~.« «a . .u .mz. . c.um p... .z. a cooooov~.svno. . mT¢anoos-n.oou conuno~v.uv 5n n .0 3.. an ..3 3va2 23.. . > ococooos..nn u o..oooo.o > :2» . mac :3» o . aoaoooon.n~ . annoo«.«.~ ooccaoon..o ooocooou.a~ . ~o..v-».« ..nnnun..no ocoeoooo.o« . kaoooq~n.~ noonaoso.on .oooaoon.nu . ono”n«.o.~. enouauon.oo ooooooon.~u . ukn.un.o.o o~ouoaoo.on oooeoooo.nn . coonona~.~ venoooov.nn ooocoooo..« . u~..na...o. "usuooos.flo ooocooon.~« . n~soon...o. nusooomn.oc oooo.oo~.~u . .o...n.~.e. o.so.nms.oc ooocoooe.s . onwnnnoo.n. onwnanoc.«n ocooooo..ud . ann.no.o.n. 
uno.»..n..n goooooo..na . ouu.«n.o.~. ou~¢~noo.5n oocoooos.«~ . o~uaoo~o.~ «hoonomn. an oooeoooa.a~ . unocoanm.~ sonnnuvs. en oooooooo.on . a«o.~».n.« souosouo. n oaoooooo.ou . -v.o..n.u osnoaaas. an aooeao°«.o~ . Kooouoao.u nnousoon.dn ooocooov.ou a~o~o~oa.o usososov.on oooo.oo..o« . n-~o..n.« osn~.ooo.sv ooocoooo.ou . maasnnvo.o ocuuovno.ov ooocoo...~u . u.n....a.~. «onoooov... . . > u » an».z..am » aw..:.» . . . . eu.s %WJ322.33. .........~v ..o. a. . ..oo o. ...a......n .........sn ...o.....§» ...o...~.un .o.»...~..~ m ._.m upmw ..«u.».... a. >ua > :3» .oaoooos... 9999.....no ooooooco.uo oaooooou.sn .eooooos.on ..oooo.~.sn ocooooon.un .....oou... .cooooon.o. .ooaooeo.nv oooo......» ooouoo.°.nn cocooooo.sn cocoooon.sn ..ooooou..n .oooooou.nn .o.oooo..~n ...ooo....n ..eoooo~.oo .ooooo...nv ..oooo...uo a - .....nnuu ION“. O ON. ”NQCGOROO on." .3muw anon ouou anon anon anon .=i~u $94M. was». o¢u1§ ‘21 s».nn»~q. v. . m ; . r . : 3 A ... 3.: ...:3 :uuuunmama 32.2»... n . 332:3: 2.2:»: :83»... - . .3 put; o c.»u . p... .a.n «c onavouhanon ”nouvddnnvnflfl N . ooo»os»..»»s» . »w¢ »» .» 2.»: . ..»» p». u ~.~mmn.m»..» "nan“.noucou»» o»....»»..o»o~ ..a..ooa.a ooooooo».»»~» " . . w >. >»» . . »w¢ 1:» > 2:» . .»uu.un». o.»»oo»».n»» . > :3» m»»o°»~..» o-o.».~».~o . . oucooo . . o.»voa~».~ a».»»»s».n. .o.»...n.oo . »oc.~a-.. o»»~.»s..oo ......on.n» » « :.::an 3333...: o co .2 . 3:2: - 9.333.: :33»... n '3 noncoonc... «oovoouo.oo onoouunm.”u . onwaunmanu ~.~»»»~...v ......9».“" u“ ”mu“ 3:33... . no a o v as a . mafia ”Magnum “ “fix.“ “was?" a "a“ «oncomuo.«o . ownuonov.° and 5 mm. omuo».~«.~. mmmuumwu.wm "uuuuoucnca . o.»»u»»o... economno.mm ”unannnunom »» on.» sun~nooa.ou. ~u»~»»o».v» .oaoouon."u . "»m~m«...». onvuodo».»~ cocoon-a.»~ n“ "mu“ ounovoov.00 ounc'ooo.on a c v O «CO-'0 ovnonnvounn . . cocooo¢.nn . Doooeooo ON «d ND.” oonoowom ». oonoo~oc.o» a a o . . a»..o~»n N. «aoooKn~.~n .aooooe . campus»... . o c on» a. . ~....~.~.~. a». o a. 
n .n a» an.» ocooonon.» "umwmmnn.mu "onouoaonso . on»».»»o.n «comamm~."m "uuuuuumnwm .» on.» sncofioc».~ nvnoonno.on cocoanuu.co . afiooo....« «mouuoon.on .ooooaec. n a «nun oooo-~«.¢ «caessso.o» .ooco o».»o . nooooa-o.o summaouv.vn ...»...c.mn » cu.» unhoovuoon QOVanhn.nm 0990.90 .5“ . 0N00n0~d.a Ohnnvnhk.nn .ooooooo.nn a nu.“ u»...».n.~ u.m»uou...o .9...°.~.~» . u.»»~..n.» onoo~°-..» ...-...»..n . cu.» .u»oo.»..c. aunoov»~.9» eooaonou.mm . Mom-oovuw n»»~snno.s~ ......»n.»~ m “MN“ > owp¢x_»»w . > > nw»¢x_»»m .o a avast . .mucwnoo.su ...-...».nu “HE " p 3255» . > > 35:3» fim mm . n » no «a.» .x.4 ..x o»:. 44:. uuoxm»aou. . a. )m.‘ 0 Co.‘ hflhm WIMB NOBDOHOO O . Nonnnflbflodvd GOONHOQnod HONHHOOOQO .% a. . .c swag u c.»» p4.» .2.» an undue gnu-...».os oooooo~».~»~ . . 33.233 . 4 9‘ ...»...... .....a.».o~ . . . »w¢ 1:» > :2» . ..mnnonun. ...»ooo..»so . . p :3» u.»o.n»c.» »»..o.»..» . .o...».».v . . announoo.. » . noosuo- v can u s .s . .u»»s»»».. mounmunm.m uuuuuuunnn . .uo.-~».. ...M~Mnu.~m uuuuuuuu.»o as «v.» u».~u.~».». ».»~«.-.» ...». ».» . o»~..»~».u ~oso».m~..c .....o...»w .u ...u ”ononumu.. n..»..s..« .o.o.""”.m. . n.m.uo.o.w. ..n.»..»..v .9.......Mc 0“ "Mn“ soon» .u . . o o «as. «node n . .s».»»n»..u “Munumnn.uu u»uoo.°»”~ . owouonav.. «saoounn.mu uuuuuuumuu" »» “non . o..»»oo~.». coauvons.»u ...uu..».». . ...-gag»... coo».»»o.»n ......»n.on »» on.» “DdKOCdVOOO fi..NHfl..-'l .6 ODD-HI . Q“Q.Ouh.oflb HdCQOflNVQBH COCB... - a“ an.“ anomanco.~. «....on»... u a can... . .cnonaoo... .vnoqnou.»n ......»n.cn » an.» oovnnoouJ. octagon.» ouguueou.nu . «3.33.? 3303...... 0:33». «a an: ape-.555.» .n»»»- . o. .» . u...»~.~.~. u...»~o».¢o ......» .on a» «no» a a ......»u » . onu».»...» ».».».n~.~ ..uo .» .n.» u . v ...-...» n0 0 on.“ 422 ..o..=um.» «:.::a mzoa mu». .» a; so »zo.»a-».o a»... ...» aazouu» o.».o~ »m»::.x n »¢:oa . .ux.» o-uncJ» "muowww ”Mung“ nmwngux u manor . “ cmxpoo . s o o »m z z »¢ o, a A .ma.o» . s P wood» 3 3 .3 013.39 K”:- 3:8»... 2...: $52... a 239. 
a c 3: 0..“ 515;; 51—». .- 93.... 9.2.. 31.433 on .013. +0» mozouwn 00“.: $52... a 250.. a o a: I s mozouuu eon.sn umpax—x « a a 09-930: an: uww nos. 0..."... Iou+§u0 30¢ v a u z x o .tcur 91.0 1:! to 980 .a¢«u Jochzou pa»». 2:. »o can u u»»»»~nn.».» o».s»..~.» s».».n»o.» . ~u.»¢-v.ovv snoooon~.» «doc-no... .c >»-. . ..»» ...» .1.» - . .c .mu. . a.»» pc.» .3.» «c . .m»..».».os »»a~»»~..o.~» o»».oou~.»o»o»o . a~noo-a.von .usn»~oo.o.n~ s.»oo»»».»».n» n». »» .» 3.»: . ..»» >»» . »wc »» .> 2.»: o >.»» >»» ......o... .........5»~o . .......... ..ooooos.»-» »»¢ :2» p gs» u »u¢ :3» p :3» . no»..nn..n a»...oo».».~ ..oooooc.o.~ . ~«u».n~o.» ooo.».so.o~ ......»n.»» «a «v.» oo.»»n»o.. anaooo.».».~ .oooeoon.co~ . o.».o=~».~ u».»oo~s.»s ...o...».¢~ .u ...» ononsn»... ~o».~oo~.°.~ .ooeooom..o~ on».n~nc.n oo~uo~oo.co ....»oao.oo »» .»»u «uo~».~».». »«»~»o~..n.~ poaooooo.»o» . a.».o»«..». no»ooonu.so ...»oo.».»» o» ca.» oo»..»~».. «..noon..«.~ .oocooo....~ . nuo.»oa~.. owonoaoh..o ...»...o.»» u» kn.» osoonnnu.» .maoooso.oon .ooaoo...oo. . ».~.o.»..n nwhnonon.o» ...»o....»» a» on.» osocns~».o. ..oo»s-.o»» .oooooo...ou . ~o»o»n«..»o ~onoa»»n.v» ...eooon.nn »» nu.» u1co 8:23.? 3:23....» 32.22.»: . 3:23.? 83.3...» 2332.3. 3 :3 4th; u»«»..».... .nauoc...~o~ .cocoooc.~o~ o~o.».~..o. .«oo».so.»o .....o.».». »» n».» 5.4 ¢ 3:32.... 0382...»: 2333.2: . 2232.3. 2223..» 3323.: 3 ...»: «Maw: s»»»°n.o.~. .nonon.».o.~ .aooooon.»a~ . ounovo..... .«n»......» ...c...»..» a» um.» 3:33.». 2.20.9.3... 3232...: . 2.9.338. 2»..~3.~o 3322.3 3 .2» 43.9 escoo5s5.~ annadnmo.~«~ oooooaa».»«~ . aa»~n~»..v «ov~¢~o«.~o .aoooooo.no » ouou ~5on~ncu.. -o¢s~»¢.»o~ ocoeoooo.oa~ cocoonoa.» conuncoc.oo ...»ooo...» o .uo» vvonooss.. o»».»»~..oo~ .ooaeooo.~o~ . ~»~noan».~ ncsoaocu.on .aooooon.»o ~ »~»» scuuuono.» «no»»oon.~o~ .ouoooo..».~ . o.»~onn..~ «nvsuoo»..» ...ooaon... . .~»» oovuonnu.» ounouvoo.oo» .aaaeooc.»o» . snooaoou.~ nvnoonnn.on cocoooo».on » on.» nsosnouo.. owowocoo.~»» oceaooo».~o. . ooo»-~».. 
«oo°.-~.»n ...oooo...» . ca.» nunwonua.« ~»v»uv~o.so» .o.ca.o».oou . ~n~oov~o.n no~oamsu.an .ceoeoov.nn n nuoa ooosvonc.» ».o~».o..n.» .o.»...»..¢. . a»...q.n.~ o.»»..«s.oc ...ooau»... a «u». unnoaooo.~o. snnooooo.co» .coooooo.~o» . uuuoocno... o»»»»v»~.»o ...».c.»..¢ » «m.» p car‘s—pm» . > p nm»«x_»»w _ mm» .x . p . » au»¢:_»»w . y > nw..:_»»w .o» .a a > no . . nnHU 3 .2.» a»: o»:— 443. uaoxu>aou. ” » 4 c a a _ » u c x g a u a w u a a » a . APPENDICES APPENDIX A COMPUTATION OF [2'2 1 AND THE RANK OF 2 As AN 1 2 123 3 INTERMEDIATE STEP IN THE COMPUTATION OF [z'z ] . l 2 l[23:24] The method given in this appendix is very general in that it may be used to calculate a matrix of the form [2 and the rank of 'z ] .l 1 2 23 2 as an intermediate step in the computation of a matrix of the form 3 [leZ] in which: 1(23524] (1) Z Z 1, 2, 23, and Z4 contain jointly dependent or predetermined variables or both. (2) Variables in any of the matrices may occur in any or all of the other three matrices as well. (3) Z Z , 23, and Z may have less than full column rank. 1’ 2 4 In this paper, 21 and 22 will most commonly be +YU (the matrix of jointly dependent variables in the nth equation), 23 will most commonly be X“ (the matrix of predetermined variables in the O nth equation), and [23 : 24] will most commonly be XI (a matrix of instruments containing X“) or X (the matrix of predetermined vari- ables in the entire system). are calcu- Matrices of the form [ZiZ and [2'22] 1 : 2 .23 1 "[23-24] ' - 0' I _ I . i. 1. lated as [2122] [21221123 and [Z122] [ZIZZJLEZ3zz4]’ respect ve y A computational procedure for calculating [Zl221lz and rk 23 3 . ' . as an intermediate step in the calculation of [21221L[23;24] follows. (1) Let Z be a TXN matrix containing all of the variables which occur in Z1 or 22. 
If desired, Z could be defined as 22] ; however, there is no need to repeat variables I Z - [21 : 423 common to both Z1 424 and Z .1 2 If Z = Z 1 then 2 , Z = Z Z may contain variables in addition to those in Z1 desired. and Z 1 = Z2 ‘ 2 If Calculate the moment matrix (sums of squares and cross-products matrix) of [Z3 5 Z4 5 Z] , i.e., calculate: r-z'z 2‘2 2'2“ 3 3 3 4 3 X X 2 : ' : I = v v v . (A.l) [z3 . z4 . z] [23 . z4 . z] 2423 z4z4 z4z X X X N4 N3 N4 N4 N4 N O t i z 23 z 24 z z X X X LN N3 N NA N‘N (Required computer capacity may be reduced by forming only the upper or lower triangular part of the above matrix, since all operations which follow may be performed on only a triangular part.) (2) Calculate Pt '1 2424 242 t ’ t = (A.2) {[24 . z] [24 . 2]}i23 I 1 L2 2“ 2 $23 Ez'z] [2'2] 7 [21' [z] [21' z 7 4.4123 4 iz3 4123 4iz 4i23123 E2 241123 [2 231234 L. 21 Z3[Zl+]l Z3 2.1.2321 234 = {[2 L ?z }'{[z] ?z } 4 23 «L23 4.123 .L23 1Repeating variables in the Z matrix causes no computational difficulty. 425 in the manner given by section I.D.2.l The matrix given by (A.l) will have been transformed to: r- 1 A11 A12 A13 N3XN3 N3XN4 N3XN I I 0A-3) 0 [2424Jl23 [2421;23 X X X N4 N3 N4 N4 N4 N 0 [2'2 ] [2'2] 4 123 .LZ3 X X X .F N3 N N4 N N . The rank of Z is the number of diagonal elements used as 3 pivots (number of columns treated) before the maximum diagonal element becomes less than a (see section I.D.2). If 2 a Y , Z = X , and LIML coefficients are to be cal- + u 3 u culated, then the [Z'lez3 = [ Y' Y +‘H +Dpjlxp matrix should be saved aside at this point, since it is a basic matrix used in LIML calculations. (3) The computation of [Z'Z] is completed by performing l[23:Z4] elementary row operations on the matrix given by (A.2) which is a 2 submatrix of (A.3) in the manner given by section I.D.2; that is, do elementary row operations on the matrix given by (A.2) until the first N columns are reduced to zeros below the diagonal. 
4 (It is advisable to select the pivot as the largest diagonal 1Do elementary row operations on the matrix given by (A.l) until the first N columns are reduced to zeros below the diagonal. Thus, the pivot element for each step is selected from among the first N columns, only. (It is advisable to select the pivot for each step as the largest diagonal element to reduce rounding error.) 2 I.e., calculate [21 Z ] = [Z'Z] g in the z3 123 L([z4]iz3) i[23.24] manner given by section I.D.2. 426 element at each step to reduce rounding error.) The matrix given by (A.1) will have become: r- a A11 A12 A13 X X X N3 N3 N3 N4 N3 N (A.4) 0 A22 A23 X X X N4 N3 N4 N4 N4 N o 0 [2'2] , l[23.24] X X X N N3 N N4 N N J b The rank of [Z is the number of diagonal elements 3. 4 23 used as pivots (number of columns treated) before the maximum diagonal element of A22 becomes less than 6 (see section I.D. 2) . Also (A.5) rk[z3 : 24] - rk 23 + rk [24]123 APPENDIX B COMPUTATION BY DIRECT ORTHOGONALIZATION OF A MOMENT MATRIX OF VARIABLES EACH OF WHICH IS ORTHOGONAL TO A DIFFERENT SUBSET OF VARIABLES The computation by direct orthogonalization of a moment matrix of the form: (301) {[y11lzl ... [ymjiz }’{[y1]lzl ... [ymjlz } = 1 [yljlzltyljlzl "' [ylllZIEYmJLZm EVE ---[ ':[ L’ymlllm yljiz1 ym Lzm ymjlsz by direct orthogonalization will be illustrated by showing how to lA moment matrix of the form ' I c a {[YIJIZI [le'zml {[ylllzl [ym1'zm} may be calculated as [yl '°' yml'iyl "' yml - {Eyllizl "' [lelz }'{Cy1]izl '°° [ymliz }. 
compute a moment matrix of this form for m = 3.^{1,2}

(1) Let y_1, y_2, and y_3 be T×1 vectors; let Z_1 be a T×N_1 matrix of variables, Z_2 be a T×N_2 matrix of variables, and Z_3 be a T×N_3 matrix of variables.^3 Calculate the moment matrix (sums of squares and cross-products matrix) of [Z_1, Z_2, Z_3, y_1, y_2, y_3], i.e., calculate:

1 The matrix noted in (B.1) could be calculated by first calculating the m T×1 vectors [y_1]_{⊥Z_1}, ..., [y_m]_{⊥Z_m} in the manner indicated in footnotes 1 and 2 of page 43 (i.e., as m sets of residuals from m least squares calculations) and then forming the matrix of sums of squares and cross-products of these calculated vectors (residuals). A more accurate method of computation (which also requires less computer time) is to (1) calculate the m sets of least squares coefficients which give the m [y_i]_{⊥Z_i} vectors in the manner noted in footnote 1 of page 59, i.e., calculate a*_i = [Z*_i'Z*_i]^{-1}Z*_i'y_i for i = 1, ..., m (where Z*_i is a matrix of variables from Z_i with rk Z*_i = rk Z_i; see p. 47); (2) form a moment matrix, Z'Z, using all of the y_i and all of the variables occurring in any Z*_i (Z may be expanded to include all of the variables in the Z_i, if desired); (3) form a_i from a*_i by rearranging the coefficients of a*_i to the same order as Z'Z, inserting a "−1" into the a_i vector in the position corresponding to y_i, and inserting 0's into the positions corresponding to all of the remaining variables of Z'Z. The ij-th element of the desired moment matrix is then calculated as [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j} = a_i'Z'Z a_j. The method given in this appendix is more accurate than either of the above methods and requires slightly less computer time than the second method (which in turn requires considerably less computer time than the first method).

2 A verification that the computational procedure produces the correct matrix is given at the end of this appendix.
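The a_i'Z'Z a_j method described in the footnote can be sketched as follows. This is a toy check under illustrative assumptions (two variables, small random data; the names `a_vector`, `W`, `M` are not from the thesis), verifying that the quadratic form in the single moment matrix reproduces the residual cross-product.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 40
Z1, Z2 = rng.normal(size=(T, 2)), rng.normal(size=(T, 3))
y1, y2 = rng.normal(size=T), rng.normal(size=T)

# one moment matrix over every distinct variable: [Z1, Z2, y1, y2]
W = np.column_stack([Z1, Z2, y1, y2])
M = W.T @ W

def a_vector(z_cols, y_col):
    """Step (3) of the footnote: least squares coefficients of y_i on Z_i,
    a "-1" in the y_i position, zeros elsewhere."""
    Zi = W[:, z_cols]
    a = np.zeros(W.shape[1])
    a[z_cols] = np.linalg.solve(Zi.T @ Zi, Zi.T @ W[:, y_col])
    a[y_col] = -1.0
    return a

a1 = a_vector([0, 1], 5)        # y1 adjusted for Z1
a2 = a_vector([2, 3, 4], 6)     # y2 adjusted for Z2

# a_i' (Z'Z) a_j equals the residual cross-product [y_i]'_{perp Zi}[y_j]_{perp Zj}
r1 = y1 - Z1 @ np.linalg.lstsq(Z1, y1, rcond=None)[0]
r2 = y2 - Z2 @ np.linalg.lstsq(Z2, y2, rcond=None)[0]
assert np.isclose(a1 @ M @ a2, r1 @ r2)
```

The sign works out because W a_i = Z_i b_i − y_i is the negative of the residual, and the two negatives cancel in the quadratic form.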
3 In this appendix, y_1, y_2, and y_3 denote any variables (not necessarily jointly dependent variables), and Z_1, Z_2, and Z_3 denote matrices of variables, not the explanatory variables of equations 1, 2, and 3. Z_1, Z_2, and Z_3 may contain variables in common, or the variables in any two of the matrices may be linearly independent. The matrices Z_1, Z_2, and Z_3 need not have full column rank.

(B.2)  [Z_1, Z_2, Z_3, y_1, y_2, y_3]'[Z_1, Z_2, Z_3, y_1, y_2, y_3]  =

       [ Z_1'Z_1 | Z_1'Z_2   Z_1'Z_3   Z_1'y_1   Z_1'y_2   Z_1'y_3 ]
       [---------+-------------------------------------------------]
       [ Z_2'Z_1 | Z_2'Z_2   Z_2'Z_3   Z_2'y_1   Z_2'y_2   Z_2'y_3 ]
       [ Z_3'Z_1 | Z_3'Z_2   Z_3'Z_3   Z_3'y_1   Z_3'y_2   Z_3'y_3 ]
       [ y_1'Z_1 | y_1'Z_2   y_1'Z_3   y_1'y_1   y_1'y_2   y_1'y_3 ]
       [ y_2'Z_1 | y_2'Z_2   y_2'Z_3   y_2'y_1   y_2'y_2   y_2'y_3 ]
       [ y_3'Z_1 | y_3'Z_2   y_3'Z_3   y_3'y_1   y_3'y_2   y_3'y_3 ]

(Required computer capacity may be reduced by forming only the upper or lower triangular part of the above matrix, since all operations which follow may be performed on only a triangular part.)

(2) Let us designate the part of (B.2) below and to the right of the dashed lines as (B.3). (B.3) is saved aside at this point for use in later calculations.

(3) Calculate {[Z_2, Z_3, y_1, y_2, y_3]'[Z_2, Z_3, y_1, y_2, y_3]}_{⊥Z_1} in the manner given in section I.D.2.^1 The matrix given by (B.2) will have been transformed to:

(B.4)  [ A_11   A_12      A_13      A_{1,y_1}   A_{1,y_2}   A_{1,y_3} ]
       [ 0      Z_2'Z_2   Z_2'Z_3   Z_2'y_1     Z_2'y_2     Z_2'y_3   ]
       [ 0      Z_3'Z_2   Z_3'Z_3   Z_3'y_1     Z_3'y_2     Z_3'y_3   ]
       [ 0      y_1'Z_2   y_1'Z_3   y_1'y_1     y_1'y_2     y_1'y_3   ]
       [ 0      y_2'Z_2   y_2'Z_3   y_2'y_1     y_2'y_2     y_2'y_3   ]
       [ 0      y_3'Z_2   y_3'Z_3   y_3'y_1     y_3'y_2     y_3'y_3   ]

1 Do elementary row operations on the matrix given by (B.2) until the first N_1 columns are reduced to zeros below the diagonal. Thus, the pivot element for each step is selected from among the first N_1 columns, only. (It is advisable to select the pivot for each step as the largest diagonal element to reduce rounding error.)

where A_11, A_12, etc.
stand for matrices of no further interest to us, occupying the positions where Z_1'Z_1, Z_1'Z_2, etc. occurred before. The entire lower right-hand submatrix is the moment matrix of the part of the variables inside it orthogonal to Z_1 (e.g., the submatrix in the Z_3'y_1 position is [Z_3'y_1]_{⊥Z_1}). rk Z_1 is the number of pivots used (see section I.D.2).

(4) Retrieve (B.3) and replace the row and column of (B.3) corresponding to y_1 by the corresponding elements of (B.4). (B.3) will have become:

(B.5)  [ Z_2'Z_2          | Z_2'Z_3            [Z_2'y_1]_{⊥Z_1}   Z_2'y_2            Z_2'y_3          ]
       [------------------+--------------------------------------------------------------------------]
       [ Z_3'Z_2          | Z_3'Z_3            [Z_3'y_1]_{⊥Z_1}   Z_3'y_2            Z_3'y_3          ]
       [ [y_1'Z_2]_{⊥Z_1} | [y_1'Z_3]_{⊥Z_1}   [y_1'y_1]_{⊥Z_1}   [y_1'y_2]_{⊥Z_1}   [y_1'y_3]_{⊥Z_1} ]
       [ y_2'Z_2          | y_2'Z_3            [y_2'y_1]_{⊥Z_1}   y_2'y_2            y_2'y_3          ]
       [ y_3'Z_2          | y_3'Z_3            [y_3'y_1]_{⊥Z_1}   y_3'y_2            y_3'y_3          ]

Let us designate the part of (B.5) below and to the right of the dashed lines as (B.6). (B.6) is saved aside at this point so that it may be used for later calculations. (B.2) will already have been overwritten, and we are finished with (B.3) and (B.4).

(5) Calculate {(Z_3, [y_1]_{⊥Z_1}, y_2, y_3)'(Z_3, [y_1]_{⊥Z_1}, y_2, y_3)}_{⊥Z_2} in the manner given in section I.D.2; that is, perform elementary row operations on the (B.5) matrix until all elements below the diagonal of the first N_2 columns are reduced to zeros. Thus, the pivot element for each step is selected from the first N_2 columns. (As before, it is advisable to rearrange rows and columns at each step to improve accuracy.) After all of the first N_2 diagonal elements have been used as pivots, or the largest diagonal element has become less than ε, the matrix will have been transformed to:

(B.7)  [ A_22   A_23               A_{2,y_1}     A_{2,y_2}                   A_{2,y_3}         ]
       [ 0      [Z_3'Z_3]_{⊥Z_2}   A_{3,y_1}     [Z_3'y_2]_{⊥Z_2}            [Z_3'y_3]_{⊥Z_2}  ]
       [ 0      A_{y_1,3}          A_{y_1,y_1}   [y_1]'_{⊥Z_1}[y_2]_{⊥Z_2}   A_{y_1,y_3}       ]
       [ 0      [y_2'Z_3]_{⊥Z_2}   A_{y_2,y_1}   [y_2'y_2]_{⊥Z_2}            [y_2'y_3]_{⊥Z_2}  ]
       [ 0      [y_3'Z_3]_{⊥Z_2}   A_{y_3,y_1}   [y_3'y_2]_{⊥Z_2}            [y_3'y_3]_{⊥Z_2}  ]

where the A_{ij} are submatrices of no further interest to us. As before, rk Z_2 is the number of pivots used.
(6) Retrieve (B.6) and replace the row and column of (B.6) corresponding to y_2 by the corresponding elements of (B.7). (B.6) will have become:

(B.8)  [ Z_3'Z_3          | [Z_3'y_1]_{⊥Z_1}            [Z_3'y_2]_{⊥Z_2}            Z_3'y_3          ]
       [------------------+-------------------------------------------------------------------------]
       [ [y_1'Z_3]_{⊥Z_1} | [y_1'y_1]_{⊥Z_1}            [y_1]'_{⊥Z_1}[y_2]_{⊥Z_2}   [y_1'y_3]_{⊥Z_1} ]
       [ [y_2'Z_3]_{⊥Z_2} | [y_2]'_{⊥Z_2}[y_1]_{⊥Z_1}   [y_2'y_2]_{⊥Z_2}            [y_2'y_3]_{⊥Z_2} ]
       [ y_3'Z_3          | [y_3'y_1]_{⊥Z_1}            [y_3'y_2]_{⊥Z_2}            y_3'y_3          ]

Let us designate the part of (B.8) below and to the right of the dashed lines as (B.9). (B.9) is saved aside at this point for further calculations. (B.2) through (B.7) are not used for further calculations.

(7) Calculate {([y_1]_{⊥Z_1}, [y_2]_{⊥Z_2}, y_3)'([y_1]_{⊥Z_1}, [y_2]_{⊥Z_2}, y_3)}_{⊥Z_3} in the manner given in section I.D.2; that is, perform elementary row operations on the (B.8) matrix until all elements below the diagonal of the first N_3 columns are reduced to zeros. Thus, the pivot element for each step is selected from the first N_3 columns. (As before, it is advisable to rearrange rows and columns at each step to improve accuracy.) After all of the first N_3 diagonal elements have been used as pivots, or the largest diagonal element has become less than ε, the (B.8) matrix will have been transformed to:

(B.10)  [ A_33   A_{3,y_1}                   A_{3,y_2}                   A_{3,y_3}                 ]
        [ 0      A_{y_1,y_1}                 A_{y_1,y_2}                 [y_1]'_{⊥Z_1}[y_3]_{⊥Z_3} ]
        [ 0      A_{y_2,y_1}                 A_{y_2,y_2}                 [y_2]'_{⊥Z_2}[y_3]_{⊥Z_3} ]
        [ 0      [y_3]'_{⊥Z_3}[y_1]_{⊥Z_1}   [y_3]'_{⊥Z_3}[y_2]_{⊥Z_2}   [y_3'y_3]_{⊥Z_3}          ]

where the A_{ij} are submatrices of no further interest to us. As before, rk Z_3 is the number of pivots used.

(8) Retrieve (B.9) and replace the row and column of (B.9) corresponding to y_3 by the corresponding elements of (B.10). (B.9) will have become:

(B.11)  [ [y_1'y_1]_{⊥Z_1}            [y_1]'_{⊥Z_1}[y_2]_{⊥Z_2}   [y_1]'_{⊥Z_1}[y_3]_{⊥Z_3} ]
        [ [y_2]'_{⊥Z_2}[y_1]_{⊥Z_1}   [y_2'y_2]_{⊥Z_2}            [y_2]'_{⊥Z_2}[y_3]_{⊥Z_3} ]
        [ [y_3]'_{⊥Z_3}[y_1]_{⊥Z_1}   [y_3]'_{⊥Z_3}[y_2]_{⊥Z_2}   [y_3'y_3]_{⊥Z_3}          ]

which is the desired moment matrix.

Modifications and Generalizations of the Preceding Procedure

The preceding procedure can be modified and generalized in several ways.
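Before turning to those generalizations, the whole m = 3 procedure, including the save-and-replace bookkeeping of steps (2) through (8), can be sketched. This is a simplified stand-in for the section I.D.2 routine (sequential pivoting, no diagonal search), with all names illustrative; the final matrix is checked against residuals computed directly.

```python
import numpy as np

def sweep_leading(M, ncols, eps=1e-10):
    """Elementary row operations that eliminate the leading `ncols`
    columns of the moment matrix M from all other rows (a simplified
    stand-in for the pivoting routine of section I.D.2)."""
    A = np.array(M, dtype=float)
    for p in range(ncols):
        if A[p, p] < eps:
            break
        for i in range(A.shape[0]):
            if i != p:
                A[i] = A[i] - (A[i, p] / A[p, p]) * A[p]
    return A

rng = np.random.default_rng(2)
T, m = 60, 3
Zs = [rng.normal(size=(T, n)) for n in (2, 3, 2)]    # Z1, Z2, Z3
ys = [rng.normal(size=T) for _ in range(m)]          # y1, y2, y3

W = np.column_stack(Zs + ys)
M = W.T @ W                          # the full moment matrix, as in (B.2)
for i, Z in enumerate(Zs):
    n = Z.shape[1]
    saved = M[n:, n:].copy()         # set the trailing block aside (B.3/B.6/B.9)
    A = sweep_leading(M, n)          # orthogonalize with respect to Z_i
    k = A.shape[0] - m + i           # position of y_i in the swept matrix
    s = saved.shape[0] - m + i       # position of y_i in the saved block
    saved[s, :] = A[k, n:]           # replace the y_i row and column by the
    saved[:, s] = A[n:, k]           # adjusted row and column (B.5/B.8/B.11)
    M = saved

# the result should equal the moment matrix of the three residual vectors
R = np.column_stack([y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
                     for Z, y in zip(Zs, ys)])
assert np.allclose(M, R.T @ R)
```

Each pass discards the swept matrix except for the one adjusted row and column, which is exactly the economy the save-and-replace steps buy on a machine with limited storage.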
Following are some of them:

(1) Only the upper triangular or lower triangular part of the initial moment matrix need be formed, and all of the calculations given previously can be performed within this triangular part of the matrix, thereby saving computer storage.

(2) The procedure given for m = 3 may, of course, be used for m matrices Z_i and corresponding y_i. Thus [Z_1 ... Z_m, y_1 ... y_m]'[Z_1 ... Z_m, y_1 ... y_m] is formed, and then an orthogonalization is performed for each Z_i. Before orthogonalizing with respect to a given Z_i, the part of the moment matrix corresponding to Z_{i+1} ... Z_m, y_1 ... y_m (say M_{ii}) is saved. After orthogonalizing with respect to Z_i, M_{ii} is retrieved and the row and column corresponding to y_i are replaced by the corresponding row and column of the just-orthogonalized matrix. The desired matrix is obtained by replacing the m-th row and column of M_{m-1,m-1} by the m-th row and column of the orthogonalized matrix of the m-th step.

(3) If it is desired that some of the jointly dependent variables (say y_1°, ..., y_n°) not be adjusted, i.e., that a moment matrix of the form {[y_1]_{⊥Z_1} ... [y_m]_{⊥Z_m}, y_1° ... y_n°}'{[y_1]_{⊥Z_1} ... [y_m]_{⊥Z_m}, y_1° ... y_n°} be formed, this can readily be accomplished by starting with [Z_1 ... Z_m, y_1 ... y_m, y_1° ... y_n°]'[Z_1 ... Z_m, y_1 ... y_m, y_1° ... y_n°] and then stopping after m orthogonalizations as before. (This is correct because [y_i'y_j°]_{⊥Z_i} = [y_i]'_{⊥Z_i}y_j° [see (I.56)]. Of the jointly dependent variables of an equation, the normalizing variable is the most likely one not to be specially adjusted, i.e., to be designated a y_i°.)

(4) More than one y_i may be adjusted by the same Z_i. For example, if {[y_1]_{⊥Z_1}, [y_2]_{⊥Z_2}, [y_3]_{⊥Z_2}, [y_4]_{⊥Z_4}}'{[y_1]_{⊥Z_1}, [y_2]_{⊥Z_2}, [y_3]_{⊥Z_2}, [y_4]_{⊥Z_4}} is desired, this can be accomplished by starting with [Z_1, Z_2, Z_4,
y_1, y_2, y_3, y_4]'[Z_1, Z_2, Z_4, y_1, y_2, y_3, y_4] and (letting the matrix saved just before orthogonalizing by Z_2 be denoted M_11, and the matrix obtained by orthogonalizing by Z_2 be denoted P_22) replacing the rows and columns corresponding to both y_2 and y_3 of the M_11 matrix by the corresponding elements of P_22. As a last step, the orthogonalization by Z_4 would be as usual.

(5) Usually there will be a set of variables common to all of the Z_i matrices. (For example, the variables in the matrix X_μ will usually be contained as instruments in all of the matrices of instruments used for an equation.) If so, we can orthogonalize with respect to the set of common variables initially and then omit them from the Z_i matrices. The following example illustrates this: Suppose the variables in the T×N_0 matrix Z_0 had been common to Z_1, Z_2, and Z_3 in the example given previously. Then we could form the moment matrix

        [Z_0, Z_1⁻, Z_2⁻, Z_3⁻, y_1, y_2, y_3]'[Z_0, Z_1⁻, Z_2⁻, Z_3⁻, y_1, y_2, y_3]

where Z_1⁻ is the matrix of N_1 − N_0 variables of Z_1 not in Z_0, Z_2⁻ is the matrix of N_2 − N_0 variables of Z_2 not in Z_0, and Z_3⁻ is the matrix of N_3 − N_0 variables of Z_3 not in Z_0. This matrix would then be orthogonalized with respect to Z_0, giving us:

        M = {[Z_1⁻, Z_2⁻, Z_3⁻, y_1, y_2, y_3]'[Z_1⁻, Z_2⁻, Z_3⁻, y_1, y_2, y_3]}_{⊥Z_0} .

The (N_1+N_2+N_3−3N_0+3)×(N_1+N_2+N_3−3N_0+3) matrix M is then substituted for (B.2). All calculations then proceed the same as before (steps 2 through 8) until the desired matrix (B.11) is obtained. The only difference in the procedure is that in place of the N_1 rows and columns corresponding to Z_1, we have N_1 − N_0 rows and columns corresponding to [Z_1⁻]_{⊥Z_0}; in place of the N_2 rows and columns corresponding to Z_2, we have N_2 − N_0 rows and columns corresponding to [Z_2⁻]_{⊥Z_0}; and in place of the N_3 rows and columns corresponding to Z_3, we have N_3 − N_0 rows and columns corresponding to [Z_3⁻]_{⊥Z_0}.
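This preliminary orthogonalization by a common block, and the rank relation it relies on, can be illustrated numerically (names illustrative; note that the Z_i need not have full column rank, which the example builds in deliberately):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 30
Z0 = rng.normal(size=(T, 2))                     # variables common to every Z_i
# Z1_minus: variables of Z1 not in Z0; its first column is deliberately a
# linear combination of Z0, so Z1 does not have full column rank
Z1m = np.column_stack([Z0 @ np.array([1.0, 2.0]), rng.normal(size=(T, 2))])
Z1 = np.hstack([Z0, Z1m])

P0 = Z0 @ np.linalg.pinv(Z0)                     # projection on the common block
Z1m_perp = (np.eye(T) - P0) @ Z1m                # [Z1_minus] orthogonal to Z0

rk = np.linalg.matrix_rank
assert rk(Z1) == rk(Z0) + rk(Z1m_perp)           # rk Z1 = rk Z0 + rk [Z1m]_perp
assert rk(Z1) == 4                               # 2 + 2: the redundant column drops out
```

The redundant column of Z1m becomes (numerically) zero after the projection, so the pivoting routine would simply skip it, exactly as the ε test in section I.D.2 is meant to do.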
Also, rk Z_1 = rk Z_0 + rk [Z_1⁻]_{⊥Z_0}, rk Z_2 = rk Z_0 + rk [Z_2⁻]_{⊥Z_0}, and rk Z_3 = rk Z_0 + rk [Z_3⁻]_{⊥Z_0}.

Verification that the Procedure Produces the Desired Matrix

That (B.4) and any submatrix of the form [Z_i'y_j]_{⊥Z_k}, [y_i'Z_j]_{⊥Z_k}, or [y_i'y_j]_{⊥Z_k} are as claimed may be verified by comparing the calculations producing them with the calculations given in section I.D.2. (The calculations given in section I.D.2 are verified at the end of section I.D.2.) We will now verify that the elements [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j} are as claimed.

The relevant submatrices used in computing [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j} are (assuming i < j):

(B.12)  [ Z_j'Z_j            Z_j'y_j          ]        N_j×N_j   N_j×1
        [ [y_i'Z_j]_{⊥Z_i}   [y_i'y_j]_{⊥Z_i} ]        1×N_j     1×1

Let us initially assume that Z_j has full column rank. Performing elementary row operations on (B.12) to reduce the first N_j columns to zero below the diagonal is equivalent to premultiplying (B.12) by a nonsingular matrix such that:

        [ E_11   0 ] [ Z_j'Z_j            Z_j'y_j          ]     [ A_11   A_12 ]
        [ E_21   I ] [ [y_i'Z_j]_{⊥Z_i}   [y_i'y_j]_{⊥Z_i} ]  =  [ 0      A_22 ]

Thus:

        E_21[Z_j'Z_j] + [y_i'Z_j]_{⊥Z_i} = 0 ,   so   E_21 = −[y_i'Z_j]_{⊥Z_i}[Z_j'Z_j]^{-1} ,

and:

        E_21[Z_j'y_j] + [y_i'y_j]_{⊥Z_i} = A_22 .

Substituting for E_21 in the last equation, we get:

(B.13)  A_22 = −[y_i'Z_j]_{⊥Z_i}[Z_j'Z_j]^{-1}[Z_j'y_j] + [y_i'y_j]_{⊥Z_i} .

Since [y_i'Z_j]_{⊥Z_i} = y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']Z_j and [y_i'y_j]_{⊥Z_i} = y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']y_j, where Z*_i is a subset of the variables in Z_i, Z*_i having full column rank which is the same as the rank of Z_i (see section I.D.1), we may rewrite (B.13) as:

(B.14)  A_22 = −y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']Z_j(Z_j'Z_j)^{-1}[Z_j'y_j]
               + y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']y_j
             = y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i'][I − Z_j(Z_j'Z_j)^{-1}Z_j']y_j
             = [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j}   [see (I.47)].

Thus, in the case of a Z_j having full column rank, the desired element is obtained.

In the case of a Z_j having rank N*_j < N_j, row operations are performed on the columns corresponding to N*_j of the variables in Z_j before the diagonal elements corresponding to the remaining variables become less than ε. (The orthogonalization stops at this point.)
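Before completing the rank-deficient case, the full-column-rank verification (B.12) through (B.14) can be checked numerically (a toy check with illustrative names, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 25
Zi, Zj = rng.normal(size=(T, 2)), rng.normal(size=(T, 3))
yi, yj = rng.normal(size=T), rng.normal(size=T)

ri = yi - Zi @ np.linalg.lstsq(Zi, yi, rcond=None)[0]    # [y_i] perp Z_i
rj = yj - Zj @ np.linalg.lstsq(Zj, yj, rcond=None)[0]    # [y_j] perp Z_j

# (B.13): A22 after sweeping the Z_j block out of the (B.12) submatrix;
# note [y_i' Z_j]_{perp Z_i} = ri' Z_j and [y_i' y_j]_{perp Z_i} = ri' yj
A22 = ri @ yj - (ri @ Zj) @ np.linalg.solve(Zj.T @ Zj, Zj.T @ yj)

# (B.14): the same number is the fully adjusted cross-product
assert np.isclose(A22, ri @ rj)
```

The check works because the elimination step applies the Z_j projection to a vector that is already orthogonal to Z_i, which is precisely the chain of identities in (B.14).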
This is equivalent to performing row operations on the following submatrix of (B.12) (letting Z*_j be a submatrix of Z_j containing the variables corresponding to the N*_j diagonal elements used as pivots):

(B.15)  [ Z*_j'Z*_j           Z*_j'y_j         ]        N*_j×N*_j   N*_j×1
        [ [y_i'Z*_j]_{⊥Z_i}   [y_i'y_j]_{⊥Z_i} ]        1×N*_j      1×1

The same derivation may now be performed on (B.15) as was performed with (B.12); the only difference in the intermediate matrices obtained is that Z*_j will occur in place of Z_j wherever Z_j presently occurs. Thus, (B.14) becomes:

(B.16)  A_22 = −y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']Z*_j(Z*_j'Z*_j)^{-1}[Z*_j'y_j]
               + y_i'[I − Z*_i(Z*_i'Z*_i)^{-1}Z*_i']y_j
             = [y_i]'_{⊥Z_i}[y_j]_{⊥Z_j}   [see (I.47)].

Thus, the desired element is obtained even in the case of a Z_j having less than full column rank.

APPENDIX C

TENTATIVE PROOFS REGARDING THE CONSISTENCY OF δ̂_{k_1,k_2} AND δ*_{k_1,k_2}

Consistency and the concept of a probability limit (plim) are discussed in section I.B. For a discussion of the matrix algebra of the plim operator, see Goldberger [1964], especially pp. 115-120, and Christ [1966].^1

In this appendix we will need to distinguish between the matrices Y and Y_μ. As in our initial notation (section I.C), Y will refer to the T×G matrix of all of the jointly dependent variables in the system, and Y_μ will refer to the T×m_μ submatrix of Y corresponding to the m_μ "explanatory" jointly dependent variables in equation μ. Since it will cause no notational conflict, the y_μ vector (the T×1 submatrix of Y corresponding to the normalizing jointly dependent variable in equation μ) will be written as y, and the u_μ vector (the T×1 vector of disturbances of the μ-th equation) will be written as u. Also, m_μ, L_μ, and Λ_μ will be shortened to m, L, and Λ, respectively.

From preceding assumptions or derivations we have:

(C.1)  plim(1/T)U'U = Σ where U is T×M and Σ is M×M.^2 The diagonal element of Σ corresponding to the μ-th equation is σ²; hence, plim(1/T)u'u = σ².
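The content of (C.1) can be illustrated by a small seeded simulation (purely illustrative, not part of the proofs; σ² = 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 2.0
for T in (100, 1_000, 100_000):
    u = rng.normal(scale=np.sqrt(sigma2), size=T)
    print(T, u @ u / T)          # the sample moment settles near sigma^2

est = u @ u / T                  # value at the largest T
assert abs(est - sigma2) < 0.05
```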
(C.2)  plim(1/T)X'X = Ω_XX where X is T×Λ and Ω_XX is Λ×Λ.^3 The submatrix of X consisting of the predetermined variables in the μ-th equation is the T×L matrix X_μ; hence, plim(1/T)X_μ'X_μ = Ω_{X_μX_μ}, an L×L matrix.

1 As before in this paper, plim stands for plim as T → ∞.

2 Assumption 2, section I.C.3.

3 Assumption 3, section I.C.3.

(C.3)  plim(1/T)X'U = 0 where 0 is a Λ×M matrix.^1 Thus, plim(1/T)X_μ'U = 0 (with 0 an L×M matrix).

(C.4)  plim(1/T)X'V = 0 where V is T×G and, therefore, 0 is Λ×G.^2

(C.5)  plim(1/T)X_I'X_I = Ω_{X_IX_I} where X_I is a T×K matrix of instruments and Ω_{X_IX_I} is K×K.^3 We will assume that the variables in X_μ are contained in X_I (hence, Ω_{X_μX_μ} is a submatrix of Ω_{X_IX_I} as well as a submatrix of Ω_XX).

Let rk Ω_{X_IX_I} = ρ. Since for all T, (1/T)X_I'X_I is a moment matrix, Ω_{X_IX_I} is a moment matrix, and there exists a nonsingular ρ×ρ submatrix Ω_{X*_IX*_I} = plim(1/T)X*_I'X*_I, where X*_I is a matrix of variables from X_I. If Ω_{X_IX_I} is nonsingular (ρ = K), then X*_I is the entire matrix X_I and Ω_{X*_IX*_I} = Ω_{X_IX_I}. We will assume that rk X*_I = rk X_I = ρ for all T sufficiently large; however, X*_I and X_I may have rank less than ρ for small T.

(C.6)  plim(1/T)X_I'X = Ω_{X_IX} with Ω_{X_IX} a K×Λ matrix.^4 (Ω_{X_IX_μ} is a submatrix of Ω_{X_IX} as well as a submatrix of Ω_{X_IX_I}.)

(C.7)  plim(1/T)X_I'U = 0 where 0 is a K×M matrix.^5 Thus, plim(1/T)X*_I'U = 0 (with 0 a rk X_I × M matrix).

(C.8)  plim(1/T)X_I'V = 0 where 0 is K×G.^6

1 Assumption 3, section I.C.3.

2 Follows from (C.3). See (I.25).

3 Assumption 5, section I.C.3.

4 Assumption 5, section I.C.3.

5 Assumption 5, section I.C.3.

6 Follows from (C.7). See (I.26).
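Before the remaining derivations, the double-k-class-type estimator δ̂_{k_1,k_2} studied in this appendix can be sketched. The data-generating details below are illustrative assumptions; the check uses only the algebraic identity that k_1 = k_2 = 1 reproduces two-stage least squares when the variables in X_μ are contained in X_I.

```python
import numpy as np

rng = np.random.default_rng(6)
T, L, K = 200, 2, 4
XI = rng.normal(size=(T, K))                    # instruments; X_mu is contained in X_I
Xmu = XI[:, :L]
Ymu = XI @ rng.normal(size=(K, 1)) + 0.5 * rng.normal(size=(T, 1))   # one "explanatory" variable
y = Ymu[:, 0] * 1.5 + Xmu @ np.array([2.0, -1.0]) + rng.normal(size=T)

PI = XI @ np.linalg.pinv(XI)                    # projection on the instruments
perp = np.eye(T) - PI                           # [.]_{perp X_I} moments via residuals

def delta_hat(k1, k2):
    """The double-k-class form: moment matrix with k1-adjusted Y'Y block,
    right-hand vector with k2-adjusted Y'y block."""
    top = np.hstack([Ymu.T @ Ymu - k1 * (Ymu.T @ perp @ Ymu), Ymu.T @ Xmu])
    bot = np.hstack([Xmu.T @ Ymu, Xmu.T @ Xmu])
    rhs = np.concatenate([Ymu.T @ y - k2 * (Ymu.T @ perp @ y), Xmu.T @ y])
    return np.linalg.solve(np.vstack([top, bot]), rhs)

# k1 = k2 = 1 is two-stage least squares
W = np.hstack([PI @ Ymu, Xmu])
tsls = np.linalg.solve(W.T @ np.hstack([Ymu, Xmu]), W.T @ y)
assert np.allclose(delta_hat(1.0, 1.0), tsls)
```

The identity holds because Y_μ'Y_μ − [Y_μ'Y_μ]_{⊥X_I} = Y_μ'P_I Y_μ and P_I X_μ = X_μ, so with both k's equal to one the normal equations collapse to those of 2SLS.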
H I H H In addition to assumptions (0.1) through (0.9) we will require 5‘ the assumption: ; - -1 Q 0 * * * * YHXIQXIXIQXIYH Y X H H (0.11) is nonsingular. L. 0X Y 0XIX D H H H; It will be convenient for us to derive the plim of 1/T times the sums of cross-product of certain matrices with Y before commenc- ing the main derivations of this appendix (0.12) plim(l/T)Y'U - p1im(1/T)[XH' + v]'u p1im(1/T)HX'U + p1im(1/T)V'U H'plim(1/T)X'U + plim(l/T>(-1)r‘1[u z OJ'U p1im(1/T)U'U 2 -1 -1 def 3 - H 0 - T 0 - -F [g] = OYU . . I = x . . We will use plim(l/r)Y u QYHU , an m l submatrix of OYU 1Follows from the relationship between V and U. See (1.22). 2See (C.22). 2 3"def" denotes that we are defining -F-1[;] as OYU . 442 plim(i/T)[xn' + v]'xI (0.13) p1im(1/T)Y'X I = leim(l/T)X'XI + plim(l/T)V'XI def = + O = m = . I.mXXI XXI QYXI . . I = o X . We Will use plim(l/T)YuXI prI (an m K submatrix of OYXI), . g = . plim(l/T)YpXu OYuxu (an mXL submatrix of QYXI)’ and plim(l/T)Y¢X¥ = OYux*I (an mXp submatrix of OYXI). (0.14) p1im(l/T)Y Y- —p11m(1/T)[xn' + v] [xn' + v] = plim(i/T)[Hx'xn' + nx'v + v'xn' + v'v] = H[p11m(i/T)x'x]n' + leim(1/T)X'V + [plim(i/T)v'x]n' + p1im(1/T)V'V = nnxxn' + n-o + o-n' +-nVV = nnxxn' *‘vidgfinYY We will use plim(l/T)Y;Yu = QYHYH , a submatrix of Q YY ' Theoeem (C.75): If plim(kl - 1) = plim(k2 - 1) = 0, then u u -1 I I _ I Yu Yu- HEY Y uJiXI Y x Yuy k2[Yuy]lx 8 = k1’k2 x'Y x'x x'y u u u u u 1 is a Consistent estimator of 6 . Phoofi 06 Theotem (0.75): ‘ -1 I _k Y I flEYuYulLXI Yuxu Yu- 2Eu1LxI (0.16) 6k1,k2 - 6 = y - 5 X'Y X'X X' n u u u 1Since [Y'ptyhxI = [YH J'X Iy [see (1.56)], the right hand vector '-k 2[Y t] may be written as Y“ “lxl Y xI u "I mh‘ 'PE 443 d ' I E X 6 + : (an Since y [Yp #1 U) .1 Y'Y -k Y'Y Y'x '-k Y ' u u- 1[ u u-JLXI u D Yu 2[ uJiXI ‘ {[Y f X ]5 + u} - 5 X'Y x'x x' “‘ “‘ u u» u- p u . 
c = v (and Since [Yfillleu [YuYfillXI [see (1.56)] and [YH-J‘L'XIXH- -= ”11"“th - 0 [see (1.56) and (1.45)]): -1 Y'Y -k [Y'Y Y'x Y'Y -1< Y'Y Y'x ms nu 1 uujixl up nu- ZEHH'JJ-XI up 1 =- 5 x'Y x'x X'Y x'x I u u u u u u u u u. -1 'Y -k Y'Y Y'X '-1< Y ' u u- 1[ u uJLXI u 1» Yu 2[ uJLXI + u - 5 x'Y x'x x' u u u u = A6 + d where: ' -1 Y'Y -k Y'Y Y'x Y'Y -k Y'Y Y'x uu 1[HH]iXI up» up ZEHMLXI up (0.17) A .. - I x'Y x'x X'Y x'x u u u u u- and -1 Y'Y -k Y'Y Y'x y'-k Y ' u- » 1E1» HJlXI u u u- 2E MJLXI (C.18) d = u . x'Y x'x x' u u u 1b In evaluating plim A and plim d, we will first find it con- venient to evaluate plim(l/T){YJYP - kltY;Yul-XI} and . v __ k v _ plim(l/T)prYu ZEYfiYHJlXI} (C.19) plim(l/T)[Y;Yp]ixI a plim(l/T){Y;Y - [YdYu] } and for T u- IXI sufficiently large [see (C.5)] and using (1.55): 444 -1 I 11 ' - ' * *' * ' p m(1/T)[YuYu] plitn(l/'1‘)YMXI(XI XI) Xf Y“ . nY Y - [plim(l/T)YfiX‘f][p11m(l/T) (Xsf'xaf)]'1[p11m(1/I)xaf'Y J u u n B “Y“ Y ‘ 0Y x* alliX’VaxakY ° I I I I u Therefore, (C.20) plim(l/T){Y'Y - R Y' a u p 1[ uYHJlXI} 11 1/1: Y'Y - 11 k 9 “>111; p m 1- plim(l/T)[YL1Y“]J_ x 1 -1 -1 = - 1 - [n - 0 ] =