This is to certify that the dissertation entitled EMPIRICAL BAYES ESTIMATION FOR UNBALANCED MULTILEVEL STRUCTURAL EQUATION MODELS VIA THE EM ALGORITHM presented by See-Heyon Jo has been accepted towards fulfillment of the requirements for the Ph.D. degree in Counseling, Educational Psychology & Special Education (Statistics & Research Design). Major professor. Date: November 15, 1994. MSU is an Affirmative Action/Equal Opportunity Institution.

EMPIRICAL BAYES ESTIMATION FOR UNBALANCED MULTILEVEL STRUCTURAL EQUATION MODELS VIA THE EM ALGORITHM

By

See-Heyon Jo

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology and Special Education

1994

ABSTRACT

EMPIRICAL BAYES ESTIMATION FOR UNBALANCED MULTILEVEL STRUCTURAL EQUATION MODELS VIA THE EM ALGORITHM

By See-Heyon Jo

The question of how to analyze unbalanced hierarchical data generated from structural equation models has been a common problem for educational researchers and analysts. Among the difficulties plaguing statistical modeling are estimation bias due to measurement error and the estimation of the effects of the hierarchical social milieu in which education takes place. Over the last two decades, substantial progress in multilevel structural modeling and estimation techniques has been made for the balanced sampling design. This dissertation presents empirical Bayes estimation procedures for multilevel structural equation models in the context of unbalanced sampling designs. The computational procedure is implemented via the EM algorithm. It is particularly useful for the problem of estimating a large number of parameters in multilevel structural equation models. A multilevel structural equation modeling process with an example illustrates the general principles of empirical Bayes estimation with the EM algorithm. The accuracy of the algorithm was tested using a set of artificial data. The numerical results suggest that this new methodology is a potentially useful means for studying hypothesized causal relations among latent variables varying at two levels of hierarchy.

© Copyright by See-Heyon Jo, 1994

To my parents, brothers, wife and friends

ACKNOWLEDGMENT

I wish to express my extreme gratitude to my major professor, Dr. Stephen W. Raudenbush, for his patient guidance over the last five years. His insights contributed greatly to the content of this dissertation, and his encouragement and his intellectual and financial support helped me complete it.
On a more general note, I would like to thank the entire faculty of the Department of Counseling, Educational Psychology and Special Education at Michigan State University for providing me the opportunity to pursue an advanced degree in statistics and research design, and for making my stay here a very rewarding experience. My special gratitude goes to Dr. William Schmidt for his support. In the winter of 1990, Dr. Raudenbush and Dr. Schmidt gave me a precious opportunity to study multilevel structural equation models under their supervision. That winter enriched the idea of this dissertation. I would like to express my gratitude to Dr. Richard Houang for his insightful suggestions and comments. I wish to express my appreciation to the rest of my committee, Dr. James Stapleton and Dr. Frank, for reviewing my work and providing suggestions for improvement. To my father, mother, brothers, and relatives, I owe a special "thank you" for their support and encouragement. To my wife, Kyung-Nam Lee, I dedicate this dissertation. Without her unfailing love and support this study could not exist.

Finally, I believe William Faulkner (1897-1962) also deserves some thanks for saying: "I would like to think that there was someone there at that time too, to reassure them that man is tough, that nothing, nothing - war, grief, hopelessness, despair - can last as long as man himself can last; that man himself will prevail over all his anguishes, provided he will make the effort to; make the effort to believe in man and in hope - to seek not for a mere crutch to lean on, but to stand erect on his own feet by believing in hope and in his own toughness and endurance."

TABLE OF CONTENTS

LIST OF TABLES ............................................................................................... viii
LIST OF FIGURES ................................................................................................ ix

Chapter                                                                                Page
1. INTRODUCTION ............................................................................................... 1
   1.1 General Problem .......................................................................................... 3
   1.2 Objectives .................................................................................................... 5
   1.3 Brief History of Single Level Structural Equation Models ............................ 6
   1.4 Prior Work on Multilevel Structural Equation Models .................................. 9
2. MULTILEVEL STRUCTURAL EQUATION MODELS ..................................... 20
   2.1 The Model and Basic Notation .................................................................... 22
3. EM ALGORITHM FOR MAXIMUM LIKELIHOOD ESTIMATES .................... 36
   3.1 General Description and Application to the Multilevel Structural Equation Model ... 37
   3.2 Computation of the Iterates ......................................................................... 42
   3.3 Log-Likelihood for the Multilevel Structural Equation Model ...................... 46
4. NUMERICAL RESULTS ................................................................................... 50
   4.1 Generation of Data ...................................................................................... 50
   4.2 Results of the Analysis ................................................................................. 53
5. CONCLUSION .................................................................................................. 61
   5.1 Summary, Implications and Conclusions ..................................................... 62
   5.2 Future Work ................................................................................................ 65
APPENDICES
   Appendix 1 ........................................................................................................ 67
   Appendix 2 ........................................................................................................ 69
   Appendix 3 ........................................................................................................ 73
   Appendix 4 ........................................................................................................ 74
   Appendix 5 ........................................................................................................ 75
BIBLIOGRAPHY .................................................................................................. 78

LIST OF TABLES

Table                                                                                  Page
1. Structural Parts of the Multilevel Structural Equation Model ............................. 31
2. Number of Groups per Group Size ..................................................................... 51
3. Summary Statistics of the Multilevel Structural Equation Model for Balanced and Unbalanced Data Sets with 500 Groups ... 53
4. Maximum Likelihood Estimates for the Example Model for Balanced Data Sets ... 54
5. Conditional Expectations of the Regression Coefficients for Balanced Data Sets ... 54
6. Maximum Likelihood Estimates for the Example Model for Unbalanced Data Sets ... 55
7. Conditional Expectations of the Regression Coefficients for Unbalanced Data Sets ... 55
8. The Values of the Observed Log-Likelihood ...................................................... 56
9. Maximum Likelihood Estimates for the Example Restricted Model for Unbalanced Data Sets ... 58
10. Conditional Expectations of the Coefficients for Exogenous Variables in the Restricted Model for Unbalanced Data Sets ... 59
11. Estimates of π's for Balanced Data ................................................................... 60
12. Estimates of π's for Unbalanced Data ............................................................... 60

LIST OF FIGURES

Figure                                                                                 Page
1. The Path Diagram for an Achievement Model .................................................... 21

CHAPTER 1

INTRODUCTION

As a consequence of various theoretical developments and of improvements in computing, maximum likelihood (ML) estimation has become a viable procedure for estimating parameters in multilevel structural equation models under the balanced sampling design. Many of these developments were reviewed in detail by Jo (1993). The initial interest in ML estimation of a multilevel covariance structure model was noted and developed by Schmidt (1969). Schmidt and Wisenbaker (1986) extended this work to the structural equation models (Joreskog, 1973) for balanced hierarchical data.
McDonald and Goldstein (1989) derived the likelihood equations and derivatives for a bilevel structural equation model which allows for variables measured strictly at a higher level, though no computational approach was made available. They also indicated that the procedure for computing ML estimates was less well developed for the unbalanced sampling design. Recently, building on the balanced-data theory provided by Schmidt (1969) and McDonald and Goldstein (1989), Muthen (1990) showed that the maximum likelihood fitting function could be rewritten such that the between and within structural models could be estimated by means of a multi-population analysis in LISCOMP (Muthen, 1987) or other comparable structural equation software. In the case of balanced data this can be accomplished by treating the within-group deviations as sampled from one population and the between-group deviations as sampled from a second population. In the case of unbalanced data, each cluster of groups having the same number of observations is treated as one population. Lee and Poon (1992) also used the strategy of classifying level-2 units into subsets of level-2 units having equal sample size. They proposed an estimator for such data which, though not maximum likelihood (ML), has the same asymptotic distribution as the ML estimator as the number of level-2 units per subset increases without bound. Computationally this estimator is available using standard software programs such as LISREL (Joreskog and Sorbom, 1993) or EQS (Bentler, 1989). More recently, Raudenbush (in press) proposed an alternative approach for the unbalanced case. He conceptualized the problem in the framework of groups which could all have the same number of sampled cases but are missing data for some individuals. In particular, in the M-step (maximization) the method uses a standard program such as EQS (Bentler, 1989). Vredevoogd (1993) applied this general approach to global models (where two indicators for a group-level latent variable are included) in her dissertation proposal. Jo (1993) also applied the general procedure to a set of linear structural equation models.

The purpose of this dissertation is to develop empirical Bayes estimation procedures for computing maximum likelihood (ML) estimates of the parameters in multilevel structural equation models in the context of the unbalanced sampling design. The procedures do not require classifying level-2 units into subsets of level-2 units having equal sample size. We present a multilevel structural equation modeling process with an artificial example, which illustrates the general principles of empirical Bayes estimation with the EM algorithm.

1.1 The General Problem

A distinguishing characteristic of the data encountered in many areas of educational, medical, social science (sociology, econometrics, management, marketing), and genetics research is that the sampling structure is hierarchical. For instance, students are nested within schools, workers within firms, patients within treatment-specific medical programs, family members within a family tree, or residents within census tracts. Individuals also can take the role of independently observed groups. Generally, students are taught in groups by a teacher, several classrooms and teachers are grouped together into a school, schools into districts, and districts are clustered in states. Students who attend the same school or classroom are therefore expected to share certain educational policies and practices.
As a result, the educational outcomes for these students will be, to varying degrees, intercorrelated. These effects of clusters are most validly viewed within the context of multilevel linear models. Much social science data comes from two- or three-stage sampling designs. Large-scale educational assessment, for example, is typically conducted by drawing a sample of schools and, from those schools, sampling the students who will take the assessment test. This hierarchical fashion of sampling is frequently chosen in large-scale surveys, such as the National Longitudinal Study, with data gathered on the educational aspirations and attainment of high school seniors of 1972, the Second International Mathematics Study (SIMS; Crosswhite et al., 1985), and the Third International Mathematics and Science Study (TIMSS; Schmidt, 1993). Under the standard assumption of independent observations, covariance structure modeling (Joreskog, 1973) of such data misguides statistical inference by not taking into account the intracluster correlations which are present in hierarchical data. An important implication of such structure is that the classical assumption of independence among nested observations is violated. Ignoring the existence of hierarchy in model building gives rise to several methodological and substantive problems that have been well documented in the literature (Burstein, 1980).

In the context of the linear model, statisticians (Lindley and Smith, 1972; Smith, 1973; Raudenbush, 1984, 1988; Aitkin and Longford, 1986; Goldstein, 1986) developed hierarchical linear models (HLM), which are appropriate and powerful means of modeling hierarchical data. Many of these developments and examples are found in the recent book by Bryk and Raudenbush (1992). It was not until hierarchical modeling techniques (Aitkin and Longford, 1986; Goldstein, 1986; Mason, Wong and Entwistle, 1984; Raudenbush, 1984; Raudenbush & Bryk, 1986) were developed that complex relationships among variables across all levels could be inferred. Such techniques have been widely used for various research topics such as cognitive growth and change (Bryk and Raudenbush, 1987; Goldstein, 1989), population studies (Mason et al., 1984), meta-analysis (Raudenbush and Bryk, 1985), and evaluation of educational effectiveness (Aitkin and Longford, 1986; Raudenbush and Bryk, 1986). However, there have been only rare attempts to apply the methodology to structural equation models for hierarchical data. In discussing the empirical Bayes approach, Muthen (1990) also indicated the plausibility of applying it to the multilevel structural equation model by estimating each group's factor value under the assumption of "exchangeability" (deFinetti, 1937; Lindley and Smith, 1972) of the groups.

The main tasks in this dissertation are (a) to incorporate the effects of two levels of social organization into statistical models for outcomes measured at the individual level and/or the cluster level, and (b) to develop latent variable models that simultaneously incorporate effects of structural relations and measurement error.
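To make the consequences of clustering concrete before turning to the objectives, the short sketch below simulates two-level outcomes and computes the intraclass correlation together with the resulting design effect, 1 + (n - 1)ρ, by which a single-level analysis understates the sampling variance of an estimate based on clustered observations. It is illustrative only; the group sizes, variance components, and all names in the code are hypothetical and are not taken from the data analyzed later in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

J, n = 500, 10            # hypothetical: 500 groups of 10 students each
tau, sigma2 = 4.0, 16.0   # hypothetical between- and within-group variances

# Simulate y_ij = mu + u_j + e_ij with u_j ~ N(0, tau), e_ij ~ N(0, sigma2).
u = rng.normal(0.0, np.sqrt(tau), size=J)
y = u[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(J, n))

# One-way ANOVA estimator of the intraclass correlation.
group_means = y.mean(axis=1)
msb = n * group_means.var(ddof=1)                               # between-group mean square
msw = ((y - group_means[:, None]) ** 2).sum() / (J * (n - 1))   # within-group mean square
icc = (msb - msw) / (msb + (n - 1) * msw)

# Design effect: factor by which the variance of the grand mean is inflated
# relative to a simple random sample of the same total size.
deff = 1.0 + (n - 1) * icc
print(f"estimated ICC = {icc:.3f}, design effect = {deff:.2f}")
```

With the values used here the population intraclass correlation is τ/(τ + σ²) = 0.20, so the variance of the estimated grand mean is inflated by a factor of about 2.8 relative to what a model assuming independent observations would report.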
1.2 Objectives

The primary objectives of this dissertation are: (1) to review previous relevant advances in statistical modeling and estimation procedures for the multilevel structural equation model; (2) to describe a multilevel structural equation model and develop empirical Bayes estimation procedures for ML estimates via the EM algorithm; (3) to write the computer program needed to implement this new estimation procedure; and (4) to demonstrate by the use of simulated data that this estimation procedure produces accurate parameter estimates.

1.3 Brief History of Single Level Structural Equation Models

The measurement of latent constructs with multiple manifest variables began with the work of Spearman (1904) early in this century. In 1904, Spearman proposed the method of factor analysis to investigate Galton's theory (Galton, 1883) of "intelligence," namely that a single common factor and a specific factor constitute cognitive ability as measured by educational tests. This model evolved to represent intelligence with a hierarchical structure (Vernon, 1950). Thurstone (1947) extended Spearman's theory to a multiple factor analysis model. Apart from Spearman's factor analysis there is the work of Wright (1934), who derived path analysis for research in genetics. Before Lawley's (1943) development of the maximum likelihood function for factor analysis, the classical method was not based on the statistical theory of random sampling. Although computational methods were not available at that time, Lawley derived the partial derivatives of the logarithm of the likelihood function with respect to each element of the covariance matrix. Building on Lawley's ML estimation theory, Joreskog (1967, 1973, 1977) developed the structural equation model. "Linear structural equation modeling (1977) represents an important combining of the traditions of econometric and psychometric methods producing a set of procedures that enable researchers to separate the structural part of the model from the measurement properties of the variables. The structural part of the model represents hypothesized networks among latent construct variables imperfectly projected in the observed indicators. This scheme of formulation allows the separation of issues of measurement error from the assessment of the structural relationships that embody the actual purposes of the research. This tradition has seen many applications in education, psychology, and sociology over the last 20 years." (Raudenbush and Schmidt, 1991)

Joreskog (1977) adapted two optimization algorithms, steepest descent and Davidon-Fletcher-Powell. More recently he extended LISREL to nonlinear structural models (Joreskog and Sorbom, 1993). The recent version, LISREL 8 with PRELIS 2 (Joreskog and Sorbom, 1993), provides the easy-to-use SIMPLIS command language in its PC version. As noted by Austin and Wolfle (1991), structural equation modeling is not a recent development (Bentler, 1983; McDonald, 1978), nor is it the work of any one individual or disciplinary area. There have, however, been other lines of inquiry. A second line of inquiry is represented by the work of Bock (1960), Bock and Bargmann (1966), Wiley (1967), and Wiley, Schmidt and Bramble (1973). They addressed a set of models formally parameterized as the factor analysis model but with different notions as to the roles of the parameters themselves.
The model for the observed score vector of p tests is : The model implies that the vector y has a multivariate normal distribution with mean vector u and covariance matrix 23: 2=A:,+n,2,] (1.4.12) n J" de zji‘Zde 'Y.d)()7,—d _j7..d)T (1-4-13) d j=l 1 D J4 ")4 _ _ T Sw = —Z 2(yijd _y.jd)(yrjd —y.jd) (1-4-14) N_J d=l j=l i=1 with the following definitions: )7, = the sample mean vector for the d-th subset. ya, = the outcome vector for the i-th individual in the j-th group classified into the d-th subset. Jd = the number of groups of the d-th subset. ,u = the population grand mean vector. 14 n, = the total number of individuals in d-th subset, d=1,...,D. njd = the number of individuals in j-th group classified into the d-th subset. j = index for the groups classified into a subset of distinct size N = the total number of individuals in a study. J = the total number of groups. 531,, = the sample group mean vector for the j-th group. Note that 1': 1,...,njd. j =1,...,Jd. d=1,...,D From a structural equation modeling point of view, the multilevel data ML fitting function can be viewed as corresponding to a simultaneous analysis of independent observations from D + 1 heterogeneous “populations”. with the D populations for SM ’s plus the within-group “population”. All of the parameters are constrained to be equal across the D between-group populations except for the scaling factor, nd. In equation (1.4.41) “it should be noted that the between sample covariance matrices may be singular due to being created by summation over fewer units than variables. This may prevent the use of certain conventional structural modeling software where positive definite matrices are assumed.” (Muthen, 1990). To find the ML estimates one has to set up a command file (see, Muthen, 1990) to run EQS or LISCOMP program according to the model equations. By use of the statistical concept of "missing data" (Dempster, Laird, and Rubin, 1977) Raudenbush (in press) developed a new estimation procedure. In theory the balanced data is equivalent to the complete data, while unbalanced data is incomplete data. In the unbalanced case, the E-step computes the conditional 15 expectations of the complete data sufficient statistics given observed data and the current parameter estimates. In the M-step the standard EQS software can be used to find the ML estimates of the parameters iteratively. Raudenbush (in press) postulates that "by supposing one has sampled n units within each of J clusters, one can apply the Muthen's balanced-data method. However, within each group k, only n”. observations are available with n—nU observations missing. Then one might regard the balanced data as the complete data, the observed unbalanced data as incomplete data. Consequently the maximization of the complete data likelihood is the same as the Muthen's balanced data approach for hierarchical data using the LISCOMP (Muthen, 1986) or EQS (Bentler, 1989) and so on”. The complete data sufficient statistics can not be observed but can be estimated by means of their conditional expectations given the observed data and a current estimates at the parameter values. This process constitutes the E-step. At level 1 (within cluster) we have p observations on each of n units, collected in the p by 1 vector y,. These vary randomly around the cluster mean ,u, according to the model: yq=flj+e,,e,~N(0,2) (1.4.15) The model at level 1 can be written as : [:lj]=[:lj:lpj +[:”] (1.4.16) 2 2; 21' 16 where y”. is pnlj by 1 observed vector, yzj is p122]. by 1 missing vector. A”. 
=1m®1p A2,. =lm®1p sq, =1,U®2 (1.4.17) ‘11,, 4,1,8): "=71” +1221. At level 2 the model can be written as: ”j : fly +upuj ~ N(O: Tim) (1.4.18) x1. is a vector of group level observed variables. The expected value of the complete data sufficient statistics for within group variance covariance matrix given the parameter estimates from the M step of the previous iteration is: J O 0 SW = Slw +W(J_ 1)2w +ijnj[Lj +(ylj -y2j)(j71j _y2j)T] (1419) j=l S“, is the usual pooled within-group variance-covariance estimate based on the observed data; T" 1420 T... (..) 17 L1. = (1211.2: +T')“n,j. ;' (1.421) W __ " n (1.4.22) w=1v,/n1 N2 is the total number of missing level-1 units. 57,}. is the observed group mean vector and 72']. is the posterior mean vector for the missing observed data in each group. ii,- = LT?”- +(1-L‘}-‘)l'x,- (1.423) r = Tali? (1.4.24) The expected value of the complete data sufficient statistics for S” given the parameter estimates from M-step of the previous iteration is: 5,, = S... +26, —r)1w,(i;,- —r.,.)—Wo7; —y.)1’ (1.4.25) 51,. = 2x)??- - Jfi’ (1.4.26) The sufficient statistic for Zn is: J Sm =ij f—Jfi’ (1.4.27) j=l The expected value of the complete data sufficient statistics for variance covariance matrix for y given the parameter estimates from the M step of the previous iteration is: 18 S» = 1:27:57; +nwaL}l +JWZW 41.077" —(n/J)Zw,2.L;' +172 (1.4.28) where 7; = (1--wj)ylj +wjj7;j (1.4.29) y‘ =(1—W)j7,+1772’ (1.4.30) _ 1 J _ yr =72"qu (1.4.31) 1 i=1 J N1 = Zn”. (1.4.32) j=l . 1 J . Y2 =1—v-Z(n-nl,-)Y2, (1.4.33) 2 j=1 Given the starting values produced by Muthen’s ad hoc estimator, expected values for the complete data sufficient statistics are calculated by a Fortran program (Jo, 1993). These estimates will then be used to obtain maximum likelihood estimates of the parameters using the EQS program. An executive computer program provides the mechanism to switch on the Fortran program for the E-step and then the packaged program for the M-step. The previous work in the field of multilevel structural equation models made substantial progress. In this dissertation we propose a new approach which does not require classifying level-2 units into a subset of equal sample size. In conclusion of this chapter we provide a brief preview of subsequent chapters. In chapter 2, we present the general structural equation model with an 19 example. We also transform the model in terms of mixed model form. And we will briefly describe some typical research questions that may be addressed by means of multilevel structural equation models. In chapter 3, empirical Bayes estimation procedures are developed. The maximum likelihood estimators for the parameters are given, and we present the observed log-likelihood function. In chapter 4, artificial data are generated for checking the accuracy of the parameter estimates. The analysis is carried out by use of a program in Gauss. The index of goodness-of-fit of the model is presented, and the likelihood ratio for two alternate models is also given. In chapter 5, the summaries of each chapter are given, and the implications of the models are discussed. And future research questions are also presented. CHAPTER 2 MULTILEVEL STRUCTURAL EQUATION MODELS To illustrate how measurement and substantive theory can be integrated between and within levels in one overall fi’amework, a hypothetical achievement model will be examined as an example. 
Consider a model where achievement scores of a mathematics test are believed to be influenced by a student's attitude toward mathematics, individual characteristics, e.g., gender, and class characteristics, e. g., teaching styles. The teaching styles such as discovery- oriented instruction or expository teaching are believed to influence attitude and achievement on the classroom level. Gender also is believed to be related to students' attitude and achievement on the individual level. The path—diagram for this hypothetical achievement model is shown in Figure 2.1.1. In our example attitudel measures an individual's view on the usefirlness of mathematics in our life and is based on the sum of scores on the four attitude items, each of them scaled as a Likert (1932) response with categories: strongly disagree (1), disagree (2), undecided (3), agree (4), and strongly agree (5). These items are: l. I can get along well in everyday day life without using mathematics. 2. A knowledge of mathematics is not necessary in most occupations. 3. Mathematics is not needed in everyday day living. 4. Most people do not use mathematics in their jobs. Attitude2 measuring "Attracted" to mathematics, is based on the sum of the scores on the five attitude items, each scaled as five-category Likert; strongly disagree (1), disagree (2), undecided (3), agree (4), and strongly agree (5). These items are: 1. I would like to work at a job that lets me use mathematics. 20 2 1 2. I think mathematics is fun. 3. Working with numbers makes me happy. 4. I am looking forward to taking more mathematics. 5. I refuse to spend a lot of my own time doing mathematics. The ACHl is the first part composed of basic facts and principles, while the ACH2 is composed of problem solving questions. teaching style betabl betab) class class attitude achievement alphabl 1.0 Lambdabl 1.0 LambdabZ attttudel attitudeZ ACH I I . 0 Lambdawl I .0 Lambde I ha! attitude 0 p \ / achievement beta] beta2 gender Fi 2.1.1AP hD’ mforM l' E u tionM e1 2.1 The Model and thg Basic Notation 22 A simple item level equation for each individual: yr] : Awntj +1‘b’7bj+ 81'} 4H ylr’j y 31; _y4tj y2ij 1 N‘t ‘- bfl 2-1 i Up\ 0 ‘ ' 1 o ‘ 0 7719' + ’11» 0 F771»): 1 7721'; 0 1 JIM): w2_ .. O 152.1 + 82" 33,]. (2.1.1) (2.1.2) where j=1,2,..,J for classroom, i=1,2,...,nj for students nested in classroom j. The subscript "w" means the within-level, while "b" means the between-level. In terms of our educational example, equation (2.1.2) can be expressed as follows : attitude2 ,1. ACH1, where 3:1 ~ N (O, 2), a typical form for 2‘. is : 2 = —attitudel,j _ _ ACH2, _ attitude”. achievement”. }. attitude“ I. _achievementb21. O 0 a: O 0 0% O 0 a ]+ Assuming structural linear relationships among constructs, the theoretical ' (2.1.3) 2 3 relationships on the within-level depicted at the bottom part of Figure 2.1.1 can be expressed through the following structural equation: ”10' = 0 0 Um- filo flzo z"). uuj [7721!] [a] Oiinzyi+ifi30 [3,0] [220.] +[u2ij] (2-1-4) where 11,]. = [uwuw]r, u”. ~ N(O,A), A = Diagonal(5p,p =1,...,P). In terms of our educational example, equation (2.1.4) can be expressed as follows : attitude”. = O O attitude”. + .510 320 1 + ”Ir (2.1.5) achievement,j a, 0 achievement”. '63,) ,640 gender}, 112,1. Equation (2.1.4) stipulates that on the individual-level the latent variables are captured as a structural linear function of themselves and the predictor variable. In our example, gender is used as a predictor variable. In equation (2.1.4) 22,]. 
is gender, while 2“,. is unit value so that the model has intercept terms. Now we reduce equation (2.1.4) into the equation (2.1.6). [“1 =1: 1 1% (2111111 ‘11") 772:; ‘a1 1 flso 540 22a _al 1 ”211' Then the reduced form for the within-level structural equations (2.1.4) is : 24 ”to ,. 2,. z ,. O O - 7: vi. '7" = " 2’ 2° + ” (2.1.7) 7720' O 0 21:1" Z;.~,-_ 7’ 30 V20 1.71.40- where P750- -1 ”20 =veco[l:1 0] [7610 £201) ”30 ‘a1 1 1630 7640 -7’403 where vec' stacks the transpose of each row of a matrix into a vector. In terms of our educational example equation (2.1.7) Mean be written: l'fllo'i attitude”. = 1 gender, 0 0 7:20 + V.) (2.1.8) achrevement,j O O 1 gender”. - 7:30 v20. _”w_ Now on the between-cluster level, the structural relationships depicted at the upper part of Figure 2.2.1 can be expressed as follows : 25 [77011:] ___I: 0 0:][77011] + [flat] [WI] + ubu‘ (2.1.9) 77sz ab] 0 77sz b2 "sz where uh}. = [umum]r, uh]. ~ N(0,A,,), A, = Diagonal(6,,p,p =1,...,P). In terms of our educational example, equation (2.1.11) can be expressed as follows : attitude ttr't de . . . bi = 0 O a. u b’ + p“ [teachingstyle.]+ u,” (2.1.10) achrevementbj a b, 0 achrevementbj ’ ub b2 21' Equation (2.1.9) is the expression for the structural relationships among latent variables and the predictor variable on the group level. In equation (2. 1.9) w]. is teaching style. All of exogenous predictor variables are observed directly Without error, e. g., school location (rural, urban), school sector (public, nonpublic), religion, gender, ethnicity, family size (numbers of a household), individual's age in months and years, current membership in a political party or sports club. Then we have the reduced form for between-level structural equations (2. 1.9) m” = l 0 ‘1 flu 1 0 4 ubli [771.2,] [—a,, 1] [WNW-FLO!“ 1] [um] (2.1.11) We can represent (2.1.11) in the following form: 26 77b} = Pyjflbj + Vb; (2.1.12) w 0 V - [mu]:[ 1 ][”b10]+[ bit] (2.1.13) ’7sz O W]. ”020 vblj 01' where Vblj $‘\0 Vb” :[ 1 0]“ uh”- v.~N(0T ) V321 "Qbr'l V52" —abl 1 ”b2,- ’ b] , m’ ’ ”bro __ 1 0 -1 flbl [”bzoi—[—ab1 1] [flbZ] In terms of our educational example, equation (2.1.10) can be written as follows : [ attitude,j ] ___ [teachingstylej 0 ][7rm:l +[Vb1 J] (2 1 14) achrevementbj 0 teachmgstyle 1' am v”). Representation of the Equations in Matrix Form We can represent the equation (2.1.1) in matrix form without subscripts: y=AWn+ABnb +3 (2.2.1) 27 where y : [y111y211y311y411,---,y1n,ly2n,/y3n,1y4n,.l IT: incur emple, 4N by l veCtor~ a = [amaznsmsm,...,£,nfleznfl£,nfle,nf,]r ,in our example, 4N by 1 vector. N 2 Zn}. "Aw, 0 0 0 ‘ 0 11,, 0 0 AW : 0 0 A103 0 s L 0 0 0 A,“ FA”, 0 0 0 I _ 1 0 _ 0 A”, 0 0 1,, 0 where, A”. = 0 O AM.3 0 , A”... = 0 l o o 0 Am, L 0 2.9L "A,l 0 0 0 1 _ _ 0 A 0 O 1 0 b2 A.“ 0 AB: 0 0 A,, 0 ,Abj- 0 1 0 11 _ 0 0 0 Ade - ”- 77,1. =[77,”772”,...,17W‘Jn,nr,]r is in our example, a2N by 1 vector 28 77,, = [nb1ln52,,..., 1),, Jaw? is in our example, a 2] by 1 vector The matrix form for the equation (2.1.4) without subscripts is: n: An+Bz+u where ’A, 0 0 0 A2 0 A = 0 0 A3 _ 0 0 0 "B, o 0 0 8,. 0 B, 0 0 0 B: o 0 B3 0 ,8I 0 _0 0 0 B, L0 filo #20] .. B.= forallor). " [.330 fi40 (2.2.2) u = [u,,,u2”,...,u,,, flu," 1,]7 is in our example, a 2N by 1 vector. 29 The matrix form for the equation (2. 
1.9) without subscripts is: 1],, = Ab 17,, + Bbw + ub (2.2.3) where FAbl o o o‘ "A”, 0 0 0' 0 A12 0 o O Ab}.2 0 0 0 0 Ab: 0 0 A,3 0 ,Abj= 0 0 Ab); 0 ,Abj,=[ ] ab, 0 _0 0 0 A,” L0 0 0 0 Arm Ab], is a lower triangular matrix with diagonal elements are zeros. 3,, 0 0 0 0 3,, 0 0 B - 0 0 B 0 B —[fl“°] ' b _ b3 9 bj — 801088811]. 761220 _ 0 0 0 Bud u, = [ub,,ub2,,...,uwum]r is in our example a 2] by 1 vector Now we can express the reduced form equation (2.1.7) in matrix form: 7] = Z7: + v (2.2.4) 30 where, 771.). = [7711177211,...,771,,J J 77an ,]T is in our example a 2N by 1 vector _ _:r,,.j 22,]. 0 0 up It)" ‘21)“ V.)- = [vmvm ,--«,V1..,JV2..,; ]T is in our example a 2N by 1 vector. We can express the between-level reduced form equation (2.1.12) in matrix form: ’71, = Wll'b +Vb (2.2.5) 272* I where 1], = [abut],21 ,..., 0,1,77sz ]7 is in our example a 2] by 1 vector 31 W: . , ng_=|:u/1J 0], 7Tb=|:flblo:| 0 Wu ”1220 3’ vb = [vbllvb2,,...,vmvb2,]" rs rn our example a 2M by 1 vector Table 2.1 Structural parts of the multilevel structural equation model Original Form Reduced Form where Withingroup n=A77+Bz+v n=Z7r+v 7r=vec[(12—AJ.,.)"BJ,] Between group 17,, = Abnb + Bbw + vb 7h = Wzrb + vb 71;, = (12 — Ab], )'1 Bb). e} $311 4 ‘ Transforming the Model into the Mixed Model Form By substituting the structural equations (2.2.4) and (2.2.5) shown in Table 2.1 into (2.2.1) without subscripts, we have the following combined equation (2.3.1). This representation permits us to develop a special version of the EM algorithm for multilevel structural equation models. y =[A,,Z |A,W][:]+[A,|A,][: ]+a (2.3.1) 32 In a more compact form we have: y: AOZa+ Aow+£ (2.3.2) where A0 : [AwiAbJ’ N1 ll NI 0 I__1 The model equation (2.3.2) is a special case of the general mixed model (Raudenbush, 1988) Y=4Q+AQ+E Q3” .A,=1\,Z, A2 = A0, g=w/*”‘” 03% 0, = 7r,'/ \ (7.1 4-1 ‘ .. E=£,"*""M 3 3 In equation (2.3.3), 6, ~ N (0,1"), since our prior knowledge about 19, is assumed null, the prior precision associated with 6, becomes null. 02 WM», 92 3893890 r 1,031, 0 0" 0 1,81“ E ~ N(0,‘I’). ‘I’=IN®Z, Based on this general model(2.3.2) I develop the empirical Bayes estimation procedure in Chapter 3. In our structural equation model, we consider a population of N level-one units, indexed k (group) and i (individual). Associated with each level-one unit are three vector- valued variables y, z and w. The values of the design variables, 2 and w, are completely known for all level-one units before observations are carried out, but the values of the outcome variables, y (the four indicators in our illustrative example), are not known at all. Design variables are considered fixed and known in our multilevel structural equation models. Then the marginal distribution of y is: y ~ N(p,) (2.3.5) where ,u= AOZn' (2.3.6) 34 ch: AOZHAOZ)’ + AorzAf, + z where And the conditional distribution of y given r] is: yl n~ N(Aon,>3) (2.3.7) - (2.3.8) Note that in the model there are not measurement errors at 2 levels that are distinct from the model residuals. This is a limitation on the illustrative example. To lmve an identifiable model we restrict the factor loadings for the first indicators of the latent variables to unit, the variance-covariance matrix 2 to a diagonal matrix, and “A” matrices to be lower triangular. In our model the total number ofvariance-covariance parameters to be estimated are 16, while the number of unique elements of the variance structure are 10 for each level. 
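As a small numerical illustration of the covariance structure in equations (2.3.5) through (2.3.8), the sketch below assembles A_w = I_n ⊗ Λ_w and A_b = 1_n ⊗ Λ_b for a single group of n = 10 members and forms the implied covariance matrix of the stacked outcome vector, conditional on the observed predictors z and w. This is a sketch only: it borrows the population parameter values later used to generate the artificial data in Chapter 4, and all variable names in the code are ours rather than the dissertation's.

```python
import numpy as np

n = 10  # members sampled in one group (the balanced case of Chapter 4)

# Measurement loadings of (attitude1, attitude2, ACH1, ACH2) on the two
# latent variables (attitude, achievement); values as in equation (4.1.1).
Lambda_w = np.array([[1.00, 0.00],
                     [0.82, 0.00],
                     [0.00, 1.00],
                     [0.00, 0.73]])
Lambda_b = np.array([[1.00, 0.00],
                     [0.75, 0.00],
                     [0.00, 1.00],
                     [0.00, 0.66]])

# Latent residual covariance matrices and unique variances
# (population values listed in Table 4.3).
T_pi  = np.array([[30.0,  9.0], [ 9.0, 32.7]])   # within-group level
T_pib = np.array([[20.0,  4.0], [ 4.0, 20.8]])   # between-group level
Sigma = np.diag([10.0, 12.0, 14.0, 16.0])        # measurement error variances

# Stacked design matrices for one group: A_w = I_n (x) Lambda_w, A_b = 1_n (x) Lambda_b.
A_w = np.kron(np.eye(n), Lambda_w)
A_b = np.kron(np.ones((n, 1)), Lambda_b)

# Implied covariance of the 4n x 1 stacked outcome vector, conditional on z and w:
# within-level part + shared between-level part + measurement error.
Phi = A_w @ np.kron(np.eye(n), T_pi) @ A_w.T \
    + A_b @ T_pib @ A_b.T \
    + np.kron(np.eye(n), Sigma)

# The 4 x 4 block for a single student is
# Lambda_w T_pi Lambda_w' + Lambda_b T_pib Lambda_b' + Sigma.
print(np.round(Phi[:4, :4], 2))
```

Conditional on the predictors, the implied variance of the first indicator is 30 + 20 + 10 = 60, close to the sample variance of Y1 reported in Table 4.2, while the blocks linking two different students in the same group contain only the shared between-group part Λ_b T_πb Λ_b'.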
One can also include the group-level observed variables (global variables) for latent constructs in the general multilevel structural equation model (2.1.2). For example, in the study of the United States SIMS data each of the constructs of teaching practices and training and experience are measured by two indicators (Vredevoogd, 1993). In that case we have the following form of the item-level equations for each individual: "y,,,‘ ’1 0‘ F1 0 0 ' ”an,” y 21:: ’1'; (I) r A: 0 O P772311: 521:: y3ld 77m 1 0 33k, ym = O ’12 _772u]+ 0 ’182 0 mm + 341:: (2.3.9) xw 0 0 0 1 “my: 35,“. _x2a., _ O 0 _ L O O ’183_ fan-J where x,,,, ,th are the indicators for the group-level construct, teacher's teaching practices, which is supposed to influence the class posttest (Vredevoogd, 1993). Then one can see that this model is a special form subsumed into the general model (2.3.1). In the conclusion of this section we note the measurement model specifications. There are three types of specifications. The first measurement model is implemented by requiring equal factor loadings for all manifest variables and equal unique variances (Joreskog, 1971). The second measurement model retains the assumption of identical error variances across measures, but allows factor loadings to difi‘er. The second model provides a more realistic description of actual data where observed measures are similar in content but differ in difficulty. The third measurement model is that the observed measures have identical factor loadings but have unequal error variances. Of particular importance are the measurement models in which those measurements have different factor loadings and unequal error variances, but the manifest variables are highly correlated (i.e., they measure the same thing to somewhat high degree). CHAPTER 3 EM ALGORITHM FOR MAXIMUM LIKELIHOOD ESTIMATES After a model has been formulated, the statistical problems are to estimate the parameters in the model and to test the fit of the model to the data. General descriptions of the EM algorithm for the multilevel structural equation models are given in this chapter. Dempster, Laird and Rubin (197 7) presented the EM algorithm as a general iterative method for computing maximum likelihood estimates fi'om “incomplete data”. Wu (1983) presented it in a more general context, viewing it as a special optimization algorithm. The EM algorithm is particularly usefirl when analytic expressions exist for the conditional expectation of the missing data and for the maximum likelihood estimates (MLE) of the model parameters given the observed data and missing data. Although in the literature it has been known as a method for estimating parameters of a model when observed data can be regarded as incomplete data, there were early uses of EM notions by Hartley (1958), Healy and Westmoratt (1956), Baum et al (1970), Brown (1974) and Sundberg (1974). In Rubin (1991) the essential idea of EM algorithm is briefly depicted: “The basic idea behind the EM algorithm is very old and very intuitive and can be colloquially described follows: 1. Given a problem that is difficult to solve, formulate it so that if missing data were observed, then the solution would be at hand; in particular, formulate the problem so that a good estimate (e.g., the maximum likelihood estimate, MLE) of the 36 37 parameter 9, 6, would be easy to find if the missing values, Y were observed in "HS ’ addition to the observed values, Y . 
Notice that ”missing data” is viewed quite broadly to include, for example, latent variables in psychometric models. 2. Consequently, fill in a set of values for Ym and solve the problem(i.e., find 6). 3. Using this 6, find better values of Y rm's to fill in, and then repeat Point 2 to find a new value of 9. 4. Iterate until the values of 9 converge." Based on this basic notion, one can conceptualize the implementation of the EM algorithm for the multilevel structural equation model. In section 3.1, we discuss the concepts of incomplete and complete data as applied to the multilevel structural equation model. We also develop the posterior distributions of the random vectors in equation (2.1.3). In section 3 .2, we present the iterates for the implementation of EM algorithm. We also present the maximum likelihood estimates. In section 3 .3, we present the observed-data log likelihood fimction. 3.1 General Description and Application to the Multilevel Structural Eguation M2191 Through casting the measurement model and the reduced form of the structural equation for latent variables into the general mixed model (Raudenbush, 1988), we can conceptualize our problem as having complete data and incomplete data. Note that in the multilevel structural equation model the factor loadings are parameters rather than observed predictors. 38 Then to compute maximum likelihood estimates of the dispersion matrices for the random vectors in the model (2.3.1), we apply the EM algorithm (Dempster, Laird and Rubin, 1977). We now discuss the concepts of complete data and incomplete data, as applied to model (3.1.1) From Chapter 2 we have the following combined equation: A1011“ + NW5 “W “cil- 34L, (11' “42 'ul 2' ya =[AWquA W]I::b°o]+A,,6,j+A,0w+gv (3,1,1) In more compact form we have : {it ' i GU i J L 9% , ya. =A 0Z 7r+AowU +80. (3.1.2) 4" [(WU' )0 ' \ where at“ I" "i ~ Zfi 0 "Ao=[A,|A,], Z..-—- 0 W“ (3.1.3) 44 4” W 4 2.". r o 7::[4'9], 7r~N(0,I‘),1“=[l 1. i141 ”b9 0 er Since our prior knowledge about it is assumed null, 1“", the prior precision (Dempster, Rubin and Tsutakawa, 1981) associated with it, becomes null, that is, I"I —> 0. And, 39.. 8,, ~ N(0,>:), 2 = In the model equations, yo. =A,, ,1. +Ab’hj +531. %=%%+% ”a =Wr”bo+‘9br O = {A,,A,,2,Tm, T0} is the set of parameters. yo,” = {Y,Z,W} is the set of observed data. c = {#0, It”, 65,, 0”. , .} is the set of missing data. r_l.n()\Or'V\ {OW} ‘1 § (3.1.4) 40 The conditional probability density function is proportional to-ithe joint probability , d density firnction : f (495.195,) °C (271)—N'I2l23l—m2 “FIFO-5):: 2 0’1,- — Aozafl" A0 975-) )T 2" (ya. — AoZng— A0 w, )] x (2 70"” |T,,|"‘"2 exp[(-0.5)Z 2 (6;T1 6 )] ’7 if x(27t "“2le rm exp[(—0.5)Z(0;T;9,.)]xh(n) (3.1.5) where c={ no, 750,0 .,0,,.,e,,} , G) = {A,,Ab,Z,Tm,Tn}, r=the number of indicators, p=the dimension of 6”. And s=the dimension of 0b,. The prior distribution h(7r) is considered a very small constant and it can be ignored while the empirical Bayes estimators are calculated (F otiu, 1989; Dempster, Rubin and Tsutakawa, 1981). If ”e" were observable, some function of "c", t(c) would be a vector of complete data sufficient statistics for the dispersion matrices. In reality, the vector ”c" is unobservable; however, the vector y, whose elements are linear functions of the elements of "c", is observable. In the realm of the EM algorithm, we regard the elements of "c” as the "missing data" and those of y as the "incomplete" data. 
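Before developing the E-step and M-step for the full model, it may help to see the same complete-data/incomplete-data logic in a deliberately simplified setting. The sketch below is an illustrative analogue, not the estimator derived in this chapter: it applies EM to a one-way random-effects model, treating the unobserved group effects as the missing data, so that the E-step computes their posterior means and variances given the data and current parameter values, and the M-step updates the parameters from the resulting expected complete-data sufficient statistics. All function and variable names are hypothetical.

```python
import numpy as np

def em_random_effects(y_groups, n_iter=500, tol=1e-8):
    """EM for y_ij = mu + u_j + e_ij, u_j ~ N(0, tau), e_ij ~ N(0, sigma2),
    treating the unobserved u_j as the 'missing data'."""
    J = len(y_groups)
    n = np.array([len(y) for y in y_groups], dtype=float)
    N = n.sum()
    ybar = np.array([np.mean(y) for y in y_groups])

    mu, tau, sigma2 = ybar.mean(), 1.0, 1.0       # crude starting values
    for _ in range(n_iter):
        # E-step: posterior mean and variance of each u_j given y and current parameters.
        lam = n * tau / (n * tau + sigma2)        # shrinkage factors
        u_hat = lam * (ybar - mu)                 # E(u_j | y)
        v = tau * sigma2 / (n * tau + sigma2)     # Var(u_j | y)

        # M-step: maximize the expected complete-data log-likelihood.
        mu_new = sum(np.sum(y - u_hat[j]) for j, y in enumerate(y_groups)) / N
        tau_new = np.mean(u_hat ** 2 + v)
        sigma2_new = sum(np.sum((y - mu_new - u_hat[j]) ** 2) + n[j] * v[j]
                         for j, y in enumerate(y_groups)) / N

        change = max(abs(mu_new - mu), abs(tau_new - tau), abs(sigma2_new - sigma2))
        mu, tau, sigma2 = mu_new, tau_new, sigma2_new
        if change < tol:                          # stop when successive iterates agree
            break
    return mu, tau, sigma2

# Example with unbalanced group sizes (6 to 10 members per group).
rng = np.random.default_rng(1)
groups = [rng.normal(2.0, 1.0, size=m) + rng.normal(0.0, 1.5)
          for m in rng.integers(6, 11, size=200)]
print(em_random_effects(groups))
```

The multilevel structural equation model replaces the scalar group effect with latent vectors at the two levels and the scalar variances with T_π, T_πb, and Σ, but the alternation between conditional expectations and closed-form maximization steps is the same.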
Then we can develop the iterative E-step (expectation step) and M-step (maximizing step) for computing new parameter estimates. The E-step consists of estimating t(c), which would be a complete data sufficient statistics [if the vector "c" of complete data were available, by its conditional mean given the observed data and current estimates of O. The equations that are solved for parameters in the M- step can be regarded as an approximation to what would be the likelihood equations 41 if the vector "c" of complete data were observable. From Dempster et al. (1977), Each iteration of the EM algorithm solves : £[Q(®,®m)|e=ew, ]= 0 where Q6190") = E(1nlf (C;@)]|Y = 359"") The necessary posterior location vector and dispersion matrix of the random vectors in the model (2.3.3) is : (F D. = Apr-'54, +1“l Any-‘54, “= D; c;, A,‘I’"A, A,\P"A, +r" Cg, D; D; = 14‘?" A. - Ai‘I’“A5) = ( Nr/2)ln 21t+(-N/2)ln|2| 3 u . f a 1 +(—Np/2)ln(27t)+(— N/2)ln|T |+(- Js/2)ln(27r)+(— J/2)ln|T,,l_ -2243 {(A0 2, )D; (A 2.1") +(,\ 1); A7,) +§>1-§ZtrlT.;‘E(6b,9JIY= yo“ ”)1 (3.2.1) In order to maximize, we take the first derivatives of Q(®‘”,®‘"”) with respect to Tme, 2,Ao, respectively. 4 3 '7 _,- T r ' . ’/ I I; r "p I a. '. l ) 0‘Q(®“’,®""’) = _ g 51‘ 2 q (1 [2’r,;‘—D(T,;')] ii -i ”’éZZIzTg'EWMIY=y.®“"’)'r;'-D{T;'E(0t0;IY=y,®“"’)T;'}1 (3.2-2) where D means a diagonal matrix (Graybill, 1983). Thus D(T) is a diagonal matrix with i-th diagonal element equal to the i-th diagonal element of a matrix T. Setting the derivative equal to null matrix and solving gives the ML estimate (see, Press, 1982; Magnus and Neudecker, 1986). Then the ML estimate is : . 1 , T». #23656? +09. )1 (32-3) 5Q(@(",@("”)__1_V_ —1_ -1 (2) 51. - 2 [2T,,, D(Tq,)] 'Ib +%Z[2T;IE(%9LIY = x0“ >1: 'D{T;E(0br9;|Y = y,®““’>T;.‘ H (32-4) Setting the derivative equal to null matrix and solving gives the ML estimate. Then the ML estimate is : Tm =%[Z(0;9;f +D;U)] (3.2.5) {U e Y J C , y a 4 1.2- In. a"? :3 "T \‘l’ t ‘ . L I ‘+' j)4 44 (3) @(®U)a®(i-l)) af as, 502 I' = [—%[22-' —D(2‘.")] 1 _ ,._ _ _ _ . 4322122 'E(a,.,a,.f|Y= y,®< ”)2 ‘ —D{2 ‘E(a,.jg,.f|r= y,®( ”)2: 1}]] x1, 0 r 7-. (3.2.6) - x J I” where I: is the column indicator vector which has a 1 in theX—th position and zeros in other positions. And 2, is a full matrix. Setting the derivative equal to zero and solving gives the ML estimate,. ea. In appendix 4 we present the ML estimator for each element of 22. i=DiagonaI(6f,..,6f) (3.2.7) 52 _ ._ (3) g @ 5Q(®"’,®" 1)) 5A7 = _ -1 ”i 0 _ J“ -1 v ”r (4) M 0318,, [ [ZZZ AZUCW] [>2 ACWUZU] Q o “' ® o 422 2"AZVDJJ HZZ 2"ADWUJ v ‘/ ~ +022 2"yy-n"251+tzi>="yl will-[ZZZ >3“AZ".zr‘rr"Z.-I 1 (l ) (l) J {22 E" (Auditqz; +AZJ. 71" wig] —%z Z Z‘lAw; q?%]] x 1:27): (3.2.8) '4- ) (S) J 45 90) 90-1)) é’A where A is a fill matrix. The details of derivation of 6Q( are given in appendix 5. Setting the derivative equal to zero and solving gives the ML estimate, 28,. In appendix 5 we present the ML estimate for £0. ii, = [AAA (3.2.9) where [askL means placing element as, in the g-th row and k-th column of matrix A, and zeros elsewhere. In our example, g= 2, 4, k= 1,2,3,4. In sum the E-steps and M-steps are: (1) E-step : Find Elog[L(c,o)|y,Tf;-'>], M-step : Substitute the equation (3.2.3) with these quantities, and then we obtain new T”, set TS) equal to this new T", (2) E-step : Find Elog[L(c,®)|y,T,h("”], M-step : Substitute the equation (3.2.5) with these quantities, and then we obtain new Tm, set T5,? equal to this new Tab. 
(3) E-step : Find E log[L(c,®)| y, 23 (H) ], M-step : Substitute the equation (3.2.7) with these quantities, and then we obtain new 23 , set E“) equalto thisnew 2. 4 6 (4) E-step: Find It", 01:, Dz“, 0;, CW“. Notethat these are all fimctions of A‘s"). M-step : Substitute the equation (3.2.8) with these quantities, and then we obtain new A0, set A? equal to this new A0. Then here the first iteration of the E and M step is completed. This algorithm proceeds until some user-specified termination criteria are met. For example, the algorithm might terminated when successive iterates differ from each other by no more than some number (i.e., = O“5 ). 3.3 Likelihood Function We conclude this chapter with expressions for the observed log-likelihood function which is numerically simple to evaluate. Although the EM algorithm does not require an evaluation of the likelihood function, successive values of the fimction can be usefiJl in monitoring the progress of the algorithm toward convergence at each iteration. And it's used in testing fit of alternate models. Note the relationship among probability density functions : =Po’lg’Pw’ 331 (y) P(6ly) (..) In the framework of the general mixed linear model, the equation (3.3.1) is rewritten as: 47 P(y|0,‘o,w,A,)P(qo,w,Ao) P(61y,Q,‘P, A0) P(y|Q,‘P,Ao) = (3.3.2) The specific expressions for each of the density functions stated in equation (3.3.2): P(y|0.0,‘P.Ao)=[(2n)”|‘l’ll"”exp[(-0.5)(y-A9)“I"‘(y-A9)] (3.3.3) Hanna/x.)=[(2n)°lm1"”epr-asxsm-‘m (3.3.4) where y : the observed outcome vector for an individual A : the design matrix for the multilevel structural equation model a: [n1 af ]’ Finally, the denominator part in equation (3.3.2) can be specified as: P(6|y,fl,‘1’./\o) = t(2n)“ID;n"*’ exp[(-0.5)(6- Wozw— 6)] (3.3.5) In particular, when 0: 0’, we have: 1301:2311, A0) = (27: 'N’2|D;|"2 |‘I’|"’2 [arm exp[(—O.5)S(0')] (3.3.6) where: s<€>=y’\r"(y—A.0:—A.6;) (3.3.7) 48 Now the log-likelihood fimction for the structural equation model may be: LLR(Q, ‘1’, Ao| y) at (-0.5)log|‘I’|+(0.5)|D;|—(O.5)longl—(O.5)S(6' ) (3.3.8) First we evaluate: det(‘P) = det(2 8 IN) = [det(2‘. )]” det<2 )=(of) (oi) ...(of> (3'39) log(det(‘I’)) = N[log of +logo§+...+logaf] (3.3.10) And also: det(Q) = det(Q,,)det(Q,h) (3.3.11) log[det(Q)] = N log[det(T,, )] + J log[det(Tfl)] I‘ is considered large but fixed, from Dempster, Rubin and Tsutakawa (1981). Finally we have: V11 V12 V13 det(D;) = det V21 V22 V23 = [det(Vu)l[det(V22 - V21V1il 12)][det(d33 — d32d2-21 23)] V31 V32 V33 (3.3.12) The second term in equation (3.3.12) is given in appendix 2 as det(Q;'). Let the third term be det(U") 49 where V V V d = ” ‘2 d = '3 d =d' = 3.3.13 22 [V21 V22 ]’ 23 V23 ’ 32 [ 23] ,d33 V33 ( ) SW) = ZEUS-24m. -A...- n‘ -A... w; )1 (3.3.14) Then the log-likelihood function for the structural equation model is : LLR(‘I’,Q, Aoly) = (N/2)log(det 2) — (N/2)log(detT,,)— (K / 2)log(det T,) +(1/ 2)log(det V11)+(1/ 2)Z log(det Q; )+ (l / 2): log(det Ug‘) -ZZ[y£2"(yu -Amfl'-Ame~)l (3.3.15) At each iteration the algorithm evaluates the log-likelihood function to monitor the progress of the estimation. CHAPTER 4 NUMERICAL RESULTS In this chapter, I use a computer program written in Gauss (Version 2.2) to compute ML estimates fi'om a set of artificial data. To verify that the produced estimates of the parameters are accurate the data are randomly generated with known (predetermined) population parameters. The analysis was done for the balanced case and the unbalanced case. 
The Gauss program is designed to use cross-product matrices and initial starting values as input data and to perform computing over numerous iterations of the EM algorithm. The path diagram for the model is given as a figure 2.1.1. In the example the two indicators for the ATT (attitude) latent variable are ATTI, ATT2. They are the student- reported responses to the questions in the attitude scale. The ACHl variable measures achievement score in the "principle" parts, while the second indicator ACH2 measures achievement score in "problem solving" part. 4 1 cneratin the Data Before creating the necessary data we have to consider several issues. For the balanced data 10 subjects are selected per group. The distribution of the number of groups per group size is given in Table 4.1. Due to the heavy computational load 50 51 of estimating these model via the EM algorithm, only a single sample data will be generated. Table 4.1 Number of Grggps per group Size Group Size Balanced Data Unbalanced Data 6 10 7 10 8 10 9 20 10 500 450 Total 500 500 To create samples to be fit to the multilevel structural equation model specified in chapter 2, we modified the covariance structure by setting var( 7:) = 0. Then the observed outcome vector y is calculated by using equation (4.1.1) : 0.2 y", "1.0 0.0 1.0 0.0“"10 2,, 0.0 0.0 0.0 0.0“ 0.1 y”, 0.82 0.0 0.75 0.0 0.0 0.0 1.0 2,, 0.0 0.0 0.31 y” 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 w. 0.0 0.33 y..,,_ _0.0 0.73 0.0 0.66_L0.0 0.0 0.0 0.0 0.0 w" 0.25 0.35 '1.0 0.0 1.0 0.0“va a, . . 0.7 0.0 v. +082 00 5 2,, + a, (4.1.1) 0.0 1.0 0.0 1.0 v,” a, _0.0 0.73 0.0 0.66JLv,2,_ s,_ 52 The values corresponding to the vector 7: in (2.3.2) have determined by using the formula as shown in Appendix 1. The necessary values for ds and fl's are : ,6l = 0.20,,82 = 0.10,,63 = 025,54 2 0.30,,6,l = 0.25,,B,,2 = 0.30, awl = 0.30, or,l = 0.20. The values for 220.,w 1,12,11,50,“. 1.,va 131.52.53.84 are generated by the following: (1) first we generate 5000 "2" variable from a standard normal distribution. Then if the value is bigger than 0.0 we assign l to it, while the value is less than 0.0, we assign 0 to it, (2) do the same for "w" variable for 500 groups, (3) generate 500 between level random vectors from the population VC (variance covariance) matrix, Th, (4) generate 5000 within level random vectors from the population VC matrix, T”, (5) generate 5000 measurement error vectors from the population VC matrix, 2 , (6) then we use the equation (4.1.1) to obtain a balanced raw data. The IMSL FORTRAN library contains the necessary several subroutines. The dimension of the observed variables is four (r=4). The dimension of the latent variables is two (p=2). And then we create an unbalanced data (4890 data points) by randomly deleting 4 for each of 10 groups, deleting 3 for each another 10 groups, and deleting 2 for each of another 10 groups, and finally delete l for each of 20 groups. 53 Table 4.2 Descriptive Statistigs for ngbles (1) Balanced Data TOTAL SAMPLE SIZE = 5000 MEAN Y1 -0.291 Y2 -0.187 Y3 -0.145 Y4 -0.097 ST.DEV SKEWNESS KURTOSIS MIN 7.735 6.477 7.913 6.055 -0.081 -0.07O -0.003 -0.021 SAMPLE COVARIANCE MATRIX Y1 59.824 Y2 Y3 16.487 Y4 1 1. 125 (2) Unbalanced Data 39.570 41.951 12.887 8.927 62.620 35.242 TOTAL SAMPLE SIZE = 4890 MEAN ST. DEV. Y1 -0.285 Y2 -0.l87 Y3 -0. 167 Y4 -0. 100 7.805 6.533 7.957 6.041 -0.075 -0.092 -0.010 -0.017 ~0.010 -0.005 -O. 
125 -0.085 36.667 0.045 -0.018 -0.029 -O.137 ESTIMATED COVARIANCE MATRIX Y 1 Y2 Y3 Y4 60.922 40.477 16.370 10.376 42.686 13.360 8.746 4.2 Regults 91' the Analysis 63.309 35.342 36.492 -33.444 -23.317 -27.877 -20.807 -29.918 -22.221 -27.866 -21.051 FREQ. MAX FREQ. y—d—ia—sy—s 1 1 l 1 27.565 23.612 25.107 24.764 28.897 24.147 26.820 19.738 1 l 1 1 SKEWNESS KURTOSIS MIN FREQ. MAX FREQ. |—|p—I_s_-‘ The output of the Table 4.3 and 4.4 are the result of fitting balanced data and 54 unbalanced data to the same model. The focus of investigation is in discovering and testing the estimates of parameters are close to the predetermined population parameters within some what sampling error. Table 4.3 Resultg of Analysis for Balanced Data Population Starting Estimated Parameters Values Parameters it”, 0.82 0.582 0.857253 4.02 0.73 0.573 0.720532 4151 0.75 0.575 0.738310 21,, 0.66 0.566 0.652453 tn“ 30.00 20.000 29.006845 1,," 9.00 7.000 9.432951 1,,22 32.70 20.000 34.328632 tam 20.00 10.000 18.407147 tam 4.00 3.000 3.528167 1,,m 20.80 10.000 19.754972 of 10.00 8.000 10.851423 0'; 12.00 8.000 11.2542916 a“: 14.00 10.000 13.7741274 oi 16.00 10.000 16.1793495 aw, 0.30 0.350 0.3251942 at,l 0.20 0.300 0.1916734 Table 4.3.1 angitibnal gxbggtatibns of regressibn coefficients A 0.20 0.100 0.3550013 fl, 0.10 0.005 0.1492152 B3 0.25 0.100 0.6579540 [34 0.30 0.200 0.1218303 ,3“ 0.25 0.100 0.4485403 3,, 0.30 0.100 0.2161860 55 Table 4.4 Rgsults of Analysis for Unbalanced Data Population Starting Estimated Parameters Values Parameters 21,, 0.82 0.582 0.855920 21,, 0.73 0.573 0.719029 21,, 0.75 0.575 0.736739 2,, 0.66 0.566 0.651504 1,,“ 30.00 20.000 29.563270 t,l2 9.00 7 .000 9.569872 Inn 32.70 20.000 34.697681 1%“ 20.00 10.000 18.153446 tam 4.00 3.000 3.465942 t,m 20.80 10.000 19.38584 of 10.00 8.000 10.87648 0': 12.00 8.000 11.29153 0% 14.00 10.000 14.04039 oi 16.00 10.000 16.18727 a,,, 0.30 0.200 0.32372 at,l 0.20 0.150 0.19092 Table 4.4.1 Congitibngl gxbegtations of regression coefficients ,6, 0.20 0.150 0.29023 3, 0.10 0.500 0.13452 ,6, 0.25 0.150 0.74056 [3, 0.30 0.200 0.10906 13,, 0.25 0.150 0.47312 13,, 0.30 0.200 0.27258 56 Table 4.5 The Valuea bf tha Observed Log-likelihbod at Convergence Balanced Data Iteration 1353 -2902.3472413 Iteration 1354 -2902.3472410 Iteration 1356 -2902.3472408 Iteration 1357 -2902.3472405 Iteration 1358 -2902.3472402 Iteration 1359 -2902.3472400 Iteration 1360 -2902.3472397 T le 4.6 The Values f the serv dL -likelih dat Conve ence Unbalanced Data Iteration 1321 -2768.5733562 Iteration 1322 -2768.5733550 Iteration 1323 -2768.5733537 Iteration 1324 -2768.5733522 Iteration 1325 -2768.5733511 Iteration 1326 -2768.5733508 57 In discussing the results reported in Table 4.3 and 4.4 , we say that the EM algorithm recovered the population parameters values well. The criterion used for convergence of the observed log-likelihood is that log-likelihood is smaller than 0.1‘( 5 s 10‘6 ). In the 486 IBM PC with the spwd of 66 mhrz for the convergence it took about 45 hours. For the unbalanced case the hours spent are about 48 hours. The Table 4.5 and Table 4.6 show the list of log-likelihood values at the neighborhood of convergence. As the Table 4.4 shows the number of iterations is very large and the spent hours is very long. The slowness of convergence of the EM algorithm is the repeatedly criticized property of the algorithm. This seems to be caused by the fact that missing information (Meng and Rubin, 1991) is relatively large in the multilevel structural equation model. 
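For readers who wish to retrace the generation scheme of Section 4.1, the sketch below follows steps (1) through (6) with the population values of equation (4.1.1) and the group sizes of Table 4.1. It is only an illustration: the dissertation generated its data with IMSL Fortran subroutines and analyzed them in Gauss, whereas this Python sketch uses hypothetical names throughout and, for simplicity, truncates the first fifty groups rather than deleting students at random.

```python
import numpy as np

rng = np.random.default_rng(0)
J, n = 500, 10                                   # 500 classrooms, 10 students each

Lambda_w = np.array([[1, 0], [0.82, 0], [0, 1], [0, 0.73]])
Lambda_b = np.array([[1, 0], [0.75, 0], [0, 1], [0, 0.66]])
T_pi  = np.array([[30.0, 9.0], [9.0, 32.7]])
T_pib = np.array([[20.0, 4.0], [4.0, 20.8]])
Sigma = np.diag([10.0, 12.0, 14.0, 16.0])
pi_w  = np.array([0.20, 0.10, 0.31, 0.33])       # (pi_10, pi_20, pi_30, pi_40)
pi_b  = np.array([0.25, 0.35])                   # (pi_b10, pi_b20)

# Steps (1)-(2): dichotomize standard normal draws into 0/1 predictors.
z = (rng.standard_normal((J, n)) > 0.0).astype(float)   # student-level predictor (gender)
w = (rng.standard_normal(J) > 0.0).astype(float)        # class-level predictor (teaching style)

# Steps (3)-(5): between- and within-level latent residuals and measurement errors.
v_b = rng.multivariate_normal(np.zeros(2), T_pib, size=J)
v_w = rng.multivariate_normal(np.zeros(2), T_pi, size=(J, n))
eps = rng.multivariate_normal(np.zeros(4), Sigma, size=(J, n))

# Step (6): assemble y_ij via the reduced-form relations of equation (4.1.1).
y = np.empty((J, n, 4))
for j in range(J):
    eta_b = pi_b * w[j] + v_b[j]                          # between-level latent vector
    for i in range(n):
        eta_w = np.array([pi_w[0] + pi_w[1] * z[j, i],    # attitude
                          pi_w[2] + pi_w[3] * z[j, i]]) + v_w[j, i]  # achievement
        y[j, i] = Lambda_w @ eta_w + Lambda_b @ eta_b + eps[j, i]

# Unbalanced data: reproduce the group sizes of Table 4.1
# (10 groups of size 6, 10 of size 7, 10 of size 8, 20 of size 9, 450 of size 10).
sizes = np.array([6] * 10 + [7] * 10 + [8] * 10 + [9] * 20 + [10] * 450)
y_unbal = [y[j, :sizes[j]] for j in range(J)]
print(sum(len(g) for g in y_unbal))   # 4890 observations, as reported in Section 4.1
```

The resulting unbalanced data set has the 4890 observations described in Section 4.1, and the balanced array y can be analyzed directly for the balanced-data comparison.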
To obtain estimates of the alpha's and beta's we assume that the matrices Delta and Delta_b are diagonal. Then, as shown in Appendix 1, we obtain the estimates of the structural parameters. When extremely poor starting values are used, the estimates differ across runs; with moderately poor starting values, however, the results are very similar (almost identical) across various sets of starting values.

After the parameters have been estimated, likelihood ratio tests are available for comparing more and less complex models. It is known that the statistic -2 ln(L_1/L_2) has an asymptotic chi-square distribution, where L_1 is the maximized likelihood of the less complex model and L_2 is that of the more complex model. The degrees of freedom of the chi-square statistic equal the difference in the number of parameters to be estimated in the two models. In our educational example, the model for L_1 may be constrained as follows:

  (1) T_b = T_w
  (2) Lambda_b = Lambda_w

The simpler model has 10 parameters while the complex model has 16 parameters to be estimated, so the chi-square statistic has 6 degrees of freedom. The difference between the two values of -2 ln(Likelihood) is 672.22. This value is large enough to reject the adequacy of the simpler model. The likelihood ratio test does not tell us which of the restrictions is inadequate; to find out, each restriction may be tested separately against the more complex model.

Table 4.7  Results of Analysis for Unbalanced Data, Restricted Model

  Parameter   Population Value   Starting Value   Estimate
  lambda_w21       0.82             0.75           0.841037
  lambda_w42       0.73             0.70           0.692301
  lambda_b21       0.75             0.70           0.720219
  lambda_b42       0.66             0.50           0.837227
  tau_w11         30.00            15.00          32.647209
  tau_w12          9.00             5.00          10.538582
  tau_w22         32.70            15.00          36.943765
  tau_b11         20.00            15.00          16.459542
  tau_b12          4.00             5.00           7.358164
  tau_b22         20.80            15.00          25.980356
  sigma2_1        10.00             8.00           9.236718
  sigma2_2        12.00             8.00          12.450677
  sigma2_3        14.00             8.00          12.120453
  sigma2_4        16.00             8.00          13.863560
  alpha_w1         0.30             0.200          0.323054
  alpha_b1         0.20             0.150          0.447046

  Value of the log-likelihood: -3104.68428

  beta_1           0.20             0.150          0.92709
  beta_2           0.10             0.500          0.17796
  beta_3           0.25             0.150          2.05197
  beta_4           0.30             0.200          1.17230
  beta_b1          0.25             0.150          2.64902
  beta_b2          0.30             0.200          1.48537

The multilevel structural equation model postulates that the causal relationships at the within level and the between level differ because the explanatory variables differ: in our example, gender at the student level and teaching method at the classroom level. We may calculate the root mean square residual (Joreskog and Sorbom, 1993) as an overall goodness-of-fit measure; in the LISREL literature, the root mean square residual is used to compare the fit of two different models for the same data:

$$
\mathrm{RMSR} \;=\; \left[\, 2\sum_{i=1}^{r}\sum_{j\le i} (s_{ij} - \hat{\sigma}_{ij})^2 \,/\, (r^2 + r) \right]^{1/2}
\qquad (4.2.1)
$$

where s_ij is the element in the i-th row and j-th column of the sample total variance-covariance matrix and sigma-hat_ij is the corresponding element of the model-predicted total variance-covariance matrix. RMSR is a measure of the average of the fitted residuals. The model-predicted total variance-covariance matrix is given by Sigma-hat = Sigma-hat_w + Sigma-hat_b. We obtain RMSR = 0.73850 for the complex model and RMSR = 5.98066 for the simple model. Judged by this index, the fit of the simple model to the hierarchical data is not as adequate as that of the complex model. This conclusion was expected because the data were generated in a hierarchical fashion and the restrictions reduce the model to a single-level structural equation model.
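As an illustration of these two model-comparison tools, the sketch below recomputes the likelihood-ratio statistic from the converged log-likelihoods reported above and implements equation (4.2.1). It is a hedged Python example, not the original GAUSS code; the p-value call only shows how the chi-square reference distribution would be consulted.

    import numpy as np
    from scipy.stats import chi2

    # Likelihood-ratio test: complex vs. restricted model (unbalanced data).
    ll_complex, ll_simple = -2768.5733508, -3104.68428
    lr_stat = -2.0 * (ll_simple - ll_complex)      # -2 ln(L1 / L2)
    df = 16 - 10                                   # difference in free parameters
    p_value = chi2.sf(lr_stat, df)

    # Root mean square residual, equation (4.2.1).
    def rmsr(s, sigma_hat):
        r = s.shape[0]
        il = np.tril_indices(r)                    # lower triangle, including the diagonal
        return np.sqrt(2.0 * np.sum((s[il] - sigma_hat[il]) ** 2) / (r * r + r))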
We now make inferences about the pi's in the balanced-data case. Specifically, from the posterior variance estimates of the pi's we obtained the standard errors shown in Table 4.8. For the balanced data the t-statistics are [1.247, 0.8373, 0.5566, 0.8806, 1.09108, 0.6998]. Thus the expectation of the "attitude" latent variable for girl students in classrooms whose instruction style is expository is significantly different from zero, while the expectation of the "ACH" latent variable for those students is not. The total effect of gender on the "ACH" latent variable is not significantly different from zero, and the "gender gap" in the "attitude" latent variable is not significantly different from zero. At the between-group level, the total effect of the discovery teaching style on the "ACH" latent variable is not significantly different from zero, whereas its effect on the "attitude" latent variable is significantly different from zero.

Table 4.8  Estimates of the pi's for Two Sets of Data

                            Balanced Data            Unbalanced Data
  Parameter  Population   Estimate   Std. Error    Estimate   Std. Error
  pi_1         0.20        0.3550     0.28560       0.2902     0.2137
  pi_2         0.10        0.1492     0.17818       0.1345     0.1845
  pi_3         0.31        0.7734     0.30602       0.8345     0.2827
  pi_4         0.33        0.1703     0.19344       0.1526     0.2216
  pi_b1        0.25        0.4485     0.41105       0.5316     0.4363
  pi_b2        0.35        0.3021     0.43176       0.3629     0.5262

CHAPTER 5

CONCLUSION

Research in the field of education presents various challenges. For example, the random assignment of students to a set of conditions is not realistic in most cases. Even in an experimental setting the outcomes will have positive intracluster correlations because (1) students do not receive their instruction individually but in groups, and (2) interactions exist between treatments and students (Lumsdaine, 1963). This situation often makes the application of linear structural equation models (Joreskog, 1977) to real data inappropriate. As Cronbach (1976) pointed out, many studies in the field of education have produced inappropriate analyses, especially evaluation studies, because they failed to recognize the nature of hierarchical data. The difficulty of analyzing data arising from two levels lies in assessing the nature of the intervariable relationships at both levels simultaneously.

During the last two decades researchers have developed multilevel structural equation models for hierarchical data. Previous work on multilevel structural equation models involved a minimization fitting function and/or a balanced sampling design, and most of it utilized standard software. These minimization fitting-function approaches have made substantial methodological advances, but they require classifying groups into subsets of groups having equal sample size.

This dissertation has shown how multilevel structural equation models can be formulated for hierarchical data and how they can be analyzed by using empirical Bayes estimation with the EM algorithm. The model equations are linear at each level; the direct connection between variables is specified by the values of the coefficients, as in the LISREL (Joreskog, 1977) or EQS (Bentler, 1983) models.

The major outputs of this thesis are four:

1. I presented the general multilevel structural equation model in the mode of hierarchical empirical Bayes modeling.

2. I developed the empirical Bayes estimation procedure via the EM algorithm to find maximum likelihood estimates of the model.
3. A computer program for the numerical analysis of hierarchical data under structural equation models was developed in Gauss.

4. The accuracy of the computing algorithm has been tested across sampling designs and models.

5.1 Summary, Implications and Conclusions

I now summarize the major points that emerged in each chapter and discuss their implications for fitting multilevel structural equation models to hierarchical data.

In chapter one, the problems confronting single-level and multilevel structural equation models were identified. These problems are old ones. Traditional single-level structural equation models do not incorporate the random errors arising from the multilevel structure. Researchers therefore face the dilemma "What should be our unit of analysis?"; that is, they have to choose between an individual-level analysis and a group-level analysis. Ignoring the inherent hierarchical structure in our data sets results in the confounding of group-level effects with individual-level effects, and of course the individual-level analysis violates the independence assumption and results in overestimation of precision. In program evaluation studies, for example curriculum evaluation, we are interested in the interaction effects between the treatments and the individuals across individual-level background information and group-level variables.

In chapter two, the multilevel structural equation model and a few basic model assumptions were presented and then translated into the general mixed model (a hierarchical model of linear equations) at both the cluster level and the individual level. In particular, the model explicitly utilizes socio-demographic information at both the group and individual levels. We also provided an example of model specification.

In chapter three, the empirical Bayes estimation procedure via the EM algorithm was generalized to the multilevel structural equation model. In the "hierarchical prior distribution" specification we adopted the MLR (Dempster, Rubin and Tsutakawa, 1981) approach. We also presented the E and M steps for the ML estimates.

Much of chapter four assessed the accuracy of the EM algorithm for an artificial model under different sampling designs. The results, drawn from the simulation under the balanced and the unbalanced sampling designs, showed that the EM algorithm for the multilevel structural equation model is quite accurate for the data generated. Ignoring the second-level effects can yield a misleading interpretation of the structure of the causal map among the latent constructs under study. The presented methodology allows the simultaneous estimation of the parameters of unbalanced multilevel structural equation models.

The primary advantage of the EM method for multilevel structural equation models lies in the following key facts. (1) It does not require the calculation of the second-order partial derivatives of the likelihood function. Even though the models studied by several researchers differ in terms of balanced or unbalanced sampling designs, the calculation of partial derivatives is essential for the methods proposed by Schmidt and Wisenbaker (1986), McDonald and Goldstein (1989), Lee and Poon (1992), and Raudenbush (in press). (2) It does not require classifying level-2 units into subsets having equal sample size. However, the method has the shortcoming of slow convergence. Another shortcoming of the EM algorithm is that it does not provide standard errors for the parameter estimates.
When we apply the method to real-world problems we need further elaboration of the models, carefully taking into account the special features of the particular subject matter. Typical applications of structural equation models involve (1) the development of an a priori model representing hypothesized causal associations among a set of latent and manifest variables, (2) fitting the prior model to sample data, (3) the evaluation of the solution in terms of its parameter estimates and goodness of fit, and (4) the modification of the model so as to improve its parsimony and its fit to the data. This last step is known as "specification search" or "respecification." During such a search the researcher alters the model specification in search of a substantively meaningful model that fits the data well. Structural misspecifications involve the specification of the elements of the matrices A and A_b, and the issue is closely related to model identification.

The purpose of latent variable models is to improve the accuracy and validity of inferences from empirical data. To accomplish this goal, several assumptions about the structure of the data and the meaning of the associations between variables must be made. Ideally, each of these assumptions will be based either on special features of the subject-matter application area or on knowledge derived from past empirical evidence.

5.2 Future Work

One potential extension of the multilevel structural equation model is a model in which the slopes for the exogenous variables vary randomly across groups. There are numerous settings in which multilevel structural equation models with random slopes for exogenous variables are needed in order to represent the variance-covariance structure of the data adequately; for example, the gender gap in SAT mathematics test scores may be explained by group-level characteristics.

As is well known, the EM algorithm is simple to implement and numerically stable, but it is slow. Recently Jamshidian and Jennrich (1993) developed a conjugate gradient scheme for accelerating the EM algorithm. In their AEM (accelerated EM) algorithm the evaluation of the gradient of the likelihood function is essential; when the number of parameters is moderate, the implementation of the AEM algorithm seems not overly burdensome. To obtain the standard errors of the maximum likelihood estimates one may apply the SEM (supplemented EM) algorithm developed by Meng and Rubin (1991).

The current approach to drawing inferences concerning variance components is based on large-sample theory for ML estimators. When the number of groups is small, the normal approximation will be invalid; for that case, one may apply the Data Augmentation approach (Tanner and Wong, 1987). Future models can be expanded into larger and more complicated ones, and the robustness of the model remains to be studied.

APPENDICES

Appendix 1

To find the values of the beta's and alpha's, we have the elementwise algebraic relations

$$
\pi_1 = \beta_1, \qquad (A.1.1)
$$
$$
\pi_2 = \alpha_{21}\beta_1 + \beta_2. \qquad (A.1.2)
$$

Then we can derive estimates of the structural coefficients from this first set of within-group algebraic relations. In particular, the identification of the alpha's is carried out on the basis of the following assumption (A.1.3) and the second within-group algebraic relation (A.1.4):

$$
\mathrm{var}(u_{ij}) = \Delta = \mathrm{diag}(\delta_p,\; p = 1, \ldots, P), \qquad (A.1.3)
$$
$$
(I - A)^{-1}\,\Delta\,[(I - A)^{-1}]' = \mathrm{var}(v_{wij}) = T_w. \qquad (A.1.4)
$$

The elementwise relations of (A.1.4) are

$$
\delta_{11} = \tau_{w11}, \qquad \alpha_{21}\,\delta_{11} = \tau_{w21}. \qquad (A.1.5)
$$
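As a quick check of these relations, substituting the population values from chapter 4 (tau_w11 = 30, tau_w21 = 9; tau_b11 = 20, tau_b21 = 4) into (A.1.5) and its between-group analogue gives

$$
\alpha_{21} = \tau_{w21}/\tau_{w11} = 9/30 = 0.30, \qquad
\alpha_{b21} = \tau_{b21}/\tau_{b11} = 4/20 = 0.20,
$$

which match the population values alpha_w1 = 0.30 and alpha_b1 = 0.20 used in Tables 4.3 and 4.4.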
After finding the alpha's from the algebraic relations (A.1.5), we find the beta's from the first within-group algebraic relation (A.1.2).

Similarly, the first between-group algebraic relation is

$$
(I - A_b)^{-1} B_b = \Pi_b. \qquad (A.1.6)
$$

The corresponding elementwise relations are

$$
\pi_{b11} = \beta_{b11}, \qquad \pi_{b21} = \alpha_{b21}\beta_{b11} + \beta_{b21}. \qquad (A.1.7)
$$

In the same fashion, the identification of the alpha_b's is carried out on the basis of the following assumption (A.1.8) and the second between-group algebraic relation (A.1.9):

$$
\mathrm{var}(u_{bj}) = \Delta_b = \mathrm{diag}(\delta_{bp},\; p = 1, \ldots, P), \qquad (A.1.8)
$$
$$
(I - A_b)^{-1}\,\Delta_b\,[(I - A_b)^{-1}]' = \mathrm{var}(v_{bj}) = T_b. \qquad (A.1.9)
$$

The elementwise relations of (A.1.9) are

$$
\delta_{b11} = \tau_{b11}, \qquad \alpha_{b21}\,\delta_{b11} = \tau_{b21}. \qquad (A.1.10)
$$

After finding the alpha_b's from (A.1.10), we find the beta_b's from the first between-group algebraic relation (A.1.7).

Appendix 2

For the computational convenience of the necessary posterior expectations and dispersion matrices of the random vectors, equation (3.1.1) without subscripts becomes

$$
y = A_1\pi + A_2\theta_2 + A_3\theta_3 + \varepsilon,
$$

with

$$
\varepsilon \sim N(0,\, I_N \otimes \Sigma), \quad N = \textstyle\sum_j n_j, \qquad
\pi \sim N(0,\, \Gamma), \qquad
\theta_2 \sim N(0,\, \Omega_2), \qquad
\theta_3 \sim N(0,\, \Omega_3).
$$

Let

$$
y = A\theta + \varepsilon, \qquad A = [A_1 \,|\, A_2 \,|\, A_3], \qquad
\theta = \begin{bmatrix}\theta_1 \\ \theta_2 \\ \theta_3\end{bmatrix},
$$

where

$$
\theta_1 = \pi = \begin{bmatrix}\pi_1 \\ \vdots \\ \pi_t\end{bmatrix}, \qquad
\theta_3 = \begin{bmatrix}\theta_{31} \\ \vdots \\ \theta_{3K}\end{bmatrix}, \qquad
\theta_2 = [\theta_{211}', \ldots, \theta_{21 n_1}', \ldots, \theta_{2K1}', \ldots, \theta_{2K n_K}']',
$$

t is the dimension of pi,

$$
\Omega_2 = \mathrm{subdiag}(T_w), \qquad \Omega_3 = \mathrm{subdiag}(T_b), \qquad
\mathrm{var}(\theta) = \begin{bmatrix}\Gamma & 0 & 0 \\ 0 & \Omega_2 & 0 \\ 0 & 0 & \Omega_3\end{bmatrix}.
$$

By using the results for the standard multivariate normal distribution (Searle, 1971), one obtains the following joint distribution for the multilevel structural equation model:

$$
\begin{bmatrix} y \\ \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}
\sim N\!\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
\Sigma_y & A_1\Gamma & A_2\Omega_2 & A_3\Omega_3 \\
\Gamma A_1' & \Gamma & 0 & 0 \\
\Omega_2 A_2' & 0 & \Omega_2 & 0 \\
\Omega_3 A_3' & 0 & 0 & \Omega_3
\end{bmatrix}\right),
\qquad
\Sigma_y = A_1\Gamma A_1' + A_2\Omega_2 A_2' + A_3\Omega_3 A_3' + I_N \otimes \Sigma.
$$

The posterior dispersion matrix D_theta of theta is the matrix [A' Psi^{-1} A + Omega^{-1}]^{-1}, with Psi = I_N (x) Sigma, translated into the context of the multilevel structural equation model:

$$
D_\theta = \left[A'\Psi^{-1}A + \Omega^{-1}\right]^{-1}
\;\Longleftrightarrow\;
\begin{bmatrix}
A_1'\Psi^{-1}A_1 + \Gamma^{-1} & A_1'\Psi^{-1}A_2 & A_1'\Psi^{-1}A_3 \\
A_2'\Psi^{-1}A_1 & A_2'\Psi^{-1}A_2 + \Omega_2^{-1} & A_2'\Psi^{-1}A_3 \\
A_3'\Psi^{-1}A_1 & A_3'\Psi^{-1}A_2 & A_3'\Psi^{-1}A_3 + \Omega_3^{-1}
\end{bmatrix}^{-1}.
$$

For convenience in applying the inversion formula for a partitioned matrix (Graybill, 1983), one can rewrite the matrix D_theta as

$$
\begin{bmatrix} B & L \\ L' & U \end{bmatrix}^{-1}
= \begin{bmatrix} Q^{-1} & -Q^{-1}G' \\ -GQ^{-1} & U^{-1} + GQ^{-1}G' \end{bmatrix},
\qquad Q = B - LU^{-1}L', \quad G = U^{-1}L',
$$

where B is the upper-left 2 x 2 block of blocks, L = [L_1' , L_2']' with L_1 = A_1' Psi^{-1} A_3 and L_2 = A_2' Psi^{-1} A_3, and U = A_3' Psi^{-1} A_3 + Omega_3^{-1}. Writing

$$
Q = \begin{bmatrix} Q_1 & Q_2 \\ Q_2' & Q_3 \end{bmatrix}, \qquad
Q^{-1} = \begin{bmatrix} R_1 & R_2 \\ R_2' & R_3 \end{bmatrix},
$$

with

$$
Q_1 = A_1'\Psi^{-1}A_1 + \Gamma^{-1} - A_1'\Psi^{-1}A_3\,(A_3'\Psi^{-1}A_3 + \Omega_3^{-1})^{-1}A_3'\Psi^{-1}A_1,
$$
$$
Q_2 = A_1'\Psi^{-1}A_2 - A_1'\Psi^{-1}A_3\,(A_3'\Psi^{-1}A_3 + \Omega_3^{-1})^{-1}A_3'\Psi^{-1}A_2,
$$
$$
Q_3 = A_2'\Psi^{-1}A_2 + \Omega_2^{-1} - A_2'\Psi^{-1}A_3\,(A_3'\Psi^{-1}A_3 + \Omega_3^{-1})^{-1}A_3'\Psi^{-1}A_2,
$$

we obtain

$$
D_\theta =
\begin{bmatrix}
R_1 & R_2 & -(R_1L_1 + R_2L_2)U^{-1} \\
R_2' & R_3 & -(R_2'L_1 + R_3L_2)U^{-1} \\
\mathrm{sym.} & \mathrm{sym.} & U^{-1} + U^{-1}(L_1'R_1L_1 + L_1'R_2L_2 + L_2'R_2'L_1 + L_2'R_3L_2)U^{-1}
\end{bmatrix}
\equiv
\begin{bmatrix}
V_{11} & V_{12} & V_{13} \\
V_{12}' & V_{22} & V_{23} \\
V_{13}' & V_{23}' & V_{33}
\end{bmatrix}.
$$

Thus, in the E-step, the conditional distributions of theta_1, theta_2, and theta_3 given the data and the current parameter values have the following locations and dispersion matrices:

$$
\theta_1^{*} = V_{11}\,(A_1' - Q_2Q_3^{-1}A_2')\,\Psi^{-1}\,(I - A_3U^{-1}A_3'\Psi^{-1})\,y,
$$
$$
\theta_2^{*} = (V_{22}A_2' - Q_3^{-1}Q_2'V_{11}A_1')\,\Psi^{-1}\,\big[\,I - A_3(A_3'\Psi^{-1}A_3 + \Omega_3^{-1})^{-1}A_3'\Psi^{-1}\,\big]\,y,
$$
$$
\theta_3^{*} = U^{-1}A_3'\Psi^{-1}\,\big[\,y - (A_1\theta_1^{*} + A_2\theta_2^{*})\,\big],
$$

and

$$
V_{11} = \big[\,Q_1 - Q_2Q_3^{-1}Q_2'\,\big]^{-1}, \qquad
V_{12} = -V_{11}Q_2Q_3^{-1},
$$
$$
V_{13} = -\big[\,V_{11}A_1'\Psi^{-1}A_3 + V_{12}A_2'\Psi^{-1}A_3\,\big]U^{-1}, \qquad
V_{22} = Q_3^{-1} + Q_3^{-1}Q_2'V_{11}Q_2Q_3^{-1},
$$
$$
V_{23} = -\big[\,V_{12}'A_1'\Psi^{-1}A_3 + V_{22}A_2'\Psi^{-1}A_3\,\big]U^{-1}, \qquad
V_{33} = U^{-1} - \big[\,V_{13}'A_1'\Psi^{-1}A_3 + V_{23}'A_2'\Psi^{-1}A_3\,\big]U^{-1}.
$$
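The quantities above are the standard posterior moments of a Gaussian linear model, computed blockwise. The sketch below shows the direct dense version of the same computation in Python, purely to illustrate the formulas D_theta = (A' Psi^{-1} A + Omega^{-1})^{-1} and theta* = D_theta A' Psi^{-1} y; the appendix instead exploits the block structure of A = [A_1 | A_2 | A_3] so that only small matrices ever need to be inverted.

    import numpy as np

    def posterior_moments(y, A, Psi, Omega):
        # Posterior mean and dispersion of theta in y = A theta + eps,
        # with eps ~ N(0, Psi) and theta ~ N(0, Omega).
        Psi_inv = np.linalg.inv(Psi)
        D = np.linalg.inv(A.T @ Psi_inv @ A + np.linalg.inv(Omega))   # D_theta
        theta_star = D @ A.T @ Psi_inv @ y                            # posterior mean
        return theta_star, D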
Appendix 3

We show the explicit derivation of Q(Theta^(t), Theta^(t-1)) defined in chapter 3. By the definition of Dempster, Laird and Rubin (1977),

$$
Q(\Theta^{(t)}, \Theta^{(t-1)}) = E\big[\log f(C;\Theta^{(t)}) \,\big|\, Y = y;\, \Theta^{(t-1)}\big].
$$

By applying the theorem on the expectation of a quadratic form we have

$$
E\big[\theta_{2ij}' T_w^{-1}\theta_{2ij} \,\big|\, Y = y;\,\Theta^{(t-1)}\big]
= \mathrm{tr}\big[T_w^{-1} E\big(\theta_{2ij}\theta_{2ij}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big],
$$
$$
E\big[\theta_{3j}' T_b^{-1}\theta_{3j} \,\big|\, Y = y;\,\Theta^{(t-1)}\big]
= \mathrm{tr}\big[T_b^{-1} E\big(\theta_{3j}\theta_{3j}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big],
$$
$$
E\big[\varepsilon_{ij}' \Sigma^{-1}\varepsilon_{ij} \,\big|\, Y = y;\,\Theta^{(t-1)}\big]
= \mathrm{tr}\big[\Sigma^{-1} E\big(\varepsilon_{ij}\varepsilon_{ij}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big].
$$

After some algebraic calculation we have

$$
\begin{aligned}
Q(\Theta^{(t)},\Theta^{(t-1)})
&= -\tfrac{Nr}{2}\ln 2\pi - \tfrac{N}{2}\ln|\Sigma|
   - \tfrac{Np}{2}\ln 2\pi - \tfrac{N}{2}\ln|T_w|
   - \tfrac{Js}{2}\ln 2\pi - \tfrac{J}{2}\ln|T_b| \\
&\quad - \tfrac{1}{2}\sum_j\sum_i \mathrm{tr}\Big\{\Sigma^{-1}\Big[
      (A_0 Z_{ij})\,D_\pi\,(A_0 Z_{ij})' + A_0\,D_{w_{ij}}\,A_0'
      + A_0 Z_{ij}\,C_{\pi w_{ij}}\,A_0' + A_0\,C_{\pi w_{ij}}'\,Z_{ij}'A_0' \Big]\Big\} \\
&\quad - \tfrac{1}{2}\sum_j\sum_i
      \big(y_{ij} - A_0 Z_{ij}\pi^{*} - A_0 w_{ij}^{*}\big)'\,\Sigma^{-1}\,
      \big(y_{ij} - A_0 Z_{ij}\pi^{*} - A_0 w_{ij}^{*}\big) \\
&\quad - \tfrac{1}{2}\sum_j\sum_i \mathrm{tr}\big[T_w^{-1}
      E\big(v_{wij}v_{wij}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big]
   - \tfrac{1}{2}\sum_j \mathrm{tr}\big[T_b^{-1}
      E\big(\theta_{3j}\theta_{3j}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big],
\end{aligned}
\qquad (3.2.1)
$$

where pi* and w*_ij are the posterior means, D_pi and D_{w_ij} are the corresponding posterior dispersion matrices, and C_{pi w_ij} is the posterior cross-covariance between pi and w_ij.

Appendix 4

In this appendix we provide the formula for the elements of Sigma-hat. For the m-th measurement-error variance,

$$
\hat{\sigma}_m^2 = \frac{1}{N}\sum_j\sum_i
E\big[\varepsilon_{mij}^2 \,\big|\, Y = y;\,\Theta^{(t-1)}\big]
= \frac{1}{N}\sum_j\sum_i\Big\{
\mathrm{Var}\big(\varepsilon_{mij} \,\big|\, Y = y;\,\Theta^{(t-1)}\big)
+ \big(y_{mij} - [A_0 Z_{ij}\pi^{*} + \Lambda_w v_{wij}^{*} + \Lambda_b v_{bj}^{*}]_m\big)^2
\Big\},
\qquad m = 1, \ldots, 4,
$$

where [.]_m denotes the m-th element and the conditional variance term is the (m, m) element of the posterior dispersion of A_0 Z_ij pi + Lambda_w v_wij + Lambda_b v_bj, that is, a sum of elements of the posterior dispersion matrices D and cross-covariance matrices C weighted by the corresponding loadings.

Appendix 5

In this appendix we present the derivations for the elements of the loading matrices. First, by expanding the eighth term on the right-hand side of equation (3.2.1), we have the following five terms:

$$
(1)\;\; -2\,y_{ij}'\,\Sigma^{-1}A_0 Z_{ij}\pi^{*}
\qquad
(2)\;\; -2\,y_{ij}'\,\Sigma^{-1}A_0 w_{ij}^{*}
\qquad
(3)\;\; (A_0 Z_{ij}\pi^{*})'\,\Sigma^{-1}A_0 Z_{ij}\pi^{*}
$$
$$
(4)\;\; 2\,(A_0 Z_{ij}\pi^{*})'\,\Sigma^{-1}A_0 w_{ij}^{*}
\qquad
(5)\;\; (A_0 w_{ij}^{*})'\,\Sigma^{-1}A_0 w_{ij}^{*}.
$$

For each term we take the derivative with respect to A_0; note that in taking the derivatives we regard A_0 as a full matrix. Then we have

$$
(1)\;\; -2\,\Sigma^{-1}y_{ij}\,(Z_{ij}\pi^{*})'
\qquad
(2)\;\; -2\,\Sigma^{-1}y_{ij}\,w_{ij}^{*\prime}
\qquad
(3)\;\; 2\,\Sigma^{-1}A_0 Z_{ij}\pi^{*}\,(\pi^{*\prime}Z_{ij}')
\qquad
(4)\;\; 2\,\Sigma^{-1}A_0\,(\cdots)
$$
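For reference, the derivatives above follow from two standard matrix-calculus identities, stated here in general notation rather than taken from the original derivation:

$$
\frac{\partial}{\partial X}\, a' X b = a\,b', \qquad
\frac{\partial}{\partial X}\,(Xa)'\,\Sigma^{-1}(Xb) = \Sigma^{-1}X\,(a b' + b a')
\quad \text{for symmetric } \Sigma.
$$

Applying the first identity to term (1), with a = Sigma^{-1} y_ij and b = Z_ij pi*, gives the derivative -2 Sigma^{-1} y_ij (Z_ij pi*)'; applying the second identity to term (3), with a = b = Z_ij pi*, gives 2 Sigma^{-1} A_0 Z_ij pi* pi*' Z_ij'.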