IVA‘AI'H
- _ v “-w

v\(

 

 

~Kw’v ‘

u (A

 

 

 

~ H ..
-Nn "——v » ~4Ay—w

-.--.w«‘.n.-..-.»
gs... «.7

2‘" 71..
' “.4. '
u

o-J.
,-

312:
r

r'

“-49

3L“: :"
t n

3
r
7‘

‘;’o ‘yl .' - n
.gjzﬁggiﬁg {3
"25%;, w,

.;
I.“
U i)

'1
4.-

.»

_ ‘ Via

~ :1... .T ..
4t L

4

u 5. u
v gv‘c‘ﬁh “01* 4
:

.nN.
J

 

‘31:"

.. "Tsar
.._...,.f.\.:}'
‘v «(-1:

3.

v

E‘

z ‘u ,

P .\ .,, . ,, . 4 z:

,. _ Lv.,.,“ K 1“,, .‘. m A .
U" w.;f..yhr _ K . 1;: . 7’ g A. {15
ﬂ» wry: v .g -- - ,, . 7 . . . ~ ,_..1=.;~:.:§§L~'r

“$555

I

‘_ an: i“
:23,“ .J- . a r 29%}.14“.
‘VLL; .. .. . f 4 u». , l v,“ ‘Luﬁkluw
. w . ;. u 1‘1”:.33¥£.:9?I)£R‘l
- . y ‘ .- . an... 1. .\ e
ﬁsw . -. - '1"

. <~ ? y;
.ng . , 4‘ 1.? {kn ‘
w,k1\’k;‘|.n" ‘ ‘u . .lqugunv 1.;
. .‘K: ‘ 'ﬁygﬂmyz‘aggk
I?“ ‘ .i:¥"y'ta

.’-"\':t
Nu- ~v«_ _

fi‘ﬁé‘ﬁ";

Sir-w ﬁv‘smw'
émgyrsaw» \
g», t .

~_

.1.
9..

w.l~

u get
‘\ my "‘
_ ‘9‘. w " ' "5“-
{llV .‘l-ag‘ﬁ‘: .3333“" .‘u. m -
Chianti; .l, .. 2. ~ '
n. ' ‘Ia.
1-

I“.

f ‘ 5 1 “4
«a, 'a
“£444;
x'I'IU L

. .

‘aw‘

1:33; ‘
“L.
v xx

' ' villi

Aide/‘93»;
i..n¢u.~.
m...

5 ‘ ' f < ‘ .
An. 3. ? ﬁg , .., f‘ u . ' r ' , , ' ’ j
. y!» - a)“: » ‘, ., , .- ‘ w. ' . ‘ V 'rl‘ g :L' 44 ,.. 1
. 1.5.11.4}: ; . a .4 ‘t; . My, my: M7 N 4' , HP «2: .. “£19216?me ‘fﬂ-‘zﬁﬁ:
n , ‘l‘ y.“ “0‘: .‘ﬁw.{~1 . . . ~ i‘ a _ . V . 9 M £1 .
Y I A5“? ' ¢1 ‘ I! '31:: '¢_ ‘ .4 - 94¢ I" '31- .1 ‘i. A) £4; v ;u j." gang
‘3‘“, > , 15,5” . .. :$ru‘ m- z,‘ .w. . v.‘ ’ . , .' ‘ {‘Qan‘r.“::\'
m ‘1 . .. ‘ n L .

.' {.5
. .
an! «E ’5‘.
" ' in

Ii“-

3;
~r:
14' gm“
,

4&4“
Vittirﬁ
ma
1 ,J 15,,
“2‘7"" 9’ - “2‘7?“ ;: r“.

avg-ﬁr? if Mm

.~. ' , ' "an, '* ..
M33?” 4 1“ 6‘11 ' 1’ éa—hwm um-Itit‘gbr

Adv:

é

; r

9
4“ «fit; :a , TN - new. .5 \- A 'I .
my“; 4:23. ‘ . f- .' N . 3:. :1 ‘ 1‘ {gs 'gagv‘gé‘; 1":
v , ‘ A . <. i ' 2
"' 7‘” ‘ ﬁ ”.2“? {353.1

135»
f: ..‘ flu
m ‘ w

«
u

N n ‘
I ' '
$351+"

‘ !
:qwlabu‘ﬂilrl" ’
5 J~ ” I“«'Er1'

m,
“ﬁmﬁ
tang;
Aw

. JV -
5:33.25 .~
.351.“ W
1 4 .3549? «wan:
r1114 lama. ""fl‘r ‘ ! '
‘I- T."- CIV A
if! a?“ " .

V

3”,.
‘ t':- 6 ~.
<*
2-“ ;;. ' -. ‘ r \
(S, . 1%..

u 1
¢

. .

’33»

‘ r, .

.v r‘ .. ..
“ ' f?» t'

l .

u S" :‘m n ' Ms ‘ v '
‘ V“: U112 ‘ _ n pilﬁ,_ 2A." , .1
‘ ~ ‘ ‘ r ‘

 

 

 

,.. . v ,

q . 4 .3"; r . r . , ( I
41“}. «:l‘ -T:' V ... . ‘ , ., , v - q

-/ -' [A Ntﬂ’: " , . “s V "' f if

‘ . . .,

az-

.

n." ~“

 

vvﬁ'iy :5: :1
'::"Cr,'!7}?.~ 93‘”
_ (Wing: 43%

 

3‘3»
.ﬁé‘xz'
f'?” ' '

«J .f

‘ ,9: . .

,1 (ftyxl-u” ‘ H J .‘Y Ad 9‘
1 "V “(3- . I‘ >' v ‘ V ’ h! , 7
-.¢.. .. ... . q .. . - 7. . .. (,1. """"""‘v '
‘. . ~ “’ "v7" ' _, - * " - ; i533 _ aE—MSJT. 4:..12'...'12L'.'L'::‘“

 

 

 

This is to certify that the

dissertation entitled
A MIXED LINEAR MODEL WITH TWO—WAY CROSSED RANDOM

EFFECTS AND ESTIMATION VIA THE EM ALGORITHM

presented by

SANG-J IN KANG

has been accepted towards fulﬁllment
of the requirements for

Ph . D . , MEASUREMENT AND
degree in

QUANTITATIVE RESEARCH METHODS

 

 

u). m . .9.

Major professor

Datejeb-ozzamz/

MS U is an Afﬁrmative Action/Equal Opportunity Institution 0-1277 1

 

A.

LIBRARY
Michigan lute

{ Unlvmny
N...‘

r

 

ﬁ‘

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

usu Is An Afﬁrmative Action/Equal Opportunity lnetitution
email”:

A MIXED LINEAR MODEL WITH TWO-WAY CROSSED RANDOM EFFECTS

AND ESTIMATION VIA THE EM ALGORITHM

BY

Sang-Jin Kang

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Counseling, Educational Psychology
and Special Education

1992

ABSTRACT

A MIXED LINEAR MODEL WITH TWO-WAY CROSSED RANDOM EFFECTS
AND ESTIMATION VIA THE EM ALGORITHM

BY

Sang-Jin Kang

In the past two decades, there has been a prominent methodological
effort in educational statistics to develop analytical methods that
account for the multilevel characteristics of the educational data. As a
result, methodologists have developed estimation procedures appropriate
for nested multilevel data assumed normally distributed. One limitation
shared by all of the new multilevel analytic approaches is that they
apply only to those multilevel data which are purely 'hierarchical.
Although educational systems typically have hierarchical organizations-
students are grouped together for learning within classrooms, classrooms
within schools — very often the structure of a system is not a 'pure'
hierarchy. Students may belong to more than one group simultaneously. If
the data are nested within the cell of two cross-classified grouping
factors, we call it crossed multilevel data.

This study presented an appropriate statistical model for the
analysis of crossed multilevel data for general applicability. The
crossed multilevel model presented in this thesis allows us to consider
simultaneously the multilevel and crossed features of higher level units.

Such a model promises to increase the descriptive power and inferential

Sang-Jin Kang
advantages for both micro-and macro-parameters in the crossed multilevel
context. Research interests in this thesis were five fold: a) to
conceptualize the crossed multilevel model; b) to present estimation
theory for the model using the empirical Bayes viewpoint; c) to provide a
computational algorithm for parameter estimation using the EM algorithm;
d) to provide empirical evidences for the accuracy of the computing
algorithm; e) to present the application of the model; f) to provide the
substantive applicability of the model in educational research.

The conceptualization of the model was described via a crossed random
effects ANOVA model and by linking it to the crossed multilevel model.
Estimation theory was reviewed from an empirical Bayes viewpoint and the
computational procedure implemented via the EM algorithm. The accuracy
of the algorithm was tested using randomly generated data against the
standard statistical packages (SAS and BMDP) and via a simulation study.
An actual experimental data set was analyzed for illustration and the
substantive applicability of the model in educational research was

assessed.

This dissertation is dedicated to my mother, for her love and special
life for the family.

To my brother, Sang-Yul Kang, and his wife, In-Sook Kim, for their
support and sincere life.

To my wife, Eun-Mi, for her love, patience, and ongoing support. I
love you.

To my daughters, Bona and Boyoung, who enriched my family life in
East Lansing and my nephews, Bio, Bundo, and Yohan, for their growth and
future lives.

To all of my other relatives and friends who have understood and have

been generous while I have worked in United States.

iv

ACKNOWLEDGEMENTS

I wiSh to send my heartfelt thanks to Dr. Stephen W. Raudenbush, my
advisor and committee chair, who has been my teacher, role model, and
provided rock of support throughout my doctoral program. His
encouragement, mentoring and ‘his faith in. me greatly aided in the
completion of this dissertation. Dr. Raudenbush's insight, courage, and
unusual motivation of hard work will continue to be an inspiration for
me.

I also wish to send my special thanks to Dr. Brian P. Rowan, who has
enlightened, and mentored my professional work. Dr. Rowan has provided a
constant financial support for my advanced study. Dr. Rowan's generosity
and open-minded advices for my work have inspired me.

My special appreciation goes to Dr. John Schwille who provided me
financial support for more than one year and has willingness to give his
valuable time for my needs. It has been my luck that he was near me.

Dr. William H. Schmidt, my former advisor and a committee member, gave
generously one-year research assistantship when I was a first year
doctoral student. Dr. Schmidt also gave helpful comments and suggestions
on my dissertation. Thanks.

I would like to send my recognition with gratitude to my other
committee members. To Dr. James Stapleton, who taught me statistics from
the basic to the advance, for his kind suggestions and comments. To Dr.
Robert Floden for his tough questions. To Dr. Richard Houang for his
willingness to use his valuable time for me.

I wish to acknowledge the professors in the Department of Education at

Yonsei University. I realize that all my success in the advanced study
is rooted in their teaching.
Finally, I wish to recognize my wife, Eun-Mi, who has shared the hard

life for my entire study at Michigan State University.

vi

TABLE OF CONTENTS

LIST OF TABLES

Chapter

I.

II.

III.

IV.

CONCEPTUALIZATION OF CROSSED MULTILEVEL MODEL

Introduction .

Objectives . .

Problems with Multilevel Contexts .
Educational Research and Crossed Multilevel Data .
Methodological Advances of Two- -Way Crossed Multi-
level Model .
Past Methodological Research on Crossed Multilevel
Model

The Two- -Way Crossed Multilevel Model . .
The Two- -Way Crossed Random Effect ANOVA Model
Crossed Multilevel Model Formulation .
General Crossed Multilevel Model .

General Mixed Linear Model .

EMPIRICAL BAYES ESTIMATION OF RANDOM EFFECTS IN THE
CROSSED MULTILEVEL MODEL .

Introduction . . .
The Bayesian Model with Known Covariances
Theory of EM Algorithm . . .
Bayesian Formulation of Crossed Multilevel Model .
Joint Normal Distribution .
Posterior Distribution of Parameters .
Posterior Dispersion Matrix of Parameters
Posterior Mean of Parameter Vector .

COMPUTING ESTIMATES OF CROSSED MULTILEVEL MODEL PARAMETERS

Model . .
Sufficient Statistics of Complete Data . .
EM Formula for Estimating the Fixed Parameters .
EM Formula for Estimating the Parameter Variances
EM Formula for Estimating the Within-Cell
Variance .

Observed Data Likelihood for the MLF Estimation.
Log- -Likelihood for the Crossed Multilevel Model

CHECKING THE ACCURACY OF THE COMPUTING ALGORITHM .

Computational Results
Crossed Random Effect Anova
Variance Components Model I

vii

Page

\JO‘U‘H

l3

16

18
18
23
28
34

35

35
37
40
43
45
46
A7
50

S3

53
54
58
59

60
64
66

69
73

7h
76

ix

Variance Components Model II .
Covariance Components Model

V. ILLUSTRATION .

Data and Hypotheses

Sample and Design

Analysis . . .
Base Model . . . . . . . . .
Within-Cell Model Specification

Between-Cell Model Specification .

VI. CONCLUSION .
Summary .
Limitations
Future Works .

BIBLIOGRAPHY

viii

79
82

88

88
89
91
91
95
102

108
107
112
114

120

Table

10.

11.
12.

13.

14“

LIST OF TABLES

Procedure for checking the accuracy of the algorithm .

Computational comparison between SAS program and the CML
algorithm for crossed random effects ANOVA

Computational comparison for the estimates of variance
components between BMDP program and the CML algorithm
for Model B . . . . . . . . . . . . . . . . .

Computational comparison for the estimates of the fixed
effects between BMDP program and the CML algorithm
for Model B (Balanced Case) . . . . . . . . .

Computational comparison for the estimates of the fixed
effects between BMDP program and the CML algorithm
for Model B (Unbalanced Case) . . . . . . . .

Computational comparison for variance components between
BMDP program and the CML algorithm for Model C .

Computational comparison for the estimates of the fixed
effects between BMDP program and the CML algorithm for
Model C (Balanced Case) . . . . . . . . . . . . .

Computational comparison for the estimates of the fixed
effects between BMDP program and the CML algorithm for
Model C (Unbalanced Case) . . . . . . . . . . .
Computational comparison between the preassigned parameter
values and the average values of the estimates from twenty
simulations

Computational comparison between the preassigned parameter
values and the results of the analyses of the balanced and
unbalanced data

Design and Sample Sizes

Results of crossed multilevel analysis of the base model .

Results of crossed multilevel analysis with random slopes
at within-cell model

Results of crossed multilevel analysis with a fixed within-
cell slope .

ix

Page

71

75

77

78

78

8O

81

82

84

86

90

93

97

. 100

15. Results of crossed multilevel analysis with both within— and
between-cell variables . . . . . . . . . . . . . . . . . . . 103

16. Results of crossed multilevel analysis of the final model. . 105

CHAPTER I

CONCEPTUALIZATION 0F CROSSED MULTILEVEL MODEL

W

In statistical analysis, a researcher needs to specify an
appropriate model that guides inquiry as well as describes the structure
of data as precisely as possible. Educational systems typically have
hierarchical organizations in which "units" at one level (e.g., students)
are "nested" within units at the next higher level (e.g., classrooms or
schools). These educational systems often produce inherently hierarchical
data. The problems of analyzing hierarchical data arise when. key
variables of interest are measured at different levels of an
organizational ‘hierarchy. The prime question for an appropriate
statistical model in this case is whether the model takes into account
the effects of variables measured at both the individual and the group
levels. Failure to account for hierarchies may lead to trouble in terms
of research validity and has been the core in methodological criticism of
educational research (see Cronbach, 1976; Burstein, 1980; Cooley, Bond, &
Mao, 1981; Raudenbush, and Bryk, 1988).

Despite such fundamental warnings, many analysts used single-level
models even if the key variables were measured at two different levels.
This mismatch between single-level models and the multi-level data often
leads the researchers to the analytic dilemmas in the choice of unit of
analysis. In most educational research, students are not randomly

assigned to groups such as classrooms or schools. The lack of

ind

Stu

KO

C0?

.5

we

32:}

,I
p‘ J!

2
independence of responses is further exacerbated by the fact that
students receive treatment as a group. When students share common
group histories, teachers, peer experiences, their responses will be
correlated.

In addition to the violation of statistical independence assumption,
hierarchical data usually produce variables of interest at both student
and group levels. Traditional linear models can account for only single
level variables and fail to accommodate the variables measured at both

levels which restricts model specification and appropriate inferences of

interest as a result. Moreover researchers may raise the questions
about how group processes (i.e . , some policy implementation) are
interrelated with the processes within the groups (1 . e . , student

behavior) when they have hierarchical data. These questions are hardly
answered by the classical linear models. Educational researchers have
long been concerned with multilevel issues. But traditional research
methods have not provided adequate tools with which to analyze data
arising in naturally occurring hierarchies.

New advances on model specification for multilevel data were made by
the methodologists who have advised researchers to formulate explicit,
multi-level models which enable testing of hypotheses about the effects
occurring within each level and interrelations among them (Burstein,
1980; Cooley, Bond,& Mao, 1981; Rogosa, 1978). A number of
methodologists, working independently, have developed estimation
procedures appropriate for hierarchical, or multilevel data assumed
normally distributed (Aitkin & Longford, 1986; Goldstein, 1986; DeLeeuw
& Kreft, 1986; Mason, Wong,& Entwisle, 1984; Raudenbush 8: Bryk, 1986).

They made a substantial methodological advances in two key reasons

exp}
Educ

Aha-v

°55~

EI‘IC

have

(Raudenbush , 1988) :

"First, such methods enable researchers to formulate and test
explicit statistical models for process occurring within and between
educational units. These models solve, in principle, the problem of
aggregation bias Second, these models enable researchers to
specify an appropriate error structure, including random intercepts and
random coefficients In most settings, appropriate specification of
error components solves the problems of misestimated precision which
have plagued hypothesis testing in nested unbalanced data sets."(p.86).

One limitation shared by all of the new multilevel analytic
approaches is that they apply only to those multilevel data structures
which are purely hierarchical. Although educational systems typically
have a hierarchical organization - students are grouped together for
learning within classrooms, classrooms within schools - very often the
structure of a system is not a "pure” hierarchy. Students may belong to
more than one group simultaneously. For example, all students are
members of neighborhoods as well as of schools and not all students in
the same neighborhood are in the same school and vice versa. Therefore
schools and neighborhoods in this case are crossed with different number
of students or no students within each cell classified by the two
factors. If such a classification of students is possible, again a
statistical model must reflect the crossed structure so that a researcher
may investigate the variables at both higher level units, neighborhoods
and schools. Because students are "nested" within cells of a
neighborhoods by schools cross-classification, the structure of data is
cross-classified as well as hierarchical. Also both neighborhood and

school effects are considered random because we can assume the two higher

level
Golds:
hieta:

ic

CCD‘.

L

car;

the

This

4
level units are randomly selected from larger populations of the two.
Goldstein (1987) called this "crossed-multilevel" structure while the
hierarchical data is "nested-multilevel".

For the analysis of crossed-multilevel data, a researcher ought to
pose a model that considers simultaneously both the multilevel and
crossed features of data. Many researchers implicitly worked on this
issue in the research on variance component analysis (Henderson, 1953;
Cunningham & Henderson, 1968; Hartley 5: Rao, 1967; Patterson 5: Thompson,
1971; LaMotte, 1973). The results from these researchers, however, are
applicable under restrictive conditions, as when no continuous group
variables are available, and there was no multilevel model formulation
which enables estimation of covariance components of micro-parameters
and testing the effects of crosslevel interactions in multilevel
contexts. Their results are limited to the estimation of variance
components.

Researchers working on multilevel models have already anticipated
the analysis of crossed-multilevel data in the view point of general
mixed model. Only a few computational attempts were made with simple
models. Lindley and Smith (1972) showed how Bayes estimation can be
applied to the analysis of crossed-multilevel data with a randomized
block design. But their attempt was limited to the case of balanced
design with one observation per cell. A more complete attempt was made
by Dempster, Rubin, and Tsutakawa (1981) in the example of "Professional
football scores" where the game scores were nested within offensive and
defensive effects of the teams involved in each game. These two
examples, however, do not lead directly to a general crossed-multilevel

model. Rather researchers need to understand these exemplary results as

educa

each

grad

yea:

sch:

Six

LA.
-

5
pointing to the possible emergence of a general model. Unfortunately we
do not have sudh a general crossed-multilevel model where any number of
fixed and random effects, in principle, can be estimated at both within
and between unit level along with the interaction effects for both
balanced and unbalanced designs.

Crossed-multilevel data are both common and poorly understood in
educational research. Consider a school testing program. Students in
each grade level receive multiple tests. Here students are nested within
the cell of grade levels by test forms; grade level effects and test
form effects are viewed as random; grade levels and test forms are
crossed. Again suppose schools have an annual testing program for
graduates. Students, in this case, are nested within schools as well as
years; school effects and year effects may be viewed as random; and both

schools and years are crossed.

Objectives

This research attempts to achieve six objectives corresponding to the
six chapters of the thesis:
1. Conceptualization of crossed-multilevel model is described in chapter
1 through two parts. The first part provides background information for
crossed multilevel model. The second part describes the mathematical
model for crossed multilevel analysis and considers the research
inquiries that it makes possible.
2- Estimation theory for the general crossed-multilevel model is
Presented via the empirical Bayes viewpoint for both general and mixed
linear models in chapter 2.

3- A computational algorithm is provided for the estimation of

press

6.3?

- v
o
w: \u‘

6
parameters in the posed model using the EM Algorithm (Dempster, Laird,
and Rubin, 1977) in chapter 3.
4. A computer program is developed in the Gauss language (version 2.0),
and accuracy of the algorithm is examined in chapter 4.
5. Application of the crossed multilevel model to real data set is
presented through illustration in chapter 5.
6. The properties of the estimators and value of the study is discussed

with summary in chapter 6.

b1 5 w t Mu eve o e
As mentioned, most educational research has tended to ignore the
nesting of individuals within groups and, instead, used single-level
models to analyze the multilevel data. This mismatch between single-

1evel models and multilevel data causes the following major problems.

W
Analyzing multilevel data with single-level models often leads the

researchers to the analytic dilemmas in the choice of unit of analysis.
Researchers have stated that significance tests based on individual-
level analysis are unacceptable, due to the violation of the independence
assumption when subjects receive treatments as a group, even if
individuals are randomly assigned to groups (Cronbach, 1975; Glass 6:
Stanley, 1970; Page, 1975; Aitkin, Anderson, & Hinde, 1981; Knapp, 1977;
Walsh, 1947). Inferences about individual behavior, such as individual
aChievement, based on group level analysis can cause aggregation bias
(Cronbach & Webb, 1975; Robinson, 1950; Hopkins, 1982). Some

investigators may choose separate analysis at each level in the hope that

mecca
critici
provide

tron
'choice

‘5

Shelli

7
the convergent results of the two analyses would avoid the methodological
criticism. Raudenbush and Bryk (1988) have stated that this strategy
provides no guidelines on how to interpret the results when they diverge.
Cronbach (1976), Burstein (1980) and others have recognized that
“choice of unit of analysis" is the wrong question for analyzing
multilevel data because the variation at each level is potentially of

interest and ought not be ignored.

ou c o

In the multilevel contexts, the variables measured at different levels
need to be taken into account in the model specification. Educational
researchers are often interested in the association between school
organization and student achievement. Consider a school district that
implements a certain policy to improve student achievement. The effect
of the policy on student achievement may be different across students
due to the differences in student demographic backgrounds such as prior
achievement and motivation, variables defined at student-level. If such
student differences are ignored, one must assume the policy effects are
evenly distributed across the students regardless the differences of
student characteristics, which is h rdly acceptable. The effect of
school organizational variables, i.e. policy variables, may also tend to
be different across schools due to the differences of the contextual
conditions across the schools. For example, schools may differ in their
school climates such as principal leadership, staff cooperation, student
disorder, and in socioeconomic levels of the communities in which
schools are located.

Whenever we study the effect of organizational variables on students

3..

de

EX

97..

EH :
ole

1?:

a w-
an

8
or teachers nested in the organization, we must adjust for confounding
effects that could occur at both the individual level and the
organizational level. Otherwise the estimates of the effects of the
organizational variables will be biased. The restriction of the
standard single-level linear model for the analysis of multilevel data
is that it does not simultaneously control the variables defined at

different levels.

as - ve nteractions

The effects of organizational variables may depend on variables
defined at different levels of the processes of the organization. For
example, the fast pacing of the curriculum is beneficial for high-
aptitude students but not for low-aptitude student (Gamoran, 1991), an
example of an aptitude-by-treatment interaction (Cronbach and Snow,
1977). Such interaction is a "cross-level interaction” (Bryk 6:
Raudenbush, 1989) in terms of the multilevel viewpoint because a
variable measured at higher level (i.e. pacing of the curriculum)
interacts with a variable measured at a lower level (i.e. student
aptitude).

Cross-level interaction effects may have important implications for
the formulation of scientific theories because the effects constrain the
generalizability of findings. Cronbach and Webb (1975) realized that the
characteristics of the classroom interacted with individual backgrounds.
Such interactions rendered the treatment effect unique in each study and
constrained the generalizability of a finding across the settings with
the treatment.

The lack of generalizability of organizational effects may often be

foam
goli
the

J .
m9:

beca

”v

. R
Us .1“
~

9
Eu

(7
P9!

.13

r.‘

9
found in the studies of school effects because the effect of a certain
policy or reform may depends on the characteristics of students or on
the contextual conditions of the schools. Traditional single-level
models do not allow the investigation of the cross-level interactions

because they must assume that effects are homogeneous across the groups.

duca e ea c and C o s d u leve ata
o t l v n e t
Consider the three educational research cases below which require the

analysis of crossed multilevel data.

Case 1. Assessment of School Effects A number of district

administrators want to diagnose the secondary high school education of
their districts and therefore implement an annual testing program for
the graduates. The administrators want to know whether the schools
provide stable education with good quality, (i.e. , excellence), in terms
of graduates test scores across the years. If the test scores of the
graduates are not stable across the years then they want to describe the
changes. The administrators also want to know how much the schools
differ in the test scores of the graduates. If the schools differ
substantially in test scores, then they hope to see what characteristics
of the schools are determinants of such differences. Another important
question is whether the effects of student background characteristics
are homogeneous across schools and years. Are outcomes more or less
equitably distributed in different schools? Is the equity of
distribution stable over time? Available data to the administrators in

this case are graduates test scores in each year, students demographic

backgr

teache'

Case I
adzini:
systems
primary
adsinis
element
at secc
the sec
achiEVe

rharact

E’“L‘ilste

‘ieek
u

set

10
background variables, and school characteristics (e.g., school size,

teacher/student ratio).

Case 2. o a betwee t e school evels Again the
administrators want to get fundamental information about the schooling
systems of the districts. In schooling systems, children attending a
primary school may go to one of several secondary schools. The
administrators need information about what characteristics of the
elementary schools support the long-term progress of student achievement
at secondary high schools. They also need to know the characteristics of
the secondary high schools that have positive or negative effects on
achievement. The more interesting question may be what combination of
characteristics of the elementary and secondary schools supports student
achievement. Another question is how well schools of different types
serve children of different ethnic and social background. Available
data to the administrator are achievement test scores in elementary
schools and secondary schools, student background variables, and

characteristics of the schools at both levels.

Case 3- MW A Psychometrician wants to

evaluate a standardized achievement test that was developed as a power
test (Rudman & Raudenbush, 1987). Unlike a speed test, a power test is
supposed to be insensitive to the duration of test administration. It
is expected that examinees can not answer further questions correctly
after the prescribed test time because their knowledge would be
eXhausted after that elapsed time. Research hypotheses in this case are

Whether there are the effects of excess test time on examinees' test

11
scores.

The psychometrician first blocks classrooms on background
characteristics and then randomly assigns classrooms to excess time
conditions. Hence children are nested within cells of a blocks-by-
treatment cross-classification. Both treatment and blocks are conceived
as random. Available data to the psychometrician are student
demographic variables, teacher characteristics, students' prior

achievement, and the standard test scores.

Analytical Comonalities

The design characteristics of the above three research cases share
four points. First, data of the all studies are basically multilevel. The
studies use the variables measured at both individual and at higher
levels. Second, the higher-level units of the studies are all considered
as random. In the first case, we consider the sample schools as
selected from the larger population and the years of the test
administration are also random. In the second case, the elementary
schools and secondary high schools are all considered as random. The
third case uses a randomized block design where the block effects are
typically viewed as random and the categories of different excess test
times are viewed as random as long as the researcher increases the number
of conditions.

Third, each study uses two kinds of higher level units and the two
factors are crossed. The first example shows that graduates are nested
within cells classified by the schools and years. In the second example,
students have dual membership of their elementary schools and secondary

high schools. The current organization of schools implies that primary

12
schools will not be nested within secondary schools. Rather the two
factors will be crossed. The last example utilizes the block design
where the blocks are crossed with the categories of different time
allocations for testing.

Finally, the data in every case are unbalanced. In the first example,
the number of students are not the same across the years as well as
across the schools. This unbalanced character of the sample size may be
more serious in the second example. Students attending a secondary high
school do not come from all elementary schools. Again, the number of
students from each elementary school is different in each high school.
Therefore, in a natural situation, the numbers of students classified
into each elementary-by-secondary cell are unequal and there will be
many missing cells. A similar situation occurs in the third example. In
education, experimental studies usually use volunteers as the source of
data. The class sizes will vary, meaning that cell sizes will vary.
Also not every block may contain the same number of students.

In sum, each of the three studies has two crossed random factors
within which individuals are nested. Because the higher level units are
crossed and the lower level units are nested within cells in the two-way
classification of the higher level units, we call the data "crossed-
multilevel“ (Goldstein, 1987a). The appropriate model should then
consider simultaneously both the multilevel and crossed features of data.
The multilevel and crossed features require the specification of four
random effects in a model: random individual effects, random effects of
each of the two higher level units, and random interaction effects. In
addition, individual scores within each cell can be described as a

function of multiple individual characteristics; and some of the within-

13
cell effects vary significantly over rows and columns while the others

may vary only over rows.

e dva ta es 0 o-W C o sed ult eve Mode

The limitation of the current available multilevel models for the
analysis of crossed multilevel data is that those models have been
developed for the analysis of one-way nested data. Those models are
appropriate to the analysis of nested multilevel data, but only when one
level is purely nested within the other level. The nested multilevel
model allows estimation of covariance-components at each level.

For the analysis of crossed multilevel data, a model should
estimate the covariance components at each level but it should also
decompose the covariance-components at higher levels into three
components corresponding to specified random effects for two higher-
level units and for their interactions. The model should handle with
unbalanced data and covariates having either fixed or random effects.
We now consider the additional information obtainable from crossed-
multilevel models in connection with the previous three research

examples .

v - o o e ts decom tio

The crossed multilevel model allows specification of the
apprOpriate error terms in a model where schools and years are crossed.
For the first example, the crossed multilevel model can estimate a
within-cell variance, 02, and between-cell covariance components. The
between-cell covariance-components are decomposed into the three parts;

one is the covariance matrix for school residual effect, say fa’ the

14

other is for year residual effect, say 1 and for the residual

b’
interaction effect between schools and years, rc.

With these covariance components we could get various intra-unit
correlations which inform us of the proportion of the total observed
variance that lies within cells, between rows, between columns and
between cells. We could also do a statistical test of the significance
of random variation for each effect. These two pieces of information,
the intra-unit correlations and the xz-test, provide us both practical
and statistical information about further model specification. For
example, the presence of a large intra-school correlation informs us that
a large percentage of variation of the student scores lies among schools.
The significant results of a x2-test for school residual dispersion
effect, ’a’ informs us that there still remains a significant random
residual effect not explained by the school characteristics in the model.

Similar information is obtainable for the analyses of the two other

examples.

n e ec s

We return now to the case in which primary and secondary schools
are crossed. A major difference between the nested multilevel model and
crossed multilevel model is that crossed multilevel model captures the
crossed features of the two higher level units while the nested
multilevel models do not.

Because the crossed multilevel model allows the specification of
random interaction effects of the two higher level units, primary and
secondary schools in the second example, it produces estimates of the

covariance components of random interaction effects of the two higher

15
level units. The presence of substantial random interaction effects
tells us that certain primary school effects depend upon the secondary
school attended; or, equivalently that the effects of attending a
particular secondary school depend on the primary school attended. If
we find a significant interaction effect of a particular primary school
characteristic and a particular secondary school characteristic on
student achievement, then the effect of the involved school

characteristic can't be generalized to the entire schooling system.

3 t tica recision

Imagine that a researcher has applied a nested-multilevel model to
analyze multilevel data with a two-way classification. He has performed
the analysis as a compromise because a crossed-multilevel model was not
available. The nested-multilevel model specifies a single error term
for higher level units. There are, however, three error sources in the
higher level units: two higher level units (e.g., schools, years) and
their interactions.

The estimates of higher level covariance components in his analysis
are then the sum of the three covariance components from each error
source. In the multilevel contexts, the estimates of fixed and random
effects are the functions of covariance components at both individual
and group levels. Nested multilevel models use the sum of covariance
components of the higher level units for parameter estimation without
knowing the size of covariance components from each error source.
Although the exact functional relationship between the nested and
crossed multilevel models regarding the parameter estimation is not

clear to the author at this point, the author suspects the precision of

16

the parameter estimation may be sub-optimal when the nested multilevel
model is applied to the analysis of crossed multilevel data. Consider
the case in which primary and secondary schools are crossed. Again
suppose we are estimating the fixed effects of primary and secondary
school characteristics. Then nested multilevel model uniformly applies
the sum of covariance components for estimating the fixed effects of
both primar and secondary schools while the size of covariance
components pertinent to each higher level units are different.

Crossed multilevel models allow specification of the error terms
precisely for the analysis of nested data with two-way classification
and properly decomposes the covariance components at higher level.
Hence it uses the decomposed covariance components at higher level for

parameter estimation of the model.

et 0 o 0 ice Res a c o C as d leve de

Standard texts on experimental design provide methods for simple
crossed random effects models in which a single random component is
associated with each cell in a fully balanced two-way cross-
classification (see, for example, Kirk, 1982). However, as in the above
examples, the interesting designs in education will typically be
substantially unbalanced. For example, many cells in a neighborhood-by-
schools cross-classification will be empty or small; and enrollments
will vary in a schools-by-time cross-classification.

Researchers working on variance component analysis have . tried to
elaborate models in order to meet various conditions (Henderson, 1953;
Cunningham & Henderson, 1968; Hartley 6: Rao, 1967; Patterson & Thompson,

1971; LaMotte, 1973). The results from these researchers, however, are

l7
applicable under restrictive conditions, as when no continuous group
variables are available and their results are limited to the estimation
of variance components.

Harville (1977) reviews methods of variance components estimation
based on maximum likelihood for unbalanced designs with crossed random
factors. However, although these methods are appropriate for variance
components, they do not allow covariance components, which will often be
of interest in ducation. For example, regression coefficients
describing the relationship between social background and achievement
may vary across schools or neighborhoods, or across time.

Lindley and Smith (1972) showed how Bayes estimation can be applied
to the analysis of crossed-multilevel data with a randomized block
design. But their attempt was limited to the case of balanced designs
with one observation per cell. Dempster, Rubin, and Tsutakawa (1981)
provided numerical results in the example of "Professional football
scores” where the game scores were nested within offensive and defensive
effects of the teams involved in each game. Dempster et al. provided
maximum likelihood estimates using the EM algorithm and explained the
empirical Bayes estimation theory. These two examples are useful;
however, they do not lead directly to the general crossed multilevel
model.

Multilevel methodologists anticipated the analysis of crossed-
multilevel data in the viewpoint of the general mixed model. In this
stream, Goldstein (1987b) outlined a procedure based on iterative
generalized least squares for statistical estimation in crossed-random

covariance components. He, however, supplied no computational strategy.

18

e o-Wa os ve de
In this section, the two-way crossed-multilevel model is
conceptualized and formulated. In order to promote the conce tual
understanding of the crossed-multilevel model, we first consider
crossed-random effects ANOVA model and formulate the crossed multilevel
model from the ANOVA model. Then the general crossed multilevel model

will be presented in two forms: matrix form, and no-subscript form.

e o-Wa Crossed Random feet OVA ode

Suppose a researcher wants to investigate the effects of schools
and neighborhoods on students' achievement. He may first select J
schools and obtain the information about students' K residential areas.
The framework of this design shows; a) students are nested within the
cells of a schools-by-neighborhoods cross-classification, b) school
effects on students achievement are random, c) neighborhood effects are
random.

From this description, we may estimate some or all of the following
effects; a) random main effects of schools, b) random main effects of
neighborhoods, c) fixed effects of school characteristics, d) fixed
effects of neighborhood characteristics, e) random interaction effects
(school by neighborhood), and f) fixed interaction effects (fixed
school-by-fixed neighborhood predictors).

If we consider just the random effects, we can specify a statistical

model as below

Y -p+a+b+c +e

ijk j k jk (1'1)

izjk

19

for i - 1,2,..., - l,2,...,Jk, k - 1,2,...,Kj; and Y is ith

njk’ J ijk

student score in school j in neighborhood k; p is grand mean of all

j is random effect of school j; bk is

random effect of neighborhood k; cjk is random interaction effect

between school j and neighborhood k; and finally e1. jk is a random

error. The notation "izjk" means student i is nested in school j and

students in the population; a

neighborhood k. The subnotation of the cell size, njk’ the number of

schools, Jk’ and the number of neighborhood, Kj’ imply the design
characteristics which allow unbalanced and incomplete data. Since all

2

effects are random, we have a series of assumptions: a - N(0, a a);

2 . 2 .
bk - N(0, ob ), cjk - N(0, ac ), ei:jk

assumptions of Equation (1.1). The effects of aj, bk’ cjk’ and e

J
~ N(0, aez) are all required

izjk
are mutually independent.

Equation (1.1) considers only the random effects. Random main
effects may or may not be of central interest for the inquiry of a
researcher. A researcher interested how much of the variance among
students' scores is attributable to the differences among schools and
neighborhoods, may use this model for analyzing data from the above
design.

Very often, researchers inquire about the effects of school and
neighborhood characteristics on student achievement and how students
scores in a school differ across neighborhoods. Equation (1.1),
assuming all predictors are centered around their means, can be

elaborated into the model below in this case:

Yijk - p + ﬂlxlj + u3 + ﬁ2x2k + vk + ﬁ3x3jk + 'jk + ei:jk (1.2),

20

where 81 is fixed effect of school characteristic, X1 (e.g., school

size); 82 is fixed effect of a neighborhood characteristic, X2

(e.g.,community SES level); and [33 is the fixed effect of the

interaction between X1 and X2. Assuming X

the random school effect, a

J 9
Thus u is a residual random

- X + u .
J a, 11 J J
school effect after accounting for the fixed school effect, 51x11.

1, X2, and X3 are orthogonal,

in Equation (1.1) is decomposed into two

parts in Equation (1.2); a

Similarly, the random neighborhood effect, bk’ and interaction effect

cjk of Equation (1) are also decomposed into two parts respectively; bk
- 8 X + v , and c - B X + it . This model needs assumptions for
2 2k k jk 3 3jk jk

2 2 2
the random terms. u:l - N(0, au ), vk - N(0, av ), ”jk - N(0, a,r ), and
2

e - N(0, 0e ). Other alternative model specifications are possible

ijk

if a researcher has different aims of inquiry. For example, a researcher

may utilize covariates with students level variables, say x4ijk (e.g.,

prior achievement) centered around the mean of group jk. Then the
student level random error can be decomposed into two parts; e 1. jk -

+ where 84 is the fixed effect of student prior

ﬁ4x4ijk fizjk’
achievement level, X4. Whatever model a researcher poses, the model
tells us that all specified effects in the model need to be estimated.

We call Equation (1.1) a crossed-random effect ANOVA model which can

be understood as multilevel model because it takes into account the

contributions from group level units (the a], bk’ cjk) and individual

effect, ei'jk' Equation (1.2) is also considered as a multilevel model
because there are two kinds random terms; u j , vk, and tr jk are group
level residuals and e is individual level residual. These two

i:jk

models show the basic features of crossed-random effect models and how

21
we specify a model of interest. Because these two models incorporate
both individual and group level random effects, we can get considerable
information from the random effect dispersion matrix through variance

decomposition.

d s e o a an V ce eco os ti n
Equation (1.2) contains four fixed parameters; p, 51’ 82 , B3 and

the random variables; u Here we assume the model is

j’ Vk’ 'jk ' ei:jk'
additive and that the four random variables are mutually independent
each other.

The variance of Yijk’ for given fixed effects, is

2+02+02+02,

and the conditional covariance between students' score within the cell

of school by neighborhood is

' 2 .
Cov (uj+vk+1rjk+e1:jk,uj+vk+1rjk+e i:jk) - au v 1r

Hence the correlation is given by definition,

2 2 2 2 2 2
p-(au +0V +a1r)/(au +av +01r +08)

which we can consider "intra-cell" correlation. This correlation itself
shows the proportion of between-cell variances over the total variance.
Thus it tells us the proportion of variance attributable to variation

between cells of school-by-neighborhood.

22
Again the conditional covariances for students in different schools
but the same neighborhood is
2

Cov(uj+kaxjk+ei:jk, u'j+vk+« jk+e i:jk) - av

and for students in different neighborhood but the same school is

I I I 2
c°v(uj+vk+ﬂjk+ei:jk’ uj+v k+ﬂ jk+e i:jk) au

From these covariances we can get two more "intra-unit"
correlations; the "intra-neighborhood" correlation,

2 2 2 2 2
p - av / (au +0v +0,r +ae )

and the intra-school correlation

2 2 2 2 2
p - au / (au +0v +0,r +0e ).

Again each intra-correlation shows the ‘proportion. of total variance
between students which is due to the differences between neighborhood or
between schools respectively.

Now we have all necessary information to form the full dispersion
matrix, V, for total sample. The full variance-covariance matrix, V,

contains cell matrices of school j and year k where there are n

, ij' jk
students. We can form these cell matrices which can be classified into
four types. The first type of cell matrices represent the variance-

covariance matrices for those students in the same school and

23
neighborhood. These matrices take the diagonal positions of the full
matrix, V. This matrix contains the variances of students within a cell,

au2+av2+ax2+aez, as the diagonal elements, and the covariances,

auz-i-avz-raxz , as the off-diagonal elements. For off-diagonal block
matrices of the full matrix of V, there are three kinds of block
matrices. One represents the blocks which consider the students in the
same school but in the different neighborhood. In this case all elements
are ouz. The other one represents the students in the same neighborhood
but different schools. This block contains av2 in all elements. The
last type of block matrices are for those students who have different
neighborhood and school membership. This last type of matrices are null.
Thus the full matrix, V, has somewhat complex structure, but is

composed of only four kinds of block matrices; one kind of diagonal block

matrices and three kinds off-diagonal block matrices.

u t ev Mo u io
The crossed random effects ANOVA model presented at the previous
section is now reformulated into the crossed-multilevel model. Such

reformulation will show the logic of crossed-multilevel modelling.

W

Recall the Equation (1.1) which reflects the research design where
students are nested within the cells of a schools-by-neighborhoods
cross-classification and both schools' and neighborhoods' effects are
random. Since Equation (1.1) has a hierarchical structure, it can be

represented into two stage formulae in terms of within-cell and between-

24
cell models. The within-cell model is
Y1

jk - ﬂjk + eijk (1.3)

where the individual score, Yijk’ is composed of the average score of
the cell classified by jg; school and kg; neighborhood plus an
individual effect eijk - N(0, 02). Equation (1.3) is the traditional

regression model with no predictors in the model except that ﬁjk is
allowed to vary randomly. Thus we pose between-cell model to explain
the variation of the regression coefficients in the within-cell model.
The mean score of a cell, ﬂjk’ can be decomposed into the grand mean, p,
across all cells plus the effects of school j, neighborhood k, and their

interactions. Thus the between-cell model is specified as

-p+a+b+c (1.4)

”3k J k 1k

where u is the grand mean of all cell means, a is the effect of school

J

j, b is the effect of neighborhood k, c is the effect of cell jk.

k jk

Since the effects of school, neighborhood, and their interactions are

all random, the distributional assumptions are; a

J

b)’ and cjk - N(0, rc). If we substitute the Equation (1.4) into the

Equation (1.3) we get the Equation (1.1), crossed random effect ANOVA

~ N(0, fa), bk - N(0,

T

model. So the two models are identical.
Here the random factor a.1 reflects the variation of cell means that
is attributed to the effect of a school. Thus ,a is understood as a

true parameter variance due to differences between schools. A similar

meaning is applied to the Tb and 'c' Using this crossed-multilevel

LI)
1'

q.
is).

‘

~r:
.

25
model we can get various information. First we can test the hypothesis
that H0: p - 0 (i.e. the grand mean is zero), but it is not usually of
interest in this model. We also pose hypotheses about the parameter
variances as well. The null hypotheses are: var(aj) - 0; var(bk) - 0;

and var(c - 0.

Jk’
was

The basic rationale of model specification of crossed-multilevel
model for each study is the same as in the other available multilevel
models (i.e. H M). First, we examine the variability among students in
the hierarchical structure via the base model. Second, we specify the
within-cell model only to reduce the within-cell variation and no
predictor variables are used at between-cell model. In this stage we can
determine whether the effects of the within-cell variables (i.e.,
student variables) are random or fixed across the higher level units by
examining the intra-unit correlations of each within-cell slope. If the
intra-unit correlations of a certain within-cell slope is close to zero
then we can fix the effect of that predictor and the within-cell model
become a mixed linear model. Third, after completion of the within-cell
model specification we start the specification of between-cell model in
order to identify variability among higher level units as a function of
between-unit variables. The criterion of the model specification is a
coe det a o , R2, which has similar meaning in the
regression analysis. While R2 means the proportion of the explained
variation by a model given the total observed variation in regression

analysis, the R2 in crossed-multilevel model is obtained based on the

true parameter variances. Therefore the meaning of R2 in this case is

26
that the proportion of explained true variance by a model given the
total parameter variances from the model based on no predictors.

Suppose we complete the model specification via the above three
steps and have a final model where two within cell variables are
employed but one of them has fixed effect, i.e. pretest score, while the
intercept is random across the groups and the remaining student variable
(e.g., SES level) has random effects across the schools only. Again
suppose one school variable (e.g., school size) and one neighborhood
variable (e.g., crime rate) are identified as an significant predictors

in the final model. By using matrix form, the within-cell model is

ij - xjk Bjk + ejk (1.5)
I- r- - r- T
Y1 i 1 x11 x21 [ 50 ] e1
- 51 +
52 JR
LYndjk _1 x1n indjk _en‘jk

 

 

 

 

 

 

where ij is a vector of students' posttest scores, xjk is the matrix of

student predictors where X1 is students SES level and X2 is the pretest

scores that have fixed effects. Since the within-cell model has both
fixed and random effects it is a mixed linear model itself. In the
following between-cell model we need to model the within-cell parameters,

the 8's, as a function of the group level variables. The between-cell

model is

27

 

 

”3k ‘ “1k r
8 l w w w 0 0 0 F 1 -
[pg] - [o 0102 03lw10] 182
52 jk o o o o o o 1 jk 702
703
710
711
. 720.1
+ Rjk aJ + 631‘ bk + Tjk cjk
1 o a 1 [ b 1 + 1 [ c l
+ 01 [a2 L o 0 k 0 0 3“
oo jk o jk o jk
(1.6)

Here ﬁjk is a vector of within cell parameters; ij is the matrix of

group level variables; 1‘ is the matrix of fixed effects of group level

predictors; Rjk’ cjk’ and Tjk are the selection matrices for random

residual effects; aj, bk’ and cjk are the random residual effect vectors

with distributions aJ - N(0, ra), bk - N(O, 1b), cjk

An interesting feature of the above Equation is the presence of Rjk'

cjk’ and Tjk’ which serve for identifying the residual random effects.

If the all three within-cell parameters, ,80, [31, ﬁz, are random across

- N(0, re).

the schools and neighborhoods including the interactions between the two
factors, then the three selection matrices, Rjk’ Cjk’ and Tjk’ are all
identity matrices with the dimension of the number of within-cell
parameters. If one represents the between-cell model in a scalar form,
one will see how the within cell parameter, 132, has a fixed effect.
These three selection matrices are useful for a practical purpose.
Analysts may need to set some particular residual effects to zero to fix
the relevant variable effects on outcome. If the three selection

matrices are null, then analysts are assuming that the within-cell

slopes are all fixed and the crossed-multilevel model become the

28
standard ordinary least squares regression equation with single variance
component, 02.

Another interesting feature of Equation (1.6) is the format of ij
which shows a different set of predictors for each within-cell slope.
Here, W1, W2, and W3 are the school size, crime rate, and their
interactions of the jkgh cell respectively. Because the within-cell
slope, 82, has no group level variation, it has been fixed and no group
level predictors are included in the matrix of group level variables.
Thus the model has great flexibility in model specification by allowing
a different set of predictors for each within-cell slope.

The above between-cell model is a multivariate model because there
are multiple outcome variables (B's) for each cell, jk. If we combine

the two equations by substituting the Equation (1.6) into the Equation

(1.5), the model becomes

Y

jk - xjk ij I‘ 4» X8le a.j + ijk bk 4- chk cjk + ejk (1.7)

where Xa X Equation (1.7) is

jk " xijjk; x'bjk ' xjkcjk; cjk ' xjijk'
considered as a mixed linear model where the first term of the right

hand side is a fixed portion and and remaining terms are all random.

e as eve o

The crossed random effect ANOVA model and reformulation of it into
the crossed multilevel model presented above explicitly describes the
design characteristics of the data and allow specification of
appropriate error structures that solve the problem of misestimated

precision. We shall now see a general form of crossed-multilevel

29

model. To make the presentation concrete, suppose again that we wish to
estimate a regression equation for each of many cells classified by
schools and neighborhoods. Our aim is to discover whether these
regression equations differ across the cells and , if they' do, to
explore the reasons why they vary. To the extent that these regression
parameters do vary, we want to ask: what school and neighborhood
characteristics are associated with variation in these regression
coefficients? To investigate this kind of problem we formulate a
multilevel model which is composed of two submodels: a within-cell model
and a between—cell model. The parameters of the within cell model are
conceived as outcome variables in a between cell model. After
formulation of multilevel model we will see how the model can be viewed
as a special case of general mixed linear model.

The presentation of crossed multilevel model is ordered in two forms
according to its generalization: 1. matrix form with subscript 2. no

subscript form.

wts
The regression formulas provided in the previous section are useful
for seeing the exact structure of the equations. The formulation of a
general model expands notation and facilitates derivation of the
estimation formula.
We first write the within-cell model which corresponds to the

equation (1.5) as

ij - xjkﬂjk + ejk’ (1.8)

30

T .
where ij - [Yljk’ ’Ynjk] is an njk by 1 vector of achievement
scores;
T
pjk - [ﬂOjk’ ’ﬁpjk’ ’ﬁr-ljk] is an r by 1 vector of micro
parameters;
T
ejk - [eljk’ ’enjk] is the njk by 1 vector of random errors assumed

normally distributed with a mean vector of zero and dispersion matrix
ijk;

and

jk xlljk ° ' ° xr-1,1jk

_1 xlnjk . . . xr-l,njk"

 

 

The between-cell model in matrix notation corresponding to the

equation (1.6) is

ﬁjk - ijy + Rjkaj + Cjkbk + Tjkcjk (1.9)
T T T T
where 1 - [10 ,...,1p ""’1r-1 ] and
T T
7p - [7p0’1pa1""7pas’7pbl’""7pbt’7pc1"'"7pcu] . The elements of

1PT are the parameters capturing the structural relationship between the
p 3h within-cell slope and the predictor variables of schools,
neighborhoods and their interactions.

T
"arl-l]j and a1 - N(0,ra),

T
0"°"br2-l]k and bk - N(0,r

a - [ao,..

J

b-[b

k b);

31

c and c - N(0,rc).

- [c c 1 T
jk O""’ r3-1jk jk
The dispersion matrices 'a’ rb, and re are all full symmetric matrices
with the dimension of r1, r2, and r3 which reflect the number of micro-
parameters that vary randomly across the schools, neighborhoods and

school-by-neighborhood interactions respectively. The matrix of group

level variables is

p

 

 

”3k ' 1 “Oalj"'w0cujk
0
l wpalj . . 'wpcujk
O
- 1 wr-lalj ' ' 'wr-lcujk‘
The elements of W are somewhat complex, because the predictors are

jk

chosen form a variety of resources; schools, neighborhoods and the
interactions of the two macro units. For example, the first element,

WOalj’ means that it is the value of first school characteristic, at

al,
school j, predicting the intercept of within-cell model, ,80. The design

characteristics of ij allows a different set of predictors for each

within-cell slope.
Rjk is a random school effect indicator matrix with the dimension of

r x r1, where r is the number of all micro parameters and r1 is the

number of micro parameters that vary randomly across the schools. Cjk

is a random neighborhood effect indicator matrix with the dimension of
r x r2, where r2 is the number of micro parameters of within-cell model

that vary randomly across the neighborhoods. '1‘jk is a interaction

random effects indicator matrix with the dimension of r x r3, and r3 is

32

the number of randomly varying micro parameters across the cells.
ij becomes a partitioned matrix with block diagonal structure where
the row vectors are stacked along the main diagonal. 1 is constructed of
subvectors, one for each of the r outcomes. The subvectors are "stacked"
on top of each other. The total number of elements of 1 is F, where F- E
Fp and F? - 1 + 3p + tp + up for p - 0,...,r-l, and sp is the number of
fixed row effects predicting p th within cell slope, ﬂpjk; tP is the
number of fixed column effects predicting ﬁpjk; up is the number of
fixed interaction effects predicting ﬂpjk'

The between cell model is a multivariate model because there are "r"

outcome variables for each cell, jk. Because we allow a different set of

predictors for each within cell slope, ﬁpjk’ we need to note FpﬂFp, for

5p and 8p,.

a w u b t
To further simplify notation and subsequent presentation of estimation
formulas, we now rewrite the general crossed multilevel models without
subscripts. This presentation is useful in two ways: First, it provides the
matrix structure of the crossed multilevel model. Second, using the model
without subscripts, it is easy to present estimation method. The within

cell model becomes

Y - Xﬁ + e, e - N (0, e) (1.10)
where Y - [Y11T, ,deilT,
a - [p11T, ... .ﬁinlT.
e - [e11T, ,eJK?]T, and

33

 

 

X - Diag [xjk] - X11 0
xjk
O
. XJKJ
The between cell model is
p-w1+Ra+Cb+Tc, (1.11)
T T T T
where 1 - [1o ’ ... ,1p , ... '7r-1 ] ,
T T T
W - [W11 , ... ,WJK ] ,
T T T
a - [31 , , , . ,aJ ] , and a ~ N( 0, 08),
T T T
b - [b1 , , , . ,bK ] , and b - N( 0, 0b),
T T T
c - [c11 , . . . ’cJK ] , and c - N( 0, 0c),

where 0a - subdiag(ra), 0b - subdiag(r and 0c - subdiag(rc).

b)!

R - Rjk O MR where Rjk are always r x r1

elements of a single '1' and all other elements are zero. As noted earlier

matrices where each column has

the elements of 'ls' in Rjk indicate the presence of random effects of

within cell slopes. MR is G x J matrix which determines the row membership
of a cell; ”G” is the total number of cells with data and ”J" is the total
number of row groups, that is schools. The operator ” G ” means Kronecker
product. C - Cjk a MC where MC is G x K matrix which determines the column
membership of a cell. T - TJk e MT, where MT is G x G identity matrix.

 

 

34
Ge e a xed ea ode
If we substitute the between-cell model (Equation 1.11) into the within-

cell model (Equation 1.10), we obtain the single model

Y - XW1 + XRa + XCb + XTc + e

This model can be restated in simpler notation as

Y - XW1 + Xla + X2b + X3c + e (1.12),

The new matrices of X1 to X3 represent the XR, XC, and XT. This combined

model can be reformulated into general mixed linear model as

Y - A101 + A202 + e, (1.13)

T

where A - [X W], 01 is 1, A2 - [xlIXZIX - [aT, b , cT]T, and e is the

1 31' ”2

same as defined previously. The author will use these two models, Equation

(1,12) and (1.13), for the presentation of estimation theory in chapter two.

CHAPTER I I

EMPRICAL BAYES ESTIMATION OF RANDOM EFFECTS IN
THE CROSSED MULTILEVEL MODEL

Wan

At present, the estimation theories contributing to crossed
multilevel data may be classified in two research streams. The first
stream may be viewed as classical variance component analysis which
started as early as the 19303. In this stream the researchers attempted
to estimate various random effects ANOVA or ANCOVA models. Graybill
(1961) extensively treated the estimation problems for the constants and
variances in the linear model with balanced designs and demonstrated the
optimality properties of the classical analysis of variance procedures.
In the case of 'unbalanced factorial and nested data' , there may be four
subgroups of estimation methods. They may be labeled, 1) Method of
Moments (Henderson, 1953; Searle and Henderson, 1961; Cunningham and
Henderson, 1968), 2) MIVQUE or MINQUE (Harville, 1969; LaMotte, 1970;
Rao, 1972), 3) Maximum Likelihood (Hartley and Rao, 1967; Harville,
1977), 4) Restricted Maximum likelihood (REML) (Patterson and Thompson,
1971; 1974; Harville, 1977).

These four methods are currently implemented in standard statistical
packages, such as SAS and BMDP but all are limited to the estimation of
variance components. Numerical applications are available under mixed
ANOVA or ANCOVA models which do not allow continuous group-level
variables (see BMDP,1985,pp413-435).

The more general approach for hierarchically nested data was made in

the second research stream. Researchers in this stream use the term of
35

36

'multilevel analysis' because they attempt to estimate both micro— and
macro-parameters. There are several researchers who developed estimation
methods independently. Lindley and Smith (1972) derived Bayesian
estimates for the hierarchical linear model; Smith (1973) compared
Bayesian and least squares estimates for' hierarchical linear' model;
Dempster, Laird, and Rubin (1977) established the EM algorithm for ML
estimation for covariance components models; Longford (1985) developed
the Fisher scoring algorithm for ML estimation of covariance components
in multilevel mixed linear models; Goldstein (1986) developed an
iterative generalized least squares estimation; Laird and Ware (1982),
Strenio, Weisberg, and Bryk (1983), Mason, Wong, and Entwisle (1984), and
Raudenbush and Bryk (1986) developed. approaches to estimation rusing
empirical Bayes estimation method via EM algorithm.

These methods have been widely applied for the analysis of
hierarchically nested data and can be considered as candidate methods for
the analysis of crossed multilevel model. This thesis follows the flow
of empirical Bayes estimation using the EM algorithm for the numerical
analysis of crossed multilevel model. While the Bayesian theory
formulates a prior density for the variance and covariances, the
empirical Bayes theory uses maximum likelihood point estimates of the
variance and covariances of the prior that maximize their' marginal
posterior distribution (Efron and Morris, 1975). The term empirical Bayes
came from the fact that it uses empirical data to estimate the variance
and covariance parameters.

For the application of empirical Bayes estimation, the EM algorithm
produces the ML estimates of parameters by substituting the expected

complete data sufficient statistics into the formula for complete-data ML

37
estimation of the parameters. The present chapter discusses how
empirical Bayes estimation theory can be applied to the estimation of
crossed multilevel model. The next chapter presents computational EM
formulas for obtaining ML estimates of the parameters. Here, I will
describe Bayesian estimation theory for the model given the known fixed
effects, variances, and covariances. Then in chapter 3 I will show how

EM algorithm can be used for estimation of crossed multilevel model.

The Bayesian Model with Known Covariances

Using the notation and results of Raudenbush (1988), the Bayesian

linear model takes the form of general linear model,
Y-A6+e, e - N(0, W) (2.1),

where Y is N by 1 vector of outcomes;
A is N by P matrix of predictors;
6 is P by 1 vector of parameters;

e is N by 1 vector of random errors.

The critical difference between the classical linear model and the
Bayesian model lies on the conception of the parameters, 9. The
conception of the parameters in classical linear model can be summarized
as follows:

1) 9 is a fixed unknown; 2) we compute O as our estimate based on the
sample of data; 3) if we replicate the same study many times, the 8
values computed will vary with a mean of 9 and a dispersion var(G) and

the distribution of 0 is called the sampling distribution; 4) we use the

38

sampling distribution of 8 for the inferences about 9.

In the Bayesian point of view, the parameters, 6, of Equation (2.1)
themselves have a general linear structure in terms of other quantities
which they call hyperparameters (Lindley, & Smith, 1972). Therefore all
parameters of 9 are viewed as random and hyperparameters are viewed as

fixed. Since 9 is random, we may propose its distribution,

9 - N( é , o) (2.2).

The Bayesian views this as our prior distribution of 9 and the prior
distribution represents our state of knowledge on 6 (Smith, 1973).
Therefore our prior knowledge about the location of the parameter vector
is 9, and the precision of this knowledge is measured by 0-1.

In the context of Bayesian estimation, the prior distribution of
parameters is assumed to be known, and the posterior distribution of
parameters given the data and prior parameters needs to be found. The
procedure for finding the posterior distribution involves Bayes theorem,
which states that the posterior density function f(6|Y) is proportional
to the product of two independent density functions: the likelihood

L(Y|9) and the prior distribution P(9). Hence the posterior density

function is,

f(6|Y) a L(Y|6)P(6) (2.3).

Bayes theorem incorporates the prior information of the parameters and

the observed information from the sample. The right hand side of

39
Equation (2.3) can be easily seen as joint probability density function
of observed data Y and prior information of 9. Under the assumption of
known dispersion matrices, B and 0, the joint normal distribution of the

data, Y, and the parameter 6 is

f(Y,6) - L(Y|6)P(9) (2.4)

- clexp[-1/2(Y-Ae)Tw'1(Y-Ae)1c2exp[-1/2(e-é)To'1(e-é)1

where C1 and 02 are normalizing constants for the two normal densities.

Thus the posterior distribution of 6 is proportional to the joint normal

density function.
T -1 - T -1 -
f(6IY) « exp[-1/2(Y-A9) w (Y-A6)] exp[-1/2(8-6) o (9-9] (2.5).

*
Equation (2.5) determines 6 , the posterior mean of 6, and the dispersion

D * Th 1
e. at S
f(9IY) « exp[-1/2 (e-e*)ne*’1(e-e*)], (2.6)
where 9* - 06*(0'16 + AIW'IY) (2.7)
and D9* - (0-1 + ATi-IA)-1. (2.8)

Estimation dheory for the Bayes linear model is presented in
Lindley’ and Smith (1972) and. Dempster, Rubin, and.‘Tsutakawa (1981).
Bayesian inferences are based on these posterior estimators from which

we can get point estimates and intervals.

4O
eo o l o

The Bayesian theory presented above requires known variances and
covariances in order to get posterior means and dispersion matrices for
parameters under consideration. In the empirical Bayes method, the
parameters are estimated from the joint normal posterior distribution
given the ML estimates or other consistent point estimates of the hyper-
parameters for their' unknown 'values. This empirical Bayes estimation
method is applied in the EM algorithm, where the ML estimates of
hyperparameters defined through empirical Bayesian methods are used for
finding the expected complete data sufficient statistics which in turn
are used for ML estimation of the parameters (Dempster, Laird, and Rubin,
1977).

The name EM_elge;1§hm comes from the characteristics of an iterative
routine which cycles through an Expectation step and a Maximization step
at each iteration. The expectation step finds the posterior expectation
of the sufficient statistics based on the 'complete data' given the
'observed data' and current estimates of parameters. The Maximization
step then uses the expected sufficient statistics to produce an estimated
values of unknown parameters under estimation. These two steps cooperate
to increase the likelihood function of the estimated parameters.

To make the procedure of EM algorithm concrete, consider a linear
model with a sample size of n and we want to estimate the variance

components of the model as,

Y’- X5 + e,
where Y is a n by 1 vector of observations;

X is a n by p matrix of predictor variables;

41
ﬂ is a p by 1 vector of regression coefficients;
e is a n by 1 vector of random error, and the distributional

assumption is e - N (0, 02I) and p - N(6, F) as a prior.

The EM algorithm uses the expected value of sufficient statistics of the
complete data conditioned on observed data and current estimates of
parameters as a proxy for the summary statistics of "complete data". The
'complete data" consist of Y and the true values of parameters, here ﬂ
and e. By employing the assumption of having complete data, we estimate
the sufficient statistics of complete data by the conditional expectation
as E(e'eIY,021) in E-step, where e'e is the complete data sufficient
statistics, Y is the observed data, 021 is the current estimate of 02 at
the i331 iteration. The term 'I' can be read as 'given the data of'.
Suppose we have a vague prior on 8. Then the variance components of F
become infinitely large and P-l-eO and the Equation (2.9) become the
functionally same model as a standard linear model. Hence we pose the
conditional distribution of 8 given the data as

21 - N(ﬁi. V) (2.10)

plY,a
where pi - (x'X)'1x'Y

v - a21(x'X)'1.

The sufficient statistic is then

42
E(e'e|Y,021) - E[(Y-Xﬁ)'(Y-Xﬁ)IY,021] (2.11)
- Er (Y-xaiwi-xm ' safari-Xe) b.0211
- (Y-xnb'w-xrsi) + arm-pirx'xw-ﬁ‘)Iran]
- SSres + tr[X'XVar(ﬂlY,021)]

- SSres + tr(X'X)021(X'X).1 - SSres + p021

The result of Equation (2.11) is the proxy of the sufficient statistics
for the complete data. In the M-step we simply get maximum likelihood
estimates of 02 as

02(i+1) - [SSres + p021]/n (2.12).

2(i+1)

The resulting value of o is then used as input for the next E-step.

A

The reader will notice that if the current estimate 021 - SSres/(n-p),
then 62(i+1)- SSres/(n—p) also. Indeed, if 621 differ from SSres/(n-p),
the iteration will nevertheless converge to that estimate.

For the use of EM algorithm, there remains a choice between two
likelihood functions for ML estimation of variances and covariances,

which is labeled as MLF and MLR (Dempster, Rubin, and Tsutakawa, 1981).

Consider the general mixed Bayesian model as

Y - A191 + A292 + e,

[:1 - Ni {:1 .1: :11

and e - N(0. V)

The MLF method treats 61, w, and T as fixed parameters to be estimated.

43
To achieve such estimation requires the conditional distribution
f(82|Y,61,‘F,T). In contrast, the MLR method treats w and T as fixed
parameters to be estimated. To achieve such estimation requires
f(91,92IY,i,T). In this thesis, I have chosen the MLF approach.

Consider the general mixed linear model as Equation (1.13), which was

Y - A191 + A262 + e,

where the A161 is the fixed part and remaining right hand side of the
equation is random part. To obtain the conditional density of 62 given

Y, 91, W, and T, the model is modified as,

d - A292 + e,
where d - Y - A161 (Dempster, Rubin, 6: Tsutakawa, 1981). Then the
equation ‘become a general Bayesian. model which enables us to find

posterior means and dispersion. matrix of 9 using Bayesian. method

2

presented in the previous section in this chapter.

e a u ati o sed u t ve Mod
In order to use EM algorithm, we need to formulate the crossed
multilevel model into the general Bayesian form of' Equation (2.1).

Recall the combined model Equation (1.12) was

Y - XW1 + Xla + X2b + X3c + e

and the mixed linear model, Equation (1.13), was

44

Y - A181 + A292 + e.

Since these two models are equivalent, we can posit the following
identities.

A1 _ [xv] (2.13)

A2-[x1 ' x2 |x31

and

61 - 1
62 - [aT | bT | CT].

The related assumptions are that the fixed and random parameters are
independent and that the random parameters, a, b, and c, are also

independent. We change the general mixed linear model into the model

with only random parameters such as

d - A292 + e (2.14),

where d - Y - A191 and the distributional assumptions on the new outcome

variable is

d ~ N (0 , A TA + W) (2.15)
Other terms of the model (2.14) keep the same distributional assumptions

as the mixed model.

In order to make the equivalent crossed multilevel model, we also

Change the model (2.14) into;

45

d - x1e + xzb + x3c + e (2.16),

where d - Y - XW1 and the distributional assumption is

T T T 2
d - N (0 , xlnaxl + inbxz + x3ocx3 + a I) (2.17)

Now, the models (2.16) and (2.14) are equivalent.

Jo o l D str b t o

The necessary step for posterior estimation is the specification of
joint normal distribution of the modified model.
The expected mean vector of the dependent variable is
E(d) - E(A292 + e) - 0
and the expected mean vector of the parameter is
E(62) - 0.
The variance of outcome vector is

T

Var(d) - AZTA2 + i
and the parameter variance is
Var(62) - T.
The covariance between 'd' and '62' is
Cov(d, 62) - Cov(A262 + e, 62) - AZT'
Hence, the joint normal distribution is

T
d _ 0 AZTAZ + w AZT
N T (2.18)
92 0 , TA2 T

The equivalent distribution of the crossed multilevel model can be

46

obtained by substituting A and T with their corresponding terms. Hence

2
'd‘ ' '0' ’xnxT+x XT+X0XT+021X0 x xo"
1 a 1 20b 2 3 c 3 1 a 20b 3 c
a ' N o o x T o o 0
a1 a
T
b o obx2 0 ab 0
c o o x T o o o
- - - . ~ - c 3 c - ~

 

 

 

 

 

 

 

 

P ste or st ibutio o arameter

Having specified the joint normal distribution of modified crossed
multilevel model, we now need to get the posterior distribution of the
parameters, a, b, and c. By using the definitions of (2.13) and by
applying the posterior estimation method, we now can pose the posterior
distribution of parameter vector of the modified version of the Bayesian
model. Then we translate the results into the terms of crossed
multilevel model using the identities defined before. The posterior

distribution of the parameter given the data is

- * *

62Id,T,W N (92 , ”92 ) (2.20),
where
* * T -1
92 - D92 A2 1 d (2.21)
and
T -1 -1 -1
032 - (A2 0 A2 + T ) (2.22)

Equations of (2.21) and of (2.22) show that the posterior distribution
of parameters requires the prior information of the parameters; W, T, and

9 Here we assume that the information of the W, T, and 81 are known.

1.

47
I will first show the posterior dispersion matrix and then move on to

the posterior mean of parameter vector in the crossed multilevel model.

0 e t 0 ar te S

Equation (2.22) is now rewritten in terms of crossed multilevel

model,
T 2 -1 T T
X1 X1 + a 08 X1 X2 X1 X3 -1
* 2 T T 2 -1 T
D92 - 0 X2 X1 X2 X2 + a 0b X2 X3 (2.23).
T T T 2 -1
X3 X1 X3 X2 X3 X3 + a Do

To derive the inversion of the above matrix, the following definitions

are useful.

x TX + 020 '1 x Tx 311 312
+ 1 l a l 2
B ' T T 2 -1 ' 21 22 (2'2“)
x2 x1 x2 x2 + 0 ab 3 B
T
B - T -
x2 x3 B2
T 2 -1
U - X3 X3 + 0 0c
With these definitions we say
* 3+ B -1
D _ (2.25)
92 3T 0 .

The advantage of this representation is that we can now apply the
partitioned matrix inversion method since (2.25) is partitioned in two-

by- two form. The results of the inversion (Searle, 1982) is

48

* H-l -(GH-1)T
D - (2.26)
92 -(GH 1) U 1 + CH 1GT ,
where H - 8+ - BU-IBT
c - U'IBT.
Equation (2.26) requires the two inverted matrices, U.1 and H4. The

dimension of U.1 is determined by the number of columns of the matrix

X But the 'H' matrix has two-by-two partitioned form again. We apply

3.
the same inversion method as applied for (2.25). For the inversion of
the 'H' we use the definitions at Equation (2.24) and rewrite it into

simpler notation as

311 - B U'la T 312 - B U'IB T H n
1 1 1 2 11 12
H - 321 - B U'la T 322 - B U'ls T - H H (2.27)
2 1 2 2 21 22
_1 H11 H12 -1 H11 H12
“an H - - (2.28)
n n H21 H22
21 22
11 -1 -1
where H - (H11 - H12H22 H21)
12 111 -1
H - -H H12H22
22 -1 -1 111 -1 -1 -1
H ' H22 + (“22 "21)H ("12“22 ) °r ("22 ' H21H11 "12) °

Once we get the inverted matrix 'H', we can apply the results to the

inversion of the matrix (2.26). From (2.26) we have

49

 

 

p H11 H12 _H1181U-1_leBZU-1 .
* 2 21 22 21 -1 22 -1
D92 - a H H -H 310 -H BZU (2.29)
U-l+U'1[81TH11B1+B2TH2181
(Symmetric) T 12 T 22 -1
L +31 H 152+];2 H 152 ]U j

*
Now the task is to reexpress the each block of D82 in terms of the

crossed multilevel model. The Equation (2.29) shows that the solutions

of each block are linked each other. Thus more compact notation for the

matrix (2.29) is useful.

v11 v12 V13

Let D - a v21 v22 v23 (2.30).
v31 v32 v33

Using the definitions at Equations of (2.24) and (2.27), and by

collecting the results of Equations (2.26) through (2.29), we can obtain

the following results :

11 -1 -1 -1
‘V11." H ' (“11 ' H12H22 H21) ' (H11 ' Ga) (2'31)

12 111 -1 T T -1
v12 " H " '“ H12‘122 ' "’11 x1 Mx2n22

22 -1 -1 11
v22 " H '"22 + ("22 "21)H

T -1
(“12"22 )
-1 -1 T x T -1
' H22 + H22 x2 Mlell 1 Mx2H22
1 -1 -1

°’-' (“22 ' H21H11 H12) ' (“22 ‘ Cb)

T T -1
v13 .. «(vllx1 X3 + v12X2 X3)U

50

v23 ' "(V21X1Tx3 + "22"213‘3)”.1
v33 - U'1 - (v13TX1TX3 + v23Tx2Tx3)U'1
v-vT v-vT
21 12 ' 31 13
where “11 - xlTMx1 + 0208-1,
H22 ' szsz + 020b-1’
M - I - x3TU'1 3,
Ga ' xlTMx2322-1x2TMx1'
c - x T 'lx Tux .

b 2 Mxlull 1 2

We need to note that the two terms H11 and H22 are subdiagonal matrices
and their inversions are easily obtainable. The complexity of these

computation can be reduced if we note the fact that each component is

the function of other components. Hence once we have information of U-1,

-l
22 , we can get v11 which serves for the estimation of v12 which,

in turn, serves for the computation of v13, and so on.

and H

o te 0 Mean 0 e e Ve 0
Having obtained the posterior dispersion matrix of the crossed
multilevel model, we now consider the posterior expectation. Recall the

Equation (2.21) which was

*
Assuming we have the estimates of 91 and we have D62 , the matrix

operation become straitforward.

51
Concerning the dispersion matrix, recall the assumptions for the
crossed multilevel model in which the column, row, and interaction
effects are mutually independent. In other words, the proposed prior
dispersion matrix has block diagonal form where (Ia, (lb, and 0c are at
its diagonal position. 08, ab, and 0c are also block diagonal matrices

with submatrices of re, and rc at their diagonal position. These

'b’
three submatrices are full matrices with dimension of r1, r2, and r3 and

they take the diagonal position of 08, 0b, and 0c respectively. However
the posterior dispersion matrix which can be obtained after data

observation is not block diagonal. The off diagonal submatrices, v12,

v13, and v23, are not null. We need to use the obtained full matrix of

s
062 for posterior estimation of the parameter vector as the equation

(2.21).

The equation (2.21) become as follow.

* T T T
a vnx1 + vux2 + v13X3
* T T T
92 - b* - v21X1 + v22X2 + v23X3 d (2.32)
* T T T
c v31Xl + v32X2 + v33X3
where X1, X2, and X3 terms reflect the data of the predictors classified
as column, row, and their interaction variables respectively. To

simplify the results of Equation (2.32), we need to first expand the
results, using the operational results of Equation (2.31), and simplify
them by noting the interrelationship of the submatrices of Equation
(2.31). After a somewhat complicated operation of the relevant

submatrices, we can arrive the following results.

52

‘1’
b*
" V22Qb'
* -1 T * *
c -u x3 [d -(X1a +X2b )1,
'r 'r -1 T
where Qa - x1 Md - x1 £21122 x2 Md,
1' T -1 T
ob - x2 Md - x2 MXlHn x1 Md.

All terms included in Equation (2.33) are previously defined at Equation

(2.31).

CHAPTER III.
COMPUTING ESTIMATES OF CROSSED MULTILEVEL MODEL PARAMETERS

This chapter will present the technical aspects of implementing
crossed multilevel analysis. The author will show how the EM algorithm

provides the estimates of crossed multilevel model.

Mgdel

The model defined for MLF estimation was

d - A292 + e,

where d - Y - A161;
92 N(0, T);
e - N(0, i)
with appropriate dimension. 61 is fixed parameter while 62 is

considered as random. The corresponding crossed multilevel model is

d - Xla + X2b + X3

‘where d - Y - XW1

c + e,

a - N(0 , 0 )

a
b - N(0 , 0b)
c - N(0 , 0c)

and. e ~ N(0 , 021).

As before, the following definitions are useful to link the above two

models;

T T T T
Al-XW, A2-[X1|X2|x3],61-1,62 -[a |b |c].

53

 

54

The error, e, is an N by 1 vector which is stacked from the top with all

within cell error vectors, ejk’ which are uncorrelated across the cells.
The within cell errors are njk by 1 vectors where the njk elements are

also independent. Therefore the dispersion matrix, W, is a N by N
diagonal matrix which composed of the dispersion matrices of all non-null
cells. Hence the dispersion matrix of each cell is 021 and the overall

dispersion matrix is

 

 

Pozlnll 1
0
VP - - 021,
0

_ 02anK .
where N - E E njk - 2 n8.
The random parameter variance matrices, Oa’ 0b, and 0c are both
diagonal block matrices with submatrices of ’a’ Tb’ and fc at their

diagonal position across the row, column units and their interactions.

The three submatrices, r and re, are of full matrices with the

a’ fb’
dimensions of the number of within-cell random slopes across the row,

column units and their interactions as stated in chapter I.

s ete ata
In the empirical Bayes method, random effects are estimated given
the ML estimates of the fixed unknown parameters. In ML estimation, the
logic of EM algorithm is to use the expected ‘value of sufficient
statistics of the 'complete data' given the observed data and the

previous estimates of these fixed parameters as a substitute for the

55
summary statistics of 'complete data', and perform ML estimation based on
the assumption that we had observed the complete data. In our case, the
complete data consist of the outcome variable Y and the true values of
random effects, 62 and e. The fixed unknown parameters include 02, 'a’
'b’ 'c’ and 1. By employing the assumption of having the complete data,
ML estimation can be simple to derive.

To find the complete data sufficient statistics, I will first use the
joint likelihood function of the complete data and parameters for MLF
estimation and then translate the results in terms of the crossed
multilevel model. Given the model defined at Equation (2.14) and (2.15)
and by referencing the Equation (2.4), the joint density function of the
parameters and the complete data is

£(d,ezlel,T,w) - f(d|e T,w) f(92|el,T,w) (3.1),

-l/2

2’61,
T.w> - [<2«)NI*|1
-1/2

Where f(dle eXP[(-1/2)(d-A262)Tﬂ-1(d-A262)],

2'91'
{(92|91,T,w) - [(2«)F|T|]

T -l
exp[('1/2)(92 T 62)]'
The term, F, in the second equation on the right hand side of Equation
(3.1) is the total number of elements of 62. Hence the log-likelihood

function will be

L a (-1/2)(3-A292)Tw'1(3-A292) - (1/2)92TT'192

A

‘where d - y - A 9 and 9 - (AITAI)-1A1T(Y - A282). Let e - d-A 6

1 l 1 2 2'
From the above complete data log-likelihood, we see that 8T8 and 9262T
are the sufficient statistics for 02 and the variance components T. Also
T
A1 Y and AlTAZG2 are the sufficient statistics for 61.

Having obtained the set of complete data sufficient statistics of the

56
. “T“ T T
modified baye51an model; (e e, 6262 , and A1 A292), we now need to
translate them in terms of the crossed multilevel model. Concerning the

term, 92621., we know that 62 is a vector of random effects as

T T T T T T
92 - [a1 ,. ,aJ 'bl ,. . .,bK |c11,...,cJK ],

T .
and aj - [an’ . . . , arl-l,j]’
T .
bk ' [bOk’ ' ' ' ' br2-l,k]’

c T - c T — [c c ]

jk g Og’ ' ' ° ’ r3-l,g ’

for j -Ilq...,J rows; k - l,...,K columns; and (jk) - g - l,...,G cells

classified by the jet; row and kg) column; where r1 is the number of
within-cell slopes that are random across the row units; r2 is the number
of within-cell slopes that are random across the column units; r3 is the
number of random within-cell slopes regarding the cells classified by
row and column units. Under the assumptions that the rows, the columns
and their interaction effects are mutually independent and that each row
unit is independent from other row units, each column unit is
independent from other column units of complete data and, given the row
and column effects, the cell is independent from other cells, the
sufficient statistics we need for random residuals are;

:2 a.a T, a b b T, 2 c c T (3.5)

J J k k s 3

Now it become trivial to get the ML estimator of T as

-J'12aaT (3-5)

'4)

-1 T
- K 2 bkbk

'5)
O‘

57

A A A A

Concerning the terms of eT e, where e - d - A262 and d - Y-Alél,

A

we use the cell level notations of the vector e and use the identities
between the Bayesian and crossed multilevel model. Then the sufficient

statistic for 02 of complete data is

A TA A T
2 2 eJk ejk - 2 2 [djk - (lekaj + x2jkbk + ijkcjk)]

x [djk - (lekaj + x2jkbk + x3jkcjk)]'

(3.7)

Then the ML estimator of 02 is

A

02 - (l/N) 2 2 ejkT ejk (3.8).

Finally the sufficient statistics of the estimator of 61 are AITY and
AltA292' the corresponding cell-level expression for AlTA292 in

crossed-multilevel model is

T T T
A1 jk A2jk e23k ' “3k xjk (xljkaj + x2jkbk + x3jkcjk)'

A

Then the equation 81 - (AlTAl)-1A1T(Y - A262) can be rewritten in

computational form in terms of crossed multilevel model as

7 - (2 2 wjk xjk ’51.ij (2 2 W3,, X31, 39,. - 3 E wjk x31. (X13133
+ x2jkbk + X3chjk)) (3-9)‘

.Equation (3.9) shows that the complete data sufficient statistics for

restimating the fixed effects of the crossed multilevel model are;

58

T T T T T T
22 wjk xjlc ij, 22 wJk xjk X13133: 22 wjk xJk ijkbk,
22 w T T (3.10)

31: xjk x3jkcjk'
Having found the complete data sufficient statistics for ML

estimation of r Tc’ and 02 and 1, we need to obtain the expected

a’ rb’
values of the complete data sufficient statistics conditioned on the
observed incomplete data Y and the estimates of the parameters. The EM
algorithm then uses the values of the conditional expectation of the
complete data sufficient statistics as the proxy of the complete data

sufficient statistics for ML estimation of parameters. The following

sections will show the EM formulas for parameter estimation.

9*
EE_E2rmula_f2r_Estiaatina_ths_£iaad_£aramster ( 1 )
As noted at prior section, the complete data sufficient statistic for

T
1 is the term A1 A292. If we have
complete data we can directly apply the value of AITAZO2 for ML

estimation. Since the actual data is not complete, the EM algorithm uses

estimating the posterior value of 6

the following conditional expectation to get the substitute for the

complete data sufficient statistic for ML estimation as

T 2 T *
A1 A2E[62|T, o , el, d] - A1 A262 (3.11),

where the right side of 'I' includes incomplete data 'd' and the

current estimates of T, 02,and 81. Since A1, A2 are given, the

posterior expectation applies only to 6 . Therefore the result of (3.9)

2
at

a-
is A1TA262 , where 62 is the posterior mean of 6 as presented at

2

chapter two. Once we have the estimates of 02, T, and 1, we can obtain

S9

* 'k
62 from Equation (2.33) and then the estimate of the fixed parameter 1

is

T T T T *
ij xjk ij - z 2 ”3k x.Jk (xljkaj
* + x (3.12).

T T

* -1
1 - (2 2 ij X.jk xjkwjk) [2 2

*
+ x2jkbk 3jkcjk )1

0 st matin t e arameter Variances

If the data is complete then the sufficient statistic is 2a to

a T
JJ
get ML estimates of 18. With EM algorithm, the data are incomplete. Thus
jT Finding this

we substitute E[£ a aJTId, W, 1, 1] for 2 a a

J J

conditional expectation directly follow from the standard theory. The

dispersion of aj, Var(a ), is defined as,

J

) - E(a a T) - E(a )T

11 ”(8

Var(a (Searle, 1982).

J J J

This may be restated into a more useful form,

E(a a T) - E(a )T

11 ”3‘“

).

+ Var (a

J J J

This equation is for unconditional expectation. It is the analogue for
conditional expectation given the 'incomplete data' Y and current
estimates of (18, 'b’ fc’ 02, and 1), so the expectation is

2 * *T 2

T
E(Z ajaj Id,ra, 'b’ 'c’ a , 1) - 2 a1 aJ + a 2 vlljj (3.13).

vllj j are the matrices taken from the diagonal position of v11 at

equation (2 . 31) .

*
The maximization step for the estimation of 'a is accomplished by

6O

*
the simple operation. The complete data ML estimator of fa is,

* *
a T + 022 v

* -1
'a 'J ‘2 8.1 j 1111)

(3.14).

*
This completes one iteration for estimating fa . The posterior estimates

* *
of other two estimates, 'b and TC , are presented at equation (3.15) and

(3.16) below;

* -1 * *T 2
rb - K ( 2: bk bk + a z v22kk) (3.15)

* -1 * *T 2
Tc - G (22 cjk cjk + a 22 v33jkjk)° (3-16)
v22kk and v33jl<jk are the matrices taken from the diagonal positions

of v22 and v at Equation (2.31) respectively.

33

The procedure used to derive the equations (3.15) and (3.16) is the same
*

as the one for fa . The iterative routine for estimating the Var(GZJk) is

* * *
now explicit. r , r and r consist the Var(e . ) as known variances
a b c 23k

for next iteration of the EM computation, where

V 6 T*
ar( k) - - r

2J

 

 

The posterior estimation given the known variances was provided in
chapter two. We plug the estimated variances from the previous iteration
(Equation. 3.14, 3.15, 3.16) into the given 'variances to obtain the
sufficient statistics at next iteration (E-step). The new ML estimates

follow easily (M-step).

61
st t n W - Va iance \I!

The within cell errors, e31" are uncorrelated across and within the
cells. Therefore the variance matrix, ‘1', is a N by N diagonal matrix with
off— diagonals of zero. Because the errors are uncorrelated, the
variance estimates of each cell can be computed separately and the
estimation procedure is the same across the G cells. The G parallel
estimates are then arranged along the diagonal of \I! to complete the
matrix estimate.

In order to compute sufficient statistics, we first write down a
model for e . The model is

jk

ejk - djk - A2jk92jk (3.17),

where djk - ij - Aljkel with the dimension of nJk by 1. A2jk’ 923k are

the same as before except the notation of cell indicators of the

subscripts. Then the complete data sufficient statistic for 02 is

T a

A A ‘ T
2 ejk ejk ' 2 2 (djk ' A2jk 92jk) ( jk ‘ A2jk 92jk)'

where djk - ij - A191.

)3 (3.18)

The resulting quantity of Equation (3.18) is a scalar. The expected

value for a scalar quadratic is

T T
E (eJk ejk) - E(ejk) E(ejk) + tr[Var(eJk)] (3.19)
(Searle, 1982 ,p . 355) .
As we discussed earlier, the EM algorithm uses the conditional

expectation at E-step. Hence the posterior estimation of the sufficient

62
statistic has the same form of (3.18) except the notations of posterior
estimates as

2

T , 91) - E(ejk*)TE(ejk*) + tr[Var(ejkId, T, a , 61)]

jk

E(e 2

ejkld’ T, a
(3.20).
We now need each term of the right hand side of equation (3.20). By

using the analogues between the unconditional and conditional

expectation, and the identity (3.17), we have

2 *
E(ejkld, T, a , 91) - d.

3k ' A2jk92jk (3'21)’

*
where 921k is the posterior mean of erk'

hand side of equation (3.20) is rewritten as

The second term in the right

2
Var(djk - Azjkezjkld, T, a , 91).
2
This is the same as Var(-A2jk621k|d, T, a , 61) because dJk is given
data that means constant. Hence we have
Var(e Id T 02 e ) - - A Var(e Id T 02 e )A T (3 22)
jk ’ ’ ’ l 2jk 2jk ’ ’ ’ 21 2jk '
v11 v12 v13 1

2 2

where Var(ezjkld, T, a , 91) - a v21 v22 v23
- v31 v32 v33 ~'

 

 

whose component matrices had been defined from Equation (2.30) through
(2.31). The expression of (3.22) can be more computationaly convenient

form if we consider the property of trace to yield,

63

2 T T 2
tr(A kVar(62jkId, T, a , 91)A2jk ) - cr(A2jk A2kVar(82jkId, T, a , 91))

23
(3.23).

Equation (3.20) can then be shown to be

2 * T d *
T' a ' 91) ' (djk ' A2jk621k ) ( jk ’ A2jk92jk )

T 2
+ tr(A2jk AzjkVar(ezjkId, T, a , 91)).

T
E(ejk ejkldjk’

This equation can be translated back to the crossed multilevel form as

E(e T e Id , T, 02, 7) - [d. - (x a * + x b * + x c *)]T
jk jk jk 3k 131:3 2jk 1: 31k jk
[djk ' (xljkaj* + x2jkbk* + x3jkcjk*)]
x1Tx1 x1sz x1Tx3 v11 v12 v13
+ tr 02 szx1 szx2 x2Tx3 v21 v22 v23 (3.24).
x3TX1 x3sz x3Tx3 3k v31 V32 v33

Having obtained the conditional sufficient statistic, the ML estimator
of 02 is easily followed at the M-step as,

02* - (l/N) 2 E E(ejkT ejkldjk’ r, 02, 1).

*
The resulting estimates of 02 will be used as a known variance for the
next EM iteration. At this point, we have completed one iteration of
EM algorithm.

The EM algorithm described so far does not require an evaluation of

the likelihood function. But as the iterative routine of EM algorithm

 

64
produces successive values of estimates, it will be useful to monitor the
progress of the algorithm by evaluating the likelihood at each iteration.
If there is no change between the likelihood with the estimates at one
iteration and the likelihood at the next iteration, then we can say the
results converge and decide to stop the iteration. I will show the
likelihood function for the MLF estimation and then present the log-

likelihood function for the crossed multilevel model.

Qbservgd Data Likelihood for the MLE Estimation

The derivation of the observed data likelihood is logically simple if
we note the relationship among the several probability density functions
in equation (2.3) which can ‘be rewritten for the marginal density

function of 'Y',

f(Y) - f(YI9)f(9)/f(9|Y) (3.25)

where f(Y) is the marginal probability density function of Y;
f(GIY) is the posterior density function of 9 given Y;
f(YIG) is the likelihood of Y given a;

f(9) is the prior density function of 8.

The above equation is based on general Bayesian model stated at equation
(2.1).

In the modified Bayesian mixed model, we consider (91, T, W) to be
the parameters and d - Y - A161 as the data, hence all density functions
are conditioned on (91,T,w) and the equation (3.25) is now restated as

f(d|91, T, w) - f(d|92,T,w,el) f(ele,w,el) / f(ezld,T,w,e (3.26)

1)

65

where f(dlel, T, i) is the marginal probability density function of the
data 'd' given (913,3); f(dl62,T,i!,61) is the likelihood of the data
given (92331561); f(82IT,VI',91) is the prior density function of 62
given (T,i,61); f(ezld,T,61) is the posterior density function of 6

2
given the data 'd' and the parameters (T,i,91).

By noting the general form of the density functions from Equation
(2.4) and from Equation (2.6) and by substituting the relevant terms for
the modified Bayesian model in appropriate manner we can specify the each

of the right hand side functions of Equation (3.26). The two numerator

functions are;

N -1 2 T -1
f(d|92,T,w,el) - [(23) lwl] / exp[(-1/2)(d - A262) 3 (d - A292)]
(3.27)
f(ele.v.el> - I(2«)'1/21F|T|'l/zexp{(-1/2><92TT'192)1 (3.28)
and the denominator function is;
f(ezld, T, w .91)
-1 ZIP -1 2 *
- [(2 > /1F|De§| / epr(-1/2)(e2 - 62*)H0921(92 - 92 )1
(3.29)
* T -1 -1 -l T -1 T -l -1 -1
where 62- (A2 i A2 + T ) A2 i d, and D92*- (A2 T A2 + T )
These three equations from (3.27) to (3.29) hold for all 92, therefore

*
if we evaluate Equation (3.27) at 62-92 , we can obtain more simplified

equations since the only remaining term from the equation (3.29) will be

66

-1/2 after eliminating the constant terms. The resulting equation is

Ivegl

then

f(dl61.T.w> - [<2«>N|*ll'l/zlbegll/ZITI’l/zexpt<-1/2)s<ez*>1
(3.30)

h s e * d A e * w'1 d A e * e *TT'le *
" ere ( 2 ) ' ( ' 2 2 ) ( ' 2 2 ) + 2 2

T -l * * T -l -l T -l
- d i (d - A262 ) (by using 92 -(A2 W A2+T )A2 8 d)

Equation (3.30) is the likelihood we seek.

ng-iikeiihood for the Crossed Multilevel Model

Given the Equation (3.30), the log-likelihood function is

LLF(61,T,02|d) -(-1/2)Tog|w| + 1/210g|092*| - l/2logITI - 1/2S(92*)
(3.31)
In order to translate this log-likelihood in terms of the crossed-
multilevel model, let us consider the four terms; le, ITI, [Dag], and
3(62*). First we consider the equivalence, i-azI, in crossed

multilevel model as noted at chapter 2. Thus
2
loglil - Nloga . (3.32)

The ‘prior parameter ‘variance T is the 'block. diagonal. with the
submatrices of cell-level dispersion matrices, 'a’ 'b’ and 'c that have
dimensions of r1, r2, and r3 respectively. Hence the dimension of T is

(Jr + Kr + Gr Therefore the log of the determinant of T is the sum

1 2 3"

of all diagonal terms as

 

thr'

5

as

The}:

The .

67

logITI - Jloglral + Kloglrbl + Gloglfcl. (3.33)

*

For the determinant of posterior dispersion matrix D92 which is a
three-by-three partitioned full matrix, we represent the matrix into two
by two partitioned matrix and obtain the determinant of the whole matrix

as

 

 

[ v v v
11 12 13
* 2 I I (2“1“K1'2"Gr3)I d22 d23 I
lDez I ' ” v21 v22 v23 ' 0 d d I
v v v 32 33
31 32 33 (3.34),
Pv V
11 12
where d22 - v v ]
. 21 22 '
PV
13 T
d23 - v J d32 - d23 ,
23 '
d33 ' v33“

The second term of the right hand side of the Equation (3.34) is then

1

22 23 d

(3.35)

' Id22|ld33'd32d22 23I

 

 

32 33
1

d

-1 -
'|V11IIV22’V21V11 v12"d33’d32d22 23"

Equation (3.35) can be simplified more by using the previous results of
Equations of (2.26) and (2.31) in chapter 2. Using the Equation (2.31),
we can obtain

1I

-1 -
Iv22'V21V11 V12| ' IH22

68

where H22 is the block diagonal matrix with dimension of Krz. Again

using the Equation (2.26) we get

-1 -1
Id «1231 - IU I.

33'd32d22

where U is the block diagonal matrix with dimension of Gr3. Finally
at -
8(92 ) - dTi' 1(d - A292*) can be replaced with
* T T * * * 2
8(82 ) - 22[djk djk-djk (leka:l +x2jkbk ”(311‘ch )]/a (3.36).

9:
By substituting loglill, longe‘ﬁ’I, logITI, and 8(62 ) of Equation (3.31)
with Equation (3.32) through (3.36), we arrive the final form of log-
likelihood function for the crossed multilevel model.

2
LLF(02, T, elld) - (Jr1+Kr +Gr3-N)1oga -(Jlog|raI+KlogIrbl+Gloglrcl)

2
-1 -1 T
+ 1°8|V11l+21°8'322k I-i-XXloglUJk I-[ZEdJk djk
1‘ * at 'k 2
- mdjk (xljkaj+x2jkbk ”(311‘ch )]/a (3.37).

The EM algorithm evaluates the log-likelihood at each iteration to
monitor the progress of the algorithm. The deviance, say 6, between the
log-likelihood at ith iteration and its value at (i+l)1:_h iteration will
be computed at each iteration. A value of 6 close to zero indicates
that the log-likelihood at the ith iteration to be very small in
comparison with its value at (ii-1);}; iteration. The actual EM iteration

will stop at 65k, where k is predetermined level.

Es:

Es:

'.-L :

0“

SEE

CHAPTER IV

CHECKING THE ACCURACY OF THE COMPUTING ALGORITHM

The author deveIOped the computer program, "Qrossed Multi-Level
(CML) algorithm," that provides estimates derived from the crossed
multilevel model using Gauss (version 2.0) language. The program is
designed to use sufficient statistics of the cross-product matrix as
input data and to perform thousands of calculations over numerous
iterations of the EM algorithm. It is also designed to analyze data
under general crossed multilevel modelling. Therefore an accuracy check
based on hand calculation of all equations in numerous situations would
be unreasonably demanding and quite unreliable.

An alternative reliable way is to utilize already available computer
programs, such as SAS and BMDP, that support some special cases of the
crossed multilevel analyses and to use simulation methods for other cases
of the model. This approach for checking accuracy of the algorithm is
reasonable in two points. First, the computer program performs MLF
estimation of the EM algorithm which produces maximum likelihood
estimates as noted earlier. Some standard packages, such as SAS and BMDP
allow maximum likelihood estimation for variance components models in
which only an intercept of regression model at the cell level can be
specified as random and all other regression coefficients have to be
fixed. The author decided to compare the estimates between the programs
presented in this thesis and the standard packages under the assumption
these standard packages produce true maximum likelihood estimates when
the crossed multilevel model has a random intercept. For the analysis

69

70
of crossed multilevel model with multiple random slopes in the within-
cell model, a covariance components model, I used a simulation method to
see if the program recovers the known parameter values.

The procedure for checking the accuracy of the program is organized
in Table l. The first column of the table lists the crossed multilevel
models from the simplest case to the most complicated. The second column
of table 1 tells us how the computational results were confirmed to be

accurate .

71

Table 1. Procedure for checking the accuracy of the algorithm

 

Models Empirical Evidence

 

d andom ects nova
Within-cell model:

Yijk ' ﬁjk + 813k
Between-cell model:
ﬁjk - 10 + aj + bk + cjk

compared to SAS

V nce Co onents Mode I
Within-cell model:
Y + e

ijk ' 53k ijk
Between-cell model:
ﬂjk ‘ 70 + 11W1jk + 72w23k

+ aj + bk + cjk

C, Vagiance Components Model II
Within-cell model:

Yijk ' ﬁOjk + ﬂlxlijk + eijk
Between-cell model:
”031: " 70 + 71"131: + 72w2j

+aj+bk+cjk

va e o ode
Within-cell model:

Yijk " ﬁOjk + ﬁljkxijk + eijk
Between-cell model:
ﬁOjk " 700 + 701w13k + 1'02"’2jk

+ aOJ + bOk + chk

compared to BMDP

compared to BMDP

simulation study

+ a + b + c

”mt " 71o 13 1k ljk

 

72

Model A to model C represent the variance components models because
there are only variances to be estimated. Thus the resulting estimates of
variance components are all scalars. Model D is the covariance
components model in which the two random within-cell slopes are
correlated and the resulting parameter variance-covariance matrices are
all two-by-two full matrices. The SAS (see SAS User's Guide, 5th Ed.,
chapter 41) program was used to obtain the maximum likelihood estimates
of the posed models of A. BMDP (see BMDP Manual, Vol. 2., general mixed
model analysis, pp.1144-llS3) was used for estimating the model B where
two group level predictors are involved and the model C where a fixed
effect covariate is involved at individual level, an analysis which SAS
cannot perform.

The common features of variance-components models in BMDP and SAS lie
in the point that they are experimental in nature. The fixed effects sum
to zero and the random effects are assumed to be sampled from normal
populations with zero means. Although BMDP supports the model with fixed
effects covariates at the individual level, SAS requires that all
predictors in the model are the group level categorical variables and
provides only the estimates of variance components even if fixed effects
predictors are in the model. The algorithm for the crossed multilevel
model presented in this thesis does not require the predictors in the
model to be constrained as class or discrete variables. Because no
available computer program allows the estimation of a crossed multilevel
model with continuous variables in the model, the author selected the
model with discrete variables only for computational comparison. Later
at chapter 4 of this thesis, a crossed multilevel model with a continuous

covariate in the model will be estimated.

73

In the case of model D, I conducted simulation analyses for balanced
data, since no computer programs are available for estimation of crossed
multilevel model when it has multiple random slopes. The simulation
method is somewhat judgemental because one should decide the number of
replications of the simulation. The current computer program for the
crossed multilevel model involves complicated computation and takes a
long time for computation. So the author decided to use relatively large
size data, N-8400, but limited the replications to twenty times. For the
unbalanced data of model D, another simulation analysis would be better
for checking the accuracy of the program. However, I analyzed one
unbalanced data set that originated from an already used balanced data
and compared the results to the balanced case in the hope that if the two
results were close enough, then the algorithm is believed to analyze the

unbalanced data properly.

0 a e
The author produced the all data sets to estimate the posed models
through random generation using the Gauss programs. To explain the
models and data used. for accuracy check, following, definitions are
useful:
1. Design characteristics:
J - number of macro units for rows (e.g. schools);
K - number of macro units for columns (e.g. neighborhoods);

G

number of cells classified by the two sets of macro units;
n - number of observations of each cell;

N - total sample size.

74
2. Model characteristics
Yijk - outcome score of person i in the cell classified by jth row and

kth column units;

ﬂqjk - qth random regression slope of a cell classified by jgh row and
kth column units;
2 .
eijk - random effect of person i of cell jk. ejk - N(0, a ),

1 - fixed effect of a parameter;

aJ - random effect of jth row units, a - N(0, ra);

J
bk

cJk - random interaction effect of jth row and kth column units,

b - random effect of kth column units,

k ‘ N(ov fb);

cjk-N(O,rc).

a d f e ov
Model A is a crossed random effects ANOVA model. The design
characteristics are: J-lO, K-l4, 6-140, “F10, N—l400. The

distributions from which the values of random effects selected are: a -

J

N(0, l6), bk ~ N(0, 25), c - N(0, 36), and e - N(0, 100). The fixed

jk ijk

parameter value was assigned as 1 - 10. The fixed effect, 1, and the

random parameters at the higher level, a1, bk’ and cjk constituted the

data for each cell mean, ﬁjk' ﬂjk and the individual random effect, eijk

together produced the outcome values, Y Using Y two analyses

ijk' ijk’
were performed using ML estimation of SAS and the algorithm developed by
the author.

For the analysis of’ unbalanced. data, fifteen. cells of' data. were
arbitrarily selected out. Hence the design characteristics are changed

as: J-10, K-l4, g-125, n-10, N-1250. The model is the same as in the

balanced case. Table 2 shows the computational results of the CML

75
algorithm in comparison with the results of the ML estimation of SAS

program.

Table 2. Computational comparison between SAS program and the CML
algorithm for crossed random effect ANOVA

 

 

 

Covariance components SAS CML algorithm
Balanced Data
fa 37.287992 37.287925
fb 29.999248 29.999221
fc 31.867203 31.867350
02 94.260318 94.260317
1 12.671820 12.671820
Unbalanced Data
fa 40.036423 40.036586
'b 36.3315730 36.315495
fc 30.811034 30.811064
02 94.260322 94.260317
1 12.799353 12.804980

 

The two sets of results clearly show that two programs produced
identical estimates. One thing to note in Table 2 is that the variance
components analysis of the SAS program does not produce the estimates of
the fixed effects, 1, which is the grand mean of the sample in Model A,
so I obtained the observed grand mean to check the accuracy of the

estimates of fixed effects, 1, of the CML algorithm.

cla:
Hem
des:
acm
mac:
dam:
0 f<
Alt}
the

COT]

whe}
Near
Unii
unit

T1111

76
e o o t o e

Model B is the typical 'variance components model for' which. most
classical research on variance components have centered (see Searle and
Henderson, 1961; Rao,l972; Hartley and Rao, 1967; Harville 1977). The
design characteristics and model characteristics of the data used for the
accuracy check are the same as the case of Model A except for the two
macro variables in Model B. The two macro variables, lek and szk, are
dummy variables defined on rows and columns respectively. They are coded
0 for the first half units and l for the remaining units respectively.
Although BMDP provides the estimates of the fixed effects in the model,
the parameters are estimated under a general mixed ANOVA model. The
corresponding variance components model for model B that BMDP uses is

Y - p + a (4.1),

ijk 1 + am + aJ + bk + cjk + eijk

where Y is the outcome value of student 1 in jkth cell; p is the grand

ijk
mean; a1 is the fixed effect of W1, the macro variable defined on row
units; ﬁn is the fixed effect of W2, the macro variable defined on column
units; aj, bk’ cjk’ and eijk are the random effects as in the crossed

multilevel model. In Equation (4.1), the fixed effects parameters, ,1,
a1, and ﬁm are not the same as 10, 11, and 12 in Model B. An alternative
‘way to check the accuracy of the estimated fixed effects of CML algorithm
is to compute the predicted mean values of the groups classified by the
two group-level variables using the estimates of the fixed effects from
the two programs. The design matrix of the variance components model in
BMDP, whose columns are the orthogonal contrasts, and the dummy coding of

the fixed effects variables in the crossed multilevel model enable us to

77
compute the predicted group means from the two analyses. I will first
show the computational comparison of the variance components and then
move to the case of the fixed effects.
The computational results for the estimates of the variance
components from BMDP and crossed multilevel algorithm for balanced and

unbalanced data are appeared at Table 3.

Table 3. Computational comparison for the estimates of variance
components between BMDP program and the CML algorithm for Model B.

 

 

Covariance components BMDP CML algorithm
Balanced Data

'a 12.665 12.665328

1b 28.111 28.111434

7c 31.879 31.879463

02 94.260 94.260318

 

1a 12.266 12.239884
rb 34.666 34.696003
rc 31.070 31.080777
62 94.260 94.260289

The first panel of Table 3 shows that the results from each program
are identical. For the results from the analysis of unbalanced data, the
results are virtually identical but the differences among the estimates,
.03 for parameter variances and .0001 for within-cell variance, are a
little larger than those of balanced case.

The computational comparisons for the estimates of the fixed effects

78

Table 4. Computational comparison for the estimates of the fixed effects
between BMDP and CML algorithm for Model B (Balanced case)

 

 

 

 

BMDP CML
u a1 ﬁm 70 11 12
12.672 -4.909 -1.109 6.653551 9.818753 2.217782

Group Codes Predicted Group Means

W1 W2 BMDP CML

0 0 p+a1+ﬂm- 6.654 10- 6.654

1 0 p+a1-ﬂm-l6.472 10+1I-16.472

0 1 p-a1+ﬁmf 8.871 1o+12- 8.871

1 1 p-al-ﬁmf18.690 10+11+12-18.690

 

Table 5. Computational comparison for the estimates of the fixed effects
between BMDP and CML algorithm for Model B (Unbalanced case)

 

 

 

 

BMDP CML
u 01 ﬂm 10 11 12
12.717 -5.098 - .911 6.480460 10.322841 1.667013

Group Codes Predicted Group Means

W1 W2 BMDP CML

0 O p+al+ﬁmf6.709 10-6.48O

1 O u+al-ﬁmf16.904 10+11-16.803

0 l p-a1+ﬁmf8.530 1o+12-8.148

l l p-al-ﬁm-18.725 10+11+12-18.470

 

79
are appeared in Table 4 for the balanced case and in Table 5 for the
unbalanced case.

The first panels of Table 4 and TableS shows the estimated values of
the fixed parameters and the second panels of the tables show the
computational comparisons of the predicted group means obtained from the
estimated fixed effects from the two programs. The equations of the
second panel shows that the predicted mean values of the groups
classified by the two group level variables are obtained using the design
matrix in BMDP and the dummy coding of the variables in CML algorithm.
The results show that the predicted mean values of the groups classified
by the group level variables, W and W are virtually the same in both

1 2’

balanced and unbalanced case.

WW

Model C is a more complicated type of variance (components model
that has fixed effect variables at both levels of units. SAS cannot
analyze the data using this model because SAS allows only group level
predictors in model specification and the predictors should not be
continuous (see SAS User's Guide, chapter 41). BMDP can perform the
analysis of this model but it also constrains the group level predictors
not to be continuous. In order to compare the computational results,

the author coded the individual level covariates, as zero, one,

xlijk’
and two (a linear contrasts) and kept the coding systems for group level
predictors as before. The sample size of each cell was increased as n-20
because the model has a within-cell covariate.

The design characteristics for the balanced case are then: J-lO,

K-14, G-l40, n-20, N-2800. The chosen random effect distributions are

80
the same as before. For the analysis of unbalanced data, fourteen cells
were arbitrarily selected out and the resulting design characteristics
are: J-lO, K-l4, G-126, n-20, N-2520. The computational results for

variance components from BMDP and CML algorithm are presented at Table 6.

Table 6. Computational comparison for variance components between BMDP
and CML algorithm for Model C.

 

Covariance components BMDP CML algorithm

 

Balanced Data

 

r8 13.565 13.565323
7b 17.402 17.402192
to 25.609 25.609418
02 140.532 140.53248

Unbalanced Data
in 12.569 12.565704
rb 14.497 14.501222
’6 26.852 26.853643
02 140.532 140.53247

 

Table 6 shows that the two sets of results from each program are
equivalent. The results of covariance components of balanced case are
closer each other than those of the unbalanced case.

Concerning the estimates of the fixed effects, the variance
components model that BMDP uses is

Yijk - p + ﬂlxijk + a1 + 5m + a3 + bk + cjk + eijk (3.38),

where pl is the fixed effect of the characteristic of student 1 in jkgh

81
cell, and all other terms are the same as in model B. As in the case of
model B, BMDP uses the design matrix for estimating the fixed effects, y,
al, and pm, thus those fixed effects parameters are different from CML
algorithm that estimates the regression coefficients in terms of general
linear model.

Table 7 and Table 8 show the computational comparisons for fixed

effects between the BMDP and the CML algorithm.

Table 7. Computational comparison for the estimates of the fixed effects
between BMDP and CML algorithm for Model C (Balanced case)

 

 

 

 

BMDP CML
' “ “1 pm ‘31 70 71 72 ‘91
12.185 .113 .207 7.363 12.505 -.414 -.225 7.363

Group Codes Adjusted Group Means (after covariate)

W1 W2 BMDP CML
O 0 p+a1+ﬂmf12.505 10-12.505
l 0 p+al-ﬁm-l2.09l 1o+11-12.09l
0 l p-a1+ﬁmf12.279 1o+12-12.279

l 1 u-al-ﬂm-18.865 10+11+12-18.865

 

82

Table 8. Computational comparison for the estimates of the fixed effects
between BMDP and CML algorithm for Model C (Unbalanced case)

 

 

 

 

BMDP CML
74 9:1 13m 181 ‘70 11 72 BI
12.215 -.029 .286 7.363 12.583 -.686 .084 7.363

Group Codes Adjusted Group Means (after covariate)

W1 W2 BMDP CML

0 0 p+a1+ﬁmfl2.472 1o-l2.583

l 0 p+a1-ﬁm-11.900 1o+11-11.897

0 l p-a1+ﬁm-12.530 10+12-l2.668

l l p-al-ﬁm-ll.958 1O+11+12-1l.982

 

The first panels of Table 7 and Table 8 show the estimated values of
the fixed parameters and. the second. panels of' the tables show’ the
computational comparisons of the predicted group means after accounting
for the effect of the fixed effect of individual covariate. The
equations of the second panel show that the adjusted mean values of the
groups classified by the two group level variables are obtained using the
design matrix in BMDP and the dummy coding of the variables in CML
algorithm. The results show that the two sets of adjusted group means

are virtually the same in both balanced and unbalanced case.

v nce o 5 od
Model D has two random within-cell parameters and the variation of
the two parameters across the macro units was explained with different
sets of predictors. The two within-cell parameters are not necessarily

independent. Hence the resulting dispersion matrices, 'a’ 1b, and 'c’

83
are no longer scalars, rather they are all two by two full matrices.
Standard statistical packages cannot analyze data under this kind of
covariance components model. Since there is no available computer
program for the Model D, the author conducted a simulation study to check
the accuracy of the computational results for balanced data. For the
complete accuracy check of the program performance, the number of
replications of analyses should be large enough, for example ten
thousands times, which is unreasonably demanding for the present thesis
work. As a compromise, the author decided to conclude that the algorithm
works properly if the results show sensible evidence of accurate results.
The number of replications of analysis was limited to twenty times but
the sample size was taken large enough, N-8400, in order to compensate
for the small number of replications. The design characteristics for the
simulation data are: J-14, Kr20, G-280, n930, N-8400. The distribution

of the individual random effects are designated as e - N(0, 49). For

ijk
the group level random effects regarding ﬁOjk’ the distributions are: aOJ
- N(0, l6), b0k - N(0, 25), COjk - N(0, 36). The other set of

distributions of the group level random effects regarding pljk are: a

13 '

N(0, 9), b1 ~ N(0, 4), c ~ N(0,16). The fixed parameter values were

k ljk
assigned as: 700 - 110 - 10, 701 - 5, and 102 - 3. The group level

predictors, lek and szk, are all dummy variables coded zeros and ones.

The within-cell random effect covariate, has the values of zero,

xlijk’
one, and two.
The preassigned parameter values are now compared with the mean values

of the estimates obtained through twenty replications of analysis. Table

9 shows the results.

84

Table 9. Computational comparison between the preassigned parameter
values and the average values of the estimates from twenty simulations.

 

 

 

 

Covariance Parameter Estimates

components values Mean SD Min Max
62 49 49.09 .72 47.54 50.14
'0a 16 16.97 7.80 5.75 38.39
70b 25 23.82 9.31 8.12 36.57
70c 36 37.84 3.63 32.00 44.22
1 9 7.70 3.82 3.19 15.32

la

71b 4 4.52 2.21 1.70 8.52
71c 16 15.96 1.64 13.26 20.35

 

Fixed effects

100 10 9.96 3.52 3.10 15.23
701 5 3.47 3.09 -2.47 11.14
702 3 3.50 2.92 -4.05 7.50
110 10 9.99 1.21 7.33 12.32

 

Table 9 shows first that the mean values of the estimates are all
reasonably close to the preassigned parameter values. Second, all the

preassigned. parameter values fall within the limit of’ one standard

deviation from the mean estimates. Third, the 02, f0c’ and flc are

closest among the covariance components because these parameters are
estimated with the large sample sizes, they are N-8400, G-280. Other
estimates of the covariance components are not as close to the parameter

values as are 02 r or 'lc’ due to the insufficient sample sizes such

0c
as J-l4, K-20. Fourth, for the fixed effects, the mean value of 110

which is the mean. of' within-cell slopes, is closest to the

£11k!
parameter value and has high precision, SD-l.21. Generally speaking, the

85
slopes have less variation than does the intercept. The results of the
fixed effect estimation support this fact. These four characters of the
results establish that the CML algorithm produces sensible results for
Model D even if the replications of the analysis were limited to twenty
times.

For the analysis of unbalanced data, I first built the data by
selecting out, using a random digit table, the data of 30 cells from an
already used balanced data set. Hence the design characteristics are;
J-l4, K-20, G-250, n—30, N-7500, but the model characteristics and the
preassigned parameter values are the same as in the balanced case. Table
10 show the results of the analysis that compared to the balanced case

and the preassigned parameter values.

86

Table 10. Computational comparison beWeen the preassigned parameter
values and the results of the analyses of the balanced and unbalanced
data.

 

Parameters Parameter Estimates
values Balanced Data Unbalanced Data

 

Randqm effects
a

 

49 49.504685 49.152115
'0a 16 38.388547 23.765979
70c 25 17.432644 18.404436
71a 9 11.717677 5.432532
flb 4 6.440947 6.959900
'1c 16 18.057778 13.437822

Fixed effects

100 10 14.881167 10.308522
101 5 2.130050 3.770741
102 3 -l.010730 1.931768
110 10 12.318740 10.085683

 

Table 10 shows that two sets of estimated values have some differences
from the preassigned parameter values. These differences occurred
because the estimates were obtained from the single samples of the
population with the preassigned parameter values. Similarly the
differences between the two sets of estimates of the balanced and the
unbalanced data are attributable to the fact that the observation was
taken from one sample of data. If we replicate the attempt to make an
unbalanced data set from the same balanced data and compute the mean
values of the estimates from the analyses of the unbalanced data, then
the two sets of estimates would be closer. Nevertheless, we can see

some patterns of the results in Table 10 as we found in Table 9. The

87

estimates of 02 show the closest values to the parameter values and, both
balanced and unbalanced cases show little differences, because the
estimates were taken from the large sample size of 8400 and 7500
respectively. The estimates of '0a and rla show somewhat large
differences from the preassigned parameter values as well as between the
two data cases, because they were taken from the sample size of J-l4.

The overall information of Table 10 is that the distribution of the

estimated values are centered on their parameter values in either cases.

CHAPTER V

ILLUSTRATION

In this chapter, crossed multilevel analysis is illustrated. by
reanalyzing the data collected by Rudman and Raudenbush (1987). The
experience with the crossed multilevel analysis will enhance our
understanding on the logic of model specification and answer the
following questions:

1. What parameters can be estimated ?
2. How are the hypothesis tested ?

3. How are the results interpreted meaningfully ?

at d t

Rudman and Raudenbush's (1987) study of the effects of excess
testing time on test scores provides the data for illustration. The
purpose of the study was to assess the influence of providing excess
testing time on standardized reading comprehension test scores. The
test was not intended to be a speed test, but rather a power test. The
time limit of the standardized test has been determined by the "90 %
criterion." Using this criterion, testing time is the time elapsed
until 90 8 of the examinees complete the item analysis edition of the
instrument. The assumption underlying the procedure is that the 90 %
completing the test had arrived at correct answers to many of the items
and had used informed guessing on the remainder. It is also assumed that
the remaining 10 8 who had not completed the test would merely employ
random guessing of given more time. Hence more time would not translate

88

89
into mean test score gains. If the test results are sensitive to the
provided excess time it becomes important to discover the optimal
testing time by estimating the functional form of the relationship
between excess time and test scores for the susceptible test. Moreover,
such tests may be sensitive to variations in test adminstration
procedures and hence would have impact on decisions on student placement
in advanced classes, promotion to a higher grade, teacher awards, or

other recognitions because of their class' higher test scores.

Sample and Design

In the original study, 29 5th grade teachers from 16 of the 33
elementary schools in the Lansing school district in Michigan
volunteered to serve as participants in the study. However data could
be collected for only 23 of these classrooms. These 23 fifth grade
classrooms supplied usable data for 471 pupils. The data set contains
both demographic characteristics (including ethnicity, sex, eligibility
for free lunch, etc.), and test scores, pretest scores (including
reading subtest scores and total reading scores).

The design of the study involved first, creation of seven blocks
based on mean pretest scores each containing four classrooms from the
original 29 classes. Four treatment groups ‘were established, each
represent testing-time allotments defined by having 0, 5, 10, or 15
minutes excess time to complete the test. Within each of the seven
blocks, classrooms were assigned at random to one of the four
treatments. One remaining class was added to one of the classes of
which the class size was only five. Therefore there are 22 cells of

data in two-way classification. The data were well suited to a

90

polynomial trend analysis. The sample sizes in the design are presented

 

 

 

at Table 11.
Table 11. Design and Sample Sizes
Excess Time
None 5 Min 10 Min 15 Min Overall

Block 1 - - - 24 24
Block 2 19 22 27 18 86
Block 3 28 23 20 - 71
Block 4 24 - 24 24 72
Block 5 22 ‘ 22 23 21 88
Block 6 25 20 15 - 60
Block 7 21 13 l7 19 70
Overall 139 100 126 106 471

 

If the sample had included 28 classrooms, the design would have been
a balanced randomized block design (Kirk, 1982, chapter 6) with seven
blocks and four treatments. In the study, however, six cells were missing
and the sample sizes of the cells classified by the seven blocks and four
treatments are not the same. The original study (Rudman and Raudenbush,
1987) used "the least squares solution” (see Searle, 1971) by applying a
series of regression models with increasing complexity, and computed the
reduction in residual variation on each step. This approach solves the
inter-correlation of the main and interaction effects caused by the
unbalanced character of the data. However, the amount of variation
assigned to each effect will depend on the order of its inclusion. The

interpretation of hypothesis testing is also conditional to the order of

91
its inclusion (Searle, 1971). In addition, the two crossed-factors are
considered to have fixed effects in that study. However, the blocks
are typically considered as having random effects and the excess time
effects are viewed as random also because they are considered as a sample
from the population of the excess testing times. Again classrooms are
nested within the cells classified by the blocks and treatment groups.
Hence the design reflects crossed multilevel data.
na s's

The general strategy for crossed multilevel analysis can be
summarized in three steps: 1. examining variability among students in
the hierarchical structure (base model), 2. examining variability among
students within the cell, here classroom, classified. by 'blocks and
treatment groups (within-cell model specification), 3. identifying
variability as a function of group level variables (between—cell model

specification).

W

In the first stage of model specification we examine variability of
the data where students are nested within the cells, classrooms,
classified by the blocks and treatment groups. Addleman (1970) stated
two analytical principles related to the data structure. The first one

may be considered as a general principle in experimental design, that is

" The design and analysis of experiments should take into account all
of the major sources of variability that are expected to influence the

responses (p.1,095).'

92

Second, since students are nested within classrooms, both experimental
unit (classrooms) error and the observational unit (individual) error
should be included in the model (p. 1,097). With these classical
conceptions of model specification, we can see that the present design
reflects four sources of variability of students scores; individual
differences, block differences, treatment group differences, and the
interactions from different combinations between the blocks and the
treatment groups. The errors from individual differences are the
observational unit errors because the data were taken from the individual
scores. The errors from the classroom differences are the experimental
unit errors because the classrooms are the units that are randomly
assigned to the cells of two-way classification.

In the multilevel conceptions, the observational unit errors are
specified under the within-cell model and the experimental unit errors
are specified under the between-cell model where the errors are
decomposed into three parts. We first pose the within cell model as

Yijk ' ﬁjk I e1:31:

for i - l, ..., “31‘ student in the classroom that belongs to block j, j

- 1, ..., 7, and treatment group k, k - 1, ..., 4. Yijk is the test
score of student i in the classroom jk. ﬁjk is a mean test score of
students in classroom jk, and eizjk is an individual effect of student
1 nested within ”z" the classroom jk. It is assumed that ei:jk is
normally distributed within each treatment by block with mean 0 and

constant variance 02. The within-cell model is a traditional

93

regression model with no predictors in the model except class

pjk’
means.

Thus we pose between-cell model to examine variability of the

classroom means, ﬂjk’ The between-cell model is

10 + aj + bk + cjk'

ﬁjk

Each class mean score, becomes the outcome variable which is a

ﬁjk’

function of the grand mean, 10, plus the jt_h, block effect, a the kth

j!
treatment effect, bk’ and the effect of interaction between jg]; block

and kth treatment group. The distributional assumptions of the three

error terms are; a - N(0, fa); bk - N(0, 7b); and c - N(0, 7c). The

J
three error terms are mutually independent. Hence the variance of class
mean scores, parameter variance, is Var(ﬂjk) - fa + fb + 'c’ which is

the total variance attributable to the differences of the classrooms.

The full results for this model are reported in Table 12.

94

Table 12. Results of crossed multilevel analysis of base model

 

Fixed Effects Estimates SE t

 

10 41.535 .459 90.569

 

Covariance Components

 

Within-cell variance

02 99.059

Between-cell variance

Overall 1 r
a b c

5.221 0.901 1.667 2.647

-21n(maximum likelihood) 3515.8118

 

Table 12 shows the estimation. results of' both fixed. effect, 10
(intercept), and the covariance components. These estimates are maximum
likelihood estimates since the CML model uses MLF estimation method of
EM algorithm. For the fixed effect, 10 (intercept), we can test the
hypothesis, H0: 10 - 0, but it is not of interest at this point. The
intercept is the grand mean of all classroom mean scores.

The critical point in Table 12 is decomposition of the between-cell
variance into three parts; fa

, 'b’ 'c' The total observed variance is

Var(Y 2 + fa + r + 'c - 104.28. The between-cell variance is

ijk) " a b

5.221 and the within-cell variance is 99.059. We can compute various
intra-unit correlations using the decomposed covariance components.
First, we can estimate the proportion of variance that lies within- and
between- classrooms. About 95 8 of the observed variance is at
individual level. Only 5 8 of the observed variance is attributable to

the differences among classroom memberships. Classical-single level

models, i.e., regression or ANOVA, do not provide the parameter variance.

95

Nested-multilevel models such as ML2 or HLM provide the parameter
variance but the analysis of variance components would stop here. The
crossed multilevel model allows further decomposition of the variance.
We would like to know what proportion of the variance lies between blocks
or treatments or their interactions. Seventeen percents of the parameter
variance and one percent of the observed variance reflects variation
among blocks. About 32 8 of the parameter variance is attributable to
the treatment group differences. Again 51 8 of the parameter variance
and 2.5 8 of the observed variance reflect random interaction effects,
which means the variation among students with different classroom
membership after taking out the effect of blocks and treatment.

Kang and Raudenbush (1988) performed a comparison between a classical
analysis and HLM using the same data. In that study the observed variance
was decomposed as 02-98.73 and r-5.5397. The two estimates are virtually
the same as the estimates in Table 12. An interesting point is that the
parameter variance from HLM analysis is the sum of the three variance
components at macro level. With the HLM analysis we don't know the
proportion of the parameter variance that is solely attributable to the
differences among treatment group memberships of students. The tiny
differences among the estimates from HLM and CML analysis may be caused
by two things. One thing is that HLM uses MLR estimation which is
equivalent to REML (Patterson and Thompson, 1971; 1974; Harville, 1977)
but CML uses MLF estimation which produce ML estimates. The other thing
is rounding errors when the two programs compute the estimates.

In sum, the crossed multilevel analysis provides all necessary
information regarding variance decomposition of the base model and

guides further model specification. Having decomposed the covariance

96
components, we now move our interest to account for variability of the
student scores at each level. We first attempt to account for the

individual level variance.

W h - o e a

As shown in Table 12, about 95 8 of the observed variance lies at the
individual level within each classroom. To account for the variation we
need to identify the covariates from individual level variables. In the
original study (Rudman & Raudenbush, 1987), the best single covariate
proved to be the total reading pretest, r-.75. Only one covariate was
needed because other likely covariates, (e.g. ethnicity, sex, parent
education) were not significantly related to the outcome after adjusting
for the effects of the best covariate. The traditional single-level
analysis, as in the original study, treats the effects of individual-
level variables as fixed, which implies that the effects of the
individual characteristics on the outcome variable are constant across
the all classrooms, blocks, and the treatment conditions. However we
don't know whether the effects of the chosen covariate on students' test
scores are constant across the all classrooms. The effects of the
covariate may work differently across the classrooms. Suppose certain
teachers provide individual teaching for the students who need remedial
study after the pretest, but some other teachers do not. Then the
correlations between the posttest scores and pretest scores may vary
across the classes. The correlations of the two test scores of the
classes with remedial teaching would be smaller than those of the other
classes. Therefore we first examine the variation of the covariate

effects across the classes, and specify it as having a random effect. We

97

pose the within-cell model as

Yijk ' pOjk + pljk(Tread)ijk + ei:jk'

The variable code ”Tread" indicates total reading pretest scores. Note

the effect of the covariate, has subscripts in the model, which

ﬂljk’
allows the parameter to have different values across the classes. The
between-cell model does not need any predictors at this moment, but we

need to present a pair of equations because the two within-cell

parameters are specified as random. The between-cell models are

+ aoj + b0k + cOjk

+ alj + b1k + cljk

503k ' 700
pljk ' 710

where aOJ - N(O, roe). 11

Tla)’ b1k - N(0, 11b), cljk - N(0, 71c). The random effects

b - N(0,

'08), 0k " N(09

70b), c0jk Similarly a
- N(0,
within each between-cell equation are mutually independent. However the
two outcome variables, pOjk and £11k, are not necessarily independent of
each other. Thus the random effects within each macro unit, for example
an and alj’ are correlated across the two equations, and resulting
covariance components at macro level will be two by two full matrices.

The results of estimating this crossed multilevel model are shown at

Table 13.

98

Table 13. Results of crossed multilevel analysis with random slopes at
within-cell model

 

 

Fixed Effects Estimates SE t
700 12.093 1.19576 10.114
110 . 3848 .01516 25. 3826

 

Covariance Components

 

Within—cell variance

02 42 .474

Between-cell variances

Overall r r r
a b c

9 . 2207 - . 1025 2 .4059 - .0123 1 . 8698 - .0405 4. 9500 - .0497
. 0016 8 . 346E- 5 . 0010 . 0006

- 2ln(likelihood) 3122 . 8877

 

The major concern of this analysis was to see whether the covariate,
Tread, has a fixed or random effect. The results of Table 13 clearly
shows the slope of the covariate does not vary so much across the blocks,

118-8.346E-5; the treatment groups, 71 -.0010; and the interactions,

b
rlc-.0006. We can test the hypotheses, H0: Var(alj) - Var(blk) -
Var(cljk) - 0 with the likelihood ratio test by using the values of

-21n(likelihood). It is known that the statistic, -21n(L1/L2), has an
asymptotic x2 distribution, where L1 is the maximum likelihood value of a
less complex model and L2 is the value of a more complex model and the
degrees of freedom of x2 statistic is the difference of the number of
parameters to be estimated in each model. Hence the deviance between the
two values of -2ln(likelihood) will have x2 distribution with the
difference of the number of parameters between the two models as the

degrees of freedom. The number of parameters of the model in Table 13

is twelve, while only five parameters were estimated in the model of

99
Table 12. The deviance between the two values of -21n(likelihood) is
392.9241 which is large enough to reject the x2 statistic with seven
degrees of freedom. Therefore we reject the composite null hypotheses,
- 0, Var(a

)- Var(blk) - Var(ch) - 0. We, however, do not know

“0‘ 710 13
which particular hypotheses are rejected given the all posed hypotheses.
To identify the significant effects by using the loglikelihood ratio
test, we need to specify another model that has fewer parameters.

Table 13 shows that the within-cell variance has been reduced from
99.059 to 42.474 by virtue of the covariate. Concerning the fixed
effects, the average slope of the covariate across the all cells is
.3848. There are other results regarding the within-cell intercept,
pOjk’ but the intercept is meaningless because the covariate was coded
with its raw scores.

Results shown at Table 13 reveal some characteristics of the crossed
multilevel model. Traditional variance components models and single-
level models that are available through SAS or BMDP can't perform this
analysis, , where multiple random within-cell parameters are modeled with
appropriate error terms. Under classical variance components models, all
covariates must have fixed effects and estimation of random effects is
limited to variance components.

The difference between nested and crossed multilevel analysis is
again shown at Table 13. While crossed multilevel analysis partitions
the covariance components into three matrices, 'a’ 'b' and fc’ nested
multilevel analysis, however would provide the overall matrix only. In
Table 13, both overall and the three separate variance-covariance

matrices show that the sizes of the variance of the covariate effect are

small. Hence both analyses may agree to fix the effect of covariate in

100
this case. However when the overall parameter variance of the covariate
is substantial, a nested multilevel analysis would provide no information
about which macro units cause the substantial size of parameter variance
while crossed-multilevel model identify what particular random effect
among macro units are significant. Thus the crossed multilevel model
provides clear guide for further model specification.

Although the sizes of the variances of the covariate is small, we
still need to test the significance of the variance components by fixing
the effects of the covariate to get: a deviance of ~21n(likelihood)
between the model of Table 13 and the new model with fixed covariate
effects. Thus the second between-cell model is cancelled and the within-

cell model become

Yijk - ﬁOjk + £1 (Tread)ijk + eijk

and the between-cell model has only one equation as

+ a + b + c

ﬁOjk-7OO 03 0k Ojk’

The new results of the analysis are shown at Table 14.

101

Table 14. Results of crossed multilevel analysis with a fixed within

 

 

cell slope

Fixed Effects Estimates SE t
100 12.2879 1.2044 10.2029
81 .3826 .0153 25.055

 

Covariance Components

 

Within-cell variance

02 43.0875
Between-cell variances
Overall 1 r r
a b c
2.984 1.1102 1.2682 .6058

-21n(likelihood) 3125.2379

 

Table 9 shows a highly significant (t-25.055) covariate effect. The
within-cell variance at Table 12 was 99.059 but it was reduced to 43.0875
here. About 57 8 of the within-cell variance was accounted by the
covariate. The overall between-cell variance was also reduced from
5.2205 to 2.984. The last line of Table 14 shows that -2ln(1ikelihood)
is 3125.2379 and the deviance between the values in Table 13 and Table 14
is 2.3502. The x2 statistic with six degrees of freedom is 12.59 at 5 8

significance level. Hence we do not reject the null hypotheses, H '

0°
Var(alj) - Var(blk) - Var(cljk) - 0 and decide to fix the effects of the
covariate across the blocks, treatment groups and the block-by-treatment
interactions.

The parameter variance estimates, Var(ﬂojleread), become

conditional variances. They measure the amount of variability remaining
among the class means. Raudenbush and Bryk (1986) used a measure of

model performance, R2 as in regression, by comparing it to the

102
unconditional parameter variance estimates from the first stage.
Equivalent measures can be used for crossed multilevel model. Hence the

proportion of explained parameter variance by the model is

22* - Var(ﬂjk) - Var(ﬂjkl Tread) _ 5.2205-2.984 _ 428

Var(ﬁjk) 5.2205

 

 

About 43 8 of the parameter variance was explained by the model or the

covariate in this case. The R2* presented above is never less than the
one obtained through OLS estimation because the OLS estimates, ﬁjk’
include true parameter value plus sampling error as 81k - ﬁjk + elk but

the EM estimates, is true parameter value. In the previous study

Bjk!
(Kang and Raudenbush, 1988) the within-cell variance was 43.22, the
overall parameter variance was 2.915, and the total OLS between-cell

variance, Var(B was 10.987. Based on these statistics, the proportion

Jk).
of explained parameter variance by HLM is 42.4 8 and by regression
approach is 23.9 8. It clearly shows that both multilevel models are
more reliable than the classical regression model in terms of the
coefficient of determination (R2). It also shows that the parameter
variance from nested multilevel analysis is the sum of the three
covariance components at macro-level variances in crossed multilevel
model. The nested multilevel model cannot identify the separate error
sources in the model in crossed multilevel contexts.

Having completed the specification of the within-cell model, we now

need to identify the variability among classrooms as a function of

103

between-classroom variables .

- e de f a

The general principle for modelling strategy for the statistical
model specification is to build a parsimonious model. In the crossed
multilevel model, the x2 statistic which is the deviance of the-
21n(maximum likelihood) and which allows us to test a composite
hypothesis, serves as a criterion for model specification. .Available
group level predictors in the original study (Rudman and Raudenbush,
1987) were linear trend of the blocks, and the polynomial trends
variables (linear, quadratic, and cubic) of the treatment groups. The
linear trends of the blocks and treatment groups were significant in that
study after examining all possible sets of predictors. In the present
study, the author decided to use only the polynomial trends variables of
the treatment groups for illustration purpose. By using the predictors
taken from one dimension of two-way classification, we are able to see
changes of each covariance component at the group level.

We first pose the model with all polynomial trends variables and
then take out the nonsignificant predictors from the model. The within-
cell model is the same as before.

Y1

+ 81(Tread)

jk ' ﬂOjk ijk.+ eizjk

but the between-cell model includes the predictors as

ﬁOjk - 100 + 701(L1n)jk + 102(Quad)jk + 103(Cub)Jk + aOJ + b0k +

COjk’

104

where Lin - linear trend, Quad - quadratic trend, and Cub - cubic trend
of the treatment effects variables. All other notations are the same as
before.

The results of the analysis of the above crossed multilevel model are

shown at Table 15.

Table 15. Results of crossed multilevel analysis with both within- and
between- cell variables.

 

 

Fixed Effects Estimates SE t
100 12.3233 1.2079 10.20
101 .5728 .1342 4.27
102 -.0670 .3049 -.22
103 .0158 .1383 .11
61 .3824 .0153 25.02

 

Covariance Components

 

Within-cell Variance
2

0 42.9445
Between-cell Variances
Overall 7 r r
a b c
1.39728 1.08924 .01114 .2970

-21n(likelihood) 3116.4663

 

The results shows that the x2 value of the log-likelihood ratio test
is 8.77 with three degrees of freedom, but x2 value at 5 8 significance

level is 7.82. So the composite hypotheses, H - 0 is

0‘ 101 ' 102 ' 103
rejected. The first panel of Table 10 shows the results of t-test for

the fixed effects and the fixed effects of quadratic and of cubic trends

105

effects are not significant in predicting the variation of within-cell
intercept. So these two predictors ‘will be dropped in. next. model
specification. The residual parameter variances are reduced by virtue
of the predictors at group-level. But the residual parameter variance
of the random block effect, fa' shows the least change because the
employed predictors are mainly supposed to account for the random
treatment effects. One notable result in Table 15 is that the within-
cell variance was not changed by the effects of the group level
predictors“ This is because the group level variables predict the
variation of the responses only at group level.

Since we know only the linear effect of the treatment is significant
in predicting the variation of within-cell intercept, we pose the final

crossed multilevel model. The within- and the between—cell models are

Yijk ' ﬁOjk + 31(Tread)13k + ei:jk

+ a + b + c

ﬁOjk ' 100 + 101(L1“)jk Oj 0k Ojk'

The results are shown at Table 16.

106

Table 16. Results of crossed multilevel analysis of the final model

 

 

Fixed Effects Estimates SE t
100 12.30087 1.20408 10.21
101 .57793 .13336 4.33
61 .38256 .01526 25.07

 

Covariance Components

 

Within-Cell Variance
2

0 42.9431
Between-Cell Variances
Overall 1 r r
a b c
1.4134 1.1045 .0113 .2976

-21n(likelihood) 3116.5100

 

Table 16 shows that the estimated fixed effect of the linear trend
is .578 and the change of the -2ln(likelihood) is almost zero. The
classical interpretation of the estimated effects in regression analysis
can be used for the fixed effects in the crossed multilevel model, that
is the average change of outcome scores for one unit change of linear
trend variable is .578. The linear trend variable was coded as; group 1
- -3; group 2 - -1; group 3 - 1; group 4 - 3; in the study. Hence the
expected mean change of reading test scores for 5 minutes excess testing
time is the twice of the coefficient which becomes 1.156.

Concerning the covariance-components at macro level, we found the
size of the variances at both within- and between—cell levels was not
changed a lot after deleting the two non-significant trends effects.
Comparison of the results with the results of Table 14 tells us that the

within-cell variance was not affected by the effect of the linear trend

107
variable which is from treatment groups, but the magnitude of residual
parameter variance of random treatment effects were reduced from 1.268
to .0113. Hence about 99 8 of the random treatment effects were
accounted by the linear trend variable.
We can compute the coefficient of determination for parameter
variance here. The proportion of the explained parameter variance by

the model is

2* Var(ﬁjk) - Var(ﬁjkl Tread,Lin) 5.2205 - 1.4134
R — - - .729
Var(ﬁjk) 5.2205

 

 

About 73 8 of the parameter variance was explained by the model.

Based on this results regarding fixed effects, we could conclude as
in the original study: 1. Test designers should consider more precise
methods of setting testing time limits, 2. tests which are sensitive to
variations in the procedures of administration should not be used for
high-stakes decisions unless the procedures can be carefully monitored

(Rudman and Raudenbush, 1987, p.14) .

CHAPTER VI

CONCLUS ION

8mm

Educational systems typically ‘have hierarchical organizations in
which "units" at one level are "nested" within units at the next higher
level. These educational systems often. produce 'hierarchical data.
There have been many controversies over results found through traditional
statistical analyses. .As Cronbach (1976) and others emphasized, many
educational studies have used inappropriate analyses, including many
important evaluation studies. A number of methodologists have developed
multilevel models with estimation procedures appropriate for multilevel
data. Although these multilevel models have made substantial
methodological advances in analyzing multilevel data, they apply only to
those multilevel data structures in which each lower-level unit belongs
to only one unit at the next higher level. In many cases, however, the
structure of a system is not so simple. Students may belong to more than
one group simultaneously. For example, we could cross-classify students
both by the school they attend and by the neighborhood they live in. We
call this kind data as "crossed multilevel data."

Despite the recognition of a need for an appropriate multilevel
model analyzing crossed multilevel data, a major difficulty has been
constrained by existing technologies such as one-way nested multilevel
models, variance components models, and OLS regression models. This
thesis has now expanded the multilevel techniques to include the two-way
crossed multilevel model. The major products of this thesis are five

things :
108

109
l. A crossed multilevel statistical model has been presented in
general form.

2. The Empirical Bayes estimation procedure has been adapted to the
estimation of the crossed multilevel model.

3. A computing algorithm (the CML algorithm) for numerical analysis
of crossed multilevel data has been developed using the Gauss

language.

4. The accuracy of the computing algorithm has been tested in
reasonably comprehensive situations of crossed multilevel
modelling both for balanced and unbalanced data sets.

5. The application of the crossed multilevel model to real
data set was learned through illustration.

In chapter one, the issues of crossed multilevel analysis in
educational research were identified and the statistical model for
crossed multilevel analysis was presented. The problems of existing
statistical models for analyzing crossed multilevel data resulted because
those models can't specify the error terms from all sources in crossed
multilevel contexts. Traditional single level models, such as
regression models, fail to incorporate the error sources from multilevel
structure. Such a limitation of the models leads researchers to have an
enforced choice of either individual level or group level analysis.
Because the observations within a group are not independent, significance
tests based on individual level analysis are not acceptable, due to the
violation of independence assumption. Dependencies among observations
cause over-estimation of the precision which is a function of sample size
and intra-class correlations. The group level analysis does not violate
the independence assumption but cannot use individual variables in the
model. Inferences about individual behavior (e.g., individual student

achievement) based.<u1 group level analysis can cause aggregation bias.

The limitation of classical statistical analysis is further exacerbated

110

when researchers raise the inquiries about how the process in higher
ev u t (e.g., organizational treatment) influence the W
[Lowe]; leve], unit: (e. g., students achievement). The effect of treatment
at the organization level on the individual students may be different due
to the differences of students backgrounds. Traditional analysis must
assume that the organizational treatment effects on individual behavior
are homogeneous, which is hardly acceptable.

The currently available nested multilevel models virtually solve
these problems when they are applied for the analysis of one-way nested
multilevel data. Those nested multilevel models, however, are not
appropriate for the analysis of crossed multilevel data because the
models specify only one group-level error term, while crossed multilevel
data have three error sources, two sets of macro units plus their
interactions, at the group level. Application of nested multilevel model
for the analysis of crossed multilevel data would produce large variance,
T, at group level which will be the sum of three group level variances, T

- Ta + r + 'c’ obtainable from a crossed multilevel model. The exact

b
functional relationship between nested multilevel and crossed multilevel
models regarding the group level precision is not clear to the author at
this point. It will be a interesting study to investigate the
significance tests when a nested multilevel model is used for the
analysis of a crossed multilevel data. One thing we know is that the
nested multilevel model does not identify the all error sources of the
crossed multilevel data at group level.

Inquiries into the use of the crossed multilevel model have begun

with several concrete examples. In the case of school effects studies,

many schools can be examined for multiple time points in a single crossed

lll

multilevel analysis if the observed students are different across the
time periods. Furthermore, the crossed multilevel model allows the
estimation of random interaction effects which can't be achieved through
the existing nested multilevel models. Significant random interaction
effects constrain the generalizability of the research findings and lead
us to identify what particular fixed effects of group characteristics are
interactively at work.

In chapter two, the empirical Bayes estimation method using EM
algorithm was adapted for the crossed multilevel model. The essential
idea of the estimation theory is twofold: First, the crossed multilevel
model is viewed as a case of general mixed linear model, which enables us
to build the identities between the two models. The general mixed linear
model can be modified into either a Bayesian model or a linear model.

Then Bayesian estimation method and generalized least squares method were

*

utilized for the estimation of random, 62

and D9*, and fixed parameters,
61*, respectively.

Second, the MLF method of EM algorithm was used for finding ML
estimates of variance and covariances. In EM algorithm, the most complex
computation is to obtain the posterior dispersion matrix which requires

1

9: - - -
matrix inversion, D - (A'i A + T 1') 1. The advantage of MLF over MLR

9
method for computing the posterior dispersion matrix of crossed
multilevel model is that the dimension of the matrix to be inverted is
smaller in MLF (3 x 3 partitioned matrix) than in MLR (4 x 4 partitioned
matrix).

Based on the estimation method presented in chapter two, chapter

three provides actual computational EM formula which have been used for

the CML computer program. In chapter four, the accuracy of the

112
performance of the CML algorithm was examined through reasonably
comprehensive evaluations and all the resulting estimates were virtually
identical to in; estimates computed from either standard packages (SAS,
BMDP) where those programs apply or simulations. The study provided the
evidence confirming the accuracy of the algorithm.

In chapter five, the use of crossed multilevel model was illustrated
by applying the model to the analysis of actual experimental data. The
experience with the crossed multilevel model brought up insights about
the logic of model specification, hypothesis testing, parameters to be
estimated, and the interpretation of the findings. It also showed that
the use of the crossed-multilevel model in crossed-multilevel contexts
has advantages over single-level models in the estimation of covariance
components, inference on the hypothesis testing, and the coefficient of
determination (R2). The basic procedure of model specification consists
of three steps. The first step is to examine variability of the observed
outcome variable with respect to the crossed multilevel structure. We
call the model for this purpose the b§§g_ggggl since no predictors are
included in the model. Using the base model, we obtained variance—
covariance components estimates at both the individual and group levels.
This decomposition of the observed total variance enables us to obtain
three intra-unit correlations: intra-cell, intra-block, and intra-
treatment. The classical single-level models do not allow the
estimation of parameter variance and the OLS estimate of the group-level
variance is contaminated with sampling-errors. Nested-multilevel models
provide the parameter variance but cannot decompose the parameter
variance into the three components.

Having examined the decomposition of the variances and covariances,

113

we tried to explain the process within each classroom through within-
cell model specification. The power of crossed multilevel model in this
step is that the model clearly shows what effects of individual
characteristics are random across what particular macro factors.
Classical-single level models always assume that the effects of
individual characteristics are constant across all macro units, which is
hardly acceptable. Nested-multilevel models allow the effects of
individual characteristics to vary across the macro units but they cannot
identify what particular macro factors, among the three in crossed-
multilevel contexts, are the source of variation of the slopes at
individual level. In our example, the effect of total pretest score
(Tread) showed little variation across the all three macro factors;
treatment groups, blocks, and the classrooms classified by the blocks and
treatment groups. Thus we decided to fix the effect of the variable.

Finally, group level variables were examined in order to identify the
variability of the observed outcome score as a function of group level
variables. In this step, I examined the treatment variables to show the
changes of parameter variances between the blocks, 'a’ and the treatment
groups, rb. It clearly showed that the parameter variance pertinent to
blocks, fa’ did not change for blocks when we used the variables of
treatment characteristics in the model. Thus we could know that the
treatment variables mainly explain the variability of the outcome values
across the treatment groups.

As many multilevel methodologists anticipated the emergence of a
general crossed multilevel model, we now have the model. The crossed
multilevel model presented in this thesis, however, is not yet completed
for those audiences who want to use it. The limitations and the expected

necessary future work for this model are now considered.

114

Mic—rm

Co a o t

Complexity of computing estimates of crossed multilevel model come
from the three facts: First, the crossed multilevel model uses the
variables at both micro and two sets of macro units. Because the CML
algorithm is designed to use the model for general applicability, the
algorithm allows estimation for a wide range of models with various
combinations of predictors at both levels as in chapter three. These
complex model characteristics and general applicability of the model make
the algorithm nearly 1000 lines long in Gauss language. If we consider
that Gauss, unlike the fortran, is a matrix operation oriented program
language, we could imagine the complexity of computation.

Second, the crossed multilevel model can be applied for experimental
data as well as for large scale survey data such as national data sets.
For the analysis of large scale data, computation using the whole data
matrices is not always possible due to the limitation of virtual memory,
especially in micro computers. An alternative way is to compute the sum
of squares and cross-products of each cell separately and recapture the
necessary statistics based on computed cell statistics. This unit by
unit computation approach cause the program complex but useful for
computation of large scale data sets. Although computation is complex,
this computational strategy allows us to perform the crossed multilevel
analysis using micro computers of 386, AT, and even XT.

Although more work will enable us to improve the efficiency of
performance of the CML algorithm, the final version would still

definitely be complex. The current CML algorithm was written with Gauss

115
language (version 2.0) so it can be used only for micro computers with
appropriate math-coprocessors. The Gauss software (version 2.0) allows
matrix inversion up to a 90 X 90 matrix, which enables us to use a fairly
large data set. Increasing popularity of micro computer use for
computation would increase the use of the crossed multilevel analysis,

but we still need to have the algorithm for mainframe computer.

2 ens t v t to ssum tions

Empirical Bayes estimation requires the assumption that data and
the prior distribution of the random effects are normal. Although the
distribution of outcome measures can be examined, the prior distribution
of parameter effects is actually unknown. The sensitivity of results to

violations of this assumption are not known.

a r t n or

The current version of CML algorithm is not an efficient one at this
points. It reads the sufficient statistics multiple times at each
computational step because it involves the computational components for
each set of macro units along with their interactions. Simplification of
the program is possible with more experienced computer programming.
Additional work needed for the program is to include all necessary test
statistics. The current program allows us to estimate the fixed effects
parameters with significance tests and the variance-covariance components
at each level. ‘Various other test statistics, for example confidence
intervals need to be added in the program and these are expected to be

done very soon.

116
One strong advantage of empirical Bayes estimation is that it can
decompose the within-cell parameter variance into true and sampling

variances. For example, 8 where the outcome is the OLS

jk " ﬁjk + ejk’

estimate of a cell coefficient, is the true coefficient value and

51k
ejk is a sampling error. Then Var(B

1+7
(a

jk) - Var(ﬂjk) + Var(ejk)

b+fc) + v, where the left hand side of the equation is the observed
parameter variance and Var(ﬂjk) is the true parameter variance and

Var(e k) is the sampling variance. At this point, the CML algorithm

J

produces only the true parameter variances, r 'c and within-cell

a’ ’b’
variance, 02. The importance of this decomposition of true and sampling
variance of the observed within-cell parameters lies in the points that
we could compute the reliability, p - Var(ﬁjk)/Var(3jk), of the
estimation of within-cell parameters and could compare the performance of
EB estimation and traditional OLS estimations and obtain the technical
properties of EB estimates. EB estimators were known as ”shrunken
estimators”,i.e., estimators shrunken toward their unconditional grand
mean (unconditional) or their conditional grand mean (conditional)
(Rubin,198l; Strenio, 1981; Raudenbush, 1984). This property of EB
estimator can be proved by using the decomposition of the observed
parameter variance into true and sampling variance.

Finally the CML algorithm is not user friendly. More technical

programming work is necessary for the popular use of the algorithm.

.9 -:2: ., . -: ed 9.. _ -v- :1d - e- a- , -v- ,..- :
The crossed multilevel model in this thesis is the expansion of a
nested multilevel model that shares the same modelling strategies,

hypothesis testing , and the interpretation of the findings. These

 

117
similarities encourage us to combine the two models. To denote the
possibilities, let me consider the simple case in which the crossed
multilevel model can work like the nested multilevel model. In nested

multilevel model, the base model is

2
YJ 51 + ej e:1 - N(0, a I) (within-group model)

ﬁj - 10 + uj uj - N(0, r) (between-group model)

The within-group model shows a vector of outcome scores within j Eh group

is a group mean, 81, plus random errors, e Each group mean, 81, is the

j'
function of a grand mean, 10, and the random effect of the group, uj.

Consider now the base crossed multilevel model,

2

e -N(0,aI

Jk )

+ a + b + c a ~ N(0, fa), bk - N(0,

j - N(O, rc)

'b)' cjk
The evident difference between the two models is that crossed multilevel
model has two subscripts, macro row units j and macro column units k,
and the cells, jk, classified by the two sets of macro units. The
relationship between the two models regarding the parameter variances is
r - fa + 'b + 'c' Suppose the within-cell parameter, pjk’ does vary
only across the macro row units, so 'b - 7c - 0, then we would fix the
effects of column and the cell units on the within-cell parameters as
zeros. In this case the model become

- N(0, 021)

ij- pjk+ ejk ejk

ﬂjk - 10 + a:] a:I - N(0, fa).

118

So the within-cell parameter has only one random effect defined on rows.

In this new model, ﬂjk - ﬁj and e - e also fa - 1. Thus the base

Jk J’
nested model and the new crossed multilevel model become identical.

Since the CML algorithm is developed for general applicability, it
supports the model above. A distinction between the nested multilevel
and crossed multilevel model aries when we have longitudinal data sets
or time-series data. Consider a number of students are observed across
the multiple time points. The design may be viewed two-way crossed
(student j and time k) design with one observation in each cell. But
the observations across the times are not independent. While the nested
multilevel model views the design as an oneway nested design, multiple
observations are nested within students, and so allow the dependencies of
the observations, the crossed multilevel model treats the multiple
observations, time k, as macro units and considers them as independent.
So the crossed multilevel model can't analyze the data appropriately.

Similar problem arise when we have longitudinal data. Suppose many
schools are observed through multiple time points using the same samples
of students from each school. The design of this data may be viewed as
two way cross-sectional, the schools (row macro units), the time points
(column macro units). But the column units are not independent because
observations are taken from the same sample at different time points.
The crossed multilevel model assumes the two sets of macro units are all
independent. The 3-level nested multilevel model handles this
longitudinal data because it considers the multiple observations nested

within students which are also nested within schools at a time point

(see, Raudenbush, 1989). The difficulty of the integration of the two

 

119
models stems from the fact that the crossed multilevel model uses the
time points as macro units but the 3-1evel nested multilevel model
considers the time points as the micro units nested within each student.
If an algorithm allows the transition of ndcro- and macro-units in its
configuration of data, there may be a way to integrate the two models.
But we need to have a comprehensive comparison of the characteristics of

the two models to improve our understanding about the combination of the

two models .

BIBLIOGRAPHY

BIBLIOGRAPHY

Aitkin,M.,Anderson,l(.,& Hinde,J.(l981). Statistical modeling of data on

teaching styles.Joatnai of tha Royal Statiaticai Sotiety,Setias A, 144(4),
419-461.

Aitkin,M.,& Longford,N.(1986). Statistical modeling issues in school
effectiveness studies. Jo the 8 ca Soc et r e

144(4),419-461.

Burstein,L.(l980). The analysis of multi-level data in educational

research and evaluation. Review at Reaaatth in Eatcatioa,§,158-233.
Cooley,W.W. ,Bond,L. ,6: Mao,B. (1981) . Analyzing multi-level data. In
R.A.Berk(Ed.), ucat o v u t o o 4-8 . Baltimore,

MLzJohns Hopkins University Press.

Cronbach,L.J.(l976). Research on classrooms and schools:Formulation of
questions, design and analysis. Occasional paper of the Stanford
Evaluation Consortium, School of Education, Stanford University.

Cronbach,L.J., 6: Snow,R.E.(1977). e a truct a ct
New York: Irvington.

Cronbach,L.J.,& Webb,N.(l975).Between and within-class effects in a
reported aptitude-by-treatment interaction: Reanalysis of a study by G.L.

Anderson. 12urna1_2f_Educational_£§xshglczxi§. 717-724.

Cunningham,E.P., & Henderson,C.R.(l968). An iterative procedure for
estimating fixed effects and variance components in mixed model
situations. Biokettita,24,l3-25.

DeLeeuw,J., 6: Kreft,I.(l986). Random coefficient models for multilevel
analysis. Joutaai at Educatignai Statiatita, 11(1), 57-85.

Dempster,A.P., Laird,N.M., 6: Rubin,D.B. (1977). Maximum likelihood from
incomplete data via the EM algorithm (with discussion). W

821al_§tatistical_§2221tx (Series B).39. 1-18.

Dempster,A.P.,Rubin,D.B.,& Tsutakawa,R.D.(1981). Estimation in covariance

components models.l2arnal_2f_ths_Amarican_§tatistisal_Assssiatign.76.341-
53.

Efron,B., 6: Morris,C.(l975). Data Analysis using Stein's estimator and
its generalizations. l2srnal_2f_ths_amari2an_§tatiatical_Assssiatisn. 74.
311-319.

Gamoran, A. (1991). Schooling and Achievement: Additive versus Interactive
Models. In Raudenbush,W.S., 6: Willms,J.D.(Ed.) W

.1... 7 ‘ "Q: 0’; 0 “ O .0. .' OH 3 V. ‘V' .' .‘C .

120

121
San Diego, CA: Academic Press, Inc.

Gauss version 2.0 system and Graphics Manual (1984-1988),Aptech systems,
Inc..

Glass,G.V.,& Stanley,J.C.(1970).Statistital netnoas in aducatian ana
psythology, Englewood Cliffs, N.J.:Prentice-Hall

Goldstein,H.(1986). Multilevel mi ed linear model analysis using
iterative generalized least squares. Diamatrika,2§(l), 43-56.

Goldstein,H.(1987a). u v o a
Btstatth. London:Charles Griffin & Company Limited.

Goldstein,H.(1987b). Multilevel Covariance Component Mbdels. Dianattika.
74, 2, 430-431.

Graybill,F.A.(l96l). uc o e 5 ca de V
N.Y.:McGraw-Hill Book Company,Inc.

Hartley,H.O.,& Rao,J.N.K.(l967). Maximum-Likelihood Estimation for the
mixed analysis of variance model,Dianattika,54,93-108.

Harville,D.A.(l969). Quadratic unbiased estimation of variance compinents

for the one-way classification, Dionettita,§4, 93-108.

Harville, D.A. (1977). Maximum likelihood approaches to variance
component estimation and to related problems. 0 a1 er
Statistical Asaoaiation,22. 358, 320-340.

Henderson,C.R.(l953). Estimation of Variance and Covariance Components,

8.12m. 9. 226-252.

Hopkins,K.D. (1982). The unit of analysiszcroup means versus individual

observations, Anerican Educational Resaattn Jantnal, 19(1), 5-18.

Kang, S.J., and Raudenbush,S.W.(l988). a a Co etw
__ass c: . OV‘ and Hierarct- - 1 1‘: 1"",8 _, e ' a, s ~ o, e te-
Dnbalanted ﬁnnerimantal Data. Paper presented at the annual meeting of

the American Educational Research Association. New Orleans, April.

Kirk, R. E. (1982). e d e v
£2122££§_i2n§_ﬁéil Belmont, CA: Brooks/Cole.

Knapp,T.R.(l977). The unit of analysis problemin applications of simple
correlational researchiszsmaliﬁdusatignalitatistiss. 2(3) .171-186.

Laird,N.M.,& Ware,J.H.(1982). Random-effects models for longitudinal
data.Dianattit§,38,963-974.

LaMotte.L.R.(1970). Wants.

Technical report no.10, Department of Statistics, University of Kentucky,
Lexington, Kentucky.

122

LaMotte ,L.R. (1973) . Quadratic estimation of variance
components ,Dinnattita, 29 , 311- 330 .

Lindley,D.V.,& Smith,A.F.M. (1972). Bayes estimates for the linear model.
W. 1-41.

Longford N 141985).me
WWW Unpublished

manuscript, Institute for applied statistics. Lancaster University,
Lancaster, England.

Mason,W.M.,Wong,G.Y.,& Entwisle,B.(l984). Contextual analysis through the
multilevel linear model. In S.Leinhardt (Ed.),§ociological nethadology

W San Franc sco:Jossey-Bass.

Page,E.B.(l975). Recapturing the richness within the classroom. Paper
presented at the annual meeting of AERA, Chicago.

Patterson,H.D.,& Thompson,R.(l97l). Recovery of inter block information
when block sizes are unequal,DiaLet;ik_a,58,545-554.

Patterson,H.D.,& Thompson,R.(l974). e ood
tomnonents at varianca, Proceedings of the 8th International Biometric

Conference, 197-207.

Rao,C.R.(l972). Estimation of variance and covariance components in

linear models. WW. 112-
15.

Raudenbush, S.W. (1984). Application of a Hierarchical Linear Model in
Educational Research. Unpublished dissertation, Cambridge,MA: Harvard
University.

Raudenbush, S.W. (1988). Educational Applications of Hierarchical Linear
Models: A Review. WJMZLBS-IM.

Raudenbush,S.W. (1989). The analysis of longitudinal, multilevel data,
WWW. 13. 721-740.

Raudenbush,S.W., 6: Bryk,A.S.(1986). A hierarchical model for studying
school effects, W59. 1-17.

Raudenbush,S.W.,& Bryk,A.S.(1988). Methodological advances in analyzing
the effects of schools and classrooms on student learning. Raxiaw at
W.15.423-475.

Robinson,W.S.(l950). Ecological correlations and the behavior of
individuals. WJSIJSI

Rogosa,D.(1978). Politics, process, and pyramids. Wm],
3551551525. 3(1). 79-86.

Rudman.li..& Raudenbush,S.W. (1987). WWW

 

123

Conptenenaion ana ﬂatnenatita Application. paper presented at NCME.,
Washington,D.C., April.

SAS Institute Inc. §A§ Daat'a Guide; Statiatita, Vetaion 5 Edition. Cary,
NC: SAS Institute Inc., 1985.

Searle,S.R.(l982).Mattix algabta naeiul fat atatistica.New Yorszohn
Wiley & Sons,Inc.

Searle,S.R., 6: Henderson,C.R.(l961). Computing procedures for estimating
compineents of variance in the two-way classification mixed model.

Eionettita,l7. 607-16.

Smith,A.F.M. (1973). A. general Bayesian. linear' model” Jou e
ngal_ §tati§tital Society, (aetiea B),35, 61-75.

StreniO.J.L.F.(1981). WM
nodel. Unpublished dissertation, Cambridge, MA: Harvard University.

Strenio,J .L.F. , Weisberg,H. I . ,6: Bryk,A.S . (1983) . Empirical Bayes
estimation of individual growth curve parameters ande their relationship

to covariates. Eipnattitatég. 71-86.

Walsh,J.D.(l947). Concerning the effect of the interclass correlation on
certain significance tests. 0 e ti c . 18. 88-
96.

TATE UNIV. LIBRRRIES

Ill HINllﬂl H II

3007927 16

MICHIan 5
"WWW"
3129