1
a .
.

.1... i
. . .2 f

V.

2.. z
)55.‘
: 23".

 

 

 

 

39:73.. «5&1 fun
$535.. Event...

.uhnﬁaqnﬁaw

‘ In”. .
unﬁt W.

n”. (I 3.3-.vriv9:
3.. 3.3.35

2
. . .2.

:z .1. o in t

31.3353! :3...

1...: ....m.aquur a

. V A 5mm.

I . ‘4.-
Lﬁxii...a

3 {$3. 3.3“...
.135}. End», «Jihad

, 3.5.5.2}
51...... int:? 349..

$3.3.)
1 .
!_5G!.x...:\n..

. gill Y....
'15.)...1 ‘

..

513%....1.
(.55; 11.2.
118:...

lllllllllllllllllllllllllllllllllllllllllllllll

31293021

 

Michigan State

 

LIBRARY

 

University

 

This is to certify that the

dissertation entitled

INCORPORATING FACTOR ANALYSIS INTO HIERARCHICAL
MODELS

presented by

Yasuo Miyazaki

has been accepted towards fulﬁllment
of the requirements for

PH . D . degree in Coungeling, Educational

Psychology & Special Education

W/ Kama/«N

M/ajor professor

Date 7// é//§'5’

MSU is an Afﬁrmative Action/Equal Opportunity Institution 0- 12771

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE

DATE DUE

DATE DUE

 

"’ 6%?) 2%?

 

 

 

 

 

 

 

 

 

 

 

 

 

11m c'JCIRODatoDuopas-p.“

 

INCORPORATING FACTOR ANALYSIS INTO

HIERARCHICAL MODELS
By

Yasuo Miyazaki

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirement

for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational

Psychology and Special Education

2000

ABSTRACT

INCORPORATING FACTOR ANALYSIS INTO

HIERARCHICAL MODELS
By

Yasuo Miyazaki

This dissertation incorporates a factor analysis model into a two-level hierarchical
linear model (HLM). It provides the model layout in HLM format and derives the
maximum likelihood estimators. A computational program to implement the theory is
developed. Special attention is focused on the application of the model in order to create a
bridge between the statistical model and applications in education and human
development. That is, I describe in detail when and in what context the model might be
useful and the kinds of research questions that can be addressed using this model, so that
researchers who are interested in substantive issues rather than statistical issues can
immediately apply the model to their research.

Real as well as artiﬁcial data sets are used to illustrate the theory, the application,
and the interpretation of the results. In the last Chapter, extensions of this model and their

settings are mentioned.

© Copyright by
Yasuo Miyazaki
2000

DEDICATION

This dissertation is dedicated to my parents, Yoshio and Sakiko Miyazaki, for their
unconditional love and support.

To my sister, Michiyo Kurakata, who cared about her elder brother, even though she
was busy raising her four children.

To my late elder brother, Tomoo Miyazaki, who didn’t have much happiness in his
life, having passed away without reaching age 40, while I was in the United States. I
apologized to him in front of his tomb for my absense, for not having come back when he
died. That was three years ago already. This time I will bring this disseertation back home

and will show it to him. He might ask, “Did you have fun in the U.S.?”.

iv

ACKNOWLEDGMENTS

It took a longer time than I thought to complete this dissertation. It was a long and
lonely journey. At midnight, I often dreamed of the scenery of my home town village in my
childhood, such as the gentle shape of the round mountains and the sound of the rice
paddies when the breezes blew across the leaves of rice. The dream was oﬁen the same and
was so vivid even though my childhood was more than 30 years ago. During tough times, I
often spoke the golden words to myself, “There is no night that doesn’t open (No night is
endless),” and “Enduring snow, plums bloom sharper scents in the spring.”

Additionally, the words from my father always relaxed me and made me realize that
blue collar spirit was my blood: “There have been no persons in my family who can make a
living by their brains. However, there is a way for such a person to survive. Use your
body.” The words from my mother, “I will live and wait until you come back home,” lifted
my spirit for completion of the program.

During my journey, I fortunately had many accompanying me. I would like to
extend my sincere gratitude to my dissertation director, Professor Stephen W. Raudenbush,
my advisor, Professor Kenneth A. Frank, and dissertation committee members, Professor
Betsy J. Becker, Professor Mark Reckase, and Professor James Stapleton, for their advice,
insight and guidance throughout the process of writing this dissertation.

My special thanks go to Dr. Randall P. Fotiu for his help with computer

programming, which caused me a lot of trouble.

Haiyan Zhang spent lots of her time to help and to support me. Thank you very
much, Haiyan.

I cannot enumerate here all the people to whom I am indebted, who offered me
many kinds of help, support, and encouragement. I’m sure, however, that without their
help, I wouldn’t have ﬁnished the dissertation. I want to make this fact my constant

reminder for the rest of my life. “Do a little help. That makes you a slightly better man.”

vi

TABLE OF CONTENTS

LIST OF TABLES ......................................................................................................... viii
Chapter Page
1. INTRODUCTION ........................................................................................................ 1
1-1. Motivation of the Study ....................................................................................... 1
1-2. Overview of the Past Studies of Multilevel Latent Variable Modeling .............. 7
1-2-1. Papers on Latent Variables in HLM ........................................................... 8
1-2-2. Structural Equation Models
(Mean and Covariance Structure Models) ................................................ 13
1-2-3. Modeling the Error Structure .................................................................... 15
1-2-4. Multilevel Factor Analysis ....................................................................... 16
2. SPECIFIC SETTINGS IN WHICH THE PROPOSED MODEL ARE USEFUL ...... 18
2-1. Slopes-as-Outcomes Model in Organizational Studies ...................................... 18
2-2. Growth Models in Human Development Studies ............................................... 20
2-3. Psychometric Analysis of Multivariate Outcomes ............................................. 22
3. APPROACH AND MODEL LAYOUTS .................................................................... 25
3-1. The Model ........................................................................................................... 25
3-2. Similarities and Differences from the Multilevel Factor Analysis Model .......... 26
3-3. Identiﬁcation ...................................................................................................... 30
4. ESTIMATION AND INFERENCE ............................................................................. 50
5. COMPUTATIONAL EXAMPLES ............................................................................. 55
5-1. High School and Beyond .................................................................................... 55
5-2. Infant Vocabulary Growth .................................................................................. 60
5-3. Scholastic Aptitude Test (SAT) Meta-analysis of Coaching Effects ................ 67

-vii-

5-4. Results from the Simulated Data. ....................................................................... 81

6. CONCLUSIONS AND FUTURE DIRECTIONS ...................................................... 95
6-1. Conclusions ........................................................................................................ 95
6-1-1. Methodological Contributions .................................................................. 95

6-1-2. Substantive Contributions ......................................................................... 98
Expanding Modeling Possibility .......................................................... 98

Recovery of f ................................................................................ 98

Improvement over current ad hoc procedure .................................. 99

View of current ad hoc procedure from HLM2F framework ........ 101
Substantive Interpretation of latent variables ...................................... 102

Practical Interpretability of the Results ............................................... 103

6-1-3. Unsolved Methodological Problems ....................................................... 103
Conﬁrmation of Global Maximum ..................................................... 103

Identiﬁcation ....................................................................................... 106
6-2. Future Directions ............................................................................................... 107
6-2-1. Including Speciﬁcity Parameters ............................................................. 108

6-2-2. Multivariate HLM With a Factor Model .................................................. 110

(2-leve1 Multivariate Model)

General Formulation of the Model ...................................................... 110

Example 1. Growth Model .................................................................. 113

Example 2. Multivariate Growth Model ............................................. 116

6-2-3. Application to Item Response Theory Model .......................................... 120
ANOVA type formulation of 2-P IRT using HGLM .......................... 122

Regression type formulation of 2-P IRT using HGLM ....................... 124

Rasch(1-P IRT) model .................................................................. 124

2-P IRT model ............................................................................... 127

Multi-dimensional IRT model ....................................................... l3 1

6-2-4. Other Possibilities .................................................................................... 132
APPENDIX .................................................................................................................... 134
REFERENCES ............................................................................................................... 154

-viii-

5.1

5.2

5.3

5.4

5.5

5.6

5.7

5.8

5.9

5.10

5.11

5.12

LIST OF TABLES

Results of the Analysis for High School and Beyond Survey ............................. 59
Results of the Analysis for the Infant vocabulary Growth Data .......................... 65
Classiﬁcation of SAT Studies .............................................................................. 68
Results of the Analysis for the SAT Coaching Effects Data ................................ 74
Descriptive Statistics of 4 Generated Outcomes .................................................. 82
Sample Covariance Matrix of 4 Generated Outcomes ......................................... 83
Sample Correlation Matrix of 4 Generated Outcomes ......................................... 83
Shapiro-Wilk Test for Normality for 4 Generated Outcomes .............................. 84
Characteristics of the Speciﬁed Models ............................................................... 88
Results of the Analyses for the Simulated Data by Four Models ........................ 90
Several Results for the Tests of the Model Fit ..................................................... 91

Estimated Coverage Probability and Its Conﬁdence Interval for 1000

Simulations ........................................................................................................... 93

ix

Chapter 1. Introduction

l-l. Motivation of the Study

Suppose that researchers are interested in how inﬂuences of students’ gender,
race, SES, family structure, IQ, and pretest score on their achievements vary from school
to school. This is a typical research question that motivated the development of the
hierarchical linear model (HLM) (Bryk & Raudenbush, 1992), which is alternatively
called the multilevel model (Goldstein, 1995), the random coefﬁcient model (Longford,
1993); and, often in statistics and biostatistics literature, the mixed model. (See, for
example, SAS manual (1996) for Proc Mixed). The HLM can be implemented by several
software programs, such as HLM (Bryk, Raudenbush, & Congdon, 1996), MLWin
(Goldstein et al., 1998), SAS Proc Mixed (SAS Institute, 1996), and MIXOR (I-Iedeker,
D., & Gibbons, R. D., 1996).

After the software became available, a fair amount of educational research was
devoted to this issue (See, for example, Aitkin, Anderson, & Hinde (1980), Aitkin &
Longford (1986), De Leeuw & Kreﬁ (1986), Goldstein (1986), Raudenbush & Bryk
(1986), and Lee & Bryk (1989) among others). Often, however, we don’t have enough
data to support a complex model that has many parameters. Speciﬁcally, if we increase
the number of randomly varying coefficients, we need a larger sample size per cluster.

In an meta-analysis settings, suppose that we have multiple effect size estimates
from a collection of independent studies that test the same hypotheses, and we wish to
synthesize these effect sizes. In this setting, it is sometimes as important to make an

inference about variance components as about ﬁxed effects because the variances indicate

-1-

how much the effects sizes vary from study to study (Raudenbush, Becker, & Kalaian
(1988), Kalaian & Raudenbush (1996)). Though in theory we can model as many the
random effect sizes as we want, the number of the independent studies is usually
relatively small in meta analysis (about 50 - 100 at most), naturally limiting for the
number of random effects that can be estimated from the data at hand.

A more classical and standard application of multivariate random effects is a
psychometric analysis for multivariate outcomes. For example, if researchers want to
know the reliabilities of a test that is supposed to measure multiple constructs, we may
formulate a two-level hierarchical model using dummy variables at level-1 as predictors.
Then we ask how scores from multiple domains or items vary among students. From the
variance estimates, we can determine the test reliabilities for each domain.

In the growth model literature within the hierarchical modeling framework, we
often model the individual growth trajectory by a polynomial at level-1. At level-2 the
coefﬁcients of the level-1 model, that is, the person-speciﬁc growth parameters, become
multivariate outcomes. In such a growth study, it is oﬁen researchers’ substantive interest
to examine correlations among the growth parameters, for example, the correlation
between initial status and rate of growth. If the growth function is complex, there will be
a large number of random effects at level 2.

The mixture of multivariate outcomes and growth models also applies to the
context where we have many random effects in level-2. In this case, correlations within

individuals occur for two reasons; one is that multiple measures are available for the same

individual and the other is that a single measure is taken over time for the same
individual.

In a statistical sense, the common theme in the above examples is that the
researchers wish to study many random effects simultaneously. If we have a large enough
sample to estimate many parameters in the level-2 variance-covariance matrix (we will
call this the “tau matrix”), it is possible to estimate such a large model. However, even in
such a case, researchers who analyze data using a two-level hierarchical linear model still
sometimes encounter the problem of slow convergence or even worse, noconvergence,
because the tau matrix is really singular.

In statistical literature, the problem of a singular tau estimate is known as the
Heywood case (Heywood (1931)) or boundary solution problem (Catchpole & Morgan
(1994)). This problem occurs when the estimates reside near/at the boundary of parameter
space. The currently most common approach to remedy this problem is to get rid of some
of the random terms at level-2 and to re-estimate the smaller number of parameters in the
tau matrix. Sometimes this approach does not make sense since removing a random term
not only removes the large correlation that we want to get rid of, but also removes the
variances that we want to keep in the model when the variation of the corresponding term
exists.

Actually, we can ﬁnd many examples of the above cases where the tau-matrix has
high correlations among random terms. For example, Bryk & Raudenbush (P. 141-144,
Chapter 6, 1992) reanalyzed Huttenlocher et al. (1991)’s children’s vocabulary growth

data. The quadratic growth trajectory was formulated and they found the high correlations

among random coefﬁcients of the initial status, SIOpe, and rate of acceleration (Table 6.5,

1.000 — 0.982 - 0.895
P. 144, Bryk & Raudenbush, 1992), which was - 0.982 1.000 0.842 . Later the
-— 0.895 0.842 1.000

intercept was dropped completely from the model, but the correlation between the slope
and the rate of acceleration was still very high at 0.904.

As a biostatistics example, Longford (P.136, 1993) analyzed the weights of
newborn rats nested within litters and found that the correlation between the mean rat

weights adjusted by diet contrasts and litter size and the gender gap in weight among

0505 — 0.139 . .
, and the correlation rs

litters was almost -1.0 (r = [_ 0.139 0.038

A £0! -00139 , . .
u ., . = . ,. = as —l.0034,thou htlus estimate rs not
A °’ u") JIM, J(0505)(0.038) g

 

 

admissiblel).

In psychometrics, we are often interested in measuring the true score of the
student. Suppose there is a general aptitude test that consists of 4 domains such as Math,
Science, Reading, and Social Studies and each domain consists of 10 testlets whose
scores are given simply by adding up the correct responses for the items in the testlet. If
we conceive the above situation as testlets nested within students and we know which
domain each testlet is supposed to measure, then there are four true scores for each
student and those are likely to be highly correlated because high ability students are likely

produce high scores in any domains, in general. Then, if we have such data as repeated

 

' Longford used the Fisher scoring algorithm, which can produce the estimate outside of the parameter
space. The results are the case of this out-of-bounds estimate by Fisher scoring.

-4-

measurements for students (testlets are nested within subjects), we can expect the high
tau-correlation matrix for the random coefﬁcients that describes between-student
variation.

If we take down our analysis down to the item level in the above scenario, then
we have dichotomous outcome instead of continuous outcome. This is a scenario of using
an item response theory (IRT) model (Hambleton, Swarninathan, & Rogers (1991)). The
IRT model ﬁts a nonlinear function that decomposes the log-odds of a correct response
into an item-speciﬁc part and a person-speciﬁc part. The most standard IRT model is a
unidimensional model and is in an exploratory factor analysis mode, but there is a multi-
dimensional IRT model in terms of dimensionality, for example, see Reckase (1991).
Even the multi-dimensional IRT model is still estimated by exploratory mode analysis,
i.e., we are interested in how many dimensions we need to represent the item-person
interaction. It is possible to execute a conﬁrmatory mode analysis if we formulate the
multi-dimensional IRT model with a hierarchical-model format. We will illustrate this
point in Chapter 6. In this sense, the IRT model can be considered to be a non-linear
version of correlated outcomes within subjects, corresponding to the previous linear and
continuous version of correlated outcomes within subjects. In fact, Rasch model, the
simplest IRT model, can be formulated by a hierarchical model by conceptualizing that
items nested within examinee (Kamata (1998)). Raudenbush & Sampson (1999) applied
the Rasch model to the items measuring neighborhood environments in terms of physical
and social disorder. Their model can be seen as a multivariate Rasch model because they

used dummy variables to represent different constructs.

Most of the above cases involve micro aspects where we are interested in
individual differences and in individual variability. However, we can ﬁnd the examples in
other substantive disciplines. For example, from organizational sociology, data from
National Adult Literacy Survey study analyzed by Raudenbush and Kasim (1998)
involves the regression of literacy on ethnicity conceived as European-American,
African-American, Hispanic-American, Asian-American, and Other ethnicity. Thus, four
dummy variables are required to represent ethnicity. If these dummy variables’ regression
coefﬁcients are allowed to vary from state to state, it can be speculated that those random
coefficients might be highly correlated because of the similar social distribution of the
minority disadvantage. Another example can be found in meta-analysis literature. Becker
(1 988) found the rather high correlation 0.91 between the experimental and control
standardized mean changes across studies.

These above cases all suggest that those random coefﬁcients share some part in
common and tell us that once we know one of the random errors, we can discern the other
to some extent. Then, it might be reasonable to think about a latent variable that is shared
by those random coefficients and that produced the high correlation. In this scenario, a
factor analysis model is one of the candidates to represent this idea. In factor analysis, we
consider a small number of latent variables that explain the correlation among a larger
number of observed variables. In this sense, the factor analysis model is useful for
obtaining a parsimonious summary by reducing the number of parameters in a technical

sense, but also it might extract the essential relationship among the constructs that play a

central role in the social theory. Further, it might help explore and elaborate or test and
confu'rn the theory in mind.

Also, there are computational advantages of incorporating the factor analysis
model to HLM. That is, in the current HLM, when some of the correlations get close to
the boundary values such as 1 or - l , then its convergence gets very slow, or in the worst
scenario, it terminates without giving the estimates. If we use the factor analysis model,
we can expect quick and stable convergence because in factor analysis model, we are not

estimating the covariances that are close to their boundary.

1-2. Overview of Past Studies of Multilevel Latent Variable Modeling

In this section, I review the current state of the research in the ﬁeld of multilevel
latent variable modeling, which is an effort in combining multilevel modeling, structural
equation modeling, and item response theory (IRT) modeling. It is a goal of scientiﬁc
research to integrate different classes of models.

Since the original model was developed in each tradition to solve speciﬁc types of
problems, each modeling framework inherently has its own characteristic strengths and
weaknesses. If we take the approach of incorporating another framework based on one
framework, then the newly-developed model and the methodology are reminiscent of the
parent’s model characteristics.

Therefore, here we review those studies by emphasizing the underlying ideas to

develop the model and summarize the similarities and differences as well as the strengths

and weaknesses. We also describe a methodology that can be a potential building block

of a broader class of model, speciﬁcally, that of Jennrich & Schluchter’s (1986) work.

1-2-1. Papers on Latent variables in HLM

Incorporating latent variables within the hierarchical linear model (HLM) can be
found in Raudenbush, Rowan, and Kang (1991), where classical test theory was
introduced into their level-1 model to handle measurement error variation. Most social
science research involves instruments that are supposed to measure certain constructs, but
they produce measurement errors. If these measurement errors are ignored, then
maximum likelihood estimates of correlations will be attenuated (Lord & Novick, 1968,
page 69-74) as will be the regression coefficients. As a result, we will get biased
estimates. The instrument used by Raudenbush, Rowan, and Kang (1991) is the
Administrator Teacher Survey (ATS), which is a questionnaire that asked the high school
teachers about school climate consisting of 35 items that are supposed to measure the ﬁve
constructs. They used a three-level model in which the level-l units were items, the level-
2 units were teachers, and the level-3 units were schools. In their model, using ﬁve
dummy variables at level-1 speciﬁed which items were intended to measure which of ﬁve
constructs. Thus ﬁve latent variables which played a role of true scores in classical test
theory were speciﬁed in the level-1 model. At level-2, those latent variables in turn
became multivariate outcome variables deﬁned on teachers. At level-3, the school mean
of each entry varied over schools. The unrestricted model had no predictors in each level.

This enabled the authors to do psychometric analysis, i.e., to estimate the internal

consistency reliability of observed-scale scores. A major contribution of this approach
was that they clariﬁed the ambiguity that occurred from computing usual Cronbach’s a
(1951) by ignoring the school-clustering effects and separated it to two components, the
reliability of the teacher-level measures and the reliability of school-level measures. The
results showed that we could recover the attenuated correlation by incorporating the
measurement model. Another aspect of this level-1 model can be considered as a special
case of conﬁrmatory factor analysis with factor loading weights speciﬁed as unity.
Actually they found evidence that the number of factors was less than ﬁve by computing
the eigenvalues for each teacher or school-level covariance matrices. This paper
incorporated the measurement error in dependent variables in their level-1 model which
served as a measurement model, and then used the true scores as dependent variables at
level-2 that served as a structural model. In this sense, this was a prelude for complete
structural equation modeling (SEM) which incorporate measurement errors in both
dependent and independent variables.

In fact, Raudenbush and Sampson (1997) further extended this line of approach to
formulate a model that allowed measurement errors both in dependent and independent
variables. The model was formulated to examine the extent to which neighborhood social
control (Z) mediates the association between neighborhood social composition (X) and
violence (Y) in Chicago. The model was three-level model (measure, person, and
neighborhood were the units for each level), and Y and Z involved measurement errors
and X did not. Also Y and Z were measured at level-l and X was at level-3. Then at

level-1, for measure 1' for person j in neighborhood k, a measurement model was

formulated to represent the set of true scores and error scores using dummy variables as
in the previous Raudenbush, Rowan, and Kang (199l)’s model. That is:

Level-1:
R0,, =D,,.j,,(Y,.k +gjk)+ 02,1.“ij +vjk), (1-1-1)
5,, ~ N(0,aﬁ,, )
vjk ~ N(0,0'221.,,),
where DW is an indicator variable taking on a value of 1 if R”, is a measure of perceived
violence and 0 if not, and D2,], is an indicator variable taking on a value of 1 if Ry, is a

measure of social control and 0 if not (i = 1,2; j = 1,...,Jk; k = 1,...,K). Note that this
formulation allows missing data such as when a person provides a measure of perceived
violence but not of social control, and vice versa. The level-2 model was formulated for
person j in neighborhood k:

Level-2:
_ T
Y/k — Yr +ij7‘yk +rjjk

z}, = Z, + W114, + r”, (1-1-2)

(”1") 01"” Q”)
’ka ~N( O ’ (2,, Q: )’

where W]: = (A ge j,‘ , Gender}, , SES,, ) that was used for adjusting for background

differences. The level-3 model describes the variation across neighborhood of adjusted

neighborhood mean perceived violence and social control:

-10-

Level-3:
Y1: = Xkrﬂy +uyk 9

Zk = Xkrﬂ: 1" “:1: : (1'1'3)

uv. O T T.-
liww ‘- »
uzjk 0 Try Tzz

where X I was measures about neighborhood environments such as poverty
concentration, ethnic isolation, and percent of foreign born, that were considered
measured without error. For other regression components in level-2 model was ﬁxed for
parsimony:
7r... = ﬂyo, (1-1-4)
”:1: = ﬂzo '
Since the above model has the same structure of the model in Raudenbush,

Rowan, and Kang (1991), the parameters such as ,6, , ,6: , ,Byo , .520 for ﬁxed effects, and
T», , Tyz , and T: for the random effects could be estimated by the standard method.

However, their interest was to evaluate the mediating effect of social control (Z) on the
regression model of social composition (X) on perceived violence (Y) at the
neighborhood level. This was done by considering conditional distribution of YIZ, X
from Y,Z|X . Since (YkT, Z I )T in the model Equation 1-1-3 had multivariate normal
distribution, the conditional distribution of YIZ, X was obtained by standard normal
theory. To obtain the decomposition formula, the standard structural equation modeling

(SEM) (or originally, the path analysis) idea was used. That is, they ﬁrst wrote the

-11-

regression of Y on X and Z, and the regression of Z on X. Creating the reduced form (the
regression of Y on X) gave the formulae for the decomposition of total effect into direct
and indirect effects. The all necessary quantities to evaluate the decomposition were the
T

functions of ﬂy, ,6, , ﬂyo, ,6;o T y, , and Tu.

W ’

The decomposition of total effects into direct and indirect effect among latent
variables is often executed by the SEM and thus it can be considered to be an attempt to
integrate the HLM and the SEM to a broader class of model basing on HLM and then
incorporating SEM. The only difference is that the model by the HLM used classical test
theory model instead of factor analysis model that is often formulated by the SEM.

If the’dependent variable is not continuous, say, dichotomous, we need a different
model and the estimation method. Often, measurement instruments in social sciences
such as cognitive achievement test consists of dichotomous items. In Raudenbush and
Sampson (1999), the systematic social observation (hereafter, $80) was used to measure
the physical and social disorders of face blocks in Chicago neighborhood. The evaluation
of the SSO’s items were dichotomous. A three-level model involved items within face
blocks within neighborhood. The level-1 model was a Rasch model (Rasch, 1960) that
related item responses to item difﬁculties and face-block severities. At level-2, between
face-blocks within cluster, the face blocks were outcomes depending on neighborhood -
level intercepts. At level-3, neighborhood intercepts varied over neighborhoods. This
analysis produced psychometric properties such as internal consistency reliabilities at

each level.

-12-

Similarly, Cheong and Raudenbush (1999) applied Rasch Model to Child
behavior Check List 4-18 (hereafter CBCL 4/18) (Achenbach, 1991) to calibrate the
extent of severity of the externalizing behavioral problems of the children for each item.
They used dummy variables to represent two different constructs (aggression and
delinquency) indicator in level-1.

In psychometrics, the application of HLM to Item Response Theory (IRT) model
can be found in Kamata (1998), where he used the Rasch Model. Note that in
psychometric literature, the item severity to measure the extent to which the
neighborhood was disordered is replaced by item difﬁculty and the neighborhood
propensity to social disorder is referred to as person ability. Bock (1988), and Adams,
Wilson & Wu (1997) took a different direction, where they started from a 2 or 3-

pararneter IRT model.

1-2-2. Structural Equation Models (Mean and Covariance Structure Models)

As we mentioned in the above when we refer to Raudenbush and Sampson
(1997), the framework of separating a statistical model into a measurement model and
structural model has a long tradition in mean and covariance structure models or
structural equation models (SEM), which are implemented by package softwares such as
LISREL (Joreskog & Sorbon (1995), EQS (Bentler & Wu (1995)), AMOS (Arbuckle
(1995)), and SAS CALIS procedure (SAS Institute Inc. (1990)), and M-Plus (Muthén
(1998)) among others. In mean and covariance structure models, the measurement model

is usually expressed as a conﬁrmatory factor analysis model (CF A), which utilizes factors

-13-

as latent variables. The primary goal of factor analysis is to explain the covariances or
correlations between many observed variables by relatively few underlying unobserved
factors. In this sense, it is a data reduction technique. CF A, as contrasted to traditional
exploratory factor analysis (EF A), implies that a model is constructed in advance, the
number of factors is set by the researcher, whether a factor inﬂuences an observed
variable is speciﬁed, some direct effect of factor on observed variables are ﬁxed to zero
or some other constant, covariances of factors can be estimated or set to any value, etc.
Thus, in CFA, there are more opportunities that the researcher’s idea are reﬂected on the
model.

As suggested in the previous section, hierarchical linear model (HLM) and
structural equation modeling (SEM) has much in common as methodology. In growth
modeling context, Willet & Sayers (1994) showed that growth trajectory approach taken
by HLM could be done by SEM if the data are balanced. Several methodologists claim
that, compared to HLM, SEM approach has an advantage in terms of ﬂexibility of
modeling of the complex structure of variance-covariance matrix that naturally arises
when we formulate a complicated causal model (Muthén & Curran (1997), Willet &
Sayer (1996), Chou & Bentler (1998)). However, it is limited in dealing with nested
structure of data and unbalanced designs because the model is estimated ﬁ'om sufﬁcient
statistics such as either the correlations or the means and the covariances. An attempt to
incorporate clustered data structure and dealing with unbalanced and missing data which

is a feature advantage of HLM can be found in, for example, Muthén (1989).

-14-

Recently Muthén (1998) developed a methodology which integrates categorical
and continuous latent variable models. Latent categorical variable plays a role of a
classiﬁcation variable in multiple-group SEM, but the group membership is unobserved
and is determined from the data such that a person is classiﬁed into the class that has the
highest probability. In growth modeling context, this methodology not only adds more
ﬂexibility of the modeling that can reﬂect a class of substantive developmental theories
which concerns with discrete transitions or qualitative changes rather than quantitative
changes, but also provides a practically very useful way of predictive diagnostics in
preventive or intervention research.

Comprehensive and systematic treatment of latent variable models including
factor analysis, latent trait analysis (IRT), latent proﬁle analysis, and latent class analysis

can be found in Bartholomew & Knott (1999).

1-2-3. Modeling the Error Structure

An attempt to incorporate various patterns of the error covariance structure into
unbalanced repeated measures was made by Jennrich and Schluchter (1986) in a
longitudinal study context, where we can consider the case when the study was executed
perfectly in a designed way, i.e., no missing observations. If there are no missing
observation, we can utilize the standard MANOVA, or MAN COVA model. The key idea
of their model was to introduce a covariance structure model for the complete data and
then to link the observed outcome and the complete data by missing observation

indicators. They formulated the standard MANCOVA growth model for that complete

-15-

data and considered various error structures for the complete data error terms that were
used in the standard models such as time series model. Those include compound
symmetry, heteroskedastic errors, auto-regressive errors, and so on. Exploratory mode
factor analysis model which put a constraint on covariance matrix (D , (I) = I (identity
matrix) was one of the structures they listed. The key assumption for the missing data
matrix was missing at random (MAR).

This line of approach can be seen in Thum (1997). He explicitly stated that the
key assumption for the missing data matrix was missing at random (MAR). Thurn
specially focused on individual differences and individual variation. Starting from
standard MANCOVA model for repeated measurement on human subjects, he formulated
the two-level hierarchical linear model. In addition to showing various patterns of
covariance structure of both level-1 and level-2, he addressed the sensitivity of inferences
for small sample size data that is often the case in psychological studies by using a
multivariate t-distribution instead of a multivariate normal distribution for modeling the

variation of random coefﬁcients.

1-2-4. Multilevel Factor Analysis

Longford and Muthén (1992) analyzed data having eight domains in mathematics
for the 8th grade from the Second International Mathematics Study (SIMS) carried out in
1982, and they considered a model with factor structures at both the student and school
level. They used Fisher scoring to obtain maximum likelihood estimates for this model.

Their theory was developed for the exploratory factor analysis phase, not for the

-16-

conﬁrmatory phase. The main objective was to see whether the factor loading matrices
had the same patterns at the student and school level.

Muthén (1991) developed the multilevel factor analysis model from the SEM
perspective and showed that if sample size per the level-2 unit was large, the
conventional SEM approximate estimator worked well, providing similar results to those
of the Longford and Muthén (1992) for the same SIMS data set.

Muthén, Khoo, and Gustafsson (1998) extended this approach to multiple groups.
They used the eighth graders’ 16 achievement scores from National Educational
Longitudinal Study (NELS) administered in 1988, conceptualizing that urban Catholic
and urban public schools are two distinct populations. Thus this methodology is a
generalization of conventional latent variable multiple-group analysis to two-level
clustered data.

Though multilevel factor analysis is already developed, there are certain cases
where it appears useful to incorporate the factor analysis model directly into standard
hierarchical linear and non-linear models. That is, seeing the level-1 model of HLM as a
model that generates latent variables that represent the true scores , in a broader sense, we
apply the factor model for those random parameters such that the true score can be
decomposed into a common score and a speciﬁc score in level-2 model. Then, we can
clearly assess how much variation is explained by communality and how much is by the
speciﬁcity. This decomposition can not be done by standard factor analysis because
speciﬁcity is confounded with error variance in the model. Thus, in Chapter 2, we will

give a rationale for incorporating a factor structure in level-2 in HLM in more detail.

-17-

Chapter 2. Speciﬁc Settings in which the Proposed Models are Useful

In this chapter, contexts in which incorporating factor analysis structures into
HLM would be useful will be discussed in more detail. Next the model can be laid out in
terms of a hierarchical linear model. Technical issues such as method of estimation and

derivation of computational formulae will be discussed in Chapter 3.

2-1. Slopes-as-Outcomes Model in Organizational Studies

Suppose that in a school effectiveness study such as High School and Beyond
(Coleman, Hoffer, & Kilgore (1982)), we want to know that how much the relationship
between math achievement and student characteristics such as race, gender, SES vary
among schools. Suppose, in fact, that there are six covariates at level 1, then naturally, we
might formulate

L-l:

Y1,- = :30,- +ﬂjxlij +ﬂ21x2ij +ij3ij + ,6,ij +ﬂSJ'x5i'j +36%“,- + 5i} , (2'1'1)

it'd
where 8,}. ~ N(0,0'2) and xly,x20,x3ij,x4y,x50, and x6”. is the certain covarrates of

student characteristics for student i in school j.
At level 2, suppose that those coefﬁcinets all randomly vary among schools after

being accounted for by a certain school characteristic W! .

-13-

A); =700 +701Wj +qu
H} =7ro +711Wj +qu
:82} =720 +721Wj +u2;
:63} =730 +731Wj +u3j
ﬂu =74o +7.41% +u4j
:65} =750 +751W1 +qu
ﬂea,- =76o +7ij 1'qu

(212)

where u J. = (uoj, u”, uzj, u3j, us], u61)T is considered to be distributed as mean of 0 and the

covariance of r , a 7 x 7 symmetric matrix. This is fairly large model since the number of

unique parameters in tau-matrix is (7 x 8)/2=28 and the data may not provide enough

information to detect all the variances and covariances, or the current algorithm may not

converge in reasonable time. However, if indeed u”. and u2j go together and

an , u“. , us]. and u6j. go together, then we formulate the factor model for

T
u)- " (uojgurjiuzpuli’urtj’qu’uéi) as

 

 

 

fuel) ( 1
u”. 0
uzj 0
u” = 0
u“ 0
us}. 0

\ucj) KO

ooookr—o
.N3°&*~—oo

0\

’70} 7701' 0 W00 Won
771} a 771} NN( 0 3 W10 W11
772; 7721' 0 W20 W21

 

In matrix notation, the above models can be written as

L-l:

y} = XPBJ. +8}, 8,. ~ N(0,0'21,,/)

-19-

1” 02
W12 )- (2'1-3)
W22

(2-1-4)

,6]. = ij + A17]. (2-1-5)
Note that
r = D(uj) = A‘I’AT. (2-1-6)
By using a factor model, we reduce the number of variance-covariance parameters

estimated from 28 to 10.

2-2. Growth Models in Human Development Studies

Suppose that in a child development study, an outcome is modeled by a linear
function of age and want to know how much variability exists among children for
intercept and the rate of growth. Then, the model is
L-l:

Y". = no, + ”Na”. + a". , 8". ~ N(0,0'2) (2-2-1)
where a,,. is the age of child i at time t for t = 1,..., T, , and i = 1,..., n. Or, in more
compactly,

Y". = Air, + 6‘". , (2-2-2)

where A". = (1, a,,)T and 7:, = (7:0,, 7:1,)T.

In matrix notation for child 1',

Y, = Ag; + 8,. , 8,. ~ N(0,O’21T’) (2-2-3)

-20-

where Y, =(K,,...,YTI,)T, A, = E ,and g, =(g,,,...,g,,,)r. Inis the T, x T,identity

matrix.
The level-2 model describes the variability of those person-speciﬁc parameters
among children.

L-2:

7r , = + u u , 0 r r
UNION: :1»
Or, in matrix notation,
7:, = y + u,, u, ~ N(0, 1). (2-2-5)
Now suppose we suspect that u,, and u,, are correlated with the correlation of 1 or

-1. Then using the proposed model, we formulate

u” - '7‘” 27 ~ N(0 w ) (2-2-6)
u.,=/1no,-’ °’ ’ °° '

In matrix notation,

/_\

K. \-

\___/
ll

1
[,)<n._,). 770,- ~ N(0,woo).

or
uj = A77), 170] ~ N(O’LP) (2'2'7)

This implies that the variance and covariance matrix of u j. is

r = A‘I’AT. (2-2-8)

-21-

Thus, using a factor model, we reduce the number of parameters 7 from 3 (too, rm, 1'”)

tom/0M).

2-3. Psychometric Analysis of Multivariate Outcomes

Suppose that we have a math test for 3rd grade comprised of 9 testlets (a testlet is
a group of questions that are closely related in terms of the topic within a test) and the
score from each testlet is the number of correct responses aggregated from the items that
belong to the testlet. In this setting, each score from a testlet can be considered to be
continuous and the score is standardized so that each testlet score has approximately the
same variance, or the same standard deviation in the population, say, a unit standard
deviation. This homogeneity of variance assumption is not restrictive for application,
because test scores are inherently interval scale so that we can always standardize as long
as we only consider one population problem that doesn’t involve group comparisons.
Thus in practice we standardize the scores using sample standard deviations for each
testlet among students. This standardization procedure should be acceptable if the number
of students is fairly large so that sample standard deviation is very close to the population
standard deviation. Further, suppose that the test was constructed so that the ﬁrst three
testlets measure arithmetic proﬁciency, the second four testlets measure algebraic
proﬁciency, and the third two testlets measure geometric proﬁciency. Then, the following
model might be formulated. For testlet score 1’, for testlet i and student j,
L-l:

Y, = AjD“, +,8sz2,, +,6,,D3,, + 8,, (2-3-1)

-22-

where Dl ,, , DM and D3,, are the indicators of which construct the item i indicates, and

(id
8,, ~ N (0, 0'2 ). More explicitly, if the examinee i has all the testlet scores, then

 

 

 

 

 

 

KY”) (1 0 0T (3”)
Y2} 1 O O 52}
Y3, 1 0 0 8,,
Y4,- 0 1 0 ,4}. 8,,
Y,, = 0 1 0 ,8“. + 85,- . (23-2)
)2,- 0 1 0 '35, 8,,
Y7, 0 1 O 87,
I’Q, 0 0 1 88,
03,-) K0 0 1} \591')

Note that the level-1 model is actually a classical test theory model that

decomposes the observed score 1’}, into the true score and the error score 8,. In this

example, there are three true scores for each student j, arithmetic proﬁciency true

score A, , algebraic proﬁciency true score ,62, , and geometric proﬁciency true score ,6” ,
depending on which construct the item is supposed to measure. In matrix notation,
Equation (2-3-2) can be written as

y, = x,,6, +8, ,3, ~ moo-21“,). (2-3-3)

Then, at level-2, the three true scores for the examinee j vary among examinees.

L-2:
A] = 710 + u” “11 0 TH 2'12 1,3
:82} = 7'20 +u‘2j , ”2; ~ N( O a 721 2'22 723 ) (2'3'4)
A} = 730 1” “31 “31' 0 2'3, T32 2'33

Or in matrix notation,

-23-

,B, =7+u,, u, ~N(0,r).

If we suspect the correlation between u,, and uz, to be near unity, then we

formulate

In matrix form, this is written as
u, = A0,, ’70,- ~ N(0,‘I’)

and the variance-covariance matrix of u j is

r = A‘I’AT.

(2-3-5)
V’or
. 2-3-6
11/. 1) ) ( )
(2-3-7)
(2-3-8)

By using the factor model, we reduce the number of parameters in r from 6 to 4. Also,

notice that by using a factor model at level-2, we might be able to say that there are

actually two constructs instead of three. But an altemateive argument is that, as the

variances are different, i.e., 1,, at T33 , we might consider that the second and the third

measure the different construct (Cheong & Raudenbush (1999)). In either interpretation,

the reliability ( Cronbach’s a type internal consistency) of the test score on the examinee

level (level-2) can be obtained.

-24-

Chapter 3. Approach and Model Layouts

3-1. The Model

All the examples in sections 1, 2, and 3 of Chapter 2 can generally be written as
L-l:

Y1 =X,,B,+r,, (3'1)
where Y,isa n, x lvector, X,isa n, x R matrix, ,6, isa Rxl vector, and r,isa n, x1
vector, and

r, ~ N(0,0'21,,I) (32)
where 1,, denotes identity matrix of size n,.
L-2:

,8, = W,7+u, (3-3)
where W, is a R x F matrix for F is a number of ﬁxed effects parameters, 7 is a F x 1
vector, u, is a R x 1 vector. And for u , , we considered factor structure,

u, = A0,, (3-4)

where Aisa Rx M matrix, 77, isa Mxl vector (R2 M),and

u, ~ N(O, 1), (3-5)
7],. ~ N(0,‘P)- (3'6)

Thus, we have
2' = A¢AT. (3-7)

The combined model of the level-1 and the level-2 models is

-25-

Y, = X,W,7+X,An, +r,. (3-8)
Considering the case where we ﬁx some of the elements of u, and not others of
,6, , the more general model can be written as
Y, = A,,y + A2,.An, +r,, (3-9)
where
r, ~ N(0,0'21,,,) and 77, ~ N(0,‘I’). (3-10)
Note that Y, is a n , x lvector of observed scores, A,, is a n , x F matrix where F is a
number of ﬁxed effects parameters, 7 is a F x l parameter vector, A2 , is a n, x R matrix

for R is a number of random effects, A is a R x M ( R 2 M) factor loading matrix, 77, is
a M x 1 vector of factor scores, and r, is a n , x l residual vector. And 0'2 is a common
variance of each element of r, , In] is a n , x n , identity matrix, and ‘1’ is a variance-

covariance matrix of 77, .

3-2. Similarities and the Difference from the Multilevel Factor Analysis Model

The multilevel factor analysis model formulated by Longford and Muthén (1992)

was
L-l:

Y, =21, +A,7},,, +4., (3-11)
L-2:

#1- = #+ A2772, +6.,» (3-12)

-26-

Thus this model assumes each student i in school j has p-variate complete
observation.
If we translate the above model to the HLM notation, we would ﬁrst formulate an
unconditional model (no predictors in both level-l and level-2 model) as
L-l:
Yij=ﬂ0j+rij’ n,- ~N,,(0,2:,), (3-13)
where Y, is a P x lvector of outcomes for student i in school j, ﬂo, a P x lvector of the

school means, r, a P x lvector of student level error, and Z, is a P x P covariance

matrix that is common to all the students across schools, and that represents the vvithin-
school between-student variability for P variates. Now we consider the school level
model.
L-2:

,80, = yo, + u0, , uo, ~ NP(O,22) (3-14)
where 700 is the vector grand means, and 11,, is the vector of school level disturbances,

and 22 is a P x P covariance matrix that represents the between-school variability. If we

make the combined model by plugging Equation 3-14 into Equation 3-13, we obtain

Combined model:
Y,=y00+uo,+r, (3-15)
Let 2 E D[Y,.] , P x P matrix. Then from Equation 3-15, it is clear that

2 = 2, + 2,. (3-16)

-27-

Now we incorporate factor analysis model both in level-1 and level-2 residuals.

That is, for level-1, let
r, =A,77,,+Zj,, 77,, ~N(0,‘I’,), é, ~N(0,Q,) (3-17)
where A, is aP x M, level-l factor loading matrix , 77,, is a M, x llevel-l factor score
vector, g, is a P x 1 level-1 uniqueness vector, and Q, is a diagonal matrix.
For level-2, let
u,=A277,,+§,,, 77,,~N(0,‘P,), §,~N(0,Qz) (3-18)
where A2 is a P x M2 level-2 factor loading matrix , 172, is a M2 x llevel-2 factor score
vector, 5,, is a P x 1 level-2 uniqueness vector, and Q, is a diagonal matrix. After
incorporating the factor structure, we have
2, = A,‘P,Af + (2, (3-19)
and
Z, = A2‘112A7,~ + (22. (3-20)
Note that since they were interested in exploring the factor pattern, they gave

constraints ‘1’, = IMl and ‘1’, = [MI in order for the model to be identiﬁable. Thus,
covariance matrices actually modeled are

2, = A,A’, + Q, (3-21)
and

E, = AzAT2 + (2,. (3-22)

They were interested in testing

-23-

H02A, = AZ.

This hypothesis involves testing M, = M 2 , i.e., the number of factors in student level is
equal to the number of factors in school level.

To summarize the similarities and differences between my model and Longford &
Muthén’s model (1992), it can be stated in the followings.
Similarities:
1. Both approaches use a factor analysis model in the multilevel model.
Differences:
1. My model only has factor structure at level-2, but Longford & Muthén’s model has
factor structure at both level-1 and level-2.
2. My model has independent variables in both at level-1 and level-2, but Longford &
Muthén’s model does not have them either at level-1 or level-2.
3. My model is a conﬁrmatory factor analysis model, but Longford & Muthén’s model is
an exploratory factor analysis model. That is, my model has known and ﬁxed elements in
factor loading matrix A , but Longford & Muthén’s model estimates all the elements in
level-1 and level-2 factor loading matrices (A, , A 2) and let ‘1’, and ‘I’, , factor score
covariance matrices in level-l and level-2, be identity matrices in order for the model to
be identiﬁed.
4. My model can handle level-1 observations missing at random while Longford &
Muthén’s model requires all the students to have complete observations in terms of

outcome.

-29-

3-3. Identiﬁcation

When we start using a factor analytic type of the model that involves a
decomposition of the covariance matrix which is a case of the confirmatory factor
analysis (CF A) model, we often encounter model identiﬁcation problems. The
interpretations of the parameter estimates are meaningless if we estimate the parameters
of the unidentiﬁed model. Therefore, model identiﬁcation must be established before we
estimate the parameters in the model.

Model identiﬁcation is concerned with whether the parameters of the model are
uniquely estimable, assuming a sufﬁciently large sample. Issues of model identiﬁcation
in conﬁrmatory factor analysis (CFA), which is a special case of SEM (see the model at
the section 2-2 of Chapter 1), are detailed in, for example, Bollen (1989). Therefore, here
we brieﬂy describe what it is, what the problems are, and what is the current status of our
knowledge on model identiﬁcation of CFA.

In CFA, the population covariance matrix X is a ftmction of a R x 1 vector of
parameters 6 , which contains all of the unknown and unconstrained parameters of the
model.

A CFA model can be written as

y= A77+§, (3-23)
where y is a R x 1 vector of deviation scores from the sample mean of observed

variables so that we have E (y) = 0 ; 77 is a M x 1 vector of common factors; A is a

R x M matrix of factor loadings relating the observed y ’s to the latent 77 ’s; and cf is a

R x 1 vector of residual or unique factor, which is a combined element of unique factor

-30-

and measurement error. Let 2 be the population covariance matrix of y , i.e.,
2‘. = D[y] = E(ny) . Let ‘1’ be the covariance matrix of the 77 , i.e., L1’ = D[q] , and let
Q be the covariance matrix of .5 , i.e., Q = D[5] . We assume that the population means
of 27 and 4‘ are zero and 77 and 5 are uncorrelated. That is, we assume E (77) = 0,
E (g) = 0 , and Cov( 77, 4‘) = E[ng"T] = 0. Then, we obtain the covariance equation,
2 = A‘I’AT + Q. (3-24)

Thus 2 is represented as a function of the A , ‘I’ , and Q. In the CF A model, some
constraints are placed on A and/or ‘I’ based on the prior knowledge of the subject matter
that researchers are studying. Let q, be the number of unknown parameters in A , q2 be
the number of the unknown parameters in ‘I’ , and q3 be the number of unknown
parameters in the Q , and let q be the total number of unknown parameters in the right
hand side of Equation (3-24). Then we have q = q, + q2 + q,. We represent all of these
parameters as a q x 1 vector 6.

A model is said to be identiﬁed if all parameters in the vector 6 are identiﬁed.
Thus a model is identiﬁed if 2 = 2(6,) = 2(62) implying that 6, = 62 , where 6, and 62
are the parameter vectors that have any speciﬁc values. Note that 6, and 62 are different
even if one of the corresponding elements are different. Negating the proposition, the
model is not identiﬁed if there exist 6l and 62 such that 6, ¢ 62 and 22 = 2(6,) = 2(6,).
Another way of saying this is that if the parameters in 6 are solved uniquely by a

function of the population parameters in 2 , then the model is identiﬁed because

estimation involves using sample data to obtain estimates of population parameters. In

-31-

conﬁrmatory factor analysis (CFA), this involves using the sample covariance matrix of
the observed variables, called S , which is sufﬁcient for 2 under a balanced design and
given our assumptions, to estimate the parameters in A, ‘1’, and Q.

The latter deﬁnition of identiﬁcation suggests a method to demonstrate model
identiﬁability. That is, if all parameters in 6 are represented by unique functions of
elements of Z , then the model is identiﬁed. If not, the model is said to be unidentiﬁed.
When there is only one way to express the parameters in 6 , the model is said to be just
identiﬁed. If there are several ways, then the model is said to be overidentiﬁed. In most
cases, we try to formulate an overidentiﬁed model, because it provides an opportunity to
test our hypothesis. Representing a model with a smaller number of parameters than the
number of unique parameters in the population covariance matrix 2 reﬂects our attempt
to explain or describe a complex phenomena by a relatively simple theory.

The argument of identiﬁability of a multilevel factor model can be made in the
same way since 2’ can be seen as the counterpart of a population covariance matrix of
observed variables in SEM, which is denoted as 2 . That is, if r is the positive semi-
deﬁnite, the identiﬁcation condition follows those given by the conﬁrmatory factor
analysis (CFA) model. Thus, we ﬁrst smnmarize the current status of our knowledge
about identiﬁcation of CFA model.

The trouble in assessing identiﬁability of CFA models is that not only we have
not found necessary and sufﬁcient conditions, but also we have not yet even found a
relatively general, widely applicable sufﬁcient condition. Currently there are a couple of

necessary conditions known. They are useful, but the model that satisﬁes the necessary

-32-

conditions is not necessarily an identiﬁable model. It only gives a hope that can be
identiﬁed. Therefore, a pitfall in which we could fall is a case in which a model satisﬁes
the necessary conditions, but is not an identiﬁable model, and we run the software
without knowing the fact and the software gave the estimate. In this case, the estimate is
meaningless'. On the other hand, models that satisfy sufficient condition are guaranteed
to be identiﬁed. If we know the sufﬁcient condition, we are sure that the estimates
obtained by running the software are meaningful. Therefore, a sufﬁcient condition is
stronger condition than a necessary condition.

One way to know whether the formulated model is identiﬁed is to algebraically
solve the simultaneous nonlinear equations in terms of unknown parameters. If each
parameter in 6 can be represented uniquely by a function of the elements in 2 and by
the pre-speciﬁed parameters, then the model is identiﬁed. This is the only necessary and
sufﬁcient condition of model identiﬁcation of CF A models that we know So far.
However, since the simultaneous nonlinear equations to be solved become very complex
when the model gets large and since we may not know a systematic way of solving them,
it is often a very difﬁcult task to know whether the unique solution is obtainable or not.
Therefore, though it is a necessary and sufﬁcient condition, solving the simultaneous
nonlinear equations is not considered to be a good way to demonstrate the model

identiﬁcation. Long (1979) characterized this problematic status as “proving that a model

 

' A remedy for this is that using the different starting values of the parameters, we see if we get the same
estimates. This strengthens our conﬁdence that the model is identiﬁed, but it is not still enough because
there are inﬁnitely many starting values in the parameter space and we can not test all of them.

-33-

is identiﬁed presents one of the greatest practical difﬁculties in using the conﬁrmatory
factor model” (page 36).

In terms of necessary conditions for identiﬁcation of CFA, there are generally two
necessary conditions available. Kline (1998) summarized those necessary conditions as:

1) the number of parameters to be estimated 5 number of observed

R(R+l)
2

 

unique variances and covariances (q s ). (3-25)

2) every factor must have a scale, i.e., unit variance of each factor
should be deﬁned.
The second condition can be achieved by either ﬁxing each factor variance to unity or by
setting an element of each column of the factor loading matrix A to a constant, usually 1,
which sets one of the factor loading one for each factor.

Since identiﬁable models must satisfy these conditions and both conditions are
easy to check, they are very useful for screening unidentiﬁed models. However, even
after screening the unidentiﬁable models using these necessary conditions, many
unidentiﬁable models remain, having different patterns of factor loadings.

Some works have been made concerning sufﬁcient condition for identiﬁcation of
CFA models. For example, the two and three indicator rules and Bollen’s slightly more
general condition for the three indicator rule (Bollen, 1989) provide some methods. Davis
(1993) extended these works and found the general sufﬁcient conditions for identiﬁcation
of CFA with factor complexity of one, where the factor complexity is deﬁned as the

number of factors on which observed variables load at most, and factor complexity of one

-34-

means that each observed variable loads on one and only one latent variable. This
condition allows that measurement errors be correlated. However, the scope of these
sufﬁcient conditions are limited because these conditions apply only to CF A model of
factor complexity of one, a rather simple model. Thus we only can use these rules of
sufﬁcient conditions for simple factor loading models. These do not apply to factor
models that have a cross-loading structure, which is deﬁned as some of the observed
variables load on more than one factor.

It seemed promising at ﬁrst look for a sufficient condition for uniqueness of factor
loading matrix A under rotation provided by Howe (1955) and Jdreskog (1979).
J oreskog (1979) provided a sufﬁcient condition for the uniqueness (not for model

identiﬁcation) of A under rotation for the conﬁrmatory factor analysis (CFA) model. The
CF A model that Jdreskog (1979) used adds another assmnption for 6 , that is, each
element in .5 is uncorrelated with each other element. This additional assumption

constrains the Q to be a R x R diagonal matrix which contains only the variances of 5 in
the main diagonal, with all off-diagonal elements ﬁxed to 0. It is assumed that all
diagonal elements of Q are unknown parameters to be estimated.

The scales of the latent variables 77 can be determined either by the requirement
that ‘I’ is a correlation matrix or that there is a ﬁxed constant (usually set as l) in each
column of A . The remaining elements of A are either ﬁxed to zero or are unknown
parameters to be estimated.

The sufficient conditions for uniqueness (again, not for model identiﬁcation) that

was stated by Joreskog (1979) are:

-35-

(a) ‘1’ is a symmetric positive deﬁnite matrix with diag(‘1’) = IR ,
(b) A has at least n-l ﬁxed zeros in each column, and
(c) AS has rank n-l, where A, , S =1,2,...,n , is the submatrix of A
consisting of the rows of A which have ﬁxed zero elements in the (3-26)
.S“h column.
Several authors of books on conﬁrmatory factor analysis (for example, see pages
43-44 of Long (1979)) confused a sufﬁcient condition for uniqueness of A under rotation
with a sufﬁcient condition for identiﬁcation. Bollen and Jdreskog (1985) clariﬁed this
misunderstanding by showing an example in which uniqueness of A under rotation does
not guarantee the identiﬁcation of the model. An example of an unidentiﬁed model they
used was unidentiﬁed because the parameters in Q needed to be estimated. The A and
‘1’ were unique under rotation if the parameters in Q were known. However, since
parameters in Q parameters need to be estimated, then unidentiﬁcation occurred.
Therefore, if there are no parameters in Q in the model, it seems that the condition that
Howe (1950) and Joreskog (1979) proposed is a sufﬁcient condition for identiﬁcation as
long as the examples I’m going to use.
Now we consider the identiﬁcation issue for HLM that involves a factor analysis
structure. If we replace the r matrix in HLM as 2 in CFA, the same thing can be said

about the model identiﬁcation. The requirement of r as a covariance matrix of u, is that

it should be positive semi-deﬁnite matrix, the same as the condition required for 2 . The
only difference is that the HLM factor model (see Equation (3-7)) does not involve the

unique residual variance, Q (see Equation (3-24)), though it can be easily added.

-36-

Therefore, we restate the conditions for identiﬁcation for the HLM factor model, which
are the same for identiﬁcation of CFA models, with an additional necessary condition
that is useful for screening unidentiﬁed models.

We ﬁrst describe the simple necessary conditions to check. If these conditions are
not met, the model will not be identiﬁed. Thus, we can use these conditions to screen the
turidentiﬁed models at the ﬁrst stage of model consideration.

Necessary conditions:

1. the number of parameters estimated 3 the number of parameters in r

R(R+l)
(q S ——2 )-

2. every factor must have a scale, i.e., unit variance of each factor (3-27)
should be deﬁned.
3. M2 independent constraints on A (or ‘1’) must be given, where M is the
number of factors.
The ﬁrst two are the same necessary conditions in CFA model. Necessary condition 1 is
simply states that the number of equations must be at least as many as the number of
unknowns. Thus, we need at least the same number of independent equations as the
number of unknown parameters to be estimated.
Necessary condition 2 states that the M scales are necessary for each factor in the
M x 1 vector, 77 because in order to calibrate the variance of a random variable in terms

of variance of factor. the unit variance of the factor must be deﬁned on the ﬁrst hand.

-37-

The third necessary condition that I propose derives from the indeterminacy of the

factor model. Given a positive semi-deﬁnite covariance matrix I and a number of M

factors which satisfy the equation I = A‘I’AT. Let A' = AT and ‘1" = T ""1’TT'l for any
m by m invertible matrix T. Then A"1"A'T = (AT)(T"‘1’TT"l )(TTAT) = A‘I’AT = 1.
Thus the pair (A. ,‘1’°) produces the same 2' as that of (A, ‘1’). Therefore, in order to

uniquely identify A and ‘1’ , we need to specify T, i.e., we need to impose M 2 (the

number of elements in the matrix T) constraints on A or ‘I’ . Thus, if there are less than

M 2 restrictions on either A or ‘1’ , then both A and ‘1’ are not be identiﬁable (See Pages
553 - 557 of Anderson (1984)).

There are two ways of determining the scales of each factor. One is to set the
diagonal of ‘1’ to unity, which scales the variances of all latent factors to one. Another
way is to set one of the elements in A for each column to one. Since we want to keep the

same metric in r], as in u, in our multilevel conﬁrmatory type factor model in Equation

(3-9), we choose the second option. That is, we ﬁx an element of each column of the
factor loading matrix A as 1. By doing that, we are specifying a unique variance at level-
2 unit as the reference variance of the factor.

If we choose to scale the latent factors 77 by setting an element for each column
of A to one, then the 2nd and the 3rd necessary conditions can be summarized as one
necessary condition, which is, the linear transformation matrix should be the identity

matrix, i.e., T = IM . In this condition, the 2nd necessary condition is represented by the

-3 8-

fact that we ﬁx all the diagonal elements of T to 1, and the 3rd necessary condition is

represented by the fact that we set all the M 2 elements in Tto either 1 or 0.

Though only two conditions out of three are independent, I formalize three
necessary conditions for identiﬁcation for our HLM2F model represented in Equation (3-
9) for practical purpose in the followings:

If the model in Equation (3-9) can be identiﬁed, it must satisfy the following two

conditions, otherwise the model cannot be identiﬁed:

1. the number of parameters estimated must be smaller or equal to the number

 

. . . R( R + 1)
of unrque parameters 1n r , i.e., q S 2 .
2. every factor must have a scale, i.e., unit variance of each factor (3-28)

should be deﬁned.

3. the linear transformation matrix T must be the identity matrix, i.e., T = IM

under the transformations A. = AT and ‘1" = T"‘I’TT'1 .

We conﬁrm that the 3rd condition above is truly the necessary condition by the
following reasoning: The model is not identiﬁed if the model is not unique under linear
transformation of A and ‘1’ because there exist at least two pairs, (A,‘1’) and (A. ,‘1”)
that produces the same I , which is the deﬁnition of unidentiﬁed model. Therefore, our
statement is if there exists a non-trivial M x M linear transformation matrix T , which
means that T is not the identity matrix, then the model is not identiﬁed. Taking the
contrapositive, we can say that if the model is identiﬁed, then the linear transformation

matrix Tmust be the identity matrix. This states that T = IM is a necessary condition for

-39-

the model identiﬁcation. Note that the second necessary condition is exact when we
choose to scale the latent factors 77 by setting an element in each column of A to one. If
we choose another scaling option, which is setting ‘1’ = 1M , then Tcan be either T = IM
or T = —1M because of the arbitrariness of the sign of the factor loading matrix.

In terms of the sufﬁcient conditions, we do not know them yet. We know that if
we can solve the covariance equations uniquely for the unknown parameters in 6, the
model is identiﬁed, and thus the solvability of covariance equations in terms of 6 is a
necessary and sufﬁcient condition. In this situation, it seems most practical to proceed as
follows:

1) Formulate the speciﬁc model and check its identiﬁcation by two necessary

conditions.

2) If the necessary conditions are satisﬁed, then we check the identiﬁcation by
attempting to algebraically solve the covariance equations. If it is solved, then
the model is identiﬁed. If not, treat the model as unidentiﬁed.

Therefore, I present the model identiﬁcation issue in detail for the speciﬁc models that
will be used in Chapter 5. First, I will describe the context of the data because
speciﬁcation of the A matrix not only depends on the identiﬁability of the model but also
on the substantive knowledge of the context. That is, it is not useful to estimate the
parameters of the model that aren’t substantively interesting.

As a classical application of a factor model, we consider a setting where subtests

are nested within examinees. Though this does not require a hierarchical model, we can

-40-

see a hierarchical model as a technique for data reduction. This is useful because it allows
comparison to the models where we have a lot of experience.

Speciﬁcally, suppose that a test involves 4 subtests, mathl, math2, verball, and
verba12 and that each subject takes two parallel alternative forms at each test in a
relatively short time span. We assume that there are no missing cases and that there is no
learning effect or memory effect for the two occasions of the measurements. There are
100 subjects. Thus we create a situation in which subtests are nested within students and
each student has 8 observations, which results in 800 observations total. Now we assume
that the true scores of subtests mathl and math2 are perfectly correlated as are those for
verball and verba12. Thus we can have 2 factors, mathematical and verbal proﬁciency.
The correlation between mathematical and verbal true scores is 0.50. Then the model can
be formulated in a hierarchical model form as
L-l:

K} = '4le0 + IBUXIU + '631X3ii+ IB‘UXW 1' 54';

4
= Z'quXqij 1' 547

(Pl

iid

, 5,, ~ N(0, 0'2) (329)

where X ,, = 1 if ith observation is qth subtest, and 0 if not. The standard HLM2 speciﬁes

the level 2 model as

L-2:

761} = 710 + “I;
ﬂy =720 +112, (3-30)
763} =730+u3j

7641:740 +114}

.41-

 

(
“I j 0 TH T12 713 2'14
u2 . Nd 0 r r r r
j 12 22 23 24
where ~ N , , and the supposed level 2 model reduces
“31 O 2'13 2",, 2'3, 2'3,
“4 j K 0 714 1'2, 2'34 744

the number of parameters at level 2 as

L-2:
A] =710 +’71j
7621 =720+21nlj
783; =73o +7721

:64} =74o +1202j

where the unique four level 2 errors are reduced to two, i.e.,

u,, l 0

“21 = ’11 O (7711] and (”IIJTN((O) (W11 WIZJ)
“31‘ O 1 7727‘ , 7727' O , W12 W22
”47' O 32

Note that there is a relationship between u, and 77, as
”j = A”;
and thus

2' = A‘1’AT.

In element wise, it can be represented in a set of equalities:

711 7'12 1'13 714 W11 ’11er W12 ’hWn
712 2'22 723 2'2, _ 11W” A12W” A1W12 A1’1'2W12
I13 2'23 T:13 2'34 - W12 11W12 W22 A2W 22
{,4 2'2, z'34 744 ’12 W12 A1/12W 12 ’12 W22 ’1; W22

1 = A‘I’AT =

and if we write this in a correlation form denoted as p, , then

-42-

(3-31)

(3-32)

(3-33)

(3-34)

(3-35)

1.0 1.0 p p

,0 = 1.0 1.0 p p , (3-36)
' p p 1.0 1.0

p p 1.0 1.0

where p = corr(77,,, 772,) = -\,-=—@\/-=.
' W11 W22

WCSCI 710 :720 :730 =740 =500, 0'2 =25, 1, =08 12 =12, W” =100,
11/22 =100, W12 = 50 SO that corr(7},,,7}2,) = 0.50, amediurn size correlation. Thus we

speciﬁed the r as

In T, 2 1'13 T” 100 80 50 60
T: q, 72, 7,, 7,, _ 80 64 4o 96 , (3-37)
T13 1’23 T33 T34 50 40 100 120

1,, 1,, T3, 1'“ 60 96 120 144

and as a correlation,

1.0 1.0 050 050
1.0 1.0 050 050
p' 050 050 1.0 1.0 '

050 050 1.0 1.0

(3-38)

The true model, as shown in the above, and from which we will generate the data

is called “model 1”.

Model 1: The assumed correct model

It has the factor loading matrix of

1
2.

H00

0
0

3»

-43-

We see from it that the unique element u, , and u,, only loads on the ﬁrst factor and the
variance ratio is one to 2,2 . Similarly, the unique element u,, and u,, loads only on the

second factor and the variance ratio is one to 71,3 .
The most general model among two factors can be formulated by letting all the
elements in the factor loading matrix A be free, unknown parameters except setting two

1’s at the position of (1,1) and (3,2) for scaling purposes. It is named “model 2” and is

represented as:

Model 2: Saturated, but not identiﬁed model

1 ,1,
A = 11 71,
,1, 1
’13 ’2

This is an unidentiﬁed model because it does not satisfy the necessary condition (2). This

can be shown by checking the uniqueness of the linear transformation matrix T. Let

1 716 x+/l6u y+/l,v l A;

7.:(3‘ y].Then,/\ —AT= 3'1 14(3‘ Y): ’th’ﬂmu Ary+l4v = A; ’1';-
u v 715 1 u v 715x+u 85y+v ,1, 1
’13 A2 713x+27u lsyi'ﬂqv 3:1 ’1';

Thus we have conditions for A. that need to be satisﬁed, which are

x+&u=L
Ay+v=L

~44-

This simultaneous equations have inﬁnitely many solutions other than

,u=—1—v=l andy=—2-—.Thusthe

x=1,y=0,u=l, and v=l,forexample, x=
2716 3 32.,

Nl——

linear transformation T is not the identity matrix. Therefore (A,‘1’) is not unique.
Therefore model 2 is not identiﬁed.
Specifying the factor loading matrix as a 4 by 2 matrix reﬂects the researcher’s

belief about how many factors exist for the data. We speciﬁed that there are two factors

T . . .
and that u, = (u,,, u,,, u,,, u,,) are linear combinations oftwo factors (7},,,772, ).

This model is a saturated model in the sense that there are as many unknown elements
involved in A as possible when we have two factors. However, as we proved in the
previous paragraph, it is not an identiﬁed model because of nonexistence of unique linear
transformation matrix. Though it is an unidentiﬁed model, it is worth mentioning the
substantial meanings of the numbers of the rows and the numbers of the columns of the
A . The numbers of the rows tell us how many residuals unique to the random coefﬁcient

,6, exist and the numbers of columns tell us how many common factors exist. In CFA,

we decide the numbers of common factors mostly from the substantive knowledge, either
theoretical knowledge or common sense.
As a more general than the model 1, and as an identiﬁable model, researchers may

formulate the following model.

-45-

Model 3: Identiﬁed model, but a misspeciﬁed model

0
71

A

o}.-

1
As 12
This is an identiﬁed model and the model 1 is nested within this model, because if
we ﬁx 2., = 71, = 0 , we obtain model 1. In order to test identiﬁability, we take two steps.
That is, we ﬁrst check the necessary conditions by (3-28). If the necessary conditions are
satisﬁed, then we proceed to solve the covariance equations to see if indeed the model is
identiﬁable.

Thus, we ﬁrst check the three necessary conditions. The ﬁrst one is satisﬁed

 

R(R+l)
2

because {the number of parameters in Tau} ( ) = 10 (R = 4) 2 {the number of

parameters estimated} (q) = 8 ( 4 + 3 + 1). It is clear that the second condition is also

satisﬁed because there are 1’s in each column of A at the position (1,1) and (3 ,2). To

y

check the third condition, consider an invertible 2 by 2 matrix T = [x J . Then,

II V

1 o x y 1 0

A'=AT= 71, 71, [x y): 71.,x+/t,u li,y+/t,v = A, A;
0 l u v u v 0 1 '

1, a 1,..1,u 2.3.1,. 2; A;

Thus we have conditions for A' that need to be satisﬁed, which are

0
I) = 1,. Therefore, the transformation is

1
x=1,y=0,u=0, and v=1,thatis, T=[0

unique, so A is unique under linear transformation. Since all of the three necessary

-46-

conditions are satisﬁed, there is a hope that this is an identiﬁed model. In order to check

whether it is an identiﬁed model, we see an algebraic solution. Simultaneous equations

that we should solve for unknown parameters are obtained from r = A‘I’AT . That is,

r,, r,, 1',3 1",, l 0 l 0 T

712 1'22 ’23 1'24 _ ’11 ’11 (W11 W12) A1 '14

r” 723 731 Tu - 0 1 W12 W22 0 l

1'“ 721 ’14 Tu ’11 A1 ’11 '12

( W11 W12
_ 31W” 1’14le A1W12 1'14sz [I ’11 0 A!)
- W12 W22 0 ’11 I '11

V1.1W111'1'2W12 Aqu 1"th

( W11 31W11+11W12 W12 31W11+11W12
_ ’11W111‘3’4W12 ’hohWn+11W12)+’1'1(’11W11+’11W22) AW12+34W22 ’liu-IWH+14W12)+11(11W11+’14Wn)
- W12 ’11le 1'11sz W22 ’1':le +17sz

 

KArn‘l’ni'jiWu “1(X1WII+AJWIZ)+AS(A1WIZ+A'ZW22) ’lJle‘l'Ang 33(11W111'12W11)+’11(’11W12+12W21

Thus, we have simultaneous equations

r

 

 

1;!” = r“ ................................................................................ (1)
Aw” +24%, = 1,2 ................................................................... (2)
1,111,, +1211!” = 1,, ................................................................... (4)
4 2,01%, +7141/I,2)+/t,(}l,1//,2 + 7141,1122) = 122 .................................. (5)
7111/42 + 7141/12, = 123 ................................................................... (6)
13(311/111-1- 2.,1/1,,)+ A,(zi,1//,2 +714u/22) = 12, .................................. (7)
W22 = r33 ................................................................................ (8)
/1,1//,2 + 2.21/12, = r3, .................................................................. (9)
L13(1131/4, + 2.21/1,2)+ 1,03%, + 7121,1122) = r4, .................................. (10)

It can be seen that from Equations (2) and (6), we can solve for 71, and 71., because w” ,
1,11,, , and 1,1122 can be easily solved by Equations (1), (3), and (8) respectively. From
Equations (4) and (9), we can solve for A, and 71,. Therefore, the model is identiﬁed.

Actually it is an overidentiﬁed model because there are other ways to obtain the

-47-

solutions, F or example, once we have solutions on 1,11”, w” , W22 , A, , and 714 , we can solve
for 71, and 11., from Equations (7) and (10).

There are several characteristics that make this model one of the most
substantively sensible models. First, the dimensionality was reduced from 4 to 2, and we
speciﬁed that the number of unique factors is two. Specifying the ﬁrst row of A as (1,0)
and the third row as (0,1) make u,, = 77,,, and u,, = 772,, which means that the unique
randomness of mathl is the score of ﬁrst factor, and the unique randomness of verball is
the score of second factor. The model deﬁnes that the variance factor 1 is exactly the

variance of 6,, , the variability of mathl scores among examinee, and nothing else. The
same thing can be said to the variance of 6,, , variability of verball scores among

examinees. This model is considered to be substantively most sensible.

Model 4: identiﬁed model, but a misspeciﬁed model

A: ,‘1’=(1//,,).

131%.)". -

This model is an identiﬁed model and is nested within model 1 so that we can apply the
deviance test. The fact that this model is nested within the model 1 can be shown as

follows. For model 4, we have

-48-

A‘I’AT= (W11)(1 ’11. ’15. ’11)

.1). it 3”. '—

W1.1 A; W 1.1 ’13. W 1.1 ’1; W 1.1
= 11. W11 ’11.2 W11 21. ’1; W 1.1 ’11.”; W11
’15. W11 11.15. W11 15.2 W 1.1 As. ’1; W 1.1
XiW 1.1 ’11.]; W 1.1 ’15. liW 1.1 33.2 W11

We compare this expression to the model 1’s counterpart:

1 0 W11 W12

A‘I’AT= ’11 0 [W11 W12](1 ’11 O 0]: 11W” 31W12 [1 31 0 0]
0 1 W12 W22 0 0 1 ’12 W12 W22 0 O 1 ’12
O ’12 A12W12 ’12sz

W11 31W” W12 ’12le
_ A1W11 112er 11le A1A2W12
_ W12 ’11le W22 12W22
12W12 11’1an2 ’12sz ’13sz

In this expression, we let the correlation of latent variables be 1 to reﬂect our idea for

model 4 that every 4 u, ’s has completely correlated with each other. That is,

 

W2 0 o o o
pin/7h; = W =1,0r, W12 =VW11VW22 -Thenlet W11=W117 ’11 =’11 9 W22 =’15\1W117

712 = . . Then starting from model 4, we obtain model 1. Therefore, model 4 is nested

 

.P 3°.

within model 1. This process can be summarized by saying that after we set the
correlation between two latent variables as 1, then we rescale the W22 and II? by A; ,

where ,1; is newly speciﬁed as a element of the factor loading matrix A .

-49-

Chapter 4. Estimation and Inference

To estimate the parameters in the model (3-9), we adopt the method of full
maximum likelihood (MLF) via Fisher scoring, adapting Longford’s (1987) approach.
We choose full maximum likelihood (MLF) as a method of estimation instead of
restricted maximum likelihood (MLR)'.

If we look at the model (3-9) from the general linear model point of view by

letting
e, = A21921 1"}, (4'1)
we have
Y,=A,,6I+e,,e,~N(0,V,) (4-2)
where
V, = A2,.A‘1’ATA2T, + 021,. V (43)

Let ¢ = (d , ¢, ,..., ¢Q )Tbe a Q x 1 vector of underlying parameters that produce V, . It is

well known that, conditional on the maximum likelihood estimate (MLE) of q) , the MLE

of 6 is asymptotically orthogonal to the MLE of (15 because the Hessian component

 

’ The difference between MLR and MLF is that we specify a ﬂat prior for the ﬁxed effects in the case of
MLR but not for MLF. The choice of model speciﬁcation between MLF and MLR has pros and cons. The
advantage of MLF is that the derivation is easier than MLR and is completely ﬁt to the frequentist notion
of Maximum Likelihood estimation. However, MLF underestimates variance component parameters
because it doesn’t take the uncertainty of the ﬁxed effects. Thus, MLF produces biased estimates for
variance components parameters whereas the corresponding MLR estimates are unbiased. However, MLF
has advantage in terms of hypothesis testing because MLF allows deviance test for both ﬁxed effects and
random effects whereas MLR is only for random effects.

-50-

621 .
E [—] is 0, where 1 rs the log likelihood of the model (3-9). The MLE of 6 is given

56151“
by generalized least squares (GLS) as

61.... = (Ail/"A11" AIV'W (44)

J
where A, = [A,T,,A,T2,---,A,T,]T, a N x Fmatrix for N = Zn, , the total number oflevel-l

7-1

J .
units, V = 031V,, a N x N matrix, Y= [11T,}’,T,~-,Yf]r, a N x lvector ofobservations.
J:

Note that GB denotes the direct sum operator. And the asymptotic covariance matrix of
Q.MLE is
0161.....1=<A.’V"A. 1“ (4-5)

6’]
46W

, ] =02, we just need a series of scoring

 

where V is evaluated at 6,“. Since E[

iterations with respect to ¢ to obtain 43,“. After we obtain qiMLE , we compute 61M“, by

formula (4-4).

The log likelihood based on Equation (4-2) is

N 1 1
WM ; Y) E log L(6l,¢ ; Y) = — 310807!) - 5 logl V I- EeTV"e, (4-6)

J
where e = 63' e, . Since Y is a random sample of size J of Y, ,
I:

 

2 This is generally true when we take derivative of I twice with respect to the ﬁxed effects 61 and the
random effects ¢ successively, and then take the expectation with respect to the distribution.

-51..

J
1(0.,¢.;Y)=le(a.¢;13) <4-7)

I"

where

n l 1 _
I,(61,¢;Y)alogL,(61,¢;rg)= —-2’Iog(2z>—§Iog1V,I—§ejV, 'e,.. (4-8)

Longford (1987) gives the score vector and Hessian of the log-likelihood function (4-8)
in terms of element wise formula. Raudenbush (1994) showed the formulae in more

compact matrix form. The key results are for the q x 1 score vector

 

 

 

61, 1 6vecV, r
S, EE=§( §¢ ) (vecM,) (4-9)
where
M, = V,"(e,ef — V, )V,’1 (4-10)
and
J
S = 25,. (4-11)
j=l
For the q x q Hessian matrix,
H _ E 621, 1(6vecV,]T V" ®V" (6%ch 412
j: [wa¢T]_'—2 a¢ (j j) a¢ (' )
and
J
Hz 2H,. (4-13)

1:1
We apply the above general formulae to obtain (3,4,5. The details of the derivation

and the algorithm are in the Appendix.

-52-

To implement the Fisher scoring algorithm, we need a starting value of the

parameters in the variance-covariance matrix V, . Though it is arbitrary, we need to
provide a good starting value to obtain the MLE within the parameter space and its
quicker convergence. Though V, is a function of three components, i.e., ,1 , w , and 0'2
(See Equation (A-4) in the Appendix), the hard part of providing a good starting value is
in 71 and 1,11 , which are the vector of unknown elements of A and unique elements of

‘1’ (Starting values for the estimate of 0'2 can be found in the section of Starting Values in
the Appendix). One way to obtain the starting values for the estimates of 2. and 11/ is by
analogy to the factor analysis. That is, we execute principal component factor analysis on

the positive semi-deﬁnite f , which is obtained by using the same method as current

HLM2, and then take advantage of the pre-speciﬁed structure of A to ﬁnd the initial
estimate of A , which is denoted by A”). Then solving 1 = A‘I’AT for ‘1’ and substituting

A by A”) and r by zT , we can obtain ‘1"0’ , the initial estimate of ‘1’. The details of the

step was shown in Appendix.

Finally I note that in order to obtain approximate standard errors for 6M”. , we will
compute the information matrix by
Inf0.= —H (4-14)
where H is evaluated at (3mg. Then, the asymptotic standard errors for grim are

computed by

 

s.e.(¢3,) = (r = 1,..., Q) (4-15)

1
Info."

-53-

where ¢, is the rth element of vector ¢ and Info.” is the corresponding rth diagonal

element of the information matrix ( Info. ). For the standard error of 6 , the MLE of ﬁxed

effects 61 , we will compute the asymptotic covariance
Dim-(AMA. )" we
where V is evaluated at q; , the MLE of covariance structure parameter ((1. Then the
standard error of the rth element of a vectoré is given by
S.e.(6A,,) = [(A,TV"A, )-' 1,, V (4-17)

for r = 1,2,..., F. Note that since 6, and 61, are known to be asymptotically normally
distributed, we can use them for the inference by keeping in mind that it’s an

approximation.3

 

3 The large sample normal approximation for the standard error of 6 is poor in most applications so that
other techniques are required in practice (Bryk & Raudenbush, 1992, Chapter 3).

-54-

Chapter 5. Computational Examples

In this chapter, I will demonstrate how the proposed model can be applied to real
data sets and when it is useful to do so. I ﬁrst show that the HLM2 factor analysis model
(HLM2F) includes the standard HLM2 model as a submodel, by using a model
formulated in Bryk & Raudenbush (1992) for the High School and Beyond data. The next
example shows a case when the HLM2F model is useﬁrl, using the data from
Huttenlocher et a1. (1991) on children’s vocabulary growth. The third example shows
how to specify the factor analysis model when the Tau-matrix (the level-2 variance-
covariance matrix) is relatively large such that speciﬁcation of the factor structure is less
obvious. The fourth example uses artiﬁcial data with known parameters. The purpose of
demonstrating this example is not only to provide a check for the validity of the
computation, but also to demonstrate how to formulate a reasonable model, i.e., a
substantively meaningful and an identiﬁable model. Further, it illustrates how to evaluate

and correct mis-speciﬁed models.

5-1. High School and Beyond

I use the High School and Beyond (HS & B) to demonstrate that the standard
HLM2 model is a submodel of the two-level Hierarchical Linear Model with Factor
structure (HLM2F).

The data is a subsarnple from the 1982 High School and Beyond Survey, and
includes information on 7,185 students nested within 160 schools. In Table 4.5 on page

72 of Bryk & Raudenbush (1992), the results of the following model are presented:

-55-

HLM2 model:

L-l:

_ iid
Y, :60, +6,,(SES—SES_,),, +8,,8,, ~N(0,0'2)

L-2:

ﬂ),- = yo, + y,,(Sector),- + 702(SES.,-)j + “0} (5-1)

,6, = 7,, +y,,(Sector), +y,2(SES,,), + u,,,

(2:1)TN((3),(:$3 if: )1

Here, Y, is the math achievement score for student i in school j; SES,, is a measure of the

socio-economic status (SES) of student i in school j; SES.)- is the sample mean of SES of
the school j; and Sector, is the indicator variable taking on a value of ‘one’ for Catholic
schools and ‘zero’ for public schools.

Since the model which I developed in Chapter 3 uses the full maximum likelihood
with the Fisher scoring algorithm and the default HLM2 uses the restricted maximum
likelihood with the EM algorithm, we set the estimation method of HLM2 as comparable
as possible to the HLM2F. This can be done by setting the HLM2 optional speciﬁcation
to MLF (Full maximum likelihood) and the number of Fisher iterations to 1. Setting the

number of Fisher iterations to l speciﬁes the program to run all the way by Fisher

-56-

scoring, but if the algorithm fails for some reason], then the algorithm switches back to
the EM algorithm.

The results of HLM2 by MLF with the number of Fisher scoring = l are in the
second column of Table 5.1.

The HLM2F model equivalent to the above model can be speciﬁed by setting the
factor loading matrix as the identity matrix of the size of the number of random effects in
HLM2. This can be shown as follows.

HLMZF model:

L-l:

— iid
Y, = 6,, + 6,, (SES — SES,), + 8,, ,8, ~ N(0,0'2)(Same as L-l model oquuation (5.1))

L-2:

ﬂu} =700+701(S86t0r)j+y02(SES,j)j+1.771} +0'772j . (5'2)

6, = 7,, +7,,(Sector), +7,2(SES,,), +0- 77,, +1472,
where

(:22) N16111:. 1;»

In matrix form, the level-2 model can be written as

6, = W,7 + A77, (5-3)

 

I Most typical reason of this failure is that Fisher will produce negative deﬁnite i" .

-57-

1 Sector}. SES,}. 0 o 0 J

h W.= __ ,
w ere ’ [0 0 O 1 Sector}. SESJ.

' l 0
7=(7ooa 701’ 702’ 710’ 711’ 712)r’A=(0 J’and nf=(noj’ ’7'1)T'

Thus ifwe set A =12 in the HLM2 model, then uj = n}. and r = A‘PAT = ‘1’.

The results using HLMZF are in the third column of Table 5.1. As can be seen by
comparing the results, the HLMZF reproduced almost exactly the same estimates for all
of the parameters including the standard errors. This is evidence that the newly developed

program is working.

-53-

Table 5.1 Results of the Analysis for High School and Beyond Survey

 

# iterations until convergence

HLM2
5

HLMZF
3

 

Fixed effects estimates

 

700

12.126 (0.197)2

12.126(0.197)

 

701

1.227 (0.303)

1.227(0.303)

 

 

702

5.332 (0.366)

5.332(0.366)

 

710

2.946 (0.154)

2.946(0.154)

 

711

-1.644 (0.237)

-1.644(0.237)

 

712

1.042 (0.296)

 

1.042(0.296)

 

Random effects estimates

 

0,2

36.721 (0.626)

36.721(0.619)

 

2'00 (or woo for HLMZF)

2.317 (0.355)

2.317(0.353)

 

1,0 (or w", for HLMZF)

0.188 (0.196)

(0.483 as corr.)

0.166(0.193)

 

r” (or 1;!” for HLMZF)

0.065 (0.208)

0.065(0.204)

 

 

 

Log-likelihood at convergence -23247-30 -23248.22
# parameters estimated 10 10
46494.592 46496.44

Deviance

 

 

 

 

2 Note. ( ) represents the standard error computed from the Information matrix.

-59-

 

5-2. Infant Vocabulary Growth

The purpose of demonstrating the analysis of infant vocabulary grth data is to
show where it is useful to apply the HLMZF model. The data come from a recent study of
children’s vocabulary development during the second year of life ((Huttenlocher, Haight,
Bryk, Seltzer, & Lyons, 1991). Huttenlocher et al. investigated the relationship between a
child’s early vocabulary acquisition and maternal use of language, hypothesizing that
exposure to the mother’s spoken language has a positive relationship to the growth of the
vocabulary of the young child. Gender differences in vocabulary grth were also
considered (Huttenlocher et al., 1991).

Two groups of mother-infant pairs, with six boys and ﬁve girls in each, provided
from 5 to 7 observations per infant in their early stages of life. More speciﬁcally, the ﬁrst
group consisted of 11 children who were observed at their home on six or seven
occasions at 2-month intervals during the period from 14 to 26 months of age. For some
cases, the 14-months data point was missing. The second group consisted of another 11
children who were observed at 16, 20, and 24 months. The time dimension was (Age —
12),,- months, assuming that at 12 months a child’s vocabulary size was zero.

Huttenlocher et al. (1991) ﬁrst formulated the following full quadratic

unconditional model.

iid
L-l: Y): 3,]. +6,j(Age-12),j+ 621(Age- 12)2,,+ 8,, a, ~N(O,0'2)

I

-60-

A1j=700+u01 (5'4)

“0) M O 700 701 2'02
“11 ”N O 1 1'01 711 2'12
“2 j 0 702 2'12 713

Obtaining the results that H0: r00 = 0 and 700 = O and 7,0 = O can hold statistically and

A

knowing that Corr(u,j. , uzj) z 1, they formulated the following ﬁnal level-1 model which

left only the quadratic term:

L-l:

Y1} = 621(Age -12); + 8,], £0. TN(0,0'2)
At level-2, they modeled the rate of acceleration by the individual characteristics such as
group membership, gender, and the amount that the mother spoke to the child.
L-2:
6,]. = 720 + y2,(Group)j + 722(Gender), + 7,, log( Momspeak)j + uzj , (5-5)

iid

(“21) ~ N((O)’(Too))-
where (Group) I. = 1 if the child is in group 1, and 0 otherwise; (Gender) 1 = 1 if the child

is a girl, and 0 if a boy; and log( Momspeak)j is the natural logarithm of the number of

-61-

words that the mother spoke to the child j, measured once at the ﬁrst occasion of the

observation. The results for this model are in the second column of Table 5.23.
The fact that the estimate of the correlation between an. and “21 was high may not
necessarily indicate that the rate of growth does not vary from child to child, which is a

question that was left in Huttenlocher et al. (1991)’s model. To investigate this point,

Bryk & Raudenbush (1992) used the following model:

L-l:
if]. = mug.» -12),, + 6,j(Age —12);+ 5,}, 5,, TN(O,O'2)
L-2
6.1.:qu
6,]. = 720 + 72,( group) j. + 722( gender) J. + 723 log( Momspeak), + uzj (5-6)
where

(2:3)TN((3).(::; 2:»

The results showed that f” = 16.94 (See the row for r” at column 3 in Table 5.2),

which was signiﬁcantly different from 0, and the uU and uzj were highly correlated. This

suggested that it was not necessary to estimate the covariance 1'12 , because the null

 

3 The results were obtained by reanalyzing the model by HLM2 version 4.92, specifying full maximum
likelihood (MLF) and # Fisher acceleration = I. This speciﬁcation induced slightly different results from
Huttenlocher et al. (1991) which used the restricted maximum likelihood (MLR) and EM algorithm. The
same thing can be said in the next results of Bryk and Raudenbush’s model.

-62-

hypothesis 2'12 = ,/ r” - 2'22 held, in the p0pulation, at .05 level, but it was necessary to

estimate both variances, r” and In because those were not zero.

The above result implies a factor analytic type model. Reparameterizing the 122 as

. . . T . .
122 = 1:17,, by usmg the variance ratio 1,2, = —22- , and then setting 1,, = [turn gives the

11

same constraint as 1,, = ,/ r” - 122 . Thus, we obtain a factor analytic type (HLM2F)

model by giving certain constraints on the elements of r matrix of the original HLM2

model, which in turn, implies that the HLMZF model that will be formulated is nested

within the HLM2 model.

This idea is formally formulated by a factor analysis type model. That is,

In L-2 model, we let

“0} :77”

“1} =121771j,

or, in matrix form,

and the covariance equation is

W11 3211””)

D . =A‘l’ T:
[uj] A (lull/11 331W”

Thus, we formulate the following model:

L-l:

_ 2
K; - Aj(Age—12),j +ﬂzj(Age—12),j +80.

-63-

(5-7)

(5-8)

it‘d
3,, ~ N(0,a’)

L-2:
A}: 771} (5'9)

6,}. = 720 + 7,,(Group)j + 722(Gender)j + 72, log( MomSpeak)j + 3,177,].

id

where n”. l~ N(0,1//00).

The results of this reduced model are in the ﬁfth column of Table 5.2. In order to
use the likelihood ratio test to examine whether the above factor model is statistically
acceptable, we run the standard HLM2 model in two ways, one is to run the HLM2
soﬁware and another is to run the HLMZF program by setting A = 1,. Those results are
in the third column of the table below, under the caption HLM2 (Bryk and Raudenbush’s

model), and in the fourth column, under the caption HLM2F (Bryk and Raudenbush’s

model)‘.

 

4 It should be noted that the results for the standard HLM2 model executed by HLMZF showed a close
match to the HLM2 results. However, a run by HLMZF was problematic because the estimate of

LI" became a negative deﬁnite matrix in the early stage of scoring iteration even though several different
starting values were tried. The values on the table were obtained when we ignored the fact that the estimate
of ‘P became a negative deﬁnite matrix during the iteration. For this reason, we take the results by HLMZF
for the saturated model as a reference. On the other hand, HLM2 was executed by choosing MLF and the
number of Fisher iterations = 1. This speciﬁcation lets HLM2 run repeatedly under Fisher scoring
algorithm as long as the estimate of ‘1’ does not become a negative deﬁnite matrix during the iteration; if it
does, then HLM2 returns to the EM algorithm, which is a default algorithm. Thus, probably this is how
HLM2 produced the results: HLM2 ﬁrst tried to estimate by scoring but it failed, and then used BM.

-64-

Table 5.2 Results of the Analysis for the Infant Vocabulary Growth Datas

 

# iterations
until convergence

HLM2
(Huttenlocher
et al’s model)

 

HLM2
(Bryk and
Raudenbush’s
Model)

8

 

HLMZF
(Bryk and
Raudenbush’s
Model)

4

 

HLM2F
(Reduced
Model)

 

Fixed effects estimates

 

720

2.150(0.277)

2.088(0.140)

2.181(0.139)

2.180(0.139)

 

721

0.802(0.345)

0.599(0.290)

0.593(0.288)

0.598(0.289)

 

722

-1.105(0.327)

-0.904(0.279)

-0.902(0.278)

-o.904(0.279)

 

723

0.886(0.327)

 

0.827(0.268)

0.826(0.267)

 

0.827(0.268)

 

Random effects estimates

 

2

819.366(113.622)

707.866(107.832)

711.025(98.197)

708.412(98.233)

 

 

 

 

 

 

 

 

CT
7110/41 for HLMZF) N.A. 16.940(15.402) 16.3266(13.436) 16.6429(5.256)
I21 N.A. 1.709(0.978) 1.77651(0.864) N.A.
(0.985 as earn.)
132 0.566(0.177) 0.178(0.128) 0.172(0.119) N.A.
A?! N.A. N.A. N.A. 0.102(0.0262)
Log-likelihood at - 639 . 22 ~ 632 . 505 ~ 632 . 502 - 632 . 504
convergence
# parameters estimated 6 8 8 7
I)evjance 1276.436 1265.010 1265.004 1265.008

 

 

 

 

 

We compare the results in column 3 for the full model (Bryk and Raduenbush’s

model) and those in column 5 for the reduced model (reduced by factor model). First, notice

that the point estimates and their estimates of the standard error for the ﬁxed effects are almost

the same for the two models. The estimates of level-1 error variance (0”) are also almost the

 

5 Note:

I. HLM2 results were obtained by setting Full maximum likelihood and # Fisher acceleration =1. When we set
the # Fisher acceleration =5. then the results in column 3 took 867 iterations to converge.

2. ( ) is the standard error.

3. The cutoff criterion for getting out of the Fisher scoring loop for HLMZF is ‘relative change in the squared
length of parameter estimates’ < 10’”. which is the same value as HLM2. though HLM2 uses changes in log-

likelihood.

 

(0'2) are also almost the same, but there is a small difference for the standard errors. For
the level-2 variance-covariances, we can recover the estimates of the original variance-
covariances if we compute them using the points estimates in the reduced model and the

covariance equation in Equation (5-8) for the original Tau-matrix. Thus:
in)?” = 0.10229 x 16.8429 = 1.72286,

21310?“ = 0.102292 x 16. 8429 = 0.176231.
These values closely match those in the second column of Table 5.2, i.e., I], = 1.709 and

522 = 0.178 respectively, which are the corresponding elements in the 2' matrix. This
means that we can recover the original I estimate, and this fact is especially useful when
we cannot directly obtain the estimate of r , which is often the case when 5 is not ﬁill
rank, as happened in this data when we used the Fisher scoring in HLM2F.

The comparison of the deviances for the nested models can be used to assess
model ﬁt using a likelihood ratio test. The deviances for the full model (1265.010) and
for the reduced model (1265.008) are basically the same. Thus, the deviance test clearly

shows that the reduced model is statistically acceptable.

-66-

5-3. Scholastic Aptitude Test (SAT) Meta-analysis of Coaching Effects

The purpose of this demonstration is to give an example where the 1 matrix is of
relatively large dimensions. When the size of r is large, we can formulate many different
factor structures, which adds a complexity to the modeling. At this point, discipline
speciﬁc theory and knowledge about particular data and variables should inform
decisions regarding the number of factors (M), and which elements of factor loading
matrix (A) to ﬁx and what values to use, etc. After deciding on the general framework of
the factor structure, mathematical knowledge of model identiﬁcation can be applied to
decide whether the model in mind is an identiﬁed model.

In the SAT meta-analysis for coaching effect data, we create a 4 x 4 1' matrix.
The data consist of 48 studies on coaching effects on the SAT scores (Becker, 1990).
Only 46 studies were analyzed for our analysis because two studies had no information
on coaching hours. The SAT scores were reported for the two subtests, SAT-Math (SAT-
M) and SAT-Verbal (SAT-V). Each study provides a part of the information on
standardized mean change scores between pretest and posttest on SAT-M, SAT-V for
coached and uncoached groups. A study is considered to be complete if it provides four
standardized mean change scores, i.e., standardized mean change score for the coached
group on SAT-M, standardized mean change score for the uncoached group on SAT-M,
standardized mean change score for the coached group on SAT-V, and standardized mean
change score for the uncoached group on SAT—V. Table 5.3 shows a number of studies

classiﬁed by available standardized mean change scores and whether a control group

-67-

(uncoached group) was used in the study. From that table, we see that only 19 studies,

which is about 42% of the studies, had a complete set of information.

Table 5.3 Classiﬁcation of SAT Studies

Existence of Control Group

 

 

 

 

 

 

Available Standardized Yes No Row Subtotal
Mean Change
Both SAT-M and SAT-V 19 2 21
SAT-M only 1 3 4
SAT-V only 13 8 21
Column Subtotal 33 13 Total 46

 

 

 

 

 

One of the strengths of a hierarchical analysis is that diverse patterns of data
information can be put together and all the information can be used for statistical
inference assuming that data are missing at random (Little and Rubin, 1989). This will be
the method of analysis used on the above meta-analysis data, which utilizes data from 27
studies (about 58 %) that are incomplete. Unless using a hierarchical model, information
from more than half of the studies can be totally discarded or it will be used less

effectively. In a small data set such as this meta-analysis data, it is too costly.

Model:

Let dy. be the standardized mean change score in the ith observation of the jth

study, which can be deﬁned as

-68-

4(n,.j—2) )7 —l_’.

21'] 11/
)

'7 = 4n”. — 5 SM (5-10)

where 172,]. is the sample post-test average for observation i of study j; 17“.]. is the sample

pre-test average for observation i of study j, S,” is the sample standard deviation of the

post test for observation i of study j which is assumed to well approximates the pooled

4("1/ ’2) for Y2); - Yw

sample standard deviation, and the coefﬁcient ——
4n”. — 5 S

is used as a
ray

multiplying factor to ensure that d ,.j is an unbiased estimator of the population

#217

standardized mean change 6,}. = —% (Becker, 1988), where p, ,1. is the population

post-test mean for observation i of study j, .111.)- is the population pre-test mean for

observation i of study j, and a is the population standard deviation that is common to

both pre-test and post-test.

Let X 1 be the indicator for SAT-M score for the coached group, X 2 be the
indicator for SAT-M score for the uncoached group, X 3 be the indicator for SAT-V score
for the coached group, and X 4 be the indicator for SAT-V score for the uncoached group.

Thus, X 1 ,.j = 1 if d0 is the observation of SAT-M score for coached group, and 0

otherwise; X 2 ,.j = 1 if d”. is the observation of SAT-M score for uncoached group, and 0

otherwise, X 3 = 1 if d0 is the observation of SAT-V score for coached group, and 0

ij

otherwise; X 4 ,j = 1 if d U is the observation of SAT-V score for uncoached group, and 0

otherwise. Deﬁning the variables as above, we formulate the level-l HLM model:

-69-

L-l:

d.)=A1X1y+ﬂszzg+ﬂstyj+,34,-X4.-j+5ya (5-11)
where i = 1,...pj for p]. is the number of observations of standard mean diﬁ’erence d”.
available for the study j, and j = 1,2, ....,46. Within the study, the error 8,)— ’s are correlated

because even within a study, sources of errors come from the same set of people. This is
the difference between this model and the standard HLM model where the level-1 errors
are assumed independent. The dependency of the within-study errors occurred because

the model is multivariate, which implies that the data have common sources of variation

within a study. The variances and covariances of the covariance matrix of the vector of
within-study error a}. = (8”,...,£ij. )T (VJ. = D[sj], a pj x pj symmetric matrix) can be

computed by the following formula (Becker, 1990):

 

2(1 " pp?) 502
+ a
. 2n .

.I J

Var(£,.j) a V,.]. = (5-12)

where n j is the number of subjects in the study j, and p,.,. is the population pretest-

posttest correlation. In this analysis, the value of p ,.P = 0.88 will be used as the
approximate population correlation between pre- and posttest SAT scores for both SAT-
M and SAT-V for all subjects, following DerSimonian and Laird (1983). Similarly the
covariances of the covariance matrix of the vector of within-study error can be computed

by the following formula (Becker, 1990).

 

6.6..-
Cov(a..a..)aVii.~=:" + ’zjzp’. (5-13)
I j

-70-

where p,,..j is the population correlation between the dependent variables associated with

estimated standardized mean differences i and i ' . This correlation may be estimated
from the sample, deduced from published test information or imputed on the basis of past
research. In the SAT meta-analysis case, we use pm}. = 0.66 if the pair of (i , i’) is Math
and Verbal in the same experimental group (either in the coached group or uncoached
group), which is taken from the test manual (Gleser & Olkin, 1994). If the pair of the
observation unit (i , i’ ) is in the different groups, i.e., coached and uncoahed group, p“,
= 0.00 since those observations are independent.

As we saw in the above, the within-study error variance-covariance are known in
the meta-analysis settings. In HLM terms, this type of model is said to be the V-known
model.

The level-2 model is a multivariate regression model and the time for coaching in

hours is used to model 6” and A}. , the means of standardized mean difference for SAT-

M and SAT-V for the coached groups.

L-2:
6U= 7,0 + 72,(Coaching Hours) j + u”.
62,. = rm + “2,- (5-13)
63]. = 720 + 721(Coaching Hours) J. + “31
64}. = y“, + u,1 j
where

-71-

7 (

ulj\ 0 711 712 T13 T14 \

“21' ”d 0 721 1'22 723 724
’

u3 j 0 731 r32 733 734 1

W41) \0 741 742 2'43 2'“)

 

 

 

 

and (Coaching Hours) 1 is the hours of coaching for the study j.

This model is substantively interesting and meaningful by two reasons: One is
that if we center Coaching Hours around zero as we just did in the Equation (5-13), then
the ﬁxed effect 710 is the expected standardized mean change for SAT-M for the coached
groups at the absence of coaching and thus y“, — 720 represents the expected difference
between coached and uncoached groups for SAT-M under no treatment (coaching),
which is a possible result because some of the studies were not randomized experiments.
Similar interpretation can be made for 730 — 7’40 , which is the expected difference
between coached and uncoached groups for SAT-V under no treatment. The second
reason is that If we center Coaching Hours around its grand mean, then
710 — he represents the mean gains in the experimental groups compared to the
standardized mean change of the control groups for SAT-M, and 730 — 740 is the mean
gains in the experimental groups compared to the standardized mean change of the
control groups for SAT-V, assuming that the model is correctly speciﬁed6.

To solve V-known HLM model, we capitalize the fact that the level-1 error

variance-covariance matrix V]. is known for all j; j = 1,2,...,], where J is the number of

 

6 Actually model misspeciﬁcation is possible. Kalaian and Raudenbush (1996) used the natural logarithmic
transformation of the hours of coaching, being concerned with a curvilinear relationship from the
observations of the scatterplots. Thus, including quadratic term may show a better ﬁt. Here I used a linear
model for simplicity of demonstrating the point.

-72-

studies that was used in Kalaian and Raudenbush (1996). That is, using Cholesky

. . _ 'r . . . . .
factorization, VJ. — Fij , where F]. is a p j x p 1 lower triangular posrtrve deﬁmte

matrix, we transform the level-1 within study model in Equation (5-11) by multiplying

F1." from the left. Then the transformed level-1 model has i.i.d. error variance for all the

studies with the ﬁxed unit variance. Therefore, we can apply HLM2 and HLM2F

software by ﬁxing the level-1 error variance (02) = l.

The results of this model by HLM2 is shown in the second column of Table 5.4.

The results for the same model by HLM2F was not obtained because the estimate of

‘1’ becomes negative deﬁnite during the early stage of scoring iteration. The estimate of

rwas

0.0775 0.0874
0.0874 0.1045
- 0.0360 - 0.0352
-0.0156 -0.0191

f:

and as a correlation,

1.000
. _ 0.972
p‘ " -O.562

-0307

0.972
1.000
- 0.472
- 0.324

-0.0360 -0.0156
-00352 -00191
, (5-14)
0.0532 0.0364
0.0364 0.0334
-O.562 -0.307
-0472 -0.324
. (5-15)
1.000 0.863
0.863 1.000

As we can see from the correlation estimates, 2° is almost singular and has two high

correlation blocks, which caused the slow convergence that is shown by the very large

number of iterations in HLM2 as 2911 EM iterations, and caused the inability of

obtaining the estimation by HLM2F which utilized Fisher scoring.

-73-

Table 5.4 Results of the Analysis for the SAT Coaching Effects Data7

 

# iterations until convergence

HLM2
(Full model)

2911

HLMZF

(Reduced model a)

13

HLMZF
(Reduced model b)

8

 

Fixed effects estimates

 

0.290(0.071)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

7m 0-317(0.0692) 0.284(0.0679)
y” 0.00716(0.000896) 0.00701(0.00110) 0.00760(0.00115)
720 0.263(0.0709) 0.254(0.0614) 0.253(0.0655)
73o 0-183(0.0517) 0.176(0.0459) 0.187(0.0537)
7“ 0.00467(0.00158) 0.00481(0.00091) 0.00455(0.00123)
y“, 0-140(0-0357) 0.1409(0.0321) 0.1417(0.0343)
Random effects estimates
0'2 N-A- N.A. N.A.
7,,(1/1” for HLMZF a& b) 0-0775(0-0241) 0.0867(0.0243) 0.0812(0.0208)
61 0.0874(0.0269) 11.4. 11.4.
722 0.104(0.0330) 11.11. "J.
23,062, for HLM2F a& b) '0-036010-0‘54) ~0o0269(0-0135) -0.0325(0.00893)
r32 -0.0352(0.0180) 11.11, "J.
133(sz for HLMZF a& b) 0.0532(0.0137) 0.0496(0.0117) 0:0536(0.00811)

 

{“0631 for l-ILMZF b)

-0.0156(0.0125)

N.A.

-0.0167(0.00673)

 

T42

-0.0191(0.0146)

NOA.

N.A.

 

T43(V/32 for HLM2F b)

0.0364(0.0102)

NOA.

0.0329(0.00325)

 

744(1/133 for HLM2F b)

0.0334(0.00958)

N.A.

0.0324(0.00464)

 

 

 

 

 

 

,1“ 11.4. 1.010(0.0541) 1.122(0.0717)
N.A. 0.817 0.0556 11.4.
’142 1 )
Log-likelihood at convergence '279-689 -307-693 -284.424
# parameters estimated 16 11 13
Deviance 559.378 615.386 568.848

 

 

 

 

 

7 Note
1. ( ) is the standard error

2. The cutoff criterion for getting out of the Fisher scoring loop for l-ILMZF is ‘relative change in the squared
length of parameter estimates’ < 10", which is the same value as HLM2. though HLM2 uses changes in log-

hkehhood.

-74-

 

From the pattern of f , we can imagine a simple pattern of factor structure. That

. T
rs, we formulate the factor model for level-2 error vector u j. = (an. , uzj , 11,]. , u“) as

 

 

”U = ’71)"
“21 =12lnlj’ (5-16)
“31 = 7b},
“41 = 427b,“-
or, in matrix form,
(uU\ l 0
_ “21 121 O [’70]
ll]. _ “3)- _ 0 1 Tbj _' A7719 (5’17)
(1141.) O ’142
l 0
0
where A = ’12 , and 77]. =[011]
1 772}
o 4..

The pattern of factor loading matrix A and the length of factor score vector 77!.

characterizes our idea about how the level-2 errors are correlated. That is, we hypothesize

that the 4 level-2 random errors (u j = (“1)- , u2j , u”. , u“. )T) after controlling for the

coaching time can be represented by two unique level-2 unit latent random errors
(77]. = (77,]. , 77217), and the SAT-M unique variability for coached and uncoahed groups
((uU , uzj )) can be explained by a common factor 77,}. , and the SAT-V variability for

coached and uncoahed groups ((“31 , u“. )) can be explained by a common factor 7721..

Formally, we formulate a HLM2 factor (HLM2F) model as follows:

-75-

d”. zﬂij +ﬂle2ij +14sz1; +6,ij +511
L-2:

6U= 7,0 + 7,,(Coaching Hours) j + 77,]. (5-18)

'62} =720 "72217711

63!. = 730 + y3,(Coaching Hours) j + 772].
’34} = 740 '1' 142772},

W11 W12

) . We refer to this model as the reduced
W 21 W22

where [77"] ~ N(0,\1’) for 111 =[

21
‘model a’.
Note that having speciﬁed the level-2 errors as Equation (5-16) or (5-17), we

structured the r covariance matrix as

W11 121W” W12 ’142W12
121W“ A§1W11 421W12 ’121’1'42W12
W21 ’142 W21 W22 ’10sz
142 W21 ’1'21’1'42 W21 142 W 22 1'12 W 22

D[uj] = AD[nj]Ar = MFA" = . (5-19)
Note also that ‘model a’ in Equation (5-18) is nested within the full model in
Equation (5-13). This fact can be shown in a similar way that I did for the 2 by 2 2'
matrix case for the infant vocabulary growth data in section 2 of Chapter 5 (see page 62).
Another way of ﬁnding the constraints on the 7 matrix in order for the reduced model
(model a) be nested within the full model is obtained by comparing Equation (5-19), the
expression of the structured I created by formulating the HLMZF model (model a), with

the unstructured and saturated r in Equation (5-13), which was formulated by HLM2.

-75-

Using this comparison and with a little algebra, we ﬁnd ﬁve constraints on the 2'
matrix:
711722 = 72211 733744 = 131231
73,133 = 21:11-43 1 73314, = 7437311 (5'20)
and 1,,2'332'42 = 721743731 -

Thus we obtain Equation (5-19) from the saturated 1 matrix in Equation (5-13) by

deﬁning
T 7
W11: 7111 421=—2'l_1 W22 = 7331 ’1'42 =i1 (5‘21)
711 2'33
and W21: 731

with these ﬁve constraints. Note that the ﬁrst two constraints in Equation (5-20)

correspond to the statements that the squared correlation between u, j and 212, equals to

one (p3,,2 = 1) and the squared correlation between u,, and u“. equals to one ( p3,", = 1).

The results of ‘model a’ is shown in the third column of Table 5.4. To test the
model ﬁt, we use a deviance statistic. The deviance test suggests that ‘model a’ does not
ﬁt as well as the full model, D, — D0 = 615.386 - 559.378 = 56.008 with 5 (16-11) d.f., P-
value < 0.0001 , where D, is the deviance for model a, and D0 is the deviance for HLM2
model.

The next step is to try to ﬁnd a statistically acceptable model. Since the ntunber of
parameters is reduced from 16 to 11 and the estimate of 7 matrix is almost insufﬁcient

rank, the number of parameters of a model that shows a good ﬁt must be between 11 and

16.

-77-

One model that satisﬁes this condition can be obtained by carefully looking at the
estimated tau matrix produced by the HLM2 full model. The estimated tau matrix with its
correlation matrix form reveals that the upper leﬁ 2 by 2 block matrix is closer to singular
that the lower right 2 by 2 block matrix because the upper has the correlation 0.972
whereas the lower’s one is 0.863. Based on this observation, we consider a model that

only u, j and u,, completely share the common variance. Thus we formulate the following

model:
L-l:
d1; = AJXHI +152.in01' (BUX‘M'I 1' 6,,X40. 1' 50'
L-2:
6, j: y“, + 7,,(C0aching Hours) j + 77,]. (5-22)
ﬂu = 720 + 42.171,-
631. = rm + 73,(C0aching Hours) 1 + 772,.
11’. j = 7... + 773,.
’71; W11 W12 W13
where 77,]. ~ N (031’) for ‘1’ = 11/2, 81,, 01,3 . We refer to this model as the
’73) W31 W32 W33

reduced ‘model b’. Similar argument that we did for ‘model a’ tells that ‘model b’ is
nested in the HLM2 model, and it is clear that ‘model a’ is a nested model in ‘model b’.

Note that the factor model that we used in ‘model b’ for the level-2 error vector

1‘.
uj=(u,,., u,,, u”, u“) rs

-78-

 

 

(up 1 0 0
7],.
”21' 421 0 0 I
ui_ 1131- — O 1 O ’72} "Anjr (5'23)
04,-) 0 o 1 ’7”
1 0 0
, 0 0 ”‘1'
whereA= 0 1 O ,andn,= 772,.
0 0 1 "31'

The results of ﬁtting ‘model b’ is in column 4 of Table 5.4. The deviance test
reveals that ‘model b’ does not ﬁt as well as the full model, but we marginally reject the
null hypothesis at 0.05 level because D2 — D0 = 568.848 - 559.378 = 9.470 with 3 (l6-
13)d.f., and the P-value is 0.024, where D2 is the deviance for model a, and D0 is the
deviance for HLM2 model.

Knowing that ‘model b’ is rejected though it is marginal, we still continue the
investigation of the model that ﬁts to the data. The number of parameters of the model
that ﬁts now must be between 13 and 16. One model that satisﬁes this condition is
speciﬁed by adding a speciﬁcity component to the model, which will be presented in
section 1 of Chapter 6. This model adds four speciﬁc variance parameters on ‘model a’,
and thus if all of these are statistically signiﬁcantly different from zero, the number of
parameters of the model is 15, which is one smaller that the HLM2 model. To ﬁt the
model that has unique speciﬁcity is a t0pic of future study whose model is formulated in
section 1 of Chapter 6.

Finally, I would mention that the results and substantial interpretation for the two

dimension HLM2 factor model are consistent with those of Kalaian & Raudenbush

-79-

(1996), which analyzed the studies that had the pair of SAT-M and SAT-V, where major
substantive conclusions were that coaching was more effective for SAT-M than SAT-V,
and the between-study unique effect of SAT-M and SAT-V was negatively correlated.
However, the current analysis provides a more precise information on what way study
variability emerged and it utilized all of the available studies, whereas Kalaian &
Raudenbush (1996)’s model required the existence of control group for each study, which

will result in 33 studies for this data set (see Table 5.3).

-30-

54. Results from the Simulated Data

The purpose of this analysis is not only to see whether the theory and the
computer program that I developed can recover the parameter values well, but also to
evaluate the capacity of the methodology to distinguish between true model and
alternative incorrect models in the context of large data application. To answer these
questions, I generate the artiﬁcial data with known parameters.

We consider a situation in which subtests nested within students that was already
stated in section 3 of Chapter 3. To summarize the settings, a test consists of 4 subtests,
mathl, math2, verball, and verbalZ, and each student, the total of 100 students, takes one
form of the test at the ﬁrst testing occasion and takes a parallel alternative form of the test
at the second testing occasion. No missing observations were assumed. Thus we created a
situation that can be conceived as subtests are nested within students. Each student has 8
observations for 100 students, with the total observations of 800. We assumed that true
scores of subtests for the same domain such as mathl and math2, and verball and
verba12, are perfectly correlated and also assumed that the correlation between
mathematical and verbal proﬁciency was 0.50. Therefore, non-orthogonal two factors,
mathematical and verbal proﬁciency were considered.

The model was formulated as in Equations (3-29) and (3-31), and the parameters
in the model were setas 7.0 =720 =730 =74, =500, 0'2 = 25, 1., =0.8 2., =12,
01,, = 100 , 1/122 = 100 , W12 = 50. It should be noted that the model is an identiﬁed model

as shown in section 3 of Chapter 3.

-31-

Checking the Data that are generated:
We generate the 800 observations in total, 2 observations from each subtest and

thus 8 observations for each subject. The descriptive statistics for this data is in the

 

following tables.
Table 5.5 Descriptive Statistics of 4 Generated Outcomes
The MEANS Procedure
Variable N Mean Std Dev Minimu- Maxilun
matht 200 500.7896272 10.7851935 472.3574900 536.1797700
math2 200 500.0342420 9.1171437 468.6287800 523.6194500
verb1 200 499.2721306 11.5485774 475.0847300 528.3763000
verb2 200 499.4860767 13.2175895 466.3821800 535.9121100

 

To see whether the generated data are reasonable with regards to the means and

the standard devratrons, we let y”, a 7018f! , yzy- E- Jig-(,2, =1 . 3’3.)- 5 ymxw-l . and
y“, s y'jl’w“ . Then srnce y”, =ﬂ, +5.). )5.) =13),- +6‘,-,-. 3’3.)- =.33,- +31), and

.741} :16” +3111 we have E(71g)=710 =5001 E(yzij)=720 =5003 E(y3ii)‘= 730 =5001

and E(yw) = no = 500 for the respective means, and

s.d.(y,,,.) = W: W: We 11.18,

s.d.(y,,.)= W: W: J15 5943,
s.d.(y,,)=W=W=./iz_sslns,and

s.d.(y,,j) = W = W = «[169 513.00 for each standard deviation.

Comparing these population means and standard deviations with the

corresponding sample means and the standard deviations in Table 5.4, we recognize that

-82-

the sample statistics of the means and the standard deviations are close to the populations

values.

be seen in Table 5.6 or as a correlation matrix in Table 5.7.

For the second moment property, the sample variance and covariance matrix can

Table 5.6 Sample Covariance Matrix of 4 Generated Outcomes

Covariance Matrix, DF = 199

natht math2 verb1 verbz
matn1 116.3203994 74.5273374 47.8142300 59.2826475
math2 74.5273374 83.122308? 29.7039157 41.4890456
verb1 47.8142300 29.7039157 133.3696391 128.1651858
verbz 59.2826475 41.4890456 128.1651658 174.7046711

Table 5.7 Sample Correlation Matrix of 4 Generated Outcomes

Pearson Correlation Coefficients, N s 200

Prob > |r| under H0: Rho=0

nath1 natnz verbt verb2

math1 1.00000 0.75793 0.38388 0.41588

<.0001 <.0001 <.0001

math2 0.75793 1.00000 0.28212 0.34429

<.0001 <.0001 <.0001

verb1 0.38388 0.28212 1.00000 0.83983

<.0001 <.0001 <.0001

verb2 0.41586 0.34429 0.83963 1.00000

<.0001 <.0001 <.0001
The population covariance matrix can be represented as
y. 14. +02 11,,” (11,, 1,12,, 125 80 50 60
D yr _ ’11W11 ’112W11‘1'0'2 4W1: ’h’lrWn = 80 89 40 48 ,

y. 1.1., 4,111,, (11,, +02 2,01,, 50 40 125 120
_ y4 _ ‘ 12le ’11’1'1W12 12sz 122sz +02 60 48 120 169

 

 

-33-

and as a correlation matrix,

FY11 l

 

 

Comparing these expected values with the sample values, the generated data are
again reasonable.
Finally we check the distribution of each observed variable, y, , y2 , y3 , and y4 ,

using the Shapiro-Wilk test for normality. The results are in Table 5.8.

Table 5.8 Shapiro-Wilk Test for Normality for the Generated Outcomes

y2 0.758
p y, ‘ 0.400 0379 1 0.826 '
y, _ 0.413 0391 0.826 1

0.758 0.400 0.413
1 0379 0391

 

 

 

 

 

 

Outcome Shapiro-Wilk Statistic P-value
(W) (Pr. < W)

Math 1 0 . 9958 0 . 8432

Math 2 0 . 9936 0 . 5480

Verbal 1 0.9881 0.0918

Verbal 2 0 . 9919 0 . 3347

 

 

 

All of the P-values computed from the Shapiro-Wilk statistic are greater than .05.
That implies that each outcome can be considered to have a normal distribution, which

we expect because each outcome was generated from the sum of two independent normal

variates.

The above results all indicate that the data were generated accurately as we

speciﬁed in the model.

-34-

 

ﬁll-15.8

Now we move to the analysis. Here we formulate a series of identiﬁed models
discussed in section 3 of Chapter 3, and ﬁt those series of models that only differ in the
level-2 variance-covariance structure to the generated data. Thus, all of the following four
models, model 0, model 1, model 3, and model 4, have the same level-l model as written
in Equation (3 -29). The difference comes in at the level-2 model error structure. The

standard HLM2 expresses the level-2 model for this artiﬁcial data as,

 

 

L-2:
161} = 710 1' “11
6 . =7 +u
2, 20 21 (5_24)
ﬂ3j = 730 + “3}
ﬂu = 740 '1' “4}
where
“U K 0 711 7,2 713 714 \
“27' TN 0 712 2'22 723 2’2,
“31 O 713 1'23 1'33 73,
“41 K O 714 724 2'3, 744 1
Or in matrix notation,
6j=Wj7+uj, u]. ~N(0,r) (5-25)

V T
Where16j=(ﬂj1 18219 ﬂip 164;)1szl417=(711 721 731 74)1and

T
uj=(u,j, u,,, u,,, “41):

All of the models deal with factoring u j and they can be written as in the matrix

form

-35-

u j = A17]. (5-26)

in general. The covariance structure can be written as
r = A‘I’AT. (527)
Thus, structuring the 7 matrix and adding restrictions on it, all the models can be derived

from the standard HLM2 model, which I call here “model 0”. That means that all of the

following models are nested within model 0.

Model 0:
Model 0 can be speciﬁed by a standard HLM model, but if we use the HLMZF
formulation, this model can be expressed by setting A as 4 x 4 identity matrix, i.e.,
T
A =1, and by setting 77,. = (77,], 772,, 773,, 774,.) , and thus by setting ‘1’ as a

4 x 4 symmetric variance-covariance matrix.

Model 1:

Model 1 is the model from which we generated the data. Therefore, it is the model

1
that should best ﬁt to the data. Model 1 is obtained by setting A = g
0

0
O
l 9
12
T
77,. = (71),, 772,.) in Equation (5-26), and ‘1’ as a 2 x 2 variance-covariance matrix in

Equation (5-27).

-86-

Note that model 1 is nested within model 0, which was already shown at section 3
of Chapter 5 when we analyzed the SAT meta-analysis of coaching effects data (see

Equation (5-17).).

Model 2:

Since model 2 was an unidentiﬁed as shown in Section 3 of Chapter 3, it will be

omitted from the analysis.

Model 3:

0
. . . ,1 T
Mode13 rs speciﬁed by settrng A: ‘ , 77,. =(77U, 772,.) and ‘I’ asa

31o}:—

1
11

2 x 2 variance-covariance matrix. Note that model 3 is nested within model 0, and then,
model 1 is nested within model 3. The fact that model 3 is nested within model 0 can be
shown in a similar way that I did for the SAT meta-analysis of coaching effect data in
section 3 of Chapter 5 by comparing the structured r with the unstructured r. The fact
that model 1 is nested within model 3 can be easily obtained by constraining A, = 0 and

,1, =0.

-37-

Model 4:

Model 4 is speciﬁed by setting A = , 77, = (77,1) and ‘I’ as a 1x lvariance-

,1,
32

covariance matrix (scalar). Note that model 4 is nested within model 1 as shown in
section 3 of Chapter 3 and thus it is the most restrictive model.

To summarize, a series of 4 models that are formulated above has the following
nested models structure, Model 4 c Model 1 c Model 3 c: Model 0 (‘ c ’ reads ‘nested
within’). The model ntunber, its characteristics, and the number of parameters estimated

in the 2' matrix is summarized in Table 5.9.

Table 5.9. Characteristics of the Speciﬁed Modelsl

 

 

 

 

 

 

Model Model Characteristics # parameters estimated in the
Tau matrix
0 Saturated HLM2 model 4(5)/2 = 10
4 factors
3 2 factors cross loading 4 + 3 = 7
mis-speciﬁed model
1 2 factors simple loading 2 + 3 = 5
correct model
4 1 factor 3 + l = 4
mis-speciﬁed model

 

 

 

 

Results:

The results are shown in Table 5.10. The estimates of ﬁxed effects are very

similar across all the models including the standard errors and all the 95% conﬁdence

 

' Note: Model 4 c Model 1 c Model 3 c Model 0.

-33-

intervals captures the true values, i.e., 7,0 = 720 = y}, = 7,0 = 500. The large number of

iteration of Model 0 indicates that the level-2 variance-covariance is near singular. It was

estimated by HLM2 as:

124.3 00 98.940 48.818 65.533
.. _ 98.940 81.295 40.195 49.139
T — 48.818 40.195 94.183 1 12.590 ’

65533 49.139 112590 143.791
and as a correlation matrix,

1.000 0.984 0.451 0.490
0.984 1.000 0.459 0.454
0.451 0.459 1.000 0.967 ’
0.490 0.454 0.967 1.000

P5:

which supports near singularity argument.
The results for Model 1 represent that the estimates capture all of the true

parameters in the model within the range of 95% conﬁdence intervals.

-39-

Table 5.10. Results of the Analysis for the Simulated Data by Four Models2

 

# iterations until convergence

Model 0
1309

Model 1

4

Model 3
6

Model 4
5

 

Fixed effects estimates

 

499.225(1.164)

499.225(1.166)

499.225(1.013)

499.225(1.045)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2: 499.055(0.960) 499.055(0.963) 499.055(0.655) 499.055(0.693)
730 500.200(1.028) 500.2(1.021) 500.2(1.092) 500.2(0.964)
74o 500.910(1.246) 500.91(1.243) 500.91(1.273) 500.91(1.166)
Factor Loading estimates

,1, 11.4. 0.80569(0.0407) 0.66261(0.0472) 0.79521(0.0676)
,1, 11.4. 1.24466(0.0562) 1.15753(0.0556) 11.4.

A; 11.4. 11.4. 0.05610(0.0438) 1.15745(0.0695)
,1, 11.4. 11.4. -0.0903(0.0493) 11.4.

,1, 11.4. 11.4. 11.4. 0.61993(0.0745)

Random effects estimates

02 23.609(1.669) 25.3519(1.462) 24.0167(1.365) 57.8552(3.091)
7,, 124.300(19.266) 11.4. 11.4. 11.4.

2.2, 96.940(14.967) 11.4. 11.4. .11.4.

r,, 81 .295(13.193) 11.4. 11.4. 11.4.

7,, 48.616(12.965) 11.4. 11.4. 11.4.

Tn 40.195(10.716) 11.4. 11.4. 11.4.

733 94.163(15.012) 11.4. 11.4. 11.4.

7,, 65.532(15.960) 11.4. 11.4. 11.4.

,4? 49.139(13.000) 11.4. 11.4. 11.4.

7,, 112.590(17.079) 11.4. 11.4. 11.4.

7,, 143.792(22.020) 11.4. 11.4. 11.4.

v,” 11.4. 123.201 (14.283) 90.5223(11.071) 60.2666(12.432)
W12 11.4. 50.4351(8.190) 46.2939(7.666) 11.4.

W22 11.4. 91.5432(10.901) 107.166(12.766) 11.4.

Log-likelihood 2704.02 -2707.09 -2706.93 -2660.67
# parameters estimated ‘5 1° ‘2 9
Deviance 5408. 032 5414.18 5413.86 5761 .74

 

 

 

 

 

 

2
Note:

1. Cutoff criterion for getting out of the Fisher scoring loop for HLM2F is ‘relative change in the squared
length of parameter estimates’ < 10", which is the same value as HLM2, though HLM2 used changes in

log-likelihood).
2. ( ) is the standard error
3.

The saturated model was tried by HLM2F. but during the iteration. ‘1’ estimate became negative

deﬁnite.

4. Model 2 was unidentiﬁed and thus was excluded from the analysis.

-90-

 

We focus on the model ﬁt by comparing the deviance statistics. Since all the
models are nested, we can perform the deviance test. Since the order of nesting of the
models is Model 4 c Model 1 c Model 3 c Model 0, we perform the deviance test as in

the table below.

Table 5.11. Several Results for the Tests of the Model Fit

 

 

 

 

 

 

 

Test Deviance d.f. P-value

Model 3 against Model 0 5.83 4 p = 0.212
(5413.86 - 5408.03) (15-11)

Model 1 against Model 0 6.15 6 p = 0.407
(5414.18 - 5408.03) (15 - 9)

Model 4 against Model 0 353.71 7 p < 0.001
(5761.74 - 5408.03) (15 - 8)

Model 1 against Model 3 0.32 2 p = 0.852
(5414.18 - 5413.86) (11 - 9)

Model 4 against Model 1 347.56 1 p < 0.001
(5761.74 - 5414.18) (9 - 8)

 

 

 

 

 

From the table, the deviance test identiﬁes that model 1 is the most appropriate
model among the 4 models, which is the model that generated the data.

Finally, in order to evaluate how good the method of estimation shown in Chapter
4 is, 1000 data sets were generated from the model in Equation (3-29) on page 41 and in
Equation (3-31) on page 42 and parameter values on page 81. They were analyzed by

Model 1, the correct model (see page 43 and page 86).

A 95 % conﬁdence interval on 6, where 19 is a generic symbol for any one of the

parameters in the model, was constructed for each data set by the form, 19 i 1.96s. 6(9) ,

where s. e.(é) is the estimated standard error for 61 , which was obtained either from the

-91-

method of generalized least squares for the ﬁxed effects parameters (see Equation (4-17)

on page 54) or from the diagonal element of the information matrix (see Equation (4-15)
on page 53). The value of the multiplier of s.e.(é) was chosen as 1.96, which assumes

that 61 is normally distributed; this assumption can be justiﬁed by a reasonable conjecture
that since we have a relatively large number of repetitions (1000 times), the normal
approximation was appropriate.

Let p be the probability that the conﬁdence interval covers the true parameter
value. For each of the 1000 samples a conﬁdence interval was constructed by the above
method. We estimate p by the proportion of coverages among the 1000 repetitions, and
denote this by 13. For example, for a ﬁxed effect parameter 7,, , 945 times out of 1000
repetitions the interval captured the true parameter value (7,0 = 500) in the interval.
Therefore, the estimated coverage probability of the interval, p , is 0.945. Similarly, the

estimated coverage probability was computed for the other eight parameters in the model

and is represented in the second column of Table 5.12.

-92-

Table 5.12. Estimated Coverage Probability and Its Conﬁdence Interval for 1000

 

     
   
    

 

 

 

 

 

 

 

 

Simulations
Parameter Estimated coverage probability A 95% conﬁdence interval on p
(15)

(0.931, 0.959)

0.950 (0.936, 0.964)

0.942 (0.927, 0.957)

0.949 (0.935, 0.963)

1,, 0.938 (0.923, 0.953)
4,, 0.939 (0.924, 0.954)
([1,, H 0.945 (0.931, 0.959)
1,12, 0.946 (0.932, 0.960)
1,122 H 0.943 (0.929, 0.957)

 

 

 

 

Based on the point estimate of the coverage probability ( p ), we can compute an

.. ,_ ,.
approximate 95% conﬁdence interval on p by [3 i1.96,’&n—p—) , where n is the number

of replications. Thus, in our case, n = 1000. Then, for example, an approximate 95%
conﬁdence interval on p for 7,0 is

(0.945 1- 1.96 - 0.0072) 5 (0.945 i 0.014) = (0931,0959) .

An approximate 95% conﬁdence interval on p for the other eight parameters in the

model was computed in a similar way and is shown in the third column of Table 5.12.

-93-

The average estimated coverage probability of the conﬁdence interval is slightly
less than 0.95, with each estimated coverage probability being close to this value. All of
the 95% conﬁdence intervals on p for every parameter captured the true value, 0.95.
These results add credibility to the method developed in this dissertation estimating
parameters and standard errors, when the cluster level (level-2) sample size is large

enough and the parameter values are not at the boundary of the parameter space.

-94-

Chapter 6. Conclusions and Future Directions

6- 1 . Conclusions
In this dissertation, a method that incorporates factor analysis into the level-2
variance-covariance matrix of the hierarchical linear model was discussed. This approach

provides several contributions, methodologically and substantively.

6-1-1. Methodological contributions

HLMZF avoids excess simpliﬁcation of level-2 variance-covariance matrix when
some pair(s) of random effects have high correlation. If we specify many regression
coefﬁcients as random using the standard HLM, with maximum likelihood (ml) estimated
via the EM algorithm, the number of iterations to obtain the estimates gets too large.
However, as long as parameter estimates remain in the interior of the parameter space, the
increase in the log-likelihood is assured and we can expect convergence in a reasonable
amount of time.

Fisher scoring, when combined with EM, can accelerate convergence quite
dramatically. When Fisher scoring fails to converge within the parameter space or it fails
to show much improvement on each iteration, a possible cause should be that parameter
estimates are close to the boundary of the parameter space. Within this boundary
estimation problem, there are two possibilities. One is that some variance estimates are
close to zero and the other is that some correlations are close to 1 or -1. The former
problem can be easily solved by setting the corresponding random level-2 error to zero.

One caution needs to be noted, however. Since setting one random level-2 error to zero

-95-

implies that all the covariances related to this random error as well as the variance be
zero, we need to check whether we can drop the term by executing a multiparameter test
such as a deviance test.

Solving the latter problem is one of the main topics of this dissertation. That is,
the problem of either failure of convergence or slow convergence, caused by correlations
which are close to 1 or -1, was solved by using factor analysis at level-2.

There is another methodological contribution that considering factor analysis in
the level-2 variance-covariance matrix provides. That is, this methodology offers a
natural framework for modeling. If we don’t have a strong theory or prior knowledge to
make regression coefﬁcients ﬁxed, we want to make as many of these coefﬁcients as
possible be random so that the model is more general such that it allows the regression
coefﬁcients to vary among level-2 cluster units. However, if these random coeﬁicients
are highly correlated, the HLM2 methodology fails under Fisher scoring or gives a very
slow convergence under EM algorithm, even when variance estimates of each random
coefficient differ from zero. In this situation, HLMZF is useful because it constrains
covariances, whose estimates have come close to the boundary using factor analysis,
while allowing variance to differ from zero. Factor analysis achieves this purpose by
decomposing the original covariance matrix into a factor loading matrix and a smaller
covariance matrix of factors.

Bryk and Raudenbush (1992) provided a guideline of level-1 model building in
their Chapter 9 (pp. 201-204). The key issue of the procedure that they discussed was

whether we ﬁx the level-l regression slope or make it random. In this regard, they

-96-

suggested using statistical evidence such as point estimates, univariate 12 tests, and
multivariate deviance tests. Also they mentioned that low estimated reliabilities and the
slow rate of convergence for the EM were useful indicators, and diagnostic themselves,
for respecifying a random level-l coefﬁcient as either ﬁxed or nonrandonrly varying.

Even after polishing the level-1 model using the above recommended treatment,
the slow rate of convergence for the EM may still occur because of highly correlated
random coefﬁcients as we have seen in the examples in Chapter 5. Therefore, I propose a
slight modiﬁcation of the level-1 model building procedure. That is,

1) Make as many level-1 regression coefﬁcients random as needed, based on

evidence of non-zero variance in the coefficients;

2) If the rate of convergence is very slow, then consider using the HLM2F model.
In terms of specifying factor structure, employ substantive theory of the ﬁeld in
question as well as the estimated correlation structure of f , if it is obtained.

3) Further, a more active use of the HLMZF model would be, when a researcher
wants to test his/her theory regarding the simultaneous variability of the
random effects among the level-2 units, to specify this theory by HLM2F
regarding how random effects are related.

This procedure solves a key dilemma that most of the analysts who use multilevel
modeling encounter. That is, when analysts don’t have a strong theory or prior knowledge
to make regression coefﬁcients ﬁxed, they want to make as many of these coefﬁcients
random as is possible, but often the data don’t have enough information to estimate all of

the parameters. The procedure that analysts use to deal with this dilemma is to ﬁx some

-97-

of the coefﬁcients that are not their focus, even when they suspect that there is no reason
that those coefficients shouldn’t vary among level-2 clusters (Bryk & Frank, 1991).

HLM2F is a solution of this dilemma.

6-1-2. Substantive contributions
Expanding modeling possibilities

I would argue that the HLMZF methodology will contribute to educational
research because this model certainly expands modeling possibilities, while maintaining
natural continuity with the standard HLM2. This statement consists of three claims:

recovery of f , improvement over current ad hoc procedures, and view of current ad hoc

procedures from the HLM2F framework.

Recovery of 2‘

Suppose a researcher wants to study many random effects simultaneously but
he/she cannot obtain a stable estimate of 7 because of limitations of both the data and the
modeling framework. Suppose, however, he/she can run the HLM2F model by reducing

the dimensionality of r. In this situation, the researcher can recover the 2' estimate by

constructing it from the estimates of A and ‘1‘ by f = A‘I’AT. This certainly gives the
educational researchers an opportunity to study the large 7 -matrix from the limited data

he/she has at hand if some of the random effects in fact go together.

-98-

Improvement over the current ad hoc procedure

The second claim is that the HLMZF model represents an improvement over the
ad hoc procedures currently practiced by educational researchers when they encounter the
above situation, i.e., when they want to make many regression coefﬁcients random, but
HLM2 does not allow for this. Let’s consider the current practice of the researchers who
are in this dilemma. To illustrate this claim speciﬁcally, recall the Equation (2-1-1), the
level-1 model formulated in a hypothetical school effectiveness study.

L-l:

iid
Y}, = 60,. +ﬂjxw +6,sz, +63jx3y. +6,wa +6565”. +6666”. +80, 8,]. ~N(0,0’2)
At level 2, the researcher formulated the model in which all the level-1
coefﬁcients vary randomly among schools aﬁer being accounted for by a certain school

characteristic W]. because he/she thought that there is no reason to ﬁx some of the

coefﬁcients.
L-2:

A); =700 +701Wj +u0j
A)“ ==710 +711Wj +qu
182} =720 +721Wj +u2j
#1} =730 +731Wj +u3j
A} =740 +741Wj +u4j
165} =750 +751Wj +qu
166} =760 +761Wj +u6j

where u j. = (uo, , u, ,. , uzj , u,,, u,,, u,, )T is assumed to be distributed as multivariate normal

with a mean of 0 and a covariance of r , a 7 x 7 symmetric matrix. This is a fairly large

-99-

model that has 28 unique parameters in the r matrix. Such a matrix is generally
estimable only if the number of level-l units is quite large.

If the level-1 sample size is not large, the problem that the researcher faces is that
the data are insufﬁcient to estimate all the variances and covariances. Following the
current practice, the researcher ﬁxes the slopes that are not central to his/her research
question, and let the slopes of interest be random; thereafter he/she writes the report. This
practice is understandable, but we know that it is not the best way. It is a compromise
forced by the data and the limitation of the model.

However, suppose a second researcher with a different theoretical focus analyzes
the same data with a different 1' speciﬁcation, producing different results. There could be
no way to evaluate the adequacy of the tWo summaries of evidence.

Using the HLM2F approach, both researchers could estimate a full 7 by 7 2' and
produce identical results. Alternative interpretations could then be evaluated in light of a
common summary of evidence.

My claim is that even with the same data, we can do a better job by improving the
modeling practice, i.e., formulating the HLM2F model with some a priori substantive
information or insight that some of the random slopes are highly correlated. Also, when
we think about the consequences that the report can produce when used for decision

making of educational policy, the impacts that the results induce can be substantial.

-100-

View of the current ad hoc procedure from the HLM2F framework
The third claim goes to the fact that the current practice also can be considered as
a special case of the HLM2F model. For example, suppose the researcher’s interest is on

60, and 6,. Then, the model the researcher would use is

A); =700 +701Wj +qu
A] =710 +711Wj +qu
1621 =720 +721Wj
A} =730 +731Wj
ﬂu =740 +741Wj
165)- =750 +751Wj
ﬂ..- =rw+r..W,--

This model actually is equivalent to HLM2F model with

(um) (I 0)

“U
u,,

114,-

OOOOO

qu

Kuej/ KO 0)

 

 

 

 

which is a case in which the researcher used two latent factors and assigned 1 to (1,1) and
(2,2) elements, and 0’s to all the rest of the elements of A . Thus, the current practice is a
part of the HLM2F model framework, where the researcher naively determined the
number of factors as two and the values of factor loading matrix as 1’s and 0’s in the
speciﬁed position as in the above. If we use the HLM2F model, we can allow the
elements of A to be unknown, where the current practice ﬁxed these at zero. Thus, the

use of HLM2F not only can solve the dilemma that researchers currently face deciding

-101-

which slopes they should ﬁx, but it also reminds the researchers of alternative modeling
possibilities. Therefore, the HLM2F model not only expands the ﬂexibility of modeling,
but also can reﬂect the researcher’s substantive knowledge of the model. This, in turn,
means that the HLM2F modeling requires more consideration of substantive theory

because the researcher does have to say which element of the level-2 error u, ”go

together.” Thus, in HLM2F, it is especially important to note that the decision of
specifying a factor structure must be made based on interplay between substantive

theories and statistical methods.

Substantive integpretation of latent variables

The second contribution of the HLM2F model to educational research from a
substantive standpoint involves interpretation of the latent variables which are used at
level-2 in HLM2F. HLM2F provides an opportunity to link level-2 random errors which
share common latent variables. Those latent variables may be meaningful substantively
and can be interpreted, as we have seen in the infant vocabulary growth example, where
we found that one and only one latent variable was necessary to describe the differential
growth pattern of vocabulary of young children. This result may reﬁne the current
existing theory on infant vocabulary growth or motivate the researchers to create a new

substantive theory.

-102-

Practical integpretabilig of the results

The third substantive contribution would have to do with interpretation of the
results from a practical viewpoint. When we have many intercorrelated random effects,
HLM2 results are hard to interpret practically, even in the case when a large Tau matrix is
successfully estimated. It is especially difficult to interpret many covariance parameters
simultaneously. Therefore we often interpret only the variance components, and may
overlook important evidence of shared variance. HLM2F offers an opportunity for
analysts to carefully look at the off-diagonal covariance terms to detect the shared
variance and to explore latent variables which might have important meanings. If it’s
possible to make a concise summary by reducing the dimensionality and by using latent
variables, a picture of what kind of factors inﬂuence the variability among the level-2
units then emerges. HLM2F provides this possibility.

Overall, the proposed HLM2F model creates new ﬂexibility by integrating two
major statistical methodologies, HLM and conﬁrmatory factor analysis. In terms of how

to use theory and evidence to formulate models, more work needs to be undertaken.

6-1-3. U_nsolved Methodological Problems
Conﬁrmation of global maximum

When maximum likelihood is chosen as a method of estimation, there is always a
possibility that the likelihood function has multiple maxirna, or a ﬂat likelihood in the
neighborhood of the maximum likelihood estimate. Thus, it is a good idea to graph the

likelihood or the log-likelihood to detect this problem. However, this is difficult since we

-103-

have many parameters and we want to sketch the log-likelihood curve, which is a
function of many parameters, while the standard graph is limited to three-dimensions. To
address this issue, one might consider using the proﬁle likelihood. This method is
appealing because we can visually see the shape of the likelihood as the function of the
target parameter by taking into account other estimates of parameters. The proﬁle
likelihood can be deﬁned (see for example, Gorthwaite, Jolliffe, & Jones, 1995) as
follows:

Deﬁnition: Proﬁle likelihood

Consider a vector of parameters 19 = (61 , 192) , with likelihood function L(6l , 61,;y),
where y is an observed vector which comes from a density fY (y; 19). Suppose that 03,, is

the MLE of 62 for a given value of 61 , then the proﬁle likelihood for 61 is L(6l , 612,;y).

The method of computing the proﬁle likelihood involves the following steps:

1) Fix a parameter a , in which you are interested, to some speciﬁc value, e.g., 61(0).

2) Obtain the MLE of (92 , conditional on the ﬁxed value of the target parameter, 6;.

3) Compute the likelihood L(6, , 0, ; y) by plugging those values in the likelihood formula,
i.e., obtain L09, = 6(‘°’,0, = é,,;y).

4) Change the value of the target parameter 61 and repeat the steps 1) ~ 3) for all possible

6).

5) Draw the proﬁle likelihood curve against 61.

-104-

One technical difﬁculty arises when we consider the implementation of this
methodology. That is, in step 2, we need to compute the MLE for a given value of 61. If
62 involves a variance component and we choose a bad value for 6 , Fisher scoring fails

by providing a negative estimate. In our HLM2F model in Equation (3-9) (or Equation
(A-l) in Appendix), 6 = (7, 05) ,where ¢ = (2., 1/1, 0'2 ) (see Equation (A-9)). Suppose, for
example, 21 = (21,4, )T and (u = (1,11,, , 1,112, , W22 )7. Suppose also that we are interested in
the behavior of 1//,, . Then, in this case, 6 = 01,, , and 62 = (77, ’1’, W211 ((122) . To compute
the proﬁle likelihood of 1,11,, , we compute the MLE of 62 given (11”. But if we ﬁx W11 at

an unlikely value, then Fisher scoring produces an unacceptable MLE of 62 , e. g., W22
becomes a negative value, etc. Thus, as long as we use Fisher scoring, we will face this
problem in computing the proﬁle likelihood.

A solution for this might be to use the EM algorithm. The EM algorithm assures
convergence to a local maximum and thus we can compute the proﬁle likelihood at any
point evaluated, at least, at the local MLE. This certainly motivates one to develop an EM
algorithm for the HLM2F model, but whether it is the global maximum or not is still in
question.

A non-graphical way of checking global maximum was developed by Gan and

Jiang (1999). Their claim is that a global maximizer would satisfy

ac 662 “’ ’

which is consistent with the property that at the global MLE, an equality to derive the

Cramer-Rao lower bound should be satisﬁed,

-105-

at 2 0‘”!
E4126) +£43.79)”:

where I is the log-likelihood function, 6 is an unknown parameter and 60 is the true 6.
Based on the observation on the above approximate equality, they developed a large-
sarnple test that is practically usable. Their Monte Carlo studies on distributions that
involve local maxima, including a normal mixture distribution, showed that the observed
signiﬁcance level was close to a level and the power was very high in a sample size of

500 in a simple random sampling.

Identiﬁcation

Necessary and sufﬁcient conditions for model identiﬁcation are still an unsolved
question when we use the factor analytic model. There are no sufﬁcient conditions
known. In practice, what we can do now is, once a model satisﬁes the necessary
conditions, try to run the software and see if the software can produce the estimates. Then
we might try different starting values for the estimates of the parameters and see if the
software can produce the same estimates. If it does, we can have a little conﬁdence that
the model is identiﬁed, though it is not mathematically proved. This practice is based on
the concept of local identiﬁcation, which means that the information matrix is invertible
at the MLE.

Wald (1950) provided an alternative sufficient condition for local identiﬁcation
using a Jacobian determinant which is known as Wald’s rank rule (Bollen, 1989). The

covariance equations that need to be solved are non-linear simultaneous equations in

-106-

terms of parameters. A general way of solving non-linear equations is an iterative
algorithm using the Jacobian of the simultaneous equations. Wald’s suggestion is that if
the Jacobian has non-zero determinant at the value we plugged in to the equations, then
the parameter is identiﬁed at that value. Since these days symbolic computation of a
determinant is possible using a software package such as Mathematica (Wolfran (1991)),
Wald’s suggestion could be extended to a statement of necessary and sufficient condition
of model identiﬁcation, such that if the Jacobian determinant is not zero at the arbitrary

point in the parameter space, then the model is identiﬁed, and vice versa.

6-2. Future Directions

The issues of conﬁrmation of global maximum and model identiﬁcation, which
were mentioned in the section of unsolved problems, are more general statistical
problems that apply to many other models than HLM2F which I developed in this
dissertation. Clearly more studies on these topics are needed to solve these problems.

In this section, I mainly discuss extensions of the HLM2F model. Since correlated
variables are common in social and behavioral sciences, a variety of the extensions of
HLM2F can be considered. Those include:

6. Including a speciﬁcity parameter;

b. Multivariate hierarchical linear model with factor structure (MHLMF), where factor
analysis is applied to the multivariate HLM model;

c. Nonlinear hierarchical generalized linear model with factor structure (HGLMF), where

factor analysis is applied to the two-level hierarchical generalized linear model

-107-

(HGLM). When the outcome variables are dichotomous responses to test items, it can
be shown that HGLMF is equivalent to a multi-dimensional two-parameter (2-P) IRT

model. This is a conﬁrmatory item factor analysis.

6-2-1. Including smciﬁcig parameters

Including speciﬁcity parameters into HLM2F is straight forward and is useful. As
I suggested in section 5.3, this model is immediately applicable to the SAT meta-analysis
data.
Mel:
L-l:

Y, =X,6, +r,, (6-2-1.1)
where Y,isa n, x lvector, X,isa n, x R matrix, 6, isa Rxl vector, and r,isa 11, x1
vector, and

r, ~ N (0,021,,1) (6-2-1.2)
where 1", denotes an identity matrix of size 71,-.
L-2:
ﬂ) = W + u,- (6-2-1.3)
u, = A77, + 5, (6-2-1.4)
where W,isa RxF matrix, yisa Fxl vector, u, isa Rxl vector, Aisa Rx M
matrix, 77, is a M x 1 vector (R 2 M), and 5, is a R x1 diagonal vector, and

u, ~ N(0, r), (6-2-l.5)

-108-

17,- ~ N (0.‘I’). (6-2—1.6)

4'} ~ N(0.Q). (6-2-1.7)
where
Q = diag(a), ,w2,...,a)R). (6-2-1.8)
Thus, we have
2' = A‘PAT + Q. (6-2-1.9)

and the total variance D[Y,] E 22 , a R x R matrix, is decomposed into

2 = z'+o'2I,,l

6-2-1.10
=A‘1’AT +Q+azln,. ( )

In factor analysis terms, the ﬁrst term in Equation (6-2-1.10) is called the communality
(common factor variance), and the second term is called the speciﬁcity (part of a test’s
true variance which is not shared with any other tests in a battery) (Thurston, 1947). Note
that since we are formulating a factor model at level-2, there is no error variance left at
level-2 variance-covariance 2'. That is, the true variance 7 is decomposed to
communality and uniqueness by the level-2 model. In his factor analysis model, Thurston
(1947, Chapter 3) actually considered the model in terms of variance decomposition as

follows:

-109-

Total Variance = True Score Variance + Error Score Variance

Communality + Speciﬁcity + Error Variance (6-2-1.11)
True Score Variance

 

Communality + Emciﬁciw + Error Varinace.

 

Uniqueness
Usually in factor analysis, the equation that is represented by the third decomposition in
(6-2-1.11) is used. But by using the model represented by Equation (6-2-l.1) to (6-2-
1.10), we are able to further decompose the uniqueness into the speciﬁcity and the error
variance.
Note that if we constrain a), = 0 for all r, then it reduces to the proposed model in

Chapter 3.
In section 6-2-2, I will present the model without the speciﬁcity term, i.e., 5, =0

for all j, because in a technical sense, the model seems more useful in solving the
nonconvergence problem when some of the correlation estimates are close to 1 or -1,
although inclusion of the speciﬁcity term is straightforward. Then, in section 6-2-3, I will
show that the 2-Parameter Item Response Theory (IRT) model can be formulated by a

nonlinear version of the HLM2F model.

6-2-2. Multivariate HLM with a Factor Model (2-level Multivariate Model)

General Formulation of the Model
Suppose the complete data consists of P observations. For example, if a test

consists of Pth subtests and the examinee responds to all the subtests, then we have P

-110-

outcomes for that subject. In a longitudinal study, if Pth waves of observations are
planned, and if we have succeeded to follow up, then we have complete P waves of
observations. Often in these settings, we have missing observations. However, if we can
assume that missing observations occurred at random (MAR), we can use all the
information we have gotten to estimate the complete data model without bias. To develop
a general multivariate model, we utilize the idea of Jennrich and Schluchter (1986) and
Thum (1997). That is, we formulate a level-l model by two steps. The ﬁrst step of the

level-1 model links observed outcome and complete outcome, which is

L-l:
Y, = Z M r‘ (6-2-2.1)

where Y, is a scalar and ith observation of jth level-2 unit, I; is a scalar and pth
observation of level-l unit, and M ,p, is an indicator and is 1 if ith observation is the

observation of pth level-1 unit, 0 if otherwise. Thus the matrix M ,.p, indicates which

observation of complete data we have got, or it tells us the missing pattern of the

observations. In matrix notation, we can write
Y, = M ,Y, , (6—2-2.2)
where Y, is a p, x 1 vector of observed data outcome, M, is a p, x P matrix of rrrissing

pattern, and Y,’ is a P x 1 vector of complete data observation of the outcome. Then, we

formulate the second part of the level-1 model, which is the standard formulation of the

-111-

level-1 model in HLM. Thus, we formulate the standard model for the complete data, not

for the observed data.

Y,’ = X,’6, + r,‘, r,‘ ~ N(0, 2), (6-2-2.3)
where )3 is the P x P variance-covariance matrix for r,’ , i.e., E = D[r,°] = D[Y,°| 6,].
Plugging Equation (6-2-2.3) into Equation (6-2-2.2), we obtain the level-1 model by
letting X, = M,X, and r, = M,r,°.
L-l:

Y, = X,6, +r,, r, ~ N(0,Z,) (6-2-2.4)
where
Z , = M, M} , a p, x p, symmetric matrix. Notice that now we have the subscript j in

)3 that was induced by a missing indicator matrix M, and that 22 , is actually a submatrix

of 2 , a P x P symmetric matrix. Thus, what multiplying M, and M,.T from the left and

the right does is to pick up the subset of the 2 matrix. The level-2 model is a standard
formulation.
L-2:
ﬂ) = W,-7 + u). (6-2-2.5)
Now we consider the case when u, = A77, , then we have the level-2 model
6,. = W,7 + A77,, 77,. ~ N(0,‘l’). (6-2-2.6)
If we put Equation (6-2-2.2), (6-2-2.3), and (6-2-2.6) together, we obtain the combined

model.

-112-

Y, = M,(X,°.W,7+X,'.Ar7, +r,). (6-2-2.7)
We write this general two-level multivariate linear model, which has a factor structure at

level-2, with a slightly more general form:

Y, = M,(A,,6l +A2,A77, +r,)
= M,A,,6, + M,A2,A77, + Mr.

11’

(6-2-2.8)

where r, ~ N(0,Z), 77, ~ N(0,‘I’).Ifwe let X, = M,A A, = M,A2,,and e, = Mr.

I j ’ J J ’
then we have a standard form of HLM with level-2 factor structure as in Equation (3-9),

Y, = X,6( + A,A77, + e, , (6-2-2.9)
where 6, ~ N(O,Z,)for X,isa p, x p,symmetric matrixand 2,. = M,2M,T.
Heterogeneity of variance (Note the sufﬁx for 2 ) appears because of varying missing
pattern matrices M, , and 2, is a subset of 2 .

Now we show two examples to illustrate how to apply the above general

formulation to the speciﬁc cases.

Example 1. Growth Model

Suppose that the same math test was administered for n elementary school
students from 1st grade to 5th grade, but some of the students did not take the test at some
occasions of testing. We wish, for example, to know how math proﬁciency of the

students grows over time and what the variation of the proﬁciency among the students is.

Then, in a general formulation of the multivariate model, Y,’ , complete data for student j,

-ll3-

1s Y,°= (Y' , Y;,,Y 3,, 13,, 1,5171 a 5 x 1 vector in Equation (6-2-2.2), and the number of

occasions P = 5. To be speciﬁc, if student 1 took the test at all of the grades, then we have

()m

Y
Y31=

Y.)

112,)

 

 

If student 2 missed the third occasion, then we have

2

2

Y1
Y2
Y3:
Y,

OOOH

2

fl
0
0
0
\0

 

COO—‘0
co—1oo

OOHO
COCO

OF-‘OOO

o—oo

O)

(1")

ll

 

 

HOOO

 

Y4]

 

Y3}.

 

1Y5)

(6-2—2. 1 0)

(6-2-2.1 1)

To formulate the complete data level-1 model in Equation (6-2-2.3), suppose we

decide to use a quadratic function for modeling mean growth for studentj,.and the grade

was coded as (1st grade, 2nd grade, 3rd grade, 4th grade, 5th grade) = (-2, -1, 0, 1, 2),

which centers the age variable around 3rd grade. Then, the parameter 6, is a 3 x 1 vector

6, = (60,,6,,6,,)Tand the design matrix X, is

 

(1 -2 41
l —1 1
1 1 l
\l 2 4)

 

-114-

X, = 1 0 0 for all students. Thus, the whole level-1 model for complete data is

(11;) f1 —2 4) 0;.) "1 f0)

 

 

 

 

 

 

 

 

 

 

J 7“,,
’21 1 “1 1 ﬂu r2.) r2.) 0
Y3} = 1 0 0 6, + r.) . 6’,- ~N( 0 ,2) (6-2-2.12)
’71 1 l 1 .32; r,°, r;, 0
1Y5) \1 2 4) vs); vs); KO}

where 2: 0'3, 0'3, 0'3 0'3, 0'3, .
C741 0'42 043 0'4 0'45

 

 

(0'5, 0'52 05, 0'54 052 1
This is an unconstrained (or saturated) level-l covariance. Modeling this
covariance structure using a smaller number of parameters by giving restrictions such as
homogeneous case, heterogeneous but independent case, a ﬁrst order autoregressive

model (AR(1)) case, factor analysis (exploratory model) case, and so forth, were
mentioned in Jennrich and Schluchter (1986) and Thum (1997).
For the level-2 model in Equation (6-2-2.5), suppose that we found a high

negative correlation between residual intercept uo, and the residual slope 11,, , but no
correlation between uo, and the residual rate of acceleration uz, , and between u, , and u,, ,

even after we included all of the necessary predictors that could explain individual
differences such as gender, race, socio-economic status (SES), then we might consider the

model in Equation (6-2-2.6). That is, it might be useful to think about

uo, 1 0 77 ’7
u,. = 4 0 [ °’),[ W] ~ N(0,‘1’) (6-2-2.13)

J 0
1

u,, 0 1 ”U

-115-

W11 W12)

where ‘I’ =[
W21 W22

This factor model structures the 7 as

T

700 701 702 1 0 1 0 W11 1W” 0

W11 W12 2
7,0 7,, 7,2 = 2. 0 W W /1. 0 = lit/1,, 2.171,, 0 (6-2-2.l4)
1,, 1,, 7,, 0 1 " ’2 0 1 0 0 11,,

and reduces the number of parameters in r from 6 to 3.

Example 2. Multivariate Outcome Growth Model

Suppose that students in primary school took 2 tests such as Reading and
Mathematics each year from 1st grade to 3rd grade. In this type of study, it is natural that
we have missing observations for either or both tests at some time points for some

students. We assume that these missing patterns are missing at random (MAR). In this

setting, the complete data for each student is Y,’ = (Y'. Y' Y. Y. Y' Y' )T where If, is

11’ 2j’ 3j’ 4j’ Sj’ 6}

the reading score for student j at grade 1, Y2, is the mathematics score for student j at
grade 1, Y3, is the reading score for student j at grade 2, If, is the mathematics score for

student j at grade 2, If, is the reading score for student j at grade 3, Y6‘, is the mathematics

score for student j at grade 3. To formulate the complete data level-1 model in Equation
(6-2-2.3), suppose we decide to use linear function for the mean growth for student j and
the grade was coded as (lst grade, 2nd grade, 3rd grade) = (-l , 0, l), which centers the

age variable around 2nd grade. Then, the parameter 6, is a 4 x 1 vector

,3, = (60, , 6, , 62,, 6,, )T , where 6,, is the mean reading intercept for student j at grade 2,

-ll6-

6, is the mean reading slope for student j , 6,, is the mean mathematics intercept for
student j at grade 2, 6,, is the mean mathematics slope for student j. The design matrix

X, in Equation (6.2.2.3) is

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(1 —1 0 0)
0 0 l —l
, 1 0 0 0
X, = 0 0 1 0 for all students. Thus, the whole level-1 model for complete data
1 l 0 0
K0 0 l l)
is
(11,) (1 -1 0 0) 0,3.) 0,3.) to)
2,. 0 0 1 “16301) r2, r2, 0
Y3, 1 0 0 0 6, r3, r3", 0
= + , , , ~N ,2 6-2-2.15)
,, 0 0 1 0 6,,- r,, r,, (0 ) (
Y5.) 1 1 0 0 K1311} ’5‘} ’3‘} O
\61/ ‘0 O 1 1’ V6.1} \r6.j) ‘0)
(0'1 012 0'13 0'14 0'15 0'19
021 of (In CB4 025 026
where 2: 0'31 0'32 0'32 0'34 0'35 036

0'51 0'52 0'53 0'54 0'5 056

 

 

2
)061 062 063 064 065 0'6 /
Thus 2 matrix, i.e., within-student variance-covariance, is 6 x 6 in this case. In
addition to the ways of structuring the covariance matrix mentioned in example 1., we

can think about other patterns here because there is a content area factor, i.e., reading vs

-ll7-

A 0 0
mathematics, as well as a time factor within students. For example, 2 = 0 A 0 ,
0 0 A

5:2 42

52, 522] and 015 32 x 2 null matrix, represents a block homogeneous

where A = [

variance-covariance pattern which corresponds to 2 = 021, a homogeneous independent

variance-covariance for example 1. The within-subject variance-covariance

pattern 22 =

c: :3 D»
ab:

0
0 implies that reading and mathematics scores are correlated at each
A

time point within students as in A , and the within-student variance covariance is constant
over three time points, but has no correlation with other time points.
Now for the level-2 model, suppose that we found a high positive correlation

between the residual reading intercept uoj and the residual mathematics intercept uzj ,
and between the residual reading slope u”. and the residual mathematics slope 113, , but no
correlation between uoj and ul j , between uoj. and u” , between u1 j and uzj , and between
uzj and "31 even after being accounted for by all of the necessary predictors such as

gender, race, socio-economic status (SES) for example, that could explain individual

differences. Then we might consider the model in Equation (6-2-2.6). That is, it might be

 

 

useful to think about
fuel) 1 O
u. 0 1 . .
‘1 = [0”), (770,) ~ N(0,‘P) (6-2-2.16)
“21 A1 0 7711 ”I;
W31) 0 32

-118-

W11 W12)

where ‘P =[
W21 W22

This factor model structures the r as

T

o
1
o

12

0290*—

1 0
2'10 2'” 1,2 713 = 0 1(V’n W12]
1'20 715 722 723 ’11 0 W21 W22
0 32

W11 ./’12 31%| ’12le

_ W21 W22 Ail/21 Az'l’zz
_ 11W” 11%2 312W” 3132an
lei/’21 ’12sz Aszr 122sz

(6-2-2. 17)

and reduces the number of parameters in r from 10 to 5.

-119-

6-2-3. Application to Item Resmnse Theory Model

The purpose of presenting an alternative formulation for a unidimensional 2-
parameter Item Response Theory (IRT) model here is that, since the reformulation makes
the IRT model a special case of nonlinear mixed model with factor structure, it gives us
an opportunity to examine the model and the theory, from a standard statistical point of
view. Also, the reformulation gives us a model that is easier to extend to a
multidimensional IRT model and to multiple levels of nesting.

We will ﬁrst give the perspective of IRT as a nonlinear factor analysis. What we
mean by “nonlinear” here is a nonlinear transformation of the expected value of the
outcome is modeled by a linear combination of parameters and independent variables.
From this view, we will show that if we adopt the IRT as a nonlinear factor analysis
perspective, it is straightforward to extend the usual unidimensional IRT model to a
multidimensional IRT model. Though the current IRT model, whether uni-dimensional or
multi-dimensional, is used in the exploratory factor analysis mode, it will be shown that a
conﬁrmatory mode non-linear factor analysis can be executed without much difﬁculty, if
we formulate the IRT model as a hierarchical generalized linear model (HGLM) (See
Chapter 5 & 6, Bryk, Raudenbush, & Congdon (1996)).

Usually a test consists of many items, and the items are supposed to measure the
student’s proﬁciency. Then we want to know the current status of the proﬁciency of the
examinee and how his/her proﬁciency grew over time if the longitudinal data are
available. The key for the modeling here is to represent the IRT model as the HGLM

level-1 model for a nonlinear factor analysis, and if we use dummy variables, the same

~120-

model can be represented by HGLM with level-2 factor structure. Suppose that a test that

consists of n dichotomous items is administered for J students, whose response Ya is

scored 1 if correct, 0 if not, and suppose that J students are a random sample from the

student population. The distribution of random variable Y”. conditional on the probability
of correct response of student j for item 1' (denoted as pg. ) is Bernoulli with mean

pg. (0 S pg. S 1) i.e., Pr(Y,J. = 1| p0.) = p”. . If we write this data generation process in a
regression form, then the sampling part of the level-1 model (the unit is item) is:

L-l Sampling Model:

X,- = p,- + 3,, (6-2-3.l)
where 5,). ~ Ber(0, py.(l — p0. )) and the 8,]. ’s are independent from each other. Note that
the conditional mean and the conditional variance of the outcome variable I; are

E(Kjlpy) =p,-,-, and Var(1£j|pg)=py(1-pg)- ‘ (6-2-3-2)
Now, for the model for the mean p”. , since the range of p”. is restricted between 0

and 1, and in order for the transformed variable to have the range theoretically -oo to +00,

we make a logit transformation, i.e.,

Pi}
1-p

 

77"} = log( ) =-—. 10g it(py) a (6'2‘3-3)

y
and then we assume that 17,]. has a normal distribution. For notational convenience, we

write the logit inverse transformation as

1
1+ exp(—n,.j)'

 

1),-,- E logit"(r7,-,~) , where p, = (6-2-3.4)

~121-

Now, the level-1 structural model for the 2-parameter IRT model for item 1' of
student j would be:

L-l Structural Model:
n. = /1,-(0,- - 6.). (6-2—35)
Combining the level-1 sampling and structural models, we obtain the Level-l model as:

L-l Model:

Y, = logir'[x,(oj - (2)] + 5,, (6-2-3.6)
where
8,]- ~ Ber(0,p,.j(1— pg. )) for O S pg. 3 l. (6-2-3.7)
and A, is the item discrimination parameter for item i, 01 is person j’s ability
(proﬁciency), 6,. is the ith item’s difficulty parameter. The scaling constant D 5 1.7 that
makes the logistic model comparable to the normal threshold model is omitted from the

model representation without loss of generality. Note that if we consider all the 2., ’s to be

the same for all the items, then it will be a Rasch model.

AN OVA gge formulation of 2-P IRT using HGLM

We now show that the inside of the inverse logit function of Equation (6-2-3.6)
can be written in the same format as the linear factor analysis model. Speciﬁcally, let’s
consider that we have only three items in the test. Then the level-1 structural model of the

2-parameter IRT model can be written as

-122-

L-l Structural Model:

Itemlznlj =X1(6j —c5;)
Item 2: 772]. = 12(0). —62), (6-2-3.8)

Item 3: 7}”. = 23(6]. — 63)
where 6]. ~ N (0,1) . The variance of 6}. was ﬁxed to 1 in order to identify all of the item

discrimination parameters. In matrix format, Equation (6-2-3.8) can be written as

’7” ”M 31 ﬂ. ,1,
'72.- = 4,5, + ,1, (0,)= y, + A. (9,), (6-2-3.9)
’13.; ”/1353 13 #3 33

where
,u, E -/1,.5,. for all i. (6-2-3.10)

Or, we can write it in matrix form as

77]. =,u+A6j. (6-2-3.11)
Then, at level-2 whose unit is student, we have,

6]. = ﬂ” + uj, (6-2-3.12)
where u J. ~ N (0,1). Note that the Var(uj) = 1 corresponds to Var(61.) = 1.

Thus the logit-transformed mean vector 77]. conditional on 6]. has the same

structure as the conditional mean of the standard one-population factor analysis model,
i.e., E(YI 0) = ,u + A19. Remember that the key assumption of the linear factor analysis is
that given the factor score (latent ability), the Y’s (outcome variables) are independent. In

the IRT case, the local independence assumption (that, given 91. , the observations Yr ’5

are independent) corresponds to the conditional independence assumption for the linear

-123-

factor analysis model. The difference between the linear factor analysis model and IRT
model is whether the link function is identity or logit. Thus, we can view the IRT model
as a non-linear factor analysis model. If we do so, then the item difficulty parameter is the
mean of the logit, and the item discrimination parameters are contained in the factor

loadings matrix A .

Regression ape formulation of 2-P IRT using HGLM

Rasch (l-P IRT) model

Though the above formulation is useful for recognizing that the IRT model
involves a factor structure, it does not facilitate ﬁnding the estimation procedure similar
to the one which is executed by the current HGLM. In order to do that, we reformulate
the above IRT model by a hierarchical model with dummy variables, which was
introduced by Kamata (1998) and Cheong & Raudenbush (1999), who applied this model
to the 1-parameter (l -P) IRT (Rasch) model. In the following presentation, since the
level-1 sampling model is always the same, i.e., Bernoulli distribution, 1 will omit it.

To facilitate the idea of representing a 2-parameter (2-P) IRT model by HGLM,
we use no-intercept model ﬁrst. As before, we have n items and J students. For item i and
student j,

L-l Structural Model:

77,]. =A1Xw +ﬂ2jX20. +AJX3U+---+ﬂquqy+-~+,B,,X,,,.j

= Zﬂinqij

q=l

(6-2-3.13)

-124-

where X W. is the indicator variable and is 1 if q = i and 0 if q a: i for all i. At level-2, all

the coefﬁcients randomly vary among students, but with the same ul j.

L-2:
ﬂ,- = 710 + “u
:62} = 720 + ulj
(6-2-3.14)
ﬂqj : qu + ulj
ﬂnj = 710 + “119
where u“. ~ N (0, Too). Or, in short,
,6,” = 7,10 + u”. for q = 1,...,n. (6-2-3.15)

Notice that ul 1. is the same for all items and reﬂects the idea of unidirnensionality

of the IRT model, that all the items in a test are supposed to measure a single construct.
Notice also that in the Rasch model it is not necessary to ﬁx r00 = 1, though in IRT it is
customary to do so. On the other hand, ﬁxing some parameters is necessary for the 2-P
IRT model in order for the model to be identiﬁed.

If we put the L-l and L2 models together, we get the combined model.

Combined Model:

'7.) = 27.0%.», MHZ/12,. (6-2-3.16)

q-I q=l

But since 2 X W. E 1 by deﬁnition of indicator variables, it reduces to
(1:1

77,-,- = Z quXq, + u” . (6-2-3.17)

(Pl

-125-

I now show that the model in Equation (6-2—3.17) is equivalent to the model
formulated by Kamata (1998), and Cheong & Raudenbush (1999), that used an intercept
model in order to ﬁt the Rasch model to the existing HGLM setup. That is, at level-1, we
let one item, for example, the ﬁrst item, be the reference item, and create (n-l) dummy
variables to represent differential item effects on response probability of student j.

L-l structural Model:

’7"; = A); +£1ij +13sz3;;+°"+/3q,-Xm+”'+ﬂn-me-j

= Ian + Zﬂqulij

q-l

(6-2-3.18)

where X W. is the indicator variable and is 1 if q = iand 0 if q ¢ i for all i. At level-2, only
intercept ﬂoj varies.
L-2:

ﬂu} =7oo+u0)

A} = 710
a, = 720 (6-2-3.l9)

ﬂqj = rqO
Air-l” = Yup—no
where uoj ~ N (0, 100) . The combined model can be written as,

Combined Model:

27,, = 7,, + Z yqoxq, + u,,. (6-2—3.20)

(Pl

-126-

If we let 6,. s -{700 + 2 yqu q,1} , 6}. E uoj , then the combined model is exactly

([21
the same as the l-P IRT Rasch model except that the constant D s. 1.7 , which adjusts it to

the normal ogive model, was omitted. In IRT model terminology, 6,. represents ith item
difﬁculty, 6]. is jth student ability. To interpret the combined model, we consider item 1
and item 2. For item 1, 7}”. = 700 +u0j, and for item 2, 17,]. = 700 +710 +uoj. Thus, the
interpretation in words can be that the logit correct response probability of student j for
item 1 is student’s ability uoj adjusted by the item difficulty 700 (item easiness may reﬂect
the exact meaning of the parameter) for item 1. And the logit correct response probability
of student j for item 2 is student’s ability uoj adjusted by the item difficulty (700 + 7,0) for
item 2. 7,0 reﬂects the difference in terms of the item difﬁculty of item 2 compared to
item 1. If item 2 is easier than item 1, then 7,0 > 0 and thus p2,.2_plj , where p” and

p21. are the correct response probabilities for item 1 and item 2 of student j.

It should be noted that the l-P IRT model can be considered as a non-linear two-

way AN OVA additive model with the factors, item difficulty and student ability.

2-P IRT model

Since using all n indicator variables for items without including the intercept is
more natural, I’m going to adopt a no-intercept hierarchical model formulation to
represent the following 2-P IRT model. The 2-P IRT model addsanother parameter called

item discrimination on each item. Item discrimination can be interpreted as a kind of

-127-

item’s sharpness in distinguishing two students who have different abilities by the item.
This concept suggests the interaction between item characteristics and student ability.
That is, the effect of student ability on item response depends on an item characteristic
which we call item discrimination even aﬁer controlling item difﬁculty. In other words,
some items are sharper than others in reﬂecting student’s ability. By this
conceptualization, we formulate the level-1 structural model as

L-l:

77y = AJXUJ +1321X2ij+m+ q/Xw+'”+ﬂannr

" (6-2-3.21)
= Z 'qu Xqij ’
qal
or, in matrix notation,
17,,- = X374, (6-2-3.22)
where X; = (X10, ng, ,Xqij, ,ij). This notation is actually simpleth

it seems: the ith element is 1 and the rest of them are 0, i.e.,

X; = (0, O, ,1, ,0). Note that this is the same as the level-1 model ofthe
Rasch model. But the level-2 model is different.
L-2:

A] =710 ““11"”
ﬁr} =720 +22“;

(6-2-3.23)
ﬂq] = 7qo + ’1qu

ﬂnj = 7110 +173“)

-128-

where v”. ~ N (O, 1). It should be noted that the variance of v”. is now set to 1 for model
identiﬁcation. Notice the similarity and difference between the 2-P IRT model and the

Rasch model in Equation (6-2-3.14). The same v”. reﬂects that all items are measuring

one kind of ability and the different coefﬁcient 2 implies that the sensitivity of each item

to reﬂect ability v”. is different; in other words, itq is the interaction effect between the

membership of rtem q (X) and person j ’s ability ( v”. ). In matrix notation,

‘7’}

,5}. =7 +Avj , vj ~ N(O, 1) (6-2-3.24)
Whereﬂj =(AJ’IBU’W ”Bin/0T 7=(710,720,...,y"0)r,A=(2.l, ’12, ”'a 1,.)7,

4.1,).

If we combine the L-l and L-2 models together, we get

77'} = {2M1} qu} + {ZIA'X q qy' }vlj a ‘ (6'2'3 .25)

«PI

and ifwe let ,u, a 274,-qu , )1, a Z}. X and 6.- = v0j then the above equation can be

q 0'} ’
q-l qul

written as
77,-,- = A1.- + 1.63, (6-2-326)
where ,u, is the item intercept for item i, A, is the item discrimination for item i, and 0] is

the ability of student j. Using the item intercept as an item parameter is a way of
representing the 2-P IRT model (See, for example, Bock & Aitkin, 1981), but if we

impose a constraint on ,u, such as p, = -/1,.c5} for all i, then the model becomes

= 249,. — 5,.) (6-2-3.27)

-129-

and this is the standard 2-P IRT model. We now write the models for all n items together.
At level-l we have
L-l:

,7]. = A13]. (6-2-3.28)

where 77]. =(mj,...,n,,j)r, A]. = E ,and ,6]. =(Aj,...,,6nj)r.Note that Alisthe nx n

identity matrix, i.e., A j = In if student i responded to all of the items, which is a case of

balanced design. The level-2 model is written as Equation (6-2-3.24), and thus the
combined model is,

Combined Model:
77]. = A17 + AjAvj , vj ~ N(O,‘P) (6-2-3.29)
where 77]. is the n x 1 outcome vector, y is the n x 1 item intercept vector, A] is the
n x n design matrix, A is the n x 1 item discrimination vector (later, it will be
n x M matrix if we use M multidimensional IRT model), vj is the l x lscalar (later we

extend it to a M x 1 vector), and ‘P is a scalar variance of latent ability of student j (later,

a M x M covariance matrix). For the above 3 items case, Equation (6-2-3.29) becomes

”I; 1 0 O 700 l 0 0 I11
77,]. = o 1 o y,, + o 1 o 12(vlj). (6-2-3.30)
0 0 1

773 j 7 20 0 0 1 ’13
Note that the Rasch Model is a special case of the 2-P IRT model in that it lets A = 1 , a

vector of all 1’s in Equation (6-2-3.29) or (6-2-3.30). Note that though the design matrix

~130-

A 1 seems trivial (A j = I" for all i, which is true if all the students take the same set of

items and if there is no missing observation), it can be different from student to student.
Now, notice the close similarity of Equation (6-2-3.29) to Equation (3-9), which is

the HLM2F model developed in this dissertation. In fact, by letting u j = Av j and

denoting D[uj] a r in Equation (6-2-3.29), we have I = A‘I’AT . Thus, by formulating the
IRT model by HGLM with a factor structure, we recognize that the standard 2-parameter
IRT model is a non-linear version of the model Equation (3 -9), which I call HGLMZF.

This is the reason why I say the IRT model is a non-linear version of HLM2F. Instead of
standardizing woo = 1, which is the standard procedure of IRT programs, we can set 20 =

1 for model identiﬁcation. Then, we can directly incorporate the algorithm of HLM2F

into the micro iteration of HGLM, which consists of a doubly iterative procedure.

Multi-dimensional IRT model
The nice thing about this formulation is that we can easily expand it to a
multidimensional model. That is, if we think that the test is measuring multiple abilities,

then we change Equation (6-2-3.9) to

7M #1 111112 (a!)

7721 = #2 + 121122 92' (6-2-3.31)
’73} #3 131 ’132 J
and Equation (6-2-3.12) to
31 __ A») (“01]
[gzjj—[Ao + u”. , (6-2-3.32)

-l31-

W38 SCI t0
1'

“0} O 1 0 . .
where u, ~ N2( 0 ,I2 = O 1 ). Note that the covanance matrix of
J

the identity matrix so that the model can be identiﬁed except for the rotational

indeterminacy.

For a HGLM formulation with dummy variables, we just need to change A by

 

 

 

 

(31‘ {’111 312‘
112 121 122
. E E 5 v1, .
addrn another column from A = to A = , and v. from v . to [ )m
g ’14 ’14! 1'42 j ( U) v2}
KAN) Kl,” 1M2)

Equation (6-2-3.29).

By this formulation, we can perform the unidimensionality test by formulating the
null hypothesis H091,2 = O for all i. Further, if we model the level-2 by some individual
characteristics, then we can perform a statistical test for Differential Item Functioning
(DIF) (Hambleton, Swaminathan, & Rogers, 1991).

For the longitudinal data, we can extend the IRT model to a latent ability growth
model based on this formulation. In that case, the scaling issue should be carefully

addressed in order for the latent ability growth to be meaningful.

6-2-4. Other Possibilities

Incorporating the factor model into a three level model is a natural extension of

the two level model. In the three level case, there are two possibilities of using the factor

-132-

analysis model; one is in level-2 covariance matrix 1',r , and another is in level-3
covariance matrix 11,.

Incorporating the factor model into a three level multivariate hierarchical model is
also straightforward for conceptualization.

As shown in the previous IRT application section, applying the factor analysis
model to hierarchical generalized linear model (HGLM), which has a non-linear outcome,

seems to be promising, at least to measurement models in psychometrics.

~133-

Appendix. Derivation and Algorithm

The purpose of this appendix is to derive the Maximum Likelihood Estimators
(MLE) via Fisher scoring algorithm and to give the detailed steps of algorithm for
computation.
M:

Subscript Model (Model for each group)
Y}. = Aug + Aszryj +rj , (i =1,2,...,J) (A-l)
where
rj ~ N(0,0'21,,j),
71,- ~ N (031’).
and rjand njare independent forallj. ins n]. x 1, AUis nj x F, 19, is Fx 1, 77} isthe
Mxl factorscore vector, rj isa nj x1; Azjisa nj x R, Aist M, Injisthe
n j x n 1 identity matrix, and ‘I’ is the positive deﬁnite M x M symmetric matrix.
By writing
ej = Asznj + rj (A-2)
in Equation (A-l), we see the model (A-l) as general linear model,
1’]. =A,j.0,+ej. (A-3)
The variance-covariance matrix is

V]. a Var(ej) = Asz‘I’ATAzrj + 021,,‘ . (A4)

-134-

The log-likelihood:

The Log-Likelihood I is
1 = — %[N log(27r) + (N - JM)log(0'2 ) + J logl ‘I’I-Z logl cf] + 2(efe, — 4,1,, -AC;‘AT «Le, )/ 02],(A-5)

where C;' = (AT/1,3 Asz + 02‘1“)" that will be explained in Equation (A-16),
ejTej is given by Equation (A-27), and

Azrjej is given by Equation (A-28), to be shown later.

Computation:

In the following, the subscript for the A2 will be omitted for simplicity of the
formula and will be written as A , and all the subscript j will be omitted. If it is necessary
for clarity, we will explicitly use the subscript.

Review of Standard 2-level HLM

Elements required for standard 2 level MLF, Fisher Scoring Algorithm are:

dvec(z') do'2 32 log L02, 2302;3’) - —
d¢T ,F=W,andforH=E[ 6¢T5¢ ],ATV'A,ATV2A,

 

 

”(V-2), and eTV'ze since

 

1 many _. -. o‘vec(V)
H:--[— V ®V ——
2 5W ‘ ) aw (A-6)
= —-12-[E’(ATV"A® A’V"A)E + ETvec(ATV‘2A)F+ FTvec(ATV’2A)E+ tr(V'2)FTF];
6”] L , , 2; . .
for S = 0g (:61 0' y), E, F, ATV"e, tr(V"), and eTV’Ze are required srnce

-135-

S = QMYW“ e V" )vec(eeT - V)
2 5¢
1 (A-7)
= ~2-[Ervec(ATV"eeTV"A — ATV"A) + FT{eTV'Ze — tr(V'l )}].

Now for 2 level MLF factor model, we need the same things. But there are some
differences, because in this case,
2' = A‘I’AT (A-8)

2' is R x R and is not full and matrix which is not invertible

 

A is R x M, and ‘I’ is a M x M positive defrnite symmetric matrix.

(I) Computation 01E and F

Deﬁnition 1. Deﬁnition of ¢ vector

¢ =({’11j}a{‘/’M}’02)T

A-9)
=(AT,WT,0'2)T, (

is a Q x 1 vector of unknown unique parameters in the variance covariance matrix V]. for

all j, and it is composed of three elements, a Ql x lvector A. , a Q2 x lvector l/I , a scalar

0'2.

(a) A QI x lvector ,1 consists of Ql unknown elements in the R x M factor loading
matrix A (Ql 5 RM). Thus, some of the elements (Q, ) are unknown and to be estimated
from the data, and the rest of the elements (RM — Q) are known. To construct A , we
align the unknown elements in the order of starting from (1 ,1) element of A and going

down the column, if we ﬁnd the unknown element, then put it in the A vector. Then,

~136-

move on to next element in the same column of A . Once we ﬁnish searching in the ﬁrst

column, we move to the second column, and so on to the last Mth column of A .
Exam 1e

Ex 1.

O
O
For example, if A = A” 1 then 11 = (A?!) .

o a.
o 2..
Ex 2.
1 0 ’121
A
IfA= ’1“ A” ,then A: 4'.
0 l [in
[141 1'42 2'42
,1,, ,1,,
. . 12. 112‘:
Note that rn exploratory factor analysrs model, we let A = 32 A, , all the
l 2
2'41 1‘42

elements in the A (factor loading matrix) are unknown. But, we are here thinking about
conﬁrmatory factor analysis in which we specify some of the elements as ﬁxed number.
In the ﬁrst example, we ﬁxed 2“ =1,,1,Jl = 0,24, = 0,2,2, 2 0,222 = 0,132 =1, and in the
second example, we ﬁxed as ’11, = 1, A.“ = 0, , ,1,, = O, 1.32 =1 . Thus, in our model, some of
the elements in A are known and some of them are unknown. The number of known

elements is RM — QI and the number of unknown elements is Ql .
(b) A Q2 x 1 vector (11 consists of Q2 elements, where Q2 be the number of unique

elements in M x M variance-covariance matrix ‘I’ , and since ‘1’ is always symmetric,

-l37-

the number of the unique elements Q2 is Q2 = M(M + 1) / 2. We align the

Q2 x lvectort/I as follows.

W11 W12 V’lM
W21 ll’zz

‘11: : :9V’=(V’llaV’zls”'aWMlaV’zza'l’zsama'l’m:W33a”°aV’m)T-
Wm WM

This operation is often written as w = vech(‘1’) (vector half).

Adding the number of unique elements in 02 , which is clearly 1, we have in total,

Q=Q1+Q2+1=Q1+M(—A:+2+1

elements in the vector ¢.
ﬁg.
Max(Q,) = RM and Q2 = M(M + 1) / 2 . Thus, Q2 is always exactly
M ( M +1) / 2 , the number of unique elements in the symmetric matrix ‘1’ , but
Q, satisﬁes inequality restrictions Q1 3 RM because we ﬁx some of the elements in A ,

depending on our theory.

Deﬁnition 2. Deﬁnition of E matrix

We deﬁne E matrix as

E_dvecr_(dvecr dvecr dvecr)
- d¢T _ all" ’dw’ ’de” . (A-lO)
=(E,,E,,,Ea,)

 

-138-

The dimension of E is a R2 x Q matrix, and E matrix consists of three parts, E, , EV , and

E62 , which is always a R2 x 1 vector of 0 because I does not depend on 0'2. Thus, E
matrix always has a form

E=(E,,EV,0). (A-ll)
And, E, is a R2 x Ql matrix, E, is a R2 x Q2 matrix.
Now,
(a) For E, , we compute this column by column. The qth column of E, corresponds to

dvec 2'
d/l

1}

 

, where x1, is the qth element of A (that is, the qth unknown element in A). The

order is determined by counting for the same column down the rows and then move to the

next column, go down the rows. Thus,

 

 

 

 

_ area! _ aiec(A‘PAT) _ 5(A‘I’AT)
1., " d1, ‘ at, ‘ m‘ 0%,. )
’ ’ T ’ (A-12)
- vec(-éA ‘i’AT + N? 6A )- vec(D ‘I’AT + A‘I’DT )
‘ at, at, ‘ ‘11 1v ’
3A .. .. . . . . . .
where D1,;- = a: = {1 for posrtlon (1,1) 1n A and 0 for other posrtrons 1n A} for all 1, j.
1}
Example.
1 O
321 0 diecz' ﬁrecr
when A = 1 , E, =(E,“,E,u)=( €12] , 07142 ).Then,
0 2.2
E,“ = vec(Dkl‘I’AT + MID; ) = vec{D,,l‘PAT + (D,l\iIAT)T}.

—139-

 

0 0
D l 0
’12! - O O ’
0 0
0 O O 0 O 0
DA, (VAT: 1 0(11’11 91’1sz ’12] O 0]: W11 ’1le” V12 ’14z'l’42 ,and
' 0 0 W2, ([1,, 0 O l 11,, 0 O 0 0
0 0 O 0 0 0
O O 0 O 0 W11 0 0
Wu Ami/’11 W12 1421/42 0 321W” 0 O
T A‘PDT =
D123“ + ‘2' o o 0 o + 0 v1.2 0 o
0 0 0 0 0 lat/1,2 0 O

0 Wu 0 0

W11 21211/41 W12 ’142‘l’42
O ([112 0 O

0 1'42 V’ 42 O 0

Then taking vec, we obtain E ,1! , which is a 16 x 1 column vector. Explicitly, it is

_ o 0) (0) (0) o o o o o T
E,“ — (0, V’ 1(1 ),0,0, W 1(1 ’ZAQIV/ll 2 W 12 s 1:2)1” 12 )90, V/l(2 )109090’ 1:2)W1(2)’09O) , Where the
superscript on the parameters means that we use the current values of estimates to

compute E ,‘ . Similarly,
E1, = (0,0,0, Wig):0,0’0a121'/’2(i))’0’0,09 VS”, wéi”,1§?’w§?’a W§§’,2/1§3’W§§’)’ -
(b) Next, for E w , we compute it like the old I in HLM2. That is, we can compute
EV one at a time by
E, =(A®A)D;, (A-13)

where

-140-

 

where w is the part in the vector ¢ whose elements are the unique elements of ‘1’ , as

deﬁned in (A-9) in Deﬁnition 1.

 

 

 

 

1 O
O
Example.whenA= 2;! 1 ,LIJ=(ZH Zn),andy/=(Wn, W219 W22)T,then
21 22
0 142
W11
W21
W12
0' _ o'liec‘P _ W22 _(0’\’ecw o'vecy/ area/I)
1v WT 011/” 1,12, 1112,) 31/1,, , W21 , W22
(1 O 0
0 l 0
‘ o 1 0'
KO 0 1

Then, by Equation A-13, we get the 16 x 3 matrix E, by

-141-

1
321
o
o 2,, o 2,, o

= (A ® A)(D. ’ DLZI ’ DJ’rz)

Vll

= (A 18 A)(vecD,,” , vecDV, , veCDVn )

E,=(A®A)D;={

HOO
2.}:
'-‘OO
CO
OH—‘O
#090

= ((A ® A)vecD,,” , (A ® A)vecD,,u , (A ® A)vecD,,u)

(EWII Ell’zl Esz)

 

 

( l 0 0 I
121 0 0
0 1 O
0 2,, 0
2,, 0 0
1221 0 0
0 2,1 0

= 0 121/142 0
O 1 0 '

o 2,l o

O O 1

o 0 2,,

o 2,, o

0 221/142 0

o 0 2,,
\ O 0 242,}

Deﬁnition 3. Deﬁnition of F vector

A 1 x Q row vector F is deﬁned by

 

(A- 14)

E_xa_1_n1212.

-142-

1 O
0
when A: A“ , ‘1’ =(Wn W”) , ¢=(,1,,,A,,,1//”,1/I,,,y/,,,0'2)T,a6 x 1 column
0 1 W21 W22

0 142

MM 1 22 1
vectorandQ=Q,+Q,+1=2+-L2—+—2+1=2+—(—2:—)+1=2+3+1=6.Then,

F=(o, o, o, o, o, 1).

(2) Computation at H (Expected Hessian matrix) and S (Score vector)
Note that H is a Q x Q matrix, and S is a Q x lvector.

(a) Computation of H (expected Hessian matrix).

 

H“) = my l°g 2203?», ”2’11,” = i. H)” .
where
Hy) = _ g {( 5"ng )) T(Vj" ® V,"' )[ﬂ‘;:#}|”’m
= —%{ET(A,TJ.VJ."A,, ® ALV;‘A,,)E + ETvee(A,T,V;2A,,)F (A-IS)
+ FT[vec(A,T,-Vj-2A2, )]T E + (’(V;2)°FTF}1,.,M 2
where

o'lzec(z') _ id:
a¢T

 

E:

To compute H3.” , we need to know A; V,."AU , A; Vj‘2A2j , and tr(Vj‘2 ).

These quantities are function of Weighted Sufﬁcient Statistics (WSS), which are

-143-

A]; V,.-'14,, A,’",V;'A,,
WSS = AITJ. V,"A,, ALI/j"); , for all j, the WSS is in the functions of Sufﬁcient
Airs-'2

14,521, , A,",A,.

1

Statistics (SS), SS = ASA”. ALY, , for all j, and ACjT'AT for all j. Therefore, we ﬁrst
Air,-
show the computational formulae of ACJT'AT and the WSS, and then we show the

computational formulae speciﬁc to H)”.

Formula of ACjT'AT :

 

We ﬁrst compute a M x M matrix
C;‘ = (AT/1,321,» + 0'2‘11")’l ,j = 1,2,...,], (A-16)
and then compute a R x R matrix ACJT'AT for all j. Then, we compute the WSS for the
current ACjT'AT.
Computation of WSS:

As we can see in the following, WSS is a functions of SS and AC;'AT.

. A,§V;'A,, = {lug/1,, — A514,, -AC;'AT 2,214”); (A-17)
. 27,421, , = 0'2(A,TJ.A,1. — 21,324,, ~ACJT'AT -A,T,A,j); (A-18)
. AijfY, = 04015.); — 21,321,, .AC;‘AT - 21,313.); (A-19)
. ALVﬂAN = 0‘2(A,TJ.A,j — 21,221,, -AC;‘AT -A2TjA,j); (A-20)
. .43).“); = a‘2(A,",Y, — A,’,A,, -AC;'AT - A5123). (A-21)

-144-

 

Computational formulae of A2T,V,"A,, , A27, V,.-2A,, , and tr( V,'2) are as follows.
0 AI, V,‘l A2, is computed by the formula (A-20).
. A,T,V,‘2A,, = 0‘2(A,T,V,"A,, — A{,V,"A,, -AC;'AT -A,T,A,, ) (A-22)

0 tr(V,'2)=(n, — MM“ +tr{(C,‘"1’")2}. (A-23)

(b) Computation of S (Score vector).

. alogL(Y;r.A.r.az)| ’ .
5(1) = = st!) ,
a¢ l2,” ,2 .

where

. 1 a) Vj T —l -l T
Si." =§[{_e%(;_)} {(V, ®V, )vec(e,e, -V,)}i,=

I _ . _ _ _ -
= 2[ETVBC(A2T1VJ 13191“er IAZ} - A,T,V, l"121“" FT{e,7.'V, 23/ —”(V1 1),“

‘(U
(A-24)

M‘” '

(i) T -l T -| T -2 -I

To compute S, ,we needto know A,,V, e,, A,,V, A2,, e,V, e,,and tr(V, ).
. T -I T -l T -2

Thus we need to know computatlonal formulae of A,,V, e, , A,,V, A,, , e, V, e, , and
tr(V,’l ). To compute AI, V,"e, , AI, V,"'A,, , e,7.'V,'2e, , and tr(V,‘l ) , we need to have
evaluated e, , 61") , efe, , and Age, . Those quantities are computed as in the following.
We ﬁrst note that

e, = Y, - AUQIU) , (A-ZS)

where 6}") is the current estimate of 61 , and is computed by

J J
61‘" = (Z A], V,"A,, )T' 2 AI, V," Y, . (A-26)

j=l jal

-145-

 

 

Note that this is a function of WSS, i.e., 61‘” = f (WSS). Next we compute efe, and

A222, by
efe, = YfY, — 26;"’TA,T,Y, + 61“’TA,T,.A,,19,<”; (A-27)
A,T,e, = 2,31; — Jig/1,21“), for all j, (A-28)

which are also functions of SS and WSS, i.e., ere, = f (SS, WSS) and

A,T,e, = f (SS, WSS). Then, using these values, we can compute the quantities necessary

to compute S)".
T —l _ T -l T -l (i),
A,,V, e, _ A,,V, Y, —A,,V, A,,6, , (A-29)
AT.V."A . was already given in formula (A-18) in the computation of H1";
21 J 21 J
e,TV,'2e, =a“(efe, -efA,, -AC,T'AT- T,e, +efA,, -AC,T'AT -A,T,A,, -AC,"AT -A,’,e,) (A-30)

tr(V,"‘) = (n, — M)o"2 + tr(C,T"I’"). (A-31)

Overpll Steps via Fisher Scoring Algorithm:

Step 1. Compute and Store the Sufﬁcient Statistics (SS), i.e.,

Alli/1U AZTJ‘AZJ'
$3 = 21,321,, A5,); (A-32)
A1516-

for all j.

Step 2. Compute Statistics values 61°) , ¢(°’ = (21‘0", w‘o’r,az(°’)rby

-146-

J J
61‘” = (Z A.C-A.,-)" (2 Asia). (A—33)
j=l

jai
and compute the initial values of 21°) and VI”) by the method which is provided in the

next section.

2“” is computed by ﬁrst COIDPUIing

Note: 0'
9,3.) = (A; A, ,)'l (A,T,Y, — ALA, ,191‘”) , (A-34)

and then

1 J
0
0'“ ) = mg“,- ‘ Aljglm) " 142199)“)?- " Aljglw) ‘ A210i3))- (A’35)

Step 3. Compute a M x Mmatrix C," = (ATA2T, A,,A + 02‘1’")"' ,j = 1,2,...,], by
constructing A , ‘1’ from 12“” , and then compute AC,'AT for all j.

Note: We construct A , ‘1’ from ¢(°’ , and use them to compute C,’l with 02”).

Step 4. Compute the Weighted Sufﬁcient Statistics (WSS), i.e.,

A,T,V,"A,, A,T,V,"A,,
wss= A,’,.V."A,, A,T,V,“Y, (A-36)

J

Asa-'1:-
for all j.

Step 5. Compute the matrix E and the vector F.

Note 1: F doesn’t change, though E changes for each Fisher Scoring iteration.
Note 2: E”) = f (¢‘°’). That is, E”) depends on ¢‘°’and E“) will change as

(2“) changes.

-147-

Step 6. Compute H ,S by the formulae in Equation (A-15) and (A-24) along with
computational formulae of the components.

Note: H,S=f(E, WSS).

Step 7. Compute the new e5“) by Fisher Scoring Algorithm, .i.e.,

r5“) = 451°) - H"S. (A-37)
Note: ¢“) = f(H"S).
Step 8. Compute the new
J -1 J
61‘” = (Z Ali-W1.) 2 Am"):- (~38)
j=1 j=l

based on the new ¢"’.
Note: 6,“) = f (WSS)

Step 9. Compute the Log-Likelihood l.

The Log-Likelihood I is computed by

1 J J
I: —-2-[Nlog(27t)+(N - JM)log(0'2)+ Jlogl‘l’I-ZloglCﬂ +Z(e,’e, — e121,, -AC;'A’ -A’-e-)/ 0’],

i=| j=| 2’ I
as in Equation (A-S), where
efe, is computed by Equation (A-27), and A,T,e, is computed by Equation (A-28).
Step 10. Go back to Step 3.

Note: We use the same cutoff criterion as current HLM2, i.e., change in log-likelihood, to

get out of the loop.

-148-

Starting Values:

To start the Fisher Scoring, we use the following quantities.

 

9:°’— = (Z A,,A,,)"(Z AUY) , (A-39)
j=l
9,3.) = (A5, A, ,)"(A,T,Y, — 2}, 211,191”); (A-40)
J
02“”: ——Z(Y, - 2,91» — Azﬂé‘j’m’, — 2,910)— A 910)) (A41)
N— JR ,_ .
_ 0,2(0) J r _l
V = 2042.242.) ; (A42)
j-l
_ 1 J
D = 72 away”; (A-43)
F.
f = 1'5 — i7 ; (A-44)

F is the method of moments estimate in a balanced design. At this point, we check the

positive semi-deﬁnitness of the zT matrix. If the 27 matrix is not positive semi-deﬁnite

(p.s.d.), we ﬁx it up in the following way: i

1) If i, so, set a, to .113].

2) If l 5,12 if, , let T, =.9\/??,,—. (use the sign of the original ii, ); this reduction is 10
percent from the original value.

3) Now check if f is p.s.d. If yes, go to the next step. If no, go back to step 2), and let

2,: —.8 f, ?, an,other 10 percent reduction from the original value. Check if i" is p.s.d.

again. Keep repeating this process. If after ﬁve times of this repetition and if f is still not

p.s.d., then set all of the off-diagonal elements to O.

-149-

With the above procedure, we obtain a p.s.d. ? , an estimate of 1. Using this i" , we ﬁrst

obtain the starting values for A . We start with ‘I’ = [M . Then, since I = A‘PAT , we have

t = AAT. Since i" is p.s.d., it can be decomposed into 5 : CACT , where A is an

R x R matrix in which eigenvalues are on the diagonal, and C is an R x R matrix that the

corresponding eigenvectors are in columns. We approximate C and A by taking the ﬁrst

M largest eigenvalues and the corresponding eigenvectors. We denote them as A' and

C' . Note that A. is a M x M matrix and C' is R x M matrix. Then an equality holds

.1 .1 .1
approximately f = NA” 5 C'A'C'T = (CA 2 )(A 2C”). Thus we obtain A. = C'A 2.

Now we use the information for A speciﬁcation. For clarity, we use an example.

Suppose that we speciﬁed the A as A =

(’111
2,,

 

Us:

1122
3a:

1.2

in)
’123

333
2'43

 

in)

 

 

( 1 0 2,)

0 1 A,

O 21, 1 and suppose we obtained
2, ,1, 0 '
U, 0 0}

from the previous decomposition. Then, we let the

f 1 0 313/133)
0 1 123 / 1a
A“) = 0 2,, /2,, 1 (A-46)
1'41 /A1, A42 /’122 0

 

 

‘151/111 0 0 /

-150-

and use it as the starting values for the estimate of A . In case the pivotal estimates in
A10) are close to zero (in this example, either A“ or 2,, or ,1,, ), then we use the initial
estimate in A. as it is. Note that we need to check if rank( A101) = M .

Next, we compute the starting value for the estimate of ‘I’ based on A“) and F .
Since I = A‘I’AT and rank(A) = M , we can obtain ‘1’ by
L11 = (ATA)“(ATrA)(ATA)" because rank(ATA) = M and thus it is invertible.
Therefore, we let the starting value for the estimate of ‘I’ be

@(0) = ( [\(ov 8(0))-1( A(°)T?A‘°’ X Nov 11(0))4 . ( A _47)

Note that we need to check whether ‘1’”) is positive semi-deﬁnite before we use it as the
starting value.

We leave an option for the users that they can specify the starting values for A“)

and ‘1’”) .

Endnote:

Derivation of log-likelihood formula in Equation (A-S ):

In general, linear model with normal distribution as in Equation (A-3), the log-

likelihood is given by

N 1 J _, 1 J T _,
l=——log(2:r)+§Zlog|V, 1728in e,, (B-1)

2 j=l j=l
where V,’1 is deﬁned in Equation (A-4) and e, is deﬁned in Equation (A-2). By Smith’s

Theorem 2, (AQAT + ‘11)“1 = ‘1’" - ‘1’"A(AT‘I’"A + Q" )'1 A7"1’",

-151-

we have

-1_ —2 -1 T T
V. -0 (I,,—A,,AC,AA,,),

I

where
C,‘ = (A"A,T,A,,A +02‘I’")".
Now,
log| V,"'l = log|0"2(lnl - A,,AC,T'ATA,T, )l = logl 0’21", |+ loglIn1 - A,,AC,'ATA,T,|

T T
2 ' 2' _
=41, log(rr )+10gA2j’A Ilrlcjnl

 

 

(since

 

B=|A||D—CA"B|(=|D||A—BD"C|) |D-CA"B|=A BIA"I)
C D ’ C D '

TT
1A2)

2
=_n,log(0' )+logA,,A I

+ logl C,"|

 

 

A B
(Agaiaapplylc D'=(IAIID-CA"BI)=IDIIA-BD"C1-)

 

= -n, log(02)+ logllnll C, — ATA,T.I lA,,A

j ll

 

+ log| C,'|
= -n, log(az)+1og(1)+loglC, — ATA2T,A2,A|+1081C;'|

_—_ _n, log(a'z ) + loglC, - ATALAUAI + log! Cj'l

= —n, log(a’) + logl 02‘1’"|+log|C,7'| (of = (ATA2T, A, ,A + 0'2‘1’")")
= —n, log(a’)+ M log(02)+ l0gl‘*"'|+log|C}'| (I‘P"|=I‘PI")
= —{(n, - M)1og(a’)+10g|‘P|—log|C,'|}. (B-2)

-152-

J J
2 log| V,."l = — {(N — JM)log(0'2 ) + Jlog|‘1’|—Z loglC,'|} . (13-3)

j=l j-l
And

e,TV,"e, = e,T(In, — A,,AC,"ATJ€1,T,)e,/0'2
=(efe,—eIA,,-AC,'AT-Ar.e )/0'2,

211'

(34}

where efe, is computed by Equation (A-27), and A,T,e . is computed by Equation (A-28).

J

Thus,

J J
Zerfe, = 2(efe, — 4A,, -AC;'AT . Are )/ 0'2 (13-5)

21 J'
j=l j=|
Therefore, plugging Equations (B-3) and (B-5) into Equation (B-l), we obtain the log-

likelihood
l ’ J .
1: -5“); log(27r) +(N - JM)log(az)+ JlogM-Zloglc;'| + 2(efe, - 42,, ~AC,"AT ~A,T,e,)/ 02],

I" I"

which is Equation (A-S). Notice that both of them are functions of WSS and AC,'AT.

-153-

References

Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-I8 and 1991
Proﬁle. Burlington: University of Vermont Department of Psychiatry.

Adams, R. 1., Wilson, M., and Wu, M. (1997). Multilevel Item response Models: An
Approach to Errors in Variables Regression. Journal of Educational and
Behavioral Statistics, Spring 1997, Vol. 22, No. I, 47-76.

Aitkin, M., Anderson, D., & Hinde, J. (1981). Statistical Modeling of Data on Teaching
Style. Journal of Royal Statistical Society A, 144, Part 4, 419-461.

Aitkin, M., & Longford, N. (1986). Statistical Modeling Issues in School Effectiveness
Studies. Journal of Royal Statistical Society A, 1 4 9, Part 1, [-43.

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, second
edition. New York, John Wiley & Sons.

Arbuckle, J. L. (1995). Amos user’s guide. Chicago: SmallWaters.

Bartholomew, D. J., & Knott, M. (1999). Latent Variable Models and Factor Analysis,
second edition. Kendall’s Library of Statistics 7. London, Edward Arnold.

Becker, B. J. (1988). Synthesizing Standard Mean-change measures. British Journal of
Mathematical and Statistical Psychology 41, 257-378.

Becker, B. J. (1990). Coaching for the Scholastic Aptitude Test: Further Synthesis and
Appraisal. Review of Educational Research, vol. 60, No. 3, 373-417.

Bentler, P. M., & Wu, E. J. C. (1995). EQS for Windows User ’s Guide. Encino, CA:

Multivariate Software, Inc.

-154-

Bock, R. D. (1988). Multilevel Analysis of Educational Data. London: Academic Press.

Bock, R. D., & Aitkin, M. (1981). Marginal Maximum Likelihood Estimation of Item
Parameters: Application of EM algorithm. Psychometrika, 46, 443-459.

Bollen, K. A. (1989). Structural equations with latent variables. New York,: Wiley.

Bollen, K. A. and Joreskog, K. G. (1985). Uniqueness does not imply identiﬁcation: A
note on conﬁrmatory factor analysis. Sociological Methods and Research, I 4,
155-163.

Bryk, A. S., & Frank, K. (1991). The specialization of teachers’ work: An initial
exploratuion. In S. W. Raudenbush & J. D. Willms (Eds.), Schools, classrooms,
and pupils: International studies of schooling ﬁom a multilevel perspective (pp.
185-204). Orlando, FL: Academic Press.

Bryk, A. S., & Raudenbush, S. W. ( 1992 ). Hierarchical Linear Models. Applications
and Data Analysis Methods. Newbury Park, CA: Sage.

Bryk, A. S., Raudenbush, S. W., & Congdon, R. T. (1996). Hierarchical Linear and
Nonlinear Modeling with the HLM/2L and HLM/3L. Chicago: Scientiﬁc Software
International. Inc.

Catchpole, E. A., & Morgan, B. J. T. (1994). Boundary estimation in ring recovery
models. Journal of Royal Statistical Society, Series B, 49, 95-101.

Cheong, Y. F., & Raudenbush, S. W. (1999) Measurement and Structural Models for

Children ’s Problem Behaviors (under review).

-155-

Chou, C., Bentler, P. M., & Pentz, M. A. (1998). Comparisons of Two Statistical
Approaches to Study Growth Curves: The Multilevel Model and the Latent Curve
Analysis. Structural Equation Modeling, 5(3), 247-246.

Coleman, J. S., Hoffer, T., Kilgore, S B. (1982). High school achievement: Public,
Catholic and other schools compared. New York: Basic Books.

Cronbach, L. J. (1951). Coefﬁcients alpha and the internal structure of tests.
Psychometrika, 16, 297-334.

Davis, W. R. (1993). The F CI rule of identiﬁcation for conﬁrmatory factor analysis: A
general sufﬁcient condition. Sociological Methods and Research, 21, 403-43 7.

De Leeuw, J ., & Kreft, I. (1986). Random Coefﬁcients Models for Multilevel Analysis.
Journal of Educational Statistics, 1 1(1), 57-85.

DerSimonian, R., & Laird, N. M. (1983). Evaluating the effect of coaching on SAT
scores: A meta-analysis. Harvard Educational Review, 53, 1-15.

Gan, L. & Jiang, J. (1999). A Test for Global Maximum. Journal of American Statistical
Association, September, 1999, Vol. 94, No. 44 7, 847-854.

Garthwaite, P. H., Jolliffe, I. T., & Jones, B. (1995). Statistical Inference. London:
Prentice Hall.

Gleser, L. J. and Olkin, I. (1994). Stochastically Dependent Effect Sizes. In H. Cooper &
L. V. Hedges (Eds.), The Handbook of Research Synthesis, (pp. 339-355). New
York: Russell Sage Foundation.

Goldstein, H. G. (1986). Multilevel Mixed Linear Model Analysis Using Iterative

Generalized Least Squares. Biometrika, 73(1), 43-56.

-156-

Goldstein, H. G. (1995). Multilevel Statistical Models, Second Edition. Kendall’s Library
of Statistics 3. London: Edward Arnold.

Goldstein, H., and McDonald, R. P. (1988). A general model for the analysis of
multilevel data. Psychometrika, 53. 435-67.

Goldstein, H., Rashbash, J ., Plewis, 1., Draper, D., Browne, W., Yan, M., Woodhouse, G.,
and Healy, M. (1998). A User ’s Guide to ML Win. London: University of London,
Institute of Education.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item
Response Theory. Newbury Park, California: Sage Publication.

Hedeker, D., & Gibbons, R. D. (1996). MIXOR: a computer program for mixed-effects
ordinal regression analysis. Computer Methods and Programs in Biomedicine, 49,
229-252.

Heywood, H. B. (1931). On ﬁnite sequences of real numbers. Proc. Roy. Soc. Lon. 4:
245-253.

Howe, W. G. (1955). “Some contributions to factor analysis.” Report No. ORNL-l9l9.
Oak Ridge, TN: Oak Ridge National Laboratory.

Huttenlocher, J. E., Haight, W., Bryk, A. S., & Selzer, M. (1991). Early Vocabulary
Growth: Relation to Language Input and Gender. Developmental Psychology,
27(2), 236-249.

Jennrich, R. J ., & Schluchter, M. D. (1986). Unbalanced Repeated-Measures Models with

Structured Covariance Matrices. Biometrics, 42, 805-820.

-157-

Joreskog, K. G. (1979). “Author’s adendum to: a general approach to conﬁrmatory factor
analysis,” in K. G. Joreskog and D. Sorbom (eds.) Advances in Factor Analysis
and Structural Equation Models. Cambridge, MA: Abt Books.

J oreskog, K. G. and Sorbom, D. (1979). Advances in factor analysis and structural
equation models. Cambridge, MA: Abt Books.

J oreskog, K. G., & Sorbom, D. (1995). LISREL8 User ’s Manual. Chicago: Scientiﬁc
Software International.

Kamata, A. (1998). Some Generalizations of the Rasch Model: An Application of the
Hierarchical Generalized Linear Model. Unpublished doctoral dissertation,
Michigan State University, East Lansing.

Kalaian H. A., & Raudenbush, S. W. (1996). A Multivariate Mixed Linear Model for
Meta-Analysis. Psychological Methods, Vol. 1. No. 3, 227-335.

Kline, R. B. (1998). Principles and Practices of Structural Equation Modeling, New
York: Guilford Press.

Lee, V., & Bryk, A. S. (1989). A Multilevel Model of the Social Distribution of High
School Achievement. Sociology of Education, 62, 172-192.

Little, R. J. A., & Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York:
Wiley.

Longford, N. T. (1987). Fast Scoring Algorithm for Maximum Likelihood Estimation in
Unbalanced Mixed Models with Nested Random Effects. Biometrika, 74(4), 817-
827.

Longford, N. T. (1993). Random Coefﬁcient Models. Oxford: Oxford University Press.

-158-

Longford, N. T., & Muthén, B. O. (1992). Factor analysis for clustered observations.
. Psychometrika 5 7, 581-97.

Lord, F. M., & Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Menlo
Park, Calif: Addison-Wesley.

Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations.
Psychometrika, 54, 557-585.

Muthén, B. O. (1991). Multilevel factor analysis of class and student achievement
components. Journal of Educational Measurement, 28, 338-354.

Muthén, B. O. (1998). M-Plus User ’3. Muthén & Muthén, Los Angeles.

Muthén, B. O. (1998). Second-generation structural equation modeling with a
combination of categorical and continuous latent variables: New opportunities for
latent class/latent growth modeling. Paper presented at the conference New
Methods for the Analysis of Change, Pennsylvania State Univerisity, October
1998.

Muthén, B. O. & Curran, P. J. (1997). General Longitudinal Modeling of Individual
Differences in Experimental Designs: A Latent Variable Framework for Analysis
and Power Estimation. Psychological Methods, Vol. 2, No. 4, 371-402.

Muthén, B. O., Khoo, S. T. & Gustafsson, J. E. (1998). Multilevel latent variable
modeling in multiple population. Under review.

Rasch, G. (1960). Probabilistic Models for some intelligence and attainment tests.

Copenhagen: Danish Institute of Educational Research.

-159-

Raudenbush, S. W. (1994). Fisher Scoring/[GL5 and EM algorithm for a variety of two-
level models, College of Education, Michigan State university, Unpublished
Manuscript.

Raudenbush, S. W., & Bryk, A. S. (1986). A Hierarchical Model for Studying School
Effects. Sociology of Education, 59, 1-17.

Raudenbush, S. W., Becker, B. J., & Kalaian, H. (1988). Modeling Multivariate Effect
Sizes. Psychological Bulletin, Vol. 103, No. 1, 111-120.

Raudenbush, S. W., Rowan, B., & Kang, S. J. (1991). A Multilevel, Multivariate Model
for Studying School Climate With Estimation Via the EM Algorithm and
Application to US. High-School Data Journal of Educational Statistics, Vol. 16,
No. 4, 295-330.

Raudenbush, S. W., & Sampson, R. (1997). Assessing Direct and Indirect Associations in
Multilevel Designs with Latent Variables. (in press).

Raudenbush, S. W., & Kasirn, R. M. (1998). Cognitive Skill and Economic Inequality:
Findings from the National Adult Literacy Survey. Harvard Educational Review,
Vol. 68, No. 1, Spring 33-79.

Raudenbush, S. W., & Sampson, R. (1999). “Ecometrics” Toward a Scientiﬁc Ecological
Settings, with Application to the Systematic Social Observation. Sociological
Methodology, Vol. 29, 1-41.

Reckase, M. D. (1991). The Discriminating Power of Items That Measure More Than
One Dimension. Applied Psychological Measurement, Vol. 15, No. 4, December,

361-373.

-l60-

SAS Institute (1990). SAS/STAT User ’s Guide, Version 6, Fourth Edition, Volume 1. SAS
Institute Inc., Cary, NC, USA.

SAS Institute (1996). SAS/STATSofrware: Changes and Enhancements through Release
6.11, Cary, NC, USA.

Thum, Y. M. (1997). Hierarchical Linear Models for Multivariate Outcomes. Journal of
Educational and Behavioral Statistics, Spring 199 7, Vol. 22, No. 1 , 77-108.

Thurston, L. L. (1947). Multiple Factor Analysis. Chicago, IL: University of Chicago
Press.

Wald, A. (1950). A note on the identiﬁcation of economic relations. In T. C. Koopmans,
ed., Statistical Inference in Dynamic Economic Models. New York: Wiley, pp.
238-244.

Wolﬁ'an, S. (1991). Mathematica: A system for Doing Mathematics by Computer. Second
Edition. Reading, MA: Addison-Wesley.

Willet, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect
correlates and predictors of change. Pychological Bulletin, 116, 363 -381.

Willet, J. B., & Sayer, A. G. (1996). Growth modeling and covariance structure analysis.
In Advanced Structural Equation Modeling, Issues and Techniques (Ed. by

Marcoulides, G. A. & Schumacker, R. E.) 125-157.

-161-

"1111111111111111171111“