2010

This is to certify that the
dissertation entitled

A UNIFIED MODEL FOR THE ANALYSIS OF
INDIVIDUAL LATENT TRAJECTORIES

presented by

CHUEH-AN HSIEH

has been accepted towards fulfillment
of the requirements for the

Ph.D. degree in Measurement and Quantitative
Methods

Major Professor's Signature

Date

MSU is an Affirmative Action/Equal Opportunity Employer

 

 


A UNIFIED MODEL FOR
THE ANALYSIS OF INDIVIDUAL LATENT TRAJECTORIES

By

Chueh-An Hsieh

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Measurement and Quantitative Methods

2010

ABSTRACT

A UNIFIED MODEL FOR
THE ANALYSIS OF INDIVIDUAL LATENT TRAJECTORIES

By
Chueh-An Hsieh

The application of item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration the characteristics of both item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the latent growth curve (LGC) model. This thesis demonstrates the potential of Bayesian methods and proposes a comprehensive modeling framework that combines a measurement model with a structural model. That is, through the incorporation of a commonly used link function and Bayesian estimation, an item response theory (IRT) model can be naturally introduced into a latent variable model (LVM).

All proposed analyses are implemented in WinBUGS 1.4.3 (Spiegelhalter, Thomas, Best, and Lunn, 2003), which allows researchers to use Markov chain Monte Carlo (MCMC) simulation methods to fit complex statistical models and circumvent intractable analytic or numerical integrations. The utility of this IRT-LVM modeling framework was investigated with both simulated and empirical data, and promising results were obtained. As the results indicate, the IRT-LVM uses information from the individual items of the scales at each point in time, allows the use of item response characteristics from distinct psychometric models, permits the separation of time-specific error and measurement error, and gives researchers a way to evaluate the factorial invariance of latent constructs across assessment occasions.

ACKNOWLEDGEMENTS

After a few years of work and fun, this is it: my thesis, about models for
individual latent trajectories, on which I worked at the Measurement and Quantitative
Methods (MQM) program of Michigan State University. I would like to thank everyone
at MQM for the good times I had studying there, and for a grant received from the
American Educational Research Association (AERA) Grants Program under the National
Science Foundation (NSF) Grant #DRL-0941014. There are some people I would like to
thank in particular:

Dr. Matthew A. Diemer, for being a good model when I worked with him as a research assistant; from him I learned the importance of maintaining a diligent and persistent attitude toward research; Dr. Richard T. Hoang, for his friendly and unique interpretation of educational statistics, which opened up a real world in front of me; Dr. Kimberly S. Maier, for the freedom and patience she always granted me, which have become incredible assets that will greatly enhance my future; Dr. Mark D. Reckase, for his deep knowledge of measurement, which intrigues me as a way to see things differently; Dr. Alexander A. von Eye, for his fruitful advice and pleasant cooperation, which have continuously helped me consider how to conduct rigorous research. Also, to my friends in Japan, Taiwan, the United States, and the United Kingdom: without the invaluable friendship of you all, this learning journey toward a Ph.D. could not have been as smooth as it was.

Finally, I would like to dedicate this thesis to my family: through your unconditional support and love, I learned how to push the boundary a bit further!

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER 1
A UNIFIED MODELING APPROACH
A Unidimensional IRT-LVM: 2PL-LGC/2PNO-LGC
A Multidimensional IRT-LVM: MGRM-ALGC
CHAPTER 2
MODEL FORMULATION
The Measurement Model
Unidimensional Graded Response Models: GRMs
Multidimensional Graded Response Models: MGRMs
The Structural Model
Univariate Latent Growth Curve Analysis: LGC
Multivariate Latent Growth Curve Analysis: the Associative LGC
CHAPTER 3
BAYESIAN INFERENCE
Estimating Complex Statistical Models Using the Markov Chain Monte Carlo
Sampling Procedures
Specification of Priors
Monitoring the Markov Chain(s) and Evaluating the Model Goodness of Fit
CHAPTER 4
PRACTICAL ILLUSTRATION
Using the RASCH-LLGC to Evaluate the Model Parameter Estimate Performance
Monte Carlo Simulation Study
Prior Knowledge Incorporation
Fit of the 2PNO-LGC to the Abortion Data
Measures and Data Sources
Unconditional Models
Model Equivalence
Missing Longitudinal Data Compensation
Using the MGRM-ALGC to Study the Parallel Process of Change
Participants
Measures
Dimensionality Assessment
Identification Constraints and Prior Distribution Specification
Empirical Results
CHAPTER 5
DISCUSSION AND CONCLUSION
Significance of the Present Work
Future Research
APPENDICES
APPENDIX A
APPENDIX B
REFERENCES

LIST OF TABLES

Table 4.1.1 The Simulation Design Layout
Table 4.1.2 The Population Values Used in the RASCH-LLGC Model
Table 4.1.3 Performance of the Estimated Average Latent Trajectory in the RASCH-LLGC Model
Table 4.1.4 Different Types of Prior Used for the Simulated Data Set (SE125110)
Table 4.1.5 Parameter Estimates with Different Priors for the Simulated Data
Table 4.2.1 The Seven Items Concerning Attitudes to Abortion on the British Social Attitudes Panel Survey, 1983-1986
Table 4.2.2 Breakdown Table for the Restricted Data/Complete Cases
Table 4.2.3 Breakdown Table for the Full Data/Available Cases
Table 4.2.4 Frequencies of the Response Patterns Observed for the 1983-1986 Panels (Complete Cases)
Table 4.2.5 Frequencies of the Response Patterns Observed for the 1983-1986 Panels (Available Cases)
Table 4.2.6 Different Types of Prior Used in the Present Study
Table 4.2.7 Parameter Estimates of the 2PNO-LGC Model (Restricted Data)
Table 4.2.8 Sensitivity Analysis: Parameter Estimates of the 2PNO-LGC Model (Restricted Data)
Table 4.2.9 Bayesian Estimates of the Model Parameters under (1) the HLM and (2) the LGC Model for a Simulated Data Set
Table 4.2.10 Unconditional Models: Parameter Estimates of the 2PNO-LGC Model (Both Data Sets)
Table 4.2.11 Conditional Models: Parameter Estimates of the 2PNO-LGC Model
Table 4.3.1a Summary Statistics for Longitudinal NYS Data: Social Isolation
Table 4.3.1b Summary Statistics for Longitudinal NYS Data: Deviant Peers Affiliation
Table 4.3.2 Response Frequencies to 13 Outcome Measures
Table 4.3.3 Different Types of Prior Used in the Present Study
Table 4.3.4 Unconditional Models: Parameter Estimates of the GRM-LGC Model for Each Dimension
Table 4.3.5a Correlations among Adolescents' Social Isolation and Extent of Exposure to Delinquent Peers
Table 4.3.5b Unconditional Models: Parameter Estimates of the MGRM-ALGC Model for Both Dimensions
Table 4.3.6 Unconditional Models: Parameter Estimates of the MGRM-ALGC Model with Different Scaling Options (Both Dimensions)
Table 4.3.7 Results from the ALGC Model Using Two Analytical Approaches with a Simulated Data Set
Table 4.3.8a Correlations among Adolescents' Social Isolation and Extent of Exposure to Delinquent Peers
Table 4.3.8b Estimates of Fixed and Random Effect Parameters in the MGRM-ALGC Model

LIST OF FIGURES

Figure 2.1 Path diagram of a bivariate latent growth model.
Figure 4.2.1 Path diagram of a four-wave 2PNO-LGC model.
Figure 4.2.2 Kernel density for the restricted data: One single long chain (excerpted).
Figure 4.2.3 Kernel density for the restricted data: Three independent chains.
Figure 4.2.4 Gelman-Rubin statistic for the restricted dataset: Three independent chains (excerpted).
Figure 4.3.1a Perceived social isolation across five occasions (n=44).
Figure 4.3.1b Perceived extent of exposure to delinquent peers across five occasions (n=44).
Figure 4.3.2 MCMC convergence diagnostics: Gelman and Rubin statistics.

INTRODUCTION
Longitudinal Data Analysis

The use of growth models in social, behavioral, and educational research has increased rapidly because these models answer important research questions concerning the nature of psychological and social development and the process of learning. It is well known that growth models can be approached from several perspectives through the formulation of equivalent models, which provide identical estimates for a given data set (e.g., Bauer, 2003; Chou, Bentler, and Pentz, 1998; Curran, 2003; Engel, Gattig, and Simonson, 2007; Hox and Stoel, 2005; Hsieh and Maier, 2009; Willett and Sayer, 1994). For instance, a model can be constructed as a standard two-level hierarchical linear model (HLM), in which the repeated measures are positioned at the lowest level and treated as nested within individuals (e.g., Singer, 1998; Steele and Goldstein, 2007). Equally, a model can be constructed as a structural equation model (SEM), in which latent variables account for the relations among the observed variables and provide estimates of the individual growth parameters and of inter-individual differences in change across all members of the population; hence the name latent growth curve (LGC) analysis.

It is this mean and covariance structure (MCS) that makes it possible to specify exactly the same model as either an HLM or an LGC model, because the fixed and random effects in the HLM correspond to the means and covariances of the latent variables in the LGC analysis. Within the HLM framework, time is an independent variable at the lowest level and the individual is defined at the second level, so time-varying and time-invariant explanatory variables can be incorporated into the level-1 and level-2 models, respectively. The intercept and slope describe the average status and the rate of change, and inter-individual differences in the change profile can be modeled as random effects for the intercept, the slope of the time variable, or both (Raudenbush and Bryk, 2002). Likewise, within the LGC model, the time variable enters as a series of constrained factor loadings on the latent variable representing the shape of the growth curve, while all factor loadings on the latent variable representing the initial level are fixed at one. Thus, the means of the initial level and shape factors depict the mean growth status and the growth rate, and inter-individual differences in change are captured by the variances and covariance of the level and shape factors (Meredith and Tisak, 1990).
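The correspondence can be checked numerically. The minimal sketch below (Python with NumPy; every parameter value is hypothetical) computes the LGC-implied means $\Lambda\alpha$ and covariance $\Lambda\Psi\Lambda^\top + \sigma^2 I$ for a linear trajectory, then simulates the same model the HLM way, giving each person a random intercept and slope on time, and compares the sample moments with the implied ones. It assumes homoscedastic, uncorrelated level-1 residuals, a simplification rather than a requirement of either framework.

```python
import numpy as np

def lgc_implied_moments(loadings, factor_means, factor_cov, resid_var):
    """Mean vector and covariance matrix implied by a linear LGC with homoscedastic residuals."""
    mean = loadings @ factor_means
    cov = loadings @ factor_cov @ loadings.T + resid_var * np.eye(loadings.shape[0])
    return mean, cov

T = 4
time = np.arange(T, dtype=float)                # time scores 0, 1, 2, 3
Lambda = np.column_stack([np.ones(T), time])    # level loadings fixed at 1, shape loadings = time
alpha = np.array([2.0, 0.5])                    # hypothetical mean level and mean slope
Psi = np.array([[1.0, 0.3], [0.3, 0.2]])        # hypothetical (co)variances of level and slope
sigma2 = 0.5                                    # hypothetical level-1 residual variance

mean, cov = lgc_implied_moments(Lambda, alpha, Psi, sigma2)

# HLM view: each person has a random intercept and slope on time, plus level-1 noise.
rng = np.random.default_rng(0)
n = 200_000
coefs = rng.multivariate_normal(alpha, Psi, size=n)   # (n, 2): person-specific intercept, slope
y = coefs[:, [0]] + coefs[:, [1]] * time + rng.normal(0.0, np.sqrt(sigma2), size=(n, T))

mean_gap = np.abs(y.mean(axis=0) - mean).max()
cov_gap = np.abs(np.cov(y, rowvar=False) - cov).max()
print(mean)                # LGC-implied means at the four occasions
print(mean_gap, cov_gap)   # both small: the two formulations imply the same moments
```

Because the HLM design matrix for time and the LGC loading matrix are the same object here, the equivalence holds by construction; the simulation only makes that visible.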

Although several key differences remain between these two models, at the time of writing the discrepancies have been rapidly disappearing (Curran, Obeidat, and Losardo, in press; Preacher, Wichman, MacCallum, and Briggs, 2008; Raykov, 2007). One primary difference is that in the HLM time is treated as a fixed explanatory variable, whereas in the LGC model time is introduced via the factor loadings. This makes the HLM better suited to designs in which the measurement occasions vary widely across individuals (Snijders, 1996; Willett and Sayer, 1994), whereas the LGC model is considered best suited to time-structured data or a fixed-occasion design (e.g., Byrne and Crombie, 2003; Skrondal and Rabe-Hesketh, 2008). As a consequence, the HLM is essentially a univariate approach, with time points treated as observations of the same variable, whereas the LGC model is essentially a multivariate approach, with each time point treated as a separate variable (e.g., Bauer, 2003; Curran, 2003; Hox and Stoel, 2005; Preacher et al., 2008; Raudenbush and Bryk, 2002; Willett and Sayer, 1994).

Research Motivation

When the outcome measurements are on a discrete scale, however, applying conventional growth curve models can introduce a potentially substantial bias into the analysis and subsequent inferences (Curran, Edwards, Wirth, Hussong, and Chassin, 2007). Currently, there are two major modeling strategies that allow for the explicit incorporation of categorical repeated data in growth curve models. One strategy is to use the nonlinear multilevel model (e.g., Diggle, Heagerty, Liang, and Zeger, 2002; Gibbons and Hedeker, 1997; Johnson and Raudenbush, 2006), and the other is to use the nonlinear structural equation model (e.g., Jöreskog, 2002; Muthén, 1983, 1984, 1996, 2002). As Curran et al. (2007) and Vermunt (2007) indicate, when measurement models are fitted to empirical data with features commonly encountered in developmental research, such as small sample sizes, multiple discretely scaled items, many repeated assessments, and attrition over time, both models become quite complex and have difficulty achieving convergence.

Moreover, with categorical response variables, when a model contains more than two or three latent variables with random effects and relies on the untestable assumption that these random coefficients follow a multivariate normal distribution, the integrals in the likelihood function have no closed form and must be approximated numerically (Moustaki and Knott, 2000; Vermunt, 2007). In addition, standard errors are difficult to calculate when the expectation-maximization (EM) algorithm is used to compute the maximum likelihood estimates (Jamshidian and Jennrich, 2000). To address these difficulties, we bridge the gap with an integrative modeling framework: a derivative of the generalized linear latent and mixed model¹ (GLLAMM; Skrondal and Rabe-Hesketh, 2004), strengthened by the attributes of the item response theory (IRT) model (e.g., Lord and Novick, 1968), the latent variable model (LVM) (e.g., Muthén, 2002), and the Bayesian estimation approach. An overall "true score" can be generated from a second-order latent growth curve analysis, in which each item provides information, reduces our uncertainty about the examinees, and reflects respondents' positions on the underlying dimension (e.g., Bollen, 1989; Curran et al., 2007; Fox, 2007; Preacher et al., 2008; Sayer and Cumsille, 2001; Wiggins, Ashworth, O'Muircheartaigh, and Galbraith, 1990).
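The integration problem can be illustrated with a minimal sketch for a single person under a 2PL model with $\theta \sim N(0, 1)$: the marginal likelihood $\int \prod_i P(Y_i \mid \theta)\,\phi(\theta)\,d\theta$ has no closed form, so it is approximated here with Gauss-Hermite quadrature (NumPy's `hermegauss`) and checked against brute-force Monte Carlo. The item parameters and response pattern are hypothetical. The last line shows why the approach scales badly: a product grid with $Q$ nodes per dimension needs $Q^D$ evaluations for $D$ random effects.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def marginal_likelihood_2pl(y, a, beta, n_nodes=21):
    """P(response pattern y) for one person, integrating theta ~ N(0, 1) out of a 2PL model."""
    nodes, weights = hermegauss(n_nodes)        # Gauss-Hermite rule for weight exp(-x**2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)    # rescaled so the weights sum to 1 (standard normal)
    # P(Y_i = 1 | theta = node) for every item (rows) and node (columns)
    p = 1.0 / (1.0 + np.exp(-a[:, None] * (nodes[None, :] - beta[:, None])))
    lik = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)   # local independence
    return float(np.sum(weights * lik))

# Hypothetical item parameters and one observed response pattern
a = np.array([1.0, 1.5, 0.8, 1.2, 2.0])
beta = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([1, 1, 0, 1, 0])

quad = marginal_likelihood_2pl(y, a, beta)

# Brute-force Monte Carlo estimate of the same integral
rng = np.random.default_rng(1)
theta = rng.standard_normal(200_000)
p_mc = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - beta[:, None])))
mc = float(np.mean(np.prod(np.where(y[:, None] == 1, p_mc, 1.0 - p_mc), axis=0)))

# Grid size for 21 nodes per dimension as the number of random effects D grows
grid_sizes = {d: 21 ** d for d in (1, 2, 3, 4)}
```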

 

¹ Analogous to the different treatment of the time variable in the HLM and LGC, time is treated as a fixed explanatory variable in the growth model embedded in the GLLAMM, but is introduced via the factor loadings in the present study.

Objectives of the Present Work

The application of item response theory models to repeated observations has demonstrated great promise in developmental research. It allows one to take into consideration the characteristics of both item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the latent growth curve model (e.g., Bollen, 1989; Curran et al., 2007; Fox, 2007; Hsieh and von Eye, in press; Preacher et al., 2008; Sayer and Cumsille, 2001; Wiggins et al., 1990). Within this modeling framework, different types of item response models and latent growth curve analyses can be combined to address various research questions. In addition, different data structures can be accommodated, such as unidimensional vs. multidimensional item response theory models, dichotomous vs. polytomous items, linear vs. nonlinear change trajectories, and single-domain vs. multiple-domain latent growth curve analyses. In longitudinal studies, although the development of a single behavior is often of interest, it is worthwhile to extend the analysis to multiple domains and to model their interrelationships simultaneously across the entire study period (e.g., Cheong, MacKinnon, and Khoo, 2003; Preacher et al., 2008; Raykov, 2007).

In the present study, the hierarchical nature of latent variable problems suggests a Bayesian approach to estimation. Bayesian methods are well suited to estimating complex statistical models, and Bayesian data analysis offers a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, the efficient incorporation of prior information into empirical data analysis, and the ability to take account of uncertainty among different models and to draw combined inferences when there is no single pre-eminent model (Best, Spiegelhalter, Thomas, and Brayne, 1996; Maier, 2001; Rupp, Dey, and Zumbo, 2004; Western, 1999). Additionally, unlike maximum likelihood estimation (MLE), which relies on large samples to approximate the sampling distribution of sample statistics, Bayesian inference can be seen as a plausible way to deal with small-sample studies (Congdon, 2005; Lee and Wagenmakers, 2005; Zhang, Hamagami, Wang, Grimm, and Nesselroade, 2007). Beyond this, the Bayesian method has a unique strength: the systematic incorporation of prior information from previous studies (Scheines, Hoijtink, and Boomsma, 1999; Rupp et al., 2004; Zhang et al., 2007). Bayesian methods and Bayes' theorem permit previous findings to enter the analysis as supplementary and influential information, whereas traditional likelihood methods cannot do this (Western and Jackman, 1994). Rather than undertaking statistical analysis in isolation, Bayesian learning draws on existing knowledge in the prior framing of the model and combines existing evidence with the study data at hand during estimation (Congdon, 2005). In addition, interval estimates are a direct by-product of Bayesian estimation: because the full posterior distribution of the parameters is available, inferences on functions of parameters are easily obtained.
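As a minimal illustration of combining prior evidence with current data and reading intervals off the posterior, the sketch below uses the textbook normal-normal conjugate update for a single mean with a known residual SD. The prior (imagined as a summary of a previous study) and the data summary are hypothetical, and the IRT-LVM posteriors in this thesis are not conjugate, which is why MCMC is used; the point is only that, once posterior draws exist, intervals and probabilities for any function of the parameters are simple summaries of those draws.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical prior for a mean growth rate, e.g., summarizing a previous study
prior_mean, prior_sd = 0.5, 0.2
# Hypothetical current data: n observations with sample mean ybar and known residual SD sigma
n, ybar, sigma = 25, 0.8, 1.0

prior_prec = 1.0 / prior_sd**2             # precision contributed by the prior
data_prec = n / sigma**2                   # precision contributed by the data
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec   # precision-weighted average
post_sd = post_prec ** -0.5

posterior = NormalDist(post_mean, post_sd)
exact_95 = (posterior.inv_cdf(0.025), posterior.inv_cdf(0.975))

# With posterior draws, intervals and functions of parameters are direct summaries
rng = np.random.default_rng(2010)
draws = rng.normal(post_mean, post_sd, size=100_000)
mc_95 = tuple(np.quantile(draws, [0.025, 0.975]))
p_exceeds_prior = float(np.mean(draws > prior_mean))   # P(mu > 0.5 | data)
```

With equal prior and data precision, the posterior mean lands halfway between the prior mean and the sample mean, which is the sense in which earlier findings act as "supplementary and influential information".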

Thus, in order to weigh individual items differentially and examine developmental stability and change over time, this thesis seeks to demonstrate the potential of Bayesian methods and to propose a comprehensive modeling framework combining a measurement model and a structural model. That is, through the incorporation of a commonly used link function and Bayesian estimation, the item response theory (IRT) model can be naturally introduced into the latent variable model (LVM). Although many components require attention, this thesis restricts its focus to the following issues: (1) model formulation: how Bayesians explicitly incorporate (multivariate) multiple repeated measures on a discrete scale into a latent growth curve model, presenting the unidimensional Rasch (1960) and linear latent growth curve model (RASCH-LLGC), the unidimensional two-parameter normal ogive (e.g., Birnbaum, 1968) and nonlinear latent growth curve model (e.g., Meredith and Tisak, 1990) (2PNO-LGC), and the multidimensional graded response (e.g., De Ayala, 1994) and associative latent growth curve model (e.g., McArdle, 1988) (MGRM-ALGC); (2) evaluation of the performance of model parameter estimates: because the sample size needed for a particular longitudinal study depends on many factors, an "adequate" sample size is hard to determine unambiguously. As a simplified illustration, we demonstrate how to evaluate the performance of parameter estimates by conducting a Monte Carlo study. For instance, to evaluate the numerical behavior of the average growth trajectory in Bayesian analysis, we conduct a small-scale simulation study using a 2 × 3 × 2 design with 12 conditions. Holding constant the number of repeated assessments and the growth curve reliability (GCR), we assume that the performance of a particular parameter estimate (the stability and variability of the average growth trajectory in the RASCH-LLGC model) is a function of the sample size, the number of items administered at each point in time, and the standardized effect size of the average growth trajectory; and (3) model application: the capacity of this comprehensive IRT-LVM framework was investigated with two empirical data sets. One data set, drawn from part of the British Social Attitudes Panel Survey (1983-1986), reveals the attitudes toward abortion of a representative sample of adults aged 18 or older living in Great Britain (McGrath and Waterton, 1986). The other data set, subsampled from the National Youth Survey (NYS; Elliott, 1976-1987), depicts the dynamic relations between two interrelated dimensions (namely, social isolation and the extent of exposure to delinquent peers among adolescents aged 11 to 17 in 1976) across five consecutive years (1976-1980).
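The simulation design described in (2) can be sketched as follows. The text fixes only the 2 × 3 × 2 structure (sample size, items per occasion, standardized effect size), so the level labels below are hypothetical placeholders, and the generator is an illustrative RASCH-LLGC data-generating process with hypothetical parameter values: a linear latent trajectory with time scores 0, 1, ..., T − 1 and uncorrelated level and slope factors (an assumption), feeding Rasch item responses with all discriminations equal to one.

```python
import itertools
import numpy as np

# Placeholder labels: the design fixes 2 x 3 x 2 levels, but these level values are hypothetical.
sample_sizes = ["N_small", "N_large"]
items_per_occasion = ["I_few", "I_medium", "I_many"]
effect_sizes = ["d_small", "d_large"]
conditions = list(itertools.product(sample_sizes, items_per_occasion, effect_sizes))

def simulate_rasch_llgc(n, difficulties, n_occasions, level_mean, slope_mean,
                        level_sd, slope_sd, resid_sd, rng):
    """Binary responses y[n, t, i] from a Rasch model whose ability follows a linear LGC.

    Assumes uncorrelated level and slope factors and time scores 0, 1, ..., T - 1.
    """
    time = np.arange(n_occasions)
    level = rng.normal(level_mean, level_sd, size=n)
    slope = rng.normal(slope_mean, slope_sd, size=n)
    theta = (level[:, None] + slope[:, None] * time
             + rng.normal(0.0, resid_sd, size=(n, n_occasions)))
    logits = theta[:, :, None] - np.asarray(difficulties)[None, None, :]   # Rasch: a_i = 1
    prob = 1.0 / (1.0 + np.exp(-logits))
    y = (rng.random(prob.shape) < prob).astype(int)
    return y, theta

# One replication for one hypothetical condition: 500 persons, 3 items, 4 occasions
rng = np.random.default_rng(42)
y, theta = simulate_rasch_llgc(500, [-1.0, 0.0, 1.0], 4, 0.0, 0.5, 1.0, 0.3, 0.5, rng)
```

In a full Monte Carlo study, each of the 12 conditions would be replicated many times, each replicate fitted, and the bias and variability of the estimated average trajectory summarized per condition.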

Since missing data are unavoidable in almost all serious statistical analyses, it is worth noting that Bayesian inference, as an alternative estimation method, explicitly models missing outcomes and handles them as extra parameters to estimate (Gelman and Hill, 2007; Jackman, 2000; Patz and Junker, 1999b; Spiegelhalter et al., 2003). It is therefore straightforward to draw the missing values at each iteration of the sampler. Although the way in which Bayesian estimation compensates for missing data is similar to the multiple imputation (MI) technique described by Rubin (1987), it extends the MI method by jointly simulating the distributions of the variables with missing data and the unknown parameters (Carrigan, Barnett, Dobson, and Mishra, 2007). Through this fully Bayesian (FB) method, the missing values are treated as additional parameters to estimate, and inferences about the model parameters are obtained by integrating over them in the exact joint posterior distribution of all parameters and latent variables. Thus, in the first empirical data example, we illustrate how to incorporate individual-level auxiliary predictors and estimate missing values in a conditional model via the Bayesian estimation approach.
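The mechanics of treating missing outcomes as extra unknowns can be sketched with a two-step Gibbs sampler for a simple regression. This is a minimal sketch, far simpler than the IRT-LVM, assuming a known residual SD, a flat prior on the coefficients, and outcomes missing at random given a covariate; the function name and data are hypothetical. Each sweep draws the coefficients given the completed data and then redraws every missing outcome given the coefficients, so summaries involving the missing values, such as the overall mean of $y$, are averaged over the imputations.

```python
import numpy as np

def gibbs_impute(x, y_obs, sigma=1.0, n_iter=2000, burn_in=500, seed=11):
    """Gibbs sampler that treats missing outcomes (NaN) as extra unknowns.

    Model: y = b0 + b1 * x + e, e ~ N(0, sigma^2) with sigma known; flat prior on (b0, b1).
    Returns posterior draws of the mean of y over all n cases (observed + imputed).
    """
    rng = np.random.default_rng(seed)
    miss = np.isnan(y_obs)
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = sigma * np.linalg.cholesky(XtX_inv)
    y_cur = np.where(miss, np.nanmean(y_obs), y_obs)     # crude starting values for missing y
    mean_draws = []
    for it in range(n_iter):
        # Step 1: coefficients given the currently completed data
        beta = XtX_inv @ (X.T @ y_cur) + chol @ rng.standard_normal(2)
        # Step 2: each missing outcome given the coefficients
        y_cur[miss] = X[miss] @ beta + sigma * rng.standard_normal(int(miss.sum()))
        if it >= burn_in:
            mean_draws.append(y_cur.mean())
    return np.array(mean_draws)

# Hypothetical data: y is more often missing when x is large (missing at random given x)
rng = np.random.default_rng(7)
n = 2000
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)
missing = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * x))
y_obs = np.where(missing, np.nan, y)

draws = gibbs_impute(x, y_obs)
cc_mean = np.nanmean(y_obs)     # complete-case mean: biased downward in this setup
fb_mean = draws.mean()          # fully Bayesian estimate of the mean of y
```

Because missingness depends on $x$, the complete-case mean is pulled toward cases with small $x$, while the imputation-based estimate stays close to the mean of the full (unobserved) sample.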

In the second empirical data example, we use the multidimensional graded response model (MGRM; De Ayala, 1994; Reckase, 2009) and associative latent growth curve analysis (ALGC; McArdle, 1988) to model the dynamic relations between two interrelated dimensions across five consecutive years (1976-1980). To evaluate the performance of this comprehensive modeling approach, we compare and contrast the corresponding parameter estimates obtained from two distinct analytical approaches with a simulated data set, namely, a two-stage IRT-based score analysis and a single-stage IRT-based score analysis. As opposed to the traditionally adopted method (e.g., an average composite), this approach enables the researcher to use the individual items of the scales at each point in time, allowing the use of item response characteristics from distinct psychometric models, permitting the separation of time-specific error and measurement error, and providing a common ground for testing measurement invariance across occasions. As for the substantive merit, the following hypothesized association can be tested: as adolescents perceive themselves to be more socially isolated, the chance that they engage with delinquent peers becomes substantially larger.

Chapter 1
A UNIFIED MODELING APPROACH

As suggested by McArdle (1988), to provide a more rigorous basis for meaningful scaling, the researcher could consider incorporating contemporary IRT models and/or generalized linear models (GLIMs) into the latent growth curve analysis, because the IRT approach provides several distinct benefits over traditional methods. These benefits include facilitating the identification of items that discriminate among respondents across the range of underlying latent abilities, reporting item statistics and person abilities on the same scale, flexibility in incorporating auxiliary information, constructing scales, and examining measurement invariance, and more (see de la Torre and Patz, 2005; Embretson and Reise, 2000; Hambleton and Jones, 1993). When we incorporate random effects in the underlying continuous latent constructs (i.e., when we augment GLIMs by including random effects in the latent variables, hence the name 'generalized linear mixed models', GLMMs) and regress latent variables on other latent variables or covariates, this unified model becomes the generalized linear latent and mixed model (GLLAMM). As a class of multilevel latent variable models, the GLLAMM encompasses a response model and a structural model (Skrondal and Rabe-Hesketh, 2003, 2004); here, the IRT model is the response model and the LGC analysis is the structural model.

A Unidimensional IRT-LVM: 2PL-LGC/2PNO-LGC

In the scenario of unidimensional item response models, the GLIM formulation is typically used. Through a commonly used link function, either a logit or a probit, the conditional probability of a particular response given the latent trait can easily be specified. The classical application of these models is in the literature on educational testing and psychometrics, where the subscript $i$ represents an item or question in a test and dichotomous responses are scored as correct (1) or incorrect (0). In this setting, $\theta_n$ represents the latent ability of person $n$, and the model is parameterized as either

$$\operatorname{logit}\left[P(Y_{in} = 1 \mid \theta_n)\right] = a_i(\theta_n - \beta_i) \quad \text{or} \quad \operatorname{probit}\left[P(Y_{in} = 1 \mid \theta_n)\right] = a_i(\theta_n - \beta_i)$$

($i = 1, \ldots, I$; $n = 1, \ldots, N$), corresponding to a unidimensional two-parameter logistic (2PL) item response theory model or a unidimensional two-parameter normal ogive (2PNO) model, respectively. Here, the abilities can be interpreted either as logits or as probits of the probability of a correct response to a particular item. The item difficulty parameters ($\beta_i$) are defined as the locations of the inflection points of the item characteristic curves (ICCs) on the same scale as the latent ability ($\theta_n$), whereas the $a_i$ are proportional to the slopes of the ICCs at their inflection points. They can be considered the degree to which the item response varies with the underlying latent construct, and they help determine how well the item discriminates between subjects with different abilities (e.g., Birnbaum, 1968; Lord and Novick, 1968).
As regards the link function, given the similarity between the logit and probit links, the two models give the same substantive conclusions in most applications (Liao, 1994; Stefanescu, Berger, and Hershberger, 2005). Normally, multiplying probit estimates by a factor of $\pi/\sqrt{3}$ approximately converts them into logit estimates, so we can go from one set of estimates to the other.²
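The approximate equivalence of the two links can be checked numerically. The sketch compares the standard normal CDF $\Phi(x)$ with the logistic function evaluated at $kx$ for three scaling factors: $\pi/\sqrt{3} \approx 1.81$, which matches the standard deviations of the logistic and standard normal distributions; 1.6, at the low end of Amemiya's (1981) range; and the constant 1.702 commonly used in IRT. A probit slope $a_i$ then corresponds roughly to a logit slope $k a_i$. The grid and the choice of factors are illustrative, not values from this study.

```python
import math
import numpy as np

xs = np.linspace(-6.0, 6.0, 2401)
phi_cdf = 0.5 * (1.0 + np.array([math.erf(x / math.sqrt(2.0)) for x in xs]))   # standard normal CDF

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Candidate probit-to-logit scaling factors
factors = (1.6, 1.702, math.pi / math.sqrt(3.0))
gaps = {f: float(np.max(np.abs(logistic(f * xs) - phi_cdf))) for f in factors}

for f, gap in gaps.items():
    print(f"factor {f:.3f}: max |logistic(f * x) - Phi(x)| = {gap:.4f}")
```

All three factors keep the two curves close; 1.702 minimizes the largest absolute gap (below 0.01), whereas $\pi/\sqrt{3}$ matches the variances rather than the curves' maximum discrepancy.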

However, when the distribution of observations has heavy tails, estimates from logit and probit models can differ substantially (Amemiya, 1981). Researchers may therefore choose between the two link functions via model comparison. Among Markov chain Monte Carlo (MCMC) sampling algorithms, direct Gibbs sampling (Albert, 1992; Chib and Greenberg, 1995; Gelfand, Hills, Racine-Poon, and Smith, 1990; Patz and Junker, 1999a) has been implemented for normal ogive item response models by means of a process called data augmentation (Albert and Chib, 1997; Fox, 2007; Jackman, 2000; Kim and Bolt, 2007; Stefanescu et al., 2005). The Gibbs sampler extracts marginal distributions from the full conditional distributions when those conditional distributions have a known form (Geman and Geman, 1984); with data augmentation under a probit link and normal priors, the full conditionals of the augmented latent responses, abilities, and item parameters are (truncated) normal distributions. Therefore, the probit link³ is considered the more appropriate function for estimating the two-parameter normal ogive (2PNO) IRT-LGC model.
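The data-augmentation idea can be sketched for the simplest case: one 2PNO item with the abilities treated as known, a flat prior on $(a, b)$ with $b = a\beta$, and the positivity constraint on $a$ omitted for brevity. These are simplifying assumptions; the full sampler of Albert (1992) also updates the abilities and all items, and the function names here are hypothetical. Each sweep draws a latent normal response $Z_n \sim N(a\theta_n - b, 1)$, truncated to be positive when $Y_n = 1$ and non-positive when $Y_n = 0$, and then draws $(a, b)$ from the normal distribution implied by regressing $Z$ on $\theta$. Both steps sample from standard distributions, which is what makes the Gibbs sampler direct under the probit link.

```python
import numpy as np
from statistics import NormalDist

_std_normal = NormalDist()
phi_cdf = np.vectorize(_std_normal.cdf)        # standard normal CDF, elementwise
phi_inv = np.vectorize(_std_normal.inv_cdf)    # its inverse, elementwise

def draw_latent_responses(mu, y, rng):
    """Z ~ N(mu, 1) truncated to Z > 0 where y == 1 and Z <= 0 where y == 0 (inverse-CDF method)."""
    p0 = phi_cdf(-mu)                              # P(Z <= 0) for each observation
    u = rng.random(mu.shape)
    q = np.where(y == 1, p0 + u * (1.0 - p0), u * p0)
    q = np.clip(q, 1e-12, 1.0 - 1e-12)             # keep inv_cdf away from 0 and 1
    return mu + phi_inv(q)

def gibbs_2pno_item(y, theta, n_iter=600, burn_in=100, seed=1):
    """Data-augmentation Gibbs sampler for one 2PNO item with known abilities.

    Model: P(y = 1) = Phi(a * theta - b), with b = a * beta; flat prior on (a, b).
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([theta, -np.ones_like(theta)])   # columns multiply a and b
    V = np.linalg.inv(X.T @ X)                           # covariance of (a, b) given Z
    chol = np.linalg.cholesky(V)
    params = np.array([1.0, 0.0])                        # starting values for (a, b)
    draws = []
    for it in range(n_iter):
        z = draw_latent_responses(X @ params, y, rng)               # step 1: augment the data
        params = V @ (X.T @ z) + chol @ rng.standard_normal(2)      # step 2: normal draw of (a, b)
        if it >= burn_in:
            draws.append(params.copy())
    return np.array(draws)

# Hypothetical simulated data for a single item
rng = np.random.default_rng(2010)
theta = rng.standard_normal(500)
a_true, beta_true = 1.2, 0.5
y = (rng.random(500) < phi_cdf(a_true * (theta - beta_true))).astype(int)

draws = gibbs_2pno_item(y, theta)
a_hat = draws[:, 0].mean()
beta_hat = (draws[:, 1] / draws[:, 0]).mean()   # beta = b / a, computed draw by draw
```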

As the chronological ordering of responses and the clustering of responses within individuals are two important features of longitudinal data, a longitudinal model must allow for dependence among responses from the same subject in order to accommodate this mean and covariance structure (e.g., Everitt, 2005; Skrondal and Rabe-Hesketh, 2004). As a useful version of the random coefficient model, a single-domain latent growth curve analysis is presented here, in which individuals are assumed to differ not only in their intercepts but also in other aspects of their trajectory over time, in terms of a unidimensional latent variable (e.g., Byrne and Crombie, 2003; Skrondal and Rabe-Hesketh, 2008).

² Or, multiplying by a factor lying somewhere between 1.6 and 1.8 (Amemiya, 1981).

³ In addition, a useful feature of the probit model is that it can be used to yield tetrachoric correlations for clustered binary outcomes and polychoric correlations for ordinal responses (Hedeker, 2005).

Specifically, like a bifactor model, the univariate latent growth curve model can be formulated as

$$\theta_{(t)n} = \tau_t + \lambda_{0t}\zeta_{0n} + \lambda_{1t}\zeta_{1n} + \varepsilon_{(t)n}$$

($t = 1, \ldots, T$; $n = 1, \ldots, N$), where the $\theta_{(t)n}$, which depict the propensity of participant $n$ to hold the property of a certain dimension at the $t$th occasion, are the foci of the study; $\tau_t$ is the intercept of the structural model; $\zeta_{0n}$ and $\zeta_{1n}$ are the true initial level and shape factors; and $\varepsilon_{(t)n}$ represents the level-1 residuals. The data are time-structured and balanced in occasions: all subjects were measured on an identical set of occasions and possessed complete data points, $t = 1, \ldots, T$. In addition, the loadings for the initial level factor $\zeta_{0n}$ are fixed at $\lambda_{0t} = 1$ ($\forall t$), and the loadings for the shape factor $\zeta_{1n}$ are set equal to $\lambda_{1t}$. Nonlinear latent trajectories are essential for analyzing more complicated situations and have been found useful in achieving better model-data fit. It is feasible to model a nonlinear change trajectory using a bifactor model with free factor loadings for $\zeta_{1n}$ (Meredith and Tisak, 1990). According to Raykov and Marcoulides (2006), this level and shape (LS) model is equally useful regardless of whether the developmental trajectory is linear or nonlinear. Finally, to make the model simpler and identifiable, we remove the intercept ($\tau_t$) from the structural model, set $\lambda_{11} = 0$ and $\lambda_{1T} = 1$, and estimate the coefficients for the intermediate time points.
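Under these constraints the shape loadings have a direct interpretation, illustrated by the short sketch below with hypothetical values. With no intercept and zero-mean residuals (an assumption), the mean trajectory is $E[\theta_{(t)n}] = \alpha_0 + \lambda_{1t}\alpha_1$, where $\alpha_0$ and $\alpha_1$ denote the means of $\zeta_{0n}$ and $\zeta_{1n}$ (notation introduced here for illustration). Because $\lambda_{11} = 0$ and $\lambda_{1T} = 1$, $\alpha_1$ is the expected total change from the first to the last occasion, and each freely estimated $\lambda_{1t}$ is the proportion of that change reached by occasion $t$.

```python
import numpy as np

def ls_mean_trajectory(level_mean, shape_mean, shape_loadings):
    """E[theta_t] = level_mean + lambda_1t * shape_mean for the level-and-shape model."""
    lam = np.asarray(shape_loadings, dtype=float)
    if lam[0] != 0.0 or lam[-1] != 1.0:
        raise ValueError("identification requires lambda_11 = 0 and lambda_1T = 1")
    return level_mean + lam * shape_mean

lam = [0.0, 0.45, 0.8, 1.0]      # hypothetical: intermediate loadings freely estimated
traj = ls_mean_trajectory(-0.5, 1.2, lam)
total_change = traj[-1] - traj[0]              # equals the shape-factor mean, 1.2
proportion = (traj - traj[0]) / total_change   # recovers lambda_1t
increments = np.diff(traj)                     # shrinking steps indicate decelerating growth
```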
With the longitudinal design, the response model can now be written mathematically as

$$\operatorname{logit}\left[P(Y_{i(t)n} = 1 \mid \theta_{(t)n})\right] = \alpha_{i(t)}\left(\theta_{(t)n} - \beta_{i(t)}\right) \quad \text{or} \quad \operatorname{probit}\left[P(Y_{i(t)n} = 1 \mid \theta_{(t)n})\right] = \alpha_{i(t)}\left(\theta_{(t)n} - \beta_{i(t)}\right)$$

($i = 1, \ldots, I$; $t = 1, \ldots, T$; $n = 1, \ldots, N$), where the subscript $t$ represents the different occasions. In the present study, adopting the assumption of strong measurement invariance (Meredith and Teresi, 2006; Sayer and Cumsille, 2001), we impose equality on each of the item parameters over time⁴ (i.e., we assume that neither item difficulties nor item discriminations vary across time points), which further reduces $\alpha_{i(t)}$ to $\alpha_i$ and $\beta_{i(t)}$ to $\beta_i$ in the formula above. If the invariance of the factor structure fails to hold over time, differences in means may be partially attributable to differences in the scale of the latent variables (Blozis, 2007). Under the invariance assumption, then, and using the estimated item characteristic curves (ICCs) of a unidimensional two-parameter item response model, the unified model can be specified as

$$P(Y_{i(t)n} = 1 \mid \theta_{(t)n}) = \frac{\exp(\nu_{i(t)n})}{1 + \exp(\nu_{i(t)n})}$$

4 For most applications in which the aim is to ensure fairness and equity, a stronger assumption of strict factorial invariance is necessary: that is, equal factor loadings, intercepts, and equivalent residual variances (specific factor plus error variable) across different occasions (Meredith and Teresi, 2006).


($i = 1, \ldots, I$; $t = 1, \ldots, T$; $n = 1, \ldots, N$), where $\nu_{i(t)n}$ is the linear predictor (i.e., $\alpha_i(\theta_{(t)n} - \beta_i)$), and again, $\theta_{(t)n}$ can be replaced by $\zeta_{0n} + \lambda_{1t}\zeta_{1n} + \varepsilon_{(t)n}$.
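A minimal sketch of how the two pieces combine: the latent trajectory feeds the 2PL linear predictor $\nu_{i(t)n} = \alpha_i(\theta_{(t)n} - \beta_i)$, and because strong invariance is assumed, the same $(\alpha_i, \beta_i)$ are reused at every occasion. The item parameters, growth factors, and shape loadings below are hypothetical values chosen only for illustration (the first item is set to $\alpha_1 = 1$, $\beta_1 = 0$, in line with the identification constraints used in this study), and the level-1 residuals are set to zero so that the output is the expected trajectory of endorsement probabilities.

```python
import math


def p_2pl(theta, alpha, beta):
    """2PL ICC: exp(nu) / (1 + exp(nu)) with nu = alpha * (theta - beta)."""
    nu = alpha * (theta - beta)
    return 1.0 / (1.0 + math.exp(-nu))  # algebraically identical to exp(nu)/(1+exp(nu))


# Hypothetical invariant item parameters (alpha_i, beta_i); item 1 fixed for identification.
items = [(1.0, 0.0), (1.5, -0.5), (0.8, 1.0)]
shape_loadings = [0.0, 0.6, 0.85, 1.0]  # lambda_1t, first fixed at 0 and last at 1
zeta0, zeta1 = -0.5, 1.2               # one person's growth factors

# Row t holds P(Y_i(t)n = 1 | theta_(t)n) for each item i, with epsilon_(t)n = 0.
probs = [
    [p_2pl(zeta0 + lam * zeta1, a, b) for a, b in items]
    for lam in shape_loadings
]
for t, row in enumerate(probs, start=1):
    print(t, [round(p, 3) for p in row])
```

Since every $\alpha_i > 0$ and the hypothetical trajectory rises, each column of `probs` increases over occasions; an item's parameters shift its curve but never change across time under the invariance assumption.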

As the model becomes complex, for identification purposes we exclude the intercept from the structural model, fix the first discrimination parameter at one, and set the first item difficulty parameter equal to zero. By doing so, we constrain other individual-level covariates to affect the responses only through the latent variable (Skrondal and Rabe-Hesketh, 2004).

In summary, given a sampling distribution assumption, this GLLAMM can be decomposed into three subcomponents: (1) the level-1 sampling model; (2) the link function; and (3) the structural model (Raudenbush and Bryk, 2002). Alternatively, the unified model can be regarded as encompassing two parts. The measurement model is either a two-parameter normal ogive model or a two-parameter logistic item response model for the unidimensional binary data, $P(Y_{i(t)n} = 1 \mid \theta_{(t)n}, \alpha_i, \beta_i)$, where $\theta_{(t)n}$ represents the latent ability of subject $n$ at the $t$th occasion, and $\beta_i$ and $\alpha_i$ are the item parameters. The structural model, $P(\theta_{(t)n} \mid \Lambda, \zeta)$, links the latent abilities with time-varying and time-invariant covariates. Specifically, the first component, $P(Y_{i(t)n} = 1 \mid \theta_{(t)n}, \alpha_i, \beta_i)$, the probability that subject $n$ with ability $\theta_{(t)n}$ endorses an item at the $t$th occasion, is given by the normal ogive item response theory model:
$$P(Y_{i(t)n} = 1 \mid \theta_{(t)n}, \alpha_i, \beta_i) = \Phi\left(\alpha_i(\theta_{(t)n} - \beta_i)\right) = \int_{-\infty}^{\alpha_i(\theta_{(t)n} - \beta_i)} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$

($i = 1, \ldots, I$; $t = 1, \ldots, T$; $n = 1, \ldots, N$), where $\Phi(\cdot)$ represents the standard normal cumulative distribution function (CDF), and $\beta_i$ and $\alpha_i$ are the item difficulty and item discrimination parameters for a dichotomously scored item $i$. For a given item $i$, we denote its parameter vector as $\xi_i$, that is, $\xi_i = (\beta_i, \alpha_i)$.
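Because the text allows either a normal ogive or a logistic link, a quick numerical comparison may help. The sketch below, using only Python's standard library, evaluates $\Phi(\alpha_i(\theta - \beta_i))$ alongside a logistic curve whose logit is multiplied by the conventional scaling constant 1.702; that constant is a standard psychometric approximation rather than a choice made in this study, and the function names are illustrative. Over a grid on $[-4, 4]$ the two curves stay within 0.01 of each other, so discrimination estimates obtained under the two links tend to differ by roughly that factor.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_normal_ogive(theta, alpha, beta):
    """Two-parameter normal ogive: Phi(alpha * (theta - beta))."""
    return STD_NORMAL.cdf(alpha * (theta - beta))


def p_logistic_scaled(theta, alpha, beta, scale=1.702):
    """Logistic ICC with the logit multiplied by a scaling constant (default 1.702)."""
    return 1.0 / (1.0 + math.exp(-scale * alpha * (theta - beta)))


grid = [k / 100 for k in range(-400, 401)]  # theta from -4 to 4 in steps of 0.01
max_gap = max(abs(p_normal_ogive(t, 1.0, 0.0) - p_logistic_scaled(t, 1.0, 0.0)) for t in grid)
print(f"largest |normal ogive - scaled logistic| on the grid: {max_gap:.4f}")
```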

As the second component of the unified model, the underlying latent ability serves as the outcome variable in the structural model, $P(\theta_{(t)n} \mid \Lambda, \zeta)$, which establishes the relation between latent abilities and time-varying and time-invariant covariates; these covariates are conceptualized as explanatory variables for the latent abilities. The corresponding level-1 and level-2 structural models can thus be specified as

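$$
\begin{aligned}
\theta_{(t)n} &= \zeta_{0n} + \lambda_{1t}\zeta_{1n} + \varepsilon_{(t)n}, \\
\zeta_{0n} &= \gamma_{00} + \gamma_{01}W_{1n} + \cdots + \gamma_{0Q}W_{Qn} + U_{0n}, \\
\zeta_{1n} &= \gamma_{10} + \gamma_{11}W_{1n} + \cdots + \gamma_{1Q}W_{Qn} + U_{1n}
\end{aligned}
$$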

($t = 1, \ldots, T$; $n = 1, \ldots, N$; $q = 1, \ldots, Q$), where $\lambda_{0t}$ and $\lambda_{1t}$ are the level-1 factor loadings, $\zeta_{0n}$ and $\zeta_{1n}$ are the latent growth parameters for the initial level and shape factors, and the $\varepsilon_{(t)n}$ are independent and identically distributed as $N(0, \sigma^2)$. The $\gamma_{0q}$ and $\gamma_{1q}$ are level-2 partial regression coefficients, and the $W_{qn}$ are predictors (individual characteristics) of each latent growth parameter, that is, of the latent initial status and the change rate; $U_{0n}$ and $U_{1n}$ follow a bivariate normal distribution with a mean vector of zero and variance-covariance matrix $\mathbf{T}$, $N(\mathbf{0}, \mathbf{T})$. In this structural model, the growth factors are latent variables with random effects: the level-1 and level-2 models define a population with $N$ level-2 units (each individual being the primary sampling unit), and there are $T$ level-1 units ($t = 1, \ldots, T$) within each level-2 unit ($n = 1, \ldots, N$). This model assumes that each person was randomly sampled from a larger population and that each person has his or her own latent trajectory.

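$$\mathbf{T} = \operatorname{var}\begin{pmatrix} U_{0n} \\ U_{1n} \end{pmatrix} = \begin{pmatrix} \sigma^2_{\zeta_0} & \sigma_{\zeta_{01}} \\ \sigma_{\zeta_{10}} & \sigma^2_{\zeta_1} \end{pmatrix}$$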

As with any item response theory model, this IRT-LGC model is over-parameterized and needs to be identified. The indeterminacy arises because the item parameters associated with the ordered categorical variables and the distribution of the underlying continuous variables, $N(\mu, \sigma^2)$, are not jointly identified. Usually, the identification problem is tackled by fixing $(\mu, \sigma^2)$ at pre-assigned values. Depending upon the specific research question, however, it is better not to impose restrictions on person parameters when these parameters are of primary interest (Lee, 2007). We therefore impose the identification conditions on the parameters of the observed categorical variables, which are the less interesting nuisance parameters. In general, there are no necessary and sufficient conditions for identifiability, so the problem needs to be addressed on a case-by-case basis. The existing literature offers several ways to identify the model: (1) fixing the first item discrimination parameter at one ($\alpha_1 = 1$) and the first item difficulty parameter at zero ($\beta_1 = 0$) for binary items, or fixing the first item discrimination parameter at one ($\alpha_1 = 1$) and the first item's first threshold parameter at zero ($\beta_{11} = 0$) for polytomous items; (2) fixing the first item discrimination parameter at one ($\alpha_1 = 1$) and the mean of the latent growth intercept at zero ($E(\zeta_{0n}) = 0$); and (3) fixing the product of the discrimination parameters at one ($\prod_i \alpha_i = 1$) and the sum of the difficulty parameters at zero ($\sum_i \beta_i = 0$) for binary items, or fixing the product of the discrimination parameters at one ($\prod_i \alpha_i = 1$) and the first item's first threshold parameter at zero ($\beta_{11} = 0$) for polytomous items (Fox, 2007; Muthén and Muthén, 1998-2007). In this study, either the first or the second scaling option was adopted.
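The need for these constraints can be checked numerically. In the sketch below (plain Python; the function names and item values are hypothetical), transforming the latent scale by $\theta^* = a\theta + b$ while setting $\alpha_i^* = \alpha_i / a$ and $\beta_i^* = a\beta_i + b$ leaves every linear predictor $\alpha_i(\theta - \beta_i)$, and hence every response probability, unchanged, so the data alone cannot determine $a$ and $b$. Choosing $a = \alpha_1$ and $b = -\alpha_1\beta_1$ enforces $\alpha_1^* = 1$ and $\beta_1^* = 0$, which corresponds to the first scaling option for binary items.

```python
import math


def icc(theta, alpha, beta):
    """2PL response probability for one item."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def rescale(thetas, items, a, b):
    """Apply theta* = a*theta + b with alpha* = alpha/a and beta* = a*beta + b (a > 0)."""
    new_thetas = [a * t + b for t in thetas]
    new_items = [(alpha / a, a * beta + b) for alpha, beta in items]
    return new_thetas, new_items


# Arbitrary metric: item parameters and abilities on an unidentified scale.
items = [(1.7, 0.4), (0.9, -0.3), (1.2, 1.1)]
thetas = [-1.0, 0.2, 1.5]

# Pin the scale with the first option in the text: alpha_1 = 1 and beta_1 = 0.
a = items[0][0]
b = -a * items[0][1]
thetas_id, items_id = rescale(thetas, items, a, b)

before = [[icc(t, al, be) for al, be in items] for t in thetas]
after = [[icc(t, al, be) for al, be in items_id] for t in thetas_id]
print(items_id[0])  # (1.0, 0.0)
```

The probabilities in `before` and `after` agree to floating-point precision, which is exactly why one pair of constraints (or a constraint on the latent distribution) must be imposed before estimation.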
As regards the general assumptions for the IRT-LGC model, take the two-parameter normal ogive model as an example. Given the subject's latent ability ($\theta_{(t)n}$) and item parameters ($\xi_i = (\beta_i, \alpha_i)$), the probability of subject $n$ endorsing a particular item $i$ at the $t$th occasion is defined as $p_{i(t)n} = P(Y_{i(t)n} = 1 \mid \theta_{(t)n}, \beta_i, \alpha_i) = P(Y_{i(t)n} = 1 \mid \theta_{(t)n}, \xi_i)$. It is assumed that each observed outcome variable $Y_{i(t)n}$ follows a Bernoulli distribution with expected value $p_{i(t)n}$,

$$Y_{i(t)n} \mid p_{i(t)n} \sim \text{Bernoulli}(p_{i(t)n})$$

($i = 1, \ldots, I$; $t = 1, \ldots, T$; $n = 1, \ldots, N$). The latent continuous measurement underlying the dichotomous outcomes at the item level is assumed to follow a standard normal distribution. In the structural model, the level-1 residual variance ($\sigma^2$) and the level-2 variance-covariance matrix ($\mathbf{T}$) are assumed to be independently distributed as inverse gamma and inverse Wishart, respectively. Additionally, the level-1 residual variance can be assumed to be either homogeneous or heterogeneous across assessment occasions within individuals, and the level-2 random effects follow a bivariate normal distribution with a mean vector of zero and covariance matrix $\mathbf{T}$. This variance-covariance matrix $\mathbf{T}$ is assumed to be constant across all level-2 clusters. As for the statistical interpretation of the random effects, the level-2 random intercept $U_{0n}$, for instance, accounts for the variation of the initial status ($\zeta_{0n}$) around the fixed population intercept ($\gamma_{00}$) that is not explained by the covariates $W_{qn}$. The same interpretation applies to the random shape factor. Finally, the assumptions associated with the residuals at each level can be summarized as follows:

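$$
\begin{aligned}
&E(\varepsilon_{(t)n}) = 0, \quad E(U_{0n}) = E(U_{1n}) = 0, \\
&\operatorname{var}(\zeta_{0n}) = \operatorname{var}(U_{0n}) = \sigma^2_{\zeta_0}, \quad \operatorname{var}(\zeta_{1n}) = \operatorname{var}(U_{1n}) = \sigma^2_{\zeta_1}, \\
&\operatorname{cov}(\zeta_{0n}, \zeta_{1n}) = \operatorname{cov}(U_{0n}, U_{1n}) = \sigma_{\zeta_{01}}, \\
&\operatorname{cov}(U_{0n}, \varepsilon_{(t)n}) = \operatorname{cov}(U_{1n}, \varepsilon_{(t)n}) = 0.
\end{aligned}
$$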
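To see how the Bernoulli measurement model, the level-1 growth model, and the level-2 model with $\mathbf{T}$ fit together, the sketch below simulates responses from a normal-ogive IRT-LGC with a single time-invariant covariate $W_{1n}$. It is a generative illustration under stated assumptions, not the estimation procedure of this study: all parameter values are hypothetical, the level-1 residual variance is taken to be homogeneous across occasions, $(U_{0n}, U_{1n})$ are drawn through a hand-coded Cholesky factor of the $2 \times 2$ matrix $\mathbf{T}$, and `simulate_irt_lgc` is an illustrative name.

```python
import math
import random
from statistics import NormalDist


def simulate_irt_lgc(n_persons, items, shape_loadings, gamma, t_matrix, sigma2, w, seed=1):
    """Simulate binary responses y[n][t][i] from a normal-ogive IRT-LGC (illustrative).

    items          -- invariant (alpha_i, beta_i) pairs, reused at every occasion
    shape_loadings -- lambda_1t, with lambda_11 = 0 and lambda_1T = 1
    gamma          -- ((gamma_00, gamma_01), (gamma_10, gamma_11)) for one covariate W_1n
    t_matrix       -- 2x2 covariance matrix T of (U_0n, U_1n)
    sigma2         -- homogeneous level-1 residual variance
    w              -- covariate values W_1n, one per person
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    # Cholesky factor of T so that (U_0n, U_1n) ~ N(0, T).
    c11 = math.sqrt(t_matrix[0][0])
    c21 = t_matrix[1][0] / c11
    c22 = math.sqrt(t_matrix[1][1] - c21 ** 2)
    sigma = math.sqrt(sigma2)
    y = []
    for n in range(n_persons):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u0, u1 = c11 * z0, c21 * z0 + c22 * z1
        zeta0 = gamma[0][0] + gamma[0][1] * w[n] + u0  # initial status
        zeta1 = gamma[1][0] + gamma[1][1] * w[n] + u1  # shape (total change)
        person = []
        for lam in shape_loadings:
            theta = zeta0 + lam * zeta1 + rng.gauss(0.0, sigma)  # level-1 model
            # Y_i(t)n ~ Bernoulli(Phi(alpha_i * (theta - beta_i)))
            person.append([int(rng.random() < phi(a * (theta - b))) for a, b in items])
        y.append(person)
    return y


items = [(1.0, 0.0), (1.3, -0.4), (0.8, 0.6)]  # item 1 fixed at alpha = 1, beta = 0
loadings = [0.0, 0.6, 0.85, 1.0]
gamma = ((0.0, 0.3), (0.8, -0.2))
t_matrix = [[0.5, 0.1], [0.1, 0.2]]
w = [0, 1] * 50  # hypothetical binary covariate for 100 persons
y = simulate_irt_lgc(100, items, loadings, gamma, t_matrix, 0.3, w)
print(len(y), len(y[0]), len(y[0][0]))  # 100 4 3
```

Data generated this way can serve as a check on an estimation routine: fitting the model to simulated responses should approximately recover the hypothetical $\gamma$, $\mathbf{T}$, and $\sigma^2$ once the same identification constraints are applied.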


A Multidimensional IRT-LVM: MGRM-ALGC
Analogously, drawing on the strengths of the MIRT model and the LVM, we propose a multivariate multilevel polytomous item response theory model embedded in an associative growth curve analysis. Through the cumulative logit transformation, that is, the logit of responding in category $j$ or higher versus a category lower than $j$ (i.e., $\log\left[P(Y_{in} \ge j) / P(Y_{in} < j)\right]$), the conditional probability of endorsing a particular response alternative, given the latent trait, can easily be specified (e.g., Tuerlinckx and Wang, 2004). Based on a multidimensional item response theory model with simple structure, in which each item measures only one particular latent construct and no item is shared across constructs (e.g., Adams, Wilson, and Wang, 1997; McDonald, 1999), we propose a unified modeling approach using a parallel generalized linear latent and mixed model (pGLLAMM) to simultaneously estimate the latent growth trajectories for a dual-domain propensity level. This approach can be further developed in a straightforward manner to accommodate a more complex structure (e.g., tests with within-item multidimensionality) and a richer set of auxiliary information (e.g., additional levels above persons). Applications of these models can be found in the literature on educational testing and psychometrics.
Corresponding to a logistic multidimensional graded response model, subscript $i$ represents an item in a test, and the response to a polytomous item is scored using $j$. In this setting, $\theta_{dn}$ represents the trait level of respondent $n$ on dimension $d$, and the model is parameterized as

$$\operatorname{logit}\left[P^*_{ij}(\theta_{dn})\right] = \log\left[\frac{P^*_{ij}(\theta_{dn})}{1 - P^*_{ij}(\theta_{dn})}\right] = \alpha_{id}\theta_{dn} - \beta_{ij}$$

($i = 1, \ldots, I$; $j = 1, \ldots, m_i$; $d = 1, \ldots, D$; $n = 1, \ldots, N$). Here, the linear predictor $\alpha_{id}\theta_{dn} - \beta_{ij}$ can be interpreted as the logit of the cumulative probability that respondent $n$ will endorse response category $j$ or higher of a particular item, at a given $\theta_d$ level. The elements of the $\alpha_{id}$-vector are the multidimensional discrimination parameters for item $i$ on dimension $d$, giving the weight of each dimension $d$ on item $i$; the multidimensional item difficulty parameters, $\beta_{ij}$, are scalar parameters and can be defined as the location in the latent trait space where the category response surface achieves its maximum slope and, thus, where the item is most informative (Reckase, 1985).

As the multivariate version of the random coefficient model, multiple-domain latent growth curve analysis allows one to model situations in which individuals differ not only in their intercepts but also in other aspects of their trajectories over time with respect to multidimensional latent variables (McArdle, 1988). Again, because the chronological ordering of responses and the clustering of responses within individuals are two important features of longitudinal data, a longitudinal model with a parallel process of change was proposed to accommodate random-effect regressions among the growth factors for two dimensions⁵ and to allow for dependence among responses from the same subject (e.g., Cheong et al., 2003; Preacher et al., 2008; Raykov, 2007).

 

5 This analysis is one kind of structural equation model with regressions among latent variables that represent aspects of distinct individual growth curves, each of which is modeled along a particular dimension.

Specifically, for each dimension, like a bifactor model, the latent growth curve model can be formulated as

$$\theta_{d(t)n} = \tau_{dt} + \lambda_{d0t}\zeta_{d0n} + \lambda_{d1t}\zeta_{d1n} + \varepsilon_{d(t)n}$$

($d = 1, \ldots, D$; $t = 1, \ldots, T$; $n = 1, \ldots, N$), where the $\theta_{d(t)n}$ are the foci of the study, depicting the propensity of holding the property of a certain dimension ($d = 1$ or $d = 2$) at the $t$th occasion for participant $n$. The intercept term ($\tau_{dt}$) is typically constrained to zero, yielding a simplified model structure; $\zeta_{d0n}$ and $\zeta_{d1n}$ are the true initial status and shape factors associated with each dimension $d$; and the $\varepsilon_{d(t)n}$ represent the level-1 residuals for dimension $d$ in the structural model. The data under study are time-structured and balanced in occasions: all subjects are measured on an identical set of occasions and provide complete data at $t = 1, \ldots, T$. In addition, the loadings for the intercept factor $\zeta_{d0n}$ are fixed at $\lambda_{d0t} = 1$ ($\forall t$), and the loadings for the shape factor $\zeta_{d1n}$ are set equal to $\lambda_{d1t}$. To make the model identifiable, we set $\lambda_{d11} = 0$ and $\lambda_{d1T} = 1$ for each dimension and estimate the coefficients for the intermediate time points. In terms of substantive interpretation, fixing $\lambda_{d11} = 0$ indicates that time is centered on the first wave of data collection, which allows the researcher to interpret participant $n$'s initial status as the status at the very beginning of the study (Singer and Willett, 2005). Alternatively, the researcher could adopt an orthogonal design matrix, for example by setting the factor loading associated with the middle assessment occasion to zero, which alleviates the problem of multicollinearity in latent growth curve models with higher-order polynomial coefficients (e.g., quadratic or cubic).
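The multicollinearity point can be illustrated with a small calculation. Assuming five equally spaced occasions (an assumption made only for this sketch), the linear and quadratic time codes are almost collinear when time is coded from the first wave, but become uncorrelated once time is centered at the middle occasion. The helper `pearson` is written out so that the example needs nothing beyond the Python standard library.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


waves = [0, 1, 2, 3, 4]            # linear codes with the first wave at zero
centered = [t - 2 for t in waves]  # zero at the middle (third) occasion

r_first = pearson(waves, [t ** 2 for t in waves])
r_middle = pearson(centered, [t ** 2 for t in centered])
print(f"first-wave coding:  r(linear, quadratic) = {r_first:.3f}")   # 0.959
print(f"middle-wave coding: r(linear, quadratic) = {r_middle:.3f}")  # 0.000
```

With unequally spaced occasions, centering generally reduces but need not eliminate this correlation, in which case orthogonal polynomial codes would be the more direct route.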
With the longitudinal design, the response model can be written mathematically as

$$\operatorname{logit}\left[P^*_{ij}(\theta_{d(t)n})\right] = \log\left[\frac{P^*_{ij}(\theta_{d(t)n})}{1 - P^*_{ij}(\theta_{d(t)n})}\right] = \alpha_{i(t)d}\,\theta_{d(t)n} - \beta_{i(t)j}$$

($i = 1, \ldots, I$; $j = 1, \ldots, m_i$; $d = 1, \ldots, D$; $t = 1, \ldots, T$; $n = 1, \ldots, N$), where the additional subscript $t$ represents the different occasions. In the present work, assuming that strict measurement invariance holds (Meredith and Teresi, 2006; Sayer and Cumsille, 2001), the residual variances can be constrained to a constant value, and each of the item parameters is identical over time, which further reduces $\alpha_{i(t)d}$ to $\alpha_{id}$ and $\beta_{i(t)j}$ to $\beta_{ij}$. Thus, through the estimated item characteristic curves (ICCs) for a particular unidimensional graded response model, the unified model can be specified as

$$
\begin{aligned}
P(Y_{i(t)n} = j \mid \theta_{d(t)n}, \beta_{ij}, \alpha_{id}) &= P_{ij}(\theta_{d(t)n}) \\
&= P^*_{i(j-1)}(\theta_{d(t)n}) - P^*_{ij}(\theta_{d(t)n}) \\
&= P^*(Y_{i(t)n} \ge j-1 \mid \theta_{d(t)n}, \beta_{i(j-1)}, \alpha_{id}) - P^*(Y_{i(t)n} \ge j \mid \theta_{d(t)n}, \beta_{ij}, \alpha_{id}) \\
&= \frac{\exp[\alpha_{id}(\theta_{d(t)n} - \beta_{i(j-1)})]}{1 + \exp[\alpha_{id}(\theta_{d(t)n} - \beta_{i(j-1)})]} - \frac{\exp[\alpha_{id}(\theta_{d(t)n} - \beta_{ij})]}{1 + \exp[\alpha_{id}(\theta_{d(t)n} - \beta_{ij})]} \\
&= \frac{\exp(\nu_{i(j-1)d(t)n})}{1 + \exp(\nu_{i(j-1)d(t)n})} - \frac{\exp(\nu_{i(j)d(t)n})}{1 + \exp(\nu_{i(j)d(t)n})}
\end{aligned}
$$

($i = 1, \ldots, I$; $j = 1, \ldots, m_i$; $d = 1, \ldots, D$; $t = 1, \ldots, T$; $n = 1, \ldots, N$), where $P^*_{im_i}(\theta_{d(t)n}) = 0$ and $P^*_{i0}(\theta_{d(t)n}) = 1$, and $\nu_{i(j-1)d(t)n}$ and $\nu_{i(j)d(t)n}$ are the linear predictors $\alpha_{id}(\theta_{d(t)n} - \beta_{i(j-1)})$ and $\alpha_{id}(\theta_{d(t)n} - \beta_{ij})$, respectively. Again, $\theta_{d(t)n}$ can be replaced by $\zeta_{d0n} + \lambda_{d1t}\zeta_{d1n} + \varepsilon_{d(t)n}$ for each dimension $d$. As with other estimation approaches, various identification constraints are needed for complex models. To address rotational indeterminacy in this MGRM-ALGC model, we assume a multidimensional model with simple structure, fix the first discrimination parameter associated with each construct at one (with zero loadings on the other constructs), and constrain the first threshold of the first item in each dimension to zero. Moreover, to resolve the metric indeterminacy, we try two different scaling options: (1) instead of imposing constraints on the item threshold parameters, fixing the initial-level growth factor associated with each dimension at zero; and (2) fixing the level-1 residual variances for each construct at a constant value, whether one or some other value.

Chapter 2
MODEL FORMULATION
In order to differentially weight individual items and examine developmental stability and change over time, the formulation of a specific model, the MGRM-ALGC, is presented below for illustrative purposes; simpler model formulations can easily be derived from it. As a member of the class of multilevel latent variable models, this derivative GLLAMM (i.e., pGLLAMM) encompasses a response model and a structural model (Skrondal and Rabe-Hesketh, 2003; 2004): the multidimensional graded response model is the response model, and the associative latent growth curve analysis is the structural model. With longitudinal designs, the data are thus multivariate and multilevel in nature, with a set of ordinal categorical responses nested within each person on each dimension and measurement occasion. The required elements are the response model, the structural model, and five indices ($i = 1, \ldots, I$; $j = 1, \ldots, m_i$; $d = 1, \ldots, D$; $t = 1, \ldots, T$; $n = 1, \ldots, N$).

The Measurement Model

Standard use of latent growth curve analysis typically considers a single manifest indicator at each measurement occasion, in which each response is a function of time and constitutes the first level of the measurement model. However, such an approach fails to capitalize on one of the capacities inherent in structural equation models: it not only ignores the relations between multiple indicators and the underlying latent construct, but also discards information about the psychometric properties of the manifest variables (Sayer and Cumsille, 2001). By contrast, when multiple indicators with discrete scales of measurement are incorporated into the model, a second-order factor structure is used to investigate the developmental trajectory over time, which allows the researcher to evaluate the factorial invariance of the latent constructs across measurement waves and permits the separation of time-specific error and measurement error (Blozis, 2007; Sayer and Cumsille, 2001).

Unidimensional graded response models: GRMs. In the graded response model (GRM; Samejima, 1969, 1997), the probability of an observed score at or above threshold category $j$ is defined as

$$P^*(Y_{in} \ge j \mid \theta_n, \alpha_i, \beta_{ij}) = P^*_{ij}(\theta_n) = \frac{\exp[\alpha_i(\theta_n - \beta_{ij})]}{1 + \exp[\alpha_i(\theta_n - \beta_{ij})]}$$

($i = 1, \ldots, I$; $j = 1, \ldots, m_i$; $n = 1, \ldots, N$), where $Y_{in}$ denotes the response given by respondent $n$ to item $i$, and $\alpha_i$ and $\beta_{ij}$ represent the item discrimination and threshold parameters. Within each item, there are $m_i$ observed response categories and $(m_i - 1)$ thresholds. Across the response alternatives ($j = 1, 2, \ldots, m_i$), the thresholds are ordered such that $\beta_{i1} < \beta_{i2} < \cdots < \beta_{i(m_i-1)}$, and each threshold parameter indicates the propensity for a respondent to move from one response category to the next. The discrimination parameter characterizes how well an item discriminates among people with different abilities. Usually, a good item has a large discrimination parameter and threshold parameters that span a wide range of the trait scale. Unlike the partial credit model, which treats the distinct thresholds within each item independently (Masters, 1982), the GRM treats the endorsement of a particular response alternative as requiring the successful accomplishment of all previous steps (e.g., Reckase, 2009). Thus, the probability of endorsing a specific response category can be calculated as

$$P(Y_{in} = j \mid \theta_n, \beta_{ij}, \alpha_i) = P_{ij}(\theta_n) = P^*_{i(j-1)}(\theta_n) - P^*_{ij}(\theta_n), \quad \text{with } P^*_{im_i}(\theta_n) = 0 \text{ and } P^*_{i0}(\theta_n) = 1.$$
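The difference-of-cumulative-probabilities rule can be sketched in a few lines of Python. In this illustrative function (the name `grm_category_probs` is hypothetical), each $P^*_{ij}$ is computed as the logistic boundary curve for the $j$th threshold, the boundary values $P^*_{i0} = 1$ and $P^*_{im_i} = 0$ are appended, and $P_{ij} = P^*_{i(j-1)} - P^*_{ij}$ is returned for the $m_i$ categories. Ordered thresholds guarantee that every category probability is non-negative and that the probabilities sum to one.

```python
import math


def grm_category_probs(theta, alpha, thresholds):
    """Samejima's logistic GRM for one item (illustrative sketch).

    thresholds -- strictly increasing beta_i1 < ... < beta_i(m_i - 1)
    Returns [P_i1, ..., P_im_i], where P_ij = P*_i(j-1) - P*_ij with
    P*_i0 = 1 and P*_im_i = 0.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    boundary = [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in thresholds]
    cum = [1.0] + boundary + [0.0]
    return [cum[j - 1] - cum[j] for j in range(1, len(cum))]


# Four-category item with hypothetical alpha = 1.0 and thresholds (-1, 0, 1).
probs = grm_category_probs(0.0, 1.0, [-1.0, 0.0, 1.0])
print([round(p, 4) for p in probs])  # [0.2689, 0.2311, 0.2311, 0.2689]
```

With symmetric thresholds and $\theta$ at their midpoint, the category probabilities are symmetric as well; moving $\theta$ upward shifts mass toward the highest category.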

Multidimensional graded response models: MGRMs. In practical applications, however, items do not necessarily measure a single unified component; a more general, multidimensional model should therefore be considered. That is, when an instrument consists of several subscales, the researcher needs to adopt a multidimensional IRT model for calculating a respondent's conditional probability of correctly responding to an item. Psychological processes have consistently been found to be complex, and although several subscales on an instrument may tap distinct latent abilities, these abilities are not necessarily independent. In this respect, MIRT has shown promise in dealing with situations commonly encountered in educational and psychological testing, such as multiple traits being required to endorse an item, tests containing mutually exclusive subsets of items, and correlated underlying dimensions (e.g., Adams et al., 1997; Reckase, 1997). As an extension of the unidimensional graded response model, a multivariate version for polytomously scored items (De Ayala, 1994; Reckase, 2009) can be expressed as follows:

$$P^*(Y_{in} \ge j \mid \boldsymbol{\theta}_n, \alpha_{id}, \beta_{ij}) = P^*_{ij}(\boldsymbol{\theta}_n) = \frac{\exp\left(\sum_d \alpha_{id}\theta_{dn} - \beta_{ij}\right)}{1 + \exp\left(\sum_d \alpha_{id}\theta_{dn} - \beta_{ij}\right)}$$

($i = 1, \ldots, I$; $j = 1, \ldots, m_i$; $d = 1, \ldots, D$; $n = 1, \ldots, N$), where $\theta_{dn}$ represents the trait level of subject $n$ on dimension $d$ (i.e., a person's position in the $D$-dimensional latent space is represented by the vector $\boldsymbol{\theta}_n = (\theta_{1n}, \theta_{2n}, \ldots, \theta_{Dn})$); the vector $\boldsymbol{\alpha}_i = (\alpha_{i1}, \ldots, \alpha_{iD})$ contains the multidimensional discrimination parameters for item $i$, with $\alpha_{id}$ giving the weight of dimension $d$ on item $i$; and $\beta_{ij}$, a scalar parameter, is the multidimensional item difficulty parameter. As in its unidimensional counterpart, subject responses to item $i$ in the MGRM are categorized into $m_i$ ordered categories with $(m_i - 1)$ category thresholds, and higher category options indicate a greater $\theta$ level, where the $\theta$ level could be any one or any combination of the abilities required for solving an item. Thus, $P^*_{ij}(\boldsymbol{\theta}_n)$ can be interpreted as the conditional probability that a randomly selected respondent $n$ with latent traits $\boldsymbol{\theta}_n$ responds in category $j$ or higher on item $i$. Because $P^*_{ij}(\boldsymbol{\theta}_n)$ is this cumulative probability, the probability of responding in a particular category, $P_{ij}(\boldsymbol{\theta}_n)$, equals the difference between the cumulative probabilities for adjacent categories (i.e., $P_{ij}(\boldsymbol{\theta}_n) = P^*_{i(j-1)}(\boldsymbol{\theta}_n) - P^*_{ij}(\boldsymbol{\theta}_n)$). Moreover, an item's multidimensional discrimination parameters can be interpreted in a manner similar to factor loadings in factor analysis⁶. Thus, depending on the scale structure, the relationships among latent traits can be customized accordingly: for instance, traits could have a complex or a simple structure, where a complex structure implies that one or more items measure multiple dimensions, and a simple structure indicates that each item measures exactly what it is supposed to measure, namely a single dimension (e.g., Bolt and Lall, 2003; Skrondal and Rabe-Hesketh, 2004). As regards the $\beta_{ij}$ parameter, it determines the location in the latent trait space at which the category response surface achieves its maximum slope and, thus, where the item is most informative (Reckase, 1985).
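A compensatory MGRM item can be sketched the same way. In the illustrative function below (name hypothetical), the cumulative probability uses the linear predictor $\sum_d \alpha_{id}\theta_{dn} - \beta_{ij}$ exactly as in the MGRM equation, so the $\beta_{ij}$ act as intercepts on the logit scale. With a simple-structure discrimination vector such as $(1.2, 0)$, changing the trait on the second dimension leaves the item's category probabilities unchanged, which is the defining property of simple structure; all numerical values are hypothetical.

```python
import math


def mgrm_category_probs(theta_vec, alpha_vec, betas):
    """Category probabilities for one item under a compensatory logistic MGRM.

    theta_vec -- (theta_1n, ..., theta_Dn)
    alpha_vec -- (alpha_i1, ..., alpha_iD); simple structure has one nonzero entry
    betas     -- increasing beta_i1 < ... < beta_i(m_i - 1)
    """
    z = sum(a * t for a, t in zip(alpha_vec, theta_vec))  # sum_d alpha_id * theta_dn
    cum = [1.0] + [1.0 / (1.0 + math.exp(-(z - b))) for b in betas] + [0.0]
    return [cum[j - 1] - cum[j] for j in range(1, len(cum))]


alpha_simple = (1.2, 0.0)  # item loads on dimension 1 only
betas = [-0.8, 0.3, 1.5]   # hypothetical values for a four-category item
p_low_dim2 = mgrm_category_probs((0.5, -2.0), alpha_simple, betas)
p_high_dim2 = mgrm_category_probs((0.5, 2.0), alpha_simple, betas)
print(p_low_dim2 == p_high_dim2)  # True: theta_2 does not affect this item
```

Giving the item a nonzero second loading, for example $(1.2, 0.7)$, would make the same comparison differ, which is what a complex (within-item multidimensional) structure implies.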

 

6 Even though the statistical formulation and procedure of factor analysis (FA) and MIRT are virtually identical, the research focus and major application of each approach are quite different. Interested readers may refer to Reckase (1997) for more details.

The Structural Model

Compared to traditional longitudinal models, the growth curve model is considered a highly flexible approach because of its capacity to handle a variety of complexities, such as missing data, unequally spaced time points, non-normally distributed or discretely scaled repeated measures, nonlinear trajectories, and multivariate growth processes (Curran et al., in press). Perhaps the most intuitively appealing way of specifying a latent growth curve model is to link it to two distinct questions about change: one concerns the starting position (level), and the other concerns the overall true change (shape) across the entire study period, each arising from a specific level in a natural hierarchy; this approach is called a two-stage model formulation (Rabe-Hesketh and Skrondal, 2008; Singer and Willett, 2005).

Univariate latent growth curve analysis: LGC. From the perspective of a latent response formulation, change can be modeled in repeated latent constructs, making it possible to decompose the error in the measurement model into individual time-specific deviation (i.e., $\varepsilon_{d(t)n}$) and measurement error. As Blozis (2007) puts it, the latent variable that is the subject of analysis encompasses time-specific error without the confounding influence of measurement error. This is because, at each point in time, a common factor is assumed to account for the dependencies among a set of categorically scored items and to allow for the separation of error variances not attributable to growth.

The level-1 structural model. Using LISREL notation, the univariate level-1 structural model can be expressed as follows:

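$$
\begin{pmatrix} \theta_{d(1)n} \\ \theta_{d(2)n} \\ \theta_{d(3)n} \\ \vdots \\ \theta_{d(T)n} \end{pmatrix}
= \begin{pmatrix} \tau_{d1} \\ \tau_{d2} \\ \tau_{d3} \\ \vdots \\ \tau_{dT} \end{pmatrix}
+ \begin{pmatrix} \lambda_{d01} & \lambda_{d11} \\ \lambda_{d02} & \lambda_{d12} \\ \lambda_{d03} & \lambda_{d13} \\ \vdots & \vdots \\ \lambda_{d0T} & \lambda_{d1T} \end{pmatrix}
\begin{pmatrix} \zeta_{d0n} \\ \zeta_{d1n} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{d(1)n} \\ \varepsilon_{d(2)n} \\ \varepsilon_{d(3)n} \\ \vdots \\ \varepsilon_{d(T)n} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 1 & \lambda_{d12} \\ 1 & \lambda_{d13} \\ \vdots & \vdots \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} \zeta_{d0n} \\ \zeta_{d1n} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{d(1)n} \\ \varepsilon_{d(2)n} \\ \varepsilon_{d(3)n} \\ \vdots \\ \varepsilon_{d(T)n} \end{pmatrix}
$$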
($d = 1, \ldots, D$; $t = 1, \ldots, T$; $n = 1, \ldots, N$; $\tau_{dt} = 0$; $\lambda_{d11} = 0$, $\lambda_{d1T} = 1$). Since the repeated measures ($\theta_{d(t)n}$) have been extracted from the unidimensional graded response model through the cumulative logit link function, which relates the expected response to the linear predictor (a linear combination of the person-specific random effect $\theta_{d(t)n}$ and item-by-logit indicators), the equation above is the structural model. As mentioned earlier, the term $\theta_{d(t)n}$ refers to the propensity of individual $n$ at time $t$ on dimension $d$ and is a function of latent variables representing the underlying initial status ($\zeta_{d0n}$) and the relative growth or decline trajectory ($\zeta_{d1n}$), plus time-specific disturbance residuals ($\varepsilon_{d(t)n}$). Additionally, if there is a significant amount of variation to be explained, the analysis can proceed in a stepwise manner by adding time-varying covariates (TVCs) as time-specific predictors of the repeated measures⁷. To model a nonlinear growth or decline trajectory, we adopt the suggestion of Meredith and Tisak (1990): we fix all $\lambda_{d0t}$ at one and set $\lambda_{d11}$ and $\lambda_{d1T}$ to zero and one, respectively, for identification purposes. By doing so, we let the model freely estimate the intermediate time coefficients. Adopting the assumption typically made in structural equation models (that the $\varepsilon_{d(t)n}$ are independently and identically normally distributed with mean zero and variance $\sigma^2$), we constrain the disturbance residuals of the level-1 structural model to be time-homoskedastic, that is, equal to a constant value, which makes these time-specific error variances identically distributed over time within each person. Because the random effect ($\theta_{d(t)n}$) can be further represented by the variances of $\zeta_{d0n}$ and $\zeta_{d1n}$ at the second level of the structural model, the ULGC represents one kind of random-effects model.

 

 

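$$
\Theta_{\varepsilon} = \begin{pmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{pmatrix}
$$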

 

7 As illustrated by Bauer (2009), when models for categorical data are fitted, model comparisons are impeded by the implicit rescaling of the model estimates that takes place when new covariates are included. Thus, to place the estimates on a common scale and facilitate model comparisons, a scaling factor needs to be applied to each component of the random-effects model, except when successive models differ only in the inclusion or exclusion of cluster-level covariates. Since no time-varying covariates are included in the present work, a rescaling method is not applicable here.

The level-2 structural model. The level-2 structural model allows us to distinguish the change trajectories of individuals using their specific growth parameters, such as the true initial status and the change rate, which means that we can examine unobserved heterogeneity in growth curves by studying inter-individual variation in growth parameters. As explained by Singer and Willett (2005), an appropriate level-2 model has the following four characteristics: (1) the level-2 outcomes are the level-1 individual growth parameters; (2) the level-2 model can be written as separate formulae, one for each level-1 growth parameter; (3) each level-2 formula specifies a relationship between an individual growth parameter and time-invariant characteristics of individuals; and (4) each level-2 formula must contain a stochastic component, because individuals who share a common predictor value could still vary in their specific change trajectories, hence the name random coefficient models. Thus, an unconditional level-2 LGC model can be expressed as

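$$\begin{pmatrix} \zeta_{d0n} \\ \zeta_{d1n} \end{pmatrix} = \begin{pmatrix} \gamma_{d00} \\ \gamma_{d10} \end{pmatrix} + \begin{pmatrix} \upsilon_{d0n} \\ \upsilon_{d1n} \end{pmatrix}$$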

($d = 1, \ldots, D$; $n = 1, \ldots, N$), where the equation above represents regression equations among the latent variables, one for each level-1 growth parameter. In an unconditional model, the $\zeta_{d0n}$ and $\zeta_{d1n}$ factors have $\gamma_{d00}$ and $\gamma_{d10}$ as their intercepts, and the residuals are $\upsilon_{d0n}$ and $\upsilon_{d1n}$. One advantage of casting IRT models in a multilevel structure is that the researcher can incorporate different contextual variables as auxiliary information while estimating the models, which improves not only the estimation of person abilities but also the calibration of item parameters (Mislevy, 1987). Besides, unlike the conventional two-stage procedure, simultaneous estimation of a multivariate multilevel IRT model avoids the problem of attenuation bias when the study focus is to regress the latent trait variables on other explanatory covariates (Bolt and Kim, 2005). Thus, when time-invariant covariates (TIVs) are introduced into the model, other things being equal, the individual-level model for the between-person variability associated with each growth factor can be augmented as

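$$\begin{pmatrix} \zeta_{d0n} \\ \zeta_{d1n} \end{pmatrix} = \begin{pmatrix} \gamma_{d00} \\ \gamma_{d10} \end{pmatrix} + \begin{pmatrix} \gamma_{d01} \\ \gamma_{d11} \end{pmatrix} \left[\mathrm{TIV}_{1n}\right] + \begin{pmatrix} \upsilon_{d0n} \\ \upsilon_{d1n} \end{pmatrix}$$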

Usually, $\upsilon_{d0n}$ and $\upsilon_{d1n}$ are assumed to have a bivariate normal distribution with zero mean and an unstructured covariance matrix in both the unconditional and the conditional situations. The residual variances and covariance of the true initial level and shape factors can therefore be expressed as in the following equation, which permits the level-1 growth parameters to differ across individuals.

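$$\Psi = \operatorname{Cov}(\boldsymbol{\upsilon}) = \begin{pmatrix} \sigma^2_{\upsilon_{d0}} & \sigma_{\upsilon_{d01}} \\ \sigma_{\upsilon_{d10}} & \sigma^2_{\upsilon_{d1}} \end{pmatrix}$$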
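Combining the level-1 loading matrix, the time-homoskedastic residual matrix, and $\Psi$ gives the covariance structure that the unconditional LGC implies for the repeated propensities of one dimension, $\operatorname{Cov}(\boldsymbol{\theta}_{dn}) = \Lambda\Psi\Lambda' + \sigma^2\mathbf{I}$. This is a standard LGC result that assumes the level-1 residuals are uncorrelated with the growth factors and that no covariates are present. The plain-Python sketch below computes the matrix for hypothetical values of $\lambda_{d1t}$, $\Psi$, and $\sigma^2$; the function name is illustrative.

```python
def implied_covariance(shape_loadings, psi, sigma2):
    """Model-implied T x T covariance of (theta_d(1)n, ..., theta_d(T)n).

    Rows of Lambda are (1, lambda_d1t); psi is the 2x2 covariance of
    (upsilon_d0n, upsilon_d1n); the level-1 residual covariance is sigma2 * I.
    """
    lam = [(1.0, l1) for l1 in shape_loadings]
    size = len(lam)
    cov = [[0.0] * size for _ in range(size)]
    for s in range(size):
        for t in range(size):
            # (Lambda Psi Lambda')[s][t] = sum_k sum_m lambda_sk * psi_km * lambda_tm
            cov[s][t] = sum(lam[s][k] * psi[k][m] * lam[t][m] for k in range(2) for m in range(2))
            if s == t:
                cov[s][t] += sigma2  # time-homoskedastic residual variance
    return cov


psi = [[1.0, 0.2], [0.2, 0.5]]  # hypothetical level-2 (co)variances
sigma_theta = implied_covariance([0.0, 0.5, 1.0], psi, 0.3)
for row in sigma_theta:
    print([round(v, 3) for v in row])
```

Comparing such an implied matrix with the covariances of the repeated latent propensities is one way to see how the free intermediate loadings let the model accommodate nonlinear change.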

 

 

Multivariate latent growth curve analysis: the associative LGC. Although developing behaviors are typically intercorrelated, many studies examining the covariance matrix among these behaviors have been static, relying primarily on cross-sectional measures taken at one point in time (Duncan, Duncan, and Strycker, 2001). However, with increased interest in the development of interrelated behaviors, the focus of research has shifted from static models to dynamic models, which incorporate both the time dimension and the intra- and inter-individual variability of behavior trajectories. As originally conceptualized by Tucker (1966), the multivariate latent growth curve model offers a more general and dynamic view of the correlates of change, making it possible for the researcher to obtain both the common and the specific effects of predictors and to examine the associative relationships among several key developmental variables at the same time (Duncan, Duncan, Strycker, Li, and Alpert, 1999; Duncan, Duncan, and Strycker, 2000).

Because each developmental variable of interest is extracted from the multidimensional graded response model, it is an unobservable propensity level. To justify conducting an associative LGC, the researcher needs to ensure analytically that there is sufficient interindividual variation in the initial status and the growth rate for each univariate dimension. Once each univariate construct can be successfully modeled, the researcher can model all the developmental latent variables simultaneously. The associative latent growth curve model depicted in Figure 2.1 describes the form of growth and the pattern of associations among growth factors for each dimension of interest. In addition, to capture the nonlinear trajectory embedded in each developmental variable, the shape factor loadings are constrained to zero at the first assessment occasion and to one at the last assessment occasion, and the coefficients for the intermediate time points are freely estimated.

This bivariate latent growth curve model can be expressed as

$$\theta_{d(t)n} = \xi_{d0n} + \lambda_{d(t)}\,\xi_{d1n} + \delta_{d(t)n}, \qquad d = 1,\ldots,D;\; t = 1,\ldots,T;\; n = 1,\ldots,N,$$

where we include a level and a shape factor (e.g., $\xi_{10}$ and $\xi_{11}$, and $\xi_{20}$ and $\xi_{21}$, for dimensions $\theta_1$ and $\theta_2$, respectively) and the corresponding deviations. This model allows the identification of growth in each dimension as well as the covariation among the dimensions. However, as Ferrer and McArdle (2003) indicate, the relation expressed by the covariance of the two slopes is not time dependent, because it overlooks possible interrelations between the dimensions over time.
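To make the associative structure concrete, the following minimal Python sketch simulates the bivariate model above for D = 2 dimensions. It is illustrative only: the function names, the ordering of the growth factors, the intermediate shape loadings, the factor means and covariance matrix, and the residual standard deviation are all hypothetical values chosen for the example, not estimates from this study. The associations between the dimensions enter only through the covariances among the four growth factors, which is why, as noted above, they are not time dependent.

```python
import math
import random

# Hypothetical illustration: all names and numeric values are assumptions.


def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_bivariate_lgc(n, loadings, means, cov, resid_sd, seed=1):
    """Simulate theta[d][t][i] = xi_d0 + lambda_d(t) * xi_d1 + delta_d(t) for two dimensions.

    Growth factors are ordered (xi_10, xi_11, xi_20, xi_21): level and shape for
    dimension 1, then level and shape for dimension 2. loadings[d] holds the
    shape loadings lambda_d(t), fixed to 0 at the first and 1 at the last occasion.
    """
    rng = random.Random(seed)
    L = cholesky(cov)
    n_times = len(loadings[0])
    theta = [[[0.0] * n for _ in range(n_times)] for _ in range(2)]
    for i in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(4)]
        # Correlated growth factors: means + L z, so that Cov(xi) = L L^T = cov.
        xi = [means[r] + sum(L[r][k] * z[k] for k in range(r + 1)) for r in range(4)]
        for d in range(2):
            level, shape = xi[2 * d], xi[2 * d + 1]
            for t in range(n_times):
                theta[d][t][i] = level + loadings[d][t] * shape + rng.gauss(0.0, resid_sd)
    return theta


# Hypothetical intermediate loadings; in practice they are freely estimated.
loadings = [[0.0, 0.6, 0.85, 1.0], [0.0, 0.4, 0.7, 1.0]]
means = [0.0, 0.5, 0.0, 0.3]
cov = [[1.0, 0.1, 0.5, 0.05],
       [0.1, 0.2, 0.05, 0.1],
       [0.5, 0.05, 1.0, 0.1],
       [0.05, 0.1, 0.1, 0.2]]
theta = simulate_bivariate_lgc(5000, loadings, means, cov, resid_sd=0.5)
```

Fitting the associative LGC to data generated this way should recover the factor means and the cross-dimension covariances (for example, the .1 covariance between the two shape factors).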

Like its univariate counterpart, the associative LGC allows straightforward examination of intraindividual change as well as interindividual variability, making a variety of analyses of growth and developmental processes available to a wide audience of researchers. For instance, in addition to leading to a greater understanding of multiple developmental trajectories, the associative LGC is also appealing as a way of examining the antecedents, processes, and consequences of change (e.g., Willett and Sayer, 1994). Although many other techniques have been developed to capitalize on the special features of longitudinal research, the class of statistical methods encompassed by the latent growth curve is highly flexible in model articulation, provides enhanced statistical power for testing hypotheses, and demonstrates greater correspondence between the statistical model and the traditional theories used to explain developmental trajectories (Curran et al., in press; Preacher et al., 2008).

 

8 Unlike the bivariate latent difference score (BLDS) model, this bivariate latent growth model cannot capture time-lagged sequences between dimensions (Ferrer and McArdle, 2003).

Chapter 3
BAYESIAN INFERENCE

To accommodate the two major components of the unified model, namely, the existence of a small number of latent factors underlying multivariate discrete data and the combination of the measurement and structural models for hierarchically nested data structures, a general two-level latent variable model with ordered categorical variables is adopted to account for individual latent trajectories. Given its various advantages, Bayesian estimation is used to analyze the proposed model. Recent MCMC methods for posterior simulation have greatly enhanced the applicability of Bayesian inference. When MCMC is applied to simulate observations from the posterior distribution, one basic strategy is to augment the observed data with hypothetical latent data, which come from latent measurements and latent variables. Thus, in this study, the MCMC algorithm is constructed from the Gibbs sampler coupled with the Metropolis-Hastings algorithm in order to circumvent intractable numerical integration.

Estimating Statistically Complex Models Using Markov Chain Monte Carlo (MCMC)

Bayes’ theorem, the centerpiece of Bayesian inference, can be expressed as

$$f(\Omega \mid Y) \propto f(Y \mid \Omega)\, f(\Omega),$$

indicating that the joint posterior density is proportional to the product of the likelihood function and the prior density for the model parameters, where $\Omega$ represents the unknown parameters and latent variables, $Y$ denotes the observed response data, and $f(\Omega \mid Y)$ is the posterior probability density function. When data and parameters are discrete, “$f(\cdot)$” can be replaced by “$p(\cdot)$” and “$\sum$” can take the place of “$\int$”. This posterior density can be used to determine model parameter estimates; the quantity $f(Y \mid \Omega)$ denotes the likelihood function of the model parameters given the response data ($Y$), and $f(\Omega)$ is the prior density for the model parameters, representing the relative likelihoods of particular parameter values before the data are observed.
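The proportionality in Bayes’ theorem can be illustrated with a grid approximation, in which a sum over a fine grid stands in for the normalizing integral, exactly the discrete substitution mentioned above. The sketch below is a hypothetical example, not part of this study’s analyses: it assumes a single success probability with a binomial likelihood (y correct responses out of n) and a user-supplied prior density.

```python
# Hypothetical illustration: a one-parameter binomial model, not the dissertation's model.


def grid_posterior(y, n, prior, grid_size=1001):
    """Grid approximation to f(p | y), proportional to f(y | p) f(p), for a binomial likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized posterior: likelihood times prior at each grid point.
    unnorm = [p ** y * (1.0 - p) ** (n - y) * prior(p) for p in grid]
    total = sum(unnorm)  # a sum over the grid replaces the integral
    return grid, [u / total for u in unnorm]


grid, post = grid_posterior(y=7, n=10, prior=lambda p: 1.0)  # uniform prior
posterior_mean = sum(p * w for p, w in zip(grid, post))
```

With a uniform prior the exact posterior is Beta(8, 4), whose mean is 8/12 (about .667), and the grid estimate agrees to about three decimal places; an informative prior such as one proportional to p(1 - p) pulls the posterior mean toward .5.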

When the model becomes complex, this joint posterior distribution tends to become numerically or analytically intractable, because calculating the posterior density typically requires large summations and/or multidimensional integrals. To solve this intractability problem, Monte Carlo integration was revisited by Bayesian statisticians in the late 1980s. A random sequence, or chain, is generated such that in the long run each parameter value occurs with a frequency proportional to $f(\Omega \mid Y)$. In addition, the chain is generated so that each value in the sequence depends only on its immediate predecessor, which under certain conditions makes it a first-order Markov process (Kim and Bolt, 2007; Rupp et al., 2004; Thompson, Palmer, and Moreno, 2006; Western, 1999). Because it possesses these two properties, this sampling procedure is named Markov chain Monte Carlo (MCMC); its goal is to reproduce the joint posterior distribution through simulation (e.g., Jackman, 2000; Kim and Bolt, 2007; Lynch and Western, 2004; Patz and Junker, 1999b). By sampling enough observations, researchers can obtain a general description of the posterior distribution, such as the expected a posteriori (EAP; the mean of the posterior density), the maximum a posteriori (MAP; the mode of the posterior density), the posterior standard deviation (PSD; the standard deviation of the posterior density), the 95% credible interval, and so on.
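The posterior summaries listed above can be computed directly from a vector of MCMC draws. This minimal sketch uses only the Python standard library. It approximates the MAP by the midpoint of the tallest histogram bin, which is a crude assumption of ours (the bin count is arbitrary), and it uses normally distributed pseudo-draws as a hypothetical stand-in for real sampler output.

```python
import random
import statistics

# Hypothetical helper; the histogram-based MAP is only a rough approximation.


def posterior_summary(draws, bins=50):
    """EAP, MAP (histogram approximation), PSD, and 95% equal-tailed credible interval."""
    eap = statistics.fmean(draws)
    psd = statistics.stdev(draws)
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    ci = (cuts[0], cuts[-1])
    lo, hi = min(draws), max(draws)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in draws:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    k = max(range(bins), key=counts.__getitem__)
    map_est = lo + (k + 0.5) * width  # midpoint of the most populated bin
    return {"EAP": eap, "MAP": map_est, "PSD": psd, "CI95": ci}


rng = random.Random(2010)
draws = [rng.gauss(0.5, 0.2) for _ in range(100_000)]  # stand-in for sampler output
summary = posterior_summary(draws)
```

For these symmetric pseudo-draws the EAP and MAP both fall near .5, the PSD near .2, and the credible interval near .5 ± 1.96 × .2; for a skewed posterior the EAP and MAP would separate.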


Sampling Procedures

The mechanism by which sampling is conducted varies depending on the known features of the posterior distribution, $f(\Omega \mid Y)$. In general, various types of sampling algorithms are considered within MCMC, two of which are the Metropolis-Hastings algorithm (Hastings, 1970; Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller, 1953) and Gibbs sampling (Gelfand et al., 1990; Geman and Geman, 1984). The former relies on an accept-reject step, and its key is finding a suitable candidate-generating (proposal) density for suggesting a new value given the current value in the chain. The choice of proposal distribution affects the efficiency of the algorithm: a good proposal distribution makes the chain converge quickly to the long-run probabilities, whereas a poor one leaves the chain stuck while generating parameter values and slows the convergence of the sequence (Thompson et al., 2006). The Metropolis-Hastings algorithm is usually needed when estimating logistic item response models, because the complete conditional distributions are not of a known distributional form (Kim and Bolt, 2007). To make the Markov chain reach convergence reasonably fast, Patz and Junker (1999b) suggest the use of Metropolis-Hastings within Gibbs (MHwG) for the two- and three-parameter logistic models (e.g., Birnbaum, 1968; Lord and Novick, 1968) as well as for the generalized partial credit model (Muraki, 1992).

As a special case of Metropolis-Hastings, Gibbs sampling involves cycling through smaller subsets of parameters and using the current full conditional posterior distribution of each subset as the proposal density (Casella and George, 1992; Chib and Greenberg, 1995; Fox, 2007; Gelfand et al., 1990; Patz and Junker, 1999a, 1999b; Thompson et al., 2006). Each subset may be univariate or multivariate; that is, one may sample from the full conditional posterior distribution of each unknown or of blocks of unknowns. As stated by Dunson et al. (2005) and Fox (2007), techniques such as parameter expansion and updating parameters in blocks rather than one at a time can have a dramatic impact on computational efficiency and help improve the mixing of Markov chains. Because it is a “divide and conquer” strategy, the Gibbs sampler may sometimes be inefficient, moving slowly over the parameter space (Western, 1999); however, because it simulates from known conditional distributions, it reduces multidimensional problems to a series of simpler (often univariate) calculations, making it easier to simulate draws (Casella and George, 1992; Jackman, 2000; Patz and Junker, 1999a, 1999b).
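To illustrate why a Metropolis-Hastings step is needed for logistic item response models, the following sketch runs a random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings) for the difficulty of a single Rasch item. The setting is simplified and hypothetical: person abilities are treated as known, the prior on the difficulty is assumed to be N(0, 2²), and the proposal standard deviation is an arbitrary tuning choice. In a full Metropolis-Hastings-within-Gibbs sampler, a step of this kind would be one update inside the cycle over all parameter blocks.

```python
import math
import random

# Hypothetical illustration: names, prior, and tuning values are assumptions, not the study's settings.


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_posterior(b, thetas, ys, prior_sd=2.0):
    """Rasch log-likelihood of one item's difficulty b (abilities known) plus a N(0, prior_sd^2) log prior."""
    loglik = 0.0
    for th, y in zip(thetas, ys):
        # log P(y = 1) = log_sigmoid(th - b); log P(y = 0) = log_sigmoid(b - th)
        loglik += log_sigmoid(th - b) if y == 1 else log_sigmoid(b - th)
    return loglik - 0.5 * (b / prior_sd) ** 2


def rw_metropolis(thetas, ys, n_iter=3000, step=0.15, seed=7):
    """Random-walk Metropolis sampler for b; returns the draws and the acceptance rate."""
    rng = random.Random(seed)
    b = 0.0
    lp = log_posterior(b, thetas, ys)
    draws, accepted = [], 0
    for _ in range(n_iter):
        cand = b + rng.gauss(0.0, step)  # symmetric proposal: the Hastings correction cancels
        lp_cand = log_posterior(cand, thetas, ys)
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            b, lp = cand, lp_cand
            accepted += 1
        draws.append(b)
    return draws, accepted / n_iter


# Simulated responses with a known difficulty of 0.8 (hypothetical).
data_rng = random.Random(1)
thetas = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]
ys = [1 if data_rng.random() < 1.0 / (1.0 + math.exp(-(th - 0.8))) else 0 for th in thetas]
draws, acc_rate = rw_metropolis(thetas, ys)
b_hat = sum(draws[1000:]) / len(draws[1000:])  # discard the first 1,000 draws as burn-in
```

Making `step` much smaller or much larger than the posterior standard deviation pushes the acceptance rate toward one or zero, respectively, and in both cases slows exploration, which is the proposal-choice trade-off described above.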


Speciﬁcation of Priors

As mentioned earlier, posterior distributions from Bayesian inference depend not only on the data, through the likelihood function, but also on the prior density (e.g., Western, 1999); thus, the specification of prior distributions for each of the model parameters and latent variables plays an important part in the Bayesian approach. Unlike frequentist methods, Bayesian methods provide a clear channel for incorporating prior information, which helps increase the statistical power of the analysis and contributes to the accumulation of scientific findings (Hsieh and Maier, 2009). By Bayes’ law, whenever the prior is approximately uniform over the region where the likelihood function is concentrated, the posterior distribution is nearly proportional to the likelihood function (Gill, 2002; Maier, 2001; Rice, 1995). Moreover, as sample sizes increase, priors generally become asymptotically irrelevant, and the estimates obtained from Bayesian and frequentist methods should approach identical values (Dunson, Palomo, and Bollen, 2005; Lynch and Western, 2004; Western, 1999). In this sense, when non-informative priors are used, the Bayesian method can be treated as a direct alternative to maximum likelihood estimation (MLE) for parameter estimation.

A long-running debate in Bayesian inference revolves around the choice between subjective and objective priors: subjective priors incorporate existing subject-matter knowledge, whereas objective priors aim to remove subjectivity from the analysis. Although the role of the prior diminishes as sample size increases, inferences may still be sensitive to the choice of prior (Gill, 2002; Kim and Bolt, 2007). In practice, objective reference priors are often preferred because they help resolve the dispute between Bayesian and likelihood approaches, which makes proper but diffuse priors a popular choice (Lynch and Western, 2004). However, informative subjective priors allow researchers to build on previous research, and they can be justified on the basis of opinions elicited from scientific specialists, archival materials, and the weight of established evidence (e.g., Lee and Wagenmakers, 2005). Because prior densities are needed to define the posterior distribution, it is desirable to select conjugate priors whenever possible. Adopting a conjugate prior implies that the posterior distribution is of a known form, the same as that of the prior density, which makes sampling in MCMC computationally efficient (Johnson et al., 2007; Kim and Bolt, 2007; Rupp et al., 2004). Moreover, by assigning noninformative priors to the model parameters of interest, the researcher allows the data to provide as much information as possible by themselves. However, to facilitate model identification, the researcher may consider using a prior density with high precision. For instance, throughout the present study a normal prior with high precision was used for the item difficulty parameters, and a truncated normal prior was adopted for the item discrimination parameters. The complete specification of the different priors can be found in the appropriate sections of the practical illustration chapter.
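The practical appeal of conjugacy, and the way prior precision interacts with the data, can be seen in the simplest conjugate case: a normal mean with known residual variance and a normal prior. The sketch below is a textbook illustration with hypothetical numbers and a hypothetical function name, not one of the priors used in this study. With a very diffuse prior the posterior mean essentially equals the sample mean (the MLE), whereas a more precise prior pulls the estimate toward the prior mean and shrinks the posterior standard deviation.

```python
import math

# Hypothetical helper and values; a standard normal-normal conjugate update.


def normal_normal_update(ybar, n, sigma, prior_mean, prior_sd):
    """Posterior mean and SD of a normal mean mu given n observations with known sd sigma
    and a N(prior_mean, prior_sd^2) prior.

    Conjugacy means the posterior is again normal, with precision equal to the
    prior precision plus the data precision n / sigma^2.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / sigma ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical data summary: sample mean .15 from n = 125 observations with sigma = 1.
diffuse = normal_normal_update(0.15, 125, 1.0, prior_mean=0.0, prior_sd=1000.0)
informative = normal_normal_update(0.15, 125, 1.0, prior_mean=0.14, prior_sd=0.1)
```

Here the diffuse prior returns a posterior mean of about .150 with a standard deviation of about .089 (that is, 1/√125), while the informative prior gives about .146 and .067.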


Monitoring the Markov Chain(s) and Evaluating the Model Goodness of Fit

For models estimated via Bayesian Markov chain Monte Carlo (MCMC), as implemented in WinBUGS 1.4.3 (Spiegelhalter et al., 2003), the burn-in period for the MCMC chains was determined using the method proposed by Gelman and Rubin (1992). Although Geyer (1992) suggests that generating one single long chain makes more efficient use of the simulation output, doing so leads to more complex Monte Carlo standard error expressions⁹. As opposed to running a single long sequence, Gelman and Rubin (1992) argue that, to monitor model convergence, it is important to run multiple chains from a range of different starting values (Seltzer, Wong, and Bryk, 1996). Thus, in the present work we perform Bayesian analyses using multiple independent chains with over-dispersed starting values.
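The Gelman-Rubin diagnostic behind this multiple-chain strategy, the potential scale reduction factor (PSRF), compares between-chain and within-chain variability. The following standard-library sketch implements the basic form of the statistic for one scalar parameter (without the degrees-of-freedom correction of later refinements); the function name is hypothetical and the simulated chains are stand-ins for sampler output.

```python
import math
import random
import statistics

# Hypothetical helper; basic Gelman-Rubin PSRF for one scalar parameter.


def psrf(chains):
    """Potential scale reduction factor for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    w = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain variance W
    b = n * statistics.variance(means)  # between-chain variance B
    var_plus = (n - 1) / n * w + b / n  # pooled estimate of the posterior variance
    return math.sqrt(var_plus / w)


rng = random.Random(3)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 3.0, 6.0)]
```

Chains that have mixed give a PSRF close to one, while chains stuck in different regions (as in `stuck`, started from over-dispersed values that never came together) give a PSRF well above one.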

To begin the sampling process, we need an initial set of values, treated as the starting values for the model parameters. These can be generated randomly or, whenever possible, obtained from existing maximum likelihood-based estimation programs. However, as noted by Kim and Bolt (2007) and Thompson et al. (2006), the choice of starting values may influence the sequence of values produced, and successive values may be highly correlated in the early stage of the chain. In this case, the simulated values cannot be treated as a random sample from the posterior distribution. Thus, to ensure that each chain has converged to its stationary distribution and that stable parameter estimates have been obtained, it is common to discard a number of initial iterates as the burn-in period and to construct the posterior distribution from the subsequent simulated states (e.g., Kim and Bolt, 2007; Patz and Junker, 1999b).

9 Because the posterior distributions are constructed from simulated samples, errors in the estimates can be attributed to the standard deviation of the posterior as well as to sampling error. Here, the sampling error is referred to as the Monte Carlo standard error (MCSE) (Patz and Junker, 1999b; Spiegelhalter et al., 2003).

Several methods based on Bayesian principles have been proposed for model comparison; for instance, Spiegelhalter and his colleagues (2002) propose the deviance information criterion (DIC), which includes many features of classical model assessment, such as rewarding accurate predictions and penalizing complexity. Mathematically, the DIC is composed of two major elements:

$$\mathrm{DIC} = \bar{D}(\Omega) + p_D,$$

where $\bar{D}(\Omega)$ is a measure of lack of fit, representing an estimated average discrepancy between model and data, and $p_D$ accounts for the expected decrease in deviance attributable to the added parameters of the more complex model (Fox, 2007; Li, Bolt, and Fu, 2006). As a criterion for model diagnosis and evaluation, the DIC can be requested from the WinBUGS program; a smaller DIC indicates a better-fitting model, and a difference of less than five or ten units between models does not provide sufficient evidence for favoring one model over another (Spiegelhalter et al., 2003).
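The two DIC components can be computed from posterior draws and a deviance function, using the usual definition $p_D = \bar{D} - D(\bar{\Omega})$, the posterior mean deviance minus the deviance at the posterior mean. The sketch below is a hypothetical check, not a WinBUGS computation: for a normal-mean model with known variance and exact posterior draws, $p_D$ should come out close to 1, the number of free parameters. The function names and data are illustrative assumptions.

```python
import math
import random
import statistics

# Hypothetical illustration: a one-parameter normal model with known sigma.


def dic(deviance, draws):
    """Return (DIC, Dbar, pD) with pD = mean deviance minus deviance at the posterior mean."""
    d_bar = statistics.fmean(deviance(w) for w in draws)
    p_d = d_bar - deviance(statistics.fmean(draws))
    return d_bar + p_d, d_bar, p_d


rng = random.Random(11)
sigma = 1.0
y = [rng.gauss(0.3, sigma) for _ in range(50)]


def normal_deviance(mu):
    """-2 log-likelihood of y under N(mu, sigma^2)."""
    return sum((yi - mu) ** 2 / sigma ** 2 + math.log(2 * math.pi * sigma ** 2) for yi in y)


# Under a flat prior the posterior of mu is N(ybar, sigma^2 / n); draw from it directly.
ybar = statistics.fmean(y)
draws = [rng.gauss(ybar, sigma / math.sqrt(len(y))) for _ in range(20_000)]
dic_value, d_bar, p_d = dic(normal_deviance, draws)
```

Comparing two models fitted to the same data then amounts to comparing their `dic_value`s, with the caveat about small differences noted above.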

In addition to the DIC, the posterior predictive check (PPC) is another criterion used for assessing model goodness of fit (Gelman, Carlin, Stern, and Rubin, 2003). Mathematically, the posterior predictive distribution can be written as

$$p(Y^{\mathrm{rep}} \mid Y) = \int p(Y^{\mathrm{rep}} \mid \Omega)\, p(\Omega \mid Y)\, d\Omega,$$

where $Y^{\mathrm{rep}}$ denotes replicated values of $Y$, and $\Omega$ represents all model parameters and latent variables (Sinharay and Stern, 2003). The integral defining the posterior predictive distribution consists of two parts: the sampling distribution, $p(Y^{\mathrm{rep}} \mid \Omega)$, and the posterior distribution of the model parameters and latent variables, $p(\Omega \mid Y)$. That is, the posterior predictive distribution takes two uncertainties into account: sampling uncertainty and model uncertainty (Lynch and Western, 2004; Rupp et al., 2004; Western, 1999). The rationale behind posterior predictive checks involves simulating data under the model stated in the null hypothesis and comparing features of these replicated data with the observed data. This approach grants the researcher a wide range of fit statistics; the overall discrepancy statistic used in one of the present studies is the Bayesian chi-square, the sum of squares of the outfit measures¹⁰.

Specifically, as a quantitative measure of lack of fit computed from iterates simulated from the posterior distribution, the Bayesian p-value (also known as the posterior predictive p-value, or PPP-value) compares the observed $T(Y)$ with the replicated $T(Y^{\mathrm{rep}})$ and is defined as

$$p = P\left(T(Y^{\mathrm{rep}}) \ge T(Y)\right),$$

where this tail-area probability is estimated from the simulation as the proportion of the $N$ replications for which $T(Y^{\mathrm{rep}}) \ge T(Y)$; it can be interpreted as the probability of observing data at least as extreme as the observed data, conditional on the model (Lynch and Western, 2004; Sinharay and Stern, 2003; Sinharay, Johnson, and Stern, 2006). Thus, any systematic discrepancy between the replications and the observed data reflects the implausibility of the data under the model and suggests that the presumed model does not fit the data well (Li et al., 2006; Lynch and Western, 2004; Sinharay and Stern, 2003; Sinharay et al., 2006). Under the correct model, the PPP-value tends to be close to .5; however, if the posterior predictive p-values are extreme, that is, close to zero, one, or both (depending on the nature of the discrepancy measure), the observed responses would be unlikely to occur if the null hypothesis were true (Sinharay and Stern, 2003; Sinharay et al., 2006).

10 Even though it has advantages over standard applications of fit statistics, this chi-square-type measure should be interpreted with great caution. According to Sinharay et al. (2006), it is not a suitable discrepancy measure for IRT model checking and fails to detect problems with inadequate psychometric models.
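The simulation-based estimate of the PPP-value described above can be sketched in a few lines. The example is hypothetical and deliberately simple, not the Bayesian chi-square used in this study: it assumes a normal model with unit variance and a flat prior on the mean, uses the largest absolute deviation from the sample mean as the discrepancy T, and counts the proportion of replications for which T(Y^rep) ≥ T(Y). All function names are illustrative.

```python
import math
import random
import statistics

# Hypothetical illustration: model, discrepancy, and names are assumptions.


def ppp_value(y, posterior_draws, simulate, discrepancy, rng):
    """Proportion of posterior replications with T(y_rep, omega) >= T(y, omega)."""
    exceed = 0
    for omega in posterior_draws:
        y_rep = simulate(omega, len(y), rng)  # one replicated data set per posterior draw
        if discrepancy(y_rep, omega) >= discrepancy(y, omega):
            exceed += 1
    return exceed / len(posterior_draws)


def simulate_normal(mu, n, rng):
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def max_abs_dev(data, _omega):
    m = statistics.fmean(data)
    return max(abs(x - m) for x in data)


def posterior_mu_draws(y, k, rng):
    # Flat prior and known unit variance: mu | y ~ N(ybar, 1 / n).
    ybar = statistics.fmean(y)
    return [rng.gauss(ybar, 1.0 / math.sqrt(len(y))) for _ in range(k)]


rng = random.Random(5)
y_ok = [rng.gauss(0.0, 1.0) for _ in range(100)]
y_outlier = y_ok[:-1] + [8.0]  # one gross outlier the normal model cannot reproduce
p_ok = ppp_value(y_ok, posterior_mu_draws(y_ok, 1000, rng), simulate_normal, max_abs_dev, rng)
p_bad = ppp_value(y_outlier, posterior_mu_draws(y_outlier, 1000, rng), simulate_normal, max_abs_dev, rng)
```

For data generated from the assumed model, `p_ok` lands away from the extremes, whereas the single outlier drives `p_bad` to essentially zero, flagging misfit in exactly the way described above.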


Chapter 4
PRACTICAL ILLUSTRATION

The ease of implementing Markov chain Monte Carlo (MCMC) simulation methods suggests considerable potential for their future application to statistically complex models. In this chapter, the utility of the comprehensive IRT-LVM framework is investigated with examples using both simulated and empirical data, in which three models are presented in turn: the unidimensional Rasch (1960) and linear latent growth curve model (RASCH-LLGC); the unidimensional two-parameter normal ogive (e.g., Birnbaum, 1968) and nonlinear latent growth curve model (e.g., Meredith and Tisak, 1990) (2PNO-LGC); and the multidimensional graded response (e.g., De Ayala, 1994) and associative latent growth curve model (e.g., McArdle, 1988) (MGRM-ALGC).

Using the RASCH-LLGC to Evaluate the Performance of Model Parameter Estimates

Unlike the two-parameter IRT model, the Rasch model assumes an identical discrimination parameter for each item, implying that the relative severity of the items is the same for all subjects (Rasch, 1960). Other key assumptions include (1) local independence and (2) additivity. The former implies that the items measure a single underlying latent variable, so that, conditional on that variable, item responses are independent; the latter implies that there is a readily interpretable ordering of items and persons, because item differences and person differences contribute additively to the same scale, the log-odds of an affirmative response (Johnson and Raudenbush, 2006; Raudenbush, Johnson, and Sampson, 2003). As for the structural component, expanding on traditional repeated-measures analysis, the linear latent growth curve model allows one to simultaneously model within-person change patterns and between-person differences in the characteristics of latent trajectories (Curran et al., in press).

Monte Carlo simulation study. Under the IRT-LGC framework, we demonstrate how to evaluate the performance of parameter estimates by conducting a Monte Carlo study. Because the sample size needed for a particular longitudinal study depends on many factors, such as the complexity of the model, the number of assessment occasions, the standardized effect size associated with the polynomial coefficient of interest (e.g., linear, quadratic, or cubic), the variation between and within participants, and the amount of missing data (Curran et al., in press; Hertzog, von Oertzen, Ghisletta, and Lindenberger, 2008; Muthén and Muthén, 2002; Raudenbush and Liu, 2001), an “adequate” sample size is hard to determine unambiguously. As a simplified illustration, a specific IRT-LGC model is investigated, in which the Rasch model for dichotomous items is the measurement model and a linear latent growth curve model (LLGC) with four equidistant time points is the structural model. Given a constant number of repeated assessments and the growth curve reliability (GCR), we assume that the performance of a particular parameter estimate, namely, the stability and variability of the average growth trajectory, is a function of the sample size, the number of items administered at each point in time, and the standardized effect size of the average growth trajectory.
Based on a Monte Carlo sample size study, Muthén and Muthén (2002) suggest that, for a linear growth curve model without a covariate (i.e., an unconditional model), the following specification of the covariance matrix reflects a commonly seen scenario in longitudinal studies: the variance of the intercepts is generally larger than that of the linear growth rates, and the covariance between them is set to zero.

$$\mathbf{T} = \begin{pmatrix} .5 & 0 \\ 0 & .1 \end{pmatrix}$$

In addition, Hertzog et al. (2008) indicate that the GCR affects the power to detect individual differences in the change profile, that is, the variance of the slope factor. The GCR, which has two components, can be defined as the variance determined by the latent growth curve at each point in time divided by the total variance of the repeated measures. In this study, to partial out the influence of this confounding factor, we assume that the residual variances are homogeneous across time points and fixed at one, which is general practice when conducting power analyses in the multilevel modeling framework (Snijders and Bosker, 1993). To obtain acceptable GCR values across the entire study period, we follow Muthén and Muthén’s (2002) observation and rescale the elements of the covariance matrix by a factor of 2, which results in the modified covariance matrix below and GCR values of .50, .55, .64, and .74 at the four time points¹¹.

$$\mathbf{T} = \begin{pmatrix} 1.0 & 0 \\ 0 & .2 \end{pmatrix}$$

Adopting Cohen’s definition of the magnitude of effect sizes (Cohen, 1988), we specify two values for the mean of the linear growth trajectory, corresponding to a small effect size (.14) and a medium effect size (.28). These values are calculated from

$$\delta = \frac{\gamma_{10}}{\sqrt{\sigma^2_{01}}},$$

where $\delta$ is the magnitude of the standardized effect size, and $\gamma_{10}$ and $\sigma^2_{01}$ represent the overall linear time effect and the variance associated with the linear slope factor, respectively. Using $\delta$ values of .316 and .632, we obtain the corresponding small and medium values of the average linear growth trajectory ($\gamma_{10}$): .14 ($.316 \times \sqrt{.2}$) for the small effect size and .28 ($.632 \times \sqrt{.2}$) for the medium effect size.
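The arithmetic linking the standardized effect size δ to the mean slope γ10 can be checked directly. This tiny sketch, with a hypothetical function name, simply rearranges the formula above as γ10 = δ × √σ²01, with σ²01 = .2 taken from the rescaled covariance matrix.

```python
import math

# Hypothetical helper: rearranges delta = gamma_10 / sqrt(sigma^2_01).


def mean_slope(delta, slope_variance):
    """Mean of the linear slope factor implied by a standardized effect size."""
    return delta * math.sqrt(slope_variance)


small = mean_slope(0.316, 0.2)   # about 0.1413
medium = mean_slope(0.632, 0.2)  # about 0.2826
```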

 

11 The formula for calculating the GCR can be expressed as

$$R^2(\theta_t) = \frac{\sigma^2_{00} + t^2\sigma^2_{01} + 2t\,\sigma_{001}}{\sigma^2_{00} + t^2\sigma^2_{01} + 2t\,\sigma_{001} + \sigma^2_{\varepsilon_t}},$$

where $\sigma^2_{00}$, $\sigma^2_{01}$, and $\sigma_{001}$ are the variances and covariance associated with the intercept and slope factors; $\sigma^2_{\varepsilon_t}$ is the residual variance for the underlying latent variables at time $t$; and $t$ is the time coefficient (i.e., 0, 1, 2, and 3) in a linear growth trajectory model (Muthén and Muthén, 2002).
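The footnote formula can be evaluated directly to confirm the GCR values reported in the text. The sketch below assumes, as the text does, a zero intercept-slope covariance and a residual variance fixed at one; the function name is hypothetical. For comparison it also evaluates the unrescaled matrix (variances .5 and .1), whose reliabilities run only from roughly .33 at the first occasion to .58 at the last, which is the motivation for the rescaling.

```python
# Hypothetical helper implementing the GCR formula in footnote 11.


def gcr(t, var_intercept, var_slope, cov_is, resid_var):
    """Growth curve reliability R^2(theta_t) for a linear growth model at time score t."""
    true_var = var_intercept + t ** 2 * var_slope + 2 * t * cov_is
    return true_var / (true_var + resid_var)


times = [0, 1, 2, 3]
rescaled = [round(gcr(t, 1.0, 0.2, 0.0, 1.0), 2) for t in times]
original = [round(gcr(t, 0.5, 0.1, 0.0, 1.0), 2) for t in times]
```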


As regards the number of items administered at each time point, we chose 5, 10, and 15 items to represent three different scale lengths. Under the unidimensional Rasch model, item difficulty parameters were selected from the range [-2, 2] at equal intervals. For instance, for a 5-item scale the item difficulty parameters are prespecified as β = [-2, -1, 0, 1, 2]. For a 10-item test, the item difficulty parameters are β = [-2, -1.556, -1.111, -.667, -.222, .222, .667, 1.111, 1.556, 2]. Analogously, for a 15-item test, the item difficulty parameters are β = [-2, -1.714, -1.429, -1.143, -.857, -.571, -.286, 0, .286, .571, .857, 1.143, 1.429, 1.714, 2]. The observed dichotomous outcome variables from this RASCH-LLGC model were generated by comparing the probability of a correct response with a random number drawn from a standard uniform distribution, U[0, 1].
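The data-generating steps just described can be sketched as follows. This is a hypothetical Python re-creation for illustration, not the R code used in the study: it assumes an intercept mean of zero, draws the intercept and slope independently with variances 1 and .2 (the rescaled matrix), adds a unit-variance residual at each of the four occasions, and scores an item correct when the Rasch probability exceeds a U[0, 1] draw. Function names are illustrative.

```python
import math
import random

# Hypothetical re-creation; the zero intercept mean and function names are assumptions.


def item_difficulties(n_items):
    """Equally spaced Rasch difficulties on [-2, 2]."""
    return [-2.0 + 4.0 * j / (n_items - 1) for j in range(n_items)]


def simulate_rasch_llgc(n_persons, n_items, slope_mean, seed=0,
                        var_int=1.0, var_slope=0.2, resid_var=1.0, times=(0, 1, 2, 3)):
    """Return y[i][t][j] (0/1) from a Rasch measurement model with a linear latent growth curve."""
    rng = random.Random(seed)
    betas = item_difficulties(n_items)
    data = []
    for _ in range(n_persons):
        pi0 = rng.gauss(0.0, math.sqrt(var_int))           # intercept (mean assumed 0)
        pi1 = rng.gauss(slope_mean, math.sqrt(var_slope))  # individual linear growth rate
        person = []
        for t in times:
            theta = pi0 + pi1 * t + rng.gauss(0.0, math.sqrt(resid_var))
            # Correct response when P(correct) exceeds a standard uniform draw.
            person.append([1 if 1.0 / (1.0 + math.exp(-(theta - b))) > rng.random() else 0
                           for b in betas])
        data.append(person)
    return data


data = simulate_rasch_llgc(n_persons=2000, n_items=10, slope_mean=0.28)
```

Calling `item_difficulties(10)` or `item_difficulties(15)` reproduces the difficulty vectors listed above (after rounding to three decimals), and a positive `slope_mean` makes the proportion of correct responses rise across occasions.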

As Curran et al. (in press) note, sample sizes of at least 100 are often preferred in order to obtain reliable estimates from growth curve models. However, achieving accurate estimates in LGC models with discretely scaled variables requires relatively large sample sizes. Generally speaking, Lee (2007) suggests that, when analyzing dichotomous data, researchers need a sample size of at least “30a” to achieve reasonably accurate results, where “a” is the number of unknown parameters. Therefore, as the numbers of unknown parameters in this RASCH-LLGC model with the three scale lengths are 8, 13, and 18, we select sample sizes of 125 and 250 as the two investigation levels¹².

12 Even though the sample sizes for these two investigation levels seem small for typical IRT model estimation, Muthén and Curran (1997) argue that in growth models it is the total number of person-by-time observations that plays an important role in model estimation and statistical power.

In summary, to evaluate the numerical behavior of the average growth trajectory (i.e., the stability and variability of the latent mean associated with the slope factor in a RASCH-LLGC model), the simulation used a 2 × 3 × 2 design with 12 conditions, in each of which a total of 100 replications were generated using the free software R (R Development Core Team, 2009), and the models were implemented and estimated using WinBUGS 1.4.3 (Spiegelhalter et al., 2003). Specifically, we generated data sets representing the alternative hypothesis (i.e., the mean of the slope factor is nonzero, taking the specified values .14 and .28). However, in Bayesian analysis, using the percentage of replications in which the null hypothesis was rejected as a proxy estimate for power should proceed with caution. As indicated by Lee (2007), standard errors are usually overestimated in Bayesian SEM analysis. Thus, he suggests that hypothesis testing be approached by means of model comparisons through the Bayes factor (BF) or DIC, in particular for models with dichotomous variables. Also, because the information carried by dichotomous data is relatively coarse, it is important to monitor model convergence with great care, as more iterations are required for the MCMC algorithm to converge. Therefore, for each replication, we ran three independent chains with over-dispersed initial values and treated the first 25,000 iterations of each chain as the burn-in period. An additional 15,003 iterations in total (5,001 × 3 chains) were then used to define the sampling distribution of each parameter in the model. In addition, a common method for assessing convergence is to compute the Gelman-Rubin statistic, the potential scale reduction factor (PSRF), which compares within-chain variability with between-chain variability (Gelman and Rubin, 1992). When the PSRF approaches one for each parameter of interest, the model is judged to have reached convergence. Finally, the population values used in this RASCH-LLGC model are summarized in Table 4.1.2.

The following criteria are used to evaluate the performance of the model parameter estimates: the bias (BIAS), the root mean square (RMS) of the differences between the true values and the corresponding estimates, and the ratio of the standard error estimates to the sample standard deviations, $SE(\hat{\gamma}_{10})/SD(\hat{\gamma}_{10})$. The bias and the RMS are computed as follows:

$$\mathrm{BIAS}(\hat{\gamma}_{10}) = E\left[\hat{\gamma}^{\,r}_{10} - \gamma^{0}_{10}\right],$$

$$\mathrm{RMS}(\hat{\gamma}_{10}) = \left[\frac{1}{100}\sum_{r=1}^{100}\left(\hat{\gamma}^{\,r}_{10} - \gamma^{0}_{10}\right)^2\right]^{1/2},$$

where $\hat{\gamma}^{\,r}_{10}$ and $\gamma^{0}_{10}$ are the $r$th estimate of $\gamma_{10}$ and its true value, respectively.

To study the behavior of the numerical standard error estimates, let $SD(\hat{\gamma}_{10})$ be the sample standard deviation of $\{\hat{\gamma}^{\,r}_{10} : r = 1, \ldots, 100\}$, and let $SE(\hat{\gamma}_{10})$ be the mean of the numerical standard error estimates of $\hat{\gamma}_{10}$, each obtained as the square root of

$$(T^{*}-1)^{-1}\sum_{t=1}^{T^{*}}\left(\gamma_{10}^{(t)} - \hat{\gamma}_{10}\right)^{2},$$

where $T^{*}$ is the total number of simulated draws obtained from the posterior distribution and $\hat{\gamma}_{10} = T^{*-1}\sum_{t=1}^{T^{*}}\gamma_{10}^{(t)}$. When the standard error estimates are accurate, $SE(\hat{\gamma}_{10})$ should be close to $SD(\hat{\gamma}_{10})$, so their ratio, which is used to assess the behavior of the numerical standard error estimates, should be close to one. Based on these definitions, we found that the sample standard deviation of $\{\hat{\gamma}^{\,r}_{10} : r = 1, \ldots, 100\}$ is smaller than the mean of the numerical standard error estimates, indicating that the variability of the Bayesian estimates of the average change rate is relatively small, which may be regarded as an advantage of the Bayesian estimates. However, it also indicates that the numerical standard error estimates of the Bayesian approach, $SE(\hat{\gamma}_{10})$, are overestimated, which is in line with our expectations, as a converged MCMC chain will have explored the whole parameter space and provided a full picture of the posterior distribution. Finally, in most cases the design factors investigated in the present study, namely, the sample size, the standardized effect size, and the number of items, all exert positive influences on the stability and variability of the parameter estimate of interest (see Table 4.1.3). That is, increasing the sample size, the magnitude of the standardized effect size, and the number of administered items reduces bias and increases precision for the average growth trajectory in the RASCH-LLGC model.
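The evaluation criteria above translate directly into code. The sketch below assumes hypothetical inputs: a list of posterior-mean estimates of γ10 from the replications and, for each replication, the retained posterior draws of γ10, whose standard deviation (with the T* − 1 divisor) serves as the numerical standard error. The function names are illustrative.

```python
import math
import statistics

# Hypothetical helpers for the BIAS, RMS, and SE/SD criteria.


def bias(estimates, true_value):
    """Average of (estimate - true value) over replications."""
    return statistics.fmean(e - true_value for e in estimates)


def rms(estimates, true_value):
    """Root mean square of (estimate - true value) over replications."""
    return math.sqrt(statistics.fmean((e - true_value) ** 2 for e in estimates))


def se_sd_ratio(draws_per_replication):
    """Mean posterior SD across replications divided by the SD of the posterior means."""
    estimates = [statistics.fmean(d) for d in draws_per_replication]
    mean_se = statistics.fmean(statistics.stdev(d) for d in draws_per_replication)
    return mean_se / statistics.stdev(estimates)
```

A ratio above one corresponds to the pattern described in the text, in which the numerical standard errors exceed the empirical spread of the estimates across replications.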
Prior knowledge incorporation. In this section, we demonstrate how the use of

prior information affects the parameter estimates and standard deviations from a small

57

data set. In the previous simulation study, baseline priors and conjugate priors are used in
all Bayesian analyses. Speciﬁcally, the mean of the shape factor is estimated using a
normal distribution prior. As regards the covariance matrix of the random effect
parameters, the conjugate prior, the inverse Wishart distribution, is used. As for the item
difﬁculty parameters, in order to facilitate model identiﬁcation, we adopt a normal prior
density with tight precision and treat them as the baseline priors. The complete
speciﬁcations of the least-informative, half-infonnative and full-infonnative priors are
displayed in Table 4.1.4.

Using the least-informative, half-informative, and full-informative priors, we analyzed one simulated data set with a small standardized effect size for the average growth trajectory (.14), a sample size of 125, and ten dichotomous items (SE125110). The parameter estimates and associated standard deviations are given in Table 4.1.5. The standard deviations obtained with vague priors were relatively large. When the data were analyzed again with half- and full-informative priors, the corresponding standard deviations were reduced: the more information the priors carried, the smaller the standard deviations became. This illustrates how informative priors can increase statistical power and reduce parameter uncertainty. It also explains why informative priors can be viewed as additional data points (Gelman and Hill, 2007; Zhang et al., 2007). Through Bayes’ theorem, we thus demonstrate how posterior probabilities are revised in the light of new information, linking individual expressions of uncertainty to the real-world data-generating mechanism.
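The claim that an informative prior acts like extra data can be made concrete with the textbook conjugate case: a normal mean with known data variance. The sketch below is a minimal illustration under that assumption, not the RASCH-LLGC model. The sample mean, data standard deviation, and the three prior standard deviations are hypothetical values. They were chosen only to mirror the least-, half-, and full-informative ordering and are not the entries of Table 4.1.4.

```python
import math


def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Conjugate update for a normal mean when the data SD is treated as known."""
    prior_prec = 1.0 / prior_sd ** 2      # precision contributed by the prior
    data_prec = n / data_sd ** 2          # precision contributed by n observations
    post_prec = prior_prec + data_prec    # precisions add under conjugacy
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


def prior_equivalent_n(prior_sd, data_sd):
    """Number of pseudo-observations the prior is worth: n0 = data variance / prior variance."""
    return (data_sd / prior_sd) ** 2


# Hypothetical setting: n = 125 observations with SD 1 and sample mean 0.14.
priors = {"least-informative": 100.0, "half-informative": 0.2, "full-informative": 0.1}
for label, prior_sd in priors.items():
    _, post_sd = normal_posterior(0.14, prior_sd, 0.14, 1.0, 125)
    print(f"{label:18s} posterior SD = {post_sd:.4f}, "
          f"prior worth ~ {prior_equivalent_n(prior_sd, 1.0):.1f} observations")
```

Because precisions add, a prior with standard deviation 0.1 contributes as much information as 100 extra observations in this toy setting. This is why the posterior standard deviation shrinks as the prior becomes more informative.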


Fit of the 2PNO-LGC to the Abortion Data

Despite the large number of components requiring attention when selecting an appropriate statistical model, this section restricts its focus to the following issues:

(1) Model formulation: how Bayesians explicitly incorporate multiple dichotomous repeated measures into a latent growth curve analysis. In order to weight individual items differentially and to examine developmental stability and change over time, one specific model, the 2PNO-LGC, is presented. It combines the two-parameter normal ogive item response theory model (e.g., Lord and Novick, 1968) with latent growth curve analysis (e.g., Meredith and Tisak, 1990).

(2) Model equivalence: it is well known that growth models can be approached from several perspectives via the formulation of equivalent models, such as the HLM and LGC models, which can provide identical estimates for a given data set. To assess the advantages and disadvantages of these two distinct modeling frameworks, we illustrate their different characteristics and uses in applications with simulated data.

(3) Missing data compensation: as an alternative estimation method, Bayesian inference explicitly models missing outcomes and handles them as extra parameters to estimate (Gelman and Hill, 2007; May, 2006; Patz and Junker, 1999b; Spiegelhalter et al., 2003). Thus, when the missing-at-random assumption (MAR; Rubin, 1987) for the missing data generation mechanism is tenable, incorporating individual-level auxiliary predictors makes it straightforward to use the Bayesian approach to estimate missing values in a conditional model (Carrigan et al., 2007; Gelman and Hill, 2007).
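To make the hybrid structure concrete, the following minimal sketch simulates dichotomous repeated measures from a 2PNO-LGC-type model. It makes several assumptions that may not match the chapter. First, it uses a slope-intercept item response function Φ(a_j θ − b_j). Second, it treats the level and shape factors as independent. Third, it uses a single residual standard deviation. The chapter's own parameterization, priors, and WinBUGS code may differ, and all function and parameter names here are illustrative.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, the link function of the normal ogive model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_endorse(theta, a, b):
    """Two-parameter normal ogive: P(y = 1 | theta) = Phi(a * theta - b) (assumed form)."""
    return normal_cdf(a * theta - b)


def simulate_2pno_lgc(n_persons, loadings, mu_level, mu_shape, sd_level, sd_shape,
                      sd_resid, a, b, seed=1):
    """Return y[n][t][j] in {0, 1} for persons n, occasions t, items j.

    Structural part:  theta_tn = L_n + loadings[t] * S_n + e_tn
    Measurement part: P(y_tnj = 1) = Phi(a_j * theta_tn - b_j)
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        level = rng.gauss(mu_level, sd_level)   # person-specific initial level L_n
        shape = rng.gauss(mu_shape, sd_shape)   # person-specific shape factor S_n
        person = []
        for lam in loadings:
            theta = level + lam * shape + rng.gauss(0.0, sd_resid)
            person.append([int(rng.random() < p_endorse(theta, aj, bj))
                           for aj, bj in zip(a, b)])
        data.append(person)
    return data


# Hypothetical values: 4 occasions with loadings (0, free, free, 1) and 4 items.
y = simulate_2pno_lgc(n_persons=200, loadings=[0.0, 0.3, 0.6, 1.0],
                      mu_level=0.0, mu_shape=0.5, sd_level=1.0, sd_shape=0.4,
                      sd_resid=0.5, a=[0.8, 1.4, 1.2, 0.7], b=[-0.5, 0.0, 0.2, 1.0])
print(len(y), len(y[0]), len(y[0][0]))
```

In this form, raising a_j steepens an item's response curve and raising b_j shifts it toward higher θ. That is the sense in which the discrimination and difficulty estimates are interpreted later in this section.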

Measures and data sources. The data come from the British Social Attitudes panel. They represent the responses of a selected panel of 410 respondents, surveyed from 1983 to 1986, to seven items concerning attitudes toward abortion. For each item, respondents were asked whether they agreed that the law should allow abortion, where 1 stands for “agree” and 0 otherwise. These seven items are listed in Table 4.2.1.³ However, we performed a confirmatory factor analysis (CFA) in Mplus (Muthén and Muthén, 1998-2007) to examine the underlying construct. It showed that the seven items do not appear to measure the same thing; that is, they do not form a unidimensional construct. As a simplified demonstration, we focus on participants’ general attitudes toward abortion, measured by the bottom four items in Table 4.2.1. We remove the extreme-circumstance factor from subsequent analyses. By conducting a CFA on the retained scale at the four time periods, gamma change⁴ can be ruled out. That is, a single underlying latent variable explains the association between an individual’s responses to different items, and all items load onto this single latent factor across the entire study span.

The breakdown of analyses and response patterns for complete cases and available cases is given in Tables 4.2.2 to 4.2.5. In our analyses, only approval or disapproval responses were counted as valid; other responses were treated as item non-response. This yields 284 respondents with complete responses for all four years. If the “don’t know” and “no answer” responses are included, the usable sample is 323 cases. The response patterns for each data set show that the contingency table has a few patterns with large frequencies and many with small frequencies. The data therefore form a rather sparse contingency table, and in both data sets some of the 2⁴ = 16 possible response patterns are not observed. As a result, the asymptotic normality of the maximum likelihood estimator cannot be assumed. When frequentist methods are adopted, the problems that this sparseness causes for statistical inference and hypothesis testing should be kept constantly in mind (Knott, Albanese, and Galbraith, 1990; Fienberg and Rinaldo, 2007).

³ Data were supplied by the UK Data Archive. Neither the original data collectors nor the archive bear any responsibility for the analyses.

⁴ Golembiewski et al. (1976) propose a triumvirate conceptualization of longitudinal change. They claim that the true change (a.k.a. the alpha change) can be inferred from observed scores only when there are no beta and gamma changes. Beta change is defined as change resulting from the respondent’s recalibration of the measurement scale over time. Gamma change refers to a fundamental change in the respondent’s understanding and perception of the latent constructs of primary interest.
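The sparseness problem can be checked directly. One tabulates all 2⁴ = 16 possible response patterns for the four retained items and counts the empty and near-empty cells. The minimal sketch below uses made-up responses, not the British Social Attitudes data. The cutoff of five cases for a "small" cell is an assumption borrowed from the usual chi-square rule of thumb.

```python
from collections import Counter
from itertools import product


def pattern_table(responses, n_items):
    """Frequency of every possible binary response pattern, including unobserved ones."""
    counts = Counter(tuple(r) for r in responses)
    return {pattern: counts.get(pattern, 0) for pattern in product((0, 1), repeat=n_items)}


def sparsity_summary(table, small=5):
    """Return (number of empty cells, number of non-empty cells with fewer than `small` cases)."""
    empty = sum(1 for c in table.values() if c == 0)
    sparse = sum(1 for c in table.values() if 0 < c < small)
    return empty, sparse


# Hypothetical responses (1 = agree) to four items in one survey year.
responses = [(1, 1, 1, 1)] * 150 + [(0, 0, 0, 0)] * 60 + [(1, 1, 1, 0)] * 40 + [(1, 0, 1, 1)] * 3
table = pattern_table(responses, n_items=4)
print(len(table), sparsity_summary(table))
```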

The sampling method is a multi-stage design with several separate stages of selection. Selected respondents were nested within addresses, addresses within polling districts, polling districts within constituencies, and constituencies within the electorate (The British Social Attitudes Panel Survey, 1983-1986). A key task of an annual survey series is to look at trends and changes in attitudes over time, so a longitudinal rather than a repeated cross-sectional design is adopted here (McGrath and Waterton, 1986; Wiggins et al., 1990). In this study, we concentrate on methodological issues, namely the proposal and evaluation of an IRM-LGC hybrid model. Because a growth curve analysis is used to model the process of change, growth profiles are represented by the initial level and shape parameters, along with other explanatory variables. The conceptual modeling framework is depicted in Figure 4.2.1.

Unconditional models. In subsequent analyses, baseline priors and conjugate priors are used for the measurement model parameters and structural model parameters. Specifically, the means of the initial level and shape are estimated using normal priors. Two kinds of non-informative prior are used for the variance of measurement error: the inverse gamma prior and the uniform prior (Gelman and Hill, 2007). For the covariance matrix of the random-effect parameters, the conjugate prior, the inverse Wishart distribution, is adopted. The complete specification of the different priors can be found in Table 4.2.6. To examine the robustness of the Bayesian results, we monitored three independent chains with overdispersed initial values and also assessed the convergence of one single long chain. The results from these two approaches agree to at least one decimal place. When running three independent chains, the first 20,000 iterations of each chain are discarded as burn-in. An additional 30,003 iterations in total across the three chains are then used to define the posterior distribution of each parameter. Similarly, for a single long chain, we use a burn-in period of 19,998, with parameter estimates based on the 50,000 subsequent iterations (see Figures 4.2.2-4.2.3). The output is summarized on the basis of the remaining 30,003 iterations.
Generally, the simulation should be run until the Monte Carlo standard error associated with each parameter is within an acceptable range, say, less than 5% of the sample standard deviation (Dunson et al., 2005; Kim and Bolt, 2007; Spiegelhalter et al., 2003). However, when one single long chain is used to generate the simulated sample, the Monte Carlo errors are not all less than 5% of the sample standard deviation, unlike with the multiple-chain approach. When multiple independent chains are used, the Gelman-Rubin potential scale reduction factor (PSRF) is approximately one for most quantities of interest (Gelman and Rubin, 1992), which indicates convergence (see Figure 4.2.4). Thus, in subsequent analyses we adopt Gelman and Rubin’s suggestion and monitor model convergence using three independent chains with over-dispersed starting values.
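The two convergence checks described above can be sketched in a few lines. The PSRF below is the basic Gelman and Rubin (1992) ratio without the degrees-of-freedom correction. The Monte Carlo standard error uses non-overlapping batch means, which is reasonable only when each batch is long relative to the chain's autocorrelation. WinBUGS and other packages may compute both quantities somewhat differently. The chains here are simulated independent draws, not output from the abortion-data model.

```python
import random
import statistics


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    b = n / (m - 1) * sum((mj - grand) ** 2 for mj in means)        # between-chain variance
    w = statistics.fmean([statistics.variance(c) for c in chains])   # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n                                # pooled variance estimate
    return (var_hat / w) ** 0.5


def mcse_batch_means(draws, n_batches=50):
    """Monte Carlo standard error of the posterior mean from non-overlapping batch means."""
    size = len(draws) // n_batches
    batch_means = [statistics.fmean(draws[i * size:(i + 1) * size]) for i in range(n_batches)]
    return statistics.stdev(batch_means) / n_batches ** 0.5


def mcse_ok(draws, tol=0.05):
    """Rule of thumb quoted in the text: MC error below 5% of the posterior SD."""
    return mcse_batch_means(draws) < tol * statistics.stdev(draws)


# Three hypothetical, well-mixed chains of 10,001 retained draws each (30,003 in total).
rng = random.Random(2010)
chains = [[rng.gauss(0.0, 1.0) for _ in range(10001)] for _ in range(3)]
pooled = [x for c in chains for x in c]
print(round(psrf(chains), 3), round(mcse_batch_means(pooled), 4), mcse_ok(pooled))
```

Chains that have not mixed, for example chains stuck around different values, inflate the between-chain term and push the PSRF well above one.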

Based on the results in Table 4.2.7, all of the candidate models considered provide consistent substantive interpretations. Thus, according to the model goodness-of-fit index (i.e., DIC), we take the model in the rightmost column as an adequate representation of the data: the one with the probit link and a uniform prior for the level-1 residual variances. The parameter estimates and associated standard deviations from the complete data set (n=284) are given in the right panel of Table 4.2.8. The estimated discrimination parameters for items 2 and 3 are both greater than one and larger than those for the other two items. This indicates that items 2 and 3 discriminate the underlying propensity level better than items 1 and 4 do. Because greater discrimination indicates a stronger relationship between an item and the underlying latent trait, the “marriage” and “couple” items are more closely related to holding a positive attitude toward abortion than are the “financial” and “woman” items. As for the item difficulty parameter estimates, the estimated difficulty parameter for item 4 is the largest of the four. This indicates that “woman makes the abortion decision herself” is the hardest item to endorse. In other words, endorsing this item reflects a higher propensity to hold a generally positive attitude toward abortion than endorsing the other items, such as the “financial”, “marriage”, and “couple” items.

As for the substantive interpretation of the latent growth or decline trajectory, the empirical results show that, without controlling for any explanatory variables, a mean growth curve emerges with a true initial level of .392 (p<.01) and a change rate of .336 (p<.01). There is significant variation among respondents around these mean values ($\hat{\sigma}^2_L = 2.953$ and $\hat{\sigma}^2_S = .144$). This implies that, overall, these subjects start their growth process at different phases and go on to change at different rates. It reveals systematic differences in the change trajectory among participants. It also suggests that true variation remains in both the initial status and the rate of change, indicating the need for additional time-invariant predictors (e.g., Singer and Willett, 2005). The correlation between the initial level and the growth rate is -.021 ($\hat{\rho}_{LS} = \hat{\sigma}_{LS} / (\hat{\sigma}_L \cdot \hat{\sigma}_S)$, ns), implying that the initial level has no predictive power for the growth rate.

The level-1 residual variances vary across occasions and describe the measurement fallibility of general attitudes toward abortion over time. They are estimated at 1.077, .581, 1.095, and .391, respectively, and are statistically significant at the first and third time points. This suggests that additional outcome variation at level 1 of the structural model may be further explained by time-varying predictors. Finally, a piecewise linear growth trajectory was found for participants’ general attitudes toward abortion: the estimated slopes for the four repeated assessments are S1 = 0 (fixed), S2 = -2.072 (p<.01), S3 = .061 (ns), and S4 = 1 (fixed).
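Because S1 to S4 act as time scores for the shape factor, the model-implied mean trajectory at each occasion is μ_L + S_t μ_S. The sketch below plugs in the posterior means reported above. This is an approximation, since the posterior mean of a product is not in general the product of posterior means. The correlation helper shows the formula for ρ̂_LS using toy inputs only, because the level-shape covariance itself is not reported in the text.

```python
def implied_means(mu_level, mu_shape, loadings):
    """Model-implied mean latent propensity at each occasion: mu_L + S_t * mu_S."""
    return [mu_level + s_t * mu_shape for s_t in loadings]


def level_shape_correlation(cov_ls, var_l, var_s):
    """rho_LS = sigma_LS / (sigma_L * sigma_S), computed from (co)variances."""
    return cov_ls / (var_l ** 0.5 * var_s ** 0.5)


# Posterior means reported above for the abortion data (complete cases, n = 284).
loadings = [0.0, -2.072, 0.061, 1.0]
trajectory = implied_means(0.392, 0.336, loadings)
print([round(m, 3) for m in trajectory])
```

Under these plug-in values the implied mean drops at the second occasion and rises afterward. This illustrates why the trajectory is described as piecewise rather than linear.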

Model equivalence. It is well known that growth models can be approached from several perspectives via the formulation of equivalent models, such as the HLM and LGC models, which can provide identical estimates for a given data set. To assess the advantages and disadvantages of these two distinct modeling frameworks, we illustrate their respective characteristics and applications with a simulated data set. Its population values were taken from a previously modified analysis result, namely the one with the probit link and a constant level-1 residual variance. The simulated data are generated using the free software R (R Development Core Team, 2009), and the models are implemented and estimated using WinBUGS 1.4.3 (Spiegelhalter et al., 2003). As indicated before, in the structural model, ‘time’ in the HLM and LGC models has specific consequences for the analysis results:

$$\theta_{tn} = \lambda_{0t}\zeta_{0n} + \lambda_{1t}\zeta_{1n} + \varepsilon_{tn} \quad\text{and}\quad \zeta_{0n} = \nu_{00} + u_{0n}, \qquad \zeta_{1n} = \nu_{10} + u_{1n} \qquad (t = 1, \ldots, T;\ n = 1, \ldots, N).$$

In the HLM, $\zeta_{0n}$ and $\zeta_{1n}$ are random parameters and $\lambda_{1t}$ is an observed variable representing time or a time-varying covariate. This makes HLM the best approach when measurement occasions vary a great deal between individuals (Snijders, 1996; Willett and Sayer, 1994). In the LGC, however, $\zeta_{0n}$ and $\zeta_{1n}$ are latent variables and $\lambda_{0t}$ and $\lambda_{1t}$ are factor loadings. Because $\lambda_{1t}$ cannot vary across subjects, LGC is considered best suited for time-structured data or a fixed-occasion design (e.g., Byrne and Crombie, 2003). LGC modeling can be used for designs with varying occasions by modeling all existing occasions and treating the varying occasions as a missing data problem. However, this approach is difficult to manage when the number of varying occasions is large (Bauer, 2003; Curran, 2003; Hox and Stoel, 2005).
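The practical difference between the two frameworks shows up in the data layout. An LGC is fitted to one row per person with one column per fixed occasion, so the time scores are shared by everyone. An HLM is fitted to one row per person-occasion, so each row can carry its own time value. The helper below is a hypothetical illustration of that reshaping, not part of the R or WinBUGS code used in this study.

```python
def wide_to_long(wide, times):
    """Reshape person-by-occasion scores (LGC 'wide' layout) into person-period rows (HLM 'long' layout).

    wide:  {person_id: [score_1, ..., score_T]}, with None marking a missing occasion.
    times: one list shared by everyone (fixed-occasion design), or
           {person_id: [time_1, ..., time_T]} when occasions vary across persons.
    """
    rows = []
    for pid, scores in wide.items():
        person_times = times[pid] if isinstance(times, dict) else times
        for t, (score, time) in enumerate(zip(scores, person_times), start=1):
            if score is not None:   # the long layout simply omits missing occasions
                rows.append({"id": pid, "occasion": t, "time": time, "score": score})
    return rows


# Fixed-occasion design: everyone shares the time scores 0, 1, 2, 3.
wide = {"A": [0.4, 0.1, None, 0.9], "B": [1.2, 0.8, 0.7, 1.5]}
fixed = wide_to_long(wide, [0, 1, 2, 3])
# Varying occasions: hypothetical person-specific times (e.g., months since baseline).
varying = wide_to_long(wide, {"A": [0, 11, 25, 37], "B": [0, 13, 24, 35]})
print(len(fixed), varying[1])
```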

As can be seen in Table 4.2.9, the parameter estimates are rather similar, and both approaches lead to identical substantive conclusions. One caution applies, however. To facilitate the comparison between the two approaches, in the HLM we manually fix the values of the time variable at the true values, since time coefficients in the HLM are fixed explanatory variables (i.e., we fix the population parameters S2 at -1.741 and S3 at .064). As a result, the HLM has two fewer estimated parameters than the LGC model. In addition, according to the overall goodness of fit provided by the deviance information criterion (DIC; Spiegelhalter et al., 2002), we conclude that these two models fit the data equally well.
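DIC combines the posterior mean deviance D̄ with a complexity penalty p_D = D̄ − D(θ̄), giving DIC = D̄ + p_D (Spiegelhalter et al., 2002). The minimal sketch below computes it for a deliberately simple normal-mean model with known standard deviation and a flat prior, where p_D should come out close to one. The data and posterior draws are simulated. For the growth models in this chapter, the deviance would instead be the IRT-LGC likelihood as evaluated inside WinBUGS.

```python
import math
import random
import statistics


def normal_deviance(y, mu, sigma):
    """D(mu) = -2 * log-likelihood of y under independent N(mu, sigma^2)."""
    return sum(math.log(2.0 * math.pi * sigma ** 2) + ((yi - mu) / sigma) ** 2 for yi in y)


def dic(y, mu_draws, sigma):
    """Return (DIC, p_D) from posterior draws of mu."""
    d_bar = statistics.fmean(normal_deviance(y, mu, sigma) for mu in mu_draws)  # posterior mean deviance
    d_hat = normal_deviance(y, statistics.fmean(mu_draws), sigma)              # deviance at posterior mean
    p_d = d_bar - d_hat                                                          # effective no. of parameters
    return d_bar + p_d, p_d


rng = random.Random(1992)
sigma = 1.0
y = [rng.gauss(0.3, sigma) for _ in range(50)]
# Under a flat prior the posterior of mu is N(mean(y), sigma^2 / n); draw from it directly.
ybar = statistics.fmean(y)
mu_draws = [rng.gauss(ybar, sigma / math.sqrt(len(y))) for _ in range(20000)]
value, p_d = dic(y, mu_draws, sigma)
print(round(value, 2), round(p_d, 3))
```

Lower DIC indicates better expected predictive fit after penalizing complexity. Two models with very similar DIC values are therefore judged to fit about equally well.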

Generally, latent growth curve analysis is preferred in many situations because of its greater flexibility. Standard SEM software supplies more options (Bauer, 2003; Chou, Bentler and Pentz, 1998; Curran, 2003; Hox and Stoel, 2005; MacCallum et al., 1997; Willett and Sayer, 1994). For instance, it provides omnibus goodness-of-fit indices for a model, allowing any fitted model to be compared with a saturated model. It is also more flexible in modeling and hypothesis testing: examples include testing complex mediational mechanisms through the decomposition of effects and investigating moderational mechanisms through multiple-group analysis. Still, the HLM is preferable whenever the growth model must be embedded in a larger number of hierarchical data levels (Snijders, 1996), because adding layers to the model is relatively difficult within the SEM framework. Although several key differences remain between these two models, at the time of writing the discrepancies are rapidly disappearing (Preacher et al., 2008; Raykov, 2007).

Missing longitudinal data compensation. Missing data are unavoidable in almost all serious statistical analyses. Bayesian estimation compensates for missing data in a way similar to the multiple imputation (MI) described by Rubin (1987). It extends the MI method by jointly simulating the distributions of the variables with missing data and of the unknown parameters (Carrigan et al., 2007; Patz and Junker, 1999b). Thus, in a fully Bayesian (FB) approach, the missing values can be treated as additional parameters to estimate. In addition, the estimates of the parameters of interest can be obtained by marginalizing over an exact joint posterior distribution of all the parameters (Dunson et al., 2005). For instance, in the context of incomplete longitudinal data, the imputation and analysis models are fully and simultaneously specified in an FB analysis. The maximum likelihood method, by contrast, relies on a fully specified model whose parameter estimates are constructed using likelihood-based approximations (Carrigan et al., 2007; Schafer and Graham, 2002).
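The idea of treating missing outcomes as additional parameters can be illustrated with data augmentation in a toy model. Each Gibbs sweep first draws every missing value from its conditional distribution given the current parameter. It then draws the parameter given the completed data. The model (a normal mean with known standard deviation and a flat prior), the data, and the iteration counts are all assumptions made for this sketch. The chapter's models carry out the analogous steps inside WinBUGS for dichotomous item responses.

```python
import random
import statistics


def gibbs_with_missing(y, sigma, n_iter=5000, burn_in=1000, seed=7):
    """Data-augmentation Gibbs sampler for mu in y_i ~ N(mu, sigma^2), flat prior on mu.

    Entries of y equal to None are missing and are treated as unknowns.
    """
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(y) if v is None]
    filled = [0.0 if v is None else float(v) for v in y]
    mu = statistics.fmean(v for v in y if v is not None)     # start at the observed mean
    mu_draws, imputed_draws = [], []
    for it in range(n_iter):
        for i in missing:
            filled[i] = rng.gauss(mu, sigma)                   # step 1: impute missing values given mu
        mu = rng.gauss(statistics.fmean(filled),               # step 2: draw mu given completed data
                       sigma / len(filled) ** 0.5)
        if it >= burn_in:
            mu_draws.append(mu)
            imputed_draws.append([filled[i] for i in missing])
    return mu_draws, imputed_draws


y = [1.0, 2.0, None, 3.0, None, 2.0]   # hypothetical scores; None marks a missing occasion
mu_draws, imputed = gibbs_with_missing(y, sigma=1.0)
print(round(statistics.fmean(mu_draws), 2), round(statistics.stdev(mu_draws), 2), len(imputed))
```

In this toy setup, the stationary distribution of the mean is the posterior based on the observed values alone, centered at 2.0 with standard deviation 0.5, so the printed summaries should be close to those values. The stored imputations carry the uncertainty about each missing score forward.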

In order to explore the influence of item non-response on the estimated parameters, two separate analyses were conducted (Wiggins et al., 1990). The first used the complete data set: individuals who gave an opinion on every item in all four years. The second used the full data set of 323 respondents. The results from the full data set, which contains missing outcomes, do not differ systematically from those for the complete cases in the unconditional models. The missing data generation mechanism of missing completely at random (MCAR; Rubin, 1987), which cannot be proven, therefore seems tenable. Moreover, a hypothesis regarding the missing data mechanism was tested: the significance value associated with Little’s MCAR test (Little, 1988) is .222, indicating that the MCAR hypothesis cannot be rejected. As mentioned earlier, the Bayesian approach treats missing values as additional parameters to be estimated. For respondents with incomplete survey responses, handling missing data this way helps improve the reliability of inference for individual latent growth or decline trajectories (May, 2006; Patz and Junker, 1999b). In the present study, the paper by Wiggins and his colleagues (1990) serves as guidance in selecting explanatory variables. Age, gender, and religious status (fixed at the respondent’s 1983 response) were chosen to investigate their influences on the level and shape factors of the latent growth curve analysis.

According to Rubin (1987), there are three potential mechanisms of missingness: (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) missing not at random (MNAR). Although the MCAR assumption seems statistically tenable in the current study, we instead rely on the MAR assumption (see Table 4.2.10). Under MAR, systematic differences between respondents with and without missing values can be explained by other observed variables (Rubin, 1987). The reason is that in longitudinal studies missing values accumulate over time, which makes results susceptible to bias. Therefore, an imputation component was built into the model using three auxiliary predictors (gender, age, and religious status) to deal with the multivariate missing categorical data at each occasion. As shown in Table 4.2.11, both data sets provide estimates with identical substantive interpretations. There is also evidence of an age-by-religious-status interaction for the true initial status. Young people without religious belief tend to hold more positive attitudes toward abortion; the same is not the case for older people with religious belief. As none of the Bayesian p-values is extreme, we find no evidence of model failure, suggesting that the model generates replicated data similar to the observed data.
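A Bayesian (posterior predictive) p-value compares a test statistic computed on the observed data with the same statistic computed on data replicated from the model at each posterior draw. The sketch below assumes a simple Bernoulli model with a Beta posterior and uses the number of endorsements as the statistic. The data, the statistic, and the helper names are illustrative; they are not the checks summarized in Table 4.2.11.

```python
import random


def posterior_predictive_pvalue(y, theta_draws, simulate, stat, seed=11):
    """Share of posterior draws for which T(y_rep) >= T(y_obs)."""
    rng = random.Random(seed)
    t_obs = stat(y)
    exceed = sum(stat(simulate(theta, len(y), rng)) >= t_obs for theta in theta_draws)
    return exceed / len(theta_draws)


def simulate_bernoulli(theta, n, rng):
    """Replicate n binary responses with endorsement probability theta."""
    return [int(rng.random() < theta) for _ in range(n)]


# Hypothetical data: 20 endorsements out of 50; a uniform prior gives a Beta(21, 31) posterior.
y = [1] * 20 + [0] * 30
rng = random.Random(3)
theta_draws = [rng.betavariate(1 + sum(y), 1 + len(y) - sum(y)) for _ in range(2000)]
p = posterior_predictive_pvalue(y, theta_draws, simulate_bernoulli, sum)
print(round(p, 3))
```

Values near 0 or 1 would signal that the model cannot reproduce that feature of the data. Values away from the extremes indicate no detectable misfit on that statistic.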

Taken together, applying IRT models to responses gathered from repeated assessments allows us to take into consideration the characteristics of both item responses and measurement error in the analysis of individual developmental trajectories. As a simplified demonstration, the present study considers the modeling of a unidimensional latent construct only. In developmental research, however, one is often interested in how two or more repeatedly measured and interrelated dimensions evolve over time. To accommodate such data structures effectively, it is clearly worthwhile to extend the model to multiple domains through the analysis of random-effect regressions. Such an extension makes use of the interrelationships among the dimensions across the entire study period.


Using the MGRM-ALGC to Study the Parallel Process of Change

As a simplified demonstration, the goal of the following analyses is to illustrate how this comprehensive hybrid model, the MGRM-ALGC, allows one to depict relations among the respective growth factors. The data come from the National Youth Survey (NYS; Elliott, 1976-1987).

Participants. Based on a multistage cluster-sampling design, the NYS employed a probability sample of households in the continental United States, covering urban, suburban, and rural geographic areas. Assessed for five consecutive years, the panel sample comprised 1,725 adolescents ranging from 11 to 17 years of age (M=13.87, SD=1.945) in Year 1, 1976. Of these 1,725 randomly selected participants, 838 completed all 13 outcome measures across five occasions. That is, after listwise deletion of all missing values, 838 complete cases remain, implying that attrition and other forms of missingness accounted for about half of the sample. The participants described themselves as Caucasian (n=690), African American (n=99), Mexican American (n=35), Native American (n=4), Asian (n=8), and other (n=2). Of these, 82.6% were from two-parent families. The questionnaire covered a wide array of measures assessing participants’ social isolation and their extent of exposure to delinquent peers.

Adolescents with complete demographic data⁵ (n=802) reported slightly higher levels of social isolation than their counterparts with incomplete responses (n=36), except at the second and third assessment occasions. Similarly, adolescents with complete demographic data (n=802) reported a somewhat greater extent of exposure to delinquent peers than their counterparts with incomplete data (n=36), except at the third and fifth assessment occasions. However, neither difference was statistically significant. Descriptive statistics for each dimension’s IRT scale scores are presented in Tables 4.3.1a and 4.3.1b.

Measures. Few studies consider the dynamic relations between adolescents’ mental health and other problem behaviors, although there is substantial evidence of such relations in both cross-sectional and longitudinal samples (e.g., Cohen, Reinherz, and Frost, 1994; Swahn and Donovan, 2003). Thus, in the present study we examine the associations between adolescents’ social isolation and their engagement with delinquent peers by observing the dynamic trajectories of these two dimensions. The selection of these two constructs was based on the extant literature, which suggests a link between how adolescents perceive their emotional status and the likelihood that they associate with delinquent peers. Based on this conceptual framework, we examine the dynamics underlying this bivariate system as it evolves over time. A total of 13 polytomous items were selected as outcome measures on each occasion. Each is a five-point Likert-type scale, with higher scores reflecting more severe status. The first six items measure the construct of social isolation, and the remaining seven describe the extent of adolescents’ exposure to delinquent peers (see Table 4.3.2).

Dimensionality assessment. The NYS data used here are the responses of a selected panel of 838 adolescents, surveyed from 1976 to 1980, to 13 items. The items concern adolescents’ social isolation status and the extent of their exposure to delinquent peers. A confirmatory factor analysis (CFA) with categorical indicators was performed in Mplus (Muthén and Muthén, 1998-2007) to examine dimensionality. The response frequencies for these 13 items are listed in Table 4.3.2. The frequency table shows that response alternatives of three or higher tend to have small frequencies. This implies that the data are rather sparse and that the asymptotic normality of the maximum likelihood estimator may not apply. The CFA results suggested that these 13 items measured two latent constructs in each of the five years. The fit of the five models was respectable, with Comparative Fit Indices (CFI) between .965 and .982, Tucker-Lewis Indices (TLI) between .973 and .985, and Root Mean Square Error of Approximation (RMSEA) values between .043 and .071.

⁵ Demographic variables include the marital status of their parents, family income, gender, ethnicity, and age.

Scores for perceived social isolation and extent of exposure to delinquent peers are plotted in Figures 4.3.1a and 4.3.1b. Each plot contains data from a random subsample of 44 adolescents, and each line represents an individual’s IRT scale scores followed across five occasions. These plots illustrate some important features of the data. Generally, intra-individual variability over time is evident in both dimensions. There is also considerable inter-individual variability, indicating substantial heterogeneity in change.

Identification constraints and prior distribution specification. As with other estimation approaches, various identification constraints are needed when complex models are encountered. For the MGRM-ALGC model in the present study, we take three steps to address rotational indeterminacy. First, we assume a multidimensional model with simple structure, in which each item measures one dimension of ability and there are no cross-loadings. Second, we fix the first discrimination parameter associated with each construct to one and set the non-target loadings to zero (i.e., alpha[1,1]<-1; alpha[1,7:13]<-0; alpha[2,1:6]<-0; alpha[2,7]<-1). Third, we constrain the first threshold of the first item’s multidimensional item difficulty parameter in each dimension to zero (i.e., d[1,1]<-0; d[2,7]<-0). Moreover, to resolve the metric indeterminacy, we compare two scaling options. One constrains the initial latent growth factor in each dimension to zero; the other fixes the level-1 residual variances for each construct to constant values. For model convergence checking and subsequent statistical inference, we adopt Gelman and Rubin’s (1992) suggestion of running three independent chains with over-dispersed starting values. WinBUGS treats the first 4,000 iterations as the default adaptive phase of its general normal-proposal Metropolis algorithm. We therefore take these 4,000 iterations as the burn-in period and sample an additional 4,000 iterations from each independent chain (Spiegelhalter et al., 2003). The point estimate of each model parameter and the corresponding standard error are computed from the mean and standard deviation of the remaining 12,000 draws (i.e., 3 × 4,000 = 12,000) sampled from each parameter’s marginal posterior distribution. For instance, the mean estimate of the overall time effect associated with a particular dimension ($\nu_{d10}$) can be calculated as

$$\hat{\nu}_{d10} = \frac{1}{T^*}\sum_{t=1}^{T^*} \hat{\nu}^{(t)}_{d10},$$

where $T^*$ is the total number of draws obtained from the posterior distribution. Since we have a large sample of $\hat{\nu}^{(t)}_{d10}$ from its posterior distribution, an estimate of $SE(\hat{\nu}_{d10})$ can be obtained directly from the sample covariance matrix,

$$\widehat{SE}(\hat{\nu}_{d10}) = \left[\frac{1}{T^*-1}\sum_{t=1}^{T^*}\left(\hat{\nu}^{(t)}_{d10} - \hat{\nu}_{d10}\right)\left(\hat{\nu}^{(t)}_{d10} - \hat{\nu}_{d10}\right)^{\mathsf{T}}\right]^{1/2}.$$

As $T^*$ goes to infinity, these Bayesian estimates converge in probability to the corresponding posterior mean and posterior standard deviation.
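For a scalar parameter, the two formulas above reduce to the sample mean and the (T* − 1)-denominator standard deviation of the pooled post-burn-in draws. The sketch below assumes each chain is stored as a plain list of draws for one parameter, such as ν_d10. The chains are simulated placeholders, not WinBUGS output.

```python
import random
import statistics


def pooled_posterior_summary(chains, burn_in):
    """Drop `burn_in` draws per chain, pool the rest, and return (T*, mean, SD)."""
    pooled = [x for chain in chains for x in chain[burn_in:]]
    # statistics.stdev uses the (T* - 1) denominator, matching the SE formula above.
    return len(pooled), statistics.fmean(pooled), statistics.stdev(pooled)


# Three hypothetical chains of 8,000 iterations each: 4,000 burn-in + 4,000 kept.
rng = random.Random(1976)
chains = [[rng.gauss(0.5, 0.05) for _ in range(8000)] for _ in range(3)]
t_star, post_mean, post_sd = pooled_posterior_summary(chains, burn_in=4000)
print(t_star, round(post_mean, 3), round(post_sd, 3))
```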

As regards prior density specification, in subsequent analyses baseline priors and conjugate priors are used for the measurement model parameters and structural model parameters. In order to facilitate model identification, a normal prior with tight precision, N(0, .5), was used for the item difficulty parameters. A truncated normal prior, N(0, 1.0E-02)I(0,), was adopted for the item discrimination parameters. In addition, the level-1 residual variances ($\sigma^2$) are independently and identically distributed as inverse gamma, with shape and scale parameters both set to one. In the unidimensional GRM-LGC model, the means of the initial level and shape factors are estimated using multivariate normal priors. For the covariance matrix of the random-effect parameters, the conjugate prior, the inverse Wishart distribution, is adopted. In the MGRM-ALGC model, the θ-vector is further decomposed into two sets of latent growth factors that are assumed to follow a multivariate normal distribution. For both dimensions, the means of the initial level and shape factors are estimated using multivariate normal priors. The inverse Wishart distribution is adopted for the covariance matrix of the random-effect parameters from each dimension. The complete specification of the different priors can be found in Table 4.3.3.
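The sketch below assumes that the priors above are written in WinBUGS notation, where dnorm(mu, tau) takes a precision tau = 1/variance. Under that assumption, N(0, .5) has standard deviation √2 ≈ 1.41, and N(0, 1.0E-02) has standard deviation 10 before truncation to positive values. The code converts precisions to standard deviations and draws from the truncated-normal and inverse-gamma(1, 1) priors. The samplers used here are illustrative choices, not how WinBUGS samples internally: rejection for the truncation, and the reciprocal of a gamma draw for the inverse gamma.

```python
import math
import random
import statistics


def bugs_normal_sd(precision):
    """Standard deviation implied by a precision-parameterized normal: 1 / sqrt(tau)."""
    return 1.0 / math.sqrt(precision)


def draw_positive_normal(rng, mu, precision):
    """Rejection sampler for N(mu, 1/tau) truncated to (0, inf), i.e. dnorm(mu, tau)I(0,)."""
    sd = bugs_normal_sd(precision)
    while True:
        x = rng.gauss(mu, sd)
        if x > 0:
            return x


def draw_inverse_gamma(rng, shape, scale):
    """sigma^2 ~ Inverse-Gamma(shape, scale) as 1 / Gamma(shape, rate=scale)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)   # gammavariate takes a scale argument


rng = random.Random(2010)
slopes = [draw_positive_normal(rng, 0.0, 0.01) for _ in range(5000)]   # discrimination prior
variances = [draw_inverse_gamma(rng, 1.0, 1.0) for _ in range(20000)]  # level-1 variance prior
print(round(bugs_normal_sd(0.5), 3), round(bugs_normal_sd(0.01), 1),
      round(statistics.median(variances), 2))
```

The inverse-gamma(1, 1) prior has no finite mean, so its median (1/ln 2 ≈ 1.44) is a more useful summary of where it places its mass.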

Empirical results. Each developmental variable of interest, extracted from the multidimensional graded response model, is an unobservable propensity level. To justify conducting an associative LGC, the researcher first needs to ensure that there is sufficient interindividual variation in the initial status and growth rate for each univariate dimension. Once each univariate construct has been successfully modeled, the researcher can model all the developmental latent variables simultaneously. The associative latent growth curve model used in the present study covers two dimensions: the degree of adolescents’ social isolation and the extent of their exposure to delinquent peers. For each, it describes the form of growth and the pattern of associations among growth factors. In addition, to capture the nonlinear trajectory embedded in each developmental variable, the shape factor loadings are constrained to zero and one at the first and last assessment occasions. The coefficients for the intermediate time points are freely estimated.

Unidimensional model: the GRM-LGC.

Social isolation. The parameter estimates and associated standard deviations from the complete data set (n=838) are given in the left panel of Table 4.3.4. The estimated discrimination parameters for items 4 and 5 are both significantly greater than one, indicating that these items discriminate the underlying person ability better than the other items do. Because greater discrimination indicates a stronger relationship between an item and the underlying latent trait, the items “nobody at school cares” and “don’t belong at school” are more closely related to feeling socially isolated than other items. Those other items include “teachers don’t call on me”, “outsiders with family”, and “no project work from teachers”. As for the item difficulty parameter estimates, the estimated threshold parameter for the highest response category of item 6, β[6,4], is the largest. This indicates that response category 4 of that item, “no project work from teachers”, is the hardest alternative for respondents to reach. That is, endorsing this category reflects a higher propensity to feel isolated than endorsing the other items does.
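The role of the threshold parameters is easiest to see from the category probabilities of a graded response model. The sketch below uses Samejima’s logistic form for a single item on a single latent dimension: P*(Y ≥ k) = 1 / (1 + exp[−a(θ − β_k)]), with category probabilities obtained by differencing. The MGRM in this chapter is multidimensional and may use a different link or parameterization, and the thresholds below are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities P(Y = 0..K) for one item with ordered thresholds beta_1..beta_K."""
    # Cumulative probabilities P*(Y >= k), padded with P*(Y >= 0) = 1 and P*(Y >= K+1) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Five response categories (0-4) need four ordered thresholds.
typical_item = grm_category_probs(theta=1.0, a=1.2, thresholds=[-0.5, 0.5, 1.5, 2.5])
hard_item = grm_category_probs(theta=1.0, a=1.2, thresholds=[-0.5, 0.5, 1.5, 4.0])
print([round(p, 3) for p in typical_item])
print(round(typical_item[-1], 3), round(hard_item[-1], 3))
```

Raising only the last threshold moves probability out of the top category and into the next one down, leaving the lower categories unchanged. This is why a large β[6,4] marks category 4 of that item as the hardest alternative to reach.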

As for the substantive interpretation of the structural model, the empirical result shows that, without controlling for any explanatory variable, a mean growth curve emerges with a true initial level of 1.542 (p<.01) and a change rate of -.342 (p<.01). There is significant variation between respondents around the mean initial level ($\sigma^2_L = 1.538$). This implies that these subjects begin their growth process at different levels, which reveals systematic differences in the change trajectory among participants. It also suggests that true variation remains in one of the growth parameters, indicating the need for additional time-invariant covariates (e.g., Singer and Willett, 2005). The correlation between the initial level and the change rate, $\sigma_{LS}/(\sigma_L \sigma_S) = -.109$ (ns), indicates that the initial level has no predictive power for the change rate. Finally, a piecewise linear trajectory was found in the participants’ perceived levels of social isolation: the estimated time coefficients for the five repeated assessments are $s_1 = 0$ (fixed), $s_2 = .857$ (p<.01), $s_3 = 1.295$ (p<.01), $s_4 = 1.230$ (fixed), and $s_5 = 1$ (fixed).
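To show what these time coefficients imply, the sketch below computes the model-implied mean at each occasion. It assumes the usual latent-basis form $\mu_t = \mu_L + s_t \mu_S$, where $\mu_L$ is the mean initial level and $\mu_S$ the mean change rate. It plugs in the point estimates reported above and ignores their uncertainty; the function name is hypothetical.

```python
def implied_means(mu_level, mu_shape, time_coefs):
    """Model-implied latent mean at each occasion, mu_L + s_t * mu_S,
    under a level-and-shape (latent basis) growth model."""
    return [mu_level + s * mu_shape for s in time_coefs]

# Point estimates for social isolation reported above (unconditional model).
s = [0.0, 0.857, 1.295, 1.230, 1.0]
means = implied_means(1.542, -0.342, s)
print([round(m, 3) for m in means])  # -> [1.542, 1.249, 1.099, 1.121, 1.2]
```

Because $s_3$ and $s_4$ exceed $s_5 = 1$, the implied mean at occasions 3 and 4 falls below its final-occasion value. This is what makes the trajectory piecewise rather than a straight line.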

Exposure to delinquent peers. Similarly, Table 4.3.4 (right panel) shows that the estimated discrimination parameter for item 6 is the largest of the seven. This indicates that “stole something worth more than $50” is more closely related to hanging out with delinquent peers than the other items are. As regards the item difficulty parameter estimates, the estimated threshold parameters associated with item 5 are rather large overall, implying that selling hard drugs is a hard item to endorse: adolescents who endorsed higher response categories for this item were more likely to be associated with delinquent friends. In addition, without controlling for any explanatory variable, we obtain a mean growth curve with a true initial level of -.874 (p<.01) and a change rate of -.519 (p<.01). There is significant variation around the latent means of these two growth factors (level variance = 2.788 and shape variance = 2.504), which indicates that there remains room for individual-level covariates and contextual variables. Moreover, the initial level has no predictive power for the change rate (the level-shape correlation is .002, ns), so the gradual decline holds regardless of the respondent’s starting level. Likewise, a segmented latent trajectory was found in the dimension of deviant peer affiliation (i.e., the estimated time coefficients for the five repeated assessments are $s_1 = 0$ (fixed), $s_2 = .203$ (p<.01), $s_3 = .503$ (p<.01), $s_4 = .977$ (p<.01), and $s_5 = 1$ (fixed)).

Multidimensional model: the MGRM-ALGC.

Unconditional model: A two-level model. The associative latent growth model allows us to assess relationships among the individual parameters for adolescents’ social isolation level and extent of exposure to delinquent peers. It also allows us to estimate the means, variances, and covariances of the growth factors for each developmental dimension. Following Gelman and Rubin’s (1992) suggestion, we ran multiple independent chains with over-dispersed starting values to check model convergence. The model converged: the Gelman-Rubin potential scale reduction factor (PSRF) approaches one for each quantity of interest (see Figure 4.3.2). Parameter estimates indicate a significant rate of change in the development of both adolescents’ social isolation and extent of exposure to delinquent peers. Consistent with other developmental studies, the results generally suggest a relative downward trend in these two dimensions during adolescence, except for the fourth occasion in the social isolation dimension ($s_{14} = 1.070$, p<.01). In addition, the variances of the level and shape factors for each dimension are all significant (i.e., 2.470 and 1.554; 3.047 and 2.664). This indicates that significant individual variation remains in these two developmental variables. It further justifies fitting a univariate LGC for each dimension and an associative LGC between the two.
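The convergence check can be illustrated with the basic form of the Gelman-Rubin statistic. It compares the within-chain variance W with a pooled estimate that adds the between-chain variance of the chain means. The sketch below omits the degrees-of-freedom correction, so it may differ slightly from the value reported by WinBUGS or other software. The chains are illustrative random draws, not output from the MGRM-ALGC sampler, and the function name is hypothetical.

```python
import random
import statistics

def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one scalar
    quantity (no degrees-of-freedom correction)."""
    n = len(chains[0])                                              # draws per chain
    w = statistics.fmean(statistics.variance(c) for c in chains)    # mean within-chain variance W
    b_over_n = statistics.variance([statistics.fmean(c) for c in chains])  # between-chain B / n
    var_hat = (n - 1) / n * w + b_over_n                            # pooled posterior variance estimate
    return (var_hat / w) ** 0.5

# Illustrative draws: three well-mixed chains vs. three chains stuck in different regions.
rng = random.Random(2010)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (-2.0, 0.0, 2.0)]
print(round(psrf(mixed), 3), round(psrf(stuck), 3))
```

Chains that explore the same distribution give a PSRF close to one. Chains that have not mixed give a value well above one, which is why the statistic is monitored for every quantity of interest.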

Table 4.3.5a presents the correlations between the levels and shapes for adolescents’ social isolation and extent of exposure to delinquent peers. The levels and shapes associated with each dimension are all significantly correlated, with two exceptions: the correlation between the change rate of social isolation and the initial level of exposure to delinquent peers (.109, ns), and that between the initial level and rate of change in affiliation with delinquent peers (-.006, ns). Thus, the hypothesized associations between these two constructs are supported. In substantive terms, adolescents who perceived themselves as more socially isolated also tended to be more engaged with delinquent peers (.292 and .523). As shown in Table 4.3.5b, the multidimensional item discrimination and difficulty parameters, estimated as fixed effects, range from .571 to 1.453 and from -1.443 to 8.388, respectively.

As with IRT models in general, this MGRM-ALGC model is over-parameterized and needs to be identified. In the above analysis, the identification problem is handled in three ways. (1) The first discrimination parameter associated with each construct is fixed to one, with zero loadings otherwise. (2) The first threshold of the first item in each dimension is constrained to zero. (3) The level-1 residual variance of each construct is fixed to one. As mentioned earlier, there are no necessary and sufficient conditions for identifiability, so the problem needs to be addressed on a case-by-case basis. We therefore explored two alternative scaling options, each relaxing some of these constraints. Scaling option 1 removes the constraints on the level-1 residual variances and on the first item’s first threshold in each construct, but instead imposes constraints on the initial latent variables. Scaling option 2 removes the constraints on the level-1 residual variances without any other change. The results were compared with those of the original scaling. As Table 4.3.6 shows, each scaling option yields the same substantive interpretation and is equally effective in resolving the indeterminacy.
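The need for such constraints can be seen from a linear indeterminacy that every logistic graded response item shares. Suppose the trait is rescaled as $\theta^* = c\theta + d$ and the item parameters as $a^* = a/c$ and $\beta^* = c\beta + d$. Then $a^*(\theta^* - \beta^*) = a(\theta - \beta)$, so the likelihood cannot tell the two solutions apart. The sketch below checks this numerically with hypothetical parameter values. Fixing one discrimination to one pins down $c$, and fixing one threshold (or a latent mean) to zero pins down $d$.

```python
import math

def cum_prob(theta, a, b):
    """P(Y >= k | theta) for a logistic graded response item with threshold b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rescale(theta, a, thresholds, c, d):
    """Apply theta* = c*theta + d, a* = a/c, b* = c*b + d (c > 0)."""
    return c * theta + d, a / c, [c * b + d for b in thresholds]

theta, a, thresholds = 0.4, 1.3, [-1.0, 0.2, 1.5]       # hypothetical values
theta2, a2, thr2 = rescale(theta, a, thresholds, c=2.0, d=-0.5)
orig = [cum_prob(theta, a, b) for b in thresholds]
new = [cum_prob(theta2, a2, b) for b in thr2]
print(all(math.isclose(x1, x2) for x1, x2 in zip(orig, new)))  # True: identical probabilities
```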

Comparison of two analytical approaches.

Additionally, using a simulated data set, we compared the fixed effects, random effects, and intermediate time coefficients of the structural model (i.e., the associative latent growth curve model, ALGC) under two distinct analytical approaches: a two-stage IRT-based score analysis and a single-stage IRT-based analysis. The population values of the simulated data were taken from the preceding empirical analysis, namely the unconditional model with the level-1 residual variance of each dimension fixed at one. The simulated data were generated using the free software R (R Development Core Team, 2009), and the models were implemented and estimated using WinBUGS 1.4.3 (Spiegelhalter et al., 2003).

As expected, the pattern of significance from the two IRT-based approaches is quite similar. The exception is that the two-stage approach understates uncertainty, because it treats the first-stage trait estimates as if they were known. Furthermore, the results confirm that the proposed unified model is relevant to applications such as multilevel analysis and meta-analysis. These applications favor random effects models in which ‘pooling strength’ provides more reliable inferences about individual cases (Congdon, 2005, 2006; Gelman and Hill, 2007; Luke, 2004; Raudenbush and Bryk, 2002). Unlike the conventional two-stage procedure, the simultaneous estimation of a multivariate multilevel IRT model avoids problems of attenuation bias when the study focus is to regress the latent trait variables on other explanatory covariates (e.g., Bolt and Kim, 2005).
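A toy simulation shows how the attenuation arises. It uses a linear-Gaussian stand-in rather than an IRT model. Assume the trait is $\theta_i = \beta x_i + u_i$ and the first stage yields a noisy score $y_i = \theta_i + e_i$. If stage-1 scores are EAP-type estimates shrunk toward the grand mean by a prior that ignores $x$, their regression on $x$ recovers roughly $r\beta$ rather than $\beta$, where $r$ is the reliability. All names and values below are hypothetical. A single-stage model estimates $\beta$ jointly with the measurement model and so avoids this shrinkage.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)
n, beta, sd_u, sd_e = 20000, 0.5, 1.0, 1.0      # hypothetical values
x = [rng.randint(0, 1) for _ in range(n)]       # binary person-level covariate
theta = [beta * xi + rng.gauss(0.0, sd_u) for xi in x]
y = [t + rng.gauss(0.0, sd_e) for t in theta]   # first-stage score: trait plus error

# Stage 1: EAP-type scores, shrunk toward the grand mean by a prior that ignores x.
m = statistics.fmean(y)
v = statistics.variance(y) - sd_e ** 2          # estimated variance of theta
r = v / (v + sd_e ** 2)                         # shrinkage factor (reliability)
eap = [m + r * (yi - m) for yi in y]

# Stage 2: regress the scores on x and compare with the slope on the true trait.
print(round(ols_slope(x, theta), 3), round(ols_slope(x, eap), 3), round(r * beta, 3))
```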

The MIRT model used for the simultaneous estimation of multiple-domain latent growth trajectories can be viewed as a general framework for capturing the dynamic interrelationships among multiple behavioral dimensions across the entire study span. Adams et al. (1997) and de la Torre and Patz (2005) suggest that when dimensions are related but conceptually distinct, taking the correlation into account can noticeably improve parameter estimates and individual measurements. This holds in particular when there are several short subscales and the underlying dimensions are correlated. As the empirical results above indicate, simultaneous estimation of multiple-domain subscales not only provides direct estimates of the relations between the latent dimensions but also helps reduce the standard errors of the parameter estimates of interest. This benefit is greatest for parameters that have difficulty reaching convergence in the unidimensional scenario (cf. Table 4.3.4 vs. Table 4.3.5b).
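A linear-Gaussian stand-in shows why correlated dimensions help short subscales. Let $(\theta_1, \theta_2) \sim N(0, \Sigma)$ with unit variances and correlation $\rho$, and let each dimension be observed once with Gaussian error variance. The posterior precision is then $\Sigma^{-1}$ plus the data precisions, so the posterior variance of $\theta_2$ shrinks as $|\rho|$ grows. This is a sketch under those assumptions, not the MGRM itself; the function name and error variances are hypothetical.

```python
def posterior_var_theta2(rho, err1, err2):
    """Posterior variance of theta_2 when (theta_1, theta_2) ~ N(0, [[1, rho], [rho, 1]])
    and each dimension is observed once with Gaussian error variances err1, err2."""
    det_s = 1.0 - rho ** 2
    # Posterior precision = prior precision Sigma^{-1} + diagonal data precision.
    p11 = 1.0 / det_s + 1.0 / err1
    p22 = 1.0 / det_s + 1.0 / err2
    p12 = -rho / det_s
    # Element (2, 2) of the inverse of the 2 x 2 posterior precision matrix.
    return p11 / (p11 * p22 - p12 ** 2)

# Two equally short (noisy) subscales: error variance 1 on each dimension.
print([round(posterior_var_theta2(r, 1.0, 1.0), 3) for r in (0.0, 0.5, 0.8)])
# -> [0.5, 0.467, 0.405]
```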

Conditional model: A two-level model.

One of the advantages of casting IRT models in a hierarchical structure is that it enables the researcher to incorporate contextual variables as auxiliary information while estimating the models. This improves not only the estimation of person abilities but also the calibration of item parameters (Mislevy, 1987). As mentioned above, unlike the conventional two-stage procedure, the simultaneous estimation of a multivariate multilevel IRT model avoids problems of attenuation bias when the study focus is to regress the latent trait variables on other explanatory covariates (e.g., Bolt and Kim, 2005). To illustrate the capacity of this comprehensive modeling framework, we expand the model by adding person-level covariates. That is, building upon the previous unconditional model, we include participants’ gender (0 = FEMALE and 1 = MALE) as the person-level predictor.

Generally, we interpret the parameters within each level in the same way as the coefficients in ordinary regression. Thus, in this example, the two level-2 slope parameters capturing the effect of gender address the following research question: in terms of social isolation and delinquent peer affiliation, how does the average trajectory of true change differ by participants’ biological gender? Here, we present the final result from a parsimonious model. As shown in Table 4.3.8b (right panel), the fixed effect estimate associated with the initial level of delinquent peer affiliation in the level-2 model is statistically significant (.267, p<.05). This implies that, on average, boys have a higher initial extent of exposure than girls (FEMALE = 0). However, there is no gender difference in the other latent growth parameters. In addition, the level-2 residuals, $U_{d0nk}$ and $U_{d1nk}$, represent the portions of the individual growth parameters in each dimension that the covariate GENDER leaves unexplained. Their significant variances indicate that substantial between-person variability remains among adolescents after accounting for the effect of gender. These results again suggest the need for additional time-invariant predictors for each dimension. According to the overall goodness of fit measured by DIC, in this particular example we cannot conclude that adding biological gender improves the model (76,453.4 vs. 76,462.9). Although a smaller DIC indicates a better-fitting model, a difference of less than ten units between models (here 9.5) does not provide sufficient evidence for favoring one model over another (Spiegelhalter et al., 2003). Hence, the two models are considered to fit the data equally well. Recall that the multidimensional item parameters are estimated as fixed effects in the model. As shown in Table 4.3.8b, the multidimensional item difficulty estimates range from -1.444 to 8.435, and the multidimensional item discrimination estimates range from .570 to 1.476.
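For readers less familiar with DIC, the sketch below computes it from posterior draws using the standard definition $\mathrm{DIC} = \bar{D} + p_D$. Here $D = -2 \log L$, $\bar{D}$ is its posterior mean, and $p_D = \bar{D} - D(\bar{\theta})$. The example is a hypothetical toy normal-mean model with known variance, not the MGRM-ALGC. Under a flat prior, $p_D$ should come out close to one, the number of free parameters.

```python
import math
import random
import statistics

def deviance(mu, data, sigma=1.0):
    """-2 log-likelihood of i.i.d. normal data with known sigma."""
    ss = sum((y - mu) ** 2 for y in data)
    return len(data) * math.log(2 * math.pi * sigma ** 2) + ss / sigma ** 2

def dic(draws, data):
    """Return (DIC, p_D) from posterior draws of mu."""
    d_bar = statistics.fmean(deviance(mu, data) for mu in draws)  # posterior mean deviance
    d_hat = deviance(statistics.fmean(draws), data)                # deviance at posterior mean
    p_d = d_bar - d_hat                                            # effective number of parameters
    return d_bar + p_d, p_d

rng = random.Random(3)
data = [rng.gauss(0.3, 1.0) for _ in range(50)]
y_bar = statistics.fmean(data)
# With a flat prior and sigma = 1, the posterior of mu is N(y_bar, 1/n).
draws = [rng.gauss(y_bar, (1 / len(data)) ** 0.5) for _ in range(5000)]
dic_value, p_d = dic(draws, data)
print(round(dic_value, 1), round(p_d, 2))
```

Comparing two models then amounts to differencing their DIC values, as in the 9.5-unit difference discussed above.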

To model the parallel process of change, we propose an advanced analytic method that allows for the simultaneous estimation of a measurement model containing a set of categorical items and a latent growth curve analysis. We have illustrated how this unified approach depicts the relations among the respective growth factors, represented by both the initial level and the change rate for each of two interrelated dimensions. However, there are several ways of further extending the analyses reported here. First, the autocorrelation between identical measures across occasions can be studied. Second, we might consider incorporating other social contextual risk and protective factors for adolescents’ problem-related behaviors. From a substantive point of view, it would be beneficial to understand what factors influence specific problem behaviors and problem behaviors in general. As mentioned earlier, such information may better represent the theory underpinning developmental trajectories and be useful in guiding effective intervention and prevention programs for young people. Finally, both empirical and substantive differences may be critical for correctly interpreting the dynamics and influences of change, as McArdle (1988) and Duncan et al. (2001) suggest. Studies comparing a broad selection of multivariate approaches, including the range of candidate models and their statistical power for detecting meaningful differences, therefore deserve continued effort and exploration.

Chapter 5
DISCUSSION AND CONCLUSION

The preceding analyses suggest that a single-stage analytic strategy is the preferable alternative. To model the process of change, we propose an advanced analytic method that allows for the simultaneous estimation of a measurement model containing a set of categorical items and a latent growth curve analysis. As Bereiter (1963) puts it, one of the problems encountered in measuring change is scalability: the comparability of changes from different initial levels is questionable. However, this comprehensive framework is expected to yield three benefits when the model fits the data well, so that Bereiter’s concern about scaling can be accommodated. (1) The interpretations of item parameters will be invariant to the latent trait distribution of the respondents in question. (2) The interpretations of latent trait parameters will be invariant to the distribution of the test items under consideration. (3) An estimate of precision can be obtained for each model parameter and latent variable (e.g., Curran et al., 2007; Dunson et al., 2005; Embretson, 1994; Rasch, 1960; Roberts and Ma, 2006).

In addition, longitudinal data analysis plays a significant role in empirical research within developmental science. Researchers should therefore bear in mind that decisions regarding the longitudinal research design can be made a priori on the basis of a Monte Carlo study. Alternatively, the researcher could perform a post hoc power analysis before concluding that there is no statistical significance in a given context. Finally, when change is studied, it is common to ask whether change occurs as a result of treatment interventions or different group memberships. That is, can the change components, such as differences in average intercept, slope, and/or other polynomial coefficients, be discerned and predicted by other contextual variables? Thus, researchers are encouraged to design and conduct a Monte Carlo study tailored to their specific research questions. Such a study serves both to determine a sample size that yields a reasonable level of power and to validate their statistical inferences.
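The sketch below shows the general shape of such a Monte Carlo power study. It borrows the structural values of Table 4.1.2: intercept variance 1.00, slope variance .20, zero intercept-slope correlation, residual variance 1.00, and occasions 0 to 3. For simplicity, it assumes continuous observed outcomes and tests the mean slope with per-person OLS slopes and a two-sided z test at α = .05. It does not use the RASCH-LLGC measurement model, so its power values should not be expected to match Table 4.1.3. The function name is hypothetical.

```python
import random
import statistics

def simulated_power(n_persons, mean_slope, n_reps=500, seed=1):
    """Monte Carlo power for H0: mean slope = 0 in a linear growth model,
    using per-person OLS slopes and a two-sided z test at alpha = .05."""
    rng = random.Random(seed)
    times = [0, 1, 2, 3]
    t_bar = statistics.fmean(times)
    sxx = sum((t - t_bar) ** 2 for t in times)
    rejections = 0
    for _ in range(n_reps):
        slopes = []
        for _ in range(n_persons):
            level = rng.gauss(0.0, 1.0)                 # intercept variance 1.00
            shape = rng.gauss(mean_slope, 0.2 ** 0.5)   # slope variance .20
            y = [level + shape * t + rng.gauss(0.0, 1.0) for t in times]  # residual variance 1.00
            y_bar = statistics.fmean(y)
            slopes.append(sum((t - t_bar) * (yt - y_bar) for t, yt in zip(times, y)) / sxx)
        z = statistics.fmean(slopes) / (statistics.stdev(slopes) / n_persons ** 0.5)
        rejections += int(abs(z) > 1.96)
    return rejections / n_reps

for n in (125, 250):
    print(n, simulated_power(n, 0.14, n_reps=300), simulated_power(n, 0.28, n_reps=300))
```

Replacing the data-generating step with the full measurement model gives the kind of tailored study recommended above.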

In estimating complex statistical models, Bayesian methods are especially capable. They allow an intuitive probabilistic interpretation of the parameters of interest and the efficient incorporation of prior information into empirical data analysis (Rupp et al., 2004). Aided by modern simulation and sampling methods, such as Markov chain Monte Carlo (MCMC) algorithms, Bayesian methods can represent parameter densities that may be far from normal. Traditional maximum likelihood estimation, by contrast, relies on asymptotic normality approximations (Best et al., 1996; Maier, 2001). Unlike classical inference, Bayesian methods treat unknown parameters as random variables and interpret traditional statistics in a more intuitive way. As a consequence, probability statements about hypotheses and interval estimates for parameters are more concordant with commonsense interpretations (Keller, 2005; Rice, 1995). That is, in the Bayesian paradigm, a $100(1-\alpha)\%$ credible set is more straightforward to interpret than a frequentist confidence interval. In classical inference, the confidence interval is a probability statement about the interval, while in the Bayesian approach, the credible interval is a statement about the unknown parameter (Phillips, 2005; Rice, 1995; Wasserman, 2003).
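With MCMC output, a credible interval is read directly from the posterior draws. The sketch below forms an equal-tailed $100(1-\alpha)\%$ interval from sorted draws. As stand-in draws it uses a normal approximation with mean .148 and SD .041, the full-informative-prior estimate for $\mu_S$ in Table 4.1.5. Real draws would come from the sampler. The function name is hypothetical.

```python
import random

def equal_tailed_interval(draws, alpha=0.05):
    """Equal-tailed 100(1 - alpha)% credible interval from posterior draws."""
    s = sorted(draws)
    n = len(s)
    lower = s[int(n * alpha / 2)]
    upper = s[min(n - 1, int(n * (1 - alpha / 2)))]
    return lower, upper

# Stand-in posterior draws (normal approximation, mean .148, SD .041).
rng = random.Random(11)
draws = [rng.gauss(0.148, 0.041) for _ in range(20000)]
lower, upper = equal_tailed_interval(draws)
share_inside = sum(lower <= d <= upper for d in draws) / len(draws)
print(round(lower, 3), round(upper, 3), round(share_inside, 3))
```

Given the model and priors, the interval is read as “the parameter lies in [lower, upper] with posterior probability .95”. This is the statement about the parameter described above.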


As mentioned, MCMC sample-based estimation methods overcome numerical integration problems. They can handle high-dimensional problems and explore the distribution of the parameters, whatever the forms of the likelihood and the parameter distributions (Jackman, 2000; Keller, 2005). Beyond this advantage and that of straightforward interpretation, Bayesian methods also provide a clear approach for incorporating prior information. This increases the statistical power of the analysis and contributes to the accumulation of scientific findings. As Congdon (2005) suggests, informative subjective priors allow researchers to build on previous research. Such priors can be justified on the basis of archival materials and the weight of established evidence and opinion elicited from scientific specialists. In one of the practical illustrations, we demonstrated how informative priors affect the parameter estimates and standard deviations in a small data set, and how such priors can be treated as additional data in the analysis.
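The idea that an informative prior acts like extra data can be seen in the conjugate normal case. Assume a normal mean with known variance $\sigma^2$ and a prior written in the precision form used in Table 4.1.4 ($N(\text{mean}, \text{precision})$). Then precisions add, and a prior with precision $\tau_0$ carries as much information as $\tau_0 \sigma^2$ observations. The numbers and function names below are hypothetical.

```python
def normal_posterior(prior_mean, prior_precision, data_mean, n, sigma2):
    """Conjugate update for a normal mean with known variance sigma2.
    Precisions add; the posterior mean is a precision-weighted average."""
    data_precision = n / sigma2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * data_mean) / post_precision
    return post_mean, post_precision

def prior_sample_size(prior_precision, sigma2):
    """Number of observations a N(m, precision) prior is worth."""
    return prior_precision * sigma2

# Hypothetical: 20 observations with mean .30 and known variance 1.
for tau in (0.25, 20.0):
    mean, prec = normal_posterior(0.14, tau, 0.30, 20, 1.0)
    print(tau, prior_sample_size(tau, 1.0), round(mean, 3), round(prec ** -0.5, 3))
```

The vague prior (precision .25) leaves the estimate near the data mean. The informative prior (precision 20, worth 20 observations) pulls the estimate halfway toward .14 and shrinks the posterior SD from about .22 to about .16. This mirrors the smaller standard deviations reported for the informative priors in Table 4.1.5.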


Signiﬁcance of the Present Work

The ease of implementing MCMC suggests considerable potential for future applications to statistically complex models. Specifically, one of the IRT-LGC derivatives, the MGRM-ALGC model presented here, provides an integrated approach to modeling development. It works over consecutive occasions and across several dimensions simultaneously, with multivariate ordered categorical measures as outcomes. The MIRT model used for the simultaneous estimation of multiple-domain latent growth trajectories can be viewed as a general framework for capturing the dynamic interrelationships among multiple behavioral dimensions across the entire study span. Adams et al. (1997) and de la Torre and Patz (2005) suggest that when dimensions are related but conceptually distinct, taking the correlation into account can noticeably improve parameter estimates and individual measurements. This holds in particular when there are several short subscales and the underlying dimensions are correlated. As the empirical results above indicate, simultaneous estimation of multiple-domain subscales not only provides direct estimates of the relations between the latent dimensions but also helps reduce the standard errors of the parameter estimates of interest. This benefit is greatest for parameters that have difficulty reaching convergence in the unidimensional scenario.

As a flexible multivariate multilevel model, this MGRM-ALGC model produces parameter estimates that are readily estimable and interpretable. For instance, in addition to the parameter estimates for the latent trajectory of each individual, it also supports interpreting the items as descriptive measures portraying the interaction between persons and items (e.g., Reckase, 1997). Substantively, this associative model helps establish the interrelationships among subjects’ multiple behaviors over time and estimates the corresponding covariation in the developmental dimensions. In practice, this extension allows the researcher to evaluate the dynamic structure of both intra- and inter-individual change. It thereby provides a rational sequence for testing the adequacy of latent growth curve representations of behavioral dynamics (Duncan et al., 1999, 2004). Methodologically, this model fuses several approaches by embedding the multidimensional item response theory model into multivariate latent growth curve analysis. Doing so allows one to extend the model to a multivariate second-order analysis and provides a way to evaluate the factorial invariance of latent constructs across assessment occasions. It also permits one to separate time-specific error from measurement error (Blozis, 2007; Sayer and Cumsille, 2001).

Future Research

In the present work, the utility of this comprehensive IRT-LVM framework was investigated with two real data examples and a simulation study, with promising results. One data set, drawn from the British Social Attitudes Panel Survey 1983-1986, records the attitudes to abortion of a representative sample of adults aged 18 or older living in Great Britain (see McGrath and Waterton, 1986). As a simplified illustration, we first investigated the dimensionality of the scale using confirmatory factor analysis. We then assumed that there was no differential item functioning (DIF), thereby setting aside the corresponding gamma and beta changes. However, as Lord (1980) points out, the latent ability obtained from IRT models is invariant across measures of the same construct with different psychometric properties. Generalizing this unified model to designs with different item samples administered on different occasions therefore opens a promising avenue for future research. One direction worth pursuing is to include a set of anchor items shared over time, with other subsets of items altered according to developmental relevance across the entire study span. Such designs are known as incomplete designs or planned missingness (e.g., Schafer and Graham, 2002). They not only expand the possibilities for linking and vertical scaling across studies and over time but also yield powerful and efficient experimental designs for the analysis of individual developmental trajectories (Curran et al., 2007; Fischer and Seliger, 1997; Patz and Yao, 2007a, 2007b; Roberts and Ma, 2006; Te Marvelde, Glas, Van Landeghem, and Van Damme, 2006).

Assessments that measure growth over large grade spans on a common scale predate modern advances in latent trait models. Even so, an important foundational task is to conduct an up-to-date literature review and classification of the different latent variable models used for examining general issues in growth modeling and vertical scaling. The taxonomy could be based on selection criteria such as model parameters and the latent variable of interest, the types of information provided by these scales, separate versus concurrent calibration, and appropriate conditions for model application. It is hoped that a systematically sound categorization will yield a conceptual framework for educational researchers and psychometricians. Such a framework would help them delineate the relations between different models and find models tailored to their substantive domain knowledge and the data at hand. These models include Anderson’s longitudinal model with a latent correlation (1985), Embretson’s multidimensional Rasch model for learning and change (MRMLC) (1991), Adams, Wilson, and Wang’s multidimensional random coefficients multinomial logit model (MRCMLM) (1997), Fischer and Seliger’s multidimensional linear logistic model (1997), and Patz and Yao’s multidimensional multigroup item response model for vertical scaling (2007a, 2007b), to name a few.

Moreover, this modeling framework is expected to be applicable to large-scale assessments and to facilitate the investigation of a promising practice area, for example, analyzing students’ annual growth and change across a range of grades. In practice, many applications in educational and psychological testing involve long tests, large samples, many response patterns, and high-dimensional latent factor structures. As directions for future research, researchers could compare other estimation approaches for implementing the analysis. One example is the adaptive Gauss-Hermite quadrature procedure, with different numbers of quadrature points for each dimension of the integration.¹⁶ Researchers could also relax such strict assumptions as the stability of the item parameters over time and across subpopulations, together with the assumption of local independence. For instance, in addition to the indirect effects via the latent variable, researchers could investigate whether the individual-level covariates have direct effects on the responses. That is, presuming that the scales are psychometrically sound, differential item functioning (DIF) can be examined. DIF means that the probability of endorsing an item differs among people with the same ability but distinct characteristics, such as people with the same propensity but different gender and/or ethnicity (e.g., Holland and Wainer, 1993). In the educational testing field, such investigation is important, for DIF suggests that participants might not be fairly assessed by the instrument.
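To illustrate what the quadrature step computes, the sketch below approximates the marginal probability of a 0/1 response pattern under a Rasch model with $\theta \sim N(0, 1)$. It uses the change of variable $\theta = \sqrt{2}x$, which turns the integral into $\pi^{-1/2} \sum_i w_i f(\sqrt{2} x_i)$ over Gauss-Hermite nodes and weights from `numpy.polynomial.hermite.hermgauss`. This version is non-adaptive; the adaptive variant recenters and rescales the nodes around each person’s posterior. The Rasch model is a unidimensional stand-in for the multidimensional case, and the function name is hypothetical.

```python
import itertools
import math

import numpy as np

def rasch_pattern_prob(pattern, difficulties, n_points=21):
    """Marginal probability of a 0/1 response pattern under a Rasch model with
    theta ~ N(0, 1), integrated by (non-adaptive) Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    theta = math.sqrt(2.0) * nodes                         # change of variable for N(0, 1)
    b = np.asarray(difficulties, dtype=float)[:, None]
    p = 1.0 / (1.0 + np.exp(-(theta[None, :] - b)))        # items x nodes
    y = np.asarray(pattern)[:, None]
    like = np.prod(np.where(y == 1, p, 1.0 - p), axis=0)   # conditional likelihood per node
    return float(np.sum(weights * like) / math.sqrt(math.pi))

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]   # design A of Table 4.1.2
total = sum(rasch_pattern_prob(pat, difficulties)
            for pat in itertools.product((0, 1), repeat=5))
print(round(total, 6))                         # probabilities of all 32 patterns sum to one
for q in (5, 11, 21, 41):
    print(q, rasch_pattern_prob((1, 1, 1, 0, 0), difficulties, n_points=q))
```

The cost grows as the number of points raised to the number of dimensions. This is why the choice of quadrature points per dimension matters for many scales and time points.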

Likewise, random-effect IRT models that define an additional random effect for each testlet and/or item bundle can be adopted to account for dependencies between like items across different points in time (e.g., De Boeck, 2008; Li et al., 2006; Rijmen, Tuerlinckx, De Boeck and Kuppens, 2003). Additionally, in both empirical data analyses, we employed the usual single-group analysis, with subjects’ demographic characteristics, such as gender, as time-invariant covariates (TICs). Note, however, that when all other parameters remain the same across subpopulations, TICs only introduce differences in the conditional means of the growth factors. As Fischer and Seliger (1997) further note, it is unrealistic to assume that a sufficiently unidimensional scale is applicable to all respondents, because the factor structure will generally differ across groups such as males and females or black and white respondents. Putting this recommendation into practice implies that research should be based on multiple-group invariance analysis (Meredith and Horn, 2001). Researchers could also consider multiple-group growth models, such as latent class growth models and growth mixture models, to identify homogeneous subgroups within the larger heterogeneous population (Curran et al., in press). Finally, latent variables play an important part in this generalized linear latent and mixed modeling framework. It is therefore desirable to develop semiparametric Bayesian methods (Lee, 2007) and other approaches (e.g., van den Oord, 2005) to relax the framework’s usual multivariate normality assumption.

16 Te Marvelde et al. (2006) argued that for more scales and time points, the adaptive Gauss-Hermite quadrature method may become infeasible, but this requires further investigation.

APPENDICES

APPENDIX A

Table 4.1.1
The Simulation Design Layout

Design factor                                                  Levels investigated
No. of participants                                            125, 250
No. of items                                                   5, 10, 15
Standardized effect size of the average growth trajectory      Small (.14), Medium (.28)

Table 4.1.2
The Population Values Used in the RASCH-LLGC Model

Measurement model
  Item difficulty parameters:
    A. 5 items (-2, -1, 0, 1, 2)
    B. 10 items (-2, -1.556, -1.111, -.667, -.222, .222, .667, 1.111, 1.556, 2)
    C. 15 items (-2, -1.714, -1.429, -1.143, -.857, -.571, -.286, 0, .286, .571, .857, 1.143, 1.429, 1.714, 2)

Structural model
  Intercept mean: 0.00
  Slope mean: .14 vs. .28
  Intercept variance: 1.00
  Slope variance: .20
  Correlation between intercept and slope: 0.00
  Residual variance(s): 1.00
  Occasions of measurement: 0, 1, 2, 3
  GCR/R-square values: .50, .55, .64, .74
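The GCR/R-square values in Table 4.1.2 appear to follow from the structural-model values. This assumes they are the share of outcome variance at occasion t explained by the growth factors, $(\sigma^2_I + 2t\sigma_{IS} + t^2\sigma^2_S) / (\sigma^2_I + 2t\sigma_{IS} + t^2\sigma^2_S + \sigma^2_e)$. The sketch below reproduces them under that reading; the function name is hypothetical.

```python
def growth_r_square(var_intercept, var_slope, cov_is, var_resid, t):
    """Share of outcome variance at time t explained by intercept + slope * t
    in a linear growth model."""
    explained = var_intercept + 2 * t * cov_is + t ** 2 * var_slope
    return explained / (explained + var_resid)

# Structural values from Table 4.1.2: variances 1.00 and .20, zero covariance, residual 1.00.
print([round(growth_r_square(1.0, 0.2, 0.0, 1.0, t), 2) for t in (0, 1, 2, 3)])
# -> [0.5, 0.55, 0.64, 0.74]
```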

Table 4.1.3
Performance of the Estimated Average Latent Trajectory in the RASCH-LLGC Model

Condition   True value   Estimate   Range          BIAS    RMS    SE     SD     SE/SD   Power
SE125105    .140         .155       [.062, .285]   .015    .054   .068   .052   1.308   .64
SE125110    .140         .148       [.042, .260]   .008    .046   .061   .046   1.326   .72
SE125115    .140         .141       [.092, .239]   .001    .034   .060   .032   1.875   .81
ME125105    .280         .293       [.148, .398]   .013    .058   .069   .056   1.232   1.00
ME125110    .280         .288       [.190, .383]   .008    .041   .062   .040   1.550   1.00
ME125115    .280         .280       [.207, .346]   .000    .034   .060   .034   1.765   1.00
SE250105    .140         .158       [.111, .217]   .018    .032   .047   .027   1.741   1.00
SE250110    .140         .147       [.100, .180]   .007    .020   .043   .019   2.263   1.00
SE250115    .140         .142       [.107, .182]   .002    .016   .042   .016   2.625   1.00
ME250105    .280         .293       [.230, .342]   .012    .030   .048   .027   1.778   1.00
ME250110    .280         .276       [.247, .316]   -.004   .020   .044   .019   2.316   1.00
ME250115    .280         .279       [.228, .320]   -.001   .016   .043   .016   2.688   1.00

Note. For instance, SE250105 denotes the condition with a small standardized effect size of the average growth trajectory (.14), a sample size of 250, and five dichotomous items; ME denotes a medium effect size (.28).
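The recovery statistics in Table 4.1.3 can be computed from the replications as sketched below. The definitions are our reading of the table, not stated in the source. BIAS is the mean estimate minus the true value, and RMS is the root mean squared error. SE is the average estimated (posterior) standard error, SD is the empirical SD of the estimates, and power is the share of replications flagged as significant. Under this reading, the reported RMS values agree with $\sqrt{\text{BIAS}^2 + \text{SD}^2}$ up to rounding (e.g., SE125105: $\sqrt{.015^2 + .052^2} \approx .054$). The four replications in the usage line are hypothetical.

```python
import statistics

def recovery_summary(estimates, posterior_sds, true_value, significant):
    """Parameter-recovery summaries across simulation replications."""
    mean_est = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)            # empirical SD of the point estimates
    se = statistics.fmean(posterior_sds)        # average estimated standard error
    return {
        "estimate": mean_est,
        "range": (min(estimates), max(estimates)),
        "bias": mean_est - true_value,
        "rms": statistics.fmean((e - true_value) ** 2 for e in estimates) ** 0.5,
        "se": se,
        "sd": sd,
        "se_sd": se / sd,
        "power": sum(significant) / len(significant),  # share of significant replications
    }

# Hypothetical results from four replications (not taken from Table 4.1.3).
summary = recovery_summary([0.10, 0.20, 0.15, 0.12], [0.05] * 4, 0.14, [True, True, False, True])
print(round(summary["bias"], 4), round(summary["rms"], 4), summary["power"])
```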

Table 4.1.4
Different Types of Prior Used for the Simulated Data Set (SE125110)

Parameter   True value          Least informative priors   Half informative priors      Full informative priors
β1          -2.000              N(0, .25)^a                N(-2, 22.735)                N(-2, 45.469)
β2          -1.556              N(0, .25)                  N(-1.556, 24.902)            N(-1.556, 49.804)
β3          -1.111              N(0, .25)                  N(-1.111, 27.887)            N(-1.111, 55.775)
β4          -.667               N(0, .25)                  N(-.667, 29.495)             N(-.667, 58.990)
β5          -.222               N(0, .25)                  N(-.222, 29.450)             N(-.222, 58.899)
β6          .222                N(0, .25)                  N(.222, 29.815)              N(.222, 59.629)
β7          .667                N(0, .25)                  N(.667, 29.136)              N(.667, 58.272)
β8          1.111               N(0, .25)                  N(1.111, 28.097)             N(1.111, 56.194)
β9          1.556               N(0, .25)                  N(1.556, 23.716)             N(1.556, 47.431)
β10         2.000               N(0, .25)                  N(2, 21.471)                 N(2, 42.943)
μ_L         0                   -                          -                            -
μ_S         .14                 N(0, .25)                  N(.14, 127.836)              N(.14, 255.673)
Σ^-1        Σ = [1 0; 0 .2]     Wishart([1 0; 0 1], 3)     Wishart([3.5 0; 0 .7], 5)    Wishart([7 0; 0 1.4], 10)^b

 

 

 

 

 

Note. a. Inside the parentheses, the second quantity is the precision of the parameter.
b. First, let $\begin{pmatrix} 1 & 0 \\ 0 & .2 \end{pmatrix}$ be the prior guess for the mean of the 2 × 2 variance/covariance matrix Σ. Second, choose the degrees-of-freedom parameter, v = 10, which roughly represents an equivalent prior sample size. Third, define the matrix $S = (v - 2 - 1) \times \begin{pmatrix} 1 & 0 \\ 0 & .2 \end{pmatrix} = \begin{pmatrix} 7 & 0 \\ 0 & 1.4 \end{pmatrix}$.
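The construction in note b can be checked by simulation. The sketch assumes the WinBUGS `dwish(S, v)` convention, in which the precision matrix $\Omega = \Sigma^{-1}$ has mean $vS^{-1}$. Under that convention $E(\Sigma) = S/(v - p - 1)$ with $p = 2$, so scaling the prior guess by $v - p - 1$ makes the prior mean of Σ equal the guess. The code draws $\Omega$ as a sum of v outer products of $N(0, S^{-1})$ vectors, inverts each draw, and averages. The helper name is hypothetical.

```python
import numpy as np

def wishart_scale(prior_guess, df):
    """Scale matrix S = (v - p - 1) * prior guess, so that the implied prior
    mean of Sigma equals the prior guess (requires v > p + 1)."""
    p = prior_guess.shape[0]
    return (df - p - 1) * prior_guess

prior_guess = np.array([[1.0, 0.0], [0.0, 0.2]])   # prior guess for Sigma (note b)
S = wishart_scale(prior_guess, df=10)              # [[7, 0], [0, 1.4]]

# Monte Carlo check: Omega = sum of 10 outer products of N(0, S^{-1}) vectors,
# i.e. Omega ~ Wishart with mean 10 * S^{-1}; then Sigma = Omega^{-1}.
rng = np.random.default_rng(1)
z = rng.multivariate_normal(np.zeros(2), np.linalg.inv(S), size=(20000, 10))
omega = np.einsum("nki,nkj->nij", z, z)
sigma_draws = np.linalg.inv(omega)
print(S)
print(np.round(sigma_draws.mean(axis=0), 2))       # approximately the prior guess
```

The same reasoning applies to the half informative prior in Table 4.1.4, which uses v = 5 and half the full scale matrix.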

Table 4.1.5
Parameter Estimates with Different Priors for the Simulated Data

Simulated data set: SE125110

                        Least informative priors   Half informative priors   Full informative priors
Parameter   True value  Estimate     SD            Estimate     SD           Estimate     SD
β1          -2.000      -1.87*       .155          -1.952*      .106         -1.963*      .093
β2          -1.556      -1.547*      .148          -1.608*      .097         -1.604*      .085
β3          -1.111      -1.000*      .140          -1.077*      .090         -1.084*      .076
β4          -.667       -.535*       .140          -.610*       .086         -.623*       .074
β5          -.222       -.161        .136          -.229*       .087         -.230*       .073
β6          .222        .397*        .137          .315*        .085         .301*        .072
β7          .667        .806*        .138          .728*        .086         .721*        .074
β8          1.111       1.131*       .138          1.073*       .089         1.076*       .076
β9          1.556       1.564*       .146          1.509*       .095         1.512*       .083
β10         2.000       2.060*       .152          2.000*       .105         1.997*       .092
μ_L         .000        .000                       .000                      .000
μ_S         .140        .164*        .060          .151*        .047         .148*        .041
σ²_L        1.000       .993*        .251          1.046*       .237         1.068*       .225
σ²_S        .200        .191*        .051          .159*        .047         .164*        .044
σ_LS        .000        .450*        .171          .471*        .163         .390*        .150
σ²_e        1.000       1.000                      1.000                     1.000
DIC                     3,464.580                  3,462.600                 3,459.440

Note. a. *p < .05 (critical value 1.96). b. Convergence was assessed with three independent chains of 30,000 iterations each, the first 25,000 of which were discarded as burn-in.

Table 4.2.1
The Seven Items Concerning Attitudes to Abortion on the British Social Attitudes Panel Survey, 1983-1986

Here are a number of circumstances in which a woman might consider an abortion. Please say whether or not you think the law should allow an abortion in each case. Should abortion be allowed by law?

Extreme circumstance factor:
1. [Risk] the woman’s health is seriously endangered by the pregnancy.
2. [Rape] the woman became pregnant as a result of rape.
3. [Defect] there is a strong chance of a defect in the baby.

General attitude factor:
4. [Financial] the couple cannot afford any more children.
5. [Marriage] the woman is not married and does not wish to marry the man.
6. [Couple] the couple agree that they do not wish to have the child.
7. [Woman] the woman decides on her own she does not wish to have the child.

Table 4.2.2
Breakdown Table for the Restricted Data/Complete Cases

Latent variable outcomes          Attitude 1983  Attitude 1984  Attitude 1985  Attitude 1986
Gender    Female (0)   n          160            160            160            160
                       Mean       .261           -.208          .262           .439
                       SD         1.709          1.649          1.710          1.592
          Male (1)     n          124            124            124            124
                       Mean       .349           -.069          .494           .860
                       SD         1.856          1.630          1.806          1.573
Age       Senior (0)   n          141            141            141            141
                       Mean       .126           -.319          .161           .527
                       SD         1.702          1.593          1.792          1.661
          Junior (1)   n          143            143            143            143
                       Mean       .470           .022           .563           .717
                       SD         1.827          1.67           1.697          1.526
Religion  Yes (0)      n          182            182            182            182
                       Mean       .095           -.417          .124           .375
                       SD         1.840          1.538          1.742          1.567
          No (1)       n          102            102            102            102
                       Mean       .664           .333           .791           1.064
                       SD         1.586          1.711          1.698          1.556
Total                  N          284            284            284            284
                       Mean       .299           -.147          .364           .623
                       SD         1.771          1.640          1.753          1.595
Note. a. Each of these three explanatory variables was dichotomized as follows: gender (0: female vs. 1: male), age (0: elder respondents (>40) vs. 1: young respondents (<=40)), and religious status (0: has a religion vs. 1: no religion).
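The coding in the note can be written as a small helper. The input format (a sex label, age in years, and a religion flag) is hypothetical; only the cut-point at 40 and the 0/1 assignments come from the note.

```python
def code_covariates(sex, age, has_religion):
    """Return (gender, age group, religion) dummies following the table note.

    Hypothetical inputs: sex is "female" or "male", age is in years,
    has_religion is a bool.
    """
    gender = 1 if sex == "male" else 0       # 0: female, 1: male
    young = 1 if age <= 40 else 0            # 0: elder (>40), 1: young (<=40)
    no_religion = 0 if has_religion else 1   # 0: has religion, 1: no religion
    return gender, young, no_religion

if __name__ == "__main__":
    print(code_covariates("male", 40, False))   # (1, 1, 1)
    print(code_covariates("female", 41, True))  # (0, 0, 0)
```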

Table 4.2.3
Breakdown Table for the Full Data/Available Cases

Latent variable outcomes          Attitude 1983  Attitude 1984  Attitude 1985  Attitude 1986
Gender    Female (0)   n          180            180            180            180
                       Mean       .256           -.312          .169           .386
                       SD         1.577          1.808          1.588          1.629
          Male (1)     n          143            143            143            143
                       Mean       .419           -.283          .343           .798
                       SD         1.721          1.758          1.708          1.607
Age       Senior (0)   n          157            157            157            157
                       Mean       .153           -.410          .026           .411
                       SD         1.664          1.878          1.667          1.680
          Junior (1)   n          166            166            166            166
                       Mean       .493           -.195          .454           .718
                       SD         1.608          1.689          1.595          1.572
Religion  Yes (0)      n          204            204            204            204
                       Mean       .032           -.475          .012           .349
                       SD         1.554          1.741          1.618          1.602
          No (1)       n          119            119            119            119
                       Mean       .836           .001           .648           .946
                       SD         1.670          1.824          1.610          1.615
Total                  N          323            323            323            323
                       Mean       .328           -.299          .246           .569
                       SD         1.642          1.783          1.642          1.630
Note. a. Each of these three explanatory variables was dichotomized as follows: gender (0: female vs. 1: male), age (0: elder respondents (>40) vs. 1: young respondents (<=40)), and religious status (0: has a religion vs. 1: no religion).

Table 4.2.4
Frequencies of the Response Patterns Observed for the 1983-1986 Panels (Complete Cases)

1983
Response pattern  Observed frequencies  Response pattern  Observed frequencies
1111              95                    1001              8
0000              70                    0010              8
1000              20                    1100              7
1110              19                    0111              4
0011              12                    0110              4
1010              10                    1101              3
1011              10                    0101              3
0100              9                     0001              2

1984
Response pattern  Observed frequencies  Response pattern  Observed frequencies
0000              121                   1010              6
1111              70                    1101              5
1000              20                    0011              4
1110              14                    0001              4
0100              10                    0111              3
0010              8                     1001              2
1100              8                     0110              1
0101              7                     1011              1

1985
Response pattern  Observed frequencies  Response pattern  Observed frequencies
1111              96                    1011              6
0000              86                    0101              5
1000              21                    0010              5
1110              19                    1010              4
0111              9                     0110              4
1100              9                     1101              3
0011              8                     0001              2
0100              7

1986
Response pattern  Observed frequencies  Response pattern  Observed frequencies
1111              107                   1010              6
0000              72                    1101              5
1110              32                    0110              3
1100              17                    0011              3
0111              12                    1011              2
1000              9                     0001              1
0100              8
0010              7

 

Table 4.2.5
Frequencies of the Response Patterns Observed for the 1983-1986 Panels (Available Cases)

1983
Response pattern  Observed frequencies  Response pattern  Observed frequencies
1111              102                   1001              8
0000              85                    1100              8
1110              21                    9999              5
1000              21                    0111              4
0011              14                    0110              4
1010              13                    1101              3
1011              10                    0101              3
0100              10                    0001              2
0010              10

1984
Response pattern  Observed frequencies  Response pattern  Observed frequencies
0000              134                   1010              7
1111              73                    1101              5
1000              24                    0001              5
1110              17                    0011              4
9999              13                    0111              3
0100              11                    1001              2
1100              8                     1011              1
0010              8                     0110              1
0101              7

1985
Response pattern  Observed frequencies  Response pattern  Observed frequencies
1111              99                    1011              6
0000              93                    0101              5
9999              23                    0010              5
1110              21                    1010              4
1000              21                    0110              4
1100              10                    1101              3
0111              9                     0001              2
0011              9
0100              9

1986
Response pattern  Observed frequencies  Response pattern  Observed frequencies
1111              117                   1010              6
0000              85                    0110              4
1110              36                    1011              3
1100              18                    9999              3
0111              12                    0011              3
1000              12                    0001              1
0100              9
0010              8
1101              6
Note. In a response pattern, 9 denotes a missing value.
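The pattern counts in Tables 4.2.4 and 4.2.5 are tallies of the four binary responses observed for each respondent in each wave, with 9 marking a missing response. A minimal sketch of that tally, assuming one row per respondent with None for a missing answer (a hypothetical data layout):

```python
from collections import Counter

MISSING = 9  # code written into the pattern for a missing item response

def pattern_frequencies(responses):
    """Tally response patterns such as '1011' for one panel wave.

    Each row holds one respondent's binary item responses; None marks a
    missing response and is written as 9 in the pattern string.
    """
    patterns = ["".join(str(MISSING if r is None else r) for r in row) for row in responses]
    return Counter(patterns).most_common()

if __name__ == "__main__":
    wave = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [None, None, None, None]]
    print(pattern_frequencies(wave))  # [('1111', 2), ('0000', 1), ('9999', 1)]
```

Within each wave the counts should sum to the sample size: 284 for the complete cases and 323 for the available cases.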

Table 4.2.6
Different Types of Prior Used in the Present Study

Measurement model
Parameter                     Baseline priors
β2, β3, β4                    N(0, 1.0)
a2, a3, a4                    N(0, 1.0E-02)I(0, )

Structure model
Parameter                     Non-informative priors
S2, S3                        N(0, 1.0E-4)
μL, μS                        N(0, 1.0E-4)
[σ²L σLS; σLS σ²S]^-1         Wishart([1 0; 0 1], 2)
σ²εt                          (1) 1/σ²εt ~ Gamma(.001, .001)
                              (2) σ²εt ~ Unif(0, 1.0E04)
Note. a. Inside the parentheses, the second quantity is the precision of the parameter.
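Because the second argument of each normal prior is a precision (1/variance), the implied spread is easy to misread: N(0, 1.0E-02) has SD 10, and N(0, 1.0E-4) has SD 100. The helper below makes the conversion and also gives the mean and variance of a Gamma(shape, rate) prior such as Gamma(.001, .001), assuming the shape-rate form commonly used for precision parameters in BUGS-style notation.

```python
import math

def sd_from_precision(tau):
    """Standard deviation implied by a normal prior N(mu, tau) written with precision tau."""
    return 1.0 / math.sqrt(tau)

def gamma_mean_var(shape, rate):
    """Mean and variance of a Gamma(shape, rate) distribution."""
    return shape / rate, shape / rate ** 2

if __name__ == "__main__":
    print(sd_from_precision(1.0e-2))     # discrimination prior: SD of about 10
    print(sd_from_precision(1.0e-4))     # growth-parameter prior: SD of about 100
    print(gamma_mean_var(0.001, 0.001))  # mean 1, variance about 1000
```

The Gamma(.001, .001) prior thus has mean 1 but a very large variance, which is what makes it nearly non-informative for a precision.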

Table 4.2.7
Parameter Estimates of the 2PNO-LGC Model (Restricted Data)

Column specifications (left to right):
Priors input: a ~ dnorm(0, 1.0E-02)I(0,) and β ~ dnorm(0,1)
Link: Probit link; Logit link^a; Probit link
Priors for varying residuals: gamma priors (~dgamma(.001, .001)); gamma priors (~dgamma(.001, .001)); uniform priors (~dunif(0, 1.0E04))
Estimation: Bayesian, one single long chain (30,000 iterations, 20,000 burn-in); Bayesian, three independent chains (30,000 iterations, 20,000 burn-in)

Parameter  Estimate (EAP)  SD     Estimate (EAP)  SD     Estimate (EAP)  SD     Estimate (EAP)  SD
β1         .000            ---    .000            ---    .000            ---    .000            ---
β2         .201*           .071   .167*           .071   .186*           .066   .185*           .069
β3         .223*           .070   .195*           .072   .210*           .068   .210*           .069
β4         .636*           .071   .662*           .094   .677*           .088   .699*           .090
a1         1.000           ---    1.000           ---    1.000           ---    1.000           ---
a2         1.600*          .182   1.449*          .186   1.441*          .185   1.384*          .197
a3         1.514*          .165   1.319*          .155   1.304*          .161   1.256*          .161
a4         1.200*          .119   1.054*          .123   1.038*          .124   .995*           .121
S1         .000            ---    .000            ---    .000            ---    .000            ---
S2         -2.174*         .586   -2.522*         .804   -2.517*         .686   -2.072*         .744
S3         .084            .253   .079            .302   -.002           .292   .061            .289
S4         1.000           ---    1.000           ---    1.000           ---    1.000           ---
μL         .375*           .109   .383*           .140   .405*           .132   .392*           .135
μS         .271*           .054   .286*           .072   .276*           .064   .336*           .089
σ²L        2.159*          .284   2.908*          .483   2.742*          .487   2.953*          .623
σ²S        .136*           .049   .144*           .040   .143*           .058   .144*           .061
ρLS        -.076           .180   -.137           .165   -.017           .191   -.021           .214
σ²ε1       .856*           .210   1.005*          .243   1.007*          .258   1.077*          .307
σ²ε2       .157            .206   .086            .197   .183            .287   .581            .387
σ²ε3       .873*           .192   1.061*          .281   1.057*          .270   1.095*          .304
σ²ε4       .071            .099   .181            .190   .170            .189   .391            .224
Index      DIC=3,329.41;          DIC=3,370.06;          DIC=3,347.52;          DIC=3,338.53;
           Bayesian p=.552        Bayesian p=.488        Bayesian p=.513        Bayesian p=.494
Note. a. Multiplying by a factor of 1.701; b. *p<.05 (1.96).
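Note a presumably refers to the usual logistic scaling constant D ≈ 1.7: multiplying the logit by 1.701 makes the logistic curve nearly indistinguishable from the normal ogive, which would make the logit-link column comparable with the probit columns. The sketch below checks the size of that gap numerically; the grid and its range are arbitrary choices.

```python
import math

def normal_ogive(x):
    """Standard normal CDF (probit link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def scaled_logistic(x, d=1.701):
    """Logistic CDF with the linear predictor multiplied by the scaling factor d."""
    return 1.0 / (1.0 + math.exp(-d * x))

def max_gap(d=1.701, lo=-6.0, hi=6.0, steps=12001):
    """Largest |probit - scaled logistic| difference over an evenly spaced grid."""
    xs = (lo + (hi - lo) * k / (steps - 1) for k in range(steps))
    return max(abs(normal_ogive(x) - scaled_logistic(x, d)) for x in xs)

if __name__ == "__main__":
    print(max_gap())       # stays below .01 with d = 1.701
    print(max_gap(d=1.0))  # roughly ten times larger without rescaling
```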

Table 4.2.8
Sensitivity Analysis: Parameter Estimates of the 2PNO-LGC Model

Prior distributions for item parameters: a ~ dnorm(0, 1.0E-02)I(0,) and β ~ dnorm(0,1)
Probit link
Uniform priors for varying residuals (~dunif(0, 1.0E04))

           One single long chain                 Three independent chains
           (50,000 iterations, 19,998 burn-in)   (30,000 iterations, 20,000 burn-in)
Parameter  Estimate (EAP)  SD     MCSE^b         Estimate (EAP)  SD     MCSE
β1         .000            ---    ---            .000            ---    ---
β2         .182*           .067   0.003          .185*           .069   0.002
β3         .205*           .068   0.003          .210*           .069   0.002
β4         .679*           .084   0.004          .699*           .090   0.004
a1         1.000           ---    ---            1.000           ---    ---
a2         1.427*          .183   0.008          1.384*          .197   0.008
a3         1.307*          .167   0.008          1.256*          .161   0.006
a4         1.035*          .120   0.006          .995*           .121   0.005
S1         .000            ---    ---            .000            ---    ---
S2         -1.940*         .617   0.037          -2.072*         .744   0.038
S3         .104            .274   0.008          .061            .289   0.008
S4         1.000           ---    ---            1.000           ---    ---
μL         .370*           .128   0.004          .392*           .135   0.004
μS         .333*           .078   0.004          .336*           .089   0.004
σ²L        2.73*           .506   0.029          2.953*          .623   0.030
σ²S        .144*           .057   0.003          .144*           .061   0.003
ρLS        -.019           .204   0.010          -.021           .214   0.010
σ²ε1       .996*           .265   0.012          1.077*          .307   0.013
σ²ε2       .546            .348   0.020          .581            .387   0.019
σ²ε3       1.016*          .275   0.013          1.095*          .304   0.012
σ²ε4       .364            .203   0.011          .391            .224   0.010
Index      DIC=3,340.25; Bayesian p-value=.504   DIC=3,338.53; Bayesian p-value=.494
(Restricted data)

Note. a. *p<.05 (1.96); b. MCSE, a type of sampling error, stands for Monte Carlo standard error, which can always be reduced by lengthening the chain (Kim and Bolt, 2007).
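MCSE measures how much a posterior-mean estimate would vary across repeated runs of the sampler, so it shrinks roughly with the square root of the number of retained draws. The note does not say how MCSE was computed; the sketch below uses non-overlapping batch means, one standard estimator, on a simulated autocorrelated chain that stands in for MCMC output (a hypothetical example, not the study's draws).

```python
import random
from statistics import stdev

def mcse_batch_means(draws, n_batches=20):
    """Monte Carlo standard error of the mean via non-overlapping batch means."""
    size = len(draws) // n_batches
    batch_means = [sum(draws[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    return stdev(batch_means) / n_batches ** 0.5

def ar1_chain(n, rho=0.9, seed=0):
    """Hypothetical AR(1) chain with stationary variance 1, mimicking autocorrelated MCMC draws."""
    rng = random.Random(seed)
    x, out = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, (1.0 - rho ** 2) ** 0.5)
        out.append(x)
    return out

if __name__ == "__main__":
    # Sixteen times as many draws should cut the MCSE roughly by a factor of four
    print(mcse_batch_means(ar1_chain(10_000)), mcse_batch_means(ar1_chain(160_000)))
```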

Table 4.2.9
Bayesian Estimates of the Model Parameters under (1) the HLM and (2) the LGC Model for a Simulated Data Set

Parameter  True value  HLM            LGC
β1         .000        ---            ---
β2         .183        .151* (.050)   .152* (.052)
β3         .210        .252* (.055)   .254* (.055)
β4         .728        .663* (.063)   .663* (.064)
a1         1.000       ---            ---
a2         1.298       1.316* (.121)  1.319* (.120)
a3         1.181       1.042* (.085)  1.046* (.086)
a4         .934        1.043* (.086)  1.045* (.086)
S1         .000        ---            ---
S2         -1.741      ---            -1.409* (.371)
S3         .064        ---            -.050 (.173)
S4         1.000       ---            ---
μL         .394        .328* (.094)   .334* (.105)
μS         .399        .399* (.042)   .470* (.084)
σ²L        3.192       3.111* (.419)  3.088* (.418)
σ²S        .132        .143* (.038)   .178* (.065)
ρLS        .049        .102 (.106)    .208 (.156)
σ²ε        1.000       .701* (.106)   .710* (.106)
DIC                    5,841.250      5,847.230
Note. a. *p<.05 (1.96); b. Standard deviations are given in parentheses.
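Throughout the tables, an asterisk marks estimates for which |EAP/SD| exceeds 1.96, the two-sided .05 normal critical value given in the notes. Reading the criterion that way, a small formatter reproduces the marks for a few LGC entries of Table 4.2.9; the function name and output format are illustrative.

```python
def flag(estimate, sd, crit=1.96):
    """Format an EAP estimate, appending '*' when |EAP / SD| exceeds the critical value."""
    text = f"{estimate:.3f}"
    if text.startswith("0."):      # drop the leading zero, as in the tables
        text = text[1:]
    elif text.startswith("-0."):
        text = "-" + text[2:]
    return text + "*" if abs(estimate / sd) > crit else text

if __name__ == "__main__":
    print(flag(0.470, 0.084))   # .470*
    print(flag(0.208, 0.156))   # .208
    print(flag(-1.409, 0.371))  # -1.409*
    print(flag(-0.050, 0.173))  # -.050
```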

Table 4.2.10
Unconditional Models: Parameter Estimates of the 2PNO-LGC Model (Both Data Sets)

Three independent chains (30,000 iterations, 20,000 burn-in)

           Complete cases (n=284)           Available cases (n=323)
Parameter  Estimate (EAP)  SD     MCSE^b    Estimate (EAP)  SD     MCSE
β1         .000            ---    ---       .000            ---    ---
β2         .185*           .069   0.002     .189*           .066   0.002
β3         .210*           .069   0.002     .205*           .067   0.002
β4         .699*           .090   0.004     .724*           .082   0.003
a1         1.000           ---    ---       1.000           ---    ---
a2         1.384*          .197   0.008     1.382*          .171   0.007
a3         1.256*          .161   0.006     1.291*          .156   0.006
a4         .995*           .121   0.005     1.005*          .111   0.005
S1         .000            ---    ---       .000            ---    ---
S2         -2.072*         .744   0.038     -1.89*          .560   0.027
S3         .061            .289   0.008     .110            .261   0.007
S4         1.000           ---    ---       1.000           ---    ---
μL         .392*           .135   0.004     .302*           .122   0.003
μS         .336*           .089   0.004     .353*           .076   0.003
σ²L        2.953*          .623   0.030     2.957*          .505   0.023
σ²S        .144*           .061   0.003     .148*           .059   0.003
ρLS        -.021           .214   0.010     .029            .202   0.009
σ²ε1       1.077*          .307   0.013     1.019*          .269   0.011
σ²ε2       .581            .387   0.019     .536            .330   0.016
σ²ε3       1.095*          .304   0.012     1.023*          .271   0.010
σ²ε4       .391            .224   0.010     .324            .178   0.008
Indices    DIC=3,338.53; Bayesian p-value=.494    DIC=3,641.82; Bayesian p-value=.500
Note. a. *p<.05 (1.96); b. MCSE, a type of sampling error, stands for Monte Carlo standard error, which can always be reduced by lengthening the chain (Kim and Bolt, 2007).

Table 4.2.11
Conditional Models: Parameter Estimates of the 2PNO-LGC Model

                 Restricted data (n=284)                         Full data (n=323)
Parameter        Model 1         Model 2         Model 3         Model 1         Model 2         Model 3
Measurement model
β1               .000            .000            .000            .000            .000            .000
β2               .182* (.071)    .180* (.068)    .173* (.068)    .196* (.069)    .191* (.067)    .193* (.066)
β3               .209* (.074)    .205* (.068)    .197* (.070)    .214* (.070)    .207* (.068)    .209* (.068)
β4               .734* (.092)    .688* (.086)    .675* (.089)    .779* (.092)    .730* (.089)    .737* (.089)
a1               1.000           1.000           1.000           1.000           1.000           1.000
a2               1.309* (.174)   1.403* (.181)   1.417* (.188)   1.282* (.154)   1.363* (.168)   1.354* (.159)
a3               1.173* (.146)   1.271* (.149)   1.285* (.157)   1.2* (.144)     1.298* (.158)   1.278* (.158)
a4               .918* (.109)    1.005* (.113)   1.017* (.120)   .916* (.098)    1.0* (.119)     .985* (.107)
Structural model
S1               .000            .000            .000            .000            .000            .000
S2               -1.008* (.406)  -1.555* (.594)  -1.915* (.618)  -1.077* (.387)  -1.495* (.491)  -1.827* (.573)
S3               .173 (.202)     .102 (.256)     .086 (.278)     .182 (.197)     .155 (.233)     .114 (.256)
S4               1.000           1.000           1.000           1.000           1.000           1.000
β1.int           -.366 (.243)    -.197 (.188)    -.180 (.180)    -.378 (.232)    -.252 (.181)    -.231 (.182)
β1.gender        .219 (.384)     ---             ---             .147 (.382)     ---             ---
β1.age           .606 (.370)     .550* (.273)    .555* (.264)    .520 (.355)     .475 (.259)     .481 (.263)
β1.rel           2.468* (.872)   1.62* (.382)    1.613* (.374)   1.882* (.606)   1.469* (.367)   1.507* (.367)
β1.gen.age       -.1 (.609)      ---             ---             -.036 (.583)    ---             ---
β1.gen.rel       -1.12 (.827)    ---             ---             -.488 (.773)    ---             ---
β1.age.rel       -2.122* (.797)  -1.252* (.481)  -1.253* (.473)  -1.38 (.727)    -.990* (.453)   -1.026* (.463)
β1.gen.age.rel   1.169 (1.063)   ---             ---             .485 (.981)     ---             ---

(continued on next page)

Table 4.2.11 (cont’d)

                 Restricted data (n=284)                         Full data (n=323)
Parameter        Model 1         Model 2         Model 3         Model 1         Model 2         Model 3
β2.int           .388* (.143)    .314* (.091)    .344* (.083)    .387* (.133)    .336* (.081)    .369* (.083)
β2.gender        .514 (.267)     .181 (.126)     ---             .394 (.233)     .159 (.113)     ---
β2.age           .073 (.219)     ---             ---             .0818 (.194)    ---             ---
β2.rel           -.047 (.446)    ---             ---             .222 (.391)     ---             ---
β2.gen.age       -.346 (.377)    ---             ---             -.230 (.315)    ---             ---
β2.gen.rel       -.171 (.544)    ---             ---             -.322 (.469)    ---             ---
β2.age.rel       -.174 (.508)    ---             ---             -.443 (.470)    ---             ---
β2.gen.age.rel   .194 (.686)     ---             ---             .383 (.585)     ---             ---
σ²L              3.012* (.603)   2.599* (.474)   2.579* (.496)   3.16* (.577)    2.766* (.552)   2.821* (.500)
σ²S              .245* (.122)    .169* (.079)    .144* (.060)    .231* (.109)    .176* (.080)    .147* (.059)
ρLS              .173 (.270)     .107 (.232)     .037 (.213)     .209 (.236)     .126 (.228)     .067 (.210)
σ²ε1             1.167* (.327)   .996* (.276)    .977* (.266)    1.159* (.306)   .999* (.274)    1.014* (.258)
σ²ε2             1.198* (.442)   .749 (.404)     .614 (.370)     1.002* (.419)   .645 (.356)     .607 (.337)
σ²ε3             1.158* (.318)   1.047* (.293)   1.024* (.290)   1.155* (.304)   1.035* (.287)   1.056* (.276)
σ²ε4             .419 (.258)     .407 (.225)     .4101 (.227)    .409 (.261)     .359 (.203)     .388 (.233)
Goodness-of-fit  DIC=3,337;      DIC=3,340;      DIC=3,342;      DIC=3,638;      DIC=3,639;      DIC=3,639;
indices          Bayesian        Bayesian        Bayesian        Bayesian        Bayesian        Bayesian
                 p=.478          p=.489          p=.488          p=.48           p=.495          p=.494
Note. a. Numbers in parentheses are the standard deviations of the estimates.
b. *p<.05 (1.96).
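The DIC values in these tables are compared across models, with smaller values preferred. DIC adds to the posterior mean deviance D̄ an effective number of parameters, pD = D̄ - D(θ̄). The sketch below computes both from deviance draws and checks the textbook case of a normal mean with known variance, where pD should be close to 1; the data and posterior draws are simulated, not taken from the study.

```python
import math
import random

def normal_deviance(mu, ys, sigma=1.0):
    """-2 log-likelihood of ys under N(mu, sigma^2)."""
    return sum(((y - mu) / sigma) ** 2 + math.log(2 * math.pi * sigma ** 2) for y in ys)

def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD) with DIC = Dbar + pD and pD = Dbar - D(posterior mean)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d

if __name__ == "__main__":
    rng = random.Random(0)
    ys = [rng.gauss(0.5, 1.0) for _ in range(100)]
    ybar = sum(ys) / len(ys)
    # With a flat prior and sigma = 1 known, the posterior of mu is N(ybar, 1/n)
    draws = [rng.gauss(ybar, 1.0 / math.sqrt(len(ys))) for _ in range(4000)]
    dic_value, p_d = dic([normal_deviance(m, ys) for m in draws], normal_deviance(ybar, ys))
    print(round(dic_value, 2), round(p_d, 2))  # pD near 1: one free parameter
```

In a full growth model the same two quantities would come from the sampler's deviance trace; the differences of one to five DIC points among Models 1 to 3 are small relative to that scale.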

Table 4.3.1a
Summary Statistics for Longitudinal NYS Data: Social Isolation

A. Summary statistics for NYS IRT scale scores over five assessment occasions
           NYS-1976  NYS-1977  NYS-1978  NYS-1979  NYS-1980
Mean       1.555     1.238     1.063     1.154     1.212
SD         1.506     1.550     1.611     1.568     1.504
Skewness   -.091     -.264     -.456     -.418     -.623
Kurtosis   .356      .048      .209      .162      .377

B. Correlation matrix for NYS IRT scale scores for five assessment occasions
           NYS-1976  NYS-1977  NYS-1978  NYS-1979  NYS-1980
NYS-1976   1
NYS-1977   .660*     1
NYS-1978   .608*     .740*     1
NYS-1979   .527*     .682*     .782*     1
NYS-1980   .533*     .692*     .730*     .780*     1

Note. a. Based on the sample of 838 participants; b. * p<.05 (1.96).
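Panel A reports the skewness and kurtosis of the IRT scale scores. The table does not state which estimators were used; the sketch below uses simple moment ratios and reports excess kurtosis (0 for a normal distribution). Both choices are assumptions, and statistical packages often apply small-sample corrections that change the values slightly.

```python
def skew_kurt(xs):
    """Moment-based skewness and excess kurtosis (population-moment formulas)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

if __name__ == "__main__":
    # Hypothetical scores: symmetric, so skewness is 0; flat, so excess kurtosis is negative
    print(skew_kurt([-2, -1, 0, 1, 2]))
```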

Table 4.3.1b
Summary Statistics for Longitudinal NYS Data: Deviant Peers Affiliation

A. Summary statistics for NYS IRT scale scores over five assessment occasions
           NYS-1976  NYS-1977  NYS-1978  NYS-1979  NYS-1980
Mean       -.862     -1.007    -1.079    -1.412    -1.377
SD         1.811     1.853     1.986     2.390     2.386
Skewness   .178      .125      .068      .104      .083
Kurtosis   -.084     -.289     -.195     -.417     -.500

B. Correlation matrix for NYS IRT scale scores for five assessment occasions
           NYS-1976  NYS-1977  NYS-1978  NYS-1979  NYS-1980
NYS-1976   1
NYS-1977   .793*     1
NYS-1978   .763*     .818*     1
NYS-1979   .633*     .721*     .838*     1
NYS-1980   .641*     .724*     .834*     .906*     1
Note. a. Based on the sample of 838 participants; b. *p<.05 (1.96).

Table 4.3.2
Response Frequencies to 13 Outcome Measures

NYS-1976: Social Isolation
(Please tell me how much you agree or disagree with these statements about you...)
                                               Strongly disagree  Disagree  Neither  Agree  Strongly agree
1. Don’t fit in with friends                   175                528       56       58     21
2. Teachers don’t call on me                   145                501       92       81     19
3. Outsiders with family                       315                447       33       33     10
4. Nobody at school cares                      210                493       64       62     9
5. Don’t belong at school                      205                526       53       39     15
6. No project work from teachers               126                520       90       86     16

NYS-1976: Exposure to Delinquent Peers
(Think of the people you listed as your close friends. During the last year how many of them have...)
                                               None  Very few  Some of them  Most of them  All of them
7. Destroyed property                          522   229       68            15            4
8. Stole something worth $5 dollars or less    460   237       89            40            12
9. Hit someone                                 367   288       126           34            23
10. Broke into vehicle                         763   56        17            1             1
11. Sold hard drugs                            804   22        12            0             0
12. Stole something worth $50 dollars or more  777   43        13            1             4
13. Suggested you break the law                615   133       62            11            17

NYS-1977: Social Isolation
(Please tell me how much you agree or disagree with these statements about you...)
                                               Strongly disagree  Disagree  Neither  Agree  Strongly agree
1. Don’t fit in with friends                   214                529       46       42     7
2. Teachers don’t call on me                   181                500       99       52     6
3. Outsiders with family                       351                420       34       26     7
4. Nobody at school cares                      245                484       67       34     8
5. Don’t belong at school                      249                500       49       32     8
6. No project work from teachers               115                541       110      66     6

NYS-1977: Exposure to Delinquent Peers
(Think of the people you listed as your close friends. During the last year how many of them have...)
                                               None  Very few  Some of them  Most of them  All of them
7. Destroyed property                          526   232       65            11            4
8. Stole something worth $5 dollars or less    462   235       88            40            13
9. Hit someone                                 434   267       100           25            12
10. Broke into vehicle                         764   60        9             3             2
11. Sold hard drugs                            797   30        9             0             2
12. Stole something worth $50 dollars or more  791   39        8             0             0
13. Suggested you break the law                610   141       58            20            9

(continued on next page)

Table 4.3.2 (cont’d)

NYS-1978: Social Isolation
(Please tell me how much you agree or disagree with these statements about you...)
                                               Strongly disagree  Disagree  Neither  Agree  Strongly agree
1. Don’t fit in with friends                   275                502       33       25     3
2. Teachers don’t call on me                   197                537       74       28     2
3. Outsiders with family                       358                412       41       18     9
4. Nobody at school cares                      263                471       66       36     2
5. Don’t belong at school                      247                499       54       33     5
6. No project work from teachers               116                513       133      73     3

NYS-1978: Exposure to Delinquent Peers
(Think of the people you listed as your close friends. During the last year how many of them have...)
                                               None  Very few  Some of them  Most of them  All of them
7. Destroyed property                          528   230       61            14            5
8. Stole something worth $5 dollars or less    455   238       109           26            10
9. Hit someone                                 484   233       99            17            5
10. Broke into vehicle                         752   70        11            4             1
11. Sold hard drugs                            779   39        13            5             2
12. Stole something worth $50 dollars or more  779   43        12            2             2
13. Suggested you break the law                605   135       63            20            15

NYS-1979: Social Isolation
(Please tell me how much you agree or disagree with these statements about you...)
                                               Strongly disagree  Disagree  Neither  Agree  Strongly agree
1. Don’t fit in with friends                   259                526       30       21     2
2. Teachers don’t call on me                   166                590       55       25     2
3. Outsiders with family                       353                422       31       22     10
4. Nobody at school cares                      201                520       78       35     4
5. Don’t belong at school                      236                522       46       30     4
6. No project work from teachers               100                471       176      86     5

NYS-1979: Exposure to Delinquent Peers
(Think of the people you listed as your close friends. During the last year how many of them have...)
                                               None  Very few  Some of them  Most of them  All of them
7. Destroyed property                          559   209       62            4             4
8. Stole something worth $5 dollars or less    477   228       104           18            11
9. Hit someone                                 527   221       67            16            7
10. Broke into vehicle                         744   68        19            3             4
11. Sold hard drugs                            761   50        24            3             0
12. Stole something worth $50 dollars or more  764   50        19            2             3
13. Suggested you break the law                599   139       72            13            15

(continued on next page)

Table 4.3.2 (cont’d)

NYS-1980: Social Isolation
(Please tell me how much you agree or disagree with these statements about you...)
                                               Strongly disagree  Disagree  Neither  Agree  Strongly agree
1. Don’t fit in with friends                   243                549       32       13     1
2. Teachers don’t call on me                   147                605       67       17     2
3. Outsiders with family                       323                442       51       16     6
4. Nobody at school cares                      199                541       77       18     3
5. Don’t belong at school                      198                545       57       34     4
6. No project work from teachers               100                477       194      63     4

NYS-1980: Exposure to Delinquent Peers
(Think of the people you listed as your close friends. During the last year how many of them have...)
                                               None  Very few  Some of them  Most of them  All of them
7. Destroyed property                          584   185       55            7             7
8. Stole something worth $5 dollars or less    490   212       103           24            9
9. Hit someone                                 546   213       67            12            0
10. Broke into vehicle                         742   76        18            1             1
11. Sold hard drugs                            735   73        24            2             4
12. Stole something worth $50 dollars or more  747   66        21            4             0
13. Suggested you break the law                591   143       68            23            13

 

Note. Response frequencies are based on the sample of 838 participants.

Table 4.3.3
Different Types of Prior Used in the Present Study

Measurement model
Parameter     Baseline priors
β             N(0, .5)^a
alpha         N(0, 1.0E-02)I(0,)

Structure model
Parameter     Least-informative priors
S2, S3, S4    N(0, 1.0E-2)
μL, μS        N(0, 1.0E-02)
Level-1 residual variances for each dimension:
              1/σ²εd ~ Gamma(1, 1)

Random effect component: Unidimensional GRM-LGC
[σ²L  σLS; σLS  σ²S]^-1 ~ Wishart[(1 0; 0 1), 3]

Random effect component: Multidimensional MGRM-ALGC
[ σ²1L     σ1L,1S   σ1L,2L   σ1L,2S
  σ1L,1S   σ²1S     σ1S,2L   σ1S,2S
  σ1L,2L   σ1S,2L   σ²2L     σ2L,2S
  σ1L,2S   σ1S,2S   σ2L,2S   σ²2S   ]^-1 ~ Wishart[(1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1), 10]
Note. a. Inside the parentheses, the second quantity is the precision of the parameter.

Table 4.3.4
Unconditional Models: Parameter Estimates of the GRM-LGC Model for Each Dimension

Three independent chains (8,000 iterations, 4,000 burn-in)

           Social Isolation (n=838)          Deviant Peer Affiliation (n=838)
Parameter  Estimate (EAP)  SD     MCSE       Estimate (EAP)  SD     MCSE
β[1,1]     .000            ---    ---        .000            ---    ---
β[1,2]     4.358*          .083   .004       2.602*          .074   .003
β[1,3]     5.233*          .102   .005       4.753*          .138   .004
β[1,4]     7.114*          .188   .006       6.009*          .219   .005
β[2,1]     -.915*          .087   .004       -.694*          .057   .003
β[2,2]     3.943*          .115   .006       1.485*          .083   .004
β[2,3]     5.378*          .167   .009       3.297*          .138   .007
β[2,4]     7.865*          .304   .013       4.785*          .209   .010
β[3,1]     .7608*          .057   .003       -.583*          .076   .003
β[3,2]     4.503*          .137   .008       2.624*          .139   .006
β[3,3]     5.485*          .176   .009       5.488*          .253   .011
β[3,4]     6.930*          .255   .012       7.555*          .369   .015
β[4,1]     -.113           .064   .003       1.944*          .090   .005
β[4,2]     3.737*          .103   .006       3.650*          .151   .008
β[4,3]     4.952*          .146   .008       4.987*          .228   .011
β[4,4]     7.113*          .261   .011       5.667*          .298   .012
β[5,1]     .018            .060   .003       2.885*          .144   .008
β[5,2]     3.813*          .107   .006       4.407*          .218   .011
β[5,3]     4.711*          .139   .008       6.331*          .348   .016
β[5,4]     6.333*          .223   .011       7.220*          .448   .018
β[6,1]     -2.051*         .136   .007       2.177*          .097   .006
β[6,2]     3.126*          .099   .005       3.549*          .148   .008
β[6,3]     5.085*          .169   .009       4.932*          .230   .011
β[6,4]     8.855*          .364   .016       5.580*          .301   .012
β[7,1]     ---             ---    ---        .721*           .078   .004
β[7,2]     ---             ---    ---        2.656*          .132   .007
β[7,3]     ---             ---    ---        4.485*          .207   .010
β[7,4]     ---             ---    ---        5.756*          .277   .013
a1         1.000           ---    ---        1.000           ---    ---
a2         .841*           .034   .002       1.125*          .048   .002
a3         .995*           .043   .002       .598*           .025   .001
a4         1.074*          .044   .002       1.629*          .096   .004
a5         1.270*          .055   .003       .980*           .058   .003
a6         .681*           .029   .001       1.899*          .129   .006
a7         ---             ---    ---        .793*           .035   .002

(continued on next page)

Table 4.3.4 (cont’d)

Three independent chains (8,000 iterations, 4,000 burn-in)

           Social Isolation (n=838)          Deviant Peer Affiliation (n=838)
Parameter  Estimate (EAP)  SD     MCSE       Estimate (EAP)  SD     MCSE
S1         .000            ---    ---        .000            ---    ---
S2         .857*           .238   .016       .203*           .060   .003
S3         1.295*          .319   .022       .503*           .063   .003
S4         1.230*          .179   .011       .977*           .077   .004
S5         1.000           ---    ---        1.000           ---    ---
μL         1.542*          .074   .003       -.874*          .083   .003
μS         -.342*          .069   .003       -.519*          .095   .004
σ²L        1.538*          .320   .021       2.788*          .281   .014
σ²S        .619            .370   .026       2.504*          .397   .021
ρLS        -.109           .252   .017       .002            .078   .003
Note. a. *p<.05 (1.96); b. The Monte Carlo standard error (MCSE), a kind of sampling error, can always be reduced by lengthening the chain (Kim and Bolt, 2007).
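Table 4.3.4 reports four thresholds β[i,1] to β[i,4] and one discrimination a_i per item, matching the five ordered response categories of Table 4.3.2. Under a graded response model, category probabilities are differences of adjacent cumulative curves. The sketch assumes the form P(Y > k | θ) = F(a·θ - β_k) with a logistic F (a probit F is included as an option); the study's exact link and parameterization are not restated in these tables, and θ = 1.5 in the usage line is a hypothetical value.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def grm_probs(theta, a, thresholds, cdf=logistic):
    """Category probabilities for one graded-response item.

    Assumed parameterization: P(Y > k | theta) = cdf(a * theta - beta_k) for
    k = 1..K with increasing thresholds, giving K + 1 ordered categories.
    """
    upper = [1.0] + [cdf(a * theta - b) for b in thresholds] + [0.0]
    return [upper[k] - upper[k + 1] for k in range(len(thresholds) + 1)]

if __name__ == "__main__":
    # Social Isolation item 1, EAP values from Table 4.3.4; the link is an assumption
    probs = grm_probs(theta=1.5, a=1.0, thresholds=[0.0, 4.358, 5.233, 7.114])
    print([round(p, 3) for p in probs])
```

Because the thresholds increase within each item, every category probability is nonnegative and the five probabilities sum to 1.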

Table 4.3.5a
Correlations among Adolescents’ Social Isolation and Extent of Exposure to Delinquent Peers

                                                 Social isolation     Exposure extent to delinquent peers
                                                 Level     Shape      Level     Shape
Social isolation                         Level   1
                                         Shape   -.387*    1
Exposure extent to delinquent peers      Level   .292*     .109       1
                                         Shape   -.203*    .523*      -.006     1
Note. a. *p<.05 (1.96).
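The entries above are correlations among the four growth factors (level and shape for each dimension). One way to obtain them is to standardize the 4 × 4 factor covariance matrix; in a Bayesian run this would typically be done for each posterior draw before summarizing, which is an assumption about the workflow rather than something the table states. A minimal sketch, using the Social Isolation values of Table 4.3.5b as a round-trip check:

```python
import math

def cov_to_corr(cov):
    """Standardize a covariance matrix (list of lists) into a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))] for i in range(len(cov))]

if __name__ == "__main__":
    var_l, var_s, rho = 2.470, 1.554, -0.387   # EAP values, Table 4.3.5b
    cov_ls = rho * math.sqrt(var_l * var_s)    # back out the level-shape covariance
    print(cov_to_corr([[var_l, cov_ls], [cov_ls, var_s]]))
```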

Table 4.3.5b
Unconditional Models: Parameter Estimates of the MGRM-ALGC Model for Both Dimensions

Three independent chains (8,000 iterations, 4,000 burn-in)

           Social Isolation (n=838)          Deviant Peer Affiliation (n=838)
Parameter  Estimate (EAP)  SD     MCSE^b     Estimate (EAP)  SD     MCSE
β[1,1]     .000            ---    ---        .000            ---    ---
β[1,2]     4.603*          .085   .004       2.624*          .078   .003
β[1,3]     5.511*          .103   .005       4.787*          .139   .005
β[1,4]     7.467*          .192   .006       6.013*          .212   .006
β[2,1]     -.835*          .057   .002       -.809*          .070   .003
β[2,2]     3.162*          .088   .003       1.609*          .074   .003
β[2,3]     4.362*          .108   .004       3.636*          .110   .004
β[2,4]     6.479*          .197   .004       5.291*          .170   .004
β[3,1]     .640*           .064   .003       -.366*          .047   .002
β[3,2]     4.289*          .113   .005       1.550*          .054   .002
β[3,3]     5.256*          .134   .005       3.266*          .091   .002
β[3,4]     6.682*          .189   .006       4.509*          .147   .003
β[4,1]     -.212*          .061   .003       2.813*          .120   .006
β[4,2]     3.828*          .102   .004       5.298*          .194   .007
β[4,3]     5.113*          .126   .005       7.109*          .279   .009
β[4,4]     7.402*          .224   .006       7.979*          .356   .010
β[5,1]     -.092           .071   .003       2.651*          .090   .003
β[5,2]     4.560*          .132   .006       4.068*          .131   .004
β[5,3]     5.667*          .153   .006       5.788*          .226   .005
β[5,4]     7.660*          .226   .007       6.563*          .320   .007
β[6,1]     -1.443*         .057   .002       3.436*          .145   .007
β[6,2]     2.017*          .065   .002       5.610*          .219   .009
β[6,3]     3.345*          .082   .003       7.586*          .316   .011
β[6,4]     5.977*          .177   .003       8.388*          .394   .012
β[7,1]     ---             ---    ---        .539*           .053   .003
β[7,2]     ---             ---    ---        2.069*          .068   .003
β[7,3]     ---             ---    ---        3.516*          .099   .003
β[7,4]     ---             ---    ---        4.518*          .140   .003
a1         1.000           ---    ---        1.000           ---    ---
a2         .709*           .032   .001       1.063*          .045   .002
a3         .845*           .037   .002       .571*           .025   .001
a4         .917*           .037   .002       1.355*          .072   .003
a5         1.070*          .047   .002       .840*           .050   .002
a6         .571*           .027   .001       1.453*          .079   .004
a7         ---             ---    ---        .752*           .033   .001

(continued on next page)

Table 4.3.5b (cont’d)

Three independent chains (8,000 iterations, 4,000 burn-in)

           Social Isolation (n=838)          Deviant Peer Affiliation (n=838)
Parameter  Estimate (EAP)  SD     MCSE       Estimate (EAP)  SD     MCSE
S1         .000            ---    ---        .000            ---    ---
S2         .580*           .074   .003       .217*           .056   .003
S3         .925*           .088   .004       .504*           .060   .003
S4         1.070*          .077   .003       .990*           .076   .004
S5         1.000           ---    ---        1.000           ---    ---
μL         1.629*          .087   .004       -.932*          .086   .003
μS         -.416*          .074   .002       -.548*          .099   .004
σ²L        2.470*          .289   .015       3.047*          .297   .015
σ²S        1.554*          .328   .019       2.664*          .405   .021
ρLS        -.387*          .070   .003       -.006           .077   .003
Goodness-of-fit index: DIC=76,453.4
Note. a. *p<.05 (1.96); b. The Monte Carlo standard error (MCSE), a kind of sampling error, can always be reduced by lengthening the chain (Kim and Bolt, 2007).

Table 4.3.6
Unconditional Models: Parameter Estimates of the MGRM-ALGC Model with Different Scaling Options (Both Dimensions)

Three independent chains (8,000 iterations, 4,000 burn-in)

           Social Isolation (n=838), EAP estimate (SD)          Deviant Peer Affiliation (n=838), EAP estimate (SD)
Parameter  Original scaling  Scaling option 1  Scaling option 2  Original scaling  Scaling option 1  Scaling option 2
β[1,1]     .000 (fixed)      -1.764* (.083)    .000 (fixed)      .000 (fixed)      .451* (.078)      .000 (fixed)
β[1,2]     4.603* (.085)     2.688* (.086)     4.398* (.100)     2.624* (.078)     3.137* (.103)     2.563* (.082)
β[1,3]     5.511* (.103)     3.575* (.104)     5.280* (.119)     4.787* (.139)     5.310* (.159)     4.692* (.147)
β[1,4]     7.467* (.192)     5.499* (.191)     7.187* (.201)     6.013* (.212)     6.532* (.227)     5.909* (.219)
β[2,1]     -.835* (.057)     -2.128* (.071)    -.780* (.058)     -.809* (.070)     -.221* (.082)     -.832* (.071)
β[2,2]     3.162* (.088)     1.892* (.066)     3.223* (.089)     1.609* (.074)     2.185* (.095)     1.585* (.072)
β[2,3]     4.362* (.108)     3.101* (.086)     4.425* (.109)     3.636* (.110)     4.210* (.129)     3.610* (.110)
β[2,4]     6.479* (.197)     5.237* (.183)     6.540* (.195)     5.291* (.170)     5.862* (.182)     5.260* (.171)
β[3,1]     .640* (.064)      -.895* (.066)     .701* (.070)      -.366* (.047)     -.048 (.054)      -.372* (.047)
β[3,2]     4.289* (.113)     2.790* (.086)     4.353* (.116)     1.550* (.054)     1.871* (.064)     1.539* (.054)
β[3,3]     5.256* (.134)     3.770* (.108)     5.322* (.137)     3.266* (.091)     3.590* (.099)     3.254* (.090)
β[3,4]     6.682* (.189)     5.216* (.167)     6.752* (.191)     4.509* (.147)     4.834* (.154)     4.499* (.148)
β[4,1]     -.212* (.061)     -1.868* (.082)    -.155* (.065)     2.813* (.120)     3.457* (.149)     2.780* (.121)
β[4,2]     3.828* (.102)     2.178* (.080)     3.860* (.107)     5.298* (.194)     5.884* (.213)     5.266* (.196)
β[4,3]     5.113* (.126)     3.471* (.101)     5.141* (.130)     7.109* (.279)     7.652* (.292)     7.078* (.283)
β[4,4]     7.402* (.224)     5.789* (.209)     7.425* (.226)     7.979* (.356)     8.494* (.362)     7.940* (.353)
β[5,1]     -.092 (.071)      -2.051* (.090)    -.018 (.071)      2.651* (.090)     3.075* (.112)     2.641* (.091)
β[5,2]     4.560* (.132)     2.662* (.098)     4.631* (.128)     4.068* (.131)     4.481* (.149)     4.061* (.132)
β[5,3]     5.667* (.153)     3.786* (.118)     5.739* (.149)     5.788* (.226)     6.183* (.239)     5.792* (.233)
β[5,4]     7.660* (.226)     5.819* (.200)     7.734* (.221)     6.563* (.320)     6.929* (.325)     6.552* (.321)
β[6,1]     -1.443* (.057)    -2.479* (.070)    -1.400* (.057)    3.436* (.145)     4.100* (.191)     3.422* (.150)
β[6,2]     2.017* (.065)     .988* (.051)      2.063* (.066)     5.610* (.219)     6.217* (.255)     5.610* (.224)

(continued on next page)

Table 4.3.6 (cont’d)

           Social Isolation (n=838), EAP estimate (SD)          Deviant Peer Affiliation (n=838), EAP estimate (SD)
Parameter  Original scaling  Scaling option 1  Scaling option 2  Original scaling  Scaling option 1  Scaling option 2
β[6,3]     3.345* (.082)     2.320* (.066)     3.392* (.082)     7.586* (.316)     8.131* (.336)     7.593* (.324)
β[6,4]     5.977* (.177)     4.971* (.170)     6.028* (.182)     8.388* (.394)     8.901* (.405)     8.394* (.402)
β[7,1]     ---               ---               ---               .539* (.053)      .952* (.067)      .522* (.052)
β[7,2]     ---               ---               ---               2.069* (.068)     2.477* (.083)     2.051* (.066)
β[7,3]     ---               ---               ---               3.516* (.099)     3.921* (.112)     3.498* (.098)
β[7,4]     ---               ---               ---               4.518* (.140)     4.921* (.150)     4.501* (.139)
a1         1.000 (fixed)     1.000 (fixed)     1.000 (fixed)     1.000 (fixed)     1.000 (fixed)     1.000 (fixed)
a2         .709* (.032)      .803* (.044)      .787* (.043)      1.063* (.045)     1.061* (.047)     1.106* (.053)
a3         .845* (.037)      .961* (.052)      .933* (.050)      .571* (.025)      .577* (.026)      .590* (.029)
a4         .917* (.037)      1.024* (.051)     .999* (.049)      1.355* (.072)     1.306* (.071)     1.408* (.076)
a5         1.070* (.047)     1.219* (.065)     1.179* (.059)     .840* (.050)      .820* (.050)      .881* (.056)
a6         .571* (.027)      .643* (.038)      .634* (.036)      1.453* (.079)     1.394* (.081)     1.524* (.096)
a7         ---               ---               ---               .752* (.033)      .750* (.037)      .785* (.039)
S1         .000 (fixed)      .000 (fixed)      .000 (fixed)      .000 (fixed)      .000 (fixed)      .000 (fixed)
S2         .580* (.074)      .592* (.067)      .556* (.069)      .217* (.056)      .266* (.052)      .218* (.056)
S3         .925* (.088)      .915* (.081)      .881* (.078)      .504* (.060)      .524* (.058)      .511* (.061)
S4         1.070* (.077)     1.057* (.072)     1.044* (.071)     .990* (.076)      1.002* (.071)     1.007* (.073)
S5         1.000 (fixed)     1.000 (fixed)     1.000 (fixed)     1.000 (fixed)     1.000 (fixed)     1.000 (fixed)
μL         1.629* (.087)     .000 (fixed)      1.523* (.085)     -.932* (.086)     .000 (fixed)      -.916* (.084)
μS         -.416* (.074)     -.424* (.069)     -.363* (.071)     -.548* (.099)     -.732* (.097)     -.522* (.092)
σ²L        2.470* (.289)     2.152* (.251)     2.198* (.247)     3.047* (.297)     3.015* (.300)     2.869* (.294)
σ²S        1.554* (.328)     1.454* (.275)     1.565* (.271)     2.664* (.405)     2.875* (.435)     2.526* (.400)
ρLS        -.387* (.070)     -.429* (.192)     -.433* (.189)     -.006 (.077)      -.017 (.223)      -.027 (.218)
σ²ε        1.000 (fixed)     .715* (.076)      .711* (.073)      1.000 (fixed)     .932* (.084)      .867* (.089)
Note. a. *p<.05 (1.96).
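The scaling options differ in which quantity fixes the origin of the latent scale: the original scaling fixes β[1,1] = 0 and estimates μL, while option 1 fixes μL = 0 and frees β[1,1]. Under an assumed a·θ - β parameterization, moving the origin by c and subtracting a·c from every threshold leaves all response probabilities unchanged, as the sketch below checks. The reported option-1 values come from a separate fit with its own constraints, so they need not equal an exact shift of the original estimates.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_probs(theta, a, thresholds):
    """P(Y > k | theta) = logistic(a * theta - beta_k) under an assumed GRM form."""
    return [logistic(a * theta - b) for b in thresholds]

def move_origin(theta, a, thresholds, c):
    """Re-express the latent scale with its origin shifted by c.

    a * (theta - c) - (beta_k - a * c) equals a * theta - beta_k, so the
    fitted probabilities are unchanged; only the reported parameters move.
    """
    return theta - c, a, [b - a * c for b in thresholds]

if __name__ == "__main__":
    theta, a, b = 1.629, 1.0, [0.0, 4.603, 5.511, 7.467]  # Social Isolation item 1, original scaling
    t2, a2, b2 = move_origin(theta, a, b, c=1.629)        # put the level mean at 0
    print(b2[0], cumulative_probs(theta, a, b) == cumulative_probs(t2, a2, b2))
```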

Table 4.3.7
Results from the ALGC Model Using Two Analytical Approaches with a Simulated Data Set

Three independent chains (8,000 iterations, 4,000 burn-in)

           Social Isolation (n=838)                        Deviant Peer Affiliation (n=838)
           True    2-stage IRT       1-stage IRT           True    2-stage IRT       1-stage IRT
Parameter  value   estimate (SD)     estimate (SD)         value   estimate (SD)     estimate (SD)
S1         .000    .000              .000                  .000    .000              .000
S2         .580    .707* (.050)      .668* (.078)          .217    .176* (.040)      .214* (.059)
S3         .925    .956* (.055)      .972* (.086)          .504    .518* (.038)      .490* (.064)
S4         1.070   1.114* (.060)     1.133* (.091)         .990    1.040* (.045)     1.037* (.097)
S5         1.000   1.000             1.000                 1.000   1.000             1.000
μL         1.629   1.486* (.064)     1.479* (.087)         -.932   -1.072* (.054)    -1.052* (.078)
μS         -.416   -.366* (.053)     -.384* (.070)         -.548   -.458* (.056)     -.476* (.093)
σ²L        2.470   2.564* (.172)     2.757* (.278)         3.047   1.859* (.121)     2.286* (.230)
σ²S        1.554   .987* (.135)      1.429* (.273)         2.664   1.331* (.148)     2.310* (.448)
ρLS        -.387   -.376* (.050)     -.420* (.060)         -.006   .420* (.062)      .134 (.089)
Note. a. *p<.05 (1.96).

Table 4.3.8a
Correlations among Adolescents’ Social Isolation and Extent of Exposure to Delinquent Peers

                                                 Social isolation     Exposure extent to delinquent peers
                                                 Level     Shape      Level     Shape
Social isolation                         Level   1
                                         Shape   -.392*    1
Exposure extent to delinquent peers      Level   .289*     .105       1
                                         Shape   -.205*    .522*      -.009     1
Note. a. *p<.05 (1.96).

Table 4.3.8b
Estimates of Fixed and Random Effect Parameters in the MGRM-ALGC Model

Three independent chains (8,000 iterations, 4,000 burn-in)

           Social Isolation (n=838)          Deviant Peer Affiliation (n=838)
Parameter  Estimate (EAP)  SD     MCSE^b     Estimate (EAP)  SD     MCSE
β[1,1]     .000            ---    ---        .000            ---    ---
β[1,2]     4.603*          .087   .004       2.612*          .074   .003
β[1,3]     5.511*          .106   .005       4.769*          .136   .004
β[1,4]     7.472*          .194   .006       5.994*          .210   .005
β[2,1]     -.836*          .056   .002       -.819*          .068   .003
β[2,2]     3.162*          .087   .003       1.599*          .072   .003
β[2,3]     4.362*          .107   .004       3.625*          .110   .004
β[2,4]     6.481*          .195   .004       5.276*          .169   .005
β[3,1]     .639*           .067   .003       -.371*          .048   .002
β[3,2]     4.289*          .114   .005       1.545*          .053   .002
β[3,3]     5.256*          .136   .005       3.263*          .091   .002
β[3,4]     6.684*          .191   .005       4.510*          .151   .003
β[4,1]     -.211*          .065   .003       2.806*          .116   .005
β[4,2]     3.829*          .108   .004       5.300*          .190   .007
β[4,3]     5.114*          .130   .005       7.116*          .273   .008
β[4,4]     7.402*          .224   .006       7.983*          .352   .009
β[5,1]     -.095           .071   .004       2.642*          .090   .004
β[5,2]     4.560*          .130   .006       4.061*          .129   .004
β[5,3]     5.667*          .150   .006       5.784*          .113   .009
β[5,4]     7.662*          .222   .007       6.558*          .318   .007
β[6,1]     -1.444*         .057   .002       3.453*          .151   .007
β[6,2]     2.013*          .066   .003       5.642*          .230   .009
β[6,3]     3.339*          .082   .003       7.627*          .331   .012
β[6,4]     5.970*          .179   .004       8.435*          .411   .013
β[7,1]     ---             ---    ---        .532*           .053   .002
β[7,2]     ---             ---    ---        2.062*          .066   .002
β[7,3]     ---             ---    ---        3.510*          .098   .003
β[7,4]     ---             ---    ---        4.512*          .138   .003
a1         1.000           ---    ---        1.000           ---    ---
a2         .710*           .031   .001       1.066*          .043   .002
a3         .845*           .036   .002       .572*           .026   .001
a4         .917*           .038   .002       1.365*          .071   .003
a5         1.070*          .044   .002       .845*           .050   .002
a6         .570*           .027   .001       1.476*          .084   .004
a7         ---             ---    ---        .756*           .033   .001

(continued on next page)

Table 4.3.8b (cont’d)

 

Three independent chains (8,000 iterations, 4,000 burn-in)

 

 

 

 

 

Social Isolation (n=83 8) Deviant Peer Afﬁliation (n=838)
Estimate (EAP) I SD I mcse Estimate (EAP) I SD I mcse
S1 .000 --— --- .000 --- ---
S2 579* .078 .004 214* .059 .003
S3 911* .087 .005 507* .062 .003
S4 1056* .079 .004 1000* .075 .004
S5 1 .000 --- --- 1.000 —-- ---
deO 1626* .091 .004 -1.077* .112 .004
158101 267* .134 .004
12le -.417* .076 .003 -536* .098 .004
ﬁdl 1
0% 2477* .283 .015 3026* .306 .016
0'3. 1605* .351 .021 2622* .445 .025
pLS -392* .070 .003 -.009 .082 .004
Goodness
of ﬁt DIC=76,4629
index

 

 

Note. a. *p < .05 (critical value 1.96); b. Because it is a form of sampling error, the Monte Carlo standard error (MCSE) can always be reduced by lengthening the chain (Kim and Bolt, 2007).
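Note b states that the MCSE shrinks as the chain is lengthened. The sketch below illustrates this with the batch-means estimator, one common way to compute an MCSE for a posterior mean. The estimator actually used for the MCSE column is not stated here, and the AR(1) series is a hypothetical stand-in for autocorrelated MCMC output, not draws from the MGRM-ALGC model.

```python
import numpy as np


def mcse_batch_means(draws, n_batches=20):
    """Monte Carlo standard error of a posterior mean via non-overlapping batch means."""
    draws = np.asarray(draws, dtype=float)
    batch_size = draws.size // n_batches
    if batch_size < 2:
        raise ValueError("chain is too short for the requested number of batches")
    trimmed = draws[: batch_size * n_batches]          # drop the remainder
    batch_means = trimmed.reshape(n_batches, batch_size).mean(axis=1)
    # Var(overall mean) is approximated by Var(batch means) / n_batches
    return batch_means.std(ddof=1) / np.sqrt(n_batches)


def ar1_chain(n, rho=0.9, seed=0):
    """Hypothetical autocorrelated series standing in for MCMC output."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(scale=np.sqrt(1.0 - rho**2), size=n)
    x = np.empty(n)
    x[0] = rng.normal()                                 # stationary start, variance 1
    for t in range(1, n):
        x[t] = rho * x[t - 1] + eps[t]
    return x


mcse_short = mcse_batch_means(ar1_chain(4_000, seed=1))
mcse_long = mcse_batch_means(ar1_chain(64_000, seed=2))
print(f"MCSE with 4,000 draws:  {mcse_short:.4f}")
print(f"MCSE with 64,000 draws: {mcse_long:.4f}")
```

A sixteen-fold longer chain should cut the MCSE roughly by a factor of four, since the error scales with the inverse square root of the chain length.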


APPENDIX B

Figure 2.1

Path diagram of a bivariate latent growth model.
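A bivariate latent growth model lets the level and shape factors of two processes covary. A minimal simulation sketch of that structure for continuous scores, assuming standardized factors (means 0, variances 1) whose correlations are taken from Table 4.3.8a, latent-basis shape loadings taken from the MGRM-ALGC estimates in Table 4.3.8b, and a hypothetical residual SD of 0.5. The IRT measurement layer of the dissertation's model is deliberately omitted, and the function name `simulate_bivariate_lgm` is hypothetical.

```python
import numpy as np

# Factor order: SI level, SI shape, DPA level, DPA shape (Table 4.3.8a correlations)
R = np.array([
    [1.000, -0.392, 0.289, -0.205],
    [-0.392, 1.000, 0.105, 0.522],
    [0.289, 0.105, 1.000, -0.009],
    [-0.205, 0.522, -0.009, 1.000],
])
# Latent-basis shape loadings for five occasions (first fixed at 0, last at 1)
LOAD_SI = np.array([0.000, 0.579, 0.911, 1.056, 1.000])
LOAD_DPA = np.array([0.000, 0.214, 0.507, 1.000, 1.000])
RESID_SD = 0.5  # hypothetical occasion-specific residual SD


def simulate_bivariate_lgm(n, seed=0):
    """Draw correlated growth factors and continuous scores for two processes."""
    rng = np.random.default_rng(seed)
    factors = rng.multivariate_normal(np.zeros(4), R, size=n)  # standardized (assumed)
    level_si, shape_si, level_dpa, shape_dpa = factors.T
    # y_it = level_i + loading_t * shape_i + e_it, separately for each process
    y_si = level_si[:, None] + shape_si[:, None] * LOAD_SI + rng.normal(0.0, RESID_SD, (n, 5))
    y_dpa = level_dpa[:, None] + shape_dpa[:, None] * LOAD_DPA + rng.normal(0.0, RESID_SD, (n, 5))
    return factors, y_si, y_dpa


factors, y_si, y_dpa = simulate_bivariate_lgm(838)
print(np.round(np.corrcoef(factors, rowvar=False), 2))
```

The cross-process paths in the diagram correspond to the off-diagonal blocks of `R`; setting them to zero would reduce the model to two independent univariate growth curves.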


 

 

Figure 4.2.1

Path diagram of a four-wave 2PNO-LGC model.

[Diagram labels: Age, Gender, Religion; Level, Shape; A7783, A7784, A7785, A7786.]
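In a 2PNO-LGC model, the latent trait at each wave is built from the growth factors and then passed through a two-parameter normal-ogive item response function. A minimal sketch, assuming the slope-intercept form P(y = 1 | θ) = Φ(aθ − b) with θ = Level + λ_t · Shape. The loadings and item parameters are hypothetical, and the dissertation's exact parameterization (including the covariate paths from Age, Gender, and Religion) is not reproduced.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_endorse(level: float, shape: float, loading: float, a: float, b: float) -> float:
    """P(y = 1) for one binary item at one wave under an assumed 2PNO-LGC form."""
    theta = level + loading * shape        # latent trait implied by the growth factors
    return phi(a * theta - b)              # slope-intercept normal-ogive link


LOADINGS = [0.0, 1.0, 2.0, 3.0]            # hypothetical linear loadings, four waves
A_ITEM, B_ITEM = 1.2, 0.5                  # hypothetical item parameters
for wave, lam in enumerate(LOADINGS, start=1):
    p = p_endorse(level=0.0, shape=0.4, loading=lam, a=A_ITEM, b=B_ITEM)
    print(f"wave {wave}: P(y = 1) = {p:.3f}")
```

With a positive shape factor the endorsement probability rises across waves, which is how growth in the latent trait shows up in the binary responses.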

 

Figure 4.2.2

Kernel density for the restricted data: One single long chain (excerpted).

[Panels: Para[3], Para[6], Para[9], Para[11], Para[16]; sample: 30003 each.]
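Kernel density plots such as these smooth the retained posterior draws of each parameter into a density curve. A minimal Gaussian-kernel sketch with Silverman's rule-of-thumb bandwidth; the simulated draws are a hypothetical stand-in, and the smoothing choices of the software that produced the figures are not documented here.

```python
import numpy as np


def gaussian_kde(draws, grid, bandwidth=None):
    """Gaussian kernel density estimate of posterior draws, evaluated on `grid`.

    The default bandwidth is Silverman's rule of thumb:
    0.9 * min(SD, IQR / 1.34) * n ** (-1/5).
    """
    x = np.asarray(draws, dtype=float)
    n = x.size
    if bandwidth is None:
        sd = x.std(ddof=1)
        q75, q25 = np.percentile(x, [75, 25])
        bandwidth = 0.9 * min(sd, (q75 - q25) / 1.34) * n ** (-0.2)
    z = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(3)
draws = rng.normal(loc=1.0, scale=0.1, size=3_000)   # hypothetical posterior draws
grid = np.linspace(0.5, 1.5, 201)
density = gaussian_kde(draws, grid)
area = density.sum() * (grid[1] - grid[0])           # should be close to 1
print(f"mode near {grid[density.argmax()]:.3f}, area {area:.3f}")
```

A smooth, unimodal curve is consistent with (though not proof of) a chain that has settled into the posterior; multiple bumps often signal poor mixing.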


 

 

 

 

Figure 4.2.3

Kernel density for the restricted data: Three independent chains (excerpted).

[Panels: Para[3], Para[6], Para[9], Para[11], Para[16], Para[18]; chains 1:3; sample: 30003 each.]


Figure 4.2.4

Gelman-Rubin statistic for the restricted dataset: Three independent chains (excerpted).

[Panels: Para[3], Para[6], Para[9], Para[11], Para[16], Para[18]; chains 1:3; x-axis: iteration, 20000 to 25000.]
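Plots like these track the Gelman and Rubin (1992) potential scale reduction factor, which compares between-chain and within-chain variability; values near 1 suggest the chains have mixed. A minimal sketch of the original variance-based statistic for a single parameter, assuming m chains of equal length stored as rows of an array. BUGS-style plots may display an interval-based variant instead, so the numbers need not match the figures exactly, and the chains here are simulated rather than the dissertation's draws.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    `chains` has shape (m, n): m chains with n retained draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least two chains with two draws each")
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()         # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n            # pooled estimate of the posterior variance
    return float(np.sqrt(var_plus / W))


rng = np.random.default_rng(7)
mixed = rng.normal(size=(3, 5_000))                   # three chains, same target
stuck = mixed + np.array([[0.0], [0.0], [3.0]])       # third chain sits elsewhere
print(f"R-hat, mixed chains: {gelman_rubin(mixed):.3f}")
print(f"R-hat, one chain off target: {gelman_rubin(stuck):.3f}")
```

The shifted third chain inflates the between-chain variance B and pushes R-hat near 2, the kind of departure from 1 these diagnostic plots are meant to reveal.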


 

Figure 4.3.1a

Perceived social isolation across five occasions (n=44).

Figure 4.3.1b

Perceived extent of exposure to delinquent peers across five occasions (n=44).

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4.3.2

MCMC convergence diagnostics: Gelman and Rubin statistics.

[Panels: Para[1] through Para[81]; chains 1:3; x-axis: iteration, 4001 to 7000.]

REFERENCES


Adams, J. A., Wilson, M., & Wang, W. C. (1997). The multidimensional random
coefficients multinomial logit model. Applied Psychological Measurement, 21 (1),
1-23.

Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using
Gibbs sampling. Journal of Educational Statistics, 17, 251-269.

Albert, J. H., & Chib, S. (1997). Bayesian analysis of binary and polytomous response
data. Journal of the American Statistical Association, 88, 669-679.

Amemiya, T. (1981). Qualitative response models: A survey. Journal of Economic
Literature, 19, 1483-1536.

Anderson, E. B. (1985). Estimating latent correlations between repeated testing.
Psychometrika, 50 (1), 3-16.

Bauer, D. J. (2003). Estimating multilevel linear models as structural equation models.
Journal of Educational and Behavioral Statistics, 28 (2), 135-167.

Bauer, D. J. (2009). A note on comparing the estimates of models for cluster-correlated
or longitudinal data with binary or ordinal outcomes. Psychometrika, 74 (1), 97-105.

Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C. W.
Harris (Ed.), Problems in measuring change (pp. 203-212). Madison: University
of Wisconsin Press.

Best, N. G., Spiegelhalter, D. J., Thomas, A., & Brayne, C. E. G. (1996). Bayesian
analysis of realistically complex models. Journal of the Royal Statistical Society,
Series A, 159 (2), 323-342.

Birnbaum, A. (1968). Test scores, sufficient statistics, and the information structures of
tests. In F. M. Lord and M. R. Novick (Eds), Statistical theories of mental test
scores (pp. 425-435). Reading, MA: Addison-Wesley Publishing Company.

Blozis, S. (2007). A second order structural latent curve model for longitudinal data. In
K. Van Montfort, J. Oud, and A. Satorra (Eds), Longitudinal Models in the
Behavioral and Related Sciences (pp. 189-214). Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.

Bollen, K. A. (1989). Structural equations with latent variables. NY: Wiley.

Bolt, D. M., & Kim, J.-S. (2005). Hierarchical IRT models. In B. S. Everitt & D. C.
Howell (Eds), Encyclopedia of statistics in behavioral science (vol. 2, pp. 805-810).
London: John Wiley & Sons.

Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory
multidimensional item response models using Markov chain Monte Carlo. Applied
Psychological Measurement, 27 (6), 395-414.

Byrne, B. M., & Crombie, G. (2003). Modeling and testing change: An introduction to
the latent growth curve model. Understanding Statistics, 2 (3), 177-203.

Carrigan, G., Barnett, A. G., Dobson, A. J., & Mishra, G. (2007). Compensating for
missing data from longitudinal studies using WinBUGS. Journal of Statistical
Software, 19(7), 1-17.

Casella, G., & George, E. I. (1992). Explaining the Gibbs sampler. The American
Statistician, 46 (3), 164-174.

Cheong, J. W., MacKinnon, D. A., & Khoo, S. T. (2003). Investigation of mediational
processes using parallel process growth curve modeling. Structural Equation
Modeling: A Multidisciplinary Journal, 10 (2), 238-262.

Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The
American Statistician, 49 (4), 327-335.

Chou, C.-P., Bentler, P. M., & Pentz, M. A. (1998). Comparisons of two statistical
approaches to study growth curves: The multilevel model and the latent curve
analysis. Structural Equation Modeling: A Multidisciplinary Journal, 5 (3),
247-266.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed).
Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Cohen, E., Reinherz, H. Z., & Frost, A. K. (1994). Self-perceptions of unpopularity in
adolescence: Links to past and current adjustment. Child and Adolescent Social
Work Journal, 11 (1), 37-52.

Congdon, P. (2005). Markov chain Monte Carlo and Bayesian statistics. In B. S. Everitt
and D. C. Howell (Eds), Encyclopedia of statistics in behavioral science (vol. 3,
pp. 1134-43). London: John Wiley & Sons.

Congdon, P. (2006). Bayesian statistical modeling. NJ: John Wiley & Sons.

Curran, P. J. (2003). Have multilevel models been structural equation models all along?
Multivariate Behavioral Research, 38 (4), 529-569.

Curran, P. J., Edwards, M. C., Wirth, R. J., Hussong, A. M., & Chassin, L. (2007). The
incorporation of categorical measurement models in the analysis of individual
growth. In T. Little, J. Bovaird, & N. Card (Eds), Modeling ecological and
contextual effects in longitudinal studies of human development (pp. 89-120).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Curran, P. J., Obeidat, K., & Losardo, D. (in press). Twelve frequently asked questions
about growth curve modeling. Journal of Cognition and Development.

De Ayala, R. J. (1994). The influence of multidimensionality on the graded response
model. Applied Psychological Measurement, 18 (2), 155-170.

De Boeck, P. (2008). Random item IRT models. Psychometrika, 73 (4), 533-559.

de la Torre, J., & Patz, R. J. (2005). Making the most of what we have: A practical
application of multidimensional item response theory in test scoring. Journal of
Educational and Behavioral Statistics, 30 (3), 295-311.

Diggle, P., Heagerty, P., Liang, K.-Y., & Zeger, S. (2002). Analysis of longitudinal data
(2nd Ed.). Oxford, England: Oxford University Press.

Duncan, S. C., Duncan, T. E., & Strycker, L. A. (2000). Risk and protective factors
influencing adolescent problem behavior: A multivariate latent growth curve
analysis. Annals of Behavioral Medicine, 22 (2), 103-109.

Duncan, S. C., Duncan, T. E., & Strycker, L. A. (2001). Qualitative and quantitative
shifts in adolescent problem behavior development: A cohort-sequential multivariate
latent growth modeling approach. Journal of Psychopathology and Behavioral
Assessment, 23 (1), 43-50.

Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, R, & Alpert, A. (1999). An
introduction to latent variable growth curve modeling: Concepts, issues, and
applications. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Duncan, T. E., & Duncan, S. C. (2004). An introduction to latent growth curve modeling.
Behavior Therapy, 35 (2), 333-363.

Dunson, D. B., Palomo, J., & Bollen, K. (2005). Bayesian structural equation modeling.
Technical report # 2005-5. Statistical and Applied Mathematical Sciences Institute.
Retrieved 04 February, 2007, from http://www.samsi.info/TR/tr2005-05.pdf.

Elliott, D. National Youth Survey (NYS) Series, 1976-1987 [computer file].
ICPSR08375-06542. Ann Arbor, MI: Inter-university Consortium for Political and
Social Research [distributor], 2008-08-01. Retrieved 21 March, 2009, from
http://www.icpsr.umich.edu/cocoon/ICPSR/SERIES/00088.xml.

Embretson, S. E. (1991). A multidimensional latent trait model for measuring learning
and change. Psychometrika, 56 (3), 495-515.

Embretson, S. E. (1994). Comparing changes between groups: some perplexities arising
from psychometrics. In Laveault, D., Zumbo, B. D., Gessaroli, M. E., & Boss, M.
W. (Eds), Modern theories of measurement: Problems and issues (pp. 213-248).
Ottawa, Canada: Edumetrics Research Group, University of Ottawa.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,
NJ: Lawrence Erlbaum Associates, Inc.

Engel, U., Gattig, A., & Simonson, J. (2007). Longitudinal multilevel modeling: A
comparison of growth curve models and structural equation modeling using panel
data from Germany. In K. van Montfort, J. Oud, and A. Satorra (Eds), Longitudinal
Models in the Behavioral and Related Sciences (pp. 295-314). Mahwah, NJ:
Lawrence Erlbaum Associates, Inc.

Everitt, B. S. (2005). Longitudinal data analysis. In B. S. Everitt & D. C. Howell (Eds),
Encyclopedia of statistics in behavioral science (vol. 2, pp. 1098-1101). London:
John Wiley & Sons.

Ferrer, E., & McArdle, J. J. (2003). Alternative structural models for multivariate
longitudinal data analysis. Structural Equation Modeling: A Multidisciplinary
Journal, 10 (4), 493-524.

Fienberg, S. E., & Rinaldo, A. (2007). Three centuries of categorical data analysis:
Log-linear models and maximum likelihood estimation. Journal of Statistical
Planning and Inference, 137 (11), 3430-3445.

Fischer, G. H., & Seliger, E. (1997). Multidimensional linear logistic models for change. In
W. J. van der Linden & R. K. Hambleton (Eds), Handbook of Modern Item
Response Theory (pp. 323-346). NY: Springer.

Fox, J.-P. (2007). Multilevel IRT modeling in practice with the package mlirt. Journal of
Statistical Software, 20 (5), 1-16.

Gelfand, A. E., Hills, S. E., Racine-Poon, A., & Smith, A. F. (1990). Illustration of
Bayesian inference in normal data models using Gibbs sampling. Journal of the
American Statistical Association, 85 (412), 972-985.

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis (2nd
Ed.). London: Chapman & Hall.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/
hierarchical models. NY: Cambridge.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple
sequences. Statistical Science, 7 (4), 457-472.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distribution and the
Bayesian restoration of images. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 6, 721-741.

Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statistical Science, 7, 473-483.

Gibbons, R. D., & Hedeker, D. (1997). Random-effects probit and logistic regression
models for three-level data. Biometrics, 53 (4), 1527-1537.

Gill, J. (2002). Bayesian methods: A social and behavioral sciences approach. FL:
Chapman & Hall.

Golembiewski, R. T., Billingsley, K., & Yeager, S. (1976). Measuring change and
persistence in human affairs: Type of change generated by OD designs. Journal of
Applied Behavioral Science, 12, 133-157.

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item
response theory and their applications to test development. Educational
Measurement: Issues and Practice, 12 (2), 38-47.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their
applications. Biometrika, 57, 97-109.

Hedeker, D. (2005). Generalized linear mixed models. In B. S. Everitt & D. C. Howell
(Eds), Encyclopedia of statistics in behavioral science (vol. 2, pp. 729-738).
London: John Wiley & Sons.

Hertzog, C., von Oertzen, T., Ghisletta, P., & Lindenberger (2008). Evaluating the
power of latent growth curve model to detect individual differences in change.
Structural Equation Modeling: A Multidisciplinary Journal, 15 (4), 541-563.

Holland, P. W., & Wainer, H. (1993). Differential item functioning (1st Ed.). Hillsdale,
NJ: Lawrence Erlbaum Associates, Inc.

Hox, J., & Stoel, R. D. (2005). Multilevel and SEM approaches to growth curve
modeling. In B. S. Everitt & D. C. Howell (Eds), Encyclopedia of statistics in
behavioral science (vol. 3, pp. 1296-1305). London: John Wiley & Sons.

Hsieh, C., & Maier, K. S. (2009). A preliminary Bayesian analysis of incomplete
longitudinal data from a small sample: Methodological advances in an international
comparative study of educational inequality. International Journal of Research and
Method in Education, 32 (1), 103-125.


Hsieh, C., & von Eye, A. A. (in press). The best of both worlds: A joint modeling
approach for the assessment of change across repeated measurements. International
Journal of Psychological Research.

Jackman, S. (2000). Estimation and Inference via Bayesian simulation: An introduction
to Markov chain Monte Carlo. American Journal of Political Science, 44 (2),
375-404.

Jamshidian, M., & Jennrich, R. I. (2000). Standard errors for EM estimation. Journal of
the Royal Statistical Society, Series B (Statistical Methodology), 62 (2), 257-270.

Johnson, C., & Raudenbush, S. W. (2006). A repeated measures, multilevel Rasch model
with application to self-reported criminal behavior. In C. S. Bergeman and S. M.
Boker (Eds) Methodological Issues in Aging Research (pp. 131-164). Notre Dame
Series on Quantitative Methods. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Johnson, M. S., Sinharay, S., & Bradlow, E. T. (2007). Hierarchical item response theory
models. In C. R. Rao & S. Sinharay (Eds), Handbook of statistics: Psychometrics
(vol. 26, pp. 587-606). Boston: Elsevier North-Holland.

Jöreskog, K. G. (2002). Structural equation modeling with ordinal variables using
LISREL. Retrieved 25 December, 2008, from
http://www.ssicentral.com/lisrel/techdocs/ordinal.pdf.

Keller, L. A. (2005). Markov chain Monte Carlo item response theory estimation. In B. S.
Everitt and D. C. Howell (Eds), Encyclopedia of statistics in behavioral science
(vol. 3, pp. 1143-1148). London: John Wiley & Sons.

Kim, J.-S., & Bolt, D. M. (2007). Markov chain Monte Carlo estimation of item response
models. Educational Measurement: Issues and Practice, 26 (4), 38-51.

Knott, M., Albanese, M. T., & Galbraith, J. (1990). Scoring attitudes to abortion. The
Statistician, 40, 217-223.

Lee, M. D., & Wagenmakers, E. (2005). Bayesian statistical inference in psychology:
Comment on Trafimow (2003). Psychological Review, 112 (3), 662-668.

Lee, S-K. (2007). Structural equation modeling: A Bayesian approach. NJ: Wiley.

Li, Y., Bolt, D.M., & Fu, J. (2006). A comparison of alternative models for testlets.
Applied Psychological Measurement, 30 (1), 3-21.

Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized
linear models. Sage University Paper series on Quantitative Applications in the
Social Sciences, 07-101. Thousand Oaks, CA: Sage.


Little, R. J. A. (1988). A test of missing completely at random for multivariate data with
missing values. Journal of the American Statistical Association, 83 (404),
1198-1202.

Lord, F. M. (1980). Application of item response theory to practical testing problems.
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading,
MA: Addison-Wesley Publishing Company.

Luke, D. A. (2004). Multilevel modeling. Sage University Paper series on Quantitative
Applications in the Social Sciences, 143. Thousand Oaks, CA: Sage.

Lynch, S. M., & Western, B. (2004). Bayesian posterior predictive checks for complex
models. Sociological Methods and Research, 32 (3), 301-335.

MacCallum, R. C., Kim, C., Malarkey, W. B., & Kiecolt-Glaser, J. K. (1997). Studying
multivariate change using multilevel models and latent curve models. Multivariate
Behavioral Research, 32 (3), 15-53.

Maier, K. S. (2001). A Rasch hierarchical measurement model. Journal of Educational
and Behavioral Statistics, 26 (3), 307-330.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47,
149-174.

May, H. (2006). A multilevel Bayesian item response theory method for scaling
socioeconomic status in international studies of education. Journal of Educational
and Behavioral Statistics, 31 (1), 63-79.

McArdle, J. J. (1988). Dynamic but structural equation modeling of repeated measures
data. In J. R. Nesselroade & R. B. Cattell (Eds), Handbook of multivariate
experimental psychology (pp. 561-614). NY: Plenum Press.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence
Erlbaum Associates, Inc.

McGrath, K., & Waterton, J. (1986). British Social Attitudes, 1983-1986, Panel Survey:
Technical Report (London, Social and Community Planning Research).

Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.

Meredith, W., & Horn, J. (2001). The role of factorial invariance in modeling growth and
change. In Sayer, A. G. & Collins, L. M. (Eds), New Methods for the Analysis of
Change (pp. 201-240). Washington, DC: American Psychological Association.


Meredith, W., & Teresi, J. A. (2006). An essay on measurement and factorial invariance.
Medical Care, 44 (11) Suppl 3, 869-877.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E.
(1953). Equations of state calculations by fast computing machines. Journal of
Chemical Physics, 21, 1087-1092.

Mislevy, R. J. (1987). Exploiting auxiliary information about examinees in the estimation
of item parameters. Applied Psychological Measurement, 11 (1), 81-91.

Moustaki, I., & Knott, M. (2000). Generalized latent trait models. Psychometrika, 65 (3),
391-411.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm.
Applied Psychological Measurement, 16 (2), 159-176.

Muthén, B. O. (1983). Latent variable structural equation modeling with categorical data.
Journal of Econometrics, 22, 43-65.

Muthén, B. O. (1984). A general structural equation model with dichotomous, ordered
categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.

Muthén, B. O. (1996). Growth modeling with binary responses. In A. von Eye & C.
Clogg (Eds), Categorical variables in developmental research: Methods of analysis
(pp. 37-54). San Diego: Academic Press.

Muthén, B. O. (2002). Beyond SEM: General latent variable modeling. Behaviormetrika,
29, 81-117.

Muthén, B. O., & Curran, P. (1997). General longitudinal modeling of individual
differences in experimental designs: A latent variable framework for analysis and
power estimation. Psychological Methods, 2, 371-402.

Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on
sample size and determine power. Structural Equation Modeling: A
Multidisciplinary Journal, 9 (4), 599-620.

Muthén, L. K., & Muthén, B. O. (1998-2007). Mplus user’s guide (5th Ed.). Los Angeles,
CA: Muthén & Muthén.

Patz, R. J., & Junker, B. W. (1999a). A straightforward approach to Markov chain Monte
Carlo methods for item response models. Journal of Educational and Behavioral
Statistics, 24 (2), 146-178.

Patz, R. J., & Junker, B. W. (1999b). Applications and extensions of MCMC in IRT:
Multiple item types, missing data, and rated responses. Journal of Educational and
Behavioral Statistics, 24 (4), 342-366.

Patz, R. J., & Yao, L. (2007a). Vertical scaling: Statistical models for measuring growth
and achievement. In C. R. Rao and S. Sinharay (Eds), Handbook of statistics:
Psychometrics (vol. 26, pp. 955-975). Amsterdam: Elsevier.

Patz, R. J., & Yao, L. (2007b). Methods and models for vertical scaling. In N. J. Doran,
M. Pommerich, and P. W. Holland (Eds), Linking and aligning scores and scales
(pp. 253-272). New York: Springer.

Phillips, L. D. (2005). Bayesian statistics. In B. S. Everitt and D. C. Howell (Eds),
Encyclopedia of statistics in behavioral science (vol. 1, pp. 146-150). London: John
Wiley & Sons.

Preacher, K. J., Wichman, A. L., MacCallum, R. C., & Briggs, N. E. (2008). Latent
growth curve modeling. Sage University Paper series on Quantitative Applications
in the Social Sciences, 157. Thousand Oaks, CA: Sage.

R Development Core Team (2009). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN
3-900051-07-0, URL http://www.R-project.org.

Rabe-Hesketh, S., & Skrondal, A. (2008). Multilevel and longitudinal modeling using
Stata (2nd Ed.). TX: Stata press publication.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests.
Chicago: University of Chicago Press.

Raudenbush, S. W., & Liu, X. (2001). Effects of study duration, frequency of observation,
and sample size on power in studies of group differences in polynomial change.
Psychological Methods, 6 (4), 387-401.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and
data analysis methods. CA: Sage Publications.

Raudenbush, S. W., Johnson, C., & Sampson, R. J. (2003). A multivariate multilevel
Rasch model with application to self-reported criminal behavior. Sociological
Methodology, 33 (1), 169-212.

Raykov, T. (2007). Longitudinal analysis with regressions among random effects: A
latent variable modeling approach. Structural Equation Modeling: A
Multidisciplinary Journal, 14 (1), 146-169.

Raykov, T., & Marcoulides, G. A. (2006). A first course in structural equation modeling
(2nd Ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Reckase, M. D. (1985). The difficulty of test items that measure more than one ability.
Applied Psychological Measurement, 9 (4), 401-412.

Reckase, M. D. (1997). The past and future of multidimensional item response theory.
Applied Psychological Measurement, 21 (1), 25-36.

Reckase, M. D. (2009). Multidimensional item response theory. NY: Springer.

Rice, J. A. (1995). Mathematical statistics and data analysis. CA: Duxbury Press.

Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed
model framework for item response theory. Psychological Methods, 8 (2), 185-205.

Roberts, J. S., & Ma, Q. (2006). IRT models for the assessment of change across repeated
measurements. In R. Lissitz (Ed.), Longitudinal and value added modeling of
student performance (pp. 100-127). Maple Grove, MN: JAM Press.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Rupp, A. A., Dey, D. K., & Zumbo, B. D. (2004). To Bayes or not to Bayes, from
whether to when: Applications of Bayesian methodology to modeling. Structural
Equation Modeling: A Multidisciplinary Journal, 11 (3), 424-451.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded
scores. Psychometrika Monograph Supplement, no. 17, Richmond, VA:
Psychometric Society.

Samejima, F. (1997). Graded response model. In W. J. van der Linden and R. K.
Hambleton (Eds), Handbook of modern item response theory (pp. 85-100). NY:
Springer.

Sayer, A. G., & Cumsille, P. E. (2001). Second-order latent growth models. In L. M.
Collins & A. G. Sayer (Eds), New methods for the analysis of change (pp.
179-200). Washington, DC: American Psychological Association.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art.
Psychological Methods, 7 (2), 147-177.

Scheines, R., Hoijtink, H., & Boomsma, A. (1999). Bayesian estimation and testing of
structural equation models. Psychometrika, 64 (1), 37-52.

Seltzer, M. H., Wong, W. H., & Bryk, A. S. (1996). Bayesian analysis in applications of
hierarchical models: Issues and methods. Journal of Educational and Behavioral
Statistics, 21 (2), 131-167.

Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical
models, and individual growth models. Journal of Educational and Behavioral
Statistics, 23 (4), 323-355.

Singer, J. D., & Willett, J. B. (2005). Growth curve modeling. In B. S. Everitt & D. C.
Howell (Eds), Encyclopedia of statistics in behavioral science (vol. 2, pp. 772-779).
London: John Wiley & Sons.

Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of
item response theory models. Applied Psychological Measurement, 30 (4),
298-321.

Sinharay, S., & Stern, H. S. (2003). Posterior predictive model checking in hierarchical
models. Journal of Statistical Planning and Inference, 111, 209-221.

Skrondal, A., & Rabe-Hesketh, S. (2003). Some applications of generalized linear latent
and mixed models in epidemiology: Repeated measures, measurement error and
multilevel modeling. Norsk Epidemiologi, 13 (2), 265-278.

Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling:
Multilevel, longitudinal, and structural equation models. Boca Raton: Chapman &
Hall/CRC.

Skrondal, A., & Rabe-Hesketh, S. (2008). Multilevel and related models for longitudinal
data. In J. de Leeuw & E. Meijer (Eds), Handbook of multilevel analysis (pp.
275-299). NY: Springer.

Snijders, T. A. B. (1996). Analysis of longitudinal data using the hierarchical linear
model. Quality and Quantity, 30, 405-426.

Snijders, T. A. B., & Bosker, R. J. (1993). Standard errors and sample sizes for two-level
research. Journal of Educational Statistics, 18, 237-259.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian
measures of model complexity and fit (with discussion). Journal of the Royal
Statistical Society, Series B, 64 (4), 583-616.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Lunn, D. (2003). WinBUGS user manual.
Cambridge, UK: MRC Biostatistics Unit, Institute of Public Health. Retrieved 24
October, 2006, from http://www.mrc-bsu.cam.ac.uk/bugs.

Steele, F., & Goldstein, H. (2007). Multilevel models in psychometrics. In C. R. Rao &
S. Sinharay (Eds), Handbook of statistics: Psychometrics (vol. 26, pp. 401-420).
Boston: Elsevier North-Holland.

Stefanescu, C., Berger, V. W., & Hershberger, S. L. (2005). Probits. In B. S. Everitt &
D. C. Howell (Eds), Encyclopedia of statistics in behavioral science (vol. 3, pp.
1608-1610). London: John Wiley & Sons.

Swahn, M., & Donovan, J. (2003). Correlates and predictors of violent behavior among
adolescent drinkers. Journal of Adolescent Health, 34 (6), 480-492.

Te Marvelde, J. M., Glas, C. A. W., Van Landeghem, G., & Van Damme, J. (2006).
Application of multidimensional item response theory models to longitudinal data.
Educational and Psychological Measurement, 66 (1), 5-34.

Thompson, J., Palmer, T., & Moreno, S. (2006). Bayesian analysis in Stata with
WinBUGS. The Stata Journal, 6 (4), 530-549.

Tucker, L. R. (1966). Learning theory and multivariate experiment: Illustration of
determination of generalized learning curves. In R. B. Cattell (Ed.), Handbook of
multivariate experimental psychology (pp. 476-501). NY: Rand McNally.

Tuerlinckx, F., & Wang, W. C. (2004). Models for polytomous data. In P. De Boeck &
M. Wilson (Eds), Explanatory item response models: A generalized linear and
nonlinear approach (pp. 75-110). NY: Springer.

van den Oord, E. J. C. G. (2005). Estimating Johnson curve population distribution in
MULTILOG. Applied Psychological Measurement, 29 (1), 45-64.

Vermunt, J. (2007). Growth models for categorical response variables: Standard,
latent-class, and hybrid approaches. In K. van Montfort, J. Oud, and A. Satorra
(Eds), Longitudinal models in the behavioral and related sciences (pp. 139-158).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Wasserman, L. (2003). All of statistics: A concise course in statistical inference. NY:
Springer.

Western, B. (1999). Bayesian analysis for sociologists: An introduction. Sociological
Methods and Research, 28 (1), 7-34.

Western, B., & Jackman, S. (1994). Bayesian inference for comparative research.
American Political Science Review, 88 (2), 412-423.

Wiggins, R. D., Ashworth, K., & O’Muircheartaigh, C. A. (1990). Multilevel analysis of
attitudes to abortion. The Statistician, 40 (2), 225-234.

Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect
correlates and predictors of individual change over time. Psychological Bulletin, 116
(2), 363-381.

Zhang, Z., Hamagami, F., Wang, L., Grimm, K. J., & Nesselroade, J. R. (2007). Bayesian
analysis of longitudinal data using growth curve models. International Journal of
Behavioral Development, 31 (4), 374-383.