
This is to certify that the
dissertation entitled

EXTENDING THE PARTIAL CREDIT AND RATING SCALE
MODELS USING THE HIERARCHICAL MULTIVARIATE
GENERALIZED LINEAR MODEL

presented by

JONATHAN R. MANALO

has been accepted towards fulfillment
of the requirements for the

Ph.D. degree in Measurement and Quantitative
Methods

 

 

    

Professor's Signature

Date

 

MSU is an Afﬁrmative Action/Equal Opportunity Institution


EXTENDING THE PARTIAL CREDIT AND RATING SCALE MODELS USING
THE HIERARCHICAL MULTIVARIATE GENERALIZED LINEAR MODEL

By

Jonathan R. Manalo

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational
Psychology, and Special Education

2004

ABSTRACT

EXTENDING THE PARTIAL CREDIT AND RATING SCALE MODELS USING
THE HIERARCHICAL MULTIVARIATE GENERALIZED LINEAR MODEL

By
Jonathan R. Manalo

In this dissertation, the Rating Scale and Partial Credit Models of Item Response
Theory (IRT) are extended using a hierarchical multivariate generalized linear model
(HMGLM). Specifically, previous extensions of IRT using hierarchical linear modeling
(HLM) are discussed, highlighting their weaknesses and how those weaknesses may be
avoided by applying the HMGLM. The HMGLM is also defined, in particular,
as an extension of the Rating Scale and Partial Credit Models. A small simulation study is
described to illustrate the accuracy of the parameter recovery for these models.
Additionally, modeling extensions of the Rating Scale and Partial Credit Models are
made by applying the HMGLM. Computational examples are provided to illustrate the
application of these models.

Dedicated to my parents Alicia and Jesse for their constant support and love.


ACKNOWLEDGMENTS

First of all, I would like to thank my dissertation committee, for the freedom they
allowed me, their support, constructive criticism, and insightful comments that made this
dissertation a much better project. But most importantly, thank you Dr. Maier for
accepting to be the chair of the committee and taking me on as your ﬁrst student. Your
direction, assistance, and support helped guide me through this dissertation.

Thank you Dr. Floden. Your deep thoughts pushed me to understand my topic in
more meaningful ways. Without this my dissertation would have simply been over 100
pages of formulas without any real meaning.

Thank you Dr. Reckase for not only pushing me to think deeper about the
psychometrics contained in this paper, but also thank you for the 5 years of guidance and
wisdom you offered while I was at Michigan State University. Without this, I would have
been just another MQM student.

Thank you Dr. Wolfe. Your friendship, guidance, and support for the past several
years—from the University of Florida to Michigan State University (and to wherever life
leads me)—have not only made me a better student, a better professional, and a better
leader, but you have also made me a better person as well. For this I am extremely
grateful, and for this you will always be the ‘Master’ and I will always be the
‘Grasshopper.’

Second, I would like to thank my friends and family who helped motivate and
support me. Especially, I would like to thank my parents Alicia and Jesse, my brothers
Jeff and Jesse, my little sister Jessica, my ole buddies Wayne and Way, and my New
Mom Joyce for always being there. There is nothing like friends and family.

Lastly, I would like to thank my wife Margaret, my dogs Symbi and Isa, my dog
in heaven Cream, and my baby boy on the way Eian. Although they cannot read or
understand most of what I say, I am forever indebted to my dogs Symbi and Isa for they
always provided me with a smile and love, unconditionally, when I needed it the most.
To Cream: although you were not able to see me ﬁnish my dissertation and school, you
were always there to distract me and pursue the ﬁner things in life. Thank you. To my
boy Eian: You are the main reason I ﬁnished my dissertation in one year and not ﬁve.
Daddy is looking forward to the new chapter in his life (and Daddy has to pay for those
toys). To Margaret: throughout my graduate career, especially when I was down on
myself, you provided me with the support I needed; you provided me with the friendship
I needed; you provided me with the love I needed, always. Thank you.

I did it.

TABLE OF CONTENTS

LIST OF TABLES

Chapter

1. INTRODUCTION
1-1. Motivation of the study
1-2. Overview of Previous Hierarchical IRT Models for Polytomous Items
1-2-1. Traditional, Non-Hierarchical Partial Credit and Rating Scale Models
1-2-2. Random Coefficients in a Multinomial Model Approach
1-2-3. Bayesian Modeling of Random-Effects Approach
1-2-4. Rater Effects Approach
1-2-5. A Hierarchical, Univariate General Linear Model Approach

2. A HIERARCHICAL MULTIVARIATE GENERALIZED LINEAR MODELING FRAMEWORK FOR IRT
2-1. The Hierarchical Multivariate Generalized Linear Model
2-1-1. The Level-1 Model for the HMGLM
2-1-2. The Level-2 Model for the HMGLM
2-1-3. The Level-3 Model for the HMGLM
2-1-4. The Combined Model for the HMGLM
2-2. A New Model 1: The Hierarchical Multivariate Generalized Linear-Partial Credit Model (HMGL-PCM)
2-3. A New Model 2: The Hierarchical Multivariate Generalized Linear-Rating Scale Model (HMGL-RSM)
2-4. Assumptions
2-5. Estimation

3. PARAMETER RECOVERY AND EXAMPLE
3-1. Simulation Design
3-1-1. Design
3-1-2. Analysis
3-2. Parameter recovery results
3-2-1. Descriptive Statistics
3-2-2. RMSE
3-3. Example
3-3-1. Design
3-3-2. Descriptive Statistics
3-3-3. Results

4. EXTENDING THE HMGL-RSM TO INCLUDE PERSON COVARIATES
4-1. The HMGL-RSM with Person Covariates
4-1-1. The Level-1 Model with Person Covariates
4-1-2. The Level-2 Model with Person Covariates
4-1-3. The Level-3 Model with Person Covariates
4-1-4. The Combined Model with Person Covariates
4-2. Simulation Study for the HMGL-RSM with Person Covariates
4-2-1. Design
4-2-2. Analysis
4-2-3. Results: Descriptive Statistics
4-2-4. Results: RMSE
4-3. Example Analysis of the HMGL-RSM with Person Covariates
4-3-1. Design
4-3-2. Analysis
4-3-3. Results

5. EXTENDING THE HMGL-RSM TO INCLUDE A GROUP LEVEL
5-1. The Four-Level HMGL-RSM
5-1-1. The Level-1 Model
5-1-2. The Level-2 Model
5-1-3. The Level-3 Model
5-1-4. The Level-4 Model
5-1-5. The Combined Model
5-2. Simulation Study for the Four-Level HMGL-RSM
5-2-1. Design
5-2-2. Analysis
5-2-3. Results: Descriptive Statistics
5-2-4. Results: RMSE
5-2-5. Results: Accuracy
5-3. Example Analysis of the Four-Level HMGL-RSM
5-3-1. Design
5-3-2. Analysis
5-3-3. Results

6. EXTENDING THE HMGL-RSM TO INCLUDE ITEM COVARIATES
6-1. The HMGL-RSM with Item Covariates
6-1-1. The Level-1 Model with Item Covariates
6-1-2. The Level-2 Model with Item Covariates
6-1-3. The Level-3 Model with Item Covariates
6-1-4. The Combined Model with Item Covariates
6-2. Simulation Study for the HMGL-RSM with Item Covariates
6-2-1. Design
6-2-2. Analysis
6-2-3. Results: Descriptive Statistics
6-2-4. Results: RMSE
6-3. Example Analysis of the HMGL-RSM with Item Covariates
6-3-1. Design
6-3-2. Analysis
6-3-3. Results

7. CONCLUSIONS AND FUTURE DIRECTIONS
7-1. Conclusions
7-1-1. Contributions
7-1-1.1. Special Estimation Software is Not Necessary
7-1-1.2. Common Notation
7-1-1.3. Well-Known Score Functions and Information Matrices
7-1-1.4. Common Estimation Method
7-2. Limitations
7-2-1. Item Discrimination Parameter is not Modeled
7-2-2. Data Preparation is Cumbersome
7-2-3. Possibly Long Estimation Times
7-2-4. Unbalanced Data
7-2-5. Non-Normal Distribution for Random Effects Not Investigated
7-3. Future directions

APPENDIX A: Example SAS Code for Estimating the HMGL-RSM for a Polytomous Test with 10 Items
APPENDIX B: Example SAS Code for Estimating the HMGL-PCM for a Polytomous Test with 10 Items
APPENDIX C: Example of the Input Data Structure
REFERENCES

LIST OF TABLES

1. The Signal Detection Model for the Rating Probabilities (pka)
2. Item Parameters Used in the Simulation
3. Mean and Standard Error of 6 and Z for the Simulated 100, 500, and 1000 Persons
4. Mean and Standard Error of the Parameter Estimates for the RSM when J = 10
5. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when J = 10
6. Mean and Standard Error of the Parameter Estimates for the RSM when J = 25
7. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when J = 25
8. RMSE for the RSM and HMGL-RSM across 10 Items
9. RMSE for the RSM and HMGL-RSM across 25 Items
10. Parameter Estimates for the HMGL-RSM and -PCM
11. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when 1031 = .2
12. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when 1031 = .5
13. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when 1031 = 1
14. RMSE for the HMGL-RSM with Person Covariates
15. Parameter Estimates for the MRCMM and HMGL-RSM With SES as a Person Covariate
16. DIF results for the Mantel-Haenszel test
17. Mean and Standard Error of the Parameter Estimates for the Four-Level HMGL-RSM for Proportion = 10%
18. Mean and Standard Error of the Parameter Estimates for the Four-Level HMGL-RSM for Proportion = 25%
19. RMSE for the Four-Level HMGL-RSM
20. Hit Rates for Detecting DIF with the HMGL-RSM
21. Hit Rates for Detecting DIF with the MH test
22. Item Analysis of a Real Data Set
23. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM for Model 1
24. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM for Model 2
25. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM for Model 3
26. RMSE for the HMGL-RSM with Item Covariates
27. Demographic Information
28. Parameter Estimates for the HMGL-RSM With Age as an Item Covariate

Chapter 1. Introduction
1-1. Motivation of the study

In recent years, educational researchers have combined the theory and methods of
Hierarchical Linear Modeling (HLM; Goldstein, 2003; Raudenbush & Bryk, 2002;
Snijders & Bosker, 1999) and Item Response Theory (IRT; Lord, 1980). For example,
Kamata (1998, 2001), Maier (2000, 2001), Fox and Glas (1998), and Adams and Wilson
(1996) used the HLM framework to define IRT models for dichotomously scored items.
As they illustrate, one advantage of unifying HLM and IRT methods is that postulating
IRT models becomes increasingly flexible. For example, traditional IRT models (e.g., the
1-parameter model; Lord, 1980) may be formulated to include covariates (Cheong &
Raudenbush, 2000; Fox, In press, a; Kamata, 1998, 2001).

Another advantage of unifying IRT and HLM is that the IRT parameters and their
standard errors may be estimated more precisely (Maier, 2000, 2001, 2002; Mislevy,
1987). That is, by applying the HLM framework, a Level-1 model is defined in which the
item parameters in an IRT model are fixed and nested within a Level-2 model. The
Level-2 model defines the person parameters as randomly varying. By considering
the nested relationship (an item level nested within a person level), the variation of the
responses within persons and between persons is taken into account, and estimation
methods may obtain better precision.

Unfortunately, these advantages come with a few disadvantages. For instance,
although the aforementioned IRT models were suitable for items that were scored
dichotomously, they were not suitable for items that were scored using partial credit (i.e.,
polytomous items). To compensate for this limitation, Adams and colleagues (Adams &
Wilson, 1996; Adams et al., 1997), Maier (2000, 2002), Patz and colleagues (Patz, 1996
as cited by Patz, Junker, and Johnson, 1999; Patz, Junker, and Johnson, 1999; Patz,
Junker, Johnson, & Mariano, 2002), Donoghue and Hombo (2003), Rijmen, Tuerlinckx,
De Bock, and Kuppens (2003), and Tuerlinckx and Wang (2004) developed IRT models
using a hierarchical framework for polytomous items. However, these models were
limited in at least one of two ways.

The first limitation was that some of these models did not allow predictor variables to be
modeled to help explain the variation in the item and person parameters (e.g., Donoghue & Hombo,
2003; Patz, 1996 as cited by Patz, Junker, and Johnson, 1999; Patz, Junker, and Johnson,
1999; Patz, Junker, Johnson, & Mariano, 2002). As mentioned above, it may be
important to control for the influences of predictor variables in a psychometric testing
environment (Cheong & Raudenbush, 2000; Fox, In press, a; Kamata, 1998, 2001).
Although Adams et al.’s model may include predictor variables for the person parameter,
to date, their model may not include predictors of item behaviors. In addition, although
Maier’s model (2000, 2002) may be extended to include predictor variables (e.g., Fox, In
press, a), the ease with which this may be accomplished is arguable. If a researcher
believes that person covariates and predictors of item behaviors should be controlled for,
then a more flexible model is not only desired but should be employed.

The other limitation was that the correlation between categories of a polytomous
item may not be sufficiently accounted for in the model (e.g., Adams et al., 1997;
Donoghue & Hombo, 2003; Maier, 2000, 2002; Patz, 1996 as cited by Patz, Junker, and
Johnson, 1999; Patz, Junker, and Johnson, 1999; Patz, Junker, Johnson, & Mariano,
2002). That is, the aforementioned models treat the item response as being sampled from
a univariate distribution. However, in some cases, the categories of an item merely
represent nominal variables; that is, the categories are simply labels. For example, an
item with the categories ‘negative’, ‘neutral’, and ‘positive’ may be considered as three
separate dichotomous indicator variables labeled ‘negative feeling’, ‘neutral feeling’, and
‘positive feeling’, each with the possibilities ‘yes’ or ‘no’. Viewed this way, each
category represents a variable, and the response itself is a vector of 0s and a 1, and should
be treated as if being sampled from a multivariate distribution (Fahrmeir & Tutz, 2001).
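
The indicator coding described above can be sketched in a few lines of Python. This is only an illustration: the function name `to_indicator_vector` and the sample responses are assumptions made for the sketch and do not come from the dissertation.

```python
# Category labels taken from the three-category example in the text.
CATEGORIES = ("negative", "neutral", "positive")


def to_indicator_vector(response, categories=CATEGORIES):
    """Return a list of 0s and a single 1 marking the chosen category.

    Hypothetical helper: each position answers 'yes' (1) or 'no' (0)
    for one category, so a response becomes a multivariate outcome.
    """
    if response not in categories:
        raise ValueError(f"unknown category: {response!r}")
    return [1 if label == response else 0 for label in categories]


# One person's responses to three items (made-up data for illustration).
responses = ["neutral", "positive", "negative"]
coded = [to_indicator_vector(r) for r in responses]
print(coded)
```

Each coded response sums to one, which is why the three indicators cannot be treated as independent univariate outcomes.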

Below, a general framework is proposed that uses HLM to model various IRT
models. This is accomplished by applying a multivariate generalized linear modeling
framework within HLM. The model and framework are relatively new and are commonly
seen in the statistical literature under the heading of ‘Multivariate Generalized Linear
Mixed Model’ (MGLMM; e.g., Fahrmeir & Tutz, 2001; Gueorguieva, 2001; Hartzel,
Agresti, & Caffo, 2001). Here, to be consistent with the majority of the educational
literature, rather than describing the model as being ‘mixed’, the model is described as
‘hierarchical’ and is labeled a Hierarchical Multivariate Generalized Linear Model
(HMGLM).

Additionally, although Tuerlinckx and Wang (2004) recently illustrated the
application of the MGLMM to IRT models and although it can be shown that the models
they define are similar to those that are defined here (in particular those in Chapter 3), the
focus of this dissertation, unlike the aforementioned studies, is to expand IRT models
using a particular framework: HLM, the hierarchical framework set forth by Goldstein (2003),
Raudenbush and Bryk (2002), and Snijders and Bosker (1999). And, unlike
Tuerlinckx and Wang (2004), HLM is used here to expand IRT models by conceptualizing the
units that are measured (e.g., persons and items) as being nested within one another (see
Chapter 2). Furthermore, this provides a more ‘natural’ way of conceptualizing
hierarchical polytomous IRT models. Therefore, by using the HLM framework to apply
the HMGLM to IRT, readers may better see the hierarchical relationships that exist in
educational testing data.

However, the purpose of applying the HMGLM to IRT and HLM is not
necessarily to develop an alternative framework for modeling and estimating IRT models
per se; rather, the purpose of applying the HMGLM is to develop a framework in which
the IRT models may be extended in various ways, such as adding person covariates and
predictors of item behaviors. Specifically, the advantages of using the framework
provided by the HMGLM are that (1) both of the aforementioned limitations are avoided,
i.e., polytomous IRT models may be extended to include person covariates and predictors
of item behaviors, and the correlation between categories of a polytomous item may be
accounted for; (2) models using the HMGLM may currently be estimated using existing
software (e.g., SAS, 2001; STATA, 2000); (3) IRT and HLM are unified using a
common notation; (4) score functions and information matrices (which may be used for
parameter estimation) are well-known under the HMGLM (e.g., see Fahrmeir & Tutz,
2001); and (5) a broad class of IRT models within the HLM framework may be estimated
using a common method (e.g., maximum likelihood).

This paper consists of seven chapters. In Chapter 1, the motivation for unifying
HLM and IRT is discussed, and two limitations of the current IRT models within the
HLM framework are identified. In addition, Chapter 1 describes four approaches
for unifying HLM and polytomous IRT models, as well as the limitations associated with
each approach. Chapter 2 provides a detailed description of a new approach for unifying
HLM and polytomous IRT models. This new approach applies a hierarchical multivariate
generalized linear model. In addition, Chapter 2 presents a re-formulation of two
polytomous IRT models, the Rating Scale Model (Andrich, 1978) and the Partial Credit
Model (Masters, 1982), using the hierarchical multivariate generalized linear model.
Chapter 3 provides a simulation study for the parameter recovery of these models, as well
as an example analysis for illustrating the use and interpretation of the models. Chapter 4
simulates and illustrates the application of the hierarchical multivariate generalized linear
model in which the Rating Scale Model is extended to include person covariates. Chapter
5 simulates and illustrates the application of the hierarchical multivariate generalized
linear model in which the Rating Scale Model is extended to include a group level as a
measure of DIF. Chapter 6 simulates and illustrates the application of the hierarchical
multivariate generalized linear model in which the Rating Scale Model is extended to
include item covariates to explain DIF. Finally, Chapter 7 discusses the general
contributions of the hierarchical multivariate generalized linear model, both
methodologically and substantively, to the fields of HLM, IRT, and educational research.

1-2. Overview of Previous Hierarchical IRT Models for Polytomous Items

As Kamata (2001) points out, the unification of IRT and HLM occurred several
years ago across three separate fields: psychometrics (e.g., Adams et al., 1997), non-linear
mixed-effects modeling methods (e.g., Hedeker & Gibbons, 1993, as cited by
Kamata, 2001), and random-effect Bayesian modeling (e.g., Spiegelhalter, Thomas, Best,
& Gilks, 1996, as cited by Kamata, 2001). Since the fields essentially conducted their
work independently of one another, each pursued the unification from a different
perspective. Kamata (1998, 2001) continued this tradition by using a generalized linear
modeling approach in HLM. Below, each perspective is discussed in relation to IRT
models for polytomous items.

However, before doing so, two traditional, non-hierarchical IRT models for
polytomous items are briefly described: Masters’ (1982) Partial
Credit Model (PCM) and a special case of the PCM, the Rating Scale Model (Andrich,
1978). By doing so, the reader may recognize the transition that is made from modeling
non-hierarchically to modeling hierarchically, and the reader may notice the similarities
and differences between the current hierarchical IRT models for polytomous items.
Furthermore, these models and each perspective are discussed below using a common
example within a typical testing condition to illustrate how the concepts of IRT transfer
over to HLM.

1-2-1. Traditional, Non-Hierarchical Partial Credit and Rating Scale Models

Masters’ (1982) Partial Credit Model (PCM) defines the probability $\pi_{ijk}$ that
person k will respond in category i of item j as

\[
\pi_{ijk} = \frac{\exp \sum_{i'=0}^{i} (\theta_k - \delta_{i'j})}{\sum_{h=0}^{I} \exp \sum_{i'=0}^{h} (\theta_k - \delta_{i'j})}, \tag{1.1}
\]

where $\theta_k$ is the location of person k on the underlying latent trait continuum; $\delta_{ij}$ is
the location of a particular category i (i = 0, 1, ..., i', ..., I) for item j on the underlying
latent trait continuum; and i' and h are summation indices.
The PCM may be re-expressed in terms of logits; that is, as a model that describes
the log-odds that person k will select category i rather than category
i - 1 for item j:

\[
\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \theta_k - \delta_{ij}. \tag{1.2}
\]

Although $\theta_k$ and $\delta_{ij}$ may take on several different interpretations depending on
the testing environment (for example, in achievement testing $\theta_k$ is commonly referred to
as proficiency), here a personality testing environment is assumed, continuing
the example given in Section 1-1 in which each item contains three categories,
‘negative’, ‘neutral’, and ‘positive’. The personality test attempts to measure the latent
trait ‘honesty’ of each applicant. This is achieved by asking various types of
honesty questions, to which the applicant responds by selecting the one of the three
categories that represents his/her feelings toward the question. Hence, in our example,
$\theta_k$ is the honesty of applicant k, and $\delta_{ij}$ is the ‘attractiveness’ of a particular category i,
or feeling i, rather than i - 1 for each question j.
Thus, in a testing environment, the PCM suggests that the probability that a
person will select a particular category of a particular item depends not only on the
person’s location on the underlying latent trait continuum (in this case, honesty), but also
on the item’s category location on the underlying latent trait continuum (in this
case, the attractiveness of each feeling for each item).
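
To make the dependence on person and category locations concrete, the following Python sketch computes PCM category probabilities for one person and one item. It is a minimal illustration under stated assumptions: the function name `pcm_probabilities` and the numeric values are invented for the sketch, and the term for category 0 is set to zero because it is common to every category and cancels from the ratio.

```python
import math


def pcm_probabilities(theta, deltas):
    """Category probabilities for one person and one item under the PCM.

    theta  : person location (honesty of the applicant in the example)
    deltas : category locations for categories 1..I of the item
             (hypothetical values; category 0 has cumulative term 0,
             since its term cancels between numerator and denominator)
    """
    cumulative = [0.0]  # exponent for category 0
    for delta in deltas:
        # Each higher category adds one more (theta - delta) step.
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating for stability.
    peak = max(cumulative)
    numerators = [math.exp(c - peak) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]


# Made-up values: one applicant and one three-category item.
theta_k = 0.5
delta_j = [-1.0, 1.0]
probs = pcm_probabilities(theta_k, delta_j)

# Adjacent-category log-odds should equal theta minus the category location.
log_odds = [math.log(probs[i] / probs[i - 1]) for i in range(1, len(probs))]
print([round(p, 3) for p in probs])
print([round(x, 3) for x in log_odds])
```

The adjacent-category log-odds recovered at the end equal the person location minus the category location, which is the logit form of the model.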

Notice that the traditional model does not consider the hierarchical relationship
that exists between persons and items. To help illustrate this idea, it may be better to
think of persons as being schools and items as being students. Using this example, it is
easier to see that a set of students is nested within a particular school. Furthermore, if the
same test were given to the students across the different schools, it seems reasonable to
expect that student performance on the test would be more homogeneous within a
particular school than across schools, and, generally speaking, that performance would
differ from one school to another (e.g., a school in a higher SES location may perform
differently than a school in a lower SES location).

Thus, referring back to our original honesty example, it seems reasonable to argue
that items are nested within persons. Hence, it seems reasonable that a particular person's
responses will be more similar to one another than to the responses of
another person. Furthermore, it seems reasonable that, overall, one person’s responses are
heterogeneous when compared to another person’s responses. Yet the traditional
RSM and PCM do not consider the variation of the responses within persons and between
persons. Hence, in HLM terms, $\theta_k$ and $\delta_{ij}$ do not vary across the person or item level
and are considered fixed parameters. In other words, there is no Level-1 model for the
items that is defined within a Level-2 model for the persons.

Continuing then, Andrich’s (1978) Rating Scale Model (RSM) may be considered a special case of the PCM (as mentioned above). To obtain the RSM, the PCM is first re-expressed to model the overall location of each item on the underlying latent trait continuum and the response threshold of selecting category $i$ rather than $i-1$ (instead of modeling the item’s category location on the underlying latent trait continuum as before). One obtains

$$\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \theta_k - \delta_j - \tau_{ij}, \qquad (1.3)$$

where $\theta_k$ is given above; but now $\delta_{ij}$ is decomposed into two components, i.e., $\delta_{ij} = \delta_j + \tau_{ij}$, where $\delta_j$ is the overall attractiveness of item $j$ ($\delta_j = \frac{1}{I}\sum_{i=1}^{I}\delta_{ij}$); and $\tau_{ij}$ is the response threshold of being attracted to category $i$ rather than $i-1$, expressed as a deviation from the overall attractiveness of item $j$ ($\delta_j$).
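The decomposition just described can be sketched numerically. The snippet below is a minimal illustration, assuming that the overall attractiveness of an item is the mean of its category locations and that each threshold is the deviation from that mean; the function name `decompose_item` and the example values are hypothetical and are not taken from the dissertation.

```python
def decompose_item(delta_ij):
    """Split the category locations delta_ij (i = 1..I) of one item j
    into an overall location delta_j and thresholds tau_ij."""
    # Assumed: delta_j is the mean of the category locations.
    delta_j = sum(delta_ij) / len(delta_ij)
    # Thresholds are deviations from the overall item location.
    tau_ij = [d - delta_j for d in delta_ij]
    return delta_j, tau_ij


# Hypothetical category locations for one item.
delta_j, tau_ij = decompose_item([-1.0, 0.5, 2.0])
```

Because the thresholds are deviations from the item mean, they sum to zero within an item under this reading.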

However, if the category thresholds are constrained to be equal across items, i.e., $\tau_{ij} = \tau_i$, then the RSM may be considered a special case of the PCM

$$\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \theta_k - \delta_j - \tau_i, \qquad (1.4)$$

where $\delta_j$ is defined above; and $\tau_i$ is the threshold of being attracted to category $i$ rather than $i-1$ for all items.

Thus, in our example, the RSM suggests that the probability that a person will be
attracted to select a particular feeling for a particular item depends not only on the
person’s honesty, but also the overall attractiveness of the item and the threshold of being
attracted to feeling $i$ rather than $i-1$. Again, notice now that the thresholds do not vary
for each item; rather, the thresholds are common across items.

Additionally, notice that, like the PCM, the RSM does not consider the variation of the responses within persons and between persons. Hence, for the traditional PCM and RSM, the hierarchical nature is ignored, and all parameters are considered fixed parameters. That is, in HLM terms, there is no Level-1 model for the items that is defined within a Level-2 model for the persons.

(As an aside, note that the PCM and RSM are also appropriate for modeling dichotomous items, in which the dichotomous response is treated as being two categories (i.e., the 1-parameter model). Lastly, similar relationships hold for the hierarchical analog of the RSM.)

1-2-2. Random Coefficients in a Multinomial Model Approach

One approach for modeling IRT models in HLM was spearheaded by Adams and Wilson (1996) and Adams et al. (1997). In their approach, they applied a multinomial model that incorporated random coefficients for the modeling of the person’s location on the underlying continuum. Specifically, the Level-1 model for their aptly named Multidimensional Random Coefficient Multinomial Model (MRCMM) is defined as

 

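$$\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \eta_{ijk} = \mathbf{b}'_{ij}\boldsymbol{\theta}_k + \mathbf{a}'_{ij}\boldsymbol{\xi}, \qquad (1.5)$$

where $\pi_{ijk}$ is defined above; $\mathbf{b}'_{ij}$ is a vector of scores for the vector of multiple dimensions ($\boldsymbol{\theta}_k$) for person $k$; and $\mathbf{a}'_{ij}$ is a design vector for the set of item parameters ($\boldsymbol{\xi}$), i.e., $\delta_j$ and $\tau_i$. Notice that the item parameters ($\boldsymbol{\xi}$) may be considered fixed.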

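The Level-2 model specifies the random distribution of $\boldsymbol{\theta}_k$, which may linearly depend on predictor variables (e.g., SES, gender, etc.)

$$\boldsymbol{\theta}_k = \mathbf{x}_k\boldsymbol{\beta} + \boldsymbol{\varepsilon}_k, \qquad (1.6)$$

where $\mathbf{x}_k$ is a vector of the covariate scores; $\boldsymbol{\beta}$ is a matrix of the fixed regression coefficients for the covariates; and $\boldsymbol{\varepsilon}_k \sim N(0, \sigma_{\varepsilon}^2)$.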

If the model is constrained to be unidimensional (Adams & Wilson, 1996), and constraints are placed on the item parameters ($\boldsymbol{\xi}$), then Adams et al. (1997) have shown this model to be a hierarchical generalization of the PCM (e.g., see Rijmen et al. (2003)) and RSM (as well as a generalization for the 1-parameter model, cf. Lord, 1980; Adams & Wilson, 1996). Additionally, Adams and colleagues (Wang, Wilson, & Adams, 1998) showed that the MRCMM is a generalization of the models proposed by Andersen (1985) and Embretson (1991), in which covariates were used to measure change (in the person parameter $\theta_k$).

Continuing our example then, the MRCMM suggests that the probability that an
applicant will be attracted to select a particular feeling for a particular item depends not
only on the applicant’s honesty, but also the overall attractiveness of the item and the
threshold of being attracted to feeling $i$ rather than $i-1$. Additionally, if the researcher
has reason to believe that the applicant’s honesty may be inﬂuenced by other variables,
such as his or her criminal history or the number of occasions he or she has taken the test,
then these covariates may be controlled for as well (Equation (1.6)).

Furthermore, unlike the traditional PCM and RSM, the random coefficients in a multinomial model approach considers the variation of the responses within persons and between persons. This is seen in the Level-1 and Level-2 models (Equations (1.5) and (1.6)) when the item parameters ($\delta_j$ and $\tau_i$) are treated as fixed effects and are nested within the random effect of persons ($\theta_k$).

Unfortunately, as mentioned above, the MRCMM is currently limited in that the software for estimating the parameters (i.e., ConQuest, 1998) may only estimate models that contain predictor variables at the person level. The MRCMM may not be applied when modeling predictor variables for the item parameters, nor may it be applied when controlling for the correlated relationships of the multivariate response vectors.

1-2-3. Bayesian Modeling of Random-Effects Approach

Another approach for modeling polytomous IRT models in HLM was proposed by Maier (2000, 2002) and Fox (In press, b). In the approach, Bayesian procedures are applied to the modeling of the random effects of the PCM, which may be represented as a Means-as-Outcomes model in the HLM framework (Maier, 2000, 2002; Raudenbush & Bryk, 2002). Specifically, in logit form, Maier’s model is given by

 

$$\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \eta_{ijk} = \theta_{rk} - \delta_{ij}, \qquad (1.7)$$

where $\pi_{ijk}$ is defined above; $\delta_{ij}$ is the PCM parameterization of $\delta_j$ and $\tau_i$; and $\theta_{rk}$ is the ability of person $k$ for response set $r$. Note that $\delta_{ij}$ is treated as a fixed parameter, and is interpreted as the location of a particular category $i$ for item $j$ on the underlying latent trait continuum.

The Level-1 and Level-2 models specify the hierarchical nature of $\theta_{rk}$. Specifically, the Level-1 model states

$$\theta_{rk} = \alpha_k + \varepsilon_{rk}, \qquad (1.8)$$

where $\varepsilon_{rk}$ is the random error associated with the random intercept $\alpha_k$ of person $k$ for response set $r$, $\varepsilon_{rk} \sim N(0, \sigma_{\varepsilon}^2)$.

The Level-2 model defines $\alpha_k$. It is given as

$$\alpha_k = \mathbf{W}_k\boldsymbol{\gamma} + \nu_{0k}, \qquad (1.9)$$

where $\mathbf{W}_k = (1, W_{1k}, \ldots, W_{p-1,k})$ is a matrix containing the $p$ predictor variables; $\boldsymbol{\gamma} = (\gamma_0, \gamma_1, \ldots, \gamma_{p-1})$ is a matrix containing the fixed regression coefficients for the $p$ predictor variables; and $\nu_{0k}$ is the random error associated with the fixed regression coefficients $\boldsymbol{\gamma}$ for person $k$, $\nu_{0k} \sim N(0, \sigma_{\nu}^2)$.

Referring back to the example, the Bayesian modeling of random-effects
approach models applicant behavior similarly to the random coefﬁcients in a multinomial
model approach, so the concepts will not be repeated here. However, one of the primary
differences between the two approaches (and the traditional RSM) is that the Bayesian
approach speciﬁcally models the variation of the responses within persons in the Level-1

model (i.e., $\varepsilon_{rk}$ in Equation (1.8)), and it specifically models the variation of the
responses between persons in the Level-2 model (i.e., $\nu_{0k}$ in Equation (1.9)).
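To make the two sources of variation concrete, the following is a minimal simulation sketch of the structure in Equations (1.8) and (1.9). It is not Maier's estimation procedure (which is Bayesian); it only draws abilities from the two-level structure. The function name, argument names, seed, and all numeric values are hypothetical.

```python
import random


def simulate_abilities(predictors, gamma, sd_between, sd_within, n_sets, rng):
    """Draw theta_rk for each person k and response set r.

    predictors : one row per person, each row starting with 1 (intercept)
    gamma      : fixed regression coefficients, same length as a row
    """
    abilities = []
    for w_k in predictors:
        # Level 2: alpha_k = W_k * gamma + nu_0k (between-person error)
        alpha_k = sum(w * g for w, g in zip(w_k, gamma))
        alpha_k += rng.gauss(0.0, sd_between)
        # Level 1: theta_rk = alpha_k + eps_rk (within-person error)
        abilities.append(
            [alpha_k + rng.gauss(0.0, sd_within) for _ in range(n_sets)]
        )
    return abilities


rng = random.Random(2004)
# Two hypothetical persons with one binary predictor, three response sets.
draws = simulate_abilities([[1, 0], [1, 1]], [0.2, 0.5], 1.0, 0.3, 3, rng)
# With both standard deviations at zero, only the fixed part remains.
fixed_only = simulate_abilities([[1, 0], [1, 1]], [0.2, 0.5], 0.0, 0.0, 3, rng)
```

Setting both standard deviations to zero recovers the fixed regression part alone, which is a quick way to check that the two error terms enter at the intended levels.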

Unfortunately, the Bayesian modeling of random-effects approach does not
adequately account for the correlated relationships of the multivariate response vectors.
Additionally, the estimation of parameters using a fully Bayesian approach requires
speciﬁcation of a prior distribution. However, as models become more complex (which is
the case as one includes predictor variables), an inappropriate choice for the prior

distribution may lead to an improper posterior distribution, which may not be detected by


MCMC methods. Also, some researchers may not accept the fully Bayesian perspective

and may believe in applying other theoretical perspectives, e.g., a frequentist perspective.

1-2-4. Rater Effects Approach

A third approach, a rater effects approach, was developed by Patz and colleagues (Patz, 1996 as cited by Patz, Junker, and Johnson, 1999; Patz, Junker, and Johnson, 1999; Patz, Junker, Johnson, & Mariano, 2002). It is fairly different from the previous approaches in that it applies a generalizability framework within an HLM framework to obtain a ‘rater effect’. Specifically, the approach is given by the Hierarchical Rater Model (HRM), which is essentially a 3-Level model in which the ratings of a rater are nested within item responses, which in turn are nested within a person’s location on the underlying continuum.

Speciﬁcally, at Level-1 (which Patz and colleagues describe as the ﬁrst stage
model), the model is deﬁned by

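$$\log\left(\frac{\pi_{\xi jk}}{\pi_{\xi-1,jk}}\right) = \text{logit}\left[P\left(\xi_{jk} = \xi \mid \theta_k, X_{jkm} \in \{\xi, \xi-1\}\right)\right] = \eta_{\xi jk} = \theta_k - \delta_j - \tau_{\xi j}, \qquad (1.10)$$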

where $\xi_{jk}$ is an ideal, unobserved, latent trait rating variable that describes person $k$’s performance on item $j$, which follows (any IRT model, but in this case) the PCM (where $\delta_{ij}$ is decomposed into two components, i.e., $\delta_{ij} = \delta_j + \tau_{ij}$, such that $\delta_j$ is the overall attractiveness of item $j$ ($\delta_j = \frac{1}{I}\sum_{i=1}^{I}\delta_{ij}$); and $\tau_{ij}$ is the response threshold of being attracted to category $i$ rather than $i-1$, expressed as a deviation from the overall attractiveness of item $j$ ($\delta_j$)); and $X_{jkm}$ is the signal detection model (see, e.g., Table 1) for rater $m$ who rates person $k$ on item $j$, which follows the Level-2 model described below. Note that $\tau_{\xi j}$ now describes the threshold of the ideal, latent rating $\xi$ for item $j$, rather than the observed rating $i$. Also, note that $\delta_j$ and $\tau_{\xi j}$ are considered fixed effects.

Table 1. The Signal Detection Model for the Rating Probabilities ($p_{\xi i m}$)

                        Observed rating ($i$)
Ideal rating ($\xi$)    0          1          ...    I
0                       p_{00m}    p_{01m}    ...    p_{0Im}
1                       p_{10m}    p_{11m}    ...    p_{1Im}
...                     ...        ...        ...    ...
I                       p_{I0m}    p_{I1m}    ...    p_{IIm}

Note. $p_{\xi i m}$ is the probability that rater $m$ rates the observed rating $i$ given the
ideal rating $\xi$.

The Level-2 model (which Patz and colleagues described as the second stage model) describes the relationship between one or more raters’ rating $i$ and the ideal rating category $\xi_{jk}$ ($\xi = 0, 1, \ldots, I$). The model is a discrete signal detection problem using a matrix of rating probabilities $p_{\xi i m} = P(\text{rater } m \text{ rates } i \mid \xi_{jk})$, as seen in Table 1. Although the density of $p_{\xi i m}$ for each row in Table 1 may take any form, Patz and colleagues used a normal density (see Patz, Junker, and Johnson (1999) for the parameterization of the normal density).

Finally, the Level-3 model (which follows from the HLM framework) defines $\theta_k$
as a random effect that is distributed as $N(\mu, \sigma_{\theta}^2)$.
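The rating-probability matrix of Table 1 can be sketched as follows. This is only an illustration under stated assumptions: each row is taken to be a normal kernel centred at the ideal rating plus a rater shift and then normalized over the observed categories. The text above says only that a normal density was used, so the exact kernel, the parameter names `bias` and `spread`, and the example values are assumptions rather than the published parameterization.

```python
import math


def rating_probabilities(n_categories, bias, spread):
    """Return p[xi][i] = P(rater assigns i | ideal rating xi).

    bias   : hypothetical rater shift relative to the ideal rating
    spread : hypothetical rater variability (standard deviation)
    """
    matrix = []
    for xi in range(n_categories):
        # Unnormalized normal kernel centred at xi + bias.
        weights = [
            math.exp(-((i - (xi + bias)) ** 2) / (2.0 * spread ** 2))
            for i in range(n_categories)
        ]
        total = sum(weights)
        # Normalize so each row is a probability distribution.
        matrix.append([w / total for w in weights])
    return matrix


# A hypothetical unbiased, fairly consistent rater on a 3-category item.
p = rating_probabilities(n_categories=3, bias=0.0, spread=0.5)
```

Under these assumptions, an unbiased rater places the largest probability in each row on the ideal category, and a nonzero `bias` shifts that mass toward harsher or more lenient ratings.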

To better understand the rater effects approach, the personality testing example is referred to again. Recall that in this example we have an applicant who is responding to an honesty exam, in which each item asks the applicant to select one of three categories: negative, neutral, or positive. However, for the rater effects approach, instead of selecting a category, the applicant is asked to provide a response to an open-ended question. For this response, a rater (or multiple raters) is asked to rate the applicant’s response for each item as being in one of the aforementioned categories. Thus, the rater effects approach suggests that the probability that an applicant will fall into a particular category of feeling for a particular item depends not only on the applicant’s honesty, but also on the overall attractiveness of the item and the threshold at which a rater assigns a particular feeling $i$ rather than $i-1$ for each question $j$.

Additionally, unlike the previous approaches, the rater effects approach models
the variation of the responses within persons and between persons by applying a
generalizability approach. Speciﬁcally, this approach attempts to measure the nested
effect of the rater’s ratings on the person’s item responses (see, e.g., the Level-2 model
depicted in Table 1). Additionally, as mentioned above, this effect is nested within the
Level-3 model, the person level model, which models the variation of the responses between persons as random effects ($\theta_k$).

Unfortunately, although the rater effects approach effectively estimates the rater
effect for simulated data (e.g., Donoghue & Hombo, 2003; Patz, Junker, and Johnson,
1999; Patz, Junker, Johnson, & Mariano, 2002), the approach does not consider the
modeling of predictor variables for persons and items, and the approach does not
adequately account for the correlated relationships of the multivariate response vectors.
Additionally, researchers report that, when compared to non-hierarchical rater effects

models, the precision of estimates afforded by the HLM framework was not observed


when applying the model to real data. Also, researchers complained that the estimation of

the parameters was relatively “labor and time intensive” (Barr & Raju, 2003, p.41).

1-2-5. A Hierarchical, Univariate Generalized Linear Model Approach

The last approach discussed here for modeling polytomous IRT models in HLM essentially extends the work of Kamata (1998, 2001), which proposed using a hierarchical, univariate generalized linear model (GLM) to parameterize an IRT model for dichotomous items (i.e., the 1-parameter model, Lord, 1980). To illustrate the approach, the models are first defined using the notation typically applied in hierarchical GLM. Then, the parameters are described in terms of how the model relates to the traditional IRT parameters.

The hierarchical, univariate GLM approach is defined by applying a multinomial model using a baseline-category logit link function (Raudenbush & Bryk, 2002). The reason for doing so is to illustrate the equivalence between the adjacent-category link function and the baseline-category link function, which is used in the popular text by Raudenbush and Bryk (2002) and briefly noted by Rijmen et al. (2003).

Speciﬁcally, the Level-1 model uses a regression-type formulation, and is deﬁned

by

(1.11)


where $\pi_{ijk}$ is the probability that the observed response of person $k$ on item $j$ falls in category $i$; $\pi_{Ijk}$ is the probability that the observed response of person $k$ on item $j$ falls in the ‘baseline’ category $I$; $X_{qjk}$ is the $q$th dummy variable for person $k$, with values 1 when $q = j$, and 0 when $q \neq j$ for item $j$; and, for person $k$, $\beta_{qijk}$ is the regression coefficient of category $i$ for item $j$. Thus, for the Level-1 model, the category level model, the regression coefficient of category $i$ for item $j$ ($\beta_{qijk}$) measures the overall effect (i.e., mean effect) of category $i$ for item $j$, which one may notice is assumed to be fixed for each category of each item (i.e., there are no random effects added to the Level-1 model).
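To model how the category effects behave across items, the Level-2 model, the item level model, is defined. Specifically, for the PCM, the Level-2 model may be defined as

$$\begin{aligned}
\beta_{q0jk} &= \gamma_{q0jk} + \sum_{i'=0}^{I-1}\gamma_{1i'jk}\,w_{1i'jk}\\
\beta_{q1jk} &= \gamma_{q0jk} + \sum_{i'=0}^{I-1}\gamma_{1i'jk}\,w_{1i'jk}\\
&\;\;\vdots\\
\beta_{q,I-1,jk} &= \gamma_{q0jk} + \sum_{i'=0}^{I-1}\gamma_{1i'jk}\,w_{1i'jk}
\end{aligned} \qquad (1.12)$$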

where, for person $k$, $\gamma_{q0jk}$ is the mean effect of item $j$ across categories $i$; $\gamma_{1ijk}$ is the effect of item $j$ on a particular category $i$; and $w_{1i'jk}$ is a dummy variable with values 1 if $i' = i$ for the $j$th item answered by person $k$, and 0 otherwise.

In contrast, for the RSM, it is assumed that the effect of item $j$ on a particular category $i$ is equal for all items; hence, the constraint that $\gamma_{1i1k} = \gamma_{1i2k} = \cdots = \gamma_{1iJk} = \gamma_{1ik}$ is made, and the Level-2 model for the RSM becomes

$$\beta_{qijk} = \gamma_{q0jk} + \sum_{i'=0}^{I-1}\gamma_{1i'k}\,w_{1i'jk}, \qquad (1.13)$$

where $\gamma_{q0jk}$ is defined above; $\gamma_{1ik}$ is the effect of item $j$ on a particular category $i$, which is common across the $J$ items; and $w_{1i'jk}$ is a dummy variable with values 1 if $i' = i$ for the $j$th item answered by person $k$, and 0 otherwise.

Continuing with the RSM (where analogous definitions apply to the PCM), the Level-3 model, the person level model, models how the aforementioned effects behave at the person level. Specifically, the Level-3 model is defined as

$$\gamma_{q0jk} = \lambda_{q0j0} + u_{qjk} \qquad (1.14)$$
$$\gamma_{1ik} = \lambda_{1i0} \qquad (1.15)$$

where $\lambda_{q0j0}$ is the mean effect of persons on item $j$; $u_{qjk}$ is the unique, random effect of person $k$ (i.e., $u_{qjk}$ is the deviation of person $k$ from the fixed, category intercept ($\lambda_{q0j0}$)); and $\lambda_{1i0}$ is the mean change in $\lambda_{q0j0}$ for a particular category $i$, for all persons.

However, in a testing environment, it is assumed that the unique effect ($u_{qjk}$) of person $k$ does not vary across the categories of an item $j$. Hence, the effects are constrained to be equal for each category $i$ of each item $j$, i.e., $u_{q1k} = u_{q2k} = \cdots = u_{qJk} = u_k$, $u_k \sim N(0, \sigma_u^2)$. And, the Level-3 model for $\gamma_{q0jk}$ becomes

$$\gamma_{q0jk} = \lambda_{q0j0} + u_k, \qquad (1.16)$$

where $\gamma_{q0jk}$ and $\lambda_{q0j0}$ are defined above; and $u_k$ is the random effect of person $k$.

Thus, for the person level model, the mean effect of category $i$ for item $j$ ($\lambda_{q0j0}$) varies for each item, but is fixed for each person $k$. And, the unique effect of person $k$ ($u_k$) on the mean effect of category $i$ for item $j$ is constant for each $\lambda_{q0j0}$. Lastly, the effect of the item on a particular category $i$ ($\lambda_{1i0}$) varies for each category $i$, but is fixed for each person $k$ (and constant across the $J$ items).
However, the baseline-category parameterization implies that the regression coefficient of category $i$ for item $j$ is the mean effect of category $i$ for the $j$th item relative to the baseline category $I$, i.e.,

$$\tilde{\beta}_{qijk} = \beta_{qijk} - \beta_{qIjk}. \qquad (1.17)$$

But, rather than a baseline-category parameterization (such as that discussed by Raudenbush and Bryk, 2002), popular polytomous IRT models apply an adjacent-category parameterization (i.e., Agresti, 1996, 2002; Andrich, 1978; Hartzel et al., 2001; Masters, 1982; Wright & Masters, 1982), e.g., see the RSM in Equation (1.4). Therefore, the correct effect of interest is not the effect of category $i$ for the $j$th item from the baseline category $I$ (Equation (1.17)); rather, the correct effect is the effect of category $i$ for the $j$th item from the adjacent category $i-1$

$$\beta^{*}_{qijk} \equiv \tilde{\beta}_{qijk} - \tilde{\beta}_{q,i-1,jk}. \qquad (1.18)$$

This implies that to obtain the adjacent-category effect ($\lambda_{q0j0}$) from the baseline-category parameterizations, one must do the following:

$$\begin{aligned}
\beta^{*}_{qijk} &\equiv \tilde{\beta}_{qijk} - \tilde{\beta}_{q,i-1,jk}\\
&= (\beta_{qijk} - \beta_{qIjk}) - (\beta_{q,i-1,jk} - \beta_{qIjk})\\
&= \beta_{qijk} - \beta_{q,i-1,jk}.
\end{aligned} \qquad (1.19)$$
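The cancellation of the baseline term in Equations (1.17) through (1.19) can be checked numerically. The sketch below is a minimal illustration; the function name and the coefficient values are hypothetical, and the list `beta` simply stands in for the coefficients of categories $0, \ldots, I$ of one item for one person.

```python
def adjacent_category_effects(beta):
    """beta[i] is the coefficient of category i (i = 0..I);
    the last entry, beta[I], belongs to the baseline category."""
    baseline = beta[-1]
    # Eq. (1.17): effect of each category relative to the baseline.
    from_baseline = [b - baseline for b in beta]
    # Eqs. (1.18)-(1.19): difference of neighbouring baseline effects.
    return [
        from_baseline[i] - from_baseline[i - 1]
        for i in range(1, len(beta))
    ]


beta = [0.3, 1.1, 0.4, 0.0]  # hypothetical coefficients, I = 3
adjacent = adjacent_category_effects(beta)
# Direct differences of the raw coefficients, for comparison.
direct = [beta[i] - beta[i - 1] for i in range(1, len(beta))]
```

The two lists agree up to floating-point error, which reflects the last line of Equation (1.19): the baseline coefficient drops out of every adjacent-category contrast.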
Taking the equations above, this suggests the following. The mean effect of category $i$ from the adjacent category $i-1$ ($\lambda_{q0j0}$), in the HLM framework, is analogous to (the negative of) the location of a particular category $i$ for item $j$ on the underlying latent trait continuum ($-\delta_j$), in the IRT framework. Additionally, the effect of the item on a particular category $i$ ($\lambda_{1i0}$), in the HLM framework, is analogous to (the negative of) the threshold of a particular category $i$ ($-\tau_i$), in the IRT framework. Lastly, the location of person $k$ on the underlying latent trait continuum ($\theta_k$), in the IRT framework, is analogous to the unique effect of person $k$ ($u_k$). In short, the parameters for the traditional RSM are equivalent to the parameters in the hierarchical GLM in the following manner:

$$\delta_j = -\lambda_{q0j0} \qquad (1.20)$$
$$\tau_i = -\lambda_{1i0} \qquad (1.21)$$
$$\theta_k = u_k. \qquad (1.22)$$

Therefore, in the personality testing example, the hierarchical GLM approach is very similar to the random coefficients in a multinomial model approach in that the probability that an applicant will be attracted to a particular feeling for a particular item depends not only on the applicant’s honesty, but also on the attractiveness that an applicant will select a particular feeling $i$ rather than $i-1$ for each question $j$. However, rather than modeling the parameters directly like the random coefficients approach, the hierarchical GLM approach models effects, that is, the overall attractiveness of an item as well as the effect of the item on a particular category (i.e., the Level-2 model; Equations (1.12) or (1.13)), while the honesty of an applicant is modeled using a unique effect that is treated as random at the Level-3 model (Equation (1.16)).

Furthermore, like the random coefﬁcients approach, the hierarchical GLM
approach can model person covariates; however, unlike the random coefﬁcients
approach, the hierarchical GLM approach can also model predictors of item behaviors.
Since modeling person covariates and predictors of item behaviors are very similar for
the hierarchical, univariate GLM approach and the hierarchical, multivariate GLM
(which is the main focus of the paper), this discussion is left for Chapters 4, 5, and 6.

One limitation of the hierarchical univariate GLM approach is that the approach
does not adequately account for the correlated relationships of the multivariate response

vectors.


Chapter 2. A Hierarchical Multivariate Generalized Linear Modeling Framework for
IRT
2-1. The Hierarchical Multivariate Generalized Linear Model

As stated earlier, the purpose of this paper is to develop a framework for modeling IRT models in HLM such that traditional IRT models may be extended in various manners. This framework will not only attempt to develop models that avoid the limitations of the previous models (i.e., polytomous IRT models may be extended to include person and item-specific covariates, and the correlation between categories of a polytomous item may be accounted for), but the model is also advantageous to apply because, as mentioned above, (1) models using HMGLM may currently be estimated using existing software (e.g., SAS, 2001; STATA, 2000); (2) IRT and HLM are unified using a common notation; (3) score functions and information matrices (which may be used for parameter estimation) are well-known under the HMGLM (e.g., see Fahrmeir & Tutz, 2001); and (4) a broad class of IRT models within the HLM framework may be estimated using a common method (e.g., maximum likelihood).

Using the notation typically applied in hierarchical GLM, the hierarchical models
for the HMGLM, which has its roots in the multivariate framework provided by Fahrmeir
and Tutz (2001), Gueorguieva (2001), and Hartzel et al. (2001), are deﬁned. As
mentioned previously the models deﬁned here in Chapter 2 may resemble those deﬁned
recently by Tuerlinckx and Wang (2004); however, one reiterates that, unlike the
aforementioned authors, the models below are deﬁned by explicitly modeling the nested
levels. Specifically, the Level-1 model defines the category level. The Level-2 model defines the item level. And, the Level-3 model defines the person level. Finally, the combined model is defined. After the presentation of these models, the Rating Scale Model (RSM; Andrich, 1978) and Partial Credit Model (PCM; Masters, 1982) are defined within the HMGLM. For each of these definitions, to help ease the presentation, one continues with the previous honesty exam example, and one illustrates how the concepts behind each of the IRT models transfer over to the HMGLM.

2-1-1. The Level-1 Model for the HMGLM

As mentioned above, the Level-1 model for the HMGLM defines the Level-1 units, the categories of the items. To define the Level-1 model for the HMGLM, the categorical responses $i$ ($i = 0, 1, 2, \ldots, I$) of person $k$ ($k = 1, 2, 3, \ldots, K$) to item $j$ ($j = 1, 2, 3, \ldots, J$) are re-expressed as a dummy-coded, multivariate response vector

$$\mathbf{y}_k = (\mathbf{y}'_{1k}, \mathbf{y}'_{2k}, \mathbf{y}'_{3k}, \ldots, \mathbf{y}'_{Jk})', \qquad (2.1)$$

where

$$\begin{aligned}
\mathbf{y}_{1k} &= (y_{11k}, y_{21k}, \ldots, y_{I1k})',\\
\mathbf{y}_{2k} &= (y_{12k}, y_{22k}, \ldots, y_{I2k})',\\
&\;\;\vdots\\
\mathbf{y}_{Jk} &= (y_{1Jk}, y_{2Jk}, \ldots, y_{IJk})',
\end{aligned} \qquad (2.2)$$

and

$$y_{ijk} = \begin{cases} 1 & \text{if response to item } j \text{ equals } i\\ 0 & \text{otherwise.} \end{cases} \qquad (2.3)$$

Note that if the multivariate response vector $\mathbf{y}_{jk}$ is a vector of 0’s, then category 0 was chosen by person $k$ for item $j$. Here, category 0 was chosen to be the reference category to be consistent with polytomous IRT models; however, other reference categories can be

utilized without loss of generality. Additionally, notice the multivariate response vectors
are one of the primary differences between multivariate hierarchical GLM and univariate

hierarchical GLM.
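The dummy coding of Equations (2.1) through (2.3) can be sketched as follows, with category 0 as the reference category so that it maps to a vector of zeros. The function names and the example responses are hypothetical.

```python
def dummy_code(response, n_categories):
    """Code one response in 0..I as (y_1, ..., y_I).

    n_categories counts all categories, i.e. I + 1.
    Category 0 is the reference and yields all zeros.
    """
    return [1 if response == i else 0 for i in range(1, n_categories)]


def response_vector(responses, n_categories):
    """Stack the dummy-coded vectors for one person's J item responses."""
    return [dummy_code(r, n_categories) for r in responses]


# One hypothetical person answering three 3-category items.
y_k = response_vector([0, 1, 2], n_categories=3)
```

Each inner list corresponds to one item, so a person with $J$ items and $I + 1$ categories per item contributes $J$ vectors of length $I$.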

Another primary difference is that it is assumed that the $y_{ijk}$ are conditionally independent given the multivariate (not univariate) random effect $u_{jk}$. If the sum of the conditionally independent observations $y_{ijk}$ for $\mathbf{y}_{jk}$ is taken, i.e., $y_{jk} = \sum_{i=1}^{I} y_{ijk}$, then it is also assumed that $y_{jk}$ are multinomially distributed with parameters $\boldsymbol{\pi}_{jk} = (\pi_{1jk}, \pi_{2jk}, \ldots, \pi_{Ijk})$ (Hartzel, Agresti, & Caffo, 2001).

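Thus, the conditional distribution $f(\mathbf{y}_{jk} \mid u_{jk})$ is a member of the multivariate exponential family with multivariate means $\boldsymbol{\mu}_{1k}, \boldsymbol{\mu}_{2k}, \boldsymbol{\mu}_{3k}, \ldots, \boldsymbol{\mu}_{Jk}$. That is,

$$\begin{aligned}
\boldsymbol{\mu}_{1k} &= E(\mathbf{y}_{1k} \mid u_1) = \mathbf{h}(\boldsymbol{\eta}_{1k})\\
\boldsymbol{\mu}_{2k} &= E(\mathbf{y}_{2k} \mid u_2) = \mathbf{h}(\boldsymbol{\eta}_{2k})\\
\boldsymbol{\mu}_{3k} &= E(\mathbf{y}_{3k} \mid u_3) = \mathbf{h}(\boldsymbol{\eta}_{3k})\\
&\;\;\vdots\\
\boldsymbol{\mu}_{Jk} &= E(\mathbf{y}_{Jk} \mid u_J) = \mathbf{h}(\boldsymbol{\eta}_{Jk}),
\end{aligned} \qquad (2.4)$$

where $\mathbf{h}(\boldsymbol{\eta}_{jk})$ is a vector of inverse link functions

$$\mathbf{h}(\boldsymbol{\eta}_{jk}) = \begin{pmatrix} h_1(\boldsymbol{\eta}_{jk})\\ h_2(\boldsymbol{\eta}_{jk})\\ \vdots\\ h_I(\boldsymbol{\eta}_{jk}) \end{pmatrix}, \qquad (2.5)$$

where $\boldsymbol{\eta}_{jk}$ is a vector of functions that describe the linear relationship of the fixed and random parameters.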
To obtain the desired form of a polytomous IRT model, the vector of inverse link functions is defined using the adjacent-category link function (Agresti, 1996, 2002)

$$h_i(\boldsymbol{\eta}_{jk}) = \mu_{ijk} \equiv \pi_{ijk} = \frac{\exp\left[\sum_{i'=0}^{i}\eta_{i'jk}\right]}{\sum_{h=0}^{I}\exp\left[\sum_{i'=0}^{h}\eta_{i'jk}\right]}, \qquad (2.6)$$

which is the probability $\pi_{ijk}$ of person $k$ selecting category $i$ ($i = 0, 1, \ldots, I$) of item $j$, where $\eta_{0jk} \equiv 0$; hence, $\exp\left[\sum_{i'=0}^{0}\eta_{i'jk}\right] = \exp(\eta_{0jk}) \equiv 1$.
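Re-expressing the link function as the log-odds of person $k$ responding to category $i$ rather than category $i-1$ for item $j$, the Level-1 model for the HMGLM is obtained:

$$\begin{aligned}
\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right)
&= \log\left(\frac{\exp\left[\sum_{i'=0}^{i}\eta_{i'jk}\right] \Big/ \sum_{h=0}^{I}\exp\left[\sum_{i'=0}^{h}\eta_{i'jk}\right]}{\exp\left[\sum_{i'=0}^{i-1}\eta_{i'jk}\right] \Big/ \sum_{h=0}^{I}\exp\left[\sum_{i'=0}^{h}\eta_{i'jk}\right]}\right)\\
&= \log\left(\frac{\exp\left[\sum_{i'=0}^{i}\eta_{i'jk}\right]}{\exp\left[\sum_{i'=0}^{i-1}\eta_{i'jk}\right]}\right)\\
&= \log\left(\frac{\exp(\eta_{0jk} + \eta_{1jk} + \cdots + \eta_{i-1,jk} + \eta_{ijk})}{\exp(\eta_{0jk} + \eta_{1jk} + \cdots + \eta_{i-1,jk})}\right)\\
&= (\eta_{0jk} + \eta_{1jk} + \cdots + \eta_{i-1,jk} + \eta_{ijk}) - (\eta_{0jk} + \eta_{1jk} + \cdots + \eta_{i-1,jk})\\
&= \eta_{ijk}.
\end{aligned} \qquad (2.7)$$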
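The derivation above can be verified numerically: starting from a set of linear predictors, compute the category probabilities through the adjacent-category inverse link, then confirm that the adjacent log-odds return the original predictors. This is a minimal sketch; the function name and example values are hypothetical, and subtracting the maximum exponent is a standard numerical safeguard that is not part of the source derivation.

```python
import math


def category_probabilities(etas):
    """etas = [eta_1, ..., eta_I] for one person-item pair; eta_0 is 0.

    Returns [pi_0, pi_1, ..., pi_I].
    """
    # Cumulative sums of eta give the numerator exponents for i = 0..I.
    exponents = [0.0]
    for eta in etas:
        exponents.append(exponents[-1] + eta)
    # Subtract the maximum before exponentiating for numerical stability;
    # this rescales numerator and denominator by the same factor.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


etas = [1.9, -0.1]  # hypothetical linear predictors for a 3-category item
probs = category_probabilities(etas)
# Adjacent-category log-odds recovered from the probabilities.
recovered = [math.log(probs[i] / probs[i - 1]) for i in range(1, len(probs))]
```

The recovered log-odds match the inputs up to floating-point error, because the common denominator cancels in each adjacent ratio.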
Specifically, the Level-1 model for the HMGLM defines the log-odds of the probability that person $k$ will select category $i$ rather than category $i-1$ for item $j$ as category effects

$$\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \sum_{j=1}^{J}\beta_{jk}^{(i)}x_{jk}, \qquad (2.8)$$

where $\beta_{jk}^{(i)}$ is the mean category effect if person $k$ selects category $i$ of item $j$; and $x_{jk}$ is a dummy variable with values 1 if person $k$ answers item $j$, and 0 otherwise.

Thus, like the Level-1 model for the univariate, hierarchical GLM approach, the mean category effect ($\beta_{jk}^{(i)}$) non-randomly varies across each category $i$ of each item $j$ for each person $k$. Furthermore, the mean category effect ($\beta_{jk}^{(i)}$) is influenced by the effect of the particular item in which the categories are nested. The Level-2 model describes these effects.

2-1-2. The Level-2 Model for the HMGLM
Since the Level-1 model described the category effects for an item only if it has been answered by person $k$, then like the Level-1 model, the Level-2 model is defined in terms of the answered item as well. Specifically, the Level-2 model, the item-level model for the HMGLM, is generally defined as

$$\beta_{jk}^{(i)} = \gamma_{0jk} + \sum_{i'=1}^{I}\gamma_{1jk}^{(i')}w_{1jk}^{(i')}, \qquad (2.9)$$

where, for person $k$, $\gamma_{0jk}$ is the mean effect of item $j$ across categories $i$; $\gamma_{1jk}^{(i)}$ is the effect of item $j$ on a particular category $i$; and $w_{1jk}^{(i')}$ is a dummy variable with values 1 if $i' = i$ for the $j$th item answered by person $k$, and 0 otherwise. Recall $\eta_{0jk} \equiv 0$. Thus, for identifiability, $\gamma_{1jk}^{(0)} \equiv 0$.

Thus, like the Level-2 model for the univariate, hierarchical GLM approach, the Level-2 model defines how the category effects ($\beta_{jk}^{(i)}$) behave when they are nested within the item-level model. Specifically, the category effects vary non-randomly and depend upon the mean effect of the item across the categories ($\gamma_{0jk}$) and the effect of the item on each category ($\gamma_{1jk}^{(i)}$).

2-1-3. The Level-3 Model for the HMGLM


The person-level model for the HMGLM, the Level-3 model, defines how the item effects behave when nested within persons. Specifically, the Level-3 model is defined as

$$\gamma_{0jk} = \lambda_{0j0} + u_{jk}, \qquad (2.10)$$
$$\gamma_{1jk}^{(i)} = \lambda_{1j0}^{(i)}, \qquad (2.11)$$

where, for the $j$th item that is answered by person $k$, $\lambda_{0j0}$ is the mean effect of persons on item $j$; $u_{jk}$ is the random effect of person $k$ on the mean effect of item $j$; and $\lambda_{1j0}^{(i)}$ is the mean change in $\lambda_{0j0}$ for a particular category of item $j$, for all persons.

However, in IRT, we assume that the person effects are constant across items. Thus, the following constraint is made

$$u_{1k} = u_{2k} = \cdots = u_{Jk} = u_k,$$

and the Level-3 model for the mean item effect becomes

$$\gamma_{0jk} = \lambda_{0j0} + u_k, \qquad (2.12)$$

where $\lambda_{0j0}$ is defined above; and $u_k$ is the random effect of person $k$ across items.

Thus, like the Level-3 model for the univariate, hierarchical GLM approach, the Level-3 model defines the mean effect of the item ($\gamma_{0jk}$) as depending upon the mean effect of the item across all persons ($\lambda_{0j0}$), and depending upon the unique effect of a particular person $k$ ($u_k$). Additionally, the Level-3 model defines the effect of the item on a specific category ($\gamma_{1jk}^{(i)}$) as being fixed for each person $k$ ($\lambda_{1j0}^{(i)}$).

2-1-4. The Combined Model for the HMGLM
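To obtain the combined model for the HMGLM, the models for Levels 1, 2, and 3 are combined

$$\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \sum_{j=1}^{J}\left[\lambda_{0j0} + \sum_{i'=1}^{I}\lambda_{1j0}^{(i')}w_{1jk}^{(i')} + u_k\right]x_{jk}. \qquad (2.13)$$

To obtain the matrix representation of the combined model, the following matrices are defined:

$$\mathbf{B} = (\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots, \boldsymbol{\beta}_J)', \qquad (2.14)$$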
where
l 2 I
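Hence $\boldsymbol{\eta}_{jk}$ defines the following linear relationships

$$\begin{aligned}
\boldsymbol{\eta}_{1k} &= \mathbf{Z}_{1k}\mathbf{B} + \mathbf{W}_{1k}\mathbf{u}_k\\
\boldsymbol{\eta}_{2k} &= \mathbf{Z}_{2k}\mathbf{B} + \mathbf{W}_{2k}\mathbf{u}_k\\
\boldsymbol{\eta}_{3k} &= \mathbf{Z}_{3k}\mathbf{B} + \mathbf{W}_{3k}\mathbf{u}_k\\
&\;\;\vdots\\
\boldsymbol{\eta}_{Jk} &= \mathbf{Z}_{Jk}\mathbf{B} + \mathbf{W}_{Jk}\mathbf{u}_k,
\end{aligned} \qquad (2.16)$$

where $\mathbf{B}$ is defined above and is a $(p \times 1)$-dimensional matrix for the unknown parameters ($p$) of the fixed effects; $\mathbf{Z}_{1k}, \mathbf{Z}_{2k}, \mathbf{Z}_{3k}, \ldots, \mathbf{Z}_{Jk}$ are $(I \times p)$-dimensional design matrices for the fixed effects; $\mathbf{u}_k$ are $(p \times 1)$-dimensional matrices for the unknown parameters ($p$) of the random effects; and $\mathbf{W}_{1k}, \mathbf{W}_{2k}, \mathbf{W}_{3k}, \ldots, \mathbf{W}_{Jk}$ are $(I \times p)$-dimensional design matrices for the random effects.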

Lastly, the random effect $\mathbf{u}$ is assumed to be independent and identically distributed with density $g(\mathbf{u})$, which is not restricted to any form. Here the density $g(\mathbf{u})$ is chosen to analogously follow traditional IRT assumptions and previous formulations of hierarchical IRT models (e.g., Kamata, 1998, 2001; Lord, 1980; Miyazaki, 2000)

$$\mathbf{u} \sim MVN(\mathbf{0}, \boldsymbol{\Sigma}). \qquad (2.17)$$

(Note the dummy variable $x_{jk}$ in the above discussion represents the situation where all persons respond to all items, i.e., the data are balanced. If the data are unbalanced, then $x_{jk}$ may take on a similar coding scheme as that provided in Equation (1.11) for the hierarchical, univariate GLM approach. That is, $x_{jk}$ becomes $x_{qjk}$, and represents the $q$th dummy variable for person $k$, with values 1 when $q = j$, and 0 when $q \neq j$ for item $j$.)

2-2. A New Model 1: The Hierarchical Multivariate Generalized Linear-Partial
Credit Model (HMGL-PCM)

To illustrate the relationship between the HMGLM and traditional IRT parameters, the PCM is defined within the HMGLM. Since the PCM is defined within the hierarchical framework of the HMGLM, the model can essentially be thought of as a new model. This new model is named the Hierarchical Multivariate Generalized Linear-Partial Credit Model (HMGL-PCM). For the HMGL-PCM, the reader should notice the application of the HLM framework (i.e., the definition of model levels), which is not used by Tuerlinckx and Wang (2004) and provides a more natural way for conceptualizing the hierarchical PCM.
The Level-1 model (the category level) for the HMGL-PCM is defined as

$$\log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \sum_{j=1}^{J}\beta_{jk}^{(i)}x_{jk}, \qquad (2.18)$$

where all terms are defined above.

The Level-2 model (the item level) is defined as

$$\beta_{jk}^{(i)} = \gamma_{0jk} + \sum_{i'=1}^{I}\gamma_{1jk}^{(i')}w_{1jk}^{(i')}, \qquad (2.19)$$

where all terms are defined above.

Here, to see the relationship between the HMGL-PCM and traditional PCM, one refers back to the honesty example. Recall that in this example an applicant is responding to several polytomous honesty items by selecting a particular category, which represents his/her feelings toward the item. Hence, for the HMGL-PCM, the probability that an applicant is attracted to a particular feeling for a particular answered item depends upon the overall attractiveness of the item ($\gamma_{0jk}$), and how the attractiveness of the item influences a particular feeling ($\gamma_{1jk}^{(i)}$). Additionally, notice that the attractiveness of a feeling for an item is nested within that item, as modeled from Level-1 to Level-2.

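Continuing with the HMGL-PCM, the Level-3 model (the person level) is defined as

\[ \gamma_{0jk} = \lambda_{0j0} + u_k, \tag{2.20} \]

\[ \gamma_{1jk}^{(i)} = \lambda_{1j0}^{(i)}, \tag{2.21} \]

where all terms are defined above.

The combined model for the HMGL-PCM is defined as

\[ \log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \sum_{j=1}^{J} \left[ \lambda_{0j0} + \sum_{i'=1}^{I} \lambda_{1j0}^{(i')} w_{1jk}^{(i')} + u_k \right] x_{jk}, \tag{2.22} \]

which reduces to the following for a particular category $i$ of an item $j$:

\[ \log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \lambda_{0j0} + \lambda_{1j0}^{(i)} + u_k. \tag{2.23} \]
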
Here we can clearly see how the category effects function as the categories are nested within items, which in turn are nested within persons. Specifically, the probability that an applicant is attracted to a particular feeling for a particular answered item depends not only upon the overall attractiveness of the item ($\gamma_{0jk}$), but also upon how the attractiveness of the item influences a particular feeling ($\gamma_{1jk}^{(i)}$). In addition, as the Level-3 model shows (Equations (2.20) - (2.21)), the overall attractiveness of the item ($\lambda_{0j0}$) and the influence of an item on a particular feeling ($\lambda_{1j0}^{(i)}$) are fixed across persons. Furthermore, as is commonly assumed in IRT, the unique effect ($u_k$) of an applicant randomly varies across the different applicants (but remains fixed across items and across feelings).

In short, the parameters of the HMGL-PCM are related to the parameters of the traditional PCM in the following manner:

\[ \delta_j = \lambda_{0j0}, \tag{2.24} \]

\[ \tau_{ij} = -\lambda_{1j0}^{(i)}, \tag{2.25} \]

and

\[ \theta_k = u_k. \tag{2.26} \]
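
As a rough numerical illustration of Equation (2.23), the following Python sketch converts adjacent-category logits into category probabilities. The function name `adjacent_category_probs` and the parameter values are hypothetical and chosen only for illustration; they are not estimates from this dissertation. The only fact used is that, under an adjacent-categories logit link, the probability of category $i$ is proportional to the exponential of the accumulated logits up to $i$.

```python
import math

def adjacent_category_probs(lam0, lam1, u):
    """Category probabilities implied by adjacent-category logits.

    lam0 : item effect (lambda_0j0 in the text)
    lam1 : category effects for categories 1..I (lambda_1j0^(i) in the text)
    u    : person effect (u_k in the text)
    """
    # Logit of category i versus category i-1, as in Equation (2.23).
    logits = [lam0 + l + u for l in lam1]
    # Accumulate the logits; category 0 is the reference with exponent 0.
    exponents = [0.0]
    for eta in logits:
        exponents.append(exponents[-1] + eta)
    denom = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / denom for e in exponents]

# Hypothetical values, for illustration only.
probs = adjacent_category_probs(lam0=-0.5, lam1=[1.2, -1.2], u=0.3)
print([round(p, 3) for p in probs])
```

Taking the log of the ratio of two neighbouring probabilities returned by this function recovers the corresponding logit, which is the sense in which the combined model is an adjacent-categories model.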

2-3. A New Model 2: The Hierarchical Multivariate Generalized Linear-Rating
Scale Model (HMGL-RSM)

Now the RSM is defined within the HMGLM, and this new model is named the Hierarchical Multivariate Generalized Linear-Rating Scale Model (HMGL-RSM). Recall from Section 1-2-1 that the RSM is simply a special case of the PCM. Hence, the model definitions of the HMGL-RSM closely follow those of the HMGL-PCM. Again, the reader should notice the application of the HLM framework (i.e., the definition of model levels), which is not used by Tuerlinckx and Wang (2004) and provides a more natural way of conceptualizing the hierarchical RSM.

The Level-1 model (the category level) is

\[ \log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \sum_{j=1}^{J} \beta_{jk}^{(i)} x_{jk}, \tag{2.27} \]

where all terms are defined above.

The Level-2 model (the item level) is obtained by constraining the effect of an item on a particular category to be equal for all items (i.e., $\gamma_{11k}^{(i)} = \gamma_{12k}^{(i)} = \cdots = \gamma_{1Jk}^{(i)} = \gamma_{1k}^{(i)}$):

\[ \beta_{jk}^{(i)} = \gamma_{0jk} + \sum_{i'=1}^{I} \gamma_{1k}^{(i')} w_{1jk}^{(i')}, \tag{2.28} \]

where $\gamma_{0jk}$ is defined above; $\gamma_{1k}^{(i)}$ is the effect of the item on a particular category, which again is equal for all items; and $w_{1jk}^{(i')}$ is a dummy variable with values 1 if $i' = i$ for the $j$th item answered by person $k$, and 0 otherwise.

The Level-3 model (the person level) is

\[ \gamma_{0jk} = \lambda_{0j0} + u_k, \tag{2.29} \]

\[ \gamma_{1k}^{(i)} = \lambda_{100}^{(i)}, \tag{2.30} \]

where $\lambda_{0j0}$ and $u_k$ are defined above; and $\lambda_{100}^{(i)}$ is the mean change in $\lambda_{0j0}$ for a particular category $i$, for all persons.
In our example, the parameters of the HMGL-RSM may be interpreted accordingly. The probability that an applicant is attracted to a particular feeling for a particular answered item depends upon the overall attractiveness of the item ($\gamma_{0jk}$) and the common influence of the attractiveness of the items on a particular feeling ($\gamma_{1k}^{(i)}$). Additionally, the overall attractiveness of the item and the influence of the items on a particular feeling are fixed across persons ($\lambda_{0j0}$ and $\lambda_{100}^{(i)}$, respectively). Lastly, the unique effect of an applicant on the item randomly varies across the different applicants ($u_k$).

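The combined model for the HMGL-RSM is defined as

\[ \log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \sum_{j=1}^{J} \left[ \lambda_{0j0} + \sum_{i'=1}^{I} \lambda_{100}^{(i')} w_{1jk}^{(i')} + u_k \right] x_{jk}, \tag{2.31} \]

which reduces to the following for a particular category $i$ of an item $j$:

\[ \log\left(\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right) = \lambda_{0j0} + \lambda_{100}^{(i)} + u_k. \tag{2.32} \]

In short, the parameters of the HMGL-RSM are related to the parameters of the traditional RSM in the following manner:

\[ \delta_j = \lambda_{0j0}, \tag{2.33} \]

\[ \tau_i = -\lambda_{100}^{(i)}, \tag{2.34} \]

and

\[ \theta_k = u_k. \tag{2.35} \]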

2-4. Assumptions
As with the non-hierarchical, univariate GLM, there are distributional and structural assumptions of the HMGLM that need to be satisfied for the model to hold. As mentioned above, the distributional assumption is that the $y_{ijk}$ are conditionally independent given the random effect $u_k$ (i.e., $f(y_{jk} \mid u_k)$), and the conditional distribution $f(y_{jk} \mid u_k)$ is a member of the multivariate exponential family. Here, it is assumed to be multinomially distributed with parameters $\pi_{jk} = (\pi_{1jk}, \pi_{2jk}, \ldots, \pi_{Ijk})$.

The structural assumption is given by the Level-1 model; that is, the expectation of $f(y_{jk} \mid u_k)$ (i.e., $\pi_{jk}$) is determined by a vector of linear predictors (Equation (2.16)) in the form of a vector of inverse link functions, $h(\eta_{jk})$. For the purposes here, $h(\eta_{jk})$ is chosen to be the logit form of the adjacent-categories link function (Equation (2.6); Agresti, 1996, 2002; Hartzel, Agresti, & Caffo, 2001). Agresti (2002) shows this function to be the form of the RSM and PCM.

Regarding the hierarchical nature of the HMGLM, recall from above that the random component $u$ requires certain distributional assumptions. One of the advantages of applying the HMGLM is that $u$ is not restricted to a specific distribution. For the purposes here, IRT parameters are being modeled, and, recall, $u$ is equivalent to the location of person $k$ on the underlying continuum. In IRT, it is customary to assume that the locations of all persons on the underlying continuum are normally distributed (e.g., Cheong & Raudenbush, 2000; Kamata, 1998, 2001; Lord, 1980; Miyazaki, 2000). It is also customary in HLM to model the random components as being multivariate normally distributed (Raudenbush & Bryk, 2002). Thus, although not necessary for the HMGLM, here previous customs were followed and $u$ was assumed to be multivariate normally distributed (Equation (2.17)).

Additionally, in traditional IRT methodology, the scale of the person and item parameters is indeterminate (Lord, 1980). For the HMGLM, this is resolved in the following manner. Recall that the HMGLM begins by modeling category effects of person $k$ on category $i$ of item $j$. This suggests that $\beta_{jk}^{(i)}$ measures the effect of the category from the grand mean

\[ \beta_{jk}^{(i)} = \alpha_0 + \alpha_{jk}^{(i)}, \tag{2.36} \]

where $\alpha_0$ is the grand mean of the person measures; and for person $k$, $\alpha_{jk}^{(i)}$ is the regression coefficient for category $i$ of item $j$.

Also recall that, after several hierarchical levels are modeled, the unique effect of person $k$ is modeled. This suggests that the unique effect of person $k$ is the residual of person $k$ from the grand mean of the person measures

\[ \beta_{jk}^{(i)} = \alpha_0 + \alpha_{jk}^{(i)} + u_k, \tag{2.37} \]

where $\alpha_0$ and $\alpha_{jk}^{(i)}$ are defined above; and $u_k$ is the unique effect of person $k$. In other words, $u_k$ is the deviation of person $k$ from $\alpha_0$.

In order to resolve the indeterminacy of the scale for the HMGL-RSM and -PCM, $u$ is assumed to be $N(0, \Sigma)$. Notice that if the coefficients are assumed to be independent, this is equivalent to saying that $\beta \sim N(0, \Sigma)$. Furthermore, since the coefficients are measured effects from the grand mean, and the distribution and mean of $\beta$ are chosen to be normal and zero, respectively, this is equivalent to saying that the grand mean, which again is centered on person measures, is zero, and the distribution is normal. Therefore, this resolves the indeterminacy of the scale by centering on person measures, in which the center of the normally distributed measures is zero.

Also in IRT, it is assumed that, beyond the characteristics (i.e., parameters) of an item, success on an item depends only on the person's location on the underlying continuum ($\theta_k = u_k$). In other words, it is assumed that the test is unidimensional: success depends on the one dimension (e.g., honesty), and not on other traits (i.e., the test is not multidimensional) (Lord, 1980). From unidimensionality, it follows that the items are assumed to be locally independent. That is, the conditional probability of success on one particular item, given the person's location on the underlying continuum, is equal to the conditional probability of success on all other items, given the person's location on the underlying continuum (Lord, 1980). By using the HMGLM, the assumption of unidimensionality is relaxed. For example, extensions of the HMGL-RSM are presented below in which person covariates (Chapter 4) and predictors of item behaviors for the overall item location (Chapters 5 and 6) are modeled.

Modeling these covariates implies that the definition of local independence is slightly altered for the HMGL-RSM and -PCM. That is, the definition of local independence is now the following: the conditional probability of success on one particular item, given the person's ability and the covariates, is equal to the conditional probability of success on all other items, given the person's ability and the covariates (cf. the definition of local independence above).

Note that local independence is satisfied for the HMGL-RSM and -PCM because the item locations are assumed to be fixed at the person level. In other words, if the item locations varied randomly or non-randomly, then the conditional probability of success on one particular item, given the person's ability and the covariates, would not necessarily equal the conditional probability of success on all other items, given the person's ability and the covariates. (This suggests that the HMGL-RSM and -PCM may be used to examine violations of local independence by modeling item covariates that examine how the item locations vary. Although this goes beyond the scope of this dissertation, this type of analysis is similar to those presented in the following chapters.)

2-5. Estimation

Estimation of the parameters for the HMGL-RSM and -PCM may be accomplished using frequentist or Bayesian methods. For examples of Monte Carlo methods see Fahrmeir and Tutz (2001) and Hartzel et al. (2001). For examples of Bayesian procedures see Fahrmeir and Tutz (2001), Fox and Glas (1998), and Maier (2000, 2002). Fortunately, if one prefers frequentist methods, then the parameters of the HMGL-RSM and -PCM may be estimated by readily available popular statistical software packages, such as SAS (using PROC NLMIXED) and STATA (using GLLAMM; Rabe-Hesketh, Pickles, & Skrondal, 2001). Specifically, estimates of the parameters are obtained by maximizing an approximation to the likelihood integrated over the random effects, where the integral approximations are obtained via adaptive Gaussian quadrature and the optimization is carried out using a dual quasi-Newton algorithm (SAS, 2001) or a modified Newton-Raphson algorithm (Rabe-Hesketh, Pickles, & Skrondal, 2001). Approximate standard errors of the successfully converged parameter estimates are based on the second derivative matrix of the likelihood function (SAS, 2001) or the delta method (Rabe-Hesketh, Pickles, & Skrondal, 2001).
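
To make the idea of a likelihood "integrated over the random effects" concrete, the sketch below approximates the marginal likelihood of one person's response vector by a weighted sum over a fixed, evenly spaced grid of person effects. This is a deliberately simple stand-in: it is not adaptive Gaussian quadrature and not the algorithm used by PROC NLMIXED or GLLAMM. The function names, the grid settings, and the parameterization (adjacent-category logit equal to $\theta - \delta_j - \tau_i$, as in the worked example of Section 3-3-3) are assumptions made for illustration.

```python
import math

def rsm_probs(theta, delta, taus):
    """RSM category probabilities, assuming logit_i = theta - delta - tau_i."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    psi = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / psi for e in exponents]

def marginal_likelihood(responses, deltas, taus, sigma=1.0, n_points=201, width=6.0):
    """Approximate the integral over u ~ N(0, sigma^2) of prod_j P(y_j | u).

    Uses an evenly spaced grid with normalised normal-density weights
    (a fixed-grid approximation, not adaptive quadrature).
    """
    half = width * sigma
    step = 2.0 * half / n_points
    nodes = [-half + (q + 0.5) * step for q in range(n_points)]
    dens = [math.exp(-0.5 * (u / sigma) ** 2) for u in nodes]
    total = sum(dens)
    value = 0.0
    for u, d in zip(nodes, dens):
        # Conditional likelihood of the whole response vector given u.
        cond = 1.0
        for y, delta in zip(responses, deltas):
            cond *= rsm_probs(u, delta, taus)[y]
        value += (d / total) * cond
    return value

# Hypothetical two-item response vector, for illustration only.
print(marginal_likelihood([0, 1], deltas=[0.49, 0.75], taus=[-2.15, 2.15]))
```

In an actual estimation routine, the log of this quantity would be summed over persons and maximized with respect to the item, threshold, and variance parameters.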

Unfortunately, popular software such as PROC NLMIXED does not estimate multiple random effects. For example, for the models given above, only the person parameter ($\theta_k$) may be considered random ($u_k$), while the item and category parameters ($\delta_j$, $\tau_i$) may be considered fixed ($\lambda_{0j0}$, $\lambda_{1j0}^{(i)}$, $\lambda_{100}^{(i)}$). If one wishes to treat the item parameters as random, then one may use GLLAMM or other methods (such as MCMC or Bayesian estimation).

Chapter 3. Parameter Recovery and Example
3-1. Simulation Design
The following section describes the design for a simulation study. Specifically, observations were simulated using the RSM. Next, parameter estimates of the RSM and HMGL-RSM were obtained with Winsteps (1999) and SAS (2001), respectively. Finally, a comparison between the analyses of the parameter recovery rates follows. Because of computational constraints (see Section 7-2-3), the PCM was not simulated. However, because of the similarity between the RSM and PCM, similar results would be expected (e.g., see Section 3-3).

3-1-1. Design

The design of the simulation is as follows. Observations were simulated using the RSM. This model was chosen because it is commonly used when scaling polytomous data, such as those found in questionnaire data (e.g., Dodd, 1990; Smith & Johnson, 2000; Zhu, Updyke, & Lewandowski, 1997) and achievement data (e.g., Michigan Education Assessment Program, 2003). For the study, simulees (K = 100, 500, or 1000) responded to polytomous items (J = 10 or 25), where each item consisted of 3 categories i (i = 0, 1, 2). The numbers of simulees, items, and categories were chosen to follow typical data from a questionnaire (e.g., Dodd, 1990; Smith & Johnson, 2000; Zhu, Updyke, & Lewandowski, 1997) or a large-scale assessment (e.g., U.S. Department of Education, 1999).

Item parameters were also selected to represent parameter estimates from typical polytomous data. Specifically, item parameters were selected from the IRT scaling of a confidential readiness assessment. For this assessment, there were three sub-scales that measured the personal and social development (16 items), language (12 items), and mathematical thinking (14 items) of a child. For each item, a particular scenario was observed with the child, and a rater would then score the child in one of three categories (lower, middle, or higher), each representing the performance of that child on that particular item. For the purposes of this dissertation, only the first 25 items were used. Table 2 displays the item parameters used in the simulation. (Note that although $\tau_1$ and $\tau_2$ appear to be extreme, these are typical values seen in educational questionnaires because it is common in education that the middle categories, as opposed to the extreme categories, are frequently used. For example, see Dodd (1990), Smith and Johnson (2000), and Zhu, Updyke, and Lewandowski (1997).)

Table 2. Item Parameters Used in the Simulation

RSM Simulation
Item          $\delta_j$
 1            -0.09
 2             0.02
 3            -0.92
 4            -1.57
 5            -0.81
 6            -0.74
 7            -0.81
 8            -0.01
 9             0.07
10            -0.85
11            -1.28
12            -1.02
13            -1.14
14            -1.39
15             0.54
16            -0.32
17            -0.09
18             0.11
19            -0.15
20            -0.42
21             0.00
22             0.51
23             0.52
24             0.73
25             0.79
$\tau_1$      -2.24
$\tau_2$       2.24

Note. $\delta_j$ = location for item $j$. {$\tau_1$ and $\tau_2$} = thresholds 1 and 2.

To produce the simulated responses under the RSM, each simulee $k$ was randomly assigned a location $\theta_k$, $\theta \sim N(0, 1)$, and each item $j$ was randomly assigned a set of item parameters. If $J = 10$, then the item parameters were randomly selected to be those that appear for the first 10 items in Table 2; otherwise, $J = 25$ and all items were used.

Using $\theta_k$, $\delta_j$, and $\tau_i$, three response probabilities for each simulee by item combination were produced: $P_{0jk}(\theta)$, $P_{1jk}(\theta)$, and $P_{2jk}(\theta)$. If

\[ \sum_{c=0}^{i'} P_{cjk}(\theta) < Y_{jk} \le \sum_{c=0}^{i'+1} P_{cjk}(\theta), \]

then simulee $k$ was assigned a response of $i' + 1$ for item $j$; otherwise a response of 0 was assigned. Note that $i' = 0, 1$; and $Y_{jk}$ was a single random number for each $j \times k$ combination, $Y \sim U(0, 1)$.
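
The response-generation rule above can be sketched in Python as follows. The data in this study were generated with S-Plus; the code below is only an illustrative re-expression, and the function names, the seed, and the parameterization (adjacent-category logit equal to $\theta - \delta_j - \tau_i$, as in the worked example of Section 3-3-3) are assumptions rather than the original program.

```python
import math
import random

def rsm_probs(theta, delta, taus):
    """RSM probabilities for categories 0..len(taus), assuming logit_i = theta - delta - tau_i."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    psi = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / psi for e in exponents]

def simulate_response(theta, delta, taus, rng):
    """Compare one U(0, 1) draw with the cumulative category probabilities."""
    y = rng.random()
    probs = rsm_probs(theta, delta, taus)
    cumulative = 0.0
    for category, p in enumerate(probs):
        cumulative += p
        if y <= cumulative:
            return category
    return len(probs) - 1  # guard against floating-point round-off

def simulate_matrix(n_persons, deltas, taus, seed=0):
    """Return simulated person locations and a persons-by-items response matrix."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    data = [[simulate_response(t, d, taus, rng) for d in deltas] for t in thetas]
    return thetas, data

# First three item locations and the thresholds from Table 2.
thetas, data = simulate_matrix(100, [-0.09, 0.02, -0.92], [-2.24, 2.24], seed=1)
print(data[0])
```

A draw at or below $P_{0jk}(\theta)$ yields category 0, and a draw between two consecutive cumulative sums yields the higher of the two categories, which matches the rule stated in the text.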

The simulation procedure utilized a fully crossed 3 x 2 factorial design that simulated 6 conditions. Each administration was iterated 50 times, producing 300 unique response data matrices. The number of iterations was chosen because Kamata (1998) showed this to be a reasonable number for obtaining stable estimates. S-Plus (2000) was used to generate all data.
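
The bookkeeping of the design can be written out explicitly. The short Python sketch below simply enumerates the conditions and replications described above; the variable names are illustrative and no part of it comes from the original S-Plus code.

```python
from itertools import product

PERSONS = (100, 500, 1000)  # K: number of simulees
ITEMS = (10, 25)            # J: number of items
REPLICATIONS = 50           # iterations per condition

# Fully crossed 3 x 2 design.
conditions = list(product(PERSONS, ITEMS))

# One entry per simulated response data matrix.
runs = [(k, j, rep) for (k, j) in conditions for rep in range(1, REPLICATIONS + 1)]

print(len(conditions), len(runs))
```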

3-1-2. Analysis

PROC NLMIXED of SAS (2001) was used to estimate the person and item parameters for the HMGL-RSM, while WINSTEPS (1999) was used to estimate the person and item parameters for the RSM. An example of the SAS code for the HMGL-RSM is provided in Appendix A. (An example of the SAS code for the HMGL-PCM is provided in Appendix B.) An example of the input data structure is provided in Appendix C. To investigate the accuracy of the parameter estimates for the RSM and HMGL-RSM, the root mean square error (RMSE) for $\mu_\theta$, $\Sigma_\theta$, $\delta_j$, and $\tau_i$ was obtained over the iterations for each condition. Specifically, the RMSE was obtained by

\[ \mathrm{RMSE}(\omega) = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (\hat{\omega}_n - \omega)^2}, \tag{3.1} \]

where the maximum number of iterations $n$ was $N = 50$; and $\omega$ is an arbitrary parameter representing either $\mu_\theta$, $\Sigma_\theta$, $\delta_j$, or $\tau_i$.
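
Equation (3.1) translates directly into a few lines of Python. The function name `rmse` and the example estimates below are hypothetical; the example values are not results from this study.

```python
import math

def rmse(estimates, true_value):
    """Root mean square error of a list of estimates around the generating value."""
    n = len(estimates)
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)

# Hypothetical estimates of a parameter whose generating value is -0.81.
example = rmse([-0.93, -0.80, -0.85, -0.78], -0.81)
print(round(example, 3))
```

Because the RMSE combines the bias and the variability of the estimates, an estimator with a small standard deviation can still show a large RMSE when it is systematically off target.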

3-2. Parameter Recovery Results
Below, the descriptive statistics are presented for $\theta$ for the 50 iterations of each condition. Recall that $\delta_j$ and $\tau_i$ were specified and shown in Table 2. Also, the means and standard deviations of the parameter estimates for the 50 iterations of all conditions are displayed and discussed. Lastly, the results of the analysis for recovering the parameters are presented.

3-2-1. Descriptive Statistics

The descriptive statistics for $\theta$ and $\Sigma$ for 100, 500, and 1000 persons are presented in Table 3. As can be seen, the sampling distribution of $\mu_\theta$ was centered on or near zero with a small standard error (which decreased as the number of persons increased, as would be expected). Additionally, the sampling distribution of $\mu_\Sigma$ was centered on or near one with a small standard error (which decreased as the number of persons increased, as would be expected). These findings suggest that the distribution of $\theta$ was simulated very well for all conditions.

Table 3. Mean and Standard Error of $\theta$ and $\Sigma$ for the Simulated 100, 500, and 1000 Persons

             $\theta$             $\Sigma$
K          M       SE          M       SE
100      -0.01   (0.11)       0.98   (0.06)
500       0.01   (0.05)       1.00   (0.03)
1000      0.00   (0.03)       1.00   (0.02)

Note. K = Number of simulated individuals. M = Mean. SE = Standard error.

Displayed in Tables 4 and 6, and 5 and 7, are the means and standard deviations of the parameter estimates for the RSM and HMGL-RSM, respectively. As can be seen for both the RSM and HMGL-RSM, the standard deviations of the estimates are similar across conditions. Furthermore, the standard deviations are fairly low and decrease as the number of persons increases. This suggests that WINSTEPS and PROC NLMIXED obtain relatively consistent estimates of the HMGL-RSM parameters.

As for the mean of the estimates, in general, the estimates obtained by WINSTEPS for the RSM appear to resemble the estimates obtained by PROC NLMIXED for the HMGL-RSM (cf. Table 2). Below, in Section 3-2-2, the RMSE is examined.

Table 4. Mean and Standard Error of the Parameter Estimates for the RSM when J = 10

                            100               500               1000
                         M       SE        M       SE        M       SE
$\hat\delta_{1}$       -0.21   (0.28)    -0.13   (0.13)    -0.11   (0.08)
$\hat\delta_{2}$        0.01   (0.24)    -0.01   (0.10)     0.02   (0.08)
$\hat\delta_{3}$       -0.98   (0.31)    -1.03   (0.14)    -1.03   (0.09)
$\hat\delta_{4}$       -1.67   (0.28)    -1.75   (0.12)    -1.75   (0.09)
$\hat\delta_{5}$       -0.93   (0.23)    -0.93   (0.13)    -0.91   (0.08)
$\hat\delta_{6}$       -0.84   (0.23)    -0.83   (0.11)    -0.83   (0.10)
$\hat\delta_{7}$       -0.90   (0.27)    -0.91   (0.13)    -0.88   (0.09)
$\hat\delta_{8}$       -0.05   (0.23)    -0.05   (0.12)     0.00   (0.08)
$\hat\delta_{9}$        0.05   (0.25)     0.05   (0.13)     0.09   (0.08)
$\hat\delta_{10}$      -0.94   (0.23)    -0.94   (0.13)    -0.93   (0.08)
$\hat\tau_{1}$         -2.53   (0.12)    -2.53   (0.07)    -2.52   (0.04)
$\hat\tau_{2}$          2.53   (0.12)     2.53   (0.07)     2.52   (0.04)
$\hat\mu_{\theta}$      0.00   (0.01)     0.00   (0.01)     0.00   (0.00)
$\hat\Sigma_{\theta}$   1.36   (0.12)     1.37   (0.05)     1.36   (0.03)

Note. {100, 500, 1000} = Number of simulated individuals. {$\hat\delta_1, \hat\delta_2, \ldots, \hat\delta_{10}$} = locations for items 1-10. {$\hat\tau_1, \hat\tau_2$} = thresholds 1 and 2. $\hat\mu_\theta$ = Mean person location. $\hat\Sigma_\theta$ = Standard deviation of the person locations. M = Mean. SE = Standard error.


Table 5. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when J = 10

                            100               500               1000
                         M       SE        M       SE        M       SE
$\hat\delta_{1}$       -0.18   (0.26)    -0.11   (0.12)    -0.09   (0.07)
$\hat\delta_{2}$        0.01   (0.22)     0.00   (0.09)     0.02   (0.07)
$\hat\delta_{3}$       -0.88   (0.28)    -0.92   (0.13)    -0.93   (0.08)
$\hat\delta_{4}$       -1.51   (0.25)    -1.57   (0.10)    -1.57   (0.08)
$\hat\delta_{5}$       -0.84   (0.21)    -0.84   (0.12)    -0.82   (0.07)
$\hat\delta_{6}$       -0.76   (0.21)    -0.75   (0.10)    -0.75   (0.08)
$\hat\delta_{7}$       -0.81   (0.24)    -0.82   (0.12)    -0.79   (0.08)
$\hat\delta_{8}$       -0.04   (0.21)    -0.04   (0.11)     0.00   (0.08)
$\hat\delta_{9}$        0.05   (0.22)     0.04   (0.11)     0.08   (0.07)
$\hat\delta_{10}$      -0.85   (0.21)    -0.84   (0.11)    -0.84   (0.07)
$\hat\tau_{1}$         -2.25   (0.10)    -2.25   (0.05)    -2.24   (0.03)
$\hat\tau_{2}$          2.25   (0.10)     2.25   (0.05)     2.24   (0.03)
$\hat\mu_{\theta}$      0.01   (0.00)     0.01   (0.00)     0.01   (0.00)
$\hat\Sigma_{\theta}$   0.99   (0.12)     1.00   (0.05)     1.00   (0.04)

Note. {100, 500, 1000} = Number of simulated individuals. {$\hat\delta_1, \hat\delta_2, \ldots, \hat\delta_{10}$} = locations for items 1-10. {$\hat\tau_1, \hat\tau_2$} = thresholds 1 and 2. $\hat\mu_\theta$ = Mean person location. $\hat\Sigma_\theta$ = Standard deviation of the person locations. M = Mean. SE = Standard error.


Table 6. Mean and Standard Error of the Parameter Estimates for the RSM when J = 25

                            100               500               1000
                         M       SE        M       SE        M       SE
$\hat\delta_{1}$       -0.19   (0.27)    -0.12   (0.12)    -0.10   (0.08)
$\hat\delta_{2}$        0.02   (0.22)     0.00   (0.10)     0.02   (0.07)
$\hat\delta_{3}$       -0.92   (0.30)    -0.96   (0.13)    -0.96   (0.08)
$\hat\delta_{4}$       -1.57   (0.24)    -1.63   (0.10)    -1.64   (0.08)
$\hat\delta_{5}$       -0.87   (0.23)    -0.87   (0.12)    -0.85   (0.07)
$\hat\delta_{6}$       -0.79   (0.21)    -0.78   (0.10)    -0.78   (0.08)
$\hat\delta_{7}$       -0.84   (0.25)    -0.85   (0.12)    -0.82   (0.08)
$\hat\delta_{8}$       -0.04   (0.22)    -0.04   (0.11)     0.00   (0.08)
$\hat\delta_{9}$        0.05   (0.23)     0.05   (0.12)     0.08   (0.07)
$\hat\delta_{10}$      -0.88   (0.21)    -0.88   (0.11)    -0.87   (0.07)
$\hat\delta_{11}$      -1.29   (0.24)    -1.33   (0.12)    -1.33   (0.07)
$\hat\delta_{12}$      -1.10   (0.25)    -1.09   (0.12)    -1.05   (0.09)
$\hat\delta_{13}$      -1.14   (0.22)    -1.20   (0.14)    -1.19   (0.09)
$\hat\delta_{14}$      -1.41   (0.24)    -1.44   (0.08)    -1.45   (0.08)
$\hat\delta_{15}$       0.58   (0.26)     0.56   (0.10)     0.56   (0.07)
$\hat\delta_{16}$      -0.36   (0.26)    -0.33   (0.12)    -0.32   (0.10)
$\hat\delta_{17}$      -0.05   (0.20)    -0.09   (0.10)    -0.11   (0.07)
$\hat\delta_{18}$       0.10   (0.26)     0.10   (0.12)     0.11   (0.08)
$\hat\delta_{19}$      -0.13   (0.25)    -0.17   (0.13)    -0.16   (0.07)
$\hat\delta_{20}$      -0.44   (0.21)    -0.44   (0.11)    -0.45   (0.09)
$\hat\delta_{21}$       0.03   (0.25)     0.00   (0.11)     0.00   (0.08)
$\hat\delta_{22}$       0.50   (0.25)     0.54   (0.10)     0.50   (0.09)
$\hat\delta_{23}$       0.54   (0.24)     0.53   (0.11)     0.54   (0.09)
$\hat\delta_{24}$       0.80   (0.25)     0.76   (0.10)     0.77   (0.07)
$\hat\delta_{25}$       0.85   (0.26)     0.79   (0.10)     0.82   (0.08)
$\hat\tau_{1}$         -2.36   (0.06)    -2.35   (0.03)    -2.34   (0.02)
$\hat\tau_{2}$          2.36   (0.06)     2.35   (0.03)     2.34   (0.02)
$\hat\mu_{\theta}$      0.00   (0.00)     0.00   (0.00)     0.00   (0.00)
$\hat\Sigma_{\theta}$   1.13   (0.08)     1.14   (0.04)     1.13   (0.02)

Note. {100, 500, 1000} = Number of simulated individuals. {$\hat\delta_1, \hat\delta_2, \ldots, \hat\delta_{25}$} = locations for items 1-25. {$\hat\tau_1, \hat\tau_2$} = thresholds 1 and 2. $\hat\mu_\theta$ = Mean person location. $\hat\Sigma_\theta$ = Standard deviation of the person locations. SE = Standard error.

Table 7. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when J = 25

                            100               500               1000
                         M       SE        M       SE        M       SE
$\hat\delta_{1}$       -0.19   (0.26)    -0.09   (0.11)    -0.09   (0.07)
$\hat\delta_{2}$        0.02   (0.21)     0.02   (0.09)     0.02   (0.07)
$\hat\delta_{3}$       -0.89   (0.28)    -0.91   (0.13)    -0.93   (0.08)
$\hat\delta_{4}$       -1.52   (0.23)    -1.56   (0.10)    -1.58   (0.07)
$\hat\delta_{5}$       -0.84   (0.22)    -0.82   (0.12)    -0.82   (0.07)
$\hat\delta_{6}$       -0.76   (0.20)    -0.73   (0.10)    -0.75   (0.08)
$\hat\delta_{7}$       -0.81   (0.24)    -0.80   (0.12)    -0.79   (0.08)
$\hat\delta_{8}$       -0.04   (0.21)    -0.02   (0.11)     0.00   (0.08)
$\hat\delta_{9}$        0.05   (0.22)     0.06   (0.11)     0.08   (0.07)
$\hat\delta_{10}$      -0.85   (0.20)    -0.82   (0.11)    -0.84   (0.07)
$\hat\delta_{11}$      -1.24   (0.23)    -1.27   (0.11)    -1.28   (0.07)
$\hat\delta_{12}$      -1.06   (0.24)    -1.03   (0.11)    -1.01   (0.09)
$\hat\delta_{13}$      -1.10   (0.21)    -1.14   (0.13)    -1.15   (0.09)
$\hat\delta_{14}$      -1.36   (0.23)    -1.37   (0.07)    -1.39   (0.08)
$\hat\delta_{15}$       0.56   (0.25)     0.56   (0.09)     0.54   (0.07)
$\hat\delta_{16}$      -0.35   (0.25)    -0.30   (0.11)    -0.31   (0.09)
$\hat\delta_{17}$      -0.04   (0.19)    -0.07   (0.09)    -0.11   (0.07)
$\hat\delta_{18}$       0.10   (0.25)     0.12   (0.12)     0.11   (0.07)
$\hat\delta_{19}$      -0.12   (0.24)    -0.15   (0.12)    -0.16   (0.07)
$\hat\delta_{20}$      -0.42   (0.20)    -0.40   (0.11)    -0.43   (0.08)
$\hat\delta_{21}$       0.03   (0.24)     0.02   (0.11)     0.00   (0.08)
$\hat\delta_{22}$       0.48   (0.24)     0.55   (0.10)     0.49   (0.08)
$\hat\delta_{23}$       0.52   (0.23)     0.53   (0.10)     0.52   (0.09)
$\hat\delta_{24}$       0.77   (0.24)     0.76   (0.10)     0.75   (0.07)
$\hat\delta_{25}$       0.82   (0.25)     0.78   (0.10)     0.79   (0.08)
$\hat\tau_{1}$         -2.26   (0.06)    -2.25   (0.03)    -2.24   (0.02)
$\hat\tau_{2}$          2.26   (0.06)     2.25   (0.03)     2.24   (0.02)
$\hat\mu_{\theta}$      0.00   (0.00)     0.02   (0.02)     0.00   (0.00)
$\hat\Sigma_{\theta}$   0.99   (0.08)     1.00   (0.04)     1.00   (0.02)

Note. {100, 500, 1000} = Number of simulated individuals. {$\hat\delta_1, \hat\delta_2, \ldots, \hat\delta_{25}$} = locations for items 1-25. {$\hat\tau_1, \hat\tau_2$} = thresholds 1 and 2. $\hat\mu_\theta$ = Mean person location. $\hat\Sigma_\theta$ = Standard deviation of the person locations. SE = Standard error.

3-2-2. RMSE

The results of the RMSE for $\mu_\theta$, $\Sigma_\theta$, $\delta_j$, and $\tau_i$ of the RSM and HMGL-RSM when persons respond to 10 and 25 items are provided in Tables 8 and 9. For both the RSM and HMGL-RSM, trends indicated that as the number of persons increased from 100 to 1000, the RMSE generally decreased for $\mu_\theta$, $\Sigma_\theta$, $\delta_j$, and $\tau_i$. This is expected because, as the number of persons increases, there are more observations from which to estimate the person and item parameters.

Additionally, as one can see, although the RMSE decreases for both the RSM and HMGL-RSM, the RMSE is somewhat higher for the RSM estimates. This is particularly the case for $\tau_1$, $\tau_2$, and $\Sigma_\theta$ when persons responded to 10 items. This probably occurs because, when using WINSTEPS to estimate these parameters, more items are needed to obtain more precise estimates. In contrast, notice that as more items are estimated the RMSE does not decrease for $\Sigma_\theta$ for the HMGL-RSM; rather, the RMSE remains fairly stable. This occurs because $\Sigma_\theta$ of the HMGL-RSM is the variation between the empirical Bayes estimates of the random effect of persons. As discussed by Raudenbush and Bryk (2002), and shown here, this estimate depends on the number of units of the random effects, not the number of fixed effects (in this case, the number of items).

Table 8. RMSE for the RSM and HMGL-RSM across 10 Items

                          RSM                     HMGL-RSM
                           K                          K
                    100    500    1000         100    500    1000
$\delta_{1}$        0.30   0.13   0.08         0.27   0.12   0.07
$\delta_{2}$        0.24   0.10   0.08         0.21   0.10   0.07
$\delta_{3}$        0.32   0.17   0.14         0.28   0.12   0.08
$\delta_{4}$        0.29   0.21   0.20         0.25   0.10   0.08
$\delta_{5}$        0.26   0.18   0.12         0.21   0.12   0.07
$\delta_{6}$        0.25   0.14   0.13         0.20   0.10   0.08
$\delta_{7}$        0.28   0.17   0.11         0.24   0.12   0.08
$\delta_{8}$        0.24   0.12   0.08         0.21   0.11   0.08
$\delta_{9}$        0.25   0.13   0.08         0.22   0.11   0.07
$\delta_{10}$       0.25   0.15   0.11         0.21   0.11   0.07
$\tau_{1}$          0.31   0.30   0.28         0.10   0.05   0.03
$\tau_{2}$          0.31   0.30   0.28         0.10   0.05   0.03
$\mu_{\theta}$      0.01   0.01   0.00         0.01   0.01   0.01
$\Sigma_{\theta}$   0.37   0.37   0.36         0.12   0.05   0.04

Note. K = Number of simulated persons. {$\delta_1, \delta_2, \ldots, \delta_{10}$} = locations for items 1-10. {$\tau_1, \tau_2$} = thresholds 1 and 2. $\mu_\theta$ = Mean person location. $\Sigma_\theta$ = Standard deviation of the person locations.


Table 9. RMSE for the RSM and HMGL-RSM across 25 Items

                          RSM                     HMGL-RSM
                           K                          K
                    100    500    1000         100    500    1000
$\delta_{1}$        0.28   0.12   0.08         0.27   0.11   0.07
$\delta_{2}$        0.22   0.10   0.07         0.21   0.09   0.07
$\delta_{3}$        0.29   0.13   0.09         0.28   0.13   0.08
$\delta_{4}$        0.24   0.12   0.10         0.24   0.10   0.07
$\delta_{5}$        0.24   0.14   0.08         0.22   0.12   0.07
$\delta_{6}$        0.21   0.11   0.09         0.20   0.10   0.08
$\delta_{7}$        0.25   0.12   0.08         0.23   0.12   0.08
$\delta_{8}$        0.22   0.11   0.08         0.21   0.10   0.08
$\delta_{9}$        0.23   0.12   0.07         0.22   0.11   0.07
$\delta_{10}$       0.21   0.11   0.07         0.20   0.11   0.07
$\delta_{11}$       0.24   0.13   0.09         0.23   0.11   0.07
$\delta_{12}$       0.26   0.14   0.09         0.24   0.11   0.09
$\delta_{13}$       0.22   0.15   0.10         0.22   0.13   0.09
$\delta_{14}$       0.24   0.10   0.10         0.23   0.07   0.08
$\delta_{15}$       0.26   0.10   0.07         0.25   0.09   0.07
$\delta_{16}$       0.26   0.12   0.09         0.25   0.12   0.09
$\delta_{17}$       0.20   0.10   0.08         0.20   0.09   0.07
$\delta_{18}$       0.25   0.12   0.08         0.24   0.12   0.07
$\delta_{19}$       0.25   0.13   0.07         0.24   0.12   0.07
$\delta_{20}$       0.21   0.11   0.09         0.20   0.11   0.08
$\delta_{21}$       0.25   0.11   0.08         0.24   0.11   0.08
$\delta_{22}$       0.25   0.11   0.09         0.24   0.11   0.08
$\delta_{23}$       0.24   0.11   0.09         0.23   0.10   0.08
$\delta_{24}$       0.26   0.10   0.08         0.24   0.10   0.07
$\delta_{25}$       0.26   0.10   0.08         0.25   0.10   0.07
$\tau_{1}$          0.13   0.11   0.10         0.06   0.03   0.02
$\tau_{2}$          0.13   0.11   0.10         0.06   0.03   0.02
$\mu_{\theta}$      0.00   0.00   0.00         0.01   0.01   0.01
$\Sigma_{\theta}$   0.15   0.14   0.13         0.12   0.05   0.04

Note. K = Number of simulated persons. {$\delta_1, \delta_2, \ldots, \delta_{25}$} = locations for items 1-25. {$\tau_1, \tau_2$} = thresholds 1 and 2. $\mu_\theta$ = Mean person location. $\Sigma_\theta$ = Standard deviation of the person locations.

3-3. Example
Below, an example analysis is presented using both the HMGL-RSM and -PCM.
The purpose is to illustrate the basic concepts underlying these two models, as well as to

illustrate the differences between the two models.

3-3-1. Design

The design of the analysis is as follows. Five hundred respondents were randomly selected from a larger sample of students who responded to a confidential readiness assessment. (Note that this was the same assessment that was simulated in Section 3-1.) In this sample, 46% had parents with high SES (SES = 1); 44% had parents with middle SES (SES = 2); and 10% had parents with low SES (SES = 3). In addition, 56% were male and 44% were female. Additionally, less than 1% were age 5; 23% were age 6; 65% were age 7; 12% were age 8; and less than 1% were age 9. Lastly, less than 1% were Asian; 42% were African-American; 2% were Hispanic; and 56% were Caucasian.

For the purposes of this illustration, only the first 10 items of the assessment were used. (Note that each item measured the person's personal and social development.) Additionally, only those respondents who answered each item and whose parents provided their SES were used. As illustrated above, the sample and item sizes were adequate to obtain relatively precise parameter estimates.

3-3-2. Analysis

To analyze the responses of the students, PROC NLMIXED of SAS (2001) was used to estimate the person and item parameters for the HMGL-RSM and -PCM. Model fit was compared using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Manalo (2004) and Singer (1998) show these measures to be adequate for judging model fit in HLM analyses.

3-3-3. Results

The results of the analysis for the HMGL-RSM and -PCM are presented in Table 10. As can be seen, $\hat\delta_1$ - $\hat\delta_{10}$, $\hat\mu_\theta$, and $\hat\Sigma_\theta$ are similar between the two models. Additionally, $\hat\tau_{i1}$ - $\hat\tau_{i10}$ are similar across the ten items for the -PCM. Lastly, notice that $\hat\tau_{i1}$ - $\hat\tau_{i10}$ are also generally similar to $\hat\tau_1$ and $\hat\tau_2$ for the -RSM.

To determine which model better fits the data, the AIC and BIC are examined. As shown, the AIC is lower for the HMGL-PCM than the -RSM, but the BIC is lower for the HMGL-RSM than the -PCM. That is, the AIC indicates the HMGL-PCM as being the better fit for the data, while the BIC indicates the HMGL-RSM as being the better fit. However, focusing on the information weights, which act similarly to an effect size in that measures are normalized and models can be compared on a common (probabilistic) scale (formulas can be found in Burnham and Anderson (2002)), we see that the information weights for the HMGL-RSM and -PCM are .11 and .89 for the AIC, and 1.00 and 0.00 for the BIC. Since higher values indicate better fit, and given the larger disparity in the weights for the BIC than for the AIC, and because the BIC compensates for the large sample size and the AIC does not, the BIC might give a better representation of the model fit for the two models. Hence, using the BIC, it appears that the HMGL-RSM fits the data better. This suggests that the thresholds ($\tau_{ij}$) are common across items (i.e., $\tau_{ij} = \tau_i$).
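
The information weights quoted above can be reproduced from the AIC and BIC values in Table 10. The Python sketch below uses the standard weight formula given by Burnham and Anderson (2002), in which each model's weight is proportional to $\exp(-\Delta/2)$ and $\Delta$ is the difference from the smallest criterion value; the function name `information_weights` is illustrative only.

```python
import math

def information_weights(criteria):
    """Normalised weights exp(-delta/2) for a list of AIC or BIC values."""
    best = min(criteria)
    relative = [math.exp(-0.5 * (c - best)) for c in criteria]
    total = sum(relative)
    return [r / total for r in relative]

# Order of models: HMGL-RSM, HMGL-PCM (values from Table 10).
aic_weights = information_weights([7146.7, 7142.6])
bic_weights = information_weights([7201.5, 7273.2])

print([round(w, 2) for w in aic_weights])
print([round(w, 2) for w in bic_weights])
```

The AIC difference of 4.1 points leaves some weight on the HMGL-RSM, whereas the BIC difference of 71.7 points places essentially all of the weight on the HMGL-RSM.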

Table 10. Parameter Estimates for the HMGL-RSM and -PCM

                             RSM                                 PCM
                         Est.     SE        Est.     SE      $\hat\tau_{1j}$   $\hat\tau_{2j}$   SE($\hat\tau_{2j}$)
$\hat\delta_{1}$         0.49   (0.16)      0.48   (0.14)        -2.10             2.10              (0.09)
$\hat\delta_{2}$         0.75   (0.16)      0.72   (0.15)        -2.03             2.03              (0.09)
$\hat\delta_{3}$        -0.28   (0.16)     -0.28   (0.14)        -2.00             2.00              (0.08)
$\hat\delta_{4}$        -0.92   (0.16)     -0.93   (0.14)        -2.22             2.22              (0.07)
$\hat\delta_{5}$        -0.12   (0.16)     -0.12   (0.14)        -2.05             2.05              (0.08)
$\hat\delta_{6}$         0.03   (0.16)      0.04   (0.14)        -2.22             2.22              (0.08)
$\hat\delta_{7}$        -0.22   (0.16)     -0.22   (0.14)        -2.69             2.69              (0.09)
$\hat\delta_{8}$         0.79   (0.16)      0.85   (0.15)        -2.39             2.39              (0.10)
$\hat\delta_{9}$         0.87   (0.16)      0.81   (0.15)        -1.87             1.87              (0.09)
$\hat\delta_{10}$       -0.04   (0.16)     -0.05   (0.14)        -2.09             2.09              (0.08)
$\hat\tau_{1}$          -2.15     .          -       -             -                 -                 -
$\hat\tau_{2}$           2.15   (0.03)       -       -             -                 -                 -
$\hat\mu_{\theta}$      -0.01     .        -0.01     .
$\hat\Sigma_{\theta}$    2.80   (0.12)      2.82   (0.12)
AIC                    7146.7             7142.6
BIC                    7201.5             7273.2

Note. {$\hat\delta_1, \hat\delta_2, \ldots, \hat\delta_{10}$} = locations for items 1-10. {$\hat\tau_1, \hat\tau_2$} = thresholds 1 and 2 for the -RSM. {$\hat\tau_{1j}, \hat\tau_{2j}$} = thresholds 1 and 2 for item $j$ of the -PCM. $\hat\mu_\theta$ = Mean person location. $\hat\Sigma_\theta$ = Standard deviation of the person locations. AIC = Akaike Information Criterion. BIC = Bayesian Information Criterion. Est. = Estimate. SE = Standard error.

To illustrate the interpretation of $\theta$ for the HMGL-RSM (which is similar for the -PCM), one focuses on an arbitrarily chosen respondent. For this respondent, $\theta = -2.36$ logits. Note that although a rater selected the categories for the respondent, assume (for this example and the following examples) that the respondent made the selection for himself or herself. On the underlying continuum, notice that this person's location is much lower on the scale than the overall attractiveness of, say, item 1 ($\hat\delta_1 = .49$). As shown below, for this item this suggests that the respondent is more likely to be attracted to the lower categorical responses than the higher categorical responses.
To determine the probability that this respondent will select category 0, 1, or 2, one refers back to Equations (2.6) and (2.32)-(2.35). For item 1,

$$\pi_{01} = \frac{\exp(0)}{\psi} = .67,$$

$$\pi_{11} = \frac{\exp(-2.36 - .49 - (-2.15))}{\psi} = .33,$$

$$\pi_{21} = \frac{\exp([-2.36 - .49 - 2.15] + [-2.36 - .49 - (-2.15)])}{\psi} = .00,$$

where

$$\psi = \exp(0) + \exp(-2.36 - .49 - (-2.15)) + \exp([-2.36 - .49 - 2.15] + [-2.36 - .49 - (-2.15)]) = 1.50.$$

This suggests that, for item 1, the probability that this respondent will select category 0 is .67, which is approximately double the probability of selecting category 1 (.33). As for category 2, the respondent has a probability of approximately 0 (about .002) of selecting this category.
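The arithmetic in this worked example can be checked with a short script. The following is a minimal sketch, assuming the adjacent-category form implied by the numbers above, namely $\log(\pi_i / \pi_{i-1}) = \theta - \delta_j - \tau_i$; the function name `rsm_category_probs` is illustrative and is not part of the original analysis, which used SAS.

```python
import math

def rsm_category_probs(theta, delta, taus):
    """Category probabilities for one person-item pair under the RSM.

    Assumes log(p_i / p_{i-1}) = theta - delta - tau_i for i = 1, 2, ...
    """
    # The exponent for category i is the running sum of adjacent-category
    # logits; category 0 has exponent 0.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    numerators = [math.exp(e) for e in exponents]
    psi = sum(numerators)  # normalizing constant
    return [n / psi for n in numerators]

# Values taken from the worked example for item 1.
probs = rsm_category_probs(theta=-2.36, delta=0.49, taus=[-2.15, 2.15])
print([round(p, 2) for p in probs])
```

Under these assumptions the three probabilities are approximately .67, .33, and .00, and the normalizing constant is approximately 1.50, in line with the values reported above.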


Chapter 4. Extending the HMGL-RSM To Include Person Covariates

4-1. The HMGL-RSM with Person Covariates

As seen in Chapter 3, one advantage of applying the HMGLM to model the RSM is that it affords the opportunity to obtain better precision for the estimates of the person and item parameters. However, this is not the only advantage. As mentioned previously, another advantage (the primary focus of this paper) is that by modeling the RSM in the HMGLM, the user may posit a model that includes covariates. In this chapter, the inclusion of covariates at the person level is discussed. This form of the HMGL-RSM may be especially important in accountability investigations in which the user is interested in the location of a student after controlling for the effects of a covariate (e.g., Stone & Lane, 2003).

To model the HMGL-RSM with person covariates, one follows the previous
deﬁnitions of the HMGL-RSM (Section 2-2), in which the category is nested within the
item, which in turn is nested within the person. However, now covariates at the person

level are included.

4-1-1. The Level-1 Model with Person Covariates

The Level-1 model (the category level) is defined as

$$\log\left[\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right] = \sum_{j=1}^{J} \beta_{jk}^{(i)} x_{jk}, \tag{4.1}$$

where $\beta_{jk}^{(i)}$ is the mean category effect if person k selects category i of item j; and $x_{jk}$ is a dummy variable with values 1 if person k answers item j, and 0 otherwise.


4-1-2. The Level-2 Model with Person Covariates

The Level-2 model (the item level) is defined as

$$\beta_{jk}^{(i)} = \gamma_{0jk} + \sum_{i'=1}^{I} \gamma_{1jk}^{(i')} w_{1jk}^{(i')}, \tag{4.2}$$

where, for person k, $\gamma_{0jk}$ is the mean effect of item j across categories i; $\gamma_{1jk}^{(i)}$ is the effect of an item on a particular category i; and $w_{1jk}^{(i')}$ is a dummy variable with values 1 if i' = i, and 0 otherwise. For identifiability, $\gamma_{1jk}^{(0)} = 0$.

4-1-3. The Level-3 Model with Person Covariates

The Level-3 model (the person level model) is defined as

$$\gamma_{0jk} = \lambda_{0j0} + \sum_{t=1}^{T} \lambda_{0jt} w_{0jk,t} + u_k \tag{4.3}$$

$$\gamma_{1jk}^{(i)} = \lambda_{10}^{(i)}, \tag{4.4}$$

where, for the j th item that is answered by person k, $\lambda_{0j0}$ is the mean effect of persons on item j; $\lambda_{0jt}$ is the effect of person covariate t; $w_{0jk,t}$ is a dummy variable with values 1 if covariate t affects person k, and 0 otherwise; $u_k$ is the random effect of person k on the mean effect of item j, after accounting for covariate t; and $\lambda_{10}^{(i)}$ is the mean change in $\lambda_{0j0}$ for a particular category of the items, for all persons.

However, the effect of covariate t is assumed to affect person k equally for each item j; hence $\lambda_{01t} = \lambda_{02t} = \ldots = \lambda_{0Jt} = \lambda_{0\cdot t}$. Thus, the Level-3 model for $\gamma_{0jk}$ becomes

$$\gamma_{0jk} = \lambda_{0j0} + \sum_{t=1}^{T} \lambda_{0\cdot t} w_{0\cdot k,t} + u_k, \tag{4.5}$$

where $\lambda_{0j0}$ and $u_k$ are defined above; $\lambda_{0\cdot t}$ is the effect of person covariate t, which is now constant across items; and $w_{0\cdot k,t}$ is a dummy variable with values 1 if covariate t affects person k, and 0 otherwise.
Here, it is helpful to refer back to the honesty example, in which a particular feeling of an applicant is nested within an item, which in turn is nested within the person. As before, the response to a particular item depends not only upon the overall attractiveness of the item ($\lambda_{0j0}$), but also on how the attractiveness of the item influences a particular feeling ($\lambda_{10}^{(i)}$). In addition to the honesty of the person, the response to the item also depends upon the person covariate ($\lambda_{0\cdot t}$), such as SES. In other words, for example, the respondent may become more honest as SES increases.

4-1-4. The Combined Model with Person Covariates
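The combined model of the HMGL-RSM with person covariates reduces to the following for a particular category i of item j

$$\log\left[\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right] = \lambda_{0j0} + \lambda_{10}^{(i)} + \sum_{t=1}^{T} \lambda_{0\cdot t} w_{0\cdot k,t} + u_k, \tag{4.6}$$

where all terms are defined above.

Therefore, the parameters of the HMGL-RSM with person covariates are related to and extend the parameters of the traditional RSM in the following manner:

$$\delta_j = -\lambda_{0j0}, \tag{4.7}$$

$$\tau_i = \lambda_{10}^{(i)}, \tag{4.8}$$

and

$$\begin{aligned}
\theta_{k,1} &= \lambda_{0\cdot 1} w_{0\cdot k,1} + u_k \\
\theta_{k,2} &= \lambda_{0\cdot 2} w_{0\cdot k,2} + u_k \\
&\;\;\vdots \\
\theta_{k,T} &= \lambda_{0\cdot T} w_{0\cdot k,T} + u_k
\end{aligned} \tag{4.9}$$

where $\delta_j$ and $\tau_i$ are defined above; and $(\theta_{k,1}, \theta_{k,2}, \ldots, \theta_{k,T})$ is the location of person k, when accounting for covariate t (t = 1, ..., T).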

4-2. Simulation Study for the HMGL-RSM with Person Covariates

The following section describes a simulation study for the HMGL-RSM with
person covariates. Since Section 3-2 already described a simulation study that examined
the parameter recovery of the person and item parameters when person covariates were
not added to the HMGL-RSM, the focus of this section is to examine the behaviors of the

person parameters when being inﬂuenced by covariates.

4-2-1. Design
The design of the simulation is as follows. Observations were simulated using the HMGL-RSM. For the study, 100, 500, or 1000 simulees responded to 10 polytomous items, where each item consisted of 3 categories i (i = 0, 1, 2). The numbers of simulees, items, and categories were chosen to follow typical data from a questionnaire (e.g., Dodd, 1990; Smith & Johnson, 2000; Zhu, Updyke, & Lewandowski, 1997) or a large-scale assessment (e.g., Michigan Education Assessment Program, 2003; U.S. Department of Education, 1999). In addition, the numbers of simulees and items were chosen because, as shown in Section 3-2, these sample sizes allow for reasonable precision (at least when covariates were not modeled).

To produce the simulated responses, each simulee k was randomly assigned to be in one of three levels of a person covariate ($\lambda_{0\cdot 1}$). The probability of being assigned to a given level was chosen to be .46, .46, and .08, respectively. These probabilities followed the observed frequencies of the levels of a covariate used in an actual administration of a confidential readiness assessment. Here, the covariate was SES.

Additionally, each simulee k was randomly assigned a $u_k$, $u \sim N(0,1)$. Thus, $\theta_k$ was obtained by using Equation (4.9). For the simulation, to examine the effect of the person covariate, $\lambda_{0\cdot 1}$ was selected to be .2, .5, and 1. These values were chosen to follow previous simulation designs of hierarchical IRT models using person covariates (Kamata, 1998). $\delta_j$ and $\tau_i$ were randomly selected to represent parameter estimates obtained from typical polytomous data (i.e., items 1-10 in Table 2).

Using $\theta_k$, $\delta_j$, and $\tau_i$, three response probabilities for each simulee by item combination were produced, $P_{0jk}(\theta)$, $P_{1jk}(\theta)$, and $P_{2jk}(\theta)$. If

$$\sum_{i=0}^{i'} P_{ijk}(\theta) < Y_{jk} \le \sum_{i=0}^{i'+1} P_{ijk}(\theta),$$

then simulee k was assigned a response of i' + 1 for item j; otherwise a response of 0 was assigned. Note that i' = 0, 1; and $Y_{jk}$ was a single, random number for each j x k combination, $Y \sim U(0,1)$.
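The data-generation steps just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the S-Plus code used in the study: the covariate levels are assumed to be coded 1, 2, 3 (as in the example analysis of Section 4-3), the adjacent-category logit is assumed to be $\theta - \delta_j - \tau_i$, and the item locations and thresholds below are hypothetical placeholders rather than the Table 2 values.

```python
import math
import random

def rsm_category_probs(theta, delta, taus):
    """RSM category probabilities, assuming log(p_i / p_{i-1}) = theta - delta - tau_i."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    numerators = [math.exp(e) for e in exponents]
    psi = sum(numerators)
    return [n / psi for n in numerators]

def assign_category(probs, y):
    """Return the first category whose cumulative probability reaches y."""
    cumulative = 0.0
    for category, p in enumerate(probs):
        cumulative += p
        if y <= cumulative:
            return category
    return len(probs) - 1  # guard against floating-point round-off

def simulate(n_persons, deltas, taus, lam, rng):
    """Simulate one response matrix; each row is (SES level, list of item responses)."""
    data = []
    for _ in range(n_persons):
        # Assumed coding: SES levels 1, 2, 3 drawn with probabilities .46, .46, .08.
        ses = rng.choices([1, 2, 3], weights=[0.46, 0.46, 0.08])[0]
        u = rng.gauss(0.0, 1.0)   # random person effect, u ~ N(0, 1)
        theta = lam * ses + u     # person location, as in Equation (4.9)
        responses = [
            assign_category(rsm_category_probs(theta, d, taus), rng.random())
            for d in deltas
        ]
        data.append((ses, responses))
    return data

# Hypothetical item parameters, for illustration only.
deltas = [-0.1, 0.0, -0.9, -1.6, -0.8, -0.75, -0.85, 0.0, 0.05, -0.85]
taus = [-2.25, 2.25]
sample = simulate(100, deltas, taus, lam=0.5, rng=random.Random(2004))
print(sample[0])
```

Repeating the call 50 times for each combination of sample size and covariate effect would give the 450 data matrices of the design.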

The simulation procedure utilized a fully crossed 3 x 3 factorial design that simulated 9 conditions. Each administration was iterated 50 times producing 450 unique response data matrices. The number of iterations was chosen because Kamata (1998) showed this to be a reasonable number for obtaining stable estimates. S-Plus (2000) was used to generate all data. SAS (2001) was used to obtain parameter estimates and conduct significance tests.

4-2-2. Analysis
For the analysis regarding the parameter recovery of the HMGL-RSM with person covariates, the RMSE for $u_k$, $\lambda_{0\cdot 1}$, $\delta_j$, and $\tau_i$ was obtained over the iterations for each condition. Specifically, the RMSE was obtained by

$$RMSE(\hat{\omega}) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{\omega}_n - \omega_n\right)^2}, \tag{4.10}$$

where the maximum number of n iterations was N = 50; and $\omega$ is an arbitrary parameter representing either $u_k$, $\lambda_{0\cdot 1}$, $\delta_j$, or $\tau_i$. A descriptive analysis of the RMSE was conducted for each condition.
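As a concrete illustration of Equation (4.10), the following minimal sketch computes the RMSE of a parameter over replications. The function name `rmse` and the four example estimates are hypothetical; they are not values from the study.

```python
import math

def rmse(estimates, true_values):
    """Root mean squared error of estimates against the generating values."""
    if len(estimates) != len(true_values):
        raise ValueError("estimates and true_values must have the same length")
    n = len(estimates)
    squared_errors = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared_errors) / n)

# Hypothetical estimates of a covariate effect whose generating value is .2.
estimates = [0.19, 0.22, 0.17, 0.21]
truth = [0.2] * len(estimates)
value = rmse(estimates, truth)
print(round(value, 4))
```

In the study itself the sum would run over the N = 50 replications of a condition rather than over four values.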

4-2-3. Results: Descriptive Statistics
Displayed in Tables 11, 12, and 13 are the means and standard deviations of the parameter estimates for the HMGL-RSM when $\lambda_{0\cdot 1}$ equaled .2, .5, and 1, respectively. As can be seen, the standard deviations of the estimates are similar across conditions. Additionally, the standard deviations are fairly low and decrease as the number of persons increases. This suggests that PROC NLMIXED obtains relatively consistent estimates of the HMGL-RSM parameters.

As for the means of the estimates, in general, the estimates obtained by PROC NLMIXED for the HMGL-RSM appear to differ only slightly from their parameter values. Below, in Section 4-2-4, the RMSE is examined.

Table 11. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when $\lambda_{0\cdot 1} = .2$

                               100              500              1000
                            M      SE        M      SE        M      SE
$\hat{\delta}_1$          -0.10  (0.35)    -0.13  (0.14)    -0.09  (0.11)
$\hat{\delta}_2$           0.04  (0.37)     0.00  (0.16)     0.00  (0.14)
$\hat{\delta}_3$          -0.89  (0.39)    -0.93  (0.16)    -0.92  (0.12)
$\hat{\delta}_4$          -1.59  (0.29)    -1.61  (0.16)    -1.58  (0.14)
$\hat{\delta}_5$          -0.81  (0.38)    -0.83  (0.18)    -0.82  (0.14)
$\hat{\delta}_6$          -0.76  (0.34)    -0.76  (0.15)    -0.73  (0.13)
$\hat{\delta}_7$          -0.88  (0.35)    -0.85  (0.15)    -0.81  (0.12)
$\hat{\delta}_8$          -0.05  (0.31)    -0.05  (0.14)     0.00  (0.11)
$\hat{\delta}_9$           0.05  (0.37)     0.05  (0.16)     0.07  (0.12)
$\hat{\delta}_{10}$       -0.84  (0.38)    -0.87  (0.17)    -0.85  (0.12)
$\hat{\tau}_1$            -2.27  (0.12)    -2.25  (0.05)    -2.25  (0.03)
$\hat{\tau}_2$             2.27  (0.12)     2.25  (0.05)     2.25  (0.03)
$\lambda_{0\cdot 1}$       0.19  (0.18)     0.19  (0.07)     0.20  (0.06)
$\mu_{\hat{u}}$            0.01  (0.00)     0.01  (0.00)     0.01  (0.00)
$\Sigma_{\hat{u}}$         0.99  (0.13)     1.00  (0.06)     1.00  (0.03)

Note. {100, 500, 1000} = Number of simulated individuals. $\{\hat{\delta}_1, \hat{\delta}_2, \ldots, \hat{\delta}_{10}\}$ = location for items 1-10. $\{\hat{\tau}_1, \hat{\tau}_2\}$ = thresholds 1 and 2. $\lambda_{0\cdot 1}$ = person covariate. $\mu_{\hat{u}}$ = Mean person location, after controlling for $\lambda_{0\cdot 1}$. $\Sigma_{\hat{u}}$ = Standard deviation of the person locations, after controlling for $\lambda_{0\cdot 1}$. M = Mean. SE = Standard error.

Table 12. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when $\lambda_{0\cdot 1} = .5$

                               100              500              1000
                            M      SE        M      SE        M      SE
$\hat{\delta}_1$          -0.14  (0.39)    -0.12  (0.14)    -0.09  (0.11)
$\hat{\delta}_2$           0.06  (0.36)     0.01  (0.16)     0.00  (0.16)
$\hat{\delta}_3$          -0.87  (0.40)    -0.95  (0.14)    -0.93  (0.12)
$\hat{\delta}_4$          -1.57  (0.31)    -1.62  (0.16)    -1.58  (0.15)
$\hat{\delta}_5$          -0.80  (0.34)    -0.83  (0.16)    -0.81  (0.14)
$\hat{\delta}_6$          -0.75  (0.33)    -0.77  (0.14)    -0.73  (0.13)
$\hat{\delta}_7$          -0.83  (0.33)    -0.86  (0.16)    -0.80  (0.13)
$\hat{\delta}_8$           0.00  (0.32)    -0.06  (0.14)    -0.01  (0.12)
$\hat{\delta}_9$           0.07  (0.31)     0.06  (0.15)     0.07  (0.12)
$\hat{\delta}_{10}$       -0.82  (0.35)    -0.87  (0.16)    -0.86  (0.14)
$\hat{\tau}_1$            -2.24  (0.13)    -2.25  (0.05)    -2.24  (0.04)
$\hat{\tau}_2$             2.24  (0.13)     2.25  (0.05)     2.24  (0.04)
$\lambda_{0\cdot 1}$       0.50  (0.17)     0.49  (0.07)     0.50  (0.06)
$\mu_{\hat{u}}$            0.00  (0.00)     0.01  (0.00)     0.01  (0.00)
$\Sigma_{\hat{u}}$         0.97  (0.13)     1.00  (0.06)     1.00  (0.03)

Note. {100, 500, 1000} = Number of simulated individuals. $\{\hat{\delta}_1, \hat{\delta}_2, \ldots, \hat{\delta}_{10}\}$ = location for items 1-10. $\{\hat{\tau}_1, \hat{\tau}_2\}$ = thresholds 1 and 2. $\lambda_{0\cdot 1}$ = person covariate. $\mu_{\hat{u}}$ = Mean person location, after controlling for $\lambda_{0\cdot 1}$. $\Sigma_{\hat{u}}$ = Standard deviation of the person locations, after controlling for $\lambda_{0\cdot 1}$. M = Mean. SE = Standard error.

Table 13. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM when $\lambda_{0\cdot 1} = 1$

                               100              500              1000
                            M      SE        M      SE        M      SE
$\hat{\delta}_1$          -0.10  (0.35)    -0.12  (0.14)    -0.10  (0.12)
$\hat{\delta}_2$           0.10  (0.38)     0.00  (0.17)     0.00  (0.15)
$\hat{\delta}_3$          -0.87  (0.39)    -0.94  (0.17)    -0.93  (0.14)
$\hat{\delta}_4$          -1.53  (0.34)    -1.60  (0.17)    -1.59  (0.16)
$\hat{\delta}_5$          -0.77  (0.37)    -0.83  (0.17)    -0.82  (0.15)
$\hat{\delta}_6$          -0.71  (0.38)    -0.77  (0.14)    -0.75  (0.16)
$\hat{\delta}_7$          -0.78  (0.36)    -0.86  (0.15)    -0.82  (0.13)
$\hat{\delta}_8$           0.04  (0.34)    -0.04  (0.15)    -0.01  (0.13)
$\hat{\delta}_9$           0.11  (0.34)     0.07  (0.16)     0.07  (0.14)
$\hat{\delta}_{10}$       -0.79  (0.38)    -0.87  (0.17)    -0.86  (0.14)
$\hat{\tau}_1$            -2.26  (0.16)    -2.25  (0.07)    -2.25  (0.05)
$\hat{\tau}_2$             2.26  (0.16)     2.25  (0.07)     2.25  (0.05)
$\lambda_{0\cdot 1}$       1.03  (0.18)     0.99  (0.07)     1.00  (0.07)
$\mu_{\hat{u}}$           -0.01  (0.01)    -0.01  (0.00)    -0.01  (0.00)
$\Sigma_{\hat{u}}$         0.98  (0.11)     1.01  (0.06)     1.00  (0.04)

Note. {100, 500, 1000} = Number of simulated individuals. $\{\hat{\delta}_1, \hat{\delta}_2, \ldots, \hat{\delta}_{10}\}$ = location for items 1-10. $\{\hat{\tau}_1, \hat{\tau}_2\}$ = thresholds 1 and 2. $\lambda_{0\cdot 1}$ = person covariate. $\mu_{\hat{u}}$ = Mean person location, after controlling for $\lambda_{0\cdot 1}$. $\Sigma_{\hat{u}}$ = Standard deviation of the person locations, after controlling for $\lambda_{0\cdot 1}$. M = Mean. SE = Standard error.

4-2-4. Results: RMSE

The results of the RMSE for $\mu_{\hat{u}}$, $\Sigma_{\hat{u}}$, $\lambda_{0\cdot 1}$, $\delta_j$, and $\tau_i$ of the HMGL-RSM with a person covariate are provided in Table 14. Trends indicated that as the number of persons increased from 100 to 1000, the RMSE generally decreased for $\mu_{\hat{u}}$, $\Sigma_{\hat{u}}$, $\lambda_{0\cdot 1}$, $\delta_j$, and $\tau_i$. This is expected because as the number of persons increases there are more observations from which to estimate the person and item parameters.

Additionally, as one can see, the magnitude of the covariate ($\lambda_{0\cdot 1}$) does not influence the RMSE. This illustrates that regardless of the size of the covariate, the coefficient for the covariate is recovered fairly well, with increasing precision as the number of persons increases.

Table 14. RMSE for the HMGL-RSM with Person Covariates

                                .2                   .5                   1
                         100   500   1000     100   500   1000     100   500   1000
$\delta_1$               0.35  0.15  0.11     0.38  0.14  0.11     0.35  0.15  0.12
$\delta_2$               0.37  0.15  0.14     0.36  0.15  0.16     0.39  0.17  0.15
$\delta_3$               0.39  0.16  0.12     0.40  0.15  0.12     0.39  0.17  0.14
$\delta_4$               0.29  0.16  0.13     0.31  0.17  0.15     0.34  0.17  0.16
$\delta_5$               0.38  0.18  0.14     0.34  0.16  0.14     0.37  0.17  0.15
$\delta_6$               0.34  0.15  0.12     0.33  0.15  0.13     0.37  0.15  0.16
$\delta_7$               0.36  0.16  0.12     0.33  0.17  0.13     0.36  0.16  0.13
$\delta_8$               0.31  0.14  0.11     0.32  0.15  0.12     0.34  0.16  0.13
$\delta_9$               0.37  0.16  0.12     0.31  0.15  0.12     0.34  0.15  0.14
$\delta_{10}$            0.38  0.17  0.12     0.35  0.16  0.14     0.39  0.17  0.14
$\tau_1$                 0.12  0.05  0.03     0.13  0.05  0.04     0.16  0.07  0.05
$\tau_2$                 0.12  0.05  0.03     0.13  0.05  0.04     0.16  0.07  0.05
$\lambda_{0\cdot 1}$     0.18  0.07  0.05     0.16  0.07  0.06     0.18  0.07  0.07
$\mu_{\hat{u}}$          0.01  0.01  0.01     0.01  0.01  0.01     0.01  0.01  0.01
$\Sigma_{\hat{u}}$       0.12  0.06  0.03     0.13  0.06  0.03     0.11  0.06  0.03

Note. {100, 500, 1000} = Number of simulated individuals. {.2, .5, 1} = Values of $\lambda_{0\cdot 1}$. $\{\delta_1, \delta_2, \ldots, \delta_{10}\}$ = location for items 1-10. $\{\tau_1, \tau_2\}$ = thresholds 1 and 2. $\lambda_{0\cdot 1}$ = person covariate. $\mu_{\hat{u}}$ = Mean person location, after controlling for $\lambda_{0\cdot 1}$. $\Sigma_{\hat{u}}$ = Standard deviation of the person locations, after controlling for $\lambda_{0\cdot 1}$.

4-3. Example Analysis of the HMGL-RSM with Person Covariates
The purpose of this section is to provide an example analysis that illustrates the
basic concepts of the HMGL-RSM. In particular, how to use the HMGL-RSM to model

person covariates is illustrated.

4-3-1. Design

The design of the analysis is as follows. Five hundred respondents were randomly selected from a larger sample of students that responded to a confidential readiness assessment. Note this was the same assessment simulated in Sections 3-1 and 4-2, and the same sample and set of items illustrated in Section 3-3. Specifically, in this sample, 46% had parents with high SES (SES = 1); 44% had parents with middle SES (SES = 2); and 10% had parents with low SES (SES = 3). Fifty-six percent were male, and 44% were female. Additionally, less than 1% were age 5; 23% were age 6; 65% were age 7; 12% were age 8; and less than 1% were age 9. Lastly, less than 1% were Asian; 42% were African-American; 2% were Hispanic; and 56% were Caucasian.

For the purposes of this illustration, only the first 10 items of the assessment were used. (Note each item measured the person's personal and social development.) Additionally, only those respondents who answered each item and whose parents provided their SES were used. As illustrated in Section 4-2, the sample and item sizes were adequate to obtain relatively precise parameter estimates.

4-3-2. Analysis
To analyze the responses of the students, PROC NLMIXED of SAS (2001) was used to estimate the person and item parameters for the HMGL-RSM with SES as the person covariate. For comparison, the MRCMM (Equations (1.5) and (1.6)) with SES as a covariate for the random person location ($\theta_k$) was also estimated using PROC NLMIXED. Note that SAS was used and not ConQuest because it was of interest to compare the models, not the estimation algorithms of the software.

Also note that for the MRCMM, typically the item response is a column vector, where the number of rows is equal to the number of categories, and where a row equals 1 if the person selected a particular category, and 0 otherwise. This creates a dummy column vector with observations equal to I x J x K rows. For the data here, when the observation vector was created this way, the adaptive Gaussian quadrature integral approximations did not converge. To reduce the number of observations, rather than using a column vector of 0s and 1s, the categorical response itself (e.g., if the person selected category 2, the response was 2) was used. This created a response column vector with observations equal to J x K rows. With this response vector, convergence was achieved.

4-3-3. Results


The results of the analysis for the HMGL-RSM and MRCMM with SES as a
person covariate are presented in Table 15. As can be seen, the HMGL-RSM and
MRCMM yield identical estimates for all parameters. This result is not surprising given
that in order to comply with the assumptions of IRT, the HMGL-RSM is deﬁned by
constraining the person location to be equal across items and categories (see Section 2-1-
3). Consequently, there is no variation in the person location across items and categories,
as is the case with the MRCMM. Additionally, recall that in order to get the estimation
algorithm to converge for the MRCMM, the number of observations was reduced.
Therefore, because the general form of the HMGL-RSM and MRCMM are similar, and

because the number of observations is equivalent, similar estimates are obtained.


Table 15. Parameter Estimates for the MRCMM and HMGL-RSM With SES as a Person Covariate

                              MRCMM             HMGL-RSM
                            M      SE          M      SE
$\delta_1$                 0.09  (0.36)       0.09  (0.36)
$\delta_2$                 0.35  (0.36)       0.35  (0.36)
$\delta_3$                -0.68  (0.36)      -0.68  (0.36)
$\delta_4$                -1.31  (0.36)      -1.31  (0.36)
$\delta_5$                -0.51  (0.36)      -0.51  (0.36)
$\delta_6$                -0.37  (0.36)      -0.37  (0.36)
$\delta_7$                -0.62  (0.36)      -0.62  (0.36)
$\delta_8$                 0.39  (0.36)       0.39  (0.36)
$\delta_9$                 0.48  (0.36)       0.48  (0.36)
$\delta_{10}$             -0.44  (0.36)      -0.44  (0.36)
$\tau_1$                   2.15   .           2.15   .
$\tau_2$                  -2.15  (0.03)      -2.15  (0.03)
$\mu_{\theta 1}$          -0.12  (2.42)      -0.12  (2.42)
$\mu_{u1}$                 0.12  (2.42)       0.12  (2.42)
$\lambda_{0\cdot 1}$      -0.24  (0.20)      -0.24  (0.20)
$\mu_{\theta 2}$          -0.74  (2.65)      -0.74  (2.65)
$\mu_{u2}$                -0.26  (2.65)      -0.26  (2.65)
$\lambda_{0\cdot 1}$      -0.24  (0.20)      -0.24  (0.20)
$\mu_{\theta 3}$          -0.15  (2.52)      -0.15  (2.52)
$\mu_{u3}$                 0.57  (2.52)       0.57  (2.52)
$\lambda_{0\cdot 1}$      -0.24  (0.20)      -0.24  (0.20)
$\mu_\theta$              -0.40              -0.40
$\mu_u$                   -0.01              -0.01
$\Sigma_\theta$            2.80  (0.12)       2.80  (0.12)
AIC                       7147.3             7147.3
BIC                       7206.3             7206.3

Note. $\{\delta_1, \delta_2, \ldots, \delta_{10}\}$ = location for items 1-10. $\{\tau_1, \tau_2\}$ = thresholds 1 and 2. $\lambda_{0\cdot 1}$ = Effect of SES. $\mu_\theta$ = Overall mean person location. $\{\mu_{\theta 1}, \mu_{\theta 2}, \mu_{\theta 3}\}$ = Mean for person locations of high, medium, and low SES groups. $\Sigma_\theta$ = Standard deviation of the person locations. AIC = Akaike Information Criterion. BIC = Bayesian Information Criterion. Est. = Estimate. SE = Standard error.

Nevertheless, the HMGL-RSM may be the preferred model, because it is expected that if the number of observations were increased for the MRCMM, as the developers intended, then the estimates would be somewhat different. And as mentioned above, the MRCMM was not defined as being able to model additional hierarchical levels that predict how the item parameters behave, which may be important (e.g., see Section 5).

To illustrate the influence of SES, one now compares the HMGL-RSM with and without SES (Table 10 in Chapter 3). As one can see, when SES is not included in the model, the overall mean person location is centered near zero ($\mu_{\hat{\theta}} = -.01$), as would be expected. Additionally, if SES is not modeled, the low, medium, and high SES groups have mean person locations equaling .27, -.34, and .26, respectively. Notice, then, that the mean person location of the high SES group is actually slightly lower than the location for the low SES group. Also, the middle SES group is roughly .6 logit lower than both the high and low SES groups.

In contrast, when controlling for SES, the overall mean of the random effect of persons is centered near zero, $\mu_{\hat{u}} = -.01$. Notice this value is similar to the overall mean person location when SES is not accounted for. This is expected because, recall, the mean of the random effect of persons (u) is set to zero, and when SES is not modeled, $\theta = u$.

However, by modeling SES, we see that its effect on the person location ($\lambda_{0\cdot 1}$) is -.24. Hence, as a person increases by one unit in SES (i.e., increases in poverty), his or her location decreases. Thus, by including SES, the overall person location decreases by almost half a logit ($\mu_{\hat{\theta}} = -.40$). Hence, if the parent's SES is controlled for (that is, we ignore the effects of the parent's SES), then the average person's location on the underlying continuum is almost half a logit higher.

For example, the mean location ($\mu_\theta$) of the high, medium, and low SES groups is -.12, -.74, and -.15. However, notice that after controlling for SES, the mean location ($\mu_{\hat{u}}$) of the high, medium, and low SES groups becomes .12, -.26, and .57. Although the rank ordering of the groups is the same as when SES is not controlled for, notice now that by controlling for SES, the groups' mean locations increase. Additionally, the difference in mean locations between the groups becomes larger, at nearly half a logit.
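The relation between the two sets of group means can be verified directly from the Table 15 estimates. The sketch below assumes the person location is $\theta = \lambda_{0\cdot 1} \cdot SES + u$ with SES coded 1 (high), 2 (middle), 3 (low), and uses the sample proportions from Section 4-3-1; the variable names are illustrative only.

```python
# Estimates reported in Table 15.
lam = -0.24                               # effect of SES
mean_u = {1: 0.12, 2: -0.26, 3: 0.57}     # group means after controlling for SES
proportions = {1: 0.46, 2: 0.44, 3: 0.10} # sample shares of each SES level

# Group mean locations: theta = lam * SES + u, averaged within each group.
group_mean_theta = {ses: lam * ses + mean_u[ses] for ses in mean_u}

# Overall mean location, taking the mean of the random effect u as zero.
mean_ses = sum(ses * p for ses, p in proportions.items())
overall_mean_theta = lam * mean_ses + 0.0

print({ses: round(m, 2) for ses, m in group_mean_theta.items()})
print(round(mean_ses, 2), round(overall_mean_theta, 2))
```

Under these assumptions the group means come out at about -.12, -.74, and -.15, and the overall mean at about -.39, which rounds to the -.4 shift discussed in this section.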

So which is the better model for the data: the HMGL-RSM with SES or without SES? Examining the AIC and BIC values, we see that for both the AIC and BIC, the lower values are for the HMGL-RSM without SES. Furthermore, when inspecting the information weights, the AIC and BIC weights for the HMGL-RSM without SES are .57 and 1.00, while the AIC and BIC weights for the HMGL-RSM with SES are .43 and .08. Since the AIC and BIC are lower for the HMGL-RSM without SES, and since higher weights indicate the model is more likely, the evidence suggests that the HMGL-RSM without SES is the better fitting model.
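For readers who wish to reproduce information weights of this kind, the following sketch applies the usual normalized form, $w_m = \exp(-\Delta_m / 2) / \sum_r \exp(-\Delta_r / 2)$, to the AIC and BIC values in Tables 10 and 15. Whether this is exactly the normalization used for the reported weights is an assumption; the function name `information_weights` is illustrative.

```python
import math

def information_weights(criteria):
    """Normalized weights from a list of AIC (or BIC) values, one per model."""
    best = min(criteria)
    # Relative likelihood of each model given its distance from the best value.
    relative = [math.exp(-0.5 * (c - best)) for c in criteria]
    total = sum(relative)
    return [r / total for r in relative]

# Order: HMGL-RSM without SES (Table 10), HMGL-RSM with SES (Table 15).
aic_weights = information_weights([7146.7, 7147.3])
bic_weights = information_weights([7201.5, 7206.3])
print([round(w, 2) for w in aic_weights])
print([round(w, 2) for w in bic_weights])
```

With this normalization the AIC weights are about .57 and .43, matching the text. The BIC weights are about .92 and .08, so the value of 1.00 reported above for the model without SES may reflect rounding or a different scaling (for example, an unnormalized relative likelihood). Either way the ordering of the two models is unchanged.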

Before this section is concluded, the reader should notice that the difference between the item locations for each item of the two models is -.4. This difference does not necessarily indicate that by including SES in the model, the item location decreases by .4; rather, it indicates the arbitrariness of the IRT scale. That is, recall from Section 2-4, the IRT scale is indeterminate, and the indeterminacy is resolved by centering on the normally distributed person measures, where the mean is equal to zero. By including the covariate SES in the model, the mean of the scale changes. Specifically, in contrast to before, when not including SES, $\mu_\theta = \mu_u = 0$. However, by including SES,

$$\mu_\theta = \mu_{\lambda_{0\cdot 1} SES + u} = \lambda_{0\cdot 1}\mu_{SES} + \mu_u = \lambda_{0\cdot 1}\mu_{SES} + 0 = -.4.$$


Chapter 5. Extending the HMGL-RSM To Include a Group Level
5-1. The Four-Level HMGL-RSM

As seen in Chapter 4, one advantage of applying the HMGLM to model the RSM is that the user may posit a model that includes person covariates. Another advantage is that the user may posit a model that includes a group level, which defines how the item parameters behave across groups. Hence, a Four-Level HMGL-RSM is defined. This form of the HMGL-RSM may be especially important in educational testing during investigations of differential item functioning (DIF).

To model the Four-Level HMGL-RSM, four models are defined. The Level-1, -2, and -3 models follow the previous definitions of the HMGL-RSM, in which the category is nested within the item, which in turn is nested within the person (Section 2-3). For the Four-Level HMGL-RSM, the Level-4 model is defined for the group level, where persons are nested within groups.

5-1-1. The Level-1 Model

The Level-1 model (the category level) is defined as

$$\log\left[\frac{\pi_{ijkl}}{\pi_{i-1,jkl}}\right] = \sum_{j=1}^{J} \beta_{jkl}^{(i)} x_{jkl}, \tag{5.1}$$

where $\beta_{jkl}^{(i)}$ is the mean category effect if person k in group l selects category i of item j; and $x_{jkl}$ is a dummy variable with values 1 if person k in group l answers item j, and 0 otherwise.

5-1-2. The Level-2 Model

The Level-2 model (the item level) is defined as

$$\beta_{jkl}^{(i)} = \gamma_{0jkl} + \sum_{i'=1}^{I} \gamma_{1jkl}^{(i')} w_{1jkl}^{(i')}, \tag{5.2}$$

where, for person k in group l, $\gamma_{0jkl}$ is the mean effect of item j across categories i; $\gamma_{1jkl}^{(i)}$ is the effect of an item on a particular category i; and $w_{1jkl}^{(i')}$ is a dummy variable with values 1 if i' = i for the j th item answered by person k in group l, and 0 otherwise. For identifiability, $\gamma_{1jkl}^{(0)} = 0$.

Here, notice that before, the item effects only varied across persons. Now, not only do the item effects vary for each person k, but the item effects vary for each group l as well. To see how the effects vary, the person level model (Level 3) and the group level model (Level 4) are defined.

5-1-3. The Level-3 Model

The Level-3 model (the person level) is defined as

$$\gamma_{0jkl} = \lambda_{0j0l} + u_{kl}, \tag{5.3}$$

$$\gamma_{1jkl}^{(i)} = \lambda_{10l}^{(i)}, \tag{5.4}$$

where, for the j th item that is answered by person k in group l, $\lambda_{0j0l}$ is the mean effect of persons for group l on item j; $u_{kl}$ is the random effect of person k in group l on the mean effect of item j; and $\lambda_{10l}^{(i)}$ is the mean change in $\lambda_{0j0l}$ for a particular category of the items.

However, in IRT, we assume that the person effects are not only constant across items, but constant regardless of group as well. Thus, the following constraint is made

$$u_{k1} = u_{k2} = \ldots = u_{kL} = u_k,$$

and the Level-3 model for the mean item effect becomes

$$\gamma_{0jkl} = \lambda_{0j0l} + u_k, \tag{5.5}$$

where $\lambda_{0j0l}$ is defined above; and $u_k$ is the random effect of person k (regardless of group) across items.

Here, it is helpful to refer back to the honesty example, for we can clearly see how the category effects function as the categories are nested within items, which in turn are nested within persons. Specifically, as mentioned above, the probability that an applicant is attracted to a particular feeling for a particular answered item not only depends upon the overall attractiveness of the item ($\lambda_{0j0l}$), but also how the attractiveness of the item influences a particular feeling ($\lambda_{10l}^{(i)}$). In addition, as the Level-3 model shows, the overall attractiveness of the item ($\lambda_{0j0l}$) and the influence of an item on a particular feeling ($\lambda_{10l}^{(i)}$) are fixed across persons, but may vary across the groups l. Lastly, as is commonly assumed in IRT, the unique effect of an applicant randomly varies across the different applicants.

5-1-4. The Level-4 Model

Lastly, the Level-4 model (the group level) is defined as

$$\lambda_{0j0l} = \delta_{0j00} + \sum_{l=1}^{L-1} \delta_{0j0l} z_{0j0l}, \tag{5.6}$$

$$\lambda_{10l}^{(i)} = \delta_{100}^{(i)}, \tag{5.7}$$

where, for the j th item that is answered by person k in group l, $\delta_{0j00}$ is the mean effect of groups on item j; $\delta_{0j0l}$ is the mean change in $\delta_{0j00}$ as group membership changes; $\delta_{100}^{(i)}$ is the mean change in $\delta_{0j00}$ for a particular category of item j; and $z_{0j0l}$ is a dummy variable with values 1 if person k is a member of a particular group l, and 0 otherwise.

Again, one refers back to the honesty example. In the group level model, we can see how the overall attractiveness of the item ($\lambda_{0j0l}$) depends on group membership. For example, if an applicant belongs to the baseline group, such as Caucasian, then the overall attractiveness of the item for Caucasians is given as $\delta_{0j00}$. However, if an applicant belongs to a comparison group, such as Asians, then the overall attractiveness of the item for Asians is given as $\delta_{0j00} + \delta_{0j01}$. Additionally, notice the attractiveness of the item for a particular feeling ($\lambda_{10l}^{(i)}$) remains fixed not only for different persons, but for different groups as well ($\delta_{100}^{(i)}$).
5-1-5. The Combined Model

The combined model of the Four-Level HMGL-RSM reduces to the following for a particular category i of item j

$$\log\left[\frac{\pi_{ijkl}}{\pi_{i-1,jkl}}\right] = \delta_{0j00} + \sum_{l=1}^{L-1} \delta_{0j0l} z_{0j0l} + \delta_{100}^{(i)} + u_k, \tag{5.8}$$

where all terms are defined above.

Therefore, the parameters of the HMGL-RSM are related to and extend the parameters of the traditional RSM in the following manner:

$$\begin{aligned}
\delta_{j0} &= -\delta_{0j00} \\
\delta_{j1} &= -(\delta_{0j00} + \delta_{0j01}) \\
\delta_{j2} &= -(\delta_{0j00} + \delta_{0j02}) \\
&\;\;\vdots \\
\delta_{j,L-1} &= -(\delta_{0j00} + \delta_{0j0,L-1})
\end{aligned} \tag{5.9}$$

$$\tau_i = \delta_{100}^{(i)}, \tag{5.10}$$

and

$$\theta_k = u_k, \tag{5.11}$$

where $\tau_i$ and $\theta_k$ are defined above; $\delta_{j0}$ is the location of the item on the underlying continuum for the baseline group; and $\delta_{jl}$ is the location of the item on the underlying continuum for a particular group l.
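The mapping in Equation (5.9) between the group-level coefficients and the group-specific item locations can be illustrated as follows. This is a sketch with hypothetical coefficient values; the function names and the dictionary-based representation of the group shifts are assumptions made for the example.

```python
def item_location(delta_0j00, group_shifts, group):
    """Item location for a group: delta_jl = -(delta_0j00 + delta_0j0l).

    Group 0 is the baseline group and receives no shift.
    """
    shift = 0.0 if group == 0 else group_shifts[group]
    return -(delta_0j00 + shift)

def adjacent_logit(delta_0j00, group_shifts, group, delta_100_i, u_k):
    """Right-hand side of the combined model, Equation (5.8), for one category."""
    shift = 0.0 if group == 0 else group_shifts[group]
    return delta_0j00 + shift + delta_100_i + u_k

# Hypothetical coefficients: baseline item effect and one comparison-group shift.
delta_0j00 = -0.49
group_shifts = {1: -0.07}

baseline_location = item_location(delta_0j00, group_shifts, group=0)
comparison_location = item_location(delta_0j00, group_shifts, group=1)
print(round(baseline_location, 2), round(comparison_location, 2))
```

In this illustration the item is .07 logits harder to endorse for the comparison group, which is the kind of group difference that a nonzero $\delta_{0j0l}$ is meant to capture.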

5-2. Simulation Study for the Four-Level HMGL-RSM

The following section describes a simulation study for the Four-Level HMGL-RSM. Since Section 3-2 already described a simulation study that examined the parameter recovery of the person and item parameters when a fourth level was not added to the HMGL-RSM, the focus of this section is to examine the behaviors of the item parameters when being influenced by the additional level. Specifically, the purposes of the following section are (1) to determine the precision of the parameter recovery for the person and item parameters, in particular the item parameters at the group level, and (2) to determine the accuracy of a statistical test to detect the influence of a group-level coefficient as a measure of DIF.

5-2-1. Design

The design of the simulation is as follows. Observations were simulated using the HMGL-RSM. For the study, 500 simulees from 2 groups (l = 0, 1) responded to 10 polytomous items, where each item consisted of 3 categories i (i = 0, 1, 2). The numbers of groups, simulees, items, and categories were chosen to follow typical data from a questionnaire (e.g., Dodd, 1990; Smith & Johnson, 2000; Zhu, Updyke, & Lewandowski, 1997) or a large-scale assessment (e.g., Michigan Education Assessment Program, 2003; U.S. Department of Education, 1999). In addition, the numbers of simulees and items were chosen because, as shown in Section 3-2, these sample sizes allow for reasonable precision (at least when a four-level model was not employed).

To produce the simulated responses, each simulee k in group l was randomly assigned a location $\theta_{kl}$, $\theta \sim N(0,1)$. Additionally, each item j was randomly assigned a set of item parameters. These item parameters were selected to represent parameter estimates from typical polytomous data, and follow those that are presented in Table 2 for a confidential readiness assessment. The items that were selected to be simulated were chosen to be the first 10 items of the confidential readiness assessment that did not exhibit DIF between males and females (Table 16). By selecting only non-DIF items (in regards to gender DIF), the influence of DIF by the non-focus items was minimized.

Table 16. DIF results for the Mantel-Haenszel test

Original   Simulation     $M^2$       p
  Item        Item
    1           1          0.52     0.471
    2           2          0.37     0.545
    3                     76.52     0.000*
    4                     74.91     0.000*
    5                     36.46     0.000*
    6           3          0.15     0.699
    7           4          0.77     0.379
    8           5          0.16     0.688
    9                     38.89     0.000*
   10           6          8.17     0.004
   11           7          0.12     0.731
   12           8          0.39     0.532
   13                     13.28     0.000*
   14                     31.21     0.000*
   15                     16.13     0.000*
   16                     18.60     0.000*
   17           9          7.75     0.005
   18                     17.38     0.000*
   19          10          0.70     0.403
   20                      2.94     0.086
   21                      9.23     0.002*
   22                      0.09     0.760
   23                      2.85     0.091
   24                      0.03     0.871
   25                      9.79     0.002

Note. $M^2$ = Mantel-Haenszel test statistic. p = p-value. * = statistically significant at $\alpha = .05/25 = .002$. p = .000 implies p < .0001.
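The p-values in Table 16 appear consistent with referring $M^2$ to a chi-square distribution with 1 degree of freedom; that reference distribution is an assumption here, since the table note does not state it. Under that assumption, the following sketch recomputes a p-value and the Bonferroni-adjusted significance level; `chi2_sf_1df` is an illustrative helper name.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    Uses P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))

# Bonferroni-adjusted level for 25 items, as in the table note.
alpha = 0.05 / 25

p_item_1 = chi2_sf_1df(0.52)    # original item 1
p_item_10 = chi2_sf_1df(8.17)   # original item 10
print(round(alpha, 3), round(p_item_1, 3), round(p_item_10, 3))
```

For the two items shown, the recomputed p-values agree with the tabled values of .471 and .004 to three decimals.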

The Mantel-Haenszel (MH) test (Mantel, 1963) was used as the original test for DIF. This test was selected because it has been well-studied (e.g., Kim, 2000), and has been typically used in DIF analyses of polytomous data (e.g., U.S. Department of Education, 1999).

Thus, using 9k] , 51.1, and r,- , three response probabilities for each simulee by

item combination were produced, P011109) , P1 jkl (6) , and P2jk1(9) . If
i' i’+1

Z Pi'jkl (0) < Y jk 5 Z Ii'jkl (6) , then simulee k in group I was assigned a response of
0 O

i' +1 for item j; otherwise a response of 0 was assigned. Note that i' = 0, l; and Y jk was

a single, random number for each j x k combination, Y ~ U(0, 1) .
The simulation manipulated three variables: (1) the proportion of simulees in the

focus group, (2) the difference in the mean location of the person parameters for the

reference group (67,0) and the focal group (671) , and (3) the level of DIF in the focus

item. Each variable and each condition (described below) was chosen because previous
research found these to inﬂuence DIF detection (Luppescu, 2002).

The conditions for the proportion of simulees in the focus group varied between
10% (50 simulees) and 25% (125 simulees). This represented a testing situation where

the focus group was small or moderate in size.

The conditions for the difference in mean location varied such that 60 was
randomly sampled from N (0,1) , and 6] was randomly sampled from N (-l,l) or

N (—.5,l). This represented a testing situation where, on average, the focus group had a

moderately lower or somewhat lower person location than the reference group.
Lastly, the conditions for the level of DIF in the focus item (which was arbitrarily chosen to be item 1 in Table 16) varied for the focus group by a positive difference of 1 standard error (.07) or 2 standard errors (.14). This represented a testing situation where the focus item displayed a small or moderate effect of DIF; that is, the focus item was somewhat or moderately less attractive to endorse for the focus group. (Note the standard error for item 1 was found in Table 5 of Section 3-2-1 and chosen to be the standard error when 1000 persons responded to 10 items.)

The simulation procedure utilized a fully crossed 2 x 2 x 2 factorial design that
simulated 8 conditions. Each administration was iterated 50 times producing 400 unique
response data matrices. The number of iterations was chosen because Kamata (1998)
showed this to be a reasonable number for obtaining stable estimates. S-Plus (2000) was
used to generate all data. SAS (2001) was used to obtain parameter estimates and conduct

signiﬁcance tests.
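
The fully crossed design can be enumerated directly. The short Python sketch below lists the 8 conditions and confirms the count of 400 data sets; the variable names are hypothetical, the total of 500 simulees is implied by the focus-group sizes given above, and all other values are those stated in the text.

```python
from itertools import product

N_SIMULEES = 500                     # implied by 10% = 50 and 25% = 125 simulees
ITERATIONS = 50                      # replications per condition

focus_proportions = [0.10, 0.25]     # proportion of simulees in the focus group
focus_mean_locations = [-0.5, -1.0]  # mean person location of the focus group
dif_shifts = [0.07, 0.14]            # 1 or 2 standard errors of DIF in item 1

# Every combination of the three manipulated variables is one condition.
conditions = list(product(focus_proportions, focus_mean_locations, dif_shifts))
focus_sizes = [round(p * N_SIMULEES) for p in focus_proportions]
n_datasets = len(conditions) * ITERATIONS
```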

5-2-2. Analysis
For the analysis regarding the parameter recovery of the Four-Level HMGL-RSM, the RMSE for $\delta_j$ and $\tau_i$ was obtained over the iterations for each condition. Specifically, the RMSE was obtained by

$$RMSE(\hat{\omega}) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{\omega}_n - \omega\right)^2}, \tag{5.12}$$

where the maximum number of n iterations was N = 50; and $\omega$ is an arbitrary parameter representing either $\delta_j$ or $\tau_i$. A descriptive analysis of the RMSE was conducted for each condition.
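
For concreteness, the RMSE over replications can be sketched in a few lines of Python. The function name and the example values are hypothetical and serve only to illustrate the computation described above.

```python
import math


def rmse(estimates, true_value):
    """Root mean squared error of replicated estimates around the generating value."""
    n = len(estimates)
    return math.sqrt(sum((est - true_value) ** 2 for est in estimates) / n)


# Hypothetical estimates of one item location across four replications.
example = rmse([0.0, 2.0, 0.0, 2.0], true_value=1.0)
```

Because the deviations are taken from the generating value rather than from the mean of the estimates, the RMSE reflects both bias and sampling variability.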
For the analysis regarding the accuracy of a statistical test to detect DIF, a t-test with $\alpha = .05$ was applied to examine the following hypotheses:

$$H_0: \xi_{0101} = 0$$
$$H_1: \xi_{0101} \neq 0$$

If $H_0$ is not rejected, the data provide no statistical evidence that $\xi_{0101}$ differs from zero, and item 1 is treated as exhibiting no DIF. That is, the location of item 1 for each group is equal:

$$\delta_{1,1} = -(\xi_{0100} + \xi_{0101}) = -(\xi_{0100}) = \delta_{1,0}.$$

If $H_0$ is rejected, then there is statistical evidence to suggest that $\xi_{0101}$ significantly differs from zero, and DIF exists. That is, the location of item 1 for each group is not equal:

$$\delta_{1,1} = -(\xi_{0100} + \xi_{0101}) \neq -(\xi_{0100}) = \delta_{1,0}.$$

Thus, to examine the accuracy, if $H_0$ was rejected, then a 'hit' was made; otherwise a 'miss' was made. The proportion of hits across iterations for a condition was defined as the hit rate, i.e., the accuracy of the t-test for detecting DIF (when DIF exists) under the aforementioned conditions. A descriptive analysis of the hit rate was conducted for each condition.
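
The hit-rate computation can be sketched as follows. This Python example assumes that each replication supplies an estimate of the DIF coefficient and its standard error, and it substitutes a normal approximation for the t reference distribution, which is a simplification rather than the procedure used in SAS. The function names and the example values are hypothetical.

```python
from statistics import NormalDist


def rejects_null(estimate, std_error, alpha=0.05):
    """Two-sided test of H0: coefficient = 0, using a normal approximation."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(estimate / std_error) > critical


def hit_rate(replications, alpha=0.05):
    """Proportion of replications in which the DIF coefficient is flagged."""
    hits = sum(rejects_null(est, se, alpha) for est, se in replications)
    return hits / len(replications)


# Hypothetical (estimate, standard error) pairs from four replications.
example_rate = hit_rate([(0.05, 0.31), (0.70, 0.31), (0.20, 0.30), (0.65, 0.30)])
```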

Note that Cheong and Raudenbush (2000), Kamata (1998), Luppescu (2002), and Kim (2003) describe and illustrate similar DIF analyses using a two-level, hierarchical IRT model for dichotomous data, in which the covariates for the item parameters were added at the item level rather than a group level. Although the model presented above will reduce to an analogous formulation of the aforementioned models, the model defined here may be preferable because users are given the option of specifying a random component at the group level. Although the random component was not included here because it was not of interest, other users may wish to examine this component as a measure of the group location across the items.

5-2-3. Results: Descriptive Statistics
Displayed in Tables 17 and 18 are the means and standard deviations of the parameter estimates for the Four-Level HMGL-RSM when the proportion of simulees in the focus group was 10% and 25%, respectively. As can be seen, the standard deviations of the estimates are similar and fairly low across conditions, except for $\hat{\delta}_{1,1}$. For $\hat{\delta}_{1,1}$, as the proportion of simulees in the focus group increased from 10% to 25%, the standard deviation decreased from a moderate to a somewhat moderate magnitude, as would be expected. This suggested that PROC NLMIXED obtained relatively consistent estimates of the HMGL-RSM parameters, especially as the group size increased.

Table 17. Mean and Standard Error of the Parameter Estimates for the Four-Level
HMGL-RSM for Proportion = 10%

                          $\bar{\theta}_{\cdot 1} = -.5$        $\bar{\theta}_{\cdot 1} = -1$
                          1 SD             2 SD             1 SD             2 SD
Parameter                 M      SE        M      SE        M      SE        M      SE
$\hat{\delta}_{1,0}$     -0.05  (0.12)    -0.05  (0.12)    -0.02  (0.12)    -0.02  (0.12)
$\hat{\delta}_{1,1}$      0.06  (0.31)     0.13  (0.31)     0.28  (0.33)     0.34  (0.30)
$\hat{\delta}_{2}$        0.08  (0.09)     0.08  (0.09)     0.12  (0.09)     0.12  (0.09)
$\hat{\delta}_{3}$       -0.84  (0.11)    -0.84  (0.11)    -0.79  (0.12)    -0.79  (0.12)
$\hat{\delta}_{4}$       -1.45  (0.11)    -1.45  (0.11)    -1.40  (0.10)    -1.40  (0.10)
$\hat{\delta}_{5}$       -0.67  (0.11)    -0.67  (0.11)    -0.63  (0.11)    -0.63  (0.11)
$\hat{\delta}_{6}$       -0.71  (0.10)    -0.71  (0.10)    -0.65  (0.10)    -0.65  (0.10)
$\hat{\delta}_{7}$       -0.78  (0.10)    -0.78  (0.10)    -0.73  (0.10)    -0.73  (0.10)
$\hat{\delta}_{8}$       -0.06  (0.11)    -0.06  (0.11)    -0.01  (0.11)    -0.01  (0.11)
$\hat{\delta}_{9}$        0.12  (0.11)     0.12  (0.11)     0.17  (0.11)     0.17  (0.11)
$\hat{\delta}_{10}$      -0.71  (0.09)    -0.71  (0.09)    -0.65  (0.09)    -0.65  (0.09)
$\hat{\tau}_{1}$         -2.23  (0.04)    -2.23  (0.04)    -2.23  (0.04)    -2.23  (0.04)
$\hat{\tau}_{2}$          2.23  (0.04)     2.23  (0.04)     2.23  (0.04)     2.23  (0.04)

Note. $\bar{\theta}_{\cdot 1}$ = mean location of focus group. SD = standard deviation shift in item 1 for focus group. $\{\hat{\delta}_{1,0}, \hat{\delta}_{1,1}\}$ = location for item 1 for the reference (0) and focal (1) groups. $\{\hat{\delta}_{2}, \hat{\delta}_{3}, \ldots, \hat{\delta}_{10}\}$ = location for items 2 - 10 for both groups. $\{\hat{\tau}_{1}, \hat{\tau}_{2}\}$ = thresholds 1 and 2. M = mean. SE = standard error.

Table 18. Mean and Standard Error of the Parameter Estimates for the Four-Level
HMGL-RSM for Proportion = 25%

                          $\bar{\theta}_{\cdot 1} = -.5$        $\bar{\theta}_{\cdot 1} = -1$
                          1 SD             2 SD             1 SD             2 SD
Parameter                 M      SE        M      SE        M      SE        M      SE
$\hat{\delta}_{1,0}$      0.01  (0.14)     0.01  (0.14)     0.10  (0.14)     0.10  (0.14)
$\hat{\delta}_{1,1}$      0.16  (0.19)     0.23  (0.20)     0.38  (0.21)     0.45  (0.19)
$\hat{\delta}_{2}$        0.15  (0.08)     0.15  (0.08)     0.27  (0.08)     0.27  (0.08)
$\hat{\delta}_{3}$       -0.75  (0.11)    -0.75  (0.11)    -0.63  (0.11)    -0.63  (0.11)
$\hat{\delta}_{4}$       -1.25  (0.11)    -1.25  (0.11)    -1.12  (0.10)    -1.12  (0.10)
$\hat{\delta}_{5}$       -0.46  (0.10)    -0.46  (0.10)    -0.34  (0.10)    -0.34  (0.10)
$\hat{\delta}_{6}$       -0.64  (0.10)    -0.64  (0.10)    -0.52  (0.10)    -0.52  (0.10)
$\hat{\delta}_{7}$       -0.77  (0.10)    -0.77  (0.10)    -0.65  (0.10)    -0.65  (0.10)
$\hat{\delta}_{8}$       -0.13  (0.11)    -0.13  (0.11)     0.00  (0.11)     0.00  (0.11)
$\hat{\delta}_{9}$        0.17  (0.11)     0.17  (0.11)     0.28  (0.11)     0.28  (0.11)
$\hat{\delta}_{10}$      -0.54  (0.10)    -0.54  (0.10)    -0.41  (0.10)    -0.41  (0.10)
$\hat{\tau}_{1}$         -2.21  (0.04)    -2.21  (0.04)    -2.21  (0.04)    -2.21  (0.04)
$\hat{\tau}_{2}$          2.21  (0.04)     2.21  (0.04)     2.21  (0.04)     2.21  (0.04)

Note. $\bar{\theta}_{\cdot 1}$ = mean location of focus group. SD = standard deviation shift in item 1 for focus group. $\{\hat{\delta}_{1,0}, \hat{\delta}_{1,1}\}$ = location for item 1 for the reference (0) and focal (1) groups. $\{\hat{\delta}_{2}, \hat{\delta}_{3}, \ldots, \hat{\delta}_{10}\}$ = location for items 2 - 10 for both groups. $\{\hat{\tau}_{1}, \hat{\tau}_{2}\}$ = thresholds 1 and 2. M = mean. SE = standard error.

As for the mean of the estimates, the estimates obtained by PROC NLMIXED generally appeared to differ slightly from the parameter values (cf. Table 2). Specifically, trends indicated that the level of DIF did not influence the mean of the estimates. However, as the proportion of simulees in the focus group increased and as the mean location of the focus group decreased, the mean of the estimates generally deviated from the parameter values by a positive magnitude. Below, the RMSE is examined.

5-2-4. Results: RMSE

The results of the RMSE for the item parameters of the Four-Level HMGL-RSM
are provided in Table 19. As alluded to above, trends indicated that as the level of DIF
increased, the RMSE did not vary across the conditions substantially. This is expected
because, as shown in Section 3-2-2, the location of the item does not inﬂuence the

RMSE.

Table 19. RMSE for the Four-Level HMGL-RSM

                          10%                               25%
                          $\bar{\theta}_{\cdot 1} = -.5$  $\bar{\theta}_{\cdot 1} = -1$    $\bar{\theta}_{\cdot 1} = -.5$  $\bar{\theta}_{\cdot 1} = -1$
Parameter                 1 SD   2 SD    1 SD   2 SD     1 SD   2 SD    1 SD   2 SD
$\hat{\delta}_{1,0}$      0.13   0.13    0.14   0.14     0.17   0.17    0.23   0.23
$\hat{\delta}_{1,1}$      0.32   0.32    0.44   0.42     0.26   0.27    0.45   0.44
$\hat{\delta}_{2}$        0.11   0.11    0.14   0.14     0.15   0.15    0.26   0.26
$\hat{\delta}_{3}$        0.14   0.14    0.17   0.17     0.21   0.21    0.31   0.31
$\hat{\delta}_{4}$        0.16   0.17    0.20   0.20     0.34   0.34    0.46   0.46
$\hat{\delta}_{5}$        0.17   0.17    0.21   0.21     0.36   0.36    0.48   0.48
$\hat{\delta}_{6}$        0.11   0.11    0.13   0.13     0.14   0.14    0.25   0.25
$\hat{\delta}_{7}$        0.10   0.10    0.12   0.12     0.11   0.11    0.19   0.19
$\hat{\delta}_{8}$        0.12   0.12    0.11   0.11     0.16   0.16    0.10   0.10
$\hat{\delta}_{9}$        0.12   0.12    0.15   0.15     0.15   0.15    0.24   0.24
$\hat{\delta}_{10}$       0.17   0.17    0.22   0.22     0.33   0.33    0.45   0.45
$\hat{\tau}_{1}$          0.04   0.04    0.04   0.04     0.05   0.05    0.05   0.05
$\hat{\tau}_{2}$          0.04   0.04    0.04   0.04     0.05   0.05    0.05   0.05

Note. {10%, 25%} = percentage of sample in focus group. $\bar{\theta}_{\cdot 1}$ = mean location of focus group. SD = standard deviation shift in item 1 for focus group. $\{\hat{\delta}_{1,0}, \hat{\delta}_{1,1}\}$ = location for item 1 for the reference (0) and focal (1) groups. $\{\hat{\delta}_{2}, \hat{\delta}_{3}, \ldots, \hat{\delta}_{10}\}$ = location for items 2 - 10 for both groups. $\{\hat{\tau}_{1}, \hat{\tau}_{2}\}$ = thresholds 1 and 2. M = mean. SE = standard error.

Additionally, as the proportion of simulees in the focus group increased and as the mean location of the focus group decreased, the RMSE generally increased. The one exception occurs for $\hat{\delta}_{1,1}$ when $\bar{\theta}_{\cdot 1} = -.5$. In this case, as the proportion of simulees in the focus group increased, the RMSE decreased.

Moreover, as the proportion of simulees in the focus group increased, the magnitude of the RMSE generally increased from a low range (.04 to .22) to a moderate range (.32 to .42). For $\hat{\delta}_{1,1}$, the RMSE moved from a range of .32 to .44 to a range of .26 to .45. These trends and magnitudes suggest that the sample characteristics of the focal group influence the empirical Bayes estimates of not only the focal group, but the non-focal group as well.

5-2-5. Results: Accuracy

As for the accuracy of the t-test for detecting DIF (when DIF exists), the results show the hit rates were low (Table 20), but still moderately higher than those of the MH test (Table 21). Also, trends indicated that the hit rates increased as (1) the level of DIF increased, (2) the mean location of the focus group decreased, and (3) the proportion in the focus group increased. Thus, although the hit rates for detecting DIF with the HMGL-RSM were low, it is expected that increasing the sample and group size should increase the hit rates as well and further set the HMGL-RSM apart from the MH test. This provides some evidence for the use of the HMGL-RSM as a test for DIF.

Table 20. Hit Rates for Detecting DIF with the HMGL-RSM

                                   10%             25%
                                   1 SD   2 SD     1 SD   2 SD
$\bar{\theta}_{\cdot 1} = -.5$     0.06   0.10     0.10   0.16
$\bar{\theta}_{\cdot 1} = -1$      0.16   0.22     0.26   0.38

Note. {10%, 25%} = percentage of sample in focus group. $\bar{\theta}_{\cdot 1}$ = mean location of focus group. SD = standard deviation shift in item 1 for focus group.

Table 21. Hit Rates for Detecting DIF with the MH test

                                   10%             25%
                                   1 SD   2 SD     1 SD   2 SD
$\bar{\theta}_{\cdot 1} = -.5$     0.04   0.00     0.10   0.10
$\bar{\theta}_{\cdot 1} = -1$      0.02   0.00     0.12   0.08

Note. {10%, 25%} = percentage of sample in focus group. $\bar{\theta}_{\cdot 1}$ = mean location of focus group. SD = standard deviation shift in item 1 for focus group.

5-3. Example Analysis of the Four-Level HMGL-RSM
The purpose of this section is to provide an example analysis that illustrates the basic concepts of the Four-Level HMGL-RSM. In particular, it illustrates how to use the model to detect DIF between males and females.

5-3-1. Design

The design of the analysis is as follows. Five hundred respondents were randomly selected from a larger sample of students who responded to a confidential readiness assessment. Note this was the same assessment simulated in Sections 3-1, 4-2, and 5-2. In this sample, 53% were male and 47% were female, as was the case in the original sample. Additionally, approximately 1% were age 5; 26% were age 6; 67% were age 7; and 6% were age 8. Lastly, approximately 1% were Asian; 48% were African-American; 8% were Hispanic; and 42% were Caucasian.

For the purposes of this illustration, only the ﬁrst 10 items of the assessment were
used. (Note each item measured the person’s personal and social development.)
Additionally, only those respondents who answered each item and provided their gender
were used. As illustrated in Sections 3-1 and 5-2, the sample and item sizes were
adequate to obtain relatively precise parameter estimates and moderately accurate DIF

tests.

5-3-2. Analysis

To analyze the responses of the students, PROC NLMIXED of SAS (2001) was
used to estimate the person and item parameters for the Four-Level HMGL-RSM. Recall
that the four levels of this model are given above. The group predictor that was used was
Gender, in which Males was the reference group (0), and Females was the focus group
(1).

For each item, the following hypotheses were examined:

$$H_0: \xi_{0j01} = 0$$
$$H_1: \xi_{0j01} \neq 0,$$

where j = 1, ..., 10.

Thus, for a particular item j, if $H_0$ was not rejected, then the item was treated as exhibiting no DIF. Likewise, if $H_0$ was rejected, then there was statistical evidence to suggest that DIF exists.

Additionally, for comparative purposes, the MH test was conducted. As mentioned above, this test was selected because it has been well studied (e.g., Kim, 2000) and has typically been used in DIF analyses of polytomous data (e.g., U.S. Department of Education, 1999). Also, note previous simulation research has suggested
that similar ﬁndings occur if no puriﬁcation procedures were used, two stage puriﬁcation
procedures were used, or an iterative puriﬁcation process was used (Wang & Su, 2004).
Hence, because similar DIF results are obtained regardless of puriﬁcation procedures,
and because research has shown that the two stage and iterative puriﬁcation procedures
become inefﬁcient when used in conditions similar to those studied here (Donoghue,
Holland, & Thayer, 1993 as cited by Wang & Su, 2004), the decision to not apply any

puriﬁcation was made.

To examine whether the t-test for $\xi_{0j01}$ and the MH test for item j were accurate, the results of the analyses were compared to the DIF results found for the larger sample (Table 16). As shown there, of the first 10 items, items 3-5 and 9 exhibited DIF between Males and Females.

5-3-3. Results

The results of the analysis are presented in Table 22. As can be seen, the t-test flagged DIF more liberally than the MH test. Specifically, the t-test correctly identified items 3-5 and 9 as exhibiting DIF. However, the t-test also incorrectly identified items 7, 8, and 10 as exhibiting DIF. In contrast, the MH test correctly identified only item 9, and incorrectly identified item 1 as exhibiting DIF. Although the Type I error may be high for the HMGL-RSM, this may be preferable because failing to flag true DIF may have greater consequences than flagging an item unnecessarily. Thus, although the Type I error may be high, it appears that the t-test was more powerful at detecting DIF than the MH test.

One reason the HMGL-RSM may be more powerful at detecting DIF than the MH test is that the HMGL-RSM is based on parametric methods, while the MH test is not. That is, the HMGL-RSM is based on the HMGLM framework, which attempts to explicitly model the parameters that characterize the DIF. And, as shown above, the HMGL-RSM is estimated rather precisely; hence the parameters that characterize the DIF may be estimated rather precisely as well.

Table 22. Item Analysis of a Real Data Set

                                 HMGL-RSM                              MH
Item   Par.                      Est.     SE      t        p           $M^2$   p
1      $\hat{\xi}_{0100}$       -1.49    0.22    -6.66    0.00         6.20    0.01 b
       $\hat{\xi}_{0101}$        0.31    0.33     0.95    0.34
2      $\hat{\xi}_{0200}$       -1.87    0.23    -8.30    0.00         4.03    0.04
       $\hat{\xi}_{0201}$        0.63    0.33     1.91    0.06
3      $\hat{\xi}_{0300}$       -1.21    0.22    -5.46    0.00         5.46    0.02
       $\hat{\xi}_{0301}$        1.52    0.33     4.61    0.00 a
4      $\hat{\xi}_{0400}$       -0.68    0.22    -3.12    0.00         5.83    0.02
       $\hat{\xi}_{0401}$        1.52    0.33     4.61    0.00 a
5      $\hat{\xi}_{0500}$       -0.94    0.22    -4.25    0.00         0.01    0.93
       $\hat{\xi}_{0501}$        1.13    0.33     3.45    0.00 a
6      $\hat{\xi}_{0600}$       -0.82    0.22    -3.73    0.00         3.34    0.07
       $\hat{\xi}_{0601}$        0.59    0.33     1.80    0.07
7      $\hat{\xi}_{0700}$       -1.13    0.22    -5.11    0.00         0.04    0.84
       $\hat{\xi}_{0701}$        0.92    0.33     2.81    0.01 b
8      $\hat{\xi}_{0800}$       -1.90    0.23    -8.39    0.00         0.04    0.84
       $\hat{\xi}_{0801}$        0.99    0.33     3.00    0.00 b
9      $\hat{\xi}_{0900}$       -2.25    0.23    -9.83    0.00         6.23    0.01 a
       $\hat{\xi}_{0901}$        1.62    0.33     4.87    0.00 a
10     $\hat{\xi}_{0,10,00}$    -1.11    0.22    -5.03    0.00         0.24    0.62
       $\hat{\xi}_{0,10,01}$     0.93    0.33     2.82    0.01 b
       $\hat{\tau}_{1}$         -2.11    .        .       .
       $\hat{\tau}_{2}$          2.11    0.04   -60.31    0.00

Note. Par. = parameter. Est. = estimate. SE = standard error. t = t-statistic. p = p-value. $M^2$ = Mantel-Haenszel test statistic. $\{\hat{\xi}_{0j00}, \hat{\xi}_{0j01}\}$ = overall attractiveness of item j for Males and Females, respectively. $\{\hat{\tau}_{1}, \hat{\tau}_{2}\}$ = thresholds 1 and 2. $\alpha = .05/10 = .005$ (.01 when rounded to two decimals). p = 0.00 implies p < .001. a = correct flag for DIF. b = incorrect flag for DIF.

To interpret the item parameters, recall that if the item does not exhibit DIF, then $\xi_{0j01} = 0$ and

$$\delta_{j,1} = \delta_{j,0} = \delta_{j} = -\xi_{0j00}.$$

If the item exhibits DIF, then $\xi_{0j01} \neq 0$ and

$$\delta_{j,0} = -\xi_{0j00},$$
$$\delta_{j,1} = -(\xi_{0j00} + \xi_{0j01}).$$

For example, for item 1, the t-test was not statistically significant for $\hat{\xi}_{0101}$; hence, $\hat{\delta}_{1,0} = \hat{\delta}_{1,1} = \hat{\delta}_{1} = -\hat{\xi}_{0100} = 1.49$. Similarly, for item 2, the t-test was not statistically significant for $\hat{\xi}_{0201}$; hence $\hat{\delta}_{2} = -\hat{\xi}_{0200} = 1.87$. In other words, for item 1, the log-odds of the overall attractiveness of the item is 1.49 for a typical respondent, while for item 2, the log-odds of the overall attractiveness of the item is 1.87. Thus, item 1 has a lower overall attractiveness than item 2, which suggests that the polytomous alternatives for item 1 are easier to endorse than those for item 2, for a typical respondent regardless of gender.

In contrast to items 1 and 2, for item 3 the t-test was statistically significant for $\hat{\xi}_{0301}$; hence for Males, $\hat{\delta}_{3,0} = -\hat{\xi}_{0300} = 1.21$, and for Females, $\hat{\delta}_{3,1} = -(\hat{\xi}_{0300} + \hat{\xi}_{0301}) = -(-1.21 + 1.52) = -.31$. Thus, the overall attractiveness of item 3 is substantially lower for Females than for Males. This suggests that the polytomous alternatives for item 3 are easier to endorse for Females than for Males.
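
The conversion from estimated coefficients to group-specific item locations can be sketched in Python as follows. The function and argument names are hypothetical; the numerical inputs are the estimates for items 1 and 3 reported in Table 22, and the rule that a non-significant DIF coefficient is treated as zero follows the interpretation given above.

```python
def group_locations(intercept, dif_coefficient, dif_significant):
    """Return (reference, focal) item locations from the estimated coefficients.

    The reference-group location is the negative of the item intercept. The
    focal-group location also subtracts the DIF coefficient, but only when the
    coefficient was flagged as statistically significant.
    """
    reference = -intercept
    focal = -(intercept + dif_coefficient) if dif_significant else reference
    return reference, focal


item1_males, item1_females = group_locations(-1.49, 0.31, dif_significant=False)
item3_males, item3_females = group_locations(-1.21, 1.52, dif_significant=True)
```

For item 3 this reproduces the locations of 1.21 for Males and -.31 for Females, while for item 1 both groups share the location 1.49.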

(As an aside, the reader should note that the item location for all items is lower for Females than for Males. In other words, the items are easier for Females than for Males. This makes sense because each of the studied items measures a person's personal and social development, and it is well known that Female children are more advanced in terms of personal and social development than Male children. Hence, the items are expected to be easier for Females than for Males.)

Chapter 6. Extending the HMGL-RSM To Include Item Covariates

6-1. The HMGL-RSM with Item Covariates

As seen in the preceding chapters, the major advantages of applying the HMGLM to model the RSM are that it affords the user the opportunity to (1) obtain better precision for the estimates of the person and item parameters; (2) posit a model with person covariates; and (3) posit a model with a group level. In addition, the HMGLM affords one the opportunity to posit a model with item covariates. This form of the HMGL-RSM may be especially important in DIF studies in which the user attempts to explain why DIF exists.

To model the HMGL-RSM with item covariates, one follows the previous definitions of the HMGL-RSM (Section 2-2), in which the category is nested within the item, which in turn is nested within the person. But now, one includes covariates at the item level.

6-1-1. The Level-1 Model with Item Covariates

The Level-1 model (the category level) is defined as

$$\log\left[\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right] = \sum_{j=1}^{J} \beta_{jk}^{(i)} x_{jk}, \tag{6.1}$$

where $\beta_{jk}^{(i)}$ is the mean category effect if person k selects category i of item j; and $x_{jk}$ is a dummy variable with values 1 if person k answers item j, and 0 otherwise.

6-1-2. The Level-2 Model with Item Covariates

The Level-2 model (the item level) is defined as

$$\beta_{jk}^{(i)} = \gamma_{0jk} + \sum_{i'} \gamma_{1jk}^{(i')} w_{1jk}^{(i')} + \sum_{t} \gamma_{tjk}^{(i)} w_{tjk}^{(i)}, \tag{6.2}$$

where, for person k, $\gamma_{0jk}$ is the mean effect of item j across categories i; $\gamma_{1jk}^{(i)}$ is the effect of an item on a particular category i; $w_{1jk}^{(i')}$ is a dummy variable with values 1 if $i' = i$, and 0 otherwise; $\gamma_{tjk}^{(i)}$ is the effect of covariate t (t = 2, ..., T - 1) on a particular category i for item j; and $w_{tjk}^{(i)}$ is the value of the tth covariate of category i for item j.

For identiﬁability, 7150 )“ =0 and 713(12):“

6-1-3. The Level-3 Model with Item Covariates

The Level-3 model (the person level) is defined as

$$\gamma_{0jk} = \lambda_{0j0} + u_k, \tag{6.3}$$
$$\gamma_{1jk}^{(i)} = \lambda_{1j0}^{(i)}, \tag{6.4}$$
$$\gamma_{tjk}^{(i)} = \lambda_{tj0}^{(i)}, \tag{6.5}$$

where, for the jth item that is answered by person k, $\lambda_{0j0}$ is the mean effect of persons on item j; $u_k$ is the random effect of person k on the mean effect of item j; $\lambda_{1j0}^{(i)}$ is the mean change in $\lambda_{0j0}$ for a particular category of the items; and $\lambda_{tj0}^{(i)}$ is the mean change in $\lambda_{0j0}$ for a particular covariate t of category i for item j.

Here, it is helpful to refer back to the honesty example, in which a particular feeling of an applicant is nested within an item, which in turn is nested within the person. As before, a particular answered item not only depends upon the overall attractiveness of the item ($\lambda_{0j0}$), but also how the attractiveness of the item influences a particular feeling ($\lambda_{1j0}^{(i)}$). However, in addition to the honesty of the person, the response to the item also depends upon an item covariate ($\lambda_{tj0}^{(i)}$), such as age. In other words, for example, the respondent may select one feeling over another more frequently because of his or her age.

6-1-4. The Combined Model with Item Covariates
The combined model of the HMGL-RSM with item covariates reduces to the following for a particular category i of the item j:

$$\log\left[\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right] = \lambda_{0j0} + \lambda_{1j0}^{(i)} + \sum_{t} \lambda_{tj0}^{(i)} w_{tjk}^{(i)} + u_k, \tag{6.6}$$

where all terms are defined above.

Therefore, the parameters of the HMGL-RSM with item covariates are related to and extend the parameters of the traditional RSM in the following manner:

$$\delta_j = -\lambda_{0j0}, \tag{6.7}$$
$$\tau_i = -\lambda_{1j0}^{(i)}, \tag{6.8}$$
$$\theta_k = u_k, \tag{6.9}$$

and

$$\upsilon_{tij} = -\lambda_{tj0}^{(i)}, \tag{6.10}$$

where $\delta_j$, $\tau_i$, and $\theta_k$ are defined above; and $\upsilon_{tij}$ is the location of covariate t of category i for item j on the underlying continuum, which increases one unit as $w_{tjk}^{(i)}$ increases one unit.
Notice the HMGL-RSM with item covariates allows the item covariates to vary
not only for each item, but for each threshold within each item as well. Currently, the

aforementioned models in Chapter 1 do not allow for such ﬂexibility in item covariate

modeling.
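
To make this flexibility concrete, the following Python sketch builds the adjacent-category log-odds for one person and one item as the sum of an item effect, a category effect, threshold-specific covariate terms, and a person effect, and then converts them to category probabilities. All names and numerical values are assumptions chosen for illustration; this is a sketch of the general idea, not the estimation code used in the study.

```python
import math


def step_log_odds(item_effect, category_effects, covariate_effects,
                  covariate_values, person_effect):
    """Adjacent-category log-odds, one entry per threshold.

    covariate_effects[i] holds the coefficients that apply at threshold i, so a
    covariate may act differently at each threshold of the item.
    """
    steps = []
    for category_effect, effects in zip(category_effects, covariate_effects):
        covariate_term = sum(e * w for e, w in zip(effects, covariate_values))
        steps.append(item_effect + category_effect + covariate_term + person_effect)
    return steps


def category_probs(steps):
    """Convert adjacent-category log-odds into category probabilities."""
    log_num = [0.0]
    for step in steps:
        log_num.append(log_num[-1] + step)
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-category item; the covariate shifts each threshold differently.
baseline = category_probs(step_log_odds(0.0, [2.23, -2.23], [[0.0], [0.0]], [1.0], 0.0))
shifted = category_probs(step_log_odds(0.0, [2.23, -2.23], [[0.25], [0.5]], [1.0], 0.0))
```

With positive coefficients at both thresholds, the probability of the highest category rises relative to the baseline, which is the kind of threshold-specific effect the model is meant to capture.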

6-2. Simulation Study for the HMGL-RSM with Item Covariates
The following section describes a simulation study for the HMGL-RSM with item
covariates. The focus of this section is to examine the behaviors of the person and item

parameters of the HMGL-RSM when being inﬂuenced by an item covariate.

6-2-1. Design

The design of the simulation is as follows. Observations were simulated using the HMGL-RSM. For the study, 500 simulees responded to 10 polytomous items, where each item consisted of 3 categories i (i = 0, 1, 2). The number of simulees, items, and categories were chosen to follow typical data from a questionnaire (e.g., Dodd, 1990; Smith & Johnson, 2000; Zhu, Updyke, & Lewandowski, 1997) or a large-scale assessment (e.g., Michigan Education Assessment Program, 2003; U.S. Department of Education, 1999). In addition, the number of simulees and items were chosen because, as shown in Section 3-2, these sample sizes allow for reasonable precision (at least when covariates were not modeled).

To produce the simulated responses, each simulee k was randomly assigned to one of four levels of the item covariate ($w_{tjk}^{(i)}$). The probability of being assigned to a given level was chosen to be .01, .25, .66, and .08, respectively. These probabilities followed the actual frequencies of the levels of a covariate used in an operational administration of a confidential readiness assessment. Here, the covariate was age. For the simulation, the covariate influenced an arbitrarily chosen item, item 1.

Additionally, each simulee k was randomly assigned a $\theta_k$, $\theta_k \sim N(0, 1)$. The $\delta_j$ and $\tau_i$ were randomly selected to represent parameter estimates obtained from a confidential readiness assessment (i.e., items 1-10 in Table 2).

Using $\theta_k$, $\delta_j$, and $\tau_i$, three response probabilities for each simulee by item combination were produced: $P_{0jk}(\theta)$, $P_{1jk}(\theta)$, and $P_{2jk}(\theta)$. If

$$\sum_{i=0}^{i'} P_{ijk}(\theta) < Y_{jk} \le \sum_{i=0}^{i'+1} P_{ijk}(\theta),$$

then simulee k was assigned a response of $i' + 1$ for item j; otherwise a response of 0 was assigned. Note that $i' = 0, 1$; and $Y_{jk}$ was a single random number for each $j \times k$ combination, $Y \sim U(0, 1)$.
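
The assignment of simulees to covariate levels can be sketched with a weighted random draw. In the Python example below, the four probabilities are those stated above; the function name, the integer coding of the levels as 0 to 3, and the seed are assumptions made for illustration.

```python
import random

LEVEL_PROBS = [0.01, 0.25, 0.66, 0.08]  # probabilities of the four covariate levels


def assign_covariate_levels(n_simulees, rng):
    """Draw one covariate level (coded 0 to 3) per simulee with the stated probabilities."""
    levels = list(range(len(LEVEL_PROBS)))
    return rng.choices(levels, weights=LEVEL_PROBS, k=n_simulees)


rng = random.Random(2004)  # arbitrary seed, for reproducibility only
levels_for_500 = assign_covariate_levels(500, rng)
```

With only 500 simulees, the level with probability .01 is expected to contain about five simulees, so its observed count will vary noticeably across replications.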

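For the simulation, three different models were simulated for item 1 (j = 1). They were Model 1,

$$\log\left[\frac{\pi_{11k}}{\pi_{01k}}\right] = \lambda_{010} + \lambda_{110}^{(1)} + \lambda_{210}^{(1)} + u_k,$$
$$\log\left[\frac{\pi_{21k}}{\pi_{11k}}\right] = \lambda_{010} + \lambda_{110}^{(2)} + \lambda_{210}^{(2)} + u_k; \tag{6.11}$$

Model 2,

$$\log\left[\frac{\pi_{11k}}{\pi_{01k}}\right] = \lambda_{010} + \lambda_{110}^{(1)} + \lambda_{210} + u_k,$$
$$\log\left[\frac{\pi_{21k}}{\pi_{11k}}\right] = \lambda_{010} + \lambda_{110}^{(2)} + \lambda_{210} + u_k, \tag{6.12}$$

where the following constraint is made: $\lambda_{210}^{(1)} = \lambda_{210}^{(2)} = \lambda_{210}$; and Model 3,

$$\log\left[\frac{\pi_{11k}}{\pi_{01k}}\right] = \lambda_{010} + \lambda_{110}^{(1)} + \lambda_{210}^{(1)} + u_k,$$
$$\log\left[\frac{\pi_{21k}}{\pi_{11k}}\right] = \lambda_{010} + \lambda_{110}^{(2)} + 0 + u_k, \tag{6.13}$$

where all terms are defined above.

For the other items, the model was

$$\log\left[\frac{\pi_{ijk}}{\pi_{i-1,jk}}\right] = \lambda_{0j0} + \lambda_{1j0}^{(i)} + u_k, \tag{6.14}$$

where j = 2, ..., 10.

Note that $\lambda_{210}^{(1)}$ and $\lambda_{210}^{(2)}$ were arbitrarily set to .25 and .5, respectively, and that $\lambda_{210}$ was arbitrarily set to .25. The values for the coefficients were selected arbitrarily because the simulation studies above illustrated that the magnitude of the coefficient did not affect the RMSE, so little would be gained by manipulating the magnitude. Additionally, the values appeared to represent typical coefficient values of a covariate when using the HMGL-RSM (see Chapter 4).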

Also note that the sample, item, and group sizes were not manipulated. The reason for this is that the simulation studies in the previous sections have already examined this issue. It seems that similar results would follow for the current model if a design similar to those above were used.

The simulation procedure simulated the 3 aforementioned conditions. Each
administration was iterated 50 times producing 150 unique response data matrices. The
number of iterations was chosen because Kamata (1998) showed this to be a reasonable
number for obtaining stable estimates. S-Plus (2000) was used to generate all data. SAS

(2001) was used to obtain parameter estimates and conduct signiﬁcance tests.

6-2-2. Analysis

The purpose of the analysis is to examine the RMSE of estimating the parameters for the HMGL-RSM with item covariates, not only when the correct model is specified but also when an incorrect model is specified. The reason is that the user typically does not know the true model that explains the data. By examining the RMSE for the incorrect model, one can better understand how incorrect model specification affects precision.

Therefore, for the analysis, the three models described above were simulated. For
a particular dataset, SAS (2001) was then used to estimate Models 1-3. Hence, one model
would yield estimations for the correct model, while the two other models would yield
estimations for the incorrect models.

Next, the parameter recovery of the HMGL-RSM with item covariates was examined. Specifically, the RMSE for $\theta_k$, $\lambda_{210}^{(1)}$, $\lambda_{210}^{(2)}$, $\lambda_{210}$, $\delta_j$, and $\tau_i$ was obtained over the iterations for each condition. The RMSE was

$$RMSE(\hat{\omega}) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{\omega}_n - \omega\right)^2}, \tag{6.15}$$

where the maximum number of n iterations was N = 50; and $\omega$ is an arbitrary parameter representing either $\theta_k$, $\lambda_{210}^{(1)}$, $\lambda_{210}^{(2)}$, $\lambda_{210}$, $\delta_j$, or $\tau_i$. A descriptive analysis of the RMSE was conducted for each condition.

6-2-3. Results: Descriptive Statistics

Displayed in Tables 23, 24, and 25 are the means and standard deviations of the parameter estimates for the HMGL-RSM when the correct model for item 1 was Model 1, 2, and 3, respectively. As can be seen, the standard deviations of the estimates are generally low and similar across conditions. However, the standard deviations are relatively higher for item 1 when covariates are added to the model. This suggests that PROC NLMIXED obtains relatively consistent estimates of the HMGL-RSM parameters, but the consistency decreases for an item when covariates are added.

As for the mean of the estimates, in general, the estimates obtained by PROC NLMIXED for the HMGL-RSM appear to differ only slightly from their parameter values. Below, in Section 6-2-4, the RMSE is examined.

Table 23. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM for
Model 1

                                Estimated model
                                1                2                3
Parameter                       M      SE        M      SE        M      SE
$\hat{\delta}_{1}$             -0.03  (0.57)     0.13  (0.49)     1.16  (0.22)
$\hat{\lambda}_{210}^{(1)}$     0.22  (0.20)     0.23  (0.17)    -0.18  (0.09)
$\hat{\lambda}_{210}^{(2)}$     0.51  (0.24)
$\hat{\delta}_{2}$              0.01  (0.10)     0.01  (0.10)     0.01  (0.10)
$\hat{\delta}_{3}$             -0.92  (0.10)    -0.93  (0.10)    -0.92  (0.10)
$\hat{\delta}_{4}$             -1.59  (0.12)    -1.62  (0.12)    -1.60  (0.12)
$\hat{\delta}_{5}$             -0.82  (0.11)    -0.83  (0.11)    -0.82  (0.11)
$\hat{\delta}_{6}$             -0.74  (0.12)    -0.76  (0.12)    -0.75  (0.12)
$\hat{\delta}_{7}$             -0.84  (0.11)    -0.86  (0.11)    -0.85  (0.11)
$\hat{\delta}_{8}$             -0.03  (0.11)    -0.03  (0.11)    -0.03  (0.11)
$\hat{\delta}_{9}$              0.05  (0.12)     0.05  (0.13)     0.05  (0.13)
$\hat{\delta}_{10}$            -0.86  (0.13)    -0.87  (0.13)    -0.86  (0.13)
$\hat{\tau}_{1}$               -2.24  (0.05)    -2.28  (0.05)    -2.25  (0.05)
$\hat{\tau}_{2}$                2.24  (0.05)     2.28  (0.05)     2.25  (0.05)
$\hat{\mu}_{\theta}$            0.01  (0.00)     0.01  (0.00)     0.01  (0.00)
$\hat{\Sigma}_{\theta}$         1.00  (0.06)     1.00  (0.06)     1.00  (0.06)

Note. {1, 2, 3} = estimated Models 1, 2, and 3. $\{\hat{\delta}_{1}, \hat{\delta}_{2}, \ldots, \hat{\delta}_{10}\}$ = location for items 1 - 10. $\{\hat{\tau}_{1}, \hat{\tau}_{2}\}$ = thresholds 1 and 2. $\{\hat{\lambda}_{210}^{(1)}, \hat{\lambda}_{210}^{(2)}\}$ = item covariates. $\hat{\lambda}_{210}^{(1)}$ for Model 2 = $\hat{\lambda}_{210}$. $\hat{\mu}_{\theta}$ = Mean person location. $\hat{\Sigma}_{\theta}$ = Standard deviation of the person locations. M = Mean. SE = Standard error.

Table 24. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM for
Model 2

                                Estimated model
                                1                2                3
Parameter                       M      SE        M      SE        M      SE
$\hat{\delta}_{1}$             -0.02  (0.52)    -0.01  (0.51)     0.56  (0.22)
$\hat{\lambda}_{210}^{(1)}$     0.22  (0.19)     0.22  (0.19)     0.02  (0.10)
$\hat{\lambda}_{210}^{(2)}$     0.23  (0.21)
$\hat{\delta}_{2}$              0.01  (0.10)     0.01  (0.10)     0.01  (0.10)
$\hat{\delta}_{3}$             -0.92  (0.10)    -0.92  (0.10)    -0.92  (0.10)
$\hat{\delta}_{4}$             -1.59  (0.12)    -1.59  (0.12)    -1.60  (0.12)
$\hat{\delta}_{5}$             -0.82  (0.11)    -0.82  (0.11)    -0.82  (0.11)
$\hat{\delta}_{6}$             -0.74  (0.12)    -0.75  (0.12)    -0.75  (0.12)
$\hat{\delta}_{7}$             -0.84  (0.11)    -0.84  (0.11)    -0.84  (0.11)
$\hat{\delta}_{8}$             -0.03  (0.11)    -0.03  (0.11)    -0.03  (0.11)
$\hat{\delta}_{9}$              0.05  (0.12)     0.05  (0.12)     0.05  (0.12)
$\hat{\delta}_{10}$            -0.86  (0.13)    -0.86  (0.13)    -0.86  (0.13)
$\hat{\tau}_{1}$               -2.24  (0.05)    -2.25  (0.05)    -2.25  (0.05)
$\hat{\tau}_{2}$                2.24  (0.05)     2.25  (0.05)     2.25  (0.05)
$\hat{\mu}_{\theta}$            0.01  (0.00)     0.01  (0.00)     0.01  (0.00)
$\hat{\Sigma}_{\theta}$         1.00  (0.06)     1.00  (0.06)     1.00  (0.06)

Note. {1, 2, 3} = estimated Models 1, 2, and 3. $\{\hat{\delta}_{1}, \hat{\delta}_{2}, \ldots, \hat{\delta}_{10}\}$ = location for items 1 - 10. $\{\hat{\tau}_{1}, \hat{\tau}_{2}\}$ = thresholds 1 and 2. $\{\hat{\lambda}_{210}^{(1)}, \hat{\lambda}_{210}^{(2)}\}$ = item covariates. $\hat{\lambda}_{210}^{(1)}$ for Model 2 = $\hat{\lambda}_{210}$. $\hat{\mu}_{\theta}$ = Mean person location. $\hat{\Sigma}_{\theta}$ = Standard deviation of the person locations. M = Mean. SE = Standard error.

Table 25. Mean and Standard Error of the Parameter Estimates for the HMGL-RSM for
Model 3

                                Estimated model
                                1                2                3
Parameter                       M      SE        M      SE        M      SE
$\hat{\delta}_{1}$             -0.02  (0.48)    -0.08  (0.56)    -0.10  (0.17)
$\hat{\lambda}_{210}^{(1)}$     0.22  (0.17)     0.13  (0.20)     0.25  (0.08)
$\hat{\lambda}_{210}^{(2)}$    -0.03  (0.17)
$\hat{\delta}_{2}$              0.01  (0.10)     0.01  (0.10)     0.01  (0.10)
$\hat{\delta}_{3}$             -0.92  (0.10)    -0.89  (0.10)    -0.92  (0.10)
$\hat{\delta}_{4}$             -1.59  (0.12)    -1.56  (0.12)    -1.59  (0.12)
$\hat{\delta}_{5}$             -0.82  (0.11)    -0.80  (0.10)    -0.82  (0.11)
$\hat{\delta}_{6}$             -0.74  (0.12)    -0.73  (0.12)    -0.74  (0.12)
$\hat{\delta}_{7}$             -0.84  (0.11)    -0.82  (0.10)    -0.84  (0.11)
$\hat{\delta}_{8}$             -0.03  (0.11)    -0.03  (0.11)    -0.03  (0.11)
$\hat{\delta}_{9}$              0.05  (0.12)     0.05  (0.12)     0.05  (0.12)
$\hat{\delta}_{10}$            -0.86  (0.13)    -0.83  (0.13)    -0.86  (0.13)
$\hat{\tau}_{1}$               -2.24  (0.05)    -2.20  (0.05)    -2.24  (0.05)
$\hat{\tau}_{2}$                2.24  (0.05)     2.20  (0.05)     2.24  (0.05)
$\hat{\mu}_{\theta}$            0.01  (0.00)     0.01  (0.00)     0.01  (0.00)
$\hat{\Sigma}_{\theta}$         1.00  (0.05)     1.00  (0.05)     1.00  (0.05)

Note. {1, 2, 3} = estimated Models 1, 2, and 3. $\{\hat{\delta}_{1}, \hat{\delta}_{2}, \ldots, \hat{\delta}_{10}\}$ = location for items 1 - 10. $\{\hat{\tau}_{1}, \hat{\tau}_{2}\}$ = thresholds 1 and 2. $\{\hat{\lambda}_{210}^{(1)}, \hat{\lambda}_{210}^{(2)}\}$ = item covariates. $\hat{\lambda}_{210}^{(1)}$ for Model 2 = $\hat{\lambda}_{210}$. $\hat{\mu}_{\theta}$ = Mean person location. $\hat{\Sigma}_{\theta}$ = Standard deviation of the person locations. M = Mean. SE = Standard error.

6-2-4. Results: RMSE

The results of the RMSE for $\mu_{\theta}$, $\Sigma_{\theta}$, $\lambda_{210}^{(1)}$, $\lambda_{210}^{(2)}$, $\delta_j$, and $\tau_i$ of the HMGL-RSM with item covariates are provided in Table 26. Trends indicated that the RMSE generally remained the same, and low, for $\mu_{\theta}$, $\Sigma_{\theta}$, $\delta_2$ - $\delta_{10}$, and $\tau_i$, even if an incorrect model was estimated.

However, the RMSE generally increased for $\delta_1$, $\lambda_{210}^{(1)}$, and $\lambda_{210}^{(2)}$ when an incorrect model was estimated. This especially occurred if the correct model was Model 1 or 2 and the incorrect estimated model was Model 3. Nevertheless, except if the correct model was Model 1 and the incorrect estimated model was Model 3, the RMSE tended to remain within reasonable levels below or around .55. Thus, the analysis provides some evidence that if the model was correctly specified, the parameters were estimated extremely well unless they were influenced by an item covariate. In that case, the precision became unreasonable only when the estimated model omitted an item covariate that should have been specified. Otherwise, the precision was somewhat low, yet reasonable.

Table 26. RMSE for the HMGL-RSM with Item Covariates

T                         1                     2                     3
E                         1      2      3       1      2      3       1      2      3
$\delta_{1}$              0.56   0.53   1.26    0.52   0.52   0.69    0.48   0.55   0.17
$\lambda_{210}^{(1)}$     0.20   0.17   0.44    0.19   0.19   0.25    0.18   0.23   0.08
$\lambda_{210}^{(2)}$     0.24   0.34   0.55
$\delta_{2}$              0.10   0.10   0.10    0.10   0.10   0.10    0.10   0.10   0.10
$\delta_{3}$              0.10   0.10   0.10    0.10   0.10   0.10    0.10   0.10   0.10
$\delta_{4}$              0.12   0.13   0.12    0.12   0.12   0.12    0.12   0.12   0.12
$\delta_{5}$              0.11   0.11   0.11    0.11   0.11   0.11    0.11   0.10   0.11
$\delta_{6}$              0.12   0.12   0.12    0.12   0.12   0.12    0.12   0.12   0.12
$\delta_{7}$              0.11   0.12   0.11    0.11   0.11   0.11    0.11   0.10   0.11
$\delta_{8}$              0.11   0.12   0.11    0.11   0.11   0.11    0.11   0.11   0.11
$\delta_{9}$              0.12   0.13   0.13    0.12   0.12   0.12    0.12   0.12   0.12
$\delta_{10}$             0.13   0.13   0.13    0.13   0.13   0.13    0.13   0.13   0.13
$\tau_{1}$                0.05   0.06   0.05    0.05   0.05   0.05    0.05   0.06   0.05
$\tau_{2}$                0.05   0.06   0.05    0.05   0.05   0.05    0.05   0.06   0.05
$\mu_{\theta}$            0.01   0.01   0.01    0.01   0.01   0.01    0.01   0.01   0.01
$\Sigma_{\theta}$         0.05   0.05   0.05    0.05   0.05   0.05    0.05   0.05   0.05

Note. T = True model. E = Estimated model. {1, 2, 3} = Models 1, 2, and 3. $\{\delta_{1}, \delta_{2}, \ldots, \delta_{10}\}$ = location for items 1 - 10. $\{\tau_{1}, \tau_{2}\}$ = thresholds 1 and 2. $\{\lambda_{210}^{(1)}, \lambda_{210}^{(2)}\}$ = item covariates. $\mu_{\theta}$ = Mean person location. $\Sigma_{\theta}$ = Standard deviation of the person locations. M = Mean. SE = Standard error.

6-3. Example Analysis of the HMGL-RSM with Item Covariates
The purpose of this section is to provide an example analysis that illustrates the
basic concepts of the HMGL-RSM with item covariates. In particular, it illustrates
how to use the model to assist in explaining DIF.

6-3-1. Design

The design of the analysis is as follows. Five hundred respondents were randomly
selected from a larger sample of students who responded to a confidential readiness
assessment. Note that these were the same respondents used in Section 5-3.
However, respondents who did not provide their Age were not used here. Thus, the
final sample consisted of 493 respondents. Their demographics are provided below in
Table 27. As one can see, Males and Females appear to be roughly equally distributed
across the demographic categories.

Table 27. Demographic Information

              Males   Females   Total
SES
  Hi            106       103     209
  Mid           123       100     223
  Lo             22        17      39
Age
  5               1         2       3
  6              66        64     130
  7             171       161     332
  8              22         6      28
Ethnicity
  Asian           2         5       7
  Af.-Am.       132       104     236
  Hisp.          23        18      41
  Cauc.          98       105     203

Note. Af.-Am. = African-American. Hisp. = Hispanic.
Cauc. = Caucasian. 9 Males and 13 Females did not
provide their parent’s SES. 5 Males and 1 Female did
not provide Ethnicity.


As in Section 5-3, this illustration only utilized the ﬁrst 10 items of a conﬁdential
readiness assessment (which again measured the person’s personal and social
development). The item covariate that was used was Age. Age was selected as the
covariate because there was some reason to believe that the older respondents may have
interpreted the categories differently than the younger respondents. Lastly, recall that
items 3-5 and 9 contained DIF. Hence, Age was used to explain the DIF that appeared for
items 3-5 and 9 for Males and Females. (Note that although the HMGL-RSM identified
additional items as containing DIF, they were not modeled as being influenced by Gender
or Age. This was done because the effects of the non-DIF items on the item
covariates were not of interest here.)

6-3-2. Analysis

To analyze the responses of the students, PROC NLMIXED of SAS (2001) was
used to estimate the person and item parameters for the HMGL-RSM, with Gender as the
group-level variable and Age as the item covariate for items 3-5 and 9, and no item
covariates for the remaining items. Hence, the final model is the HMGL-RSM with a group
level and item covariates.
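
To make the structure of the estimated model more concrete, the sketch below computes category probabilities from adjacent-category linear predictors, mirroring the form of the SAS listing in Appendix A (each step predictor is an item effect plus a threshold plus the person effect, and category k is weighted by the exponential of the sum of the first k predictors). The function name, the argument names, and the example values are illustrative assumptions, and the sign convention is assumed to follow Appendix A; group-level and covariate terms are omitted.

```python
import math


def rsm_category_probs(theta, item_effect, thresholds):
    """Category probabilities from adjacent-category (step) linear predictors.

    eta_k = item_effect + thresholds[k-1] + theta, and category k receives
    weight exp(eta_1 + ... + eta_k); category 0 receives weight exp(0) = 1.
    """
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + item_effect + tau + theta)
    shift = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - shift) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: one person at theta = 0.5 on a three-category item.
probs = rsm_category_probs(theta=0.5, item_effect=-1.34, thresholds=[-2.18, 2.18])
print([round(p, 3) for p in probs])
```

Under the Rating Scale Model the same thresholds are shared by every item, so only `item_effect` changes from item to item.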

6-3-3. Results

The results of the analysis are presented below in Table 28. To interpret the
HMGL-RSM with item covariates, recall from above that if the item exhibits DIF, then
$\delta_{0j01} \neq 0$ and

$$\delta_{j0} = -\delta_{0j00}$$

$$\delta_{j1} = -(\delta_{0j00} + \delta_{0j01}).$$

For example, for item 3, when item covariates are added to explain DIF, the overall
attractiveness of the item for Males is $\delta_{3,0} = -\delta_{0300} = 4.34$, while the
overall attractiveness of the item for Females is

$$\delta_{3,1} = -(\delta_{0300} + \delta_{0301}) = -(-4.34 + 0.82) = 3.52.$$
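
The same two formulas can be applied to each of the items modeled with DIF. The sketch below does so using the "Age Included" estimates reported in Table 28; the function and variable names are illustrative assumptions.

```python
# (delta_0j00, delta_0j01) from the "Age Included" columns of Table 28.
estimates = {
    3: (-4.34, 0.82),
    4: (-1.53, 0.84),
    5: (-2.62, 0.46),
    9: (-1.37, 0.88),
}


def attractiveness(d_0j00, d_0j01):
    """Return (Males, Females) overall attractiveness for one DIF item."""
    males = -d_0j00
    females = -(d_0j00 + d_0j01)
    return males, females


for item, (d00, d01) in estimates.items():
    males, females = attractiveness(d00, d01)
    print(f"item {item}: Males {males:.2f}, Females {females:.2f}")
```

For item 3 this reproduces the values 4.34 and 3.52 given in the text.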

Table 28. Parameter Estimates for the HMGL-RSM With Age as an Item Covariate

                                              Age Included                 Age Not Included
Item  Par.                               Est.    SE       t     p       Est.    SE       t     p
1     $\delta_{0100}$                   -1.34  0.16   -8.21  0.00      -1.32  0.16   -8.12  0.00
2     $\delta_{0200}$                   -1.60  0.16   -9.68  0.00      -1.57  0.16   -9.58  0.00
3     $\delta_{0300}$                   -4.34  1.42   -3.05  0.00      -0.89  0.19   -4.64  0.00
      $\delta_{0301}$                    0.82  0.23    3.66  0.00       0.81  0.22    3.64  0.00
      $\tilde{\delta}^{(1)}_{0302}$      0.49  0.21    2.34  0.02
      $\tilde{\delta}^{(2)}_{0302}$      0.54  0.21    2.60  0.01
4     $\delta_{0400}$                   -1.53  1.42   -1.08  0.28      -0.37  0.19   -1.97  0.05
      $\delta_{0401}$                    0.84  0.23    3.69  0.00       0.82  0.22    3.69  0.00
      $\tilde{\delta}^{(1)}_{0402}$      0.16  0.21    0.79  0.43
      $\tilde{\delta}^{(2)}_{0402}$      0.18  0.21    0.85  0.40
5     $\delta_{0500}$                   -2.62  1.40   -1.87  0.06      -0.65  0.19   -3.41  0.00
      $\delta_{0501}$                    0.46  0.22    2.09  0.04       0.48  0.22    2.17  0.03
      $\tilde{\delta}^{(1)}_{0502}$      0.26  0.20    1.27  0.21
      $\tilde{\delta}^{(2)}_{0502}$      0.33  0.20    1.61  0.11
6     $\delta_{0600}$                   -0.58  0.16   -3.62  0.00      -0.57  0.16   -3.59  0.00
7     $\delta_{0700}$                   -0.71  0.16   -4.42  0.00      -0.70  0.16   -4.37  0.00
8     $\delta_{0800}$                   -1.44  0.16   -8.79  0.00      -1.42  0.16   -8.70  0.00
9     $\delta_{0900}$                   -1.37  1.45   -0.94  0.35      -1.95  0.20   -9.79  0.00
      $\delta_{0901}$                    0.88  0.23    3.86  0.00       0.95  0.23    4.17  0.00
      $\tilde{\delta}^{(1)}_{0902}$     -0.11  0.21   -0.52  0.60
      $\tilde{\delta}^{(2)}_{0902}$     -0.04  0.21   -0.18  0.86
10    $\delta_{0,10,00}$                -0.67  0.16   -4.16  0.00      -0.66  0.16   -4.11  0.00
      $\tau_1$                          -2.18     .       .     .       2.10     .       .     .
      $\tau_2$                           2.18  0.05  -46.18  0.00      -2.10  0.03  -60.25  0.00

Note. Par. = parameter. Est. = estimate. SE = standard error. t = t-statistic. p = p-value.
{$\delta_{0j00}, \delta_{0j01}$} = overall attractiveness of item j for Males and Females,
respectively. {$\tau_1, \tau_2$} = thresholds 1 and 2. p = 0.00 implies p < .01.

To explain the difference in the attractiveness between Males and Females, the
model suggests that Age may influence the genders. That is, older Males and Females
may interpret the item categories differently than younger Males and Females.
Additionally, this influence is not constant across category thresholds. For example,
for item 3 the location of Age on the underlying continuum as the category increases
from 0 to 1 is $\beta_{213} = -\tilde{\delta}^{(1)}_{0302} = -.49$, while the location of Age
as the category increases from 1 to 2 is $\beta_{223} = -\tilde{\delta}^{(2)}_{0302} = -.54$.
Thus, if a Male or Female is age 5, then the location of Age on the underlying continuum
as the category increases from 0 to 1 is $\beta_{213} \times 5 = -.49 \times 5 = -2.45$. If the
age is 6, then the location is $\beta_{213} \times 6 = -.49 \times 6 = -2.94$. And so on for
Ages 7 and 8; similar interpretations hold for the location of Age as the category
increases from 1 to 2. This suggests that as Age increases, the location of Age decreases
for Males and Females.
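
The arithmetic above extends directly to every age in the sample and to both thresholds. The short sketch below tabulates it; the coefficient values are the item 3 values quoted in the text, while the function name and the dictionary layout are illustrative assumptions.

```python
# Item 3 coefficients quoted in the text: location of Age per threshold.
coefficients = {"0 to 1": -0.49, "1 to 2": -0.54}
ages = [5, 6, 7, 8]


def age_location(coefficient, age):
    """Location of Age on the underlying continuum for a given age."""
    return coefficient * age


for step, coefficient in coefficients.items():
    row = ", ".join(f"age {a}: {age_location(coefficient, a):.2f}" for a in ages)
    print(f"category {step}: {row}")
```

The products become more negative as age increases, which is the decreasing trend described in the paragraph.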

To answer the question of whether Age adequately explains the DIF exhibited in
the items, one examines the model ﬁt of the current model compared to the model
without Age as a covariate using the AIC and BIC. When Age is included in the model,
the AIC and BIC are 6955.7 and 7060.7, respectively. When Age is not included in the
model, the AIC and BIC are 6955.6 and 7027.0, respectively. Furthermore, when
inspecting the information weights, the AIC and BIC weights for the HMGL-RSM
without Age are .51 and 1.00, while the AIC and BIC weights for the HMGL-RSM with
Age are .49 and .00. Since the AIC and BIC are lower for the HMGL-RSM without Age,
and since higher weights indicate the model is more likely, the evidence suggests that the
HMGL-RSM without Age is the better fitting model. Thus, although the HMGL-RSM
aids in the explanation of DIF, it was found that Age does not explain the existence of
DIF for this particular example.
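
The information weights reported above can be reproduced from the AIC and BIC values alone. The sketch below assumes the usual definition of such weights, in which each model's weight is proportional to exp(-0.5 × difference from the smallest criterion value); the function name and dictionary keys are illustrative assumptions.

```python
import math


def information_weights(criteria):
    """Convert information-criterion values (smaller is better) to weights summing to 1."""
    best = min(criteria.values())
    relative = {name: math.exp(-0.5 * (value - best)) for name, value in criteria.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


aic = {"with Age": 6955.7, "without Age": 6955.6}
bic = {"with Age": 7060.7, "without Age": 7027.0}

for label, values in (("AIC", aic), ("BIC", bic)):
    weights = information_weights(values)
    print(label, {name: round(w, 2) for name, w in weights.items()})
```

The AIC difference of 0.1 yields nearly equal weights (.51 versus .49), so the preference for the model without Age rests mainly on the BIC, which penalizes the additional Age parameters more heavily.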


Chapter 7. Conclusions and Future Directions
7-1. Conclusions
As shown in the preceding chapters, the parameters of the HMGL-RSM were
recovered fairly well. In addition, simulations and example analyses illustrated the three
primary advantages of utilizing the HMGLM to model the RSM and PCM. Specifically,
the HMGL-RSM and -PCM were able to extend existing models to include person
covariates, a group level, and item covariates.

In addition, the dissertation illustrated several advantages of utilizing the HMGL-RSM
and -PCM for analyzing educational testing data. Specifically, Chapter 1 discussed that
traditional methods, such as the RSM and PCM, do not account for the variation between
persons and the variation of responses within a person. The HMGL-RSM and -PCM account
for both. Chapter 1 also discussed how the HMGL-RSM and -PCM provide a single method
that uses a hierarchical framework (HLM) to extend polytomous IRT models to include
person covariates and predictors of item behaviors, and that accounts for the correlation
between categories of a polytomous item. No other method applies this specific framework
to do so.

In Chapter 2, the HMGLM framework is used to deﬁne the HMGL-RSM and -
PCM. It was noted, and should be re-stated, that although Tuerlinckx and Wang (2004)
present similar models, the reader should be aware that the models presented here are not
the same models as those presented by the aforementioned authors. The models deﬁned
here use the HMGLM framework; this framework deﬁnes a separate model for each
hierarchical level. As argued, this allows for a more ‘natural’ way of not only modeling
educational testing data, but also understanding educational testing data.

In Chapter 3, the HMGLM framework is used to illustrate how the HMGL-RSM
performs in comparison to traditional IRT methods such as the RSM. As shown and
discussed, the primary advantage is that the HMGL-RSM estimates have smaller standard
errors than the RSM estimates. This, of course, becomes important as the user places
higher stakes on the interpretation of those estimates. For example, if the user interprets
the estimate of the person parameter as being the person proﬁciency, and if the user
utilizes this estimate to make the high-stakes decision of whether or not the person passes
high school, then the less error associated with this estimate, the more confident the user
will be in making this high-stakes decision.

In Chapter 4, the HMGLM framework is used to illustrate how the HMGL-RSM
can be extended to include person covariates. As shown, by applying the HMGL-RSM
with person covariates the user can control for the inﬂuence of a covariate at the person
level. This form of the HMGL-RSM may be especially important in accountability
investigations in which the user is interested in the location of a student, after controlling
for the effects of a covariate (e.g., Stone & Lane, 2003). For example, assume in the
example analysis in Section 4-3 that test-takers obtain a monetary reward for performing
well. As shown, SES was negatively related to performance. Thus, if the monetary cut-off
were .5 logits, the lower SES group would receive the monetary reward only if SES was
controlled for. If SES was not controlled for, the lower SES group would not receive any
monetary reward.

Additionally, as was implied in Chapter 4, the HMGL-RSM with person covariates has
advantages over traditional methods that use covariates, such as the analysis of
covariance (ANCOVA). For instance, to apply ANCOVA as a means of controlling the effects
of the covariates on student performance, the user must first estimate the person and item
locations using IRT. Next, the user applies ANCOVA procedures. To do so, one must estimate
a model where the dependent variable is the total test score and the independent variables
are the IRT person estimate and the covariate. By estimating this model, the user may be
able to examine how the covariate influences the person’s performance on the test. However,
this process has its limitations. One limitation is that the estimates of the covariate and
the estimates of the person performance are not necessarily placed on the same scale. This
becomes a problem as the user attempts to interpret the estimates: Does 1 unit on the
covariate scale mean the same thing as 1 unit on the person performance scale? Another
limitation is that the process is somewhat time inefficient, since two separate steps are
used to obtain the aforementioned estimates: the IRT step and the ANCOVA step.

The advantage of applying the HMGLM to extend IRT models is that the
procedure for controlling the effects of the covariates on student performance is
simpliﬁed to only one step (i.e., estimating one model as opposed to two, which as
mentioned earlier may be a more natural way of conceptualizing the data), and the
estimates are placed on the same scale (Lord, 1980).

In Chapter 5, the HMGLM framework is used to illustrate how the HMGL-RSM
can be extended to include a group level. As shown, this model was a somewhat powerful
test for detecting DIF. Additionally, when compared to another popular DIF procedure,
the MH test, the HMGL-RSM was not only more powerful, but it also afforded a few
advantages the MH test did not. For instance, a purification procedure was not used here
with the MH test because it would not greatly influence the DIF results for the simulated
testing conditions (e.g., Wang & Su, 2004); however, there may be other operational
conditions in which a purification procedure is necessary. With the HMGL-RSM, a
purification procedure is not necessary and this issue is avoided. That is, by modeling the
testing environment with the HMGL-RSM, the model controls for the effects of DIF and
non-DIF items and simultaneously investigates for DIF. Hence, no purification is necessary
since the effect of the other items is controlled for.

In Chapter 6, the HMGLM framework is used to illustrate how the HMGL-RSM
can be extended to include item covariates. As shown, this extension may provide a way
to explain why DIF exists. As brieﬂy discussed, after a DIF examination occurs in an
operational setting, the user must now attempt to explain why DIF occurs, and a decision
regarding the item must be made. That is, the user must decide: even though DIF exists,
does the item display any characteristics that would create a bias for a particular group? If
so, should the item be modiﬁed or removed from the test? By applying the HMGL-RSM
with item covariates, the guesswork is minimized for the ﬁrst part of the decision. That is,
rather than providing a subjective judgment of whether or not the item displays any biasing
characteristics, the HMGL-RSM allows the user to explicitly create a model that
examines the user’s hypothesis. For example, rather than the user suggesting a math item
may be exhibiting DIF because it is a trigonometry item and the other items are not
trigonometry items, the user may explicitly define a model that includes whether or not
an item is a trigonometry item, and then examine this model for its ability to
explain the occurrence of DIF.

Lastly, it is reiterated that the HMGLM allows the user to realize all of the
aforementioned advantages simultaneously. Again, there is currently no other
procedure that applies this particular hierarchical framework to do so. Additional
contributions of this framework and these models are discussed below.

7-1-1. Contributions

Beyond extending the RSM and PCM, there are four main contributions that
result from applying the HMGLM to unify HLM and polytomous IRT models. As stated
before, they are: (1) models using the HMGLM may currently be estimated using existing
software (e.g., SAS, 2001; STATA, 2000); (2) IRT and HLM are unified using a
common notation; (3) score functions and information matrices (which may be used for
parameter estimation) are well known under the HMGLM (e.g., see Fahrmeir & Tutz,
2001); and (4) a broad class of IRT models within the HLM framework may be estimated
using a common method (e.g., maximum likelihood).

7-1-1.1. Special Estimation Software is Not Necessary

By applying the HMGLM, estimation of IRT models does not require special
software (e.g., HLM for Windows, 2001). To estimate the HMGLM, all one needs is any
widely available software that estimates generalized mixed models, such as SAS or STATA.
Consequently, users do not have to learn additional software to estimate
these models. Although this may seem like a trivial point, it becomes a strong one once
one considers the time and money saved by not purchasing and learning new software.

7-1-1.2. Common Notation

Another contribution of applying the HMGLM is that a common notation system
may be used to describe the models that are unified from two different areas of research.
Although this may seem trivial, it is not once one considers that each area of
research, HLM and IRT, has its own notation. Furthermore, each researcher may bring
his or her own ‘style’ to the notation system. If each separate notation system is
considered a separate language, then it becomes cumbersome and confusing when
researchers attempt to discuss similar concepts and theories in different languages,
i.e., notations. For example, notice in the discussion above that the ability
parameter is represented by $\theta$ in IRT but by $u$ in HLM. By applying the HMGLM,
HLM and IRT may be unified in a way that avoids this issue while the interpretation of
the parameters remains consistent. Furthermore, since the HMGLM is an extension of the
univariate GLM, which already has a strong history and accepted notation, users may
simply incorporate IRT and HLM within a knowledge structure that already exists for GLM,
without adding further confusion.

7-1-1.3. Well-Known Score Functions and Information Matrices

By applying the HMGLM to IRT, the score functions and information matrices
are well known for the hierarchical IRT models (see Fahrmeir & Tutz, 2001, Chapter
3). Since these are well known, the user does not need to derive them before using them
in maximum likelihood estimation of the parameters. Compare this to the Bayesian
approach. In this approach, for each new model that is developed, the
user may have to derive a new prior and posterior distribution so that the parameters can
be estimated. Although this may be a simple task for some, it may be an extremely
difficult feat for others. By applying the HMGLM to IRT, this can be avoided, and most
researchers who have a general understanding of GLM, HLM, and IRT can enjoy its
application.

7-1-1.4. Common Estimation Method

As the reader can see, there are numerous possibilities for postulating hierarchical
IRT models when the HMGLM is applied. Fortunately, since the HMGLM is simply an
extension of GLM, which has well-studied and well-understood properties (e.g., score
functions and information matrices), the HMGLM also has well-studied and well-understood
properties. The advantage of this is that the numerous hierarchical IRT
models that can be developed under the HMGLM may be estimated using a common
estimation method. For instance, recall that here estimates of the parameters are obtained
by maximizing an approximation to the likelihood integrated over the random effects,
where the integral approximations are obtained via adaptive Gaussian quadrature and the
optimization is carried out using a dual quasi-Newton algorithm (SAS, 2001)
or a modified Newton-Raphson algorithm (Rabe-Hesketh, Pickles, & Skrondal, 2001).
Again, compare this to the Bayesian approach. For this approach, if a new model is
developed, characteristics such as the conditional probability distributions for the
variances may differ for each new model. Consequently, if the characteristics change for
each new model, then it may be necessary to alter the algorithm of the estimation method
for each new model. Obviously, this may prove to be laborious, and consequently the
application of the new model may be avoided. Again, this is not the case for the
HMGLM.
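
To illustrate what "the likelihood integrated over the random effects" means in practice, the sketch below approximates one person's marginal log-likelihood with a fixed five-point Gauss-Hermite rule. This is a deliberate simplification: it is the plain (non-adaptive) rule, not the adaptive Gaussian quadrature used in the dissertation, and it is not a description of how any particular software package carries out the computation. The model form follows the step predictors of the Appendix A listing, and all function names and example values are illustrative assumptions.

```python
import math

# Five-point Gauss-Hermite nodes and weights for integrals of exp(-x**2) * f(x).
GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086]
GH_WEIGHTS = [0.0199532420590459, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.0199532420590459]


def category_probs(theta, item_effect, thresholds):
    """Category probabilities from adjacent-category step predictors."""
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + item_effect + tau + theta)
    shift = max(cumulative)
    weights = [math.exp(c - shift) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def marginal_loglik(responses, item_effects, thresholds, sigma):
    """Log of the integral of prod_j P(y_j | theta) over theta ~ N(0, sigma**2)."""
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        theta = math.sqrt(2.0) * sigma * x  # change of variable to the N(0, sigma^2) scale
        likelihood = 1.0
        for y, effect in zip(responses, item_effects):
            likelihood *= category_probs(theta, effect, thresholds)[y]
        total += w * likelihood
    return math.log(total / math.sqrt(math.pi))


# Hypothetical person answering three items in categories 2, 0, and 1.
print(marginal_loglik([2, 0, 1], [-0.5, 0.3, 0.0], [-1.0, 1.0], sigma=1.0))
```

An optimizer such as a quasi-Newton routine would then maximize the sum of these person-level log-likelihoods over the item effects, thresholds, and `sigma`. The adaptive variant additionally recenters and rescales the nodes for each person, which is the source of the long run times noted in Section 7-2-3.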

7-2. Limitations

Five limitations were encountered during this dissertation, some of which were the
result of using popular estimation software such as PROC NLMIXED in SAS. They are:
(1) the item discrimination parameter is not modeled; (2) data preparation is
cumbersome; (3) estimation times are potentially long; (4) unbalanced data was not
considered; and (5) a non-normal distribution of random effects was not investigated.

7-2-1. Item Discrimination Parameter is Not Modeled

The first limitation is that the item discrimination parameters were not modeled.
Muraki (1992) presented an extension of the PCM in which each item has its own
discrimination (i.e., slope). As suggested by this model, discrimination may be an
important parameter to consider if one cannot assume that the discrimination of the test
items equals one. Notice that this assumption was made in order to simulate responses for
the HMGL-PCM and -RSM. Fortunately, this does not affect the generality of the HMGL-RSM
or -PCM. That is, although it may be necessary to model the discrimination parameter for
some achievement tests or questionnaires, this may not hold for all tests or
questionnaires. For example, the Michigan Education Assessment Program does not
apply a model with a discrimination parameter for estimating the parameters of the state’s
achievement test (Michigan Education Assessment Program, 2003). Additionally, Dodd
(1990), Smith and Johnson (2000), and Zhu, Updyke, and Lewandowski (1997) also do
not model discrimination parameters for estimating the parameters of a questionnaire.

7-2-2. Data Preparation is Cumbersome

A second limitation is that data preparation is fairly cumbersome. That is, before
estimating the HMGL-RSM and -PCM, the user must structure the raw data such
that the categorical response is a multivariate vector (rather than the category selection
itself, which is typically the case when estimating non-hierarchical polytomous models;
e.g., see the software WINSTEPS (1999)). Additionally, the user must create J - 1
dummy variables that identify the item under investigation (see Appendix C). As can be
guessed, this process becomes rather tiresome as the number of items and categories
increases. Nevertheless, the author feels that the time invested in pursuing the application
of the HMGLM in IRT is far outweighed by the benefits gained (see Section 7-1-1).
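
The restructuring described above can be sketched as follows. The row layout (category indicators, then person and item identifiers, then item dummies) is an assumption modeled on the `input` statement of the Appendix A listing, which reads one dummy per item; the procedure of Appendix C is not reproduced here, and the function name is hypothetical.

```python
def to_long_format(person_id, responses, n_categories):
    """Expand one person's category responses into one row per item.

    Each row holds: category indicators y0..y(K-1), person_id, item_id,
    and one item dummy per item (1 for the item the row refers to).
    """
    n_items = len(responses)
    rows = []
    for item_id, category in enumerate(responses, start=1):
        if not 0 <= category < n_categories:
            raise ValueError(f"category {category} out of range for item {item_id}")
        indicators = [1 if k == category else 0 for k in range(n_categories)]
        dummies = [1 if i == item_id else 0 for i in range(1, n_items + 1)]
        rows.append(indicators + [person_id, item_id] + dummies)
    return rows


# Hypothetical person 7 answering three items in categories 2, 0, and 1.
for row in to_long_format(7, [2, 0, 1], n_categories=3):
    print(row)
```

Automating this step removes most of the manual effort, although the resulting file still grows with the product of the numbers of persons and items.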

7-2-3. Possibly Long Estimation Times

Another limitation is that, if adaptive Gaussian quadrature is used (as is done in
this dissertation), then the estimation of the HMGL-PCM and -RSM may require long
estimation times. For example, when using a PC with a 3.2 GHz Intel Pentium 4
processor, parameter estimation of the HMGL-RSM took approximately 12 hours when
the number of persons and items was 1000 and 25, respectively. This occurs because
adaptive Gaussian quadrature requires finding the mode of the function being integrated.
This means that as the number of random effects increases (in IRT modeling, as the
number of persons increases), adaptive Gaussian quadrature finds the
mode for each unique random effect at each iteration of the estimation algorithm.

Thus, alternative methods to the HMGL-PCM and -RSM may be more
worthwhile if long estimation times are to be avoided. For example, if the user wants an
estimate of the effect of a covariate for a group of students, an ANCOVA can be applied.
Or, if the user wants to test for DIF, then the MH test can be applied. Of course, these
alternatives also have their disadvantages, which were discussed above. Hence, the user
must choose the preferred method based on which advantages and disadvantages are most
important to him or her.

Nevertheless, the long estimation times do not appear to be a major hurdle in
applying the HMGLM to IRT, at least in the near future, considering that computers are
becoming increasingly faster, which may decrease estimation times. Additionally, as
mentioned in Section 2-5, other estimation procedures, which are possibly faster than
adaptive Gaussian quadrature, may be employed.

7-2-4. Unbalanced Data

A fourth limitation encountered in this dissertation is that the simulation study did
not investigate the accuracy of the parameter estimates for unbalanced data (i.e., data in
which not all persons respond to all items). Of course, in real data, unbalanced data is
more likely the rule than the exception. Nevertheless, this dissertation provides insight
into how well the parameters for the HMGL-PCM and -RSM are estimated under ideal
conditions. Consequently, this ideal scenario can now be used as a benchmark for
comparison with future studies that investigate the effects of unbalanced data.

7-2-5. Non-Normal Distribution for Random Effects Not Investigated

A fifth limitation is that non-normal random effects were not investigated.
Although it is possible that the random effects may not be normal in actual data, in
educational research it is commonly assumed that the distribution of the effects is normal
(e.g., Cheong & Raudenbush, 2000; Kamata, 1998, 2001; Lord, 1980; Miyazaki, 2000).
Here, customary assumptions were used, and it is expected that this should not affect the
generality of the model itself. However, if the user is interested in non-normal effects,
then one may posit a non-normal distribution and estimate the model using approaches
other than those discussed here. For example, Hartzel et al. (2001) and Aitkin (1999, as
cited by Hartzel et al., 2001) present a semi-parametric estimation method that does not
rely on a multivariate normal specification of the random effects. Additionally,
GLLAMM for STATA allows one to apply binomial, gamma, or Poisson distributions
(Rabe-Hesketh, Pickles, & Skrondal, 2001). Fahrmeir and Tutz (2001, Chapter 7) present
estimation methods based on posterior modes or Bayesian techniques, which also do not
require the distribution of the random effect to be normal. Lastly, Breslow and Clayton
(1993, as cited by Gueorguieva, 2001) and Wolfinger and O’Connell (1994, as cited by
Gueorguieva, 2001) present a penalized quasi-likelihood method, which also does not
require the distribution of the random effect to be normal.

7-3. Future directions

Future researchers may direct their efforts toward addressing the limitations
described above. For instance, researchers can develop software specifically designed for
estimating the HMGLM. If accomplished, limitations of data preparation and estimation
times would be avoided. However, that is not to say utilizing PROC NLMIXED in SAS
is not worthwhile. Typical everyday users who are not adept at developing computer
estimation software should find SAS useful, as it provides an easily understandable and
readily available method to estimate the models discussed here.

Additionally, researchers may attempt to apply the HMGLM to a polytomous IRT
model with a discrimination parameter (e.g., Muraki, 1992). This may be possible if one
extends the work of Miyazaki (2000) to polytomous models. Additionally, researchers
may examine the parameter recovery rate for more ‘real-like’ simulated data in which the
data is unbalanced. Lastly, researchers may examine the estimates if non-normal random
effects were utilized.

Other researchers may direct their efforts toward extending the contributions
described above. For instance, researchers may wish to model a hierarchical FACETS
model (Linacre, 1994). One application of this model is found in the literature regarding
rater effects (e.g., Wolfe, Moulder, & Myford, 2001). It would be interesting to see how
accurately rater effects would be measured by the FACETS model by applying the
HMGLM.

Finally, future researchers may direct their efforts toward comparing the HMGLM to
the Bayesian modeling of random-effects approach (Section 1-2-3), the rater effects
approach (Section 1-2-4), and the hierarchical univariate general linear model approach
(Section 1-2-5). Although each of these approaches attempts to obtain similar
information, they do so in different manners, as discussed above. It would be interesting
to examine the equivalence of the parameter estimates obtained from each approach. It is
possible that one approach provides better estimates than the other approaches.

APPENDICES


APPENDIX A.

Example SAS Code for Estimating the HMGL-RSM for a Polytomous Test with 10 Items

*--- INPUT DATA ---;

data RSM;
infile "C:\WINDOWS\Start Menu\temp\data.dat";
input y0 y1 y2 y3 person_id item_id x1-x10;
run;

proc sort;
by person_id;
run;

*--- RUN NLMIXED FOR INITIAL ESTIMATES ---;
proc nlmixed data=RSM;

*PRE-INITIAL ESTIMATES;
parms beta1-beta10 gamma1-gamma3 = 0;

*CODE LINEAR PREDICTORS;
gamma3 = -1*(gamma1+gamma2);

eta1 = x1* beta1 + x2* beta2 + x3* beta3 + x4* beta4 + x5* beta5 +
x6* beta6 + x7* beta7 + x8* beta8 + x9* beta9 + x10* beta10 + gamma1;
eta2 = x1* beta1 + x2* beta2 + x3* beta3 + x4* beta4 + x5* beta5 +
x6* beta6 + x7* beta7 + x8* beta8 + x9* beta9 + x10* beta10 + gamma2;
*ASSUMED third step, implied by gamma3, y3, and pi3 but absent from the printed listing;
eta3 = x1* beta1 + x2* beta2 + x3* beta3 + x4* beta4 + x5* beta5 +
x6* beta6 + x7* beta7 + x8* beta8 + x9* beta9 + x10* beta10 + gamma3;

*RATING SCALE MODEL;
denom = 1 + exp(eta1) + exp(eta1+eta2) + exp(eta1+eta2+eta3);
pi0 = 1 / denom;
pi1 = exp(eta1) / denom;
pi2 = exp(eta1+eta2) / denom;
pi3 = exp(eta1+eta2+eta3) / denom;

*DEFINE LIKELIHOOD;
z = (pi0**y0)*(pi1**y1)*(pi2**y2)*(pi3**y3);
if (z > 1e-8) then ll = log(z);
else ll=-1e100;

model y0 ~ general(ll);

*SPECIFY RANDOM EFFECT DISTRIBUTION;
*none;

*OBTAIN INITIAL ESTIMATES;
ods output ParameterEstimates = parest;

run;

*--- RUN NLMIXED FOR FINAL ESTIMATES ---;
proc nlmixed data=RSM;

*READ IN INITIAL ESTIMATES;
parms / data = parest;

*CODE LINEAR PREDICTORS;
theta = u1;

gamma3 = -1*(gamma1+gamma2);

eta1 = x1* beta1 + x2* beta2 + x3* beta3 + x4* beta4 + x5* beta5 +
x6* beta6 + x7* beta7 + x8* beta8 + x9* beta9 + x10* beta10 + gamma1 +
theta;
eta2 = x1* beta1 + x2* beta2 + x3* beta3 + x4* beta4 + x5* beta5 +
x6* beta6 + x7* beta7 + x8* beta8 + x9* beta9 + x10* beta10 + gamma2 +
theta;
*ASSUMED third step, implied by gamma3, y3, and pi3 but absent from the printed listing;
eta3 = x1* beta1 + x2* beta2 + x3* beta3 + x4* beta4 + x5* beta5 +
x6* beta6 + x7* beta7 + x8* beta8 + x9* beta9 + x10* beta10 + gamma3 +
theta;

*RATING SCALE MODEL;
denom = 1 + exp(eta1) + exp(eta1+eta2) + exp(eta1+eta2+eta3);
pi0 = 1 / denom;
pi1 = exp(eta1) / denom;
pi2 = exp(eta1+eta2) / denom;
pi3 = exp(eta1+eta2+eta3) / denom;

*DEFINE LIKELIHOOD;
z = (pi0**y0)*(pi1**y1)*(pi2**y2)*(pi3**y3);
if (z > 1e-8) then ll = log(z);
else ll=-1e100;

model y0 ~ general(ll);

*SPECIFY RANDOM EFFECT DISTRIBUTION AND OBTAIN EMPIRICAL BAYES ESTIMATES;
random u1 ~ normal(0, s1*s1) subject = person_id out = bayesest;

run;

/***********************************************************************

NOTE. THIS PROGRAM WAS OBTAINED AND MODIFIED FROM
HARTZEL, AGRESTI, AND CAFFO (2001).

ALSO NOTE THEY STATE THE FOLLOWING:

"With Gauss-Hermite quadrature, computer
underflow can be a problem mainly when there are many within-cluster
observations. For most data sets in our experience, however, it is
the number of clusters that is large and not the number of
observations within a cluster. In using NLMIXED, we addressed this
problem by assigning the likelihood to a very small number within the
limits of computer precision. Specifically we entered

if (z > 1e-8) then ll = log(z); else ll=-1e100

for this purpose."

***********************************************************************/

APPENDIX B.

Example SAS Code for Estimating the HMGL-PCM for a Polytomous Test with 10 Items

*--- INPUT DATA ---;

data PCM;
infile "C:\WINDOWS\Start Menu\temp\data.dat";
input y0 y1 y2 y3 person_id item_id x1-x10;
run;

proc sort;
by person_id;
run;

*

*~—- RUN NLMIXED FOR INITIAL ESTIMATES WWW;
proc nlrnixed data=PCM ;

*PRE-INITIAL ESTIMATES;

parms betal -beta10
gammal 1 -gammal 2
gamma2 1 -garnma22
garnma3 1 -gamma32
gamma4l -gamma42
gamma5 1 -gamma52
gamma6l -gamma62
gamma7l -gamma72
gamma8 l —garnma82
gamma9l -gamma92
gammal 01 -gamma102 = 0;

*CODE LINEAR PREDICTORS;
garnmal2 = -l*(gamma11);
gamma22 = -1*(gamma21);
gamma32 = -l*(gamma31);
garnma42 = -1*(gamma41);
gamma52 = -l*(garnma51);
garnma62 = -l*(garnma6l);
gamma72 = -1*(gamma71);

133

gamma82 = -l*(gamma81);
garnma92 = -1*(garnma9l);
gamma102 = -1*(gamma101);

beta11 = beta1 + gamma11 ;
beta12 = beta1 + gamma12 ;

beta21 = beta2 + gamma21 ;
beta22 = beta2 + gamma22 ;

beta31 = beta3 + gamma31 ;
beta32 = beta3 + gamma32 ;

beta41 = beta4 + gamma41 ;
beta42 = beta4 + gamma42 ;

beta51 = beta5 + gamma51 ;
beta52 = beta5 + gamma52 ;

beta61 = beta6 + gamma61 ;
beta62 = beta6 + gamma62 ;

beta71 = beta7 + gamma71 ;
beta72 = beta7 + gamma72 ;

beta81 = beta8 + gamma81 ;
beta82 = beta8 + gamma82 ;

beta91 = beta9 + gamma91 ;
beta92 = beta9 + gamma92 ;

beta101 = beta10 + gamma101 ;
beta102 = beta10 + gamma102 ;

eta1 = x1* beta11 + x2* beta21 + x3* beta31 + x4* beta41 + x5* beta51 +
x6* beta61 + x7* beta71 + x8* beta81 + x9* beta91 + x10* beta101;

eta2 = x1* beta12 + x2* beta22 + x3* beta32 + x4* beta42 + x5* beta52 +
x6* beta62 + x7* beta72 + x8* beta82 + x9* beta92 + x10* beta102;

*PARTIAL CREDIT MODEL;

pi0 = 1 / (1 + exp(eta1) + exp(eta1+eta2) + exp(eta1+eta2) );

pi1 = exp(eta1) / (1 + exp(eta1) + exp(eta1+eta2) + exp(eta1+eta2) );

pi2 = exp(eta1+eta2) / (1 + exp(eta1) + exp(eta1+eta2) + exp(eta1+eta2) );

*DEFINE LIKELIHOOD;

z = (pi0**y0)*(pi1**y1)*(pi2**y2)*(pi3**y3);
if (z > 1e-8) then ll = log(z);

else ll=-1e100;

model y0 ~ general(ll);

*SPECIFY RANDOM EFFECT DISTRIBUTION;
*none;

*OBTAIN INITIAL ESTIMATES;
ods output ParameterEstimates = parest ;

run;

*--- RUN NLMIXED FOR FINAL ESTIMATES ---;
proc nlmixed data= PCM;

*READ IN INITIAL ESTIMATES;
parms / data = parest;

*CODE LINEAR PREDICTORS;
theta = u1 ;

gamma12 = -1*(gamma11);
gamma22 = -1*(gamma21);
gamma32 = -1*(gamma31);
gamma42 = -1*(gamma41);
gamma52 = -1*(gamma51);
gamma62 = -1*(gamma61);
gamma72 = -1*(gamma71);
gamma82 = -1*(gamma81);
gamma92 = -1*(gamma91);
gamma102 = -1*(gamma101);

beta11 = beta1 + gamma11 ;
beta12 = beta1 + gamma12 ;

beta21 = beta2 + gamma21 ;
beta22 = beta2 + gamma22 ;

beta31 = beta3 + gamma31 ;
beta32 = beta3 + gamma32 ;

beta41 = beta4 + gamma41 ;
beta42 = beta4 + gamma42 ;

beta51 = beta5 + gamma51 ;
beta52 = beta5 + gamma52 ;

beta61 = beta6 + gamma61 ;
beta62 = beta6 + gamma62 ;

beta71 = beta7 + gamma71 ;
beta72 = beta7 + gamma72 ;

beta81 = beta8 + gamma81 ;
beta82 = beta8 + gamma82 ;

beta91 = beta9 + gamma91 ;
beta92 = beta9 + gamma92 ;

beta101 = beta10 + gamma101 ;
beta102 = beta10 + gamma102 ;

eta1 = x1* beta11 + x2* beta21 + x3* beta31 + x4* beta41 + x5* beta51 +
x6* beta61 + x7* beta71 + x8* beta81 + x9* beta91 + x10* beta101 + theta;

eta2 = x1* beta12 + x2* beta22 + x3* beta32 + x4* beta42 + x5* beta52 +
x6* beta62 + x7* beta72 + x8* beta82 + x9* beta92 + x10* beta102 + theta;

*PARTIAL CREDIT MODEL;

pi0 = 1 / (1 + exp(eta1) + exp(eta1+eta2) + exp(eta1+eta2) );

pi1 = exp(eta1) / (1 + exp(eta1) + exp(eta1+eta2) + exp(eta1+eta2) );

pi2 = exp(eta1+eta2) / (1 + exp(eta1) + exp(eta1+eta2) + exp(eta1+eta2) );

*DEFINE LIKELIHOOD;

z = (pi0**y0)*(pi1**y1)*(pi2**y2)*(pi3**y3);
if (z > 1e-8) then ll = log(z);

else ll=-1e100;

model y0 ~ general(ll);

*SPECIFY RANDOM EFFECT DISTRIBUTION AND OBTAIN EMPIRICAL
BAYES ESTIMATES;
random u1 ~ normal(0, s1*s1) subject = person_id OUT=bayesest;

run;
***********************************************************************;

NOTE. THIS PROGRAM WAS OBTAINED AND MODIFIED FROM
HARTZEL, AGRESTI, AND CAFFO (2001).

ALSO NOTE THEY STATE THE FOLLOWING:

"With Gauss-Hermite quadrature, computer underflow can be a problem
mainly when there are many within-cluster observations. For most data
sets in our experience, however, it is the number of clusters that is
large and not the number of observations within a cluster. In using
NLMIXED, we addressed this problem by assigning the likelihood to a
very small number within the limits of computer precision. Specifically
we entered

if (z > 1e-8) then ll = log(z); else ll=-1e100

for this purpose."

***********************************************************************;
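
As a reading aid for the PARTIAL CREDIT MODEL statements in the program, the category probabilities of a partial credit item (Masters, 1982) can be written in adjacent-category logit form. This is a sketch under stated assumptions: an item has m steps (m + 1 score categories), and eta_h denotes the linear predictor for step h, playing the role of eta1 and eta2 in the code. The symbols m, k, c, h, and D are introduced here for illustration only.

```latex
P(Y = k) \;=\;
\frac{\exp\!\bigl(\sum_{h=1}^{k} \eta_h\bigr)}
     {\sum_{c=0}^{m} \exp\!\bigl(\sum_{h=1}^{c} \eta_h\bigr)},
\qquad k = 0, 1, \dots, m,
\qquad \sum_{h=1}^{0} \eta_h \equiv 0 .

% Special case of two steps (m = 2):
\pi_0 = \frac{1}{D}, \qquad
\pi_1 = \frac{e^{\eta_1}}{D}, \qquad
\pi_2 = \frac{e^{\eta_1 + \eta_2}}{D}, \qquad
D = 1 + e^{\eta_1} + e^{\eta_1 + \eta_2} .
```

In this form log(pi_1 / pi_0) = eta_1 and log(pi_2 / pi_1) = eta_2, and the three probabilities sum to one. With the constraint gamma_i2 = -gamma_i1 used in the program, beta_i is the average of the two step parameters beta_i1 and beta_i2 for item i.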


APPENDIX C.

Example of the Input Data Structure

[Table: example input data set. The cell values could not be recovered from the scanned page. The columns appear to follow the INPUT statement in Appendix B: y0, y1, y2, y3, person_id, item_id, and x1 through x10.]


REFERENCES


Adams, R. J., & Wilson, M. (1996). Formulating the Rasch model as a mixed
coefficients multinomial logit. In G. Engelhard & M. Wilson (Eds.), Objective
measurement: Theory and practice (Vol. 3, pp. 143-166). Norwood: Ablex.

Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random
coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1-23.

Agresti, A. (1996). An introduction to categorical data analysis. New York: John
Wiley & Sons, Inc.

Agresti, A. (2002). Links between binary and multi-category logit item response
models and quasi-symmetric loglinear models. Annales de la Faculté des Sciences de
Toulouse Mathématiques, 11(4), 443-454.

Aitkin, M. (1999). A general maximum likelihood analysis of variance
components in generalized linear models. Biometrics, 55, 117-128.

 

Andersen, E. B. (1985). Estimating latent correlations between repeated testings.
Psychometrika, 43, 3-16.

Andrich, D. (1978). A rating scale formulation for ordered response categories.
Psychometrika, 43, 561-573.

Barr, M. A., & Raju, N. S. (2003). IRT-based assessments of rater effects in
multiple-source feedback instruments. Organizational Research Methods, 6(1), 15-43.

Bennett, R. E., Rock, D. A., & Novatkoski, I. (1989). Differential item
functioning on the SAT-M braille edition. Journal of Educational Measurement, 26(1),
67-79.

Breslow, N. E., & Clayton, D. G. (1993). Approximate inference in generalized
linear mixed models. Journal of the American Statistical Association, 88, 9-25.

Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel
inference: A practical information-theoretic approach (2nd ed.). New York: Springer.

Cheong, Y. F., & Raudenbush, S. W. (2000). Measurement and structural models
for children's problem behaviors. Psychological Methods, 5(4), 477-495.

ConQuest. (1998). ACER ConQuest: Generalised item response modelling
software. Camberwell, Melbourne, Victoria: ACER Press.


Dodd, B. G. (1990). The effect of item selection procedure and stepsize on
computerized adaptive attitude measurement using the Rating Scale Model. Applied
Psychological Measurement, 14, 355-366.

Doherty, K. M., & Skinner, R. A. (2003). State of the states. In Quality Counts
2003: If I Can't Learn From You. Education Week, 22(17), 75-76, 78.

Donoghue, J. R., Holland, P. W., & Thayer, D. T. (1993). A Monte Carlo study of
factors that affect the Mantel-Haenszel and standardization measures of differential item
functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning
(pp. 137-166). Hillsdale, NJ: Lawrence Erlbaum.

Donoghue, J. R., & Hombo, C. M. (2003). An extension of the hierarchical raters'
model to polytomous items. Paper presented at the Annual Meeting of the National
Council on Measurement in Education.

Embretson, S. E. (1991). A multidimensional latent trait model for measuring
learning and change. Psychometrika, 56, 495-515.

Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modelling based on
generalized linear models (2nd. ed.). New York: Springer-Verlag.

Fox, J. P. (In press, a). Applications of multilevel IRT modeling.

Fox, J. P. (In press, b). Multilevel IRT using dichotomous and polytomous
response data.

Goldstein, H. (2003). Multilevel statistical models (3rd. ed.). New York: Oxford
University Press.

Gueorguieva, R. (2001). A multivariate generalized linear mixed model for joint
modelling of clustered outcomes in the exponential family. Statistical Modelling, 1(3),
177-193.

Hargrove, L. L., & Mao, M. X. (1997). Three-level HLM modeling of academic
and contextual variables related to SAT scores in Texas. Paper presented at the Annual
Meeting of the National Council on Measurement in Education, Chicago, IL.

Hargrove, L. L., Mao, M. X., & Barkanic, G. (1996). HLM modeling of
coursework, AP, and other academic contextual variables related to SAT scores in Texas.
Paper presented at the Annual Meeting of the National Council on Measurement in
Education, New York, NY.

Hargrove, L. L., & Mellor, L. T. (1994). An HLM exploration of between-school
effects related to within-school SAT score differences in Texas: Accountability
implications. Paper presented at the National Council on Measurement, New Orleans,
LA.

Hartzel, J., Agresti, A., & Caffo, B. (2001). Multinomial logit random effects
models. Statistical Modelling, 1(2), 81-102.

Hedeker, D., & Gibbons, R. D. (1993). MIXOR: A computer program for
mixed-effects ordinal, probit, and logistic regression analysis. University of Illinois at Chicago.

Kamata, A. (1998). Some generalizations of the Rasch model: An application of
the Hierarchical Generalized Linear Model. Unpublished doctoral dissertation, Michigan
State University.

Kamata, A. (2001). Item analysis by the Hierarchical Generalized Linear Model.
Journal of Educational Measurement, 38(1), 79-93.

Kim, S. H. (2000). An investigation of the Likelihood Ratio Test, the Mantel Test
and the Generalized Mantel-Haenszel Test of DIF. Paper presented at the annual meeting
of the American Educational Research Association, New Orleans, LA.

 

Kim, W. (2003). Development of a Differential Item Functioning (DIF) procedure
using the Hierarchical Generalized Linear Model: A comparison study with logistic
regression procedure. Unpublished doctoral dissertation, Pennsylvania State University.

Lee, Y., & Nelder, J. A. (1996). Hierarchical generalized linear models. Journal of
the Royal Statistical Society, Series B, Methodological, 58, 619-656.

Linacre, J. M. (1994). Many-facet Rasch measurement. Chicago: MESA Press.

Lord, F. M. (1980). Applications of Item Response Theory to practical testing
problems. Hillsdale: Lawrence Erlbaum Associates, Inc.

Luppescu, S. (2002). DIF detection in HLM. Paper presented at the Annual
Meeting of the American Educational Research Association, New Orleans, LA.

Maier, K. S. (2000). Applying Bayesian methods to hierarchical measurement
models. Unpublished doctoral dissertation, University of Chicago.

Maier, K. S. (2001). A Rasch hierarchical measurement model. Journal of
Educational and Behavioral Statistics, 26(3), 307-330.

Maier, K. S. (2002). Modeling Incomplete Scaled Questionnaire data with a
Partial Credit Hierarchical Measurement Model. Journal of Educational and Behavioral
Statistics, 27(3), 271-289.

 

 


Manalo, J. R. (2004). The accuracy and application of the AIC, BIC, and CAIC in
hierarchical linear modeling. Paper presented at the Annual Meeting of the American
Educational Research Association, San Diego, CA.

Mantel, N. (1963). Chi-square tests with one degree of freedom: Extensions of the
Mantel-Haenszel procedure. Journal of the American Statistical Association, 58,
690-700.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika,
47(2), 149-174.

Michigan Education Assessment Program (2003). Design and Validity of the Test.
Retrieved March, 2004, from http://www.meap.org/.

Mislevy, R. J. (1987). Exploiting auxiliary information about examinees in the
estimation of item parameters. Applied Psychological Measurement, 11(1), 81-91.

Miyazaki, Y. (2000). Incorporating factor analysis into hierarchical models.
Unpublished doctoral dissertation, Michigan State University.

Muraki, E. (1992). A generalized partial credit model: Application of an EM
algorithm. Applied Psychological Measurement, 16(2), 159-176.

Patz, R. J. (1996). Markov Chain Monte Carlo methods for Item Response Theory
models with applications for the National Assessment of Educational Progress.
Unpublished doctoral dissertation, Carnegie Mellon University.

Patz, R. J., Junker, B. W., Johnson, M. A., & Mariano, L. T. (2002). The
hierarchical rater model for rated test items and its application to large-scale educational
assessment data. Journal of Educational and Behavioral Statistics, 27, 341-384.

Patz, R. J., Junker, B. W., & Johnson, M. S. (1999). The hierarchical rater model
for rated test items and its application to large-scale educational assessment data. Paper
presented at the Annual Meeting of the American Educational Research Association,
Montreal, Canada.

Rabe-Hesketh, S., Pickles, A., & Skrondal, A. (2001). GLLAMM Manual.
Department of Biostatistics and Computing, Institute of Psychiatry, Kings College,
University of London.

Raudenbush, S., Bryk, A., & Congdon, R. (2001). HLM for Windows:
Hierarchical Linear and Non-linear Modelling (Version 5.04). Lincolnwood: Scientific
Software International.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models:
Applications and Data Analysis Methods (2nd. ed.). London: Sage Publications, Inc.

 


Reckase, M. D. (1991). The discriminating power of items that measure more
than one dimension. Applied Psychological Measurement, 15(4), 361-373.

Reckase, M. D. (1997). The past and future of multidimensional item response
theory. Applied Psychological Measurement, 21, 25-36.

Reise, S. P. (2000). Using multilevel logistic regression to evaluate person-fit in
IRT models. Multivariate Behavioral Research, 35(4), 543-568.

Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear
mixed model framework for Item Response Theory. Psychological Methods, 8(2),
185-205.

S-PLUS. (2000). S-Plus 2000. Cambridge: Mathsoft, Inc.

SAS. (2001). Statistical Analysis Software. Cary: SAS Institute.

Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models,
hierarchical models, and individual growth models. Journal of Educational and
Behavioral Statistics, 24(4), 323-355.

 

Smith, E. V., & Johnson, B. D. (2000). Attention deﬁcit hyperactivity disorder
scaling and standard setting using Rasch measurement. Journal of Applied Measurement,
1(1), 3-24.

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to
basic and advanced multilevel modeling. London: Sage Publications.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. R. (1996). BUGS 0.5
Examples (Vol. 1). Cambridge, UK: University of Cambridge, Institute of Public
Health, Medical Research Council Biostatistics Unit.

STATA. (2000). Stata Statistical Software (Version 6). College Station, TX.

Stone, C. A., & Lane, S. (2003). Consequences of a state accountability program:
Examining relationships between school performance gains and teacher, student, and
school variables. Applied Measurement in Education, 16(1), 1-26.

Tuerlinckx, F., & Wang, W. C. (2004). Models for polytomous data. In P. De
Boeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and
nonlinear approach (pp. 75-109). New York: Springer-Verlag.

U.S. Department of Education. Office of Educational Research and Improvement.
National Center for Education Statistics. The NAEP 1996 Technical Report, NCES
1999-452, by Allen, N. L., Carlson, J. E., & Zelenak, C. A. (1999). Washington, DC:
National Center for Education Statistics.


Wang, W. C., Wilson, M., & Adams, R. J. (1998). Measuring individual
differences in change with Multidimensional Rasch Models. Journal of Outcome
Measurement, 2(3), 240-265.

Wang, W. C., & Su, Y. H. (2004). Factors influencing the Mantel and Generalized
Mantel-Haenszel methods for the assessment of differential item functioning in
polytomous items. Applied Psychological Measurement, 28(6), 450-480.

 

WINSTEPS (1999). Rasch-Model Computer Program. Chicago: MESA Press.

Wolfe, E. W., Moulder, B. C., & Myford, C. M. (2001). Detecting Differential
Rater Functioning over Time (DRIFT) using a Rasch multi-faceted Rating Scale Model.
Journal of Applied Measurement, 2(3), 256-280.

Wolfinger, R., & O'Connell, M. (1993). Generalized linear mixed models: A
pseudo-likelihood approach. Journal of Statistical Computation and Simulation, 48,
233-243.

Wright, B. D., & Masters, G. N. (1982). Rating Scale Analysis. Chicago: Mesa
Press.

Zhang, Y., & Zhang, L. (2002). Modeling school and district effects in the math
achievement of Delaware students measured by DSTP: A preliminary application of
Hierarchical Linear Modeling in accountability study. Paper presented at the American
Educational Research Association, New Orleans, LA.

Zhu, W., Updyke, W. F., & Lewandowski, C. (1997). Post-hoc Rasch analysis of
optimal categorization of an ordered-response scale. Journal of Outcome Measurement,
1(4), 286-304.
