
This is to certify that the
dissertation entitled

The Effect of Weighting in Kernel Equating Using
Counter-Balanced Designs

 

presented by

Yanxuan Qu


has been accepted towards fulfillment
of the requirements for the

 

 

 

Ph. D. degree in Counseling, Educational
Psychology, and Special
Education

 

 


 

Major Professor’s Signature

Date

 

MSU is an Afﬁrmative Action/Equal Opportunity Institution

THE EFFECT OF WEIGHTING IN KERNEL EQUATING
USING COUNTER-BALANCED DESIGNS

By

Yanxuan Qu

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY
Department of Counseling, Educational Psychology, and Special Education

2007

ABSTRACT

THE EFFECT OF WEIGHTING IN KERNEL EQUATING
USING COUNTER-BALANCED DESIGNS

By

Yanxuan Qu

The Counter-Balanced (CB) design for test equating is often used in pilot studies
for testing programs when sample size is limited. When a CB design is used to conduct
equating, the data are usually treated as coming from an Equivalent Group design or a
Single Group design (Kolen & Brennan, 2004). Von Davier, Holland and Thayer (2004), on
the other hand, proposed a new approach under the Kernel Equating (KE) framework that
treats the data as a weighted synthesized mixture of data from the two groups. This new
approach is called the two independent Single Group approach (2SG approach).

This study investigates the performance of the 2SG approach in comparison to
other data treatment approaches under different sample sizes and order effect situations.
Both linear and equipercentile equating methods under the KE and traditional equating
frameworks were applied to two real datasets and six simulated datasets. The results from
traditional equipercentile equating on each simulated population dataset were taken as
the benchmark against which all the other equating methods were compared. The Standard
Error of Equating (SEE), Root Mean Square Error (RMSE), equating bias, and Standard
Error of Equating Difference (SEED) were reported for each equating of the simulated
data. The SEE and RMSE were reported for the equatings of the real data samples.

The results indicated that the 2SG approach unifies the Equivalent Group approach
and the Single Group approach within its flexible framework. The weighting mechanism in
the 2SG approach seemed to be sensitive to different order effects. Possible criteria for
selecting the best weights are discussed.

DEDICATION

To my dear parents, my husband, and my little brother


ACKNOWLEDGEMENTS

This dissertation was completed with the help of many people. First, I am
deeply indebted to Professor Mark D. Reckase for his guidance and encouragement
throughout this work. Without his constant and unconditional support, this work would
not have been possible. I learned from him not only his knowledge, but also his
dedication to work and his peaceful and respectful attitude toward people.

I would like to thank Dr. Alina von Davier for her generous help and guidance.
She is enthusiastic, upbeat and proactive. Thanks also go to Dr. Richard Houang and Dr.
Sharif Shakrani for their insightful comments on this dissertation; Dr. Ning Han and Dr.
Henry Chen for their assistance with the KE software; and Dr. Linda Chard for her
assistance in editing the early version of my dissertation.

Meanwhile, I am very grateful to Dr. Betsy Becker, Dr. Mary Kennedy, and Dr.
Edward Wolfe. Working with them on different projects broadened my scope of
knowledge. Their financial support allowed me to concentrate on my study and made me
feel a family-like atmosphere.

Finally, my deep gratitude goes to my husband Lixiong Gu for his love and
support, and to my parents and my brother for their understanding and encouragement.

TABLE OF CONTENTS

LIST OF FIGURES
NOTATION
CHAPTER I: INTRODUCTION
1.1 EQUATING PROCEDURE IN GENERAL
1.2 COUNTER-BALANCED DESIGN AND EQUATING
1.3 LITERATURE REVIEW
1.4 RESEARCH QUESTIONS
1.5 RESEARCH EXPECTATIONS
CHAPTER II: THEORETICAL FRAMEWORK
2.1 COUNTER-BALANCED DESIGN
2.2 EQUATING USING COUNTER-BALANCED DESIGNS
2.2.1 Approaches to Treating Data in a CB Design
2.2.2 Equating Methods for a CB Design
2.3 EQUATING WITH A CB DESIGN UNDER THE KERNEL EQUATING FRAMEWORK
2.3.1 Step 1. Log-linear Pre-smoothing
2.3.2 Step 2. Estimating Score Probabilities on the Target Population
2.3.3 Step 3. Continuization
2.3.4 Step 4. Equating
2.3.5 Step 5. Calculating Standard Error of Equating (SEE) and Standard Error of Equating Difference (SEED)
2.4 EQUATING ERROR
2.5 EVALUATING THE RESULTS OF EQUATING
2.5.1 Standard Error of Equating
2.5.2 Root Mean Squared Deviation (RMSD)
2.5.3 Equating Bias
2.5.4 Root Mean Square Error
2.5.5 Standard Error of Equating Difference
CHAPTER III: METHODS
3.1 QUANTIFICATION OF DIFFERENTIAL ORDER EFFECT
3.2 DATA
3.2.1 Real Data
3.2.2 Simulated Data
3.3 ANALYSIS
3.3.1 Equating Methods Applied for Simulated Data
3.3.2 Procedure for Estimating Empirical SEE for Simulated Data
3.3.3 Evaluating Equating Results from Simulated Data
CHAPTER IV: RESULTS
4.1 REAL DATA 1
4.1.1 Selecting the Best Equating Function Using RMSE
4.1.2 Selecting the Best Equating Function Using SEED
4.2 REAL DATA 2
4.2.1 Selecting the Best Equating Function Using RMSE
4.2.2 Selecting the Best Equating Function Using SEED
4.3 SIMULATED DATA
4.3.1 Model Fit
4.3.2 Evaluating the Equating Results by RMSE
4.3.3 Evaluating the Equating Results by SEED
CHAPTER V: DISCUSSION
5.1 PERFORMANCE OF THE KE METHODS
5.2 EFFECTS OF THE WEIGHTING METHOD
5.3 LIMITATIONS OF THIS STUDY
5.3.1 Arbitrary Nature of the Equating Criterion
5.3.2 Problem with Simulated Data
5.4 FUTURE STUDY
APPENDICES
REFERENCES

LIST OF TABLES

TABLE 1. Equivalent-Groups design
TABLE 2. Single-Group design
TABLE 3. Counter-Balanced design
TABLE 4. Ways of treating data in a CB design appearing in the literature
TABLE 5. KE methods and corresponding traditional equating methods
TABLE 6. All equating methods compared in this study for simulated data
TABLE 7. Summary statistics for real data 1
TABLE 8. Summary statistics for real data 2
TABLE 9. Descriptive statistics for simulated data 1
TABLE 10. Descriptive statistics for simulated data 2
TABLE 11. Descriptive statistics for simulated data 3
TABLE 12. Descriptive statistics for simulated data 4
TABLE 13. Descriptive statistics for simulated data 5
TABLE 14. Descriptive statistics for simulated data 6
TABLE 15. Evaluation of equating results from real data 1
TABLE 16. Evaluation of equating results from real data 2
TABLE 17. Summary statistics for POP1 linear equating methods
TABLE 18. Summary statistics for POP1 equipercentile equating methods
TABLE 19. Summary statistics for POP2 linear equating methods
TABLE 20. Summary statistics for POP2 equipercentile equating methods
TABLE 21. Summary statistics for POP3 linear equating methods
TABLE 22. Summary statistics for POP3 equipercentile equating methods
TABLE 23. Summary statistics for POP4 linear equating methods
TABLE 24. Summary statistics for POP4 equipercentile equating methods
TABLE 25. Summary statistics for POP5 linear equating methods
TABLE 26. Summary statistics for POP5 equipercentile equating methods
TABLE 27. Summary statistics for POP6 linear equating methods
TABLE 28. Summary statistics for POP6 equipercentile equating methods
TABLE 29. Selected equating function based on SEED
TABLE 30. Selected equating function based on RMSE
TABLE A1. Standard error of linear equating for real data 1
TABLE A2. Standard error of equipercentile equating for real data 1
TABLE A3. Standard error of linear equating for real data 2
TABLE A4. Standard error of equipercentile equating for real data 2

LIST OF FIGURES

FIGURE 1. Observed score distributions for X1 and Y1 in real data 1
FIGURE 2. Observed score distributions for X2 and Y2 in real data 1
FIGURE 3. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear and the ±2SEED confidence interval band around zero line, real data 1
FIGURE 4. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile and the ±2SEED confidence interval band around zero line, real data 1
FIGURE 5. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile and the ±2SEED confidence interval band around zero line, real data 1
FIGURE 6. Observed score distributions for X1 and Y1 in real data 2
FIGURE 7. Observed score distributions for X2 and Y2 in real data 2
FIGURE 8. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear and the ±2SEED confidence interval band around zero line, real data 2
FIGURE 9. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile and the ±2SEED confidence interval band around zero line, real data 2
FIGURE 10. Equating difference between 2SG(1, 1) linear and 2SG(1, 1) equipercentile, and the ±2SEED confidence interval band around zero line, real data 2
FIGURE 11. One example of Freeman-Tukey residual plot for POP3
FIGURE 12. Equating differences and the ±2SEED band for simulated data 1
FIGURE A1. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP1, n=1000
FIGURE A2. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP1, n=1000
FIGURE A3. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP1, n=1000
FIGURE A4. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP2, n=1000
FIGURE A5. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP2, n=1000
FIGURE A6. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP2, n=1000
FIGURE A7. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP3, n=1000
FIGURE A8. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP3, n=1000
FIGURE A9. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP3, n=1000
FIGURE A10. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP4, n=1000
FIGURE A11. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP4, n=1000
FIGURE A12. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP4, n=1000
FIGURE A13. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP5, n=500
FIGURE A14. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP5, n=500
FIGURE A15. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP5, n=500
FIGURE A16. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP5, n=1000
FIGURE A17. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP5, n=1000
FIGURE A18. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP5, n=1000
FIGURE A19. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP6, n=300
FIGURE A20. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP6, n=300
FIGURE A21. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP6, n=300
FIGURE A22. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP6, n=500
FIGURE A23. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP6, n=500
FIGURE A24. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP6, n=500
FIGURE A25. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear, POP6, n=1000
FIGURE A26. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile, POP6, n=1000
FIGURE A27. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile, POP6, n=1000
FIGURE A28. Equating difference between 2SG(1, 1) equipercentile and EG equipercentile, POP1, n=50
FIGURE A29. Equating difference between 2SG(1, 1) equipercentile and EG equipercentile, POP1, n=100
FIGURE A30. Equating difference between 2SG(1, 1) equipercentile and EG equipercentile, POP4, n=50
FIGURE A31. Equating difference between 2SG(1, 1) equipercentile and EG equipercentile, POP4, n=100
FIGURE A32. Equating difference between 2SG(1, 1) equipercentile and EG equipercentile, POP4, n=300
FIGURE A33. Equating difference between 2SG(1, 1) equipercentile and EG equipercentile, POP6, n=50
FIGURE A34. Equating difference between 2SG(1, 1) equipercentile and EG equipercentile, POP6, n=1000


NOTATION

Symbol                Explanation

X, Y                  Names of two test forms to be equated
X, Y                  Scores on X and Y, random variables
P                     Population of examinees
T                     Target population of examinees on which the equating of X and Y takes place
CB                    Counter-Balanced data collection design
EG                    Equivalent Group Design
SG                    Single Group Design
DF                    Design Function
$X_1$                 Test X that is taken first
$X_2$                 Test X that is taken second
$Y_1$                 Test Y that is taken first
$Y_2$                 Test Y that is taken second
$F(x)$                Cumulative distribution of variable X
$G(y)$                Cumulative distribution of variable Y
J                     Number of possible X scores
K                     Number of possible Y scores
$x_j$                 A possible score value for X, j is from 1 to J
$y_k$                 A possible score value for Y, k is from 1 to K
R                     Generic symbol for the population probability of X after pre-smoothing for all designs
S                     Generic symbol for the population probability of Y after pre-smoothing for all designs
r                     Estimated probabilities on target population T, transformed by DF from R into r
s                     Estimated probabilities on target population T, transformed by DF from S into s
$\hat{e}_Y(x)$        Estimated score x on Form X equated to Form Y
$\hat{e}_X(y)$        Estimated score y on Form Y equated to Form X
$\hat{r}_j$           An estimated specific value of r
$\hat{s}_k$           An estimated specific value of s
$\hat{p}_j$           Estimated probability of getting a score $x_j$ on X
$\hat{p}_k$           Estimated probability of getting a score $y_k$ on Y
$p_{jk}$              Estimated joint probability of getting a score $x_j$ on X and a score $y_k$ on Y over the target population, T
$\hat{p}_{(12)jk}$    Estimated population probability of getting a score $x_j$ on test $X_1$, which is taken first, and a score $y_k$ on test $Y_2$, which is taken second
$\hat{p}_{(21)jk}$    Estimated population probability of getting a score $x_j$ on test $X_2$, which is taken second, and a score $y_k$ on test $Y_1$, which is taken first
$h_X$, $h_Y$          Bandwidths used to define the KE continuizations of $F(x)$ and $G(y)$. They are positive numbers. Large values of the bandwidths lead to linear equating, while smaller values give more "equipercentile-like" equating functions.
$X(h_X)$              Continuized random variable for scores on Form X
$Y(h_Y)$              Continuized random variable for scores on Form Y
$J_{e_Y}(\hat{r}, \hat{s})$    Jacobian matrix of the KE function, which is a function of r and s
$J_{DF}(\hat{R}, \hat{S})$     Jacobian matrix of the design function, which is a function of R and S

Chapter I: Introduction

Test equating is an important statistical procedure in educational testing. It is used
to produce scores that are comparable across different but parallel test forms, both within
a year and across years. Although many comparative studies have investigated the
accuracy of different equating methods, very few have addressed equating with a
Counter-Balanced (CB) design. Traditionally, as in Lord (1950), Angoff (1971) and Kolen
and Brennan (2004), data collected by a CB design were either pooled together and
treated as a Single Group (SG) design, or partly discarded and treated as an Equivalent
Group (EG) design. Recently, a new approach to treating data collected by a CB design
was proposed by von Davier, Holland and Thayer (2004). This new approach involves
weighting the data before pooling them together. To evaluate the performance of this new
approach, this study compared the overall equating accuracy of the two independent
single group approach, abbreviated as the 2SG approach, to that of the other approaches
to treating data collected by a CB design.

The rest of this chapter introduces the general procedure for equating using the
counter-balanced design and the equating approaches for a CB design, including the new
2SG approach under the Kernel Equating (KE) framework, and gives a brief summary of
the literature on KE. At the end of this chapter, the research questions and research
expectations of this study are presented. Chapter II describes the CB design and the KE
framework as well as equating errors and the evaluation of equating results. Chapter III
describes the real and simulated datasets to which the equating methods were applied and
the procedure of this study. Chapter IV presents the study results and Chapter V discusses
the findings and limitations of this study.

1.1 Equating Procedure in General

Every equating procedure consists of two basic components: the equating design and
the equating method. Typical equating designs include the Equivalent Group (also called
random group) design, the Single Group design, the Counter-Balanced design, and the
Non-Equivalent Anchor Test (NEAT) design. Typical equating methods can be classified
into the following three categories: 1) classical observed score equating; 2) Item Response
Theory (IRT) true score equating; and 3) Item Response Theory observed score equating.
Classical observed score equating methods include the mean, linear, and equipercentile
equating methods reported by Kolen (1988). They define the score correspondence
between two forms by setting certain characteristics of the observed score distributions
equal for a specified group of examinees. Item response theory true score equating defines
the score correspondence by setting the true scores of examinees to be equal (Cook &
Eignor, 1991).
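
As an illustration of the classical observed score methods named above, the following
minimal sketch computes linear and equipercentile equated scores from two discrete score
distributions. It is not code from this study. The function names are hypothetical, the
scores are assumed to be consecutive integers in ascending order, and the equipercentile
version assumes the common convention of spreading each integer score's probability
uniformly over half a point on either side before inverting the distribution.

```python
def mean_sd(scores, probs):
    """Mean and standard deviation of a discrete score distribution."""
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(scores, probs))
    return mu, var ** 0.5


def linear_equate(x, x_scores, r, y_scores, s):
    """Linear equating: set the standardized deviation scores on X and Y equal."""
    mu_x, sd_x = mean_sd(x_scores, r)
    mu_y, sd_y = mean_sd(y_scores, s)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


def cdf_uniform(x, scores, probs):
    """CDF after spreading each score's mass uniformly over [score-0.5, score+0.5]
    (an assumed continuization convention)."""
    total = 0.0
    for sc, p in zip(scores, probs):
        if x >= sc + 0.5:
            total += p
        elif x > sc - 0.5:
            total += p * (x - (sc - 0.5))
    return total


def inverse_cdf_uniform(u, scores, probs):
    """Inverse of cdf_uniform, found by walking through the score intervals."""
    cum = 0.0
    for sc, p in zip(scores, probs):
        if p > 0 and cum + p >= u:
            return (sc - 0.5) + (u - cum) / p
        cum += p
    return scores[-1] + 0.5


def equipercentile_equate(x, x_scores, r, y_scores, s):
    """Equipercentile equating: e_Y(x) = G^{-1}(F(x))."""
    return inverse_cdf_uniform(cdf_uniform(x, x_scores, r), y_scores, s)
```

When the two forms have identical score distributions, both functions return the
original score; when Form Y is simply one point easier at every score, both return the
score plus one.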

1.2 Counter-Balanced Design and Equating

Counterbalancing, or the Latin square, is often used in pure experimental designs to
cancel out order effects (Montgomery, 2000). In educational testing, a CB design is
often used to collect data in pilot studies of testing programs. In a CB design, two
independent groups of examinees usually take two parallel test forms X and Y in
different orders.

Various ways of dealing with data from a CB design in test equating were described in
Lord (1950), Angoff (1971), and Kolen and Brennan (2004). None of these approaches is
satisfactory for situations in which the order effect cannot be cancelled out. In order to
improve the equating practice for a CB design, especially when order effects cannot be
cancelled out, von Davier, Holland, and Thayer (2004) proposed a new way of treating data
collected by a CB design under their Kernel Equating framework. This new way of
treating data is named the two independent single group approach (2SG approach), which
creates a synthetic target group by assigning different weights to the two tests taken in
different order, and applies linear and equipercentile equating methods to the synthetic
group. The significance of this approach is its weighting mechanism, which is supposed
to have the potential to provide optimal equating results with the smallest equating error
by using as much of the information in the data as possible. However, the effectiveness
of this 2SG approach has not been evaluated.
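
The weighting idea can be sketched in a few lines. The code below is an illustrative
reading of the 2SG approach, not the implementation used in this study: it assumes that
the synthetic score distribution of X is a weighted average of the marginal distribution
of X taken first and that of X taken second, with weight w_x on the form taken first, and
likewise for Y with weight w_y. The function names and the layout of the joint tables
(rows indexed by X scores, columns by Y scores) are assumptions made for the example.

```python
def marginals(joint):
    """Row sums (X marginal) and column sums (Y marginal) of a J x K joint table."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return rows, cols


def synthesize(p12, p21, w_x, w_y):
    """Weighted synthetic marginals for X and Y in a counter-balanced design.

    p12: joint probabilities for the group that took X first and Y second.
    p21: joint probabilities for the group that took Y first and X second.
    w_x, w_y: weights (between 0 and 1) on the forms taken first.
    """
    r_first, s_second = marginals(p12)   # X1 and Y2
    r_second, s_first = marginals(p21)   # X2 and Y1
    r = [w_x * a + (1 - w_x) * b for a, b in zip(r_first, r_second)]
    s = [w_y * a + (1 - w_y) * b for a, b in zip(s_first, s_second)]
    return r, s
```

Under this reading, weights of (1, 1) keep only the forms taken first, which corresponds
to treating the data as an EG design, while weights of (.5, .5) give the two
administrations of each form equal influence.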

The 2SG approach, the EG approach, and the SG approach all concern the data
collection design in an equating procedure. The 2SG approach belongs to the framework
of Kernel Equating. The equating methods related to this approach are the KE linear and
KE equipercentile equating methods. The EG approach and the SG approach can be
implemented under both the KE and the traditional equating frameworks. Therefore, the
equating methods related to these two approaches are the KE linear, KE equipercentile,
traditional linear, and traditional equipercentile equating methods (see more details in
Chapter II).

1.3 Literature Review

Descriptions of equating using a CB design can be found in Lord (1950),
Angoff (1971), Kolen and Brennan (2004), Zeng and Cope (1995) and von Davier,
Holland, and Thayer (2004). The 2SG approach to treating data collected by a CB design
was introduced in von Davier, Holland, and Thayer (2004). The only study that has
compared the performance of this 2SG approach with the EG and SG approaches in
improving the accuracy of equating with a CB design was conducted by Qu and von
Davier (2006). They compared the 2SG approach to the SG and EG approaches under the
KE framework using a real dataset collected by a CB design. It was found that, when the
order effect can be cancelled out, the 2SG approach with equal weights produces equating
results similar to those of the SG approach under the KE framework. It is still unclear how
the 2SG approach performs when order effects cannot be cancelled out. Moreover, it is
not well documented in the literature how to test whether the order effects can or cannot
be cancelled out.

The 2SG approach is carried out under the KE framework. KE is a unified
approach to test equating based on a flexible family of equipercentile-like equating
functions that contain the linear equating function as a special case. It belongs to the
category of classical observed score equating. Studies comparing the KE methods with
other equating procedures concluded that the KE procedure can improve on or approximate
the equating results of the corresponding traditional equating methods.
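
The sense in which linear equating is a special case can be seen in a small numerical
sketch. The code below assumes the Gaussian kernel continuization usually associated with
KE, in which the continuized distribution keeps the mean and variance of the discrete
one, and inverts the continuized distribution of Y by bisection. The function names, the
bisection bracket of 20 standard deviations, and the toy distributions in the usage note
are assumptions made for illustration; this is not the KE software used in this study.

```python
import math


def moments(scores, probs):
    """Mean and variance of a discrete score distribution."""
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(scores, probs))
    return mu, var


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF with bandwidth h (mean and variance preserved)."""
    mu, var = moments(scores, probs)
    a = math.sqrt(var / (var + h * h))
    return sum(p * normal_cdf((x - a * xj - (1 - a) * mu) / (a * h))
               for xj, p in zip(scores, probs))


def kernel_equate(x, x_scores, r, y_scores, s, h_x, h_y):
    """e_Y(x) = G_h^{-1}(F_h(x)); the inverse is found by bisection."""
    u = kernel_cdf(x, x_scores, r, h_x)
    mu_y, var_y = moments(y_scores, s)
    lo = mu_y - 20 * math.sqrt(var_y)
    hi = mu_y + 20 * math.sqrt(var_y)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, y_scores, s, h_y) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With very large bandwidths (for example 1000 on a five-point score scale), each
continuized distribution is nearly normal, so the equated score is close to the linear
equating result; with small bandwidths the function follows the discrete distributions
more closely and behaves like equipercentile equating.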

Livingston (1993a) compared KE methods with traditional linear and
equipercentile equating methods using small samples collected by a NEAT design. He
evaluated the equating methods in terms of random equating error and equating bias and
found that the KE methods with log-linear smoothing provided more accurate equating
results than traditional equating methods without smoothing. He also found that, when the
sample size is less than 200, the analytic standard error of equating calculated by the
delta method is larger than the empirical standard error of equating at the lower and
higher ends of the score range.
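
The two error components mentioned above, random error and bias, combine into a single
overall error measure. The sketch below assumes a set of equated scores obtained from
repeated samples at one score point and a criterion value for that score point; the
function name and the use of the divisor n (rather than n - 1) are choices made for this
illustration, not the procedure of any study cited here.

```python
import math


def error_summary(replicates, criterion):
    """Bias, empirical standard error, and RMSE of replicated equated scores."""
    n = len(replicates)
    mean_est = sum(replicates) / n
    bias = mean_est - criterion
    # Random error: spread of the replicates around their own mean.
    see = math.sqrt(sum((e - mean_est) ** 2 for e in replicates) / n)
    # Total error: spread of the replicates around the criterion.
    rmse = math.sqrt(sum((e - criterion) ** 2 for e in replicates) / n)
    return bias, see, rmse
```

With these definitions the squared RMSE equals the squared bias plus the squared standard
error, so a method can have a small standard error and still a large RMSE if it is biased.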

Mao and von Davier (2005) compared Kernel Equating methods with their
corresponding traditional equating methods using real data from a NEAT design and an EG
design. For the NEAT design, they compared the traditional frequency estimation
equipercentile equating method with the KE post-stratification equating method, and the
Tucker method with the KE linear post-stratification equating method. They found that
KE methods and their corresponding traditional equating methods give very similar
equating results. Von Davier, Holland, and others (2005) did a similar study using
pseudo-test data with a NEAT design and drew the same conclusion.

Han, Li, and Hambleton (2005) compared KE with IRT true score equating
methods using data collected by a NEAT design. Again, they found that the KE methods
provide equating results similar to those of the IRT equating methods.

1.4 Research Questions

This study intends to quantify differential order effects, to compare the 2SG
equating procedures under KE framework with other traditional equating procedures, and
to discover whether the weighting mechanism can enhance the equating accuracy under
different order effect Situations. The speciﬁc research questions are:

1) How should differential order effects in CB designs be quantified?

2) Are the KE methods better than their corresponding traditional equating
methods?

3) Does the weighting in the 2SG approach provide better results under certain
order effect situations?

4) What weight should be used for a 2SG approach?

Table 6 displays the 22 equating procedures compared in this dissertation. What
distinguishes them from each other is the way they treat the data collected by a CB
design (EG, SG, or 2SG with weighting) and the equating method (linear or
equipercentile) they adopt. To compare the performance of KE with traditional
equating methods, the equating results of two KE procedures are compared to the
equating results of their corresponding traditional equating procedures (as listed in
Table 5).

1.5 Research Expectations

1) The KE equating methods and their corresponding traditional equating methods
provide similar equating results.

2) As DOE increases, the weights that the 2SG approach assigns to tests taken first
increase accordingly.

3) The selection of an equating function with the optimal weights may
vary when different statistical criteria are used to evaluate the equating results.

As presented above, the literature on CB design equating is sparse. Since the CB
design is still used in research projects and in the pilot studies of testing programs
(Yu, 2003) when examinees are hard to find, it is useful to understand the 2SG approach
and to evaluate how much it can enhance overall equating accuracy compared to other
methods in various order effect situations. Such a study will contribute to the general
knowledge about the CB design and the methods available for equating with data
collected by a CB design.

Chapter II: Theoretical Framework

This chapter first introduces the equating designs related to a CB design, the
linear and equipercentile equating methods, and the Kernel Equating framework, and then
describes the concept of equating error and the criteria used for evaluating equating
results.

2.1 Counter-Balanced Design

A CB design is often used in practice when two forms are administered to examinees
and it is difficult to obtain a sufficiently large group of examinees (Kolen & Brennan,
2004). To explain the CB design in more detail, a brief description of the EG design and
the SG design is necessary:

Equivalent Group Design

TABLE 1. Equivalent-Groups design

Population   Sample   X   Y
P            1        ✓
P            2            ✓


In an EG design, two independent random samples are drawn from a common
population of examinees, P. Each group of examinees is randomly assigned to take one of
the two parallel forms X and Y as shown in Table 1.

Single Group Design

TABLE 2. Single-Group design

Population   Sample   X   Y
P            1        ✓   ✓

In an SG design, only one random sample of examinees is selected from population
P, and all the examinees take the two test forms X and Y in one administration as shown
in Table 2. Because the two test forms are parallel and are taken by the same
examinees, it is almost certain that examinees' performance on the second form will be
affected by having taken the first form. The effect may be a “practice/learning
effect” or a “fatigue effect.” If familiarity with the test increased performance, then
Form Y could appear to be easier than Form X. On the other hand, if fatigue is a factor
in examinee performance, then Form Y could appear relatively more difficult than Form X
because examinees would be tired when administered Form Y (Kolen & Brennan, 2004).
For simplicity, all such possible effects will be referred to as the “order effect”
(Lord, 1950). If the two test forms are administered in the same order to all examinees,
as in an SG design, it is impossible to obtain any estimate of the amount of order
effect. Consequently, to control for the order effect, it is usual to counterbalance the
order of administration by dividing the group in an SG design into two random halves and
giving both test forms to each half but in a different order. This design is what is
often called a CB design.

TABLE 3. Counter-Balanced design

Population   Sample   X1   Y1   X2   Y2
P            1        ✓              ✓
P            2             ✓    ✓

*The subscripts of X and Y indicate the order. E.g., X1 means test X is taken first; Y2 means test Y is taken second.

Table 3 illustrates a CB design, in which two samples of examinees are
randomly chosen from the same population P and randomly assigned as sample 1 and
sample 2. Sample 1 takes test X first (denoted as X1) and test Y second (denoted as Y2),
and sample 2 takes test Y first (denoted as Y1) and test X second (denoted as X2). The
purpose of counterbalancing the order of testing is to ensure that any order effects are
present equally in the scores obtained for both test forms X and Y, so that the order
effects on Form X and Form Y cancel out.

Theoretically, if random selection and random assignment of the examinees are
carried out strictly in operation, the purpose of canceling out the “order effect” can
be accomplished by collecting data using a CB design. However, in practice, the
assumption of random selection is often violated. Usually, random sampling is replaced
by random cluster sampling. The violation of these two assumptions leads to an
interaction between group abilities and form difficulties, which is the reason why the
order effects often cannot be cancelled out. For example, some groups of examinees might
do better on the second test after practicing on the first test, while other groups
might do worse.

There have been different definitions of order effects in the literature. Lord (1950)
and Angoff (1971) defined the order effect on Form X as
$K_X = X_2 - X_1 = C\sigma_{X_1} = C\sigma_{X_2} = C\sigma_X$, and the order effect on
Form Y as $K_Y = Y_2 - Y_1 = C\sigma_{Y_1} = C\sigma_{Y_2} = C\sigma_Y$ (where $C$ is a
constant). They assumed that order effects are constant for all examinees and are
proportional to the standard deviations. Kolen and Brennan (2004) explained order
effects without assuming they are constant for each examinee. They defined the
Differential Order Effect (DOE) as $(\bar{X}_1 - \bar{Y}_1) - (\bar{X}_2 - \bar{Y}_2)$
and suggested that a significant DOE would indicate that order effects cannot be
cancelled out in a CB design. However, their book does not describe a significance
test. Chapter III of this dissertation adopts their definition of DOE, describes a
hypothesis test for the statistical significance of DOE, and suggests using effect size
statistics for the magnitude of DOE.
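
The DOE point estimate defined by Kolen and Brennan (2004) can be computed directly from the four sets of scores in a CB design. The following is a minimal Python sketch, assuming each argument is an array of raw scores; the function name `differential_order_effect` and the example scores are hypothetical. The significance test and effect size are not shown because they are described in Chapter III.

```python
import numpy as np

def differential_order_effect(x1, y1, x2, y2):
    """Return (mean X1 - mean Y1) - (mean X2 - mean Y2).

    x1, y2: scores of sample 1 (X taken first, Y taken second)
    y1, x2: scores of sample 2 (Y taken first, X taken second)
    """
    first = np.mean(x1) - np.mean(y1)    # form difference among tests taken first
    second = np.mean(x2) - np.mean(y2)   # form difference among tests taken second
    return first - second

# Hypothetical scores, for illustration only.
doe = differential_order_effect(x1=[10, 12], y1=[9, 11], x2=[11, 13], y2=[12, 12])
```

A DOE of zero means the form difference is the same whether the forms are taken first or second, which is the situation in which order effects cancel.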

2.2 Equating Using Counter-Balanced Designs

Like every equating procedure, equating using a CB design has two parts: data
collection design and equating methods.

2.2.1 Approaches to Treating Data in a CB Design: The nature of the CB design
leads to different ways of dealing with data. Comparing Tables 1, 2, and 3, we see that
the CB design actually contains both EG and SG designs. For example, there are two
(dependent) EG designs, one for X1 and Y1, and the other for X2 and Y2. In addition,
there are two (independent) SG designs, one for X1 and Y2, and the other for X2 and Y1.
Finally, the two groups of examinees can be pooled together and all the data from X1,
Y2, X2, and Y1 can be treated as a pooled SG design.

Because of these different ways of considering data in a CB design, several data
treatment approaches have been used to equate test forms X and Y. Lord (1950) and
Angoff (1971) described a linear equating method that treated the data as a pooled
single group design. They assumed a constant order effect and bivariate normal
distributions of tests X and Y in the population. By constant order effect, they meant
that order effects are the same for all examinees and are proportional to the relevant
standard deviations. Kolen and Brennan (2004) did not assume constant order effects
across examinees. They suggested using the pooled SG approach when order effects can be
cancelled out. Otherwise, only the EG approach with X1 and Y1 should be used, since it
is perhaps the only unbiased way of treating data in a CB design.
Nonetheless, each of these two approaches for treating data has its own
weaknesses. Although the EG approach using X1 and Y1 only is unbiased, it throws
away half of the data and makes no use of the correlation between X and Y, which is
implicit in the SG aspects of the CB design. The pooled SG approach is considered
problematic when order effects cannot be cancelled out because it is hard to interpret
the pooled distribution of X1 and X2 (or Y1 and Y2) when they each have a different
distribution (von Davier, Holland, & Thayer, 2004).

In an attempt to find a better way of using data collected by a CB design, von
Davier, Holland, and Thayer (2004) proposed the 2SG approach, a new approach that uses
as much of the data as possible and does so more flexibly. It is expected to unify the
other three approaches into one single approach and to provide an optimal equating
solution while taking into account different sizes of order effects. Section 2.3
explains this approach under the KE framework in detail.

Table 4 summarizes the different ways of dealing with data in a CB design discussed
in the literature review.

TABLE 4. Ways of treating data in a CB design appearing in the literature

Approach               Item                     Description

1. EG design for       Explanation              Use data from X1 and Y1 only
   X1 and Y1 only      Assumptions              Random selection from a single population and
                                                random assignment; suggested when DOE is
                                                significant
                       Advantage/Disadvantage   Unbiased / loss of half the data
                       Source                   Kolen and Brennan (2004), von Davier et al. (2004)

2. EG design for       Explanation              Use data from X2 and Y2 only
   X2 and Y2 only      Assumptions              Random selection from a single population; random
                                                assignment; definitely not when DOE is significant
                       Advantage/Disadvantage   (none listed) / biased; loss of half the data
                       Source                   Kolen and Brennan (2004)

3. EG pooling          Explanation              Average two EG equating functions
   approach            Assumptions              Random selection; random assignment; DOE is not
                                                significant
                       Advantage/Disadvantage   [illegible in source] / ignores dependency
                                                between two [illegible in source]
                       Source                   Von Davier et al. (2004)

4. SG design for       Explanation              Use data from X1 and Y2 only
   X1 and Y2 only      Assumptions              Random selection; DOE is not significant
                       Advantage/Disadvantage   (none listed) / loss of data information
                       Source                   Kolen and Brennan (2004)

5. SG design for       Explanation              Use data from X2 and Y1 only
   X2 and Y1 only      Assumptions              Random selection; DOE is not significant
                       Advantage/Disadvantage   (none listed) / loss of data information
                       Source                   Kolen and Brennan (2004)

6. Pooled SG           Explanation              Use all data from X1, Y1, X2, and Y2 equally when
   approach                                     order effects can be cancelled out
                       Assumptions              Random selection; random assignment; DOE is not
                                                significant
                       Advantage/Disadvantage   Use full data information / not applicable when
                                                DOE is significant
                       Source                   Kolen and Brennan (2004), Lord (1950), von Davier
                                                et al. (2004)

7. 2SG approach        Explanation              Use all data information unequally when different
                                                order effects are present
                       Assumptions              Random selection and random assignment; all kinds
                                                of DOE
                       Advantage/Disadvantage   Use full data information / (none listed)
                       Source                   Von Davier et al. (2004)

* Approaches 2, 3, 4, and 5 are possible ways of treating data in a CB design but are of no interest to this study.

2.2.2 Equating Methods for a CB Design: Linear or equipercentile equating
methods following the KE or traditional equating procedure are the equating methods
related to a CB design found in the literature.

Every equating method defines a target population T, on which scores on the two
test forms are to be made equivalent (for the population as a whole, not necessarily for
every individual in the population) (Livingston, 2004; von Davier, Holland, & Thayer,
2004). The target population depends on the data collection design. This study
focuses on the CB, EG, and SG designs, where there is only one population P of test
takers from which particular samples are drawn. For these designs the target population T
is assumed to be the same as the underlying population P (von Davier, Holland, &
Thayer, 2004). The linear equating method is appropriate when the score distributions of
tests X and Y have the same shape on the target population and differ at most in mean
and standard deviation, while the equipercentile equating method adjusts for
differences in the shape of the distributions.

Linear equating defines the equating relationship as the equivalence of z-scores,
whereas the equipercentile equating method defines the equating relationship as the
equivalence of the cumulative distribution functions of X and Y in the population.
Equation (1) and equation (2) define the equating relationship for linear and
equipercentile equating when equating X onto Y, which means each raw score $x_j$ is
transformed to $e_Y(x_j)$ or $y$ by these equating functions, i.e., a raw score of $x_j$
on test X is interchangeable with a raw score of $e_Y(x_j)$ or $y$ on test Y.

$$\frac{x - \mu_X}{\sigma_X} = \frac{y - \mu_Y}{\sigma_Y} \;\Rightarrow\; y = \mu_Y + \frac{\sigma_Y}{\sigma_X}(x - \mu_X) \tag{1}$$

$$G(y) = F(x) \;\Rightarrow\; y = G^{-1}(F(x)) \tag{2}$$

Equation 2 holds only when X and Y are continuous. KE applies the Gaussian
kernel continuization procedure (von Davier, Holland, & Thayer, 2004), while the
traditional equipercentile equating in this study uses linear interpolation to continuize
score distributions.
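
As a small worked illustration of the linear equating relationship in equation (1), the Python sketch below maps a Form X score to the Form Y scale by matching z-scores. The function name `linear_equate` and the means and standard deviations in the example are hypothetical values chosen for illustration, not results from this study.

```python
def linear_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Equation (1): the Form Y score with the same z-score as x on Form X."""
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical moments: Form X has mean 25 and SD 5; Form Y has mean 28 and SD 4.
y_equivalent = linear_equate(30, mu_x=25, sd_x=5, mu_y=28, sd_y=4)
```

In this example a score of 30 on X lies one standard deviation above the X mean, so its equivalent is one standard deviation above the Y mean, i.e., 32.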

2.3 Equating with a CB Design under the Kernel Equating Framework

The KE framework accommodates both linear and equipercentile equating
procedures with pre-smoothing and continuization. Pre-smoothing is the log-linear
smoothing applied before scores are equated. Continuization converts discrete score
distributions to continuous distributions by using a normal (Gaussian) “kernel” (Holland
& Thayer, 1989; von Davier, Holland, & Thayer, 2004). In the case of a CB design, the
KE framework incorporates three different ways of treating data: the EG approach, the
pooled SG approach, and the 2SG approach. Both linear and equipercentile equating
methods are available for each of the three ways of treating data. The following section
introduces the five steps of the KE framework for a CB design and presents
how the three approaches differ with respect to each of these five steps.

2.3.1 Step 1. Log-linear Pre-smoothing

In pre-smoothing, the empirical score distributions are smoothed. Smoothing can
remove irregularities in the empirical score distributions so that they better resemble
the smooth population score distributions. Smoothing is necessary, especially when
sample size is small (Livingston, 1993). KE conducts pre-smoothing using a log-linear
method. Compared to other pre-smoothing methods, the log-linear method has the
flexibility of accommodating many distributions and is well-behaved and relatively easy
to estimate. Because the log-linear models belong to the exponential family, the
estimated distribution matches the sample distribution on as many moments as are
included in the model (Holland & Thayer, 2000; Kolen & Brennan, 2004).

In this step, the best-fitting log-linear model is selected to fit the sample data and
to estimate discrete score probabilities. The fit of the log-linear models can be
evaluated by examining changes in the likelihood ratio chi-square index over different
models and conditional Freeman-Tukey residual plots. The Freeman-Tukey residual plot
displays the deviation between $e_Y(X)$ and Y or between $e_X(Y)$ and X. A log-linear
model with good fit will have conditional Freeman-Tukey residuals randomly distributed
within 3 units above or below the zero line. In addition, the fit of a log-linear model
is reflected to some extent by the Standard Error of Equating introduced in step 5: a
bad model fit could lead to a large SEE.

Let $J$ and $K$ denote the total number of possible scores on Form X and Form Y
respectively; $x_j$ represents a possible score value for test X, $j = 1$ to $J$; $y_k$
represents a possible score value for test Y, $k = 1$ to $K$;
$p_{jk} = \mathrm{Prob}\{X = x_j, Y = y_k \mid T\}$ is the bivariate score probability
of $X = x_j$ and $Y = y_k$ over the target population T; let the $\beta$'s be
the slope parameters that will be estimated by the maximum likelihood method; $\alpha$
and $\alpha^*$ are the normalizing constants selected to make the sum of population score
probabilities equal to one; let $T_X$ and $T_Y$ denote the number of moments matched
between the fitted probabilities and the observed score probabilities; and let $I$ and
$L$ denote the number of cross moments matched between the fitted and the observed score
probabilities. Then,

A univariate log-linear model takes the form of:

$$\log(p_j) = \alpha + \sum_{i=1}^{T_X} \beta_i (x_j)^i \tag{3}$$

A bivariate log-linear model takes the form of:

$$\log(p_{jk}) = \alpha^* + \sum_{i=1}^{T_X} \beta_{Xi} (x_j)^i + \sum_{i=1}^{T_Y} \beta_{Yi} (y_k)^i + \sum_{i=1}^{I} \sum_{l=1}^{L} \beta_{il} (x_j)^i (y_k)^l \tag{4}$$
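
To make the univariate model in formula (3) concrete, the following Python sketch fits a polynomial log-linear model to observed score frequencies by Poisson maximum likelihood and returns the smoothed score probabilities. It is an illustration under stated assumptions, not the software used in this study: the function name `fit_loglinear`, the damped Newton-Raphson solver, the standardization of scores for numerical stability, and the example frequencies are all choices made here for illustration.

```python
import numpy as np

def fit_loglinear(scores, counts, degree, n_iter=100, tol=1e-10):
    """Fit log(p_j) = alpha + sum_i beta_i * x_j**i by maximum likelihood."""
    scores = np.asarray(scores, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Standardize scores so the polynomial basis is well conditioned; the fitted
    # probabilities are the same as with raw-score powers of the same degree.
    z = (scores - scores.mean()) / scores.std()
    basis = np.vander(z, degree + 1, increasing=True)  # columns 1, z, ..., z**degree
    beta = np.zeros(degree + 1)
    beta[0] = np.log(counts.mean())

    def loglik(b):
        eta = basis @ b
        return counts @ eta - np.exp(eta).sum()

    for _ in range(n_iter):
        mu = np.exp(basis @ beta)                    # fitted frequencies
        grad = basis.T @ (counts - mu)               # score vector
        hess = basis.T @ (basis * mu[:, None])       # information matrix
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while loglik(beta + t * step) < loglik(beta) and t > 1e-8:
            t /= 2                                   # step halving keeps the ascent stable
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    fitted = np.exp(basis @ beta)
    return fitted / fitted.sum()

# Hypothetical frequencies on a 0-20 score scale, for illustration only.
scores = np.arange(21)
counts = np.array([1, 2, 4, 7, 11, 16, 22, 27, 30, 31, 29,
                   26, 21, 17, 12, 9, 6, 4, 2, 1, 1])
p_hat = fit_loglinear(scores, counts, degree=3)
```

With `degree=3`, the smoothed distribution reproduces the first three moments of the observed distribution, which is the moment-matching property described above.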

For the SG KE method, one single bivariate log-linear model is fit to the pooled
data to get the probability of an examinee getting a score of $x_j$ on Form X and a
score of $y_k$ on Form Y (that is, $p_{jk}$).

For the 2SG KE method, two separate bivariate log-linear models are fit to the two
groups of data to get two sets of probability estimates $\hat{p}_{(12)jk}$ and
$\hat{p}_{(21)jk}$, where $\hat{p}_{(12)jk}$ is the estimated population probability of
getting a score $x_j$ on test X1, which is taken first, and a score $y_k$ on test Y2,
which is taken second; $\hat{p}_{(21)jk}$ is the estimated population probability of
getting a score $x_j$ on test X2, which is taken second, and a score $y_k$ on test Y1,
which is taken first.

For the EG approach, the data are fit by two univariate log-linear models.
Alternatively, the KE method for the EG approach with X1 and Y1 only can be considered a
special case of the 2SG KE method with weights of (1, 1).

2.3.2 Step 2. Estimating Score Probabilities on the Target Population

In this step, a Design Function (DF), either linear or non-linear, is applied to map
the estimated population score probabilities from step 1 into the estimated score
probabilities for X and Y on the target population T, denoted as $\hat{r}_j$ and
$\hat{s}_k$.

In the KE method for the EG approach with X1 and Y1 only, the DF is an identity
function, i.e., the estimated probabilities on target population T ($\hat{r}_j$ or
$\hat{s}_k$) are identical to the estimated population probabilities $\hat{p}_j$ or
$\hat{p}_k$. For both the pooled SG and 2SG KE methods, a non-identity DF is needed to
transform the estimated population probabilities from step 1, which depend on the data
collection design, into the estimated probabilities over target population T. For the
pooled SG KE method, $\hat{r}_j$ and $\hat{s}_k$ are the sums of the joint
probabilities over $k$ and $j$ respectively (formulas 5 and 6). For the 2SG KE method,
$\hat{r}_j$ or $\hat{s}_k$ is the weighted average of the two sets of estimates from the
two groups (formulas 7 and 8).

$$\hat{r}_j = \sum_k \hat{p}_{jk} \tag{5}$$

$$\hat{s}_k = \sum_j \hat{p}_{jk} \tag{6}$$

$$\hat{r}_j = w_X \sum_k \hat{p}_{(12)jk} + (1 - w_X) \sum_k \hat{p}_{(21)jk} \tag{7}$$

$$\hat{s}_k = w_Y \sum_j \hat{p}_{(21)jk} + (1 - w_Y) \sum_j \hat{p}_{(12)jk} \tag{8}$$

where $w_X$ and $w_Y$ indicate the weights placed on the test forms taken first.
Depending on the size of DOE, they can be set between 0.5 and 1 to
emphasize information collected from tests taken first. When both $w_X$ and $w_Y$ are
set to 1, data from test forms taken second are completely discarded, and the 2SG
approach becomes the EG approach with X1 and Y1 only. On the other hand, when both
$w_X$ and $w_Y$ are set to 0.5, the 2SG approach approximates the pooled SG approach by
weighting the data from tests taken first and second equally.
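
The weighting in the 2SG design function (formulas 7 and 8) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `design_function_2sg` is hypothetical, and the two small joint probability tables stand in for the pre-smoothed estimates from step 1, with rows indexing X scores and columns indexing Y scores.

```python
import numpy as np

def design_function_2sg(p12, p21, w_x, w_y):
    """Map the two joint distributions to marginal probabilities on T.

    p12[j, k]: probability of x_j on X taken first and y_k on Y taken second
    p21[j, k]: probability of x_j on X taken second and y_k on Y taken first
    w_x, w_y:  weights placed on the forms taken first
    """
    p12 = np.asarray(p12, dtype=float)
    p21 = np.asarray(p21, dtype=float)
    r = w_x * p12.sum(axis=1) + (1 - w_x) * p21.sum(axis=1)  # formula (7)
    s = w_y * p21.sum(axis=0) + (1 - w_y) * p12.sum(axis=0)  # formula (8)
    return r, s

# Hypothetical 2 x 2 joint probability tables, for illustration only.
p12 = [[0.1, 0.2], [0.3, 0.4]]
p21 = [[0.25, 0.25], [0.25, 0.25]]
r_first_only, s_first_only = design_function_2sg(p12, p21, w_x=1.0, w_y=1.0)
r_equal, s_equal = design_function_2sg(p12, p21, w_x=0.5, w_y=0.5)
```

With weights of (1, 1) the result uses only X1 and Y1; with weights of (0.5, 0.5) the tests taken first and second contribute equally.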

2.3.3 Step 3. Continuization

Livingston (1993) clearly explained this step. In all equipercentile equatings,
score x on Form X and score y on Form Y are equated in a population of test-takers if and
only if they have the same percentile rank in that population. In the real world of
educational testing, since the observed test scores are discrete, it is rare to find a
score on Form Y that has exactly the same percentile rank in the test-taker population
as score x on Form X. In order to do equipercentile equating, the discrete score
distribution has to be continuized. In the KE framework, this “continuization” of the
distribution is accomplished by replacing the frequency at each discrete score value
with a continuous frequency distribution centered at that value. In contrast, the
traditional equipercentile method uses linear interpolation to continuize discrete score
distributions.

By adding a continuous random variable $V$ distributed as $N(0, 1)$, the discrete
random variables X and Y are transformed into continuous variables $X(h_X)$ and
$Y(h_Y)$ in KE:

$$X(h_X) = a_X (X + h_X V) + (1 - a_X)\mu_X \tag{9}$$

$$Y(h_Y) = a_Y (Y + h_Y V) + (1 - a_Y)\mu_Y \tag{10}$$

In the above formulas, $h_X$ and $h_Y$ can be any positive numbers. They are the
bandwidths of the normal distributions that replace each discrete score; $\mu_X$ and
$\sigma_X^2$ denote the mean and variance of variable X over target population T,
$\mu_X = \sum_j x_j r_j$ and $\sigma_X^2 = \sum_j (x_j - \mu_X)^2 r_j$; and $a_X$ is an
adjusting constant defined by $a_X^2 = \sigma_X^2 / (\sigma_X^2 + h_X^2)$. The
corresponding quantities for Y are defined analogously.

Since variable $V$ has a continuous normal distribution, $X + h_X V$ is continuous, and
so is $X(h_X)$. It can be shown that the transformed continuous variables $X(h_X)$ and
$Y(h_Y)$ have the same means and standard deviations as the discrete variables X and Y
respectively.

The selection of $h_X$ (or $h_Y$) determines the equating method. The KE Optimal
(simply “KE” in Table 6) equating method selects $h_X$ (or $h_Y$) automatically by
minimizing the difference between the probability distributions of X (or Y) before and
after continuization, $\sum_j (\hat{r}_j - \hat{f}_{h_X}(x_j))^2$, where
$\hat{f}_{h_X}$ is the density of $X(h_X)$. The KE_Linear (linear) equating method, in
contrast, can be approximated by using a large “bandwidth” value, usually larger than
10 times the standard deviation of the observed score distribution.
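
The bandwidth selection described above can be sketched in Python as follows. The sketch evaluates the density of the Gaussian-kernel continuized variable and picks, from a grid of candidate values, the bandwidth that minimizes the sum of squared differences between the discrete score probabilities and the density at the score points. The function names, the grid search, and the example probabilities are assumptions made for illustration; operational KE software may use a different optimizer or an additional penalty term.

```python
import numpy as np

def kernel_density(x, scores, probs, h):
    """Density of X(h) = a(X + hV) + (1 - a)mu evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = probs @ scores
    var = probs @ (scores - mu) ** 2
    a = np.sqrt(var / (var + h ** 2))              # adjusting constant
    z = (x[:, None] - a * scores[None, :] - (1 - a) * mu) / (a * h)
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (phi @ probs) / (a * h)

def penalty(h, scores, probs):
    """Sum over score points of (r_j - f_h(x_j))**2."""
    return np.sum((probs - kernel_density(scores, scores, probs, h)) ** 2)

def select_bandwidth(scores, probs, grid):
    """Return the grid value with the smallest penalty, plus all penalties."""
    pens = np.array([penalty(h, scores, probs) for h in grid])
    return grid[np.argmin(pens)], pens

# Hypothetical score probabilities on a 0-10 scale, for illustration only.
scores = np.arange(11.0)
probs = np.array([1, 3, 6, 10, 14, 16, 14, 10, 6, 3, 1], dtype=float)
probs /= probs.sum()
grid = np.linspace(0.3, 3.0, 28)
h_opt, pens = select_bandwidth(scores, probs, grid)
```

A finer grid or a one-dimensional optimizer could replace the grid search; the grid is used here only to keep the sketch short.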

2.3.4 Step 4. Equating

KE defines the equating relationship as the equivalence between the continuized
cumulative distributions of $X(h_X)$ and $Y(h_Y)$. For example, the equating function
for equating X to Y on target population T is given by:

$$G_{h_Y}(y) = F_{h_X}(x) \;\Rightarrow\; \hat{y} = G_{h_Y}^{-1}(F_{h_X}(x)) \;\Rightarrow\; \hat{e}_Y(x) = G_{h_Y}^{-1}(F_{h_X}(x)) \tag{11}$$

where $F_{h_X}$ and $G_{h_Y}$ represent the cumulative distribution functions of
$X(h_X)$ and $Y(h_Y)$ respectively. The linear equating method is considered a special
case in the KE framework.
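
The equating function in formula (11) can be illustrated with a short Python sketch that evaluates the continuized cumulative distribution functions and inverts $G_{h_Y}$ numerically. The function names, the bisection search, the search bracket, and the example distributions are assumptions made for this illustration and are not taken from the KE software.

```python
from math import erf, sqrt
import numpy as np

_erf = np.vectorize(erf)

def kernel_cdf(x, scores, probs, h):
    """CDF of the Gaussian-kernel continuized score variable at a scalar x."""
    scores = np.asarray(scores, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mu = probs @ scores
    var = probs @ (scores - mu) ** 2
    a = np.sqrt(var / (var + h ** 2))
    z = (x - a * scores - (1 - a) * mu) / (a * h)
    return float(probs @ (0.5 * (1 + _erf(z / sqrt(2)))))

def kernel_equate(x, x_scores, r, h_x, y_scores, s, h_y, tol=1e-10):
    """Return e_Y(x) = G^{-1}(F(x)), inverting G by bisection."""
    target = kernel_cdf(x, x_scores, r, h_x)
    y_scores = np.asarray(y_scores, dtype=float)
    s = np.asarray(s, dtype=float)
    sd_y = np.sqrt(s @ (y_scores - s @ y_scores) ** 2)
    lo, hi = y_scores.min() - 20 * sd_y, y_scores.max() + 20 * sd_y
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, y_scores, s, h_y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical score distributions, for illustration only.
x_scores = np.arange(11.0)
r = np.array([1, 3, 6, 10, 14, 16, 14, 10, 6, 3, 1], dtype=float)
r /= r.sum()
y_scores = np.arange(13.0)
s = np.array([1, 2, 4, 7, 10, 13, 14, 12, 9, 6, 4, 2, 1], dtype=float)
s /= s.sum()
equated = kernel_equate(6.0, x_scores, r, 0.6, y_scores, s, 0.6)
```

Passing very large bandwidths for both forms makes both continuized distributions nearly normal, so the result approaches the linear equating function, which is the sense in which linear equating is a special case.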

2.3.5 Step 5. Calculating Standard Error of Equating (SEE) and Standard Error of
Equating Difference (SEED)

KE provides a formula for calculating SEE derived from the delta method (see
von Davier, Holland, and Thayer, 2004):

$$\mathrm{SEE}(\hat{e}_Y(x)) = \mathrm{SEE}(e_Y(x; \hat{r}, \hat{s})) = \sqrt{J_{e_Y}(\hat{r}, \hat{s})\, J_{DF}(\hat{R}, \hat{S})\, \Sigma_{\hat{R}\hat{S}}\, J_{DF}(\hat{R}, \hat{S})^{t}\, J_{e_Y}(\hat{r}, \hat{s})^{t}} \tag{12}$$

Here $\hat{R}$ and $\hat{S}$ are used as generic names over all the designs for the
population score probabilities of X and Y estimated by the log-linear pre-smoothing
model in step 1, like $\hat{p}_j$, $\hat{p}_k$, $\hat{p}_{(12)jk}$, and
$\hat{p}_{(21)jk}$. When sample size is large, $(\hat{R}, \hat{S})^{t}$ is
asymptotically normally distributed with mean $(R, S)^{t}$ and variance matrix
$\Sigma_{\hat{R}\hat{S}}$ with dimension $((JK + JK) \times (JK + JK))$; $\hat{r}$ and
$\hat{s}$ are the estimated population score probabilities of X and Y over target
population T; $\Sigma_{\hat{R}\hat{S}}$ is the covariance matrix of $\hat{R}$ and
$\hat{S}$. The estimated equating function is a composition of $e_Y$ and DF
($\hat{e}_Y(x) = e_Y(x; \hat{r}, \hat{s}) = G^{-1}(F(x))$); the design function (DF) is
a function of $R$ and $S$; $J_{e_Y}(\hat{r}, \hat{s})$ and $J_{DF}(\hat{R}, \hat{S})$
are Jacobian matrices (in formulas 13 and 14) related to the equating function and the
design function respectively. $J_{e_Y}(\hat{r}, \hat{s})$ is a $(1 \times (J + K))$ row
vector of the first derivatives of the estimated equating function with respect to each
of the estimated score probabilities $\hat{r}$ and $\hat{s}$ over target population T,
and $J_{DF}(\hat{R}, \hat{S})$ is a $((J + K) \times (JK + JK))$ matrix of the first
derivatives of the DF with respect to each of the output variables from the
pre-smoothing procedure:

$$J_{e_Y}(\hat{r}, \hat{s}) = \left( \frac{\partial e_Y}{\partial r},\; \frac{\partial e_Y}{\partial s} \right)_{(1 \times (J + K))} \tag{13}$$

$$J_{DF}(\hat{R}, \hat{S}) = \begin{pmatrix} \dfrac{\partial r}{\partial R} & \dfrac{\partial r}{\partial S} \\[2ex] \dfrac{\partial s}{\partial R} & \dfrac{\partial s}{\partial S} \end{pmatrix}_{((J + K) \times (JK + JK))} \tag{14}$$


Kernel Equating provides an analytic tool to calculate the standard error of equating.
This tool is the delta method (also known as the Taylor series method), a statistical
procedure widely used to estimate the variance or standard error of a function of
statistical estimates with known asymptotic distributions (Kolen & Brennan,
2004; von Davier, Holland, & Thayer, 2004).

In addition to calculating the conditional SEEs at each score point, KE also
provides the SEED statistic for calculating the standard error of the equating
difference between two KE functions at each score point. Von Davier, Holland, and Thayer
(2004) used SEED to decide whether the equating results of two KE methods are
significantly different from each other.
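
The general idea behind the delta method can be illustrated with a small Python sketch: the standard error of a function of estimated probabilities is approximated by combining the Jacobian of the function with the covariance matrix of the estimates. This sketch is not the KE formula itself. It uses a numerical Jacobian in place of the analytic Jacobians of formulas (13) and (14), and the function names, the multinomial covariance matrix, and the example (the standard error of a mean score) are assumptions chosen so the answer can be checked by hand.

```python
import numpy as np

def numeric_jacobian(func, p, eps=1e-6):
    """Central-difference Jacobian of func at p (rows: outputs, columns: inputs)."""
    p = np.asarray(p, dtype=float)
    f0 = np.atleast_1d(func(p))
    jac = np.zeros((f0.size, p.size))
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        jac[:, i] = (np.atleast_1d(func(p + step)) - np.atleast_1d(func(p - step))) / (2 * eps)
    return jac

def delta_method_se(func, p_hat, cov):
    """Standard errors of func(p_hat): sqrt of the diagonal of J cov J^t."""
    jac = numeric_jacobian(func, p_hat)
    return np.sqrt(np.diag(jac @ cov @ jac.T))

# Hypothetical example: standard error of the mean score from multinomial proportions.
scores = np.arange(5.0)
p_hat = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
n_examinees = 200
cov = (np.diag(p_hat) - np.outer(p_hat, p_hat)) / n_examinees
se_mean = delta_method_se(lambda p: scores @ p, p_hat, cov)
```

For this linear function the delta method is exact, so `se_mean` equals the familiar standard error of a mean, the score standard deviation divided by the square root of the sample size.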


2.4 Equating Error

Equating error reflects the difference between the equated scores estimated from
the sample and the equated scores in the population. It has two sources: random
equating error and systematic equating error. Random equating error is the
error due to sampling. Systematic equating error arises if 1) the equating design is
inappropriately executed; 2) the statistical assumptions of an equating method are
violated; or 3) the equating procedure is inappropriately implemented, for example, by
applying IRT equating to a multidimensional test. By definition, the magnitude of the
random equating error depends closely on the sample size, while the systematic equating
error does not depend on the number of examinees in the equating (Kolen & Brennan,
2004).

2.5 Evaluating the Results of Equating

After equating is conducted, the results of equating can be evaluated with several
criteria. According to Harris and Crouse (1993) and other evaluation studies of KE, the
evaluation criteria for equating results include:

1) Standard error of equating conditional on scores;

2) Root Mean Squared Deviation (RMSD) index and “average equating
error” index (Klein & Jarjoura, 1985; Livingston, Dorans, & Wright,
1990) for evaluating overall equating accuracy;

3) Conditional equating bias and “average equating bias” (Livingston, 1993);

4) Root Mean Square Error (RMSE) for overall adequacy of equating (Mao,
von Davier, & Rupp, 2005);

5) Standard Error of Equating Difference calculated under the KE framework
(von Davier, Holland, & Thayer, 2004).

2.5.1 Standard Error of Equating

The Standard Error of Equating (SEE) is useful in indicating the amount of
random error in equating that is due to the sampling of examinees. There are two ways of
calculating SEEs: analytic methods, and computational methods such as the bootstrap
resampling method or other empirical methods. The delta method is an analytic method
relying on asymptotic statistical assumptions. It uses the normal distribution to
approximate the probability distribution of a statistical estimator. The assumption of
asymptotic normality holds only when sample size is relatively large. When sample size
is small, the delta method will not be accurate unless a strong normality assumption
holds for the population.

Using real data from a common item nonequivalent group design, Hanson, Zeng,
and Kolen (1993) compared the delta method standard errors of equating with the
bootstrap standard errors of equating for Levine observed score and true score linear
equating. The sample size was over 700. The results of their study indicate that,
compared to the bootstrap SEE, the random equating errors for scores at the higher end
were overestimated by the delta method with a normality assumption, while the random
equating errors for scores at the lower end were underestimated. Lu and Kolen (1994)
used the delta method and the bootstrap method to estimate SEEs of Tucker linear
equating for a common item nonequivalent group design. They compared the differences
between standard errors derived from the delta method and the bootstrap method given
different sample sizes and different numbers of bootstrap replications. They found
that the difference between standard errors calculated by the delta method and the
bootstrap method becomes larger as sample size decreases and as the number of bootstrap
replications decreases.

The bootstrap method refers to the resampling procedure of repeatedly selecting random
samples with replacement from a given sample of size N. The theoretical framework
for the bootstrap method and its applications were described in
Efron (1982), Efron and Tibshirani (1993), and Kolen and Brennan (2004). Suppose that in
a random equivalent group design, two groups of examinees of size $n_1$ and $n_2$ took
test forms X and Y respectively, and Form Y is equated to Form X using equating method B.
Then a typical bootstrap method has the following steps: 1) Draw a random bootstrap
sample of size $n_1$ with replacement from the group of examinees taking test form X
(size = $n_1$); 2) Draw a random bootstrap sample of size $n_2$ with replacement from
the group of examinees taking test form Y (size = $n_2$); 3) Conduct equating on the
random bootstrap samples and obtain an equating function; 4) Repeat step 1 through step
3 a large number of times and equate Y to X every time; 5) The equating results at each
score point form a distribution; calculate the standard deviation of the equating
results at each score point. The result is called the estimated bootstrap standard error
of equating conditional on every score point. The bootstrap standard error of this
equating procedure conditional on each score level is then:

 

$$\mathrm{SEE}(y_k) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( e_X^{(i)}(y_k) - \bar{e}_X(y_k) \right)^2} \tag{15}$$

where $n$ is the total number of replications; $y_k$ represents the $k$th score on
Form Y; $e_X^{(i)}(y_k)$ is the equated score on Form X corresponding to score $y_k$ in
the $i$th replication; and $\bar{e}_X(y_k)$ is the mean of equated scores at score
$y_k$ over the $n$ replications. Parshall, Houghton, and Kromrey
(1995) used the bootstrap standard error of equating and statistical bias in equating to
study the adequacy of equating. Their results indicate that as sample size decreased,
equating bias remained stable but the bootstrap SEE increased substantially. Therefore,
they argued for using the bootstrap method instead of the delta method to calculate SEE
for small samples (Tsai, 1995).
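
The five bootstrap steps described above can be sketched in Python as follows. For brevity the sketch uses a simple mean-sigma linear equating as the equating method; this is a stand-in chosen for illustration, not one of the 22 procedures studied here. The function names, the simulated scores, and the number of replications are hypothetical, and the standard deviation uses a divisor of $n$, which is an assumption about the divisor in formula (15).

```python
import numpy as np

def linear_equate_y_to_x(y_points, x_sample, y_sample):
    """Mean-sigma linear equating of Form Y score points onto the Form X scale."""
    slope = np.std(x_sample) / np.std(y_sample)
    return np.mean(x_sample) + slope * (y_points - np.mean(y_sample))

def bootstrap_see(x_sample, y_sample, y_points, n_reps=500, seed=0):
    """Conditional bootstrap standard error of equating at each Form Y score point."""
    rng = np.random.default_rng(seed)
    x_sample = np.asarray(x_sample, dtype=float)
    y_sample = np.asarray(y_sample, dtype=float)
    y_points = np.asarray(y_points, dtype=float)
    reps = np.empty((n_reps, y_points.size))
    for i in range(n_reps):
        # Steps 1 and 2: resample each group with replacement at its own size.
        xb = rng.choice(x_sample, size=x_sample.size, replace=True)
        yb = rng.choice(y_sample, size=y_sample.size, replace=True)
        # Step 3: equate Y to X on the bootstrap samples.
        reps[i] = linear_equate_y_to_x(y_points, xb, yb)
    # Step 5: standard deviation over replications at each score point.
    return reps.std(axis=0)

# Simulated scores on a 0-40 scale, for illustration only.
data_rng = np.random.default_rng(1)
x_obs = data_rng.binomial(40, 0.60, size=200)
y_obs = data_rng.binomial(40, 0.55, size=200)
see = bootstrap_see(x_obs, y_obs, np.arange(41))
```

Fixing the seed makes the resampling reproducible; in practice the number of replications would be chosen large enough for the standard errors to stabilize.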

Livingston (1993a) compared the standard errors of kernel equating methods with
traditional equipercentile methods using a common item nonequivalent group design. He
calculated the random standard error of equating using an empirical method different
from the typical bootstrap method. He selected 50 small random samples of size n without
replacement from a large population dataset of size N. He then obtained equating results
for each of the 50 small samples. The standard deviation of the 50 equated scores around
the population criterion equating result at each raw score point is regarded as the
conditional standard error of equating at that score point. That is, instead of using
the mean of the 50 equated scores for each raw score point ($\bar{e}_X(y_k)$ in formula
15), he used the equated score on the population criterion.

The simulation study in this dissertation follows the same procedure as described
in Livingston (1993) to calculate the empirical standard error of equating. The
bootstrap method was applied to the real datasets to calculate the standard error of
equating.

2.5.2 Root Mean Squared Deviation (RMSD)

The root mean squared deviation (RMSD) is a measure of the overall equating
accuracy (Livingston, Dorans, & Wright, 1990; Livingston, 1993; Schmitt, Cook, Dorans,
& Eignor, 1990). It can be calculated by:

$$\mathrm{RMSD} = \sqrt{\frac{\sum_k n_{y_k} \left( \hat{x}_{y_k} - x_{y_k} \right)^2}{\sum_k n_{y_k}}} \tag{16}$$

where $x_{y_k}$ is the equated score on Form X corresponding to score $y_k$ using the
criterion equating method; $\hat{x}_{y_k}$ is the equated score on test form X
corresponding to score $y_k$ using other equating methods; and $n_{y_k}$ is the number
of observations at each score level of test Y. The RMSD is basically an average of the
conditional random equating errors. An alternative summary statistic is the average
equating error, which is simply the average of the conditional standard errors of
equating over all the score points on test Form Y (Klein & Jarjoura, 1985).

2. 5.3 Equating Bias

Equating bias is useful in indicating systematic error in equating. In equating
practice, equating bias is often estimated by comparing equating results with an
arbitrarily selected sound criterion. Generally, results from equipercentile equating are a
good candidate for such a criterion. Yen (1985) suggested using the results from
equipercentile equating as a criterion because they are as accurate as the IRT-based equating
results. Livingston (1993a and 1993b) used the equipercentile equating results for a very
large sample as a baseline criterion. Alternatively, the true equating relationship can be
found from simulated data. In simulation studies, the population equating relationship is
known and can be used as a comparison criterion for calculating equating bias, but
the degree to which the simulated data can represent real data is questionable.

Using the same notation defined above, the equating bias conditional on each score
level can be calculated by:

$$\hat{x}_{y_k} - x_{y_k} \qquad (17)$$

The overall bias of equating can be calculated by:

$$\frac{\sum_k n_{y_k}\,(\hat{x}_{y_k} - x_{y_k})}{\sum_k n_{y_k}} \qquad (18)$$
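The two bias formulas can be sketched as follows, assuming the same list-based inputs as a simple illustration. The function names and the example values are hypothetical.

```python
def conditional_bias(criterion, estimated):
    """Bias at each score level, as in formula (17): estimate minus criterion."""
    return [xh - x for x, xh in zip(criterion, estimated)]

def overall_bias(criterion, estimated, counts):
    """Frequency-weighted average of the conditional biases, as in formula (18)."""
    weighted = sum(n * (xh - x)
                   for x, xh, n in zip(criterion, estimated, counts))
    return weighted / sum(counts)

# Hypothetical equated scores at three score levels of Form Y.
criterion = [10.0, 20.0, 30.0]
estimated = [11.0, 20.0, 28.0]
counts = [1, 2, 1]
bias_by_level = conditional_bias(criterion, estimated)
bias_overall = overall_bias(criterion, estimated, counts)
print(bias_by_level, bias_overall)
```

Unlike the RMSD, positive and negative conditional biases can cancel in the overall index, so a value near zero does not by itself imply small bias at every score level.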

2.5.4 Root Mean Square Error

As described above, SEE and RMSD reflect random equating error and
systematic equating error, respectively. Tsai (1995) and Mao, von Davier, and Rupp
(2005) adopted the Root Mean Square Error (RMSE) index. Tsai (1995) explained why
this statistic takes into account the random equating error and systematic equating error
simultaneously.

$$\mathrm{RMSE} = \sqrt{(\bar{d})^2 + (sd_d)^2} \qquad (19)$$

where $\bar{d}$ is the mean of the equating differences at each score level, and $sd_d$ is the
standard deviation of the equating differences between two methods. It reflects how
biased and how accurate the equating results are compared to an equating criterion.
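A minimal sketch of formula (19) follows. It assumes the population form of the standard deviation (divisor equal to the number of score levels), which the text does not specify; under that assumption the index equals the square root of the mean squared difference. The function name and example values are hypothetical.

```python
import math
import statistics

def rmse(differences):
    """RMSE as in formula (19): sqrt(mean(d)^2 + sd(d)^2).

    Assumption: sd is the population standard deviation (pstdev); the
    source does not say which divisor was used.
    """
    d_bar = statistics.fmean(differences)
    sd_d = statistics.pstdev(differences)
    return math.sqrt(d_bar ** 2 + sd_d ** 2)

# Hypothetical equating differences (method minus criterion) at three score levels.
differences = [1.0, 0.0, -2.0]
example_rmse = rmse(differences)
print(round(example_rmse, 4))
```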

2.5.5 Standard Error of Equating Difference

SEED calculated in KE can be used to determine whether the equating difference
between two KE methods is significant or not. Von Davier, Holland, and Thayer (2004)
used SEED to decide whether equating bias in a CB design is significantly large. When equating
using a CB design, the equating function of the 2SG approach with weights of (1, 1) is
unbiased since the data from tests taken first are not affected by order effects. If a 2SG
method with certain weights is compared with the unbiased 2SG(1, 1) method, and their
equating difference falls within the range of ±2SEED, then the equating bias of this 2SG
method is small enough to be neglected. The standard error of equating then becomes the
only statistic to compare when selecting an equating function.
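The ±2SEED decision rule can be sketched as a simple band check. This is only an illustration of the comparison step: the SEED values themselves come from the KE computations and are taken as given here, and the function name and numbers are hypothetical.

```python
def within_two_seed(differences, seeds):
    """Flag, per score point, whether the equating difference lies inside ±2SEED."""
    return [abs(diff) <= 2.0 * seed for diff, seed in zip(differences, seeds)]

# Hypothetical equating differences and SEED values at four score points.
differences = [0.4, -1.1, 2.5, 0.0]
seeds = [0.5, 0.6, 1.0, 0.3]
flags = within_two_seed(differences, seeds)
negligible_bias = all(flags)
print(flags, negligible_bias)
```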

TABLE 5. KE methods and corresponding traditional equating methods

KE method                        Corresponding traditional method
2SG(.5, .5) KE linear            Traditional SG linear equating
2SG(1, 1) KE linear              Traditional EG linear equating
2SG(.5, .5) KE equipercentile    Traditional SG equipercentile equating
2SG(1, 1) KE equipercentile      Traditional EG equipercentile equating
2SG with other weights           Not available

 


 

TABLE 6. All equating methods compared in this study for simulated data

Design      Type            Equating       Explanation
2SG design  Linear          2SG(.5,.5)     Log-linear smoothing; treat data as two independent groups; using weights of (.5,.5) for X and Y
                            2SG(.5,.75)    Log-linear smoothing; treat data as two independent groups; using weights of (.5,.75) for X and Y
                            2SG(.6,.5)     Log-linear smoothing; treat data as two independent groups; using weights of (.6,.5) for X and Y
                            2SG(.6,.6)     Log-linear smoothing; treat data as two independent groups; using weights of (.6,.6) for X and Y
                            2SG(.75,.5)    Log-linear smoothing; treat data as two independent groups; using weights of (.75,.5) for X and Y
                            2SG(.75,.75)   Log-linear smoothing; treat data as two independent groups; using weights of (.75,.75) for X and Y
                            2SG(.9,.5)     Log-linear smoothing; treat data as two independent groups; using weights of (.9,.5) for X and Y
                            2SG(.9,.9)     Log-linear smoothing; treat data as two independent groups; using weights of (.9,.9) for X and Y
                            2SG(1,1)       Log-linear smoothing; treat data as two independent groups; using weights of (1,1) for X and Y
2SG design  Equipercentile  2SG(.5,.5)     Log-linear smoothing; treat data as two independent groups; using weights of (.5,.5) for X and Y
                            2SG(.5,.75)    Log-linear smoothing; treat data as two independent groups; using weights of (.5,.75) for X and Y
                            2SG(.6,.5)     Log-linear smoothing; treat data as two independent groups; using weights of (.6,.5) for X and Y
                            2SG(.6,.6)     Log-linear smoothing; treat data as two independent groups; using weights of (.6,.6) for X and Y
                            2SG(.75,.5)    Log-linear smoothing; treat data as two independent groups; using weights of (.75,.5) for X and Y
                            2SG(.75,.75)   Log-linear smoothing; treat data as two independent groups; using weights of (.75,.75) for X and Y
                            2SG(.9,.5)     Log-linear smoothing; treat data as two independent groups; using weights of (.9,.5) for X and Y
                            2SG(.9,.9)     Log-linear smoothing; treat data as two independent groups; using weights of (.9,.9) for X and Y
                            2SG(1,1)       Log-linear smoothing; treat data as two independent groups; using weights of (1,1) for X and Y
SG design                   SG_Lin         Linear interpolation; traditional linear equating
                            SG_Equi        Linear interpolation; traditional equipercentile equating
EG design                   EG Linear      Linear interpolation for continuization; traditional linear equating
                            EG Equi        Linear interpolation for continuization; traditional equipercentile equating

 

Among these methods, the EG linear, EG equipercentile, SG linear, and SG
equipercentile equating methods are the corresponding traditional equating methods for
the 2SG(1, 1) KE linear, 2SG(1, 1) KE equipercentile, SG KE linear, and SG KE
equipercentile methods, respectively.
Chapter III: Methods

3.1 Quantification of Differential Order Effect
This study draws on the definition of DOE as $(\bar{X}_1 - \bar{Y}_1) - (\bar{X}_2 - \bar{Y}_2)$ (Kolen and Brennan, 2004)
to further introduce hypothesis testing and effect size for estimating order effects in a
CB design.
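The following is a derivation of a hypothesis test for the statistical significance
of DOE:

$$\begin{aligned}
\mathrm{DOE} &= (\hat{\mu}_{X_1} - \hat{\mu}_{Y_1}) - (\hat{\mu}_{X_2} - \hat{\mu}_{Y_2}) = (\hat{\mu}_{X_1} + \hat{\mu}_{Y_2}) - (\hat{\mu}_{X_2} + \hat{\mu}_{Y_1}) \\
&= \left(\frac{\sum X_1}{N_1} + \frac{\sum Y_2}{N_1}\right) - \left(\frac{\sum X_2}{N_2} + \frac{\sum Y_1}{N_2}\right) \\
&= \frac{\sum (X_1 + Y_2)}{N_1} - \frac{\sum (X_2 + Y_1)}{N_2} = \hat{\mu}_{(X_1+Y_2)} - \hat{\mu}_{(X_2+Y_1)} \qquad (20)
\end{aligned}$$

where $\hat{\mu}_{(X_1+Y_2)}$ is the average sum score of $X_1$ and $Y_2$ for sample 1; $\hat{\mu}_{(X_2+Y_1)}$ is the
average sum score of $X_2$ and $Y_1$ for sample 2; $N_1$ is the number of examinees in sample
1; and $N_2$ is the number of examinees in sample 2.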

Therefore, the hypothesis test for the significance of DOE is actually
equivalent to a two independent sample t-test for the mean difference of Sum12 and
Sum21. The null hypothesis for DOE becomes $H_0: \mu_{(X_1+Y_2)} - \mu_{(X_2+Y_1)} = 0$,
and the t test is:

$$t = \frac{\mathrm{DOE}}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad (21)$$

where $s_p$ is the square root of the pooled variance of the two sum scores,

$$s_p = \sqrt{\frac{(n_1 - 1)\,s^2_{(X_1+Y_2)} + (n_2 - 1)\,s^2_{(Y_1+X_2)}}{n_1 + n_2 - 2}} \qquad (22)$$

The statistical significance of DOE, however, relies heavily on sample sizes. To
avoid the influence of sample size on the quantification of differential order effects, the
effect sizes of DOE can be calculated:

$$\text{Effect size} = \frac{\mathrm{Mean}_{(X_1+Y_2)} - \mathrm{Mean}_{(Y_1+X_2)}}{s_p} \qquad (23)$$
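The quantities in formulas (20) to (23) can be sketched from summary statistics alone. The function name `doe_test` is hypothetical; the example call uses the Sum12 and Sum21 means, standard deviations, and sample sizes reported for real data 1 in Table 7, so small rounding differences from results computed on the raw data are to be expected.

```python
import math

def doe_test(mean12, sd12, n1, mean21, sd21, n2):
    """Return (DOE, t statistic, effect size) from sum-score summaries.

    mean12, sd12, n1 : mean, SD, and size for Sum12 = X1 + Y2 (sample 1)
    mean21, sd21, n2 : mean, SD, and size for Sum21 = X2 + Y1 (sample 2)
    """
    doe = mean12 - mean21                                   # formula (20)
    pooled_var = ((n1 - 1) * sd12 ** 2 + (n2 - 1) * sd21 ** 2) / (n1 + n2 - 2)
    s_p = math.sqrt(pooled_var)                             # formula (22)
    t = doe / (s_p * math.sqrt(1.0 / n1 + 1.0 / n2))        # formula (21)
    effect_size = doe / s_p                                 # formula (23)
    return doe, t, effect_size

# Summary statistics for Sum12 and Sum21 as reported in Table 7.
doe, t, effect_size = doe_test(104.07, 22.72, 143, 102.04, 25.23, 140)
print(round(doe, 2), round(t, 2), round(effect_size, 2))
```

The t statistic has n1 + n2 - 2 degrees of freedom, and the effect size, unlike t, does not grow with sample size.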

3.2 Data

This study uses 2 real datasets and 6 simulated datasets with CB designs. The six

simulated datasets are generated in a systematic way with different sizes of DOE.

3.2.1 Real Data

Real data 1: Von Davier, Holland, and Thayer (2004) provided a real dataset from
a small field study of an international testing program. In their dataset, both test forms X
and Y are number-right scored. They have 75 items and 76 items respectively, and their
correlation is r(X1, Y2) = r(X2, Y1) = 0.88.

 

 

TABLE 7. Summary statistics for real data 1

        X1      Y2      X2      Y1      X       Y       Sum12    Sum21
N       143     143     140     140     283     283     143      140
Mean    52.65   51.42   50.64   51.39   51.66   51.41   104.07   102.04
SD      12.41   11.03   13.83   12.18   13.15   11.59   22.72    25.23
Skew    -0.52   -0.37   -0.54   -0.58   -0.55   -0.49   -0.45    -0.57
Kurt    -0.15   -0.64   -0.82   -0.52   -0.50   -0.55   -0.40    -0.67
Min     16      27      19      18      16      18      45       45
Max     74      71      72      71      74      71      142      142

*X and Y are scores for combined groups; Sum12 is the sum of scores on test X1 and Y2 for the first group;
Sum21 is the sum of scores on test X2 and Y1 for the second group.

The differential order effect in this dataset is DOE = $(\bar{X}_1 - \bar{Y}_1) - (\bar{X}_2 - \bar{Y}_2)$ =
2.03, which has an effect size of approximately 0.08. The t-test is not significant.

Real data 2: The second real dataset was collected using a CB design for an algebra
test. Each of the equating forms has 25 multiple-choice items. Group one has 399
students, who took Form X first and Form Y second, and Group two has 362 students,
who took Form Y first and Form X second. Both test forms X and Y are number-right
scored, and their total score correlations are r(X1, Y2) = 0.64 and r(X2, Y1) = 0.74,
respectively.

TABLE 8. Summary statistics for real data 2

        X1      Y2      Y1      X2      X       Y       Sum12    Sum21
N       399     399     362     362     761     761     399      362
Mean    13.04   13.00   12.14   11.84   12.47   12.59   26.04    23.98
SD      3.94    4.35    4.15    4.66    4.33    4.27    7.50     8.22
Skew    -0.22   -0.25   0.25    0.22    -0.03   -0.01   -0.07    0.37
Kurt    0.21    0.40    -0.34   -0.15   -0.06   -0.02   0.19     -0.28
Min     0       0       2       0       0       0       0        4
Max     23      25      23      25      23      25      48       48

*X and Y are scores for combined groups; Sum12 is the sum of scores on test X1 and Y2 for the first group;
Sum21 is the sum of scores on test X2 and Y1 for the second group.

The differential order effect in this dataset is 2.06, which has an effect size of 0.26

approximately.


3.2.2 Simulated Data

In compliance with Davey, Nering, and Thompson’s (1997) purpose of simulating
realistic item response data, this study made an effort to generate data as close as possible
to the ﬁrst real data described earlier. The reason for selecting real data 1 as a target is
that the two test forms in this dataset have equal test-retest reliabilities, which is an
important assumption for linear and equipercentile equating. There are 75 items on each
simulated test form.

Six population datasets were simulated with different sizes of order effects using a
three-parameter logistic Item Response Theory model (3PL IRT model). In Lord (1980), a
3PL IRT model takes the form below:

$$P(\theta) = c + \frac{1 - c}{1 + e^{-1.7a(\theta - b)}} \qquad (24)$$

where $\theta$ is the underlying ability to be measured, a is the item discrimination
parameter, b is item difficulty, and c is the item guessing parameter indicating the
probability that a person completely lacking in ability will answer the item correctly.
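Formula (24) translates directly into a small function. The name `p_3pl` and the example parameter values are illustrative only.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model, formula (24)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Illustrative item: discrimination 1, difficulty -0.3, guessing 0.25.
p_at_difficulty = p_3pl(-0.3, 1.0, -0.3, 0.25)   # ability equal to difficulty
p_very_low = p_3pl(-50.0, 1.0, -0.3, 0.25)       # ability far below difficulty
print(p_at_difficulty, p_very_low)
```

When ability equals the item difficulty, the probability is halfway between c and 1; as ability decreases, it approaches the guessing parameter c.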

Each of the six simulated datasets has two samples, each with a size of 100,000.
Each sample takes two tests, X and Y, but in a different order. A 75 by 100,000 item-person
response matrix with 0 and 1 scores was generated for each sample using the 3PL IRT
model. The scores on each item were then totaled to get an observed test score for each
examinee. After the simulation of data for two independent groups taking two test forms
in different order, data from the two independent samples were simply combined
to form the dataset with a pooled SG design. The designs are laid out below:

For a CB design:   sample 1: (X1, Y2)
                   sample 2: (X2, Y1)

For a SG design:   pooled sample:  [ X1  Y2 ]
                                   [ X2  Y1 ]

However, one drawback of using real data 1 is its lack of item response data.
Without the item response block, it is more difﬁcult to estimate the item parameters of
the real test items and use the estimated parameters for simulation. In this simulation, the
parameter distributions were decided based on empirical experience.

To ensure that the generated item discrimination parameter a and item guessing
level c are positive, the a parameters were randomly selected from a log-normal distribution,
and the c parameters were randomly selected from a beta distribution. Furthermore, in order
to make the simulated data more realistic, the means and variances of the distributions of
parameters a, b, and c were set to values chosen to best emulate the first real data
set used in this study. Specifically, the mean and variance for the log-normal distribution
of parameter a were fixed at 1 and 0.12; the mean and variance for the normal distribution
of parameter b were fixed at -0.3 and 0.8; and the mean and variance for the beta
distribution of parameter c were fixed at 0.25 and 0.008.
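One way to draw item parameters with the stated means and variances is moment matching, sketched below. This is an assumption about how the targets map to distribution parameters, not a description of the software used in the study: in particular, the sketch interprets 1 and 0.12 as the mean and variance of a itself rather than of log a. All function names are hypothetical.

```python
import math
import random

def lognormal_params(mean, var):
    """Return (mu, sigma) of log(a) so that a has the given mean and variance."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def beta_params(mean, var):
    """Return (alpha, beta) of a beta distribution with the given mean and variance."""
    k = mean * (1.0 - mean) / var - 1.0
    return mean * k, (1.0 - mean) * k

def draw_item_parameters(n_items, rng):
    """Draw (a, b, c) triples using the means and variances stated in the text."""
    mu, sigma = lognormal_params(1.0, 0.12)      # a: log-normal, assumed moments of a
    alpha, beta = beta_params(0.25, 0.008)       # c: beta
    items = []
    for _ in range(n_items):
        a = rng.lognormvariate(mu, sigma)
        b = rng.gauss(-0.3, math.sqrt(0.8))      # b: normal with variance 0.8
        c = rng.betavariate(alpha, beta)
        items.append((a, b, c))
    return items

items = draw_item_parameters(75, random.Random(0))
print(len(items))
```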

Order effects were considered as a second dimension of examinees' underlying
abilities when taking the second test, and the size of order effects varies across examinees.
Assuming that the changes in examinees' performances reflect the changes in their
underlying abilities, then

$$\theta_{12k} = \theta_{11k} + o_{1k} \quad \text{(sample 1)}; \qquad (25)$$

$$\theta_{22k} = \theta_{21k} + o_{2k} \quad \text{(sample 2)}; \qquad (26)$$

where k indexes the examinees;

$\theta_{11k}$ denotes the underlying abilities of examinees in sample 1 taking the first test
(X1);

$\theta_{12k}$ denotes the abilities of examinees in sample 1 taking the second test (Y2);

$o_{1k}$ denotes the order effects of examinees in sample 1 taking test X first and Y
second;

$\theta_{21k}$ denotes the underlying abilities of examinees in sample 2 taking the first test
(Y1);

$\theta_{22k}$ denotes the abilities of examinees in sample 2 taking the second test (X2);

$o_{2k}$ denotes the order effects of examinees in sample 2 taking test Y first and X
second.

It was assumed that $\theta_{11k}$ and $\theta_{12k}$ (or $\theta_{21k}$ and $\theta_{22k}$) follow a bivariate
normal distribution with the same standard deviations. The correlation between $\theta_{11k}$ and
$\theta_{12k}$ (or $\theta_{21k}$ and $\theta_{22k}$) may not be perfect since order effects are not constant across
examinees. It was set to 0.94 in this study in order to achieve a correlation of observed
scores of 0.88. $o_{1k}$ and $o_{2k}$ both have variances of (1-0.94)2. Once all the parameters a,
b, c, and $\theta$ were randomly selected, the probability of each examinee at a
given $\theta$ level answering each item correctly was calculated from the 3PL IRT model. If the probability
of a correct response is greater than a random number from a uniform distribution, the
item response for a person on a specific item is 1; otherwise it is 0.
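The response-generation step described above can be sketched as follows. The sketch assumes unit ability variances and a uniform distribution on (0, 1), and it draws the two correlated abilities from two independent standard normal variates; the function names and the small item list are hypothetical.

```python
import math
import random

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response, formula (24)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def draw_abilities(mean_first, mean_second, rho, rng):
    """Draw one examinee's abilities on the first and second test.

    Both abilities have unit variance (assumed) and correlation rho.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    theta_first = mean_first + z1
    theta_second = mean_second + rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
    return theta_first, theta_second

def simulate_score(theta, items, rng):
    """Number-right score: an item is scored 1 when P(correct) exceeds a uniform draw."""
    return sum(1 for a, b, c in items if p_3pl(theta, a, b, c) > rng.random())

rng = random.Random(42)
items = [(1.0, -0.3, 0.25)] * 75          # hypothetical identical items
theta_first, theta_second = draw_abilities(0.0, 0.01, 0.94, rng)
score_first = simulate_score(theta_first, items, rng)
score_second = simulate_score(theta_second, items, rng)
print(score_first, score_second)
```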

In this study, the effect sizes of differential order effects were controlled to
range from 0 to 0.2 in the simulated datasets. In order to meet this restriction and
make the simulated data as realistic as possible, different means for the distributions of $\theta_{11}$ and
$\theta_{12}$ (or $\theta_{21}$ and $\theta_{22}$) were tried, and DOEs were calculated afterwards, until the order
effects were within the range and the simulated test scores shared similar descriptive
statistics with the test scores in the first real dataset. The distributions and descriptive statistics
of the six simulated datasets are provided below. As shown in Table 9 to Table 14, the
simulated data have distribution moments similar to those of the first real dataset.

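Simulated data 1 with insignificant order effects (DOE = -0.04)

• Sample 1 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{11}} = 0,\ \mu_{\theta_{12}} = 0.01),\ \begin{bmatrix} \sigma^2_{\theta_{11}} = 1 & \sigma_{\theta_{11}\theta_{12}} = .94 \\ \sigma_{\theta_{11}\theta_{12}} = .94 & \sigma^2_{\theta_{12}} = 1 \end{bmatrix} \right)$$

• Sample 2 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{21}} = 0,\ \mu_{\theta_{22}} = 0.01),\ \begin{bmatrix} \sigma^2_{\theta_{21}} = 1 & \sigma_{\theta_{21}\theta_{22}} = .94 \\ \sigma_{\theta_{21}\theta_{22}} = .94 & \sigma^2_{\theta_{22}} = 1 \end{bmatrix} \right)$$

$a \sim (\mu_a = 1,\ \sigma^2_a = 0.12)$; $b \sim (\mu_b = -0.3,\ \sigma^2_b = 0.8)$; $c \sim (\mu_c = 0.25,\ \sigma^2_c = 0.008)$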

 

TABLE 9. Descriptive statistics for simulated data 1

Test   Min.   Max.   Mean    Std     Skewness   Kurtosis
X1     10     75     52.52   13.78   -0.46      -0.67
Y2     9      75     50.50   13.57   -0.32      -0.81
X2     10     75     50.51   13.59   -0.31      -0.81
Y1     8      75     52.55   13.80   -0.45      -0.68

r(X1, Y2) = r(Y1, X2) ≈ 0.88

[Figure: score distributions for simulated data 1 (score scale 0 to 75). Panels: X1 (skewness = -0.46), Y2 (skewness = -0.32), Y1 (skewness = -0.45), X2 (skewness = -0.31).]

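Simulated data 2 with significant order effects (DOE = -0.58, effect size of DOE = 0.02)

• Sample 1 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{11}} = 0,\ \mu_{\theta_{12}} = -0.025),\ \begin{bmatrix} \sigma^2_{\theta_{11}} = 1 & \sigma_{\theta_{11}\theta_{12}} = .94 \\ \sigma_{\theta_{11}\theta_{12}} = .94 & \sigma^2_{\theta_{12}} = 1 \end{bmatrix} \right)$$

• Sample 2 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{21}} = 0,\ \mu_{\theta_{22}} = 0.025),\ \begin{bmatrix} \sigma^2_{\theta_{21}} = 1 & \sigma_{\theta_{21}\theta_{22}} = .94 \\ \sigma_{\theta_{21}\theta_{22}} = .94 & \sigma^2_{\theta_{22}} = 1 \end{bmatrix} \right)$$

$a \sim (\mu_a = 1,\ \sigma^2_a = 0.12)$; $b \sim (\mu_b = -0.3,\ \sigma^2_b = 0.8)$; $c \sim (\mu_c = 0.25,\ \sigma^2_c = 0.008)$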

TABLE 10. Descriptive statistics for simulated data 2

Test   Min.   Max.   Mean    Std     Skewness   Kurtosis
X1     9      75     52.01   13.71   -0.43      -0.71
Y2     10     75     50.54   14.01   -0.27      -0.89
X2     11     75     51.15   13.90   -0.30      -0.87
Y1     10     75     51.98   13.66   -0.41      -0.73

r(X1, Y2) = r(Y1, X2) ≈ 0.88

[Figure: score distributions for simulated data 2 (score scale 0 to 75). Panels: X1 (skewness = -0.43), Y2 (skewness = -0.27), Y1 (skewness = -0.41), X2 (skewness = -0.30).]


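Simulated data 3 with significant order effects (DOE = 1.41, effect size of DOE = 0.05)

• Sample 1 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{11}} = 0,\ \mu_{\theta_{12}} = 0.05),\ \begin{bmatrix} \sigma^2_{\theta_{11}} = 1 & \sigma_{\theta_{11}\theta_{12}} = .94 \\ \sigma_{\theta_{11}\theta_{12}} = .94 & \sigma^2_{\theta_{12}} = 1 \end{bmatrix} \right)$$

• Sample 2 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{21}} = 0,\ \mu_{\theta_{22}} = -0.05),\ \begin{bmatrix} \sigma^2_{\theta_{21}} = 1 & \sigma_{\theta_{21}\theta_{22}} = .94 \\ \sigma_{\theta_{21}\theta_{22}} = .94 & \sigma^2_{\theta_{22}} = 1 \end{bmatrix} \right)$$

$a \sim (\mu_a = 1,\ \sigma^2_a = 0.12)$; $b \sim (\mu_b = -0.3,\ \sigma^2_b = 0.8)$; $c \sim (\mu_c = 0.25,\ \sigma^2_c = 0.008)$
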
TABLE 11. Descriptive statistics for simulated data 3

Test   Min.   Max.   Mean    Std     Skewness   Kurtosis
X1     9      75     52.01   13.71   -0.43      -0.71
Y2     10     75     51.54   13.90   -0.34      -0.84
X2     11     75     50.15   14.00   -0.24      -0.92
Y1     10     75     51.98   13.66   -0.41      -0.73

r(X1, Y2) = r(Y1, X2) ≈ 0.88

[Figure: score distributions for simulated data 3 (score scale 0 to 75). Panels: X1 (skewness = -0.43), Y2 (skewness = -0.34), Y1 (skewness = -0.41), X2 (skewness = -0.24).]

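Simulated data 4 with significant order effects (DOE = -2.75, effect size of DOE = 0.1)

• Sample 1 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{11}} = 0,\ \mu_{\theta_{12}} = -0.1),\ \begin{bmatrix} \sigma^2_{\theta_{11}} = 1 & \sigma_{\theta_{11}\theta_{12}} = .94 \\ \sigma_{\theta_{11}\theta_{12}} = .94 & \sigma^2_{\theta_{12}} = 1 \end{bmatrix} \right)$$

• Sample 2 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{21}} = 0,\ \mu_{\theta_{22}} = 0.1),\ \begin{bmatrix} \sigma^2_{\theta_{21}} = 1 & \sigma_{\theta_{21}\theta_{22}} = .94 \\ \sigma_{\theta_{21}\theta_{22}} = .94 & \sigma^2_{\theta_{22}} = 1 \end{bmatrix} \right)$$

$a \sim (\mu_a = 1,\ \sigma^2_a = 0.12)$; $b \sim (\mu_b = -0.3,\ \sigma^2_b = 0.8)$; $c \sim (\mu_c = 0.25,\ \sigma^2_c = 0.008)$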

TABLE 12. Descriptive statistics for simulated data 4

Test   Min.   Max.   Mean    Std     Skewness   Kurtosis
X1     10     75     50.34   13.50   -0.31      -0.80
Y2     10     75     48.64   13.57   -0.29      -0.84
X2     11     75     51.35   13.27   -0.45      -0.67
Y1     9      75     50.39   13.56   -0.30      -0.81

r(X1, Y2) = r(Y1, X2) ≈ 0.88

[Figure: score distributions for simulated data 4 (score scale 0 to 75). Panels: X1 (skewness = -0.31), Y2 (skewness = -0.29), Y1 (skewness = -0.30), X2 (skewness = -0.45).]

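Simulated data 5 with significant order effects (DOE = -3.76, effect size of DOE = 0.15)

• Sample 1 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{11}} = 0,\ \mu_{\theta_{12}} = -0.1),\ \begin{bmatrix} \sigma^2_{\theta_{11}} = 1 & \sigma_{\theta_{11}\theta_{12}} = .94 \\ \sigma_{\theta_{11}\theta_{12}} = .94 & \sigma^2_{\theta_{12}} = 1 \end{bmatrix} \right)$$

• Sample 2 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{21}} = 0,\ \mu_{\theta_{22}} = 0.2),\ \begin{bmatrix} \sigma^2_{\theta_{21}} = 1 & \sigma_{\theta_{21}\theta_{22}} = .94 \\ \sigma_{\theta_{21}\theta_{22}} = .94 & \sigma^2_{\theta_{22}} = 1 \end{bmatrix} \right)$$

$a \sim (\mu_a = 1,\ \sigma^2_a = 0.12)$; $b \sim (\mu_b = -0.3,\ \sigma^2_b = 0.8)$; $c \sim (\mu_c = 0.25,\ \sigma^2_c = 0.008)$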

TABLE 13. Descriptive statistics for simulated data 5

Test   Min.   Max.   Mean    Std     Skewness   Kurtosis
X1     10     75     50.99   14.07   -0.29      -0.89
Y2     9      75     48.50   13.65   -0.11      -0.88
X2     11     75     52.33   13.36   -0.34      -0.75
Y1     9      75     50.92   14.10   -0.29      -0.88

r(X1, Y2) = r(Y1, X2) ≈ 0.88

[Figure: score distributions for simulated data 5 (score scale 0 to 75). Panels: X1 (skewness = -0.29), Y2 (skewness = -0.11), Y1 (skewness = -0.29), X2 (skewness = -0.34).]

 

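Simulated data 6

• Sample 1 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{11}} = 0,\ \mu_{\theta_{12}} = -0.2),\ \begin{bmatrix} \sigma^2_{\theta_{11}} = 1 & \sigma_{\theta_{11}\theta_{12}} = .94 \\ \sigma_{\theta_{11}\theta_{12}} = .94 & \sigma^2_{\theta_{12}} = 1 \end{bmatrix} \right)$$

• Sample 2 (N = 100,000):

$$\theta \sim N\left( (\mu_{\theta_{21}} = 0,\ \mu_{\theta_{22}} = 0.2),\ \begin{bmatrix} \sigma^2_{\theta_{21}} = 1 & \sigma_{\theta_{21}\theta_{22}} = .94 \\ \sigma_{\theta_{21}\theta_{22}} = .94 & \sigma^2_{\theta_{22}} = 1 \end{bmatrix} \right)$$

$a \sim (\mu_a = 1,\ \sigma^2_a = 0.12)$; $b \sim (\mu_b = -0.3,\ \sigma^2_b = 0.8)$; $c \sim (\mu_c = 0.25,\ \sigma^2_c = 0.008)$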

TABLE 14. Descriptive statistics for simulated data 6

Test   Min.   Max.   Mean    Std     Skewness   Kurtosis
X1     9      75     52.52   13.78   -0.26      -0.88
Y2     9      75     47.75   13.79   -0.05      -0.96
X2     11     75     52.93   13.24   -0.37      -0.79
Y1     11     75     52.55   13.80   -0.25      -0.89

r(X1, Y2) = r(Y1, X2) ≈ 0.88

 

[Figure: score distributions for simulated data 6 (score scale 0 to 75). Panels: X1 (skewness = -0.26), Y2 (skewness = -0.05), Y1 (skewness = -0.25), X2 (skewness = -0.37).]

3.3 Analysis

The analysis of real data and simulated data in this study differs slightly. For the
two real datasets, the bootstrap method was employed to calculate the standard error of 14
out of the total 22 equatings (as listed in Table 15 and Table 16). The equating results
were evaluated by SEE and RMSE. For the simulated datasets, empirical standard errors
of equating were calculated for the 22 equating methods displayed in Table 6. The
equating functions were evaluated by SEE, equating bias relative to the large sample
standard, RMSE, and SEED. The computer software SAS, MATLAB, and Compaq Visual
Fortran were used to simulate data and conduct equating procedures.

3.3.1 Equating Methods Applied for Simulated Data

Table 6 lists the names of all the equatings conducted for simulated data in this
study and provides detailed explanations for each equating. The results of the traditional
equipercentile equating (EG_Equi) on each population dataset were considered as
criterion equating results. All the other equating results were compared to this criterion

equating for each population. In this study, all the equatings are from test Form Y to test
Form X, i.e., the equating function takes the form of $e_X(y)$, which is a function of score
y.

3.3.2 Procedure for Estimating Empirical SEE for Simulated Data

Once the population datasets were generated, 500 random samples were selected
from each of the six populations without replacement. The estimation of empirical SEE
for the simulated datasets followed the procedure below:

1. Randomly select one sample (n = 50) from each of the two independent
   samples from population 1 without replacement. Selected sample 1 has
   scores for Form X, which is taken first, and Form Y, which is taken
   second. Selected sample 2 has scores for Form X, which is taken
   second, and Form Y, which is taken first. Data from the two
   independent samples were simply combined to form a dataset with the
   pooled single group design.

2. Apply the 22 equatings to the samples selected from the population.
   When the sample size is greater than 100, two log-linear models were
   fit to the data for all the KE equating methods. The first log-linear
   model (model (2, 2, 1)) preserves the first bivariate moment (the
   correlation of scores on Form X and Form Y) and the first two
   univariate moments of each variable (mean and standard deviation). The
   second log-linear model (model (4, 4, 1)) preserves the first bivariate
   moment and the first four univariate moments of each variable.

3. Replace the test-takers into the corresponding population and repeat
   sampling 500 times. The 500 replications then build up a conditional
   distribution of equating results at each score point. The mean of this
   conditional distribution is the equating result at each score point, and
   the standard deviation of this conditional distribution is the empirical
   conditional SEE at each score point.

4. Repeat steps 1 to 3, changing the selected sample size from 50 to 100,
   300, 500, and 1000.

5. Repeat the above procedures for simulated data 2 to data 6.

The bandwidth for KE linear equating was set at 200. The weighting parameter
$w_X$ or $w_Y$ took values from 0.5 to 1.
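The resampling loop in steps 1 to 3 can be sketched as below. This is a simplified illustration, not the procedure used in the study: it substitutes an ordinary mean and standard deviation linear equating for the 22 KE and traditional methods, draws Form X and Form Y scores from two separate hypothetical populations, and uses the standard deviation across replications as the empirical conditional SEE. All names and population values are hypothetical.

```python
import random
import statistics

def linear_equate(xs, ys, y):
    """Stand-in linear equating of score y on Form Y to the Form X scale."""
    slope = statistics.stdev(xs) / statistics.stdev(ys)
    return statistics.fmean(xs) + slope * (y - statistics.fmean(ys))

def empirical_see(pop_x, pop_y, score_points, n, reps, rng):
    """SD of the equated score across replications at each score point."""
    equated = {y: [] for y in score_points}
    for _ in range(reps):
        # Sampling is without replacement within a replication; the full
        # population is available again for the next replication.
        xs = rng.sample(pop_x, n)
        ys = rng.sample(pop_y, n)
        for y in score_points:
            equated[y].append(linear_equate(xs, ys, y))
    return {y: statistics.stdev(values) for y, values in equated.items()}

rng = random.Random(1)
pop_x = [rng.gauss(52.0, 13.0) for _ in range(5000)]   # hypothetical Form X population
pop_y = [rng.gauss(51.0, 12.0) for _ in range(5000)]   # hypothetical Form Y population
see_50 = empirical_see(pop_x, pop_y, [30, 50, 70], 50, 200, rng)
see_200 = empirical_see(pop_x, pop_y, [30, 50, 70], 200, 200, rng)
print(see_50[50] > see_200[50])
```

Repeating the call with different values of `n` mirrors step 4, where the sample size is varied while the number of replications stays fixed.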

3.3.3 Evaluating Equating Results from Simulated Data

For the simulated data, traditional equipercentile equating results with the EG
design were considered as the criterion. All the other equating methods were compared to
this criterion and were evaluated in terms of Standard Error of Equating, equating bias
relative to the large sample standard, Root Mean Square Error, and Standard Error of
Equating Difference. For the two real datasets, only bootstrap SEE and RMSE were
reported.

Equating Bias Relative to the Large Sample Standard

To calculate equating bias at each score point, for each of the 22 equatings under
each of the six population conditions, the criterion equating results (EG_Equi) were
subtracted from the mean of the 500 replications' equating results at each score level (as in
formula 17). Conditional equating bias was not reported for simulated data. Instead, the
average of the conditional biases over all score levels was calculated and reported in
Chapter IV.
Root Mean Square Error (RMSE)
The Root Mean Square Error of each equating compared to the criterion equating
is equal to the square root of the sum of the squared average bias and the variance of bias over
possible score points: $\mathrm{RMSE} = \sqrt{(\bar{d})^2 + (sd_d)^2}$, where $\bar{d}$ is the mean of the equating
differences and $sd_d$ is the standard deviation of the differences between the equating
results of one method and the criterion equating results. It reflects how biased and how
accurate the equating results are compared to the population criterion.

Standard Error of Equating (SEE)

The empirical conditional standard error of equating was considered as the
standard deviation of the conditional distribution formed by the equating results for 500
replications. It can be calculated using the following formula. In chapter IV, only the
average of these conditional SEE’s over different score points was reported for each

equating method.

 

$$\mathrm{SEE} = \sqrt{\frac{1}{499} \sum_{j=1}^{500} \left(\hat{e}_{X,j}(y_k) - e_X(y_k)\right)^2} \qquad (27)$$

where j = 1 to 500 indexes the selected samples; k = 1 to K indexes the possible
score points on Form Y; $\hat{e}_{X,j}(y_k)$ is the equated score from Form Y to Form X for the jth
replication; and $e_X(y_k)$ is the equated score of X corresponding to score $y_k$ from the
population dataset.
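Formula (27) at a single score point can be sketched as follows. The function name and the three-replication example are hypothetical; the study used 500 replications, giving the divisor 499.

```python
import math

def conditional_see(replicated, population_value):
    """Empirical conditional SEE at one score point, as in formula (27).

    replicated       : equated scores at this score point, one per replication
    population_value : equated score at this score point from the population dataset
    """
    squared = sum((e - population_value) ** 2 for e in replicated)
    return math.sqrt(squared / (len(replicated) - 1))

# Hypothetical equated scores from three replications at one score point.
see_example = conditional_see([49.0, 51.0, 50.0], 50.0)
print(see_example)
```

Because deviations are taken from the population value rather than from the mean of the replications, any systematic difference between the two also enters this index.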

In this study, SEED was calculated directly by the KE software.


Chapter IV: Results

4.1 Real Data 1

Real data 1 has a DOE of 2.03, which is not statistically significant (t = .713,
se = 2.85, p = .476, df = 281), i.e., the order effect can be almost cancelled out by pooling
together the two groups of data in this specific example. The effect size of DOE is 0.08.
Levene's test of homogeneity of variance (Levene, 1960) is not significant (F = 1.67,
p = .197). The best-fit model for the KE methods is model (2, 2, 1): $T_X = T_Y = 2$ and I =
L = 1. The following figures show the observed score distributions for X1, Y2, X2, and
Y1 and their fitted data distributions.

 

 

 

 

 

 

[Figure: observed and fitted frequency distributions. Panels: X1 Scores (0 to 75) and Y1 Scores (0 to 76); vertical axis: Frequency.]

FIGURE 1. Observed score distributions for X1 and Y1 in real data 1.

[Figure: observed and fitted frequency distributions. Panels: X2 Scores (0 to 75) and Y2 Scores (0 to 76); vertical axis: Frequency.]

FIGURE 2. Observed score distributions for X2 and Y2 in real data 1.

4.1.1 Selecting the Best Equating Function Using RMSE

All equating methods were compared to the traditional equipercentile equating
with an EG design (EG Equi.). The results show that, when DOE is insignificant, 2SG(.5, .5) and
SG_KE have similar equating results with almost the smallest SEEs over the whole score
scale, but they have a bigger RMSE compared to the EG design. Not much difference
was found between the equating results of traditional equating and Kernel Equating. No
large difference was found between linear and equipercentile equating methods except
for traditional EG linear and traditional EG equipercentile. This is because the sample
size for the EG design is only about 70 for each sample in this dataset, which is too small for
equipercentile equating. Equating results of 2SG(.75, .75) have relatively small SEE and
RMSE. It is the method that best represents the criterion equating results.

TABLE 15. Evaluation of equating results from real data 1

                 2SG KE                                            SG            EG
                 (.5,.5)   (.5,.75)   (.75,.5)   (.75,.75)   (1,1)    traditional   traditional
Linear
Mean SEE         0.663     0.839      0.884      1.252       2.381    0.663         2.384
SD SEE           0.313     0.371      0.42       0.565       1.113    0.313         1.118
Min. SEE         0.32      0.44       0.425      0.646       1.164    0.32          1.164
Max. SEE         1.334     1.634      1.776      2.433       4.674    1.334         4.648
Mean Diff        2.066     1.769      1.229      0.908       -0.403   2.066         -0.418
RMSE             2.92      2.5        2.1        1.76        1.68     2.92          1.69
Equipercentile
Mean SEE         0.692     0.833      0.846      1.147       2.196    2.133         3.1
SD SEE           0.343     0.346      0.343      0.408       0.928    2.241         1.926
Min. SEE         0.332     0.385      0.419      0.429       0.491    0             0
Max. SEE         1.384     1.485      1.43       1.714       3.557    6.778         6.821
Mean Diff        2.29      2.04       1.518      1.262       -0.062   1.369         0
RMSE             3.09      2.72       2.31       1.98        1.42     2.26          0

*Criterion equating = traditional EG equipercentile equating

The 2SG approach with weights of (1, 1) has the smallest RMSE when taking the
EG traditional equipercentile equating function as a baseline. Therefore, the 2SG(1, 1)
equipercentile method is the best equating function when using RMSE as an index.

4.1.2 Selecting the Best Equating Function Using SEED

 

 

 

 

 

 

 

[Figure: equating difference plotted against score (0 to 75), with legend entries Equating Difference, 2SEED, -2SEED, and Zero Line.]

FIGURE 3. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear and the
±2SEED confidence interval band around zero line, real data 1.

[Figure: equating difference plotted against score (0 to 75), with legend entries Equating Difference, 2SEED, -2SEED, and Zero Line.]

FIGURE 4. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5)
equipercentile and the ±2SEED confidence interval band around zero line, real data 1.

Figure 3 and Figure 4 indicate that the differences between the two KE linear and
the two KE equipercentile methods using weights of (1, 1) and weights of (0.5, 0.5) are
small in comparison with the ±2SEED band. According to von Davier, Holland, and
Thayer (2004), this indicates that the equating bias introduced by order effects is small
enough to be ignored. Thus, the best equating function can be selected solely based on
the random equating error, i.e., the standard error of equating. In this case, the 2SG linear
or equipercentile equating with weights of (.5, .5) will be considered as the best ones.
Their equating difference can be tested against SEED again to decide which one to
choose.

 

 

 

 

 

 

 

 

 

[Figure: equating difference plotted against score (0 to 75), with legend entries Equating Difference, 2SEED, -2SEED, and Zero Line.]

FIGURE 5. Equating difference between 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile
and the ±2SEED confidence interval band around zero line, real data 1.

As shown in Figure 5, the difference between the KE linear and the KE
equipercentile equating functions falls beyond the 95% confidence intervals along the
whole score scale except the lower end. The equating function deviates from a linear
function. Therefore, the 2SG equipercentile equating function with weights of (.5, .5) is
preferable to the 2SG linear equating function with weights of (.5, .5) (von Davier,
Holland, & Thayer, 2004).

4.2 Real Data 2

The second real dataset has a DOE of 2.06. This is significant, as the order effect
cannot be cancelled out by pooling together the two groups of data in this example. The
effect size of DOE is 0.26. The best-fit model for the KE methods is model (2, 2, 1)
($T_X = T_Y = 2$, I = L = 1) for group 1 and model (4, 4, 1) ($T_X = T_Y = 4$, I = L = 1) for
group 2. The following figures show the observed score distributions for X1, Y2, X2, and
Y1 and their best-fit log-linear models.

 

[Figure: observed and fitted frequency distributions. Panels: X1 Scores (0 to 25) and Y1 Scores; vertical axis: Frequency.]

FIGURE 6. Observed score distributions for X1 and Y1 in real data 2.

[Figure: observed and fitted frequency distributions. Panels: X2 Scores and Y2 Scores (0 to 25); vertical axis: Frequency.]

FIGURE 7. Observed score distributions for X2 and Y2 in real data 2.

4.2.1 Selecting the Best Equating Function Using RMSE

TABLE 16. Evaluation of equating results from real data 2

                        2SG KE                                    SG           EG
             (.5,.5)  (.5,.75)  (.75,.5)  (.75,.75)  (1,1)    traditional  traditional
Linear
  Mean SEE   0.205    0.223     0.243     0.296      0.51     0.205        0.51
  SD SEE     0.07     0.068     0.079     0.083      0.143    0.07         0.143
  Min SEE    0.117    0.138     0.145     0.193      0.333    0.117        0.333
  Max SEE    0.341    0.354     0.403     0.454      0.767    0.341        0.767
  Mean Diff  0.749    0.498     0.448     0.198      -0.387   0.774        -0.29
  RMSE       1.007    0.832     0.753     0.638      0.876    1.161        0.673
Equipercentile
  Mean SEE   0.254    0.284     0.25      0.304      0.49     0.382        0.54
  SD SEE     0.114    0.124     0.075     0.077      0.113    0.241        0.274
  Min SEE    0.134    0.166     0.165     0.224      0.343    0            0
  Max SEE    0.484    0.548     0.399     0.463      0.72     0.845        0.96
  Mean Diff  0.671    0.526     0.505     0.362      0.033    0.965        0
  RMSE       0.97     0.857     0.774     0.681      0.624    1.317        0

*Criterion equating = traditional EG equipercentile equating.

In Table 16, the 2SG equipercentile equating with weights of (1, 1) has the smallest RMSE when the EG traditional equipercentile equating function is taken as the baseline. Therefore, the 2SG(1, 1) equipercentile method is the best equating function when RMSE is used as the index.
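The selection by RMSE can be reproduced mechanically. The sketch below transcribes the RMSE rows of Table 16 into a dictionary and returns the method with the smallest RMSE, leaving out the criterion itself (whose RMSE is zero by construction). The dictionary layout and the function name `best_method` are illustrative choices, not part of the original analysis.

```python
# RMSE rows of Table 16 (real data 2), keyed by equating family and method.
RMSE_TABLE_16 = {
    "linear": {
        "2SG(.5,.5)": 1.007, "2SG(.5,.75)": 0.832, "2SG(.75,.5)": 0.753,
        "2SG(.75,.75)": 0.638, "2SG(1,1)": 0.876,
        "SG traditional": 1.161, "EG traditional": 0.673,
    },
    "equipercentile": {
        "2SG(.5,.5)": 0.97, "2SG(.5,.75)": 0.857, "2SG(.75,.5)": 0.774,
        "2SG(.75,.75)": 0.681, "2SG(1,1)": 0.624,
        "SG traditional": 1.317, "EG traditional": 0.0,
    },
}

# The criterion equating is excluded from the comparison.
CRITERION = ("equipercentile", "EG traditional")


def best_method(table, criterion):
    """Return (family, method, rmse) with the smallest RMSE, skipping the criterion."""
    candidates = [
        (value, family, method)
        for family, row in table.items()
        for method, value in row.items()
        if (family, method) != criterion
    ]
    value, family, method = min(candidates)
    return family, method, value


best = best_method(RMSE_TABLE_16, CRITERION)
print(best)
```

The result agrees with the reading of Table 16 given above: the 2SG(1, 1) equipercentile method, with an RMSE of 0.624.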

4.2.2 Selecting the Best Equating Function Using SEED

FIGURE 8. Equating difference between 2SG(1, 1) linear and 2SG(.5, .5) linear and the ±2SEED confidence interval band around zero line, real data 2.

FIGURE 9. Equating difference between 2SG(1, 1) equipercentile and 2SG(.5, .5) equipercentile and the ±2SEED confidence interval band around zero line, real data 2.

Figure 8 and Figure 9 indicate that the differences between the two KE linear methods, and between the two KE equipercentile methods, using weights of (1, 1) and weights of (.5, .5) fall beyond the ±2SEED band in the middle part of the score scale, where most of the scores are distributed. Following von Davier, Holland, and Thayer (2004), this indicates that the equating bias introduced by using the data from forms X2 and Y2 cannot be ignored. The best solution would be to discard the data from the tests taken second, that is, to treat the data collected by a CB design as an EG design. After the weights are decided, the SEED plots can be used again to decide which equating function to choose: the 2SG linear equating with weights of (1, 1) or the 2SG equipercentile equating with weights of (1, 1).
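The role of the weights can be illustrated with a short sketch. It assumes that the synthesized score distribution for a form is the convex combination w * (distribution when the form is taken first) + (1 - w) * (distribution when the form is taken second), so that w = 1 discards the second-taken data and w = .5 weights both administrations equally. The function name `mix_distributions` and the probabilities are hypothetical.

```python
from typing import List, Sequence


def mix_distributions(first: Sequence[float], second: Sequence[float],
                      w: float) -> List[float]:
    """Weighted mixture of two score distributions over the same score points.

    first  -- score probabilities when the form is taken first
    second -- score probabilities when the form is taken second
    w      -- weight given to the first-taken data (0 <= w <= 1)
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    if len(first) != len(second):
        raise ValueError("distributions must cover the same score points")
    return [w * a + (1.0 - w) * b for a, b in zip(first, second)]


# Made-up three-point distributions for one form (illustration only).
x_first = [0.2, 0.5, 0.3]
x_second = [0.1, 0.4, 0.5]

eg_like = mix_distributions(x_first, x_second, 1.0)  # uses first-taken data only
pooled = mix_distributions(x_first, x_second, 0.5)   # equal weighting
print(eg_like, pooled)
```

With w = 1 for both forms, only the data from the tests taken first enter the equating, which corresponds to treating the CB data as an EG design.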

FIGURE 10. Equating difference between 2SG(1, 1) linear and 2SG(1, 1) equipercentile, and the ±2SEED confidence interval band around zero line, real data 2.

As shown in Figure 10, the difference between the 2SG(1, 1) linear and the 2SG(1, 1) equipercentile equating functions falls beyond the 95% confidence interval at the lower end and in the middle of the score scale. This indicates that the equating function deviates from a linear function. Therefore, the 2SG(1, 1) equipercentile equating function is preferable to the 2SG(1, 1) linear equating function (von Davier, Holland, & Thayer, 2004).


4.3 Simulated Data

All the simulated data can be fitted by a log-linear model of (2, 2, 1) with adequate model fit. Fitting a model with more parameters did not reduce the likelihood ratio chi-square statistic significantly. In addition, the Freeman-Tukey residuals are within the range of (-3, +3) for all the simulated data when fitted with a log-linear model of (2, 2, 1), as in Figure 11.

FIGURE 11. One example of Freeman-Tukey residual plot for POP3. (Freeman-Tukey residual plotted against score.)
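The residual check described above can be sketched as follows. The Freeman-Tukey residual for a score point with observed frequency n and fitted frequency m is sqrt(n) + sqrt(n + 1) - sqrt(4m + 1). The function names and the frequencies below are hypothetical; they are not the POP3 values.

```python
import math
from typing import List, Sequence


def freeman_tukey(observed: Sequence[float], fitted: Sequence[float]) -> List[float]:
    """Freeman-Tukey residual for each score point."""
    return [
        math.sqrt(o) + math.sqrt(o + 1.0) - math.sqrt(4.0 * f + 1.0)
        for o, f in zip(observed, fitted)
    ]


def within_range(residuals: Sequence[float], bound: float = 3.0) -> bool:
    """True when every residual lies strictly inside (-bound, +bound)."""
    return all(abs(r) < bound for r in residuals)


# Made-up observed and fitted frequencies (illustration only).
observed = [2, 8, 15, 9, 1]
fitted = [2.5, 7.6, 14.2, 9.8, 0.9]

residuals = freeman_tukey(observed, fitted)
print([round(r, 2) for r in residuals], within_range(residuals))
```

A fitted model whose residuals all stay inside (-3, +3) passes the check used here.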

4.3.1 Model Fit

Various log-linear models were fitted to the simulated sample datasets. The results indicate that, when sample size is 50, model (2, 2, 1) is the best-fit model. When sample size is 100, 300, 500 or 1000, both model (2, 2, 1) and model (4, 4, 1) have fairly good model fit. In this study, only the equating results of fitting model (2, 2, 1) are reported, since the equating results of fitting model (4, 4, 1) are very similar to the equating results of fitting model (2, 2, 1).

4.3.2 Evaluating the Equating Results by RMSE

As shown in Table 17 and Table 18, the pooled SG and 2SG(.5, .5) approaches under the KE framework have the lowest SEE and RMSE when DOE is almost zero. This indicates that when the order effect can be cancelled out, the pooled SG method and the 2SG(.5, .5) method both provide optimal equating results.

Table 19 and Table 20 show the equating results for population data 2, where DOE has an effect size of 0.025. The 2SG linear and equipercentile equating methods with weights of (.5, .75) for X and Y have the smallest RMSE. When the differential order effect gets larger, as in data 3 where the effect size of DOE is 0.05, the 2SG linear equating methods with weights of (.9, .9) have the smallest RMSE (Table 21 and Table 22). When the effect size of DOE approaches 0.1, the pooled SG approach and the 2SG(.5, .5) approach are clearly not the best (Table 23 and Table 24). Instead, the 2SG linear equating method with weights of (1, 1) (i.e., the EG KE linear method) or the EG traditional linear method has the smallest RMSE. Furthermore, in population data 5 and data 6, where the effect size of DOE is around 0.15 and 0.2, the benefit of using weights of (1, 1) in the 2SG approach becomes markedly larger. As shown in Table 25 to Table 28, the EG KE linear and EG traditional linear methods have much smaller RMSE than the methods that treat the data as a single group design.

TABLE 17. Summary statistics for linear equating methods, simulated data 1
TABLE 18. Summary statistics for equipercentile equating methods, simulated data 1
TABLE 19. Summary statistics for linear equating methods, simulated data 2
TABLE 20. Summary statistics for equipercentile equating methods, simulated data 2
TABLE 21. Summary statistics for linear equating methods, simulated data 3
TABLE 22. Summary statistics for equipercentile equating methods, simulated data 3
TABLE 23. Summary statistics for linear equating methods, simulated data 4
TABLE 24. Summary statistics for equipercentile equating methods, simulated data 4
TABLE 25. Summary statistics for linear equating methods, simulated data 5
TABLE 26. Summary statistics for equipercentile equating methods, simulated data 6
TABLE 27. Summary statistics for linear equating methods, simulated data 6
TABLE 28. Summary statistics for equipercentile equating methods, simulated data 6

[Tables 17 to 28 report bias, SEE, and RMSE for each equating method at n = 50, 100, 300, 500, and 1000. The table bodies are not legible in the scanned source.]

The results indicate that the KE methods can approximate their corresponding traditional equating methods. No large differences were found between the KE equating methods and their corresponding traditional equating methods (e.g., KE linear and traditional linear, KE equipercentile and traditional equipercentile). This is consistent with the results of evaluation studies for KE, such as Mao, von Davier, and Rupp (2005) and von Davier, Holland, Livingston, and others (2005).

Compared to the standard error of equating, the equating bias index depends less on sample size. Given the same equating method, the equating bias does not change a great deal as sample size increases. However, the standard error of equating decreases conspicuously as sample size increases. The more data we have, the more information we can use to estimate the equating relationship, and the less equating error there will be. This feature of SEE follows from its calculation formula.
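The contrast between the two indices can be illustrated with a small sketch. It assumes, purely for illustration, that the SEE shrinks in proportion to 1/sqrt(n) while the bias stays constant; the exact SEE formula is not reproduced here, and the starting values (an SEE of 0.5 at n = 50 and a bias of 0.3) are hypothetical.

```python
import math


def scaled_see(see_ref: float, n_ref: int, n: int) -> float:
    """SEE at sample size n, assuming SEE is proportional to 1 / sqrt(n)."""
    return see_ref * math.sqrt(n_ref / n)


SIZES = [50, 100, 300, 500, 1000]

# Hypothetical reference values: SEE of 0.5 at n = 50, constant bias of 0.3.
see_by_n = {n: scaled_see(0.5, 50, n) for n in SIZES}
bias_by_n = {n: 0.3 for n in SIZES}

for n in SIZES:
    print(n, round(see_by_n[n], 3), bias_by_n[n])
```

Under this assumption the SEE column falls steadily with n while the bias column does not move, which is the pattern described for the summary tables.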

When using RMSE as a means of evaluating equating functions, it was found that:
a) When DOE is almost zero, pooling the two samples together or using the 2SG approach with weights of (.5, .5) is the optimal equating method, with small standard error of equating and small bias.
b) As DOE increases, the 2SG methods under the KE framework with different weights can provide optimal equating results with the smallest RMSE. The weights for the 2SG approach get larger as DOE increases.
c) When the size of DOE approaches a certain point, treating data collected in a CB design as an EG design is the best equating solution. The weights of the 2SG approach become 1. The equating method could be either the 2SG(1, 1) method or the traditional linear or equipercentile method.

4.3.3 Evaluating the Equating Results by SEED

Equating differences were compared against their 95% confidence intervals for all the sample size conditions under each population. The last graph in Figure 12 plots the equating differences between the 2SG(.5, .5) linear and 2SG(.5, .5) equipercentile methods for simulated data 1 when sample size is 1000. The straight horizontal line in the middle is the zero line. The equating differences, represented by solid dots, lie around the zero line within the ±2SEED band. The other five graphs present the equating differences between the 2SG equipercentile equating with weights of (.5, .5) and (1, 1) for different sample sizes drawn from simulated data 1.

It can be seen from these plots that SEED gets larger when the equating methods differ from each other and when sample size decreases. Among the graphs in Figure 12, the last graph exhibits the smallest SEED, showing that the 2SG methods with the same weighting parameters provide more similar equating results than the 2SG methods with different weighting parameters. Furthermore, the plots in Figure 12 indicate that under a given order effect situation, the equating difference stays relatively unchanged, but SEED decreases as sample size increases. Therefore, the significance of the equating difference mostly depends on the sample size. If the equating differences between two methods fall beyond the ±2SEED band when sample size is 500, they must also be out of the band when sample size is 1000. Conversely, if the equating difference is not significant when sample size is 1000, then it must not be significant when sample size is 500.
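This sample size argument can be sketched in code. The example holds the equating difference at a single score point fixed and supplies a SEED value for each sample size; the SEED values are hypothetical, chosen only so that they decrease as n grows, and the function name `significant_sizes` is illustrative.

```python
from typing import Dict, List


def significant_sizes(diff: float, seed_by_n: Dict[int, float],
                      k: float = 2.0) -> List[int]:
    """Sample sizes at which |diff| falls outside the +/- k * SEED band."""
    return sorted(n for n, seed in seed_by_n.items() if abs(diff) > k * seed)


# Hypothetical SEED values at one score point, decreasing with sample size.
seed_by_n = {50: 0.40, 100: 0.28, 300: 0.16, 500: 0.13, 1000: 0.09}

sizes = significant_sizes(0.3, seed_by_n)
print(sizes)
```

Because the difference is held fixed and SEED only shrinks, every sample size larger than the first significant one is also significant, which mirrors the reasoning in the paragraph above.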

More SEED plots are provided in the appendix. Most of the SEED plots are for the differences between the 2SG(.5, .5) method and the 2SG(1, 1) method. The rationale for not comparing the 2SG(1, 1) method with the 2SG method with weights between 0.5 and 1 is as follows. In equating for a CB design with differential order effect, the 2SG approach with weights of (1, 1) has no equating bias, and the 2SG approach with weights of (.5, .5) has the biggest equating bias. If the equating difference between 2SG(.5, .5) and 2SG(1, 1) is not significant, then the equating difference between 2SG(1, 1) and a 2SG approach with any weights between 0.5 and 1 will not be significant.

The SEED plots for the simulated datasets indicate that none of the equating differences between the 2SG(.5, .5) and 2SG(1, 1) methods are significant under any sample size condition of population data 1, data 2 and data 3. Therefore, the bias introduced by using data from the tests taken second can be ignored, and the 2SG approach with weights of (.5, .5) can be selected as the best equating line for simulated data 1, data 2 and data 3, where the effect size of DOE is relatively small.

[FIGURE 12. SEED plots for simulated data 1 at different sample sizes. Six panels, each showing the equating difference, the ±2SEED band, and the zero line against score (0 to 75). The caption text is not legible in the scanned source.]

As DOE increases in simulated data 4, the equating difference between the 2SG(.5, .5) and 2SG(1, 1) methods falls beyond the 95% confidence interval when sample size is 1000. In this case, the 2SG approach with weights of (1, 1) is preferred, to avoid the equating bias introduced by including data from X2 and Y2. This is also the case for data 5 when sample size is 500 and 1000, and for data 6 when sample size is 300, 500 and 1000.

Table 29 summarizes the equating functions selected by using SEED plots for different samples under different order effect situations. It shows that the EG design (the 2SG approach with weights of (1, 1)) is more appropriate toward the lower right corner of the table, where DOE and sample size are both larger.

TABLE 29. Selected equating function based on SEED

DOE       n=50          n=100         n=300         n=500         n=1000
d=0       2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)
d=0.025   2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)
d=0.05    2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)
d=0.1     2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (1, 1)
d=0.15    2SG (.5, .5)  2SG (.5, .5)  2SG (.5, .5)  2SG (1, 1)    2SG (1, 1)
d=0.2     2SG (.5, .5)  2SG (.5, .5)  2SG (1, 1)    2SG (1, 1)    2SG (1, 1)

TABLE 30. Selected equating function based on RMSE

DOE       n=50           n=100          n=300          n=500          n=1000
d=0       2SG (.5, .5)   2SG (.5, .5)   2SG (.5, .5)   2SG (.5, .5)   2SG (.5, .5)
d=0.025   2SG (.5, .75)  2SG (.5, .75)  2SG (.5, .75)  2SG (.5, .75)  2SG (.5, .75)
d=0.05    2SG (.9, .9)   2SG (.9, .9)   2SG (.9, .9)   2SG (.9, .9)   2SG (.9, .9)
d=0.1     2SG (1, 1)     2SG (1, 1)     2SG (1, 1)     2SG (1, 1)     2SG (1, 1)
d=0.15    2SG (1, 1)     2SG (1, 1)     2SG (1, 1)     2SG (1, 1)     2SG (1, 1)
d=0.2     2SG (1, 1)     2SG (1, 1)     2SG (1, 1)     2SG (1, 1)     2SG (1, 1)

Comparing Table 30 with Table 29, it can be found that the RMSE and SEED statistical indices produce the same results when DOE is almost zero and when DOE is large (effect size > 0.2 in this case). When the effect size of DOE is within a certain small range, the RMSE can provide a more fine-grained equating solution. This is where the weighting method comes into play.
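The comparison of the two tables can also be carried out cell by cell. The sketch below encodes the selected weights from Table 29 (SEED) and Table 30 (RMSE), as read from the source, and lists the sample sizes at which the two indices select the same weights for each DOE effect size. The variable and function names are illustrative.

```python
SIZES = [50, 100, 300, 500, 1000]

# Selected 2SG weights by DOE effect size, one entry per sample size in SIZES.
SEED_CHOICE = {
    0.0:   ["(.5,.5)"] * 5,
    0.025: ["(.5,.5)"] * 5,
    0.05:  ["(.5,.5)"] * 5,
    0.1:   ["(.5,.5)"] * 4 + ["(1,1)"],
    0.15:  ["(.5,.5)"] * 3 + ["(1,1)"] * 2,
    0.2:   ["(.5,.5)"] * 2 + ["(1,1)"] * 3,
}
RMSE_CHOICE = {
    0.0:   ["(.5,.5)"] * 5,
    0.025: ["(.5,.75)"] * 5,
    0.05:  ["(.9,.9)"] * 5,
    0.1:   ["(1,1)"] * 5,
    0.15:  ["(1,1)"] * 5,
    0.2:   ["(1,1)"] * 5,
}


def agreement(seed_choice, rmse_choice, sizes):
    """For each DOE level, list the sample sizes where both indices agree."""
    return {
        d: [n for n, a, b in zip(sizes, seed_choice[d], rmse_choice[d]) if a == b]
        for d in seed_choice
    }


agree = agreement(SEED_CHOICE, RMSE_CHOICE, SIZES)
for d, ns in agree.items():
    print(d, ns)
```

The two indices agree at every sample size when d = 0 and, for the larger effect sizes, at the larger sample sizes; they differ throughout for d = 0.025 and d = 0.05, where RMSE selects intermediate weights.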

Chapter V: Discussion

5.1 Performance of the KE Methods

The results of this study are consistent with previous studies that compared the KE methods with the traditional equating methods. In general, the KE methods produce results very similar to their corresponding traditional equating methods. These similarities in equating results support the KE method as a promising unified approach to test equating based on a flexible family of equipercentile-like equating functions. All of the classical observed-score equating methods can be incorporated into its framework. The summary statistics in Table 17 to Table 28 indicate that the 2SG(.5, .5) linear method and the SG linear method produce very similar equating results in terms of SEE, equating bias and RMSE. Similarly, the 2SG(1, 1) linear and traditional EG linear equating methods provide equating results very close to each other, as do the 2SG(.5, .5) equipercentile, SG KE equipercentile and traditional SG equipercentile equating methods. The equating differences between the 2SG(1, 1) equipercentile method and the traditional EG equipercentile method are small as well. Although the summary statistics in Table 17 to Table 28 indicate that their equating difference is relatively larger than the equating differences between the other approximation pairs discussed above, the actual differences between their equating functions are smaller than 1 raw score point for any score point above the chance score, which is not a large difference. Figure A28 to Figure A34 plot the equating differences between the 2SG(1, 1) equipercentile method and the traditional EG equipercentile method for selected cases. The equating differences between these two methods are the biggest in simulated data 6.

KE provides the SEED statistic for examining the equating difference between two KE methods. The usefulness of this statistic is discussed below.

5.2 Effects of the Weighting Method

The overall equating accuracy consists of two parts: random equating error (SEE)
and systematic error (equating bias). When a CB design is used to collect data for an
equating, the 2SG approach under the KE framework attempts to provide an optimal
equating solution with the least overall equating error, which is indicated by the
magnitude of RMSE in this study.
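
As a reading aid, the relation between the two error components and the RMSE can be
sketched numerically. The sketch assumes the usual decomposition in which the squared
RMSE at a score level equals the squared random error plus the squared bias. The function
name and the input values are illustrative and are not taken from this study.

```python
import math


def rmse_from_components(see: float, bias: float) -> float:
    """Combine random error (SEE) and systematic error (bias) into RMSE.

    Assumption: RMSE^2 = SEE^2 + bias^2 at a given score level.
    """
    return math.sqrt(see ** 2 + bias ** 2)


# Illustrative values only (not results from this study).
print(rmse_from_components(see=0.3, bias=0.4))
```

Under this assumption, a weighting scheme lowers the RMSE only if the squared bias it
removes exceeds the squared random error it adds.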

In the rest of this section, the effect of the weighting method in enhancing overall
equating accuracy is discussed in terms of both equating bias and the overall equating
error.

The study results based on both real and simulated data indicate that the
weighting mechanism is effective to some extent. As DOE gets larger, the weights with
the smallest RMSE also increase (as indicated in Table 30 for simulated data 2 and data 3).
Because random equating error increases as the weights increase, the reduction in RMSE
must be due to a reduction in equating bias. Therefore, the results of this study
demonstrate that the 2SG approach can reduce systematic equating error by adjusting the
weights placed on the data from tests taken first. However, the reduction in equating bias
is generally not significant according to the SEED plots (as indicated in Table 29 for
simulated data 2 and data 3). The reduction in equating bias is only significant when the
sample size is large enough and when DOE is big enough. When this happens, the weights
in the 2SG approach will be (1, 1), which indicates an EG design.

The improvement in terms of RMSE is small for the following reason. As
DOE gets larger, examinees' performance on the second test is more affected by
order effects and is less accurate. The 2SG approach therefore assigns more weight to
the tests taken first to reduce the bias introduced by order effects. The bigger the order
effects, the more weight is put on the tests taken first to reduce bias. However, the
more weight is put on the first tests, the bigger the random equating error becomes.
Because of this trade-off between random equating error and systematic equating error,
when both random and systematic equating errors are considered together, the equating
error in terms of RMSE does not seem to be reduced much.

The findings of this study support the 2SG approach as a sensitive approach with
the flexibility of using optimal data information as the size of order effects changes. The
RMSE index provides more detailed information and can help decide which weights to
use. However, trying every possible weight between 0.5 and 1 and selecting the
fine-grained weights by the RMSE criterion involves lengthy calculations.
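
The weight search described above can be sketched as a simple grid search. This is a
minimal sketch and not the procedure used in this study: `select_weights`, the grid, and
the two toy functions standing in for the SEE and the bias at a given pair of weights are
all hypothetical, and the RMSE is assumed to combine the two components as the square
root of SEE^2 + bias^2. In practice each evaluation would require a full equating run,
which is where the lengthy calculations arise.

```python
import math
from typing import Callable, Iterable, Tuple

Weights = Tuple[float, float]


def select_weights(
    candidates: Iterable[Weights],
    see_fn: Callable[[Weights], float],
    bias_fn: Callable[[Weights], float],
) -> Tuple[Weights, float]:
    """Return the candidate weights with the smallest RMSE, and that RMSE."""
    best, best_rmse = None, math.inf
    for w in candidates:
        # Assumed decomposition: RMSE^2 = SEE^2 + bias^2.
        rmse = math.sqrt(see_fn(w) ** 2 + bias_fn(w) ** 2)
        if rmse < best_rmse:
            best, best_rmse = w, rmse
    if best is None:
        raise ValueError("no candidate weights supplied")
    return best, best_rmse


def toy_see(w: Weights) -> float:
    """Toy stand-in (not study data): random error grows with the weight."""
    return 0.2 + 0.4 * (w[0] - 0.5)


def toy_bias(w: Weights) -> float:
    """Toy stand-in (not study data): bias shrinks as the weight approaches 1."""
    return 0.5 * (1.0 - w[0])


# Hypothetical grid that uses the same weight for both tests taken first.
grid = [(w, w) for w in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0)]
best_weights, best_rmse = select_weights(grid, toy_see, toy_bias)
print(best_weights, best_rmse)
```

The grid may hold any pairs of weights, including unequal ones such as (.5, .75); a
finer grid gives a more fine-grained solution at a proportionally higher cost.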

Other possible ways of determining how to treat the data collected by a CB design
are the hypothesis test of DOE introduced in the method section and the SEED
method applied in this study. If the hypothesis test of DOE is not significant, the data
collected by a CB design shall be pooled together as an SG design. Otherwise, the data
shall be treated as an EG design. The SEED plot method tests the significance of the
equating difference between 2SG(.5, .5) and 2SG(1, 1). If the equating difference is not
significant, the 2SG(.5, .5) method will be used, i.e., data from the two samples will be
pooled together and treated as an SG design. Otherwise, if the equating difference
is significant, the 2SG(1, 1) method will be used, i.e., the data in a CB design will be
treated as an EG design. These two methods may not be as accurate as the RMSE
method, but they are simpler to carry out in practice. Further study can investigate
how consistent the decisions are when these three methods are used to select the best
equating design.
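
The SEED-based decision described above can be sketched as follows. The function name,
its arguments, and the returned labels are hypothetical. The sketch also assumes that
the difference counts as significant when it leaves the band of plus or minus two SEED
at any score level; the text does not state how the score levels are combined into a
single decision.

```python
from typing import Sequence


def choose_design_by_seed(
    diff: Sequence[float],
    seed: Sequence[float],
    multiplier: float = 2.0,
) -> str:
    """Pick a data treatment from a SEED-style comparison.

    diff[i]: equating difference between 2SG(.5, .5) and 2SG(1, 1) at score i.
    seed[i]: standard error of that difference at score i.
    Assumption: significant if |diff| exceeds multiplier * SEED at any score.
    """
    if len(diff) != len(seed):
        raise ValueError("diff and seed must have the same length")
    significant = any(abs(d) > multiplier * s for d, s in zip(diff, seed))
    return "2SG(1, 1) / EG" if significant else "2SG(.5, .5) / SG"


# Made-up values for three score levels (not study data).
print(choose_design_by_seed([0.1, -0.2, 0.3], [0.2, 0.2, 0.2]))
```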

Finally, the results of this study suggest that the advantage of collecting data using
a CB design over an EG design appears only when the magnitude of DOE is small. When
DOE is within a small range, data from the two groups can be pooled together using
different weights to reduce the overall equating error. However, when DOE is large,
information from tests taken second makes no contribution to improving the overall
equating accuracy. On the other hand, this study alerts us to the importance of
implementing random sampling and random assignment in a CB design.

5.3 Limitations of This Study

One concern about real data 2 is that test X and test Y have different test-retest
reliabilities, e.g., r(X1, Y2) = 0.64 and r(X2, Y1) = 0.74. An effort was made to raise the
reliability of test X so that it would equal the reliability of test Y. One approach was to
remove items on test X that had a low correlation with the test score of Y2. This attempt
was not successful: the reliability of test Y increased by a similar amount as the
reliability of test X. As a result, the equatings for real data 2 were conducted
without regard to the issue of unequal reliabilities.

The average equating bias reported in this study also has a disadvantage. When
all the conditional equating differences are averaged, negative bias at some raw-score
levels cancels out positive bias at other raw-score levels.
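
The cancellation can be seen in a small numerical example. The bias values below are
made up for illustration, and the mean absolute bias is shown only as a contrast; it is
not a statistic reported in this study.

```python
from statistics import mean

# Hypothetical conditional biases at five raw-score levels (not study data).
conditional_bias = [-0.6, -0.3, 0.0, 0.3, 0.6]

average_bias = mean(conditional_bias)                    # signed average
mean_abs_bias = mean(abs(b) for b in conditional_bias)   # ignores the sign

print(average_bias, mean_abs_bias)
```

Here the signed average is zero even though the equating is biased at four of the five
score levels.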

5.3.1 Arbitrary Nature of the Equating Criterion

In this study, the equating criterion for each population was selected to be the
results of traditional equipercentile equating. It might be interesting to regard the results
of an IRT-based equating method as the equating criterion for each population. However,
from the author's point of view, this would not change the patterns of the equating
differences between methods much, since Lord and Wingersky (1984)
found that IRT true score equating and equipercentile observed score equating yield
almost indistinguishable results using a sample of size around 3000.

5.3.2 Problem with Simulated Data

Besides the 3PL IRT model, the one-parameter and two-parameter IRT
models were also applied to simulate data in this study. Compared with the 1PL or 2PL
model, the distributions of data simulated by using the 3PL model better represent the
distributions of real data 1 in terms of the minimum observed score level, the mean
scores, the skewness and the kurtosis statistics. Although efforts were made to make the
simulated data as close as possible to a real dataset, as in many simulation studies, it is
uncertain to what extent the simulated data represent real order effects in a real CB
design.

5.4 Future Study

The 95% confidence interval in the current SEED plot is two times the
conditional standard error of equating difference at each raw score level, which indicates
that the current SEED plot conducts an independent t-test at each score level to examine
the significance of the equating difference. One drawback of the current SEED plot is that
it does not control the family-wise error rate. Since the error rate at each score level is
0.05, the overall error rate across the whole score scale must be larger than 0.05. When the
attention is on the equating difference at a particular cut score or within a small score
range, it is fine to apply the ±2SEED confidence interval at each score level.
Nevertheless, when a statement is needed on the overall equating differences
across the whole score scale, a multivariate global test will need to take into account the
dependency among score points and to control the family-wise error rate. Future
study can explore how to develop such an overall test for the significance of the global
equating difference between two equating methods.
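
The size of the family-wise error rate problem can be illustrated with a short
calculation. The sketch assumes independent tests across score points, which is a
simplification: as noted above, the score points are dependent, so the first number is
only a rough indication. The Bonferroni adjustment is shown as one standard way to bound
the family-wise error rate; it is not the global test called for in the text, and the
function names are illustrative.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """FWER for n_tests independent tests, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the FWER at or below alpha (Bonferroni)."""
    return alpha / n_tests


# 76 raw-score points, i.e. scores 0 to 75 as in the real data 1 tables.
n_points = 76
print(familywise_error_rate(0.05, n_points))
print(bonferroni_alpha(0.05, n_points))
```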

APPENDICES

TABLE A1. Standard error of linear equating for real data 1

Data 1   2SG KE                                              SG            EG
         2SG       2SG        2SG        2SG         2SG     Traditional   Traditional
X        (.5,.5)   (.5,.75)   (.75,.5)   (.75,.75)   (1, 1)  Linear        Linear
0 1.334 1.634 1.776 2.433 4.674 1.334 4.648
1 1.311 1.607 1.745 2.393 4.533 1.311 4.575
2 1.288 1.579 1.715 2.354 4.405 1.288 4.503
3 1.265 1.552 1.685 2.315 4.355 1.265 4.431
4 1.242 1.525 1.655 2.276 4.326 1.242 4.359
5 1.219 1.497 1.624 2.236 4.287 1.218 4.287
6 1.196 1.47 1.594 2.197 4.215 1.195 4.215
7 1.173 1.443 1.564 2.158 4.144 1.172 4.143
8 1.15 1.415 1.534 2.12 4.072 1.149 4.071
9 1.127 1.388 1.504 2.081 4 1.127 3.999
10 1.104 1.361 1.474 2.042 3.928 1.104 3.928
11 1.081 1.334 1.444 2.003 3.857 1.081 3.856
12 1.058 1.307 1.414 1.965 3.786 1.058 3.785
13 1.035 1.28 1.385 1.926 3.715 1.035 3.714
14 1.013 1.253 1.355 1.888 3.643 1.013 3.643
15 0.99 1.227 1.325 1.849 3.573 0.99 3.572
16 0.967 1.2 1.296 1.811 3.502 0.967 3.501
17 0.945 1.174 1.266 1.773 3.431 0.945 3.431
18 0.923 1.147 1.237 1.735 3.361 0.922 3.361
19 0.9 1.121 1.208 1.697 3.291 0.9 3.291
20 0.878 1.095 1.178 1.66 3.221 0.878 3.221
21 0.856 1.068 1.149 1.622 3.151 0.856 3.151
22 0.834 1.042 1.12 1.585 3.082 0.834 3.082
23 0.812 1.017 1.092 1.548 3.013 0.812 3.012
24 0.79 0.991 1.063 1.511 2.944 0.79 2.943
25 0.768 0.965 1.034 1.474 2.875 0.768 2.875
26 0.746 0.94 1.006 1.437 2.807 0.746 2.807
27 0.725 0.915 0.978 1.401 2.739 0.725 2.739
28 0.703 0.89 0.95 1.365 2.671 0.703 2.671
29 0.682 0.865 0.922 1.329 2.604 0.682 2.604
30 0.661 0.841 0.894 1.293 2.537 0.661 2.537
31 0.64 0.816 0.867 1.258 2.471 0.64 2.471
32 0.62 0.793 0.84 1.223 2.405 0.62 2.405
33 0.6 0.769 0.813 1.189 2.34 0.6 2.34
34 0.58 0.746 0.786 1.155 2.275 0.58 2.275
35 0.56 0.723 0.76 1.121 2.211 0.56 2.211
36 0.54 0.7 0.735 1.088 2.148 0.54 2.148
37 0.521 0.678 0.709 1.056 2.086 0.521 2.086
38 0.503 0.657 0.685 1.024 2.024 0.503 2.024
39 0.485 0.636 0.661 0.992 1.963 0.485 1.963
40 0.467 0.616 0.637 0.962 1.903 0.467 1.903
41 0.45 0.596 0.614 0.932 1.845 0.45 1.845
42 0.433 0.577 0.592 0.903 1.787 0.433 1.787
43 0.418 0.559 0.571 0.875 1.731 0.418 1.731
44 0.403 0.542 0.55 0.848 1.677 0.403 1.677
45 0.389 0.526 0.531 0.822 1.623 0.389 1.623
46 0.376 0.511 0.513 0.797 1.572 0.376 1.572
47 0.364 0.497 0.496 0.774 1.523 0.364 1.523
48 0.353 0.484 0.481 0.752 1.476 0.353 1.476
49 0.344 0.473 0.467 0.732 1.431 0.344 1.431
50 0.336 0.463 0.455 0.714 1.389 0.336 1.389
51 0.33 0.455 0.445 0.697 1.349 0.33 1.349
52 0.325 0.449 0.437 0.683 1.313 0.325 1.313
53 0.322 0.444 0.431 0.671 1.28 0.322 1.28
54 0.32 0.441 0.427 0.661 1.251 0.32 1.251
55 0.321 0.44 0.425 0.653 1.225 0.321 1.225
56 0.323 0.441 0.426 0.648 1.204 0.323 1.204
57 0.327 0.444 0.429 0.646 1.187 0.327 1.187
58 0.333 0.448 0.434 0.646 1.175 0.333 1.175
59 0.34 0.455 0.442 0.649 1.167 0.34 1.167
60 0.349 0.463 0.451 0.654 1.164 0.349 1.164
61 0.359 0.472 0.463 0.662 1.166 0.359 1.166
62 0.37 0.483 0.476 0.672 1.172 0.37 1.172
63 0.383 0.496 0.491 0.684 1.183 0.383 1.183
64 0.397 0.509 0.507 0.699 1.199 0.396 1.199
65 0.411 0.524 0.525 0.715 1.22 0.411 1.219
66 0.426 0.54 0.543 0.734 1.244 0.426 1.244
67 0.443 0.557 0.563 0.754 1.272 0.442 1.272
68 0.459 0.575 0.584 0.776 1.304 0.459 1.304
69 0.477 0.594 0.606 0.799 1.34 0.477 1.34
70 0.495 0.614 0.629 0.824 1.378 0.495 1.379
71 0.513 0.634 0.652 0.85 1.42 0.513 1.42
72 0.532 0.655 0.676 0.877 1.464 0.532 1.464
73 0.551 0.676 0.701 0.905 1.511 0.551 1.511
74 0.571 0.698 0.726 0.934 1.559 0.571 1.56
75 0.591 0.72 0.751 0.964 1.61 0.591 1.61

TABLE A2. Standard error of equipercentile equating for real data 1

Data 1   2SG KE                                              SG               EG
         2SG       2SG        2SG        2SG         2SG     Traditional      Traditional
X        (.5,.5)   (.5,.75)   (.75,.5)   (.75,.75)   (1, 1)  Equipercentile   Equipercentile
0 1.218 1.269 1.169 1.272 1.778 0 0

1 1.354 1.432 1.349 1.511 2.328 1.159 1.025
2 1.384 1.477 1.409 1.609 2.664 2.318 2.049
3 1.383 1.485 1.428 1.657 2.896 3.478 3.073
4 1.369 1.478 1.43 1.683 3.068 4.512 3.981
5 1.348 1.464 1.424 1.698 3.2 5.325 4.751
6 1.324 1.445 1.414 1.707 3.303 5.928 5.341
7 1.298 1.424 1.4 1.712 3.383 6.335 5.791
8 1.27 1.401 1.385 1.714 3.444 6.65 6.231
9 1.241 1.378 1.369 1.714 3.49 6.724 6.366
10 1.212 1.353 1.352 1.712 3.523 6.774 6.485
11 1.182 1.328 1.335 1.708 3.545 6.777 6.551
12 1.152 1.303 1.317 1.703 3.556 6.762 6.582
13 1.122 1.278 1.298 1.697 3.557 6.755 6.622
14 1.092 1.252 1.28 1.689 3.551 6.747 6.673
15 1.063 1.227 1.26 1.679 3.536 6.758 6.744
16 1.033 1.201 1.241 1.668 3.515 6.778 6.821
17 1.004 1.176 1.221 1.655 3.486 5.972 6.441
18 0.975 1.15 1.2 1.64 3.451 5.548 6.292
19 0.947 1.125 1.179 1.624 3.41 5.347 6.214
20 0.919 1.1 1.158 1.606 3.363 3.977 5.787
21 0.892 1.075 1.136 1.586 3.311 3.277 5.531
22 0.865 1.05 1.113 1.564 3.254 1.957 5.042
23 0.839 1.026 1.09 1.541 3.193 1.516 4.665
24 0.813 1.001 1.067 1.516 3.127 1.284 4.371
25 0.788 0.977 1.043 1.49 3.058 1.018 4.131
26 0.763 0.953 1.018 1.462 2.987 0.866 3.697
27 0.739 0.929 0.993 1.433 2.912 0.801 3.397
28 0.715 0.905 0.968 1.403 2.837 0.786 3.201
29 0.692 0.881 0.942 1.371 2.759 1.121 3.049
30 0.669 0.858 0.916 1.339 2.681 1.197 2.39
31 0.646 0.834 0.889 1.305 2.603 1.267 2.266
32 0.624 0.811 0.862 1.271 2.524 1.201 2.194
33 0.603 0.788 0.836 1.237 2.446 1.027 2.283
34 0.582 0.765 0.809 1.202 2.368 0.762 2.493
35 0.561 0.742 0.782 1.167 2.292 0.835 2.671
36 0.541 0.72 0.755 1.131 2.217 1.085 2.757
37 0.521 0.697 0.728 1.096 2.143 1.344 2.797
38 0.502 0.675 0.702 1.062 2.071 1.305 2.848
39 0.483 0.654 0.676 1.028 2.001 1.279 2.884
40 0.465 0.633 0.651 0.995 1.934 1.071 3.091
41 0.448 0.613 0.626 0.962 1.869 0.849 3.152
42 0.431 0.594 0.603 0.931 1.807 0.875 2.997
43 0.415 0.575 0.58 0.902 1.748 1.047 2.818
44 0.401 0.558 0.559 0.873 1.693 1.008 2.715
45 0.387 0.542 0.539 0.847 1.641 0.929 2.726
46 0.374 0.527 0.52 0.823 1.593 0.772 2.568
47 0.363 0.513 0.504 0.801 1.548 0.831 2.431
48 0.353 0.502 0.489 0.781 1.508 0.896 2.241
49 0.345 0.492 0.477 0.764 1.472 0.764 2.024
50 0.338 0.484 0.467 0.749 1.441 0.687 1.94
51 0.334 0.478 0.46 0.737 1.413 0.75 1.983
52 0.332 0.474 0.455 0.727 1.389 0.934 1.989
53 0.332 0.473 0.454 0.721 1.369 0.988 1.907
54 0.334 0.473 0.454 0.717 1.353 0.745 1.832
55 0.338 0.476 0.458 0.715 1.339 0.619 1.626
56 0.344 0.48 0.464 0.715 1.328 0.59 1.442
57 0.352 0.487 0.472 0.718 1.319 0.574 1.353
58 0.362 0.495 0.482 0.722 1.312 0.539 1.322
59 0.374 0.504 0.494 0.727 1.306 0.541 1.276
60 0.387 0.515 0.508 0.734 1.299 0.5 1.242
61 0.401 0.526 0.522 0.741 1.293 0.534 1.308
62 0.416 0.538 0.538 0.748 1.285 0.623 1.442
63 0.432 0.551 0.554 0.755 1.276 0.86 1.501
64 0.448 0.563 0.569 0.761 1.265 0.928 1.51
65 0.464 0.575 0.585 0.767 1.251 0.984 1.471
66 0.479 0.586 0.599 0.771 1.234 0.682 1.4
67 0.494 0.596 0.613 0.772 1.212 0.486 1.276
68 0.508 0.604 0.624 0.771 1.185 0.613 1.177
69 0.519 0.608 0.632 0.766 1.152 0.745 1.23
70 0.527 0.609 0.635 0.755 1.109 0.945 1.032
71 0.53 0.603 0.632 0.736 1.054 0.859 1.051
72 0.524 0.585 0.618 0.704 0.98 0.58 1.142
73 0.502 0.549 0.585 0.648 0.876 0.88 1.353
74 0.452 0.479 0.516 0.553 0.717 1.326 1.576
75 0.38 0.385 0.419 0.429 0.491 0 0

TABLE A3. Standard error of linear equating for real data 2

Data 2   2SG KE                                              SG            EG
         2SG       2SG        2SG        2SG         2SG     Traditional   Traditional
X        (.5,.5)   (.5,.75)   (.75,.5)   (.75,.75)   (1, 1)  Linear        Linear
0 0.341 0.354 0.403 0.454 0.767 0.341 0.767
1 0.318 0.331 0.377 0.425 0.718 0.318 0.718
2 0.296 0.309 0.352 0.397 0.67 0.296 0.67
3 0.274 0.287 0.327 0.37 0.623 0.274 0.623
4 0.252 0.265 0.302 0.343 0.577 0.252 0.577
5 0.231 0.244 0.278 0.317 0.533 0.231 0.533
6 0.211 0.224 0.255 0.293 0.491 0.211 0.491
7 0.191 0.205 0.233 0.27 0.452 0.191 0.453
8 0.173 0.187 0.212 0.249 0.417 0.173 0.417
9 0.156 0.172 0.193 0.23 0.386 0.156 0.387
10 0.141 0.158 0.176 0.215 0.362 0.141 0.362
11 0.129 0.148 0.162 0.203 0.344 0.129 0.344
12 0.121 0.141 0.151 0.195 0.334 0.121 0.334
13 0.117 0.138 0.146 0.193 0.333 0.117 0.333
14 0.118 0.14 0.145 0.196 0.341 0.118 0.341
15 0.124 0.147 0.15 0.204 0.357 0.124 0.357
16 0.134 0.157 0.16 0.216 0.381 0.134 0.381
17 0.147 0.17 0.173 0.232 0.411 0.147 0.411
18 0.163 0.185 0.189 0.251 0.445 0.163 0.445
19 0.18 0.203 0.208 0.272 0.483 0.18 0.483
20 0.199 0.222 0.229 0.295 0.525 0.199 0.525
21 0.219 0.242 0.251 0.32 0.568 0.219 0.568
22 0.24 0.262 0.274 0.346 0.613 0.24 0.613
23 0.261 0.284 0.298 0.373 0.66 0.261 0.66
24 0.283 0.306 0.323 0.4 0.708 0.283 0.708
25 0.305 0.328 0.347 0.428 0.757 0.305 0.757

TABLE A4. Standard error of equipercentile equating for real data 2

Data 2   2SG KE                                              SG               EG
         2SG       2SG        2SG        2SG         2SG     Traditional      Traditional
X        (.5,.5)   (.5,.75)   (.75,.5)   (.75,.75)   (1, 1)  Equipercentile   Equipercentile
0 0.484 0.548 0.399 0.456 0.72 0 0

1 0.469 0.532 0.393 0.463 0.67 0.711 0.785
2 0.448 0.498 0.393 0.451 0.624 0.827 0.916
3 0.393 0.433 0.362 0.411 0.575 0.845 0.96
4 0.328 0.36 0.318 0.361 0.525 0.448 0.832
5 0.272 0.295 0.277 0.315 0.479 0.353 0.696
6 0.229 0.247 0.244 0.279 0.439 0.239 0.535
7 0.202 0.216 0.222 0.255 0.404 0.286 0.423
8 0.185 0.2 0.207 0.241 0.378 0.268 0.348
9 0.172 0.191 0.198 0.234 0.358 0.201 0.513
10 0.16 0.184 0.189 0.229 0.347 0.196 0.425
11 0.149 0.177 0.18 0.226 0.343 0.198 0.334
12 0.139 0.171 0.172 0.224 0.346 0.192 0.422
13 0.134 0.167 0.166 0.224 0.358 0.179 0.454
14 0.134 0.166 0.165 0.227 0.377 0.216 0.374
15 0.139 0.17 0.169 0.235 0.402 0.213 0.532
16 0.149 0.177 0.178 0.245 0.431 0.231 0.5
17 0.161 0.186 0.19 0.257 0.459 0.24 0.662
18 0.176 0.197 0.204 0.269 0.486 0.305 0.598
19 0.195 0.213 0.219 0.282 0.511 0.312 0.74
20 0.222 0.239 0.235 0.297 0.538 0.345 0.883
21 0.259 0.276 0.254 0.316 0.566 0.406 0.952
22 0.307 0.326 0.277 0.34 0.592 0.439 0.769
23 0.354 0.379 0.297 0.362 0.61 0.839 0.262
24 0.375 0.413 0.298 0.365 0.609 0.779 0.131
25 0.374 0.418 0.304 0.351 0.605 0.671 0

[FIGURES A1 to A27. Equating difference plots; captions and plot contents are not legible in the scanned source.]

Equating Difference, n=50. POP1

 

 

2.5 —
9 1.5 -
8
(D 05 4°“... 0
8 .0 rmWM‘ W.
S '0.5 ‘I .9 .0
LU -1.5 I

'2.5 T T I T I I I I I I I I I I I

O 5 10 1520 253035404550556065 7075
ScoreY

FIGURE A28. Equating difference between ZSG(I, I) equipercentile and E G-
equipercentile, POP], n=50.

Equating Difference, n=100, POP1

2.5 -
a) 1.5 ~
L—
8
U) 0.5 3.”.
“o 0 OWN
93 .0. 0°

. o

(:0 '0.5 .V.
5

-1.5 ~

'2.5 f I T I I T f r I I I

 

 

O 51015202530354045505560657075
ScoreY

FIGURE A29. Equating difference between 2SG(], I) equipercentile and E G
equipercentile, POP], n =1 00.

109

Equating Difference, n=50, POP4

2.5 7
9 1.5 4
8
(I) 0.5 .
'9 '0 5 ... .00”... “’0...
‘3 ' °. ”W
[B- ... ...
-1.5 ~ ...
-2.5 I I f ﬂ ﬁﬁ r f I r j I I I I

 

 

O 51015202530354045505560657075
ScoreY

FIGURE A30. Equating difference between ZSG(I, I) equipercentile and EG
equipercentile, POP4, n=50.

Equating Difference, n=100, POP4

2.5 -
a) 1.5 ~
L-
8
a) 0.5 ~
'0 «0" Inn ”000 °
9 o". ”o 0
g '0'5 J”... ....”W. W
0
La- .0. o.
'1.5 3 .m.’
'2,5 I I7 r I r T I I I j I I I I I

 

 

0 51015202530354045505560657075
ScoreY

FIGURE A31. Equating difference between ZSG(I, I) equipercentile and E G
equipercentile, POP4, n=100.

110

[Plot: Equating Difference, n=300, POP4. Horizontal axis: Score Y (0 to 75); vertical axis range: -2.5 to 2.5.]

FIGURE A32. Equating difference between 2SG(1, 1) equipercentile and EG
equipercentile, POP4, n=300.

[Plot: Equating Difference, n=50, POP6. Horizontal axis: Score Y (0 to 75); vertical axis range: -2.5 to 2.5.]

FIGURE A33. Equating difference between 2SG(1, 1) equipercentile and EG
equipercentile, POP6, n=50.


[Plot: Equating Difference, n=1000, POP6. Horizontal axis: Score Y (0 to 75); vertical axis range: -2.5 to 2.5.]

FIGURE A34. Equating difference between 2SG(1, 1) equipercentile and EG
equipercentile, POP6, n=1000.


REFERENCES

Angoff, W.H. (1971). Scales, norms, and equivalent scores. In R.L. Thorndike (Ed.),
Educational Measurement (2nd ed., pp. 508-600). Washington, DC: American
Council on Education. (Reprinted as W.H. Angoff, Scales, Norms, and Equivalent
Scores. Princeton, NJ: Educational Testing Service, 1984).

Cochran, W.G., & Cox, G.M. (1957). Experimental Designs (2nd ed.). New York: Wiley.

Compaq Visual Fortran 6.5. (2000). Compaq Computer Corporation.

Cook, L.L., & Eignor, D.R. (1991). IRT Equating Methods. Educational Measurement:
Issues and Practice, 10(3), 37-45.

Davey, T., Nering, M.L., & Thompson, T. (1997). Realistic simulation of item response
data. (ACT Research Report Series 97-4). Iowa City, IA: American College
Testing.

von Davier, A.A., Holland, P.W., & Thayer, D.T. (2004). The kernel method of test
equating. New York: Springer Verlag.

von Davier, A.A., Holland, P.W., Livingston, S.A., Casablanca, J., Grant, M.C., &
Martin, K. (2005). An evaluation of the kernel equating method in a
non-equivalent groups design with an external anchor: a special study with
pseudo-tests from real test data. Paper presented at the National Council of
Measurement in Education, Montreal, Canada.

von Davier, A.A., & Kong, N. (2005). A unified approach to linear equating for the
nonequivalent groups design. Journal of Educational and Behavioral Statistics,
30(3), 313-342.

Efron, B. (1982). The jackknife, the bootstrap, and other resampling plans. Philadelphia,
PA: Society for Industrial and Applied Mathematics.

Efron, B., & Tibshirani, R.J. (1993). An introduction to the bootstrap (Monographs on
Statistics and Applied Probability 57). New York: Chapman & Hall.


Han, N., Li, S., & Hambleton, R. K. (2005). Comparing kernel and IRT equating
methods. Paper presented at the National Council of Measurement in Education,
Montréal, Canada.

Hanson, B.A., Zeng, L., & Kolen, M.J. (1993). Standard errors of Levine linear equating.
Applied Psychological Measurement, 17, 225-237.

Harris, D.J., & Crouse, J.D. (1993). A Study of Criteria Used in Equating. Applied
Measurement in Education, 6(3), 195-240.

Holland, P.W., & Thayer, D.T. (1989). The kernel method of equating score
distributions. Program statistics research technical report no. 89-84. Access
ERIC: Fulltext (142 Reports--Evaluative No. ETS-RR-89-7). New Jersey:
Educational Testing Service, Princeton, NJ.

Holland, P.W., & Thayer, D.T. (2000). Univariate and bivariate log linear models for
discrete test score distributions. Journal of Educational and Behavioral Statistics,
25, 133-183.

Holland, P.W., Liu, M., & Thayer, D.T. (2005). Exploring the population sensitivity of
linking functions to differences in test constructs and reliability using the
Dorans-Holland measures, kernel equating and data from the LSAT. Paper presented at the
National Council of Measurement in Education, Montreal, Canada.

KE Software (2004). Computer Program. Princeton, NJ: Educational Testing Service.

Klein, L.W., & Jarjoura, D. (1985). The importance of content representation for
common-item equating with nonrandom groups. Journal of Educational
Measurement, 22, 197-206.

Kolen, M.J. (1981). Comparison of traditional and item response theory methods for
equating tests. Journal of Educational Measurement, 18, 1-11.

Kolen, M.J. (1984). Effectiveness of analytic smoothing in equipercentile equating.
Journal of Educational Statistics, 9, 25-44.

Kolen, M.J. (1988). Traditional equating methodology. Educational Measurement:
Issues and Practice, 7(4), 29-36.


Kolen, M.J., & Brennan, R.L. (2004). Test Equating: Methods and Practices (2nd ed.).
New York: Springer.

Liou, M., & Cheng, P.E. (1995). Asymptotic standard error of equipercentile equating.
Journal of Educational and Behavioral Statistics, 20, 259-286.

Liou, M., Cheng, P.E., & Johnson, E.G. (1997). Standard errors of the kernel equating
methods under the common-item design. Applied Psychological Measurement,
21(4), 349-369.

Liu, J.H., Allspach, J.R., Feigenbaum, M., Oh, H.J., & Burton, N. (2004). A study of
fatigue effects from the new SAT. (Research Report 2004-5 & RR-04-46). New
York: College Entrance Examination Board, & Princeton, NJ: Educational
Testing Service.

Livingston, S.A., Dorans, N.J., & Wright, N.K. (1990). What combination of sampling
and equating methods works best? Applied Measurement in Education, 3, 73-95.

Livingston, S.A. (1993a). An empirical tryout of kernel equating (142 Reports--Evaluative
No. ETS-RR-93-33). New Jersey: Educational Testing Service,
Princeton, NJ.

Livingston, S.A. (1993b). Small sample equating with log-linear smoothing. Journal of
Educational Measurement, 30(1), 23-39.

Lord, F.M. (1950). Notes on comparable scales for test scores (Research Bulletin 50-48).
Princeton, NJ: Educational Testing Service.

Lord, F.M. (1980). Application of item response theory to practical testing problems.
Hillsdale, NJ: Erlbaum.

Lord, F.M. (1982a). The standard error of equipercentile equating. Journal of
Educational Statistics, 7, 165-174.

Lord, F.M. (1982b). Item response theory and equating: A technical summary. In P. W.
Holland and D. B. Rubin (Eds.), Test Equating (pp. 141-148). New York:
Academic Press.


Lord, F.M., & Wingersky, M.S. (1984). Comparison of IRT true-score and equipercentile
observed-score “equatings”. Applied Psychological Measurement, 8, 452-461.

Lu, S., & Kolen, M.J. (1994). Bootstrap standard errors and confidence intervals in
linear equating. Paper presented at the annual meeting of the American
Educational Research Association, New Orleans.

Mao, X., von Davier, A.A., & Rupp, S. (2005). Comparisons of the kernel equating
method with the traditional equating methods on Praxis data. Paper presented at
the National Council of Measurement in Education, Montreal, Canada.

MATLAB version 7.1 (1984-2005). The MathWorks, Inc.

Montgomery, D.C. (2000). Design and analysis of experiments (5th edition). New York:
Wiley.

Moses, T., Yang, W., & Wilson, C. (2005). Using kernel equating to check the statistical
equivalence of nearly identical test editions. Paper presented at the National
Council of Measurement in Education, Montreal, Canada.

Moses, T.P., & von Davier, A.A. (2005). A SAS macro for log linear smoothing:
Applications and implications. Paper presented at the American Educational
Research Association, Montréal, Canada.

Parr, W.C. (1983). A note on the jackknife, the bootstrap and the delta method estimators
of bias and variance. Biometrika, 70(3), 719-722.

Parshall, C.G., Houghton, Du Bose P., & Kromrey, J.D. (1995). Equating error and
statistical bias in small sample linear equating. Journal of Educational
Measurement, 32, 37-54.

Qu, Y., & von Davier, A. (2006). Comparison of two approaches for Counter-Balanced
design in a Kernel Equating framework. Paper presented at the National Council
of Measurement in Education, San Francisco, USA.

Rice, J.A. (1988). Mathematical statistics and data analysis. Monterey, Calif.:
Brooks/Cole.

SAS version 9, (2002). SAS Institute Inc., Cary, NC, USA.


Tsai, T.H. (1998). A comparison of bootstrap standard errors of IRT equating methods
for the common item nonequivalent groups design. Unpublished Dissertation.
Iowa City: University of Iowa.

Yu, L., Anderson, D.O., & Zeller, K. (2003). Report of the counterbalanced equating
study for the Algebra End-Of-Course assessment (Research report SR-2003-56).
Princeton, NJ: Educational Testing Service.

Zeng, L., & Cope, R. (1995). Standard error of linear equating for the counterbalanced
design. Journal of Educational and Behavioral Statistics, 20(4), 337-348.
