A NOVEL APPROACH TO EVALUATE ITEM POOLS: THE ITEM POOL UTILIZATION INDEX
By
Emre Gönülateş
A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Measurement and Quantitative Methods - Doctor of Philosophy
2015
ABSTRACT
In this study, an index to quantify the adequacy of an item pool of an adaptive test for a given set of test specifications and examinee population is introduced. This index is called the Item Pool Utilization Index (IPUI). The IPUI ranges from 0 to 1, with values close to 1 indicating that the item pool can provide optimum items to examinees throughout the test. This index can be used to compare different item pools or to diagnose the deficiencies of a given item pool by quantifying the amount of deviation from a perfect item pool.
Simulation studies were conducted to evaluate the capacity of this index for detecting the inadequacies of both simulated and operational item pools. The added value of this index was compared to the existing methods of evaluating the quality of computerized adaptive tests (CAT).
Results of the study showed that the IPUI can detect even slight deviations of the item pools from an optimal item pool. It can uncover shortcomings of an item pool that other outcomes of CAT cannot detect. Additionally, it can be used to diagnose the weaknesses of the item pool and guide test developers in improving their item pools.
Keywords: Computerized Adaptive Test, Item Pool, Item Pool Design
I dedicate this dissertation to my wife Funda, my children Meryem, Bilge and Elif, and my parents, Meryem and Selahattin.
ACKNOWLEDGEMENTS
This dissertation is the outcome of many years of study at Michigan State University. The idea for this dissertation came from an advanced psychometrics course I took with Dr. Mark Reckase in the Fall semester of 2013. At that time, I was helping Dr. Reckase develop ideal item pools for a testing company. When I saw the item selection algorithm introduced in Han (2012), which was one of the texts we read in the course, the idea of this thesis was born.
First, my deepest thanks go to my academic advisor and committee chair, Dr. Mark Reckase. During my academic life at MSU he has been a wise, thoughtful and kind mentor. He was generous in sharing his wisdom and knowledge. He has been a great example of a hardworking, dedicated and passionate scholar. He introduced me to many projects that taught me both practical and theoretical aspects of our field. He spent many hours with me to cultivate the idea of this thesis into a work that I'm very proud of.
The National Council of State Boards of Nursing (NCSBN) provided financial support at the early stages of this dissertation. The organization also provided the operational data used in this study, which helped me to show the practical use of the index. I want to thank Drs. Ada Woo, Hong Qian and Doyoung Kim from NCSBN for their help and support.
I wish to thank my friends and colleagues, Eun Hye Ham, Uygun, Liyang Mao, Xin Luo, Chi Chang, Francis Smart, Ifeoma Iyioke, Bing Tong, Anne Traynor, Lihong Yang, Tingqiao Chen, Keyin Wang, Hyesuk Jang and Xuechun Zhou, who enriched my life at MSU and contributed to my learning.
I'm grateful to my dissertation committee members, Drs. Spyros Konstantopoulos, Richard Houang and Christopher Nye, for their willingness to review my work. Their feedback improved this dissertation immensely. In addition, I'm thankful to the faculty of the Measurement and Quantitative Methods Program, especially Ken Frank, Tenko Raykov, Kimberly Maier, and Bill Schmidt. They helped me to establish the basis for my advanced studies. I'm grateful for the support of Dr. Ed Roeber. He advised me in the first two years of my studies at MSU and helped me to find my way in this large field of study.
My special gratitude goes to my parents, Meryem and Selahattin Gönülateş, for their sacrifices on my behalf and their unconditional love. Their endless support and encouragement throughout my life have helped me accomplish this and many other goals in life. I'm grateful to my sister Zeynep and brother Ahmet Talha for being there whenever I need them.
My children, Meryem, Bilge and Elif, are the joy of my life. Over many hours they put up with an often absent and distracted father. I thank them for their love and the happiness they bring to my life.
And I owe this dissertation to the unwavering support of my wife, Funda. Without her, I would not have been able to complete this degree. During these years she has had a lot on her shoulders. She gave birth to three wonderful children and raised them, pursued a PhD of her own, was involved in many projects, and much more. Still, she always made sure I had enough space and time for my work. Her moral support helped me to find my way during the gloomy days, and she shared my joy on the many happy days. She spent many hours reading my drafts and gave valuable feedback. I'm thankful to her for encouraging me to follow my dream and believe in myself.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW
2.1 Notation
2.2 Item Response Theory
2.3 Computerized Adaptive Testing
2.3.1 Initial Ability Estimate
2.3.2 Item Selection
2.3.2.1 Maximum Fisher Information
2.3.2.2 Owen's Bayesian Item Selection
2.3.3 Constraints on Item Selection
2.3.3.1 Content Balancing
2.3.3.2 Exposure Control
2.3.4 Ability Estimation
2.3.4.1 Maximum Likelihood Estimation
2.3.4.2 Expected a Posteriori Estimation
2.3.4.3 Owen's Bayesian Estimation
2.3.5 Item Pools in CAT
2.3.5.1 Item Pool Size
2.3.5.2 Item Pool Design and Assembly
2.3.6 Evaluation of the Item Pools
CHAPTER 3 THE ITEM POOL UTILIZATION INDEX
3.1 Relative Efficiency
3.2 Item Pool Utilization Index
3.3 An Example Calculation of IPUI
3.4 Difference between IPUI and Standard Error
3.5 The Limitations of IPUI
CHAPTER 4 RESEARCH QUESTIONS AND METHODS
4.1 Research Questions
4.2 Research Methods
4.2.1 First Phase - Simulated Data
4.2.1.1 Common CAT Specifications
4.2.1.2 Research Question 1
4.2.1.3 Research Question 2
4.2.1.4 Research Question 3
4.2.2 Second Phase - Real Data
4.2.2.1 Research Question 4
4.2.2.2 Research Question 5
CHAPTER 5 RESULTS
5.1 First Phase - Simulated Data
5.1.1 Research Question 1
5.1.1.1 Item Pool and Examinee Ability Discrepancy
5.1.1.2 Item Pool Size
5.1.2 Research Question 2
5.1.2.1 Test Length
5.1.2.2 Exposure Control
5.1.3 Research Question 3
5.2 Second Phase - Operational Data
5.2.1 Ideal Item Pool Generation
5.2.2 Item Pools Used in the Second Phase
5.2.3 Research Question 4
5.2.4 Research Question 5
CHAPTER 6 DISCUSSION
6.1 Summary of the Results
6.2 Practical Uses of IPUI
6.2.1 Quantification of the Item Pool Quality
6.2.2 IPUI in Optimal Test Assembly
6.2.3 IPUI as a Quality Control Tool
6.2.4 IPUI as a Diagnostic Tool
6.2.5 IPUI at Individual and Group Level
6.3 Implications
6.3.1 The Robustness of CAT Procedures to Weak Item Pools
6.3.2 Summary Statistics for IPUI
6.3.3 Commentary on the Results of the Operational Item Pools
6.3.4 IPUI and Measurement Quality
6.4 Limitations of the Study
6.4.1 Generalizability of the Results
6.4.2 A Recommended Value for IPUI
6.4.3 Detection of the Redundant Items in the Item Pool
6.4.4 The Purpose of the Test and the Definition of the Optimum Item
6.5 Future Research Directions
6.5.1 A General Framework for IPUI
6.5.2 Weights for IPUI
6.5.3 IPUI for Other Psychometric Models
6.5.4 Naming of the Index
APPENDICES
APPENDIX A SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 1 - PART 1
APPENDIX B SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 1 - PART 2
APPENDIX C SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 2 - PART 1
APPENDIX D SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 2 - PART 2
APPENDIX E SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 3
APPENDIX F SUPPLEMENTARY FIGURES FOR IDEAL ITEM POOL CREATION
APPENDIX G SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 4
APPENDIX H SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 5
BIBLIOGRAPHY
LIST OF TABLES
Table 3.1 Item Parameters of Test 1 and Test 2
Table 3.2 IPUI Calculation Example
Table 4.1 Item Pool Information for Research Question 3
Table 4.2 Distribution of Content for NCLEX-RN Examination
Table 5.1 Summary Statistics for Research Question 1 - Discrepancy between Item Pool and Ability Distribution
Table 5.2 Means and Standard Deviations of IPUI Values by Item Pool Size
Table 5.3 Summary Statistics for Research Question 2 - Test Length
Table 5.4 Item Exposure Analysis by Test Length Condition
Table 5.5 Summary Statistics for Research Question 2 - Exposure Control
Table 5.6 Item Exposure Analysis by Exposure Control Condition
Table 5.7 Summary Statistics for Research Question 4
Table 5.8 Item Exposure Analysis by Item Pool Condition
Table 5.9 Decision Accuracy Analysis by Item Pool Condition
Table 5.10 Decision Accuracy Conditional on True θ for each Item Pool
Table 6.1 Means and Standard Deviations of Mean IPUIs of the Replications
LIST OF FIGURES
Figure 1.1 Adaptive Test Progress Plots for Two Examinees
Figure 2.1 Comparison of the Information Functions of Six Item Pools Using the Item Pool Information Functions
Figure 3.1 Test Information Functions and Relative Efficiency of Test 1 and Test 2
Figure 3.2 A Demonstration of the Difference between SE and IPUI
Figure 5.1 Summary Statistics for Research Question 1 - Discrepancy between Item Difficulty Distribution of Item Pool and θ Distribution
Figure 5.2 Distribution of Standard Errors within each Discrepancy Condition
Figure 5.3 Distribution of IPUI within each Discrepancy Condition
Figure 5.4 Relationship between Standard Error and IPUI for each Discrepancy Condition
Figure 5.5 Mean Bias Distribution by Item Pool Size Condition
Figure 5.6 Mean Standard Error Distribution by Item Pool Size Condition
Figure 5.7 Mean Squared Error Distribution by Item Pool Size Condition
Figure 5.8 Exposure Rates by Item Pool Size Condition for Replication 19
Figure 5.9 Mean IPUI Distribution by Item Pool Size Condition
Figure 5.10 Relationship between Mean of IPUI and Mean of Standard Error
Figure 5.11 IPUI and Standard Error Relationship for Replication 19
Figure 5.12 Correlation between Standard Error and IPUI for each Replication
Figure 5.13 IPUI and Reliability Relationship for Replication 19
Figure 5.14 Summary Statistics for Research Question 2 - Test Length
Figure 5.15 Relationship between True and Estimated Ability by Test Length Condition
Figure 5.16 Bias Distribution by Test Length Condition
Figure 5.17 Standard Error Distribution by Test Length Condition
Figure 5.18 Mean Squared Error Distribution by Test Length Condition
Figure 5.19 Item Exposure Distribution by Test Length Condition
Figure 5.20 IPUI Distribution by Test Length Condition
Figure 5.21 IPUI and Standard Error Relationship by Test Length Condition
Figure 5.22 Summary Statistics for Research Question 2 - Exposure Control
Figure 5.23 Relationship between True and Estimated Ability by Exposure Control Condition
Figure 5.24 Bias Distribution by Exposure Control Condition
Figure 5.25 Standard Error Distribution by Exposure Control Condition
Figure 5.26 Mean Squared Error Distribution by Exposure Control Condition
Figure 5.27 Item Exposure Distribution by Exposure Control Condition
Figure 5.28 IPUI Distribution by Exposure Control Condition
Figure 5.29 IPUI and Standard Error Relationship by Exposure Control Condition
Figure 5.30 Item Pool Distributions of Proposed Test Plans
Figure 5.31 Mean Bias Conditional on True θ for each Plan (Item Pool)
Figure 5.32 Mean Standard Error Conditional on True θ for each Plan (Item Pool)
Figure 5.33 Mean IPUI Conditional on True θ for each Plan (Item Pool)
Figure 5.34 IPUI Distribution Conditional on True θ for each Plan (Item Pool)
Figure 5.35 Mean IPUI at each Item Number for Selected True θs
Figure 5.36 The Relationship between Intermediate θ Estimate and IPUI
Figure 5.37 The Relationship between Intermediate θ Estimate and Item Difficulty
Figure 5.38 Progress Plot for Ideal Item Pool with Fixed Bin Size 0.4
Figure 5.39 Item Difficulty Distributions by Content Area for Ideal Item Pool with Fixed Bin Size 0.4
Figure 5.40 Item Difficulty Distributions for Item Pools Used in Research Question 4
Figure 5.41 Summary Statistics for Research Question 4
Figure 5.42 IPUI Distribution for each Item Pool Condition
Figure 5.43 The Relationship between True θ and Estimated θ
Figure 5.44 The Relationship between IPUI and Estimated Ability for each Item Pool Condition
Figure 5.45 The Relationship between IPUI and Test Length for each Item Pool Condition
Figure 5.46 Bias Distribution for each Item Pool Condition
Figure 5.47 The Relationship between IPUI and Bias for each Item Pool Condition
Figure 5.48 Standard Error Distribution for each Item Pool Condition
Figure 5.49 The Relationship between IPUI and Standard Error for each Item Pool Condition
Figure 5.50 Mean Squared Error Distribution for each Item Pool Condition
Figure 5.51 Exposure Rate Distribution for each Item Pool Condition
Figure 5.52 The Relationship between Exposure Rates and Item Difficulties Grouped by Content Area for each Item Pool Condition
Figure 5.53 Mean Bias Conditional on True θ for each Item Pool Condition
Figure 5.54 Mean Standard Error Conditional on True θ for each Item Pool Condition
Figure 5.55 Mean Squared Error Conditional on True θ for each Item Pool Condition
Figure 5.56 Mean IPUI Values Conditional on True θ for each Item Pool Condition
Figure 5.57 Mean IPUI Values Conditional on True θ around the Cut Score for each Item Pool
Figure 5.58 IPUI Distribution Conditional on True θ for each Item Pool
Figure A.1 Item Difficulty Distribution (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.2 True θ Distribution (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.3 Distribution of Bias for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.4 Relationship between Bias and IPUI for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.5 Distribution of Mean Squared Error for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.6 Relationship between Mean Squared Error and IPUI for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.7 Two Examinees with Same Standard Errors but Different IPUI Values (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure B.1 True θ Distribution (Research Question 1 - Part 2)
Figure B.2 Item Difficulty Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.3 Bias Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.4 Standard Error Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.5 Mean Squared Error Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.6 IPUI Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure C.1 Item Difficulty Distribution for Research Question 2 - Test Length Conditions
Figure C.2 True θ Distribution for Research Question 2 - Test Length Conditions
Figure C.3 IPUI and Bias Relationship by Test Length Condition
Figure D.1 Item Difficulty Distribution for Research Question 2 - Exposure Control
Figure D.2 True θ Distribution for Research Question 2 - Exposure Control
Figure D.3 IPUI and Bias Relationship by Exposure Control Condition
Figure E.1 The Bias Distribution at each True θ Value for each Item Pool Condition
Figure E.2 The Standard Error Distribution at each True θ Value for each Item Pool Condition
Figure F.1 Progress Plot for Ideal Item Pool with Fixed Bin Size 0.8
Figure F.2 Item Difficulty Distributions by Content Area for Ideal Item Pool with Fixed Bin Size 0.8
Figure G.1 True θ Distribution for Research Question 4
Figure G.2 The Relationship between Estimated Ability and Bias for each Item Pool Condition
Figure G.3 The Relationship between Estimated Ability and Standard Error for each Item Pool Condition
Figure G.4 The Relationship between Test Length and Standard Error for each Item Pool Condition
Figure G.5 The Relationship between IPUI and Mean Squared Error for each Item Pool Condition
Figure H.1 The Relationship between True θ and Estimated θ for each Item Pool Condition
Figure H.2 The Bias Distribution at each True θ Value for each Item Pool Condition
Figure H.3 Mean Standard Error Conditional on Restricted True θ Range for each Item Pool Condition
Figure H.4 The Standard Error Distribution at each True θ Value for each Item Pool Condition
Figure H.5 The Test Length Distribution at each True θ Value for each Item Pool Condition
KEY TO ABBREVIATIONS
1PL: one-parameter logistic
2PL: two-parameter logistic
3PL: three-parameter logistic
CAT: computerized adaptive testing
EAP: expected a posteriori
IPUI: item pool utilization index
IRT: item response theory
MAP: maximum a posteriori
MCAT: multidimensional computerized adaptive test
MFI: maximum Fisher information
MIRT: multidimensional item response theory
MLE: maximum likelihood estimation
MSE: mean squared error
NCLEX-RN: National Council Licensure Examination for Registered Nurses
NCSBN: National Council of State Boards of Nursing
P&P: paper and pencil
RMSE: root mean squared error
SE: standard error
SEM: standard error of measurement
TIF: test information function
CHAPTER 1
INTRODUCTION
The increasing availability of computers and the relative advantages of computerized adaptive testing (CAT) over paper-based tests have boosted the use of CAT in recent decades. Mainly, a CAT enables more efficient measurement of examinee abilities, shorter test lengths, and more precise ability estimates for examinees at the extreme ends of the ability distribution. But these benefits come with costs. Among others, the requirement for a large item pool is the most challenging one. An item pool is the collection of items that will be used to construct individual adaptive tests for examinees. An item pool should include a sufficient number of high quality items that are targeted to the examinee population (Parshall, Spray, Kalohn, & Davey, 2002). It should meet the content specifications of the test and provide sufficient information at all levels of the ability distribution of the target population (van der Linden, Ariel, & Veldkamp, 2006). Flaugher (2000) underlined the importance of item pools in a CAT:
    Obviously, the better the quality of the item pool, the better the job the adaptive algorithm can do. The best and most sophisticated adaptive program cannot function if it is held in check by a limited pool of items, or items of poor quality. (p. 38)
To demonstrate the importance of a large item pool, a very basic adaptive test has been simulated. Figure 1.1 shows the CAT progress of two examinees for a test with the same test specifications and item pool. The item pool consists of 50 items with item difficulties generated from a standard normal distribution. The points in the figure show the intermediate ability estimates of the examinees. Blue 'b' points represent the difficulty parameters of the items that are administered to the examinees at each stage of the adaptive test. The true ability parameters of Examinee 1 and Examinee 2 are 0 and 1.5, respectively. For the first examinee, the item pool is very suitable. At each step, the item pool can provide an item with a difficulty parameter that is very close to Examinee 1's intermediate ability estimate. On the other hand, for Examinee 2, after correctly answering a couple of questions, the item pool is out of difficult items. As a result, even though Examinee 2 correctly answers each item, the CAT algorithm presents easier items to this examinee.
Figure 1.1: Adaptive Test Progress Plots for Two Examinees
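The depletion effect described for Examinee 2 can be sketched in a few lines of Python. This is a simplified illustration and not the simulation behind Figure 1.1: the helper name `closest_gaps`, the random seed, and the choice to hold the target ability fixed (instead of re-estimating it after every response) are all assumptions made for the example.

```python
import random

def closest_gaps(pool, target, n_items):
    """Repeatedly take the unused item whose difficulty is closest to
    `target` and record the distance |b - target| of each item taken."""
    remaining = sorted(pool)
    gaps = []
    for _ in range(n_items):
        best = min(remaining, key=lambda b: abs(b - target))
        gaps.append(abs(best - target))
        remaining.remove(best)  # an administered item cannot be reused
    return gaps

# Hypothetical pool: 50 difficulties drawn from a standard normal distribution.
rng = random.Random(0)
pool = [rng.gauss(0.0, 1.0) for _ in range(50)]

gaps_mid = closest_gaps(pool, 0.0, 10)   # target near the center of the pool
gaps_high = closest_gaps(pool, 1.5, 10)  # target in the sparse upper tail
```

For a pool drawn from a standard normal distribution, the gaps for a target of 1.5 would typically grow faster than those for a target of 0, because fewer items lie near 1.5.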
The consequences of the inadequacy of an item pool can be serious depending on the stakes of the exam. For example, as seen in Figure 1.1, the standard error of the ability estimate for the second examinee is higher. The measurement error of examinees with abilities similar to that of Examinee 2 will be higher, and this can affect the decisions made from their test scores. Also, even though Examinee 2 correctly answers each item, the items are getting easier and easier. This is contrary to the basic premise of a CAT. This will probably affect the motivation of the examinee. So, test developers should ensure that their item pools support the purposes of their tests. The index developed in this dissertation will give test developers a tool to evaluate the quality of their item pools.
Test developers are very motivated to keep their item pools as efficient as possible. They want to use as many items as possible to ensure the quality of the test, but they do not want to use item pools that are larger than necessary, because item pool development is an expensive enterprise. Breithaupt, Ariel, and Hare (2010) estimated that for a 40 item CAT that will span over 5 years with two administrations per year, 2000 items are necessary. Considering that developing one item with traditional item development methods (with all necessary control mechanisms) costs between $1,500 and $2,500 (Rudner, 2010), the total cost of an item pool reaches $3,000,000 to $5,000,000 (Gierl & Lai, 2013). As a result, test developers are motivated to reduce the size of their item pools and make their item pools as efficient as possible.
Evaluation of the item pool is very important because of the possible effects of a weak item pool on the decisions based on the test results. An inappropriate item pool can increase the biases and standard errors of the ability estimates. If the test is a variable length CAT, larger standard errors would increase the test length. If the item pool is sufficient for one group of examinees and not for another, this will impair the fairness of the test. Some groups of examinees might have less precise ability estimates or longer tests depending on the test specifications. Simulations might reveal such problems. But tracing back to the source of such problems might not be straightforward, especially for CATs with complex designs.
Weak item pools might also cause the violation of some test specifications or the administration of inappropriate items. Test developers should evaluate their item pools and ensure that their item pools are adequate for the test specifications of their CATs. For example, Eggen and Verschoor (2006) observed that the CAT algorithm they designed produced tests that did not meet the test specifications. The CAT algorithm failed to provide appropriate items to the examinees. The authors attributed this to the lack of appropriate items in the item pool.
The motivation for this study is to create an index that quantifies the performance of an item pool for a given adaptive test and examinee population. Perfect item pool performance is achieved when an item pool can provide a perfect item to an examinee regardless of the ability of the examinee or the stage of the test, while meeting all of the constraints of the test. A perfect item is an item with the maximum possible amount of information at a given ability level. For example, for the one-parameter logistic (1PL) model, the perfect item for a given ability has a difficulty parameter equal to this given ability value.
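As a brief sketch of why this holds, assume the 1PL form of the logistic model with scaling constant $D$ and item difficulty $b$ (notation that is introduced formally in Chapter 2). Under that assumption:
\[
P(\theta) = \frac{1}{1 + e^{-D(\theta - b)}}, \qquad I(\theta) = D^2 \, P(\theta)\,[1 - P(\theta)] \le \frac{D^2}{4},
\]
with equality exactly when $P(\theta) = 1/2$, that is, when $b = \theta$.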
In practice, almost all item pools are imperfect. But evaluating the efficiency of an item pool is not straightforward. Adaptive tests are usually evaluated with outcome variables such as the standard error of measurement (SEM), the bias of the estimates, the root mean squared error (RMSE), item exposure rates, overlap rates, or decision accuracy. All of these are very valuable indicators that show different aspects of the quality of an adaptive test. But the quality of an item pool cannot be evaluated solely by any of these indicators.
The quality of an item pool is intermingled with many aspects of an adaptive test, such as the item selection procedures, the ability estimation mechanisms, the constraints imposed on adaptive tests, and the test specifications. When evaluating the outcomes of an adaptive test, it is usually not possible to single out the effect of each of these individual factors without performing a large simulation study. The index that is developed in this dissertation aims to quantify, for an item pool, the amount of deviation from a perfect item pool. A test developer will be able to use this index to evaluate the item pool's prospective performance. Consequently, this will lead to a decision to keep the item pool intact, to improve it by adding more appropriate items, or to remove the redundant items and save them for future administrations.
CHAPTER 2
LITERATURE REVIEW
2.1 Notation
In this thesis, the notation used by van der Linden and Pashley (2010) has been followed. Items in the item pool are denoted by $i = 1, \ldots, I$. The order of presentation of items is denoted by $k = 1, \ldots, K$, where $K$ is the test length. So, $i_k$ denotes the index of the item in the item pool which is the $k$th selected item in the adaptive test. The set of items that are already administered before the selection of the $k$th item is denoted as $S_{k-1} = \{i_1, \ldots, i_{k-1}\}$. The set of items remaining in the item pool after the administration of $k-1$ items is denoted as $R_k = \{1, \ldots, I\} \setminus S_{k-1}$. The response string of an examinee will be denoted as $u_{i_1}, u_{i_2}, \ldots, u_{i_K}$. In this study only dichotomous items are used, so the value of $u_{i_k}$ can be either 0 for an incorrect response or 1 for a correct response. The ability parameter of examinees will be denoted by $\theta \in (-\infty, \infty)$. The examinees are indexed with $j \in \{1, \ldots, N\}$, where $N$ is the total number of examinees.
2.2 Item Response Theory
Item response theory (IRT) is the backbone of a CAT. Almost all of the scoring and item selection algorithms use IRT. Especially in a CAT, since different examinees see different items at different times, the existence of a common scale is paramount. IRT provides this common scale for items and examinees.
In IRT (Lord & Novick, 1968; Lord, 1980), the probability of a correct response of an examinee with ability parameter $\theta$ to an item $i$ is modeled as:
\[
P(u_i = 1 \mid \theta) = c_i + (1 - c_i)\,\frac{e^{D a_i (\theta - b_i)}}{1 + e^{D a_i (\theta - b_i)}}, \qquad
u_i = \begin{cases} 1 & \text{if item } i\text{'s response is correct} \\ 0 & \text{if item } i\text{'s response is incorrect} \end{cases}
\tag{2.1}
\]
where $a$ is the item discrimination parameter, $b$ is the item difficulty parameter, $c$ is the lower asymptote parameter, and $D$ is the scaling factor, which is usually taken as 1.7. The model in Equation (2.1) is called the three-parameter logistic (3PL) model. Fixing the $c$ parameter to 0 gives the two-parameter logistic (2PL) model, and further fixing the $a$ parameter to 1 gives the 1PL model, or the Rasch model when $D = 1$ (Rasch, 1961).
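Equation (2.1) can be written as a small function. The following is a minimal sketch; the function name `prob_3pl` and its default argument values are assumptions made for illustration. It uses the algebraically equivalent form $e^{z}/(1+e^{z}) = 1/(1+e^{-z})$.

```python
import math

def prob_3pl(theta, a=1.0, b=0.0, c=0.0, D=1.7):
    """Probability of a correct response under the 3PL model (Equation 2.1).

    With c = 0 this is the 2PL model; with c = 0 and a = 1 it is the 1PL
    model, and additionally setting D = 1 gives the Rasch form.
    """
    z = D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# When theta equals b, the logistic part is 0.5, so P = c + (1 - c) / 2.
p_no_guessing = prob_3pl(0.0, a=1.0, b=0.0, c=0.0)
p_with_guessing = prob_3pl(0.0, a=1.0, b=0.0, c=0.2)
```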
2.3 Computerized Adaptive Testing
CAT is a complex method of delivering a tailored exam that adapts to an individual examinee. The research supporting the development of CAT has more than 40 years of history, and CAT has been used in operational testing for the past twenty years.
Compared to paper and pencil (P&P) tests, CAT has several advantages, such as shorter tests, increased test reliability, on-demand testing, and immediate test scoring and reporting (Meijer & Nering, 1999). A CAT uses half as many items as a P&P test (Weiss & McBride, 1984), and in some cases even fewer (Gibbons et al., 2008). A CAT allows the measurement of information such as response times (Wise, Bhola, & Yang, 2006), speech entries, graphical entries, mouse movements, and other tracking information that is not available in P&P tests. A CAT also allows the use of innovative items, which are not available in P&P tests and can help to increase the validity evidence of tests (Luecht & Clauser, 2002).
The list below shows the basic algorithm of a CAT:
1. Specify an initial ability estimate ($\hat{\theta}_0$) as a starting point.
2. Select an item from the available items in the item pool to deliver to the examinee.
3. Score the item and update the examinee's ability estimate ($\hat{\theta}_k$).
4. Evaluate the termination criteria:
   a) If satisfied, conclude the test.
   b) If not, go to step 2.
In the following sections each part of the list above will be explained further.
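The four steps of the basic algorithm can be sketched as a short simulation. Everything specific in this example is an assumption chosen to keep it small: a Rasch model, nearest-difficulty item selection, grid-based expected a posteriori scoring with a standard normal prior, simulated responses, and the hypothetical names `p_correct`, `eap_estimate`, and `run_cat`. It is not the procedure used in this dissertation.

```python
import math
import random

def p_correct(theta, b):
    """Rasch response probability (Equation 2.1 with a = 1, c = 0, D = 1)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses, grid):
    """Posterior mean and SD of theta on a grid, standard normal prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for b, u in responses:
            p = p_correct(t, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(pool, true_theta, max_items=20, se_target=0.3, seed=0):
    rng = random.Random(seed)
    grid = [i / 10.0 for i in range(-40, 41)]
    remaining = list(pool)
    theta_hat, se = 0.0, float("inf")  # step 1: fixed starting point
    responses = []
    # Step 4: stop when the pool is empty, the test is long enough,
    # or the posterior SD has reached the target.
    while remaining and len(responses) < max_items and se > se_target:
        # Step 2: pick the unused item closest to the current estimate.
        b = min(remaining, key=lambda x: abs(x - theta_hat))
        remaining.remove(b)
        # Step 3: score a simulated response and update the estimate.
        u = 1 if rng.random() < p_correct(true_theta, b) else 0
        responses.append((b, u))
        theta_hat, se = eap_estimate(responses, grid)
    return theta_hat, se, responses

pool_rng = random.Random(1)
pool = [pool_rng.gauss(0.0, 1.0) for _ in range(50)]
theta_hat, se, responses = run_cat(pool, true_theta=0.0)
```

The termination check sits in the loop condition, so step 4 is evaluated before every new item is selected.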
2.3.1 Initial Ability Estimate
In the first step of a CAT, an appropriate initial ability estimate should be designated. There are two options for the starting point of a CAT: a variable starting point and a fixed starting point. When a fixed starting point is used, each examinee is assigned the same initial ability estimate at the beginning of the CAT. In tests with a variable starting point, some prior information about the examinee guides the choice of the initial ability estimate.
It is recommended that the initial ability estimate use all of the available information about the examinee (Kingsbury & Wise, 2000). Examples of this prior information might be previous test results, school grades, student background information, and other relevant collateral information that is correlated with the examinee's ability. Variable starting points will increase the efficiency of the adaptive test. For some examinees, this initial estimate might be off. But the adaptive nature of the test will solve this problem. Also, if the abilities of examinees are estimated using Bayesian estimation methods, the variable starting point will reduce the bias of these estimates (Weiss & McBride, 1984; Wang & Vispoel, 1998). For maximum likelihood estimation (MLE), Wang and Vispoel (1998) found negligible effects of using variable and fixed starting points on bias.
Using variable starting points has an added benefit for test developers too. If a fixed starting point is used for all examinees, each examinee will see the same items at the beginning of the test (supposing that there is no exposure control). Consequently, some items will be overexposed. Variable starting points can alleviate this problem.
Despite their many psychometric benefits, variable starting points have an important drawback. Using information that possibly affects an examinee's ability estimate beyond the examinee's current test performance might be objectionable depending on the test purpose. This is the main reason why many high stakes tests use a fixed starting point.
Usually 0 is chosen as a fixed starting point for a CAT, because it is the middle of the ability distribution. But depending on the circumstances, different starting points might be considered. For example, if the test length of a CAT is rather long and test developers want examinees to warm up to the exam and make an easy start, a lower starting point can be set. For licensure exams, the initial starting point might be set at the cut score of the test.
2.3.2 Item Selection
The item selection algorithm is the most important part of the adaptive test. At each stage, the CAT algorithm should select the most appropriate item, usually in the presence of many constraints. Item information is an important determinant of the item selection process. Almost all item selection algorithms try to select an item that will give the largest amount of information about the examinee. In this dissertation only a few of the available item selection algorithms were investigated in detail, but there are many more in the CAT literature (van der Linden, 1998a; Barrada, Olea, Ponsoda, & Abad, 2010; van der Linden & Pashley, 2010).
2.3.2.1MaximumFisherInformation
ByfarthemostuseditemselectionalgorithminaCATismaximumFisherinformation
(MFI)(Lord,1977a).Inthismethod,theitemthathasthemaximumamountofinformation
atanexaminee'sintermediateabilityestimate(
^

)willbeadministered.Foratestwith
K
8
items,theFisherinformationfunctioncanbeexpressedas:
I
(

)=

E

@
2
@
2
log
L
(
u
j

)


=
K
X
k
=1

P
i
k
0

2
P
i
k
Q
i
k
(2.2)
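where $L(u \mid \theta)$ is the likelihood function, defined as:

\[
L(u \mid \theta) = L\left(U_{i_1}=u_{i_1},\ldots,U_{i_K}=u_{i_K} \mid \theta\right) = \prod_{k=1}^{K} P_{i_k}^{u_{i_k}}\left(1-P_{i_k}\right)^{1-u_{i_k}} \tag{2.3}
\]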
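where $P_{i_k} = P\left(u_{i_k}=1 \mid \theta\right)$ is defined in Equation (2.1). For the 3PL model, substituting the values and calculating the derivative of Equation (2.1) for item $i$ gives:

\[
I_i(\theta) = \frac{(Da_i)^2\,(1-c_i)}{\left[c_i + e^{Da_i(\theta-b_i)}\right]\left[1 + e^{-Da_i(\theta-b_i)}\right]^2} \tag{2.4}
\]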
At each step of the CAT, the MFI algorithm searches for an item that maximizes the total information (Equation (2.2)) given the previously administered items $S_{k-1}$.
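The selection step can be illustrated with a minimal sketch. Because the information in Equation (2.2) is additive over items, maximizing the total information given the items already administered amounts to picking the unused item with the largest item information (Equation (2.4)) at the current estimate. The scaling constant `D = 1.7`, the function names and the three-item pool are assumptions made for illustration only.

```python
import math

D = 1.7  # scaling constant; assumed value, the text only names it D


def info_3pl(theta, a, b, c):
    """Item information of a 3PL item at ability theta (Equation (2.4))."""
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1 - c) / ((c + math.exp(z)) * (1 + math.exp(-z)) ** 2)


def select_mfi(theta_hat, pool, administered):
    """Index of the unused item with maximum information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: info_3pl(theta_hat, *pool[i]))


# Hypothetical pool of (a, b, c) triples, for illustration only.
pool = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]
first = select_mfi(0.0, pool, administered=set())
second = select_mfi(0.0, pool, administered={first})
```

With these illustrative parameters the item located at the current estimate is chosen first, and the remaining items are ranked by how much information they still carry at that estimate.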
MFI has some important advantages. At each step, it selects the most informative item, which increases the efficiency of the CAT. The precision of the test increases rapidly as more items are administered. It is widely used and its properties are well researched.

The MFI method only uses the examinee's current test data to select a new item. This is desirable in some circumstances. But, usually at the early stages of the CAT, there is not enough information to guide this item selection algorithm. As a result, the items that are selected might not be the most appropriate ones. Additionally, the items at the beginning of the test cause big jumps in the ability estimates. This is the reason why test preparation companies tell their students to be extra careful while responding to the first few items on the CAT. This problem might be alleviated by using prior information to select items or using different item selection paradigms such as Kullback-Leibler, at least at the beginning of the test (Chang & Ying, 1996).

Because the MFI method selects the most informative items, it will reduce the uncertainty of the ability estimates, i.e. the standard error (SE), more quickly. Han (2012) found that MFI resulted in the lowest SE of estimate regardless of the exposure control. Even though this is desirable, one possible side effect of this might be on variable-length CATs. For those tests, usually the test terminates when the SE drops below a predefined threshold. With MFI, there is a chance that tests terminate prematurely. Also, Warm (1989) showed that, for shorter tests, approximating the standard errors using information functions underestimates the true error variances.

MFI has some other disadvantages as well. For short tests, Chen, Ankenmann, and Chang (2000) found that MFI performed marginally worse than the other item selection methods they investigated. For tests longer than 10 items, they found that this difference disappeared.

Using items solely based on their information values will result in disproportionate use of some highly informative items (Way, 1998). This has two disadvantages. First, highly informative items, which usually have high item discrimination parameters, are exposed a lot. On the other hand, items with low item discrimination parameters might not be exposed at all. Second, since very informative items are used at the beginning of the test, where the provisional ability estimates are inaccurate, the ability estimates will be over- and underestimated (Chang, 2004). To mitigate this problem, methods such as the $a$-stratified item selection method (Chang, Qian, & Ying, 2001) or the $a$-stratified with $b$-blocking item selection method (Chang et al., 2001) have been proposed.
2.3.2.2 Owen's Bayesian Item Selection

Owen's Bayesian item selection algorithm (Owen, 1975) is based on the reduction of the posterior variances of the ability estimates. From a Bayesian framework, the posterior distribution of the ability given the responses to the previous $k-1$ items is:

\[
g(\theta \mid u) = P\left(\theta \mid u_{i_1},\ldots,u_{i_{k-1}}\right) = \frac{P(u,\theta)}{P(u)} = \frac{L(u \mid \theta)\,g(\theta)}{\int L(u \mid \theta)\,g(\theta)\,d\theta} \tag{2.5}
\]
where $g(\theta)$ is the prior distribution of the ability, which is usually a normal distribution. The calculation of this posterior distribution is not computationally simple. Owen used a normal approximation to this posterior distribution and proved that, as the number of administered items goes to infinity, the expected value of the posterior distribution will converge to the true value of $\theta$.
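Each examinee will start the test with an initial ability estimate that is equal to the expected value of the prior distribution, $g(\theta)$. At each stage, Owen's item selection algorithm searches for the item that will reduce the posterior variance most. According to Owen, this can be achieved by minimizing the $\Psi$ function (Vale & Weiss, 1977):

\[
\Psi_i = \frac{1}{1-c_i}\left(1+\frac{1}{\sigma_0^2 a_i^2}\right)\left(1-\frac{1}{K}\right)\left(c_i+\frac{1-c_i}{K}\right)e^{2D^2}
\]

where

\[
D = \frac{b_i-\mu_0}{\sqrt{2\left(a_i^{-2}+\sigma_0^2\right)}}, \qquad
K^{-1} = \frac{1-\operatorname{erf}(D)}{2}, \qquad
\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt,
\]

and $\mu_0$ and $\sigma_0$ are the mean and standard deviation of the prior distribution, respectively.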
After the examinee answers an item, a new posterior distribution is computed using the item response and the prior distribution. Then, this new posterior distribution becomes the prior distribution for the selection of the next item. When the variance of the posterior distribution reduces to a level that the test developer can tolerate, the test ends.

At each stage, the posterior variance is calculated for each available item in the item pool. When computer processing speeds were slow, this caused long waits after the examinee's response. Vale and Weiss (1977) proposed a rapid item search procedure to solve this problem at that time. But as computer speeds increased, this problem vanished. As discussed in Section 2.3.4.3, Owen's item selection algorithm is much faster than other Bayesian methods.
2.3.3 Constraints on Item Selection

In theory, the item that is most informative at the examinee's intermediate ability estimate should be administered, but in practice this rarely happens. There are several statistical or non-statistical rules that constrain the item selection. These constraints ensure that each test follows the test specifications as closely as possible and each examinee gets a comparable and standardized test. In addition, they help test developers to secure their item pools.

In operational testing, the number of constraints on item selection can go up to 100. For example, van der Linden et al. (2006) reported that there are 96 constraints in the LSAT. In adaptive test simulations, Stocking (1994) used up to 75 constraints on item selection.

The majority of the constraints on item selection are content balancing, exposure control and item enemies (Weiss, 2011). But constraints are not limited to these. Eignor, Stocking, Way, and Steffen (1993) gave a detailed account of such constraints. For example, a test developer might not want to use items that contain uncommon words more than once or twice in a test. Item format can be an important constraint as well. For instance, in a sentence completion section, a test developer might want to preserve a certain ratio of items that contain a single blank as opposed to two blanks.

Some items should not be presented to an examinee within the same test. Examples are item enemies, which provide a clue to the solution of each other, and redundant items, which are very similar to each other. In such cases, the item selection algorithm should not select such items if one of them is already administered. In P&P tests, these items can be avoided before presenting the test to the examinees by checking the test forms. In a CAT, item enemies can be bundled in subsets. If an item within a subset is administered to an examinee, the remaining items are removed from the available item pool for that examinee.
2.3.3.1 Content Balancing

Each test supports an inference about examinee scores. If the inferences of a test are related to general mathematics ability and the majority of the test content comes from trigonometry, then the test scores do not reflect the claims of the test. For this reason, each test needs to follow a test blueprint. This blueprint delineates the details about the test, including the content area distribution of items, the cognitive requirements of items, etc. Content balancing is a mechanism needed for a CAT to follow the test specifications and to avoid over-testing or under-testing of some content areas. Most test specifications set desired content coverage ratios such that certain percentages of the test items come from each content domain.

Kingsbury and Zara (1989) proposed a simple and intuitive content balancing algorithm. First, test developers specify the percentage of test items that should come from each content area, i.e. the target percentages. After the administration of each item, the computer calculates the empirical percentages of each content area. Then, these empirical percentages are compared to the target percentages. The content area which has the largest discrepancy between the target and empirical percentages is selected. Items from other content areas are filtered out from the item pool and the next item will be delivered from the available items within this content domain. Basically, in this method the item pool is partitioned into smaller item pools according to item content. At each stage an item from one of these smaller item pools is selected.
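The content-area choice described above can be sketched in a few lines. This is only an illustration of the target-versus-empirical comparison, not the published implementation; the function name, the three content areas and their target proportions are hypothetical, and the discrepancy is taken as target minus empirical so that the most under-represented area is selected.

```python
def next_content_area(targets, counts):
    """Pick the content area with the largest shortfall against its target.

    targets: dict mapping area -> target proportion (summing to 1)
    counts:  dict mapping area -> number of items administered so far
    """
    total = sum(counts.values())

    def shortfall(area):
        empirical = counts[area] / total if total else 0.0
        return targets[area] - empirical

    return max(targets, key=shortfall)


# Hypothetical blueprint and a simulated 10-item test.
targets = {"algebra": 0.5, "geometry": 0.3, "trigonometry": 0.2}
counts = {"algebra": 0, "geometry": 0, "trigonometry": 0}
sequence = []
for _ in range(10):
    area = next_content_area(targets, counts)
    counts[area] += 1
    sequence.append(area)
```

Once the area is chosen, the item itself would be picked from that area's sub-pool by the item selection rule in use, for example MFI.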
One problem with fixing the number of items from each content area is the potential interaction between item content and item difficulty (Segall, 1996). If, for example, trigonometry items are the difficult items and arithmetic items are the easier items in the item pool, a low-ability examinee might need to answer trigonometry items that are way above his/her ability level. Clearly this reduces the efficiency of a CAT. A CAT using multidimensional item response theory (MIRT) might be more effective for such situations.

Besides this rather simple content balancing method, there are other techniques as well (He, Diao, & Hauser, 2014). Examples of some other content balancing methods are the shadow test approach (van der Linden, 2010), the weighted deviations model (Swanson & Stocking, 1993) and the maximum priority index method (Cheng & Chang, 2009).
2.3.3.2 Exposure Control

Depending on the stakes of the examination, test developers may not want some items to be overused. As the stakes of the test increase, the incentive for test users to obtain the test items without permission increases. Some test preparation organizations even tried systematically to obtain test items (Davey & Nering, 2002). Therefore, test developers should protect their items from such attempts. Test developers also do not want some items to be underused, due to the high cost of producing items. In order to control the usage of items in a CAT, the frequency of item administration is constrained using exposure control methods. Exposure control procedures are needed to maintain the fairness and the validity of the test by preventing examinees from having pre-knowledge of the items.

Test developers deal with the item exposure problem in two general ways. They deal with it before the examination by managing the item pools, and/or they deal with it during the examination by putting some constraints on the item selection algorithm.

Test developers may choose to use an item pool for a certain amount of time and then change it. The amount of time depends on the frequency and volume of the test administration and the stakes of the test. The test developer can use a completely new item pool, use a previous item pool, or remove only the problematic and highly exposed items from the pool.

Item exposure can be controlled during the examination as well. Broadly, exposure control methods during the test administration can be divided into two groups (Way, 1998): methods based on randomization and methods based on the frequency of administration of items for a particular population.
There are many variations of the randomization approach to exposure control. The randomesque procedure (Kingsbury & Zara, 1989) is a simple way of dealing with exposure control. Using this procedure, at each stage, the CAT algorithm randomly selects one item from the most informative $m$ items, where $m \in \{2, 3, \ldots\}$. Another method mentioned in Eignor et al. (1993) is a variation of this randomesque procedure. The first item is selected from a group of the eight best items, the second item is selected from a group of the seven best items, and so on. After the eighth item the optimal item is selected. The idea behind this approach is that, without randomization, almost all examinees would see the same set of items at the initial stages of the test. After a certain number of items, there will be enough variation in the examinee responses to select an optimum item. Bergstrom, Lunz, and Gershon (1992) defined an exposure control method for a CAT using the Rasch model. In their method, an item with a difficulty parameter within 0.10 logits of the intermediate ability estimate is selected randomly.
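A minimal sketch of the randomesque idea follows, assuming the information of each available item at the current ability estimate has already been computed. The function names, the `rng` argument and the item identifiers are hypothetical; the second helper encodes the shrinking group sizes (8, 7, ..., 1) described for the variation in Eignor et al. (1993).

```python
import random


def randomesque(informations, m, rng=random):
    """Randomly pick one item id among the m most informative available items.

    informations: dict mapping item id -> information at the current estimate
    """
    ranked = sorted(informations, key=informations.get, reverse=True)
    return rng.choice(ranked[:m])


def shrinking_group_size(position, start=8):
    """Group size for the item at a given test position: 8, 7, ..., then 1."""
    return max(start - (position - 1), 1)
```

Calling `randomesque(infos, shrinking_group_size(k))` for the kth item reproduces the shrinking-group variation, and a group size of 1 reduces to plain maximum-information selection.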
Sympson-Hetter exposure control (Hetter & Sympson, 1997) is an example of the second category of exposure control methods, which are conditional on the frequency of administration of items for a particular target population. In this method, a maximum exposure rate between 0 and 1 is assigned to each item using simulations. Lower maximum exposure rates are assigned to items that are very informative and tend to be administered most. During the test, after selecting an optimum item for administration, a random number from a uniform distribution between 0 and 1 is generated and compared to the exposure rate of this item. If this random number is smaller than the maximum exposure rate of the selected optimum item, then this item is administered to the examinee. Otherwise, the next optimum item is selected and the same exposure control procedure is applied to this item as well, until an item is administered to the examinee.
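The administration step of this procedure can be sketched as follows, assuming the per-item exposure rates have already been obtained from simulations. The function name, the `rng` argument and the fallback for the case in which every candidate fails its random draw are assumptions for illustration; the text does not specify that case.

```python
import random


def sympson_hetter_select(ranked_items, exposure_limits, rng=random):
    """Walk down the ranked list and administer the first item that passes.

    ranked_items:    item ids ordered from most to least informative
    exposure_limits: dict mapping item id -> rate in [0, 1] set by simulation
    """
    for item in ranked_items:
        # Administer only if the uniform draw falls below the item's rate.
        if rng.random() < exposure_limits[item]:
            return item
    # Assumption: if every item fails its draw, fall back to the last one.
    return ranked_items[-1]
```

Under this sketch an item with a rate of 0.3 that is always ranked first would be administered in roughly 30% of the cases, with the next-ranked items absorbing the remainder.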
In addition to these two methods, there are many other methods to control item exposure during the test (Revuelta & Ponsoda, 1998; Georgiadou, Triantafillou, & Economides, 2007). Even though it is very important to contain exposure rates at acceptable levels, these procedures will reduce the efficiency of a CAT.
2.3.4 Ability Estimation

After an examinee answers an item, the ability is estimated either to terminate the test or to select the next item using this estimate. Selecting an appropriate estimation method is crucial. The estimation method will affect the final score that is reported, the decision to terminate the test and the items that are selected for administration. In the literature, there exist numerous ability estimation methods (Wang & Vispoel, 1998; van der Linden & Pashley, 2010). Three of these estimation methods are investigated further in this dissertation: MLE, expected a posteriori (EAP) and Owen's Bayesian ability estimation.
2.3.4.1 Maximum Likelihood Estimation

By far the most used method of ability estimation in a CAT is MLE. MLE depends on one of the main assumptions of IRT, local independence (Hambleton & Swaminathan, 1985). According to the local independence assumption, for any examinee the partial correlation between any two items will be zero when the ability parameter is held constant. As a result of this local independence assumption, the likelihood of a response string for a single examinee can be calculated using the following product:

\[
L\left(U_{i_1}=u_{i_1},\ldots,U_{i_K}=u_{i_K} \mid \theta\right) = \prod_{k=1}^{K} P_{i_k}^{u_{i_k}}\left(1-P_{i_k}\right)^{1-u_{i_k}} \tag{2.6}
\]

where $P_{i_k}$ is as in Equation (2.1). The MLE of the ability is the value of $\theta$ that maximizes this likelihood:

\[
\hat{\theta}_{MLE} = \underset{\theta\in(-\infty,\infty)}{\arg\max}\,\{L(u \mid \theta)\} \tag{2.7}
\]
The Newton-Raphson method can be used to find the $\hat{\theta}$ value that maximizes Equation (2.6). In numerical analysis, the Newton-Raphson method is used to approximate the roots of a real-valued function (Hildebrand, 1987). In the case of Equation (2.6), the maximum is located at the root of the first derivative of the likelihood function. The Newton-Raphson procedure starts with an initial estimate $x_0 = \theta_0$. Starting from this initial estimate, the Newton-Raphson procedure iteratively approximates the root of the first derivative using the following equation:

\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{2.8}
\]

where $n$ is the iteration number and

\[
f(x) = \frac{d}{d\theta}L(u \mid \theta), \qquad f'(x) = \frac{d^2}{d\theta^2}L(u \mid \theta).
\]
The iterations stop when the difference between $x_{n+1}$ and $x_n$ becomes acceptably small. The user defines this acceptably small value. At the end, the process is said to have converged and the $x_{n+1}$ value is the MLE of $\theta$, denoted as $\hat{\theta}$. Some authors (Hambleton & Swaminathan, 1985) choose to maximize the natural logarithm of the likelihood function ($\ln(L(u \mid \theta))$) instead of the likelihood function. Both approaches lead to the same estimate. But maximizing the logarithm of the likelihood is preferable to maximizing the likelihood, because the logarithm of the likelihood reduces to a sum of log-probabilities, and the numbers used in the analysis are less extreme when sums rather than products are used.

The Newton-Raphson algorithm is a quick method to find the ability estimate for a given response string. It is important to select a good initial estimate for this approximation to be quick. In some cases there might be some local minimum or maximum values. The algorithm might converge to these values instead of the global maximum point. The user should be aware of such possibilities and choose good starting values to converge to the global maximum value.
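The iteration in Equation (2.8) can be sketched as follows. To keep the derivatives short, this example applies Newton-Raphson to the log-likelihood of the 2PL model (that is, it assumes $c = 0$), which is a simplification of the 3PL model used in this chapter; `D = 1.7`, the function names, the tolerance and the three-item test are likewise assumptions for illustration.

```python
import math

D = 1.7  # assumed scaling constant


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def mle_newton_raphson(items, responses, theta0=0.0, tol=1e-6, max_iter=50):
    """Newton-Raphson on the 2PL log-likelihood; items are (a, b) pairs."""
    theta = theta0
    for _ in range(max_iter):
        ps = [p_2pl(theta, a, b) for a, b in items]
        # First and second derivatives of the log-likelihood at theta.
        first = sum(D * a * (u - p) for (a, _), u, p in zip(items, responses, ps))
        second = -sum((D * a) ** 2 * p * (1 - p) for (a, _), p in zip(items, ps))
        if second == 0:
            raise RuntimeError("flat log-likelihood; the estimate is not finite")
        step = first / second
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("no convergence (e.g. all-correct or all-wrong string)")
```

For a mixed response string the loop typically settles within a handful of iterations; for a string of all 1's or all 0's the estimate keeps drifting and the sketch raises an error instead of returning a value.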
MLEs have many desirable properties (Hambleton & Swaminathan, 1985). They are consistent: as the number of items increases, the estimates converge to their true values. They are efficient: they have the smallest variance asymptotically. And asymptotically they are normally distributed. The last property is very useful in practice. It allows the calculation of the standard error of the maximum likelihood estimator:

\[
se(\hat{\theta} \mid \theta) = \frac{1}{\sqrt{I(\hat{\theta})}} \tag{2.9}
\]

where $I(\hat{\theta})$ is the information function given in Equation (2.2).
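As a small worked example of this relationship, the sketch below computes the standard error as the reciprocal of the square root of the test information and asks how many equally informative items a variable-length test would need before the SE falls below a threshold. The function names, the threshold of 0.3 and the per-item information of 0.7225 (the peak information of a 1PL item when $D = 1.7$) are assumptions chosen for illustration.

```python
import math


def standard_error(information):
    """Asymptotic standard error of the MLE from the test information."""
    return 1.0 / math.sqrt(information)


def items_needed(se_target, info_per_item):
    """Smallest number of equally informative items that reaches se_target."""
    return math.ceil(1.0 / (se_target ** 2 * info_per_item))


n = items_needed(0.3, 0.7225)
se_at_n = standard_error(n * 0.7225)
```

Under these assumptions 16 perfectly targeted items are required; less informative items would push the required test length up.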
On the other hand, MLE has an important practical disadvantage. For response strings that consist of all 1's or all 0's it will tend to $\infty$ or $-\infty$, respectively. To avoid this problem, the CAT can start with a Bayesian estimation procedure and switch to MLE after obtaining a heterogeneous response string.
2.3.4.2 Expected a Posteriori Estimation

In EAP estimation the information from the examinee's responses is combined with the information about the population (Bock & Mislevy, 1982). EAP is the expected value of the posterior distribution in Equation (2.5), which can be written as:

\[
\hat{\theta}_{EAP} = E(\theta \mid u) = \int_{-\infty}^{\infty}\theta\,g(\theta \mid u)\,d\theta \tag{2.10}
\]

The variance of the EAP estimate can be written as:

\[
\operatorname{var}(\theta \mid u) = \int_{-\infty}^{\infty}\theta^2\,g(\theta \mid u)\,d\theta - \left[E(\theta \mid u)\right]^2 \tag{2.11}
\]

As mentioned in Section 2.3.2.2, the calculation of these integrals is not trivial. A common approach is to approximate the values of these integrals using numerical integration methods.
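One simple form of such a numerical approximation replaces the integrals with sums over an evenly spaced grid of ability values. The sketch below assumes a standard normal prior, the 3PL model with `D = 1.7`, and a grid of 81 points on $[-4, 4]$; all of these, together with the function names, are illustrative choices rather than the settings used in this study.

```python
import math

D = 1.7  # assumed scaling constant


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def eap(items, responses, lo=-4.0, hi=4.0, n_points=81):
    """EAP estimate and posterior variance on an evenly spaced grid.

    Assumes a standard normal prior; items are (a, b, c) triples.
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        like = 1.0
        for (a, b, c), u in zip(items, responses):
            p = p_3pl(theta, a, b, c)
            like *= p if u == 1 else 1 - p
        # Unnormalized posterior: likelihood times normal prior density.
        weights.append(like * math.exp(-0.5 * theta ** 2))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum(t * t * w for t, w in zip(grid, weights)) / total - mean ** 2
    return mean, var
```

Because the weights are normalized by their sum, the grid spacing and the constant of the normal density cancel out. Unlike the MLE, the estimate stays finite for all-correct or all-wrong response strings.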
Another estimation method that uses the same posterior distribution is maximum a posteriori (MAP) estimation (Samejima, 1969). Instead of finding the expectation of the posterior distribution like EAP, MAP locates the mode of the posterior distribution:

\[
\hat{\theta}_{MAP} = \underset{\theta\in(-\infty,\infty)}{\arg\max}\,\{g(\theta \mid u)\} \tag{2.12}
\]
2.3.4.3 Owen's Bayesian Estimation

Owen's Bayesian estimation (Owen, 1969, 1975) is a sequential ability estimation method. It eliminated the burdensome computations of MLE. At each step of a CAT, the posterior distribution of the ability from the previous step is used as the prior distribution for the estimation of ability. Furthermore, assuming a normal distribution as the prior distribution for the examinee population enables a closed-form approximation to the posterior mean and variance of the ability:

\[
\begin{aligned}
M_k &= M_{k-1} - \frac{V_{k-1}}{\sqrt{a_k^{-2}+V_{k-1}}}\,\frac{\phi(D_k)}{\Phi(D_k)}\left(1-\frac{u_k}{A_k}\right) \\
V_k &= V_{k-1} - \left(\frac{V_{k-1}}{\sqrt{a_k^{-2}+V_{k-1}}}\right)^{2}\Delta_k
\end{aligned}
\tag{2.13}
\]
where $M_k$ and $V_k$ are the mean and variance of the posterior distribution, $M_{k-1}$ and $V_{k-1}$ are the mean and variance of the prior distribution, $a_k, b_k, c_k$ are the item parameters of the $k$th item, and $u_k$ is the item response. $D_k$, $A_k$ and $\Delta_k$ in Equation (2.13) are

\[
\begin{aligned}
D_k &= \frac{b_k-M_{k-1}}{\sqrt{a_k^{-2}+V_{k-1}}} \\
A_k &= c_k + (1-c_k)\,\Phi(-D_k) \\
\Delta_k &= \frac{\phi(D_k)}{\Phi(D_k)}\left(1-\frac{u_k}{A_k}\right)\left[\left(1-\frac{u_k}{A_k}\right)\frac{\phi(D_k)}{\Phi(D_k)} + D_k\right]
\end{aligned}
\]

where $\phi$ and $\Phi$ are the probability density function and cumulative distribution function of the standard normal distribution. The mean of the posterior distribution in Equation (2.13) corresponds to the ability estimate and the square root of the variance of the posterior distribution corresponds to the standard error of the ability estimate.
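A single step of this sequential update can be sketched directly from Equation (2.13) and the accompanying definitions. The variance update is written as $V_{k-1}^2 / (a_k^{-2} + V_{k-1})$ times the bracketed term, which is how the closed form is read here; the function and variable names are hypothetical, and the normal CDF is computed from `math.erf`.

```python
import math


def phi(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def owen_update(mean, var, a, b, c, u):
    """One Owen update of the (mean, var) of the ability distribution."""
    scale = math.sqrt(a ** -2 + var)
    d = (b - mean) / scale
    big_a = c + (1 - c) * Phi(-d)   # marginal probability of a correct answer
    ratio = phi(d) / Phi(d)
    w = 1 - u / big_a
    new_mean = mean - var / scale * ratio * w
    delta = ratio * w * (w * ratio + d)
    new_var = var - var ** 2 / (a ** -2 + var) * delta
    return new_mean, new_var
```

Starting from a standard normal prior, a correct answer to an item with $a = 1$, $b = 0$, $c = 0$ moves the mean up and an incorrect answer moves it down by the same amount, while the variance shrinks in both cases. Feeding the returned pair back in as the next prior gives the sequential estimate.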
Owen's Bayesian estimation was very popular due to the computational simplicity of its closed-form equations. But it has a major disadvantage. At each step, it uses a normal density function to create a posterior distribution (Wang & Vispoel, 1998), but in fact the shapes of the distributions can deviate from the shape of a normal distribution. As a result, this estimation method introduces a bias to the estimates. For example, simply changing the order of the administration of items might change the ability estimate.

MLE is generally preferable to Bayesian estimation methods. In the long run, MLE is asymptotically unbiased. Unlike Bayesian estimation methods, it is not influenced by any factor other than actual test performance. Bayesian estimation methods yield biased estimates, but the standard errors associated with them are smaller (Lord, 1986; Wang & Vispoel, 1998).
2.3.5 Item Pools in CAT

Parshall et al. (2002) defined an item pool as a "collection of test items that can be used to assemble or construct a computer-based test" (p. 21). In the literature, item pools have been called "item banks", "question banks", "item collections", "item reservoirs" and "test item libraries" (Millman & Arter, 1984). Even though there might be subtle differences between these terms, they are used synonymously in this study.

The quality of the item pool is crucial because the quality of the CAT depends on it. Chapter 1 explained the importance of item pools for adaptive tests. According to Flaugher (2000), a satisfactory item pool for adaptive testing should have items with three characteristics: (1) high item discriminations ($a > 1$), (2) a rectangular distribution of item difficulties and (3) low guessing parameters ($c < 0.2$). McBride (1977) described an ideal item pool as having a large number of highly discriminating items ($a > 0.8$) and a rectangular distribution of item difficulties that covers the ability continuum. A similar definition of an ideal item pool was given by Mills and Stocking (1996). Urry (1977) gave a more detailed description of the item parameters of an item pool: item discriminations should be larger than 0.8, item difficulty parameters should be evenly and widely distributed between -2 and 2, and item guessing parameters should be lower than 0.3. Urry added that item pools should have at least 100 items.

The quality of the items in the item pool is important because the examinees are administered relatively fewer items compared to P&P tests. A flawed item in an item pool has many perils. It affects the ability estimates, which consequently affects the subsequent items administered. It has a larger effect on the ability estimates because fewer items are administered in CATs. Since the items each examinee sees are different, a flawed item can affect some examinees but not others, which in turn hampers the fairness of the test (Wainer, 2000).
2.3.5.1 Item Pool Size

The size of the item pool is another consideration for the quality of an item pool. The size and the item difficulty distribution of an item pool depend on the CAT specifications and the examinee population (Reckase, 2010). Stocking (1994) listed six factors that affect the size of an item pool: "item selection algorithm, constraints on item content, psychometrics, and exposure, stopping rules, overlap restrictions, test scoring, requirements of parallelism with existing paper-and-pencil forms" (p. 7).

The purpose of the test can affect the size of an item pool (Parshall, 2002). Larger item pools are needed for high-stakes tests compared to low-stakes tests. In high-stakes tests, due to the stakes of the decision, test scores should be more precise. High-stakes tests are also more prone to cheating, which requires more limits on the exposure of items. Another consideration for item pool size is the number of test days (i.e. the length of the testing window). As the number of test dates increases, the items in the pool are exposed more. This creates a need for more items in the item pool (Parshall, 2002).

In the literature there is no consensus regarding the size of an item pool. Urry (1977) advised an item pool with at least 100 items for a CAT that improves the accuracy compared to a similar P&P test. Stocking (1994) recommended that a CAT item pool should be 12 times as large as the length of the CAT. Chen, Ankenmann, and Spray (2003) advised that an item pool should be at least 6.7 times larger than a fixed-length test to contain the overlap rate between examinees below 15%. For an overlap rate below 10%, the item pool size should be 10 times as large as a fixed-length test.

Very large item pools are not advisable in practice. Such item pools are difficult to manage, and they can strain the hardware and software running the CAT (Mills & Stocking, 1996). Searching a small item pool is faster compared to a large item pool. This is an important practical constraint for operational adaptive tests (Vale & Weiss, 1977). A single breach of the item pool might compromise a lot of items at the same time. In this sense, creating item pools with a sufficient but not excessive number of items is important.
2.3.5.2 Item Pool Design and Assembly

Item pools for a CAT can be assembled in a similar fashion to traditional tests. Initially, the test developer defines a goal for the test. This goal might be measuring every examinee as precisely as possible, measuring high-ability examinees precisely for a scholarship, making a decision at a cut point to obtain a license for a job, etc. The test developer then develops a target item pool information function for this goal. Finally, the test developer assembles items so that the item pool information function matches the target information function. van der Linden (1998b) discussed several methods to assemble regular P&P tests using these three steps. These test assembly methods can be extended to the item pools of CATs.

Some testing agencies have large item banks that contain many items. These large item banks are called the "master pool" or "vat" (Way, 1998). At each testing window, a test agency assembles an item pool from this vat that meets the test specifications. The item pools are replaced with new ones after some conditions are met. Way, Steffen, and Anderson (2002) call them the docking rules. Item pools can be renewed after a certain number of examinees see the item pool, or after a certain period of time. Item pools are assembled from the master item pool using the assembly techniques described in the previous paragraph.

In addition, some researchers proposed methods to design an optimal blueprint for an item pool and to select items from the master pool using these optimum blueprints (Veldkamp & van der Linden, 2010). Reckase (2010) developed the bin-and-union method to build an optimum item pool blueprint. This method was used in this study to build optimum item pools (Section 4.2.2.1). See He and Reckase (2013) for an example use of this approach.
2.3.6 Evaluation of the Item Pools

In the CAT literature there is no specific method to evaluate the quality of item pools. Generally, item pools are compared by performing simulations that use different item pools while holding everything else the same (Thompson & Weiss, 2011). The test developer checks indicators like bias, SE, mean squared error (MSE), exposure rates and overlap rates and makes a decision about the quality of the item pool. Even though these are very valuable indicators, none of these are direct measures of the quality of an item pool. For example, the reason why SE is not a direct measure of item pool quality is discussed thoroughly in Section 3.4.

The most common method to evaluate item pool quality is investigating the item pool information function (Segall, Moreno, & Hetter, 1997). Item pool information functions of different item pools are compared while keeping the testing purpose in mind. For example, Xing and Hambleton (2004) compared six item pools using this method (Figure 2.1). Depending on the purposes of the test, the test developer builds a target information function for the item pool. The item pools that are close to this target information curve will be selected.

Figure 2.1: Comparison of the Information Functions of Six Item Pools Using the Item Pool Information Functions. (This graph is taken from Xing and Hambleton, 2004, p. 8.)

Another method of item pool evaluation is to investigate the exposure and overlap rates of the items within the item pools (Chang, 2004). An item pool that has a lot of overused or underused items is deemed to be an inefficient item pool. Also, for security reasons, test developers do not want high item overlap rates between two randomly selected examinees. These methods are important to monitor the quality of the item pool, but they do not say much about the quality of the items an examinee sees. A test with high exposure rates might present high-quality items to examinees, or an item pool with low exposure rates might not provide appropriate items to the examinees. A test developer might implement an exposure control to reduce the exposure rates of the items within the item pool. But this may result in a loss in the efficiency of the CAT.
CHAPTER 3

THE ITEM POOL UTILIZATION INDEX

3.1 Relative Efficiency

The origins of the item pool utilization index go back to the concept of relative efficiency. In Lord and Novick (1968), Birnbaum introduced the concept of relative efficiency. He used relative efficiency to compare two scoring methods. Lord (1980, p. 83) defined the relative efficiency of two tests, x and y, at a certain ability $\theta$ as:

\[
RE(y, x) = \frac{I\{\theta, y\}}{I\{\theta, x\}} \tag{3.1}
\]

where $I\{\theta, y\}$ and $I\{\theta, x\}$ are the information functions for tests y and x, respectively. Lord (1974, 1975, 1977b, 1980) demonstrated the use of relative efficiency to evaluate and compare different tests.
As an example of the use of relative efficiency in comparing two tests, let's consider two tests with 10 items each. The item parameters of these tests are in Table 3.1. If the test information functions (TIF) of these two tests (Figure 3.1) are investigated, it can be seen that these two tests have different characteristics. Test 2 (green line) provides more information for examinees just above $\theta = 0$, but it is less informative for examinees at the extremes. On the other hand, Test 1 (red line) does not give as much information as Test 2 for examinees with $\theta$'s between -0.5 and 1.5, but throughout the rest of the ability scale it provides more information. The relative efficiency of Test 1 to Test 2 (blue line) shows this. Test 1 is more efficient compared to Test 2 when the blue line is above the dashed line at 1. When the blue line falls below the dashed line, Test 2 is more informative.
(a) Test 1

            a       b       c
Item 1     1.21   -1.41    0.22
Item 2     1.32   -1.21    0.14
Item 3     1.16    0.01    0.17
Item 4     1.57    0.25    0.23
Item 5     1.32   -0.82    0.15
Item 6     1.41    0.24    0.26
Item 7     1.16   -0.55    0.15
Item 8     1.12    0.51    0.18
Item 9     1.35   -1.22    0.15
Item 10    1.24    2.23    0.21

(b) Test 2

            a       b       c
Item 1     1.08   -0.98    0.20
Item 2     1.36   -0.70    0.15
Item 3     0.97    0.37    0.22
Item 4     1.29   -0.07    0.14
Item 5     1.46    0.06    0.19
Item 6     1.48    0.39    0.18
Item 7     1.41   -0.75    0.18
Item 8     0.69    0.55    0.19
Item 9     1.07    0.17    0.23
Item 10    1.56    0.36    0.14

Table 3.1: Item Parameters of Test 1 and Test 2

Figure 3.1: Test Information Functions and Relative Efficiency of Test 1 and Test 2
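More recently, Han (2012) created an item selection algorithm for CAT using the relative efficiency concept. In this algorithm he utilized the concept of expected item efficiency, which is the "level of realization of an item's potential information at interim $\hat{\theta}$" (Han, 2012, p. 227). Mathematically, item $i$'s expected efficiency after the administration of the $k$th item is defined as:

\[
\frac{I_i\left[\hat{\theta}_k\right]}{I_i\left[\theta_i^*\right]} \tag{3.2}
\]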
where $I_i[\hat{\theta}_k]$ is the amount of information item $i$ has at the interim ability estimate $\hat{\theta}_k$, and $I_i[\theta_i^*]$ is the maximum potential information item $i$ can have. For the 1PL and 2PL models (Hambleton & Swaminathan, 1985), item $i$ reaches maximum information at $\theta_i^* = b_i$, where $b_i$ is the item difficulty parameter. For the 3PL model, item $i$ reaches maximum information at

\[
\theta_i^* = b_i + \frac{1}{Da_i}\ln\left(\frac{1+\sqrt{1+8c_i}}{2}\right). \tag{3.3}
\]
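Equations (3.2) and (3.3) can be checked numerically with a short sketch. It assumes `D = 1.7` and uses hypothetical function names; the item parameters in the usage note are illustrative only.

```python
import math

D = 1.7  # assumed scaling constant


def info_3pl(theta, a, b, c):
    """3PL item information (Equation (2.4))."""
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1 - c) / ((c + math.exp(z)) * (1 + math.exp(-z)) ** 2)


def theta_max(a, b, c):
    """Ability level at which a 3PL item is most informative (Equation (3.3))."""
    return b + math.log((1 + math.sqrt(1 + 8 * c)) / 2) / (D * a)


def expected_efficiency(theta_hat, a, b, c):
    """Share of the item's peak information realized at theta_hat (Equation (3.2))."""
    return info_3pl(theta_hat, a, b, c) / info_3pl(theta_max(a, b, c), a, b, c)
```

For an item with $c = 0$ the peak sits exactly at $b$; with $c > 0$ it shifts to the right of $b$, and `expected_efficiency` equals 1 only when the interim estimate coincides with that shifted location.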
3.2 Item Pool Utilization Index

Han (2012) used expected item efficiency as an intermediate step to select the most appropriate item for an examinee. He did not mention it as a possible way of evaluating an item pool's efficiency. This is where the current research diverges from his research and the rest of the literature. In this dissertation, a slightly different version of item efficiency will be used to evaluate an item pool's performance. In the paper of Han (2012), the focus was whether an item's maximum potential is fulfilled or not (Equation (3.2)). In this dissertation, the focus is whether a perfect item is administered to an examinee or not. Even though these two interpretations give the same results for basic IRT models, they are conceptually different. In this dissertation, the item efficiency for item $i$ at $\hat{\theta}$ is defined as:

\[
\frac{I_i\left[\hat{\theta}\right]}{I_{\max}\left[\hat{\theta}\right]} \tag{3.4}
\]

where $I_i[\hat{\theta}]$ is the information of item $i$ at $\hat{\theta}$, and $I_{\max}[\hat{\theta}]$ is the value of information at $\hat{\theta}$ if an optimum item is administered to the examinee with $\hat{\theta}$. If this item efficiency is calculated for each item in an adaptive test for an examinee and then averaged, we will get:

\[
\frac{1}{K}\sum_{k=1}^{K}\frac{I_{i_k}\left[\hat{\theta}_{k-1}\right]}{I_{\max}\left[\hat{\theta}_{k-1}\right]} \tag{3.5}
\]
where $K$ is the test length, $i_k$ is the index of the $k$th administered item in the item pool, $\hat{\theta}_{k-1}$ is the intermediate ability estimate after the administration of the $(k-1)$th item, $I_{i_k}[\hat{\theta}_{k-1}]$ is the amount of information item $i_k$ has at $\hat{\theta}_{k-1}$ and $I_{\max}[\hat{\theta}_{k-1}]$ is the amount of maximum information an optimum item has at $\hat{\theta}_{k-1}$. For $k = 1$, $\hat{\theta}_{k-1}$ becomes $\hat{\theta}_0$, which is the initial ability estimate. The value in Equation (3.5) reflects to what degree the items presented to an examinee deviate from the items coming from a perfect item pool. Here, a perfect item pool is defined as an item pool in which, whenever a CAT algorithm searches for an item to deliver, the item pool can present a perfect item for that ability level, regardless of the stage of the test. A perfect item for a particular $\theta$ is defined as an item that provides the maximum possible information at that $\theta$ level. This approach is similar to Eignor et al. (1993):

    ... an item is considered to have optimum statistical properties if it is most
    informative at an examinee's current maximum-likelihood estimate of ability.
    (p. 10)

This conceptualization of a perfect item is purely statistical and comes from the framework of IRT. Clearly, a perfect item and item pool should be valid for the intended purpose and use of the test scores (Kane, 2013). Another assumption in this definition of a perfect item is that a perfect item for an examinee is the item whose point of maximum information is equal to the current ability estimate of the examinee. This might not be the case in every situation. Depending on the purpose of the test, a better item might be the one that maximizes the information at another ability level. In a licensure examination, for example, the test developer might want to administer items that have maximum information at the cut score. The aim of the licensure examination is to learn whether examinees are above or below the cut score. The exact locations of the examinees might not be the primary aim of the examination. In this case, a perfect item might be defined as an item that has the maximum amount of information at the cut score.
Aggregating Equation (3.5) over a representative group of examinees will give information about how an item pool performs for that examinee group. The item pool utilization index (IPUI) proposed for evaluating the performance of an item pool is:

\[
IPUI = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{1}{K_j}\sum_{k=1}^{K_j}\frac{I_{i_{jk}}\left[\hat{\theta}_{j(k-1)}\right]}{I_{\max}\left[\hat{\theta}_{j(k-1)}\right]}\right) \tag{3.6}
\]

where $N$ is the total number of examinees that took the adaptive test, $K_j$ is the test length of examinee $j$, $i_{jk}$ is the index within the item pool of the $k$th test item presented to the $j$th examinee, and $\hat{\theta}_{j(k-1)}$ is the ability estimate of examinee $j$ after the administration of the $(k-1)$th item. The first summation is taken over the examinees taking the adaptive test, and the second, inner summation is taken over the specific test of each examinee.
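Equations (3.5) and (3.6) translate into two short averaging functions. The sketch below assumes a 1PL model with `D = 1.7`, for which the maximum attainable information is $D^2/4 = 0.7225$ at every ability level; this choice is an inference from the value 0.722 reported for $I_{\max}$ in Table 3.2, not a statement from the text. The function names are hypothetical, and the six (interim estimate, item difficulty) pairs are the first six rows of Table 3.2.

```python
import math

D = 1.7              # assumed scaling constant
I_MAX = D ** 2 / 4   # peak 1PL information, reached when b equals theta


def info_1pl(theta, b):
    """1PL item information at ability theta."""
    p = 1 / (1 + math.exp(-D * (theta - b)))
    return D ** 2 * p * (1 - p)


def examinee_ipui(steps):
    """Equation (3.5): mean efficiency over (theta_hat_before, b) pairs."""
    return sum(info_1pl(t, b) / I_MAX for t, b in steps) / len(steps)


def pool_ipui(examinees):
    """Equation (3.6): mean of the examinee-level values."""
    return sum(examinee_ipui(steps) for steps in examinees) / len(examinees)


# First six administered items of the worked example in Table 3.2.
examinee2 = [(0.0, 0.001347672), (1.417598, 1.338505825),
             (2.311825, 2.226290085), (1.839344, 1.631980644),
             (2.198057, 1.510722847), (1.703028, 1.068875300)]
value = examinee_ipui(examinee2)
```

Under these assumptions `value` agrees with the cumulative figure reported after six items in Table 3.2 to three decimals. For models with a guessing parameter, `I_MAX` would have to be replaced by the information of the best conceivable item at each interim estimate.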
ThevaluesthatIPUIcantakerangesbetween0and1.AnIPUIvalueof1a
perfectitempool.AnIPUIvalueof0istheoreticallypossible.Ifeachitempresentedto
examineehas0informationvalueattheexaminee'scurrentabilityestimate,thenIPUIwill
takeavalueof0.Butinpractice,eachitemprovidessomeinformationaboutanexaminees.
So,IPUIcangetverycloseto0butcannotbe0inpracticalsettings.
IPUIcantellanitempools'levelof.Ifoneaddsredundantitemstoanitem
poolthatcannotbeutilizedbyaCATalgorithm,IPUIwillnotincreaseorincreaseminimally.
Inthissense,IPUIisanindicatoroftheitempools',insteadofredundancy.IPUI
canbeusedtodiagnoseanitempool.AtestdevelopercancalculateIPUIforcertaingroups
ofexamineesorconditionalontabilitylevelsandobservetheweakspotsoftheitem
pool.
IPUI can be helpful to test developers and test users at two levels. First, IPUI can be used at the examinee level, as shown in Equation (3.5). An IPUI can be calculated for each examinee, and both test developers and test users can monitor the quality of the item pool at this level. A test developer can assign a baseline IPUI value for each examinee and ensure that the item pool is sufficient for each individual examinee. This will substantiate the fairness claim of the test. Second, as shown in Equation (3.6), IPUI can be used at the aggregate level. This allows the test developer to get an overall picture of an item pool's performance at the group level. Aggregating IPUI at the group level allows the developer to weight the IPUI for a target group. This might be necessary in cases where the test has a particular aim and target population. The test developer might want to see how an item pool performs for most of the target population and give a smaller weight to examinees at the extremes. Aggregation of the IPUI can be useful in such situations.
In contrast to the other outcomes of the CAT such as SE, MSE or bias, IPUI is a standardized measure that ranges between 0 and 1. It is not straightforward to compare two different values of SE because they depend on the context of the measurement. The ranges of other outcomes of a CAT are unspecified. Theoretically, SE and MSE can take any positive value, and bias can take any value. For example, it is difficult to interpret whether an SE value of 0.4 is large or small without knowing the context. On the other hand, an IPUI value close to 1 always indicates an adequate item pool. IPUI values can be compared across different testing scenarios because they are dimensionless.
3.3 An Example Calculation of IPUI

The calculation of IPUI is very straightforward. An example IPUI calculation for Examinee 2 in Figure 1.1 is demonstrated in Table 3.2.

The $IPUI_k$ column shows the relative quality of the selected item at each step of the CAT. As can be seen from the first line, the initial ability estimate is $\hat{\theta}_0 = 0$. The difficulty parameter of the selected item ($b_{i_1} = 0.0013$) is almost equal to this initial ability estimate. The information value of this item ($I_{i_1}[\hat{\theta}_0] = 0.722$) at this initial estimate also reached the maximum possible information value ($I_{max}$). Examinee 2 gives a correct answer to the first item and the ability of the examinee is updated to $\hat{\theta}_1 = 1.42$. The item that can provide the highest information at this ability estimate has an item difficulty parameter $b_{i_2} = 1.34$. This item is also very informative ($I_{i_2}[\hat{\theta}_1] = 0.719$), but not as informative as the first item. The IPUI value for this item dropped slightly to 0.995. After the third item, the discrepancy between the examinee's estimated ability and the difficulty parameter of the selected item starts to increase.
k    $\hat{\theta}_{k-1}$   $b_{i_k}$   $u_k$   $\hat{\theta}_k$   $I_{i_k}[\hat{\theta}_{k-1}]$   $I_{max}$   $IPUI_k$   $IPUI_{1:k}$
1    0.000000   0.001347672   1   1.417598   0.722   0.722   1.000   1.000
2    1.417598   1.338505825   1   2.311825   0.719   0.722   0.995   0.998
3    2.311825   2.226290085   0   1.839344   0.719   0.722   0.995   0.997
4    1.839344   1.631980644   1   2.198057   0.701   0.722   0.970   0.990
5    2.198057   1.510722847   0   1.703028   0.523   0.722   0.724   0.937
6    1.703028   1.068875300   1   1.833632   0.547   0.722   0.758   0.907
7    1.833632   1.046853738   1   1.931229   0.476   0.722   0.659   0.871
8    1.931229   0.946678729   1   2.001291   0.384   0.722   0.532   0.829
9    2.001291   0.754537673   1   2.047349   0.277   0.722   0.383   0.779
10   2.047349   0.707118514   1   2.086142   0.244   0.722   0.337   0.735
11   2.086142   0.630249407   1   2.117851   0.207   0.722   0.286   0.694
12   2.117851   0.584525037   1   2.145407   0.185   0.722   0.256   0.658
13   2.145407   0.500193537   1   2.168164   0.157   0.722   0.217   0.624
14   2.168164   0.347522088   1   2.185140   0.120   0.722   0.166   0.591
15   2.185140   0.247744310   0   1.864599   0.100   0.722   0.138   0.561

Table 3.2: IPUI Calculation Example
At the last item, even though the examinee's estimated ability based on his/her previous 14 responses is $\hat{\theta}_{14} = 2.19$, the item pool can only provide an item with difficulty parameter $b_{i_{15}} = 0.25$. The amount of information that this item provides at this ability estimate is very low compared to previous items ($I_{i_{15}}[\hat{\theta}_{14}] = 0.1$). This discrepancy is reflected in the IPUI value for this item ($IPUI_{15} = 0.138$). At the end of the test, the overall value of IPUI for Examinee 2 is 0.561. Compare this value to the IPUI value of Examinee 1, 0.998.
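The last two columns of Table 3.2 can be recomputed from the interim estimates and item difficulties alone. The sketch below assumes the 1PL information function with $D = 1.7$ and copies the $\hat{\theta}_{k-1}$ and $b_{i_k}$ columns from the table; the variable names are illustrative only.

```python
import math

D = 1.7  # assumed scaling constant (the value reported for the simulations)

# (interim estimate before item k, difficulty of item k), copied from Table 3.2
ROWS = [
    (0.000000, 0.001347672), (1.417598, 1.338505825), (2.311825, 2.226290085),
    (1.839344, 1.631980644), (2.198057, 1.510722847), (1.703028, 1.068875300),
    (1.833632, 1.046853738), (1.931229, 0.946678729), (2.001291, 0.754537673),
    (2.047349, 0.707118514), (2.086142, 0.630249407), (2.117851, 0.584525037),
    (2.145407, 0.500193537), (2.168164, 0.347522088), (2.185140, 0.247744310),
]


def info_1pl(theta, b):
    """1PL Fisher information of an item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-D * (theta - b)))
    return D * D * p * (1.0 - p)


i_max = D * D / 4.0                                   # about 0.722
ipui_k = [info_1pl(t, b) / i_max for t, b in ROWS]    # item-level ratios
running = [sum(ipui_k[:k]) / k for k in range(1, len(ipui_k) + 1)]  # cumulative means

for k, (ratio, cumulative) in enumerate(zip(ipui_k, running), start=1):
    print(f"{k:2d}  IPUI_k={ratio:.3f}  IPUI_1:k={cumulative:.3f}")
```

The cumulative column is simply the running mean of the item-level ratios, which is why a few poorly matched items late in the test pull the overall value down only gradually.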
3.4 Difference between IPUI and Standard Error

The biggest difference between IPUI and the SE is the $\theta$ values used to calculate the information function. When calculating the SE, the information formula uses only the final $\theta$ estimate (Equation (2.4)). In this sense SE is blind to the quality (or appropriateness) of the items which are presented at the intermediate stages of the adaptive test. SE only cares whether the items presented to the examinee are good items around the examinee's final ability estimate. If an examinee starts a CAT with a low initial ability estimate and improves her ability estimate continuously by correctly answering the items presented, SE will indicate that the quality of the test is low because the examinee received many items that are inappropriate relative to her final ability estimate. SE provides a very important piece of information. But it says little about the appropriateness of the items that were presented to the examinee. IPUI, on the contrary, can exclusively give information regarding the quality of the items presented.

The distinction between SE and IPUI is demonstrated with an example. Figure 3.2 shows the CAT processes of two examinees that are taking the same adaptive test with the same test specifications and item pool.
Figure 3.2: A Demonstration of the Difference between SE and IPUI
Examinee 3 is a high ability examinee and answered most of the questions correctly. Also, the item pool was very appropriate for Examinee 3. At each step, there was an appropriate item to present in the item pool which was very close to her intermediate ability estimate. Consequently, the IPUI value is very high for this examinee, 0.98. On the other hand, Examinee 4 is an average ability examinee. She started the test with consecutive incorrect responses. Later on in the test, her responses improved. Since the item pool is composed of rather difficult items, Examinee 4 could not get appropriate items. Even though she responded incorrectly, the difficulty of the items kept increasing. This resulted in a low IPUI value, 0.273.
When the SEs of these two examinees are examined, it is observed that Examinee 3 has a higher SE. As explained in the first paragraph of this section, SE calculates the information function with respect to the final ability estimate (red dashed lines in Figure 3.2). For Examinee 3, the difficulty parameters of the administered items were away from the final ability estimate. This produced a high SE value for this examinee. For Examinee 4, even though the item pool was not very appropriate, the difficulty parameters of the items presented to this examinee were close to the final ability estimate. This produced a lower SE for this examinee.
Visually, the difference between SE and IPUI can be conceptualized with the help of the red and black dashed lines in Figure 3.2. SE aggregates the distances shown by the red dashed lines. The longer the red lines, the higher the SE will be. IPUI aggregates the distances shown by the black dashed lines. The shorter the distances, the higher the IPUI will be. In this example, SE clearly says nothing about the quality of the item pool, but IPUI shows the efficiency of the item pool. The example shown above is not common in practice, but it shows the difference between SE and IPUI plainly. In most cases, SE and IPUI will be highly correlated. Tests with better item pools will have, on average, lower standard errors.
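3.5 The Limitations of IPUI

In Equation (3.5), $I_{max}[\hat{\theta}_{k-1}]$ is equal for all items under the 1PL model. For the 1PL model, an optimum item which has maximum information at $\hat{\theta}_{k-1}$ has item difficulty parameter $b_{i_k} = \hat{\theta}_{k-1}$. The maximum value of the information of this item is:

\[
I_{max}\left[\hat{\theta}_{k-1}\right] = \frac{D^2 e^{-D\left(\hat{\theta}_{k-1} - b_{i_k}\right)}}{\left(1 + e^{-D\left(\hat{\theta}_{k-1} - b_{i_k}\right)}\right)^2} = \frac{D^2}{4} \tag{3.7}
\]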
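For the 2PL model, an optimum item which has maximum information at $\hat{\theta}_{k-1}$ also has item difficulty parameter $b_{i_k} = \hat{\theta}_{k-1}$. Consequently, the maximum value of information for the 2PL model will be:

\[
I_{max}\left[\hat{\theta}_{k-1}\right] = \frac{D^2 a_{i_k}^2 e^{-D a_{i_k}\left(\hat{\theta}_{k-1} - b_{i_k}\right)}}{\left(1 + e^{-D a_{i_k}\left(\hat{\theta}_{k-1} - b_{i_k}\right)}\right)^2} = \frac{D^2 a_{i_k}^2}{4} \tag{3.8}
\]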
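For the 3PL, from Equation (3.3), the maximum value of information will be:

\[
I_{max}\left[\hat{\theta}_{k-1}\right] = \frac{\left(D a_{i_k}\right)^2 \left(1 - c_{i_k}\right)}{\left(c_{i_k} + \frac{1 + \sqrt{1 + 8 c_{i_k}}}{2}\right) \left(1 + \frac{2}{1 + \sqrt{1 + 8 c_{i_k}}}\right)^2} \tag{3.9}
\]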
It can be easily seen from Equation (3.7) that the value of $I_{max}[\hat{\theta}_{k-1}]$ does not depend on either item or ability parameters for the 1PL model. The value is constant for all items. The maximum information for a perfect item can be captured by a constant. On the other hand, in Equations (3.8) and (3.9) the value of $I_{max}[\hat{\theta}_{k-1}]$ depends on the values chosen for the $a_{i_k}$ and $c_{i_k}$ parameters. Since we define the perfect item as the one that maximizes the item information value, for the 2PL model, Equation (3.8) is maximized when $a_{i_k} = \infty$ and $b_{i_k} = \hat{\theta}_{k-1}$. For the 3PL model, the maximum information value will be reached when $a_{i_k} = \infty$, $c_{i_k} = 0$ and $b_{i_k} = \hat{\theta}_{k-1}$. The maximized values of information for the 2PL and 3PL models will be infinite. This poses a problem for the IPUI. The denominator of Equation (3.6) will be infinite, which will force IPUI to be 0 for any practical testing situation using the 2PL or 3PL model. This problem is acknowledged by Reckase (2010), where he discusses the impossibility of developing optimal item pools for the 2PL model.
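A short worked bound may make the divergence concrete. It is only a sketch: $a^{*}$ is notation introduced here (it does not appear in the original) for the discrimination of the hypothetical perfect 2PL item, and the administered item is assumed to have a finite discrimination $a_{i_k}$. Using Equation (3.8) for both the administered item at its best and the perfect item:

\[
\frac{I_{i_k}\left[\hat{\theta}_{k-1}\right]}{I_{max}\left[\hat{\theta}_{k-1}\right]} \le \frac{D^2 a_{i_k}^2 / 4}{D^2 \left(a^{*}\right)^2 / 4} = \left(\frac{a_{i_k}}{a^{*}}\right)^2 \longrightarrow 0 \quad \text{as } a^{*} \to \infty .
\]

With a finite cap the ratio stays usable but depends on the cap: for instance, with $a^{*} = 3$ an item with $a_{i_k} = 1.5$ can contribute at most $(1.5/3)^2 = 0.25$, even when its difficulty matches the interim ability estimate exactly.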
One possible solution to this problem is fixing the value of the $a$ parameter to a high value that is rarely reached, like 3. But any value chosen in this manner will be arbitrary. Another option might be to ignore the $a$ parameter when calculating IPUI. This bypasses the infinity problem stated above but also loses valuable information about the quality of the item pool. An item's quality is highly related to its discrimination power. Ignoring this will result in an equal IPUI value for a quality item pool with many highly discriminating items and an item pool with many low discriminating items. In this dissertation, this limitation of IPUI is acknowledged and the properties of IPUI for the 1PL are investigated.
CHAPTER 4
RESEARCH QUESTIONS AND METHODS

4.1 Research Questions
As stated in the previous chapter, there is a need to evaluate item pools. This study focuses on creating a new index to evaluate the quality of an item pool for a given adaptive test and examinee population. This study investigates whether this index provides additional information about item pools on top of existing methods to evaluate a CAT (such as bias, SE of ability estimates, MSE, item pool information function). In addition, the methods to diagnose the item pools using this index are investigated. The research questions of this study are the following:

1. For a given population of examinees and adaptive test design, does this index change as the item pool quality changes?
2. How does the magnitude of this index change as the CAT specifications that affect the item pool quality change?
3. How can this index be used to diagnose the shortcomings of the item pools and assist test developers to improve the quality of their item pools?
4. Can this index be used to evaluate the item pool quality of an operational CAT?
5. Can this index be used to diagnose the shortcomings of an operational item pool?
4.2 Research Methods

There are two phases of this study. In the first phase, the first three research questions are investigated using CAT simulations with different specifications. In the second phase of the study, the last two research questions are examined for an existing operational CAT.

4.2.1 First Phase - Simulated Data

The first phase of the study consists of three sets of simulations to answer research questions 1, 2 and 3. In these three sets of simulations, generated data were used.
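4.2.1.1 Common CAT Specifications

All adaptive tests in the first phase shared some common specifications. Simulations were performed using the R programming language (R Core Team, 2014). The 1PL model with scaling parameter D = 1.7 was used. For the 1PL, the probability of correct response is:

\[
P\left(u_{i_k} = 1\right) = \frac{1}{1 + e^{-1.7\left(\theta - b_{i_k}\right)}} . \tag{4.1}
\]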
Each test started with an initial ability estimate of 0. Items were selected using MFI as explained in Section 2.3.2.1. For interim and final ability estimation, the EAP method with prior mean 0 and prior variance 4 was used at the beginning of the test until the examinee obtained at least one correct and one incorrect response. After a heterogeneous response string, ability was estimated using MLE. The variance of the prior distribution was chosen to be 4 (instead of a strong prior with variance 1 or even lower) to reduce the impact of the prior on the ability estimates and consequently on the item selection.
In practice, ability estimates are usually confined within an arbitrary interval (Wang & Vispoel, 1998), for example between -4 and 4. This is usually done to deal with the infinite ability estimates of MLE for examinees with all correct or all incorrect response strings. Another reason for this practice is confining extreme ability values into a practical interval. In the first phase of the simulations, ability estimates were not confined into such an arbitrary interval. There were two reasons behind this decision. First, EAP estimation was used until an examinee obtained a heterogeneous response string. So, infinite ability estimates were not a problem. Second, since the first phase was a theoretical study, the aim was to observe the effects of conditions on dependent variables without the interference of such arbitrary rules. For example, if there was an arbitrary limit for ability estimates, the standard deviation of the ability estimates at the extremes of the ability distribution might be depressed due to such rules. This would limit the generalizability of the conclusions from the analysis because it would not be possible to strip out the effects of this arbitrary rule.
For all of the simulations in Research Questions 1 and 2, 10,000 examinees were simulated for each condition. This number was enough to get a representative sample from the ability distributions and minimize the effect of sampling errors. Larger samples would not have much added value on top of this number due to diminishing returns. And, since there were many conditions to simulate, computing time would increase considerably. For Research Question 3, 1000 examinees at each $\theta$ value were simulated. Since there are 31 $\theta$ values between -3 and 3 with a 0.2 interval, a total of 31,000 examinees were simulated for each condition.
To generate responses, first a random number from a uniform distribution between 0 and 1 was generated. Then, the examinee's probability of correctly answering the item was calculated. If the random uniform number was smaller than the probability of correct response, a score of 1 was assigned as the response; otherwise a score of 0 was assigned.
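The response generation rule can be sketched as follows. The original simulations were written in R; this Python version is only an illustration, and the function names and the use of `random.Random` are assumptions of the example.

```python
import math
import random


def p_correct(theta, b, d=1.7):
    """Probability of a correct response under the 1PL model, as in Equation (4.1)."""
    return 1.0 / (1.0 + math.exp(-d * (theta - b)))


def simulate_response(theta, b, rng):
    """Score 1 if a uniform draw falls below the success probability, else 0."""
    u = rng.random()  # uniform random number on [0, 1)
    return 1 if u < p_correct(theta, b) else 0


rng = random.Random(2015)  # hypothetical seed, fixed only for reproducibility
responses = [simulate_response(theta=0.5, b=0.0, rng=rng) for _ in range(10)]
```

Passing the generator explicitly keeps each simulated condition reproducible without relying on global random state.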
4.2.1.2 Research Question 1

In the first set of simulations the utility of the IPUI was explored by checking whether this index changes systematically as the item pool quality changes, i.e. whether this index is sensitive to the changes in the quality of the item pool. The item pool quality was operationalized as:

1. The discrepancy between item pool distribution and examinee ability distribution
2. Item pool size

According to (1), item pool quality reduces as the discrepancy between the item pool distribution and ability distribution increases. According to (2), item pool quality decreases as the size of an item pool (i.e. the number of items in an item pool) decreases. It was hypothesized that the value of the index would decrease as the quality of the item pool decreases. Research Question 1 was answered by checking whether the IPUI changes as these two item pool quality indicators change.
Item pool and examinee ability discrepancy

A CAT simulation was performed to check whether the IPUI decreases as the discrepancy between item pool distribution and examinee ability distribution increases. Even though there are many ways to increase the discrepancy between two distributions, in this study it was assumed that both the item difficulty ($b$ parameter) distributions of the item pools and the ability distribution of examinees were normally distributed with the same standard deviation, 1. The discrepancy between the item pool and ability distribution was increased (or decreased) by increasing (or decreasing) the difference between the means of the item difficulty distribution and ability distribution.

In the first simulation, the item difficulty distribution of the item pools was fixed to the standard normal distribution. On the other hand, ability distributions had different means ranging from -3 to 3 with 0.5 intervals. Standard deviations of the ability distributions were fixed to 1 as well. This setup allowed observation of whether the IPUI changes when the discrepancy between the item pool and examinee ability distribution takes values -3, -2.5, -2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5 and 3. It was hypothesized that the IPUI would take the highest value when the discrepancy between the two distributions was 0 and that IPUI would decrease as the discrepancy moves towards either -3 or 3.
All of the CAT specifications were the same for each of the 13 discrepancy conditions. The item pool consisted of 250 items. As mentioned above, the item pool had a standard normal distribution. The same item pool was used for all conditions. For each condition, 10,000 examinees were simulated. CAT tests had a test length of 20 items. Item pool size was chosen according to this test length. As mentioned above, Stocking (1994) suggested that item pool size should be twelve times the length of the adaptive test. There were no constraints on the item selection algorithm such as exposure control or content balancing.
Item pool size

For the second part of Research Question 1, whether the value of IPUI changes as the item pool size changes was explored. Item pool size is the second operationalized indicator for the item pool quality. It is hypothesized that as the item pool size increases, the IPUI index will increase as well. But instead of linear, this relationship is hypothesized to be a monotonic non-linear increase. To observe this relationship, CATs with different item pool sizes were simulated. There were 11 item pool size conditions: (1) a very small item pool with the size of the test length, 20 in this case; (2-4) small item pool sizes with 40, 60 and 80 items; (5-11) large item pools with sizes 100, 200, 300, 400, 500, 750 and 1000. Item difficulty parameters of all item pools were generated from the standard normal distribution. For each condition, 10,000 examinees generated from the standard normal distribution were used. The test length of all tests was 20. There were no constraints on the item selection algorithm.

When comparing item pools, especially for the ones with small numbers of items, the effect of sampling error is expected. For example, for the item pool of size 20, if 20 items from the standard normal distribution are randomly selected and the IPUI index is calculated based on this item pool, possibly this index will differ from that of another 20 items that are randomly chosen. To alleviate the effect of the sampling error, simulations were repeated 25 times for each item pool size condition and results were reported based on these replications. A larger effect of sampling error is expected for small item pool sizes. But for larger item pool sizes, the effect of the sampling error is expected to disappear. Performing these replications was worthwhile because it showed the sensitivity of the index as well.
In this thesis, the 1PL model was used. So, the items differ only by their difficulty parameters. All items were assumed to discriminate examinees equally well, and examinees were assumed to not guess. On the other hand, in 2PL or 3PL models, items differ in their discrimination parameters and their guessing parameters. In these models, a high quality item has a high discrimination parameter and a low guessing parameter. High quality items provide more information about the examinee, given that the item difficulty is very close to the examinee's true ability level. In these models, increasing the size of an item pool without considering the quality of items might result in unexpected item pool performances. A small item pool with high quality items can perform better than a larger item pool with low quality items. Consequently, the size of an item pool will not be the only determinant of the quality of an item pool in 2PL or 3PL models.
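For each condition, biases, standard errors, mean squared errors of the ability estimates and the fidelity coefficient (McBride, 1977), i.e. the correlation between true and estimated abilities, have been calculated. In addition, IPUI values and exposure rates of the items for each condition were calculated. For a CAT with test length $K$ and $N$ examinees:

\[
\text{Mean Standard Error} = \overline{SE} = \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{k=1}^{K_j} I_{i_{jk}}\left(\hat{\theta}_{jk}\right) \right]^{-1} \tag{4.2}
\]

\[
\text{Mean Squared Error} = \frac{1}{N} \sum_{j=1}^{N} \left(\hat{\theta}_j - \theta_j\right)^2 \tag{4.3}
\]

\[
\text{Mean Bias} = \frac{1}{N} \sum_{j=1}^{N} \left(\hat{\theta}_j - \theta_j\right) \tag{4.4}
\]

\[
r_{\theta\hat{\theta}} = \frac{\sum_{j=1}^{N} \left(\theta_j - \bar{\theta}\right)\left(\hat{\theta}_j - \bar{\hat{\theta}}\right)}{\sqrt{\sum_{j=1}^{N} \left(\theta_j - \bar{\theta}\right)^2} \sqrt{\sum_{j=1}^{N} \left(\hat{\theta}_j - \bar{\hat{\theta}}\right)^2}} \tag{4.5}
\]

where $\theta_j$ is the true ability of examinee $j$, $\hat{\theta}_j$ is the estimated ability of examinee $j$, $\bar{\theta}$ is the mean of true abilities of $N$ examinees, $\bar{\hat{\theta}}$ is the mean of estimated abilities of $N$ examinees, $I_{i_{jk}}(\hat{\theta}_{jk})$ is the Fisher information of item $i_{jk}$ at $\hat{\theta}_{jk}$ and $r_{\theta\hat{\theta}}$ is the fidelity coefficient (correlation between true and estimated abilities).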
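Equations (4.3) to (4.5) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, inputs are plain Python lists of true and estimated abilities, and the mean standard error of Equation (4.2) is left out because it needs the item-level information values as well.

```python
import math


def mean_squared_error(true_thetas, estimates):
    """Equation (4.3): average squared difference between estimate and truth."""
    return sum((e - t) ** 2 for t, e in zip(true_thetas, estimates)) / len(true_thetas)


def mean_bias(true_thetas, estimates):
    """Equation (4.4): average signed difference between estimate and truth."""
    return sum(e - t for t, e in zip(true_thetas, estimates)) / len(true_thetas)


def fidelity_coefficient(true_thetas, estimates):
    """Equation (4.5): Pearson correlation between true and estimated abilities."""
    n = len(true_thetas)
    mean_true = sum(true_thetas) / n
    mean_est = sum(estimates) / n
    cross = sum((t - mean_true) * (e - mean_est)
                for t, e in zip(true_thetas, estimates))
    ss_true = sum((t - mean_true) ** 2 for t in true_thetas)
    ss_est = sum((e - mean_est) ** 2 for e in estimates)
    return cross / (math.sqrt(ss_true) * math.sqrt(ss_est))
```

A constant shift of every estimate changes the bias and the MSE but leaves the fidelity coefficient at 1, which is one reason the outcomes are reported side by side.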
4.2.1.3 Research Question 2

There are many factors that affect the quality of the item pools besides the size of an item pool or the discrepancy between the item pool and examinee population distribution, such as CAT specifications. These factors might affect the quality of the item pools significantly. Changing CAT specifications might have a positive or negative impact on the utilization of the item pool. For a set of specifications item pool quality might be sufficient, but if these specifications change, item pool quality might change as well even though the item pool itself does not change. Exposure control is a good example of this. An item pool might be sufficient for an adaptive test with no exposure control, but imposing an exposure control procedure might reduce the quality of the item pool. In Research Question 2, how different CAT specifications might affect the quality of the item pool was investigated.
The results for adaptive tests with different specifications were compared to see (1) how these changes affect the index, (2) whether this index captures the changes in the item pool quality caused by the changes in CAT specifications, (3) the relationship between this index and other outcomes of the adaptive tests (such as bias, SE, MSE etc.), (4) whether this index captures the item pool quality changes that other indicators of a CAT cannot capture. The last aim would show the added value of this index to the current literature.

In order to see the main effects of the test specifications on the IPUI and other CAT outcome indicators, CAT simulations in which only one specification changes at a time were performed. Two CAT specifications were the focus: test length and exposure control. Clearly, CAT specifications might also interact with each other. Changes in two or more specifications at the same time might have a different impact on the item pool quality compared to changing one specification at a time. The effects of the changes might not be additive and there might be interactions between specifications. In this study, only the main effects were investigated for simplicity. Specifically, the effects of test length and exposure control on the quality of the item pool were investigated.
Test length

First, the effects of test length on IPUI and other outcomes of the adaptive test were investigated. As indicated in the rule of thumb by Stocking (1994), there is a direct relationship between test length and item pool size. It is expected that, everything else being equal, as test length increases, item pool quality decreases, and consequently the value of the IPUI decreases. This relationship was explored by simulating CATs with different test lengths. Eighteen different test length conditions were tested to see the effect of test length on item pool quality. These conditions were test lengths 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 100, 200, 300 and 400. The last condition basically administered every item in the item pool to a simulee. In this sense this condition was not very different from a linear test. The only difference was the ordering of the items. In a linear test the order of items is generally the same for every examinee.
10,000 examinees were generated from a standard normal distribution. The same set of examinees was used for all conditions. The item pool size was 400, approximately 10 times the median of the test length conditions. The same item pool was used for all conditions. Item difficulty parameters of the item pool were generated from the standard normal distribution. Since the item pool size was rather large, it was assumed that the effect of sampling error was minimal. Consequently, there were no replications of the simulations with different item pools of the same size. There were no constraints on the item selection algorithm.
Exposure Control

Exposure control is the second CAT specification imposed on the item selection that was explored. Exposure control is an important factor affecting the item pools, especially in high stakes CATs. It is one of the main reasons that force test developers to increase the size of their item pools. For this reason, it is crucial to explore the interaction between IPUI and exposure control.

Exposure control has many variations in adaptive testing as explained in Revuelta and Ponsoda (1998), Georgiadou et al. (2007), Leroux, Lopez, Hembry, and Dodd (2013). This study focused only on the randomesque exposure control procedure (Kingsbury & Zara, 1989) due to its simplicity and widespread usage.
There were 12 item exposure conditions: (1) no exposure control, (2-11) randomesque with 3, 5, 7, 10, 13, 15, 20, 25, 50, 100 items and (12) total-random-selection of items. For example, in randomesque with 10 items, one item out of the 10 most informative items at the examinee's current ability estimate was selected. In total-random-selection of items, items were randomly selected out of all available items, regardless of their information value. This case served as the worst case scenario in the sense of the efficiency of a CAT among the exposure control conditions. It was expected that, as more randomness was imposed on the item selection procedure, the value of the IPUI would decrease.
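The randomesque rule described above can be sketched as follows. This is a minimal illustration under the 1PL model, not the original implementation: the pool layout (a dictionary from a hypothetical item id to difficulty) and the function names are assumptions of the example.

```python
import math
import random


def info_1pl(theta, b, d=1.7):
    """1PL Fisher information of an item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-d * (theta - b)))
    return d * d * p * (1.0 - p)


def randomesque_select(theta, pool, administered, m, rng):
    """Pick one item at random among the m most informative unused items.

    pool maps item id -> difficulty; administered is the set of ids already used.
    m = 1 reduces to plain maximum-information selection, and m = len(pool)
    corresponds to total random selection among the remaining items.
    """
    available = [item for item in pool if item not in administered]
    ranked = sorted(available, key=lambda item: info_1pl(theta, pool[item]),
                    reverse=True)
    return rng.choice(ranked[:m])


pool = {0: -2.0, 1: -0.1, 2: 0.0, 3: 0.2, 4: 2.5}  # toy pool for illustration
chosen = randomesque_select(theta=0.0, pool=pool, administered=set(), m=3,
                            rng=random.Random(1))
```

Larger values of m spread exposure over more items at the cost of administering items that are less informative at the interim estimate, which is the trade-off the IPUI is expected to register.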
As in the previous simulations, everything except the exposure control was the same in the CAT simulations between the conditions. For each condition, 10,000 examinees were generated from a normal distribution with a mean of 0 and a standard deviation of 1. Test length was 20 for all conditions. The item pool consisted of 250 items where item difficulties were generated from a normal distribution with a mean of 0 and a standard deviation of 0.7. IPUI values were compared at each condition along with mean SE, MSE, mean bias, fidelity coefficient and exposure rates.
4.2.1.4 Research Question 3

Research Question 3 focuses on the usefulness of this index as a diagnostic tool for item pools. The scenario of this research question was hypothesized as follows.

Suppose a state testing agency is planning to develop a test to measure the end of year achievement levels of students. The purpose of the test is to measure each student as precisely as possible. Students have a wide range of abilities. CAT is decided to be the best way to achieve this goal. The state testing agency wants to measure the students in three content areas. Content area 1 (i.e. arithmetic) is generally regarded as easy among the average students and there are not too many difficult items in this content area. Content area 2 (i.e. algebra) is regarded as medium difficulty and content area 3 (i.e. trigonometry) is regarded as difficult among the average students. The state testing agency has three CAT plans to compare before implementing the test. Previous item development efforts showed that the item difficulty parameters of the items from content area 1 had a normal distribution with mean difficulty -1 and standard deviation 0.3. The item difficulties of items from content area 2 had a normal distribution with mean difficulty 0 and standard deviation 0.3. The item difficulties of items from content area 3 had a normal distribution with mean difficulty 1 and standard deviation 0.3. In the following simulation plans, item difficulty parameters of the item pools were generated from these distributions for each content area.

Here are the three test plans investigated for this hypothetical state testing agency to demonstrate the use of IPUI:
Plan 1. For this plan, an item pool of 90 items consisting of an equal number of items from each content area (30 items each) was created. The CAT specifications of this item pool were the same as the common specifications of the previous research questions (see Section 4.2.1.1). The length of the test was 15 items. Different from the previous CAT specifications, in this test plan content balancing was imposed on the item selection algorithm. Examinees should respond to exactly 5 items from each of the three content areas. After the administration of each item, the content area that has the largest discrepancy with the target value (i.e. 5) was selected. The most informative item within this content area at the examinee's intermediate ability level was administered.

Plan 2. In this plan, the same item pool created for the first plan was used. The test specifications of this plan were the same as the first plan except for the content balancing. For the second plan, no content balancing was imposed on the item selection algorithm. The most informative item at the examinee's intermediate ability estimate was administered regardless of the content of the item.

Plan 3. The third plan involves the creation of a CAT test for each content area separately with a larger item pool. Only the results of the third content area were compared to the other plans in lieu of the other content areas. An item pool of 90 items from the distribution mentioned above for content area 3 was generated. Since there is only one content area, there was no content balancing for this test. The CAT specifications of this plan were the same as Plan 2.
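The content balancing rule of Plan 1 can be sketched as below. This is an illustrative reading of the description, not the original code: the pool layout, the function names and the tie-breaking rule (the first area in dictionary order wins a tie) are assumptions of the example. Under the 1PL model the most informative item is the one whose difficulty is closest to the interim estimate, which is what the second function uses.

```python
def next_content_area(counts, target=5):
    """Return the content area with the largest shortfall from the target count.

    counts maps content area -> number of items administered so far.
    Ties go to the first area in dictionary order (an assumption of this sketch).
    """
    return max(counts, key=lambda area: target - counts[area])


def select_balanced_item(theta, pool, administered, counts, target=5):
    """Pick the most informative unused item within the selected content area.

    pool maps a hypothetical item id -> (difficulty, content area).
    """
    area = next_content_area(counts, target)
    candidates = [item for item, (b, a) in pool.items()
                  if a == area and item not in administered]
    # 1PL information is largest for the difficulty closest to theta
    return min(candidates, key=lambda item: abs(theta - pool[item][0]))


toy_pool = {"a1": (-1.0, 1), "a2": (-0.8, 1), "b1": (0.1, 2), "c1": (1.0, 3)}
item = select_balanced_item(theta=0.0, pool=toy_pool, administered=set(),
                            counts={1: 2, 2: 1, 3: 2})
```

Dropping the content area filter gives the unconstrained selection of Plan 2, so the two plans can share the rest of the simulation code.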
The size of the item pools for each plan was the same, 90 items. The item pools of Plan 1 and 2 were the same, but they differed by the content balancing imposed on the item selection algorithm. The CAT specifications of Plan 2 and 3 were the same but they differed by the distribution of the item difficulties of the item pools. The results of Research Question 3 show the use of IPUI as a diagnostic tool. Using this diagnostic information, this hypothetical state testing agency can decide which plan provides the best measurement option given the practical constraints. They can also see the weak points of the item pools and choose a plan accordingly, or decide to improve the item pools.
To compare each plan, 1000 examinees were simulated at each $\theta$ point between -3 and 3 with 0.2 intervals. The number of items and the distributions of the item pools are shown in Table 4.1.

Table 4.1: Item Pool Information for Research Question 3

                     Content Area 1     Content Area 2     Content Area 3     Total
Plan 1-2 (IP1-2)     30 ~ N(-1, 0.3)    30 ~ N(0, 0.3)     30 ~ N(1, 0.3)     90
Plan 3 (IP-3)        0                  0                  90 ~ N(1, 0.3)     90

Note. The number of items and the distribution of items are given within each cell. Numbers within the parentheses are the means and the standard deviations of the generating distributions.

The mean values of bias, SE, MSE and IPUI were calculated at each true $\theta$ condition for each plan. The diagnostic utility of IPUI was investigated and compared with the diagnostic utilities of other CAT outcomes.
4.2.2 Second Phase - Real Data

In the second phase, the simulations were based on the real item pools of an operational CAT. The National Council of State Boards of Nursing (NCSBN) provided the operational item pools of the National Council Licensure Examination for Registered Nurses (NCLEX-RN). NCLEX-RN (National Council of State Boards of Nursing [NCSBN], 2012) is a nursing licensure exam administered by NCSBN. The NCLEX-RN examination "assesses the knowledge, skills and abilities that are essential for the entry-level nurse to use in order to meet the needs of clients requiring the promotion, maintenance or restoration of health" (NCSBN, 2012, p. 3). The exam is delivered via CAT.
Description of NCLEX-RN Exam

The specifications of the CAT algorithm used by NCLEX-RN are complex. Items that pass the quality control are calibrated using the Rasch model. Item pools are used for a certain period of time and replaced with a new item pool for security reasons. Item pools consist of 8 content areas. Examinees should answer a specific proportion of questions from each content area. The content distribution of the examination is in Table 4.2 (NCSBN, 2012, p. 5).

Table 4.2: Distribution of Content for NCLEX-RN Examination

Content Area                                      Percentage of Items
Safe and Effective Care Environment
    Management of Care                            17-23%
    Safety and Infection Control                  9-15%
Health Promotion and Maintenance                  6-12%
Psychosocial Integrity                            6-12%
Physiological Integrity
    Basic Care and Comfort                        6-12%
    Pharmacological and Parenteral Therapies      12-18%
    Reduction of Risk Potential                   9-15%
    Physiological Adaptation                      11-17%
The CAT procedure starts with an initial ability estimate that is lower than the average ability estimate of the examinees. Initially, Owen's Bayesian procedure (Owen, 1975) is used for ability estimation and item selection until the examinee has at least one correct and one incorrect response in her response string. A normal prior distribution with mean 0 and variance 4 is used. After the examinee has at least one correct and one incorrect response in her response string, ability is estimated using MLE. The maximum value of the likelihood function is found using the Newton-Raphson algorithm.
For each question, the content area of the item is selected first. The content area which deviates most from the target content proportions is selected. Among the available and unadministered items within the selected content area, one of the $m$ items which have the maximum amount of information at the current ability estimate is randomly selected and administered to the examinee. Here, $m$ is the parameter for the randomesque exposure control. After the examinee responds to the item, her ability estimate and the standard error of her ability estimate are updated.
After the examinee has completed 60 items, the CAT program checks whether the cut score is contained within a 90% confidence interval around the ability estimate. If the cut score is outside the confidence interval, the test is terminated. If the ability estimate is above the cut score, the examinee passes the test; otherwise she fails.

If the cut score is within the 90% confidence interval around the ability estimate, the test continues until the cut score falls out of the confidence interval. The CAT program continues to administer items up to the maximum test length of 250 if the cut score is still within the examinee's estimated ability confidence interval. After the administration of the 250th item, the test is terminated. The examinee passes the test if the final ability estimate is greater than the cut score; otherwise the examinee fails.
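The variable-length stopping rule described above can be sketched as a single decision function. This is a hedged illustration rather than the operational NCLEX-RN logic: the function name, the argument names and the critical value `z = 1.645` (a two-sided 90% normal interval) are assumptions of the example, and the cut score is a hypothetical input.

```python
def stopping_decision(theta_hat, se, n_items, cut,
                      min_len=60, max_len=250, z=1.645):
    """Return 'pass', 'fail' or 'continue' after n_items administered items.

    theta_hat is the current ability estimate and se its standard error.
    z = 1.645 assumes a two-sided 90% normal confidence interval.
    """
    if n_items < min_len:
        return "continue"                      # minimum test length not reached
    lower, upper = theta_hat - z * se, theta_hat + z * se
    cut_inside = lower <= cut <= upper
    if cut_inside and n_items < max_len:
        return "continue"                      # decision still uncertain
    # interval excludes the cut score, or the maximum length was reached
    return "pass" if theta_hat > cut else "fail"
```

For example, an examinee with an estimate of 1.0 and a standard error of 0.3 after 60 items has an interval of roughly (0.51, 1.49), so with a cut score of 0 the test would stop with a pass.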
4.2.2.1 Research Question 4

NCSBN administers NCLEX-RN throughout the year. For security reasons, test developers change the item pools at certain intervals. Each item pool is selected to meet the test specifications and passes through quality control. The index proposed in this thesis can help the test developers to check the sufficiency of the selected item pool.

For this thesis, NCSBN provided the item parameter values, content area information of the retired item pools, the ability distribution of the previous test takers and the test specifications required to mimic the CAT procedure. In addition, NCSBN provided response strings of five anonymous test takers. This information was used to ensure that the CAT procedure written for this dissertation indeed mimics the real CAT algorithm used operationally.
For this research question, the quality of five operational item pools was investigated. On top of these five operational item pools, four additional item pools were generated. Two of them were ideal item pools developed for the specifications and target population of the NCLEX-RN test to form a baseline for comparisons. One of these ideal item pools was created using a bin size of 0.4; the other was created using a bin size of 0.8. These bin sizes are the same bin sizes used in He and Reckase (2013). The details of the development of the ideal item pools are described in Section 4.2.2.1 under "Ideal item pool creation".
The third item pool created is called the half-item-pool (Half-IP). This item pool has half the size of the operational item pool. For each content area, half of the items were randomly selected and then removed from the operational item pool. The remaining items formed the half-item-pool. A fourth generated item pool is called the one-third-item-pool (One-Third-IP). This item pool was created in a similar way to the half-item-pool. Instead of half of the items, two thirds of the items were randomly removed from each content area. The remaining items formed the one-third-item-pool. These item pools are the opposite extremes of the ideal item pools.
Evaluation of these item pools was performed using CAT simulations for each of these eight item pools. 50,000 normally distributed examinees were simulated. The mean and the standard deviation of the normal distribution were the same as those of the ability distribution of the real examinees. The same 50,000 examinees were used for each item pool condition. IPUI was calculated for each item pool condition. Additionally, for each item pool, mean standard error (Equation (4.2)), mean squared error (Equation (4.3)), mean bias (Equation (4.4)), fidelity coefficient (Equation (4.5)) and the percentage of correct classification were calculated and compared to the IPUI. Besides these outcome variables, since the NCLEX-RN test is a variable length test, the average test length and the mean exposure rates were also calculated.
Ideal item pool creation

Ideal item pools were created using the bin-and-union method as described in Reckase (2010) and He and Reckase (2013). According to van der Linden et al. (2006) an ideal item pool should

    ... consist of a maximal number of combinations of items that (a) meet all content specifications for the test and (b) are most informative at a series of ability levels reflecting the shape of the distribution of the ability estimates for a population of examinees. (p. 82)
Thebin-and-unionmethodusesthisprincipletobuildanidealitempoolforagivenexaminee
populationandasetoftestspForthisdissertation,twoidealitempoolsforan
examineegroupwiththesameabilitydistributionastherealexamineegroupwerecreated.
TheCATspwerethesameastheNCLEX-RNexamination.Initially,
10
;
000
examineesweresimulatedfromadistributionthatrepresentstherealexamineepopulation.
Fortheseexaminees,aCATtestwasperformedwiththeassumptionofmaximallyinformative
itemisavailableateachstageoftheadaptivetest,whateverthevalueofintermediateability
estimateis.SincetheRaschmodelwasusedinthisstudy,
b
valuesofthemaximally
informativeitemswereequaltotheintermediateabilityestimates.
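The last statement follows from the standard form of the Rasch item information function. The short derivation below is a reminder of that textbook result, written in generic notation ($P$ for the probability of a correct response, $I$ for item information) rather than the dissertation's own equation numbering.

```latex
P(\theta) = \frac{1}{1 + e^{-(\theta - b)}}, \qquad
I(\theta) = P(\theta)\,\bigl(1 - P(\theta)\bigr).

% P(1 - P) is maximized when P = 1/2, which happens exactly when b = \theta:
\max_{b} I(\theta) = \tfrac{1}{4} \quad \text{at} \quad b = \theta .
```

So an item whose difficulty equals the current intermediate ability estimate provides the largest possible information, 0.25, at that estimate.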
Initially, bins within the ability scale with the following range limits were created:

Ideal item pool with bin size 0.4:
$(-\infty, -3)$, [-3, -2.6), [-2.6, -2.2), [-2.2, -1.8), [-1.8, -1.4),
[-1.4, -1), [-1, -0.6), [-0.6, -0.2), [-0.2, 0.2), [0.2, 0.6), [0.6, 1), [1, 1.4), [1.4, 1.8),
[1.8, 2.2), [2.2, 2.6), [2.6, 3), $[3, \infty)$

Ideal item pool with bin size 0.8:
$(-\infty, -2.8)$, [-2.8, -2), [-2, -1.2), [-1.2, -0.4), [-0.4, 0.4),
[0.4, 1.2), [1.2, 2), [2, 2.8), $[2.8, \infty)$
These bins were used to tally the number of items required for each range. The number
of items within each bin was set to 0 at the beginning of the simulation. Then, for each
examinee, a CAT was simulated and the $b$ values of the items that were administered
were recorded. At the end of the test, for each content area, items were distributed to bins
according to their $b$ values. For each bin range, if the number of items in that bin range
was larger than the previous number of items assigned to that bin, then this larger number
was assigned to that bin. At the end of the simulations for the entire examinee group, the
maximum number of items that were necessary for each bin range was obtained. This gave
the distribution of the ideal item pool.
After obtaining the distribution of the ideal item pool, actual ideal item pools were
generated. A random number from the standard normal distribution was generated. This
random number was assigned to the appropriate bin if that bin had not yet reached its maximum
size. This procedure continued until each bin was filled. This way, an ideal item pool for
this examinee group was obtained.
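The tallying and filling steps of the bin-and-union procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used in the study: it ignores content areas (the text tallies bins separately within each content area), it takes the administered $b$ values as given instead of simulating the CAT, and all function names are hypothetical.

```python
import bisect
import random

def make_edges(lo, hi, width):
    """Finite bin edges from lo to hi; the two outermost bins are open-ended."""
    n = round((hi - lo) / width)
    return [round(lo + i * width, 10) for i in range(n + 1)]

def bin_index(b, edges):
    # Bins are (-inf, e0), [e0, e1), ..., [e_last, inf), so there are len(edges) + 1 bins.
    return bisect.bisect_right(edges, b)

def union_of_bins(b_values_per_examinee, edges):
    """Largest number of items any single examinee needed in each bin."""
    required = [0] * (len(edges) + 1)
    for b_values in b_values_per_examinee:
        counts = [0] * (len(edges) + 1)
        for b in b_values:
            counts[bin_index(b, edges)] += 1
        # Keep the larger of the running maximum and this examinee's count
        required = [max(r, c) for r, c in zip(required, counts)]
    return required

def fill_pool(required, edges, rng):
    """Draw b values from N(0, 1), keeping a draw only if its bin is not yet full."""
    remaining = list(required)
    pool = []
    while any(remaining):
        b = rng.gauss(0.0, 1.0)
        k = bin_index(b, edges)
        if remaining[k] > 0:
            remaining[k] -= 1
            pool.append(b)
    return pool
```

For example, `make_edges(-3, 3, 0.4)` gives the finite edges of the bin size 0.4 condition. Because extreme bins are rarely hit by standard normal draws, the rejection loop in `fill_pool` can need many draws when those bins require items.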
4.2.2.2 Research Question 5
In this research question, the utility of the IPUI as a diagnostic tool for an operational CAT
was investigated. Analyses in this research question were the extension of Research
Question 3 to an operational CAT with different specifications.
In this research question, 6 item pools were studied. Two of these item pools were
operational item pools, two of them were the ideal item pools and the last two item pools were
the One-Third and Half item pools created for the previous research question. Even though
more operational item pools were available, only two of them were investigated. The results of
Research Question 4 showed that the differences between the operational item pools were
minimal. For this reason, only two operational item pools were investigated further.
For each item pool condition, 1000 examinees at each of the true $\theta$ values -3, -2.8, -2.6, -2.4, -2.2, -2,
-1.8, -1.6, -1.4, -1.2, -1, -0.8, -0.75, -0.7, -0.65, -0.6, -0.55, -0.5, -0.45, -0.4, -0.35, -0.3, -0.25,
-0.2, -0.15, -0.1, -0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7,
0.75, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4, 2.6, 2.8 and 3 were simulated using the NCLEX-RN
CAT specifications. Outside the $\theta$ interval (-0.8, 0.8), the difference between the simulated
$\theta$ values was 0.2. Within [-0.8, 0.8], the difference between the simulated $\theta$ values was 0.05
because these points were close to the cut score and it was important to make a more precise
diagnosis close to the cut score, where the decision accuracy was paramount. At each $\theta$ value,
the mean of the IPUI values, mean SE, MSE, mean bias and the decision accuracy were calculated.
Each outcome variable was compared to the IPUI to show the diagnostic utility of the IPUI. The
distributions of IPUI values at each conditional $\theta$ value gave a local diagnosis of the item
pool.
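The grid of conditional $\theta$ values described above (steps of 0.2 outside the interval from -0.8 to 0.8 and steps of 0.05 inside it) can be generated programmatically. The sketch below is only an illustration; the function name `theta_grid` is hypothetical.

```python
def theta_grid():
    """True theta values: coarse (0.2) steps in the tails, fine (0.05) steps near the cut score."""
    coarse_low = [round(-3 + 0.2 * i, 2) for i in range(11)]   # -3.0, -2.8, ..., -1.0
    fine = [round(-0.8 + 0.05 * i, 2) for i in range(33)]      # -0.8, -0.75, ..., 0.8
    coarse_high = [round(1 + 0.2 * i, 2) for i in range(11)]   # 1.0, 1.2, ..., 3.0
    return coarse_low + fine + coarse_high
```

The grid has 55 points, so with 1000 examinees per point each item pool condition involves 55,000 simulated tests.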
CHAPTER 5
RESULTS
The results of the analyses are divided into two phases. In the first phase, the results of the
first three research questions are presented. In the second phase, the results of the fourth and
fifth research questions are presented.
5.1 First Phase - Simulated Data
In this phase of the results, the results of the first three questions are presented. These three
questions are based on simulated data. Test specifications were common to these three
research questions except for the conditions being tested (Section 4.2.1.1). The design of the first
phase was kept as simple as possible to demonstrate the effectiveness of the IPUI. A more
complicated CAT design was investigated in the second phase of the study.
5.1.1 Research Question 1
In Research Question 1, item pool quality is operationalized as (1) the discrepancy between
the examinee ability distribution and the item difficulty distribution of the item pool, and (2) the
item pool size. In the following two sections the results of the simulations examining these
two item pool quality indicators are presented.
5.1.1.1 Item Pool and Examinee Ability Discrepancy
There were thirteen conditions for item pool and examinee ability discrepancy. Both the item
difficulty parameters in the item pool and the examinee abilities were generated from normal
distributions with a standard deviation of 1. The item pool was fixed for all conditions and
the item difficulties had a standard normal distribution. The item difficulty distribution of
the item pool is shown in Figure A.1 on page 167. The conditions differed by their means, which ranged
from -3 to 3 with 0.5 intervals. 10,000 examinees were simulated for each condition. The
distributions of true thetas are plotted in Figure A.2 on page 168. For each condition, the means
of bias, SE, MSE and IPUI were calculated.
The results of the simulations are summarized in Figure 5.1 and Table 5.1. Figure 5.1
shows the change in bias, SE, MSE and IPUI for each discrepancy condition. Table 5.1 shows
the means and standard deviations of these output variables.
Figure 5.1: Summary Statistics for Research Question 1 - Discrepancy between Item Difficulty
Distribution of Item Pool and $\theta$ Distribution

Bias
Figure 5.1 does not show a clear relationship between the bias of ability estimates and
the amount of discrepancy between the ability distribution and the item pool. Means of the biases
were close to 0 for all conditions. On the other hand, the standard deviations of biases given
in Table 5.1 display a pattern. As the discrepancy between the ability distribution and the item
pool increased, the standard deviation of the biases increased as well. For low discrepancy
conditions the variability of biases was low. Most of the examinees had biases close to 0. For
high discrepancy conditions the variability of the biases was higher. Figure A.3 on page 169
also shows this visually.

Table 5.1: Summary Statistics for Research Question 1 - Discrepancy between
Item Pool and Ability Distribution

Discrepancy   SE              Bias             MSE             IPUI
-3.0          0.451 (0.190)   -0.002 (0.426)   0.182 (0.312)   0.632 (0.264)
-2.5          0.385 (0.157)   -0.021 (0.381)   0.145 (0.265)   0.741 (0.255)
-2.0          0.337 (0.116)   -0.020 (0.346)   0.120 (0.212)   0.839 (0.219)
-1.5          0.307 (0.075)   -0.009 (0.317)   0.100 (0.175)   0.913 (0.166)
-1.0          0.292 (0.046)   -0.007 (0.301)   0.091 (0.143)   0.958 (0.114)
-0.5          0.284 (0.026)   -0.007 (0.292)   0.085 (0.124)   0.983 (0.068)
 0.0          0.283 (0.024)   -0.001 (0.293)   0.086 (0.131)   0.987 (0.060)
 0.5          0.285 (0.035)    0.003 (0.296)   0.087 (0.142)   0.979 (0.084)
 1.0          0.295 (0.065)    0.004 (0.309)   0.096 (0.163)   0.950 (0.135)
 1.5          0.314 (0.096)    0.013 (0.323)   0.105 (0.185)   0.901 (0.186)
 2.0          0.352 (0.144)    0.017 (0.356)   0.127 (0.229)   0.820 (0.239)
 2.5          0.414 (0.189)    0.028 (0.405)   0.165 (0.297)   0.710 (0.272)
 3.0          0.493 (0.220)   -0.013 (0.466)   0.217 (0.375)   0.596 (0.274)

Note. Numbers within the parentheses are standard deviations of each outcome.
SE: Standard Error; MSE: Mean Squared Error
Standard Error
Figure 5.1 shows that as the absolute value of the discrepancy between the
item pool and the ability distribution increased, the SE of the ability estimates increased as well.
When there was a close match between the ability distribution and the item pool, the overall SEs of
the examinees were low. For large discrepancy conditions, the mean of the SEs was larger.
A similar type of increase can be observed for the standard deviations of the SE values.
Table 5.1 shows that as the absolute value of the discrepancy increased, the variability of the standard
errors increased. The variability of the SEs is also plotted in Figure 5.2. In Figure 5.2, each
point represents the SE value of a simulated examinee. Each examinee is plotted close to
the discrepancy condition it belongs to. The points are scattered around the exact discrepancy
value to show the distribution of the values. In addition to the points, a box-plot was added to
show the shape of the distribution. Only the box and whiskers of the box-plots were plotted
because the outliers are already shown by the points in the graph. As the box-plots and the
distribution of the points show, the SE for each condition had a positive skew.

Figure 5.2: Distribution of Standard Errors within each Discrepancy Condition
Mean Squared Error
The relationship between MSE and the discrepancy conditions was
similar to that of SE. As the discrepancy between the item pool and the ability distribution increased, the mean of
the MSEs increased as well. For small discrepancy conditions the MSE distributions were
almost identical. Table 5.1 shows that both the means and the standard deviations of the MSE values
were very close when the discrepancies were below 1 unit. Figure A.5 on page 171 shows
the distribution of MSE for each discrepancy condition. As can be seen from this plot, the
variation in MSE values increased as the discrepancy increased. The distributions of MSE
were positively skewed, like the SE distributions.
IPUI
It can be clearly seen in Figure 5.1 that as the absolute value of the discrepancy between
the item pool and the ability distribution increased, the IPUI values decreased. When there was
no discrepancy or it was minimal, the mean of the IPUI was almost 1. This means that, for almost
all of the examinees, the item pool was able to provide appropriate items throughout the test.
The item pool was sufficient for these examinee groups. But for high discrepancy conditions the IPUI
values decreased markedly. The mean of the IPUI values went down to 0.6 for these examinee
groups. For these examinees, the item pool failed to provide appropriate items.
In addition to the decrease in IPUI values for high discrepancy conditions, the variability
of the IPUI values increased for high discrepancy conditions. When there was a close match
between the item pool and the examinee distribution, the standard deviation of the IPUI was small,
around 0.06. The standard deviations of the IPUI values went up to 0.274 for high discrepancy
conditions. This variability can also be observed in Figure 5.3. For all conditions, IPUI
values ranged from 0.274 to 1. The variability increased as the discrepancy increased and the
distributions showed a negative skew for almost all conditions. But it can be observed that
the skewness of the distributions went from negative to positive as the discrepancy increased.
Figure 5.3: Distribution of IPUI within each Discrepancy Condition

Figure 5.4 shows the relationship between SE and IPUI for each discrepancy condition.
A linear regression line was fitted to each plot. The correlation between SE and IPUI was
printed at the bottom of each plot. For each discrepancy condition there was a negative
relationship between SE and IPUI. But this relationship was not linear. Instead it was
curvilinear. This graph clearly shows that SE and IPUI are not functions of each other.
Figure 5.4 shows that there were many cases where both IPUI and SE were low. This means
the item pool did not provide optimum items to the examinee but the error of the ability estimate
was still low. On the other hand, there were very few cases where both IPUI and SE were high.
This means that when the item pool is able to present optimum items to the examinee, the standard
errors of the ability estimates will not be high.

Figure 5.4: Relationship between Standard Error and IPUI for each Discrepancy Condition

Figure A.4 on page 170 shows the relationship between IPUI and bias for each discrepancy
condition. There was a positive relationship between IPUI and bias when the discrepancy
between the item pool and the ability distribution was negative. For positive discrepancy conditions
IPUI and bias had a negative relationship. These relationships were not strong, as the
correlation coefficients at the bottom of each plot show. This relationship can be explained
by regression to the mean.
Figure A.6 on page 172 shows the relationship between IPUI and MSE for each discrepancy
condition. There was no apparent pattern between these two outcomes.
One interesting observation in Figure 5.2 is the clustering of SE values. For example,
when the discrepancy condition was -3, the three highest SE values were 0.84, 0.63 and 0.47.
There were many examinees with these SE values. Out of 10,000 simulated examinees, the
numbers of examinees with these SE values were 1362, 415 and 211, respectively. These SE
values belong to examinees with 0, 1 and 2 total correct responses, respectively, who
answered the same items in the same order. For the 1362 examinees without any correct response,
the CAT procedure administered the same items because their responses were identical. All of these
items were the easiest items of the item pool. Consequently, their ability estimates were
the same as well. This is why their standard errors were the same. Even though there were
more than 415 examinees with a total correct score of 1, these 415 examinees gave their
single correct response to the same question, which resulted in the administration of the same test items to these
examinees. The same was true for the 211 examinees with the third highest SE.
A similar phenomenon can be observed to some extent for the IPUI values in Figure 5.3. The
IPUI took a greater variety of values than the SE. For discrepancy
condition -3, there were 1383 examinees who had the same IPUI value. This number is a
little larger than the number of examinees with the same SE (1362). The 21 additional examinees had a different SE
because their last item was answered correctly. That last response did not change the IPUI value but
changed the SE value for these examinees. The remaining 1362 examinees did not have any
correct response.
For the examinees who had two correct responses and the same SE values, the IPUI values
were not always the same. If the locations of these two correct responses were different, then the IPUI
values would be different. The CAT processes of two such examinees are shown in Figure A.7 on
page 173. Even though these two examinees had the same SE, their IPUI values were different
because of the positions of their correct responses. This phenomenon is the main reason why there
were so many different IPUI values compared to the SE. For examinees with only two correct
responses, the SEs will be the same if the same items are delivered, regardless of the location
of the correct responses, because, for the Rasch model, the number of total correct responses
is a sufficient statistic (Leeuw & Verhelst, 1986). If the CAT algorithm administers the same
items to two examinees but the examinees correctly answer two different items, then their
scores will be the same. Consequently their SEs will be the same as well. But for the IPUI, if the
location of the correct responses changes, then the IPUI values will change as well, because
the IPUI is a function of all intermediate ability estimates and the item parameters. On the
other hand, the SE is a function of the final ability estimate and the item parameters.
For the other discrepancy conditions a similar thing happened. For discrepancy condition 3,
many examinees got all items correct. As a result, the CAT algorithm administered the same
items to these examinees and their SEs came out the same. In Figure 5.2, it can be seen
that the SE values for all-correct and all-incorrect response patterns were different. One important note
is that the item pools used in all conditions were the same. So, if examinees correctly answer
all items in two different discrepancy conditions, then their standard errors will come out
the same. By coincidence, the IPUI values of the all-incorrect and all-correct patterns were very
close for this item pool (0.2749047 vs. 0.2742634, respectively). As a result, the minimum
IPUI values in Figure 5.3 for discrepancy conditions -3 and 3 looked the same.
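The sufficiency argument above can be checked numerically. The sketch below is an illustration under assumptions, not the estimation routine used in the study: it uses plain maximum likelihood with Newton-Raphson updates (the estimator applied in the simulations is specified elsewhere in the dissertation, and plain maximum likelihood has no finite solution for all-correct or all-incorrect patterns). The function name `rasch_mle` is hypothetical.

```python
import math

def rasch_mle(b_values, responses, tol=1e-10, max_iter=100):
    """Maximum likelihood ability estimate and its SE under the Rasch model."""
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("no finite MLE for all-correct or all-incorrect patterns")
    theta = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in b_values]
        info = sum(pi * (1.0 - pi) for pi in p)
        # The likelihood equation involves the responses only through the total score
        step = (score - sum(p)) / info
        theta += step
        if abs(step) < tol:
            break
    p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in b_values]
    info = sum(pi * (1.0 - pi) for pi in p)
    return theta, 1.0 / math.sqrt(info)
```

With the same five items, the patterns `[1, 1, 0, 0, 0]` and `[0, 0, 0, 1, 1]` give identical estimates and standard errors, because only the total score enters the update. The IPUI, by contrast, depends on the sequence of intermediate estimates, which is why two such examinees can differ in IPUI.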
5.1.1.2 Item Pool Size
In the second part of Research Question 1, item pool quality was operationalized as the
size of the item pool. It was hypothesized that as the item pool size increases, the quality of
the item pool increases as well. Here, the item pool parameter distributions were assumed
to remain the same. Consequently, the values of the IPUI were hypothesized to increase as
the item pool sizes increase. This hypothesis was tested using 11 item pool size conditions.
Item pools with 20, 40, 60, 80, 100, 200, 300, 400, 500, 750 and 1000 items were generated. The
item difficulty parameters ($b$-parameters) of all item pools were generated from a standard
normal distribution. For each item pool size condition, there were 25 replications to observe
the effect of sampling error, especially for the small item pools. For very small item pools,
even though the item difficulties were drawn from a standard normal distribution, the items
generated in one replication might have different item difficulties compared to the items
generated in another replication. The item difficulty distributions of the item pools for the 19th
replication are shown in Figure B.2 on page 175.
10,000 examinees were simulated from a standard normal distribution. The same set
of examinees was used for each replication and condition. The distribution of true $\theta$ is shown
in Figure B.1 on page 174. The test length for the adaptive tests was 20 and there were no
constraints on the item selection algorithm. The rest of the CAT specifications were the
same as the common CAT specifications mentioned in Section 4.2.1.1 on page 37. For each
condition and replication, bias, SE, MSE, exposure rates and IPUI values were calculated.
Bias
Figure 5.5 shows the mean bias distribution for each condition. Each point represents
the mean of the biases of 10,000 simulees for each replication within a condition. Except for
the very small item pools (20 and 40), mean biases were very small for each condition. The
values for the replications were spread around 0 and do not show a pattern for item pool sizes
larger than 40. For very small item pool sizes, the spread of mean bias values was higher.
Figure B.3 on page 176 shows the bias distribution for the 19th replication of each condition.
The spread of biases was also higher for very small item pools. For the rest of the item pool
size conditions, the distributions of the biases were almost the same, normally distributed
with the same means and standard deviations.
Figure 5.5: Mean Bias Distribution by Item Pool Size Condition

Standard Error
Figure 5.6 shows the mean SE values for each replication within a
condition. As the item pool size increased the mean SE decreased. For item pools that
are larger than 200, the mean SE values were almost the same. The spread of mean SEs
within a condition was higher for small item pools. This reflects the effect of sampling error.
Figure B.4 on page 177 shows the SE distribution for the 19th replication of each condition.
The mean of SE for each condition decreased as the item pool size increased. After the item
pool size of 100, the differences between the means were very small. The spread of SE decreased
as the size of the item pool increased. After 200 items the spread was almost the same for all
conditions. But the number of outliers decreased as the item pool size increased. For the item
pool of size 1000, there was only one simulee with an SE larger than 0.4. For the other conditions
this number increased as the item pool size decreased. Also, the distributions were positively
skewed for each condition.
Figure 5.6: Mean Standard Error Distribution by Item Pool Size Condition

Mean Squared Error
Figure 5.7 shows the mean MSE values for each replication within a
condition. The MSE distribution shows a similar pattern to SE: as the item pool size increased the
mean of the MSEs decreased. The spread of the mean MSE values within a condition was larger
for small item pool size conditions and decreased as the item pool size increased. For item
pools larger than 200 items, the spread was almost the same. Figure B.5 on page 178 shows
the MSE distribution for the 19th replication of each condition. Even though the means and
the spread of MSE values were larger for very small item pool sizes, the distributions were
very similar for item pools that are larger than 40. All of the distributions were positively
skewed.
Figure 5.7: Mean Squared Error Distribution by Item Pool Size Condition

Exposure Rates
The mean exposure rate for the replications was not calculated because the
mean exposure rate of an adaptive test equals the ratio of the test length to the item pool size.
Instead, Figure 5.8 shows the exposure rates of the items for the 19th replication of each
condition. Since the test length was fixed, the exposure rates of small item pools were overall
higher than the exposure rates of the larger item pools. Also, the initial ability was the same for all
examinees. This led to the selection of the same first item for each adaptive test. For this item,
the exposure rate was 1. Depending on the correct or incorrect response, examinees were
routed to the same second items. Exposure rates for these items were around 0.5 for large item
pools and even higher for smaller item pools. Besides such items, for item pools that were
larger than 200 items, the majority of the exposure rates were lower than 0.2, the recommended
maximum value for item exposure.
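The identity used above (mean exposure rate equals test length divided by item pool size) holds for any fixed-length test in which no examinee sees an item twice. The sketch below illustrates it on a toy administration record; the function name `exposure_rates` and the toy data are hypothetical.

```python
from collections import Counter

def exposure_rates(administered, pool_size):
    """Exposure rate of each item: share of examinees who were shown that item.

    administered: one list of item indices per examinee (no repeats within a test).
    """
    counts = Counter(item for test in administered for item in test)
    n_examinees = len(administered)
    return [counts.get(i, 0) / n_examinees for i in range(pool_size)]

# Toy example: 3 examinees, test length 2, pool of 5 items.
# Item 0 is everyone's first item, so its exposure rate is 1.
rates = exposure_rates([[0, 1], [0, 2], [0, 3]], pool_size=5)
mean_rate = sum(rates) / len(rates)  # equals 2 / 5, the test length over the pool size
```

The mean is fixed by the design, so only the distribution of the individual rates carries information about how evenly the pool is used.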
Figure 5.8: Exposure Rates by Item Pool Size Condition for Replication 19

IPUI
Figure 5.9 shows the mean IPUI values for each replication within a condition. There
was a clear relationship between item pool size and mean IPUI values. As the item pool size
increased, the mean values of the IPUI increased as well. The spread of mean IPUI values was
larger for smaller item pool sizes, but the spread decreased as the item pool size increased.

Figure 5.9: Mean IPUI Distribution by Item Pool Size Condition

Table 5.2 shows the means and standard deviations of all IPUI values combined for each
condition (i.e. the replications within a condition were aggregated to get a single mean and
standard deviation). Even though the differences between conditions beyond the item pool size of
200 are not discernible in Figure 5.9, Table 5.2 shows that the mean of the IPUI increased steadily
as the item pool size increased. The standard deviation of the IPUI values also decreased
(except for item pool size 40) as the item pool size increased. This shows that item pools
provided more appropriate items when the item pool size increased.

Table 5.2: Means and Standard Deviations of IPUI Values by Item Pool Size

Item Pool Size   Mean IPUI   Standard Deviation of IPUI
  20             0.5875      0.1798
  40             0.8348      0.1817
  60             0.9028      0.1593
  80             0.9345      0.1325
 100             0.9494      0.1201
 200             0.9793      0.0768
 300             0.9880      0.0574
 400             0.9906      0.0507
 500             0.9924      0.0450
 750             0.9953      0.0343
1000             0.9965      0.0294

The small standard deviations in Table 5.2 show that a smaller number of simulees suffered from
the lack of appropriate items. This can be observed visually in Figure B.6 on page 179,
which shows the distribution of the IPUI for the 19th replication of each condition. This graph
shows a similar relationship between item pool size and IPUI as mentioned above. In addition
to that, the number of examinees that had lower IPUI values decreased as the item pool size
increased. This means that even the simulees with extreme ability parameters saw appropriate
items. Figure B.6 also shows that the distribution of the IPUI was negatively skewed for each
condition.
IPUI and Standard Error Relationship
Since both IPUI and SE use the information
function in their calculations, these two values are expected to be related. Figure 5.10 shows
the relationship between the mean IPUI and the mean SE for each replication within a condition.
Each item pool size condition is represented by a different color in the graph. It can be easily
observed that there is an almost perfect relationship between mean IPUI and mean SE except
for very small item pool size conditions. The correlation between these two variables was
-0.99.

Figure 5.10: Relationship between Mean of IPUI and Mean of Standard Error

Figure 5.10 shows the relationship between IPUI and SE at an aggregate level. In fact, the
relationship between IPUI and SE for individual simulees was not perfect. Figure 5.11 shows the
relationship between IPUI and SE for the 19th replication of each condition. The relationship
was closer to a curvilinear relationship for individual examinees. This is very similar to the
relationship observed in the first part of the results for Research Question 1 (Figure 5.4).
One interesting observation in Figure 5.11 is the correlation between SE and IPUI. As
the size of the item pool increased the absolute value of the correlation between SE and IPUI
decreased. So, for item pool size 1000, the relationship between SE and IPUI appears to be
very weak, $\rho = -0.34$. But for smaller item pools this relationship was rather strong.
Figure 5.12 shows the correlation between SE and IPUI for each replication. As the item
pool size increased the absolute value of the correlation between SE and IPUI decreased for most
of the item pool size conditions. The main reason behind this is the variation of both the SE
and the IPUI values. For smaller item pool sizes there was more variation in the values of these
variables. But for larger item pool sizes the values of the IPUI were almost 1, and the variation
in SE was also very low, and this was the main cause of the weak relationship.
Figure 5.11: IPUI and Standard Error Relationship for Replication 19

IPUI and Reliability
Even though SE gives enough information about the error in ability
estimates, reliability is a well known indicator of the quality of a test. It is valuable to
investigate the relationship between reliability and IPUI.
The relationship between standard error ($\sigma_e$) and reliability ($\rho_{\hat{\theta}}$) is as follows:

$$\sigma_e = \sigma_{\hat{\theta}} \sqrt{1 - \rho_{\hat{\theta}}}$$

So, reliability can be calculated as:

$$\rho_{\hat{\theta}} = 1 - \frac{\sigma_e^2}{\sigma_{\hat{\theta}}^2}$$

In the simulation $\sigma_{\hat{\theta}}^2 = 1$. Consequently, the reliability can be calculated as:

$$\rho_{\hat{\theta}} = 1 - \sigma_e^2$$
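The conversion between standard error and reliability can be written as two small helper functions. This is only a sketch of the relationship stated above; the function names are hypothetical, and the default variance of 1 mirrors the value the text gives for the simulation.

```python
import math

def reliability_from_se(se, var_theta_hat=1.0):
    """Reliability implied by a standard error, given the variance of the ability estimates."""
    return 1.0 - se ** 2 / var_theta_hat

def se_for_reliability(rho, var_theta_hat=1.0):
    """Standard error that corresponds to a target reliability."""
    return math.sqrt(var_theta_hat * (1.0 - rho))
```

For instance, with a variance of 1, a reliability of 0.9 corresponds to a standard error of about 0.316, since the square root of 0.1 is roughly 0.316.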
Figure 5.12: Correlation between Standard Error and IPUI for each Replication

Figure 5.13 shows the relationship between IPUI and reliability for the 19th replication
of each condition. As the values of the IPUI increased the reliability increased as well for each
condition. The relationship was curvilinear rather than linear. For small item pools the
relationship was rather strong. As the item pool size increased the strength of the relationship
decreased. Figure 5.13 also shows the mean reliability values for each condition ($\bar{x}$). The
reliability was comparatively low for the item pool with 20 items. After the item pool size of
40, the increase was minimal.

Figure 5.13: IPUI and Reliability Relationship for Replication 19
5.1.2 Research Question 2
In Research Question 2, the effects of two test specifications, test length and exposure
control, on the quality of the item pool and the CAT outcomes were investigated.
5.1.2.1 Test Length
In the first part of Research Question 2, the effect of the test length on the performance
of the item pool and other CAT outcomes was investigated. It was hypothesized that increasing
the test length will decrease the quality of the item pool as quantified by the IPUI. Increasing
the test length does not necessarily decrease the quality of a CAT, but the efficiency of a
CAT will suffer if the item pool does not support a long test. For example, an increase in the
test length is associated with an increase in the test's precision, i.e. a decrease in the standard
error of the ability estimates (Weiss, 1982). If the item pool is insufficient for a long test,
increasing the test length will increase the precision of the test only minimally. This hypothesis
was tested using 18 test length conditions. The conditions were tests with lengths 5, 10, 15,
20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 100, 200, 300 and 400.
An item pool with 400 items was generated. Item difficulties ($b$-parameters) of this item
pool were generated from a normal distribution with mean 0 and standard deviation 1. The
item difficulty distribution of the item pool is shown in Figure C.1 on page 180. The same
item pool was used for each test length condition. Since the size of the item pool was rather
large, it was assumed that the effect of sampling error was low. So, different item pools were
not generated and only one replication was performed for each condition. 10,000 examinees
were simulated from a normal distribution with mean 0 and standard deviation 1. The
same set of examinees was used for each condition. The distribution of true $\theta$ is shown in
Figure C.2 on page 181. There were no constraints on the item selection algorithm. The rest
of the CAT specifications were the same as the common CAT specifications mentioned in
Section 4.2.1.1 on page 37. For each condition, bias, SE, MSE, fidelity coefficient, exposure
rates and IPUI values were calculated.
The results of the analyses are summarized in Figure 5.14. The fidelity coefficient and the
mean values of bias, SE, MSE and IPUI for each test length condition are shown in the figure.
Table 5.3 shows the fidelity coefficient in addition to the means and standard deviations of
bias, SE, MSE and IPUI for each test length condition. In the following pages the results of
the investigation of each variable are presented separately.

Figure 5.14: Summary Statistics for Research Question 2 - Test Length
Relationship between True and Estimated Ability
Figure 5.15 shows the relationship
between true ability ($\theta$) and estimated ability ($\hat{\theta}$). The dashed lines in the figure are the
identity lines (i.e. the $y = x$ line). When the estimation is perfect, all of the points should fall on
this line. If a point is above the identity line, this means it is overestimated, i.e. the estimated
ability is larger than the true ability (positive bias). If the point is below the identity line, this
means it is underestimated, i.e. the estimated ability is smaller than the true ability (negative
bias).
At each condition, there was an error in the estimation and the points deviated somewhat
from the identity line. For short tests the deviation from the identity line was larger. As
the length of the test increased, the points got closer to the identity line. This means the
estimates started to converge to the true values. Both Figure 5.15 and Table 5.3 show
the fidelity coefficients (i.e. the correlation between true and estimated ability) for each test
length condition. For shorter tests the correlations between true and estimated abilities were
lower. As test length increased, the correlations steadily increased and approached 1.

Table 5.3: Summary Statistics for Research Question 2 - Test Length

Test Length   Bias             SE              MSE             IPUI            Fidelity
   5           0.007 (0.650)   0.611 (0.046)   0.423 (0.653)   0.999 (0.004)   0.8390
  10          -0.001 (0.429)   0.413 (0.030)   0.184 (0.289)   0.997 (0.021)   0.9198
  15           0.000 (0.340)   0.330 (0.021)   0.116 (0.172)   0.995 (0.032)   0.9473
  20           0.000 (0.289)   0.282 (0.017)   0.084 (0.122)   0.992 (0.042)   0.9609
  25          -0.002 (0.256)   0.250 (0.015)   0.065 (0.099)   0.991 (0.049)   0.9690
  30          -0.001 (0.234)   0.227 (0.016)   0.055 (0.092)   0.988 (0.055)   0.9740
  35          -0.004 (0.211)   0.210 (0.013)   0.044 (0.065)   0.985 (0.059)   0.9787
  40          -0.000 (0.202)   0.196 (0.016)   0.041 (0.063)   0.982 (0.067)   0.9806
  45          -0.000 (0.185)   0.185 (0.015)   0.034 (0.052)   0.979 (0.073)   0.9834
  50           0.002 (0.179)   0.175 (0.014)   0.032 (0.048)   0.975 (0.081)   0.9845
  55          -0.001 (0.169)   0.167 (0.015)   0.028 (0.042)   0.971 (0.084)   0.9862
  60          -0.002 (0.164)   0.160 (0.017)   0.027 (0.047)   0.967 (0.089)   0.9869
  65           0.002 (0.155)   0.154 (0.015)   0.024 (0.039)   0.962 (0.094)   0.9883
  70          -0.001 (0.150)   0.148 (0.014)   0.022 (0.034)   0.958 (0.097)   0.9891
 100          -0.002 (0.128)   0.126 (0.017)   0.016 (0.031)   0.926 (0.123)   0.9920
 200           0.002 (0.099)   0.096 (0.020)   0.010 (0.023)   0.809 (0.164)   0.9952
 300           0.001 (0.090)   0.087 (0.020)   0.008 (0.015)   0.681 (0.167)   0.9960
 400          -0.001 (0.089)   0.084 (0.023)   0.008 (0.026)   0.546 (0.143)   0.9961

Note. Numbers within the parentheses are standard deviations of each outcome.
SE: Standard Error; MSE: Mean Squared Error; Fidelity: Correlation between true ability
and estimated ability
Bias
Figure 5.14 shows that the mean bias did not change across the different conditions.
On the other hand, it can be observed from Table 5.3 that the standard deviation of bias
decreased as the test length increased. Figure 5.16 shows the decrease in the standard deviation
of the bias values visually. When the test length was short, the errors in the estimates were higher,
as also shown in Figure 5.15. This is not surprising because longer tests are expected to
increase the quality of the measurement.

Figure 5.15: Relationship between True and Estimated Ability by Test Length Condition

Figure 5.16: Bias Distribution by Test Length Condition
Standard Error
It can be observed from Figure 5.14 that as the length of the test increased
the mean value of SE decreased. Table 5.3 shows the standard deviations of the SEs for each test
length condition. These values do not show a uniform pattern.
The distribution of the SEs can be observed visually in Figure 5.17. In this figure the dashed
line corresponds to the standard error that is expected for a test with a reliability value of 0.9.
For test length conditions longer than 20, the majority of the simulees had SE values lower than
this threshold. The median of the SEs decreased quickly between test lengths 5 and 30. After this
there was a decrease but this decrease was not large. In particular, there was a minimal decrease
from test length 300 to 400, even though the test length increased by 100 items. This is
evidence of the efficiency loss for long tests.
At each test length condition, the distributions of the SEs had a strong positive skew. The
reason behind this was the overlap between the ability distribution of the examinees and the item
difficulty distribution of the item pool. For most of the examinees, the item pool had appropriate
items. For a small portion of examinees who were at the extremes of the ability scale, the
item pool did not have appropriate items. The SEs of these examinees were larger compared
to the examinees that were closer to the center of the ability distribution. Since the number
of examinees at the extremes was small compared to the number at the center, a positive skew
occurred for the SE distribution.
Mean Squared Error
The average of the MSE values decreased as the test length increased
(Figure 5.14). This decrease was rapid for shorter test lengths but the trend was flat
after the test length of 100. Table 5.3 shows that the standard deviation of the MSE values also
decreased as the test length increased. Figure 5.18 shows the distribution of MSE for each
test length condition. The spread of MSE values decreased as the test length increased. For
test lengths longer than 100, there was almost no difference between the MSE distributions.
Figure 5.17: Standard Error Distribution by Test Length Condition

Figure 5.18: Mean Squared Error Distribution by Test Length Condition

Exposure Rates
The exposure rate distribution of each test length condition is shown in
Figure 5.19. In this figure median exposure rates are shown with bold lines in the middle of the
box-plots of the exposure rates of each condition. The dashed lines show the
0.20 and 0.05 levels for exposure rates. It is recommended that the exposure rates of the items
fall between these two dashed lines. For very small test length conditions the exposure rates
were very low. Exposure rates of the items increased as test length increased, because the
item pool size was the same for each test length condition.
Table 5.4 shows the means and standard deviations of the exposure rates in addition to
the proportions of items with exposure rates larger than 0.20 and lower than 0.05. An exposure rate
larger than 0.20 can be seen as a sign of a highly exposed item. It is desirable for a CAT
to have item exposure rates lower than 0.2. On the other hand, very low exposure rates signify
items that are not shown to the majority of the examinees. This is not desirable because
underutilization of items decreases the efficiency of the item pools. In this sense the 0.05 value
can be seen as a cutoff value for items with low exposure rates. These cutoffs can change
depending on the purpose of the test.
For test length conditions lower than 35, almost none of the items were overexposed (i.e.
exposure rates > 0.2). The small percentage of items ($400 \times 0.0175 = 7$ items) that had larger
exposure rates was due to the lack of exposure control in the item selection algorithm, as
explained previously in Section 5.1.1.2. For test length conditions longer than 55 items, more
than 10% of the items were overexposed. The proportion of underexposed items decreased as
test length increased. For test length conditions shorter than 45 items, more than 10% of the
items were underexposed.
Figure 5.19: Item Exposure Distribution by Test Length Condition

IPUI
Figure 5.14 shows that the mean value of the IPUI decreased as the test length increased.
Standard deviations of the IPUI values increased as test lengths increased, as shown in Table 5.3.
Clearly, the item pool was able to support shorter tests. The mean IPUI value for tests with
5 items was almost perfect. Also, the low standard deviations for shorter tests point out that
most of the simulees saw appropriate items. Figure 5.20 shows the IPUI distribution of the
simulees visually. The number of examinees with low IPUI values increased as the test length
increased. For test lengths larger than 100, none of the examinees had IPUI values larger
than 0.98, and this maximum value decreased as test length increased.
In Figure 5.20, each test length condition shows a negative skew. The reason behind this
skewness is the overlap between the item difficulty distribution of the item pool and the
ability distribution of the simulees. For the majority of the simulees, the item pool had items
that were close to their true $\theta$ values. Consequently, the IPUI values were high for most of
the simulees. The IPUI values of the simulees with true $\theta$ values at the extremes were lower.
Table 5.4: Item Exposure Analysis by Test Length Condition
Test Length   Mean Exposure   SD Exposure   Exposure > .20   Exposure < .05
5             0.0125          0.0709        0.0175           0.9450
10            0.0250          0.0712        0.0175           0.9050
15            0.0375          0.0707        0.0175           0.8200
20            0.0500          0.0704        0.0175           0.6500
25            0.0625          0.0704        0.0175           0.4750
30            0.0750          0.0704        0.0175           0.3275
35            0.0875          0.0702        0.0200           0.2150
40            0.1000          0.0695        0.0275           0.1050
45            0.1125          0.0692        0.0325           0.0500
50            0.1250          0.0690        0.0425           0.0250
55            0.1375          0.0692        0.0675           0.0250
60            0.1500          0.0694        0.1125           0.0200
65            0.1625          0.0702        0.1875           0.0175
70            0.1750          0.0704        0.2550           0.0175
100           0.2500          0.0822        0.8425           0.0125
200           0.5000          0.2041        0.9350           0.0050
300           0.7500          0.2836        0.9725           0.0000
400           1.0000          0.0000        1.0000           0.0000
But since the number of simulees with extreme true θ values was low, the distribution of the IPUI became skewed. The amount of skewness decreased as the test length increased. For longer tests, the item pool failed to provide appropriate items even for the examinees that had true θ values close to 0, and the IPUI values of all examinees started to decrease. IPUI has a lower bound of 0 but, unlike the upper bound of 1, this lower bound cannot be attained in practice. As a result, as IPUI decreased for all simulees, the IPUI values of the simulees with true θs at the middle of the ability distribution decreased more than those of the simulees at the extremes. This is the reason why the skewness of the IPUI distribution decreased for longer test length conditions.
Figure 5.20: IPUI Distribution by Test Length Condition
One important difference between Figure 5.14 and Figure 5.20 is the summary statistic used to show the differences between IPUI values. In Figure 5.14 the mean values of IPUI are shown. In Figure 5.20, on the other hand, the median values of IPUI are shown in addition to the quartile information via box plots. Since IPUI distributions were negatively skewed for all conditions, the medians of the IPUI values were always higher than the mean values. This might have important consequences for how to evaluate the overall IPUI values. For example, for the simulation condition where test length was 70, the mean and median of IPUI values were 0.958 and 0.994, respectively. According to the median value the item pool functioned almost perfectly; the mean of the IPUI values, on the other hand, indicates that item pool performance was not perfect.
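The gap between mean and median under negative skew can be illustrated with a small made-up sample. The values below are hypothetical and are not taken from the simulations; they only mimic a distribution where most simulees have IPUI near 1 and a few have much lower values.

```python
import statistics


def mean_and_median(values):
    """Return (mean, median) of a list of IPUI values."""
    return statistics.fmean(values), statistics.median(values)


# Hypothetical, negatively skewed IPUI values (not from the dissertation).
ipui = [0.999, 0.998, 0.997, 0.996, 0.995, 0.994, 0.990, 0.95, 0.80, 0.60]
mean_ipui, median_ipui = mean_and_median(ipui)
print(f"mean={mean_ipui:.4f} median={median_ipui:.4f}")
```

The few low values pull the mean down while the median stays close to 1, which is the same pattern as the 0.958 versus 0.994 example for the 70-item condition.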
IPUI and Standard Error
In the previous sections the relationship between the mean values of IPUI and SE was negative. As the discrepancy between the item pool and the ability distribution increased in the first part of Research Question 1, the mean of the IPUI values decreased and the mean of the SE values increased. The same type of relationship was observed for the second part of Research Question 1, where the item pool size was investigated. As the size of the item pool increased, the mean of the IPUI values increased and the mean of the SE values decreased.
For the test length conditions, this trend changed. Figure 5.14 on page 74 shows that as test length increased, the mean values of both IPUI and SE decreased. In this case, even though increasing the test length decreased the quality of the item pool for that particular CAT design, the precision of the ability estimates increased. This is an important observation that shows how different CAT specifications interact with the outcomes of CAT differently.
The relationship between IPUI and SE for each test length condition is shown in Figure 5.21. For individual simulees, the relationship between SE and IPUI is negative, similar to what was observed in previous sections (Figures 5.4 and 5.11 on pages 59 and 70). Different from the previous sections, the curvilinear relationship between these two variables became more evident as the test length increased. Figure 5.21 also shows the correlations between SE and IPUI for each test length condition. The correlation between SE and IPUI was larger in absolute value for longer tests.
Figure 5.21: IPUI and Standard Error Relationship by Test Length Condition
5.1.2.2 Exposure Control
For the second part of Research Question 2, the effect of exposure control on the performance of the item pool and other CAT outcomes was investigated. It was hypothesized that adding more randomization to the item selection process to reduce item exposure would decrease the quality of the item pool as quantified by IPUI. This hypothesis was tested using 12 exposure control conditions. The conditions were (1) no exposure control, (2-11) randomesque with 3, 5, 7, 10, 13, 15, 20, 25, 50 and 100 items, and (12) total random selection of items. These exposure control conditions started from the condition where there was no randomization in the item selection and gradually increased the amount of randomization imposed on the item selection algorithm. The last condition administered items to examinees at random, without taking into account their previous answers. In the following pages these conditions are referred to as Rand-1, Rand-3, Rand-5, Rand-7, Rand-10, Rand-13, Rand-15, Rand-20, Rand-25, Rand-50, Rand-100 and Rand-Total.
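A randomesque selection step can be sketched as below. This is an illustration under stated assumptions, not the dissertation's algorithm: it assumes a Rasch-type model in which the most informative items are those whose difficulty is closest to the current ability estimate, and the function name and arguments are hypothetical.

```python
import random


def randomesque_select(theta_hat, difficulties, used, k, rng=random):
    """Pick one unused item at random among the k closest to theta_hat.

    theta_hat: current ability estimate
    difficulties: list of item b-parameters
    used: set of item indices already administered
    k: size of the candidate group (k = 1 means no randomization)
    """
    candidates = [i for i in range(len(difficulties)) if i not in used]
    # Closest difficulty first; under the Rasch assumption this is
    # the same ordering as most informative first.
    candidates.sort(key=lambda i: abs(difficulties[i] - theta_hat))
    return rng.choice(candidates[:k])
```

With k = 1 the step is deterministic, which corresponds to Rand-1 (no exposure control). Larger k widens the group of candidates and so trades item appropriateness for lower exposure.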
An item pool with 250 items was generated. Item difficulties (b-parameters) of this item pool were generated from a normal distribution with mean 0 and standard deviation 0.7. The item difficulty distribution of the item pool is shown in Figure D.1 on page 183. The same item pool was used for each exposure control condition. 10,000 examinees were simulated from a normal distribution with mean 0 and standard deviation 1. The same set of examinees was used for each condition. The distribution of true θ is shown in Figure D.2 on page 184. Test length for all of the tests was 20. There were no constraints on the item selection algorithm other than exposure control. The rest of the CAT specifications were the same as the common CAT specifications mentioned in Section 4.2.1.1 on page 37. For each condition, bias, SE, MSE, fidelity coefficient, exposure rates and IPUI values were calculated.
The results of the analyses are summarized in Figure 5.22. The mean values of bias, SE, MSE, IPUI and fidelity coefficient for each exposure control condition are shown in the figure. Table 5.5 shows the means and standard deviations of the biases, SEs, MSEs and IPUIs for each exposure control condition, in addition to the fidelity coefficients. In the following pages each of these output variables is discussed separately.
Figure 5.22: Summary Statistics for Research Question 2 - Exposure Control
Relationship between True and Estimated Ability
Figure 5.23 shows the relationship between true ability (θ) and estimated ability (θ̂). The dashed lines in the figure are the identity lines (i.e., the y = x line). In each condition there was some error in the estimation, and points deviated somewhat from the identity line. Towards the middle of the ability scale, there is a balance between overestimated and underestimated abilities. Towards the extremes of the ability scale, this balance starts to shift: ability was overestimated for high ability examinees and underestimated for low ability examinees.
Figure 5.23 also shows the fidelity coefficients (i.e., the correlation between true and estimated ability) for each exposure control condition. For conditions Rand-1 to Rand-25, the fidelity coefficient did not change between conditions. After Rand-25, the fidelity coefficient started to decrease.
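The recovery statistics reported in this section can be computed along the following lines. This is a sketch with hypothetical names; it assumes bias is defined as estimated minus true ability and that the fidelity coefficient is the Pearson correlation, as the text describes.

```python
import math


def recovery_stats(theta_true, theta_hat):
    """Return (mean bias, MSE, fidelity coefficient) for paired abilities."""
    n = len(theta_true)
    errors = [h - t for t, h in zip(theta_true, theta_hat)]
    bias = sum(errors) / n                    # assumed sign: estimate - true
    mse = sum(e * e for e in errors) / n
    mean_t = sum(theta_true) / n
    mean_h = sum(theta_hat) / n
    cov = sum((t - mean_t) * (h - mean_h)
              for t, h in zip(theta_true, theta_hat))
    ss_t = sum((t - mean_t) ** 2 for t in theta_true)
    ss_h = sum((h - mean_h) ** 2 for h in theta_hat)
    fidelity = cov / math.sqrt(ss_t * ss_h)   # Pearson correlation
    return bias, mse, fidelity
```

When the mean bias is close to 0, the MSE is approximately the squared standard deviation of the errors, which agrees with Table 5.5 (for Rand-1, 0.295 squared is about 0.087).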
Bias
Figure 5.22 shows that mean bias changes minimally across different conditions. This minimal change was not in one direction, so it can be treated as random error. Table 5.5 also shows that the magnitudes of these mean biases were almost 0. The standard deviations of the biases were almost the same for conditions in which randomization involved 25 items or fewer. After the Rand-25 condition the standard deviations of the biases started to increase systematically. This can also be observed visually from Figure 5.24. The fact that mean biases did not change makes sense because (1) the item pool and ability distributions overlap and (2) both high and low ability examinees were affected by randomization. From the standard deviations of the biases it can be said that the item pool can support randomization up to Rand-25. Beyond that, the amount of randomization started to affect the ability estimates of examinees.
Table 5.5: Summary Statistics for Research Question 2 - Exposure Control
Condition   Bias             SE              MSE             IPUI            Fidelity
1           -0.000 (0.295)   0.287 (0.044)   0.087 (0.149)   0.961 (0.113)   0.9580
3            0.002 (0.297)   0.287 (0.046)   0.088 (0.150)   0.958 (0.117)   0.9582
5           -0.001 (0.296)   0.287 (0.046)   0.088 (0.151)   0.956 (0.118)   0.9586
7           -0.002 (0.297)   0.288 (0.047)   0.088 (0.151)   0.953 (0.120)   0.9581
10           0.004 (0.299)   0.289 (0.053)   0.089 (0.166)   0.949 (0.127)   0.9581
13           0.006 (0.297)   0.289 (0.053)   0.089 (0.156)   0.944 (0.130)   0.9587
15           0.000 (0.297)   0.289 (0.051)   0.088 (0.163)   0.941 (0.130)   0.9586
20           0.005 (0.297)   0.290 (0.052)   0.088 (0.148)   0.933 (0.135)   0.9588
25           0.001 (0.296)   0.291 (0.057)   0.088 (0.156)   0.926 (0.140)   0.9589
50           0.005 (0.309)   0.298 (0.072)   0.095 (0.187)   0.886 (0.163)   0.9563
100          0.001 (0.341)   0.317 (0.101)   0.116 (0.249)   0.796 (0.188)   0.9494
Random      -0.000 (0.420)   0.389 (0.150)   0.176 (0.375)   0.535 (0.180)   0.9307
Note. Numbers within the parentheses are standard deviations of each outcome. Condition: Exposure Control; SE: Standard Error; MSE: Mean Squared Error; Fidelity: Correlation between true ability and estimated ability
Figure 5.23: Relationship between True and Estimated Ability by Exposure Control Condition
Standard Error
As Figure 5.22 shows, the increase in the mean SE was almost nonexistent from condition Rand-1 to Rand-25. After Rand-25, there was an increase in the mean SE values. Table 5.5 also shows an increase, but a very minimal one, for the low randomization conditions. In addition, Table 5.5 shows that the standard deviation of the SEs generally increased as the amount of randomization in item selection increased. Visual inspection of the spread in Figure 5.25 indicates this as well. In Figure 5.25 the dashed line marks the SE value for a test with 0.9 reliability, as explained in Section 5.1.1.2 on page 69. The majority of the simulees had SEs smaller than 0.316 for conditions Rand-1 to Rand-50. The upper whiskers of the box plots are lower than 0.316. For the total random selection condition, most of the simulees had SEs larger than this threshold. For the Rand-100 and Rand-Total conditions, the negative effects of randomization on ability estimates are more visible.
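The 0.316 threshold is consistent with the usual relation between reliability and the standard error when the ability scale has unit variance. The derivation itself is in Section 5.1.1.2, outside this passage, so the form below is an assumption about how the value was obtained; the symbols ρ (reliability) and σ_θ (ability standard deviation) are introduced here only for illustration.

```latex
\[
SE = \sigma_\theta \sqrt{1 - \rho}
   = 1 \cdot \sqrt{1 - 0.9}
   = \sqrt{0.1} \approx 0.316
\]
```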
Figure 5.24: Bias Distribution by Exposure Control Condition
Figure 5.25: Standard Error Distribution by Exposure Control Condition
Mean Squared Error
Figure 5.22 shows that the pattern in the average MSE values was almost identical to the pattern of the SEs. The numbers in Table 5.5 indicate that for conditions between Rand-1 and Rand-25, the change in the average MSE values was minimal and not steady. There was an increasing pattern, however, for conditions from Rand-50 to Rand-Total. The change in the standard deviations of the MSE values was not steady between the Rand-1 and Rand-25 conditions. For example, the standard deviation of MSE for the Rand-20 condition was less than the standard deviation for the Rand-10 condition. On the other hand, the increase in the standard deviation of MSE values is evident for conditions Rand-50 to Rand-Total. Figure 5.26 shows this spread visually. Especially for the Rand-Total condition, the increase in the spread is evident.
Figure 5.26: Mean Squared Error Distribution by Exposure Control Condition
Figure 5.27: Item Exposure Distribution by Exposure Control Condition
Exposure Rates
The exposure rate distribution of each exposure control condition is shown in Figure 5.27. In this figure, one observation with exposure rate 1 in the Rand-1 condition has been removed to show the spread better. Since there was no exposure control in the Rand-1 condition, the same item was selected as the first item for each simulee. For this reason the exposure rate of that item is 1. In the figure, the bold lines in the middle of the box plots show the medians of the exposure rates. The median exposure rate increased as the amount of randomization in item selection increased, except for the Rand-100 condition. It is important to note that since the ratio of item pool size to test length is constant across conditions, the mean exposure rates were equal for each condition (Table 5.6). It can be observed that the exposure control method worked. The number of highly exposed items decreased as the amount of randomization imposed on item selection increased. For the total randomization condition, all items were exposed at a similar rate.
Table 5.6 shows the means and standard deviations of exposure rates in addition to the proportions of exposure rates that are larger than 0.20 and smaller than 0.05. Even though the means of exposure rates were equal, the standard deviations of exposure rates decreased steadily except for condition Rand-100. A small standard deviation of exposure rates means that items are exposed uniformly across examinees. The standard deviation of exposure rates is important because, as Chen et al. (2003) showed in their paper (Equation 14, p. 134), there is a direct relationship between the variance of exposure rates and the average between-test overlap rate. In the CAT scenario here, where the mean exposure rates were equal between conditions, a reduction in the standard deviation of exposure rates directly translates to a reduction in the average between-test overlap rate. This means that, for conditions where the standard deviation of exposure rates was lower, the proportion of items that two randomly selected examinees both see would be lower. This is an important test security issue for an operational high-stakes CAT.
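The link between exposure variance and test overlap can be sketched numerically. The function below uses the large-sample form overlap ≈ (n / L) · S² + L / n, where n is the pool size, L the test length and S² the variance of the exposure rates. This form is my reading of the relationship attributed to Chen et al. (2003) and should be checked against their Equation 14; the function name is hypothetical.

```python
def mean_overlap_rate(sd_exposure, pool_size, test_length):
    """Approximate average between-test overlap rate.

    Assumes the large-sample relation
    overlap = (n / L) * variance of exposure rates + L / n.
    """
    variance = sd_exposure ** 2
    return (pool_size / test_length) * variance + test_length / pool_size


# Exposure SDs for two conditions, taken from Table 5.6 (n = 250, L = 20).
print(round(mean_overlap_rate(0.0916, 250, 20), 3))  # Rand-1
print(round(mean_overlap_rate(0.0025, 250, 20), 3))  # Rand-Total
```

Under this approximation the overlap cannot fall below L / n = 0.08 for this design, and the Rand-Total condition is very close to that floor.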
The percentage of items that were exposed to more than 20% of the examinees decreased as the amount of randomization in item selection increased. From the Rand-10 condition onward, none of the items were exposed to more than 20% of the examinees. The percentage of items that were exposed to less than 5% of the examinees also decreased as the amount of randomization in item selection increased, except for the Rand-100 condition. This indicates that all of the items were utilized to a greater extent and the item pool became more efficient. These results showed that, from the exposure control perspective, total randomization of item selection did the best job. But as the previous results regarding bias, SE and MSE suggested, this condition is not desirable from the other aspects of measurement practice.
Table 5.6: Item Exposure Analysis by Exposure Control Condition
Exposure Control   Mean Exposure   SD Exposure   Exposure > .20   Exposure < .05
1                  0.0800          0.0916        0.0400           0.4560
3                  0.0800          0.0566        0.0400           0.2800
5                  0.0800          0.0490        0.0480           0.2920
7                  0.0800          0.0439        0.0160           0.2520
10                 0.0800          0.0378        0.0000           0.1600
13                 0.0800          0.0343        0.0000           0.1280
15                 0.0800          0.0323        0.0000           0.1120
20                 0.0800          0.0276        0.0000           0.0520
25                 0.0800          0.0247        0.0000           0.0600
50                 0.0800          0.0180        0.0000           0.0040
100                0.0800          0.0216        0.0000           0.1000
Random             0.0800          0.0025        0.0000           0.0000
IPUI
The previous paragraphs showed that the mean values of bias, MSE and SE did not differ for exposure control conditions between Rand-1 and Rand-25. On the other hand, Figure 5.22 shows that IPUI decreased steadily as more randomization was imposed on the item selection mechanism. For conditions between Rand-50 and Rand-Total, the decrease is very clear. Table 5.5 shows this decrease numerically. The standard deviation of IPUI values increased steadily as well, except for the Rand-Total condition.
Figure 5.28 displays the distribution of IPUI values for each exposure control condition visually. The increase in the spread of IPUI values is apparent from the box plots shown. The bold lines in the box plots show the median values of IPUI. Median values were generally larger than the average values given in Table 5.5. Consequently, the decrease in the median values cannot be seen from the graph.
Figure 5.28: IPUI Distribution by Exposure Control Condition
For the Rand-Total condition, even though the overall IPUI values were smaller, the majority of the examinees had IPUI values larger than 0.6. The main reason for this was the overlap of the item pool and ability distributions. If there were a large discrepancy between the item pool and the ability distribution, the overall IPUI values would be even lower.
IPUI and Standard Error
The relationship between SE and IPUI is shown in Figure 5.29. For each condition there was a negative relationship between SE and IPUI. The relationship was curvilinear, and there was more spread in the values of IPUI compared to SE. The relationship between bias and IPUI is plotted in Figure D.3 on page 185. There was no clear relationship between these two variables.
5.1.3 Research Question 3
In this research question, the use of IPUI as a diagnostic tool is evaluated. A hypothetical example of a state testing agency was introduced in Section 4.2.1.4 on page 44. This state testing agency had three plans to test. Information about the item pools of these plans was shown in Table 4.1 on page 46; the distributions of the item pools are in Figure 5.30. The first plan consisted of an item pool of size 90 from three different content areas. The overall difficulties of each content area were different. In this plan, content balancing was imposed on the item selection. In the second plan, the same item pool used for Plan 1 was used. Different from Plan 1, content balancing was not imposed on the item selection. In Plan 3, the item pool consisted of 90 items from the content area. No content balancing was imposed on the item selection for this plan either. Other than the differences between item pools and the content balancing, the CAT specifications of the plans were the same.
The following paragraphs investigate these item pools using traditional CAT outcomes such as mean bias and mean SE at each true θ. These results are compared to the IPUI results, and the utility of IPUI as a diagnostic tool is discussed. Since this research question investigates the item pools, in the following pages the plans are referred to as item pools. So, Plans 1, 2 and 3 are referred to as IP-1, IP-2 and IP-3, respectively.
Figure 5.29: IPUI and Standard Error Relationship by Exposure Control Condition
Figure 5.30: Item Pool Distributions of Proposed Test Plans
Bias
Figure 5.31 shows the mean bias values conditional on true θ values for each item pool. IP-1 and IP-2 performed similarly throughout the ability scale. IP-1 had overall higher bias compared to IP-2. This shows the effect of content balancing. IP-3 had similar mean bias on the positive side of the ability scale. On the negative side of the ability scale, its mean biases were much larger compared to the other item pools. As Figure 5.30 shows, IP-3 did not have any items that had difficulty parameters less than 0.5. Clearly, this reflected on the biases of the ability estimates. The bias distributions for each item pool condition shown in Figure E.1 on page 186 corroborate this observation.
Standard Error
Figure 5.32 shows the mean standard error values conditional on true θ values for each item pool. IP-2 of Plan 2 performed the best throughout the ability scale except at the positive end of the ability scale. Outside the positive end, the mean SE values for this item pool were the lowest. The mean SE for IP-1 of Plan 1 had the same shape as that of IP-2 but had larger values. Close to the middle of the ability scale, the difference between these two item pools almost disappeared. Mean SE values of IP-3 were substantially large on the negative side of the ability scale. To the right of θ = 0.5, where this item pool was comparatively strong, it had the lowest SEs. Figure E.2 on page 187 shows the distributions of the SEs at each true θ value. This graph also corroborates the mentioned differences among the item pools.
Figure 5.31: Mean Bias Conditional on True θ for each Plan (Item Pool)
IPUI
Figure 5.33 shows the mean IPUI values conditional on true θ values for each item pool. The performance of IP-2 of Plan 2 was better than the other item pools for θ values smaller than 1. For true θ values larger than 1, the performance of IP-3 was better. The mean IPUI values of IP-1 were lower than those of IP-2 for all true θ values. This clearly shows the effect of content balancing on the performance of the item pools.
The similarity of SE and IPUI can be observed from the figures. If the lines in Figure 5.33 were flipped along the x axis, the shapes would resemble the SE distributions in Figure 5.32. But the differences between these two are also apparent. For example, even though the IPUI results show a large difference between the performances of the item pools for Plan 1 and Plan 2, this was not reflected on the SE for θ values between -0.5 and 0.5. For the true θ value 0, the absolute difference between the IPUI values of IP-1 and IP-2 was about 0.2, whereas the absolute difference of the mean SE values was almost nonexistent, 0.014. On the other hand, for θ = 2.2, the absolute differences between both the mean IPUI values and the mean SE values were 0.17. So, there is not a one-to-one relationship between these two indicators.
Figure 5.32: Mean Standard Error Conditional on True θ for each Plan (Item Pool)
Figure 5.33 shows the weak points of the item pools for each plan. It is advisable for test developers to add more items around the θ values where IPUI values were low. But when there is content balancing, as in Plan 1, the advice of adding more items to the item pool where IPUI values are low might not be clear, because a test developer might ask "Add items from which content area?". To answer this question, the test developer might check the graph of the mean IPUI value at each test question for a given true θ.
Figure 5.33: Mean IPUI Conditional on True θ for each Plan (Item Pool)
Figure 5.35 shows the graph of the mean IPUI at each item number of the test. In this graph a test developer can observe whether the item pool's performance declines as the test proceeds for certain groups of examinees. If the item pool cannot provide appropriate items, the IPUI value will decrease as the test proceeds. Such information will guide test developers about how many items need to be added to the item pool to resolve this deficiency. For example, IP-2 was able to provide appropriate items to the examinees with true θ values between -1.4 and 1.4 throughout the test. But at the extremes of the ability scale, after the second item, the item pool failed to provide appropriate items. At around 13 items, the items did not match the ability estimates of the examinees with extreme true θ values.
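The per-position averages behind a graph of this kind can be sketched as follows. The input layout is an assumption made for illustration: one list per examinee holding the IPUI value of each administered item in order, for a group of examinees that share the same true θ. The function name is hypothetical.

```python
def mean_ipui_by_position(ipui_by_examinee):
    """Average item-level IPUI at each item position of the test.

    ipui_by_examinee: list of lists; inner lists may differ in length
    (for example under a variable-length stopping rule).
    """
    longest = max(len(values) for values in ipui_by_examinee)
    means = []
    for pos in range(longest):
        # Only examinees who reached this position contribute.
        at_pos = [v[pos] for v in ipui_by_examinee if len(v) > pos]
        means.append(sum(at_pos) / len(at_pos))
    return means
```

A declining sequence of means indicates that the pool runs out of appropriate items for that group as the test proceeds.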
IP-2 and IP-3 show a smooth declining trend from the beginning of the test to the end for all of the true θ values, but IP-1 shows a zigzag shape. The reason behind this is the content balancing. For example, for the true θ value 1.4, the CAT algorithm on average gave an item with an IPUI value of 0.24 as the fourth item in Plan 1. But the next item had a higher IPUI value, and the sixth item had an even higher IPUI value. For item 4, even though there were many appropriate items available in the item pool, there was not a difficult item in content area 1 (arithmetic) to present to the examinees. Content balancing combined with a weak item pool distorted the motivation behind the CAT: providing the most appropriate items to the examinees. The problem of content balancing when there is an interaction between item difficulty and content areas was also described in Way et al. (2002):
If some item types are inherently more difficult than others and the content specifications call for every examinee to be administered equal numbers of each, the algorithm will tend to choose the most difficult items from the easy content areas for the high-ability examinees. (p. 146)
Figure 5.34: IPUI Distribution Conditional on True θ for each Plan (Item Pool)
A test developer can look at Figure 5.35 and add items to the content areas that have low IPUI values. For item pools without content balancing (IP-2 and IP-3), this graph shows similar information to the previous two graphs. Still, this graph gives the test developer an idea of approximately how many items should be added to the item pool in the vicinity of a given θ to make it satisfactory.
Figure 5.36 shows the relationship between the intermediate θ estimates of each examinee and the IPUI value of the particular item administered at that intermediate θ estimate. The points are color coded based on the content of the item administered. This graph combines all of the examinees at each true θ condition. In this regard, the ability distribution of the examinees can be seen as a uniform distribution. Since there was no content balancing in the other plans, the color coding is relevant only for Plan 1.
This graph is very helpful for test developers to see the weak spots of the item pool. For Plan 1, due to the content balancing imposed, IPUI values were low when the item selection algorithm selected an item from a content area where the intermediate θ estimates were outside the difficulty range of that content area. For instance, when an examinee's intermediate θ estimate was low and the item selection algorithm had to administer a trigonometry item, the IPUI value for that intermediate θ estimate would be low, even though the easiest trigonometry item available was administered to the examinee. As the test proceeded, the item pool was depleted of easy trigonometry items and even harder trigonometry items were administered to the examinee. This vicious cycle continues until the test ends. A test developer can observe this from Figure 5.36 and add easy trigonometry items to the item pool. Algebra items were adequate around the middle of the θ scale, but more algebra items are needed outside the θ interval [-1, 1].
Figure 5.35: Mean IPUI at each Item Number for Selected True θs
The same item pool performed better in Plan 2, when there was no content balancing imposed. The item pool was adequate when examinees' intermediate ability estimates were between -1.5 and 1.5. This item pool needs very easy and very difficult items.
The item pool for Plan 3 performed well between 0.5 and 2. Outside this interval, this item pool failed to provide appropriate items to the examinees. More items are needed for this item pool with item difficulty parameters equal to the intermediate θ estimates where IPUI values were low.
Figure 5.36: The Relationship between Intermediate θ Estimate and IPUI
One limitation of this graph is its dependence on the quality of the θ estimates. For Plan 3, even though there were many examinees that had true θ values between -3 and -2, almost none of these examinees had θ estimates that were lower than -2. In fact, as Figure 5.31 on page 99 shows, there was a large positive bias in the estimates for Plan 3. This might cause a test developer to think that no items are needed with item difficulty parameter values smaller than -2. Adding easier items to this item pool would reduce the biases in the θ estimates and potentially highlight the need for much easier items.
Figure 5.37 shows the relationship between intermediate θ estimates and the difficulty parameters of the items administered at those θ estimates. The points are colored to indicate the level of the IPUI values. The green points represent high IPUI values and the red points represent low IPUI values. The dashed red line is the identity (y = x) line. If the item pool provided appropriate items to the examinees, the points would fall on the identity line.
Figure 5.37: The Relationship between Intermediate θ Estimate and Item Difficulty
The graph for Plan 1 shows the effect of content balancing. The points deviated from the identity line throughout the θ scale. This item pool was adequate between -2 and 3 for Plan 2. The deviations started towards the extremes. This item pool was not adequate when the intermediate θ estimates reached the extremes of the θ scale. The item pool for Plan 3 was adequate for only a small portion of the θ scale. This item pool was not adequate for the θ estimates outside the interval between 0 and 2.
5.2 Second Phase - Operational Data
In the second phase of the analysis, Research Questions 4 and 5 were investigated. These two research questions were answered using five operational item pools in addition to four generated item pools. Two of the generated item pools were ideal item pools generated for the specifications and examinee population of the NCLEX-RN examination. The other two item pools were generated from the first operational item pool by removing some proportion of the items. In the following sections, the results of the item pool generation are presented first; the other two sections present the results of Research Questions 4 and 5.
5.2.1 Ideal Item Pool Generation
Two ideal item pools were generated to be compared with the operational item pools. The first ideal item pool had fixed bin sizes with lengths 0.4. The middle bin was centered around 0 on the θ scale. The second ideal item pool had fixed bin sizes with lengths 0.8. The first item pool was designed to be more precise compared to the latter. Reducing the length of the bin sizes further would increase the precision, but in that case more items would be needed. If the item pool had more items than necessary, most of them would not be administered to the examinees. This would reduce the efficiency of the item pool. Previous research (He & Reckase, 2013) and previous unpublished item pool design studies for NCSBN showed that bin widths of 0.4 and 0.8 would be good choices.
In the creation of the ideal item pools, 10,000 examinees were simulated. Each simulee needed different items depending on her θ value and responses. The size of the ideal item pool grew as the number of simulees increased. After a number of simulees, there was a lot of overlap between the items needed by simulees, and the need for new items decreased unless a simulee had an extreme ability or irregular responses. Figure 5.38 shows the growth of the size of the ideal item pool for bin size 0.4. The growth graph for bin size 0.8 is in Figure F.1 on page 188. The first panel of the graphs shows the growth progress for each content area. The second panel of the graphs shows the overall growth progress of the ideal item pool.
Figure 5.38: Progress Plot for Ideal Item Pool with Fixed Bin Size 0.4
These growth progress graphs are important in the diagnosis of the bin-and-union method. If the lines do not converge to a single point, this means the number of examinees simulated was not enough for the creation of the ideal item pool. In that case, more simulees are needed to obtain a stable ideal item pool. For the ideal item pools generated for this dissertation, convergence was reached after 3000 simulees for the bin size 0.4 item pool and after 2000 simulees for the bin size 0.8 item pool.
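The growth curves described above can be illustrated with a rough sketch of a bin-and-union step. This is my reading of the general idea and not the procedure used in the dissertation, which is defined in an earlier chapter and also handles content areas. The assumptions are: each simulee's needed items are given as a list of difficulty values, bins have a fixed width with the middle bin centered at 0, and the pool keeps, for every bin, the largest count any simulee needed so far. All names are hypothetical.

```python
import math
from collections import Counter


def bin_index(b, width, max_abs_index=None):
    """Index of the fixed-width bin containing b; bin 0 is centered at 0.

    If max_abs_index is given, more extreme values fall into the
    outermost bins, which then extend to infinity.
    """
    idx = math.floor(b / width + 0.5)
    if max_abs_index is not None:
        idx = max(-max_abs_index, min(max_abs_index, idx))
    return idx


def bin_and_union(needed_by_simulee, width):
    """Return (items needed per bin, pool size after each simulee)."""
    pool = Counter()
    growth = []
    for needed in needed_by_simulee:
        counts = Counter(bin_index(b, width) for b in needed)
        for idx, count in counts.items():
            pool[idx] = max(pool[idx], count)  # union keeps the maximum
        growth.append(sum(pool.values()))
    return pool, growth
```

Plotting `growth` against the number of simulees gives a curve of the kind shown in Figure 5.38; a curve that has stopped rising suggests that enough simulees were used.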
At the end of the simulations, the ideal item pool with bin size 0.4 had 3071 items. The ideal item pool with bin size 0.8 had 1652 items. The item difficulty parameter distribution by content area for the ideal item pool with bin size 0.4 is in Figure 5.39. The same graph for bin size 0.8 is in Figure F.2 on page 189. The b-parameter distributions had almost the same shape for each content area. The number of items within each content area reflected the content area distributions shown in Table 4.2 on page 47. The peaks towards the extremes of the ability scale reflect the relatively large number of items within the last bins. The bins at the extremes captured all of the items that were needed on the θ scale from the last bin's boundary to infinity.
5.2.2 Item Pools Used in the Second Phase
In the second phase of this study, five operational item pools (named Op1, ..., Op5) were compared to four generated item pools. Two of the generated item pools were ideal item pools: the ideal item pool with bin size 0.4 (Ideal 0.4) and the ideal item pool with bin size 0.8 (Ideal 0.8). The last two item pools were generated by removing half of the first operational item pool (Half-IP) and by removing two thirds of the first operational item pool (One-Third IP).
The item difficulty distributions and the number of items within each item pool are in Figure 5.40. The dashed red lines in the histograms are the mean item difficulty values of each item pool. The number of items within each operational item pool was the same. The distributions of the operational items were almost the same as well. The majority of the items were close to the cut score. There were more easy items than difficult items within each operational item pool. Also, the operational item pools did not spread much to either extreme of the ability scale.
Figure 5.39: Item Difficulty Distributions by Content Area for Ideal Item Pool with Fixed Bin Size 0.4
The half item pool and the one third item pool had exactly half and one third as many items as the first operational item pool, respectively. The shapes of the distributions of these two item pools were similar to those of the operational item pools.
The ideal item pool with bin size 0.4 had more than twice as many items as the operational item pools. This item pool had many items close to the middle of the ability scale. Different from the shapes of the operational item pools, this item pool had an almost equal number of easy and hard items. The spread of this item pool was also wider compared to the operational item pools.
The ideal item pool with bin size 0.8 had 180 items more than the operational item pools. It had fewer items around the middle of the ability scale compared to both the operational item pools and the Ideal 0.4 item pool. It had more items at the extremes. The boundaries of the bins are visible for this item pool. Close to the bin boundaries, there was a bump in the number of items. While generating the ideal item pools, the candidate items were generated from a standard normal distribution. If the items had been generated from a uniform distribution, the bumps close to the bin boundaries would have disappeared. But if a uniform distribution had been used as the generating function, the decision for the minimum and maximum values of this distribution would have been arbitrary. The parameters of the uniform distribution cannot be infinity; as a result, the ability scale would have been confined arbitrarily.
5.2.3 Research Question 4
The aim of this research question is to observe the performance of IPUI for an operational CAT. In the previous research questions, the CAT scenarios were rather simplistic. Conditions differed in only one aspect from a very simple CAT algorithm. In this research question, a complex CAT algorithm was investigated. This CAT algorithm includes content balancing, exposure control, a two stage ability estimation method and a complicated stopping rule. Predicting the item pool performance for rather simple CAT scenarios might be feasible. But for a complex CAT algorithm such as the one investigated in this research question, predicting item pool performance would be difficult.
The same simulee sample was used for the simulations for each item pool condition. 10,000 simulees were generated from a normal distribution with a mean and standard deviation similar to those of the real examinees that took the tests. The θ distribution of the simulees is plotted in Figure G.1 on page 190.
Figure 5.40: Item Difficulty Distributions for Item Pools Used in Research Question 4
For each item pool, the means of bias, SE, MSE, IPUI, decision accuracy and the fidelity coefficient were calculated. These values for each item pool condition are given in Figure 5.41. As can be seen from the graph, except for the IPUI values there was almost no difference between the different item pools. Table 5.7 shows the numeric values of the means and standard deviations of these values. Table 5.7 also shows the mean and standard deviation of the test length for each item pool condition. In the following pages, each of these outcome variables is discussed separately.
Figure 5.41: Summary Statistics for Research Question 4
IPUI
The size of the item pool did not make much difference in the other outcomes of the adaptive test except the exposure rates (which will be discussed later in this section). Since the IPUI is a direct measure of the quality of an item pool, the differences in the item pools are expected to reflect on the IPUI values. In Figure 5.41, the only difference between item pools was in their IPUI values. The IPUI values of Half-IP and One-Third-IP were lower compared to the other item pools.
Table 5.7 shows no difference between the means and standard deviations of the IPUI values for the operational item pools. The Ideal-0.8 item pool had a similar mean IPUI value, but the standard deviation of its IPUI values was lower compared to the operational item pools. The Ideal-0.4 item pool had a mean IPUI value that is almost 1, and the standard deviation of its IPUI values was the lowest. The Half and One-Third item pool conditions had lower mean IPUI values and larger standard deviations. The quality of the item pools was clearly reflected on the statistics of the IPUI values.
Table 5.7: Summary Statistics for Research Question 4
Item Pool      Fidelity Coefficient   Bias            SE
Op1            0.9184                 0.016 (0.238)   0.222 (0.055)
Op2            0.9192                 0.014 (0.239)   0.223 (0.055)
Op3            0.918                  0.010 (0.241)   0.222 (0.055)
Op4            0.9178                 0.015 (0.240)   0.222 (0.055)
Op5            0.9197                 0.011 (0.236)   0.222 (0.056)
Half IP        0.917                  0.014 (0.241)   0.223 (0.056)
One-Third IP   0.9191                 0.012 (0.238)   0.224 (0.056)
Ideal (0.4)    0.918                  0.012 (0.240)   0.222 (0.055)
Ideal (0.8)    0.9181                 0.011 (0.239)   0.222 (0.056)

Item Pool      MSE             IPUI            Test Length        Decision Accuracy
Op1            0.057 (0.092)   0.995 (0.007)   108.556 (72.913)   92.34%
Op2            0.057 (0.093)   0.995 (0.006)   108.524 (73.285)   92.59%
Op3            0.058 (0.093)   0.995 (0.007)   108.629 (72.866)   92.61%
Op4            0.058 (0.091)   0.995 (0.007)   109.007 (73.083)   92.68%
Op5            0.056 (0.092)   0.995 (0.007)   109.635 (73.565)   92.43%
Half IP        0.058 (0.093)   0.980 (0.019)   109.446 (73.228)   92.21%
One-Third IP   0.057 (0.091)   0.957 (0.030)   110.220 (74.077)   92.23%
Ideal (0.4)    0.058 (0.095)   0.999 (0.001)   108.718 (72.903)   92.22%
Ideal (0.8)    0.057 (0.091)   0.994 (0.002)   109.352 (73.183)   92.33%
Note. Numbers within the parentheses are standard deviations of each outcome. SE: Standard Error; MSE: Mean Squared Error
Figure 5.42 shows the distributions of IPUI values visually. Operational item pools had similar distributions, but they had many examinees who had comparatively lower IPUI values. The Ideal-0.8 item pool had far fewer examinees that had lower IPUI values. None of the examinees got an IPUI value less than 0.95 for this item pool condition. The Ideal-0.4 item pool performed the best. All examinees, except 5 of them, had IPUI values larger than 0.99. The spread of IPUI values was much larger for the Half-IP and One-Third-IP conditions. In fact, none of the examinees in the One-Third-IP condition had an IPUI value larger than 0.99.
Figure 5.42: IPUI Distribution for each Item Pool Condition
Relationship between True and Estimated Ability
Figure 5.43 shows the relationship between true ability ($\theta$) and estimated ability
($\hat{\theta}$). The dashed lines in the figure are the identity lines (i.e., the $y = x$ line).
At each condition, there was an error in the estimation and points deviated somewhat from the
identity line. Close to the middle of the ability scale there is a squeeze in the spread around
the identity line; estimated abilities approximated their true values better. The reason for this
is the variable length test. Around the cut score, test lengths were longer compared to the
examinees at the extremes of the ability scale. As indicated in Section 5.1.2.1 on page 74,
longer tests increase the test precision and reduce the estimation error. Figure 5.43 also shows
the correlations between true and estimated abilities for each item pool. There was almost no
difference between the correlations of the item pools.

Figure 5.43: The Relationship between True $\theta$ and Estimated $\hat{\theta}$
IPUI and Estimated Ability
Figure 5.44 shows the relationship of IPUI and estimated ability. This graph is useful to see
the locations in the ability scale where the item pool was insufficient. Operational item pools
had very high IPUI values around the cut score. But towards the extremes of the ability scale
the spread of IPUI values increased. Especially for high ability examinees, the item pool did
not have enough appropriate items.
Ideal item pools had high IPUI values throughout the ability scale. Half-IP had lower
IPUI values for examinees close to the cut score. Towards the extremes of the ability scale
the IPUI values decreased even more, especially for high ability examinees. One-Third-IP
performed even worse than Half-IP. Close to the cut score, IPUI values made a dip, then
towards the extremes of the ability scale IPUI values fell even further. Compared to the low
ability examinees, high ability examinees had lower IPUI values.
Figure 5.44 showed that, except for the ideal item pools, the IPUI values of the examinees at
the extremes of the ability scale were not as high as those of the examinees close to the cut
score. From the perspective of test developers for this test, this might not be very problematic.
The test is a licensure test with one cut score. As long as the decision accuracy is high for the
test, the test is deemed to fulfill its purpose. For this reason, the item pool should have
sufficient items around the cut score. Test length can go up to 250 for examinees close to the
cut score. The item pool should support this long test. Figure 5.44 gives some evidence for this.
Around the cut score, for each item pool except the Half-IP and One-Third-IP conditions,
examinees had almost perfect IPUI values.
IPUI and Test Length
Additional evidence for the sufficiency of the item pool around the cut score comes from
Figure 5.45, which shows the relationship between IPUI and test length. Except for the Ideal-0.4
item pool, there was a spread of IPUI values for tests that lasted 60 items. Examinees with test
lengths of 60 were away from the cut score. Decisions for these examinees were clear after 60
items. These examinees were the ones that were located outside of the (-0.5, 0.5) band of the
ability scale in Figure 5.44.

Figure 5.44: The Relationship between IPUI and Estimated Ability for each Item Pool Condition

Tests were longer for examinees whose estimated abilities were close to the cut score.
IPUI values for examinees who took tests longer than 60 items were close to 1 for operational
item pools and ideal item pools. This is an important indicator for the quality of the item
pool. These item pools were able to support very long tests with 250 items. It is crucial for
an item pool to support long tests because high decision accuracy is much needed for the
examinees who take these long tests. These tests were long because it was difficult to make a
decision for these examinees. On the other hand, for the Half-IP and One-Third-IP conditions,
Figure 5.45 shows that IPUI values started to decrease as test length increased. This is more
visible for One-Third-IP. For these item pools, as test length increased, the item pool failed to
provide appropriate items.

Figure 5.45: The Relationship between IPUI and Test Length for each Item Pool Condition
Bias
Figure 5.41 shows no difference between the mean biases of the item pools. Table 5.7
shows that mean bias values were a little above 0 for each item pool condition. The standard
deviations of the biases were the same for different item pool conditions as well. Figure 5.46
shows this visually. There was almost no difference between the bias distributions of different
item pools.
Figure G.2 on page 191 shows the relationship between the estimated ability and bias. If
there was no bias, the points would be on the dashed line in the middle. The bias was small
around the cut score, as explained above. But there was a rather large correlation between
estimated abilities and biases for each item pool condition, as shown in the text boxes. The
regression lines fitted to the points also show this positive relationship. For high ability levels
the biases were positive; abilities were overestimated. For low ability simulees, abilities were
underestimated.
Bias and IPUI
Figure 5.47 shows the relationship between bias and IPUI. A linear regression line was fitted
to each plot to show the linear relationship between these two variables. From the graph it
cannot be concluded that high absolute bias is associated with high IPUI. The expected
relationship for this graph was an inverted-U shaped curvilinear line, where IPUI was high for
low absolute bias values and low for high absolute bias values. This conclusion is in line with
the findings of the previous research questions. On the other hand, the fitted lines show that
there is a weak negative relationship between bias and IPUI. For ideal item pools, this
relationship almost disappears. This relationship is more evident for the Half-IP and
One-Third-IP conditions. Figure 5.44 shows that both operational and shortened item pools were
weak on the positive side of the ability scale. Figure 5.47 confirms the weakness of the item
pools for high ability examinees. When item pool distributions were balanced, as in the ideal
item pools, there was no relationship between IPUI and bias.
Standard Error
Figure 5.41 shows almost no difference between the mean values of SEs for different item pool
conditions. Numeric values in Table 5.7 also confirm this. Both the means and standard
deviations of SE values were the same for each item pool condition. Only for the One-Third-IP
condition did these values increase, but this increase was very small.

Figure 5.46: Bias Distribution for each Item Pool Condition

Figure 5.48 shows the distribution of SE values visually. The dashed line shows the
threshold for a test with 0.9 reliability [1]. In this figure, except for One-Third-IP, all of the
remaining item pools had almost the same SE distributions. The number of simulees who had
high SE values was higher for the One-Third item pool condition. There was one examinee who
had a very large SE compared to the rest of the examinees. Also, the minimum values of SEs
for this item pool were larger compared to other item pool conditions. In this simulation, the
examinees with small SEs were close to the cut score. For the examinees that were close to
the cut score, SEs were larger for the One-Third-IP condition. This can be interpreted as the
weakness of that item pool.

Figure 5.47: The Relationship between IPUI and Bias for each Item Pool Condition

[1] This was explained in Section 5.1.1.2 on page 69. This number was derived under the
assumption that the standard deviation of the true abilities of the population is 1. In the
simulations of Research Question 4, the standard deviation of the simulated true abilities
was lower than 1. So, even though the dashed line gives some idea about the reliability of
the test, caution is advised when interpreting these results using this threshold.
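The caution about the 0.9 reliability threshold can be illustrated with a short worked equation. This is only a sketch: it assumes the common approximation that reliability equals one minus the ratio of error variance to true-ability variance, which may differ from the derivation in Section 5.1.1.2 (not reproduced here). The symbols $\rho$ (reliability) and $\sigma_\theta$ (standard deviation of true abilities) are introduced here for illustration.

```latex
\rho \approx 1 - \frac{SE^2}{\sigma_\theta^2}
\quad\Longrightarrow\quad
SE \approx \sigma_\theta \sqrt{1 - \rho}
= 1 \times \sqrt{1 - 0.9} \approx 0.316
```

Under this approximation the threshold scales with $\sigma_\theta$, so when the simulated true abilities have a standard deviation below 1, the SE needed for 0.9 reliability is smaller than the value obtained with $\sigma_\theta = 1$.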
Figure G.3 on page 192 shows the relationship between estimated ability and SE. There
was a clear relationship between SE and estimated ability. Towards the extremes of the ability
scale, the standard errors were higher. These were the examinees whose exams finished at
the 60th item. There was a steep decrease in SEs between $\hat{\theta} = -0.5$ and
$\hat{\theta} = -0.25$, and a steep increase between $\hat{\theta} = 0.25$ and
$\hat{\theta} = 0.5$. Most of these SEs belong to the examinees with test lengths between 61
and 249. The lowest SE values, between $\hat{\theta} = -0.25$ and $\hat{\theta} = 0.25$, belong
to the examinees with test lengths of 250. This pattern was the same for all of the item pools.
But, different than the other item pools, for One-Third-IP the SEs increased even more at
the extremes of the ability scale. The most probable cause of this increase was the scarcity of
the items within the item pool. A similar trend can be seen to some extent for the Half-IP,
especially for high ability examinees.

Figure 5.48: Standard Error Distribution for each Item Pool Condition
The relationship between SE and test length is shown in Figure G.4 on page 193. There
was a clear negative relationship between test length and SE for all item pool conditions.
The correlation coefficients in the text boxes show a very strong linear relationship between
these two variables, even though the relationship appears to be curvilinear.
IPUI and Standard Error
The relationship between SE and IPUI is shown in Figure 5.49. There is not an apparent
relationship between SE and IPUI because this relationship is confounded with the test length.
In fact, this figure resembles Figure 5.45, only the x-axis is flipped. There is a direct
relationship between test length and SE, as shown in Section 5.1.2.1 on page 77. For tests with
60 items, SE was high, and IPUI was generally low for short tests as shown in Figure 5.45. As
test length increased, the SE decreased. For operational and ideal item pools, the IPUI values
were near 1 even for tests with 250 items. As a result, even though SEs decreased, IPUI values
remained close to 1. For the One-Third-IP condition, IPUI values were lower when SEs were
lower. The reason for this was the lack of sufficient items in the One-Third item pool to
provide examinees with very long tests.
Mean Squared Error
Figure 5.41 shows no difference in the average MSE values between item pools. The values in
Table 5.7 also do not show any noticeable difference in either the means or the standard
deviations of the MSE values. Visual inspection of the individual MSE values in Figure 5.50
does not show any visible difference between item pools.
Figure G.5 on page 194 shows the relationship between IPUI and MSE. There was a weak
negative association between these two variables. It was expected that the examinees who
did not take the most appropriate items (i.e., had low IPUI values) also had high MSE values.
Exposure Rates
NCLEX-RN is a high stakes test. Controlling the exposure rates is very important for the
security of the test. Figure 5.51 shows the exposure rate distribution for each item pool
condition. The dashed lines in the figure show the 0.20 and 0.05 levels for exposure rates,
which correspond to the recommended high and low exposure thresholds for items, respectively.
The exposure rate distributions of all operational item pools had a negative skew. Operational
item pools were very successful in containing the exposure rates below 0.2. Nearly half of the
items in operational item pools had exposures lower than 0.05. The median exposure rates
(shown by the bold lines in the boxplots) were all lower than 0.05. So, there were a lot of
items that were administered to less than 5% of the examinee sample.

Figure 5.49: The Relationship between IPUI and Standard Error for each Item Pool Condition

For the Half-IP and One-Third-IP, the exposure rate distributions had a balanced spread
and less skew compared to other item pool conditions. But many items had exposure rates
larger than the 0.2 threshold. Especially for the One-Third-IP, the majority of the items had
exposure rates larger than 0.2.

Figure 5.50: Mean Squared Error Distribution for each Item Pool Condition

The performance of the ideal item pools was not perfect in the sense of exposure control.
None of the items had exposure rates larger than 0.18 for the Ideal-0.4 item pool (the ideal
item pool with bin size 0.4). But about 75% of the items had exposure rates lower than 0.05.
For the Ideal-0.8 item pool, many items were exposed to more than 20% of the examinees. But
the majority of the items had exposure rates less than 0.05.

Figure 5.51: Exposure Rate Distribution for each Item Pool Condition
Table 5.8 shows the means and standard deviations of exposure rates in addition to the
ratio of exposure rates that are larger than 0.20 and smaller than 0.05. Mean exposure rates
reflect the number of items in the item pool. Because NCLEX-RN is a variable length test,
there was not a functional relationship between mean exposure rates and item pool sizes as in
the fixed test length tests. Exposure rate outcomes of operational item pools were very similar
to each other. All had similar means and standard deviations for the exposure rates. Only a
few items were exposed to more than 20% of the examinees. Within the operational item
pools, Op-5 had the most items that had exposure rates larger than .2, which corresponds to
42 items (out of 1472 items). Almost half of the items were exposed to less than 5% of the
examinees.
The results of the exposure rates were not perfect for any item pool. Operational item
pools successfully limited the exposure rates below 0.2 but they had a lot of underexposed items.
The same is true for the Ideal-0.4 item pool. But this item pool had many more underexposed
items (75% of the item pool). This item pool was not efficient in this sense. The Half-IP and
One-Third-IP conditions had fewer underexposed items, 31% and 16% of the item pools
respectively. But they had many overexposed items as well, 39% and 59% of the item pools
respectively.
Table 5.8: Item Exposure Analysis by Item Pool Condition

Item Pool      Mean Exposure   SD       Exposure > .20   Exposure < .05
Op1            0.0737          0.0640   0.0034           0.5258
Op2            0.0737          0.0641   0.0163           0.5326
Op3            0.0738          0.0637   0.0170           0.5183
Op4            0.0741          0.0648   0.0068           0.5312
Op5            0.0745          0.0662   0.0285           0.5346
Half IP        0.1487          0.1108   0.3872           0.3139
One-Third IP   0.2254          0.1365   0.5869           0.1616
Ideal (0.4)    0.0354          0.0458   0.0000           0.7473
Ideal (0.8)    0.0662          0.0826   0.1362           0.6138
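
The quantities reported in Table 5.8 can be computed from simulation output along the following lines. This is a minimal sketch, not the code used in the study: the function name `exposure_summary`, the input layout (one list of administered item ids per examinee, with ids running from 0 to `pool_size - 1`), and the use of the sample standard deviation are all assumptions made for illustration.

```python
from collections import Counter


def exposure_summary(administered, pool_size, high=0.20, low=0.05):
    """Summarize item exposure rates for one item pool condition.

    administered: list of lists; each inner list holds the item ids
                  (0 .. pool_size - 1) given to one examinee (assumed layout).
    high, low:    over- and underexposure thresholds (0.20 and 0.05 in the text).
    """
    n_examinees = len(administered)
    # Count each item at most once per examinee.
    counts = Counter(item for test in administered for item in set(test))
    # Items that were never administered get an exposure rate of 0.
    rates = [counts[i] / n_examinees for i in range(pool_size)]
    mean = sum(rates) / pool_size
    # Sample standard deviation (an assumption; the source does not say which).
    sd = (sum((r - mean) ** 2 for r in rates) / (pool_size - 1)) ** 0.5
    return {
        "mean": mean,
        "sd": sd,
        "prop_over": sum(r > high for r in rates) / pool_size,
        "prop_under": sum(r < low for r in rates) / pool_size,
    }
```

When no item is given twice to the same examinee, the mean exposure rate equals the mean test length divided by the pool size. For Op5, 0.0745 x 1472 is about 109.7, close to the mean test length of 109.635 reported in Table 5.7.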
A further investigation of exposure rates revealed the problems for each item pool.
Figure 5.52 shows the relationship between exposure rates and item difficulties grouped by
content areas. For each operational item pool, most of the exposed items had difficulties close
to the cut score. This is expected because the test lengths were longer around the cut score
and the mean of the examinee population was close to the cut score as well. For operational
item pools, most of the items that had low exposure rates were at the extremes of the ability
scale.
One-Third-IP had most of the overexposed items around the cut score as well. But
the peak point was around 0.3. There was a scarcity of difficult items. Most of the
underexposed items for this item pool condition were at the lower end of the ability scale.
The Ideal-0.4 item pool had a lot of underexposed items at the extremes. None of the content
areas had larger exposure rates compared to other content areas. One exception might be
the Ideal-0.8 IP condition. The first content area had somewhat higher exposure rates in this
item pool.

Figure 5.52: The Relationship between Exposure Rates and Item Difficulties Grouped by
Content Area for each Item Pool Condition
Decision Accuracy
NCLEX-RN is a certification test. So it is crucial to have a high decision accuracy. Table 5.7
shows the decision accuracy for tests based on each item pool. For each item pool, decision
accuracy was about 92%. Even the ideal item pool with over 3000 items did not have higher
decision accuracy than the One-Third-IP. Table 5.9 shows the detailed information about the
decision accuracy. Most of the examinees passed the test (around 66%). The decision was
incorrect for about 8% of the examinees. The percentages of the false negatives (examinees who
incorrectly failed) were higher than the false positives (examinees who passed incorrectly).
Even though an incorrect decision is not good, usually for exams false negatives are better
than false positives [2].
Table 5.9: Decision Accuracy Analysis by Item Pool Condition

Item Pool      Fail (C.D.)   Fail (I.D.)   Pass (C.D.)   Pass (I.D.)
Op1            29.73%        4.15%         62.61%        3.51%
Op2            29.64%        3.81%         62.95%        3.6%
Op3            29.93%        4.08%         62.68%        3.31%
Op4            29.77%        3.85%         62.91%        3.47%
Op5            29.73%        4.06%         62.7%         3.51%
Half IP        29.69%        4.24%         62.52%        3.55%
One-Third IP   29.77%        4.3%          62.46%        3.47%
Ideal (0.4)    29.73%        4.27%         62.49%        3.51%
Ideal (0.8)    29.58%        4.01%         62.75%        3.66%

Note. Percentages of simulees who failed or passed the test, and whether the
decision was correct or incorrect.
C.D.: Correct Decision; I.D.: Incorrect Decision
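
The four cells of Table 5.9 can be tallied from simulated true abilities and final pass/fail decisions as sketched below. This is an illustrative sketch only: the function name `decision_table`, the default cut score of 0 on the ability scale, and the convention that a true ability exactly at the cut counts as passing are assumptions, not details taken from the study.

```python
def decision_table(true_thetas, passed, cut=0.0):
    """Cross-tabulate pass/fail decisions against true status, in percent.

    true_thetas: simulated true abilities.
    passed:      booleans, True if the CAT passed the examinee.
    cut:         cut score on the ability scale (assumed to be 0 here).
    """
    n = len(true_thetas)
    cells = {"fail_cd": 0, "fail_id": 0, "pass_cd": 0, "pass_id": 0}
    for theta, did_pass in zip(true_thetas, passed):
        truly_passing = theta >= cut  # assumption: ties count as passing
        if did_pass:
            # Passing is correct only if the true ability is at or above the cut.
            cells["pass_cd" if truly_passing else "pass_id"] += 1
        else:
            # Failing a truly passing examinee is a false negative (Fail, I.D.).
            cells["fail_id" if truly_passing else "fail_cd"] += 1
    result = {key: 100.0 * value / n for key, value in cells.items()}
    # Decision accuracy is the share of correct decisions of either kind.
    result["accuracy"] = result["fail_cd"] + result["pass_cd"]
    return result
```

The accuracy line mirrors how the two tables relate: for Op1, 29.73% + 62.61% = 92.34%, the decision accuracy reported in Table 5.7.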
5.2.4 Research Question 5
The aim of this research question is to demonstrate the diagnosis of the quality of an
operational item pool using IPUI and to guide test developers to build better item pools. Six
item pools were investigated. Two of these item pools were operational item pools, two of
them were ideal item pools with bin sizes 0.4 and 0.8 [3], and two of them were one third and
half of the operational item pools (One-Third-IP and Half-IP respectively). For each
item pool condition, the same NCLEX-RN CAT specifications as described in Section 4.2.2
on page 47 were used.
In the following paragraphs, item pools are diagnosed using different CAT outcome
variables. The last outcome variable investigated is IPUI. It is hypothesized that IPUI can
unearth the deficiencies in the item pools that other outcome variables could not.

[2] In reality, this preference depends on the cost of a false positive and a false negative
decision.
[3] Check Section 4.2.2.1 for the details of these item pools.
Bias
The relationship between true $\theta$ and the mean of estimated $\theta$s for each item pool
condition is shown in Figure H.1 on page 195. This graph shows two bumps close to the cut
score. Away from the middle of the ability scale, there seems to be a perfect relationship
between true $\theta$ and the mean of estimated $\theta$s. This graph does not show any
difference between the item pool conditions.
Figure 5.53 shows the mean of biases at each true $\theta$ for item pools. This graph is a more
detailed version of Figure H.1. Each item pool had a similar bump around the cut score. The
reason for these bumps was the termination rule. The tests of the examinees ended when
the cut score was outside the confidence intervals around their estimated abilities. Since
their tests ended, examinees did not find the opportunity to converge to their true $\theta$s and
consequently a bias occurred. For example, take the group of examinees with true
$\theta = 0.25$. The mean bias for these examinees was 0.10, which means their mean estimated
$\theta$ was about 0.35 (= 0.10 + 0.25). When these examinees correctly answered several items,
their estimated $\theta$s increased (above their true $\theta$s) and, since the cut score fell out
of the confidence interval, their tests ended. If the CAT algorithm had administered more items
to these examinees, their estimated $\theta$s would have decreased. But since the test ended, it
did not have the opportunity to converge on a good estimate.
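
The termination rule described above can be sketched as a small stopping function. This is a hedged illustration rather than the operational NCLEX-RN rule: the function name `should_stop`, the cut score of 0, and the multiplier `z = 1.96` (a 95% interval) are assumptions; the minimum and maximum test lengths of 60 and 250 are the values mentioned in the text.

```python
def should_stop(theta_hat, se, n_items, cut=0.0, z=1.96, min_len=60, max_len=250):
    """Decide whether a variable-length CAT should end after n_items.

    theta_hat: current ability estimate.
    se:        standard error of theta_hat.
    cut:       cut score on the ability scale (assumed 0).
    z:         interval multiplier (assumed 1.96; the source does not state it).
    """
    if n_items < min_len:
        return False  # never stop before the minimum test length
    if n_items >= max_len:
        return True   # always stop at the maximum test length
    # Stop once the cut score lies outside theta_hat +/- z * se.
    return abs(theta_hat - cut) > z * se
```

With these assumed settings, an examinee whose estimate has drifted up to 0.35 with a standard error of 0.15 is stopped at item 60, because 0.35 exceeds 1.96 x 0.15 = 0.294. That is the mechanism behind the bumps: the test ends while the estimate is still above the true value.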
Towards the extremes of the $\theta$ scale, there appears to be a difference between item
pools. The ideal item pools (especially the Ideal-0.4 item pool) had mean biases close to 0.
The One-Third and Half item pools had negative bias at $\theta$ values close to -3, and positive
bias at $\theta$ values close to 3. The bias in One-Third IP is more visible for positive true
$\theta$ values larger than 2.

Figure 5.53: Mean Bias Conditional on True $\theta$ for each Item Pool Condition
Figure H.2 on page 196 shows the distribution of bias at each true $\theta$ value for the item
pool conditions. Some of the true $\theta$ values close to the cut score were omitted to make the
graph more readable. For all of the item pool conditions, the variation in the biases increased
towards the extremes of the ability scale, except for the ideal item pools. The increase in the
variation is more visible for the One-Third item pool condition.
Standard Error
Figure 5.54 shows the mean SE values at each true $\theta$ value for different item pool
conditions. SE values close to the cut score were lower because of the long tests these
examinees took. For true $\theta$ values outside of the interval $[-1, 1]$ on the $\theta$ scale,
test lengths were almost always 60 (Figure H.5 on page 199). The differences between different
item pools become clear for these test lengths. The One-Third item pool had the highest mean SE
value, followed by Half-IP. There was almost no difference between the two operational item
pools. The mean SE values for operational item pools were lower than Half-IP but higher than
the ideal item pools. Ideal item pools had the lowest SEs along the $\theta$ scale. The Ideal-0.4
item pool had smaller SEs even at the very extremes of the ability scale.
The results for the SEs outside the interval $[-1, 1]$ on the $\theta$ scale reflect the number
of items within each item pool around these values. Clearly, the lack of a sufficient number of
items affected the SE values. Close to the middle of the ability scale, there was almost no
difference between item pool conditions. In Figure 5.54, it is difficult to see the difference
between item pools for true $\theta$ values close to the cut score. Figure H.3 on page 197 shows
the mean SE for true $\theta$ values between -0.7 and 0.7. There was almost no systematic
difference between item pools within this interval. One-Third-IP had slightly higher values
between -0.2 and 0.2, but there was almost no difference outside this interval.
Figure H.4 on page 198 shows the standard error distribution at each true $\theta$ value for
the item pool conditions. Some of the true $\theta$ values close to the cut score were also
omitted in this graph to make it more readable. The variation of SEs was large at the middle and
toward the extremes of the ability scale for the operational and reduced item pools. For
the ideal item pools, the variation was large close to the middle of the ability scale, but
variation did not increase towards the extremes. This can be seen as an indicator of the
lack of appropriate items at the extremes for the operational and reduced item pools. The
test lengths of the examinees at the extremes were 60. So, the only factor that affected the
increase in SE should be the lack of appropriate items at the extremes.
Mean Squared Error
Figure 5.55 shows the MSE values at each true $\theta$ value for all item pool conditions.
Between true $\theta$ values -1 and 1, there appears to be almost no systematic difference
between item pools. For true $\theta$ values around -0.4 and 0.4, MSE values made a dip. Close to
true $\theta = 0$, MSE values reached a local maximum. Towards the extremes of the ability scale
the MSE values started to increase, except for the ideal item pools.

Figure 5.54: Mean Standard Error Conditional on True $\theta$ for each Item Pool Condition

The systematic differences between item pools became clear for true $\theta$ values smaller than
-2 and larger than 2. The One-Third item pool had the largest MSE, followed by the Half-IP.
The two operational item pools had MSE values between the ideal item pools and the Half-IP.
There was not a systematic difference between operational item pools. The Ideal-0.4 item pool
had the smallest MSE values. MSE values for this item pool did not increase even at the
very extremes of the $\theta$ scale. Since MSE can be seen as a function of bias and SE, this
graph makes sense. The differences towards the extremes were affected by the differences in SEs
among item pools (Figure 5.54).
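
The statement that MSE is a function of bias and SE can be made explicit with the standard decomposition of mean squared error. This is a sketch, assuming that the reported SE approximates the conditional standard deviation of $\hat{\theta}$ at a given true $\theta$; the expectation operator $E$ and the conditioning notation are introduced here for illustration.

```latex
\mathrm{MSE}(\theta)
= E\big[(\hat{\theta} - \theta)^2 \mid \theta\big]
= \big(E[\hat{\theta} \mid \theta] - \theta\big)^2 + \mathrm{Var}(\hat{\theta} \mid \theta)
\approx \mathrm{Bias}(\theta)^2 + SE(\theta)^2
```

Read this way, wherever the bias term is small, differences in MSE between item pools track differences in SE.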

Figure 5.55: Mean Squared Error Conditional on True $\theta$ for each Item Pool Condition
Decision Accuracy
Table 5.10 shows the decision accuracy at the true $\theta$ values between -0.5 and 0.5 for each
item pool condition. The tabulated results are restricted to this range because all the decision
accuracy values outside this interval were virtually 100% for each item pool condition.
Table 5.10 does not show any systematic difference between item pool conditions. Close to the
cut score the decision accuracy was close to 50%. This makes sense because if an examinee has a
true score that is equal to the cut score, then half of the time she will pass and a correct
decision will be obtained. As true $\theta$ values deviated from the cut score, the
decision accuracy improved.
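
The 50% figure at the cut score follows from a simple normal approximation, sketched below. This is an idealized illustration, not the simulation used in the study: it assumes the final estimate is unbiased and normally distributed around the true ability with a fixed standard error, and it ignores the sequential stopping rule. The function name `p_correct` and the default cut score of 0 are hypothetical.

```python
import math


def p_correct(theta, se, cut=0.0):
    """Approximate probability of a correct pass/fail decision.

    Assumes theta_hat ~ Normal(theta, se**2), so the decision is correct
    whenever theta_hat falls on the same side of the cut as theta.
    """
    z = abs(theta - cut) / se
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

At the cut score the distance is zero, so the function returns exactly 0.5, and the probability rises towards 1 as the true ability moves away from the cut, the same shape seen in Table 5.10.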
Table 5.10: Decision Accuracy Conditional on True $\theta$ for each Item Pool

True Theta   Op1      Op2      Half IP   One-Third IP   Ideal (0.4)   Ideal (0.8)
-0.50        100.0%   100.0%   99.9%     99.9%          99.9%         100.0%
-0.45        100.0%   99.9%    100.0%    99.8%          99.7%         100.0%
-0.40        99.8%    99.9%    100.0%    99.7%          99.4%         99.6%
-0.35        99.4%    99.4%    99.5%     99.5%          99.7%         99.3%
-0.30        99.1%    98.2%    98.7%     98.5%          98.7%         98.5%
-0.25        97.0%    97.9%    96.6%     95.3%          96.7%         96.9%
-0.20        91.7%    92.4%    91.9%     92.2%          93.1%         93.9%
-0.15        88.5%    84.8%    87.4%     87.0%          84.0%         86.4%
-0.10        78.1%    76.0%    80.7%     78.1%          75.7%         76.6%
-0.05        64.0%    64.6%    65.3%     64.8%          63.5%         63.8%
0.00         51.5%    47.4%    46.0%     51.4%          48.7%         48.8%
0.05         63.9%    63.9%    63.8%     65.7%          67.2%         65.2%
0.10         76.9%    80.3%    76.9%     78.1%          77.9%         76.9%
0.15         86.9%    87.5%    86.2%     88.1%          85.8%         85.1%
0.20         91.9%    93.3%    92.7%     93.8%          93.2%         93.8%
0.25         96.7%    97.0%    96.6%     96.5%          96.1%         96.4%
0.30         98.5%    98.6%    98.6%     97.9%          98.4%         98.5%
0.35         98.7%    99.2%    99.5%     99.3%          99.5%         99.3%
0.40         99.7%    99.9%    99.7%     99.8%          99.3%         99.7%
0.45         99.9%    99.9%    99.9%     99.8%          100.0%        100.0%
0.50         100.0%   100.0%   100.0%    100.0%         99.9%         99.9%

IPUI
Figure 5.56 shows the mean IPUI values at the true $\theta$ values for each item pool
condition. This figure clearly shows differences between item pools. Each item pool had high
IPUI values close to the cut score. But for examinees that were away from the cut score,
IPUI values started to decrease. The Ideal-0.4 item pool performed the best. Throughout the
$\theta$ scale, the values were close to 1 except at very extreme $\theta$ values. The Ideal-0.8
item pool had IPUI values above 0.99 between true $\theta$ values -2 and 2. IPUI values started to
decrease towards the extremes.
The Ideal-0.4 item pool shows the effect of a lack of items for true $\theta$ values that were
larger than 2.5 and smaller than -2.5. It is natural to ask why an ideal item pool does not have
enough items for examinees at the extremes. The reason is simple. This item pool was ideal
for a particular group of examinees, which had a distribution that resembles real examinees.
Figure G.1 on page 190 shows the distribution of these examinees. None of the examinees in
this distribution had true $\theta$s smaller than -2 and only a few were larger than 2. As a
result, the ideal item pools were not designed for the examinees outside this interval. It is
normal for an examinee with a true $\theta$ of -3 or 3 to have an IPUI value smaller than 1 for
these ideal item pools.
The performance of the two operational item pools was almost the same. Between true
$\theta$s of -1 and 1, the mean IPUI values were at or above .99. The mean IPUI values started to
decrease towards the extremes. At $\theta = -2$ the value decreased to .95 and for $\theta = -3$
the mean IPUI value became .7. The performance of the operational item pools at the positive
side of the $\theta$ scale was poorer compared to the negative side. At $\theta = 2$ the mean IPUI
value reduced to .94 and at $\theta = 3$ it was .68.

Figure 5.56: Mean IPUI Values Conditional on True $\theta$ for each Item Pool Condition
The performance of the Half and One-Third item pools showed the same decreasing patterns
towards the extremes of the ability scale. Even though the performances of the other item pools
between $\theta = -1$ and 1 are not distinguishable, the performance of the One-Third item pool
was clearly poorer within this interval.
Figure 5.57 shows the mean IPUI values for item pools between $\theta = -0.7$ and 0.7. The
differences between item pools are clear in this figure. Both operational and ideal item pools
had IPUI values larger than 0.99 within this interval. Mean IPUI values for the Ideal-0.4 item
pool surpassed the remaining item pools throughout this interval. Operational item pools are
indistinguishable in this figure as well. The Ideal-0.8 item pool did not perform as well as the
operational item pools close to the cut score. But as the previous figure showed, towards the
extremes its performance surpassed them. For none of the examinees did the Half item pool's IPUI
values reach 0.99. This item pool did not perform comparatively well for the examinees at
the positive side of the ability scale.
Even though Figure 5.57 shows a clear difference between the performances of item pools,
the practical importance of this difference may not be large. This issue will be discussed in
more detail in the discussion section. Here it can be said that IPUI shows even the slightest
differences between the item pools that other CAT outcomes could not capture. Compared to
the other outcome variables, IPUI clearly identifies the strengths and weaknesses of the item
pools.
The results of the previous outcome variables were inconclusive. For example, at -0.10
the mean SE of the operational item pool 1 (Op-1) was lower than the remaining item pools;
at -0.05 it was higher than the remaining ones. The same thing can be said for the bias and
MSE. For none of these outcome variables can one derive a clear conclusion about whether
an item pool is better than the others for a particular true $\theta$ value. IPUI can quantify the
sufficiency of each item pool at each true $\theta$ value. Whether the differences between the
IPUI values have a practical significance is another issue. The inconclusive results, especially
close to the cut score, for bias, SE, MSE and the decision accuracy show that the differences
IPUI detected did not have practically significant effects on the other outcomes of the adaptive
test.

Figure 5.57: Mean IPUI Values Conditional on True $\theta$ around the Cut Score for each Item
Pool
The previous two figures showed the mean values of IPUI. In addition to this, the
distribution of IPUI also gives important information about the performances of the item
pools. Figure 5.58 shows the IPUI distribution at each true $\theta$ value. Some true $\theta$
values around the cut score were omitted in this figure to make it more readable. The boxplots
shown for each true $\theta$ value include the median, quartiles and the spread of the IPUI
distribution.
For each item pool condition, the spread of the IPUI values increased towards the extremes
of the ability scale. Operational item pools performed well close to the cut score. Towards
the extremes, the spread of the IPUI values increased for these item pools. The performance
of the reduced item pools was clearly worse than the operational item pools. The One-Third
item pool did not perform well even close to the cut score. The spread of IPUI values was
large even for the examinees close to the cut score. The performance of Ideal-0.4 was the
best. Even at the extremes, examinees could get the most appropriate items.

Figure 5.58: IPUI Distribution Conditional on True $\theta$ for each Item Pool
CHAPTER 6
DISCUSSION
6.1 Summary of the Results
The aim of this study was to develop a method to evaluate the quality of item pools for
adaptive tests. Current methods to evaluate the adequacy of the item pools were discussed
in Section 2.3.6. These methods fall short as the CAT test design gets complicated. The
histogram of the item parameters for an item pool might show that the item pool is sufficient.
But changing a specification of the CAT test might make this item pool insufficient for the
purpose of the test. To solve this problem, a new index evaluating the quality of item pools
was developed. This index is called the Item Pool Utilization Index (IPUI; see Section 3.2 on
page 27 for the derivation of this index). IPUI quantifies the amount of deviance of an item
pool from a perfectly optimum item pool. This theoretical optimum item pool satisfies each
specification of the test and provides an optimum item in terms of information at each stage
of the test for every examinee.
The utility of this newly developed index was investigated by five research questions (see
Section 4.1 on page 36). The first three research questions used simulated data and the last
two questions used operational data.
Research Question 1 investigated whether IPUI is sensitive to the changes in the quality
of the item pools. Item pool quality was operationalized by (1) the discrepancy between the
item difficulty distribution of the item pool and the ability distribution of the examinees, and
(2) the item pool size. For the first part of Research Question 1, 13 discrepancy conditions
were tested. The item pool was the same for all conditions. Item difficulty parameters of
the item pool were generated from the standard normal distribution. The examinee ability
distributions were generated from a normal distribution. The means of the distributions
ranged from -3 to 3 with 0.5 intervals. The results (Figures 5.1 and 5.3 on page 54 and on
page 58) showed that IPUI was sensitive to the discrepancy between item pools and ability
distribution. Increasing discrepancy affected the other outcomes of the adaptive test as well,
such as SE. But IPUI had more variability in its evaluation of the shortcomings of the item pool
compared to SE, and the relationship between SE and IPUI was not linear (Figure 5.4 on
page 59).
The second part of Research Question 1 investigated whether IPUI is sensitive to the
changes of the size of an item pool. Eleven different item pools ranging from 20 items to
1000 items were investigated. As the number of items in the item pool increased, the mean
value of IPUI increased as well (Table 5.2 on page 67). After 300 items, the increase in IPUI
values was minimal.
Additionally, Research Question 2 looked at the sampling distribution of IPUI. For
each item pool size condition, 25 replications were performed. Within each condition, each
replication had the same item pool size. The item difficulty parameters were generated from
the same distribution. Table 6.1 shows the mean and standard deviation of mean IPUI values
aggregated by replication. The variability of the mean IPUI values was larger for small item
pool sizes. The variations of mean IPUI values had two sources: first, the variation due to
sampling of different item pools with the same sizes for each condition; second, the variation
due to IPUI.
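
The aggregation behind Table 6.1 can be sketched as a two-step summary: average IPUI within each replication, then take the mean and standard deviation of those averages. This is only an illustration under stated assumptions: the function name `summarize_replications`, the nested-list input layout, and the use of the sample standard deviation are not taken from the study.

```python
from statistics import mean, stdev


def summarize_replications(ipui_by_replication):
    """Return (mean of mean IPUIs, SD of mean IPUIs) across replications.

    ipui_by_replication: one list of examinee-level IPUI values per
                         replication (assumed layout; needs >= 2 replications).
    """
    # Step 1: one mean IPUI per replication.
    replication_means = [mean(values) for values in ipui_by_replication]
    # Step 2: summarize the replication means (sample SD is an assumption).
    return mean(replication_means), stdev(replication_means)
```

For one item pool size condition, the input would hold 25 lists, one per replication, and the two returned numbers would correspond to one row of Table 6.1.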
The specifications of CATs can get rather complex. An item pool that is working perfectly
for one set of specifications might not perform in a similar way for another set of
specifications. Research Question 2 investigated whether IPUI can detect the adequacy of the
same item pool for different CAT specifications. Two CAT specifications were investigated:
test length and exposure control.
The first part of Research Question 2 investigated 18 different test lengths ranging from 5
to 400 [1]. Results showed that as test length increased the value of IPUI decreased (Figure 5.20
on page 83). IPUI distributions indicated that this item pool can support a test length of 50
for the majority (75%) of the examinees.

[1] The size of the item pool was also 400.

Table 6.1: Means and Standard Deviations of Mean IPUIs of the Replications

Item Pool Size   Mean of Mean IPUI   Standard Deviation of Mean IPUI
20               0.5875              0.02528
40               0.8348              0.00935
60               0.9028              0.00803
80               0.9345              0.00928
100              0.9494              0.00843
200              0.9793              0.00448
300              0.9880              0.00210
400              0.9906              0.00217
500              0.9924              0.00145
750              0.9953              0.00090
1000             0.9965              0.00096
The second part of Research Question 2 investigated 12 exposure control conditions. These
conditions ranged from no exposure control to random selection of items. IPUI detected
even small differences between conditions where other CAT outcomes showed no difference
(Figure 5.22 on page 87). Results of Research Question 2 showed that IPUI is very
sensitive to even small modifications of the test specifications.
Research Question 3 was designed to show the utility of IPUI as a diagnostic tool for item
pool evaluation. Three test plans with different specifications were investigated. The first two
plans had the same item pool, but in the first plan content balancing was imposed on item
selection. For the second condition, there were no constraints on the item selection algorithm.
The third plan had an item pool consisting of rather difficult items. IPUI results clearly
showed the weak points of the item pool for each condition. Further diagnostic information
provided by different graphs of IPUI showed detailed information on the weaknesses of the
item pool for particular test specifications.
Research Questions 4 and 5 used operational data provided by NCSBN to show the utility
of IPUI. Research Question 4 compared nine different item pools with the same specifications
as the NCLEX-RN exam. Five of the item pools were the operational item pools previously
used in NCLEX-RN exams. Two item pools were the ideal item pools generated for the
specifications of the NCLEX-RN exam. Two item pools were generated by randomly selecting
half and one-third of the items of the operational item pool. The same examinee distribution as
the real examinee population was used for the comparisons. The results of Research
Question 4 showed that all of the item pool designs performed well for the examinee
group. Among the outcomes of CAT, only the IPUI detected the weaknesses of the half
and one-third item pools (Figure 5.41 on page 113). IPUI detected that these two item pools
were depleted of appropriate items towards the end of the test for examinees close to the cut
score (see Figure 5.45 on page 119). Operational and ideal item pools were strong even for the
long tests.
Research Question 5 was similar to Research Question 3. The aim was to show
the utility of IPUI as a diagnostic tool for an operational CAT. The results showed that
operational and ideal item pools were very robust close to the cut score. Towards the extremes
of the ability scale, the operational item pools started to weaken (see Figure 5.56 on page 137).
But this had no effect on the decision accuracy (see Table 5.10 on page 136). IPUI detected
even slight differences between item pools close to the cut score where each item pool was
comparatively strong (see Figure 5.57 on page 139).
6.2 Practical Uses of IPUI
The results of the study showed that IPUI was very sensitive to the changes in the quality of
item pools and to changes in test specifications that affect the utilization of the item pool, and
that it was a useful diagnostic tool to improve the item pool quality. In this section the
practical uses of IPUI are discussed.
6.2.1QuanoftheItemPoolQuality
Test developers build item pools for adaptive tests very frequently. Users of these tests
need to know the quality of the tests that are administered. Since the quality of adaptive
tests is strongly tied to the quality of the item pools (Flaugher, 2000), test developers
have to make sure that the quality of their item pools is adequate for the purposes of the
test.
Four general methods are used to get information about the item pool: (1) item pool size,
(2) descriptive statistics for item pool parameters (e.g., mean, standard deviation, histogram),
(3) item pool information function, and (4) outcome variables of CAT simulations. Each of
these existing methods has its own shortcomings, as explained below.
Item pool size, i.e. the number of items in the item pool, is an important indicator of the
adequacy of the item pool. Even though general rules exist for the size of an item pool, such
as that the item pool size should be twelve times the length of the adaptive test (Stocking, 1994),
there is no one-size-fits-all rule for an adequate item pool size. An
item pool which is perfectly adequate for one examinee group might not be adequate for
another examinee group (see the first part of Research Question 1 in Section 5.1.1.1 on
page 53). A high quality pool of 100 items might perform better than a low quality pool of 200 items (Xing
& Hambleton, 2004). As a result, in addition to the size of the item pool, information
about the quality of the items is necessary to evaluate the item pool. The quality of items
can be measured by the item parameters.
The descriptive statistics for the item pool parameters are useful for seeing the overall picture
of the item pool. Common descriptive statistics are the means and standard deviations
of the item parameters, or histograms used to visualize the item parameters.
The distribution of the item difficulty parameters in particular can help to show whether
there is a discrepancy between the item pool and the ability distribution of the examinee
group. For instance, a visual comparison of Figures A.1 and A.2 on page 167 and on page 168
will give an idea about the ability distribution for which the item pool is most appropriate.
But operational CATs are rarely as straightforward as the ones in Research Question 1. There
are many constraints on the item selection algorithm, which make it difficult to evaluate
the sufficiency of the item pool by simply inspecting the descriptive statistics of the item
parameters. For example, plans 1 and 2 in Research Question 3 used the same item pool
(Figure 5.30 on page 98). But due to the content balancing imposed on the item selection
algorithm in plan 1, the performance of the item pool differed a lot between the two plans
(see Figures 5.32 and 5.34 on page 100 and on page 102).
Item pool information functions are widely used to evaluate the adequacy of the item
pool for a particular test purpose. Xing and Hambleton (2004) used item pool information
functions to compare different item pools (see Figure 2.1 on page 24). In addition, item pool
information functions are widely used to build item pools for adaptive tests (van der Linden,
Veldkamp, & Reese, 2000, 2006; Belov & Armstrong, 2009). But test information functions
suffer from the same disadvantage explained in the previous paragraph. Two item pools
might have the same information functions (as in plans 1 and 2 of Research Question 3) but
they can perform differently.
The comparison of item pools using the outcomes of the CAT, such as biases, SEs
of the ability estimates, and the exposure rates of the items, is a common approach as well
(He & Reckase, 2013; Thompson & Weiss, 2011). In fact, in all of the research questions
investigated, such outcomes of CATs were used along with the IPUI to compare the item
pools. The outcomes of the CAT give valuable information about the performance of the
item pools, but none of them gives a direct way to evaluate the quality of an item pool.
Figure 1.1 on page 2 is a very good example of this. This figure shows that the item pool
provided very appropriate items to Examinee 3, but not to Examinee 4. The IPUI
values of Examinees 3 and 4 were 0.98 and 0.273, respectively. But the SE of Examinee 3
was higher than the SE of Examinee 4. If one judges the performance of the item pool for
these two examinees according to SE, an inaccurate picture can be drawn. Such examples
can be found for the other outcomes of the CAT.
As was shown in this study, IPUI is a direct way to measure the adequacy of an item
pool at both the examinee level and the examinee group level. A test developer can easily
quantify the adequacy of an item pool by using IPUI, without resorting to indirect ways of
evaluating the item pool.
6.2.2 IPUI in Optimal Test Assembly
The special issue of Applied Psychological Measurement in September 1998 was about
optimal test assembly. van der Linden (1998b) introduced the concept and discussed different
methods to assemble a test optimally. He divided the test specifications into two broad areas,
constraints and objectives. Constraints are test or item attributes that have an upper
and/or lower limit to be met. For example, a minimum or maximum number of items to be
administered, the number of words in the test, and the number of items with figures are
among the constraints. The objectives require a test attribute or a function of item attributes
to reach a minimum or maximum. Examples include maximization of test information within
a certain range, maximization of test validity, maximization of decision accuracy, and
minimization of the standard error of ability estimates.
These constraints and objectives generally lead to an optimization problem: optimization
of an objective in the presence of the constraints. IPUI can be used as a constraint or as an
objective in these optimization problems. As a constraint, a test developer might require
that none of the examinees in an adaptive test should have an IPUI value less than a certain
value. On the other hand, IPUI can serve as the objective of a test assembly problem, to be
maximized. An item pool that has the maximum mean IPUI value and that satisfies all of the
constraints of the test can be chosen as the item pool.
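The two roles described above can be illustrated with a minimal sketch. It assumes that examinee-level IPUI values have already been obtained from CAT simulations for each candidate item pool. The function name `choose_pool`, the pool labels and the numbers are hypothetical and are not taken from this study. The sketch is not an optimal test assembly solver; it only compares a handful of candidate pools.

```python
def choose_pool(pools, min_ipui=None):
    """Pick the candidate pool with the largest mean IPUI.

    pools    : dict mapping a pool label to a list of examinee-level IPUI values
    min_ipui : optional constraint; pools in which any examinee falls below
               this value are excluded (hypothetical threshold)
    Returns the label of the chosen pool, or None if no pool is feasible.
    """
    best_label, best_mean = None, float("-inf")
    for label, values in pools.items():
        # IPUI used as a constraint: every examinee must reach the minimum.
        if min_ipui is not None and min(values) < min_ipui:
            continue
        # IPUI used as an objective: maximize the mean over examinees.
        mean_value = sum(values) / len(values)
        if mean_value > best_mean:
            best_label, best_mean = label, mean_value
    return best_label


# Illustrative values only.
candidate_pools = {
    "A": [0.99, 0.99, 0.80],
    "B": [0.90, 0.91, 0.92],
}
print(choose_pool(candidate_pools))                 # objective only
print(choose_pool(candidate_pools, min_ipui=0.85))  # objective plus constraint
```

In this made-up example pool A has the higher mean IPUI, but it serves one examinee poorly, so adding the minimum-IPUI constraint changes the choice to pool B.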
6.2.3 IPUI as a Quality Control Tool
Tests, especially high stakes ones, should conform to industry standards as described
in the Standards for Educational and Psychological Testing (American Educational Research
Association, American Psychological Association, & National Council on Measurement in
Education, 2014). Test developers constantly monitor the quality of the tests they administer.
In addition to adhering to high quality test development techniques (Schmeiser & Welch,
2006), developers of CATs should make sure that their tests are performing as intended.
For CATs, test developers have less control over the particular test an examinee
gets. Hence, the need for quality control is high. One of the methods test developers use is
showing paper-and-pencil copies of various CATs to expert test specialists (Eignor
et al., 1993). These experts examine the tests and check whether the tests adhere to the test
specifications.
IPUI can be used as an additional tool for checking the quality of individual tests that are
administered to the examinees. If an examinee's IPUI value drops below a certain level, the
examinee might be flagged. For instance, Figure 5.43 on page 116 showed that for the One-Third
item pool, examinees with ability estimates larger than 2 had relatively lower IPUI values.
If this is not acceptable considering the test purpose, the test developer can take appropriate
precautions. A low IPUI value signifies that the item pool failed to provide appropriate items
to this examinee. The test developer can either improve the item pool or change the test
specifications to improve the utilization of the item pool.
6.2.4 IPUI as a Diagnostic Tool
As indicated in the previous sections, IPUI can be used to measure the quality of the item
pool and as a tool for quality control. After finding out that the item pool is not performing well,
the next step for a test developer is to diagnose the deficiencies of the item pool and improve
it in such a way that it provides each examinee with appropriate items.
The diagnostic utility of the IPUI was investigated in Research Questions 3 and 5. The results
of these research questions showed that IPUI can show the deficiencies of the item pools
and guide test developers to add suitable items to address these deficiencies. For example, in
Research Question 3, it is difficult to judge from the SE graph in Figure 5.32 on page 100
the properties of the items that should be added to the item pool in plan 1. The IPUI vs.
item number graph in Figure 5.35 on page 104 showed that the cause of the low performance
of this item pool was the content balancing restrictions. Further, Figure 5.36 on page 105
showed the approximate item difficulty values needed from each content area to improve the
item pool. Similar graphs that use other test constraints can be helpful for diagnosing and
improving item pools.
6.2.5 IPUI at Individual and Group Level
The previous sections explained four different ways the IPUI can be used in practice. It is
believed that IPUI can be used as an outcome variable for CAT just like bias, SE, MSE,
exposure rate or overlap rate. Test developers can use IPUI at two levels: at the group level or
at the individual level.
At the group level, IPUI is an indicator of the adequacy of an item pool for a given set of test
specifications and examinee group. The group level statistics can be the mean or median
of the IPUI. These statistics will differ by the examinee group. If the item pool is
appropriate for most of the examinees, the mean of IPUI will be large. If the item pool is
appropriate for only a small portion of the examinees tested, the mean of IPUI will be small.
Practitioners can use these summary values to evaluate the quality of their item pools
over time. If the examinee group does not change dramatically from year to year, the test
developer can set a standard minimum for the mean IPUI value. Over the years, each item
pool developed can be compared to this standard. This will ensure the fairness of the test
across years. Research Question 2 showed that a change in test specifications can affect the
adequacy of the item pool. Using a standard like this will allow test developers to implement
new test specifications while ensuring that the adequacy of the item pool is still on par with
the item pools used in previous testing windows. For instance, if the testing agency
decides to increase the security of the item pool by implementing a new exposure control
method, test developers can use IPUI to ensure that the item pool is still adequate after such
a change.
A second use of IPUI is at the individual level. A test developer can set a minimum IPUI
value for each examinee so that the item pool is adequate for each test taker. In practice,
item pools cannot provide appropriate items to the examinees at the extremes of the
ability scale. For instance, the operational item pools in Research Question 4 did not provide
appropriate items to some examinees with estimated $\theta$ values smaller than -2 or larger than 2
(Figure 5.44). For this operational test, the inadequacy of the item pool for these examinees
did not affect the decision about the examinees. On the other hand, if the purpose of the
test was to measure each student sufficiently well, the test developer can make the item pool
broad enough to provide appropriate items to the examinees at the extremes. This will ensure
test fairness at the individual level. An item pool that provides appropriate items to one
group of examinees but not to another will undermine the fairness of the test. Using
a minimum value for IPUI as a benchmark, test developers can ensure the quality of service
for each individual examinee.
From the perspective of examinees, they want a fair instrument that measures their
ability as precisely as possible. If an examinee gets a test with a high IPUI value, this means
that at least the item pool portion of the CAT worked well.
As discussed in Section 6.4.2 on page 156, this study does not provide a recommended
value for IPUI. But this does not preclude test developers from setting their own standards and
comparing the performance of item pools using IPUI.
6.3 Implications
6.3.1 The Robustness of CAT Procedures to Weak Item Pools
The results of the study showed that IPUI was very sensitive to changes in the quality of the
item pool. When a CAT algorithm administers sub-optimal items, IPUI detects them. The
other outcomes of the CAT were not as sensitive to sub-optimality of the item pool unless
the item pool underperformed significantly. For example, mean bias was rarely affected by
the inadequacy of the item pool unless there was a large discrepancy between the item pool and
the ability distribution (see Figure 5.31 on page 99).
In Research Question 4, the fidelity coefficient, the mean bias, the decision accuracy and
the mean SE values were almost the same across the different item pool conditions (see
Table 5.7 on page 114). Even though IPUI indicated a performance difference among item
pools, this did not reflect on the other outcomes. The results of the other research questions
indicated this too. This confirms the robustness of the CAT procedures to inadequate item
pools.
The robustness of maximum likelihood ability estimation in a CAT was shown by Chang
and Ying (2009). They found that even for an item bank with a limited capacity, the
maximum likelihood estimates of $\theta$ were consistent and asymptotically normal.
For the Rasch model, the robustness of CAT procedures to the selection of sub-optimal
items was observed by other researchers too. Bergstrom et al. (1992) performed a study where
they modified the item selection algorithm to select items with 0.5, 0.6 and 0.7 probabilities
of correct response. The effect of these modifications on the precision of the ability estimates
was minimal. Way (1998) concluded that "... the adaptive nature of CAT plays a surprisingly
minor role when the Rasch model is used. This contradicts the commonly held assumption
that CAT will significantly improve measurement precision through targeting item selection
to each individual" (p. 21). The results of this study corroborate the findings of these
researchers.
6.3.2 Summary Statistics for IPUI
When evaluating the IPUI for a group of examinees, a test developer has different options to
summarize the distribution of IPUI. The most informative way is to visualize the distribution
of IPUI values using a histogram, a box plot or a scatter plot of the IPUI versus the $\theta$
estimates [2]. This will allow the test developer to observe the performance of the item pool
at an individual level.
At the group level, the definition of IPUI in Equation (3.6) on page 29 uses the mean
to summarize the performance of the item pool. Especially for skewed IPUI distributions,
averaging the IPUI values might not give a good picture of the adequacy of the test. The
distribution of IPUI is generally negatively skewed if the item pool is well suited for the
majority of the examinees. This can be seen for some conditions in the results section (e.g.,
Figures 5.3 and 5.20 on page 58 and on page 83). In such cases, the mean and median values
of IPUI differ and may lead to different interpretations about the quality of the item pool [3].
This will affect the potential comparisons of the item pools. When the mean and median
values of IPUI are discrepant, practitioners are advised to look at the overall distribution of
the IPUI values and evaluate the item pools accordingly.
In addition to the mean or median values of IPUI, the variation of the IPUI can give
important information. The best case scenario for an item pool is a high mean
and a small standard deviation of IPUI values. This happens when the item pool provides
appropriate items to almost all of the examinees. However, if the variation of the IPUI
values is large, this indicates large discrepancies between the performance of the item pool
for different examinees. Depending on the purpose of the test, this might not be fair.
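The group-level summaries discussed in this section can be sketched in a few lines. The sketch assumes a list of examinee-level IPUI values is available from a simulation. The function name `summarize_ipui`, the minimum value of 0.90 and the data are hypothetical and serve only as an illustration.

```python
import statistics


def summarize_ipui(values, minimum=None):
    """Return group-level summaries of examinee-level IPUI values.

    values  : list of IPUI values, one per examinee
    minimum : optional benchmark; examinees below it are flagged
    """
    summary = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }
    # Indices of examinees whose IPUI falls below the chosen benchmark.
    flagged = [i for i, v in enumerate(values)
               if minimum is not None and v < minimum]
    return summary, flagged


# Illustrative, negatively skewed values: most examinees are served well,
# one examinee at the extreme of the scale is not.
ipui_values = [0.99, 0.98, 0.97, 0.95, 0.40]
summary, flagged = summarize_ipui(ipui_values, minimum=0.90)
print(summary)
print(flagged)
```

With a negatively skewed distribution like this one, the single low value pulls the mean well below the median, which is the situation in which inspecting the whole distribution is advised.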
[2] See Figure 5.44 on page 118 as an example.
[3] See the end of Section 5.1.2.1 on page 82 for an example and a discussion about the
difference between using the median and the mean as summary statistics for IPUI.
6.3.3 Commentary on the Results of the Operational Item Pools
The results for the comparison of the operational item pools were very good. Except for the
examinees who were far away from the cut score, the performance of the operational item pools
was very good. Since the purpose of the test was dividing examinees into two groups, the
effectiveness of the item pool at the extremes might not be crucial. Operational item pools
performed very well around the middle of the ability distribution, where the cut score was
located. Examinees who were closer to the cut score took longer tests. So, it was important
for the item pool to provide a sufficient number of appropriate items for the examinees close to the
cut score. The graphs comparing IPUI and test length [4] showed that the IPUI values of the
examinees who took long tests were very high. This suggests that the operational item pools
provided appropriate items to the examinees whose tests lasted 250 items. These examinees
were the ones for whom measurement precision was most essential.
In fact, the NCLEX-RN exam was not a very good example to show the merits of the IPUI.
The exam has a long history and the item pools for the operational tests are meticulously
prepared. Furthermore, the test is very long; consequently, it is very hard to get a
decision error unless the examinee's true score is close to the cut score. The merits of IPUI
would be more evident for tests with much smaller item pools and where decisions for a wide
range of abilities are needed. Achievement tests, which aim to measure a wide range of
abilities, would be a good example for showing the uses of IPUI.
6.3.4 IPUI and Measurement Quality
IPUI is an indicator of the adequacy of the item pool. It is not an indicator of the quality
of the measurement. Certainly, an adequate item pool would improve the quality of the
measurement, but it is not a sufficient condition. An item pool might provide appropriate
items to an examinee, but still, the measurement quality of the test might be low.
[4] See Figure 5.45 on page 119.
For example, Figure 3.2 on page 32 shows that the item pool provided appropriate items
to Examinee 3 throughout the test. The IPUI value for this examinee was 0.98, indicating
the item pool was adequate for this examinee. But the SE of the ability estimate for this
examinee was high, 0.685. The test ended after 8 items, and for this examinee, 8 items were
not enough for a precise estimate of the ability. Even though the item pool portion of this
test performed well, the test specifications need to be changed for a precise measurement of
the ability.
On the other hand, a precise ability estimate does not mean that the item pool performed
well. For example, Examinee 4 in Figure 3.2 had a lower SE compared to Examinee 3, but the
item pool failed to provide appropriate items to this examinee. The results of the first part
of Research Question 2 also corroborate this. Figure 5.14 on page 74 shows that when
tests were longer the ability estimates were more precise. Yet, the item pool failed to provide
enough appropriate items for longer tests.
There are other situations where an item pool is adequate but, due to other aspects
of the CAT algorithm, the measurement quality suffers. For instance, if the ability estimation
method (such as EAP or MAP) uses a strong prior distribution, the ability estimate will
be biased (Kim & Nicewander, 1993). The item pool might provide appropriate items to the
examinees, but this will not reduce the bias caused by the estimation method. A high
IPUI value might not correspond to smaller biases.
The results of this study showed that, in general, an adequate item pool is associated
with better measurement [5]. Nevertheless, as discussed, an adequate item pool is not always
enough for high measurement quality.
[5] See Figures 1.1, 5.4 and 5.11 on page 2, on page 59 and on page 70.
6.4 Limitations of the Study
6.4.1 Generalizability of the Results
The generalizations made in the study are limited to the methods used. For example, in
the second part of Research Question 2, the effect of exposure control on IPUI was
investigated. In that research question, only the randomesque exposure control method was
used as a proxy for exposure control methods. As discussed in Section 2.3.3.2 on page 14, there
are many other exposure control methods used in operational tests. The results presented in
this study are limited to the randomesque exposure control method. Still, it is expected that
any exposure control method will reduce the quality of the item pool. Similar limitations of
generalizability also apply to the other aspects of the simulations.
The item selection procedure is another example of the limited generalizability of the
current study. The MFI item selection algorithm was used in all of the simulations in this study.
Results might be different for other item selection algorithms. The generalizability of the
results from MFI to other item selection algorithms might not be straightforward. MFI
uses the Fisher information of the items. The IPUI also depends on the Fisher information.
In this respect, IPUI is very relevant to CATs using MFI. On the other hand, other item
selection algorithms might have different criteria for selecting items. For example, the
Kullback-Leibler item selection algorithm (Chang & Ying, 1996) searches for an item that
maximizes the global information instead of the Fisher information. As a result, even if there are
no other constraints on the item selection algorithm (such as exposure control or content
balancing) and the item pool is sufficiently large, the IPUI value might be low for this item
selection algorithm. In this regard, this might be seen as a limitation of IPUI. But IPUI can
be generalized and reformulated such that, instead of maximization of the Fisher information,
maximization of the global information is expected [6].
Operational CATs have many other constraints, such as item enemies, limitations on the
number of certain types of items, limitations on word counts, key distribution constraints
(i.e. the same number of A's, B's, ... should appear as the correct response), etc. [7]. These were
not explored in this study. Each of these constraints is expected to reduce the value of the
IPUI.
[6] More on this in Section 6.5.1 on page 161.
Throughout this study, the effect of changing only one CAT specification on IPUI was
investigated. In Research Question 2, as the test length increased, while holding every other
aspect of the CAT fixed, the mean value of IPUI decreased. But increasing the test length also
decreased the standard errors of ability estimates because tests became more precise [8]. On the
other hand, in the second part of Research Question 2, increasing the exposure control
parameter decreased the overall IPUI values and increased the SEs of the ability estimates.
In both of these sets of simulations, since only one specification was changed at a time, it
was easy to observe the effects of these changes on IPUI and other CAT outcomes. But it is
hard to predict the effect of changing more than one specification on the outcomes of a CAT
and IPUI. For instance, if the length of the test and the exposure control parameter were
increased at the same time, it would be hard to predict the potential changes in the outcomes.
The IPUI values would decrease, but it is very hard to predict the changes in SE.
Therefore, it would be valuable to perform multifactor designs where the main effects and
the interactions between different CAT specifications on CAT outcomes and IPUI can be
observed. This would be a valuable future expansion of this study.
6.4.2 A Recommended Value for IPUI
IPUI values are bounded between 0 and 1. This is very useful from the perspective of
comparing different item pools, or comparing the same item pool for different test specifications
or examinee populations. But from a practical point of view it is desirable to have a
recommended value for IPUI. If IPUI goes below a specific value, this will signal to the test
developer that the item pool is not sufficient for an examinee or a group of examinees. One
of the aims of this study was finding a specific number that test developers could use
as a flag for an inadequate item pool.
[7] These constraints were investigated in Section 2.3.3 on page 12.
[8] However, the efficiency of the tests decreased.
For very basic CAT designs a recommended value for IPUI might be tenable. Section 5.1.1.2
on page 69 explained the relationship between reliability and the standard error. Assuming
that the examinee population has a standard normal distribution, a standard error value of
0.32 corresponds to a reliability coefficient of 0.9. Table 5.1 on page 55 indicates that, when
the discrepancy between the examinee ability distribution and the item difficulty distribution
was 1.5 (or -1.5), the mean standard error of ability estimates was approximately equal to
0.32. The mean IPUI values for these discrepancy conditions were 0.9 and 0.91, respectively.
In the second part of Research Question 1, when the item pool size was 60, the mean SE
was approximately 0.30 (see Figure 5.2 on page 56). The mean IPUI value for an item pool
size of 60 items was 0.9028 (see Table 5.2 on page 67). These two findings provide some
evidence for a recommended value for IPUI. For these particular test designs and examinee
populations there was a correspondence between a reliability of 0.9 and an IPUI value of 0.9.
But the results of Research Question 2, where the effects of test specifications on IPUI were
investigated, did not confirm this kind of relationship between the mean SE and mean
IPUI [9].
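The arithmetic linking a standard error of 0.32 to a reliability of about 0.9 can be checked with a short sketch. It assumes the common approximation reliability = 1 - SE^2 / var(theta), with var(theta) = 1 for a standard normal examinee population. The exact expression used in Section 5.1.1.2 is not reproduced here, so this form and the function name `reliability_from_se` are assumptions.

```python
def reliability_from_se(se, theta_variance=1.0):
    """Approximate reliability implied by a (mean) standard error.

    Assumes reliability = 1 - SE^2 / var(theta); this is one common
    approximation, not necessarily the expression used in the study.
    """
    return 1.0 - (se ** 2) / theta_variance


for se in (0.32, 0.30):
    print(se, round(reliability_from_se(se), 4))
```

Under this assumption a standard error of 0.32 gives a reliability of roughly 0.90 and a standard error of 0.30 gives roughly 0.91.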
Additionally, establishing a direct link between IPUI and another outcome of a CAT,
such as SE, would obviate the use of IPUI. IPUI captures a unique aspect of the item pool:
whether it is adequate or not. Other outcomes of the CAT capture some other important
aspects of the test, but not necessarily the adequacy of the item pool. Also, there may not
be a direct link between IPUI and the other outcomes of the CAT. For example, Section 3.4
showed that IPUI and SE capture different aspects of the CAT. Results of the first part
of Research Question 2 showed that a decrease in the adequacy of an item pool did not
imply a decrease in the quality of the measurement. As test lengths increased, the mean
values of IPUI and SE decreased [10].
[9] See Figures 5.14 and 5.22 on page 74 and on page 87.
The challenge with a recommended value is closely related to the definition of an
inadequate item pool. Unfortunately there is not a clear definition of an insufficient item
pool. An insufficient item pool reveals itself by the outcomes of the test. These can be high
standard errors of the ability estimates, bias in the ability estimates, violation of the
constraints of the test, or failure to satisfy the test specifications. There are no universally
accepted benchmarks for any of these outcomes. Obviously, tests that meet all of the test
specifications and have low standard errors of the ability estimates and low biases are desirable.
But how low is good enough? If there were an accepted threshold between a good test and a
bad test, it could be possible to find a specific IPUI value for that threshold.
This challenge applies to other indices of test quality as well. If we ask "What is a
good value for the test reliability?", the answer of a psychometrician would be "It depends
on the context." Here is an excerpt from Nunnally and Bernstein (1994) on the standards of
reliability:
A satisfactory level of reliability depends on how a measure is being used. In
the early stages of predictive or construct validation research, time and energy
can be saved using instruments that have only modest reliability, e.g., .70. [...]
In contrast to the standards used to compare groups, a reliability of .80 may not
be nearly high enough in making decisions about individuals. Group research is
often concerned with the size of correlations and with mean differences among
experimental treatments, for which a reliability of .80 is adequate. [...] If
important decisions are made with respect to specific test scores, a reliability of
.90 is the bare minimum, and a reliability of .95 should be considered the desirable
standard. (pp. 264-265)
As the authors indicated, the recommended value for reliability depends on the context
where scores will be used. Similarly, the recommended values for IPUI should depend on the
context where the item pool will be used. A high stakes adaptive test might require an IPUI
value of .99 for each examinee. For example, the operational pools of the NCLEX-RN exam
were adequate close to the cut score. The IPUI values of the examinees were larger than 0.99
close to the cut score (Figure 5.44). On the other hand, if the purpose of the adaptive test is
simply to obtain a reliable group summary of examinees, then a lower value of IPUI might
be acceptable (Kruyen, Emons, & Sijtsma, 2012).
[10] See Figure 5.14 on page 74.
In addition, the recommended values for reliability mentioned earlier were possibly not
agreed upon right away among researchers when Cronbach wrote his highly cited paper in
1951 (Cronbach, 1951) [11]. Instead, years of use of the internal reliability coefficient in different
contexts established these recommended values among researchers. Similarly, it is expected
that as the use of IPUI becomes prevalent among practitioners and it is used in different
contexts, the properties of IPUI and its interactions with other test specifications will be
explored more. This will potentially lead to a recommended value for IPUI.
6.4.3 Detection of the Redundant Items in the Item Pool
There is a delicate balance between a satisfactory item pool and an item pool which has more
than enough items. Both of them serve well for the purposes of a test. But larger item pools
have their own disadvantages, as discussed in Section 2.3.5.1. Test developers want their item
pools to satisfy the test purposes sufficiently. However, due to the cost of developing items,
they do not want redundant and underused items in the item pools. Building an item pool
that contains just enough items is not easy.
IPUI cannot distinguish between a lavish item pool which has more than enough items
and an item pool that is satisfactory and does not have redundant items. For both
of these item pools, IPUI will be 1. If an item pool had an IPUI value of 1, adding more
items to this item pool would not increase the value of IPUI further. In practice, an IPUI
value of 1 almost never happens. Unless the test developer adds items with exactly the same
item parameters, mean IPUI values will always increase. See Table 5.2 on page 67 for how an
increase in the size of the item pool increased the IPUI slightly for large item pool sizes.
[11] Here, it is not suggested that Cronbach is the inventor of reliability.
6.4.4 The Purpose of the Test and the Definition of the Optimum Item
IPUI was initially developed for adaptive tests that are designed to measure every examinee
as precisely as possible regardless of the ability of the examinee. This goal is the purpose of
most achievement tests. But not all tests are designed around this purpose.
Licensure tests, for example, are primarily interested in whether an examinee is above or
below a cut score. The precision of the ability estimate of an examinee who is far away
from the cut score is not crucial as long as the decision regarding the passing status of this
examinee is clear. As a result, if an adaptive test does not give the most appropriate item
for this examinee, this is tolerable. Theoretically, the best item pool for the purpose of a
licensure test is an item pool including a sufficient number of items that have item difficulties
equal to the cut score. For the purposes of the test, this item pool is perfect, but for the
precision of the ability estimates it is far from perfect. Many other examples can be given
of tests in which high precision of the estimates for all examinees is not the primary
purpose of the test.
IPUI in Equation (3.5) quantifies the quality of an item pool as if the primary purpose of
the adaptive test is the precision of the ability estimates. In the CAT literature, the optimum
item is defined according to this purpose: "... an item is considered to have optimum statistical
properties if it is most informative at an examinee's current maximum-likelihood estimate
of ability" (Eignor et al., 1993, p. 10). For the general logic of the adaptive test this makes
sense. This is why Flaugher (2000) listed a rectangular distribution of item difficulty as a
characteristic of a satisfactory item pool. A rectangular distribution of item difficulty enables
a CAT procedure to provide each examinee with an appropriate item.
In the future, the formulation of IPUI can be generalized to include various purposes of
adaptive tests. The denominator can be modified so that the optimum item is defined in
accordance with the test purpose. For the numerator, the information should be calculated
with respect to this optimum item.
6.5 Future Research Directions
6.5.1 A General Framework for IPUI
As discussed in the previous section, the definition of the optimum item might be different
for tests with different purposes. IPUI has a limited definition of an optimum item. Future
research can investigate a generalized framework for IPUI which encompasses different
definitions of the optimum item.
In a generalized framework, the definition of the optimum item does not have to be an
item that has the maximum information at the examinee's intermediate ability estimate. Instead,
the optimum item can be defined in accordance with the purpose of the test. IPUI can
quantify the discrepancy between the administered item and the optimum item.
For instance, for a licensure test with one cut score, the optimum item has a difficulty
parameter that is equal to the cut score. Such an item increases the decision accuracy, if not
the precision of the ability estimates. IPUI can be calculated as the ratio of the information
of the administered item at the cut score to the information of the optimum item at the cut
score. If there are multiple cut scores in a test (such as basic, proficient and advanced), the
definition of the optimum item becomes complicated (Eggen & Straetmans, 2000).
Another definition for the optimum item is related to test anxiety among examinees. One
of the criticisms of CAT is the difficulty of the items presented to the examinee. Item selection
in a CAT is optimized so that at each step of the CAT, the algorithm administers an item
with a 50% probability of a correct answer at the intermediate ability estimate of the examinee.
This continuing challenge throughout the test might cause frustration for some examinees.
Eggen and Verschoor (2006) offered a solution to this problem. Instead of selecting items
that have a 50% probability of correct response, they defined an item selection algorithm which
selects items that have a 60% or 70% (or any other desired percentage) probability of correct
response. When comparing their algorithm with the MFI item selection algorithm, they observed
that they did not achieve the desired percentages. They attributed the discrepancy between the
actual and desired percentages to "a mismatch between the items available in the item bank
and the desired percentages in the population" (Eggen & Verschoor, 2006, p. 391). They
hypothesized that the addition of easier items (in the case where the desired percentage was 60%) to
the item bank would resolve the problem.
At first sight, IPUI might seem to quantify the mismatch they observed. But in fact IPUI
would not help in this situation. The main assumption of IPUI is the definition of the optimum
item. An optimum item is an item which provides the maximum information at an examinee's
ability level, i.e., an item with a 50% probability of correct response at the intermediate
$\theta$ estimate. In the study of Eggen and Verschoor (2006), the definition of the optimum
item was different. They defined an optimum item as an item that has maximum information at "an ability
value at which the examinee with the current ability estimate has a higher or lower success
probability" (p. 387). IPUI as defined in Equation (3.5) could not capture the "mismatch"
they observed.
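On the other hand, for this particular item selection algorithm there exists a solution
to this problem. The formulation of IPUI could be changed to adjust for their definition of the
optimum item. In Equation (3.5), instead of calculating information at $\hat{\theta}_{k-1}$, the information
can be calculated at $\hat{\theta}_{k-1} - \delta_i$, where $\hat{\theta}_{k-1}$ is the intermediate ability estimate before the
administration of the $k$th item and $\delta_i$ is the shift parameter for the $i$-th item, which is defined for
the 2PL as $\frac{1}{a_i}\ln\left(\frac{p}{1-p}\right)$, where $p$ is the desired probability of correct response. For example, if the
desired probability is 0.60, the item selection algorithm will select an item that has maximum
information at $\hat{\theta}_{k-1} - \delta_i = \hat{\theta}_{k-1} - \frac{1}{a_i}\ln\left(\frac{0.6}{1-0.6}\right)$.
For the 1PL model, the optimum item for an examinee with intermediate ability
estimate $\hat{\theta}_{k-1}$ is defined as an item with a difficulty parameter equal to $\hat{\theta}_{k-1} - \ln\left(\frac{p}{1-p}\right)$. When the desired
probability is $p = 0.5$, this corresponds to a difficulty parameter equal to the intermediate
$\theta$ estimate. For the 2PL model, the definition of the optimum item is more complicated. Since the item
discrimination parameter does not have an upper bound, the test developer can define a
maximum value for the $a$ parameter ($a_{max}$). Accordingly, the item discrimination parameter of the
optimum item is $a_{max}$ and the item difficulty parameter is $\hat{\theta}_{k-1} - \frac{1}{a_{max}}\ln\left(\frac{p}{1-p}\right)$. Using the
general framework discussed above, the definition of IPUI for the $k$th administered item $i_k$
will be:
$$
IPUI_k = \frac{I_{i_k}\left[\hat{\theta}_{k-1} - \frac{1}{a_{max}}\ln\left(\frac{p}{1-p}\right)\right]}{I_{max}\left[\hat{\theta}_{k-1} - \frac{1}{a_{max}}\ln\left(\frac{p}{1-p}\right)\right]}
$$
This can be adjusted for the 1PL model by fixing $a_{max}$ to 1. The examples given here
can be extended to different optimum item definitions, and the IPUI can be used as a more
general tool for evaluating the adequacy of item pools for different CAT scenarios.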
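The item-level formula above can be sketched in code as follows. This is only an illustration under stated assumptions: the 2PL item information is taken as a^2 P(1 - P) with the logistic scaling constant set to 1, and the function names `info_2pl` and `generalized_ipui_k` are hypothetical. Neither is prescribed by this study.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta (scaling constant D = 1 assumed)."""
    p_correct = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p_correct * (1.0 - p_correct)


def generalized_ipui_k(theta_hat, a_item, b_item, p_target, a_max=1.0):
    """Item-level IPUI when the optimum item has success probability p_target.

    theta_hat : intermediate ability estimate before the item is given
    a_item, b_item : parameters of the administered item
    p_target  : desired probability of a correct response (0 < p_target < 1)
    a_max     : assumed maximum discrimination; 1.0 reproduces the 1PL case
    """
    # Point at which information is evaluated: theta_hat minus the shift.
    target = theta_hat - math.log(p_target / (1.0 - p_target)) / a_max
    # The optimum item has discrimination a_max and difficulty equal to target.
    numerator = info_2pl(target, a_item, b_item)
    denominator = info_2pl(target, a_max, target)
    return numerator / denominator


# 1PL example with a desired success probability of 0.60.
matched = generalized_ipui_k(0.0, 1.0, -math.log(0.6 / 0.4), 0.60)
unshifted = generalized_ipui_k(0.0, 1.0, 0.0, 0.60)
print(matched, unshifted)
```

In this example an item whose difficulty equals the shifted target receives a value of 1, whereas an item located exactly at the intermediate ability estimate receives a value of about 0.96, because it is slightly too hard for a 60% success target.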
6.5.2 Weights for IPUI
At the individual test level, IPUI currently gives equal weight to each stage of the adaptive
test. In reality, as Chang and Ying (2008) argued, it is desirable for a CAT to provide better
items towards the end of the test. The ability estimates at the beginning of the test are prone
to more error, so items do not have to match the intermediate ability estimates precisely.
Towards the end of the test better items are needed because the ability estimates are more
precise. Considering this, the weights of the IPUI might be adjusted to be low at the early
stages of the test and high towards the end of the test. In this case, if the item pool is
depleted towards the end of the test, where the need for appropriate items is more critical,
this weighting scheme will penalize the item pool for it.
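
A minimal sketch of such a weighting scheme is given below. It assumes that the test-level IPUI is an average of item-level values, and it uses linearly increasing weights purely as one hypothetical choice; the text does not prescribe a particular weight function. The names `linear_weights` and `weighted_ipui` and the item-level values are illustrative only.

```python
def linear_weights(n_items):
    """Hypothetical weights 1, 2, ..., n that grow with the position in the test."""
    return [float(k) for k in range(1, n_items + 1)]


def weighted_ipui(item_values, weights=None):
    """Weighted mean of item-level IPUI values; equal weights when none are given."""
    if not item_values:
        raise ValueError("item_values must not be empty")
    if weights is None:
        weights = [1.0] * len(item_values)
    if len(weights) != len(item_values):
        raise ValueError("weights and item_values must have the same length")
    return sum(w * v for w, v in zip(weights, item_values)) / sum(weights)


# Made-up item-level values for a 5-item test whose pool runs out of good items late.
late_depletion = [1.0, 1.0, 1.0, 0.5, 0.5]
equal = weighted_ipui(late_depletion)
late_heavy = weighted_ipui(late_depletion, linear_weights(len(late_depletion)))
print(equal, late_heavy)
```

With these made-up values the late-weighted index is lower than the equally weighted one, which is the penalty for late depletion described above. The same poor items placed at the start of the test would lower the late-weighted index less.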
6.5.3 IPUI for Other Psychometric Models
IPUI is currently available only for the 1PL model. This hampers its use for 2PL and 3PL models,
which are very common in operational tests. The problem with 2PL and 3PL IRT models
is the parameter values of the optimum item for these models. For an optimum item, the
item discrimination parameter ($a$ parameter) should be equal to infinity. The information
value of this item also has an infinite value. This makes the denominator of the IPUI infinite.
Consequently the value of the IPUI will be undefined.
In practice, such an optimum item is not possible to develop (Reckase, 2010). This
limitation can be handled by setting a limit to the item discrimination parameter. Even
though the value of this limit will be arbitrary, the historical test data can be used to get this
number. For instance, an optimum item for 3PL can be defined as having an item difficulty
equal to the intermediate ability estimate, item discrimination equal to 2, and guessing
parameter equal to 0.
This approach has some limitations. If an item has an $a$ parameter larger than 2, then the
value of IPUI can exceed 1. In addition, IPUI values will not be comparable if
different limits for the $a$ parameter are used for different tests.
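
The first limitation can be illustrated numerically. The sketch below assumes the standard 3PL information function in the logistic metric (D = 1) and a reference item defined as in the example above (difficulty equal to the intermediate ability estimate, discrimination capped at 2, guessing equal to 0). The function names and parameter values are hypothetical.

```python
import math


def info_3pl(theta, a, b, c=0.0):
    """Fisher information of a 3PL item at theta (logistic metric, D = 1)."""
    p_star = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    p = c + (1.0 - c) * p_star
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def capped_ipui(theta_hat, a, b, c, a_cap=2.0):
    """Item information relative to a reference item with a = a_cap, b = theta_hat, c = 0."""
    return info_3pl(theta_hat, a, b, c) / info_3pl(theta_hat, a_cap, theta_hat, 0.0)


# Three hypothetical items, all located exactly at the intermediate estimate 0.0.
at_cap = capped_ipui(0.0, a=2.0, b=0.0, c=0.0)         # matches the reference item
above_cap = capped_ipui(0.0, a=2.5, b=0.0, c=0.0)      # discrimination above the cap
with_guessing = capped_ipui(0.0, a=2.0, b=0.0, c=0.2)  # same a, nonzero guessing
print(at_cap, above_cap, with_guessing)
```

In this sketch an item whose discrimination exceeds the cap yields a ratio above 1 when it is well targeted, while nonzero guessing pulls the ratio below 1 even at the cap. Changing `a_cap` rescales every value, which is why indices computed with different caps are not directly comparable.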
From the diagnostic point of view, using IPUI only for 1PL makes sense. In reality, when
a test developer needs to add an item to the item pool, it is comparatively easier for item
writers to write an item that has a targeted item difficulty parameter than to write
an item with a targeted item discrimination parameter (Bejar, 1983). So, diagnostically, if IPUI
guides test developers to write items that have some specific difficulty, this might
be feasible in practice.
In addition to 1PL, 2PL and 3PL models, the use of IPUI can be extended to MIRT
models and polytomous IRT models too. A CAT using a MIRT model is more complex
compared to the unidimensional CATs (Yao, Pommerich, & Segall, 2014). Consequently, the
evaluation of the item pools for a multidimensional computerized adaptive test (MCAT) is
more difficult. The extension of the IPUI to multidimensional item pools is straightforward
because the information function of MIRT is very similar to the information function of
unidimensional IRT (Reckase & McKinley, 1991). IPUI can offer an easy way to evaluate the
item pools for MCAT.
CATs using polytomous IRT models (Nering & Ostini, 2010) are another possible extension
of IPUI. In health sciences, the use of a CAT with polytomous items is common (Amtmann
et al., 2010; Haley et al., 2009; Pilkonis et al., 2011). The item pools for CATs with
polytomous items are relatively small compared to the item pools used in high stakes tests.
IPUI can be helpful for the diagnosis of these small item pools. In addition, due to multiple
possible responses for each item, the evaluation of the item pools might be challenging.
6.5.4 Naming of the Index
The correct naming of the index is important because it conveys the message about the
possible uses of the index. An index with a misleading name might result in an inappropriate
use of the index. The index developed in this study quantifies whether the item pool is
adequate for a given set of test specifications and the examinee population. A perfectly
adequate item pool might not be adequate for a different set of test specifications or for a
different examinee population. The name of the index should convey the dependence of the
item pool performance on the test specifications and the examinee population.
The name "item pool utilization index" partially covers this meaning. But in the future,
other naming alternatives that convey the capabilities of this index better should be explored.
Some alternative names might be "item pool adequacy index", "quality of utilization of item
pool index" and "quality of item pool index". As the use of this index spreads among
practitioners and researchers, a consensus on a better name for this index will be reached.
APPENDICES
APPENDIX A
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 1 - PART 1
Figure A.1: Item Difficulty Distribution (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.2: True $\theta$ Distribution (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.3: Distribution of Bias for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.4: Relationship between Bias and IPUI for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.5: Distribution of Mean Squared Error for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.6: Relationship between Mean Squared Error and IPUI for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.7: Two Examinees with Same Standard Errors but Different IPUI Values (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
APPENDIX B
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 1 - PART 2
Figure B.1: True $\theta$ Distribution (Research Question 1 - Part 2)
Figure B.2: Item Difficulty Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.3: Bias Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.4: Standard Error Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.5: Mean Squared Error Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.6: IPUI Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
APPENDIX C
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 2 - PART 1
Figure C.1: Item Difficulty Distribution for Research Question 2 - Test Length Conditions
Figure C.2: True $\theta$ Distribution for Research Question 2 - Test Length Conditions
Figure C.3: IPUI and Bias Relationship by Test Length Condition
APPENDIX D
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 2 - PART 2
Figure D.1: Item Difficulty Distribution for Research Question 2 - Exposure Control
Figure D.2: True $\theta$ Distribution for Research Question 2 - Exposure Control
Figure D.3: IPUI and Bias Relationship by Exposure Control Condition
APPENDIX E
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 3
Figure E.1: The Bias Distribution at each True $\theta$ Value for each Item Pool Condition
Figure E.2: The Standard Error Distribution at each True $\theta$ Value for each Item Pool Condition
APPENDIX F
SUPPLEMENTARY FIGURES FOR IDEAL ITEM POOL CREATION
Figure F.1: Progress Plot for Ideal Item Pool with Fixed Bin Size 0.8
Figure F.2: Item Difficulty Distributions by Content Area for Ideal Item Pool with Fixed Bin Size 0.8
APPENDIX G
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 4
Figure G.1: True $\theta$ Distribution for Research Question 4
Figure G.2: The Relationship between Estimated Ability and Bias for each Item Pool Condition
Figure G.3: The Relationship between Estimated Ability and Standard Error for each Item Pool Condition
Figure G.4: The Relationship between Test Length and Standard Error for each Item Pool Condition
Figure G.5: The Relationship between IPUI and Mean Squared Error for each Item Pool Condition^1
^1 Note that the x-axis scale for each figure is different.
APPENDIX H
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 5
Figure H.1: The Relationship between True $\theta$ and Estimated $\theta$ for each Item Pool Condition
Figure H.2: The Bias Distribution at each True $\theta$ Value for each Item Pool Condition^1
^1 For brevity, only some of the true $\theta$ values are displayed.
Figure H.3: Mean Standard Error Conditional on Restricted True $\theta$ Range for each Item Pool Condition
Figure H.4: The Standard Error Distribution at each True $\theta$ Value for each Item Pool Condition^2
^2 For brevity, only some of the true $\theta$ values are displayed.
Figure H.5: The Test Length Distribution at each True $\theta$ Value for each Item Pool Condition^3
^3 For brevity, only some of the true $\theta$ values are displayed. For $\theta$ values outside the [-1.5, 1.5] interval, the test lengths were all 60.
BIBLIOGRAPHY
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Amtmann, D., Cook, K. F., Jensen, M. P., Chen, W.-H., Choi, S., Revicki, D., ... Lai, J.-S. (2010). Development of a PROMIS item bank to measure pain interference. PAIN, 150(1), 173-182. doi:10.1016/j.pain.2010.04.025
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2010). A method for the comparison of item selection rules in computerized adaptive testing. Applied Psychological Measurement, 34(6), 438-452. doi:10.1177/0146621610370152
Bejar, I. I. (1983). Subject matter experts' assessment of item statistics. Applied Psychological Measurement, 7(3), 303-310. doi:10.1177/014662168300700306
Belov, D. I. & Armstrong, R. D. (2009). Direct and inverse problems of item pool design for computerized adaptive testing. Educational and Psychological Measurement, 69(4), 533-547. doi:10.1177/0013164409332224
Bergstrom, B. A., Lunz, M. E., & Gershon, R. C. (1992). Altering the level of difficulty in computer adaptive testing. Applied Measurement in Education, 5(2), 137-149. doi:10.1207/s15324818ame0502_4
Bock, R. D. & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431-444. doi:10.1177/014662168200600405
Breithaupt, K., Ariel, A. A., & Hare, D. R. (2010). Assembling an inventory of multistage adaptive testing systems. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 247-266). Springer.
Chang, H.-H. (2004). Understanding computerized adaptive testing: from Robbins-Monro to Lord and beyond. In D. Kaplan (Ed.), The Sage handbook of quantitative methods for the social sciences (pp. 117-133). Thousand Oaks, CA: Sage.
Chang, H.-H., Qian, J., & Ying, Z. (2001). a-Stratified multistage computerized adaptive testing with b blocking. Applied Psychological Measurement, 25(4), 333-341. doi:10.1177/01466210122032181
Chang, H.-H. & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20(3), 213-229. doi:10.1177/014662169602000303
Chang, H.-H. & Ying, Z. (2008). To weight or not to weight? Balancing influence of initial items in adaptive testing. Psychometrika, 73(3), 441-450. doi:10.1007/s11336-007-9047-7
Chang, H.-H. & Ying, Z. (2009). Nonlinear sequential designs for logistic item response theory models with applications to computerized adaptive tests. The Annals of Statistics, 37(3), 1466-1488. doi:10.2307/30243674
Chen, S.-Y., Ankenmann, R. D., & Chang, H.-H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24(3), 241-255. Retrieved from http://apm.sagepub.com/content/24/3/241.abstract
Chen, S.-Y., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and test overlap in computerized adaptive testing. Journal of Educational Measurement, 40(2), 129-145. doi:10.2307/1435342
Cheng, Y. & Chang, H.-H. (2009). The maximum priority index method for severely constrained item selection in computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 62(2), 369-383. doi:10.1348/000711008X304376
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334. doi:10.1007/BF02310555
Davey, T. & Nering, M. L. (2002). Controlling item exposure and maintaining item security. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: building the foundation for future assessments (pp. 165-191). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Eggen, T. J. H. M. & Straetmans, G. (2000). Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60(5), 713-734. doi:10.1177/00131640021970862
Eggen, T. J. H. M. & Verschoor, A. J. (2006). Optimal testing with easy or difficult items in computerized adaptive testing. Applied Psychological Measurement, 30(5), 379-393. doi:10.1177/0146621606288890
Eignor, D. R., Stocking, M. L., Way, W. D., & Steffen, M. (1993). Case studies in computer adaptive test design through simulation. Educational Testing Service.
Flaugher, R. (2000). Item pools. In H. Wainer, N. J. Dorans, D. Eignor, R. Flaugher, B. F. Green, R. J. Mislevy, ... D. Thissen (Eds.), Computerized adaptive testing: a primer (2nd edition, pp. 37-60). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Georgiadou, E. G., Triantafillou, E., & Economides, A. A. (2007). A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005. The Journal of Technology, Learning and Assessment, 5(8).
Gibbons, R. D., Weiss, D. J., Kupfer, D. J., Frank, E., Fagiolini, A., Grochocinski, V. J., ... Immekus, J. C. (2008). Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatric Services, 59(4), 361-8.
Gierl, M. J. & Lai, H. (2013). Instructional topics in educational measurement (ITEMS) module: using automated processes to generate test items. Educational Measurement: Issues and Practice, 32(3), 36-50. doi:10.1111/emip.12018
Haley, S. M., Fragala-Pinkham, M. A., Dumas, H. M., Ni, P., Gorton, G. E., Watson, K., ... Tucker, C. A. (2009). Evaluation of an item bank for a computerized adaptive test of activity in children with cerebral palsy. Physical Therapy, 89(6), 589-600.
Hambleton, R. K. & Swaminathan, H. (1985). Item response theory: principles and applications. Boston, MA: Kluwer-Nijhoff Pub.
Han, K. T. (2012). An efficiency balanced information criterion for item selection in computerized adaptive testing. Journal of Educational Measurement, 49(3), 225-246. doi:10.1111/j.1745-3984.2012.00173.x
He, W., Diao, Q., & Hauser, C. (2014). A comparison of four item-selection methods for severely constrained CATs. Educational and Psychological Measurement. doi:10.1177/0013164413517503
He, W. & Reckase, M. D. (2013). Item pool design for an operational variable-length computerized adaptive test. Educational and Psychological Measurement. doi:10.1177/0013164413509629
Hetter, R. D. & Sympson, J. B. (1997). Item-exposure in CAT-ASVAB. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: from inquiry to operation (pp. 141-144). Washington, DC: American Psychological Association.
Hildebrand, F. B. (1987). Introduction to numerical analysis (2nd edition). Mineola, NY: Courier Dover Publications.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73. doi:10.1111/jedm.12000
Kim, J. K. & Nicewander, W. A. (1993). Ability estimation for conventional tests. Psychometrika, 58(4), 587-599. doi:10.1007/BF02294829
Kingsbury, G. G. & Wise, S. L. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica: Revista de metodología y psicología experimental, 21(1), 135-156.
Kingsbury, G. G. & Zara, A. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2(4), 359-375.
Kruyen, P. M., Emons, W. H. M., & Sijtsma, K. (2012). Test length and decision quality in personnel selection: when is short too short? International Journal of Testing, 12(4), 321-344. doi:10.1080/15305058.2011.643517
Leeuw, J. d. & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational Statistics, 11(3), 183-196. doi:10.2307/1165071
Leroux, A. J., Lopez, M., Hembry, I., & Dodd, B. G. (2013). A comparison of exposure control procedures in CATs using the 3PL model. Educational and Psychological Measurement, 73(5), 857-874. doi:10.1177/0013164413486802
Lord, F. M. (1974). The relative efficiency of two tests as a function of ability level. Psychometrika, 39(3), 351-358. doi:10.1007/BF02291708
Lord, F. M. (1975). Relative efficiency of number-right and formula scores. British Journal of Mathematical and Statistical Psychology, 28(1), 46-50.
Lord, F. M. (1977a). A broad-range tailored test of verbal ability. Applied Psychological Measurement, 1(1), 95-100. doi:10.1177/014662167700100115
Lord, F. M. (1977b). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14(2). doi:10.2307/1434011
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: L. Erlbaum Associates.
Lord, F. M. (1986). Maximum likelihood and bayesian parameter estimation in item response theory. Journal of Educational Measurement, 23(2), 157-162. Retrieved from http://www.jstor.org/stable/1434513
Lord, F. M. & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Luecht, R. M. & Clauser, B. E. (2002). Test models for complex CBT. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: building the foundation for future assessments (pp. 67-88). Mahwah, NJ: Lawrence Erlbaum.
McBride, J. R. (1977). Some properties of a Bayesian adaptive ability testing strategy. Applied Psychological Measurement, 1(1), 121-140. doi:10.1177/014662167700100119
Meijer, R. & Nering, M. L. (1999). Computerized adaptive testing: overview and introduction. Applied Psychological Measurement, 23(3), 187-194.
Millman, J. & Arter, J. A. (1984). Issues in item banking. Journal of Educational Measurement, 21(4), 315-330. doi:10.2307/1434584
Mills, C. N. & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9(4), 287-304. doi:10.1207/s15324818ame0904_1
National Council of State Boards of Nursing. (2012). NCLEX-RN examination detailed test plan for the National Council Licensure Examination for Registered Nurses item writer-item reviewer-nurse educator version. National Council of State Boards of Nursing. Chicago, IL. Retrieved from https://www.ncsbn.org/2013_NCLEX_RN_Detailed_Test_Plan_Educator.pdf
Nering, M. L. & Ostini, R. (2010). Handbook of polytomous item response theory models. New York: Routledge.
Nunnally, J. C. & Bernstein, I. H. (1994). Psychometric theory (3rd edition). New York: McGraw-Hill.
Owen, R. (1969). A Bayesian approach to tailored testing (Report No. Research Bulletin No. 69-92). Educational Testing Service.
Owen, R. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70(350), 351-356.
Parshall, C. G. (2002). Item development and pretesting in a CBT environment. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: building the foundation for future assessments (pp. 119-141). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Parshall, C. G., Spray, J. A., Kalohn, J. C., & Davey, T. (2002). Practical considerations in computer-based testing. New York: Springer Verlag.
Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., Cella, D., & PROMIS Cooperative Group. (2011). Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS): depression, anxiety, and anger. Assessment, 18(3), 263-283. doi:10.1177/1073191111411667
R Core Team. (2014). R: a language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. Retrieved from http://www.R-project.org
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 4, pp. 321-333). University of California Press Berkeley, CA.
Reckase, M. D. (2010). Designing item pools to optimize the functioning of a computerized adaptive test. Psychological Test and Assessment Modeling, 52(2), 127-141.
Reckase, M. D. & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15(4), 361-373. doi:10.1177/014662169101500407
Revuelta, J. & Ponsoda, V. (1998). A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35(4), 311-327. Retrieved from http://www.jstor.org/stable/1435308
Rudner, L. M. (2010). Implementing the graduate management admission test computerized adaptive test. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 151-165). Springer.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometric Monograph, 17. Retrieved from http://www.psychometrika.org/journal/online/MN17.pdf
Schmeiser, C. B. & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational measurement (4th edition, pp. 307-353). Westport, CT: ACE/Praeger Publishers.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61(2), 331-354. doi:10.1007/bf02294343
Segall, D. O., Moreno, K. E., & Hetter, R. D. (1997). Item pool development and evaluation. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: from inquiry to operation (pp. 117-130). Washington, DC: American Psychological Association.
Stocking, M. L. (1994). Three practical issues for modern adaptive testing item pools (Report No. RR-94-05). Educational Testing Service. Princeton, New Jersey.
Swanson, L. & Stocking, M. L. (1993). A model and heuristic for solving very large item selection problems. Applied Psychological Measurement, 17(2), 151-166. doi:10.1177/014662169301700205
Thompson, N. A. & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research, and Evaluation, 16(1), 1-9.
Urry, V. W. (1977). Tailored testing: a successful application of latent trait theory. Journal of Educational Measurement, 14(2), 181-196. doi:10.2307/1434014
Vale, C. D. & Weiss, D. J. (1977). A rapid item-search procedure for bayesian adaptive testing (Report No. Research Report 77-4).
van der Linden, W. J. (1998a). Bayesian item selection criteria for adaptive testing. Psychometrika, 63(2), 201-216.
van der Linden, W. J. (1998b). Optimal assembly of psychological and educational tests. Applied Psychological Measurement, 22(3), 195-211. doi:10.1177/01466216980223001
van der Linden, W. J. (2010). Constrained adaptive testing with shadow tests. In W. J. van der Linden & C. A. Glas (Eds.), Elements of adaptive testing (pp. 31-55). New York, NY: Springer.
van der Linden, W. J., Ariel, A., & Veldkamp, B. P. (2006). Assembling a computerized adaptive testing item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31(1), 81-99. doi:10.3102/10769986031001081
van der Linden, W. J. & Pashley, P. J. (2010). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C. A. Glas (Eds.), Elements of adaptive testing (pp. 3-30). New York, NY: Springer.
van der Linden, W. J., Veldkamp, B. P., & Reese, L. M. (2000). An integer programming approach to item bank design. Applied Psychological Measurement, 24(2), 139-150. doi:10.1177/01466210022031570
Veldkamp, B. P. & van der Linden, W. J. (2010). Designing item pools for adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (Chap. 12, pp. 231-245). New York: Springer.
Wainer, H. (2000). Introduction and history. In H. Wainer, N. J. Dorans, D. Eignor, R. Flaugher, B. F. Green, R. J. Mislevy, ... D. Thissen (Eds.), Computerized adaptive testing: a primer (2nd edition, pp. 1-21). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Wang, T. & Vispoel, W. P. (1998). Properties of ability estimation methods in computerized adaptive testing. Journal of Educational Measurement, 35(2), 109-135. doi:10.1111/j.1745-3984.1998.tb00530.x
Warm, T. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450. doi:10.1007/bf02294627
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17(4), 17-27. doi:10.1111/j.1745-3992.1998.tb00632.x
Way, W. D., Steffen, M., & Anderson, G. S. (2002). Developing, maintaining, and renewing the item inventory to support CBT. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: building the foundation for future assessments (pp. 143-164). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6(4), 473-492. doi:10.1177/014662168200600408
Weiss, D. J. (2011). Better data from better measurements using computerized adaptive testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1-27.
Weiss, D. J. & McBride, J. R. (1984). Bias and information of Bayesian adaptive testing. Applied Psychological Measurement, 8(3), 273-285. doi:10.1177/014662168400800303
Wise, S. L., Bhola, D. S., & Yang, S.-T. (2006). Taking the time to improve the validity of low-stakes tests: the effort-monitoring CBT. Educational Measurement: Issues and Practice, 25(2), 21-30. doi:10.1111/j.1745-3992.2006.00054.x
Xing, D. & Hambleton, R. K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing examinations. Educational and Psychological Measurement, 64(1), 5-21. doi:10.1177/0013164403258393
Yao, L., Pommerich, M., & Segall, D. O. (2014). Using multidimensional CAT to administer a short, yet precise, screening test. Applied Psychological Measurement, 38(8), 614-631.