A NOVEL APPROACH TO EVALUATE ITEM POOLS: THE ITEM POOL UTILIZATION INDEX

By

Emre Gönülateş

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Measurement and Quantitative Methods - Doctor of Philosophy

2015

ABSTRACT

In this study, an index to quantify the adequacy of an item pool of an adaptive test for a given set of test specifications and examinee population is introduced. This index is called the Item Pool Utilization Index (IPUI). The IPUI ranges from 0 to 1, with values close to 1 indicating that the item pool can provide optimum items to examinees throughout the test. This index can be used to compare different item pools or to diagnose the deficiencies of a given item pool by quantifying the amount of deviation from a perfect item pool.

Simulation studies were conducted to evaluate the capacity of this index for detecting the inadequacies of both simulated and operational item pools. The added value of this index was compared to the existing methods of evaluating the quality of computerized adaptive tests (CAT).

Results of the study showed that the IPUI can detect even slight deviations of the item pools from an optimal item pool. It can uncover the shortcomings of an item pool that other outcomes of CAT cannot detect. Additionally, it can be used to diagnose the weaknesses of the item pool and guide test developers to improve their item pools.

Keywords: Computerized Adaptive Test, Item Pool, Item Pool Design

I dedicate this dissertation to my wife Funda, my children Meryem, Bilge and Elif, and my parents, Meryem and Selahattin.

ACKNOWLEDGEMENTS

This dissertation is the outcome of many years of study at Michigan State University. The idea for this dissertation came from an advanced psychometry course I took from Dr. Mark Reckase back in the Fall semester of 2013. At that time, I was helping Dr. Reckase in developing ideal item pools for a testing company. When I saw the item selection algorithm introduced in Han (2012), which was one of the texts we read in the course, the idea of this thesis was born.

First, my deepest thanks go to my academic advisor and committee chair Dr. Mark Reckase. During my academic life at MSU he has been a wise, thoughtful and kind mentor.
He was generous in sharing his wisdom and knowledge. He has been a great example of a hardworking, dedicated and passionate scholar. He introduced me to many projects that taught me both practical and theoretical aspects of our field. He spent many hours with me to cultivate the idea of this thesis into a work that I'm very proud of.

The National Council of State Boards of Nursing (NCSBN) provided support at the early stages of this dissertation. The organization also provided operational data used in this study that helped me to show the practical use of the index. I want to thank Drs. Ada Woo, Hong Qian and Doyoung Kim from NCSBN for their help and support.

I wish to thank my friends and colleagues, Eun Hye Ham, Uygun, Liyang Mao, Xin Luo, Chi Chang, Francis Smart, Ifeoma Iyioke, Bing Tong, Anne Traynor, Lihong Yang, Tingqiao Chen, Keyin Wang, Hyesuk Jang and Xuechun Zhou, who enriched my life at MSU and contributed to my learning.

I'm grateful to my dissertation committee members, Drs. Spyros Konstantopoulos, Richard Houang and Christopher Nye, for their willingness to review my work. Their feedback improved this dissertation immensely. In addition, I'm thankful to the faculty of the Measurement and Quantitative Methods Program, especially Ken Frank, Tenko Raykov, Kimberly Maier and Bill Schmidt. They helped me to establish the basis for my advanced studies. I'm grateful for the support of Dr. Ed Roeber. He advised me in the first two years of my studies at MSU and helped me to find my way in this large field of study.

My unique gratitude goes to my parents, Meryem and Selahattin Gönülateş, for their sacrifices on my behalf and their unconditional love. Their endless support and encouragement throughout my life have helped me accomplish this and many other goals in life. I'm grateful to my sister Zeynep and brother Ahmet Talha for being there whenever I need them.

My children, Meryem, Bilge and Elif, are the joy of my life. Over many hours they put up with an often absent and distracted father. I thank them for their love and the happiness they bring to my life.
And, I owe this dissertation to the unwavering support of my wife, Funda. Without her, I would not have been able to complete this degree. During these years she has had a lot of things on her shoulders. She gave birth to three wonderful children and raised them, pursued a PhD of her own, was involved in many projects and much more. But still, she always made sure I had enough space and time for my work. Her moral support helped me to find my way during the gloomy days and she shared my joy in many happy days. She spent many hours reading my drafts and gave valuable feedback. I'm thankful to her for encouraging me to follow my dream and believe in myself.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW
  2.1 Notation
  2.2 Item Response Theory
  2.3 Computerized Adaptive Testing
    2.3.1 Initial Ability Estimate
    2.3.2 Item Selection
      2.3.2.1 Maximum Fisher Information
      2.3.2.2 Owen's Bayesian Item Selection
    2.3.3 Constraints on Item Selection
      2.3.3.1 Content Balancing
      2.3.3.2 Exposure Control
    2.3.4 Ability Estimation
      2.3.4.1 Maximum Likelihood Estimation
      2.3.4.2 Expected a Posteriori Estimation
      2.3.4.3 Owen's Bayesian Estimation
    2.3.5 Item Pools in CAT
      2.3.5.1 Item Pool Size
      2.3.5.2 Item Pool Design and Assembly
    2.3.6 Evaluation of the Item Pools
CHAPTER 3 THE ITEM POOL UTILIZATION INDEX
  3.1 Relative Efficiency
  3.2 Item Pool Utilization Index
  3.3 An Example Calculation of IPUI
  3.4 Difference between IPUI and Standard Error
  3.5 The Limitations of IPUI
CHAPTER 4 RESEARCH QUESTIONS AND METHODS
  4.1 Research Questions
  4.2 Research Methods
    4.2.1 First Phase - Simulated Data
      4.2.1.1 Common CAT Specifications
      4.2.1.2 Research Question 1
      4.2.1.3 Research Question 2
      4.2.1.4 Research Question 3
    4.2.2 Second Phase - Real Data
      4.2.2.1 Research Question 4
      4.2.2.2 Research Question 5
CHAPTER 5 RESULTS
  5.1 First Phase - Simulated Data
    5.1.1 Research Question 1
      5.1.1.1 Item Pool and Examinee Ability Discrepancy
      5.1.1.2 Item Pool Size
    5.1.2 Research Question 2
      5.1.2.1 Test Length
      5.1.2.2 Exposure Control
    5.1.3 Research Question 3
  5.2 Second Phase - Operational Data
    5.2.1 Ideal Item Pool Generation
    5.2.2 Item Pools Used in the Second Phase
    5.2.3 Research Question 4
    5.2.4 Research Question 5
CHAPTER 6 DISCUSSION
  6.1 Summary of the Results
  6.2 Practical Uses of IPUI
    6.2.1 Quantification of the Item Pool Quality
    6.2.2 IPUI in Optimal Test Assembly
    6.2.3 IPUI as a Quality Control Tool
    6.2.4 IPUI as a Diagnostic Tool
    6.2.5 IPUI at Individual and Group Level
  6.3 Implications
    6.3.1 The Robustness of CAT Procedures to Weak Item Pools
    6.3.2 Summary Statistics for IPUI
    6.3.3 Commentary on the Results of the Operational Item Pools
    6.3.4 IPUI and Measurement Quality
  6.4 Limitations of the Study
    6.4.1 Generalizability of the Results
    6.4.2 A Recommended Value for IPUI
    6.4.3 Detection of the Redundant Items in the Item Pool
    6.4.4 The Purpose of the Test and the Definition of the Optimum Item
  6.5 Future Research Directions
    6.5.1 A General Framework for IPUI
    6.5.2 Weights for IPUI
    6.5.3 IPUI for Other Psychometric Models
    6.5.4 Naming of the Index
APPENDICES
  APPENDIX A SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 1 - PART 1
  APPENDIX B SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 1 - PART 2
  APPENDIX C SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 2 - PART 1
  APPENDIX D SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 2 - PART 2
  APPENDIX E SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 3
  APPENDIX F SUPPLEMENTARY FIGURES FOR IDEAL ITEM POOL CREATION
  APPENDIX G SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 4
  APPENDIX H SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 5
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1 Item Parameters of Test 1 and Test 2
Table 3.2 IPUI Calculation Example
Table 4.1 Item Pool Information for Research Question 3
Table 4.2 Distribution of Content for NCLEX-RN Examination
Table 5.1 Summary Statistics for Research Question 1 - Discrepancy between Item Pool and Ability Distribution
Table 5.2 Means and Standard Deviations of IPUI Values by Item Pool Size
Table 5.3 Summary Statistics for Research Question 2 - Test Length
Table 5.4 Item Exposure Analysis by Test Length Condition
Table 5.5 Summary Statistics for Research Question 2 - Exposure Control
Table 5.6 Item Exposure Analysis by Exposure Control Condition
Table 5.7 Summary Statistics for Research Question 4
Table 5.8 Item Exposure Analysis by Item Pool Condition
Table 5.9 Decision Accuracy Analysis by Item Pool Condition
Table 5.10 Decision Accuracy Conditional on True θ for each Item Pool
Table 6.1 Means and Standard Deviations of Mean IPUIs of the Replications

LIST OF FIGURES

Figure 1.1 Adaptive Test Progress Plots for Two Examinees
Figure 2.1 Comparison of the Information Functions of Six Item Pools Using the Item Pool Information Functions
Figure 3.1 Test Information Functions and Relative Efficiency of Test 1 and Test 2
Figure 3.2 A Demonstration of the Difference between SE and IPUI
Figure 5.1 Summary Statistics for Research Question 1 - Discrepancy between Item Difficulty Distribution of Item Pool and θ Distribution
Figure 5.2 Distribution of Standard Errors within each Discrepancy Condition
Figure 5.3 Distribution of IPUI within each Discrepancy Condition
Figure 5.4 Relationship between Standard Error and IPUI for each Discrepancy Condition
Figure 5.5 Mean Bias Distribution by Item Pool Size Condition
Figure 5.6 Mean Standard Error Distribution by Item Pool Size Condition
Figure 5.7 Mean Squared Error Distribution by Item Pool Size Condition
Figure 5.8 Exposure Rates by Item Pool Size Condition for Replication 19
Figure 5.9 Mean IPUI Distribution by Item Pool Size Condition
Figure 5.10 Relationship between Mean of IPUI and Mean of Standard Error
Figure 5.11 IPUI and Standard Error Relationship for Replication 19
Figure 5.12 Correlation between Standard Error and IPUI for each Replication
Figure 5.13 IPUI and Reliability Relationship for Replication 19
Figure 5.14 Summary Statistics for Research Question 2 - Test Length
Figure 5.15 Relationship between True and Estimated Ability by Test Length Condition
Figure 5.16 Bias Distribution by Test Length Condition
Figure 5.17 Standard Error Distribution by Test Length Condition
Figure 5.18 Mean Squared Error Distribution by Test Length Condition
Figure 5.19 Item Exposure Distribution by Test Length Condition
Figure 5.20 IPUI Distribution by Test Length Condition
Figure 5.21 IPUI and Standard Error Relationship by Test Length Condition
Figure 5.22 Summary Statistics for Research Question 2 - Exposure Control
Figure 5.23 Relationship between True and Estimated Ability by Exposure Control Condition
Figure 5.24 Bias Distribution by Exposure Control Condition
Figure 5.25 Standard Error Distribution by Exposure Control Condition
Figure 5.26 Mean Squared Error Distribution by Exposure Control Condition
Figure 5.27 Item Exposure Distribution by Exposure Control Condition
Figure 5.28 IPUI Distribution by Exposure Control Condition
Figure 5.29 IPUI and Standard Error Relationship by Exposure Control Condition
Figure 5.30 Item Pool Distributions of Proposed Test Plans
Figure 5.31 Mean Bias Conditional on True θ for each Plan (Item Pool)
Figure 5.32 Mean Standard Error Conditional on True θ for each Plan (Item Pool)
Figure 5.33 Mean IPUI Conditional on True θ for each Plan (Item Pool)
Figure 5.34 IPUI Distribution Conditional on True θ for each Plan (Item Pool)
Figure 5.35 Mean IPUI at each Item Number for Selected True θs
Figure 5.36 The Relationship between Intermediate θ Estimate and IPUI
Figure 5.37 The Relationship between Intermediate θ Estimate and Item Difficulty
Figure 5.38 Progress Plot for Ideal Item Pool with Fixed Bin Size 0.4
Figure 5.39 Item Difficulty Distributions by Content Area for Ideal Item Pool with Fixed Bin Size 0.4
Figure 5.40 Item Difficulty Distributions for Item Pools Used in Research Question 4
Figure 5.41 Summary Statistics for Research Question 4
Figure 5.42 IPUI Distribution for each Item Pool Condition
Figure 5.43 The Relationship between True θ and Estimated θ
Figure 5.44 The Relationship between IPUI and Estimated Ability for each Item Pool Condition
Figure 5.45 The Relationship between IPUI and Test Length for each Item Pool Condition
Figure 5.46 Bias Distribution for each Item Pool Condition
Figure 5.47 The Relationship between IPUI and Bias for each Item Pool Condition
Figure 5.48 Standard Error Distribution for each Item Pool Condition
Figure 5.49 The Relationship between IPUI and Standard Error for each Item Pool Condition
Figure 5.50 Mean Squared Error Distribution for each Item Pool Condition
Figure 5.51 Exposure Rate Distribution for each Item Pool Condition
Figure 5.52 The Relationship between Exposure Rates and Item Difficulty Grouped by Content Area for each Item Pool Condition
Figure 5.53 Mean Bias Conditional on True θ for each Item Pool Condition
Figure 5.54 Mean Standard Error Conditional on True θ for each Item Pool Condition
Figure 5.55 Mean Squared Error Conditional on True θ for each Item Pool Condition
Figure 5.56 Mean IPUI Values Conditional on True θ for each Item Pool Condition
Figure 5.57 Mean IPUI Values Conditional on True θ around the Cut Score for each Item Pool
Figure 5.58 IPUI Distribution Conditional on True θ for each Item Pool
Figure A.1 Item Difficulty Distribution (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.2 True θ Distribution (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.3 Distribution of Bias for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.4 Relationship between Bias and IPUI for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.5 Distribution of Mean Squared Error for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.6 Relationship between Mean Squared Error and IPUI for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.7 Two Examinees with Same Standard Errors but Different IPUI Values (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure B.1 True θ Distribution (Research Question 1 - Part 2)
Figure B.2 Item Difficulty Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.3 Bias Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.4 Standard Error Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.5 Mean Squared Error Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.6 IPUI Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure C.1 Item Difficulty Distribution for Research Question 2 - Test Length Conditions
Figure C.2 True θ Distribution for Research Question 2 - Test Length Conditions
Figure C.3 IPUI and Bias Relationship by Test Length Condition
Figure D.1 Item Difficulty Distribution for Research Question 2 - Exposure Control
Figure D.2 True θ Distribution for Research Question 2 - Exposure Control
Figure D.3 IPUI and Bias Relationship by Exposure Control Condition
Figure E.1 The Bias Distribution at each True θ Value for each Item Pool Condition
Figure E.2 The Standard Error Distribution at each True θ Value for each Item Pool Condition
Figure F.1 Progress Plot for Ideal Item Pool with Fixed Bin Size 0.8
Figure F.2 Item Difficulty Distributions by Content Area for Ideal Item Pool with Fixed Bin Size 0.8
Figure G.1 True θ Distribution for Research Question 4
Figure G.2 The Relationship between Estimated Ability and Bias for each Item Pool Condition
Figure G.3 The Relationship between Estimated Ability and Standard Error for each Item Pool Condition
Figure G.4 The Relationship between Test Length and Standard Error for each Item Pool Condition
Figure G.5 The Relationship between IPUI and Mean Squared Error for each Item Pool Condition
Figure H.1 The Relationship between True θ and Estimated θ for each Item Pool Condition
Figure H.2 The Bias Distribution at each True θ Value for each Item Pool Condition
Figure H.3 Mean Standard Error Conditional on Restricted True θ Range for each Item Pool Condition
Figure H.4 The Standard Error Distribution at each True θ Value for each Item Pool Condition
Figure H.5 The Test Length Distribution at each True θ Value for each Item Pool Condition

KEY TO ABBREVIATIONS

1PL: one-parameter logistic
2PL: two-parameter logistic
3PL: three-parameter logistic
CAT: computerized adaptive testing
EAP: expected a posteriori
IPUI: item pool utilization index
IRT: item response theory
MAP: maximum a posteriori
MCAT: multidimensional computerized adaptive test
MFI: maximum Fisher information
MIRT: multidimensional item response theory
MLE: maximum likelihood estimation
MSE: mean squared error
NCLEX-RN: National Council Licensure Examination for Registered Nurses
NCSBN: National Council of State Boards of Nursing
P&P: paper and pencil
RMSE: root mean squared error
SE: standard error
SEM: standard error of measurement
TIF: test information function

CHAPTER 1 INTRODUCTION

The increasing availability of computers and the relative advantages of computerized adaptive testing (CAT) over paper based tests have boosted the usage of CAT in recent decades. Mainly, a CAT enables more efficient measurement of examinee abilities, shorter test lengths, and more precise ability estimates for examinees at the extreme ends of the ability distribution. But these benefits come with costs. Among others, the requirement for a large item pool is the most challenging one. An item pool is the collection of items that will be used to construct individual adaptive tests for examinees. An item pool should include a sufficient number of high quality items that are targeted to the examinee population (Parshall, Spray, Kalohn, & Davey, 2002). It should meet the content specifications of the test and provide sufficient information at all levels of the ability distribution of the target population (van der Linden, Ariel, & Veldkamp, 2006). Flaugher (2000) underlined the importance of item pools in a CAT:

Obviously, the better the quality of the item pool, the better the job the adaptive algorithm can do. The best and most sophisticated adaptive program cannot
function if it is held in check by a limited pool of items, or items of poor quality. (p. 38)

To demonstrate the importance of a large item pool, a very basic adaptive test has been simulated. Figure 1.1 shows the CAT progress of two examinees for a test with the same test specifications and item pool. The item pool consists of 50 items with item difficulties generated from a standard normal distribution. The points in the figure show the intermediate ability estimates of the examinees. Blue 'b' points represent the difficulty parameters of the items administered to the examinees at each stage of the adaptive test. The true ability parameters of Examinee 1 and Examinee 2 are 0 and 1.5, respectively. For the first examinee, the item pool is very suitable. At each step, the item pool can provide an item with a difficulty parameter that is very close to Examinee 1's intermediate ability estimate. On the other hand, for Examinee 2, after correctly answering a couple of questions, the item pool is out of difficult items. As a result, even though Examinee 2 correctly answers each item, the CAT algorithm presents easier items to this examinee.

Figure 1.1: Adaptive Test Progress Plots for Two Examinees

The consequences of the inadequacy of an item pool can be serious depending on the stakes of the exam. For example, as seen in Figure 1.1, the standard error of the ability estimate for the second examinee is higher. The measurement error of examinees with abilities similar to Examinee 2 will be higher, and this can affect the decisions made from their test scores. Also, even though Examinee 2 correctly answers each item, the items are getting easier and easier. This is contrary to the basic premise of a CAT and will probably affect the motivation of the examinee. So, test developers should ensure that their item pools support the purposes of their tests. The index developed in this dissertation will give test developers a tool to evaluate the quality of their item pools.
Test developers are very motivated to keep their item pools as efficient as possible. They want to use as many items as possible to ensure the quality of the test, but they do not want item pools that are larger than necessary, because item pool development is an expensive enterprise. Breithaupt, Ariel, and Hare (2010) estimated that for a 40 item CAT that will span over 5 years with two administrations per year, 2000 items are necessary. Considering that developing one item with traditional item development methods (with all necessary control mechanisms) costs between $1,500 and $2,500 (Rudner, 2010), the total cost of an item pool reaches $3,000,000 to $5,000,000 (Gierl & Lai, 2013). As a result, test developers are motivated to reduce the size of their item pools and make their item pools as efficient as possible.

Evaluation of the item pool is very important because of the possible effects of a weak item pool on the decisions based on the test results. An inappropriate item pool can increase the biases and standard errors of the ability estimates. If the test is a variable length CAT, larger standard errors would increase the test length. If the item pool is sufficient for one group of examinees and not for another, this will impair the fairness of the test. Some groups of examinees might have less precise ability estimates or longer tests depending on the test specifications. Simulations might reveal such problems. But tracing back to the source of such problems might not be straightforward, especially for CATs with complex designs.

Weak item pools might also cause the violation of some test specifications or the administration of inappropriate items. Test developers should evaluate their item pools and ensure that they are adequate for the test specifications of their CATs. For example, Eggen and Verschoor (2006) observed that the CAT algorithm they designed produced tests that did not meet the test specifications. The CAT algorithm failed to provide appropriate items to the examinees. The authors attribute this to the lack of appropriate items in the item pool.
The motivation for this study is to create an index that quantifies the performance of an item pool for a given adaptive test and examinee population. Perfect item pool performance is achieved when an item pool can provide a perfect item to an examinee regardless of the ability of the examinee or the stage of the test, while meeting all of the constraints of the test. A perfect item is an item with the maximum possible amount of information at a given ability level. For example, for the one-parameter logistic (1PL) model, the perfect item for a given ability has a difficulty parameter equal to this ability value.

In practice, almost all item pools are imperfect. But evaluating the deficiencies of an item pool is not straightforward. Adaptive tests are usually evaluated with outcome variables such as standard error of measurement (SEM), bias of the estimates, root mean squared error (RMSE), item exposure rates, overlap rates or decision accuracy. All of these are very valuable indicators that show different aspects of the quality of an adaptive test. But the quality of an item pool cannot be evaluated solely by any of these indicators.

The quality of an item pool is intermingled with many aspects of an adaptive test such as the item selection procedures, the ability estimation mechanisms, the constraints imposed on adaptive tests, and the test specifications. When evaluating the outcomes of an adaptive test, it is usually not possible to single out the effects of each of these individual factors without performing a large simulation study. The index that is developed in this dissertation aims to quantify, for an item pool, the amount of deviation from a perfect item pool. A test developer will be able to use this index to evaluate the item pool's prospective performance. Consequently this will lead to a decision of either keeping the item pool intact, or improving it by adding more appropriate items, or removing the redundant items and saving them for future administrations.

CHAPTER 2 LITERATURE REVIEW

2.1 Notation

In this thesis, the notation used by van der Linden and Pashley (2010) has been followed.
Items in the item pool are denoted by $i = 1, \ldots, I$. The order of presentation of items is denoted by $k = 1, \ldots, K$, where $K$ is the test length. So, $i_k$ denotes the index of the item in the item pool which is the $k$th selected item in the adaptive test. The set of items that are already administered before the selection of the $k$th item is denoted as $S_{k-1} = \{i_1, \ldots, i_{k-1}\}$. The set of items remaining in the item pool after the administration of $k-1$ items is denoted as $R_k = \{1, \ldots, I\} \setminus S_{k-1}$. The response string of an examinee will be denoted as $u_{i_1}, u_{i_2}, \ldots, u_{i_K}$. In this study only dichotomous items are used, so the values of $u_{i_k}$ can be either 0 for an incorrect response or 1 for a correct response. The ability parameter of examinees will be denoted by $\theta \in (-\infty, \infty)$. The examinees are indexed with $j \in \{1, \ldots, N\}$, where $N$ is the total number of examinees.

2.2 Item Response Theory

Item response theory (IRT) is the backbone of a CAT. Almost all of the scoring and item selection algorithms use IRT. Especially in a CAT, since different examinees see different items at different times, the existence of a common scale is paramount. IRT provides this common scale for items and examinees.

In IRT (Lord & Novick, 1968; Lord, 1980), the probability of a correct response of an examinee with ability parameter $\theta$ to an item $i$ is modeled as:

$$P(u_i = 1 \mid \theta) = c_i + (1 - c_i)\,\frac{e^{D a_i (\theta - b_i)}}{1 + e^{D a_i (\theta - b_i)}}, \qquad u_i = \begin{cases} 1 & \text{if the response to item } i \text{ is correct} \\ 0 & \text{if the response to item } i \text{ is incorrect} \end{cases} \tag{2.1}$$

where $a$ is the item discrimination parameter, $b$ is the item difficulty parameter, $c$ is the lower asymptote parameter and $D$ is the scaling factor, which is usually taken as 1.7. The model in Equation (2.1) is called the three-parameter logistic (3PL) model. Fixing the $c$ parameter to 0 gives the two-parameter logistic (2PL) model, and further fixing the $a$ parameter to 1 gives the 1PL model, or the Rasch model when $D = 1$ (Rasch, 1961).

2.3 Computerized Adaptive Testing

CAT is a complex method of delivering a tailored exam that adapts to an individual examinee. The research supporting the development of CAT has more than 40 years of history. It has been used in operational testing in the past twenty years.
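As an illustration of Equation (2.1), the following minimal Python sketch evaluates the 3PL response probability and its 2PL and 1PL special cases. The function name `prob_correct` and the example parameter values are assumptions made for this illustration and do not come from the study.

```python
import math

def prob_correct(theta, a=1.0, b=0.0, c=0.0, D=1.7):
    """Probability of a correct response under the 3PL model, Equation (2.1).

    The name and default values are illustrative assumptions.
    """
    z = D * a * (theta - b)
    # e^z / (1 + e^z) is written in the equivalent form 1 / (1 + e^-z)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# 3PL item: discrimination 1.2, difficulty 0.5, lower asymptote 0.2
p_3pl = prob_correct(0.5, a=1.2, b=0.5, c=0.2)
# 2PL: c fixed to 0; 1PL (Rasch): additionally a = 1 and D = 1
p_2pl = prob_correct(0.5, a=1.2, b=0.5)
p_rasch = prob_correct(0.5, b=0.5, D=1.0)
```

When the ability equals the item difficulty, the logistic part equals 0.5, so the 3PL probability is c + (1 - c)/2 and the 2PL and Rasch probabilities are both 0.5.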
Compared to paper and pencil (P&P) tests, CAT has several advantages such as shorter tests, increased test reliability, on demand testing, and immediate test scoring and reporting (Meijer & Nering, 1999). A CAT uses half as many items as a P&P test (Weiss & McBride, 1984), in some cases even fewer (Gibbons et al., 2008). A CAT allows the measurement of information such as response times (Wise, Bhola, & Yang, 2006), speech entries, graphical entries, mouse movements and other tracking information that is not available in P&P tests. A CAT also allows the use of innovative items, which are not available in P&P tests and can help to increase the validity evidence of tests (Luecht & Clauser, 2002).

The list below shows the basic algorithm of a CAT:

1. Specify an initial ability estimate ($\hat{\theta}_0$) as a starting point
2. Select an item from the available items in the item pool to deliver to the examinee
3. Score the item and update the examinee's ability estimate ($\hat{\theta}_k$)
4. Evaluate the termination criteria:
   a) If satisfied, conclude the test
   b) If not, go to step 2.

In the following sections each part of the list above will be explained further.

2.3.1 Initial Ability Estimate

In the first step of a CAT, an appropriate initial ability estimate should be designated. There are two options for the starting point of a CAT: a variable starting point and a fixed starting point. When a fixed starting point is used, each examinee is assigned the same initial ability estimate at the beginning of the CAT. In tests with a variable starting point, some prior information about the examinee guides the choice of the initial ability estimate.
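The four-step CAT algorithm listed in Section 2.3 can be sketched in Python as below, here with a fixed starting point. This is only a minimal illustration under stated assumptions: a 1PL model with D = 1, a fixed test length as the termination rule, selection of the unused item whose difficulty is closest to the current estimate, and a grid-based posterior mean under a standard normal prior for updating the ability estimate. All function names, the seed and the pool are hypothetical and are not the procedures used in this study.

```python
import math
import random

def p_correct(theta, b):
    """1PL probability of a correct response (assumes D = 1, a = 1, c = 0)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses, grid):
    """Posterior mean of theta on a grid under a standard normal prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)               # unnormalized N(0, 1) prior
        for b, u in responses:
            p = p_correct(t, b)
            w *= p if u == 1 else 1.0 - p        # likelihood of each response
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

def run_cat(true_theta, pool, test_length, rng, theta0=0.0):
    """Simulate one fixed-length adaptive test for a single examinee."""
    grid = [g / 10.0 for g in range(-40, 41)]    # theta grid from -4 to 4
    theta_hat = theta0                           # step 1: initial estimate
    available = list(pool)
    responses = []
    # step 4: terminate when the fixed test length is reached
    while len(responses) < test_length and available:
        # step 2: pick the unused item closest in difficulty to the estimate
        b = min(available, key=lambda d: abs(d - theta_hat))
        available.remove(b)
        # simulate the examinee's response from the true ability
        u = 1 if rng.random() < p_correct(true_theta, b) else 0
        responses.append((b, u))
        theta_hat = eap_estimate(responses, grid)  # step 3: update estimate
    return theta_hat, responses

rng = random.Random(1)                           # arbitrary seed
pool = [rng.gauss(0.0, 1.0) for _ in range(50)]  # 50 item difficulties
theta_hat, responses = run_cat(1.5, pool, 20, rng)
```

The pool of 50 standard normal difficulties and the true ability of 1.5 mirror the setup described for Figure 1.1, but the sketch is not intended to reproduce that figure.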
It is recommended that the initial ability estimate use all of the available information about the examinee (Kingsbury & Wise, 2000). Examples of this prior information might be previous test results, school grades, student background information, and other relevant collateral information that is correlated with the examinee's ability. Variable starting points will increase the efficiency of the adaptive test. For some examinees, this initial estimate might be inaccurate, but the adaptive nature of the test will solve this problem. Also, if the abilities of examinees are estimated using Bayesian estimation methods, the variable starting point will reduce the bias of these estimates (Weiss & McBride, 1984; Wang & Vispoel, 1998). For maximum likelihood estimation (MLE), Wang and Vispoel (1998) found negligible effects of using variable and fixed starting points on bias.

Using variable starting points has an added benefit for test developers too. If a fixed starting point is used for all examinees, each examinee will see the same items at the beginning of the test (supposing that there is no exposure control). Consequently some items will be overexposed. Variable starting points can alleviate this problem.

Despite their many psychometric benefits, variable starting points have an important drawback. Using information that possibly affects an examinee's ability estimate beyond the examinee's current test performance might be objectionable depending on the test purpose. This is the main reason why many high stakes tests use a fixed starting point.

Usually 0 is chosen as a fixed starting point for a CAT, because it is the middle of the ability distribution. But depending on the circumstances, different starting points might be considered. For example, if the test length of a CAT is rather long and test developers want examinees to warm up to the exam and make an easy start, a lower starting point can be set. For licensure exams, the initial starting point might be set at the cut score of the test.

2.3.2 Item Selection

The item selection algorithm is the most important part of the adaptive test. At each stage, the CAT algorithm should select the most appropriate item, usually in the presence of many constraints. Item information is an important determinant of the item selection processes.
Almost all item selection algorithms try to select an item that will give the largest amount of information about the examinee. In this dissertation only a few of the available item selection algorithms were investigated in detail, but in the CAT literature there are many of them (van der Linden, 1998a; Barrada, Olea, Ponsoda, & Abad, 2010; van der Linden & Pashley, 2010).

2.3.2.1 Maximum Fisher Information

By far the most used item selection algorithm in a CAT is maximum Fisher information (MFI) (Lord, 1977a). In this method, the item that has the maximum amount of information at an examinee's intermediate ability estimate ($\hat{\theta}$) will be administered. For a test with $K$ items, the Fisher information function can be expressed as:

$$I(\theta) = -E\left[\frac{\partial^2}{\partial \theta^2} \log L(\mathbf{u} \mid \theta)\right] = \sum_{k=1}^{K} \frac{\left(P'_{i_k}\right)^2}{P_{i_k} Q_{i_k}} \tag{2.2}$$

where $P'_{i_k}$ is the derivative of $P_{i_k}$ with respect to $\theta$, $Q_{i_k} = 1 - P_{i_k}$, and $L(\mathbf{u} \mid \theta)$ is the likelihood function defined as:

$$L(\mathbf{u} \mid \theta) = L\left(U_{i_1} = u_{i_1}, \ldots, U_{i_K} = u_{i_K} \mid \theta\right) = \prod_{k=1}^{K} P_{i_k}^{u_{i_k}} \left(1 - P_{i_k}\right)^{1 - u_{i_k}} \tag{2.3}$$

where $P_{i_k} = P(u_{i_k} = 1 \mid \theta)$ is defined in Equation (2.1). For the 3PL model, substituting the values and calculating the derivative of Equation (2.1) for item $i$ gives:

$$I_i(\theta) = \frac{(D a_i)^2 (1 - c_i)}{\left(c_i + e^{D a_i (\theta - b_i)}\right)\left(1 + e^{-D a_i (\theta - b_i)}\right)^2} \tag{2.4}$$

At each step of the CAT, the MFI algorithm searches for an item that maximizes the total information (Equation (2.2)) given the previous $S_{k-1}$ items.

MFI has some important advantages. At each step, it selects the most informative item, which increases the efficiency of the CAT. The precision of the test increases rapidly as more items are administered. It is widely used and its properties are well researched.
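The MFI selection step can be sketched in Python as follows. Because the test information in Equation (2.2) is a sum over items, maximizing the total information given the items already administered amounts to choosing the remaining item with the largest item information (Equation (2.4)) at the current ability estimate. The function names and the small four-item pool of (a, b, c) triples are assumptions made for this illustration only.

```python
import math

def item_information(theta, a, b, c, D=1.7):
    """Fisher information of one 3PL item at theta, Equation (2.4)."""
    z = D * a * (theta - b)
    return ((D * a) ** 2 * (1.0 - c)
            / ((c + math.exp(z)) * (1.0 + math.exp(-z)) ** 2))

def select_mfi(theta_hat, pool, administered):
    """Index of the unused item with maximum information at theta_hat.

    pool is a list of (a, b, c) tuples; administered is a set of indices.
    """
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical item pool: (discrimination, difficulty, lower asymptote)
pool = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 0.1, 0.0), (1.2, 2.0, 0.25)]

first = select_mfi(0.0, pool, administered=set())   # highly discriminating item near theta
second = select_mfi(0.0, pool, administered={first})
```

In this toy pool the first selection is the item with the highest discrimination located at the current estimate, which previews the tendency of MFI to favor highly discriminating items.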
The MFI method uses only the examinee's current test data to select a new item. This is desirable in some circumstances. At the early stages of the CAT, however, there is usually not enough information to guide this item selection algorithm. As a result, the items that are selected might not be the most appropriate ones. Additionally, the items at the beginning of the test cause big jumps in the ability estimates. This is the reason why test preparation companies tell their students to be extra careful while responding to the first few items on the CAT. This problem might be alleviated by using prior information to select items or by using different item selection paradigms such as Kullback-Leibler, at least at the beginning of the test (Chang & Ying, 1996).

Because the MFI method selects the most informative items, it reduces the uncertainty of the ability estimates, i.e. the standard error (SE), more quickly. Han (2012) found that MFI resulted in the lowest SE of estimate regardless of the exposure control. Even though this is desirable, one possible side effect concerns variable length CATs. For those tests, the test usually terminates when the SE drops below a predefined threshold. With MFI, there is a chance that tests terminate prematurely. Also, Warm (1989) showed that for shorter tests, approximating the standard errors using information functions underestimates the true error variances.

MFI has some other disadvantages as well. For short tests, Chen, Ankenmann, and Chang (2000) found that MFI performed marginally worse than the other item selection methods they investigated. For tests longer than 10 items, they found that this difference disappeared.

Selecting items solely based on their information values will result in disproportionate use of some highly informative items (Way, 1998). This has two disadvantages. First, highly informative items, which usually have high item discrimination parameters, are exposed a lot.
On the other hand, items with low item discrimination parameters might not be exposed at all. Second, since very informative items are used at the beginning of the test, where the provisional ability estimates are inaccurate, the ability estimates will be over- and underestimated (Chang, 2004). To mitigate this problem, methods such as the $a$-stratified item selection method (Chang, Qian, & Ying, 2001) or the $a$-stratified with $b$-blocking item selection method (Chang et al., 2001) have been proposed.

2.3.2.2 Owen's Bayesian Item Selection

Owen's Bayesian item selection algorithm (Owen, 1975) is based on the reduction of the posterior variances of the ability estimates. From a Bayesian framework, the posterior distribution of the ability given the responses to the previous $k-1$ items is:

$$g(\theta\mid\mathbf{u}) = P\left(\theta\mid u_{i_1},\ldots,u_{i_{k-1}}\right) = \frac{P(\mathbf{u},\theta)}{P(\mathbf{u})} = \frac{L(\mathbf{u}\mid\theta)\,g(\theta)}{\int L(\mathbf{u}\mid\theta)\,g(\theta)\,d\theta} \tag{2.5}$$

where $g(\theta)$ is the prior distribution of the ability, which is usually a normal distribution. The calculation of this posterior distribution is not computationally simple. Owen used a normal approximation to this posterior distribution and proved that, as the number of administered items goes to infinity, the expected value of the posterior distribution converges to the true value of $\theta$.

Each examinee starts the test with an initial ability estimate that is equal to the expected value of the prior distribution, $g(\theta)$. At each stage, Owen's item selection algorithm searches for the item that will reduce the posterior variance most. According to Owen, this can be achieved by minimizing the following function of item $i$ (Vale & Weiss, 1977):

$$\frac{1}{(1-c_i)}\left(1+\frac{1}{\sigma_0^2a_i^2}\right)\left(1-\frac{1}{K}\right)\left(c_i+\frac{1-c_i}{K}\right)e^{2D^2}$$

where

$$D = \frac{b_i-\mu_0}{\sqrt{2\left(a_i^{-2}+\sigma_0^2\right)}}, \qquad K^{-1} = \frac{1-\operatorname{erf}(D)}{2}, \qquad \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt,$$

and $\mu_0$ and $\sigma_0$ are the mean and standard deviation of the prior distribution, respectively.

After the examinee answers an item, a new posterior distribution is computed using the item response and the prior distribution. Then, this new posterior distribution becomes the prior distribution for the selection of the next item. When the variance of the posterior distribution reduces to a level that the test developer can tolerate, the test ends.

At each stage, the posterior variance is calculated for each available item in the item pool.
When computer processing speeds were slow, this caused long waits after the examinee's response. Vale and Weiss (1977) proposed a rapid item search procedure to solve this problem at that time. As computer speeds increased, this problem vanished. As discussed in Section 2.3.4.3, Owen's item selection algorithm is much faster than other Bayesian methods.

2.3.3 Constraints on Item Selection

In theory, the item that is most informative at the examinee's intermediate ability estimate should be administered, but in practice this rarely happens. There are several statistical or non-statistical rules that constrain the item selection. These constraints ensure that each test follows the test specifications as closely as possible and that each examinee gets a comparable and standardized test. In addition, they help test developers to secure their item pools.

In operational testing, the number of constraints on item selection can go up to 100. For example, van der Linden et al. (2006) reported that there are 96 constraints in the LSAT. In adaptive test simulations, Stocking (1994) used up to 75 constraints on item selection.

The majority of the constraints on item selection concern content balancing, exposure control and item enemies (Weiss, 2011), but constraints are not limited to these. Eignor, Stocking, Way, and Steffen (1993) gave a detailed account of such constraints. For example, a test developer might not want to use items that contain uncommon words more than once or twice in a test. Item format can be an important constraint as well. For instance, in a sentence completion section, the test developer might want to preserve a certain ratio of items that contain a single blank as opposed to two blanks.

Some items should not be presented to an examinee within the same test. Examples are item enemies, which provide a clue to the solution of each other, and redundant items, which are very similar to each other. In such cases, the item selection algorithm should not select such items if one of them has already been administered. In P&P tests, these items can be avoided before presenting the test to the examinees by checking the test forms. In a CAT, item enemies can be bundled in subsets. If an item within a subset is administered to an examinee, the remaining items are removed from the available item pool for that examinee.
2.3.3.1 Content Balancing

Each test supports an inference based on examinee scores. If the inferences of a test are related to general mathematics ability and the majority of the test content comes from trigonometry, then the test scores do not reflect the claims of the test. For this reason, each test needs to follow a test blueprint. This blueprint delineates the details of the test, including the content area distribution of items, the cognitive requirements of items, etc. Content balancing is a mechanism needed for a CAT to follow the test specifications and to avoid over-testing or under-testing of some content areas. Most test specifications set desired content coverage ratios such that certain percentages of the test items come from each content domain.

Kingsbury and Zara (1989) proposed a simple and intuitive content balancing algorithm. First, test developers specify the percentage of test items that should come from each content area, i.e. the target percentages. After the administration of each item, the computer calculates the empirical percentages of each content area. Then, these empirical percentages are compared to the target percentages. The content area which has the largest discrepancy between the target and empirical percentages is selected. Items from other content areas are filtered out from the item pool and the next item is delivered from the available items within this content domain. Basically, in this method the item pool is partitioned into smaller item pools according to item content. At each stage an item from one of these smaller item pools is selected.

One problem with fixing the number of items from each content area is the potential interaction between item content and item difficulty (Segall, 1996). If, for example, trigonometry items are difficult items and arithmetic items are easier items in the item pool, a low ability examinee might need to answer trigonometry items that are way above his/her ability level. Clearly this reduces the efficiency of a CAT. A CAT using multidimensional item response theory (MIRT) might be more efficient for such situations.
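The Kingsbury and Zara (1989) procedure described above can be sketched in a few lines. The function name, the content area labels and the target proportions below are hypothetical and serve only to illustrate the comparison of target and empirical percentages.

```python
def next_content_area(targets, counts):
    """Pick the content area whose empirical share lags its target the most.

    targets: dict mapping content area -> target proportion (summing to 1)
    counts:  dict mapping content area -> number of items administered so far
    """
    total = sum(counts.values())

    def shortfall(area):
        empirical = counts[area] / total if total else 0.0
        return targets[area] - empirical

    return max(targets, key=shortfall)


# Hypothetical blueprint for a mathematics test.
targets = {"algebra": 0.5, "geometry": 0.3, "trigonometry": 0.2}
start = next_content_area(targets, {"algebra": 0, "geometry": 0, "trigonometry": 0})
later = next_content_area(targets, {"algebra": 2, "geometry": 1, "trigonometry": 0})
```

Once the content area is chosen, the item selection rule (for example MFI) would be applied only to the unused items of that area.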
Besides this rather simple content balancing method, there are other techniques as well (He, Diao, & Hauser, 2014). Examples of other content balancing methods are the shadow test approach (van der Linden, 2010), the weighted deviations model (Swanson & Stocking, 1993) and the maximum priority index method (Cheng & Chang, 2009).

2.3.3.2 Exposure Control

Depending on the stakes of the examination, test developers may not want some items to be overused. As the stakes of the test increase, the incentive for test users to obtain the test items without permission increases. Some test preparation organizations have even tried systematically to obtain test items (Davey & Nering, 2002). Therefore, test developers should protect their items from such attempts. Test developers also do not want some items to be underused, due to the high cost of producing items. In order to control the usage of items in a CAT, the frequency of item administration is constrained using exposure control methods. Exposure control procedures are needed to maintain the fairness and the validity of the test by preventing examinees from having pre-knowledge of the items.

Test developers deal with item exposure problems in two general ways. They deal with the item exposure problem before the examination by managing the item pools, and/or they deal with it during the examination by putting constraints on the item selection algorithm.

Test developers may choose to use an item pool for a certain amount of time and then change it. The amount of time depends on the frequency and volume of the test administration and the stakes of the test. The test developer can switch to a completely new item pool or a previous item pool, or remove only the problematic and highly exposed items from the pool.

Item exposure can be controlled during the examination as well. Broadly, exposure control methods during the test administration can be divided into two groups (Way, 1998): methods based on randomization and methods based on the frequency of administration of items for a particular population.
There are many variations of the randomization approach to exposure control. The randomesque procedure (Kingsbury & Zara, 1989) is a simple way of dealing with exposure control. Using this procedure, at each stage, the CAT algorithm randomly selects one item from the $m$ most informative items, where $m \in \{2, 3, \ldots\}$. Another method mentioned in Eignor et al. (1993) is a variation of this randomesque procedure. The first item is selected from a group of the eight best items, the second item is selected from a group of the seven best items, and so on. After the eighth item the optimal item is selected. The idea behind this approach is that, at the initial stages of the test, almost all examinees would otherwise see the same set of items. After a certain number of items, there will be enough variation in the examinee responses to select an optimum item. Bergstrom, Lunz, and Gershon (1992) defined an exposure control method for a CAT using the Rasch model. In their method, an item with a difficulty parameter within 0.10 logits of the intermediate ability estimate is selected randomly.

Sympson-Hetter exposure control (Hetter & Sympson, 1997) is an example of the second category of exposure control methods, which are conditional on the frequency of administration of items for a particular target population. In this method, a maximum exposure rate between 0 and 1 is assigned to each item using simulations. Lower maximum exposure rates are assigned to items that are very informative and tend to be administered most. During the test, after selecting an optimum item for administration, a random number from a uniform distribution between 0 and 1 is generated and compared to the maximum exposure rate of this item. If this random number is smaller than the maximum exposure rate of the selected optimum item, then this item is administered to the examinee. Otherwise, the next optimum item is selected and the same exposure control procedure is applied to it, until an item is administered to the examinee.

In addition to these two methods, there are many other methods to control item exposure during the test (Revuelta & Ponsoda, 1998; Georgiadou, Triantafillou, & Economides, 2007). Even though it is very important to contain exposure rates at acceptable levels, these procedures reduce the efficiency of a CAT.
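The two exposure control ideas above can be sketched as follows. This is an illustration under simplifying assumptions: items are assumed to be already ranked from most to least informative at the current ability estimate, the maximum exposure rates are taken as given (in practice they come from simulations), and the fallback when every candidate is rejected is an assumption of this sketch.

```python
import random


def randomesque(ranked_items, m, rng):
    """Randomly pick one of the m most informative items (ranked best first)."""
    return rng.choice(ranked_items[:m])


def sympson_hetter(ranked_items, max_exposure, rng):
    """Walk down the ranked list; administer an item if a uniform draw is
    smaller than its maximum exposure rate, otherwise try the next one."""
    for item in ranked_items:
        if rng.random() < max_exposure[item]:
            return item
    return ranked_items[-1]  # assumed fallback if all candidates are rejected


# Hypothetical item labels, ranked by information at the current estimate.
rng = random.Random(0)
ranked = ["i3", "i7", "i1", "i9"]
pick_a = randomesque(ranked, 2, rng)
pick_b = sympson_hetter(ranked, {"i3": 0.0, "i7": 1.0, "i1": 1.0, "i9": 1.0}, rng)
```

With a maximum exposure rate of 0 the best item is never administered and with a rate of 1 it always is, which makes the trade-off between exposure and efficiency explicit.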
2.3.4 Ability Estimation

After an examinee answers an item, the ability is estimated in order either to terminate the test or to select the next item using this estimate. Selecting an appropriate estimation method is crucial. The estimation method affects the final score that is reported, the decision to terminate the test and the items that are selected for administration. Numerous ability estimation methods exist in the literature (Wang & Vispoel, 1998; van der Linden & Pashley, 2010). Three of these estimation methods are investigated further in this dissertation: MLE, expected a posteriori (EAP) and Owen's Bayesian ability estimation.

2.3.4.1 Maximum Likelihood Estimation

By far the most used method of ability estimation in a CAT is MLE. MLE depends on one of the main assumptions of IRT, local independence (Hambleton & Swaminathan, 1985). According to the local independence assumption, for any examinee the partial correlation between any two items will be zero when the ability parameter is held constant. As a result of this local independence assumption, the likelihood of a response string for a single examinee can be calculated using the following product:

$$L\left(U_{i_1}=u_{i_1},\ldots,U_{i_K}=u_{i_K}\mid\theta\right) = \prod_{k=1}^{K}P_{i_k}^{u_{i_k}}\left(1-P_{i_k}\right)^{1-u_{i_k}} \tag{2.6}$$

where $P_{i_k}$ is defined as in Equation (2.1). The MLE of the ability is the value of $\theta$ that maximizes this likelihood:

$$\hat{\theta}_{MLE} = \underset{\theta\in(-\infty,\infty)}{\arg\max}\left\{L(\mathbf{u}\mid\theta)\right\} \tag{2.7}$$

The Newton-Raphson method can be used to find the $\hat{\theta}$ value that maximizes Equation (2.6). In numerical analysis, the Newton-Raphson method is used to approximate the roots of a real-valued function (Hildebrand, 1987). In the case of Equation (2.6), the maximum value is located at the root of the first derivative of the likelihood function. The Newton-Raphson procedure starts with an initial estimate $x_0 = \theta_0$. Starting from this initial estimate, the Newton-Raphson procedure iteratively approximates the root of the first derivative using the following equation:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{2.8}$$

where $n$ is the iteration number and

$$f(x) = \frac{d}{d\theta}L(\mathbf{u}\mid\theta), \qquad f'(x) = \frac{d^2}{d\theta^2}L(\mathbf{u}\mid\theta).$$

The iterations stop when the difference between $x_{n+1}$ and $x_n$ becomes acceptably small.
The user defines this acceptably small value. At the end, the process is said to have converged and the $x_{n+1}$ value is the MLE of $\theta$, denoted as $\hat{\theta}$. Some authors (Hambleton & Swaminathan, 1985) choose to maximize the natural logarithm of the likelihood function ($\ln L(\mathbf{u}\mid\theta)$) instead of the likelihood function. Both methods converge to the same value. Maximizing the logarithm of the likelihood is preferable to maximizing the likelihood, because the logarithm of the likelihood reduces to a summation of log probabilities, and the numbers used in the analysis are less extreme when sums rather than products are used.

The Newton-Raphson algorithm is a quick method to find the ability estimate for a given response string. It is important to select a good initial estimate for this approximation to be quick. In some cases there might be local minimum or maximum values. The algorithm might converge to these values instead of the global maximum point. The user should be aware of such possibilities and choose good starting values in order to converge to the global maximum value.

MLEs have many desirable properties (Hambleton & Swaminathan, 1985). They are consistent: as the number of items increases, the estimates converge to their true values. They are efficient: they have the smallest variance asymptotically. And asymptotically they are normally distributed. The last property is very useful in practice. It allows the calculation of the standard error of the maximum likelihood estimator:

$$se(\hat{\theta}\mid\theta) = \frac{1}{\sqrt{I(\hat{\theta})}} \tag{2.9}$$

where $I(\hat{\theta})$ is the information function given in Equation (2.2).

On the other hand, MLE has an important practical disadvantage. For response strings that consist of all 1's or all 0's it will tend to $+\infty$ or $-\infty$, respectively. To avoid this problem, the CAT can start with a Bayesian estimation procedure and switch to MLE after obtaining a heterogeneous response string.
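The Newton-Raphson iteration in Equation (2.8) and the standard error in Equation (2.9) can be sketched as below. To keep the derivatives short, the sketch assumes the 2PL special case ($c = 0$) and works on the log-likelihood, as suggested above; `D = 1.7`, the function names and the tolerance are assumptions of this illustration.

```python
import math

D = 1.7  # scaling constant (assumed)


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def mle_newton_raphson(responses, items, theta0=0.0, tol=1e-8, max_iter=50):
    """Newton-Raphson on the 2PL log-likelihood; returns (theta_hat, se).

    responses: list of 0/1 item scores; items: list of (a, b) pairs.
    """
    if len(set(responses)) < 2:
        raise ValueError("MLE is not finite for all-correct or all-incorrect strings")
    theta = theta0
    for _ in range(max_iter):
        probs = [p_2pl(theta, a, b) for a, b in items]
        # First and second derivatives of the log-likelihood with respect to theta.
        first = sum(D * a * (u - p) for u, (a, _), p in zip(responses, items, probs))
        second = -sum((D * a) ** 2 * p * (1 - p) for (a, _), p in zip(items, probs))
        step = first / second
        theta -= step
        if abs(step) < tol:
            break
    info = sum((D * a) ** 2 * p_2pl(theta, a, b) * (1 - p_2pl(theta, a, b))
               for a, b in items)
    return theta, 1.0 / math.sqrt(info)


# Hypothetical two-item example: by symmetry the estimate should be 0.
theta_hat, se = mle_newton_raphson([1, 0], [(1.0, -1.0), (1.0, 1.0)], theta0=0.5)
```

The guard at the top reflects the practical disadvantage noted above: a homogeneous response string has no finite maximum, so the iteration is not attempted.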
2.3.4.2 Expected a Posteriori Estimation

In EAP estimation the information from the examinee's responses is combined with the information about the population (Bock & Mislevy, 1982). EAP is the expected value of the posterior distribution in Equation (2.5), which can be written as:

$$\hat{\theta}_{EAP} = E(\theta\mid\mathbf{u}) = \int_{-\infty}^{\infty}\theta\,g(\theta\mid\mathbf{u})\,d\theta \tag{2.10}$$

The variance of the EAP estimate can be written as:

$$\operatorname{var}(\theta\mid\mathbf{u}) = \int_{-\infty}^{\infty}\theta^2\,g(\theta\mid\mathbf{u})\,d\theta - \left[E(\theta\mid\mathbf{u})\right]^2 \tag{2.11}$$

As mentioned in Section 2.3.2.2, the calculation of these integrals is not trivial. A common approach is to approximate the values of these integrals using numerical integration methods.

Another estimation method that uses the same posterior distribution is maximum a posteriori (MAP) estimation (Samejima, 1969). Instead of finding the expectation of the posterior distribution like EAP, MAP locates the mode of the posterior distribution:

$$\hat{\theta}_{MAP} = \underset{\theta\in(-\infty,\infty)}{\arg\max}\left\{g(\theta\mid\mathbf{u})\right\} \tag{2.12}$$

2.3.4.3 Owen's Bayesian Estimation

Owen's Bayesian estimation (Owen, 1969, 1975) is a sequential ability estimation method. It eliminated the burdensome computations of MLE. At each step of a CAT, the posterior distribution of the ability from the previous step is used as the prior distribution for the estimation of ability. Furthermore, assuming a normal distribution as the prior distribution for the examinee population enables a closed form approximation to the posterior mean and variance of the ability:

$$M_k = M_{k-1} - \frac{V_{k-1}}{\sqrt{a_k^{-2}+V_{k-1}}}\,\frac{\phi(D_k)}{\Phi(D_k)}\left(1-\frac{u_k}{A_k}\right), \qquad V_k = V_{k-1} - \frac{V_{k-1}^2}{a_k^{-2}+V_{k-1}}\,\Gamma_k \tag{2.13}$$

where $M_k$ and $V_k$ are the mean and variance of the posterior distribution, $M_{k-1}$ and $V_{k-1}$ are the mean and variance of the prior distribution, $a_k$, $b_k$, $c_k$ are the item parameters of the $k$th item, and $u_k$ is the item response. $D_k$, $A_k$ and $\Gamma_k$ in Equation (2.13) are

$$D_k = \frac{b_k-M_{k-1}}{\sqrt{a_k^{-2}+V_{k-1}}}, \qquad A_k = c_k+(1-c_k)\,\Phi(-D_k),$$

$$\Gamma_k = \frac{\phi(D_k)}{\Phi(D_k)}\left(1-\frac{u_k}{A_k}\right)\left[\left(1-\frac{u_k}{A_k}\right)\frac{\phi(D_k)}{\Phi(D_k)}+D_k\right]$$

where $\phi$ and $\Phi$ are the probability density function and cumulative distribution function of the standard normal distribution. The mean of the posterior distribution in Equation (2.13) corresponds to the ability estimate, and the square root of the variance of the posterior distribution corresponds to the standard error of the ability estimate.
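A single step of Equation (2.13) can be transcribed as the sketch below. It assumes that the item parameters are expressed on the normal ogive metric and that the prior is normal; the function names are hypothetical and the code is an illustration, not the implementation used in this study.

```python
import math


def phi(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def owen_update(mean, var, a, b, c, u):
    """One step of Equation (2.13).

    mean, var: prior mean M_{k-1} and variance V_{k-1}
    a, b, c:   item parameters; u: scored response (0 or 1)
    Returns the posterior mean M_k and variance V_k.
    """
    s = math.sqrt(a ** -2 + var)
    d = (b - mean) / s
    big_a = c + (1 - c) * Phi(-d)
    w = (phi(d) / Phi(d)) * (1 - u / big_a)
    gamma = w * (w + d)
    return mean - var / s * w, var - var ** 2 / s ** 2 * gamma


# Hypothetical item administered to an examinee with a N(0, 1) prior.
m_wrong, v_wrong = owen_update(0.0, 1.0, a=1.2, b=0.3, c=0.2, u=0)
m_right, v_right = owen_update(0.0, 1.0, a=1.2, b=0.3, c=0.2, u=1)
```

An incorrect response moves the estimate down and a correct response moves it up, and in this example both responses shrink the variance, which is then passed on as the prior for the next item.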
Owen's Bayesian estimation was very popular due to the computational simplicity of its closed form equations. But it has a major disadvantage. At each step, it uses a normal density function to create a posterior distribution (Wang & Vispoel, 1998), whereas in fact the shapes of the distributions can deviate from the shape of a normal distribution. As a result, this estimation method introduces a bias to the estimates. For example, simply changing the order of administration of the items might change the ability estimate.

MLE is generally preferable to Bayesian estimation methods. In the long run, MLE is asymptotically unbiased. Unlike Bayesian estimation methods, it is not influenced by any factor other than actual test performance. Bayesian estimation methods yield biased estimates, but the standard errors associated with them are smaller (Lord, 1986; Wang & Vispoel, 1998).

2.3.5 Item Pools in CAT

Parshall et al. (2002) defined an item pool as a "collection of test items that can be used to assemble or construct a computer-based test" (p. 21). In the literature, item pools have also been called "item banks", "question banks", "item collections", "item reservoirs" and "test item libraries" (Millman & Arter, 1984). Even though there might be subtle differences between these terms, they are used synonymously in this study.

The quality of the item pool is crucial because the quality of the CAT depends on it. Chapter 1 explained the importance of item pools for adaptive tests. According to Flaugher (2000), a satisfactory item pool for adaptive testing should have items with three characteristics: (1) high item discriminations ($a > 1$), (2) a rectangular distribution of item difficulty and (3) low guessing parameters ($c < 0.2$). McBride (1977) described an ideal item pool as having a large number of highly discriminating items ($a > 0.8$) and a rectangular distribution of item difficulty that covers the ability continuum. A similar definition of an ideal item pool was given by Mills and Stocking (1996). Urry (1977) gave a more detailed description of the item parameters of an item pool: item discriminations should be larger than 0.8, item difficulty parameters should be evenly and widely distributed between -2 and 2, and item guessing parameters should be lower than 0.3. Urry added that item pools should have at least 100 items.
The quality of the items in the item pool is important because the examinees are administered relatively fewer items compared to P&P tests. A flawed item in an item pool has many perils. It affects the ability estimates, which consequently affects the subsequent items administered. It has a larger effect on the ability estimates because fewer items are administered in CATs. Since the items each examinee sees are different, a flawed item can affect some examinees but not others, which in turn hampers the fairness of the test (Wainer, 2000).

2.3.5.1 Item Pool Size

The size of the item pool is another consideration for the quality of an item pool. The size and the item difficulty distribution of an item pool depend on the CAT specifications and the examinee population (Reckase, 2010). Stocking (1994) listed six factors that affect the size of an item pool: "item selection algorithm, constraints on item content, psychometrics, and exposure, stopping rules, overlap restrictions, test scoring, requirements of parallelism with existing paper-and-pencil forms" (p. 7).

The purpose of the test can affect the size of an item pool (Parshall, 2002). Larger item pools are needed for high stakes tests compared to low stakes tests. In high stakes tests, due to the stakes of the decision, test scores should be more precise. High stakes tests are also more prone to cheating, which requires tighter limits on the exposure of items. Another consideration for item pool size is the number of test days (i.e. the length of the testing window). As the number of test dates increases, the items in the pool are exposed more. This creates a need for more items in the item pool (Parshall, 2002).

There is no consensus in the literature regarding the size of an item pool. Urry (1977) advised an item pool with at least 100 items for a CAT that improves the accuracy compared to a similar P&P test. Stocking (1994) recommended that a CAT item pool should be 12 times as large as the length of the CAT. Chen, Ankenmann, and Spray (2003) advised that an item pool should be at least 6.7 times larger than the length of a fixed length test to contain the overlap rate between examinees below 15%. For an overlap rate below 10%, the item pool size should be 10 times as large as the length of a fixed length test.
Very large item pools are not advisable in practice. Such item pools are difficult to manage and they can strain the hardware and software running the CAT (Mills & Stocking, 1996). Searching a small item pool is faster than searching a large item pool. This is an important practical constraint for operational adaptive tests (Vale & Weiss, 1977). In addition, a single breach of the item pool might compromise a lot of items at the same time. In this sense, creating item pools with just enough items is important.

2.3.5.2 Item Pool Design and Assembly

Item pools for a CAT can be assembled in a similar fashion to traditional tests. Initially, the test developer defines a goal for the test. This goal might be measuring every examinee as precisely as possible, measuring high ability examinees precisely for a scholarship, making a decision at a cut point to obtain a license for a job, etc. The test developer then develops a target item pool information function for this goal. Finally, the test developer assembles items so that the item pool information function matches the target information function. van der Linden (1998b) discussed several methods to assemble regular P&P tests using these three steps. These test assembly methods can be extended to the item pools of CATs.

Some testing agencies have large item banks that contain many items. These large item banks are called the "master pool" or "vat" (Way, 1998). At each testing window, a test agency assembles an item pool from this vat that meets the test specifications. The item pools are replaced with new ones after some conditions are met. Way, Steffen, and Anderson (2002) call these conditions the docking rules. Item pools can be renewed after a certain number of examinees have seen the item pool, or after a certain period of time. Item pools are assembled from the master item pool using the assembly techniques described in the previous paragraph.

In addition, some researchers have proposed methods for designing an optimal blueprint for an item pool and selecting items from the master pool using these optimum blueprints (Veldkamp & van der Linden, 2010). Reckase (2010) developed the bin-and-union method to build an optimum item pool blueprint. This method was used in this study to build optimum item pools (Section 4.2.2.1). See He and Reckase (2013) for an example use of this approach.
2.3.6 Evaluation of the Item Pools

In the CAT literature there is no specific method to evaluate the quality of item pools. Generally, item pools are compared by performing simulations that use different item pools while holding everything else the same (Thompson & Weiss, 2011). The test developer checks indicators like bias, SE, mean squared error (MSE), exposure rates and overlap rates and makes a decision about the quality of the item pool. Even though these are very valuable indicators, none of them is a direct measure of the quality of an item pool. For example, the reason why SE is not a direct measure of item pool quality is discussed thoroughly in Section 3.4.

The most common method to evaluate item pool quality is investigating the item pool information function (Segall, Moreno, & Hetter, 1997). The item pool information functions of different item pools are compared while keeping the testing purpose in mind. For example, Xing and Hambleton (2004) compared six item pools using this method (Figure 2.1). Depending on the purposes of the test, the test developer builds a target information function for the item pool. The item pools that are close to this target information curve are selected.

Figure 2.1: Comparison of the Information Functions of Six Item Pools Using the Item Pool Information Functions (graph taken from Xing and Hambleton, 2004, p. 8)

Another method of item pool evaluation is to investigate the exposure and overlap rates of the items within the item pools (Chang, 2004). An item pool that has a lot of overused or underused items is deemed to be an inefficient item pool. Also, for security reasons, test developers do not want high item overlap rates between two randomly selected examinees. These methods are important for monitoring the quality of the item pool, but they do not say much about the quality of the items an examinee sees. A test with high exposure rates might present high quality items to examinees, or an item pool with low exposure rates might not provide appropriate items to the examinees. A test developer might implement exposure control to reduce the exposure rates of the items within the item pool, but this may result in a loss in the efficiency of the CAT.

CHAPTER 3

THE ITEM POOL UTILIZATION INDEX

3.1 Relative Efficiency

The origins of the item pool utilization index go back to the concept of relative efficiency.
In Lord and Novick (1968), Birnbaum introduced the concept of relative efficiency. He used relative efficiency to compare two scoring methods. Lord (1980, p. 83) defined the relative efficiency of two tests, x and y, at a certain ability $\theta$ as:

$$RE(y,x) = \frac{I\{\theta,y\}}{I\{\theta,x\}} \tag{3.1}$$

where $I\{\theta,y\}$ and $I\{\theta,x\}$ are the information functions for tests y and x, respectively. Lord (1974, 1975, 1977b, 1980) demonstrated the use of relative efficiency to evaluate and compare different tests.

As an example of the use of relative efficiency in comparing two tests, let's consider two tests with 10 items. The item parameters of these tests are in Table 3.1. If the test information functions (TIF) of these two tests (Figure 3.1) are investigated, it can be seen that these two tests have different characteristics. Test 2 (green line) provides more information for examinees just above $\theta = 0$, but it is less informative for examinees at the extremes. On the other hand, Test 1 (red line) does not give as much information as Test 2 for examinees with $\theta$'s between -0.5 and 1.5, but throughout the rest of the ability scale it provides more information. The relative efficiency of Test 1 to Test 2 (blue line) shows this. Test 1 is more efficient compared to Test 2 when the blue line is above the dashed line marking a relative efficiency of 1. When the blue line falls below the dashed line, Test 2 is more informative.
Table 3.1: Item Parameters of Test 1 and Test 2

           (a) Test 1               (b) Test 2
Item       a       b       c        a       b       c
Item 1     1.21   -1.41    0.22     1.08   -0.98    0.20
Item 2     1.32   -1.21    0.14     1.36   -0.70    0.15
Item 3     1.16    0.01    0.17     0.97    0.37    0.22
Item 4     1.57    0.25    0.23     1.29   -0.07    0.14
Item 5     1.32   -0.82    0.15     1.46    0.06    0.19
Item 6     1.41    0.24    0.26     1.48    0.39    0.18
Item 7     1.16   -0.55    0.15     1.41   -0.75    0.18
Item 8     1.12    0.51    0.18     0.69    0.55    0.19
Item 9     1.35   -1.22    0.15     1.07    0.17    0.23
Item 10    1.24    2.23    0.21     1.56    0.36    0.14

Figure 3.1: Test Information Functions and Relative Efficiency of Test 1 and Test 2

More recently, Han (2012) created an item selection algorithm for CAT using the relative efficiency concept. In this algorithm he utilized the concept of expected item efficiency, which is defined as the "level of realization of an item's potential information at interim $\hat{\theta}$" (Han, 2012, p. 227). Mathematically, item $i$'s expected efficiency after the administration of the $k$th item is defined as:

$$\frac{I_i\left[\hat{\theta}_k\right]}{I_i\left[\theta_i^*\right]} \tag{3.2}$$

where $I_i[\hat{\theta}_k]$ is the amount of information item $i$ has at the interim ability estimate $\hat{\theta}_k$, and $I_i[\theta_i^*]$ is the maximum potential information item $i$ can have. For the 1PL and 2PL models (Hambleton & Swaminathan, 1985), item $i$ reaches maximum information at $\theta_i^* = b_i$, where $b_i$ is the item difficulty parameter. For the 3PL model, item $i$ reaches maximum information at

$$\theta_i^* = b_i + \frac{1}{Da_i}\ln\left(\frac{1+\sqrt{1+8c_i}}{2}\right). \tag{3.3}$$

3.2 Item Pool Utilization Index

Han (2012) used expected item efficiency as an intermediate step to select the most appropriate item for an examinee. He did not mention it as a possible way of evaluating an item pool's efficiency. This is where the current research diverges from his research and the rest of the literature. In this dissertation, a slightly different version of item efficiency is used to evaluate an item pool's performance. In Han (2012), the focus was whether an item's maximum potential is fulfilled or not (Equation (3.2)). In this dissertation, the focus is whether a perfect item is administered to an examinee or not. Even though these two interpretations give the same results for basic IRT models, they are conceptually different. In this dissertation, the item efficiency for item $i$ at $\hat{\theta}$ is defined as:

$$\frac{I_i\left[\hat{\theta}\right]}{I_{max}\left[\hat{\theta}\right]} \tag{3.4}$$

where $I_i[\hat{\theta}]$ is the information of item $i$ at $\hat{\theta}$, and $I_{max}[\hat{\theta}]$ is the value of information at $\hat{\theta}$ if an optimum item is administered to the examinee with ability estimate $\hat{\theta}$. If this item efficiency is calculated for each item in an adaptive test for an examinee and then averaged, we get:

$$\frac{1}{K}\sum_{k=1}^{K}\frac{I_{i_k}\left[\hat{\theta}_{k-1}\right]}{I_{max}\left[\hat{\theta}_{k-1}\right]} \tag{3.5}$$

where $K$ is the test length, $i_k$ is the index of the $k$th administered item in the item pool, $\hat{\theta}_{k-1}$ is the intermediate ability estimate after the administration of the $(k-1)$th item, $I_{i_k}[\hat{\theta}_{k-1}]$ is the amount of information item $i_k$ has at $\hat{\theta}_{k-1}$ and $I_{max}[\hat{\theta}_{k-1}]$ is the maximum information an optimum item has at $\hat{\theta}_{k-1}$. For $k = 1$, $\hat{\theta}_{k-1}$ becomes $\hat{\theta}_0$, which is the initial ability estimate. The value in Equation (3.5) reflects to what degree the items presented to an examinee deviate from the items coming from a perfect item pool. Here, a perfect item pool is defined as an item pool in which, whenever a CAT algorithm searches for an item to deliver, the item pool can present a perfect item for that ability level, regardless of the stage of the test. A perfect item for a particular $\theta$ is defined as an item that provides the maximum possible information at that $\theta$ level. This approach is similar to Eignor et al. (1993):

    ... an item is considered to have optimum statistical properties if it is most informative at an examinee's current maximum-likelihood estimate of ability. (p. 10)

This conceptualization of a perfect item is purely statistical and stems from the framework of IRT. Clearly, a perfect item and item pool should also be valid for the intended purpose and use of the test scores (Kane, 2013). Another assumption in this definition of a perfect item is that a perfect item for an examinee is the item whose maximum information is located at the current ability estimate of the examinee. This might not be the case in every situation. Depending on the purpose of the test, a better item might be one that maximizes the information at another ability level. In a licensure examination, for example, the test developer might want to administer items that have maximum information at the cut score. The aim of the licensure examination is to learn whether examinees are above or below the cut score. The exact locations of the examinees might not be the primary aim of the examination. In this case, a perfect item might be defined as an item that has the maximum amount of information at the cut score.
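Equations (3.2) and (3.3) can be checked numerically with a small sketch. The 3PL information formula is Equation (2.4); `D = 1.7`, the function names and the example item are assumptions made for illustration.

```python
import math

D = 1.7  # scaling constant (assumed)


def info_3pl(theta, a, b, c):
    """3PL item information, Equation (2.4)."""
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1 - c) / ((c + math.exp(z)) * (1 + math.exp(-z)) ** 2)


def theta_max(a, b, c):
    """Ability level at which the item is most informative, Equation (3.3)."""
    return b + math.log((1 + math.sqrt(1 + 8 * c)) / 2) / (D * a)


def expected_item_efficiency(theta_hat, a, b, c):
    """Equation (3.2): information at theta_hat relative to the item's own maximum."""
    return info_3pl(theta_hat, a, b, c) / info_3pl(theta_max(a, b, c), a, b, c)


# Hypothetical item: its peak lies slightly above b because c > 0.
peak = theta_max(1.2, 0.5, 0.2)
eff_at_b = expected_item_efficiency(0.5, 1.2, 0.5, 0.2)
```

For $c = 0$ the logarithm vanishes and the peak is at $b$, which is why the two efficiency definitions coincide for the basic models; with guessing, an item evaluated exactly at $b$ realizes slightly less than its full potential.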
Aggregating Equation (3.5) over a representative group of examinees gives information about how an item pool performs for that examinee group. The item pool utilization index (IPUI) proposed for evaluating the performance of an item pool is:

$$IPUI = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{1}{K_j}\sum_{k=1}^{K_j}\frac{I_{i_{jk}}\left[\hat{\theta}_{j(k-1)}\right]}{I_{max}\left[\hat{\theta}_{j(k-1)}\right]}\right) \tag{3.6}$$

where $N$ is the total number of examinees that took the adaptive test, $K_j$ is the test length of examinee $j$, $i_{jk}$ is the index within the item pool of the $k$th test item presented to the $j$th examinee, and $\hat{\theta}_{j(k-1)}$ is the ability estimate of examinee $j$ after the administration of the $(k-1)$th item. The first summation is taken over the examinees taking the adaptive test, and the second, inner summation is taken over the specific test of each examinee.

The values that IPUI can take range between 0 and 1. An IPUI value of 1 reflects a perfect item pool. An IPUI value of 0 is theoretically possible: if each item presented to an examinee has 0 information at the examinee's current ability estimate, then IPUI takes a value of 0. In practice, however, each item provides some information about an examinee. So IPUI can get very close to 0 but cannot be 0 in practical settings.

IPUI can tell an item pool's level of efficiency. If one adds redundant items to an item pool that cannot be utilized by a CAT algorithm, IPUI will not increase or will increase only minimally. In this sense, IPUI is an indicator of the item pool's efficiency rather than its redundancy. IPUI can be used to diagnose an item pool. A test developer can calculate IPUI for certain groups of examinees or conditional on different ability levels and observe the weak spots of the item pool.
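The aggregation in Equation (3.6), and the group-wise diagnosis mentioned above, can be sketched as follows. The input is assumed to be, for each examinee, the list of item efficiencies $I_{i_{jk}}/I_{max}$ already computed at each step; the function names, the efficiency values and the group labels are hypothetical.

```python
from collections import defaultdict


def examinee_ipui(efficiencies):
    """Equation (3.5): mean item efficiency over one examinee's test."""
    return sum(efficiencies) / len(efficiencies)


def pool_ipui(records):
    """Equation (3.6): mean of the examinee-level values over all examinees."""
    return sum(examinee_ipui(r) for r in records) / len(records)


def ipui_by_group(records, groups):
    """Examinee-level IPUI averaged within each group label."""
    buckets = defaultdict(list)
    for record, group in zip(records, groups):
        buckets[group].append(examinee_ipui(record))
    return {g: sum(v) / len(v) for g, v in buckets.items()}


# Hypothetical item efficiencies for three examinees with different test lengths.
records = [[1.0, 0.9], [0.5, 0.7, 0.9], [0.2, 0.4]]
overall = pool_ipui(records)
by_group = ipui_by_group(records, ["high", "high", "low"])
```

Because each examinee's values are averaged first, examinees with longer tests do not receive more weight in the overall index, which matches the structure of Equation (3.6).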
IPUI can be helpful to test developers and test users at two levels. First, IPUI can be used at the examinee level as shown in Equation (3.5). An IPUI can be calculated for each examinee, and both test developers and test users can monitor the quality of the item pool at this level. A test developer can assign a baseline IPUI value for each examinee and ensure that the item pool is sufficient for each individual examinee. This will substantiate the fairness claim of the test. Second, as shown in Equation (3.6), IPUI can be used at the aggregate level. This allows the test developer to get an overall picture of an item pool's performance at the group level. Aggregating IPUI at the group level allows the developer to weight the IPUI for a target group. This might be necessary in cases where the test has a particular aim and target population. The test developer might want to see how an item pool performs for most of the target population and give a smaller weight to examinees at the extremes. Aggregation of the IPUI can be useful in such situations.

In contrast to the other outcomes of a CAT such as SE, MSE or bias, IPUI is a standardized measure that ranges between 0 and 1. It is not straightforward to compare two different values of SE because they depend on the context of the measurement. The ranges of the other outcomes of a CAT are unspecified. Theoretically, SE and MSE can take any positive value and bias can take any value. For example, it is difficult to interpret whether an SE value of 0.4 is large or small without knowing the context. On the other hand, an IPUI value close to 1 always indicates an adequate item pool. IPUI values can be compared across different testing scenarios because they are dimensionless.

3.3 An Example Calculation of IPUI

The calculation of IPUI is very straightforward. An example IPUI calculation for Examinee 2 in Figure 1.1 is demonstrated in Table 3.2.

The $IPUI_k$ column shows the relative quality of the selected item at each step of the CAT. As can be seen from the first line, the initial ability estimate is $\hat{\theta}_0 = 0$. The difficulty parameter of the selected item ($b_{i_1} = 0.0013$) is almost equal to this initial ability estimate.
The information value of this item ($I_{i_1}[\hat{\theta}_0] = 0.722$) at this initial estimate also reached the maximum possible information value ($I_{\max}$). Examinee 2 gives a correct answer to the first item and the ability of the examinee is updated to $\hat{\theta}_1 = 1.42$. The item that can provide the highest information at this ability estimate has an item difficulty parameter $b_{i_2} = 1.34$. This item is also very informative ($I_{i_2}[\hat{\theta}_1] = 0.719$), but not as informative as the first item. The IPUI value for this item dropped slightly to 0.995. After the third item, the discrepancy between the examinee's estimated ability and the difficulty parameter of the selected item starts to increase.

Table 3.2: IPUI Calculation Example

 k   θ̂_{k-1}    b_{i_k}       u_k   θ̂_k        I_{i_k}[θ̂_{k-1}]   I_max   IPUI_k   IPUI_{1:k}
 1   0.000000   0.001347672   1     1.417598   0.722              0.722   1.000    1.000
 2   1.417598   1.338505825   1     2.311825   0.719              0.722   0.995    0.998
 3   2.311825   2.226290085   0     1.839344   0.719              0.722   0.995    0.997
 4   1.839344   1.631980644   1     2.198057   0.701              0.722   0.970    0.990
 5   2.198057   1.510722847   0     1.703028   0.523              0.722   0.724    0.937
 6   1.703028   1.068875300   1     1.833632   0.547              0.722   0.758    0.907
 7   1.833632   1.046853738   1     1.931229   0.476              0.722   0.659    0.871
 8   1.931229   0.946678729   1     2.001291   0.384              0.722   0.532    0.829
 9   2.001291   0.754537673   1     2.047349   0.277              0.722   0.383    0.779
10   2.047349   0.707118514   1     2.086142   0.244              0.722   0.337    0.735
11   2.086142   0.630249407   1     2.117851   0.207              0.722   0.286    0.694
12   2.117851   0.584525037   1     2.145407   0.185              0.722   0.256    0.658
13   2.145407   0.500193537   1     2.168164   0.157              0.722   0.217    0.624
14   2.168164   0.347522088   1     2.185140   0.120              0.722   0.166    0.591
15   2.185140   0.247744310   0     1.864599   0.100              0.722   0.138    0.561

At the last item, even though the examinee's estimated ability based on his/her previous 14 responses is $\hat{\theta}_{14} = 2.19$, the item pool can only provide an item with difficulty parameter $b_{i_{15}} = 0.25$. The amount of information that this item provides at this ability estimate is very low compared to previous items ($I_{i_{15}}[\hat{\theta}_{14}] = 0.1$). This discrepancy is reflected in the IPUI value for this item ($IPUI_{15} = 0.138$). At the end of the test, the overall value of IPUI for Examinee 2 is 0.561. Compare this value to the IPUI value of Examinee 1, 0.998.
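The last two columns of Table 3.2 can be recomputed from the interim ability estimates and item difficulties alone. The sketch below assumes the 1PL information function $D^2 P (1 - P)$ with $D = 1.7$, so that $I_{\max} = D^2 / 4 = 0.7225$; the variable names are illustrative.

```python
import math

D = 1.7
I_MAX = D * D / 4.0  # 0.7225 for the 1PL model

# Interim ability estimates and item difficulties copied from Table 3.2.
theta_prev = [0.000000, 1.417598, 2.311825, 1.839344, 2.198057,
              1.703028, 1.833632, 1.931229, 2.001291, 2.047349,
              2.086142, 2.117851, 2.145407, 2.168164, 2.185140]
b = [0.001347672, 1.338505825, 2.226290085, 1.631980644, 1.510722847,
     1.068875300, 1.046853738, 0.946678729, 0.754537673, 0.707118514,
     0.630249407, 0.584525037, 0.500193537, 0.347522088, 0.247744310]


def info_1pl(theta, b_k):
    """1PL Fisher information of an item with difficulty b_k at theta."""
    p = 1.0 / (1.0 + math.exp(-D * (theta - b_k)))
    return D * D * p * (1.0 - p)


running = 0.0
for k, (t, b_k) in enumerate(zip(theta_prev, b), start=1):
    ipui_k = info_1pl(t, b_k) / I_MAX   # item-level ratio
    running += ipui_k                   # cumulative sum of the ratios
    print(f"{k:2d}  IPUI_k = {ipui_k:.3f}  IPUI_1:k = {running / k:.3f}")
```

Up to rounding, the printed values should agree with the $IPUI_k$ and $IPUI_{1:k}$ columns of the table, ending near 0.138 and 0.561.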
3.4 Difference between IPUI and Standard Error

The biggest difference between IPUI and the SE is the $\theta$ values used to calculate the information function. When calculating the SE, the information formula uses only the final $\theta$ estimate (Equation (2.4)). In this sense SE is blind to the quality (or appropriateness) of the items which are presented at the intermediate stages of the adaptive test. SE only cares whether the items presented to the examinee are close to the examinee's final ability estimate. If an examinee starts a CAT with a low initial ability estimate and improves her ability estimate continuously by correctly answering the items presented, SE will indicate that the quality of the test is low, because the examinee received many items that were far from her final ability estimate. SE provides a very important piece of information, but it says little about the appropriateness of the items that were presented to the examinee. IPUI, on the contrary, gives information exclusively about the quality of the items presented.

The distinction between SE and IPUI is demonstrated with an example. Figure 3.2 shows the CAT processes of two examinees that are taking the same adaptive test with the same test specifications and item pool.

Figure 3.2: A Demonstration of the Difference between SE and IPUI

Examinee 3 is a high ability examinee and answered most of the questions correctly. Also, the item pool was very appropriate for Examinee 3. At each step, there was an appropriate item to present in the item pool which was very close to her intermediate ability estimate. Consequently, the IPUI value is very high for this examinee, 0.98. On the other hand, Examinee 4 is an average ability examinee. She started the test with consecutive incorrect responses. Later in the test, her responses improved. Since the item pool is composed of rather difficult items, Examinee 4 could not get appropriate items. Even though she responded incorrectly, the difficulty of the items kept increasing. This resulted in a low IPUI value, 0.273.
When the SEs of these two examinees are examined, it is observed that Examinee 3 has a higher SE. As explained in the first paragraph of this section, SE calculates the information function with respect to the final ability estimate (red dashed lines in Figure 3.2). For Examinee 3, the difficulty parameters of the administered items were away from the final ability estimate. This produced a high SE value for this examinee. For Examinee 4, even though the item pool was not very appropriate, the difficulty parameters of the items presented to this examinee were close to the final ability estimate. This produced a lower SE for this examinee.

Visually, the difference between SE and IPUI can be conceptualized with the help of the red and black dashed lines in Figure 3.2. SE aggregates the distances shown by the red dashed lines. The longer the red lines, the higher the SE will be. IPUI aggregates the distances shown by the black dashed lines. The shorter the distances, the higher the IPUI will be.

In this example, SE clearly says nothing about the quality of the item pool, but IPUI shows the efficiency of the item pool. The example shown above is not common in practice, but it shows the difference between SE and IPUI plainly. In most cases, SE and IPUI will be highly correlated. Tests with better item pools will have, on average, lower standard errors.

3.5 The Limitations of IPUI

In Equation (3.5), $I_{\max}[\hat{\theta}_{k-1}]$ is a fixed value, equal for all items, for the 1PL model. For the 1PL model, an optimum item which has maximum information at $\hat{\theta}_{k-1}$ has item difficulty parameter $b_{i_k} = \hat{\theta}_{k-1}$. The maximum value of the information of this item is:

\[
I_{\max}\left[\hat{\theta}_{k-1}\right] = D^2 \, \frac{e^{-D(\hat{\theta}_{k-1} - b_{i_k})}}{\left(1 + e^{-D(\hat{\theta}_{k-1} - b_{i_k})}\right)^{2}} = \frac{D^2}{4} \tag{3.7}
\]

For the 2PL model, an optimum item which has maximum information at $\hat{\theta}_{k-1}$ also has item difficulty parameter $b_{i_k} = \hat{\theta}_{k-1}$. Consequently, the maximum value of information for the 2PL model will be:

\[
I_{\max}\left[\hat{\theta}_{k-1}\right] = D^2 a_{i_k}^2 \, \frac{e^{-D a_{i_k}(\hat{\theta}_{k-1} - b_{i_k})}}{\left(1 + e^{-D a_{i_k}(\hat{\theta}_{k-1} - b_{i_k})}\right)^{2}} = \frac{D^2 a_{i_k}^2}{4} \tag{3.8}
\]

For the 3PL, from Equation (3.3), the maximum value of information will be:

\[
I_{\max}\left[\hat{\theta}_{k-1}\right] = \frac{\left(D a_{i_k}\right)^2 \left(1 - c_{i_k}\right)}{\left(c_{i_k} + \dfrac{1 + \sqrt{1 + 8 c_{i_k}}}{2}\right) \left(1 + \dfrac{2}{1 + \sqrt{1 + 8 c_{i_k}}}\right)^{2}} \tag{3.9}
\]

It can be easily seen from Equation (3.7) that the value of $I_{\max}[\hat{\theta}_{k-1}]$ does not depend on either item or ability parameters for the 1PL model. The value is constant for all items.
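As a numerical check on the closed forms above, the following sketch compares Equation (3.9) with a brute-force maximum of the 3PL information function over a fine grid of ability values. Equation (3.3) is not reproduced in this section, so the information function below is the standard 3PL expression and is an assumption here; the function names and the parameter values are illustrative only.

```python
import math

D = 1.7


def info_3pl(theta, a, b, c, d=D):
    """3PL information written as (Da)^2 (1-c) / ((c + e^L)(1 + e^-L)^2)."""
    logit = d * a * (theta - b)
    return ((d * a) ** 2 * (1.0 - c)
            / ((c + math.exp(logit)) * (1.0 + math.exp(-logit)) ** 2))


def i_max_3pl(a, c, d=D):
    """Closed-form maximum, following the structure of Equation (3.9)."""
    w = (1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0
    return (d * a) ** 2 * (1.0 - c) / ((c + w) * (1.0 + 1.0 / w) ** 2)


def i_max_grid(a, b, c, lo=-6.0, hi=6.0, steps=120001):
    """Brute-force maximum over an evenly spaced theta grid."""
    width = (hi - lo) / (steps - 1)
    return max(info_3pl(lo + s * width, a, b, c) for s in range(steps))


for a, c in [(1.0, 0.0), (1.0, 0.2), (2.0, 0.25)]:
    print(a, c, round(i_max_3pl(a, c), 4), round(i_max_grid(a, 0.0, c), 4))
```

With $c = 0$ the closed form reduces to $D^2 a^2 / 4$, the 2PL value in Equation (3.8), and in every case the maximum scales with $a^2$.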
The maximum information for a perfect item can be captured by a constant. On the other hand, in Equations (3.8) and (3.9) the value of $I_{\max}[\hat{\theta}_{k-1}]$ depends on the values chosen for the $a_{i_k}$ and $c_{i_k}$ parameters. Since we define the perfect item as the one that maximizes the item information value, for the 2PL model, Equation (3.8) is maximized when $a_{i_k} = \infty$ and $b_{i_k} = \hat{\theta}_{k-1}$. For the 3PL model, the maximum information value will be reached when $a_{i_k} = \infty$, $c_{i_k} = 0$ and $b_{i_k} = \hat{\theta}_{k-1}$. The maximized values of information for the 2PL and 3PL models will be infinite. This poses a problem for the IPUI. The denominator of Equation (3.6) will be infinite, which will force IPUI to be 0 for any practical testing situation using the 2PL or 3PL model. This problem is acknowledged by Reckase (2010), where he discusses the impossibility of developing optimal item pools for the 2PL model.

One possible solution to this problem is fixing the value of the $a$ parameter to a high value that is rarely reached, like 3. But any value chosen in this manner will be arbitrary. Another option might be to ignore the $a$ parameter when calculating IPUI. This will bypass the infinity problem stated above but also loses valuable information about the quality of the item pool. An item's quality is highly related to its discrimination power. Ignoring this will result in an equal IPUI value for a quality item pool with many highly discriminating items and an item pool with many low discriminating items. In this dissertation, this limitation of IPUI is acknowledged and the properties of IPUI for the 1PL model are investigated.

CHAPTER 4
RESEARCH QUESTIONS AND METHODS

4.1 Research Questions

As stated in the previous chapter, there is a need to evaluate item pools. This study focuses on creating a new index to evaluate the quality of an item pool for a given adaptive test and examinee population. This study investigates whether this index provides additional information about item pools on top of existing methods to evaluate a CAT (such as bias, SE of ability estimates, MSE, item pool information function). In addition, the methods to diagnose the item pools using this index are investigated. The research questions of this study are the following:

1. For a given population of examinees and adaptive test design, does this index change as the item pool quality changes?
2. How does the magnitude of this index change as the CAT specifications that affect the item pool quality change?
3. How can this index be used to diagnose the shortcomings of the item pools and assist test developers to improve the quality of their item pools?
4. Can this index be used to evaluate the item pool quality of an operational CAT?
5. Can this index be used to diagnose the shortcomings of an operational item pool?

4.2 Research Methods

There are two phases of this study. In the first phase, the first three research questions are investigated using CAT simulations with different specifications. In the second phase of the study, the last two research questions are examined for an existing operational CAT.

4.2.1 First Phase - Simulated Data

The first phase of the study consists of three sets of simulations to answer research questions 1, 2 and 3. In these three sets of simulations, generated data were used.

4.2.1.1 Common CAT Specifications

All adaptive tests in the first phase shared some common specifications. Simulations were performed using the R programming language (R Core Team, 2014). The 1PL model with scaling parameter D = 1.7 was used. For the 1PL, the probability of a correct response is:

\[
P(u_{i_k} = 1) = \frac{1}{1 + e^{-1.7(\theta - b_{i_k})}}. \tag{4.1}
\]

Each test started with an initial ability estimate of 0. Items were selected using MFI as explained in Section 2.3.2.1. For interim and final ability estimation, the EAP method with prior mean 0 and prior variance 4 was used at the beginning of the test until the examinee obtained at least one correct and one incorrect response. After a heterogeneous response string, ability was estimated using MLE. The variance of the prior distribution was chosen to be 4 (instead of a strong prior with variance 1 or even lower) to reduce its impact on the ability estimates and consequently on the item selection.

In practice, ability estimates are usually confined within an arbitrary interval (Wang & Vispoel, 1998), for example between -4 and 4. This is usually done to deal with the infinite ability estimates of MLE for examinees with all correct or all incorrect response strings. Another reason for this practice is confining extreme ability values to a practical interval.
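The interim estimation rule described in Section 4.2.1.1 starts with EAP and switches to MLE once the response string is heterogeneous. A minimal sketch of the EAP step, assuming a normal prior with mean 0 and variance 4 and a simple evenly spaced quadrature grid (the grid limits, the number of points and the function names are assumptions made for illustration, not details from the study):

```python
import math

D = 1.7


def p_correct(theta, b):
    """1PL probability of a correct response, as in Equation (4.1)."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def eap(responses, difficulties, prior_mean=0.0, prior_var=4.0,
        lo=-8.0, hi=8.0, n_points=161):
    """Posterior mean of theta, approximated on an evenly spaced grid."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for q in range(n_points):
        theta = lo + q * step
        # Unnormalized normal prior density times the likelihood.
        weight = math.exp(-(theta - prior_mean) ** 2 / (2.0 * prior_var))
        for u, b in zip(responses, difficulties):
            p = p_correct(theta, b)
            weight *= p if u == 1 else 1.0 - p
        num += theta * weight
        den += weight
    return num / den


def is_heterogeneous(responses):
    """True once there is at least one correct and one incorrect response."""
    return 0 in responses and 1 in responses


print(round(eap([1], [0.0]), 3))          # estimate after one correct answer
print(is_heterogeneous([1, 1]), is_heterogeneous([1, 1, 0]))
```

Unlike MLE, the EAP estimate stays finite for all-correct or all-incorrect strings, which is why it is a natural choice for the first few items.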
In the first phase of the simulations, ability estimates were not confined to such an arbitrary interval. There were two reasons behind this decision. First, EAP estimation was used until an examinee obtained a heterogeneous response string. So, infinite ability estimates were not a problem. Second, since the first phase was a theoretical study, the aim was to observe the effects of conditions on dependent variables without the interference of such arbitrary rules. For example, if there was an arbitrary limit for ability estimates, the standard deviation of the ability estimates at the extremes of the ability distribution might be depressed due to such rules. This would limit the generalizability of the conclusions from the analysis because it would not be possible to strip out the effects of this arbitrary rule.

For all of the simulations in Research Questions 1 and 2, 10,000 examinees were simulated for each condition. This number was enough to get a representative sample from the ability distributions and minimize the effects of sampling errors. Larger samples would not have much added value on top of this number due to diminishing returns. And, since there were many conditions to simulate, computing time would increase substantially. For Research Question 3, 1000 examinees at each $\theta$ value were simulated. Since there are 31 $\theta$ values between -3 and 3 with 0.2 intervals, a total of 31,000 examinees were simulated for each condition.

To generate responses, first a random number from a uniform distribution between 0 and 1 was generated. Then, the examinee's probability of correctly answering the item was calculated. If the random uniform number was smaller than the probability of a correct response, a score of 1 was assigned as the response; otherwise a score of 0 was assigned.

4.2.1.2 Research Question 1

In the first set of simulations the utility of the IPUI was explored by checking whether this index changes systematically as the item pool quality changes, i.e. whether this index is sensitive to changes in the quality of the item pool. The item pool quality was operationalized as:

1. The discrepancy between item pool distribution and examinee ability distribution
2. Item pool size

According to (1), item pool quality decreases as the discrepancy between the item pool distribution and ability distribution increases. According to (2), item pool quality decreases as the size of an item pool (i.e. the number of items in an item pool) decreases. It was hypothesized that the value of the index would decrease as the quality of the item pool decreases. Research Question 1 was answered by checking whether the IPUI changes as these two item pool quality indicators change.

Item pool and examinee ability discrepancy

A CAT simulation was performed to check whether the IPUI decreases as the discrepancy between item pool distribution and examinee ability distribution increases. Even though there are many ways to increase the discrepancy between two distributions, in this study it was assumed that both the item difficulty ($b$ parameter) distribution of the item pool and the ability distribution of examinees were normally distributed with the same standard deviation, 1. The discrepancy between the item pool and ability distribution was increased (or decreased) by increasing (or decreasing) the difference between the means of the item difficulty distribution and ability distribution.

In the simulation, the item difficulty distribution of the item pool was fixed to the standard normal distribution. On the other hand, ability distributions had different means ranging from -3 to 3 with 0.5 intervals. Standard deviations of the ability distributions were fixed to 1 as well. This setup allowed observation of whether the IPUI changes when the discrepancy between the item pool and examinee ability distribution takes values -3, -2.5, -2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5 and 3. It was hypothesized that the IPUI would take the highest value when the discrepancy between the two distributions was 0 and that IPUI would decrease as the discrepancy moves towards either -3 or 3.
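The thirteen discrepancy conditions and the response-generation rule of Section 4.2.1.1 can be sketched together. This is only an illustration under stated assumptions: the seed, the sample of 1000 abilities per condition and the single item with difficulty 0 are arbitrary choices, and the study itself used R rather than Python.

```python
import math
import random

D = 1.7


def p_correct(theta, b):
    """1PL probability of a correct response, as in Equation (4.1)."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def simulate_response(theta, b, rng):
    """Score 1 if a Uniform(0, 1) draw falls below P(correct), else 0."""
    return 1 if rng.random() < p_correct(theta, b) else 0


rng = random.Random(2015)  # arbitrary seed, used only for reproducibility

# Means of the ability distributions: -3, -2.5, ..., 3 (all with SD 1).
means = [-3.0 + 0.5 * i for i in range(13)]

for mean in (means[0], means[6], means[12]):
    thetas = [rng.gauss(mean, 1.0) for _ in range(1000)]
    share = sum(simulate_response(t, 0.0, rng) for t in thetas) / len(thetas)
    print(mean, round(share, 3))  # share correct on an item with b = 0
```

As the ability mean moves away from the item difficulty, responses become almost deterministic, which is the situation in which an item carries little information.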
All of the CAT specifications were the same for each of the 13 discrepancy conditions. The item pool consisted of 250 items. As mentioned above, the item pool had a standard normal distribution. The same item pool was used for all conditions. For each condition, 10,000 examinees were simulated. CAT tests had a test length of 20 items. Item pool size was chosen according to this test length. As mentioned above, Stocking (1994) suggested that item pool size should be twelve times the length of the adaptive test. There were no constraints on the item selection algorithm such as exposure control or content balancing.

Item pool size

For the second part of Research Question 1, whether the value of IPUI changes as the item pool size changes was explored. Item pool size is the second operationalized indicator of item pool quality. It was hypothesized that as the item pool size increases, the IPUI will increase as well. But instead of linear, this relationship was hypothesized to be a monotonic non-linear increase. To observe this relationship, CATs with different item pool sizes were simulated. There were 11 item pool size conditions: (1) a very small item pool with the size of the test length, 20 in this case; (2-4) small item pools with 40, 60 and 80 items; (5-11) large item pools with sizes 100, 200, 300, 400, 500, 750 and 1000. Item difficulty parameters of all item pools were generated from the standard normal distribution. For each condition, 10,000 examinees generated from the standard normal distribution were used. The test length of all tests was 20. There were no constraints on the item selection algorithm.
When comparing item pools, especially the ones with small numbers of items, an effect of sampling error is expected. For example, for the item pool of size 20, if 20 items are randomly selected from the standard normal distribution and the IPUI is calculated based on this item pool, this index will possibly differ from the one based on another 20 items that are randomly chosen. To alleviate the effect of the sampling error, simulations were repeated 25 times for each item pool size condition and results were reported based on these replications. A larger effect of sampling error is expected for small item pool sizes. But for larger item pool sizes, the effect of the sampling error is expected to disappear. Performing these replications was worthwhile because it showed the sensitivity of the index as well.

In this thesis, the 1PL model was used. So, the items only differ by their difficulty parameters. All items were assumed to discriminate examinees equally well, and examinees were assumed not to guess. On the other hand, in 2PL or 3PL models, items differ in their discrimination parameters and their guessing parameters. In these models, a high quality item has a high discrimination parameter and a low guessing parameter. High quality items provide more information about the examinee, given that the item difficulty is very close to the examinee's true ability level. In these models, increasing the size of an item pool without considering the quality of items might result in unexpected item pool performances. A small item pool with high quality items can perform better than a larger item pool with low quality items. Consequently, the size of an item pool will not be the only determinant of the quality of an item pool in 2PL or 3PL models.
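For each condition, biases, standard errors and mean squared errors of the ability estimates and the fidelity coefficient (McBride, 1977), i.e. the correlation between true and estimated abilities, were calculated. In addition, IPUI values and exposure rates of the items for each condition were calculated. For a CAT with test length $K$ and $N$ examinees:

\[
\text{Mean Standard Error} = \overline{SE} = \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{k=1}^{K_j} I_{i_{jk}}\left(\hat{\theta}_{jk}\right) \right]^{-1/2} \tag{4.2}
\]

\[
\text{Mean Squared Error} = \frac{1}{N} \sum_{j=1}^{N} \left(\hat{\theta}_j - \theta_j\right)^2 \tag{4.3}
\]

\[
\text{Mean Bias} = \frac{1}{N} \sum_{j=1}^{N} \left(\hat{\theta}_j - \theta_j\right) \tag{4.4}
\]

\[
r_{\theta\hat{\theta}} = \frac{\sum_{j=1}^{N} \left(\theta_j - \bar{\theta}\right)\left(\hat{\theta}_j - \bar{\hat{\theta}}\right)}{\sqrt{\sum_{j=1}^{N} \left(\theta_j - \bar{\theta}\right)^2} \, \sqrt{\sum_{j=1}^{N} \left(\hat{\theta}_j - \bar{\hat{\theta}}\right)^2}} \tag{4.5}
\]

where $\theta_j$ is the true ability of examinee $j$, $\hat{\theta}_j$ is the estimated ability of examinee $j$, $\bar{\theta}$ is the mean of the true abilities of the $N$ examinees, $\bar{\hat{\theta}}$ is the mean of the estimated abilities of the $N$ examinees, $I_{i_{jk}}(\hat{\theta}_{jk})$ is the Fisher information of item $i_{jk}$ at $\hat{\theta}_{jk}$, and $r_{\theta\hat{\theta}}$ is the fidelity coefficient (correlation between true and estimated abilities).

4.2.1.3 Research Question 2

There are many factors that affect the quality of the item pools besides the size of an item pool or the discrepancy between the item pool and examinee population distribution, such as CAT specifications. These factors might affect the quality of the item pools differently. Changing CAT specifications might have a positive or negative impact on the utilization of the item pool. For a set of specifications item pool quality might be sufficient, but if these specifications change, item pool quality might change as well even though the item pool itself does not change. Exposure control is a good example of this. An item pool might be sufficient for an adaptive test with no exposure control, but imposing an exposure control procedure might reduce the quality of the item pool. In Research Question 2, how different CAT specifications might affect the quality of the item pool was investigated.

The results for adaptive tests with different specifications were compared to see (1) how these changes affect the index, (2) whether this index captures the changes in the item pool quality caused by the changes in CAT specifications, (3) the relationship between this index and other outcomes of the adaptive tests (such as bias, SE, MSE etc.), and (4) whether this index captures the item pool quality changes that other indicators of a CAT cannot capture. The last aim would show the added value of this index to the current literature.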
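The outcome measures in Equations (4.2) to (4.5) can be sketched as follows. The sketch assumes the 1PL information function with $D = 1.7$ and takes each examinee's standard error as the inverse square root of the test information evaluated at the final ability estimate, which is the reading given in Section 3.4; the function names and the three example examinees are hypothetical.

```python
import math

D = 1.7


def info_1pl(theta, b):
    """1PL Fisher information of an item with difficulty b at theta."""
    p = 1.0 / (1.0 + math.exp(-D * (theta - b)))
    return D * D * p * (1.0 - p)


def standard_error(theta_hat, difficulties):
    """Inverse square root of the test information at the final estimate."""
    return 1.0 / math.sqrt(sum(info_1pl(theta_hat, b) for b in difficulties))


def summarize(true_thetas, est_thetas, administered):
    """Mean SE, MSE, mean bias and fidelity coefficient for one condition."""
    n = len(true_thetas)
    errors = [e - t for t, e in zip(true_thetas, est_thetas)]
    mean_se = sum(standard_error(e, bs)
                  for e, bs in zip(est_thetas, administered)) / n
    mean_true = sum(true_thetas) / n
    mean_est = sum(est_thetas) / n
    cov = sum((t - mean_true) * (e - mean_est)
              for t, e in zip(true_thetas, est_thetas))
    ss_true = sum((t - mean_true) ** 2 for t in true_thetas)
    ss_est = sum((e - mean_est) ** 2 for e in est_thetas)
    return {
        "mean_se": mean_se,
        "mse": sum(err * err for err in errors) / n,
        "bias": sum(errors) / n,
        "fidelity": cov / math.sqrt(ss_true * ss_est),
    }


# Three hypothetical examinees (values invented for illustration only).
stats = summarize(
    true_thetas=[-1.0, 0.0, 1.0],
    est_thetas=[-0.8, 0.1, 1.3],
    administered=[[-1.0, -0.5, 0.0], [0.0, 0.2, -0.2], [1.0, 1.2, 0.8]],
)
print({name: round(value, 4) for name, value in stats.items()})
```

Note that none of these four quantities uses the interim ability estimates, which is the gap the IPUI is meant to fill.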
In order to see the main effects of the test specifications on the IPUI and other CAT outcome indicators, CAT simulations in which only one specification changes at a time were performed. Two CAT specifications were the focus: test length and exposure control. Clearly, CAT specifications might also interact with each other. Changes in two or more specifications at the same time might have a different impact on the item pool quality compared to changing one specification at a time. The effects of the changes might not be additive and there might be interactions between specifications. In this study, only the main effects were investigated for reasons of simplicity. Specifically, the effects of test length and exposure control on the quality of the item pool were investigated.

Test length

First, the effects of test length on IPUI and other outcomes of the adaptive test were investigated. As indicated in the rule of thumb by Stocking (1994), there is a direct relationship between test length and item pool size. It is expected that, everything else being equal, as test length increases, item pool quality decreases, and consequently the value of the IPUI decreases. This relationship was explored by simulating CATs with different test lengths. Eighteen different test length conditions were tested to see the effect of test length on item pool quality. These conditions were test lengths 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 100, 200, 300 and 400. The last condition basically administered every item in the item pool to a simulee. In this sense this condition was not very different from a linear test. The only difference was the ordering of the items. In a linear test the order of items is generally the same for every examinee.

10,000 examinees were generated from a standard normal distribution. The same set of examinees was used for all conditions. The item pool size was 400, approximately 10 times the median of the test length conditions. The same item pool was used for all conditions. Item difficulty parameters of the item pool were generated from the standard normal distribution. Since the item pool size was rather large, it was assumed that the effect of sampling error was minimal. Consequently, there were no replications of the simulations with different item pools of the same size. There were no constraints on the item selection algorithm.
Exposure Control

Exposure control is the second CAT specification imposed on the item selection that was explored. Exposure control is an important factor affecting item pools, especially in high stakes CATs. It is one of the main reasons that force test developers to increase the size of their item pools. For this reason, it is crucial to explore the interaction between IPUI and exposure control.

Exposure control has many variations in adaptive testing, as explained in Revuelta and Ponsoda (1998), Georgiadou et al. (2007), and Leroux, Lopez, Hembry, and Dodd (2013). This study focused only on the randomesque exposure control procedure (Kingsbury & Zara, 1989) due to its simplicity and widespread usage.

There were 12 item exposure conditions: (1) no exposure control, (2-11) randomesque with 3, 5, 7, 10, 13, 15, 20, 25, 50 and 100 items, and (12) total-random-selection of items. For example, in randomesque with 10 items, one item out of the 10 most informative items at the examinee's current ability estimate was randomly selected. In total-random-selection of items, items were randomly selected out of all available items, regardless of their information value. This case served as the worst case scenario, in the sense of the efficiency of a CAT, among the exposure control conditions. It was expected that, as more randomness was imposed on the item selection procedure, the value of the IPUI would decrease.

As in the previous simulations, everything except the exposure control was the same in the CAT simulations between the conditions. For each condition, 10,000 examinees were generated from a normal distribution with a mean of 0 and a standard deviation of 1. Test length was 20 for all conditions. The item pool consisted of 250 items whose item difficulties were generated from a normal distribution with a mean of 0 and a standard deviation of 0.7. IPUI values were compared at each condition along with mean SE, MSE, mean bias, fidelity coefficient and exposure rates.

4.2.1.4 Research Question 3

Research Question 3 focuses on the usefulness of this index as a diagnostic tool for item pools. The scenario of this research question was hypothesized as the following.
Suppose a state testing agency is planning to develop a test to measure the end of year achievement levels of students. The purpose of the test is to measure each student as precisely as possible. Students have a wide range of abilities. CAT is decided to be the best way to achieve this goal. The state testing agency wants to measure the students in three content areas. Content area 1 (e.g. arithmetic) is generally regarded as easy among average students and there are not too many difficult items in this content area. Content area 2 (e.g. algebra) is regarded as medium difficulty and content area 3 (e.g. trigonometry) is regarded as difficult among average students. The state testing agency has three CAT plans to compare before implementing the test. Previous item development efforts showed that the item difficulty parameters of the items from content area 1 had a normal distribution with mean difficulty -1 and standard deviation 0.3. The item difficulties of items from content area 2 had a normal distribution with mean difficulty 0 and standard deviation 0.3. The item difficulties of items from content area 3 had a normal distribution with mean difficulty 1 and standard deviation 0.3. In the following simulation plans, item difficulty parameters of the item pools were generated from these distributions for each content area.

Here are the three test plans investigated for this hypothetical state testing agency to demonstrate the use of IPUI:

Plan 1. For this plan, an item pool of 90 items consisting of an equal number of items from each content area (30 items each) was created. The CAT specifications of this item pool were the same as the common specifications of the previous research questions (see Section 4.2.1.1). The length of the test was 15 items. Unlike the previous CAT specifications, in this test plan content balancing was imposed on the item selection algorithm. Examinees should respond to exactly 5 items from each of the three content areas. After the administration of each item, the content area that had the largest discrepancy with the target value (i.e. 5) was selected. The most informative item within this content area at the examinee's intermediate ability level was administered.

Plan 2. In this plan, the same item pool created for the first plan was used. The test specifications of this plan were the same as the first plan except for the content balancing. For the second plan, no content balancing was imposed on the item selection algorithm. The most informative item at the examinee's intermediate ability estimate was administered regardless of the content of the item.

Plan 3. The third plan involves the creation of a CAT test for each content area separately with a larger item pool. Only the results of the third content area were compared to the other plans in lieu of the other content areas. An item pool of 90 items from the distribution mentioned above for content area 3 was generated. Since there is only one content area, there was no content balancing for this test. The CAT specifications of this plan were the same as Plan 2.

The size of the item pools for each plan was the same, 90 items. The item pools of Plan 1 and 2 were the same, but they differed by the content balancing imposed on the item selection algorithm. The CAT specifications of Plan 2 and 3 were the same but they differed by the distribution of the item difficulties of the item pools. The results of Research Question 3 show the use of IPUI as a diagnostic tool. Using this diagnostic information, this hypothetical state testing agency can decide which plan provides the best measurement option given the practical constraints. They can also see the weak points of the item pools and choose a plan accordingly, or decide to improve the item pools.

To compare each plan, 1000 examinees were simulated at each $\theta$ point between -3 and 3 with 0.2 intervals. The number of items and the distributions of the item pools are shown in Table 4.1.

Table 4.1: Item Pool Information for Research Question 3

                     Content Area 1     Content Area 2    Content Area 3    Total
Plan 1-2 (IP 1-2)    30 ~ N(-1, 0.3)    30 ~ N(0, 0.3)    30 ~ N(1, 0.3)    90
Plan 3 (IP-3)        0                  0                 90 ~ N(1, 0.3)    90

Note. The number of items and the distribution of items are given within each cell. Numbers within the parentheses are the means and the standard deviations of the generating distributions.

The mean values of bias, SE, MSE and IPUI were calculated at each true $\theta$ condition for each plan. The diagnostic utility of IPUI was investigated and compared with the diagnostic utilities of other CAT outcomes.
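The content-balanced selection step of Plan 1 can be sketched as below. The sketch assumes the 1PL information function with $D = 1.7$, a target of 5 items per content area, and that ties between content areas are broken by dictionary order; the tie rule, the tuple layout of the pool and the small example pool are assumptions made for illustration.

```python
import math

D = 1.7


def info_1pl(theta, b):
    """1PL Fisher information of an item with difficulty b at theta."""
    p = 1.0 / (1.0 + math.exp(-D * (theta - b)))
    return D * D * p * (1.0 - p)


def next_item(theta_hat, pool, used, counts, target=5):
    """Pick the next item under simple content balancing.

    pool   : list of (item_id, content_area, b) tuples
    used   : set of item ids already administered
    counts : dict mapping content_area -> items administered so far
    Raises ValueError if the chosen content area has no unused items.
    """
    # Content area that is furthest below its target count.
    area = max(counts, key=lambda c: target - counts[c])
    candidates = [item for item in pool
                  if item[1] == area and item[0] not in used]
    # Most informative remaining item in that area at the interim estimate.
    return max(candidates, key=lambda item: info_1pl(theta_hat, item[2]))


# Hypothetical six-item pool: two items per content area.
pool = [(0, 1, -1.2), (1, 1, -0.8), (2, 2, -0.1),
        (3, 2, 0.3), (4, 3, 0.9), (5, 3, 1.1)]
print(next_item(0.5, pool, used={0, 4}, counts={1: 1, 2: 0, 3: 1}))
```

Dropping the content-area filter and maximizing over all unused items gives the unconstrained selection used in Plan 2.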
4.2.2 Second Phase - Real Data

In the second phase, the simulations were based on the real item pools of an operational CAT. The National Council of State Boards of Nursing (NCSBN) provided the operational item pools of the National Council Licensure Examination for Registered Nurses (NCLEX-RN) exam. NCLEX-RN (National Council of State Boards of Nursing [NCSBN], 2012) is a nursing licensure exam administered by NCSBN. The NCLEX-RN examination "assesses the knowledge, skills and abilities that are essential for the entry-level nurse to use in order to meet the needs of clients requiring the promotion, maintenance or restoration of health" (NCSBN, 2012, p. 3). The exam is delivered via CAT.

Description of NCLEX-RN Exam

The specifications of the CAT algorithm used by NCLEX-RN are complex. Items that pass the quality control are calibrated using the Rasch model. Item pools are used for a certain period of time and then replaced with a new item pool for security reasons. Item pools consist of 8 content areas. Examinees should answer a specific proportion of questions from each content area. The content distribution of the examination is in Table 4.2 (NCSBN, 2012, p. 5).

Table 4.2: Distribution of Content for NCLEX-RN Examination

Content Area                                      Percentage of Items
Safe and Effective Care Environment
    Management of Care                            17-23%
    Safety and Infection Control                  9-15%
Health Promotion and Maintenance                  6-12%
Psychosocial Integrity                            6-12%
Physiological Integrity
    Basic Care and Comfort                        6-12%
    Pharmacological and Parenteral Therapies      12-18%
    Reduction of Risk Potential                   9-15%
    Physiological Adaptation                      11-17%

The CAT procedure starts with an initial ability estimate that is lower than the average ability estimate of the examinees. Initially, Owen's Bayesian procedure (Owen, 1975) is used for ability estimation and item selection until the examinee has at least one correct and one incorrect response in her response string. A normal prior distribution with mean 0 and variance 4 is used. After the examinee has at least one correct and one incorrect response in her response string, ability is estimated using MLE. The maximum value of the likelihood function is found using the Newton-Raphson algorithm.
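The Newton-Raphson step for the MLE can be sketched as follows. This is a generic illustration for the Rasch model on the logit metric (no scaling constant), not the operational NCLEX-RN implementation; the starting value, tolerance, iteration cap and the example response string are all assumptions.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response on the logit metric."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_newton(responses, difficulties, start=0.0, tol=1e-8, max_iter=50):
    """Newton-Raphson maximum likelihood estimate of theta.

    Requires at least one correct and one incorrect response; otherwise
    the likelihood has no finite maximum.
    """
    if not (0 in responses and 1 in responses):
        raise ValueError("MLE needs a heterogeneous response string")
    theta = start
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = sum(u - p for u, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information   # Newton step: -f'(theta) / f''(theta)
        theta += step
        if abs(step) < tol:
            break
    # Standard error from the test information at the final estimate.
    information = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b))
                      for b in difficulties)
    return theta, 1.0 / math.sqrt(information)


# Hypothetical five-item response string.
example_b = [-0.5, 0.3, 0.0, 0.8, 1.2]
theta_hat, se = mle_newton([1, 0, 1, 1, 0], example_b)
print(round(theta_hat, 3), round(se, 3))
```

At the solution the expected number of correct responses equals the observed number, which is a convenient check on convergence.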
For each question, the content area of the item is selected first. The content area which deviates most from the target content proportions is selected. Among the available and unadministered items within the selected content area, one of the $m$ items which have the maximum amount of information at the current ability estimate is randomly selected and administered to the examinee. Here, $m$ is the parameter for the randomesque exposure control. After the examinee responds to the item, her ability estimate and its standard error are updated.

After the examinee has completed 60 items, the CAT program checks whether the cut score is contained within a 90% confidence interval around the ability estimate. If the cut score is outside the confidence interval, the test is terminated. If the ability estimate is above the cut score, the examinee passes the test, otherwise she fails.

If the cut score is within the 90% confidence interval around the ability estimate, the test continues until the cut score falls out of the confidence interval. The CAT program continues to administer items up to the maximum test length of 250, if the cut score is still within the confidence interval of the examinee's estimated ability. After the administration of the 250th item, the test is terminated. The examinee passes the test if the final ability estimate is greater than the cut score, otherwise the examinee fails.

4.2.2.1 Research Question 4

NCSBN administers NCLEX-RN throughout the year. For security reasons, test developers change the item pools at certain intervals. Each item pool is selected to meet the test specifications and passes through quality control. The index proposed in this thesis can help the test developers to check the sufficiency of the selected item pool.

For this thesis, NCSBN provided the item parameter values, the content area information of the retired item pools, the ability distribution of the previous test takers and the test specifications required to mimic the CAT procedure. In addition, NCSBN provided response strings of five anonymous test takers. This information was used to ensure that the CAT procedure written for this dissertation indeed mimics the real CAT algorithm used operationally.

For this research question, the quality of five operational item pools was investigated.
On top of these five operational item pools, four additional item pools were generated. Two of them were ideal item pools developed for the specifications and target population of the NCLEX-RN test to form a baseline for comparisons. One of these ideal item pools was created using a bin size of 0.4, the other was created using a bin size of 0.8. These bin sizes are the same bin sizes used in He and Reckase (2013). The details of the development of the ideal item pools are described later in Section 4.2.2.1.

The third item pool created is called the half-item-pool (Half-IP). This item pool has half the size of the first operational item pool. For each content area, half of the items were randomly selected and then removed from the first operational item pool. The remaining items formed the half-item-pool. A fourth generated item pool is called the one-third-item-pool (One-Third-IP). This item pool was created in a similar way to the half-item-pool. Instead of half of the items, two thirds of the items were randomly removed from each content area. The remaining items formed the one-third-item-pool. These item pools are the other extremes of the ideal item pools.

Evaluation of these item pools was performed using CAT simulations for each of these eight item pools. 50,000 normally distributed examinees were simulated. The mean and the standard deviation of the normal distribution were the same as those of the ability distribution of the real examinees. The same 50,000 examinees were used for each item pool condition. IPUI was calculated for each item pool condition. Additionally, for each item pool, mean standard error (Equation (4.2)), mean squared error (Equation (4.3)), mean bias (Equation (4.4)), fidelity coefficient (Equation (4.5)) and the percentage of correct classification were calculated and compared to the IPUI. Besides these outcome variables, since the NCLEX-RN test is a variable length test, the average test length and the mean exposure rates were also calculated.
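Because the NCLEX-RN test is variable length, the stopping rule described in the exam description determines the test lengths that are averaged here. A minimal sketch of that rule follows, assuming the 90% confidence interval is the symmetric normal interval $\hat{\theta} \pm 1.645 \cdot SE$; the interval form, the function name and the example values (including a cut score of 0) are assumptions rather than operational details.

```python
Z_90 = 1.645  # two-sided 90% normal quantile (assumed form of the interval)


def stop_decision(n_items, theta_hat, se, cut_score,
                  min_len=60, max_len=250):
    """Return 'continue', 'pass' or 'fail' after n_items have been given."""
    if n_items < min_len:
        return "continue"          # no check before the minimum length
    cut_inside = abs(theta_hat - cut_score) <= Z_90 * se
    if cut_inside and n_items < max_len:
        return "continue"          # cut score still within the interval
    # Interval excludes the cut score, or the maximum length was reached.
    return "pass" if theta_hat > cut_score else "fail"


# Hypothetical checks with a cut score of 0.
print(stop_decision(30, 2.0, 0.10, 0.0))    # below the minimum length
print(stop_decision(60, 0.5, 0.20, 0.0))    # interval excludes the cut score
print(stop_decision(60, 0.2, 0.20, 0.0))    # interval contains the cut score
print(stop_decision(250, 0.2, 0.20, 0.0))   # maximum length reached
```

Examinees whose ability is close to the cut score therefore tend to receive the longest tests, which is where the item pool is used most heavily.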
Ideal item pool creation

Ideal item pools were created using the bin-and-union method as described in Reckase (2010) and He and Reckase (2013). According to van der Linden et al. (2006) an ideal item pool should

    ... consist of a maximal number of combinations of items that (a) meet all content specifications for the test and (b) are most informative at a series of ability levels reflecting the shape of the distribution of the ability estimates for a population of examinees. (p. 82)

The bin-and-union method uses this principle to build an ideal item pool for a given examinee population and a set of test specifications. For this dissertation, two ideal item pools for an examinee group with the same ability distribution as the real examinee group were created. The CAT specifications were the same as those of the NCLEX-RN examination. Initially, 10,000 examinees were simulated from a distribution that represents the real examinee population. For these examinees, a CAT test was performed under the assumption that a maximally informative item is available at each stage of the adaptive test, whatever the value of the intermediate ability estimate is. Since the Rasch model was used in this study, $b$ values of the maximally informative items were equal to the intermediate ability estimates.
Initially, bins within the ability scale with the following range limits were created:

Ideal item pool with bin size 0.4: (-∞, -3), [-3, -2.6), [-2.6, -2.2), [-2.2, -1.8), [-1.8, -1.4), [-1.4, -1), [-1, -0.6), [-0.6, -0.2), [-0.2, 0.2), [0.2, 0.6), [0.6, 1), [1, 1.4), [1.4, 1.8), [1.8, 2.2), [2.2, 2.6), [2.6, 3), [3, ∞)

Ideal item pool with bin size 0.8: (-∞, -2.8), [-2.8, -2), [-2, -1.2), [-1.2, -0.4), [-0.4, 0.4), [0.4, 1.2), [1.2, 2), [2, 2.8), [2.8, ∞)

These bins were used to tally the number of items required for each range. The number of items within each bin was set to 0 at the beginning of the simulation. Then, for each examinee, a CAT test was simulated and the $b$ values of the items that were administered were recorded. At the end of the test, for each content area, items were distributed to bins according to their $b$ values. For each bin range, if the number of items in that bin range was larger than the previous number of items assigned to that bin, then this larger number was assigned to that bin. At the end of the simulations for the entire examinee group, the maximum number of items that were necessary for each bin range was obtained. This gave the distribution of the ideal item pool.

After obtaining the distribution of the ideal item pool, actual ideal item pools were generated. A random number from the standard normal distribution was generated. This random number was assigned to the appropriate bin if that bin had not yet reached its maximum size. This procedure continued until each bin was filled. This way, an ideal item pool for this examinee group was obtained.
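The tallying step of the bin-and-union method can be sketched for the bin size of 0.4. For brevity the sketch ignores content areas (the procedure above tallies bins separately for each content area), and the two lists of administered $b$ values are hypothetical.

```python
import bisect

# Interior bin edges for bin size 0.4: -3.0, -2.6, ..., 3.0 (16 edges),
# giving 17 bins: (-inf, -3), [-3, -2.6), ..., [2.6, 3), [3, inf).
EDGES = [round(-3.0 + 0.4 * i, 1) for i in range(16)]


def bin_index(b):
    """Index of the left-closed, right-open bin that contains b."""
    return bisect.bisect_right(EDGES, b)


def bin_counts(b_values):
    """Number of administered items that fall into each bin for one test."""
    counts = [0] * (len(EDGES) + 1)
    for b in b_values:
        counts[bin_index(b)] += 1
    return counts


def union(required, counts):
    """Element-wise maximum: keep the most demanding test seen so far."""
    return [max(r, c) for r, c in zip(required, counts)]


# Two hypothetical examinees' administered b values.
required = [0] * (len(EDGES) + 1)
for administered in ([0.0, 0.1, 0.5], [0.05, 1.0, 1.1]):
    required = union(required, bin_counts(administered))
print(required)
```

Taking the maximum rather than the sum is what keeps the ideal pool from growing with the number of simulated examinees: a bin only grows when some examinee needs more items from it than anyone before.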
4.2.2.2 Research Question 5

In this research question, the utility of the IPUI as a diagnostic tool for an operational CAT was investigated. Analyses in this research question were the extension of Research Question 3 to an operational CAT with different specifications.

In this research question, 6 item pools were studied. Two of these item pools were the operational item pools, two of them were the ideal item pools and the last two were the One-Third and Half item pools created for the previous research question. Even though more operational item pools were available, only two of them were investigated. The results of Research Question 4 showed that the differences between the operational item pools were minimal. For this reason, only two operational item pools were investigated further.

For each item pool condition, 1000 examinees at true $\theta$ values -3, -2.8, -2.6, -2.4, -2.2, -2, -1.8, -1.6, -1.4, -1.2, -1, -0.8, -0.75, -0.7, -0.65, -0.6, -0.55, -0.5, -0.45, -0.4, -0.35, -0.3, -0.25, -0.2, -0.15, -0.1, -0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4, 2.6, 2.8 and 3 were simulated using the NCLEX-RN CAT specifications. Outside the interval (-0.8, 0.8), the difference between the simulated $\theta$ values was 0.2. Within [-0.8, 0.8], the difference between the simulated $\theta$ values was 0.05, because these points were close to the cut score and it was important to make a more precise diagnosis close to the cut score, where decision accuracy was paramount. At each $\theta$ value, the mean of IPUI values, mean SE, MSE, mean bias and the decision accuracy were calculated. Each outcome variable was compared to the IPUI to show the diagnostic utility of the IPUI. The distributions of IPUI values at each conditional $\theta$ value gave a local diagnosis of the item pool.

CHAPTER 5
RESULTS

The results of the analyses are divided into two phases. In the first phase, the results of the first three research questions are presented. In the second phase, the results of the fourth and fifth research questions are presented.
5.1 First Phase - Simulated Data

In this phase of the results, the results of the first three questions are presented. These three questions are based on simulated data. Test specifications were common across these three research questions except for the conditions being tested (Section 4.2.1.1). The design of the first phase was kept as simple as possible to demonstrate the effectiveness of the IPUI. A more complicated CAT design was investigated in the second phase of the study.

5.1.1 Research Question 1

In Research Question 1, item pool quality is operationalized as (1) the discrepancy between the examinee ability distribution and the item difficulty distribution of the item pool, and (2) the item pool size. In the following two sections the results of the simulations examining these two item pool quality indicators are presented.

5.1.1.1 Item Pool and Examinee Ability Discrepancy

There were thirteen conditions for item pool and examinee ability discrepancy. Both the item difficulty parameters in the item pool and the examinee abilities were generated from normal distributions with a standard deviation of 1. The item pool was fixed for all conditions and the item difficulties had a standard normal distribution. The item difficulty distribution of the item pool is shown in Figure A.1 on page 167. The conditions differed by the means of their ability distributions, which ranged from -3 to 3 with 0.5 intervals. 10,000 examinees were simulated for each condition. The distributions of true thetas are plotted in Figure A.2 on page 168. For each condition, the means of bias, SE, MSE and IPUI were calculated.

The results of the simulations are summarized in Figure 5.1 and Table 5.1. Figure 5.1 shows the change in bias, SE, MSE and IPUI for each discrepancy condition. Table 5.1 shows the means and standard deviations of these output variables.
Figure 5.1: Summary Statistics for Research Question 1 - Discrepancy between Item Difficulty Distribution of Item Pool and $\theta$ Distribution

Table 5.1: Summary Statistics for Research Question 1 - Discrepancy between Item Pool and Ability Distribution

Discrepancy   SE              Bias             MSE             IPUI
-3.0          0.451 (0.190)   -0.002 (0.426)   0.182 (0.312)   0.632 (0.264)
-2.5          0.385 (0.157)   -0.021 (0.381)   0.145 (0.265)   0.741 (0.255)
-2.0          0.337 (0.116)   -0.020 (0.346)   0.120 (0.212)   0.839 (0.219)
-1.5          0.307 (0.075)   -0.009 (0.317)   0.100 (0.175)   0.913 (0.166)
-1.0          0.292 (0.046)   -0.007 (0.301)   0.091 (0.143)   0.958 (0.114)
-0.5          0.284 (0.026)   -0.007 (0.292)   0.085 (0.124)   0.983 (0.068)
0.0           0.283 (0.024)   -0.001 (0.293)   0.086 (0.131)   0.987 (0.060)
0.5           0.285 (0.035)   0.003 (0.296)    0.087 (0.142)   0.979 (0.084)
1.0           0.295 (0.065)   0.004 (0.309)    0.096 (0.163)   0.950 (0.135)
1.5           0.314 (0.096)   0.013 (0.323)    0.105 (0.185)   0.901 (0.186)
2.0           0.352 (0.144)   0.017 (0.356)    0.127 (0.229)   0.820 (0.239)
2.5           0.414 (0.189)   0.028 (0.405)    0.165 (0.297)   0.710 (0.272)
3.0           0.493 (0.220)   -0.013 (0.466)   0.217 (0.375)   0.596 (0.274)

Note. Numbers within the parentheses are standard deviations of each outcome. SE: Standard Error; MSE: Mean Squared Error

Bias. Figure 5.1 does not show a clear relationship between the bias of ability estimates and the amount of discrepancy between ability distribution and item pool. Means of the biases were close to 0 for all conditions. On the other hand, the standard deviations of biases given in Table 5.1 display a pattern. As the discrepancy between ability distribution and item pool increased, the standard deviation of the biases increased as well. For low discrepancy conditions the variability of biases was low. Most of the examinees had biases close to 0. For high discrepancy conditions the variability of the biases was higher. Figure A.3 on page 169 also shows this visually.

Standard Error. Figure 5.1 shows that as the absolute value of discrepancy between the item pool and ability distribution increased, the SE of the ability estimates increased as well. When there was a close match between ability distribution and item pool, the overall SEs of examinees were low. For large discrepancy conditions, the means of the SEs were larger.

A similar type of increase can be observed for the standard deviations of the SE values.
Table 5.1 shows that as the absolute value of discrepancy increased, the variability of the standard errors increased. The variability of the SEs is also plotted in Figure 5.2. In Figure 5.2, each point represents the SE value of a simulated examinee. Each examinee is plotted close to the discrepancy condition to which it belongs. The points are scattered around the exact discrepancy value to show the distribution of the values. In addition to the points, a box-plot was added to show the shape of the distribution. Only the box and whiskers of the box-plots were plotted because the outliers are already shown by the points in the graph. As the box-plots and the distribution of the points show, the SE for each condition had a positive skew.

Figure 5.2: Distribution of Standard Errors within each Discrepancy Condition

Mean Squared Error. The relationship between MSE and the discrepancy conditions was similar to that of SE. As the discrepancy between item pool and ability distribution increased, the mean of the MSEs increased as well. For small discrepancy conditions the MSE distributions were almost identical. Table 5.1 shows that both the means and standard deviations of the MSE values were very close when the discrepancies were below 1 unit. Figure A.5 on page 171 shows the distribution of MSE for each discrepancy condition. As can be seen from this plot, the variation in MSE values increased as the discrepancy increased. The distributions of MSE were positively skewed, as were the SE distributions.

IPUI. It can be clearly seen in Figure 5.1 that as the absolute value of discrepancy between the item pool and ability distribution increased, the IPUI values decreased. When there was no discrepancy or it was minimal, the mean of IPUI was almost 1. This means that for almost all of the examinees the item pool was able to provide appropriate items throughout the test. The item pool was sufficient for these examinee groups. But for high discrepancy conditions the IPUI values decreased markedly. The mean of IPUI values went down to 0.6 for these examinee groups. For these examinees, the item pool failed to provide appropriate items.
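The bias and MSE columns of Table 5.1 can be cross-checked against each other. The sketch below assumes that the MSE of an examinee is the squared difference between the estimated and true ability, so that the mean MSE equals the squared mean bias plus the variance of the bias; this definition is an assumption here, and small differences are expected because the tabled values are rounded.

```python
# (discrepancy, mean bias, SD of bias, mean MSE) copied from Table 5.1
ROWS = [
    (-3.0, -0.002, 0.426, 0.182),
    (-1.0, -0.007, 0.301, 0.091),
    (0.0, -0.001, 0.293, 0.086),
    (1.0, 0.004, 0.309, 0.096),
    (3.0, -0.013, 0.466, 0.217),
]


def implied_mse(mean_bias, sd_bias):
    """Mean squared error implied by the mean and SD of the bias."""
    return mean_bias ** 2 + sd_bias ** 2


# (discrepancy, MSE implied by the bias columns, MSE reported in the table)
CHECK = [(d, round(implied_mse(m, s), 3), mse) for d, m, s, mse in ROWS]
```

For these rows the implied and reported values agree to within rounding, which is consistent with the mean bias being near zero while the spread of the bias drives the MSE.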
In addition to the decrease in IPUI values for high discrepancy conditions, the variability of the IPUI values increased for high discrepancy conditions. When there was a close match between item pool and examinee distribution, the standard deviation of the IPUI was small, around 0.06. The standard deviations of the IPUI values went up to 0.274 for high discrepancy conditions. This variability can also be observed in Figure 5.3. For all conditions, IPUI values ranged from 0.274 to 1. The variability increased as the discrepancy increased and the distributions showed a negative skew for almost all conditions. But it can be observed that the skewness of the distributions went from negative to positive as the discrepancy increased.

Figure 5.3: Distribution of IPUI within each Discrepancy Condition

Figure 5.4 shows the relationship between SE and IPUI for each discrepancy condition. A linear regression line was fitted to each plot. The correlation between SE and IPUI was printed at the bottom of each plot. For each discrepancy condition there was a negative relationship between SE and IPUI. But this relationship was not linear; instead it was curvilinear. This graph clearly shows that SE and IPUI are not functions of each other. Figure 5.4 shows that there were many cases where both IPUI and SE were low. This means the item pool did not provide optimum items to the examinee, but the error of the ability estimate was still low. On the other hand, there were very few cases where both IPUI and SE were high. This means that when the item pool was able to present optimum items to the examinee, the standard errors of the ability estimates were not high.
Figure 5.4: Relationship between Standard Error and IPUI for each Discrepancy Condition

Figure A.4 on page 170 shows the relationship between IPUI and bias for each discrepancy condition. There was a positive relationship between IPUI and bias when the discrepancy between item pool and ability distribution was negative. For positive discrepancy conditions IPUI and bias had a negative relationship. These relationships were not strong, as the correlation coefficients at the bottom of each plot show. This relationship can be explained by regression to the mean.

Figure A.6 on page 172 shows the relationship between IPUI and MSE for each discrepancy condition. There was no apparent pattern between these two outcomes.

One interesting observation in Figure 5.2 is the clustering of SE values. For example, when the discrepancy condition was -3, the three highest SE values were 0.84, 0.63 and 0.47. There were many examinees with these SE values. Out of 10,000 simulated examinees, the numbers of examinees with these SE values were 1362, 415 and 211, respectively. These SE values belong to the examinees with 0, 1 and 2 total correct responses, respectively, who answered the same items in the same order. For the 1362 examinees without any correct response, the CAT procedure administered the same items because of their responses. All of these items were the easiest items of the item pool. Consequently, their ability estimates were the same as well. This is why their standard errors were the same. Even though there were more than 415 examinees with a total correct score of 1, these 415 examinees gave their one correct response to the same question, which resulted in the administration of the same test items to these examinees. The same was true for the 211 examinees with the third highest SE.

A similar phenomenon can be observed to some extent for the IPUI values in Figure 5.3. The IPUI values had more variability in the values they took compared to SE. For discrepancy condition -3, there were 1383 examinees who had the same IPUI value. This number is a little larger than the number of examinees with the same SE (1362). The 21 additional examinees had a different SE because they answered their last item correctly. That last response did not change the IPUI value but changed the SE value for these examinees. The remaining 1362 examinees did not have any correct response.
For the examinees who had two correct responses and the same SE values, the IPUI values were not always the same. If the locations of these two correct responses were different, then the IPUI values would be different. The CAT processes of two such examinees are shown in Figure A.7 on page 173. Even though these two examinees had the same SE, their IPUI values were different because of the positions of their correct responses. This phenomenon is the main reason why there were so many different IPUI values compared to the SE. For examinees with only two correct responses, the SEs will be the same if the same items are delivered, regardless of the location of the correct responses, because for the Rasch model the number of total correct responses is a sufficient statistic (Leeuw & Verhelst, 1986). If the CAT algorithm administers the same items to two examinees but the examinees correctly answer two different items, then their scores will be the same. Consequently, their SEs will be the same as well. But for IPUI, if the locations of the correct responses change, then the IPUI values will change as well, because IPUI is a function of all intermediate ability estimates and the item parameters. SE, on the other hand, is a function of the final ability estimate and the item parameters.

For the other discrepancy conditions a similar thing happened. For discrepancy condition 3, many examinees got all items correct. As a result, the CAT algorithm administered the same items to these examinees and their SEs came out the same. In Figure 5.2, it can be seen that the SE values for all-correct and all-incorrect answers were different. One important note is that the item pools used in all conditions were the same. So, if an examinee correctly answers all items in two different discrepancy conditions, then their standard errors would come out the same. By coincidence, the IPUI values of the all-incorrect and all-correct response patterns were very close for this item pool (0.2749047 vs. 0.2742634, respectively). As a result, the minimum IPUI values in Figure 5.3 for discrepancy conditions -3 and 3 looked the same.
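The role of the total score as a sufficient statistic can be illustrated with a small sketch. It assumes plain maximum likelihood estimation under the Rasch model without a scaling constant, which is not necessarily the estimator used in the simulations, and it applies only to mixed response patterns (the estimate is not finite for all-correct or all-incorrect patterns). The item difficulties and responses are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_mle(bs, responses, iters=50):
    """Newton-Raphson ML estimate of theta and its standard error.

    The score equation sum(responses) - sum(p_i) = 0 uses the responses
    only through their total, which is the sufficiency property.
    """
    theta = 0.0
    for _ in range(iters):
        ps = [rasch_p(theta, b) for b in bs]
        info = sum(p * (1 - p) for p in ps)
        theta += (sum(responses) - sum(ps)) / info
    ps = [rasch_p(theta, b) for b in bs]
    info = sum(p * (1 - p) for p in ps)
    return theta, 1.0 / math.sqrt(info)
```

Calling `rasch_mle` on the same five items with the patterns `[1, 1, 0, 0, 0]` and `[0, 0, 0, 1, 1]` returns identical estimates and standard errors, because only the total of two correct responses enters the computation. An index that depends on the intermediate estimates, as the IPUI does, can still differ between the two patterns.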
5.1.1.2 Item Pool Size

In the second part of Research Question 1, item pool quality was operationalized as the size of the item pool. It was hypothesized that as the item pool size increases, the quality of the item pool increases as well. Here, the item pool parameter distributions were assumed to remain the same. Consequently, the values of IPUI were hypothesized to increase as the item pool sizes increase. This hypothesis was tested using 11 item pool size conditions. Item pools with 20, 40, 60, 80, 100, 200, 300, 400, 500, 750 and 1000 items were generated. The item difficulty parameters ($b$-parameters) of all item pools were generated from a standard normal distribution. For each item pool size condition, there were 25 replications to observe the effect of sampling error, especially for the small item pools. For very small item pools, even though the item difficulties were drawn from a standard normal distribution, the items generated in one replication might have different item difficulties compared to the items generated in another replication. The item difficulty distributions of the item pools for the 19th replication are shown in Figure B.2 on page 175.

10,000 examinees were simulated from a standard normal distribution. The same set of examinees was used for each replication and condition. The distribution of true $\theta$ is shown in Figure B.1 on page 174. Test length for the adaptive tests was 20 and there were no constraints on the item selection algorithm. The rest of the CAT specifications were the same as the common CAT specifications mentioned in Section 4.2.1.1 on page 37. For each condition and replication, bias, SE, MSE, exposure rates and IPUI values were calculated.

Bias. Figure 5.5 shows the mean bias distribution for each condition. Each point represents the mean of the biases of 10,000 simulees for each replication within a condition. Except for the very small item pools (20 and 40), mean biases were very small for each condition. The values for replications were spread around 0 and do not show a pattern for item pool sizes larger than 40. For very small item pool sizes, the spread of mean bias values was higher. Figure B.3 on page 176 shows the bias distribution for the 19th replication of each condition.
The spread of biases was also higher for very small item pools. For the rest of the item pool size conditions, the distributions of the biases were almost the same: normally distributed with the same means and standard deviations.

Figure 5.5: Mean Bias Distribution by Item Pool Size Condition

Standard Error. Figure 5.6 shows the mean SE values for each replication within a condition. As the item pool size increased, the mean SE decreased. For item pools larger than 200, the mean SE values were almost the same. The spread of mean SEs within a condition was higher for small item pools. This reflects the effect of sampling error. Figure B.4 on page 177 shows the SE distribution for the 19th replication of each condition. The mean of SE for each condition decreased as the item pool size increased. After the item pool size of 100, the differences between means were very small. The spread of SE decreased as the size of the item pool increased. After 200 items the spread was almost the same for all conditions. But the number of outliers decreased as the item pool size increased. For the item pool of size 1000, there was only one simulee with an SE larger than 0.4. For the other conditions this number increased as the item pool size decreased. Also, the distributions were positively skewed for each condition.

Figure 5.6: Mean Standard Error Distribution by Item Pool Size Condition

Mean Squared Error. Figure 5.7 shows the mean MSE values for each replication within a condition. The MSE distribution shows a pattern similar to that of SE: as the item pool size increased, the mean of the MSEs decreased. The spread of the mean MSE values within a condition was larger for small item pool size conditions and decreased as the item pool size increased. For item pools larger than 200 items, the spread was almost the same. Figure B.5 on page 178 shows the MSE distribution for the 19th replication of each condition. Even though the means and the spread of MSE values were larger for very small item pool sizes, the distributions were very similar for item pools larger than 40. All of the distributions were positively skewed.

Exposure Rates. Mean exposure rates for replications were not calculated because the mean exposure rate of an adaptive test equals the ratio of the test length to the item pool size.
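The identity follows from counting: if each of $M$ examinees sees $L$ distinct items from a pool of $N$ items, the exposure rates sum to $ML/M = L$, so their mean is $L/N$. The sketch below checks this numerically. Random item selection stands in for the adaptive algorithm, and the pool size, test length and number of examinees are hypothetical.

```python
import random


def exposure_rates(administered, pool_size):
    """Share of examinees who saw each item.

    `administered` holds one list of item indices per examinee.
    """
    counts = [0] * pool_size
    for items in administered:
        for i in items:
            counts[i] += 1
    return [c / len(administered) for c in counts]


rng = random.Random(0)
POOL_SIZE, TEST_LENGTH, N_EXAMINEES = 100, 20, 500
# Any selection rule that gives every examinee TEST_LENGTH distinct items
# yields the same mean exposure rate; random sampling is used for brevity.
tests = [rng.sample(range(POOL_SIZE), TEST_LENGTH) for _ in range(N_EXAMINEES)]
rates = exposure_rates(tests, POOL_SIZE)
mean_rate = sum(rates) / POOL_SIZE  # equals TEST_LENGTH / POOL_SIZE up to rounding
```

Because the mean is fixed by design, only the distribution of the individual exposure rates carries information about how the selection algorithm uses the pool.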
Instead, Figure 5.8 shows the exposure rates of the items for the 19th replication of each condition. Since test length was fixed, the exposure rates of small item pools were overall higher than the exposure rates of the larger item pools. Also, the initial ability was the same for all examinees. This led to the selection of the same first item for each adaptive test. For this item, the exposure rate was 1. Depending on the correct or incorrect response, examinees were routed to the same second items. Exposure rates for these items were around 0.5 for large item pools and even higher for smaller item pools. Besides such items, for item pools that were larger than 200 items, the majority of the exposure rates were lower than 0.2, the recommended maximum value for item exposure.

Figure 5.7: Mean Squared Error Distribution by Item Pool Size Condition

Figure 5.8: Exposure Rates by Item Pool Size Condition for Replication 19

IPUI. Figure 5.9 shows the mean IPUI values for each replication within a condition. There was a clear relationship between item pool size and mean IPUI values. As the item pool size increased, the mean values of IPUI increased as well. The spread of mean IPUI values was larger for smaller item pool sizes, but the spread decreased as the item pool size increased.

Table 5.2 shows the means and standard deviations of all IPUI values combined for each condition (i.e. the replications within a condition were aggregated to get a single mean and standard deviation). Even though the difference between conditions after the item pool size of 200 is not discernible in Figure 5.9, Table 5.2 shows that the mean of IPUI increased steadily as the item pool size increased. The standard deviation of IPUI values also decreased (except for item pool size 40) as the item pool size increased. This shows that item pools provided more appropriate items when the item pool size increased.
Small standard deviations in Table 5.2 show that a smaller number of simulees suffered from the lack of appropriate items. This can be observed visually in Figure B.6 on page 179, which shows the distribution of IPUI for the 19th replication of each condition. This graph shows a relationship between item pool size and IPUI similar to the one mentioned above. In addition to that, the number of examinees that have lower IPUI values decreased as the item pool size increased. This means that even the simulees with extreme ability parameters saw appropriate items. Figure B.6 also shows that the distribution of IPUI was negatively skewed for each condition.

Figure 5.9: Mean IPUI Distribution by Item Pool Size Condition

Table 5.2: Means and Standard Deviations of IPUI Values by Item Pool Size

Item Pool Size   Mean IPUI   Standard Deviation of IPUI
20               0.5875      0.1798
40               0.8348      0.1817
60               0.9028      0.1593
80               0.9345      0.1325
100              0.9494      0.1201
200              0.9793      0.0768
300              0.9880      0.0574
400              0.9906      0.0507
500              0.9924      0.0450
750              0.9953      0.0343
1000             0.9965      0.0294

IPUI and Standard Error Relationship. Since both IPUI and SE use the information function in their calculations, these two values are expected to be related. Figure 5.10 shows the relationship between the mean IPUI and mean SE for each replication within a condition. Each item pool size condition is represented by a different color in the graph. It can be easily observed that there is an almost perfect relationship between mean IPUI and mean SE except for very small item pool size conditions. The correlation between these two variables was -0.99.

Figure 5.10 shows the relationship between IPUI and SE at an aggregate level. In fact, the relationship of IPUI and SE for individual simulees was not perfect. Figure 5.11 shows the relationship between IPUI and SE for the 19th replication of each condition. The relationship was more like a curvilinear relationship for individual examinees. This is very similar to the relationship observed in the first part of the results for Research Question 1 (Figure 5.4).
One interesting observation in Figure 5.11 is the correlation between SE and IPUI. As the size of the item pool increased, the absolute value of the correlation between SE and IPUI decreased. So, for item pool size 1000, the relationship between SE and IPUI appears to be very weak, $\rho = -0.34$. But for smaller item pools this relationship was rather strong.

Figure 5.10: Relationship between Mean of IPUI and Mean of Standard Error

Figure 5.11: IPUI and Standard Error Relationship for Replication 19

Figure 5.12 shows the correlation between SE and IPUI for each replication. As the item pool size increased, the absolute value of the correlation between SE and IPUI decreased for most of the item pool size conditions. The main reason behind this is the variation of both SE and IPUI values. For smaller item pool sizes there was more variation in the values of these variables. But for larger item pool sizes the values of IPUI were almost 1 and the variation in SE was also very low, and this was the main cause of the weak relationship.

Figure 5.12: Correlation between Standard Error and IPUI for each Replication

IPUI and Reliability. Even though SE gives enough information about the error in ability estimates, reliability is a well known indicator of the quality of a test. It is valuable to investigate the relationship between reliability and IPUI.

The relationship between standard error ($\sigma_e$) and reliability ($\rho_{\theta\hat{\theta}}$) is as follows:

$$\sigma_e = \sigma_{\hat{\theta}} \sqrt{1 - \rho_{\theta\hat{\theta}}}$$

So, reliability can be calculated as:

$$\rho_{\theta\hat{\theta}} = 1 - \frac{\sigma_e^2}{\sigma_{\hat{\theta}}^2}$$

In the simulation $\sigma_{\hat{\theta}}^2 = 1$. Consequently, the reliability can be calculated as:

$$\rho_{\theta\hat{\theta}} = 1 - \sigma_e^2$$

Figure 5.13 shows the relationship between IPUI and reliability for the 19th replication of each condition. As the values of IPUI increased, the reliability increased as well for each condition. The relationship was curvilinear rather than linear. For small item pools the relationship was rather strong. As the item pool size increased, the strength of the relationship decreased. Figure 5.13 also shows the mean reliability values for each condition ($\bar{x}$). The reliability was comparatively low for the item pool with 20 items. After the item pool size of 40, the increase was minimal.
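The conversion between standard error and reliability given above can be written as a pair of small helper functions. This is only a sketch of the stated relationship; the function names are hypothetical and the default standard deviation of 1 reflects the simulation setting described above.

```python
import math


def reliability(se, sd_theta_hat=1.0):
    """Reliability implied by a standard error: 1 - se^2 / sd^2."""
    return 1.0 - (se ** 2) / (sd_theta_hat ** 2)


def se_for_reliability(rho, sd_theta_hat=1.0):
    """Standard error implied by a target reliability: sd * sqrt(1 - rho)."""
    return sd_theta_hat * math.sqrt(1.0 - rho)
```

For example, with a unit standard deviation a reliability of 0.9 corresponds to a standard error of about 0.316, and a standard error of 0.3 corresponds to a reliability of 0.91.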
Figure 5.13: IPUI and Reliability Relationship for Replication 19

5.1.2 Research Question 2

In Research Question 2, the effects of two test specifications, test length and exposure control, on the quality of the item pool and CAT outcomes were investigated.

5.1.2.1 Test Length

In the first part of Research Question 2, the effect of the test length on the performance of the item pool and other CAT outcomes was investigated. It was hypothesized that increasing the test length will decrease the quality of the item pool as quantified by IPUI. Increasing the test length does not necessarily decrease the quality of a CAT, but the efficiency of a CAT will suffer if the item pool does not support a long test. For example, an increase in the test length is associated with an increase in the test's precision, i.e. a decrease in the standard error of the ability estimates (Weiss, 1982). If the item pool is insufficient for a long test, increasing the test length will increase the precision of the test only minimally. This hypothesis was tested using 18 test length conditions. The conditions were tests with lengths 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 100, 200, 300 and 400.

An item pool with 400 items was generated. Item difficulties ($b$-parameters) of this item pool were generated from a normal distribution with mean 0 and standard deviation 1. The item difficulty distribution of the item pool is shown in Figure C.1 on page 180. The same item pool was used for each test length condition. Since the size of the item pool was rather large, it was assumed that the effect of sampling error was low. So, different item pools were not replicated and only one replication was performed for each condition. 10,000 examinees were simulated from a normal distribution with mean 0 and standard deviation 1. The same set of examinees was used for each condition. The distribution of true $\theta$ is shown in Figure C.2 on page 181. There were no constraints on the item selection algorithm. The rest of the CAT specifications were the same as the common CAT specifications mentioned in Section 4.2.1.1 on page 37. For each condition, bias, SE, MSE, fidelity coefficient, exposure rates and IPUI values were calculated.
The results of the analyses are summarized in Figure 5.14. The fidelity coefficient and the mean values of bias, SE, MSE and IPUI for each test length condition are shown in the figure. Table 5.3 shows the fidelity coefficient in addition to the means and standard deviations of bias, SE, MSE and IPUI for each test length condition. In the following pages the results of the investigation of each variable are presented separately.

Figure 5.14: Summary Statistics for Research Question 2 - Test Length

Table 5.3: Summary Statistics for Research Question 2 - Test Length

Test Length   Bias             SE              MSE             IPUI            Fidelity
5             0.007 (0.650)    0.611 (0.046)   0.423 (0.653)   0.999 (0.004)   0.8390
10            -0.001 (0.429)   0.413 (0.030)   0.184 (0.289)   0.997 (0.021)   0.9198
15            0.000 (0.340)    0.330 (0.021)   0.116 (0.172)   0.995 (0.032)   0.9473
20            0.000 (0.289)    0.282 (0.017)   0.084 (0.122)   0.992 (0.042)   0.9609
25            -0.002 (0.256)   0.250 (0.015)   0.065 (0.099)   0.991 (0.049)   0.9690
30            -0.001 (0.234)   0.227 (0.016)   0.055 (0.092)   0.988 (0.055)   0.9740
35            -0.004 (0.211)   0.210 (0.013)   0.044 (0.065)   0.985 (0.059)   0.9787
40            -0.000 (0.202)   0.196 (0.016)   0.041 (0.063)   0.982 (0.067)   0.9806
45            -0.000 (0.185)   0.185 (0.015)   0.034 (0.052)   0.979 (0.073)   0.9834
50            0.002 (0.179)    0.175 (0.014)   0.032 (0.048)   0.975 (0.081)   0.9845
55            -0.001 (0.169)   0.167 (0.015)   0.028 (0.042)   0.971 (0.084)   0.9862
60            -0.002 (0.164)   0.160 (0.017)   0.027 (0.047)   0.967 (0.089)   0.9869
65            0.002 (0.155)    0.154 (0.015)   0.024 (0.039)   0.962 (0.094)   0.9883
70            -0.001 (0.150)   0.148 (0.014)   0.022 (0.034)   0.958 (0.097)   0.9891
100           -0.002 (0.128)   0.126 (0.017)   0.016 (0.031)   0.926 (0.123)   0.9920
200           0.002 (0.099)    0.096 (0.020)   0.010 (0.023)   0.809 (0.164)   0.9952
300           0.001 (0.090)    0.087 (0.020)   0.008 (0.015)   0.681 (0.167)   0.9960
400           -0.001 (0.089)   0.084 (0.023)   0.008 (0.026)   0.546 (0.143)   0.9961

Note. Numbers within the parentheses are standard deviations of each outcome. SE: Standard Error; MSE: Mean Squared Error; Fidelity: Correlation between true ability and estimated ability

Relationship between True and Estimated Ability. Figure 5.15 shows the relationship between true ability ($\theta$) and estimated ability ($\hat{\theta}$). The dashed lines in the figure are the identity lines (i.e. the $y = x$ line). When the estimation is perfect, all of the points should fall on this line. If a point is above the identity line, the ability is overestimated, i.e. the estimated ability is larger than the true ability (positive bias). If the point is below the identity line, the ability is underestimated, i.e. the estimated ability is smaller than the true ability (negative bias).

At each condition, there was an error in the estimation and the points deviated somewhat from the identity line. For short tests the deviation from the identity line was larger. As the length of the test increased, the points got closer to the identity line. This means the estimates started to converge to the true values. Both Figure 5.15 and Table 5.3 show the fidelity coefficients (i.e. the correlation between true and estimated ability) for each test length condition. For shorter tests the correlations between true and estimated abilities were lower. As test length increased, the correlations steadily increased and approached 1.

Figure 5.15: Relationship between True and Estimated Ability by Test Length Condition

Bias. Figure 5.14 shows that the mean bias did not change across different conditions. On the other hand, it can be observed from Table 5.3 that the standard deviation of bias decreased as the test length increased. Figure 5.16 shows the decrease in the standard deviation of bias values visually. When the test length was short, the errors in the estimates were higher, as also shown in Figure 5.15. This is not surprising because longer tests are expected to increase the quality of the measurement.

Figure 5.16: Bias Distribution by Test Length Condition

Standard Error. It can be observed from Figure 5.14 that as the length of the test increased, the mean value of SE decreased. Table 5.3 shows the standard deviations of SEs for each test length condition. These values do not show a uniform pattern.

The distribution of SEs can be observed visually in Figure 5.17. In this figure, the dashed line corresponds to the standard error that is expected for a test with a reliability value of 0.9. For test length conditions longer than 20, the majority of the simulees had SE values lower than this threshold. The median of SEs decreased quickly between test lengths 5 and 30. After this there was a decrease, but this decrease was not large. In particular, there was a minimal decrease from test length 300 to 400, even though the test length increased by 100 items. This is evidence of the efficiency loss for long tests in CAT.
At each test length condition, the distributions of SEs had a strong positive skew. The reason behind this was the overlap between the ability distribution of the examinees and the item difficulty distribution of the item pool. For most of the examinees, the item pool had appropriate items. For a small portion of examinees who were at the extremes of the ability scale, the item pool did not have appropriate items. The SEs of these examinees were larger compared to the examinees that were closer to the center of the ability distribution. Since the number of examinees at the extremes was small compared to the number at the center, a positive skew occurred for the SE distribution.

Mean Squared Error. The average of the MSE values decreased as the test length increased (Figure 5.14). This decrease was rapid for shorter test lengths but the trend flattened after test length 100. Table 5.3 shows that the standard deviation of MSE values also decreased as the test length increased. Figure 5.18 shows the distribution of MSE for each test length condition. The spread of MSE values decreased as the test length increased. For test lengths longer than 100, there was almost no difference between the MSE distributions.

Exposure Rates. The exposure rate distribution of each test length condition is shown in Figure 5.19. In this figure, median exposure rates are shown with bold lines in the middle of the box-plots fitted to the exposure rates of each condition. The dashed lines show the 0.20 and 0.05 levels for exposure rates. It is recommended that the exposure rates of the items fall between these two dashed lines. For very small test length conditions the exposure rates were very low. Exposure rates of the items increased as test length increased, because the item pool size was the same for each test length condition.
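The proportions of items falling outside the two dashed lines can be computed with a few lines of code. The sketch below is illustrative only: the function name is hypothetical and the exposure rates are made-up values for a 400-item pool, not results from the simulations.

```python
def exposure_summary(rates, high=0.20, low=0.05):
    """Proportions of items whose exposure rate is above `high` or below `low`."""
    n = len(rates)
    over = sum(r > high for r in rates) / n
    under = sum(r < low for r in rates) / n
    return over, under


# Hypothetical 400-item pool: 7 heavily exposed items, 260 rarely used items,
# and 133 items between the two thresholds.
RATES = [0.60] * 7 + [0.01] * 260 + [0.10] * 133
OVER, UNDER = exposure_summary(RATES)  # 7/400 and 260/400
```

The two thresholds are parameters because, as with any cutoff, the appropriate values depend on the purpose of the test.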
Table 5.4 shows the means and standard deviations of the exposure rates in addition to the proportions of items with exposure rates larger than 0.20 and lower than 0.05. An exposure rate larger than 0.20 can be seen as a sign of a highly exposed item. It is desirable for a CAT to have item exposures lower than 0.2. On the other hand, very low exposure rates signify items that are not shown to the majority of the examinees. This is not desirable because underutilization of items decreases the efficiency of the item pools. In this sense, 0.05 can be seen as a cutoff value for items with low exposure rates. These cutoffs can change depending on the purpose of the test.

Figure 5.17: Standard Error Distribution by Test Length Condition

For test length conditions shorter than 35, almost none of the items were overexposed (i.e. exposure rates > 0.2). The small percentage of items (400 × 0.0175 = 7 items) that had larger exposure rates was due to the lack of exposure control in the item selection algorithm, as explained previously in Section 5.1.1.2. For test length conditions longer than 55 items, more than 10% of the items were overexposed. The proportion of underexposed items decreased as test length increased. For test length conditions shorter than 45 items, more than 10% of the items were underexposed.

Figure 5.18: Mean Squared Error Distribution by Test Length Condition

IPUI. Figure 5.14 shows that the mean value of IPUI decreased as the test length increased. Standard deviations of IPUI values increased as test lengths increased, as shown in Table 5.3. Clearly, the item pool was able to support shorter tests. The mean IPUI value for tests with 5 items was almost perfect. Also, the low standard deviations for shorter tests point out that most of the simulees saw appropriate items. Figure 5.20 shows the IPUI distribution of simulees visually. The number of examinees with low IPUI values increased as the test length increased. For test lengths larger than 100, none of the examinees had IPUI values larger than 0.98, and this maximum value decreased as test length increased.

Figure 5.19: Item Exposure Distribution by Test Length Condition
In Figure 5.20, each test length condition shows a negative skew. The reason behind this skewness is the overlap between the item difficulty distribution of the item pool and the ability distribution of the simulees. For the majority of the simulees, the item pool had items that were close to their true $\theta$ values. Consequently, the IPUI values were high for most of the simulees. The IPUI values of the simulees with true $\theta$ values at the extremes were lower. But since the number of simulees with extreme true $\theta$ values was low, the distribution of the IPUI became skewed. The amount of skewness decreased as the test length increased. For longer tests, the item pool failed to provide appropriate items even for the examinees that had true $\theta$ values close to 0. The IPUI values of all examinees started to decrease. IPUI has a lower bound of 0 but, unlike the upper bound of 1, this lower bound is not attainable in practice. Consequently, as IPUI decreased for all simulees, the IPUI values of the simulees with true $\theta$s in the middle of the ability distribution decreased more than those of the simulees at the extremes. This is the reason why the skewness of the IPUI distribution decreased for longer test length conditions.

Table 5.4: Item Exposure Analysis by Test Length Condition

Test Length   Mean Exposure   SD Exposure   Exposure > .20   Exposure < .05
5             0.0125          0.0709        0.0175           0.9450
10            0.0250          0.0712        0.0175           0.9050
15            0.0375          0.0707        0.0175           0.8200
20            0.0500          0.0704        0.0175           0.6500
25            0.0625          0.0704        0.0175           0.4750
30            0.0750          0.0704        0.0175           0.3275
35            0.0875          0.0702        0.0200           0.2150
40            0.1000          0.0695        0.0275           0.1050
45            0.1125          0.0692        0.0325           0.0500
50            0.1250          0.0690        0.0425           0.0250
55            0.1375          0.0692        0.0675           0.0250
60            0.1500          0.0694        0.1125           0.0200
65            0.1625          0.0702        0.1875           0.0175
70            0.1750          0.0704        0.2550           0.0175
100           0.2500          0.0822        0.8425           0.0125
200           0.5000          0.2041        0.9350           0.0050
300           0.7500          0.2836        0.9725           0.0000
400           1.0000          0.0000        1.0000           0.0000
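Because the IPUI distributions are negatively skewed, the choice of summary statistic matters: a long lower tail pulls the mean below the median. The sketch below illustrates this with ten made-up IPUI values; the numbers are hypothetical and are not taken from the simulations.

```python
import statistics

# Hypothetical IPUI values for ten simulees: most are near 1, two are much lower.
IPUI = [0.99, 0.99, 1.00, 0.99, 1.00, 0.98, 1.00, 0.99, 0.70, 0.55]

MEAN = statistics.mean(IPUI)      # pulled down by the two low values
MEDIAN = statistics.median(IPUI)  # stays with the bulk of the distribution
```

In this toy sample the median stays at the level of the typical simulee while the mean also reflects the few simulees who were poorly served by the pool, so the two statistics answer slightly different questions about item pool adequacy.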
One important difference between Figure 5.14 and Figure 5.20 is the use of summary statistics to show the differences between IPUI values. In Figure 5.14 the mean values of IPUI are shown. In Figure 5.20, on the other hand, the median values of IPUI are shown in addition to the quartile information via box plots. Since the IPUI distributions were negatively skewed for all conditions, the median of the IPUI values was always higher than the mean. This might have important consequences for how to evaluate the overall IPUI values. For example, for the simulation condition where test length was 70, the mean and median of the IPUI values were 0.958 and 0.994, respectively. According to the median value the item pool functioned almost perfectly; the mean of the IPUI values, on the other hand, indicates that item pool performance was not perfect.

Figure 5.20: IPUI Distribution by Test Length Condition

IPUI and Standard Error. In the previous sections the relationship between the mean values of IPUI and SE was negative. As the discrepancy between item pool and ability distribution increased in the first part of Research Question 1, the mean of the IPUI values decreased and the mean of the SE values increased. The same type of relationship was observed for the second part of Research Question 1, where the item pool size was investigated. As the size of the item pool increased, the mean of the IPUI values increased and the mean of the SE values decreased.

For the test length conditions, this trend changed. Figure 5.14 on page 74 shows that as test length increased, the mean values of both IPUI and SE decreased. In this case, even though increasing the test length decreased the quality of the item pool for that particular CAT design, the precision of the ability estimates increased. This is an important observation that shows how different CAT specifications interact with the outcomes of CAT differently.

The relationship between IPUI and SE for each test length condition is shown in Figure 5.21.
For individual simulees, the relationship between SE and IPUI is negative, similar to what was observed in previous sections (Figures 5.4 and 5.11 on page 59 and on page 70). Different from the previous sections, the curvilinear relationship between these two variables became more evident as the test length increased. Figure 5.21 also shows the correlations between SE and IPUI for each test length condition. The correlation between SE and IPUI was larger in absolute value for longer tests.

Figure 5.21: IPUI and Standard Error Relationship by Test Length Condition

5.1.2.2 Exposure Control

For the second part of Research Question 2, the effect of exposure control on the performance of the item pool and other CAT outcomes was investigated. It was hypothesized that adding more randomization to the item selection process to reduce item exposure will decrease the quality of the item pool as quantified by IPUI. This hypothesis was tested using 12 exposure control conditions. The conditions were (1) no exposure control, (2-11) randomesque with 3, 5, 7, 10, 13, 15, 20, 25, 50, 100 items and (12) total-random-selection of items. These exposure control conditions started from the condition where there was no randomization in the item selection and gradually increased the amount of randomization imposed on the item selection algorithm. The last condition basically administered items to examinees at random, without taking into account their previous answers. In the following pages these conditions will be referred to as: Rand-1, Rand-3, Rand-5, Rand-7, Rand-10, Rand-13, Rand-15, Rand-20, Rand-25, Rand-50, Rand-100 and Rand-Total.

An item pool with 250 items was generated. Item difficulties (b-parameters) of this item pool were generated from a normal distribution with mean 0 and standard deviation 0.7. The item difficulty distribution of the item pool is shown in Figure D.1 on page 183. The same item pool was used for each exposure control condition. 10,000 examinees were simulated from a normal distribution with mean 0 and standard deviation 1. The same set of examinees was used for each condition. The distribution of true θ is shown in Figure D.2 on page 184.
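The exposure control conditions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used in this study: the "best" items are taken to be the unused items whose b-parameters are closest to the current ability estimate, and the function names `generate_pool` and `randomesque_select` are hypothetical.

```python
import random


def generate_pool(n_items, sd, rng):
    """Draw item difficulties (b-parameters) from a normal distribution N(0, sd)."""
    return [rng.gauss(0.0, sd) for _ in range(n_items)]


def randomesque_select(theta_hat, difficulties, administered, n_candidates, rng):
    """Pick one item at random among the n_candidates best unused items.

    'Best' is assumed here to mean the smallest |b - theta_hat|.
    n_candidates = 1 gives no randomization (Rand-1); n_candidates equal to
    the pool size gives totally random selection (Rand-Total).
    """
    available = [i for i in range(len(difficulties)) if i not in administered]
    available.sort(key=lambda i: abs(difficulties[i] - theta_hat))
    return rng.choice(available[:n_candidates])


rng = random.Random(2015)
pool = generate_pool(250, 0.7, rng)

# First item for an examinee whose starting ability estimate is 0.
first_item_rand1 = randomesque_select(0.0, pool, set(), 1, rng)
first_item_rand10 = randomesque_select(0.0, pool, set(), 10, rng)
```

With `n_candidates=1` every examinee who starts at the same ability estimate receives the same first item, which is why one item can reach an exposure rate of 1 when no exposure control is applied.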
Test length for all of the tests was 20. There were no constraints on the item selection algorithm other than exposure control. The rest of the CAT specifications were the same as the common CAT specifications mentioned in Section 4.2.1.1 on page 37. For each condition, bias, SE, MSE, fidelity coefficient, exposure rates and IPUI values were calculated.

The results of the analyses are summarized in Figure 5.22. The mean values of bias, SE, MSE, IPUI and fidelity coefficient for each exposure control condition are shown in the figure. Table 5.5 shows the means and standard deviations of biases, SEs, MSEs and IPUIs for each exposure control condition, in addition to the fidelity coefficients. In the following pages each of these output variables is discussed separately.

Figure 5.22: Summary Statistics for Research Question 2 - Exposure Control

Relationship between True and Estimated Ability. Figure 5.23 shows the relationship between true ability (θ) and estimated ability (θ̂). The dashed lines in the figure are the identity lines (i.e. the y = x line). At each condition, there was an error in the estimation and the points deviated somewhat from the identity line. Towards the middle of the ability scale, there is a balance between overestimated and underestimated abilities. But towards the extremes of the ability scale, the balance starts to shift. Ability was overestimated for high ability examinees and underestimated for low ability examinees.

Figure 5.23 also shows the fidelity coefficients (i.e. the correlation between true and estimated ability) for each exposure control condition. For conditions Rand-1 to Rand-25, the fidelity coefficient did not change between conditions. But after Rand-25, the fidelity coefficient started to decrease.
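The outcome measures named above can be computed as in the following sketch. It assumes that bias is the signed error (estimated minus true ability), that MSE is the mean of the squared errors, and that the fidelity coefficient is the Pearson correlation between true and estimated ability; the data and the function names `pearson` and `summarize` are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarize(true_theta, est_theta):
    """Mean bias, MSE and fidelity coefficient for one simulation condition."""
    errors = [e - t for t, e in zip(true_theta, est_theta)]
    return {
        "bias": mean(errors),
        "mse": mean([err ** 2 for err in errors]),
        "fidelity": pearson(true_theta, est_theta),
    }


# Hypothetical values, not taken from the simulations.
true_theta = [-2.0, -1.0, 0.0, 1.0, 2.0]
est_theta = [-2.3, -0.9, 0.1, 0.8, 2.3]
stats = summarize(true_theta, est_theta)
```

Under these definitions the MSE equals the squared mean bias plus the variance of the individual errors. For the Rand-1 row of Table 5.5 this gives roughly 0.295 squared, or about 0.087, which agrees with the reported MSE.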
Bias. Figure 5.22 shows that the mean bias changes minimally across different conditions. This minimal change was not in one direction, so it can be treated as random error. Table 5.5 also shows that the magnitudes of these mean biases were almost 0. The standard deviations of the biases were almost the same for conditions in which randomization involved up to 25 items. After the Rand-25 condition the standard deviations of the biases started to increase systematically. This can also be observed visually from Figure 5.24. The fact that the mean biases did not change makes sense because (1) the item pool and the ability distribution overlap and (2) both high and low ability examinees were affected by randomization. From the standard deviations of the biases it can be said that the item pool can support randomization up to Rand-25. But after that, the amount of randomization started to affect the ability estimates of the examinees.

Table 5.5: Summary Statistics for Research Question 2 - Exposure Control

Condition   Bias             SE              MSE             IPUI            Fidelity
1           -0.000 (0.295)   0.287 (0.044)   0.087 (0.149)   0.961 (0.113)   0.9580
3            0.002 (0.297)   0.287 (0.046)   0.088 (0.150)   0.958 (0.117)   0.9582
5           -0.001 (0.296)   0.287 (0.046)   0.088 (0.151)   0.956 (0.118)   0.9586
7           -0.002 (0.297)   0.288 (0.047)   0.088 (0.151)   0.953 (0.120)   0.9581
10           0.004 (0.299)   0.289 (0.053)   0.089 (0.166)   0.949 (0.127)   0.9581
13           0.006 (0.297)   0.289 (0.053)   0.089 (0.156)   0.944 (0.130)   0.9587
15           0.000 (0.297)   0.289 (0.051)   0.088 (0.163)   0.941 (0.130)   0.9586
20           0.005 (0.297)   0.290 (0.052)   0.088 (0.148)   0.933 (0.135)   0.9588
25           0.001 (0.296)   0.291 (0.057)   0.088 (0.156)   0.926 (0.140)   0.9589
50           0.005 (0.309)   0.298 (0.072)   0.095 (0.187)   0.886 (0.163)   0.9563
100          0.001 (0.341)   0.317 (0.101)   0.116 (0.249)   0.796 (0.188)   0.9494
Random      -0.000 (0.420)   0.389 (0.150)   0.176 (0.375)   0.535 (0.180)   0.9307

Note. Numbers within the parentheses are standard deviations of each outcome. Condition: Exposure Control; SE: Standard Error; MSE: Mean Squared Error; Fidelity: Correlation between true ability and estimated ability.

Standard Error. As Figure 5.22 shows, the increase in the mean SE was almost nonexistent from condition Rand-1 to Rand-25. After Rand-25, there was an increase in the mean SE values. Table 5.5 also shows an increase, but a very minimal one for the low randomization conditions.
In addition, Table 5.5 shows that the standard deviation of the SEs increased steadily as the amount of randomization in item selection increased. Visual inspection of the spread in Figure 5.25 indicates this as well. In Figure 5.25 the dashed line marks the SE value for a test with 0.9 reliability, as explained in Section 5.1.1.2 on page 69. The majority of the simulees had SEs smaller than 0.316 for conditions Rand-1 to Rand-50; the upper whiskers of the box plots are lower than 0.316. For the total random selection condition, most of the simulees had SEs larger than this threshold. For the Rand-100 and Rand-Total conditions, the negative effect of randomization on the ability estimates is more visible.

Figure 5.23: Relationship between True and Estimated Ability by Exposure Control Condition

Mean Squared Error. Figure 5.22 shows that the pattern in the average MSE values was almost identical to the pattern of the SEs. The numbers in Table 5.5 indicate that for conditions between Rand-1 and Rand-25, the change in the average MSE values was minimal and not steady. But there was an increasing pattern for conditions from Rand-50 to Rand-Total. The change in the standard deviations of the MSE values was not steady between the Rand-1 and Rand-25 conditions. For example, the standard deviation of MSE for the Rand-20 condition was less than the standard deviation for the Rand-10 condition. On the other hand, the increase in the standard deviation of the MSE values is evident for conditions Rand-50 to Rand-Total. Figure 5.26 shows this spread visually. Especially for the Rand-Total condition, the increase in the spread is evident.

Figure 5.24: Bias Distribution by Exposure Control Condition

Figure 5.25: Standard Error Distribution by Exposure Control Condition

Exposure Rates. The exposure rate distribution of each exposure control condition is shown in Figure 5.27. In this figure, one observation with exposure rate 1 in the Rand-1 condition has been removed to show the spread better. Since there was no exposure control in the Rand-1 condition, the same item was selected as the first item for each simulee. For this reason the exposure rate of that item is 1. In the figure, the bold lines in the middle of the box plots show the medians of the exposure rates. The median exposure rate increased as the amount of randomization in item selection increased, except for the Rand-100 condition.
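The 0.316 threshold mentioned above for Figure 5.25 is consistent with the following relationship between reliability and SE. This is a sketch that assumes reliability is defined as one minus the ratio of the error variance to the variance of true ability, with the standard deviation of true ability equal to 1; the derivation in Section 5.1.1.2 is not reproduced here, and the function name `se_threshold` is hypothetical.

```python
from math import sqrt


def se_threshold(reliability, sd_theta=1.0):
    """SE value at which reliability = 1 - SE**2 / sd_theta**2 holds."""
    return sd_theta * sqrt(1.0 - reliability)


threshold = se_threshold(0.9)
print(round(threshold, 3))  # 0.316
```

Under this assumed definition the threshold scales with the standard deviation of true ability, so a population with less spread than 1 would need a smaller SE to reach the same reliability.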
It is important to note that since the ratio of the item pool size to the test length is constant for each condition, the mean exposure rates were equal for each condition (Table 5.6). It can be observed that the exposure control method worked. The number of highly exposed items decreased as the amount of randomization imposed on item selection increased. For the total randomization condition, all items were exposed at a similar rate.

Figure 5.26: Mean Squared Error Distribution by Exposure Control Condition

Table 5.6 shows the means and standard deviations of the exposure rates, in addition to the proportions of exposure rates that are larger than 0.20 and smaller than 0.05. Even though the means of the exposure rates were equal, the standard deviations of the exposure rates decreased steadily, except for condition Rand-100. A small standard deviation of exposure rates means that items are exposed uniformly across examinees. The standard deviation of exposure rates is important because, as Chen et al. (2003) showed in their paper (Equation 14, p. 134), there is a direct relationship between the variance of exposure rates and the average between-test overlap rate. In the CAT scenario here, where the mean exposure rates were equal between conditions, a reduction in the standard deviation of exposure rates directly translates to a reduction in the average between-test overlap rate. This means that for conditions where the standard deviation of exposure rates was lower, the proportion of items that two randomly selected examinees both see would be lower. This is an important test security issue for an operational high-stakes CAT.

Figure 5.27: Item Exposure Distribution by Exposure Control Condition
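The exposure statistics and their link to test overlap can be sketched as follows. The overlap identity in `average_overlap` is derived here by counting the items shared by every pair of examinees on fixed-length tests; it is meant to be consistent with the relationship attributed to Chen et al. (2003) above, but it is not copied from that paper. The tiny data set and all function names are hypothetical, and the population form of the standard deviation is an assumption.

```python
from statistics import mean, pstdev


def exposure_rates(administered, n_items):
    """administered holds one list of item indices per examinee."""
    counts = [0] * n_items
    for test in administered:
        for item in test:
            counts[item] += 1
    return [c / len(administered) for c in counts]


def exposure_summary(rates):
    """Mean, SD and tail proportions of the item exposure rates."""
    n = len(rates)
    return {
        "mean": mean(rates),
        "sd": pstdev(rates),
        "prop_above_.20": sum(r > 0.20 for r in rates) / n,
        "prop_below_.05": sum(r < 0.05 for r in rates) / n,
    }


def average_overlap(rates, test_length, n_examinees):
    """Average proportion of shared items over all pairs of examinees."""
    n_items = len(rates)
    variance = pstdev(rates) ** 2
    large_sample = n_items / test_length * variance + test_length / n_items
    return n_examinees / (n_examinees - 1) * large_sample - 1 / (n_examinees - 1)


# Three examinees, four items, test length 2; everyone starts with item 0.
administered = [[0, 1], [0, 2], [0, 3]]
rates = exposure_rates(administered, 4)
summary = exposure_summary(rates)
overlap = average_overlap(rates, test_length=2, n_examinees=3)
```

For a fixed-length test the mean exposure rate is simply the test length divided by the pool size (20 / 250 = 0.08 for the conditions in Table 5.6), so the variance of the exposure rates is the only term in the identity that changes across exposure control conditions.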
The percentage of items that were exposed to more than 20% of the examinees generally decreased as the amount of randomization in item selection increased. Starting with condition Rand-10, none of the items was exposed to more than 20% of the examinees. The percentage of items that were exposed to less than 5% of the examinees also generally decreased as the amount of randomization in item selection increased, most notably except for the Rand-100 condition. This indicates that all of the items were utilized to a greater extent and the item pool became more efficient. These results showed that, from the exposure control perspective, total randomization of item selection did the best job. But as the previous results regarding bias, SE and MSE suggested, this condition is not desirable from the other aspects of measurement practice.

Table 5.6: Item Exposure Analysis by Exposure Control Condition

Exposure Control   Mean Exposure   SD       Exposure > .20   Exposure < .05
1                  0.0800          0.0916   0.0400           0.4560
3                  0.0800          0.0566   0.0400           0.2800
5                  0.0800          0.0490   0.0480           0.2920
7                  0.0800          0.0439   0.0160           0.2520
10                 0.0800          0.0378   0.0000           0.1600
13                 0.0800          0.0343   0.0000           0.1280
15                 0.0800          0.0323   0.0000           0.1120
20                 0.0800          0.0276   0.0000           0.0520
25                 0.0800          0.0247   0.0000           0.0600
50                 0.0800          0.0180   0.0000           0.0040
100                0.0800          0.0216   0.0000           0.1000
Random             0.0800          0.0025   0.0000           0.0000

IPUI. The previous paragraphs showed that the mean values of bias, MSE and SE did not differ for exposure control conditions between Rand-1 and Rand-25. On the other hand, Figure 5.22 shows that IPUI decreased steadily as more randomization was imposed on the item selection mechanism. For conditions between Rand-50 and Rand-Total, the decrease is very clear. Table 5.5 shows this decrease numerically. The standard deviation of the IPUI values increased steadily as well, except for the Rand-Total condition.

Figure 5.28 displays the distribution of IPUI values for each exposure control condition visually. The increase in the spread of IPUI values is apparent from the box plots shown. The bold lines in the box plots show the median values of IPUI. Median values were generally larger than the average values given in Table 5.5. Consequently, the decrease in the median values cannot be seen from the graph.
Figure 5.28: IPUI Distribution by Exposure Control Condition

For the Rand-Total condition, even though the overall IPUI values were smaller, the majority of the examinees had IPUI values larger than 0.6. The main reason for this was the overlap of the item pool and the ability distribution. If there were a large discrepancy between the item pool and the ability distribution, then the overall IPUI values would be even lower.

IPUI and Standard Error. The relationship between SE and IPUI is shown in Figure 5.29. For each condition there was a negative relationship between SE and IPUI. The relationship was curvilinear and there was more spread in the values of IPUI compared to SE. The relationship between bias and IPUI is plotted in Figure D.3 on page 185. There was not a clear relationship between these two variables.

5.1.3 Research Question 3

In this research question, the use of IPUI as a diagnostic tool is evaluated. A hypothetical example of a state testing agency was introduced in Section 4.2.1.4 on page 44. This state testing agency had three plans to test. Information about the item pools of these plans was shown in Table 4.1 on page 46; the distributions of the item pools are in Figure 5.30. The first plan consists of an item pool of size 90 from three different content areas. The overall difficulties of each content area were different. In this plan, content balancing was imposed on the item selection. In the second plan, the same item pool used for Plan 1 was used. Different from Plan 1, content balancing was not imposed on the item selection. In Plan 3, the item pool consisted of 90 items from a single content area. No content balancing was imposed on the item selection for this plan either. Other than the differences between the item pools and the content balancing, the CAT specifications of the plans were the same.

The following paragraphs investigate these item pools using traditional CAT outcomes such as mean bias and mean SE at each true θ. These results are compared to the IPUI results, and the utility of IPUI as a diagnostic tool is discussed. Since this research question investigates the item pools, in the following pages the plans are referred to as item pools. So, Plan 1, 2 and 3 are referred to as IP-1, IP-2 and IP-3, respectively.
Bias. Figure 5.31 shows the mean bias values conditional on true θ values for each item pool. IP-1 and IP-2 performed similarly throughout the ability scale. IP-1 had overall higher bias compared to IP-2. This shows the effect of content balancing. IP-3 had similar mean bias on the positive side of the ability scale. On the negative side of the ability scale, its mean biases were much larger compared to the other item pools. As Figure 5.30 shows, IP-3 did not have any items with difficulty parameters less than 0.5. Clearly, this was reflected in the biases of the ability estimates. The bias distributions for each item pool condition shown in Figure E.1 on page 186 corroborate this observation.

Figure 5.29: IPUI and Standard Error Relationship by Exposure Control Condition

Figure 5.30: Item Pool Distributions of Proposed Test Plans

Standard Error. Figure 5.32 shows the mean standard error values conditional on true θ values for each item pool. IP-2 in plan 2 performed the best throughout the ability scale except at the positive end of the ability scale. The mean SE values for this item pool were the lowest in this interval. The mean SE for IP-1 of plan 1 had the same shape as that of IP-2 but had larger values. Close to the middle of the ability scale, the difference between these two item pools almost disappeared. The mean SE values of IP-3 were substantially large on the negative side of the ability scale. To the right of θ = 0.5, where this item pool was comparatively strong, it had the lowest SEs. Figure E.2 on page 187 shows the distributions of the SEs at each true θ value. This graph also corroborates the mentioned differences among the item pools.

Figure 5.31: Mean Bias Conditional on True θ for each Plan (Item Pool)

IPUI. Figure 5.33 shows the mean IPUI values conditional on true θ values for each item pool. The performance of IP-2 of plan 2 was better than the other item pools for θ values smaller than 1. For true θ values larger than 1, the performance of IP-3 was better. The mean IPUI values of IP-1 were lower than those of IP-2 for all true θ values. This clearly shows the effect of content balancing on the performance of the item pools.

The similarity of SE and IPUI can be observed from the figures. If the lines in Figure 5.33 were flipped along the x axis, the shapes would resemble the SE distributions in Figure 5.32.
But the differences between these two figures are also apparent. For example, even though the IPUI results show a large difference between the performances of the item pools for Plan 1 and Plan 2, this was not reflected in the SE for θ values between -0.5 and 0.5. For the true θ value 0, the absolute difference between the IPUI values of IP-1 and IP-2 was about 0.2, while the absolute difference between the mean SE values was almost non-existent, 0.014. On the other hand, for θ = 2.2, the absolute differences in both the mean IPUI values and the mean SE values were 0.17. So, there is not a one-to-one relationship between these two indicators.

Figure 5.32: Mean Standard Error Conditional on True θ for each Plan (Item Pool)

Figure 5.33 shows the weak points of the item pools for each plan. It is advisable for test developers to add more items around the θ values where IPUI values were low. But when there is content balancing, as in Plan 1, the advice of adding more items to the item pool where IPUI values are low might not be clear, because a test developer might ask "Add items from which content area?". To answer this question, the test developer might check the graph of the mean IPUI value at each test question for a given true θ.

Figure 5.33: Mean IPUI Conditional on True θ for each Plan (Item Pool)

Figure 5.35 shows the graph of the mean IPUI at each item number of the test. In this graph a test developer can observe whether the item pool's performance declines as the test proceeds for certain groups of examinees. If the item pool cannot provide appropriate items, the IPUI value will decrease as the test proceeds. Such information will guide test developers about how many items need to be added to the item pool to resolve this deficiency. For example, IP-2 was able to provide appropriate items to the examinees with true θ values between -1.4 and 1.4 throughout the test. But at the extremes of the ability scale, after the second item, the item pool failed to provide appropriate items. At around 13 items, the items did not match the ability estimates of the examinees with extreme true θ values.
IP-2 and IP-3 show a smooth declining trend from the beginning of the test to the end for all of the true θ values. But IP-1 shows a zigzag shape. The reason behind this is the content balancing. For example, for true θ value 1.4, the CAT algorithm on average gave an item with an IPUI value of 0.24 as the fourth item in Plan 1. But the next item had a higher IPUI value, and the sixth item had an even higher IPUI value. For item 4, even though there were many appropriate items available in the item pool, there was not a difficult item in content area 1 (arithmetic) to present to the examinees. Content balancing combined with a weak item pool distorted the motivation behind the CAT: providing the most appropriate items to the examinees. The problem of content balancing when there is an interaction between item difficulty and content areas was also described in Way et al. (2002):

    If some item types are inherently more difficult than others and the content specifications call for every examinee to be administered equal numbers of each, the algorithm will tend to choose the most difficult items from the easy content areas for the high-ability examinees. (p. 146)

Figure 5.34: IPUI Distribution Conditional on True θ for each Plan (Item Pool)

A test developer can look at Figure 5.35 and add items to the content areas which have low IPUI values. For item pools without content balancing (IP-2 and IP-3), this graph shows similar information to the previous two graphs. But this graph still gives the test developer an idea of approximately how many items should be added to the item pool in the vicinity of the θ to make it satisfactory.

Figure 5.36 shows the relationship between the intermediate θ estimates of each examinee and the IPUI value of the particular item administered at that intermediate estimate. The points are color coded based on the content of the item administered. This graph combines all of the examinees at each true θ condition. In this regard, the ability distribution of the examinees can be seen as a uniform distribution. Since there was no content balancing in the other plans, the color coding is relevant only for plan 1.

This graph is very helpful for test developers to see the weak spots of the item pool.
For plan 1, due to the content balancing imposed, IPUI values were low when the item selection algorithm selected an item from a content area where the intermediate θ estimates were outside the difficulty range of that content area. For instance, when an examinee's intermediate θ estimate was low and the item selection algorithm had to administer a trigonometry item, the IPUI value for that intermediate θ estimate would be low, even though the easiest trigonometry item available was administered to the examinee. As the test proceeds, the item pool is depleted of easy trigonometry items and even harder trigonometry items are administered to the examinee. This vicious cycle continues until the test ends. A test developer can observe this from Figure 5.36 and add easy trigonometry items to the item pool. Algebra items were adequate around the middle of the θ scale, but more algebra items were needed outside the interval [-1, 1].

Figure 5.35: Mean IPUI at each Item Number for Selected True θs

The same item pool performed better in plan 2, when there was no content balancing imposed. The item pool was adequate when the examinee's intermediate ability estimates were between -1.5 and 1.5. This item pool needs very easy and very difficult items.

The item pool for plan 3 performed well between 0.5 and 2. Outside this interval, this item pool failed to provide appropriate items to the examinees. More items are needed for this item pool with item difficulty parameters equal to the intermediate θ estimates where IPUI values were low.

Figure 5.36: The Relationship between Intermediate θ Estimate and IPUI

One limitation of this graph is its dependence on the quality of the θ estimates. For plan 3, even though there were many examinees that had true θ values between -3 and -2, almost none of these examinees had θ estimates that were lower than -2. In fact, as Figure 5.31 on page 99 shows, there was a large positive bias in the estimates for plan 3. This might cause a test developer to think that no items are needed with item difficulty parameter values smaller than -2. Adding easier items to this item pool would reduce the biases in the θ estimates and potentially highlight the need for much easier items.
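The content balancing effect described for plan 1 above can be illustrated with a toy selection rule. This is a sketch under stated assumptions, not the item selection algorithm of this study: the pool below is invented, items are chosen by the smallest distance between item difficulty and the intermediate θ estimate, and the function name `nearest_item` is hypothetical.

```python
def nearest_item(theta_hat, pool, content=None):
    """Return the (content, difficulty) item closest to theta_hat.

    If content is given, only items from that content area are eligible,
    which mimics a content balancing constraint.
    """
    eligible = [item for item in pool if content is None or item[0] == content]
    return min(eligible, key=lambda item: abs(item[1] - theta_hat))


# Hypothetical pool: an easy, a medium and a hard content area.
pool = (
    [("arithmetic", b / 10) for b in range(-25, -5, 2)]
    + [("algebra", b / 10) for b in range(-10, 11, 2)]
    + [("trigonometry", b / 10) for b in range(5, 26, 2)]
)

theta_hat = -1.5
free_choice = nearest_item(theta_hat, pool)
forced_choice = nearest_item(theta_hat, pool, content="trigonometry")
mismatch = abs(forced_choice[1] - theta_hat)
```

In this toy pool the unconstrained choice matches the estimate exactly, while the forced trigonometry item is two units too difficult even though it is the easiest trigonometry item available.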
Figure 5.37 shows the relationship between the intermediate θ estimates and the difficulty parameters of the items administered at those θ estimates. The points are colored to indicate the level of the IPUI values. The green points represent high IPUI values and the red points represent low IPUI values. The dashed red line is the identity (y = x) line. If the item pool provided appropriate items to the examinees, the points would fall on the identity line.

Figure 5.37: The Relationship between Intermediate θ Estimate and Item Difficulty

The graph for plan 1 shows the effect of content balancing. The points deviated from the identity line throughout the θ scale. This item pool was adequate between -2 and 3 for plan 2. The deviations started towards the extremes. This item pool was not adequate when the intermediate θ estimates reached the extremes of the θ scale. The item pool for plan 3 was adequate for only a small portion of the θ scale. This item pool was not adequate for the estimates outside the interval between 0 and 2.

5.2 Second Phase - Operational Data

In the second phase of the analysis, Research Questions 4 and 5 were investigated. These two research questions were answered using five operational item pools in addition to four generated item pools. Two of the generated item pools were ideal item pools generated for the specifications and examinee population of the NCLEX-RN examination. The other two item pools were generated from the operational item pool by removing some proportion of the items. In the following sections, the results of the item pool generation will be presented; the other two sections will present the results of Research Questions 4 and 5.
5.2.1 Ideal Item Pool Generation

Two ideal item pools were generated to be compared with the operational item pools. The first ideal item pool had fixed bin sizes with lengths 0.4. The middle bin was centered around 0 on the θ scale. The second ideal item pool had fixed bin sizes with lengths 0.8. The first item pool was designed to be more precise compared to the latter. Reducing the length of the bin sizes further would increase the precision, but in that case more items would be needed. If the item pool had more items than necessary, most of them would not be administered to the examinees. This would reduce the efficiency of the item pool. Previous research (He & Reckase, 2013) and previous unpublished item pool design studies for NCSBN showed that bin widths of 0.4 and 0.8 would be good choices.

In the creation of the ideal item pools, 10,000 examinees were simulated. Each simulee needed different items depending on her θ value and responses. The size of the ideal item pool grew as the number of simulees increased. After a number of simulees, there was a lot of overlap between the items needed by simulees, and the need for new items decreased unless a simulee had an extreme ability or irregular responses. Figure 5.38 shows the growth of the size of the ideal item pool for fixed bin size 0.4. The growth graph for fixed bin size 0.8 is in Figure F.1 on page 188. The first panel of the graphs shows the growth progress for each content area. The second panel of the graphs shows the overall growth progress of the ideal item pool.

Figure 5.38: Progress Plot for Ideal Item Pool with Fixed Bin Size 0.4

These growth progress graphs are important in the diagnosis of the bin-and-union method. If the lines do not converge to a single point, this means the number of examinees simulated was not enough for the creation of the ideal item pool. In that case, more simulees are needed to obtain a stable ideal item pool. For the ideal item pools generated for this dissertation, convergence was reached after 3000 simulees for the fixed bin size 0.4 item pool and after 2000 simulees for the fixed bin size 0.8 item pool.
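The growth behavior described above can be sketched with a simplified version of a bin-and-union procedure. The sketch assumes that each simulee's needed items are tallied into fixed-width bins, that the middle bin is centered at 0, and that the union keeps the largest count observed in each bin. It ignores content areas and the open-ended extreme bins, the input difficulties are invented, and the function names are hypothetical.

```python
from collections import Counter
from math import floor


def bin_index(b, width):
    """Index of the fixed-width bin containing b; bin 0 is centered at 0."""
    return floor((b + width / 2) / width)


def bin_and_union(needed_by_simulee, width):
    """Build a pool blueprint as the union of each simulee's binned item needs.

    Returns (blueprint, growth): blueprint maps a bin index to an item count,
    and growth[k] is the pool size after processing k + 1 simulees.
    """
    blueprint = Counter()
    growth = []
    for needed in needed_by_simulee:
        counts = Counter(bin_index(b, width) for b in needed)
        for idx, count in counts.items():
            # Union: keep the largest number of items any simulee needed here.
            blueprint[idx] = max(blueprint[idx], count)
        growth.append(sum(blueprint.values()))
    return blueprint, growth


# Hypothetical ideal difficulties requested by three simulees.
needs = [[0.0, 0.1, 0.5], [0.0, 0.45, 0.5, -0.3], [0.05, 0.5]]
blueprint, growth = bin_and_union(needs, 0.4)
```

In this toy run the third simulee adds nothing to the blueprint, so the growth sequence flattens. A flat tail of this kind is the convergence pattern that the progress plots are used to check.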
At the end of the simulations, the ideal item pool with bin size 0.4 had 3071 items. The ideal item pool with bin size 0.8 had 1652 items. The item difficulty parameter distribution by content area for the ideal item pool with fixed bin size 0.4 is in Figure 5.39. The same graph for the fixed bin size 0.8 is in Figure F.2 on page 189. The b-parameter distributions had almost the same shape for each content area. The number of items within each content area reflected the content area distributions shown in Table 4.2 on page 47. The peaks towards the extremes of the ability scale reflect the relatively large number of items within the last bins. The bins at the extremes captured all of the items that were needed on the scale from the last bin's boundary to infinity.

5.2.2 Item Pools Used in the Second Phase

In the second phase of this study, five operational item pools (named Op1, ..., Op5) were compared to four generated item pools. Two of the generated item pools were ideal item pools: the ideal item pool with bin size 0.4 (Ideal 0.4) and the ideal item pool with bin size 0.8 (Ideal 0.8). The last two item pools were generated by removing half of the operational item pool (Half-IP) and by removing two thirds of the operational item pool (One-Third IP).

The item difficulty distributions and the number of items within each item pool are in Figure 5.40. The dashed red lines in the histograms are the mean item difficulty values of each item pool. The number of items within each operational item pool was the same. The distributions of the operational items were almost the same as well. The majority of the items were close to the cut score. There were more easy items than difficult items within each operational item pool. Also, the operational item pools did not spread much to either extreme of the ability scale.

Figure 5.39: Item Difficulty Distributions by Content Area for Ideal Item Pool with Fixed Bin Size 0.4

The half item pool and the one third item pool had exactly half and one third as many items as the operational item pool, respectively. The shapes of the distributions of these two item pools were similar to the operational item pools.
The ideal item pool with bin size 0.4 had more than twice as many items as the operational item pools. This item pool had many items close to the middle of the ability scale. Different from the shapes of the operational item pools, this item pool had an almost equal number of easy and hard items. The spread of this item pool was also wider compared to the operational item pools.

The ideal item pool with bin size 0.8 had 180 items more than the operational item pools. It had fewer items around the middle of the ability scale compared to both the operational item pools and the Ideal 0.4 item pool. It had more items at the extremes. The boundaries of the bins are visible for this item pool. Close to the bin boundaries, there was a bump in the number of items. While generating the ideal item pools, the candidate items were generated from a standard normal distribution. If the items had been generated from a uniform distribution, the bumps close to the bin boundaries would have disappeared. But if a uniform distribution had been used as the generating function, the decision for the minimum and maximum value of this distribution would have been arbitrary. The parameters of the uniform distribution cannot be infinite; as a result, the ability scale would have been bounded arbitrarily.

5.2.3 Research Question 4

The aim of this research question is to observe the performance of IPUI for an operational CAT. In the previous research questions, the CAT scenarios were rather simplistic. Conditions differed in only one aspect from a very simple CAT algorithm. In this research question, a complex CAT algorithm was investigated. This CAT algorithm includes content balancing, exposure control, a two stage ability estimation method and a complicated stopping rule. Predicting the item pool performance for rather simple CAT scenarios might be feasible. But for a complex CAT algorithm such as the one investigated in this research question, predicting item pool performance would be difficult.

The same simulee sample was used for the simulations for each item pool condition. 10,000 simulees were generated from a normal distribution with a mean and standard deviation similar to those of the real examinees that took the tests. The θ distribution of the simulees is plotted in Figure G.1 on page 190.

Figure 5.40: Item Difficulty Distributions for Item Pools Used in Research Question 4
For each item pool, means of bias, SE, MSE, IPUI, decision accuracy and fidelity coefficient were calculated. These values for each item pool condition are given in Figure 5.41. As can be seen from the graph, except for the IPUI values there was almost no difference between the different item pools. Table 5.7 shows the numeric values of the means and standard deviations of these outcomes. Table 5.7 also shows the mean and standard deviation of the test length for each item pool condition. In the following pages, each of these outcome variables will be discussed separately.

Figure 5.41: Summary Statistics for Research Question 4

Table 5.7: Summary Statistics for Research Question 4

Item Pool      Fidelity Coefficient   Bias            SE
Op1            0.9184                 0.016 (0.238)   0.222 (0.055)
Op2            0.9192                 0.014 (0.239)   0.223 (0.055)
Op3            0.918                  0.010 (0.241)   0.222 (0.055)
Op4            0.9178                 0.015 (0.240)   0.222 (0.055)
Op5            0.9197                 0.011 (0.236)   0.222 (0.056)
Half IP        0.917                  0.014 (0.241)   0.223 (0.056)
One-Third IP   0.9191                 0.012 (0.238)   0.224 (0.056)
Ideal (0.4)    0.918                  0.012 (0.240)   0.222 (0.055)
Ideal (0.8)    0.9181                 0.011 (0.239)   0.222 (0.056)

Item Pool      MSE             IPUI            Test Length        Decision Accuracy
Op1            0.057 (0.092)   0.995 (0.007)   108.556 (72.913)   92.34%
Op2            0.057 (0.093)   0.995 (0.006)   108.524 (73.285)   92.59%
Op3            0.058 (0.093)   0.995 (0.007)   108.629 (72.866)   92.61%
Op4            0.058 (0.091)   0.995 (0.007)   109.007 (73.083)   92.68%
Op5            0.056 (0.092)   0.995 (0.007)   109.635 (73.565)   92.43%
Half IP        0.058 (0.093)   0.980 (0.019)   109.446 (73.228)   92.21%
One-Third IP   0.057 (0.091)   0.957 (0.030)   110.220 (74.077)   92.23%
Ideal (0.4)    0.058 (0.095)   0.999 (0.001)   108.718 (72.903)   92.22%
Ideal (0.8)    0.057 (0.091)   0.994 (0.002)   109.352 (73.183)   92.33%

Note. Numbers within the parentheses are standard deviations of each outcome. SE: Standard Error; MSE: Mean Squared Error.

IPUI. The size of the item pool did not make much difference in the other outcomes of the adaptive test except the exposure rates (which will be discussed later in this section). Since the IPUI is a direct measure of the quality of an item pool, the differences in the item pools are expected to be reflected in the IPUI values. In Figure 5.41, the only difference between the item pools was their IPUI values. The IPUI values of Half-IP and One-Third-IP were lower compared to the other item pools.

Table 5.7 shows no difference between the means and standard deviations of the IPUI values for the operational item pools. The Ideal-0.8 item pool had a similar mean IPUI value, but the standard deviation of its IPUI values was lower compared to the operational item pools. The Ideal-0.4 item pool had a mean IPUI value that is almost 1, and the standard deviation of its IPUI values was the lowest. The Half and One-Third item pool conditions had lower mean IPUI values and larger standard deviations. The quality of the item pools was clearly reflected in the statistics of the IPUI values.

Figure 5.42 shows the distributions of IPUI values visually. The operational item pools had similar distributions, but they had many examinees who had comparatively lower IPUI values. The Ideal-0.8 item pool had far fewer examinees with lower IPUI values. None of the examinees got an IPUI value less than 0.95 for this item pool condition. The Ideal-0.4 item pool performed the best. All examinees, except 5 of them, had IPUI values larger than 0.99. The spread of IPUI values was much larger for the Half-IP and One-Third-IP conditions. In fact, none of the examinees in the One-Third-IP condition had an IPUI value larger than 0.99.

Figure 5.42: IPUI Distribution for each Item Pool Condition

Relationship between True and Estimated Ability. Figure 5.43 shows the relationship between true ability (θ) and estimated ability (θ̂). The dashed lines in the figure are the identity lines (i.e. the y = x line). At each condition, there was an error in the estimation and the points deviated somewhat from the identity line. Close to the middle of the ability scale there is a squeeze in the spread around the identity line; estimated abilities approximated their true values better. The reason for this is the variable length test. Around the cut score, test lengths were longer compared to the examinees at the extremes of the ability scale. As indicated in Section 5.1.2.1 on page 74, longer tests increase the test precision and reduce the estimation error. Figure 5.43 also shows the correlations between true and estimated abilities for each item pool. There was almost no difference between the correlations of the item pools.
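Table 5.7 reports decision accuracy next to the other outcomes. One simple way to compute such a value is sketched below. The sketch assumes that a decision counts as accurate when the pass/fail status implied by the estimated ability agrees with the status implied by the true ability, which simplifies the operational stopping and decision rule. The cut score of 0, the data and the function name `decision_accuracy` are hypothetical.

```python
def decision_accuracy(true_theta, est_theta, cut_score):
    """Proportion of examinees whose pass/fail status agrees for true and estimated ability."""
    agree = sum(
        (t >= cut_score) == (e >= cut_score)
        for t, e in zip(true_theta, est_theta)
    )
    return agree / len(true_theta)


# Hypothetical true and estimated abilities for five examinees.
true_theta = [-1.2, -0.1, 0.05, 0.4, 1.3]
est_theta = [-1.0, 0.1, -0.02, 0.5, 1.1]
accuracy = decision_accuracy(true_theta, est_theta, cut_score=0.0)
```

In this toy data only the two examinees closest to the cut score are misclassified, which matches the intuition that decisions are hardest for examinees near the cut score.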
Figure 5.43: The Relationship between True θ and Estimated θ

IPUI and Estimated Ability. Figure 5.44 shows the relationship between IPUI and estimated ability. This graph is useful to see how sufficient the item pool was at different locations on the ability scale. The operational item pools had very high IPUI values around the cut score. But towards the extremes of the ability scale the spread of the IPUI values increased. Especially for high ability examinees, the item pool did not have enough appropriate items.

The ideal item pools had high IPUI values throughout the ability scale. Half-IP had lower IPUI values for examinees close to the cut score. Towards the extremes of the ability scale the IPUI values decreased even more, especially for high ability examinees. One-Third-IP performed even worse than Half-IP. Close to the cut score, the IPUI values make a dip; towards the extremes of the ability scale the IPUI values fell even further. Compared to the low ability examinees, high ability examinees had lower IPUI values.

Figure 5.44 showed that, except for the ideal item pools, the IPUI values of the examinees at the extremes of the ability scale were not as high as those of the examinees close to the cut score. From the perspective of test developers for this test, this might not be very problematic. The test is a licensure test with one cut score. As long as the decision accuracy is high for the test, the test is deemed to fulfill its purpose. For this reason, the item pool should have sufficient items around the cut score. Test length can go up to 250 for examinees close to the cut score. The item pool should support this long test. Figure 5.44 gives some evidence for this. Around the cut score, for each item pool except the Half-IP and One-Third-IP conditions, examinees had almost perfect IPUI values.

IPUI and Test Length. Additional evidence for the sufficiency of the item pool around the cut score comes from Figure 5.45, which shows the relationship between IPUI and test length. Except for the Ideal-0.4 item pool, there was a spread of IPUI values for tests that lasted 60 items. Examinees with test length 60 were away from the cut score. Decisions for these examinees were clear after 60 items. These examinees were the ones located outside of the (-0.5, 0.5) band of the ability scale in Figure 5.44.

Tests were longer for examinees whose estimated abilities were close to the cut score.
IPUI values for examinees who took tests longer than 60 items were close to 1 for the operational and ideal item pools. This is an important indicator of the quality of the item pool: these item pools were able to support very long tests of 250 items. It is crucial for an item pool to support long tests because high decision accuracy is most needed for the examinees who take them. These tests were long because it was difficult to make a decision for these examinees. On the other hand, for the Half-IP and One-Third-IP conditions, Figure 5.45 shows that IPUI values started to decrease as test length increased. This is more visible for One-Third-IP. For these item pools, as test length increased, the item pool failed to provide appropriate items.

Figure 5.44: The Relationship between IPUI and Estimated Ability for each Item Pool Condition

Figure 5.45: The Relationship between IPUI and Test Length for each Item Pool Condition

Bias

Figure 5.41 shows no difference between the mean biases of the item pools. Table 5.7 shows that mean bias values were a little above 0 for each item pool condition. The standard deviations of the biases were the same across item pool conditions as well. Figure 5.46 shows this visually: there was almost no difference between the bias distributions of the different item pools.

Figure G.2 on page 191 shows the relationship between estimated ability and bias. If there were no bias, the points would lie on the dashed line in the middle. The bias was small around the cut score, as explained above. However, there was a rather large correlation between estimated abilities and biases for each item pool condition, as shown in the text boxes. The regression lines fitted to the points also show this positive relationship. For high ability levels the biases were positive, meaning abilities were overestimated. For low ability simulees, abilities were underestimated.
Bias and IPUI

Figure 5.47 shows the relationship between bias and IPUI. A linear regression line was fitted to each plot to show the linear relationship between these two variables. From the graph it cannot be concluded that high absolute bias is associated with high IPUI. The expected relationship for this graph was an inverted-U shaped curve, where IPUI was high for low absolute bias values and low for high absolute bias values. This conclusion is in line with the findings of the previous research questions. On the other hand, the fitted lines show that there is a weak negative relationship between bias and IPUI. For the ideal item pools this relationship almost disappears; it is more evident for the Half-IP and One-Third-IP conditions. Figure 5.44 shows that both the operational and the shortened item pools were weak on the positive side of the ability scale. Figure 5.47 confirms the weakness of the item pools for high ability examinees. When the item pool distributions were balanced, as in the ideal item pools, there was no relationship between IPUI and bias.

Figure 5.46: Bias Distribution for each Item Pool Condition

Figure 5.47: The Relationship between IPUI and Bias for each Item Pool Condition

Standard Error

Figure 5.41 shows almost no difference between the mean SE values of the different item pool conditions. The numeric values in Table 5.7 also confirm this. Both the means and the standard deviations of the SE values were the same for each item pool condition. Only for the One-Third-IP condition did these values increase, and this increase was very small.

Figure 5.48 shows the distribution of SE values visually. The dashed line shows the threshold for a test with 0.9 reliability.¹ In this figure, all item pools except One-Third-IP had almost the same SE distributions. The number of simulees with high SE values was higher for the One-Third item pool condition, and one examinee had a very large SE compared to the rest. Also, the minimum SE values for this item pool were larger than in the other item pool conditions. In this simulation, the examinees with small SEs were close to the cut score. For these examinees, SEs were larger in the One-Third-IP condition. This can be interpreted as a weakness of that item pool.

¹ This was explained in Section 5.1.1.2 on page 69. This number was derived under the assumption that the standard deviation of the true abilities of the population is 1. In the simulations of Research Question 4, the standard deviation of the simulated true abilities was lower than 1. So, even though the dashed line gives some idea about the reliability of the test, caution is advised when interpreting these results using this threshold.

Figure 5.48: Standard Error Distribution for each Item Pool Condition

Figure G.3 on page 192 shows the relationship between estimated ability and SE. There was a clear relationship between the two. Towards the extremes of the ability scale the standard errors were higher; these were the examinees whose exams finished at the 60th item. There was a steep decrease in SEs between θ̂ = -0.5 and θ̂ = -0.25, and a steep increase between θ̂ = 0.25 and θ̂ = 0.5. Most of these SEs belong to examinees with test lengths between 61 and 249. The lowest SE values, between θ̂ = -0.25 and θ̂ = 0.25, belong to examinees with test lengths of 250. This pattern was the same for all of the item pools. Unlike the other item pools, however, One-Third-IP showed an even larger increase in SEs at the extremes of the ability scale. The most probable cause of this increase was the scarcity of items within the item pool. A similar trend can be seen to some extent for Half-IP, especially for high ability examinees.

The relationship between SE and test length is shown in Figure G.4 on page 193. There was a clear negative relationship between test length and SE for all item pool conditions. The correlation coefficients in the text boxes indicate a very strong linear relationship between these two variables, even though the relationship appears to be curvilinear.

IPUI and Standard Error

The relationship between SE and IPUI is shown in Figure 5.49.
There is no apparent relationship between SE and IPUI because this relationship is confounded with test length. In fact, this figure resembles Figure 5.45, only with the x-axis flipped. There is a direct relationship between test length and SE, as shown in Section 5.1.2.1 on page 77. For tests with 60 items, SE was high, and IPUI was generally lower for these short tests, as shown in Figure 5.45. As test length increased, SE decreased. For the operational and ideal item pools, the IPUI values were near 1 even for tests with 250 items. As a result, even though SEs decreased, IPUI values remained close to 1. For the One-Third-IP condition, IPUI values were lower when SEs were lower. The reason for this was the lack of sufficient items in the One-Third item pool to provide examinees with very long tests.

Figure 5.49: The Relationship between IPUI and Standard Error for each Item Pool Condition

Mean Squared Error

Figure 5.41 shows no difference in the average MSE values between item pools. The values in Table 5.7 also do not show any noticeable difference in either the means or the standard deviations of the MSE values. Visual inspection of the individual MSE values in Figure 5.50 does not show any visible difference between item pools.

Figure G.5 on page 194 shows the relationship between IPUI and MSE. There was a weak negative association between these two variables. It was expected that the examinees who did not take the most appropriate items (i.e., had low IPUI values) would also have high MSE values.

Exposure Rates

NCLEX-RN is a high stakes test, so controlling the exposure rates is very important for the security of the test. Figure 5.51 shows the exposure rate distribution for each item pool condition. The dashed lines in the figure show the 0.20 and 0.05 levels for exposure rates, which correspond to the recommended high and low exposure thresholds for items, respectively. The exposure rate distributions of all operational item pools had a negative skew. The operational item pools were very successful in keeping the exposure rates below 0.2. Nearly half of the items in the operational item pools had exposure rates lower than 0.05. The median exposure rates (shown by the bold lines in the boxplots) were all lower than 0.05. So there were many items that were administered to less than 5% of the examinee sample.
For Half-IP and One-Third-IP, the exposure rate distributions had a balanced spread and less skew compared to the other item pool conditions. However, many items had exposure rates larger than the 0.2 threshold. Especially for One-Third-IP, the majority of the items had exposure rates larger than 0.2.

Figure 5.50: Mean Squared Error Distribution for each Item Pool Condition

The performance of the ideal item pools was not perfect in terms of exposure control. None of the items had exposure rates larger than 0.18 for the Ideal-0.4 item pool (the ideal item pool with bin size 0.4), but about 75% of the items had exposure rates lower than 0.05. For the Ideal-0.8 item pool, many items were exposed to more than 20% of the examinees, but the majority of the items had exposure rates below 0.05.

Figure 5.51: Exposure Rate Distribution for each Item Pool Condition

Table 5.8 shows the means and standard deviations of the exposure rates, in addition to the proportions of exposure rates that are larger than 0.20 and smaller than 0.05. Mean exposure rates reflect the number of items in the item pool. Because NCLEX-RN is a variable-length test, there was not a functional relationship between mean exposure rates and item pool sizes as there is in fixed-length tests. The exposure rate outcomes of the operational item pools were very similar to each other: all had similar means and standard deviations. Only a few items were exposed to more than 20% of the examinees. Within the operational item pools, Op-5 had the most items with exposure rates larger than .2, which corresponds to 42 items (out of 1472 items). Almost half of the items were exposed to less than 5% of the examinees.

The exposure rate results were not perfect for any item pool. The operational item pools successfully kept the exposure rates below 0.2, but they had many underexposed items. The same is true for the Ideal-0.4 item pool, which had even more underexposed items (75% of the item pool); this item pool was not efficient in this sense. The Half-IP and One-Third-IP conditions had fewer underexposed items, 31% and 16% of the item pools respectively, but they had many overexposed items as well, 39% and 59% of the item pools respectively.
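The exposure statistics discussed above (mean, standard deviation, and the proportions of items above 0.20 and below 0.05) can be sketched as follows. This is a minimal illustration, not the code used in the study: the input format (one list of administered item indices per examinee), the function name `exposure_summary`, and the use of the sample standard deviation are assumptions made for the example.

```python
import numpy as np

def exposure_summary(administered, pool_size, high=0.20, low=0.05):
    """Summarize item exposure rates for one item pool.

    administered: list of lists; administered[i] holds the 0-based indices
                  of the items given to examinee i (hypothetical format).
    pool_size:    number of items in the pool.
    high, low:    over- and under-exposure thresholds (0.20 and 0.05 in the text).
    """
    n_examinees = len(administered)
    # Count each item at most once per examinee, then pool the counts.
    seen = np.concatenate(
        [np.unique(np.asarray(items, dtype=int)) for items in administered]
    )
    counts = np.bincount(seen, minlength=pool_size)
    rates = counts / n_examinees  # exposure rate of every item in the pool
    return {
        "mean": float(rates.mean()),
        "sd": float(rates.std(ddof=1)),  # sample SD: an assumption
        "prop_over": float(np.mean(rates > high)),
        "prop_under": float(np.mean(rates < low)),
    }

# Toy data: four examinees, a five-item pool.
summary = exposure_summary([[0, 1], [0, 2], [0, 1], [0, 3]], pool_size=5)
```

Items that are never administered enter the computation with an exposure rate of 0, so they count towards the under-exposed proportion.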
Table 5.8: Item Exposure Analysis by Item Pool Condition

Item Pool       Mean Exposure   SD Exposure   Exposure > .20   Exposure < .05
Op1             0.0737          0.0640        0.0034           0.5258
Op2             0.0737          0.0641        0.0163           0.5326
Op3             0.0738          0.0637        0.0170           0.5183
Op4             0.0741          0.0648        0.0068           0.5312
Op5             0.0745          0.0662        0.0285           0.5346
Half IP         0.1487          0.1108        0.3872           0.3139
One-Third IP    0.2254          0.1365        0.5869           0.1616
Ideal (0.4)     0.0354          0.0458        0.0000           0.7473
Ideal (0.8)     0.0662          0.0826        0.1362           0.6138

A further investigation of the exposure rates revealed the problems of each item pool. Figure 5.52 shows the relationship between exposure rates and item difficulties, grouped by content area. For each operational item pool, most of the exposed items had difficulties close to the cut score. This is expected because the test lengths were longer around the cut score and the mean of the examinee population was close to the cut score as well. For the operational item pools, most of the items with low exposure rates were at the extremes of the ability scale.

One-Third-IP had most of its overexposed items around the cut score as well, but the peak was around 0.3. There was a scarcity of difficult items. Most of the underexposed items for this item pool condition were at the lower end of the ability scale. The Ideal-0.4 item pool had many underexposed items at the extremes. None of the content areas had larger exposure rates than the others. One exception might be the Ideal-0.8 IP condition, where the first content area had somewhat higher exposure rates.

Figure 5.52: The Relationship between Exposure Rates and Item Difficulties Grouped by Content Area for each Item Pool Condition

Decision Accuracy

NCLEX-RN is a certification test, so it is crucial to have high decision accuracy. Table 5.7 shows the decision accuracy for tests based on each item pool.
For each item pool, decision accuracy was about 92%. Even the ideal item pool with over 3000 items did not have higher decision accuracy than One-Third-IP. Table 5.9 shows detailed information about the decision accuracy. Most of the examinees passed the test (around 66%). The decision was incorrect for about 8% of the examinees. The percentages of false negatives (examinees who incorrectly failed) were higher than those of false positives (examinees who incorrectly passed). Even though an incorrect decision is not good, for certification exams false negatives are usually preferable to false positives.²

Table 5.9: Decision Accuracy Analysis by Item Pool Condition

Item Pool       Fail (C.D.)   Fail (I.D.)   Pass (C.D.)   Pass (I.D.)
Op1             29.73%        4.15%         62.61%        3.51%
Op2             29.64%        3.81%         62.95%        3.60%
Op3             29.93%        4.08%         62.68%        3.31%
Op4             29.77%        3.85%         62.91%        3.47%
Op5             29.73%        4.06%         62.70%        3.51%
Half IP         29.69%        4.24%         62.52%        3.55%
One-Third IP    29.77%        4.30%         62.46%        3.47%
Ideal (0.4)     29.73%        4.27%         62.49%        3.51%
Ideal (0.8)     29.58%        4.01%         62.75%        3.66%

Percentages of simulees who failed or passed the test, and whether the decision was correct or incorrect.
C.D.: Correct Decision; I.D.: Incorrect Decision

² In reality, this preference depends on the cost of a false positive and a false negative decision.

5.2.4 Research Question 5

The aim of this research question is to demonstrate how IPUI can be used to diagnose the quality of an operational item pool and to guide test developers in building better item pools. Six item pools were investigated. Two of these were operational item pools, two were ideal item pools with bin sizes 0.4 and 0.8,³ and two were one third and half of the operational item pools (One-Third-IP and Half-IP respectively). For each item pool condition, the same NCLEX-RN CAT specifications as described in Section 4.2.2 on page 47 were used.

³ See Section 4.2.2.1 for the details of these item pools.
In the following paragraphs, the item pools are diagnosed using different CAT outcome variables. The last outcome variable investigated is IPUI. It is hypothesized that IPUI can unearth deficiencies in the item pools that the other outcome variables could not.

Bias

The relationship between true θ and the mean of the estimated θs for each item pool condition is shown in Figure H.1 on page 195. This graph shows two bumps close to the cut score. Away from the middle of the ability scale, there seems to be a perfect relationship between true θ and the mean of the estimated θs. This graph does not show any difference between the item pool conditions.

Figure 5.53 shows the mean bias at each true θ for the item pools. This graph is a more detailed version of Figure H.1. Each item pool had a similar bump around the cut score. The reason for these bumps was the termination rule. An examinee's test ended when the cut score was outside the confidence interval around the estimated ability. Since their tests ended, examinees did not have the opportunity to converge to their true θs, and consequently a bias occurred. For example, take the group of examinees with true θ = 0.25. The mean bias for these examinees was 0.10, which means their mean estimated θ was about 0.35 (= 0.10 + 0.25). When these examinees correctly answered several items, their estimated θs increased (above their true θs), and since the cut score fell outside the confidence interval their tests ended. If the CAT algorithm had administered more items to these examinees, their estimated θs would have decreased. But since the test ended, it did not have the opportunity to converge on a good estimate.

Towards the extremes of the θ scale, there appears to be a difference between item pools. The ideal item pools (especially the Ideal-0.4 item pool) had mean biases close to 0. The One-Third and Half item pools had negative bias at θ values close to -3 and positive bias at θ values close to 3. The bias in One-Third IP is more visible for positive true θ values larger than 2.
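The termination rule behind these bumps can be sketched as follows. This is a simplified illustration under stated assumptions, not the operational NCLEX-RN rule: the 95% confidence level is hypothetical, the cut score of 0 is read off the pattern in Table 5.10, and only the minimum (60) and maximum (250) test lengths come from the text.

```python
from statistics import NormalDist

def should_stop(theta_hat, se, n_items, cut=0.0,
                min_len=60, max_len=250, conf=0.95):
    """Confidence-interval stopping rule for a variable-length pass/fail CAT.

    theta_hat: current ability estimate
    se:        current standard error of the estimate
    n_items:   number of items administered so far
    cut:       cut score on the theta scale (assumed to be 0 here)
    conf:      confidence level of the interval (hypothetical value)
    """
    if n_items >= max_len:
        return True   # maximum test length reached
    if n_items < min_len:
        return False  # minimum test length not yet reached
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)
    lower, upper = theta_hat - z * se, theta_hat + z * se
    # Stop as soon as the cut score lies outside the interval.
    return not (lower <= cut <= upper)

# An examinee whose estimate has drifted up to 0.35 stops once the SE is small enough.
stops_early = should_stop(theta_hat=0.35, se=0.15, n_items=60)
keeps_going = should_stop(theta_hat=0.35, se=0.25, n_items=60)
```

The sketch makes the source of the bias visible: a run of correct answers that pushes the estimate away from the cut score also ends the test, so the estimate is frozen at a value that further items would likely have pulled back towards the true θ.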
Figure 5.53: Mean Bias Conditional on True θ for each Item Pool Condition

Figure H.2 on page 196 shows the distribution of bias at each true θ value for the item pool conditions. Some of the true θ values close to the cut score were omitted to make the graph more readable. For all item pool conditions except the ideal item pools, the variation in the biases increased towards the extremes of the ability scale. The increase in variation is more visible for the One-Third item pool condition.

Standard Error

Figure 5.54 shows the mean SE values at each true θ value for the different item pool conditions. SE values close to the cut score were lower because of the long tests these examinees took. For true θ values outside the interval [-1, 1], test lengths were almost always 60 (Figure H.5 on page 199). The differences between the item pools become clear for these test lengths. The One-Third item pool had the highest mean SE values, followed by Half-IP. There was almost no difference between the two operational item pools. The mean SE values for the operational item pools were lower than for Half-IP but higher than for the ideal item pools. The ideal item pools had the lowest SEs along the θ scale, and the Ideal-0.4 item pool had smaller SEs even at the very extremes of the ability scale.

The results for the SEs outside the interval [-1, 1] reflect the number of items within each item pool around these values. Clearly, the lack of a sufficient number of items affected the SE values. Close to the middle of the ability scale, there was almost no difference between item pool conditions. In Figure 5.54 it is difficult to see the differences between item pools for true θ values close to the cut score. Figure H.3 on page 197 shows the mean SE for true θ values between -0.7 and 0.7. There was almost no systematic difference between item pools within this interval. One-Third-IP had slightly higher values between -0.2 and 0.2, but there was almost no difference outside this interval.
Figure H.4 on page 198 shows the standard error distribution at each true θ value for the item pool conditions. Some of the true θ values close to the cut score were also omitted in this graph to make it more readable. The variation of the SEs was large at the middle and towards the extremes of the ability scale for the operational and reduced item pools. For the ideal item pools, the variation was large close to the middle of the ability scale but did not increase towards the extremes. This can be seen as an indicator of the lack of appropriate items at the extremes for the operational and reduced item pools. The test lengths of the examinees at the extremes were 60, so the only factor that affected the increase in SE should be the lack of appropriate items at the extremes.

Figure 5.54: Mean Standard Error Conditional on True θ for each Item Pool Condition

Mean Squared Error

Figure 5.55 shows the MSE values at each true θ value for all item pool conditions. Between true θ values of -1 and 1, there appears to be almost no systematic difference between item pools. For true θ values around -0.4 and 0.4, the MSE values dipped. Close to true θ = 0, the MSE values reached a local maximum. Towards the extremes of the ability scale the MSE values started to increase, except for the ideal item pools.

The systematic differences between item pools became clear for true θ values smaller than -2 and larger than 2. The One-Third item pool had the largest MSE, followed by Half-IP. The two operational item pools had MSE values between those of the ideal item pools and Half-IP. There was no systematic difference between the operational item pools. The Ideal-0.4 item pool had the smallest MSE values, which did not increase even at the very extremes of the θ scale. Since MSE can be seen as a function of bias and SE, this graph makes sense. The differences towards the extremes were affected by the differences in SEs among item pools (Figure 5.54).
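The remark that MSE can be seen as a function of bias and SE corresponds to the standard decomposition of mean squared error, written here for the ability estimate at a fixed true θ. Identifying the variance term with the squared SE is an assumption: the SE reported by a CAT is model-based and only approximates the conditional standard deviation of θ̂.

```latex
\[
\mathrm{MSE}(\hat{\theta}\mid\theta)
  = E\!\left[(\hat{\theta}-\theta)^{2}\mid\theta\right]
  = \underbrace{\mathrm{Var}(\hat{\theta}\mid\theta)}_{\approx\,\mathrm{SE}^{2}}
  + \underbrace{\left(E[\hat{\theta}\mid\theta]-\theta\right)^{2}}_{\mathrm{Bias}^{2}}
\]
```

Under this reading, wherever the conditional bias is similar across item pools, differences in MSE between pools are driven mainly by differences in SE, which is the pattern described for the extremes of the θ scale.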
Figure 5.55: Mean Squared Error Conditional on True θ for each Item Pool Condition

Decision Accuracy

Table 5.10 shows the decision accuracy at true θ values between -0.5 and 0.5 for each item pool condition. The tabled results are restricted to this range because the decision accuracy values outside this interval were virtually 100% for each item pool condition. Table 5.10 does not show any systematic difference between item pool conditions. Close to the cut score the decision accuracy was close to 50%. This makes sense because if an examinee has a true score equal to the cut score, then half of the time she will pass and a correct decision will be obtained. As true θ values deviated from the cut score, the decision accuracy improved.

Table 5.10: Decision Accuracy Conditional on True θ for each Item Pool

True Theta   Op1      Op2      Half IP   One-Third IP   Ideal (0.4)   Ideal (0.8)
-0.50        100.0%   100.0%   99.9%     99.9%          99.9%         100.0%
-0.45        100.0%   99.9%    100.0%    99.8%          99.7%         100.0%
-0.40        99.8%    99.9%    100.0%    99.7%          99.4%         99.6%
-0.35        99.4%    99.4%    99.5%     99.5%          99.7%         99.3%
-0.30        99.1%    98.2%    98.7%     98.5%          98.7%         98.5%
-0.25        97.0%    97.9%    96.6%     95.3%          96.7%         96.9%
-0.20        91.7%    92.4%    91.9%     92.2%          93.1%         93.9%
-0.15        88.5%    84.8%    87.4%     87.0%          84.0%         86.4%
-0.10        78.1%    76.0%    80.7%     78.1%          75.7%         76.6%
-0.05        64.0%    64.6%    65.3%     64.8%          63.5%         63.8%
 0.00        51.5%    47.4%    46.0%     51.4%          48.7%         48.8%
 0.05        63.9%    63.9%    63.8%     65.7%          67.2%         65.2%
 0.10        76.9%    80.3%    76.9%     78.1%          77.9%         76.9%
 0.15        86.9%    87.5%    86.2%     88.1%          85.8%         85.1%
 0.20        91.9%    93.3%    92.7%     93.8%          93.2%         93.8%
 0.25        96.7%    97.0%    96.6%     96.5%          96.1%         96.4%
 0.30        98.5%    98.6%    98.6%     97.9%          98.4%         98.5%
 0.35        98.7%    99.2%    99.5%     99.3%          99.5%         99.3%
 0.40        99.7%    99.9%    99.7%     99.8%          99.3%         99.7%
 0.45        99.9%    99.9%    99.9%     99.8%          100.0%        100.0%
 0.50        100.0%   100.0%   100.0%    100.0%         99.9%         99.9%

IPUI

Figure 5.56 shows the mean IPUI values at the true θ values for each item pool condition. This figure clearly shows differences between item pools. Each item pool had high IPUI values close to the cut score, but for examinees away from the cut score the IPUI values started to decrease. The Ideal-0.4 item pool performed the best: throughout the θ scale the values were close to 1, except at very extreme θ values. The Ideal-0.8 item pool had IPUI values above 0.99 between true θ values of -2 and 2; its IPUI values started to decrease towards the extremes.
The Ideal-0.4 item pool shows the effects of a lack of items for true θ values larger than 2.5 and smaller than -2.5. It is natural to ask why an ideal item pool does not have enough items for examinees at the extremes. The reason is simple. This item pool was ideal for a particular group of examinees, whose distribution resembles that of real examinees. Figure G.1 on page 190 shows the distribution of these examinees. None of the examinees in this distribution had true θs smaller than -2, and only a few had true θs larger than 2. As a result, the ideal item pools were not designed for examinees outside this interval. It is normal for an examinee with a true θ of -3 or 3 to have an IPUI value smaller than 1 for these ideal item pools.

The performance of the two operational item pools was almost the same. Between true θs of -1 and 1, the mean IPUI values were at or above .99. The mean IPUI values started to decrease towards the extremes. At θ = -2 the value decreased to .95, and at θ = -3 the mean IPUI value became .7. The performance of the operational item pools on the positive side of the θ scale was poorer than on the negative side. At θ = 2 the mean IPUI value dropped to .94, and at θ = 3 it was .68.

Figure 5.56: Mean IPUI Values Conditional on True θ for each Item Pool Condition

The Half and One-Third item pools showed the same decreasing patterns towards the extremes of the ability scale. Even though the performances of the other item pools between θ = -1 and 1 are not distinguishable, the performance of the One-Third item pool was clearly poorer in this interval than that of the other item pools.

Figure 5.57 shows the mean IPUI values for the item pools between θ = -0.7 and 0.7. The differences between item pools are clear in this figure. Both the operational and the ideal item pools had IPUI values larger than 0.99 in this interval. The mean IPUI values of the Ideal-0.4 item pool surpass those of the remaining item pools throughout this interval. The operational item pools are indistinguishable in this figure as well. The Ideal-0.8 item pool did not perform as well as the operational item pools close to the cut score, but, as the previous figure showed, its performance surpassed theirs towards the extremes. The Half item pool's IPUI values did not reach 0.99 for any of the examinees. This item pool performed comparatively poorly for examinees on the positive side of the ability scale.
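The conditional means plotted in Figures 5.56 and 5.57 amount to averaging the examinee-level IPUI values within each true θ value. A minimal sketch of that aggregation is given below; it takes already computed IPUI values as input (the index itself is derived in Section 3.2 and is not reproduced here), and the function name and the rounding of θ to two decimals are assumptions made for the example.

```python
from collections import defaultdict

def mean_ipui_by_theta(true_thetas, ipui_values, ndigits=2):
    """Average examinee-level IPUI values within each true theta value.

    true_thetas: true ability of each simulee
    ipui_values: IPUI value of each simulee (same order)
    ndigits:     rounding applied to theta before grouping (assumption)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for theta, ipui in zip(true_thetas, ipui_values):
        key = round(theta, ndigits)
        sums[key] += ipui
        counts[key] += 1
    # Return the groups ordered along the theta scale.
    return {key: sums[key] / counts[key] for key in sorted(sums)}

# Toy data: two simulees at each of two theta values.
means = mean_ipui_by_theta([-0.05, -0.05, 0.0, 0.0], [0.99, 0.97, 1.0, 0.98])
```

Running the same aggregation separately for each item pool condition gives one curve per pool, which is the form in which the pools are compared in the text.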
Even though Figure 5.57 shows a clear difference between the performances of the item pools, the practical importance of this difference may not be large. This issue will be discussed in more detail in the discussion section. Here it can be said that IPUI shows even the slightest differences between the item pools that the other CAT outcomes could not capture. Compared to the other outcome variables, IPUI clearly reflects the strengths and weaknesses of the item pools.

The results for the previous outcome variables were inconclusive. For example, at a true θ of -0.10 the mean SE of operational item pool 1 (Op-1) was lower than that of the remaining item pools, while at -0.05 it was higher. The same can be said for bias and MSE. For none of these outcome variables can one derive a clear conclusion about whether an item pool is better than the others for a particular true θ value. IPUI can quantify the sufficiency of each item pool at each true θ value. Whether the differences between the IPUI values have practical significance is another issue. The inconclusive results for bias, SE, MSE, and decision accuracy, especially close to the cut score, show that the IPUI differences detected did not have practically significant effects on the other outcomes of the adaptive test.

Figure 5.57: Mean IPUI Values Conditional on True θ around the Cut Score for each Item Pool

The previous two figures showed the mean values of IPUI. In addition, the distribution of IPUI gives important information about the performances of the item pools. Figure 5.58 shows the IPUI distribution at each true θ value. Some true θ values around the cut score were omitted in this figure to make it more readable. The boxplots shown for each true θ value include the median, the quartiles, and the spread of the IPUI distribution.

For each item pool condition, the spread of the IPUI values increased towards the extremes of the ability scale. The operational item pools performed well close to the cut score; towards the extremes, the spread of their IPUI values increased. The performance of the reduced item pools was clearly worse than that of the operational item pools. The One-Third item pool did not perform well even close to the cut score, where the spread of its IPUI values was still large. The performance of Ideal-0.4 was the best: even at the extremes, examinees could get the most appropriate items.
Figure 5.58: IPUI Distribution Conditional on True θ for each Item Pool

CHAPTER 6

DISCUSSION

6.1 Summary of the Results

The aim of this study was to develop a method to evaluate the quality of item pools for adaptive tests. Current methods for evaluating the adequacy of item pools were discussed in Section 2.3.6. These methods fall short as the CAT design gets complicated. The histogram of the item parameters of an item pool might show that the item pool is sufficient, but changing a specification of the CAT might make the same item pool insufficient for the purpose of the test. To solve this problem, a new index evaluating the quality of item pools was developed. This index is called the Item Pool Utilization Index (IPUI; see Section 3.2 on page 27 for the derivation of this index). IPUI quantifies the amount of deviation of an item pool from a perfectly optimum item pool. This theoretical optimum item pool satisfies each specification of the test and provides an optimum item in terms of information at each stage of the test for every examinee.

The utility of this newly developed index was investigated through five research questions (see Section 4.1 on page 36). The first three research questions used simulated data and the last two used operational data.

Research Question 1 investigated whether IPUI is sensitive to changes in the quality of the item pools. Item pool quality was operationalized by (1) the discrepancy between the item difficulty distribution of the item pool and the ability distribution of the examinees, and (2) the item pool size. For the first part of Research Question 1, 13 discrepancy conditions were tested. The item pool was the same for all conditions. The item difficulty parameters of the item pool were generated from the standard normal distribution. The examinee ability distributions were generated from a normal distribution whose mean ranged from -3 to 3 in intervals of 0.5. The results (Figures 5.1 and 5.3 on page 54 and on page 58) showed that IPUI was sensitive to the discrepancy between the item pool and the ability distribution. Increasing discrepancy affected the other outcomes of the adaptive test as well, such as SE. But IPUI had more variability in its evaluation of the shortcomings of the item pool compared to SE, and the relationship between SE and IPUI was not linear (Figure 5.4 on page 59).
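The design of the first part of Research Question 1 can be sketched as follows: one item pool with standard normal difficulties, and 13 examinee groups whose ability means run from -3 to 3 in steps of 0.5. The pool size, the number of examinees per condition, the unit standard deviation of the ability distributions, and the function name are hypothetical choices for the illustration; only the standard normal difficulties and the grid of means come from the text.

```python
import numpy as np

def simulate_conditions(n_items=300, n_examinees=1000, seed=0):
    """Generate one item pool and 13 ability distributions (discrepancy conditions).

    n_items, n_examinees and the ability SD of 1.0 are assumptions;
    the text specifies only N(0, 1) difficulties and means from -3 to 3 by 0.5.
    """
    rng = np.random.default_rng(seed)
    # The same item pool is used in every condition.
    difficulties = rng.standard_normal(n_items)
    # 13 condition means: -3.0, -2.5, ..., 2.5, 3.0
    means = np.linspace(-3.0, 3.0, 13)
    abilities = {
        float(m): rng.normal(loc=m, scale=1.0, size=n_examinees)
        for m in means
    }
    return difficulties, abilities

difficulties, abilities = simulate_conditions(seed=1)
```

The condition with mean 0 matches the item pool most closely; the further the ability mean moves from 0, the larger the discrepancy between what the pool offers and what the examinees need.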
The second part of Research Question 1 investigated whether IPUI is sensitive to changes in the size of an item pool. Eleven different item pools ranging from 20 items to 1000 items were investigated. As the number of items in the item pool increased, the mean value of IPUI increased as well (Table 5.2 on page 67). After 300 items, the increase in IPUI values was minimal.

Additionally, Research Question 2 looked at the sampling distribution of IPUI. For each item pool size condition, 25 replications were performed. Within each condition, each replication had the same item pool size, and the item difficulty parameters were generated from the same distribution. Table 6.1 shows the mean and standard deviation of the mean IPUI values aggregated by replication. The variability of the mean IPUI values was larger for small item pool sizes. The variation of the mean IPUI values had two sources: first, the variation due to sampling different item pools of the same size for each condition; second, the variation due to IPUI.

Table 6.1: Means and Standard Deviations of Mean IPUIs of the Replications

Item Pool Size   Mean of Mean IPUI   Standard Deviation of Mean IPUI
20               0.5875              0.02528
40               0.8348              0.00935
60               0.9028              0.00803
80               0.9345              0.00928
100              0.9494              0.00843
200              0.9793              0.00448
300              0.9880              0.00210
400              0.9906              0.00217
500              0.9924              0.00145
750              0.9953              0.00090
1000             0.9965              0.00096

The specifications of CATs can get rather complex. An item pool that works perfectly for one set of specifications might not perform in a similar way for another set of specifications. Research Question 2 investigated whether IPUI can detect the adequacy of the same item pool for different CAT specifications. Two CAT specifications were investigated: test length and exposure control.

The first part of Research Question 2 investigated 18 different test lengths ranging from 5 to 400.¹ The results showed that as test length increased, the value of IPUI decreased (Figure 5.20 on page 83). The IPUI distributions indicated that this item pool can support a test length of 50 for the majority (75%) of the examinees.

¹ The size of the item pool was also 400.
The second part of Research Question 2 investigated 12 exposure control conditions. These conditions ranged from no exposure control to random selection of items. IPUI detected even small differences between conditions where other CAT outcomes showed no difference (Figure 5.22 on page 87). The results of Research Question 2 showed that IPUI is very sensitive to even small modifications of the test specifications.

Research Question 3 was designed to show the utility of IPUI as a diagnostic tool for item pool evaluation. Three test plans with different specifications were investigated. The first two plans had the same item pool, but in the first plan content balancing was imposed on item selection. In the second plan, there were no constraints on the item selection algorithm. The third plan had an item pool consisting of rather difficult items. The IPUI results clearly showed the weak points of the item pool for each condition. Further diagnostic information provided by different graphs of IPUI showed in detail the weaknesses of the item pool for particular test specifications.

Research Questions 4 and 5 used operational data provided by NCSBN to show the utility of IPUI. Research Question 4 compared nine different item pools under the same specifications as the NCLEX-RN exam. Five of the item pools were operational item pools previously used in NCLEX-RN exams. Two item pools were ideal item pools generated for the specifications of the NCLEX-RN exam. Two item pools were generated by randomly reducing the operational item pool to half and to one-third of its size. The same examinee distribution as the real examinee population was used for the comparisons. The results of Research Question 4 showed that all of the item pool designs performed well for the examinee group. Among the outcomes of the CAT, only IPUI detected the weaknesses of the half and one-third item pools (Figure 5.41 on page 113). IPUI detected that these two item pools were depleted of appropriate items towards the end of the test for examinees close to the cut score (see Figure 5.45 on page 119). The operational and ideal item pools were strong even for the long tests.
Research Question 5 was similar to Research Question 3. The aim was to show the utility of IPUI as a diagnostic tool for an operational CAT. The results showed that the operational and ideal item pools were very robust close to the cut score. Towards the extremes of the ability scale, the operational item pools started to weaken (see Figure 5.56 on page 137), but this had no effect on the decision accuracy (see Table 5.10 on page 136). IPUI detected even slight differences between item pools close to the cut score, where each item pool was comparatively strong (see Figure 5.57 on page 139).

6.2 Practical Uses of IPUI

The results of the study showed that IPUI was very sensitive to changes in the quality of item pools and to changes in test specifications that affect the utilization of the item pool, and that it was a useful diagnostic tool for improving item pool quality. In this section the practical uses of IPUI are discussed.

6.2.1 Quantification of the Item Pool Quality

Test developers build item pools for adaptive tests very frequently. Users of these tests need to know the quality of the tests that are administered. Since the quality of adaptive tests is strongly tied to the quality of the item pools (Flaugher, 2000), test developers have to make sure that the quality of their item pools is adequate for the purposes of the test.

Four general methods are used to get information about an item pool: (1) item pool size, (2) descriptive statistics for item pool parameters (i.e., mean, standard deviation, histogram, etc.), (3) the item pool information function, and (4) outcome variables of CAT simulations. Each of these existing methods has its own shortcomings, as explained below.
Item pool size, i.e., the number of items in the item pool, is an important indicator of the adequacy of the item pool. Even though general rules exist for the size of an item pool, such as that the item pool size should be twelve times the length of the adaptive test (Stocking, 1994), there is no one-size-fits-all rule for an adequate item pool size. An item pool which is perfectly adequate for one examinee group might not be adequate for another examinee group (see the first part of Research Question 1 in Section 5.1.1.1 on page 53). A pool of 100 high quality items might perform better than a pool of 200 low quality items (Xing & Hambleton, 2004). As a result, in addition to the size of the item pool, information about the quality of the items is necessary to evaluate the item pool. The quality of items can be measured by the item parameters.

The descriptive statistics for the item pool parameters are useful for seeing the overall picture of the item pool. Common descriptive statistics are the means and standard deviations of the item parameters, or the histograms used to visualize the item parameters. The distribution of the item difficulty parameters in particular can be helpful for seeing whether there is a discrepancy between the item pool and the ability distribution of the examinee group. For instance, a visual comparison of Figures A.1 and A.2 on page 167 and on page 168 gives an idea of the ability distribution for which the item pool is most appropriate. But operational CATs are rarely as straightforward as the ones in Research Question 1. There are many constraints on the item selection algorithm, which makes it difficult to evaluate the sufficiency of the item pool by simply inspecting the descriptive statistics of the item parameters. For example, plans 1 and 2 in Research Question 3 used the same item pool (Figure 5.30 on page 98). But due to the content balancing imposed on the item selection algorithm in plan 1, the performances of the item pool under these two plans differed a lot (see Figures 5.32 and 5.34 on page 100 and on page 102).
Item pool information functions are widely used to evaluate the adequacy of an item pool for a particular test purpose. Xing and Hambleton (2004) used item pool information functions to compare different item pools (see Figure 2.1 on page 24). In addition, item pool information functions are widely used to build item pools for adaptive tests (van der Linden, Veldkamp, & Reese, 2000, 2006; Belov & Armstrong, 2009). But information functions suffer from the same disadvantage explained in the previous paragraph. Two item pools might have the same information functions (as in plans 1 and 2 of Research Question 3) and yet perform differently.

Comparing item pools using the outcomes of the CAT, such as the biases and SEs of the ability estimates and the exposure rates of the items, is a common approach as well (He & Reckase, 2013; Thompson & Weiss, 2011). In fact, in all of the research questions investigated, such CAT outcomes were used along with IPUI to compare the item pools. The outcomes of the CAT give valuable information about the performance of the item pools, but none of them gives a direct way to evaluate the quality of an item pool. Figure 1.1 on page 2 is a very good example of this. This figure shows that the item pool provided very appropriate items to Examinee 3 but not to Examinee 4. The IPUI values of Examinees 3 and 4 were 0.98 and 0.273 respectively, yet the SE of Examinee 3 was higher than the SE of Examinee 4. If one judges the performance of the item pool for these two examinees according to SE, an inaccurate picture can be drawn. Such examples can be found for the other outcomes of the CAT.

As was shown in this study, IPUI is a direct way to measure the adequacy of an item pool at both the examinee level and the examinee group level. A test developer can easily quantify the adequacy of an item pool by using IPUI without resorting to indirect ways of evaluating item pools.
6.2.2 IPUI in Optimal Test Assembly

The special issue of Applied Psychological Measurement in September 1998 was about optimal test assembly. van der Linden (1998b) introduced the concept and discussed different methods to assemble a test optimally. He divided the test specifications into two broad areas, constraints and objectives. Constraints are the test or item attributes that have an upper and/or lower limit to be met. For example, a minimum or maximum number of items to be administered, the number of words in the test, the number of items with figures, etc. are among the constraints. The objectives require a test attribute or a function of item attributes to reach a minimum or maximum. Examples include maximization of test information within a certain range, maximization of the test validity, maximization of the decision accuracy and minimization of the standard error of ability estimates.

These constraints and objectives generally lead to an optimization problem: optimization of an objective in the presence of the constraints. IPUI can be used as a constraint or as an objective in these optimization problems. As a constraint, a test developer might require that none of the examinees in an adaptive test should have an IPUI value less than a certain value. On the other hand, IPUI can serve as an objective of a test assembly problem to be maximized. An item pool that has the maximum mean IPUI value and satisfies all of the constraints of the test can be chosen as the item pool.

6.2.3 IPUI as a Quality Control Tool

Tests, especially high-stakes ones, should conform to industry standards as described in the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014). Test developers constantly monitor the quality of the tests they administer. In addition to adhering to high-quality test development techniques (Schmeiser & Welch, 2006), developers of CATs should make sure that their tests are performing as intended.
For CATs, test developers have less control over the particular test an examinee gets. Hence, the need for quality control is high. One of the methods test developers use is showing paper-and-pencil copies of various CATs to expert test specialists (Eignor et al., 1993). These experts examine the tests and check whether the tests adhere to the test specifications.

IPUI can be used as an additional tool for checking the quality of the individual tests that are administered to the examinees. If an examinee's IPUI value drops below a certain level, the examinee might be flagged. For instance, Figure 5.43 on page 116 showed that for the One-Third item pool, examinees with ability estimates larger than 2 had relatively lower IPUI values. If this is not acceptable considering the test purpose, the test developer can take appropriate precautions. A low IPUI value signifies that the item pool failed to provide appropriate items to this examinee. The test developer can either improve the item pool or change the test specifications to improve the utilization of the item pool.

6.2.4 IPUI as a Diagnostic Tool

As indicated in the previous sections, IPUI can be used to measure the quality of the item pool and as a tool of quality control. After finding out that the item pool is not performing well, the next step for a test developer is to diagnose the deficiencies of the item pool and improve the item pool in such a way that it provides each examinee appropriate items.

The diagnostic utility of the IPUI was investigated in Research Questions 3 and 5. The results of these research questions showed that IPUI can show the deficiencies of the item pools and guide test developers to add suitable items that address these deficiencies. For example, in Research Question 3, it is difficult to judge from the SE graph in Figure 5.32 on page 100 the properties of the items that should be added to the item pool in plan 1. The IPUI vs. item number graph in Figure 5.35 on page 104 showed that the cause of the low performance of this item pool was the content balancing restrictions. Further, Figure 5.36 on page 105 showed the approximate item difficulty values needed from each content area to improve the item pool. Similar graphs that use other test constraints can be helpful to diagnose and improve the item pools.
6.2.5 IPUI at Individual and Group Level

The previous sections explained four different ways the IPUI can be used in practice. It is believed that IPUI can be used as an outcome variable for CAT just like bias, SE, MSE, exposure rate or overlap rate. Test developers can use IPUI at two levels: at the group level or at the individual level.

At the group level, IPUI is an indicator of the adequacy of an item pool for a given set of test specifications and examinee group. The group-level statistic can be the mean or median of the IPUI. These statistics will differ by the examinee group. If the item pool is appropriate for most of the examinees, the mean of IPUI will be large. If the item pool is appropriate for only a small portion of the examinees tested, the mean of IPUI will be small.

Practitioners can use these summary values to evaluate the quality of their item pools over time. If the examinee group does not change dramatically from year to year, the test developer can set a standard minimum for the mean IPUI value. Over the years each item pool developed can be compared to this standard. This will ensure the fairness of the test across years. Research Question 2 showed that a change in test specifications can affect the adequacy of the item pool. Using a standard like this will allow test developers to implement new test specifications while ensuring that the adequacy of the item pool is still on par with the item pools used in the previous testing windows. For instance, if the testing agency decides to increase the security of the item pool by implementing a new exposure control method, test developers can use IPUI to ensure that the item pool is still adequate after such a change.
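The group-level check described above can be sketched in a few lines of Python. This is only an illustration: the function name, the standard of 0.90 and the IPUI values for the two testing windows are all hypothetical and are not taken from the study.

```python
import statistics


def summarize_group_ipui(ipui_values, minimum_mean):
    """Summarize examinee-level IPUI values and compare the mean to a standard."""
    mean_ipui = statistics.fmean(ipui_values)
    return {
        "mean": mean_ipui,
        "median": statistics.median(ipui_values),
        "sd": statistics.pstdev(ipui_values),
        "meets_standard": mean_ipui >= minimum_mean,
    }


# Hypothetical examinee-level IPUI values from two testing windows.
previous_window = [0.97, 0.95, 0.96, 0.93, 0.94]
new_window = [0.99, 0.98, 0.97, 0.95, 0.60]  # one poorly served examinee

STANDARD = 0.90  # hypothetical minimum for the mean IPUI

for label, values in [("previous", previous_window), ("new", new_window)]:
    print(label, summarize_group_ipui(values, STANDARD))
```

In the second window a single poorly served examinee pulls the mean below the standard while the median stays high, so reporting the median and the standard deviation next to the mean helps show where the shortfall comes from.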
A second use of IPUI is at the individual level. A test developer can set a minimum IPUI value for each examinee so that the item pool is adequate for each test taker. In practice, item pools cannot provide appropriate items to the examinees at the extremes of the ability scale. For instance, the operational item pools in Research Question 4 did not provide appropriate items to some examinees with estimated θ values smaller than -2 or larger than 2 (Figure 5.44). For this operational test, the inadequacy of the item pool for these examinees did not affect the decision about the examinees. On the other hand, if the purpose of the test were to measure each student sufficiently well, the test developer could make the item pool broad enough to provide appropriate items to the examinees at the extremes. This would ensure test fairness at the individual level. An item pool that provides appropriate items to some group of examinees but not to another group will undermine the fairness of the test. Using a minimum value for IPUI as a benchmark, test developers can ensure the quality of service for each individual examinee.

From the perspective of examinees, they want a fair instrument that measures their ability as precisely as possible. If an examinee gets a test with a high IPUI value, this means that at least the item pool portion of the CAT worked well.

As discussed in Section 6.4.2 on page 156, this study does not provide a recommended value for IPUI. But this does not preclude test developers from setting their own standards and comparing the performances of the item pools using IPUI.

6.3 Implications

6.3.1 The Robustness of CAT Procedures to Weak Item Pools

The results of the study showed that IPUI was very sensitive to changes in the quality of the item pool. When a CAT algorithm administers sub-optimal items, IPUI detects them. The other outcomes of the CAT were not as sensitive to sub-optimality of the item pool unless the item pool underperformed significantly. For example, mean bias was rarely affected by the inadequacy of the item pool unless there was a large discrepancy between the item pool and the ability distribution (see Figure 5.31 on page 99).
In Research Question 4, the fidelity coefficient, the mean bias, the decision accuracy and the mean SE values were almost the same across the different item pool conditions (see Table 5.7 on page 114). Even though IPUI indicated a performance difference among item pools, this did not reflect on the other outcomes. The results of the other research questions indicated this too. This confirms the robustness of the CAT procedures to inadequate item pools.

The robustness of maximum likelihood ability estimation in a CAT was shown by Chang and Ying (2009). They found that even for an item bank with a limited capacity, the maximum likelihood estimates of θ were consistent and asymptotically normal.

For the Rasch model, the robustness of the CAT procedures to the selection of sub-optimal items was observed by other researchers too. Bergstrom et al. (1992) performed a study where they modified the item selection algorithm to select items with 0.5, 0.6 and 0.7 probabilities of correct response. The effect of these modifications on the precision of the ability estimates was minimal. Way (1998) concluded that "...the adaptive nature of CAT plays a surprisingly minor role when the Rasch model is used. This contradicts the commonly held assumption that CAT will significantly improve measurement precision through targeting item selection to each individual" (p. 21). The results of this study corroborate the findings of these researchers.

6.3.2 Summary Statistics for IPUI

When evaluating the IPUI for a group of examinees, a test developer has different options to summarize the distribution of IPUI. The most informative way is to visualize the distribution of IPUI values using a histogram, a boxplot or a scatterplot of the IPUI versus the θ estimates². This will allow the test developer to observe the performance of the item pool at an individual level.

At the group level, the definition of IPUI in Equation (3.6) on page 29 uses the mean to summarize the performance of the item pool. Especially for skewed IPUI distributions, averaging the IPUI values might not give a good picture of the adequacy of the test. The distribution of IPUI is generally negatively skewed if the item pool is well suited for the majority of the examinees. This can be seen for some conditions in the results section (i.e.
Figures 5.3 and 5.20 on pages 58 and 83). In such cases, the mean and median values of IPUI differ and may lead to different interpretations about the quality of the item pool³. This will affect the potential comparisons of the item pools. When the mean and median values of IPUI are discrepant, practitioners are advised to look at the overall distribution of the IPUI values and evaluate the item pools accordingly.

In addition to the mean or median values of IPUI, the variation of the IPUI can give important pieces of information. The best-case scenario for an item pool is a high mean and a small standard deviation of IPUI values. This happens when the item pool provides appropriate items to almost all of the examinees. However, if the variation of the IPUI values is large, this indicates large discrepancies between the performance of the item pool for different examinees. Depending on the purpose of the test, this might not be fair.

² See Figure 5.44 on page 118 as an example.
³ See the end of Section 5.1.2.1 on page 82 for an example and a discussion about the difference between using the median and the mean as a summary statistic for IPUI.

6.3.3 Commentary on the Results of the Operational Item Pools

The results for the comparison of the operational item pools were very good. Except for the examinees who were far away from the cut score, the performance of the operational item pools was very good. Since the purpose of the test was dividing examinees into two groups, the effectiveness of the item pool at the extremes might not be crucial. The operational item pools performed very well around the middle of the ability distribution where the cut score was located. Examinees who were closer to the cut score took longer tests. So, it was important for the item pool to provide a sufficient number of appropriate items for the examinees close to the cut score. The graphs comparing IPUI and test length⁴ showed that the IPUI values of the examinees who took long tests were very high. This suggests that the operational item pools provided appropriate items to the examinees whose tests lasted 250 items. These examinees were the ones for whom measurement precision was most essential.

In fact, the NCLEX-RN exam was not a very good example to show the merits of the IPUI.
The exam has a long history and the item pools for the operational tests are meticulously prepared for the test. Furthermore, the test is very long; consequently it is very hard to get a decision error unless the examinee's true score is close to the cut score. The merits of IPUI would be more evident for tests with much smaller item pools and for which decisions over a wide range of abilities are needed. Achievement tests which aim to measure a wide range of abilities would be a good example for showing the uses of IPUI.

6.3.4 IPUI and Measurement Quality

IPUI is an indicator of the adequacy of the item pool. It is not an indicator of the quality of the measurement. Certainly, an adequate item pool would improve the quality of the measurement, but it is not a sufficient condition. An item pool might provide appropriate items to an examinee, but still, the measurement quality of the test might be low.

⁴ See Figure 5.45 on page 119.

For example, Figure 3.2 on page 32 shows that the item pool provided appropriate items to Examinee 3 throughout the test. The IPUI value for this examinee was 0.98, indicating the item pool was adequate for this examinee. But the SE of the ability estimate for this examinee was high, 0.685. The test ended after 8 items, and for this examinee, 8 items were not enough for a precise estimate of the ability. Even though the item pool portion of this test performed well, the test specifications need to be changed for a precise measurement of the ability.

On the other hand, a precise ability estimate does not mean that the item pool performed well. For example, Examinee 4 in Figure 3.2 had a lower SE compared to Examinee 3, but the item pool failed to provide appropriate items to this examinee. The results of the first part of Research Question 2 also corroborate this. Figure 5.14 on page 74 shows that when tests were longer the ability estimates were more precise. Yet, the item pool failed to provide enough appropriate items for longer tests.
There are other situations where an item pool is adequate but, due to the other aspects of the CAT algorithm, the measurement quality suffers. For instance, if the ability estimation method (such as EAP or MAP) uses a strong prior distribution, the ability estimate will be biased (Kim & Nicewander, 1993). The item pool might provide appropriate items to the examinees, but this will not reduce the bias caused by the estimation method. A high IPUI value might not correspond to smaller biases.

The results of this study showed that, in general, an adequate item pool is associated with better measurement⁵. Nevertheless, as discussed, an adequate item pool is not always enough for high measurement quality.

⁵ See Figures 1.1, 5.4 and 5.11 on pages 2, 59 and 70.

6.4 Limitations of the Study

6.4.1 Generalizability of the Results

The generalizations made in the study are limited to the methods used. For example, in the second part of Research Question 2, the effect of exposure control on IPUI was investigated. In that research question, only the randomesque exposure control method was used as a proxy for exposure control methods. As discussed in Section 2.3.3.2 on page 14, there are many other exposure control methods used in operational tests. The results presented in this study are limited to the randomesque exposure control method. But still, it is expected that any exposure control method will reduce the quality of the item pool. Similar limitations of generalizability also apply to the other aspects of the simulations.

The item selection procedure is another example of the limited generalizability of the current study. The MFI item selection algorithm was used in all of the simulations in this study. Results might be different for other item selection algorithms. The generalizability of the results from MFI to other item selection algorithms might not be straightforward. MFI uses the Fisher information of the items. The IPUI also depends on the Fisher information.
In this respect, IPUI is very relevant to CATs using MFI. On the other hand, other item selection algorithms might have different criteria for selecting the items. For example, the Kullback-Leibler item selection algorithm (Chang & Ying, 1996) searches for an item that maximizes the global information instead of the Fisher information. As a result, even if there are no other constraints on the item selection algorithm (such as exposure control or content balancing) and the item pool is sufficiently large, the IPUI value might be low for this item selection algorithm. In this regard, this might be seen as a limitation of IPUI. But IPUI can be generalized and reformulated such that maximization of the global information is expected instead of maximization of the Fisher information⁶.

Operational CATs have many other constraints such as item enemies, limitations on the number of certain types of items, limitations on word counts, key distribution constraints (i.e. the same number of A's, B's, ... should appear as the correct response), etc.⁷ These were not explored in this study. Each of these constraints is expected to reduce the value of the IPUI.

⁶ More on this in Section 6.5.1 on page 161.

Throughout this study, the effect of changing only one CAT specification on IPUI was investigated. In Research Question 2, as the test length increased, while holding every other aspect of the CAT fixed, the mean value of IPUI decreased. But increasing the test length also decreased the standard errors of ability estimates because tests became more precise⁸. On the other hand, in the second part of Research Question 2, increasing the exposure control parameter decreased the overall IPUI values and increased the SEs of the ability estimates. In both of these sets of simulations, since only one specification was changed at a time, it was easy to observe the effect of these changes on IPUI and other CAT outcomes. But it is hard to predict the effect of changing more than one specification on the outcomes of a CAT and IPUI. For instance, if the length of the test and the exposure control parameter were increased at the same time, it would be hard to predict the potential changes in the outcomes. The IPUI values would decrease, but it is very hard to predict the changes in SE.
Therefore, it would be valuable to perform multifactor designs where the main effects and the interactions between different CAT specifications on CAT outcomes and IPUI can be observed. This would be a valuable future expansion of this study.

6.4.2 A Recommended Value for IPUI

IPUI values are bounded between 0 and 1. This is very useful for comparing different item pools or comparing the same item pool for different test specifications or examinee populations. But from a practical point of view it is desirable to have a recommended value for IPUI. If IPUI goes below a specified value, this will signal to the test developer that the item pool is not sufficient for an examinee or a group of examinees. One of the aims of this study was finding a specific number that test developers could use as a flag for an inadequate item pool.

⁷ These constraints were discussed in Section 2.3.3 on page 12.
⁸ However, the efficiency of the tests decreased.

For very basic CAT designs a recommended value for IPUI might be tenable. Section 5.1.1.2 on page 69 explained the relationship between reliability and the standard error. Assuming that the examinee population has a standard normal distribution, a standard error value of 0.32 corresponds to a reliability coefficient of 0.9. Table 5.1 on page 55 indicates that, when the discrepancy between the examinee ability distribution and the item difficulty distribution was 1.5 (or -1.5), the mean standard error of the ability estimates was approximately equal to 0.32. The mean IPUI values for these discrepancy conditions were 0.9 and 0.91, respectively. In the second part of Research Question 1, when the item pool size was 60, the mean SE was approximately 0.30 (see Figure 5.2 on page 56). The mean IPUI value for an item pool size of 60 items was 0.9028 (see Table 5.2 on page 67). These two findings provide some evidence for a recommended value for IPUI. For these particular test designs and examinee populations there was a correspondence between a reliability of 0.9 and an IPUI value of 0.9. But the results of Research Question 2, where the effects of test specifications on IPUI were investigated, did not confirm this kind of relationship between the mean SE and the mean IPUI⁹.
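The correspondence between a standard error of 0.32 and a reliability of about 0.9 quoted above can be checked with a short calculation. The sketch below assumes the common approximation in which reliability is one minus the ratio of error variance to ability variance. Section 5.1.1.2 is not reproduced here, so whether it uses exactly this form is an assumption, and the symbols $\rho$ and $\sigma_\theta^2$ are introduced only for this illustration.

```latex
\rho \approx 1 - \frac{SE^2}{\sigma_\theta^2}
     = 1 - \frac{0.32^2}{1^2}
     = 1 - 0.1024
     = 0.8976 \approx 0.9
```

If reliability is instead taken as $\sigma_\theta^2 / (\sigma_\theta^2 + SE^2)$, the same inputs give $1 / 1.1024 \approx 0.907$, so either form supports the rounded value of 0.9 for a standard normal ability distribution.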
Additionally, establishing a direct link between IPUI and another outcome of a CAT such as SE would obviate the use of IPUI. IPUI captures a unique aspect of the item pool: whether it is adequate or not. Other outcomes of the CAT capture some other important aspects of the test, but not necessarily the adequacy of the item pool. Also, there may not be a direct link between IPUI and the other outcomes of the CAT. For example, Section 3.4 showed that IPUI and SE capture different aspects of the CAT. Results of the first part of Research Question 2 showed that a decrease in the adequacy of an item pool did not imply a decrease in the quality of the measurement. As test lengths increased, the mean values of both IPUI and SE decreased¹⁰.

⁹ See Figures 5.14 and 5.22 on pages 74 and 87.

The challenge with a recommended value is closely related to the definition of an inadequate item pool. Unfortunately there is no clear definition of an insufficient item pool. An insufficient item pool reveals itself through the outcomes of the test. These can be high standard errors of the ability estimates, bias in the ability estimates, violation of the constraints of the test or failure to satisfy the test specifications. There are no universally accepted benchmarks for any of these outcomes. Obviously, tests that meet all of the test specifications and have low standard errors and biases of the ability estimates are desirable. But how low is good enough? If there were an accepted threshold between a good test and a bad test, it would be possible to define a specific IPUI value for that threshold.

This challenge applies to other indices of test quality as well. If we ask "What is a good value for the test reliability?", the answer of a psychometrician would be "It depends on the context." Here is an excerpt from Nunnally and Bernstein (1994) on the standards of reliability:

A satisfactory level of reliability depends on how a measure is being used. In the early stages of predictive or construct validation research, time and energy can be saved using instruments that have only modest reliability, e.g., .70. [...]
In contrast to the standards used to compare groups, a reliability of .80 may not be nearly high enough in making decisions about individuals. Group research is often concerned with the size of correlations and with mean differences among experimental treatments, for which a reliability of .80 is adequate. [...] If important decisions are made with respect to specific test scores, a reliability of .90 is the bare minimum, and a reliability of .95 should be considered the desirable standard. (pp. 264-265)

As the authors indicated, the recommended value for reliability depends on the context where scores will be used. Similarly, the recommended values for IPUI should depend on the context where the item pool will be used. A high-stakes adaptive test might require an IPUI value of .99 for each examinee. For example, the operational pools of the NCLEX-RN exam were adequate close to the cut score. The IPUI values of the examinees were larger than 0.99 close to the cut score (Figure 5.44). On the other hand, if the purpose of the adaptive test is simply to obtain a reliable group summary of examinees, then a lower value of IPUI might be acceptable (Kruyen, Emons, & Sijtsma, 2012).

¹⁰ See Figure 5.14 on page 74.

In addition, the recommended values for reliability mentioned earlier were possibly not agreed upon right away among researchers when Cronbach wrote his highly cited paper in 1951 (Cronbach, 1951)¹¹. Instead, years of use of the internal reliability coefficient in different contexts established these recommended values among researchers. Similarly, it is expected that as the use of IPUI becomes prevalent among practitioners and it is used in different contexts, the properties of IPUI and its interactions with other test specifications will be explored more. This will potentially lead to a recommended value for IPUI.
6.4.3 Detection of the Redundant Items in the Item Pool

There is a delicate balance between a satisfactory item pool and an item pool which has more than enough items. Both of them serve the purposes of a test well. But larger item pools have their own disadvantages, as discussed in Section 2.3.5.1. Test developers want their item pools to satisfy the test purposes sufficiently. However, due to the cost of developing items, they don't want redundant and underused items in the item pools. Building an item pool that contains just enough items is not easy.

IPUI cannot distinguish between a lavish item pool which has more than enough items and an item pool that is satisfactory and does not have redundant items. For both of these item pools, IPUI will be 1. If an item pool had an IPUI value of 1, adding more items to this item pool would not increase the value of IPUI further. In practice, an IPUI value of 1 almost never happens. Unless the test developer adds items with exactly the same item parameters, mean IPUI values will always increase. See Table 5.2 on page 67 on how an increase in the size of the item pool increased the IPUI slightly for large item pool sizes.

¹¹ Here, it is not suggested that Cronbach is the inventor of reliability.

6.4.4 The Purpose of the Test and the Definition of the Optimum Item

IPUI was initially developed for adaptive tests that are designed to measure every examinee as precisely as possible regardless of the ability of the examinee. This goal is the purpose of most achievement tests. But not all tests are designed around this purpose.
Licensure tests, for example, are primarily interested in whether an examinee is above a cut score or below it. The precision of the ability estimate of an examinee who is far away from the cut score is not crucial as long as the decision regarding the passing status of this examinee is clear. As a result, if an adaptive test does not give the most appropriate item to this examinee, this is tolerable. Theoretically, the best item pool for the purpose of a licensure test is an item pool including a sufficient number of items that have item difficulties equal to the cut score. For the purposes of the test, this item pool is perfect, but for the precision of the ability estimates it is far from perfect. Many other examples can be given of tests in which high precision of the estimates for all examinees is not the primary purpose of the test.

IPUI in Equation (3.5) quantifies the quality of an item pool as if the primary purpose of the adaptive test is the precision of the ability estimates. In the CAT literature, the optimum item is defined according to this purpose: "...an item is considered to have optimum statistical properties if it is most informative at an examinee's current maximum-likelihood estimate of ability" (Eignor et al., 1993, p. 10). For the general logic of the adaptive test this makes sense. This is why Flaugher (2000) listed a rectangular distribution of item difficulty as a characteristic of a satisfactory item pool. A rectangular distribution of item difficulty enables a CAT procedure to provide each examinee an appropriate item.

In the future, the formulation of IPUI can be generalized to include various purposes of adaptive tests. The denominator can be modified so that the optimum item is defined in accordance with the test purpose. For the numerator, the information should be calculated with respect to this optimum item.

6.5 Future Research Directions

6.5.1 A General Framework for IPUI

As discussed in the previous section, the definition of the optimum item might be different for tests with different purposes. IPUI has a limited definition of an optimum item. Future research can investigate a generalized framework for IPUI which encompasses different definitions of the optimum item.
In a generalized framework, the definition of the optimum item does not have to be an item that has the maximum information at the examinee's intermediate ability estimate. Instead, the optimum item can be defined in accordance with the purpose of the test. IPUI can quantify the discrepancy between the administered item and the optimum item.

For instance, for a licensure test with one cut score, the optimum item has a difficulty parameter that is equal to the cut score. Such an item increases the decision accuracy, if not the precision of the ability estimates. IPUI can be calculated as the ratio of the information of the administered item at the cut score to the information of the optimum item at the cut score. If there are multiple cut scores in a test (such as basic, proficient and advanced), the definition of the optimum item becomes complicated (Eggen & Straetmans, 2000).

Another definition of the optimum item is related to test anxiety among examinees. One of the criticisms of CAT is the difficulty of the items presented to the examinee. Item selection in a CAT is optimized so that at each step of the CAT, the algorithm administers an item with a 50% probability of a correct answer at the intermediate ability estimate of the examinee. This continuing challenge throughout the test might cause frustration for some examinees.

Eggen and Verschoor (2006) offered a solution to this problem. Instead of selecting items that have a 50% probability of correct response, they defined an item selection algorithm which selects items that have a 60% or 70% (or any other desired percentage) probability of correct response. When comparing their algorithm with the MFI item selection algorithm, they observed that they did not achieve the desired percentages. They attributed the discrepancy between the actual and desired percentages to "a mismatch between the items available in the item bank and the desired percentages in the population" (Eggen & Verschoor, 2006, p. 391). They hypothesized that the addition of easier items (in the case where the desired percentage was 60%) to the item bank would resolve the problem.
At first sight, IPUI might seem to quantify the mismatch they observed. But in fact IPUI would not help in this situation. The main assumption of IPUI is the definition of the optimum item. An optimum item is an item which provides the maximum information at an examinee's ability level, i.e. an item with a 50% probability of correct response at the intermediate θ estimate. In the study of Eggen and Verschoor (2006), the definition of the optimum item was different. They defined an optimum item as an item that has maximum information at "an ability value at which the examinee with the current ability estimate has a higher or lower success probability" (p. 387). IPUI as defined in Equation (3.5) could not capture the "mismatch" they observed.

On the other hand, for this particular item selection algorithm there exists a solution to this problem. The formulation of IPUI could be changed to adjust for their definition of the optimum item. In Equation (3.5), instead of calculating information at $\hat{\theta}_{k-1}$, the information can be calculated at $\hat{\theta}_{k-1} - \delta_i$, where $\hat{\theta}_{k-1}$ is the intermediate ability estimate before the administration of the $k$-th item and $\delta_i$ is the shift parameter for the $i$-th item, which is defined for the 2PL as $\frac{1}{a_i}\ln\left(\frac{p}{1-p}\right)$, where $p$ is the desired probability of correct response. For example, if the desired probability is 0.60, the item selection algorithm will select an item that has maximum information at $\hat{\theta}_{k-1} - \delta_i = \hat{\theta}_{k-1} - \frac{1}{a_i}\ln\left(\frac{0.6}{1-0.6}\right)$.
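The shift can be illustrated numerically with a minimal Python sketch, assuming the standard 2PL response function with logistic scaling (no 1.7 constant). The function names, the interim estimate of 0.3 and the discrimination of 1.2 are hypothetical values chosen only for illustration.

```python
import math


def shift(p, a=1.0):
    """Shift (1/a) * ln(p / (1 - p)) for a desired success probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p)) / a


def prob_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    prob = prob_2pl(theta, a, b)
    return a * a * prob * (1.0 - prob)


theta_hat = 0.3  # hypothetical intermediate ability estimate
a_i = 1.2        # hypothetical item discrimination
target = theta_hat - shift(0.60, a_i)  # point where information is evaluated

# An item whose difficulty equals the shifted point is answered correctly
# with the desired probability (about 0.60) at the intermediate estimate.
print(prob_2pl(theta_hat, a_i, b=target))

# The same item is maximally informative at the shifted point: a^2 / 4.
print(info_2pl(target, a_i, b=target), a_i ** 2 / 4)
```

With $p = 0.5$ the shift is zero, so the calculation reduces to evaluating information at the intermediate estimate itself.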
For the 1PL model, the optimum item for an examinee with intermediate ability estimate $\hat{\theta}_{k-1}$ is an item with a difficulty parameter equal to $\hat{\theta}_{k-1} - \ln\left(\frac{p}{1-p}\right)$. When the desired probability is $p = 0.5$, this corresponds to a difficulty parameter equal to the intermediate θ estimate. For the 2PL model, the definition of the optimum item is more complicated. Since the item discrimination parameter does not have an upper bound, the test developer can define a maximum value for the $a$ parameter ($a_{max}$). Accordingly, the item discrimination parameter of the optimum item is $a_{max}$ and the item difficulty parameter is $\hat{\theta}_{k-1} - \frac{1}{a_{max}}\ln\left(\frac{p}{1-p}\right)$. Using the general framework discussed above, the definition of IPUI for the $k$-th administered item $i_k$ will be:

$$\mathrm{IPUI}_k = \frac{I_{i_k}\left[\hat{\theta}_{k-1} - \frac{1}{a_{max}}\ln\left(\frac{p}{1-p}\right)\right]}{I_{max}\left[\hat{\theta}_{k-1} - \frac{1}{a_{max}}\ln\left(\frac{p}{1-p}\right)\right]}$$

This can be adjusted for the 1PL model by fixing $a_{max}$ to 1. The examples given here can be extended to different optimum item definitions, and the IPUI can be used as a more general tool for evaluating the adequacy of item pools for different CAT scenarios.

6.5.2 Weights for IPUI

At the individual test level, IPUI currently gives equal weight to each stage of the adaptive test. In reality, as Chang and Ying (2008) argued, it is desirable for a CAT to provide better items towards the end of the test. The ability estimates at the beginning of the test are prone to more error, so items do not have to match the intermediate ability estimates precisely. Towards the end of the test better items are needed because the ability estimates are more precise. Considering this, the weights of the IPUI might be adjusted to be low at the early stages of the test and high towards the end of the test. In this case, if the item pool is depleted towards the end of the test, where the need for appropriate items is more critical, this weighting scheme will penalize the item pool for this.
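One way such a weighting could look is sketched below in Python. The study does not specify a weighting scheme, so the linearly increasing weights (1, 2, ..., n) are purely a hypothetical choice for illustration, as are the function name and the item-level IPUI values.

```python
def weighted_ipui(item_ipuis, weights=None):
    """Weighted average of item-level IPUI values for one examinee.

    If no weights are given, position k in the test receives weight k,
    so later items count more than earlier ones (a hypothetical scheme).
    """
    values = list(item_ipuis)
    if not values:
        raise ValueError("at least one item-level IPUI value is required")
    if weights is None:
        weights = range(1, len(values) + 1)
    weights = list(weights)
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical item-level values: the pool runs out of suitable items
# at the end of one test and at the beginning of the other.
depleted_late = [1.0, 1.0, 0.4]
weak_start = [0.4, 1.0, 1.0]

print(weighted_ipui(depleted_late, weights=[1, 1, 1]))  # equal weights
print(weighted_ipui(depleted_late))                     # late items weigh more
print(weighted_ipui(weak_start))
```

Both tests have the same equal-weight average, but under the increasing weights the test whose poor item comes last scores lower than the test whose poor item comes first, which is the penalty for late depletion described above.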
6.5.3 IPUI for Other Psychometric Models

IPUI is currently available for only the 1PL model. This hampers its use for the 2PL and 3PL models, which are very common in operational tests. The problem with the 2PL and 3PL IRT models is the parameter values of the optimum item for these models. For an optimum item, the item discrimination parameter (the $a$ parameter) should be equal to infinity. The information value of this item is also infinite. This makes the denominator of the IPUI infinite. Consequently the value of the IPUI will be undefined.

In practice, such an optimum item is not possible to develop (Reckase, 2010). This limitation can be handled by setting a limit on the item discrimination parameter. Even though the value of this limit will be arbitrary, historical test data can be used to get this number. For instance, an optimum item for the 3PL can be defined as having an item difficulty equal to the intermediate ability estimate, an item discrimination equal to 2, and a guessing parameter equal to 0.

This approach has some limitations. If an item has an $a$ parameter larger than 2, then the value of IPUI can exceed 1. In addition, the comparison of IPUI values will not be possible if different limits for the $a$ parameter are used for different tests.

From the diagnostic point of view, using IPUI only for the 1PL makes sense. In reality, when a test developer needs to add an item to the item pool, it is comparatively easier for item writers to write an item that has a targeted item difficulty parameter than to write an item with a targeted item discrimination parameter (Bejar, 1983). So, diagnostically, if IPUI guides test developers to write items that have some specific difficulty, this might be feasible in practice.

In addition to the 1PL, 2PL and 3PL models, the use of IPUI can be extended to MIRT models and polytomous IRT models too. A CAT using a MIRT model is more complex compared to unidimensional CATs (Yao, Pommerich, & Segall, 2014). Consequently, the evaluation of the item pools for a multidimensional computerized adaptive test (MCAT) is more difficult. The extension of the IPUI to multidimensional item pools is straightforward because the information function of MIRT is very similar to the information function of unidimensional IRT (Reckase & McKinley, 1991). IPUI can offer an easy way to evaluate the item pools for MCAT.
CATs using polytomous IRT models (Nering & Ostini, 2010) are another possible extension of IPUI. In the health sciences, the use of CATs with polytomous items is common (Amtmann et al., 2010; Haley et al., 2009; Pilkonis et al., 2011). The item pools for CATs with polytomous items are relatively small compared to the item pools used in high-stakes tests. IPUI can be helpful for the diagnosis of these small item pools. In addition, due to the multiple possible responses for each item, the evaluation of the item pools might be challenging.

6.5.4 Naming of the Index

The correct naming of the index is important because it conveys the message about the possible uses of the index. An index with a misleading name might result in inappropriate use of the index. The index developed in this study quantifies whether the item pool is adequate for a given set of test specifications and the examinee population. A perfectly adequate item pool might not be adequate for a different set of test specifications or for a different examinee population. The name of the index should convey the dependence of the item pool performance on the test specifications and the examinee population.

The name "item pool utilization index" partially covers this meaning. But in the future, other naming alternatives that convey the capabilities of this index better should be explored. Some alternative names might be "item pool adequacy index", "quality of utilization of item pool index" and "quality of item pool index". As the use of this index spreads among practitioners and researchers, a consensus on a better name for this index may be reached.
APPENDICES

APPENDIX A
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 1 - PART 1

Figure A.1: Item Difficulty Distribution (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.2: True θ Distribution (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.3: Distribution of Bias for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.4: Relationship between Bias and IPUI for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.5: Distribution of Mean Squared Error for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.6: Relationship between Mean Squared Error and IPUI for each Discrepancy Condition (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)
Figure A.7: Two Examinees with Same Standard Errors but Different IPUI Values (Research Question 1 - Discrepancy between Item Pool and Ability Distribution)

APPENDIX B
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 1 - PART 2

Figure B.1: True θ Distribution (Research Question 1 - Part 2)
Figure B.2: Item Difficulty Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.3: Bias Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.4: Standard Error Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.5: Mean Squared Error Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)
Figure B.6: IPUI Distribution by Item Pool Size Condition for Replication 19 (Research Question 1 - Part 2)

APPENDIX C
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 2 - PART 1

Figure C.1: Item Difficulty Distribution for Research Question 2 - Test Length Conditions
Figure C.2: True θ Distribution for Research Question 2 - Test Length Conditions
Figure C.3: IPUI and Bias Relationship by Test Length Condition

APPENDIX D
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 2 - PART 2

Figure D.1: Item Difficulty Distribution for Research Question 2 - Exposure Control
Figure D.2: True θ Distribution for Research Question 2 - Exposure Control
Figure D.3: IPUI and Bias Relationship by Exposure Control Condition

APPENDIX E
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 3

Figure E.1: The Bias Distribution at each True θ Value for each Item Pool Condition
Figure E.2: The Standard Error Distribution at each True θ Value for each Item Pool Condition

APPENDIX F
SUPPLEMENTARY FIGURES FOR IDEAL ITEM POOL CREATION

Figure F.1: Progress Plot for Ideal Item Pool with Fixed Bin Size 0.8
Figure F.2: Item Difficulty Distributions by Content Area for Ideal Item Pool with Fixed Bin Size 0.8

APPENDIX G
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 4

Figure G.1: True θ Distribution for Research Question 4
Figure G.2: The Relationship between Estimated Ability and Bias for each Item Pool Condition
Figure G.3: The Relationship between Estimated Ability and Standard Error for each Item Pool Condition
Figure G.4: The Relationship between Test Length and Standard Error for each Item Pool Condition
Figure G.5: The Relationship between IPUI and Mean Squared Error for each Item Pool Condition
Note: The x-axis scale for each figure is different.

APPENDIX H
SUPPLEMENTARY FIGURES FOR RESEARCH QUESTION 5

Figure H.1: The Relationship between True θ and Estimated θ for each Item Pool Condition
Figure H.2: The Bias Distribution at each True θ Value for each Item Pool Condition
Note: For brevity, only some of the true θ values are displayed.
Figure H.3: Mean Standard Error Conditional on Restricted True θ Range for each Item Pool Condition
Figure H.4: The Standard Error Distribution at each True θ Value for each Item Pool Condition
Note: For brevity, only some of the true θ values are displayed.
Figure H.5: The Test Length Distribution at each True θ Value for each Item Pool Condition
Note: For brevity, only some of the true θ values are displayed. For θ values outside the [-1.5, 1.5] interval, the test lengths were all 60.

BIBLIOGRAPHY

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Amtmann, D., Cook, K. F., Jensen, M. P., Chen, W.-H., Choi, S., Revicki, D., ... Lai, J.-S. (2010). Development of a PROMIS item bank to measure pain interference. PAIN, 150(1), 173-182. doi:10.1016/j.pain.2010.04.025
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2010). A method for the comparison of item selection rules in computerized adaptive testing. Applied Psychological Measurement, 34(6), 438-452. doi:10.1177/0146621610370152
Bejar, I. I. (1983). Subject matter experts' assessment of item statistics. Applied Psychological Measurement, 7(3), 303-310. doi:10.1177/014662168300700306
Belov, D. I. & Armstrong, R. D. (2009). Direct and inverse problems of item pool design for computerized adaptive testing. Educational and Psychological Measurement, 69(4), 533-547. doi:10.1177/0013164409332224
Bergstrom, B. A., Lunz, M. E., & Gershon, R. C. (1992). Altering the level of difficulty in computer adaptive testing. Applied Measurement in Education, 5(2), 137-149. doi:10.1207/s15324818ame0502_4
Bock, R. D. & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431-444. doi:10.1177/014662168200600405
Breithaupt, K., Ariel, A. A., & Hare, D. R. (2010). Assembling an inventory of multistage adaptive testing systems. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 247-266). Springer.
Chang, H.-H. (2004). Understanding computerized adaptive testing: from Robbins-Monro to Lord and beyond. In D. Kaplan (Ed.), The Sage handbook of quantitative methods for the social sciences (pp. 117-133). Thousand Oaks, CA: Sage.
Chang, H.-H., Qian, J., & Ying, Z. (2001). a-stratified multistage computerized adaptive testing with b blocking. Applied Psychological Measurement, 25(4), 333-341. doi:10.1177/01466210122032181
Chang, H.-H. & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20(3), 213-229. doi:10.1177/014662169602000303
Chang, H.-H. & Ying, Z. (2008). To weight or not to weight? Balancing influence of initial items in adaptive testing. Psychometrika, 73(3), 441-450. doi:10.1007/s11336-007-9047-7
Chang, H.-H. & Ying, Z. (2009). Nonlinear sequential designs for logistic item response theory models with applications to computerized adaptive tests. The Annals of Statistics, 37(3), 1466-1488. doi:10.2307/30243674
Chen, S.-Y., Ankenmann, R. D., & Chang, H.-H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24(3), 241-255. Retrieved from http://apm.sagepub.com/content/24/3/241.abstract
Chen, S.-Y., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and test overlap in computerized adaptive testing. Journal of Educational Measurement, 40(2), 129-145. doi:10.2307/1435342
Cheng, Y. & Chang, H.-H. (2009). The maximum priority index method for severely constrained item selection in computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 62(2), 369-383. doi:10.1348/000711008X304376
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334. doi:10.1007/BF02310555
Davey, T. & Nering, M. L. (2002). Controlling item exposure and maintaining item security. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: building the foundation for future assessments (pp. 165-191). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Eggen, T. J. H. M. & Straetmans, G. (2000). Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60(5), 713-734. doi:10.1177/00131640021970862
Eggen, T. J. H. M. & Verschoor, A. J. (2006). Optimal testing with easy or difficult items in computerized adaptive testing. Applied Psychological Measurement, 30(5), 379-393. doi:10.1177/0146621606288890
Eignor, D. R., Stocking, M. L., Way, W. D., & Steffen, M. (1993). Case studies in computer adaptive test design through simulation. Educational Testing Service.
Flaugher, R. (2000). Item pools. In H. Wainer, N. J. Dorans, D. Eignor, R. Flaugher, B. F. Green, R. J. Mislevy, ... D. Thissen (Eds.), Computerized adaptive testing: a primer (2nd edition, pp. 37-60). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Georgiadou, E. G., Triantafillou, E., & Economides, A. A. (2007). A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005. The Journal of Technology, Learning and Assessment, 5(8).
Gibbons, R. D., Weiss, D. J., Kupfer, D. J., Frank, E., Fagiolini, A., Grochocinski, V. J., ... Immekus, J. C. (2008). Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatric Services, 59(4), 361-8.
Gierl, M. J. & Lai, H. (2013). Instructional topics in educational measurement (ITEMS) module: using automated processes to generate test items. Educational Measurement: Issues and Practice, 32(3), 36-50. doi:10.1111/emip.12018
Haley, S. M., Fragala-Pinkham, M. A., Dumas, H. M., Ni, P., Gorton, G. E., Watson, K., ... Tucker, C. A. (2009). Evaluation of an item bank for a computerized adaptive test of activity in children with cerebral palsy. Physical Therapy, 89(6), 589-600.
Hambleton, R. K. & Swaminathan, H. (1985). Item response theory: principles and applications. Boston, MA: Kluwer-Nijhoff Pub.
Han, K. T. (2012). An efficiency balanced information criterion for item selection in computerized adaptive testing. Journal of Educational Measurement, 49(3), 225-246. doi:10.1111/j.1745-3984.2012.00173.x
He, W., Diao, Q., & Hauser, C. (2014). A comparison of four item-selection methods for severely constrained CATs. Educational and Psychological Measurement. doi:10.1177/0013164413517503
He, W. & Reckase, M. D. (2013). Item pool design for an operational variable-length computerized adaptive test. Educational and Psychological Measurement. doi:10.1177/0013164413509629
Hetter, R. D. & Sympson, J. B. (1997). Item-exposure in CAT-ASVAB. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: from inquiry to operation (pp. 141-144). Washington, DC: American Psychological Association.
Hildebrand, F. B. (1987). Introduction to numerical analysis (2nd edition). Mineola, NY: Courier Dover Publications.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73. doi:10.1111/jedm.12000
Kim, J. K. & Nicewander, W. A. (1993). Ability estimation for conventional tests. Psychometrika, 58(4), 587-599. doi:10.1007/BF02294829
Kingsbury, G. G. & Wise, S. L. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica: Revista de metodología y psicología experimental, 21(1), 135-156.
Kingsbury, G. G. & Zara, A. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2(4), 359-375.
Kruyen, P. M., Emons, W. H. M., & Sijtsma, K. (2012). Test length and decision quality in personnel selection: when is short too short? International Journal of Testing, 12(4), 321-344. doi:10.1080/15305058.2011.643517
Leeuw, J. d. & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational Statistics, 11(3), 183-196. doi:10.2307/1165071
Leroux, A. J., Lopez, M., Hembry, I., & Dodd, B. G. (2013). A comparison of exposure control procedures in CATs using the 3PL model. Educational and Psychological Measurement, 73(5), 857-874. doi:10.1177/0013164413486802
Lord, F. M. (1974). The relative efficiency of two tests as a function of ability level. Psychometrika, 39(3), 351-358. doi:10.1007/BF02291708
Lord, F. M. (1975). Relative efficiency of number-right and formula scores. British Journal of Mathematical and Statistical Psychology, 28(1), 46-50.
Lord, F. M. (1977a). A broad-range tailored test of verbal ability. Applied Psychological Measurement, 1(1), 95-100. doi:10.1177/014662167700100115
Lord, F. M. (1977b). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14(2). doi:10.2307/1434011
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: L. Erlbaum Associates.
Lord, F. M. (1986). Maximum likelihood and Bayesian parameter estimation in item response theory. Journal of Educational Measurement, 23(2), 157-162. Retrieved from http://www.jstor.org/stable/1434513
Lord, F. M. & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Luecht, R. M. & Clauser, B. E. (2002). Test models for complex CBT. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: building the foundation for future assessments (pp. 67-88). Mahwah, NJ: Lawrence Erlbaum.
McBride, J. R. (1977). Some properties of a Bayesian adaptive ability testing strategy. Applied Psychological Measurement, 1(1), 121-140. doi:10.1177/014662167700100119
Meijer, R. & Nering, M. L. (1999). Computerized adaptive testing: overview and introduction. Applied Psychological Measurement, 23(3), 187-194.
Millman, J. & Arter, J. A. (1984). Issues in item banking. Journal of Educational Measurement, 21(4), 315-330. doi:10.2307/1434584
Mills, C. N. & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9(4), 287-304. doi:10.1207/s15324818ame0904_1
National Council of State Boards of Nursing. (2012). NCLEX-RN examination detailed test plan for the National Council Licensure Examination for Registered Nurses item writer-item reviewer-nurse educator version. National Council of State Boards of Nursing. Chicago, IL. Retrieved from https://www.ncsbn.org/2013_NCLEX_RN_Detailed_Test_Plan_Educator.pdf
Nering, M. L. & Ostini, R. (2010). Handbook of polytomous item response theory models. New York: Routledge.
Nunnally, J. C. & Bernstein, I. H. (1994). Psychometric theory (3rd edition). New York: McGraw-Hill.
Owen, R. (1969). A Bayesian approach to tailored testing (Report No. Research Bulletin No. 69-92). Educational Testing Service.
Owen, R. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70(350), 351-356.
Parshall, C. G. (2002). Item development and pretesting in a CBT environment. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: building the foundation for future assessments (pp. 119-141). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Parshall, C. G., Spray, J. A., Kalohn, J. C., & Davey, T. (2002). Practical considerations in computer-based testing. New York: Springer Verlag.
Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., Cella, D., & PROMIS Cooperative Group. (2011). Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS): depression, anxiety, and anger. Assessment, 18(3), 263-283. doi:10.1177/1073191111411667
R Core Team. (2014). R: a language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. Retrieved from http://www.R-project.org
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 4, pp. 321-333). University of California Press, Berkeley, CA.
Reckase, M. D. (2010). Designing item pools to optimize the functioning of a computerized adaptive test. Psychological Test and Assessment Modeling, 52(2), 127-141.
Reckase, M. D. & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15(4), 361-373. doi:10.1177/014662169101500407
Revuelta, J. & Ponsoda, V. (1998). A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35(4), 311-327. Retrieved from http://www.jstor.org/stable/1435308
Rudner, L. M. (2010). Implementing the graduate management admission test computerized adaptive test. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 151-165). Springer.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometric Monograph, 17. Retrieved from http://www.psychometrika.org/journal/online/MN17.pdf
Schmeiser, C. B. & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational measurement (4th edition, pp. 307-353). Westport, CT: ACE/Praeger Publishers.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61(2), 331-354. doi:10.1007/BF02294343
Segall, D. O., Moreno, K. E., & Hetter, R. D. (1997). Item pool development and evaluation. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: from inquiry to operation (pp. 117-130). Washington, DC: American Psychological Association.
Stocking, M. L. (1994). Three practical issues for modern adaptive testing item pools (Report No. RR-94-05). Educational Testing Service. Princeton, New Jersey.
Swanson, L. & Stocking, M. L. (1993). A model and heuristic for solving very large item selection problems. Applied Psychological Measurement, 17(2), 151-166. doi:10.1177/014662169301700205
Thompson, N. A. & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research, and Evaluation, 16(1), 1-9.
Urry, V. W. (1977). Tailored testing: a successful application of latent trait theory. Journal of Educational Measurement, 14(2), 181-196. doi:10.2307/1434014
Vale, C. D. & Weiss, D. J. (1977). A rapid item-search procedure for Bayesian adaptive testing (Report No. Research Report 77-4).
van der Linden, W. J. (1998a). Bayesian item selection criteria for adaptive testing. Psychometrika, 63(2), 201-216.
van der Linden, W. J. (1998b). Optimal assembly of psychological and educational tests. Applied Psychological Measurement, 22(3), 195-211. doi:10.1177/01466216980223001
van der Linden, W. J. (2010). Constrained adaptive testing with shadow tests. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 31-55). New York, NY: Springer.
van der Linden, W. J., Ariel, A., & Veldkamp, B. P. (2006). Assembling a computerized adaptive testing item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31(1), 81-99. doi:10.3102/10769986031001081
van der Linden, W. J. & Pashley, P. J. (2010). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 3-30). New York, NY: Springer.
van der Linden, W. J., Veldkamp, B. P., & Reese, L. M. (2000). An integer programming approach to item bank design. Applied Psychological Measurement, 24(2), 139-150. doi:10.1177/01466210022031570
Veldkamp, B. P. & van der Linden, W. J. (2010). Designing item pools for adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (Chap. 12, pp. 231-245). New York: Springer.
Wainer, H. (2000). Introduction and history. In H. Wainer, N. J. Dorans, D. Eignor, R. Flaugher, B. F. Green, R. J. Mislevy, ... D. Thissen (Eds.), Computerized adaptive testing: a primer (2nd edition, pp. 1-21). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Wang, T. & Vispoel, W. P. (1998). Properties of ability estimation methods in computerized adaptive testing. Journal of Educational Measurement, 35(2), 109-135. doi:10.1111/j.1745-3984.1998.tb00530.x
Warm, T. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450. doi:10.1007/BF02294627
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17(4), 17-27. doi:10.1111/j.1745-3992.1998.tb00632.x
Way, W. D., Steffen, M., & Anderson, G. S. (2002). Developing, maintaining, and renewing the item inventory to support CBT. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: building the foundation for future assessments (pp. 143-164). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6(4), 473-492. doi:10.1177/014662168200600408
Weiss, D. J. (2011). Better data from better measurements using computerized adaptive testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1-27.
Weiss, D. J. & McBride, J. R. (1984). Bias and information of Bayesian adaptive testing. Applied Psychological Measurement, 8(3), 273-285. doi:10.1177/014662168400800303
Wise, S. L., Bhola, D. S., & Yang, S.-T. (2006). Taking the time to improve the validity of low-stakes tests: the effort-monitoring CBT. Educational Measurement: Issues and Practice, 25(2), 21-30. doi:10.1111/j.1745-3992.2006.00054.x
Xing, D. & Hambleton, R. K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing examinations. Educational and Psychological Measurement, 64(1), 5-21. doi:10.1177/0013164403258393
Yao, L., Pommerich, M., & Segall, D. O. (2014). Using multidimensional CAT to administer a short, yet precise, screening test. Applied Psychological Measurement, 38(8), 614-631.