This is to certify that the dissertation entitled COMPARISON OF ABILITY ESTIMATION AND ITEM SELECTION METHODS IN MULTIDIMENSIONAL COMPUTERIZED ADAPTIVE TESTING presented by Qi Diao has been accepted towards fulfillment of the requirements for the Doctoral degree in Measurement and Quantitative Methods.

COMPARISON OF ABILITY ESTIMATION AND ITEM SELECTION METHODS IN MULTIDIMENSIONAL COMPUTERIZED ADAPTIVE TESTING

By

Qi Diao

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Measurement and Quantitative Methods

2009

ABSTRACT

The impetus for this study is the lack of guidance in the multidimensional computerized adaptive testing (MCAT) literature on which item selection and ability estimation methods to use and under what conditions. This study conducted a comprehensive comparison of ability estimation and item selection methods in MCAT. The two ability estimation methods were maximum likelihood estimation and Bayesian estimation. The item selection methods fall into three categories: methods associated with maximum likelihood estimation, Bayesian methods based on Fisher's information, and methods based on Kullback-Leibler information. The comparison was made conditional on factors such as test length and the use of priors.
Simulations were based on real data from the 2005 Michigan Educational Assessment Program. Based on the results, recommendations are made about which method should be used under which conditions. It is believed that the results of the study can help future researchers select ability estimation and item selection methods when conducting their own research in MCAT, and can help in the construction of operational MCAT procedures.

To my dear husband Hao Ren, and my parents

ACKNOWLEDGEMENTS

I would like to express my sincere appreciation to my major dissertation advisor, Dr. Mark Reckase, for his guidance, support, and encouragement throughout my dissertation study. Dr. Reckase inspired me through his passion for research and his excellence in teaching and professionalism.

I would also like to express my sincere gratitude and thanks to Dr. Kimberly Maier for her continuous support and guidance in my course selections, my studies, and my research. I would also like to give my thanks to Dr. Sharif Shakrani for his encouragement and guidance throughout my Ph.D. study and my assistant work. His passion for applying research to aid the improvement of education inspired my choice of career. I would like to thank Dr. Lijian Yang for his consistent guidance of my statistical studies.

I would like to thank CTB/McGraw-Hill for its support of this study. This study was partially funded by CTB, and without the support of the research department of CTB I would not have been able to complete this study while working as a research scientist at CTB. I would like to give my special thanks to Dr. Wim van der Linden for his support and guidance throughout this study.

I would like to thank my husband Hao Ren for his overwhelming support, encouragement, and love. Without him, none of this would have been possible. Finally, I would like to thank my beloved parents and friends. Thank you!
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
  1.1 Multidimensional Item Response Model
  1.2 Components of a CAT Procedure
CHAPTER 2. ABILITY ESTIMATION AND ITEM SELECTION METHODS IN MCAT
  2.1 Ability Estimation Methods
    2.1.1 Maximum Likelihood Method
    2.1.2 Bayesian Estimation Method
    2.1.3 Other Ability Estimation Methods
  2.2 Item Selection Methods
    2.2.1 Maximizing the Determinant of the Fisher Information Matrix (D-optimality)
    2.2.2 Minimizing the Trace of Inverse of Fisher Information Matrix (A-optimality)
    2.2.3 Largest Decrement in the Volume of Bayesian Credibility Ellipsoid
    2.2.4 Maximizing the Kullback-Leibler Information
    2.2.5 Other Item Selection Methods
CHAPTER 3. RESEARCH QUESTIONS AND METHODS
  3.1 Research Questions
  3.2 Research Methods
    3.2.1 Real Data Used
    3.2.2 Simulation
CHAPTER 4. RESULTS AND DISCUSSION
CHAPTER 5. CONCLUSIONS AND FUTURE RESEARCH DIRECTION

LIST OF TABLES

Table 3.1 All conditions in comparing different ability estimation methods
Table 3.2 All conditions in comparing item selection methods
Table 3.3 MIRT item parameters for Grade 7 Michigan Educational Assessment Program (MEAP) Mathematics test from Li (2006)
Table 3.4 Correlation coefficients among 3 dimensions on Grade 7 Michigan Educational Assessment Program (MEAP) Mathematics test
Table 3.5 Distributions for multidimensional item parameter generation mimicking Grade 7 Michigan Educational Assessment Program (MEAP) Mathematics test. 100 items were generated for each dimension.
Table 3.6 All 11 simulated conditions. All simulations are for 27 true ability points, 50 replicates at each point.
Table 4.1 Items administered, responses and updated ability estimates after each item for one examinee. The combination was maximum likelihood and D-optimality. Initial estimate was (0, 0, 0) and the true location was (1, 1, 1)
Table 4.2 Computation time for each examinee (Unit: second)
LIST OF FIGURES

Figure 4.1 Mean biases and RMSEs for maximum likelihood as the ability estimation method and D-optimality as the item selection method, at test length = 20 and at test length = 50
Figure 4.2 Mean and standard deviation of Euclidean distance for maximum likelihood as the ability estimation method and D-optimality as the item selection method, at test length = 20 and at test length = 50
Figure 4.3 Successive progress plot of updated ability estimates and true location point after administering each item for maximum likelihood method and D-optimality. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.4 Euclidean distance between updated ability estimates and true location point after administering each item for maximum likelihood method and D-optimality. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.5 Mean biases and RMSEs for Bayesian as the ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and identity matrix as variance covariance matrix, at test length = 20 and at test length = 50
Figure 4.6 Mean and standard deviation of Euclidean distance of Bayesian as ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and identity matrix as variance covariance matrix, for test length = 20 and for test length = 50
Figure 4.7 Successive progress plot of updated ability estimates and true location point after administering each item for Bayesian method with identity matrix. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.8 Euclidean distance between updated ability estimates and true location point after administering each item for Bayesian method with identity matrix. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.9 Mean biases and RMSEs for Bayesian as the ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and diag(9) matrix as variance covariance matrix, at test length = 20 and at test length = 50
Figure 4.10 Mean and standard deviation of Euclidean distance of Bayesian as ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and diag(9) matrix as variance covariance matrix, at test length = 20 and at test length = 50
Figure 4.11 Successive progress plot of updated ability estimates and true location point after administering each item for Bayesian method with diag(9) variance covariance matrix as prior. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.12 Euclidean distance between updated ability estimates and true location point after administering each item for Bayesian method with diag(9) variance covariance matrix as prior. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.13 Mean biases and RMSEs for Bayesian as the ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and true variance covariance matrix as variance covariance matrix, at test length = 20 and at test length = 50
Figure 4.14 Mean and standard deviation of Euclidean distance of Bayesian as ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and true variance covariance matrix as variance covariance matrix, at test length = 20 and at test length = 50
Figure 4.15 Successive progress plot of updated ability estimates and true location point after administering each item for Bayesian method with true variance covariance matrix as prior. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.16 Euclidean distance between updated ability estimates and true location point after administering each item for Bayesian method with true variance covariance matrix as prior. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.17 Mean biases and RMSEs for Bayesian as the ability estimation method and Kullback-Leibler information as the item selection method, at test length = 20 and at test length = 50
Figure 4.18 Mean and standard deviation of Euclidean distance of Bayesian as ability estimation method and Kullback-Leibler information as the item selection method, at test length = 20 and at test length = 50
Figure 4.19 Successive progress plot of updated ability estimates and true location point after administering each item for Kullback-Leibler. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.20 Euclidean distance between updated ability estimates and true location point after administering each item for Kullback-Leibler. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50
Figure 4.21 Mean biases and RMSEs for maximum likelihood as the ability estimation method, with D-optimality and A-optimality as the item selection methods, test length = 50
Figure 4.22 Mean and standard deviation of Euclidean distance for the combination with maximum likelihood as ability estimation method, comparison of D-optimality and A-optimality as item selection methods
Figure 4.23 Mean biases and RMSEs for Bayesian as the ability estimation method, comparison of prior variance covariance matrix as: 1) identity matrix; 2) diag(9) and 3) true variance covariance matrix. Test length = 20
Figure 4.24 Mean biases and RMSEs for Bayesian as the ability estimation method, comparison of prior variance covariance matrix as: 1) identity matrix; 2) diag(9) and 3) true variance covariance matrix. Test length = 50
Figure 4.25 Mean and standard deviation of Euclidean distance of Bayesian as the ability estimation method, comparison among prior variance covariance matrix: 1) identity matrix, 2) diag(9), and 3) true variance covariance matrix
Figure 4.26 Mean biases and RMSEs for comparison of maximum likelihood method and Bayesian method. Test length = 20
Figure 4.27 Mean biases and RMSEs for comparison of maximum likelihood method and Bayesian method. Test length = 50
Figure 4.28 Means and standard deviations of Euclidean distance, comparison of maximum likelihood method and Bayesian method. Test length = 20 and test length = 50
Figure 4.29 Mean biases and RMSEs of the comparison of Kullback-Leibler and volume decrement in Bayesian. Variance covariance of priors is identity matrix. Test length = 20
Figure 4.30 Mean biases and RMSEs of the comparison of Kullback-Leibler and volume decrement in Bayesian. Variance covariance of priors is identity matrix. Test length = 50
Figure 4.31 Means and standard deviations of Euclidean distance, comparison of Kullback-Leibler information and volume decrement in Bayesian with Fisher's information

CHAPTER 1. INTRODUCTION

Computerized adaptive testing (CAT) has been widely used in many testing programs (e.g., the Graduate Management Admission Test and the Armed Services Vocational Aptitude Battery). It is based on the principle of selecting items to match the current proficiency estimate of an examinee. Adaptive tests have many potential advantages, such as improved measurement precision, reduced test time, and flexible individual testing time. Ample research has been done on unidimensional CAT (e.g., van der Linden & Glas, 2000; Wainer, 2000; Bock & Mislevy, 1988).
However, only a few studies have been done on multidimensional computerized adaptive testing (MCAT) (e.g., Segall, 1996; Veldkamp & van der Linden, 2002; Reckase, 2009). This study compares methods used in two important parts of adaptive testing, ability estimation and item selection, under different conditions. It is believed that the results of the study can help future researchers select ability estimation and item selection methods when conducting their own research in MCAT, and can help in the construction of operational MCAT procedures.

1.1 Multidimensional Item Response Model

A basic unidimensional model for dichotomously scored responses is the three-parameter logistic (3PL) model (Birnbaum, 1968). In this model, the probability that person j with ability $\theta_j$ answers item i correctly is:

$$P(U_{ij}=1 \mid \theta_j, a_i, b_i, c_i) = c_i + (1-c_i)\,\frac{e^{a_i(\theta_j-b_i)}}{1+e^{a_i(\theta_j-b_i)}}, \qquad (1.1)$$

where a is the discrimination parameter, b is the item difficulty parameter, and c is the pseudo-guessing parameter. A more detailed description of the parameters can be found in McDonald (1999).

One basic assumption of any item response model is local independence. Local independence means that, for a given examinee, the answer to one test item does not influence the probability of the answer to another item except through the parameter $\theta$. Also, one examinee's answer to a test item does not influence another examinee's answer. In the unidimensional case, the assumption is equivalent to:

$$P(U_1=u_1, U_2=u_2, \ldots, U_n=u_n \mid \theta) = \prod_{i=1}^{n} P_i(\theta)^{u_i}\,\bigl(1-P_i(\theta)\bigr)^{1-u_i}. \qquad (1.2)$$

So the probability of an examinee producing a set of observed responses $u_1, u_2, \ldots, u_n$ is a function only of the item parameters and the examinee's ability parameter $\theta$. However, if more than one ability dimension is measured by the test, unidimensional models may not fit, and multidimensional response models are needed in order to satisfy the local independence assumption.
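As an illustration of equations (1.1) and (1.2), the following minimal Python sketch computes the 3PL response probability and, under local independence, the probability of a whole response pattern. The function names and the item parameter values are hypothetical and are not taken from this study.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model, equation (1.1)."""
    z = a * (theta - b)
    # e^z / (1 + e^z) written in the equivalent form 1 / (1 + e^(-z)).
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def response_pattern_probability(theta, items, responses):
    """Probability of a response pattern under local independence, equation (1.2)."""
    prob = 1.0
    for (a, b, c), u in zip(items, responses):
        p = p_3pl(theta, a, b, c)
        prob *= p if u == 1 else (1.0 - p)
    return prob

# Hypothetical item parameters (a, b, c), chosen only for illustration.
items = [(1.0, 0.0, 0.2), (1.5, -0.5, 0.25), (0.8, 1.0, 0.0)]

print(p_3pl(0.0, 1.0, 0.0, 0.2))
print(response_pattern_probability(0.0, items, [1, 1, 0]))
```

Because the pattern probability is a product of independent item terms, the probabilities of all possible response patterns at a fixed ability sum to one.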
The MIRT model used in this study is a generalization of model (1.1) to a multidimensional space:

$$P(U_{ij}=1 \mid \boldsymbol{\theta}_j, \mathbf{a}_i, c_i, d_i) = c_i + (1-c_i)\,\frac{e^{1.7(\mathbf{a}_i\boldsymbol{\theta}_j' + d_i)}}{1+e^{1.7(\mathbf{a}_i\boldsymbol{\theta}_j' + d_i)}}, \qquad (1.3)$$

where $\boldsymbol{\theta}_j$ is a $1 \times m$ vector of examinee j's ability coordinates, with m the number of dimensions in the coordinate space; $\mathbf{a}_i$ is a $1 \times m$ vector of discrimination parameters; and c and the intercept term d are scalars.

In general, there are at least three motivations for developing MCAT. The first is the one stated above: for many operational tests, unidimensional models may not fit, and multidimensional response models are needed in order to satisfy the assumption of local independence. The second is that, when testing for diagnostic purposes, we want to extract as much information as possible, and for correlated ability dimensions, information from one dimension can help measure ability in another dimension. The second motivation leads to the third: efficiency. Because information from correlated abilities can be used, multidimensional adaptive testing can make the ability estimation process more efficient.

1.2 Components of a CAT Procedure

For any adaptive test, five key questions need to be answered: 1) which model to use; 2) how to select the first item; 3) how to update the ability estimate after an examinee gives a response; 4) how to select the next item; 5) how to end the test. Ability estimation and item selection methods are therefore fundamental to the development of any adaptive test. This research investigates them in the multidimensional case.

Some research has been done in unidimensional CAT to investigate the properties of ability estimation and item selection methods (e.g., Weiss & McBride, 1984; van der Linden & Pashley, 2000). However, in the current literature on multidimensional adaptive testing, most studies use a single ability estimation method and a single item selection method because they focus on other aspects of adaptive testing (e.g. Li Ip & Pub, 2008).
The only study that concentrated on a comparison of different ability estimation and item selection methods for multidimensional adaptive testing was Tam (1992), but that study predates most currently used methods (e.g., Segall, 1996; Veldkamp & van der Linden, 2002). Also, most of the research in multidimensional adaptive testing has used two-dimensional cases, but we believe that at least three dimensions are needed for a rigorous evaluation of multidimensional procedures. Therefore, to gain a better understanding of MCAT, this study compared ability estimation and item selection methods in MCAT under different conditions.

The first attempt to extend unidimensional adaptive testing methods to multidimensional cases was Bloxom and Vale (1987). As mentioned above, Tam (1992) compared adaptive estimation methods for multidimensional tests and also developed an iterative maximum likelihood ability estimation procedure. But all studies at that time were limited by computing power, which is no longer a constraint.

Several more recent studies have investigated ability estimation and item selection methods. Segall (1996, 2000) applied maximum likelihood estimation and item selection methods as well as Bayesian estimation and item selection methods. Luecht (1996) examined the benefits of applying multidimensional adaptive testing methods in a licensing/certification context. Another method, based on Kullback-Leibler information, was first introduced to adaptive testing by Chang & Ying (1996). Veldkamp & van der Linden (2002) further developed it for the multidimensional case.

In this study, two ability estimation methods were investigated: maximum likelihood (Segall, 1996, 2000; Reckase, 2009) and Bayesian estimation (Segall, 1996, 2000).
The item selection methods compared were those maximizing Fisher's information (Segall, 1996, 2000; Mulder & van der Linden, 2008), including D-optimality and A-optimality, and the method maximizing Kullback-Leibler information (Veldkamp & van der Linden, 2002). The objective of the study is to compare these methods under various conditions, such as different test lengths and priors.

CHAPTER 2. ABILITY ESTIMATION AND ITEM SELECTION METHODS IN MCAT

2.1 Ability Estimation Methods

In this study, we assume that the number of dimensions and the multidimensional coordinate space have been determined. The item bank exists and all item parameters $\mathbf{a}$, c, d have been calibrated. So the focus here is on administering the test and estimating examinees' ability parameters $\boldsymbol{\theta}$.

In any CAT procedure, an initial estimate of a person's location in the coordinate system, $\boldsymbol{\theta}_0$, is specified, and then an item is selected and administered to the examinee. Based on the examinee's answer to the item, an updated location estimate is calculated. Then another item is selected based on the updated location estimate. This procedure is repeated until the end of the test, when the final location estimate for the examinee is reported.

This section presents two methods for estimating persons' locations in the coordinate system: the maximum likelihood method and the Bayesian method. The algorithm of each method is briefly introduced.

2.1.1 Maximum Likelihood Method

The maximum likelihood method was first applied in MCAT by Segall (1996, 2000). It begins with the likelihood function. Assume n items have been administered. From the local independence assumption, the likelihood that an examinee with ability $\boldsymbol{\theta}$ produces the response vector $\mathbf{u}$ is:

$$L(\mathbf{u} \mid \boldsymbol{\theta}) = L(u_{v_1}, u_{v_2}, \ldots, u_{v_n} \mid \boldsymbol{\theta}) = \prod_{i \in \mathbf{v}} P_i(\boldsymbol{\theta})^{u_i}\, Q_i(\boldsymbol{\theta})^{1-u_i}, \qquad (2.1)$$

where $P_i(\boldsymbol{\theta})$ is defined by (1.3), $Q_i(\boldsymbol{\theta}) = 1 - P_i(\boldsymbol{\theta})$, and $\mathbf{v}$ is a vector containing the identifiers of the administered items.
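The likelihood in (2.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the three-dimensional item parameters are hypothetical, and the logarithm of the likelihood is computed because it is numerically more stable than the product itself.

```python
import math

def p_m3pl(theta, a, c, d):
    """Probability of a correct response under the MIRT model, equation (1.3)."""
    z = 1.7 * (sum(ak * tk for ak, tk in zip(a, theta)) + d)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def log_likelihood(theta, items, responses):
    """Logarithm of the likelihood in equation (2.1) for the administered items."""
    total = 0.0
    for (a, c, d), u in zip(items, responses):
        p = p_m3pl(theta, a, c, d)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

# Hypothetical three-dimensional items (a, c, d), for illustration only.
items = [
    ((1.2, 0.1, 0.0), 0.2, 0.5),
    ((0.1, 1.0, 0.2), 0.2, -0.3),
    ((0.0, 0.2, 1.1), 0.2, 0.0),
]
responses = [1, 0, 1]

print(log_likelihood((0.0, 0.0, 0.0), items, responses))
print(log_likelihood((1.0, 1.0, 1.0), items, responses))
```

Evaluating the function at different candidate locations shows how the observed responses favor some regions of the ability space over others.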
The maximum likelihood estimates are the solution to the set of m simultaneous equations given by:

$$\frac{\partial}{\partial \boldsymbol{\theta}} \ln L(\mathbf{u} \mid \boldsymbol{\theta}) = \mathbf{0}. \qquad (2.2)$$

This set of equations does not have a closed-form solution, so Segall (1996, 2000) suggested using an iterative numerical procedure, e.g., the Newton-Raphson procedure, to obtain the estimates. A more detailed description of the method can be found in Segall (1996, 2000).

2.1.2 Bayesian Estimation Method

The Bayesian estimation method was introduced by Segall (1996). From Bayes' theorem, the posterior density function of $\boldsymbol{\theta}$ is:

$$f(\boldsymbol{\theta} \mid \mathbf{u}) = L(\mathbf{u} \mid \boldsymbol{\theta})\,\frac{f(\boldsymbol{\theta})}{f(\mathbf{u})}, \qquad (2.3)$$

where $L(\mathbf{u} \mid \boldsymbol{\theta})$ is defined as in (2.1), $f(\boldsymbol{\theta})$ is the prior distribution of $\boldsymbol{\theta}$, and $f(\mathbf{u})$ is the marginal probability of $\mathbf{u}$. In most studies, the prior distribution of $\boldsymbol{\theta}$ is assumed to be multivariate normal with mean $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Phi}$:

$$f(\boldsymbol{\theta}) = (2\pi)^{-m/2}\,|\boldsymbol{\Phi}|^{-1/2} \exp\!\left[-\tfrac{1}{2}(\boldsymbol{\theta}-\boldsymbol{\mu})'\boldsymbol{\Phi}^{-1}(\boldsymbol{\theta}-\boldsymbol{\mu})\right]. \qquad (2.4)$$

There are two ways of obtaining point estimates of ability: the mode of the posterior distribution (MAP) or the mean of the posterior distribution (EAP). MAP is used more often simply because it requires far less computation, but with the increase in computing power, EAP is also feasible. MAP can be obtained as the solution to the system of equations:

$$\frac{\partial}{\partial \boldsymbol{\theta}} \ln f(\boldsymbol{\theta} \mid \mathbf{u}) = \mathbf{0}. \qquad (2.5)$$

As with equation (2.2), no explicit solution can be found, so an iterative numerical procedure such as the Newton-Raphson procedure must be applied. EAP is calculated by:

$$\hat{\boldsymbol{\theta}} = E(\boldsymbol{\theta} \mid \mathbf{u}), \qquad (2.6)$$

where the expectation is taken with respect to the posterior distribution of $\boldsymbol{\theta}$. A more detailed description of this method can be found in Segall (1996, 2000).

2.1.3 Other Ability Estimation Methods

Bloxom and Vale (1987) developed a multivariate extension of Owen's (1975) unidimensional sequential updating procedure through a series of normal approximations.
Tam (1992) developed an iterative maximum likelihood ability estimation method for the two-dimensional normal ogive model. Some combinations of maximum likelihood and Bayesian methods are proposed in Reckase (2009). One example is to use the Bayesian ability estimation method at the beginning of the test and, once the ability location estimates become finite, to switch to the maximum likelihood method of Section 2.1.1.

2.2 Item Selection Methods

Each time the ability location estimates are updated, the next item needs to be selected for the examinee. There are several methods for choosing the next item. All of them are based on either maximizing or minimizing some criterion at the most recently updated location estimates. The difference among the item selection methods is the kind of criterion chosen. This section describes several item selection methods that can be found in the research literature.

2.2.1 Maximizing the Determinant of the Fisher Information Matrix (D-optimality)

This method was proposed for the MCAT setting in Segall (1996). In the unidimensional case, the largest reduction in the sampling variance of $\hat{\theta}$ is achieved by selecting the item with the largest information value. In MCAT, however, information is no longer a scalar but an $m \times m$ matrix. For the information based on the previously administered items and the updated ability estimate $\hat{\boldsymbol{\theta}}$, the (r, s)-th element of the information matrix is defined as:

$$I_{rs}(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}) = -E\!\left[\frac{\partial^2 \ln L}{\partial \theta_r\, \partial \theta_s}\right], \qquad (2.7)$$

and the (r, s)-th element of an item information matrix is defined as:

$$I_{rs}(\boldsymbol{\theta}, u_i) = \frac{\dfrac{\partial P_i(\boldsymbol{\theta})}{\partial \theta_r} \times \dfrac{\partial P_i(\boldsymbol{\theta})}{\partial \theta_s}}{P_i(\boldsymbol{\theta})\,Q_i(\boldsymbol{\theta})}. \qquad (2.8)$$

This method selects the next item that achieves the largest decrement in the volume of the confidence ellipsoid. To that end, the criterion to be maximized is:

$$\arg\max \det\!\bigl(I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k)\bigr), \qquad (2.9)$$

where $\hat{\boldsymbol{\theta}}_{k-1}$ is the ability estimate updated after k-1 items have been administered, and the k-th item is the one to be selected.
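The selection rule in (2.9) can be illustrated with a short Python sketch. It assumes the model in (1.3) and three dimensions; the function names, the accumulated information matrix, and the candidate items are hypothetical values chosen for illustration, not results from this study.

```python
import math

def item_information(theta, a, c, d):
    """Item information matrix of equation (2.8) under the model in (1.3)."""
    z = 1.7 * (sum(ak * tk for ak, tk in zip(a, theta)) + d)
    psi = 1.0 / (1.0 + math.exp(-z))   # logistic part of the model
    p = c + (1.0 - c) * psi            # P_i(theta)
    # Partial derivatives of P_i(theta) with respect to each coordinate.
    grad = [(1.0 - c) * 1.7 * ak * psi * (1.0 - psi) for ak in a]
    return [[gr * gs / (p * (1.0 - p)) for gs in grad] for gr in grad]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def add(m, n):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rm, rn)] for rm, rn in zip(m, n)]

def select_d_optimal(theta_hat, test_information, candidates):
    """Index of the candidate item that maximizes the determinant in (2.9)."""
    def criterion(k):
        a, c, d = candidates[k]
        return det3(add(test_information, item_information(theta_hat, a, c, d)))
    return max(range(len(candidates)), key=criterion)

# Hypothetical information accumulated so far: the third dimension is weakest.
test_information = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.1]]
# Hypothetical candidates (a, c, d), each measuring a single dimension.
candidates = [
    ((1.0, 0.0, 0.0), 0.0, 0.0),
    ((0.0, 1.0, 0.0), 0.0, 0.0),
    ((0.0, 0.0, 1.0), 0.0, 0.0),
]

best = select_d_optimal((0.0, 0.0, 0.0), test_information, candidates)
print(best)
```

In this example the rule picks the candidate loading on the third dimension, the one about which the test so far carries the least information. Note that a single item's information matrix has rank one, so the determinant in (2.9) is only informative once the accumulated matrix is non-singular.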
More details about the method of maximizing the determinant of the Fisher information matrix are given in Segall (1996, 2000).

2.2.2 Minimizing the Trace of Inverse of Fisher Information Matrix (A-optimality)

Mulder & van der Linden (2008) introduced minimizing the trace of the inverse of the Fisher information matrix as the criterion for selecting the next item. They observed that in the optimal design literature, using the determinant or the trace of an information matrix or a covariance matrix is standard practice. While using the determinant selects items that lead to the smallest generalized variance of the ability estimators, using the trace may select a different set of items because it focuses only on the variances of the ability estimators. However, the results in Mulder & van der Linden (2008) showed that the precision obtained with the trace of the inverse of the Fisher information matrix was comparable to that obtained with the determinant of the Fisher information matrix in most cases. So this method is also included in this study. The criterion for selection is:

$$\arg\min \operatorname{trace}\!\Bigl[\bigl(I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k)\bigr)^{-1}\Bigr]. \qquad (2.10)$$

A more detailed description of this criterion, together with a three-dimensional example, can be found in Mulder & van der Linden (2008). Let the eigenvalues of $I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k)$ be $x_1, x_2, x_3$ ($x_1, x_2, x_3 \neq 0$). Then:

$$\operatorname{trace}\!\Bigl[\bigl(I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k)\bigr)^{-1}\Bigr] = \frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3},$$

$$\det\!\bigl(I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k)\bigr) = x_1 x_2 x_3.$$

So the criterion of A-optimality can be written as:

$$\arg\min \operatorname{trace}\!\Bigl[\bigl(I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k)\bigr)^{-1}\Bigr] = \arg\min \frac{x_1 x_2 + x_1 x_3 + x_2 x_3}{x_1 x_2 x_3} = \arg\max \frac{x_1 x_2 x_3}{x_1 x_2 + x_1 x_3 + x_2 x_3} = \arg\max \frac{\det\!\bigl(I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k)\bigr)}{\sum_{l=1}^{3} \det\!\Bigl(\bigl[I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k)\bigr]_{[l,l]}\Bigr)},$$

where the subscript $[l,l]$ denotes the submatrix obtained by deleting row l and column l. The A-optimality criterion thus contains the D-optimality criterion as an important part, so the behavior of D-optimality and A-optimality should be similar.

2.2.3 Largest Decrement in the Volume of Bayesian Credibility Ellipsoid

Based on the criterion in Section 2.2.1, Segall (1996) developed another criterion for item selection.
This criterion, based on Bayes' theorem, selects the next item that leads to the largest decrement in the volume of the Bayesian credibility ellipsoid. When Bayesian methods are used, prior information about the population ability distribution is available. The criterion given in Section 2.2.1 (Segall, 1996) then changes to:

$$\arg\max \det\!\bigl(I(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_{k-1}) + I(\boldsymbol{\theta}, u_k) + \boldsymbol{\Phi}^{-1}\bigr), \qquad (2.11)$$

where $\boldsymbol{\Phi}$ is as defined in (2.4), that is, the variance-covariance matrix of the prior multivariate normal ability distribution. More details about the method of maximizing the decrement in the volume of the Bayesian credibility ellipsoid are given in Segall (1996, 2000).

2.2.4 Maximizing the Kullback-Leibler Information

The information measure most used in CAT research is Fisher's information, and all of the above methods are based on it. Kullback-Leibler information was first introduced for unidimensional CAT by Chang and Ying (1996). Veldkamp and van der Linden (2002) further generalized it to the multidimensional case. Kullback-Leibler information has been suggested to perform better than Fisher information, especially during the beginning stage of the test (Chang & Ying, 1996). It is a distance between two likelihoods over the same parameter space (Lehmann & Casella, 1998). As suggested by Veldkamp and van der Linden (2002), it is desirable to select the next item that yields a likelihood at the true ability value that differs maximally from those at any other value of $\boldsymbol{\theta}$. For a single item i, Kullback-Leibler information is defined as:

$$K_i(\boldsymbol{\theta}, \boldsymbol{\theta}_0) = E\!\left[\ln \frac{L(u_i \mid \boldsymbol{\theta}_0)}{L(u_i \mid \boldsymbol{\theta})}\right], \qquad (2.12)$$

where $\boldsymbol{\theta}_0$ is the true ability value of the examinee. After n items have been administered, the measure for the response vector $\mathbf{u}$ is defined as:

$$K_n(\boldsymbol{\theta}, \boldsymbol{\theta}_0) = E\!\left[\ln \frac{L(\mathbf{u} \mid \boldsymbol{\theta}_0)}{L(\mathbf{u} \mid \boldsymbol{\theta})}\right]. \qquad (2.13)$$

Because $\boldsymbol{\theta}_0$ is unknown and $\boldsymbol{\theta}$ is unspecified, Veldkamp and van der Linden (2002) based their item selection on the posterior expected Kullback-Leibler information at the most recently updated ability estimate $\hat{\boldsymbol{\theta}}^{k-1}$ after k-1 items.
So the criterion for selecting the next item is to maximize:

\[ K_i^B(\hat{\theta}^{k-1}) = \int_{\theta} K_i(\theta, \hat{\theta}^{k-1}) \, f(\theta \mid u_1, \ldots, u_{k-1}) \, d\theta, \tag{2.14} \]

where $f(\theta \mid u_1, \ldots, u_{k-1})$ is the posterior density of $\theta$ given the responses to the first $k-1$ items. Chang and Ying (1996) and Veldkamp and van der Linden (2002) give more details on the item selection method of maximizing Kullback-Leibler information.

2.2.5 Other Item Selection Methods

The above item selection methods are the ones most often used in recent MCAT studies. However, there are other item selection methods in MCAT. For example, Mulder & van der Linden (2009) introduced three criteria for item selection based on Kullback-Leibler information: 1) posterior expected Kullback-Leibler information; 2) Kullback-Leibler distance between subsequent posteriors; 3) mutual information. Details about those criteria can be found in Mulder & van der Linden (2009). Reckase (2009) also proposed selecting the next item that maximizes the information in the direction that has the least information. More details about this item selection method can be found in Reckase (2009).

CHAPTER 3. RESEARCH QUESTIONS AND METHODS

3.1 Research Questions

The goal of this research is to compare several ability estimation and item selection methods. The criterion for comparing the methods is which estimates are closer to the true location of an examinee after a fixed number of items have been administered. Mean bias and root mean squared error (RMSE) were used as the measures of the precision of the estimates. Even though computing power is no longer a constraint, it is still of interest to compare the computation time of each method to find the balance between computation time and the precision of the estimates.

For the research questions on ability estimation methods, first of all, as mentioned above, one problem with maximum likelihood estimation (Segall 1996) is that it may not converge at the beginning of the test.
No research has shown how many items need to be administered before the estimates converge and when the estimates are near the true ability value. So the first research question is what test length is needed to obtain converging results for the maximum likelihood ability estimation method. The trend of the mean biases and RMSEs will help to decide whether the estimation is converging or not. Plots of successive estimates of the location as the test proceeds will be drawn to determine whether the estimates have converged after a certain number of items.

Both maximum likelihood (Segall 1996) and Bayesian methods (Segall 1996) are used in the MCAT research literature. So the second research question is which one of them performs better and under what conditions. The performance of the two methods was compared at different test lengths. The hypothesis is that when the test is short, Bayesian methods will outperform the maximum likelihood method. However, when the test is long, the two methods should not differ much.

Different priors may be used with the Bayesian method, and they may be more or less informative. The third research question is whether priors have any impact on the estimation results. Three priors were selected for this study: a strong prior, a relatively relaxed prior, and the true prior. We define the true prior here as the prior with the variance-covariance matrix from the whole examinee population. The hypothesis is that when the true prior is used, more accurate estimates are expected. Estimates under stronger priors are expected to be better than those under relatively relaxed priors for students whose abilities are nearer to the prior, but worse for students whose abilities are further away from the prior. The argument for using relatively relaxed priors is objectivity in the administration of the tests.
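To make the difference between the three priors concrete, the following minimal sketch builds their variance-covariance matrices and shows how each enters the Bayesian criterion of equation (2.11) through $\Phi^{-1}$. It is an illustration under stated assumptions, not the code used in the study: the "true" prior uses the correlations reported in Table 3.4, the accumulated Fisher information `fisher_info` is a made-up value, and the function name `posterior_information` is hypothetical.

```python
import numpy as np

# Prior variance-covariance matrices; all three priors have mean 0.
strong_prior = np.eye(3)          # identity matrix (strong prior)
relaxed_prior = 9.0 * np.eye(3)   # diag(9) (relatively relaxed prior)
# Values reported in Table 3.4 (Li, 2006), used here as the "true" prior.
true_prior = np.array([
    [1.0,    0.5104, 0.5117],
    [0.5104, 1.0,    0.5675],
    [0.5117, 0.5675, 1.0],
])


def posterior_information(fisher_info, prior_cov):
    """Fisher information plus inverse prior covariance, the matrix inside det() in (2.11)."""
    return fisher_info + np.linalg.inv(prior_cov)


# Hypothetical accumulated Fisher information early in a test (illustration only).
fisher_info = np.diag([0.5, 0.5, 0.5])

dets = {
    name: np.linalg.det(posterior_information(fisher_info, cov))
    for name, cov in [
        ("strong", strong_prior),
        ("relaxed", relaxed_prior),
        ("true", true_prior),
    ]
}
```

With the identity prior, $\Phi^{-1}$ adds 1 to every diagonal element of the information matrix, whereas diag(9) adds only 1/9, so early in the test the strong prior contributes much more to the determinant than the relaxed one.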
Table 3.1 All conditions in comparing different ability estimation methods

Ability Estimation Method    Prior                       Test Length
MLE                          N/A                         short, long
Bayesian                     Strong prior                short, long
Bayesian                     Relatively relaxed prior    short, long
Bayesian                     True prior                  short, long

For the research questions on item selection methods, the first is to compare the performance of D-optimality (maximizing the determinant (Segall 1996)) and A-optimality (minimizing the trace (Mulder & van der Linden 2008)) when the maximum likelihood method is used. According to the optimal design literature, the two methods should be comparable. In this study, the two item selection methods were compared at the long test length, and the research hypothesis is that their performance is comparable.

The second comparison of item selection methods is between the Bayesian method based on Fisher's information (maximizing the decrement in the volume of the Bayesian credibility ellipsoid) and the one based on Kullback-Leibler information. The comparison is conditioned on test length, and plots of successive estimates are also used to see how fast each method converges.

All methods are also compared with themselves across test lengths. The results can provide evidence, for each combination of ability update and item selection method, of when it converges to the true ability point in each dimension and yields stable and accurate estimates.

Table 3.2 All conditions in comparing item selection methods

Item Selection Method        Ability Estimation Method    Test Length
D-optimality                 MLE                          long
A-optimality                 MLE                          long
Bayesian Volume Decrease     Bayesian                     short, long
Kullback-Leibler             Bayesian                     short, long

Note: Bayesian Volume Decrease refers to the item selection method of maximizing the decrement in the volume of the Bayesian credibility ellipsoid described in section 2.2.3.

3.2 Research Methods

3.2.1 Real Data Used

In this study, the item pool was simulated based on real data from the Michigan Educational Assessment Program (MEAP). Li (2006) used the data from the 2005 MEAP mathematics test for 7th graders.
This real data set included 8562 examinees and 50 multiple choice items. According to the dimensionality analysis of Li (2006), this data set measured three ability dimensions: the first dimension measured the ability to abstract math concepts; the second dimension measured vocabulary and operations ability; the third dimension measured problem solving ability. The estimated item parameters from Li (2006) for all items are listed in Table 3.3.

Table 3.3 MIRT item parameters for Grade 7 Michigan Educational Assessment Program (MEAP) Mathematics test from Li (2006).

[Table 3.3 body: 50 items with columns for item, dimension, c, a1, a2, a3, and mdiff. Rows 1-14 are dimension 1, rows 15-33 are dimension 2, and rows 34-50 are dimension 3.]

Note: *. c is the pseudo-guessing parameter specified in equation 1.3; a1, a2, a3 are the discrimination parameters for each of the three ability dimensions, where a = (a1, a2, a3), with a as in equation 1.3; mdiff is the difficulty parameter in MIRT, with a negative value representing an easy item. It is different from d as in equation 1.3. More details are given in section 3.2.2.
**. The 'items' column contains a two-digit number for each item, representing the position of the item in the actual test administration. Abbreviations for content classifications are listed next to the number.

Also, Li's study showed that the test had simple structure, which means each item mainly loaded on one dimension. As shown in Table 3.3, the first 14 rows are the 14 items that mainly measure dimension 1, abstracting math concepts. Rows 15 to 33 are items mainly measuring dimension 2, vocabulary and operations ability. The last 17 rows represent 17 items that mainly measure dimension 3, problem solving ability. More details about the dimensional structures of test items can be found in Reckase (2009).
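As a small illustration of what simple structure means in terms of the discrimination parameters, the sketch below assigns each item to the dimension on which it discriminates most. The discrimination vectors are hypothetical values chosen for illustration, not entries of Table 3.3, and the variable names are assumptions.

```python
import numpy as np

# Hypothetical discrimination vectors (a1, a2, a3) for three items;
# these are NOT values from Table 3.3.
a = np.array([
    [1.20, 0.10, 0.15],
    [0.05, 0.90, 0.20],
    [0.10, 0.12, 1.40],
])

# Length of each discrimination vector (multidimensional discrimination).
mdisc = np.linalg.norm(a, axis=1)

# Direction cosines: the share of each item's discrimination along each axis.
dcos = a / mdisc[:, None]

# Under simple structure one direction cosine dominates; report it 1-based.
dominant_dimension = np.argmax(a, axis=1) + 1
```

For items with simple structure, one direction cosine is close to 1 and the others are close to 0, which is the pattern that the generating distributions for the direction cosines later in this chapter are meant to mimic.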
In Li (2006), all correlations among the three $\theta$-scales were about 0.5. To be more specific, for all 8562 examinees, the variance-covariance matrix among dimensions is as in Table 3.4. This was used as the true prior in the simulation for the Bayesian ability estimation method.

Table 3.4 Correlation coefficients among 3 dimensions on Grade 7 Michigan Education Assessment Program (MEAP) Mathematics Test

               Dimension 1    Dimension 2    Dimension 3
Dimension 1    1              0.5104         0.5117
Dimension 2    0.5104         1              0.5675
Dimension 3    0.5117         0.5675         1

3.2.2 Simulation

The simulation was based on the compensatory MIRT model in equation 3.1, with all c parameters set to 0:

\[ P(U_{ij} = 1 \mid \theta_j, a_i, c_i, d_i) = \frac{e^{1.7(a_i \theta_j' + d_i)}}{1 + e^{1.7(a_i \theta_j' + d_i)}} \tag{3.1} \]

In Li (2006), instead of generating a and d directly, the derived MIRT statistics $mdisc_i$, $mdiff_i$, and $dcos_{ik}$ were generated first. Parameters a and d were then derived as functions of those statistics. The relationships between them are given by equations 3.2 and 3.3.

\[ d_i = -mdiff_i \times mdisc_i \tag{3.2} \]

where $mdiff_i$ is the difficulty parameter, and $mdisc_i$ is the discriminating power of the item for the most discriminating combination of dimensions, $mdisc_i = \sqrt{\sum_{k=1}^{m} a_{ik}^2}$.

\[ a_{ik} = dcos_{ik} \times mdisc_i, \tag{3.3} \]

where $dcos_{ik}$ is the directional cosine that reflects how well an item measures each dimension.

Based on the item parameters of Li (2006), 300 items were generated, with 100 items mainly measuring each dimension. The item parameters were generated from distributions derived from Li (2006) to mimic the Grade 7 Michigan Education Assessment Program (MEAP) Mathematics test.

Table 3.5 Distributions for multidimensional item parameter generation mimicking Grade 7 Michigan Education Assessment Program (MEAP) Mathematics test. 100 items were generated for each dimension.

A. mdiff: difficulty parameter (negative value represents easy item)

               Distribution
Dimension 1    normal (mean=-0.8, var=0.6)
Dimension 2    normal (mean=0.37, var=0g)
Dimension 3    normal (mean=-0.39, var=0.4)

B.
mdisc: discriminating power of the items in the direction of best measurement

Distribution
lognormal (mean=1, var=0.03)

C. dcos: directional cosine determining the direction an item is measuring

                        Distribution
Dimension 1    dcos1    $\sqrt{1 - (dcos_2^2 + dcos_3^2)}$
               dcos2    beta (mean=0.0246, var=0.002)
               dcos3    beta (mean=0.1694, var=0.003)
Dimension 2    dcos1    beta (mean=0.1366, var=0.005)
               dcos2    $\sqrt{1 - (dcos_1^2 + dcos_3^2)}$
               dcos3    beta (mean=0.0846, var=0.004)
Dimension 3    dcos1    beta (mean=0.1161, var=0.006)
               dcos2    beta (mean=0.0507, var=0.002)
               dcos3    $\sqrt{1 - (dcos_1^2 + dcos_2^2)}$

Because the data set was three dimensional, 50 replications were simulated for each combination of $\theta_1 = -1, 0, 1$, $\theta_2 = -1, 0, 1$ and $\theta_3 = -1, 0, 1$. When Bayesian methods were used, all interim ability estimates were MAP estimates and the final ability estimates were EAP estimates. Mean bias and root mean squared error (RMSE) were used as measures of estimation precision and were calculated for each dimension. Euclidean distance was also calculated as another index of the precision of the estimates. The Euclidean distance in three-dimensional space between the estimate and the true location point was calculated as in equation 3.4.

\[ D = \sqrt{(\hat{\theta}_1 - \theta_1)^2 + (\hat{\theta}_2 - \theta_2)^2 + (\hat{\theta}_3 - \theta_3)^2}, \tag{3.4} \]

where $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3)$ is the current updated ability location point in the space, and $\theta = (\theta_1, \theta_2, \theta_3)$ is the true ability location point.

For the general maximum likelihood method, a limit of ±3 was set in order to provide estimate updates when the estimates were not converging. Whenever an estimate was larger than 3, it was set to 3. If an estimate was smaller than -3, it was set to -3.

A total of 11 conditions were simulated to make a comprehensive comparison of each ability update and item selection method. Table 3.6 shows all the conditions.

Table 3.6 All 11 simulated conditions. All simulations are for 27 true ability points, 50 replicates at each point.
Ability Estimation Method    Item Selection Method       Prior                                         Test Length
MLE                          D-optimality                N/A                                           20, 50
MLE                          A-optimality                N/A                                           50
Bayesian                     Bayesian Volume Decrease    Mean=0, var-cov=identity matrix               20, 50
Bayesian                     Bayesian Volume Decrease    Mean=0, var-cov=diag(9)                       20, 50
Bayesian                     Bayesian Volume Decrease    Mean=0, var-cov=true ability distribution     20, 50
Bayesian                     Kullback-Leibler            Mean=0, var-cov=identity matrix               20, 50

Note: Bayesian Volume Decrease here refers to the item selection method of maximizing the decrement in the volume of the Bayesian credibility ellipsoid described in section 2.2.3.

The test length of 20 was chosen to represent short tests (e.g. the Electronics Information Test in ASVAB). The test length of 50 was chosen to represent long tests (e.g. the 2007 MEAP Mathematics Test). The test lengths of 20 and 50 were generated for each condition (combination of an ability update and item selection method). For each condition, 50 replications were simulated for all 27 true ability points.

In order to answer the research question about the convergence problems of the general maximum likelihood method, test lengths of 20 and 50 for the combination of MLE and D-optimality were simulated and compared. Estimates for each dimension were calculated and compared to the true values. Euclidean distance was calculated, and plots of successive estimates were drawn to see the speed of convergence.

To compare the performance of maximum likelihood and Bayesian as the ability estimation methods, the combination of D-optimality and maximum likelihood and the combination of Fisher's information and Bayesian were simulated at the test lengths of 20 and 50. Their final estimates were compared to the true values. Euclidean distances were calculated and plots of successive estimates were drawn to compare the convergence rates.

When maximum likelihood was used as the ability estimation method, the item selection methods of A-optimality and D-optimality were simulated and compared at the test length of 50.
They were compared in terms of convergence rate and accuracy of the final estimates.

One of the research questions is to evaluate the impact of the priors when Bayesian methods are used. Three priors were selected for the simulation. All of them were multivariate normal distributions with mean 0. The first had the identity matrix as the variance-covariance matrix; this represented a strong prior. The second had diag(9), that is, all the diagonal elements were 9 and all the off-diagonal elements were 0; this represented a relatively weak prior. The last had the true ability variance-covariance matrix specified in Table 3.4, which was calculated from all 7th graders who took the 2005 MEAP test. Test lengths of 20 and 50 were simulated for each prior. Their final estimation results were compared to measure the impact of the priors.

In order to compare the performance of the item selection methods of Bayesian Volume Decrease (a short term used here for maximizing the decrement in the volume of the Bayesian credibility ellipsoid) and Kullback-Leibler information, test lengths of 20 and 50 were simulated and the final estimates were compared. The prior used for both methods was multivariate normal with mean 0 and the identity matrix as the variance-covariance matrix. Euclidean distance was calculated and plots of successive estimates were drawn to compare the convergence rates.

CHAPTER 4. RESULTS AND DISCUSSIONS

In order to measure the impact of non-convergence problems at the beginning of the test when maximum likelihood was used as the ability estimation method, the results for test lengths of 20 and 50 were compared for D-optimality. First, the comparison of biases and RMSEs is shown in Figure 4.1. When the test was short (test length = 20), the estimation was not stable: both the biases and the RMSEs were large. When the test was longer (test length = 50), both the biases and the RMSEs became smaller. The estimates were more stable and accurate for all three dimensions.
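The precision measures used throughout this chapter can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes bias is defined as the estimate minus the true value, averaged over replications, and the two example replications are made up so that the arithmetic can be checked by hand.

```python
import numpy as np


def mean_bias(estimates, true_theta):
    """Average signed error per dimension over replications (rows)."""
    return np.mean(estimates - true_theta, axis=0)


def rmse(estimates, true_theta):
    """Root mean squared error per dimension over replications (rows)."""
    return np.sqrt(np.mean((estimates - true_theta) ** 2, axis=0))


def euclidean_distance(estimates, true_theta):
    """Distance between each estimate and the true point, as in equation 3.4."""
    return np.sqrt(np.sum((estimates - true_theta) ** 2, axis=1))


# True location point and two hypothetical final estimates (illustration only).
true_theta = np.array([1.0, 1.0, 1.0])
estimates = np.array([
    [1.0, 1.0, 1.0],
    [3.0, 1.0, 1.0],
])

bias = mean_bias(estimates, true_theta)            # per-dimension mean bias
error = rmse(estimates, true_theta)                # per-dimension RMSE
dist = euclidean_distance(estimates, true_theta)   # one distance per replication
```

In the study each true location point has 50 replications, so `estimates` would have 50 rows; the mean and standard deviation of `dist` over those rows correspond to the quantities plotted for the Euclidean distance.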
Figure 4.1 Mean biases and RMSEs for maximum likelihood as the ability estimation method and D-optimality as the item selection method, at test length = 20 and at test length = 50.

[Figure 4.1, A: Biases. Panels Dim1, Dim2, Dim3, each titled "MLE & D-optimality, Test Length 20 vs 50"; x-axis: 27 true location points (θ1, θ2, θ3); y-axis: bias; markers for test length = 20 and test length = 50.]
[Figure 4.1, B: RMSEs. Panels Dim1, Dim2, Dim3, each titled "MLE & D-optimality, Test Length 20 vs 50"; x-axis: 27 true location points (θ1, θ2, θ3); y-axis: RMSE; markers for test length = 20 and test length = 50.]
The same results can be found in the Euclidean distance between the final estimates and the true ability location points for test lengths of 20 and 50. The Euclidean distance between the estimates and the true ability location points was calculated as specified in equation 3.4. For all 27 true ability points, both the means and the standard deviations of the Euclidean distance at test length 20 were larger than those at test length 50. The estimates at test length 20 were not stable, while at test length 50 the estimates were more stable and accurate. A U-shape was observed in the means of the Euclidean distance, which showed that the estimates were more precise for examinees located at 0. For examinees whose positions in the three-dimensional space were away from the origin, the precision was not as good. Means and standard deviations of the Euclidean distances are shown in Figure 4.2 for the combination of maximum likelihood and D-optimality.

Figure 4.2 Mean and standard deviation of Euclidean distance for maximum likelihood as the ability estimation method and D-optimality as the item selection method, at test length = 20 and at test length = 50.

[Figure 4.2: two panels (mean and standard deviation), each titled "Euclidean Distance: MLE & D-optimality, Test Length 20 vs 50"; x-axis: 27 true location points (θ1, θ2, θ3); markers for test length = 20 and test length = 50.]
From the results above, the maximum likelihood estimates for the short test (test length = 20) were not very reliable or accurate. The plot of successive estimates for one examinee with true location point (1, 1, 1) is shown in Figure 4.3. The initial estimate was (0, 0, 0) and the test length was 50. The plot showed that at the beginning of the test, the estimates were not converging: they hit the limits of ±3 that were set for non-converging estimates. After several items, the estimates converged and moved nearer and nearer to the true location point. The Euclidean distance at each estimate update for this particular examinee is shown in Figure 4.4. The pattern was the same: at the beginning of the test the estimation was not converging, and it took several items until it converged.

Figure 4.3 Successive progress plot of updated ability estimates and true location point after administering each item for maximum likelihood method and D-optimality. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50.

[Figure 4.3 plot titled "MLE with D-optimality"; legible axis labels: theta 1, theta 3.]

Figure 4.4 Euclidean distance between updated ability estimates and true location point after administering each item for maximum likelihood method and D-optimality. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50.

[Figure 4.4 plot titled "MLE with D-optimality"; x-axis: Number of Items (0 to 50); y-axis: Distance.]

Table 4.1 shows all the items this particular examinee took and the updated ability estimates after administering each item.
It can be seen that after 7 items, the location estimates on all three dimensions converged, and the maximum likelihood estimates themselves were used instead of the ceiling value 3 or the floor value -3. After about 38 items had been administered, this examinee's location estimates were near the true location point (1, 1, 1), and they stayed around the true point afterwards.

Table 4.1 Items administered, responses and updated ability estimates after each item for one examinee. The combination was maximum likelihood and D-optimality. Initial estimate was (0, 0, 0) and the true location was (1, 1, 1).

Items Administered    Item ID    Response    θ1       θ2      θ3
1                     30         1           3.00     3.00    3.00
2                     128        0           -0.72    3.00    3.00
3                     178        1           0.74     3.00    3.00
4                     152        0           0.60     1.13    3.00
5                     28         0           0.89     0.37    3.00
6                     143        0           0.39     0.42    3.00
7                     114        1           0.67     0.47    3.00
8                     162        0           0.65     0.74    1.69
9                     296        0           0.73     0.82    0.97
10                    220        0           0.81     0.83    0.37
11                    68         1           0.74     1.32    0.30
12                    201        1           0.70     1.26    0.61
13                    141        1           0.83     1.24    0.60
14                    64         0           0.86     1.03    0.62
15                    236        1           0.85     1.00    0.75
16                    183        1           0.97     1.00    0.75
17                    242        0           0.98     1.02    0.64
18                    115        1           1.09     1.02    0.63
19                    132        0           1.02     1.03    0.64
20                    214        0           1.02     1.05    0.53
21                    235        1           1.02     1.03    0.58
22                    56         1           1.02     1.07    0.58
23                    123        1           1.09     1.08    0.58
24                    144        0           1.05     1.07    0.58
25                    213        1           1.05     1.06    0.62
26                    251        1           1.04     1.04    0.67
27                    197        1           1.11     1.05    0.66
28                    137        0           1.08     1.04    0.67
29                    256        1           1.08     1.03    0.69
30                    164        1           1.11     1.02    0.69
31                    217        1           1.11     1.03    0.71
32                    276        1           1.10     1.02    0.76
33                    250        1           1.10     1.00    0.82
34                    218        0           1.10     1.01    0.78
35                    124        0           1.09     1.01    0.78

Table 4.1
(Continued)

Items Administered    Item ID    Response    θ1      θ2      θ3
36                    267        1           1.08    1.02    0.81
37                    285        1           1.08    1.01    0.87
38                    239        1           1.07    0.99    0.92
39                    270        1           1.07    0.97    0.97
40                    157        0           1.02    0.98    0.97
41                    126        1           1.03    0.98    0.97
42                    215        1           1.03    0.97    1.01
43                    196        0           1.01    0.95    1.01
44                    57         1           1.01    0.96    1.01
45                    272        0           1.01    0.97    0.99
46                    36         1           1.01    0.98    0.99
47                    109        1           1.04    0.98    0.99
48                    174        0           1.03    0.97    0.99
49                    289        0           1.03    0.97    0.98
50                    184        0           1.00    0.97    0.98

When the combination of the Bayesian ability estimation method and the Bayesian volume decrement item selection method was used, test lengths of 20 and 50 were compared for each prior to determine what test length was needed to obtain accurate estimates.

The first comparison was made when the prior was a multivariate normal distribution with mean 0 and the identity matrix as the variance-covariance matrix. This was used in the study as an example of a strong prior. Mean biases and RMSEs for each dimension were compared for the test lengths of 20 and 50, and the results are shown in Figure 4.5 for all three dimensions. For all dimensions, the estimation precision at the test length of 50 was slightly better than at the test length of 20. However, the differences were small, and at the test length of 20 the estimation was already stable and near the true values. Both the mean biases and the RMSEs were small. When the test length increased to 50, the biases and RMSEs became slightly smaller, but the difference was not as large as in the combinations that used the maximum likelihood method.

Figure 4.5 Mean biases and RMSEs for Bayesian as the ability estimation method and maximizing the decrement in the volume of the Bayesian credibility ellipsoid as the item selection method, with the prior set as mean 0 and the identity matrix as variance-covariance matrix, at test length = 20 and at test length = 50.

[Figure 4.5, A: Biases. Panels Dim1, Dim2, Dim3, each titled "Bayesian & Identity Prior, Test Length 20 vs 50"; x-axis: 27 true location points (θ1, θ2, θ3); y-axis: bias; markers for test length = 20 and test length = 50.]
[Figure 4.5, B: RMSEs. Panels Dim1, Dim2, Dim3, each titled "Bayesian & Identity Prior, Test Length 20 vs 50"; x-axis: 27 true location points (θ1, θ2, θ3); y-axis: RMSE; markers for test length = 20 and test length = 50.]
Similar results can be found in the Euclidean distance between the final estimates and the true ability location points. Again, the Euclidean distances were calculated according to equation 3.4.
The mean and standard deviation of the Euclidean distance for the test length of 20 were slightly larger than those for the test length of 50. However, the differences were small. At the test length of 20, the final estimates were already very near the true ability location points. Although the mean final estimates were nearer to the true ability location points when the test length increased to 50, the change was not large for practical purposes. In Figure 4.6, which shows the Euclidean distance, the difference between test lengths of 20 and 50 seems larger than in Figure 4.5. That is because of the way Euclidean distance is defined: it combines the errors over all three dimensions. The results showed that the performance of the combination of Bayesian as the ability update method and maximizing the decrement in the volume of the Bayesian credibility ellipsoid as the item selection method was already good at the test length of 20. Means and standard deviations of the Euclidean distance are shown in Figure 4.6.

Figure 4.6 Mean and standard deviation of Euclidean distance for Bayesian as the ability estimation method and maximizing the decrement in the volume of the Bayesian credibility ellipsoid as the item selection method, with the prior set as mean 0 and the identity matrix as variance-covariance matrix, for test length = 20 and for test length = 50.

[Figure 4.6: two panels (mean and standard deviation), each titled "Euclidean Distance: Bayesian & Identity Prior, Test Length 20 vs 50"; x-axis: 27 true location points (θ1, θ2, θ3); markers for test length = 20 and test length = 50.]
As with the results for the maximum likelihood method, a plot of successive estimates for the combination of Bayesian and Bayesian volume decrement with the identity matrix as prior is shown in Figure 4.7 for one example examinee with true location point (1, 1, 1) and initial estimate (0, 0, 0). This figure showed that there was no non-convergence issue with the Bayesian method and that the estimates quickly converged to the true location. This is evidence of why, at the test length of 20, the estimate was already accurate for this combination of methods. The Euclidean distance between the estimates and the true location (1, 1, 1) was calculated and is shown in Figure 4.8. The results corresponded to the results in Figure 4.7.

Figure 4.7 Successive progress plot of updated ability estimates and true location point after administering each item for Bayesian method with identity matrix. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50.

[Figure 4.7 plot titled "Bayesian identity var-cov"; legible axis labels: theta 1, theta 3.]

Figure 4.8 Euclidean distance between updated ability estimates and true location point after administering each item for Bayesian method with identity matrix. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length = 50.

[Figure 4.8 plot titled "Bayesian identity var-cov"; x-axis: Number of Items (0 to 50); y-axis: Distance.]

The study also compared the performance at different test lengths for the combination of Bayesian as the ability estimation method and maximizing the decrement in the volume of the Bayesian credibility ellipsoid as the item selection method, with the prior set as mean 0 and diag(9). The results were similar to those for the conditions that had the identity matrix as the prior variance-covariance matrix.
Both the mean biases and the RMSEs showed that the estimation precision at the test length of 50 was slightly better than at the test length of 20. However, the differences were small, and at the test length of 20 the estimation was already stable and precise. Both the mean biases and the RMSEs were small. When the test length increased to 50, the biases and RMSEs became slightly smaller, but the difference was not large. Mean biases and RMSEs for each dimension are shown in Figure 4.9.

Figure 4.9 Mean biases and RMSEs for Bayesian as the ability estimation method and maximizing the decrement in the volume of the Bayesian credibility ellipsoid as the item selection method, with the prior set as mean 0 and the diag(9) matrix as variance-covariance matrix, at test length = 20 and at test length = 50.

[Figure 4.9, A: Biases. Panels Dim1, Dim2, Dim3, each titled "Bayesian & Diag(9) Prior, Test Length 20 vs 50"; x-axis: 27 true location points (θ1, θ2, θ3); y-axis: bias; markers for test length = 20 and test length = 50.]
[Figure 4.9, B: RMSEs. Panels Dim1, Dim2, Dim3, each titled "Bayesian & Diag(9) Prior, Test Length 20 vs 50"; x-axis: 27 true location points (θ1, θ2, θ3); y-axis: RMSE; markers for test length = 20 and test length = 50.]
KAZAAAAZA 612-14100011 1411000111411000111 9224141111410000000001 1 1 1 1 1 11 1 932-10 1-10 1-10 1-10 1-10 110 140140 1401 27 True Location Points 46 Din13: Bayesian 8 Diag(9) Prior Test Length 20 vs 50 0. _ 0 Test length = 20 ID A = d — Test length 50 m ‘9 _ ti) o E a: Vt ._ O A z A N __ AngK‘ggAg 3° ZNzga O O 0 :30 o 0 A A AAXA AAA °. _ O 611—1110001114110001 11411000 111 923-1 0 1-10 1-10 1-10 1-101-10 1-10 1-101-10 1 332.1.1.1.1.1.1.1.1.1ooooooooo1 1 1 1 1 1 1 1 1 27 True Location Points For this combination with diag(9) as prior variance covariance matrix, the Euclidean distances for the test length of 20 were slightly larger than those of the test length of 50. However, the differences were small. At the test length of 20, the final estimates were very near the true ability location points. When the test length increased to 50, the mean final estimates were closer to the true ability location points. The change was not much though. This corresponded to the result for the mean biases and RMSEs: the performance of the combination method of Bayesian as the ability update method and maximizing decrement volume in Bayesian, diag (9) prior as the item selection method was already good with the test length of 20. Mean and standard deviation of the Euclidean distance were shown in Figure 4.10. 47 Figure 4.10 Mean and standard deviation of Euclidean distance of Bayesian as ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and diag (9) matrix as variance covariance matrix, at test length =20 and at test length =50. Euclidean Distance: Bayesian 8 0109(9) Prior Test Length 20 vs 50 '0. 
4 0 Test length = 20 A Test length = 50 Q _ c ‘- «I O E m — d o A A Q 4 o 612-11111111100000000011 1 111111 02'4110001 11411000111411000 1 1 1 03:101-101-101-10140 1-10 1-10 1-10 1401 27 True Location Points Euclidean Distance: Bayesian 8 0109(9) Prior Test Length 20 vs 50 Q _ m 0 Test length = 20 c5 ‘ A Test length = 50 (D o. ._ o to V O. _ N A ' ‘ A ° A A o A aAfiazao90933383939532.3390 0 c5 1 61:11111111100000000011 1111111 62:111000111—1110001 1 141—1000 111 931-10 1101-10 1-101-101—101—10 1.101401 27 True Location Points 48 Plot of successive estimates of one examinee with true location point (1, l, 1) and initial estimate (0, 0, O) with this combination of diag(9) as the prior variance covariance matrix is shown in Figure 4.11. Successive Euclidean distance between the updated estimates and the true ability location (1 , 1, 1) was shown in Figure 4.12. The same as in conditions with identity matrix as the prior variance covariance matrix, the results showed that there was no non-convergence problems at the beginning of the test and the estimates quickly moved from initial estimate to the true location point. Figure 4.11 Successive progress plot of updated ability estimates and true location point after administering each item for Bayesian method with diag(9) variance covariance matrix as prior. Initial estimate (0, O, 0). True location point (1, 1, 1). Test length=50. Bayesian Diag(9) var-cov theta 3 49 SH Figure 4.12 Euclidean distance of between updated ability estimates and true location point after administering each item for Bayesian method with diag(9) variance covariance matrix as prior. Initial estimate (0, 0, 0). True location point (1, l, 1). Test length=50. Bayesian Diag(9) var-cov Distance 3 4 l l 2 1 1 l o _ _ _ I I l l j l 0 10 20 3O 40 50 Number of Items The third and final prior set in the study was true variance covariance matrix as the prior’s variance covariance. 
A comparison between the test lengths of 20 and 50 was made to assess the convergence speed of the combination and whether the estimates were accurate for a short test. For this combination, Bayesian was the ability estimation method and maximizing decrement volume in Bayesian was the item selection method, with the prior mean set to 0 and the prior variance covariance matrix set to the true variance covariance matrix. The true variance covariance matrix was the correlation matrix calculated from all 8562 7th grade examinees of the 2005 Michigan Educational Assessment Program (MEAP) Mathematics test. It was given in Table 3.4.

When the true variance covariance matrix was used as the prior's variance covariance matrix, both the mean biases and RMSEs showed that the estimation precision at the test length of 50 was slightly better than at the test length of 20. This was the same as for the conditions that had the identity matrix and the diag(9) matrix as the prior variance covariance matrix. However, the differences were small, and at the test length of 20 the estimation was already stable and accurate. When the test length increased to 50, the biases and RMSEs became slightly smaller, but the difference was not large for practical purposes. Mean biases and RMSEs for each dimension were shown in Figure 4.13.

Figure 4.13 Mean biases and RMSEs for Bayesian as the ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and true variance covariance matrix as variance covariance matrix, at test length=20 and at test length=50.

[Figure 4.13 plots. Panel A (Biases) and panel B (RMSEs), each with Dim1, Dim2, and Dim3: Bayesian & True Prior, Test Length 20 vs 50. Legend: Test length = 20, Test length = 50. x-axis: 27 True Location Points (combinations of -1, 0, 1 on θ1, θ2, θ3). y-axis: Bias (panel A), RMSE (panel B).]

For the combination with the true variance covariance matrix as the prior variance covariance matrix, the means and standard deviations of the Euclidean distance for the test length of 20 were slightly larger than those for the test length of 50. However, the difference was small.
At the test length of 20, the final estimates were already very near the true ability location points. When the test length increased to 50, the mean final estimates became closer to the true ability location points, but the change was not large for practical purposes. The performance of the combination of Bayesian as the ability update method and maximizing decrement volume in Bayesian as the item selection method, with the true variance covariance matrix as the prior, was already good at the test length of 20. Means and standard deviations of the Euclidean distance were shown in Figure 4.14.

Figure 4.14 Mean and standard deviation of Euclidean distance of Bayesian as ability estimation method and maximizing decrement volume in Bayesian as the item selection method with prior set as mean 0 and true variance covariance matrix as variance covariance matrix, at test length=20 and at test length=50.

[Figure 4.14 plots. Two panels (mean and standard deviation), each titled Euclidean Distance: Bayesian & True Prior, Test Length 20 vs 50. Legend: Test length = 20, Test length = 50. x-axis: 27 True Location Points.]

A plot of successive estimates for one examinee with true location point (1, 1, 1) and initial estimate (0, 0, 0), using the true variance covariance matrix as the prior variance covariance matrix, is shown in Figure 4.15. The successive Euclidean distance between the updated estimates and the true ability location (1, 1, 1) is shown in Figure 4.16.
As in the conditions with the identity matrix as the prior variance covariance matrix, the results showed that there were no non-convergence problems at the beginning of the test and that the estimates moved quickly from the initial estimate to the true location point.

Figure 4.15 Successive progress plot of updated ability estimates and true location point after administering each item for Bayesian method with true variance covariance matrix as prior. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length=50.

[Figure 4.15 plot. Title: Bayesian true var-cov. Axis label: theta 3.]

Figure 4.16 Euclidean distance between updated ability estimates and true location point after administering each item for Bayesian method with true variance covariance matrix as prior. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length=50.

[Figure 4.16 plot. Title: Bayesian true var-cov. x-axis: Number of Items (0 to 50).]

When Kullback-Leibler information was used instead of Fisher's information, the test lengths of 20 and 50 were again compared to evaluate the performance of the combination at different test lengths. For this combination, Bayesian was the ability estimation method and maximizing Kullback-Leibler information was the item selection method. The mean biases and RMSEs showed that the estimation precision at the test length of 50 was better than at the test length of 20. However, the differences were small, and at the test length of 20 the estimation was already stable and precise. When the test length increased to 50, the biases and RMSEs became slightly smaller, but the difference was not large for practical purposes. This result was similar to that for the combinations in which Bayesian was used as the ability update method and Bayesian volume decrement was used as the item selection method.
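To make the idea of selecting items by Kullback-Leibler information more concrete, the following is a minimal sketch and not the procedure used in the study. It assumes a compensatory multidimensional two-parameter logistic item with discrimination vector `a` and intercept `d`, and it replaces the integrated Kullback-Leibler index with a crude average over the corners of a small cube around the current estimate. The function names, the cube half-width `delta`, and the item pool are all hypothetical.

```python
import itertools
import numpy as np

def m2pl_prob(theta, a, d):
    """P(correct) for a compensatory multidimensional 2PL item (assumed model)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

def kl_item(theta_hat, theta_alt, a, d):
    """KL divergence between the item's Bernoulli response distributions
    at the current estimate theta_hat and at an alternative point theta_alt."""
    p0 = m2pl_prob(theta_hat, a, d)
    p1 = m2pl_prob(theta_alt, a, d)
    return p0 * np.log(p0 / p1) + (1.0 - p0) * np.log((1.0 - p0) / (1.0 - p1))

def kl_index(theta_hat, a, d, delta=0.5):
    """Crude stand-in for an integrated KL index (an assumption): average KL
    over the corners of a cube of half-width delta centered at theta_hat."""
    corners = itertools.product([-delta, delta], repeat=len(theta_hat))
    return float(np.mean([kl_item(theta_hat, theta_hat + np.array(c), a, d)
                          for c in corners]))

# Hypothetical pool of (a, d) item parameters, for illustration only.
pool = [(np.array([1.2, 0.4, 0.8]), -0.3),
        (np.array([0.3, 0.2, 0.1]), 0.0),
        (np.array([0.9, 1.1, 1.0]), 0.2)]
theta_hat = np.zeros(3)
scores = [kl_index(theta_hat, a, d) for a, d in pool]
best_item = int(np.argmax(scores))  # index of the item with the largest index value
```

In this sketch the weakly discriminating second item receives the smallest index value, which reflects the general behavior that items whose response probabilities change little around the current estimate carry little Kullback-Leibler information.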
Figure 4.17 Mean biases and RMSEs for Bayesian as the ability estimation method and Kullback-Leibler information as the item selection method, at test length=20 and at test length=50.

[Figure 4.17 plots. Panel A (Biases) and panel B (RMSEs), each with Dim1, Dim2, and Dim3: Bayesian & KL, Test Length 20 vs 50. Legend: Test length = 20, Test length = 50. x-axis: 27 True Location Points (combinations of -1, 0, 1 on θ1, θ2, θ3).]

For the combination of Bayesian as the ability estimation method and Kullback-Leibler information as the item selection method, the results were similar to those for maximizing decrement volume in Bayesian. The means and standard deviations of the Euclidean distance for the test length of 20 were slightly larger than those for the test length of 50. However, the differences were small. At the test length of 20, the final estimates were very near the true ability location points. When the test length increased to 50, the mean final estimates were closer to the true ability location points, but the change was small. The performance of the combination of Bayesian as the ability update method and maximizing Kullback-Leibler information as the item selection method was already good at the test length of 20. Means and standard deviations of the Euclidean distance are shown in Figure 4.18.

Figure 4.18 Mean and standard deviation of Euclidean distance of Bayesian as ability estimation method and Kullback-Leibler information as the item selection method, at test length=20 and at test length=50.

[Figure 4.18 plots. Two panels titled Euclidean Distance: Bayesian & KL, Test Length 20 vs 50. Legend: Test length = 20, Test length = 50. x-axis: 27 True Location Points. y-axis: Mean (first panel), SD (second panel).]

A plot of successive estimates for one examinee with true location point (1, 1, 1) and initial estimate (0, 0, 0) under maximizing Kullback-Leibler information is shown in Figure 4.19. The successive Euclidean distance between the updated estimates and the true ability location (1, 1, 1) is shown in Figure 4.20. As in the conditions with all other Bayesian methods, the results for Kullback-Leibler showed that there were no non-convergence problems at the beginning of the test and that the estimates quickly moved from the initial estimate to the true location point. The results also corresponded to the analysis of mean biases and RMSEs, which showed that the estimates were already accurate for the short test (test length=20).

Figure 4.19 Successive progress plot of updated ability estimates and true location point after administering each item for Kullback-Leibler. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length=50.

[Figure 4.19 plot. Axis label: theta 3.]

Figure 4.20 Euclidean distance between updated ability estimates and true location point after administering each item for Kullback-Leibler. Initial estimate (0, 0, 0). True location point (1, 1, 1). Test length=50.
[Figure 4.20 plot. Title: KL. x-axis: Number of Items (0 to 50). y-axis: Distance.]

One of the research questions was to compare the performance of A-optimality with D-optimality as the item selection method when maximum likelihood was used as the ability estimation method. The hypothesis was that their performance was comparable. Mathematical aspects of this comparability are given in Chapter 5. Mean biases and RMSEs of the final estimates of both methods were compared at the test length of 50 for each dimension. Means and standard deviations of the Euclidean distance between the final estimates and the true ability location points were also compared.

Figure 4.21 Mean biases and RMSEs for maximum likelihood as the ability estimation method, with D-optimality and A-optimality as the item selection methods, test length=50.

[Figure 4.21 plots. Panel A (Biases) and panel B (RMSEs), each with Dim1, Dim2, and Dim3: D-optimality vs A-optimality, Test Length=50. Legend: D-optimality, A-optimality. x-axis: 27 True Location Points (combinations of -1, 0, 1 on θ1, θ2, θ3). y-axis in panel A: Bias.]

The mean biases and RMSEs were very similar for D-optimality and A-optimality. This showed that at the test length of 50 the two item selection methods were comparable, which agreed with the research hypothesis.

Figure 4.22 Mean and standard deviation of Euclidean distance for the combination with maximum likelihood as ability estimation method, comparison of D-optimality and A-optimality as item selection methods.

[Figure 4.22 plots. Two panels titled Euclidean Distance: D- vs A-optimality, Test Length=50. Legend: D-optimality, A-optimality. x-axis: 27 True Location Points. y-axis: Mean (first panel), SD (second panel).]

The means and standard deviations of the Euclidean distance between the final estimates and the true location points were measures of estimation precision over dimensions. Figure 4.22 shows that over all three dimensions, the estimation precision of D-optimality and A-optimality was similar.
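The two criteria compared above can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: it assumes a compensatory multidimensional two-parameter logistic item, whose Fisher information matrix at a point θ is P(1-P)aaᵀ, takes D-optimality as maximizing the determinant of the accumulated information matrix, and takes A-optimality as minimizing the trace of its inverse. The function names, the item pool, and the identity starting matrix are hypothetical.

```python
import numpy as np

def item_info(theta, a, d):
    """Fisher information matrix of one M2PL item at theta: P(1-P) * a a^T."""
    p = 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))
    return p * (1.0 - p) * np.outer(a, a)

def select_item(theta, info_so_far, pool, criterion):
    """Pick the pool index that is best under the 'D' or 'A' criterion."""
    values = []
    for a, d in pool:
        total = info_so_far + item_info(theta, a, d)
        if criterion == "D":
            # D-optimality: a larger determinant is better.
            values.append(np.linalg.det(total))
        elif criterion == "A":
            # A-optimality: a smaller trace of the inverse is better,
            # so the negated trace is maximized.
            values.append(-np.trace(np.linalg.inv(total)))
        else:
            raise ValueError("criterion must be 'D' or 'A'")
    return int(np.argmax(values))

# Hypothetical item pool of (a, d) pairs, for illustration only.
pool = [(np.array([1.0, 0.0, 0.0]), 0.0),
        (np.array([0.8, 0.8, 0.8]), 0.0),
        (np.array([0.5, 0.5, 0.0]), 0.0)]
theta_hat = np.zeros(3)
info_so_far = np.eye(3)  # assumed nonsingular accumulated information
d_choice = select_item(theta_hat, info_so_far, pool, "D")
a_choice = select_item(theta_hat, info_so_far, pool, "A")
```

With this hypothetical pool and an identity starting matrix, both criteria pick the same item. In general the two criteria can disagree on individual items, which is why the comparison of final estimates reported above is informative.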
For the research question on the impact of priors on the performance of Bayesian as the ability estimation method and maximizing decrement volume in Bayesian as the item selection method, comparisons were made at the test lengths of 20 and 50 with three prior variance covariance matrices: the identity matrix, diag(9), and the true variance covariance matrix calculated from the population. Mean biases and RMSEs are shown in Figure 4.23 for the test length of 20 and in Figure 4.24 for the test length of 50.

Figure 4.23 Mean biases and RMSEs for Bayesian as the ability estimation method, comparison of prior variance covariance matrix as: 1) identity matrix; 2) diag(9) and 3) true variance covariance matrix. Test length=20.

[Figure 4.23 plots. Panel A (Biases) and panel B (RMSEs), each with Dim1, Dim2, and Dim3: Prior Identity vs diag(9) vs True, Test Length=20. Legend: Identity Prior, Diag(9) Prior, True Prior. x-axis: 27 True Location Points (combinations of -1, 0, 1 on θ1, θ2, θ3). y-axis in panel B: RMSE.]

At the test length of 20, when the true value on the dimension for which the biases were calculated was 0, the biases for all three priors were very close. When the true value was either 1 or -1, the biases for the true variance covariance matrix were the largest of the three, while those for the identity matrix and diag(9) were comparable. Overall, however, the biases for all three priors on all three dimensions were very small and comparable, even though the true variance covariance matrix had the largest biases for true values away from 0.
The comparison of RMSEs showed that all three priors were comparable, with no large difference at the test length of 20.

Figure 4.24 Mean biases and RMSEs for Bayesian as the ability estimation method, comparison of prior variance covariance matrix as: 1) identity matrix; 2) diag(9) and 3) true variance covariance matrix. Test length=50.

[Figure 4.24 plots. Panel A (Biases) and panel B (RMSEs), each with Dim1, Dim2, and Dim3: Prior Identity vs diag(9) vs True, Test Length=50. Legend: Identity Prior, Diag(9) Prior, True Prior. x-axis: 27 True Location Points (combinations of -1, 0, 1 on θ1, θ2, θ3).]

Figure 4.24 compares the mean biases and RMSEs for all three priors at the test length of 50. At this test length, both biases and RMSEs were very small and the estimates were very accurate for all three priors. Therefore, when the test was long (test length=50), the impact of the prior was small for the combination of Bayesian as the ability estimation method and maximizing volume decrement in Bayesian as the item selection method. For all three priors, that is, the strong prior, the relatively weak prior, and the true prior, this combination produced accurate estimates at the end of the test.

The above results were reported for each dimension. An overall measure, the Euclidean distance, was also calculated and is shown in Figure 4.25. The mean and standard deviation of the Euclidean distance between the final estimates and the true location points for both the test length of 20 and the test length of 50 showed that the impact of priors was small and that all three priors were comparable.

Figure 4.25 Mean and standard deviation of Euclidean distance of Bayesian as the ability estimation method, comparison among prior variance covariance matrix: 1) identity matrix, 2) diag(9), and 3) true variance covariance matrix.
[Figure 4.25 plots. Test length=20 and Test length=50, each with two panels titled Euclidean Distance of 3 Prior var-cov: Identity vs diag(9) vs True. Legend: Identity Prior, Diag(9) Prior, True Prior. x-axis: 27 True Location Points. y-axis: Mean (first panel), SD (second panel).]

Another research question was which ability estimation method performed better, maximum likelihood or Bayesian. To make the comparison, the combination of maximum likelihood with D-optimality and the combination of Bayesian with maximizing volume decrement in Bayesian, using the identity matrix as the prior, were compared at the test lengths of 20 and 50. The mean biases and RMSEs were compared; the results at the test length of 20 are shown in Figure 4.26 and the results at the test length of 50 are shown in Figure 4.27.

Figure 4.26 Mean biases and RMSEs for comparison of maximum likelihood method and Bayesian method. Test length=20.
[Figure 4.26 plots. Panel A (Biases) and panel B (RMSEs), each with Dim1, Dim2, and Dim3: MLE vs Bayesian, Test Length=20. Legend: MLE, Bayesian. x-axis: 27 True Location Points (combinations of -1, 0, 1 on θ1, θ2, θ3). y-axis in panel B: RMSE.]

At the test length of 20, Figure 4.26 shows that the mean biases of maximum likelihood were much larger than those of the Bayesian method. The comparison of RMSEs also confirmed that the Bayesian ability estimation method outperformed maximum likelihood for a short test (test length=20).
Another notable pattern in Figure 4.26 was that, for negative true ability values, the mean biases for the maximum likelihood method were negative while those for the Bayesian method were positive. For positive true ability values, the mean biases for the maximum likelihood method were positive and those for the Bayesian method were negative. In other words, the maximum likelihood estimates tended to fall farther from 0 than the true values, whereas the Bayesian estimates were pulled toward the prior mean of 0.

Figure 4.27 Mean biases and RMSEs for comparison of maximum likelihood method and Bayesian method. Test length=50.

[Figure 4.27 plots. Panel A (Biases) and panel B (RMSEs), each with Dim1, Dim2, and Dim3: MLE vs Bayesian, Test Length=50. Legend: MLE, Bayesian. x-axis: 27 True Location Points (combinations of -1, 0, 1 on θ1, θ2, θ3).]

Figure 4.27 showed that at the test length of 50, the mean biases of maximum likelihood were still larger than those of the Bayesian method. RMSEs were also larger for the maximum likelihood method than for the Bayesian method. Therefore, even for a long test (test length=50), the Bayesian ability estimation method still outperformed the maximum likelihood ability estimation method.

The mean biases and RMSEs in Figure 4.26 and Figure 4.27 were measures of the precision of each dimension. The study also used the means and standard deviations of the Euclidean distance between the final estimates and the true location points as measures of overall precision. Figure 4.28 shows the means and standard deviations of the Euclidean distance for the comparison of the maximum likelihood and Bayesian ability estimation methods at both the test length of 20 and the test length of 50.

Figure 4.28 Means and standard deviations of Euclidean distance, comparison of maximum likelihood method and Bayesian method. Test length=20 and Test length=50.
[Figure 4.28 plots are not recoverable from the extracted text. For each test length (20 and 50) there are two plots, Mean and SD of the Euclidean distance, comparing MLE and Bayesian. The horizontal axis is the 27 true location points (θ1, θ2, θ3).]

At the test length of 20, the mean Euclidean distances between the final estimates and the true ability points were much larger for the maximum likelihood method than for the Bayesian ability estimation method. The standard deviations of the Euclidean distance were also much larger for the maximum likelihood estimation method. Thus, across all three dimensions, the estimation precision of the maximum likelihood method was poor, and the Bayesian estimation method outperformed it by a large margin at the short test length (test length=20). When the test length increased to 50, the accuracy of both methods improved and the gap in precision between the two methods narrowed.
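The overall precision measure used in this comparison, the Euclidean distance between each final estimate and the true location point, can be sketched as follows. This is a minimal illustration rather than the study's code: the function name and the example estimates are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which form was used.

```python
import math
import statistics

def euclidean_summary(estimates, true_theta):
    """Mean and sample SD of the distance between each final estimate and the true point."""
    distances = [math.dist(rep, true_theta) for rep in estimates]
    return statistics.mean(distances), statistics.stdev(distances)

# Hypothetical final estimates for three replications at the true point (1, -1, 0).
true_theta = [1.0, -1.0, 0.0]
estimates = [
    [2.0, -1.0, 0.0],   # distance 1
    [1.0,  1.0, 0.0],   # distance 2
    [1.0, -1.0, 3.0],   # distance 3
]
mean_dist, sd_dist = euclidean_summary(estimates, true_theta)
print(mean_dist, sd_dist)
```

A smaller mean distance indicates estimates that are closer to the true point over all three dimensions at once, and a smaller standard deviation indicates more stable estimates.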
However, the means and standard deviations of the Euclidean distance showed that, even at the test length of 50, the overall estimation accuracy was still better for the Bayesian method than for the maximum likelihood estimation method. The results also showed that the final estimates of the Bayesian method were more stable than those of the maximum likelihood method.

The last research question was to compare the performance of volume decrement in Bayesian with Fisher's information and the performance of maximizing Kullback-Leibler information. To make this comparison, both methods used a prior with mean 0 and the identity matrix as the variance-covariance matrix. The comparison was conditioned on test length. The mean biases and RMSEs for the final estimates of each dimension were calculated. Figure 4.29 shows the comparison at the test length of 20 and Figure 4.30 shows the comparison of the two methods at the test length of 50.

Figure 4.29 Mean biases and RMSEs of the comparison of Kullback-Leibler and Volume decrement in Bayesian. Variance covariance of priors is identity matrix. Test length=20.

[Figure 4.29 plots are not recoverable from the extracted text. Panel A: Biases; Panel B: RMSEs. Each panel contains three plots (Dim1, Dim2, Dim3) comparing Kullback-Leibler and Bayesian Volume Decrement at test length=20. The horizontal axis is the 27 true location points (θ1, θ2, θ3).]

At the test length of 20, the mean biases were small for both the volume decrement in Bayesian with Fisher's information method and the maximizing Kullback-Leibler information item selection method. Both methods produced accurate final estimates at the test length of 20. RMSEs were also small for both methods, so the estimates of both methods were already stable at the test length of 20. Both the mean biases and the RMSEs showed that Kullback-Leibler information and the Bayesian method using Fisher's information were comparable at the test length of 20.

Figure 4.30 Mean biases and RMSEs of the comparison of Kullback-Leibler and Volume decrement in Bayesian. Variance covariance of priors is identity matrix. Test length=50.

[Figure 4.30 plots are not recoverable from the extracted text. Panel A: Biases; Panel B: RMSEs. Each panel contains three plots (Dim1, Dim2, Dim3) comparing Kullback-Leibler and Bayesian Volume Decrement at test length=50. The horizontal axis is the 27 true location points (θ1, θ2, θ3).]

When the test length increased to 50, the mean biases and RMSEs showed that the precision of both methods (volume decrement in Bayesian using Fisher's information and maximizing Kullback-Leibler information) was good, and the two methods were comparable in terms of estimation accuracy and stability.

The overall measure, Euclidean distance, was also calculated and is shown in Figure 4.31. The comparison of the means and standard deviations of the Euclidean distance showed that the performance of Kullback-Leibler information and of volume decrement in Bayesian with Fisher's information was comparable at both test lengths, 20 and 50.

Figure 4.31 Means and standard deviations of Euclidean distance, comparison of Kullback-Leibler information and volume decrement in Bayesian with Fisher's information.

[Figure 4.31 plots are not recoverable from the extracted text. For each test length (20 and 50) there are two plots, Mean and SD of the Euclidean distance, comparing Kullback-Leibler and Bayesian Volume Decrement. The horizontal axis is the 27 true location points (θ1, θ2, θ3).]

As stated in the research questions, even though computation time is not as important as it once was, it was still of interest to record the computation time of each method in order to weigh estimation precision against computation time. For each combination of ability estimation and item selection method, the computation time was measured at the examinee level, in seconds. The times shown in Table 4.2 are the number of seconds needed to administer the test to one examinee. In the simulation study, the examinees' response time was set to 0. Except for the item selection method using Kullback-Leibler information, the computation time was roughly 2 to 3 seconds for the test length of 20 and roughly 7 to 9 seconds for the test length of 50. When Kullback-Leibler information was used, the computation time increased about 10 times for both test lengths (20.694 and 99.624 seconds).

Table 4.2 Computation time for each examinee (Unit: second)

Ability Estimation   Item Selection              Prior      Test         Test
Method               Method                                 Length=20    Length=50
MLE                  D-optimality                N/A         2.765        8.696
MLE                  A-optimality                N/A         1.889        6.580
Bayesian             Bayesian Volume Decrement   Identity    2.442        9.429
Bayesian             Bayesian Volume Decrement   diag(9)     2.460        9.490
Bayesian             Bayesian Volume Decrement   True        2.441        9.418
Bayesian             Kullback-Leibler            Identity   20.694       99.624

Note: All the integration calculations were programmed in FORTRAN and all other programming was done in R. All computation times were measured on a PC with a 3.0 GHz AMD Athlon 64 Dual Core processor and 2.00 GB RAM.

CHAPTER 5.
CONCLUSION, RECOMMENDATION, AND FUTURE RESEARCH DIRECTION

This study conducted a comprehensive comparison of ability estimation and item selection methods in multidimensional computerized adaptive testing. The two ability estimation methods were maximum likelihood estimation and Bayesian estimation. The item selection methods fall into three categories: item selection methods associated with maximum likelihood, item selection in Bayesian with Fisher's information, and item selection with Kullback-Leibler information. D-optimality (maximizing the determinant of Fisher's information) and A-optimality (minimizing the trace of the inverse of Fisher's information) were the item selection methods associated with the maximum likelihood method. Three priors were used with the Bayesian method that maximizes the volume decrement with Fisher's information, in order to measure the impact of the prior. Two test lengths were used (test length=20 and test length=50). In total, 11 combinations of ability estimation and item selection methods were simulated and compared in the study.

The initial estimate for all examinees was 0 and the mean of all priors was 0. This led to a common trend in the biases. For Bayesian estimation, all biases were "inward": estimators of positive values of θi (i=1, 2, 3) were negatively biased and estimators of negative values were positively biased. In contrast, when maximum likelihood estimation was used, the biases were "outward": estimators of positive values of θi were positively biased and estimators of negative values were negatively biased.

The mean biases and RMSEs of the final estimates for each dimension, together with the means and standard deviations of the Euclidean distance, showed that the maximum likelihood ability estimation method did have non-convergence problems at the beginning of the test and that these problems affected the estimation precision of the method.
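The two Fisher-information criteria named in this chapter, D-optimality and A-optimality, can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration: the compensatory multidimensional two-parameter logistic model, the function names, the accumulated information matrix, and the three-item pool are hypothetical and are not the study's item pool or code.

```python
import math

def item_information(a, d, theta):
    """Fisher information matrix of one item under an assumed compensatory M2PL model."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * ai * aj for aj in a] for ai in a]

def add(m1, m2):
    """Element-wise sum of two matrices given as lists of rows."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def det3(m):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def trace_inv3(m):
    """Trace of the inverse of a 3x3 matrix: sum of principal 2x2 minors over the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    minors = (e * i - f * h) + (a * i - c * g) + (a * e - b * d)
    return minors / det3(m)

def select_item(pool, info_so_far, theta_hat, criterion):
    """Index of the pool item that is best under 'D' or 'A' optimality."""
    best, best_value = None, None
    for idx, (a, d) in enumerate(pool):
        total = add(info_so_far, item_information(a, d, theta_hat))
        # D-optimality maximizes the determinant; A-optimality minimizes the
        # trace of the inverse, so it is negated to let both cases maximize.
        value = det3(total) if criterion == "D" else -trace_inv3(total)
        if best_value is None or value > best_value:
            best, best_value = idx, value
    return best

# Hypothetical state: dimension 3 has been measured less well than dimensions 1 and 2.
info_so_far = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.5]]
theta_hat = [0.0, 0.0, 0.0]
pool = [
    ([1.5, 0.2, 0.1], 0.0),   # loads mainly on dimension 1
    ([0.1, 0.2, 1.5], 0.0),   # loads mainly on dimension 3
    ([0.8, 0.8, 0.8], 0.0),   # loads equally on all dimensions
]
d_choice = select_item(pool, info_so_far, theta_hat, "D")
a_choice = select_item(pool, info_so_far, theta_hat, "A")
print(d_choice, a_choice)
```

In this constructed case both criteria pick the item that loads on the weakly measured third dimension, because that item adds the most to the determinant and removes the most from the trace of the inverse. The two criteria need not agree in general.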
Plots of the successive updated estimates also supported the conclusion that maximum likelihood estimation suffered from non-convergence early in the test. Therefore, a longer test is recommended when the maximum likelihood ability estimation method is applied.

When the Bayesian ability estimation method was applied, for all combinations with the item selection methods, the comparison of test lengths of 20 and 50 showed that the difference in precision was small. The final estimates were already stable and accurate at the test length of 20. Therefore, if the Bayesian ability estimation method is used, a short test (test length=20 or more) can be used.

The comparison of the maximum likelihood and Bayesian ability estimation methods showed that the Bayesian method outperformed the maximum likelihood method, especially for the short test length. In general, Bayesian ability estimation is recommended as the ability estimation method. With the Bayesian method, however, test designers need to select the priors, which might not be as objective as the maximum likelihood method. All of these factors need to be taken into consideration when choosing the ability estimation method. In theory, if the test is very long (so that the estimates of both methods have converged and are stable), the estimates of the two methods should be comparable.

The study also evaluated the impact of priors when the Bayesian method was used. Three priors were compared: a strong prior, a relatively weak prior, and a true prior calculated from the population. When the true ability value on a dimension was 0, all three priors were comparable and the mean biases were small. When the true ability value was negative or positive, and contrary to the research hypothesis, the true prior did not perform as well as the other two priors. This was because the mean of the multivariate normal distribution for all priors was 0, so the priors pulled the estimates toward 0. With the true prior the pull was the strongest, so the biases were the largest.
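The pull of a zero-mean prior toward 0 can be made concrete with a one-dimensional sketch. It relies on simplifying assumptions that are not part of the study: the likelihood is approximated by a normal distribution centered at the maximum likelihood estimate, each dimension is treated separately, diag(9) is read as a prior variance of 9 on each dimension, and the estimate and test information values are hypothetical. The covariance of the true prior is not given in this section, so it is not included.

```python
def posterior_mean(mle, test_info, prior_var, prior_mean=0.0):
    """Posterior mean under a normal prior and a normal approximation to the likelihood."""
    prior_precision = 1.0 / prior_var
    return (test_info * mle + prior_precision * prior_mean) / (test_info + prior_precision)

# Hypothetical maximum likelihood estimate and test information on one dimension.
mle, test_info = 1.0, 4.0

strong = posterior_mean(mle, test_info, prior_var=1.0)   # identity prior
weak = posterior_mean(mle, test_info, prior_var=9.0)     # diag(9) prior
print(round(strong, 3), round(weak, 3))
```

With these numbers the posterior mean is 0.8 under the variance-1 prior and about 0.97 under the variance-9 prior, so the tighter prior shrinks the estimate more strongly toward 0. For a negative estimate the shrinkage moves the posterior mean upward, which matches the inward bias described for the Bayesian method.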
For all three priors, however, and at both the short and long test lengths, the performance of Bayesian estimation was good and the final estimates were stable and accurate. More studies are needed on how to use collateral information in the priors to improve estimation with the Bayesian method. Instead of the population prior used in this study, an individual prior could be used, or hierarchical models could be tested, to see whether they lead to better final estimates. Each prior used in the study had equal values on its diagonal. Such priors are more regular than cases in which the variances differ considerably and the correlations are more varied. More studies are needed to investigate such priors and to assess the performance of item selection and ability estimation methods under those conditions.

Kullback-Leibler information is relatively new compared to Fisher's information in multidimensional adaptive testing. The comparison of the two in the study showed that their performance was comparable for both the short and long test lengths. However, Kullback-Leibler information required much more computation time than the other methods. If computation time is a concern for a test application, volume decrement in Bayesian with Fisher's information is recommended rather than Kullback-Leibler information. Also, the cases studied here were three-dimensional. As the number of dimensions increases, computation time is expected to increase as well. Therefore, extra care should be taken if higher dimensions are studied.

Multidimensional computerized adaptive testing is a relatively new area of research. This study compared ability estimation and item selection methods in order to offer recommendations and guidance on which ability estimation and item selection methods to use when designing a multidimensional computerized adaptive test.
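The computational cost of Kullback-Leibler information discussed in this chapter, and its expected growth with the number of dimensions, can be illustrated with a rough sketch. The item model (a compensatory multidimensional two-parameter logistic model), the equally weighted evaluation grid, the function names, and the item parameters are all assumptions chosen for illustration. The study's actual Kullback-Leibler index and its FORTRAN integration routine are not reproduced here.

```python
import itertools
import math

def prob(a, d, theta):
    """Correct-response probability under an assumed compensatory M2PL model."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def kl_item(a, d, theta0, theta):
    """KL divergence between the item's response distributions at theta0 and at theta."""
    p0, p = prob(a, d, theta0), prob(a, d, theta)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))

def grid_kl_index(a, d, theta_hat, points_per_dim=7, half_width=3.0):
    """Average KL over an evenly spaced grid centered at theta_hat (equal weights assumed)."""
    step = 2.0 * half_width / (points_per_dim - 1)
    axis = [-half_width + k * step for k in range(points_per_dim)]
    grid = list(itertools.product(axis, repeat=len(theta_hat)))
    total = sum(
        kl_item(a, d, theta_hat, [t + g for t, g in zip(theta_hat, point)])
        for point in grid
    )
    return total / len(grid), len(grid)

# Hypothetical item and current estimate in three dimensions.
a, d = [1.0, 0.5, 0.2], 0.0
theta_hat = [0.0, 0.0, 0.0]
index, n_points = grid_kl_index(a, d, theta_hat)
print(index, n_points)
```

With 7 points per dimension the grid has 7 ** 3 = 343 evaluation points per candidate item in three dimensions, and the count is multiplied by 7 for each added dimension. This is one way to see why an integration-based criterion costs more than a criterion evaluated at a single point estimate.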
The conclusions of this study are limited to the conditions of the item pool, test lengths, and priors used. Also, during the course of this study, more ability estimation and item selection methods were being developed. Future research should therefore compare the new methods with the methods in this study. There are also other issues in multidimensional CAT, such as how to select the first item and how to end the test, that need more research.