MULTIDIMENSIONAL ITEM RESPONSE THEORY: AN INVESTIGATION OF INTERACTION EFFECTS BETWEEN FACTORS ON ITEM PARAMETER RECOVERY USING MARKOV CHAIN MONTE CARLO

By

Jonghwan Lee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Measurement and Quantitative Methods

2012

ABSTRACT

MULTIDIMENSIONAL ITEM RESPONSE THEORY: AN INVESTIGATION OF INTERACTION EFFECTS BETWEEN FACTORS ON ITEM PARAMETER RECOVERY USING MARKOV CHAIN MONTE CARLO

By

Jonghwan Lee

It has been more than 50 years since Lord (1952) published "A Theory of Test Scores" (Psychometric Monograph No. 7), which is recognized as one of the most influential works in the history of Item Response Theory (IRT). Since then, there has been extensive research investigating several aspects of IRT, such as (1) modeling, (2) estimation of latent traits, and (3) estimation of item parameters. There has also been extensive development of IRT-based applications, such as (1) equating, (2) linking, (3) differential item functioning (DIF), and (4) standard setting, among others. All of these applications rest on the same assumption: that the item parameters are calibrated as accurately as possible. Accordingly, there has been extensive research into techniques for estimating the item and latent trait parameters. The earliest estimation techniques were based on the uni-dimensional IRT model; estimation procedures have since become more sophisticated with the appearance of multidimensional item response theory (MIRT) models. In MIRT, several factors influence the calibration procedure, such as (1) the number of latent traits; (2) the correlation between the latent traits; (3) non-normal distributions of the latent traits; and (4) different types of configurations of latent traits (approximate simple structure and mixed structure).
In this study, the interaction effects of combined factors on item parameter recovery were investigated using the Markov Chain Monte Carlo (MCMC) simulation method. The findings show that a higher number of dimensions requires a larger sample size: 2000 examinees for 6 dimensions versus 1000 for 3 dimensions. These sample-size requirements, however, assume uncorrelated and normally distributed latent traits. This study shows that if an additional factor such as correlation or skewness is introduced into the latent trait distribution, increasing the sample size does not improve the accuracy of item parameter recovery. Rather, an alternative MIRT model should be considered in the case of correlated latent traits, and non-normal latent trait distributions should be transformed to normal distributions. The a-parameters are more affected when the latent traits are correlated; the d-parameters are more affected when the latent trait distribution is skewed. Overall, the more factors that influence the estimation of the parameters of the MIRT model, the higher the bias found in the item parameter calibration. If the latent traits are independent and normally distributed, then the higher the dimensionality of the model specification, the less bias there is in item parameter calibration. It is also true that if the latent structures have different types of configuration, such as AS or MS, then increasing the number of dimensions may decrease the bias created by those configurations. When the latent traits are suspected of having a skewed or non-normal distribution, the bias is not reduced by simply increasing the sample size, though it may help to increase the number of items at the same time. Another way to address this problem is to use a sample of examinees selected from a wide range of abilities.
This is also true when the latent traits are correlated with each other. Selecting the examinee group carefully greatly reduces the bias resulting from the item calibration procedure.

Copyright by JONGHWAN LEE 2012

DEDICATION

To my brother, Koowhan Lee, who supported me from the beginning of this long journey. Without his support, trust, and patience, I would not have been able to complete my doctoral program. I also would like to give my deep appreciation to my sister-in-law. To my family, who gave me endless support through my doctoral program. Their love and support made me a better person, and gave me unlimited energy to finish the program.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my deepest appreciation to the committee chair and my advisor, Dr. Mark Reckase. Without his guidance, support, and encouragement, I would not have been able to finish my degree. I remember when I took his class at the beginning of my doctoral studies. He offered me truly deep insight, knowledge, and understanding of what measurement is about. My deepest appreciation goes to the other committee members, Dr. Kimberley S. Maier, Dr. Spyros Konstantopoulos, and Dr. Ryan Bowles. In particular, Dr. Kimberley S. Maier's help and advice were essential to my finishing the dissertation. Without her help at the last minute, I would not have been able to complete it. My sincere appreciation also goes to Dr. Barbara Schneider, who supported me financially and academically. She showed me the right direction for a scholar to take. Aside from her financial support, she trained me to be a better scholar. My appreciation also goes to all of the members of the College Ambition Program (CAP) project, Christina Mazuca, Justina Judy, and all of the others who supported me through hard times. Also, while I have not named them all, my deep appreciation goes to all the friends with whom I shared my life as a Ph.D. student at MSU.
I would especially like to thank Eun Jeong Noh, who has been the best colleague during my doctoral program. Last but most of all, my deepest appreciation goes to my family members: my father, SangYeon Lee; my mother, Dooyi Yoo; my brother, MaengHwan Lee; my sister-in-law, Haesuk Kwon; and my sisters, nieces, and nephews. They all deserve to be called Doctor. Without their endless love and support, I would not be standing here in my doctoral gown and hood.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
1.1. Multi-dimensional Item Response Theory (MIRT)
1.2. Current Issues in Item Parameter Calibration Procedures
1.3. Focus of This Study
1.4. Research Questions to be Addressed
CHAPTER 2
LITERATURE REVIEW
2.1 Uni-dimension to Multi-dimensions
2.2 Types of Latent Configurations: Approximate Simple Structure (AS) and Mixed Structure (MS)
2.3 Correlated Latent Traits
2.4 Skewed Latent Trait Distributions
2.5 Item Parameter Estimation Techniques: MLE and MCMC
CHAPTER 3
RESEARCH DESIGN
3.1. Model Specification
3.2 Data Generation
3.2.1 Skewed Multivariate Normal Distribution
3.3 MCMC Simulation
3.4 Assessing Convergence of the MCMC Simulation
3.5 Prior Distributions and Likelihood Functions
3.5.1 Prior Distributions
3.5.2 Likelihood Functions
3.6 Evaluation Criteria for Simulation Results
CHAPTER 4
RESULTS
4.1. Convergence Diagnostic
4.1.1. Heidelberger and Welch diagnostic
4.1.2. Geweke diagnostics
4.1.3. Graphical diagnostics: Autocorrelation, posterior density, and trace plots
4.1.4. MCMC standard error
4.1.5. Item parameter recovery diagnostic
4.2. 3-Dimensions
4.2.1. Approximate Simple Structure (AS) and Mixed Structures (MS)
4.2.2. Correlated Latent Traits
4.2.3. Skewed Latent Trait Distributions
4.2.4. Correlated Latent Traits and Skewed Latent Traits Distributions
4.3. 6-Dimensions
4.3.1. Approximate Simple Structures and Mixed Structures
4.3.2. Correlated Latent Traits
4.3.3.
Skewed Latent Traits Distributions
4.3.4. Correlated and Skewed Latent Traits Distributions
CHAPTER 5
SUMMARY AND DISCUSSION
5.1. Overview of the Study
5.2. Summary of Results
5.2.1. Sample Size
5.2.2. Types of Latent Trait Configuration: Approximate Simple (AS) and Mixed Traits (MS)
5.2.3. Correlated Latent Traits
5.2.4. Skewed Latent Traits Distributions
5.2.5. Correlated and Skewed Latent Traits Distributions
5.3. Discussion
5.4. Implications and Limitations
5.4.1. Implications
5.4.2.
Limitations
APPENDIX
REFERENCES

LIST OF TABLES

Table 3.2.1. True item parameters for 3-dimensions with AS and MS
Table 3.2.2. True item parameters for 6-dimensions with AS structures
Table 3.2.3. True item parameters for 6-dimensions with MS
Table 4.1.1. Geweke’s Z-score
Table 4.1.2. MCMC standard error
Table 4.1.3. Highest posterior interval for a1, a2, and a3-parameters
Table 4.1.4. Highest posterior interval for a4, a5, and a6-parameters
Table 4.1.5. Highest posterior interval for d-parameters
Table 4.3.1. BIAS for different types of latent trait configuration (AS vs. MS)
Table 4.3.2. MAD for different types of latent trait configuration (AS vs. MS)
Table 4.3.3. RMSE for different types of latent trait configuration (AS vs. MS)
Table 4.3.4. BIAS for correlated latent traits (MS only)
Table 4.3.5. MAD for correlated latent traits (MS only)
Table 4.3.6.
RMSE for correlated latent traits (MS only)
Table 4.3.7. BIAS when skew is imposed on the latent traits distributions (+.9 and -.9)
Table 4.3.8. MAD when skew is imposed on the latent traits distributions (+.9 and -.9)
Table 4.3.9. RMSE when skew is imposed on the latent traits distributions (+.9 and -.9)
Table 4.3.10. BIAS when both correlation and skew are imposed on the latent traits distributions
Table 4.3.11. MAD when both correlation and skew are imposed on the latent traits distributions
Table 4.3.12. RMSE when both correlation and skew are imposed on the latent traits distributions
Table A.1.1. Heidelberger and Welch’s Convergence Diagnostic: a1-parameter
Table A.1.2. Heidelberger and Welch’s Convergence Diagnostic: a2-parameter
Table A.1.3. Heidelberger and Welch’s Convergence Diagnostic: a3-parameter
Table A.1.4. Heidelberger and Welch’s Convergence Diagnostic: a4-parameter
Table A.1.5. Heidelberger and Welch’s Convergence Diagnostic: a5-parameter
Table A.1.6. Heidelberger and Welch’s Convergence Diagnostic: a6-parameter
Table A.1.7. Heidelberger and Welch’s Convergence Diagnostic: d-parameter
Table A.2.1.
Geweke’s Z-score
Table A.3.1. MCMC standard error
Table A.4.1. Highest posterior density (HPD) interval for a1, a2, and a3-parameters
Table A.4.2. Highest posterior density (HPD) interval for a4, a5, and a6-parameters
Table A.4.3. Highest posterior density (HPD) interval for d-parameter

LIST OF FIGURES

Figure 4.1. Autocorrelation plot for a1 of item 6
Figure 4.2. Trace plot for a1 of item 6
Figure 4.3. Posterior density plot for a1 of item 6

CHAPTER 1

INTRODUCTION

This chapter presents a brief introduction to MIRT models, current issues in item parameter calibration procedures in MIRT models, the focus of this study, and the research questions to be addressed.

1.1. Multi-dimensional Item Response Theory (MIRT)

Item Response Theory (IRT) has been recognized as one of the major developments in educational and psychological measurement during the 20th century. IRT is a mathematical expression of the relation between the characteristics of a person (e.g., a latent trait) and the characteristics of the test items. The history of IRT dates back to when Lawley (1944) and Tucker (1946) published their seminal articles. However, the most important contribution to the IRT literature occurred when Lord (1952) published “A Theory of Test Scores” (Psychometric Monograph No. 7).
Lord and Novick (1968), with contributions from Birnbaum, published the book “Statistical Theories of Mental Test Scores,” which lays out the basic assumptions of IRT models: (1) local independence, (2) uni-dimensionality of the latent trait, and (3) monotonicity. All of these assumptions are crucial to IRT modeling. Over the past decades, IRT has been the primary tool in the educational and psychological measurement fields. Equating, linking, DIF, and computerized adaptive testing are just a few well-known IRT-based applications. Many of the uses of these applications depend on how accurately the item and person parameters are estimated. In order to have stable and consistent estimation of the parameters, all assumptions of item response theory should be fulfilled. One of the most commonly violated assumptions is the uni-dimensionality of the latent trait structure implied by the item response data. In many instances, it is assumed that all the items in a test are sensitive to differences in examinees along a single latent trait (Sheng & Wikle, 2007). However, a large body of research has pointed out that violation of uni-dimensionality leads to a certain degree of bias in parameter estimation, which has led to the development of Multi-dimensional Item Response Theory (MIRT) models (Bock & Aitkin, 1981; Reckase & McKinley, 1982; Samejima, 1974; Thissen & Steinberg, 1984; Whitely, 1980). Several multi-dimensional item response theory models have been proposed, such as (1) a multi-dimensional extension of the two-parameter logistic model (Reckase, 1985); (2) a multi-dimensional extension of the three-parameter logistic model (Reckase, 2009); (3) a multi-dimensional extension of the normal ogive model (McDonald, 1999; Samejima, 1974); (4) a multi-dimensional partial credit model (Kelderman & Rijkes, 1994); (5) a multi-dimensional extension of the generalized partial credit model (L. H.
Yao & Schwarz, 2006); and (6) a multidimensional extension of the graded response model (Muraki & Carlson, 1993). Two different kinds of models are commonly referred to as MIRT models: compensatory models and non-compensatory models. In the framework of the compensatory model, the probability of answering correctly is influenced by a weighted linear combination of latent traits. In other words, the probability of answering correctly is influenced not by just one latent trait, but by a weighted combination of latent traits. For example, mathematics test items usually involve more than two dimensions, such as understanding the problem (reading comprehension), translating the problem into an equation (mathematical thinking), and solving the problem (analytic ability). All three dimensions must be combined to solve the problem correctly. In the framework of non-compensatory models, one needs a sufficient level of each of the measured latent traits in order to solve the question. That is, a deficiency in one latent trait cannot be offset by an increase in another (Bolt & Lall, 2003). As Bolt and Lall (2003) pointed out, the practical distinction between the two types of models is often based on the estimation techniques. In non-compensatory models, it is relatively hard to estimate parameters and to make inferences from them compared to compensatory models, because the estimation procedure requires sufficient variability in the relative difficulties of components across items to identify the dimensions (Maris, 1995). Two statistics, MDIFF and MDISC, are commonly used to describe the characteristics of test items in MIRT models. Reckase (1985) described the multi-dimensional difficulty of a test item, often referred to as MDIFF.
It has an interpretation analogous to the b-parameter in a uni-dimensional item response theory model: it expresses the difficulty of the test item as a direction and a distance in the complete latent space. The equation for MDIFF is given below (see Reckase, 1985, for the complete derivation):

MDIFF_i = -d_i / √(∑_{k=1}^m a_{ik}²)    (1.1)

where a_i is the vector of item discrimination parameters, and d_i is a scalar parameter related to the difficulty of the item. Reckase and McKinley (1991) developed an overall measure of multi-dimensional discrimination (MDISC), which is analogous to the a-parameter in the uni-dimensional model. Rather than representing a single a-parameter, it is an overall measure of the capability of an item to distinguish between individuals at different locations in the complete latent space. The equation for m-dimensional MDISC is given below (see Reckase and McKinley, 1991, for the complete derivation):

MDISC_i = √(∑_{k=1}^m a_{ik}²)    (1.2)

where a_i is the vector of item discrimination parameters, and m is the number of dimensions.

1.2. Current Issues in Item Parameter Calibration Procedures

Even though the development of estimation techniques in MIRT is still an on-going research topic, many of the estimation procedures used in uni-dimensional item response theory have been adapted to the estimation of parameters in MIRT models. The joint maximum likelihood (JML) procedure (Birnbaum, 1968) implemented in LOGIST (Wingersky, Barton, & Lord, 1982), once the most popular computer program for estimating the parameters in uni-dimensional item response theory, was implemented in MIRTE (Carlson, 1987). Unweighted least squares estimation is implemented in NOHARM (Fraser & McDonald, 1988), which is now used to estimate parameters in both MIRT models and uni-dimensional IRT models. A marginal maximum likelihood estimation procedure is implemented in TESTFACT (Bock et al., 2003).
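Returning briefly to the item statistics of Section 1.1, Equations 1.1 and 1.2 can be illustrated with a short Python sketch. The a-vector and d-value below are hypothetical examples, not parameters from this study:

```python
import math

def mdisc(a):
    """MDISC (Eq. 1.2): square root of the sum of squared a-parameters."""
    return math.sqrt(sum(ak ** 2 for ak in a))

def mdiff(a, d):
    """MDIFF (Eq. 1.1): -d divided by MDISC."""
    return -d / mdisc(a)

# Hypothetical 3-dimensional item with a = (0.8, 0.5, 0.3) and d = -0.6
a, d = (0.8, 0.5, 0.3), -0.6
print(round(mdisc(a), 3))     # 0.99  (overall discrimination)
print(round(mdiff(a, d), 3))  # 0.606 (multidimensional difficulty)
```

A positive MDIFF indicates an item located in the harder direction of the latent space, mirroring the interpretation of the b-parameter in the uni-dimensional model.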
There have been extensive comparative studies of the performance of NOHARM and TESTFACT (Béguin & Glas, 2001; Gosz & Walker, 2002; Stone & Yeh, 2006). These studies have shown that neither program is superior to the other. Rather, the accuracy of item recovery mainly depends on the specification of factors such as the number of parameters to be estimated, the sample size, the dimensional structure, and the number of items. One limitation of using Maximum Likelihood Estimation (MLE) in an IRT framework is that, on occasion, the parameters of some items cannot be estimated because of the data structure (Baker, 1987). For instance, when the responses from an examinee are all correct or all incorrect, that response vector cannot be used to estimate the parameters. Therefore, MLE removes these responses from the dataset. Dropping these non-usable responses causes a loss of information and can decrease the sample size (i.e., the number of response sets), which is a critical factor for the accurate estimation of parameters using the MLE procedure. Even though several estimation procedures are currently available, developing estimation procedures for MIRT models is still an active area of research. The major challenge in parameter estimation for MIRT models is that the relationship between parameter recovery and test specifications, such as the number of dimensions, the dimensional structure, the number of items, the number of examinees, and the choice of parameter distributions, is still not clear. Most estimation procedures are implemented in computer programs used in both practical and research settings, and these programs require pre-specifications. Most of the computer programs require specification of the type of model and the number and structure of the dimensions to be estimated, as well as the specific algorithm used to estimate the parameters. So it is nearly impossible to explore all the possible relationships among all specifications under one estimation program.
In addition, a problem with using several computer programs to investigate the relationships among several factors is that the programs do not always agree with each other. Recently, Bayesian analysis, specifically Markov Chain Monte Carlo (MCMC) methods, has received a great deal of attention from researchers (Béguin & Glas, 2001; Bolt & Lall, 2003; Patz & Junker, 1999a, 1999b; Wollack, Bolt, Cohen, & Lee, 2002). The history of MCMC dates back to 1953, when the Metropolis algorithm was first introduced (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953). For a long time afterward, MCMC saw little use as a statistical method because of its computational cost. However, it began to gain attention once high-speed computers became available at lower cost. The method has still drawn criticism from statisticians who argue that MCMC-based Bayesian inference is subjective because of the selection of the prior distribution. Gelman and Shalizi (2012) argued that a Bayesian prior distribution is not a personal belief but part of a hypothesized model. They also argued that Bayesian inference has a deductive advantage, meaning that inferences can be made from the given data together with a set of model assumptions. In any case, Bayesian analysis has become the alternative estimation method when a model is so complicated that it cannot be estimated analytically. One of the advantages of using MCMC in the MIRT framework is that it can estimate the parameters of models that are too complex to estimate analytically (Harwell, Stone, Hsu, & Kirisci, 1996). For example, the number of parameters to be estimated in a compensatory model is N(M + 1) + M × Y, where M is the number of dimensions, N is the number of items, and Y is the number of examinees (Reckase, 2009).
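This count can be sketched as a small helper function (the function name is mine, for illustration only):

```python
def n_parameters(n_items, n_dims, n_examinees):
    """Parameter count for a compensatory MIRT model: N(M + 1) + M * Y.

    Each item contributes M a-parameters plus one d-parameter;
    each examinee contributes an M-dimensional latent trait vector.
    """
    return n_items * (n_dims + 1) + n_dims * n_examinees

print(n_parameters(50, 5, 2000))  # 10300
```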
If there are 50 items, five dimensions, and 2000 examinees in the model estimation, then 10,300 parameters need to be estimated, making it nearly impossible to obtain stable estimates. The other advantage of using MCMC is that it gives researchers more control to examine the interrelated effects of several factors at one time (Harwell et al., 1996). Since most current item parameter recovery software programs have their own modeling specifications (such as the type of MIRT model, number of dimensions, dimensional structure, number of items, and number of examinees), MCMC gives more flexibility to estimate parameters under several factors simultaneously. This is one of the major motivations for using MCMC as the estimation technique in this study to examine the interaction effects of several factors on item parameter recovery in MIRT models.

1.3. Focus of This Study

The main focus of this study is to investigate the interaction effects between factors on item parameter recovery in the MIRT model using MCMC simulation. Even though good estimation procedures exist (e.g., TESTFACT, NOHARM), those programs are limited in exploring the relationships among several factors because of the way they are specified in the software. For example, TESTFACT does not provide the option of defining correlated dimension structures when running the calibration. MCMC provides a great deal of flexibility in estimating the parameters. In this study, MCMC is used to investigate the interaction effects among several factors, such as the number of dimensions, number of items, number of examinees, and dimensional structures.

1.4. Research Questions to be Addressed

In order to investigate the interactions between factors, the specifications of large-scale assessments (e.g., the ACT or NAEP) are borrowed to guide the selection of the number of dimensions and the number of items. The number of items is fixed at 60.
This number matches the length of the ACT Mathematics test and is close to the average length across the subject areas of the ACT college admissions test: 75 items for English, 60 for Mathematics, 40 for Reading, and 40 for Science. Since this study explores a 6-dimensional MIRT model, the 60 items can be distributed evenly into 6 dimensions, with 10 items per dimension. For the 3-dimensional MIRT model, each dimension has 20 items. The specific research questions to be answered from the MCMC simulation are:

1. Under two different numbers of dimensions, 3 and 6, what is the smallest sample size that yields stable item parameter estimates?

2. Adding to the conditions of research question 1 two different dimensional structures, approximate simple structure and mixed structure, what is the smallest sample size that yields stable item parameter estimates?

3. Adding to the conditions of research questions 1 and 2 correlated latent traits (in the mixed structure), what is the smallest sample size that yields stable item parameter estimates?

4. When a differently shaped ability distribution (e.g., a skewed distribution) is imposed on the latent traits, what is the smallest sample size that yields stable item parameter estimates?

CHAPTER 2

LITERATURE REVIEW

In this chapter, the theoretical foundations of this study are presented: types of latent trait configuration, namely approximate simple structure (AS) and mixed structure (MS); correlated latent traits; skewed latent trait distributions; and item parameter estimation techniques, including maximum likelihood and MCMC.

2.1 Uni-dimension to Multi-dimensions

Uni-dimensionality is one of the most commonly violated assumptions about the latent trait structure implied by item response data. This violation can increase the bias in the estimation of item parameters and latent traits.
Dorans and Kingston (1985) examined the effect of violating uni-dimensionality on equating using GRE verbal scores, and showed that the violation increased the bias in item parameter estimation and led to an unsatisfactory equating result. Sheng and Wikle (2007) also showed that a uni-dimensional model returned unsatisfactory results when tests were composed of several distinct abilities; in addition, they showed that applying a multi-dimensional model to a unidimensional structure did not harm the estimation results. Therefore, using a multi-dimensional model is a safe way to obtain stable estimates from the calibration procedure. The question that should then be asked is how many dimensions are needed to represent the latent traits adequately. Reise, Waller, and Comrey (2000) showed that it is better to have a larger number of dimensions when assessing dimensionality. However, as Reckase (2009) pointed out, a cost is paid if more dimensions than necessary are used in an analysis: having more item parameters to estimate might increase the bias as well as the required sample size. The most commonly used MIRT models have two or three dimensions, and research shows that they require a sample size of at least 1000 in order to obtain a stable item calibration result. However, there is still a lack of research investigating the effect of high dimensionality, with more than three dimensions. All aspects of the latent traits need to be examined when the data are high-dimensional, such as the sample size, the distributions of the latent traits, the types of latent trait configuration, and the correlation between latent traits.

2.2 Types of Latent Configurations: Approximate Simple Structure (AS) and Mixed Structure (MS)

The structure of multidimensional tests is typically categorized into three kinds: 1) simple structure, 2) approximate simple structure, and 3) complex structure (e.g.
mixed structure). From here on, mixed structure is used interchangeably with complex structure. Simple structure is the most restricted dimensional structure because each item has only one nonzero a-parameter on one dimension, even though there are several dimensions (Thurstone, 1947). One nonzero a-parameter in a multidimensional structure does not often appear in practical test situations. Approximate simple structure places fewer restrictions on nonzero a-parameters. In the framework of an approximate simple structure, there are multiple dimensions, but only one a-parameter has a meaningful interpretation, and the additional nonzero a-parameters are quantitatively trivial (Walker, Azen, & Schmitt, 2006). Mixed structure may be the most realistic of the dimensional structures. It consists of multiple dimensions in which several a-parameters are nonzero. The definition of structure types, based on the weighting of a-parameters on dimensions, is clearly established in previous research. However, the effect of different types of latent trait configuration on item parameter calibration has not been investigated when it is combined with other factors such as number of dimensions, correlation between latent traits, and skewed latent trait distributions. Therefore, this needs to be explored.

2.3 Correlated Latent Traits

Unlike the effect of different structure types of latent trait configuration, the influence of correlated latent traits on item parameter calibration has been investigated by several researchers (Batley & Boss, 1993; Finch, 2011; McKinley & Reckase, 1984). Batley and Boss (1993) used a simulation study to identify the effect of correlated latent traits on item parameter calibration with two dimensions. They found that the d-parameter is not affected by correlated latent traits, but that a-parameters are more sensitive to them.
Finch (2011) used a simulation study to examine the effect of correlated latent structures with two dimensions and showed that correlated latent structures do have an influence on item parameter calibration: the magnitude of bias increased with the magnitude of correlation among the latent traits for both a- and d-parameters. However, both studies used only a two-dimensional MIRT model, so they did not take into account higher numbers of dimensions.

2.4 Skewed Latent Trait Distributions

Besides the number of dimensions, the types of latent trait structure configuration, and correlated latent traits, non-normal distributions have been identified as problematic for accurate item parameter estimation. De Ayala and Sava-Bolesta (1999) examined three situations—normal, positively skewed, and uniform distributions—and showed that skewed distributions contributed to high RMSE in item parameter estimates, while uniform distributions contributed to low RMSE. Both uniform and skewed latent trait distributions, however, contributed to bias in item parameter estimates. Finch (2011) used both positively and negatively skewed distributions to identify the effect of non-normal distributions. It was shown that a-parameters were consistently underestimated and that the bias of d-parameters was associated with the direction of skewness. So it is clear that non-normality in the latent trait distributions causes bias in item parameter estimates. Yet, it is not clear how skewness in the latent trait distributions contributes to bias in item parameter estimates when several factors are combined, especially with a high number of dimensions.

2.5 Item Parameter Estimation Techniques: MLE and MCMC

There are several parameter estimation techniques suggested by previous studies in the MIRT framework and implemented in commonly available estimation programs.
For example, the marginal maximum likelihood estimation procedure (Bock & Aitkin, 1981) is implemented in the well-known estimation program TESTFACT (Bock et al., 2003); the unweighted least squares method is implemented in NOHARM (Fraser & McDonald, 1988); and the Markov chain Monte Carlo method with Metropolis-Hastings sampling is implemented in BMIRT (Yao, 2003). TESTFACT uses a marginal maximum likelihood estimation (MMLE) procedure to estimate item parameters, and then uses a Bayesian estimation method to estimate the latent traits. TESTFACT specifically uses an MMLE procedure based on the expectation/maximization (EM) algorithm developed by Dempster, Laird, and Rubin (1977). In general, the EM algorithm is an iterative computation of maximum likelihood estimates in the presence of unobserved random variables. Suppose we have a joint probability density function, f(U, θ|ξ), where U is the observed incomplete data and ξ represents the item parameters to be estimated. For the two- and three-parameter logistic IRT models, the distribution of f(U, θ|ξ) is unknown, so sufficient statistics are not available. The expected values of log f(U, θ|ξ), conditional on the observed data U, are therefore taken and treated as if they were known. This is called the expectation step. These expected values are then used to find the item parameter estimates that maximize the log-likelihood function. This is called the maximization step. See Dempster et al. (1977) and Baker and Kim (2004) for more complete mathematical derivations of the EM algorithm. While this MMLE procedure has shown consistent parameter recovery performance in both unidimensional and multidimensional IRT models, it has certain limitations. First, it requires eliminating response strings that are perfectly correct or perfectly incorrect before running the estimation. That might result in a loss of information about some examinees.
Second, it sometimes returns infinite or zero estimates for the discrimination parameters, which affects the estimates of the other parameters. In order to overcome the limitations of the MMLE estimation technique, many researchers turned their attention to Bayesian methods, particularly the Markov Chain Monte Carlo (MCMC) procedure. The use of MCMC in an IRT framework is relatively new. Albert (1992) used MCMC with Gibbs sampling with a two-parameter normal ogive model to estimate both the item and person parameters. To run the analysis, he used both simulated data with 30 items and 100 subjects, and real data from the mathematics placement test of the Department of Mathematics and Statistics at Bowling Green State University. He showed that MCMC with Gibbs sampling gave estimates of item and person parameters comparable to those from the maximum likelihood estimation procedure. Patz and Junker (1999a) showed the potential benefit of MCMC in an IRT framework. In their paper, they reviewed MCMC methods, including two different sampling techniques, Gibbs sampling and Metropolis-Hastings sampling. They suggested that Metropolis-Hastings within Gibbs sampling is more appropriate in an IRT framework than pure Metropolis-Hastings sampling: when the number of parameters increases, it becomes very difficult to maintain reasonable acceptance probabilities in a pure Metropolis-Hastings sampling method while thoroughly exploring the parameter space. In a subsequent paper (Patz & Junker, 1999b), they examined several variations such as multiple item types, missing data, and a rating response IRT model. They showed that MCMC is a good alternative parameter estimation technique when the model increases in complexity in terms of the number of parameters that need to be estimated. Wollack et al.
(2002) compared the effectiveness of MCMC and MML estimation at recovering the underlying parameters of a complex IRT model, the nominal response model. They showed that a greater sample size (300 to 500) returned better recovery for both MCMC and MML. They found that the advantage of using MCMC in an IRT framework is its ease of implementation with complex IRT models. However, all these studies focused on uni-dimensional item response theory models. Later, MCMC methods were implemented for multidimensional item response theory models (Béguin & Glas, 2001; Bolt & Lall, 2003; Fu, Tao, & Shi, 2009; Sheng, 2008). Béguin and Glas (2001) also found it easier to implement the estimation procedure with more complicated high-dimensional models using the MCMC method. They used a five-dimensional, three-parameter logistic model to examine the recovery of MCMC, and showed that MCMC recovered the true item parameters better than TESTFACT and NOHARM. Even though there are several studies using MCMC in multidimensional item response theory models, there is little research investigating the effects of a variety of factors under MIRT models. That is the motivation for this study. Equipped with the advantage of easy implementation of a complex model, the interaction effects between factors on item parameter recovery in the MIRT model are investigated using MCMC.

CHAPTER 3
RESEARCH DESIGN

In this chapter, the specifications of the research design are presented, including model specification, data generation, specification of the MCMC simulation, the likelihood function and prior distributions, assessment of the convergence of the MCMC simulation, and the evaluation criteria for the simulation results. Simulated data are used in this study rather than a real dataset. The rationale for using simulated data instead of real data is that it is often suggested in order to separate the effects of model misfit from calibration errors (Bolt, 1999; Davey, Nering, & Thompson, 1997).
The model specification and data generation procedures are explained in the following sections.

3.1. Model Specification

Let Xij denote the response of person j on item i (1 if correct, 0 if not correct). Then the probability of answering correctly is given as follows (Reckase, 1985):

P(Xij = 1 | θj, ai, di) = exp(ai′θj + di) / (1 + exp(ai′θj + di))    (3.1)

where P(Xij = 1 | θj, ai, di) is the probability of a correct response to item i by person j, ai is a vector of discrimination parameters, di is a scalar parameter related to the difficulty of the item, and θj is a vector of ability parameters.

The most commonly used multidimensional item response theory models are the multidimensional two- and three-parameter logistic models (M2PL and M3PL, respectively). The only difference between the two models is that the M3PL model has a pseudo-guessing parameter. The problematic role of the pseudo-guessing parameter in recovering a- and b-parameters was investigated by several studies (Baker, 1987; Hulin, Lissak, & Drasgow, 1982; Kolen, 1981; McKinley & Reckase, 1980; Thissen & Wainer, 1982). Despite all the difficulties in estimating the c-parameter, whether or not to include the c-parameter in the model is still debated. Yen (1981) used a simulation study to show that a data set generated by a three-parameter model fits a two-parameter model very well. McKinley and Mills (1985) showed the same result as Yen (1981). Since this study does not focus on model fit, a two-parameter model is used to investigate the effects of a variety of factors.

3.2 Data Generation

Several factors are considered for generating the true item parameters: 1) number of dimensions; 2) different types of latent trait configuration; 3) number of items; and 4) correlated latent traits. First, two different numbers of dimensions—3 and 6—are considered. Reise et al.
(2000) summarized that it would be better to overestimate the number of dimensions than to underestimate it, in order to accurately represent the major relationships in the item response data. Most previous studies in MIRT have used three dimensions, so this study expands the number of dimensions up to six. Second, Thurstone (1947) suggested that the number of variables needed to run an analysis with m factors is two or three times m. Holzinger and Harman (1941) gave a formula for the required number of variables in an m-factor analysis:

n ≥ (2m + 1 + √(8m + 1)) / 2    (3.2)

Since this study examines 3 to 6 dimensions, the minimum numbers of items needed are 6 for 3 dimensions, 8 for 4 dimensions, 9 for 5 dimensions, and 10 for 6 dimensions. However, Thurstone (1947) also suggested having five or six times more items than the m factors. In practice, most tests are composed of more than 50 items. For example, the ACT exam has 75 items for English, 60 items for Math, and 40 items for Science. For this study, 60 items are used because the average number of items in the ACT is around 60, and this allows an evenly distributed item number across dimensions. So in 3 dimensions, each dimension has 20 items; in 6 dimensions, each dimension has 10 items. Third, two different types of latent trait structure are examined—approximate simple structure and complex structure. Typically, items that lie within 20° of the x, y, or another axis form an approximate simple structure (Froelich, 2001); items that lie within 40° of the x, y, or another axis form a complex structure. These two angles, 20° and 40°, are used to generate the two dimension structures, the approximate simple structure and the complex structure, respectively. Fourth, in order to examine correlated latent traits (only for the complex structure), correlations of 0, 0.3, and 0.6 are considered.
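As a quick check, the Holzinger and Harman (1941) bound in equation (3.2) can be evaluated for the dimensionalities considered here (a sketch; the function name is illustrative):

```python
import math

def min_variables(m):
    # Smallest integer n satisfying Holzinger and Harman's (1941) bound
    # n >= (2m + 1 + sqrt(8m + 1)) / 2 for an m-factor analysis.
    return math.ceil((2 * m + 1 + math.sqrt(8 * m + 1)) / 2)

for m in (3, 4, 5, 6):
    print(m, min_variables(m))  # 3->6, 4->8, 5->9, 6->10
```

With 60 items, the design comfortably exceeds this lower bound as well as Thurstone's stricter five-to-six-times-m rule of thumb.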
Those correlations were selected to provide a broad range of potential conditions, from low to high. Previous studies covered a broad range of correlations: Walker et al. (2006) used 0.3, 0.6, and 0.9; Finch (2010) used 0, 0.3, 0.5, and 0.8; Tate (2003) used 0.6. With the specifications on each factor, the true item parameters in k dimensions corresponding to each dimension are generated using the following equations:

ak = MDISC · cos(αk)    (3.3)

d = −MDISC · MDIFF    (3.4)

First, MDISCs and MDIFFs were randomly drawn from specific distributions: the former from a lognormal distribution with μ = −0.15 and σ = 0.35, which resulted in an MDISC mean of 0.92 and a standard deviation of 0.33; the latter from a normal distribution with a mean of 0 and a standard deviation of 0.7. Second, the directional angles αk (i.e. the angles between the item vector and the ability axes) were generated using a uniform distribution with a range specified by the dimensional structure (i.e. either approximate simple structure or complex structure). Finally, the item parameters were calculated from the MDISC, MDIFF, and α values using the above formulas. True item parameters for 3- and 6-dimensions with AS and MS structures are given in Tables 3.2.1, 3.2.2, and 3.2.3, respectively. The M2PL model specified above is used to generate the response dataset. A multivariate normal distribution with a mean vector of 0 and a covariance matrix based on the correlation specifications above is used for the ability distribution. More details about the skewed multivariate normal distribution are given in section 3.2.1. In order to keep the interpretation simple for this study, it is assumed that all dimensions have the same distribution, with a mean of zero and a standard deviation of one, to effect identification. Ten replications of each test form are generated using the specifications described above. The probability of a correct response by each examinee to each item is calculated using the M2PL model.
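The generation steps above can be sketched as follows. This is a minimal illustration, not the study's actual generator: the direction cosines are drawn from a generic random direction and normalized to unit length rather than constrained to the 20°/40° AS/MS angle ranges, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_items(n_items, n_dims):
    # MDISC ~ lognormal(mu=-0.15, sigma=0.35); MDIFF ~ N(0, 0.7),
    # following the distributions stated above.
    mdisc = rng.lognormal(mean=-0.15, sigma=0.35, size=n_items)
    mdiff = rng.normal(loc=0.0, scale=0.7, size=n_items)
    # Direction cosines: a random positive direction per item, normalized
    # so the squared cosines sum to one (AS/MS angle constraints omitted).
    dirs = np.abs(rng.normal(size=(n_items, n_dims)))
    cosines = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    a = mdisc[:, None] * cosines          # equation (3.3)
    d = -mdisc * mdiff                    # equation (3.4)
    return a, d

def simulate_responses(a, d, theta):
    # M2PL probability of a correct response (equation 3.1),
    # dichotomized against uniform draws.
    z = theta @ a.T + d                   # n_examinees x n_items
    p = 1.0 / (1.0 + np.exp(-z))
    return (rng.uniform(size=p.shape) < p).astype(int)

a, d = generate_items(n_items=60, n_dims=3)
theta = rng.multivariate_normal(np.zeros(3), np.eye(3), size=1000)
X = simulate_responses(a, d, theta)
print(X.shape)  # (1000, 60)
```

For the correlated conditions, the identity covariance passed to `multivariate_normal` would simply be replaced by a correlation matrix with 0.3 or 0.6 off-diagonal entries.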
If a random number drawn from a uniform distribution U 0,1 is less than the model-based probability, the item response is coded correct. Otherwise, it is coded wrong. Table 3.2.1. True item parameters for 3-dimensions with AS and MS AS Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 MS a1 a2 a3 d a1 a2 a3 d 0.6020 0.7295 0.7213 0.5356 0.9726 0.8741 0.7222 1.0943 1.4082 0.5542 0.7781 0.5478 1.0639 0.9555 0.6593 0.6599 0.6149 1.1168 0.7429 0.7586 0.1407 0.2057 0.1874 0.1472 0.2906 0.2391 0.1721 0.3078 0.3596 0.1316 0.2492 0.1437 0.2491 0.2656 0.2031 0.1826 0.1736 0.2937 0.1899 0.2159 0.1385 0.2030 0.1848 0.1452 0.2871 0.2359 0.1694 0.3037 0.3544 0.1296 0.2463 0.1416 0.2451 0.2621 0.2007 0.1802 0.1713 0.2896 0.1871 0.2132 0.7506 0.7883 0.7644 0.5175 0.9289 0.7038 0.4040 -0.1223 -0.2112 -0.1009 -0.1730 -0.1277 -0.3469 -0.5162 -0.3835 -0.4514 -0.5285 -1.1527 -0.8571 -1.5381 1.1640 0.4481 0.8766 0.5042 0.6011 0.5029 1.0557 0.9356 0.3315 0.5715 0.8342 0.3442 0.5922 0.5573 0.7906 0.4770 0.6218 0.6143 0.3699 0.5889 0.6381 0.2773 0.4355 0.2912 0.3417 0.3414 0.5804 0.4877 0.1730 0.3527 0.4447 0.2557 0.3810 0.3460 0.4591 0.2959 0.4454 0.2615 0.2631 0.3311 0.6250 0.2723 0.4255 0.2855 0.3349 0.3357 0.5685 0.4771 0.1692 0.3462 0.4353 0.2518 0.3743 0.3397 0.4502 0.2905 0.4384 0.2544 0.2590 0.3244 2.1288 0.7329 1.1914 0.3389 0.3980 0.2235 0.3750 0.2451 0.0841 0.0672 -0.0475 -0.0616 -0.1198 -0.1700 -0.3780 -0.3180 -0.6780 -0.6647 -0.5872 -0.8884 20 Table 3.2.1 (cont’d) AS Item 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 MS a1 a2 a3 d a1 a2 a3 d 0.1721 0.1754 0.2008 0.1333 0.2229 0.2168 0.1878 0.0919 0.1713 0.2030 0.2563 0.1957 0.1899 0.1045 0.2565 0.1843 0.2035 0.2747 0.2282 0.2465 0.1498 0.2540 0.3129 0.1911 0.2758 0.2268 0.1811 0.1573 0.2482 0.1114 0.6725 0.6294 0.7295 0.5351 0.8836 0.8370 0.8051 0.3694 0.6772 0.8934 0.9089 0.8729 0.8053 0.4679 0.8979 0.7790 0.9778 0.9867 0.7695 0.8983 0.1270 0.2070 0.2694 0.1683 0.2372 0.1954 
0.1555 0.1355 0.2166 0.0960 0.1813 0.1839 0.2107 0.1406 0.2350 0.2282 0.1989 0.0970 0.1806 0.2153 0.2686 0.2078 0.2009 0.1109 0.2686 0.1950 0.2172 0.2881 0.2386 0.2587 0.6262 1.2731 1.2082 0.6405 1.0717 0.8709 0.7099 0.6031 0.8838 0.4269 0.8524 0.4900 0.5229 0.3036 0.3803 0.3132 0.2636 0.1049 0.1689 0.2120 0.2164 -0.0192 -0.0235 -0.0803 -0.2928 -0.4160 -0.6982 -0.9177 -0.9011 -1.4694 0.9049 1.5969 1.1065 0.5277 0.8405 0.5912 0.4600 0.3737 0.3402 0.0877 0.5397 0.2355 0.2308 0.5580 0.2857 0.2523 0.4029 0.2958 0.3286 0.3054 0.4887 0.1972 0.6358 0.4602 0.3343 0.3775 0.3112 0.3221 0.5129 0.5399 0.3150 0.1904 0.4930 0.4709 0.5749 0.3476 0.3760 0.3956 0.3766 0.4140 0.7587 0.3713 0.4026 0.7269 0.4936 0.3444 0.5241 0.6376 0.3545 0.4096 0.6831 0.3526 1.0602 0.8952 0.6409 0.6870 0.4629 0.4046 0.6889 0.9294 0.3232 0.1967 0.5051 0.4845 0.5887 0.3577 0.3847 0.4051 0.3868 0.4260 0.5164 0.2241 0.2184 0.5356 0.2705 0.2417 0.3867 0.2759 0.3174 0.2927 0.4677 0.1863 0.6032 0.4325 0.3145 0.3563 0.2970 0.3095 0.4917 0.5113 0.4964 0.3764 0.7298 0.8213 0.8336 0.6133 0.5225 0.5758 0.6125 0.7238 2.2377 0.5308 0.3592 0.7211 0.3081 0.1273 0.0848 0.0673 0.0202 -0.2209 -0.3759 -0.1892 -0.6289 -0.6906 -0.6367 -0.7641 -0.6938 -0.8181 -1.5367 -1.8720 1.1908 0.5311 0.9977 1.0370 1.0558 0.5801 0.3747 0.3425 0.3032 0.2654 21 Table 3.2.1 (cont’d) AS Item 51 52 53 54 55 56 57 58 59 60 MS a1 a2 a3 d a1 a2 a3 d 0.1280 0.4395 0.2044 0.5334 0.1719 0.1997 0.2855 0.2011 0.1574 0.3683 0.1115 0.3812 0.1760 0.4606 0.1445 0.1680 0.2452 0.1739 0.1326 0.3210 0.4617 1.6269 0.7874 2.0224 0.7516 0.8692 1.1175 0.7563 0.6811 1.3208 0.0557 0.0449 0.0033 -0.5475 -0.2949 -0.4760 -0.6721 -0.4799 -0.5184 -1.2562 0.7243 0.4038 0.4908 0.8608 0.3260 0.4737 0.4358 0.2744 0.1413 0.1831 0.7436 0.4153 0.5046 0.8825 0.3361 0.4885 0.4479 0.2829 0.1463 0.1876 1.1708 0.6951 0.8318 1.3079 0.6119 0.8899 0.7311 0.5100 0.2938 0.2716 0.3762 0.1581 -0.1574 -0.2834 -0.1262 -0.5125 -0.5060 -0.5520 -0.3710 -0.8392 Table 3.2.2. 
True item parameters for 6-dimensions with AS structures Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 a1 0.9520 0.5181 0.9168 0.9747 0.6815 0.8702 0.6559 0.8031 0.9516 1.2195 0.1372 0.1130 0.1115 0.1461 0.0836 0.1617 0.0757 0.1777 0.1537 0.1599 a2 0.1509 0.0880 0.1827 0.1658 0.1202 0.1735 0.1229 0.1656 0.1559 0.2386 0.7207 0.6605 0.5830 0.8875 0.4798 0.8979 0.3922 0.9654 0.9154 1.0718 a3 0.1481 0.0865 0.1801 0.1630 0.1183 0.1710 0.1211 0.1633 0.1532 0.2351 0.1263 0.1030 0.1026 0.1327 0.0763 0.1481 0.0698 0.1630 0.1398 0.1436 a4 0.1235 0.0731 0.1564 0.1378 0.1007 0.1485 0.1041 0.1425 0.1285 0.2036 0.1405 0.1161 0.1142 0.1502 0.0858 0.1659 0.0775 0.1822 0.1580 0.1648 22 a5 0.1297 0.0764 0.1624 0.1441 0.1051 0.1541 0.1084 0.1477 0.1347 0.2115 0.1267 0.1034 0.1030 0.1332 0.0766 0.1486 0.0700 0.1636 0.1404 0.1442 a6 0.1437 0.0841 0.1759 0.1585 0.1151 0.1670 0.1180 0.1596 0.1487 0.2294 0.1179 0.0954 0.0959 0.1224 0.0708 0.1377 0.0652 0.1519 0.1292 0.1312 d 1.8397 0.0031 -0.1967 -0.5534 -0.3904 -0.5404 -0.4196 -0.5817 -0.8593 -1.1938 1.1547 0.5270 0.3405 0.4820 0.2475 0.2265 -0.0450 -0.2064 -0.4622 -0.7069 Table 3.2.2 (cont’d) Item 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 a1 0.2030 0.2696 0.4085 0.4984 0.2642 0.1351 0.2346 0.1911 0.1669 0.2918 0.1242 0.2778 0.1562 0.1755 0.1793 0.2276 0.3478 0.4200 0.2725 0.1766 0.3069 0.3111 0.3164 0.2639 0.3705 0.2365 0.2883 0.2275 0.2023 0.2153 a2 0.2372 0.3050 0.4637 0.5586 0.3044 0.1543 0.2704 0.2161 0.1879 0.3309 0.1303 0.2884 0.1636 0.1850 0.1878 0.2387 0.3639 0.4400 0.2862 0.1865 0.3063 0.3106 0.3158 0.2634 0.3698 0.2361 0.2879 0.2271 0.2019 0.2150 a3 0.7504 0.7318 1.1559 1.2067 0.8661 0.4089 0.7712 0.5173 0.4270 0.8164 0.1223 0.2744 0.1539 0.1724 0.1766 0.2241 0.3426 0.4135 0.2680 0.1734 0.3151 0.3189 0.3268 0.2722 0.3816 0.2430 0.2953 0.2327 0.2075 0.2208 23 a4 0.2303 0.2979 0.4526 0.5465 0.2963 0.1505 0.2633 0.2111 0.1837 0.3230 0.3513 0.5595 0.4213 0.5616 0.4892 
0.6449 0.9250 1.1599 0.8027 0.5929 0.2989 0.3036 0.3064 0.2559 0.3598 0.2303 0.2815 0.2224 0.1972 0.2100 a5 0.2581 0.3266 0.4974 0.5953 0.3289 0.1661 0.2923 0.2314 0.2006 0.3548 0.1167 0.2647 0.1471 0.1637 0.1688 0.2139 0.3278 0.3951 0.2554 0.1643 0.7750 0.7132 1.0268 0.8103 1.0755 0.6116 0.6329 0.4598 0.4829 0.5090 a6 0.2417 0.3096 0.4709 0.5664 0.3096 0.1569 0.2751 0.2194 0.1906 0.3360 0.1086 0.2505 0.1373 0.1510 0.1574 0.1989 0.3061 0.3681 0.2370 0.1509 0.2895 0.2948 0.2945 0.2465 0.3471 0.2229 0.2736 0.2165 0.1913 0.2037 d 1.1806 1.1368 1.2722 1.0743 0.3194 0.1412 0.1326 -0.0865 -0.0959 -0.3722 0.5930 0.8166 0.3586 0.3673 0.0287 0.0046 -0.0168 -0.1859 -0.1443 -0.3328 1.3252 1.0745 0.6668 0.2626 0.1008 -0.0393 -0.1310 -0.3131 -0.3752 -0.8940 Table 3.2.2 (cont’d) Item 51 52 53 54 55 56 57 58 59 60 a1 0.0925 0.1231 0.1036 0.2041 0.2574 0.1547 0.1035 0.1510 0.1318 0.0942 a2 0.0843 0.1118 0.0936 0.1833 0.2305 0.1355 0.0920 0.1351 0.1182 0.0831 a3 0.0961 0.1280 0.1078 0.2131 0.2690 0.1629 0.1083 0.1579 0.1376 0.0989 a4 0.0933 0.1241 0.1045 0.2060 0.2599 0.1564 0.1045 0.1525 0.1330 0.0952 a5 0.0862 0.1145 0.0959 0.1882 0.2368 0.1400 0.0947 0.1388 0.1214 0.0857 a6 0.4523 0.6205 0.5480 1.1438 1.4803 1.0549 0.6258 0.8765 0.7458 0.6073 d 0.4311 0.5341 0.2823 0.5113 0.5318 0.2146 -0.0509 -0.2256 -0.3648 -0.4395 Table 3.2.3. 
True item parameters for 6-dimensions with MS Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 a1 0.5618 0.6429 0.2921 0.4081 0.3061 0.9760 0.5915 0.9778 0.8383 0.7652 0.1760 0.2174 0.2892 0.1517 0.3612 0.3578 0.2362 0.3304 0.3958 0.3017 a2 0.2124 0.2158 0.1039 0.1479 0.1455 0.3461 0.2430 0.3464 0.2666 0.2870 0.4991 0.5049 0.8794 0.4334 0.9773 0.8802 0.4645 0.8253 1.0617 0.7136 a3 0.2371 0.2433 0.1166 0.1657 0.1599 0.3883 0.2695 0.3887 0.3022 0.3205 0.1674 0.2083 0.2742 0.1442 0.3442 0.3422 0.2274 0.3158 0.3773 0.2889 24 a4 0.2319 0.2375 0.1139 0.1620 0.1569 0.3794 0.2639 0.3798 0.2947 0.3134 0.1632 0.2038 0.2669 0.1406 0.3359 0.3345 0.2230 0.3087 0.3682 0.2826 a5 0.2238 0.2284 0.1098 0.1561 0.1521 0.3655 0.2552 0.3658 0.2829 0.3024 0.1690 0.2099 0.2769 0.1456 0.3473 0.3451 0.2290 0.3185 0.3807 0.2912 a6 0.2178 0.2218 0.1067 0.1518 0.1487 0.3553 0.2488 0.3556 0.2743 0.2942 0.1680 0.2089 0.2752 0.1447 0.3453 0.3432 0.2279 0.3168 0.3785 0.2897 d 0.9419 0.7950 0.3548 0.2712 0.2168 0.5893 -0.1134 -0.1765 -0.2088 -1.0466 0.4335 0.3587 0.4488 0.0275 0.0268 -0.0135 -0.0619 -0.6525 -0.9367 -0.9932 Table 3.2.3 (cont’d) Item 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 a1 0.2030 0.2696 0.4085 0.4984 0.2642 0.1351 0.2346 0.1911 0.1669 0.2918 0.1242 0.2778 0.1562 0.1755 0.1793 0.2276 0.3478 0.4200 0.2725 0.1766 0.3069 0.3111 0.3164 0.2639 0.3705 0.2365 0.2883 0.2275 0.2023 0.2153 a2 0.2372 0.3050 0.4637 0.5586 0.3044 0.1543 0.2704 0.2161 0.1879 0.3309 0.1303 0.2884 0.1636 0.1850 0.1878 0.2387 0.3639 0.4400 0.2862 0.1865 0.3063 0.3106 0.3158 0.2634 0.3698 0.2361 0.2879 0.2271 0.2019 0.2150 a3 0.7504 0.7318 1.1559 1.2067 0.8661 0.4089 0.7712 0.5173 0.4270 0.8164 0.1223 0.2744 0.1539 0.1724 0.1766 0.2241 0.3426 0.4135 0.2680 0.1734 0.3151 0.3189 0.3268 0.2722 0.3816 0.2430 0.2953 0.2327 0.2075 0.2208 25 a4 0.2303 0.2979 0.4526 0.5465 0.2963 0.1505 0.2633 0.2111 0.1837 0.3230 0.3513 0.5595 0.4213 0.5616 0.4892 0.6449 0.9250 
1.1599 0.8027 0.5929 0.2989 0.3036 0.3064 0.2559 0.3598 0.2303 0.2815 0.2224 0.1972 0.2100 a5 0.2581 0.3266 0.4974 0.5953 0.3289 0.1661 0.2923 0.2314 0.2006 0.3548 0.1167 0.2647 0.1471 0.1637 0.1688 0.2139 0.3278 0.3951 0.2554 0.1643 0.7750 0.7132 1.0268 0.8103 1.0755 0.6116 0.6329 0.4598 0.4829 0.5090 a6 0.2417 0.3096 0.4709 0.5664 0.3096 0.1569 0.2751 0.2194 0.1906 0.3360 0.1086 0.2505 0.1373 0.1510 0.1574 0.1989 0.3061 0.3681 0.2370 0.1509 0.2895 0.2948 0.2945 0.2465 0.3471 0.2229 0.2736 0.2165 0.1913 0.2037 d 1.1806 1.1368 1.2722 1.0743 0.3194 0.1412 0.1326 -0.0865 -0.0959 -0.3722 0.5930 0.8166 0.3586 0.3673 0.0287 0.0046 -0.0168 -0.1859 -0.1443 -0.3328 1.3252 1.0745 0.6668 0.2626 0.1008 -0.0393 -0.1310 -0.3131 -0.3752 -0.8940 Table 3.2.3 (cont’d) Item 51 52 53 54 55 56 57 58 59 60 a1 0.4085 0.2182 0.2610 0.2968 0.1851 0.2181 0.3034 0.2349 0.1894 0.3743 a2 0.4467 0.2408 0.2877 0.3279 0.2060 0.2464 0.3305 0.2665 0.2096 0.4144 a3 0.4182 0.2239 0.2677 0.3047 0.1904 0.2253 0.3102 0.2429 0.1946 0.3844 a4 0.4495 0.2424 0.2897 0.3302 0.2076 0.2485 0.3325 0.2688 0.2111 0.4174 a5 0.4382 0.2358 0.2818 0.3210 0.2014 0.2402 0.3245 0.2595 0.2052 0.4055 a6 0.9276 0.5720 0.6773 0.7924 0.5460 0.7587 0.6449 0.8504 0.5170 1.0314 d 2.2550 1.0999 0.7342 0.8207 0.1620 -0.1287 -0.1592 -0.4674 -0.4735 -1.5128

3.2.1 Skewed Multivariate Normal Distribution

To generate the multivariate skew normal distribution, this study follows the alternative parameterization proposed by Arellano-Valle and Azzalini (2008). The d-dimensional skew normal density function is defined as follows:

fd(x; ξ, Ω, α) = 2 ϕd(x − ξ; Ω) Φ(α^T ω^{-1}(x − ξ)), x ∈ R^d    (3.5)

where ϕd(x; Ω) is the Nd(0, Ω) density function for a d × d positive definite symmetric matrix Ω, ξ is a vector location parameter, α is a vector shape parameter (ξ, α ∈ R^d), and ω is the diagonal matrix formed by the standard deviations of Ω.
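The density in equation (3.5) can be evaluated directly from its definition. The sketch below is an illustration, not the generator used in this study; the function name is hypothetical.

```python
import math
import numpy as np

def skew_normal_pdf(x, xi, Omega, alpha):
    # f_d(x; xi, Omega, alpha)
    #   = 2 * phi_d(x - xi; Omega) * Phi(alpha^T omega^{-1} (x - xi)),
    # with omega the diagonal matrix of standard deviations of Omega.
    x, xi, alpha = (np.asarray(v, dtype=float) for v in (x, xi, alpha))
    Omega = np.asarray(Omega, dtype=float)
    d = len(xi)
    diff = x - xi
    # Multivariate normal density phi_d(diff; Omega)
    quad = diff @ np.linalg.solve(Omega, diff)
    phi_d = math.exp(-0.5 * quad) / math.sqrt(
        (2.0 * math.pi) ** d * np.linalg.det(Omega))
    # Standard normal CDF Phi, via the error function
    t = alpha @ np.diag(1.0 / np.sqrt(np.diag(Omega))) @ diff
    big_phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return 2.0 * phi_d * big_phi

# With alpha = 0, the factor Phi(0) = 1/2 cancels the leading 2,
# recovering the symmetric multivariate normal density.
```

This also makes the role of the shape vector α transparent: it tilts the symmetric base density toward one side of the hyperplane orthogonal to α.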
This procedure uses centered parameters (CP), namely the mean, covariance matrix, and skewness, which are transformations of the direct parameters (DP). A choice of CP (μ, Σ, Υ) that belongs to the admissible CP set corresponds to a DP (ξ, Ω, α). The CP is defined as:

μ = E(Y) = ξ + ω μz
Σ = var(Y) = Ω − ω μz μz^T ω = ω Σz ω

After some algebra, the DP is calculated from the CP as:

ξ = μ − σ σz^{-1} μz,
ω = σ σz^{-1},
Ω = Σ + ω μz μz^T ω,

where μz = c / √(1 + c²) and c = (2Υ/(4 − π))^{1/3}, with σ and σz the diagonal matrices of standard deviations of Σ and Σz, respectively. More detail about the re-parameterization between CP and DP can be found in Arellano-Valle and Azzalini (2008).

3.3 MCMC Simulation

MCMC methods have become a familiar means of estimating the parameters of complex statistical models as computing costs have rapidly decreased. Despite the fact that MCMC methods can be implemented for complicated statistical models, the basic idea underlying them is extremely simple: sample from the posterior distribution (i.e. the target distribution), and use those samples to make inferences about the parameters of interest. Suppose we have a joint distribution p(θ, β), where θ is the ability parameter and β is a vector of item parameters. Ultimately, our goal is to find the joint posterior distribution p(θ, β|X) ∝ p(X|θ, β)·p(θ, β). In order to find such a joint distribution, we run the Markov chain with a transition kernel, the probability of moving to a new state (θ^{k+1}, β^{k+1}) given the current state of the chain (θ^k, β^k). There are two well-known transition kernels, Gibbs sampling (Geman & Geman, 1984) and Metropolis-Hastings sampling (Hastings, 1970; Metropolis et al., 1953). Gibbs sampling uses the full conditional distributions to sample the sequence of parameters. Suppose that we have a vector of parameters Θ = (θ1, θ2, …, θk). Then the Gibbs sampling algorithm is defined as follows:

1. Set the starting values for the parameter vector, Θ^(0).
2. Set j = j + 1.
3. Sample θ1^(j) | θ2^(j−1), θ3^(j−1), ⋯, θk^(j−1).
4. Sample θ2^(j) | θ1^(j), θ3^(j−1), ⋯, θk^(j−1).
⋮
k. Sample θk^(j) | θ1^(j), θ2^(j), ⋯, θ(k−1)^(j).
k+1. Return to step 2 and repeat until convergence.

Metropolis-Hastings sampling is used when it is difficult to use the full conditional distributions. The Metropolis-Hastings sampling algorithm is defined as follows:

1. Establish the starting value for the parameter, θ^(0).
2. Specify a proposal density r(θ^j, θ^(j+1)), which defines the density of moving from the current state θ^j to the next state θ^(j+1).
3. Given the current state θ^j, draw a candidate parameter θ* from the proposal density.
4. Compute the acceptance probability α(θ^j, θ*) = min{1, [g(θ*) r(θ*, θ^j)] / [g(θ^j) r(θ^j, θ*)]}, where g(·) is the density of the target distribution.
5. Compare α(θ^j, θ*) with a U(0,1) random draw u. If α(θ^j, θ*) > u, then set θ^(j+1) = θ*. Otherwise, set θ^(j+1) = θ^j.

In this study, the Gibbs sampling built into the OpenBUGS program (v. 3.11) is used to run the simulation. The High Performance Computing Center (HPCC) at Michigan State University was used to run the simulation. HPCC provides seven clusters, which are composed of various numbers of nodes. The system runs on Red Hat Enterprise Linux 6.1. HPCC also provides various computational software, such as OpenBUGS, Matlab, and R (HPCC, 2012). In OpenBUGS, several sampling algorithms are provided and systematically implemented in the program. Once OpenBUGS starts to run the simulation, the sampling algorithms stored in the program are automatically loaded and the most appropriate one is identified. These sampling algorithms include proposal distributions based on normal, univariate, and multivariate normal distributions. Once the sampling algorithm and prior distributions are specified, a number of decisions need to be made to make sure that the inference from the MCMC result is meaningful and useful. First, the initial values should be set to run the simulation. Second, it must be decided whether multiple chains or a single long chain should be run.
Third, the length of the iterations needs to be set. Fourth, the convergence of the Markov chain needs to be diagnosed to make sure it has reached the target distribution (i.e., the stationary distribution). Brooks (1998) showed that starting values do not have a serious impact on inference from MCMC, because the sample used to make inferences is chosen after the chain has reached a stable stage (i.e., the stationary distribution). However, the choice of starting values may affect the performance of the Markov chain, and several methods have been suggested. Gelman and Rubin (1992) suggested that starting values be sampled from a high-density region of a mixture t-distribution, using what is called a simple mode-finding algorithm. Brooks and Morgan (1994) suggested the use of an annealing algorithm to sample initial values. Once starting values are set, the next step is to decide whether to run multiple Markov chains in parallel or a single long chain. Geyer (1992) suggested a single long chain, because running multiple chains does not guarantee that each short chain will have reached the stationary distribution. Even though multiple chains provide a diagnostic value for the length of the iterations, that inference is not valid if the multiple chains do not agree; conversely, agreement among the chains does not confirm that each chain has reached the stationary distribution. Geyer (1992) also argued that one very long run can give a valuable diagnostic of the convergence of a Markov chain: if the run does not seem to reach the stationary distribution, then it is too short and a longer chain needs to be run. Even though multiple chains have a small advantage in diagnosing convergence, this advantage is not critical, because a single long run can also provide a useful diagnostic. The next step is to set the length of the burn-in (i.e., warm-up) period.
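The Metropolis-Hastings update described in the previous section can be sketched in a few lines. This is a minimal illustration for a one-dimensional target density with a symmetric normal proposal (so the proposal terms r cancel in the acceptance ratio), not the sampler OpenBUGS actually uses; the function and variable names are the author's illustrative choices.

```python
import math
import random

def metropolis_hastings(g, n_iter, start, proposal_sd=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D (unnormalized) target density g.

    With a symmetric normal proposal, r(theta, theta*) = r(theta*, theta),
    so alpha = min(1, g(theta*) / g(theta)).
    """
    rng = random.Random(seed)
    theta = start
    chain = []
    for _ in range(n_iter):
        # Step 3: draw a candidate from the proposal density centered at theta.
        cand = rng.gauss(theta, proposal_sd)
        # Step 4: acceptance probability alpha = min(1, g(cand) / g(theta)).
        alpha = min(1.0, g(cand) / g(theta))
        # Step 5: accept if alpha exceeds a U(0,1) draw; otherwise keep theta.
        if rng.random() < alpha:
            theta = cand
        chain.append(theta)
    return chain

# Example: sample from a standard normal target (unnormalized density).
samples = metropolis_hastings(lambda t: math.exp(-0.5 * t * t), 20000, start=0.0)
burned = samples[2000:]  # discard burn-in
mean = sum(burned) / len(burned)
```

After discarding the burn-in, the retained draws should have a mean near 0 and a variance near 1 for this target.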
Several studies have presented formal analyses to calculate how many iterations should be thrown away (Kelton & Law, 1984; Raftery & Lewis, 1992; Ripley & Kirkland, 1990). However, Geyer (1992) argued that such formal analysis does not seem necessary in the practical running of Markov chains. He suggested that it would suffice to throw away 1 or 2% of a run; more can be thrown away later if the autocovariance calculations or time-series plots show slow mixing.

3.4 Assessing Convergence of the MCMC Simulation

The crucial part of using MCMC in parameter estimation is to assess how well the MCMC algorithm (e.g., Gibbs sampling or Metropolis-Hastings sampling) performs. Without evidence of having reached the target distribution (i.e., the stationary distribution), the inferences made from the MCMC method should be questioned. Several studies suggest different diagnostic methods for convergence (Gelman & Rubin, 1992; Geweke, 1992; Heidelberger & Welch, 1983; Raftery & Lewis, 1992). Gelman and Rubin (1992) used multiple sequences of chains to estimate the variance, yielding what is called the potential scale reduction factor (PSRF); see Gelman and Rubin (1992) for details. If the value of the PSRF is large, then the convergence of the Markov chain is suspect and more iterations need to be run. If the value of the PSRF is close to 1, then the Markov chain is close to the stationary distribution. Geweke (1992) used the spectral density to estimate the variance from a single long chain. The basic idea is that there are no discontinuities at frequency zero in the spectral density of the time series. The diagnostic is performed by dividing the iterations into two parts, the first part (10% or more) and the last part (50% or more), taking the difference of the means of the two parts, and dividing it by its standard error, which becomes a Z-score test statistic. See Geweke (1992) for technical details.
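The Geweke comparison just described can be sketched as follows. This is a simplified illustration, not the CODA implementation: Geweke (1992) estimates the standard errors from the spectral density at frequency zero, whereas the sketch below uses naive standard errors that assume approximately independent draws. The function name and window fractions are illustrative.

```python
import math
import random

def geweke_z(chain, first=0.10, last=0.50):
    """Geweke-style diagnostic: standardized difference between the mean of
    the first 10% of the chain and the mean of the last 50%.

    Naive (independence) standard errors are used here for simplicity;
    Geweke's original statistic uses spectral-density-based standard errors.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[-int(last * n):]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

# A stationary chain should give |Z| < 2; a trending chain should not.
rng = random.Random(1)
z_stat = geweke_z([rng.gauss(0.0, 1.0) for _ in range(5000)])
z_trend = geweke_z([i / 1000.0 for i in range(5000)])
```

For the stationary chain of independent draws, |Z| should fall well inside the ±2 band; for the steadily trending chain, the two window means differ sharply and |Z| is far outside it.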
Heidelberger and Welch (1983) developed a method that was initially used to estimate the length of the iteration. However, it is feasible for checking convergence in a Gibbs simulation. The basic idea is to apply test statistics, based on the Cramér-von Mises statistic, to the sequence of iterations. The first step is to set up an initial check-point and to estimate the confidence interval if the sequence passes the testing procedure. This step is repeated until either the iteration reaches the end or a confidence interval meets the accuracy criteria. For more technical details, the reader is encouraged to see their paper. In this study, Heidelberger's and Welch's diagnostic is used to diagnose the convergence of the Gibbs sampling. Cowles and Carlin (1996) reviewed several MCMC convergence diagnostic procedures and showed that none of the procedures is superior to the others. They also suggested using multiple procedures to diagnose convergence, because each procedure has its own properties for assessing convergence. In this study, since a single long chain was used, both the Geweke (1992) and the Heidelberger and Welch (1983) diagnostics were used to examine the convergence of the iterations. In addition, graphical methods are used to assess the convergence of the chain, such as time-series, autocorrelation function, and posterior density plots.

3.5 Prior Distributions and Likelihood Functions

3.5.1 Prior Distributions

Suppose that the probability of a correct response for examinee j on item i is given as follows:

Pi(θj) = P(Xij = 1 | θj, ai, di) = exp(ai'θj + di) / (1 + exp(ai'θj + di))        (3.6)

and suppose that the M2PL model holds the local independence assumption. Then the joint probability of the responses is given as follows:

P(X | Θ, Σ, a, d) = ∏(j=1 to N) ∏(i=1 to n) Pi(θj)^xij [1 − Pi(θj)]^(1−xij)        (3.7)

where xij is the observed response of the jth examinee on the ith item, Θ is the N × k matrix of abilities, and k is the number of dimensions in the M2PL model.
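The M2PL response probability in equation 3.6 can be sketched as a small function. This is a minimal illustration; the function name and example values are the author's illustrative choices.

```python
import math

def m2pl_prob(theta, a, d):
    """Probability of a correct response under the M2PL model (equation 3.6):
    P(X = 1 | theta, a, d) = exp(a'theta + d) / (1 + exp(a'theta + d)).

    theta and a are k-dimensional sequences; d is a scalar intercept.
    """
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    # The logistic form above is algebraically 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))

# With theta = 0 and d = 0 the probability is exactly .5; raising ability
# on a discriminating dimension raises the probability of a correct response.
p0 = m2pl_prob([0.0, 0.0, 0.0], [1.0, 0.5, 0.2], 0.0)
p1 = m2pl_prob([1.0, 0.0, 0.0], [1.0, 0.5, 0.2], 0.0)
```

Here p0 is exactly .5 and p1 ≈ .731, i.e., 1/(1 + e^(−1)).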
This study assumes that the ability parameters are mutually independent and follow the multivariate normal distribution with mean vector µ and covariance matrix Σ (Béguin & Glas, 2001; Bolt & Lall, 2003). The prior distribution πθ(θj) for the ability distribution is then specified as:

πθ(θj) = Nk(θj | μ, Σ) = (2π)^(−k/2) |Σ|^(−1/2) exp[ −(1/2)(θj − μ)'Σ^(−1)(θj − μ) ]        (3.8)

where

µθ = [0_1, …, 0_k]',   Σθ = I_k (the k × k identity matrix).

The mean vector is all zeros, and the covariance matrix is equal to the identity matrix. The reason for using the identity matrix as the covariance in the prior multivariate normal distribution is that the purpose of this study is to investigate the effect of a mis-specified model on item calibration. In practical situations, it is nearly impossible to know the structure of the latent constructs, so most item parameter calibration procedures use the standard covariance matrix in the prior distribution. By using the standard covariance matrix as the prior distribution, it is possible to see the effect of the mis-specification on item calibration procedures. The prior distribution for the discrimination parameter aik is a normal distribution, aik ~ Na(μ, σ²) I(aik > 0), where I(·) is an indicator function used to make sure that the aik parameter is sampled from the positive region of the normal distribution. Fu et al. (2009) showed that a truncated normal distribution is an appropriate prior for the a-parameter and works very well as a prior distribution in an MCMC simulation setting. Mean 0 and variance 2 are used as the hyper-parameters. Béguin and Glas (2001) used mean 0 and variance 1 and showed that this prior specification worked very well for item parameter calibration in an MCMC simulation; the specification for this study is less informative than theirs. Bolt and Lall (2003) used the same prior specification as this study (mean = 0, variance = 2).
They also showed that this prior distribution works very well in an MCMC simulation within the MIRT framework. The prior distribution for the difficulty parameter d is a normal distribution, di ~ Nd(μ, σ²). Mean 0 and variance 20 are used as the prior distribution for the d-parameter; this is a noninformative prior. Most previous studies used a broad range of variances, from 1 to 5 (Baker & Kim, 2004; Bolt & Lall, 2003; Finch, 2011; Harwell et al., 1996; Maydeu-Olivares, 2001; Sheng, 2008). However, the large variance, 20, is used in this study because the purpose of this study is to evaluate the estimates given incorrect assumptions about the latent structures. In order to evaluate the estimates, it is necessary not to rely on the prior specifications: using a noninformative prior enables the estimation to rely more heavily on the data rather than the prior (Baker & Kim, 2004). For the a- and d-parameters, this study uses univariate normal distributions, which is very common practice in MCMC simulation studies in the MIRT framework (e.g., Baker & Kim, 2004; Bolt & Lall, 2003; Fu et al., 2009).

3.5.2 Likelihood Functions

The overall posterior distribution for the item and ability parameters is expressed in the following manner. Assume that the MIRT model holds the local independence assumption; then the likelihood function of the M2PL model with the response data matrix X can be expressed as:

L(X | Θ, Σ, A, d) = ∏(j=1 to N) ∏(i=1 to n) Pi(θj)^xij Qi(θj)^(1−xij)        (3.9)

where xij is the response of the jth examinee on item i, Qi(θj) = 1 − Pi(θj), Θ is the N × k matrix of abilities, A is the n × k matrix of item discrimination parameters, and d is the n × 1 vector of item difficulties.
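Under local independence, the likelihood in equation 3.9 is a double product over examinees and items, which is most conveniently computed on the log scale. A minimal sketch (illustrative function and variable names; not the OpenBUGS implementation):

```python
import math

def m2pl_loglik(X, Theta, A, d):
    """Log-likelihood of the M2PL model under local independence (equation 3.9):
    log of prod_j prod_i P_i(theta_j)^x_ij * Q_i(theta_j)^(1 - x_ij),
    with Q_i = 1 - P_i.

    X is an N x n 0/1 response matrix, Theta is N x k, A is n x k,
    and d is a length-n vector of intercepts.
    """
    ll = 0.0
    for j, theta in enumerate(Theta):
        for i, a in enumerate(A):
            z = sum(ak * tk for ak, tk in zip(a, theta)) + d[i]
            p = 1.0 / (1.0 + math.exp(-z))  # P_i(theta_j), equation 3.6
            ll += X[j][i] * math.log(p) + (1 - X[j][i]) * math.log(1.0 - p)
    return ll

# A correct response is more likely from a high-ability examinee, so the
# log-likelihood of a correct response is higher at theta = 2 than at theta = -2.
ll_high = m2pl_loglik([[1]], [[2.0]], [[1.0]], [0.0])
ll_low = m2pl_loglik([[1]], [[-2.0]], [[1.0]], [0.0])
```

Both values are negative (log-probabilities), with ll_high closer to zero.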
Since we define the prior distributions for the proficiency and the item parameters as multivariate normal distributions, written jointly as π(θj, A, d), the full joint posterior distribution is given by

p(θj, A, d | X) ∝ L(X | Θ, Σ, A, d) π(θj, A, d).

For the proficiency, the conditional posterior distribution can be obtained as

p(θj | Σ, A, d, X) ∝ ∏(i=1 to n) Pi(θj)^uij Qi(θj)^(1−uij) · πθ(θj)

where

πθ(θj) = (2π)^(−k/2) |Σ|^(−1/2) exp[ −(1/2)(θj − μ)'Σ^(−1)(θj − μ) ].

For the item discrimination parameters, the conditional posterior distribution can be expressed as

p(ai | Θ, Σ, d, X) ∝ ∏(j=1 to N) Pi(θj)^uij Qi(θj)^(1−uij) · πa(ai)

where

πa(ai) = (2π)^(−k/2) |Σ|^(−1/2) exp[ −(1/2)(ai − μ)'Σ^(−1)(ai − μ) ].

For the difficulty parameter, the conditional posterior distribution can be expressed as

p(di | Θ, Σ, A, X) ∝ ∏(j=1 to N) Pi(θj)^uij Qi(θj)^(1−uij) · πd(di)

where

πd(di) = (2πσ²)^(−1/2) exp[ −(di − μ)² / (2σ²) ].

3.6 Evaluation Criteria for Simulation Results

Inference in a Bayesian framework differs from that of frequentist statistics. Frequentist inference is based on the sampling distribution of a population parameter, so summary statistics can be used to make inferences. Bayesian inference is based on the posterior distribution, which combines the knowledge in the prior distribution with the likelihood function; it views the data as fixed and generates a distribution for the population parameters. The evaluation of frequentist estimates is often based on performance measures such as bias and mean square error. While the Bayesian paradigm does not use these measures, because of its underlying view of the nature of a population parameter, these characteristics will also be considered in this study.
Although it is acknowledged that the application of these measures conflicts with the Bayesian paradigm, it is constructive to consider them in addition to the Bayesian measures as a means of providing evaluation measures that may be more familiar to most researchers. Three summary statistics are used to evaluate the simulation results: bias (BIAS), mean absolute deviation (MAD), and root mean squared error (RMSE). The magnitude of these criteria is compared across the factors implemented in the MIRT model. For example, to evaluate the a1-parameter through the simulation, the criteria are computed as

BIAS = Σ(i=1 to I) (â1i − a1i) / I        (3.10)

MAD = Σ(i=1 to I) |â1i − a1i| / I        (3.11)

RMSE = sqrt[ Σ(i=1 to I) (â1i − a1i)² / I ]        (3.12)

where a1i is the true value of a1 for item i, â1i is the corresponding estimate, and I is the number of items. The reported criteria are the values averaged across replications for each parameter.

CHAPTER 4

RESULTS

This section covers the simulation results, including convergence diagnostics for the MCMC chains, and item parameter recovery in the MIRT model when different factors are imposed on the model, such as the number of dimensions (3 and 6), the type of latent trait configuration (AS and MS), correlated latent structures (.3 and .6), and skewed latent trait distributions (-.9 for the negative and +.9 for the positive skew), with different sample sizes (1000, 1500, 2000, and 3000). Each combination of conditions has 10 replications, which makes a total of 840 sets of simulations.

4.1. Convergence Diagnostics

Patz and Junker (1999a) emphasized that two things must be determined. One is that the MCMC has to reach the stationary distribution, meaning that it has converged. The other is that the MCMC standard error associated with the point estimates should be small. In order to examine the convergence of the MCMC chains, two convergence diagnostic procedures were used: the Heidelberger and Welch diagnostic and the Geweke diagnostic.
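The evaluation criteria defined in equations 3.10 through 3.12 can be sketched directly (illustrative function names; the example estimates and true values are hypothetical):

```python
import math

def bias(est, true):
    """BIAS (equation 3.10): mean signed difference between estimates and true values."""
    return sum(e - t for e, t in zip(est, true)) / len(true)

def mad(est, true):
    """MAD (equation 3.11): mean absolute difference."""
    return sum(abs(e - t) for e, t in zip(est, true)) / len(true)

def rmse(est, true):
    """RMSE (equation 3.12): square root of the mean squared difference."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / len(true))

# Hypothetical a1 estimates for three items against their true values.
true_a1 = [0.8, 1.0, 1.2]
est_a1 = [0.9, 0.9, 1.3]
```

With these values the differences are +.1, -.1, and +.1, so BIAS = .1/3, while MAD and RMSE are both .1; BIAS can cancel across items, which is why the absolute and squared criteria are reported alongside it.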
Graphical diagnostics, including autocorrelation, trace, and posterior density plots, were also conducted. Due to the large number of item parameters from the different sets of simulations, only a selected set of items is presented to show the convergence of the MCMC chain. The set of items used to show convergence was chosen to show the worst convergence among all of the sets of simulations; it is therefore assumed that the rest of the items are within the acceptable range of convergence. In order to estimate the standard error of the point estimates (the posterior mean, for this study), the batch standard error function built into CODA is used.

4.1.1. Heidelberger and Welch diagnostic

The Heidelberger and Welch (1983) procedure is one of the MCMC convergence diagnostic methods implemented in the CODA package in R. The technical details of the method are covered in the research design section. The set of simulations chosen is the worst-case scenario: 6 dimensions, mixed structures, correlated latent structures, and skewed latent structures. Among the 60 items across the 6 a-parameters, only a few items do not pass the convergence test, which would require an extended chain length. All items for the d-parameter passed the convergence test. Even though an extended chain length should improve convergence, 25,000 iterations with 3,000 burn-in iterations are enough to obtain satisfactory convergence. See the tables in the appendix for a full description of which items did not pass the convergence test.

4.1.2. Geweke diagnostic

In addition to Heidelberger's and Welch's convergence diagnostic, Geweke's diagnostic is used to assess the MCMC convergence. The same item parameters as used for Heidelberger's and Welch's test are used for Geweke's convergence test. If |Z| < 2, then it is assumed that the iterations have reached the stationary distribution and convergence has been met. See Table 4.1.1, which shows the Z-scores for the first 20 items.
Even though a few of the items do not seem to reach convergence, the Z-score for most of the items is less than 2 in absolute value, which confirms that the MCMC iterations have reached convergence. The Z-scores for all 60 items are given in the appendix.

Table 4.1.1: Geweke's Z-scores

Item      a1        a2        a3        a4        a5        a6         d
1       1.2684   -2.2061    0.5387    0.2543    0.7505    1.6152    0.5404
2       0.8198   -1.1491    1.3394    0.6109   -1.4409    0.4359   -0.0797
3       1.1488   -2.3534    0.8052    1.5895   -1.4047    1.8357   -0.2742
4       1.3278   -2.5092    0.7329    0.5589   -0.1756    2.0876    0.5249
5       0.7640   -3.5548    0.8827    0.7436    0.0197    2.3901    0.0838
6       2.3268   -1.6575    0.6066   -0.1853   -1.6516    1.2716   -0.0478
7       1.5009   -1.9750    1.2068    0.5158   -1.5500    2.3976   -0.6697
8       1.3062   -3.2312    0.3156    0.5285   -2.9689    2.1849    1.1820
9       0.8921   -2.0617    0.3080   -0.3810   -0.3362    3.0078    0.1300
10      1.2711   -3.6032    0.6145    0.7171    0.4523    2.9258    0.0248
11     -0.4740   -0.8094    2.1629    1.7458   -1.2479   -0.1922    1.0436
12     -0.3858    0.2128    1.1112   -0.1536   -0.3504   -0.3221   -0.6892
13      0.2475    1.7524   -0.1811   -0.7205   -0.1001    0.5925   -0.7560
14      0.3892   -0.0242   -2.0751    1.2241   -2.3642    0.3590   -1.0243
15     -0.9393    0.9163   -1.2904    1.1393   -1.2072   -0.1175   -0.4807
16      1.0436    1.3765   -1.9252    1.3216   -2.5827    0.1225   -0.9919
17      0.5820    0.3020   -1.4122    0.5939   -1.1244    0.3885   -0.6820
18      0.5422    0.7274    1.1886    0.1160   -2.4029    0.3778   -2.1434
19      0.4158    0.6139   -1.9429    1.0971   -2.1644    0.1458    0.2930
20     -0.3669    0.7525    1.5577    0.9327   -1.5352   -0.4552   -1.8313

4.1.3. Graphical diagnostics: autocorrelation, posterior density, and trace plots

In addition to the two convergence diagnostics (Geweke, and Heidelberger and Welch), three graphical diagnostic plots are used to show the convergence of the MCMC iterations. Due to the large number of item parameters, only selected items' autocorrelation, trace, and density plots are presented. Figure 4.1 shows the autocorrelation plot for the a1-parameter of item number 6 in 6 dimensions with MS structures. It shows that the MCMC iterations reach the stationary distribution.

Figure 4.1.
Autocorrelation plot for a1 of item 6

Figures 4.2 and 4.3 show the trace plot and posterior density plot, respectively, for the same item as in the autocorrelation plot. At first glance, the MCMC does not appear to have good convergence. However, it does sample from a range that is close to the true item parameter (a1 = .9760). If the MCMC chains were run for more iterations, the convergence would improve. Since the autocorrelation plot shows promising convergence, 25,000 iterations with 3,000 burn-in iterations should be enough to obtain satisfactory convergence.

Figure 4.2. Trace plot for a1 of item 6

Figure 4.3. Posterior density plot for a1 of item 6

4.1.4. MCMC standard error

As specified at the beginning of this chapter, MCMC simulation requires two diagnostics: a convergence test and the MCMC standard error. While the convergence test shows how well the MCMC simulation has reached the stationary distribution, the MCMC standard error shows whether the number of iterations used in the simulation is sufficient. Table 4.1.2 shows the batched standard errors for the first 20 items. All standard errors are in the thousandths, which confirms that the number of MCMC iterations used in this study is sufficient. The batch standard errors for the full 60 items are given in the appendix.
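The batch standard error can be sketched as follows. This is a simplified illustration of the batch-means idea, not the exact CODA routine (whose batching defaults may differ): split the chain into consecutive batches and use the variability of the batch means to estimate the Monte Carlo error of the posterior mean.

```python
import math
import random

def batch_se(chain, n_batches=50):
    """Batch-means estimate of the MCMC standard error of the posterior mean:
    standard deviation of the batch means divided by sqrt(number of batches).
    """
    size = len(chain) // n_batches
    means = [sum(chain[b * size:(b + 1) * size]) / size for b in range(n_batches)]
    grand = sum(means) / n_batches
    var = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var / n_batches)

# For independent draws the batch SE should be close to sd / sqrt(n),
# e.g. about 1 / sqrt(25000) ~ 0.006 for 25,000 standard normal draws.
rng = random.Random(2)
chain = [rng.gauss(0.0, 1.0) for _ in range(25000)]
se = batch_se(chain)
```

For an autocorrelated chain, the batch means absorb the within-batch dependence, which is why the batch estimate is preferred over the naive sd/sqrt(n) for MCMC output.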
Table 4.1.2: MCMC standard errors

Item      a1        a2        a3        a4        a5        a6         d
1       0.0045    0.0016    0.0030    0.0011    0.0018    0.0037    0.0013
2       0.0016    0.0010    0.0007    0.0017    0.0013    0.0018    0.0008
3       0.0039    0.0028    0.0026    0.0010    0.0030    0.0034    0.0010
4       0.0041    0.0020    0.0021    0.0022    0.0028    0.0046    0.0012
5       0.0019    0.0022    0.0012    0.0015    0.0026    0.0036    0.0015
6       0.0042    0.0012    0.0027    0.0027    0.0019    0.0029    0.0015
7       0.0029    0.0028    0.0020    0.0021    0.0019    0.0026    0.0008
8       0.0046    0.0027    0.0033    0.0021    0.0014    0.0030    0.0016
9       0.0041    0.0019    0.0025    0.0026    0.0018    0.0052    0.0015
10      0.0048    0.0050    0.0044    0.0024    0.0033    0.0075    0.0030
11      0.0004    0.0029    0.0019    0.0009    0.0026    0.0036    0.0021
12      0.0013    0.0030    0.0018    0.0015    0.0013    0.0040    0.0012
13      0.0006    0.0025    0.0009    0.0013    0.0014    0.0019    0.0013
14      0.0007    0.0017    0.0031    0.0014    0.0018    0.0056    0.0016
15      0.0012    0.0011    0.0021    0.0008    0.0018    0.0018    0.0011
16      0.0015    0.0035    0.0022    0.0020    0.0018    0.0049    0.0017
17      0.0008    0.0008    0.0014    0.0014    0.0009    0.0024    0.0022
18      0.0012    0.0018    0.0018    0.0017    0.0030    0.0054    0.0028
19      0.0006    0.0040    0.0030    0.0019    0.0017    0.0069    0.0021
20      0.0011    0.0039    0.0034    0.0024    0.0018    0.0065    0.0039

4.1.5. Item parameter recovery diagnostic

Once the convergence of the MCMC simulation is confirmed, it is necessary to diagnose how well the item parameters are recovered. In order to assess item parameter recovery, credible intervals are constructed; these show whether the estimated item parameters fall within a satisfactory range. For this study, highest posterior density (HPD) intervals are computed for each estimated item parameter, using the implementation in the CODA package in R, with a probability of .95. Tables 4.1.3, 4.1.4, and 4.1.5 show the HPD intervals for the a-parameters and the d-parameter for 6 dimensions with correlated and skewed latent structures.
All of the estimated means of the item parameters lie between the lower and upper bounds of the HPD intervals, which shows that the item parameters are recovered within the acceptable region.

Table 4.1.3: Highest posterior density intervals for the a1, a2, and a3 parameters

               a1                          a2                          a3
Item    Mean    Lower   Upper      Mean    Lower   Upper      Mean    Lower   Upper
1      0.4391  0.0954  0.7284     0.4089  0.1175  0.7038     0.2820  0.0633  0.4875
2      0.4231  0.2224  0.6228     0.2595  0.0001  0.5307     0.1491  0.0002  0.3045
3      0.2637  0.0000  0.4801     0.1915  0.0001  0.4614     0.0934  0.0000  0.2142
4      0.2947  0.0465  0.5199     0.2285  0.0001  0.4742     0.1479  0.0002  0.2919
5      0.2678  0.0754  0.4488     0.2495  0.0861  0.4126     0.2260  0.0558  0.3832
6      0.6398  0.1883  1.0030     0.4364  0.0211  0.9389     0.2927  0.0433  0.5433
7      0.4619  0.1718  0.7242     0.3278  0.0180  0.6745     0.1568  0.0000  0.3094
8      0.7090  0.1330  1.1650     0.5096  0.0422  1.0880     0.2635  0.0009  0.5114
9      0.6421  0.1152  1.0620     0.4505  0.0195  0.9843     0.2702  0.0112  0.5044
10     0.5545  0.2008  0.8570     0.3781  0.0189  0.7737     0.1971  0.0035  0.3775

Table 4.1.4: Highest posterior density intervals for the a4, a5, and a6 parameters

               a4                          a5                          a6
Item    Mean    Lower   Upper      Mean    Lower   Upper      Mean    Lower   Upper
1      0.2835  0.0004  0.5377     0.2550  0.0001  0.5782     0.2077  0.0136  0.3922
2      0.2468  0.0165  0.4621     0.2548  0.0002  0.4930     0.3048  0.0237  0.5645
3      0.1031  0.0000  0.2407     0.1021  0.0000  0.2418     0.0845  0.0000  0.1962
4      0.1736  0.0000  0.3575     0.1893  0.0000  0.3624     0.1669  0.0001  0.3263
5      0.1818  0.0301  0.3305     0.1767  0.0202  0.3268     0.1756  0.0317  0.3211
6      0.3494  0.0501  0.6744     0.3547  0.0715  0.6439     0.3651  0.1037  0.6280
7      0.3361  0.1379  0.5467     0.3274  0.1133  0.5305     0.2476  0.0020  0.4594
8      0.3800  0.0192  0.7268     0.3895  0.0633  0.7610     0.3041  0.0001  0.5794
9      0.2873  0.0028  0.5907     0.2910  0.0174  0.5629     0.3047  0.0269  0.5469
10     0.3540  0.1244  0.5991     0.3254  0.1014  0.5519     0.2775  0.0486  0.5020

Table 4.1.5: Highest posterior density intervals for the d-parameter

Item     Mean     Lower     Upper
1       0.995     0.897     1.085
2       0.762     0.680     0.855
3       0.353     0.277     0.431
4       0.283     0.209     0.364
5       0.151     0.073     0.224
6       0.552     0.463     0.646
7      -0.095    -0.182    -0.017
8      -0.138    -0.232    -0.050
9      -0.163    -0.250    -0.080
10     -1.029    -1.129    -0.926

4.2. 3-Dimensions

In this section, 3-dimensional latent structures are explored, with factors such as the type of latent trait configuration (AS vs. MS), correlated latent traits (.3 and .6), and skewed latent trait distributions (+.9 and -.9).

4.2.1. Approximate Simple Structure (AS) and Mixed Structures (MS)

The potential effect of the structure type of the latent construct, approximate simple structure (AS) versus mixed structure (MS), on item parameter recovery in the MIRT model is examined with three different sample sizes: 1000, 1500, and 2000. Table 4.2.1 shows the BIAS when the different types of latent structures are imposed on item recovery in the MIRT model. When approximate simple structure (AS) is imposed, the a-parameters are overestimated compared to MS. The d-parameter, however, shows a different pattern: it is underestimated under AS compared to MS.

Table 4.2.1: BIAS for different types of latent trait configuration (AS vs. MS)

Parameter  Structure   N = 1000   N = 1500   N = 2000
a1         AS            0.0941     0.0837     0.0784
           MS           -0.0007    -0.0204    -0.0131
a2         AS            0.0649     0.0615     0.0415
           MS            0.0083    -0.0061    -0.0095
a3         AS            0.1048     0.0922     0.0901
           MS           -0.0022    -0.0089    -0.0191
d          AS           -0.3348    -0.3471    -0.3420
           MS            0.0069    -0.0057     0.0028

Table 4.2.2: MAD for different types of latent trait configuration (AS vs. MS)

Parameter  Structure   N = 1000   N = 1500   N = 2000
a1         AS            0.1951     0.1847     0.1883
           MS            0.1115     0.0940     0.1086
a2         AS            0.3018     0.3060     0.2940
           MS            0.1236     0.0980     0.1159
a3         AS            0.3506     0.3397     0.3510
           MS            0.1103     0.0924     0.0939
d          AS            0.6351     0.6376     0.6349
           MS            0.0642     0.0527     0.0462

Table 4.2.3: RMSE for different types of latent trait configuration (AS vs.
MS)

Parameter  Structure   N = 1000   N = 1500   N = 2000
a1         AS            0.1951     0.1847     0.1883
           MS            0.1115     0.0940     0.1086
a2         AS            0.3018     0.3060     0.2940
           MS            0.1236     0.0980     0.1159
a3         AS            0.3506     0.3397     0.3510
           MS            0.1103     0.0924     0.0939
d          AS            0.6351     0.6376     0.6349
           MS            0.0642     0.0527     0.0462

Tables 4.2.2 and 4.2.3 show the mean absolute deviation (MAD) and root mean square error (RMSE), respectively. Both show that MS yields better item recovery than AS in terms of the magnitude of MAD and RMSE. The range of RMSE for the a-parameters is from .1847 to .3510 for AS, but from .0924 to .1159 for MS; AS shows larger error and a wider range for the estimated a-parameters than MS. The range of RMSE for the d-parameter shows the same pattern: AS ranges from .6349 to .6376, and MS from .0462 to .0642. There is no significant improvement as the sample size increases from 1000 to 2000, which tells us that a sample size of N = 1,000 has enough power to produce stable item recovery whether the latent trait structure is AS or MS.

4.2.2. Correlated Latent Traits

Only MS is used to examine the interaction effect of correlated latent traits. AS is not used for correlated latent structures because it has a dominant dimension with superior power in determining whether a student answers correctly. The MS, on the other hand, has several dimensions that can contribute to a correct answer, so the relationship among the dimensions in MS has a more serious impact on students' getting a correct answer. Table 4.2.4 shows the BIAS when different correlations (.3 and .6) are imposed on the latent structures. With a .3 correlation between latent structures, the a-parameters are underestimated, ranging from -.0544 to -.0285. When a .6 correlation is imposed on the latent structures, the a-parameters are overestimated, ranging from -.0064 to .0394.
The d-parameter does not show any noticeable pattern in item recovery, whether the latent structures have a high or low correlation. Tables 4.2.5 and 4.2.6 show the MAD and RMSE, respectively. The magnitudes of MAD and RMSE show that a .6 correlation has more impact on the recovery of the a-parameters than a .3 correlation. The d-parameter does not show any influence of correlation in terms of the magnitude of MAD and RMSE. As the sample size increases from 1000 to 1500 and to 2000, item recovery does not seem to improve for the a-parameters. On the other hand, the d-parameter does improve as the sample size increases, even though the magnitude of improvement is very small; the difference between the largest and smallest values is just .002.

Table 4.2.4: BIAS for correlated latent traits (MS only)

Parameter  Correlation   N = 1000   N = 1500   N = 2000
a1         0              -0.0007    -0.0204    -0.0131
           0.3            -0.0401    -0.0411    -0.0544
           0.6             0.0184     0.0028     0.0147
a2         0               0.0083    -0.0061    -0.0095
           0.3            -0.0285    -0.0345    -0.0392
           0.6             0.0312     0.0394     0.0091
a3         0              -0.0022    -0.0089    -0.0191
           0.3            -0.0291    -0.0377    -0.0436
           0.6             0.0193    -0.0064     0.0084
d          0               0.0069    -0.0057     0.0028
           0.3             0.0032     0.0071    -0.0047
           0.6            -0.0101     0.0039     0.0036

Table 4.2.5: MAD for correlated latent traits (MS only)

Parameter  Correlation   N = 1000   N = 1500   N = 2000
a1         0               0.11152    0.09399    0.10859
           0.3             0.11307    0.10851    0.11819
           0.6             0.12724    0.12279    0.12180
a2         0               0.0083     0.09801    0.11592
           0.3            -0.0285     0.09715    0.11050
           0.6             0.0312     0.11997    0.11018
a3         0               0.11025    0.09238    0.09389
           0.3             0.11554    0.11273    0.12013
           0.6             0.12942    0.12417    0.12348
d          0               0.06419    0.05270    0.04618
           0.3             0.06427    0.05234    0.04592
           0.6             0.06262    0.05008    0.04719

Table 4.2.6.
RMSE for correlated latent traits (MS only)

Parameter  Correlation   N = 1000   N = 1500   N = 2000
a1         0               0.1301     0.1122     0.1300
           0.3             0.1280     0.1220     0.1340
           0.6             0.1459     0.1430     0.1367
a2         0               0.1451     0.1149     0.1389
           0.3             0.1251     0.1119     0.1250
           0.6             0.1486     0.1407     0.1281
a3         0               0.1324     0.1120     0.1109
           0.3             0.1313     0.1255     0.1338
           0.6             0.1511     0.1417     0.1425
d          0               0.0785     0.0661     0.0554
           0.3             0.0785     0.0653     0.0561
           0.6             0.0785     0.0622     0.0575

4.2.3. Skewed Latent Trait Distributions

When skew is imposed on the latent trait distributions, it does not have much influence on item recovery for the a-parameters, under either AS or MS, and it makes no noticeable difference whether the skew is positive or negative. In all cases, the a-parameters are overestimated. The d-parameters, however, show an interesting pattern of item recovery. When the latent structures have an AS, the d-parameter is underestimated, with BIAS ranging from -.4810 to -.2048. A positive skew, which means that a low-ability sample is used, slightly underestimates the d-parameter compared with no skew on the latent trait distributions; a negative skew slightly overestimates it compared with no skew. When the latent traits have an MS, both the a-parameters and the d-parameters show the same pattern as under AS. However, the magnitudes of RMSE and MAD show that AS had more trouble estimating the d-parameter when skew was imposed on the latent trait distributions. See Tables 4.2.8 and 4.2.9.

Table 4.2.7.
BIAS when skew is imposed on the latent trait distributions (+.9 and -.9)

Parameter  Structure  Skew       N = 1000   N = 1500   N = 2000
a1         AS         No           0.0941     0.0837     0.0784
                      Positive     0.1530     0.1394     0.1342
                      Negative     0.1505     0.1389     0.1303
           MS         No          -0.0007    -0.0204    -0.0131
                      Positive     0.0553     0.0382     0.0589
                      Negative     0.0524     0.0395     0.0316
a2         AS         No           0.0649     0.0615     0.0415
                      Positive     0.1257     0.1107     0.1111
                      Negative     0.1216     0.1178     0.1078
           MS         No           0.0083    -0.0061    -0.0095
                      Positive     0.0777     0.0701     0.0417
                      Negative     0.0672     0.0605     0.0432
a3         AS         No           0.1048     0.0922     0.0901
                      Positive     0.1529     0.1500     0.1518
                      Negative     0.1569     0.1411     0.1373
           MS         No          -0.0022    -0.0089    -0.0191
                      Positive     0.0581     0.0416     0.0398
                      Negative     0.0686     0.0494     0.0524
d          AS         No          -0.3348    -0.3471    -0.3420
                      Positive    -0.4635    -0.4810    -0.4698
                      Negative    -0.2102    -0.2048    -0.2108
           MS         No           0.0069    -0.0057     0.0028
                      Positive    -0.1651    -0.1484    -0.1493
                      Negative     0.1439     0.1521     0.1428

Table 4.2.8: MAD when skew is imposed on the latent trait distributions (+.9 and -.9)

Parameter  Structure  Skew       N = 1000   N = 1500   N = 2000
a1         AS         No           0.1951     0.1847     0.1883
                      Positive     0.2181     0.2071     0.2012
                      Negative     0.2138     0.2070     0.2030
           MS         No           0.1115     0.0940     0.1086
                      Positive     0.1230     0.1099     0.1102
                      Negative     0.1222     0.1094     0.1073
a2         AS         No           0.3018     0.3060     0.2940
                      Positive     0.3109     0.3075     0.2985
                      Negative     0.3145     0.3090     0.2996
           MS         No           0.1236     0.0980     0.1159
                      Positive     0.1448     0.1226     0.1066
                      Negative     0.1329     0.1279     0.1182
a3         AS         No           0.3506     0.3397     0.3510
                      Positive     0.3800     0.3632     0.3668
                      Negative     0.3813     0.3645     0.3654
           MS         No           0.1103     0.0924     0.0939
                      Positive     0.1252     0.1110     0.1000
                      Negative     0.1240     0.1219     0.1043
d          AS         No           0.6351     0.6376     0.6349
                      Positive     0.6822     0.6872     0.6833
                      Negative     0.6034     0.5982     0.6048
           MS         No           0.0642     0.0527     0.0462
                      Positive     0.1673     0.1500     0.1501
                      Negative     0.1485     0.1525     0.1431

Table 4.2.9.
RMSE when skew is imposed on the latent trait distributions (+.9 and -.9)

Parameter  Structure  Skew       N = 1000   N = 1500   N = 2000
a1         AS         No           0.1951     0.1847     0.1883
                      Positive     0.2181     0.2071     0.2012
                      Negative     0.2138     0.2070     0.2030
           MS         No           0.1301     0.1122     0.1300
                      Positive     0.1470     0.1301     0.1264
                      Negative     0.1441     0.1262     0.1267
a2         AS         No           0.3383     0.3432     0.3017
                      Positive     0.3238     0.3170     0.3084
                      Negative     0.3274     0.3189     0.3077
           MS         No           0.1236     0.0980     0.1159
                      Positive     0.1448     0.1226     0.1066
                      Negative     0.1329     0.1279     0.1182
a3         AS         No           0.3825     0.3681     0.3598
                      Positive     0.3930     0.3736     0.3756
                      Negative     0.3935     0.3753     0.3745
           MS         No           0.1324     0.1120     0.1109
                      Positive     0.1480     0.1370     0.1200
                      Negative     0.1457     0.1414     0.1218
d          AS         No           0.6425     0.6421     0.6384
                      Positive     0.6893     0.6925     0.6874
                      Negative     0.6116     0.6037     0.6088
           MS         No           0.0785     0.0661     0.0554
                      Positive     0.1827     0.1616     0.1593
                      Negative     0.1673     0.1640     0.1518

4.2.4. Correlated Latent Traits and Skewed Latent Trait Distributions

This section presents results for when both correlation and skew are imposed on the latent traits. When both are imposed, a high correlation (.6) combined with skewed distributions contributes to a higher magnitude of bias for the a-parameters than a low correlation. The combined factors of correlation and skew lead to overestimation of the a-parameters. The d-parameter, however, is not influenced by whether the latent traits have a high or low correlation, although imposing correlation together with a skewed latent trait distribution does increase the magnitude of its bias. Table 4.2.10 shows the BIAS when both correlation and skew are imposed on the latent traits.

Table 4.2.10.
BIAS when both correlation and skew are imposed on the latent traits

Item   Correlation   Skew        N=1000    N=1500    N=2000
a1     0             No         -0.0007   -0.0204   -0.0131
       .3            Positive    0.0108    0.0150    0.0116
                     Negative    0.0138    0.0063    0.0094
       .6            Positive    0.0667    0.0683    0.0625
                     Negative    0.0802    0.0575    0.0672
a2     0             No          0.0083   -0.0061   -0.0095
       .3            Positive    0.0233    0.0212    0.0107
                     Negative    0.0270    0.0150    0.0062
       .6            Positive    0.0753    0.0721    0.0501
                     Negative    0.0902    0.0696    0.0614
a3     0             No         -0.0022   -0.0089   -0.0191
       .3            Positive    0.0177    0.0111   -0.0008
                     Negative    0.0154    0.0073    0.0008
       .6            Positive    0.0710    0.0533    0.0691
                     Negative    0.0677    0.0624    0.0584
d      0             No          0.0069   -0.0057    0.0028
       .3            Positive   -0.0982   -0.0981   -0.1173
                     Negative    0.1072    0.1052    0.1137
       .6            Positive   -0.1053   -0.0938   -0.1001
                     Negative    0.1213    0.0850    0.0888

When both correlation and skew are imposed on the latent traits, the d-parameter was overestimated when a negative skew was applied, and underestimated when a positive skew was applied. Increasing the sample size from 1000 to 1500 and to 2000 did not improve the item recovery. The magnitudes of BIAS, MAD, and RMSE stay in a small range, except for the d-parameters with a .6 correlation and skewed trait distributions. When the sample size was increased from 1000 to 1500, the estimate of the d-parameter improved. However, the estimates did not improve when the sample size increased from 1500 to 2000. Tables 4.2.11 and 4.2.12 show MAD and RMSE when both correlation and skew are imposed on the latent traits, respectively.

Table 4.2.11.
MAD when both correlation and skew are imposed on the latent traits

Item   Correlation   Skew        N=1000    N=1500    N=2000
a1     0             No          0.1115    0.0940    0.1086
       .3            Positive    0.1213    0.1233    0.1152
                     Negative    0.1214    0.1242    0.1193
       .6            Positive    0.1473    0.1473    0.1363
                     Negative    0.1490    0.1398    0.1431
a2     0             No          0.1236    0.0980    0.1159
       .3            Positive    0.1173    0.1242    0.1136
                     Negative    0.1267    0.1177    0.1083
       .6            Positive    0.1445    0.1381    0.1245
                     Negative    0.1459    0.1348    0.1363
a3     0             No          0.1103    0.0924    0.0939
       .3            Positive    0.1220    0.1241    0.1185
                     Negative    0.1212    0.1103    0.1305
       .6            Positive    0.1378    0.1405    0.1336
                     Negative    0.1477    0.1398    0.1439
d      0             No          0.0642    0.0527    0.0462
       .3            Positive    0.1110    0.1039    0.1187
                     Negative    0.1192    0.1101    0.1157
       .6            Positive    0.1163    0.0994    0.1034
                     Negative    0.1301    0.0951    0.0938

Table 4.2.12. RMSE when both correlation and skew are imposed into the latent structures

Item   Correlation   Skew        N=1000    N=1500    N=2000
a1     0             No          0.1301    0.1122    0.1300
       .3            Positive    0.1418    0.1447    0.1314
                     Negative    0.1440    0.1422    0.1352
       .6            Positive    0.1697    0.1668    0.1507
                     Negative    0.1679    0.1565    0.1644
a2     0             No          0.1451    0.1149    0.1389
       .3            Positive    0.1343    0.1470    0.1323
                     Negative    0.1439    0.1370    0.1274
       .6            Positive    0.1670    0.1587    0.1403
                     Negative    0.1639    0.1534    0.1566
a3     0             No          0.1324    0.1120    0.1109
       .3            Positive    0.1398    0.1398    0.1360
                     Negative    0.1451    0.1289    0.1478
       .6            Positive    0.1579    0.1596    0.1480
                     Negative    0.1684    0.1550    0.1665
d      0             No          0.0785    0.0661    0.0554
       .3            Positive    0.1286    0.1177    0.1293
                     Negative    0.1379    0.1248    0.1277
       .6            Positive    0.1361    0.1135    0.1153
                     Negative    0.1493    0.1096    0.1056

4.3. 6-Dimensions
This section presents results for 6-dimensions of the latent trait, with factors such as types of latent trait configuration (AS vs. MS), correlated latent traits (.3 and .6), and skewed latent trait distributions (+.9 and -.9).

4.3.1.
Approximate Simple Structures and Mixed Structures
When there are different types of latent trait configuration, such as AS and MS, on 6-dimensions, the magnitude of BIAS for the a-parameters for AS is bigger than for MS, except the a1-parameter, which has a smaller BIAS with AS. Table 4.3.1 shows the BIAS when AS and MS are imposed on 6-dimension latent traits. The d-parameter shows no significantly different pattern between AS and MS in terms of BIAS. Both the AS and MS structures give a satisfactory item recovery, ranging from -.0086 to +.0093. When sample sizes increase from 1000 to 1500, 2000, and 3000, the magnitude of BIAS of the a-parameters gets smaller for both AS and MS structures, which shows that a sample size of at least 2000 is required for satisfactory item parameter recovery with 6-dimension latent traits.

Table 4.3.1. BIAS for different types of latent trait configuration (AS vs. MS)

Item   Structure   N=1000    N=1500    N=2000    N=3000
a1     AS          0.0462    0.0181    0.0117   -0.0024
       MS          0.0369    0.0226    0.0159    0.0003
a2     AS          0.0466    0.0234    0.0113    0.0038
       MS          0.0207    0.0046   -0.0016   -0.0177
a3     AS          0.0363    0.0117   -0.0002   -0.0147
       MS          0.0209    0.0052   -0.0015   -0.0127
a4     AS          0.044     0.019     0.015     0.003
       MS          0.0237    0.0057   -0.0058   -0.0144
a5     AS          0.0543    0.0162    0.0106    0.0046
       MS          0.0183    0.0059   -0.0040   -0.0078
a6     AS          0.0522    0.0268    0.0129    0.0029
       MS          0.0276    0.0142    0.0051    0.0012
d      AS          0.0045   -0.0086   -0.0044    0.0035
       MS         -0.0025    0.0093    0.0040    0.0012

The magnitudes of MAD and RMSE in Tables 4.3.2 and 4.3.3 show that the AS structures give a better item recovery for the a-parameters, compared to MS structures. The a-parameters are more problematic for both AS and MS, compared to the d-parameter, in 6-dimensions. In particular, AS structures show an irregular pattern in some of the a-parameters, such as a3, a4, and a5: when the sample size increases, the RMSE goes higher than with a smaller sample size. Overall, MS shows a higher RMSE, compared to AS.
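The three recovery indices reported throughout this chapter can be computed from the generating ("true") and estimated item parameters. A minimal sketch in Python follows; the array values are illustrative, not taken from the simulation:

```python
import math

def bias(est, true):
    """Signed mean error: positive values indicate overestimation."""
    return sum(e - t for e, t in zip(est, true)) / len(est)

def mad(est, true):
    """Mean absolute difference between estimated and generating values."""
    return sum(abs(e - t) for e, t in zip(est, true)) / len(est)

def rmse(est, true):
    """Root mean squared error; penalizes large deviations more than MAD."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / len(est))

# Illustrative recovery of a1 across four items for one replication
true_a1 = [1.0, 0.8, 1.2, 0.9]
est_a1 = [1.1, 0.7, 1.3, 1.0]
print(round(bias(est_a1, true_a1), 4))   # 0.05
print(round(mad(est_a1, true_a1), 4))    # 0.1
print(round(rmse(est_a1, true_a1), 4))   # 0.1
```

Note that BIAS cancels errors of opposite sign, which is why MAD and RMSE in the tables can be large even where BIAS is near zero.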
However, MS shows a stable and gradual decrease as the sample size increases. There is no noticeable influence from the different types of latent structure configuration: both the AS and MS structures have almost identical magnitudes of RMSE and MAD.

Table 4.3.2. MAD for different types of latent trait configuration (AS vs. MS)

Item   Structure   N=1000    N=1500    N=2000    N=3000
a1     AS          0.0818    0.0590    0.0543    0.0452
       MS          0.1465    0.1149    0.1075    0.0933
a2     AS          0.0805    0.0624    0.0540    0.0659
       MS          0.1517    0.1053    0.1145    0.0860
a3     AS          0.0805    0.0833    0.0528    0.0696
       MS          0.1443    0.1087    0.1119    0.0782
a4     AS          0.0805    0.1043    0.0564    0.0717
       MS          0.0965    0.1044    0.1040    0.0743
a5     AS          0.0805    0.0806    0.0527    0.0458
       MS          0.1044    0.0920    0.1201    0.0959
a6     AS          0.0994    0.0622    0.0542    0.0466
       MS          0.1478    0.1108    0.1154    0.0868
d      AS          0.0741    0.0516    0.0503    0.0393
       MS          0.0692    0.0594    0.0462    0.0392

4.3.2. Correlated Latent Traits
Only MS is used to examine the interaction effect of correlated latent traits. AS has not been used for correlated latent traits because it has a dominant dimension with superior power in determining whether a student answers correctly. On the other hand, MS has several dimensions that can each contribute to a correct answer, so the relationship among dimensions in MS has a more serious impact on students' responses.

Table 4.3.3. RMSE for different types of latent trait configuration (AS vs.
MS)

Item   Structure   N=1000    N=1500    N=2000    N=3000
a1     AS          0.1044    0.0741    0.0673    0.0549
       MS          0.1687    0.1388    0.1260    0.1108
a2     AS          0.1022    0.0790    0.0670    0.1069
       MS          0.1702    0.1247    0.1391    0.1008
a3     AS          0.1022    0.0790    0.0670    0.1153
       MS          0.1676    0.1303    0.1332    0.0960
a4     AS          0.1399    0.1761    0.0707    0.1207
       MS          0.1627    0.1250    0.1249    0.0904
a5     AS          0.1399    0.1761    0.0660    0.0563
       MS          0.1579    0.1271    0.1450    0.1152
a6     AS          0.1473    0.0784    0.0682    0.0576
       MS          0.1710    0.1347    0.1350    0.1058
d      AS          0.0898    0.0628    0.0619    0.0473
       MS          0.0852    0.0734    0.0554    0.0476

Table 4.3.4 shows the BIAS when different magnitudes of correlation (.3 and .6) are imposed on the latent traits in 6-dimensions. It clearly shows that the magnitude of BIAS for the a-parameters increases when the correlation increases. In the case of no correlation and a sample size of 1000, the BIAS for the a1-parameter is .0369. It goes up to .0778 when the .3 correlation is used, which is almost twice that of no correlation. It goes up to .1661 when the .6 correlation is used, which is two times larger than the .3 correlation. On the other hand, the d-parameter is not influenced by the correlation between the latent traits: the magnitude of BIAS does not change significantly, whether there is a high correlation (.6) or a low correlation (.3) between the latent structures. When the sample size increases from 1000 to 1500, 2000, and 3000, the magnitude of BIAS gradually decreases for the correlated latent traits. However, the change in BIAS does not seem significant.

Table 4.3.4.
BIAS for correlated latent traits (MS only)

Item   Correlation   N=1000    N=1500    N=2000    N=3000
a1     0             0.0369    0.0226    0.0159    0.0003
       .3            0.0778    0.0599    0.0523    0.0482
       .6            0.1661    0.1454    0.1391    0.1279
a2     0             0.0207    0.0046   -0.0016   -0.0177
       .3            0.0591    0.0428    0.0352    0.0278
       .6            0.1518    0.1275    0.1303    0.0913
a3     0             0.0209    0.0052   -0.0015   -0.0127
       .3            0.0570    0.0426    0.0354    0.0275
       .6            0.1476    0.1274    0.1137    0.1079
a4     0             0.0237    0.0057   -0.0058   -0.0144
       .3            0.0663    0.0513    0.0389    0.0283
       .6            0.1581    0.1379    0.1253    0.1129
a5     0             0.0183    0.0059   -0.0040   -0.0078
       .3            0.0584    0.0405    0.0347    0.0194
       .6            0.1481    0.1298    0.1242    0.1115
a6     0             0.0276    0.0142    0.0051    0.0012
       .3            0.0682    0.0527    0.0401    0.0391
       .6            0.1555    0.1374    0.1258    0.1213
d      0            -0.0025    0.0093    0.0040    0.0012
       .3            0.0054    0.0073    0.0009    0.0024
       .6            0.0155   -0.0029    0.0095    0.0016

Tables 4.3.5 and 4.3.6 show the MAD and RMSE when the latent traits are correlated. When the sample size is 1000, the magnitude of RMSE for the a-parameters is almost the same for no correlation and for the .3 correlation. However, the .6 correlation has a much higher RMSE than the .3 or no correlation. When the sample size is increased to 1500, 2000, and 3000, the magnitude of RMSE for no correlation drops more rapidly than for the .3 or .6 correlations. The d-parameter, however, has almost the same rate of change in RMSE and MAD when 0, .3, and .6 correlations are imposed on the latent traits, and it also gradually decreases as the sample size increases. Even though the size of the change is not significant, a bigger sample size contributes to decreasing the RMSE and MAD of the d-parameter.

Table 4.3.5.
MAD for correlated latent traits (MS only)

Item   Correlation   N=1000    N=1500    N=2000    N=3000
a1     0             0.1465    0.1149    0.1075    0.0933
       .3            0.1662    0.1528    0.1402    0.1389
       .6            0.2294    0.2126    0.2047    0.2026
a2     0             0.1517    0.1053    0.1145    0.0860
       .3            0.1629    0.1457    0.1400    0.1331
       .6            0.2238    0.2069    0.2045    0.1850
a3     0             0.1443    0.1087    0.1119    0.0782
       .3            0.1614    0.1532    0.1404    0.1322
       .6            0.2229    0.2027    0.1943    0.1899
a4     0             0.1423    0.1044    0.1040    0.0743
       .3            0.1646    0.1551    0.1422    0.1285
       .6            0.2259    0.2133    0.2017    0.1962
a5     0             0.1445    0.0920    0.1201    0.0959
       .3            0.1605    0.1484    0.1403    0.1209
       .6            0.2173    0.2089    0.2000    0.1922
a6     0             0.1478    0.1108    0.1154    0.0868
       .3            0.1670    0.1507    0.1501    0.1412
       .6            0.2250    0.2127    0.2043    0.2044
d      0             0.0692    0.0594    0.0462    0.0392
       .3            0.0696    0.0578    0.0469    0.0407
       .6            0.0719    0.0659    0.0499    0.0385

Table 4.3.6. RMSE for correlated latent traits (MS only)

Item   Correlation   N=1000    N=1500    N=2000    N=3000
a1     0             0.1687    0.1388    0.1260    0.1108
       .3            0.1771    0.1661    0.1520    0.1489
       .6            0.2386    0.2219    0.2126    0.2128
a2     0             0.1702    0.1247    0.1391    0.1008
       .3            0.1732    0.1570    0.1501    0.1431
       .6            0.2337    0.2139    0.2117    0.1925
a3     0             0.1676    0.1303    0.1332    0.0960
       .3            0.1742    0.1652    0.1478    0.1425
       .6            0.2335    0.2117    0.2010    0.1983
a4     0             0.1627    0.1250    0.1249    0.0904
       .3            0.1758    0.1640    0.1527    0.1388
       .6            0.2369    0.2215    0.2075    0.2021
a5     0             0.1684    0.1142    0.1450    0.1152
       .3            0.1727    0.1604    0.1484    0.1311
       .6            0.2279    0.2185    0.2090    0.1987
a6     0             0.1710    0.1347    0.1350    0.1058
       .3            0.1795    0.1608    0.1590    0.1510
       .6            0.2371    0.2181    0.2122    0.2105
d      0             0.0852    0.0734    0.0554    0.0476
       .3            0.0854    0.0714    0.0577    0.0508
       .6            0.0901    0.0799    0.0596    0.0471

4.3.3. Skewed Latent Traits Distributions
This section presents the results for 6-dimensions with skewed latent trait distributions on both AS and MS. Table 4.3.7 shows the BIAS when negative (-.9) and positive (+.9) skews are imposed on the latent trait distributions. The magnitude of BIAS for the a-parameters shows that a skew on the latent trait distributions increases the BIAS, whether the structure is AS or MS.
However, there is no significant difference between the AS and MS structures, nor between a negative and a positive skew.

Table 4.3.7. BIAS when skew is imposed on the latent traits distributions (+.9 and -.9)

Item   Skew       AS:1000  AS:1500  AS:2000  AS:3000   MS:1000  MS:1500  MS:2000  MS:3000
a1     No           0.046    0.018    0.012   -0.002     0.037    0.023    0.016    0.000
       Positive     0.110    0.090    0.080    0.068     0.127    0.110    0.098    0.092
       Negative     0.104    0.088    0.072    0.072     0.129    0.096    0.102    0.094
a2     No           0.047    0.023    0.011    0.004     0.021    0.005   -0.002   -0.018
       Positive     0.110    0.084    0.076    0.065     0.110    0.085    0.077    0.080
       Negative     0.106    0.084    0.073    0.064     0.104    0.087    0.084    0.069
a3     No           0.036    0.012    0.000   -0.015     0.021    0.005   -0.001   -0.013
       Positive     0.094    0.078    0.075    0.060     0.106    0.091    0.086    0.072
       Negative     0.099    0.076    0.069    0.062     0.110    0.088    0.089    0.073
a4     No           0.044    0.019    0.015    0.003     0.024    0.006   -0.006   -0.014
       Positive     0.105    0.086    0.075    0.068     0.115    0.092    0.088    0.076
       Negative     0.105    0.093    0.076    0.069     0.114    0.095    0.084    0.085
a5     No           0.054    0.016    0.011    0.005     0.018    0.006   -0.004   -0.008
       Positive     0.104    0.085    0.081    0.069     0.109    0.099    0.087    0.079
       Negative     0.106    0.084    0.077    0.068     0.107    0.086    0.088    0.076
a6     No           0.052    0.027    0.013    0.003     0.028    0.014    0.005    0.001
       Positive     0.112    0.088    0.086    0.074     0.120    0.101    0.096    0.092
       Negative     0.110    0.089    0.085    0.066     0.115    0.096    0.091    0.081
d      No           0.005   -0.009   -0.004    0.003    -0.003    0.009    0.004    0.001
       Positive    -0.177   -0.185   -0.184   -0.178    -0.228   -0.209   -0.221   -0.213
       Negative     0.168    0.172    0.165    0.172     0.227    0.238    0.220    0.224

The d-parameter shows a different pattern for negative and positive skews. When a positive skew is imposed on the latent trait distributions, the d-parameter is underestimated; when a negative skew is imposed, the d-parameter is overestimated. MS has a bigger BIAS than AS when skew is imposed on the latent structures.

Table 4.3.8.
MAD when skew is imposed on the latent traits distributions (+.9 and -.9)

Item   Skew       AS:1000  AS:1500  AS:2000  AS:3000   MS:1000  MS:1500  MS:2000  MS:3000
a1     No           0.082    0.059    0.054    0.045     0.146    0.115    0.126    0.093
       Positive     0.124    0.103    0.090    0.079     0.201    0.170    0.162    0.135
       Negative     0.121    0.102    0.083    0.080     0.203    0.153    0.148    0.142
a2     No           0.059    0.062    0.054    0.066     0.152    0.105    0.114    0.086
       Positive     0.103    0.101    0.091    0.075     0.189    0.151    0.133    0.133
       Negative     0.102    0.096    0.087    0.073     0.189    0.161    0.146    0.126
a3     No           0.073    0.083    0.053    0.070     0.144    0.109    0.112    0.078
       Positive     0.106    0.093    0.088    0.071     0.187    0.149    0.145    0.119
       Negative     0.112    0.090    0.080    0.071     0.186    0.170    0.147    0.132
a4     No           0.096    0.104    0.056    0.072     0.142    0.104    0.104    0.074
       Positive     0.117    0.098    0.090    0.076     0.188    0.162    0.154    0.131
       Negative     0.117    0.103    0.088    0.079     0.203    0.176    0.150    0.138
a5     No           0.104    0.081    0.053    0.046     0.144    0.092    0.120    0.096
       Positive     0.119    0.120    0.092    0.078     0.197    0.167    0.145    0.143
       Negative     0.117    0.094    0.088    0.078     0.194    0.160    0.148    0.140
a6     No           0.099    0.062    0.054    0.047     0.148    0.111    0.115    0.087
       Positive     0.123    0.120    0.095    0.081     0.200    0.174    0.157    0.145
       Negative     0.120    0.104    0.095    0.076     0.202    0.165    0.158    0.147
d      No           0.074    0.052    0.050    0.039     0.069    0.059    0.046    0.039
       Positive     0.178    0.185    0.184    0.178     0.230    0.210    0.222    0.213
       Negative     0.170    0.173    0.166    0.172     0.227    0.238    0.220    0.224

Tables 4.3.8 and 4.3.9 show the MAD and RMSE, which confirm the pattern of the effect of skew in the latent trait distributions for AS and MS. Whether the skew is negative or positive, its effect on a-parameter recovery is almost identical. Whether the structure is AS or MS, the magnitudes of MAD and RMSE for the a-parameters increase by almost the same amount when skew is imposed on the latent trait distributions. The d-parameter tells a different story: MS has a bigger MAD and RMSE than AS when there is skew. Increasing the sample size decreases the RMSE for the a-parameters, even though the change is not significantly large.
The d-parameter does not improve when the sample size increases.

Table 4.3.9. RMSE when skew is imposed on the latent traits distributions (+.9 and -.9)

Item   Skew       AS:1000  AS:1500  AS:2000  AS:3000   MS:1000  MS:1500  MS:2000  MS:3000
a1     No           0.082    0.059    0.054    0.045     0.146    0.115    0.108    0.093
       Positive     0.124    0.103    0.090    0.079     0.201    0.170    0.146    0.135
       Negative     0.121    0.102    0.083    0.080     0.203    0.153    0.167    0.142
a2     No           0.102    0.079    0.067    0.107     0.170    0.125    0.139    0.101
       Positive     0.152    0.121    0.107    0.088     0.208    0.166    0.149    0.152
       Negative     0.145    0.120    0.106    0.086     0.207    0.177    0.164    0.144
a3     No           0.093    0.133    0.065    0.115     0.168    0.130    0.133    0.096
       Positive     0.130    0.113    0.105    0.083     0.204    0.170    0.161    0.137
       Negative     0.135    0.108    0.097    0.085     0.201    0.186    0.162    0.151
a4     No           0.140    0.176    0.071    0.121     0.163    0.125    0.125    0.090
       Positive     0.141    0.117    0.107    0.091     0.206    0.176    0.167    0.149
       Negative     0.144    0.124    0.106    0.093     0.217    0.191    0.169    0.159
a5     No           0.158    0.127    0.066    0.056     0.168    0.114    0.145    0.115
       Positive     0.142    0.168    0.110    0.092     0.215    0.185    0.166    0.159
       Negative     0.140    0.113    0.104    0.093     0.210    0.176    0.167    0.160
a6     No           0.147    0.078    0.068    0.058     0.171    0.135    0.135    0.106
       Positive     0.150    0.165    0.113    0.095     0.217    0.189    0.174    0.167
       Negative     0.146    0.123    0.112    0.090     0.214    0.180    0.173    0.171
d      No           0.090    0.063    0.062    0.047     0.085    0.073    0.055    0.048
       Positive     0.196    0.198    0.194    0.183     0.245    0.221    0.229    0.219
       Negative     0.187    0.185    0.176    0.179     0.242    0.248    0.228    0.229

4.3.4. Correlated and Skewed Latent Traits Distributions
This section explores the influence when both factors, correlation and skew, are imposed on the latent trait distributions. Table 4.3.10 shows the BIAS for the item parameters when both factors are implemented in the item recovery procedures. As with the 3-dimension structures, only MS is examined. For the a-parameters, the magnitude of BIAS almost doubles, compared to having just

Table 4.3.10.
BIAS when both correlation and skew are imposed on the latent traits distributions

Item   Correlation   Skew        N=1000    N=1500    N=2000    N=3000
a1     0             No          0.0369    0.0226    0.0159    0.0003
       .3            Positive    0.1818    0.1623    0.1491    0.1469
                     Negative    0.1855    0.1751    0.1628    0.1497
       .6            Positive    0.3336    0.3250    0.3206    0.2972
                     Negative    0.3475    0.3136    0.3445    0.2994
a2     0             No          0.0207    0.0046   -0.0016   -0.0177
       .3            Positive    0.1600    0.1397    0.1357    0.1480
                     Negative    0.1692    0.1515    0.1398    0.1250
       .6            Positive    0.3194    0.3237    0.3009    0.2878
                     Negative    0.3284    0.2969    0.3154    0.2810
a3     0             No          0.0209    0.0052   -0.0015   -0.0127
       .3            Positive    0.1551    0.1436    0.1356    0.1279
                     Negative    0.1649    0.1507    0.1393    0.1227
       .6            Positive    0.3099    0.2945    0.2946    0.2947
                     Negative    0.3125    0.2953    0.3005    0.2562
a4     0             No          0.0237    0.0057   -0.0058   -0.0144
       .3            Positive    0.1680    0.1526    0.1290    0.1268
                     Negative    0.1743    0.1436    0.1392    0.1329
       .6            Positive    0.3176    0.3195    0.3042    0.2778
                     Negative    0.3306    0.3067    0.2961    0.3281
a5     0             No          0.0183    0.0059   -0.0040   -0.0078
       .3            Positive    0.1577    0.1349    0.1352    0.1254
                     Negative    0.1716    0.1457    0.1441    0.1239
       .6            Positive    0.3143    0.3046    0.2840    0.2946
                     Negative    0.3324    0.3036    0.2747    0.2737
a6     0             No          0.0276    0.0142    0.0051    0.0012
       .3            Positive    0.1725    0.1549    0.1277    0.1301
                     Negative    0.1788    0.1536    0.1454    0.1315
       .6            Positive    0.3235    0.3196    0.2794    0.2876
                     Negative    0.3314    0.3175    0.3105    0.3059
d      0             No         -0.0025    0.0093    0.0040    0.0012
       .3            Positive   -0.1990   -0.2172   -0.2298   -0.2296
                     Negative    0.2257    0.2212    0.2180    0.2367
       .6            Positive   -0.3665   -0.3576   -0.3484   -0.3524
                     Negative    0.3533    0.3825    0.3774    0.3083

one factor, such as correlation or skew. For example, the a1-parameter has a BIAS of .0778 with a sample of 1000 and a .3 correlation, and a BIAS of .1271 with a sample of 1000 and a positive skew. However, when the a1-parameter has both factors, a .3 correlation and a positive skew, the BIAS becomes .1818, which is almost double that of the model with just one factor. The same thing happens when a .6 correlation and a negative skew are imposed on the latent trait distributions.
Even though the magnitude of BIAS decreases slightly with an increasing sample size when there is only correlation between the latent traits, the magnitude of BIAS does not change significantly when the two factors are incorporated at the same time. For the d-parameter, if there is only one factor, such as correlation or skew, the BIAS is smaller than in the model with the two factors incorporated together. For example, when the sample size is 1000, the BIAS for the d-parameter with a .6 correlation is .0155, and the BIAS for the d-parameter with a negative skew is .2269; it becomes .3533 with the two factors, a .6 correlation and a negative skew, together. However, with a low correlation like .3, the BIAS does not get bigger with the two factors together. For example, the BIAS is .0054 with a .3 correlation and .2269 with a negative skew, but with the two factors, a .3 correlation and a negative skew, it becomes .2257, which is almost the same as the one-factor model. Increasing the sample size does not help to improve item recovery in terms of BIAS for the d-parameters.

Table 4.3.11.
MAD when both correlation and skew are imposed on the latent traits distributions

Item   Correlation   Skew        N=1000    N=1500    N=2000    N=3000
a1     0             No          0.1465    0.1149    0.1075    0.0933
       .3            Positive    0.2430    0.2229    0.2110    0.1999
                     Negative    0.2456    0.2354    0.2173    0.2122
       .6            Positive    0.3515    0.3429    0.3363    0.3239
                     Negative    0.3622    0.3384    0.3576    0.3210
a2     0             No          0.1517    0.1053    0.1145    0.0860
       .3            Positive    0.2277    0.2194    0.2058    0.2061
                     Negative    0.2329    0.2242    0.2147    0.2004
       .6            Positive    0.3442    0.3440    0.3263    0.3146
                     Negative    0.3490    0.3283    0.3385    0.3128
a3     0             No          0.1443    0.1087    0.1119    0.0782
       .3            Positive    0.2265    0.2168    0.2035    0.1847
                     Negative    0.2219    0.2158    0.2072    0.1942
       .6            Positive    0.3365    0.3226    0.3173    0.3236
                     Negative    0.3307    0.3199    0.3279    0.2951
a4     0             No          0.1423    0.1044    0.1040    0.0743
       .3            Positive    0.2331    0.2262    0.2038    0.1974
                     Negative    0.2434    0.2162    0.2086    0.2044
       .6            Positive    0.3447    0.3474    0.3311    0.3121
                     Negative    0.3648    0.3408    0.3288    0.3588
a5     0             No          0.1423    0.0920    0.1201    0.0959
       .3            Positive    0.2331    0.2185    0.2026    0.2027
                     Negative    0.2434    0.2109    0.2199    0.2016
       .6            Positive    0.3397    0.3279    0.3148    0.3250
                     Negative    0.3535    0.3335    0.3024    0.3032
a6     0             No          0.1478    0.1108    0.1154    0.0868
       .3            Positive    0.2405    0.2241    0.2059    0.1984
                     Negative    0.2366    0.2188    0.2139    0.2036
       .6            Positive    0.3464    0.3405    0.3076    0.3205
                     Negative    0.3545    0.3477    0.3341    0.3295
d      0             No          0.0692    0.0594    0.0462    0.0392
       .3            Positive    0.2013    0.2181    0.2299    0.2296
                     Negative    0.2275    0.2219    0.2180    0.2368
       .6            Positive    0.3672    0.3576    0.3484    0.3524
                     Negative    0.3533    0.3826    0.3774    0.3693

Table 4.3.12.
RMSE when both correlation and skew are imposed into the latent traits distributions

Item   Correlation   Skew        N=1000     N=1500     N=2000     N=3000
a1     0             No          0.16872    0.13883    0.12603    0.11077
       .3            Positive    0.25402    0.23006    0.21845    0.20997
                     Negative    0.25749    0.24566    0.22746    0.22123
       .6            Positive    0.36246    0.35228    0.34555    0.33322
                     Negative    0.37556    0.34777    0.36783    0.32881
a2     0             No          0.17021    0.12465    0.13907    0.10079
       .3            Positive    0.23887    0.22802    0.21277    0.21702
                     Negative    0.24379    0.23363    0.22521    0.20858
       .6            Positive    0.35497    0.35256    0.33727    0.32443
                     Negative    0.35974    0.33733    0.34732    0.32279
a3     0             No          0.16764    0.13034    0.13316    0.09597
       .3            Positive    0.23695    0.22517    0.21237    0.19643
                     Negative    0.23287    0.22581    0.21561    0.20698
       .6            Positive    0.34726    0.33189    0.32600    0.33579
                     Negative    0.34624    0.33159    0.33718    0.30498
a4     0             No          0.16272    0.12503    0.12491    0.09044
       .3            Positive    0.24128    0.23584    0.21435    0.20556
                     Negative    0.25360    0.22742    0.21770    0.21572
       .6            Positive    0.35652    0.36047    0.33894    0.32107
                     Negative    0.37697    0.34968    0.33705    0.36794
a5     0             No          0.16838    0.11416    0.14503    0.11516
       .3            Positive    0.23629    0.22647    0.21165    0.21000
                     Negative    0.24238    0.22185    0.22668    0.20988
       .6            Positive    0.34988    0.33614    0.32350    0.33836
                     Negative    0.36593    0.34331    0.31133    0.31198
a6     0             No          0.17101    0.13471    0.13501    0.10583
       .3            Positive    0.24892    0.23503    0.21329    0.20637
                     Negative    0.24713    0.22909    0.21924    0.21408
       .6            Positive    0.35638    0.35053    0.31489    0.33113
                     Negative    0.36351    0.35942    0.34556    0.33867
d      0             No          0.08515    0.07341    0.05538    0.04755
       .3            Positive    0.21524    0.22885    0.23666    0.23601
                     Negative    0.24315    0.23492    0.22548    0.24242
       .6            Positive    0.38004    0.36935    0.35513    0.35789
                     Negative    0.36803    0.39050    0.38492    0.37408

CHAPTER 5
SUMMARY AND DISCUSSION

This chapter will give a brief overview of the study, followed by a summary and a detailed discussion of the results. Finally, the implications and limitations of the study will be presented.

5.1.
Overview of the Study
This study investigates the influence of multiple factors on item parameter recovery in a multidimensional item response theory model: sample size, the types of latent trait configuration, number of dimensions, correlation between latent traits, and skew in the latent trait distributions. In order to examine the influence of combinations of factors, the Markov Chain Monte Carlo method is used; specifically, a Gibbs sampling technique is used to run the simulation study. Sixty items are used, with sample sizes of 1000, 1500, and 2000 for the 3-dimension model, and 1000, 1500, 2000, and 3000 for the 6-dimension model. Two different types of latent trait configuration are used: approximate simple and mixed traits. Correlations of .3 and .6 are used to generate the correlated latent traits. The skew given to the latent trait distributions follows Pearson's skewness index: +.9 and -.9 for positive and negative skewness, respectively. A total of 84 combinations of factors, with 10 replications for each combination, resulted in 840 simulation sets being run.

5.2. Summary of Results
5.2.1. Sample Size
Different sample sizes are used to see how much a larger sample improves the item parameter calibration process. For 3-dimensions, increasing the sample size from 1000 to 1500 and 2000 does not improve the item calibration, which shows that a sample size of 1000 is large enough for 3-dimensions. For 6-dimensions, increasing the sample size from 1000 to 1500, 2000, and 3000 does improve the item calibration. However, the improvement from 2000 to 3000 does not seem significant, which shows that a sample size of 2000 is enough for an adequate item calibration in 6-dimensions.

5.2.2. Types of Latent Trait Configuration: Approximate Simple (AS) and Mixed Traits (MS)
Different patterns are shown, depending on the number of dimensions and item parameters. In 3-dimensions, AS has a higher error for the item parameters compared to MS.
AS overestimates the a-parameters and underestimates the d-parameter, compared to MS. However, in the 6-dimension model, the types of latent traits do not influence item calibration, for either the a-parameters or the d-parameter.

5.2.3. Correlated Latent Traits
Correlated latent traits are only examined in mixed traits. When correlation is implemented in the latent traits, different behaviors appear depending on the number of dimensions. In 3-dimensions, a high correlation, such as .6, leads to overestimation of the a-parameters, while a low correlation, such as .3, leads to underestimation; the d-parameter is not influenced by correlated latent traits. In 6-dimensions, both low and high correlations lead to overestimation of the a-parameters, and the d-parameter is again not influenced by the correlated latent traits. In terms of magnitude of bias, the higher the correlation, the bigger the error. Increasing the sample size does not improve the item calibration accuracy.

5.2.4. Skewed Latent Traits Distributions
When skewness is implemented in the model, the a-parameters are overestimated, regardless of the types of latent traits and the number of dimensions. Whether the skew is negative or positive, the a-parameters are overestimated. The d-parameter shows different behaviors, depending on the number of dimensions. In 3-dimensions, both negative and positive skew underestimate the d-parameter for an AS structure; however, whereas a positive skew underestimates the d-parameter for MS traits, a negative skew overestimates it. In 6-dimensions, a positive skew underestimates and a negative skew overestimates the d-parameter, regardless of the types of latent traits. Increasing the sample size does not improve the item calibration.

5.2.5. Correlated and Skewed Latent Traits Distributions
Only the mixed structure is examined.
When both correlation and skewness are implemented in the model together, the pattern of influence is similar to when only skewness is implemented. However, the magnitude of bias increases compared to the model with only one factor included. As the size of the correlation increases from .3 to .6, the size of the bias doubles, regardless of the number of dimensions and the skew of the latent traits. Increasing the sample size does not improve the item calibration.

5.3. Discussion
This study explores the interaction effect on item parameter recovery when multiple factors are combined in the MIRT model. The primary purpose of this study is to find the appropriate sample size for obtaining accurate item parameters from the calibration procedures when more than one factor is involved in the model.

First, it is clear that a MIRT model with higher dimensions needs a larger sample size to get better results in the calibration of item parameters. However, that is only the case for a MIRT model without any other factors, such as correlation or skewness, imposed on the latent trait distributions. Traditionally, it is common to have a sample size of 1000 when an IRT model is uni-dimensional. From the results of this study, this is enough for a 3-dimension MIRT model to get a satisfactory item parameter recovery. However, a sample size of more than 1000 is required if the number of dimensions increases to 6; a sample size of 2000 is needed for a satisfactory item parameter recovery. Thus, a larger sample size is recommended as the number of dimensions increases.

Second, the types of latent trait configuration show different behaviors, depending on the number of dimensions. This study shows that item parameter recovery is more troublesome with approximate simple traits in 3-dimensions. When the number of dimensions goes up to 6, AS and MS show almost identical behavior.
For a MIRT model of 3-dimensions, AS shows an overestimated bias for the a-parameters, and an underestimated bias for the d-parameter. It is interesting that MS has a lower bias than AS in 3-dimensions, when AS and MS have almost the same bias in 6-dimensions. The results show that the interaction effect of latent trait types is cancelled when the number of dimensions is higher. This finding suggests that if researchers consider using a MIRT model with AS, using higher dimensions will give better results in item parameter recovery, rather than increasing the sample size.

Third, when there is correlation between the latent traits in the MIRT model, the results show that a combination of high correlations and a high number of dimensions contributes to a high magnitude of bias. The interaction effect from combining correlation and number of dimensions is more troublesome than that from combining number of dimensions and types of latent traits. It appears that when researchers suspect a correlation between the latent traits, it is not helpful to just increase the sample size. Rather, researchers should find alternative MIRT models that account for the correlated latent traits.

Fourth, when skewed latent trait distributions are in the MIRT model with different types of latent trait configuration, the results show that the bias increases. The amount of increased bias is almost the same whether there are 3-dimensions or 6-dimensions, and whether the structure is AS or MS. The improvement of item parameter calibration in terms of bias is not achieved by increasing the sample size. This finding suggests that researchers should correct the latent trait distribution if a non-normal distribution of the latent traits is suspected.
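One standard way to carry out the correction suggested here is a rank-based inverse-normal transformation, which maps a skewed score distribution onto normal quantiles before calibration. A minimal sketch using only the Python standard library (the 0.375 offset is Blom's conventional constant, not a value from this study):

```python
from statistics import NormalDist

def inverse_normal_transform(scores):
    """Map each score to the normal quantile of its (offset) rank."""
    n = len(scores)
    # 1-based rank of each score; ties keep their sort order in this sketch
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()
    # Blom's formula: p_i = (rank_i - 0.375) / (n + 0.25)
    return [nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]

# A positively skewed sample is mapped to symmetric normal scores
skewed = [0.1, 0.2, 0.3, 0.5, 0.9, 2.0, 4.5]
print([round(z, 2) for z in inverse_normal_transform(skewed)])
```

The transformed values are symmetric around zero regardless of how skewed the input was, which is exactly the property the calibration assumes of the latent trait distribution.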
Fifth, when all factors (correlation, skewness, and number of dimensions) are combined, the model with a low correlation and a small number of dimensions has a lower bias than the model with a high correlation and a high number of dimensions.

Overall, increasing the sample size helps to improve the accuracy of item parameter recovery when the latent traits have different types of structure configuration (AS and MS) and a high number of dimensions. However, if the latent traits are correlated, then solely increasing the sample size does not improve the accuracy of the item parameter estimates. Rather, the number of items should be increased along with the sample size. It is also possible to obtain a normal distribution of the latent traits by selecting the sample group carefully. This is also true for skewed latent trait distributions: the sample group must be selected with careful consideration. Based on the test specifications, the sample group should be selected from a wide range of abilities, from low to high. Having test-takers with a wide range of abilities will prevent the latent trait distribution from being skewed.

5.4. Implications and Limitations
5.4.1. Implications
With the popularity of item response theory (IRT) in the field of measurement, its use is no longer limited to measurement but has expanded to almost all fields of behavioral science research. Since research in the behavioral sciences is growing more complicated, it requires IRT models with more than just one dimension. That is where multidimensional item response theory (MIRT) models come in. Most applications based on MIRT models require the assumption that the item parameters are accurately estimated in advance. Traditionally, it is known that a larger sample size gives a better item recovery result.
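The a- and d-parameters discussed throughout this study belong to the compensatory multidimensional extension of the two-parameter logistic model, in which the discrimination vector a weights the latent traits and d acts as an intercept. A minimal sketch of the response function and response generation (the item values shown are illustrative, not from this study):

```python
import math
import random

def prob_correct(theta, a, d):
    """Compensatory M2PL: P(x = 1 | theta) = 1 / (1 + exp(-(a . theta + d)))."""
    logit = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-logit))

def simulate_response(theta, a, d, rng=random):
    """Draw a dichotomous response for one examinee on one item."""
    return 1 if rng.random() < prob_correct(theta, a, d) else 0

# A 3-dimensional item: a high trait on any dimension can compensate for
# low traits on the others, which is why correlated traits feed into the
# a-parameter estimates.
a = [1.2, 0.8, 0.5]   # illustrative discrimination vector
d = -0.3              # illustrative intercept
print(prob_correct([0.0, 0.0, 0.0], a, d))  # ~0.426 for this item
```

Because the trait contributions enter the logit as a sum, any correlation or skew in the theta distribution propagates directly into the calibration of a and d, which is the mechanism behind the interaction effects reported above.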
However, there is little research on how item parameter recovery is affected when other factors are included in the model, such as correlated latent traits or skewed latent trait distributions. It could be a single factor or a combination of several. In this study, the interaction effect of combined factors on item parameter recovery was explored. The results clearly show that increasing the sample size does not improve item parameter calibration when more than one factor is involved. Rather, correlated or skewed latent trait distributions should be corrected before running a calibration program in order to obtain more accurate item parameter calibration. This finding is helpful to researchers because it can save the cost of recruiting a larger sample than is necessary. 
5.4.2. Limitations 
This study has several limitations. First, due to limited computing and time resources, the number of replications for each condition was limited to 10, which may contribute some estimation error to the MCMC simulation. MCMC simulation guidelines suggest about 50 replications where stable estimates are needed, so a future study with more replications would yield more definitive results. Second, to keep the interpretation clear, all latent traits in the MIRT models were assumed to have the same distribution. This assumption may not be practical in real settings; considering a different distribution for each latent trait is a next step for a future study. 
APPENDIX 
Table A.1.1. 
Heidelberger and Welch’s Convergence Diagnostic: a1-parameter Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 8801 1 1 1 1 1 4401 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.1442 0.3837 0.1708 0.1972 0.2051 0.0822 0.0704 0.0520 0.1214 0.1676 0.7937 0.8795 0.9896 0.4052 0.5056 0.8939 0.8323 0.9746 0.9890 0.9834 0.8823 0.5622 0.8627 0.8851 0.9247 0.9384 0.5585 0.9005 0.9085 0.8991 78 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.1639 0.0502 0.1289 0.2258 0.0650 0.2272 0.1701 0.1576 0.1415 0.1408 0.0543 0.2147 0.2221 0.0441 0.0709 0.2111 0.0704 0.2887 0.1642 0.1682 0.1950 0.1570 0.2056 0.2555 0.2024 0.1585 0.0764 0.1347 0.1105 0.2837 0.0067 0.0021 0.0058 0.0077 0.0032 0.0076 0.0045 0.0069 0.0061 0.0082 0.0017 0.0038 0.0040 0.0014 0.0019 0.0052 0.0022 0.0055 0.0064 0.0063 0.0083 0.0070 0.0088 0.0103 0.0125 0.0048 0.0052 0.0086 0.0064 0.0153 Table A.1.1 (cont’d) Items 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed failed Start P-value 1 1 1 1 1 1 4401 4401 1 1 6601 1 1 1 1 1 1 1 1 1 2201 2201 1 2201 1 4401 2201 2201 1 NA 0.1405 0.1368 0.6697 0.0722 0.1455 0.1289 0.0676 0.1081 0.0541 0.1469 0.2188 0.1921 0.0592 0.1926 0.8881 0.4568 0.8898 0.2415 0.5545 0.0824 0.0644 0.1180 0.5720 0.0770 0.0509 0.0500 0.2062 0.1348 0.0552 0.0144 79 Halfwidth 
Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.1080 0.3248 0.0779 0.1757 0.0602 0.1385 0.1523 0.1934 0.1052 0.2060 1.2414 0.7931 0.4075 1.2614 0.7710 0.8599 0.6345 1.0436 0.3498 1.1286 0.1981 0.1256 0.0372 0.1985 0.3401 0.1273 0.1021 0.1106 0.1467 NA 0.0050 0.0147 0.0020 0.0062 0.0032 0.0071 0.0056 0.0077 0.0047 0.0072 0.0067 0.0033 0.0025 0.0064 0.0035 0.0039 0.0027 0.0041 0.0021 0.0065 0.0039 0.0040 0.0012 0.0082 0.0137 0.0083 0.0043 0.0061 0.0065 NA Table A.1.2. Heidelberger and Welch’s Convergence Diagnostic: a2-parameter Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 1 1 1 1 1 2201 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2201 1 1 1 1 1 1 1 1 0.4900 0.3567 0.0847 0.2916 0.1021 0.2743 0.0680 0.0546 0.1991 0.0974 0.8854 0.5049 0.6684 0.8327 0.5037 0.4983 0.7507 0.7597 0.9326 0.8574 0.9903 0.3543 0.0525 0.7157 0.8079 0.6680 0.6465 0.1866 0.1764 0.8469 80 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.1398 0.0449 0.2098 0.1834 0.1714 0.0600 0.1779 0.2037 0.1110 0.2746 0.1397 0.1468 0.1864 0.0837 0.0659 0.1433 0.0763 0.0650 0.2022 0.1883 0.8620 0.7216 0.9079 1.0763 1.3329 0.5539 0.8473 0.9922 0.8136 1.7251 0.0051 0.0018 0.0060 0.0057 0.0054 0.0030 0.0043 0.0056 0.0049 0.0103 0.0048 0.0043 0.0039 0.0040 0.0022 0.0051 0.0019 0.0033 0.0076 0.0070 0.0035 0.0032 0.0031 0.0034 0.0063 0.0026 0.0033 0.0052 0.0029 0.0121 Table A.1.2 (cont’d) 
Items 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.2422 0.2488 0.4696 0.2085 0.1949 0.3144 0.1547 0.3931 0.2314 0.3368 0.9508 0.7858 0.6484 0.9602 0.7523 0.9605 0.9031 0.9542 0.9107 0.9583 0.9925 0.1668 0.7793 0.1752 0.4989 0.1174 0.9240 0.3858 0.4570 0.5378 81 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.2146 0.1892 0.0543 0.1909 0.1151 0.1575 0.1309 0.1528 0.0653 0.1920 0.2529 0.1380 0.0722 0.2707 0.2023 0.1594 0.1614 0.1245 0.0698 0.1381 0.0456 0.1254 0.0856 0.2110 0.2251 0.1266 0.1584 0.1622 0.0754 0.0862 0.0075 0.0134 0.0017 0.0065 0.0061 0.0082 0.0046 0.0075 0.0035 0.0080 0.0121 0.0069 0.0028 0.0126 0.0079 0.0071 0.0063 0.0076 0.0028 0.0095 0.0015 0.0034 0.0024 0.0052 0.0085 0.0051 0.0034 0.0051 0.0032 0.0028 Table A.1.3. 
Heidelberger and Welch’s Convergence Diagnostic: a3-parameter Items Stationarity Test 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4401 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3273 0.1127 0.2428 0.3839 0.3759 0.8066 0.7666 0.2356 0.2503 0.2929 0.0906 0.5316 0.7671 0.2323 0.2848 0.3261 0.1415 0.7838 0.4407 0.3758 0.6809 0.1968 0.6898 0.8118 0.6931 0.5298 0.8980 0.7576 0.7461 0.9191 82 Halfwidth Test Mean Halfwidth passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed 0.1843 0.0310 0.2680 0.1292 0.0641 0.2080 0.2092 0.2039 0.1300 0.1998 0.6696 0.6718 0.5099 0.7219 0.4731 0.8160 0.4161 0.8923 1.1336 1.0691 0.1312 0.1310 0.1795 0.1196 0.1750 0.0947 0.1751 0.1575 0.0940 0.3664 0.0049 0.0008 0.0043 0.0040 0.0026 0.0052 0.0039 0.0051 0.0041 0.0075 0.0035 0.0037 0.0029 0.0044 0.0034 0.0039 0.0026 0.0043 0.0050 0.0043 0.0056 0.0048 0.0063 0.0064 0.0094 0.0036 0.0060 0.0072 0.0045 0.0121 Table A.1.3 (cont’d) Items 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.7805 0.3288 0.0688 0.3841 0.4351 0.6514 0.6334 0.5502 0.1899 0.5580 0.6049 0.9888 0.6072 0.8013 0.8216 0.8315 0.8043 0.6360 0.3797 0.9174 0.6593 0.2957 0.1351 0.8000 0.8433 0.6246 0.5458 0.9998 0.8554 0.9950 83 Halfwidth Test passed passed passed 
passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.1207 0.4486 0.0517 0.0795 0.2063 0.1878 0.1695 0.1831 0.1535 0.2825 0.1525 0.2754 0.0791 0.1710 0.0935 0.1783 0.0477 0.1670 0.0423 0.1648 0.1320 0.0590 0.0658 0.2865 0.1534 0.2531 0.0855 0.0796 0.1059 0.0777 0.0067 0.0178 0.0017 0.0045 0.0071 0.0089 0.0055 0.0090 0.0065 0.0096 0.0067 0.0041 0.0023 0.0068 0.0038 0.0053 0.0017 0.0067 0.0010 0.0062 0.0033 0.0016 0.0021 0.0054 0.0075 0.0057 0.0023 0.0033 0.0039 0.0025 Table A.1.4. Heidelberger and Welch’s Convergence Diagnostic: a4-parameter Items Stationarity Test Start P-value Halfwidth Test Mean Halfwidth 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.8384 0.4861 0.9591 0.9535 0.9088 0.3075 0.4810 0.8041 0.7099 0.8828 0.2346 0.4755 0.7504 0.5600 0.2157 0.5655 0.3445 0.7231 0.9106 0.2778 0.9017 0.7041 0.4705 0.3635 0.9047 0.1473 0.2132 0.2948 0.8598 0.7905 passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed 0.0857 0.1386 0.1648 0.1311 0.1973 0.1082 0.1132 0.1796 0.2218 0.2129 0.0457 0.0783 0.0864 0.1215 0.0494 0.2358 0.0848 0.1373 0.1711 0.1132 0.1405 0.2745 0.1718 0.1171 0.2765 0.0947 0.0983 0.1237 0.0761 0.1751 0.0042 0.0045 0.0061 0.0069 0.0067 0.0059 0.0047 0.0066 0.0077 0.0107 0.0013 0.0026 0.0023 0.0039 0.0016 0.0046 0.0020 0.0041 0.0059 0.0045 0.0050 0.0048 0.0057 0.0053 0.0078 0.0030 0.0045 0.0052 0.0033 0.0085 84 Table A.1.4. 
(cont’d) Items 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed failed passed passed passed passed passed passed passed passed passed passed failed Start P-value 1 1 1 1 1 1 1 1 1 1 2201 1 2201 1 1 2201 1 1 NA 2201 1 1 1 1 1 1 1 1 1 NA 0.6487 0.2903 0.2784 0.2801 0.6944 0.8458 0.2277 0.3503 0.5789 0.2582 0.1445 0.1182 0.1314 0.1352 0.1254 0.1085 0.3517 0.1384 0.0209 0.0792 0.5177 0.7741 0.2188 0.3363 0.7575 0.3442 0.1699 0.1757 0.4223 0.0053 85 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.1878 0.2787 0.0706 0.1119 0.1777 0.1107 0.0798 0.1410 0.1025 0.1466 0.2391 0.1018 0.0923 0.3369 0.0832 0.1250 0.0934 0.2634 NA 0.2110 0.3994 0.5317 0.5265 1.0391 1.5751 1.0872 0.5742 0.8852 0.7819 NA 0.0070 0.0157 0.0024 0.0058 0.0072 0.0065 0.0036 0.0069 0.0049 0.0073 0.0120 0.0051 0.0030 0.0097 0.0045 0.0063 0.0047 0.0082 NA 0.0104 0.0026 0.0031 0.0026 0.0039 0.0093 0.0046 0.0027 0.0039 0.0031 NA Table A.1.5. 
Heidelberger and Welch’s Convergence Diagnostic: a5-parameter Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 1 1 8801 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2201 1 1 1 1 0.3111 0.0617 0.1048 0.7956 0.2455 0.2840 0.2068 0.1675 0.1783 0.6803 0.6104 0.4848 0.3505 0.2855 0.2674 0.6172 0.8342 0.4870 0.7277 0.7492 0.4253 0.6548 0.4247 0.2799 0.3366 0.0962 0.3677 0.7694 0.5978 0.7086 86 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.9233 0.4825 0.8559 0.9417 0.7335 0.8501 0.6322 0.8743 0.8883 1.3362 0.1136 0.0640 0.0710 0.0763 0.1098 0.1134 0.1104 0.1225 0.0783 0.0766 0.0934 0.1479 0.1476 0.1271 0.1203 0.0846 0.1345 0.0839 0.1935 0.3203 0.0051 0.0029 0.0032 0.0043 0.0036 0.0032 0.0026 0.0049 0.0036 0.0063 0.0035 0.0023 0.0020 0.0029 0.0030 0.0036 0.0022 0.0042 0.0042 0.0038 0.0042 0.0044 0.0058 0.0065 0.0066 0.0036 0.0055 0.0048 0.0061 0.0109 Table A.1.5. 
(cont’d) Items 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 4401 2201 1 4401 6601 4401 1 6601 4401 4401 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.0607 0.0524 0.1119 0.0751 0.1056 0.0685 0.0672 0.1083 0.0685 0.1258 0.2125 0.3345 0.6440 0.3572 0.3983 0.2448 0.3762 0.1572 0.9481 0.3442 0.0670 0.6455 0.7212 0.8314 0.6786 0.9270 0.9462 0.9435 0.9124 0.6614 87 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.2183 0.2524 0.0316 0.2844 0.1603 0.2305 0.1937 0.2315 0.2543 0.1578 0.2331 0.0936 0.1624 0.1022 0.0806 0.2372 0.0950 0.2000 0.1178 0.1580 0.0953 0.2184 0.1691 0.2338 0.2056 0.2319 0.0524 0.1342 0.0896 0.1320 0.0096 0.0195 0.0011 0.0085 0.0085 0.0103 0.0065 0.0110 0.0087 0.0099 0.0091 0.0042 0.0033 0.0070 0.0045 0.0071 0.0034 0.0082 0.0026 0.0081 0.0033 0.0044 0.0048 0.0078 0.0116 0.0093 0.0023 0.0066 0.0050 0.0047 Table A.1.6. 
Heidelberger and Welch’s Convergence Diagnostic: a6-parameter Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Stationarity Test passed passed passed passed passed passed passed passed failed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 1 4401 4401 4401 1 4401 2201 NA 8801 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.0775 0.0519 0.1454 0.0583 0.0609 0.0762 0.1225 0.0743 0.0417 0.1337 0.7935 0.3321 0.6442 0.4623 0.6926 0.5607 0.6998 0.5527 0.5657 0.2704 0.5207 0.1000 0.5022 0.4065 0.5171 0.5130 0.2682 0.8106 0.1690 0.4291 88 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 0.1321 0.0855 0.0973 0.2228 0.1148 0.1157 0.1266 0.0859 NA 0.1666 0.1572 0.1326 0.0600 0.2844 0.0569 0.1230 0.1128 0.1418 0.1532 0.1088 0.1212 0.2039 0.2133 0.2556 0.3417 0.0359 0.1036 0.3651 0.1472 0.4453 0.0068 0.0036 0.0067 0.0088 0.0075 0.0066 0.0051 0.0058 NA 0.0148 0.0053 0.0056 0.0024 0.0064 0.0022 0.0068 0.0030 0.0068 0.0090 0.0078 0.0057 0.0064 0.0074 0.0090 0.0112 0.0014 0.0060 0.0083 0.0061 0.0143 Table A.1.6. 
(cont’d) Items Stationarity Test 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 2201 1 1 4401 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.0522 0.1351 0.2599 0.2116 0.0913 0.9670 0.3390 0.6223 0.1249 0.5696 0.9036 0.3645 0.9814 0.8682 0.2253 0.8282 0.3333 0.8901 0.8399 0.7208 0.6865 0.3310 0.1952 0.7648 0.4198 0.2595 0.7476 0.3425 0.4455 0.5871 89 Halfwidth Test Mean Halfwidth passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed 0.8905 1.9137 0.3255 0.8337 0.7995 0.9977 0.6473 1.0091 0.7710 1.0333 0.3593 0.2061 0.0831 0.1466 0.0792 0.2029 0.1226 0.2383 0.0403 0.1279 0.1132 0.1632 0.0911 0.2428 0.2921 0.1943 0.0378 0.1708 0.1377 0.0772 0.0033 0.0152 0.0024 0.0040 0.0035 0.0040 0.0029 0.0031 0.0039 0.0065 0.0087 0.0058 0.0023 0.0080 0.0035 0.0053 0.0039 0.0072 0.0011 0.0069 0.0031 0.0044 0.0040 0.0079 0.0134 0.0087 0.0016 0.0076 0.0061 0.0036 Table A.1.7. 
Heidelberger and Welch’s Convergence Diagnostic: d-parameter Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Stationarity Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.9248 0.7610 0.7914 0.2363 0.2884 0.3306 0.7550 0.1794 0.3384 0.3201 0.0621 0.7794 0.6429 0.4652 0.8312 0.3772 0.3167 0.1589 0.0851 0.2886 0.3598 0.3049 0.7184 0.4188 0.6544 0.7085 0.5311 0.0803 0.5646 0.6575 90 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed failed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 1.8052 -0.0532 -0.2077 -0.5214 -0.4003 -0.5679 -0.4760 -0.7079 -0.9386 -1.3475 1.1477 0.5408 0.3588 0.4011 0.2872 0.1790 -0.0419 -0.1822 -0.4961 -0.6791 1.1359 0.9683 0.7617 0.6249 -0.7869 -0.2982 -0.5098 -0.6942 -0.7061 -2.1652 0.0040 0.0014 0.0019 0.0023 0.0018 0.0023 0.0020 0.0040 0.0031 0.0046 0.0023 0.0017 0.0020 0.0023 0.0017 0.0022 0.0044 0.0026 0.0036 0.0041 0.0028 0.0042 0.0038 0.0026 0.0050 0.0018 0.0024 0.0055 0.0034 0.0103 Table A.1.7. 
(cont’d) Items 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Stationarity Test passed passed failed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Start P-value 1 2201 NA 1 8801 1 1 1 1 4401 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.1513 0.1029 0.0021 0.1325 0.1823 0.0728 0.4140 0.0562 0.3011 0.1305 0.1420 0.1577 0.8701 0.4816 0.8048 0.9666 0.7360 0.1475 0.2165 0.6451 0.4431 0.5711 0.6277 0.6573 0.4995 0.1381 0.6640 0.2890 0.6563 0.3434 91 Halfwidth Test passed passed passed passed passed passed passed passed passed passed passed passed passed passed failed passed passed passed passed passed passed passed passed passed passed passed passed passed passed Mean Halfwidth 1.1994 1.9186 NA 0.2784 0.0436 0.0395 -0.1569 -0.8579 -0.8446 -1.0531 1.4298 0.7137 0.3740 0.5017 0.3044 -0.0189 -0.1252 -0.1840 -0.0847 -1.1910 0.4207 0.4594 0.3294 0.4527 0.5587 0.2048 -0.0334 -0.2808 -0.3581 -0.3982 0.0026 0.0104 NA 0.0022 0.0025 0.0029 0.0021 0.0026 0.0017 0.0096 0.0039 0.0021 0.0016 0.0059 0.0021 0.0020 0.0016 0.0023 0.0012 0.0063 0.0014 0.0022 0.0014 0.0032 0.0056 0.0021 0.0018 0.0021 0.0019 0.0023 Table A.2. 1. 
Geweke’s Z-score Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 a1 1.2684 0.8198 1.1488 1.3278 0.7640 2.3268 1.5009 1.3062 0.8921 1.2711 -0.4740 -0.3858 0.2475 0.3892 -0.9393 1.0436 0.5820 0.5422 0.4158 -0.3669 -2.1450 -1.2775 -2.7402 -1.5634 -1.6566 -1.6212 -2.8851 -3.0024 -2.1789 -2.2778 a2 -2.2061 -1.1491 -2.3534 -2.5092 -3.5548 -1.6575 -1.9750 -3.2312 -2.0617 -3.6032 -0.8094 0.2128 1.7524 -0.0242 0.9163 1.3765 0.3020 0.7274 0.6139 0.7525 -0.1815 -3.2277 -2.1957 0.2541 0.4288 -0.9506 -0.3538 -1.6878 -1.3104 0.9394 a3 0.5387 1.3394 0.8052 0.7329 0.8827 0.6066 1.2068 0.3156 0.3080 0.6145 2.1629 1.1112 -0.1811 -2.0751 -1.2904 -1.9252 -1.4122 1.1886 -1.9429 1.5577 -0.8239 -1.2714 -0.2052 -0.7805 -0.8126 -0.4867 -0.8161 0.0708 -0.9396 0.3624 a4 0.2543 0.6109 1.5895 0.5589 0.7436 -0.1853 0.5158 0.5285 -0.3810 0.7171 1.7458 -0.1536 -0.7205 1.2241 1.1393 1.3216 0.5939 0.1160 1.0971 0.9327 0.5877 0.9894 2.4467 -1.3888 0.8734 0.6011 1.5896 1.7112 1.3913 1.0308 92 a5 0.7505 -1.4409 -1.4047 -0.1756 0.0197 -1.6516 -1.5500 -2.9689 -0.3362 0.4523 -1.2479 -0.3504 -0.1001 -2.3642 -1.2072 -2.5827 -1.1244 -2.4029 -2.1644 -1.5352 1.4657 0.6789 1.7355 2.2053 2.0408 2.3581 1.8637 0.6105 1.6996 1.6574 a6 1.6152 0.4359 1.8357 2.0876 2.3901 1.2716 2.3976 2.1849 3.0078 2.9258 -0.1922 -0.3221 0.5925 0.3590 -0.1175 0.1225 0.3885 0.3778 0.1458 -0.4552 1.5479 2.9973 1.6357 2.4522 2.0909 1.2245 1.5609 1.3291 1.7909 2.1224 d 0.5404 -0.0797 -0.2742 0.5249 0.0838 -0.0478 -0.6697 1.1820 0.1300 0.0248 1.0436 -0.6892 -0.7560 -1.0243 -0.4807 -0.9919 -0.6820 -2.1434 0.2930 -1.8313 -0.5681 -2.7610 -1.0770 0.9058 -1.2165 -0.8022 -0.5495 1.5138 -0.1070 -1.4205 Table A.2.1. 
(cont’d) Item 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 a1 0.1678 -0.0277 -0.3205 -0.8227 0.4982 -0.4855 1.1050 0.8087 0.5295 -0.1855 -0.6586 -0.2370 1.2948 0.4015 -0.9935 -0.4188 -0.0810 1.0662 0.3858 1.7297 2.7123 3.0361 0.6232 3.1589 3.2102 4.3559 3.3913 4.1752 3.4294 3.8606 a2 -2.5702 -2.1961 -0.9302 -1.2366 -2.3341 -2.0135 -2.3846 -0.8641 -2.4461 -1.0243 1.7794 2.6316 1.6047 1.9269 2.2310 2.1789 0.5557 1.7195 0.8214 2.0107 0.3227 -1.3009 -0.5849 -1.3031 -1.0141 -2.8829 0.2152 -1.9060 -0.4988 0.0203 a3 -0.0885 1.3086 0.8677 0.6951 0.6859 0.0016 -1.2996 -0.1113 0.8458 0.4129 -0.6834 -0.2186 -1.3724 -1.3193 -1.1292 -0.4423 -0.3994 -0.4714 -1.2955 -0.1287 -0.3417 -1.6603 0.9672 -0.7525 -0.9114 -0.2860 -0.7293 -0.1184 -0.6279 -0.2590 a4 -0.8382 -1.2008 -2.4352 -1.4195 -1.1272 -0.4144 -1.9396 -1.5132 -1.1646 -1.8310 -4.3514 -3.4110 -2.9975 -2.7121 -3.1834 -3.6156 -2.6835 -3.0514 -2.7006 -3.3321 -0.1434 0.4125 1.1769 0.8369 -1.0415 -0.2098 1.5084 -1.9989 -0.3578 -0.9981 93 a5 -0.9783 -1.8658 -1.0420 -1.1550 -0.9521 -1.4494 -1.2063 -2.2510 -2.1291 -1.2711 -0.9686 -1.1005 0.3032 -1.5692 -0.4129 -0.8354 -0.0723 -0.8465 -0.1919 -0.6108 -2.9107 -1.6686 -0.0656 -1.3537 -1.3377 -1.4810 -0.4950 -1.2315 -0.5170 -2.0552 a6 1.6324 2.3903 2.3969 0.6767 1.0472 0.3119 1.6314 0.7292 1.5168 0.0369 1.7381 0.6683 0.9923 1.1988 0.7945 1.3713 1.3062 0.5978 1.0801 0.9165 0.7698 2.0707 0.6545 1.1618 1.8209 1.9488 0.8239 1.7296 1.4703 1.1665 d 1.1548 2.2688 2.6859 1.3804 1.6419 1.4489 0.7137 0.0170 -0.2428 0.9766 -0.7062 -0.6826 -0.5413 0.2489 -0.6069 0.5700 -1.2262 1.5022 -0.0473 -0.9695 -0.7996 -0.3658 0.1518 -1.1000 -1.6240 -1.6734 0.7482 -0.7038 -1.8652 -1.4288 Table A.3. 1. 
MCMC standard error Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 a1 0.0045 0.0016 0.0039 0.0041 0.0019 0.0042 0.0029 0.0046 0.0041 0.0048 0.0004 0.0013 0.0006 0.0007 0.0012 0.0015 0.0008 0.0012 0.0006 0.0011 a2 0.0016 0.0010 0.0028 0.0020 0.0022 0.0012 0.0028 0.0027 0.0019 0.0050 0.0029 0.0030 0.0025 0.0017 0.0011 0.0035 0.0008 0.0018 0.0040 0.0039 a3 0.0030 0.0007 0.0026 0.0021 0.0012 0.0027 0.0020 0.0033 0.0025 0.0044 0.0019 0.0018 0.0009 0.0031 0.0021 0.0022 0.0014 0.0018 0.0030 0.0034 a4 0.0011 0.0017 0.0010 0.0022 0.0015 0.0027 0.0021 0.0021 0.0026 0.0024 0.0009 0.0015 0.0013 0.0014 0.0008 0.0020 0.0014 0.0017 0.0019 0.0024 a5 0.0018 0.0013 0.0030 0.0028 0.0026 0.0019 0.0019 0.0014 0.0018 0.0033 0.0026 0.0013 0.0014 0.0018 0.0018 0.0018 0.0009 0.0030 0.0017 0.0018 a6 0.0037 0.0018 0.0034 0.0046 0.0036 0.0029 0.0026 0.0030 0.0052 0.0075 0.0036 0.0040 0.0019 0.0056 0.0018 0.0049 0.0024 0.0054 0.0069 0.0065 d 0.0013 0.0008 0.0010 0.0012 0.0015 0.0015 0.0008 0.0016 0.0015 0.0030 0.0021 0.0012 0.0013 0.0016 0.0011 0.0017 0.0022 0.0028 0.0021 0.0039 21 22 23 24 25 26 27 28 29 30 0.0023 0.0011 0.0020 0.0035 0.0024 0.0011 0.0011 0.0026 0.0014 0.0043 0.0009 0.0028 0.0023 0.0018 0.0015 0.0009 0.0009 0.0027 0.0016 0.0014 0.0031 0.0034 0.0041 0.0035 0.0061 0.0025 0.0027 0.0052 0.0020 0.0084 0.0018 0.0020 0.0021 0.0019 0.0027 0.0016 0.0022 0.0023 0.0006 0.0033 0.0013 0.0015 0.0027 0.0029 0.0028 0.0023 0.0014 0.0017 0.0015 0.0046 0.0026 0.0035 0.0028 0.0039 0.0045 0.0007 0.0030 0.0028 0.0032 0.0067 0.0011 0.0023 0.0021 0.0010 0.0017 0.0006 0.0014 0.0028 0.0019 0.0024 94 Table A.3.1. 
(cont’d) Item a1 a2 a3 a4 a5 a6 d 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 0.0025 0.0069 0.0006 0.0029 0.0013 0.0030 0.0026 0.0036 0.0025 0.0030 0.0041 0.0010 0.0015 0.0027 0.0010 0.0011 0.0010 0.0018 0.0006 0.0032 0.0024 0.0026 0.0004 0.0059 0.0095 0.0049 0.0029 0.0039 0.0041 0.0043 0.0047 0.0067 0.0007 0.0039 0.0031 0.0040 0.0028 0.0035 0.0018 0.0033 0.0030 0.0018 0.0012 0.0028 0.0023 0.0021 0.0014 0.0023 0.0009 0.0024 0.0004 0.0021 0.0011 0.0028 0.0040 0.0032 0.0011 0.0027 0.0017 0.0014 0.0045 0.0143 0.0014 0.0039 0.0049 0.0064 0.0049 0.0069 0.0053 0.0078 0.0024 0.0006 0.0010 0.0027 0.0009 0.0022 0.0005 0.0023 0.0004 0.0020 0.0017 0.0011 0.0008 0.0020 0.0023 0.0019 0.0009 0.0006 0.0017 0.0006 0.0042 0.0071 0.0014 0.0026 0.0033 0.0027 0.0020 0.0028 0.0023 0.0033 0.0067 0.0028 0.0023 0.0052 0.0026 0.0041 0.0018 0.0049 0.0009 0.0067 0.0014 0.0005 0.0017 0.0016 0.0031 0.0014 0.0015 0.0020 0.0025 0.0026 0.0044 0.0076 0.0005 0.0041 0.0038 0.0047 0.0025 0.0052 0.0049 0.0043 0.0049 0.0023 0.0020 0.0051 0.0030 0.0047 0.0020 0.0054 0.0011 0.0050 0.0013 0.0019 0.0023 0.0027 0.0036 0.0038 0.0008 0.0015 0.0018 0.0021 0.0025 0.0107 0.0012 0.0018 0.0023 0.0011 0.0016 0.0010 0.0021 0.0019 0.0029 0.0018 0.0006 0.0022 0.0014 0.0020 0.0018 0.0028 0.0005 0.0027 0.0014 0.0021 0.0019 0.0026 0.0064 0.0039 0.0007 0.0038 0.0023 0.0016 0.0016 0.0074 0.0013 0.0012 0.0014 0.0020 0.0017 0.0015 0.0008 0.0047 0.0030 0.0012 0.0008 0.0031 0.0009 0.0003 0.0007 0.0013 0.0007 0.0028 0.0008 0.0010 0.0005 0.0014 0.0031 0.0021 0.0005 0.0013 0.0010 0.0014 95 Table A.4. 1. 
Highest posterior density(HPD) interval for a1, a2, and a3-parameters Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Mean 0.4391 0.4231 0.2637 0.2947 0.2678 0.6398 0.4619 0.7090 0.6421 0.5545 0.1686 0.1848 0.2615 0.1323 0.4396 0.3100 0.2081 0.3610 0.3489 0.2767 0.3273 0.3709 0.4475 0.5029 0.3087 0.1467 0.2187 0.2722 0.1234 0.3013 a1 Lower 0.0954 0.2224 0.0000 0.0465 0.0754 0.1883 0.1718 0.1330 0.1152 0.2008 0.0000 0.0191 0.0237 0.0000 0.2154 0.1039 0.0300 0.1401 0.1246 0.0703 0.0482 0.1186 0.1017 0.1270 0.0702 0.0000 0.0000 0.0474 0.0000 0.0021 Upper 0.7284 0.6228 0.4801 0.5199 0.4488 1.0030 0.7242 1.1650 1.0620 0.8570 0.3611 0.3351 0.4697 0.3359 0.6607 0.5053 0.3853 0.5742 0.5729 0.4730 0.5954 0.6453 0.7802 0.8894 0.5427 0.3085 0.4096 0.4843 0.3330 0.5559 Mean 0.4089 0.2595 0.1915 0.2285 0.2495 0.4364 0.3278 0.5096 0.4505 0.3781 0.4741 0.3416 0.5800 0.2015 0.6190 0.5817 0.2808 0.5326 0.6200 0.5061 0.3737 0.2244 0.3632 0.4468 0.2788 0.1303 0.2319 0.2648 0.1158 0.2987 a2 Lower 0.1175 0.0001 0.0001 0.0001 0.0861 0.0211 0.0180 0.0422 0.0195 0.0189 0.0165 0.0395 0.0640 0.0000 0.2644 0.1493 0.0584 0.1412 0.1501 0.1364 0.0718 0.0010 0.0482 0.1236 0.0456 0.0000 0.0156 0.0843 0.0000 0.0252 96 Upper 0.7038 0.5307 0.4614 0.4742 0.4126 0.9389 0.6745 1.0880 0.9843 0.7737 0.8374 0.5908 1.0610 0.4248 0.9730 0.9736 0.5050 0.8522 1.0270 0.8542 0.6560 0.4357 0.6782 0.7812 0.5022 0.2996 0.4361 0.4579 0.2656 0.5448 Mean 0.2820 0.1491 0.0934 0.1479 0.2260 0.2927 0.1568 0.2635 0.2702 0.1971 0.4556 0.2955 0.6120 0.2786 0.6072 0.6299 0.3113 0.4333 0.6585 0.5335 0.6078 0.4046 0.7089 0.8776 0.5294 0.2461 0.4698 0.3387 0.2487 0.6055 a3 Lower 0.0633 0.0002 0.0000 0.0002 0.0558 0.0433 0.0000 0.0009 0.0112 0.0035 0.0001 0.0171 0.0987 0.0924 0.1872 0.2528 0.1002 0.0448 0.2524 0.1652 0.2977 0.0449 0.1893 0.3027 0.1766 0.0034 0.1489 0.1477 0.0199 0.1879 Upper 0.4875 0.3045 0.2142 0.2919 0.3832 0.5433 0.3094 0.5114 0.5044 0.3775 0.8314 0.5266 1.0560 0.4711 
0.9761 0.9931 0.5185 0.7542 1.0050 0.8535 0.9319 0.7408 1.2030 1.3520 0.8402 0.4647 0.7695 0.5366 0.4855 0.9877

Table A.4.1 (cont'd)

Item   a1 Mean  a1 Lower  a1 Upper   a2 Mean  a2 Lower  a2 Upper   a3 Mean  a3 Lower  a3 Upper
31     0.1522   0.0047    0.2842     0.1603   0.0001    0.3323     0.1500   0.0005    0.3019
32     0.3570   0.1577    0.5593     0.3348   0.1026    0.5948     0.2731   0.0718    0.4737
33     0.2137   0.0000    0.4061     0.2126   0.0005    0.4315     0.1932   0.0007    0.3828
34     0.1648   0.0001    0.3237     0.2142   0.0000    0.5014     0.1675   0.0000    0.3547
35     0.2751   0.0370    0.5158     0.2200   0.0000    0.5371     0.1799   0.0005    0.4348
36     0.2169   0.0093    0.3917     0.2506   0.0012    0.5891     0.2267   0.0002    0.4736
37     0.3929   0.1567    0.6657     0.4359   0.1055    0.8444     0.4205   0.1471    0.7060
38     0.5042   0.2097    0.8409     0.5910   0.2139    1.1360     0.4643   0.1417    0.8278
39     0.4053   0.1762    0.6415     0.3408   0.0028    0.7085     0.2378   0.0000    0.5132
40     0.2997   0.1066    0.4915     0.2671   0.0541    0.4905     0.1311   0.0000    0.2837
41     0.2293   0.0015    0.4439     0.3296   0.0280    0.5968     0.2998   0.0007    0.5660
42     0.2147   0.0001    0.4189     0.3180   0.0448    0.5826     0.3374   0.0559    0.6078
43     0.3591   0.1057    0.6021     0.3474   0.0868    0.6372     0.3528   0.0888    0.6014
44     0.1824   0.0001    0.4058     0.2222   0.0000    0.4880     0.2804   0.0000    0.5656
45     0.2999   0.0550    0.5765     0.3013   0.0050    0.5861     0.3434   0.0306    0.6602
46     0.2635   0.0924    0.4518     0.2218   0.0373    0.4192     0.2498   0.0593    0.4512
47     0.2594   0.0657    0.4476     0.2495   0.0379    0.4583     0.2585   0.0361    0.4601
48     0.1476   0.0001    0.2800     0.1172   0.0000    0.2624     0.1627   0.0000    0.3496
49     0.1664   0.0088    0.3095     0.1501   0.0000    0.2953     0.2080   0.0138    0.4060
50     0.1538   0.0001    0.2953     0.2102   0.0030    0.3933     0.2519   0.0613    0.4418
51     0.6727   0.2409    1.1290     0.4785   0.2135    0.7612     0.4100   0.0800    0.7388
52     0.3490   0.0002    0.7552     0.1677   0.0000    0.3440     0.2427   0.0001    0.5124
53     0.3859   0.0005    0.8374     0.3027   0.0004    0.5594     0.3254   0.0330    0.5962
54     0.4791   0.2725    0.6785     0.3733   0.1680    0.5928     0.3602   0.1511    0.5845
55     0.3799   0.0432    0.7382     0.1656   0.0002    0.3393     0.1450   0.0000    0.3408
56     0.4183   0.0626    0.8133     0.2143   0.0185    0.3955     0.2104   0.0000    0.4654
57     0.4119   0.1147    0.7182     0.3469   0.1424    0.5369     0.2875   0.0586    0.5086
58     0.4040   0.0418    0.8260     0.2692   0.0521    0.4725     0.2798   0.0638    0.5194
59     0.2662   0.0002    0.5714     0.1525   0.0000    0.3045     0.1725   0.0000    0.3640
60     0.6179   0.1506    1.1200     0.3280   0.0687    0.5845     0.3308   0.0008    0.6453

Table A.4.2. Highest posterior density (HPD) intervals for the a4-, a5-, and a6-parameters

Item   a4 Mean  a4 Lower  a4 Upper   a5 Mean  a5 Lower  a5 Upper   a6 Mean  a6 Lower  a6 Upper
1      0.2835   0.0004    0.5377     0.2550   0.0001    0.5782     0.2077   0.0136    0.3922
2      0.2468   0.0165    0.4621     0.2548   0.0002    0.4930     0.3048   0.0237    0.5645
3      0.1031   0.0000    0.2407     0.1021   0.0000    0.2418     0.0845   0.0000    0.1962
4      0.1736   0.0000    0.3575     0.1893   0.0000    0.3624     0.1669   0.0001    0.3263
5      0.1818   0.0301    0.3305     0.1767   0.0202    0.3268     0.1756   0.0317    0.3211
6      0.3494   0.0501    0.6744     0.3547   0.0715    0.6439     0.3651   0.1037    0.6280
7      0.3361   0.1379    0.5467     0.3274   0.1133    0.5305     0.2476   0.0020    0.4594
8      0.3800   0.0192    0.7268     0.3895   0.0633    0.7610     0.3041   0.0001    0.5794
9      0.2873   0.0028    0.5907     0.2910   0.0174    0.5629     0.3047   0.0269    0.5469
10     0.3540   0.1244    0.5991     0.3254   0.1014    0.5519     0.2775   0.0486    0.5020
11     0.1222   0.0000    0.3054     0.1821   0.0000    0.4141     0.1007   0.0000    0.2582
12     0.2088   0.0008    0.3806     0.2310   0.0181    0.4378     0.1470   0.0000    0.2870
13     0.2744   0.0000    0.5738     0.3650   0.0285    0.6871     0.2843   0.0335    0.5376
14     0.1193   0.0000    0.2549     0.1325   0.0005    0.2714     0.2422   0.0552    0.4182
15     0.2856   0.0407    0.5105     0.3341   0.0447    0.6257     0.3733   0.1377    0.6148
16     0.3676   0.1380    0.5967     0.4319   0.1977    0.6687     0.3311   0.0911    0.5816
17     0.2456   0.0552    0.4348     0.2712   0.0703    0.4660     0.2531   0.0797    0.4220
18     0.4480   0.1309    0.7226     0.4662   0.1824    0.7217     0.2917   0.0371    0.5201
19     0.4237   0.1591    0.6793     0.4791   0.2287    0.7170     0.3769   0.1374    0.6429
20     0.3582   0.1247    0.6077     0.4087   0.1875    0.6338     0.2804   0.0732    0.4986
21     0.1848   0.0000    0.4534     0.1927   0.0000    0.4253     0.4498   0.0622    0.8637
22     0.3644   0.0969    0.6468     0.3170   0.0900    0.5683     0.5480   0.2509    0.7975
23     0.5774   0.1825    1.0100     0.5581   0.1912    0.9691     0.7314   0.1784    1.2470
24     0.5668   0.1525    1.0400     0.5433   0.1763    0.9609     0.8712   0.2701    1.4220
25     0.3727   0.0965    0.6913     0.3684   0.1085    0.6543     0.5068   0.1410    0.8788
26     0.1922   0.0000    0.4181     0.1928   0.0000    0.3921     0.2241   0.0001    0.4684
27     0.3276   0.0313    0.6284     0.3374   0.0405    0.6318     0.4209   0.0632    0.7981
28     0.2724   0.0625    0.4872     0.2766   0.0900    0.4650     0.2715   0.0403    0.5102
29     0.2065   0.0137    0.3825     0.1953   0.0027    0.3550     0.2819   0.0490    0.4899
30     0.3309   0.0053    0.6782     0.3192   0.0213    0.6025     0.5282   0.0336    0.9640

Table A.4.2 (cont'd)

Item   a4 Mean  a4 Lower  a4 Upper   a5 Mean  a5 Lower  a5 Upper   a6 Mean  a6 Lower  a6 Upper
31     0.2315   0.0204    0.4286     0.2048   0.0084    0.3978     0.1365   0.0000    0.2789
32     0.3687   0.0990    0.6434     0.3256   0.0640    0.6256     0.2532   0.0582    0.4514
33     0.2678   0.0046    0.4834     0.2316   0.0002    0.4907     0.1532   0.0000    0.3337
34     0.3725   0.0612    0.6457     0.3308   0.0572    0.6152     0.1561   0.0000    0.3338
35     0.3932   0.0357    0.7333     0.3256   0.0160    0.6432     0.2673   0.0428    0.5019
36     0.4119   0.0433    0.7303     0.3581   0.0149    0.7049     0.2032   0.0000    0.4233
37     0.5198   0.0836    0.9124     0.4435   0.0662    0.9204     0.3382   0.0176    0.6104
38     0.6925   0.1501    1.2180     0.6159   0.1196    1.2230     0.3403   0.0001    0.6501
39     0.4355   0.0094    0.8144     0.3532   0.0001    0.7837     0.2722   0.0132    0.5119
40     0.2400   0.0015    0.4833     0.2079   0.0001    0.4755     0.1391   0.0000    0.3016
41     0.4999   0.0077    0.9106     0.5692   0.0675    0.9438     0.2372   0.0005    0.4805
42     0.5684   0.1043    0.9903     0.6169   0.1509    0.9729     0.2643   0.0002    0.5175
43     0.5240   0.1021    0.9244     0.5663   0.1037    0.9333     0.3555   0.1043    0.6148
44     0.5913   0.0438    1.0930     0.6649   0.0207    1.1130     0.2968   0.0015    0.5717
45     0.6445   0.0850    1.1410     0.7042   0.0881    1.1610     0.3872   0.0885    0.6939
46     0.3687   0.0659    0.6378     0.3952   0.0533    0.6619     0.3096   0.1269    0.4953
47     0.3803   0.0423    0.6772     0.4220   0.0339    0.7147     0.2613   0.0538    0.4728
48     0.3253   0.0831    0.5482     0.3269   0.0638    0.5617     0.2193   0.0291    0.4036
49     0.3336   0.1069    0.5563     0.3376   0.0824    0.5729     0.2251   0.0247    0.4327
50     0.2754   0.0001    0.5310     0.3261   0.0049    0.5575     0.2285   0.0336    0.4119
51     0.4254   0.0559    0.8094     0.3958   0.0641    0.7377     0.6865   0.1266    1.1060
52     0.2087   0.0002    0.4690     0.1705   0.0000    0.3709     0.5125   0.1348    0.8197
53     0.2556   0.0001    0.5437     0.2351   0.0000    0.5069     0.5249   0.0376    0.8535
54     0.4559   0.2397    0.6772     0.4276   0.2307    0.6223     0.4754   0.2766    0.6707
55     0.1703   0.0000    0.4026     0.1513   0.0000    0.3628     0.4313   0.0014    0.7206
56     0.2713   0.0001    0.5568     0.2352   0.0019    0.4501     0.5093   0.0874    0.8352
57     0.3599   0.1188    0.6000     0.3625   0.1297    0.6261     0.4348   0.1058    0.7186
58     0.3263   0.0520    0.5949     0.2978   0.0604    0.5506     0.5050   0.1282    0.8171
59     0.2808   0.0547    0.4967     0.2540   0.0609    0.4470     0.3626   0.0888    0.6082
60     0.3860   0.0799    0.7282     0.3464   0.0389    0.6580     0.7296   0.1819    1.1510

Table A.4.3. Highest posterior density (HPD) intervals for the d-parameter

Item   d Mean   d Lower   d Upper
1       0.995    0.897     1.085
2       0.762    0.680     0.855
3       0.353    0.277     0.431
4       0.283    0.209     0.364
5       0.151    0.073     0.224
6       0.552    0.463     0.646
7      -0.095   -0.182    -0.017
8      -0.138   -0.232    -0.050
9      -0.163   -0.250    -0.080
10     -1.029   -1.129    -0.926
11      0.364    0.285     0.450
12      0.358    0.276     0.434
13      0.336    0.240     0.440
14      0.011   -0.064     0.089
15      0.006   -0.090     0.090
16     -0.008   -0.098     0.081
17     -0.115   -0.189    -0.046
18     -0.673   -0.767    -0.578
19     -0.947   -1.059    -0.848
20     -0.960   -1.045    -0.857
21      1.139    1.032     1.244
22      1.115    1.007     1.205
23      1.293    1.164     1.428
24      1.154    1.026     1.291
25      0.297    0.216     0.385
26      0.104    0.028     0.182
27      0.113    0.036     0.201
28     -0.099   -0.174    -0.010
29     -0.124   -0.198    -0.049
30     -0.446   -0.537    -0.356

Table A.4.3 (cont'd)

Item   d Mean   d Lower   d Upper
31      0.558    0.481     0.636
32      0.854    0.764     0.944
33      0.340    0.265     0.422
34      0.328    0.249     0.411
35      0.034   -0.046     0.111
36     -0.027   -0.106     0.056
37      0.028   -0.060     0.118
38     -0.251   -0.350    -0.150
39     -0.230   -0.312    -0.139
40     -0.399   -0.480    -0.327
41      1.236    1.119     1.350
42      1.077    0.967     1.176
43      0.625    0.530     0.720
44      0.196    0.106     0.278
45      0.018   -0.073     0.115
46     -0.070   -0.147     0.014
47     -0.164   -0.245    -0.076
48     -0.386   -0.466    -0.310
49     -0.414   -0.493    -0.333
50     -0.873   -0.967    -0.790
51      2.374    2.197     2.572
52      1.131    1.031     1.232
53      0.783    0.690     0.878
54      0.789    0.698     0.885
55      0.195    0.117     0.267
56     -0.155   -0.239    -0.077
57     -0.127   -0.213    -0.045
58     -0.456   -0.545    -0.368
59     -0.495   -0.576    -0.415
60     -1.522   -1.658    -1.406

REFERENCES

Albert, J. H. (1992). Bayesian Estimation of Normal Ogive Item Response Curves Using Gibbs Sampling.
Journal of Educational Statistics, 17(3), 251-269.

Arellano-Valle, R. B., & Azzalini, A. (2008). The centred parametrization for the multivariate skew-normal distribution. Journal of Multivariate Analysis, 99(7), 1362-1382. doi: 10.1016/j.jmva.2008.01.020

Baker, F. B. (1987). Methodology Review: Item Parameter Estimation Under the One-, Two-, and Three-Parameter Logistic Models. Applied Psychological Measurement, 11(2), 111-141. doi: 10.1177/014662168701100201

Baker, F. B., & Kim, S.-H. (2004). Item Response Theory: Parameter estimation techniques (2nd ed., revised and expanded). New York, NY: Marcel Dekker.

Batley, R.-M., & Boss, M. W. (1993). The Effects on Parameter Estimation of Correlated Dimensions and a Distribution-Restricted Trait in a Multidimensional Item Response Model. Applied Psychological Measurement, 17(2), 131-141. doi: 10.1177/014662169301700203

Béguin, A., & Glas, C. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541-561. doi: 10.1007/bf02296195

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-472). Reading, MA: Addison-Wesley.

Bock, R. D., & Aitkin, M. (1981). Marginal Maximum Likelihood Estimation of Item Parameters: Application of an EM Algorithm. Psychometrika, 46(4), 443-459.

Bock, R. D., Gibbons, R., Schilling, S. G., Muraki, E., Wilson, D. T., & Wood, R. (2003). TESTFACT (Version 4). Chicago, IL: Scientific Software International.

Bolt, D. M., & Lall, V. F. (2003). Estimation of Compensatory and Noncompensatory Multidimensional Item Response Models Using Markov Chain Monte Carlo. Applied Psychological Measurement, 27(6), 395-414. doi: 10.1177/0146621603258350

Brooks, S. P. (1998). Markov chain Monte Carlo method and its application. Journal of the Royal Statistical Society, Series D (The Statistician), 47(1), 69-100.

Brooks, S. P., & Morgan, B.
J. T. (1994). Automatic starting point selection for function optimization. Statistics and Computing, 4(3), 173-177. doi: 10.1007/bf00142569

Carlson, J. E. (1987). Multidimensional Item Response Theory Estimation: A Computer Program. Iowa City, IA: ACT.

Cowles, M. K., & Carlin, B. P. (1996). Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review. Journal of the American Statistical Association, 91(434), 883-904.

De Ayala, R. J., & Sava-Bolesta, M. (1999). Item Parameter Recovery for the Nominal Response Model. Applied Psychological Measurement, 23(1), 3-19. doi: 10.1177/01466219922031130

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1-38.

Dorans, N. J., & Kingston, N. M. (1985). The Effects of Violations of Unidimensionality on the Estimation of Item and Ability Parameters and on Item Response Theory Equating of the GRE Verbal Scale. Journal of Educational Measurement, 22(4), 249-262.

Finch, H. (2010). Item Parameter Estimation for the MIRT Model. Applied Psychological Measurement, 34(1), 10-26. doi: 10.1177/0146621609336112

Finch, H. (2011). Multidimensional Item Response Theory Parameter Estimation With Nonsimple Structure Items. Applied Psychological Measurement, 35(1), 67-82. doi: 10.1177/0146621610367787

Fraser, C., & McDonald, R. P. (1988). NOHARM II: A FORTRAN program for fitting unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: University of New England, Centre for Behavioral Studies.

Froelich, A. G. (2001). Assessing the uni-dimensionality of test items and some asymptotics of parametric item response theory. Unpublished doctoral dissertation, Department of Statistics, University of Illinois at Urbana-Champaign.

Fu, Z.-H., Tao, J., & Shi, N.-Z. (2009). Bayesian estimation in the multidimensional three-parameter logistic model.
Journal of Statistical Computation and Simulation, 79(6), 819-835. doi: 10.1080/00949650801966876

Gelman, A., & Rubin, D. B. (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7(4), 457-472.

Gelman, A., & Shalizi, C. R. (2012). Philosophy and the practice of Bayesian statistics in the social sciences. In H. Kincaid (Ed.), The Oxford Handbook of Philosophy of Social Science. Oxford, U.K.: Oxford University Press.

Geman, S., & Geman, D. (1984). Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6), 721-741.

Geweke, J. (1992). Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments. In J. M. Bernardo, J. Berger, A. P. Dawid & A. F. M. Smith (Eds.), Bayesian Statistics 4 (pp. 169-193). Oxford, U.K.: Oxford University Press.

Geyer, C. J. (1992). Practical Markov Chain Monte Carlo. Statistical Science, 7(4), 473-483.

Gosz, J. K., & Walker, C. M. (2002). An empirical comparison of multidimensional item response theory data using TESTFACT and NOHARM. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Harwell, M., Stone, C. A., Hsu, T.-C., & Kirisci, L. (1996). Monte Carlo Studies in Item Response Theory. Applied Psychological Measurement, 20(2), 101-125. doi: 10.1177/014662169602000201

Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57(1), 97-109.

Heidelberger, P., & Welch, P. D. (1983). Simulation Run Length Control in the Presence of an Initial Transient. Operations Research, 31(6), 1109-1144.

Holzinger, K., & Harman, H. (1941). Factor analysis: A synthesis of factorial methods. Chicago, IL: The University of Chicago Press.

HPCC. (2012).
HPCC. Retrieved December 11, 2012, from https://wiki.hpcc.msu.edu/display/hpccdocs/Documentation+and+User+Manual#DocumentationandUserManual-Overview

Hulin, C. L., Lissak, R. I., & Drasgow, F. (1982). Recovery of Two- and Three-Parameter Logistic Item Characteristic Curves: A Monte Carlo Study. Applied Psychological Measurement, 6(3), 249-260. doi: 10.1177/014662168200600301

Kelderman, H., & Rijkes, C. (1994). Loglinear multidimensional IRT models for polytomously scored items. Psychometrika, 59(2), 149-176. doi: 10.1007/bf02295181

Kelton, W. D., & Law, A. M. (1984). An Analytical Evaluation of Alternative Strategies in Steady-State Simulation. Operations Research, 32(1), 169-184.

Kolen, M. J. (1981). Comparison of Traditional and Item Response Theory Methods for Equating Tests. Journal of Educational Measurement, 18(1), 1-11.

Lawley, D. N. (1944). The factorial analysis of multiple item tests. Paper presented at the Royal Society of Edinburgh.

Lord, F. (1952). A Theory of Test Scores (Psychometric Monograph No. 7). Richmond, VA: Psychometric Corporation.

Lord, F. M., Novick, M. R., & Birnbaum, A. (1968). Statistical theories of mental test scores. Oxford, England: Addison-Wesley.

Maris, E. (1995). Psychometric Latent Response Models. Psychometrika, 60(4), 523-547.

Maydeu-Olivares, A. (2001). Multidimensional Item Response Theory Modeling of Binary Data: Large Sample Properties of NOHARM Estimates. Journal of Educational and Behavioral Statistics, 26(1), 51-71. doi: 10.3102/10769986026001051

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates.

McKinley, R. L., & Mills, C. N. (1985). A Comparison of Several Goodness-of-Fit Statistics. Applied Psychological Measurement, 9(1), 49-57.

McKinley, R. L., & Reckase, M. D. (1980). A comparison of the ANCILLES and LOGIST parameter estimation procedures for the three-parameter logistic model using goodness of fit as a criterion.
Columbia, MO: University of Missouri, Tailored Testing Laboratory.

McKinley, R. L., & Reckase, M. D. (1984). An investigation of the effect of correlated abilities on observed test characteristics. Iowa City, IA: American College Testing Program.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 21(6), 1087-1092.

Muraki, E., & Carlson, J. E. (1993). Full-information factor analysis for polytomous item responses. Paper presented at the annual meeting of the American Educational Research Association, Atlanta, GA.

Patz, R. J., & Junker, B. W. (1999a). A Straightforward Approach to Markov Chain Monte Carlo Methods for Item Response Models. Journal of Educational and Behavioral Statistics, 24(2), 146-178.

Patz, R. J., & Junker, B. W. (1999b). Applications and Extensions of MCMC in IRT: Multiple Item Types, Missing Data, and Rated Responses. Journal of Educational and Behavioral Statistics, 24(4), 342-366.

Raftery, A. E., & Lewis, S. (1992). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. Berger, A. P. Dawid & A. F. M. Smith (Eds.), Bayesian Statistics 4 (pp. 763-773). Oxford, U.K.: Oxford University Press.

Reckase, M. D. (1985). The Difficulty of Test Items That Measure More Than One Ability. Applied Psychological Measurement, 9(4), 401-412. doi: 10.1177/014662168500900409

Reckase, M. D. (2009). Multidimensional Item Response Theory. New York, NY: Springer.

Reckase, M. D., & McKinley, R. L. (1982). Some Latent Trait Theory in a Multidimensional Latent Space. S.l.: Distributed by ERIC Clearinghouse.

Reckase, M. D., & McKinley, R. L. (1991). The Discriminating Power of Items That Measure More Than One Dimension. Applied Psychological Measurement, 15(4), 361-373. doi: 10.1177/014662169101500407

Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision.
Psychological Assessment, 12(3), 287-297.

Ripley, B. D., & Kirkland, M. D. (1990). Iterative simulation methods. Journal of Computational and Applied Mathematics, 31(1), 165-172. doi: 10.1016/0377-0427(90)90347-3

Samejima, F. (1974). Normal Ogive Model on Continuous Response Level in Multidimensional Latent Space. Psychometrika, 39(1), 111-121.

Sheng, Y. (2008). A MATLAB package for Markov chain Monte Carlo with a multi-unidimensional IRT model. Journal of Statistical Software, 28(10).

Sheng, Y., & Wikle, C. K. (2007). Comparing Multiunidimensional and Unidimensional Item Response Theory Models. Educational and Psychological Measurement, 67(6), 899-919. doi: 10.1177/0013164406296977

Stone, C. A., & Yeh, C.-C. (2006). Assessing the Dimensionality and Factor Structure of Multiple-Choice Exams. Educational and Psychological Measurement, 66(2), 193-214. doi: 10.1177/0013164405282483

Tate, R. (2003). A Comparison of Selected Empirical Methods for Assessing the Structure of Responses to Test Items. Applied Psychological Measurement, 27(3), 159-203. doi: 10.1177/0146621603027003001

Thissen, D., & Steinberg, L. (1984). A Response Model for Multiple-Choice Items. Psychometrika, 49(4), 501-519.

Thissen, D., & Wainer, H. (1982). Some Standard Errors in Item Response Theory. Psychometrika, 47(4), 397-412.

Thurstone, L. L. (1947). Multiple-factor analysis: A development and expansion of The Vectors of Mind. Chicago, IL: University of Chicago Press.

Tucker, L. (1946). Maximum validity of a test with equivalent items. Psychometrika, 11(1), 1-13. doi: 10.1007/bf02288894

Walker, C. M., Azen, R., & Schmitt, T. (2006). Statistical Versus Substantive Dimensionality. Educational and Psychological Measurement, 66(5), 721-738. doi: 10.1177/0013164405285907

Whitely, S. E. (1980). Multicomponent Latent Trait Models for Ability Tests. Psychometrika, 45(4), 479-494.

Wingersky, M. S., Barton, M. A., & Lord, F. M. (1982). LOGIST user's guide. Princeton, NJ: Educational Testing Service.
Wollack, J. A., Bolt, D. M., Cohen, A. S., & Lee, Y.-S. (2002). Recovery of Item Parameters in the Nominal Response Model: A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte Carlo Estimation. Applied Psychological Measurement, 26(3), 339-352. doi: 10.1177/0146621602026003007

Yao, L. (2003). BMIRT: Bayesian multivariate item response theory. Monterey, CA: CTB/McGraw-Hill.

Yao, L. H., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30(6), 469-492. doi: 10.1177/0146621605284537

Yen, W. M. (1981). Using Simulation Results to Choose a Latent Trait Model. Applied Psychological Measurement, 5(2), 245-262.