Using multidimensional item response theory to report subscores across multiple test forms
There is increasing interest in subscores in educational testing because subscores have potential benefits in remedial and instructional applications (Sinharay, Puhan, & Haberman, 2011). Users of score reports are interested in receiving information on examinees’ performance on subsections of an achievement test. These scores “typically are referred to as ‘subscale scores,’ ‘subtest scores,’ or more generically, ‘subscores’” (Ferrara & DeMauro, 2006, p. 583). Existing subscore research rarely addresses the following issues. First, in most research, the number of subscores, the number of items in each subscore domain, and the item types in each domain are already fixed according to the classification produced by test developers and content experts. Thus, the distinct domains defining subscores may not be clearly defined in a technical psychometric sense. Also, little information may be provided to show that there are enough items in each domain to support reporting useful scores. Moreover, it may not be clear why particular types of items are grouped together within each domain. Finally, few studies discuss how to link and equate test forms when reporting subscores. To fill these gaps, this study applied multidimensional item response theory to report subscores for a large-scale international English language test. Several statistical and psychometric methods were used to analyze the dimensional structure, identify the item clusters for reporting subscores, and link individual test forms to provide comparable and reliable subscores. The results show that seven distinct dimensions capture the variation among examinee responses to items in the data sets. A different number of clusters was identified for each form, ranging from six to eight across the five test forms, and each cluster corresponds to a unique reference composite.
The dimensional structure is consistent across the five forms, based on parallel analysis, exploratory and confirmatory factor analysis, cluster analysis, and reference composite analysis. The nonorthogonal Procrustes rotation linked each individual form to the base form and rotated the subscores from the individual forms back to the base form so that subscores identified from different forms were comparable. In conclusion, this research provides a systematic method for reporting subscores using multidimensional item response theory. The procedures can be replicated and applied to other testing programs. The large amount of missing data and the small sample size for each individual form were limitations of this study. For future research, I would suggest using large-scale data sets with few missing values. For each individual test form, the sample size should preferably be larger than 450, for example 600 to 800.
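The linking step described in the abstract can be illustrated with a minimal sketch. It assumes that nonorthogonal Procrustes rotation is taken in its usual least-squares form: find an unconstrained transformation matrix T that carries one form's item discrimination matrix as close as possible to the base form's matrix. The function name, the simulated matrices, and the use of NumPy are illustrative assumptions only, not the procedure or software used in the thesis.

```python
import numpy as np


def oblique_procrustes(a_form, a_base):
    """Return T minimizing ||a_form @ T - a_base||_F.

    T is not constrained to be orthogonal, which is what makes the
    rotation "nonorthogonal". Rows are common items, columns are dimensions.
    """
    t, *_ = np.linalg.lstsq(a_form, a_base, rcond=None)
    return t


# Hypothetical data: 20 common items measured on 3 dimensions.
rng = np.random.default_rng(0)
a_base = rng.uniform(0.2, 1.5, size=(20, 3))

# Simulate a new form whose coordinate system differs from the base form
# by a known, non-orthogonal transformation.
true_t = np.array([
    [1.0, 0.2, 0.0],
    [0.1, 0.9, 0.3],
    [0.0, 0.1, 1.1],
])
a_form = a_base @ np.linalg.inv(true_t)

# Estimate the transformation and place the new form on the base metric.
t_hat = oblique_procrustes(a_form, a_base)
a_linked = a_form @ t_hat
```

In this noise-free simulation the estimated matrix recovers the known transformation; with real calibration data the fit would be approximate, and the ability estimates would need a matching transformation to remain consistent with the rotated item parameters.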
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Xu, Jing-Ru
- Thesis Advisors: Reckase, Mark D.
- Committee Members: Houang, Richard T.; Mandrekar, Vidyadhar; Zhong, Pingshou
- Date: 2016
- Subjects: Educational tests and measurements--Design and construction; Educational tests and measurements--Evaluation; Item response theory; Multidimensional scaling
- Program of Study: Measurement and Quantitative Methods - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xi, 92 pages
- ISBN: 9781339692395; 1339692392