THE PERFORMANCE OF MLR, ULSMV, AND WLSMV ESTIMATION IN STRUCTURAL
REGRESSION MODELS WITH ORDINAL VARIABLES
By
Cheng-Hsien Li

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Measurement and Quantitative Methods−Doctor of Philosophy
2014

ABSTRACT
THE PERFORMANCE OF MLR, ULSMV, AND WLSMV ESTIMATION IN STRUCTURAL
REGRESSION MODELS WITH ORDINAL VARIABLES
By
Cheng-Hsien Li
In the educational, social, and behavioral sciences, ordered observed categorical
variables are commonly used to operationalize latent constructs in structural regression
models. When ordinal manifest variables are treated as if they were continuous, the precision and
accuracy of model parameter estimates, standard errors, and chi-square goodness of fit
statistics are likely compromised, leading to invalid statistical inferences. Three robust
estimators − robust maximum likelihood (MLR), robust unweighted least squares (ULSMV),
and robust weighted least squares (WLSMV) − have been proposed in the literature over the
past two decades, and are considered to be superior to normal theory-based maximum
likelihood (ML) when ordinal observed variables are analyzed.
The purpose of this thesis was to carry out a Monte Carlo simulation study, in order to
compare the performance of ML, the most widely known estimation method, with the three
robust estimators (MLR, ULSMV, and WLSMV) on parameter estimates, standard errors, and
chi-square goodness of fit statistics in a five-factor structural regression model with ordinal
observed variables. There were 4 (level of asymmetric distributions of ordinal observed
variables: symmetry, slight and moderate asymmetry, as well as bipolarization) × 4 (number
of observed variables’ categories: 4, 5, 6, and 7) × 7 (sample size: 200, 300, 400, 500, 750,
1,000, and 1,500) = 112 conditions in the study. Five hundred data sets were generated under
each experimental condition. Model parameters, standard errors, chi-square goodness of fit
statistics, and RMSEA were estimated for each replication using ML, MLR, ULSMV, and WLSMV.
Data generation and analysis were performed with Mplus 7.
The results reveal that (1) the four estimators are all subject to non-convergence
problems with 4-category, moderately asymmetric data at the smallest sample size N = 200;
(2) WLSMV and ULSMV are likely to produce inadmissible solutions in some conditions with
sample sizes N = 200 or 300; (3) WLSMV and ULSMV yield more accurate factor loading
estimates than ML and MLR across all conditions in the study; (4) the estimates of structural
coefficients under ML and MLR outperform WLSMV and ULSMV in all symmetric data conditions,
whereas WLSMV and ULSMV surpass ML and MLR in nearly all asymmetric data conditions; (5)
the robust standard errors of factor loadings obtained with ULSMV are more precise than those
produced by WLSMV and MLR across all conditions; (6) the robust standard errors of structural
coefficients obtained with WLSMV are more precise than those with ULSMV and MLR in all
asymmetric data conditions; (7) among the three robust estimators, MLR is inferior to WLSMV
and ULSMV in controlling for Type I error rates of testing overall model fit in almost every
condition, unless a larger sample size is used (i.e., N = 1,000 in this thesis); (8) RMSEA seems
to be a reliable index in the evaluation of overall model fit when the model has no specification
error; (9) the benefit of using diagonal weights can be found in the estimation of factor
loadings and structural coefficients as well as robust standard errors of structural coefficients,
but not in the estimation of robust standard errors of factor loadings and the mean- and
variance-adjusted chi-square goodness of fit statistics across all conditions; and (10) the
accuracy and precision of factor loading and structural coefficient estimates and standard error
estimates of factor loadings and structural coefficients improve with increasing sample size
and number of observed variables’ categories but decrease with a greater level of asymmetric
distributions.
Collectively, the findings from this study provide a better understanding of the
performance of the three robust estimators, and aim to inform the work of applied researchers
with respect to the importance of attending to assumption violations and selecting an
“appropriate” estimator under circumstances frequently encountered in practice. Finally,
implications of the findings for structural regression models using these four estimators are
discussed, as are the limitations of this study as well as potential directions for future research.

Copyright by
CHENG-HSIEN LI
2014

ACKNOWLEDGMENTS

The writing of this dissertation has been an incredible journey and a monumental
accomplishment in my academic life.

First of all, I would like to express my deepest appreciation and genuine gratitude to my
advisor and dissertation chair Dr. Tenko Raykov for his unflagging support and constant
encouragement throughout my doctoral endeavors. I greatly appreciate his patience to review
early dissertation drafts so many times and his constructive feedback, which have helped me
improve the quality of this dissertation tremendously. Without his supervision, guidance, and
intellectual enlightenment, this dissertation would not have been possible.

I would like to thank my dissertation committee members Dr. Mark Reckase, Dr. Richard
DeShon, and Dr. Matthew Diemer, each of whom has provided insightful comments and
invaluable suggestions on an earlier version of this dissertation, and has made a unique
contribution to the completion of this dissertation. A special thanks is extended to Dr. Matthew
Diemer, who has been my mentor in the fields of Developmental and Educational Psychology
for the past six years. Working with him has been a truly rewarding and inspirational
experience along the way. I would also like to thank Dr. Konstantopoulos, Dr. Schmidt, and Dr.
Kimberley for providing excellent opportunities to enrich my teaching and research
experience.

I feel deeply indebted to Dr. Jing-Jyi Wu, Dr. Shu-Shen Shih, and Dr. James Tu, who
have supported me emotionally and academically since I started pursuing my Ph.D. degree in
2008. I would like to express my gratitude to my colleagues Anne Traynor and Hyesuk Jang,
and my friends Yun-Jia Lo, Yi-Ling Cheng, I-Chien Chen, Guan Saw, and Chi Chang, who have
helped me in countless ways throughout my years at MSU.

Last but absolutely not least, a unique thanks goes to my lovely family, my parents
Hsin-Hua Li and Hsiu-Kan Chen, brother Hung-Wei Li, sister-in-law Mei-Hua Lin, sister
Wen-Hui Li, nieces Tzu-Ching Li and Tzu-Yi Li, and nephew Pin-Yen Li. Words cannot express
how grateful I am to them for all of the sacrifices that they have made on my behalf,
unwavering faith that they have had in me, and unconditional love that they have given me.


TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1 INTRODUCTION
    Structural Regression Models
    Thresholds and Polychoric Correlations
    Least Squares Estimation
    Robust Corrections to Standard Errors and Test Statistics
    Maximum Likelihood Estimation
    Robust Corrections to Standard Errors and Test Statistics

CHAPTER 2 EMPIRICAL FINDINGS
    Parameter Estimates
    Standard Error Estimates
    Chi-Square Goodness of Fit Statistics

CHAPTER 3 PRESENT STUDY

CHAPTER 4 METHOD
    Model Specification
    Simulation Design
    Number of Observed Variables’ Categories
    Ordinal Observed Distributions
    Sample Size
    Data Generation and Analysis
    Outcome Variables

CHAPTER 5 RESULTS
    Non-Convergence and Inadmissible Solutions
    Parameter Estimates
    Factor Loadings
    Structural Coefficients
    Standard Error Estimates
    Chi-Square Goodness of Fit Statistics
    RMSEA

CHAPTER 6 DISCUSSION
    Implications for Applied Research
    Sample Size
    Estimation Methods
    Response Categories and Observed Distributions
    Limitations and Directions for Future Research

CHAPTER 7 SUMMARY AND CONCLUSIONS

APPENDICES
    Appendix A: Tables
    Appendix B: Figures
    Appendix C: Technical Details
    Appendix D: Mplus Code for Data Generation and Analysis
    Appendix E: Results for Sample Sizes N = 400, 750, and 1,500
    Appendix F: Results for Bipolarization Data

REFERENCES

LIST OF TABLES

Table 1      Overview of Six Major Simulation Studies in Ordinal CFA
Table 2      Robust Estimation Comparison in the Three SEM Software Packages
Table 3      Comparison of Two Major Estimation Approaches: Maximum Likelihood and Least Squares in Mplus
Table 4(a)   Cases of Non-Convergence
Table 4(b)   Cases of Inadmissible Solutions
Table 5      The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural Coefficients (N = 200)
Table 6      The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural Coefficients (N = 300)
Table 7      The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural Coefficients (N = 500)
Table 8      The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural Coefficients (N = 1,000)
Table 9      The Average Root Mean Squared Error (MSEA) for the Four Structural Coefficients (N = 1,000)
Table 10     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor Loadings and Structural Coefficients (N = 200)
Table 11     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor Loadings and Structural Coefficients (N = 300)
Table 12     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor Loadings and Structural Coefficients (N = 500)
Table 13     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor Loadings and Structural Coefficients (N = 1,000)
Table 14     Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 200)
Table 15     Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 300)
Table 16     Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 500)
Table 17     Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 1,000)
Table E1     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural Coefficients (N = 400)
Table E2     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural Coefficients (N = 750)
Table E3     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural Coefficients (N = 1,500)
Table E4     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor Loadings and Structural Coefficients (N = 400)
Table E5     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor Loadings and Structural Coefficients (N = 750)
Table E6     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor Loadings and Structural Coefficients (N = 1,500)
Table E7     Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 400)
Table E8     Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 750)
Table E9     Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 1,500)
Table F1     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural Coefficients with Bipolarization Distribution
Table F2     The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor Loadings and Structural Coefficients with Bipolarization Distribution
Table F3     Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA with Bipolarization Distribution

LIST OF FIGURES

Figure 1     The postulated five-factor structural regression model with standardized coefficients. Note. Ordinal observed variables of each latent construct are not depicted for clarity.
Figure 2     Response probabilities of ordinal observed indicators
Figure 3     Average mean squared error for the factor loading estimates across the number of categories with symmetric data and the smallest sample size N = 200
Figure 4     Average mean squared error for the standard error estimates of factor loadings across the number of categories with slightly asymmetric data and the sample size N = 300
Figure 5     Average mean squared error for the standard error estimates of factor loadings across the number of categories with slightly asymmetric data and the sample size N = 1,000
Figure 6     Average mean squared error for the standard error estimates of structural coefficients across the number of categories with slightly asymmetric data and the sample size N = 300
Figure 7     P-P plots for TML, TMLR, TWLSMV, and TULSMV (Moderate Asymmetry and 7-category)
Figure 8     P-P plots for TML, TMLR, TWLSMV, and TULSMV (N = 300 and 7-category)

CHAPTER 1
INTRODUCTION

Observed variables measured with a set of ordered categories (e.g., using Likert-type
scales) are commonly employed to operationalize latent constructs in the educational, social,
and behavioral sciences. Unlike continuous variables, calculation of means, variances, and
covariances for ordered observed categorical variables (i.e., ordinal observed variables) is in
general meaningless due to the lack of substantively interpretable origins and metrics for
these variables (Jöreskog, 2005). When it comes to a statistical model, one with ordinal data
on outcome variables entails different parameter specification than a model with continuous
response variables. By treating ordinal observed variables as if they were continuous, applied
researchers may not only undermine the precision and accuracy of model parameter
estimates − to varying degrees depending on models, data characteristics, and related
circumstances − but also arrive at misleading scientific conclusions drawn from empirical data.
This problem, which generally plagues applied researchers utilizing various statistical
frameworks, is also inevitable when employing latent variable modeling (LVM), in particular
confirmatory factor analysis (CFA) and structural equation modeling (SEM).

Over the past few decades, an extensive body of research has used structural regression
models in the applied educational, behavioral, and social science literature. A structural
regression model takes into account the measurement error of observed variables, and it
simultaneously captures the linear relationships among latent constructs of interest. The most
widely known estimator used in structural regression (SR) models is the normal theory-based
maximum likelihood (ML) method. This is largely due to its optimal properties of asymptotic
unbiasedness, consistency, normality, and efficiency (Bollen, 1989). Use of ML, however,
assumes that the observed variables are continuous and multivariate normally distributed in
the population conditional on the covariates if included in the model (Bollen, 1989; Jöreskog,
1969). Therefore, ML is not, strictly speaking, appropriate for observed variables that are
scaled ordinally. Several estimators with robust corrections to standard errors and chi-square
goodness of fit statistics, such as robust ML (MLR: Muthén & Muthén, 2010), robust
unweighted least squares (ULSMV: Muthén, 1993; Satorra & Bentler, 1994), and robust
weighted least squares (WLSMV: Muthén, du Toit, & Spisic, 1997), have been proposed in the
literature, and are considered superior to “conventional” ML when ordinal data on response
variables are employed in latent variable analysis. It is noted in passing that robust ML has
been suggested for use when ordinal observed variables have at least five response categories
(e.g., Johnson & Creech, 1983; Rigdon, 1998; Raykov, 2012, and references therein). The
robust ML estimator is also frequently used by applied researchers, based on the argument
that ordinal data on response variables could be considered “approximately continuous” if the
number of observed variables’ categories is sufficiently large.

A growing number of simulation studies have compared the relative performance of
different estimators in ordinal CFA (Hoogland & Boomsma, 1998). However, one major
limitation of previous ordinal CFA simulation studies is that researchers may have devoted less
attention to inter-factor correlation estimates. Some simulation studies examined the joint
performance of both factor loading and inter-factor correlation estimates (see, e.g., Lei, 2009;
Yang-Wallentin, Jöreskog, & Luo, 2010). Yet, the performance of different robust estimation
methods on inter-factor correlation estimates is unclear and unexplored. Although a few
simulation studies compared the performance of different estimation methods in an SR model
with ordinal observed variables (see, e.g., Anderson, 1996; Coenders, Satorra, & Saris, 1997),
they failed to incorporate robust corrections to standard errors and chi-square statistics and
excluded the effect of number of observed variables’ categories. To date, no simulation study
has been identified in the extant literature that has employed an SR model with ordinal
observed variables to compare the performance of robust estimators since the three robust
estimators were developed and made available in widely circulated computer programs.
Therefore, the comparison of the performance of robust estimators on structural coefficients
remains an open research question. Additionally, the performance of MLR implemented in
Mplus has not yet been systematically evaluated in the literature. Given that robust estimators
have recently received considerable attention also in applied research settings, it can be
expected that findings on their performance in structural regression models with ordinal data
on response variables, in particular of the three robust estimators (MLR, WLSMV, and ULSMV),
would be of particular importance for empirical researchers in the educational, social, and
behavioral sciences.

The central objective of this thesis is to carry out a Monte Carlo simulation study
addressing gaps in the extant literature and contributing to our understanding of the impact of
ordinal observed variables on parameter estimates, in particular (but not limited to) of
structural regression coefficients, their associated standard errors, the chi-square goodness of
fit statistics, and RMSEA in SR models. Another important objective is to compare the
performance of the four different estimators (ML, MLR, ULSMV and WLSMV) in SR models with
ordinal observed indicators under different experimental conditions. Findings from this study
are expected (1) to inform the work of applied researchers with respect to the importance of
attending to assumption violations, and (2) to translate directly into recommendations for
selecting an “appropriate” estimator under empirical circumstances frequently encountered in
current research practice. Finally, implications of the findings for structural regression models
using these four estimators are discussed, along with the limitations of this study and
potential directions for future research.

The remainder of this dissertation is organized as follows. It begins by (1) delineating the
parameterization of a structural regression model with ordinal observed variables, followed by
(2) describing the estimation of thresholds and polychoric correlations, subsequently (3)
introducing two major estimation approaches: least squares and maximum likelihood, then (4)
providing a brief review of prior research that has investigated the behavior of the four
estimators in applications, (5) presenting the aims of the study, (6) outlining the model
specification, simulation design, and evaluation criteria, (7) reporting the results, and finally (8)
concluding with a discussion of limitations of this study, recommendations for applied
researchers, and directions for future research, as well as a series of brief “take-home”
messages for empirical researchers.

Structural Regression Models
A structural regression model (i.e., a structural equation model with a regression
relationship between some of its latent variables) permits testing hypothesized
relationships among latent variables, each measured by a set of observed
variables. A structural regression model with ordinal observed variables, in general, consists of
two components: (i) the measurement models and (ii) the structural model. The measurement
models can be expressed as follows (Bollen, 1989)

\[ y^* = v_{y^*} + \Lambda_{y^*}\eta + \varepsilon, \tag{1} \]

and

\[ x^* = v_{x^*} + \Lambda_{x^*}\xi + \delta, \tag{2} \]

where vy* is a p × 1 vector of intercept terms for y*, vx* a q × 1 vector of intercept terms for x*,
y* represents a p × 1 vector of latent response variables y*s underlying ordinal observed,
endogenous variables ys, x* a q × 1 vector of latent response variables x*s underlying ordinal
observed, exogenous variables xs, Λy* a p × m matrix of factor loadings for y*, Λx* a q × n
matrix of factor loadings for x*, η an m × 1 vector of endogenous latent variables, ξ an n × 1
vector of exogenous latent variables with E(ξ) = κ and Cov(ξ) = Φ (an n × n
variance-covariance matrix of latent variables ξ), ε a p × 1 vector of measurement errors in y*
with E(ε) = 0 and Var(ε) = Θε (a p × p diagonal matrix of residual variances for y*, assuming
measurement errors ε are uncorrelated with all other measurement errors and latent variables
η), δ a q × 1 vector of measurement errors in x* with E(δ) = 0 and Var(δ) = Θδ (a q × q
diagonal matrix of residual variances for x*, assuming measurement errors δ are uncorrelated
with all other measurement errors and latent variables ξ). It is also assumed that ε is
uncorrelated with δ.

The structural model is defined as

\[ \eta = \alpha + B\eta + \Gamma\xi + \zeta, \tag{3} \]

where α is an m × 1 vector of latent intercepts for η, B an m × m matrix of structural regression
coefficients with zero diagonal elements among η (assuming |I − B| ≠ 0), Γ an m × n matrix
of structural regression coefficients between ξ and η, ζ an m × 1 vector of disturbance terms
in η with E(ζ) = 0 and Cov(ζ) = Ψ (an m × m diagonal matrix of residual variances for η,
assuming disturbance terms ζ are uncorrelated with all other disturbance terms and latent
variables ξ). It follows that E(η) = (I − B)⁻¹(α + Γκ) and Cov(η) = (I − B)⁻¹(ΓΦΓ′ + Ψ)[(I − B)⁻¹]′.

Let θ denote the vector of model parameters. Then, the mean structure for the latent
response variables (y*, x*) of a general structural regression model parameterized in θ can be
expressed as

\[ \mu(\theta) = \begin{pmatrix} \mu_{y^*} \\ \mu_{x^*} \end{pmatrix}, \tag{4.1} \]

where µy* = vy* + Λy*(I − B)⁻¹(α + Γκ) and µx* = vx* + Λx*κ.

Similarly, the covariance structure implied by this model can be expressed as

\[ \Sigma^*(\theta) = \begin{pmatrix} \Sigma_{y^*y^*} & \Sigma_{y^*x^*} \\ \Sigma_{x^*y^*} & \Sigma_{x^*x^*} \end{pmatrix}, \tag{4.2} \]

where Σx*x* = Λx*ΦΛ′x* + Θδ, Σy*y* = Λy*(I − B)⁻¹(ΓΦΓ′ + Ψ)[(I − B)⁻¹]′Λ′y* + Θε, and Σy*x* =
Λy*(I − B)⁻¹ΓΦΛ′x*, with Σx*y* = Σ′y*x*. Unlike a structural regression model with continuous observed variables,
the variances of measurement errors (i.e., the diagonal elements of Θδ and Θε) are not
identified here. These variances can be identified by either standardizing the latent response
variables y* and x* or standardizing the measurement errors δ and ε. The former is the default
given by the Delta parameterization in Mplus; the latter is referred to as Theta
parameterization (Muthén & Muthén, 2010). In order to introduce metrics for the latent
response variables, the variances of the latent response variables y* and x* are, for
convenience, set equal to 1 when ordinal observed variables are analyzed.
Therefore, Θδ has to be constrained as

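\[ \Theta_{\delta} = I - \mathrm{diag}\bigl(\Lambda_{x^*}\Phi\Lambda_{x^*}'\bigr), \tag{4.3} \]

and Θε has to be constrained accordingly as

\[ \Theta_{\varepsilon} = I - \mathrm{diag}\bigl(\Lambda_{y^*}(I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)[(I - B)^{-1}]'\Lambda_{y^*}'\bigr). \tag{4.4} \]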

As a consequence, Σ*(θ) has unit diagonal elements and therefore reduces to a correlation
matrix implied by the model under consideration. Next, the relationships between the latent
constructs (η and ξ) and underlying latent response variables (y* and x*) are estimated via
analysis of the correlation matrix among the latent response variables y* and x*, using the
ordinal observed data.
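
The construction in Equations 4.2 to 4.4 can be illustrated numerically. The following Python code is a minimal sketch, not the Mplus implementation: it builds Σ*(θ) from the model matrices and sets Θδ and Θε so that every latent response variable has unit variance. The function name `implied_correlation`, the argument names, and the two-factor example values are hypothetical and chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np


def implied_correlation(lam_y, lam_x, B, Gamma, Phi, Psi):
    """Return Sigma*(theta), Theta_eps, Theta_delta with unit-variance y*, x*."""
    m = B.shape[0]
    inv = np.linalg.inv(np.eye(m) - B)                      # (I - B)^{-1}
    cov_eta = inv @ (Gamma @ Phi @ Gamma.T + Psi) @ inv.T   # Cov(eta)
    common_yy = lam_y @ cov_eta @ lam_y.T                   # Sigma_y*y* minus Theta_eps
    common_xx = lam_x @ Phi @ lam_x.T                       # Sigma_x*x* minus Theta_delta
    sigma_yx = lam_y @ inv @ Gamma @ Phi @ lam_x.T          # Sigma_y*x*
    theta_eps = np.eye(lam_y.shape[0]) - np.diag(np.diag(common_yy))     # Eq. 4.4
    theta_delta = np.eye(lam_x.shape[0]) - np.diag(np.diag(common_xx))   # Eq. 4.3
    sigma = np.block([[common_yy + theta_eps, sigma_yx],
                      [sigma_yx.T, common_xx + theta_delta]])
    return sigma, theta_eps, theta_delta


# Hypothetical example: one exogenous and one endogenous factor, two indicators each.
lam_y = np.array([[0.7], [0.8]])
lam_x = np.array([[0.6], [0.7]])
B = np.zeros((1, 1))
Gamma = np.array([[0.5]])
Phi = np.array([[1.0]])
Psi = np.array([[0.75]])   # chosen so that Var(eta) = 0.5**2 * 1 + 0.75 = 1

sigma, theta_eps, theta_delta = implied_correlation(lam_y, lam_x, B, Gamma, Phi, Psi)
print(np.round(sigma, 3))
```

In this sketch the residual variances are not free parameters; they are whatever remains after the common variance is subtracted from 1, which is the sense in which the implied covariance matrix reduces to a correlation matrix.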

Thresholds and Polychoric Correlations
The correlation between two normally distributed latent response variables is referred to as a
polychoric correlation when the two ordinal observed indicators have at least three
response categories. A polychoric correlation is typically estimated using a two-stage
procedure proposed by Olsson (1979; also see Bollen, 1989; Jöreskog, 2005): (i) the
estimation of thresholds from the univariate marginal distributions, and (ii) the estimation of
polychoric correlations from the bivariate marginal distributions, given the threshold
estimates.

A continuous, normal, latent response variable y* underlies an ordinal observed variable
y in the population:

\[ y = c \quad \text{if } \tau_{c-1} < y^* < \tau_c, \qquad c = 1, 2, \ldots, g, \tag{5} \]

where c defines the observed value of an ordinal variable y, τ is the threshold (−∞ = τ0 < τ1 <
τ2 < … < τg−1 < τg = ∞), and g is the number of ordered categories. In the educational,
behavioral, and social sciences, many latent constructs of interest are “conceptually”
continuous, and therefore assuming an underlying continuous y* is a reasonable approach
(Coenders, Satorra, & Saris, 1997). For example, a respondent tends to endorse the kth
response category when her/his latent response value y* lies between τk−1 and τk. The ordinal
observed data only provide an approximation of the underlying continuous, latent response
variable because ordered observed categorical data are discrete by nature. A standard normal
distribution is selected for the latent response variable y* with a probability density function

\[ \phi_1(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}, \qquad -\infty < u < \infty, \]

and a cumulative distribution function Φ1(u). The probability of the ith category response is
obtained as

\[ \pi_i = p(y = i) = p(\tau_{i-1} < y^* < \tau_i) = \int_{\tau_{i-1}}^{\tau_i} \phi_1(u)\, du = \Phi_1(\tau_i) - \Phi_1(\tau_{i-1}), \tag{6} \]

and it follows that

\[ \tau_i = \Phi_1^{-1}(\pi_1 + \pi_2 + \cdots + \pi_i), \qquad i = 1, 2, \ldots, g-1, \tag{7} \]

where Φ1⁻¹ is the inverse of the standard normal cumulative distribution function. Next, τi
can be estimated as

\[ \hat{\tau}_i = \Phi_1^{-1}(p_1 + p_2 + \cdots + p_i), \qquad i = 1, 2, \ldots, g-1, \tag{8} \]

where pi is the sample proportion of responses in category i. It is noted that the threshold
estimation model is saturated. Namely, the number of threshold parameters (i.e., g − 1) is
equal to the number of non-redundant sample proportions.
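
A small numerical illustration of Equations 5, 6, and 8 may be helpful. The following Python sketch uses only the standard library; the function names and the 4-category proportions are hypothetical, and it assumes that no category is empty (otherwise a cumulative proportion of 0 has no finite threshold).

```python
from bisect import bisect_right
from itertools import accumulate
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def estimate_thresholds(proportions):
    """Equation 8: interior thresholds from category proportions p_1, ..., p_g."""
    cumulative = list(accumulate(proportions))[:-1]   # drop the last value (= 1)
    return [STD_NORMAL.inv_cdf(c) for c in cumulative]


def categorize(y_star, thresholds):
    """Equation 5: category c such that tau_{c-1} < y* < tau_c (1-based)."""
    return bisect_right(thresholds, y_star) + 1


def category_probabilities(thresholds):
    """Equation 6: pi_i = Phi(tau_i) - Phi(tau_{i-1}), tau_0 = -inf, tau_g = +inf."""
    cdf = [0.0] + [STD_NORMAL.cdf(t) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical 4-category item with sample proportions 0.10, 0.20, 0.40, 0.30.
proportions = [0.10, 0.20, 0.40, 0.30]
taus = estimate_thresholds(proportions)
print([round(t, 3) for t in taus])      # three thresholds for four categories
print(categorize(0.0, taus))            # category implied by y* = 0
```

Because the model is saturated, feeding the estimated thresholds back into Equation 6 reproduces the sample proportions exactly, up to rounding error.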

Let the ordinal observed variables y1 and y2 have g1+1 and g2+1 categories, with thresholds
(τ1,1, τ1,2, …, τ1,g1) and (τ2,1, τ2,2, …, τ2,g2), respectively. Assume that the underlying
variables y1* and y2* are both standard normal with zero means, unit variances, and a
correlation ρ. Standard bivariate normality of y1* and y2* is also assumed, with probability
density function

\[ \phi_2(u, v, \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{ -\frac{u^2 - 2\rho u v + v^2}{2(1-\rho^2)} \right\}, \qquad -\infty < u, v < \infty. \]

This correlation ρ between y1* and y2* defines a polychoric correlation. The likelihood
function of yielding the observed bivariate sample can be defined as

\[ L = C \prod_{i=1}^{g_1+1} \prod_{j=1}^{g_2+1} \pi_{ij}^{\,n_{ij}}, \tag{9} \]

where C is a constant and nij is the frequency in cell (i, j) of a bivariate contingency table. πij
is the probability in cell (i, j) defined as

\[ \pi_{ij} = p(y_1 = i, y_2 = j) = p(\tau_{1,i-1} < y_1^* < \tau_{1,i},\ \tau_{2,j-1} < y_2^* < \tau_{2,j}) = \int_{\tau_{1,i-1}}^{\tau_{1,i}} \int_{\tau_{2,j-1}}^{\tau_{2,j}} \phi_2(u, v, \rho)\, dv\, du, \tag{10} \]

which can then be rewritten as

\[ \pi_{ij} = \Phi_2(\tau_{1,i}, \tau_{2,j}, \rho) - \Phi_2(\tau_{1,i}, \tau_{2,j-1}, \rho) - \Phi_2(\tau_{1,i-1}, \tau_{2,j}, \rho) + \Phi_2(\tau_{1,i-1}, \tau_{2,j-1}, \rho), \tag{11} \]

where Φ2 is the standard bivariate normal cumulative distribution function with the
correlation coefficient ρ. Take the natural logarithm of the likelihood function L and the partial
derivative of lnL with respect to ρ:

\[ \ln L = \ln C + \sum_{i=1}^{g_1+1} \sum_{j=1}^{g_2+1} n_{ij} \ln \pi_{ij}, \tag{12} \]

\[ \frac{\partial \ln L}{\partial \rho} = \sum_{i=1}^{g_1+1} \sum_{j=1}^{g_2+1} \frac{n_{ij}}{\pi_{ij}} \bigl[ \phi_2(\tau_{1,i}, \tau_{2,j}, \rho) - \phi_2(\tau_{1,i}, \tau_{2,j-1}, \rho) - \phi_2(\tau_{1,i-1}, \tau_{2,j}, \rho) + \phi_2(\tau_{1,i-1}, \tau_{2,j-1}, \rho) \bigr]. \tag{13} \]

Threshold estimates are obtained using sample cumulative marginal proportions of the
bivariate contingency table, for example,

\[ \hat{\tau}_{1,i} = \Phi_1^{-1}\Bigl( \sum_{k=1}^{i} \sum_{j=1}^{g_2+1} p_{kj} \Bigr), \tag{14} \]

where pkj is the sample proportion in cell (k, j). Next, solve the equation ∂lnL/∂ρ = 0 (i.e.,
maximizing lnL) using the given threshold estimates to obtain the polychoric correlation
estimate ρ̂. It is noted that a Pearson product-moment correlation between two ordinal
observed variables is generally attenuated because the underlying continuum is coarsely
categorized to obtain ordinal observed variables. A greater amount of attenuation in Pearson
product-moment correlation estimates occurs when ordinal observed variables have only a few
response categories and/or oppositely skewed and increasingly leptokurtic distributions (Bollen, 1989;
Olsson, 1979; Muthén & Kaplan, 1992). In the next section, two estimation families used in
SEM with ordinal observed variables to obtain model parameters, standard errors, and
chi-square goodness of fit statistics are introduced in turn: least squares and maximum
likelihood approaches.
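
The two-stage procedure in Equations 9 to 14 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the algorithm used by Mplus or by Olsson (1979): the bivariate normal distribution function is approximated with Simpson's rule, lnL is maximized with a golden-section search instead of solving Equation 13 directly, and all function names and the 3 × 3 example table are hypothetical. Only the standard library is used, and no category may be empty.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def thresholds(counts):
    """Stage 1 (Eq. 14): interior thresholds from marginal category counts."""
    total, running, taus = sum(counts), 0.0, []
    for c in counts[:-1]:
        running += c
        taus.append(STD.inv_cdf(running / total))
    return taus


def bvn_cdf(h, k, rho, steps=200):
    """P(U < h, V < k) for a standard bivariate normal, via Simpson's rule over u."""
    if h == -math.inf or k == -math.inf:
        return 0.0
    if h == math.inf:
        return 1.0 if k == math.inf else STD.cdf(k)
    if k == math.inf:
        return STD.cdf(h)
    lower = -8.0                      # the density below -8 is negligible
    if h <= lower:
        return 0.0
    width = (h - lower) / steps
    scale = math.sqrt(1.0 - rho * rho)

    def f(u):                         # phi(u) * P(V < k | U = u)
        return STD.pdf(u) * STD.cdf((k - rho * u) / scale)

    total = f(lower) + f(h)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * f(lower + m * width)
    return total * width / 3.0


def cell_probs(t1, t2, rho):
    """Equation 11: cell probabilities from the bivariate normal CDF."""
    a = [-math.inf, *t1, math.inf]
    b = [-math.inf, *t2, math.inf]
    F = [[bvn_cdf(x, y, rho) for y in b] for x in a]
    return [[F[i + 1][j + 1] - F[i + 1][j] - F[i][j + 1] + F[i][j]
             for j in range(len(b) - 1)] for i in range(len(a) - 1)]


def polychoric(table, tol=1e-4):
    """Stage 2: maximize lnL (Eq. 12) over rho with the thresholds held fixed."""
    t1 = thresholds([sum(row) for row in table])
    t2 = thresholds([sum(col) for col in zip(*table)])

    def neg_loglik(rho):
        probs = cell_probs(t1, t2, rho)
        return -sum(n * math.log(max(p, 1e-12))
                    for n_row, p_row in zip(table, probs)
                    for n, p in zip(n_row, p_row))

    lo, hi = -0.99, 0.99
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:              # golden-section search, assumes one optimum
        c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
        if neg_loglik(c) < neg_loglik(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2.0


# Hypothetical 3 x 3 table: expected counts (N = 1,000) when rho = 0.5.
expected = cell_probs([-0.5, 0.5], [0.0, 1.0], 0.5)
table = [[1000 * p for p in row] for row in expected]
rho_hat = polychoric(table)
print(round(rho_hat, 2))
```

Because the example table contains expected rather than sampled counts, the estimate should sit at the generating value of ρ up to the search tolerance; with real data, sampling error in both the thresholds and the cell frequencies would carry over into the estimate.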

Least Squares Estimation
Muthén (1984) made a substantial breakthrough in analyzing a structural equation
model with ordinal observed variables using a weighted least squares (WLS) approach. The
thresholds and polychoric correlations are first estimated using the two-stage ML procedure
described in the preceding section. Parameter estimates are then obtained using a consistent estimator
of the asymptotic covariance matrix of the polychoric correlation and threshold estimates
(denoted as 𝐕) as the weight matrix W, to minimize the weighted least squares fit function
(Muthén, 1984):

\[ F_{WLS} = [s - \sigma(\theta)]'\, W^{-1}\, [s - \sigma(\theta)], \tag{15} \]

where θ is the vector of model parameters, σ(θ) is the model-implied vector consisting of the
non-duplicated, vectorized elements of Σ*(θ) (i.e., vech[Σ*(θ)]), and s is the vector containing
the non-duplicated, vectorized elements of sample statistics (i.e., threshold and polychoric
correlation estimates). Note that the vech(.) operator strings out non-redundant matrix
elements by stacking them up into a column vector, leaving out the elements above the diagonal.
The weight matrix includes variability of threshold and polychoric correlation estimates and
interrelationships among polychoric correlation estimates. This procedure only incorporates
univariate and bivariate margins into the estimation of model parameters, and it often has
been termed limited information estimation, in contrast to full information that uses
subjects’ complete multivariate response pattern, typically paralleling the item response
theory (IRT) framework (see, e.g., Forero & Maydeu-Olivares, 2009; Wirth & Edwards, 2007,
for a full discussion of limited information vs. full information).
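
The vech(.) operator and the fit function in Equation 15 can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the function names and the numerical values are hypothetical. For a correlation matrix the diagonal is fixed at 1, so in practice s would hold the thresholds and the below-diagonal correlations rather than the full vech.

```python
import numpy as np


def vech(matrix):
    """Stack the lower-triangular elements (diagonal included) column by column."""
    matrix = np.asarray(matrix)
    return matrix.T[np.triu_indices(matrix.shape[0])]


def wls_fit(s, sigma, W):
    """Equation 15: F = (s - sigma)' W^{-1} (s - sigma)."""
    d = np.asarray(s, dtype=float) - np.asarray(sigma, dtype=float)
    return float(d @ np.linalg.solve(W, d))   # solve avoids forming W^{-1} explicitly


# Hypothetical values: two sample statistics, their model-implied counterparts,
# and a diagonal weight matrix holding their asymptotic variances.
s = np.array([0.50, 0.30])
sigma = np.array([0.40, 0.35])
W = np.diag([0.01, 0.0025])
print(wls_fit(s, sigma, W))
print(vech([[1, 2, 3], [2, 4, 5], [3, 5, 6]]))
```

Each residual is scaled by the sampling variability of the corresponding statistic, so imprecisely estimated thresholds or correlations contribute less to the fit function than precisely estimated ones.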

Standard errors are given by the square roots of the diagonal elements of the asymptotic
covariance matrix of the parameter estimates θ̂, obtained from a Taylor expansion (see, e.g.,
Browne, 1984; Satorra, 1989):

\[ \mathrm{aCov}(\hat{\theta})_{WLS} = N^{-1} (\Delta' W^{-1} \Delta)^{-1} \Delta' W^{-1} V W^{-1} \Delta\, (\Delta' W^{-1} \Delta)^{-1}, \tag{16} \]

and because W = 𝐕, it reduces to

\[ \mathrm{aCov}(\hat{\theta})_{WLS} = N^{-1} [\Delta' V^{-1} \Delta]^{-1}, \tag{17} \]

where N represents the sample size, 𝚫 = ∂σ(θ)/∂θ is the so-called Jacobian matrix of first
derivatives evaluated at the parameter estimates θ̂, and 𝐕 is the estimated asymptotic
covariance matrix of s. The chi-square goodness of fit statistic is defined as

\[ T_{\text{WLS}} = (N - 1)\, F_{\text{WLS}}(\boldsymbol{\theta}, \mathbf{s}), \qquad df = s - t, \tag{18} \]

where s = the number of unique elements in s and t = the number of independent model
parameters. That is, degrees of freedom are the difference between the number of parameters
in the unrestricted model and the number of parameters in the estimated model. However, the
performance of WLS deteriorates with small sample sizes and/or model complexity, mainly
because of the size and the invertibility of the weight matrix W = 𝐕. Specifically, WLS has been
subject to non-convergence problems with small sample sizes and/or complex models in
simulation studies (Flora & Curran, 2004; Oranje, 2003). As the number of ordinal observed
variables increases, the size of 𝐕 grows rapidly (roughly with the fourth power of the number
of variables), leading to demanding computations
and numerical problems in the process of estimation. In addition, when sample sizes are
small, the estimated asymptotic covariance matrix 𝐕 has much sampling variation, and the
inversion of 𝐕 is typically infeasible as well (Browne, 1984; Jöreskog & Sörbom, 1996;
Muthén, 1993). These weaknesses render the WLS estimator less attractive for applications.
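
To make the size of 𝐕 concrete, the following sketch counts the sample statistics in s and the unique elements of 𝐕. It assumes that s holds every threshold and every polychoric correlation; the function names are hypothetical, and the choice of 20 variables with 4 to 7 categories simply mirrors the model size and category levels used later in this study.

```python
def n_sample_statistics(p, c):
    """Number of elements in s for p ordinal variables with c categories each."""
    thresholds = p * (c - 1)          # c - 1 thresholds per variable
    correlations = p * (p - 1) // 2   # one polychoric correlation per pair
    return thresholds + correlations

def n_unique_elements_of_v(p, c):
    """Unique elements of the symmetric asymptotic covariance matrix of s."""
    k = n_sample_statistics(p, c)
    return k * (k + 1) // 2

# 20 ordinal observed variables with 4 to 7 response categories.
sizes = {c: (n_sample_statistics(20, c), n_unique_elements_of_v(20, c))
         for c in (4, 5, 6, 7)}
```

With 20 four-category variables, s already has 250 elements and 𝐕 has 31,375 unique elements, all of which must be estimated from the same sample before 𝐕 can be inverted.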

Empirical research has also suggested that WLS is inferior to other
least squares family estimators (e.g., WLSMV or ULSMV) in CFA models when the sample
size is small and/or the model becomes complicated (Flora & Curran, 2004; Oranje, 2003;
Yang-Wallentin, Jöreskog, & Luo, 2010). Flora and Curran (2004) found that (1) parameter
estimates were less overestimated by WLSMV than WLS; (2) standard errors were less
negatively biased by WLSMV than WLS, relative to the standard deviation of parameter
estimates across replications; and (3) chi-square statistics were less inflated by WLSMV than
WLS. Yang-Wallentin, Jöreskog, and Luo (2010) revealed that the performance of WLS was
uniformly worse than that of WLSMV and ULS with robust corrections in terms of parameter
estimates, standard errors, and chi-square statistics.

One possible way to circumvent the troubling features and ease the computational
burden is to choose a simple weight matrix, such as the identity matrix I, or a reduced and
invertible form of 𝐕 (e.g., retaining diagonal elements of 𝐕 only). The former choice
simplifies WLS to unweighted least squares (ULS: Muthén, 1993), and the latter reduces to
diagonally weighted least squares (TLS (two-step weighted least squares): Christoffersson,
1977; DWLS: Jöreskog & Sörbom, 1996; robust WLS or WLSMV: Muthén, du Toit, & Spisic,
1997). The fit function for each can be represented as follows:

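\[ F_{\text{ULS}} = [\mathbf{s} - \boldsymbol{\sigma}(\boldsymbol{\theta})]' \, \mathbf{I}^{-1} \, [\mathbf{s} - \boldsymbol{\sigma}(\boldsymbol{\theta})], \tag{19} \]

and

\[ F_{\text{D-WLS}} = [\mathbf{s} - \boldsymbol{\sigma}(\boldsymbol{\theta})]' \, \mathbf{W}_D^{-1} \, [\mathbf{s} - \boldsymbol{\sigma}(\boldsymbol{\theta})], \tag{20} \]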

where WD = diag(𝐕) contains only diagonal elements of the estimated asymptotic covariance
matrix of the polychoric correlation and threshold estimates. Throughout this dissertation,
D-WLS is used to represent diagonally weighted least squares because various terms are used in
currently circulated computer programs. More specifically, D-WLS only weights the residual
vector [s – σ(θ)] using the asymptotic “variances” of polychoric correlation and threshold
estimates, and ULS weights all elements of the residual vector “equally” using the identity
matrix I (Bollen, 1989; Muthén & Muthén, 2010).
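
The three fit functions differ only in the weight matrix, which the following sketch makes explicit. The function name `least_squares_fit`, the estimator labels, and the toy numbers are illustrative assumptions; they are not output from Mplus or data from this study.

```python
import numpy as np

def least_squares_fit(s, sigma, v, estimator):
    """Return [s - sigma]' W^{-1} [s - sigma] for W chosen by the estimator label."""
    resid = np.asarray(s, dtype=float) - np.asarray(sigma, dtype=float)
    v = np.asarray(v, dtype=float)
    if estimator == "WLS":       # full weight matrix, W = V
        w = v
    elif estimator == "D-WLS":   # diagonal elements of V only
        w = np.diag(np.diag(v))
    elif estimator == "ULS":     # identity matrix
        w = np.eye(resid.size)
    else:
        raise ValueError(f"unknown estimator: {estimator}")
    return float(resid @ np.linalg.solve(w, resid))

# Made-up sample statistics, model-implied values, and asymptotic covariance matrix.
s_toy = np.array([0.50, 0.30])
sigma_toy = np.array([0.40, 0.30])
v_toy = np.array([[0.02, 0.01],
                  [0.01, 0.02]])
fits = {name: least_squares_fit(s_toy, sigma_toy, v_toy, name)
        for name in ("ULS", "D-WLS", "WLS")}
```

The same residual vector yields three different fit values because ULS ignores 𝐕 entirely, D-WLS uses only its variances, and WLS also uses its covariances.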

Robust Corrections to Standard Errors and Test Statistics
Unlike the aforementioned full weight matrix W in WLS, I and WD only contain limited or
reduced/partial information in the weight matrix. A disadvantage of ULS is that the weight
matrix I makes the obtained parameter estimates less sensitive to differences in the elements of
the residual vector (Bolt, 2005). Although improvement can be expected while using the
diagonal weight matrix WD, the asymptotic covariances between polychoric correlation
estimates are still left outside the estimation procedure. The parameter estimates obtained by
ULS and D-WLS are therefore not asymptotically efficient (i.e., they do not attain the smallest
sampling variability), resulting in potentially inaccurate standard error estimates. That is, the WLS parameter
estimates have the smallest variances within the class of least squares estimators. Because
both ULS and D-WLS are less efficient than WLS, upward corrections to the standard errors
are suggested. Underestimation of standard errors may affect statistical inferences for
parameter estimates. Robust corrections to standard errors are implemented in the estimated
asymptotic covariance matrix of the parameter estimates θ for ULS estimation (Muthén, 1993;
Satorra & Bentler, 1994):

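\[ \operatorname{aCov}(\boldsymbol{\theta})_{\text{ULS}} = N^{-1} (\boldsymbol{\Delta}' \boldsymbol{\Delta})^{-1} \boldsymbol{\Delta}' \mathbf{V} \boldsymbol{\Delta} (\boldsymbol{\Delta}' \boldsymbol{\Delta})^{-1}, \tag{21} \]

and for D-WLS estimation (Muthén, du Toit, & Spisic, 1997):

\[ \operatorname{aCov}(\boldsymbol{\theta})_{\text{D-WLS}} = N^{-1} (\boldsymbol{\Delta}' \mathbf{W}_D^{-1} \boldsymbol{\Delta})^{-1} \boldsymbol{\Delta}' \mathbf{W}_D^{-1} \mathbf{V} \mathbf{W}_D^{-1} \boldsymbol{\Delta} (\boldsymbol{\Delta}' \mathbf{W}_D^{-1} \boldsymbol{\Delta})^{-1}. \tag{22} \]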
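
The robust covariance matrices for WLS, ULS, and D-WLS share one "sandwich" form and differ only in the weight matrix. The sketch below is a hedged illustration of that form: the function name `robust_acov` is hypothetical, the Jacobian and 𝐕 are randomly generated stand-ins rather than quantities from a fitted model, and the weight matrix is assumed to be symmetric.

```python
import numpy as np

def robust_acov(delta, w, v, n):
    """N^{-1} (D'W^{-1}D)^{-1} D'W^{-1} V W^{-1}D (D'W^{-1}D)^{-1} for symmetric W."""
    w_inv_delta = np.linalg.solve(w, delta)        # W^{-1} Delta
    bread = np.linalg.inv(delta.T @ w_inv_delta)   # (Delta' W^{-1} Delta)^{-1}
    meat = w_inv_delta.T @ v @ w_inv_delta         # Delta' W^{-1} V W^{-1} Delta
    return bread @ meat @ bread / n

# Randomly generated stand-ins: 6 sample statistics, 2 parameters, N = 500.
rng = np.random.default_rng(0)
delta = rng.normal(size=(6, 2))
root = rng.normal(size=(6, 6))
v = root @ root.T + 6.0 * np.eye(6)               # positive definite by construction
n = 500

acov_wls = robust_acov(delta, v, v, n)                       # W = V
acov_uls = robust_acov(delta, np.eye(6), v, n)               # W = I
acov_dwls = robust_acov(delta, np.diag(np.diag(v)), v, n)    # W = diag(V)
se_uls = np.sqrt(np.diag(acov_uls))
```

Setting W = 𝐕 collapses the sandwich to N−1[𝚫′𝐕−1𝚫]−1, and the diagonal elements for ULS and D-WLS are never smaller than those for WLS, which is the loss of efficiency that the robust standard errors account for.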

Likewise, because a consistent estimator of the asymptotic covariance matrix of
the polychoric correlation and threshold estimates (𝐕) is used as the full weight matrix, TWLS is
asymptotically chi-square distributed. However, the standard test statistics TULS and TD-WLS are
not appropriate for model fit evaluation because the test statistics produced by ULS and D-WLS
are no longer asymptotically chi-square distributed. The robust correction entails adjusting
both the mean and variance of the test statistics. The mean- and variance-adjusted
chi-square statistic can therefore be implemented in the ULS estimator (Asparouhov & Muthén,
2010):

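\[ T_{\text{ULSMV}} = a\,T_{\text{ULS}} + b, \qquad df = s - t, \tag{23} \]

where TULS = (N − 1) FULS(θ, s), 𝐕 is the estimated asymptotic covariance matrix of s,

\[ a = \sqrt{\frac{df}{\operatorname{trace}(\mathbf{U}\mathbf{V}\mathbf{U}\mathbf{V})}} \]

is a scale factor,

\[ b = df - \sqrt{\frac{df\,[\operatorname{trace}(\mathbf{U}\mathbf{V})]^{2}}{\operatorname{trace}(\mathbf{U}\mathbf{V}\mathbf{U}\mathbf{V})}} \]

is a shift parameter, and $\mathbf{U} = \mathbf{I} - \boldsymbol{\Delta}(\boldsymbol{\Delta}'\boldsymbol{\Delta})^{-1}\boldsymbol{\Delta}'$;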

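and in the D-WLS estimator (Asparouhov & Muthén, 2010):

\[ T_{\text{D-WLSMV}} = a\,T_{\text{D-WLS}} + b, \qquad df = s - t, \tag{24} \]

where TD-WLS = (N − 1) FD-WLS(θ, s), 𝐕 is the estimated asymptotic covariance matrix of s,

\[ a = \sqrt{\frac{df}{\operatorname{trace}(\mathbf{U}\mathbf{V}\mathbf{U}\mathbf{V})}}, \qquad b = df - \sqrt{\frac{df\,[\operatorname{trace}(\mathbf{U}\mathbf{V})]^{2}}{\operatorname{trace}(\mathbf{U}\mathbf{V}\mathbf{U}\mathbf{V})}}, \]

and $\mathbf{U} = \mathbf{W}_D^{-1} - \mathbf{W}_D^{-1}\boldsymbol{\Delta}(\boldsymbol{\Delta}'\mathbf{W}_D^{-1}\boldsymbol{\Delta})^{-1}\boldsymbol{\Delta}'\mathbf{W}_D^{-1}$. Unlike WLS, 𝐕
need not be inverted (and therefore need not be positive definite) in the computation of robust standard
errors and adjusted chi-square test statistics using ULS and D-WLS. Both TULSMV and TD-WLSMV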
result in smaller test statistics in comparison to TWLS. That is, chi-square statistics in the robust
estimators are downwardly adjusted to compensate for the effect of only including limited or
reduced/partial information in the weight matrix. This correction can help control for the
probability of Type I error (i.e., rejecting a correctly specified model by chance). Furthermore,
this new second order chi-square correction has been implemented in Mplus 6 and later
versions. For the Satterthwaite (1941) type correction prior to Mplus 6, refer to Satorra &
Bentler (1994) and Muthén, du Toit, & Spisic (1997).
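
The effect of the scale factor a and shift parameter b can be checked numerically. The sketch below assumes the square-root form of the second-order correction in Asparouhov and Muthén (2010) and the usual asymptotic result that the unadjusted statistic has mean trace(𝐔𝐕) and variance 2·trace(𝐔𝐕𝐔𝐕); the function name and the randomly generated inputs are illustrative only.

```python
import numpy as np

def mean_variance_adjustment(u, v, df):
    """Scale factor a and shift parameter b for the adjusted statistic a*T + b."""
    uv = u @ v
    trace_uv = np.trace(uv)
    trace_uvuv = np.trace(uv @ uv)
    a = np.sqrt(df / trace_uvuv)
    b = df - np.sqrt(df * trace_uv ** 2 / trace_uvuv)
    return a, b

# Randomly generated stand-ins: k sample statistics, t free parameters.
rng = np.random.default_rng(1)
k, t = 6, 2
delta = rng.normal(size=(k, t))
root = rng.normal(size=(k, k))
v = root @ root.T + k * np.eye(k)                 # positive definite by construction
u = np.eye(k) - delta @ np.linalg.inv(delta.T @ delta) @ delta.T   # ULS version of U
df = k - t

a, b = mean_variance_adjustment(u, v, df)
adjusted_mean = a * np.trace(u @ v) + b                        # mean of a*T + b
adjusted_variance = a ** 2 * 2.0 * np.trace(u @ v @ u @ v)     # variance of a*T + b
```

Under these assumptions the adjusted statistic has mean df and variance 2·df, matching the first two moments of the reference chi-square distribution.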

It is worth reiterating that (1) the aim of the robust corrections to standard errors in the
already available ULS and D-WLS estimators is to compensate for the loss of efficiency (i.e.,
larger variability of parameter estimates) when the full weight matrix is not used; and
(2) the mean- and variance-adjustments for test statistics in ULS and D-WLS estimators are
intended to make the distribution of the test statistics approximate the reference
chi-square distribution with the associated degrees of freedom. Note that the mean-adjusted
chi-square statistic in diagonally weighted least squares estimation is not presented here (i.e.,
ESTIMATOR = WLSM, see Appendix C for details).

Maximum Likelihood Estimation
When the assumption of multivariate normality is considered tenable in an SEM model
with “continuous” observed variables, parameter estimates can be obtained by maximizing the
likelihood of the observed data or, equivalently, by minimizing the maximum likelihood fit function
(Bollen, 1989):

\[ F_{\text{ML}} = \ln|\boldsymbol{\Sigma}(\boldsymbol{\Theta})| + \operatorname{trace}[\mathbf{S}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\Theta})] - \ln|\mathbf{S}| - r, \tag{25} \]

where Θ denotes the vector of model parameters, Σ(Θ) is the model-implied “covariance”
matrix, S is the sample-based “covariance” matrix, and r (= p + q) is the total number of
continuous observed variables in the model. Under the multivariate normality assumption,
standard errors are the square roots of the diagonal elements of the estimated asymptotic
covariance matrix for Θ from FML:

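\[ \operatorname{aCov}(\boldsymbol{\Theta})_{\text{ML}} = \frac{2}{N-1} \left\{ E\!\left[ \frac{\partial^{2} F_{\text{ML}}}{\partial\boldsymbol{\Theta}\,\partial\boldsymbol{\Theta}'} \right] \right\}^{-1}. \tag{26} \]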

The test statistic that uses Wishart-based likelihood is defined as

\[ T_{\text{ML}} = (N - 1)\, F_{\text{ML}}(\boldsymbol{\Theta}, \mathbf{S}), \qquad df = s - t, \tag{27} \]

where s = the number of unique elements in S and t = the number of independent model
parameters (Bollen, 1989; Muthén & Muthén, 2010). However, it is generally not advisable to
use ML for ordinal observed variables with only a few response categories. In order to use ML,
one may assume that a given set of ordinal observed variables is “approximately continuous”
if the variables have more than five response alternatives, and then treat them as if they were
continuous. Normality of ordinal observed variables is typically not plausible because of
categorization. The superiority of the robust ML method (MLR) over the normal theory-based ML
method has been demonstrated in the extant literature when modeling ordinal observed
variables.
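
As a minimal sketch of the ML fit function in equation (25), the following Python function evaluates FML for a given sample matrix and model-implied matrix. The function name `ml_fit` and the 2 × 2 matrices are illustrative assumptions, not values from this study.

```python
import numpy as np

def ml_fit(s, sigma):
    """F_ML = ln|Sigma| + trace(S Sigma^{-1}) - ln|S| - r."""
    s = np.asarray(s, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    r = s.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)   # log-determinant, numerically stable
    _, logdet_s = np.linalg.slogdet(s)
    return float(logdet_sigma + np.trace(s @ np.linalg.inv(sigma)) - logdet_s - r)

# Made-up sample covariance matrix for two observed variables.
s_toy = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
perfect_fit = ml_fit(s_toy, s_toy)       # model reproduces S exactly
misfit = ml_fit(s_toy, np.eye(2))        # model wrongly implies zero covariance
t_ml = (200 - 1) * misfit                # equation (27) with a hypothetical N = 200
```

The function returns zero when the model-implied matrix reproduces S and a positive value otherwise, so TML grows with both the misfit and the sample size.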

Robust Corrections to Standard Errors and Test Statistics
In order to accommodate ordinal data on response variables (i.e., approximately
continuous), standard errors and chi-square goodness of fit statistics are corrected in the MLR
estimation to enhance robustness against the presence of non-normality. Ordinal observed
variables are rarely normally distributed but often exhibit non-normality in the form of
asymmetry to some degree (Micceri, 1989). Acquiescence (or disacquiescence) response
style may introduce both skewed and leptokurtic distributions, whereas extreme response
style may result in slightly skewed and platykurtic distributions (Weijters, Geuens, &
Schillewaert, 2010). The parameter estimates obtained with ML are not asymptotically
efficient when the normality assumption is not tenable. The obtained aCov(Θ)ML in
equation (26) is no longer consistent for the asymptotic covariance matrix of Θ, leading to
inaccurate standard error estimates (Yuan, Bentler, & Zhang, 2005; Yuan & Hayashi, 2006).
Rather, a consistent estimator of the asymptotic covariance matrix of the parameter estimates
Θ for MLR can be estimated using the pseudo maximum likelihood (PML) approach
(Asparouhov & Muthén, 2005; Savalei, 2010; Yuan & Schuster, 2013):

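\[ \operatorname{aCov}(\boldsymbol{\Theta})_{\text{MLR}} = N^{-1} (\boldsymbol{\Delta}' \mathbf{I}_{OB} \boldsymbol{\Delta})^{-1} \boldsymbol{\Delta}' \mathbf{I}_{OB} \mathbf{V} \mathbf{I}_{OB} \boldsymbol{\Delta} (\boldsymbol{\Delta}' \mathbf{I}_{OB} \boldsymbol{\Delta})^{-1}, \tag{28} \]

and

\[ \mathbf{I}_{OB} = \mathbf{D}' \left\{ \boldsymbol{\Sigma}^{-1}(\boldsymbol{\Theta}) \otimes \left[ \boldsymbol{\Sigma}^{-1}(\boldsymbol{\Theta})\, \mathbf{S}\, \boldsymbol{\Sigma}^{-1}(\boldsymbol{\Theta}) - \tfrac{1}{2} \boldsymbol{\Sigma}^{-1}(\boldsymbol{\Theta}) \right] \right\} \mathbf{D}, \tag{29} \]

where $\boldsymbol{\Delta} = \partial\boldsymbol{\sigma}(\boldsymbol{\Theta}) / \partial\boldsymbol{\Theta}$ is the matrix of model first derivatives evaluated at the parameter estimates
Θ, (𝚫′𝐈𝐎𝐁 𝚫) is the estimated “observed” information matrix, and 𝐕 is the estimated asymptotic
covariance matrix of S. The “duplication” matrix D is of order r² × ½r(r+1) (r = the number of
observed variables in Σ(Θ); see Magnus & Neudecker, 1986, p. 172) and ⊗ denotes a
Kronecker product. Note that D is utilized to transform an r² × r² symmetric matrix,
Σ−1(Θ)⊗[Σ−1(Θ)SΣ−1(Θ) – ½Σ−1(Θ)], into a ½r(r+1) × ½r(r+1) symmetric matrix, 𝐈𝐎𝐁. The
middle matrix 𝚫′𝐈𝐎𝐁 𝐕𝐈𝐎𝐁 𝚫 contains the sample estimates of skewness and kurtosis of observed
variables in order to correct the possible violation of normality assumption (Yuan, Bentler, &
Zhang, 2005). When non-normal data are modeled, the ML standard error estimates are in general
deflated, whereas the robust standard errors obtained with MLR are adjusted
upward to alleviate some of this underestimation.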
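
The role of the duplication matrix D can be illustrated with a short sketch. The function names are hypothetical, column-major vec(.) ordering is assumed, and the second function simply transcribes equation (29) for made-up 2 × 2 matrices; it is not the Mplus implementation.

```python
import numpy as np

def duplication_matrix(r):
    """D of order r^2 x r(r+1)/2 with D @ vech(A) == vec(A) for symmetric A."""
    d = np.zeros((r * r, r * (r + 1) // 2))
    col = 0
    for j in range(r):
        for i in range(j, r):
            d[j * r + i, col] = 1.0   # position of A[i, j] in column-major vec(A)
            d[i * r + j, col] = 1.0   # position of A[j, i] in column-major vec(A)
            col += 1
    return d

def observed_information_weight(sigma, s):
    """D'{Sigma^{-1} (x) [Sigma^{-1} S Sigma^{-1} - 0.5 Sigma^{-1}]}D, as in equation (29)."""
    sigma_inv = np.linalg.inv(sigma)
    inner = sigma_inv @ s @ sigma_inv - 0.5 * sigma_inv
    d = duplication_matrix(sigma.shape[0])
    return d.T @ np.kron(sigma_inv, inner) @ d

# Made-up symmetric matrix to check the defining property of D.
a_sym = np.array([[2.0, 0.5, 0.1],
                  [0.5, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])
vech_a = np.array([a_sym[i, j] for j in range(3) for i in range(j, 3)])
vec_a = a_sym.flatten(order="F")
d3 = duplication_matrix(3)

sigma_toy = np.array([[1.0, 0.4],
                      [0.4, 1.0]])
i_ob = observed_information_weight(sigma_toy, sigma_toy)   # case S = Sigma
```

For r = 2 the Kronecker product is 4 × 4 and the resulting 𝐈𝐎𝐁 is 3 × 3, in line with the orders stated above; when S equals Σ(Θ), the bracketed term reduces to ½Σ−1(Θ).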

As is well known, non-normality of observed variables could lead to substantial
overestimation of chi-square goodness of fit statistics. Similar to the two variants of the
Yuan-Bentler (1997, 1998) and the Satorra & Bentler (1994) robust chi-square statistics, a
modification of chi-square statistics proposed by Asparouhov & Muthén (2005) using the
pseudo maximum likelihood (PML) estimator is defined as

\[ T_{\text{MLR}} = \tilde{a}\,T_{\text{ML}}, \qquad df = s - t, \tag{30} \]

where

\[ \tilde{a} = \frac{df}{\operatorname{trace}[\mathbf{V}\mathbf{I}_{OB}] - \operatorname{trace}[\operatorname{aCov}(\boldsymbol{\Theta})_{\text{MLR}}\,(\boldsymbol{\Delta}'\mathbf{I}_{OB}\boldsymbol{\Delta})]} \]

is a scale factor, TML = (N − 1) FML(Θ, S), TMLR
denotes the robust ML chi-square test statistic using MLR estimation in Mplus, 𝐕 is the
estimated asymptotic covariance matrix of S, s = the number of unique elements in S, and t =
the number of total model parameters. The scale factor ã is used to remove the effect of
skewness and kurtosis of observed data in order to adjust for deviation from normality. TMLR
was found to perform well under a variety of conditions investigated by Asparouhov & Muthén
(2005). It is worth noting that the downward adjustment of the test statistic in MLR can yield
a test statistic whose distribution more closely follows a central chi-square in the
presence of non-normality. Note that different robust corrections to standard errors and
chi-square statistics in maximum likelihood estimation computations are also available but
outside the scope of this study (i.e., ESTIMATOR = MLM or MLMV, see Appendix C for details).

CHAPTER 2
EMPIRICAL FINDINGS

A review of simulation studies across six high-impact journals was conducted to
determine whether a Monte Carlo simulation study examined ordinal confirmatory factor
analysis or structural equation modeling with ordinal observed variables over 20 years
(between the years 1994 and 2013) in Structural Equation Modeling, Psychological Methods,
Multivariate Behavioral Research, Psychometrika, Educational and Psychological Measurement,
and Applied Psychological Measurement. I have identified a total of 13 studies carrying out
structural equation modeling with ordinal observed variables (4 articles) or ordinal
confirmatory factor analysis (9 articles). Two of the four studies using structural regression models
with ordinal observed indicators examined the effect of parceling methods for categorical
variable methodology, which is less relevant to the goals of the current research. For the other two
studies, Anderson (1996) mainly focused on an evaluation of distributional misspecification
corrections applied to the McDonald Fit Index that was rarely used in empirical studies and
typically not provided in software programs. Coenders, Satorra, and Saris (1997) examined
the performance of three correlation estimation methods in an SR model, and their attention
was only restricted to point estimates of model parameters using the normal-theory maximum
likelihood method and the weighted least squares procedure. The nine studies associated with
ordinal confirmatory factor analysis typically compared the relative performance of different
estimators on parameter estimates (i.e., factor loadings, inter-factor correlations if any),
standard errors, and chi-square goodness of fit statistics. The empirical findings, using ML and
the three robust estimators (MLR, ULSMV, and WLSMV), can be briefly summarized below.
Table 1 lists 6 major simulation studies that have investigated the performance of the three
robust estimators in ordinal CFA models. Because MLR has not been systematically studied in
the previous simulation literature across the six aforementioned journals, a review of robust
corrections to standard errors and chi-square goodness of fit statistics in least squares and
maximum likelihood estimators is included with all other robust methods in Mplus, EQS, and
LISREL (see Table 2 for comparison of the three robust estimators in the 3 different SEM
software programs; see Table 3 for comparison of the two major estimation approaches in
Mplus). While these robust standard errors and chi-square statistics may exhibit very slight
differences across varying adjustments in a finite sample, they should be asymptotically
equivalent as the sample size approaches infinity.

Parameter Estimates
Factor loading estimates were less biased by WLSMV than ML and MLR, even with more
than five response alternatives (Beauducel & Herzberg, 2006). Relative bias in factor loading
estimates from ULSMV was equal to or smaller than WLSMV (Forero, Maydeu-Olivares, &
Gallardo-Pujol, 2009) across the conditions (i.e., varying distributions of ordinal observed
variables, numbers of observed variables’ categories) investigated, and relative bias in factor
loading estimates from ULSMV was smaller than ML and MLR even with more than seven
response alternatives (Rhemtulla, Brosseau-Liard, & Savalei, 2012), irrespective of the level of
asymmetric distributions of ordinal observed variables. Inter-factor correlations were,
generally, less overestimated by ML and MLR than WLSMV (Beauducel & Herzberg, 2006) and
ULSMV (Rhemtulla, Brosseau-Liard, & Savalei, 2012) across varying numbers of observed
variables’ categories from two to seven, except under extremely asymmetric distributions of
ordinal observed indicators. However, Yang-Wallentin, Jöreskog, & Luo (2010) gave empirical
evidence that parameter estimates (consisting of factor loadings and inter-factor correlations
jointly) were essentially unbiased for ULSMV, WLSMV, MLR, and ML, regardless of the number
of observed variables’ categories and the level of asymmetric distributions of ordinal observed
variables. Lei (2009) found that relative bias in parameter estimates (including both factor
loadings and inter-factor correlations) was generally negligible for WLSMV, MLR, and ML across
different distributions of ordinal observed variables. Oranje (2003) concluded that ML, MLR,
and WLSMV produced equally accurate parameter estimates across different numbers of
observed variables’ categories.

Standard Error Estimates
The “uncorrected” standard errors of factor loadings produced by ML were higher than
the robust standard errors of those obtained by WLSMV across different numbers of observed
variables’ categories (Beauducel & Herzberg, 2006). However, the “uncorrected” standard
errors of factor loadings produced by ULS were more accurate, in terms of the standard
deviation of parameter estimates over replications, than the robust standard errors of factor
loadings produced by WLSMV (Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009). The robust
standard errors of parameter estimates (including both factor loadings and inter-factor
correlations) produced by ULSMV and WLSMV were generally less biased than those obtained
by robust ML, regardless of the number of observed variables’ categories and the level of
asymmetric distributions of ordinal observed variables (Yang-Wallentin, Jöreskog, & Luo, 2010;
Lei, 2009). More specifically, Rhemtulla, Brosseau-Liard, and Savalei (2012) revealed that
ULSMV produced less biased standard errors of factor loadings than MLMV, whereas ULSMV
produced more biased standard errors of inter-factor correlations than MLMV, consistently
across different numbers of observed variables’ categories.
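
The studies summarized above judge estimators by the bias of parameter estimates and by how well estimated standard errors track the standard deviation of the estimates across replications. The sketch below shows one common way to compute such summaries; the formulas, function names, and toy values are assumptions for illustration and are not necessarily the outcome measures defined later in this dissertation.

```python
from statistics import mean, stdev

def relative_bias(estimates, true_value):
    """(mean estimate - population value) / population value."""
    return (mean(estimates) - true_value) / true_value

def se_relative_bias(standard_errors, estimates):
    """(mean estimated SE - empirical SD of estimates) / empirical SD of estimates."""
    empirical_sd = stdev(estimates)
    return (mean(standard_errors) - empirical_sd) / empirical_sd

# Made-up results from three replications with a population loading of 0.70.
loading_estimates = [0.60, 0.70, 0.80]
loading_ses = [0.09, 0.09, 0.09]
bias = relative_bias(loading_estimates, 0.70)
se_bias = se_relative_bias(loading_ses, loading_estimates)
```

In this toy case the loading is unbiased on average, while the negative standard error bias indicates that the estimated standard errors understate the actual sampling variability.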

Chi-Square Goodness of Fit Statistics
The “uncorrected” chi-square statistics produced by ML tended to over-reject the
proposed models compared to the robust chi-square statistics obtained by WLSMV, when the
number of observed variables’ categories was less than 4 (Beauducel & Herzberg, 2006). The
“mean-adjusted” chi-square statistics obtained by MLM provided the most correct rejection
rates compared to those obtained by WLSMV across varying numbers of observed variables’
categories (Oranje, 2003). In contrast, the “mean- and variance-adjusted” chi-square
statistics obtained by WLSMV have been shown to be slightly more powerful than the
“mean-adjusted” chi-square statistics produced by MLM across different levels of asymmetric
distributions of ordinal observed variables (Lei, 2009). On the other hand, the “mean- and
variance-adjusted” chi-square statistics were comparably good for MLMV and ULSMV when the
number of observed variables’ categories ranged from four to six. Furthermore, when the
number of observed variables’ categories was two or three, the “mean- and variance-adjusted”
chi-square statistics produced by MLMV tended to over-reject the proposed models, and those
obtained by ULSMV were likely to under-reject the proposed models (Rhemtulla,
Brosseau-Liard, & Savalei, 2012). Finally, the “mean-adjusted” chi-square statistics were
essentially equal for MLM, ULSM, and WLSM (Yang-Wallentin, Jöreskog, & Luo, 2010),
regardless of the number of observed variables’ categories and the level of asymmetric
distributions of ordinal observed variables.

CHAPTER 3
PRESENT STUDY

The present study was designed to address gaps in the literature and to advance our
understanding of the impact of ordinal observed variables on parameter estimates, standard
errors, and chi-square goodness of fit statistics in a structural regression (SR) model using ML,
MLR, ULSMV, and WLSMV. An SR model was selected to broaden the scope of methodological
perspectives beyond any previous study in terms of model complexity, because numerous
simulation studies have been conducted with ordinal CFA models under extensive conditions.
Two literature reviews reported that the median number of latent factors for a CFA model was
3, and the median number of observed indicators in total was 16, indicating that a CFA model
in the areas of scale development and item analysis is in general smaller than an SR model in
applied settings (Jackson, Gillaspy, & Purc-Stephenson, 2009; DiStefano & Hess, 2005). One
major limitation of previous ordinal CFA simulation studies is that researchers have devoted
excessive attention to factor loading estimates instead of inter-factor correlation estimates.
More specifically, they have (1) simply excluded the inter-factor correlations (e.g., Forero,
Maydeu-Olivares, & Gallardo-Pujol, 2009); (2) used homogeneous values for the population
inter-factor correlations (e.g., Beauducel & Herzberg, 2006; DiStefano, 2002); or (3)
examined the joint performance of both factor loading and inter-factor correlation estimates
(e.g., Lei, 2009; Yang-Wallentin, Jöreskog, & Luo, 2010), leaving the performance of
inter-factor correlation estimates undetermined.

When a researcher employs an SR model to study relational phenomena among latent
constructs of interest, the ultimate goal is to identify successfully the structural coefficients
(inter-factor correlations, structural regression coefficients, and possibly mediating effects),
given a tenable measurement model. The SR model proposed here has the practical advantage
of allowing applied researchers to study the inter-factor correlations, direct effects, and
mediating/indirect effects, which are not uncommon in published research. Heterogeneity of
structural regression coefficients and inter-factor correlations in the proposed SR model is
more realistic in applied settings and also makes it possible to assess the effect of structural
coefficient magnitude.

Hoogland and Boomsma (1998) systematically reviewed 34 studies in SEM from 1984 to
1994. They found that 89% used CFA models and 11% employed SR models. The
aforementioned literature search from 1994 to 2013 that I conducted reflects a growing
interest in SR models (about 30% of 59 studies). Researchers also arrived at the same
recommendation that future research on a more complex SR model is needed (see, e.g.,
Bandalos, 2006; Beauducel & Herzberg, 2006; Flora & Curran, 2004; Rhemtulla,
Brosseau-Liard, & Savalei, 2012). In this study, an effort was undertaken to extend the
existing literature (Anderson, 1996; Coenders, Satorra, & Saris, 1997; Ethington, 1987) on
sample size, ordinal observed distributions, and number of observed variables’ categories to a
broader set of structural regression models with ordinal observed variables. This study aims to
address several important limitations of generalizability applied to the work by Anderson
(1996) and Coenders, Satorra, and Saris (1997), in which (1) both studies failed to incorporate
robust corrections to standard errors and chi-square statistics due to unavailability of
computer programs; (2) one merely used two ordinal observed variables for each latent
construct in the SR model, not generally reflecting realistic applications; and (3) both left the
effect of number of observed variables’ categories outside the simulation design. Therefore,
the proposed model design in this study first attempts to complement the related prior
research, and findings are expected to assist applied researchers in making more informed
decisions while analyzing an SR model with ordinal observed indicators.

Second, although MLR is not designed specifically for ordinal data on response variables,
one may assume that data are “approximately continuous” if the number of observed variables’
categories is sufficiently large. In practice, empirical researchers have consistently used
MLR in ordinal CFA or CFA-based models when the number of categories for each observed
variable is more than five. Yet, unlike other robust ML estimators, MLR implemented in Mplus
has not been systematically evaluated by means of a Monte Carlo simulation study in the
literature, although its robust correction is similar but not equivalent to other robust ML
estimators (e.g., MLM in Mplus or ML, ROBUST in EQS). The inclusion of WLSMV and ULSMV in
the study also contributes to the existing literature because (1) MLR and WLSMV are very often
regarded as the most common estimators in an SR model with ordinal observed indicators due
to the violation of normality assumption; and (2) ULSMV has been shown to have some relative
superiority over ML with robust corrections in the analysis of ordinal confirmatory factor
models (Yang-Wallentin, Jöreskog, & Luo, 2010; Rhemtulla, Brosseau-Liard, & Savalei, 2012),
although it has appeared less often in applied research.

Comparison of WLSMV and ULSMV can shed some light on the effectiveness of the two
weight matrices. More specifically, by looking into the weight matrices of ULSMV and WLSMV,
it seems that using the identity matrix I essentially makes the parameter estimates consistent,
and adding diagonal weights may possibly bring about a small improvement on parameter
estimates (Muthén & Muthén, 2010). This study’s specificity in evaluating the effectiveness
of diagonal weights can contribute to scholarly understanding of how the diagonal weight
matrix improves the accuracy and precision of parameter and standard error estimates. Finally,
as clearly explicated in the existing literature, it is generally not recommended to use the
normal theory-based maximum likelihood (ML) method when ordinal observed variables are
analyzed. However, ML estimation in this study served as a baseline to explore the differences
between ML and the three robust estimators. Therefore, it is worthwhile to investigate the
performance of the four estimators in an SR model with ordinal data on response variables.

Third, several simulation studies have also examined the impact of the number of
observed variables’ categories on ML and other least squares estimators in ordinal CFA (see,
e.g., Rhemtulla, Brosseau-Liard, & Savalei, 2012; Yang-Wallentin, Jöreskog, & Luo, 2010).
However, what is not yet known is the impact of the number of observed variables’
categories on the overall quality of parameter estimates, especially structural regression
coefficients, robust standard error estimates, and the sensitivity of adjusted chi-square
statistics using MLR, ULSMV, and WLSMV in an SR model. Additionally, this study compared
the behavior of the MLR, ULSMV, and WLSMV estimators under varying degrees of normality
violation in an SR model, which extends the literature by the inclusion of asymmetric
distributions of ordinal observed variables (Beauducel & Herzberg, 2006). MLR has been
developed to permit modeling non-normal (approximately) continuous variables, whereas
ULSMV and WLSMV have been implemented to deal with non-normal data because both
estimators make no distributional assumption germane to the shape of observed variables in
the population from which samples are drawn. When ordinal observed variables exhibit
different levels of asymmetric distributions, the standard error estimates and chi-square
statistics produced from these estimators are different. Without better understanding of the
robustness of these estimators against non-normality, researchers are unlikely to be able to settle
upon an appropriate estimation method under suboptimal conditions in applications (e.g.,
Boomsma, 2013). The choice of estimation methods thus depends on the continuity
(concerning the number of observed variables’ categories) and the distribution of the ordinal
observed measures.

Finally, this study was designed to examine the effect of sample size while utilizing these
four estimators, because researchers have noted that a desirable sample size is known to be
an important factor in SR models. Sample size is almost universally an experimental factor in
a Monte Carlo simulation study (Paxton, Curran, Bollen, Kirby, & Chen, 2001). Sample size has
been shown to interact with the characteristics of the data (e.g., non-normality). A small
sample size may not only cause inaccurate parameter estimates and unreliable standard errors,
but can also give problems of non-convergence and improper or inadmissible solutions. In
addition, for a small sample size, the test statistic is likely not asymptotically chi-square
distributed. Applied researchers are therefore interested in determining the smallest sample
size (i.e., the sufficient sample size) at which the accuracy of parameter estimates, the
stability of standard error estimates, and the robustness of chi-square statistics can be
fulfilled.

The four estimators were evaluated by the quality of parameter estimates (i.e., factor
loadings, inter-factor correlations, and structural regression coefficients) and standard errors,
and by the performance of chi-square goodness of fit statistics, detailed further in the Outcome
Variables section of this thesis. In summary, this study builds on previous simulation studies
and mixed findings in pursuing the following two research questions:

1. Are any of the four estimators (ML, MLR, ULSMV, and WLSMV) consistently better or
worse than the others in the estimation of model parameters, standard errors, and
chi-square goodness of fit statistics across the experimental conditions investigated?

2. Are there any effects of the number of observed variables’ categories, the level of
asymmetric distributions of ordinal observed variables, and sample size on the
performance of ML, MLR, ULSMV, and WLSMV estimates in an SR model?

CHAPTER 4
METHOD

A Monte Carlo simulation study was carried out to determine what effects different
configurations of the number of observed variables’ categories, the level of asymmetric
distributions of ordinal observed variables, and sample size have on parameter estimates,
standard errors, and chi-square goodness of fit statistics in a five-factor structural regression
model with ordinal observed variables.
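
The fully crossed design can be enumerated directly. The factor levels and the 500 replications per condition below are taken from the abstract of this dissertation; the variable names are illustrative.

```python
from itertools import product

# Factor levels as stated in the abstract.
distributions = ("symmetry", "slight asymmetry", "moderate asymmetry", "bipolarization")
categories = (4, 5, 6, 7)
sample_sizes = (200, 300, 400, 500, 750, 1000, 1500)
replications_per_condition = 500

# Every combination of distribution, number of categories, and sample size.
conditions = list(product(distributions, categories, sample_sizes))
total_data_sets = len(conditions) * replications_per_condition
```

The product of the three factors gives the 4 × 4 × 7 = 112 experimental conditions, and 500 replications per condition imply 56,000 generated data sets in total.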

Model Specification
A five-factor structural regression model (SRM) with ordinal data on response variables
is depicted in Figure 1. This model, with each factor measured by 4
ordinal observed variables, was examined as representative of the “medium-sized” SEM
model specification frequently encountered in applications. To ensure representativeness of
the model design from an applied standpoint, I conducted another review of 29 empirical
studies using structural equation modeling from journals published by the American
Psychological Association, the APA Educational Publishing Foundation, and the Canadian
Psychological Association (through the PsycARTICLES database) during 2013, and 7 empirical
studies that appeared in Structural Equation Modeling since 1994. In terms of the size of model
being tested, the median number of latent factors across 36 studies was 5 (with 38% of the
models tested), and the median number of total observed variables was 18 (with 15 and 24
representing the 25th and 75th percentiles, respectively).

It is critical to choose a number of observed indicators per factor that is not too small (e.g.,
2 indicators per factor; see Coenders, Satorra, & Saris, 1997; Ethington, 1987), yet remains
practical in the context of a simulation study. In structural equation modeling applications, the
number of indicators per factor typically falls within the range of 2 to 5 (Ding, Velicer, & Harlow,
1995), and five or more indicators per factor have rarely appeared in the literature (Gerbing &
Anderson, 1985). I chose 4 ordinal observed indicators per factor, resulting in 20 ordinal
observed variables in total. This is a reasonable number of observed variables according to the
reviews of both Monte Carlo simulation studies and the applied literature, although it is
smaller than in some larger studies (more than 40 ordinal observed indicators in
total; e.g., see Beauducel & Herzberg, 2006; Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009).
Prior research has shown that the performance of parameter estimates and standard errors
improves as the number of observed variables per factor increases, conditional on a set of
good quality indicators (Forero & Maydeu-Olivares, 2009; Forero, Maydeu-Olivares, &
Gallardo-Pujol, 2009; Gagné & Hancock, 2006; Gerbing & Anderson, 1985; Velicer & Fava,
1998). Marsh, Hau, Balla, and Grayson (1998) noted that the maximum accuracy of parameter
estimates appeared to be reached when the number of observed variables per factor was 4,
and improved only trivially as the number of observed variables for each factor increased further.

Four estimation procedures available in Mplus were used: ML, MLR, ULSMV, and WLSMV.
For the first and second estimation procedures, each factor was measured by
four ordinal observed indicators that were treated as if they were continuous variables. The
parameter estimates, standard errors, and chi-square goodness of fit statistic were obtained
using ML and MLR. Since the analyzed ordinal observed indicators were assumed to be
approximately continuous in this case, data analysis for the ML and MLR estimators was based
on the sample covariance matrix. For the third and fourth estimation procedures,
each ordinal observed indicator was instead treated as a categorization of its continuous,
normally distributed latent response variable. The asymptotic covariance matrix of the
polychoric correlation and threshold estimates was used in the ULSMV and WLSMV analyses to
obtain the parameter estimates, standard errors, and chi-square goodness of fit statistic. The
analyzed ordinal observed indicators in this case were specified as categorical variables in Mplus.

Simulation Design
For the sake of simplicity, homogeneous factor loadings are commonly used in
simulation studies (see, e.g., Anderson, 1996; Flora & Curran, 2004; Forero &
Maydeu-Olivares, 2009), which may not be representative of real-world conditions. In this
study, four factor loadings (Λy* and Λx*) were held at .8, .7, .6, and .5, with corresponding
residual variances (Θε and Θδ) automatically set to .36, .51, .64, and .75 under a standardized
solution (according to equations (4.3) and (4.4)) in the population model across all exogenous
and endogenous latent variables. Standardized factor loadings commonly range from .4
to .9 in research practice and simulation studies (Bandalos, 2006; Ethington, 1987; Hoogland
& Boomsma, 1998; Paxton, Curran, Bollen, Kirby, & Chen, 2001). The variance-covariance
matrix of the two exogenous latent variables (Φ) was specified in two parts: (1) the single
inter-factor correlation was set to .3 in the population, a value typical of both the
simulation studies (66% using .3) and the applied literature (about 50% between .2 and .4)
that I reviewed; and (2) the two exogenous factor variances
were set equal to 1. The two matrices of structural regression coefficients, B and Γ, were
set up as

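\[
\mathbf{B} =
\begin{bmatrix}
0 & 0 & 0 \\
.3 & 0 & 0 \\
.2 & .5 & 0
\end{bmatrix}
\quad \text{and} \quad
\boldsymbol{\Gamma} =
\begin{bmatrix}
.4 & .6 \\
.4 & .2 \\
.1 & .1
\end{bmatrix}.
\]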

The residual variances of the three endogenous latent variables (Ψ) were set
at .336, .436, and .379, computed from equation (4.4), so that the structural regression
coefficients are standardized. Standardized solutions commonly range
from .1 to .7 for structural regression coefficients, and from .2 to .8 for residual variances (i.e.,
1−R²) in practice and simulation studies. Structural regression coefficients below .1 are, in
general, neither statistically nor practically significant in applied research (Bandalos, 2006;
Ethington, 1987; Hoogland & Boomsma, 1998; Paxton, Curran, Bollen, Kirby, & Chen, 2001).
Note that the structural model is saturated (i.e., no relationships among the
exogenous and endogenous latent variables are left unspecified).
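
The population values above can be checked numerically. The following Python sketch is illustrative only and is not the code used in the study. It assumes that every latent variable has unit total variance (the standardized solution referred to in the text), that the structural model is recursive, and that the rows of B and Γ are ordered as the first, second, and third endogenous factors. The function and variable names are hypothetical.

```python
# Population values as given in the text; names are illustrative.
B = [[0.0, 0.0, 0.0],
     [0.3, 0.0, 0.0],
     [0.2, 0.5, 0.0]]
GAMMA = [[0.4, 0.6],
         [0.4, 0.2],
         [0.1, 0.1]]
PHI = [[1.0, 0.3],
       [0.3, 1.0]]
LOADINGS = [0.8, 0.7, 0.6, 0.5]


def residual_variances(b, gamma, phi):
    """Return psi_k = 1 - explained variance for each endogenous factor.

    Assumes a recursive model and unit total variance for every factor.
    """
    # cov is the covariance matrix of (xi_1, xi_2, eta_1, ...) built so far.
    cov = [row[:] for row in phi]
    psi = []
    for k in range(len(b)):
        # Weights of eta_k on the exogenous factors and on earlier eta's.
        w = gamma[k] + b[k][:k]
        # Covariance of eta_k with every variable already in cov.
        c = [sum(w[j] * cov[i][j] for j in range(len(w))) for i in range(len(w))]
        explained = sum(w[i] * c[i] for i in range(len(w)))
        psi.append(1.0 - explained)
        # Append eta_k (total variance fixed at 1) to the covariance matrix.
        for i, row in enumerate(cov):
            row.append(c[i])
        cov.append(c + [1.0])
    return psi


psi = residual_variances(B, GAMMA, PHI)
# Indicator residual variances under a standardized solution: 1 - lambda^2.
theta = [round(1.0 - lam ** 2, 2) for lam in LOADINGS]
```

Under these assumptions the computed residual variances agree with the values reported above to within .001, and the indicator residual variances reproduce .36, .51, .64, and .75.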

Number of Observed Variables’ Categories
Of the 157 psychometric measures in the SEM applications search that I conducted, the
most common number of response categories was five (39.4%), followed by seven (29.9%), four
(10.2%), and six (8.3%). Odd-numbered Likert scales with a middle response category
seem to occur more frequently in empirical studies. Prior simulation studies of SR models with
ordinal observed indicators did not fully examine the effect of the number of observed variables’
categories (e.g., Anderson, 1996; Coenders, Satorra, & Saris, 1997; Ethington, 1987).
However, MLR has generally been considered “appropriate” in the majority of published
studies when ordinal observed variables have more than five response categories without
ceiling or floor effects. The chief goal here is to examine whether this general
recommendation is empirically valid in an SR model. In order to explore the impact of
categorization, four, five, six, and seven categories were generated for each ordinal observed
indicator within different levels of ordinal observed distributions; details are in the next
section.

Ordinal Observed Distributions
Micceri (1989) found that non-normality in the form of asymmetry for psychometric
distributions (due to categorization) was very common in applied studies. Only about 3% of the
125 distributions he examined were close to normal and near symmetric, and over 80%
exhibited at least slight or moderate asymmetry. Micceri attempted to provide an empirical
base from which a simulation study could be closely related to the real-world data. Therefore,
four ordinal observed distributions that vary in symmetry and response style were
manipulated in this thesis: (1) a symmetric distribution, (2) a slightly asymmetric distribution,
(3) a moderately asymmetric distribution, and (4) a bipolarized distribution. When responding
to Likert-type items in the educational, behavioral, and social sciences, respondents vary in
their endorsement and exhibit different response styles. Distribution (1) can be considered as
middle-category response style (reference pattern), Distributions (2) and (3) as acquiescence
response style (disacquiescence if going toward the opposite direction), and Distribution (4) as
extreme response style (Weijters, Geuens, & Schillewaert, 2010). For a symmetric distribution,
the middle categories had the highest probabilities; for slightly and moderately asymmetric
distributions, the probabilities increased from low to high categories to different degrees; and
for a bipolarized distribution, the higher probabilities were placed on both end-points.

For the sake of simplification, a standard normal distribution was selected for each latent
response variable in the data generation (i.e., with a mean of zero and a variance of one), which led
to a zero mean structure. Random draws of the vectors y* and x* were made from a multivariate
normal distribution with a zero mean vector (i.e., µ = 0) and a correlation matrix Σ* (see
equations (4.1) and (4.2)). The multivariate normally distributed data were generated first and
then categorized using prespecified thresholds to induce the desired asymmetric distributions and
response probabilities along the standard normal distributions (Muthén & Muthén, 2010).
Sixteen sets of thresholds (z-scores) were used to categorize the continuous response
distributions into ordinal observed data. That is, the response probability for each category is
the area under the standard normal density function between a pair of adjacent thresholds.
In order to limit the complexity of the simulation, the underlying normal
distribution was not manipulated in the study, because doing so would require an additional factor with
several distributions of interest that would multiply the number of experimental design
conditions beyond practical manageability. More importantly, polychoric correlation
estimates have been shown to be robust against violation of the latent normality assumption
(Coenders, Satorra, & Saris, 1996; Flora & Curran, 2004; Micceri, 1989; Quiroga, 1992).
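
The categorization step can be illustrated with a short Python sketch. It is a simplified, univariate illustration rather than the Mplus procedure used in the study: it draws independent standard normal values instead of multivariate normal vectors with correlation matrix Σ*, and it uses the thresholds reported in this chapter for the four-category symmetry condition. The function names and the seed are hypothetical.

```python
import bisect
import random

# Thresholds (z-scores) for the four-category symmetry condition.
THRESHOLDS = [-1.282, 0.0, 1.282]


def categorize(z, thresholds):
    """Map a latent response value z to an ordinal category 1..C."""
    # bisect_right counts how many thresholds lie at or below z.
    return bisect.bisect_right(thresholds, z) + 1


def simulate_proportions(n, thresholds, seed=12345):
    """Draw n standard normal values and return observed category proportions."""
    rng = random.Random(seed)  # arbitrary fixed seed so draws can be regenerated
    counts = [0] * (len(thresholds) + 1)
    for _ in range(n):
        counts[categorize(rng.gauss(0.0, 1.0), thresholds) - 1] += 1
    return [count / n for count in counts]


proportions = simulate_proportions(20000, THRESHOLDS)
```

With a large number of draws, the observed proportions should approach the target response probabilities of 10%, 40%, 40%, and 10%.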

Response probabilities of ordinal observed indicators used in the study are displayed in
Figure 2. Note that 1(a) to 1(d) represent a symmetric distribution with zero skewness and
kurtosis from −.49 to −.48; 2(a) to 2(d) represent a slightly asymmetric distribution with
skewness from −.92 to −.91 and kurtosis from .80 to .84; 3(a) to 3(d) represent a moderately
asymmetric distribution with skewness from −1.39 to −1.38 and kurtosis from 1.14 to 1.19;
and 4(a) to 4(d) represent a bipolarized distribution with skewness from −.32 to −.31 and
kurtosis from −1.58 to −1.57.

In the symmetry condition, the threshold values were [−1.282, 0, 1.282] for four
categories with 10%, 40%, 40%, and 10% falling into each category; [−1.282, −.524, .524,
1.282] for five categories with 10%, 20%, 40%, 20%, and 10%; [−1.645, −.806, 0, .806,
1.645] for six categories with 5%, 16%, 29%, 29%, 16%, and 5%; and [−1.645, −.954,
−.385, .385, .954, 1.645] for seven categories with 5%, 12%, 18%, 30%, 18%, 12%, and 5%.
In the slight asymmetry condition, the threshold values were [−1.645, −1.08, .412] for four
categories with 5%, 9%, 52%, and 34% falling into each category; [−1.751, −1.341,
−.524, .706] for five categories with 4%, 5%, 21%, 46%, and 24%; [−1.751, −1.341, −1.08,
0, .878] for six categories with 4%, 5%, 5%, 36%, 31%, and 19%; and [−1.751, −1.341,
−1.036, −.613, .496, 1.341] for seven categories with 4%, 5%, 6%, 12%, 42%, 22%, and
9%. In the moderate asymmetry condition, the threshold values were [−1.645, −1.08, −.253]
for four categories with 5%, 9%, 26%, and 60% falling into each category; [−1.751, −1.282,
−.842, .05] for five categories with 4%, 6%, 10%, 32%, and 48%; [−1.751, −1.341, −1.036,
−.674, .202] for six categories with 4%, 5%, 6%, 10%, 33%, and 42%; and [−1.751, −1.341,
−1.126, −.878, −.553, .279] for seven categories with 4%, 4%, 5%, 6%, 10%, 32%, and
39%. In the bipolarization condition, the threshold values were [−.524, −.253, .253] for four
categories with 30%, 10%, 20%, and 40% falling into each category; [−.583, −.332,
−.151, .332] for five categories with 28%, 9%, 7%, 19%, and 37%; [−.674, −.385, −.253,
0, .385] for six categories with 25%, 10%, 5%, 10%, 15%, and 35%; and [−.842, −.468,
−.305, −.176, .100, .524] for seven categories with 20%, 12%, 6%, 5%, 11%, 16%, and
30%. A check on the generated data sets was made to ensure that the response probabilities
of observed variables approximated the four pre-specified targets (i.e., symmetry, slight and
moderate asymmetry, and bipolarization).
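
The correspondence between thresholds and response probabilities described above can be verified directly. The sketch below is a minimal illustration using Python's standard library and is not part of the original analysis. It computes the area under the standard normal density between adjacent thresholds and, in the other direction, recovers thresholds from target probabilities. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def category_probabilities(thresholds):
    """Area under the standard normal density between adjacent thresholds."""
    cumulative = [0.0] + [STD_NORMAL.cdf(t) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


def thresholds_from_probabilities(probabilities):
    """Inverse step: z-score thresholds that reproduce target probabilities."""
    thresholds = []
    running = 0.0
    for p in probabilities[:-1]:  # the last category has no upper threshold
        running += p
        thresholds.append(STD_NORMAL.inv_cdf(running))
    return thresholds


# Four-category slight asymmetry condition: thresholds to probabilities.
slight_4 = category_probabilities([-1.645, -1.08, 0.412])
# Four-category symmetry condition: probabilities back to thresholds.
symmetric_4 = thresholds_from_probabilities([0.10, 0.40, 0.40, 0.10])
```

For the two conditions shown, the results match the reported values (5%, 9%, 52%, and 34%; thresholds of −1.282, 0, and 1.282) up to rounding.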

Sample Size
Sample size in SEM has been shown to interact with model complexity (e.g.,
number of observed variables). The guideline for an adequate sample size in a CFA model is
commonly a function of the number of observed variables. For example, Jöreskog and Sörbom
(1986; 1996) recommended a sample size requirement of 1.5p(p+1), where p is the number
of observed variables. The five-factor structural regression model with 20 ordinal observed
indicators in this study therefore requires a minimum sample size of 1.5 × 20 × 21 = 630.
Additionally, if a sample size is
too small, polychoric correlation estimates may be unstable. Several reviews of published
applications of SEM and CFA have appeared. Breckler (1990) reviewed 72 studies in both CFA
and SEM between 1977 and 1987, and reported that the median sample size was 198. Only
25% of the models were tested on samples of more than 200. Medsker, Williams, and Holohan
(1994) identified 28 studies in both CFA and SEM between 1988 and 1993, and reported that
the average sample size was 299. DiStefano and Hess (2005) reviewed 101 studies in CFA
from 1990 to 2002, and reported that the median sample size was 377, and about 19% of the
models were tested on samples of less than 200. Jackson, Gillaspy, and Purc-Stephenson (2009)
systematically reviewed 194 studies in CFA from 1998 to 2006. They reported that the median
sample size was 389, and about 20% of the models were tested on samples of less than 200.
The SEM applications search that I conducted showed that the sample size ranged from 110 to
2,512, with a mean of 518, across 36 studies. The median sample size was 341, with the 25th
and 75th percentiles of 245 and 603 respectively. About 14% of the models were tested on
samples of less than 200. Overall, the sample sizes used for SEM and CFA models appear to
have increased over the past 35 years.

Seven different sample sizes commonly encountered in empirical investigations were
employed in this study: N = 200, 300, 400, 500, 750, 1,000, and 1,500 (see, e.g., Beauducel
& Herzberg, 2006; Flora & Curran, 2004; Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009). In
the case of a five-factor SR model with 20 ordinal indicators, sample sizes of N = 200 and 300 are
considered small, sample sizes between 400 and 750 medium, and sample sizes of N =
1,000 and 1,500 large. The corresponding ratios of N to p (N/p) are 10, 15, 20, 25, 37.5, 50,
and 75. The smallest ratio meets the minimum recommendation of a sample size N at least 10
times the number of variables p (Nunnally, 1978; DiStefano & Hess, 2005), and the ratios cover a wide
range of N/p values (about 94%, from 7.41 to 49.27, of those observed in the aforementioned
literature review). The values selected for this simulation study were intended to provide
information across a broad array of sample sizes, and the N/p values reflect what has
appeared in applied work.
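
The sample size arithmetic in this section can be reproduced as follows. This is a minimal sketch with hypothetical variable names. It uses only the values stated in the text (p = 20 and the seven sample sizes).

```python
P = 20  # observed variables: 5 factors x 4 indicators each
SAMPLE_SIZES = [200, 300, 400, 500, 750, 1000, 1500]

# Guideline attributed in the text to Joreskog and Sorbom: 1.5 * p * (p + 1).
minimum_n = 1.5 * P * (P + 1)

# Ratio of sample size to number of observed variables in each condition.
ratios = [n / P for n in SAMPLE_SIZES]

# Sample sizes in the design that fall below the guideline.
below_guideline = [n for n in SAMPLE_SIZES if n < minimum_n]
```

The guideline evaluates to 630, so the four smallest sample sizes in the design (200 to 500) fall below it, while N = 750 and above exceed it.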

Data Generation and Analysis
There are 4 (ordinal observed distributions) × 4 (number of observed variables’
categories) × 7 (sample size) = 112 experimental conditions in the study. A random seed was
set across experimental conditions for the random draws in the data generation process,
so that the data can be regenerated and the results can be
reproduced by other researchers. Five hundred data sets were generated per experimental
condition, yielding a total of 56,000 data sets. The choice of 500 replications was made with
consideration of sampling variance reduction, adequate power, and practical manageability
(Muthén, 2002). Note that this study did not consider the possible effects of missing data on
the performance of the four estimation methods but focused only on complete case analysis.
Model parameters, standard errors, and the chi-square goodness of fit statistic were estimated
for each replication using ML, MLR, ULSMV, and WLSMV. Data generation and analysis were
performed with Mplus 7 (Muthén & Muthén, 2010), unless explicitly noted otherwise. Appendix
D includes Mplus code for data generation and analysis.
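
The factorial design can be enumerated with a few lines of Python. The sketch below only illustrates the bookkeeping and is not the Mplus code in Appendix D. The seed value and the rule of one seed per condition are assumptions made for illustration, since the seeds used in the study are not given in this chapter.

```python
import itertools

DISTRIBUTIONS = ["symmetry", "slight asymmetry", "moderate asymmetry", "bipolarization"]
CATEGORIES = [4, 5, 6, 7]
SAMPLE_SIZES = [200, 300, 400, 500, 750, 1000, 1500]
REPLICATIONS = 500
BASE_SEED = 2014  # hypothetical value, not the seed used in the study

conditions = []
for index, (distribution, n_categories, sample_size) in enumerate(
    itertools.product(DISTRIBUTIONS, CATEGORIES, SAMPLE_SIZES)
):
    conditions.append({
        "distribution": distribution,
        "categories": n_categories,
        "sample_size": sample_size,
        # One reproducible seed per condition (an assumption for illustration).
        "seed": BASE_SEED + index,
    })

total_data_sets = len(conditions) * REPLICATIONS
```

Enumerating the design this way gives 112 conditions and 56,000 data sets, as stated above.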

Outcome Variables
Seven outcomes were examined in the study: (1) average relative bias of
parameter estimates, (2) average mean squared error of parameter estimates, (3) average
relative bias of standard error estimates, (4) average mean squared error of standard error
estimates, (5) relative bias of chi-square goodness of fit statistics, (6) the model rejection rate
associated with the chi-square goodness of fit statistic at an alpha level of .05, and (7) the
model rejection rate judging by the 90% confidence interval for the RMSEA.

The difference between the estimated and the true values of a parameter (i.e., the bias)
was used to evaluate the performance of the four different estimators. Since bias is highly
dependent on the magnitude of the true parameter value, and a great number of parameter
estimates were involved in each experimental condition, the relative bias (RB) over the
replications and the average relative bias (RBA) across the total number of parameter estimates
were calculated, in turn, by

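\[
\mathrm{RB}(\hat{\theta}_i) = \frac{1}{n_r} \sum_{j=1}^{n_r} \frac{\hat{\theta}_{ij} - \theta_i}{\theta_i} \times 100\%, \quad i = 1, 2, \ldots, n_p; \; j = 1, 2, \ldots, n_r, \tag{31}
\]

and

\[
\mathrm{RBA}(\hat{\theta}) = \frac{1}{n_p} \sum_{i=1}^{n_p} \mathrm{RB}(\hat{\theta}_i), \tag{32}
\]

where $\mathrm{RB}(\hat{\theta}_i)$ denotes the relative bias of the parameter estimate $\hat{\theta}_i$ over the replications, $\hat{\theta}_{ij}$
is the estimate of the ith population parameter $\theta_i$ in the jth replication, $n_r$
is the number of replications in each experimental condition, and $n_p$ is the total number of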
parameter estimates. The formulae can be applied to model parameter estimates of interest,
such as factor loadings (λ), inter-factor correlations (ϕ), and structural regression coefficients
(β or γ). An RBA value less than 5% can be interpreted as a trivial bias, between 5% and 10%
as a moderate bias, and greater than 10% as a substantial bias (Curran, West, & Finch, 1996).
Note that RBA should be interpreted with caution, since it describes an “overall”
picture of average bias in which positive and negative biases are lumped together and can
offset each other.
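
Equations (31) and (32) can be sketched in Python as follows. The estimates in the example are made up for illustration and are not results from the study. The function names are hypothetical, and applying the 5% and 10% cutoffs to the absolute value of the bias is an assumption.

```python
def relative_bias(estimates, true_value):
    """RB in percent: mean of (estimate - true) / true over replications."""
    n_r = len(estimates)
    return sum((est - true_value) / true_value for est in estimates) / n_r * 100.0


def average_relative_bias(estimates_by_parameter, true_values):
    """RBA in percent: mean of RB over the n_p parameters."""
    rbs = [relative_bias(e, t) for e, t in zip(estimates_by_parameter, true_values)]
    return sum(rbs) / len(rbs)


def classify_bias(bias_percent):
    """Label a bias using the 5% and 10% cutoffs cited in the text."""
    size = abs(bias_percent)  # assumption: cutoffs apply to the absolute value
    if size < 5.0:
        return "trivial"
    if size <= 10.0:
        return "moderate"
    return "substantial"


# Made-up estimates for two loadings with true values .8 and .5.
example = [[0.76, 0.72, 0.74], [0.53, 0.55, 0.51]]
true_values = [0.8, 0.5]
rb_values = [relative_bias(e, t) for e, t in zip(example, true_values)]
rba = average_relative_bias(example, true_values)
```

In this example the two parameters have relative biases of −7.5% and +6%, each moderate on its own, yet they average to an RBA of −0.75%. This illustrates why RBA should be read with caution.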

To quantify the overall quality of parameter estimates, the mean squared error is
commonly used in simulation studies because it accounts for both the amount of bias and the
sampling variability of parameter estimates (i.e., efficiency). The mean squared error (MSE)
and average mean squared error (MSEA) can be defined as

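\[
\mathrm{MSE}(\hat{\theta}_i) = \frac{1}{n_r} \sum_{j=1}^{n_r} \left( \hat{\theta}_{ij} - \theta_i \right)^2, \tag{33}
\]

and

\[
\mathrm{MSEA}(\hat{\theta}) = \frac{1}{n_p} \sum_{i=1}^{n_p} \mathrm{MSE}(\hat{\theta}_i), \tag{34}
\]

where $\mathrm{MSE}(\hat{\theta}_i)$ denotes the mean squared error of the parameter estimate $\hat{\theta}_i$ over the
replications; and other notations have been defined. A small MSEA value is suggested as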
favorable because it indicates better overall quality of parameter estimates, that is, less biased
and more precise.
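
A minimal Python sketch of equations (33) and (34) is given below. It uses made-up estimates and hypothetical function names, and it is an illustration rather than the analysis code of the study.

```python
def mean_squared_error(estimates, true_value):
    """MSE: mean squared deviation of the estimates from the true value."""
    n_r = len(estimates)
    return sum((est - true_value) ** 2 for est in estimates) / n_r


def average_mean_squared_error(estimates_by_parameter, true_values):
    """MSEA: mean of MSE over the n_p parameters."""
    mses = [mean_squared_error(e, t) for e, t in zip(estimates_by_parameter, true_values)]
    return sum(mses) / len(mses)


# Made-up estimates for two loadings with true values .8 and .5.
example = [[0.76, 0.72, 0.74], [0.53, 0.55, 0.51]]
true_values = [0.8, 0.5]
mse_values = [mean_squared_error(e, t) for e, t in zip(example, true_values)]
msea = average_mean_squared_error(example, true_values)
```

Unlike RBA, MSEA cannot benefit from cancellation across parameters, because every squared deviation contributes a non-negative amount.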

Obtaining accurate and precise standard error estimates is also a primary concern in
applied and simulation studies. In a similar way, the bias formulations can be used for standard
error estimates, relative to the standard deviation of the parameter estimates over the
replications (also referred to as the empirical standard error). That is, the standard deviation
of the parameter estimates over the replications is used as a proxy for the population standard
error. The RB and RBA for standard error estimates are formulated as

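\[
\mathrm{RB}[\mathrm{SE}(\hat{\theta}_i)] = \frac{1}{n_r} \sum_{j=1}^{n_r} \frac{\mathrm{SE}(\hat{\theta}_i)_j - \mathrm{SD}(\hat{\theta}_i)}{\mathrm{SD}(\hat{\theta}_i)} \times 100\%, \tag{35}
\]

and

\[
\mathrm{RBA}[\mathrm{SE}(\hat{\theta})] = \frac{1}{n_p} \sum_{i=1}^{n_p} \mathrm{RB}[\mathrm{SE}(\hat{\theta}_i)], \tag{36}
\]

where $\mathrm{SE}(\hat{\theta}_i)_j$ is the estimated standard error of parameter $\theta_i$ in the jth replication, and $\mathrm{SD}(\hat{\theta}_i)$
is the standard deviation of the estimates of parameter $\theta_i$ over the replications. The mean squared error (MSE)
and average mean squared error (MSEA) can also be defined as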

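\[
\mathrm{MSE}[\mathrm{SE}(\hat{\theta}_i)] = \frac{1}{n_r} \sum_{j=1}^{n_r} \left( \frac{\mathrm{SE}(\hat{\theta}_i)_j - \mathrm{SD}(\hat{\theta}_i)}{\mathrm{SD}(\hat{\theta}_i)} \right)^2, \tag{37}
\]

and

\[
\mathrm{MSEA}[\mathrm{SE}(\hat{\theta})] = \frac{1}{n_p} \sum_{i=1}^{n_p} \mathrm{MSE}[\mathrm{SE}(\hat{\theta}_i)], \tag{38}
\]

where $\mathrm{MSE}[\mathrm{SE}(\hat{\theta}_i)]$ denotes the mean squared error of the estimated standard error of
parameter estimate $\hat{\theta}_i$ over the replications.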
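
The standard error outcomes can be sketched in the same way. The example below is illustrative only. It assumes that the empirical standard deviation is computed with an n − 1 divisor, which the text does not specify, and that the squared deviations in equation (37) are scaled by the empirical standard deviation. The data and function names are hypothetical.

```python
import statistics


def se_relative_bias(estimates, standard_errors):
    """RB of estimated SEs (percent) against the empirical SD of the estimates."""
    empirical_sd = statistics.stdev(estimates)  # n - 1 divisor (an assumption)
    n_r = len(standard_errors)
    return sum((se - empirical_sd) / empirical_sd for se in standard_errors) / n_r * 100.0


def se_mean_squared_error(estimates, standard_errors):
    """MSE of estimated SEs, with deviations scaled by the empirical SD."""
    empirical_sd = statistics.stdev(estimates)
    n_r = len(standard_errors)
    return sum(((se - empirical_sd) / empirical_sd) ** 2 for se in standard_errors) / n_r


# Made-up estimates and their estimated standard errors over five replications.
estimates = [0.48, 0.52, 0.50, 0.46, 0.54]
standard_errors = [0.030, 0.028, 0.029, 0.031, 0.027]
rb_se = se_relative_bias(estimates, standard_errors)
mse_se = se_mean_squared_error(estimates, standard_errors)
```

A negative value, as in this example, means that the estimated standard errors are on average smaller than the empirical variability of the estimates.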

Likewise, the performance of chi-square statistics can be assessed by the relative bias.
Because the expected value of a chi-square distribution equals its degrees of freedom, the
relative bias of chi-square statistics over the replications can be expressed as

\[
\mathrm{RB}(\chi^2_j) = \frac{\chi^2_j - df}{df} \times 100\%, \quad j = 1, 2, \ldots, n_r, \tag{39}
\]

and

\[
\mathrm{RB}(\chi^2) = \frac{1}{n_r} \sum_{j=1}^{n_r} \mathrm{RB}(\chi^2_j), \tag{40}
\]

where $\chi^2_j$ is the estimate of the chi-square statistic in the jth replication, df is the model
degrees of freedom, and $n_r$ is the number of replications in each experimental condition.

Alternatively, chi-square test statistics have often been examined in simulation studies
through the rejection rate at a nominal alpha level of .05. The
rejection rate equals the number of replications for which the chi-square value is greater than
the critical value, divided by the number of replications (successfully analyzed). Because the
analysis model matches the population model, the rejection rate should approximate the
nominal 5%. Obtained rejection rates between .025 and .075 can be considered acceptable
at a nominal alpha level of .05 (Bradley, 1978). A high rejection rate suggests an inflated
Type I error rate for the test of overall model fit, meaning that the correctly specified model
is rejected too often and the chi-square statistics are overestimated; a low rejection rate
indicates that the chi-square statistics may be underestimated, which may in turn
compromise the power to reject a misspecified model.
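
The chi-square outcomes can be computed as in the following Python sketch. The test statistics, the degrees of freedom, and the function names are hypothetical and serve only to illustrate the calculations. The critical value is passed in as an argument because Python's standard library has no chi-square quantile function.

```python
def chi_square_relative_bias(chi_squares, df):
    """Mean relative bias (percent) of chi-square statistics against df."""
    n_r = len(chi_squares)
    return sum((t - df) / df for t in chi_squares) / n_r * 100.0


def rejection_rate(chi_squares, critical_value):
    """Share of replications whose statistic exceeds the critical value."""
    rejected = sum(1 for t in chi_squares if t > critical_value)
    return rejected / len(chi_squares)


def within_bradley_range(rate, lower=0.025, upper=0.075):
    """Check the acceptable range cited in the text for a nominal alpha of .05."""
    return lower <= rate <= upper


# Made-up statistics for a hypothetical model with df = 10;
# 18.307 is the .95 quantile of the chi-square distribution with 10 df.
T_VALUES = [8.2, 11.5, 9.7, 19.4, 10.9, 7.3, 12.8, 10.1, 9.0, 11.1,
            6.5, 13.2, 8.8, 10.4, 9.9, 12.0, 7.9, 11.8, 10.6, 9.4]
rb_chi = chi_square_relative_bias(T_VALUES, df=10)
rate = rejection_rate(T_VALUES, critical_value=18.307)
acceptable = within_bradley_range(rate)
```

In practice the denominator of the rejection rate would be the number of replications retained after excluding non-converged and inadmissible solutions.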

Finally, applications of ad hoc fit indices have been less common in the extant literature
(e.g., Flora & Curran, 2004; Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009; Rhemtulla,
Brosseau-Liard, & Savalei, 2012; Yang-Wallentin, Jöreskog, & Luo, 2010). However, the root
mean square error of approximation (RMSEA) has received the most attention, and it recently
has been recognized as one of the most informative and trustworthy indices of model fit in
applied research. RMSEA is a function of the sample estimate of the noncentrality parameter,
𝜆:

\[
\mathrm{RMSEA} = \sqrt{\max\left\{0, \frac{\hat{\lambda}}{(N-1) \times d}\right\}}, \tag{41}
\]

where $\hat{\lambda} = T - d$, T is the estimated chi-square test statistic, d is the degrees of freedom,
and N is the sample size. One can replace T with $T_{\mathrm{ML}}$, $T_{\mathrm{MLR}}$, $T_{\mathrm{ULSMV}}$, or $T_{\mathrm{D\text{-}WLSMV}}$ in equation (41) to obtain
$\mathrm{RMSEA}_{\mathrm{ML}}$, $\mathrm{RMSEA}_{\mathrm{MLR}}$, $\mathrm{RMSEA}_{\mathrm{ULSMV}}$, or $\mathrm{RMSEA}_{\mathrm{D\text{-}WLSMV}}$. RMSEA not only takes into account
model complexity, as reflected in the degrees of freedom, but is also the least sensitive to
sample size among ad hoc fit indices. It has been suggested that an RMSEA value of less than
or equal to .05 is indicative of a model of close fit (Browne & Cudeck, 1993). Because RMSEA,
unlike chi-square statistics, does not have a known sampling distribution against which to
assess its behavior, the performance of RMSEA was assessed by the rejection rate, judged by the 90%
confidence interval. The lower and upper bounds of a 90% confidence interval for the RMSEA
can be calculated as (Browne & Cudeck, 1993):

\[
\mathrm{RMSEA}_{\mathrm{low}} = \sqrt{\max\left\{0, \frac{\hat{\lambda}_{.05}}{(N-1) \times d}\right\}}, \tag{42}
\]

and

\[
\mathrm{RMSEA}_{\mathrm{upp}} = \sqrt{\max\left\{0, \frac{\hat{\lambda}_{.95}}{(N-1) \times d}\right\}}. \tag{43}
\]

Here $\hat{\lambda}_{.05}$ is the noncentrality value for which T is the 95th percentile of the noncentral
chi-square distribution $\chi^2(d, \hat{\lambda}_{.05})$, and $\hat{\lambda}_{.95}$ is the noncentrality value for which T is
the 5th percentile of the noncentral chi-square distribution
$\chi^2(d, \hat{\lambda}_{.95})$. Likewise, a 90% confidence interval for the RMSEA for ML, MLR, ULSMV, or
D-WLSMV can be obtained by replacing T with $T_{\mathrm{ML}}$, $T_{\mathrm{MLR}}$, $T_{\mathrm{ULSMV}}$, or $T_{\mathrm{D\text{-}WLSMV}}$ in the noncentral
chi-square distribution. The rejection rate is determined as the number of replications for
which the lower bound of a 90% confidence interval for the RMSEA is greater than the
“practical” cutoff value of .05, divided by the number of replications (successfully
analyzed). Also, means of RMSEA over the replications are reported to illustrate the practical
relevance of the findings.
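
The RMSEA point estimate and the rejection rule can be sketched as follows. This is a hedged illustration with hypothetical function names and made-up inputs. It uses the N − 1 divisor of equation (41) as read here, whereas some software uses N instead. The lower confidence bounds are taken as given, because solving for the noncentrality limits requires a noncentral chi-square routine that is not in Python's standard library.

```python
import math


def rmsea(t_statistic, df, n):
    """Point estimate: sqrt(max(0, (T - df) / ((N - 1) * df)))."""
    noncentrality = t_statistic - df
    return math.sqrt(max(0.0, noncentrality / ((n - 1) * df)))


def rmsea_rejection_rate(lower_bounds, cutoff=0.05):
    """Share of replications whose 90% CI lower bound exceeds the cutoff."""
    rejected = sum(1 for bound in lower_bounds if bound > cutoff)
    return rejected / len(lower_bounds)


# Made-up inputs: a statistic above and a statistic below its degrees of freedom.
above = rmsea(200.0, df=160, n=500)
below = rmsea(150.0, df=160, n=500)  # truncated at zero because T < df
# Made-up lower bounds for four replications.
rate = rmsea_rejection_rate([0.03, 0.06, 0.04, 0.055])
```

The truncation at zero matters whenever the test statistic falls below its degrees of freedom, which happens frequently for a correctly specified model.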

CHAPTER 5
RESULTS

Because of the large amount of output, the results reported in this chapter are condensed to
keep the presentation concise yet informative. The tables and figures
are collapsed across several conditions, and the results for certain conditions not presented
here are available in Appendix E. More specifically, results for sample sizes of N = 400, 750,
and 1,500 are not presented here but are provided as supplemental materials in Appendix E,
mainly because the results for N = 400 and N = 750 followed a pattern similar to those
for N = 500, and the results for N = 1,500 were comparable to those for N = 1,000, in terms of the
performance of the parameter estimates, standard error estimates, test statistics, and RMSEA.
Furthermore, an exhaustive report of the bipolarized data is not given here, as the effect
of bipolarization on model results fell between that of slight asymmetry and moderate
asymmetry across most conditions, and its effect on chi-square goodness of fit
statistics fell between that of symmetry and slight asymmetry across many conditions.
However, Appendix F contains all results from the bipolarized data conditions. Because ML and
MLR produced the same rates of non-convergence and inadmissible solutions, and the same
values of parameter estimates (including both factor loadings and structural coefficients),
these results were combined under the estimator label “ML/MLR” in some tables.
However, the uncorrected standard errors and unadjusted chi-square goodness of fit statistics
obtained with ML differed from those obtained with MLR, so they are reported separately in
the pertinent result tables.

Non-Convergence and Inadmissible Solutions
Non-convergence was defined as failure of the iterative estimation process to converge,
either because the maximum number of iterations (the Mplus default) was exceeded or because there
were difficulties in optimizing the fit function before the maximum number of iterations had
been reached (Muthén & Muthén, 2010). An inadmissible solution (i.e., a Heywood case) was
defined as a converged solution that nevertheless produced out-of-bounds parameter
estimates (i.e., an estimated inter-factor correlation larger than 1 in absolute value) or
negative residual variances.
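
The two definitions above amount to a simple screening rule, sketched below in Python. The function name, the argument names, and the example values are hypothetical. In practice the convergence flag and the estimates would be read from the software output.

```python
def classify_replication(converged, factor_correlations, residual_variances):
    """Label a replication as non-convergence, inadmissible, or valid."""
    if not converged:
        return "non-convergence"
    # Out-of-bounds estimate: correlation larger than 1 in absolute value.
    if any(abs(r) > 1.0 for r in factor_correlations):
        return "inadmissible"
    # Heywood case: negative residual variance.
    if any(v < 0.0 for v in residual_variances):
        return "inadmissible"
    return "valid"


# Made-up replications: (converged, inter-factor correlations, residual variances).
replications = [
    (True, [0.31], [0.35, 0.52, 0.66, 0.74]),
    (True, [0.28], [0.38, -0.02, 0.61, 0.77]),
    (False, [], []),
    (True, [1.04], [0.36, 0.50, 0.64, 0.75]),
]
labels = [classify_replication(*rep) for rep in replications]
valid = [rep for rep, label in zip(replications, labels) if label == "valid"]
```

Only the replications labeled valid would enter the outcome calculations and serve as the denominator of the rejection rates.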

Tables 4(a) and 4(b) show the number of cases that failed to converge or produced
inadmissible solutions. Note that ML and MLR produced the same number of cases of
non-convergence and inadmissible solutions, so their results were combined under the
label “ML/MLR” in Tables 4(a) and 4(b). As shown in Table 4(a), convergence failures
occurred most often with 4-category, moderately asymmetric data at the smallest
sample size, N = 200, for all four estimators (in boldface). Convergence failures disappeared for
all four estimators when sample size increased to N = 300, except in 2 cells (in boldface).
Regarding inadmissible solutions in Table 4(b), ML and MLR did not yield any inadmissible
solution in any condition in the study. WLSMV and ULSMV tended to produce inadmissible
solutions particularly when sample size was small. Among the four estimators, ULSMV had a
higher probability of producing inadmissible solutions than the other three estimators across
many conditions with sample size N = 200. The highest rate of inadmissible solutions obtained
with ULSMV was 1.2% (6 cases), which occurred with four-category, moderately asymmetric
data and sample size N = 200. However, there were no inadmissible solutions in all but
three conditions (in boldface) once sample size increased to N = 300 or more. In general, the four
estimators (ML, MLR, WLSMV, and ULSMV) all produced convergence failures when data were
4-category and moderately asymmetric at the smallest sample size, N = 200. ULSMV and WLSMV
were more prone to inadmissible solutions across many conditions with sample sizes
N = 200 or 300.

In order to inform research practice and maximize external validity, the replications that
were classified as non-convergence or inadmissible solutions were considered invalid empirical
observations and were excluded from studying the impact of experimental factors on the
performance of the four estimators and evaluating the parameter and standard error
estimates, test statistics, and RMSEA (cf. Boomsma, 2013; Chen, Bollen, Paxton, Curran, &
Kirby, 2001; Flora & Curran, 2004; Forero & Maydeu-Olivares, 2009). Additional
analyses that included the inadmissible solutions were also conducted. These analyses
produced minor changes in the outcome variables, but the conclusions remained unchanged.

Parameter Estimates
Factor Loadings
Tables 5−8 display average relative bias (RBA) and average mean squared error (MSEA)
of factor loadings and structural coefficients by number of observed variables’ categories and
ordinal observed distributions for all four estimators. Note that ML and MLR produced the same
parameter estimates (both factor loadings and structural coefficients), so results were
combined under the estimator label “ML/MLR” in Tables 5−8. Factor loadings were, on
average, underestimated by ML and MLR. They were moderately or substantially
downward-biased across all sample size conditions, except for symmetric data with 5
categories or more. The magnitude of this negative bias decreased as the
number of observed variables’ categories increased but grew as the level of asymmetry
of the ordinal observed distributions increased. Conversely, factor loading estimates
obtained by WLSMV and ULSMV appeared to be essentially unbiased on average, irrespective of
the number of observed variables’ categories, the shape of ordinal observed distributions, and
sample size. Overall, WLSMV and ULSMV were consistently superior to ML and MLR for factor
loading estimation in all investigated conditions; that is, WLSMV and ULSMV yielded more
accurate factor loading estimates than ML and MLR.

In order to quantify the overall quality of parameter estimates, both the amount of bias
and the sampling variability of parameter estimates (i.e., efficiency) should be considered
simultaneously. An index that combines both squared bias and sampling variance is the mean
squared error (MSE). A small MSE value is suggested as favorable because it indicates better
overall quality of parameter estimates, that is, less biased and more precise. Regarding the
overall quality of estimated factor loadings, the average mean squared error (MSEA) decreased
with increasing sample sizes and the number of observed variables’ categories but increased
with a greater level of asymmetric distributions. That is, the performance of factor loading
estimates became better when sample size and the number of observed variables’ categories
increased but turned worse when the level of asymmetric distributions increased. MSEA was
most pronounced in the conditions where RBA was appreciable; in particular, it was noticeably
large with four-category ordinal data. In general, the MSEA obtained with WLSMV and ULSMV was
smaller than that obtained with ML and MLR across nearly all conditions.

However, there were a few cells in which the MSEA obtained with ML and MLR was smaller than
that obtained with WLSMV and ULSMV when data were symmetric. To gain a deeper understanding of this
scenario, the MSEA was partitioned into two components, squared bias and sampling
variance, and displayed in a stacked histogram. The lower portion of each stacked bar is the
squared bias, whereas the upper portion represents the sampling variance. Figure 3 shows that ML and
MLR displayed higher bias than WLSMV and ULSMV with 5, 6, and 7 categories in the
combination of symmetric data and N = 200, despite generally lower MSEA. Specifically, ML
and MLR produced more biased, but less variable, factor loading estimates, indicating that the
estimates obtained across replications are likely to be close to each other but farther from
the population value. This pattern disappeared as sample size increased, suggesting that a
large sample size can wash out the advantage of symmetric data in ML and MLR estimation.
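
The partition of the mean squared error into squared bias and sampling variance can be sketched in Python as follows. The numbers are made up to mimic the pattern described above (more bias but less variability for one estimator) and are not results from the study. The variance uses the number of replications as its divisor, so that the two components add up exactly to the mean squared error. The function name is hypothetical.

```python
def decompose_mse(estimates, true_value):
    """Return (squared bias, sampling variance, MSE) for one parameter."""
    n_r = len(estimates)
    mean_estimate = sum(estimates) / n_r
    squared_bias = (mean_estimate - true_value) ** 2
    # Divisor n_r (not n_r - 1) so that squared_bias + variance equals the MSE.
    variance = sum((est - mean_estimate) ** 2 for est in estimates) / n_r
    mse = sum((est - true_value) ** 2 for est in estimates) / n_r
    return squared_bias, variance, mse


# Made-up loading estimates (true value .8) for two hypothetical estimators.
biased_but_stable = decompose_mse([0.77, 0.78, 0.76, 0.77], 0.8)
unbiased_but_variable = decompose_mse([0.74, 0.86, 0.78, 0.83], 0.8)
```

In this constructed example the first set of estimates has the larger squared bias but the smaller mean squared error, because its sampling variance is much lower.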

It is of particular interest that the MSEA obtained with WLSMV was consistently slightly
smaller than that obtained with ULSMV across all cells, suggesting that the diagonal weights
indeed contribute a small improvement to the overall quality of factor loading estimates. That is, factor loading
estimates obtained with WLSMV were less biased and more precise than those produced by
ULSMV. Overall, WLSMV and ULSMV performed better than ML and MLR in
factor loading estimation across nearly all conditions, with the
performance of ULSMV falling between that of WLSMV and ML/MLR.

Structural Coefficients
The overall bias in structural coefficients (including both structural regression
coefficients and the inter-factor correlation) obtained with the four estimators was, on average,
trivial (whether positive or negative). Averaged over the structural coefficient
estimates, the bias obtained with WLSMV and ULSMV appeared to be consistently trivial, rarely
exceeding 1%. ML and MLR, however, introduced slightly more marked
bias into the estimates of structural coefficients with moderately asymmetric
data (about −3%) across all sample sizes. In comparison, there was no remarkable distinction
among the four estimators, in terms of the absolute value of RBA. However, ML and MLR
produced slightly larger bias than WLSMV and ULSMV in many conditions, except for certain
symmetric data conditions.

With respect to the overall quality of estimated structural coefficients, the MSEA
decreased with increasing sample size and the number of observed variables’ categories but
increased with a greater level of asymmetry in the distributions of ordinal observed variables.
That is, the performance of structural coefficient estimates improved when sample size and
the number of observed variables’ categories increased but dropped when the level of
asymmetric distributions increased. Unlike factor loading estimates, the benefit of using
diagonal weights to bring about a small improvement on the overall quality of structural
coefficient estimates did not accrue until a medium sample size (N = 500) was reached. In
general, there was no remarkable evidence suggesting that one of the four estimators is
inferior to another one, in terms of MSEA. However, ML and MLR produced smaller MSEA than
WLSMV and ULSMV in all conditions of symmetric data, while WLSMV and ULSMV produced
smaller MSEA in nearly all asymmetric data conditions.

In addition to overall bias and quality of estimated structural coefficients, examination of
each structural coefficient was also employed to gain further insight into which type(s) of
structural coefficients performed better. In terms of MSEA, the performance of estimated
structural coefficients became better as the magnitude of coefficients increased for all four
estimators. The estimates of the inter-factor correlation and structural regression coefficients
in Γ generally were less biased and more precise than those of structural regression
coefficients in B, given the same coefficient magnitude. Table 9 shows that the MSEA of
ϕ12 was about 4 to 5 times smaller than MSEA of β21, and MSEA of γ32 was consistently smaller
than MSEA of β21 across the four estimators in the N = 1,000 conditions. Again, it was observed
that ML and MLR produced smaller MSEA than WLSMV and ULSMV in all the conditions of
symmetric data, while WLSMV and ULSMV produced smaller MSEA in all asymmetric data
conditions.

Standard Error Estimates
The RBA and MSEA for standard errors of factor loadings and structural coefficients are
presented in Tables 10−13. Standard errors exhibited, on average, moderate downward bias
for both WLSMV and ULSMV with the smallest sample size (N = 200), reflecting that robust
standard errors were not adjusted upward enough to compensate for the loss of efficiency caused
by WLSMV and ULSMV estimation in the N = 200 conditions. Not surprisingly,
standard error underestimation improved when sample size increased. In contrast,
moderate-to-substantial negative bias was observed in ML estimation across most
simulation conditions, except for all cells of symmetric data. MLR produced only trivial bias
(that is, essentially unbiased standard errors) across most conditions. In other words, the
moderate-to-substantial underestimation of ML standard errors was considerably attenuated
when robust corrections to standard errors in MLR estimation were employed. Once the
sample size reached N = 500 or more, the three robust estimators performed comparably
well for estimating standard errors of parameter estimates, in terms of RBA. Uncorrected
standard errors produced by ML still remained moderately to substantially biased in all
asymmetric data conditions. Overall, the performance of MLR surpassed that of ML, WLSMV,
and ULSMV across most conditions, in terms of RBA. The performance of ML was the worst in
all asymmetric data conditions. However, there was no remarkable distinction between ML and
MLR in the conditions with symmetric data and sample size N = 300 or more.

Much as with the overall quality of parameter estimates, MSEA associated with standard
error estimates decreased with increasing sample size but increased with a greater level
of asymmetry in the distributions of ordinal observed variables. MSEA obtained with ML and MLR
demonstrated little sensitivity to the number of observed variables’ categories, whereas in
estimating structural coefficients, MSEA obtained with WLSMV and ULSMV diminished as the
number of observed variables’ categories increased. The advantage of incorporating diagonal
weights into the estimated asymptotic covariance matrix of the parameter estimates was only
sustained with robust standard errors of structural coefficient estimates, not with those of factor
loading estimates.

In general, ULSMV produced more precise estimated standard errors of factor loadings
than WLSMV and MLR in all conditions, whereas WLSMV produced more precise estimated
standard errors of structural coefficients than ULSMV and MLR across all asymmetric data
conditions. The MSEA was partitioned into two components: squared bias and sampling
variance in a stacked histogram, as described in the preceding section. As depicted in Figure 4,
due to lower sampling variance, ULSMV displayed the lowest MSEA in the combination of
slightly asymmetric data and N = 300, despite slightly higher bias. As sample size increased to
N = 1,000, the bias produced by ULSMV was essentially equal to the other two robust
estimators (see Figure 5), and ULSMV still had the lowest MSEA among the four estimators.
Because of its moderate bias, ML exhibited the highest MSEA among the four
estimators, although somewhat lower sampling variances were observed. In Figure 6, WLSMV
displayed the lowest MSEA due to lower sampling variance, despite trivial bias. Although MLR
produced less biased standard error estimates, it had relatively high sampling variance,
illustrating that the standard error estimates obtained across replications are usually not
far from the empirical standard error (i.e., the standard deviation of parameter estimates over
replications), but are widely spread out. In addition, ML produced more biased standard
errors but smaller sampling variances than MLR.
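
The comparison of estimated standard errors against the empirical standard error can be sketched as a relative bias computation. This is a generic illustration, not the study's code: the function name and all values are hypothetical, the sample standard deviation (divisor R − 1) is an assumption, and the exact RBA definition used in this thesis is given in the methods chapter.

```python
from statistics import fmean, stdev


def se_relative_bias(se_estimates, parameter_estimates):
    """Relative bias of estimated standard errors for one parameter.

    The reference value is the empirical standard error, i.e., the standard
    deviation of the parameter estimates over replications.
    """
    empirical_se = stdev(parameter_estimates)  # assumption: divisor R - 1
    return (fmean(se_estimates) - empirical_se) / empirical_se


# Hypothetical results from five replications of a single parameter.
parameter_estimates = [0.48, 0.52, 0.50, 0.46, 0.54]
se_estimates = [0.028, 0.029, 0.027, 0.030, 0.026]

# A negative value means the standard errors are underestimated on average.
print(se_relative_bias(se_estimates, parameter_estimates))
```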

Chi-Square Goodness of Fit Statistics
Tables 14−17 present findings for chi-square goodness of fit statistics and RMSEA with
ML, MLR, ULSMV, and WLSMV estimators: (1) relative bias of chi-square goodness of fit
statistics, (2) rejection rates associated with the Likelihood Ratio (LR) Test, (3) mean of
RMSEA, and (4) rejection rates associated with the 90% CI for the RMSEA. The rejection rates
associated with the LR test equal the number of replications for which the chi-square value is
greater than the critical value divided by the number of successfully analyzed replications, and
the rejection rates associated with the 90% CI for the RMSEA are determined as the number
of replications for which the lower bound of a CI is greater than the practical cutoff value of .05
divided by the number of successfully analyzed replications. The boldface numbers in these
tables indicate unacceptable rejection rates, implying that acceptable rejection rates in the
tables are within the range [2.5%, 7.5%] (Bradley, 1978).
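
The rejection-rate computation and the acceptability check described above can be sketched as follows. The function names, the chi-square values, and the critical value are hypothetical (18.307 is the .05 critical value for 10 degrees of freedom, chosen only for illustration and not the degrees of freedom of the model in this study); the same counting logic would apply to the lower bounds of the RMSEA confidence intervals with a cutoff of .05.

```python
def rejection_rate(statistics, critical_value):
    """Share of successfully analyzed replications exceeding the critical value."""
    if not statistics:
        raise ValueError("no successfully analyzed replications")
    rejected = sum(1 for t in statistics if t > critical_value)
    return rejected / len(statistics)


def within_acceptable_range(rate, lower=0.025, upper=0.075):
    """Check a rejection rate against the [2.5%, 7.5%] range used in the tables."""
    return lower <= rate <= upper


# Hypothetical chi-square values from 20 converged, admissible replications.
chi_squares = [9.8, 12.1, 7.4, 10.5, 19.2, 8.9, 11.3, 6.7, 13.8, 9.1,
               10.0, 14.6, 8.2, 12.9, 7.9, 11.7, 15.3, 9.6, 10.8, 13.1]

rate = rejection_rate(chi_squares, critical_value=18.307)
print(rate, within_acceptable_range(rate))
```

Replications that failed to converge or produced inadmissible solutions are simply left out of the input list, so the denominator is the number of successfully analyzed replications.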

For WLSMV and ULSMV, the empirical Type I error rates of testing overall model fit were
almost all within the range of .025 to .075, very close to the nominal Type I error rate (alpha =
.05), except for the smallest sample size (N = 200). When the sample size was too small,
WLSMV seemed to reject the hypothesized model too frequently, whereas the corresponding
chi-square statistics of ULSMV were too conservative, as evidenced by N = 200. On the other
hand, MLR appeared to be systematically inferior in controlling for Type I error rates of testing
overall model fit across nearly all conditions, unless a larger sample size was used (e.g., N =
1,000). ML performed worse than MLR across most conditions, except for some cells of
symmetric data. When data were slightly or moderately asymmetric, ML seemed to reject the
hypothesized model far more often than expected (more than 10 times the nominal Type I error rate),
indicating that uncorrected chi-square statistics may have been substantially inflated in the
presence of non-normality.

Among the three robust estimators, the corrected chi-square test statistics tended to be
positively biased across all experimental conditions, with MLR correction being particularly
unstable. The degree of positive bias diminished as sample size increased. Also, the number of
observed variables’ categories and the level of asymmetric distributions of ordinal observed
variables had an increasing effect on the inflation of chi-square statistics, but this effect was
more pronounced for small sample sizes (e.g., N = 200 or 300). In general, MLR estimation
was prone to yield moderately inflated chi-square statistics in the conditions of moderately
asymmetric data and the small sample size (e.g., N = 200 or 300). Compared to WLSMV
estimation, the positive bias was seen to be slightly smaller with ULSMV estimation.

Graphical comparisons of the observed distributions of the test statistics to the expected
chi-square distributions are further provided to visualize some information from Tables 14
through 17 using Probability-Probability (P-P) plots. Figures 7 and 8 demonstrated extremely
poor distributional behavior of TML and also clearly revealed the effects of small sample size
and asymmetric distributions on the inflation of chi-square statistics. The plots for the
moderately asymmetric data with seven categories were shown for the worst scenario, where N
= 200 for all four estimators. Remarkably, the overall behavior of TML is clearly deviant across
most conditions unless the data were symmetric and sample size increased. Overall, TULSMV had
the closest approximation to the reference chi-square distribution, followed by TWLSMV, TMLR,
and then TML in all conditions.

RMSEA
As seen in Tables 14−17, rejection rates associated with the 90% CI for the RMSEA were
not sensitive to the conditions of the study. This may be attributed to the population SR model
being correctly specified in data analysis. However, means of RMSEA were minimally positively
biased for all four estimators. It is not surprising that this inflation was reduced monotonically
with increasing sample size, regardless of the number of observed variables’ categories and
the shape of ordinal observed distributions.
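
Why the mean RMSEA is positively biased under a correctly specified model, and why the bias shrinks with sample size, can be illustrated with one common form of the point estimate, sqrt(max((T − df) / (df (N − 1)), 0)). This form is an assumption for illustration only: some programs use N in place of N − 1, and the chi-square values and degrees of freedom below are hypothetical.

```python
import math
from statistics import fmean


def rmsea_point_estimate(chi_square, df, n):
    """One common RMSEA point estimate; negative values are truncated at zero."""
    return math.sqrt(max((chi_square - df) / (df * (n - 1)), 0.0))


# Hypothetical test statistics scattered symmetrically around df = 100,
# as expected on average when the model is correctly specified.
df = 100
chi_squares = [90.0, 100.0, 110.0]

for n in (200, 1500):
    values = [rmsea_point_estimate(t, df, n) for t in chi_squares]
    # Statistics below df map to 0 and those above df map to positive values,
    # so the mean is above zero; the N - 1 divisor shrinks it as N grows.
    print(n, fmean(values))
```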

CHAPTER 6
DISCUSSION

This study sought to compare the performance of ML, MLR, WLSMV, and ULSMV in
regard to parameter estimates, standard errors, and chi-square goodness of fit statistics in a
five-factor structural regression model with ordinal observed variables under different
experimental configurations of ordinal observed distributions, number of observed variables’
categories, and sample size, resulting in 112 conditions. The conditions were chosen to
highlight the differences among the four estimators, as well as to investigate a wide variety of
empirical circumstances frequently encountered in research practice. Several general findings
are discussed as follows.

First, the four estimators all resulted in convergence failures when data were 4-category and
moderately asymmetric in the smallest sample size, N = 200. Furthermore, ULSMV and WLSMV
were more likely to be subject to inadmissible solutions when small sample sizes (N = 200 or 300)
were analyzed. The small sample degradation of ML and the three robust estimators is
consistent with previous simulation studies in which non-convergence or inadmissible
solutions more frequently occur with small sample sizes (Herzog, Boomsma, & Reinecke, 2007;
Rhemtulla, Brosseau-Liard, & Savalei, 2012; Forero, Maydeu-Olivares, & Gallardo-Pujol,
2009). However, it is important to note that increasing sample size N to 300 apparently can
help reduce the degree of this unfavorable outcome.

Second, this study replicated previous results that factor loadings were typically
underestimated by ML and MLR but were essentially unbiased with WLSMV and ULSMV
(Beauducel & Herzberg, 2006; Flora & Curran, 2004; Forero, Maydeu-Olivares, & Gallardo-Pujol,
2009; Rhemtulla, Brosseau-Liard, & Savalei, 2012). The accuracy and precision of estimated
factor loadings with WLSMV and ULSMV were better than those of estimated factor loadings with
ML and MLR across nearly all conditions, in terms of MSEA. Interestingly, a clear
superiority of WLSMV and ULSMV over ML and MLR in factor loading
estimates across nearly all simulation conditions was confirmed in this study, irrespective of the
number of observed variables’ categories, the shape of ordinal observed distributions, and
sample size. Even when the number of observed variables’ categories in the data reached
seven, ML and MLR still led to moderately biased factor loading estimates (Beauducel &
Herzberg, 2006; Rhemtulla, Brosseau-Liard, & Savalei, 2012), suggesting that prior studies
with ordinal observed indicators using ML and MLR underestimated associations between
ordinal observed variables and latent constructs. In turn, the estimates of reliability for
composite scores on Likert-type scales may have been undermined, which is particularly
appreciable with a greater level of asymmetry in the distributions of ordinal observed variables.
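
How underestimated factor loadings carry over to composite reliability can be sketched with the composite reliability coefficient (often called omega) for a one-factor scale. This is an illustration under stated assumptions, not an analysis from this study: the choice of coefficient, standardized indicators with uncorrelated errors, the four loadings of .70, and the 10% underestimation are all hypothetical.

```python
def composite_reliability(loadings):
    """Omega for a one-factor scale with standardized indicators.

    Assumes uncorrelated errors, so each error variance is 1 - loading**2.
    """
    common = sum(loadings) ** 2
    error = sum(1.0 - lam ** 2 for lam in loadings)
    return common / (common + error)


# Hypothetical four-item scale with population loadings of .70.
true_loadings = [0.70, 0.70, 0.70, 0.70]
# The same loadings underestimated by a hypothetical 10%.
underestimated = [0.9 * lam for lam in true_loadings]

print(composite_reliability(true_loadings))
print(composite_reliability(underestimated))
```

Under these assumptions the reliability computed from the attenuated loadings is lower than the reliability computed from the population loadings.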

ML and MLR led to moderately biased factor loading estimates but only produced a small
amount of bias in structural coefficients across all conditions. More specifically, ML and MLR
displayed mild robustness against violation of normality in estimating structural coefficients
but not in estimating factor loadings. This “unique” finding contributes to the literature by
demonstrating that the combined effect of categorization and asymmetric observed distributions is
larger on the measurement model parameters than on the structural model parameters. This
observation is similar to that of Coenders, Satorra, and Saris (1997), who concluded that
Pearson product-moment correlations between ordinal observed indicators (through
maximum likelihood estimation) perform badly in the estimation of factor loadings but such
lower measurement quality estimates can lead to approximately correct point estimates of
structural coefficients.

Moreover, in terms of the accuracy and precision of estimated structural coefficients, a
clear superiority of ML and MLR over WLSMV and ULSMV was found in all symmetric data
conditions, whereas the advantage shifted to WLSMV and ULSMV in nearly all asymmetric data
conditions. This may be because, with symmetric data, the desirable
estimation properties of maximum likelihood, such as unbiasedness and maximal efficiency,
are retained. It is readily apparent that the accuracy and precision of parameter
estimates (including factor loadings, the inter-factor correlation, and structural regression
coefficients) became better with increasing sample size and the number of observed variables’
categories but decreased with a greater level of asymmetric distributions of ordinal observed
variables.

In addition, increasing the magnitude of population structural coefficients was associated
with higher accuracy and precision of structural coefficient estimation. A finding worth noting
is that given the same magnitude of structural coefficients, the inter-factor correlation and
structural regression coefficients between exogenous and endogenous latent variables
generally performed better than structural regression coefficients among endogenous latent
variables across all conditions. Prior simulation studies rarely explored the effect of ordinal
observed variables on structural coefficients, but these findings complement the existing
literature by showing the performance of structural coefficient estimates in an SR model. The
specification of heterogeneous structural coefficients also highlights the potential weakness in
an SR model − estimating a small structural regression coefficient among endogenous latent
variables is likely compromised.

Third, it was observed that MLR gave more accurate, but less precise, standard error
estimates than WLSMV and ULSMV across most conditions. Despite the slightly higher amount
of bias, among the three robust estimators, ULSMV produced more precise estimated standard
errors of factor loadings across all conditions, whereas WLSMV produced more precise
estimated standard errors of structural coefficients, due to smaller sampling variation, in all
asymmetric data conditions. These findings resonate with the existing literature, in showing
that the performance of standard error estimates for WLSMV and ULSMV is better than for MLR
(Rhemtulla, Brosseau-Liard, & Savalei, 2012; Yang-Wallentin, Jöreskog, & Luo, 2010). In
addition, ML produced moderately-to-substantially biased standard error estimates across all
conditions, except for the conditions of symmetric data, congruent with previous studies (e.g.,
Beauducel & Herzberg, 2006; Kaplan, 2009). Likewise, the accuracy and precision of standard
error estimates improved with increasing sample size and the number of observed variables’
categories but was reduced with a greater level of asymmetric distributions of ordinal observed
variables. However, standard error estimates obtained by ML and MLR were less sensitive to
the number of observed variables’ categories.

Fourth, in the evaluation of overall model fit using chi-square goodness of fit statistics,
WLSMV and ULSMV had empirical rejection rates within the acceptable range of .025 and .075,
close to the nominal Type I error rate α = .05. However, when the sample size was too small (e.g.,
N = 200), WLSMV was likely to reject the hypothesized model more often than expected,
echoing the tendency toward high WLSMV rejection rates (Beauducel & Herzberg, 2006; Flora &
Curran, 2004). In contrast, ULSMV tended to reject the hypothesized model less often than
the alpha level with a small sample. As with previous studies, ML produced unacceptable
rejection rates and TML exhibited extremely deviant distributional behavior in all asymmetric
data conditions (e.g., Kaplan, 2009; Muthén & Kaplan, 1992). Among the three robust
estimators, MLR was systematically inferior to WLSMV and ULSMV in controlling for Type I
error rates of testing overall model fit across many conditions, due to moderate-to-substantial
inflation of chi-square goodness of fit statistics. The deviant distributional behavior of TMLR
occurred with the moderately asymmetric data having seven categories in the smallest sample
size N = 200. The finding also suggests that TMLR is subject to sizeable overestimation in a
small sample. Not until the sample size increased to 1,000 was an acceptable rejection rate
associated with MLR chi-square statistics observed.

With respect to the supplemental fit index, RMSEA, rejection rates based on the 90%
confidence intervals showed little sensitivity to the study conditions under the correctly specified SR model.
Although RMSEA showed promise for assessing the adequacy of a hypothesized model, means
of RMSEA were slightly positively biased for all four estimators. This is in line with Curran et al.
(2002) and Herzog & Boomsma (2009), who found that RMSEA is upward biased in smaller
sample size conditions. Overall, RMSEA seems to be a reliable index in the evaluation of overall
model fit when the model has no specification error.

Fifth, this study also aimed to evaluate the performance of the two weight matrices (I
versus WD) in the estimation of parameters, robust standard errors, and test statistics. An
interesting finding of this study is that the benefit of incorporating diagonal weights into the
least squares fit function and estimated asymptotic covariance matrix was observed with
parameter estimates and robust standard error estimates across all conditions, except for
robust standard errors of factor loadings. That is, the diagonal weights contributed a small
improvement upon the performance of parameter and robust standard error estimates.
Additionally, this advantage was not sustained with chi-square statistic corrections because
TULSMV appeared to have the closest approximation to the reference chi-square distribution.
However, these findings make a distinct contribution to the existing literature in which the
effectiveness of the diagonal weights is not very clear. In sum, not only can this diagonal
weight matrix get around computational troubles in the conditions of small sample sizes and/or
complex models, but it can also yield relatively accurate parameter and standard error estimates and
a well-behaved distribution of test statistics that is close to a central chi-square.

Implications for Applied Research
Sample Size
There are several specific implications of the findings with respect to fitting an SR model
with ordinal observed variables using these four estimators in practice. First, applied
researchers are concerned with the rates of non-convergence and inadmissible solutions. A
non-converged or inadmissible solution often plagues applied researchers, and it is of no use
for substantive interpretation. Evidence suggests that the four estimators all resulted in
convergence failures when data were 4-category and moderately asymmetric in the smallest
sample size, N = 200, but only WLSMV and ULSMV were subject to inadmissible solutions in
many conditions with sample sizes N = 200 or 300. In addition, a small sample size is often
problematic because parameter and standard error estimates can be seriously biased and less
precise. Not surprisingly, increasing sample size not only protects against convergence failures
and inadmissible solutions but also improves the performance of model estimation.

One of the relevant implications for applied researchers is the presence of conditions for
which none of the three robust estimators yields adequate results. WLSMV and ULSMV do not
need a large sample size for the recovery of population parameters and to evaluate overall
model fit via the mean- and variance-adjusted chi-square goodness of fit statistics, but a
medium sample (N = 500 or more) is needed to obtain better standard error estimates. On the
other hand, MLR does not require a large sample to produce stable structural coefficient and
standard error estimates, but may need a quite large sample (N = 1,000 or more) to control for
Type I error rates of testing overall model fit, despite the existence of moderate
underestimation in factor loading estimates.

Taken together, a sample size of less than 500 should be avoided when fitting a
medium-size model with ordinal observed variables in practice. In this case, the ratio of
sample size (N) to the number of observed variables (p) is 25, which far exceeds the
recommendation of having N at least 10 times p (Nunnally, 1978). Additionally, the ratio of N
to the number of free parameters (q) is 10, which just meets the minimum requirement of
having at least N : q = 10 : 1 with non-normal data when using maximum likelihood estimation
(Bentler & Chou, 1987; Hu, Bentler, & Kano, 1992). Note that the number of free parameters
in MLR estimation was 50 because there were 20 factor loadings, 20 error variances, and 10
structural coefficients.
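
The two ratios can be verified with a few lines of arithmetic. The variable names are hypothetical, and the value p = 20 is inferred from the stated ratio of 25 at N = 500 (it also matches the 20 factor loadings).

```python
n = 500                  # recommended minimum sample size
p = 20                   # observed variables (inferred from N / p = 25)
q = 20 + 20 + 10         # factor loadings + error variances + structural coefficients

ratio_n_to_p = n / p     # compared with the guideline of at least 10
ratio_n_to_q = n / q     # compared with the guideline of at least 10 : 1

print(q, ratio_n_to_p, ratio_n_to_q)
```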

Estimation Methods
Regarding estimation method selection, if the structural relationships are of primary
concern in a research setting, the use of MLR can be recommended when fitting an SR model
with ordinal observed variables on this ground. The biases of MLR estimates remained quite
small and were typically less than .01 in the standardized structural coefficient metric,
although a substantial amount of bias in estimating factor loadings is inevitable (about 5% to
10%). Given this recommendation, a word of caution is warranted. In a small sample, the
robust chi-square goodness of fit statistic obtained with MLR is likely compromised, and
RMSEA can be regarded as another alternative to evaluate the plausibility of overall model fit.
This fit index could be of particular benefit to applied researchers in evaluating overall model
fit. In general, it is not advisable to use ML in an SR model with ordinal observed variables
unless data are symmetric and the desired sample size N = 500 is reached. In this case,
structural coefficient and standard error estimates are considered reliable but the factor
loadings are slightly underestimated and the uncorrected chi-square statistic is still slightly
inflated, and RMSEA should be used to evaluate overall model fit. Generally speaking, the
moderate-to-substantial underestimation of standard errors and considerable inflation of
chi-square statistics make ML less attractive and favorable in practice, particularly when data
moderately deviate from normality. This study also supports the argument that the
performance of ML is generally unacceptable in the presence of non-normality.

It seems that WLSMV and ULSMV compensate more effectively than MLR for the bias in estimates and
model fit evaluation measures that arises because the observed indicators are ordinal rather
than continuous in the SR model. Furthermore, the benefit of using diagonal weights makes
WLSMV superior to ULSMV in many conditions; however, in the very rare scenario in which WLSMV
is subject to non-convergence, ULSMV may serve as an alternative for applied
researchers. It is worth noting that once applied researchers confront the problem of missing
data, ML or MLR with full information estimation is considered a promising approach to
handling missing data without employing (single or multiple) data imputation. Yet, the
treatment of missing data in WLSMV and ULSMV estimators remains technically
underdeveloped, given its bivariate orientation (pairwise deletion as the default in Mplus,
Muthén & Muthén, 2010). Additionally, some applied researchers may be limited in the choice
of software programs or by estimation availability of certain software programs that they are
familiar with. For instance, diagonally weighted least squares estimation is only implemented
in Mplus, LISREL, SAS PROC CALIS, and the R package ‘lavaan’ but currently unavailable in
EQS, Amos, and STATA.

Finally, another relevant implication for applied researchers is related to practical
differences among the three robust estimators in this study. Taking model inference as an
example, the robust chi-square goodness of fit statistics obtained with MLR may tend to
over-reject the true model about 5-20% more often in the conditions with small sample sizes (e.g.,
N = 200, 300, or 500) than WLSMV and ULSMV. Specifically, applied researchers are very
likely to reach completely different conclusions by rejecting the true model if they employ MLR
rather than WLSMV or ULSMV in data analysis. Additionally, applied researchers occasionally
use the RMSEA estimates to evaluate the model misfit, instead of the 90% confidence interval
of the RMSEA. They reject the hypothesized model if the RMSEA estimate is greater than the
“practical” cutoff value of .05. Another observation of possibly misleading conclusions drawn
from empirical data is that a slightly higher bias of the RMSEA estimates makes MLR a little
vulnerable in the evaluation of overall model fit with the smallest sample size N = 200. Some
replications ended up with rejecting the true model based on the RMSEA estimates when MLR
was employed in the analysis, and this situation became even worse when ML was used.
Regarding model parameter inference, estimating a small structural regression coefficient
of .1 in the population model is also likely to be challenging, particularly in the conditions
with asymmetric data and/or small sample sizes. The parameter estimates obtained with
WLSMV and ULSMV had higher rates of statistical significance than those obtained with MLR
across all asymmetric data conditions, regardless of number of observed variables’ categories
and sample size. That is, applied researchers have a higher likelihood of detecting these small
relationships (i.e., 0.1 in the standardized regression coefficient metric) between latent
constructs if they employ WLSMV or ULSMV in data analysis. For instance, the statistical
significance rates of WLSMV and ULSMV were about 5% higher than those of MLR in the
conditions with sample size N = 1,000 when data were slightly or moderately asymmetric.

Therefore, advocates of robust estimation methods take the view that if standard errors
and chi-square goodness of fit statistics are statistically corrected, then the power of
uncovering the relationships between observed variables and/or latent variables can be
enhanced, and the overall model hypothesis testing is able to maintain the Type I error rate
close to the nominal level in the evaluation of overall model fit. These statistical properties
directly translate into substantive and practical advantages − applied researchers are likely to
detect genuine relationships with precision and have more reliable model inference.

Response Categories and Observed Distributions
The accuracy and precision of parameter and standard error estimates improved as the
number of observed variables’ categories increased. These findings support the
recommendation that applied researchers are encouraged to use 7-category ordinal observed
indicators in a measurement design whenever possible. As was stated in the preceding section,
ML and MLR did not fare well on factor loading estimation even when the number of observed
variables’ categories was seven across all sample sizes. Although the point of superiority of ML
and MLR over WLSMV and ULSMV may be reached with a larger number of observed
variables’ categories (e.g., 9 or 10), the implication of such further investigation for applied
research is limited because ordinal observed indicators with more than 9 categories are
rarely used in practice. Of the 157 psychometric measures in the SEM applications search I
conducted, there were only 6 cases (3.8%) in which ordinal observed indicators had more than
7 categories. Previous studies appear to support the desirability of a larger number of
observed variables’ categories (e.g., higher psychometric qualities), but increasing the
number of response categories may also affect respondents’ cognitive capability to process the
meaning of each response category (see, e.g., Cook, Heath, & Thompson, 2001; Lietz, 2010).
The number of response categories is also closely related to the distributions of ordinal
observed variables. Statisticians have agreed that no real-world data are perfectly
symmetric and/or normal (Gartside, 2001; Nester, 1996). Given the pervasiveness of
non-normal data in practice, a general guideline for applied researchers is to examine the
extent to which normality violation in the distributions of ordinal observed variables occurs
before conducting data analysis. If ordinal observed indicators with moderately asymmetric
and leptokurtic distributions are present, structural coefficients, factor loadings, standard
errors, and chi-square goodness of fit statistics should be interpreted with much caution.
A cross-validation study can also help replicate the findings.

Limitations and Directions for Future Research
There are innumerable combinations of factors that could be manipulated in a single
simulation study, but one can only focus on certain factors of particular interest to keep the
research design feasible and manageable. One drawback of a Monte Carlo study is that its
results are conditional on the simulation design. This study shares the limitation of all Monte
Carlo simulation studies, in that generalizations are constrained by the specification of the
experimental conditions employed. Several limitations of this study can be considered
potentially fruitful directions for future research.

First, a thorough examination of the effects of violating the latent normality
assumption on WLSMV and ULSMV is beyond the scope of this study. However, polychoric
correlation estimates have been shown to be robust against moderate violations of the
normality assumption in the latent response variables (Coenders, Satorra, & Saris, 1996; Flora
& Curran, 2004; Quiroga, 1992). Thoughtful consideration of a given construct is necessary to
judge whether underlying normality is tenable. The underlying distribution of the frequency of
aggressive behaviors per day, for example, is unlikely to be normal in the population.
In addition, a test of the underlying bivariate normality assumption, which is needed to
calculate the polychoric correlation, is available in PRELIS, the preprocessor for LISREL.
Future research may investigate ordinal observed indicators with non-normal underlying
distributions, or a mixture of underlying normality and non-normality on the same factor.
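
Under the underlying-normality assumption discussed above, each ordinal indicator is treated
as a discretized standard normal latent response variable, so its thresholds follow directly
from the cumulative category proportions. The Python sketch below shows only this first step.
It is an illustrative assumption-laden example, not the PRELIS or Mplus implementation; the
function name thresholds_from_proportions and the example proportions are hypothetical.

```python
from statistics import NormalDist


def thresholds_from_proportions(proportions):
    """Return the K-1 thresholds implied by K ordered category proportions.

    Assumes a standard normal latent response variable; each threshold is the
    normal quantile of the cumulative proportion below it. Empty end
    categories give a cumulative proportion of 0 or 1, for which inv_cdf
    raises an error, so they should be collapsed beforehand.
    """
    total = sum(proportions)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    cumulative = 0.0
    thresholds = []
    for share in proportions[:-1]:  # the last category has no upper threshold
        cumulative += share / total
        thresholds.append(standard_normal.inv_cdf(cumulative))
    return thresholds


# Hypothetical 4-category item with equal shares in every category.
print(thresholds_from_proportions([0.25, 0.25, 0.25, 0.25]))
```

When the latent response variable is not normal, thresholds computed this way no longer
match the true cut points, which is one route by which a violation could affect polychoric
correlations and, in turn, WLSMV and ULSMV.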

Although this study did not empirically examine the effects of violating the underlying
normality assumption, some cautious predictions can be made to inform the choice among the
three robust estimators. For example, violations of the underlying normality assumption
would likely affect the performance of WLSMV and ULSMV more than that of MLR, holding other
conditions constant. In addition, when the underlying distributions differ across several
groups of interest, such heterogeneity could be expected to exacerbate the effects of
non-normality on WLSMV and ULSMV more than on MLR, with all remaining conditions being
equal. Although I only considered ordinal observed variables for each latent construct in this
study, I would predict that MLR would likely estimate factor loadings better with a mixture of
continuous and ordinal observed variables than with all ordinal observed variables.

Second, the 5-factor SR model in this study was selected to represent a medium-sized SEM
specification, which goes beyond the models examined in prior studies documented in the
SEM literature. However, further investigation tailored to various applications of SEM is
suggested, using models that approximate real-world situations likely to be encountered in
empirical studies: (1) a latent growth curve model, which aims to capture individual
trajectories; (2) a multiple-group structural regression model, to study group
similarities and differences; or (3) a multilevel structural equation model, to account for
clustering effects. Additionally, this study was limited to a saturated structural model;
a natural extension is therefore the investigation of a non-saturated structural
model by manipulating the number of structural coefficients.

Third, because the population SR model was correctly specified, the present study did
not examine the possible effects of model misspecification. In practice, applied researchers
have to recognize that they may not always work with models free of specification errors. The
popular supplemental fit index, RMSEA, showed promise for assessing the adequacy of a
hypothesized model without specification error in this study, but an interesting avenue for
further investigation would be to examine the power of both the corrected chi-square goodness
of fit statistics and RMSEA to detect model misspecification. Although previous simulation
studies have suggested that ordinal CFA models are robust to slight model misspecification
(e.g., Flora & Curran, 2004; Maydeu-Olivares, 2006), a worthy topic for future research is to
compare the performance of these robust estimators on parameter and standard error estimates,
chi-square goodness of fit statistics, and ad hoc fit indices when an SR model with ordinal
observed variables is fitted under different levels of model specification error. For example,
applied researchers may omit structural regression coefficients or cross-factor loadings, or
include structural regression coefficients that are not actually in the population model.
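
For readers unfamiliar with how RMSEA relates to the chi-square statistic, the Python sketch
below computes the point estimate from a chi-square value, its degrees of freedom, and the
sample size. It uses the common textbook form with N - 1 in the denominator, which is an
assumption here: some programs use N instead, and the exact variant applied to the adjusted
chi-square statistics in this study is not restated in this chapter. The function name rmsea
and the example values are hypothetical.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a chi-square statistic.

    Uses sqrt(max(chi_square - df, 0) / (df * (n - 1))); the value is zero
    whenever the chi-square statistic does not exceed its degrees of freedom.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical values: a statistic close to its df implies an RMSEA near zero.
print(rmsea(chi_square=105.0, df=100, n=500))
print(rmsea(chi_square=150.0, df=100, n=501))
```

Because the index is a function of the chi-square statistic, any inflation of that statistic
carries over to RMSEA, which is why its power under misspecified models deserves separate
study.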

CHAPTER 7
SUMMARY AND CONCLUSIONS

The conclusions of this study can be summarized as follows:
(1) the four estimators are all subject to non-convergence problems with 4-category,
moderately asymmetric data at the smallest sample size, N = 200;
(2) WLSMV and ULSMV are likely to produce inadmissible solutions in some conditions
with sample sizes N = 200 or 300;
(3) WLSMV and ULSMV yield more accurate factor loading estimates than ML and MLR
across all conditions in the study;
(4) the estimates of structural coefficients under ML and MLR outperform WLSMV and
ULSMV in all symmetric data conditions, whereas WLSMV and ULSMV surpass ML
and MLR in nearly all asymmetric data conditions;
(5) the robust standard errors of factor loadings obtained with ULSMV are more precise
than those produced by WLSMV and MLR across all conditions;
(6) the robust standard errors of structural coefficients obtained with WLSMV are more
precise than those with ULSMV and MLR in all asymmetric data conditions;
(7) among the three robust estimators, MLR is inferior to WLSMV and ULSMV in
controlling for Type I error rates of testing overall model fit in almost every
condition, unless a larger sample size is used (i.e., N = 1,000 in this thesis);
(8) RMSEA seems to be a reliable index in the evaluation of overall model fit when the
model has no specification error;
(9) the benefit of using diagonal weights can be found in the estimation of factor
loadings and structural coefficients and robust standard errors of structural
coefficients, but not in the estimation of robust standard errors of factor loadings
and the mean- and variance-adjusted chi-square goodness of fit statistics across all
conditions; and
(10) the accuracy and precision of factor loadings and structural coefficients, and of the
standard error estimates of both factor loadings and structural coefficients, improve
with increasing sample size and number of observed variables’ categories but
decrease with greater asymmetry in the distributions.
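
Conclusion (7) refers to empirical Type I error rates for the test of overall model fit. With
a correctly specified model, such a rate is simply the share of replications in which the
chi-square test rejects at the nominal level. The Python sketch below illustrates that
calculation under stated assumptions: the function names and p-values are hypothetical, and
the band around the nominal level is a generic binomial sampling-error band, not necessarily
the criterion applied in this thesis.

```python
from math import sqrt


def rejection_rate(p_values, alpha=0.05):
    """Share of replications whose model-fit test rejects at level alpha."""
    if not p_values:
        raise ValueError("at least one replication is required")
    return sum(p < alpha for p in p_values) / len(p_values)


def nominal_band(replications, alpha=0.05, z=1.96):
    """Approximate range of rejection rates expected from sampling error alone."""
    half_width = z * sqrt(alpha * (1 - alpha) / replications)
    return alpha - half_width, alpha + half_width


# Hypothetical p-values from four replications of a correctly specified model.
print(rejection_rate([0.01, 0.20, 0.04, 0.50]))
# Band for 500 replications per condition, as generated in this study.
print(nominal_band(500))
```

A rejection rate well above the upper end of the band indicates an inflated chi-square
statistic. Replications that fail to converge reduce the effective number of replications,
so the band should be computed from the replications actually retained.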

Although WLSMV and ULSMV can generally be recommended when fitting an SR
model with ordinal observed variables, it is worthwhile to point out that each estimator
considered in this thesis has its own advantages and disadvantages. This study provides
evidence that WLSMV and ULSMV perform better than MLR, and that MLR in turn performs better
than ML, in many conditions. WLSMV and ULSMV do not need a large sample size to recover
population factor loadings and structural coefficients or to evaluate overall model fit using
the mean- and variance-adjusted chi-square goodness of fit statistics, but a medium sample
(e.g., N = 500 or more) is required to obtain stable standard error estimates of both factor
loadings and structural coefficients. In addition, the benefit of using diagonal weights in
WLSMV can be observed in the estimation of factor loadings and structural coefficients as well
as robust standard errors of structural coefficients. Compared to ML and MLR, WLSMV and
ULSMV support more reliable model inference in small samples and are more likely to detect
small structural relationships with precision when data are slightly or moderately
asymmetric.

On the other hand, MLR has its own strengths, e.g., generally less biased standard
error estimates of factor loadings and structural coefficients, and accurate and precise
structural coefficient estimates in the conditions of symmetric data. MLR does not require a
large sample to produce stable structural coefficient estimates and standard error estimates of
factor loadings and structural coefficients, but may need quite a large sample (e.g., N = 1,000
or more) to control Type I error rates when testing overall model fit. MLR also moderately
underestimates factor loadings. However, the small amount of bias in structural coefficient
estimates makes MLR a practical choice when applied researchers are primarily concerned with
structural relationships among latent constructs. Consistent with asymptotic theory, ML can
perform well in a relatively large sample when data are nearly symmetric (or close to normal).
Generally speaking, the moderate-to-substantial underestimation of standard errors for both
factor loadings and structural coefficients and the considerable inflation of chi-square
goodness of fit statistics make ML less attractive in practice, particularly when data deviate
moderately from normality. However, ML and MLR with full information estimation can be
considered a promising approach in research practice when applied researchers have to deal
with missing data, because the treatment of missing data in the WLSMV and ULSMV estimators
remains technically underdeveloped.

It is important to keep in mind that any working recommendations provided herein are
based on the current model configurations. This study did not consider the possible effects of
violating the underlying normality assumption. However, it can be expected that such
violations would affect the performance of WLSMV and ULSMV more than that of MLR.
Furthermore, it is unclear how the four estimators perform on parameter and standard error
estimates, chi-square goodness of fit statistics, and RMSEA in an SR model with ordinal
observed variables under varying levels of model misspecification. Future investigations into
these simulation design characteristics would likely yield informative suggestions and more
fine-grained recommendations. Applied researchers still have to weigh the pros and cons of
different estimators in order to make better-informed decisions when analyzing an SR model
with ordinal observed indicators.

APPENDICES

Table 1. Overview of Six Major Simulation Studies in Ordinal CFA

Study                                       Year  Sample Size               No. Factors  No. Variables      No. Categories    Item Asymmetry  Estimation          Software
Oranje                                      2003  200, 500, 1000            1 & 3        5, 10, 15, 30, 45  2, 3, 5           Yes             ML* & WLSMV         LISREL & Mplus
Beauducel & Herzberg                        2006  250, 500, 750, 1000       1, 2, 4, 8   5, 10, 20, 40      2, 3, 4, 5, 6     Yes             ML & WLSMV          Mplus
Lei                                         2009  100, 250, 1000            2 & 3        6 & 9              5                 Yes             ML, ROBUST & WLSMV  EQS & Mplus
Forero, Maydeu-Olivares, & Gallardo-Pujol   2009  200, 500, 2000            1 & 3        9, 21, 42          2 & 5             Yes             ULS & WLSMV         Mplus
Yang-Wallentin, Joreskog, & Luo             2010  100, 200, 400, 800, 1600  2 & 4        6 & 16             2, 5, 7           Yes             ML*, ULS*, DWLS*    LISREL
Rhemtulla, Brosseau-Liard, & Savalei        2012  100, 150, 350, 600        2            10 & 20            2, 3, 4, 5, 6, 7  Yes             ULSMV & MLMV        Mplus

Note. *Polychoric correlation estimates and the estimated asymptotic covariance matrix need to be computed with PRELIS before
running LISREL.

Table 2. Robust Estimation Comparison in the Three SEM Software Packages

                                   SEM Software Programs
Estimation                         Mplus    EQS           LISREL
Robust Maximum Likelihood          MLR      ML, ROBUST    ML*
Robust Unweighted Least Squares    ULSMV    LS, ROBUST    ULS*
Robust Weighted Least Squares      WLSMV    ×             DWLS*

Note. *Polychoric correlation estimates and the estimated asymptotic covariance matrix need to be computed with PRELIS before
running LISREL. Robust weighted least squares estimation is currently unavailable in EQS.

Table 3. Comparison of Two Major Estimation Approaches: Maximum Likelihood and Least Squares in Mplus

Estimators            Parameters            Standard Errors       Chi-square
Maximum Likelihood
  ML                  ML                    ML                    ML
  MLM                 MLM = ML              MLM = MLMV ≠ ML       MLM ≠ ML
  MLMV                MLMV = ML             MLMV = MLM ≠ ML       MLMV ≠ ML
  MLR                 MLR = ML              MLR ≠ ML              MLR ≠ ML
Least Squares
  ULS                 ULS                   ULS                   ULS
  ULSMV               ULSMV = ULS           ULSMV ≠ ULS           ULSMV ≠ ULS
  WLS                 WLS                   WLS                   WLS
  WLSM                WLSM = WLSMV ≠ WLS    WLSM = WLSMV ≠ WLS    WLSM ≠ WLS
  WLSMV               WLSMV = WLSM ≠ WLS    WLSMV = WLSM ≠ WLS    WLSMV ≠ WLS

Note. ML = maximum likelihood, MLM = maximum likelihood with a mean-adjusted chi-square statistic, MLMV = maximum
likelihood with a mean- and variance-adjusted chi-square statistic; ULS = unweighted least squares, ULSMV = unweighted least
squares with a mean- and variance-adjusted chi-square statistic; WLS = weighted least squares, WLSM = weighted least squares
with a mean-adjusted chi-square statistic, WLSMV = weighted least squares with a mean- and variance-adjusted chi-square
statistic.

Table 4(a). Cases of Non-Convergence

Dis.                      Symmetry          Slight Asymmetry     Moderate Asymmetry      Bipolarization
Est.     N     Cat.     4    5    6    7      4    5    6    7      4    5    6    7      4    5    6    7
ML/MLR   200            0    0    0    0      0    0    0    0      5    0    0    0      0    0    0    0
         300            0    0    0    0      0    0    0    0      1    0    0    0      0    0    0    0
         400            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         500            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         750            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1000           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1500           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
WLSMV    200            0    0    0    0      0    0    0    0      4    1    0    0      0    0    0    0
         300            0    0    0    0      0    0    0    0      1    0    0    0      0    0    0    0
         400            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         500            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         750            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1000           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1500           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
ULSMV    200            0    0    0    0      0    0    0    0      2    0    0    0      0    0    0    0
         300            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         400            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         500            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         750            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1000           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1500           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0

Note. Est. = Estimators, Dis. = distribution type, and Cat. = number of categories. ML/MLR = maximum likelihood/robust
maximum likelihood, WLSMV = robust weighted least squares, ULSMV = robust unweighted least squares. N = sample sizes.

Table 4(b). Cases of Inadmissible Solutions

Dis.                      Symmetry          Slight Asymmetry     Moderate Asymmetry      Bipolarization
Est.     N     Cat.     4    5    6    7      4    5    6    7      4    5    6    7      4    5    6    7
ML/MLR   200            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         300            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         400            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         500            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         750            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1000           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1500           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
WLSMV    200            0    1    0    0      0    1    0    0      0    0    0    2      3    1    1    0
         300            0    0    0    0      0    0    0    0      1    0    0    0      0    0    0    0
         400            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         500            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         750            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1000           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1500           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
ULSMV    200            0    1    0    0      1    2    0    0      6    0    0    1      4    1    1    1
         300            0    0    0    0      0    0    0    0      3    0    1    0      0    0    0    0
         400            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         500            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         750            0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1000           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0
         1500           0    0    0    0      0    0    0    0      0    0    0    0      0    0    0    0

Note. Est. = Estimators, Dis. = distribution type, and Cat. = number of categories. ML/MLR = maximum likelihood/robust
maximum likelihood, WLSMV = robust weighted least squares, ULSMV = robust unweighted least squares. N = sample sizes.

Table 5. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural
Coefficients (N = 200)

                         ML/MLR                            WLSMV                             ULSMV
                     RBA             MSEA              RBA             MSEA              RBA             MSEA
Dis.   Cat.       FL      SC      FL      SC        FL      SC      FL      SC        FL      SC      FL      SC
sym    4       -7.00   -2.13  0.0142  0.9570      0.30   -1.73  0.0118  0.9716     -0.08   -1.89  0.0122  0.9454
       5       -4.44   -1.33  0.0107  0.7460      0.22   -0.74  0.0108  0.7847     -0.13   -0.82  0.0112  0.7637
       6       -3.20   -0.90  0.0095  0.6817      0.22   -0.52  0.0103  0.7338     -0.11   -0.82  0.0107  0.7123
       7       -2.49   -0.42  0.0089  0.6734      0.13   -0.17  0.0100  0.7222     -0.19   -0.20  0.0104  0.7002
slight 4      -10.10   -2.79  0.0216  1.3215      0.20   -0.65  0.0136  1.1989     -0.23   -0.97  0.0140  1.1492
       5       -6.92   -2.17  0.0154  0.8693      0.21   -1.02  0.0118  0.9212     -0.18   -1.00  0.0122  0.8813
       6       -6.04   -2.35  0.0138  0.8267      0.15   -1.50  0.0111  0.8199     -0.20   -1.69  0.0115  0.7826
       7       -5.43   -1.40  0.0132  0.7546      0.18   -0.31  0.0105  0.7231     -0.16   -0.41  0.0109  0.7017
mod    4      -11.86   -3.23  0.0291  1.2265      0.06    0.50  0.0161  1.4529     -0.55   -0.02  0.0168  1.4251
       5       -9.26   -3.03  0.0218  1.1744      0.11   -0.25  0.0133  1.1013     -0.37   -0.69  0.0138  1.1302
       6       -8.78   -3.18  0.0212  1.0577      0.10    0.11  0.0126  1.0693     -0.32   -0.35  0.0131  1.0445
       7       -8.73   -3.03  0.0213  1.0262      0.11    0.25  0.0121  0.9686     -0.30   -0.16  0.0126  0.9564

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML/MLR = maximum likelihood/robust maximum likelihood, WLSMV =
robust weighted least squares, ULSMV = robust unweighted least squares. FL represents factor loadings and SC is structural
coefficients.
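
For orientation, the Python sketch below shows how relative bias and root mean squared error
are conventionally computed for one parameter across Monte Carlo replications. The operational
definitions behind the tabled values are given earlier in the thesis and are not restated in
this section, so the formulas here (percentage relative bias, RMSE around the population
value) are assumed textbook forms, and the function names, estimates, and population value are
hypothetical.

```python
from math import sqrt


def relative_bias(estimates, true_value):
    """Percentage relative bias: 100 * (mean estimate - true value) / true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_value) / true_value


def rmse(estimates, true_value):
    """Root mean squared error of the estimates around the true value."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical loading estimates from three replications, true value 0.7.
loading_estimates = [0.63, 0.67, 0.65]
print(relative_bias(loading_estimates, 0.7))
print(rmse(loading_estimates, 0.7))
```

Averaging such per-parameter values over all factor loadings, or over all structural
coefficients, would give summary figures of the kind tabled here. A negative relative bias
corresponds to underestimation.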

Table 6. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural
Coefficients (N = 300)

                         ML/MLR                            WLSMV                             ULSMV
                     RBA             MSEA              RBA             MSEA              RBA             MSEA
Dis.   Cat.       FL      SC      FL      SC        FL      SC      FL      SC        FL      SC      FL      SC
sym    4       -6.91   -0.05  0.0108  0.4427      0.23    0.27  0.0076  0.4781     -0.03    0.18  0.0079  0.4693
       5       -4.44   -0.26  0.0076  0.4325      0.12    0.23  0.0069  0.4365     -0.12    0.09  0.0072  0.4316
       6       -3.19    0.04  0.0065  0.3915      0.10    0.44  0.0065  0.4135     -0.11    0.35  0.0068  0.4155
       7       -2.43    0.08  0.0060  0.3824      0.08    0.47  0.0064  0.3997     -0.12    0.35  0.0067  0.3995
slight 4       -9.96   -0.89  0.0174  0.6313      0.13    0.55  0.0088  0.6205     -0.15    0.32  0.0091  0.6097
       5       -6.87   -0.81  0.0117  0.5484      0.11    0.57  0.0076  0.5144     -0.15    0.42  0.0079  0.5208
       6       -5.88   -0.64  0.0101  0.4766      0.14    0.48  0.0071  0.4545     -0.10    0.33  0.0074  0.4618
       7       -5.36   -0.67  0.0097  0.4908      0.06    0.40  0.0069  0.4304     -0.17    0.25  0.0072  0.4297
mod    4      -11.72   -3.51  0.0238  0.7830      0.07   -0.23  0.0107  0.8631     -0.32   -0.64  0.0112  0.7905
       5       -9.12   -2.62  0.0173  0.6345      0.07    0.08  0.0088  0.6117     -0.24    0.05  0.0092  0.6212
       6       -8.63   -2.81  0.0164  0.7001      0.10    0.34  0.0081  0.5747     -0.19    0.18  0.0084  0.5856
       7       -8.65   -2.68  0.0166  0.6522      0.04    0.51  0.0078  0.5210     -0.23    0.32  0.0082  0.5271

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML/MLR = maximum likelihood/robust maximum likelihood, WLSMV =
robust weighted least squares, ULSMV = robust unweighted least squares. FL represents factor loadings and SC is structural
coefficients.

Table 7. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural
Coefficients (N = 500)

                         ML/MLR                            WLSMV                             ULSMV
                     RBA             MSEA              RBA             MSEA              RBA             MSEA
Dis.   Cat.       FL      SC      FL      SC        FL      SC      FL      SC        FL      SC      FL      SC
sym    4       -6.95   -0.78  0.0084  0.2751      0.12   -0.48  0.0045  0.2953     -0.05   -0.53  0.0046  0.3000
       5       -4.40   -0.59  0.0053  0.2451      0.09   -0.28  0.0041  0.2618     -0.06   -0.33  0.0043  0.2676
       6       -3.18   -0.89  0.0043  0.2371      0.06   -0.69  0.0039  0.2522     -0.08   -0.73  0.0041  0.2560
       7       -2.43   -0.38  0.0038  0.2211      0.06   -0.13  0.0038  0.2366     -0.08   -0.19  0.0039  0.2413
slight 4       -9.85   -1.27  0.0141  0.3203      0.14   -0.18  0.0052  0.3231     -0.04   -0.26  0.0053  0.3279
       5       -6.78   -1.70  0.0087  0.2976      0.10   -0.68  0.0045  0.3035     -0.06   -0.75  0.0047  0.3082
       6       -5.86   -1.35  0.0073  0.2775      0.08   -0.62  0.0041  0.2794     -0.08   -0.68  0.0043  0.2865
       7       -5.27   -1.26  0.0067  0.2640      0.07   -0.24  0.0040  0.2422     -0.08   -0.29  0.0042  0.2475
mod    4      -11.56   -3.65  0.0192  0.3944      0.11   -0.69  0.0062  0.4510     -0.14   -0.91  0.0065  0.4686
       5       -9.00   -3.48  0.0133  0.3458      0.09   -1.35  0.0052  0.3485     -0.10   -1.52  0.0054  0.3566
       6       -8.56   -3.44  0.0126  0.3428      0.03   -1.15  0.0048  0.3113     -0.15   -1.28  0.0050  0.3162
       7       -8.49   -3.39  0.0125  0.3428      0.04   -0.90  0.0047  0.3017     -0.13   -1.03  0.0049  0.3084

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML/MLR = maximum likelihood/robust maximum likelihood, WLSMV =
robust weighted least squares, ULSMV = robust unweighted least squares. FL represents factor loadings and SC is structural
coefficients.

Table 8. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural
Coefficients (N = 1,000)

                         ML/MLR                            WLSMV                             ULSMV
                     RBA             MSEA              RBA             MSEA              RBA             MSEA
Dis.   Cat.       FL      SC      FL      SC        FL      SC      FL      SC        FL      SC      FL      SC
sym    4       -6.89   -0.44  0.0065  0.1363      0.08   -0.23  0.0023  0.1433      0.01   -0.27  0.0024  0.1452
       5       -4.34   -0.60  0.0035  0.1246      0.09   -0.34  0.0020  0.1284      0.01   -0.37  0.0021  0.1299
       6       -3.07   -0.54  0.0025  0.1133      0.11   -0.36  0.0019  0.1177      0.04   -0.39  0.0020  0.1190
       7       -2.34   -0.59  0.0021  0.1082      0.09   -0.37  0.0019  0.1126      0.03   -0.39  0.0020  0.1142
slight 4       -9.78   -1.18  0.0117  0.1632      0.15   -0.27  0.0026  0.1622      0.05   -0.24  0.0027  0.1647
       5       -6.69   -1.51  0.0065  0.1484      0.11   -0.87  0.0022  0.1431      0.03   -0.92  0.0023  0.1446
       6       -5.79   -1.20  0.0053  0.1345      0.10   -0.51  0.0021  0.1248      0.03   -0.54  0.0022  0.1263
       7       -5.20   -1.08  0.0046  0.1377      0.10   -0.47  0.0020  0.1228      0.04   -0.50  0.0021  0.1246
mod    4      -11.49   -3.04  0.0162  0.1840      0.14   -0.16  0.0031  0.2001      0.02   -0.25  0.0033  0.2021
       5       -8.94   -2.47  0.0106  0.1694      0.11   -0.16  0.0026  0.1648      0.02   -0.21  0.0027  0.1660
       6       -8.46   -2.57  0.0098  0.1699      0.11   -0.32  0.0024  0.1498      0.03   -0.38  0.0025  0.1524
       7       -8.44   -2.47  0.0098  0.1709      0.11   -0.32  0.0023  0.1476      0.03   -0.36  0.0024  0.1495

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML/MLR = maximum likelihood/robust maximum likelihood, WLSMV =
robust weighted least squares, ULSMV = robust unweighted least squares. FL represents factor loadings and SC is structural
coefficients.

	  

Table 9. The Average Root Mean Squared Error (MSEA) for the Four Structural Coefficients (N = 1,000)

                           ML/MLR                                WLSMV                                 ULSMV
Dis.   Cat.  γ22 = .2 β31 = .2 β21 = .3 ϕ12 = .3   γ22 = .2 β31 = .2 β21 = .3 ϕ12 = .3   γ22 = .2 β31 = .2 β21 = .3 ϕ12 = .3
sym    4       0.1288   0.2066   0.0820   0.0176     0.1323   0.2227   0.0866   0.0179     0.1349   0.2283   0.0884   0.0178
       5       0.1203   0.1769   0.0805   0.0172     0.1241   0.1855   0.0846   0.0176     0.1258   0.1893   0.0864   0.0176
       6       0.1045   0.1596   0.0688   0.0175     0.1086   0.1668   0.0721   0.0179     0.1105   0.1701   0.0735   0.0179
       7       0.1038   0.1473   0.0676   0.0169     0.1090   0.1565   0.0719   0.0172     0.1105   0.1602   0.0730   0.0171
slight 4       0.1541   0.2350   0.1009   0.0227     0.1576   0.2353   0.1031   0.0217     0.1588   0.2414   0.1044   0.0216
       5       0.1440   0.2010   0.0987   0.0207     0.1392   0.1965   0.0967   0.0193     0.1407   0.2018   0.0981   0.0193
       6       0.1211   0.1904   0.0821   0.0194     0.1148   0.1795   0.0761   0.0180     0.1163   0.1841   0.0777   0.0180
       7       0.1324   0.1863   0.0895   0.0197     0.1238   0.1677   0.0816   0.0175     0.1265   0.1714   0.0836   0.0175
mod    4       0.1890   0.2554   0.1197   0.0310     0.2014   0.2926   0.1350   0.0264     0.2048   0.2995   0.1370   0.0262
       5       0.1551   0.2488   0.0999   0.0266     0.1433   0.2446   0.0959   0.0226     0.1453   0.2494   0.0978   0.0226
       6       0.1614   0.2319   0.1088   0.0271     0.1384   0.2069   0.0952   0.0210     0.1402   0.2137   0.0970   0.0210
       7       0.1521   0.2381   0.1005   0.0273     0.1285   0.2112   0.0883   0.0199     0.1304   0.2169   0.0899   0.0199

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML/MLR = maximum likelihood/robust maximum likelihood, WLSMV =
robust weighted least squares, ULSMV = robust unweighted least squares.

Table 10. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor
Loadings and Structural Coefficients (N = 200)

                           ML                               MLR
                     RBA             MSEA              RBA             MSEA
Dis.   Cat.     SEFL    SESC    SEFL    SESC      SEFL    SESC    SEFL    SESC
sym    4       -1.98   -4.28  0.0093  0.0963     -0.66   -2.56  0.0143  0.0990
       5       -2.34   -2.82  0.0103  0.0451     -1.14   -1.17  0.0152  0.0505
       6       -2.53   -2.61  0.0103  0.0399     -1.37   -1.19  0.0150  0.0452
       7       -2.79   -4.27  0.0114  0.0358     -1.59   -2.71  0.0162  0.0415
slight 4       -7.32   -8.21  0.0143  0.3196      0.31   -1.93  0.0166  0.4033
       5       -8.13   -5.80  0.0159  0.0751     -0.80    0.64  0.0170  0.0898
       6       -6.86   -5.14  0.0143  0.0643     -0.10    0.59  0.0167  0.0738
       7       -8.46   -4.81  0.0167  0.0527     -0.48    1.57  0.0162  0.0637
mod    4      -16.97  -11.63  0.0374  0.3579      0.33    2.71  0.0191  0.4418
       5      -14.85  -12.33  0.0310  0.2261      0.30   -0.04  0.0186  0.2428
       6      -16.07  -12.02  0.0348  0.1406     -0.09    1.01  0.0178  0.1590
       7      -16.75  -12.47  0.0370  0.1186      0.02    1.18  0.0176  0.1314

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. SEFL
represents standard errors of factor loadings and SESC is standard errors of structural coefficients.

Table 10 (cont’d)

                         WLSMV                             ULSMV
                     RBA             MSEA              RBA             MSEA
Dis.   Cat.     SEFL    SESC    SEFL    SESC      SEFL    SESC    SEFL    SESC
sym    4       -5.77   -6.62  0.0157  0.1017     -3.81   -4.27  0.0120  0.1088
       5       -5.96   -5.91  0.0158  0.0528     -4.67   -3.87  0.0122  0.0514
       6       -7.45   -8.11  0.0173  0.0544     -6.31   -5.86  0.0139  0.0543
       7       -8.10   -9.96  0.0187  0.0522     -7.23   -7.80  0.0150  0.0475
slight 4       -6.14   -6.32  0.0169  0.1399     -4.97   -4.37  0.0134  0.1251
       5       -7.15   -7.34  0.0176  0.0816     -6.11   -5.15  0.0140  0.0709
       6       -7.38   -7.82  0.0175  0.0551     -6.38   -5.49  0.0142  0.0546
       7       -8.05   -7.54  0.0180  0.0581     -6.93   -5.27  0.0144  0.0514
mod    4       -6.66    5.13  0.0188  0.2212     -5.06   -3.56  0.0142  0.2649
       5       -6.84   -7.11  0.0172  0.1199     -5.30   -5.64  0.0134  0.1272
       6       -7.86  -10.01  0.0182  0.0971     -7.06   -8.28  0.0146  0.0918
       7       -7.72   -9.52  0.0184  0.0936     -7.14   -8.05  0.0154  0.0916

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. SEFL represents standard errors of factor loadings and SESC is standard errors of structural coefficients.

Table 11. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor
Loadings and Structural Coefficients (N = 300)

                           ML                               MLR
                     RBA             MSEA              RBA             MSEA
Dis.   Cat.     SEFL    SESC    SEFL    SESC      SEFL    SESC    SEFL    SESC
sym    4       -0.74    2.52  0.0065  0.0392     -0.02    3.95  0.0098  0.0451
       5       -1.10   -0.08  0.0068  0.0273     -0.43    1.18  0.0098  0.0314
       6       -0.71    0.95  0.0065  0.0278      0.05    2.06  0.0096  0.0316
       7       -1.25    1.38  0.0069  0.0274     -0.54    2.56  0.0100  0.0324
slight 4       -7.26   -4.63  0.0114  0.0557     -0.43    1.50  0.0103  0.0679
       5       -7.98   -6.75  0.0144  0.0460     -0.87   -0.62  0.0108  0.0501
       6       -7.08   -4.13  0.0115  0.0401     -0.78    1.45  0.0105  0.0509
       7       -8.94   -5.46  0.0147  0.0348     -1.51    0.66  0.0107  0.0414
mod    4      -16.82  -14.01  0.0346  0.2006     -0.69   -0.35  0.0121  0.2541
       5      -15.08  -10.93  0.0292  0.0867     -0.84    1.46  0.0114  0.1692
       6      -16.06  -12.60  0.0325  0.1475     -0.99    0.18  0.0115  0.0519
       7      -16.68  -12.79  0.0345  0.0623     -0.96    0.49  0.0112  0.0640

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. SEFL
represents standard errors of factor loadings and SESC is standard errors of structural coefficients.

Table 11 (cont’d)

                         WLSMV                             ULSMV
                     RBA             MSEA              RBA             MSEA
Dis.   Cat.     SEFL    SESC    SEFL    SESC      SEFL    SESC    SEFL    SESC
sym    4       -3.04   -0.22  0.0101  0.0435     -1.97    1.41  0.0080  0.0456
       5       -3.03   -1.06  0.0097  0.0325     -2.68    0.63  0.0080  0.0334
       6       -4.02   -3.04  0.0100  0.0341     -3.50   -1.78  0.0079  0.0341
       7       -4.69   -2.51  0.0105  0.0345     -4.49   -0.97  0.0086  0.0335
slight 4       -3.97   -1.56  0.0106  0.0550     -3.06    0.33  0.0083  0.0598
       5       -3.91   -3.73  0.0103  0.0460     -3.72   -2.44  0.0084  0.0522
       6       -4.28   -3.25  0.0102  0.0361     -3.50   -1.82  0.0082  0.0374
       7       -5.33   -3.78  0.0113  0.0337     -5.02   -2.16  0.0093  0.0339
mod    4       -5.34   -4.21  0.0135  0.1137     -4.89   -1.26  0.0110  0.2204
       5       -4.73   -2.70  0.0116  0.0913     -4.09   -1.51  0.0092  2.42 0.1000
       6       -4.49   -3.27  0.0113  0.0555     -4.12   -1.91  0.0091  0.0692
       7       -4.89   -3.08  0.0114  0.0382     -4.74   -1.73  0.0093  0.0407

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood, WLSMV
= robust weighted least squares, ULSMV = robust unweighted least squares. SEFL represents standard errors of factor loadings
and SESC is standard errors of structural coefficients.

Table 12. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor
Loadings and Structural Coefficients (N = 500)

                           ML                               MLR
                     RBA             MSEA              RBA             MSEA
Dis.   Cat.     SEFL    SESC    SEFL    SESC      SEFL    SESC    SEFL    SESC
sym    4        0.74   -1.45  0.0041  0.0150      0.88   -0.59  0.0060  0.0166
       5        0.08   -0.47  0.0048  0.0120      0.23    0.50  0.0066  0.0137
       6       -0.18   -0.97  0.0046  0.0119      0.16   -0.25  0.0065  0.0134
       7       -0.29   -0.35  0.0051  0.0113      0.09    0.48  0.0070  0.0130
slight 4       -5.91   -3.55  0.0084  0.0249      0.49    2.38  0.0075  0.0295
       5       -6.10   -5.77  0.0089  0.0186      0.42    0.16  0.0080  0.0196
       6       -5.04   -5.05  0.0075  0.0163      0.96    0.23  0.0077  0.0175
       7       -7.00   -4.72  0.0097  0.0159      0.04    1.26  0.0072  0.0175
mod    4      -15.22  -12.88  0.0276  0.0402      0.38    0.23  0.0080  0.0331
       5      -13.61  -10.90  0.0228  0.0275      0.17    1.05  0.0074  0.0224
       6      -14.59  -11.81  0.0259  0.0310     -0.03    0.76  0.0074  0.0246
       7      -15.14  -12.79  0.0276  0.0323      0.12    0.15  0.0074  0.0227

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. SEFL
represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


Table 12 (cont’d)

                          WLSMV                            ULSMV
                   RBA             MSEA            RBA             MSEA
Dis.    Cat.     SEFL    SESC    SEFL    SESC    SEFL    SESC    SEFL    SESC
sym     4       -0.43   -3.00  0.0065  0.0169    0.25   -2.05  0.0054  0.0171
        5       -1.23   -1.70  0.0069  0.0145   -1.36   -1.04  0.0055  0.0149
        6       -1.74   -2.76  0.0068  0.0139   -1.56   -1.83  0.0055  0.0141
        7       -1.91   -2.60  0.0069  0.0135   -2.06   -1.80  0.0056  0.0138
slight  4       -1.64   -0.14  0.0070  0.0239   -1.46    0.49  0.0057  0.0249
        5       -1.80   -3.20  0.0074  0.0183   -2.14   -2.51  0.0062  0.0185
        6       -0.94   -4.00  0.0066  0.0155   -1.32   -3.92  0.0053  0.0160
        7       -1.99   -1.15  0.0069  0.0147   -2.09   -0.54  0.0058  0.0152
mod     4       -2.03   -2.60  0.0080  0.0295   -1.80   -2.48  0.0060  0.0324
        5       -1.70   -2.07  0.0070  0.0202   -0.97   -1.82  0.0061  0.0212
        6       -2.32   -1.69  0.0071  0.0201   -1.96   -0.87  0.0060  0.0207
        7       -2.58   -3.04  0.0071  0.0180   -2.49   -2.42  0.0062  0.0187

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. SEFL represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


Table 13. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor
Loadings and Structural Coefficients (N = 1,000)

                            ML                              MLR
                   RBA             MSEA            RBA             MSEA
Dis.    Cat.     SEFL    SESC    SEFL    SESC    SEFL    SESC    SEFL    SESC
sym     4        0.56   -1.09  0.0028  0.0072    0.29   -0.55  0.0036  0.0079
        5        0.36   -2.31  0.0026  0.0060    0.14   -1.74  0.0034  0.0066
        6        1.18   -0.82  0.0028  0.0056    1.16   -0.35  0.0037  0.0063
        7        1.51   -0.76  0.0030  0.0054    1.48   -0.26  0.0038  0.0062
slight  4       -5.50   -5.64  0.0055  0.0118    0.45   -0.15  0.0037  0.0107
        5       -6.38   -8.22  0.0067  0.0135   -0.24   -2.74  0.0038  0.0094
        6       -5.34   -5.04  0.0058  0.0090    0.26    0.13  0.0041  0.0084
        7       -6.24   -7.11  0.0066  0.0113    0.49   -1.51  0.0038  0.0083
mod     4      -14.72  -13.60  0.0239  0.0262    0.28   -0.70  0.0039  0.0111
        5      -12.31  -11.68  0.0178  0.0205    1.17   -0.08  0.0043  0.0100
        6      -13.46  -13.17  0.0208  0.0244    0.71   -1.14  0.0041  0.0102
        7      -13.67  -13.55  0.0213  0.0257    1.19   -1.08  0.0041  0.0105

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. SEFL
represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


Table 13 (cont’d)

                          WLSMV                            ULSMV
                   RBA             MSEA            RBA             MSEA
Dis.    Cat.     SEFL    SESC    SEFL    SESC    SEFL    SESC    SEFL    SESC
sym     4       -0.73   -1.91  0.0035  0.0082    0.23   -1.47  0.0031  0.0084
        5       -0.10   -2.32  0.0034  0.0070    0.29   -1.75  0.0031  0.0071
        6       -0.45   -1.82  0.0033  0.0066   -0.32   -1.25  0.0029  0.0068
        7        0.05   -1.89  0.0034  0.0065   -0.02   -1.31  0.0029  0.0066
slight  4       -0.26   -1.36  0.0038  0.0098    0.38   -1.01  0.0032  0.0103
        5       -0.84   -3.48  0.0035  0.0088   -0.58   -2.79  0.0031  0.0089
        6       -0.80   -0.62  0.0034  0.0070    0.02   -0.18  0.0028  0.0074
        7       -0.69   -2.35  0.0036  0.0072    0.13   -1.76  0.0032  0.0073
mod     4       -1.41   -1.93  0.0042  0.0115   -1.17   -1.31  0.0035  0.0119
        5       -0.41   -0.57  0.0037  0.0093   -0.33   -0.14  0.0033  0.0095
        6       -0.57   -1.89  0.0040  0.0085   -0.87   -1.56  0.0036  0.0087
        7       -0.07   -2.34  0.0038  0.0081   -0.16   -1.88  0.0031  0.0083

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. SEFL represents standard errors of factor loadings and SESC is standard errors of structural coefficients.
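
As an illustration of how summary measures of this kind can be computed from replication output, the following Python sketch assumes the conventional definitions: relative bias as 100 × (mean estimate − reference value) / reference value, and mean squared error as the mean squared deviation from the reference value. For standard errors, the reference value is taken here to be the empirical standard deviation of the parameter estimates across replications. These definitions, the function names, and the input values are assumptions made for illustration; they are not taken from the tables above. The tabled quantities are averages across parameters.

```python
from statistics import mean, pstdev


def relative_bias_pct(estimates, reference):
    """Relative bias in percent: 100 * (mean estimate - reference) / reference."""
    return 100.0 * (mean(estimates) - reference) / reference


def mean_squared_error(estimates, reference):
    """Mean squared deviation of the estimates from the reference value."""
    return mean([(e - reference) ** 2 for e in estimates])


# Hypothetical replication output for one factor loading (not thesis data).
loading_estimates = [0.78, 0.82, 0.80, 0.76]
se_estimates = [0.020, 0.024, 0.022, 0.022]

# Assumption: the empirical SD of the parameter estimates across replications
# stands in for the "true" standard error.
empirical_sd = pstdev(loading_estimates)

rb_loading = relative_bias_pct(loading_estimates, 0.8)   # bias of the estimate
rb_se = relative_bias_pct(se_estimates, empirical_sd)    # bias of the SE
mse_se = mean_squared_error(se_estimates, empirical_sd)  # MSE of the SE
```

Averaging `relative_bias_pct` or `mean_squared_error` over all factor loadings, or over all structural coefficients, would give one entry of the kind reported per condition.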


Table 14. Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 200)
                            ML                              MLR
                Chi-square        RMSEA         Chi-square        RMSEA
Dis.    Cat.     bias       %       M       %    bias       %       M       %
sym     4        5.14    9.60   0.015    0.00    5.99   11.60   0.016    0.00
        5        5.69   13.80   0.015    0.00   6.431   15.80   0.016    0.00
        6        5.61   14.00   0.015    0.00    6.37   15.00   0.016    0.00
        7        5.77   12.40   0.015    0.00    6.54   14.60   0.016    0.00
slight  4       13.52   32.20   0.023    0.00    9.58   21.20   0.019    0.00
        5       14.33   34.40   0.024    0.00    9.57   22.20   0.019    0.00
        6       13.53   31.40   0.023    0.00    9.33   19.80   0.019    0.00
        7       14.79   36.60   0.025    0.00    9.59   23.00   0.019    0.00
mod     4       28.44   74.49   0.036    0.00   11.70   26.11   0.022    0.00
        5       25.90   68.40   0.034    0.00   11.37   26.20   0.021    0.00
        6       27.41   67.74   0.035    0.00   11.62   27.66   0.021    0.00
        7       28.92   74.50   0.036    0.00   11.78   26.91   0.021    0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. M =
mean.


Table 14 (cont’d)
                          WLSMV                            ULSMV
                Chi-square        RMSEA         Chi-square        RMSEA
Dis.    Cat.     bias       %       M       %    bias       %       M       %
sym     4        2.88    4.80   0.011    0.00    0.80    2.60   0.009    0.00
        5        3.45    6.21   0.012    0.00    1.50    3.61   0.010    0.00
        6        4.65    7.20   0.013    0.00    2.55    4.80   0.011    0.00
        7        5.23    8.20   0.014    0.00    2.99    5.04   0.012    0.00
slight  4        4.04    4.82   0.013    0.00    1.83    1.61   0.010    0.00
        5        4.73    6.02   0.014    0.00    2.62    3.22   0.011    0.00
        6        5.25    7.01   0.014    0.00    3.26    4.81   0.012    0.00
        7        5.78    9.02   0.015    0.00    3.45    5.81   0.012    0.00
mod     4        5.19    4.89   0.014    0.00    3.36    2.86   0.012    0.00
        5        5.58    8.22   0.015    0.00    3.76    5.20   0.012    0.00
        6        5.94    9.02   0.015    0.00    4.12    6.81   0.013    0.00
        7        6.47    9.70   0.016    0.00    4.65    7.46   0.013    0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. M = mean.


Table 15. Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 300)
                            ML                              MLR
                Chi-square        RMSEA         Chi-square        RMSEA
Dis.    Cat.     bias       %       M       %    bias       %       M       %
sym     4        2.83    9.80   0.010    0.00    3.24   10.80   0.010    0.00
        5        3.59   10.00   0.011    0.00    3.89   10.80   0.011    0.00
        6        3.53   10.80   0.010    0.00    3.92   11.40   0.011    0.00
        7        3.82   10.40   0.011    0.00    4.17   11.20   0.011    0.00
slight  4       11.82   27.60   0.018    0.00    6.12   12.80   0.013    0.00
        5       13.59   33.87   0.019    0.00    7.24   18.04   0.014    0.00
        6       11.95   27.66   0.018    0.00    6.14   13.63   0.013    0.00
        7       13.76   33.40   0.019    0.00    6.91   16.60   0.013    0.00
mod     4       26.40   67.74   0.028    0.00    7.37   17.43   0.014    0.00
        5       24.11   64.20   0.027    0.00    7.40   18.60   0.014    0.00
        6       26.62   69.00   0.028    0.00    8.39   20.00   0.015    0.00
        7       27.29   68.60   0.029    0.00    7.87   19.80   0.014    0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. M =
mean.


Table 15 (cont’d)
                          WLSMV                            ULSMV
                Chi-square        RMSEA         Chi-square        RMSEA
Dis.    Cat.     bias       %       M       %    bias       %       M       %
sym     4        1.65    4.00   0.009    0.00    0.23    3.40   0.007    0.00
        5        2.13    5.00   0.009    0.00    0.80    4.40   0.008    0.00
        6        3.12    6.20   0.010    0.00    1.71    4.60   0.008    0.00
        7        3.44    7.60   0.010    0.00    1.98    5.60   0.008    0.00
slight  4        2.57    4.01   0.009    0.00    1.17    2.61   0.008    0.00
        5        3.61    6.61   0.010    0.00    2.11    4.61   0.009    0.00
        6        3.53    6.81   0.010    0.00    2.15    5.21   0.009    0.00
        7        4.06    7.60   0.010    0.00    2.51    6.20   0.009    0.00
mod     4        3.44    5.24   0.010    0.00    2.39    4.05   0.009    0.00
        5        3.67    6.40   0.010    0.00    2.58    5.20   0.009    0.00
        6        4.00    5.80   0.011    0.00    2.85    4.41   0.009    0.00
        7        4.12    5.60   0.011    0.00    2.95    3.60   0.010    0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. M = mean.


Table 16. Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 500)
                            ML                              MLR
                Chi-square        RMSEA         Chi-square        RMSEA
Dis.    Cat.     bias       %       M       %    bias       %       M       %
sym     4        2.84    9.20   0.008    0.00    2.92    8.80   0.008    0.00
        5        2.56    9.20   0.008    0.00    2.54    9.20   0.007    0.00
        6        2.37    8.40   0.007    0.00    2.45    8.60   0.008    0.00
        7        2.68    9.60   0.008    0.00    2.70    9.40   0.008    0.00
slight  4       10.79   24.00   0.013    0.00    3.86    9.40   0.009    0.00
        5       11.11   25.40   0.013    0.00    3.59    9.20   0.008    0.00
        6       11.05   27.05   0.013    0.00    3.98   11.02   0.009    0.00
        7       11.38   26.80   0.014    0.00    3.46    9.60   0.008    0.00
mod     4       25.65   66.53   0.022    0.00    5.00   10.06   0.009    0.00
        5       23.57   60.32   0.021    0.00    5.15   12.22   0.009    0.00
        6       24.48   63.40   0.021    0.00    4.72   12.80   0.009    0.00
        7       25.88   67.40   0.022    0.00    4.94   12.40   0.009    0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. M =
mean.


Table 16 (cont’d)
                          WLSMV                            ULSMV
                Chi-square        RMSEA         Chi-square        RMSEA
Dis.    Cat.     bias       %       M       %    bias       %       M       %
sym     4        1.55    4.80   0.007    0.00    0.64    4.00   0.006    0.00
        5        1.46    4.60   0.006    0.00    0.60    3.80   0.006    0.00
        6        1.86    5.60   0.007    0.00    0.86    4.40   0.006    0.00
        7        2.30    7.40   0.007    0.00    1.34    5.80   0.006    0.00
slight  4        1.36    3.00   0.006    0.00    0.34    2.60   0.006    0.00
        5        1.99    3.80   0.007    0.00    1.01    3.80   0.006    0.00
        6        2.13    5.61   0.007    0.00    1.21    5.20   0.006    0.00
        7        1.91    3.80   0.007    0.00    1.03    3.20   0.006    0.00
mod     4        2.31    4.62   0.007    0.00    1.56    3.80   0.007    0.00
        5        2.41    6.83   0.007    0.00    1.65    4.80   0.007    0.00
        6        2.28    4.80   0.007    0.00    1.48    3.60   0.006    0.00
        7        2.66    6.60   0.007    0.00    1.88    5.20   0.007    0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. M = mean.


Table 17. Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 1,000)
                            ML                              MLR
                Chi-square        RMSEA         Chi-square        RMSEA
Dis.    Cat.     bias       %       M       %    bias       %       M       %
sym     4        0.81    5.60   0.005    0.00    0.70    5.40   0.005    0.00
        5        0.88    6.00   0.005    0.00    0.67    5.80   0.005    0.00
        6        1.28    7.20   0.005    0.00    1.12    7.20   0.005    0.00
        7        1.10    5.80   0.005    0.00    0.93    5.80   0.005    0.00
slight  4       10.06   22.20   0.009    0.00    2.22    6.60   0.005    0.00
        5        9.51   22.00   0.009    0.00    1.12    7.80   0.005    0.00
        6        9.36   19.80   0.009    0.00    1.45    7.40   0.005    0.00
        7        9.73   21.20   0.009    0.00    0.93    6.80   0.005    0.00
mod     4       23.43   63.20   0.014    0.00    1.78    7.80   0.005    0.00
        5       21.09   56.40   0.014    0.00    1.69    5.40   0.005    0.00
        6       22.43   58.00   0.014    0.00    1.60    5.60   0.005    0.00
        7       23.35   59.80   0.014    0.00    1.50    7.40   0.005    0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. M =
mean.


Table 17 (cont’d)
                          WLSMV                            ULSMV
                Chi-square        RMSEA         Chi-square        RMSEA
Dis.    Cat.     bias       %       M       %    bias       %       M       %
sym     4        0.28    3.00   0.004    0.00   -0.21    2.80   0.004    0.00
        5        0.30    3.60   0.004    0.00   -0.13    3.01   0.004    0.00
        6        1.14    3.60   0.005    0.00    0.74    3.80   0.004    0.00
        7        0.90    4.40   0.004    0.00    0.56    3.80   0.004    0.00
slight  4        1.04    3.60   0.005    0.00    0.56    3.41   0.004    0.00
        5        0.62    4.80   0.005    0.00    0.19    4.80   0.004    0.00
        6        0.91    4.40   0.004    0.00    0.49    4.40   0.004    0.00
        7        0.79    4.60   0.004    0.00    0.30    3.80   0.004    0.00
mod     4        0.61    5.00   0.004    0.00    0.30    4.41   0.004    0.00
        5        0.89    4.20   0.004    0.00    0.55    3.60   0.004    0.00
        6        0.71    3.60   0.004    0.00    0.35    2.60   0.004    0.00
        7        0.55    4.20   0.004    0.00    0.19    3.60   0.004    0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. M = mean.
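
For readers who wish to reproduce summaries of this kind, a minimal Python sketch follows. It assumes that chi-square bias is the mean test statistic across replications minus the model degrees of freedom, that the rejection rate is the percentage of replications with p < .05, and that RMSEA takes the common form sqrt(max(T − df, 0) / (df(N − 1))); some programs use N in place of N − 1. These definitions, the function names, and the numeric inputs are illustrative assumptions rather than the procedure used to produce the tables.

```python
from math import sqrt
from statistics import mean


def chi_square_bias(statistics_, df):
    """Mean test statistic across replications minus its expected value (df)."""
    return mean(statistics_) - df


def rejection_rate_pct(p_values, alpha=0.05):
    """Percentage of replications whose p-value falls below alpha."""
    return 100.0 * sum(p < alpha for p in p_values) / len(p_values)


def rmsea(statistic, df, n):
    """Point estimate of RMSEA; truncated at zero when statistic < df."""
    return sqrt(max(statistic - df, 0.0) / (df * (n - 1)))


# Hypothetical values for four replications (not thesis data).
example_stats = [165.0, 170.0, 160.0, 161.0]
example_p = [0.01, 0.20, 0.04, 0.60]

bias = chi_square_bias(example_stats, df=160)
rate = rejection_rate_pct(example_p)
fit = rmsea(180.0, df=160, n=201)
```

With a nominal alpha of .05, a well-behaved test statistic should yield a rejection rate near 5% under a correctly specified model, which is the benchmark against which the percentages in the tables can be read.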


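[Path diagram. Exogenous factors ξ1 and ξ2 (Φ12 = .3) predict endogenous factors η1, η2, and η3, whose disturbances are
ζ1, ζ2, and ζ3. Standardized paths: γ11 = .4, γ12 = .6, γ21 = .4, γ22 = .2, γ31 = .1, γ32 = .1, β21 = .3, β31 = .2,
β32 = .5.]
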

Figure 1. The postulated five-factor structural regression model with standardized coefficients.
Note. Ordinal observed variables of each latent construct are not depicted for clarity.
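
Because the coefficients are standardized, the disturbance variance of each endogenous factor equals one minus the variance explained by its predictors. The Python sketch below works through that arithmetic for the recursive model; it reproduces the residual variances fixed in the Mplus data-generation code of Appendix D (.336, .4364, and .3798). The dictionary layout and variable names are illustrative assumptions, and the mapping of ξ1, ξ2, η1, η2, η3 to F1 through F5 follows the order of the ON statements in that code.

```python
# Standardized population coefficients of the five-factor model.
phi12 = 0.3
paths = {
    "eta1": {"xi1": 0.4, "xi2": 0.6},
    "eta2": {"xi1": 0.4, "xi2": 0.2, "eta1": 0.3},
    "eta3": {"xi1": 0.1, "xi2": 0.1, "eta1": 0.2, "eta2": 0.5},
}

# Covariances among latent variables, each assumed to have unit variance.
cov = {
    ("xi1", "xi1"): 1.0, ("xi2", "xi2"): 1.0,
    ("xi1", "xi2"): phi12, ("xi2", "xi1"): phi12,
}
known = ["xi1", "xi2"]
disturbance_var = {}

for eta, coefs in paths.items():  # recursive model: process in causal order
    # Covariance of eta with each earlier variable v: sum_p b_p * cov(p, v).
    for v in known:
        c = sum(b * cov[(p, v)] for p, b in coefs.items())
        cov[(eta, v)] = cov[(v, eta)] = c
    # Explained variance: sum_p b_p * cov(eta, p); the remainder is the
    # disturbance variance that keeps Var(eta) equal to 1.
    explained = sum(b * cov[(eta, p)] for p, b in coefs.items())
    disturbance_var[eta] = 1.0 - explained
    cov[(eta, eta)] = 1.0
    known.append(eta)

rounded = {k: round(v, 4) for k, v in disturbance_var.items()}
```

For η1, for example, the explained variance is .4² + .6² + 2(.4)(.6)(.3) = .664, leaving a disturbance variance of .336.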


[Figure panels, four per distribution:
Distribution 1: Symmetry, panels 1(a)-1(d)
Distribution 2: Slight Asymmetry, panels 2(a)-2(d)
Distribution 3: Moderate Asymmetry, panels 3(a)-3(d)
Distribution 4: Bipolarization, panels 4(a)-4(d)]
Figure 2. Response probabilities of ordinal observed indicators.

	  

	  
Figure 3. Average mean squared error for the factor loading estimates across the number of categories with symmetric
data and the smallest sample size N = 200.


	  

	  
Figure 4. Average mean squared error for the standard error estimates of factor loadings across the number of
categories with slightly asymmetric data and the sample size N = 300.


Figure 5. Average mean squared error for the standard error estimates of factor loadings across the number of
categories with slightly asymmetric data and the sample size N = 1,000.


Figure 6. Average mean squared error for the standard error estimates of structural coefficients across the number of
categories with slightly asymmetric data and the sample size N = 300.

	  


[P-P plot panels, one set per sample size: N = 200, N = 500, and N = 1,000]
Figure 7. P-P plots for TML, TMLR, TWLSMV, and TULSMV (Moderate Asymmetry and 7-category)


[P-P plot panels, one set per distribution: Symmetry, Slight Asymmetry, and Moderate Asymmetry]
Figure 8. P-P plots for TML, TMLR, TWLSMV, and TULSMV (N = 300 and 7-category)


Appendix C
Technical Details
1. Robust correction to the chi-square statistic for WLSM
The mean-adjusted chi-square statistic can also be implemented in the D-WLS estimator
(Muthén & Muthén, 2010):

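\[
T_{\text{D-WLSM}} = \frac{df}{\mathrm{trace}(\mathbf{U}\mathbf{V})}\, T_{\text{WLS}}, \qquad df = s - t, \tag{A.1}
\]

where $T_{\text{WLS}} = (N - 1)\,F_{\text{WLS}}(\theta, \mathbf{s})$, $\mathbf{V}$ is the estimated asymptotic covariance matrix of $\mathbf{s}$,
$\mathbf{U} = \mathbf{W}_{D}^{-1} - \mathbf{W}_{D}^{-1}\boldsymbol{\Delta}(\boldsymbol{\Delta}'\mathbf{W}_{D}^{-1}\boldsymbol{\Delta})^{-1}\boldsymbol{\Delta}'\mathbf{W}_{D}^{-1}$,
$s$ = the number of unique elements in $\mathbf{s}$, and $t$ = the number of independent model parameters.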
2. Robust corrections to the standard error for MLM or MLMV
A consistent estimator of the asymptotic covariance matrix of the parameter estimates Θ for
MLM or MLMV can be expressed as (Muthén & Muthén, 2010; Satorra & Bentler, 1994):
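\[
\mathrm{aCov}(\Theta)_{\text{MLM or MLMV}} = N^{-1}(\boldsymbol{\Delta}'\mathbf{W}_{NT}\boldsymbol{\Delta})^{-1}\boldsymbol{\Delta}'\mathbf{W}_{NT}\mathbf{V}\mathbf{W}_{NT}\boldsymbol{\Delta}(\boldsymbol{\Delta}'\mathbf{W}_{NT}\boldsymbol{\Delta})^{-1}, \tag{A.2}
\]

and

\[
\mathbf{W}_{NT} = \tfrac{1}{2} N \{\mathbf{D}'[\boldsymbol{\Sigma}^{-1}(\Theta) \otimes \boldsymbol{\Sigma}^{-1}(\Theta)]\mathbf{D}\}, \tag{A.3}
\]

where $\boldsymbol{\Delta} = \partial\boldsymbol{\sigma}(\Theta)/\partial\Theta$ is the matrix of model first derivatives evaluated at the parameter estimates
$\Theta$, $\mathbf{W}_{NT}$ is the normal-theory weight matrix (see Browne, 1974), $\mathbf{V}$ is the estimated asymptotic
covariance matrix of $\mathbf{S}$, $\mathbf{D}$ is the “duplication” matrix (see Magnus & Neudecker, 1986), and $\otimes$
denotes a Kronecker product.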
3. Robust corrections to the chi-square statistic for MLM and MLMV
The mean-adjusted chi-square statistic is available in the robust ML estimator (also known as
the Satorra-Bentler scaled chi-square statistic: Satorra & Bentler, 1994; Muthén, 1993):

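\[
T_{\text{MLM}} = T_{\text{SB}} = \frac{df}{\mathrm{trace}(\mathbf{U}\mathbf{V})}\, T_{\text{ML}}, \qquad df = s - t, \tag{A.4}
\]

where $T_{\text{ML}} = (N - 1)\,F_{\text{ML}}(\Theta, \mathbf{S})$, $\mathbf{V}$ is the estimated asymptotic covariance matrix of $\mathbf{S}$,
$\mathbf{U} = \mathbf{W}_{NT} - \mathbf{W}_{NT}\boldsymbol{\Delta}(\boldsymbol{\Delta}'\mathbf{W}_{NT}\boldsymbol{\Delta})^{-1}\boldsymbol{\Delta}'\mathbf{W}_{NT}$,
$s$ = the number of unique elements in $\mathbf{S}$, and $t$ = the number of total model parameters.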
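
The scaling in Equation (A.4) is simple arithmetic once trace(UV) is available. The following Python sketch shows that step only; the function names and the toy 2 × 2 matrices are assumptions made for illustration and do not correspond to any matrices estimated in this study (in particular, the identity matrix used for U is a placeholder, not a genuine residual weight matrix).

```python
def trace_of_product(u, v):
    """trace(UV) for two square matrices given as nested lists."""
    n = len(u)
    return sum(u[i][j] * v[j][i] for i in range(n) for j in range(n))


def mean_adjusted_chi_square(t_ml, df, trace_uv):
    """Mean-adjusted (scaled) statistic: (df / trace(UV)) * T_ML."""
    if trace_uv <= 0:
        raise ValueError("trace(UV) must be positive")
    return df * t_ml / trace_uv


# Toy inputs, purely illustrative.
u = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.5, 0.2], [0.2, 1.5]]

trace_uv = trace_of_product(u, v)
t_scaled = mean_adjusted_chi_square(t_ml=6.0, df=2, trace_uv=trace_uv)
```

When trace(UV) exceeds df, as it tends to under nonnormality, the scaled statistic is smaller than the unscaled one; when trace(UV) equals df, the statistic is unchanged.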
Alternatively, the mean- and variance-adjusted chi-square statistic can also be implemented
in the robust ML estimator (Asparouhov & Muthén, 2010):

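\[
T_{\text{MLMV}} = \sqrt{\frac{df}{\mathrm{trace}(\mathbf{U}\mathbf{V}\mathbf{U}\mathbf{V})}}\; T_{\text{ML}} + df - \sqrt{\frac{df\,[\mathrm{trace}(\mathbf{U}\mathbf{V})]^{2}}{\mathrm{trace}(\mathbf{U}\mathbf{V}\mathbf{U}\mathbf{V})}}, \qquad df = s - t, \tag{A.5}
\]

where $T_{\text{ML}} = (N - 1)\,F_{\text{ML}}(\Theta, \mathbf{S})$, $\mathbf{V}$ is the estimated asymptotic covariance matrix of $\mathbf{S}$,
$\mathbf{U} = \mathbf{W}_{NT} - \mathbf{W}_{NT}\boldsymbol{\Delta}(\boldsymbol{\Delta}'\mathbf{W}_{NT}\boldsymbol{\Delta})^{-1}\boldsymbol{\Delta}'\mathbf{W}_{NT}$,
$s$ = the number of unique elements in $\mathbf{S}$, and $t$ = the number of total model parameters.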
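
A short Python sketch of the mean- and variance-adjusted statistic follows. It reads Equation (A.5) as the second-order correction in its usual form, a·T_ML + b with a = sqrt(df / trace(UVUV)) and b = df − sqrt(df·[trace(UV)]² / trace(UVUV)); the square roots, the function name, and the numeric inputs are assumptions stated here for illustration.

```python
from math import sqrt


def mean_variance_adjusted_chi_square(t_ml, df, trace_uv, trace_uvuv):
    """Second-order corrected statistic a * T_ML + b (see lead-in)."""
    if trace_uvuv <= 0:
        raise ValueError("trace(UVUV) must be positive")
    a = sqrt(df / trace_uvuv)                       # scale factor
    b = df - sqrt(df * trace_uv ** 2 / trace_uvuv)  # location shift
    return a * t_ml + b


# Hypothetical case: every nonzero eigenvalue of UV equals 1.25 with df = 160,
# so trace(UV) = 160 * 1.25 and trace(UVUV) = 160 * 1.25 ** 2.
t_adjusted = mean_variance_adjusted_chi_square(
    t_ml=200.0, df=160, trace_uv=200.0, trace_uvuv=250.0
)
```

In this equal-eigenvalue case the shift b vanishes and the result coincides with the mean-adjusted statistic, T_ML / 1.25; the two corrections differ only when the eigenvalues of UV are unequal.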


Appendix D
Mplus Code for Data Generation and Analysis
1. Mplus code for data generation
TITLE: Data generation in an SR model with symmetry data, 4 categories, and N = 200
MONTECARLO:
NAMES = y1-y20;
NOBSERVATIONS = 200; ! sample size N = 200
NREPS = 500; ! number of replications = 500
SEED = 4533;
REPSAVE = ALL;
SAVE = ex1_rep*.dat;
! The SAVE option is used to name the files to which the 500 datasets were written.
! The asterisk * was replaced by the replication number. A file, ex1_replist.dat, was also
! produced. The file contains the file names of the 500 generated datasets.
GENERATE = y1-y20 (3); ! number of thresholds = 3
CATEGORICAL = y1-y20;
MODEL POPULATION:
F1 BY y1*.8 y2*.7 y3*.6 y4*.5; ! standardized factor loadings
F1@1; ! latent variance
[y1$1*-1.282 y1$2*0 y1$3*1.282]; ! pre-specified thresholds
[y2$1*-1.282 y2$2*0 y2$3*1.282];
[y3$1*-1.282 y3$2*0 y3$3*1.282];
[y4$1*-1.282 y4$2*0 y4$3*1.282];

y1*.36 y2*.51 y3*.64 y4*.75; ! residual variances
F2 BY y5*.8 y6*.7 y7*.6 y8*.5;
F2@1; ! latent variance
[y5$1*-1.282 y5$2*0 y5$3*1.282];
[y6$1*-1.282 y6$2*0 y6$3*1.282];
[y7$1*-1.282 y7$2*0 y7$3*1.282];
[y8$1*-1.282 y8$2*0 y8$3*1.282];

y5*.36 y6*.51 y7*.64 y8*.75;
F3 BY y9*.8 y10*.7 y11*.6 y12*.5;
F3@.336; ! residual variance of latent variable
[y9$1*-1.282 y9$2*0 y9$3*1.282];
[y10$1*-1.282 y10$2*0 y10$3*1.282];
[y11$1*-1.282 y11$2*0 y11$3*1.282];
[y12$1*-1.282 y12$2*0 y12$3*1.282];


y9*.36 y10*.51 y11*.64 y12*.75;
F4 BY y13*.8 y14*.7 y15*.6 y16*.5;
F4@.4364; ! residual variance of latent variable
[y13$1*-1.282 y13$2*0 y13$3*1.282];
[y14$1*-1.282 y14$2*0 y14$3*1.282];
[y15$1*-1.282 y15$2*0 y15$3*1.282];
[y16$1*-1.282 y16$2*0 y16$3*1.282];

y13*.36 y14*.51 y15*.64 y16*.75;
F5 BY y17*.8 y18*.7 y19*.6 y20*.5;
F5@.3798; ! residual variance of latent variable
[y17$1*-1.282 y17$2*0 y17$3*1.282];
[y18$1*-1.282 y18$2*0 y18$3*1.282];
[y19$1*-1.282 y19$2*0 y19$3*1.282];
[y20$1*-1.282 y20$2*0 y20$3*1.282];

y17*.36 y18*.51 y19*.64 y20*.75;
F1 WITH F2*.3; ! inter-factor correlation
F3 ON F1*.4 F2*.6; ! gamma coefficients
F4 ON F1*.4 F2*.2;
F5 ON F1*.1 F2*.1;
F4 ON F3*.3; ! beta coefficients
F5 ON F3*.2 F4*.5;
MODEL:
F1 BY y1*.8 y2*.7 y3*.6 y4*.5;
F1@1;
[y1$1*-1.282 y1$2*0 y1$3*1.282];
[y2$1*-1.282 y2$2*0 y2$3*1.282];
[y3$1*-1.282 y3$2*0 y3$3*1.282];
[y4$1*-1.282 y4$2*0 y4$3*1.282];

F2 BY y5*.8 y6*.7 y7*.6 y8*.5;
F2@1;
[y5$1*-1.282 y5$2*0 y5$3*1.282];
[y6$1*-1.282 y6$2*0 y6$3*1.282];
[y7$1*-1.282 y7$2*0 y7$3*1.282];
[y8$1*-1.282 y8$2*0 y8$3*1.282];


F3 BY y9*.8 y10*.7 y11*.6 y12*.5;
F3@1;
[y9$1*-1.282 y9$2*0 y9$3*1.282];
[y10$1*-1.282 y10$2*0 y10$3*1.282];
[y11$1*-1.282 y11$2*0 y11$3*1.282];
[y12$1*-1.282 y12$2*0 y12$3*1.282];
F4 BY y13*.8 y14*.7 y15*.6 y16*.5;
F4@1;
[y13$1*-1.282 y13$2*0 y13$3*1.282];
[y14$1*-1.282 y14$2*0 y14$3*1.282];
[y15$1*-1.282 y15$2*0 y15$3*1.282];
[y16$1*-1.282 y16$2*0 y16$3*1.282];

F5 BY y17*.8 y18*.7 y19*.6 y20*.5;
F5@1;
[y17$1*-1.282 y17$2*0 y17$3*1.282];
[y18$1*-1.282 y18$2*0 y18$3*1.282];
[y19$1*-1.282 y19$2*0 y19$3*1.282];
[y20$1*-1.282 y20$2*0 y20$3*1.282];

F1 WITH F2*.3;
F3 ON F1*.4 F2*.6;
F4 ON F1*.4 F2*.2;
F5 ON F1*.1 F2*.1;
F4 ON F3*.3;
F5 ON F3*.2 F4*.5;
OUTPUT: TECH9;
! The TECH9 option is used to request error messages related to convergence for each
! replication.
Notes: (1) This example Mplus code is for ordinal indicators that have symmetric
distributions and four categories at a sample size of N = 200. The number of thresholds, the
pre-specified threshold values, and the sample size (i.e., the NOBSERVATIONS option) can
be modified to target other experimental conditions. (2) See Chapter 12:
Monte Carlo simulation studies of the Mplus User’s Guide for further details about other
commands and options. (3) The exclamation mark ! introduces notes and comments,
which Mplus does not read.
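
When the thresholds are modified for another condition, they can be obtained from the target category probabilities through the inverse standard normal distribution function. The Python sketch below illustrates this, together with the indicator residual variances implied by standardized loadings (1 − λ²). The category probabilities .10, .40, .40, and .10 are an assumption back-computed from the thresholds −1.282, 0, and 1.282 in the code above; the function names are likewise illustrative.

```python
from statistics import NormalDist


def thresholds_from_probabilities(probs):
    """Convert category probabilities into standard-normal thresholds."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    thresholds = []
    cumulative = 0.0
    for p in probs[:-1]:  # C categories require C - 1 thresholds
        cumulative += p
        thresholds.append(std_normal.inv_cdf(cumulative))
    return thresholds


def residual_variance(loading):
    """Residual variance of a unit-variance latent response: 1 - loading^2."""
    return 1.0 - loading ** 2


# Assumed symmetric four-category probabilities (see lead-in).
symmetric_thresholds = thresholds_from_probabilities([0.10, 0.40, 0.40, 0.10])
residuals = [round(residual_variance(l), 2) for l in (0.8, 0.7, 0.6, 0.5)]
```

Rounded to three decimals, the thresholds match the values in the listing, and the residual variances match the values .36, .51, .64, and .75 specified for each set of four indicators.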


2. Mplus code for data analysis using ML and MLR
TITLE: Data analysis in an SR model using ML
DATA: FILE=ex1_replist.dat;
! The FILE option is used to carry out data analysis for each replication.
! “ex1_replist.dat” contains the file names of the 500 generated datasets.
TYPE = MONTECARLO;
VARIABLE:
NAMES= y1-y20;
ANALYSIS:
ESTIMATOR = ML;
! One can replace ML by MLR to obtain robust maximum likelihood estimation.
MODEL:
F1 BY y1* y2-y4;
F1@1;
F2 BY y5* y6-y8;
F2@1;
F3 BY y9* y10-y12;
F3@1;
F4 BY y13* y14-y16;
F4@1;
F5 BY y17* y18-y20;
F5@1;
F1 WITH F2;
F3 ON F1 F2;
F4 ON F1 F2;
F5 ON F1 F2;
F4 ON F3;
F5 ON F3 F4;
OUTPUT: STDYX; ! The STDYX option is used to request standardized solutions.
SAVEDATA: RESULTS ARE <Name of Results File>;
! The SAVEDATA command is used to save estimation results obtained from the
! 500 replications.


3. Mplus code for data analysis using ULSMV and WLSMV
TITLE: Data analysis in an SR model using ULSMV
DATA: FILE=ex1_replist.dat;
TYPE = MONTECARLO;
VARIABLE:
NAMES= y1-y20;
CATEGORICAL= y1-y20;
ANALYSIS:
ESTIMATOR = ULSMV;
! One can replace ULSMV by WLSMV to obtain robust weighted least squares estimation.
MODEL:
F1 BY y1* y2-y4;
F1@1;
F2 BY y5* y6-y8;
F2@1;
F3 BY y9* y10-y12;
F3@1;
F4 BY y13* y14-y16;
F4@1;
F5 BY y17* y18-y20;
F5@1;
F1 WITH F2;
F3 ON F1 F2;
F4 ON F1 F2;
F5 ON F1 F2;
F4 ON F3;
F5 ON F3 F4;
OUTPUT: STDYX;
SAVEDATA: RESULTS ARE <Name of Results File>;


Appendix E
Results for sample sizes of N = 400, 750, and 1,500 are presented below:
1. Tables E1−E3 display average relative bias (RBA) and average mean squared error (MSEA) of factor loadings and structural
coefficients by number of categories and observed distributions for all three robust estimators.
Table E1. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural
Coefficients (N = 400)

                          ML/MLR                          WLSMV                           ULSMV
                   RBA             MSEA            RBA             MSEA            RBA             MSEA
Dis.    Cat.       FL      SC      FL      SC      FL      SC      FL      SC      FL      SC      FL      SC
sym     4       -6.97   -1.08  0.0094  0.3552    0.11   -0.76  0.0058  0.3705   -0.09   -0.93  0.0061  0.3751
        5       -4.49   -0.02  0.0063  0.3315    0.00    0.21  0.0053  0.3494   -0.18    0.04  0.0055  0.3577
        6       -3.22   -0.37  0.0053  0.2930    0.03   -0.24  0.0051  0.3095   -0.14   -0.39  0.0053  0.3136
        7       -2.48   -0.59  0.0047  0.2963    0.01   -0.37  0.0049  0.3112   -0.16   -0.50  0.0051  0.3172
slight  4       -9.95   -1.04  0.0155  0.4576    0.08   -0.48  0.0066  0.4513   -1.14   -0.58  0.0068  0.4452
        5       -6.88   -0.71  0.0100  0.4010   -0.01    0.12  0.0058  0.3897   -0.20   -0.02  0.0061  0.3904
        6       -5.92   -1.12  0.0086  0.3694    0.03   -0.52  0.0055  0.3361   -0.15   -0.70  0.0057  0.3396
        7       -5.37   -0.65  0.0080  0.3950    0.00   -0.01  0.0052  0.3479   -0.17   -0.17  0.0054  0.3509
mod     4      -11.67   -2.93  0.0212  0.5442    0.02   -0.07  0.0081  0.5969   -0.28   -0.29  0.0085  0.6034
        5       -9.03   -2.60  0.0149  0.4678    0.10   -0.54  0.0067  0.4472   -0.14   -0.81  0.0070  0.4511
        6       -8.62   -2.47  0.0142  0.4849    0.06   -0.21  0.0062  0.4317   -0.15   -0.40  0.0065  0.4308
        7       -8.60   -2.57  0.0143  0.4901    0.04   -0.17  0.0060  0.4147   -0.17   -0.36  0.0063  0.4139

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML/MLR = maximum likelihood/robust maximum likelihood, WLSMV =
robust weighted least squares, ULSMV = robust unweighted least squares. FL represents factor loadings and SC is structural
coefficients.
Table E2. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural
Coefficients (N = 750)

                          ML/MLR                          WLSMV                           ULSMV
                   RBA             MSEA            RBA             MSEA            RBA             MSEA
Dis.    Cat.       FL      SC      FL      SC      FL      SC      FL      SC      FL      SC      FL      SC
sym     4       -6.85   -0.30  0.0071  0.1879    0.17   -0.21  0.0031  0.1979    0.06   -0.26  0.0032  0.2011
        5       -4.31   -0.05  0.0041  0.1606    0.15   -0.01  0.0027  0.1732    0.04   -0.04  0.0028  0.1762
        6       -3.09   -0.39  0.0031  0.1572    0.12   -0.33  0.0026  0.1698    0.03   -0.35  0.0027  0.1732
        7       -2.34   -0.29  0.0027  0.1490    0.12   -0.23  0.0025  0.1618    0.03   -0.27  0.0026  0.1654
slight  4       -9.80   -1.34  0.0125  0.2267    0.18   -0.38  0.0034  0.2357    0.05   -0.44  0.0035  0.2406
        5       -6.71   -1.25  0.0072  0.1936    0.14   -0.30  0.0030  0.1940    0.03   -0.29  0.0031  0.1975
        6       -5.81   -1.14  0.0060  0.1832    0.13   -0.28  0.0028  0.1812    0.02   -0.33  0.0029  0.1860
        7       -5.23   -0.94  0.0053  0.1753    0.11   -0.21  0.0026  0.1689    0.01   -0.25  0.0028  0.1720
mod     4      -11.54   -3.17  0.0173  0.2488    0.11   -0.31  0.0042  0.2875   -0.05   -0.39  0.0044  0.2917
        5       -8.97   -2.61  0.0115  0.2277    0.10   -0.26  0.0035  0.2377   -0.02   -0.32  0.0036  0.2427
        6       -8.52   -2.70  0.0108  0.2186    0.10   -0.23  0.0033  0.2086   -0.02   -0.31  0.0034  0.2137
        7       -8.48   -2.52  0.0108  0.2177    0.12   -0.01  0.0031  0.2079    0.00   -0.06  0.0033  0.2115

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML/MLR = maximum likelihood/robust maximum likelihood, WLSMV =
robust weighted least squares, ULSMV = robust unweighted least squares. FL represents factor loadings and SC is structural
coefficients.


Table E3. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural
Coefficients (N = 1,500)

                          ML/MLR                          WLSMV                           ULSMV
                   RBA             MSEA            RBA             MSEA            RBA             MSEA
Dis.    Cat.       FL      SC      FL      SC      FL      SC      FL      SC      FL      SC      FL      SC
sym     4       -6.88   -0.01  0.0059  0.0805    0.08    0.12  0.0015  0.0840    0.03    0.12  0.0016  0.0855
        5       -4.33   -0.18  0.0030  0.0781    0.07    0.03  0.0013  0.0820    0.03    0.03  0.0014  0.0839
        6       -3.08   -0.14  0.0020  0.0715    0.08    0.02  0.0013  0.0748    0.03    0.01  0.0014  0.0764
        7       -2.36   -0.28  0.0016  0.0692    0.06   -0.13  0.0012  0.0720    0.02   -0.12  0.0013  0.0737
slight  4       -9.80   -0.93  0.0111  0.1058    0.10   -0.05  0.0017  0.1042    0.04   -0.01  0.0017  0.1051
        5       -6.72   -0.78  0.0059  0.0918    0.08    0.07  0.0015  0.0902    0.03    0.08  0.0015  0.0924
        6       -5.81   -0.65  0.0047  0.0855    0.07   -0.03  0.0014  0.0800    0.03   -0.03  0.0015  0.0814
        7       -5.22   -0.70  0.0040  0.0869    0.08    0.05  0.0013  0.0790    0.03    0.07  0.0014  0.0806
mod     4      -11.57   -2.92  0.0155  0.1237    0.04   -0.19  0.0021  0.1309   -0.04   -0.20  0.0022  0.1333
        5       -8.98   -2.40  0.0099  0.1053    0.07   -0.12  0.0017  0.1012    0.00   -0.13  0.0018  0.1026
        6       -8.51   -2.53  0.0091  0.1075    0.06   -0.18  0.0016  0.0934    0.01   -0.16  0.0017  0.0944
        7       -8.48   -2.49  0.0091  0.1064    0.07   -0.10  0.0015  0.0881    0.02   -0.09  0.0016  0.0897

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML/MLR = maximum likelihood/robust maximum likelihood, WLSMV =
robust weighted least squares, ULSMV = robust unweighted least squares. FL represents factor loadings and SC is structural
coefficients.

	  
	  


2. The RBA and MSEA for standard errors of factor loadings and structural coefficients are presented in Tables E4−E6.

Table E4. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor
Loadings and Structural Coefficients (N = 400)

                            ML                              MLR
                   RBA             MSEA            RBA             MSEA
Dis.    Cat.     SEFL    SESC    SEFL    SESC    SEFL    SESC    SEFL    SESC
sym     4        0.18   -0.06  0.0048  0.0201    0.46    1.03  0.0071  0.0227
        5       -0.21   -1.44  0.0052  0.0162    0.08   -0.31  0.0075  0.0189
        6       -0.95   -0.18  0.0054  0.0152   -0.55    0.80  0.0077  0.0177
        7       -0.80   -0.93  0.0052  0.0141   -0.40    0.02  0.0075  0.0164
slight  4       -5.45   -7.23  0.0077  0.0289    1.16   -1.61  0.0082  0.0290
        5       -6.39   -7.05  0.0094  0.0241    0.16   -1.25  0.0088  0.0243
        6       -6.32   -6.10  0.0089  0.0220   -0.37   -0.86  0.0081  0.0233
        7       -6.93   -8.47  0.0098  0.0254    0.27   -2.75  0.0081  0.0232
mod     4      -16.07  -14.70  0.0316  0.0514   -0.22   -1.44  0.0087  0.0385
        5      -14.28  -13.28  0.0264  0.0405   -0.17   -1.40  0.0086  0.0292
        6      -14.58  -14.59  0.0259  0.0429    0.20   -2.51  0.0085  0.0301
        7      -15.33  -14.83  0.0282  0.0428    0.10   -2.24  0.0084  0.0287

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. SEFL
represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


Table E4 (cont’d)

                          WLSMV                            ULSMV
                   RBA             MSEA            RBA             MSEA
Dis.    Cat.     SEFL    SESC    SEFL    SESC    SEFL    SESC    SEFL    SESC
sym     4       -2.21   -1.28  0.0072  0.0223   -1.41   -0.55  0.0060  0.0230
        5       -2.49   -2.50  0.0072  0.0198   -2.12   -2.27  0.0078  0.0220
        6       -3.66   -2.61  0.0079  0.0183   -3.52   -1.68  0.0067  0.0186
        7       -3.66   -3.36  0.0077  0.0175   -3.33   -2.57  0.0064  0.0176
slight  4       -2.19   -3.00  0.0078  0.0293   -1.86   -1.53  0.0064  0.0290
        5       -2.97   -3.17  0.0080  0.0227   -2.85   -2.29  0.0068  0.0228
        6       -3.63   -1.89  0.0079  0.0211   -3.14   -1.04  0.0066  0.0219
        7       -3.78   -4.78  0.0079  0.0206   -3.43   -3.82  0.0065  0.0203
mod     4       -3.76   -2.96  0.0098  0.0406   -3.75   -3.11  0.0082  0.0429
        5       -3.33   -2.05  0.0084  0.0301   -3.57   -1.97  0.0072  0.0295
        6       -3.15   -5.10  0.0085  0.0279   -3.72   -4.11  0.0072  0.0271
        7       -3.71   -5.04  0.0086  0.0243   -4.00   -4.12  0.0075  0.0239

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. SEFL represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


Table E5. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor
Loadings and Structural Coefficients (N = 750)

                            ML                              MLR
                   RBA             MSEA            RBA             MSEA
Dis.    Cat.     SEFL    SESC    SEFL    SESC    SEFL    SESC    SEFL    SESC
sym     4       -0.03   -2.20  0.0027  0.0108   -0.19   -1.54  0.0039  0.0120
        5        1.23   -1.28  0.0036  0.0090    1.12   -0.60  0.0047  0.0102
        6        0.63   -1.63  0.0032  0.0086    0.69   -1.06  0.0044  0.0096
        7        0.59   -1.47  0.0034  0.0077    0.66   -0.88  0.0045  0.0087
slight  4       -5.11   -6.27  0.0053  0.0167    1.00   -0.85  0.0045  0.0161
        5       -5.21   -5.84  0.0061  0.0130    1.10   -0.14  0.0055  0.0124
        6       -4.38   -5.70  0.0048  0.0133    1.38   -0.58  0.0048  0.0128
        7       -5.32   -5.88  0.0060  0.0134    1.54   -0.24  0.0053  0.0126
mod     4      -14.02  -11.18  0.0222  0.0267    1.29    2.03  0.0049  0.0198
        5      -11.69  -11.75  0.0162  0.0250    2.02   -0.15  0.0051  0.0156
        6      -13.00  -11.59  0.0195  0.0256    1.40    0.69  0.0051  0.0170
        7      -13.91  -12.25  0.0220  0.0258    1.06    0.40  0.0047  0.0152

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. SEFL
represents standard errors of factor loadings and SESC is standard errors of structural coefficients.

	  
	  
	  


Table E5 (cont’d)

                          WLSMV                            ULSMV
                   RBA             MSEA            RBA             MSEA
Dis.    Cat.     SEFL    SESC    SEFL    SESC    SEFL    SESC    SEFL    SESC
sym     4       -1.07   -3.16  0.0044  0.0124   -0.53   -2.76  0.0041  0.0123
        5        0.63   -2.39  0.0045  0.0108    0.44   -1.96  0.0039  0.0108
        6       -0.35   -3.42  0.0045  0.0106   -0.26   -2.99  0.0039  0.0103
        7       -0.17   -3.37  0.0043  0.0098   -0.22   -2.97  0.0035  0.0098
slight  4        0.04   -3.53  0.0044  0.0153    0.43   -3.02  0.0039  0.0156
        5       -0.30   -2.49  0.0044  0.0117    0.02   -2.09  0.0037  0.0119
        6       -0.71   -3.36  0.0041  0.0115   -0.46   -3.07  0.0036  0.0117
        7        0.07   -3.39  0.0047  0.0109   -0.19   -2.72  0.0038  0.0108
mod     4       -1.01   -0.76  0.0050  0.0196   -0.78   -0.33  0.0041  0.0202
        5       -0.57   -2.68  0.0045  0.0149   -0.57   -2.38  0.0041  0.0151
        6       -1.13   -2.50  0.0049  0.0138   -1.33   -2.14  0.0043  0.0141
        7       -1.31   -3.18  0.0045  0.0127   -1.28   -2.73  0.0042  0.0128

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. SEFL represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


Table E6. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor
Loadings and Structural Coefficients (N = 1,500)

                              ML                                 MLR
                    RBA               MSEA              RBA               MSEA
Dis.    Cat.      SEFL     SESC     SEFL     SESC     SEFL     SESC     SEFL     SESC
sym     4         1.10     1.69   0.0023   0.0048     0.68     2.18   0.0027   0.0057
sym     5         1.19    -0.41   0.0023   0.0041     0.88     0.15   0.0027   0.0048
sym     6         0.42     0.78   0.0017   0.0039     0.32     1.20   0.0023   0.0046
sym     7         0.94     0.52   0.0021   0.0038     0.86     1.00   0.0026   0.0047
slight  4        -5.12    -5.92   0.0046   0.0090     0.66    -0.56   0.0028   0.0069
slight  5        -4.71    -5.24   0.0045   0.0078     1.39     0.40   0.0035   0.0065
slight  6        -5.07    -4.50   0.0045   0.0062     0.42     0.59   0.0029   0.0054
slight  7        -5.19    -5.87   0.0049   0.0079     1.43    -0.33   0.0032   0.0055
mod     4       -14.52   -13.61   0.0230   0.0246     0.25    -0.92   0.0030   0.0083
mod     5       -12.40   -11.33   0.0173   0.0183     0.83     0.17   0.0032   0.0071
mod     6       -12.98   -12.37   0.0188   0.0207     1.02    -0.40   0.0031   0.0072
mod     7       -13.69   -12.78   0.0208   0.0218     0.92    -0.41   0.0033   0.0072

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. SEFL
represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


Table E6 (cont’d)

                            WLSMV                               ULSMV
                    RBA               MSEA              RBA               MSEA
Dis.    Cat.      SEFL     SESC     SEFL     SESC     SEFL     SESC     SEFL     SESC
sym     4         0.31     1.36   0.0026   0.0056    -0.09     1.54   0.0021   0.0058
sym     5         0.67    -0.60   0.0026   0.0051     0.60    -0.48   0.0025   0.0052
sym     6        -0.18     0.14   0.0024   0.0046    -0.68     0.40   0.0020   0.0048
sym     7         0.32     0.17   0.0025   0.0045     0.05     0.41   0.0023   0.0047
slight  4         0.65    -1.10   0.0030   0.0065     1.07    -0.62   0.0031   0.0065
slight  5         0.50    -0.35   0.0027   0.0064     0.30    -0.32   0.0024   0.0065
slight  6        -0.57     0.29   0.0024   0.0047    -0.81     0.63   0.0020   0.0049
slight  7         0.53    -0.90   0.0029   0.0049     0.52    -0.65   0.0028   0.0050
mod     4        -0.53    -1.07   0.0034   0.0083    -0.74    -0.99   0.0031   0.0084
mod     5        -0.76     0.21   0.0026   0.0070    -0.98     0.56   0.0023   0.0072
mod     6        -0.49     0.58   0.0026   0.0064    -0.32     0.93   0.0023   0.0066
mod     7        -0.26     1.12   0.0025   0.0061    -0.37     1.31   0.0023   0.0062

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. SEFL represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


3. Tables E7−E9 present findings for chi-square test statistics and RMSEA with ML, MLR, ULSMV, and WLSMV estimation.

Table E7. Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 400)
                              ML                                 MLR
                 Chi-square          RMSEA           Chi-square          RMSEA
Dis.    Cat.      bias        %        M        %     bias        %        M        %
sym     4         2.55     8.00    0.008     0.00     2.78     8.40    0.008     0.00
sym     5         3.63     6.80    0.009     0.00     3.79     7.20    0.009     0.00
sym     6         2.74     6.80    0.009     0.00     2.91     7.40    0.009     0.00
sym     7         2.91     7.00    0.009     0.00     3.06     7.40    0.009     0.00
slight  4        12.08    30.60    0.016     0.00     5.55    14.40    0.011     0.00
slight  5        11.80    28.40    0.015     0.00     4.72    10.80    0.010     0.00
slight  6        11.39    27.00    0.015     0.00     4.82    11.40    0.010     0.00
slight  7        13.17    31.60    0.016     0.00     5.55    12.60    0.011     0.00
mod     4        25.15    63.53    0.024     0.00     5.37    14.23    0.010     0.00
mod     5        23.67    62.73    0.023     0.00     5.98    11.62    0.011     0.00
mod     6        25.48    64.80    0.024     0.00     6.32    13.00    0.011     0.00
mod     7        26.49    68.80    0.025     0.00     6.20    13.80    0.011     0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. M =
mean.
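
To make the four columns of the fit tables concrete, the sketch below shows one way such summaries could be computed from the replications of a single design cell. It is only an illustration under stated assumptions: the bias is taken to be the percentage deviation of the mean chi-square from the model degrees of freedom, the chi-square rejection rate is the percentage of replications with p < .05, RMSEA is computed with N in the denominator (some programs use N − 1), and the RMSEA cutoff of 0.06 is a placeholder. The function name, arguments, and toy values are hypothetical.

```python
import math
import statistics


def chi_square_summary(chi_squares, p_values, df, n, alpha=0.05, rmsea_cutoff=0.06):
    """Summarize chi-square and RMSEA results over Monte Carlo replications.

    Assumptions (not taken from the tables): bias is a percentage relative
    to df, RMSEA uses N in the denominator, and rmsea_cutoff is a placeholder.
    """
    mean_chi_square = statistics.fmean(chi_squares)
    bias = 100.0 * (mean_chi_square - df) / df
    chi_square_rejection = 100.0 * sum(p < alpha for p in p_values) / len(p_values)
    # RMSEA per replication, truncated at zero when chi-square falls below df.
    rmseas = [math.sqrt(max(x - df, 0.0) / (df * n)) for x in chi_squares]
    rmsea_rejection = 100.0 * sum(r > rmsea_cutoff for r in rmseas) / len(rmseas)
    return {
        "bias": bias,
        "chi_square_rejection": chi_square_rejection,
        "rmsea_mean": statistics.fmean(rmseas),
        "rmsea_rejection": rmsea_rejection,
    }


# Toy illustration with four replications (hypothetical values).
toy_summary = chi_square_summary(
    chi_squares=[100.0, 110.0, 90.0, 120.0],
    p_values=[0.48, 0.23, 0.75, 0.08],
    df=100,
    n=400,
)
```

Under these assumptions, a well-behaved estimator would show a bias near zero and a chi-square rejection rate near the nominal 5%, which is the pattern against which the ML and robust columns can be compared.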


Table E7 (cont’d)
                            WLSMV                               ULSMV
                 Chi-square          RMSEA           Chi-square          RMSEA
Dis.    Cat.      bias        %        M        %     bias        %        M        %
sym     4         1.30     5.20    0.007     0.00     0.12     3.20    0.006     0.00
sym     5         2.28     3.80    0.008     0.00     1.15     2.81    0.007     0.00
sym     6         2.63     5.40    0.008     0.00     1.42     4.80    0.007     0.00
sym     7         2.97     5.80    0.008     0.00     1.84     4.80    0.008     0.00
slight  4         2.91     6.20    0.008     0.00     1.69     4.20    0.007     0.00
slight  5         2.50     5.40    0.008     0.00     1.31     3.80    0.007     0.00
slight  6         2.67     5.60    0.008     0.00     1.52     4.20    0.007     0.00
slight  7         3.86     5.00    0.009     0.00     2.58     3.40    0.008     0.00
mod     4         2.42     4.02    0.008     0.00     1.43     3.81    0.007     0.00
mod     5         3.13     6.43    0.008     0.00     2.13     5.00    0.008     0.00
mod     6         3.52     7.80    0.009     0.00     2.54     6.40    0.008     0.00
mod     7         4.20     6.80    0.009     0.00     3.25     5.40    0.009     0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. M = mean.


Table E8. Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 750)
                              ML                                 MLR
                 Chi-square          RMSEA           Chi-square          RMSEA
Dis.    Cat.      bias        %        M        %     bias        %        M        %
sym     4         2.18     9.60    0.006     0.00     2.12     9.60    0.006     0.00
sym     5         1.95     6.60    0.006     0.00     1.80     6.40    0.006     0.00
sym     6         2.11     9.80    0.006     0.00     2.04     9.80    0.006     0.00
sym     7         1.85     8.20    0.006     0.00     1.75     8.00    0.006     0.00
slight  4         9.37    22.20    0.010     0.00     1.87     5.80    0.006     0.00
slight  5        10.98    25.20    0.011     0.00     2.81     8.80    0.006     0.00
slight  6        10.55    24.00    0.010     0.00     2.86     8.80    0.006     0.00
slight  7        10.71    26.80    0.011     0.00     2.11     6.80    0.006     0.00
mod     4        24.44    63.60    0.017     0.00     3.14     9.20    0.007     0.00
mod     5        22.23    57.20    0.016     0.00     3.11     8.60    0.006     0.00
mod     6        23.67    63.40    0.017     0.00     3.07     8.60    0.007     0.00
mod     7        24.38    66.60    0.017     0.00     2.77     8.00    0.006     0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. M =
mean.

	  
	  
	  


Table E8 (cont’d)
                            WLSMV                               ULSMV
                 Chi-square          RMSEA           Chi-square          RMSEA
Dis.    Cat.      bias        %        M        %     bias        %        M        %
sym     4         1.11     5.40    0.005     0.00     0.48     4.80    0.005     0.00
sym     5         0.89     4.60    0.005     0.00     0.27     3.41    0.005     0.00
sym     6         1.60     5.60    0.005     0.00     0.99     5.20    0.005     0.00
sym     7         1.66     6.40    0.006     0.00     1.02     5.40    0.005     0.00
slight  4         1.07     5.20    0.005     0.00     0.34     3.80    0.005     0.00
slight  5         1.75     3.80    0.006     0.00     1.13     3.80    0.005     0.00
slight  6         1.85     5.80    0.006     0.00     1.23     5.00    0.005     0.00
slight  7         1.41     5.40    0.005     0.00     0.72     4.20    0.005     0.00
mod     4         1.32     4.60    0.005     0.00     0.72     3.80    0.005     0.00
mod     5         2.00     6.80    0.006     0.00     1.45     6.00    0.005     0.00
mod     6         1.94     6.40    0.006     0.00     1.39     5.80    0.005     0.00
mod     7         2.04     6.20    0.006     0.00     1.48     5.20    0.005     0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. M = mean.


Table E9. Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA (N = 1,500)
                              ML                                 MLR
                 Chi-square          RMSEA           Chi-square          RMSEA
Dis.    Cat.      bias        %        M        %     bias        %        M        %
sym     4         0.90     6.20    0.004     0.00     0.69     6.20    0.004     0.00
sym     5         1.02     5.80    0.004     0.00     0.71     5.40    0.004     0.00
sym     6         1.15     6.80    0.004     0.00     0.93     6.60    0.004     0.00
sym     7         1.11     6.60    0.004     0.00     0.84     6.00    0.004     0.00
slight  4         9.85    21.80    0.007     0.00     1.66     6.60    0.004     0.00
slight  5        10.34    21.20    0.007     0.00     1.53     8.00    0.004     0.00
slight  6         9.74    21.60    0.007     0.00     1.44     7.40    0.004     0.00
slight  7        10.91    23.20    0.008     0.00     1.68     8.00    0.004     0.00
mod     4        24.03    63.60    0.012     0.00     1.87     7.40    0.004     0.00
mod     5        21.89    59.00    0.011     0.00     1.95     7.40    0.004     0.00
mod     6        23.41    59.60    0.012     0.00     1.98     7.20    0.004     0.00
mod     7        24.69    62.80    0.012     0.00     2.20     7.60    0.004     0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. ML = maximum likelihood, MLR = robust maximum likelihood. M =
mean.

	  
	  
	  


	  
Table E9 (cont’d)
                            WLSMV                               ULSMV
                 Chi-square          RMSEA           Chi-square          RMSEA
Dis.    Cat.      bias        %        M        %     bias        %        M        %
sym     4         0.57     5.00    0.004     0.00     0.36     4.21    0.004     0.00
sym     5         0.40     5.40    0.004     0.00     0.19     4.20    0.004     0.00
sym     6         0.98     6.40    0.004     0.00     0.79     6.40    0.004     0.00
sym     7         0.86     6.20    0.004     0.00     0.61     5.20    0.004     0.00
slight  4         1.04     6.00    0.004     0.00     0.75     5.84    0.003     0.00
slight  5         0.86     5.60    0.004     0.00     0.67     5.60    0.004     0.00
slight  6         0.96     5.80    0.004     0.00     0.76     5.40    0.003     0.00
slight  7         0.81     6.20    0.004     0.00     0.57     5.80    0.004     0.00
mod     4         0.67     5.20    0.004     0.00     0.49     5.42    0.004     0.00
mod     5         1.22     6.00    0.004     0.00     1.08     5.00    0.004     0.00
mod     6         1.05     7.60    0.004     0.00     0.83     7.00    0.004     0.00
mod     7         1.33     7.60    0.004     0.00     1.16     7.60    0.004     0.00

Note. Dis. = distribution type and Cat. = number of categories. sym = symmetric distribution, slight = slightly asymmetric
distribution, mod = moderately asymmetric distribution. WLSMV = robust weighted least squares, ULSMV = robust unweighted
least squares. M = mean.

	  


Appendix F
Results for bipolarization data are presented below:
1. Table F1 displays average relative bias (RBA) and average mean squared error (MSEA) of factor loadings and structural
coefficients by number of categories and sample sizes for all three robust estimators.

Table F1. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Factor Loadings and Structural
Coefficients with Bipolarization Distribution

                            ML/MLR                              WLSMV                               ULSMV
                    RBA               MSEA              RBA               MSEA              RBA               MSEA
N       Cat.        FL       SC       FL       SC       FL       SC       FL       SC       FL       SC       FL       SC
200     4       -10.36    -2.09   0.0219   1.2664     0.20    -0.67   0.0140   1.3447    -0.28    -0.80   0.0145   1.2803
200     5        -9.89    -2.36   0.0208   1.2004     0.17    -0.58   0.0131   1.2148    -0.29    -0.79   0.0136   1.1765
200     6        -9.04    -2.15   0.0189   1.1454     0.14    -0.57   0.0127   1.0955    -0.29    -0.77   0.0132   1.0687
200     7        -7.74    -1.55   0.0163   0.9636     0.15    -0.10   0.0117   0.9089    -0.26    -0.26   0.0121   0.8856
300     4       -10.36    -2.08   0.0180   0.6648     0.08    -0.29   0.0089   0.6808    -0.03     0.18   0.0079   0.4693
300     5        -9.82    -2.18   0.0168   0.6092     0.08    -0.22   0.0085   0.6339    -0.21    -0.42   0.0089   0.6274
300     6        -8.96    -1.93   0.0150   0.5489     0.10    -0.02   0.0081   0.5497    -0.18    -0.14   0.0084   0.5469
300     7        -7.69    -1.60   0.0126   0.5139     0.08     0.19   0.0075   0.5025    -0.18     0.09   0.0078   0.4991
400     4       -10.33    -1.40   0.0163   0.4485     0.08    -0.06   0.0069   0.4910    -0.16    -0.19   0.0072   0.4941
400     5        -9.81    -1.52   0.0152   0.4281     0.06    -0.28   0.0066   0.4641    -0.17    -0.40   0.0069   0.4656
400     6        -8.98    -1.67   0.0135   0.3999     0.02    -0.42   0.0063   0.4311    -0.19    -0.57   0.0066   0.4320
400     7        -7.70    -1.50   0.0111   0.3795     0.04    -0.35   0.0059   0.3845    -0.16    -0.50   0.0061   0.3863


Table F1 (cont’d)
                            ML/MLR                              WLSMV                               ULSMV
                    RBA               MSEA              RBA               MSEA              RBA               MSEA
N       Cat.        FL       SC       FL       SC       FL       SC       FL       SC       FL       SC       FL       SC
500     4       -10.34    -2.20   0.0151   0.3299     0.04    -0.96   0.0055   0.3649    -0.16    -1.06   0.0057   0.3698
500     5        -9.83    -2.33   0.0141   0.3161     0.03    -0.93   0.0052   0.3445    -0.15    -1.01   0.0054   0.3503
500     6        -8.97    -2.17   0.0123   0.3082     0.04    -0.77   0.0049   0.3198    -0.13    -0.84   0.0051   0.3249
500     7        -7.69    -2.03   0.0100   0.2864     0.06    -0.71   0.0045   0.2898    -0.11    -0.80   0.0047   0.2955
750     4       -10.21    -1.57   0.0133   0.2309     0.16    -0.12   0.0036   0.2558     0.03    -0.10   0.0038   0.2588
750     5        -9.69    -1.45   0.0122   0.2207     0.15    -0.02   0.0034   0.2395     0.03    -0.03   0.0035   0.2430
750     6        -8.84    -1.44   0.0106   0.2117     0.14    -0.12   0.0032   0.2252     0.02    -0.14   0.0034   0.2298
750     7        -7.58    -1.17   0.0084   0.1992     0.11    -0.04   0.0030   0.2051     0.00    -0.05   0.0031   0.2097
1,000   4       -10.21    -1.51   0.0126   0.1629     0.12    -0.26   0.0027   0.1752     0.02    -0.32   0.0028   0.1758
1,000   5        -9.70    -1.35   0.0116   0.1622     0.12    -0.19   0.0025   0.1689     0.03    -0.27   0.0026   0.1697
1,000   6        -8.84    -1.42   0.0099   0.1521     0.13    -0.22   0.0024   0.1566     0.04    -0.27   0.0025   0.1579
1,000   7        -7.59    -1.21   0.0078   0.1475     0.09    -0.16   0.0022   0.1438     0.01    -0.22   0.0023   0.1456
1,500   4       -10.26    -1.39   0.0120   0.0955     0.07    -0.09   0.0018   0.1033     0.00    -0.12   0.0019   0.1052
1,500   5        -9.74    -1.40   0.0110   0.0939     0.07    -0.03   0.0017   0.0977     0.02    -0.06   0.0018   0.0996
1,500   6        -8.90    -1.34   0.0094   0.0912     0.06    -0.05   0.0016   0.0939     0.01    -0.07   0.0017   0.0956
1,500   7        -7.62    -1.22   0.0072   0.0876     0.06    -0.03   0.0015   0.0885     0.01    -0.04   0.0016   0.0902

Note. Cat. = number of categories. ML = maximum likelihood, MLR = robust maximum likelihood, WLSMV = robust weighted
least squares, ULSMV = robust unweighted least squares. FL represents factor loadings and SC is structural coefficients.
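
For readers who wish to reproduce summaries of this kind from their own replications, the following minimal Python sketch shows how a percent relative bias and a root mean squared error could be computed per parameter and then averaged over a set of parameters, such as all factor loadings. The definitions are the conventional Monte Carlo ones and are assumed here rather than quoted from the study; the function names and toy values are hypothetical. Squaring the second quantity would give a mean squared error instead, should that be the intended reading of MSEA.

```python
import math
import statistics


def relative_bias(estimates, true_value):
    """Percent relative bias of the mean estimate for one parameter."""
    return 100.0 * (statistics.fmean(estimates) - true_value) / true_value


def rmse(estimates, true_value):
    """Root mean squared error of the estimates around the population value."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(statistics.fmean(squared_errors))


def average_over_parameters(estimates_by_parameter, true_values):
    """Average the per-parameter relative bias and RMSE over a parameter set."""
    pairs = list(zip(estimates_by_parameter, true_values))
    average_bias = statistics.fmean([relative_bias(e, t) for e, t in pairs])
    average_rmse = statistics.fmean([rmse(e, t) for e, t in pairs])
    return average_bias, average_rmse


# Toy illustration: two parameters, two replications each (hypothetical values).
toy_rba, toy_msea = average_over_parameters(
    estimates_by_parameter=[[0.63, 0.77], [0.36, 0.44]],
    true_values=[0.70, 0.40],
)
```

In the toy example the estimates are centred on the population values, so the average relative bias is essentially zero while the average root mean squared error reflects only sampling variability.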


2. The RBA and MSEA for standard errors of factor loadings and structural coefficients are presented in Table F2.

Table F2. The Average Relative Bias (RBA) and Average Root Mean Squared Error (MSEA) for Standard Errors (SE) of Factor
Loadings and Structural Coefficients with Bipolarization Distribution

                              ML                                 MLR
                    RBA               MSEA              RBA               MSEA
N       Cat.      SEFL     SESC     SEFL     SESC     SEFL     SESC     SEFL     SESC
200     4        -5.94    -9.17   0.0123   0.1581     0.53    -2.57   0.0124   0.1957
200     5        -5.80    -8.76   0.0122   0.2144     0.85    -1.99   0.0124   0.2638
200     6        -5.88    -9.01   0.0123   0.1749     0.54    -2.59   0.0124   0.2026
200     7        -5.76    -8.33   0.0124   0.0712     0.36    -2.24   0.0125   0.0817
300     4        -4.55    -6.27   0.0087   0.0695     0.94    -0.65   0.0082   0.0735
300     5        -4.72    -5.51   0.0088   0.0544     0.94     0.28   0.0081   0.0592
300     6        -4.71    -4.16   0.0085   0.0449     0.76     1.56   0.0077   0.0501
300     7        -4.62    -4.50   0.0088   0.0392     0.62     0.96   0.0082   0.0446
400     4        -4.99    -5.21   0.0070   0.0282    -0.17     0.17   0.0057   0.0293
400     5        -5.13    -5.18   0.0072   0.0271    -0.13     0.23   0.0057   0.0280
400     6        -5.08    -4.39   0.0070   0.0243    -0.20     0.97   0.0055   0.0261
400     7        -4.98    -4.37   0.0071   0.0216    -0.32     0.66   0.0058   0.0227


Table F2 (cont’d)
                              ML                                 MLR
                    RBA               MSEA              RBA               MSEA
N       Cat.      SEFL     SESC     SEFL     SESC     SEFL     SESC     SEFL     SESC
500     4        -4.83    -4.86   0.0061   0.0198    -0.22     0.30   0.0047   0.0204
500     5        -5.10    -4.49   0.0063   0.0188    -0.33     0.77   0.0045   0.0198
500     6        -4.69    -4.84   0.0060   0.0179    -0.01     0.23   0.0047   0.0182
500     7        -4.21    -4.10   0.0056   0.0157     0.33     0.73   0.0048   0.0166
750     4        -3.99    -5.13   0.0044   0.0151     0.21    -0.43   0.0033   0.0147
750     5        -3.72    -5.33   0.0044   0.0146     0.66    -0.57   0.0036   0.0140
750     6        -3.58    -5.26   0.0042   0.0138     0.72    -0.64   0.0035   0.0131
750     7        -3.37    -4.56   0.0040   0.0129     0.78    -0.17   0.0034   0.0128
1,000   4        -3.50    -5.96   0.0033   0.0109     0.46    -1.44   0.0025   0.0086
1,000   5        -3.72    -6.50   0.0038   0.0116     0.41    -1.89   0.0029   0.0088
1,000   6        -3.14    -5.77   0.0033   0.0102     0.94    -1.25   0.0028   0.0081
1,000   7        -3.29    -5.89   0.0034   0.0099     0.62    -1.62   0.0028   0.0076
1,500   4        -4.56    -2.29   0.0042   0.0060    -0.83     2.23   0.0025   0.0068
1,500   5        -4.60    -2.41   0.0044   0.0060    -0.71     2.21   0.0026   0.0068
1,500   6        -4.28    -2.73   0.0040   0.0056    -0.45     1.75   0.0024   0.0059
1,500   7        -4.33    -2.96   0.0039   0.0060    -0.63     1.28   0.0023   0.0061

Note. Cat. = number of categories. ML = maximum likelihood, MLR = robust maximum likelihood. SEFL represents standard
errors of factor loadings and SESC is standard errors of structural coefficients.


Table F2 (cont’d)

                            WLSMV                               ULSMV
                    RBA               MSEA              RBA               MSEA
N       Cat.      SEFL     SESC     SEFL     SESC     SEFL     SESC     SEFL     SESC
200     4        -3.90    -5.10   0.0138   0.1609    -2.72    -3.71   0.0106   0.1311
200     5        -3.31    -5.41   0.0138   0.1372    -2.30    -4.39   0.0108   0.1120
200     6        -4.05    -5.02   0.0141   0.1114    -2.90    -3.91   0.0112   0.0987
200     7        -4.10    -4.81   0.0139   0.0715    -3.10    -3.36   0.0111   0.0692
300     4        -0.99    -0.97   0.0095   0.0609    -1.97     1.41   0.0080   0.0456
300     5        -1.43    -0.86   0.0093   0.0576    -1.51     0.54   0.0074   0.0601
300     6        -1.47     0.43   0.0089   0.0435    -1.22     1.69   0.0071   0.0459
300     7        -1.83    -0.70   0.0092   0.0390    -1.50     0.53   0.0073   0.0419
400     4        -1.90    -1.07   0.0068   0.0285    -1.67    -0.34   0.0054   0.0284
400     5        -1.92    -2.32   0.0068   0.0273    -1.62    -1.52   0.0056   0.0274
400     6        -2.13    -2.19   0.0069   0.0267    -1.87    -1.27   0.0057   0.0265
400     7        -2.40    -1.57   0.0069   0.0207    -2.38    -0.71   0.0057   0.0206


Table F2 (cont’d)
                            WLSMV                               ULSMV
                    RBA               MSEA              RBA               MSEA
N       Cat.      SEFL     SESC     SEFL     SESC     SEFL     SESC     SEFL     SESC
500     4        -0.93    -0.86   0.0063   0.0227    -0.55    -0.27   0.0052   0.0237
500     5        -1.14    -0.98   0.0059   0.0208    -0.77    -0.39   0.0049   0.0223
500     6        -0.88    -0.70   0.0059   0.0192    -0.66    -0.04   0.0049   0.0202
500     7        -0.55    -0.56   0.0061   0.0163    -0.36     0.01   0.0051   0.0174
750     4        -0.66    -1.95   0.0040   0.0155    -0.31    -1.67   0.0036   0.0157
750     5        -0.16    -2.76   0.0041   0.0146     0.32    -2.54   0.0037   0.0146
750     6        -0.02    -2.93   0.0040   0.0134     0.24    -2.79   0.0036   0.0137
750     7         0.25    -2.44   0.0039   0.0130     0.35    -2.18   0.0035   0.0132
1,000   4        -0.14    -1.79   0.0030   0.0092    -0.16    -1.21   0.0026   0.0093
1,000   5         0.04    -1.89   0.0033   0.0090     0.15    -1.33   0.0027   0.0091
1,000   6         0.32    -1.87   0.0031   0.0081     0.12    -1.43   0.0026   0.0083
1,000   7         0.40    -1.60   0.0031   0.0073    -0.05    -1.20   0.0027   0.0075
1,500   4        -0.80     2.15   0.0026   0.0072    -1.04     2.19   0.0024   0.0074
1,500   5        -0.42     1.92   0.0025   0.0067    -0.40     1.99   0.0024   0.0069
1,500   6         0.08     1.63   0.0026   0.0063    -0.19     1.79   0.0024   0.0066
1,500   7         0.06     0.38   0.0025   0.0061    -0.01     0.55   0.0023   0.0064

Note. Cat. = number of categories. WLSMV = robust weighted least squares, ULSMV = robust unweighted least squares. SEFL
represents standard errors of factor loadings and SESC is standard errors of structural coefficients.


3. Table F3 presents findings for chi-square test statistics and RMSEA with ML, MLR, ULSMV, and WLSMV estimation.

	  
Table F3. Bias and Rejection Rates of Chi-Square Statistics as well as Means and Rejection Rates of RMSEA with Bipolarization
Distribution
                              ML                                 MLR
                 Chi-square          RMSEA           Chi-square          RMSEA
N       Cat.      bias        %        M        %     bias        %        M        %
200     4        11.86    29.20    0.022     0.00     7.02    14.60    0.017     0.00
200     5        11.92    27.80    0.022     0.00     6.93    15.40    0.017     0.00
200     6        12.06    28.60    0.022     0.00     7.15    15.20    0.017     0.00
200     7        11.77    26.65    0.022     0.00     7.13    15.63    0.017     0.00
300     4         8.53    19.80    0.015     0.00     3.46     9.00    0.011     0.00
300     5         8.82    22.80    0.015     0.00     3.55     8.80    0.010     0.00
300     6         8.74    18.80    0.015     0.00     3.61    10.20    0.010     0.00
300     7         8.46    17.60    0.015     0.00     3.55    10.20    0.010     0.00
400     4         8.25    17.00    0.013     0.00     2.93     8.60    0.009     0.00
400     5         8.23    17.20    0.013     0.00     2.74     8.60    0.009     0.00
400     6         7.80    16.80    0.012     0.00     2.47     7.80    0.008     0.00
400     7         7.57    15.00    0.012     0.00     2.50     6.60    0.008     0.00

Table F3 (cont’d)
                              ML                                 MLR
                 Chi-square          RMSEA           Chi-square          RMSEA
N       Cat.      bias        %        M        %     bias        %        M        %
500     4         7.74    18.40    0.011     0.00     2.30    10.20    0.008     0.00
500     5         7.74    18.20    0.011     0.00     2.13     9.00    0.007     0.00
500     6         7.72    18.40    0.011     0.00     2.22     9.20    0.007     0.00
500     7         7.62    19.60    0.011     0.00     2.39    10.60    0.007     0.00
750     4         7.79    17.00    0.009     0.00     2.22     7.80    0.006     0.00
750     5         7.94    16.20    0.009     0.00     2.19     9.60    0.006     0.00
750     6         7.78    18.20    0.009     0.00     2.17     9.20    0.006     0.00
750     7         7.43    17.00    0.009     0.00     2.08     8.20    0.006     0.00
1,000   4         6.85    15.00    0.007     0.00     1.18     7.00    0.005     0.00
1,000   5         7.10    16.60    0.007     0.00     1.26     6.80    0.005     0.00
1,000   6         6.89    16.40    0.007     0.00     1.19     7.40    0.005     0.00
1,000   7         6.71    14.40    0.007     0.00     1.26     5.40    0.005     0.00
1,500   4         7.01    16.40    0.006     0.00     1.28     8.60    0.004     0.00
1,500   5         7.25    15.80    0.006     0.00     1.33     6.00    0.004     0.00
1,500   6         7.34    15.00    0.006     0.00     1.56     8.00    0.004     0.00
1,500   7         6.99    16.80    0.006     0.00     1.47     7.40    0.004     0.00

Note. Cat. = number of categories. ML = maximum likelihood, MLR = robust maximum likelihood. M = mean.

	  
	  


	  
Table F3 (cont’d)
                            WLSMV                               ULSMV
                 Chi-square          RMSEA           Chi-square          RMSEA
N       Cat.      bias        %        M        %     bias        %        M        %
200     4         2.65     3.82    0.011     0.00     1.46     1.41    0.010     0.00
200     5         2.54     4.41    0.011     0.00     1.40     3.21    0.009     0.00
200     6         2.89     3.82    0.011     0.00     1.74     3.01    0.010     0.00
200     7         3.23     5.01    0.012     0.00     1.98     4.22    0.010     0.00
300     4         1.19     3.40    0.008     0.00     0.23     3.40    0.007     0.00
300     5         1.04     3.40    0.008     0.00     0.22     2.80    0.007     0.00
300     6         1.27     3.20    0.008     0.00     0.46     2.60    0.007     0.00
300     7         1.42     4.80    0.008     0.00     0.61     4.20    0.008     0.00
400     4         1.66     4.41    0.007     0.00     1.01     3.81    0.007     0.00
400     5         1.63     4.40    0.007     0.00     1.01     4.00    0.007     0.00
400     6         1.62     4.20    0.008     0.00     0.99     2.80    0.007     0.00
400     7         1.82     3.60    0.008     0.00     1.11     3.40    0.007     0.00

Table F3 (cont’d)
                            WLSMV                               ULSMV
                 Chi-square          RMSEA           Chi-square          RMSEA
N       Cat.      bias        %        M        %     bias        %        M        %
500     4         0.87     4.40    0.006     0.00     0.29     4.00    0.006     0.00
500     5         0.61     4.60    0.006     0.00     0.04     4.00    0.006     0.00
500     6         0.60     5.00    0.006     0.00     0.04     4.00    0.006     0.00
500     7         0.69     4.00    0.006     0.00     0.04     3.40    0.005     0.00
750     4         1.15     5.80    0.005     0.00     0.76     6.20    0.005     0.00
750     5         1.18     5.60    0.005     0.00     0.79     5.60    0.005     0.00
750     6         1.21     6.00    0.005     0.00     0.80     5.40    0.005     0.00
750     7         0.78     4.60    0.005     0.00     0.34     5.00    0.005     0.00
1,000   4         0.15     4.80    0.004     0.00    -0.11     4.40    0.004     0.00
1,000   5         0.21     5.40    0.004     0.00    -0.04     5.20    0.004     0.00
1,000   6         0.16     4.80    0.004     0.00    -0.11     3.80    0.004     0.00
1,000   7         0.28     3.80    0.004     0.00    -0.01     4.20    0.004     0.00
1,500   4         0.42     5.40    0.003     0.00     0.33     5.40    0.003     0.00
1,500   5         0.57     5.00    0.004     0.00     0.49     5.40    0.004     0.00
1,500   6         0.95     6.20    0.004     0.00     0.86     6.20    0.004     0.00
1,500   7         0.86     5.60    0.004     0.00     0.75     5.20    0.004     0.00

Note. Cat. = number of categories. WLSMV = robust weighted least squares, ULSMV = robust unweighted least squares. M =
mean.

	  


REFERENCES

Anderson, R. D. (1996). An evaluation of the Satorra-Bentler distribution misspecification
correction applied to the McDonald fit index. Structural Equation Modeling, 3, 203-227.
Asparouhov, T., & Muthén, B. O. (2005). Multivariate statistical modeling with survey data.
Retrieved from: http://www.fcsm.gov/05papers/Asparouhov_Muthen_IIA.pdf
Asparouhov, T., & Muthén, B. O. (2010). Simple second order chi-square correction. Retrieved
from: http://www.statmodel.com/download/WLSMV_new_chi21.pdf
Bandalos, D. L. (2006). The use of Monte Carlo studies in structural equation modeling
research. In G. R. Hancock & R. Mueller (Eds.), Structural equation modeling: A second
course (pp. 385-426). Greenwich, CT: Information Age.
Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus
means and variance adjusted weighted least squares estimation in CFA. Structural
Equation Modeling, 13(2), 186-203.
Bentler, P. M., & Chou, C. P. (1987). Practical issues in structural equation modeling.
Sociological Methods and Research, 16, 78-117.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bolt, D. M. (2005). Limited- and full-information estimation of item response theory models. In
A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics (pp. 27-71).
Mahwah, NJ: Lawrence Erlbaum Associates.
Boomsma, A. (2013). Reporting Monte Carlo studies in structural equation modeling.
Structural Equation Modeling, 20, 518-540.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology,
31, 144-152.
Breckler, S. J. (1990). Applications of covariance structure modeling in psychology: Cause for
concern? Psychological Bulletin, 107, 260-273.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance
structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Browne, M. W. (1974). Generalized least squares estimators in the analysis of covariance
structures. South African Statistical Journal, 8, 1-24. Reprinted in 1977 in D. J. Aigner & A.
S. Goldberger (Eds.), Latent variables in socioeconomic models (pp. 205-226). Amsterdam:
North Holland.


Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen &
J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA:
Sage.
Chen, F., Bollen, K. A., Paxton, P., Curran, P. J., & Kirby, J. (2001). Improper solutions in
structural equation models: Causes, consequences, and strategies. Sociological Methods
and Research, 29, 468-508.
Christoffersson, A. (1977). Two-step weighted least squares factor analysis of dichotomized
variables. Psychometrika, 42, 433-438.
Coenders, G., Satorra, A., & Saris, W. E. (1997). Alternative approaches to structural modeling
of ordinal data: A Monte Carlo study. Structural Equation Modeling, 4, 261-282.
Cook, C., Heath, F., & Thompson, R. L. (2001). Score reliability in web- or internet-based
surveys: unnumbered graphic rating scales versus Likert-type scales. Educational and
Psychological Measurement, 61, 697-706.
Curran, P. J., Bollen, K. A., Paxton, P., Kirby, J., & Chen, F. (2002). The noncentral chi-square
distribution in misspecified structural equation models: Finite sample results from a Monte
Carlo simulation. Multivariate Behavioral Research, 37, 1-36.
Ding, L., Velicer, W. F., & Harlow, L. L. (1995). Effects of estimation methods, number of
indicators per factor, and improper solutions on structural equation modeling fit indices.
Structural Equation Modeling, 2, 119-144.
DiStefano, C., & Hess, B. (2005). Using confirmatory factor analysis for construct validation:
An empirical review. Journal of Psychoeducational Assessment, 23, 225-241.
Ethington, C. A. (1987). The robustness of LISREL estimates in structural equation models
with categorical variables. The Journal of Experimental Education, 55, 80-88.
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of
estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9(4),
466-491.
Forero, C. G., & Maydeu-Olivares, A. (2009). Estimation of IRT graded response models:
Limited versus full information methods. Psychological Methods, 14, 275-299.
Forero, C. G., Maydeu-Olivares, A., & Gallardo-Pujol, D. (2009). Factor analysis with ordinal
indicators: A Monte Carlo study comparing DWLS and ULS estimation. Structural Equation
Modeling, 16, 625-641.
Gagné, P., & Hancock, G. R. (2006). Measurement model quality, sample size, and solution
propriety in confirmatory factor models. Multivariate Behavioral Research, 41, 65-83.


Gartside, P. (2001). Letters to the editor. The American Statistician, 55, 171-174.
Gerbing, D. W., & Anderson, J. C. (1985). The effects of sampling error and model
characteristics on parameter estimation for maximum likelihood confirmatory factor
analysis. Multivariate Behavioral Research, 20, 225-271.
Herzog, W., & Boomsma, A. (2009). Small-sample robust estimators of noncentrality-based
and incremental model fit. Structural Equation Modeling, 16, 1-27.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling:
An overview and meta-analysis. Sociological Methods & Research, 26, 329-367.
Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis
be trusted? Psychological Bulletin, 112, 351-362.
Jackson, D. L., Gillaspy, J. A., & Purc-Stephenson, R. (2009). Reporting practices in
confirmatory factor analysis: An overview and some recommendations. Psychological
Methods, 14, 6-23.
Johnson, D. R., & Creech, J. C. (1983). Ordinal measures in multiple indicator models: A
simulation study of categorization error. American Sociological Review, 48(3), 398-407.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor
analysis. Psychometrika, 34, 183-202.
Jöreskog, K. G. (2005). Structural equation modeling with ordinal variables using LISREL.
Retrieved from: http://www.ssicentral.com/lisrel/techdocs/ordinal.pdf.
Jöreskog, K. G., & Sörbom, D. (1986). PRELIS: A program for multivariate data screening and
data summarization. A pre-processor for LISREL. Mooresville, IN: Scientific Software.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: User’s reference guide. Chicago: Scientific
Software International.
Jöreskog, K. G., Sörbom, D., du Toit, S., & du Toit, M. (2000). LISREL 8: New Statistical
Features. Chicago: Scientific Software International.
Kaplan, D. (2009). Structural equation modeling: Foundations and extensions. (2nd ed.).
Thousand Oaks, CA: Sage.
Lei, P. (2009). Evaluating estimation methods for ordinal data in structural equation modeling.
Quality and Quantity, 43, 495-507.
Lietz, P. (2010). Research into questionnaire design: A summary of the literature.
International Journal of Market Research, 52, 249-272.


Magnus, J. R., & Neudecker, H. (1986). Symmetry, 0-1 matrices and Jacobians: A review.
Econometric Theory, 2, 157-190.
Marsh, H. W., Hau, K., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number
of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research,
33, 181-220.
Maydeu-Olivares, A. (2006). Limited information estimation and testing of discretized
multivariate normal structural models. Psychometrika, 71, 57-77.
Medsker, G. M., Williams, L. J., & Holohan, P. (1994). A review of current practices for
evaluating causal models in organizational behavior and human resources management
research. Journal of Management, 20, 439-464.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures.
Psychological Bulletin, 105, 156-166.
Muthén, B. O. (1984). A general structural equation model with dichotomous, ordered
categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.
Muthén, B. O. (1993). Goodness of fit with categorical and nonnormal variables. In K. A. Bollen
& J. S. Long (Eds.), Testing structural equation models (pp. 205-234). Newbury Park, CA:
Sage.
Muthén, B. O. (2002). Using Mplus Monte Carlo simulation in practice: A note on assessing
estimation quality and power in latent variable models. Retrieved from:
https://www.statmodel.com/download/webnotes/mc1.pdf.
Muthén, B. O., & Kaplan, D. (1992). A comparison of some methodologies for the factor
analysis of non-normal Likert variables: A note on the size of the model. British Journal of
Mathematical and Statistical Psychology, 45, 19-30.
Muthén, L. K., & Muthén, B. O. (2010). Mplus user’s guide. Los Angeles, CA: Muthén & Muthén.
Muthén, B. O., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least
squares and quadratic estimating equations in latent variable modeling with categorical
and continuous outcomes. Retrieved from:
http://gseis.ucla.edu/faculty/muthen/articles/Article_075.pdf.
Nester, M. (1996). An applied statistician’s creed. Applied Statistics, 45, 401-410.
Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient.
Psychometrika, 44, 443-460.
Oranje, A. (2003, April). Comparison of estimation methods in factor analysis with categorical
variables: Applications to NAEP data. Paper presented at the annual meeting of the
American Educational Research Association (AERA), Chicago, IL.
Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., & Chen, F. (2001). Monte Carlo experiments:
Design and implementation. Structural Equation Modeling, 8, 287-312.
Raykov, T. (2012). Scale construction and development using structural equation modeling. In
R. H. Hoyle (Ed.), Handbook of Structural Equation Modeling (pp. 472-492). New York: The
Guilford Press.
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be
treated as continuous? A comparison of robust continuous and categorical SEM estimation
methods under suboptimal conditions. Psychological Methods.
Rigdon, E. E. (1998). Structural equation modeling. In G. A. Marcoulides (Ed.), Modern
methods for business research (pp. 251-294). Mahwah, NJ: Lawrence Erlbaum Associates.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in
covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variable analysis:
Applications for developmental research (pp. 399-419). Thousand Oaks, CA: Sage.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified
approach. Psychometrika, 54, 131-151.
Satterthwaite, F. E. (1941). Synthesis of variance. Psychometrika, 6, 309-316.
Savalei, V. (2010). Expected versus observed information in SEM with incomplete normal and
nonnormal data. Psychological Methods, 15, 352-367.
Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation
approach. Multivariate Behavioral Research, 25, 173-180.
Velicer, W. F., & Fava, J. L. (1998). Effects of variable and subject sampling on factor pattern
recovery. Psychological Methods, 3, 231-251.
Weijters, B., Geuens, M., & Schillewaert, N. (2010). The stability of individual response styles.
Psychological Methods, 15, 96-110.
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future
directions. Psychological Methods, 12, 58-79.
Yang-Wallentin, F., Jöreskog, K. G., & Luo, H. (2010). Confirmatory factor analysis of ordinal
variables with misspecified models. Structural Equation Modeling, 17, 392-423.
Yuan, K. H., & Bentler, P. M. (1997). Improving parameter tests in covariance structure
analysis. Computational Statistics & Data Analysis, 26, 177-198.
Yuan, K. H., & Bentler, P. M. (1998). Normal theory based test statistics in structural equation
modeling. British Journal of Mathematical and Statistical Psychology, 51, 289-309.
Yuan, K. H., & Hayashi, K. (2006). Standard errors in covariance structure models:
Asymptotics versus bootstrap. British Journal of Mathematical and Statistical Psychology,
59, 397-417.
Yuan, K. H., & Schuster, C. (2013). Overview of statistical estimation methods. In T. D. Little
(Ed.), The Oxford Handbook of Quantitative Methods: Volume 1 (pp. 361-387). New York:
Oxford University Press.
Yuan, K. H., Bentler, P. M., & Zhang, W. (2005). The effect of skewness and kurtosis on mean
and covariance structure analysis: The univariate case and its multivariate implication.
Sociological Methods and Research, 34, 249-258.
